Structured Web Extraction API

About this tool

Unlike a plain scraper, /extract uses Mozilla's Readability algorithm to strip navbars, ads, footers, and sidebars, leaving only the meaningful content. The output includes clean markdown, title, description, author, publication date, links, and images. You can also pass a schema object to extract specific fields via CSS selectors or regex.

Quick Start

curl -X POST https://api.iteratools.com/extract \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "wait_for": 2000
  }'
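
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_extract_request and extract are hypothetical, and the 30-second timeout is an assumption. The endpoint, headers, and body fields are the ones shown in the curl command above.

```python
import json
import urllib.request

# Base URL plus endpoint, as listed on this page.
API_URL = "https://api.iteratools.com/extract"


def build_extract_request(url, api_key, wait_for=None, schema=None):
    """Build (but do not send) a POST /extract request. Hypothetical helper."""
    payload = {"url": url}
    if wait_for is not None:
        payload["wait_for"] = wait_for
    if schema is not None:
        payload["schema"] = schema
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract(url, api_key, **options):
    """Send the request and return the decoded JSON body (performs network I/O)."""
    request = build_extract_request(url, api_key, **options)
    # The timeout value is an assumption, not a documented limit.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling extract("https://example.com/article", "YOUR_KEY", wait_for=2000) would send the same body as the curl command. Keeping request construction separate from sending makes the payload easy to inspect without touching the network.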

With Schema Extraction

Pass a schema object to extract specific fields using CSS selectors. If a selector is not found, extraction falls back to regex.

curl -X POST https://api.iteratools.com/extract \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://shop.example.com/product",
    "schema": {
      "price": ".price, .product-price",
      "name": "h1",
      "rating": "[data-rating]"
    }
  }'

Response

{
  "ok": true,
  "data": {
    "title": "Article Title",
    "description": "Meta description...",
    "author": "Jane Doe",
    "published_date": "2024-01-15",
    "markdown": "# Article Title\n\nClean article content...",
    "links": [{ "text": "Read more", "href": "https://..." }],
    "images": [{ "src": "https://example.com/img.jpg", "alt": "..." }],
    "word_count": 842,
    "url": "https://example.com/article",
    "schema_data": { "price": "$29.99", "name": "Product Name" }
  }
}
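
A response of this shape can be unpacked into a small typed record. The sketch below is one way to do it in Python, under stated assumptions: the Extraction class and parse_extraction function are hypothetical names, the error format is not documented on this page (so any body without "ok": true simply raises ValueError), and missing fields fall back to empty defaults.

```python
from dataclasses import dataclass, field


@dataclass
class Extraction:
    """Hypothetical container for the fields most callers need."""
    title: str
    markdown: str
    word_count: int
    link_urls: list = field(default_factory=list)
    schema_data: dict = field(default_factory=dict)


def parse_extraction(body):
    """Turn a decoded /extract JSON body into an Extraction record."""
    # The failure shape is an assumption: anything without ok=true is rejected.
    if not body.get("ok"):
        raise ValueError("extraction failed or response is malformed")
    data = body["data"]
    return Extraction(
        title=data.get("title", ""),
        markdown=data.get("markdown", ""),
        word_count=data.get("word_count", 0),
        # Keep only the link targets; the link text is dropped here.
        link_urls=[link["href"] for link in data.get("links", [])],
        # schema_data is treated as optional in this sketch.
        schema_data=data.get("schema_data", {}),
    )
```

With the sample response above, parse_extraction(body).schema_data["price"] would be "$29.99".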

Request Parameters

url	required	the URL to extract content from
schema	optional	object mapping field names to CSS selectors or regex patterns
wait_for	optional	milliseconds to wait for JS rendering (default: 2000, max: 5000)
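
The parameter rules above can be checked on the client before a request is sent. The following Python sketch is illustrative only: validate_params is a hypothetical helper, the page does not say whether the server rejects or clamps an out-of-range wait_for (this sketch rejects it), and the http(s) check on url is an added assumption.

```python
DEFAULT_WAIT_MS = 2000  # documented default
MAX_WAIT_MS = 5000      # documented maximum


def validate_params(url, schema=None, wait_for=DEFAULT_WAIT_MS):
    """Return a request body dict, or raise ValueError on invalid input."""
    # Assumption: only absolute http(s) URLs are accepted.
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        raise ValueError("url is required and must be an http(s) URL")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(wait_for, bool) or not isinstance(wait_for, int):
        raise ValueError("wait_for must be an integer number of milliseconds")
    if not 0 <= wait_for <= MAX_WAIT_MS:
        raise ValueError(f"wait_for must be between 0 and {MAX_WAIT_MS}")
    params = {"url": url, "wait_for": wait_for}
    if schema is not None:
        # schema maps field names to CSS selectors or regex patterns.
        if not isinstance(schema, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in schema.items()
        ):
            raise ValueError("schema must map field names to selector strings")
        params["schema"] = dict(schema)
    return params
```

Failing fast on the client avoids paying for an extraction that was going to be rejected anyway.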

Details

Endpoint	POST /extract
Price	$0.005 / extraction
Auth	Bearer token or x402 micropayment
Engine	Mozilla Readability + Playwright (JS rendering)
Base URL	https://api.iteratools.com
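
For budgeting, the per-extraction price above translates directly into a batch cost. A minimal Python sketch, assuming flat pricing with no volume discounts or minimum charges (the page lists only the single rate); estimate_cost is a hypothetical helper.

```python
from decimal import Decimal

# Listed price per extraction, in US dollars.
PRICE_PER_EXTRACTION = Decimal("0.005")


def estimate_cost(extractions):
    """Estimated cost in dollars for a number of extractions (flat rate assumed)."""
    if extractions < 0:
        raise ValueError("extractions must be non-negative")
    # Decimal avoids binary floating-point rounding in money arithmetic.
    return PRICE_PER_EXTRACTION * extractions
```

At the listed rate, 1,000 extractions come to $5.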