Extract clean content from any webpage: markdown, metadata, links, images, and schema-defined fields. Powered by Readability + Playwright.
Unlike a plain scraper, `/extract` uses Mozilla's Readability algorithm to strip navbars, ads, footers, and sidebars, leaving only the meaningful content. Output includes clean markdown, title, description, author, publication date, links, and images. You can also pass a schema object to extract specific fields via CSS selectors or regex.
Pass a `schema` object to extract specific fields using CSS selectors. If a selector is not found, extraction falls back to regex.
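
A request that uses `schema` might be assembled as in the sketch below. It assumes a JSON request body; the field names (`headline`, `price`), the selector, the regex, and the target URL are hypothetical placeholders, and the sketch does not attempt to reproduce how the service decides between selector and regex matching.

```python
import json

# Hypothetical schema: each key is an output field name, each value is a
# CSS selector or a regex pattern for the service to evaluate.
schema = {
    "headline": "h1.article-title",   # CSS selector (assumed page markup)
    "price": r"\$\d+(?:\.\d{2})?",    # regex pattern (assumed price format)
}

# Assumed JSON body shape for POST /extract.
body = {
    "url": "https://example.com/some-article",
    "schema": schema,
}

# Serialize the body that would be sent to the endpoint.
payload = json.dumps(body)
print(payload)
```
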

| Parameter | Required | Description |
|-----------|----------|-------------|
| `url` | Required | The URL to extract content from. |
| `schema` | Optional | Object mapping field names to CSS selectors or regex patterns. |
| `wait_for` | Optional | Milliseconds to wait for JavaScript rendering (default: 2000, max: 5000). |
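
The parameter limits above can be checked client-side before a request is sent. The helper below is a hypothetical sketch: its name, the JSON body shape, and the lower bound of 0 for `wait_for` are assumptions (only the default and maximum are documented).

```python
from typing import Any, Optional

MAX_WAIT_MS = 5000  # documented maximum for wait_for (default is 2000)


def build_extract_body(
    url: str,
    schema: Optional[dict] = None,
    wait_for: Optional[int] = None,
) -> dict:
    """Assemble a hypothetical JSON body for POST /extract, checking limits first."""
    if not url:
        raise ValueError("url is required")
    body: dict[str, Any] = {"url": url}
    if schema:
        body["schema"] = schema
    if wait_for is not None:
        # Lower bound of 0 is an assumption; only the maximum is documented.
        if not 0 <= wait_for <= MAX_WAIT_MS:
            raise ValueError(f"wait_for must be between 0 and {MAX_WAIT_MS} ms")
        body["wait_for"] = wait_for
    # If wait_for is omitted, the service applies its 2000 ms default.
    return body
```

For example, `build_extract_body("https://example.com/article", wait_for=3000)` returns `{'url': 'https://example.com/article', 'wait_for': 3000}`, while `wait_for=6000` raises `ValueError`.
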

| Property | Value |
|----------|-------|
| Endpoint | `POST /extract` |
| Price | $0.005 per extraction |
| Auth | Bearer token or x402 micropayment |
| Engine | Mozilla Readability + Playwright (JS rendering) |
| Base URL | `https://api.iteratools.com` |
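
Putting the endpoint details together, a minimal standard-library client might look like the sketch below. It covers only the Bearer-token path (the x402 flow is not shown), assumes a JSON request body and a JSON response, and uses hypothetical helper names (`make_extract_request`, `extract`, `estimate_cost`). The cost helper uses `Decimal` so the $0.005 rate multiplies exactly.

```python
import json
import urllib.request
from decimal import Decimal

BASE_URL = "https://api.iteratools.com"
PRICE_PER_EXTRACTION = Decimal("0.005")  # USD, from the table above


def make_extract_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST /extract request."""
    return urllib.request.Request(
        f"{BASE_URL}/extract",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",   # Bearer auth; x402 not shown
            "Content-Type": "application/json",   # assumed: JSON request body
        },
        method="POST",
    )


def extract(token: str, body: dict, timeout: float = 30.0) -> dict:
    """Send the request and decode the response, assuming it is JSON."""
    req = make_extract_request(token, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def estimate_cost(n_extractions: int) -> Decimal:
    """Cost in USD for n extractions at the listed per-extraction price."""
    return PRICE_PER_EXTRACTION * n_extractions
```

For example, `estimate_cost(1000)` returns `Decimal('5.000')`, i.e. $5 for 1,000 extractions. This page does not list the response's JSON key names, so inspect the dict returned by `extract` rather than assuming specific keys.
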