Use Cases
- Extract company information from websites
- Gather product details from e-commerce pages
- Collect contact information from business pages
- Extract structured data from articles or blog posts
Request Body
The URL of the webpage you want to extract information from. You must provide exactly one of website_url, website_html, or website_markdown.
Raw HTML content to process directly (max 2MB). Mutually exclusive with website_url and website_markdown. Useful when you already have HTML content cached or want to process modified HTML.
Raw Markdown content to process directly (max 2MB). Mutually exclusive with website_url and website_html. Useful for extracting structured data from Markdown documentation, README files, or any other content already in Markdown format.
Natural language description of what information you want to extract from the webpage.
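The source rules above can be checked on the client before a request is sent. The sketch below is a minimal Python example and is not part of any official SDK. It assumes the 2MB limit is measured on the UTF-8 encoded content. Because this page does not say whether 2MB means 2,000,000 or 2,097,152 bytes, the sketch uses the stricter decimal reading.

```python
# Field names come from this page; the byte-limit reading is an assumption.
SOURCE_FIELDS = ("website_url", "website_html", "website_markdown")
MAX_CONTENT_BYTES = 2_000_000  # "max 2MB", read conservatively as 2,000,000 bytes (assumption)


def check_source(body: dict) -> str:
    """Return the single source field set in `body`, or raise ValueError."""
    present = [name for name in SOURCE_FIELDS if body.get(name)]
    if len(present) != 1:
        raise ValueError(
            f"provide exactly one of {', '.join(SOURCE_FIELDS)}; got {present or 'none'}"
        )
    field = present[0]
    # The size limit applies to raw HTML and Markdown, not to a URL.
    if field != "website_url":
        size = len(body[field].encode("utf-8"))
        if size > MAX_CONTENT_BYTES:
            raise ValueError(f"{field} is {size} bytes; limit is {MAX_CONTENT_BYTES}")
    return field


print(check_source({"website_html": "<html><body><h1>Title</h1><p>Content</p></body></html>"}))
# prints website_html
```

Rejecting a bad body locally gives a clearer error than a server-side rejection.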
Optional schema to structure the output. If provided, the AI will attempt to format the results according to this schema.
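The schema is applied on a best-effort basis (the AI "will attempt to" follow it), so it can be worth checking the returned data yourself. The sketch below assumes the schema is written as a JSON-Schema-style object. The company fields and the `missing_or_mistyped` checker are hypothetical illustrations. The checker inspects only top-level required keys and basic types.

```python
# Hypothetical JSON-Schema-style output schema; the API's accepted schema format is an assumption.
COMPANY_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "description": {"type": "string"},
        "founded_year": {"type": "integer"},
    },
    "required": ["name", "description"],
}

_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def _matches(value, json_type) -> bool:
    # bool is a subclass of int in Python, so exclude it from numeric types explicitly.
    if json_type in ("integer", "number") and isinstance(value, bool):
        return False
    expected = _JSON_TYPES.get(json_type)
    return expected is None or isinstance(value, expected)


def missing_or_mistyped(result: dict, schema: dict) -> list:
    """List top-level problems in `result` relative to a flat object schema."""
    problems = [f"missing: {key}" for key in schema.get("required", []) if key not in result]
    for key, spec in schema.get("properties", {}).items():
        if key in result and not _matches(result[key], spec.get("type")):
            problems.append(f"wrong type: {key}")
    return problems
```

For full JSON Schema validation, the third-party `jsonschema` package (`jsonschema.validate(instance, schema)`) is the more complete option.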
Optional custom HTTP headers to send with the request. Useful for setting User-Agent, cookies, authentication tokens, and other request metadata.
Example:
{"User-Agent": "Mozilla/5.0...", "Cookie": "session=abc123"}
Optional parameter to enable pagination and scrape multiple pages. Specify the number of pages to extract data from.
Default: 1
Range: 1-100
Optional parameter for infinite scroll pages. Specify how many times to scroll down to load more content before extraction.
Default: 0
Range: 0-50
Optional parameter to enable enhanced JavaScript rendering for JavaScript-heavy websites (React, Vue, Angular, SPAs). Use when standard rendering doesn’t capture all content.
Default: false
Optional parameter to enable mock mode. When set to true, the request returns mock data instead of performing an actual extraction. Useful for testing and development.
Default: false
Optional parameter to return plain text instead of JSON. When set to true, the result is returned as plain text rather than structured JSON data.
Default: false
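The pagination, scrolling, and rendering options above have documented defaults and ranges. A client can therefore reject out-of-range values early and omit fields left at their defaults. In this sketch the field names (`total_pages`, `number_of_scrolls`, `render_heavy_js`, `mock`, `plain_text`) are assumptions, because this page does not show them. Check them against the API reference before relying on them.

```python
# Documented defaults and inclusive ranges; the field NAMES are assumptions.
NUMERIC_OPTIONS = {
    "total_pages": (1, 1, 100),       # (default, min, max): pagination
    "number_of_scrolls": (0, 0, 50),  # (default, min, max): infinite-scroll iterations
}
BOOLEAN_OPTIONS = {"render_heavy_js": False, "mock": False, "plain_text": False}


def normalize_options(options: dict) -> dict:
    """Validate paging/rendering options and drop any left at their documented default.

    Unrecognized names raise ValueError so that typos surface early.
    """
    body = {}
    for name, value in options.items():
        if name in NUMERIC_OPTIONS:
            default, low, high = NUMERIC_OPTIONS[name]
            # Reject bools explicitly, since bool is a subclass of int in Python.
            if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
                raise ValueError(f"{name} must be an integer in {low}-{high}, got {value!r}")
        elif name in BOOLEAN_OPTIONS:
            default = BOOLEAN_OPTIONS[name]
            if not isinstance(value, bool):
                raise ValueError(f"{name} must be true or false, got {value!r}")
        else:
            raise ValueError(f"unknown option: {name}")
        if value != default:
            body[name] = value
    return body
```

Omitting default-valued fields keeps request bodies minimal. Sending them explicitly should be equivalent, assuming the server applies the same defaults.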
Optional cookies object for authentication and session management. Useful for accessing authenticated pages or maintaining session state.
Example:
{"session_id": "abc123", "auth_token": "xyz789"}
Optional array of interaction steps to perform on the webpage before extraction. Each step is a string describing the action to take (e.g., “click on filter button”, “wait for results to load”).
Example:
["click on search button", "type query in search box", "wait for results"]
Example Requests
Basic Request
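A basic request needs only a source and a prompt. The sketch below uses Python's standard library to build such a request without sending it. The endpoint URL, the `SGAI-APIKEY` header, and the `user_prompt` field name are assumptions that this page does not state. Substitute the values from your API reference.

```python
import json
import urllib.request

# Assumptions (not stated on this page): endpoint URL and API-key header name.
ENDPOINT = "https://api.scrapegraphai.com/v1/smartscraper"
API_KEY_HEADER = "SGAI-APIKEY"


def build_basic_request(api_key: str, url: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying a page URL and a natural-language prompt."""
    body = {"website_url": url, "user_prompt": prompt}  # "user_prompt" is an assumed field name
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


req = build_basic_request("your-api-key", "https://scrapegraphai.com/", "Extract info about the company")
```

Passing `req` to `urllib.request.urlopen(req)` would send it. The response body can then be read and decoded with `json.loads`.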
Advanced Request with Pagination and Stealth Mode
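An advanced request adds pagination and stealth mode to the basic body. The sketch below returns the body together with the documented stealth surcharge of 4 credits, so a caller can account for it. The base cost of a request is not stated here, so the sketch leaves it out. The `user_prompt`, `total_pages`, and `stealth` field names are assumptions.

```python
STEALTH_EXTRA_CREDITS = 4  # documented surcharge for stealth mode


def build_advanced_body(url: str, prompt: str, pages: int, stealth: bool = False):
    """Return (body, extra_credits) for a multi-page extraction.

    Field names other than website_url are assumptions.
    """
    if not 1 <= pages <= 100:  # documented pagination range
        raise ValueError(f"pages must be between 1 and 100, got {pages}")
    body = {"website_url": url, "user_prompt": prompt, "total_pages": pages}
    extra_credits = 0
    if stealth:
        body["stealth"] = True
        extra_credits = STEALTH_EXTRA_CREDITS
    return body, extra_credits


body, extra = build_advanced_body(
    "https://example.com/products", "List product names and prices", pages=5, stealth=True
)
```

Returning the surcharge separately keeps cost accounting out of the request body itself.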
Request with HTML Content
Extract structured data directly from HTML content:
Request with Markdown Content
Extract structured data directly from Markdown content:
Request with Cookies and Infinite Scroll
Extract data from authenticated pages with infinite scroll:
Request with Browser Interaction Steps
Perform browser interactions before extraction:
Example Response
Authorizations
Body
application/json
Exactly one of website_url, website_html, or website_markdown must be provided
Example:
"Extract info about the company"
Example:
"https://scrapegraphai.com/"
HTML content, maximum size 2MB
Example:
"<html><body><h1>Title</h1><p>Content</p></body></html>"
Optional headers to send with the request, including cookies and user agent
Example:
{
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
"Cookie": "cookie1=value1; cookie2=value2"
}
Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds 4 credits to the request cost.