Start SmartScraper

SmartScraper allows you to extract specific information from any webpage using AI. Simply provide a URL and describe what information you want to extract in natural language.

Use Cases

Extract company information from websites
Gather product details from e-commerce pages
Collect contact information from business pages
Extract structured data from articles or blog posts

Request Body

website_url

string

required

The URL of the webpage you want to extract information from. You must provide exactly one of: website_url, website_html, or website_markdown.

website_html

string

Raw HTML content to process directly (max 2MB). Mutually exclusive with website_url and website_markdown. Useful when you already have HTML content cached or want to process modified HTML.

website_markdown

string

Raw Markdown content to process directly (max 2MB). Mutually exclusive with website_url and website_html. Perfect for extracting structured data from Markdown documentation, README files, or any content already in Markdown format.
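The "exactly one of" rule and the 2MB limit can be checked on the client before sending a request. The following is a minimal sketch, not part of the API: the helper name `check_source` is hypothetical, and it assumes "2MB" means 2 × 1024 × 1024 bytes of UTF-8 text, which the documentation does not specify.

```python
# Assumption: "max 2MB" is interpreted as 2 MiB of UTF-8 encoded bytes.
MAX_INLINE_BYTES = 2 * 1024 * 1024

SOURCE_FIELDS = ("website_url", "website_html", "website_markdown")


def check_source(payload: dict) -> str:
    """Return the name of the single source field present, or raise ValueError."""
    present = [name for name in SOURCE_FIELDS if payload.get(name)]
    if len(present) != 1:
        raise ValueError(f"expected exactly one of {SOURCE_FIELDS}, got {present}")
    name = present[0]
    # Only inline content (HTML or Markdown) has a documented size limit.
    if name != "website_url":
        size = len(payload[name].encode("utf-8"))
        if size > MAX_INLINE_BYTES:
            raise ValueError(f"{name} is {size} bytes; limit is {MAX_INLINE_BYTES}")
    return name
```

Running a check like this locally avoids spending a request on a payload the API would likely reject.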

user_prompt

string

required

Natural language description of what information you want to extract from the webpage.

output_schema

object

Optional schema to structure the output. If provided, the AI will attempt to format the results according to this schema.
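Because the schema is applied on a best-effort basis, it can be useful to check which requested fields actually came back. The sketch below assumes the schema has the shape used in the examples on this page (a top-level `properties` object) and that the result is a JSON object; `missing_fields` is a hypothetical helper, not part of the API.

```python
def missing_fields(output_schema: dict, result: dict) -> list[str]:
    """List top-level schema properties that are absent from the result."""
    expected = output_schema.get("properties", {})
    return [name for name in expected if name not in result]


schema = {
    "properties": {
        "company_name": {"type": "string"},
        "description": {"type": "string"},
        "contact_email": {"type": "string"},
    }
}
# Illustrative result only; real results depend on the page and the prompt.
result = {"company_name": "ScrapeGraphAI", "description": "An AI scraping API"}
print(missing_fields(schema, result))
```

This only compares top-level keys; it does not validate types or nested items.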

headers

object

Optional custom HTTP headers to send with the request. Useful for setting User-Agent, cookies, authentication tokens, and other request metadata.

Example: {"User-Agent": "Mozilla/5.0...", "Cookie": "session=abc123"}

total_pages

integer

Optional parameter to enable pagination and scrape multiple pages. Specify the number of pages to extract data from.

Default: 1. Range: 1-100.

number_of_scrolls

integer

Optional parameter for infinite scroll pages. Specify how many times to scroll down to load more content before extraction.

Default: 0. Range: 0-50.
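The documented ranges for total_pages and number_of_scrolls can be enforced before a request is sent. This is a minimal sketch using a hypothetical helper, `check_ranges`; the limits are the ones stated above, and how the API itself responds to out-of-range values is not documented here.

```python
# Inclusive ranges taken from the parameter descriptions above.
LIMITS = {"total_pages": (1, 100), "number_of_scrolls": (0, 50)}


def check_ranges(payload: dict) -> None:
    """Raise if a paging or scrolling parameter is outside its documented range."""
    for name, (low, high) in LIMITS.items():
        if name not in payload:
            continue  # both parameters are optional
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {type(value).__name__}")
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")
```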

render_heavy_js

boolean

Optional parameter to enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, SPAs). Use when standard rendering doesn’t capture all content.

Default: false

mock

boolean

Optional parameter to enable mock mode. When set to true, the request will return mock data instead of performing an actual extraction. Useful for testing and development.

Default: false

plain_text

boolean

Optional parameter to return plain text instead of JSON. When set to true, the result will be returned as plain text rather than structured JSON data.

Default: false

cookies

object

Optional cookies object for authentication and session management. Useful for accessing authenticated pages or maintaining session state.

Example: {"session_id": "abc123", "auth_token": "xyz789"}

steps

array

Optional array of interaction steps to perform on the webpage before extraction. Each step is a string describing the action to take (e.g., “click on filter button”, “wait for results to load”).

Example: ["click on search button", "type query in search box", "wait for results"]

Example Requests

Basic Request

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://scrapegraphai.com/",
  "user_prompt": "Extract company information and features",
  "output_schema": {
    "properties": {
      "company_name": {"type": "string"},
      "description": {"type": "string"},
      "features": {"type": "array", "items": {"type": "string"}},
      "contact_email": {"type": "string"}
    }
  }
}'
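The same basic request can be assembled from Python with only the standard library. This is a sketch rather than an official client: `build_request` is a hypothetical helper, the endpoint and header names are copied from the curl command above, and the request is built but not sent.

```python
import json
import urllib.request

ENDPOINT = "https://api.scrapegraphai.com/v1/smartscraper"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
    )


request = build_request(
    "YOUR_API_KEY",
    {
        "website_url": "https://scrapegraphai.com/",
        "user_prompt": "Extract company information and features",
    },
)
# To send it:
#   with urllib.request.urlopen(request) as response:
#       data = json.load(response)
```

Separating construction from sending makes it possible to inspect or log the payload before any credits are spent.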

Advanced Request with Pagination and Stealth Mode

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/news",
  "user_prompt": "Extract all the headlines from this section into a table with the date and URL of the news",
  "total_pages": 2,
  "stealth": true,
  "render_heavy_js": true,
  "headers": {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Cookie": "cookie1=value1; cookie2=value2"
  }
}'

Request with HTML Content

Extract structured data directly from HTML content:

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_html": "<html><body><h1>Product Catalog</h1><div class=\"product\"><h2>Laptop Pro 15</h2><p><strong>Brand:</strong> TechCorp</p><p><strong>Price:</strong> $1,299.99</p><p><strong>Rating:</strong> 4.5/5</p><p><strong>In Stock:</strong> Yes</p></div></body></html>",
  "user_prompt": "Extract all products with their names, brands, prices, ratings, and stock status",
  "output_schema": {
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "properties": {
            "name": {"type": "string"},
            "brand": {"type": "string"},
            "price": {"type": "string"},
            "rating": {"type": "string"},
            "in_stock": {"type": "boolean"}
          }
        }
      }
    }
  }
}'

Request with Markdown Content

Extract structured data directly from Markdown content:

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_markdown": "# Product Catalog\n\n## Laptop Pro 15\n- **Brand**: TechCorp\n- **Price**: $1,299.99\n- **Rating**: 4.5/5\n- **In Stock**: Yes",
  "user_prompt": "Extract all products with their names, brands, prices, ratings, and stock status",
  "output_schema": {
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "properties": {
            "name": {"type": "string"},
            "brand": {"type": "string"},
            "price": {"type": "string"},
            "rating": {"type": "string"},
            "in_stock": {"type": "boolean"}
          }
        }
      }
    }
  }
}'

Request with Cookies and Infinite Scroll

Extract data from authenticated pages with infinite scroll:

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/dashboard",
  "user_prompt": "Extract all dashboard items and their details",
  "cookies": {
    "session_id": "abc123def456",
    "auth_token": "xyz789"
  },
  "number_of_scrolls": 5,
  "output_schema": {
    "properties": {
      "items": {
        "type": "array",
        "items": {
          "properties": {
            "title": {"type": "string"},
            "description": {"type": "string"}
          }
        }
      }
    }
  }
}'

Request with Browser Interaction Steps

Perform browser interactions before extraction:

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/products",
  "user_prompt": "Extract all visible product information after applying filters",
  "steps": [
    "click on filter button",
    "select category electronics",
    "click apply filters",
    "wait for results to load"
  ],
  "output_schema": {
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "properties": {
            "name": {"type": "string"},
            "price": {"type": "string"},
            "category": {"type": "string"}
          }
        }
      }
    }
  }
}'

Example Response

{
  "request_id": "<request-id>",
  "status": "completed",
  "website_url": "https://scrapegraphai.com/",
  "user_prompt": "Extract info about the company",
  "result": {
    "company_name": "ScrapeGraphAI",
    "description": "ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents...",
    "features": [
      "Effortless, cost-effective, and AI-powered data extraction",
      "Handles proxy rotation and rate limits",
      "Supports a wide variety of websites"
    ],
    "contact_email": "contact@scrapegraphai.com",
    "social_links": {
      "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
      "linkedin": "https://www.linkedin.com/company/101881123",
      "twitter": "https://x.com/scrapegraphai"
    },
    "..."
  },
  "error": ""
}

Authorizations

SGAI-APIKEY

string

header

required

Body

application/json

Exactly one of website_url, website_html, or website_markdown must be provided

user_prompt

string

required

Example:

"Extract info about the company"

website_url

string

Example:

"https://scrapegraphai.com/"

website_html

string

HTML content, maximum size 2MB

Example:

"<html><body><h1>Title</h1><p>Content</p></body></html>"

headers

object

Optional headers to send with the request, including cookies and user agent

Example:

{
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "Cookie": "cookie1=value1; cookie2=value2"
}

output_schema

object | null

stealth

boolean

default: false

Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds 4 credits to the request cost.

Response

Successful Response

request_id

string

required

status

enum<string>

required

Available options: queued, processing, completed, failed

website_url

string

required

user_prompt

string

required

result

object | null

error

string

default:""
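A caller typically needs to branch on the status field before reading result. The sketch below shows one way to do that with a hypothetical helper, `unwrap_result`; it relies only on the fields listed above and assumes that queued and processing mean the result is not yet available.

```python
def unwrap_result(response: dict):
    """Return the result for a completed request, None while pending, or raise on failure."""
    status = response.get("status")
    if status == "completed":
        return response.get("result")
    if status == "failed":
        # The error field defaults to an empty string, so supply a fallback message.
        raise RuntimeError(response.get("error") or "request failed without an error message")
    if status in ("queued", "processing"):
        return None  # assumption: the result is not ready yet
    raise ValueError(f"unexpected status: {status!r}")
```

For example, a response with status "completed" and result {"company_name": "ScrapeGraphAI"} returns that result object unchanged.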
