POST /v1/smartscraper
cURL
curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
  -H 'SGAI-APIKEY: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_prompt": "Extract info about the company",
    "website_url": "https://scrapegraphai.com/"
  }'
{
  "request_id": "<string>",
  "status": "queued",
  "website_url": "<string>",
  "user_prompt": "<string>",
  "result": {},
  "error": ""
}
SmartScraper uses AI to extract specific information from a webpage. Provide a URL (or raw HTML or Markdown content) and describe in natural language what you want to extract.

Use Cases

  • Extract company information from websites
  • Gather product details from e-commerce pages
  • Collect contact information from business pages
  • Extract structured data from articles or blog posts

Request Body

website_url
string
required unless website_html or website_markdown is provided
The URL of the webpage you want to extract information from. You must provide exactly one of: website_url, website_html, or website_markdown.
website_html
string
Raw HTML content to process directly (max 2MB). Mutually exclusive with website_url and website_markdown. Useful when you already have HTML content cached or want to process modified HTML.
website_markdown
string
Raw Markdown content to process directly (max 2MB). Mutually exclusive with website_url and website_html. Perfect for extracting structured data from Markdown documentation, README files, or any content already in Markdown format.
user_prompt
string
required
Natural language description of what information you want to extract from the webpage.
output_schema
object
Optional schema to structure the output. If provided, the AI will attempt to format the results according to this schema.
headers
object
Optional custom HTTP headers to send with the request. Useful for setting User-Agent, cookies, authentication tokens, and other request metadata. Example: {"User-Agent": "Mozilla/5.0...", "Cookie": "session=abc123"}
total_pages
integer
Optional parameter to enable pagination and scrape multiple pages. Specify the number of pages to extract data from. Default: 1. Range: 1-100.
number_of_scrolls
integer
Optional parameter for infinite scroll pages. Specify how many times to scroll down to load more content before extraction. Default: 0. Range: 0-50.
render_heavy_js
boolean
Optional parameter to enable enhanced JavaScript rendering for heavy JS websites (React, Vue, Angular, SPAs). Use when standard rendering doesn’t capture all content. Default: false.
mock
boolean
Optional parameter to enable mock mode. When set to true, the request returns mock data instead of performing an actual extraction. Useful for testing and development. Default: false.
plain_text
boolean
Optional parameter to return the result as plain text instead of structured JSON data. Default: false.
cookies
object
Optional cookies object for authentication and session management. Useful for accessing authenticated pages or maintaining session state. Example: {"session_id": "abc123", "auth_token": "xyz789"}
steps
array
Optional array of interaction steps to perform on the webpage before extraction. Each step is a string describing the action to take (e.g., “click on filter button”, “wait for results to load”). Example: ["click on search button", "type query in search box", "wait for results"]
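
The constraints listed above (exactly one content source, the 2MB cap, and the documented ranges) can be checked on the client before a request is sent. The following is a minimal Python sketch, not part of the API: `validate_body` is a hypothetical helper, and it assumes that "2MB" means 2 * 1024 * 1024 bytes of UTF-8 text, which this page does not confirm.

```python
# Assumption: "max 2MB" is read as 2 * 1024 * 1024 UTF-8 bytes.
MAX_INLINE_BYTES = 2 * 1024 * 1024
SOURCE_FIELDS = ("website_url", "website_html", "website_markdown")


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found in a SmartScraper request body."""
    problems = []

    # Exactly one content source must be present.
    present = [field for field in SOURCE_FIELDS if body.get(field)]
    if len(present) != 1:
        problems.append(f"expected exactly one of {SOURCE_FIELDS}, got {present}")

    if not body.get("user_prompt"):
        problems.append("user_prompt is required")

    # Inline HTML and Markdown are capped at 2MB.
    for field in ("website_html", "website_markdown"):
        value = body.get(field)
        if value and len(value.encode("utf-8")) > MAX_INLINE_BYTES:
            problems.append(f"{field} exceeds 2MB")

    # Documented ranges: total_pages 1-100, number_of_scrolls 0-50.
    for field, low, high in (("total_pages", 1, 100), ("number_of_scrolls", 0, 50)):
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be between {low} and {high}")

    return problems
```

An empty list means the body passed these local checks; the server may still apply validation that is not described here.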

Example Requests

Basic Request

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://scrapegraphai.com/",
  "user_prompt": "Extract company information and features",
  "output_schema": {
    "properties": {
      "company_name": {"type": "string"},
      "description": {"type": "string"},
      "features": {"type": "array", "items": {"type": "string"}},
      "contact_email": {"type": "string"}
    }
  }
}'
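
For readers who prefer Python to cURL, the same request can be assembled with the standard library. This is a sketch rather than an official client: `build_request` is a hypothetical helper, and the call that would actually send the request is left commented out so the example runs without network access or a real API key.

```python
import json
import urllib.request

API_URL = "https://api.scrapegraphai.com/v1/smartscraper"


def build_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the SmartScraper endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"SGAI-APIKEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


body = {
    "website_url": "https://scrapegraphai.com/",
    "user_prompt": "Extract company information and features",
    "output_schema": {
        "properties": {
            "company_name": {"type": "string"},
            "description": {"type": "string"},
            "features": {"type": "array", "items": {"type": "string"}},
            "contact_email": {"type": "string"},
        }
    },
}
request = build_request("YOUR_API_KEY", body)

# To send the request, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

urllib stores header names with normalized capitalization (for example `Sgai-apikey`); HTTP header names are case-insensitive, so this does not change the request's meaning.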

Advanced Request with Pagination and Stealth Mode

curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/news",
  "user_prompt": "Extract all the headlines from this section into a table with the date and URL of the news",
  "total_pages": 2,
  "stealth": true,
  "render_heavy_js": true,
  "headers": {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Cookie": "cookie1=value1; cookie2=value2"
  }
}'
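
The Cookie header in the request above is a list of name=value pairs joined by "; ". When cookies are held in a dictionary, the header value can be produced programmatically. The sketch below uses a hypothetical helper, `cookie_header`, and assumes the cookie names and values need no escaping.

```python
def cookie_header(cookies: dict[str, str]) -> str:
    """Join cookies into a single Cookie header value ("a=1; b=2")."""
    # Assumption: names and values contain no ';' or other characters
    # that would require escaping.
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


body = {
    "website_url": "https://example.com/news",
    "user_prompt": (
        "Extract all the headlines from this section into a table "
        "with the date and URL of the news"
    ),
    "total_pages": 2,
    "stealth": True,
    "render_heavy_js": True,
    "headers": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Cookie": cookie_header({"cookie1": "value1", "cookie2": "value2"}),
    },
}
```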

Request with HTML Content

Extract structured data directly from HTML content:
curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_html": "<html><body><h1>Product Catalog</h1><div class=\"product\"><h2>Laptop Pro 15</h2><p><strong>Brand:</strong> TechCorp</p><p><strong>Price:</strong> $1,299.99</p><p><strong>Rating:</strong> 4.5/5</p><p><strong>In Stock:</strong> Yes</p></div></body></html>",
  "user_prompt": "Extract all products with their names, brands, prices, ratings, and stock status",
  "output_schema": {
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "properties": {
            "name": {"type": "string"},
            "brand": {"type": "string"},
            "price": {"type": "string"},
            "rating": {"type": "string"},
            "in_stock": {"type": "boolean"}
          }
        }
      }
    }
  }
}'
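
Nested output_schema objects like the one above are repetitive to write by hand. The following sketch builds the same structure from field names and type names. Both helpers (`object_schema`, `array_of`) are hypothetical and reproduce only the schema shape used in the examples on this page; whether the API accepts other schema keywords is not stated here.

```python
def array_of(item_schema: dict) -> dict:
    """Wrap an item schema in an array schema."""
    return {"type": "array", "items": item_schema}


def object_schema(fields: dict) -> dict:
    """Build a {"properties": ...} schema from a mapping of field names.

    A string value is treated as a type name; a dict value is used as-is.
    """
    return {
        "properties": {
            name: {"type": spec} if isinstance(spec, str) else spec
            for name, spec in fields.items()
        }
    }


product = object_schema({
    "name": "string",
    "brand": "string",
    "price": "string",
    "rating": "string",
    "in_stock": "boolean",
})
output_schema = object_schema({"products": array_of(product)})
```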

Request with Markdown Content

Extract structured data directly from Markdown content:
curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_markdown": "# Product Catalog\n\n## Laptop Pro 15\n- **Brand**: TechCorp\n- **Price**: $1,299.99\n- **Rating**: 4.5/5\n- **In Stock**: Yes",
  "user_prompt": "Extract all products with their names, brands, prices, ratings, and stock status",
  "output_schema": {
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "properties": {
            "name": {"type": "string"},
            "brand": {"type": "string"},
            "price": {"type": "string"},
            "rating": {"type": "string"},
            "in_stock": {"type": "boolean"}
          }
        }
      }
    }
  }
}'
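
In the cURL example the Markdown is written with explicit \n escapes because a JSON string cannot contain raw newlines. When the body is built in code, a JSON serializer handles that escaping. A minimal Python sketch (the variable names are illustrative only):

```python
import json

# Multi-line Markdown, as it might be read from a README file.
markdown = "\n".join([
    "# Product Catalog",
    "",
    "## Laptop Pro 15",
    "- **Brand**: TechCorp",
    "- **Price**: $1,299.99",
    "- **Rating**: 4.5/5",
    "- **In Stock**: Yes",
])

body = {
    "website_markdown": markdown,
    "user_prompt": (
        "Extract all products with their names, brands, prices, "
        "ratings, and stock status"
    ),
}

# json.dumps escapes the newlines as \n, producing a valid JSON string.
payload = json.dumps(body)
```

The same applies to website_html: double quotes inside attributes such as class="product" are escaped by the serializer rather than by hand.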

Request with Cookies and Infinite Scroll

Extract data from authenticated pages with infinite scroll:
curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/dashboard",
  "user_prompt": "Extract all dashboard items and their details",
  "cookies": {
    "session_id": "abc123def456",
    "auth_token": "xyz789"
  },
  "number_of_scrolls": 5,
  "output_schema": {
    "properties": {
      "items": {
        "type": "array",
        "items": {
          "properties": {
            "title": {"type": "string"},
            "description": {"type": "string"}
          }
        }
      }
    }
  }
}'

Request with Browser Interaction Steps

Perform browser interactions before extraction:
curl -X POST 'https://api.scrapegraphai.com/v1/smartscraper' \
-H 'SGAI-APIKEY: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{
  "website_url": "https://example.com/products",
  "user_prompt": "Extract all visible product information after applying filters",
  "steps": [
    "click on filter button",
    "select category electronics",
    "click apply filters",
    "wait for results to load"
  ],
  "output_schema": {
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "properties": {
            "name": {"type": "string"},
            "price": {"type": "string"},
            "category": {"type": "string"}
          }
        }
      }
    }
  }
}'

Example Response

{
  "request_id": "<request-id>",
  "status": "completed",
  "website_url": "https://scrapegraphai.com/",
  "user_prompt": "Extract info about the company",
  "result": {
    "company_name": "ScrapeGraphAI",
    "description": "ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents...",
    "features": [
      "Effortless, cost-effective, and AI-powered data extraction",
      "Handles proxy rotation and rate limits",
      "Supports a wide variety of websites"
    ],
    "contact_email": "contact@scrapegraphai.com",
    "social_links": {
      "github": "https://github.com/ScrapeGraphAI/Scrapegraph-ai",
      "linkedin": "https://www.linkedin.com/company/101881123",
      "twitter": "https://x.com/scrapegraphai"
    },
    "..."
  },
  "error": ""
}
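
A client typically branches on the status field before reading result. The sketch below shows one way to do that for the four documented status values. `extract_result` is a hypothetical helper, and how a queued or processing request is later retrieved is not covered on this page, so the sketch only reports that the result is not ready.

```python
from typing import Optional


def extract_result(response: dict) -> Optional[dict]:
    """Return the result of a completed request, or None if still pending.

    Raises RuntimeError when the request failed.
    """
    status = response.get("status")
    if status == "completed":
        return response.get("result")
    if status == "failed":
        raise RuntimeError(response.get("error") or "extraction failed")
    if status in ("queued", "processing"):
        return None  # not finished yet
    raise ValueError(f"unexpected status: {status!r}")
```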

Authorizations

SGAI-APIKEY
string
header
required

Body

application/json

Exactly one of website_url, website_html, or website_markdown must be provided

user_prompt
string
required
Example:

"Extract info about the company"

website_url
string
Example:

"https://scrapegraphai.com/"

website_html
string

HTML content, maximum size 2MB

Example:

"<html><body><h1>Title</h1><p>Content</p></body></html>"

headers
object

Optional headers to send with the request, including cookies and user agent

Example:
{
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
  "Cookie": "cookie1=value1; cookie2=value2"
}
output_schema
object | null
stealth
boolean
default:false

Enables stealth mode, which uses anti-detection techniques to bypass bot protection. Adds 4 credits to the request cost.

Response

Successful Response

request_id
string
required
status
enum<string>
required
Available options: queued, processing, completed, failed
website_url
string
required
user_prompt
string
required
result
object | null
error
string
default:""
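
The response fields above map naturally onto a typed structure. The following Python sketch models them with a dataclass and an enum. The class names are hypothetical and the validation is client-side only; it mirrors the required fields and defaults listed in this section.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"


REQUIRED_FIELDS = ("request_id", "status", "website_url", "user_prompt")


@dataclass
class SmartScraperResponse:
    request_id: str
    status: Status
    website_url: str
    user_prompt: str
    result: Optional[dict] = None  # object | null
    error: str = ""                # defaults to an empty string

    @classmethod
    def from_dict(cls, data: dict) -> "SmartScraperResponse":
        """Parse a decoded JSON response, checking the required fields."""
        missing = [key for key in REQUIRED_FIELDS if key not in data]
        if missing:
            raise KeyError(f"missing required fields: {missing}")
        return cls(
            request_id=data["request_id"],
            status=Status(data["status"]),  # ValueError on an unknown status
            website_url=data["website_url"],
            user_prompt=data["user_prompt"],
            result=data.get("result"),
            error=data.get("error", ""),
        )
```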