Scrape, batch, or crawl

curl --request POST \
  --url https://api.reader.dev/v1/read \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "url": "https://example.com",
  "urls": [
    "<string>"
  ],
  "formats": [
    "markdown"
  ],
  "onlyMainContent": true,
  "includeTags": [
    "<string>"
  ],
  "excludeTags": [
    "<string>"
  ],
  "waitForSelector": "<string>",
  "timeoutMs": 60500,
  "proxyMode": "auto",
  "batchConcurrency": 10,
  "maxDepth": 5,
  "maxPages": 5000,
  "cache": true
}
'

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)",
    "metadata": {
      "title": "Example Domain",
      "description": null,
      "statusCode": 200,
      "duration": 487,
      "cached": false,
      "proxyMode": "standard",
      "proxyEscalated": false,
      "scrapedAt": "2026-04-04T12:00:00Z"
    }
  }
}

POST

read

curl --request POST \
  --url https://api.reader.dev/v1/read \
  --header 'Content-Type: application/json' \
  --header 'x-api-key: <api-key>' \
  --data '
{
  "url": "https://example.com",
  "urls": [
    "<string>"
  ],
  "formats": [
    "markdown"
  ],
  "onlyMainContent": true,
  "includeTags": [
    "<string>"
  ],
  "excludeTags": [
    "<string>"
  ],
  "waitForSelector": "<string>",
  "timeoutMs": 60500,
  "proxyMode": "auto",
  "batchConcurrency": 10,
  "maxDepth": 5,
  "maxPages": 5000,
  "cache": true
}
'

{
  "success": true,
  "data": {
    "url": "https://example.com",
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)",
    "metadata": {
      "title": "Example Domain",
      "description": null,
      "statusCode": 200,
      "duration": 487,
      "cached": false,
      "proxyMode": "standard",
      "proxyEscalated": false,
      "scrapedAt": "2026-04-04T12:00:00Z"
    }
  }
}

POST /v1/read is the unified content-extraction endpoint. It auto-detects the operation from the body:

Single url → synchronous scrape, returned immediately in the response
Multiple urls → async batch job
url + maxDepth or maxPages → async crawl job

See The read primitive for a narrative overview.

Proxy mode

Set proxyMode to "standard" (1 credit, fast), "stealth" (3 credits, bypasses bot walls), or "auto" (default, which tries standard first and escalates to stealth on block). The response metadata tells you which mode actually ran. See Proxy modes.

Idempotency

Pass an x-idempotency-key header to deduplicate retried POSTs. Reader caches the original response for 24 hours and returns it verbatim on any subsequent request with the same key.

curl -X POST https://api.reader.dev/v1/read \
  -H "x-api-key: $READER_KEY" \
  -H "x-idempotency-key: batch-2026-04-04-run-1" \
  -H "Content-Type: application/json" \
  -d '{ "urls": ["https://example.com/a", "https://example.com/b"] }'

Authorizations

x-api-key

string

header

required

Body

application/json

url

string<uri>

Example:

"https://example.com"

urls

string<uri>[]

Required array length: 1 - 1000 elements

formats

enum<string>[]

Content formats to include in the response.

Available options:

markdown,

html

Example:

["markdown"]

onlyMainContent

boolean

Strip navigation, footers, and boilerplate. Default: true.

includeTags

string[]

excludeTags

string[]

waitForSelector

string

timeoutMs

integer

Required range: 1000 <= x <= 120000

proxyMode

enum<string>

Proxy mode for the scrape. standard is fast and affordable; stealth bypasses aggressive bot walls; auto picks automatically and only escalates to stealth when a block is detected. Default: auto.

Available options:

standard,

stealth,

auto

Example:

"auto"

batchConcurrency

integer

Required range: 1 <= x <= 20

maxDepth

integer

Crawl depth (when crawling). Omit for single-URL scrape.

Required range: 1 <= x <= 10

maxPages

integer

Maximum pages to discover during crawl.

Required range: 1 <= x <= 10000

cache

boolean

Reuse cached content within TTL. Default: true.

webhook

object

Show child attributes

Response

Sync scrape completed

success

enum<boolean>

required

Available options:

true

data

object

required

Show child attributes

Authentication Get a job

​Proxy mode

​Idempotency

Authorizations

Body

Response

Proxy mode

Idempotency