- Scrape: you already have the URLs. Reader fetches them and returns content.
- Crawl: you have one starting URL and want Reader to discover and fetch the rest by following links.
Both modes use the same endpoint (`POST /v1/read`); you switch between them by changing the request body. The page-level output shape is identical.
## Scrape

Pass `url` for a single page, `urls` for a list. A single URL runs synchronously and returns the result in the response body. A list creates an async job. Use scrape when:
- You have a sitemap, RSS feed, CSV, or search-result list of URLs
- You’re re-fetching pages that changed
- You’re ingesting a known set of product or article URLs for an LLM pipeline
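The two body shapes above can be sketched as a small helper. The field names `url` and `urls` come from the docs; the helper function itself is hypothetical, not part of the Reader API:

```python
import json

def scrape_body(target):
    """Build a POST /v1/read body for a scrape."""
    if isinstance(target, str):
        # Single URL: runs synchronously, result in the response body.
        return {"url": target}
    # List of URLs: creates an async batch job.
    return {"urls": list(target)}

print(json.dumps(scrape_body("https://example.com/pricing")))
# {"url": "https://example.com/pricing"}
```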
## Crawl

Pass `url` together with `maxDepth` or `maxPages`. Reader starts at the URL, extracts links from each page, and follows them up to the limits you set. Use crawl when:
- You want to index an entire docs site, knowledge base, or blog
- You don’t have a list of URLs and don’t want to build one
- You need Reader to discover pages you haven’t seen yet
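A crawl body combines the start URL with the depth and page limits. `url`, `maxDepth`, and `maxPages` are the documented fields; the helper and its client-side clamping to the documented maxima are assumptions:

```python
def crawl_body(start_url, max_depth=2, max_pages=20):
    # Defaults mirror the limits table; clamping here is an assumption,
    # the API may instead reject out-of-range values.
    return {
        "url": start_url,
        "maxDepth": min(max_depth, 10),      # documented max: 10
        "maxPages": min(max_pages, 10_000),  # documented max: 10000
    }

print(crawl_body("https://docs.example.com", max_pages=500))
```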
## Limits

| Knob | Default | Max |
|---|---|---|
| `maxDepth` | 2 | 10 |
| `maxPages` | 20 | 10000 |
Crawls are scoped to the starting host (a crawl that starts on docs.stripe.com will not follow links to dashboard.stripe.com).
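The scoping rule amounts to an exact hostname match. This check is a sketch of the behavior, not Reader's actual implementation:

```python
from urllib.parse import urlsplit

def in_crawl_scope(start_url, link):
    # Hosts must match exactly; a sibling subdomain is out of scope.
    return urlsplit(start_url).hostname == urlsplit(link).hostname

print(in_crawl_scope("https://docs.stripe.com/api", "https://docs.stripe.com/webhooks"))   # True
print(in_crawl_scope("https://docs.stripe.com/api", "https://dashboard.stripe.com/keys"))  # False
```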
## When to batch-scrape instead of crawl

If you can generate the URL list yourself (from a sitemap, RSS feed, or your own database), prefer batch scrape over crawl. You pay only for pages you actually want, you control the order, and you don’t spend budget on pages the crawler happens to find but you don’t care about. A rule of thumb:

- Got URLs? Batch scrape.
- Don’t have URLs and the site doesn’t publish a sitemap? Crawl.
- The site has a sitemap.xml? Fetch the sitemap, batch-scrape the URLs you want.
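The sitemap route can be sketched with the standard library alone. The helper is hypothetical; the XML namespace below is the standard sitemap one, and `keep` is a filter so you only pay for the URLs you want:

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_to_batch(sitemap_xml, keep=lambda u: True):
    """Turn a sitemap.xml string into a batch-scrape body."""
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text for loc in root.iter(NS + "loc") if keep(loc.text)]
    return {"urls": urls}

sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/a</loc></url>
  <url><loc>https://example.com/careers</loc></url>
</urlset>"""
print(sitemap_to_batch(sitemap, keep=lambda u: "/blog/" in u))
# {'urls': ['https://example.com/blog/a']}
```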
## Same output shape

A crawl result and a batch result are the same thing as far as your code is concerned: a job with a `results` array where each entry is one page. You can write one handler and point it at either.
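That single handler can be a few lines. The `results` array is documented; the per-page field names (`url`, `content`) are assumptions here, so substitute whatever your responses actually carry:

```python
def handle_job(job):
    # Works identically whether `job` came from a crawl or a batch scrape:
    # each entry in `results` is one page.
    return [(page.get("url"), page.get("content"))
            for page in job.get("results", [])]

job = {"results": [{"url": "https://example.com/a", "content": "# A"}]}
print(handle_job(job))  # [('https://example.com/a', '# A')]
```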
## Next
- Async jobs: how to watch a batch or crawl finish
- Choosing a proxy mode: when to force stealth

