Use `crawl()` when you don't know the exact URLs you want: you just know the entry point and want Reader to discover everything reachable from there.
## Discover links only
By default, `crawl()` returns `{ url, title, description }` for every discovered page. No content scraping, just link discovery. Fast and cheap.
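As a sketch of what that result looks like, the local stub below stands in for the real `crawl()` call and returns canned data; only the `{ url, title, description }` shape is taken from the text above, everything else is illustrative.

```typescript
// Result shape of a discovery-only crawl, per the docs above.
interface DiscoveredPage {
  url: string;
  title: string;
  description: string;
}

// Synchronous stub standing in for the real crawl() client call.
// A real crawl would fetch the seed and walk its links; here we fake it.
function discoverLinks(seed: string): DiscoveredPage[] {
  return [
    { url: `${seed}/docs`, title: "Docs", description: "Documentation index" },
    { url: `${seed}/blog`, title: "Blog", description: "Latest posts" },
  ];
}

const discovered = discoverLinks("https://example.com");
for (const page of discovered) console.log(page.url, "->", page.title);
```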
## Crawl and scrape in one call
Set `scrape: true` to scrape every discovered page.
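A minimal sketch of the difference the flag makes, using a local stub in place of the real client; the `content` field name and the options shape are assumptions, not the SDK's confirmed API.

```typescript
// Hypothetical options and result shapes for illustration only.
interface CrawlOptions {
  url: string;
  scrape?: boolean;
}
interface CrawledPage {
  url: string;
  title: string;
  content?: string; // present only when scrape: true
}

// Stub: pretend we discovered one page and, if asked, scraped it too.
function crawlSketch(opts: CrawlOptions): CrawledPage[] {
  const page: CrawledPage = { url: `${opts.url}/docs`, title: "Docs" };
  if (opts.scrape) page.content = "# Docs\n…scraped markdown…";
  return [page];
}

const [page] = crawlSketch({ url: "https://example.com", scrape: true });
console.log(page.content !== undefined); // true: content came back with the crawl
```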
## Scope the crawl with patterns
By default, crawling is same-domain. To scope it further, use `includePatterns` and `excludePatterns`.
A URL is crawled only if it matches at least one `includePatterns` entry (or `includePatterns` is empty) and matches no `excludePatterns` entry.
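That rule can be sketched as a small predicate. The glob-style `*` matching below is an assumption for illustration; the real pattern syntax may differ.

```typescript
// Convert a simple glob ("*" matches anything) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*/g, ".*");                 // then expand the glob wildcard
  return new RegExp(`^${escaped}$`);
}

// In scope: matches some include pattern (or include list is empty)
// AND matches no exclude pattern.
function inScope(url: string, includePatterns: string[], excludePatterns: string[]): boolean {
  const included =
    includePatterns.length === 0 ||
    includePatterns.some((p) => globToRegExp(p).test(url));
  const excluded = excludePatterns.some((p) => globToRegExp(p).test(url));
  return included && !excluded;
}

console.log(inScope("https://example.com/docs/api", ["*/docs/*"], []));       // true
console.log(inScope("https://example.com/blog/post", ["*/docs/*"], []));      // false
console.log(inScope("https://example.com/docs/old", ["*/docs/*"], ["*/old"])); // false: exclude wins
```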
## Depth vs. max pages
Both `depth` and `maxPages` bound the crawl. Whichever triggers first stops it.
| Setting | Effect |
|---|---|
| `depth: 1` | Only the seed URL's direct links |
| `depth: 2` | Seed + direct links + their direct links |
| `depth: 5` | Deep exploration |
| `maxPages: 20` | Hard stop after 20 pages regardless of depth |
A setting of `depth: 3, maxPages: 100` covers most content. For large sites, tune `maxPages` to your credit/time budget.
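How the two bounds interact can be sketched as a BFS over an in-memory link graph instead of real HTTP. One assumption here: the seed page counts toward `maxPages`; the real accounting may differ.

```typescript
type LinkGraph = Record<string, string[]>;

// Breadth-first walk bounded by depth and maxPages: whichever triggers
// first stops the crawl.
function bfsCrawl(graph: LinkGraph, seed: string, depth: number, maxPages: number): string[] {
  const visited = new Set<string>([seed]);
  const result: string[] = [seed];
  let frontier = [seed];
  for (let d = 0; d < depth && result.length < maxPages; d++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of graph[url] ?? []) {
        if (visited.has(link)) continue;
        visited.add(link);
        result.push(link);
        next.push(link);
        if (result.length >= maxPages) return result; // maxPages wins
      }
    }
    frontier = next; // descend one level
  }
  return result; // depth exhausted
}

const graph: LinkGraph = {
  "/": ["/a", "/b"],
  "/a": ["/a1", "/a2"],
  "/b": ["/b1"],
};
console.log(bfsCrawl(graph, "/", 1, 20)); // ["/", "/a", "/b"]: depth stops it
console.log(bfsCrawl(graph, "/", 5, 2));  // ["/", "/a"]: maxPages stops it
```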
## Rate limiting
Reader rate-limits crawls with `delayMs` (default 1000 ms).
Even with `scrape: true` and `scrapeConcurrency > 1`, the delay still applies: scrapes inside a crawl serialize behind the crawl delay.
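That serialization can be sketched as a pure schedule: assuming each request starts one `delayMs` after the previous one, the nth request cannot start before `n * delayMs`, no matter how many scrape workers are configured.

```typescript
// Start times (ms from crawl start) for pageCount serialized requests.
// Concurrency settings don't change this schedule inside a single crawl.
function requestStartTimes(pageCount: number, delayMs: number): number[] {
  return Array.from({ length: pageCount }, (_, i) => i * delayMs);
}

console.log(requestStartTimes(4, 1000)); // [0, 1000, 2000, 3000]
```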
## Sticky proxy per crawl
When you've configured proxy pools, Reader picks one proxy at the start of the crawl and uses it for every request. This is intentional: rotating IPs mid-crawl looks unnatural to anti-bot systems. If you want different crawl sessions on different proxies, just call `crawl()` multiple times; each call picks a fresh proxy.
## Resuming a crawl
Reader's crawl is stateless: if it fails partway through, you can't resume from where it left off. For long crawls, consider:

- Running multiple smaller crawls with different seed URLs
- Using `crawl()` for discovery only (fast), then `scrape()` with the resulting URL list (resumable by chunking)
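The second pattern can be sketched as follows. The `scrapePage` stub and the in-memory checkpoint are stand-ins for the real `scrape()` call and whatever persistence you use; only the chunk-and-checkpoint idea comes from the text above.

```typescript
// Split a URL list into fixed-size chunks.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Scrape in chunks, advancing a checkpoint after each one. If a chunk
// fails, rerunning with the same checkpoint resumes from where it stopped,
// losing at most one chunk of work.
function scrapeWithCheckpoints(
  urls: string[],
  chunkSize: number,
  scrapePage: (url: string) => string, // stand-in for the real scrape() call
  checkpoint: { done: number },        // stand-in for persisted progress
): string[] {
  const results: string[] = [];
  for (const batch of chunk(urls.slice(checkpoint.done), chunkSize)) {
    for (const url of batch) results.push(scrapePage(url));
    checkpoint.done += batch.length; // resume point survives a crash after this
  }
  return results;
}

const checkpoint = { done: 0 };
const urls = ["/a", "/b", "/c", "/d", "/e"]; // e.g. from a discovery-only crawl
const out = scrapeWithCheckpoints(urls, 2, (u) => `content of ${u}`, checkpoint);
console.log(out.length, checkpoint.done); // 5 5
```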
## Where to go next
- **Crawling concept**: how BFS discovery works under the hood.
- **CrawlOptions reference**: every option, every default.

