Signature

reader.crawl(options: CrawlOptions): Promise<CrawlResult>

const result = await reader.crawl({
  url: "https://docs.example.com",
  depth: 2,
  maxPages: 50,
});

for (const page of result.urls) {
  console.log(page.url, "-", page.title);
}

Crawl and scrape in one call

const result = await reader.crawl({
  url: "https://docs.example.com",
  depth: 2,
  maxPages: 50,
  scrape: true,
  scrapeConcurrency: 3,
});

console.log(`Discovered: ${result.urls.length}`);
console.log(`Scraped:    ${result.scraped?.batchMetadata.successfulUrls}`);

Parameter

options: CrawlOptions - see CrawlOptions for the full field list. The only required field is url: string. Everything else has defaults.
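As a sketch, here is a fully populated options object using only fields that appear on this page. The values are illustrative placeholders, not the library's actual defaults:

```typescript
// Hypothetical CrawlOptions — field names come from this page;
// the values are examples, not the real defaults.
const options = {
  url: "https://docs.example.com", // required: the crawl seed
  depth: 2,                        // max link depth from the seed
  maxPages: 50,                    // stop after this many pages
  delayMs: 250,                    // rate limit between requests
  includePatterns: ["/docs/"],     // only follow matching links
  excludePatterns: ["/blog/"],     // skip matching links
  scrape: true,                    // also scrape each discovered page
  scrapeConcurrency: 3,            // parallel scrape workers
};
```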

Return type

Promise<CrawlResult> - see CrawlResult for the full shape. Key fields:
{
  urls: Array<{ url: string; title: string; description: string | null }>;
  scraped?: ScrapeResult;   // only when scrape: true
  metadata: {
    totalUrls: number;
    maxDepth: number;
    totalDuration: number;
    seedUrl: string;
  };
}
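A sketch of consuming this shape. The interface below is a local copy of the key fields shown above (the optional `scraped` field is omitted), and `summarize` plus the sample data are illustrative, not part of the library:

```typescript
// Minimal local copy of the key CrawlResult fields shown above.
interface CrawlResult {
  urls: Array<{ url: string; title: string; description: string | null }>;
  metadata: {
    totalUrls: number;
    maxDepth: number;
    totalDuration: number;
    seedUrl: string;
  };
}

// Hypothetical helper: one-line report from a crawl result.
function summarize(result: CrawlResult): string {
  const { seedUrl, totalDuration } = result.metadata;
  return `${seedUrl}: ${result.urls.length} pages in ${totalDuration}ms`;
}

// Fabricated sample data, for illustration only.
const sample: CrawlResult = {
  urls: [
    { url: "https://docs.example.com/a", title: "A", description: null },
  ],
  metadata: {
    totalUrls: 1,
    maxDepth: 2,
    totalDuration: 1200,
    seedUrl: "https://docs.example.com",
  },
};
console.log(summarize(sample)); // "https://docs.example.com: 1 pages in 1200ms"
```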

BFS behavior

crawl() performs a breadth-first search starting from url:
  1. Fetch the seed page and extract its links
  2. Filter: same domain, not yet visited, matches includePatterns, doesn't match excludePatterns, not blocked by robots.txt
  3. Enqueue each matching link at depth + 1 if still within the depth and maxPages limits
  4. Rate limit with delayMs between requests
  5. Stop when the queue is empty or maxPages is reached

See Crawling for the full mental model.
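The steps above can be sketched against an in-memory link graph. This is a minimal model of the traversal only, with fetching, robots.txt, and pattern filters stubbed out; all names are illustrative, not the library's internals:

```typescript
type LinkGraph = Record<string, string[]>; // url -> outgoing links

// Breadth-first crawl over a pre-built link graph, mirroring the steps above.
function bfsCrawl(
  graph: LinkGraph,
  seed: string,
  maxDepth: number,
  maxPages: number,
): string[] {
  const visited = new Set<string>([seed]);
  const queue: Array<{ url: string; depth: number }> = [{ url: seed, depth: 0 }];
  const order: string[] = [];

  while (queue.length > 0 && order.length < maxPages) {
    const { url, depth } = queue.shift()!; // step 1: "fetch" the page
    order.push(url);
    if (depth >= maxDepth) continue;       // depth bound from step 3
    for (const link of graph[url] ?? []) {
      // step 2: filter — here only the "not visited" check; a real crawler
      // also applies domain, pattern, and robots.txt filters.
      if (visited.has(link)) continue;
      visited.add(link);
      queue.push({ url: link, depth: depth + 1 }); // step 3: enqueue at depth + 1
    }
    // step 4 (delayMs rate limiting) is omitted in this synchronous sketch
  }
  return order; // step 5: stopped on empty queue or maxPages
}

const graph: LinkGraph = { a: ["b", "c"], b: ["d"], c: [], d: [] };
console.log(bfsCrawl(graph, "a", 2, 50)); // visits a, b, c, d in BFS order
```

Note that pages are visited level by level: both of the seed's direct links come before any depth-2 page, which is why shallow pages survive a tight maxPages budget.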

Sticky proxy

When proxy pools are configured, crawl() picks one proxy at the start of the session and routes every request through it. A single consistent IP mimics real user browsing and reduces the chance of tripping anti-bot systems.
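A sketch of the sticky behavior: the proxy is chosen once when the session is created, not per request. The pool, session factory, and request helper below are hypothetical stand-ins, not the library's API:

```typescript
// Pick one proxy up front and reuse it for every request in the session.
function createStickySession(proxyPool: string[]) {
  const proxy = proxyPool[Math.floor(Math.random() * proxyPool.length)];
  return {
    proxy,
    // Every request in this session routes through the same proxy.
    fetchVia(url: string): string {
      return `GET ${url} via ${proxy}`;
    },
  };
}

const session = createStickySession(["http://p1:8080", "http://p2:8080"]);
console.log(session.fetchVia("https://docs.example.com/a"));
console.log(session.fetchVia("https://docs.example.com/b")); // same proxy
```

The alternative, rotating proxies per request, spreads load but produces a request pattern (one page per IP) that anti-bot systems flag easily.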

Where to go next

CrawlOptions

Every option with type and default.

CrawlResult

The full result type tree.