Scraping - Reader

Scraping in Reader means fetching a URL, rendering it (if needed), extracting the main content, and converting it to clean markdown. Under the hood, this is a three-step pipeline:

Fetch - Reader renders the page in a headless Chrome browser (Hero engine) with JavaScript execution, TLS fingerprinting, and proxy routing. See Scraping Engine.
Extract - Reader identifies the main content area (for example an `article` or `main` element, or the largest text block) and removes navigation, footers, sidebars, ads, and hidden elements.
Convert - the cleaned HTML is converted to markdown via supermarkdown (a Rust-backed converter optimized for LLM input).
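The following self-contained sketch illustrates the kind of heuristic the Extract step describes. It is not Reader's implementation. The `PageNode` tree, the boilerplate tag list, and `extractMainContent` are hypothetical simplifications, ads are not modeled, and "largest text block" is approximated as the element with the longest own text. The sketch works in three stages: strip boilerplate and hidden elements, prefer `article` or `main`, and otherwise fall back to the biggest block of text.

```typescript
// Hypothetical, simplified element tree. Reader works on a real rendered DOM.
interface PageNode {
  tag: string;
  text?: string; // this element's own text, excluding children
  hidden?: boolean; // e.g. display:none or aria-hidden
  children?: PageNode[];
}

// Illustrative list of tags treated as page chrome rather than content.
const BOILERPLATE_TAGS = new Set(["nav", "footer", "aside"]);

// Copy the tree, dropping boilerplate and hidden elements (with their subtrees).
function clean(node: PageNode): PageNode | null {
  if (node.hidden || BOILERPLATE_TAGS.has(node.tag)) return null;
  const children: PageNode[] = [];
  for (const child of node.children ?? []) {
    const kept = clean(child);
    if (kept) children.push(kept);
  }
  return { ...node, children };
}

// Collect every element in depth-first order.
function walk(node: PageNode, out: PageNode[] = []): PageNode[] {
  out.push(node);
  for (const child of node.children ?? []) walk(child, out);
  return out;
}

// Prefer <article>, then <main>; otherwise fall back to the element with the
// longest own text, a stand-in for "largest text block".
function extractMainContent(root: PageNode): PageNode | null {
  const cleaned = clean(root);
  if (!cleaned) return null;
  const all = walk(cleaned);
  for (const tag of ["article", "main"]) {
    const hit = all.find((n) => n.tag === tag);
    if (hit) return hit;
  }
  return all.reduce((best, n) =>
    (n.text ?? "").length > (best.text ?? "").length ? n : best,
  );
}

// Usage: the nav, footer, and hidden banner are stripped before selection.
const page: PageNode = {
  tag: "body",
  children: [
    { tag: "nav", text: "Home | About | Blog" },
    { tag: "div", text: "Short teaser" },
    { tag: "div", text: "The long body of the post goes here..." },
    { tag: "div", hidden: true, text: "A hidden cookie banner whose text is longer than everything else" },
    { tag: "footer", text: "Copyright notice" },
  ],
};

console.log(extractMainContent(page)?.text);
// The long body of the post goes here...
```

Without the cleaning pass, the hidden banner would win the "largest text block" fallback. Running it first shows why stripping boilerplate must come before choosing the main content.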

The `scrape()` primitive

scrape() takes one or more URLs and returns a ScrapeResult:

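```typescript
// Assumes `reader` is an already-initialized Reader client.
const result = await reader.scrape({
  urls: ["https://example.com"],
  formats: ["markdown"],
});

// One entry in result.data per requested URL.
console.log(result.data[0].markdown);
```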

For a single URL, the call processes that page directly and returns its result. For multiple URLs, set batchConcurrency to process them in parallel; see Batch Scraping.
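A concurrency setting such as `batchConcurrency` typically caps how many URLs are in flight at the same time. The sketch below illustrates that idea with a generic worker pool. `mapWithConcurrency` and `fakeScrape` are hypothetical helpers written for this example, not part of Reader's API, and Reader's own batching may work differently.

```typescript
// Hypothetical helper: run `task` over `items` with at most `limit` tasks in
// flight at once. Results are returned in input order, not completion order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item no worker has claimed yet

  // Each worker claims the next unclaimed item until none remain. JavaScript
  // is single-threaded, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  const workers: Promise<void>[] = [];
  for (let w = 0; w < workerCount; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}

// Usage with a hypothetical stand-in for scraping one page.
let inFlight = 0;
let maxInFlight = 0;

async function fakeScrape(url: string): Promise<string> {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return `# ${url}`;
}

const urls = [
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
  "https://example.com/d",
  "https://example.com/e",
];

mapWithConcurrency(urls, 2, fakeScrape).then((pages) => {
  console.log(pages.length, maxInFlight); // 5 2
});
```

In this sketch, results keep input order, so `pages[i]` always corresponds to `urls[i]` regardless of which page finishes first.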

Output formats

Reader supports two output formats:

markdown - cleaned, LLM-optimized markdown (default)
html - the cleaned HTML that markdown was generated from

You can request both in a single call:

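```typescript
// Assumes `reader` is an already-initialized Reader client.
const result = await reader.scrape({
  urls: ["https://example.com"],
  formats: ["markdown", "html"],
});

// Both requested formats are available on the same result entry.
console.log(result.data[0].markdown);
console.log(result.data[0].html);
```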

Every result also includes page metadata, whichever formats you request: the title, description, canonical URL, favicon, Open Graph tags, and Twitter Card tags.
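As a rough illustration of where these fields usually come from, the sketch below pulls `<title>`, the description, the canonical and icon links, and `og:`/`twitter:` meta tags out of raw HTML. `extractMetadata` and `PageMetadata` are hypothetical and do not reflect Reader's extractor or its actual output shape. The sketch also uses regular expressions for brevity. A real implementation should use an HTML parser, because these patterns break on edge cases such as a `>` inside an attribute value.

```typescript
// Hypothetical metadata shape for this sketch only.
interface PageMetadata {
  title?: string;
  description?: string;
  canonical?: string;
  favicon?: string;
  openGraph: Record<string, string>; // og:title -> openGraph.title
  twitter: Record<string, string>; // twitter:card -> twitter.card
}

// Parse name="value" / name='value' attribute pairs from one tag's source.
function parseAttributes(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const re = /([\w:-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2] ?? m[3];
  }
  return attrs;
}

function extractMetadata(html: string): PageMetadata {
  const meta: PageMetadata = { openGraph: {}, twitter: {} };

  const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  if (title) meta.title = title[1].trim();

  // <meta name="..."> or <meta property="..."> with a content attribute.
  const metaRe = /<meta\b[^>]*>/gi;
  let m: RegExpExecArray | null;
  while ((m = metaRe.exec(html)) !== null) {
    const a = parseAttributes(m[0]);
    const key = (a.property ?? a.name ?? "").toLowerCase();
    if (!key || a.content === undefined) continue;
    if (key === "description") meta.description = a.content;
    else if (key.startsWith("og:")) meta.openGraph[key.slice(3)] = a.content;
    else if (key.startsWith("twitter:")) meta.twitter[key.slice(8)] = a.content;
  }

  // <link rel="canonical"> and <link rel="icon"> (also "shortcut icon").
  const linkRe = /<link\b[^>]*>/gi;
  while ((m = linkRe.exec(html)) !== null) {
    const a = parseAttributes(m[0]);
    const rel = (a.rel ?? "").toLowerCase().split(/\s+/);
    if (rel.includes("canonical")) meta.canonical = a.href;
    else if (rel.includes("icon")) meta.favicon = a.href;
  }
  return meta;
}

// Usage
const html = `<html><head>
  <title>Example Domain</title>
  <meta name="description" content="An example page.">
  <meta property="og:title" content="Example">
  <meta name="twitter:card" content="summary">
  <link rel="canonical" href="https://example.com/">
  <link rel="icon" href="/favicon.ico">
</head><body></body></html>`;

const meta = extractMetadata(html);
console.log(meta.title, meta.openGraph.title, meta.twitter.card);
// Example Domain Example summary
```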

What scraping is not

Crawling - scrape() only visits URLs you explicitly pass. It does not discover links. For that, use crawl().
Screenshots or PDFs - Reader is text-first. It does not produce screenshots or PDF renderings of pages.
A JavaScript execution environment - although Hero executes page JavaScript, Reader doesn't expose the page object or let you run arbitrary browser scripts. It's a scraping tool, not a general-purpose browser automation framework.

Where to go next

Content Extraction - How Reader decides what to keep and what to strip.
Scraping Engine - How the Hero engine and proxy escalation work.
Basic Scraping guide - Practical recipes and patterns.
ScrapeOptions reference - Every option, every default.
