- Fetch - Reader renders the page in a headless Chrome browser (Hero engine) with JavaScript execution, TLS fingerprinting, and proxy routing. See Scraping Engine.
- Extract - Reader identifies the main content area (article, main, or the largest text block) and removes navigation, footers, sidebars, ads, and hidden elements.
- Convert - the cleaned HTML is converted to markdown via supermarkdown (a Rust-backed converter optimized for LLM input).
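The Extract step can be pictured roughly as follows. This is a minimal sketch, not Reader's implementation. It works on a simplified, hypothetical ElementNode tree instead of a parsed DOM. It treats the following as noise:

- nav, footer, aside, script, and style elements
- hidden nodes
- ad-like class names

It then prefers an article element, then main, then the node whose direct paragraph children hold the most text. The tag list, the ad-class pattern, and the paragraph-based scoring are all assumptions made for illustration.

```typescript
// Simplified, hypothetical element tree; a real extractor would walk a parsed DOM.
interface ElementNode {
  tag: string;
  text?: string; // text directly inside this element
  classes?: string[];
  hidden?: boolean; // stands in for display:none, aria-hidden, the hidden attribute, etc.
  children?: ElementNode[];
}

// Assumed noise rules: boilerplate tags, hidden nodes, and ad-like class names.
const NOISE_TAGS = new Set(["nav", "footer", "aside", "script", "style"]);
const AD_CLASS = /(^|-)(ad|ads|advert|sponsor)(-|$)/;

function isNoise(node: ElementNode): boolean {
  if (node.hidden || NOISE_TAGS.has(node.tag)) return true;
  return (node.classes ?? []).some((c) => AD_CLASS.test(c));
}

// Returns a copy of the tree with every noisy subtree dropped.
function clean(node: ElementNode): ElementNode {
  const children = (node.children ?? []).filter((c) => !isNoise(c)).map(clean);
  return { ...node, children };
}

// Total text length of a subtree.
function textLength(node: ElementNode): number {
  return (node.children ?? []).reduce((n, c) => n + textLength(c), (node.text ?? "").length);
}

// Depth-first search for the first element with the given tag.
function findFirst(node: ElementNode, tag: string): ElementNode | undefined {
  if (node.tag === tag) return node;
  for (const child of node.children ?? []) {
    const hit = findFirst(child, tag);
    if (hit) return hit;
  }
  return undefined;
}

// Scores a node by the text held in its direct <p> children.
function paragraphScore(node: ElementNode): number {
  return (node.children ?? [])
    .filter((c) => c.tag === "p")
    .reduce((n, c) => n + textLength(c), 0);
}

// Highest-scoring node anywhere in the tree (ties favor the node found first).
function largestTextBlock(node: ElementNode): ElementNode {
  let best = node;
  for (const child of node.children ?? []) {
    const candidate = largestTextBlock(child);
    if (paragraphScore(candidate) > paragraphScore(best)) best = candidate;
  }
  return best;
}

// Prefers <article>, then <main>, then the largest text block.
function extractMainContent(root: ElementNode): ElementNode {
  const cleaned = clean(root);
  return findFirst(cleaned, "article") ?? findFirst(cleaned, "main") ?? largestTextBlock(cleaned);
}

const page: ElementNode = {
  tag: "body",
  children: [
    { tag: "nav", text: "Home | About | Blog" },
    {
      tag: "div",
      classes: ["content"],
      children: [
        { tag: "p", text: "First paragraph of the story." },
        { tag: "p", text: "Second paragraph." },
        { tag: "div", classes: ["ad-banner"], text: "Buy now!" },
      ],
    },
    { tag: "footer", text: "(c) Example Inc." },
  ],
};

// Picks div.content; the nav, footer, and ad banner have been stripped.
const main = extractMainContent(page);
console.log(main.classes, main.children?.length);
```

The sketch scores nodes by their direct paragraph children rather than by total subtree text. Total subtree text would always favor the page root, which trivially contains the most text.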
The scrape() primitive
scrape() takes one or more URLs and returns a ScrapeResult. When you pass several URLs, set batchConcurrency to process them in parallel; see Batch Scraping.
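Batch scraping with a concurrency cap follows a familiar worker-pool pattern. Reader's own scheduler isn't described here, so the following is only a generic sketch of what a limit like batchConcurrency means: at most limit tasks are in flight at once, and results come back in input order. The mapWithConcurrency helper and the fakeScrape stand-in are hypothetical names, not part of Reader's API.

```typescript
// Runs `task` over every item with at most `limit` calls in flight,
// returning results in input order (a generic worker-pool pattern).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each worker claims the next index, awaits its task, and repeats.
  // JavaScript runs this on one thread, so `next++` needs no locking.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Hypothetical stand-in for scraping one page; not part of Reader's API.
async function fakeScrape(url: string): Promise<{ url: string; markdown: string }> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { url, markdown: `# ${url}` };
}

const urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"];
mapWithConcurrency(urls, 2, fakeScrape).then((pages) => {
  for (const p of pages) console.log(p.url);
});
```

In this sketch a single failing task rejects the whole batch. A batch scraper would more typically record per-URL errors and keep going.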
Output formats
Reader supports two output formats:
- markdown - cleaned, LLM-optimized markdown (default)
- html - the cleaned HTML that markdown was generated from
Reader also returns metadata regardless of format, with the title, description, canonical URL, favicon, Open Graph tags, and Twitter Card tags.
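These metadata fields all live in a page's head element. As a rough, hypothetical sketch of where each one comes from, the extractor below reads three kinds of markup:

- the title element
- meta tags keyed by name or property (description, og:*, twitter:*)
- link tags with rel="canonical" or rel="icon"

It is regex-based for brevity and will miss edge cases, such as attribute values that contain ">". A production extractor should work from a parsed document. The PageMetadata shape is an assumption, not Reader's ScrapeResult type.

```typescript
// Hypothetical metadata shape for illustration; not Reader's ScrapeResult type.
interface PageMetadata {
  title?: string;
  description?: string;
  canonicalUrl?: string;
  favicon?: string;
  openGraph: Record<string, string>; // og:title -> openGraph["title"]
  twitter: Record<string, string>; // twitter:card -> twitter["card"]
}

// Collects name="value" / name='value' pairs from one tag (names lowercased).
function parseAttributes(tag: string): Record<string, string | undefined> {
  const attrs: Record<string, string | undefined> = {};
  const re = /([a-zA-Z_:][-a-zA-Z0-9_:.]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2] ?? m[3];
  }
  return attrs;
}

// Regex-based sketch; a production extractor should use an HTML parser.
function extractMetadata(headHtml: string): PageMetadata {
  const meta: PageMetadata = { openGraph: {}, twitter: {} };

  const title = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(headHtml);
  if (title) meta.title = title[1].trim();

  const tagRe = /<(meta|link)\b[^>]*>/gi;
  let t: RegExpExecArray | null;
  while ((t = tagRe.exec(headHtml)) !== null) {
    const attrs = parseAttributes(t[0]);
    if (t[1].toLowerCase() === "meta") {
      // Open Graph tags conventionally use `property`; description and
      // Twitter Card tags usually use `name`.
      const key = (attrs["property"] ?? attrs["name"] ?? "").toLowerCase();
      const content = attrs["content"];
      if (content === undefined) continue;
      if (key === "description") meta.description = content;
      else if (key.startsWith("og:")) meta.openGraph[key.slice(3)] = content;
      else if (key.startsWith("twitter:")) meta.twitter[key.slice(8)] = content;
    } else {
      // rel can hold several tokens, e.g. rel="shortcut icon".
      const rel = (attrs["rel"] ?? "").toLowerCase().split(/\s+/);
      if (rel.includes("canonical")) meta.canonicalUrl = attrs["href"];
      else if (rel.includes("icon") && meta.favicon === undefined) meta.favicon = attrs["href"];
    }
  }
  return meta;
}

const head = `
  <title>Example Domain</title>
  <meta name="description" content="An example page.">
  <meta property="og:title" content="Example OG Title">
  <meta name="twitter:card" content="summary">
  <link rel="canonical" href="https://example.com/">
  <link rel="icon" href="/favicon.ico">`;

console.log(extractMetadata(head));
```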
What scraping is not
- Crawling - scrape() only visits URLs you explicitly pass. It does not discover links. For that, use crawl().
- Screenshots or PDFs - Reader is text-first. It does not produce rendered images or PDF exports of pages.
- A JavaScript execution environment - while hero runs JavaScript, Reader doesn’t expose the page object or let you run arbitrary browser scripts. It’s a scraping tool, not a general-purpose browser automation framework.
Where to go next
Content Extraction
How Reader decides what to keep and what to strip.
Scraping Engine
How the Hero engine and proxy escalation work.
Basic Scraping guide
Practical recipes and patterns.
ScrapeOptions reference
Every option, every default.

