How It Works
When you call scrape(), Reader:
- Loads the page in a real browser (Ulixee Hero)
- Handles challenges like JavaScript execution and anti-bot protection
- Waits for content to ensure dynamic elements are loaded
- Extracts main content by removing navigation, ads, and other noise
- Converts to markdown using supermarkdown
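The last two steps above can be pictured with a small, self-contained sketch. It is not Reader's implementation: it skips the browser entirely, uses crude regular expressions where a real HTML parser would be used, and the names `extractMainContent` and `toMarkdown` are hypothetical. It only illustrates the idea of dropping noise elements and converting what remains to markdown.

```typescript
// Hypothetical helpers for illustration only; not part of Reader's API.

// Tags whose entire contents are treated as noise in this sketch.
const NOISE_TAGS = ["script", "style", "nav", "header", "footer", "aside"];

/** Remove noise elements (and everything inside them) from an HTML string. */
function extractMainContent(html: string): string {
  let cleaned = html;
  for (const tag of NOISE_TAGS) {
    // Non-greedy match from the opening tag to the first matching close tag.
    const pattern = new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi");
    cleaned = cleaned.replace(pattern, "");
  }
  return cleaned.trim();
}

/** Convert a tiny subset of HTML (h1-h3 and p) to markdown. */
function toMarkdown(html: string): string {
  return html
    .replace(
      /<h([1-3])\b[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_match: string, level: string, text: string) =>
        `${"#".repeat(Number(level))} ${text.trim()}\n\n`
    )
    .replace(
      /<p\b[^>]*>([\s\S]*?)<\/p>/gi,
      (_match: string, text: string) => `${text.trim()}\n\n`
    )
    .replace(/<[^>]+>/g, "") // drop any remaining tags
    .replace(/^[ \t]+/gm, "") // strip indentation left over from the HTML
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}

const page = `
<header><a href="/">Home</a></header>
<nav><a href="/docs">Docs</a></nav>
<main>
  <h1>Hello</h1>
  <p>Main content.</p>
</main>
<footer>Copyright</footer>`;

const markdown = toMarkdown(extractMainContent(page));
console.log(markdown); // prints "# Hello", a blank line, then "Main content."
```

Regular expressions are enough for a toy input like this, but they break on nested or malformed markup, which is one reason a real pipeline works on the rendered DOM instead.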
Basic Usage
Output Formats
Reader supports two output formats:

| Format | Description |
|---|---|
| markdown | Clean markdown, optimized for LLMs |
| html | Cleaned HTML with main content only |
Scrape Result Structure
Website Metadata
Reader extracts rich metadata from each page:
Timeouts
Control how long Reader waits for pages:
Waiting for Selectors
Wait for specific elements before extracting content:
Custom User Agent
Custom Headers
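A minimal sketch of how a custom user agent and extra headers might be combined into one set of request headers. `RequestOptions`, `buildHeaders`, and the default user agent string are hypothetical and do not reflect Reader's actual option names.

```typescript
// Hypothetical types and names for illustration; not Reader's API.
interface RequestOptions {
  userAgent?: string;
  headers?: Record<string, string>;
}

const DEFAULT_USER_AGENT = "ExampleBot/1.0"; // placeholder value

/**
 * Merge the options into one header map.
 * HTTP header names are case-insensitive, so keys are lowercased
 * to avoid sending the same header twice under different casings.
 */
function buildHeaders(options: RequestOptions = {}): Record<string, string> {
  const merged: Record<string, string> = {
    "user-agent": options.userAgent ?? DEFAULT_USER_AGENT,
  };
  for (const [name, value] of Object.entries(options.headers ?? {})) {
    merged[name.toLowerCase()] = value; // explicit headers win over defaults
  }
  return merged;
}

const requestHeaders = buildHeaders({
  userAgent: "MyCrawler/2.0",
  headers: { "Accept-Language": "en-US", "X-Request-Source": "docs-example" },
});
console.log(requestHeaders);
```

In this sketch an explicit `User-Agent` entry in `headers` overrides the `userAgent` option; that precedence is a design choice of the example, not documented Reader behavior.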
Next Steps
- Batch Scraping: Scrape multiple URLs concurrently
- Content Extraction: Control what content is extracted

