Why a single engine?
A browser engine handles everything a simpler HTTP client can, plus everything it can’t. Sites that serve static HTML work fine in a browser. Sites that require JavaScript, handle Cloudflare challenges, or check TLS fingerprints also work, because it’s a real browser. The tradeoff is speed: a plain HTTP fetch completes in ~100ms, while a browser page load takes 1-5 seconds. In practice, the reliability gain far outweighs the latency cost: failed scrapes that require retries are slower than a single browser-based scrape that succeeds on the first try.

How a scrape runs
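A minimal sketch of the flow, assuming a hypothetical `BrowserPool` (all names here are illustrative; the real pool drives Chrome via Hero): Chrome launches once and stays warm, while each attempt opens and closes its own tab.

```typescript
// Hypothetical sketch of the warm-pool model: Chrome launches once and
// stays running; every scrape attempt gets a fresh, disposable tab.
type Tab = { navigate(url: string): string; close(): void };
type Browser = { openTab(): Tab };

let launches = 0; // counts cold starts, to show the pool stays warm

function launchChrome(): Browser {
  launches++;
  return {
    openTab: () => ({
      navigate: (url: string) => `<html>fetched ${url}</html>`,
      close: () => {},
    }),
  };
}

class BrowserPool {
  private browser?: Browser;

  scrape(url: string): string {
    this.browser ??= launchChrome(); // launch only on the first request
    const tab = this.browser.openTab(); // fresh tab per attempt
    try {
      return tab.navigate(url);
    } finally {
      tab.close(); // the tab is disposable; Chrome stays up
    }
  }
}
```

Keeping the process warm amortizes Chrome's multi-second startup cost across requests, which is what keeps per-scrape latency in the 1-5 second range rather than higher.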
Each scrape attempt opens a fresh tab in a warm Chrome process (the browser pool keeps Chrome running between requests).

Proxy escalation
Reader uses a two-step retry strategy per URL:

- Datacenter proxies are fast and cheap. They work for most sites.
- Residential proxies use real household IPs. They bypass anti-bot systems that block datacenter IP ranges.
Two timeouts bound the escalation: `hardDeadlineMs` (total cap, default 30s) and `datacenterTimeoutMs` (first attempt, default 10s).
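The escalation logic can be sketched like this (a hypothetical `fetchVia` callback stands in for the actual scrape call; the timeout names mirror the config knobs above):

```typescript
// Hypothetical sketch of the two-step proxy escalation: try the cheap
// datacenter tier first, then spend the remaining budget on residential.
type ProxyTier = "datacenter" | "residential";
type FetchVia = (url: string, tier: ProxyTier, timeoutMs: number) => Promise<string>;

async function scrapeWithEscalation(
  url: string,
  fetchVia: FetchVia,
  hardDeadlineMs = 30_000,      // total cap across both attempts
  datacenterTimeoutMs = 10_000, // budget for the cheap first attempt
): Promise<string> {
  const start = Date.now();
  try {
    // Step 1: fast, cheap datacenter proxy.
    return await fetchVia(url, "datacenter", datacenterTimeoutMs);
  } catch {
    // Step 2: escalate to a residential IP with whatever time remains.
    const remainingMs = hardDeadlineMs - (Date.now() - start);
    if (remainingMs <= 0) throw new Error("hard deadline exceeded");
    return fetchVia(url, "residential", remainingMs);
  }
}
```

Capping the first attempt at `datacenterTimeoutMs` guarantees the residential fallback always has time left under the hard deadline.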
Quality check
After Hero returns HTML, the orchestrator runs a minimal quality check:

- HTTP 2xx/3xx with any text content → pass
- HTTP 2xx with empty body → fail (`empty_content`)
- HTTP 4xx/5xx with empty body → fail (`http_error`)
Where to go next
- Proxy Tiers: how datacenter and residential proxies are managed.
- Error Handling: what happens when scraping fails.

