
The Problem

Building agents that need web access is frustrating. You piece together Puppeteer, add stealth plugins, manage proxies, and it still breaks in production. That's because production-grade web scraping isn't just rendering a page and converting HTML to markdown. It's everything underneath:
| Layer | What it actually takes |
| --- | --- |
| Browser architecture | Managing browser instances at scale, not one-off scripts |
| Anti-bot handling | JS challenges, Turnstile, and other protections |
| TLS fingerprinting | Real browsers have fingerprints. Puppeteer doesn't. Sites know. |
| Proxy infrastructure | Datacenter vs. residential, rotation strategies, sticky sessions |
| Resource management | Browser pooling, memory limits, graceful recycling |
| Reliability | Rate limiting, retries, timeouts, graceful degradation |

The Solution

Two primitives. That’s it.
```typescript
import { ReaderClient } from "@vakra-dev/reader";

const reader = new ReaderClient();

// Scrape URLs → clean markdown
const result = await reader.scrape({ urls: ["https://example.com"] });
console.log(result.data[0].markdown);

// Crawl a site → discover + scrape pages
const pages = await reader.crawl({
  url: "https://example.com",
  depth: 2,
  scrape: true,
});
console.log(`Found ${pages.urls.length} pages`);
```
All the hard stuff (browser pooling, challenge detection, proxy rotation, retries) happens under the hood. You get clean markdown. Your agents get the web.
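Even with retries handled under the hood, agent code often wants a last line of defense of its own around any network call. Here is a minimal sketch of a retry-with-backoff wrapper you could put around `reader.scrape`; the `withRetry` helper and its parameters are illustrative, not part of the library:

```typescript
// Illustrative helper (not part of @vakra-dev/reader): retry an async
// operation with exponential backoff before giving up.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 500 ms, 1000 ms, 2000 ms, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastError;
}

// Usage with the client from above:
// const result = await withRetry(() =>
//   reader.scrape({ urls: ["https://example.com"] })
// );
```

The same wrapper works for `reader.crawl` or any other promise-returning call, so your agent degrades gracefully instead of crashing on a transient failure.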

Features

- **Production-Grade**: Built on Ulixee Hero with TLS fingerprinting and stealth browsing
- **Clean Output**: Markdown and HTML with automatic main content extraction
- **Browser Pool**: Auto-recycling, health monitoring, and queue management
- **Website Crawling**: BFS link discovery with depth and page limits
- **Proxy Support**: Datacenter and residential proxies with rotation strategies
- **CLI Included**: Use from the command line or programmatically in your code
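To make "BFS link discovery with depth and page limits" concrete, here is the classic pattern in miniature: a queue of `(url, depth)` pairs, a visited set, and a depth cap. This is a sketch of the general technique, not the library's internals; `getLinks` stands in for "fetch the page and extract its links":

```typescript
// BFS link discovery with a depth limit. `getLinks` is a stand-in for
// fetching a page and extracting the URLs it links to.
function bfsCrawl(
  start: string,
  maxDepth: number,
  getLinks: (url: string) => string[],
): string[] {
  const visited = new Set<string>([start]);
  const queue: Array<{ url: string; depth: number }> = [
    { url: start, depth: 0 },
  ];

  while (queue.length > 0) {
    const { url, depth } = queue.shift()!;
    if (depth >= maxDepth) continue; // depth cap: don't expand further
    for (const link of getLinks(url)) {
      if (!visited.has(link)) {
        visited.add(link);
        queue.push({ url: link, depth: depth + 1 });
      }
    }
  }
  return [...visited];
}
```

Breadth-first order means pages closest to the start URL are discovered first, which is why a depth limit maps naturally onto "how far from the entry page should the crawl wander".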

Next Steps