Reader is a production-grade web scraping library built on Ulixee Hero. It provides two main primitives: scrape and crawl.

Core Concepts

Scraping

Scraping is the process of fetching and extracting content from URLs. Reader handles:
  • Loading pages in a real browser
  • Waiting for dynamic content
  • Extracting main content
  • Converting HTML to clean markdown
const result = await reader.scrape({
  urls: ["https://example.com"],
});
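
Several URLs can be passed in a single call. The sketch below also iterates the returned pages; the result shape used here (a results array with url and markdown fields) is an assumption for illustration only, so check the scraping guide linked below for the actual fields.
// NOTE: only the reader.scrape({ urls }) call comes from the example above;
// the `results`, `url`, and `markdown` fields below are hypothetical.
const result = await reader.scrape({
  urls: [
    "https://example.com",
    "https://example.com/blog",
  ],
});

for (const page of result.results ?? []) {
  console.log(page.url, page.markdown?.length); // hypothetical fields
}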
Learn more about scraping →

Crawling

Crawling is the process of discovering pages on a website. Reader uses breadth-first search to find links and can optionally scrape the content of discovered pages.
const result = await reader.crawl({
  url: "https://example.com",
  depth: 2, // how many link levels to follow from the start URL
  maxPages: 50, // cap on the number of pages discovered
  scrape: true, // also extract content from each discovered page
});
Learn more about crawling →
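
The traversal itself is plain breadth-first search: start from the seed URL, collect links level by level, and stop once the configured depth or page budget is exhausted. A simplified, self-contained sketch of that loop (not Reader's actual implementation, with link extraction stubbed out as a callback):
// Simplified breadth-first discovery loop mirroring the depth/maxPages
// options above. Reader extracts links with a real browser; here that
// step is a callback.
async function discover(
  seed: string,
  depth: number,
  maxPages: number,
  getLinks: (url: string) => Promise<string[]>,
): Promise<string[]> {
  const visited = new Set<string>([seed]);
  let frontier = [seed];

  for (let level = 0; level < depth && visited.size < maxPages; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of await getLinks(url)) {
        if (visited.size >= maxPages) break;
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // the next BFS level
  }
  return [...visited];
}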

Content Extraction

Reader automatically extracts the main content from web pages, removing navigation, headers, footers, ads, and other non-content elements. Learn more about content extraction →
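
The approach is in the spirit of readability-style extraction. As a standalone illustration of the technique (not Reader's internals), the same effect can be sketched with @mozilla/readability and turndown:
// Standalone illustration of readability-style extraction plus markdown
// conversion. This is NOT Reader's implementation, just the general idea.
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";

function extractMarkdown(html: string, url: string): string | null {
  // Parse the raw HTML into a DOM so Readability can score elements.
  const dom = new JSDOM(html, { url });

  // Readability strips navigation, headers, footers, ads, and other
  // chrome, keeping only the article-like main content.
  const article = new Readability(dom.window.document).parse();
  if (!article?.content) return null;

  // Convert the cleaned HTML fragment to markdown.
  return new TurndownService().turndown(article.content);
}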

Browser Pool

For high-volume scraping, Reader manages a pool of browser instances with automatic recycling and health monitoring. Learn more about the browser pool →
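
The pattern behind it is a simple acquire/release pool: hand out an instance per job, recycle instances that have served too many pages or fail a health check, and top the pool back up. A toy sketch of that idea (not Reader's actual pool, with browser startup stubbed out):
// Toy sketch of the browser-pool pattern: hand out instances, recycle
// each one after maxUses pages or an unhealthy release. Illustrative
// only; this is not Reader's pool implementation.
interface PooledBrowser {
  id: number;
  uses: number;
}

class ToyBrowserPool {
  private idle: PooledBrowser[] = [];
  private nextId = 0;

  constructor(size: number, private maxUses: number) {
    for (let i = 0; i < size; i++) this.idle.push(this.launch());
  }

  private launch(): PooledBrowser {
    // In a real pool this would start a Hero/browser instance.
    return { id: this.nextId++, uses: 0 };
  }

  acquire(): PooledBrowser {
    const browser = this.idle.pop() ?? this.launch();
    browser.uses++;
    return browser;
  }

  release(browser: PooledBrowser, healthy = true): void {
    // Replace worn-out or unhealthy instances with fresh ones.
    if (!healthy || browser.uses >= this.maxUses) {
      this.idle.push(this.launch());
    } else {
      this.idle.push(browser);
    }
  }
}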

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      ReaderClient                           │
│                   (manages lifecycle)                       │
└─────────────────────────┬───────────────────────────────────┘
                          │
          ┌───────────────┴───────────────┐
          │                               │
    ┌─────▼─────┐                   ┌─────▼─────┐
    │  scrape() │                   │  crawl()  │
    │           │                   │           │
    └─────┬─────┘                   └─────┬─────┘
          │                               │
          └───────────────┬───────────────┘
                          │
                ┌─────────▼─────────┐
                │   Browser Pool    │
                │ (Hero instances)  │
                └───────────────────┘
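
In code, the layering above shows up as a single client that owns the pool: construct it once, call scrape() and crawl() as needed, and shut it down when you are done. Only the scrape() and crawl() calls are taken from the examples above; the constructor arguments and the close() method in this sketch are placeholders for the real lifecycle API:
// Hypothetical lifecycle sketch: the constructor options and close() are
// placeholders; scrape()/crawl() match the examples above.
const reader = new ReaderClient(/* pool size, timeouts, ... */);
try {
  const pages = await reader.scrape({ urls: ["https://example.com"] });
  const site = await reader.crawl({ url: "https://example.com", depth: 1 });
  console.log(pages, site);
} finally {
  await reader.close(); // release pooled Hero instances (hypothetical method)
}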

Guides