Reader is a production-grade web scraping library built on Ulixee Hero. It provides two main primitives: scrape and crawl.

Core Concepts

Scraping

Scraping is the process of fetching and extracting content from URLs. Reader handles:
  • Loading pages in a real browser
  • Waiting for dynamic content
  • Extracting main content
  • Converting HTML to clean markdown
const result = await reader.scrape({
  urls: ["https://example.com"],
});
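
Several URLs can be passed in a single call. The sketch below also iterates the returned pages; the result shape used here (a results array with url and markdown fields) is an assumption for illustration only, so check the scraping guide linked below for the actual fields.
// NOTE: only the reader.scrape({ urls }) call comes from the example above;
// the `results`, `url`, and `markdown` fields below are hypothetical.
const result = await reader.scrape({
  urls: [
    "https://example.com",
    "https://example.com/blog",
  ],
});

for (const page of result.results ?? []) {
  console.log(page.url, page.markdown?.length); // hypothetical fields
}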
Learn more about scraping →

Crawling

Crawling is the process of discovering pages on a website. Reader uses breadth-first search to find links and can optionally scrape the content of discovered pages.
const result = await reader.crawl({
  url: "https://example.com",
  depth: 2, // how many link levels to follow from the start URL
  maxPages: 50, // cap on the number of pages discovered
  scrape: true, // also extract content from each discovered page
});
Learn more about crawling →
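
The traversal itself is plain breadth-first search: start from the seed URL, collect links level by level, and stop once the configured depth or page budget is exhausted. A simplified, self-contained sketch of that loop (not Reader's actual implementation, with link extraction stubbed out as a callback):
// Simplified breadth-first discovery loop mirroring the depth/maxPages
// options above. Reader extracts links with a real browser; here that
// step is a callback.
async function discover(
  seed: string,
  depth: number,
  maxPages: number,
  getLinks: (url: string) => Promise<string[]>,
): Promise<string[]> {
  const visited = new Set<string>([seed]);
  let frontier = [seed];

  for (let level = 0; level < depth && visited.size < maxPages; level++) {
    const next: string[] = [];
    for (const url of frontier) {
      for (const link of await getLinks(url)) {
        if (visited.size >= maxPages) break;
        if (!visited.has(link)) {
          visited.add(link);
          next.push(link);
        }
      }
    }
    frontier = next; // the next BFS level
  }
  return [...visited];
}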

Content Extraction

Reader automatically extracts the main content from web pages, removing navigation, headers, footers, ads, and other non-content elements. Learn more about content extraction →
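
The approach is in the spirit of readability-style extraction. As a standalone illustration of the technique (not Reader's internals), the same effect can be sketched with @mozilla/readability and turndown:
// Standalone illustration of readability-style extraction plus markdown
// conversion. This is NOT Reader's implementation, just the general idea.
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";

function extractMarkdown(html: string, url: string): string | null {
  // Parse the raw HTML into a DOM so Readability can score elements.
  const dom = new JSDOM(html, { url });

  // Readability strips navigation, headers, footers, ads, and other
  // chrome, keeping only the article-like main content.
  const article = new Readability(dom.window.document).parse();
  if (!article?.content) return null;

  // Convert the cleaned HTML fragment to markdown.
  return new TurndownService().turndown(article.content);
}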

Browser Pool

For high-volume scraping, Reader manages a pool of browser instances with automatic recycling and health monitoring. Learn more about the browser pool →
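
The pattern behind it is a simple acquire/release pool: hand out an instance per job, recycle instances that have served too many pages or fail a health check, and top the pool back up. A toy sketch of that idea (not Reader's actual pool, with browser startup stubbed out):
// Toy sketch of the browser-pool pattern: hand out instances, recycle
// each one after maxUses pages or an unhealthy release. Illustrative
// only; this is not Reader's pool implementation.
interface PooledBrowser {
  id: number;
  uses: number;
}

class ToyBrowserPool {
  private idle: PooledBrowser[] = [];
  private nextId = 0;

  constructor(size: number, private maxUses: number) {
    for (let i = 0; i < size; i++) this.idle.push(this.launch());
  }

  private launch(): PooledBrowser {
    // In a real pool this would start a Hero/browser instance.
    return { id: this.nextId++, uses: 0 };
  }

  acquire(): PooledBrowser {
    const browser = this.idle.pop() ?? this.launch();
    browser.uses++;
    return browser;
  }

  release(browser: PooledBrowser, healthy = true): void {
    // Replace worn-out or unhealthy instances with fresh ones.
    if (!healthy || browser.uses >= this.maxUses) {
      this.idle.push(this.launch());
    } else {
      this.idle.push(browser);
    }
  }
}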

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      ReaderClient                           │
│                   (manages lifecycle)                       │
└─────────────────────────┬───────────────────────────────────┘
                          │
          ┌───────────────┴───────────────┐
          │                               │
    ┌─────▼─────┐                   ┌─────▼─────┐
    │  scrape() │                   │  crawl()  │
    │           │                   │           │
    └─────┬─────┘                   └─────┬─────┘
          │                               │
          └───────────────┬───────────────┘
                          │
                ┌─────────▼─────────┐
                │   Browser Pool    │
                │ (Hero instances)  │
                └───────────────────┘
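
In code, the layering above shows up as a single client that owns the pool: construct it once, call scrape() and crawl() as needed, and shut it down when you are done. Only the scrape() and crawl() calls are taken from the examples above; the constructor arguments and the close() method in this sketch are placeholders for the real lifecycle API:
// Hypothetical lifecycle sketch: the constructor options and close() are
// placeholders; scrape()/crawl() match the examples above.
const reader = new ReaderClient(/* pool size, timeouts, ... */);
try {
  const pages = await reader.scrape({ urls: ["https://example.com"] });
  const site = await reader.crawl({ url: "https://example.com", depth: 1 });
  console.log(pages, site);
} finally {
  await reader.close(); // release pooled Hero instances (hypothetical method)
}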

Guides