Request both
When to use markdown only
The default. Clean, tokenizer-friendly, good for LLMs and RAG. Use markdown by itself unless you know you need HTML.
When to include HTML
- You need structure that Reader’s markdown conversion strips. For example, <table>s with complex layouts sometimes lose nuance in markdown. The HTML preserves it.
- You want to run your own parser. If you already have a cheerio / BeautifulSoup pipeline and just need Reader’s clean HTML as input, skip the markdown.
- You’re extracting specific elements. You want just the first <img>, or every <blockquote>, without having to parse the markdown back into structure.
Use ["markdown"] alone in production once you know what you need.
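As a rough illustration of the element-extraction case above, here is a minimal sketch that pulls the first <img> source and the text of every <blockquote> out of an HTML string. It uses only Python's standard-library html.parser so that it runs without extra dependencies. In a BeautifulSoup pipeline the equivalent calls would be soup.find("img") and soup.find_all("blockquote"). The sample_html string is made up and only stands in for whatever HTML Reader returns, and the ElementPicker and pick_elements names are hypothetical helpers, not part of Reader.

```python
from html.parser import HTMLParser


class ElementPicker(HTMLParser):
    """Collects the first <img> src and the text of every <blockquote>."""

    def __init__(self):
        super().__init__()
        self.first_img_src = None
        self.blockquotes = []
        self._depth = 0      # greater than 0 while inside a <blockquote>
        self._buffer = []    # text fragments of the current blockquote

    def handle_starttag(self, tag, attrs):
        if tag == "img" and self.first_img_src is None:
            self.first_img_src = dict(attrs).get("src")
        elif tag == "blockquote":
            if self._depth == 0:
                self._buffer = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                text = " ".join("".join(self._buffer).split())
                self.blockquotes.append(text)

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def pick_elements(html):
    """Return (first image src or None, list of blockquote texts)."""
    picker = ElementPicker()
    picker.feed(html)
    picker.close()
    return picker.first_img_src, picker.blockquotes


# Made-up sample standing in for the HTML field of a Reader response.
sample_html = """
<article>
  <img src="/hero.png" alt="Hero">
  <blockquote><p>First quote.</p></blockquote>
  <p>Body text.</p>
  <img src="/second.png">
  <blockquote>Second <em>quote</em>.</blockquote>
</article>
"""

first_img, quotes = pick_elements(sample_html)
print(first_img)
print(quotes)
```

Working on the HTML directly avoids guessing which markdown constructs an image or a quote was converted into.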
What HTML you actually get
The HTML Reader returns is cleaned, not the raw DOM. Scripts, styles, tracking pixels, and (if onlyMainContent: true) boilerplate are already removed. It’s the same DOM Reader used internally to generate the markdown: good for parsing, not for rehydrating the original page.
If you need HTML as close to the source as Reader will give you, set onlyMainContent: false. Reader skips its boilerplate stripping and returns output closer to the original page.
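To make the two knobs concrete, the following sketch builds the options for a markdown-only production request and for a fuller request that also asks for HTML with boilerplate stripping turned off. Only onlyMainContent and the ["markdown"] value come from the text above. The "formats" key, the "html" value, and the build_reader_options helper are assumptions made for illustration, so check them against the actual Reader API reference before relying on them.

```python
import json


def build_reader_options(include_html=False, only_main_content=True):
    """Build a request-options dict.

    Assumption: the format list is sent under a "formats" key and HTML is
    requested with the value "html". Neither name is confirmed by the text.
    """
    formats = ["markdown"]
    if include_html:
        formats.append("html")
    return {"formats": formats, "onlyMainContent": only_main_content}


# Markdown only, boilerplate stripped: the lean production setup.
production_options = build_reader_options()

# Markdown plus HTML, boilerplate kept: closer to the source page.
inspection_options = build_reader_options(include_html=True, only_main_content=False)

print(json.dumps(production_options))
print(json.dumps(inspection_options))
```

Keeping the choice in one small helper makes it easy to request HTML only in the code paths that actually parse it.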

