Lifecycle
A job moves through these states:

- queued: Reader has accepted the job but hasn't started processing yet. Usually brief.
- processing: At least one URL is being fetched. Results start appearing in the results array as pages finish.
- completed: All URLs were attempted. Some may have errors, but the job finished its run.
- failed: The whole job hit a fatal error before finishing (rare, usually an infrastructure issue).
- cancelled: You called DELETE /v1/jobs/{id} and Reader stopped processing.
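Only three of these states are terminal, and every watcher pattern below boils down to waiting for one of them. A minimal helper sketch (the `JobStatus` union mirrors the list above; the `isTerminal` name is ours, not the SDK's):

```typescript
// The five lifecycle states listed above.
type JobStatus = "queued" | "processing" | "completed" | "failed" | "cancelled";

// A job is terminal once Reader will no longer change its status.
function isTerminal(status: JobStatus): boolean {
  return status === "completed" || status === "failed" || status === "cancelled";
}
```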
Job response shape
results has the same page shape as a sync scrape result: same fields, same metadata. Code that handles one can handle the other.
Three ways to watch a job
1. Poll
Fetch GET /v1/jobs/{id} every few seconds until status is terminal. Simplest; best for short jobs and CLIs.
waitForJob(id) wraps this pattern and auto-collects paginated results when done.
Costs: one API request per poll, which counts toward your RPM. Don’t poll faster than you need to.
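If you're not using the SDK's waitForJob, the loop itself is only a few lines. A sketch, assuming a `getJob` function that wraps GET /v1/jobs/{id} and returns the job envelope (the `status` field matches the lifecycle above; everything else here is illustrative):

```typescript
type Job = { status: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the job reaches a terminal state. `getJob` is injected so the
// loop stays independent of any particular HTTP client.
async function pollJob(
  id: string,
  getJob: (id: string) => Promise<Job>,
  intervalMs = 2000, // each poll counts toward your RPM, so keep this modest
): Promise<Job> {
  for (;;) {
    const job = await getJob(id);
    if (["completed", "failed", "cancelled"].includes(job.status)) return job;
    await sleep(intervalMs);
  }
}
```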
2. Server-sent events (SSE)
Open a stream to GET /v1/jobs/{id}/stream and receive events as the job makes progress.
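SSE frames are plain text: `event:` and `data:` lines separated by a blank line. A minimal parser sketch; the event names shown in the test (`job.progress`, `job.completed`) are assumptions modeled on the webhook events below, not a confirmed stream contract:

```typescript
type SseEvent = { event: string; data: unknown };

// Parse a raw SSE chunk into events. Each frame looks like
// "event: <name>\ndata: <json>" followed by a blank line.
function parseSse(chunk: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of chunk.split("\n\n")) {
    let event = "message"; // SSE default event name
    let data = "";
    for (const line of frame.split("\n")) {
      if (line.startsWith("event: ")) event = line.slice(7);
      else if (line.startsWith("data: ")) data += line.slice(6);
    }
    if (data) events.push({ event, data: JSON.parse(data) });
  }
  return events;
}
```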
3. Webhook
Subscribe a webhook to job.completed and job.failed events. Reader calls your endpoint when the job terminates: no polling, no open connections.
See Webhooks for the full contract and signing.
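Whatever the delivery details, a webhook receiver should verify the payload before trusting it. Webhook payloads are commonly signed with an HMAC over the raw body; the exact header name and scheme live in the Webhooks doc, so treat this HMAC-SHA256 sketch as an assumption rather than Reader's actual contract:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a webhook body against its signature header.
// ASSUMPTION: hex-encoded HMAC-SHA256 over the raw request body. Check the
// Webhooks doc for Reader's real header name and signing scheme.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // Constant-time comparison avoids leaking signature bytes via timing.
  return a.length === b.length && timingSafeEqual(a, b);
}
```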
Picking a pattern
| You want | Use |
|---|---|
| A quick script, a few small jobs | Poll |
| A real-time progress bar in a UI | SSE |
| Long-running batches you don’t want to babysit | Webhook |
| Fire-and-forget workloads on server infrastructure | Webhook |
| Jobs that run while your process is offline | Webhook |
Pagination
GET /v1/jobs/{id} returns the job envelope plus pagination for the results array. Default limit is 20 per page, max 100. The SDK’s waitForJob and getAllJobResults follow pagination automatically.
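Outside the SDK, following pagination by hand looks roughly like this. The `page` parameter name and `has_more` field are assumptions for the sketch; the page fetcher is injected so the collection logic is independent of the HTTP client:

```typescript
type ResultsPage<T> = { results: T[]; has_more: boolean };

// Collect every result across pages. `getPage` wraps something like
// GET /v1/jobs/{id}?page=N&limit=100 (parameter names assumed here).
async function getAllResults<T>(
  id: string,
  getPage: (id: string, page: number) => Promise<ResultsPage<T>>,
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; ; page++) {
    const { results, has_more } = await getPage(id, page);
    all.push(...results);
    if (!has_more) return all;
  }
}
```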
Cancellation
DELETE /v1/jobs/{id} stops a queued or processing job. Any pages already completed stay in the results; Reader just stops scheduling new URLs. Calling cancel on a job that's already completed, failed, or cancelled returns 409 Conflict.
Retries
POST /v1/jobs/{id}/retry re-queues just the URLs that errored. The successful results stay put. Use this when a flaky batch had a few bad URLs you think might succeed on a second try. Returns 400 invalid_request if there are no failed URLs.
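Both endpoints fail with predictable status codes, so callers can branch on the code rather than parsing error messages. A small sketch (the helper is ours; the 409 and 400 meanings come from the two sections above):

```typescript
// Interpret the non-2xx statuses documented above for cancel and retry.
function explainJobActionError(action: "cancel" | "retry", status: number): string {
  if (action === "cancel" && status === 409) {
    return "Job already completed, failed, or cancelled; nothing to stop.";
  }
  if (action === "retry" && status === 400) {
    return "No failed URLs to re-queue; every page already succeeded.";
  }
  return `Unexpected ${status} from ${action}`;
}
```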
Next
- Events: workspace-level SSE stream
- Webhooks: push notifications and signing
- Polling vs SSE vs webhooks: deeper comparison

