The five pillars
- Idempotency key on the /v1/read POST, so retries don’t create duplicate jobs.
- Track the job ID in your own database immediately after submission.
- Webhooks as the primary completion signal, so a restart doesn’t strand the job.
- Poll as a fallback, in case the webhook was dropped.
- Retry failed URLs rather than restarting the whole batch.
Submission
x-idempotency-key is critical. If your request times out but Reader already accepted it, your retry with the same key returns the original job ID, not a new job. Without it, you’d submit the batch twice.
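As a minimal sketch of that retry behaviour, the snippet below sends the same x-idempotency-key on every attempt and stores the job ID as soon as one attempt succeeds. Several parts are assumptions for illustration, not Reader's documented API: the `{"urls": [...]}` request body, the `id` field in the response, and the helper names `idempotency_key`, `submit_batch`, `post`, and `save_job_id`. The `post` callable stands in for whatever HTTP client you use.

```python
import hashlib
import json
from typing import Callable, Dict, List

# (path, headers, body) -> parsed JSON response. Hypothetical transport hook.
Post = Callable[[str, Dict[str, str], bytes], dict]


def idempotency_key(urls: List[str]) -> str:
    """Derive a stable key from the batch contents, so every retry of the
    same batch sends the same key."""
    canonical = json.dumps(sorted(urls)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def submit_batch(
    urls: List[str],
    post: Post,
    save_job_id: Callable[[str, str], None],
    max_attempts: int = 3,
) -> str:
    """Submit a batch, retrying timeouts with the same idempotency key."""
    key = idempotency_key(urls)
    headers = {"content-type": "application/json", "x-idempotency-key": key}
    body = json.dumps({"urls": urls}).encode("utf-8")  # assumed body shape
    last_error = None
    for _ in range(max_attempts):
        try:
            response = post("/v1/read", headers, body)
        except TimeoutError as exc:
            # The request may already have been accepted; retrying with the
            # same key is what makes this safe.
            last_error = exc
            continue
        job_id = response["id"]  # assumed response field
        save_job_id(key, job_id)  # record the job ID right after submission
        return job_id
    raise RuntimeError("submission failed after retries") from last_error
```

Deriving the key from the batch contents is one design choice. If you expect to resubmit the same URLs deliberately later, generating a random key once per batch and persisting it before the first attempt avoids collisions between those submissions.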
Completion via webhook
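A rough sketch of the receiving side, assuming the webhook body is JSON with an `event` name and a `jobId` field (both field names are guesses, only the job.completed event name comes from this guide). The `jobs` dictionary stands in for the table where you recorded the job ID at submission time, and `handle_webhook` is a hypothetical helper you would call from your HTTP framework's route handler.

```python
from typing import Dict


def handle_webhook(payload: dict, jobs: Dict[str, dict]) -> bool:
    """Apply a job.completed event to the local job table.

    Returns True only when the stored state actually changed, so duplicate
    deliveries and unrelated events are harmless no-ops.
    """
    if payload.get("event") != "job.completed":
        return False
    job = jobs.get(payload.get("jobId"))  # assumed payload field name
    if job is None or job["status"] == "completed":
        return False  # unknown job, or this delivery was already processed
    job["status"] = "completed"
    return True
```

Because the completion state lives in your database rather than in a waiting process, a restart between submission and completion does not strand the job.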
Hydrating results (the slow part)
Fallback polling
Webhooks can get lost: a configuration mistake, your endpoint being down for all three delivery retries, a DNS outage. As a safety net, run a periodic job that polls Reader for any batch that was submitted more than some threshold ago and is still not marked complete.
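One way to sketch that safety net: select the jobs that are still open past the threshold, then ask Reader about each one. The `fetch_status` callable is hypothetical and stands in for whichever status endpoint Reader exposes (this guide does not name it). The `jobs` dictionary with `status` and `submitted_at` fields is likewise an assumed shape for your own job table.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def stale_job_ids(jobs: Dict[str, dict], now: datetime, threshold: timedelta) -> List[str]:
    """Return IDs of jobs that are not completed and older than the threshold."""
    return [
        job_id
        for job_id, job in jobs.items()
        if job["status"] != "completed" and now - job["submitted_at"] > threshold
    ]


def poll_stale_jobs(
    jobs: Dict[str, dict],
    fetch_status: Callable[[str], str],
    now: datetime,
    threshold: timedelta,
) -> List[str]:
    """Poll only the stale jobs and mark the ones that finished remotely.

    Returns the IDs whose completion was recovered by polling, which means
    their webhook was probably dropped.
    """
    recovered = []
    for job_id in stale_job_ids(jobs, now, threshold):
        if fetch_status(job_id) == "completed":
            jobs[job_id]["status"] = "completed"
            recovered.append(job_id)
    return recovered
```

Run this from a scheduler (cron or similar) at an interval shorter than the threshold. Fresh jobs are skipped on purpose, so polling stays a fallback rather than competing with the webhook.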
Retrying failed URLs
When a batch completes with some failed URLs, you have two options:
- Accept the failures (your data has error fields for those rows) and move on.
- Retry the failed subset with POST /v1/jobs/{id}/retry.
Expect a job.completed webhook when the retry finishes.
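The choice between the two options can be sketched as a small policy function. Only the POST /v1/jobs/{id}/retry path comes from this guide. The retry cap, the empty request body, the `post` callable, and the `failed` / `retries` / `status` fields on the local job record are assumptions made for illustration.

```python
from typing import Callable
from urllib.parse import quote


def retry_path(job_id: str) -> str:
    """Build the retry path for a job, escaping the ID for use in a URL."""
    return f"/v1/jobs/{quote(job_id, safe='')}/retry"


def maybe_retry(job: dict, post: Callable[[str], None], max_retries: int = 2) -> bool:
    """Retry the failed subset unless nothing failed or the cap is reached.

    Returns True when a retry was requested. Returning False means the
    failures are accepted and the batch is processed as-is.
    """
    if job["failed"] == 0 or job["retries"] >= max_retries:
        return False
    post(retry_path(job["id"]))  # assumed: the endpoint needs no request body
    job["retries"] += 1
    job["status"] = "submitted"  # wait for the next job.completed webhook
    return True
```

Capping the number of retries keeps permanently broken URLs from looping forever. After the cap, the remaining failures fall through to the first option.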
Monitoring
Track in your own metrics:
- Submission rate (batches / minute)
- Completion time (webhook received - submitted)
- Per-batch failure rate (failed URLs / total URLs)
- Webhook delivery failures (via deliveryStats)
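The first three metrics can be computed from timestamps and counts you already store. The sketch below is one possible shape. The function names are hypothetical, and how you export the values (StatsD, Prometheus, logs) is left open. Webhook delivery failures are not covered here because the shape of deliveryStats is not described in this guide.

```python
from datetime import datetime
from typing import Iterable


def completion_seconds(submitted_at: datetime, webhook_received_at: datetime) -> float:
    """Completion time: webhook received minus submitted, in seconds."""
    return (webhook_received_at - submitted_at).total_seconds()


def failure_rate(failed_urls: int, total_urls: int) -> float:
    """Per-batch failure rate: failed URLs / total URLs (0.0 for an empty batch)."""
    return failed_urls / total_urls if total_urls else 0.0


def submissions_per_minute(
    submitted_at: Iterable[datetime], window_start: datetime, window_end: datetime
) -> float:
    """Submission rate: batches submitted inside the window, per minute."""
    minutes = (window_end - window_start).total_seconds() / 60
    if minutes <= 0:
        raise ValueError("window_end must be after window_start")
    count = sum(1 for t in submitted_at if window_start <= t < window_end)
    return count / minutes
```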

