# Dead-Letter Queue

The DLQ is the ledger of deliveries that exhausted normal retry behavior and still need operator attention. Use it when you want to inspect terminal failures, export evidence, or create fresh replay deliveries from failed rows.

## DLQ model

DLQ rows are separate records, not just a delivery status flag. Each entry ties back to:

- the source delivery
- the terminal attempt
- replayability state based on retained source and dispatch payload artifacts

DLQ re-drive does not mutate the failed delivery in place. It creates a new replay delivery using the same replay coordinator path as normal delivery replay.

## Auth model

- **Public API** `project API key`: DLQ list, detail, export, and single re-drive are project-authenticated Public API routes. Bulk re-drive is also project-authenticated and requires Starter+.
- **Dashboard** `session auth`: The [dashboard](/docs/dashboard.md) provides DLQ list, detail, single re-drive, and [auto-refreshed views](/docs/dashboard-live-updates.md) for operators.
- **CLI / SDK**: The CLI and SDK cover core DLQ inspection and recovery flows.

## Inspection, export, and re-drive

The normal operator sequence is:

1. inspect entries with `GET /v1/dlq` (list) or `GET /v1/dlq/{id}` (detail)
2. export evidence with `GET /v1/dlq/export` when you need retained source payload snapshots
3. run `POST /v1/dlq/{id}/re-drive` to create a fresh replay delivery
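Steps 1 and 3 above can be sketched in Python against a stubbed client. The route paths and the `replayable` field come from this page; the client stub, response shapes, and field names like `replayDeliveryId` are illustrative assumptions, not the real API contract:

```python
# Sketch of the single re-drive flow: read detail, check replayability,
# then create a fresh replay delivery, handling the 409 expiry case.

def redrive_if_replayable(client, dlq_id):
    """Re-drive one DLQ entry, skipping entries whose payload artifacts expired."""
    detail = client.get(f"/v1/dlq/{dlq_id}")
    if not detail["replayable"]:
        return {"status": "skipped", "reason": "payload artifacts expired"}
    status, body = client.post(f"/v1/dlq/{dlq_id}/re-drive")
    if status == 409:
        # Artifacts expired between the detail read and the re-drive call.
        return {"status": "skipped", "reason": "expired at re-drive time"}
    return {"status": "re-driven", "replayDeliveryId": body["id"]}


class StubClient:
    """Minimal in-memory stand-in for an HTTP client, for illustration only."""

    def __init__(self, replayable):
        self.replayable = replayable

    def get(self, path):
        return {"id": path.rsplit("/", 1)[-1], "replayable": self.replayable}

    def post(self, path):
        if not self.replayable:
            return 409, {"error": "payload_expired"}
        return 201, {"id": "dlv_replay_1"}


print(redrive_if_replayable(StubClient(True), "dlq_123"))
```

Checking `replayable` first avoids a guaranteed `409`, but the check is advisory: retention can expire between the read and the re-drive call, so the `409` branch is still required.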

Important behavior:

- export accepts the same filters and cursor semantics as the list route, but each export page is capped at 20 rows and payload bodies are base64-encoded
- detail includes `replayable` so you can tell whether the required payload artifacts still exist
- re-drive can return `409` when the source payload or dispatch snapshot has already expired
- re-drive can safely be retried after certain handoff failures because pending unsequenced replays are reused
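Because export payload bodies arrive base64-encoded, a consumer has to decode them before inspection. A minimal sketch, assuming a hypothetical export-row shape (the docs only state that bodies are base64-encoded; the `sourcePayload` field name is an assumption):

```python
import base64
import json

# Hypothetical shape of one row from GET /v1/dlq/export.
export_row = {
    "dlqId": "dlq_123",
    "sourcePayload": base64.b64encode(b'{"orderId": "ord_9"}').decode(),
}


def decode_payload(row):
    """Decode the base64-encoded source payload body from an export row."""
    raw = base64.b64decode(row["sourcePayload"])
    return json.loads(raw)


print(decode_payload(export_row))
```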

## Bulk recovery and replayability

Bulk recovery is the Starter+ async path:

- `POST /v1/dlq/bulk-re-drive` creates a job over a frozen snapshot of matching DLQ rows
- poll `GET /v1/bulk-operations/{id}` for counts and per-item progress

**Create a bulk DLQ recovery job**

```bash
curl https://api.hooksbase.com/v1/dlq/bulk-re-drive \
  -H "Authorization: Bearer swk_..." \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: recover-dlq-orders-2026-04-23" \
  -d '{
    "filters": {
      "webhookId": "wh_123",
      "from": 1740000000000,
      "to": 1740086400000
    },
    "maxItems": 100
  }'
```

**Poll recovery progress**

```bash
curl https://api.hooksbase.com/v1/bulk-operations/bulk_123 \
  -H "Authorization: Bearer swk_..."
```
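The poll loop above can be wrapped in a small helper that stops once the job reaches a terminal state. This is a sketch against a stubbed fetch function; the `status`, `processed`, and `total` field names and the `processing`/`completed`/`failed` status values are assumptions about the bulk-operation response:

```python
import time


def wait_for_bulk_operation(fetch, op_id, interval_s=2.0, max_polls=50):
    """Poll GET /v1/bulk-operations/{id} until the job reaches a terminal state."""
    for _ in range(max_polls):
        op = fetch(f"/v1/bulk-operations/{op_id}")
        if op["status"] in ("completed", "failed"):
            return op
        time.sleep(interval_s)
    raise TimeoutError(f"bulk operation {op_id} not terminal after {max_polls} polls")


# Stubbed fetch: reports "processing" twice, then "completed" with final counts.
responses = iter([
    {"status": "processing", "processed": 10, "total": 100},
    {"status": "processing", "processed": 60, "total": 100},
    {"status": "completed", "processed": 100, "total": 100},
])
result = wait_for_bulk_operation(lambda path: next(responses), "bulk_123", interval_s=0.0)
print(result["status"])
```

A fixed poll interval is enough for operator tooling; a production consumer might add backoff and a wall-clock deadline instead of a poll count.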

How DLQ differs from direct replay:

- direct replay starts from a known delivery row
- DLQ re-drive starts from a dead-letter record created after retry exhaustion
- both still depend on retained source payload and any retained transformed dispatch snapshot

## Related routes

- `GET /v1/dlq`
- `GET /v1/dlq/{id}`
- `GET /v1/dlq/export`
- `POST /v1/dlq/{id}/re-drive`
- `POST /v1/dlq/bulk-re-drive`
- `GET /v1/bulk-operations/{id}`

## Common mistakes

- Treating DLQ re-drive as an in-place retry. It always creates a new replay delivery.
- Waiting until retention expires and then expecting `replayable` entries to remain recoverable.
- Using DLQ export as the primary delivery history feed instead of an operator recovery tool.
- Forgetting that bulk re-drive runs asynchronously through a bulk-operation job.
