
DLQ (Dead-Letter Queue)

A holding area for messages that failed terminally so they can be inspected and recovered.

A DLQ (dead-letter queue) is a holding area for messages that couldn't be processed successfully: events that exhausted their retry budget, hit a terminal error, or got stuck in a way retries can't fix. Without a DLQ, those events either disappear silently or block the rest of the queue.

What ends up in a DLQ

Three categories of failure:

  • Exhausted retries. A delivery has been attempted N times and still fails. After the Nth failed attempt, the system moves it to the DLQ instead of retrying forever.
  • Terminal destination responses. A destination keeps returning non-2xx status codes or timing out until the retry budget is exhausted.
  • Poison messages. A specific event triggers a bug that makes the consumer crash or reject it every time. Without a DLQ, the queue stays stuck on this one event.
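A minimal sketch of the exhausted-retries and poison-message cases, assuming an in-memory queue, a hypothetical retry budget `MAX_ATTEMPTS = 3`, and hypothetical `Message`/`DeadLetter` types (not Hooksbase APIs). A real relay would add backoff delays between attempts and persist both queues. Failed messages go to the back of the queue, so a poison message cannot block the ones behind it, and after its third failure it is parked in the DLQ.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

MAX_ATTEMPTS = 3  # hypothetical retry budget (the "N" above)


@dataclass
class Message:
    id: str
    payload: dict
    attempts: int = 0


@dataclass
class DeadLetter:
    message: Message
    reason: str  # exception class name of the final failure


def drain(queue: Deque[Message], handler: Callable[[dict], None], dlq: List[DeadLetter]) -> List[str]:
    """Process the queue until empty; return delivered ids and dead-letter the rest."""
    delivered: List[str] = []
    while queue:
        msg = queue.popleft()
        try:
            handler(msg.payload)
            delivered.append(msg.id)
        except Exception as exc:
            msg.attempts += 1
            if msg.attempts >= MAX_ATTEMPTS:
                # Retry budget exhausted: park the message instead of looping forever.
                dlq.append(DeadLetter(msg, type(exc).__name__))
            else:
                # Retry later from the back, so a poison message can't block the others.
                queue.append(msg)
    return delivered


def handler(payload: dict) -> None:
    if payload.get("poison"):
        raise ValueError("consumer bug triggered by this payload")


queue = deque([Message("a", {}), Message("p", {"poison": True}), Message("b", {})])
dlq: List[DeadLetter] = []
delivered = drain(queue, handler, dlq)
print(delivered, [(d.message.id, d.message.attempts, d.reason) for d in dlq])
# ['a', 'b'] [('p', 3, 'ValueError')]
```

Re-enqueueing at the back trades strict ordering for progress: `a` and `b` are delivered even though `p` sits between them.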

Provider signature failures are different: Hooksbase rejects forged or stale provider requests before persistence, so they do not create deliveries or DLQ entries.
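The key point is ordering: verification runs before anything is stored, so a forged or stale request never becomes a delivery. The sketch below illustrates that ordering under an assumed scheme (HMAC-SHA256 over `timestamp.body` with a hypothetical 300-second freshness window). Real providers each define their own header names and signed-string formats, so this is neither Hooksbase's nor any provider's actual scheme; `sign`, `verify`, and `ingest` are illustrative names.

```python
import hashlib
import hmac
import time
from typing import List, Optional

# Hypothetical scheme for illustration only: HMAC-SHA256 over "<timestamp>.<body>".
TOLERANCE_SECONDS = 300  # assumed freshness window; older requests count as stale


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    signed = str(timestamp).encode() + b"." + body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str, now: int) -> bool:
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False  # stale (or badly clock-skewed) request
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(secret, timestamp, body), signature)


def ingest(secret: bytes, timestamp: int, body: bytes, signature: str,
           store: List[bytes], now: Optional[int] = None) -> int:
    """Return an HTTP-style status; only verified requests reach `store`."""
    now = int(time.time()) if now is None else now
    if not verify(secret, timestamp, body, signature, now):
        return 401  # rejected before persistence: no delivery, no DLQ entry
    store.append(body)
    return 202


secret = b"demo-secret"
ts = 1_700_000_000
body = b'{"event": "order.created"}'
store: List[bytes] = []
ok = ingest(secret, ts, body, sign(secret, ts, body), store, now=ts + 10)
forged = ingest(secret, ts, body, "0" * 64, store, now=ts + 10)
stale = ingest(secret, ts, body, sign(secret, ts, body), store, now=ts + 1_000)
print(ok, forged, stale, len(store))  # 202 401 401 1
```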

Recovery patterns

Four patterns show up in production:

  1. Manual inspect-and-replay: an engineer opens the DLQ, inspects an entry, and decides whether to replay it
  2. Bulk re-drive after a fix: ship a fix, then bulk-replay everything that failed for the same reason
  3. Escalation path: high-stakes failures notify on-call immediately
  4. Backfill from source: for events older than the relay's payload retention window, query the upstream API
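Patterns 2 and 3 can be sketched in a few lines. This example assumes each DLQ entry records a failure `reason` (here, the exception class name) and an `event_type`; `DlqEntry`, `redrive`, `escalate`, `HIGH_STAKES`, and the `page` callback are hypothetical names, not Hooksbase APIs. The re-drive replays only entries that failed for the reason the fix addresses and leaves anything that still fails in the DLQ with its latest cause.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DlqEntry:  # hypothetical DLQ record
    id: str
    event_type: str
    payload: dict
    reason: str  # why it was dead-lettered, e.g. an exception class name


def group_by_reason(dlq: List[DlqEntry]) -> Dict[str, List[DlqEntry]]:
    """Bucket entries so one fix can be matched to the failures it addresses."""
    groups: Dict[str, List[DlqEntry]] = defaultdict(list)
    for entry in dlq:
        groups[entry.reason].append(entry)
    return dict(groups)


def redrive(dlq: List[DlqEntry], reason: str,
            handler: Callable[[dict], None]) -> Tuple[List[str], List[DlqEntry]]:
    """Pattern 2: replay entries that failed for `reason`; return (replayed ids, remaining DLQ)."""
    replayed: List[str] = []
    remaining: List[DlqEntry] = []
    for entry in dlq:
        if entry.reason != reason:
            remaining.append(entry)  # untouched: a different failure cause
            continue
        try:
            handler(entry.payload)
            replayed.append(entry.id)
        except Exception as exc:
            entry.reason = type(exc).__name__  # still failing: record the latest cause
            remaining.append(entry)
    return replayed, remaining


HIGH_STAKES = {"payment.failed"}  # hypothetical event types that page on-call


def escalate(entry: DlqEntry, page: Callable[[str], None]) -> None:
    """Pattern 3: notify on-call immediately for high-stakes failures."""
    if entry.event_type in HIGH_STAKES:
        page(f"DLQ: {entry.event_type} {entry.id} ({entry.reason})")


dlq = [
    DlqEntry("1", "order.created", {"v": 1}, "KeyError"),
    DlqEntry("2", "order.created", {"v": 2}, "TimeoutError"),
    DlqEntry("3", "payment.failed", {}, "KeyError"),
]
print(sorted(group_by_reason(dlq)))  # ['KeyError', 'TimeoutError']


def fixed_handler(payload: dict) -> None:
    if "v" not in payload:
        raise KeyError("v")  # the fix handles only payloads that carry "v"


replayed, dlq = redrive(dlq, "KeyError", fixed_handler)
pages: List[str] = []
for entry in dlq:
    escalate(entry, pages.append)
```

After the re-drive, entry `1` is replayed, entry `2` is untouched because it failed for a different reason, and entry `3` stays in the DLQ and pages on-call because it is a high-stakes event type.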

Hooksbase moves terminally failed deliveries to a DLQ tier with its own dashboard, querying, and re-drive operations. For a practical recovery walkthrough, see Recover failed agent events with DLQ and replay. For the design framework, see Webhook DLQs: design and recovery patterns.

Related terms