# Operator Alerting

Operator alerting is the customer-visible incident layer for webhook delivery, quota, DLQ, and drain health. Use it when you need durable issue state and routed notifications instead of raw event firehoses.

**Available through**

Operator alerting is available in the dashboard and through public, project-authenticated API routes that require an admin key.

| Surface | Status | Notes |
| --- | --- | --- |
| [Dashboard](/docs/dashboard.md) | Preferred | Use the UI to configure channels, review incidents, and investigate dispatch history. |
| [Public API](/docs/api-reference.md#operator-alerting) | Available | Use admin project API keys for operator webhooks, channels, rules, incidents, and failure clusters. |
| [Raw HTTP](/docs/operator-alerting.md#api-examples) | Raw HTTP | The route family is public, but not yet wrapped by first-party clients. |
| [CLI](/docs/cli.md) | Not wrapped | The CLI does not expose operator alerting commands yet. |

## Alerting model

Operator alerting is available on Pro+ plans and is project-scoped.

The model has four main objects:

- operator webhooks for webhook-backed alert delivery
- alert channels for project-owned routing targets, including email channels
- fixed alert rules that bind incident families to channels
- incidents and dispatch history

Issue families include delivery failures, backlog growth, quota pressure, secret lifecycle events, DLQ accumulation, and drain degradation.

## Auth model

- **Public API** `admin project API key`: Operator alerting routes live under `/v1/project/...` and require a key with the admin role. A write key is not sufficient.
- **Dashboard** `session auth`: The [dashboard](/docs/dashboard.md) gives project members the same controls (channels, rules, and incidents) as a UI.
- **SDK / CLI**: The first-party SDK and CLI do not currently wrap operator alerting. Use raw HTTP.
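
As a minimal sketch of the admin-key requirement, the helper below wraps an authenticated `GET`. The environment variable `HOOKSBASE_ADMIN_KEY` and the function name `hb_admin_get` are assumptions made for this example, not documented names. The base URL and the route in the usage comment are taken from this page.

```bash
# Base URL as used in the API examples on this page.
HOOKSBASE_API="https://api.hooksbase.com"

# hb_admin_get PATH: GET a project route using an admin project API key.
# HOOKSBASE_ADMIN_KEY is an assumed variable name chosen for this sketch.
# If it is unset, the ${...:?} expansion aborts with the given message.
hb_admin_get() {
  curl -sS "${HOOKSBASE_API}$1" \
    -H "Authorization: Bearer ${HOOKSBASE_ADMIN_KEY:?set HOOKSBASE_ADMIN_KEY to an admin project API key}"
}

# Usage (needs network access and a real admin key):
#   export HOOKSBASE_ADMIN_KEY="swk_..."
#   hb_admin_get /v1/project/alert-rules
```

Reading the key from the environment keeps it out of shell history and scripts. A write key passed the same way would still be rejected, since these routes require the admin role.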

## Channels, rules, and incidents

Channel and rule basics:

- webhook-backed channels are managed through the operator-webhook API
- email channels are managed through the alert-channel API
- alert rules are a fixed set, one per incident family; you patch an existing rule rather than defining a new one from scratch
- enabling a rule or reactivating a channel can backfill matching open incidents
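
The split between the two management APIs can be sketched as a small lookup. The kind labels `webhook` and `email` and the function name `channel_route` are illustrative choices for this example; only the two route families come from this page.

```bash
# channel_route KIND: print the route family that manages a channel kind.
# "webhook" and "email" are labels invented for this sketch.
channel_route() {
  case "$1" in
    webhook) echo "/v1/project/operator-webhooks" ;;  # webhook-backed channels
    email)   echo "/v1/project/alert-channels" ;;     # email channels
    *)       echo "unknown channel kind: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   curl "https://api.hooksbase.com$(channel_route email)" \
#     -H "Authorization: Bearer swk_..."
```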

Incident lifecycle highlights:

- most issue families use `open` and `resolved`
- `secret_lifecycle` is edge-triggered and uses `occurred`
- incidents can be muted, unmuted, resolved, and reopened
- manual resolve suppresses immediate reopening until recovery is observed
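
The four manual incident actions map onto the `POST` routes listed under Related routes. The sketch below assumes those routes accept an empty request body, which this page does not confirm. The function name `incident_action`, the variable `HOOKSBASE_ADMIN_KEY`, and the incident ID `opi_123` are all hypothetical.

```bash
HOOKSBASE_API="https://api.hooksbase.com"

# incident_action ID ACTION: POST one lifecycle action for an incident.
# Assumption: the action routes need no request body.
incident_action() {
  # Reject anything other than the four actions this page documents.
  case "$2" in
    mute|unmute|resolve|reopen) ;;
    *) echo "unsupported action: $2" >&2; return 1 ;;
  esac
  curl -sS -X POST "${HOOKSBASE_API}/v1/project/operator-incidents/$1/$2" \
    -H "Authorization: Bearer ${HOOKSBASE_ADMIN_KEY:?set HOOKSBASE_ADMIN_KEY to an admin project API key}"
}

# Usage ("opi_123" is a made-up incident ID):
#   incident_action opi_123 mute
#   incident_action opi_123 resolve
```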

This complements [event drains](/docs/event-drains.md): alerting gives you durable incident state and fan-out, while drains give you the raw lifecycle stream.

## API examples

Use an admin project API key for operator alerting routes. These examples use raw HTTP because the first-party SDK and CLI do not wrap this surface yet.

**List alert channels**

```bash
curl https://api.hooksbase.com/v1/project/alert-channels \
  -H "Authorization: Bearer swk_..."
```

**Update one alert rule**

```bash
curl https://api.hooksbase.com/v1/project/alert-rules/terminal_failure_spike \
  -X PATCH \
  -H "Authorization: Bearer swk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "channelIds": ["alch_123"]
  }'
```

**List open incidents**

```bash
curl -G https://api.hooksbase.com/v1/project/operator-incidents \
  -H "Authorization: Bearer swk_..." \
  -d status=open \
  -d limit=20
```

## Dispatches, clusters, and audit adjacency

Dispatch phases include:

- `opened`
- `reminder`
- `resolved`
- `occurred`

Other operator views:

- per-channel dispatch history
- read-time failure clusters for recent failed deliveries and DLQ activity
- project audit logs on Business+ plans for adjacent control-plane investigation
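
The dispatch and cluster views can be read with plain `GET` requests against the routes listed under Related routes. This is a sketch only: the helper names and the `HOOKSBASE_ADMIN_KEY` variable are assumptions, and no query parameters or response fields are shown because this page does not document them.

```bash
HOOKSBASE_API="https://api.hooksbase.com"

# hb_get PATH: authenticated GET (helper name and env variable are assumed).
hb_get() {
  curl -sS "${HOOKSBASE_API}$1" \
    -H "Authorization: Bearer ${HOOKSBASE_ADMIN_KEY:?set HOOKSBASE_ADMIN_KEY to an admin project API key}"
}

# Dispatch history for one alert channel.
channel_dispatches() { hb_get "/v1/project/alert-channels/$1/dispatches"; }

# Dispatch history for one operator webhook.
webhook_dispatches() { hb_get "/v1/project/operator-webhooks/$1/dispatches"; }

# Read-time failure clusters for the project.
failure_clusters() { hb_get "/v1/project/operator-failure-clusters"; }

# Usage ("alch_123" is the placeholder channel ID from the rule example):
#   channel_dispatches alch_123
#   failure_clusters
```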

The legacy operator-webhook API still matters because webhook-backed channels use its lifecycle, signing-secret rotation, and secret-version ledger.
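
For a webhook-backed channel, rotating the signing secret and then checking the secret-version ledger might look like the sketch below. It assumes the rotate route accepts an empty body, which this page does not confirm. The function names, the `HOOKSBASE_ADMIN_KEY` variable, and the webhook ID `owh_123` are hypothetical.

```bash
HOOKSBASE_API="https://api.hooksbase.com"

# rotate_signing_secret ID: request a new signing secret for an operator webhook.
# Assumption: the route needs no request body.
rotate_signing_secret() {
  curl -sS -X POST "${HOOKSBASE_API}/v1/project/operator-webhooks/$1/rotate-signing-secret" \
    -H "Authorization: Bearer ${HOOKSBASE_ADMIN_KEY:?set HOOKSBASE_ADMIN_KEY to an admin project API key}"
}

# list_secret_versions ID: read the secret-version ledger for an operator webhook.
list_secret_versions() {
  curl -sS "${HOOKSBASE_API}/v1/project/operator-webhooks/$1/secret-versions" \
    -H "Authorization: Bearer ${HOOKSBASE_ADMIN_KEY:?set HOOKSBASE_ADMIN_KEY to an admin project API key}"
}

# Usage ("owh_123" is a made-up webhook ID):
#   rotate_signing_secret owh_123
#   list_secret_versions owh_123
```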

## Related routes

- `GET/POST /v1/project/operator-webhooks`
- `GET/PATCH /v1/project/operator-webhooks/{id}`
- `POST /v1/project/operator-webhooks/{id}/pause`
- `POST /v1/project/operator-webhooks/{id}/resume`
- `POST /v1/project/operator-webhooks/{id}/archive`
- `POST /v1/project/operator-webhooks/{id}/restore`
- `POST /v1/project/operator-webhooks/{id}/rotate-signing-secret`
- `GET /v1/project/operator-webhooks/{id}/secret-versions`
- `GET /v1/project/operator-webhooks/{id}/dispatches`
- `GET/POST /v1/project/alert-channels`
- `GET/PATCH /v1/project/alert-channels/{id}`
- `GET /v1/project/alert-channels/{id}/dispatches`
- `GET /v1/project/alert-rules`
- `GET /v1/project/alert-rules/{family}`
- `PATCH /v1/project/alert-rules/{family}`
- `GET /v1/project/operator-failure-clusters`
- `GET /v1/project/operator-incidents`
- `GET /v1/project/operator-incidents/{id}`
- `POST /v1/project/operator-incidents/{id}/mute`
- `POST /v1/project/operator-incidents/{id}/unmute`
- `POST /v1/project/operator-incidents/{id}/resolve`
- `POST /v1/project/operator-incidents/{id}/reopen`

## Common mistakes

- Using a write key and expecting access to `/v1/project/operator-...` routes.
- Treating alert rules as arbitrary custom rule builders instead of fixed incident families.
- Forgetting that webhook-backed channels are still managed through operator-webhook routes.
- Expecting raw event streaming from alerting instead of [event drains](/docs/event-drains.md).