Add retry and dead-letter queue for failed backend dispatches #2

Open
opened 2026-02-18 11:01:39 +13:00 by guisea · 0 comments

When a zone save or delete is dispatched to multiple backends in parallel, a partial failure (one backend succeeds, another fails) is currently logged and discarded. There is no retry mechanism and no way to recover the failed operation without manual intervention.

Current behaviour:

  • `_process_backends_parallel` / `_delete_single_backend` log errors and move on
  • The queue item is marked done regardless of backend success/failure
  • A BIND failure while CoreDNS succeeds leaves DNS state inconsistent across backends

Proposed solution:

  1. Track per-backend success/failure in the dispatch result
  2. On partial failure, re-queue the item with a retry counter and a list of failed backends
  3. After N retries (configurable, default 3), move the item to a dead-letter queue file for manual review
  4. Expose dead-letter queue length in the `/health` and `queue_status` endpoints
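The four steps above could fit together roughly as follows. This is a minimal sketch under stated assumptions, not directdnsonly's code. `DispatchItem`, `DispatchResult`, `dispatch`, `handle_result`, `dead_letter_length`, the callable backend interface and the JSON-lines dead-letter file format are all hypothetical. It also uses a standard-library `ThreadPoolExecutor` as a stand-in for whatever `_process_backends_parallel` actually does.

```python
import json
import queue
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class DispatchItem:
    """Hypothetical queue item: one zone operation plus its retry state."""
    zone: str
    action: str                      # "save" or "delete"
    retries: int = 0                 # retries already attempted
    failed_backends: List[str] = field(default_factory=list)  # empty = all backends


@dataclass
class DispatchResult:
    """Per-backend outcome of one dispatch (step 1)."""
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # backend name -> error text


# Hypothetical backend interface: a callable that raises on failure.
Backend = Callable[[DispatchItem], None]


def dispatch(item: DispatchItem, backends: Dict[str, Backend]) -> DispatchResult:
    """Run the item against its target backends in parallel, recording each outcome."""
    # A re-queued item only goes to the backends that failed last time.
    targets = item.failed_backends or list(backends)
    result = DispatchResult()
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {pool.submit(backends[name], item): name for name in targets}
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()
                result.succeeded.append(name)
            except Exception as exc:  # any backend error counts as a failure
                result.failed[name] = str(exc)
    return result


def handle_result(item: DispatchItem, result: DispatchResult,
                  work_queue: "queue.Queue[DispatchItem]",
                  dead_letter_path: Path, max_retries: int = 3) -> str:
    """Decide what happens to an item after dispatch (steps 2 and 3)."""
    if not result.failed:
        return "done"
    if item.retries < max_retries:
        # Step 2: re-queue with an incremented counter and only the failed backends.
        work_queue.put(DispatchItem(item.zone, item.action, item.retries + 1,
                                    sorted(result.failed)))
        return "requeued"
    # Step 3: retries exhausted, so append one JSON line to the dead-letter file.
    record = {"zone": item.zone, "action": item.action,
              "retries": item.retries, "errors": result.failed}
    with dead_letter_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return "dead-lettered"


def dead_letter_length(dead_letter_path: Path) -> int:
    """Step 4: number of dead-lettered items, e.g. for /health and queue_status."""
    if not dead_letter_path.exists():
        return 0
    with dead_letter_path.open(encoding="utf-8") as fh:
        return sum(1 for line in fh if line.strip())


if __name__ == "__main__":
    import tempfile

    def coredns(item: DispatchItem) -> None:
        pass  # simulated backend that always succeeds

    def bind(item: DispatchItem) -> None:
        raise RuntimeError("simulated BIND failure")  # always fails

    backends = {"coredns": coredns, "bind": bind}
    with tempfile.TemporaryDirectory() as tmp:
        dlq = Path(tmp) / "dead_letter.jsonl"
        q: "queue.Queue[DispatchItem]" = queue.Queue()
        q.put(DispatchItem("example.com", "save"))
        while not q.empty():
            current = q.get()
            outcome = handle_result(current, dispatch(current, backends), q, dlq)
            print(current.retries, outcome)
            q.task_done()
        print("dead-letter length:", dead_letter_length(dlq))
```

A re-queued item carries only the failed backends, so a backend that already succeeded (CoreDNS in the demo) is not written to again on each retry. With a limit of 3, an item gets one initial attempt plus three retries before it is dead-lettered.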

Config addition:

```yaml
app:
  max_retries: 3
```
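Reading the new setting could look like the sketch below. It assumes the YAML file has already been parsed into a plain dict, and `get_max_retries` is a hypothetical helper name. A bare `app:` key with no children parses to null in YAML, which is why the sketch guards against `None`.

```python
from typing import Any, Mapping

DEFAULT_MAX_RETRIES = 3  # default proposed in this issue


def get_max_retries(config: Mapping[str, Any]) -> int:
    """Return app.max_retries from a parsed config mapping, defaulting to 3."""
    # `app:` with no children parses to None, so treat it like a missing section.
    app = config.get("app") or {}
    value = app.get("max_retries", DEFAULT_MAX_RETRIES)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"app.max_retries must be a non-negative integer, got {value!r}")
    return value
```

Accepting `0` lets an operator disable retries entirely; whether that should be allowed is left open by the issue.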
guisea added the enhancement label 2026-02-18 11:01:39 +13:00
Reference: cybercinch/directdnsonly#2