Compare commits

...

14 Commits

Author SHA1 Message Date
fbb6220728 feat: add NSD backend and Topology C (multi-instance with peer sync) 🏗️
- New NSDBackend: zone files + nsd-control reload, zone registration via
  nsd.conf.d include file; mirrors BIND backend interface exactly
- BackendRegistry now supports type "nsd"; config defaults for nsd.zones_dir
  and nsd.nsd_conf
- Dockerfile installs both NSD and BIND9 — entrypoint detects configured
  backend type(s) and starts only the required daemon; CoreDNS MySQL
  deployments start neither
- docker/nsd.conf: minimal NSD base config with remote-control and
  zones.conf include
- entrypoint.sh: reads config file + env vars to determine which daemon
  to start; runs nsd-control-setup on first boot
- 20 new NSD backend tests (117 total, all passing)
- README: Topology C (multi-instance + peer sync) documented as most robust
  HA option; NSD config reference; updated topology comparison table;
  NSD env-var-only compose examples; version 2.5.0
2026-02-20 06:29:39 +13:00
f9907d2859 chore: complete SQLAlchemy 2.0 migration in coredns_mysql backend and tests ⬆️
Migrate remaining session.query() calls in coredns_mysql.py to
select()/session.execute() style; update bulk delete to delete()
construct and count to func.count(); drop sessionmaker(bind=).
Update test fixtures and assertions to match.

Zero session.query() calls remaining across the entire codebase.
2026-02-19 23:43:54 +13:00
d81ecd6bdd fix: migrate remaining session.query() calls to SQLAlchemy 2.0 select() 🔧 2026-02-19 23:38:31 +13:00
8c1c2b4abc chore: upgrade SQLAlchemy to 2.0 and bump all stale deps ⬆️
- SQLAlchemy 1.4 → 2.0.46: migrate all session.query() calls to
  select() / session.execute() style; move declarative_base import
  from ext.declarative to sqlalchemy.orm; explicit conn.commit()
  after DDL in _migrate(); drop sessionmaker(bind=) keyword
- persist-queue 1.0 → 1.1, pymysql 1.1.1 → 1.1.2,
  dnspython 2.7 → 2.8, pyyaml 6.0.2 → 6.0.3
- pytest 8.3 → 9.0.2, pytest-cov 6.1 → 7.0,
  pytest-mock 3.14 → 3.15.1, black 25.1 → 26.1

97 tests pass, zero deprecation warnings
2026-02-19 23:37:15 +13:00
22e64498ce chore: bump version to 2.4.0 🚀 2026-02-19 22:20:28 +13:00
143cf9c792 feat: add peer sync worker for zone_data exchange between nodes 🔄
Adds optional peer-to-peer zone_data replication between directdnsonly
instances. Enables eventual consistency in DA Multi-Server topologies
without a shared datastore.

- InternalAPI: GET /internal/zones (list) and ?domain= (detail)
  exposes zone_data to peers via existing basic auth
- PeerSyncWorker: interval-based daemon thread that fetches zone_data
  from configured peers, storing newer entries locally; peer downtime
  is silently skipped and retried next interval
- WorkerManager: wires PeerSyncWorker alongside reconciler; exposes
  peer_syncer_alive in queue_status
- Config: peer_sync block with enabled/interval_minutes/peers[]
- Tests: 13 tests covering sync, skip-older, skip-unreachable, empty
  peer list, bad status, and missing zone_data scenarios
2026-02-19 22:16:55 +13:00
33f4f30b5f feat: add initial_delay_minutes to reconciler for LB stagger 🕐
Configurable startup delay before the first reconciliation pass so that
multiple receivers behind a load balancer can be offset without relying
on container start order (which is lost on reboot). Set to half the
interval on the secondary receiver — e.g. interval 60m → delay 30m.
Default is 0 (no change to existing behaviour). Stop event is respected
during the delay so the worker shuts down cleanly even mid-wait.
2026-02-19 15:28:30 +13:00
b939bb5fa0 docs: add DNS server resource and scale guide with NSD/Knot comparison 📊
Cover memory profiles, zone-count thresholds, reload behaviour, and
throughput characteristics for BIND9, CoreDNS MySQL, NSD, and Knot DNS.
Call out NSD as the recommended lighter bundled alternative to BIND9
(~5-10 MB base, near-identical zone file format, same reload semantics)
and note the ~300-zone crossover where CoreDNS MySQL starts to win.
2026-02-19 14:48:10 +13:00
70ae81ee0d docs: rewrite topology comparison with accurate failure-mode analysis 📋
Expand both topology diagrams to show the retry queue and healing pass in
the flow. Add per-topology failure-behaviour tables covering transient backend
failure, prolonged outage, container-down-during-push, and cross-node drift.
Rewrite the comparison table to call out the key architectural difference:
Topology A has no auto-recovery from prolonged BIND failure (needs next DA push);
Topology B's reconciler healing pass re-syncs missing backends from stored
zone_data without any DA involvement.
2026-02-19 14:17:53 +13:00
b523b17f30 feat: retry queue, backend healing, and zone_data persistence 🔁
- worker.py: third persistent retry queue with exponential backoff (30s→30m,
  max 5 attempts); failed backends tracked per-item so retries target only the
  failing nodes; zone_data stored in DB after every successful write
- Domain model: zone_data TEXT + zone_updated_at DATETIME columns; additive
  migration applied on startup so existing deployments upgrade in place
- ReconciliationWorker: Option C healing pass — checks every configured backend
  for zone presence after each reconciliation cycle and re-queues any zone
  missing from a backend using stored zone_data, enabling automatic recovery
  from prolonged backend outages without waiting for DirectAdmin to re-push
- 82 tests, all passing
2026-02-19 14:05:22 +13:00
0e044b7dc2 chore: remove unimplemented PowerDNS MySQL backend 🗑️
Dead code from v1 planning — never implemented, superseded by the
CoreDNS MySQL backend. Also carried a broken stale import that would
have caused an ImportError on load.
2026-02-19 12:24:30 +13:00
e0a119558d refactor: extract DirectAdminClient into directdnsonly.app.da module 🏗️
Move all outbound DirectAdmin HTTP logic out of ReconciliationWorker and
into a dedicated, independently testable DirectAdminClient class:

- directdnsonly/app/da/client.py: list_domains (paginated JSON + legacy
  fallback), get (authenticated GET to any CMD_* endpoint), _login
  (DA Evo session-cookie fallback), _parse_legacy_domain_list
- directdnsonly/app/da/__init__.py: public re-export of DirectAdminClient
- reconciler.py: now purely reconciliation logic; instantiates a client
  per configured server — no HTTP code remaining
- tests/test_da_client.py: 16 dedicated tests for DirectAdminClient
- tests/test_reconciler.py: mocks at the DirectAdminClient class boundary
  instead of the internal _fetch_da_domains method

Bumped to 2.2.0 — DirectAdminClient is now a first-class public API.
2026-02-19 12:16:22 +13:00
ae1e89a236 feat: conditional BIND startup; config search path priority fix 🔧
- entrypoint: only start named when a bind backend is configured and
  enabled in app.yml; CoreDNS-only deployments skip named entirely
- config: user-supplied paths (/etc/directdnsonly, ./config) now
  searched before the bundled app.yml so mounted configs take effect
- docs: deployment topology reference — Topology A (dual BIND HA) and
  Topology B (single instance, multi-DC CoreDNS MySQL)
- chore: bump version to 2.1.0
- justfile: add build-docker recipe
2026-02-19 12:07:37 +13:00
aac7b365a5 fix: remove stale COPY config from Dockerfile 🐛
Root config/ directory was removed when the duplicate config/app.yml was
deleted — the canonical config is now bundled inside directdnsonly/config/
and is already covered by the existing COPY directdnsonly step.
2026-02-18 23:16:52 +13:00
30 changed files with 3368 additions and 1452 deletions

View File

@@ -1,16 +1,22 @@
 FROM python:3.11.12-slim

-# Install system dependencies
-RUN apt-get update && apt-get install -y \
+# Install system dependencies.
+# Both NSD and BIND are installed so the image works with any DNS backend type.
+# The entrypoint detects which one is configured and starts only that daemon.
+# CoreDNS MySQL users: neither daemon is started — the image is still usable.
+RUN apt-get update && apt-get install -y --no-install-recommends \
     bind9 \
     bind9utils \
+    nsd \
     dnsutils \
     gcc \
     python3-dev \
     default-libmysqlclient-dev \
     && rm -rf /var/lib/apt/lists/*

-# Configure BIND
+# ---------------------------------------------------------------------------
+# BIND setup
+# ---------------------------------------------------------------------------
 RUN mkdir -p /etc/named/zones && \
     chown -R bind:bind /etc/named && \
     chmod 755 /etc/named/zones
@@ -19,32 +25,34 @@ COPY docker/named.conf.local /etc/bind/
 COPY docker/named.conf.options /etc/bind/
 RUN chown root:bind /etc/bind/named.conf.*

-# Install Python dependencies
+# ---------------------------------------------------------------------------
+# NSD setup
+# ---------------------------------------------------------------------------
+RUN mkdir -p /etc/nsd/zones /etc/nsd/nsd.conf.d && \
+    chown -R nsd:nsd /etc/nsd && \
+    chmod 755 /etc/nsd/zones
+
+COPY docker/nsd.conf /etc/nsd/nsd.conf
+RUN chown nsd:nsd /etc/nsd/nsd.conf
+
+# ---------------------------------------------------------------------------
+# Application
+# ---------------------------------------------------------------------------
 WORKDIR /app
 COPY pyproject.toml poetry.lock README.md ./

-# Install specific Poetry version that matches your lock file
-RUN pip install "poetry==2.1.2" # Adjust version to match your lock file
+RUN pip install "poetry==2.1.2"

-# Copy application files
 COPY directdnsonly ./directdnsonly
-COPY config ./config
 COPY schema ./schema

 RUN poetry config virtualenvs.create false && \
     poetry install

 # Create data directories
-RUN mkdir -p /app/data/queues && \
-    mkdir -p /app/data/zones && \
-    mkdir -p /app/logs && \
+RUN mkdir -p /app/data/queues /app/data/zones /app/logs && \
     chmod -R 755 /app/data

-# Configure BIND zone directory to match app config
-#RUN ln -s /app/data/zones /etc/named/zones/dadns
-
 # Start script
 COPY docker/entrypoint.sh /entrypoint.sh
 RUN chmod +x /entrypoint.sh

592
README.md
View File

@@ -1,10 +1,340 @@
# DirectDNSOnly - DNS Management System
## Deployment Topologies
Three reference topologies are documented below. Choose the one that matches your infrastructure.
---
### Topology A — Dual NSD/BIND Instances (High-Availability / Multi-Server)
Two independent DirectDNSOnly containers, each running a bundled DNS daemon (NSD by default, or BIND9). Both are registered as Extra DNS servers in the same DirectAdmin Multi-Server environment, so DA pushes every zone change to both simultaneously.
```
DirectAdmin Multi-Server
├─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-1 (container, BIND backend)
│ │
│ Persistent Queue
│ ├─ writes zone file
│ ├─ reloads named
│ └─ retry on failure (exp. backoff)
│ (serves authoritative DNS on :53)
└─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-2 (container, BIND backend)
Persistent Queue
├─ writes zone file
├─ reloads named
└─ retry on failure (exp. backoff)
(serves authoritative DNS on :53)
```
**Each instance is completely independent** — no shared state, no cross-talk. Redundancy comes from DA pushing to both. If one container goes down, DA continues to push to the other.
#### Failure behaviour
| Scenario | What happens |
|---|---|
| One container down during DA push | DA cannot deliver; that instance misses the update. The retry queue inside that instance cannot help — the push never arrived. When the container recovers, it will serve stale zone data until DA re-pushes (next zone change triggers a new push). |
| BIND crashes but container stays up | The zone write lands in the persistent queue. The retry worker replays it with exponential backoff (30 s → 2 m → 5 m → 15 m → 30 m, up to 5 attempts). |
| Zone deleted from DA while instance was down | The reconciliation poller detects the orphan on the next pass and queues a delete, keeping the BIND instance clean without manual intervention. |
| Two instances diverge | No automatic cross-instance sync. Drift persists until DA re-pushes the affected zone (i.e. the next time that domain is touched in DA). |
> **DNS consistency note:** DirectAdmin pushes to each Extra DNS server sequentially, not atomically. If one instance is offline when a zone is changed, that instance will serve stale data until the next DA push for that zone. For workloads where split-brain DNS is unacceptable, use Topology B (single write path → multiple MySQL backends) instead.
#### `config/app.yml` — instance 1
```yaml
app:
  auth_username: directdnsonly
  auth_password: your-secret
dns:
  default_backend: bind
  backends:
    bind:
      type: bind
      enabled: true
      zones_dir: /etc/named/zones
      named_conf: /etc/bind/named.conf.local
```
#### `docker-compose.yml` sketch — instance 1
```yaml
services:
  directdnsonly-1:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2222:2222"   # DA pushes here
      - "53:53/udp"   # authoritative DNS
    volumes:
      - ./config:/app/config
      - ./data:/app/data
```
Register both containers as separate Extra DNS entries in DA → DNS Administration → Extra DNS Servers, with the same credentials configured in each `config/app.yml`.
---
### Topology B — Single Instance, Multiple CoreDNS MySQL Backends (Multi-DC)
One DirectDNSOnly instance receives zone pushes from DirectAdmin and fans out to two (or more) CoreDNS MySQL databases in parallel. CoreDNS servers in each data centre read from their local database. The directdnsonly instance is the sole write path — it does **not** serve DNS itself.
```
DirectAdmin
└─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly (single container)
Persistent Queue (survives restarts)
zone_data stored to SQLite after each write
ThreadPoolExecutor (one thread per backend)
│ │
▼ ▼
coredns_mysql_dc1 coredns_mysql_dc2
(MySQL 10.0.0.80) (MySQL 10.0.1.29)
│ │
[success] [failure → retry queue]
│ │
▼ 30s/2m/5m/15m/30m backoff
CoreDNS (DC1) retry → coredns_mysql_dc2
serves :53 from DB
Reconciliation poller (every N minutes)
├─ orphan detection (zones removed from DA)
└─ healing pass: zone_exists() per backend
→ re-queue any backend missing a zone
using stored zone_data (no DA re-push needed)
```
Both MySQL backends are written **concurrently** within the same zone update. A slow or unreachable secondary does not block the primary write. Failed backends enter the retry queue automatically. The reconciliation healing pass provides a further safety net for prolonged outages.
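As a rough sketch of the fan-out step (function name and return shape are illustrative; the real logic lives in `worker.py`, which also persists `zone_data` and the per-item failed-backend list):

```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_zone_update(backends, zone_name, zone_data):
    """Write one zone to every enabled backend concurrently (sketch).

    Returns the backends that failed, so the caller can queue a retry item
    targeting only those backends.
    """
    def push(backend):
        ok = backend.write_zone(zone_name, zone_data) and backend.reload_zone(zone_name)
        return backend, ok

    failed = []
    with ThreadPoolExecutor(max_workers=len(backends)) as pool:
        for backend, ok in pool.map(push, backends):
            if not ok:
                # instance name if the backend exposes one, else the type name
                failed.append(getattr(backend, "instance_name", backend.get_name()))
    return failed
```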
#### Failure behaviour
| Scenario | What happens |
|---|---|
| One MySQL backend unreachable | Other backend(s) succeed immediately. Failed backend queued for retry with exponential backoff (30 s → 2 m → 5 m → 15 m → 30 m, up to 5 attempts). |
| MySQL backend down for hours | Retry queue exhausts. On recovery, the reconciliation healing pass detects the backend is missing zones and re-pushes all of them using stored `zone_data` — no DA intervention required. |
| directdnsonly container restarts | Persistent queue survives. In-flight zone updates replay on startup. |
| directdnsonly container down during DA push | DA cannot deliver. Persistent queue on disk is intact; when the container comes back, it resumes processing any previously queued items. New pushes during downtime are lost at the DA level (DA does not retry). |
| Zone deleted from DA | Reconciliation poller detects orphan and queues delete across all backends. |
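The healing pass itself is simple in outline. A sketch, assuming `enqueue` stands in for placing a targeted item on the persistent queue (the real implementation is in `ReconciliationWorker`):

```python
from sqlalchemy import select
from directdnsonly.app.db.models import Domain

def healing_pass(session, backends, enqueue):
    """Re-queue any zone a backend is missing, using stored zone_data (sketch)."""
    domains = session.execute(
        select(Domain).where(Domain.zone_data.isnot(None))
    ).scalars().all()
    for backend in backends:
        for d in domains:
            if not backend.zone_exists(d.domain):
                # Replay from the locally stored copy; no DA re-push required.
                enqueue(domain=d.domain, zone_data=d.zone_data, backends=[backend])
```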
#### `config/app.yml`
```yaml
app:
  auth_username: directdnsonly
  auth_password: your-secret
dns:
  default_backend: coredns_mysql_dc1
  backends:
    coredns_mysql_dc1:
      type: coredns_mysql
      enabled: true
      host: 10.0.0.80
      port: 3306
      database: coredns
      username: coredns
      password: your-db-password
    coredns_mysql_dc2:
      type: coredns_mysql
      enabled: true
      host: 10.0.1.29
      port: 3306
      database: coredns
      username: coredns
      password: your-db-password
```
Adding a third data centre is a single stanza in the config — no code changes required.
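For example, a hypothetical third stanza under `dns.backends` (host and password are placeholders):

```yaml
    coredns_mysql_dc3:
      type: coredns_mysql
      enabled: true
      host: 10.0.2.15
      port: 3306
      database: coredns
      username: coredns
      password: your-db-password
```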
---
### Topology C — Multi-Instance with Peer Sync (Most Robust)
Multiple independent DirectDNSOnly containers, each with a single local DNS backend (NSD or CoreDNS MySQL), registered as separate Extra DNS servers in DirectAdmin Multi-Server. Peer sync provides eventual consistency — if one instance misses a DA push while it is offline, it recovers the missing zone data from a peer on the next sync interval.
```
DirectAdmin Multi-Server
├─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-syd (NSD or CoreDNS MySQL)
│ │
│ Persistent Queue + zone_data store
│ ├─ writes zone file / MySQL
│ ├─ reloads daemon
│ └─ retry on failure
│ │
│ ◀──── peer sync ────▶
│ │
└─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-mlb (NSD or CoreDNS MySQL)
Persistent Queue + zone_data store
├─ writes zone file / MySQL
├─ reloads daemon
└─ retry on failure
```
**Why this is the most robust topology:**
- DA pushes to each instance independently — no single point of failure
- No load balancer in the write path — a dead LB cannot silence both instances
- Each instance serves DNS immediately from its own daemon
- If SYD misses a push while offline, it pulls the newer zone from MLB on the next peer sync (default 15 minutes)
- Peer sync is best-effort eventual consistency — deliberately simple, no consensus protocol
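A minimal sketch of one sync pass against a single peer, assuming the `/internal/zones` endpoints shown later in this diff (helper name and return shape are illustrative; the real logic lives in `PeerSyncWorker`):

```python
import requests

def sync_from_peer(peer, local_updated_at):
    """Return zone payloads that are newer on the peer than locally (sketch).

    `peer` carries url/username/password from peer_sync.peers;
    `local_updated_at` maps domain -> local zone_updated_at ISO string (or None).
    """
    auth = (peer["username"], peer["password"])
    base = f"{peer['url']}/internal/zones"
    try:
        resp = requests.get(base, auth=auth, timeout=10)
        resp.raise_for_status()
    except requests.RequestException:
        return []  # unreachable peer or bad status: skip, retry next interval

    newer = []
    for entry in resp.json():
        remote_ts = entry.get("zone_updated_at")
        local_ts = local_updated_at.get(entry["domain"])
        # ISO 8601 timestamps with a common offset compare correctly as strings
        if remote_ts and (local_ts is None or remote_ts > local_ts):
            detail = requests.get(base, params={"domain": entry["domain"]},
                                  auth=auth, timeout=10)
            if detail.ok:
                newer.append(detail.json())
    return newer
```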
#### Failure behaviour
| Scenario | What happens |
|---|---|
| One instance down during DA push | Other instance(s) receive and serve the update. When the downed instance recovers, peer sync detects the stale/missing `zone_updated_at` and pulls the newer zone data from a peer. |
| Both instances down during DA push | Both miss the push. When they recover, they sync from each other — the most recently updated peer wins per zone. No DA re-push needed. |
| Peer offline | Peer sync silently skips unreachable peers. Syncs resume automatically when the peer recovers. |
| Zone deleted from DA | Reconciliation poller detects the orphan and queues the delete on each instance independently. |
#### `config/app.yml` — instance syd
```yaml
app:
  auth_username: directdnsonly
  auth_password: your-secret
dns:
  default_backend: nsd
  backends:
    nsd:
      type: nsd
      enabled: true
      zones_dir: /etc/nsd/zones
      nsd_conf: /etc/nsd/nsd.conf.d/zones.conf
peer_sync:
  enabled: true
  interval_minutes: 15
  peers:
    - url: http://directdnsonly-mlb:2222
      username: directdnsonly
      password: your-secret
reconciliation:
  enabled: true
  interval_minutes: 60
  directadmin_servers:
    - hostname: da.syd.example.com
      port: 2222
      username: admin
      password: da-secret
      ssl: true
```
Register each container as a separate Extra DNS server entry in DA → DNS Administration → Extra DNS Servers with the same credentials.
---
### Topology Comparison
| | Topology A — Dual NSD/BIND | Topology B — CoreDNS MySQL | Topology C — Multi-Instance + Peer Sync |
|---|---|---|---|
| **DNS server** | NSD or BIND9 (bundled) | CoreDNS (separate, reads MySQL) | NSD or CoreDNS MySQL (per instance) |
| **Write path** | DA → each instance independently | DA → single instance → all backends | DA → each instance independently |
| **Zone storage** | Zone files on container disk | MySQL database rows | Zone files or MySQL + SQLite zone_data store |
| **DA registration** | Two Extra DNS server entries | One Extra DNS server entry | One entry per instance |
| **Redundancy model** | Independent app+DNS units | One app, N database backends | Independent instances + peer sync |
| **Transient backend failure** | Retry queue (exp. backoff, 5 attempts) | Retry queue (exp. backoff, 5 attempts) | Retry queue (exp. backoff, 5 attempts) |
| **Prolonged backend outage** | No auto-recovery — waits for next DA push | Reconciler healing pass re-pushes all missing zones | Peer sync pulls missed zones from a healthy peer |
| **Container down during push** | Zone missed entirely | Zone missed at DA level | Zone missed at DA level; recovered via peer sync |
| **Cross-node consistency** | No sync between instances | All backends share same write path | Peer sync provides eventual consistency |
| **Orphan detection** | Yes — reconciler | Yes — reconciler | Yes — reconciler (per instance) |
| **External DB required** | No | Yes (MySQL per CoreDNS node) | No (NSD) or Yes (CoreDNS MySQL) |
| **Horizontal scaling** | Add DA Extra DNS entries + containers | Add backend stanzas in config | Add DA Extra DNS entries + containers + peer list |
| **Best for** | Simple HA, no external DB | Multi-DC, stronger consistency | Most robust HA — survives extended outages without DA re-push |
---
## DNS Server Resource and Scale Guide
### BIND9 vs CoreDNS MySQL — resource profile
| | BIND9 (bundled) | CoreDNS + MySQL |
|---|---|---|
| **Base memory** | ~13–15 MB | ~20–30 MB (CoreDNS binary) + MySQL process |
| **Per-zone overhead** | ~300 bytes per resource record in memory | Schema rows in MySQL; CoreDNS itself holds no zone state |
| **100-zone deployment** | ~30–60 MB total | ~80–150 MB (CoreDNS + MySQL combined) |
| **500-zone deployment** | ~100–300 MB total | ~100–200 MB (zone data lives in MySQL, not CoreDNS) |
| **Zone reload** | `rndc reload <zone>` — per-zone is fast; full reload blocks queries for seconds at large counts | No reload needed — CoreDNS queries MySQL at resolution time |
| **Zone update latency** | File write + `rndc reload` — typically <100 ms for a single zone | Write to MySQL — immediately visible to CoreDNS on next query |
| **CPU on reload** | Spikes on full `rndc reload`; grows linearly with zone count | No reload CPU spike; MySQL write is the only cost |
| **Query throughput** | High — zones loaded into memory | Slightly lower — each query hits MySQL (mitigated by MySQL query cache / connection pooling) |
| **Scale ceiling** | Degrades past ~1 000 zones: memory climbs, full reloads take 120 s+ | Scales with MySQL — thousands of zones with no DNS-process impact |
**Rule of thumb:** Below ~300 zones BIND9 and CoreDNS MySQL are broadly comparable. Above ~500 zones, CoreDNS MySQL has a significant advantage because zone data lives entirely in the database — adding a new zone costs one MySQL INSERT, not a daemon reload.
---
### Bundled DNS daemons — NSD and BIND9
The container image ships with **both NSD and BIND9** installed. The entrypoint reads your config and starts only the daemon that matches the configured backend type. CoreDNS MySQL deployments start neither.
**NSD (Name Server Daemon)** from NLnet Labs is the default recommendation:
| | BIND9 | NSD | Knot DNS |
|---|---|---|---|
| **Design focus** | Everything (authoritative + recursive + DNSSEC + ...) | Authoritative only | Authoritative only |
| **Base memory** | ~13–15 MB | ~5–10 MB | ~10–15 MB |
| **500-zone memory** | ~100–300 MB | <100 MB (estimated) | ~100–200 MB (3× zone text size) |
| **Zone update** | `rndc reload <zone>` | `nsd-control reload` | `knotc zone-reload` (atomic via RCU — zero query interruption) |
| **Config format** | `named.conf` / zone files | `nsd.conf` / zone files (nearly identical format) | `knot.conf` / zone files |
| **Docker image** | ~150–200 MB | ~30–50 MB Alpine | ~40–60 MB Alpine |
| **Recursive queries** | Yes (if configured) | No | No |
| **Throughput** | Baseline | ~2–5× BIND9 | ~5–10× BIND9 (2.2 Mqps at 32 cores) |
| **Production use** | Wide adoption | TLD servers (`.nl`, `.se`), major registries | CZ.NIC, Cloudflare internal testing |
**NSD** slots in almost directly alongside the existing BIND backend implementation — zone files have the same RFC 1035 format, and `nsd-control reload` is the equivalent of `rndc reload`. The main implementation differences are the daemon config file (`nsd.conf` vs `named.conf`) and the absence of `named.conf.local`-style zone includes (NSD uses pattern-based config).
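For reference, the per-zone stanza the bundled NSD backend appends to its managed include file (see the `NSDBackend` code later in this diff) looks like:

```
zone:
    name: "example.com"
    zonefile: "/etc/nsd/zones/example.com.db"
```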
**Knot DNS** is worth considering if seamless zone updates matter: its RCU (Read-Copy-Update) mechanism serves the old zone to in-flight queries while atomically swapping in the new one — there is no window where queries see a partially-loaded zone. It is meaningfully heavier than NSD at moderate zone counts but the best performer at high scale.
**Summary recommendation:**
- **Up to ~300 zones, no external DB:** Use the NSD backend (bundled) — lighter, faster, authoritative-only, same zone file format as BIND.
- **300–1,000+ zones:** CoreDNS MySQL wins — zone data in MySQL means no daemon reload at all.
- **Need zero-interruption zone swaps:** Knot DNS.
- **Need an HTTP API for zone management (no file I/O):** PowerDNS Authoritative with its native HTTP API and file/SQLite backend.
---
## CoreDNS MySQL Backend — Required Fork
The `coredns_mysql` backend writes zones to a MySQL database that CoreDNS reads
at query time. **Vanilla CoreDNS with a stock MySQL plugin is not sufficient**:
out of the box it does not act as a fully authoritative server, does not return
NS records in the additional section, does not set the AA flag, and does not
handle wildcard records.
This project is designed to work with a patched fork that resolves all of those
issues:
**[cybercinch/coredns_mysql_extend](https://github.com/cybercinch/coredns_mysql_extend)**
Key differences from the upstream plugin:
- Fully authoritative responses — correct AA flag and NXDOMAIN on misses
- Wildcard record support (`*` entries served correctly)
- NS records returned in the additional section
Use the BIND backend if you want a zero-dependency setup with no custom CoreDNS
build required.
---
## Features
- Multi-backend DNS management (NSD, BIND, CoreDNS MySQL)
- Parallel backend dispatch — all enabled backends updated simultaneously
- Persistent queue — zone updates survive restarts
- Automatic record-count verification and drift reconciliation
- Peer sync — eventual consistency between directdnsonly instances
- Thread-safe operations
- Loguru-based logging
@@ -16,7 +346,7 @@
## Concurrent Multi-Backend Processing
DirectDNSOnly propagates every zone update to all enabled backends in parallel using a
queue-based worker architecture.
### Architecture
@@ -91,33 +421,249 @@ dns:
## Configuration
DirectDNSOnly uses [Vyper](https://github.com/sn3d/vyper-py) for configuration. Settings are resolved in this priority order (highest wins):
1. **Environment variables** — `DADNS_` prefix, dots replaced with underscores (e.g. `DADNS_APP_AUTH_PASSWORD`)
2. **Config file** — `app.yml` searched in `/etc/directdnsonly`, `.`, `./config`, then the bundled default
3. **Built-in defaults** (shown in the table below)
**A config file is entirely optional.** Every scalar setting can be provided through environment variables alone.
---
### Configuration Reference
#### Core
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `log_level` | `DADNS_LOG_LEVEL` | `info` | Log verbosity: `debug`, `info`, `warning`, `error` |
| `timezone` | `DADNS_TIMEZONE` | `Pacific/Auckland` | Timezone for log timestamps |
| `queue_location` | `DADNS_QUEUE_LOCATION` | `./data/queues` | Path for the persistent zone-update queue |
#### App (HTTP server)
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `app.auth_username` | `DADNS_APP_AUTH_USERNAME` | `directdnsonly` | Basic auth username for all API routes (including `/internal`) |
| `app.auth_password` | `DADNS_APP_AUTH_PASSWORD` | `changeme` | Basic auth password — **always override in production** |
| `app.listen_port` | `DADNS_APP_LISTEN_PORT` | `2222` | TCP port the HTTP server binds to |
| `app.ssl_enable` | `DADNS_APP_SSL_ENABLE` | `false` | Enable TLS on the HTTP server |
| `app.proxy_support` | `DADNS_APP_PROXY_SUPPORT` | `true` | Trust `X-Forwarded-For` from a reverse proxy |
| `app.proxy_support_base` | `DADNS_APP_PROXY_SUPPORT_BASE` | `http://127.0.0.1` | Trusted proxy base address |
#### Datastore (internal SQLite)
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `datastore.type` | `DADNS_DATASTORE_TYPE` | `sqlite` | Internal datastore type (only `sqlite` supported) |
| `datastore.db_location` | `DADNS_DATASTORE_DB_LOCATION` | `data/directdns.db` | Path to the SQLite database file |
#### DNS backends — BIND
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `dns.default_backend` | `DADNS_DNS_DEFAULT_BACKEND` | _(none)_ | Name of the primary backend (used for status/health reporting) |
| `dns.backends.bind.enabled` | `DADNS_DNS_BACKENDS_BIND_ENABLED` | `false` | Enable the bundled BIND9 backend |
| `dns.backends.bind.zones_dir` | `DADNS_DNS_BACKENDS_BIND_ZONES_DIR` | `/etc/named/zones` | Directory where zone files are written |
| `dns.backends.bind.named_conf` | `DADNS_DNS_BACKENDS_BIND_NAMED_CONF` | `/etc/named.conf.local` | `named.conf` include file managed by directdnsonly |
#### DNS backends — NSD
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `dns.backends.nsd.enabled` | `DADNS_DNS_BACKENDS_NSD_ENABLED` | `false` | Enable the NSD backend |
| `dns.backends.nsd.zones_dir` | `DADNS_DNS_BACKENDS_NSD_ZONES_DIR` | `/etc/nsd/zones` | Directory where zone files are written |
| `dns.backends.nsd.nsd_conf` | `DADNS_DNS_BACKENDS_NSD_NSD_CONF` | `/etc/nsd/nsd.conf.d/zones.conf` | NSD zone include file managed by directdnsonly |
#### DNS backends — CoreDNS MySQL
The built-in env var mapping targets the backend named `coredns_mysql`. For multiple named CoreDNS backends (e.g. `coredns_dc1`, `coredns_dc2`) you must use a config file — see [Multi-backend via config file](#multi-backend-via-config-file) below.
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `dns.backends.coredns_mysql.enabled` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_ENABLED` | `false` | Enable the CoreDNS MySQL backend |
| `dns.backends.coredns_mysql.host` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_HOST` | `localhost` | MySQL host |
| `dns.backends.coredns_mysql.port` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_PORT` | `3306` | MySQL port |
| `dns.backends.coredns_mysql.database` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_DATABASE` | `coredns` | MySQL database name |
| `dns.backends.coredns_mysql.username` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_USERNAME` | `coredns` | MySQL username |
| `dns.backends.coredns_mysql.password` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_PASSWORD` | _(empty)_ | MySQL password |
#### Reconciliation poller
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `reconciliation.enabled` | `DADNS_RECONCILIATION_ENABLED` | `false` | Enable the background reconciliation poller |
| `reconciliation.dry_run` | `DADNS_RECONCILIATION_DRY_RUN` | `false` | Log orphans but do not queue deletes (safe first-run mode) |
| `reconciliation.interval_minutes` | `DADNS_RECONCILIATION_INTERVAL_MINUTES` | `60` | How often the poller runs |
| `reconciliation.verify_ssl` | `DADNS_RECONCILIATION_VERIFY_SSL` | `true` | Verify TLS certificates when querying DirectAdmin |
> The `reconciliation.directadmin_servers` list (DA hostnames, credentials) requires a config file — it cannot be expressed as simple env vars.
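A minimal `reconciliation` block with a single DA server looks like this (placeholder values; the full example is under [Multi-backend via config file](#multi-backend-via-config-file)):

```yaml
reconciliation:
  enabled: true
  interval_minutes: 60
  directadmin_servers:
    - hostname: da1.example.com
      port: 2222
      username: admin
      password: da-secret
      ssl: true
```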
#### Peer sync
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `peer_sync.enabled` | `DADNS_PEER_SYNC_ENABLED` | `false` | Enable background peer-to-peer zone sync |
| `peer_sync.interval_minutes` | `DADNS_PEER_SYNC_INTERVAL_MINUTES` | `15` | How often each peer is polled |
> The `peer_sync.peers` list (peer URLs, credentials) requires a config file — it cannot be expressed as simple env vars.
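A minimal `peer_sync` block with a single peer (placeholder URL and credentials):

```yaml
peer_sync:
  enabled: true
  interval_minutes: 15
  peers:
    - url: http://directdnsonly-2:2222
      username: directdnsonly
      password: my-strong-secret
```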
---
### Environment-variable-only setup
No config file is needed for single-backend deployments. Pass all settings as container environment variables.
#### Topology A/C — NSD backend (env vars only, recommended)
```bash
DADNS_APP_AUTH_PASSWORD=my-strong-secret
DADNS_DNS_DEFAULT_BACKEND=nsd
DADNS_DNS_BACKENDS_NSD_ENABLED=true
DADNS_DNS_BACKENDS_NSD_ZONES_DIR=/etc/nsd/zones
DADNS_DNS_BACKENDS_NSD_NSD_CONF=/etc/nsd/nsd.conf.d/zones.conf
DADNS_QUEUE_LOCATION=/app/data/queues
DADNS_DATASTORE_DB_LOCATION=/app/data/directdns.db
```
`docker-compose.yml` snippet (Topology C — two instances with peer sync via config file):
```yaml
services:
  directdnsonly-syd:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2222:2222"
      - "53:53/udp"
    environment:
      DADNS_APP_AUTH_PASSWORD: my-strong-secret
      DADNS_DNS_DEFAULT_BACKEND: nsd
      DADNS_DNS_BACKENDS_NSD_ENABLED: "true"
    volumes:
      - ./config/syd:/app/config   # contains peer_sync.peers list
      - syd-data:/app/data

  directdnsonly-mlb:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2223:2222"
      - "54:53/udp"
    environment:
      DADNS_APP_AUTH_PASSWORD: my-strong-secret
      DADNS_DNS_DEFAULT_BACKEND: nsd
      DADNS_DNS_BACKENDS_NSD_ENABLED: "true"
    volumes:
      - ./config/mlb:/app/config   # contains peer_sync.peers list
      - mlb-data:/app/data

volumes:
  syd-data:
  mlb-data:
```
#### Topology A — BIND backend (env vars only)
```bash
# docker run / docker-compose environment:
DADNS_APP_AUTH_USERNAME=directdnsonly
DADNS_APP_AUTH_PASSWORD=my-strong-secret
DADNS_DNS_DEFAULT_BACKEND=bind
DADNS_DNS_BACKENDS_BIND_ENABLED=true
DADNS_DNS_BACKENDS_BIND_ZONES_DIR=/etc/named/zones
DADNS_DNS_BACKENDS_BIND_NAMED_CONF=/etc/named/named.conf.local
DADNS_QUEUE_LOCATION=/app/data/queues
DADNS_DATASTORE_DB_LOCATION=/app/data/directdns.db
```
`docker-compose.yml` snippet:
```yaml
services:
  directdnsonly:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2222:2222"
      - "53:53/udp"
    environment:
      DADNS_APP_AUTH_PASSWORD: my-strong-secret
      DADNS_DNS_DEFAULT_BACKEND: bind
      DADNS_DNS_BACKENDS_BIND_ENABLED: "true"
      DADNS_DNS_BACKENDS_BIND_ZONES_DIR: /etc/named/zones
      DADNS_DNS_BACKENDS_BIND_NAMED_CONF: /etc/named/named.conf.local
    volumes:
      - ddo-data:/app/data

volumes:
  ddo-data:
```
#### Topology B — single CoreDNS MySQL backend (env vars only)
```bash
DADNS_APP_AUTH_PASSWORD=my-strong-secret
DADNS_DNS_DEFAULT_BACKEND=coredns_mysql
DADNS_DNS_BACKENDS_COREDNS_MYSQL_ENABLED=true
DADNS_DNS_BACKENDS_COREDNS_MYSQL_HOST=mysql.dc1.internal
DADNS_DNS_BACKENDS_COREDNS_MYSQL_PORT=3306
DADNS_DNS_BACKENDS_COREDNS_MYSQL_DATABASE=coredns
DADNS_DNS_BACKENDS_COREDNS_MYSQL_USERNAME=coredns
DADNS_DNS_BACKENDS_COREDNS_MYSQL_PASSWORD=db-secret
DADNS_QUEUE_LOCATION=/app/data/queues
DADNS_DATASTORE_DB_LOCATION=/app/data/directdns.db
```
---
### Multi-backend via config file
When you need **multiple named backends** (e.g. two CoreDNS MySQL instances in different data centres), **peer sync**, or **reconciliation with DA servers**, use a config file mounted at `/app/config/app.yml` (or `/etc/directdnsonly/app.yml`):
```yaml
app:
  auth_username: directdnsonly
  auth_password: my-strong-secret   # or use DADNS_APP_AUTH_PASSWORD
dns:
  default_backend: coredns_dc1
  backends:
    coredns_dc1:
      type: coredns_mysql
      enabled: true
      host: 10.0.0.80
      port: 3306
      database: coredns
      username: coredns
      password: db-secret-dc1
    coredns_dc2:
      type: coredns_mysql
      enabled: true
      host: 10.0.1.29
      port: 3306
      database: coredns
      username: coredns
      password: db-secret-dc2
reconciliation:
  enabled: true
  dry_run: false
  interval_minutes: 60
  verify_ssl: true
  directadmin_servers:
    - hostname: da1.example.com
      port: 2222
      username: admin
      password: da-secret
      ssl: true
peer_sync:
  enabled: true
  interval_minutes: 15
  peers:
    - url: http://ddo-2:2222
      username: directdnsonly
      password: my-strong-secret
```
Credentials in the config file can still be overridden by env vars — for example, `DADNS_APP_AUTH_PASSWORD` overrides `app.auth_password` regardless of what the file says.
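For example (sketch, using the image tag and mount paths from the examples above):

```bash
docker run -d \
  -v ./config:/app/config \
  -e DADNS_APP_AUTH_PASSWORD=prod-secret \
  guisea/directdnsonly:2.5.0
# prod-secret takes effect even if app.yml sets a different app.auth_password
```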

View File

@@ -0,0 +1,79 @@
import cherrypy
import json
from loguru import logger
from sqlalchemy import select
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain
class InternalAPI:
"""Peer-to-peer zone_data exchange endpoint.
Used by PeerSyncWorker to replicate zone_data between directdnsonly
instances so each node can independently heal its local backends.
All routes require the same basic auth as the main API.
"""
@cherrypy.expose
def zones(self, domain=None):
"""Return zone metadata or zone_data for a specific domain.
GET /internal/zones
Returns a JSON array of {domain, zone_updated_at, hostname, username}
for all domains that have stored zone_data.
GET /internal/zones?domain=example.com
Returns {domain, zone_data, zone_updated_at, hostname, username}
for the requested domain, or 404 if not found / no zone_data.
"""
cherrypy.response.headers["Content-Type"] = "application/json"
session = connect()
try:
if domain:
record = session.execute(
select(Domain)
.filter_by(domain=domain)
.where(Domain.zone_data.isnot(None))
).scalar_one_or_none()
if not record:
cherrypy.response.status = 404
return json.dumps({"error": "not found"}).encode()
return json.dumps(
{
"domain": record.domain,
"zone_data": record.zone_data,
"zone_updated_at": (
record.zone_updated_at.isoformat()
if record.zone_updated_at
else None
),
"hostname": record.hostname,
"username": record.username,
}
).encode()
else:
records = session.execute(
select(Domain).where(Domain.zone_data.isnot(None))
).scalars().all()
return json.dumps(
[
{
"domain": r.domain,
"zone_updated_at": (
r.zone_updated_at.isoformat()
if r.zone_updated_at
else None
),
"hostname": r.hostname,
"username": r.username,
}
for r in records
]
).encode()
except Exception as exc:
logger.error(f"[internal] Error serving /internal/zones: {exc}")
cherrypy.response.status = 500
return json.dumps({"error": "internal server error"}).encode()
finally:
session.close()

View File

@@ -2,6 +2,7 @@ from typing import Dict, Type, Optional
 from .base import DNSBackend
 from .bind import BINDBackend
 from .coredns_mysql import CoreDNSMySQLBackend
+from .nsd import NSDBackend
 from directdnsonly.config import config
 from loguru import logger
@@ -11,6 +12,7 @@ class BackendRegistry:
         self._backend_types = {
             "bind": BINDBackend,
             "coredns_mysql": CoreDNSMySQLBackend,
+            "nsd": NSDBackend,
         }
         self._backend_instances: Dict[str, DNSBackend] = {}
         self._initialized = False

View File

@@ -1,8 +1,7 @@
 from typing import Optional, Dict, Set, Tuple, Any
-from sqlalchemy import create_engine, Column, String, Integer, Text, ForeignKey, Boolean
-from sqlalchemy.ext.declarative import declarative_base
-from sqlalchemy.orm import sessionmaker, scoped_session, relationship
+from sqlalchemy import create_engine, Column, String, Integer, Text, ForeignKey, Boolean, select, func, delete
+from sqlalchemy.orm import sessionmaker, scoped_session, relationship, declarative_base
 from dns import zone as dns_zone_module
 from dns.rdataclass import IN
 from loguru import logger
@@ -46,7 +45,7 @@ class CoreDNSMySQLBackend(DNSBackend):
             pool_size=5,
             max_overflow=10,
         )
-        self.Session = scoped_session(sessionmaker(bind=self.engine))
+        self.Session = scoped_session(sessionmaker(self.engine))
         Base.metadata.create_all(self.engine)
         logger.info(
             f"Initialized CoreDNS MySQL backend '{self.instance_name}' "
@@ -80,7 +79,7 @@ class CoreDNSMySQLBackend(DNSBackend):
             # Get existing records for this zone but track SOA records separately
             existing_records = {}
             existing_soa = None
-            for r in session.query(Record).filter_by(zone_id=zone.id).all():
+            for r in session.execute(select(Record).filter_by(zone_id=zone.id)).scalars().all():
                 if r.type == "SOA":
                     existing_soa = r
                 else:
@@ -192,17 +191,17 @@ class CoreDNSMySQLBackend(DNSBackend):
         session = self.Session()
         try:
             # First find the zone
-            zone = (
-                session.query(Zone)
-                .filter_by(zone_name=self.dot_fqdn(zone_name))
-                .first()
-            )
+            zone = session.execute(
+                select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
+            ).scalar_one_or_none()
             if not zone:
                 logger.warning(f"Zone {zone_name} not found for deletion")
                 return False

             # Delete all records associated with the zone
-            count = session.query(Record).filter_by(zone_id=zone.id).delete()
+            count = session.execute(
+                delete(Record).where(Record.zone_id == zone.id)
+            ).rowcount

             # Delete the zone itself
             session.delete(zone)
@@ -229,12 +228,9 @@ class CoreDNSMySQLBackend(DNSBackend):
     def zone_exists(self, zone_name: str) -> bool:
         session = self.Session()
         try:
-            exists = (
-                session.query(Zone)
-                .filter_by(zone_name=self.dot_fqdn(zone_name))
-                .first()
-                is not None
-            )
+            exists = session.execute(
+                select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
+            ).scalar_one_or_none() is not None
             logger.debug(f"Zone existence check for {zone_name}: {exists}")
             return exists
         except Exception as e:
@@ -245,7 +241,9 @@ class CoreDNSMySQLBackend(DNSBackend):
     def _ensure_zone_exists(self, session, zone_name: str) -> Zone:
         """Ensure a zone exists in the database, creating it if necessary"""
-        zone = session.query(Zone).filter_by(zone_name=self.dot_fqdn(zone_name)).first()
+        zone = session.execute(
+            select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
+        ).scalar_one_or_none()
         if not zone:
             logger.debug(f"Creating new zone: {self.dot_fqdn(zone_name)}")
             zone = Zone(zone_name=self.dot_fqdn(zone_name))
@@ -323,11 +321,9 @@ class CoreDNSMySQLBackend(DNSBackend):
         """
         session = self.Session()
         try:
-            zone = (
-                session.query(Zone)
-                .filter_by(zone_name=self.dot_fqdn(zone_name))
-                .first()
-            )
+            zone = session.execute(
+                select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
+            ).scalar_one_or_none()
             if not zone:
                 logger.warning(
                     f"[{self.instance_name}] Zone {zone_name} not found "
@@ -335,7 +331,9 @@ class CoreDNSMySQLBackend(DNSBackend):
                 )
                 return False, 0

-            actual_count = session.query(Record).filter_by(zone_id=zone.id).count()
+            actual_count = session.execute(
+                select(func.count()).select_from(Record).where(Record.zone_id == zone.id)
+            ).scalar()

             matches = actual_count == expected_count
             if not matches:
@@ -383,11 +381,9 @@ class CoreDNSMySQLBackend(DNSBackend):
         """
         session = self.Session()
         try:
-            zone = (
-                session.query(Zone)
-                .filter_by(zone_name=self.dot_fqdn(zone_name))
-                .first()
-            )
+            zone = session.execute(
+                select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
+            ).scalar_one_or_none()
             if not zone:
                 logger.warning(
                     f"[{self.instance_name}] Zone {zone_name} not found "
@@ -405,7 +401,9 @@ class CoreDNSMySQLBackend(DNSBackend):
             }

             # Query all records currently in the backend for this zone
-            db_records = session.query(Record).filter_by(zone_id=zone.id).all()
+            db_records = session.execute(
+                select(Record).where(Record.zone_id == zone.id)
+            ).scalars().all()

             removed = 0
             for record in db_records:

View File

@@ -0,0 +1,179 @@
import os
import re
import subprocess
from loguru import logger
from pathlib import Path
from typing import Dict, List, Optional
from .base import DNSBackend
class NSDBackend(DNSBackend):
"""DNS backend for NSD (Name Server Daemon) by NLnet Labs.
Zone files use the same RFC 1035 format as BIND. NSD is reloaded via
``nsd-control reload`` after each write. Zone registration is managed in a
dedicated include file so the main ``nsd.conf`` is never modified by the
application.
"""
@classmethod
def get_name(cls) -> str:
return "nsd"
@classmethod
def is_available(cls) -> bool:
try:
result = subprocess.run(
["nsd-control", "status"],
capture_output=True,
text=True,
)
# nsd-control exits 0 when NSD is running, non-zero otherwise.
# Either way, a non-FileNotFoundError means the binary is present.
logger.info("NSD available (nsd-control found)")
return True
except FileNotFoundError:
logger.warning("NSD not found in PATH — nsd-control missing")
return False
def __init__(self, config: Dict):
super().__init__(config)
self.zones_dir = Path(config.get("zones_dir", "/etc/nsd/zones"))
self.nsd_conf = Path(
config.get("nsd_conf", "/etc/nsd/nsd.conf.d/zones.conf")
)
# Ensure zones directory exists
try:
if self.zones_dir.is_symlink():
logger.debug(f"{self.zones_dir} is already a symlink")
elif not self.zones_dir.exists():
self.zones_dir.mkdir(parents=True, mode=0o755)
logger.debug(f"Created zones directory: {self.zones_dir}")
os.chmod(self.zones_dir, 0o755)
except FileExistsError:
pass
except Exception as e:
logger.error(f"Failed to setup zones directory: {e}")
raise
# Ensure the conf include directory and file exist
self.nsd_conf.parent.mkdir(parents=True, exist_ok=True)
if not self.nsd_conf.exists():
self.nsd_conf.touch()
logger.info(f"Created empty NSD zone conf: {self.nsd_conf}")
logger.success(
f"NSD backend initialized — zones: {self.zones_dir}, "
f"conf: {self.nsd_conf}"
)
# ------------------------------------------------------------------
# Core backend interface
# ------------------------------------------------------------------
def write_zone(self, zone_name: str, zone_data: str) -> bool:
zone_file = self.zones_dir / f"{zone_name}.db"
try:
zone_file.write_text(zone_data)
logger.debug(f"Wrote zone file: {zone_file}")
self._ensure_zone_in_conf(zone_name)
return True
except IOError as e:
logger.error(f"Failed to write zone file {zone_file}: {e}")
return False
def delete_zone(self, zone_name: str) -> bool:
zone_file = self.zones_dir / f"{zone_name}.db"
try:
if zone_file.exists():
zone_file.unlink()
logger.debug(f"Deleted zone file: {zone_file}")
else:
logger.warning(f"Zone file not found: {zone_file}")
return False
self._remove_zone_from_conf(zone_name)
return True
except IOError as e:
logger.error(f"Failed to delete zone {zone_name}: {e}")
return False
def reload_zone(self, zone_name: Optional[str] = None) -> bool:
try:
if zone_name:
cmd = ["nsd-control", "reload", zone_name]
logger.debug(f"Reloading single zone: {zone_name}")
else:
cmd = ["nsd-control", "reload"]
logger.debug("Reloading all zones")
result = subprocess.run(
cmd,
check=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
)
logger.debug(f"NSD reload successful: {result.stdout.strip()}")
return True
except subprocess.CalledProcessError as e:
logger.error(f"NSD reload failed: {e.stderr.strip()}")
return False
except Exception as e:
logger.error(f"Unexpected error during NSD reload: {e}")
return False
def zone_exists(self, zone_name: str) -> bool:
exists = (self.zones_dir / f"{zone_name}.db").exists()
logger.debug(f"Zone existence check for {zone_name}: {exists}")
return exists
# ------------------------------------------------------------------
# NSD conf file management
# ------------------------------------------------------------------
def update_nsd_conf(self, zones: List[str]) -> bool:
"""Rewrite the NSD zones include file with exactly the given zone list.
Equivalent to BINDBackend.update_named_conf — full replacement from a
known-good source list.
"""
try:
lines = []
for zone in zones:
zone_file = self.zones_dir / f"{zone}.db"
lines.append(
f'\nzone:\n name: "{zone}"\n zonefile: "{zone_file}"\n'
)
self.nsd_conf.write_text("".join(lines))
logger.debug(f"Rewrote NSD zone conf: {self.nsd_conf}")
return True
except IOError as e:
logger.error(f"Failed to update NSD zone conf: {e}")
return False
def _ensure_zone_in_conf(self, zone_name: str) -> None:
"""Append a zone stanza to the NSD conf file if it is not already present."""
zone_file = self.zones_dir / f"{zone_name}.db"
stanza = f'\nzone:\n name: "{zone_name}"\n zonefile: "{zone_file}"\n'
content = self.nsd_conf.read_text() if self.nsd_conf.exists() else ""
if f'name: "{zone_name}"' not in content:
with open(self.nsd_conf, "a") as f:
f.write(stanza)
logger.debug(f"Added zone {zone_name} to NSD conf")
def _remove_zone_from_conf(self, zone_name: str) -> None:
"""Remove a zone stanza from the NSD conf file."""
if not self.nsd_conf.exists():
return
content = self.nsd_conf.read_text()
pattern = (
r'\nzone:\n name: "'
+ re.escape(zone_name)
+ r'"\n zonefile: "[^"]+"\n'
)
new_content = re.sub(pattern, "", content)
if new_content != content:
self.nsd_conf.write_text(new_content)
logger.debug(f"Removed zone {zone_name} from NSD conf")

View File

@@ -1,332 +0,0 @@
from typing import Optional, Dict, Set, Tuple, List
from sqlalchemy import (
create_engine,
Column,
String,
Integer,
Text,
Boolean,
DateTime,
func,
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, scoped_session
from loguru import logger
from .base import DNSBackend
from config import config
import time
Base = declarative_base()
class Domain(Base):
__tablename__ = "domains"
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False, index=True, unique=True)
master = Column(String(128), nullable=True)
last_check = Column(Integer, nullable=True)
type = Column(String(6), nullable=False, default="NATIVE")
notified_serial = Column(Integer, nullable=True)
account = Column(String(40), nullable=True)
class Record(Base):
__tablename__ = "records"
id = Column(Integer, primary_key=True)
domain_id = Column(Integer, nullable=False, index=True)
name = Column(String(255), nullable=False, index=True)
type = Column(String(10), nullable=False)
content = Column(Text, nullable=False)
ttl = Column(Integer, nullable=True)
prio = Column(Integer, nullable=True)
change_date = Column(Integer, nullable=True)
disabled = Column(Boolean, nullable=False, default=False)
ordername = Column(String(255), nullable=True)
auth = Column(Boolean, nullable=False, default=True)
class PowerDNSMySQLBackend(DNSBackend):
@classmethod
def get_name(cls) -> str:
return "powerdns_mysql"
@classmethod
def is_available(cls) -> bool:
try:
import pymysql
return True
except ImportError:
logger.warning("PyMySQL not available - PowerDNS MySQL backend disabled")
return False
@staticmethod
def ensure_fqdn(name: str, zone_name: str) -> str:
"""Ensure name is fully qualified for PowerDNS"""
if name == "@" or name == "":
return zone_name
elif name.endswith("."):
return name.rstrip(".")
elif name == zone_name:
return name
else:
return f"{name}.{zone_name}"
def __init__(self, config: dict = None):
c = config or config.get("dns.backends.powerdns_mysql")
self.engine = create_engine(
f"mysql+pymysql://{c['username']}:{c['password']}@"
f"{c['host']}:{c['port']}/{c['database']}",
pool_pre_ping=True,
)
self.Session = scoped_session(sessionmaker(bind=self.engine))
Base.metadata.create_all(self.engine)
logger.info(f"Initialized PowerDNS MySQL backend for {c['database']}")
def _ensure_domain_exists(self, session, zone_name: str) -> Domain:
"""Ensure domain exists and return domain object"""
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
domain = Domain(name=zone_name, type="NATIVE")
session.add(domain)
session.flush() # Flush to get the domain ID
logger.info(f"Created new domain: {zone_name}")
return domain
def _parse_soa_content(self, soa_content: str) -> Dict[str, str]:
"""Parse SOA record content into components"""
parts = soa_content.split()
if len(parts) >= 7:
return {
"primary_ns": parts[0],
"hostmaster": parts[1],
"serial": parts[2],
"refresh": parts[3],
"retry": parts[4],
"expire": parts[5],
"minimum": parts[6],
}
return {}
def write_zone(self, zone_name: str, zone_data: str) -> bool:
from dns import zone as dns_zone_module
from dns.rdataclass import IN
session = self.Session()
try:
# Ensure domain exists
domain = self._ensure_domain_exists(session, zone_name)
# Get existing records for this domain
existing_records = {
(r.name, r.type): r
for r in session.query(Record).filter_by(domain_id=domain.id).all()
}
# Parse the zone data
dns_zone = dns_zone_module.from_text(zone_data, check_origin=False)
# Track records we process
current_records: Set[Tuple[str, str]] = set()
changes = {"added": 0, "updated": 0, "removed": 0}
current_time = int(time.time())
# Process all records
for name, ttl, rdata in dns_zone.iterate_rdatas():
if rdata.rdclass != IN:
continue
record_name = self.ensure_fqdn(str(name), zone_name)
record_type = rdata.rdtype.name
record_content = rdata.to_text()
record_ttl = ttl
record_prio = None
# Handle MX records priority
if record_type == "MX":
parts = record_content.split(" ", 1)
if len(parts) == 2:
record_prio = int(parts[0])
record_content = parts[1]
# Handle SRV records priority and other fields
elif record_type == "SRV":
parts = record_content.split(" ", 3)
if len(parts) == 4:
record_prio = int(parts[0])
record_content = f"{parts[1]} {parts[2]} {parts[3]}"
# Ensure CNAME and other records have proper FQDN format
if record_type in ["CNAME", "MX", "NS"]:
if not record_content.endswith(".") and record_content != "@":
if record_content == "@":
record_content = zone_name
elif "." not in record_content:
record_content = f"{record_content}.{zone_name}"
key = (record_name, record_type)
current_records.add(key)
if key in existing_records:
# Update existing record if needed
record = existing_records[key]
if (
record.content != record_content
or record.ttl != record_ttl
or record.prio != record_prio
):
record.content = record_content
record.ttl = record_ttl
record.prio = record_prio
record.change_date = current_time
record.disabled = False
changes["updated"] += 1
else:
# Add new record
new_record = Record(
domain_id=domain.id,
name=record_name,
type=record_type,
content=record_content,
ttl=record_ttl,
prio=record_prio,
change_date=current_time,
disabled=False,
auth=True,
)
session.add(new_record)
changes["added"] += 1
# Remove deleted records
for key in set(existing_records.keys()) - current_records:
session.delete(existing_records[key])
changes["removed"] += 1
session.commit()
logger.success(
f"Zone {zone_name} updated: "
f"+{changes['added']} ~{changes['updated']} -{changes['removed']}"
)
return True
except Exception as e:
session.rollback()
logger.error(f"Zone update failed for {zone_name}: {e}")
return False
finally:
session.close()
def delete_zone(self, zone_name: str) -> bool:
session = self.Session()
try:
# First find the domain
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
logger.warning(f"Domain {zone_name} not found for deletion")
return False
# Delete all records associated with the domain
count = session.query(Record).filter_by(domain_id=domain.id).delete()
# Delete the domain itself
session.delete(domain)
session.commit()
logger.info(f"Deleted domain {zone_name} with {count} records")
return True
except Exception as e:
session.rollback()
logger.error(f"Domain deletion failed for {zone_name}: {e}")
return False
finally:
session.close()
def reload_zone(self, zone_name: Optional[str] = None) -> bool:
"""PowerDNS reload - could trigger pdns_control reload if needed"""
if zone_name:
logger.debug(f"PowerDNS reload triggered for zone {zone_name}")
# Optional: Call pdns_control reload-zones here if needed
# subprocess.run(['pdns_control', 'reload-zones'], check=True)
else:
logger.debug("PowerDNS reload triggered for all zones")
# Optional: Call pdns_control reload here if needed
# subprocess.run(['pdns_control', 'reload'], check=True)
return True
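# A minimal sketch of the optional pdns_control hook described in the comments
# above. _pdns_reload is a hypothetical standalone helper, not part of this
# backend; it assumes pdns_control is installed and on PATH, and uses only the
# two subcommands the comments already mention.
import subprocess
from loguru import logger

def _pdns_reload(zone_name=None) -> bool:
    # Per the comments above: zone-specific reloads use "reload-zones",
    # a full reload uses "reload".
    cmd = ["pdns_control", "reload-zones"] if zone_name else ["pdns_control", "reload"]
    try:
        subprocess.run(cmd, check=True, capture_output=True, timeout=30)
        return True
    except (subprocess.CalledProcessError, FileNotFoundError, subprocess.TimeoutExpired) as exc:
        logger.error(f"pdns_control failed: {exc}")
        return False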
def zone_exists(self, zone_name: str) -> bool:
session = self.Session()
try:
exists = session.query(Domain).filter_by(name=zone_name).first() is not None
logger.debug(f"Zone existence check for {zone_name}: {exists}")
return exists
except Exception as e:
logger.error(f"Zone existence check failed for {zone_name}: {e}")
return False
finally:
session.close()
def get_zone_records(self, zone_name: str) -> List[Dict]:
"""Get all records for a zone - useful for debugging/inspection"""
session = self.Session()
try:
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
return []
records = session.query(Record).filter_by(domain_id=domain.id).all()
return [
{
"name": r.name,
"type": r.type,
"content": r.content,
"ttl": r.ttl,
"prio": r.prio,
"disabled": r.disabled,
}
for r in records
]
except Exception as e:
logger.error(f"Failed to get records for {zone_name}: {e}")
return []
finally:
session.close()
def set_record_status(
self, zone_name: str, record_name: str, record_type: str, disabled: bool
) -> bool:
"""Enable/disable specific records"""
session = self.Session()
try:
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
logger.warning(f"Domain {zone_name} not found")
return False
full_name = self.ensure_fqdn(record_name, zone_name)
record = (
session.query(Record)
.filter_by(domain_id=domain.id, name=full_name, type=record_type)
.first()
)
if not record:
logger.warning(
f"Record {full_name} {record_type} not found in {zone_name}"
)
return False
record.disabled = disabled
record.change_date = int(time.time())
session.commit()
status = "disabled" if disabled else "enabled"
logger.info(f"Record {full_name} {record_type} {status} in {zone_name}")
return True
except Exception as e:
session.rollback()
logger.error(f"Failed to set record status: {e}")
return False
finally:
session.close()
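For illustration, here is a self-contained sketch of the parsing step write_zone builds on: a dnspython zone is walked with iterate_rdatas() and each rdata is reduced to the (name, type, content, prio) shape stored above. The sample zone text and the extract_records helper are invented for the example; only the dnspython calls mirror the backend code.

import dns.zone
from dns.rdataclass import IN

SAMPLE_ZONE = """
$ORIGIN example.com.
$TTL 3600
@    IN SOA ns1.example.com. hostmaster.example.com. 2026022001 3600 900 1209600 300
@    IN MX  10 mail.example.com.
www  IN A   203.0.113.10
"""

def extract_records(zone_text, zone_name):
    zone = dns.zone.from_text(zone_text, origin=zone_name, relativize=False)
    for name, ttl, rdata in zone.iterate_rdatas():
        if rdata.rdclass != IN:
            continue
        content, prio = rdata.to_text(), None
        if rdata.rdtype.name == "MX":
            # Same split as write_zone: "10 mail.example.com." -> prio 10, target
            prio_text, content = content.split(" ", 1)
            prio = int(prio_text)
        yield str(name), rdata.rdtype.name, content, ttl, prio

for rec in extract_records(SAMPLE_ZONE, "example.com."):
    print(rec)
# Prints tuples like ('www.example.com.', 'A', '203.0.113.10', 3600, None)
# and ('example.com.', 'MX', 'mail.example.com.', 3600, 10).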

View File

@@ -0,0 +1,3 @@
from .client import DirectAdminClient
__all__ = ["DirectAdminClient"]

View File

@@ -0,0 +1,204 @@
"""DirectAdmin HTTP client.
Encapsulates all outbound communication with a single DirectAdmin server:
authenticated requests, the Basic-Auth → session-cookie fallback for DA Evo,
paginated domain listing, and the legacy URL-encoded response parser.
"""
from __future__ import annotations
from urllib.parse import parse_qs
from typing import Optional
import requests
import requests.exceptions
from loguru import logger
class DirectAdminClient:
"""HTTP client for a single DirectAdmin server.
Handles two authentication modes transparently:
- Basic Auth (classic DA / API-only access)
- Session cookie via CMD_LOGIN (DA Evolution — redirects Basic Auth)
Usage::
client = DirectAdminClient("da1.example.com", 2222, "admin", "secret")
domains = client.list_domains() # set[str] or None on failure
response = client.get("CMD_API_SHOW_ALL_USERS")
"""
def __init__(
self,
hostname: str,
port: int,
username: str,
password: str,
ssl: bool = True,
verify_ssl: bool = True,
) -> None:
self.hostname = hostname
self.port = port
self.username = username
self.password = password
self.scheme = "https" if ssl else "http"
self.verify_ssl = verify_ssl
self._cookies = None # populated on first successful session login
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def list_domains(self, ipp: int = 1000) -> Optional[set]:
"""Return all domains on this DA server via CMD_DNS_ADMIN (JSON, paginated).
Falls back to the legacy URL-encoded parser if JSON decode fails.
Returns a set of lowercase domain strings, or ``None`` if the server
is unreachable or returns an error.
"""
page = 1
all_domains: set = set()
total_pages = 1
try:
while page <= total_pages:
response = self.get(
"CMD_DNS_ADMIN",
params={"json": "yes", "page": page, "ipp": ipp},
)
if response is None:
return None
if response.is_redirect or response.status_code in (301, 302, 303, 307, 308):
if self._cookies:
logger.error(
f"[da:{self.hostname}] Still redirecting after session login — "
f"check that '{self.username}' has admin-level access. Skipping."
)
return None
logger.debug(
f"[da:{self.hostname}] Basic Auth redirected "
f"(HTTP {response.status_code}) — attempting session login (DA Evo)"
)
if not self._login():
return None
continue # retry this page with cookies
response.raise_for_status()
content_type = response.headers.get("Content-Type", "")
if "text/html" in content_type:
logger.error(
f"[da:{self.hostname}] Returned HTML instead of API response — "
f"check credentials and admin-level access. Skipping."
)
return None
try:
data = response.json()
for k, v in data.items():
if k.isdigit() and isinstance(v, dict) and "domain" in v:
all_domains.add(v["domain"].strip().lower())
total_pages = int(data.get("info", {}).get("total_pages", 1))
page += 1
except Exception as exc:
logger.error(
f"[da:{self.hostname}] JSON decode failed on page {page}: {exc}\n"
f"Raw response: {response.text[:500]}"
)
all_domains.update(self._parse_legacy_domain_list(response.text))
break # no paging in legacy mode
return all_domains
except requests.exceptions.SSLError as exc:
logger.error(
f"[da:{self.hostname}] SSL error — {exc}. "
f"Set verify_ssl: false in reconciliation config if using self-signed certs."
)
except requests.exceptions.ConnectionError as exc:
logger.error(f"[da:{self.hostname}] Cannot reach server — {exc}. Skipping.")
except requests.exceptions.Timeout:
logger.error(f"[da:{self.hostname}] Connection timed out. Skipping.")
except requests.exceptions.HTTPError as exc:
logger.error(f"[da:{self.hostname}] HTTP error — {exc}. Skipping.")
except Exception as exc:
logger.error(f"[da:{self.hostname}] Unexpected error: {exc}")
return None
def get(
self, command: str, params: Optional[dict] = None
) -> Optional[requests.Response]:
"""Authenticated GET to any DA CMD_* endpoint.
Uses session cookies when available (after a successful ``_login``),
otherwise falls back to HTTP Basic Auth. Does **not** follow redirects
so callers can detect the Basic-Auth → cookie upgrade.
"""
url = f"{self.scheme}://{self.hostname}:{self.port}/{command}"
kwargs: dict = dict(
params=params or {},
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
)
if self._cookies:
kwargs["cookies"] = self._cookies
else:
kwargs["auth"] = (self.username, self.password)
try:
return requests.get(url, **kwargs)
except Exception as exc:
logger.error(f"[da:{self.hostname}] GET {command} failed: {exc}")
return None
# ------------------------------------------------------------------
# Internal
# ------------------------------------------------------------------
def _login(self) -> bool:
"""POST CMD_LOGIN to obtain a DA Evo session cookie.
Populates ``self._cookies`` on success and returns ``True``.
Returns ``False`` on any failure.
"""
login_url = f"{self.scheme}://{self.hostname}:{self.port}/CMD_LOGIN"
try:
response = requests.post(
login_url,
data={
"username": self.username,
"password": self.password,
"referer": "/CMD_DNS_ADMIN?json=yes&page=1&ipp=500",
},
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
)
if not response.cookies:
logger.error(
f"[da:{self.hostname}] CMD_LOGIN returned no session cookie — "
f"check username/password."
)
return False
self._cookies = response.cookies
logger.debug(f"[da:{self.hostname}] Session login successful (DA Evo)")
return True
except Exception as exc:
logger.error(f"[da:{self.hostname}] Session login failed: {exc}")
return False
@staticmethod
def _parse_legacy_domain_list(body: str) -> set:
"""Parse DA's legacy CMD_API_SHOW_ALL_DOMAINS URL-encoded response.
DA returns ``list[]=example.com&list[]=example2.com``, optionally
newline-separated instead of ampersand-separated.
"""
normalised = body.replace("\n", "&").strip("&")
params = parse_qs(normalised)
domains = params.get("list[]", [])
return {d.strip().lower() for d in domains if d.strip()}
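As a quick, standalone illustration of that legacy fallback (domain values invented), both ampersand- and newline-separated bodies normalise to the same set:

from urllib.parse import parse_qs

body = "list[]=EXAMPLE.com&list[]=shop.example.net\nlist[]=blog.example.org"
normalised = body.replace("\n", "&").strip("&")
domains = {d.strip().lower() for d in parse_qs(normalised).get("list[]", []) if d.strip()}
print(domains)  # {'example.com', 'shop.example.net', 'blog.example.org'}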

View File

@@ -1,13 +1,36 @@
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker, declarative_base
from vyper import v
from loguru import logger
import datetime
Base = declarative_base()
def _migrate(engine):
"""Apply additive schema migrations for columns added after initial release."""
migrations = [
("domains", "zone_data", "ALTER TABLE domains ADD COLUMN zone_data TEXT"),
(
"domains",
"zone_updated_at",
"ALTER TABLE domains ADD COLUMN zone_updated_at DATETIME",
),
]
with engine.connect() as conn:
for table, column, ddl in migrations:
try:
conn.execute(text(f"SELECT {column} FROM {table} LIMIT 1"))
except Exception:
try:
conn.execute(text(ddl))
conn.commit()
logger.info(f"[db] Migration applied: added {table}.{column}")
except Exception as exc:
logger.warning(f"[db] Migration skipped ({table}.{column}): {exc}")
def connect(dbtype="sqlite", **kwargs):
if dbtype == "sqlite":
# Start SQLite engine
@@ -19,7 +42,8 @@ def connect(dbtype="sqlite", **kwargs):
"sqlite:///" + db_location, connect_args={"check_same_thread": False}
)
Base.metadata.create_all(engine)
_migrate(engine)
return sessionmaker(engine)()
elif dbtype == "mysql":
# Start a MySQL engine
db_user = v.get_string("datastore.user")
@@ -50,6 +74,7 @@ def connect(dbtype="sqlite", **kwargs):
+ db_name
)
Base.metadata.create_all(engine)
_migrate(engine)
return sessionmaker(engine)()
else:
raise Exception("Unknown/unimplemented database type: {}".format(dbtype))
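The probe-then-ALTER pattern above is easy to exercise against a throwaway SQLite database. This standalone sketch reuses the domains table name from the models; everything else is illustrative, and it adds an explicit rollback after the failed probe so the same connection can issue the ALTER.

from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # throwaway in-memory DB for the demo

def add_column_if_missing(conn, table, column, ddl):
    """Probe with a SELECT; only ALTER when the column is missing (idempotent)."""
    try:
        conn.execute(text(f"SELECT {column} FROM {table} LIMIT 1"))
        return False  # already present
    except Exception:
        conn.rollback()  # clear the failed implicit transaction before retrying
        conn.execute(text(ddl))
        conn.commit()
        return True

with engine.connect() as conn:
    conn.execute(text("CREATE TABLE domains (id INTEGER PRIMARY KEY, domain TEXT)"))
    conn.commit()
    ddl = "ALTER TABLE domains ADD COLUMN zone_data TEXT"
    print(add_column_if_missing(conn, "domains", "zone_data", ddl))  # True, column added
    print(add_column_if_missing(conn, "domains", "zone_data", ddl))  # False, already there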

View File

@@ -1,5 +1,5 @@
from directdnsonly.app.db import Base
from sqlalchemy import Column, Integer, String, DateTime, Text
class Key(Base):
@@ -25,6 +25,8 @@ class Domain(Base):
domain = Column(String(255), unique=True)
hostname = Column(String(255))
username = Column(String(255))
zone_data = Column(Text, nullable=True)  # last known zone file from DA
zone_updated_at = Column(DateTime, nullable=True)  # when zone_data was last stored
def __repr__(self):
return "<Domain(id='%s', domain='%s', hostname='%s', username='%s')>" % (

View File

@@ -0,0 +1,190 @@
#!/usr/bin/env python3
"""Peer sync worker — exchanges zone_data between directdnsonly instances.
Each node stores zone_data in its local SQLite DB after every successful
backend write. When DirectAdmin pushes a zone to one node but the other
is temporarily offline, the offline node misses that zone_data.
PeerSyncWorker corrects this by periodically comparing zone lists with
configured peers and fetching any zone_data that is newer or absent locally.
It only updates the local DB — it never writes directly to backends. The
existing reconciler healing pass then detects missing zones and re-pushes
using the freshly synced zone_data.
Safety properties:
- If a peer is unreachable, skip it silently and retry next interval
- Only zone_data is synced — backend writes remain the sole responsibility
of the local save queue worker
- Newer zone_updated_at timestamp wins; local data is never overwritten
with older peer data
"""
import datetime
import threading
from loguru import logger
import requests
from sqlalchemy import select
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain
class PeerSyncWorker:
"""Periodically fetches zone_data from peer directdnsonly instances and
stores it locally so the healing pass can re-push missing zones without
waiting for a DirectAdmin re-push."""
def __init__(self, peer_sync_config: dict):
self.enabled = peer_sync_config.get("enabled", False)
self.interval_seconds = peer_sync_config.get("interval_minutes", 15) * 60
self.peers = peer_sync_config.get("peers") or []
self._stop_event = threading.Event()
self._thread = None
def start(self):
if not self.enabled:
logger.info("Peer sync disabled — skipping")
return
if not self.peers:
logger.warning("Peer sync enabled but no peers configured")
return
self._stop_event.clear()
self._thread = threading.Thread(
target=self._run, daemon=True, name="peer_sync_worker"
)
self._thread.start()
peer_urls = [p.get("url", "?") for p in self.peers]
logger.info(
f"Peer sync worker started — "
f"interval: {self.interval_seconds // 60}m, "
f"peers: {peer_urls}"
)
def stop(self):
self._stop_event.set()
if self._thread:
self._thread.join(timeout=10)
logger.info("Peer sync worker stopped")
@property
def is_alive(self):
return self._thread is not None and self._thread.is_alive()
# ------------------------------------------------------------------
# Internal
# ------------------------------------------------------------------
def _run(self):
logger.info("Peer sync worker starting — running initial sync now")
self._sync_all()
while not self._stop_event.wait(timeout=self.interval_seconds):
self._sync_all()
def _sync_all(self):
logger.debug(f"[peer_sync] Starting sync pass across {len(self.peers)} peer(s)")
for peer in self.peers:
url = peer.get("url")
if not url:
logger.warning("[peer_sync] Peer config missing url — skipping")
continue
try:
self._sync_from_peer(peer)
except Exception as exc:
logger.warning(f"[peer_sync] Skipping unreachable peer {url}: {exc}")
def _sync_from_peer(self, peer: dict):
url = peer.get("url", "").rstrip("/")
username = peer.get("username")
password = peer.get("password")
auth = (username, password) if username else None
# Fetch the peer's zone list
resp = requests.get(f"{url}/internal/zones", auth=auth, timeout=10)
if resp.status_code != 200:
logger.warning(
f"[peer_sync] {url}: /internal/zones returned {resp.status_code}"
)
return
peer_zones = resp.json() # [{domain, zone_updated_at, hostname, username}]
if not peer_zones:
logger.debug(f"[peer_sync] {url}: no zone_data on peer yet")
return
session = connect()
try:
synced = 0
for entry in peer_zones:
domain = entry.get("domain")
if not domain:
continue
peer_ts_str = entry.get("zone_updated_at")
peer_ts = (
datetime.datetime.fromisoformat(peer_ts_str)
if peer_ts_str
else None
)
local = session.execute(
select(Domain).filter_by(domain=domain)
).scalar_one_or_none()
needs_sync = (
local is None
or local.zone_data is None
or (peer_ts and not local.zone_updated_at)
or (
peer_ts
and local.zone_updated_at
and peer_ts > local.zone_updated_at
)
)
if not needs_sync:
continue
# Fetch full zone_data from peer
zresp = requests.get(
f"{url}/internal/zones",
params={"domain": domain},
auth=auth,
timeout=10,
)
if zresp.status_code != 200:
logger.warning(
f"[peer_sync] {url}: could not fetch zone_data "
f"for {domain} (HTTP {zresp.status_code})"
)
continue
zdata = zresp.json()
zone_data = zdata.get("zone_data")
if not zone_data:
continue
if local is None:
local = Domain(
domain=domain,
hostname=entry.get("hostname"),
username=entry.get("username"),
zone_data=zone_data,
zone_updated_at=peer_ts,
)
session.add(local)
logger.debug(
f"[peer_sync] {url}: created local record for {domain}"
)
else:
local.zone_data = zone_data
local.zone_updated_at = peer_ts
logger.debug(f"[peer_sync] {url}: updated zone_data for {domain}")
synced += 1
if synced:
session.commit()
logger.info(f"[peer_sync] Synced {synced} zone(s) from {url}")
else:
logger.debug(f"[peer_sync] {url}: already up to date")
finally:
session.close()
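The skip-older rule reduces to a timestamp comparison. A simplified standalone version of that check (timestamps invented; the real needs_sync also treats missing local zone_data as a reason to sync):

import datetime

def peer_copy_wins(peer_ts_str, local_ts):
    peer_ts = datetime.datetime.fromisoformat(peer_ts_str) if peer_ts_str else None
    if local_ts is None:
        return peer_ts is not None
    return bool(peer_ts and peer_ts > local_ts)

local = datetime.datetime(2026, 2, 19, 10, 0, 0)
print(peer_copy_wins("2026-02-19T12:30:00", local))  # True: peer copy is newer
print(peer_copy_wins("2026-02-19T08:00:00", local))  # False: local copy is newer
print(peer_copy_wins(None, local))                   # False: peer has no timestamp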

View File

@@ -1,11 +1,9 @@
#!/usr/bin/env python3
import threading
from loguru import logger
from sqlalchemy import select
from directdnsonly.app.da import DirectAdminClient
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain
@@ -14,6 +12,10 @@ class ReconciliationWorker:
"""Periodically polls configured DirectAdmin servers and queues deletes
for any zones in our DB that no longer exist in DirectAdmin.
Also runs an Option C backend healing pass: for each zone with stored
zone_data, checks every backend for presence and re-queues any that are
missing (e.g. after a prolonged backend outage).
Safety rules:
- If a DA server is unreachable, skip it entirely — never delete on uncertainty
- Only touches domains registered via DaDNS (present in our `domains` table)
@@ -21,14 +23,23 @@ class ReconciliationWorker:
- Pushes to the existing delete_queue so the full delete path is exercised
"""
def __init__(
self,
delete_queue,
reconciliation_config: dict,
save_queue=None,
backend_registry=None,
):
self.delete_queue = delete_queue
self.save_queue = save_queue
self.backend_registry = backend_registry
self.enabled = reconciliation_config.get("enabled", False)
self.interval_seconds = reconciliation_config.get("interval_minutes", 60) * 60
self.servers = reconciliation_config.get("directadmin_servers") or []
self.verify_ssl = reconciliation_config.get("verify_ssl", True)
self.ipp = int(reconciliation_config.get("ipp", 1000))
self.dry_run = bool(reconciliation_config.get("dry_run", False))
self._initial_delay = reconciliation_config.get("initial_delay_minutes", 0) * 60
self._stop_event = threading.Event()
self._thread = None
@@ -49,9 +60,15 @@ class ReconciliationWorker:
self._thread.start()
server_names = [s.get("hostname", "?") for s in self.servers]
mode = "DRY-RUN" if self.dry_run else "LIVE"
delay_str = (
f", initial_delay: {self._initial_delay // 60}m"
if self._initial_delay
else ""
)
logger.info(
f"Reconciliation poller started [{mode}] — "
f"interval: {self.interval_seconds // 60}m"
f"{delay_str}, "
f"servers: {server_names}"
)
if self.dry_run:
@@ -74,9 +91,15 @@ class ReconciliationWorker:
# ------------------------------------------------------------------
def _run(self):
if self._initial_delay > 0:
logger.info(
f"[reconciler] Initial delay {self._initial_delay // 60}m — "
f"first reconciliation pass deferred"
)
if self._stop_event.wait(timeout=self._initial_delay):
return  # stopped cleanly during the initial delay
logger.info("Reconciliation worker starting — running initial check now")
self._reconcile_all()
while not self._stop_event.wait(timeout=self.interval_seconds):
self._reconcile_all()
@@ -86,35 +109,38 @@
f"{len(self.servers)} server(s)"
)
total_queued = 0
# Build a map of all domains seen on all DA servers: domain -> hostname
all_da_domains: dict = {}
for server in self.servers:
hostname = server.get("hostname")
if not hostname:
logger.warning("[reconciler] Server config missing hostname — skipping")
continue
try:
client = DirectAdminClient(
hostname=hostname,
port=server.get("port", 2222),
username=server.get("username"),
password=server.get("password"),
ssl=server.get("ssl", True),
verify_ssl=self.verify_ssl,
)
da_domains = client.list_domains(ipp=self.ipp)
if da_domains is not None:
for d in da_domains:
all_da_domains[d] = hostname
logger.debug(
f"[reconciler] {hostname}: "
f"{len(da_domains) if da_domains else 0} active domain(s) in DA"
)
except Exception as exc:
logger.error(f"[reconciler] Unexpected error polling {hostname}: {exc}")
# Compare local DB against what DA reported; update masters and queue deletes
session = connect()
try:
all_local_domains = session.execute(select(Domain)).scalars().all()
migrated = 0
backfilled = 0
known_servers = {s.get("hostname") for s in self.servers}
@@ -137,7 +163,6 @@
record.hostname = actual_master
migrated += 1
else:
if recorded_master in known_servers:
if self.dry_run:
logger.warning(
@@ -158,6 +183,7 @@
f"(master: {recorded_master})"
)
total_queued += 1
if migrated or backfilled:
session.commit()
if backfilled:
@@ -170,6 +196,7 @@
)
finally:
session.close()
if self.dry_run:
logger.info(
f"[reconciler] Reconciliation pass complete [DRY-RUN] — "
@@ -181,265 +208,72 @@
f"{total_queued} domain(s) queued for deletion"
)
# Option C: heal backends that are missing zones
if self.save_queue is not None and self.backend_registry is not None:
self._heal_backends()
def _heal_backends(self):
"""Check every backend for zone presence and re-queue any zone that is
missing from one or more backends, using the stored zone_data as the
authoritative source. This corrects backends that missed pushes due to
downtime without waiting for DirectAdmin to re-send the zone.
"""
backends = self.backend_registry.get_available_backends()
if not backends:
return
session = connect()
try:
domains = session.execute(
select(Domain).where(Domain.zone_data.isnot(None))
).scalars().all()
if not domains:
logger.debug(
"[reconciler] Healing pass: no zone_data stored yet — skipping"
)
return
healed = 0
for record in domains:
missing = []
for backend_name, backend in backends.items():
try:
if not backend.zone_exists(record.domain):
missing.append(backend_name)
except Exception as exc:
logger.warning(
f"[reconciler] heal: zone_exists check failed for "
f"{record.domain} on {backend_name}: {exc}"
)
if missing:
mode = "[DRY-RUN] Would heal" if self.dry_run else "Healing"
logger.warning(
f"[reconciler] {mode} — {record.domain} missing from "
f"{missing}; re-queuing with stored zone_data"
)
if not self.dry_run:
self.save_queue.put(
{
"domain": record.domain,
"hostname": record.hostname or "",
"username": record.username or "",
"zone_file": record.zone_data,
"failed_backends": missing,
"retry_count": 0,
"source": "reconciler_heal",
}
)
healed += 1
if healed:
logger.info(
f"[reconciler] Healing pass complete — "
f"{healed} zone(s) re-queued for backend recovery"
)
else:
logger.debug(
"[reconciler] Healing pass complete — all backends consistent"
)
finally:
session.close()
(Removed in this hunk: the _fetch_da_domains, _da_session_login and
_parse_da_domain_list helpers plus the __main__ CLI test harness; that
logic now lives in the DirectAdminClient class shown above.)
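Concretely, a zone that one backend missed is re-queued as an ordinary save item. The values below are hypothetical; only the keys and the source/failed_backends semantics mirror what _heal_backends enqueues.

heal_item = {
    "domain": "example.com",
    "hostname": "da1.example.com",       # recorded DA master, may be ""
    "username": "siteowner",             # may be ""
    "zone_file": "$ORIGIN example.com.\n...",  # the stored zone_data snapshot
    "failed_backends": ["nsd"],          # save worker writes only to these backends
    "retry_count": 0,
    "source": "reconciler_heal",         # save worker skips re-indexing for this source
}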

View File

@@ -1,4 +1,5 @@
from loguru import logger
from sqlalchemy import select
from directdnsonly.app.db.models import *
from directdnsonly.app.db import connect
@@ -8,12 +9,11 @@ def check_zone_exists(zone_name):
# Check if zone is present in the index
session = connect()
logger.debug("Checking if {} is present in the DB".format(zone_name))
domain_exists = bool(
session.execute(select(Domain.id).filter_by(domain=zone_name)).first()
)
logger.debug("Returned from query: {}".format(domain_exists))
return domain_exists
def put_zone_index(zone_name, host_name, user_name):
@@ -28,7 +28,9 @@ def put_zone_index(zone_name, host_name, user_name):
def get_domain_record(zone_name):
"""Return the Domain record for zone_name, or None if not found"""
session = connect()
return session.execute(
select(Domain).filter_by(domain=zone_name)
).scalar_one_or_none()
def check_parent_domain_owner(zone_name):
@@ -38,7 +40,9 @@ def check_parent_domain_owner(zone_name):
return False
session = connect()
logger.debug("Checking if parent domain {} exists in DB".format(parent_domain))
return bool(
session.execute(select(Domain.id).filter_by(domain=parent_domain)).first()
)
def get_parent_domain_record(zone_name):
@@ -47,4 +51,6 @@
if not parent_domain:
return None
session = connect()
return session.execute(
select(Domain).filter_by(domain=parent_domain)
).scalar_one_or_none()
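For reference, the two result styles used above differ in what they hand back; a small usage sketch (the domain name is a placeholder, and the session comes from the project's own connect()):

from sqlalchemy import select
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain

session = connect()
# .first() returns a Row (or None), which is enough for a boolean existence check
row = session.execute(select(Domain.id).filter_by(domain="example.com")).first()
print(bool(row))
# .scalar_one_or_none() unwraps the single ORM object (or None) for callers that need the record
record = session.execute(select(Domain).filter_by(domain="example.com")).scalar_one_or_none()
print(record.hostname if record else "not indexed")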

View File

@@ -10,10 +10,12 @@ from typing import Any, Dict
def load_config() -> Vyper:
# Initialize Vyper
v.set_config_name("app")  # Looks for app.yaml/app.yml
# User-supplied paths checked first so they override the bundled defaults
v.add_config_path("/etc/directdnsonly")  # system-level mount
v.add_config_path(".")  # CWD (e.g. /app when run directly)
v.add_config_path("./config")  # docker-compose volume mount at /app/config
# Bundled config colocated with this module — last-resort fallback
v.add_config_path(str(Path(__file__).parent))
v.set_env_prefix("DADNS")
v.set_env_key_replacer("_", ".")
v.automatic_env()
@@ -41,6 +43,10 @@ def load_config() -> Vyper:
v.set_default("dns.backends.bind.zones_dir", "/etc/named/zones")
v.set_default("dns.backends.bind.named_conf", "/etc/named.conf.local")
v.set_default("dns.backends.nsd.enabled", False)
v.set_default("dns.backends.nsd.zones_dir", "/etc/nsd/zones")
v.set_default("dns.backends.nsd.nsd_conf", "/etc/nsd/nsd.conf.d/zones.conf")
v.set_default("dns.backends.coredns_mysql.enabled", False)
v.set_default("dns.backends.coredns_mysql.host", "localhost")
v.set_default("dns.backends.coredns_mysql.port", 3306)
@@ -60,6 +66,10 @@ def load_config() -> Vyper:
v.set_default("reconciliation.interval_minutes", 60)
v.set_default("reconciliation.verify_ssl", True)
# Peer sync defaults
v.set_default("peer_sync.enabled", False)
v.set_default("peer_sync.interval_minutes", 15)
# Read configuration
try:
if not v.read_in_config():

View File

@@ -14,6 +14,8 @@ app:
# enabled: true
# dry_run: true # log orphans but do NOT queue deletes — safe first-run mode
# interval_minutes: 60
# initial_delay_minutes: 0 # stagger first run when running multiple receivers behind a LB
# # e.g. receiver-1: 0, receiver-2: 30 (half the interval)
# verify_ssl: true # set false for self-signed DA certs
# ipp: 1000 # items per page when polling DA (default 1000)
# directadmin_servers:
@@ -28,6 +30,18 @@ app:
# password: secret
# ssl: true
# Peer sync — exchange zone_data between directdnsonly instances
# Enables eventual consistency without a shared datastore.
# If a peer is offline, the sync is silently skipped and retried next interval.
# Use the same credentials as the peer's app.auth_username / auth_password.
#peer_sync:
# enabled: true
# interval_minutes: 15
# peers:
# - url: http://ddo-2:2222 # URL of the peer directdnsonly instance
# username: directdnsonly
# password: changeme
dns:
default_backend: bind
backends:
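Equivalently, the commented block above is what PeerSyncWorker receives as a plain dict once the config is loaded; the URL and credentials below are the same placeholders used in the example.

from directdnsonly.app.peer_sync import PeerSyncWorker

peer_sync_config = {
    "enabled": True,
    "interval_minutes": 15,
    "peers": [
        {
            "url": "http://ddo-2:2222",      # the peer directdnsonly instance
            "username": "directdnsonly",     # the peer's app.auth_username
            "password": "changeme",          # the peer's app.auth_password
        }
    ],
}
worker = PeerSyncWorker(peer_sync_config)
worker.start()  # no-op when enabled is False or peers is empty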

View File

@@ -3,6 +3,7 @@ import cherrypy
from app.backends import BackendRegistry
from app.api.admin import DNSAdminAPI
from app.api.health import HealthAPI
from app.api.internal import InternalAPI
from app import configure_logging
from worker import WorkerManager
from directdnsonly.config import config
@@ -38,10 +39,12 @@ def main():
# Setup worker manager
reconciliation_config = config.get("reconciliation") or {}
peer_sync_config = config.get("peer_sync") or {}
worker_manager = WorkerManager(
queue_path=config.get("queue_location"),
backend_registry=registry,
reconciliation_config=reconciliation_config,
peer_sync_config=peer_sync_config,
)
worker_manager.start()
logger.info(
@@ -95,6 +98,7 @@ def main():
backend_registry=registry,
)
root.health = HealthAPI(registry)
root.internal = InternalAPI()
# Add queue status endpoint
root.queue_status = lambda: worker_manager.queue_status()
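For context, this is what a peer calls on the InternalAPI route wired in above, shown as a standalone request. The host and credentials are placeholders; the response fields are the ones PeerSyncWorker expects.

import requests

base = "http://ddo-1:2222"
auth = ("directdnsonly", "changeme")  # same credentials as the instance's basic auth

zones = requests.get(f"{base}/internal/zones", auth=auth, timeout=10).json()
# e.g. [{"domain": "example.com", "zone_updated_at": "2026-02-19T12:30:00",
#        "hostname": "da1.example.com", "username": "siteowner"}, ...]
if zones:
    detail = requests.get(
        f"{base}/internal/zones",
        params={"domain": zones[0]["domain"]},
        auth=auth,
        timeout=10,
    ).json()
    print(detail.get("zone_data", "")[:80])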

View File

@@ -1,3 +1,4 @@
import datetime
import os
import threading
import time
@@ -5,43 +6,62 @@ from concurrent.futures import ThreadPoolExecutor, as_completed
from loguru import logger
from persistqueue import Queue
from persistqueue.exceptions import Empty
from sqlalchemy import select
from app.utils import check_zone_exists, put_zone_index
from app.utils.zone_parser import count_zone_records
from directdnsonly.app.db.models import Domain
from directdnsonly.app.db import connect
from directdnsonly.app.reconciler import ReconciliationWorker
from directdnsonly.app.peer_sync import PeerSyncWorker
# ---------------------------------------------------------------------------
# Retry configuration
# ---------------------------------------------------------------------------
MAX_RETRIES = 5
# Seconds to wait before each retry attempt (exponential-ish backoff)
BACKOFF_SECONDS = [30, 120, 300, 900, 1800]  # 30s, 2m, 5m, 15m, 30m
RETRY_DRAIN_INTERVAL = 30  # how often the retry drain thread wakes
class WorkerManager:
def __init__(
self,
queue_path: str,
backend_registry,
reconciliation_config: dict = None,
peer_sync_config: dict = None,
):
self.queue_path = queue_path
self.backend_registry = backend_registry
self._running = False
self._save_thread = None
self._delete_thread = None
self._retry_thread = None
self._reconciler = None
self._peer_syncer = None
self._reconciliation_config = reconciliation_config or {}
self._peer_sync_config = peer_sync_config or {}
try:
os.makedirs(queue_path, exist_ok=True)
self.save_queue = Queue(f"{queue_path}/save")
self.delete_queue = Queue(f"{queue_path}/delete")
self.retry_queue = Queue(f"{queue_path}/retry")
logger.success(f"Initialized queues at {queue_path}")
except Exception as e:
logger.critical(f"Failed to initialize queues: {e}")
raise
# ------------------------------------------------------------------
# Save queue worker
# ------------------------------------------------------------------
def _process_save_queue(self):
logger.info("Save queue worker started")
session = connect()
batch_start = None
batch_processed = 0
batch_failed = 0
@@ -50,58 +70,67 @@ class WorkerManager:
try:
item = self.save_queue.get(block=True, timeout=5)
if batch_start is None:
batch_start = time.monotonic()
batch_processed = 0
batch_failed = 0
pending = self.save_queue.qsize()
logger.info(
f"📥 Batch started — {pending + 1} zone(s) queued for processing"
)
domain = item.get("domain", "unknown")
is_retry = item.get("source") in ("retry", "reconciler_heal")
target_backends = item.get("failed_backends")  # None = all backends
logger.debug(
f"Processing zone update for {domain}"
+ (f" [retry #{item.get('retry_count', 0)}]" if is_retry else "")
+ (f" [backends: {target_backends}]" if target_backends else "")
)
if not is_retry and not check_zone_exists(domain):
put_zone_index(domain, item.get("hostname"), item.get("username"))
if not all(k in item for k in ["domain", "zone_file"]):
logger.error(f"Invalid queue item: {item}")
self.save_queue.task_done()
batch_failed += 1
continue
backends = self.backend_registry.get_available_backends()
if target_backends:
backends = {
k: v for k, v in backends.items() if k in target_backends
}
if not backends:
logger.warning("No target backends available for this item!")
self.save_queue.task_done()
batch_failed += 1
continue
if len(backends) > 1:
failed = self._process_backends_parallel(backends, item, session)
else:
failed = set()
for backend_name, backend in backends.items():
if not self._process_single_backend(
backend_name, backend, item, session
):
failed.add(backend_name)
if failed:
self._schedule_retry(item, failed)
batch_failed += 1
else:
# Successful write — persist zone_data for Option C healing
self._store_zone_data(session, domain, item["zone_file"])
batch_processed += 1
self.save_queue.task_done()
logger.debug(f"Completed processing for {domain}")
except Empty:
if batch_start is not None:
elapsed = time.monotonic() - batch_start
total = batch_processed + batch_failed
@@ -119,35 +148,146 @@ class WorkerManager:
except Exception as e:
logger.error(f"Unexpected worker error: {e}")
batch_failed += 1
time.sleep(1)
def _process_single_backend(self, backend_name, backend, item, session) -> bool:
"""Write a zone to one backend. Returns True on success, False on failure."""
try:
if backend.write_zone(item["domain"], item["zone_file"]):
logger.debug(f"Successfully updated {item['domain']} in {backend_name}")
if backend.get_name() == "bind":
backend.update_named_conf(
[d.domain for d in session.execute(select(Domain)).scalars().all()]
)
backend.reload_zone()
else:
backend.reload_zone(zone_name=item["domain"])
self._verify_backend_record_count(
backend_name, backend, item["domain"], item["zone_file"]
)
return True
else:
logger.error(f"Failed to update {item['domain']} in {backend_name}")
return False
except Exception as e:
logger.error(f"Error in {backend_name}: {str(e)}")
return False
def _process_backends_parallel(self, backends, item, session) -> set:
"""Write a zone to multiple backends concurrently.
Returns a set of backend names that failed."""
start_time = time.monotonic()
failed = set()
with ThreadPoolExecutor(
max_workers=len(backends), thread_name_prefix="backend"
) as executor:
futures = {
executor.submit(
self._process_single_backend, backend_name, backend, item, session
): backend_name
for backend_name, backend in backends.items()
}
for future in as_completed(futures):
backend_name = futures[future]
try:
success = future.result()
if not success:
failed.add(backend_name)
except Exception as e:
logger.error(f"Unhandled error in backend {backend_name}: {e}")
failed.add(backend_name)
elapsed = (time.monotonic() - start_time) * 1000
logger.debug(
f"Parallel processing of {item['domain']} across "
f"{len(backends)} backends completed in {elapsed:.0f}ms"
)
return failed
def _schedule_retry(self, item: dict, failed_backends: set):
"""Push a failed write onto the retry queue with exponential backoff.
Discards to dead-letter after MAX_RETRIES attempts."""
retry_count = item.get("retry_count", 0) + 1
if retry_count > MAX_RETRIES:
logger.error(
f"[retry] Dead-letter: {item['domain']} failed on "
f"{failed_backends} after {MAX_RETRIES} attempts — giving up"
)
return
delay = BACKOFF_SECONDS[min(retry_count - 1, len(BACKOFF_SECONDS) - 1)]
retry_item = {
**item,
"failed_backends": list(failed_backends),
"retry_count": retry_count,
"retry_after": time.time() + delay,
"source": "retry",
}
self.retry_queue.put(retry_item)
logger.warning(
f"[retry] {item['domain']}{list(failed_backends)} "
f"scheduled for retry #{retry_count} in {delay}s"
)
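# For reference, the schedule these constants produce, attempt by attempt
# (standalone sketch; retry_count starts at 1 exactly as in _schedule_retry):
MAX_RETRIES = 5
BACKOFF_SECONDS = [30, 120, 300, 900, 1800]
for retry_count in range(1, MAX_RETRIES + 1):
    delay = BACKOFF_SECONDS[min(retry_count - 1, len(BACKOFF_SECONDS) - 1)]
    print(f"attempt #{retry_count}: wait {delay}s")
# attempt #1: 30s, #2: 120s, #3: 300s, #4: 900s, #5: 1800s; a 6th failure is dead-lettered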
def _store_zone_data(self, session, domain: str, zone_file: str):
"""Persist the latest zone file content to the domain DB record."""
try:
record = session.execute(
select(Domain).filter_by(domain=domain)
).scalar_one_or_none()
if record:
record.zone_data = zone_file
record.zone_updated_at = datetime.datetime.utcnow()
session.commit()
except Exception as exc:
logger.warning(f"[worker] Could not store zone_data for {domain}: {exc}")
# ------------------------------------------------------------------
# Retry drain worker
# ------------------------------------------------------------------
def _process_retry_queue(self):
"""Periodically drain the retry queue and re-feed ready items to the
save queue. Items not yet due are put back onto the retry queue."""
logger.info("Retry drain worker started")
while self._running:
time.sleep(RETRY_DRAIN_INTERVAL)
now = time.time()
pending = []
# Drain all current retry items into memory
while True:
try:
pending.append(self.retry_queue.get_nowait())
self.retry_queue.task_done()
except Empty:
break
if not pending:
continue
ready = [i for i in pending if i.get("retry_after", 0) <= now]
not_ready = [i for i in pending if i.get("retry_after", 0) > now]
for item in not_ready:
self.retry_queue.put(item)
for item in ready:
logger.info(
f"[retry] Re-queuing {item['domain']}"
f"{item.get('failed_backends')} "
f"(attempt #{item.get('retry_count', '?')})"
)
self.save_queue.put(item)
if ready:
logger.debug(
f"[retry] Drain: {len(ready)} item(s) ready, "
f"{len(not_ready)} still pending"
)
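# A toy illustration of the drain partition above: items whose retry_after has
# passed are re-fed to the save queue, the rest are parked again (domains and
# timestamps invented).
import time
pending = [
    {"domain": "a.example", "retry_after": time.time() - 5},    # already due
    {"domain": "b.example", "retry_after": time.time() + 600},  # due in 10 minutes
]
now = time.time()
ready = [i for i in pending if i.get("retry_after", 0) <= now]
not_ready = [i for i in pending if i.get("retry_after", 0) > now]
print([i["domain"] for i in ready], [i["domain"] for i in not_ready])
# ['a.example'] ['b.example']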
# ------------------------------------------------------------------
# Delete queue worker
# ------------------------------------------------------------------
def _process_delete_queue(self):
logger.info("Delete queue worker started")
session = connect()
@@ -159,7 +299,9 @@
logger.debug(f"Processing delete for {domain}")
record = session.execute(
select(Domain).filter_by(domain=domain)
).scalar_one_or_none()
if not record:
logger.warning(f"Domain {domain} not found in DB — skipping delete")
self.delete_queue.task_done()
@@ -179,111 +321,18 @@ class WorkerManager:
)
backends = self.backend_registry.get_available_backends()
remaining_domains = [
d.domain for d in session.execute(select(Domain)).scalars().all()
]
delete_success = True
if not backends:
logger.warning(
f"No active backends — {domain} will be removed from DB only"
)
elif len(backends) > 1:
results = []
with ThreadPoolExecutor(max_workers=len(backends)) as executor:
futures = {
executor.submit(
self._delete_single_backend,
@@ -297,58 +346,62 @@
for future in as_completed(futures):
backend_name = futures[future]
try:
results.append(future.result())
except Exception as e:
logger.error(
f"Unhandled error deleting from {backend_name}: {e}"
)
results.append(False)
delete_success = all(results)
else:
for backend_name, backend in backends.items():
if not self._delete_single_backend(
backend_name, backend, domain, remaining_domains
):
delete_success = False
if delete_success:
session.delete(record)
session.commit()
logger.success(f"Delete completed for {domain}")
else:
logger.error(
f"Delete failed for {domain} on one or more backends — "
f"DB record retained"
)
self.delete_queue.task_done()
except Empty:
continue
except Exception as e:
logger.error(f"Unexpected delete worker error: {e}")
time.sleep(1)
def _delete_single_backend(
self, backend_name, backend, domain, remaining_domains
) -> bool:
"""Delete a zone from one backend. Returns True on success."""
try:
if backend.delete_zone(domain):
logger.debug(f"Deleted {domain} from {backend_name}")
if backend.get_name() == "bind":
backend.update_named_conf(remaining_domains)
backend.reload_zone()
else:
backend.reload_zone(zone_name=domain)
return True
else:
logger.error(f"Failed to delete {domain} from {backend_name}")
return False
except Exception as e:
logger.error(f"Error deleting {domain} from {backend_name}: {e}")
return False
# ------------------------------------------------------------------
# Record count verification
# ------------------------------------------------------------------
(Removed in this hunk: the inline delete_backend_wrapper closure and the
previous _delete_single_backend / _process_backends_delete_parallel /
_process_backends_parallel helpers, which did not report per-backend success.)
    def _verify_backend_record_count(self, backend_name, backend, zone_name, zone_data):
        """Verify and reconcile the backend record count against the
        authoritative BIND zone from DirectAdmin.

        After a successful write, this method checks whether the number of
        records stored in the backend matches the number of records parsed
        from the source zone file. If there are **extra** records in the
        backend (e.g. from replication drift or stale data) they are
        automatically removed via the backend's reconcile method.

        Args:
            backend_name: Display name of the backend instance
            backend: The backend instance
            zone_name: The zone that was just written
            zone_data: The raw BIND zone file content (authoritative source)
        """
        try:
            expected = count_zone_records(zone_data, zone_name)
            if expected < 0:
@@ -359,46 +412,40 @@
                return

            matches, actual = backend.verify_zone_record_count(zone_name, expected)
            if matches:
                return

            if actual > expected:
                logger.warning(
                    f"[{backend_name}] Backend has {actual - expected} extra "
                    f"record(s) for {zone_name} — reconciling"
                )
                success, removed = backend.reconcile_zone_records(zone_name, zone_data)
                if success and removed > 0:
                    matches, new_count = backend.verify_zone_record_count(
                        zone_name, expected
                    )
                    if matches:
                        logger.success(
                            f"[{backend_name}] Reconciliation successful for "
                            f"{zone_name}: removed {removed} extra record(s)"
                        )
                    else:
                        logger.error(
                            f"[{backend_name}] Reconciliation for {zone_name} "
                            f"removed {removed} record(s) but count still mismatched: "
                            f"expected {expected}, got {new_count}"
                        )
            else:
                logger.warning(
                    f"[{backend_name}] Backend has fewer records than source "
                    f"for {zone_name} (expected {expected}, got {actual}) — "
                    f"next zone push from DirectAdmin should correct this"
                )
        except NotImplementedError:
            logger.debug(
                f"[{backend_name}] Record count verification not supported — skipping"
            )
        except Exception as e:
            logger.error(
@@ -406,50 +453,64 @@ class WorkerManager:
                f"for {zone_name}: {e}"
            )
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
    def start(self):
        """Start background workers"""
        if self._running:
            return
        self._running = True
        self._save_thread = threading.Thread(
            target=self._process_save_queue, daemon=True, name="save_queue_worker"
        )
        self._delete_thread = threading.Thread(
            target=self._process_delete_queue, daemon=True, name="delete_queue_worker"
        )
        self._retry_thread = threading.Thread(
            target=self._process_retry_queue, daemon=True, name="retry_drain_worker"
        )
        self._save_thread.start()
        self._delete_thread.start()
        self._retry_thread.start()
        logger.info(f"Started worker threads: save, delete, retry_drain")
        self._reconciler = ReconciliationWorker(
            delete_queue=self.delete_queue,
            save_queue=self.save_queue,
            backend_registry=self.backend_registry,
            reconciliation_config=self._reconciliation_config,
        )
        self._reconciler.start()
        self._peer_syncer = PeerSyncWorker(self._peer_sync_config)
        self._peer_syncer.start()

    def stop(self):
        """Stop background workers gracefully"""
        self._running = False
        if self._reconciler:
            self._reconciler.stop()
        if self._peer_syncer:
            self._peer_syncer.stop()
        for thread in (self._save_thread, self._delete_thread, self._retry_thread):
            if thread:
                thread.join(timeout=5)
        logger.info("Workers stopped")

    def queue_status(self):
        """Return current queue status"""
        return {
            "save_queue_size": self.save_queue.qsize(),
            "delete_queue_size": self.delete_queue.qsize(),
            "retry_queue_size": self.retry_queue.qsize(),
            "save_worker_alive": self._save_thread and self._save_thread.is_alive(),
            "delete_worker_alive": self._delete_thread
            and self._delete_thread.is_alive(),
            "retry_worker_alive": self._retry_thread and self._retry_thread.is_alive(),
            "reconciler_alive": (
                self._reconciler.is_alive if self._reconciler else False
            ),
            "peer_syncer_alive": (
                self._peer_syncer.is_alive if self._peer_syncer else False
            ),
        }
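
For illustration, a healthy node running every worker would report a payload shaped like the following; the keys are the ones built above, the values are invented:

    # Hypothetical snapshot of WorkerManager.queue_status() on an idle, healthy node.
    # Sizes and flags are illustrative values, not taken from a real deployment.
    {
        "save_queue_size": 0,
        "delete_queue_size": 0,
        "retry_queue_size": 2,
        "save_worker_alive": True,
        "delete_worker_alive": True,
        "retry_worker_alive": True,
        "reconciler_alive": True,
        "peer_syncer_alive": True,
    }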

View File

@@ -1,12 +1,91 @@
#!/bin/bash
set -e

# ---------------------------------------------------------------------------
# Detect which DNS backend type(s) are configured and enabled.
# Uses the same config search order as the application itself.
# ---------------------------------------------------------------------------
detect_backend_types() {
    python3 - <<'EOF'
import yaml, sys, os

config_paths = [
    "/etc/directdnsonly/app.yml",
    "/etc/directdnsonly/app.yaml",
    "/app/app.yml",
    "/app/app.yaml",
    "/app/config/app.yml",
    "/app/config/app.yaml",
]

# Also honour env-var-only deployments (no config file)
bind_env = os.environ.get("DADNS_DNS_BACKENDS_BIND_ENABLED", "").lower() == "true"
nsd_env = os.environ.get("DADNS_DNS_BACKENDS_NSD_ENABLED", "").lower() == "true"

config = {}
for path in config_paths:
    if os.path.exists(path):
        with open(path) as f:
            config = yaml.safe_load(f) or {}
        break

backends = config.get("dns", {}).get("backends", {})
has_bind = bind_env
has_nsd = nsd_env
for cfg in backends.values():
    if not isinstance(cfg, dict) or not cfg.get("enabled", False):
        continue
    btype = cfg.get("type", "")
    if btype == "bind":
        has_bind = True
    elif btype == "nsd":
        has_nsd = True

types = []
if has_bind:
    types.append("bind")
if has_nsd:
    types.append("nsd")
print(" ".join(types) if types else "none")
EOF
}

BACKEND_TYPES=$(detect_backend_types)
echo "[entrypoint] Detected DNS backend type(s): ${BACKEND_TYPES:-none}"

# ---------------------------------------------------------------------------
# Start BIND if a bind backend is configured
# ---------------------------------------------------------------------------
if echo "$BACKEND_TYPES" | grep -qw "bind"; then
    if command -v named >/dev/null 2>&1; then
        echo "[entrypoint] Starting BIND (named)"
        /usr/sbin/named -u bind -f &
    else
        echo "[entrypoint] WARNING: bind backend configured but 'named' not found — skipping"
    fi
fi

# ---------------------------------------------------------------------------
# Start NSD if an nsd backend is configured
# ---------------------------------------------------------------------------
if echo "$BACKEND_TYPES" | grep -qw "nsd"; then
    if command -v nsd >/dev/null 2>&1; then
        echo "[entrypoint] Starting NSD"
        # Ensure nsd-control keys exist (generated on first run)
        if [ ! -f /etc/nsd/nsd_server.key ]; then
            nsd-control-setup 2>/dev/null || true
        fi
        /usr/sbin/nsd -d -c /etc/nsd/nsd.conf &
    else
        echo "[entrypoint] WARNING: nsd backend configured but 'nsd' not found — skipping"
    fi
fi

if [ "$BACKEND_TYPES" = "none" ] || [ -z "$BACKEND_TYPES" ]; then
    echo "[entrypoint] No local DNS daemon required (CoreDNS MySQL or similar)"
fi

# ---------------------------------------------------------------------------
# Start the directdnsonly application
# ---------------------------------------------------------------------------
exec python -m directdnsonly
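
The two environment toggles read by detect_backend_types() allow env-var-only deployments with no YAML file at all. A minimal, hypothetical run of the published image that starts NSD and skips BIND could look like the snippet below (the image tag comes from the Makefile further down; any other application settings would still need their own DADNS_* variables):

    # Illustrative only: enable the NSD daemon via env vars, no app.yml mounted.
    docker run -d \
      -e DADNS_DNS_BACKENDS_NSD_ENABLED=true \
      -e DADNS_DNS_BACKENDS_BIND_ENABLED=false \
      -p 53:53/udp -p 53:53/tcp \
      guisea/directdnsonly:dev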

docker/nsd.conf (new file, 20 lines)
View File

@@ -0,0 +1,20 @@
# NSD base configuration for directdnsonly containers.
# Zone stanzas are written to /etc/nsd/nsd.conf.d/zones.conf by the NSD
# backend and auto-included via the glob below.
server:
server-count: 1
ip-address: 0.0.0.0
port: 53
username: nsd
zonesdir: /etc/nsd/zones
verbosity: 1
# Log to stderr so Docker captures it
logfile: ""
remote-control:
control-enable: yes
control-interface: 127.0.0.1
control-port: 8952
include: /etc/nsd/nsd.conf.d/*.conf
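
The per-zone stanzas appended to /etc/nsd/nsd.conf.d/zones.conf are not part of this diff. Judging from the assertions in tests/test_nsd.py further down, each registered zone would produce an entry roughly like the sketch below; the zonefile key and the relative path are assumptions based on standard NSD configuration, only the name and the .db filename are asserted by the tests:

    # Hypothetical generated stanza for one registered zone
    zone:
        name: "example.com"
        zonefile: "example.com.db"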

View File


@@ -87,6 +87,9 @@ build:
	directdnsonly/main.py
	rm -f *.spec

build-docker:
	export DOCKER_CONFIG="/home/guisea/.docker/guisea" && \
	docker buildx build --platform linux/amd64,linux/arm64 -t guisea/directdnsonly:dev --push --progress plain --file Dockerfile .

# ---------------------------------------------------------------------------
# Clean
# ---------------------------------------------------------------------------

poetry.lock (generated, 1117 lines)

File diff suppressed because it is too large

View File

@@ -1,6 +1,6 @@
[project]
name = "directdnsonly"
version = "2.5.0"
description = "DNS Management System - DirectAdmin to multiple backends"
authors = [
    {name = "Aaron Guise",email = "aaron@guise.net.nz"}
]
@@ -11,12 +11,12 @@ requires-python = ">=3.11,<3.14"
dependencies = [
    "vyper-config (>=1.2.1,<2.0.0)",
    "loguru (>=0.7.3,<0.8.0)",
    "persist-queue (>=1.1.0,<2.0.0)",
    "cherrypy (>=18.10.0,<19.0.0)",
    "sqlalchemy (>=2.0.0,<3.0.0)",
    "pymysql (>=1.1.2,<2.0.0)",
    "dnspython (>=2.8.0,<3.0.0)",
    "pyyaml (>=6.0.3,<7.0.0)",
    "requests (>=2.32.0,<3.0.0)",
]
@@ -24,11 +24,11 @@ dependencies = [
package-mode = true

[tool.poetry.group.dev.dependencies]
black = "^26.1.0"
pyinstaller = "^6.13.0"
pytest = "^9.0.2"
pytest-cov = "^7.0.0"
pytest-mock = "^3.15.1"

[build-system]
requires = ["poetry-core>=2.0.0,<3.0.0"]

View File

@@ -21,7 +21,7 @@ def engine():
@pytest.fixture
def db_session(engine):
    session = sessionmaker(engine)()
    yield session
    session.close()
@@ -37,4 +37,5 @@ def patch_connect(db_session, monkeypatch):
    _factory = lambda: db_session  # noqa: E731
    monkeypatch.setattr("directdnsonly.app.utils.connect", _factory)
    monkeypatch.setattr("directdnsonly.app.reconciler.connect", _factory)
    monkeypatch.setattr("directdnsonly.app.peer_sync.connect", _factory)
    return db_session
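
The SQLAlchemy changes in this and the following test diffs all apply the same 1.4 to 2.0 query-style migration. A self-contained sketch of the pattern, using a stand-in model rather than the project's real tables:

    from sqlalchemy import String, create_engine, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    class Base(DeclarativeBase):
        pass

    class Domain(Base):  # stand-in model, not the project's Domain table
        __tablename__ = "domain"
        domain: Mapped[str] = mapped_column(String(255), primary_key=True)

    engine = create_engine("sqlite:///:memory:")
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Domain(domain="example.com"))
        session.commit()
        # 1.4 style (removed in these commits): session.query(Domain).all()
        # 2.0 style: build a select() and run it through session.execute()
        domains = session.execute(select(Domain)).scalars().all()
        print([d.domain for d in domains])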

View File

@@ -1,7 +1,7 @@
"""Tests for the CoreDNS MySQL backend (run against in-memory SQLite).""" """Tests for the CoreDNS MySQL backend (run against in-memory SQLite)."""
import pytest import pytest
from sqlalchemy import create_engine from sqlalchemy import create_engine, select
from sqlalchemy.orm import scoped_session, sessionmaker from sqlalchemy.orm import scoped_session, sessionmaker
from directdnsonly.app.backends.coredns_mysql import ( from directdnsonly.app.backends.coredns_mysql import (
@@ -28,7 +28,7 @@ def mysql_backend():
self.config = {} self.config = {}
self.instance_name = "test" self.instance_name = "test"
self.engine = engine self.engine = engine
self.Session = scoped_session(sessionmaker(bind=engine)) self.Session = scoped_session(sessionmaker(engine))
yield _TestBackend() yield _TestBackend()
engine.dispose() engine.dispose()
@@ -84,8 +84,8 @@ def test_write_zone_removes_stale_records(mysql_backend):
mysql_backend.write_zone("example.com", reduced) mysql_backend.write_zone("example.com", reduced)
session = mysql_backend.Session() session = mysql_backend.Session()
zone = session.query(Zone).filter_by(zone_name="example.com.").first() zone = session.execute(select(Zone).filter_by(zone_name="example.com.")).scalar_one_or_none()
records = session.query(Record).filter_by(zone_id=zone.id, type="AAAA").all() records = session.execute(select(Record).filter_by(zone_id=zone.id, type="AAAA")).scalars().all()
assert records == [] assert records == []
session.close() session.close()
@@ -141,7 +141,7 @@ def test_reconcile_removes_extra_records(mysql_backend):
# Inject a phantom record directly into the DB # Inject a phantom record directly into the DB
session = mysql_backend.Session() session = mysql_backend.Session()
zone = session.query(Zone).filter_by(zone_name="example.com.").first() zone = session.execute(select(Zone).filter_by(zone_name="example.com.")).scalar_one_or_none()
session.add( session.add(
Record( Record(
zone_id=zone.id, zone_id=zone.id,

tests/test_da_client.py (new file, 192 lines)
View File

@@ -0,0 +1,192 @@
"""Tests for directdnsonly.app.da.client — DirectAdminClient."""
import requests.exceptions
from unittest.mock import MagicMock, patch
from directdnsonly.app.da import DirectAdminClient
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_json_response(domains_list, total_pages=1):
data = {str(i): {"domain": d} for i, d in enumerate(domains_list)}
data["info"] = {"total_pages": total_pages}
mock = MagicMock()
mock.status_code = 200
mock.is_redirect = False
mock.headers = {"Content-Type": "application/json"}
mock.json.return_value = data
mock.raise_for_status = MagicMock()
return mock
def _client():
return DirectAdminClient("da1.example.com", 2222, "admin", "secret", ssl=True, verify_ssl=True)
# ---------------------------------------------------------------------------
# list_domains — JSON happy path
# ---------------------------------------------------------------------------
def test_list_domains_returns_set_from_json():
mock_resp = _make_json_response(["example.com", "test.com"])
with patch("requests.get", return_value=mock_resp):
result = _client().list_domains()
assert result == {"example.com", "test.com"}
def test_list_domains_paginates():
page1 = _make_json_response(["a.com"], total_pages=2)
page2 = _make_json_response(["b.com"], total_pages=2)
with patch("requests.get", side_effect=[page1, page2]):
result = _client().list_domains()
assert result == {"a.com", "b.com"}
# ---------------------------------------------------------------------------
# list_domains — DA Evo session login fallback
# ---------------------------------------------------------------------------
def test_redirect_triggers_session_login():
redirect_resp = MagicMock()
redirect_resp.status_code = 302
redirect_resp.is_redirect = True
client = _client()
with (
patch("requests.get", return_value=redirect_resp),
patch.object(client, "_login", return_value=False),
):
result = client.list_domains()
assert result is None
def test_persistent_redirect_after_login_returns_none():
redirect_resp = MagicMock()
redirect_resp.status_code = 302
redirect_resp.is_redirect = True
client = _client()
# Simulate cookies already set (login succeeded previously)
client._cookies = {"session": "abc"}
with patch("requests.get", return_value=redirect_resp):
result = client.list_domains()
assert result is None
# ---------------------------------------------------------------------------
# list_domains — error cases
# ---------------------------------------------------------------------------
def test_html_response_returns_none():
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.is_redirect = False
mock_resp.headers = {"Content-Type": "text/html; charset=utf-8"}
mock_resp.raise_for_status = MagicMock()
with patch("requests.get", return_value=mock_resp):
result = _client().list_domains()
assert result is None
def test_connection_error_returns_none():
with patch("requests.get", side_effect=requests.exceptions.ConnectionError("refused")):
result = _client().list_domains()
assert result is None
def test_timeout_returns_none():
with patch("requests.get", side_effect=requests.exceptions.Timeout()):
result = _client().list_domains()
assert result is None
def test_ssl_error_returns_none():
with patch("requests.get", side_effect=requests.exceptions.SSLError("cert verify failed")):
result = _client().list_domains()
assert result is None
# ---------------------------------------------------------------------------
# _parse_legacy_domain_list
# ---------------------------------------------------------------------------
def test_parse_standard_querystring():
result = DirectAdminClient._parse_legacy_domain_list("list[]=example.com&list[]=test.com")
assert result == {"example.com", "test.com"}
def test_parse_newline_separated():
result = DirectAdminClient._parse_legacy_domain_list("list[]=example.com\nlist[]=test.com")
assert result == {"example.com", "test.com"}
def test_parse_empty_body_returns_empty_set():
assert DirectAdminClient._parse_legacy_domain_list("") == set()
def test_parse_normalises_to_lowercase():
result = DirectAdminClient._parse_legacy_domain_list("list[]=EXAMPLE.COM")
assert "example.com" in result
assert "EXAMPLE.COM" not in result
def test_parse_strips_whitespace():
result = DirectAdminClient._parse_legacy_domain_list("list[]= example.com ")
assert "example.com" in result
# ---------------------------------------------------------------------------
# _login
# ---------------------------------------------------------------------------
def test_login_stores_cookies_on_success():
mock_resp = MagicMock()
mock_resp.cookies = {"session": "tok123"}
client = _client()
with patch("requests.post", return_value=mock_resp):
result = client._login()
assert result is True
assert client._cookies == {"session": "tok123"}
def test_login_returns_false_when_no_cookies():
mock_resp = MagicMock()
mock_resp.cookies = {}
client = _client()
with patch("requests.post", return_value=mock_resp):
result = client._login()
assert result is False
assert client._cookies is None
def test_login_returns_false_on_exception():
client = _client()
with patch("requests.post", side_effect=requests.exceptions.ConnectionError()):
result = client._login()
assert result is False

tests/test_nsd.py (new file, 227 lines)
View File

@@ -0,0 +1,227 @@
"""Tests for directdnsonly.app.backends.nsd — NSDBackend."""
import subprocess
from pathlib import Path
from unittest.mock import patch, MagicMock
import pytest
from directdnsonly.app.backends.nsd import NSDBackend
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
ZONE_DATA = """\
$ORIGIN example.com.
$TTL 300
@ 300 IN SOA ns1.example.com. hostmaster.example.com. (2024010101 3600 900 604800 300)
@ 300 IN NS ns1.example.com.
@ 300 IN A 192.0.2.1
"""
def _make_backend(tmp_path) -> NSDBackend:
"""Return an NSDBackend pointing at tmp_path directories.
is_available() is patched so the tests do not require a real nsd install.
"""
zones_dir = tmp_path / "zones"
nsd_conf = tmp_path / "nsd.conf.d" / "zones.conf"
config = {
"instance_name": "test_nsd",
"zones_dir": str(zones_dir),
"nsd_conf": str(nsd_conf),
}
with patch.object(NSDBackend, "is_available", return_value=True):
return NSDBackend(config)
# ---------------------------------------------------------------------------
# Availability check
# ---------------------------------------------------------------------------
def test_is_available_true(monkeypatch):
monkeypatch.setattr(
"directdnsonly.app.backends.nsd.subprocess.run",
lambda *a, **kw: MagicMock(returncode=0),
)
assert NSDBackend.is_available()
def test_is_available_false_when_not_installed(monkeypatch):
def raise_fnf(*args, **kwargs):
raise FileNotFoundError
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", raise_fnf)
assert not NSDBackend.is_available()
# ---------------------------------------------------------------------------
# Initialisation
# ---------------------------------------------------------------------------
def test_init_creates_zones_dir(tmp_path):
backend = _make_backend(tmp_path)
assert backend.zones_dir.exists()
def test_init_creates_nsd_conf(tmp_path):
backend = _make_backend(tmp_path)
assert backend.nsd_conf.exists()
def test_get_name():
assert NSDBackend.get_name() == "nsd"
# ---------------------------------------------------------------------------
# write_zone
# ---------------------------------------------------------------------------
def test_write_zone_creates_zone_file(tmp_path):
backend = _make_backend(tmp_path)
assert backend.write_zone("example.com", ZONE_DATA)
assert (backend.zones_dir / "example.com.db").exists()
def test_write_zone_content_matches(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
content = (backend.zones_dir / "example.com.db").read_text()
assert content == ZONE_DATA
def test_write_zone_adds_to_conf(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
conf = backend.nsd_conf.read_text()
assert 'name: "example.com"' in conf
assert "example.com.db" in conf
def test_write_zone_idempotent_conf_entry(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.write_zone("example.com", ZONE_DATA)
conf = backend.nsd_conf.read_text()
# Should appear exactly once
assert conf.count('name: "example.com"') == 1
def test_write_zone_multiple_zones(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.write_zone("other.com", ZONE_DATA)
conf = backend.nsd_conf.read_text()
assert 'name: "example.com"' in conf
assert 'name: "other.com"' in conf
# ---------------------------------------------------------------------------
# zone_exists
# ---------------------------------------------------------------------------
def test_zone_exists_after_write(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
assert backend.zone_exists("example.com")
def test_zone_not_exists_before_write(tmp_path):
backend = _make_backend(tmp_path)
assert not backend.zone_exists("missing.com")
# ---------------------------------------------------------------------------
# delete_zone
# ---------------------------------------------------------------------------
def test_delete_zone_removes_file(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
assert backend.delete_zone("example.com")
assert not (backend.zones_dir / "example.com.db").exists()
def test_delete_zone_removes_conf_entry(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.delete_zone("example.com")
conf = backend.nsd_conf.read_text()
assert 'name: "example.com"' not in conf
def test_delete_zone_returns_false_when_missing(tmp_path):
backend = _make_backend(tmp_path)
assert not backend.delete_zone("ghost.com")
def test_delete_zone_leaves_other_zones(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.write_zone("other.com", ZONE_DATA)
backend.delete_zone("example.com")
assert 'name: "other.com"' in backend.nsd_conf.read_text()
# ---------------------------------------------------------------------------
# reload_zone — subprocess interactions
# ---------------------------------------------------------------------------
def test_reload_zone_calls_nsd_control_reload(tmp_path, monkeypatch):
backend = _make_backend(tmp_path)
calls = []
def fake_run(cmd, **kwargs):
calls.append(cmd)
return MagicMock(returncode=0, stdout="ok", stderr="")
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", fake_run)
assert backend.reload_zone()
assert calls[0] == ["nsd-control", "reload"]
def test_reload_single_zone_passes_zone_name(tmp_path, monkeypatch):
backend = _make_backend(tmp_path)
calls = []
def fake_run(cmd, **kwargs):
calls.append(cmd)
return MagicMock(returncode=0, stdout="ok", stderr="")
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", fake_run)
assert backend.reload_zone("example.com")
assert calls[0] == ["nsd-control", "reload", "example.com"]
def test_reload_zone_returns_false_on_failure(tmp_path, monkeypatch):
backend = _make_backend(tmp_path)
def fake_run(cmd, **kwargs):
raise subprocess.CalledProcessError(1, cmd, stderr="nsd-control: error")
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", fake_run)
assert not backend.reload_zone()
# ---------------------------------------------------------------------------
# update_nsd_conf — full rewrite
# ---------------------------------------------------------------------------
def test_update_nsd_conf_replaces_all_zones(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("old.com", ZONE_DATA)
backend.update_nsd_conf(["new1.com", "new2.com"])
conf = backend.nsd_conf.read_text()
assert 'name: "old.com"' not in conf
assert 'name: "new1.com"' in conf
assert 'name: "new2.com"' in conf

tests/test_peer_sync.py (new file, 263 lines)
View File

@@ -0,0 +1,263 @@
"""Tests for directdnsonly.app.peer_sync — PeerSyncWorker."""
import datetime
import json
import pytest
from sqlalchemy import select, func
from unittest.mock import patch, MagicMock
from directdnsonly.app.peer_sync import PeerSyncWorker
from directdnsonly.app.db.models import Domain
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
BASE_CONFIG = {
"enabled": True,
"interval_minutes": 15,
"peers": [
{
"url": "http://ddo-2:2222",
"username": "directdnsonly",
"password": "changeme",
}
],
}
NOW = datetime.datetime(2024, 6, 1, 12, 0, 0)
OLDER = datetime.datetime(2024, 6, 1, 11, 0, 0)
ZONE_DATA = "$ORIGIN example.com.\n@ 300 IN SOA ns1 hostmaster 1 3600 900 604800 300\n"
# ---------------------------------------------------------------------------
# Config / startup tests
# ---------------------------------------------------------------------------
def test_disabled_by_default():
worker = PeerSyncWorker({})
assert not worker.enabled
def test_interval_stored():
worker = PeerSyncWorker({"enabled": True, "interval_minutes": 30})
assert worker.interval_seconds == 1800
def test_default_interval():
worker = PeerSyncWorker({"enabled": True})
assert worker.interval_seconds == 15 * 60
def test_peers_stored():
worker = PeerSyncWorker(BASE_CONFIG)
assert len(worker.peers) == 1
assert worker.peers[0]["url"] == "http://ddo-2:2222"
def test_start_skips_when_disabled(caplog):
worker = PeerSyncWorker({"enabled": False})
worker.start()
assert worker._thread is None
def test_start_warns_when_no_peers(caplog):
import logging
worker = PeerSyncWorker({"enabled": True, "peers": []})
with patch.object(worker, "_run"):
worker.start()
# Thread should not have started
assert worker._thread is None
# ---------------------------------------------------------------------------
# _sync_from_peer tests
# ---------------------------------------------------------------------------
def _make_peer():
return BASE_CONFIG["peers"][0]
def _peer_list(domain, ts=None):
"""Simulate the JSON response from GET /internal/zones."""
return [
{
"domain": domain,
"zone_updated_at": ts.isoformat() if ts else None,
"hostname": "da1.example.com",
"username": "admin",
}
]
def _peer_zone(domain, ts=None, zone_data=ZONE_DATA):
"""Simulate the JSON response from GET /internal/zones?domain=X."""
return {
"domain": domain,
"zone_data": zone_data,
"zone_updated_at": ts.isoformat() if ts else None,
"hostname": "da1.example.com",
"username": "admin",
}
def test_sync_creates_new_local_record(patch_connect, monkeypatch):
"""When local DB has no record, peer zone_data is fetched and stored."""
worker = PeerSyncWorker(BASE_CONFIG)
session = patch_connect
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
resp.json.return_value = _peer_zone("example.com", NOW)
else:
resp.json.return_value = _peer_list("example.com", NOW)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
record = session.execute(
select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()
assert record is not None
assert record.zone_data == ZONE_DATA
assert record.zone_updated_at == NOW
def test_sync_updates_older_local_record(patch_connect, monkeypatch):
"""When local zone_data is older than peer's, it is overwritten."""
session = patch_connect
session.add(
Domain(domain="example.com", zone_data="old data", zone_updated_at=OLDER)
)
session.commit()
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
resp.json.return_value = _peer_zone("example.com", NOW)
else:
resp.json.return_value = _peer_list("example.com", NOW)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
record = session.execute(
select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()
assert record.zone_data == ZONE_DATA
assert record.zone_updated_at == NOW
def test_sync_skips_when_local_is_newer(patch_connect, monkeypatch):
"""When local zone_data is newer than peer's, it is not overwritten."""
session = patch_connect
session.add(
Domain(domain="example.com", zone_data="newer local", zone_updated_at=NOW)
)
session.commit()
worker = PeerSyncWorker(BASE_CONFIG)
fetch_calls = []
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
fetch_calls.append(url)
resp.json.return_value = _peer_zone("example.com", OLDER)
else:
resp.json.return_value = _peer_list("example.com", OLDER)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
# zone_data fetch should not have been called
assert not fetch_calls
record = session.execute(
select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()
assert record.zone_data == "newer local"
def test_sync_skips_unreachable_peer(monkeypatch):
"""If the peer raises a connection error, _sync_all catches it gracefully."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(*args, **kwargs):
raise ConnectionError("peer down")
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
# Should not raise
worker._sync_all()
def test_sync_skips_peer_with_bad_status(patch_connect, monkeypatch):
"""Non-200 response from peer zone list is silently skipped."""
worker = PeerSyncWorker(BASE_CONFIG)
session = patch_connect
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 503
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
# No records should have been created
assert session.execute(select(func.count()).select_from(Domain)).scalar() == 0
def test_sync_skips_missing_zone_data_in_response(patch_connect, monkeypatch):
"""If the peer returns no zone_data for a domain, it is skipped."""
session = patch_connect
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
resp.json.return_value = {"domain": "example.com", "zone_data": None}
else:
resp.json.return_value = _peer_list("example.com", NOW)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
assert session.execute(select(func.count()).select_from(Domain)).scalar() == 0
def test_sync_empty_peer_list(patch_connect, monkeypatch):
"""Empty zone list from peer results in zero syncs without error."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
resp.json.return_value = []
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
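
For reference, the configuration block these tests exercise maps onto an app.yml fragment along these lines; this is a sketch that assumes the YAML keys mirror BASE_CONFIG above, and the surrounding top-level structure is not shown in this diff:

    # Hypothetical app.yml fragment; key names mirror BASE_CONFIG in the tests above.
    peer_sync:
      enabled: true
      interval_minutes: 15
      peers:
        - url: "http://ddo-2:2222"
          username: "directdnsonly"
          password: "changeme"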

View File

@@ -1,9 +1,8 @@
"""Tests for directdnsonly.app.reconciler — ReconciliationWorker.""" """Tests for directdnsonly.app.reconciler — ReconciliationWorker."""
import pytest import pytest
import requests.exceptions
from queue import Queue from queue import Queue
from unittest.mock import MagicMock, patch from unittest.mock import patch, MagicMock
from directdnsonly.app.reconciler import ReconciliationWorker from directdnsonly.app.reconciler import ReconciliationWorker
from directdnsonly.app.db.models import Domain from directdnsonly.app.db.models import Domain
@@ -47,6 +46,18 @@ def dry_run_worker(delete_queue):
return ReconciliationWorker(delete_queue, cfg) return ReconciliationWorker(delete_queue, cfg)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
DA_CLIENT_PATH = "directdnsonly.app.reconciler.DirectAdminClient"
def _patch_da(return_value):
"""Patch DirectAdminClient so list_domains returns a fixed value."""
return patch(DA_CLIENT_PATH, **{"return_value.list_domains.return_value": return_value})
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
# _reconcile_all — orphan detection # _reconcile_all — orphan detection
# --------------------------------------------------------------------------- # ---------------------------------------------------------------------------
@@ -58,7 +69,7 @@ def test_orphan_queued_when_domain_missing_from_da(worker, delete_queue, patch_c
) )
patch_connect.commit() patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value=set()): with _patch_da(set()):
worker._reconcile_all() worker._reconcile_all()
assert not delete_queue.empty() assert not delete_queue.empty()
@@ -73,7 +84,7 @@ def test_orphan_not_queued_in_dry_run(dry_run_worker, delete_queue, patch_connec
) )
patch_connect.commit() patch_connect.commit()
with patch.object(dry_run_worker, "_fetch_da_domains", return_value=set()): with _patch_da(set()):
dry_run_worker._reconcile_all() dry_run_worker._reconcile_all()
assert delete_queue.empty() assert delete_queue.empty()
@@ -86,7 +97,7 @@ def test_orphan_not_queued_for_unknown_server(worker, delete_queue, patch_connec
) )
patch_connect.commit() patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value=set()): with _patch_da(set()):
worker._reconcile_all() worker._reconcile_all()
assert delete_queue.empty() assert delete_queue.empty()
@@ -98,7 +109,7 @@ def test_active_domain_not_queued(worker, delete_queue, patch_connect):
) )
patch_connect.commit() patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value={"good.com"}): with _patch_da({"good.com"}):
worker._reconcile_all() worker._reconcile_all()
assert delete_queue.empty() assert delete_queue.empty()
@@ -113,7 +124,7 @@ def test_backfill_null_hostname(worker, patch_connect):
patch_connect.add(Domain(domain="backfill.com", hostname=None, username="admin")) patch_connect.add(Domain(domain="backfill.com", hostname=None, username="admin"))
patch_connect.commit() patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value={"backfill.com"}): with _patch_da({"backfill.com"}):
worker._reconcile_all() worker._reconcile_all()
record = patch_connect.query(Domain).filter_by(domain="backfill.com").first() record = patch_connect.query(Domain).filter_by(domain="backfill.com").first()
@@ -126,7 +137,7 @@ def test_migration_updates_hostname(worker, patch_connect):
) )
patch_connect.commit() patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value={"moved.com"}): with _patch_da({"moved.com"}):
worker._reconcile_all() worker._reconcile_all()
record = patch_connect.query(Domain).filter_by(domain="moved.com").first() record = patch_connect.query(Domain).filter_by(domain="moved.com").first()
@@ -138,148 +149,13 @@ def test_dry_run_still_backfills(dry_run_worker, patch_connect):
patch_connect.add(Domain(domain="fill.com", hostname=None, username="admin")) patch_connect.add(Domain(domain="fill.com", hostname=None, username="admin"))
patch_connect.commit() patch_connect.commit()
with patch.object(dry_run_worker, "_fetch_da_domains", return_value={"fill.com"}): with _patch_da({"fill.com"}):
dry_run_worker._reconcile_all() dry_run_worker._reconcile_all()
record = patch_connect.query(Domain).filter_by(domain="fill.com").first() record = patch_connect.query(Domain).filter_by(domain="fill.com").first()
assert record.hostname == "da1.example.com" assert record.hostname == "da1.example.com"
# ---------------------------------------------------------------------------
# _fetch_da_domains — HTTP handling
# ---------------------------------------------------------------------------
def _make_json_response(domains_dict, total_pages=1):
"""Return a mock requests.Response with JSON payload matching DA format."""
data = {str(i): {"domain": d} for i, d in enumerate(domains_dict)}
data["info"] = {"total_pages": total_pages}
mock = MagicMock()
mock.status_code = 200
mock.is_redirect = False
mock.headers = {"Content-Type": "application/json"}
mock.json.return_value = data
mock.raise_for_status = MagicMock()
return mock
def test_fetch_returns_domains_from_json(worker):
mock_resp = _make_json_response(["example.com", "test.com"])
with patch("requests.get", return_value=mock_resp):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result == {"example.com", "test.com"}
def test_fetch_paginates(worker):
page1 = _make_json_response(["a.com"], total_pages=2)
page2 = _make_json_response(["b.com"], total_pages=2)
with patch("requests.get", side_effect=[page1, page2]):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result == {"a.com", "b.com"}
def test_fetch_redirect_triggers_session_login(worker):
redirect_resp = MagicMock()
redirect_resp.status_code = 302
redirect_resp.is_redirect = True
with (
patch("requests.get", return_value=redirect_resp),
patch.object(worker, "_da_session_login", return_value=None),
):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_html_response_returns_none(worker):
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.is_redirect = False
mock_resp.headers = {"Content-Type": "text/html; charset=utf-8"}
mock_resp.raise_for_status = MagicMock()
with patch("requests.get", return_value=mock_resp):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_connection_error_returns_none(worker):
with patch(
"requests.get", side_effect=requests.exceptions.ConnectionError("refused")
):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_timeout_returns_none(worker):
with patch("requests.get", side_effect=requests.exceptions.Timeout()):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_ssl_error_returns_none(worker):
with patch(
"requests.get", side_effect=requests.exceptions.SSLError("cert verify failed")
):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
# ---------------------------------------------------------------------------
# _parse_da_domain_list — legacy format fallback
# ---------------------------------------------------------------------------
def test_parse_standard_querystring():
body = "list[]=example.com&list[]=test.com"
result = ReconciliationWorker._parse_da_domain_list(body)
assert result == {"example.com", "test.com"}
def test_parse_newline_separated():
body = "list[]=example.com\nlist[]=test.com"
result = ReconciliationWorker._parse_da_domain_list(body)
assert result == {"example.com", "test.com"}
def test_parse_empty_body_returns_empty_set():
assert ReconciliationWorker._parse_da_domain_list("") == set()
def test_parse_normalises_to_lowercase():
result = ReconciliationWorker._parse_da_domain_list("list[]=EXAMPLE.COM")
assert "example.com" in result
assert "EXAMPLE.COM" not in result
def test_parse_strips_whitespace():
result = ReconciliationWorker._parse_da_domain_list("list[]= example.com ")
assert "example.com" in result
# ---------------------------------------------------------------------------
# Worker lifecycle
# ---------------------------------------------------------------------------
@@ -297,3 +173,140 @@ def test_no_servers_does_not_start(delete_queue):
    w = ReconciliationWorker(delete_queue, cfg)
    w.start()
    assert not w.is_alive
def test_initial_delay_stored(delete_queue):
cfg = {**BASE_CONFIG, "initial_delay_minutes": 30}
w = ReconciliationWorker(delete_queue, cfg)
assert w._initial_delay == 30 * 60
def test_zero_initial_delay_by_default(delete_queue):
w = ReconciliationWorker(delete_queue, BASE_CONFIG)
assert w._initial_delay == 0
# ---------------------------------------------------------------------------
# _heal_backends — Option C backend healing
# ---------------------------------------------------------------------------
def _make_backend_registry(zone_exists_return: bool):
"""Build a mock backend_registry with one backend whose zone_exists returns
the given value."""
backend = MagicMock()
backend.zone_exists.return_value = zone_exists_return
registry = MagicMock()
registry.get_available_backends.return_value = {"coredns": backend}
return registry, backend
def test_heal_queues_zone_missing_from_backend(delete_queue, patch_connect):
save_queue = Queue()
registry, backend = _make_backend_registry(zone_exists_return=False)
patch_connect.add(
Domain(
domain="missing.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
w = ReconciliationWorker(
delete_queue, BASE_CONFIG, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert not save_queue.empty()
item = save_queue.get_nowait()
assert item["domain"] == "missing.com"
assert item["failed_backends"] == ["coredns"]
assert item["source"] == "reconciler_heal"
assert item["zone_file"] == "; zone file"
def test_heal_skips_domains_without_zone_data(delete_queue, patch_connect):
save_queue = Queue()
registry, _ = _make_backend_registry(zone_exists_return=False)
patch_connect.add(
Domain(domain="nodata.com", hostname="da1.example.com", username="admin", zone_data=None)
)
patch_connect.commit()
w = ReconciliationWorker(
delete_queue, BASE_CONFIG, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert save_queue.empty()
def test_heal_skips_when_all_backends_have_zone(delete_queue, patch_connect):
save_queue = Queue()
registry, _ = _make_backend_registry(zone_exists_return=True)
patch_connect.add(
Domain(
domain="present.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
w = ReconciliationWorker(
delete_queue, BASE_CONFIG, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert save_queue.empty()
def test_heal_dry_run_does_not_queue(delete_queue, patch_connect):
save_queue = Queue()
registry, _ = _make_backend_registry(zone_exists_return=False)
patch_connect.add(
Domain(
domain="dry.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
cfg = {**BASE_CONFIG, "dry_run": True}
w = ReconciliationWorker(
delete_queue, cfg, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert save_queue.empty()
def test_heal_skipped_when_no_registry(delete_queue, patch_connect):
"""_heal_backends should not run when backend_registry is None."""
save_queue = Queue()
patch_connect.add(
Domain(
domain="noregistry.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
w = ReconciliationWorker(delete_queue, BASE_CONFIG, save_queue=save_queue)
# Should not raise; healing is silently skipped
with _patch_da({"noregistry.com"}):
w._reconcile_all()
assert save_queue.empty()
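
Finally, the stagger behaviour covered by test_initial_delay_stored would be driven by a reconciliation block roughly like the sketch below; only initial_delay_minutes and dry_run are confirmed by these tests, the remaining keys and the block name are assumptions:

    # Hypothetical reconciliation settings for a second receiver behind a load balancer.
    reconciliation:
      enabled: true
      initial_delay_minutes: 30
      dry_run: false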