Compare commits

...

24 Commits

Author SHA1 Message Date
db60d808de feat: operational status endpoint + reconciler/peer state tracking 📊
- ReconciliationWorker._last_run stores per-pass stats (da_servers_polled,
  zones_in_da/db, orphans_found/queued, hostnames_backfilled/migrated,
  zones_healed, duration_seconds, dry_run flag)
- ReconciliationWorker.get_status() exposes state for API/UI consumption
- _heal_backends() now returns healed count
- PeerSyncWorker.get_peer_status() serialises _peer_health to JSON-safe dict
  (url, healthy, consecutive_failures, last_seen) with summary totals
- WorkerManager tracks dead-letter count; queue_status() now returns nested
  reconciler/peer_sync dicts replacing flat reconciler_alive/peer_syncer_alive
- New GET /status endpoint (StatusAPI) aggregates queue depths, worker liveness,
  reconciler last-run, peer health, and live zone count; computes ok/degraded/error
- .gitignore: exclude .claude/, .vscode/, .env (always local)
- app.yml: add documented datastore section (SQLite default + MySQL commented)
- 164 tests passing (23 new tests added)
2026-02-25 18:51:56 +13:00
0f417da204 feat: add CMD_MULTI_SERVER methods to DirectAdminClient 🔌
Adds get_extra_dns_servers(), add_extra_dns_server(), and the
high-level ensure_extra_dns_server() which registers a node and
enforces dns=yes + domain_check=yes in a single call.  Also adds
the generic post() helper.  10 new tests, 141 total.
2026-02-25 16:29:21 +13:00
3f6a061ffe feat: mesh peer sync with health tracking and separate peer credentials 🔗
- Separate peer_sync.auth_username/password from the DA-facing credentials
  so /internal/* uses its own basic auth; a compromised peer cannot push
  zones or access the admin API
- Per-peer health tracking: consecutive failure count, degraded/recovered
  log events at FAILURE_THRESHOLD (3) and on first successful contact after
  degradation
- Gossip-lite mesh discovery: each sync pass calls /internal/peers on every
  known peer and adds newly discovered node URLs automatically; a linear
  chain of initial connections is sufficient to form a full mesh
- /internal/peers endpoint returns the node's live peer URL list
- Support DADNS_PEER_SYNC_PEER_N_URL/USERNAME/PASSWORD numbered env vars
  for multi-peer env-var-only deployments (up to 9); original single-peer
  DADNS_PEER_SYNC_PEER_URL retained for backward compatibility
2026-02-25 16:08:26 +13:00
0b31b75789 fix: correct RDATA encoding and batch processing in CoreDNS MySQL backend 🐛
- Fix dnspython silently relativizing in-zone FQDN targets to '@' by
  calling rdata.to_text(origin=origin, relativize=False); CoreDNS MySQL
  requires absolute FQDNs in RDATA and was serving '.' for any CNAME/MX
  pointing to the zone apex
- Reorder write_zone to delete stale records before inserting new ones
  so a brief NXDOMAIN is preferred over briefly serving duplicate records
- Rework save-queue batch loop: keep batch open until queue is empty
  rather than closing after a fixed timeout, so sequential DA zone pushes
  accumulate into a single batch
- Add managed_by='directadmin' to _ensure_zone_exists for new and
  legacy NULL rows
2026-02-25 15:43:08 +13:00
83fbb03cad fix: relativize zone-apex hostnames to '@' for CoreDNS MySQL 🐛
CoreDNS MySQL (cybercinch fork) expects '@' for zone-apex references in
record RDATA. Storing the full FQDN (e.g. 'ithome.net.nz.') caused CoreDNS
to strip the zone suffix and serve 'MX 0 .' / 'CNAME .' instead of the
correct apex target.

- Add _relativize_name(): converts zone FQDN → '@', in-zone subdomains →
  relative label, external FQDNs left unchanged. Handles both already-
  relativized output from dnspython ($ORIGIN present) and absolute FQDNs
  when $ORIGIN is absent from the zone file.
- Replace _normalize_cname_data() with _relativize_name(); add
  _normalize_mx_data(), _normalize_ns_data(), _normalize_srv_data() using
  the same helper.
- _parse_zone_to_record_set() now normalizes MX, NS, SRV alongside CNAME.
- _ensure_zone_exists() sets managed_by='directadmin' on create and
  back-fills NULL rows from pre-migration installs.
- Zone.managed_by changed to nullable=True to match ALTER TABLE migration
  where existing rows have no value.
- schema/coredns_mysql.sql updated to reflect actual two-table schema with
  managed_by column and migration comment.
- 11 new tests (130 total, all passing).
2026-02-25 14:37:14 +13:00
5e9a6f19bd fix: add __main__.py so python -m directdnsonly works in container 🐛
- directdnsonly/__main__.py: inserts package dir into sys.path before
  importing main.py (which uses short-form relative imports) then calls
  main(); works for both `python -m directdnsonly` and the dadns script
- pyproject.toml: wire up `dadns` console script entry point
2026-02-20 14:17:53 +13:00
4a4b4f2b98 docs: clarify Knot DNS and PowerDNS are not implemented backends 📝
Add explicit note that only nsd, bind, and coredns_mysql are available —
Knot and PowerDNS are listed as architectural context only.
2026-02-20 06:59:12 +13:00
6e96e78376 docs: CoreDNS MySQL is the recommended choice at all scale levels 🏆
The cybercinch fork's resilience features (cache fallback, health monitoring,
zero downtime, connection pooling) make it the best DNS backend regardless of
zone count — not just at 300+ zones. Update summary recommendation and
topology comparison "Best for" row to reflect this.
2026-02-20 06:53:47 +13:00
e8939bcd82 docs: document CoreDNS fork resilience features accurately 📋
Replace vague "file caching" description with the confirmed feature set:
connection pooling, degraded operation (JSON cache fallback), smart caching,
health monitoring, zero downtime. Update Topology B failure table to reflect
that CoreDNS serves from cache throughout MySQL outages. Add write/read split
summary — retry queue covers writes, CoreDNS cache covers reads.
2026-02-20 06:52:27 +13:00
d98f08a408 feat: peer sync configurable via env vars + document CoreDNS file cache 🔗
- PeerSyncWorker reads DADNS_PEER_SYNC_PEER_URL / _USERNAME / _PASSWORD env
  vars to populate a single peer without a config file; deduped against any
  config-file peers so the URL never appears twice
- 2 new tests (119 total, all passing)
- README: peer sync single-peer env var table; Topology C compose example
  updated to use env vars only (no config file needed for two-node setup)
- README: document cybercinch/coredns_mysql_extend built-in file caching —
  serves from cache during MySQL outages, eliminates per-query round-trips
2026-02-20 06:41:46 +13:00
fbb6220728 feat: add NSD backend and Topology C (multi-instance with peer sync) 🏗️
- New NSDBackend: zone files + nsd-control reload, zone registration via
  nsd.conf.d include file; mirrors BIND backend interface exactly
- BackendRegistry now supports type "nsd"; config defaults for nsd.zones_dir
  and nsd.nsd_conf
- Dockerfile installs both NSD and BIND9 — entrypoint detects configured
  backend type(s) and starts only the required daemon; CoreDNS MySQL
  deployments start neither
- docker/nsd.conf: minimal NSD base config with remote-control and
  zones.conf include
- entrypoint.sh: reads config file + env vars to determine which daemon
  to start; runs nsd-control-setup on first boot
- 20 new NSD backend tests (117 total, all passing)
- README: Topology C (multi-instance + peer sync) documented as most robust
  HA option; NSD config reference; updated topology comparison table;
  NSD env-var-only compose examples; version 2.5.0
2026-02-20 06:29:39 +13:00
f9907d2859 chore: complete SQLAlchemy 2.0 migration in coredns_mysql backend and tests ⬆️
Migrate remaining session.query() calls in coredns_mysql.py to
select()/session.execute() style; update bulk delete to delete()
construct and count to func.count(); drop sessionmaker(bind=).
Update test fixtures and assertions to match.

Zero session.query() calls remaining across the entire codebase.
2026-02-19 23:43:54 +13:00
d81ecd6bdd fix: migrate remaining session.query() calls to SQLAlchemy 2.0 select() 🔧 2026-02-19 23:38:31 +13:00
8c1c2b4abc chore: upgrade SQLAlchemy to 2.0 and bump all stale deps ⬆️
- SQLAlchemy 1.4 → 2.0.46: migrate all session.query() calls to
  select() / session.execute() style; move declarative_base import
  from ext.declarative to sqlalchemy.orm; explicit conn.commit()
  after DDL in _migrate(); drop sessionmaker(bind=) keyword
- persist-queue 1.0 → 1.1, pymysql 1.1.1 → 1.1.2,
  dnspython 2.7 → 2.8, pyyaml 6.0.2 → 6.0.3
- pytest 8.3 → 9.0.2, pytest-cov 6.1 → 7.0,
  pytest-mock 3.14 → 3.15.1, black 25.1 → 26.1

97 tests pass, zero deprecation warnings
2026-02-19 23:37:15 +13:00
22e64498ce chore: bump version to 2.4.0 🚀 2026-02-19 22:20:28 +13:00
143cf9c792 feat: add peer sync worker for zone_data exchange between nodes 🔄
Adds optional peer-to-peer zone_data replication between directdnsonly
instances. Enables eventual consistency in DA Multi-Server topologies
without a shared datastore.

- InternalAPI: GET /internal/zones (list) and ?domain= (detail)
  exposes zone_data to peers via existing basic auth
- PeerSyncWorker: interval-based daemon thread that fetches zone_data
  from configured peers, storing newer entries locally; peer downtime
  is silently skipped and retried next interval
- WorkerManager: wires PeerSyncWorker alongside reconciler; exposes
  peer_syncer_alive in queue_status
- Config: peer_sync block with enabled/interval_minutes/peers[]
- Tests: 13 tests covering sync, skip-older, skip-unreachable, empty
  peer list, bad status, and missing zone_data scenarios
2026-02-19 22:16:55 +13:00
33f4f30b5f feat: add initial_delay_minutes to reconciler for LB stagger 🕐
Configurable startup delay before the first reconciliation pass so that
multiple receivers behind a load balancer can be offset without relying
on container start order (which is lost on reboot). Set to half the
interval on the secondary receiver — e.g. interval 60m → delay 30m.
Default is 0 (no change to existing behaviour). Stop event is respected
during the delay so the worker shuts down cleanly even mid-wait.
2026-02-19 15:28:30 +13:00
b939bb5fa0 docs: add DNS server resource and scale guide with NSD/Knot comparison 📊
Cover memory profiles, zone-count thresholds, reload behaviour, and
throughput characteristics for BIND9, CoreDNS MySQL, NSD, and Knot DNS.
Call out NSD as the recommended lighter bundled alternative to BIND9
(~5-10 MB base, near-identical zone file format, same reload semantics)
and note the ~300-zone crossover where CoreDNS MySQL starts to win.
2026-02-19 14:48:10 +13:00
70ae81ee0d docs: rewrite topology comparison with accurate failure-mode analysis 📋
Expand both topology diagrams to show the retry queue and healing pass in
the flow. Add per-topology failure-behaviour tables covering transient backend
failure, prolonged outage, container-down-during-push, and cross-node drift.
Rewrite the comparison table to call out the key architectural difference:
Topology A has no auto-recovery from prolonged BIND failure (needs next DA push);
Topology B's reconciler healing pass re-syncs missing backends from stored
zone_data without any DA involvement.
2026-02-19 14:17:53 +13:00
b523b17f30 feat: retry queue, backend healing, and zone_data persistence 🔁
- worker.py: third persistent retry queue with exponential backoff (30s→30m,
  max 5 attempts); failed backends tracked per-item so retries target only the
  failing nodes; zone_data stored in DB after every successful write
- Domain model: zone_data TEXT + zone_updated_at DATETIME columns; additive
  migration applied on startup so existing deployments upgrade in place
- ReconciliationWorker: Option C healing pass — checks every configured backend
  for zone presence after each reconciliation cycle and re-queues any zone
  missing from a backend using stored zone_data, enabling automatic recovery
  from prolonged backend outages without waiting for DirectAdmin to re-push
- 82 tests, all passing
2026-02-19 14:05:22 +13:00
0e044b7dc2 chore: remove unimplemented PowerDNS MySQL backend 🗑️
Dead code from v1 planning — never implemented, superseded by the
CoreDNS MySQL backend. Also carried a broken stale import that would
have caused an ImportError on load.
2026-02-19 12:24:30 +13:00
e0a119558d refactor: extract DirectAdminClient into directdnsonly.app.da module 🏗️
Move all outbound DirectAdmin HTTP logic out of ReconciliationWorker and
into a dedicated, independently testable DirectAdminClient class:

- directdnsonly/app/da/client.py: list_domains (paginated JSON + legacy
  fallback), get (authenticated GET to any CMD_* endpoint), _login
  (DA Evo session-cookie fallback), _parse_legacy_domain_list
- directdnsonly/app/da/__init__.py: public re-export of DirectAdminClient
- reconciler.py: now purely reconciliation logic; instantiates a client
  per configured server — no HTTP code remaining
- tests/test_da_client.py: 16 dedicated tests for DirectAdminClient
- tests/test_reconciler.py: mocks at the DirectAdminClient class boundary
  instead of the internal _fetch_da_domains method

Bumped to 2.2.0 — DirectAdminClient is now a first-class public API.
2026-02-19 12:16:22 +13:00
ae1e89a236 feat: conditional BIND startup; config search path priority fix 🔧
- entrypoint: only start named when a bind backend is configured and
  enabled in app.yml; CoreDNS-only deployments skip named entirely
- config: user-supplied paths (/etc/directdnsonly, ./config) now
  searched before the bundled app.yml so mounted configs take effect
- docs: deployment topology reference — Topology A (dual BIND HA) and
  Topology B (single instance, multi-DC CoreDNS MySQL)
- chore: bump version to 2.1.0
- justfile: add build-docker recipe
2026-02-19 12:07:37 +13:00
aac7b365a5 fix: remove stale COPY config from Dockerfile 🐛
Root config/ directory was removed when the duplicate config/app.yml was
deleted — the canonical config is now bundled inside directdnsonly/config/
and is already covered by the existing COPY directdnsonly step.
2026-02-18 23:16:52 +13:00
35 changed files with 4677 additions and 1571 deletions

6
.gitignore vendored
View File

@@ -26,3 +26,9 @@ build
*.mypy_cache
*.pytest_cache
/data/*
# Editor / tool settings — always local, never committed
.vscode/
.claude/
.env
*.env

View File

@@ -1,16 +1,22 @@
FROM python:3.11.12-slim
# Install system dependencies
RUN apt-get update && apt-get install -y \
# Install system dependencies.
# Both NSD and BIND are installed so the image works with any DNS backend type.
# The entrypoint detects which one is configured and starts only that daemon.
# CoreDNS MySQL users: neither daemon is started — the image is still usable.
RUN apt-get update && apt-get install -y --no-install-recommends \
bind9 \
bind9utils \
nsd \
dnsutils \
gcc \
python3-dev \
default-libmysqlclient-dev \
&& rm -rf /var/lib/apt/lists/*
# Configure BIND
# ---------------------------------------------------------------------------
# BIND setup
# ---------------------------------------------------------------------------
RUN mkdir -p /etc/named/zones && \
chown -R bind:bind /etc/named && \
chmod 755 /etc/named/zones
@@ -19,35 +25,37 @@ COPY docker/named.conf.local /etc/bind/
COPY docker/named.conf.options /etc/bind/
RUN chown root:bind /etc/bind/named.conf.*
# Install Python dependencies
# ---------------------------------------------------------------------------
# NSD setup
# ---------------------------------------------------------------------------
RUN mkdir -p /etc/nsd/zones /etc/nsd/nsd.conf.d && \
chown -R nsd:nsd /etc/nsd && \
chmod 755 /etc/nsd/zones
COPY docker/nsd.conf /etc/nsd/nsd.conf
RUN chown nsd:nsd /etc/nsd/nsd.conf
# ---------------------------------------------------------------------------
# Application
# ---------------------------------------------------------------------------
WORKDIR /app
COPY pyproject.toml poetry.lock README.md ./
# Install specific Poetry version that matches your lock file
RUN pip install "poetry==2.1.2" # Adjust version to match your lock file
RUN pip install "poetry==2.1.2"
# Copy application files
COPY directdnsonly ./directdnsonly
COPY config ./config
COPY schema ./schema
RUN poetry config virtualenvs.create false && \
poetry install
# Create data directories
RUN mkdir -p /app/data/queues && \
mkdir -p /app/data/zones && \
mkdir -p /app/logs && \
RUN mkdir -p /app/data/queues /app/data/zones /app/logs && \
chmod -R 755 /app/data
# Configure BIND zone directory to match app config
#RUN ln -s /app/data/zones /etc/named/zones/dadns
# Start script
COPY docker/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
EXPOSE 2222 53/udp
CMD ["/entrypoint.sh"]
CMD ["/entrypoint.sh"]

613
README.md
View File

@@ -1,10 +1,347 @@
# DaDNS - DNS Management System
# DirectDNSOnly - DNS Management System
## Deployment Topologies
Three reference topologies are documented below. Choose the one that matches your infrastructure.
---
### Topology A — Dual NSD/BIND Instances (High-Availability / Multi-Server)
Two independent DirectDNSOnly containers, each running a bundled DNS daemon (NSD by default, or BIND9). Both are registered as Extra DNS servers in the same DirectAdmin Multi-Server environment, so DA pushes every zone change to both simultaneously.
```
DirectAdmin Multi-Server
├─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-1 (container, BIND backend)
│ │
│ Persistent Queue
│ ├─ writes zone file
│ ├─ reloads named
│ └─ retry on failure (exp. backoff)
│ (serves authoritative DNS on :53)
└─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-2 (container, BIND backend)
Persistent Queue
├─ writes zone file
├─ reloads named
└─ retry on failure (exp. backoff)
(serves authoritative DNS on :53)
```
**Each instance is completely independent** — no shared state, no cross-talk. Redundancy comes from DA pushing to both. If one container goes down, DA continues to push to the other.
#### Failure behaviour
| Scenario | What happens |
|---|---|
| One container down during DA push | DA cannot deliver; that instance misses the update. The retry queue inside that instance cannot help — the push never arrived. When the container recovers, it will serve stale zone data until DA re-pushes (next zone change triggers a new push). |
| BIND crashes but container stays up | The zone write lands in the persistent queue. The retry worker replays it with exponential backoff (30 s → 2 m → 5 m → 15 m → 30 m, up to 5 attempts). |
| Zone deleted from DA while instance was down | The reconciliation poller detects the orphan on the next pass and queues a delete, keeping the BIND instance clean without manual intervention. |
| Two instances diverge | No automatic cross-instance sync. Drift persists until DA re-pushes the affected zone (i.e. the next time that domain is touched in DA). |
> **DNS consistency note:** DirectAdmin pushes to each Extra DNS server sequentially, not atomically. If one instance is offline when a zone is changed, that instance will serve stale data until the next DA push for that zone. For workloads where split-brain DNS is unacceptable, use Topology B (single write path → multiple MySQL backends) instead.
#### `config/app.yml` — instance 1
```yaml
app:
  auth_username: directdnsonly
  auth_password: your-secret
dns:
  default_backend: bind
  backends:
    bind:
      type: bind
      enabled: true
      zones_dir: /etc/named/zones
      named_conf: /etc/bind/named.conf.local
```
#### `docker-compose.yml` sketch — instance 1
```yaml
services:
  directdnsonly-1:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2222:2222"   # DA pushes here
      - "53:53/udp"   # authoritative DNS
    volumes:
      - ./config:/app/config
      - ./data:/app/data
```
Register both containers as separate Extra DNS entries in DA → DNS Administration → Extra DNS Servers, with the same credentials configured in each `config/app.yml`.
---
### Topology B — Single Instance, Multiple CoreDNS MySQL Backends (Multi-DC)
One DirectDNSOnly instance receives zone pushes from DirectAdmin and fans out to two (or more) CoreDNS MySQL databases in parallel. CoreDNS servers in each data centre read from their local database. The directdnsonly instance is the sole write path — it does **not** serve DNS itself.
```
DirectAdmin
└─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly (single container)
Persistent Queue (survives restarts)
zone_data stored to SQLite after each write
ThreadPoolExecutor (one thread per backend)
│ │
▼ ▼
coredns_mysql_dc1 coredns_mysql_dc2
(MySQL 10.0.0.80) (MySQL 10.0.1.29)
│ │
[success] [failure → retry queue]
│ │
▼ 30s/2m/5m/15m/30m backoff
CoreDNS (DC1) retry → coredns_mysql_dc2
serves :53 from DB
Reconciliation poller (every N minutes)
├─ orphan detection (zones removed from DA)
└─ healing pass: zone_exists() per backend
→ re-queue any backend missing a zone
using stored zone_data (no DA re-push needed)
```
Both MySQL backends are written **concurrently** within the same zone update. A slow or unreachable secondary does not block the primary write. Failed backends enter the retry queue automatically. The reconciliation healing pass provides a further safety net for prolonged outages.
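As a rough illustration of that fan-out (a minimal sketch, not the actual `worker.py` — the `backends` mapping and the `write_zone()` signature are assumptions for illustration):
```python
from concurrent.futures import ThreadPoolExecutor

def dispatch_zone(backends: dict, zone_name: str, zone_data: str) -> list:
    """Write one zone to every enabled backend in parallel.

    `backends` maps backend name -> backend object exposing write_zone().
    Backends that raise are returned so the caller can queue them for retry.
    """
    failed = []
    with ThreadPoolExecutor(max_workers=len(backends) or 1) as pool:
        futures = {
            pool.submit(backend.write_zone, zone_name, zone_data): name
            for name, backend in backends.items()
        }
        for future, name in futures.items():
            try:
                future.result()  # each backend runs in its own thread
            except Exception:
                failed.append(name)  # e.g. unreachable MySQL -> retry queue
    return failed
```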
#### Failure behaviour
| Scenario | What happens |
|---|---|
| One MySQL backend unreachable | Other backend(s) succeed immediately. Failed backend queued for retry with exponential backoff (30 s → 2 m → 5 m → 15 m → 30 m, up to 5 attempts). CoreDNS continues serving from its local JSON cache throughout. |
| MySQL backend down for hours | Retry queue exhausts. CoreDNS serves from cache the entire time — zero query downtime. On recovery, the reconciliation healing pass detects the backend is missing zones and re-pushes all of them using stored `zone_data` — no DA intervention required. |
| directdnsonly container restarts | Persistent queue survives. In-flight zone updates replay on startup. |
| directdnsonly container down during DA push | DA cannot deliver. Persistent queue on disk is intact; when the container comes back, it resumes processing any previously queued items. New pushes during downtime are lost at the DA level (DA does not retry). |
| Zone deleted from DA | Reconciliation poller detects orphan and queues delete across all backends. |
#### `config/app.yml`
```yaml
app:
  auth_username: directdnsonly
  auth_password: your-secret
dns:
  default_backend: coredns_mysql_dc1
  backends:
    coredns_mysql_dc1:
      type: coredns_mysql
      enabled: true
      host: 10.0.0.80
      port: 3306
      database: coredns
      username: coredns
      password: your-db-password
    coredns_mysql_dc2:
      type: coredns_mysql
      enabled: true
      host: 10.0.1.29
      port: 3306
      database: coredns
      username: coredns
      password: your-db-password
```
Adding a third data centre is a single stanza in the config — no code changes required.
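For example, a hypothetical `coredns_mysql_dc3` stanza (host and password are placeholders) would sit alongside the two above:
```yaml
    coredns_mysql_dc3:
      type: coredns_mysql
      enabled: true
      host: 10.0.2.30
      port: 3306
      database: coredns
      username: coredns
      password: your-db-password
```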
---
### Topology C — Multi-Instance with Peer Sync (Most Robust)
Multiple independent DirectDNSOnly containers, each with a single local DNS backend (NSD or CoreDNS MySQL), registered as separate Extra DNS servers in DirectAdmin Multi-Server. Peer sync provides eventual consistency — if one instance misses a DA push while it is offline, it recovers the missing zone data from a peer on the next sync interval.
```
DirectAdmin Multi-Server
├─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-syd (NSD or CoreDNS MySQL)
│ │
│ Persistent Queue + zone_data store
│ ├─ writes zone file / MySQL
│ ├─ reloads daemon
│ └─ retry on failure
│ │
│ ◀──── peer sync ────▶
│ │
└─ POST /CMD_API_DNS_ADMIN ──▶ directdnsonly-mlb (NSD or CoreDNS MySQL)
Persistent Queue + zone_data store
├─ writes zone file / MySQL
├─ reloads daemon
└─ retry on failure
```
**Why this is the most robust topology:**
- DA pushes to each instance independently — no single point of failure
- No load balancer in the write path — a dead LB cannot silence both instances
- Each instance serves DNS immediately from its own daemon
- If SYD misses a push while offline, it pulls the newer zone from MLB on the next peer sync (default 15 minutes)
- Peer sync is best-effort eventual consistency — deliberately simple, no consensus protocol
#### Failure behaviour
| Scenario | What happens |
|---|---|
| One instance down during DA push | Other instance(s) receive and serve the update. When the downed instance recovers, peer sync detects the stale/missing `zone_updated_at` and pulls the newer zone data from a peer. |
| Both instances down during DA push | Both miss the push. When they recover, they sync from each other — the most recently updated peer wins per zone. No DA re-push needed. |
| Peer offline | Peer sync silently skips unreachable peers. Syncs resume automatically when the peer recovers. |
| Zone deleted from DA | Reconciliation poller detects the orphan and queues the delete on each instance independently. |
#### `config/app.yml` — instance syd
```yaml
app:
  auth_username: directdnsonly
  auth_password: your-secret
dns:
  default_backend: nsd
  backends:
    nsd:
      type: nsd
      enabled: true
      zones_dir: /etc/nsd/zones
      nsd_conf: /etc/nsd/nsd.conf.d/zones.conf
peer_sync:
  enabled: true
  interval_minutes: 15
  peers:
    - url: http://directdnsonly-mlb:2222
      username: directdnsonly
      password: your-secret
reconciliation:
  enabled: true
  interval_minutes: 60
  directadmin_servers:
    - hostname: da.syd.example.com
      port: 2222
      username: admin
      password: da-secret
      ssl: true
```
Register each container as a separate Extra DNS server entry in DA → DNS Administration → Extra DNS Servers with the same credentials.
---
### Topology Comparison
| | Topology A — Dual NSD/BIND | Topology B — CoreDNS MySQL | Topology C — Multi-Instance + Peer Sync |
|---|---|---|---|
| **DNS server** | NSD or BIND9 (bundled) | CoreDNS (separate, reads MySQL) | NSD or CoreDNS MySQL (per instance) |
| **Write path** | DA → each instance independently | DA → single instance → all backends | DA → each instance independently |
| **Zone storage** | Zone files on container disk | MySQL database rows | Zone files or MySQL + SQLite zone_data store |
| **DA registration** | Two Extra DNS server entries | One Extra DNS server entry | One entry per instance |
| **Redundancy model** | Independent app+DNS units | One app, N database backends | Independent instances + peer sync |
| **Transient backend failure** | Retry queue (exp. backoff, 5 attempts) | Retry queue (exp. backoff, 5 attempts) | Retry queue (exp. backoff, 5 attempts) |
| **Prolonged backend outage** | No auto-recovery — waits for next DA push | Reconciler healing pass re-pushes all missing zones | Peer sync pulls missed zones from a healthy peer |
| **Container down during push** | Zone missed entirely | Zone missed at DA level | Zone missed at DA level; recovered via peer sync |
| **Cross-node consistency** | No sync between instances | All backends share same write path | Peer sync provides eventual consistency |
| **Orphan detection** | Yes — reconciler | Yes — reconciler | Yes — reconciler (per instance) |
| **External DB required** | No | Yes (MySQL per CoreDNS node) | No (NSD) or Yes (CoreDNS MySQL) |
| **Horizontal scaling** | Add DA Extra DNS entries + containers | Add backend stanzas in config | Add DA Extra DNS entries + containers + peer list |
| **Best for** | Simple HA, no external DB | Best overall — resilient writes (retry queue) + resilient reads (CoreDNS cache fallback), no daemon reloads, scales to thousands of zones | Most robust HA — resilient at every layer, survives extended outages without DA re-push |
---
## DNS Server Resource and Scale Guide
### BIND9 vs CoreDNS MySQL — resource profile
| | BIND9 (bundled) | CoreDNS + MySQL |
|---|---|---|
| **Base memory** | ~13–15 MB | ~20–30 MB (CoreDNS binary) + MySQL process |
| **Per-zone overhead** | ~300 bytes per resource record in memory | Schema rows in MySQL; CoreDNS itself holds no zone state |
| **100-zone deployment** | ~30–60 MB total | ~80–150 MB (CoreDNS + MySQL combined) |
| **500-zone deployment** | ~100–300 MB total | ~100–200 MB (zone data lives in MySQL, not CoreDNS) |
| **Zone reload** | `rndc reload <zone>` — per-zone is fast; full reload blocks queries for seconds at large counts | No reload needed — CoreDNS queries MySQL at resolution time |
| **Zone update latency** | File write + `rndc reload` — typically <100 ms for a single zone | Write to MySQL — immediately visible to CoreDNS on next query |
| **CPU on reload** | Spikes on full `rndc reload`; grows linearly with zone count | No reload CPU spike; MySQL write is the only cost |
| **Query throughput** | High — zones loaded into memory | Slightly lower — each query hits MySQL (mitigated by MySQL query cache / connection pooling) |
| **Scale ceiling** | Degrades past ~1 000 zones: memory climbs, full reloads take 1–20 s+ | Scales with MySQL — thousands of zones with no DNS-process impact |
**Rule of thumb:** Below ~300 zones BIND9 and CoreDNS MySQL are broadly comparable. Above ~500 zones, CoreDNS MySQL has a significant advantage because zone data lives entirely in the database — adding a new zone costs one MySQL INSERT, not a daemon reload.
---
### Bundled DNS daemons — NSD and BIND9
The container image ships with **both NSD and BIND9** installed. The entrypoint reads your config and starts only the daemon that matches the configured backend type. CoreDNS MySQL deployments start neither.
**NSD (Name Server Daemon)** from NLnet Labs is the default recommendation:
| | BIND9 | NSD | Knot DNS |
|---|---|---|---|
| **Design focus** | Everything (authoritative + recursive + DNSSEC + ...) | Authoritative only | Authoritative only |
| **Base memory** | ~13–15 MB | ~5–10 MB | ~10–15 MB |
| **500-zone memory** | ~100–300 MB | <100 MB (estimated) | ~100–200 MB (3× zone text size) |
| **Zone update** | `rndc reload <zone>` | `nsd-control reload` | `knotc zone-reload` (atomic via RCU — zero query interruption) |
| **Config format** | `named.conf` / zone files | `nsd.conf` / zone files (nearly identical format) | `knot.conf` / zone files |
| **Docker image** | ~150–200 MB | ~30–50 MB Alpine | ~40–60 MB Alpine |
| **Recursive queries** | Yes (if configured) | No | No |
| **Throughput** | Baseline | ~2–5× BIND9 | ~5–10× BIND9 (2.2 Mqps at 32 cores) |
| **Production use** | Wide adoption | TLD servers (`.nl`, `.se`), major registries | CZ.NIC, Cloudflare internal testing |
**NSD** would slot almost directly into the existing BIND backend implementation — zone files have the same RFC 1035 format, and `nsd-control reload` is the equivalent of `rndc reload`. The main implementation difference is the daemon config file (`nsd.conf` vs `named.conf`) and the absence of `named.conf.local`-style zone includes (NSD uses pattern-based config).
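For reference, a registration entry in that include file might look like this (a sketch only — the zone name and file path are illustrative, not the backend's exact output):
```
zone:
    name: "example.com"
    zonefile: "/etc/nsd/zones/example.com.db"
```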
**Knot DNS** is worth considering if seamless zone updates matter: its RCU (Read-Copy-Update) mechanism serves the old zone to in-flight queries while atomically swapping in the new one — there is no window where queries see a partially-loaded zone. It is meaningfully heavier than NSD at moderate zone counts but the best performer at high scale.
**Summary recommendation:**
- **Any scale, external DB available:** CoreDNS MySQL ([cybercinch fork](https://github.com/cybercinch/coredns_mysql_extend)) wins at every zone count. Connection pooling, JSON cache fallback, health monitoring, and zero-downtime operation during DB maintenance make it the most resilient choice regardless of size. No daemon reload ever needed — a zone write is a MySQL INSERT.
- **No external DB, simplicity first:** NSD (bundled) — lightweight, fast, authoritative-only, same RFC 1035 zone file format as BIND.
- **Need zero-interruption zone swaps:** Knot DNS (RCU — serves old zone to in-flight queries while atomically swapping in the new one).
- **Need an HTTP API for zone management:** PowerDNS Authoritative with its native HTTP API.
> **Note:** Knot DNS and PowerDNS backends are **not implemented** in directdnsonly — they are listed here as architectural context only. Implemented backends: `nsd`, `bind`, `coredns_mysql`. Pull requests for additional backends are welcome.
---
## CoreDNS MySQL Backend — Required Fork
The `coredns_mysql` backend writes zones to a MySQL database that CoreDNS reads
at query time. **Vanilla CoreDNS with a stock MySQL plugin is not sufficient**:
out of the box it does not act as a fully authoritative server, does not return
NS records in the additional section, does not set the AA flag, and does not
handle wildcard records.
This project is designed to work with a patched fork that resolves all of those
issues and adds production-grade resilience:
**[cybercinch/coredns_mysql_extend](https://github.com/cybercinch/coredns_mysql_extend)**
| Feature | Detail |
|---|---|
| **Fully authoritative** | Correct AA flag, NXDOMAIN on misses, NS records in the additional section |
| **Wildcard records** | `*` entries served correctly |
| **Connection pooling** | Configurable MySQL connection management — efficient under load |
| **Degraded operation** | Automatic fallback to a local JSON cache when MySQL is unavailable — DNS keeps serving |
| **Smart caching** | Intelligent per-record cache management reduces per-query MySQL round-trips |
| **Health monitoring** | Continuous database health checks with configurable intervals |
| **Zero downtime** | DNS continues serving during database maintenance windows |
**Why this matters for Topology B:** directdnsonly's retry queue handles the write side during a MySQL outage — the CoreDNS fork handles the read side. Between them, neither writes nor queries are dropped during transient database failures.
Use the NSD or BIND backend if you want a zero-dependency setup with no custom CoreDNS build required.
---
## Features
- Multi-backend DNS management (BIND, CoreDNS MySQL)
- Multi-backend DNS management (NSD, BIND, CoreDNS MySQL)
- Parallel backend dispatch — all enabled backends updated simultaneously
- Persistent queue — zone updates survive restarts
- Automatic record-count verification and drift reconciliation
- Peer sync — eventual consistency between directdnsonly instances
- Thread-safe operations
- Loguru-based logging
@@ -16,7 +353,7 @@
## Concurrent Multi-Backend Processing
DaDNS propagates every zone update to all enabled backends in parallel using a
DirectDNSOnly propagates every zone update to all enabled backends in parallel using a
queue-based worker architecture.
### Architecture
@@ -91,33 +428,263 @@ dns:
## Configuration
Edit `config/app.yml` for backend settings. Credentials can be overridden via
environment variables using the `DADNS_` prefix (e.g.
`DADNS_APP_AUTH_PASSWORD`).
DirectDNSOnly uses [Vyper](https://github.com/sn3d/vyper-py) for configuration. Settings are resolved in this priority order (highest wins):
1. **Environment variables**: `DADNS_` prefix, dots replaced with underscores (e.g. `DADNS_APP_AUTH_PASSWORD`)
2. **Config file**: `app.yml` searched in `/etc/directdnsonly`, `.`, `./config`, then the bundled default
3. **Built-in defaults** (shown in the table below)
**A config file is entirely optional.** Every scalar setting can be provided through environment variables alone.
---
### Configuration Reference
#### Core
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `log_level` | `DADNS_LOG_LEVEL` | `info` | Log verbosity: `debug`, `info`, `warning`, `error` |
| `timezone` | `DADNS_TIMEZONE` | `Pacific/Auckland` | Timezone for log timestamps |
| `queue_location` | `DADNS_QUEUE_LOCATION` | `./data/queues` | Path for the persistent zone-update queue |
#### App (HTTP server)
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `app.auth_username` | `DADNS_APP_AUTH_USERNAME` | `directdnsonly` | Basic auth username for all API routes (including `/internal`) |
| `app.auth_password` | `DADNS_APP_AUTH_PASSWORD` | `changeme` | Basic auth password — **always override in production** |
| `app.listen_port` | `DADNS_APP_LISTEN_PORT` | `2222` | TCP port the HTTP server binds to |
| `app.ssl_enable` | `DADNS_APP_SSL_ENABLE` | `false` | Enable TLS on the HTTP server |
| `app.proxy_support` | `DADNS_APP_PROXY_SUPPORT` | `true` | Trust `X-Forwarded-For` from a reverse proxy |
| `app.proxy_support_base` | `DADNS_APP_PROXY_SUPPORT_BASE` | `http://127.0.0.1` | Trusted proxy base address |
#### Datastore (internal SQLite)
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `datastore.type` | `DADNS_DATASTORE_TYPE` | `sqlite` | Internal datastore type (only `sqlite` supported) |
| `datastore.db_location` | `DADNS_DATASTORE_DB_LOCATION` | `data/directdns.db` | Path to the SQLite database file |
#### DNS backends — BIND
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `dns.default_backend` | `DADNS_DNS_DEFAULT_BACKEND` | _(none)_ | Name of the primary backend (used for status/health reporting) |
| `dns.backends.bind.enabled` | `DADNS_DNS_BACKENDS_BIND_ENABLED` | `false` | Enable the bundled BIND9 backend |
| `dns.backends.bind.zones_dir` | `DADNS_DNS_BACKENDS_BIND_ZONES_DIR` | `/etc/named/zones` | Directory where zone files are written |
| `dns.backends.bind.named_conf` | `DADNS_DNS_BACKENDS_BIND_NAMED_CONF` | `/etc/named.conf.local` | `named.conf` include file managed by directdnsonly |
#### DNS backends — NSD
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `dns.backends.nsd.enabled` | `DADNS_DNS_BACKENDS_NSD_ENABLED` | `false` | Enable the NSD backend |
| `dns.backends.nsd.zones_dir` | `DADNS_DNS_BACKENDS_NSD_ZONES_DIR` | `/etc/nsd/zones` | Directory where zone files are written |
| `dns.backends.nsd.nsd_conf` | `DADNS_DNS_BACKENDS_NSD_NSD_CONF` | `/etc/nsd/nsd.conf.d/zones.conf` | NSD zone include file managed by directdnsonly |
#### DNS backends — CoreDNS MySQL
The built-in env var mapping targets the backend named `coredns_mysql`. For multiple named CoreDNS backends (e.g. `coredns_dc1`, `coredns_dc2`) you must use a config file — see [Multi-backend via config file](#multi-backend-via-config-file) below.
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `dns.backends.coredns_mysql.enabled` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_ENABLED` | `false` | Enable the CoreDNS MySQL backend |
| `dns.backends.coredns_mysql.host` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_HOST` | `localhost` | MySQL host |
| `dns.backends.coredns_mysql.port` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_PORT` | `3306` | MySQL port |
| `dns.backends.coredns_mysql.database` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_DATABASE` | `coredns` | MySQL database name |
| `dns.backends.coredns_mysql.username` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_USERNAME` | `coredns` | MySQL username |
| `dns.backends.coredns_mysql.password` | `DADNS_DNS_BACKENDS_COREDNS_MYSQL_PASSWORD` | _(empty)_ | MySQL password |
#### Reconciliation poller
| Config key | Environment variable | Default | Description |
|---|---|---|---|
| `reconciliation.enabled` | `DADNS_RECONCILIATION_ENABLED` | `false` | Enable the background reconciliation poller |
| `reconciliation.dry_run` | `DADNS_RECONCILIATION_DRY_RUN` | `false` | Log orphans but do not queue deletes (safe first-run mode) |
| `reconciliation.interval_minutes` | `DADNS_RECONCILIATION_INTERVAL_MINUTES` | `60` | How often the poller runs |
| `reconciliation.verify_ssl` | `DADNS_RECONCILIATION_VERIFY_SSL` | `true` | Verify TLS certificates when querying DirectAdmin |
> The `reconciliation.directadmin_servers` list (DA hostnames, credentials) requires a config file — it cannot be expressed as simple env vars.
#### Peer sync
| Config key / Environment variable | Default | Description |
|---|---|---|
| `peer_sync.enabled` / `DADNS_PEER_SYNC_ENABLED` | `false` | Enable background peer-to-peer zone sync |
| `peer_sync.interval_minutes` / `DADNS_PEER_SYNC_INTERVAL_MINUTES` | `15` | How often each peer is polled |
For a **single peer** (the typical two-node Topology C setup) the peer can be configured entirely via env vars — no config file required:
| Environment variable | Default | Description |
|---|---|---|
| `DADNS_PEER_SYNC_PEER_URL` | _(unset)_ | URL of the single peer (e.g. `http://ddo-2:2222`). When set, this peer is automatically appended to the peers list. |
| `DADNS_PEER_SYNC_PEER_USERNAME` | `directdnsonly` | Basic auth username for the peer |
| `DADNS_PEER_SYNC_PEER_PASSWORD` | _(empty)_ | Basic auth password for the peer |
> For **multiple peers**, use a config file with the `peer_sync.peers` list. A peer defined via env var is deduped — if the same URL already appears in the config file it will not be added twice.
---
### Environment-variable-only setup
No config file is needed for single-backend deployments. Pass all settings as container environment variables.
#### Topology A/C — NSD backend (env vars only, recommended)
```bash
DADNS_APP_AUTH_PASSWORD=my-strong-secret
DADNS_DNS_DEFAULT_BACKEND=nsd
DADNS_DNS_BACKENDS_NSD_ENABLED=true
DADNS_DNS_BACKENDS_NSD_ZONES_DIR=/etc/nsd/zones
DADNS_DNS_BACKENDS_NSD_NSD_CONF=/etc/nsd/nsd.conf.d/zones.conf
DADNS_QUEUE_LOCATION=/app/data/queues
DADNS_DATASTORE_DB_LOCATION=/app/data/directdns.db
```
`docker-compose.yml` snippet (Topology C — two instances with peer sync, env vars only):
### Config Files
#### `config/app.yml`
```yaml
timezone: Pacific/Auckland
log_level: INFO
queue_location: ./data/queues
services:
  directdnsonly-syd:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2222:2222"
      - "53:53/udp"
    environment:
      DADNS_APP_AUTH_PASSWORD: my-strong-secret
      DADNS_DNS_DEFAULT_BACKEND: nsd
      DADNS_DNS_BACKENDS_NSD_ENABLED: "true"
      DADNS_PEER_SYNC_ENABLED: "true"
      DADNS_PEER_SYNC_PEER_URL: http://directdnsonly-mlb:2222
      DADNS_PEER_SYNC_PEER_USERNAME: directdnsonly
      DADNS_PEER_SYNC_PEER_PASSWORD: my-strong-secret
    volumes:
      - syd-data:/app/data
  directdnsonly-mlb:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2223:2222"
      - "54:53/udp"
    environment:
      DADNS_APP_AUTH_PASSWORD: my-strong-secret
      DADNS_DNS_DEFAULT_BACKEND: nsd
      DADNS_DNS_BACKENDS_NSD_ENABLED: "true"
      DADNS_PEER_SYNC_ENABLED: "true"
      DADNS_PEER_SYNC_PEER_URL: http://directdnsonly-syd:2222
      DADNS_PEER_SYNC_PEER_USERNAME: directdnsonly
      DADNS_PEER_SYNC_PEER_PASSWORD: my-strong-secret
    volumes:
      - mlb-data:/app/data
volumes:
  syd-data:
  mlb-data:
```
#### Topology A — BIND backend (env vars only)
```bash
# docker run / docker-compose environment:
DADNS_APP_AUTH_USERNAME=directdnsonly
DADNS_APP_AUTH_PASSWORD=my-strong-secret
DADNS_DNS_DEFAULT_BACKEND=bind
DADNS_DNS_BACKENDS_BIND_ENABLED=true
DADNS_DNS_BACKENDS_BIND_ZONES_DIR=/etc/named/zones
DADNS_DNS_BACKENDS_BIND_NAMED_CONF=/etc/named/named.conf.local
DADNS_QUEUE_LOCATION=/app/data/queues
DADNS_DATASTORE_DB_LOCATION=/app/data/directdns.db
```
`docker-compose.yml` snippet:
```yaml
services:
  directdnsonly:
    image: guisea/directdnsonly:2.5.0
    ports:
      - "2222:2222"
      - "53:53/udp"
    environment:
      DADNS_APP_AUTH_PASSWORD: my-strong-secret
      DADNS_DNS_DEFAULT_BACKEND: bind
      DADNS_DNS_BACKENDS_BIND_ENABLED: "true"
      DADNS_DNS_BACKENDS_BIND_ZONES_DIR: /etc/named/zones
      DADNS_DNS_BACKENDS_BIND_NAMED_CONF: /etc/named/named.conf.local
    volumes:
      - ddo-data:/app/data
volumes:
  ddo-data:
```
#### Topology B — single CoreDNS MySQL backend (env vars only)
```bash
DADNS_APP_AUTH_PASSWORD=my-strong-secret
DADNS_DNS_DEFAULT_BACKEND=coredns_mysql
DADNS_DNS_BACKENDS_COREDNS_MYSQL_ENABLED=true
DADNS_DNS_BACKENDS_COREDNS_MYSQL_HOST=mysql.dc1.internal
DADNS_DNS_BACKENDS_COREDNS_MYSQL_PORT=3306
DADNS_DNS_BACKENDS_COREDNS_MYSQL_DATABASE=coredns
DADNS_DNS_BACKENDS_COREDNS_MYSQL_USERNAME=coredns
DADNS_DNS_BACKENDS_COREDNS_MYSQL_PASSWORD=db-secret
DADNS_QUEUE_LOCATION=/app/data/queues
DADNS_DATASTORE_DB_LOCATION=/app/data/directdns.db
```
---
### Multi-backend via config file
When you need **multiple named backends** (e.g. two CoreDNS MySQL instances in different data centres), **peer sync**, or **reconciliation with DA servers**, use a config file mounted at `/app/config/app.yml` (or `/etc/directdnsonly/app.yml`):
```yaml
app:
auth_username: directdnsonly
auth_password: changeme # override with DADNS_APP_AUTH_PASSWORD
auth_password: my-strong-secret # or use DADNS_APP_AUTH_PASSWORD
dns:
default_backend: bind
default_backend: coredns_dc1
backends:
bind:
coredns_dc1:
type: coredns_mysql
enabled: true
zones_dir: ./data/zones
named_conf: ./data/named.conf.include
coredns_mysql:
enabled: true
host: "127.0.0.1"
host: 10.0.0.80
port: 3306
database: "coredns"
username: "coredns"
password: "password"
database: coredns
username: coredns
password: db-secret-dc1
coredns_dc2:
type: coredns_mysql
enabled: true
host: 10.0.1.29
port: 3306
database: coredns
username: coredns
password: db-secret-dc2
reconciliation:
enabled: true
dry_run: false
interval_minutes: 60
verify_ssl: true
directadmin_servers:
- hostname: da1.example.com
port: 2222
username: admin
password: da-secret
ssl: true
peer_sync:
enabled: true
interval_minutes: 15
peers:
- url: http://ddo-2:2222
username: directdnsonly
password: my-strong-secret
```
Credentials in the config file can still be overridden by env vars — for example, `DADNS_APP_AUTH_PASSWORD` overrides `app.auth_password` regardless of what the file says.
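For example (values are placeholders), the same mounted config can be reused across environments with only the secret injected at run time:
```bash
docker run -d \
  -v ./config:/app/config \
  -e DADNS_APP_AUTH_PASSWORD=my-strong-secret \
  guisea/directdnsonly:2.5.0
```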

17
directdnsonly/__main__.py Normal file
View File

@@ -0,0 +1,17 @@
import os
import sys


def run():
    # main.py uses short-form imports (from app.*, from worker) that resolve
    # relative to the directdnsonly/ package directory. Insert it into the
    # path before importing so `python -m directdnsonly` and the `dadns`
    # console script both work without changing main.py.
    sys.path.insert(0, os.path.dirname(__file__))
    from main import main

    main()


if __name__ == "__main__":
    run()

View File

@@ -0,0 +1,96 @@
import cherrypy
import json
from loguru import logger
from sqlalchemy import select

from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain


class InternalAPI:
    """Peer-to-peer zone_data exchange endpoints.

    Used by PeerSyncWorker to replicate zone_data between directdnsonly
    instances so each node can independently heal its local backends.

    All routes require peer_sync basic auth credentials, which are
    configured separately from the main DirectAdmin-facing credentials
    (peer_sync.auth_username / peer_sync.auth_password).
    """

    def __init__(self, peer_syncer=None):
        self._peer_syncer = peer_syncer

    @cherrypy.expose
    def zones(self, domain=None):
        """Return zone metadata or zone_data for a specific domain.

        GET /internal/zones
            Returns a JSON array of {domain, zone_updated_at, hostname, username}
            for all domains that have stored zone_data.

        GET /internal/zones?domain=example.com
            Returns {domain, zone_data, zone_updated_at, hostname, username}
            for the requested domain, or 404 if not found / no zone_data.
        """
        cherrypy.response.headers["Content-Type"] = "application/json"
        session = connect()
        try:
            if domain:
                record = session.execute(
                    select(Domain)
                    .filter_by(domain=domain)
                    .where(Domain.zone_data.isnot(None))
                ).scalar_one_or_none()
                if not record:
                    cherrypy.response.status = 404
                    return json.dumps({"error": "not found"}).encode()
                return json.dumps(
                    {
                        "domain": record.domain,
                        "zone_data": record.zone_data,
                        "zone_updated_at": (
                            record.zone_updated_at.isoformat()
                            if record.zone_updated_at
                            else None
                        ),
                        "hostname": record.hostname,
                        "username": record.username,
                    }
                ).encode()
            else:
                records = session.execute(
                    select(Domain).where(Domain.zone_data.isnot(None))
                ).scalars().all()
                return json.dumps(
                    [
                        {
                            "domain": r.domain,
                            "zone_updated_at": (
                                r.zone_updated_at.isoformat()
                                if r.zone_updated_at
                                else None
                            ),
                            "hostname": r.hostname,
                            "username": r.username,
                        }
                        for r in records
                    ]
                ).encode()
        except Exception as exc:
            logger.error(f"[internal] Error serving /internal/zones: {exc}")
            cherrypy.response.status = 500
            return json.dumps({"error": "internal server error"}).encode()
        finally:
            session.close()

    @cherrypy.expose
    def peers(self):
        """Return the list of peer URLs this node knows about.

        GET /internal/peers
            Returns a JSON array of URL strings. Used by other nodes during
            sync to discover new cluster members (gossip-lite mesh expansion).
        """
        cherrypy.response.headers["Content-Type"] = "application/json"
        urls = self._peer_syncer.get_peer_urls() if self._peer_syncer else []
        return json.dumps(urls).encode()
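
Hypothetical manual calls against a peer (host and credentials are placeholders; peers normally hit these endpoints via PeerSyncWorker):
```bash
# list all zones that have stored zone_data
curl -u directdnsonly:my-strong-secret "http://directdnsonly-mlb:2222/internal/zones"

# fetch full zone_data for a single domain
curl -u directdnsonly:my-strong-secret "http://directdnsonly-mlb:2222/internal/zones?domain=example.com"

# peer URL list used for gossip-lite mesh discovery
curl -u directdnsonly:my-strong-secret "http://directdnsonly-mlb:2222/internal/peers"
```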

View File

@@ -0,0 +1,82 @@
"""Operational status endpoint — aggregates queue, worker, reconciler, and peer health."""
import json
import cherrypy
from sqlalchemy import func, select
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain
class StatusAPI:
"""Exposes GET /status as a JSON health/status document.
Aggregates data from WorkerManager.queue_status() and a live DB zone count
into a single response that a UI or monitoring system can poll.
Overall ``status`` field:
- ``ok`` — all workers alive, no dead-letters, all peers healthy
- ``degraded`` — retries pending, dead-letters present, or a peer is unhealthy
- ``error`` — a core worker thread is not alive
"""
def __init__(self, worker_manager):
self._wm = worker_manager
@cherrypy.expose
def index(self):
cherrypy.response.headers["Content-Type"] = "application/json"
return json.dumps(self._build(), default=str).encode()
# ------------------------------------------------------------------
# Internal
# ------------------------------------------------------------------
def _build(self) -> dict:
qs = self._wm.queue_status()
zone_count = self._zone_count()
overall = self._compute_overall(qs)
return {
"status": overall,
"queues": {
"save": qs.get("save_queue_size", 0),
"delete": qs.get("delete_queue_size", 0),
"retry": qs.get("retry_queue_size", 0),
"dead_letters": qs.get("dead_letters", 0),
},
"workers": {
"save": qs.get("save_worker_alive"),
"delete": qs.get("delete_worker_alive"),
"retry_drain": qs.get("retry_worker_alive"),
},
"reconciler": qs.get("reconciler", {}),
"peer_sync": qs.get("peer_sync", {}),
"zones": {"total": zone_count},
}
@staticmethod
def _zone_count() -> int:
session = connect()
try:
return session.execute(select(func.count(Domain.id))).scalar() or 0
except Exception:
return 0
finally:
session.close()
@staticmethod
def _compute_overall(qs: dict) -> str:
if not qs.get("save_worker_alive") or not qs.get("delete_worker_alive"):
return "error"
peer_sync = qs.get("peer_sync", {})
if (
qs.get("retry_queue_size", 0) > 0
or qs.get("dead_letters", 0) > 0
or peer_sync.get("degraded", 0) > 0
):
return "degraded"
return "ok"

View File

@@ -2,6 +2,7 @@ from typing import Dict, Type, Optional
from .base import DNSBackend
from .bind import BINDBackend
from .coredns_mysql import CoreDNSMySQLBackend
from .nsd import NSDBackend
from directdnsonly.config import config
from loguru import logger
@@ -11,6 +12,7 @@ class BackendRegistry:
self._backend_types = {
"bind": BINDBackend,
"coredns_mysql": CoreDNSMySQLBackend,
"nsd": NSDBackend,
}
self._backend_instances: Dict[str, DNSBackend] = {}
self._initialized = False

View File

@@ -1,8 +1,7 @@
from typing import Optional, Dict, Set, Tuple, Any
from sqlalchemy import create_engine, Column, String, Integer, Text, ForeignKey, Boolean
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, scoped_session, relationship
from sqlalchemy import create_engine, Column, String, Integer, Text, ForeignKey, Boolean, select, func, delete
from sqlalchemy.orm import sessionmaker, scoped_session, relationship, declarative_base
from dns import zone as dns_zone_module
from dns.rdataclass import IN
from loguru import logger
@@ -15,6 +14,7 @@ class Zone(Base):
__tablename__ = "zones"
id = Column(Integer, primary_key=True)
zone_name = Column(String(255), nullable=False, index=True, unique=True)
managed_by = Column(String(255), nullable=True) # 'directadmin' | 'direct' | NULL (legacy)
class Record(Base):
@@ -46,7 +46,7 @@ class CoreDNSMySQLBackend(DNSBackend):
pool_size=5,
max_overflow=10,
)
self.Session = scoped_session(sessionmaker(bind=self.engine))
self.Session = scoped_session(sessionmaker(self.engine))
Base.metadata.create_all(self.engine)
logger.info(
f"Initialized CoreDNS MySQL backend '{self.instance_name}' "
@@ -80,7 +80,7 @@ class CoreDNSMySQLBackend(DNSBackend):
# Get existing records for this zone but track SOA records separately
existing_records = {}
existing_soa = None
for r in session.query(Record).filter_by(zone_id=zone.id).all():
for r in session.execute(select(Record).filter_by(zone_id=zone.id)).scalars().all():
if r.type == "SOA":
existing_soa = r
else:
@@ -91,10 +91,34 @@ class CoreDNSMySQLBackend(DNSBackend):
zone_name, zone_data
)
# Track changes
current_records = set()
# Pre-compute the set of (hostname, type, data) keys that should
# remain after this update, so we can identify stale records upfront.
incoming_keys = {
(name, rtype, data) for name, rtype, data, _ in source_records
}
changes = {"added": 0, "updated": 0, "removed": 0}
# --- 1. Remove stale records first ---
# Deleting before inserting means a brief NXDOMAIN is preferable
# to briefly serving both old and new records simultaneously.
for key, record in existing_records.items():
if key not in incoming_keys:
logger.debug(
f"Removed record: {record.hostname} {record.type} {record.data}"
)
session.delete(record)
changes["removed"] += 1
# Handle SOA removal if needed
if existing_soa and not source_soa:
logger.debug(
f"Removed SOA record: {existing_soa.hostname} SOA {existing_soa.data}"
)
session.delete(existing_soa)
changes["removed"] += 1
# --- 2. Add / update incoming records ---
# Handle SOA record
if source_soa:
soa_name, soa_content, soa_ttl = source_soa
@@ -124,7 +148,6 @@ class CoreDNSMySQLBackend(DNSBackend):
# Process all non-SOA records
for record_name, record_type, record_content, record_ttl in source_records:
key = (record_name, record_type, record_content)
current_records.add(key)
if key in existing_records:
# Update existing record if TTL changed
@@ -152,23 +175,6 @@ class CoreDNSMySQLBackend(DNSBackend):
f"Added new record: {record_name} {record_type} {record_content}"
)
# Remove records that no longer exist in the source zone
for key, record in existing_records.items():
if key not in current_records:
logger.debug(
f"Removed record: {record.hostname} {record.type} {record.data}"
)
session.delete(record)
changes["removed"] += 1
# Handle SOA removal if needed
if existing_soa and not source_soa:
logger.debug(
f"Removed SOA record: {existing_soa.hostname} SOA {existing_soa.data}"
)
session.delete(existing_soa)
changes["removed"] += 1
session.commit()
total_changes = changes["added"] + changes["updated"] + changes["removed"]
if total_changes > 0:
@@ -192,17 +198,17 @@ class CoreDNSMySQLBackend(DNSBackend):
session = self.Session()
try:
# First find the zone
zone = (
session.query(Zone)
.filter_by(zone_name=self.dot_fqdn(zone_name))
.first()
)
zone = session.execute(
select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
).scalar_one_or_none()
if not zone:
logger.warning(f"Zone {zone_name} not found for deletion")
return False
# Delete all records associated with the zone
count = session.query(Record).filter_by(zone_id=zone.id).delete()
count = session.execute(
delete(Record).where(Record.zone_id == zone.id)
).rowcount
# Delete the zone itself
session.delete(zone)
@@ -229,12 +235,9 @@ class CoreDNSMySQLBackend(DNSBackend):
def zone_exists(self, zone_name: str) -> bool:
session = self.Session()
try:
exists = (
session.query(Zone)
.filter_by(zone_name=self.dot_fqdn(zone_name))
.first()
is not None
)
exists = session.execute(
select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
).scalar_one_or_none() is not None
logger.debug(f"Zone existence check for {zone_name}: {exists}")
return exists
except Exception as e:
@@ -244,41 +247,27 @@ class CoreDNSMySQLBackend(DNSBackend):
session.close()
def _ensure_zone_exists(self, session, zone_name: str) -> Zone:
"""Ensure a zone exists in the database, creating it if necessary"""
zone = session.query(Zone).filter_by(zone_name=self.dot_fqdn(zone_name)).first()
"""Ensure a zone exists in the database, creating it if necessary."""
zone = session.execute(
select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
).scalar_one_or_none()
if not zone:
logger.debug(f"Creating new zone: {self.dot_fqdn(zone_name)}")
zone = Zone(zone_name=self.dot_fqdn(zone_name))
zone = Zone(
zone_name=self.dot_fqdn(zone_name),
managed_by="directadmin",
)
session.add(zone)
session.flush() # Get the zone ID
session.flush()
elif not zone.managed_by:
# Migrate pre-existing rows that were created before this field was added
zone.managed_by = "directadmin"
return zone
def _normalize_cname_data(self, zone_name: str, record_content: str) -> str:
"""Normalize CNAME record data to ensure consistent FQDN format.
This ensures CNAME targets are always stored as fully-qualified domain
names so that record comparison between the BIND zone source and the
database is deterministic.
Args:
zone_name: The zone name for relative-name expansion
record_content: The raw CNAME target from the parsed zone
Returns:
The normalized CNAME target string
"""
if record_content.startswith("@"):
logger.debug(f"CNAME target starts with '@', replacing with zone FQDN")
record_content = self.dot_fqdn(zone_name)
elif not record_content.endswith("."):
logger.debug(f"CNAME target {record_content} is relative, appending zone")
record_content = ".".join([record_content, self.dot_fqdn(zone_name)])
return record_content
def _parse_zone_to_record_set(
self, zone_name: str, zone_data: str
) -> Tuple[Set[Tuple[str, str, str, int]], Optional[Tuple[str, str, int]]]:
"""Parse a BIND zone file into a set of normalised record keys.
"""Parse a BIND zone file into a set of record keys.
Returns:
Tuple of:
@@ -289,21 +278,27 @@ class CoreDNSMySQLBackend(DNSBackend):
records: Set[Tuple[str, str, str, int]] = set()
soa = None
# Use the zone origin (if available) to expand relative names in RDATA
# back to absolute FQDNs. Without this, dnspython's default relativize=True
# behaviour turns in-zone targets like `wvvcc.co.nz.` into `@` in the
# stored data, which CoreDNS then serves incorrectly.
origin = dns_zone.origin
for name, ttl, rdata in dns_zone.iterate_rdatas():
if rdata.rdclass != IN:
continue
record_name = str(name)
record_type = rdata.rdtype.name
record_content = rdata.to_text()
if origin is not None:
record_content = rdata.to_text(origin=origin, relativize=False)
else:
record_content = rdata.to_text()
if record_type == "SOA":
soa = (record_name, record_content, ttl)
continue
if record_type == "CNAME":
record_content = self._normalize_cname_data(zone_name, record_content)
records.add((record_name, record_type, record_content, ttl))
return records, soa
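To make the relativize behaviour described in the comment above concrete, here is a minimal standalone dnspython sketch; the zone content is hypothetical and not taken from this repository:

import dns.zone

zone_text = """\
$ORIGIN example.com.
$TTL 3600
@    IN SOA   ns1.example.com. hostmaster.example.com. 1 7200 3600 1209600 3600
@    IN NS    ns1.example.com.
www  IN CNAME example.com.
"""
z = dns.zone.from_text(zone_text, origin="example.com.")
for _name, _ttl, rdata in z.iterate_rdatas("CNAME"):
    # Default serialisation relativizes the in-zone apex target to '@' ...
    print(rdata.to_text())                                   # prints: @
    # ... while passing the origin keeps the absolute FQDN CoreDNS MySQL needs.
    print(rdata.to_text(origin=z.origin, relativize=False))  # prints: example.com.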
@@ -323,11 +318,9 @@ class CoreDNSMySQLBackend(DNSBackend):
"""
session = self.Session()
try:
zone = (
session.query(Zone)
.filter_by(zone_name=self.dot_fqdn(zone_name))
.first()
)
zone = session.execute(
select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
).scalar_one_or_none()
if not zone:
logger.warning(
f"[{self.instance_name}] Zone {zone_name} not found "
@@ -335,7 +328,9 @@ class CoreDNSMySQLBackend(DNSBackend):
)
return False, 0
actual_count = session.query(Record).filter_by(zone_id=zone.id).count()
actual_count = session.execute(
select(func.count()).select_from(Record).where(Record.zone_id == zone.id)
).scalar()
matches = actual_count == expected_count
if not matches:
@@ -383,11 +378,9 @@ class CoreDNSMySQLBackend(DNSBackend):
"""
session = self.Session()
try:
zone = (
session.query(Zone)
.filter_by(zone_name=self.dot_fqdn(zone_name))
.first()
)
zone = session.execute(
select(Zone).filter_by(zone_name=self.dot_fqdn(zone_name))
).scalar_one_or_none()
if not zone:
logger.warning(
f"[{self.instance_name}] Zone {zone_name} not found "
@@ -405,7 +398,9 @@ class CoreDNSMySQLBackend(DNSBackend):
}
# Query all records currently in the backend for this zone
db_records = session.query(Record).filter_by(zone_id=zone.id).all()
db_records = session.execute(
select(Record).where(Record.zone_id == zone.id)
).scalars().all()
removed = 0
for record in db_records:

View File

@@ -0,0 +1,179 @@
import os
import re
import subprocess
from loguru import logger
from pathlib import Path
from typing import Dict, List, Optional
from .base import DNSBackend
class NSDBackend(DNSBackend):
"""DNS backend for NSD (Name Server Daemon) by NLnet Labs.
Zone files use the same RFC 1035 format as BIND. NSD is reloaded via
``nsd-control reload`` after each write. Zone registration is managed in a
dedicated include file so the main ``nsd.conf`` is never modified by the
application.
"""
@classmethod
def get_name(cls) -> str:
return "nsd"
@classmethod
def is_available(cls) -> bool:
try:
result = subprocess.run(
["nsd-control", "status"],
capture_output=True,
text=True,
)
# nsd-control exits 0 when NSD is running, non-zero otherwise.
# Either way, a non-FileNotFoundError means the binary is present.
logger.info("NSD available (nsd-control found)")
return True
except FileNotFoundError:
logger.warning("NSD not found in PATH — nsd-control missing")
return False
def __init__(self, config: Dict):
super().__init__(config)
self.zones_dir = Path(config.get("zones_dir", "/etc/nsd/zones"))
self.nsd_conf = Path(
config.get("nsd_conf", "/etc/nsd/nsd.conf.d/zones.conf")
)
# Ensure zones directory exists
try:
if self.zones_dir.is_symlink():
logger.debug(f"{self.zones_dir} is already a symlink")
elif not self.zones_dir.exists():
self.zones_dir.mkdir(parents=True, mode=0o755)
logger.debug(f"Created zones directory: {self.zones_dir}")
os.chmod(self.zones_dir, 0o755)
except FileExistsError:
pass
except Exception as e:
logger.error(f"Failed to setup zones directory: {e}")
raise
# Ensure the conf include directory and file exist
self.nsd_conf.parent.mkdir(parents=True, exist_ok=True)
if not self.nsd_conf.exists():
self.nsd_conf.touch()
logger.info(f"Created empty NSD zone conf: {self.nsd_conf}")
logger.success(
f"NSD backend initialized — zones: {self.zones_dir}, "
f"conf: {self.nsd_conf}"
)
# ------------------------------------------------------------------
# Core backend interface
# ------------------------------------------------------------------
def write_zone(self, zone_name: str, zone_data: str) -> bool:
zone_file = self.zones_dir / f"{zone_name}.db"
try:
zone_file.write_text(zone_data)
logger.debug(f"Wrote zone file: {zone_file}")
self._ensure_zone_in_conf(zone_name)
return True
except IOError as e:
logger.error(f"Failed to write zone file {zone_file}: {e}")
return False
def delete_zone(self, zone_name: str) -> bool:
zone_file = self.zones_dir / f"{zone_name}.db"
try:
if zone_file.exists():
zone_file.unlink()
logger.debug(f"Deleted zone file: {zone_file}")
else:
logger.warning(f"Zone file not found: {zone_file}")
return False
self._remove_zone_from_conf(zone_name)
return True
except IOError as e:
logger.error(f"Failed to delete zone {zone_name}: {e}")
return False
def reload_zone(self, zone_name: Optional[str] = None) -> bool:
try:
if zone_name:
cmd = ["nsd-control", "reload", zone_name]
logger.debug(f"Reloading single zone: {zone_name}")
else:
cmd = ["nsd-control", "reload"]
logger.debug("Reloading all zones")
result = subprocess.run(
cmd,
check=True,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
)
logger.debug(f"NSD reload successful: {result.stdout.strip()}")
return True
except subprocess.CalledProcessError as e:
logger.error(f"NSD reload failed: {e.stderr.strip()}")
return False
except Exception as e:
logger.error(f"Unexpected error during NSD reload: {e}")
return False
def zone_exists(self, zone_name: str) -> bool:
exists = (self.zones_dir / f"{zone_name}.db").exists()
logger.debug(f"Zone existence check for {zone_name}: {exists}")
return exists
# ------------------------------------------------------------------
# NSD conf file management
# ------------------------------------------------------------------
def update_nsd_conf(self, zones: List[str]) -> bool:
"""Rewrite the NSD zones include file with exactly the given zone list.
Equivalent to BINDBackend.update_named_conf — full replacement from a
known-good source list.
"""
try:
lines = []
for zone in zones:
zone_file = self.zones_dir / f"{zone}.db"
lines.append(
f'\nzone:\n name: "{zone}"\n zonefile: "{zone_file}"\n'
)
self.nsd_conf.write_text("".join(lines))
logger.debug(f"Rewrote NSD zone conf: {self.nsd_conf}")
return True
except IOError as e:
logger.error(f"Failed to update NSD zone conf: {e}")
return False
def _ensure_zone_in_conf(self, zone_name: str) -> None:
"""Append a zone stanza to the NSD conf file if it is not already present."""
zone_file = self.zones_dir / f"{zone_name}.db"
stanza = f'\nzone:\n name: "{zone_name}"\n zonefile: "{zone_file}"\n'
content = self.nsd_conf.read_text() if self.nsd_conf.exists() else ""
if f'name: "{zone_name}"' not in content:
with open(self.nsd_conf, "a") as f:
f.write(stanza)
logger.debug(f"Added zone {zone_name} to NSD conf")
def _remove_zone_from_conf(self, zone_name: str) -> None:
"""Remove a zone stanza from the NSD conf file."""
if not self.nsd_conf.exists():
return
content = self.nsd_conf.read_text()
pattern = (
r'\nzone:\n name: "'
+ re.escape(zone_name)
+ r'"\n zonefile: "[^"]+"\n'
)
new_content = re.sub(pattern, "", content)
if new_content != content:
self.nsd_conf.write_text(new_content)
logger.debug(f"Removed zone {zone_name} from NSD conf")

View File

@@ -1,332 +0,0 @@
from typing import Optional, Dict, Set, Tuple, List
from sqlalchemy import (
create_engine,
Column,
String,
Integer,
Text,
Boolean,
DateTime,
func,
)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, scoped_session
from loguru import logger
from .base import DNSBackend
from config import config
import time
Base = declarative_base()
class Domain(Base):
__tablename__ = "domains"
id = Column(Integer, primary_key=True)
name = Column(String(255), nullable=False, index=True, unique=True)
master = Column(String(128), nullable=True)
last_check = Column(Integer, nullable=True)
type = Column(String(6), nullable=False, default="NATIVE")
notified_serial = Column(Integer, nullable=True)
account = Column(String(40), nullable=True)
class Record(Base):
__tablename__ = "records"
id = Column(Integer, primary_key=True)
domain_id = Column(Integer, nullable=False, index=True)
name = Column(String(255), nullable=False, index=True)
type = Column(String(10), nullable=False)
content = Column(Text, nullable=False)
ttl = Column(Integer, nullable=True)
prio = Column(Integer, nullable=True)
change_date = Column(Integer, nullable=True)
disabled = Column(Boolean, nullable=False, default=False)
ordername = Column(String(255), nullable=True)
auth = Column(Boolean, nullable=False, default=True)
class PowerDNSMySQLBackend(DNSBackend):
@classmethod
def get_name(cls) -> str:
return "powerdns_mysql"
@classmethod
def is_available(cls) -> bool:
try:
import pymysql
return True
except ImportError:
logger.warning("PyMySQL not available - PowerDNS MySQL backend disabled")
return False
@staticmethod
def ensure_fqdn(name: str, zone_name: str) -> str:
"""Ensure name is fully qualified for PowerDNS"""
if name == "@" or name == "":
return zone_name
elif name.endswith("."):
return name.rstrip(".")
elif name == zone_name:
return name
else:
return f"{name}.{zone_name}"
def __init__(self, config: dict = None):
c = config or config.get("dns.backends.powerdns_mysql")
self.engine = create_engine(
f"mysql+pymysql://{c['username']}:{c['password']}@"
f"{c['host']}:{c['port']}/{c['database']}",
pool_pre_ping=True,
)
self.Session = scoped_session(sessionmaker(bind=self.engine))
Base.metadata.create_all(self.engine)
logger.info(f"Initialized PowerDNS MySQL backend for {c['database']}")
def _ensure_domain_exists(self, session, zone_name: str) -> Domain:
"""Ensure domain exists and return domain object"""
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
domain = Domain(name=zone_name, type="NATIVE")
session.add(domain)
session.flush() # Flush to get the domain ID
logger.info(f"Created new domain: {zone_name}")
return domain
def _parse_soa_content(self, soa_content: str) -> Dict[str, str]:
"""Parse SOA record content into components"""
parts = soa_content.split()
if len(parts) >= 7:
return {
"primary_ns": parts[0],
"hostmaster": parts[1],
"serial": parts[2],
"refresh": parts[3],
"retry": parts[4],
"expire": parts[5],
"minimum": parts[6],
}
return {}
def write_zone(self, zone_name: str, zone_data: str) -> bool:
from dns import zone as dns_zone_module
from dns.rdataclass import IN
session = self.Session()
try:
# Ensure domain exists
domain = self._ensure_domain_exists(session, zone_name)
# Get existing records for this domain
existing_records = {
(r.name, r.type): r
for r in session.query(Record).filter_by(domain_id=domain.id).all()
}
# Parse the zone data
dns_zone = dns_zone_module.from_text(zone_data, check_origin=False)
# Track records we process
current_records: Set[Tuple[str, str]] = set()
changes = {"added": 0, "updated": 0, "removed": 0}
current_time = int(time.time())
# Process all records
for name, ttl, rdata in dns_zone.iterate_rdatas():
if rdata.rdclass != IN:
continue
record_name = self.ensure_fqdn(str(name), zone_name)
record_type = rdata.rdtype.name
record_content = rdata.to_text()
record_ttl = ttl
record_prio = None
# Handle MX records priority
if record_type == "MX":
parts = record_content.split(" ", 1)
if len(parts) == 2:
record_prio = int(parts[0])
record_content = parts[1]
# Handle SRV records priority and other fields
elif record_type == "SRV":
parts = record_content.split(" ", 3)
if len(parts) == 4:
record_prio = int(parts[0])
record_content = f"{parts[1]} {parts[2]} {parts[3]}"
# Ensure CNAME and other records have proper FQDN format
if record_type in ["CNAME", "MX", "NS"]:
if not record_content.endswith(".") and record_content != "@":
if record_content == "@":
record_content = zone_name
elif "." not in record_content:
record_content = f"{record_content}.{zone_name}"
key = (record_name, record_type)
current_records.add(key)
if key in existing_records:
# Update existing record if needed
record = existing_records[key]
if (
record.content != record_content
or record.ttl != record_ttl
or record.prio != record_prio
):
record.content = record_content
record.ttl = record_ttl
record.prio = record_prio
record.change_date = current_time
record.disabled = False
changes["updated"] += 1
else:
# Add new record
new_record = Record(
domain_id=domain.id,
name=record_name,
type=record_type,
content=record_content,
ttl=record_ttl,
prio=record_prio,
change_date=current_time,
disabled=False,
auth=True,
)
session.add(new_record)
changes["added"] += 1
# Remove deleted records
for key in set(existing_records.keys()) - current_records:
session.delete(existing_records[key])
changes["removed"] += 1
session.commit()
logger.success(
f"Zone {zone_name} updated: "
f"+{changes['added']} ~{changes['updated']} -{changes['removed']}"
)
return True
except Exception as e:
session.rollback()
logger.error(f"Zone update failed for {zone_name}: {e}")
return False
finally:
session.close()
def delete_zone(self, zone_name: str) -> bool:
session = self.Session()
try:
# First find the domain
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
logger.warning(f"Domain {zone_name} not found for deletion")
return False
# Delete all records associated with the domain
count = session.query(Record).filter_by(domain_id=domain.id).delete()
# Delete the domain itself
session.delete(domain)
session.commit()
logger.info(f"Deleted domain {zone_name} with {count} records")
return True
except Exception as e:
session.rollback()
logger.error(f"Domain deletion failed for {zone_name}: {e}")
return False
finally:
session.close()
def reload_zone(self, zone_name: Optional[str] = None) -> bool:
"""PowerDNS reload - could trigger pdns_control reload if needed"""
if zone_name:
logger.debug(f"PowerDNS reload triggered for zone {zone_name}")
# Optional: Call pdns_control reload-zones here if needed
# subprocess.run(['pdns_control', 'reload-zones'], check=True)
else:
logger.debug("PowerDNS reload triggered for all zones")
# Optional: Call pdns_control reload here if needed
# subprocess.run(['pdns_control', 'reload'], check=True)
return True
def zone_exists(self, zone_name: str) -> bool:
session = self.Session()
try:
exists = session.query(Domain).filter_by(name=zone_name).first() is not None
logger.debug(f"Zone existence check for {zone_name}: {exists}")
return exists
except Exception as e:
logger.error(f"Zone existence check failed for {zone_name}: {e}")
return False
finally:
session.close()
def get_zone_records(self, zone_name: str) -> List[Dict]:
"""Get all records for a zone - useful for debugging/inspection"""
session = self.Session()
try:
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
return []
records = session.query(Record).filter_by(domain_id=domain.id).all()
return [
{
"name": r.name,
"type": r.type,
"content": r.content,
"ttl": r.ttl,
"prio": r.prio,
"disabled": r.disabled,
}
for r in records
]
except Exception as e:
logger.error(f"Failed to get records for {zone_name}: {e}")
return []
finally:
session.close()
def set_record_status(
self, zone_name: str, record_name: str, record_type: str, disabled: bool
) -> bool:
"""Enable/disable specific records"""
session = self.Session()
try:
domain = session.query(Domain).filter_by(name=zone_name).first()
if not domain:
logger.warning(f"Domain {zone_name} not found")
return False
full_name = self.ensure_fqdn(record_name, zone_name)
record = (
session.query(Record)
.filter_by(domain_id=domain.id, name=full_name, type=record_type)
.first()
)
if not record:
logger.warning(
f"Record {full_name} {record_type} not found in {zone_name}"
)
return False
record.disabled = disabled
record.change_date = int(time.time())
session.commit()
status = "disabled" if disabled else "enabled"
logger.info(f"Record {full_name} {record_type} {status} in {zone_name}")
return True
except Exception as e:
session.rollback()
logger.error(f"Failed to set record status: {e}")
return False
finally:
session.close()

View File

@@ -0,0 +1,3 @@
from .client import DirectAdminClient
__all__ = ["DirectAdminClient"]

View File

@@ -0,0 +1,340 @@
"""DirectAdmin HTTP client.
Encapsulates all outbound communication with a single DirectAdmin server:
authenticated requests, the Basic-Auth → session-cookie fallback for DA Evo,
paginated domain listing, and the legacy URL-encoded response parser.
"""
from __future__ import annotations
from urllib.parse import parse_qs
from typing import Optional
import requests
import requests.exceptions
from loguru import logger
class DirectAdminClient:
"""HTTP client for a single DirectAdmin server.
Handles two authentication modes transparently:
- Basic Auth (classic DA / API-only access)
- Session cookie via CMD_LOGIN (DA Evolution — redirects Basic Auth)
Usage::
client = DirectAdminClient("da1.example.com", 2222, "admin", "secret")
domains = client.list_domains() # set[str] or None on failure
response = client.get("CMD_API_SHOW_ALL_USERS")
"""
def __init__(
self,
hostname: str,
port: int,
username: str,
password: str,
ssl: bool = True,
verify_ssl: bool = True,
) -> None:
self.hostname = hostname
self.port = port
self.username = username
self.password = password
self.scheme = "https" if ssl else "http"
self.verify_ssl = verify_ssl
self._cookies = None # populated on first successful session login
# ------------------------------------------------------------------
# Public API
# ------------------------------------------------------------------
def list_domains(self, ipp: int = 1000) -> Optional[set]:
"""Return all domains on this DA server via CMD_DNS_ADMIN (JSON, paginated).
Falls back to the legacy URL-encoded parser if JSON decode fails.
Returns a set of lowercase domain strings, or ``None`` if the server
is unreachable or returns an error.
"""
page = 1
all_domains: set = set()
total_pages = 1
try:
while page <= total_pages:
response = self.get(
"CMD_DNS_ADMIN",
params={"json": "yes", "page": page, "ipp": ipp},
)
if response is None:
return None
if response.is_redirect or response.status_code in (
301,
302,
303,
307,
308,
):
if self._cookies:
logger.error(
f"[da:{self.hostname}] Still redirecting after session login — "
f"check that '{self.username}' has admin-level access. Skipping."
)
return None
logger.debug(
f"[da:{self.hostname}] Basic Auth redirected "
f"(HTTP {response.status_code}) — attempting session login (DA Evo)"
)
if not self._login():
return None
continue # retry this page with cookies
response.raise_for_status()
content_type = response.headers.get("Content-Type", "")
if "text/html" in content_type:
logger.error(
f"[da:{self.hostname}] Returned HTML instead of API response — "
f"check credentials and admin-level access. Skipping."
)
return None
try:
data = response.json()
for k, v in data.items():
if k.isdigit() and isinstance(v, dict) and "domain" in v:
all_domains.add(v["domain"].strip().lower())
total_pages = int(data.get("info", {}).get("total_pages", 1))
page += 1
except Exception as exc:
logger.error(
f"[da:{self.hostname}] JSON decode failed on page {page}: {exc}\n"
f"Raw response: {response.text[:500]}"
)
all_domains.update(self._parse_legacy_domain_list(response.text))
break # no paging in legacy mode
return all_domains
except requests.exceptions.SSLError as exc:
logger.error(
f"[da:{self.hostname}] SSL error — {exc}. "
f"Set verify_ssl: false in reconciliation config if using self-signed certs."
)
except requests.exceptions.ConnectionError as exc:
logger.error(f"[da:{self.hostname}] Cannot reach server — {exc}. Skipping.")
except requests.exceptions.Timeout:
logger.error(f"[da:{self.hostname}] Connection timed out. Skipping.")
except requests.exceptions.HTTPError as exc:
logger.error(f"[da:{self.hostname}] HTTP error — {exc}. Skipping.")
except Exception as exc:
logger.error(f"[da:{self.hostname}] Unexpected error: {exc}")
return None
def get(
self, command: str, params: Optional[dict] = None
) -> Optional[requests.Response]:
"""Authenticated GET to any DA CMD_* endpoint.
Uses session cookies when available (after a successful ``_login``),
otherwise falls back to HTTP Basic Auth. Does **not** follow redirects
so callers can detect the Basic-Auth → cookie upgrade.
"""
url = f"{self.scheme}://{self.hostname}:{self.port}/{command}"
kwargs: dict = dict(
params=params or {},
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
)
if self._cookies:
kwargs["cookies"] = self._cookies
else:
kwargs["auth"] = (self.username, self.password)
try:
return requests.get(url, **kwargs)
except Exception as exc:
logger.error(f"[da:{self.hostname}] GET {command} failed: {exc}")
return None
def post(
self, command: str, data: Optional[dict] = None
) -> Optional[requests.Response]:
"""Authenticated POST to any DA CMD_* endpoint."""
url = f"{self.scheme}://{self.hostname}:{self.port}/{command}"
kwargs: dict = dict(
data=data or {},
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
)
if self._cookies:
kwargs["cookies"] = self._cookies
else:
kwargs["auth"] = (self.username, self.password)
try:
return requests.post(url, **kwargs)
except Exception as exc:
logger.error(f"[da:{self.hostname}] POST {command} failed: {exc}")
return None
def get_extra_dns_servers(self) -> dict:
"""Return the Extra DNS server map from CMD_MULTI_SERVER (GET).
Returns a dict keyed by server hostname/IP, each value being the
per-server settings dict (dns, domain_check, port, user, ssl, …).
Returns ``{}`` on any error.
"""
resp = self.get("CMD_MULTI_SERVER", params={"json": "yes"})
if resp is None or resp.status_code != 200:
logger.error(f"[da:{self.hostname}] CMD_MULTI_SERVER GET failed")
return {}
try:
return resp.json().get("servers", {})
except Exception as exc:
logger.error(f"[da:{self.hostname}] CMD_MULTI_SERVER parse error: {exc}")
return {}
def add_extra_dns_server(
self, ip: str, port: int, user: str, passwd: str, ssl: bool = False
) -> bool:
"""Register a new Extra DNS server via CMD_MULTI_SERVER action=add.
Returns ``True`` if DA reports success, ``False`` otherwise.
"""
resp = self.post(
"CMD_MULTI_SERVER",
data={
"action": "add",
"json": "yes",
"ip": ip,
"port": str(port),
"user": user,
"passwd": passwd,
"ssl": "yes" if ssl else "no",
},
)
if resp is None or resp.status_code != 200:
logger.error(f"[da:{self.hostname}] CMD_MULTI_SERVER add failed for {ip}")
return False
try:
result = resp.json()
if result.get("success"):
logger.info(f"[da:{self.hostname}] Added Extra DNS server {ip}")
return True
logger.error(
f"[da:{self.hostname}] CMD_MULTI_SERVER add error: {result.get('result', result)}"
)
return False
except Exception as exc:
logger.error(f"[da:{self.hostname}] CMD_MULTI_SERVER add parse error: {exc}")
return False
def ensure_extra_dns_server(
self, ip: str, port: int, user: str, passwd: str, ssl: bool = False
) -> bool:
"""Add (if absent) and configure a directdnsonly Extra DNS server.
Ensures the server is registered with ``dns=yes`` and
``domain_check=yes`` so DirectAdmin pushes zone updates to it.
Returns ``True`` if fully configured, ``False`` on any failure.
"""
servers = self.get_extra_dns_servers()
if ip not in servers:
if not self.add_extra_dns_server(ip, port, user, passwd, ssl):
return False
ssl_str = "yes" if ssl else "no"
resp = self.post(
"CMD_MULTI_SERVER",
data={
"action": "multiple",
"save": "yes",
"json": "yes",
"passwd": "",
"select0": ip,
f"port-{ip}": str(port),
f"user-{ip}": user,
f"ssl-{ip}": ssl_str,
f"dns-{ip}": "yes",
f"domain_check-{ip}": "yes",
f"user_check-{ip}": "no",
f"email-{ip}": "no",
f"show_all_users-{ip}": "no",
},
)
if resp is None or resp.status_code != 200:
logger.error(
f"[da:{self.hostname}] CMD_MULTI_SERVER save failed for {ip}"
)
return False
try:
result = resp.json()
if result.get("success"):
logger.info(
f"[da:{self.hostname}] Extra DNS server {ip} configured "
f"(dns=yes domain_check=yes)"
)
return True
logger.error(
f"[da:{self.hostname}] CMD_MULTI_SERVER save error: {result.get('result', result)}"
)
return False
except Exception as exc:
logger.error(
f"[da:{self.hostname}] CMD_MULTI_SERVER save parse error: {exc}"
)
return False
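For illustration, registering a directdnsonly node against a DA server could look like this; the hostname, IP, and credentials are placeholders:

client = DirectAdminClient("da1.example.com", 2222, "admin", "secret")
# Adds 203.0.113.10 under Multi Server Setup if absent, then saves it with
# dns=yes and domain_check=yes so DA pushes zone updates to that node.
ok = client.ensure_extra_dns_server(
    ip="203.0.113.10", port=2222, user="directdnsonly", passwd="changeme", ssl=False
)
if not ok:
    raise SystemExit("CMD_MULTI_SERVER configuration failed, check the DA logs")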
# ------------------------------------------------------------------
# Internal
# ------------------------------------------------------------------
def _login(self) -> bool:
"""POST CMD_LOGIN to obtain a DA Evo session cookie.
Populates ``self._cookies`` on success and returns ``True``.
Returns ``False`` on any failure.
"""
login_url = f"{self.scheme}://{self.hostname}:{self.port}/CMD_LOGIN"
try:
response = requests.post(
login_url,
data={
"username": self.username,
"password": self.password,
"referer": "/CMD_DNS_ADMIN?json=yes&page=1&ipp=500",
},
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
)
if not response.cookies:
logger.error(
f"[da:{self.hostname}] CMD_LOGIN returned no session cookie — "
f"check username/password."
)
return False
self._cookies = response.cookies
logger.debug(f"[da:{self.hostname}] Session login successful (DA Evo)")
return True
except Exception as exc:
logger.error(f"[da:{self.hostname}] Session login failed: {exc}")
return False
@staticmethod
def _parse_legacy_domain_list(body: str) -> set:
"""Parse DA's legacy CMD_API_SHOW_ALL_DOMAINS URL-encoded response.
DA returns ``list[]=example.com&list[]=example2.com``, optionally
newline-separated instead of ampersand-separated.
"""
normalised = body.replace("\n", "&").strip("&")
params = parse_qs(normalised)
domains = params.get("list[]", [])
return {d.strip().lower() for d in domains if d.strip()}
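A quick worked example of the legacy parser; the input shape comes from the docstring above and the domains are illustrative:

body = "list[]=example.com&list[]=shop.example.net\nlist[]=EXAMPLE.ORG"
print(DirectAdminClient._parse_legacy_domain_list(body))
# {'example.com', 'shop.example.net', 'example.org'}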

View File

@@ -1,13 +1,36 @@
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker, declarative_base
from vyper import v
from loguru import logger
import datetime
Base = declarative_base()
def _migrate(engine):
"""Apply additive schema migrations for columns added after initial release."""
migrations = [
("domains", "zone_data", "ALTER TABLE domains ADD COLUMN zone_data TEXT"),
(
"domains",
"zone_updated_at",
"ALTER TABLE domains ADD COLUMN zone_updated_at DATETIME",
),
]
with engine.connect() as conn:
for table, column, ddl in migrations:
try:
conn.execute(text(f"SELECT {column} FROM {table} LIMIT 1"))
except Exception:
try:
conn.execute(text(ddl))
conn.commit()
logger.info(f"[db] Migration applied: added {table}.{column}")
except Exception as exc:
logger.warning(f"[db] Migration skipped ({table}.{column}): {exc}")
def connect(dbtype="sqlite", **kwargs):
if dbtype == "sqlite":
# Start SQLite engine
@@ -19,7 +42,8 @@ def connect(dbtype="sqlite", **kwargs):
"sqlite:///" + db_location, connect_args={"check_same_thread": False}
)
Base.metadata.create_all(engine)
return sessionmaker(bind=engine)()
_migrate(engine)
return sessionmaker(engine)()
elif dbtype == "mysql":
# Start a MySQL engine
db_user = v.get_string("datastore.user")
@@ -50,6 +74,7 @@ def connect(dbtype="sqlite", **kwargs):
+ db_name
)
Base.metadata.create_all(engine)
return sessionmaker(bind=engine)()
_migrate(engine)
return sessionmaker(engine)()
else:
raise Exception("Unknown/unimplemented database type: {}".format(dbtype))

View File

@@ -1,5 +1,5 @@
from directdnsonly.app.db import Base
from sqlalchemy import Column, Integer, String, DateTime
from sqlalchemy import Column, Integer, String, DateTime, Text
class Key(Base):
@@ -25,6 +25,8 @@ class Domain(Base):
domain = Column(String(255), unique=True)
hostname = Column(String(255))
username = Column(String(255))
zone_data = Column(Text, nullable=True) # last known zone file from DA
zone_updated_at = Column(DateTime, nullable=True) # when zone_data was last stored
def __repr__(self):
return "<Domain(id='%s', domain='%s', hostname='%s', username='%s')>" % (

View File

@@ -0,0 +1,353 @@
#!/usr/bin/env python3
"""Peer sync worker — exchanges zone_data between directdnsonly instances.
Each node stores zone_data in its local SQLite DB after every successful
backend write. When DirectAdmin pushes a zone to one node but another
is temporarily offline, the offline node misses that zone_data.
PeerSyncWorker corrects this by periodically comparing zone lists with
all known peers and fetching any zone_data that is newer or absent locally.
It only updates the local DB — it never writes directly to backends. The
existing reconciler healing pass then detects missing zones and re-pushes
using the freshly synced zone_data.
Mesh behaviour:
- Each node exposes /internal/peers listing the URLs it knows about
- During each sync pass, every peer is asked for its peer list; any URLs
not already known are added automatically (gossip-lite discovery)
- A three-node cluster therefore only needs a linear chain of initial
connections — nodes propagate awareness of each other on the first pass
Health tracking:
- Consecutive failures per peer are counted; after FAILURE_THRESHOLD
misses the peer is marked degraded and a warning is logged once
- On the next successful contact the peer is marked recovered
Safety properties:
- If a peer is unreachable, skip it and try next interval
- Only zone_data is synced — backend writes remain the sole responsibility
of the local save queue worker
- Newer zone_updated_at timestamp wins; local data is never overwritten
with older peer data
- Peer discovery is best-effort and never fails a sync pass
"""
import datetime
import os
import threading
from loguru import logger
import requests
from sqlalchemy import select
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain
# Consecutive failures before a peer is logged as degraded
FAILURE_THRESHOLD = 3
class PeerSyncWorker:
"""Periodically fetches zone_data from peer directdnsonly instances and
stores it locally so the healing pass can re-push missing zones without
waiting for a DirectAdmin re-push."""
def __init__(self, peer_sync_config: dict):
self.enabled = peer_sync_config.get("enabled", False)
self.interval_seconds = peer_sync_config.get("interval_minutes", 15) * 60
self.peers = list(peer_sync_config.get("peers") or [])
# Per-peer health state: url -> {consecutive_failures, healthy, last_seen}
self._peer_health: dict = {}
# ----------------------------------------------------------------
# Env-var peer injection
# ----------------------------------------------------------------
# Original single-peer vars (backward compat):
# DADNS_PEER_SYNC_PEER_URL / _USERNAME / _PASSWORD
# Numbered multi-peer vars (new):
# DADNS_PEER_SYNC_PEER_1_URL / _USERNAME / _PASSWORD
# DADNS_PEER_SYNC_PEER_2_URL / ... (up to 9)
known_urls = {p.get("url") for p in self.peers}
env_candidates = []
single_url = os.environ.get("DADNS_PEER_SYNC_PEER_URL", "").strip()
if single_url:
env_candidates.append({
"url": single_url,
"username": os.environ.get("DADNS_PEER_SYNC_PEER_USERNAME", "peersync"),
"password": os.environ.get("DADNS_PEER_SYNC_PEER_PASSWORD", ""),
})
for i in range(1, 10):
numbered_url = os.environ.get(f"DADNS_PEER_SYNC_PEER_{i}_URL", "").strip()
if not numbered_url:
break
env_candidates.append({
"url": numbered_url,
"username": os.environ.get(
f"DADNS_PEER_SYNC_PEER_{i}_USERNAME", "peersync"
),
"password": os.environ.get(f"DADNS_PEER_SYNC_PEER_{i}_PASSWORD", ""),
})
for candidate in env_candidates:
if candidate["url"] not in known_urls:
self.peers.append(candidate)
known_urls.add(candidate["url"])
logger.debug(
f"[peer_sync] Added peer from env vars: {candidate['url']}"
)
self._stop_event = threading.Event()
self._thread = None
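As a sketch of the env-var-only path (all values illustrative), two peers could be injected like this; numbering stops at the first missing _URL variable, and _USERNAME falls back to "peersync" when unset:

import os

os.environ["DADNS_PEER_SYNC_PEER_1_URL"] = "http://ddo-2:2222"
os.environ["DADNS_PEER_SYNC_PEER_1_PASSWORD"] = "s3cret"
os.environ["DADNS_PEER_SYNC_PEER_2_URL"] = "http://ddo-3:2222"
os.environ["DADNS_PEER_SYNC_PEER_2_PASSWORD"] = "s3cret"

worker = PeerSyncWorker({"enabled": True, "interval_minutes": 15, "peers": []})
# worker.peers now holds both URLs, each with username "peersync" and the passwords above.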
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def start(self):
if not self.enabled:
logger.info("Peer sync disabled — skipping")
return
if not self.peers:
logger.warning("Peer sync enabled but no peers configured")
return
self._stop_event.clear()
self._thread = threading.Thread(
target=self._run, daemon=True, name="peer_sync_worker"
)
self._thread.start()
peer_urls = [p.get("url", "?") for p in self.peers]
logger.info(
f"Peer sync worker started — "
f"interval: {self.interval_seconds // 60}m, "
f"peers: {peer_urls}"
)
def stop(self):
self._stop_event.set()
if self._thread:
self._thread.join(timeout=10)
logger.info("Peer sync worker stopped")
@property
def is_alive(self):
return self._thread is not None and self._thread.is_alive()
def get_peer_urls(self) -> list:
"""Return the current list of known peer URLs.
Exposed via /internal/peers so other nodes can discover this node's mesh."""
return [p["url"] for p in self.peers if p.get("url")]
def get_peer_status(self) -> dict:
"""Return peer health summary for the /status endpoint."""
peers = []
for peer in self.peers:
url = peer.get("url", "")
h = self._peer_health.get(url, {})
last_seen = h.get("last_seen")
peers.append({
"url": url,
"healthy": h.get("healthy", True),
"consecutive_failures": h.get("consecutive_failures", 0),
"last_seen": last_seen.isoformat() if last_seen else None,
})
healthy = sum(1 for p in peers if p["healthy"])
return {
"enabled": self.enabled,
"alive": self.is_alive,
"interval_minutes": self.interval_seconds // 60,
"peers": peers,
"total": len(peers),
"healthy": healthy,
"degraded": len(peers) - healthy,
}
# ------------------------------------------------------------------
# Health tracking
# ------------------------------------------------------------------
def _health(self, url: str) -> dict:
return self._peer_health.setdefault(
url, {"consecutive_failures": 0, "healthy": True, "last_seen": None}
)
def _record_success(self, url: str):
h = self._health(url)
recovered = not h["healthy"]
h.update(
consecutive_failures=0,
healthy=True,
last_seen=datetime.datetime.utcnow(),
)
if recovered:
logger.info(f"[peer_sync] {url}: peer recovered")
def _record_failure(self, url: str, exc):
h = self._health(url)
h["consecutive_failures"] += 1
if h["healthy"] and h["consecutive_failures"] >= FAILURE_THRESHOLD:
h["healthy"] = False
logger.warning(
f"[peer_sync] {url}: marked degraded after {FAILURE_THRESHOLD} "
f"consecutive failures — {exc}"
)
else:
logger.debug(
f"[peer_sync] {url}: unreachable "
f"(failure #{h['consecutive_failures']}) — {exc}"
)
# ------------------------------------------------------------------
# Internal
# ------------------------------------------------------------------
def _run(self):
logger.info("Peer sync worker starting — running initial sync now")
self._sync_all()
while not self._stop_event.wait(timeout=self.interval_seconds):
self._sync_all()
def _sync_all(self):
logger.debug(f"[peer_sync] Starting sync pass across {len(self.peers)} peer(s)")
# Iterate over a snapshot — _discover_peers_from may grow self.peers
for peer in list(self.peers):
url = peer.get("url")
if not url:
logger.warning("[peer_sync] Peer config missing url — skipping")
continue
try:
self._sync_from_peer(peer)
self._discover_peers_from(peer)
self._record_success(url)
except Exception as exc:
self._record_failure(url, exc)
def _discover_peers_from(self, peer: dict):
"""Fetch peer's known peer list and add any new nodes for mesh expansion.
This is best-effort — failures are silently swallowed so they never
interrupt the main sync pass."""
url = peer.get("url", "").rstrip("/")
username = peer.get("username")
password = peer.get("password")
auth = (username, password) if username else None
try:
resp = requests.get(f"{url}/internal/peers", auth=auth, timeout=5)
if resp.status_code != 200:
return
remote_urls = resp.json() # list of URL strings
known_urls = {p.get("url") for p in self.peers}
for remote_url in remote_urls:
if remote_url and remote_url not in known_urls:
# Inherit credentials from the introducing peer — in practice
# all cluster nodes share the same peer_sync auth credentials.
self.peers.append({
"url": remote_url,
"username": username,
"password": password,
})
known_urls.add(remote_url)
logger.info(
f"[peer_sync] Discovered new peer {remote_url} via {url}"
)
except Exception:
pass # discovery is best-effort
def _sync_from_peer(self, peer: dict):
url = peer.get("url", "").rstrip("/")
username = peer.get("username")
password = peer.get("password")
auth = (username, password) if username else None
# Fetch the peer's zone list
resp = requests.get(f"{url}/internal/zones", auth=auth, timeout=10)
if resp.status_code != 200:
logger.warning(
f"[peer_sync] {url}: /internal/zones returned {resp.status_code}"
)
return
peer_zones = resp.json() # [{domain, zone_updated_at, hostname, username}]
if not peer_zones:
logger.debug(f"[peer_sync] {url}: no zone_data on peer yet")
return
session = connect()
try:
synced = 0
for entry in peer_zones:
domain = entry.get("domain")
if not domain:
continue
peer_ts_str = entry.get("zone_updated_at")
peer_ts = (
datetime.datetime.fromisoformat(peer_ts_str)
if peer_ts_str
else None
)
local = session.execute(
select(Domain).filter_by(domain=domain)
).scalar_one_or_none()
needs_sync = (
local is None
or local.zone_data is None
or (peer_ts and not local.zone_updated_at)
or (
peer_ts
and local.zone_updated_at
and peer_ts > local.zone_updated_at
)
)
if not needs_sync:
continue
# Fetch full zone_data from peer
zresp = requests.get(
f"{url}/internal/zones",
params={"domain": domain},
auth=auth,
timeout=10,
)
if zresp.status_code != 200:
logger.warning(
f"[peer_sync] {url}: could not fetch zone_data "
f"for {domain} (HTTP {zresp.status_code})"
)
continue
zdata = zresp.json()
zone_data = zdata.get("zone_data")
if not zone_data:
continue
if local is None:
local = Domain(
domain=domain,
hostname=entry.get("hostname"),
username=entry.get("username"),
zone_data=zone_data,
zone_updated_at=peer_ts,
)
session.add(local)
logger.debug(
f"[peer_sync] {url}: created local record for {domain}"
)
else:
local.zone_data = zone_data
local.zone_updated_at = peer_ts
logger.debug(f"[peer_sync] {url}: updated zone_data for {domain}")
synced += 1
if synced:
session.commit()
logger.info(f"[peer_sync] Synced {synced} zone(s) from {url}")
else:
logger.debug(f"[peer_sync] {url}: already up to date")
finally:
session.close()
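For reference, the peer endpoints consumed above return roughly these shapes (field names are taken from the code; values are illustrative):

# GET {peer}/internal/zones -> list of zone summaries
peer_zone_list = [
    {
        "domain": "example.com",
        "zone_updated_at": "2026-02-25T03:10:00",
        "hostname": "da1.example.com",
        "username": "someuser",
    },
]
# GET {peer}/internal/zones?domain=example.com -> full zone payload
peer_zone_detail = {"zone_data": "<full RFC 1035 zone text>"}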

View File

@@ -1,11 +1,10 @@
#!/usr/bin/env python3
import datetime
import threading
from urllib.parse import parse_qs
from loguru import logger
from sqlalchemy import select
import requests
import requests.exceptions
from directdnsonly.app.da import DirectAdminClient
from directdnsonly.app.db import connect
from directdnsonly.app.db.models import Domain
@@ -14,6 +13,10 @@ class ReconciliationWorker:
"""Periodically polls configured DirectAdmin servers and queues deletes
for any zones in our DB that no longer exist in DirectAdmin.
Also runs an Option C backend healing pass: for each zone with stored
zone_data, checks every backend for presence and re-queues any that are
missing (e.g. after a prolonged backend outage).
Safety rules:
- If a DA server is unreachable, skip it entirely — never delete on uncertainty
- Only touches domains registered via DaDNS (present in our `domains` table)
@@ -21,16 +24,36 @@ class ReconciliationWorker:
- Pushes to the existing delete_queue so the full delete path is exercised
"""
def __init__(self, delete_queue, reconciliation_config: dict):
def __init__(
self,
delete_queue,
reconciliation_config: dict,
save_queue=None,
backend_registry=None,
):
self.delete_queue = delete_queue
self.save_queue = save_queue
self.backend_registry = backend_registry
self.enabled = reconciliation_config.get("enabled", False)
self.interval_seconds = reconciliation_config.get("interval_minutes", 60) * 60
self.servers = reconciliation_config.get("directadmin_servers") or []
self.verify_ssl = reconciliation_config.get("verify_ssl", True)
self.ipp = int(reconciliation_config.get("ipp", 1000))
self.dry_run = bool(reconciliation_config.get("dry_run", False))
self._initial_delay = reconciliation_config.get("initial_delay_minutes", 0) * 60
self._stop_event = threading.Event()
self._thread = None
self._last_run: dict = {}
def get_status(self) -> dict:
"""Return reconciler configuration and last-run statistics."""
return {
"enabled": self.enabled,
"alive": self.is_alive,
"dry_run": self.dry_run,
"interval_minutes": self.interval_seconds // 60,
"last_run": dict(self._last_run),
}
def start(self):
if not self.enabled:
@@ -49,9 +72,15 @@ class ReconciliationWorker:
self._thread.start()
server_names = [s.get("hostname", "?") for s in self.servers]
mode = "DRY-RUN" if self.dry_run else "LIVE"
delay_str = (
f", initial_delay: {self._initial_delay // 60}m"
if self._initial_delay
else ""
)
logger.info(
f"Reconciliation poller started [{mode}] — "
f"interval: {self.interval_seconds // 60}m, "
f"interval: {self.interval_seconds // 60}m"
f"{delay_str}, "
f"servers: {server_names}"
)
if self.dry_run:
@@ -74,49 +103,68 @@ class ReconciliationWorker:
# ------------------------------------------------------------------
def _run(self):
if self._initial_delay > 0:
logger.info(
f"[reconciler] Initial delay {self._initial_delay // 60}m — "
f"first reconciliation pass deferred"
)
if self._stop_event.wait(timeout=self._initial_delay):
return # stopped cleanly during the initial delay
logger.info("Reconciliation worker starting — running initial check now")
self._reconcile_all()
# Wait for interval or stop signal; returns True when stopped
while not self._stop_event.wait(timeout=self.interval_seconds):
self._reconcile_all()
def _reconcile_all(self):
started_at = datetime.datetime.utcnow()
self._last_run = {"status": "running", "started_at": started_at.isoformat()}
logger.info(
f"[reconciler] Starting reconciliation pass across "
f"{len(self.servers)} server(s)"
)
total_queued = 0
# Build a map of all domains seen on all DA servers
all_da_domains = {} # domain -> hostname
da_servers_polled = 0
da_servers_unreachable = 0
migrated = 0
backfilled = 0
zones_in_db = 0
# Build a map of all domains seen on all DA servers: domain -> hostname
all_da_domains: dict = {}
for server in self.servers:
hostname = server.get("hostname")
if not hostname:
logger.warning("[reconciler] Server config missing hostname — skipping")
continue
try:
da_domains = self._fetch_da_domains(
hostname,
server.get("port", 2222),
server.get("username"),
server.get("password"),
server.get("ssl", True),
ipp=self.ipp,
client = DirectAdminClient(
hostname=hostname,
port=server.get("port", 2222),
username=server.get("username"),
password=server.get("password"),
ssl=server.get("ssl", True),
verify_ssl=self.verify_ssl,
)
da_servers_polled += 1
da_domains = client.list_domains(ipp=self.ipp)
if da_domains is not None:
for d in da_domains:
all_da_domains[d] = hostname
else:
da_servers_unreachable += 1
logger.debug(
f"[reconciler] {hostname}: {len(da_domains) if da_domains else 0} active domain(s) in DA"
f"[reconciler] {hostname}: "
f"{len(da_domains) if da_domains else 0} active domain(s) in DA"
)
except Exception as e:
logger.error(f"[reconciler] Unexpected error polling {hostname}: {e}")
except Exception as exc:
logger.error(f"[reconciler] Unexpected error polling {hostname}: {exc}")
da_servers_unreachable += 1
# Now check local DB for all domains, update master if needed, and queue deletes only from recorded master
# Compare local DB against what DA reported; update masters and queue deletes
session = connect()
try:
all_local_domains = session.query(Domain).all()
migrated = 0
backfilled = 0
all_local_domains = session.execute(select(Domain)).scalars().all()
zones_in_db = len(all_local_domains)
known_servers = {s.get("hostname") for s in self.servers}
for record in all_local_domains:
domain = record.domain
@@ -137,7 +185,6 @@ class ReconciliationWorker:
record.hostname = actual_master
migrated += 1
else:
# Only act if the recorded master is one we're polling
if recorded_master in known_servers:
if self.dry_run:
logger.warning(
@@ -158,6 +205,7 @@ class ReconciliationWorker:
f"(master: {recorded_master})"
)
total_queued += 1
if migrated or backfilled:
session.commit()
if backfilled:
@@ -170,6 +218,7 @@ class ReconciliationWorker:
)
finally:
session.close()
if self.dry_run:
logger.info(
f"[reconciler] Reconciliation pass complete [DRY-RUN] — "
@@ -181,265 +230,93 @@ class ReconciliationWorker:
f"{total_queued} domain(s) queued for deletion"
)
def _fetch_da_domains(
self,
hostname: str,
port: int,
username: str,
password: str,
use_ssl: bool,
ipp: int = 1000,
):
"""Fetch all domains from a DA server via CMD_DNS_ADMIN (JSON, paging supported).
# Option C: heal backends that are missing zones
zones_healed = 0
if self.save_queue is not None and self.backend_registry is not None:
zones_healed = self._heal_backends()
Returns a set of domain strings on success, or None on any failure.
completed_at = datetime.datetime.utcnow()
self._last_run = {
"status": "ok",
"started_at": started_at.isoformat(),
"completed_at": completed_at.isoformat(),
"duration_seconds": round(
(completed_at - started_at).total_seconds(), 1
),
"da_servers_polled": da_servers_polled,
"da_servers_unreachable": da_servers_unreachable,
"zones_in_da": len(all_da_domains),
"zones_in_db": zones_in_db,
"orphans_found": total_queued,
"orphans_queued": total_queued if not self.dry_run else 0,
"hostnames_backfilled": backfilled,
"hostnames_migrated": migrated,
"zones_healed": zones_healed,
"dry_run": self.dry_run,
}
def _heal_backends(self) -> int:
"""Check every backend for zone presence and re-queue any zone that is
missing from one or more backends, using the stored zone_data as the
authoritative source. This corrects backends that missed pushes due to
downtime without waiting for DirectAdmin to re-send the zone.
"""
scheme = "https" if use_ssl else "http"
page = 1
all_domains = set()
total_pages = 1
cookies = None
backends = self.backend_registry.get_available_backends()
if not backends:
return 0
session = connect()
healed = 0
try:
while page <= total_pages:
url = f"{scheme}://{hostname}:{port}/CMD_DNS_ADMIN?json=yes&page={page}&ipp={ipp}"
req_kwargs = dict(
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
domains = session.execute(
select(Domain).where(Domain.zone_data.isnot(None))
).scalars().all()
if not domains:
logger.debug(
"[reconciler] Healing pass: no zone_data stored yet — skipping"
)
if cookies:
req_kwargs["cookies"] = cookies
else:
req_kwargs["auth"] = (username, password)
response = requests.get(url, **req_kwargs)
if response.is_redirect or response.status_code in (
301,
302,
303,
307,
308,
):
if not cookies:
logger.debug(
f"[reconciler] {hostname}:{port} redirected Basic Auth "
f"(HTTP {response.status_code}) — attempting session login (DA Evo)"
return 0
for record in domains:
missing = []
for backend_name, backend in backends.items():
try:
if not backend.zone_exists(record.domain):
missing.append(backend_name)
except Exception as exc:
logger.warning(
f"[reconciler] heal: zone_exists check failed for "
f"{record.domain} on {backend_name}: {exc}"
)
cookies = self._da_session_login(
scheme, hostname, port, username, password
)
if cookies is None:
return None
continue # retry this page with cookies
else:
logger.error(
f"[reconciler] {hostname}:{port} still redirecting after session login — "
f"check that '{username}' has admin-level access. Skipping."
)
return None
response.raise_for_status()
content_type = response.headers.get("Content-Type", "")
if "text/html" in content_type:
logger.error(
f"[reconciler] {hostname}:{port} returned HTML instead of API response — "
f"check credentials and admin-level access. Skipping."
if missing:
mode = "[DRY-RUN] Would heal" if self.dry_run else "Healing"
logger.warning(
f"[reconciler] {mode}{record.domain} missing from "
f"{missing}; re-queuing with stored zone_data"
)
return None
if not self.dry_run:
self.save_queue.put(
{
"domain": record.domain,
"hostname": record.hostname or "",
"username": record.username or "",
"zone_file": record.zone_data,
"failed_backends": missing,
"retry_count": 0,
"source": "reconciler_heal",
}
)
healed += 1
# Try JSON first
try:
data = response.json()
# Domains are in keys '0', '1', ...
for k, v in data.items():
if k.isdigit() and isinstance(v, dict) and "domain" in v:
all_domains.add(v["domain"].strip().lower())
# Paging info
info = data.get("info", {})
total_pages = int(info.get("total_pages", 1))
page += 1
continue
except Exception as e:
logger.error(
f"[reconciler] JSON decode failed for {hostname}:{port} page {page}: {e}\nRaw response: {response.text[:500]}"
)
# Fallback to legacy parser
domains = self._parse_da_domain_list(response.text)
all_domains.update(domains)
break # No paging in legacy mode
return all_domains
except requests.exceptions.SSLError as e:
logger.error(
f"[reconciler] SSL error connecting to {hostname}:{port}{e}. "
f"Set verify_ssl: false in reconciliation config if using self-signed certs."
)
return None
except requests.exceptions.ConnectionError as e:
logger.error(
f"[reconciler] Cannot reach {hostname}:{port}{e}. "
f"Skipping this server."
)
return None
except requests.exceptions.Timeout:
logger.error(
f"[reconciler] Timeout connecting to {hostname}:{port}. "
f"Skipping this server."
)
return None
except requests.exceptions.HTTPError as e:
logger.error(
f"[reconciler] HTTP {response.status_code} from {hostname}:{port}{e}. "
f"Skipping this server."
)
return None
except Exception as e:
logger.error(f"[reconciler] Unexpected error fetching from {hostname}: {e}")
return None
def _da_session_login(
self, scheme: str, hostname: str, port: int, username: str, password: str
):
"""POST to CMD_LOGIN to obtain a DA Evo session cookie.
Returns a RequestsCookieJar on success, or None on failure.
"""
login_url = f"{scheme}://{hostname}:{port}/CMD_LOGIN"
try:
response = requests.post(
login_url,
data={
"username": username,
"password": password,
"referer": "/CMD_DNS_ADMIN?json=yes&page=1&ipp=500",
},
timeout=30,
verify=self.verify_ssl,
allow_redirects=False,
)
if not response.cookies:
logger.error(
f"[reconciler] {hostname}:{port} CMD_LOGIN returned no session cookie — "
f"check username/password."
if healed:
logger.info(
f"[reconciler] Healing pass complete — "
f"{healed} zone(s) re-queued for backend recovery"
)
return None
logger.debug(
f"[reconciler] {hostname}:{port} session login successful (DA Evo)"
)
return response.cookies
except Exception as e:
logger.error(f"[reconciler] {hostname}:{port} session login failed: {e}")
return None
@staticmethod
def _parse_da_domain_list(body: str) -> set:
"""Parse DA's CMD_API_SHOW_ALL_DOMAINS response.
DA returns URL-encoded key=value pairs, either on one line or newline-
separated. The domain list uses the key 'list[]'.
Example response:
list[]=example.com&list[]=example2.com
"""
# Normalise newline-separated responses to a single query string
normalised = body.replace("\n", "&").strip("&")
params = parse_qs(normalised)
domains = params.get("list[]", [])
return {d.strip().lower() for d in domains if d.strip()}
if __name__ == "__main__":
import argparse
import sys
from queue import Queue
parser = argparse.ArgumentParser(
description="Test DirectAdmin domain fetcher (JSON/paging)"
)
parser.add_argument("--hostname", required=True, help="DirectAdmin server hostname")
parser.add_argument(
"--port", type=int, default=2222, help="DirectAdmin port (default: 2222)"
)
parser.add_argument("--username", required=True, help="DirectAdmin admin username")
parser.add_argument("--password", required=True, help="DirectAdmin admin password")
parser.add_argument("--ssl", action="store_true", help="Use HTTPS (default: True)")
parser.add_argument(
"--no-ssl", dest="ssl", action="store_false", help="Use HTTP (not recommended)"
)
parser.set_defaults(ssl=True)
parser.add_argument(
"--verify-ssl", action="store_true", help="Verify SSL certs (default: True)"
)
parser.add_argument(
"--no-verify-ssl",
dest="verify_ssl",
action="store_false",
help="Don't verify SSL certs",
)
parser.set_defaults(verify_ssl=True)
parser.add_argument(
"--ipp", type=int, default=1000, help="Items per page (default: 1000)"
)
parser.add_argument(
"--print-json",
action="store_true",
help="Print raw JSON response for first page",
)
args = parser.parse_args()
# Minimal config for testing
config = {
"enabled": True,
"directadmin_servers": [
{
"hostname": args.hostname,
"port": args.port,
"username": args.username,
"password": args.password,
"ssl": args.ssl,
}
],
"verify_ssl": args.verify_ssl,
}
q = Queue()
worker = ReconciliationWorker(q, config)
server = config["directadmin_servers"][0]
print(
f"Fetching domains from {server['hostname']}:{server['port']} (ipp={args.ipp})..."
)
# Directly call the fetch method for testing
domains = worker._fetch_da_domains(
server["hostname"],
server.get("port", 2222),
server.get("username"),
server.get("password"),
server.get("ssl", True),
ipp=args.ipp,
)
if domains is None:
print("Failed to fetch domains.", file=sys.stderr)
sys.exit(1)
print(f"Fetched {len(domains)} domains:")
for d in sorted(domains):
print(d)
if args.print_json:
# Print the first page's raw JSON for inspection
scheme = "https" if server.get("ssl", True) else "http"
url = f"{scheme}://{server['hostname']}:{server.get('port', 2222)}/CMD_DNS_ADMIN?json=yes&page=1&ipp={args.ipp}"
resp = requests.get(
url,
auth=(server.get("username"), server.get("password")),
timeout=30,
verify=args.verify_ssl,
allow_redirects=False,
)
try:
print("\nRaw JSON for first page:")
print(resp.json())
except Exception:
print("(Could not parse JSON)")
else:
logger.debug(
"[reconciler] Healing pass complete — all backends consistent"
)
finally:
session.close()
return healed

View File

@@ -1,4 +1,5 @@
from loguru import logger
from sqlalchemy import select
from directdnsonly.app.db.models import *
from directdnsonly.app.db import connect
@@ -8,12 +9,11 @@ def check_zone_exists(zone_name):
# Check if zone is present in the index
session = connect()
logger.debug("Checking if {} is present in the DB".format(zone_name))
domain_exists = bool(session.query(Domain.id).filter_by(domain=zone_name).first())
domain_exists = bool(
session.execute(select(Domain.id).filter_by(domain=zone_name)).first()
)
logger.debug("Returned from query: {}".format(domain_exists))
if domain_exists:
return True
else:
return False
return domain_exists
def put_zone_index(zone_name, host_name, user_name):
@@ -28,7 +28,9 @@ def put_zone_index(zone_name, host_name, user_name):
def get_domain_record(zone_name):
"""Return the Domain record for zone_name, or None if not found"""
session = connect()
return session.query(Domain).filter_by(domain=zone_name).first()
return session.execute(
select(Domain).filter_by(domain=zone_name)
).scalar_one_or_none()
def check_parent_domain_owner(zone_name):
@@ -38,7 +40,9 @@ def check_parent_domain_owner(zone_name):
return False
session = connect()
logger.debug("Checking if parent domain {} exists in DB".format(parent_domain))
return bool(session.query(Domain.id).filter_by(domain=parent_domain).first())
return bool(
session.execute(select(Domain.id).filter_by(domain=parent_domain)).first()
)
def get_parent_domain_record(zone_name):
@@ -47,4 +51,6 @@ def get_parent_domain_record(zone_name):
if not parent_domain:
return None
session = connect()
return session.query(Domain).filter_by(domain=parent_domain).first()
return session.execute(
select(Domain).filter_by(domain=parent_domain)
).scalar_one_or_none()
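Like the backend changes earlier in this diff, these helpers swap SQLAlchemy's legacy Query API for 2.0-style statements. A minimal equivalence sketch, assuming the existing Domain model and an open session:

from sqlalchemy import select

# Legacy 1.x style:
record = session.query(Domain).filter_by(domain="example.com").first()

# 2.0 style used throughout this change set; scalar_one_or_none() returns the
# single matching Domain or None (and raises if more than one row matches,
# which cannot happen here because Domain.domain is unique).
record = session.execute(
    select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()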

View File

@@ -10,10 +10,12 @@ from typing import Any, Dict
def load_config() -> Vyper:
# Initialize Vyper
v.set_config_name("app") # Looks for app.yaml/app.yml
# Bundled config colocated with this module (always present in the package)
# User-supplied paths checked first so they override the bundled defaults
v.add_config_path("/etc/directdnsonly") # system-level mount
v.add_config_path(".") # CWD (e.g. /app when run directly)
v.add_config_path("./config") # docker-compose volume mount at /app/config
# Bundled config colocated with this module — last-resort fallback
v.add_config_path(str(Path(__file__).parent))
v.add_config_path(".") # Search in current directory
v.add_config_path("./config")
v.set_env_prefix("DADNS")
v.set_env_key_replacer("_", ".")
v.automatic_env()
@@ -41,6 +43,10 @@ def load_config() -> Vyper:
v.set_default("dns.backends.bind.zones_dir", "/etc/named/zones")
v.set_default("dns.backends.bind.named_conf", "/etc/named.conf.local")
v.set_default("dns.backends.nsd.enabled", False)
v.set_default("dns.backends.nsd.zones_dir", "/etc/nsd/zones")
v.set_default("dns.backends.nsd.nsd_conf", "/etc/nsd/nsd.conf.d/zones.conf")
v.set_default("dns.backends.coredns_mysql.enabled", False)
v.set_default("dns.backends.coredns_mysql.host", "localhost")
v.set_default("dns.backends.coredns_mysql.port", 3306)
@@ -60,6 +66,12 @@ def load_config() -> Vyper:
v.set_default("reconciliation.interval_minutes", 60)
v.set_default("reconciliation.verify_ssl", True)
# Peer sync defaults
v.set_default("peer_sync.enabled", False)
v.set_default("peer_sync.interval_minutes", 15)
v.set_default("peer_sync.auth_username", "peersync")
v.set_default("peer_sync.auth_password", "changeme")
# Read configuration
try:
if not v.read_in_config():
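
Because automatic_env() is enabled with the DADNS prefix, values can be overridden without touching app.yml; the app.yml comments below call out DADNS_APP_AUTH_PASSWORD as the override for app.auth_password. A rough, illustrative sketch (module path taken from the import in main.py; precedence assumed to follow the usual Viper-style env-over-file behaviour):

    import os

    # Illustrative only: the variable must be set before the config module loads,
    # since load_config() runs at import/startup time.
    os.environ["DADNS_APP_AUTH_PASSWORD"] = "s3cr3t-from-env"

    from directdnsonly.config import config

    # The environment value is expected to win over the value in app.yml
    print(config.get_string("app.auth_password"))   # -> s3cr3t-from-env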

View File

@@ -3,6 +3,20 @@ timezone: Pacific/Auckland
log_level: INFO
queue_location: ./data/queues
# Application datastore — stores domain index and zone_data for healing/peer-sync.
# SQLite (default) requires no extra dependencies and is fine for single-node setups.
# MySQL is recommended for multi-node deployments with a shared datastore.
datastore:
type: sqlite
db_location: ./data/directdnsonly.db
# --- MySQL ---
# type: mysql
# host: "127.0.0.1"
# port: "3306"
# name: "directdnsonly"
# user: "directdnsonly"
# pass: "changeme"
app:
auth_username: directdnsonly
auth_password: changeme # Override via DADNS_APP_AUTH_PASSWORD env var
@@ -14,6 +28,8 @@ app:
# enabled: true
# dry_run: true # log orphans but do NOT queue deletes — safe first-run mode
# interval_minutes: 60
# initial_delay_minutes: 0 # stagger first run when running multiple receivers behind a LB
# # e.g. receiver-1: 0, receiver-2: 30 (half the interval)
# verify_ssl: true # set false for self-signed DA certs
# ipp: 1000 # items per page when polling DA (default 1000)
# directadmin_servers:
@@ -28,6 +44,18 @@ app:
# password: secret
# ssl: true
# Peer sync — exchange zone_data between directdnsonly instances
# Enables eventual consistency without a shared datastore.
# If a peer is offline, the sync is silently skipped and retried next interval.
# Use the same credentials as the peer's app.auth_username / auth_password.
#peer_sync:
# enabled: true
# interval_minutes: 15
# peers:
# - url: http://ddo-2:2222 # URL of the peer directdnsonly instance
# username: directdnsonly
# password: changeme
dns:
default_backend: bind
backends:
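
For environment-variable-only deployments, the commented peers list above can be replaced by numbered DADNS_PEER_SYNC_PEER_N_* variables; the behaviour is pinned down by test_numbered_env_peers further down. A small sketch mirroring that test:

    import os
    from directdnsonly.app.peer_sync import PeerSyncWorker

    # Each numbered group declares one peer (URL plus optional credentials)
    os.environ["DADNS_PEER_SYNC_PEER_1_URL"] = "http://node-a:2222"
    os.environ["DADNS_PEER_SYNC_PEER_1_USERNAME"] = "peersync"
    os.environ["DADNS_PEER_SYNC_PEER_1_PASSWORD"] = "s3cr3t"
    os.environ["DADNS_PEER_SYNC_PEER_2_URL"] = "http://node-b:2222"

    worker = PeerSyncWorker({"enabled": True})
    print([p["url"] for p in worker.peers])
    # -> ['http://node-a:2222', 'http://node-b:2222']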

View File

@@ -3,6 +3,8 @@ import cherrypy
from app.backends import BackendRegistry
from app.api.admin import DNSAdminAPI
from app.api.health import HealthAPI
from app.api.internal import InternalAPI
from app.api.status import StatusAPI
from app import configure_logging
from worker import WorkerManager
from directdnsonly.config import config
@@ -38,10 +40,12 @@ def main():
# Setup worker manager
reconciliation_config = config.get("reconciliation") or {}
peer_sync_config = config.get("peer_sync") or {}
worker_manager = WorkerManager(
queue_path=config.get("queue_location"),
backend_registry=registry,
reconciliation_config=reconciliation_config,
peer_sync_config=peer_sync_config,
)
worker_manager.start()
logger.info(
@@ -87,6 +91,17 @@ def main():
if config.get_string("app.log_level").upper() != "DEBUG":
cherrypy.log.access_log.propagate = False
# Peer sync auth — separate credentials from the DA-facing API so a
# compromised peer node cannot push zones or access the admin endpoints.
peer_user_password_dict = {
config.get_string("peer_sync.auth_username"): config.get_string(
"peer_sync.auth_password"
)
}
peer_check_password = cherrypy.lib.auth_basic.checkpassword_dict(
peer_user_password_dict
)
# Mount applications
root = Root()
root = DNSAdminAPI(
@@ -95,11 +110,18 @@ def main():
backend_registry=registry,
)
root.health = HealthAPI(registry)
root.internal = InternalAPI(peer_syncer=worker_manager._peer_syncer)
root.status = StatusAPI(worker_manager)
# Add queue status endpoint
# Add queue status endpoint (debug)
root.queue_status = lambda: worker_manager.queue_status()
cherrypy.tree.mount(root, "/")
# Override auth for /internal so peers use their own credentials
cherrypy.tree.mount(root, "/", config={
"/internal": {
"tools.auth_basic.checkpassword": peer_check_password,
}
})
cherrypy.engine.start()
logger.success(f"Server started on port {config.get_int('app.listen_port')}")

View File

@@ -1,3 +1,4 @@
import datetime
import os
import threading
import time
@@ -5,149 +6,287 @@ from concurrent.futures import ThreadPoolExecutor, as_completed
from loguru import logger
from persistqueue import Queue
from persistqueue.exceptions import Empty
from sqlalchemy import select
from app.utils import check_zone_exists, put_zone_index
from app.utils.zone_parser import count_zone_records
from directdnsonly.app.db.models import Domain
from directdnsonly.app.db import connect
from directdnsonly.app.reconciler import ReconciliationWorker
from directdnsonly.app.peer_sync import PeerSyncWorker
# ---------------------------------------------------------------------------
# Retry configuration
# ---------------------------------------------------------------------------
MAX_RETRIES = 5
# Seconds to wait before each retry attempt (exponential-ish backoff)
BACKOFF_SECONDS = [30, 120, 300, 900, 1800] # 30s, 2m, 5m, 15m, 30m
RETRY_DRAIN_INTERVAL = 30 # how often the retry drain thread wakes
class WorkerManager:
def __init__(
self, queue_path: str, backend_registry, reconciliation_config: dict = None
self,
queue_path: str,
backend_registry,
reconciliation_config: dict = None,
peer_sync_config: dict = None,
):
self.queue_path = queue_path
self.backend_registry = backend_registry
self._running = False
self._save_thread = None
self._delete_thread = None
self._retry_thread = None
self._reconciler = None
self._peer_syncer = None
self._reconciliation_config = reconciliation_config or {}
self._peer_sync_config = peer_sync_config or {}
self._dead_letter_count = 0
# Initialize queues with error handling
try:
os.makedirs(queue_path, exist_ok=True)
self.save_queue = Queue(f"{queue_path}/save")
self.delete_queue = Queue(f"{queue_path}/delete")
self.retry_queue = Queue(f"{queue_path}/retry")
logger.success(f"Initialized queues at {queue_path}")
except Exception as e:
logger.critical(f"Failed to initialize queues: {e}")
raise
# ------------------------------------------------------------------
# Save queue worker
# ------------------------------------------------------------------
def _process_save_queue(self):
"""Main worker loop for processing save requests"""
logger.info("Save queue worker started")
# Get DB Connection
session = connect()
# Batch tracking
batch_start = None
batch_processed = 0
batch_failed = 0
while self._running:
# Block until at least one item is available
try:
item = self.save_queue.get(block=True, timeout=5)
# Start a new batch timer on the first item
if batch_start is None:
batch_start = time.monotonic()
batch_processed = 0
batch_failed = 0
pending = self.save_queue.qsize()
logger.info(
f"📥 Batch started — {pending + 1} zone(s) queued "
f"for processing"
)
logger.debug(
f"Processing zone update for {item.get('domain', 'unknown')}"
)
if not check_zone_exists(item.get("domain")):
put_zone_index(
item.get("domain"), item.get("hostname"), item.get("username")
)
# Validate item structure
if not all(k in item for k in ["domain", "zone_file"]):
logger.error(f"Invalid queue item: {item}")
self.save_queue.task_done()
batch_failed += 1
continue
# Process with all available backends
backends = self.backend_registry.get_available_backends()
if not backends:
logger.warning("No active backends available!")
if len(backends) > 1:
# Process backends in parallel for faster sync
logger.debug(
f"Processing {item['domain']} across "
f"{len(backends)} backends concurrently: "
f"{', '.join(backends.keys())}"
)
self._process_backends_parallel(backends, item, session)
else:
# Single backend, no need for thread overhead
for backend_name, backend in backends.items():
self._process_single_backend(
backend_name, backend, item, session
)
self.save_queue.task_done()
batch_processed += 1
logger.debug(f"Completed processing for {item['domain']}")
except Empty:
# Queue is empty — if we were in a batch, log the summary
if batch_start is not None:
elapsed = time.monotonic() - batch_start
total = batch_processed + batch_failed
rate = batch_processed / elapsed if elapsed > 0 else 0
logger.success(
f"📦 Batch complete — {batch_processed}/{total} zone(s) "
f"processed successfully in {elapsed:.1f}s "
f"({rate:.1f} zones/sec)"
+ (f", {batch_failed} failed" if batch_failed else "")
)
batch_start = None
batch_processed = 0
batch_failed = 0
continue
except Exception as e:
logger.error(f"Unexpected worker error: {e}")
batch_failed += 1
time.sleep(1) # Prevent tight error loops
def _process_single_backend(self, backend_name, backend, item, session):
"""Process a zone update for a single backend"""
# Open a batch and keep processing until the queue is empty
batch_start = time.monotonic()
batch_processed = 0
batch_failed = 0
logger.info("📥 Batch started")
while True:
try:
domain = item.get("domain", "unknown")
is_retry = item.get("source") in ("retry", "reconciler_heal")
target_backends = item.get("failed_backends") # None = all backends
logger.debug(
f"Processing zone update for {domain}"
+ (f" [retry #{item.get('retry_count', 0)}]" if is_retry else "")
+ (f" [backends: {target_backends}]" if target_backends else "")
)
if not is_retry and not check_zone_exists(domain):
put_zone_index(domain, item.get("hostname"), item.get("username"))
if not all(k in item for k in ["domain", "zone_file"]):
logger.error(f"Invalid queue item: {item}")
self.save_queue.task_done()
batch_failed += 1
else:
backends = self.backend_registry.get_available_backends()
if target_backends:
backends = {
k: v for k, v in backends.items() if k in target_backends
}
if not backends:
logger.warning("No target backends available for this item!")
self.save_queue.task_done()
batch_failed += 1
else:
if len(backends) > 1:
failed = self._process_backends_parallel(backends, item, session)
else:
failed = set()
for backend_name, backend in backends.items():
if not self._process_single_backend(
backend_name, backend, item, session
):
failed.add(backend_name)
if failed:
self._schedule_retry(item, failed)
batch_failed += 1
else:
self._store_zone_data(session, domain, item["zone_file"])
batch_processed += 1
self.save_queue.task_done()
logger.debug(f"Completed processing for {domain}")
except Exception as e:
logger.error(f"Unexpected worker error processing {item.get('domain', '?')}: {e}")
batch_failed += 1
time.sleep(1)
# Check immediately for the next item — keep batch open while
# more work is queued; close it only when the queue is empty.
try:
item = self.save_queue.get_nowait()
except Empty:
break
elapsed = time.monotonic() - batch_start
total = batch_processed + batch_failed
rate = batch_processed / elapsed if elapsed > 0 else 0
logger.success(
f"📦 Batch complete — {batch_processed}/{total} zone(s) "
f"processed successfully in {elapsed:.1f}s "
f"({rate:.1f} zones/sec)"
+ (f", {batch_failed} failed" if batch_failed else "")
)
def _process_single_backend(self, backend_name, backend, item, session) -> bool:
"""Write a zone to one backend. Returns True on success, False on failure."""
try:
logger.debug(f"Using backend: {backend_name}")
if backend.write_zone(item["domain"], item["zone_file"]):
logger.debug(f"Successfully updated {item['domain']} in {backend_name}")
if backend.get_name() == "bind":
# Need to update the named.conf
backend.update_named_conf(
[d.domain for d in session.query(Domain).all()]
[d.domain for d in session.execute(select(Domain)).scalars().all()]
)
# Reload all zones
backend.reload_zone()
else:
backend.reload_zone(zone_name=item["domain"])
# Verify record count matches the source zone from DirectAdmin
self._verify_backend_record_count(
backend_name, backend, item["domain"], item["zone_file"]
)
return True
else:
logger.error(f"Failed to update {item['domain']} in {backend_name}")
return False
except Exception as e:
logger.error(f"Error in {backend_name}: {str(e)}")
return False
def _process_backends_parallel(self, backends, item, session) -> set:
"""Write a zone to multiple backends concurrently.
Returns a set of backend names that failed."""
start_time = time.monotonic()
failed = set()
with ThreadPoolExecutor(
max_workers=len(backends), thread_name_prefix="backend"
) as executor:
futures = {
executor.submit(
self._process_single_backend, backend_name, backend, item, session
): backend_name
for backend_name, backend in backends.items()
}
for future in as_completed(futures):
backend_name = futures[future]
try:
success = future.result()
if not success:
failed.add(backend_name)
except Exception as e:
logger.error(f"Unhandled error in backend {backend_name}: {e}")
failed.add(backend_name)
elapsed = (time.monotonic() - start_time) * 1000
logger.debug(
f"Parallel processing of {item['domain']} across "
f"{len(backends)} backends completed in {elapsed:.0f}ms"
)
return failed
def _schedule_retry(self, item: dict, failed_backends: set):
"""Push a failed write onto the retry queue with exponential backoff.
Discards to dead-letter after MAX_RETRIES attempts."""
retry_count = item.get("retry_count", 0) + 1
if retry_count > MAX_RETRIES:
self._dead_letter_count += 1
logger.error(
f"[retry] Dead-letter: {item['domain']} failed on "
f"{failed_backends} after {MAX_RETRIES} attempts — giving up"
)
return
delay = BACKOFF_SECONDS[min(retry_count - 1, len(BACKOFF_SECONDS) - 1)]
retry_item = {
**item,
"failed_backends": list(failed_backends),
"retry_count": retry_count,
"retry_after": time.time() + delay,
"source": "retry",
}
self.retry_queue.put(retry_item)
logger.warning(
f"[retry] {item['domain']}{list(failed_backends)} "
f"scheduled for retry #{retry_count} in {delay}s"
)
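
The retry schedule produced by this method is simply an index into BACKOFF_SECONDS, clamped to its last entry, with anything beyond MAX_RETRIES dead-lettered. A tiny sketch of the resulting timeline:

    MAX_RETRIES = 5
    BACKOFF_SECONDS = [30, 120, 300, 900, 1800]

    for retry_count in range(1, MAX_RETRIES + 2):
        if retry_count > MAX_RETRIES:
            print(f"attempt {retry_count}: dead-letter, giving up")
            continue
        delay = BACKOFF_SECONDS[min(retry_count - 1, len(BACKOFF_SECONDS) - 1)]
        print(f"attempt {retry_count}: re-queued after {delay}s")
    # attempts 1-5 wait 30s, 2m, 5m, 15m and 30m; attempt 6 is dead-lettered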
def _store_zone_data(self, session, domain: str, zone_file: str):
"""Persist the latest zone file content to the domain DB record."""
try:
record = session.execute(
select(Domain).filter_by(domain=domain)
).scalar_one_or_none()
if record:
record.zone_data = zone_file
record.zone_updated_at = datetime.datetime.utcnow()
session.commit()
except Exception as exc:
logger.warning(f"[worker] Could not store zone_data for {domain}: {exc}")
# ------------------------------------------------------------------
# Retry drain worker
# ------------------------------------------------------------------
def _process_retry_queue(self):
"""Periodically drain the retry queue and re-feed ready items to the
save queue. Items not yet due are put back onto the retry queue."""
logger.info("Retry drain worker started")
while self._running:
time.sleep(RETRY_DRAIN_INTERVAL)
now = time.time()
pending = []
# Drain all current retry items into memory
while True:
try:
pending.append(self.retry_queue.get_nowait())
self.retry_queue.task_done()
except Empty:
break
if not pending:
continue
ready = [i for i in pending if i.get("retry_after", 0) <= now]
not_ready = [i for i in pending if i.get("retry_after", 0) > now]
for item in not_ready:
self.retry_queue.put(item)
for item in ready:
logger.info(
f"[retry] Re-queuing {item['domain']}"
f"{item.get('failed_backends')} "
f"(attempt #{item.get('retry_count', '?')})"
)
self.save_queue.put(item)
if ready:
logger.debug(
f"[retry] Drain: {len(ready)} item(s) ready, "
f"{len(not_ready)} still pending"
)
# ------------------------------------------------------------------
# Delete queue worker
# ------------------------------------------------------------------
def _process_delete_queue(self):
"""Worker loop for processing zone deletion requests"""
logger.info("Delete queue worker started")
session = connect()
@@ -159,7 +298,9 @@ class WorkerManager:
logger.debug(f"Processing delete for {domain}")
record = session.query(Domain).filter_by(domain=domain).first()
record = session.execute(
select(Domain).filter_by(domain=domain)
).scalar_one_or_none()
if not record:
logger.warning(f"Domain {domain} not found in DB — skipping delete")
self.delete_queue.task_done()
@@ -179,33 +320,21 @@ class WorkerManager:
)
backends = self.backend_registry.get_available_backends()
remaining_domains = [d.domain for d in session.query(Domain).all()]
remaining_domains = [
d.domain for d in session.execute(select(Domain)).scalars().all()
]
delete_success = True
if not backends:
logger.warning(
f"No active backends — {domain} will be removed from DB only"
)
elif len(backends) > 1:
# Parallel delete, track failures
results = []
def delete_backend_wrapper(
backend_name, backend, domain, remaining_domains
):
try:
return backend.delete_zone(domain)
except Exception as e:
logger.error(
f"Error deleting {domain} from {backend_name}: {e}"
)
return False
from concurrent.futures import ThreadPoolExecutor, as_completed
with ThreadPoolExecutor(max_workers=len(backends)) as executor:
futures = {
executor.submit(
delete_backend_wrapper,
self._delete_single_backend,
backend_name,
backend,
domain,
@@ -216,12 +345,7 @@ class WorkerManager:
for future in as_completed(futures):
backend_name = futures[future]
try:
result = future.result()
results.append(result)
if not result:
logger.error(
f"Failed to delete {domain} from {backend_name}"
)
results.append(future.result())
except Exception as e:
logger.error(
f"Unhandled error deleting from {backend_name}: {e}"
@@ -229,32 +353,22 @@ class WorkerManager:
results.append(False)
delete_success = all(results)
else:
# Single backend
for backend_name, backend in backends.items():
try:
result = backend.delete_zone(domain)
if not result:
logger.error(
f"Failed to delete {domain} from {backend_name}"
)
delete_success = False
except Exception as e:
logger.error(
f"Error deleting {domain} from {backend_name}: {e}"
)
if not self._delete_single_backend(
backend_name, backend, domain, remaining_domains
):
delete_success = False
if delete_success:
session.delete(record)
session.commit()
logger.info(f"Removed {domain} from database")
self.delete_queue.task_done()
logger.success(f"Delete completed for {domain}")
else:
logger.error(
f"Delete failed for {domain} on one or more backends — DB record retained"
f"Delete failed for {domain} on one or more backends — "
f"DB record retained"
)
self.delete_queue.task_done()
self.delete_queue.task_done()
except Empty:
continue
@@ -262,8 +376,10 @@ class WorkerManager:
logger.error(f"Unexpected delete worker error: {e}")
time.sleep(1)
def _delete_single_backend(self, backend_name, backend, domain, remaining_domains):
"""Delete a zone from a single backend"""
def _delete_single_backend(
self, backend_name, backend, domain, remaining_domains
) -> bool:
"""Delete a zone from one backend. Returns True on success."""
try:
if backend.delete_zone(domain):
logger.debug(f"Deleted {domain} from {backend_name}")
@@ -272,83 +388,19 @@ class WorkerManager:
backend.reload_zone()
else:
backend.reload_zone(zone_name=domain)
return True
else:
logger.error(f"Failed to delete {domain} from {backend_name}")
return False
except Exception as e:
logger.error(f"Error deleting {domain} from {backend_name}: {e}")
return False
def _process_backends_delete_parallel(self, backends, domain, remaining_domains):
"""Delete a zone from multiple backends in parallel"""
start_time = time.monotonic()
with ThreadPoolExecutor(
max_workers=len(backends),
thread_name_prefix="backend_del",
) as executor:
futures = {
executor.submit(
self._delete_single_backend,
backend_name,
backend,
domain,
remaining_domains,
): backend_name
for backend_name, backend in backends.items()
}
for future in as_completed(futures):
backend_name = futures[future]
try:
future.result()
except Exception as e:
logger.error(f"Unhandled error deleting from {backend_name}: {e}")
elapsed = (time.monotonic() - start_time) * 1000
logger.debug(
f"Parallel delete of {domain} across "
f"{len(backends)} backends completed in {elapsed:.0f}ms"
)
def _process_backends_parallel(self, backends, item, session):
"""Process zone updates across multiple backends in parallel"""
start_time = time.monotonic()
with ThreadPoolExecutor(
max_workers=len(backends), thread_name_prefix="backend"
) as executor:
futures = {
executor.submit(
self._process_single_backend, backend_name, backend, item, session
): backend_name
for backend_name, backend in backends.items()
}
for future in as_completed(futures):
backend_name = futures[future]
try:
future.result()
except Exception as e:
logger.error(
f"Unhandled error processing backend "
f"{backend_name}: {str(e)}"
)
elapsed = (time.monotonic() - start_time) * 1000
logger.debug(
f"Parallel processing of {item['domain']} across "
f"{len(backends)} backends completed in {elapsed:.0f}ms"
)
# ------------------------------------------------------------------
# Record count verification
# ------------------------------------------------------------------
def _verify_backend_record_count(self, backend_name, backend, zone_name, zone_data):
"""Verify and reconcile the backend record count against the
authoritative BIND zone from DirectAdmin.
After a successful write, this method checks whether the number of
records stored in the backend matches the number of records parsed
from the source zone file. If there are **extra** records in the
backend (e.g. from replication drift or stale data) they are
automatically removed via the backend's reconcile method.
Args:
backend_name: Display name of the backend instance
backend: The backend instance
zone_name: The zone that was just written
zone_data: The raw BIND zone file content (authoritative source)
"""
try:
expected = count_zone_records(zone_data, zone_name)
if expected < 0:
@@ -359,46 +411,40 @@ class WorkerManager:
return
matches, actual = backend.verify_zone_record_count(zone_name, expected)
if matches:
return # All good
return
if actual > expected:
logger.warning(
f"[{backend_name}] Backend has {actual - expected} extra "
f"record(s) for {zone_name} — reconciling against "
f"DirectAdmin source zone"
f"record(s) for {zone_name} — reconciling"
)
success, removed = backend.reconcile_zone_records(zone_name, zone_data)
if success and removed > 0:
# Verify again after reconciliation
matches, new_count = backend.verify_zone_record_count(
zone_name, expected
)
if matches:
logger.success(
f"[{backend_name}] Reconciliation successful for "
f"{zone_name}: removed {removed} extra record(s), "
f"count now matches source ({new_count})"
f"{zone_name}: removed {removed} extra record(s)"
)
else:
logger.error(
f"[{backend_name}] Reconciliation for {zone_name} "
f"removed {removed} record(s) but count still "
f"mismatched: expected {expected}, got {new_count}"
f"removed {removed} record(s) but count still mismatched: "
f"expected {expected}, got {new_count}"
)
else:
logger.warning(
f"[{backend_name}] Backend has fewer records than source "
f"for {zone_name} (expected {expected}, got {actual}) — "
f"this may indicate a write failure; the next zone push "
f"from DirectAdmin should correct this"
f"next zone push from DirectAdmin should correct this"
)
except NotImplementedError:
logger.debug(
f"[{backend_name}] Record count verification not "
f"supported — skipping"
f"[{backend_name}] Record count verification not supported — skipping"
)
except Exception as e:
logger.error(
@@ -406,50 +452,70 @@ class WorkerManager:
f"for {zone_name}: {e}"
)
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def start(self):
"""Start background workers"""
if self._running:
return
self._running = True
self._save_thread = threading.Thread(
target=self._process_save_queue, daemon=True, name="save_queue_worker"
)
self._delete_thread = threading.Thread(
target=self._process_delete_queue, daemon=True, name="delete_queue_worker"
)
self._retry_thread = threading.Thread(
target=self._process_retry_queue, daemon=True, name="retry_drain_worker"
)
self._save_thread.start()
self._delete_thread.start()
logger.info(
f"Started worker threads: {self._save_thread.name}, {self._delete_thread.name}"
)
self._retry_thread.start()
logger.info(f"Started worker threads: save, delete, retry_drain")
self._reconciler = ReconciliationWorker(
delete_queue=self.delete_queue,
save_queue=self.save_queue,
backend_registry=self.backend_registry,
reconciliation_config=self._reconciliation_config,
)
self._reconciler.start()
self._peer_syncer = PeerSyncWorker(self._peer_sync_config)
self._peer_syncer.start()
def stop(self):
"""Stop background workers gracefully"""
self._running = False
if self._reconciler:
self._reconciler.stop()
if self._save_thread:
self._save_thread.join(timeout=5)
if self._delete_thread:
self._delete_thread.join(timeout=5)
if self._peer_syncer:
self._peer_syncer.stop()
for thread in (self._save_thread, self._delete_thread, self._retry_thread):
if thread:
thread.join(timeout=5)
logger.info("Workers stopped")
def queue_status(self):
"""Return current queue status"""
reconciler = (
self._reconciler.get_status()
if self._reconciler
else {"enabled": False, "alive": False, "last_run": {}}
)
peer_sync = (
self._peer_syncer.get_peer_status()
if self._peer_syncer
else {"enabled": False, "alive": False, "peers": [], "total": 0, "healthy": 0, "degraded": 0}
)
return {
"save_queue_size": self.save_queue.qsize(),
"delete_queue_size": self.delete_queue.qsize(),
"save_worker_alive": self._save_thread and self._save_thread.is_alive(),
"delete_worker_alive": self._delete_thread
and self._delete_thread.is_alive(),
"reconciler_alive": (
self._reconciler.is_alive if self._reconciler else False
),
"retry_queue_size": self.retry_queue.qsize(),
"dead_letters": self._dead_letter_count,
"save_worker_alive": bool(self._save_thread and self._save_thread.is_alive()),
"delete_worker_alive": bool(self._delete_thread and self._delete_thread.is_alive()),
"retry_worker_alive": bool(self._retry_thread and self._retry_thread.is_alive()),
"reconciler": reconciler,
"peer_sync": peer_sync,
}
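
Taken together, queue_status() now returns a nested document instead of the old flat flags. The shape, with illustrative values (field names from the code above; the live reconciler and peer_sync sub-dicts carry more detail than the fallbacks shown here):

    {
        "save_queue_size": 0,
        "delete_queue_size": 0,
        "retry_queue_size": 2,
        "dead_letters": 0,
        "save_worker_alive": True,
        "delete_worker_alive": True,
        "retry_worker_alive": True,
        "reconciler": {"enabled": True, "alive": True, "last_run": {}},
        "peer_sync": {"enabled": True, "alive": True, "peers": [], "total": 1,
                      "healthy": 1, "degraded": 0},
    }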

View File

@@ -1,12 +1,91 @@
#!/bin/bash
set -e
# Start BIND
/usr/sbin/named -u bind -f &
# ---------------------------------------------------------------------------
# Detect which DNS backend type(s) are configured and enabled.
# Uses the same config search order as the application itself.
# ---------------------------------------------------------------------------
detect_backend_types() {
python3 - <<'EOF'
import yaml, sys, os
## Initialize MySQL schema if needed
#if [ -f /app/schema/coredns_mysql.sql ]; then
# mysql -h mysql -u root -prootpassword coredns < /app/schema/coredns_mysql.sql
#fi
config_paths = [
"/etc/directdnsonly/app.yml",
"/etc/directdnsonly/app.yaml",
"/app/app.yml",
"/app/app.yaml",
"/app/config/app.yml",
"/app/config/app.yaml",
]
# Start the application
poetry run python directdnsonly/main.py
# Also honour env-var-only deployments (no config file)
bind_env = os.environ.get("DADNS_DNS_BACKENDS_BIND_ENABLED", "").lower() == "true"
nsd_env = os.environ.get("DADNS_DNS_BACKENDS_NSD_ENABLED", "").lower() == "true"
config = {}
for path in config_paths:
if os.path.exists(path):
with open(path) as f:
config = yaml.safe_load(f) or {}
break
backends = config.get("dns", {}).get("backends", {})
has_bind = bind_env
has_nsd = nsd_env
for cfg in backends.values():
if not isinstance(cfg, dict) or not cfg.get("enabled", False):
continue
btype = cfg.get("type", "")
if btype == "bind":
has_bind = True
elif btype == "nsd":
has_nsd = True
types = []
if has_bind:
types.append("bind")
if has_nsd:
types.append("nsd")
print(" ".join(types) if types else "none")
EOF
}
BACKEND_TYPES=$(detect_backend_types)
echo "[entrypoint] Detected DNS backend type(s): ${BACKEND_TYPES:-none}"
# ---------------------------------------------------------------------------
# Start BIND if a bind backend is configured
# ---------------------------------------------------------------------------
if echo "$BACKEND_TYPES" | grep -qw "bind"; then
if command -v named >/dev/null 2>&1; then
echo "[entrypoint] Starting BIND (named)"
/usr/sbin/named -u bind -f &
else
echo "[entrypoint] WARNING: bind backend configured but 'named' not found — skipping"
fi
fi
# ---------------------------------------------------------------------------
# Start NSD if an nsd backend is configured
# ---------------------------------------------------------------------------
if echo "$BACKEND_TYPES" | grep -qw "nsd"; then
if command -v nsd >/dev/null 2>&1; then
echo "[entrypoint] Starting NSD"
# Ensure nsd-control keys exist (generated on first run)
if [ ! -f /etc/nsd/nsd_server.key ]; then
nsd-control-setup 2>/dev/null || true
fi
/usr/sbin/nsd -d -c /etc/nsd/nsd.conf &
else
echo "[entrypoint] WARNING: nsd backend configured but 'nsd' not found — skipping"
fi
fi
if [ "$BACKEND_TYPES" = "none" ] || [ -z "$BACKEND_TYPES" ]; then
echo "[entrypoint] No local DNS daemon required (CoreDNS MySQL or similar)"
fi
# ---------------------------------------------------------------------------
# Start the directdnsonly application
# ---------------------------------------------------------------------------
exec python -m directdnsonly

docker/nsd.conf Normal file
View File

@@ -0,0 +1,20 @@
# NSD base configuration for directdnsonly containers.
# Zone stanzas are written to /etc/nsd/nsd.conf.d/zones.conf by the NSD
# backend and auto-included via the glob below.
server:
server-count: 1
ip-address: 0.0.0.0
port: 53
username: nsd
zonesdir: /etc/nsd/zones
verbosity: 1
# Log to stderr so Docker captures it
logfile: ""
remote-control:
control-enable: yes
control-interface: 127.0.0.1
control-port: 8952
include: /etc/nsd/nsd.conf.d/*.conf
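
The zone stanzas that land in /etc/nsd/nsd.conf.d/zones.conf are written by the NSD backend itself; they can be previewed without a real nsd install by pointing the backend at a scratch directory, exactly as the NSD tests below do. A small sketch (is_available() is patched because it normally shells out to nsd-control):

    from pathlib import Path
    from unittest.mock import patch
    from directdnsonly.app.backends.nsd import NSDBackend

    tmp = Path("/tmp/nsd-demo")
    config = {
        "instance_name": "demo",
        "zones_dir": str(tmp / "zones"),
        "nsd_conf": str(tmp / "nsd.conf.d" / "zones.conf"),
    }
    zone_data = (
        "$ORIGIN example.com.\n"
        "@ 300 IN SOA ns1.example.com. hostmaster.example.com. (1 3600 900 604800 300)\n"
        "@ 300 IN A 192.0.2.1\n"
    )

    with patch.object(NSDBackend, "is_available", return_value=True):
        backend = NSDBackend(config)

    backend.write_zone("example.com", zone_data)
    # The generated conf contains a stanza with name: "example.com" referencing example.com.db
    print(backend.nsd_conf.read_text())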


View File

@@ -87,6 +87,9 @@ build:
directdnsonly/main.py
rm -f *.spec
build-docker:
export DOCKER_CONFIG="/home/guisea/.docker/guisea" && \
docker buildx build --platform linux/amd64,linux/arm64 -t guisea/directdnsonly:dev --push --progress plain --file Dockerfile .
# ---------------------------------------------------------------------------
# Clean
# ---------------------------------------------------------------------------

poetry.lock generated

File diff suppressed because it is too large

View File

@@ -1,6 +1,6 @@
[project]
name = "directdnsonly"
version = "1.0.9"
version = "2.5.0"
description = "DNS Management System - DirectAdmin to multiple backends"
authors = [
{name = "Aaron Guise",email = "aaron@guise.net.nz"}
@@ -11,24 +11,27 @@ requires-python = ">=3.11,<3.14"
dependencies = [
"vyper-config (>=1.2.1,<2.0.0)",
"loguru (>=0.7.3,<0.8.0)",
"persist-queue (>=1.0.0,<2.0.0)",
"persist-queue (>=1.1.0,<2.0.0)",
"cherrypy (>=18.10.0,<19.0.0)",
"sqlalchemy (<2.0.0)",
"pymysql (>=1.1.1,<2.0.0)",
"dnspython (>=2.7.0,<3.0.0)",
"pyyaml (>=6.0.2,<7.0.0)",
"sqlalchemy (>=2.0.0,<3.0.0)",
"pymysql (>=1.1.2,<2.0.0)",
"dnspython (>=2.8.0,<3.0.0)",
"pyyaml (>=6.0.3,<7.0.0)",
"requests (>=2.32.0,<3.0.0)",
]
[project.scripts]
dadns = "directdnsonly.__main__:run"
[tool.poetry]
package-mode = true
[tool.poetry.group.dev.dependencies]
black = "^25.1.0"
black = "^26.1.0"
pyinstaller = "^6.13.0"
pytest = "^8.3.5"
pytest-cov = "^6.1.1"
pytest-mock = "^3.14.0"
pytest = "^9.0.2"
pytest-cov = "^7.0.0"
pytest-mock = "^3.15.1"
[build-system]
requires = ["poetry-core>=2.0.0,<3.0.0"]

View File

@@ -1,12 +1,33 @@
CREATE TABLE IF NOT EXISTS `records` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`zone` varchar(255) NOT NULL,
`name` varchar(255) NOT NULL,
`ttl` int(11) DEFAULT NULL,
`type` varchar(10) NOT NULL,
`data` text NOT NULL,
-- DirectDNSOnly — CoreDNS MySQL schema
-- Compatible with cybercinch/coredns_mysql_extend
--
-- managed_by values:
-- 'directadmin' zone is managed via directdnsonly / DirectAdmin push
-- 'direct' zone was created directly (not via DA)
-- NULL legacy row created before this column was added
CREATE TABLE IF NOT EXISTS `zones` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`zone_name` varchar(255) NOT NULL,
`managed_by` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_zone` (`zone`),
KEY `idx_name` (`name`),
KEY `idx_type` (`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
UNIQUE KEY `uq_zone_name` (`zone_name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE IF NOT EXISTS `records` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`zone_id` int(11) NOT NULL,
`hostname` varchar(255) NOT NULL,
`type` varchar(10) NOT NULL,
`data` text NOT NULL,
`ttl` int(11) DEFAULT NULL,
`online` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
KEY `idx_zone_id` (`zone_id`),
KEY `idx_hostname` (`hostname`),
CONSTRAINT `fk_records_zone` FOREIGN KEY (`zone_id`) REFERENCES `zones` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- Migration: add managed_by to an existing installation
-- ALTER TABLE `zones` ADD COLUMN `managed_by` varchar(255) DEFAULT NULL;
-- UPDATE `zones` SET `managed_by` = 'directadmin' WHERE `managed_by` IS NULL;
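
At query time this layout is a join from zones to records on zone_id. A minimal sketch using the Zone and Record ORM models that the backend tests below import (attribute names taken from the columns above; the session comes from a backend instance set up as in those tests):

    from sqlalchemy import select
    from directdnsonly.app.backends.coredns_mysql import Zone, Record

    def records_for(session, zone_fqdn: str):
        """Return (hostname, type, ttl, data) tuples for every record in a zone."""
        zone = session.execute(
            select(Zone).filter_by(zone_name=zone_fqdn)   # apex is stored with a trailing dot
        ).scalar_one_or_none()
        if zone is None:
            return []
        rows = session.execute(select(Record).filter_by(zone_id=zone.id)).scalars().all()
        return [(r.hostname, r.type, r.ttl, r.data) for r in rows]

    # e.g. records_for(backend.Session(), "example.com.")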

View File

@@ -21,7 +21,7 @@ def engine():
@pytest.fixture
def db_session(engine):
session = sessionmaker(bind=engine)()
session = sessionmaker(engine)()
yield session
session.close()
@@ -37,4 +37,6 @@ def patch_connect(db_session, monkeypatch):
_factory = lambda: db_session # noqa: E731
monkeypatch.setattr("directdnsonly.app.utils.connect", _factory)
monkeypatch.setattr("directdnsonly.app.reconciler.connect", _factory)
monkeypatch.setattr("directdnsonly.app.peer_sync.connect", _factory)
monkeypatch.setattr("directdnsonly.app.api.status.connect", _factory)
return db_session

View File

@@ -1,7 +1,7 @@
"""Tests for the CoreDNS MySQL backend (run against in-memory SQLite)."""
import pytest
from sqlalchemy import create_engine
from sqlalchemy import create_engine, select
from sqlalchemy.orm import scoped_session, sessionmaker
from directdnsonly.app.backends.coredns_mysql import (
@@ -28,7 +28,7 @@ def mysql_backend():
self.config = {}
self.instance_name = "test"
self.engine = engine
self.Session = scoped_session(sessionmaker(bind=engine))
self.Session = scoped_session(sessionmaker(engine))
yield _TestBackend()
engine.dispose()
@@ -84,8 +84,8 @@ def test_write_zone_removes_stale_records(mysql_backend):
mysql_backend.write_zone("example.com", reduced)
session = mysql_backend.Session()
zone = session.query(Zone).filter_by(zone_name="example.com.").first()
records = session.query(Record).filter_by(zone_id=zone.id, type="AAAA").all()
zone = session.execute(select(Zone).filter_by(zone_name="example.com.")).scalar_one_or_none()
records = session.execute(select(Record).filter_by(zone_id=zone.id, type="AAAA")).scalars().all()
assert records == []
session.close()
@@ -141,7 +141,7 @@ def test_reconcile_removes_extra_records(mysql_backend):
# Inject a phantom record directly into the DB
session = mysql_backend.Session()
zone = session.query(Zone).filter_by(zone_name="example.com.").first()
zone = session.execute(select(Zone).filter_by(zone_name="example.com.")).scalar_one_or_none()
session.add(
Record(
zone_id=zone.id,
@@ -165,3 +165,37 @@ def test_reconcile_no_changes_when_zone_matches(mysql_backend):
success, removed = mysql_backend.reconcile_zone_records("example.com", ZONE_DATA)
assert success
assert removed == 0
# ---------------------------------------------------------------------------
# managed_by field
# ---------------------------------------------------------------------------
def test_write_zone_sets_managed_by_directadmin(mysql_backend):
mysql_backend.write_zone("example.com", ZONE_DATA)
session = mysql_backend.Session()
zone = session.execute(
select(Zone).filter_by(zone_name="example.com.")
).scalar_one_or_none()
assert zone.managed_by == "directadmin"
session.close()
def test_write_zone_migrates_null_managed_by(mysql_backend):
"""Zones that pre-exist without managed_by get it set on next write."""
session = mysql_backend.Session()
zone = Zone(zone_name="example.com.", managed_by=None)
session.add(zone)
session.commit()
session.close()
mysql_backend.write_zone("example.com", ZONE_DATA)
session = mysql_backend.Session()
zone = session.execute(
select(Zone).filter_by(zone_name="example.com.")
).scalar_one_or_none()
assert zone.managed_by == "directadmin"
session.close()

tests/test_da_client.py Normal file
View File

@@ -0,0 +1,370 @@
"""Tests for directdnsonly.app.da.client — DirectAdminClient."""
import requests.exceptions
from unittest.mock import MagicMock, patch
from directdnsonly.app.da import DirectAdminClient
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
def _make_json_response(domains_list, total_pages=1):
data = {str(i): {"domain": d} for i, d in enumerate(domains_list)}
data["info"] = {"total_pages": total_pages}
mock = MagicMock()
mock.status_code = 200
mock.is_redirect = False
mock.headers = {"Content-Type": "application/json"}
mock.json.return_value = data
mock.raise_for_status = MagicMock()
return mock
def _client():
return DirectAdminClient(
"da1.example.com", 2222, "admin", "secret", ssl=True, verify_ssl=True
)
# ---------------------------------------------------------------------------
# list_domains — JSON happy path
# ---------------------------------------------------------------------------
def test_list_domains_returns_set_from_json():
mock_resp = _make_json_response(["example.com", "test.com"])
with patch("requests.get", return_value=mock_resp):
result = _client().list_domains()
assert result == {"example.com", "test.com"}
def test_list_domains_paginates():
page1 = _make_json_response(["a.com"], total_pages=2)
page2 = _make_json_response(["b.com"], total_pages=2)
with patch("requests.get", side_effect=[page1, page2]):
result = _client().list_domains()
assert result == {"a.com", "b.com"}
# ---------------------------------------------------------------------------
# list_domains — DA Evo session login fallback
# ---------------------------------------------------------------------------
def test_redirect_triggers_session_login():
redirect_resp = MagicMock()
redirect_resp.status_code = 302
redirect_resp.is_redirect = True
client = _client()
with (
patch("requests.get", return_value=redirect_resp),
patch.object(client, "_login", return_value=False),
):
result = client.list_domains()
assert result is None
def test_persistent_redirect_after_login_returns_none():
redirect_resp = MagicMock()
redirect_resp.status_code = 302
redirect_resp.is_redirect = True
client = _client()
# Simulate cookies already set (login succeeded previously)
client._cookies = {"session": "abc"}
with patch("requests.get", return_value=redirect_resp):
result = client.list_domains()
assert result is None
# ---------------------------------------------------------------------------
# list_domains — error cases
# ---------------------------------------------------------------------------
def test_html_response_returns_none():
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.is_redirect = False
mock_resp.headers = {"Content-Type": "text/html; charset=utf-8"}
mock_resp.raise_for_status = MagicMock()
with patch("requests.get", return_value=mock_resp):
result = _client().list_domains()
assert result is None
def test_connection_error_returns_none():
with patch(
"requests.get", side_effect=requests.exceptions.ConnectionError("refused")
):
result = _client().list_domains()
assert result is None
def test_timeout_returns_none():
with patch("requests.get", side_effect=requests.exceptions.Timeout()):
result = _client().list_domains()
assert result is None
def test_ssl_error_returns_none():
with patch(
"requests.get", side_effect=requests.exceptions.SSLError("cert verify failed")
):
result = _client().list_domains()
assert result is None
# ---------------------------------------------------------------------------
# _parse_legacy_domain_list
# ---------------------------------------------------------------------------
def test_parse_standard_querystring():
result = DirectAdminClient._parse_legacy_domain_list(
"list[]=example.com&list[]=test.com"
)
assert result == {"example.com", "test.com"}
def test_parse_newline_separated():
result = DirectAdminClient._parse_legacy_domain_list(
"list[]=example.com\nlist[]=test.com"
)
assert result == {"example.com", "test.com"}
def test_parse_empty_body_returns_empty_set():
assert DirectAdminClient._parse_legacy_domain_list("") == set()
def test_parse_normalises_to_lowercase():
result = DirectAdminClient._parse_legacy_domain_list("list[]=EXAMPLE.COM")
assert "example.com" in result
assert "EXAMPLE.COM" not in result
def test_parse_strips_whitespace():
result = DirectAdminClient._parse_legacy_domain_list("list[]= example.com ")
assert "example.com" in result
# ---------------------------------------------------------------------------
# _login
# ---------------------------------------------------------------------------
def test_login_stores_cookies_on_success():
mock_resp = MagicMock()
mock_resp.cookies = {"session": "tok123"}
client = _client()
with patch("requests.post", return_value=mock_resp):
result = client._login()
assert result is True
assert client._cookies == {"session": "tok123"}
def test_login_returns_false_when_no_cookies():
mock_resp = MagicMock()
mock_resp.cookies = {}
client = _client()
with patch("requests.post", return_value=mock_resp):
result = client._login()
assert result is False
assert client._cookies is None
def test_login_returns_false_on_exception():
client = _client()
with patch("requests.post", side_effect=requests.exceptions.ConnectionError()):
result = client._login()
assert result is False
# ---------------------------------------------------------------------------
# get_extra_dns_servers
# ---------------------------------------------------------------------------
def _multi_server_get_resp(servers=None):
mock = MagicMock()
mock.status_code = 200
mock.is_redirect = False
mock.headers = {"Content-Type": "application/json"}
mock.json.return_value = {"CLUSTER_ON": "yes", "servers": servers or {}}
mock.raise_for_status = MagicMock()
return mock
def test_get_extra_dns_servers_returns_servers_dict():
servers = {
"1.2.3.4": {"dns": "yes", "domain_check": "yes", "port": "2222", "ssl": "no"}
}
with patch("requests.get", return_value=_multi_server_get_resp(servers)):
result = _client().get_extra_dns_servers()
assert "1.2.3.4" in result
assert result["1.2.3.4"]["dns"] == "yes"
def test_get_extra_dns_servers_returns_empty_on_http_error():
mock_resp = MagicMock()
mock_resp.status_code = 500
with patch("requests.get", return_value=mock_resp):
result = _client().get_extra_dns_servers()
assert result == {}
def test_get_extra_dns_servers_returns_empty_on_connection_error():
with patch(
"requests.get", side_effect=requests.exceptions.ConnectionError("refused")
):
result = _client().get_extra_dns_servers()
assert result == {}
# ---------------------------------------------------------------------------
# add_extra_dns_server
# ---------------------------------------------------------------------------
def test_add_extra_dns_server_returns_true_on_success():
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.json.return_value = {"result": "", "success": "Connection Added"}
with patch("requests.post", return_value=mock_resp):
result = _client().add_extra_dns_server("1.2.3.4", 2222, "ddnsonly", "s3cr3t")
assert result is True
def test_add_extra_dns_server_returns_false_on_da_error():
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.json.return_value = {"result": "Server already exists", "success": ""}
with patch("requests.post", return_value=mock_resp):
result = _client().add_extra_dns_server("1.2.3.4", 2222, "ddnsonly", "s3cr3t")
assert result is False
def test_add_extra_dns_server_returns_false_on_connection_error():
with patch(
"requests.post", side_effect=requests.exceptions.ConnectionError("refused")
):
result = _client().add_extra_dns_server("1.2.3.4", 2222, "ddnsonly", "s3cr3t")
assert result is False
# ---------------------------------------------------------------------------
# ensure_extra_dns_server
# ---------------------------------------------------------------------------
def _add_success_resp():
mock = MagicMock()
mock.status_code = 200
mock.json.return_value = {"result": "", "success": "Connection Added"}
return mock
def _save_success_resp():
mock = MagicMock()
mock.status_code = 200
mock.json.return_value = {"result": "", "success": "Connections Saved"}
return mock
def test_ensure_extra_dns_server_adds_and_configures_new_server():
"""Server not yet registered — adds it, then saves dns+domain_check settings."""
with (
patch("requests.get", return_value=_multi_server_get_resp(servers={})),
patch(
"requests.post",
side_effect=[_add_success_resp(), _save_success_resp()],
),
):
result = _client().ensure_extra_dns_server(
"1.2.3.4", 2222, "ddnsonly", "s3cr3t"
)
assert result is True
def test_ensure_extra_dns_server_skips_add_when_already_present():
"""Server already registered — no add call, only saves settings."""
existing = {
"1.2.3.4": {"dns": "no", "domain_check": "no", "port": "2222", "ssl": "no"}
}
with (
patch("requests.get", return_value=_multi_server_get_resp(servers=existing)),
patch("requests.post", return_value=_save_success_resp()) as mock_post,
):
result = _client().ensure_extra_dns_server(
"1.2.3.4", 2222, "ddnsonly", "s3cr3t"
)
assert result is True
assert mock_post.call_count == 1 # save only, no add
def test_ensure_extra_dns_server_returns_false_when_add_fails():
fail_resp = MagicMock()
fail_resp.status_code = 200
fail_resp.json.return_value = {"result": "error", "success": ""}
with (
patch("requests.get", return_value=_multi_server_get_resp(servers={})),
patch("requests.post", return_value=fail_resp),
):
result = _client().ensure_extra_dns_server(
"1.2.3.4", 2222, "ddnsonly", "s3cr3t"
)
assert result is False
def test_ensure_extra_dns_server_returns_false_when_save_fails():
"""Add succeeds but the subsequent settings save fails."""
fail_save = MagicMock()
fail_save.status_code = 200
fail_save.json.return_value = {"result": "error", "success": ""}
with (
patch("requests.get", return_value=_multi_server_get_resp(servers={})),
patch(
"requests.post",
side_effect=[_add_success_resp(), fail_save],
),
):
result = _client().ensure_extra_dns_server(
"1.2.3.4", 2222, "ddnsonly", "s3cr3t"
)
assert result is False
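
The happy path these tests pin down corresponds to a one-off registration call of the following shape (constructor and argument order as exercised above; hostnames and credentials are placeholders):

    from directdnsonly.app.da import DirectAdminClient

    client = DirectAdminClient(
        "da1.example.com", 2222, "admin", "secret", ssl=True, verify_ssl=True
    )

    # Registers a directdnsonly node as an extra DNS server on the DA box,
    # skips the add when it is already present, and saves the dns/domain_check settings.
    ok = client.ensure_extra_dns_server("1.2.3.4", 2222, "ddnsonly", "s3cr3t")
    print("registered" if ok else "registration failed")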

tests/test_nsd.py Normal file
View File

@@ -0,0 +1,227 @@
"""Tests for directdnsonly.app.backends.nsd — NSDBackend."""
import subprocess
from pathlib import Path
from unittest.mock import patch, MagicMock
import pytest
from directdnsonly.app.backends.nsd import NSDBackend
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
ZONE_DATA = """\
$ORIGIN example.com.
$TTL 300
@ 300 IN SOA ns1.example.com. hostmaster.example.com. (2024010101 3600 900 604800 300)
@ 300 IN NS ns1.example.com.
@ 300 IN A 192.0.2.1
"""
def _make_backend(tmp_path) -> NSDBackend:
"""Return an NSDBackend pointing at tmp_path directories.
is_available() is patched so the tests do not require a real nsd install.
"""
zones_dir = tmp_path / "zones"
nsd_conf = tmp_path / "nsd.conf.d" / "zones.conf"
config = {
"instance_name": "test_nsd",
"zones_dir": str(zones_dir),
"nsd_conf": str(nsd_conf),
}
with patch.object(NSDBackend, "is_available", return_value=True):
return NSDBackend(config)
# ---------------------------------------------------------------------------
# Availability check
# ---------------------------------------------------------------------------
def test_is_available_true(monkeypatch):
monkeypatch.setattr(
"directdnsonly.app.backends.nsd.subprocess.run",
lambda *a, **kw: MagicMock(returncode=0),
)
assert NSDBackend.is_available()
def test_is_available_false_when_not_installed(monkeypatch):
def raise_fnf(*args, **kwargs):
raise FileNotFoundError
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", raise_fnf)
assert not NSDBackend.is_available()
# ---------------------------------------------------------------------------
# Initialisation
# ---------------------------------------------------------------------------
def test_init_creates_zones_dir(tmp_path):
backend = _make_backend(tmp_path)
assert backend.zones_dir.exists()
def test_init_creates_nsd_conf(tmp_path):
backend = _make_backend(tmp_path)
assert backend.nsd_conf.exists()
def test_get_name():
assert NSDBackend.get_name() == "nsd"
# ---------------------------------------------------------------------------
# write_zone
# ---------------------------------------------------------------------------
def test_write_zone_creates_zone_file(tmp_path):
backend = _make_backend(tmp_path)
assert backend.write_zone("example.com", ZONE_DATA)
assert (backend.zones_dir / "example.com.db").exists()
def test_write_zone_content_matches(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
content = (backend.zones_dir / "example.com.db").read_text()
assert content == ZONE_DATA
def test_write_zone_adds_to_conf(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
conf = backend.nsd_conf.read_text()
assert 'name: "example.com"' in conf
assert "example.com.db" in conf
def test_write_zone_idempotent_conf_entry(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.write_zone("example.com", ZONE_DATA)
conf = backend.nsd_conf.read_text()
# Should appear exactly once
assert conf.count('name: "example.com"') == 1
def test_write_zone_multiple_zones(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.write_zone("other.com", ZONE_DATA)
conf = backend.nsd_conf.read_text()
assert 'name: "example.com"' in conf
assert 'name: "other.com"' in conf
# ---------------------------------------------------------------------------
# zone_exists
# ---------------------------------------------------------------------------
def test_zone_exists_after_write(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
assert backend.zone_exists("example.com")
def test_zone_not_exists_before_write(tmp_path):
backend = _make_backend(tmp_path)
assert not backend.zone_exists("missing.com")
# ---------------------------------------------------------------------------
# delete_zone
# ---------------------------------------------------------------------------
def test_delete_zone_removes_file(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
assert backend.delete_zone("example.com")
assert not (backend.zones_dir / "example.com.db").exists()
def test_delete_zone_removes_conf_entry(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.delete_zone("example.com")
conf = backend.nsd_conf.read_text()
assert 'name: "example.com"' not in conf
def test_delete_zone_returns_false_when_missing(tmp_path):
backend = _make_backend(tmp_path)
assert not backend.delete_zone("ghost.com")
def test_delete_zone_leaves_other_zones(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("example.com", ZONE_DATA)
backend.write_zone("other.com", ZONE_DATA)
backend.delete_zone("example.com")
assert 'name: "other.com"' in backend.nsd_conf.read_text()
# ---------------------------------------------------------------------------
# reload_zone — subprocess interactions
# ---------------------------------------------------------------------------
def test_reload_zone_calls_nsd_control_reload(tmp_path, monkeypatch):
backend = _make_backend(tmp_path)
calls = []
def fake_run(cmd, **kwargs):
calls.append(cmd)
return MagicMock(returncode=0, stdout="ok", stderr="")
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", fake_run)
assert backend.reload_zone()
assert calls[0] == ["nsd-control", "reload"]
def test_reload_single_zone_passes_zone_name(tmp_path, monkeypatch):
backend = _make_backend(tmp_path)
calls = []
def fake_run(cmd, **kwargs):
calls.append(cmd)
return MagicMock(returncode=0, stdout="ok", stderr="")
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", fake_run)
assert backend.reload_zone("example.com")
assert calls[0] == ["nsd-control", "reload", "example.com"]
def test_reload_zone_returns_false_on_failure(tmp_path, monkeypatch):
backend = _make_backend(tmp_path)
def fake_run(cmd, **kwargs):
raise subprocess.CalledProcessError(1, cmd, stderr="nsd-control: error")
monkeypatch.setattr("directdnsonly.app.backends.nsd.subprocess.run", fake_run)
assert not backend.reload_zone()
# ---------------------------------------------------------------------------
# update_nsd_conf — full rewrite
# ---------------------------------------------------------------------------
def test_update_nsd_conf_replaces_all_zones(tmp_path):
backend = _make_backend(tmp_path)
backend.write_zone("old.com", ZONE_DATA)
backend.update_nsd_conf(["new1.com", "new2.com"])
conf = backend.nsd_conf.read_text()
assert 'name: "old.com"' not in conf
assert 'name: "new1.com"' in conf
assert 'name: "new2.com"' in conf

tests/test_peer_sync.py Normal file
View File

@@ -0,0 +1,446 @@
"""Tests for directdnsonly.app.peer_sync — PeerSyncWorker."""
import datetime
import json
import pytest
from sqlalchemy import select, func
from unittest.mock import patch, MagicMock
from directdnsonly.app.peer_sync import PeerSyncWorker
from directdnsonly.app.db.models import Domain
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
BASE_CONFIG = {
"enabled": True,
"interval_minutes": 15,
"peers": [
{
"url": "http://ddo-2:2222",
"username": "directdnsonly",
"password": "changeme",
}
],
}
NOW = datetime.datetime(2024, 6, 1, 12, 0, 0)
OLDER = datetime.datetime(2024, 6, 1, 11, 0, 0)
ZONE_DATA = "$ORIGIN example.com.\n@ 300 IN SOA ns1 hostmaster 1 3600 900 604800 300\n"
# ---------------------------------------------------------------------------
# Config / startup tests
# ---------------------------------------------------------------------------
def test_disabled_by_default():
worker = PeerSyncWorker({})
assert not worker.enabled
def test_interval_stored():
worker = PeerSyncWorker({"enabled": True, "interval_minutes": 30})
assert worker.interval_seconds == 1800
def test_default_interval():
worker = PeerSyncWorker({"enabled": True})
assert worker.interval_seconds == 15 * 60
def test_peers_stored():
worker = PeerSyncWorker(BASE_CONFIG)
assert len(worker.peers) == 1
assert worker.peers[0]["url"] == "http://ddo-2:2222"
def test_peer_from_env_var(monkeypatch):
"""DADNS_PEER_SYNC_PEER_URL adds a peer without a config file."""
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_URL", "http://ddo-env:2222")
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_USERNAME", "admin")
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_PASSWORD", "secret")
worker = PeerSyncWorker({"enabled": True})
assert len(worker.peers) == 1
assert worker.peers[0]["url"] == "http://ddo-env:2222"
assert worker.peers[0]["username"] == "admin"
assert worker.peers[0]["password"] == "secret"
def test_env_peer_not_duplicated_when_also_in_config(monkeypatch):
"""Env var peer is not added if it already appears in the config file peers list."""
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_URL", "http://ddo-2:2222")
worker = PeerSyncWorker(BASE_CONFIG)
# BASE_CONFIG already has http://ddo-2:2222 — must remain exactly one entry
urls = [p["url"] for p in worker.peers]
assert urls.count("http://ddo-2:2222") == 1
def test_numbered_env_peers(monkeypatch):
"""DADNS_PEER_SYNC_PEER_1_URL and _2_URL add multiple peers."""
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_1_URL", "http://node-a:2222")
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_1_USERNAME", "peersync")
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_1_PASSWORD", "s3cr3t")
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_2_URL", "http://node-b:2222")
worker = PeerSyncWorker({"enabled": True})
urls = [p["url"] for p in worker.peers]
assert "http://node-a:2222" in urls
assert "http://node-b:2222" in urls
assert len(urls) == 2
def test_numbered_env_peers_not_duplicated(monkeypatch):
"""Numbered env var peers are deduplicated against the config file list."""
monkeypatch.setenv("DADNS_PEER_SYNC_PEER_1_URL", "http://ddo-2:2222")
worker = PeerSyncWorker(BASE_CONFIG)
urls = [p["url"] for p in worker.peers]
assert urls.count("http://ddo-2:2222") == 1
def test_get_peer_urls():
worker = PeerSyncWorker(BASE_CONFIG)
assert worker.get_peer_urls() == ["http://ddo-2:2222"]
# ---------------------------------------------------------------------------
# Health tracking
# ---------------------------------------------------------------------------
def test_peer_health_starts_healthy():
worker = PeerSyncWorker(BASE_CONFIG)
h = worker._health("http://ddo-2:2222")
assert h["healthy"] is True
assert h["consecutive_failures"] == 0
def test_record_failure_increments_count():
worker = PeerSyncWorker(BASE_CONFIG)
worker._record_failure("http://ddo-2:2222", ConnectionError("down"))
assert worker._health("http://ddo-2:2222")["consecutive_failures"] == 1
assert worker._health("http://ddo-2:2222")["healthy"] is True
def test_record_failure_marks_degraded_at_threshold():
from directdnsonly.app.peer_sync import FAILURE_THRESHOLD
worker = PeerSyncWorker(BASE_CONFIG)
for _ in range(FAILURE_THRESHOLD):
worker._record_failure("http://ddo-2:2222", ConnectionError("down"))
assert worker._health("http://ddo-2:2222")["healthy"] is False
def test_record_success_resets_health():
from directdnsonly.app.peer_sync import FAILURE_THRESHOLD
worker = PeerSyncWorker(BASE_CONFIG)
for _ in range(FAILURE_THRESHOLD):
worker._record_failure("http://ddo-2:2222", ConnectionError("down"))
assert not worker._health("http://ddo-2:2222")["healthy"]
worker._record_success("http://ddo-2:2222")
assert worker._health("http://ddo-2:2222")["healthy"] is True
assert worker._health("http://ddo-2:2222")["consecutive_failures"] == 0
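# Illustrative sketch (assumption, not the shipped code): the failure-threshold
# state machine the health tests above describe. Field names mirror the
# assertions; the real worker also tracks last_seen per peer.
FAILURE_THRESHOLD = 3  # placeholder; the tests import the real constant from peer_sync

def record_failure(health, url):
    h = health.setdefault(url, {"healthy": True, "consecutive_failures": 0})
    h["consecutive_failures"] += 1
    if h["consecutive_failures"] >= FAILURE_THRESHOLD:
        h["healthy"] = False  # peer is now considered degraded

def record_success(health, url):
    # Any successful contact clears the failure streak and marks the peer healthy.
    health[url] = {"healthy": True, "consecutive_failures": 0}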
# ---------------------------------------------------------------------------
# Peer discovery (_discover_peers_from)
# ---------------------------------------------------------------------------
def test_discover_peers_adds_new_peer(monkeypatch):
"""New peer URL returned by /internal/peers is added to the peer list."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
resp.json.return_value = ["http://node-c:2222"]
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._discover_peers_from(BASE_CONFIG["peers"][0])
urls = [p["url"] for p in worker.peers]
assert "http://node-c:2222" in urls
def test_discover_peers_skips_known(monkeypatch):
"""Already-known peer URLs are not re-added."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
resp.json.return_value = ["http://ddo-2:2222"] # already known
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._discover_peers_from(BASE_CONFIG["peers"][0])
assert len(worker.peers) == 1 # unchanged
def test_discover_peers_tolerates_failure(monkeypatch):
"""Network error during discovery does not propagate."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(*args, **kwargs):
raise ConnectionError("peer down")
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
# Should not raise
worker._discover_peers_from(BASE_CONFIG["peers"][0])
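# Illustrative sketch (assumptions, not the shipped code): the gossip-lite
# discovery the tests above pin down. Ask one peer for its /internal/peers
# list, add any URL we have not seen, and ignore errors so an unreachable
# peer never aborts the pass. Credential inheritance for newly discovered
# peers is an assumption here.
import requests

def discover_peers_from(peer, known_peers):
    try:
        resp = requests.get(
            peer["url"] + "/internal/peers",
            auth=(peer["username"], peer["password"]),
            timeout=10,
        )
        if resp.status_code != 200:
            return
        known_urls = {p["url"] for p in known_peers}
        for url in resp.json():
            if url not in known_urls:
                known_peers.append({**peer, "url": url})
    except Exception:
        # Discovery is best-effort; failures must not propagate to the caller.
        pass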
def test_start_skips_when_disabled(caplog):
worker = PeerSyncWorker({"enabled": False})
worker.start()
assert worker._thread is None
def test_start_warns_when_no_peers(caplog):
    worker = PeerSyncWorker({"enabled": True, "peers": []})
    with patch.object(worker, "_run"):
        worker.start()
    # Thread should not have started
    assert worker._thread is None
# ---------------------------------------------------------------------------
# _sync_from_peer tests
# ---------------------------------------------------------------------------
def _make_peer():
return BASE_CONFIG["peers"][0]
def _peer_list(domain, ts=None):
"""Simulate the JSON response from GET /internal/zones."""
return [
{
"domain": domain,
"zone_updated_at": ts.isoformat() if ts else None,
"hostname": "da1.example.com",
"username": "admin",
}
]
def _peer_zone(domain, ts=None, zone_data=ZONE_DATA):
"""Simulate the JSON response from GET /internal/zones?domain=X."""
return {
"domain": domain,
"zone_data": zone_data,
"zone_updated_at": ts.isoformat() if ts else None,
"hostname": "da1.example.com",
"username": "admin",
}
def test_sync_creates_new_local_record(patch_connect, monkeypatch):
"""When local DB has no record, peer zone_data is fetched and stored."""
worker = PeerSyncWorker(BASE_CONFIG)
session = patch_connect
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
resp.json.return_value = _peer_zone("example.com", NOW)
else:
resp.json.return_value = _peer_list("example.com", NOW)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
record = session.execute(
select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()
assert record is not None
assert record.zone_data == ZONE_DATA
assert record.zone_updated_at == NOW
def test_sync_updates_older_local_record(patch_connect, monkeypatch):
"""When local zone_data is older than peer's, it is overwritten."""
session = patch_connect
session.add(
Domain(domain="example.com", zone_data="old data", zone_updated_at=OLDER)
)
session.commit()
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
resp.json.return_value = _peer_zone("example.com", NOW)
else:
resp.json.return_value = _peer_list("example.com", NOW)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
record = session.execute(
select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()
assert record.zone_data == ZONE_DATA
assert record.zone_updated_at == NOW
def test_sync_skips_when_local_is_newer(patch_connect, monkeypatch):
"""When local zone_data is newer than peer's, it is not overwritten."""
session = patch_connect
session.add(
Domain(domain="example.com", zone_data="newer local", zone_updated_at=NOW)
)
session.commit()
worker = PeerSyncWorker(BASE_CONFIG)
fetch_calls = []
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
fetch_calls.append(url)
resp.json.return_value = _peer_zone("example.com", OLDER)
else:
resp.json.return_value = _peer_list("example.com", OLDER)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
# zone_data fetch should not have been called
assert not fetch_calls
record = session.execute(
select(Domain).filter_by(domain="example.com")
).scalar_one_or_none()
assert record.zone_data == "newer local"
def test_sync_skips_unreachable_peer(monkeypatch):
"""If the peer raises a connection error, _sync_all catches it gracefully."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(*args, **kwargs):
raise ConnectionError("peer down")
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
# Should not raise
worker._sync_all()
def test_sync_skips_peer_with_bad_status(patch_connect, monkeypatch):
"""Non-200 response from peer zone list is silently skipped."""
worker = PeerSyncWorker(BASE_CONFIG)
session = patch_connect
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 503
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
# No records should have been created
assert session.execute(select(func.count()).select_from(Domain)).scalar() == 0
def test_sync_skips_missing_zone_data_in_response(patch_connect, monkeypatch):
"""If the peer returns no zone_data for a domain, it is skipped."""
session = patch_connect
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
if params and params.get("domain"):
resp.json.return_value = {"domain": "example.com", "zone_data": None}
else:
resp.json.return_value = _peer_list("example.com", NOW)
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
assert session.execute(select(func.count()).select_from(Domain)).scalar() == 0
def test_sync_empty_peer_list(patch_connect, monkeypatch):
"""Empty zone list from peer results in zero syncs without error."""
worker = PeerSyncWorker(BASE_CONFIG)
def mock_get(url, auth=None, timeout=10, params=None):
resp = MagicMock()
resp.status_code = 200
resp.json.return_value = []
return resp
monkeypatch.setattr("directdnsonly.app.peer_sync.requests.get", mock_get)
worker._sync_from_peer(_make_peer())
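# Illustrative sketch (assumption): the newer-wins rule the sync tests above
# establish. zone_data is only fetched and stored when the peer's
# zone_updated_at is newer than the local copy, or when no local copy exists.
# How a missing peer timestamp is treated is not pinned down by these tests.
from datetime import datetime

def peer_copy_is_newer(local_updated_at, peer_updated_at_iso):
    if not peer_updated_at_iso:
        return False  # assumption: no timestamp from the peer means do not overwrite
    peer_ts = datetime.fromisoformat(peer_updated_at_iso)
    return local_updated_at is None or peer_ts > local_updated_at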
# ---------------------------------------------------------------------------
# get_peer_status
# ---------------------------------------------------------------------------
def test_get_peer_status_no_contact_yet():
worker = PeerSyncWorker(BASE_CONFIG)
status = worker.get_peer_status()
assert status["enabled"] is True
assert status["total"] == 1
assert status["healthy"] == 1
assert status["degraded"] == 0
assert status["peers"][0]["url"] == "http://ddo-2:2222"
assert status["peers"][0]["healthy"] is True
assert status["peers"][0]["last_seen"] is None
def test_get_peer_status_after_success():
worker = PeerSyncWorker(BASE_CONFIG)
worker._record_success("http://ddo-2:2222")
status = worker.get_peer_status()
assert status["healthy"] == 1
assert status["degraded"] == 0
assert status["peers"][0]["last_seen"] is not None
def test_get_peer_status_after_degraded():
from directdnsonly.app.peer_sync import FAILURE_THRESHOLD
worker = PeerSyncWorker(BASE_CONFIG)
for _ in range(FAILURE_THRESHOLD):
worker._record_failure("http://ddo-2:2222", Exception("timeout"))
status = worker.get_peer_status()
assert status["healthy"] == 0
assert status["degraded"] == 1
assert status["peers"][0]["healthy"] is False
def test_get_peer_status_disabled():
worker = PeerSyncWorker({})
status = worker.get_peer_status()
assert status["enabled"] is False
assert status["total"] == 0
assert status["peers"] == []


@@ -1,9 +1,8 @@
"""Tests for directdnsonly.app.reconciler — ReconciliationWorker."""
import pytest
import requests.exceptions
from queue import Queue
from unittest.mock import MagicMock, patch
from unittest.mock import patch, MagicMock
from directdnsonly.app.reconciler import ReconciliationWorker
from directdnsonly.app.db.models import Domain
@@ -47,6 +46,20 @@ def dry_run_worker(delete_queue):
return ReconciliationWorker(delete_queue, cfg)
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
DA_CLIENT_PATH = "directdnsonly.app.reconciler.DirectAdminClient"
def _patch_da(return_value):
"""Patch DirectAdminClient so list_domains returns a fixed value."""
return patch(
DA_CLIENT_PATH, **{"return_value.list_domains.return_value": return_value}
)
# ---------------------------------------------------------------------------
# _reconcile_all — orphan detection
# ---------------------------------------------------------------------------
@@ -58,7 +71,7 @@ def test_orphan_queued_when_domain_missing_from_da(worker, delete_queue, patch_c
)
patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value=set()):
with _patch_da(set()):
worker._reconcile_all()
assert not delete_queue.empty()
@@ -73,7 +86,7 @@ def test_orphan_not_queued_in_dry_run(dry_run_worker, delete_queue, patch_connec
)
patch_connect.commit()
with patch.object(dry_run_worker, "_fetch_da_domains", return_value=set()):
with _patch_da(set()):
dry_run_worker._reconcile_all()
assert delete_queue.empty()
@@ -86,7 +99,7 @@ def test_orphan_not_queued_for_unknown_server(worker, delete_queue, patch_connec
)
patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value=set()):
with _patch_da(set()):
worker._reconcile_all()
assert delete_queue.empty()
@@ -98,7 +111,7 @@ def test_active_domain_not_queued(worker, delete_queue, patch_connect):
)
patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value={"good.com"}):
with _patch_da({"good.com"}):
worker._reconcile_all()
assert delete_queue.empty()
@@ -113,7 +126,7 @@ def test_backfill_null_hostname(worker, patch_connect):
patch_connect.add(Domain(domain="backfill.com", hostname=None, username="admin"))
patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value={"backfill.com"}):
with _patch_da({"backfill.com"}):
worker._reconcile_all()
record = patch_connect.query(Domain).filter_by(domain="backfill.com").first()
@@ -126,7 +139,7 @@ def test_migration_updates_hostname(worker, patch_connect):
)
patch_connect.commit()
with patch.object(worker, "_fetch_da_domains", return_value={"moved.com"}):
with _patch_da({"moved.com"}):
worker._reconcile_all()
record = patch_connect.query(Domain).filter_by(domain="moved.com").first()
@@ -138,148 +151,13 @@ def test_dry_run_still_backfills(dry_run_worker, patch_connect):
patch_connect.add(Domain(domain="fill.com", hostname=None, username="admin"))
patch_connect.commit()
with patch.object(dry_run_worker, "_fetch_da_domains", return_value={"fill.com"}):
with _patch_da({"fill.com"}):
dry_run_worker._reconcile_all()
record = patch_connect.query(Domain).filter_by(domain="fill.com").first()
assert record.hostname == "da1.example.com"
# ---------------------------------------------------------------------------
# _fetch_da_domains — HTTP handling
# ---------------------------------------------------------------------------
def _make_json_response(domains_dict, total_pages=1):
"""Return a mock requests.Response with JSON payload matching DA format."""
data = {str(i): {"domain": d} for i, d in enumerate(domains_dict)}
data["info"] = {"total_pages": total_pages}
mock = MagicMock()
mock.status_code = 200
mock.is_redirect = False
mock.headers = {"Content-Type": "application/json"}
mock.json.return_value = data
mock.raise_for_status = MagicMock()
return mock
def test_fetch_returns_domains_from_json(worker):
mock_resp = _make_json_response(["example.com", "test.com"])
with patch("requests.get", return_value=mock_resp):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result == {"example.com", "test.com"}
def test_fetch_paginates(worker):
page1 = _make_json_response(["a.com"], total_pages=2)
page2 = _make_json_response(["b.com"], total_pages=2)
with patch("requests.get", side_effect=[page1, page2]):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result == {"a.com", "b.com"}
def test_fetch_redirect_triggers_session_login(worker):
redirect_resp = MagicMock()
redirect_resp.status_code = 302
redirect_resp.is_redirect = True
with (
patch("requests.get", return_value=redirect_resp),
patch.object(worker, "_da_session_login", return_value=None),
):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_html_response_returns_none(worker):
mock_resp = MagicMock()
mock_resp.status_code = 200
mock_resp.is_redirect = False
mock_resp.headers = {"Content-Type": "text/html; charset=utf-8"}
mock_resp.raise_for_status = MagicMock()
with patch("requests.get", return_value=mock_resp):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_connection_error_returns_none(worker):
with patch(
"requests.get", side_effect=requests.exceptions.ConnectionError("refused")
):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_timeout_returns_none(worker):
with patch("requests.get", side_effect=requests.exceptions.Timeout()):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
def test_fetch_ssl_error_returns_none(worker):
with patch(
"requests.get", side_effect=requests.exceptions.SSLError("cert verify failed")
):
result = worker._fetch_da_domains(
"da1.example.com", 2222, "admin", "secret", True
)
assert result is None
# ---------------------------------------------------------------------------
# _parse_da_domain_list — legacy format fallback
# ---------------------------------------------------------------------------
def test_parse_standard_querystring():
body = "list[]=example.com&list[]=test.com"
result = ReconciliationWorker._parse_da_domain_list(body)
assert result == {"example.com", "test.com"}
def test_parse_newline_separated():
body = "list[]=example.com\nlist[]=test.com"
result = ReconciliationWorker._parse_da_domain_list(body)
assert result == {"example.com", "test.com"}
def test_parse_empty_body_returns_empty_set():
assert ReconciliationWorker._parse_da_domain_list("") == set()
def test_parse_normalises_to_lowercase():
result = ReconciliationWorker._parse_da_domain_list("list[]=EXAMPLE.COM")
assert "example.com" in result
assert "EXAMPLE.COM" not in result
def test_parse_strips_whitespace():
result = ReconciliationWorker._parse_da_domain_list("list[]= example.com ")
assert "example.com" in result
# ---------------------------------------------------------------------------
# Worker lifecycle
# ---------------------------------------------------------------------------
@@ -297,3 +175,225 @@ def test_no_servers_does_not_start(delete_queue):
w = ReconciliationWorker(delete_queue, cfg)
w.start()
assert not w.is_alive
def test_initial_delay_stored(delete_queue):
cfg = {**BASE_CONFIG, "initial_delay_minutes": 30}
w = ReconciliationWorker(delete_queue, cfg)
assert w._initial_delay == 30 * 60
def test_zero_initial_delay_by_default(delete_queue):
w = ReconciliationWorker(delete_queue, BASE_CONFIG)
assert w._initial_delay == 0
# ---------------------------------------------------------------------------
# _heal_backends — Option C backend healing
# ---------------------------------------------------------------------------
def _make_backend_registry(zone_exists_return: bool):
"""Build a mock backend_registry with one backend whose zone_exists returns
the given value."""
backend = MagicMock()
backend.zone_exists.return_value = zone_exists_return
registry = MagicMock()
registry.get_available_backends.return_value = {"coredns": backend}
return registry, backend
def test_heal_queues_zone_missing_from_backend(delete_queue, patch_connect):
save_queue = Queue()
registry, backend = _make_backend_registry(zone_exists_return=False)
patch_connect.add(
Domain(
domain="missing.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
w = ReconciliationWorker(
delete_queue, BASE_CONFIG, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert not save_queue.empty()
item = save_queue.get_nowait()
assert item["domain"] == "missing.com"
assert item["failed_backends"] == ["coredns"]
assert item["source"] == "reconciler_heal"
assert item["zone_file"] == "; zone file"
def test_heal_skips_domains_without_zone_data(delete_queue, patch_connect):
save_queue = Queue()
registry, _ = _make_backend_registry(zone_exists_return=False)
patch_connect.add(
Domain(
domain="nodata.com",
hostname="da1.example.com",
username="admin",
zone_data=None,
)
)
patch_connect.commit()
w = ReconciliationWorker(
delete_queue, BASE_CONFIG, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert save_queue.empty()
def test_heal_skips_when_all_backends_have_zone(delete_queue, patch_connect):
save_queue = Queue()
registry, _ = _make_backend_registry(zone_exists_return=True)
patch_connect.add(
Domain(
domain="present.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
w = ReconciliationWorker(
delete_queue, BASE_CONFIG, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert save_queue.empty()
def test_heal_dry_run_does_not_queue(delete_queue, patch_connect):
save_queue = Queue()
registry, _ = _make_backend_registry(zone_exists_return=False)
patch_connect.add(
Domain(
domain="dry.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
cfg = {**BASE_CONFIG, "dry_run": True}
w = ReconciliationWorker(
delete_queue, cfg, save_queue=save_queue, backend_registry=registry
)
w._heal_backends()
assert save_queue.empty()
def test_heal_skipped_when_no_registry(delete_queue, patch_connect):
"""_heal_backends should not run when backend_registry is None."""
save_queue = Queue()
patch_connect.add(
Domain(
domain="noregistry.com",
hostname="da1.example.com",
username="admin",
zone_data="; zone file",
)
)
patch_connect.commit()
w = ReconciliationWorker(delete_queue, BASE_CONFIG, save_queue=save_queue)
# Should not raise; healing is silently skipped
with _patch_da({"noregistry.com"}):
w._reconcile_all()
assert save_queue.empty()
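# Illustrative sketch (not the shipped implementation): the healing pass the
# tests above describe. For every stored domain with zone_data, any available
# backend that reports the zone missing gets a save re-queued, unless dry_run
# is set. Item keys mirror test_heal_queues_zone_missing_from_backend.
def heal_backends(domains, backend_registry, save_queue, dry_run=False):
    healed = 0
    backends = backend_registry.get_available_backends()
    for record in domains:
        if not record.zone_data:
            continue  # nothing to push without a stored zone file
        missing = [name for name, be in backends.items()
                   if not be.zone_exists(record.domain)]
        if missing and not dry_run:
            save_queue.put({
                "domain": record.domain,
                "zone_file": record.zone_data,
                "failed_backends": missing,
                "source": "reconciler_heal",
            })
            healed += 1
    return healed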
# ---------------------------------------------------------------------------
# get_status — last-run state
# ---------------------------------------------------------------------------
def test_get_status_before_any_run(worker):
status = worker.get_status()
assert status["enabled"] is True
assert status["alive"] is False
assert status["last_run"] == {}
def test_get_status_after_run(worker, patch_connect):
with _patch_da(set()):
worker._reconcile_all()
s = worker.get_status()
assert s["enabled"] is True
lr = s["last_run"]
assert lr["status"] == "ok"
assert "started_at" in lr
assert "completed_at" in lr
assert "duration_seconds" in lr
assert lr["da_servers_polled"] == 1
assert lr["da_servers_unreachable"] == 0
assert lr["dry_run"] is False
def test_get_status_counts_unreachable_server(worker, patch_connect):
with _patch_da(None):
worker._reconcile_all()
lr = worker.get_status()["last_run"]
assert lr["da_servers_polled"] == 1
assert lr["da_servers_unreachable"] == 1
def test_get_status_counts_orphans(worker, delete_queue, patch_connect):
patch_connect.add(
Domain(domain="orphan.com", hostname="da1.example.com", username="admin")
)
patch_connect.commit()
with _patch_da(set()):
worker._reconcile_all()
lr = worker.get_status()["last_run"]
assert lr["orphans_found"] == 1
assert lr["orphans_queued"] == 1
def test_get_status_dry_run_orphans_not_queued_in_stats(dry_run_worker, patch_connect):
patch_connect.add(
Domain(domain="orphan.com", hostname="da1.example.com", username="admin")
)
patch_connect.commit()
with _patch_da(set()):
dry_run_worker._reconcile_all()
lr = dry_run_worker.get_status()["last_run"]
assert lr["dry_run"] is True
assert lr["orphans_found"] == 1
assert lr["orphans_queued"] == 0
def test_get_status_zones_in_db_counted(worker, patch_connect):
for d in ["a.com", "b.com", "c.com"]:
patch_connect.add(Domain(domain=d, hostname="da1.example.com", username="admin"))
patch_connect.commit()
with _patch_da({"a.com", "b.com", "c.com"}):
worker._reconcile_all()
lr = worker.get_status()["last_run"]
assert lr["zones_in_db"] == 3
assert lr["zones_in_da"] == 3
assert lr["orphans_found"] == 0

tests/test_status_api.py (new file, 162 lines)

@@ -0,0 +1,162 @@
"""Tests for directdnsonly.app.api.status — StatusAPI."""
import json
from unittest.mock import MagicMock
import cherrypy
import pytest
from directdnsonly.app.api.status import StatusAPI
from directdnsonly.app.db.models import Domain
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
_RECONCILER_OK = {
"enabled": True,
"alive": True,
"dry_run": False,
"interval_minutes": 60,
"last_run": {},
}
_PEER_SYNC_OFF = {
"enabled": False,
"alive": False,
"peers": [],
"total": 0,
"healthy": 0,
"degraded": 0,
}
def _qs(**overrides):
base = {
"save_queue_size": 0,
"delete_queue_size": 0,
"retry_queue_size": 0,
"dead_letters": 0,
"save_worker_alive": True,
"delete_worker_alive": True,
"retry_worker_alive": True,
"reconciler": _RECONCILER_OK,
"peer_sync": _PEER_SYNC_OFF,
}
base.update(overrides)
return base
def _api(qs=None):
wm = MagicMock()
wm.queue_status.return_value = qs or _qs()
return StatusAPI(wm)
# ---------------------------------------------------------------------------
# _compute_overall
# ---------------------------------------------------------------------------
def test_overall_ok_all_healthy():
assert StatusAPI._compute_overall(_qs()) == "ok"
def test_overall_error_save_worker_dead():
assert StatusAPI._compute_overall(_qs(save_worker_alive=False)) == "error"
def test_overall_error_delete_worker_dead():
assert StatusAPI._compute_overall(_qs(delete_worker_alive=False)) == "error"
def test_overall_degraded_retries_pending():
assert StatusAPI._compute_overall(_qs(retry_queue_size=3)) == "degraded"
def test_overall_degraded_dead_letters():
assert StatusAPI._compute_overall(_qs(dead_letters=1)) == "degraded"
def test_overall_degraded_peer_unhealthy():
ps = {**_PEER_SYNC_OFF, "degraded": 1}
assert StatusAPI._compute_overall(_qs(peer_sync=ps)) == "degraded"
def test_overall_error_takes_priority_over_degraded():
"""error > degraded when both conditions are true."""
assert (
StatusAPI._compute_overall(
_qs(save_worker_alive=False, retry_queue_size=5)
)
== "error"
)
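# Illustrative sketch (assumption): the precedence these tests establish.
# A dead save or delete worker is "error"; pending retries, dead letters or a
# degraded peer only downgrade to "degraded"; otherwise "ok". The tests do not
# pin down how a dead retry worker is classified.
def compute_overall(qs):
    if not qs["save_worker_alive"] or not qs["delete_worker_alive"]:
        return "error"
    if qs["retry_queue_size"] or qs["dead_letters"] or qs["peer_sync"]["degraded"]:
        return "degraded"
    return "ok"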
# ---------------------------------------------------------------------------
# _build — structure and zone count
# ---------------------------------------------------------------------------
def test_build_structure(patch_connect):
api = _api()
result = api._build()
assert "status" in result
assert "queues" in result
assert "workers" in result
assert "reconciler" in result
assert "peer_sync" in result
assert "zones" in result
def test_build_zone_count_zero(patch_connect):
api = _api()
result = api._build()
assert result["zones"]["total"] == 0
def test_build_zone_count_with_domains(patch_connect):
for d in ["a.com", "b.com", "c.com"]:
patch_connect.add(Domain(domain=d, hostname="da1.example.com", username="admin"))
patch_connect.commit()
api = _api()
result = api._build()
assert result["zones"]["total"] == 3
def test_build_queues_forwarded(patch_connect):
api = _api(_qs(save_queue_size=2, delete_queue_size=1, retry_queue_size=3, dead_letters=1))
result = api._build()
assert result["queues"]["save"] == 2
assert result["queues"]["delete"] == 1
assert result["queues"]["retry"] == 3
assert result["queues"]["dead_letters"] == 1
def test_build_workers_forwarded(patch_connect):
api = _api()
result = api._build()
assert result["workers"]["save"] is True
assert result["workers"]["delete"] is True
assert result["workers"]["retry_drain"] is True
# ---------------------------------------------------------------------------
# index — JSON encoding
# ---------------------------------------------------------------------------
def test_index_returns_valid_json(patch_connect, monkeypatch):
    api = _api()
    # Stand in for cherrypy.response so index() can set headers outside a real request
    monkeypatch.setattr(cherrypy, "response", MagicMock(headers={}))
    body = api.index()
    data = json.loads(body)
    assert data["status"] == "ok"
    assert isinstance(data["zones"]["total"], int)