Head-to-head: queries per second

Cached query throughput, 10-second runs at a 500,000 QPS cap.

| Clients | rDNS (QPS) | Unbound (QPS) |
|--------:|-----------:|--------------:|
| 10      | 401,752    | 262,187       |
| 50      | 437,434    | 335,813       |
| 100     | 328,340    | 346,334       |
| 200     | 375,192    | 263,109       |
| 500     | 437,365    | 328,564       |

Average latency

| Clients | rDNS  | Unbound | Speedup |
|--------:|------:|--------:|--------:|
| 10      | 34 µs | 317 µs  | 9.3×    |
| 50      | 32 µs | 248 µs  | 7.8×    |
| 100     | 59 µs | 230 µs  | 3.9×    |
| 200     | 57 µs | 302 µs  | 5.3×    |
| 500     | 53 µs | 237 µs  | 4.5×    |

Test environment

| Component | Details |
|-----------|---------|
| CPU       | 24 cores (AMD64) |
| RAM       | 32 GB |
| OS        | Linux 6.6.87 (WSL2) |
| rDNS      | v1.5.0, release build (LTO fat, codegen-units=1, target-cpu=native) |
| Unbound   | 1.19.2-1ubuntu3.7, single-threaded, module-config: iterator |
| Workload  | 100 unique queries (A, AAAA, MX, NS, TXT, NXDOMAIN), all cached |
| Tool      | dnsperf -l 10 -Q 500000 |

Both servers configured as forwarders to 1.1.1.1 with DNSSEC disabled, logging at error-only level.
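The Unbound side of that setup can be approximated with a config along these lines (the exact file used is not shown in the source, so treat the specifics as assumptions):

```
server:
    num-threads: 1
    # "iterator" alone drops the validator module, which disables DNSSEC
    module-config: "iterator"
    # 0 = errors only
    verbosity: 0

forward-zone:
    name: "."
    forward-addr: 1.1.1.1
```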

The optimization journey

rDNS started at 29,630 QPS and ended at 437,434, a 14.8× improvement across five rounds of optimization. Each round holds a lesson for high-performance Rust networking work.

v1 → v2: Concurrency (+216%)

The original UDP listener processed queries sequentially. Each query blocked the socket while waiting for upstream resolution.

Fix: Spawn a Tokio task per incoming query. A forwarder connection pool multiplexes queries over a single connected UDP socket, dispatching each response to its waiting task through a oneshot channel.
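The dispatch pattern is a map of pending queries keyed by DNS message ID, resolved by the upstream socket's reader loop. rDNS does this with Tokio tasks and `tokio::sync::oneshot`; the sketch below shows the same pattern with std threads and channels so it is self-contained:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Pending queries keyed by DNS message ID. In rDNS this maps IDs to
// oneshot senders; std channels stand in here.
type Pending = Arc<Mutex<HashMap<u16, Sender<Vec<u8>>>>>;

fn demo() -> Vec<u8> {
    let pending: Pending = Arc::new(Mutex::new(HashMap::new()));
    let (to_upstream, from_clients) = channel::<(u16, Vec<u8>)>();

    // Stand-in for the connected upstream socket's reader loop: it
    // matches each "response" to the waiter registered under its ID.
    let reader_pending = Arc::clone(&pending);
    thread::spawn(move || {
        for (id, payload) in from_clients {
            if let Some(waiter) = reader_pending.lock().unwrap().remove(&id) {
                let _ = waiter.send(payload);
            }
        }
    });

    // Per-query path: register a waiter, forward the query, await the reply.
    let (resp_tx, resp_rx) = channel();
    pending.lock().unwrap().insert(0x1234, resp_tx);
    to_upstream.send((0x1234, b"answer".to_vec())).unwrap();
    resp_rx.recv().unwrap()
}
```

One connected socket plus ID-based dispatch is what lets thousands of in-flight queries share a single upstream path without per-query sockets.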

v2 → v3: Reduce allocations (−9%, better scaling)

Spawning a task per query added clone overhead and full Message::decode → encode round trips.

Fix: Sync fast-path for cache hits, authoritative answers, and RPZ blocks — inline in the recv loop. Wire-format fast parser. Direct wire encode with TTL adjustment.
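The core of the fast path is that a cached answer is already valid wire format, so a hit only needs a few bytes patched rather than a decode/encode cycle. A minimal sketch (function name hypothetical; the real code also rewrites TTLs in place, omitted here):

```rust
/// Cache-hit fast path (sketch): serve the cached wire-format response
/// directly, copying in the incoming query's 2-byte transaction ID
/// instead of doing a full Message::decode -> encode round trip.
fn respond_from_cache(query: &[u8], cached: &[u8]) -> Option<Vec<u8>> {
    // Anything shorter than the 12-byte DNS header goes to the slow path.
    if query.len() < 12 || cached.len() < 12 {
        return None;
    }
    let mut out = cached.to_vec();
    out[0] = query[0]; // transaction ID, high byte
    out[1] = query[1]; // transaction ID, low byte
    Some(out)
}
```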

v3 → v4: Faster cache (+264%)

DashMap's coarse sharding and get_mut write-lock-on-hit became the bottleneck.

Fix: Custom 256-shard cache on parking_lot::RwLock. Cache hits take read locks only. LTO fat, single codegen unit, target-cpu native in the release profile.
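The sharding idea: hash the key to pick one of 256 independently locked maps, so writers contend only within a shard and hits never take a write lock. A self-contained sketch using `std::sync::RwLock` (rDNS uses `parking_lot::RwLock`, which avoids poisoning and is faster under contention):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

const SHARDS: usize = 256;

/// 256-shard cache: a key's hash selects one of 256 independently
/// locked maps. A cache hit takes a read lock only, unlike a
/// get_mut-style write-lock-on-hit path.
struct ShardedCache {
    shards: Vec<RwLock<HashMap<String, Vec<u8>>>>,
}

impl ShardedCache {
    fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    fn shard(&self, key: &str) -> &RwLock<HashMap<String, Vec<u8>>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[h.finish() as usize % SHARDS]
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        // Read lock only: concurrent hits on the same shard don't serialize.
        self.shard(key).read().unwrap().get(key).cloned()
    }

    fn insert(&self, key: String, value: Vec<u8>) {
        self.shard(&key).write().unwrap().insert(key, value);
    }
}
```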

v4 → v5: Eliminate socket contention (+41%)

Multiple workers sharing one socket contended in the kernel on that socket's single receive buffer.

Fix: SO_REUSEPORT — separate socket per worker on the same port. Kernel distributes packets by flow hash. SO_RCVBUF increased to 4 MB.

Reproduce it

```sh
# Build with native CPU optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Install tools
sudo apt-get install -y dnsperf unbound

# Run the benchmark suite
bash bench/run.sh

# Or manually:
./target/release/rdns -c bench/rdns-bench.toml &
dnsperf -s 127.0.0.1 -p 5553 -d bench/queryfile.txt -c 50 -l 10 -Q 500000
```

Notes

  • Single-client performance is lower than Unbound because SO_REUSEPORT distributes by flow hash — one source, one worker. Not a realistic production scenario.
  • Unbound was tested with num-threads: 1 (its default).
  • These benchmarks measure cached query throughput only. Cold-cache depends on upstream latency.
  • Results vary by hardware, kernel, and system load.

Get rDNS running in 60 seconds.

Single static binary. TOML config. MIT licensed. Linux, FreeBSD, and macOS.