Head-to-head: queries per second

Cached query throughput, 10-second runs at a 500,000 QPS cap.

| Clients | rDNS (QPS) | Unbound (QPS) |
|--------:|-----------:|--------------:|
| 10      | 401,752    | 262,187       |
| 50      | 437,434    | 335,813       |
| 100     | 328,340    | 346,334       |
| 200     | 375,192    | 263,109       |
| 500     | 437,365    | 328,564       |

Average latency

| Clients | rDNS  | Unbound | Speedup |
|--------:|------:|--------:|--------:|
| 10      | 34 µs | 317 µs  | 9.3×    |
| 50      | 32 µs | 248 µs  | 7.8×    |
| 100     | 59 µs | 230 µs  | 3.9×    |
| 200     | 57 µs | 302 µs  | 5.3×    |
| 500     | 53 µs | 237 µs  | 4.5×    |

Test environment

| Component | Details |
|-----------|---------|
| CPU       | 24 cores (AMD64) |
| RAM       | 32 GB |
| OS        | Linux 6.6.87 (WSL2) |
| rDNS      | v1.5.0, release build (LTO fat, codegen-units=1, target-cpu=native) |
| Unbound   | 1.19.2-1ubuntu3.7, single-threaded, module-config: iterator |
| Workload  | 100 unique queries (A, AAAA, MX, NS, TXT, NXDOMAIN), all cached |
| Tool      | dnsperf -l 10 -Q 500000 |

Both servers configured as forwarders to 1.1.1.1 with DNSSEC disabled, logging at error-only level.
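The Unbound side of that setup can be approximated with a config along these lines (the exact file used is not shown in the source, so treat the specifics as assumptions):

```
server:
    num-threads: 1
    # "iterator" alone drops the validator module, which disables DNSSEC
    module-config: "iterator"
    # 0 = errors only
    verbosity: 0

forward-zone:
    name: "."
    forward-addr: 1.1.1.1
```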

The optimization journey

rDNS started at 29,630 QPS and ended at 437,434, a 14.8× improvement across five rounds of optimization. Each round holds a lesson for high-performance Rust networking work.

v1 → v2: Concurrency (+216%)

The original UDP listener processed queries sequentially. Each query blocked the socket while waiting for upstream resolution.

Fix: Spawn a Tokio task per incoming query. A forwarder connection pool multiplexes queries over a single connected UDP socket, dispatching each response to its waiting task through a oneshot channel.
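The dispatch pattern is a map of pending queries keyed by DNS message ID, resolved by the upstream socket's reader loop. rDNS does this with Tokio tasks and `tokio::sync::oneshot`; the sketch below shows the same pattern with std threads and channels so it is self-contained:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

// Pending queries keyed by DNS message ID. In rDNS this maps IDs to
// oneshot senders; std channels stand in here.
type Pending = Arc<Mutex<HashMap<u16, Sender<Vec<u8>>>>>;

fn demo() -> Vec<u8> {
    let pending: Pending = Arc::new(Mutex::new(HashMap::new()));
    let (to_upstream, from_clients) = channel::<(u16, Vec<u8>)>();

    // Stand-in for the connected upstream socket's reader loop: it
    // matches each "response" to the waiter registered under its ID.
    let reader_pending = Arc::clone(&pending);
    thread::spawn(move || {
        for (id, payload) in from_clients {
            if let Some(waiter) = reader_pending.lock().unwrap().remove(&id) {
                let _ = waiter.send(payload);
            }
        }
    });

    // Per-query path: register a waiter, forward the query, await the reply.
    let (resp_tx, resp_rx) = channel();
    pending.lock().unwrap().insert(0x1234, resp_tx);
    to_upstream.send((0x1234, b"answer".to_vec())).unwrap();
    resp_rx.recv().unwrap()
}
```

One connected socket plus ID-based dispatch is what lets thousands of in-flight queries share a single upstream path without per-query sockets.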

v2 → v3: Reduce allocations (−9%, better scaling)

Spawning a task per query added clone overhead and full Message::decode → encode round trips.

Fix: Sync fast-path for cache hits, authoritative answers, and RPZ blocks — inline in the recv loop. Wire-format fast parser. Direct wire encode with TTL adjustment.
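The core of the fast path is that a cached answer is already valid wire format, so a hit only needs a few bytes patched rather than a decode/encode cycle. A minimal sketch (function name hypothetical; the real code also rewrites TTLs in place, omitted here):

```rust
/// Cache-hit fast path (sketch): serve the cached wire-format response
/// directly, copying in the incoming query's 2-byte transaction ID
/// instead of doing a full Message::decode -> encode round trip.
fn respond_from_cache(query: &[u8], cached: &[u8]) -> Option<Vec<u8>> {
    // Anything shorter than the 12-byte DNS header goes to the slow path.
    if query.len() < 12 || cached.len() < 12 {
        return None;
    }
    let mut out = cached.to_vec();
    out[0] = query[0]; // transaction ID, high byte
    out[1] = query[1]; // transaction ID, low byte
    Some(out)
}
```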

v3 → v4: Faster cache (+264%)

DashMap's coarse sharding and get_mut write-lock-on-hit became the bottleneck.

Fix: Custom 256-shard cache on parking_lot::RwLock. Cache hits take read locks only. LTO fat, single codegen unit, target-cpu native in the release profile.
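The sharding idea: hash the key to pick one of 256 independently locked maps, so writers contend only within a shard and hits never take a write lock. A self-contained sketch using `std::sync::RwLock` (rDNS uses `parking_lot::RwLock`, which avoids poisoning and is faster under contention):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

const SHARDS: usize = 256;

/// 256-shard cache: a key's hash selects one of 256 independently
/// locked maps. A cache hit takes a read lock only, unlike a
/// get_mut-style write-lock-on-hit path.
struct ShardedCache {
    shards: Vec<RwLock<HashMap<String, Vec<u8>>>>,
}

impl ShardedCache {
    fn new() -> Self {
        Self {
            shards: (0..SHARDS).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    fn shard(&self, key: &str) -> &RwLock<HashMap<String, Vec<u8>>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[h.finish() as usize % SHARDS]
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        // Read lock only: concurrent hits on the same shard don't serialize.
        self.shard(key).read().unwrap().get(key).cloned()
    }

    fn insert(&self, key: String, value: Vec<u8>) {
        self.shard(&key).write().unwrap().insert(key, value);
    }
}
```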

v4 → v5: Eliminate socket contention (+41%)

Multiple workers sharing one socket contended in the kernel on that socket's single receive buffer.

Fix: SO_REUSEPORT — separate socket per worker on the same port. Kernel distributes packets by flow hash. SO_RCVBUF increased to 4 MB.

Reproduce it

```sh
# Build with native CPU optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Install tools
sudo apt-get install -y dnsperf unbound

# Run the benchmark suite
bash bench/run.sh

# Or manually:
./target/release/rdns -c bench/rdns-bench.toml &
dnsperf -s 127.0.0.1 -p 5553 -d bench/queryfile.txt -c 50 -l 10 -Q 500000
```

Notes

  • Single-client performance is lower than Unbound because SO_REUSEPORT distributes by flow hash — one source, one worker. Not a realistic production scenario.
  • Unbound was tested with num-threads: 1 (its default).
  • These benchmarks measure cached query throughput only. Cold-cache depends on upstream latency.
  • Results vary by hardware, kernel, and system load.

Get rDNS running in 60 seconds.

Single static binary. TOML config. MIT licensed. Linux, FreeBSD, and macOS.