# Benchmarks

**437,434 queries per second. 32-microsecond latency.**
rDNS was measured against Unbound 1.19.2 on the same hardware, with the same workload, using dnsperf 2.14. Everything here is reproducible.
## Head-to-head: queries per second

Cached query throughput, 10-second runs at a 500 K QPS cap.

### Average latency
| Clients | rDNS | Unbound | Speedup |
|---|---|---|---|
| 10 | 34 µs | 317 µs | 9.3× |
| 50 | 32 µs | 248 µs | 7.8× |
| 100 | 59 µs | 230 µs | 3.9× |
| 200 | 57 µs | 302 µs | 5.3× |
| 500 | 53 µs | 237 µs | 4.5× |
## Test environment

| Component | Value |
|---|---|
| CPU | 24 cores (AMD64) |
| RAM | 32 GB |
| OS | Linux 6.6.87 (WSL2) |
| rDNS | v1.5.0, release build (fat LTO, codegen-units=1, target-cpu=native) |
| Unbound | 1.19.2-1ubuntu3.7, single-threaded, module-config: iterator |
| Workload | 100 unique queries (A, AAAA, MX, NS, TXT, NXDOMAIN), all cached |
| Tool | `dnsperf -l 10 -Q 500000` |
Both servers were configured as forwarders to 1.1.1.1, with DNSSEC disabled and error-only logging.
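For reference, a minimal Unbound configuration consistent with the settings above might look like the following. The option names are real Unbound directives; the listen address and port are illustrative, not taken from the benchmark scripts.

```conf
# Unbound as a single-threaded forwarder (sketch).
server:
    num-threads: 1
    verbosity: 0               # errors only
    module-config: "iterator"  # no validator module => DNSSEC validation off
    interface: 127.0.0.1
    port: 5533                 # illustrative; pick any free port

forward-zone:
    name: "."
    forward-addr: 1.1.1.1
```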
## The optimization journey

rDNS started at 29,630 QPS and finished at 437,434, a 14.8× improvement over four optimization rounds. Each round is a case study for high-performance Rust networking work.
### v1 → v2: Concurrency (+216%)
The original UDP listener processed queries sequentially. Each query blocked the socket while waiting for upstream resolution.
Fix: spawn a Tokio task per incoming query. The forwarder connection pool multiplexes queries over one connected UDP socket and dispatches responses through oneshot channels.
### v2 → v3: Reduce allocations (−9%, better scaling)
Spawning a task per query added clone overhead and a full `Message::decode` → `encode` round trip for every response.

Fix: a synchronous fast path for cache hits, authoritative answers, and RPZ blocks, inlined in the recv loop, plus a wire-format fast parser and direct wire encoding with TTL adjustment.
### v3 → v4: Faster cache (+264%)

`DashMap`'s coarse sharding and its write-lock-on-hit `get_mut` path became the bottleneck.

Fix: a custom 256-shard cache built on `parking_lot::RwLock`; cache hits take read locks only. Release profile: fat LTO, a single codegen unit, and `target-cpu=native`.
### v4 → v5: Eliminate socket contention (+41%)

Multiple workers sharing one socket contended in the kernel on the receive buffer.

Fix: `SO_REUSEPORT`, giving each worker its own socket on the same port; the kernel distributes incoming packets across the sockets by flow hash. `SO_RCVBUF` was raised to 4 MB.
## Reproduce it

```bash
# Build with native CPU optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release

# Install tools
sudo apt-get install -y dnsperf unbound

# Run the benchmark suite
bash bench/run.sh

# Or manually:
./target/release/rdns -c bench/rdns-bench.toml &
dnsperf -s 127.0.0.1 -p 5553 -d bench/queryfile.txt -c 50 -l 10 -Q 500000
```
## Notes

- Single-client performance is lower than Unbound's because `SO_REUSEPORT` distributes packets by flow hash: one source means one worker. A single client is not a realistic production scenario.
- Unbound was tested with `num-threads: 1` (its default).
- These benchmarks measure cached query throughput only. Cold-cache performance depends on upstream latency.
- Results vary by hardware, kernel, and system load.
Get rDNS running in 60 seconds.
Single static binary. TOML config. MIT licensed. Linux, FreeBSD, and macOS.