docs/401-ratelimit.md
This document describes the per-source rate-limit on 401 Unauthorized
responses. It covers what the feature defends
against, how it is implemented, how to operate it, expected performance, and
its known weaknesses.
TURN/STUN long-term authentication is a challenge/response flow: the first
request from a client arrives without valid credentials, and the server
answers with 401 Unauthorized carrying a REALM and a NONCE. This is
normal — the legitimate client then retries with credentials derived from the
nonce.
The problem is on UDP. UDP source addresses are trivially spoofable, and a
401 response is larger than the request that triggers it (it carries
REALM, NONCE, SOFTWARE, and message-integrity material). An attacker can
therefore:
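- reflect: send requests whose spoofed source is the victim's address, so the
  server aims its 401 responses at the victim, and
- amplify: keep each request smaller than the resulting 401, multiplying
  the attacker's outbound bandwidth at the victim.

The server is an unwitting reflector/amplifier. The mitigation is to cap how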
many 401 responses the server will emit toward any single source address per
unit time. Past the cap, the server simply stays silent — it does not send the
401, denying the attacker both the reflection and the amplification.
The feature is off by default and opt-in, because suppressing 401
responses changes a protocol-visible behavior and only matters for operators
exposed to UDP reflection abuse.
When enabled, for every request that would produce a 401:
- Only UDP client sockets are considered. TCP/TLS can't be spoofed for
  reflection (the handshake forces a real return path), so they are never
  rate-limited.
- While the source is under the limit, the 401 is sent as normal.
- Once the source is over the limit, no_response is set: the 401 is silently
  suppressed for the rest of the window.

Counting is consume-on-401: only requests that actually result in a 401
spend a token. Successful or otherwise-errored requests don't touch the table.
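
The decision flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in handle_turn_command: the transport enum, the sketch_limiter stand-in for the rate-limit table, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical transport tag; the real server has its own socket types. */
typedef enum { SKETCH_UDP, SKETCH_TCP, SKETCH_TLS } sketch_transport;

/* Stand-in for the rate-limit table: counts how often it is consulted and
 * reports "over the limit" once more than `limit` requests have been seen. */
typedef struct {
  unsigned seen;
  unsigned limit;
} sketch_limiter;

static bool sketch_consume(sketch_limiter *l) {
  l->seen++;
  return l->seen > l->limit;
}

/* Returns true when the 401 should be suppressed (the no_response case). */
static bool sketch_suppress_401(bool enabled, sketch_transport transport,
                                sketch_limiter *l) {
  assert(l != NULL);
  if (!enabled || transport != SKETCH_UDP) {
    return false; /* 401 sent as normal; the limiter is not consulted */
  }
  return sketch_consume(l); /* consume-on-401: only this path spends a token */
}
```

In this sketch a TCP or TLS request, or any request while the feature is disabled, returns before the limiter is touched, which mirrors the consume-on-401 rule.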
| File | Role |
|---|---|
| src/ns_turn_atomic.h | Portable 32-bit atomics (load/store/fetch_add/CAS) over C11 <stdatomic.h> and, on MSVC, the Interlocked* intrinsics. |
| src/server/ns_turn_ratelimit.h / .c | The lock-free rate-limit table and its two entry points. |
| src/server/ns_turn_server.c | The consume call site inside handle_turn_command. |
| src/apps/relay/mainrelay.c | CLI flags, defaults, and one-time ratelimit_init(). |
| src/apps/relay/prom_server.c | Prometheus counters for UDP 401 decisions. |
| examples/run_tests_ratelimit_401.sh | End-to-end positive/negative system test. |
```c
#define RATELIMIT_BUCKETS 4096u // power of two

typedef struct {
  turn_atomic_u32 tag;              // hash of source IP (port stripped); 0 = empty
  turn_atomic_u32 window_start;     // turn_time() when the current window opened
  turn_atomic_u32 count;            // requests counted in this window
  turn_atomic_u32 logged;           // 1 once a drop has been logged this window
  turn_atomic_u32 collision_logged; // 1 once a collision has been logged this window
} ratelimit_bucket;

static ratelimit_bucket ratelimit_table[RATELIMIT_BUCKETS];
```
A single statically-allocated, zero-initialized table of 4096 buckets, 20
bytes each (~80 KiB resident). No malloc/free on the hot path, no growth,
no eviction list.
It is a direct-mapped structure: bucket = hash(addr) & (RATELIMIT_BUCKETS-1).
There is exactly one active budget per bucket. An address that collides with
an unexpired bucket shares that existing budget; it cannot replace the bucket
owner and get a fresh response allowance.
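
A minimal sketch of the direct-mapped lookup, assuming a 32-bit IPv4 address as input. The mixing function here (the widely used murmur3 32-bit finalizer) and the remapping of a zero hash to 1 are assumptions chosen for illustration; the hash actually used in ns_turn_ratelimit.c may differ. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BUCKETS 4096u /* power of two, mirrors RATELIMIT_BUCKETS */

/* Assumed 32-bit mixer (murmur3 finalizer); not necessarily the real hash. */
static uint32_t sketch_mix32(uint32_t x) {
  x ^= x >> 16;
  x *= 0x85ebca6bu;
  x ^= x >> 13;
  x *= 0xc2b2ae35u;
  x ^= x >> 16;
  return x;
}

/* Tag for an IPv4 address. A zero hash is remapped (assumption) so that
 * 0 can keep meaning "empty bucket". */
static uint32_t sketch_tag(uint32_t ipv4_addr) {
  uint32_t h = sketch_mix32(ipv4_addr);
  return h != 0 ? h : 1u;
}

/* Direct-mapped index: masking works because the bucket count is a power
 * of two. */
static uint32_t sketch_bucket_index(uint32_t tag) {
  assert((SKETCH_BUCKETS & (SKETCH_BUCKETS - 1u)) == 0);
  return tag & (SKETCH_BUCKETS - 1u);
}
```

Two addresses collide when their tags agree in the low 12 bits; the full tag stored in the bucket is what lets the table tell an owner from a collider.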
- For IPv4, the tag is a hash of sin_addr.s_addr (the port is ignored).
- A live bucket's tag is never 0, so 0 can mean "empty bucket".

ratelimit_consume_address(addr, max_per_sec, &first_drop, &first_collision)
returns true when the current request is over the limit (caller should
suppress). The window is a fixed 1 second, so max_per_sec is the allowed
number of responses per source per second:
1. Read now = (uint32_t)turn_time().
2. Load the bucket's tag and window_start.
3. If the bucket is empty or its window has expired
   (now - window_start >= 1): atomically store a fresh window_start,
   clear the log latches, set count = 1, and finally store the new tag.
   Returns false (this request is the first in a fresh window).
4. If the live bucket belongs to a different address, the request shares that
   bucket's budget; the first such request in the window sets
   first_collision for one bounded diagnostic log line.
5. fetch_add(count, 1) returns the pre-increment
   value prev. If prev < max, allow (false). Otherwise this is the
   (max+1)-th request or later: it's over the limit, return true.
6. On a drop, attempt CAS(logged, 0 -> 1).
   The single winner of that CAS gets *first_drop = true; everyone else in
   the window is silent.

There is no mutex. All bucket fields are sequentially-consistent atomics, and the design accepts small, bounded races by construction rather than locking them out:
- A window reset is several separate stores, so a concurrent request can
  interleave with it; the bucket already holds count == 1 by the time the tag store lands. Worst case the
  effective count is off by a request or two at a window boundary.

This is acceptable because the goal is coarse abuse mitigation, not exact accounting. Most importantly, active collisions share a budget instead of granting additional reflected responses.
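
The consume steps described above can be sketched with plain C11 atomics. This is a simplified illustration, not the contents of ns_turn_ratelimit.c: it takes a precomputed non-zero tag and an explicit now value instead of an address and turn_time(), and the use of a CAS on collision_logged to pick the single collision reporter is an assumption. All sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TABLE_SIZE 4096u /* power of two */

typedef struct {
  _Atomic uint32_t tag;          /* 0 = empty */
  _Atomic uint32_t window_start; /* second in which the window opened */
  _Atomic uint32_t count;
  _Atomic uint32_t logged;
  _Atomic uint32_t collision_logged;
} sketch_bucket;

static sketch_bucket sketch_table[SKETCH_TABLE_SIZE];

/* Returns true when the request is over the limit (caller suppresses). */
static bool sketch_consume_tag(uint32_t tag, uint32_t now, uint32_t max_per_sec,
                               bool *first_drop, bool *first_collision) {
  assert(tag != 0 && max_per_sec > 0);
  *first_drop = false;
  *first_collision = false;
  sketch_bucket *b = &sketch_table[tag & (SKETCH_TABLE_SIZE - 1u)];

  uint32_t owner = atomic_load(&b->tag);
  uint32_t start = atomic_load(&b->window_start);

  if (owner == 0 || now - start >= 1u) {
    /* Empty or expired: open a fresh window, publishing the tag last. */
    atomic_store(&b->window_start, now);
    atomic_store(&b->logged, 0);
    atomic_store(&b->collision_logged, 0);
    atomic_store(&b->count, 1);
    atomic_store(&b->tag, tag);
    return false;
  }

  if (owner != tag) {
    /* Live collision: share the owner's budget, report once per window. */
    uint32_t unreported = 0;
    if (atomic_compare_exchange_strong(&b->collision_logged, &unreported, 1)) {
      *first_collision = true;
    }
  }

  uint32_t prev = atomic_fetch_add(&b->count, 1);
  if (prev < max_per_sec) {
    return false;
  }

  /* Over the limit: exactly one caller per window wins the log latch. */
  uint32_t unlogged = 0;
  if (atomic_compare_exchange_strong(&b->logged, &unlogged, 1)) {
    *first_drop = true;
  }
  return true;
}
```

With max_per_sec = 2, the first two calls for a tag in the same second return false, the third returns true with *first_drop set, and a different tag that maps to the same bucket is also suppressed rather than given a fresh allowance.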
The earlier shim typed the on/off flag as bool instead of bool * in
init_turn_server(), which truncated the parameter pointer and left the
feature effectively always-on. The fix makes both tunables pointers into
turn_params (so every relay thread sees live CLI values without per-thread
copies) and centralizes the atomic primitives in
src/ns_turn_atomic.h. That header gates on
_MSC_VER (not the project WINDOWS macro) because only MSVC lacks usable C11
atomics — MinGW is a GCC toolchain and takes the <stdatomic.h> path. The
Interlocked* intrinsics and the non-explicit C11 atomics are both
sequentially consistent, so callers never reason about per-platform ordering.
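
A rough sketch of what such a portable shim can look like, gated on _MSC_VER as described. The shim_ names are hypothetical and are not the identifiers in src/ns_turn_atomic.h; the Interlocked* calls and the C11 generic functions are used with their standard signatures.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(_MSC_VER)
/* MSVC path: Interlocked* intrinsics operate on volatile LONG (32-bit). */
#include <windows.h>
typedef volatile LONG shim_atomic_u32;

static inline uint32_t shim_load(shim_atomic_u32 *p) {
  /* Compare-exchange of 0 with 0 never changes the value; it returns it. */
  return (uint32_t)InterlockedCompareExchange(p, 0, 0);
}
static inline void shim_store(shim_atomic_u32 *p, uint32_t v) {
  (void)InterlockedExchange(p, (LONG)v);
}
static inline uint32_t shim_fetch_add(shim_atomic_u32 *p, uint32_t v) {
  return (uint32_t)InterlockedExchangeAdd(p, (LONG)v);
}
/* Returns true if *p was `expected` and has been replaced by `desired`. */
static inline bool shim_cas(shim_atomic_u32 *p, uint32_t expected, uint32_t desired) {
  return InterlockedCompareExchange(p, (LONG)desired, (LONG)expected) == (LONG)expected;
}
#else
/* Every other toolchain, MinGW included: C11 <stdatomic.h>. */
#include <stdatomic.h>
typedef _Atomic uint32_t shim_atomic_u32;

static inline uint32_t shim_load(shim_atomic_u32 *p) { return atomic_load(p); }
static inline void shim_store(shim_atomic_u32 *p, uint32_t v) { atomic_store(p, v); }
static inline uint32_t shim_fetch_add(shim_atomic_u32 *p, uint32_t v) {
  return atomic_fetch_add(p, v);
}
static inline bool shim_cas(shim_atomic_u32 *p, uint32_t expected, uint32_t desired) {
  return atomic_compare_exchange_strong(p, &expected, desired);
}
#endif
```

Both branches return the pre-operation value from fetch_add and a success flag from CAS, so callers can be written once against the four primitives.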
| Flag | Default | Meaning |
|---|---|---|
| --unauthorized-ratelimit | off | Enable per-source 401 rate-limiting on UDP. |
| --unauthorized-ratelimit-rps=<count> | 10 | Max 401 responses per source IP per second. |
A non-positive threshold is rejected with a warning and falls back to the default. The default (10 per second) is well above a single legitimate client's challenge/retry rate, so normal traffic from individual clients is not affected; many clients sharing one egress IP are the exception.
Example:

```sh
turnserver --use-auth-secret --static-auth-secret=secret --realm=north.gov \
  --unauthorized-ratelimit --unauthorized-ratelimit-rps=10
```
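
The fallback rule for a non-positive --unauthorized-ratelimit-rps value can be sketched as below. The function name, the macro, and the warning wording are hypothetical; only the behavior (warn, then use the default of 10) comes from the description above.

```c
#include <stdio.h>

#define SKETCH_DEFAULT_401_RPS 10 /* documented default */

/* Returns the threshold to use. A non-positive request is rejected with a
 * warning (hypothetical wording) and replaced by the default. */
static int sketch_normalize_rps(int requested) {
  if (requested <= 0) {
    fprintf(stderr, "warning: ignoring non-positive 401 rate limit %d, using %d\n",
            requested, SKETCH_DEFAULT_401_RPS);
    return SKETCH_DEFAULT_401_RPS;
  }
  return requested;
}
```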
When the limit is first crossed for a source in a window, the server logs:

```
401 rate-limit exceeded from <ip>, suppressing responses for this window
```

If a different address first collides with an active bucket in a window, the server also logs one diagnostic line for that bucket and window:

```
401 rate-limit bucket collision from <ip>, sharing active bucket budget for this window
```
When Prometheus is enabled, these metrics describe the UDP 401 reflection
surface:
| Metric | Type | Meaning |
|---|---|---|
| turn_unauthenticated_401_requests | counter | Requests that required a UDP 401 response. |
| turn_unauthenticated_401_responses | counter | UDP 401 responses emitted. |
| turn_unauthenticated_401_dropped_responses | counter | UDP 401 responses suppressed by this mitigation. |
A second group describes the health of the bucket table itself:
| Metric | Type | Meaning |
|---|---|---|
| turn_ratelimit_hash_collisions | counter | Total requests whose source hashed to a bucket already owned by a different live address. A rising rate means distinct sources are sharing budgets, which is the false-positive surface for the mitigation. |
| turn_ratelimit_occupied_buckets | gauge | Buckets currently holding a live (non-expired) window. |
| turn_ratelimit_total_buckets | gauge | Table capacity in buckets (the compile-time constant). |
turn_ratelimit_occupied_buckets / turn_ratelimit_total_buckets is the table
utilization; as it approaches 1, the birthday-paradox collision probability
climbs, so a sustained high ratio (or a climbing turn_ratelimit_hash_collisions
rate) is the signal to enlarge RATELIMIT_BUCKETS. The collision counter and the
occupancy gauge are refreshed lazily when Prometheus scrapes /metrics: the
collision counter is a single atomic incremented only on the collision branch,
and occupancy is a one-pass scan of the table performed at scrape time, so
neither adds cost to the common request path.
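
The scrape-time occupancy scan can be sketched as a single pass over the table. This is an illustration with hypothetical names (scan_bucket, scan_table, scan_occupied, scan_utilization) and a reduced bucket layout; it assumes a bucket is live when it has a non-zero tag and its 1-second window has not expired.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SCAN_BUCKETS 4096u

/* Reduced bucket: only the two fields the scan needs. */
typedef struct {
  _Atomic uint32_t tag;          /* 0 = empty */
  _Atomic uint32_t window_start; /* second in which the window opened */
} scan_bucket;

static scan_bucket scan_table[SCAN_BUCKETS];

/* Count buckets that hold a live (non-expired) window at time `now`. */
static uint32_t scan_occupied(uint32_t now) {
  uint32_t live = 0;
  for (uint32_t i = 0; i < SCAN_BUCKETS; i++) {
    uint32_t tag = atomic_load(&scan_table[i].tag);
    uint32_t start = atomic_load(&scan_table[i].window_start);
    if (tag != 0 && now - start < 1u) {
      live++;
    }
  }
  return live;
}

/* Utilization in [0, 1]: occupied buckets over table capacity. */
static double scan_utilization(uint32_t now) {
  return (double)scan_occupied(now) / (double)SCAN_BUCKETS;
}
```

If hashes are uniformly distributed, utilization is also roughly the probability that a previously unseen source lands on a bucket that is already live, which is why the ratio works as an early warning for collisions.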
The feature is built to be effectively free on the data path:
- The table is consulted only on the 401 branch, so
  it adds nothing to authenticated relay traffic (the throughput the load tests
  in CLAUDE.md measure).
- Memory is a single static ratelimit_table of 4096 * 20 B = 80 KiB (five
  32-bit atomic fields per bucket), fixed for the life of the process and
  shared across all relay threads.

No microbenchmark numbers are committed for this path; the cost is dominated by
the existing 401 message construction it guards, not by the table operation.
The DigitalOcean load-test harness in CLAUDE.md measures relay
throughput, which this feature does not touch.
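- UDP only. TCP/TLS 401s are never rate-limited. That is
  correct for the reflection threat (those transports can't be spoofed), but it
  means this is not a general brute-force-auth throttle.
- Shared buckets. Sources that hash to the same bucket share one budget, so
  heavy traffic from one source can suppress a colliding source's 401s.
- Fixed table size. The bucket count is a compile-time constant
  (RATELIMIT_BUCKETS); there is no runtime sizing. A very large, highly-distributed
  spoof set will cycle buckets faster, but the table never grows.
- Per-process state. The table lives in a single turnserver
  process and resets on restart. There is no coordination across a cluster of
  servers behind a load balancer; each instance rate-limits independently.
- Shared egress IPs. If many legitimate clients behind one address (for
  example a NAT) together exceed --unauthorized-ratelimit-rps, legitimate
  challenges could be suppressed. Keep the threshold comfortably above
  aggregate legitimate challenge rates for shared egress IPs.
- Coarse clock. The window is driven by turn_time() at
  1-second granularity stored in 32 bits; this matches the per-second limit but
  is not suitable for sub-second limiting.

examples/run_tests_ratelimit_401.sh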
runs two end-to-end cases against a real turnserver with bad credentials:
- Positive (--unauthorized-ratelimit-rps=1): a single turnutils_uclient
  session retries the 401 challenge enough times to cross the threshold, so
  exactly one 401 rate-limit exceeded line must appear.
- Negative (--unauthorized-ratelimit-rps=100000): the same traffic stays
  far below the threshold, so the line must not appear.

It is split out of run_tests.sh so the rate-limit server fixture can't mask
or be masked by the protocol suite's flags. It is skipped on macOS (loopback
UDP relay is intermittently lossy there, making the log-line accounting flaky);
Linux CI is the canonical target.
tests/test_ratelimit.c finds a colliding source address and verifies that a live collision remains suppressed and emits its collision signal only once. examples/run_tests_prom.sh drives a low-limit unauthorized flow and verifies all three Prometheus counters are non-zero in Linux CI.