docs/rfcs/20220207_connections_per_ip.md
Add per-client state to track and limit the number of connections open.
Similar to KIP-308, adapted to Redpanda's thread per core model.
New configuration properties:
New Prometheus metric:
When an accepted connection exceeds one of the configured bounds, it is dropped before any bytes are read from or written to the socket.
If both configuration properties are unset, then no action is taken: this preserves the existing unlimited behaviour.
A sharded service stores per-client tokens for open connections; clients are mapped to shards by a hash of their IP.
In kafka::protocol::apply, acquire a token at the start and relinquish it at the end, probably using an RAII structure, although this function already provides a convenient localized place to take and release the resource explicitly.
On first connection from a client, the shard servicing the connection makes a cross-shard call to the core that holds state for this client IP hash. This initializes the client's state:
```cpp
struct client_conn_quota {
    // The max connections for this client. This is set by applying
    // the combination of the per_ip and per_ip_override settings.
    uint32_t total;

    // How many connections are currently open. This may be at
    // most `total`, although it may exceed it transiently if
    // configuration properties are decreased during operation.
    uint32_t in_use;
};
```
The connection quota state is stored in an absl::btree_map: the overall number of unique clients may be high (so a flat store is risky), while the entries are small (so a std::map would generate a lot of tiny allocations).
In the case where the client is not close to exceeding its limits, we can avoid the overhead of a cross-core call by caching some per-client tokens on each shard.
This cache can be pre-primed on the first client connection, if the total tokens for the client is greater than the core count: that way, subsequent connections from the same client will usually be accepted without the need for a cross-core call.
Changes to the configuration properties affect all outstanding
client_conn_quota objects: the total attribute must be recalculated.
If total is decreased, then it is also necessary to call out to all other
shards and revoke any cached tokens that are held in excess of the new limit.
When this feature is enabled, a per-connection latency overhead is added, as the shard handling the connection must dispatch a call to the shard that owns the client IP state. A cross-core call can take up to a few microseconds, although the impact on P99 latency can be worse if the target core is very busy.
On a system where the connection count limit is significantly larger than the core count, this overhead will be avoided in most cases, as each core will store some locally cached tokens to allow them to accept connections.
Mitigating factors:
An application may exhaust server resources by opening a very large number of connections (e.g. by instantiating many Kafka clients). This can be mitigated in environments where the application design is tightly controlled and the server cluster is sized to match, but in real-life environments this is unlikely to be the case, and it is relatively easy for a naively written application to open an unbounded number of Kafka client connections. For example, consider a web service that opens a Kafka connection for each request it serves and does not limit its own concurrent request count.
To protect Redpanda from resource exhaustion, it is not necessary to discriminate between clients: we could just enforce a single overall connection count.
This prevents Redpanda from having its resources exhausted, but does not help preserve availability of the system: one misbehaving client can crowd out other clients.
We could avoid some complexity and latency by setting a per-core connection limit, rather than a total connection limit across all cores. This would enable all cores to apply their limits using local state only.
This is reasonable at small core counts, but tends to fall down at larger ones: consider a 128-core system, where a per-core limit generous enough to tolerate reasonable applications (e.g. 10 connections) would result in a total limit of 1280, much higher than the user wanted.
In many environments, the client IP is not a meaningful discriminator between workloads -- workloads are more likely to be identifiable by their service account username.
Strictly speaking, it is not possible to limit connections by username, because we have to accept a connection and receive some bytes to learn the username, but it could still be useful to drop connections immediately after authentication if a user has too many connections already.
Adding per-user limits in future is not ruled out, but the scope of this RFC is limited to the more basic IP-based limits, to get parity with KIP-308.
Where the connection count for a particular source IP has been exhausted, and/or was set to zero, we could more efficiently drop connections by configuring the kernel to drop them for us, rather than doing it in userspace. This kind of efficiency is important in filtering systems that guard against DDoS situations.
The connection count limits in this RFC are primarily meant to protect against misbehaving (but benign) applications rather than active attack, so the overhead of enforcing limits in userspace is acceptable. Avoiding kernel mechanisms also helps with portability (including to older enterprise Linux distros) and avoids potential permissions issues in locked-down environments that might forbid userspace processes from using some kernel-space mechanisms.
All that said, extending our userspace enforcement with lower level mechanisms in future could be a worthwhile optimization. In Kubernetes environments, we would need to consider whether to implement this type of enforcement within Redpanda pods, or within the Kubernetes network stack where external IPs are exposed.