docs/proposals/shuffle-sharding-and-zone-awareness.md
Cortex shards the received series across all available ingesters. In a multi-tenant cluster, each tenant series are sharded across all ingesters. This allows to horizontally scale the series across the pool of ingesters but also suffers some issues:
Cortex currently supports sharding a tenant to a subset of the ingesters on the write path PR, using a feature called “subring”. However, the current subring implementation suffers two issues:
The goal of this work is to fix “shuffling” and “zone-awareness” when building the subring for a given tenant, honoring the following properties:
This proposal is based on Amazon’s Shuffle Sharding article and the algorithm has been inspired by shuffle sharding implementation in the AWS Route53 infima library.
Given a tenant and a shard size S (number of instances to which tenant data/workload should be sharded to), we build a subring selecting N instances from each zone, where N = ceil(S / num of zones). The shard size S is required to be a multiple of the number of zones, in order to select an equal number of instances from each zone.
To do it, we treat each zone as a separate ring and select N unique instances from each zone. The instances selection process works as follow:
The same tenant ID always generates the same seed. Given the same seed, the pseudo number random generator always generates the same sequence of numbers.
This guarantees that, given the same ring, we generate the same exact subring for a given tenant.
The consistency property is honored by two aspects of the algorithm:
Let’s consider an initial ring with 3 instances and 1 zone (for simplicity):
With a replication factor = 2, the random sequence looks up:
Then we add a new instance and the updated ring is:
Now, let’s compare two different algorithms to solve collisions:
Random sequence = 3 (I4), 6 (I4 - collision), 12 (I3)
all instances are different (I4, I3)
Random sequence = 3 (I4), 6 (I4 - collision, next is I1)
only 1 instance is different (I4, I1)
Unless when resolving collisions, the algorithm doesn’t walk the ring to find the next instances, but uses a sequence of random numbers. This guarantees instances are shuffled, between different tenants, when building the subring.
We treat each zone as a separate ring and select an equal number of instances from each zone. This guarantees a fair balance of instances between zones.
We’ve built a reference implementation of the proposed algorithm, to test the properties described above.
In particular, we’ve observed that the actual distribution of matching instances between different tenants is very close to the theoretical one, as well as consistency and stability properties are both honored.