Pyroscope v2 uses a sophisticated data distribution algorithm to place profiles across segment-writers. The algorithm ensures that profiles from the same application are co-located while maintaining even load distribution across the cluster.
The distribution algorithm is designed to achieve:

- Locality: profiles from the same application are co-located on the same shards.
- Balance: load is distributed evenly across the cluster.
- Stability: data movement is minimized when the cluster size changes.
The choice of placement for a profile involves a three-step process:
1. Tenant subring selection: a subset of the ring's shards is allocated to the tenant, based on the `tenant_id`.
2. Dataset subring selection: a subset of the tenant subring's shards is allocated to the dataset, based on the `service_name` label.
3. Shard selection: a single shard within the dataset subring is chosen, based on the distribution key.

The following diagram shows an example placement:
{{< mermaid >}}
block-beta
columns 15
shards["ring"]:2
space
shard_0["0"]
shard_1["1"]
shard_2["2"]
shard_3["3"]
shard_4["4"]
shard_5["5"]
shard_6["6"]
shard_7["7"]
shard_8["8"]
shard_9["9"]
shard_10["10"]
shard_11["11"]
tenant["tenant"]:2
space:4
ts_3["3"]
ts_4["4"]
ts_5["5"]
ts_6["6"]
ts_7["7"]
ts_8["8"]
ts_9["9"]
space:2
dataset["dataset"]:2
space:5
ds_4["4"]
ds_5["5"]
ds_6["6"]
ds_7["7"]
space:4
{{< /mermaid >}}
In this example:

- The ring consists of 12 shards (0 through 11).
- The tenant subring consists of 7 shards (3 through 9).
- The dataset subring consists of 4 shards (4 through 7).
Pyroscope uses Jump consistent hash to select positions within each subring. This algorithm ensures:

- Keys are distributed uniformly across the available positions.
- When the number of positions changes, only about `1/n` of the keys are remapped.
This minimizes data re-balancing when the cluster size changes.
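Below is a minimal Go sketch of this selection scheme. The `jumpHash` function is the published Jump consistent hash algorithm (Lamping and Veach, 2014); the `hashKey` helper, the subring sizes, and the wrap-around handling are illustrative assumptions, not Pyroscope's actual implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// jumpHash implements Jump consistent hash (Lamping and Veach, 2014).
// It maps a 64-bit key to a bucket in [0, numBuckets) such that growing
// numBuckets by one remaps only about 1/(numBuckets+1) of all keys.
func jumpHash(key uint64, numBuckets int64) int64 {
	var b, j int64 = -1, 0
	for j < numBuckets {
		b = j
		key = key*2862933555777941757 + 1
		j = int64(float64(b+1) * (float64(int64(1)<<31) / float64((key>>33)+1)))
	}
	return b
}

// hashKey derives a 64-bit key from a string. FNV-1a is an assumption
// made for this sketch; the actual hash function may differ.
func hashKey(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	const ringSize = 12 // total shards, matching the diagram above

	// Hypothetical subring sizes; in practice they come from placement rules.
	const tenantSize, datasetSize = 7, 4

	// Step 1: position of the tenant subring within the ring.
	tenantOffset := jumpHash(hashKey("tenant-a"), ringSize)

	// Step 2: position of the dataset subring within the tenant subring.
	datasetOffset := tenantOffset + jumpHash(hashKey("tenant-a/checkout-service"), tenantSize)

	// Step 3: shard within the dataset subring, chosen by the distribution key.
	shard := (datasetOffset + jumpHash(12345, datasetSize)) % ringSize // wrap around the ring

	fmt.Println("selected shard:", shard)
}
```

Because each position depends only on the key and the subring size, resizing a subring remaps only the minimal number of keys.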
To prevent hot spots where many datasets end up on the same node, shards are mapped to instances through a separate mapping table. This mapping decouples a shard's position in the ring from the segment-writer instance that serves it, so that adjacent shards of a subring can be spread across different instances.
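For illustration, here's a minimal sketch of such a mapping table, assuming a simple round-robin assignment; the round-robin policy is an assumption for this example, not necessarily the strategy Pyroscope uses:

```go
// shardMapping assigns each of the ring's shards to a segment-writer
// instance. Round-robin is an illustrative policy: it guarantees that
// adjacent shards, which often belong to the same subring, are served
// by different instances.
func shardMapping(numShards int, instances []string) map[int]string {
	mapping := make(map[int]string, numShards)
	for shard := 0; shard < numShards; shard++ {
		mapping[shard] = instances[shard%len(instances)]
	}
	return mapping
}
```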
{{< mermaid >}}
graph LR
Distributor==>SegmentWriter
PlacementAgent-.-PlacementRules
SegmentWriter-->|metadata|PlacementManager
SegmentWriter==>|data|Segments
PlacementManager-.->PlacementRules
subgraph Distributor["distributor"]
PlacementAgent
end
subgraph Metastore["metastore"]
PlacementManager
end
subgraph ObjectStore["object store"]
PlacementRules(placement rules)
Segments(segments)
end
subgraph SegmentWriter["segment-writer"]
end
{{< /mermaid >}}
Due to the nature of continuous profiling, data can be distributed unevenly across profile series. To mitigate this:
- Initially, `fingerprint mod n` is used as the distribution key, so that profiles of the same series are co-located.
- If the load becomes unbalanced, the distributor falls back to a `random(n)` distribution.

This adaptive approach handles uneven data distribution while maintaining locality when possible.
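A sketch of how the distributor could choose between the two strategies; the `overloaded` signal is a stand-in assumption for whatever load statistics drive the decision:

```go
package distributor

import "math/rand"

// distributionKey returns the key used in the shard-selection step.
func distributionKey(fingerprint uint64, overloaded bool, n int64) uint64 {
	if overloaded {
		// random(n): trade series locality for an even spread of load.
		return uint64(rand.Int63n(n))
	}
	// fingerprint mod n: keep profiles of the same series together.
	return fingerprint % uint64(n)
}
```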
The Placement Manager runs on the metastore leader and:

- Receives statistics reported by segment-writers along with segment metadata.
- Builds placement rules from these statistics.
- Stores the rules in object storage.
Placement rules are stored in object storage and fetched by distributors. Since actual data re-balancing is not performed, placement rules don't need to be synchronized in real-time.
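To make this concrete, here's a hypothetical shape for the stored rules; the field names and structure are illustrative, not Pyroscope's actual schema:

```go
package placement

// PlacementRules is a hypothetical representation of the rules the
// Placement Manager writes to object storage.
type PlacementRules struct {
	Tenants map[string]TenantPlacement `json:"tenants"`
}

// TenantPlacement sizes the tenant subring and holds per-dataset rules,
// keyed by the service_name label.
type TenantPlacement struct {
	ShardCount int                         `json:"shard_count"`
	Datasets   map[string]DatasetPlacement `json:"datasets"`
}

// DatasetPlacement sizes the dataset subring and records whether the
// distributor should fall back to random(n) distribution for it.
type DatasetPlacement struct {
	ShardCount  int  `json:"shard_count"`
	LoadBalance bool `json:"load_balance"`
}
```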
If a segment-writer fails, the distributor redirects the affected requests to another available shard. As a result, two requests with the same distribution key may occasionally end up in different shards, but this is expected to be rare.
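A sketch of this failover behavior, assuming the distributor simply tries the remaining shards of the subring in order (the retry policy itself is an assumption; only the fact that another shard is tried comes from the text above):

```go
package distributor

import "fmt"

// sendWithFallback forwards a segment to the segment-writer owning the
// primary shard and, on failure, walks the remaining shards of the
// dataset subring until one accepts the request.
func sendWithFallback(subring []int, primary int, send func(shard int) error) error {
	var err error
	for i := 0; i < len(subring); i++ {
		shard := subring[(primary+i)%len(subring)]
		if err = send(shard); err == nil {
			return nil // delivered; a retry may have used a different shard
		}
	}
	return fmt.Errorf("no shard available: %w", err)
}
```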
For detailed implementation information, including the full algorithm specification and shard mapping procedures, refer to the internal documentation.