design/replication-policy.md
# FoundationDB Replication Policy System
FoundationDB uses a composable replication policy system to determine how data replicas are distributed across physical infrastructure. This system enables the database to enforce constraints that ensure fault tolerance, availability, and performance while abstracting details from end users.
Shards, representing key ranges, are replicated across some number of storage servers (the shard's "team"). FDB maintains a "shard map" data structure that maps each shard to its team. More information on these terms can be found in data-distributor-internals.md.
The commit proxy uses the shard map to determine which log servers to send committed data to. Each storage server (SS) has a "buddy": a single corresponding log server, found by taking the SS's ID modulo the number of tLogs in the datacenter.
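The buddy rule above can be sketched in a few lines (the helper name `buddy_tlog` is illustrative, not an actual FDB function):

```python
# Sketch of the buddy rule: a storage server's buddy tLog is its ID
# modulo the number of tLogs in the datacenter.
def buddy_tlog(ss_id: int, num_tlogs: int) -> int:
    return ss_id % num_tlogs

print(buddy_tlog(6, 4))  # storage server 6 with 4 tLogs -> tLog 2
```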
When a batch of transactions is committed, the modified keys logically reside within one or more shards, each of which is physically replicated across the storage servers in its team. Each such SS has a buddy log server, and in the simple case the commit proxy (CP) forwards writes to those logs.
For example, suppose a cluster configured with 3 replicas has 4 log servers and 10 storage servers, and a shard's team resides on storage servers 0, 3, and 6. Given there are 4 log servers total, the destination log servers would be {0%4, 3%4, 6%4}, or {0, 3, 2}.
```
Storage servers: 0 1 2 3 4 5 6 7 8 9
TLogs:           0 1 2 3
```
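Recomputing the worked example above for a whole team (the function name `destination_tlogs` is a hypothetical helper, not FDB code):

```python
# Map each storage server in a shard's team to its buddy tLog and
# collect the distinct destinations.
def destination_tlogs(team: list[int], num_tlogs: int) -> list[int]:
    return sorted({ss % num_tlogs for ss in team})

print(destination_tlogs([0, 3, 6], 4))  # [0, 2, 3]
```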
The "buddy" associations between storage servers and log servers are dynamic: the set of tLogs can change (for example, after a recovery), so the set of destination log servers must be recalculated on each commit.
The resulting team of log servers must satisfy the replication policy. If it does not, additional logs are chosen. This is performed by `selectReplicas`.
The `selectReplicas` function accepts a list of available servers and a list of already-chosen "also" servers (if any). It checks whether the already-chosen servers satisfy the replication policy; if they do not, it selects additional servers at random from the available servers until the policy is met. Because the shard map maintains replicas for all key ranges, the set of "also" servers passed to `selectReplicas` usually already satisfies the policy, but there are exceptions.
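The behavior described above can be sketched as follows. This is a simplified model, not FDB's actual C++ implementation: the policy is reduced to "span k distinct zones," and all names are illustrative.

```python
import random

# Hedged sketch of selectReplicas-style behavior: keep the "also"
# servers, then add random servers until the team spans enough zones.
def select_replicas(policy_zones: int, available: dict[str, str],
                    also: list[str], rng: random.Random) -> list[str]:
    """`available` maps server id -> zone id; `also` is pre-chosen."""
    team = list(also)
    zones = {available[s] for s in team}
    candidates = [s for s in available if s not in team]
    rng.shuffle(candidates)
    for s in candidates:
        if len(zones) >= policy_zones:
            break
        if available[s] not in zones:  # only add servers in new zones
            team.append(s)
            zones.add(available[s])
    if len(zones) < policy_zones:
        raise ValueError("policy cannot be satisfied")
    return team

servers = {"ss0": "z0", "ss1": "z0", "ss2": "z1", "ss3": "z2"}
team = select_replicas(3, servers, ["ss0"], random.Random(0))
print(sorted(team))  # ['ss0', 'ss2', 'ss3']
```

Here the "also" server `ss0` does not satisfy the three-zone policy by itself, so two more servers from new zones are drawn; `ss1` is skipped because it shares a zone with `ss0`.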
Though users interact with only a few predefined redundancy modes (double, triple, etc.), FoundationDB internally supports a rich set of compositional replication rules. These are defined using basic building blocks that can be composed recursively. They are implemented in ReplicationPolicy.cpp as subclasses of IReplicationPolicy.
## Core Syntax Constructs
- `One()` — select a single server.
- `Across(n, attribute, subpolicy)` — select n groups distinguished by `attribute` (e.g. `zone_id`) and apply `subpolicy` within each group.
- `PolicyAnd(p1, p2)` — require that both `p1` and `p2` be satisfied.
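These constructs can be modeled with a small sketch. The class and method names mirror the constructs above but are simplified illustrations, not the actual C++ API in ReplicationPolicy.cpp:

```python
# Minimal model of composable replication policies. Each policy exposes
# validate(servers, attrs), where attrs maps server -> attribute dict.
class One:
    def validate(self, servers, attrs):
        return len(servers) >= 1

class Across:
    def __init__(self, n, attribute, subpolicy):
        self.n, self.attribute, self.subpolicy = n, attribute, subpolicy

    def validate(self, servers, attrs):
        # Group servers by the attribute, then require the subpolicy to
        # hold in at least n distinct groups.
        groups = {}
        for s in servers:
            groups.setdefault(attrs[s][self.attribute], []).append(s)
        ok = sum(1 for g in groups.values() if self.subpolicy.validate(g, attrs))
        return ok >= self.n

class PolicyAnd:
    def __init__(self, p1, p2):
        self.p1, self.p2 = p1, p2

    def validate(self, servers, attrs):
        return self.p1.validate(servers, attrs) and self.p2.validate(servers, attrs)

# "triple"-style policy: replicas in three distinct zones.
triple = Across(3, "zone_id", One())
attrs = {"ss0": {"zone_id": "z0"}, "ss3": {"zone_id": "z1"}, "ss6": {"zone_id": "z2"}}
print(triple.validate(["ss0", "ss3", "ss6"], attrs))  # True
print(triple.validate(["ss0", "ss3"], attrs))         # False
```

The recursive `Across` composition is what allows nested rules such as "two zones within each of three datacenters" in the mapping table below.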
## Redundancy Modes → Internal Policy Mapping
| Redundancy mode | Internal policy |
| --- | --- |
| `single` | `One()` |
| `double`, `fast_recovery_double` | `Across(2, zone_id, One())` |
| `triple`, `fast_recovery_triple` | `Across(3, zone_id, One())` |
| `three_datacenter`, `multi_dc` | `Across(3, dcid, Across(2, zone_id, One()))` |
| `three_datacenter_fallback` | `Across(2, dcid, Across(2, zone_id, One()))` |
| `three_data_hall` | `Across(3, data_hall, One())` |
| `three_data_hall_fallback` | `Across(2, data_hall, Across(2, zone_id, One()))` |