design-docs/atomic-slot-migration.md
Atomic Slot Migration (ASM) provides a seamless, atomic method for migrating
hash slots between nodes in a Valkey cluster. This mechanism replaces
CLUSTER SETSLOT IMPORTING/MIGRATING and MIGRATE for migrating slots between
nodes.
Rather than migrating data key-by-key, ASM operates at the slot level by adapting existing replication and failover primitives:
Snapshot Transfer: The source node transfers data to the target node. The source node forks a child process to iterate and serialize the slot's keys.
The source transfers data in "AOF" (Append Only File) format to the target node. This format consists of a stream of commands. Consequently, the target primary and target replicas replay these commands to restore the slot's state.
Incremental Updates: While transferring the initial snapshot, the source node serves business requests. The source node records any changes to the slot's keys during this time and sends them to the target node as incremental updates after completing the snapshot transfer.
Pause: After sending the incremental updates, the source node pauses writes to the migrating slots. Consequently, the source node rejects any further business requests for those slots. This pause ensures the target node maintains the same state as the source node.
Failover: After the source node pauses, the target node performs a takeover and becomes the primary node for the slot.
Clean Up: After the target node becomes the primary node for the slot, the source node receives this information via cluster topology updates. The source node then unpauses and completes the slot migration. Failed migrations on the target side are cleaned up by deleting keys that are no longer owned by the node.
The source, target, and target replica use the CLUSTER SYNCSLOTS command to
coordinate the handover state:
Source Target Target Replica
| | |
|------------ SYNCSLOTS ESTABLISH -------------->| |
| |----- SYNCSLOTS ESTABLISH ------>|
|<-------------------- +OK ----------------------| |
| | |
|---------------- SYNCSLOTS ACK ---------------->| |
| | |
|~~~~~~~~~~~~~~ snapshot as AOF ~~~~~~~~~~~~~~~~>| |
| |~~~~~~ forward snapshot ~~~~~~~~>|
|----------- SYNCSLOTS SNAPSHOT-EOF ------------>| |
| | |
|<----------- SYNCSLOTS REQUEST-PAUSE -----------| |
| | |
|~~~~~~~~~~~~ incremental changes ~~~~~~~~~~~~~~>| |
| |~~~~~~ forward changes ~~~~~~~~~>|
|--------------- SYNCSLOTS PAUSED -------------->| |
| | |
|<---------- SYNCSLOTS REQUEST-FAILOVER ---------| |
| | |
|---------- SYNCSLOTS FAILOVER-GRANTED --------->| |
| | |
| (performs takeover & |
| propagates topology) |
| | |
| |------- SYNCSLOTS FINISH ------->|
(finds out about topology | |
change & marks migration done) | |
| | |
Throughout the migration, both the source and target nodes exchange periodic
SYNCSLOTS ACK messages to monitor the health and progress of the operation.
If a node fails to receive an acknowledgment within the replication timeout,
the migration is aborted.
See code comments in cluster_migrateslots.c for detailed state machines.
Various scenarios cause slot migration failure:
FLUSHDB on the source or target nodeIn such cases, ASM automatically rolls back the migration.
Source Target Target Replica
| | |
|------------ SYNCSLOTS ESTABLISH -------------->| |
| |----- SYNCSLOTS ESTABLISH ------>|
|<-------------------- +OK ----------------------| |
... ... ...
| | |
| <FAILURE> |
| | |
| (performs cleanup) |
| | ~~~~~~ UNLINK <key> ... ~~~~~~~>|
| | |
| | ------ SYNCSLOTS FINISH ------->|
| | |
The cluster automatically cleans up failed or cancelled slot migrations. The
primary is solely responsible for cleaning up unowned slots. Primaries demoted
during migration do not clean up previously active slot imports. The promoted
replica is responsible for both cleaning up the slot and sending a
SYNCSLOTS FINISH.
The system rejects any keyed command executed on a node that is not the primary
for that slot with -MOVED (e.g. GET, SET, DEL, INCR, etc).
Nodes filter unkeyed read commands, like SCAN and KEYS, to avoid exposing
importing slot data. Each node in the target shard tracks the slot migration job
state and hides writes to that slot from the end user until the migration
completes.
To ensure that replicas resyncing during an import remain aware of it, Valkey serializes each in-progress slot import into an RDB section defined by a new opcode. The encoding includes the job name and the slot ranges being imported. Whenever the system loads an RDB file containing a slot import section, whether from disk or during a primary sync, it adds a new migration to track the import. If the Valkey node becomes a primary after loading the RDB, it cancels the slot migration.
Failure to load the opcode results in consistency problems, so the opcode is mandatory. If the opcode is not recognized, the RDB load will fail.
Loading this tracking state on primaries ensures that replicas partially syncing
to a restarted primary still get their SYNCSLOTS FINISH message in the
replication stream.
Valkey propagates the ESTABLISH and FINISH commands to the AOF, ensuring
they replay properly on AOF load. Similar to RDB, if any pending ESTABLISH
commands lack a subsequent FINISH upon becoming primary, the system fails them
after loading.