documentation/sphinx/source/consistency-check-urgent.rst
##############################
Consistency Checker Urgent
##############################
| Author: Zhe Wang
| Reviewer: Jingyu Zhou
| Audience: FDB developers, SREs and expert users.
In a FoundationDB (FDB) key-value cluster, every key-value pair is replicated across multiple storage servers. The Consistency Checker Urgent tool can be used to validate the consistency of all replicas for each key-value pair. If any data inconsistency is detected, the tool generates ConsistencyCheck_DataInconsistent trace events for the corresponding shard. There are two types of data inconsistencies:
The ConsistencyCheck_DataInconsistent trace event differentiates between these two types of corruption.
The Consistency Checker Urgent tool is designed to ensure safe, fast, and comprehensive checking of data consistency across the entire key space (i.e., ``""`` ~ ``"\xff\xff"``). It achieves this through the following features:
To run ConsistencyCheckerUrgent, you need one checker and N testers. The process is as follows:
Users should manually remove testers when they are no longer needed. This approach allows for re-running the one-shot checking by restarting the checker process.
ConsistencyCheckerUrgent offers significant improvements over the existing consistency checker in several key areas:
The ConsistencyCheckerUrgent system conducts consistency checks in a distributed, client-server manner. It comprises a centralized leader (referred to as the Checker) and N agents (referred to as Testers). The Checker manages the checking process by:
The agents (Testers) perform the consistency checking tasks: each one compares every key in its assigned range across all source servers at a specific version. As agents complete tasks or encounter failures, the leader (Checker) is informed and updates the progress of the checking process accordingly.
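The leader/agent interaction described above can be illustrated with a small, single-process sketch. This is not the FDB implementation: ``compare_replicas``, ``run_checker``, the ``max_rounds`` parameter, and the use of plain dictionaries to stand in for storage server replicas are all hypothetical names and simplifications chosen for illustration. The sketch only shows the general idea of comparing every key of a range across replicas and of the leader re-queuing ranges whose task failed.

```python
from collections import deque


def compare_replicas(replicas):
    """Return the sorted keys whose value differs (or is missing) across replicas.

    `replicas` maps a hypothetical server name to the {key: value} dict that
    server returned for one range at one read version.
    """
    all_keys = set()
    for kv in replicas.values():
        all_keys.update(kv)
    mismatched = []
    for key in sorted(all_keys):
        # A key missing on a replica shows up as None in the value set.
        values = {kv.get(key) for kv in replicas.values()}
        if len(values) > 1:
            mismatched.append(key)
    return mismatched


def run_checker(ranges, testers, max_rounds=10):
    """Assign ranges to testers until all are checked or rounds run out.

    `ranges` maps a range name to its replicas dict. `testers` is a list of
    callables that take a replicas dict and return mismatched keys, or raise
    RuntimeError to simulate a failed task (an assumption of this sketch).
    Returns (results, unfinished_range_names).
    """
    pending = deque(ranges)
    results = {}
    for _ in range(max_rounds):
        if not pending:
            break
        failed = deque()
        turn = 0
        while pending:
            name = pending.popleft()
            tester = testers[turn % len(testers)]  # simple round-robin
            turn += 1
            try:
                results[name] = tester(ranges[name])
            except RuntimeError:
                failed.append(name)  # leader records the failure and retries later
        pending = failed
    return results, list(pending)
```

In this sketch the leader keeps only two pieces of state, the pending ranges and the completed results, so a failed task is retried simply by leaving its range in the pending queue for the next round.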
The checker operates in the following steps:
The tester operates in the following steps: