pkg/kv/kvserver/allocator/doc.md
The replica scanner calls `MaybeAddAsync` with every replica. Each base queue's `MaybeAdd` then calls into the base queue's `replicaCanBeProcessed` and `shouldQueue`. If `shouldQueue` returns true, the replica is added to the queue with the priority `shouldQueue` returned; if it returns false, the replica is skipped.

`bq.processLoop` pops items added by the replica scanner and calls into `bq.processOneAsyncAndReleaseSem`, which calls `bq.processReplica` asynchronously with a timeout. `bq.processReplica` then calls `bq.process`. Replicas that fail to be processed are sent to the purgatory queue.

`ReplicateQueue.shouldQueue` calls into `ReplicaPlanner.ShouldPlanChange`, which decides whether the replica scanner adds the replica to the queue.

`replicateQueue.process` handles one replica: (a) grab the range token, then (b) call into `rq.processOneChange`. `processOneChange` calls into `ReplicaPlanner.PlanOneChange`, which plans a replication change or returns an error. `PlanOneChange` calls into `allocator.ComputeAction`, which returns an enum action telling the replicate queue to consider adding, removing, or replacing replicas.
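The base queue flow above can be sketched as a priority queue gated by `shouldQueue`, with a processing loop that routes failures to purgatory. This is a minimal illustration, not the real implementation: `replica`, `shouldQueue`, and `processLoop` here are hypothetical stand-ins for the actual base queue machinery.

```go
package main

import (
	"container/heap"
	"fmt"
)

// replica is a hypothetical stand-in for a replica seen by the scanner.
type replica struct {
	id       int
	needsFix bool // e.g. under-replicated
}

// shouldQueue mirrors the cheap gating check: report whether to enqueue
// the replica and at what priority.
func shouldQueue(r replica) (bool, float64) {
	if r.needsFix {
		return true, 1.0
	}
	return false, 0
}

// item / pq implement a max-priority queue, like the base queue's heap.
type item struct {
	r        replica
	priority float64
}
type pq []item

func (p pq) Len() int            { return len(p) }
func (p pq) Less(i, j int) bool  { return p[i].priority > p[j].priority }
func (p pq) Swap(i, j int)       { p[i], p[j] = p[j], p[i] }
func (p *pq) Push(x interface{}) { *p = append(*p, x.(item)) }
func (p *pq) Pop() interface{} {
	old := *p
	it := old[len(old)-1]
	*p = old[:len(old)-1]
	return it
}

// processLoop pops replicas and processes them; replicas whose processing
// fails land in purgatory instead of being dropped.
func processLoop(q *pq, process func(replica) error) (processed, purgatory []int) {
	for q.Len() > 0 {
		it := heap.Pop(q).(item)
		if err := process(it.r); err != nil {
			purgatory = append(purgatory, it.r.id)
			continue
		}
		processed = append(processed, it.r.id)
	}
	return processed, purgatory
}

func main() {
	q := &pq{}
	heap.Init(q)
	for _, r := range []replica{{id: 1, needsFix: true}, {id: 2}, {id: 3, needsFix: true}} {
		if ok, prio := shouldQueue(r); ok {
			heap.Push(q, item{r: r, priority: prio})
		}
	}
	processed, purg := processLoop(q, func(r replica) error {
		if r.id == 3 {
			return fmt.Errorf("transient failure")
		}
		return nil
	})
	fmt.Println(processed, purg)
}
```

Replica 2 never enters the queue, and replica 3's failure sends it to purgatory rather than losing it.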
The separation exists because `planner.ShouldPlanChange` calls into `ComputeAction` as well, and we want a cheap check at queueing time. Note that the actions returned from `ComputeAction` in `ShouldPlanChange` vs. `PlanOneChange` may differ.

`shouldQueue` takes a replica and calls into `ReplicaPlanner.ShouldPlanChange`, which calls into the allocator via `rp.allocator.ComputeAction`. `ComputeAction` computes the action that needs to be taken on this replica, returning `AllocatorConsiderRebalance` by default. Given the action, `ReplicaPlanner.ShouldPlanChange` returns true if a repair action is needed; otherwise, it checks whether a rebalance target can be found via `Allocator.RebalanceTarget`. We will go into the details of `RebalanceTarget` below.

`replicateQueue.processOneChange` happens when the base queue pops the replica off the queue and actually starts processing it. It calls into `ReplicaPlanner.PlanOneChange`, which also calls into `ComputeAction`. Based on the action, it calls into `ReplicaPlanner`'s helper functions (such as `addOrReplaceVoters`) to compute the exact allocation operation (such as a lease transfer or a change of replicas).

`considerRebalance`: as a reminder, this action is returned by default for every replica when no other repair action applies. Note that this action does help repair zone constraints. `ReplicaPlanner` first picks which scorer options to use for the allocator decision: `RangeCountScorerOptions` by default, or `ScatterScorerOptions` if the action originates from a scatter operation.
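The cheap-check-then-full-plan split can be sketched as follows. Everything here is a hypothetical simplification: `computeAction` stands in for `Allocator.ComputeAction` (reduced to a voter-count comparison), and the two entry points mirror `ShouldPlanChange` and `PlanOneChange`. It also shows why the two calls can return different actions: cluster state may change between queueing and processing.

```go
package main

import "fmt"

// action is a hypothetical stand-in for the allocator's action enum.
type action int

const (
	considerRebalance action = iota // default when no repair is needed
	addVoter
	removeVoter
)

// computeAction stands in for Allocator.ComputeAction: derive a repair
// action from the current voter count vs. the configured target.
func computeAction(numVoters, targetVoters int) action {
	switch {
	case numVoters < targetVoters:
		return addVoter
	case numVoters > targetVoters:
		return removeVoter
	default:
		return considerRebalance
	}
}

// shouldPlanChange is the cheap gate used at queueing time: enqueue on
// any repair action, or if a rebalance target can be found.
func shouldPlanChange(numVoters, targetVoters int, hasRebalanceTarget func() bool) bool {
	if computeAction(numVoters, targetVoters) != considerRebalance {
		return true
	}
	return hasRebalanceTarget()
}

// planOneChange is the expensive step, run only when the replica is
// actually processed; it recomputes the action, which may have changed.
func planOneChange(numVoters, targetVoters int) string {
	switch computeAction(numVoters, targetVoters) {
	case addVoter:
		return "plan: add voter"
	case removeVoter:
		return "plan: remove voter"
	default:
		return "plan: consider rebalance"
	}
}

func main() {
	// Under-replicated at queue time...
	fmt.Println(shouldPlanChange(2, 3, func() bool { return false }))
	// ...but by processing time another voter has joined, so the action differs.
	fmt.Println(planOneChange(3, 3))
}
```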
`ScorerOptions` define the heuristics used to evaluate candidate replicas, guiding the allocator's decisions toward specific goals such as range count convergence.

`ReplicaPlanner` then calls into the allocator to find the best rebalance target via `Allocator.RebalanceTarget`. `Allocator.RebalanceTarget` first builds constraint checkers, which are passed to `rankedCandidateListForRebalancing` to determine which existing replicas / stores are valid or required based on the range's constraints. It then calls into `rankedCandidateListForRebalancing` to find a list of rebalance options, each consisting of an existing replica and a set of candidate stores that could replace it.

`rankedCandidateListForRebalancing` first builds a map from existing store IDs to their corresponding candidate entries, annotated with attributes such as validity, necessity, disk fullness, I/O overload, and diversity score.
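The pluggable-heuristic idea behind `ScorerOptions` can be sketched as a small interface. The types below (`store`, `scorerOptions`, `rangeCountScorer`, `rebalanceOption`, `bestCandidate`) are hypothetical simplifications, not the real allocator types; the real `ScorerOptions` compares candidates on many more dimensions.

```go
package main

import "fmt"

// store is a hypothetical candidate store with the one attribute this
// sketch scores on.
type store struct {
	id         int
	rangeCount int
}

// scorerOptions mirrors the ScorerOptions idea: a pluggable heuristic
// for ranking candidate stores.
type scorerOptions interface {
	// less reports whether a is a better rebalance target than b.
	less(a, b store) bool
}

// rangeCountScorer prefers stores with fewer ranges, driving range count
// convergence (the default heuristic in this sketch).
type rangeCountScorer struct{}

func (rangeCountScorer) less(a, b store) bool { return a.rangeCount < b.rangeCount }

// rebalanceOption pairs an existing replica's store with the candidate
// stores that could replace it.
type rebalanceOption struct {
	existing   store
	candidates []store
}

// bestCandidate ranks an option's candidates under the given heuristic.
func bestCandidate(opt rebalanceOption, so scorerOptions) store {
	best := opt.candidates[0]
	for _, c := range opt.candidates[1:] {
		if so.less(c, best) {
			best = c
		}
	}
	return best
}

func main() {
	opt := rebalanceOption{
		existing:   store{id: 1, rangeCount: 100},
		candidates: []store{{id: 2, rangeCount: 40}, {id: 3, rangeCount: 10}},
	}
	fmt.Println(bestCandidate(opt, rangeCountScorer{}).id)
}
```

Swapping in a different `scorerOptions` implementation (e.g. one that randomizes, as a scatter operation would) changes which candidate wins without touching the selection loop.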
For each existing replica store, it then constructs an equivalence class, containing the store itself along with a list of comparable candidates.
Each equivalence class is constructed as follows: for every existing replica, all other stores are considered as potential candidates. During this iteration, some filtering is applied based on the state of the other existing replicas. If a candidate store is not worse than the existing store, it is added to the list of comparable candidates. From all considered candidates, the best set is selected, and an equivalence class is constructed from the existing replica and these best candidates. Note that at this stage, only attributes such as validity, necessity (constraint conformance), diversity score, and disk fullness are considered; range count convergence has not been considered yet.
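The "not worse than the existing store" filter can be sketched like this. The attribute set and the comparison order here are a hypothetical simplification of the real scoring; the point is that only validity, necessity, diversity, and disk fullness participate, and range count does not appear at this stage.

```go
package main

import "fmt"

// candidate carries the per-store attributes considered at this stage.
// Range count is deliberately absent.
type candidate struct {
	storeID   int
	valid     bool
	necessary bool    // constraint conformance
	diversity float64 // higher is better
	diskFull  bool
}

// notWorseThan reports whether c is at least as good as existing on the
// attributes compared when building an equivalence class (a simplified,
// hypothetical ordering).
func notWorseThan(c, existing candidate) bool {
	if !c.valid || c.diskFull {
		return false
	}
	if c.necessary != existing.necessary {
		return c.necessary
	}
	return c.diversity >= existing.diversity
}

// equivalenceClass groups an existing replica's store with its
// comparable candidates.
type equivalenceClass struct {
	existing   candidate
	candidates []candidate
}

// buildEquivalenceClass keeps only candidates that are not worse than
// the existing store.
func buildEquivalenceClass(existing candidate, all []candidate) equivalenceClass {
	ec := equivalenceClass{existing: existing}
	for _, c := range all {
		if c.storeID == existing.storeID {
			continue
		}
		if notWorseThan(c, existing) {
			ec.candidates = append(ec.candidates, c)
		}
	}
	return ec
}

func main() {
	existing := candidate{storeID: 1, valid: true, diversity: 0.5}
	all := []candidate{
		existing,
		{storeID: 2, valid: true, diversity: 0.7}, // kept: better diversity
		{storeID: 3, valid: true, diversity: 0.2}, // dropped: worse diversity
		{storeID: 4, valid: true, diskFull: true}, // dropped: disk full
	}
	ec := buildEquivalenceClass(existing, all)
	for _, c := range ec.candidates {
		fmt.Println(c.storeID)
	}
}
```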
(TODO(wenyihu6 during review): why is ioOverloaded always false here? There is a comment at https://github.com/cockroachdb/cockroach/blob/bc9fdcd029eae1f30a5ea82e44b60a629d452f6c/pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go#L1550-L1553 but I don't understand it.)
(TODO(wenyihu6 during review): it looks like at this stage we do not exclude the other existing replicas' stores - we filter them out below when iterating over comparable candidates, but why?)
At this point, we have an equivalence class for every existing replica. We then examine candidates’ range count statistics within each class to help break ties.
`Allocator.RebalanceTarget` iterates over all `rebalanceOption`s to identify the option that offers the largest improvement. As a reminder, each `rebalanceOption` consists of an existing replica and a list of candidate stores. For each option, it selects the best candidate set and randomly chooses one candidate from it, then determines the overall best rebalance option by comparing across all options.
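The final selection across options can be sketched as below. This is a hypothetical simplification: it scores improvement purely as a range-count delta and deterministically takes the largest gain, whereas the real code compares candidates on richer scores and chooses randomly among the best set.

```go
package main

import "fmt"

// store is a hypothetical view of a store; at this stage range count is
// the tie-breaking statistic this sketch uses.
type store struct {
	id         int
	rangeCount int
}

// rebalanceOption pairs an existing replica's store with its candidates.
type rebalanceOption struct {
	existing   store
	candidates []store
}

// bestOption scans all options and returns the (existing, target) pair
// with the largest range-count improvement, or ok=false if no candidate
// beats its existing store.
func bestOption(opts []rebalanceOption) (from, to store, ok bool) {
	bestGain := 0
	for _, opt := range opts {
		for _, c := range opt.candidates {
			if gain := opt.existing.rangeCount - c.rangeCount; gain > bestGain {
				bestGain, from, to, ok = gain, opt.existing, c, true
			}
		}
	}
	return from, to, ok
}

func main() {
	opts := []rebalanceOption{
		{existing: store{id: 1, rangeCount: 50}, candidates: []store{{id: 2, rangeCount: 45}}},
		{existing: store{id: 3, rangeCount: 90}, candidates: []store{{id: 4, rangeCount: 30}, {id: 5, rangeCount: 60}}},
	}
	from, to, ok := bestOption(opts)
	fmt.Println(ok, from.id, to.id)
}
```

Moving a replica off store 3 (90 ranges) onto store 4 (30 ranges) wins because it offers the largest convergence gain of any option.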