client/matching/README.md
This package provides the client for communicating with the Matching service. It handles two key concerns: partition selection and routing.
The LoadBalancer distributes add/poll API calls across available task queue partitions.
- **Write path:** partitions are selected uniformly at random from the configured write partition count.
- **Read path:** partitions are selected to balance poller counts across partitions. The load balancer tracks outstanding polls per partition and sends each new poll to the partition with the fewest active polls.
The number of partitions is controlled by one of two mechanisms: dynamic config (`matching.numTaskqueueWritePartitions`, `matching.numTaskqueueReadPartitions`) or managed partition scaling (described below). There are test hooks to force specific partition selection for testing.
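For the dynamic-config mechanism, a file-based override might look like this (values are illustrative; the exact constraint syntax depends on your dynamic config provider):

```yaml
matching.numTaskqueueWritePartitions:
  - value: 4
matching.numTaskqueueReadPartitions:
  - value: 4
```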
Routing determines which matching node owns a given task queue partition. All clients (frontend, history) independently perform this computation using consistent hashing via ringpop.
Each partition has a routing key of the form:

```
namespace_id:queue_name:task_type
```
This key is hashed with the consistent hashing algorithm to find the owning node.
With basic routing, partitions of the same queue are placed independently, which can cause multiple partitions to land on the same node, creating hot spots.
Spread routing groups partitions into batches and uses LookupN to ensure
partitions within a batch are assigned to different nodes if possible.
The batch size is controlled by dynamic config matching.spreadRoutingBatchSize,
default zero (i.e. use basic routing).
Algorithm:

```
batch = partition_id / batch_size
index = partition_id % batch_size
key   = namespace_id:queue_name:batch:task_type
```

Then call `LookupN(key, index+1)` and take the host at position `index`.

For example, with batch size 8 and partition 25, the key is:

```
namespace_id:queue_name:3:task_type
```

and the owner is the host at index 1 of `LookupN(key, 2)`.

If fewer hosts are available than the batch size, wrap around to spread among the available hosts.
Tradeoff: Larger batch sizes provide better spread but cause more partition movement when membership changes.
Changing the partition count dynamically is generally safe and doesn't cause partitions to move. The caveat when reducing: the write partition count must be reduced first, and the extra partitions must drain (become empty) before the read partition count is reduced.
Changing batch size will cause most partitions to move between nodes.
To avoid moving lots of partitions simultaneously on a live cluster,
spread routing can be rolled out gradually (partition by partition)
using wall-clock-synchronized changes. See the GradualChange mechanism.
The Route(partition) method on the client computes the owning node address for any partition.
This is used internally by the gRPC client, and can be used by other code to determine the owner for other purposes (e.g. the matching engine deciding when to unload partitions it no longer owns).
If managed partition scaling is in use, the server communicates the current read/write partition counts to the client (in a gRPC trailer). The client echoes them back to the server (in a gRPC header) so the server can detect a client using stale counts. The client maintains a cache of the current counts for each task queue.
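A minimal sketch of such a count cache, assuming illustrative names and shapes (the real client would populate it from gRPC response metadata and attach the cached counts to outgoing request metadata):

```go
package main

import (
	"fmt"
	"sync"
)

type partitionCounts struct {
	read, write int
}

type countCache struct {
	mu     sync.Mutex
	counts map[string]partitionCounts
}

func newCountCache() *countCache {
	return &countCache{counts: make(map[string]partitionCounts)}
}

// get returns the cached counts for a task queue, defaulting to a single
// partition until the server has told us otherwise.
func (c *countCache) get(taskQueue string) partitionCounts {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pc, ok := c.counts[taskQueue]; ok {
		return pc
	}
	return partitionCounts{read: 1, write: 1}
}

// updateFromTrailer records the counts the server sent back on a response.
func (c *countCache) updateFromTrailer(taskQueue string, pc partitionCounts) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[taskQueue] = pc
}

func main() {
	cache := newCountCache()
	fmt.Println(cache.get("my-queue")) // defaults before the server tells us
	cache.updateFromTrailer("my-queue", partitionCounts{read: 4, write: 2})
	fmt.Println(cache.get("my-queue"))
}
```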