docs/RFCS/20230118_virtual_cluster_orchestration.md
This RFC proposes to clarify the lifecycle of virtual clusters (henceforth abbreviated "VC", a.k.a. "secondary tenants") in v23.1 and introduce a mechanism by which the SQL service for a VC can be proactively started on every cluster node (shared-process execution as used in Unified Architecture/UA clusters).
The clarification takes the form of a state diagram (see below).
The new mechanism relies on the introduction of a new column in
system.tenants, ServiceMode, that describes the deployment style
for that VC's servers. This can be NONE (no service),
EXTERNAL (SQL pods, as in CC Serverless) or SHARED (shared-process
with the KV nodes). When in state SHARED, the KV nodes auto-start the
service.
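For illustration, mode transitions could be driven by statements of the form proposed later in this RFC. This is a sketch only: the VC name `app` is hypothetical, and the `STOP SERVICE` counterpart is assumed here rather than specified above.

```sql
-- 'app' is a hypothetical VC name; statement forms follow this RFC's proposal.
ALTER VIRTUAL CLUSTER app START SERVICE SHARED;    -- NONE -> SHARED; KV nodes auto-start the SQL service
ALTER VIRTUAL CLUSTER app STOP SERVICE;            -- back to NONE; no service runs anywhere
ALTER VIRTUAL CLUSTER app START SERVICE EXTERNAL;  -- NONE -> EXTERNAL; standalone SQL pods may serve this VC
```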
We also propose to use this mechanism as a mutual-exclusion interlock, to prevent SQL pods from starting for VCs that use shared-process deployments.
The implementation for this change is spread over the following PRs:
Until today, the only requirements we knew of were that:
The implementation as of this writing is that of a "server controller" able to instantiate the service for a VC upon first use (triggered by a client connection), and only if that VC is ACTIVE.
Unfortunately, this simplistic view failed to provide answers to many operational questions. These questions include, but are not limited to:
Generally, there is appetite for some form of mechanism that proactively starts the SQL service on all nodes.
Additionally, we have two concerns:
The proposal is to evolve the current VC record state diagram, from this (current state diagram as of 2023-01-17):
(3 states: ADD, ACTIVE, DROP. ADD is used during streaming replication.)
To the following new state diagram:
In prose:

- The data state (column `DataState`) indicates the readiness of the logical
  keyspace: ADD, READY, DROP.
- The service mode (column `ServiceMode`) indicates whether a service is
  running and processing requests for that VC: NONE (no server
  possible), SHARED (shared-process multitenancy) and EXTERNAL
  (separate-process multitenancy).
- The `mt start-sql` command (which starts a standalone SQL pod, as used in CC
  Serverless) would refuse to start a SQL service for a VC whose
  record is not in state SERVICE:EXTERNAL, because at this stage we do
  not support running mixed-style deployments (with both
  separate-process and shared-process SQL services); this solves
  this issue.

Once we have this mechanism in place:
We also take the opportunity to restructure the system.tenants
table, to store the data state and service mode as separate SQL
columns.
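As a sketch of that restructuring, the table might take the following shape. The column names and types here are assumptions for illustration, not the final schema:

```sql
-- Illustrative shape of the restructured system.tenants table; names and
-- types are assumptions for this sketch, not the final schema.
CREATE TABLE system.tenants (
    id           INT8 NOT NULL PRIMARY KEY,
    name         STRING,           -- VC name, if any
    data_state   STRING NOT NULL,  -- ADD, READY or DROP
    service_mode STRING NOT NULL,  -- NONE, SHARED or EXTERNAL
    info         BYTES             -- remaining record metadata
);
```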
None known.
The main alternative considered was to not rely on a separate service mode column. Instead:
- a new cluster setting `tenancy.shared_process.auto_start.enabled`,
  which, when set (it would be set for UA clusters), automatically
  starts the SQL service for all VCs in state ACTIVE;
- for UI login, consider: if `tenancy.shared_process.auto_start.enabled` is set, all ACTIVE
  VCs; otherwise, only system.

This alternate design does not allow us to serve some VCs using separate processes, and other VCs using shared-process multitenancy, inside the same cluster. We are interested in this use case for SRE access control in CC Serverless (e.g. using a VC with limited privileges to manage the cluster, to which SREs would connect).
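Under that rejected design, enabling shared-process execution would have been a single cluster-wide switch. A sketch only; this setting was never implemented:

```sql
-- Rejected alternative (sketch): a single global switch instead of a per-VC
-- service mode. This setting was not implemented.
SET CLUSTER SETTING tenancy.shared_process.auto_start.enabled = true;
-- With this set, every node would auto-start SQL services for all ACTIVE VCs.
```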
We have also considered the following alternatives:
- A cluster setting that controls which VCs to wake up on every node.
  We disliked this because it does not offer clear control over what happens on the "in" and "out" paths of the state change.
- A constraint that at most one SQL service for a VC can run at a time.
  This makes certain use cases / test cases difficult.
- The absence of any constraint on the maximum number of SQL services per node.
  We dislike this because it's too easy for folks to make mistakes and get confused about which VCs have running SQL services. We also dislike this because it would make it too easy for customers eager to use multi-tenancy to (ab)use the mechanisms.
- A single fixed VC record (with a fixed ID or a fixed name) that would be considered "the" resident VC, with servers only starting SQL for that one VC.
  We dislike this because it would make flexible scripting of C2C replication more difficult.
| | Main approach: separate SERVICE and DATA states | Approach 2: no separate RESIDENT state, new cluster setting auto_activate |
|---|---|---|
| When does the SQL service start? | When record enters SERVICE:SHARED state. Or on node startup for VCs already in SERVICE:SHARED state. | When record enters ACTIVE state and auto_activate is true. Or on node startup for VCs already in ACTIVE state. |
| When does the SQL service stop? | When VC record leaves SERVICE:SHARED state. Or on node shutdown. | When record gets dropped or deleted. Or on node shutdown. |
| Steps during C2C replication failover. | ALTER VIRTUAL CLUSTER COMPLETE REPLICATION + ALTER VIRTUAL CLUSTER START SERVICE SHARED | ALTER VIRTUAL CLUSTER COMPLETE REPLICATION |
| Which VCs to consider for UI login? | All VCs in SERVICE:SHARED state. | If auto_activate is true, all VCs in ACTIVE state. Otherwise, only system VC. |
| Ability to run some VCs using shared-process multitenancy in CC Serverless host clusters, alongside the Serverless fleet, for access control for SREs. | Yes | No |
| Control on number of SQL services separate from VC activation? | Yes | No |
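To make the failover row of the table concrete, a C2C replication failover under the main approach might look like this (the VC name `app` is hypothetical, and the exact statement grammar may differ in the final implementation):

```sql
-- Failover steps under the main approach (illustrative).
ALTER VIRTUAL CLUSTER app COMPLETE REPLICATION;  -- finish the C2C replication stream
ALTER VIRTUAL CLUSTER app START SERVICE SHARED;  -- then start the shared-process service
```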
The explanation here is largely unchanged from the previous stories we have told about v23.1.
The main change is that a user would need to run `ALTER VIRTUAL CLUSTER ... START SERVICE SHARED/EXTERNAL` to start the SQL service before they can
connect their SQL clients to it.
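For example, a minimal end-to-end flow might look as follows. This is a sketch: `app` is a hypothetical VC name, and the creation statement is assumed to follow the same naming convention as the `ALTER` statements above.

```sql
-- 'app' is a hypothetical VC name.
CREATE VIRTUAL CLUSTER app;                      -- record exists; no SQL service yet
ALTER VIRTUAL CLUSTER app START SERVICE SHARED;  -- KV nodes start the SQL service
-- SQL clients can now connect to the 'app' VC.
```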
N/A
Why we may not support renaming VCs while they have SQL services running.
There are at least the following problems:
In the future, we may want to support serving SQL for a VC keyspace in a read-only state.