docs/design_docs/20230918-datanode_remove_datacoord_dependency.md
Current state: "Accepted"
ISSUE: https://github.com/milvus-io/milvus/issues/26758
Keywords: datacoord, datanode, flush, dependency, roll-upgrade
Remove the dependency of Datacoord for Datanodes.
If a datanode performs a sync during rolling upgrade, it needs datacoord to change the related meta in the metastore. If datacoord happens to be offline, or is itself in the middle of the rolling upgrade, the datanode has to panic to ensure no data is lost.
This proposal aims to remove the dependency on datacoord while ensuring:

- Datacoord shall be able to detect segment meta updates and provide recent targets for QueryCoord.

The briefest description of this proposal is:

- Datanodes operate on the segment meta directly.
- Datacoord refreshes the latest segment changes periodically.

There is a major concern: if multiple Datanodes are handling the same dml channel, only one DataNode shall be able to update the segment meta successfully.
This guarantee was previously implemented by the singleton writer in Datacoord: it checks for a valid watcher id before updating the segment meta when receiving the SaveBinlogPaths grpc call.
In this proposal, DataNodes update the segment meta on their own, so we need to introduce a new mechanism to prevent this error from happening:
{% note %}
Note: Like an "etcd lease for a key", the ownership of each dml channel is bound to a lease id. This lease id shall be recorded in the metastore (etcd/tikv or any other implementation).
When a DataNode starts to watch a dml channel, it shall read this lease id (via etcd or a grpc call). ANY operation on this dml channel shall be performed in a transaction that verifies the lease id is still equal to the previously read value.
If a datanode finds the lease id has been revoked or updated, it shall close the flowgraph/pipeline and cancel all pending operations instead of panicking.
{% endnote %}
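The lease check above maps naturally onto a compare-and-set transaction. The following is a minimal sketch, not the actual Milvus implementation: `memMetastore`, `putIfLease`, and the key names are illustrative stand-ins for an etcd txn of the shape `If(Compare(Value(leaseKey), "=", leaseID)).Then(Put(...))`.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errLeaseMismatch signals the channel lease recorded in the metastore
// no longer matches the lease the DataNode read at watch time.
var errLeaseMismatch = errors.New("channel lease revoked or reassigned")

// memMetastore is a toy in-memory stand-in for etcd/tikv.
type memMetastore struct {
	mu   sync.Mutex
	data map[string]string
}

// putIfLease writes key=value only if leaseKey still holds leaseID,
// atomically, mimicking a transaction-if in the metastore.
func (m *memMetastore) putIfLease(leaseKey, leaseID, key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.data[leaseKey] != leaseID {
		return errLeaseMismatch
	}
	m.data[key] = value
	return nil
}

func main() {
	store := &memMetastore{data: map[string]string{
		"channel/dml-0/lease": "lease-1", // ownership recorded at watch time
	}}

	// DataNode A read "lease-1" when it started watching dml-0.
	err := store.putIfLease("channel/dml-0/lease", "lease-1",
		"segment/100/binlogs", "path-a")
	fmt.Println("A:", err) // A: <nil>

	// Ownership moves to another DataNode: the lease id is rotated.
	store.mu.Lock()
	store.data["channel/dml-0/lease"] = "lease-2"
	store.mu.Unlock()

	// A's next write fails; A closes its flowgraph instead of panicking.
	err = store.putIfLease("channel/dml-0/lease", "lease-1",
		"segment/100/binlogs", "path-a-stale")
	fmt.Println("B:", err) // B: channel lease revoked or reassigned
}
```

The key property is that the lease comparison and the meta write happen in one atomic step, so a stale owner can never interleave a write between the check and the update.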
This check-and-set can be expressed with transaction-if like APIs in the TxnKV interface.

Likewise, all channel checkpoint update operations are currently performed by Datacoord, invoked via grpc calls from DataNodes, so they suffer from the same problem in the previously stated scenarios. Therefore "updating the channel checkpoint" shall also be processed in DataNodes when removing the dependency on DataCoord.
The rule the system shall follow is:
{% note %}
Note: Segment meta shall be updated BEFORE advancing the channel checkpoint, in case the datanode crashes during the procedure. Under this premise, reconsuming from the old checkpoint recovers all the data, and duplicated entries will be discarded by segment checkpoints.
{% endnote %}
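The ordering rule can be sketched as a two-step procedure; a crash between the steps is safe because the checkpoint still points at the old position. The store type and key names below are illustrative, not the real Milvus meta schema.

```go
package main

import "fmt"

// kv is a toy metastore keyed by string.
type kv map[string]string

// flushSegment persists the segment's binlog meta (step 1).
func flushSegment(store kv, segID, binlogs string) {
	store["segment/"+segID] = binlogs
}

// advanceCheckpoint moves the channel checkpoint forward (step 2).
func advanceCheckpoint(store kv, channel, pos string) {
	store["checkpoint/"+channel] = pos
}

func main() {
	store := kv{"checkpoint/dml-0": "ts-100"}

	// Correct order: segment meta first, checkpoint second.
	flushSegment(store, "seg-1", "binlog@ts-200")
	// A crash HERE is safe: the checkpoint still reads ts-100, so the
	// DataNode reconsumes [ts-100, ts-200] on recovery, and duplicated
	// rows are discarded by the already-persisted segment checkpoints.
	advanceCheckpoint(store, "dml-0", "ts-200")

	fmt.Println(store["checkpoint/dml-0"]) // ts-200
}
```

Reversing the two steps would be unsafe: a crash after advancing the checkpoint but before persisting the segment meta would skip data permanently.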
As previously described, DataCoord shall refresh the segment meta and channel checkpoint periodically to provide a recent target for QueryCoord.
The watch-via-etcd strategy is ruled out first, since the Watch operation shall be avoided in future designs: Milvus currently tends not to use the Watch operation and tries to remove it from the metastore abstraction.
Watch is also heavy and has caused lots of issues before.
The winning option is to:
{% note %}
Note: Datacoord reloads from the metastore periodically.
Optimization 1: reload the channel checkpoint first, then reload the segment meta only if the newly read revision is greater than the in-memory one.
Optimization 2: after the L0 segment is implemented, datacoord shall refresh growing segments only.
{% endnote %}
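Optimization 1 can be sketched as below, assuming a cheap single-key revision read guarding an expensive full prefix scan. The names (`loadCheckpointRev`, `loadSegmentMeta`, `refreshOnce`) are hypothetical, not the real Datacoord API.

```go
package main

import (
	"fmt"
	"time"
)

// coord holds Datacoord's in-memory view of segment meta plus the
// metastore revision it was loaded at.
type coord struct {
	memRevision       int64
	loadCheckpointRev func() int64             // cheap: reads one key
	loadSegmentMeta   func() map[string]string // expensive: full prefix scan
	segments          map[string]string
}

// refreshOnce reloads segment meta only when the checkpoint revision in
// the metastore advanced past the in-memory one; returns whether it did.
func (c *coord) refreshOnce() bool {
	rev := c.loadCheckpointRev()
	if rev <= c.memRevision {
		return false // nothing changed; skip the expensive reload
	}
	c.segments = c.loadSegmentMeta()
	c.memRevision = rev
	return true
}

// refreshLoop performs the periodic reload until stop is closed.
func (c *coord) refreshLoop(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.refreshOnce()
		case <-stop:
			return
		}
	}
}

func main() {
	rev := int64(1)
	c := &coord{
		memRevision:       1,
		loadCheckpointRev: func() int64 { return rev },
		loadSegmentMeta: func() map[string]string {
			return map[string]string{"seg-1": "flushed"}
		},
	}
	fmt.Println(c.refreshOnce()) // false: revision unchanged, scan skipped
	rev = 5                      // a DataNode advanced the checkpoint
	fmt.Println(c.refreshOnce()) // true: segment meta reloaded
}
```

Because DataNodes advance the channel checkpoint last (per the rule above), a revision bump on the checkpoint is a reliable signal that new segment meta may exist.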
This change shall guarantee that:

- When Datacoord starts, it shall be able to upgrade the old watch info and add a lease id into it (first try calling watch with the lease id; release-then-watch is the second choice).
- DataNodes can invoke SaveBinlogPaths and other legacy grpc calls without panicking.
- DataNodes receiving an old watch request (without a lease id) shall fall back to the older strategy, which is to update meta via grpc.
- The SaveBinlogPaths and UpdateChannelCheckpoints APIs shall be kept until the next breaking change.
- GetRecoveryInfo shall return the latest target.
- Test coverage over 90%.

Rejected alternative: DataCoord refreshes meta via Etcd watch.