docs/design_docs/20211224-drop_collection_release_resources.md
There are two problems to solve:

1. When dropping a collection, how do DataNode and DataCoord release its resources (flowgraph, buffers, segment meta)?
2. For binlogs on blob storage that are no longer in use: why do such binlogs exist, and how are they garbage collected?

This enhancement is focused on solving these 2 problems.
DataNode ignites Flush&Drop: receive the DropCollection msg -> cancel ongoing compactions -> flush all insert buffers and delete buffers -> release the flowgraph.
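The Flush&Drop sequence above can be sketched as an ordered pipeline. This is a minimal illustration, not the real DataNode code; all step callbacks are hypothetical placeholders:

```go
package main

import "fmt"

// flushAndDrop sketches the Flush&Drop sequence a DataNode runs after
// receiving a DropCollection msg. The four callbacks are hypothetical
// stand-ins for the real implementations; the point is the ordering.
func flushAndDrop(cancelCompaction, flushInsertBuf, flushDeleteBuf, releaseFlowgraph func()) []string {
	var steps []string
	record := func(name string, f func()) {
		f()
		steps = append(steps, name)
	}
	// Order matters: compactions are cancelled before the final flush,
	// and the flowgraph is released only after all buffers are flushed.
	record("cancel compaction", cancelCompaction)
	record("flush insert buffer", flushInsertBuf)
	record("flush delete buffer", flushDeleteBuf)
	record("release flowgraph", releaseFlowgraph)
	return steps
}

func main() {
	noop := func() {}
	fmt.Println(flushAndDrop(noop, noop, noop, noop))
}
```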
## Plan 1: Picked

Add a `dropped` flag in the `SaveBinlogPathsRequest` proto.

- DataNode: when flushing segments of a dropped collection, sets the `dropped` flag to true.
- DataCoord: marks these segments as dropped, and doesn't remove their segmentInfos from etcd.

Pros:

1. The easiest approach in both DataNode and DataCoord.
2. DataNode can reuse the current flush-manager procedure.

Cons:

1. The number of RPC calls equals the number of segments in a collection, which is expensive.
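The DataCoord side of Plan 1 can be sketched as follows. This is an illustrative model, not Milvus code: the in-memory map stands in for the etcd-backed meta, and `saveBinlogPaths` is a hypothetical handler showing only the `dropped`-flag behavior:

```go
package main

import "fmt"

// SegmentInfo is a simplified stand-in for the metadata DataCoord keeps
// per segment.
type SegmentInfo struct {
	ID      int64
	Dropped bool
}

// saveBinlogPaths sketches how DataCoord could handle a SaveBinlogPaths
// request carrying the new dropped flag: the segment is marked dropped,
// but its segmentInfo is NOT removed from the meta store (etcd in
// Milvus), so the background GC can later find and delete its binlogs.
func saveBinlogPaths(meta map[int64]*SegmentInfo, segmentID int64, dropped bool) {
	info, ok := meta[segmentID]
	if !ok {
		info = &SegmentInfo{ID: segmentID}
		meta[segmentID] = info
	}
	if dropped {
		info.Dropped = true // mark only; never delete the entry here
	}
}

func main() {
	meta := map[int64]*SegmentInfo{100: {ID: 100}}
	saveBinlogPaths(meta, 100, true)
	fmt.Println(len(meta), meta[100].Dropped) // entry survives, flag set
}
```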
## Plan 2: Enhance later

Add a new RPC, `FlushAndDrop`; it's a vchannel-scoped RPC.

Pros:

1. Much fewer RPC calls: the number equals the shard number.
2. More clarity in the DataNode flush procedure.

Cons:

1. More effort in both DataNode and DataCoord.
```proto
message FlushAndDropRequest {
  common.MsgBase base = 1;
  string channelID = 2;
  int64 collectionID = 3;
  repeated SegmentBinlogPaths segment_binlog_paths = 6;
}

message SegmentBinlogPaths {
  int64 segmentID = 1;
  CheckPoint checkPoint = 2;
  repeated FieldBinlog field2BinlogPaths = 3;
  repeated FieldBinlog field2StatslogPaths = 4;
  repeated DeltaLogInfo deltalogs = 5;
}
```
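To see why Plan 2's RPC count equals the shard number, the DataCoord-side fan-out can be sketched like this. The `rpc` callback is a hypothetical stand-in for the real DataNode client call; the channel names are made up:

```go
package main

import "fmt"

// flushAndDropCollection sketches Plan 2: one FlushAndDrop RPC per
// vchannel (shard), instead of one SaveBinlogPaths RPC per segment as
// in Plan 1. It returns the number of RPCs issued.
func flushAndDropCollection(collectionID int64, vchannels []string, rpc func(collectionID int64, channel string)) int {
	calls := 0
	for _, ch := range vchannels {
		rpc(collectionID, ch) // one vchannel-scoped RPC per shard
		calls++
	}
	return calls
}

func main() {
	// A collection with 2 shards needs exactly 2 RPCs, regardless of
	// how many segments each shard contains.
	n := flushAndDropCollection(1, []string{"by-dev-dml_0", "by-dev-dml_1"}, func(int64, string) {})
	fmt.Println(n)
}
```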
DataCoord runs a background GC goroutine that triggers every 1 day: