docs/design_docs/20211217-milvus_create_collection.md
Milvus 2.0 uses a Collection to represent a set of data, like a Table in a traditional database. Users can create or drop a Collection.
This article introduces the execution path of CreateCollection. By the end of this article, you should know which components are involved in CreateCollection.
The execution flow of CreateCollection is shown in the following figure:
The SDK sends a CreateCollection request to Proxy via gRPC. The proto is defined as follows:
service MilvusService {
  ...
  rpc CreateCollection(CreateCollectionRequest) returns (common.Status) {}
  ...
}

message CreateCollectionRequest {
  // Not useful for now
  common.MsgBase base = 1;
  // Not useful for now
  string db_name = 2;
  // The unique collection name in milvus.(Required)
  string collection_name = 3;
  // The serialized `schema.CollectionSchema`(Required)
  bytes schema = 4;
  // Once set, no modification is allowed (Optional)
  // https://github.com/milvus-io/milvus/issues/6690
  int32 shards_num = 5;
}

message CollectionSchema {
  string name = 1;
  string description = 2;
  bool autoID = 3; // deprecated later, keep compatible with c++ part now
  repeated FieldSchema fields = 4;
}
When it receives the CreateCollection request, Proxy wraps the request into a CreateCollectionTask and pushes the task into DdTaskQueue. After that, Proxy calls the WaitToFinish method to wait until the task is finished.
type task interface {
	TraceCtx() context.Context
	ID() UniqueID       // return ReqID
	SetID(uid UniqueID) // set ReqID
	Name() string
	Type() commonpb.MsgType
	BeginTs() Timestamp
	EndTs() Timestamp
	SetTs(ts Timestamp)
	OnEnqueue() error
	PreExecute(ctx context.Context) error
	Execute(ctx context.Context) error
	PostExecute(ctx context.Context) error
	WaitToFinish() error
	Notify(err error)
}

type createCollectionTask struct {
	Condition
	*milvuspb.CreateCollectionRequest
	ctx       context.Context
	rootCoord types.RootCoord
	result    *commonpb.Status
	schema    *schemapb.CollectionSchema
}
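The enqueue-then-wait pattern described above can be illustrated with a minimal sketch. It is not the Proxy implementation: `demoTask`, `submit`, `worker`, and `errInvalidName` are hypothetical names, and a buffered channel stands in for whatever `Condition` does internally. It only shows how a caller can block in `WaitToFinish` until a background worker calls `Notify`.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidName is a hypothetical error used only by this sketch.
var errInvalidName = errors.New("invalid collection name")

// demoTask is a hypothetical stand-in for a Proxy task; it is not Milvus code.
type demoTask struct {
	name string
	run  func() error // the work the worker performs for this task
	done chan error   // carries the final result to the waiting caller
}

func newDemoTask(name string, run func() error) *demoTask {
	// The channel is buffered so Notify never blocks the worker.
	return &demoTask{name: name, run: run, done: make(chan error, 1)}
}

// WaitToFinish blocks until Notify has been called, then returns its error.
func (t *demoTask) WaitToFinish() error { return <-t.done }

// Notify publishes the task result to the waiting caller.
func (t *demoTask) Notify(err error) { t.done <- err }

// submit enqueues a task and blocks until the worker reports its result.
func submit(queue chan<- *demoTask, t *demoTask) error {
	queue <- t
	return t.WaitToFinish()
}

// worker drains the queue one task at a time and notifies each caller.
func worker(queue <-chan *demoTask) {
	for t := range queue {
		t.Notify(t.run())
	}
}

func main() {
	queue := make(chan *demoTask, 16)
	go worker(queue)

	err := submit(queue, newDemoTask("ok", func() error { return nil }))
	fmt.Println("first task error:", err)

	err = submit(queue, newDemoTask("bad", func() error { return errInvalidName }))
	fmt.Println("second task error:", err)
}
```

The caller never touches the worker directly; the only shared state is the queue and the per-task result channel.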
A background service in Proxy gets the CreateCollectionTask from DdTaskQueue and executes it in three phases.
PreExecute: Proxy performs static checks at this phase, such as whether the Collection Name and Field Names are legal and whether there are duplicate columns.
Execute: Proxy sends a CreateCollection request to RootCoord via gRPC and waits for the response. The proto is defined as follows:
service RootCoord {
  ...
  rpc CreateCollection(milvus.CreateCollectionRequest) returns (common.Status){}
  ...
}
PostExecute: CreateCollectionTask does nothing at this phase and returns directly.
RootCoord wraps the CreateCollection request into a CreateCollectionReqTask and then calls the function executeTask. executeTask does not return until the context is done or CreateCollectionReqTask.Execute has returned.
type reqTask interface {
	Ctx() context.Context
	Type() commonpb.MsgType
	Execute(ctx context.Context) error
	Core() *Core
}

type CreateCollectionReqTask struct {
	baseReqTask
	Req *milvuspb.CreateCollectionRequest
}
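The "return when the context is done or Execute has returned" behavior can be sketched with a goroutine and a `select`. This is an assumption about the general shape only; the actual executeTask in RootCoord may differ, and `runUntilDone` and `demo` are hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runUntilDone runs exec in its own goroutine and returns as soon as either
// exec finishes or ctx is done, whichever happens first.
// Hypothetical helper; not the RootCoord implementation.
func runUntilDone(ctx context.Context, exec func(context.Context) error) error {
	// Buffered so the goroutine can still deliver its result and exit
	// after the caller has already returned on ctx.Done().
	errCh := make(chan error, 1)
	go func() { errCh <- exec(ctx) }()

	select {
	case <-ctx.Done():
		return ctx.Err()
	case err := <-errCh:
		return err
	}
}

// demo runs one task that finishes immediately and one that outlives its deadline.
func demo() (fast, slow error) {
	fast = runUntilDone(context.Background(), func(context.Context) error {
		return nil
	})

	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	slow = runUntilDone(ctx, func(ctx context.Context) error {
		<-ctx.Done()                      // still busy when the deadline passes
		time.Sleep(50 * time.Millisecond) // simulate work that ignores cancellation
		return nil
	})
	return fast, slow
}

func main() {
	fast, slow := demo()
	fmt.Println("fast:", fast) // fast: <nil>
	fmt.Println("slow:", slow) // slow: context deadline exceeded
}
```

Note that returning on `ctx.Done()` does not stop the task itself. The task has to observe the context if it should abort early.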
CreateCollectionReqTask.Execute allocates the CollectionID and the default PartitionID, and sets the Virtual Channel and Physical Channel, which are used by MsgStream. It then writes the Collection's meta into metaTable.
After the Collection's meta has been written into metaTable, Milvus considers the collection to have been created successfully.
RootCoord allocates a timestamp from TSO before writing the Collection's meta into metaTable, and this timestamp is considered the point at which the collection was created.
Finally, RootCoord sends a CreateCollectionRequest message into MsgStream, and other components that have subscribed to the MsgStream are notified. The proto of CreateCollectionRequest is defined as follows:
message CreateCollectionRequest {
  common.MsgBase base = 1;
  string db_name = 2;
  string collectionName = 3;
  string partitionName = 4;
  int64 dbID = 5;
  int64 collectionID = 6;
  int64 partitionID = 7;
  // `schema` is the serialized `schema.CollectionSchema`
  bytes schema = 8;
  repeated string virtualChannelNames = 9;
  repeated string physicalChannelNames = 10;
}
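The order of operations on RootCoord described above (allocate IDs, assign channels, allocate a timestamp, write the meta, then publish the message) can be sketched with in-memory stand-ins. Everything here is hypothetical: `fakeRootCoord`, `collectionMeta`, the counters used as ID and timestamp allocators, the channel naming scheme, and the duplicate-name check are assumptions made for illustration, not Milvus APIs.

```go
package main

import "fmt"

// collectionMeta is a hypothetical, simplified record of a collection.
type collectionMeta struct {
	Name             string
	ID               int64
	PartitionID      int64
	VirtualChannels  []string
	PhysicalChannels []string
	CreateTs         uint64
}

// fakeRootCoord holds in-memory stand-ins for the ID allocator, TSO,
// metaTable, and MsgStream. None of this is the real RootCoord.
type fakeRootCoord struct {
	nextID int64
	nextTs uint64
	meta   map[string]collectionMeta // stands in for metaTable
	stream []collectionMeta          // stands in for MsgStream
}

func newFakeRootCoord() *fakeRootCoord {
	return &fakeRootCoord{meta: make(map[string]collectionMeta)}
}

func (c *fakeRootCoord) allocID() int64  { c.nextID++; return c.nextID }
func (c *fakeRootCoord) allocTs() uint64 { c.nextTs++; return c.nextTs }

// createCollection follows the order of steps described in the text.
func (c *fakeRootCoord) createCollection(name string, shards int) (collectionMeta, error) {
	// Assumption: duplicate names are rejected, since names are unique.
	if _, ok := c.meta[name]; ok {
		return collectionMeta{}, fmt.Errorf("collection %q already exists", name)
	}

	m := collectionMeta{Name: name}
	m.ID = c.allocID()          // CollectionID
	m.PartitionID = c.allocID() // default PartitionID

	// Hypothetical naming scheme: one channel pair per shard.
	for i := 0; i < shards; i++ {
		m.PhysicalChannels = append(m.PhysicalChannels, fmt.Sprintf("physical-%d", i))
		m.VirtualChannels = append(m.VirtualChannels, fmt.Sprintf("virtual-%d-%d", m.ID, i))
	}

	m.CreateTs = c.allocTs()       // timestamp is allocated before the meta write
	c.meta[name] = m               // the collection counts as created from here on
	c.stream = append(c.stream, m) // notify subscribers last
	return m, nil
}

func main() {
	rc := newFakeRootCoord()
	m, err := rc.createCollection("demo", 2)
	fmt.Println(m.ID, m.PartitionID, m.VirtualChannels, m.CreateTs, err)

	_, err = rc.createCollection("demo", 2)
	fmt.Println("second attempt:", err)
}
```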
RootCoord then updates its internal timestamp and returns, so Proxy gets the response.
Notes:
In Proxy, every DDL request is wrapped into a task, and the task is pushed into DdTaskQueue.
A background service reads a new task from DdTaskQueue only when the previous one has finished.
So all DDL requests are executed serially on Proxy.
In RootCoord, every DDL request is wrapped into a reqTask, but there is no task queue, so DDL requests are executed in parallel on RootCoord.
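The difference between the two execution models in these notes can be shown with a small sketch. It is an illustration only: `runSerially` mimics a single worker that starts a task after the previous one has finished, `runInParallel` mimics handling each request in its own goroutine, and `recorder` and `makeTasks` are hypothetical helpers used to observe completion order.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder collects task IDs in completion order; safe for concurrent use.
type recorder struct {
	mu    sync.Mutex
	order []int
}

func (r *recorder) add(id int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.order = append(r.order, id)
}

// snapshot returns a copy of the recorded order.
func (r *recorder) snapshot() []int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]int(nil), r.order...)
}

// makeTasks builds n tasks that each record their own ID when run.
func makeTasks(n int, r *recorder) []func() {
	tasks := make([]func(), 0, n)
	for i := 0; i < n; i++ {
		id := i
		tasks = append(tasks, func() { r.add(id) })
	}
	return tasks
}

// runSerially mimics a single queue worker: one task at a time, in order.
func runSerially(tasks []func()) {
	for _, t := range tasks {
		t()
	}
}

// runInParallel mimics per-request handling with no queue:
// every task runs in its own goroutine.
func runInParallel(tasks []func()) {
	var wg sync.WaitGroup
	for _, t := range tasks {
		wg.Add(1)
		go func(f func()) {
			defer wg.Done()
			f()
		}(t)
	}
	wg.Wait()
}

func main() {
	serial := &recorder{}
	runSerially(makeTasks(5, serial))
	fmt.Println("serial order:", serial.snapshot()) // always [0 1 2 3 4]

	parallel := &recorder{}
	runInParallel(makeTasks(5, parallel))
	fmt.Println("parallel order:", parallel.snapshot()) // order may vary between runs
}
```

With the serial runner, completion order always matches submission order. With the parallel runner, all tasks complete but their relative order is not guaranteed.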