docs/design_docs/20211221-retrieve_entity.md
In Milvus, a collection has multiple fields, which fall into two main kinds: vector fields and scalar fields. A row is called an entity; one entity encapsulates multiple vector and scalar values.
When creating a collection, you can choose between an auto-generated primary key and a user-provided primary key. If the collection uses a user-provided primary key, each inserted entity must contain the primary key field; otherwise, the insertion fails. The primary keys are returned once the insertion request succeeds.
Milvus currently only supports primary keys of the int64 type.
QueryNode subscribes to the insert channel. Depending on the status of a segment, it serves requests either from the data extracted from the insert channel or from the data processed by DataNode.
When DataNode processes each inserted entity, it updates the bloomfilter of the Segment to which the entity belongs. If the bloomfilter does not exist yet, DataNode creates one in memory and then updates it.
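The per-segment update described above can be sketched as follows. This is a minimal illustration, not the DataNode implementation: the `bloomFilter` type, its sizing (65536 bits, 4 probes), the FNV-based double hashing, and the `segmentFilters` map are all assumptions made for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// bloomFilter is a minimal fixed-size Bloom filter (hypothetical type).
type bloomFilter struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per key
}

func newBloomFilter(m, k uint64) *bloomFilter {
	return &bloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two 64-bit hashes of an int64 primary key for double hashing.
func hashes(pk int64) (uint64, uint64) {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(pk))
	h1 := fnv.New64a()
	h1.Write(buf[:])
	h2 := fnv.New64()
	h2.Write(buf[:])
	return h1.Sum64(), h2.Sum64() | 1 // force the step to be odd
}

// Add sets the k bits that correspond to the primary key.
func (b *bloomFilter) Add(pk int64) {
	h1, h2 := hashes(pk)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		b.bits[pos/64] |= uint64(1) << (pos % 64)
	}
}

// MayContain reports false only if the key was definitely never added.
func (b *bloomFilter) MayContain(pk int64) bool {
	h1, h2 := hashes(pk)
	for i := uint64(0); i < b.k; i++ {
		pos := (h1 + i*h2) % b.m
		if b.bits[pos/64]&(uint64(1)<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}

// segmentFilters maps a segment ID to its in-memory filter (hypothetical).
type segmentFilters map[int64]*bloomFilter

// onInsert creates the segment's filter on first use, then records the key.
func (s segmentFilters) onInsert(segmentID, pk int64) {
	bf, ok := s[segmentID]
	if !ok {
		bf = newBloomFilter(1<<16, 4)
		s[segmentID] = bf
	}
	bf.Add(pk)
}

func main() {
	filters := segmentFilters{}
	filters.onInsert(1, 107)
	fmt.Println(filters[1].MayContain(107)) // true: no false negatives
}
```

A Bloom filter can report false positives but never false negatives, so a negative answer is enough to skip a segment during lookup.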
Once DataNode receives a Flush command from DataCoord, it sorts the data in the segment in ascending order of primary key, records the maximum and minimum values of the primary key, and writes the segment, the statistics, and the bloomfilter to the storage system.
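A minimal sketch of the sort-and-record step, assuming an in-memory slice of rows. The `entity` and `pkStats` types and the `sortAndStats` helper are hypothetical names used only for illustration; writing to the storage system is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// entity is a hypothetical row with an int64 primary key and one scalar field.
type entity struct {
	ID  int64
	Age int64
}

// pkStats holds the minimum and maximum primary key of a segment.
type pkStats struct {
	Min, Max int64
}

// sortAndStats sorts rows in ascending primary-key order (in place) and
// returns the min/max statistics. ok is false for an empty segment.
func sortAndStats(rows []entity) (stats pkStats, ok bool) {
	if len(rows) == 0 {
		return pkStats{}, false
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].ID < rows[j].ID })
	return pkStats{Min: rows[0].ID, Max: rows[len(rows)-1].ID}, true
}

func main() {
	rows := []entity{{ID: 107, Age: 30}, {ID: 105, Age: 22}, {ID: 106, Age: 41}}
	stats, _ := sortAndStats(rows)
	fmt.Println(stats.Min, stats.Max) // 105 107
}
```

Because the rows are sorted first, the minimum and maximum are simply the first and last primary keys.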
${tenant}/insert_log/${collection_id}/${partition_id}/${segment_id}/${field_id}/_${log_idx}
${tenant}/insert_log/${collection_id}/${partition_id}/${segment_id}/${field_id}/stats_${log_idx}
${tenant}/insert_log/${collection_id}/${partition_id}/${segment_id}/${field_id}/bf_${log_idx}
QueryNode maintains a mapping from primary key to entities in each segment. This mapping updates every time an insert request is processed.
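The three storage paths above share one pattern and differ only in the prefix of the last component. A small helper can make that explicit. This is a sketch: the `logKind` type, the `logPath` function, and the choice of `int64` for the IDs and log index are assumptions, not names from the Milvus code base.

```go
package main

import "fmt"

// logKind is the prefix of the final path component (hypothetical type).
type logKind string

const (
	insertLog logKind = ""      // .../_${log_idx}
	statsLog  logKind = "stats" // .../stats_${log_idx}
	bfLog     logKind = "bf"    // .../bf_${log_idx}
)

// logPath assembles a path following the pattern listed in this document.
func logPath(tenant string, collectionID, partitionID, segmentID, fieldID, logIdx int64, kind logKind) string {
	return fmt.Sprintf("%s/insert_log/%d/%d/%d/%d/%s_%d",
		tenant, collectionID, partitionID, segmentID, fieldID, kind, logIdx)
}

func main() {
	fmt.Println(logPath("t1", 1, 2, 3, 100, 0, insertLog)) // t1/insert_log/1/2/3/100/_0
	fmt.Println(logPath("t1", 1, 2, 3, 100, 0, statsLog))  // t1/insert_log/1/2/3/100/stats_0
	fmt.Println(logPath("t1", 1, 2, 3, 100, 0, bfLog))     // t1/insert_log/1/2/3/100/bf_0
}
```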
After receiving the Get request from the client, the Proxy sends the request to the search channel and waits for the result returned from the searchResult channel.
The processing flow after QueryNode reads the Get request from the search channel:
Growing status segment, and return directly if found;
Growing segments, return the results;
Sealed segments;
Sealed segment;
Sealed segment, return empty if not found;
// pseudo-code
func get(collection_name string,
ids list[string],
output_fields list[string],
partition_names list[string]) (list[entity], error)
// Example
// entities = get("collection1", ["103"], ["_id", "age"], nil)
When a primary key does not exist in the specified collection (and partitions), Milvus returns an empty result, which is not considered an error.
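The lookup semantics can be illustrated with a small in-memory sketch. It assumes that Growing segments are checked first and that Sealed segments are pruned by their min/max primary-key statistics before being searched; the exact order of steps in QueryNode may differ. The `segment` and `entity` types are hypothetical, and IDs are modeled as `int64` because primary keys are int64.

```go
package main

import "fmt"

// entity is a hypothetical row with an int64 primary key.
type entity struct {
	ID  int64
	Age int64
}

// segment is a simplified in-memory segment (hypothetical type).
type segment struct {
	sealed   bool
	min, max int64            // primary-key statistics, used for sealed segments
	rows     map[int64]entity // mapping from primary key to entity
}

// lookup searches growing segments first, then sealed segments whose
// [min, max] range could contain the key.
func lookup(segments []segment, id int64) (entity, bool) {
	for _, s := range segments {
		if !s.sealed {
			if e, ok := s.rows[id]; ok {
				return e, true
			}
		}
	}
	for _, s := range segments {
		if s.sealed && id >= s.min && id <= s.max {
			if e, ok := s.rows[id]; ok {
				return e, true
			}
		}
	}
	return entity{}, false
}

// get returns the entities that exist; missing keys are skipped, not errors.
func get(segments []segment, ids []int64) []entity {
	result := []entity{}
	for _, id := range ids {
		if e, ok := lookup(segments, id); ok {
			result = append(result, e)
		}
	}
	return result
}

func main() {
	segments := []segment{
		{sealed: true, min: 105, max: 106, rows: map[int64]entity{105: {105, 22}, 106: {106, 41}}},
		{sealed: false, rows: map[int64]entity{107: {107, 30}}},
	}
	found := get(segments, []int64{101, 102, 103, 104, 105, 106, 107})
	fmt.Println(len(found)) // 3
	fmt.Println(len(get(segments, []int64{104}))) // 0: empty result, no error
}
```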
Both bloomfilter files and statistical information files belong to Binlog files and follow the Binlog file format.
https://github.com/milvus-io/milvus/blob/master/docs/developer_guides/chap08_binlog.md
Two new types of Binlog are added: BFBinlog and StatsBinlog.
BFBinlog Payload: Refer to https://github.com/milvus-io/milvus/blob/1.1/core/src/segment/SegmentWriter.h for storage methods
StatsBinlog Payload: a JSON-format string that currently contains only the keys max and min.
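For illustration, a payload with only the keys max and min could be produced and parsed as below. The `statsPayload` struct is a hypothetical name; only the two JSON keys come from this document, and the surrounding Binlog framing is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// statsPayload mirrors the two keys of the StatsBinlog payload (hypothetical type).
type statsPayload struct {
	Max int64 `json:"max"`
	Min int64 `json:"min"`
}

// encodeStats serializes the statistics into a JSON string.
func encodeStats(p statsPayload) (string, error) {
	b, err := json.Marshal(p)
	return string(b), err
}

// decodeStats parses a JSON string back into the statistics.
func decodeStats(s string) (statsPayload, error) {
	var p statsPayload
	err := json.Unmarshal([]byte(s), &p)
	return p, err
}

func main() {
	s, _ := encodeStats(statsPayload{Max: 107, Min: 105})
	fmt.Println(s) // {"max":107,"min":105}
}
```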
${log_idx} to _${log_idx}
In a newly created collection, insert an entity with a primary key of 107, then call the Get interface to query the entity with a primary key of 107. Each field of the retrieved entity is exactly the same as in the inserted entity.
In a newly created collection, insert a record with a primary key of 107, then call the Get interface to query the record with a primary key of 106. The retrieved result is empty.
In a newly created collection, insert records with the primary keys 105, 106, and 107, then call the Get interface to query the records with the primary keys 101, 102, 103, 104, 105, 106, and 107. The retrieved result contains only the records with the primary keys 105, 106, and 107.
In a newly created collection, insert a record with a primary key of 107, call the Flush interface, and check whether the stats and bloomfilter files exist on MinIO.