Growing segment has the following additional interfaces:
PreInsert(size) -> reservedOffset: serial interface, which reserves space for future insertion and returns the reservedOffset.
Insert(reservedOffset, size, ...Data...): write ...Data... into range [reservedOffset, reservedOffset + size). This interface is allowed to be called concurrently.
...Data... contains two system attributes, row_ids and timestamps, plus the other columns.
PreDelete & Delete(reservedOffset, row_ids, timestamps): a delete interface similar to the insert interface.

Growing segment stores data in chunks. The number of rows in each chunk is restricted by configs.
The number of rows per chunk is controlled by the size_per_chunk config parameter.
When inserting, first allocate enough space to ensure total_size <= num_chunk * size_per_chunk, and then convert data from row format to column format.
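The allocation rule above amounts to a ceiling division, followed by a row-to-column transpose of the incoming batch. The sketch below is a minimal illustration under stated assumptions: `ChunksNeeded`, `Row`, `Columns`, and `RowsToColumns` are hypothetical names (not segcore APIs), and the row carries only the two system attributes plus one float column.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Smallest num_chunk such that total_size <= num_chunk * size_per_chunk.
int64_t ChunksNeeded(int64_t total_size, int64_t size_per_chunk) {
  assert(total_size >= 0 && size_per_chunk > 0);
  return (total_size + size_per_chunk - 1) / size_per_chunk;  // ceiling division
}

// Hypothetical row: the two system attributes plus one user column.
struct Row {
  int64_t row_id;
  uint64_t timestamp;
  float value;
};

// Column format: one contiguous array per field.
struct Columns {
  std::vector<int64_t> row_ids;
  std::vector<uint64_t> timestamps;
  std::vector<float> values;
};

// Convert a batch from row format to column format.
Columns RowsToColumns(const std::vector<Row>& rows) {
  Columns cols;
  cols.row_ids.reserve(rows.size());
  cols.timestamps.reserve(rows.size());
  cols.values.reserve(rows.size());
  for (const Row& r : rows) {
    cols.row_ids.push_back(r.row_id);
    cols.timestamps.push_back(r.timestamp);
    cols.values.push_back(r.value);
  }
  return cols;
}
```

For example, with size_per_chunk = 4, a segment holding 10 rows needs `ChunksNeeded(10, 4) == 3` chunks.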
During a search, each chunk is searched, its results are saved as a 'subquery result', and the subquery results are then reduced into the final TopK.
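A minimal sketch of the per-chunk search followed by the TopK reduction. It assumes scalar (one-dimensional) "vectors", squared L2 distance, and a brute-force scan standing in for the per-chunk index; `Hit`, `SearchChunk`, `ReduceTopK`, and `SearchAllChunks` are hypothetical names, not segcore functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// (distance, segment offset); a smaller distance is better.
using Hit = std::pair<float, int64_t>;

// Search one chunk by brute force and keep its k best hits (the subquery result).
std::vector<Hit> SearchChunk(const std::vector<float>& chunk, int64_t chunk_start,
                             float query, size_t k) {
  std::vector<Hit> hits;
  for (size_t i = 0; i < chunk.size(); ++i) {
    float d = (chunk[i] - query) * (chunk[i] - query);  // squared L2 in 1-D
    hits.emplace_back(d, chunk_start + static_cast<int64_t>(i));
  }
  size_t keep = std::min(k, hits.size());
  std::partial_sort(hits.begin(), hits.begin() + keep, hits.end());
  hits.resize(keep);
  return hits;
}

// Reduce all subquery results into the global TopK.
std::vector<Hit> ReduceTopK(const std::vector<std::vector<Hit>>& subresults, size_t k) {
  std::vector<Hit> all;
  for (const auto& sub : subresults) all.insert(all.end(), sub.begin(), sub.end());
  size_t keep = std::min(k, all.size());
  std::partial_sort(all.begin(), all.begin() + keep, all.end());
  all.resize(keep);
  return all;
}

// Chunk c covers segment offsets [c * size_per_chunk, (c + 1) * size_per_chunk).
std::vector<Hit> SearchAllChunks(const std::vector<std::vector<float>>& chunks,
                                 int64_t size_per_chunk, float query, size_t k) {
  std::vector<std::vector<Hit>> subresults;
  for (size_t c = 0; c < chunks.size(); ++c) {
    int64_t start = static_cast<int64_t>(c) * size_per_chunk;
    subresults.push_back(SearchChunk(chunks[c], start, query, k));
  }
  return ReduceTopK(subresults, k);
}
```

Because each chunk keeps only its own best k hits, the reduce step never handles more than k × num_chunk candidates.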
Growing segment also implements a small batch index for vectors. The parameters of the small batch index are preset in the segcore config.
When a metric type is specified in the schema, the default parameters build an index for each chunk to accelerate queries.
parse_from can parse the parameters from YAML files (this function is not enabled by default).
The unit tests use ${milvus}/internal/core/unittest/test_utils/test_segcore.yaml.
default_config offers default parameters.

The following components are used to manage concurrently inserted data:
atomic<int64_t> reserved: tracks how much space has been reserved.
AckResponder: tracks which segments (offset ranges) have been inserted and returns the current segment offset.
ConcurrentVector: stores the data columns; each column has one concurrent vector.

The following steps are executed during an insert:
Serially execute PreInsert(size) -> reserved_offset to allocate memory; the address range [reserved_offset, reserved_offset + size) is reserved.
Execute Insert(reserved_offset, size, ...Data...) in parallel, copying the data into the reserved address range:
For the ConcurrentVector of each column, call grow_to_at_least to reserve space, then call set_data_raw to put the data into the corresponding locations.
Call AddSegment of AckResponder to mark the space [reserved_offset, reserved_offset + size) as inserted.

ConcurrentVector is a column data store that can be inserted into concurrently. It is composed of multiple data chunks. Its main methods are:
grow_to_at_least(size): reserves space no less than size.
set_data_raw(element_offset, source, element_count): source points to a contiguous piece of data; element_count elements from it are put into the vector starting at element_offset.
get_span(chunk_id): gets the span of the corresponding chunk.
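A minimal sketch of the two building blocks used by the insert steps: a chunked column exposing grow_to_at_least, set_data_raw, and get_span, and an ack tracker exposing AddSegment. Assumptions, all hypothetical: the `Toy*` class names, `Span`, and the `GetAck` accessor; a mutex that guards only the chunk list and the pending-range map (the real segcore classes may use lock-free structures); and "current segment offset" read as the offset below which every row has been inserted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical minimal span: pointer plus length.
template <typename T>
struct Span {
  const T* data;
  int64_t size;
};

// Hypothetical ConcurrentVector stand-in: a column stored as fixed-size chunks.
template <typename T>
class ToyConcurrentVector {
 public:
  explicit ToyConcurrentVector(int64_t size_per_chunk) : size_per_chunk_(size_per_chunk) {
    assert(size_per_chunk > 0);
  }

  // Append whole chunks until capacity >= size. Existing chunks never move,
  // so pointers used by concurrent writers stay valid.
  void grow_to_at_least(int64_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    while (static_cast<int64_t>(chunks_.size()) * size_per_chunk_ < size) {
      chunks_.push_back(std::make_unique<T[]>(size_per_chunk_));
    }
  }

  // Copy element_count elements from the contiguous buffer `source` into
  // [element_offset, element_offset + element_count), splitting at chunk borders.
  void set_data_raw(int64_t element_offset, const T* source, int64_t element_count) {
    while (element_count > 0) {
      int64_t in_chunk = element_offset % size_per_chunk_;
      int64_t n = std::min(element_count, size_per_chunk_ - in_chunk);
      std::copy(source, source + n, chunk_ptr(element_offset / size_per_chunk_) + in_chunk);
      source += n;
      element_offset += n;
      element_count -= n;
    }
  }

  // Whole chunk; the tail of the last chunk may not have been written yet.
  Span<T> get_span(int64_t chunk_id) const { return {chunk_ptr(chunk_id), size_per_chunk_}; }

 private:
  T* chunk_ptr(int64_t chunk_id) const {
    std::lock_guard<std::mutex> lock(mutex_);  // guards the chunk list, not the data
    assert(chunk_id >= 0 && chunk_id < static_cast<int64_t>(chunks_.size()));
    return chunks_[chunk_id].get();
  }

  int64_t size_per_chunk_;
  mutable std::mutex mutex_;
  std::vector<std::unique_ptr<T[]>> chunks_;
};

// Hypothetical AckResponder stand-in: records finished ranges [begin, end)
// and reports the offset below which every row has been inserted.
class ToyAckResponder {
 public:
  void AddSegment(int64_t begin, int64_t end) {
    assert(begin <= end);
    std::lock_guard<std::mutex> lock(mutex_);
    pending_[begin] = end;
    // Advance the contiguous prefix while a finished range starts exactly at it.
    for (auto it = pending_.find(acked_); it != pending_.end(); it = pending_.find(acked_)) {
      acked_ = it->second;
      pending_.erase(it);
    }
  }

  int64_t GetAck() const {  // hypothetical accessor name
    std::lock_guard<std::mutex> lock(mutex_);
    return acked_;
  }

 private:
  mutable std::mutex mutex_;
  std::map<int64_t, int64_t> pending_;  // begin -> end of ranges not yet contiguous
  int64_t acked_ = 0;
};
```

In this sketch, if the range [3, 6) finishes before [0, 3), GetAck() stays at 0 until AddSegment(0, 3) arrives and then jumps to 6, so readers never see a gap of unwritten rows.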