docs/internals/ingest-v2.md
Ingest V2 is the latest ingestion API that is designed to be more efficient and scalable for thousands of indexes than the previous version. It is the default since 0.9.
Just like ingest V1, the new ingest uses mrecordlog to persist ingested documents that are waiting to be indexed. But unlike V1, which always persists the documents locally on the node that receives them, ingest V2 can dynamically distribute them into WAL units called shards. The assigned shard can be local or on another indexer. The control plane is in charge of distributing the shards to balance the indexing work as well as possible across all indexer nodes. The progress within each shard is not tracked as an index metadata checkpoint anymore but in a dedicated metastore shards table.
In the future, the shard based ingest will also be capable of writing a replica for each shard, thus ensuring a high durability of the documents that are waiting to be indexed (durability of the indexed documents is guarantied by the object store).
Variables driving the ingest configuration are documented here.
With ingest V2, you can also activate the enable_cooperative_indexing option in the indexer configuration. This setting is useful for deployments with very large numbers (dozens) of actively written indexers, to limit the indexing workbench memory consumption. The indexer configuration is in the node configuration:
version: 0.8
# [...]
indexer:
enable_cooperative_indexing: true
See full configuration example.
queues/ directory whereas V2 uses the wal/ directoryingest_api.max_queue_memory_usageingest_api.max_queue_disk_usageingest_api.replication_factor, not working yet