docs/server/features/indexes/secondary.md
KurrentDB v25.1 introduces support for secondary indexes, allowing for efficient querying of event streams based on indexed fields.
The initial version supports two default secondary indexes: a category index and an event type index.
The primary aim of these default secondary indexes is to improve read performance for common query patterns and to remove the need to run the system projections that create linked streams for categories ($by-category) and event types ($by-event-type).
::: important How system projections work
Category and event type system projections create streams of link events that index events by category or event type. When reading streams of links, KurrentDB must resolve each link event to the original event, which adds overhead to read operations. Also, when streams are being truncated or deleted, link events remain in the database because KurrentDB cannot remove events from streams other than the one being truncated or deleted. This can lead to increased storage usage over time. Statistics we collected from production systems indicate that in systems that keep the database size contained by deleting unused data, up to 50% of the database size can be due to link events created by these system projections, where old chunk files primarily consist of link events that are pointing to deleted events and thus cannot be resolved, taking up to 90% of disk space in those chunk files. As you can imagine, replaying events from the beginning of time in such systems can be very inefficient because first the link events need to be read and then resolved to the original events, many of which may no longer exist.
:::
Differences between secondary indexes and system projections:
::: note Example
On a database with 130 million events (~400 bytes each) distributed across 1 million streams, using category and event type system projections resulted in 280 million link events. The database files without the link events were around 48 GB, while the database files with link events were around 102 GB. The default index size without link events was around 3.2 GB, while with link events it was around 8.7 GB. The total impact of storing link events on storage size for that particular dataset was around 60 GB, which is roughly a 100% increase. In contrast, the secondary indexes for category and event type were only around 2.2 GB in total.
Note that this example also counts links produced by $streams and $stream-by-category system projections, but those projections produce only one link per stream, so their impact on the database size is marginal compared to category and event type projections.
:::
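The figures in the example above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the note; the variable names are illustrative.

```csharp
using System;

// Figures quoted in the example above (all sizes in GB).
const double dataWithoutLinks = 48.0;   // database files without link events
const double dataWithLinks = 102.0;     // database files with link events
const double indexWithoutLinks = 3.2;   // default index without link events
const double indexWithLinks = 8.7;      // default index with link events
const double secondaryIndexes = 2.2;    // category + event type secondary indexes

// Extra storage attributable to link events: growth of the database files
// plus growth of the default index.
double linkOverhead = (dataWithLinks - dataWithoutLinks) + (indexWithLinks - indexWithoutLinks);

// Relative increase compared to the database and default index without link events.
double baseline = dataWithoutLinks + indexWithoutLinks;
double increasePercent = linkOverhead / baseline * 100.0;

Console.WriteLine($"Link event overhead: {linkOverhead:F1} GB");
Console.WriteLine($"Relative increase: {increasePercent:F0}%");
Console.WriteLine($"Secondary indexes: {secondaryIndexes:F1} GB");
```

The overhead comes out at about 59.5 GB, a little more than the size of the database and default index without link events, which matches the "around 60 GB" and "roughly 100%" figures in the note.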
The secondary indexes feature is enabled by default. After KurrentDB is upgraded to v25.1, it starts building the secondary indexes in the background. Depending on the database size, this process can take from minutes to hours. During this time, you can still use the database normally, but queries that use the secondary indexes may not return complete results until the indexes are fully built. The database will also report a significant increase in read count until the initial indexing is complete.
::: warning
Extensive reads during the initial indexing process may impact the performance of other operations on the database. It is recommended to perform the upgrade during a maintenance window or a period of low activity.
:::
To disable secondary indexes, set the following configuration option in the kurrentdb.conf file:
SecondaryIndexing:
  Enabled: false
Refer to the configuration guide for configuration mechanisms other than YAML.
Once the secondary indexes are built, you can use them to efficiently query events by category or event type. The KurrentDB client libraries provide methods to query events using secondary indexes.
Secondary indexes can be used for both read and subscribe operations. KurrentDB supports this through a special kind of filter that is available for all operations that read from or subscribe to $all. The main difference between indexes and streams of links, such as category streams, is that indexes act as a filter, whereas streams of links are actual streams. As a result, index reads do not return link events, do not use the resolveLinkTos setting, and do not provide sequential event numbers for the returned events. Read and subscribe operations therefore track progress using the log position of the original event instead of an event number in the index.
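The progress-tracking point can be illustrated with a small in-memory model. This is a sketch only: it does not call KurrentDB, the log positions are made-up numbers, and ReadByEventType is a hypothetical helper that stands in for a filtered read of $all. It shows a consumer storing the log position of the last processed event and resuming after it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the $all log: (log position, stream, event type).
// The positions are invented for illustration; a real server assigns them.
var log = new List<(ulong Position, string Stream, string EventType)>
{
    (100UL, "order-1", "OrderPlaced"),
    (250UL, "order-1", "OrderPaid"),
    (410UL, "order-2", "OrderPlaced"),
    (560UL, "invoice-7", "InvoiceIssued"),
    (730UL, "order-3", "OrderPlaced"),
};

// Hypothetical helper: returns events of one type located after the given
// log position. A null position means "read from the start".
List<(ulong Position, string Stream, string EventType)> ReadByEventType(
    string eventType, ulong? after) =>
    log.Where(entry => entry.EventType == eventType)
       .Where(entry => after is null || entry.Position > after.Value)
       .ToList();

// First pass: process two events and remember the log position of the last one.
ulong? checkpoint = null;
foreach (var processed in ReadByEventType("OrderPlaced", checkpoint).Take(2))
{
    checkpoint = processed.Position;
}

// Resume: only events after the stored log position are returned.
var remaining = ReadByEventType("OrderPlaced", checkpoint);
Console.WriteLine($"Checkpoint: {checkpoint}, remaining events: {remaining.Count}");
```

The checkpoint here is a position in the log, not a counter within the index, so the same stored value stays meaningful whichever index filter is used to resume.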
For example, to read all events of type OrderPlaced using the old event type system projection, you would use the following code:
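// Read the stream of links produced by the event type system projection.
// Each link must be resolved to the original event (resolveLinkTos: true).
var readFromEtStream = client.ReadStreamAsync(
    Direction.Forwards,
    "$et-OrderPlaced",
    StreamPosition.Start,
    resolveLinkTos: true,
    maxCount: 1000
);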
With the event type secondary index, you would use the following code:
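// Read from $all, using the event type secondary index as a filter.
// No link resolution is involved; the original events are returned directly.
var read = client.ReadAllAsync(
    Direction.Forwards,
    Position.Start,
    StreamFilter.Prefix("$idx-et-OrderPlaced"),
    maxCount: 1000
);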
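The filter value in the snippet above is the index name: the $idx- prefix followed by et- and an event type, or ce- and a category. A small helper can keep that naming in one place. The sketch below is illustrative only: CategoryIndex, EventTypeIndex and TryGetCategory are hypothetical names, not part of the client library, and the category derivation assumes the "first" mode with - as the separator.

```csharp
using System;

// Hypothetical helpers, not part of the KurrentDB client library.

// Index name for a category, following the $idx-ce-CATEGORYNAME convention.
string CategoryIndex(string category) => $"$idx-ce-{category}";

// Index name for an event type, following the $idx-et-EVENTTYPE convention.
string EventTypeIndex(string eventType) => $"$idx-et-{eventType}";

// Category of a stream in "first" mode with '-' as the separator:
// everything before the first '-'. Returns false when the stream name has
// no separator; how the server treats such streams is not covered here.
bool TryGetCategory(string streamName, out string category)
{
    int separator = streamName.IndexOf('-');
    category = separator > 0 ? streamName.Substring(0, separator) : string.Empty;
    return separator > 0;
}

Console.WriteLine(EventTypeIndex("OrderPlaced"));      // prints $idx-et-OrderPlaced

if (TryGetCategory("order-eu-123", out var orderCategory))
{
    Console.WriteLine(CategoryIndex(orderCategory));   // prints $idx-ce-order
}
```

The resulting string would be passed to StreamFilter.Prefix in the same way as the literal in the read example.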
Any client API that supports read and subscribe operations on $all with filters can use secondary indexes in the same way. There are a few things to consider:
- Index names used in the filter start with $idx-, e.g. $idx-ce-CATEGORYNAME or $idx-et-EVENTTYPE

When using secondary indexes, keep in mind the following:
In terms of storage and performance, using secondary indexes can lead to:
The first version of secondary indexes has some limitations that will be addressed in future releases:
- The category index does not support the configuration options of the category system projection (first vs last and a custom separator character). It always behaves like the first mode with - as the separator character.
- Index reads do not take into account stream settings such as MaxAge or MaxCount. Events that have been deleted may still be returned by index reads until they are scavenged.
- Because index reads and subscriptions operate on $all, those operations can only be performed by users that are part of the $admins group. Future releases will enable reading from $all and secondary indexes without requiring admin privileges.

KurrentDB uses an embedded DuckDB for storing index data. The DuckDB database files are stored in the same directory as the main KurrentDB database files. Unlike KurrentDB database files, DuckDB files are not append-only and can be modified in place.
Because of that, using file-system based backup solutions (e.g., file copies) may lead to inconsistent backups of secondary indexes if the DuckDB files are modified during the backup process. Therefore, the recommended way to perform backup and restore for KurrentDB v25.1 and later is to use volume snapshots instead of file copies. Volume snapshots ensure that all files, including DuckDB files, are captured in a consistent state. A file copy backup can still be performed if the KurrentDB node is stopped first.