doc/user/content/sql/create-index.md
CREATE INDEX creates an in-memory index on a source, view, or materialized view.
In Materialize, indexes store query results in memory within a specific cluster, and keep these results incrementally updated as new data arrives. This ensures that indexed data remains fresh, reflecting the latest changes with minimal latency.
The primary use case for indexes is to accelerate direct queries issued via SELECT statements.
By maintaining fresh, up-to-date results in memory, indexes can significantly optimize query performance, reducing both response time and compute load—especially for resource-intensive operations such as joins, aggregations, and repeated subqueries.
Because indexes are scoped to a single cluster, they are most useful for accelerating queries within that cluster. For results that must be shared across clusters or persisted to durable storage, consider using a materialized view, which also maintains fresh results but is accessible system-wide.
{{< tabs >}} {{< tab "CREATE INDEX" >}}
Create an index using the specified columns as the index key.
{{% include-syntax file="examples/create_index" example="syntax" %}}
{{< /tab >}} {{< tab "CREATE DEFAULT INDEX" >}}
Create a default index using a set of columns that uniquely identify each row. If this set of columns cannot be inferred, all columns are used.
{{% include-syntax file="examples/create_index" example="syntax-default" %}}
{{< /tab >}} {{< /tabs >}}
You can only reference the columns available in the SELECT list of the query
that defines the view. For example, if your view was defined as SELECT a, b FROM src, you can only reference columns a and b, even if src contains
additional columns.
You cannot exclude any columns from being in the index's "value" set. For
example, if your view is defined as SELECT a, b FROM ..., all indexes will
contain {a, b} as their values.
If you want to create an index that only stores a subset of these columns,
consider creating another materialized view that uses SELECT some_subset FROM this_view....
Indexes in Materialize have the following structure for each unique row:
((tuple of indexed expressions), (tuple of the row, i.e. stored columns))
Automatically created indexes will use all columns as key expressions for the index, unless Materialize is provided or can infer a unique key for the source or view.
For instance, unique keys can be...
GROUP BY.When creating your own indexes, you can choose the indexed expressions.
The in-memory sizes of indexes are proportional to the current size of the source or view they represent. The actual amount of memory required depends on several details related to the rate of compaction and the representation of the types of data in the source or view.
Creating an index may also force the first materialization of a view, which may cause Materialize to install a dataflow to determine and maintain the results of the view. This dataflow may have a memory footprint itself, in addition to that of the index.
{{% include-from-yaml data="index_view_details" name="index-best-practices" %}}
{{% include-from-yaml data="index_view_details" name="table-usage-pattern-intro" %}} {{% include-from-yaml data="index_view_details" name="table-usage-pattern" %}}
You might want to create indexes when...
{{% include-from-yaml data="index_view_details" name="index-query-optimization-specific-instances" %}}
You can optimize the performance of JOIN on two relations by ensuring their
join keys are the key columns in an index.
CREATE MATERIALIZED VIEW active_customers AS
SELECT guid, geo_id, last_active_on
FROM customer_source
WHERE last_active_on > now() - INTERVAL '30' DAYS;
CREATE INDEX active_customers_geo_idx ON active_customers (geo_id);
CREATE MATERIALIZED VIEW active_customer_per_geo AS
SELECT geo.name, count(*)
FROM geo_regions AS geo
JOIN active_customers ON active_customers.geo_id = geo.id
GROUP BY geo.name;
In the above example, the index active_customers_geo_idx...
Helps us because it contains a key that the view active_customer_per_geo can
use to look up values for the join condition (active_customers.geo_id).
Because this index is exactly what the query requires, the Materialize
optimizer will choose to use active_customers_geo_idx rather than build
and maintain a private copy of the index just for this query.
Obeys our restrictions by containing only a subset of columns in the result set.
If you commonly filter by a certain column being equal to a literal value, you can set up an index over that column to speed up your queries:
CREATE MATERIALIZED VIEW active_customers AS
SELECT guid, geo_id, last_active_on
FROM customer_source
GROUP BY geo_id;
CREATE INDEX active_customers_idx ON active_customers (guid);
-- This should now be very fast!
SELECT * FROM active_customers WHERE guid = 'd868a5bf-2430-461d-a665-40418b1125e7';
-- Using indexed expressions:
CREATE INDEX active_customers_exp_idx ON active_customers (upper(guid));
SELECT * FROM active_customers WHERE upper(guid) = 'D868A5BF-2430-461D-A665-40418B1125E7';
-- Filter using an expression in one field and a literal in another field:
CREATE INDEX active_customers_exp_field_idx ON active_customers (upper(guid), geo_id);
SELECT * FROM active_customers WHERE upper(guid) = 'D868A5BF-2430-461D-A665-40418B1125E7' and geo_id = 'ID_8482';
Create an index with an expression to improve query performance over a frequently used expression, and
avoid building downstream views to apply the function like the one used in the example: upper().
Take into account that aggregations like count() cannot be used as indexed expressions.
For more details on using indexes to optimize queries, see Optimization.
The privileges required to execute this statement are:
{{% include-headless "/headless/sql-command-privileges/create-index" %}}