docs/documentation/indexing/reindexing.mdx
If an index's schema is changed, it must be rebuilt. This includes:
Let's assume the existing index is called search_idx, and we want to create a new index called search_idx_v2.
First, use CREATE INDEX CONCURRENTLY to build a new index in the background.
from django.db import connection
from paradedb.indexes import BM25Index
with connection.schema_editor(atomic=False) as schema_editor:
schema_editor.add_index(
MockItem,
BM25Index(
fields={
"id": {},
"description": {},
"category": {},
},
key_field="id",
name="search_idx_v2",
),
concurrently=True,
)
from sqlalchemy import Index
from paradedb.sqlalchemy import indexing
idx = Index(
"search_idx_v2",
indexing.BM25Field(MockItem.id),
indexing.BM25Field(MockItem.description),
indexing.BM25Field(MockItem.category),
postgresql_using="bm25",
postgresql_with={"key_field": "id"},
postgresql_concurrently=True,
)
with engine.connect() as conn:
conn = conn.execution_options(isolation_level="AUTOCOMMIT")
idx.create(conn)
ActiveRecord::Base.connection.add_bm25_index(
:mock_items,
fields: {
id: {},
description: {},
category: {}
},
key_field: :id,
name: :search_idx_v2,
concurrently: true
)
From another session, you can use pg_stat_progress_create_index to track the progress of the new index.
Once the new index is done building, confirm that it is valid:
SELECT ix.indisvalid, ix.indisready, ix.indislive
FROM pg_class i
JOIN pg_index ix ON ix.indexrelid = i.oid
WHERE i.relname = 'search_idx_v2';
indisvalid | indisready | indislive
------------+------------+-----------
t | t | t
(1 row)
If all three columns are true, the original index can safely be dropped, which will redirect queries to the new index.
DROP INDEX search_idx;
REINDEX is used to rebuild an index without changing the schema.
The basic syntax for REINDEX is:
REINDEX INDEX search_idx;
This operation takes an exclusive lock on the table, which blocks incoming writes (but not reads) while the new index is being built.
To allow for concurrent writes during a reindex, use REINDEX CONCURRENTLY:
REINDEX INDEX CONCURRENTLY search_idx;
The tradeoff is that REINDEX CONCURRENTLY is slower than a plain REINDEX. Generally speaking, REINDEX CONCURRENTLY is recommended for
production systems that cannot tolerate temporarily blocked writes.
Although CREATE INDEX CONCURRENTLY and REINDEX CONURRENTLY run in the background, Postgres requires that the
session that is executing the command remain open. If the session is closed,
Postgres will cancel the operation. This is relevant if you are using a
connection pooler like pgbouncer, which may terminate
sessions after a certain idle timeout is reached.
If REINDEX CONCURRENTLY fails or is cancelled, an invalid transient index will be left behind that must be dropped manually.
To check for invalid indexes in psql, run \d <table_name> and look for INVALID indexes.