Reindexing - Paradedb

Changing the Schema

If an index's schema is changed, it must be rebuilt. This includes:

Adding a field to the index
Removing a field from the index
Renaming an indexed column in the underlying table
Changing a field's tokenizer

Let's assume the existing index is called search_idx, and we want to create a new index called search_idx_v2. First, use CREATE INDEX CONCURRENTLY to build a new index in the background.

<CodeGroup> ```sql SQL CREATE INDEX CONCURRENTLY search_idx_v2 ON mock_items USING bm25 (id, description, category) WITH (key_field = 'id'); ```

python

from django.db import connection
from paradedb.indexes import BM25Index

with connection.schema_editor(atomic=False) as schema_editor:
    schema_editor.add_index(
        MockItem,
        BM25Index(
            fields={
                "id": {},
                "description": {},
                "category": {},
            },
            key_field="id",
            name="search_idx_v2",
        ),
        concurrently=True,
    )

python

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

idx = Index(
    "search_idx_v2",
    indexing.BM25Field(MockItem.id),
    indexing.BM25Field(MockItem.description),
    indexing.BM25Field(MockItem.category),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
    postgresql_concurrently=True,
)

with engine.connect() as conn:
    conn = conn.execution_options(isolation_level="AUTOCOMMIT")
    idx.create(conn)

ruby

ActiveRecord::Base.connection.add_bm25_index(
  :mock_items,
  fields: {
    id: {},
    description: {},
    category: {}
  },
  key_field: :id,
  name: :search_idx_v2,
  concurrently: true
)

</CodeGroup> <Note> The `CONCURRENTLY` clause is required. `CONCURRENTLY` allows the existing index to continue serving queries while the new index is being built. </Note>

From another session, you can use pg_stat_progress_create_index to track the progress of the new index.

Once the new index is done building, confirm that it is valid:

sql

SELECT ix.indisvalid, ix.indisready, ix.indislive
FROM pg_class i
JOIN pg_index ix ON ix.indexrelid = i.oid
WHERE i.relname = 'search_idx_v2';

csv

 indisvalid | indisready | indislive
------------+------------+-----------
 t          | t          | t
(1 row)

If all three columns are true, the original index can safely be dropped, which will redirect queries to the new index.

sql

DROP INDEX search_idx;

Rebuilding the Index

REINDEX is used to rebuild an index without changing the schema. The basic syntax for REINDEX is:

sql

REINDEX INDEX search_idx;

This operation takes an exclusive lock on the table, which blocks incoming writes (but not reads) while the new index is being built.

To allow for concurrent writes during a reindex, use REINDEX CONCURRENTLY:

sql

REINDEX INDEX CONCURRENTLY search_idx;

The tradeoff is that REINDEX CONCURRENTLY is slower than a plain REINDEX. Generally speaking, REINDEX CONCURRENTLY is recommended for production systems that cannot tolerate temporarily blocked writes.

Important Caveats

Although CREATE INDEX CONCURRENTLY and REINDEX CONURRENTLY run in the background, Postgres requires that the session that is executing the command remain open. If the session is closed, Postgres will cancel the operation. This is relevant if you are using a connection pooler like pgbouncer, which may terminate sessions after a certain idle timeout is reached.

If REINDEX CONCURRENTLY fails or is cancelled, an invalid transient index will be left behind that must be dropped manually. To check for invalid indexes in psql, run \d <table_name> and look for INVALID indexes.