ParadeDB provides amcheck-style index verification functions to detect corruption and validate the structural integrity of BM25 indexes.
These functions are useful for:
The `pdb.verify_index` function performs structural integrity checks on a BM25 index:
<CodeGroup>

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index("search_idx")
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(engine, "search_idx")
```

```ruby Rails
ParadeDB.paradedb_verify_index("search_idx")
```

</CodeGroup>
This returns a table with three columns:
| Column | Type | Description |
|---|---|---|
| `check_name` | text | Name of the verification check |
| `passed` | boolean | Whether the check passed |
| `details` | text | Additional information about the check result |
```
               check_name               | passed |                    details
----------------------------------------+--------+-----------------------------------------------
 search_idx: schema_valid               | t      | Index schema loaded successfully
 search_idx: index_readable             | t      | Index reader opened successfully
 search_idx: checksums_valid            | t      | All segment checksums validated successfully
 search_idx: segment_metadata_valid     | t      | 3 segments validated successfully
```
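
One way to act on this result from application code is to keep only the rows whose `passed` column is false. The sketch below assumes each row has been fetched as a dict-like mapping keyed by the three column names. `failed_checks` and the sample rows are hypothetical and not part of ParadeDB.

```python
from typing import Iterable, List, Mapping


def failed_checks(rows: Iterable[Mapping[str, object]]) -> List[Mapping[str, object]]:
    """Return only the verification rows whose `passed` column is false."""
    return [row for row in rows if not row["passed"]]


# Hypothetical rows, shaped like the three columns described above.
sample_rows = [
    {
        "check_name": "search_idx: schema_valid",
        "passed": True,
        "details": "Index schema loaded successfully",
    },
    {
        "check_name": "search_idx: checksums_valid",
        "passed": False,
        "details": "example failure detail",
    },
]

for row in failed_checks(sample_rows):
    print(row["check_name"], "->", row["details"])
```

An empty result from `failed_checks` means every check in the input passed.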
To verify that all indexed entries still exist in the heap table, use the `heapallindexed` option:
<CodeGroup>

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index("search_idx", heapallindexed=True)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true
)
```

</CodeGroup>
This adds a check that validates that every indexed `ctid` (tuple identifier) references a valid row in the table.
This is particularly useful for detecting index entries that reference deleted or non-existent rows.
For large indexes, you can check a random sample of documents instead of all documents:
<CodeGroup>

```sql SQL
-- Check 10% of documents
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    sample_rate := 0.1
);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    sample_rate=0.1,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    sample_rate=0.1,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  sample_rate: 0.1
)
```

</CodeGroup>
For long-running verifications, enable progress reporting to see status updates:
<CodeGroup>

```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    report_progress := true
);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    report_progress=True,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    report_progress=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  report_progress: true
)
```

</CodeGroup>
Progress messages are emitted via PostgreSQL's NOTICE channel.
For detailed logging including segment-by-segment progress and resume hints, enable verbose mode:
<CodeGroup>

```sql SQL
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    report_progress := true,
    verbose := true
);
```

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    report_progress=True,
    verbose=True,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    report_progress=True,
    verbose=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  report_progress: true,
  verbose: true
)
```

</CodeGroup>
To stop verification immediately when the first error is found (similar to `pg_amcheck --on-error-stop`):
<CodeGroup>

```python Django
from paradedb import paradedb_verify_index

paradedb_verify_index("search_idx", on_error_stop=True)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    on_error_stop=True,
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  on_error_stop: true
)
```

</CodeGroup>
A single `verify_index` call processes segments sequentially within one PostgreSQL backend.
For very large indexes, you can distribute verification across multiple database connections
by using the `segment_ids` parameter to specify which segments each connection should check.
Running these connections in parallel lets verification use multiple CPU cores.
First, list all segments in the index:
<CodeGroup>

```sql SQL
SELECT * FROM pdb.index_segments('search_idx');
```

```python Django
from paradedb import paradedb_index_segments

paradedb_index_segments("search_idx")
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_index_segments(engine, "search_idx")
```

```ruby Rails
ParadeDB.paradedb_index_segments("search_idx")
```

</CodeGroup>
```
 partition_name | segment_idx | segment_id | num_docs | num_deleted | max_doc
----------------+-------------+------------+----------+-------------+---------
 search_idx     |           0 | b7e661af   |    10000 |           0 |   10000
 search_idx     |           1 | b4fc1b40   |    10000 |           0 |   10000
 search_idx     |           2 | 9894b412   |    10000 |           0 |   10000
 search_idx     |           3 | 4d0168d6   |     5000 |           0 |    5000
```
Then verify specific segments using the `segment_ids` parameter:
<CodeGroup>

```sql SQL
-- Worker 1: Verify even segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := ARRAY[0, 2]
);

-- Worker 2: Verify odd segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := ARRAY[1, 3]
);
```

```python Django
from paradedb import paradedb_verify_index

# Worker 1: Verify even segments
paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    segment_ids=[0, 2],
)

# Worker 2: Verify odd segments
paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    segment_ids=[1, 3],
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

# Worker 1: Verify even segments
diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    segment_ids=[0, 2],
)

# Worker 2: Verify odd segments
diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    segment_ids=[1, 3],
)
```

```ruby Rails
# Worker 1: Verify even segments
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  segment_ids: [0, 2]
)

# Worker 2: Verify odd segments
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  segment_ids: [1, 3]
)
```

</CodeGroup>
Distribute verification across N workers:
<CodeGroup>

```sql SQL
-- Get segments for worker 0 (of 4 workers)
SELECT array_agg(segment_idx) AS segments
FROM pdb.index_segments('search_idx')
WHERE segment_idx % 4 = 0;

-- Run verification with those segments
SELECT * FROM pdb.verify_index('search_idx',
    heapallindexed := true,
    segment_ids := (
        SELECT array_agg(segment_idx)
        FROM pdb.index_segments('search_idx')
        WHERE segment_idx % 4 = 0
    )
);
```

```python Django
from paradedb import paradedb_index_segments, paradedb_verify_index

paradedb_verify_index(
    "search_idx",
    heapallindexed=True,
    segment_ids=[
        row["segment_idx"]
        for row in paradedb_index_segments("search_idx")
        if row["segment_idx"] % 4 == 0
    ],
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_index(
    engine,
    "search_idx",
    heapallindexed=True,
    segment_ids=[
        row["segment_idx"]
        for row in diagnostics.paradedb_index_segments(engine, "search_idx")
        if row["segment_idx"] % 4 == 0
    ],
)
```

```ruby Rails
ParadeDB.paradedb_verify_index(
  "search_idx",
  heapallindexed: true,
  segment_ids: ParadeDB.paradedb_index_segments("search_idx").filter_map { |row|
    row["segment_idx"] if row["segment_idx"] % 4 == 0
  }
)
```

</CodeGroup>
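
The modulo filter used above generalizes to any worker count. As a minimal sketch, the hypothetical helper `segments_for_worker` (not a ParadeDB function) takes the `segment_idx` values returned by `pdb.index_segments` and selects those belonging to one worker, so each worker can pass its own list as `segment_ids`.

```python
from typing import List, Sequence


def segments_for_worker(
    segment_indices: Sequence[int], worker: int, num_workers: int
) -> List[int]:
    """Select the segments where segment_idx % num_workers == worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if not 0 <= worker < num_workers:
        raise ValueError("worker must be in the range [0, num_workers)")
    return [idx for idx in segment_indices if idx % num_workers == worker]


# Segment indices as listed in the segment_idx column of pdb.index_segments.
all_segments = [0, 1, 2, 3]

# Two workers reproduce the even/odd split shown earlier.
assignments = [segments_for_worker(all_segments, w, 2) for w in range(2)]
print(assignments)  # [[0, 2], [1, 3]]
```

When there are fewer segments than workers, some workers receive an empty list. In the SQL form, `array_agg` over zero rows returns NULL, and a NULL `segment_ids` is documented as checking all segments, so it is probably safest to skip workers that have nothing assigned.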
To verify all BM25 indexes in the database at once:
<CodeGroup>

```sql SQL
SELECT * FROM pdb.verify_all_indexes();
```

```python Django
from paradedb import paradedb_verify_all_indexes

paradedb_verify_all_indexes()
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_verify_all_indexes(engine)
```

```ruby Rails
ParadeDB.paradedb_verify_all_indexes()
```

</CodeGroup>
Filter indexes by schema or name pattern (using SQL LIKE syntax):
<CodeGroup>

```sql SQL
-- Verify indexes in the 'public' schema only
SELECT * FROM pdb.verify_all_indexes(schema_pattern := 'public');

-- Verify indexes matching a name pattern
SELECT * FROM pdb.verify_all_indexes(index_pattern := 'search_%');

-- Combine filters
SELECT * FROM pdb.verify_all_indexes(
    schema_pattern := 'app_%',
    index_pattern := '%_idx',
    heapallindexed := true
);
```

```python Django
from paradedb import paradedb_verify_all_indexes

# Verify indexes in the 'public' schema only
paradedb_verify_all_indexes(schema_pattern="public")

# Verify indexes matching a name pattern
paradedb_verify_all_indexes(index_pattern="search_%")

# Combine filters
paradedb_verify_all_indexes(
    schema_pattern="app_%",
    index_pattern="%_idx",
    heapallindexed=True,
)
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

# Verify indexes in the 'public' schema only
diagnostics.paradedb_verify_all_indexes(
    engine,
    schema_pattern="public",
)

# Verify indexes matching a name pattern
diagnostics.paradedb_verify_all_indexes(
    engine,
    index_pattern="search_%",
)

# Combine filters
diagnostics.paradedb_verify_all_indexes(
    engine,
    schema_pattern="app_%",
    index_pattern="%_idx",
    heapallindexed=True,
)
```

```ruby Rails
# Verify indexes in the 'public' schema only
ParadeDB.paradedb_verify_all_indexes(schema_pattern: "public")

# Verify indexes matching a name pattern
ParadeDB.paradedb_verify_all_indexes(index_pattern: "search_%")

# Combine filters
ParadeDB.paradedb_verify_all_indexes(
  schema_pattern: "app_%",
  index_pattern: "%_idx",
  heapallindexed: true
)
```

</CodeGroup>
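
In LIKE syntax, `%` matches any sequence of characters and `_` matches exactly one character, so the `_` in `search_%` is itself a wildcard and the pattern also matches a name such as `searchXidx`. The sketch below previews which names a pattern would select by translating it into a regular expression. `like_to_regex` and `like_matches` are hypothetical helpers, not ParadeDB functions, and the translation assumes standard LIKE semantics with no escape character.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern (escape characters not handled) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")  # _ matches exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)


def like_matches(pattern: str, name: str) -> bool:
    """Return True if the whole name matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None


print(like_matches("search_%", "search_idx"))  # True
print(like_matches("search_%", "searchXidx"))  # True: _ is a wildcard
print(like_matches("%_idx", "products_idx"))   # True
print(like_matches("app_%", "public"))         # False
```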
To see all BM25 indexes in the database with summary statistics:
<CodeGroup>

```sql SQL
SELECT * FROM pdb.indexes();
```

```python Django
from paradedb import paradedb_indexes

paradedb_indexes()
```

```python SQLAlchemy
from paradedb.sqlalchemy import diagnostics

diagnostics.paradedb_indexes(engine)
```

```ruby Rails
ParadeDB.paradedb_indexes()
```

</CodeGroup>
```
 schemaname |  tablename  |   indexname   | indexrelid | num_segments | total_docs
------------+-------------+---------------+------------+--------------+------------
 public     | products    | products_idx  |      16421 |            3 |      50000
 public     | documents   | documents_idx |      16435 |            5 |     125000
 app        | articles    | articles_idx  |      16448 |            2 |      10000
```
`pdb.verify_index`

Verifies a single BM25 index.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `index` | regclass | (required) | The index to verify |
| `heapallindexed` | boolean | false | Check that all indexed ctids exist in the heap |
| `sample_rate` | float | NULL | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | false | Emit progress messages |
| `verbose` | boolean | false | Emit detailed segment-level progress and resume hints |
| `on_error_stop` | boolean | false | Stop on first error found |
| `segment_ids` | int[] | NULL | Specific segment indices to check. NULL = all segments |
`pdb.verify_all_indexes`

Verifies all BM25 indexes in the database.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `schema_pattern` | text | NULL | Filter by schema name (SQL LIKE pattern). NULL = all |
| `index_pattern` | text | NULL | Filter by index name (SQL LIKE pattern). NULL = all |
| `heapallindexed` | boolean | false | Check that all indexed ctids exist in the heap |
| `sample_rate` | float | NULL | Fraction of documents to check (0.0-1.0). NULL = check all |
| `report_progress` | boolean | false | Emit progress messages |
| `on_error_stop` | boolean | false | Stop on first error found |
`pdb.index_segments`

Lists all segments in a BM25 index.

| Parameter | Type | Default | Description |
|---|---|---|---|
| `index` | regclass | (required) | The index to inspect |

Returns:

| Column | Type | Description |
|---|---|---|
| `partition_name` | text | Name of the index partition |
| `segment_idx` | int | Segment index (use with the `segment_ids` parameter) |
| `segment_id` | text | Tantivy segment UUID |
| `num_docs` | bigint | Number of live documents |
| `num_deleted` | bigint | Number of deleted documents |
| `max_doc` | bigint | Maximum document ID |
`pdb.indexes`

Lists all BM25 indexes in the database.

Returns:

| Column | Type | Description |
|---|---|---|
| `schemaname` | text | Schema containing the index |
| `tablename` | text | Table the index is on |
| `indexname` | text | Name of the index |
| `indexrelid` | oid | OID of the index |
| `num_segments` | int | Number of Tantivy segments |
| `total_docs` | bigint | Total documents across all segments |