Back to Paradedb

Index Creation

docs/documentation/performance-tuning/create-index.mdx

0.23.31.6 KB
Original Source

These actions can improve the performance and memory consumption of CREATE INDEX and REINDEX statements.

Raise Parallel Indexing Workers

ParadeDB uses Postgres' max_parallel_maintenance_workers setting to determine the degree of parallelism during CREATE INDEX/REINDEX. Postgres' default is 2, which may be too low for large tables.

sql
SET max_parallel_maintenance_workers = 8;

In order for max_parallel_maintenance_workers to take effect, it must be less than or equal to both max_parallel_workers and max_worker_processes.

Configure Indexing Memory

The default Postgres maintenance_work_mem value of 64MB is quite conservative and can slow down parallel index builds. We recommend at least 64MB per parallel indexing worker.

sql
SET maintenance_work_mem = '2GB';
<Note> Each worker is required to have at least `15MB` memory. If `maintenance_work_mem` is set too low, an error will be returned. </Note>

Defer Index Creation

If possible, creating the BM25 index should be deferred until after a table has been populated. To illustrate:

sql
-- This is preferred
CREATE TABLE test (id SERIAL, data text);
INSERT INTO test (data) VALUES ('hello world'), ('many more values');
CREATE INDEX ON test USING bm25 (id, data) WITH (key_field = 'id');

-- ...to this
CREATE TABLE test (id SERIAL, data text);
CREATE INDEX ON test USING bm25 (id, data) WITH (key_field = 'id');
INSERT INTO test (data) VALUES ('hello world'), ('many more values');

This allows the BM25 index to create a more tightly packed, efficient representation on disk and will lead to faster build times.