docs/deploy/citus.mdx
Citus transforms PostgreSQL into a distributed database with horizontal sharding. ParadeDB is fully compatible with Citus, enabling distributed full-text search across sharded tables.
create_distributed_table()||| (match disjunction) and &&& (match conjunction) operators in queries across sharded tablesBoth citus and pg_search must be added to shared_preload_libraries in the correct order:
# Install Citus first
curl https://install.citusdata.com/community/deb.sh | sudo bash
apt-get install -y postgresql-18-citus-14.0
# Add both extensions to shared_preload_libraries
sed -i "s/^shared_preload_libraries = .*/shared_preload_libraries = 'citus,pg_search'/" /var/lib/postgresql/data/postgresql.conf
# Restart PostgreSQL
# Then create extensions in your database
Here's a complete example of setting up distributed search with Citus:
CREATE EXTENSION citus;
CREATE EXTENSION pg_search;
-- Create a table with a distribution key
CREATE TABLE articles (
id SERIAL,
author_id INT NOT NULL,
title TEXT,
body TEXT,
PRIMARY KEY (author_id, id) -- Must include distribution column
);
-- Distribute the table across shards
SELECT create_distributed_table('articles', 'author_id');
-- Create a BM25 index on the distributed table
CREATE INDEX articles_search_idx ON articles
USING bm25 (id, title, body)
WITH (key_field='id');
-- Insert some data
INSERT INTO articles (author_id, title, body) VALUES
(1, 'PostgreSQL Performance', 'Optimizing PostgreSQL queries for large datasets'),
(1, 'Distributed Databases', 'Understanding sharding and replication strategies'),
(2, 'Full-Text Search', 'Building search engines with PostgreSQL');
-- Search across shards
SELECT id, title FROM articles
WHERE body ||| 'PostgreSQL distributed'
ORDER BY id;
-- Results:
-- id | title
-- ----+------------------------
-- 1 | PostgreSQL Performance
-- 3 | Full-Text Search
You can verify that both ParadeDB and Citus are working together by examining the query plan:
EXPLAIN (VERBOSE)
SELECT id, title FROM articles
WHERE body ||| 'PostgreSQL distributed'
ORDER BY id;
The plan should show:
Sort (cost=11041.82..11291.82 rows=100000 width=36)
Output: remote_scan.id, remote_scan.title
Sort Key: remote_scan.id
-> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=36)
Output: remote_scan.id, remote_scan.title
Task Count: 32
Tasks Shown: One of 32
-> Task
Query: SELECT id, title FROM public.articles_102008 articles WHERE (id OPERATOR(pg_catalog.@@@) ...)
Node: host=localhost port=5432 dbname=postgres
-> Custom Scan (ParadeDB Base Scan) on public.articles_102008 articles (cost=10.00..10.01 rows=1 width=36)
Output: id, title
Table: articles_102008
Index: articles_search_idx_102008
Tantivy Query: {"with_index":{"query":{"with_index":{"query":{"match":{"field":"body","value":"PostgreSQL distributed"}}}}}}
ParadeDB search operators work seamlessly with Citus distributed JOINs:
-- Create and distribute a second table
CREATE TABLE authors (
id INT PRIMARY KEY,
name TEXT,
bio TEXT
);
SELECT create_distributed_table('authors', 'id');
-- JOIN with search operators
SELECT a.name, ar.title
FROM authors a
JOIN articles ar ON a.id = ar.author_id
WHERE ar.body ||| 'PostgreSQL'
ORDER BY a.name;
-- Results:
-- name | title
-- -------+------------------------
-- Alice | PostgreSQL Performance
-- Bob | Full-Text Search
Check the execution plan for distributed JOINs with search:
EXPLAIN (VERBOSE)
SELECT a.name, ar.title
FROM authors a
JOIN articles ar ON a.id = ar.author_id
WHERE ar.body ||| 'PostgreSQL'
ORDER BY a.name;
Sort (cost=12067.32..12317.32 rows=100000 width=64)
Output: remote_scan.name, remote_scan.title
Sort Key: remote_scan.name
-> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=64)
Output: remote_scan.name, remote_scan.title
Task Count: 32
Tasks Shown: One of 32
-> Task
Query: SELECT a.name, ar.title FROM (public.authors_102040 a JOIN public.articles_102008 ar ON (...))
Node: host=localhost port=5432 dbname=postgres
-> Nested Loop (cost=10.15..18.20 rows=1 width=64)
Output: a.name, ar.title
Inner Unique: true
-> Custom Scan (ParadeDB Base Scan) on public.articles_102008 ar (cost=10.00..10.01 rows=1 width=36)
Output: ar.title, ar.author_id
Table: articles_102008
Index: articles_search_idx_102008
Tantivy Query: {"with_index":{"query":{"with_index":{"query":{"match":{"field":"body","value":"PostgreSQL"}}}}}}
-> Index Scan using authors_pkey_102040 on public.authors_102040 a (cost=0.15..8.17 rows=1 width=36)
Output: a.id, a.name, a.bio
Index Cond: (a.id = ar.author_id)
Key indicators:
Nested Loop shows efficient JOIN execution on each shardCustom Scan (ParadeDB Base Scan) on the outer side of the JOIN uses BM25 for filteringIndex Scan on authors table uses the primary key for lookupsWhen using ParadeDB with Citus:
For more guidance on optimizing distributed search workloads, please reach out to us in the ParadeDB Community Slack or via email.