docs/deploy/logical-replication/operational-guide.mdx
This guide covers how to operate ParadeDB after logical replication has been set up.
Use Getting Started with Logical Replication to create the publication and subscription first. This page focuses on what happens after the link is established and ParadeDB is kept continuously in sync.
When ParadeDB is used as a logical subscriber, the source database stays authoritative while search traffic is isolated from OLTP traffic.
<Note> Logical replication copies row changes into ParadeDB, but it does not copy BM25 indexes from the publisher. For the deployment described in this guide, build the BM25 indexes you plan to query on the ParadeDB subscriber. </Note>

Let PostgreSQL finish copying the base table data before you build BM25 indexes. This avoids extra indexing work during the bootstrap phase.
On ParadeDB, you can check whether the initial copy is still running with:
```sql
SELECT
  subname,
  worker_type,
  CASE WHEN relid = 0 THEN NULL ELSE relid::regclass END AS table_name,
  latest_end_time
FROM pg_stat_subscription
ORDER BY 1, 2, 3;
```
The initial copy is complete when there are no remaining rows with
worker_type = 'table synchronization'. If you want a stricter per-table
check, run:
```sql
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY 1;
```
The initial copy is complete when every replicated table is in state r
(ready).
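If you only want to see the tables that have not finished yet, the same catalog can be filtered on the ready state:

```sql
-- Lists only tables that have not reached the ready ('r') state
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
WHERE srsubstate <> 'r'
ORDER BY 1;
```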
Once the replicated tables are caught up, create BM25 indexes locally on ParadeDB:
```sql
CREATE INDEX mock_items_bm25_idx ON public.mock_items
USING bm25 (id, description, category, rating)
WITH (key_field='id');
```
After this, ongoing replicated INSERT, UPDATE, and DELETE operations will
keep the BM25 index current automatically.
Your application can now issue search queries to ParadeDB without adding search indexes to the primary database:
```sql
SELECT id, description, pdb.score(id) AS score
FROM mock_items
WHERE description @@@ 'running shoes'
ORDER BY score DESC
LIMIT 10;
```
For large or high-churn production tables, use one publication and one subscription per large table, or group only small related tables together.
This gives each subscription its own main apply worker and replication slot. In normal steady-state replication, PostgreSQL does not parallelize ordinary change application across tables within a single subscription, so one hot table can delay other tables that share that apply worker. A publication per table alone does not provide that isolation unless it also has its own subscription.
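As a sketch of that layout for a single hot table, with the publication, subscription, and connection values as placeholder assumptions:

```sql
-- On the publisher: one publication per large table
CREATE PUBLICATION orders_pub FOR TABLE public.orders;

-- On ParadeDB: one subscription per publication, so the table gets its own apply worker and slot
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary.example.com port=5432 dbname=app user=replicator password=...'
  PUBLICATION orders_pub;
```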
If you split replication this way, size the replication worker settings for the number of subscriptions you plan to run:
- On the publisher, set `max_replication_slots` to at least the number of subscriptions plus a reserve for initial table synchronization workers. During bootstrap, each active table synchronization worker can temporarily consume its own replication slot on the publisher. With the default `max_sync_workers_per_subscription = 2`, leave room for the main subscription slot plus up to two extra sync slots per bootstrapping subscription, and set `max_wal_senders` high enough to cover the same plus any physical replicas.
- On ParadeDB, set `max_replication_slots` and `max_logical_replication_workers` to at least the number of subscriptions plus a reserve for table synchronization workers. On PostgreSQL 18+, `max_active_replication_origins` controls replication origin tracking separately and should also be sized accordingly. `max_worker_processes` must be high enough to accommodate those logical replication workers and any other background workers used by the system.
- `max_sync_workers_per_subscription` controls initial-copy parallelism when a subscription is created or refreshed. The default is 2, so multi-table publications normally copy at most two tables at a time unless you raise it.

When you want ParadeDB to index a new table:

1. Create the table on the publisher, if it does not already exist.
2. Create a matching table on ParadeDB.
3. Add the table to the publication on the publisher.
4. Refresh the subscription on ParadeDB.
5. Create a BM25 index on the new table.
Whether step 3 is manual depends on how the publication was defined. If the
publication uses FOR ALL TABLES, the new table is included automatically. If
it uses FOR TABLES IN SCHEMA ..., new tables in those schemas are included
automatically. If it was created from an explicit table list, add the table
manually. If you do not want the table on ParadeDB, do not include it in the
publication.
```sql
-- On the publisher
ALTER PUBLICATION app_search_pub ADD TABLE public.new_table;

-- On ParadeDB
ALTER SUBSCRIPTION app_search_sub REFRESH PUBLICATION;
```
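By default, `REFRESH PUBLICATION` also copies the new table's existing rows. If those rows are already present on ParadeDB and you only want to start streaming changes, the copy can be skipped:

```sql
-- On ParadeDB: pick up the new table without re-copying data that already exists locally
ALTER SUBSCRIPTION app_search_sub REFRESH PUBLICATION WITH (copy_data = false);
```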
If you add or remove a column that is part of a BM25 index, rebuild the affected BM25 index on ParadeDB. See Reindexing for the BM25 rebuild workflow.
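As a rough sketch of that rebuild, assuming the rating column is being removed and should be dropped from the index; the Reindexing page remains the authoritative workflow:

```sql
-- Recreate the BM25 index with the updated column list
DROP INDEX mock_items_bm25_idx;
CREATE INDEX mock_items_bm25_idx ON public.mock_items
USING bm25 (id, description, category)
WITH (key_field='id');
```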
PostgreSQL logical replication does not replicate schema changes. That means the publisher and ParadeDB must be kept in sync manually.
In practice, most teams do this through their existing migration runner or framework tooling, whether that is Rails migrations, Django migrations, Prisma Migrate, or another migration system.
For additive changes such as ADD COLUMN, the safest rollout is usually:

1. Add the column on ParadeDB (the subscriber) first.
2. Add the same column on the publisher.

This follows PostgreSQL's recommendation to apply additive schema changes on the subscriber first whenever possible, which avoids intermittent apply failures.
Logical replication can tolerate extra columns on the subscriber, so adding a
column on ParadeDB first will not stop replication by itself. Those extra
subscriber-only columns use their local default value, or NULL if no default
is defined, until the publisher starts sending that column.
If the new column must be NOT NULL, give it a compatible default on both
sides or use a coordinated maintenance window. Otherwise replicated INSERT
operations can fail before the publisher-side change is in place.
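As a sketch of that two-step rollout, with a hypothetical nullable brand column:

```sql
-- Step 1: on ParadeDB (the subscriber) first
ALTER TABLE mock_items ADD COLUMN brand text;

-- Step 2: then on the publisher
ALTER TABLE mock_items ADD COLUMN brand text;
```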
If the change is not additive, such as a column rename, drop, or incompatible type change, use a short maintenance window, pause writes to the affected tables if possible, and coordinate both sides explicitly:
```sql
-- On Subscriber
ALTER SUBSCRIPTION marketplace_sub DISABLE;
ALTER TABLE mock_items RENAME COLUMN category TO product_category;

-- On Publisher
ALTER TABLE mock_items RENAME COLUMN category TO product_category;

-- Back on Subscriber
ALTER SUBSCRIPTION marketplace_sub ENABLE;
```
PostgreSQL needs a replica identity to replicate UPDATE and DELETE
operations. A primary key is best. Another suitable unique index can also be
used as the replica identity. If a table has no suitable key, you can use the
per-table fallback:
```sql
ALTER TABLE public.events REPLICA IDENTITY FULL;
```
Do not think of this as a server-wide setting. REPLICA IDENTITY FULL is set
per published table and should be treated as a fallback rather than the default
design.
PostgreSQL explicitly warns that subscriber-side UPDATE and DELETE can
become very inefficient under FULL, because the subscriber must locate the
matching row using the entire old row image rather than a compact key.
FULL also increases WAL volume and replication traffic on the publisher,
since every UPDATE and DELETE writes the full before-image of the row
into WAL instead of just the key columns.
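When a table has no primary key but does have a suitable unique index on NOT NULL columns, pointing the replica identity at that index is usually a better fix than FULL. A sketch, with events_natural_key_uq as an assumed index name:

```sql
-- Use an existing unique index (non-partial, on NOT NULL columns) as the replica identity
ALTER TABLE public.events REPLICA IDENTITY USING INDEX events_natural_key_uq;
```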
Permanent logical replication is operationally safe only if you watch the publisher, not just the subscriber. The most important signal is how much WAL a logical slot is retaining.
```sql
SELECT
  slot_name,
  active,
  restart_lsn,
  confirmed_flush_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
  wal_status,
  safe_wal_size,
  inactive_since
FROM pg_replication_slots
WHERE slot_type = 'logical';
```
Watch for:
- `retained_wal` growing steadily because the subscriber is not acknowledging WAL quickly enough
- `inactive_since` becoming non-NULL for longer than expected
- `wal_status` showing that the slot is under pressure
- `pg_wal` disk usage on the publisher growing

To reduce blast radius, configure `max_slot_wal_keep_size` on the publisher. This caps how much WAL a slot may retain, but it can also invalidate a lagging subscriber, so it should be paired with alerting and a reseed plan.
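A sketch of that cap; the value here is an assumption and should be sized against the publisher's pg_wal disk headroom:

```sql
-- On the publisher; a configuration reload is enough, no restart required
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
```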
Use the subscriber to confirm that apply workers are healthy and that errors are not accumulating:
```sql
SELECT subname, worker_type, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;

SELECT subname, apply_error_count, sync_error_count
FROM pg_stat_subscription_stats;
```
If latest_end_time stops advancing or apply_error_count increases, inspect
the subscriber logs immediately.
One common cause of apply-worker failures is schema drift between the publisher and subscriber.
Two common log patterns for schema drift are:
```
logical replication target relation "public.doctor" is missing replicated columns: "personnel_id", "role_function_id"

logical replication apply worker for subscription "paradedb_subscription" has started
background worker "logical replication apply worker" (PID 2570238) exited with exit code 1
```
The first message is the root cause. The second means the apply worker crashed after hitting that error and PostgreSQL will try to restart it.
When you see these messages:

1. Compare the table's definition on the publisher and on ParadeDB.
2. Apply the missing DDL on ParadeDB, for example by adding the columns named in the error, as sketched below.
3. Let the apply worker restart; replication resumes automatically once the schemas match again.
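A sketch of step 2 for the error above; the column types are assumptions and must match the publisher's definitions:

```sql
-- On ParadeDB: add the columns the publisher is sending
ALTER TABLE public.doctor
  ADD COLUMN personnel_id bigint,
  ADD COLUMN role_function_id bigint;
```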
Another common cause of apply-worker failures is a logical replication conflict. For example, a duplicate key, a permissions failure on the target table, or row-level security on the subscriber can stop replication even when the schemas match.
```
ERROR: duplicate key value violates unique constraint ...
CONTEXT: processing remote data during INSERT for replication target relation ...
```
When you suspect a replication conflict:

1. Check the subscriber logs for the error detail and the finish LSN of the failing transaction.
2. Where possible, resolve the conflict on ParadeDB itself, for example by removing the duplicate row or fixing the permission or row-level security policy.
3. Use `ALTER SUBSCRIPTION ... SKIP` with care, as sketched below. Skipping a conflicting transaction can leave the subscriber inconsistent, so it should be treated as a last resort rather than the default fix.

For conflict types and the PostgreSQL recovery workflow, see the PostgreSQL logical replication conflicts documentation.
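A sketch of that last resort; the LSN is a placeholder and must come from the finish LSN reported in the subscriber's error detail:

```sql
-- On ParadeDB: skip the entire failing transaction, then verify data consistency afterwards
ALTER SUBSCRIPTION paradedb_subscription SKIP (lsn = '0/14C0378');
```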
If the logical slot on the publisher is filling disk and ParadeDB cannot catch up quickly enough, the priority is protecting the publisher.
If the subscriber is reachable and healthy enough to cleanly tear down, dropping the subscription is the cleanest path:
```sql
DROP SUBSCRIPTION paradedb_subscription;
```
To protect the publisher from continued pg_wal growth when you are
intentionally giving up the current replica state, drop the slot on the
publisher:
```sql
SELECT pg_drop_replication_slot('paradedb_subscription');
```
After either step, ParadeDB must be reinitialized from a fresh schema and data copy before it can resume as a logical subscriber.
For schema-change basics, see Schema Changes. For multiple source databases, see Multi-Database Replication for Microservices.