docs/deploy/logical-replication/operational-guide.mdx
This guide covers how to operate ParadeDB after logical replication has been set up.
Use Getting Started with Logical Replication to create the publication and subscription first. This page focuses on what happens after the link is established and ParadeDB is kept continuously in sync.
When ParadeDB is used as a logical subscriber, the source database stays authoritative while search traffic is isolated from OLTP traffic.
<Note> Logical replication copies row changes into ParadeDB, but it does not copy BM25 indexes from the publisher. For the deployment described in this guide, build the BM25 indexes you plan to query on the ParadeDB subscriber. </Note>

Let PostgreSQL finish copying the base table data before you build BM25 indexes. This avoids extra indexing work during the bootstrap phase.
On ParadeDB, you can check whether the initial copy is still running with:
```sql
SELECT
  subname,
  worker_type,
  CASE WHEN relid = 0 THEN NULL ELSE relid::regclass END AS table_name,
  latest_end_time
FROM pg_stat_subscription
ORDER BY 1, 2, 3;
```
The initial copy is complete when there are no remaining rows with
worker_type = 'table synchronization'. If you want a stricter per-table
check, run:
```sql
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
ORDER BY 1;
```
The initial copy is complete when every replicated table is in state r
(ready).
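If you only want to see the tables that have not finished yet, the same catalog can be filtered on the ready state:

```sql
-- Lists only tables that have not reached the ready ('r') state
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel
WHERE srsubstate <> 'r'
ORDER BY 1;
```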
Once the replicated tables are caught up, create BM25 indexes locally on ParadeDB:
```sql
CREATE INDEX mock_items_bm25_idx ON public.mock_items
USING bm25 (id, description, category, rating)
WITH (key_field='id');
```
After this, ongoing replicated INSERT, UPDATE, and DELETE operations will
keep the BM25 index current automatically.
Your application can now issue search queries to ParadeDB without adding search indexes to the primary database:
```sql
SELECT id, description, pdb.score(id) AS score
FROM mock_items
WHERE description @@@ 'running shoes'
ORDER BY score DESC
LIMIT 10;
```
For large or high-churn production tables, use one publication and one subscription per large table, or group only small related tables together.
This gives each subscription its own main apply worker and replication slot. In normal steady-state replication, PostgreSQL does not parallelize ordinary change application across tables within a single subscription, so one hot table can delay other tables that share that apply worker. A publication per table alone does not provide that isolation unless it also has its own subscription.
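As a sketch of that layout for a single hot table, with the publication, subscription, and connection values as placeholder assumptions:

```sql
-- On the publisher: one publication per large table
CREATE PUBLICATION orders_pub FOR TABLE public.orders;

-- On ParadeDB: one subscription per publication, so the table gets its own apply worker and slot
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=primary.example.com port=5432 dbname=app user=replicator password=...'
  PUBLICATION orders_pub;
```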
If you split replication this way, size the replication worker settings for the number of subscriptions you plan to run:
- On the publisher, set `max_replication_slots` to at least the number of subscriptions plus a reserve for initial table synchronization workers. During bootstrap, each active table synchronization worker can temporarily consume its own replication slot on the publisher. With the default `max_sync_workers_per_subscription = 2`, leave room for the main subscription slot plus up to two extra sync slots per bootstrapping subscription, and set `max_wal_senders` high enough to cover the same plus any physical replicas.
- On ParadeDB, set `max_replication_slots` and `max_logical_replication_workers` to at least the number of subscriptions plus a reserve for table synchronization workers. On PostgreSQL 18+, `max_active_replication_origins` controls replication origin tracking separately and should also be sized accordingly. `max_worker_processes` must be high enough to accommodate those logical replication workers and any other background workers used by the system.
- `max_sync_workers_per_subscription` controls initial-copy parallelism when a subscription is created or refreshed. The default is 2, so multi-table publications normally copy at most two tables at a time unless you raise it.

When you want ParadeDB to index a new table:

1. Create the table on the publisher, if it does not already exist.
2. Create a matching table on ParadeDB.
3. Add the table to the publication on the publisher.
4. Refresh the subscription on ParadeDB.
5. Create a BM25 index on the new table.
Whether step 3 is manual depends on how the publication was defined. If the
publication uses FOR ALL TABLES, the new table is included automatically. If
it uses FOR TABLES IN SCHEMA ..., new tables in those schemas are included
automatically. If it was created from an explicit table list, add the table
manually. If you do not want the table on ParadeDB, do not include it in the
publication.
```sql
-- On the publisher
ALTER PUBLICATION app_search_pub ADD TABLE public.new_table;

-- On ParadeDB
ALTER SUBSCRIPTION app_search_sub REFRESH PUBLICATION;
```
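By default, `REFRESH PUBLICATION` also copies the new table's existing rows. If those rows are already present on ParadeDB and you only want to start streaming changes, the copy can be skipped:

```sql
-- On ParadeDB: pick up the new table without re-copying data that already exists locally
ALTER SUBSCRIPTION app_search_sub REFRESH PUBLICATION WITH (copy_data = false);
```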
If you add or remove a column that is part of a BM25 index, rebuild the affected BM25 index on ParadeDB. See Reindexing for the BM25 rebuild workflow.
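As a rough sketch of that rebuild, assuming the rating column is being removed and should be dropped from the index; the Reindexing page remains the authoritative workflow:

```sql
-- Recreate the BM25 index with the updated column list
DROP INDEX mock_items_bm25_idx;
CREATE INDEX mock_items_bm25_idx ON public.mock_items
USING bm25 (id, description, category)
WITH (key_field='id');
```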
PostgreSQL logical replication does not replicate schema changes. That means the publisher and ParadeDB must be kept in sync manually.
In practice, most teams do this through their existing migration runner or framework tooling, whether that is Rails migrations, Django migrations, Prisma Migrate, or another migration system.
For additive changes such as ADD COLUMN, the safest rollout is usually:

1. Add the column on ParadeDB (the subscriber) first.
2. Add the same column on the publisher.

This follows PostgreSQL's recommendation to apply additive schema changes on the subscriber first whenever possible, which avoids intermittent apply failures.
Logical replication can tolerate extra columns on the subscriber, so adding a
column on ParadeDB first will not stop replication by itself. Those extra
subscriber-only columns use their local default value, or NULL if no default
is defined, until the publisher starts sending that column.
If the new column must be NOT NULL, give it a compatible default on both
sides or use a coordinated maintenance window. Otherwise replicated INSERT
operations can fail before the publisher-side change is in place.
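As a sketch of that two-step rollout, with a hypothetical nullable brand column:

```sql
-- Step 1: on ParadeDB (the subscriber) first
ALTER TABLE mock_items ADD COLUMN brand text;

-- Step 2: then on the publisher
ALTER TABLE mock_items ADD COLUMN brand text;
```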
If the change is not additive, such as a column rename, drop, or incompatible type change, use a short maintenance window, pause writes to the affected tables if possible, and coordinate both sides explicitly:
```sql
-- On Subscriber
ALTER SUBSCRIPTION marketplace_sub DISABLE;
ALTER TABLE mock_items RENAME COLUMN category TO product_category;

-- On Publisher
ALTER TABLE mock_items RENAME COLUMN category TO product_category;

-- Back on Subscriber
ALTER SUBSCRIPTION marketplace_sub ENABLE;
```
PostgreSQL needs a replica identity to replicate UPDATE and DELETE
operations. A primary key is best. Another suitable unique index can also be
used as the replica identity. If a table has no suitable key, you can use the
per-table fallback:
```sql
ALTER TABLE public.events REPLICA IDENTITY FULL;
```
Do not think of this as a server-wide setting. REPLICA IDENTITY FULL is set
per published table and should be treated as a fallback rather than the default
design.
PostgreSQL explicitly warns that subscriber-side UPDATE and DELETE can
become very inefficient under FULL, because the subscriber must locate the
matching row using the entire old row image rather than a compact key.
FULL also increases WAL volume and replication traffic on the publisher,
since every UPDATE and DELETE writes the full before-image of the row
into WAL instead of just the key columns.
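When a table has no primary key but does have a suitable unique index on NOT NULL columns, pointing the replica identity at that index is usually a better fix than FULL. A sketch, with events_natural_key_uq as an assumed index name:

```sql
-- Use an existing unique index (non-partial, on NOT NULL columns) as the replica identity
ALTER TABLE public.events REPLICA IDENTITY USING INDEX events_natural_key_uq;
```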
Permanent logical replication is operationally safe only if you watch the publisher, not just the subscriber. The most important signal is how much WAL a logical slot is retaining.
```sql
SELECT
  slot_name,
  active,
  restart_lsn,
  confirmed_flush_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
  wal_status,
  safe_wal_size,
  inactive_since
FROM pg_replication_slots
WHERE slot_type = 'logical';
```
Watch for:
- `retained_wal` growing steadily because the subscriber is not acknowledging WAL quickly enough
- `inactive_since` becoming non-NULL for longer than expected
- `wal_status` showing that the slot is under pressure
- `pg_wal` disk usage on the publisher growing

To reduce blast radius, configure `max_slot_wal_keep_size` on the publisher. This caps how much WAL a slot may retain, but it can also invalidate a lagging subscriber, so it should be paired with alerting and a reseed plan.
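A sketch of that cap; the value here is an assumption and should be sized against the publisher's pg_wal disk headroom:

```sql
-- On the publisher; a configuration reload is enough, no restart required
ALTER SYSTEM SET max_slot_wal_keep_size = '50GB';
SELECT pg_reload_conf();
```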
Use the subscriber to confirm that apply workers are healthy and that errors are not accumulating:
```sql
SELECT subname, worker_type, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;

SELECT subname, apply_error_count, sync_error_count
FROM pg_stat_subscription_stats;
```
If latest_end_time stops advancing or apply_error_count increases, inspect
the subscriber logs immediately.
One common cause of apply-worker failures is schema drift between the publisher and subscriber.
Two common log patterns for schema drift are:
```
logical replication target relation "public.doctor" is missing replicated columns: "personnel_id", "role_function_id"

logical replication apply worker for subscription "paradedb_subscription" has started
background worker "logical replication apply worker" (PID 2570238) exited with exit code 1
```
The first message is the root cause. The second means the apply worker crashed after hitting that error and PostgreSQL will try to restart it.
When you see these messages:

1. Compare the table's definition on the publisher and on ParadeDB.
2. Apply the missing DDL on ParadeDB, for example by adding the columns named in the error, as sketched below.
3. Let the apply worker restart; replication resumes automatically once the schemas match again.
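A sketch of step 2 for the error above; the column types are assumptions and must match the publisher's definitions:

```sql
-- On ParadeDB: add the columns the publisher is sending
ALTER TABLE public.doctor
  ADD COLUMN personnel_id bigint,
  ADD COLUMN role_function_id bigint;
```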
Another common cause of apply-worker failures is a logical replication conflict. For example, a duplicate key, a permissions failure on the target table, or row-level security on the subscriber can stop replication even when the schemas match.
```
ERROR: duplicate key value violates unique constraint ...
CONTEXT: processing remote data during INSERT for replication target relation ...
```
When you suspect a replication conflict:

1. Check the subscriber logs for the error detail and the finish LSN of the failing transaction.
2. Where possible, resolve the conflict on ParadeDB itself, for example by removing the duplicate row or fixing the permission or row-level security policy.
3. Use `ALTER SUBSCRIPTION ... SKIP` with care, as sketched below. Skipping a conflicting transaction can leave the subscriber inconsistent, so it should be treated as a last resort rather than the default fix.

For conflict types and the PostgreSQL recovery workflow, see the PostgreSQL logical replication conflicts documentation.
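A sketch of that last resort; the LSN is a placeholder and must come from the finish LSN reported in the subscriber's error detail:

```sql
-- On ParadeDB: skip the entire failing transaction, then verify data consistency afterwards
ALTER SUBSCRIPTION paradedb_subscription SKIP (lsn = '0/14C0378');
```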
If the logical slot on the publisher is filling disk and ParadeDB cannot catch up quickly enough, the priority is protecting the publisher.
If the subscriber is reachable and healthy enough to cleanly tear down, dropping the subscription is the cleanest path:
```sql
DROP SUBSCRIPTION paradedb_subscription;
```
To protect the publisher from continued pg_wal growth when you are
intentionally giving up the current replica state, drop the slot on the
publisher:
```sql
SELECT pg_drop_replication_slot('paradedb_subscription');
```
After either step, ParadeDB must be reinitialized from a fresh schema and data copy before it can resume as a logical subscriber.
For schema-change basics, see Schema Changes. For multiple source databases, see Multi-Database Replication for Microservices.