import { ConfigTable } from "@theme/ConfigTable"
import { EnterpriseNote } from "@site/src/components/EnterpriseNote"
import replicationConfig from "../configuration/configuration-utils/_replication.config.json"
<EnterpriseNote>
This guide covers setting up primary-replica replication.
</EnterpriseNote>

This guide walks you through setting up QuestDB Enterprise replication.
## Prerequisites

Read the Replication overview to understand how replication works.
## Step 1: Set up object storage

Choose your object storage provider and build the connection string for
`replication.object.store` in `server.conf`.
### Amazon S3

Create an S3 bucket following AWS documentation.
Recommendations:
Connection string:
```ini
replication.object.store=s3::bucket=${BUCKET_NAME};root=${DB_INSTANCE_NAME};region=${AWS_REGION};access_key_id=${AWS_ACCESS_KEY};secret_access_key=${AWS_SECRET_ACCESS_KEY};
```
`DB_INSTANCE_NAME` can be any unique alphanumeric string (dashes allowed). Use
the same value across all nodes in your replication cluster.
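For illustration, a hypothetical fully populated string might look like this
(every value below is a placeholder):

```ini
replication.object.store=s3::bucket=acme-questdb-repl;root=prod-cluster-1;region=eu-west-1;access_key_id=AKIAXXXXXXXXXXXXXXXX;secret_access_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx;
```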
:::tip[Using IAM roles]
If your instance has an IAM role attached (EC2 instance profile, EKS pod
identity, or ECS task role), you can omit the credentials:

```ini
replication.object.store=s3::bucket=${BUCKET_NAME};root=${DB_INSTANCE_NAME};region=${AWS_REGION};
```

QuestDB will automatically use the instance's IAM role for authentication.
:::
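Before starting QuestDB, it can be worth confirming the role actually reaches
the bucket. A quick check with the AWS CLI (bucket name is a placeholder):

```shell
# Show which IAM identity the instance resolves to
aws sts get-caller-identity

# Verify read and write access to the replication bucket
echo test | aws s3 cp - s3://${BUCKET_NAME}/connectivity-check.txt
aws s3 ls s3://${BUCKET_NAME}
aws s3 rm s3://${BUCKET_NAME}/connectivity-check.txt
```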
### Azure Blob Storage

Create a Storage Account following Azure documentation, then create a Blob Container.
Recommendations:
Connection string:
```ini
replication.object.store=azblob::endpoint=https://${STORE_ACCOUNT}.blob.core.windows.net;container=${BLOB_CONTAINER};root=${DB_INSTANCE_NAME};account_name=${STORE_ACCOUNT};account_key=${STORE_KEY};
```
:::tip[Using Managed Identity]
If your instance has a Managed Identity assigned (Azure VM, AKS pod identity,
or Container Apps), you can omit the `account_key`:

```ini
replication.object.store=azblob::endpoint=https://${STORE_ACCOUNT}.blob.core.windows.net;container=${BLOB_CONTAINER};root=${DB_INSTANCE_NAME};account_name=${STORE_ACCOUNT};
```

QuestDB will automatically use the Managed Identity for authentication. Ensure
the identity has the Storage Blob Data Contributor role on the container.
:::
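If the role assignment is missing, it can be granted with the Azure CLI; a
sketch, with placeholder identity and scope values:

```shell
az role assignment create \
  --assignee "<identity-principal-id>" \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/${STORE_ACCOUNT}/blobServices/default/containers/${BLOB_CONTAINER}"
```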
### Google Cloud Storage

Create a GCS bucket, then create a service account with Storage Admin (or
equivalent) permissions. Download the JSON key and encode it as Base64:

```shell
cat <key>.json | base64
```
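Note that GNU `base64` on Linux wraps its output at 76 characters by default,
which would split the single-line connection string; disable wrapping with
`-w 0`:

```shell
cat <key>.json | base64 -w 0
```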
Connection string:
```ini
replication.object.store=gcs::bucket=${BUCKET_NAME};root=/;credential=${BASE64_ENCODED_KEY};
```
Alternatively, use `credential_path` to reference the key file directly.
:::tip[Using Workload Identity]
If your instance uses Workload Identity (GKE) or runs on a GCE VM with a
service account attached, you can omit the credentials entirely:

```ini
replication.object.store=gcs::bucket=${BUCKET_NAME};root=/;
```

QuestDB will automatically use Application Default Credentials for
authentication.
:::
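To confirm that Application Default Credentials resolve and can reach the
bucket, a quick check with the Google Cloud CLI (assuming it is installed;
bucket name is a placeholder):

```shell
# Fails if no Application Default Credentials are available
gcloud auth application-default print-access-token > /dev/null && echo "ADC OK"

# Verify access to the replication bucket
gcloud storage ls gs://${BUCKET_NAME}
```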
### NFS

Mount the shared filesystem on all nodes. Ensure the QuestDB user has
read/write permissions.

**Important:** Both the WAL folder and scratch folder must be on the same NFS
mount to prevent write corruption.
Connection string:
```ini
replication.object.store=fs::root=/mnt/nfs_replication/final;atomic_write_dir=/mnt/nfs_replication/scratch;
```
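A quick way to verify the mount and permissions before starting QuestDB (paths
match the example above; the `questdb` service user is an assumption,
substitute your own):

```shell
# Confirm the directory really is on an NFS mount
findmnt -T /mnt/nfs_replication

# Verify the QuestDB user can write to both directories
sudo -u questdb touch /mnt/nfs_replication/final/.write-test /mnt/nfs_replication/scratch/.write-test
sudo -u questdb rm /mnt/nfs_replication/final/.write-test /mnt/nfs_replication/scratch/.write-test
```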
## Step 2: Configure the primary

Add to `server.conf`:

| Setting                      | Value                              |
| ---------------------------- | ---------------------------------- |
| `replication.role`           | `primary`                          |
| `replication.object.store`   | Your connection string from step 1 |
| `cairo.snapshot.instance.id` | Unique UUID for this node          |
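Putting the table together, a minimal sketch of the primary's `server.conf`
additions (the S3 string and UUID are placeholders; use your own string from
step 1 and generate a fresh UUID, e.g. with `uuidgen`):

```ini
replication.role=primary
replication.object.store=s3::bucket=${BUCKET_NAME};root=${DB_INSTANCE_NAME};region=${AWS_REGION};
# Must be unique per node in the cluster
cairo.snapshot.instance.id=b28e3f14-7d2a-4c33-9a51-8d7e0c2f4a61
```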
Restart QuestDB.
## Step 3: Snapshot the primary

Replicas are initialized from a snapshot of the primary's data. This involves
creating a backup of the primary and preparing it for restoration on replica
nodes. See Backup and restore for the full procedure.

:::tip
Set up regular snapshots (daily or weekly).
:::
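The Backup and restore guide is authoritative; purely as a sketch of the
flow's shape, assuming the `CHECKPOINT` SQL of recent QuestDB versions, the
default HTTP endpoint on port 9000, and placeholder paths:

```shell
# Enter checkpoint mode so on-disk data is frozen in a consistent state
curl -G http://localhost:9000/exec --data-urlencode "query=CHECKPOINT CREATE"

# Copy the database directory while the checkpoint is held
rsync -a /var/lib/questdb/db/ /backups/questdb-snapshot/

# Leave checkpoint mode so normal writes resume
curl -G http://localhost:9000/exec --data-urlencode "query=CHECKPOINT RELEASE"
```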
## Step 4: Set up a replica

Create a new QuestDB instance. Add to `server.conf`:

| Setting                      | Value                             |
| ---------------------------- | --------------------------------- |
| `replication.role`           | `replica`                         |
| `replication.object.store`   | Same connection string as primary |
| `cairo.snapshot.instance.id` | Unique UUID for this replica      |
:::warning
Do not copy server.conf from the primary. Two nodes configured as primary
with the same object store will break replication.
:::
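For example, a replica's additions might look like the sketch below (the UUID
is a placeholder). Only `replication.role` and the instance id differ from the
primary's configuration:

```ini
replication.role=replica
replication.object.store=s3::bucket=${BUCKET_NAME};root=${DB_INSTANCE_NAME};region=${AWS_REGION};
# Must differ from the primary's instance id
cairo.snapshot.instance.id=4f0c9b72-2a1e-4e8b-bb0d-6c5a1e9d3f27
```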
Restore the `db` directory from the primary's snapshot, then start the replica.
It will download and apply WAL files to catch up with the primary.
## Configuration reference

All replication settings go in `server.conf`. After changes, restart QuestDB.

:::tip
Use environment variables for sensitive settings:

```shell
export QDB_REPLICATION_OBJECT_STORE="azblob::..."
```
:::
<ConfigTable rows={replicationConfig} />

For tuning options, see the Tuning guide.
## Object store cleanup

Replicated WAL data accumulates in object storage over time. The WAL cleaner runs on the primary node and automatically removes data that is no longer needed, based on your backup and checkpoint history.
The cleaner is enabled by default and requires no configuration when backups or checkpoint history are active. By default, it retains replication data for the most recent 5 backups or checkpoints and deletes everything older.
See the WAL Cleanup guide for configuration options, tuning, and troubleshooting.
## Handling failures

| Node | Recoverable | Unrecoverable |
|---|---|---|
| Primary | Restart | Promote replica, create new replica |
| Replica | Restart | Destroy and recreate |
**Network partitions:** Temporary partitions cause replicas to lag, then catch
up once connectivity is restored. This is normal operation. Permanent
partitions require emergency primary migration.

**Corrupted transactions:** If a crash corrupts transactions, tables may
suspend on restart. You can either skip the corrupted transaction and reload
the missing data, or follow the emergency migration flow.

**Disk failure:** Symptoms include high latency, an unmounted disk, and
suspended tables. Follow the emergency migration flow to move to new storage.
## Primary migration

### Planned migration

Use when the current primary is healthy but you want to switch to a new one.
On the current primary, set:

```ini
replication.role=primary-catchup-uploads
```

### Emergency migration

Use when the primary has failed.

1. Set `replication.role=primary` on the replica.
2. Create a `_migrate_primary` file in the installation directory.

:::warning
Data committed to the primary but not yet replicated will be lost. Use planned
migration if the primary is still functional.
:::
## Point-in-time recovery

Restore the database to a specific historical timestamp.

1. Create a `_recover_point_in_time` file containing:

   ```ini
   replication.object.store=<source object store>
   replication.recovery.timestamp=YYYY-MM-DDThh:mm:ss.mmmZ
   ```

2. Create a `_restore` file to trigger recovery.
3. Update `server.conf` to replicate to a new object store.
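As a concrete sketch, with a placeholder installation directory, object store
string, and timestamp:

```shell
cd /var/lib/questdb  # installation directory (placeholder path)

# Describe the source object store and the point in time to restore to
cat > _recover_point_in_time <<'EOF'
replication.object.store=s3::bucket=acme-questdb-repl;root=prod-cluster-1;region=eu-west-1;
replication.recovery.timestamp=2024-06-01T12:00:00.000Z
EOF

# Trigger the recovery
touch _restore
```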