Deduplication

import Screenshot from "@theme/Screenshot"

Deduplication ensures that only one row exists for a given set of key columns. When a new row matches an existing row's keys, the old row is replaced.

When to use deduplication

Use deduplication when:

You need idempotent writes (safe to retry or resend data)
You're reloading data that may have changed (e.g., third-party data feeds)
You want "last write wins" behavior for a given key
You're recovering from ingestion errors and need to resend a time range

Skip deduplication when:

Your timestamps are always unique (no duplicates possible)
You're doing append-only logging where duplicates are acceptable
Write performance is critical and you're certain duplicates won't occur

Quick example

questdb-sql

CREATE TABLE prices (
    ts TIMESTAMP,
    ticker SYMBOL,
    price DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(ts, ticker);

With this configuration, each (ts, ticker) combination can have only one row:

questdb-sql

INSERT INTO prices VALUES ('2026-01-15T10:00:00', 'AAPL', 185.50);
INSERT INTO prices VALUES ('2026-01-15T10:00:00', 'AAPL', 186.00);  -- replaces previous

SELECT * FROM prices;

ts	ticker	price
2026-01-15T10:00:00	AAPL	186.00

Only the last value is kept.

How it works

When deduplication is enabled, QuestDB:

Checks if incoming rows match existing rows by UPSERT KEYS
If keys match, compares the full row content
If the row is identical, skips the write entirely (no disk I/O)
If the row differs, replaces the old row with the new one

This full-row comparison significantly reduces write amplification when reloading large datasets where only a small portion has changed — common when consuming third-party data feeds that provide full snapshots.

Performance

Deduplication has minimal overhead when:

Timestamps are mostly unique across rows
Data arrives in roughly time-ordered fashion

Deduplication is more expensive when:

Many rows share the same timestamp
Deduplication keys have high cardinality

The full-row check optimization means that reloading unchanged data is cheap — QuestDB detects identical rows and skips unnecessary writes.

Configuration

Create table with deduplication

questdb-sql

CREATE TABLE prices (
    ts TIMESTAMP,
    ticker SYMBOL,
    price DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(ts, ticker);

The designated timestamp must always be included in UPSERT KEYS.

Enable on existing table

questdb-sql

ALTER TABLE prices DEDUP ENABLE UPSERT KEYS(ts, ticker);

Disable deduplication

questdb-sql

ALTER TABLE prices DEDUP DISABLE;

Change UPSERT KEYS

questdb-sql

ALTER TABLE prices DEDUP ENABLE UPSERT KEYS(ts, ticker, exchange);

Checking configuration

Check if deduplication is enabled:

questdb-sql

SELECT dedup FROM tables() WHERE table_name = 'prices';

Check which columns are UPSERT KEYS:

questdb-sql

SELECT "column", upsertKey FROM table_columns('prices');

Requirements

Deduplication requires WAL tables
The designated timestamp must be included in UPSERT KEYS
Enabling deduplication does not deduplicate existing data — only new inserts

When to use deduplication

Quick example

How it works

Performance

Configuration

Create table with deduplication

Enable on existing table

Disable deduplication

Change UPSERT KEYS

Checking configuration

Requirements

See also