Back to Langfuse

Use Partitioning for Data Lifecycle Management

.agents/skills/clickhouse-best-practices/rules/schema-partition-lifecycle.md

3.172.11.3 KB
Original Source

Use Partitioning for Data Lifecycle Management

Impact: HIGH

Partitioning is primarily a data management technique, not a query optimization tool. It excels at:

  • Dropping data: Remove entire partitions as single metadata operations
  • TTL retention: Implement time-based retention policies efficiently
  • Tiered storage: Move old partitions to cold storage
  • Archiving: Move partitions between tables

Incorrect (no time alignment for lifecycle):

sql
-- Cannot efficiently drop old data by time
CREATE TABLE events (...)
ENGINE = MergeTree()
PARTITION BY event_type  -- No time alignment
ORDER BY (timestamp);

-- Slow: must scan and delete row by row
DELETE FROM events WHERE timestamp < '2023-01-01';

Correct (time-based for lifecycle):

sql
CREATE TABLE events (
    timestamp DateTime,
    event_type LowCardinality(String)
)
ENGINE = MergeTree()
PARTITION BY toStartOfMonth(timestamp)
ORDER BY (event_type, timestamp)
TTL timestamp + INTERVAL 1 YEAR DELETE;  -- Drops whole partitions

-- Fast: metadata-only operation
ALTER TABLE events DROP PARTITION '202301';

-- Archive to cold storage
ALTER TABLE events_archive ATTACH PARTITION '202301' FROM events;

Reference: Choosing a Partitioning Key