.agents/skills/clickhouse-best-practices/rules/schema-pk-plan-before-creation.md
Impact: CRITICAL (immutable after creation)
ClickHouse's ORDER BY clause defines physical data ordering and the sparse index. Unlike other databases, ORDER BY cannot be modified after table creation. A wrong choice requires creating a new table and migrating all data.
Incorrect (arbitrary ORDER BY without query analysis):
-- Creating table without analyzing query patterns
CREATE TABLE events (
event_id UUID,
user_id UInt64,
timestamp DateTime
)
ENGINE = MergeTree()
ORDER BY (event_id); -- Chosen arbitrarily
-- Later: "Most queries filter by user_id!"
-- Cannot fix with: ALTER TABLE events MODIFY ORDER BY (user_id, timestamp)
-- ERROR: Cannot modify ORDER BY
Correct (query-driven ORDER BY selection):
-- Step 1: Document query patterns BEFORE creating table
/*
Query Analysis:
- 60% of queries: WHERE user_id = ? AND timestamp BETWEEN ? AND ?
- 25% of queries: WHERE event_type = ? AND timestamp > ?
- 15% of queries: WHERE event_id = ?
Conclusion: user_id and event_type are primary filters
*/
-- Step 2: Create table with correct ORDER BY
CREATE TABLE events (
event_id UUID DEFAULT generateUUIDv4(),
user_id UInt64,
event_type LowCardinality(String),
timestamp DateTime,
event_date Date DEFAULT toDate(timestamp)
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, event_date, event_id);
Pre-creation checklist:
Reference: Choosing a Primary Key