.agents/skills/clickhouse-best-practices/rules/query-join-consider-alternatives.md
Impact: CRITICAL
Repeated JOINs to dimension tables add overhead. Dictionaries or denormalization shift computational work from query time to insert/pre-processing time.
Incorrect (JOIN on every query):
-- JOIN on every query
SELECT o.order_id, c.name, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01';
Correct - Dictionary Lookup:
-- Create dictionary
CREATE DICTIONARY customer_dict (
id UInt64,
name String,
email String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'customers'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);
-- Use dictGet instead of JOIN (uses direct join algorithm - fastest)
SELECT
order_id,
dictGet('customer_dict', 'name', customer_id) as customer_name,
dictGet('customer_dict', 'email', customer_id) as customer_email
FROM orders
WHERE created_at > '2024-01-01';
Correct - Denormalization:
-- Denormalized table with materialized view
CREATE MATERIALIZED VIEW orders_enriched_mv TO orders_enriched AS
SELECT
o.order_id, o.customer_id,
c.name as customer_name,
c.email as customer_email,
o.total, o.created_at
FROM orders o
JOIN customers c ON c.id = o.customer_id;
Approach comparison:
| Approach | Use Case | Performance |
|---|---|---|
| Dictionary | Frequent lookups to small dimension | Fastest (in-memory) |
| Denormalization | Analytics always need enriched data | Fast (no join at query) |
| IN subquery | Existence filtering | Often faster than JOIN |
| JOIN | Infrequent or complex joins | Acceptable |
Critical dictionary caveat: Dictionaries silently deduplicate duplicate keys, retaining only the final value. Only use when source has unique keys.
Reference: Minimize and Optimize JOINs