Back to Langfuse

Consider Alternatives to JOINs

.agents/skills/clickhouse-best-practices/rules/query-join-consider-alternatives.md

3.172.11.9 KB
Original Source

Consider Alternatives to JOINs

Impact: CRITICAL

Repeated JOINs to dimension tables add overhead. Dictionaries or denormalization shift computational work from query time to insert/pre-processing time.

Incorrect (JOIN on every query):

sql
-- JOIN on every query
SELECT o.order_id, c.name, c.email
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2024-01-01';

Correct - Dictionary Lookup:

sql
-- Create dictionary
CREATE DICTIONARY customer_dict (
    id UInt64,
    name String,
    email String
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'customers'))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);

-- Use dictGet instead of JOIN (uses direct join algorithm - fastest)
SELECT
    order_id,
    dictGet('customer_dict', 'name', customer_id) as customer_name,
    dictGet('customer_dict', 'email', customer_id) as customer_email
FROM orders
WHERE created_at > '2024-01-01';

Correct - Denormalization:

sql
-- Denormalized table with materialized view
CREATE MATERIALIZED VIEW orders_enriched_mv TO orders_enriched AS
SELECT
    o.order_id, o.customer_id,
    c.name as customer_name,
    c.email as customer_email,
    o.total, o.created_at
FROM orders o
JOIN customers c ON c.id = o.customer_id;

Approach comparison:

ApproachUse CasePerformance
DictionaryFrequent lookups to small dimensionFastest (in-memory)
DenormalizationAnalytics always need enriched dataFast (no join at query)
IN subqueryExistence filteringOften faster than JOIN
JOINInfrequent or complex joinsAcceptable

Critical dictionary caveat: Dictionaries silently deduplicate duplicate keys, retaining only the final value. Only use when source has unique keys.

Reference: Minimize and Optimize JOINs