Back to Langfuse

Choose the Right JOIN Algorithm

.agents/skills/clickhouse-best-practices/rules/query-join-choose-algorithm.md

3.172.11.6 KB
Original Source

Choose the Right JOIN Algorithm

Impact: CRITICAL

ClickHouse's default hash join loads the RIGHT table entirely into memory. Choose the right algorithm based on table sizes and constraints.

Algorithm selection:

AlgorithmBest ForTrade-off
parallel_hashSmall-to-medium in-memory tablesDefault since 24.11; fast, concurrent
hashGeneral purpose, all join typesSingle-threaded hash table build
directDictionary lookups (INNER/LEFT only)Fastest; no hash table construction
full_sorting_mergeTables already sorted on join keySkips sort if pre-ordered; low memory
partial_mergeLarge tables, memory-constrainedMinimized memory; slower execution
grace_hashLarge datasets, tunable memoryFlexible; disk-spilling capability
autoAdaptive algorithm selectionTries hash first, falls back on memory pressure

Example usage:

sql
-- Let ClickHouse choose automatically
SET join_algorithm = 'auto';

-- For large-to-large joins where memory is constrained
SET join_algorithm = 'partial_merge';
SELECT * FROM large_a JOIN large_b ON large_b.id = large_a.id;

-- When joining by primary key columns, sort-merge skips sorting step
SET join_algorithm = 'full_sorting_merge';
SELECT * FROM table_a a JOIN table_b b ON b.pk_col = a.pk_col;

Note: ClickHouse 24.12+ automatically positions smaller tables on the right side. For earlier versions, manually ensure the smaller table is on the RIGHT.

Reference: Minimize and Optimize JOINs