Back to Langfuse

Use Refreshable MVs for Complex Joins and Batch Workflows

.agents/skills/clickhouse-best-practices/rules/query-mv-refreshable.md

3.172.11.8 KB
Original Source

Use Refreshable MVs for Complex Joins and Batch Workflows

Impact: HIGH

Refreshable MVs execute queries periodically on a schedule. The full query re-executes and overwrites (or appends to) the target table.

Best for:

  • Sub-millisecond latency where minor staleness is acceptable
  • Caching "top N" results or lookup tables
  • Complex multi-table joins requiring denormalization
  • Batch workflows and DAG dependencies

Incorrect (expensive join on every request):

sql
-- Complex join executed on every request
SELECT
    o.order_id, o.total,
    c.name as customer_name,
    p.name as product_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= now() - INTERVAL 1 DAY;

Correct (refreshable MV):

sql
-- Create refreshable MV that runs every 5 minutes
CREATE MATERIALIZED VIEW orders_denormalized
REFRESH EVERY 5 MINUTE
ENGINE = MergeTree()
ORDER BY (created_at, order_id)
AS SELECT
    o.order_id, o.created_at, o.total,
    c.name as customer_name, c.segment,
    p.name as product_name
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= now() - INTERVAL 1 DAY;

-- Query the pre-joined data (sub-millisecond)
SELECT * FROM orders_denormalized WHERE segment = 'enterprise';

APPEND vs REPLACE modes:

ModeBehaviorUse Case
REPLACE (default)Overwrites previous contentsCurrent state, lookup tables
APPENDAdds new rows to existing dataPeriodic snapshots, historical accumulation

Critical warning: Query should run quickly compared to refresh interval. Don't schedule every 10 seconds if the query takes 10+ seconds.

Reference: Use Materialized Views