.agents/skills/opik-backend/clickhouse.md
src/main/resources/liquibase/db-app-analytics/migrations/Most tables need deduplication. Exception: audit/log tables (ReplicatedMergeTree) keep full history.
Updates are modeled as inserts — multiple row versions coexist until background merge.
-- Pattern 1: LIMIT 1 BY (reads only referenced columns, requires explicit sort)
SELECT * FROM traces
WHERE workspace_id = :workspace_id
ORDER BY (workspace_id, project_id, id) DESC, last_updated_at DESC
LIMIT 1 BY id
-- Pattern 2: FINAL (query-time merge, streams through sorted parts)
SELECT * FROM traces FINAL
WHERE workspace_id = :workspace_id
-- ❌ BAD - missing deduplication, returns duplicates
SELECT * FROM traces WHERE workspace_id = :workspace_id
There's no general rule for which is faster. It depends on the query and table state:
With LIMIT 1 BY, filters on mutable columns (status, name, tags, etc.) MUST go
AFTER dedup in an outer query. Filtering before dedup can return stale/phantom rows.
-- ✅ CORRECT -- ❌ WRONG
SELECT * FROM ( SELECT * FROM traces
SELECT * FROM traces WHERE status = :status -- before dedup!
WHERE workspace_id = :wid ORDER BY (...) DESC, last_updated_at DESC
ORDER BY (...) DESC LIMIT 1 BY id
LIMIT 1 BY id
) WHERE status = :status
Safe to filter before dedup: immutable columns (workspace_id, project_id, id, created_at)
and monotonically-changing columns (last_updated_at — only increases, so a lower-bound cutoff
can't exclude the latest version while keeping an older one).
With FINAL, this doesn't apply — dedup happens before WHERE.
Never index fields that flip back and forth (e.g., status). The index sees old row
versions and can't filter reliably before dedup.
Monotonically-changing fields are safe (e.g., last_updated_at only increases). Use
minmax GRANULARITY 1 — ClickHouse skips granules entirely outside the filter range.
Add a migration comment explaining why the indexed field is safe.
Skip indexes are ignored with FINAL by default (up to CH 25.3). Enable with:
SETTINGS use_skip_indexes_if_final=1
Avoid bare FINAL on large tables. Scope it with a monotonic field + minmax index:
-- Migration (comment why it's safe):
ALTER TABLE t ADD INDEX idx_last_updated_at last_updated_at TYPE minmax GRANULARITY 1;
ALTER TABLE t MATERIALIZE INDEX idx_last_updated_at;
-- Query: FINAL only considers recent granules
SELECT * FROM (
SELECT * FROM t FINAL WHERE last_updated_at > now() - INTERVAL 1 DAY
) WHERE status = 'active'
SETTINGS use_skip_indexes_if_final=1
-- Escape < operator
WHERE id \\<= :uuid_to_time
-- Conditional blocks
<if(project_id)> AND project_id = :project_id <endif>
// ✅ GOOD - snake_case
.bind("workspace_id", workspaceId)
.bind("project_id", projectId.toString())
// ❌ BAD - camelCase
.bind("workspaceId", workspaceId)
toDecimal64(
greatest(least(if(isFinite(v), v, 0), 999999999.999999999), -999999999.999999999),
9
)
Decimal64(9)Decimal(38, 12) → use toDecimal128(..., 12)IN (subquery) patternLEFT ANY JOIN when the right table has at most one match per keyAlways add log comment:
SETTINGS log_comment = '<log_comment>'