.ai/principles/distilled/clickhouse.md
Prerequisite: If you haven't already, also read .ai/principles/distilled/database-fundamentals.md - it contains foundational rules that apply to all database work.
clickhouse:check-schema CI job logs; if it fails, inspect differences carefully and discuss non-whitespace discrepancies with the MR author.clickhouse:check-schema job to fail without investigation — it is not allowed to fail and will block the MR pipeline.pipeline:skip-check-clickhouse-schema label only for confirmed false positives (e.g., ClickHouse version mismatches).db/click_house/main.sql is updated and committed in the MR; if missing, ask the author to run bundle exec rake gitlab:clickhouse:migrate; bundle exec rake gitlab:clickhouse:schema:dump.db/click_house/schema_migrations/ are auto-generated and do not require a newline at the end — do not flag missing newlinessql = 'SELECT * FROM events WHERE id > {min_id:UInt64}'.ClickHouse::Client::Quoting.quote(...) for fixed string interpolation assigned to Ruby constants to prevent SQL injection.SHOW CREATE TABLE table_name FORMAT raw) to understand partitioning and primary keys.EXPLAIN indexes=1 to verify that filters use primary key indexes; check the Granules ratio in the PrimaryKey section.gitlab-org or gitlab-org/gitlab).POPULATE keyword or have a backfill migration for large datasets.MergeTree only when data is strictly append-only and duplicates cannot occur.ReplacingMergeTree tables provide a monotonic version column (typically DateTime64) and an optional deleted flag (Bool) for soft deletes.ReplacingMergeTree — without it, the deduplicated row after a merge is arbitrary.argMax by the version column with GROUP BY on the primary key for query-time deduplication in production queries.FINAL in production queries — it forces on-the-fly collapsing/merging and can be very expensive I/O-wise; prefer the query-time dedup pattern instead.When a diff modifies or replaces an existing structure, always verify the current state from an authoritative source before flagging a discrepancy. Never infer the pre-change state solely from diff context — check the actual source of truth. For example:
down methods: verify the down schema against the actual pre-migration schema by
querying the local ClickHouse database (SHOW CREATE TABLE tablename) or, if unavailable, reading
the schema from the base branch (git show master:db/click_house/main.sql). Compare
column-by-column: names, types, defaults, engine, primary key, ORDER BY, and SETTINGS.DROP TABLE IF EXISTS + CREATE TABLE): verify the old table definition
the same way before claiming columns or settings are missing.For the full picture, see: