Database Migration Patterns

Migration Locations

MySQL: apps/opik-backend/src/main/resources/liquibase/db-app-state/migrations/
ClickHouse: apps/opik-backend/src/main/resources/liquibase/db-app-analytics/migrations/

Table Naming

Always use plural names for database tables: traces, spans, feedback_scores, datasets, experiments (not trace, span, feedback_score).

Liquibase Format

sql

--liquibase formatted sql
--changeset author:000001_description
--comment: Brief description of the migration

-- Your SQL here

Always end with empty line.

MySQL Migration Example

sql

--liquibase formatted sql
--changeset john.doe:000001_add_user_table
--comment: Create users table with authentication fields

CREATE TABLE users (
    id VARCHAR(36) PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

-- Index for email lookups (frequent login queries)
CREATE INDEX idx_users_email ON users(email);

ClickHouse Migration Example

sql

--liquibase formatted sql
--changeset john.doe:000001_add_analytics_table
--comment: Create analytics events table

CREATE TABLE IF NOT EXISTS analytics_events ON CLUSTER '{cluster}' (
    id              FixedString(36),
    workspace_id    String,
    event_type      Enum8('unknown' = 0, 'view' = 1, 'click' = 2),
    created_at      DateTime64(9, 'UTC') DEFAULT now64(9),
    last_updated_at DateTime64(6, 'UTC') DEFAULT now64(6)
)
ENGINE = ReplicatedReplacingMergeTree(
    '/clickhouse/tables/{shard}/${ANALYTICS_DB_DATABASE_NAME}/analytics_events',
    '{replica}',
    last_updated_at
)
ORDER BY (workspace_id, event_type, id);

Rollback: Additive vs In-Place Changes

The kind of change decides whether a rollback is allowed:

Additive / structural changes → provide a rollback that undoes them (new table → DROP TABLE, new column → DROP COLUMN, new index → DROP INDEX). Dropping what was just added is safe.
In-place changes to an existing column → rollback MUST be empty (--rollback empty): enum modify/rename, type or precision changes, default/codec changes. Reverting them (1) often reintroduces the bug the migration fixed, and (2) is lossy or fails once new data is written under the new definition — they are effectively irreversible.

sql

-- ✅ Additive change → real rollback
ALTER TABLE events ON CLUSTER '{cluster}' ADD COLUMN IF NOT EXISTS source String DEFAULT '';
--rollback ALTER TABLE events ON CLUSTER '{cluster}' DROP COLUMN IF EXISTS source;

-- ✅ In-place enum change → empty rollback
ALTER TABLE events ON CLUSTER '{cluster}' MODIFY COLUMN level Enum8('TRACE' = 0, 'INFO' = 2, 'WARN' = 3, 'ERROR' = 4);
--rollback empty

-- ❌ In-place enum change → reverting reintroduces the old (buggy) label and breaks rows already written
--rollback ALTER TABLE events ON CLUSTER '{cluster}' MODIFY COLUMN level Enum8('TRACE' = 0, 'INFO' = 2, 'WARM' = 3, 'ERROR' = 4);

Always declare an intentionally empty rollback explicitly as --rollback empty (see 000029_add_thread_enum_value_to_feedback_scores.sql) — never just omit it.

Rollback Dependency Order

Rollback statements run in the order they appear. When columns have dependencies (e.g. a MATERIALIZED column that references another column), drop dependent columns first — the reverse of creation order.

sql

-- Forward: create base column, then materialized column
ALTER TABLE t ADD COLUMN IF NOT EXISTS description String DEFAULT '';
ALTER TABLE t ADD COLUMN IF NOT EXISTS description_hash UInt64 MATERIALIZED xxHash64(description);

-- ❌ BAD - drops base column before its dependent materialized column
--rollback ALTER TABLE t DROP COLUMN IF EXISTS description;
--rollback ALTER TABLE t DROP COLUMN IF EXISTS description_hash;

-- ✅ GOOD - drops materialized (dependent) column first, then base column
--rollback ALTER TABLE t DROP COLUMN IF EXISTS description_hash;
--rollback ALTER TABLE t DROP COLUMN IF EXISTS description;

This applies to any dependency chain: MATERIALIZED, ALIAS, indexes referencing columns, etc.

ClickHouse Gotchas

Always use ON CLUSTER '{cluster}' for distributed operations
Engine: Use ReplicatedReplacingMergeTree for deduplication, ReplicatedMergeTree for audit/logs
ORDER BY: Include workspace_id first, then logical groupings

Index Comments

Always explain why an index exists:

sql

-- ❌ BAD - No explanation
CREATE INDEX idx_users_created_at ON users(created_at);

-- ✅ GOOD - Explains purpose
-- Index for user registration analytics (used in monthly reports)
CREATE INDEX idx_users_created_at ON users(created_at);