Back to Langfuse

Use JSON Type for Dynamic Schemas

.agents/skills/clickhouse-best-practices/rules/schema-json-when-to-use.md

3.172.11.8 KB
Original Source

Use JSON Type for Dynamic Schemas

Impact: MEDIUM

ClickHouse's JSON type splits JSON objects into separate sub-columns, enabling field-level query optimization. Use it for truly dynamic data, not everything.

Incorrect (schema bloat or opaque String):

sql
-- BAD: Hundreds of nullable columns for event properties
CREATE TABLE events (
    event_id UUID,
    prop_page_url Nullable(String),
    prop_button_id Nullable(String),
    -- ... 100 more nullable columns
)

-- BAD: JSON as String when you need field queries
CREATE TABLE events (
    event_id UUID,
    properties String  -- No field-level optimization
)

Correct (JSON for dynamic, typed for known):

sql
-- Use JSON type for dynamic properties
CREATE TABLE events (
    event_id UUID DEFAULT generateUUIDv4(),
    event_type LowCardinality(String),
    timestamp DateTime DEFAULT now(),
    properties JSON  -- Flexible schema with type inference
)
ENGINE = MergeTree()
ORDER BY (event_type, timestamp);

-- Query JSON paths directly
SELECT
    event_type,
    properties.url as page_url,
    properties.amount as purchase_amount
FROM events
WHERE event_type = 'page_view' AND properties.url = '/home';

When to use JSON:

ScenarioUse JSON?
Data structure varies unpredictablyYes
Field types/schemas change over timeYes
Need field-level queryingYes
Fixed, known schemaNo (use typed columns)
JSON as opaque blob (no field queries)No (use String)

Optimization: specify types for known paths:

sql
CREATE TABLE events (
    properties JSON(
        url String,
        amount Float64,
        product_id UInt64
    )
)

Reference: Use JSON Where Appropriate