apps/api/scripts/clickhouse-seeder/README.md
A comprehensive TypeScript script to populate ClickHouse observability tables with realistic mock data for load testing and development.
This seeding script generates realistic Novu usage data across multiple organizations, environments, and workflows. The data is hierarchical and mimics real-world scenarios, with the following structure:
Organization
└── Environment(s)
    ├── Workflow(s)
    │   └── Workflow Run(s)
    │       └── Step Run(s)
    │           └── Trace(s)
    └── Subscriber(s)
ClickHouse instance running and accessible
Environment variables set:
CLICK_HOUSE_URL=http://localhost:8123
CLICK_HOUSE_DATABASE=novu
CLICK_HOUSE_USER=default
CLICK_HOUSE_PASSWORD=
ClickHouse tables and materialized views created (run migrations first)
From the apps/api directory:
pnpm seed:clickhouse
Or from the root directory:
pnpm --filter @novu/api-service seed:clickhouse
This will generate:
pnpm seed:clickhouse -- --scale=10 --organizations=50
This will generate:
pnpm seed:clickhouse -- \
  --organizations=20 \
  --days=7 \
  --scale=5 \
  --batch-size=5000 \
  --start-date=2024-01-01
For testing with existing organization, environment, workflow, and subscriber IDs:
pnpm seed:clickhouse -- \
  --single-env \
  --workflow=693ab23238cf527f6dc645d6 \
  --subscriber=69395055051b1b19ff9e1b4c \
  --org-id=69395056051b1b19ff9e1b52 \
  --env-id=69395056c66fd6620f4521ba \
  --days=30 \
  --runs-per-day=5000
This will generate data for:
| Option | Short | Default | Description |
|---|---|---|---|
| --organizations | -o | 10 | Number of organizations to create |
| --days | -d | 30 | Days of data to generate |
| --scale | -s | 1.0 | Data volume multiplier for load testing |
| --batch-size | -b | 10000 | Records per ClickHouse insert batch |
| --start-date | - | Last month | Start date for data generation (YYYY-MM-DD) |
| --help | -h | - | Show help message |
| Option | Short | Default | Description |
|---|---|---|---|
| --single-env | - | - | Enable single environment mode |
| --org-id | - | auto-generated | Organization ID to use |
| --env-id | - | auto-generated | Environment ID to use |
| --workflows | -w | 5 | Number of workflows to create |
| --workflow | - | - | Specific workflow ID (sets workflows to 1) |
| --subscribers | - | 1000 | Number of subscribers to create |
| --subscriber | - | - | Specific subscriber ID (sets subscribers to 1) |
| --runs-per-day | -r | 5000 | Workflow runs per day |
| --days | -d | 30 | Days of data to generate |
| --batch-size | -b | 10000 | Records per ClickHouse insert batch |
| --start-date | - | Last month | Start date for data generation (YYYY-MM-DD) |
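
As an illustration of how `--name=value` flags like the ones in these tables can be read, here is a minimal sketch. It is not the parser in `config.ts`; `parseFlags` and `numberFlag` are hypothetical helpers, short aliases such as `-d` are ignored for brevity, and the fallback values are the defaults listed in the tables.

```typescript
type FlagValue = string | boolean;

// Collects `--name=value` pairs and bare `--name` switches from an argv-style array.
// Anything that does not start with `--` is skipped in this sketch.
function parseFlags(argv: string[]): Record<string, FlagValue> {
  const flags: Record<string, FlagValue> = {};
  for (const arg of argv) {
    if (!arg.startsWith('--')) continue;
    const body = arg.slice(2);
    const eq = body.indexOf('=');
    if (eq === -1) {
      flags[body] = true; // bare switch, e.g. --single-env
    } else {
      flags[body.slice(0, eq)] = body.slice(eq + 1);
    }
  }
  return flags;
}

// Reads a numeric flag, falling back to a default when the flag is absent.
function numberFlag(flags: Record<string, FlagValue>, name: string, fallback: number): number {
  const raw = flags[name];
  if (typeof raw !== 'string') return fallback;
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    throw new Error(`--${name} expects a number, got "${raw}"`);
  }
  return parsed;
}

const exampleFlags = parseFlags(['--single-env', '--days=7', '--scale=5', '--start-date=2024-01-01']);
const exampleDays = numberFlag(exampleFlags, 'days', 30);
const exampleBatchSize = numberFlag(exampleFlags, 'batch-size', 10000); // not passed, so the default applies
```

Rejecting non-numeric values early avoids silently seeding with `NaN` days or batch sizes.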
| Profile | Count | Runs/Day | Total/Month |
|---|---|---|---|
| Enterprise | 3 | 20K-50K | 1.8M-4.5M |
| Large | 4 | 5K-15K | 600K-1.8M |
| Medium | 3 | 500-2K | 45K-180K |
Total: ~2.5M-6.5M workflow runs per month
For load testing scenarios with --scale=10, multiply all of the numbers above by 10.
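
The totals above can be checked with a short calculation. The sketch below copies the profile figures from the table and assumes a 30-day month and a scale factor that multiplies run counts linearly; `estimateMonthlyRuns` is a hypothetical helper, not part of the seeder.

```typescript
interface ProfileVolume {
  name: string;
  count: number;         // organizations with this profile
  runsPerDayMin: number; // per organization
  runsPerDayMax: number; // per organization
}

// Figures copied from the profile table.
const PROFILE_VOLUMES: ProfileVolume[] = [
  { name: 'Enterprise', count: 3, runsPerDayMin: 20_000, runsPerDayMax: 50_000 },
  { name: 'Large', count: 4, runsPerDayMin: 5_000, runsPerDayMax: 15_000 },
  { name: 'Medium', count: 3, runsPerDayMin: 500, runsPerDayMax: 2_000 },
];

// Sums count * runs/day * days over all profiles, then applies the scale factor.
function estimateMonthlyRuns(
  profiles: ProfileVolume[],
  days = 30,
  scale = 1,
): { min: number; max: number } {
  let min = 0;
  let max = 0;
  for (const profile of profiles) {
    min += profile.count * profile.runsPerDayMin * days;
    max += profile.count * profile.runsPerDayMax * days;
  }
  return { min: min * scale, max: max * scale };
}

const baseline = estimateMonthlyRuns(PROFILE_VOLUMES);         // { min: 2445000, max: 6480000 }
const loadTest = estimateMonthlyRuns(PROFILE_VOLUMES, 30, 10); // { min: 24450000, max: 64800000 }
```

The baseline range of 2,445,000 to 6,480,000 matches the rounded "~2.5M-6.5M" figure.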
The script generates realistic workflow patterns:
| Type | Channels | Weight | Example |
|---|---|---|---|
| Transactional | email + in_app | 40% | Order Confirmation |
| Marketing | | 25% | Newsletter |
| Alerts | push + sms | 15% | Critical Alert |
| Multi-channel | email + in_app + push | 20% | Campaign Update |
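
One common way to turn weights like these into a selection is a cumulative-weight pick. The sketch below is an illustration under that assumption and is not taken from `generators.ts`; `pickWeighted` is a hypothetical helper, and the random roll is passed in so the function stays deterministic and testable.

```typescript
interface WeightedType {
  type: string;
  weight: number;
}

// Weights copied from the table above.
const WORKFLOW_TYPE_WEIGHTS: WeightedType[] = [
  { type: 'Transactional', weight: 0.4 },
  { type: 'Marketing', weight: 0.25 },
  { type: 'Alerts', weight: 0.15 },
  { type: 'Multi-channel', weight: 0.2 },
];

// Picks an item with probability proportional to its weight.
// `roll` must be in [0, 1), for example the result of Math.random().
function pickWeighted<T extends { weight: number }>(items: T[], roll: number): T {
  const total = items.reduce((sum, item) => sum + item.weight, 0);
  let remaining = roll * total;
  let chosen: T | undefined;
  for (const item of items) {
    chosen = item; // if rounding keeps `remaining` non-negative, the last item wins
    remaining -= item.weight;
    if (remaining < 0) break;
  }
  if (chosen === undefined) {
    throw new Error('items must not be empty');
  }
  return chosen;
}

const sampleType = pickWeighted(WORKFLOW_TYPE_WEIGHTS, Math.random());
```

Scaling the roll by the total weight means the weights do not have to sum to exactly 1.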
The materialized views are automatically populated by ClickHouse as data is inserted into the primary tables.
completed: 85%, processing: 5%, error: 10%

completed: 88%, failed: 7%, skipped: 3%, delayed: 2%

delivered: 70%, sent: 15%, errored: 8%, skipped: 4%, canceled: 2%, merged: 1%

============================================================
ClickHouse Data Seeding Script
============================================================
Configuration:
Organizations: 10
Days: 30
Scale: 1x
Batch Size: 10000
Start Date: 2024-12-01
✓ Connected to ClickHouse
Phase 1: Generating Organizations and Structure
------------------------------------------------------------
✓ Generated 10 organizations
Environments: 21
Workflows: 147
Subscribers: 32,450
Organization Breakdown:
Enterprise: 3
Large: 4
Medium: 3
Phase 2: Generating Workflow Runs
------------------------------------------------------------
✓ Generated 2,847,593 workflow runs
Phase 3: Generating Step Runs
------------------------------------------------------------
✓ Generated 7,119,483 step runs
Phase 4: Generating Traces
------------------------------------------------------------
✓ Generated 21,358,449 traces
Phase 5: Inserting Data into ClickHouse
------------------------------------------------------------
...
============================================================
Insertion Statistics
============================================================
Workflow Runs: 2,847,593
Step Runs: 7,119,483
Traces: 21,358,449
Total Records: 31,325,525
Duration: 127.34s
============================================================
Additional Information:
Estimated Size: 14.2 GB
Records/Second: 246,123
✓ Data seeding completed successfully!
Ensure ClickHouse environment variables are set correctly:
export CLICK_HOUSE_URL=http://localhost:8123
export CLICK_HOUSE_DATABASE=novu
Reduce batch size for systems with limited memory:
pnpm seed:clickhouse -- --batch-size=5000
Increase batch size for faster insertion (if memory allows):
pnpm seed:clickhouse -- --batch-size=20000
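
To show why batch size trades memory against speed, here is a minimal sketch of batched insertion. It is not the code in `inserter.ts`; `chunk` and `insertInBatches` are hypothetical helpers, and `insertBatch` stands in for whatever function actually sends rows to ClickHouse.

```typescript
// Splits records into consecutive batches of at most `batchSize` items.
function chunk<T>(records: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let start = 0; start < records.length; start += batchSize) {
    batches.push(records.slice(start, start + batchSize));
  }
  return batches;
}

// Sends batches one at a time so only one insert is in flight at once.
// Returns the number of batches sent.
async function insertInBatches<T>(
  records: T[],
  batchSize: number,
  insertBatch: (rows: T[]) => Promise<void>, // assumed callback that performs the real insert
): Promise<number> {
  const batches = chunk(records, batchSize);
  for (const batch of batches) {
    await insertBatch(batch);
  }
  return batches.length;
}
```

With this shape, 25,000 records at a batch size of 10,000 become three inserts (10,000, 10,000, and 5,000 rows). Smaller batches mean smaller payloads per insert but more round trips.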
apps/api/scripts/
├── seed-clickhouse.ts           # Main entry point
└── clickhouse-seeder/
    ├── config.ts                # Configuration and CLI parsing
    ├── time-distribution.ts     # Time pattern generation
    ├── generators.ts            # Data generation logic
    ├── inserter.ts              # Batched ClickHouse insertion
    └── README.md                # This file
Edit config.ts and add to ORGANIZATION_PROFILES:
export const ORGANIZATION_PROFILES = {
  // ... existing profiles
  startup: {
    type: 'startup',
    runsPerDayMin: 10,
    runsPerDayMax: 100,
    workflowsMin: 1,
    workflowsMax: 3,
    subscribersMin: 10,
    subscribersMax: 100,
    environmentsMin: 1,
    environmentsMax: 1,
  },
};
Edit config.ts and add to WORKFLOW_TEMPLATES:
export const WORKFLOW_TEMPLATES: WorkflowTemplate[] = [
  // ... existing templates
  {
    type: 'support',
    name: 'Support Ticket',
    channels: ['email', 'sms'],
    weight: 0.1
  },
];
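
Adding a template changes the total weight: the four weights listed earlier sum to 1.0, so a new entry with weight 0.1 brings the total to 1.1. Whether the seeder requires the weights to sum to 1 is not stated here. If it does, or if you just want to see each template's effective share, the weights can be normalized. `normalizeWeights` below is a hypothetical helper, not part of `config.ts`.

```typescript
// Rescales weights so they sum to 1 while preserving their ratios.
function normalizeWeights<T extends { weight: number }>(templates: T[]): T[] {
  const total = templates.reduce((sum, template) => sum + template.weight, 0);
  if (total <= 0) {
    throw new Error('total weight must be positive');
  }
  return templates.map((template) => ({ ...template, weight: template.weight / total }));
}

// Existing weights from the workflow type table plus the new 'support' template.
const withSupport = normalizeWeights([
  { type: 'transactional', weight: 0.4 },
  { type: 'marketing', weight: 0.25 },
  { type: 'alerts', weight: 0.15 },
  { type: 'multi-channel', weight: 0.2 },
  { type: 'support', weight: 0.1 },
]);
```

After normalization the `support` template accounts for 0.1 / 1.1 of the total, roughly 9% of generated workflows.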
--scale=10 or higher

user_1, user_2, etc.