packages/shared/scripts/seeder/utils/README.md
System for generating test data in ClickHouse and PostgreSQL for Langfuse development and testing.
seeder/
├── types.ts # Core interfaces and types
├── data-generators.ts # Data generation logic
├── clickhouse-builder.ts # ClickHouse query building
├── seeder-orchestrator.ts # Main orchestration logic
├── postgres-seed-constants.ts # PostgreSQL data constants
├── clickhouse-seed-constants.ts # ClickHouse data constants
└── seed-helpers.ts # Utility functions
import { SeederOrchestrator } from "./seeder/seeder-orchestrator";
const orchestrator = new SeederOrchestrator();
// Full seed (datasets + evaluation + synthetic data)
await orchestrator.executeFullSeed(projectIds, {
  numberOfDays: 30,
  totalObservations: 10000,
  numberOfRuns: 3,
});
// Individual data types
await orchestrator.createDatasetExperimentData(projectIds, config);
await orchestrator.createEvaluationData(projectIds);
await orchestrator.createSyntheticData(projectIds, config);
- `langfuse-prompt-experiment`: `trace-dataset-{datasetName}-{itemIndex}-{projectId}-{runNumber}`
- `langfuse-evaluation`: `trace-eval-{index}-{projectId}`
- `default`: `trace-synthetic-{index}-{projectId}`

`DataGenerator` generates realistic data for all three types. To change any ClickHouse data, modify this class. Key methods:
- `generateDatasetTrace()` - Creates traces from dataset items
- `generateSyntheticTraces()` - Creates realistic synthetic traces
- `generateEvaluationTraces()` - Creates evaluation-focused traces

`ClickHouseQueryBuilder` builds optimized ClickHouse insert queries and handles proper escaping and type handling. There is no need to edit this file.
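To illustrate what escaping and type handling mean for an insert query, here is a minimal sketch. It is not the actual `ClickHouseQueryBuilder` code: `escapeString`, `toLiteral`, `buildInsert`, and the `traces` table name in the usage note are hypothetical names chosen for this example. It relies only on ClickHouse accepting backslash-escaped quotes inside single-quoted string literals.

```typescript
// Hypothetical helpers for illustration; not the real ClickHouseQueryBuilder API.
type ColumnValue = string | number | boolean | null;

// Escape backslashes first, then single quotes, so the backslash added
// in front of a quote is not escaped a second time.
function escapeString(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// Convert a JavaScript value into a SQL literal based on its runtime type.
function toLiteral(value: ColumnValue): string {
  if (value === null) return "NULL";
  if (typeof value === "number") {
    if (!Number.isFinite(value)) {
      throw new Error(`Unsupported number: ${value}`);
    }
    return String(value);
  }
  if (typeof value === "boolean") return value ? "1" : "0";
  return `'${escapeString(value)}'`;
}

// Build one multi-row INSERT statement from column names and row values.
function buildInsert(
  table: string,
  columns: string[],
  rows: ColumnValue[][],
): string {
  const values = rows
    .map((row) => {
      if (row.length !== columns.length) {
        throw new Error("Row length does not match column count");
      }
      return `(${row.map(toLiteral).join(", ")})`;
    })
    .join(", ");
  return `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${values}`;
}
```

For example, `buildInsert("traces", ["id", "name"], [["t1", "a'b"]])` produces a statement whose second value is quoted as `'a\'b'`, so the embedded quote cannot terminate the literal early.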
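`SeederOrchestrator` is the main coordination class. It exposes `executeFullSeed()`, `createDatasetExperimentData()`, `createEvaluationData()`, and `createSyntheticData()`.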
interface SeederConfig {
  numberOfDays: number; // How far back to generate timestamps
  numberOfRuns?: number; // How many experiment runs per dataset
  totalObservations?: number; // Total observations for synthetic data
}
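As a rough illustration of how `numberOfDays` might bound generated timestamps, the sketch below spreads synthetic traces evenly across the lookback window and names them with the `trace-synthetic-{index}-{projectId}` pattern. The `planSyntheticTraces` function, the `PlannedTrace` type, and the `count` and `now` parameters are assumptions made for this example; the real generator may distribute timestamps differently.

```typescript
// Mirrors the SeederConfig interface shown above.
interface SeederConfig {
  numberOfDays: number;
  numberOfRuns?: number;
  totalObservations?: number;
}

// Hypothetical shape used only in this sketch.
interface PlannedTrace {
  id: string;
  timestamp: Date;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: plans `count` synthetic traces for one project.
// Index 0 is stamped at `now`; later indexes step further back in time,
// always staying inside the `numberOfDays` window.
function planSyntheticTraces(
  projectId: string,
  config: SeederConfig,
  count: number,
  now: Date = new Date(),
): PlannedTrace[] {
  const windowMs = config.numberOfDays * MS_PER_DAY;
  const traces: PlannedTrace[] = [];
  for (let index = 0; index < count; index++) {
    const offsetMs = Math.floor((windowMs * index) / count);
    traces.push({
      id: `trace-synthetic-${index}-${projectId}`,
      timestamp: new Date(now.getTime() - offsetMs),
    });
  }
  return traces;
}
```

Passing `now` explicitly keeps the output deterministic, which is convenient when comparing seed runs in tests.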
- `types.ts`
- `DataGenerator`
- `ClickHouseQueryBuilder`
- `SeederOrchestrator`

- `SeederOrchestrator.loadFileContent()`
- `DataGenerator`
- `FileContent` interface if needed

- `DataGenerator`
- `clickhouse-seed-constants.ts`
- `seed-helpers.ts` functions consistently

packages/shared/clickhouse/
├── nested_json.json # Large JSON for realistic inputs
├── markdown.txt # Markdown content for document analysis
└── chat_ml_json.json # Chat ML format examples
- `postgres-seed-constants.ts` - Datasets, prompts, and PostgreSQL data
- `clickhouse-seed-constants.ts` - ClickHouse-specific constants (models, names)