rust/batch-import-worker/scripts/README.md
This directory contains test scripts and utilities for the PostHog Rust batch import worker.
`amplitude-test-generator.js` generates test data for exercising PostHog's Amplitude identify logic during batch imports.
Features:
cd rust/batch-import-worker/scripts
npm install
# Generate test data (saves to amplitude-test-data.json)
npm run generate
# Or run directly
node amplitude-test-generator.js
# US Cluster (default)
AMPLITUDE_API_KEY=your_real_api_key npm run generate
# EU Cluster
AMPLITUDE_API_KEY=your_real_api_key AMPLITUDE_CLUSTER=eu npm run generate
# Specify custom time range using environment variables
AMPLITUDE_START_TIME='2024-01-01T00:00:00Z' AMPLITUDE_END_TIME='2024-01-07T23:59:59Z' npm run generate
# Use relative time ranges
AMPLITUDE_START_TIME='1 week ago' AMPLITUDE_END_TIME='now' npm run generate
# Single day range
AMPLITUDE_START_TIME='2024-01-15' AMPLITUDE_END_TIME='2024-01-15T23:59:59Z' npm run generate
# Combined with API key and cluster
AMPLITUDE_API_KEY=your_key AMPLITUDE_CLUSTER=eu AMPLITUDE_START_TIME='2024-01-01' AMPLITUDE_END_TIME='2024-01-02' npm run generate
Environment variables:

- `AMPLITUDE_API_KEY`: Your Amplitude API key (required for sending events to Amplitude)
- `AMPLITUDE_CLUSTER`: `us` (default) or `eu`, selecting the Amplitude cluster
- `AMPLITUDE_START_TIME`: Start timestamp for events, in ISO format or relative (e.g. `2024-01-01T00:00:00Z`, `1 week ago`)
- `AMPLITUDE_END_TIME`: End timestamp for events, in ISO format or relative (e.g. `2024-01-07T23:59:59Z`, `now`)

Note: If no time range is specified, events are generated for a 24-hour period starting from yesterday.
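This README does not say exactly how these values are parsed. The following is a hypothetical resolver for illustration only. `resolveConfig` and `parseTime` are names invented for this sketch, not functions from `amplitude-test-generator.js`. The sketch makes three assumptions:

- Relative times look like `N hour(s)|day(s)|week(s) ago`.
- The default window is the 24 hours ending now (one reading of "starting from yesterday").
- The cluster maps onto the `'US'`/`'EU'` values of the `serverZone` option in `@amplitude/analytics-node`.

```javascript
// Hypothetical sketch: resolve generator settings from environment variables.
// Variable names come from this README; the parsing rules below are assumptions.
const UNIT_MS = { hour: 3600000, day: 86400000, week: 7 * 86400000 };

function parseTime(value, now) {
  const text = value.trim().toLowerCase();
  if (text === "now") return new Date(now);
  // Assumed relative syntax: "<n> hour(s)|day(s)|week(s) ago", e.g. "1 week ago".
  const rel = /^(\d+)\s+(hour|day|week)s?\s+ago$/.exec(text);
  if (rel) return new Date(now - Number(rel[1]) * UNIT_MS[rel[2]]);
  // Otherwise treat it as ISO 8601; date-only strings like "2024-01-15" parse as UTC midnight.
  const ms = Date.parse(value.trim());
  if (Number.isNaN(ms)) throw new Error(`Unrecognized time value: ${value}`);
  return new Date(ms);
}

function resolveConfig(env, now = Date.now()) {
  const cluster = (env.AMPLITUDE_CLUSTER || "us").toLowerCase();
  if (cluster !== "us" && cluster !== "eu") {
    throw new Error(`AMPLITUDE_CLUSTER must be 'us' or 'eu', got '${cluster}'`);
  }
  // Assumed default window: the 24 hours ending at `now`.
  const start = env.AMPLITUDE_START_TIME
    ? parseTime(env.AMPLITUDE_START_TIME, now)
    : new Date(now - UNIT_MS.day);
  const end = env.AMPLITUDE_END_TIME
    ? parseTime(env.AMPLITUDE_END_TIME, now)
    : new Date(now);
  if (start > end) {
    throw new Error("AMPLITUDE_START_TIME must not be later than AMPLITUDE_END_TIME");
  }
  return {
    apiKey: env.AMPLITUDE_API_KEY || null, // only needed when actually sending to Amplitude
    serverZone: cluster.toUpperCase(), // "US" or "EU", the values the Node SDK's serverZone takes
    start,
    end,
  };
}
```

Called as `resolveConfig(process.env)`, this returns two things: `Date` objects for the event window, and a `serverZone` string that could be passed to the SDK as `init(apiKey, { serverZone })`.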
The generator creates 7 scenarios that exercise the identify event logic:
Total: ~45 events, expecting exactly 30 identify events
Based on the generated data, you should see identify events for:
- `first_time_user_alice` + `first_time_device_mobile_1`
- `first_time_user_bob` + `first_time_device_laptop_2`
- `duplicate_user_frank` + `duplicate_device_phone_1` (first event only)
- `duplicate_user_grace` + `duplicate_device_computer_2` (first event only)
- `duplicate_user_henry` + `duplicate_device_ipad_3` (first event only)
- `multi_device_user_sarah` + each of 4 devices
- `shared_family_tablet_main`
- `edge_unicode_用户` + `edge_unicode_设备`, etc.
- `anon_to_id_user_1` + `anon_device_phone_1` (when the user first identifies)
- `journey_user_alex` + `journey_phone_commute`, `journey_laptop_office`, `journey_tablet_home`

Total expected: 30 identify events from ~45 generated events
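The list above implies a rule that helps when reasoning about the expected count: one identify per distinct (user_id, device_id) pair, emitted on the earliest event that carries both IDs. The sketch below models that rule. It was written for this README and is not a port of `identify.rs`. The event shape `{ user_id, device_id, time }` and the name `expectedIdentifies` are assumptions.

```javascript
// Simplified model of the expected outcome (assumption, not the Rust code):
// one identify per distinct (user_id, device_id) pair, on its earliest event.
function expectedIdentifies(events) {
  const seen = new Set();
  const identifies = [];
  // Process events in time order so "first event only" means the earliest one.
  const ordered = [...events].sort((a, b) => a.time - b.time);
  for (const event of ordered) {
    // Anonymous events (no user_id) cannot link a device to a user yet.
    if (!event.user_id || !event.device_id) continue;
    // JSON.stringify avoids collisions that a plain delimiter could cause.
    const key = JSON.stringify([event.user_id, event.device_id]);
    if (seen.has(key)) continue;
    seen.add(key);
    identifies.push({ user_id: event.user_id, device_id: event.device_id, time: event.time });
  }
  return identifies;
}
```

Under this model, duplicate events for the same pair add nothing. An anonymous device yields an identify only once an event on it carries a `user_id`.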
Generate Test Data:
# US cluster (default) with default time range
npm run generate
# EU cluster
AMPLITUDE_CLUSTER=eu npm run generate
# With custom time range for specific migration window
AMPLITUDE_START_TIME='2024-01-01' AMPLITUDE_END_TIME='2024-01-07' npm run generate
Export from Amplitude: Use Amplitude's export API to get the generated events in the format expected by PostHog batch imports. Make sure to use the same cluster (US/EU) that you sent the data to.
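This step relies on Amplitude's Export API rather than anything in this directory. Here is a hedged sketch of building that request. `buildExportRequest` and `formatHour` are hypothetical helpers. The Export API authenticates with both the project's API key and its secret key, whereas the generator only needs the API key. The hosts and the `YYYYMMDDTHH` hour format reflect Amplitude's docs as we understand them and should be checked before use.

```javascript
// Sketch of an Amplitude Export API request for the generated time window.
// Hosts, path, and the YYYYMMDDTHH hour format follow Amplitude's Export API
// docs as we understand them; check the current docs before relying on this.
const EXPORT_HOSTS = new Map([
  ["us", "https://amplitude.com"],
  ["eu", "https://analytics.eu.amplitude.com"],
]);

// Format a Date as YYYYMMDDTHH in UTC, e.g. 2024-01-07T23:00Z -> "20240107T23".
function formatHour(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${date.getUTCFullYear()}${pad(date.getUTCMonth() + 1)}${pad(date.getUTCDate())}` +
    `T${pad(date.getUTCHours())}`
  );
}

function buildExportRequest({ cluster = "us", start, end, apiKey, secretKey }) {
  // Use the same cluster the events were sent to.
  const host = EXPORT_HOSTS.get(cluster);
  if (!host) throw new Error(`Unknown Amplitude cluster: ${cluster}`);
  const url = new URL("/api/2/export", host);
  url.searchParams.set("start", formatHour(start));
  url.searchParams.set("end", formatHour(end));
  // HTTP Basic auth with the project's API key and secret key.
  const token = Buffer.from(`${apiKey}:${secretKey}`).toString("base64");
  return { url: url.toString(), headers: { Authorization: `Basic ${token}` } };
}
```

You could pass the returned `url` and `headers` to Node's built-in `fetch` (Node 18+) to download the export.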
Run PostHog Batch Import: Create a batch import with:

- `generate_identify_events: true`
- `import_events: true`

Verify Results:

- `$identify` events should be created
- `$identify` events should carry `$anon_distinct_id` (the `device_id`) and `distinct_id` (the `user_id`)

This test data generator is specifically designed to test the Rust batch import worker's identify logic found in:
- `src/parse/content/amplitude/identify.rs` - Identify event creation logic
- `src/parse/content/amplitude.rs` - Main Amplitude event parsing with identify injection
- `src/job/config.rs` - Job configuration, including the `generate_identify_events` flag

The generated test scenarios are meant to cover the edge cases and logic paths in this Rust implementation.
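To automate the Verify Results step, a hypothetical checker could compare the `$identify` events PostHog ingested against the expected (user_id, device_id) pairs. It reports pairs that are missing, duplicated, or unexpected. The sketch assumes exported PostHog events look like `{ event, distinct_id, properties: { $anon_distinct_id } }`. The name `verifyIdentifies` is invented for illustration.

```javascript
// Hypothetical checker: do the $identify events match the expected pairs?
// Assumed event shape: { event, distinct_id, properties: { $anon_distinct_id } }.
function verifyIdentifies(posthogEvents, expectedPairs) {
  // Count $identify events per (distinct_id, $anon_distinct_id) pair.
  const actual = new Map();
  for (const ev of posthogEvents) {
    if (ev.event !== "$identify") continue;
    const anon = ev.properties ? ev.properties.$anon_distinct_id : undefined;
    const key = JSON.stringify([ev.distinct_id, anon]);
    actual.set(key, (actual.get(key) || 0) + 1);
  }
  const missing = [];
  const duplicated = [];
  for (const [userId, deviceId] of expectedPairs) {
    const count = actual.get(JSON.stringify([userId, deviceId])) || 0;
    if (count === 0) missing.push([userId, deviceId]);
    if (count > 1) duplicated.push([userId, deviceId]);
  }
  // Any $identify pair that was not expected at all.
  const expectedKeys = new Set(expectedPairs.map((pair) => JSON.stringify(pair)));
  const unexpected = [...actual.keys()]
    .filter((key) => !expectedKeys.has(key))
    .map((key) => JSON.parse(key));
  const ok = missing.length === 0 && duplicated.length === 0 && unexpected.length === 0;
  return { ok, missing, duplicated, unexpected };
}
```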
Output file:

- `amplitude-test-data.json` - Complete test dataset with metadata

To add new test scenarios:

1. Add the new users and devices to `SCENARIO_USERS`/`SCENARIO_DEVICES`
2. Create a `generate*()` method in `AmplitudeTestGenerator`
3. Call it from `generateAllScenarios()`

Dependencies:

- `@amplitude/analytics-node` - Official Amplitude Node.js Analytics SDK for sending test events
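Here is a rough sketch of the three steps for adding a scenario. Only the names `SCENARIO_USERS`, `SCENARIO_DEVICES`, `AmplitudeTestGenerator`, and `generateAllScenarios()` come from this README. Their shapes here are assumed, and `generateExampleScenario`, `example_user_zoe`, and `example_device_watch_1` are hypothetical. The event fields (`event_type`, `user_id`, `device_id`, `time` in milliseconds) follow Amplitude's event format.

```javascript
// Hypothetical sketch; the real SCENARIO_* constants and generator class may differ.
// Step 1: register the scenario's user and device IDs (existing entries omitted).
const SCENARIO_USERS = { example: "example_user_zoe" };
const SCENARIO_DEVICES = { example: "example_device_watch_1" };

class AmplitudeTestGenerator {
  constructor(startMs) {
    this.startMs = startMs; // start of the event window, in epoch milliseconds
    this.events = [];
  }

  // Step 2: a generate*() method that appends the scenario's events and
  // returns how many identify events it should add to the expected total.
  generateExampleScenario() {
    const device_id = SCENARIO_DEVICES.example;
    const user_id = SCENARIO_USERS.example;
    // Two anonymous events, then one with user_id: expect exactly one identify.
    this.events.push({ event_type: "app_opened", device_id, time: this.startMs });
    this.events.push({ event_type: "page_viewed", device_id, time: this.startMs + 60000 });
    this.events.push({ event_type: "signed_in", device_id, user_id, time: this.startMs + 120000 });
    return 1;
  }

  // Step 3: call the new method from generateAllScenarios().
  generateAllScenarios() {
    let expectedIdentifies = 0;
    // ...existing generate*() calls would go here...
    expectedIdentifies += this.generateExampleScenario();
    return { events: this.events, expectedIdentifies };
  }
}
```

Having each scenario return its own expected identify count keeps the overall expected total easy to update when scenarios are added.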