# @kbn/evals-suite-streams

Evaluation suite for Elastic Streams pattern extraction quality.
```bash
# Start Scout server
node scripts/scout.js start-server --arch stateful --domain classic

# Run evaluations
node scripts/evals run --suite streams --evaluation-connector-id azure-gpt4o

# Only run pipeline_suggestion
node scripts/evals run --suite streams --evaluation-connector-id azure-gpt4o pipeline_suggestion
```
You can easily create new dataset entries from AI suggestions generated in Kibana:

1. In the Kibana Streams UI, generate a suggestion (grok, dissect, or pipeline).
2. Open the browser dev console and run: `copyStreamsSuggestion()`
3. Run the dataset creation script:

```bash
pbpaste | node --require ./src/setup_node_env/ ./x-pack/platform/packages/shared/kbn-evals-suite-streams/scripts/create_dataset_from_clipboard.ts
```
The script will automatically insert the generated entry at the `🔧 NEW DATASETS GO HERE` marker. Review the generated entry and fill in the TODO items.
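The marker-based insertion described above can be sketched roughly as follows (an illustrative stand-in, not the actual script's code; the marker string and function name are assumptions):

```typescript
// Marker comment assumed to live in the dataset source file.
const MARKER = '// 🔧 NEW DATASETS GO HERE';

// Insert a serialized dataset entry immediately above the marker,
// keeping the marker in place so future entries can be added the same way.
function insertAtMarker(source: string, entry: string): string {
  const idx = source.indexOf(MARKER);
  if (idx === -1) {
    throw new Error(`Marker not found: ${MARKER}`);
  }
  return source.slice(0, idx) + entry + '\n' + source.slice(idx);
}
```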
Tests run in parallel across 20 Playwright workers using `fullyParallel: true`. A shared setup/teardown project handles lifecycle: it sets up the `logs.otel` stream and indexes synthtrace data. Connector creation is idempotent (it handles 409 conflicts), so multiple workers can safely set up the same connector.
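The idempotency pattern described above can be sketched like this (a minimal illustration of treating HTTP 409 as success; `ensureConnector` and the `statusCode` shape are assumptions, not the suite's actual helpers):

```typescript
interface HttpError extends Error {
  statusCode?: number;
}

// Run the (hypothetical) connector-creation call, treating a 409 Conflict
// as success: another worker already created the same connector.
async function ensureConnector(
  createConnector: () => Promise<void>
): Promise<'created' | 'already-exists'> {
  try {
    await createConnector();
    return 'created';
  } catch (err) {
    if ((err as HttpError).statusCode === 409) {
      return 'already-exists';
    }
    throw err; // any other failure is a real error
  }
}
```

With this shape, every worker can unconditionally call `ensureConnector` during setup without coordinating with its siblings.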
To run a single connector:

```bash
node scripts/evals run --suite streams --evaluation-connector-id bedrock-claude --project bedrock-claude
```
Tests Grok and Dissect pattern generation using 21 real-world log examples, with quality metrics. Pattern generation uses `@kbn/grok-heuristics` and `@kbn/dissect-heuristics`.
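One way a field-level quality metric for generated patterns could look (an assumed illustration, not the suite's exact scoring): compare the fields a pattern extracts from a log line against the expected fields and score the fraction that match exactly.

```typescript
// Fraction of expected fields that the extraction reproduced exactly.
// Returns 1 for an empty expectation (nothing to get wrong).
function fieldAccuracy(
  expected: Record<string, string>,
  actual: Record<string, string>
): number {
  const keys = Object.keys(expected);
  if (keys.length === 0) return 1;
  const matched = keys.filter((key) => actual[key] === expected[key]).length;
  return matched / keys.length;
}
```

Averaging such a score across the 21 log examples would yield a single quality number per pattern generator.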