# Attack Discovery Evals (`attack-discovery`)

This package provides a Kibana eval suite for Attack Discovery using the `@kbn/evals` framework (Playwright + Scout).
It's designed to run in CI as the `attack-discovery` suite, registered in `.buildkite/pipelines/evals/evals.suites.json`.
The core task is: generate Attack Discoveries from alert context.
Outputs are scored by:
- `AttackDiscoveryBasic` (CODE): ensures the output is present and shaped correctly
- `AttackDiscoveryRubric` (LLM): legacy rubric-based correctness judgement (Y/N mapped to score 1/0)

## Dataset

The dataset is not checked into the repository to avoid polluting OSS training data. It is managed via the Evals dataset API on the golden cluster: https://kbn-evals-serverless-ed035a.kb.us-central1.gcp.elastic.cloud/
By default, the suite uses `trustUpstreamDataset: true` and resolves the dataset by name from the golden cluster. The name defaults to `attack_discovery: bundled alerts (jsonl)` and can be overridden with `ATTACK_DISCOVERY_DATASET_NAME`.

This requires `EVALUATIONS_KBN_URL` and `EVALUATIONS_KBN_API_KEY` to be set (automatically configured in CI via the vault config, or locally via `local_ci_env.sh`).
For local development, you can bypass the golden cluster and load directly from a local JSONL file:

```sh
ATTACK_DISCOVERY_DATASET_JSONL_PATH=data/eval_dataset_attack_discovery_all_scenarios.jsonl \
node scripts/evals run --suite attack-discovery ...
```

Place the JSONL in the `data/` directory (it is gitignored).
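A JSONL dataset is one JSON object per line. As a rough sketch (not the suite's actual loader, which lives in `src/dataset/load_attack_discovery_jsonl.ts`), parsing such a file could look like this; the `input`/`output` record shape is an assumption for illustration:

```typescript
// Hypothetical sketch: parse JSONL content into example records.
// The record shape (`input`, `output`) is an assumption, not the suite's real schema.
interface JsonlExample {
  input: Record<string, unknown>;
  output: Record<string, unknown>;
}

export function parseJsonl(content: string): JsonlExample[] {
  return content
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // skip blank lines between records
    .map((line) => JSON.parse(line) as JsonlExample);
}

// Two records in an in-memory JSONL string.
const examples = parseJsonl(
  '{"input":{"mode":"bundledAlerts"},"output":{}}\n{"input":{},"output":{}}\n'
);
console.log(examples.length); // 2
```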
To upload or refresh the dataset on the golden cluster, use the provided upload script. Set the required env vars from your config (or export them via your preferred method):

```sh
EVALUATIONS_KBN_URL=https://kbn-evals-serverless-ed035a.kb.us-central1.gcp.elastic.cloud \
EVALUATIONS_KBN_API_KEY=<your-api-key> \
node x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/scripts/upload_dataset.js [path/to/file.jsonl]
```

If no path is given, it defaults to `data/eval_dataset_attack_discovery_all_scenarios.jsonl`. The API key is the same one generated by `node scripts/evals init config`.
## Input modes

The suite supports an input union `AttackDiscoveryTaskInput` (`src/types.ts`):

### `bundledAlerts` (primary / golden cluster dataset)

Use this when your dataset already contains anonymized alert context.

- Dataset name: `attack_discovery: bundled alerts (jsonl)`
- Loader: `src/dataset/load_attack_discovery_jsonl.ts`

Each example is mapped to:

- `input.mode = "bundledAlerts"`
- `input.anonymizedAlerts[]` (objects with `pageContent` + `metadata`)
- `output.attackDiscoveries[]` (expected results)

### `searchAlerts` (API-backed)

This mode triggers Attack Discovery via the existing public API:

- `POST /api/attack_discovery/_generate`
- `GET /api/attack_discovery/generations/{execution_uuid}`, polled until completion

Implementation lives in `src/clients/attack_discovery_client.ts`.
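The generate-then-poll flow above can be sketched roughly as follows. The response shape and `status` field are assumptions for illustration; the real client is in `src/clients/attack_discovery_client.ts`:

```typescript
// Hypothetical sketch of polling a generation until it leaves the "running" state.
// The status values below are invented for illustration.
interface GenerationStatus {
  status: 'running' | 'succeeded' | 'failed';
}

type GetGeneration = (executionUuid: string) => Promise<GenerationStatus>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function pollUntilComplete(
  getGeneration: GetGeneration,
  executionUuid: string,
  { intervalMs = 1000, maxAttempts = 60 } = {}
): Promise<GenerationStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const generation = await getGeneration(executionUuid);
    if (generation.status !== 'running') return generation; // terminal state reached
    await sleep(intervalMs);
  }
  throw new Error(`generation ${executionUuid} did not complete in time`);
}
```

Injecting the fetch function keeps the loop unit-testable without a live Kibana.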
To provide realistic alert data for the `searchAlerts` path, this suite supports restoring an Elasticsearch snapshot from GCS before running:

```sh
node scripts/evals run --suite attack-discovery --grep "searchAlerts"
```

Defaults are pinned in code for repeatability:

- Bucket: `security-ai-datasets`
- Base path: `attack-discovery/oh-my-malware/2026-03-26`
- Snapshot name: `alerts-snapshot`

Override as needed:

```sh
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BUCKET=security-ai-datasets \
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BASE_PATH=attack-discovery/oh-my-malware/2026-03-26 \
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_NAME=alerts-snapshot \
node scripts/evals run --suite attack-discovery --grep "searchAlerts"
```

The restore uses the shared `GCS_CREDENTIALS` service account (automatically set via the vault config when using `node scripts/evals`). See `src/data_generators/restore_alerts_snapshot.ts`.
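Conceptually, a GCS snapshot restore amounts to registering a `gcs` repository and issuing a restore request against Elasticsearch. A rough sketch of the request payloads, with invented repository/function names (the actual logic lives in `src/data_generators/restore_alerts_snapshot.ts`):

```typescript
// Hypothetical sketch of the two request payloads a GCS snapshot restore needs.
// Repository name and helper names are illustrative, not the suite's real code.
export function buildGcsRepositoryRequest(bucket: string, basePath: string) {
  return {
    name: 'attack-discovery-alerts', // illustrative repository name
    type: 'gcs',
    settings: { bucket, base_path: basePath },
  };
}

export function buildRestoreRequest(repository: string, snapshot: string) {
  return {
    repository,
    snapshot,
    // Restore only the snapshot's indices, not cluster-wide state.
    body: { include_global_state: false },
  };
}

const repo = buildGcsRepositoryRequest(
  'security-ai-datasets',
  'attack-discovery/oh-my-malware/2026-03-26'
);
console.log(repo.settings.base_path);
```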
To make this snapshot discoverable via the Snapshot Dataset Management best practices, register it in Dataplex using the checked-in aspects file:

`x-pack/platform/packages/shared/kbn-evals/snapshots/dataplex/security_ai/attack_discovery_oh_my_malware_2026_03_26.yaml`

Example:

```sh
gcloud dataplex entries create attack-discovery-oh-my-malware-2026-03-26 \
  --location=us-central1 \
  --project=elastic-observability \
  --entry-group=snapshot-datasets \
  --entry-type=projects/elastic-observability/locations/global/entryTypes/es-snapshot \
  --fully-qualified-name="custom:es-snapshots.security-ai-datasets.attack-discovery.oh-my-malware.2026-03-26" \
  --entry-source-resource="gs://security-ai-datasets/attack-discovery/oh-my-malware/2026-03-26" \
  --entry-source-display-name="Attack Discovery: oh-my-malware (2026-03-26)" \
  --entry-source-description="Attack Discovery alert snapshot for searchAlerts eval runs" \
  --entry-source-update-time="2026-03-26T00:00:00Z" \
  --aspects=x-pack/platform/packages/shared/kbn-evals/snapshots/dataplex/security_ai/attack_discovery_oh_my_malware_2026_03_26.yaml
```
### `graphState` (prompt-input stub)

This mode accepts partial "graph-state-like" inputs and applies helpful defaults. It's intended as an extension point for future parity with deeper legacy graph-state evaluation.
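The defaults-merging behaviour can be sketched roughly as follows; the field names are invented for illustration and are not the suite's actual graph-state schema:

```typescript
// Hypothetical sketch: merge a partial graph-state input with defaults.
// Field names and default values are illustrative assumptions.
interface GraphStateInput {
  maxGenerationAttempts: number;
  anonymizedAlerts: unknown[];
}

const GRAPH_STATE_DEFAULTS: GraphStateInput = {
  maxGenerationAttempts: 10,
  anonymizedAlerts: [],
};

export function withGraphStateDefaults(partial: Partial<GraphStateInput>): GraphStateInput {
  // Spread defaults first so any field the caller provides wins.
  return { ...GRAPH_STATE_DEFAULTS, ...partial };
}
```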
## Running locally

From repo root:

```sh
nvm use
yarn kbn bootstrap
node scripts/evals init config
```

```sh
nvm use && node scripts/evals scout
```

```sh
nvm use && node scripts/evals run --suite attack-discovery --model sonnet-3-7 --judge sonnet-3-7
```

Or, with a local JSONL:

```sh
nvm use && ATTACK_DISCOVERY_DATASET_JSONL_PATH=x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/data/eval_dataset_attack_discovery_all_scenarios.jsonl \
node scripts/evals run --suite attack-discovery \
  --model sonnet-3-7 --judge sonnet-3-7
```
For fast local sanity checks with a local JSONL:

- `ATTACK_DISCOVERY_DATASET_LIMIT=1`: load only 1 JSONL record
- `ATTACK_DISCOVERY_DATASET_OFFSET=<n>`: skip the first n records (0-based)
- `ATTACK_DISCOVERY_EVAL_CONCURRENCY=1`: run the executor with concurrency 1

Example:
```sh
nvm use && ATTACK_DISCOVERY_DATASET_JSONL_PATH=data/eval_dataset_attack_discovery_all_scenarios.jsonl \
ATTACK_DISCOVERY_DATASET_LIMIT=1 ATTACK_DISCOVERY_DATASET_OFFSET=4 ATTACK_DISCOVERY_EVAL_CONCURRENCY=1 \
node scripts/evals run --suite attack-discovery \
  --model sonnet-3-7 --judge sonnet-3-7
```
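The limit/offset semantics (offset applied first, then the limit caps the count) can be sketched as a small slice helper; this is an illustration, not the suite's actual loader code:

```typescript
// Hypothetical sketch of applying OFFSET then LIMIT to loaded JSONL records.
export function applyLimitOffset<T>(records: T[], offset = 0, limit?: number): T[] {
  const sliced = records.slice(offset); // OFFSET skips the first n records (0-based)
  return limit === undefined ? sliced : sliced.slice(0, limit); // LIMIT caps the count
}

// With OFFSET=4 and LIMIT=1, only the fifth record (index 4) is kept.
console.log(applyLimitOffset([0, 1, 2, 3, 4, 5], 4, 1));
```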
## Environment variables

| Variable | Purpose | Default |
|---|---|---|
| `ATTACK_DISCOVERY_DATASET_NAME` | Dataset name to resolve from golden cluster | `attack_discovery: bundled alerts (jsonl)` |
| `ATTACK_DISCOVERY_DATASET_JSONL_PATH` | When set, load dataset from this local JSONL path instead of the golden cluster | (unset) |
| `ATTACK_DISCOVERY_DATASET_LIMIT` | Max examples to load from JSONL | (all) |
| `ATTACK_DISCOVERY_DATASET_OFFSET` | Skip first N examples in JSONL | `0` |
| `ATTACK_DISCOVERY_EVAL_CONCURRENCY` | Concurrency for executor `runExperiment` | `5` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_DISABLE` | Disable snapshot restore for `searchAlerts` smoke | (unset) |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BUCKET` | GCS bucket for alert snapshot restore | `security-ai-datasets` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BASE_PATH` | GCS base path within the bucket | `attack-discovery/oh-my-malware/2026-03-26` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_NAME` | Specific snapshot name (defaults to pinned) | `alerts-snapshot` |
| `EVALUATIONS_KBN_URL` | Golden cluster Kibana URL for dataset ops | (from vault config) |
| `EVALUATIONS_KBN_API_KEY` | API key for golden cluster Kibana | (from vault config) |
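The numeric variables in the table fall back to defaults when unset. A minimal sketch of such a reader, with an invented helper name (not the suite's actual config code):

```typescript
// Hypothetical sketch: read an integer env var with a default, failing loudly on junk.
export function readIntEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number
): number {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback; // unset → documented default
  const parsed = Number.parseInt(raw, 10);
  if (Number.isNaN(parsed)) throw new Error(`${name} must be an integer, got "${raw}"`);
  return parsed;
}

const concurrency = readIntEnv(process.env, 'ATTACK_DISCOVERY_EVAL_CONCURRENCY', 5);
console.log(concurrency);
```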
## Troubleshooting

This suite uses `/internal/inference/prompt` for prompt-based generation and the rubric judge. In some environments, `.inference` connectors are not supported by that endpoint. If you see:

```
Connector '...' of type '.inference' not recognized as a supported connector
```

use a connector that works with the inference prompt endpoint (e.g. `sonnet-3-7`).

If using Azure `.gen-ai` connectors and you see provider errors like `DeploymentNotFound`, the configured deployment may not exist (or may not be reachable from your environment).
This suite is Node-only (Playwright eval suite). It includes a package-local ESLint override: `./.eslintrc.js` disables `import/no-nodejs-modules` inside this package.