x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/README.md


# Attack Discovery eval suite (`attack-discovery`)

This package provides a Kibana eval suite for Attack Discovery using the `@kbn/evals` framework (Playwright + Scout).

It's designed to:

- run dataset-driven evals with data sourced from the golden cluster (managed datasets API)
- exercise multiple input modes (bundled alerts, search alerts via API, graph state)
- score outputs with a legacy rubric (LLM-as-judge) plus deterministic sanity checks

## Suite ID

- `attack-discovery`

Registered in `.buildkite/pipelines/evals/evals.suites.json`.


## What this suite evaluates

The core task is: generate Attack Discoveries from alert context.

Outputs are scored by:

- `AttackDiscoveryBasic` (CODE): ensures the output is present and shaped correctly
- `AttackDiscoveryRubric` (LLM): legacy rubric-based correctness judgement (Y/N mapped to score 1/0)
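
To make the rubric scoring concrete, the Y/N verdict collapses to a numeric score. A minimal shell sketch of that mapping (the real scorer is TypeScript inside this suite; `rubric_score` is a hypothetical name used only for illustration):

```shell
# Hypothetical illustration of the rubric's Y/N -> 1/0 mapping; the actual
# scorer lives in the suite's TypeScript sources.
rubric_score() {
  case "$1" in
    Y) echo 1 ;;  # judge said yes -> full credit
    N) echo 0 ;;  # judge said no  -> zero
    *) echo "unexpected verdict: $1" >&2; return 1 ;;
  esac
}

rubric_score Y   # prints 1
```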

## Dataset sourcing

The dataset is not checked into the repository to avoid polluting OSS training data. It is managed via the Evals dataset API on the golden cluster: https://kbn-evals-serverless-ed035a.kb.us-central1.gcp.elastic.cloud/

### Default (CI / golden cluster)

By default, the suite uses `trustUpstreamDataset: true` and resolves the dataset by name from the golden cluster. The name defaults to `attack_discovery: bundled alerts (jsonl)` and can be overridden with `ATTACK_DISCOVERY_DATASET_NAME`.

This requires `EVALUATIONS_KBN_URL` and `EVALUATIONS_KBN_API_KEY` to be set (automatically configured in CI via the vault config, or locally via `local_ci_env.sh`).
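
If you want to fail fast when those variables are missing, a simple guard can help. This is purely illustrative and not part of the suite:

```shell
# Illustrative fail-fast check (not part of the suite): ${VAR:?msg} aborts
# with the message when VAR is unset or empty.
check_evals_env() {
  : "${EVALUATIONS_KBN_URL:?set EVALUATIONS_KBN_URL (vault config or local_ci_env.sh)}"
  : "${EVALUATIONS_KBN_API_KEY:?set EVALUATIONS_KBN_API_KEY (vault config or local_ci_env.sh)}"
}

# Succeeds silently when both are provided:
EVALUATIONS_KBN_URL=https://example.invalid EVALUATIONS_KBN_API_KEY=dummy check_evals_env
```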

### Local JSONL override

For local development, you can bypass the golden cluster and load directly from a local JSONL file:

```bash
ATTACK_DISCOVERY_DATASET_JSONL_PATH=data/eval_dataset_attack_discovery_all_scenarios.jsonl \
  node scripts/evals run --suite attack-discovery ...
```

Place the JSONL in the `data/` directory (it is gitignored).

### Uploading the dataset to the golden cluster

Use the provided upload script. Set the required env vars from your config (or export them via your preferred method):

```bash
EVALUATIONS_KBN_URL=https://kbn-evals-serverless-ed035a.kb.us-central1.gcp.elastic.cloud \
EVALUATIONS_KBN_API_KEY=<your-api-key> \
  node x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/scripts/upload_dataset.js [path/to/file.jsonl]
```

If no path is given, it defaults to `data/eval_dataset_attack_discovery_all_scenarios.jsonl`. The API key is the same one generated by `node scripts/evals init config`.
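
The optional-argument behaviour is the classic shell default pattern. Sketched for illustration only (`upload` here is a made-up stand-in for the Node script):

```shell
# Made-up stand-in showing the default-path behaviour of the upload step:
# use "$1" when provided, otherwise fall back to the bundled default.
upload() {
  local path="${1:-data/eval_dataset_attack_discovery_all_scenarios.jsonl}"
  echo "would upload: $path"
}

upload                     # falls back to the default path
upload custom/file.jsonl   # uses the explicit path
```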


## Input modes

The suite supports an input union `AttackDiscoveryTaskInput` (`src/types.ts`):

### 1) `bundledAlerts` (primary / golden cluster dataset)

Use this when your dataset already contains anonymized alert context.

- Dataset name: `attack_discovery: bundled alerts (jsonl)`
- Loader (local fallback): `src/dataset/load_attack_discovery_jsonl.ts`

Each example is mapped to:

- `input.mode = "bundledAlerts"`
- `input.anonymizedAlerts[]` (objects with `pageContent` + `metadata`)
- `output.attackDiscoveries[]` (expected results)

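Based on that mapping, a local record can be sanity-checked with `jq`. Note that the exact on-disk nesting is an assumption inferred from the bullets above, not a documented schema:

```shell
# Check one bundledAlerts-style record with jq; the field nesting here is an
# assumption inferred from the mapping above.
record='{"input":{"mode":"bundledAlerts","anonymizedAlerts":[{"pageContent":"...","metadata":{}}]},"output":{"attackDiscoveries":[]}}'
echo "$record" | jq -e '
  .input.mode == "bundledAlerts"
  and (.input.anonymizedAlerts | type) == "array"
  and (.output.attackDiscoveries | type) == "array"
' >/dev/null && echo "record shape ok"
```
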
### 2) `searchAlerts` (API-backed)

This mode triggers Attack Discovery via the existing public API:

- `POST /api/attack_discovery/_generate`
- polls `GET /api/attack_discovery/generations/{execution_uuid}` until completion

Implementation lives in `src/clients/attack_discovery_client.ts`.
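
The request/poll loop can be sketched in shell. The two endpoint paths come from the list above; everything else (the `status` response field, its values, the `fetch` helper) is an assumption for illustration:

```shell
# Sketch of the searchAlerts flow. Endpoint paths match the list above; the
# "status" field and its values are assumptions, and fetch() wraps curl so it
# can be stubbed out.
KBN_URL="${KBN_URL:-http://localhost:5601}"
fetch() { curl -s "$1"; }

poll_generation() {
  local uuid="$1" status=""
  for _ in 1 2 3 4 5 6 7 8 9 10; do
    status="$(fetch "$KBN_URL/api/attack_discovery/generations/$uuid" | jq -r '.status')"
    [ "$status" != "running" ] && break
    sleep 2
  done
  echo "$status"
}
```

The suite's actual client in `src/clients/attack_discovery_client.ts` is what runs in evals; this only mirrors its shape.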

#### Snapshot-based alert data

To provide realistic alert data for the searchAlerts path, this suite supports restoring an Elasticsearch snapshot from GCS before running:

```bash
node scripts/evals run --suite attack-discovery --grep "searchAlerts"
```

Defaults are pinned in code for repeatability:

- bucket: `security-ai-datasets`
- base path: `attack-discovery/oh-my-malware/2026-03-26`
- snapshot: `alerts-snapshot`

Override as needed:

```bash
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BUCKET=security-ai-datasets \
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BASE_PATH=attack-discovery/oh-my-malware/2026-03-26 \
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_NAME=alerts-snapshot \
  node scripts/evals run --suite attack-discovery --grep "searchAlerts"
```

The restore uses the shared `GCS_CREDENTIALS` service account (automatically set via the vault config when using `node scripts/evals`). See `src/data_generators/restore_alerts_snapshot.ts`.
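
For reference, a GCS restore with stock Elasticsearch APIs amounts to registering a `gcs` repository and restoring the named snapshot. The body below uses the pinned defaults from this README; the repository name and any settings beyond `bucket`/`base_path` are invented, and the suite's restore script may differ:

```shell
# Illustrative gcs repository settings built from the pinned defaults above.
# The repo name "attack-discovery-repo" is invented for this sketch.
repo_body='{
  "type": "gcs",
  "settings": {
    "bucket": "security-ai-datasets",
    "base_path": "attack-discovery/oh-my-malware/2026-03-26"
  }
}'
# PUT  <es>/_snapshot/attack-discovery-repo                           (register)
# POST <es>/_snapshot/attack-discovery-repo/alerts-snapshot/_restore  (restore)
echo "$repo_body" | jq -r '.settings.bucket'
```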

#### Dataset registry (Dataplex)

To make this snapshot discoverable via the Snapshot Dataset Management best practices, register it in Dataplex using the checked-in aspects file:

- `x-pack/platform/packages/shared/kbn-evals/snapshots/dataplex/security_ai/attack_discovery_oh_my_malware_2026_03_26.yaml`

Example:

```bash
gcloud dataplex entries create attack-discovery-oh-my-malware-2026-03-26 \
  --location=us-central1 \
  --project=elastic-observability \
  --entry-group=snapshot-datasets \
  --entry-type=projects/elastic-observability/locations/global/entryTypes/es-snapshot \
  --fully-qualified-name="custom:es-snapshots.security-ai-datasets.attack-discovery.oh-my-malware.2026-03-26" \
  --entry-source-resource="gs://security-ai-datasets/attack-discovery/oh-my-malware/2026-03-26" \
  --entry-source-display-name="Attack Discovery: oh-my-malware (2026-03-26)" \
  --entry-source-description="Attack Discovery alert snapshot for searchAlerts eval runs" \
  --entry-source-update-time="2026-03-26T00:00:00Z" \
  --aspects=x-pack/platform/packages/shared/kbn-evals/snapshots/dataplex/security_ai/attack_discovery_oh_my_malware_2026_03_26.yaml
```

### 3) `graphState` (prompt-input stub)

This mode accepts partial "graph-state-like" inputs and applies helpful defaults. It's intended as an extension point for future parity with deeper legacy graph-state evaluation.
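
The "helpful defaults" behaviour is essentially a defaults-then-overrides merge. A purely illustrative sketch with invented field names (the real defaults live in the suite's TypeScript):

```shell
# Invented example of a defaults merge: jq's "*" operator deep-merges objects,
# with the right-hand (caller-supplied) object winning. Field names are made up.
defaults='{"maxHypotheses":5,"anonymize":true}'
partial='{"anonymize":false}'
echo "$defaults $partial" | jq -sc '.[0] * .[1]'
# -> {"maxHypotheses":5,"anonymize":false}
```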


## How to run locally

### 0) Bootstrap

From repo root:

```bash
nvm use
yarn kbn bootstrap
```

### 1) Set up local config (one-time)

```bash
node scripts/evals init config
```

### 2) Start the local eval stack (Scout)

```bash
nvm use && node scripts/evals scout
```

### 3) Run the suite (golden cluster dataset)

```bash
nvm use && node scripts/evals run --suite attack-discovery --model sonnet-3-7 --judge sonnet-3-7
```

### 4) Run with local JSONL override

```bash
nvm use && ATTACK_DISCOVERY_DATASET_JSONL_PATH=x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/data/eval_dataset_attack_discovery_all_scenarios.jsonl \
  node scripts/evals run --suite attack-discovery \
  --model sonnet-3-7 --judge sonnet-3-7
```

### 5) Fast smoke: run exactly one record

For fast local sanity checks with a local JSONL:

- `ATTACK_DISCOVERY_DATASET_LIMIT=1`: load only 1 JSONL record
- `ATTACK_DISCOVERY_DATASET_OFFSET=<n>`: skip the first n records (0-based)
- `ATTACK_DISCOVERY_EVAL_CONCURRENCY=1`: run the executor with concurrency 1

Example:

```bash
nvm use && ATTACK_DISCOVERY_DATASET_JSONL_PATH=data/eval_dataset_attack_discovery_all_scenarios.jsonl \
  ATTACK_DISCOVERY_DATASET_LIMIT=1 ATTACK_DISCOVERY_DATASET_OFFSET=4 ATTACK_DISCOVERY_EVAL_CONCURRENCY=1 \
  node scripts/evals run --suite attack-discovery \
  --model sonnet-3-7 --judge sonnet-3-7
```
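
The offset/limit semantics can be mimicked on any JSONL stream with standard tools (the loader itself is TypeScript; this just demonstrates which record `OFFSET=4`, `LIMIT=1` selects):

```shell
# Demonstrate offset/limit on a throwaway JSONL file: skip the first OFFSET
# records (0-based), then keep LIMIT records.
printf '%s\n' '{"id":0}' '{"id":1}' '{"id":2}' '{"id":3}' '{"id":4}' '{"id":5}' > /tmp/demo.jsonl
OFFSET=4 LIMIT=1
tail -n +"$((OFFSET + 1))" /tmp/demo.jsonl | head -n "$LIMIT"
# -> {"id":4}  (offset 4 skips ids 0-3; limit 1 keeps one record)
```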

## Environment variables

| Variable | Purpose | Default |
| --- | --- | --- |
| `ATTACK_DISCOVERY_DATASET_NAME` | Dataset name to resolve from golden cluster | `attack_discovery: bundled alerts (jsonl)` |
| `ATTACK_DISCOVERY_DATASET_JSONL_PATH` | When set, load dataset from this local JSONL path instead of the golden cluster | (unset) |
| `ATTACK_DISCOVERY_DATASET_LIMIT` | Max examples to load from JSONL | (all) |
| `ATTACK_DISCOVERY_DATASET_OFFSET` | Skip first N examples in JSONL | `0` |
| `ATTACK_DISCOVERY_EVAL_CONCURRENCY` | Concurrency for executor `runExperiment` | `5` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_DISABLE` | Disable snapshot restore for `searchAlerts` smoke | (unset) |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BUCKET` | GCS bucket for alert snapshot restore | `security-ai-datasets` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BASE_PATH` | GCS base path within the bucket | `attack-discovery/oh-my-malware/2026-03-26` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_NAME` | Specific snapshot name (defaults to pinned) | `alerts-snapshot` |
| `EVALUATIONS_KBN_URL` | Golden cluster Kibana URL for dataset ops | (from vault config) |
| `EVALUATIONS_KBN_API_KEY` | API key for golden cluster Kibana | (from vault config) |

## Connector notes / troubleshooting

This suite uses `/internal/inference/prompt` for prompt-based generation and the rubric judge. In some environments, `.inference` connectors are not supported by that endpoint. If you see:

- `Connector '...' of type '.inference' not recognized as a supported connector`

use a connector that works with the inference prompt endpoint (e.g. `sonnet-3-7`).

If you use Azure `.gen-ai` connectors and see provider errors like `DeploymentNotFound`, the configured deployment may not exist (or may not be reachable from your environment).


## Package notes

This suite is Node-only (Playwright eval suite). It includes a package-local ESLint override:

- `./.eslintrc.js` disables `import/no-nodejs-modules` inside this package