x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/README.md


# Attack Discovery eval suite (`attack-discovery`)

This package provides a Kibana eval suite for Attack Discovery using the `@kbn/evals` framework (Playwright + Scout).

It's designed to:

- run dataset-driven evals with data sourced from the golden cluster (managed datasets API)
- exercise multiple input modes (bundled alerts, search alerts via API, graph state)
- score outputs with a legacy rubric (LLM-as-judge) plus deterministic sanity checks

## Suite ID

- `attack-discovery`

Registered in `.buildkite/pipelines/evals/evals.suites.json`.


## What this suite evaluates

The core task is: generate Attack Discoveries from alert context.

Outputs are scored by:

- `AttackDiscoveryBasic` (CODE): ensures the output is present and shaped correctly
- `AttackDiscoveryRubric` (LLM): legacy rubric-based correctness judgement (Y/N mapped to score 1/0)
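
To make the rubric scoring concrete, the Y/N verdict collapses to a numeric score. A minimal shell sketch of that mapping (the real scorer is TypeScript inside this suite; `rubric_score` is a hypothetical name used only for illustration):

```shell
# Hypothetical illustration of the rubric's Y/N -> 1/0 mapping; the actual
# scorer lives in the suite's TypeScript sources.
rubric_score() {
  case "$1" in
    Y) echo 1 ;;  # judge said yes -> full credit
    N) echo 0 ;;  # judge said no  -> zero
    *) echo "unexpected verdict: $1" >&2; return 1 ;;
  esac
}

rubric_score Y   # prints 1
```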

## Dataset sourcing

The dataset is not checked into the repository to avoid polluting OSS training data. It is managed via the Evals dataset API on the golden cluster: https://kbn-evals-serverless-ed035a.kb.us-central1.gcp.elastic.cloud/

### Default (CI / golden cluster)

By default, the suite uses `trustUpstreamDataset: true` and resolves the dataset by name from the golden cluster. The name defaults to `attack_discovery: bundled alerts (jsonl)` and can be overridden with `ATTACK_DISCOVERY_DATASET_NAME`.

This requires `EVALUATIONS_KBN_URL` and `EVALUATIONS_KBN_API_KEY` to be set (automatically configured in CI via the vault config, or locally via `local_ci_env.sh`).
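
If you want to fail fast when those variables are missing, a simple guard can help. This is purely illustrative and not part of the suite:

```shell
# Illustrative fail-fast check (not part of the suite): ${VAR:?msg} aborts
# with the message when VAR is unset or empty.
check_evals_env() {
  : "${EVALUATIONS_KBN_URL:?set EVALUATIONS_KBN_URL (vault config or local_ci_env.sh)}"
  : "${EVALUATIONS_KBN_API_KEY:?set EVALUATIONS_KBN_API_KEY (vault config or local_ci_env.sh)}"
}

# Succeeds silently when both are provided:
EVALUATIONS_KBN_URL=https://example.invalid EVALUATIONS_KBN_API_KEY=dummy check_evals_env
```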

### Local JSONL override

For local development, you can bypass the golden cluster and load directly from a local JSONL file:

```bash
ATTACK_DISCOVERY_DATASET_JSONL_PATH=data/eval_dataset_attack_discovery_all_scenarios.jsonl \
  node scripts/evals run --suite attack-discovery ...
```

Place the JSONL in the `data/` directory (it is gitignored).

### Uploading the dataset to the golden cluster

Use the provided upload script. Set the required env vars from your config (or export them via your preferred method):

```bash
EVALUATIONS_KBN_URL=https://kbn-evals-serverless-ed035a.kb.us-central1.gcp.elastic.cloud \
EVALUATIONS_KBN_API_KEY=<your-api-key> \
  node x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/scripts/upload_dataset.js [path/to/file.jsonl]
```

If no path is given, it defaults to `data/eval_dataset_attack_discovery_all_scenarios.jsonl`. The API key is the same one generated by `node scripts/evals init config`.
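
The optional-argument behaviour is the classic shell default pattern. Sketched for illustration only (`upload` here is a made-up stand-in for the Node script):

```shell
# Made-up stand-in showing the default-path behaviour of the upload step:
# use "$1" when provided, otherwise fall back to the bundled default.
upload() {
  local path="${1:-data/eval_dataset_attack_discovery_all_scenarios.jsonl}"
  echo "would upload: $path"
}

upload                     # falls back to the default path
upload custom/file.jsonl   # uses the explicit path
```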


## Input modes

The suite supports an input union `AttackDiscoveryTaskInput` (`src/types.ts`):

### 1) `bundledAlerts` (primary / golden cluster dataset)

Use this when your dataset already contains anonymized alert context.

- Dataset name: `attack_discovery: bundled alerts (jsonl)`
- Loader (local fallback): `src/dataset/load_attack_discovery_jsonl.ts`

Each example is mapped to:

- `input.mode = "bundledAlerts"`
- `input.anonymizedAlerts[]` (objects with `pageContent` + `metadata`)
- `output.attackDiscoveries[]` (expected results)

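Based on that mapping, a local record can be sanity-checked with `jq`. Note that the exact on-disk nesting is an assumption inferred from the bullets above, not a documented schema:

```shell
# Check one bundledAlerts-style record with jq; the field nesting here is an
# assumption inferred from the mapping above.
record='{"input":{"mode":"bundledAlerts","anonymizedAlerts":[{"pageContent":"...","metadata":{}}]},"output":{"attackDiscoveries":[]}}'
echo "$record" | jq -e '
  .input.mode == "bundledAlerts"
  and (.input.anonymizedAlerts | type) == "array"
  and (.output.attackDiscoveries | type) == "array"
' >/dev/null && echo "record shape ok"
```
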
### 2) `searchAlerts` (API-backed)

This mode triggers Attack Discovery via the existing public API:

- `POST /api/attack_discovery/_generate`
- polls `GET /api/attack_discovery/generations/{execution_uuid}` until completion

Implementation lives in `src/clients/attack_discovery_client.ts`.
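
The request/poll loop can be sketched in shell. The two endpoint paths come from the list above; everything else (the `status` response field, its values, the `fetch` helper) is an assumption for illustration:

```shell
# Sketch of the searchAlerts flow. Endpoint paths match the list above; the
# "status" field and its values are assumptions, and fetch() wraps curl so it
# can be stubbed out.
KBN_URL="${KBN_URL:-http://localhost:5601}"
fetch() { curl -s "$1"; }

poll_generation() {
  local uuid="$1" status=""
  for _ in 1 2 3 4 5 6 7 8 9 10; do
    status="$(fetch "$KBN_URL/api/attack_discovery/generations/$uuid" | jq -r '.status')"
    [ "$status" != "running" ] && break
    sleep 2
  done
  echo "$status"
}
```

The suite's actual client in `src/clients/attack_discovery_client.ts` is what runs in evals; this only mirrors its shape.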

#### Snapshot-based alert data

To provide realistic alert data for the searchAlerts path, this suite supports restoring an Elasticsearch snapshot from GCS before running:

```bash
node scripts/evals run --suite attack-discovery --grep "searchAlerts"
```

Defaults are pinned in code for repeatability:

- bucket: `security-ai-datasets`
- base path: `attack-discovery/oh-my-malware/2026-03-26`
- snapshot: `alerts-snapshot`

Override as needed:

```bash
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BUCKET=security-ai-datasets \
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BASE_PATH=attack-discovery/oh-my-malware/2026-03-26 \
ATTACK_DISCOVERY_ALERTS_SNAPSHOT_NAME=alerts-snapshot \
  node scripts/evals run --suite attack-discovery --grep "searchAlerts"
```

The restore uses the shared `GCS_CREDENTIALS` service account (automatically set via the vault config when using `node scripts/evals`). See `src/data_generators/restore_alerts_snapshot.ts`.
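
For reference, a GCS restore with stock Elasticsearch APIs amounts to registering a `gcs` repository and restoring the named snapshot. The body below uses the pinned defaults from this README; the repository name and any settings beyond `bucket`/`base_path` are invented, and the suite's restore script may differ:

```shell
# Illustrative gcs repository settings built from the pinned defaults above.
# The repo name "attack-discovery-repo" is invented for this sketch.
repo_body='{
  "type": "gcs",
  "settings": {
    "bucket": "security-ai-datasets",
    "base_path": "attack-discovery/oh-my-malware/2026-03-26"
  }
}'
# PUT  <es>/_snapshot/attack-discovery-repo                           (register)
# POST <es>/_snapshot/attack-discovery-repo/alerts-snapshot/_restore  (restore)
echo "$repo_body" | jq -r '.settings.bucket'
```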

#### Dataset registry (Dataplex)

To make this snapshot discoverable via the Snapshot Dataset Management best practices, register it in Dataplex using the checked-in aspects file:

- `x-pack/platform/packages/shared/kbn-evals/snapshots/dataplex/security_ai/attack_discovery_oh_my_malware_2026_03_26.yaml`

Example:

```bash
gcloud dataplex entries create attack-discovery-oh-my-malware-2026-03-26 \
  --location=us-central1 \
  --project=elastic-observability \
  --entry-group=snapshot-datasets \
  --entry-type=projects/elastic-observability/locations/global/entryTypes/es-snapshot \
  --fully-qualified-name="custom:es-snapshots.security-ai-datasets.attack-discovery.oh-my-malware.2026-03-26" \
  --entry-source-resource="gs://security-ai-datasets/attack-discovery/oh-my-malware/2026-03-26" \
  --entry-source-display-name="Attack Discovery: oh-my-malware (2026-03-26)" \
  --entry-source-description="Attack Discovery alert snapshot for searchAlerts eval runs" \
  --entry-source-update-time="2026-03-26T00:00:00Z" \
  --aspects=x-pack/platform/packages/shared/kbn-evals/snapshots/dataplex/security_ai/attack_discovery_oh_my_malware_2026_03_26.yaml
```

### 3) `graphState` (prompt-input stub)

This mode accepts partial "graph-state-like" inputs and applies helpful defaults. It's intended as an extension point for future parity with deeper legacy graph-state evaluation.
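
The "helpful defaults" behaviour is essentially a defaults-then-overrides merge. A purely illustrative sketch with invented field names (the real defaults live in the suite's TypeScript):

```shell
# Invented example of a defaults merge: jq's "*" operator deep-merges objects,
# with the right-hand (caller-supplied) object winning. Field names are made up.
defaults='{"maxHypotheses":5,"anonymize":true}'
partial='{"anonymize":false}'
echo "$defaults $partial" | jq -sc '.[0] * .[1]'
# -> {"maxHypotheses":5,"anonymize":false}
```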


## How to run locally

### 0) Bootstrap

From repo root:

```bash
nvm use
yarn kbn bootstrap
```

### 1) Set up local config (one-time)

```bash
node scripts/evals init config
```

### 2) Start the local eval stack (Scout)

```bash
nvm use && node scripts/evals scout
```

### 3) Run the suite (golden cluster dataset)

```bash
nvm use && node scripts/evals run --suite attack-discovery --model sonnet-3-7 --judge sonnet-3-7
```

### 4) Run with local JSONL override

```bash
nvm use && ATTACK_DISCOVERY_DATASET_JSONL_PATH=x-pack/solutions/security/packages/kbn-evals-suite-attack-discovery/data/eval_dataset_attack_discovery_all_scenarios.jsonl \
  node scripts/evals run --suite attack-discovery \
  --model sonnet-3-7 --judge sonnet-3-7
```

### 5) Fast smoke: run exactly one record

For fast local sanity checks with a local JSONL:

- `ATTACK_DISCOVERY_DATASET_LIMIT=1`: load only 1 JSONL record
- `ATTACK_DISCOVERY_DATASET_OFFSET=<n>`: skip the first n records (0-based)
- `ATTACK_DISCOVERY_EVAL_CONCURRENCY=1`: run the executor with concurrency 1

Example:

```bash
nvm use && ATTACK_DISCOVERY_DATASET_JSONL_PATH=data/eval_dataset_attack_discovery_all_scenarios.jsonl \
  ATTACK_DISCOVERY_DATASET_LIMIT=1 ATTACK_DISCOVERY_DATASET_OFFSET=4 ATTACK_DISCOVERY_EVAL_CONCURRENCY=1 \
  node scripts/evals run --suite attack-discovery \
  --model sonnet-3-7 --judge sonnet-3-7
```
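
The offset/limit semantics can be mimicked on any JSONL stream with standard tools (the loader itself is TypeScript; this just demonstrates which record `OFFSET=4`, `LIMIT=1` selects):

```shell
# Demonstrate offset/limit on a throwaway JSONL file: skip the first OFFSET
# records (0-based), then keep LIMIT records.
printf '%s\n' '{"id":0}' '{"id":1}' '{"id":2}' '{"id":3}' '{"id":4}' '{"id":5}' > /tmp/demo.jsonl
OFFSET=4 LIMIT=1
tail -n +"$((OFFSET + 1))" /tmp/demo.jsonl | head -n "$LIMIT"
# -> {"id":4}  (offset 4 skips ids 0-3; limit 1 keeps one record)
```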

## Environment variables

| Variable | Purpose | Default |
| --- | --- | --- |
| `ATTACK_DISCOVERY_DATASET_NAME` | Dataset name to resolve from golden cluster | `attack_discovery: bundled alerts (jsonl)` |
| `ATTACK_DISCOVERY_DATASET_JSONL_PATH` | When set, load dataset from this local JSONL path instead of the golden cluster | (unset) |
| `ATTACK_DISCOVERY_DATASET_LIMIT` | Max examples to load from JSONL | (all) |
| `ATTACK_DISCOVERY_DATASET_OFFSET` | Skip first N examples in JSONL | `0` |
| `ATTACK_DISCOVERY_EVAL_CONCURRENCY` | Concurrency for executor `runExperiment` | `5` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_DISABLE` | Disable snapshot restore for `searchAlerts` smoke | (unset) |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BUCKET` | GCS bucket for alert snapshot restore | `security-ai-datasets` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_BASE_PATH` | GCS base path within the bucket | `attack-discovery/oh-my-malware/2026-03-26` |
| `ATTACK_DISCOVERY_ALERTS_SNAPSHOT_NAME` | Specific snapshot name (defaults to pinned) | `alerts-snapshot` |
| `EVALUATIONS_KBN_URL` | Golden cluster Kibana URL for dataset ops | (from vault config) |
| `EVALUATIONS_KBN_API_KEY` | API key for golden cluster Kibana | (from vault config) |

## Connector notes / troubleshooting

This suite uses `/internal/inference/prompt` for prompt-based generation and the rubric judge. In some environments, `.inference` connectors are not supported by that endpoint. If you see:

- `Connector '...' of type '.inference' not recognized as a supported connector`

use a connector that works with the inference prompt endpoint (e.g. `sonnet-3-7`).

If you use Azure `.gen-ai` connectors and see provider errors like `DeploymentNotFound`, the configured deployment may not exist (or may not be reachable from your environment).


## Package notes

This suite is Node-only (Playwright eval suite). It includes a package-local ESLint override:

- `./.eslintrc.js` disables `import/no-nodejs-modules` inside this package