x-pack/platform/packages/shared/kbn-evals-extensions/README.md
Advanced evaluation capabilities for @kbn/evals - standalone extensions package.
This package extends @kbn/evals with advanced features ported from cursor-plugin-evals and serves as the home for Phases 3-5 of the evals roadmap.
Critical principle: the dependency runs in one direction only. This package may import from @kbn/evals, but @kbn/evals must stay completely independent of this package.
┌─────────────────────────────────────────────────────┐
│  Evaluation Suites                                  │
│  (agent-builder, obs-ai-assistant, security)        │
└──────────────────┬──────────────────────────────────┘
                   │
        ┌──────────┴──────────┐
        │                     │
        ▼                     ▼
┌──────────────────┐   ┌─────────────────────────────┐
│  @kbn/evals      │   │  @kbn/evals-extensions      │
│  (core)          │   │  (advanced features)        │
│                  │   │                             │
│  ✅ Evaluators   │   │  ✅ Safety evaluators       │
│  ✅ Scout/PW     │   │  ✅ Cost tracking           │
│  ✅ ES export    │   │  ✅ Dataset management      │
│  ✅ Stats        │   │  ✅ UI components           │
│  ✅ CLI basics   │   │  ✅ Watch mode              │
│                  │   │  ✅ A/B testing             │
│  ❌ NO imports   │   │  ✅ Human-in-the-loop       │
│    from ext ─────┼───┼──X                          │
│                  │   │                             │
└──────────────────┘   └──────────┬──────────────────┘
                                  │
                                  │ depends on
                                  ▼
                         ┌──────────────────┐
                         │  @kbn/evals      │
                         │  (types, utils)  │
                         └──────────────────┘
Dependency Rules:
- kbn-evals-extensions CAN import from kbn-evals
- kbn-evals MUST NOT import from kbn-evals-extensions
- Evaluation suites import extensions explicitly:
// Example: agent-builder evaluation suite
// `createCorrectnessEvaluators` is assumed to be exported by core, as the comment below indicates.
import { evaluate, createCorrectnessEvaluators } from '@kbn/evals';
import {
  createToxicityEvaluator,
  createPiiDetector,
  createBiasEvaluator,
  costTracker,
  watchMode,
} from '@kbn/evals-extensions';

// `dataset` and `task` are defined by the suite and omitted here for brevity.
evaluate('security test', async ({ executorClient }) => {
  // Mix core and extension evaluators
  await executorClient.runExperiment(
    { dataset, task },
    [
      ...createCorrectnessEvaluators(), // core @kbn/evals
      createToxicityEvaluator(), // extension
      createPiiDetector(), // extension
    ]
  );

  // Use extension features
  await costTracker.logRunCost(executorClient.getRunId());
});
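The dependency rules above lend themselves to a mechanical check. The following is a minimal sketch, not part of either package: `findImportSpecifiers` and `violatesDependencyRule` are hypothetical helpers that scan a source string from @kbn/evals and flag any import of @kbn/evals-extensions. The regular expression is a rough approximation of import syntax, not a full parser.

```typescript
// Hypothetical helpers for illustration; neither exists in @kbn/evals or @kbn/evals-extensions.
const FORBIDDEN_IN_CORE = '@kbn/evals-extensions';

/** Collects module specifiers from static imports, dynamic imports, and require calls. */
function findImportSpecifiers(source: string): string[] {
  // Created per call so the global regex always starts scanning at index 0.
  const pattern = /(?:from\s+|import\s*\(\s*|require\s*\(\s*|import\s+)['"]([^'"]+)['"]/g;
  const specifiers: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    specifiers.push(match[1]);
  }
  return specifiers;
}

/** Returns true when a core source file imports the extensions package or one of its subpaths. */
function violatesDependencyRule(source: string): boolean {
  return findImportSpecifiers(source).some(
    (spec) => spec === FORBIDDEN_IN_CORE || spec.indexOf(FORBIDDEN_IN_CORE + '/') === 0
  );
}

// Example inputs: the first respects the rule, the second breaks it.
const allowedSource = "import { evaluate } from '@kbn/evals';";
const forbiddenSource = "import { costTracker } from '@kbn/evals-extensions';";
const allowedResult = violatesDependencyRule(allowedSource);
const forbiddenResult = violatesDependencyRule(forbiddenSource);
```

In practice a lint rule such as ESLint's `no-restricted-imports`, scoped to the core package, could serve the same purpose without custom code.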
Extensions use environment variables for opt-in behavior:
# Enable watch mode
KBN_EVALS_EXT_WATCH_MODE=true node scripts/evals run --suite <id>
# Enable parallel execution
KBN_EVALS_EXT_PARALLEL=true node scripts/evals run --suite <id>
# Enable result caching
KBN_EVALS_EXT_CACHE=true node scripts/evals run --suite <id>
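One way the opt-in flags above might be read inside the package is sketched below. This is an assumption for illustration only: the `ExtensionFlags` shape and the `readExtensionFlags` helper are hypothetical, and the sketch assumes that only the literal value `true` enables a feature, which matches the commands shown but is not confirmed by the source.

```typescript
// Hypothetical shape; the real package may organise its flags differently.
interface ExtensionFlags {
  watchMode: boolean;
  parallel: boolean;
  cache: boolean;
}

type Env = Record<string, string | undefined>;

/** Assumption: only the literal string "true" opts in, so unset variables leave a feature off. */
function isEnabled(env: Env, name: string): boolean {
  return env[name] === 'true';
}

/** Maps the three documented environment variables onto a typed flags object. */
function readExtensionFlags(env: Env): ExtensionFlags {
  return {
    watchMode: isEnabled(env, 'KBN_EVALS_EXT_WATCH_MODE'),
    parallel: isEnabled(env, 'KBN_EVALS_EXT_PARALLEL'),
    cache: isEnabled(env, 'KBN_EVALS_EXT_CACHE'),
  };
}

// Example: only watch mode is switched on.
const exampleFlags = readExtensionFlags({ KBN_EVALS_EXT_WATCH_MODE: 'true' });
```

In a Node.js process the helper would typically be called as `readExtensionFlags(process.env)`. Passing the environment in as an argument keeps the function easy to test.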
All features follow the principles from "Future of @kbn/evals".
# Run unit tests
yarn test:jest --testPathPattern=kbn-evals-extensions
# Type check
yarn test:type_check --project x-pack/platform/packages/shared/kbn-evals-extensions/tsconfig.json
# Lint and auto-fix
node scripts/eslint --fix x-pack/platform/packages/shared/kbn-evals-extensions
See individual feature directories for contribution guidelines. All PRs should:
@kbn/evals core