Behavioral evals run against the compiled binary, so after making changes you must build and bundle the project before running them:

```bash
npm run build && npm run bundle
```
Evals require a standard API key. If your `.env` file contains multiple keys or comments, extract the key explicitly and run the eval file in one step:

```bash
export GEMINI_API_KEY=$(grep '^GEMINI_API_KEY=' .env | cut -d '=' -f2) && RUN_EVALS=1 npx vitest run --config evals/vitest.config.ts <file_name>
```
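To see why this pipeline tolerates comments and extra keys, here is a TypeScript sketch of what `grep '^GEMINI_API_KEY=' .env | cut -d '=' -f2` does line by line. The helper name `extractKey` is hypothetical and is not part of the project; it only mirrors the shell behavior.

```typescript
// Hypothetical helper mirroring: grep '^GEMINI_API_KEY=' .env | cut -d '=' -f2
// Returns one entry per matching line, just as grep prints every match.
function extractKey(envText: string, name: string = "GEMINI_API_KEY"): string[] {
  const prefix = `${name}=`;
  return envText
    .split("\n")
    // grep '^NAME=': keep only lines that start with the exact prefix,
    // so comment lines ("# NAME=...") and other keys are dropped.
    .filter((line) => line.startsWith(prefix))
    // cut -d '=' -f2: keep only the text between the first and second '='.
    .map((line) => line.split("=")[1] ?? "");
}

// Example .env content with a commented-out key and an unrelated key.
const sampleEnv = "# GEMINI_API_KEY=old\nOTHER_KEY=zzz\nGEMINI_API_KEY=abc123\n";
console.log(extractKey(sampleEnv)); // [ 'abc123' ]
```

Two consequences follow from this behavior: if `GEMINI_API_KEY` appears on more than one uncommented line, `$(...)` captures every match joined by newlines, and a value that itself contains `=` is cut off at the second `=` (`cut -d '=' -f2-` would keep the remainder).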
| Command | Scope | Description |
|---|---|---|
| `npm run test:always_passing_evals` | `ALWAYS_PASSES` | Fast feedback, runs in CI. |
| `npm run test:all_evals` | All | Runs nightly incubation tests. Sets `RUN_EVALS=1`. |
Note: `RUN_EVALS=1` is required for incubated (`USUALLY_PASSES`) tests. For example, to run a single eval file:

```bash
RUN_EVALS=1 npx vitest run --config evals/vitest.config.ts my_feature.eval.ts
```
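The split between the two scripts can be modeled as a simple gate on each eval's policy. This is a hypothetical sketch, not the project's `evalTest` implementation; `shouldRun` is an illustrative name, and it assumes, as the note implies, that `ALWAYS_PASSES` evals run without the flag and that only the exact value `1` enables incubated ones.

```typescript
// The two policies named in this document.
type EvalPolicy = "ALWAYS_PASSES" | "USUALLY_PASSES";

// Hypothetical gate: decides whether an eval with the given policy runs
// under the given environment variables.
function shouldRun(policy: EvalPolicy, env: Record<string, string | undefined>): boolean {
  // Stable evals run on every invocation, including CI.
  if (policy === "ALWAYS_PASSES") return true;
  // Incubated evals are skipped unless RUN_EVALS=1 (assumed exact match).
  return env["RUN_EVALS"] === "1";
}

console.log(shouldRun("USUALLY_PASSES", {}));                  // false
console.log(shouldRun("USUALLY_PASSES", { RUN_EVALS: "1" }));  // true
```

Under this model, `test:always_passing_evals` only ever sees the first branch, while `test:all_evals` sets the flag so both policies run.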
If a test fails, check:

- The test's log at `evals/logs/<test_name>.log`.
- Verbose debug output, by setting `GEMINI_DEBUG_LOG_FILE`:

  ```bash
  export GEMINI_DEBUG_LOG_FILE="debug.log"
  ```
To maintain CI stability, all new evals follow a strict incubation period.
**Incubation (`USUALLY_PASSES`):** New tests must be created with the `USUALLY_PASSES` policy:

```typescript
evalTest('USUALLY_PASSES', { ... })
```

They run in the `Evals: Nightly` workflow and do not block PR merges.
If a nightly eval regresses, investigate it with the agent:

```bash
gemini /fix-behavioral-eval [optional-run-uri]
```
**Promotion (`ALWAYS_PASSES`):** Once a test has passed with 100% consistency over multiple nightly cycles, promote it:

```bash
gemini /promote-behavioral-eval
```

Do not promote tests manually. The command verifies the trajectory logs before updating the test's policy in the file.
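The promotion criterion can be sketched as a check over an eval's recent nightly outcomes. This is a hypothetical illustration, not the logic of `/promote-behavioral-eval`: the name `isPromotable`, the boolean-per-cycle history, and the default window of three cycles are all assumptions, since the document only says "multiple nightly cycles".

```typescript
// Hypothetical check for the "100% consistency over multiple nightly cycles" rule.
// nightlyPasses holds one pass/fail outcome per nightly cycle, oldest first.
function isPromotable(nightlyPasses: boolean[], minCycles: number = 3): boolean {
  // Look only at the most recent `minCycles` cycles (assumed window size).
  const recent = nightlyPasses.slice(-minCycles);
  // Too little history: multiple cycles have not been observed yet.
  if (recent.length < minCycles) return false;
  // 100% consistency: every cycle in the window must have passed.
  return recent.every((passed) => passed);
}

console.log(isPromotable([true, true]));              // false (too few cycles)
console.log(isPromotable([false, true, true, true])); // true (last three passed)
console.log(isPromotable([true, true, false]));       // false (latest cycle failed)
```

Using a trailing window lets an eval that was fixed after an early failure become eligible. The real command also verifies trajectory logs, which this sketch does not model.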