scripts/session-rag-eval/SKILL.md
Use this skill when validating file question-answering behavior for large chat attachments.
Verify both sides of model behavior:
The model calls query_session_attachment when the answer depends on an uploaded large file.
The model does not call it when the question is unrelated to the file.
Prefer the Chatbox conversation-flow harness for product validation because it exercises the real renderer, config, license, local API, file upload, indexing, tool registration, and persisted messages.
Default fixture repo:
../../chatbox-session-rag-eval-fixtures
Fixture types:
Regenerate fixtures in the fixture repo:
node scripts/generate-fixtures.mjs
node scripts/fetch-real-fixtures.mjs
Start the local API before running the harness. Then build with USE_LOCAL_API=true; the renderer API origin is compiled
into the bundle.
USE_LOCAL_API=true node ./node_modules/electron-vite/bin/electron-vite.js build --mode development
pnpm eval:session-rag:chatbox -- --case long-citrine-threshold --keep-user-data
The harness copies the real config.json into an isolated temporary userDataDir, injects a temporary default chat model
if missing, and stores the session RAG sqlite DB at a separate temporary path.
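The isolation step described above can be pictured with a small shell sketch. This is only an illustration of the idea, not the harness's actual code: make_isolated_user_data is a hypothetical helper, and it covers only the config.json copy. The temporary default chat model injection and the separate sqlite DB path are left out.

```shell
# Hypothetical helper (not part of the repo): copy a real config.json into a
# fresh temporary directory so a run never touches the real user data.
make_isolated_user_data() {
  src_config="$1"                                   # path to the real config.json
  tmp_dir="$(mktemp -d)" || return 1                # isolated temporary userDataDir
  cp "$src_config" "$tmp_dir/config.json" || return 1
  printf '%s\n' "$tmp_dir"                          # print the directory for the caller
}
```

A caller would capture the printed directory, for example dir="$(make_isolated_user_data /path/to/config.json)", and remove it after the run unless the data should be kept for inspection, which is what --keep-user-data appears to be for.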
session_attachment_embedding capability is false or unavailable for the active local API/license path.
USE_LOCAL_API=true.
sessionAttachmentIndexStatus: "ready" before context building.
pnpm eval:session-rag:chatbox -- --case long-citrine-threshold
pnpm eval:session-rag:chatbox -- --case implicit-citrine-current-policy
pnpm eval:session-rag:chatbox -- --case real-wiki-apollo-implicit-landing-site
pnpm eval:session-rag:chatbox -- --case multi-turn-real-wiki-apollo-followup
pnpm eval:session-rag:chatbox -- --case unrelated-simple-math
pnpm eval:session-rag:chatbox -- --case real-wiki-unrelated-capital
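To run the cases listed above in one pass, a loop along these lines may help. It is a sketch under assumptions: run_cases and the RUNNER override are hypothetical names, not part of the repo, and it assumes the harness exits non-zero when a case fails, which this document does not confirm.

```shell
# Case names copied from the commands listed above.
SESSION_RAG_CASES="long-citrine-threshold
implicit-citrine-current-policy
real-wiki-apollo-implicit-landing-site
multi-turn-real-wiki-apollo-followup
unrelated-simple-math
real-wiki-unrelated-capital"

# Hypothetical helper: run every case and count failures.
# RUNNER defaults to pnpm; set RUNNER=echo to preview the commands instead.
run_cases() {
  runner="${RUNNER:-pnpm}"
  failed=0
  for c in $SESSION_RAG_CASES; do
    # Assumption: a non-zero exit status means the case failed.
    if ! "$runner" eval:session-rag:chatbox -- --case "$c"; then
      echo "FAILED: $c" >&2
      failed=$((failed + 1))
    fi
  done
  echo "failed cases: $failed"
  [ "$failed" -eq 0 ]
}
```

Running RUNNER=echo run_cases prints the six commands without starting Electron, which is a cheap way to check the case list before a full run.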
For fast model-behavior iteration without Electron:
pnpm eval:session-rag -- --dry-run
pnpm eval:session-rag -- --case implicit-citrine-current-policy