docs/technical/session-attachment-rag-eval.md
Last updated: 2026-05
This document records the automated test workflow for the session attachment RAG PoC.
Session attachment RAG is model-driven: large file content is not injected into the prompt. The prompt contains
<ATTACHMENT_FILE> tags and a ``, and the model must decide when to call retrieval tools.
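Because retrieval is left to the model, a tool-use check comes down to inspecting the assistant message for a retrieval call. The following is a minimal sketch, not the suite's actual implementation. It assumes the endpoint returns the standard OpenAI-compatible `tool_calls` array and that `query_session_attachment` (the identifier that appears in this document) is the retrieval tool name. `retrievalCalls`, `checkToolUse`, and the `expectRetrieval` flag are hypothetical names introduced for illustration.

```typescript
// Assistant message shape in an OpenAI-compatible chat completion response.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON-encoded string
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

// Assumption: this identifier from the document is the retrieval tool name.
const RETRIEVAL_TOOL = "query_session_attachment";

// Returns the retrieval tool calls in a message (empty if the model answered directly).
function retrievalCalls(message: AssistantMessage): ToolCall[] {
  return (message.tool_calls ?? []).filter((call) => call.function.name === RETRIEVAL_TOOL);
}

// Hypothetical per-case check: did the model's decision match what the case expects?
function checkToolUse(
  message: AssistantMessage,
  expectRetrieval: boolean,
): { pass: boolean; reason: string } {
  const called = retrievalCalls(message).length > 0;
  if (called === expectRetrieval) {
    return { pass: true, reason: "tool use matched expectation" };
  }
  return {
    pass: false,
    reason: expectRetrieval
      ? "expected a " + RETRIEVAL_TOOL + " call, got none"
      : "unexpected " + RETRIEVAL_TOOL + " call",
  };
}
```

A case about attachment content would presumably pass `expectRetrieval: true`, while an unrelated question would pass `false`; which cases fall into which group is an assumption here, not something this sketch can confirm.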
The eval suite checks:
query_session_attachment;
Default local path:
../../chatbox-session-rag-eval-fixtures
The fixture repo contains:
Regenerate fixtures:
cd ../../chatbox-session-rag-eval-fixtures
node scripts/generate-fixtures.mjs
node scripts/fetch-real-fixtures.mjs
Use this for fast tool-use behavior checks against an OpenAI-compatible endpoint:
CHATBOX_EVAL_BASE_URL="https://your-openai-compatible-endpoint/v1" \
CHATBOX_EVAL_MODEL="your-model" \
CHATBOX_EVAL_API_KEY="your-api-key" \
pnpm eval:session-rag
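A runner that reads these three variables may want to fail fast when one is missing instead of surfacing an opaque HTTP error later. The sketch below shows one way to do that. Only the variable names come from the command above; `readEvalConfig`, `EvalEndpointConfig`, and the validation rules are assumptions made for illustration.

```typescript
// Environment passed in explicitly so the function stays pure and testable.
type Env = Record<string, string | undefined>;

interface EvalEndpointConfig {
  baseUrl: string;
  model: string;
  apiKey: string;
}

// Variable names as used in the command above.
const REQUIRED = [
  "CHATBOX_EVAL_BASE_URL",
  "CHATBOX_EVAL_MODEL",
  "CHATBOX_EVAL_API_KEY",
] as const;

// Hypothetical helper: validates the environment and returns a typed config.
function readEvalConfig(env: Env): EvalEndpointConfig {
  // Treat unset and whitespace-only values as missing.
  const missing = REQUIRED.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error("Missing environment variables: " + missing.join(", "));
  }
  const baseUrl = (env["CHATBOX_EVAL_BASE_URL"] as string).trim();
  if (!/^https?:\/\//.test(baseUrl)) {
    throw new Error("CHATBOX_EVAL_BASE_URL must start with http:// or https://");
  }
  return {
    baseUrl,
    model: (env["CHATBOX_EVAL_MODEL"] as string).trim(),
    apiKey: (env["CHATBOX_EVAL_API_KEY"] as string).trim(),
  };
}
```

In a Node script this would typically be called as `readEvalConfig(process.env)`.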
Dry-run fixture loading:
pnpm eval:session-rag -- --dry-run
Use this when validating product behavior end to end. It exercises Electron, renderer config loading, license loading, local API routing, file upload, session attachment indexing, tool registration, model calls, and persisted messages.
Start the local API first. Then build the app with the local API flag because the renderer origin is compiled in:
USE_LOCAL_API=true node ./node_modules/electron-vite/bin/electron-vite.js build --mode development
pnpm eval:session-rag:chatbox -- --case long-citrine-threshold --keep-user-data
The harness copies the real config.json into a temporary userDataDir and sets SESSION_ATTACHMENT_RAG_DB_PATH to a
separate temp sqlite path. It does not mutate the real app profile.
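The isolation described above can be pictured as a small plan computed before any file is touched: where the config copy goes, and where the sqlite file lives. This is a sketch under stated assumptions, not the harness's code. Only `config.json`, `userDataDir`, and `SESSION_ATTACHMENT_RAG_DB_PATH` come from the text; the directory layout, the file names, and `planIsolatedProfile` are hypothetical.

```typescript
interface IsolatedProfile {
  userDataDir: string;
  // The real config is only ever read; the copy is what the app sees.
  configCopy: { from: string; to: string };
  env: { SESSION_ATTACHMENT_RAG_DB_PATH: string };
}

// Hypothetical helper: computes paths only, performs no file I/O.
function planIsolatedProfile(
  realConfigPath: string,
  tempRoot: string,
  runId: string,
): IsolatedProfile {
  const root = tempRoot.replace(/\/+$/, ""); // drop trailing slashes
  const runDir = root + "/chatbox-eval-" + runId; // assumed layout
  const userDataDir = runDir + "/user-data";
  return {
    userDataDir,
    configCopy: { from: realConfigPath, to: userDataDir + "/config.json" },
    // Kept outside userDataDir so the database path is separate from the profile.
    env: { SESSION_ATTACHMENT_RAG_DB_PATH: runDir + "/session-attachment-rag.sqlite" },
  };
}
```

Separating the plan from the I/O keeps the "never write to the real profile" property easy to assert: the real config path only ever appears as a copy source.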
session_attachment_embedding must be enabled by the local API/license response.
USE_LOCAL_API=true must be present during the renderer build. Passing it only when launching Electron is not enough.
The harness writes settings.defaultChatModel into the temp config when the copied config does not define one.
Attachments must reach sessionAttachmentIndexStatus: "ready"; otherwise the model receives an indexing reminder and will not query.
pnpm eval:session-rag:chatbox -- --case long-citrine-threshold
pnpm eval:session-rag:chatbox -- --case implicit-citrine-current-policy
pnpm eval:session-rag:chatbox -- --case implicit-multi-doc-release-followup
pnpm eval:session-rag:chatbox -- --case real-wiki-apollo-implicit-landing-site
pnpm eval:session-rag:chatbox -- --case multi-turn-real-wiki-apollo-followup
pnpm eval:session-rag:chatbox -- --case unrelated-simple-math
pnpm eval:session-rag:chatbox -- --case real-wiki-unrelated-capital
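All of these cases depend on the temp config having a `settings.defaultChatModel`. A fallback of the kind described earlier could look like the sketch below. The shape of the model value is not specified in this document, so it is left generic; `withDefaultChatModel` and `AppConfig` are hypothetical names, and the real harness may behave differently.

```typescript
// Loose config shape: only the field this sketch needs is typed.
interface AppConfig<M> {
  settings?: { defaultChatModel?: M; [key: string]: unknown };
  [key: string]: unknown;
}

// Returns a config that is guaranteed to carry a default chat model.
// The input object is never mutated; a new object is built only when needed.
function withDefaultChatModel<M>(config: AppConfig<M>, fallback: M): AppConfig<M> {
  if (config.settings?.defaultChatModel !== undefined) {
    return config; // the copied config already defines one, keep it
  }
  return {
    ...config,
    settings: { ...config.settings, defaultChatModel: fallback },
  };
}
```

Returning a new object instead of editing in place mirrors the harness's stated guarantee that the real app profile is left untouched.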