# Looper
This guide shows you how to integrate Promptfoo evaluations into a Looper CI/CD workflow so that every pull‑request (and optional nightly job) automatically runs your prompt tests.
## Prerequisites

- A `promptfooconfig.yaml` and your prompt fixtures (`prompts/**/*.json`) committed to the repository.

## `.looper.yml`

Add the following file to the root of your repo:
```yaml
language: workflow # optional but common

tools:
  nodejs: 22 # Looper provisions Node.js
  jq: 1.7

envs:
  global:
    variables:
      PROMPTFOO_CACHE_PATH: '${HOME}/.promptfoo/cache'

triggers:
  - pr # run on every pull request
  - manual: 'Nightly Prompt Tests' # manual button in UI
    call: nightly # invokes the nightly flow below

flows:
  # ---------- default PR flow ----------
  default:
    - (name Install Promptfoo) npm install -g promptfoo
    - (name Evaluate Prompts) |
        promptfoo eval \
          -c promptfooconfig.yaml \
          --prompts "prompts/**/*.json" \
          --share \
          -o output.json
    - (name Quality gate) |
        SUCC=$(jq -r '.results.stats.successes' output.json)
        FAIL=$(jq -r '.results.stats.failures' output.json)
        echo "✅ $SUCC ❌ $FAIL"
        test "$FAIL" -eq 0 # non-zero exit fails the build

  # ---------- nightly scheduled flow ----------
  nightly:
    - call: default # reuse the logic above
    - (name Upload artefacts) |
        aws s3 cp output.json s3://your-bucket/promptfoo/output.json
```
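You can exercise the quality-gate logic locally before wiring it into CI. This sketch assumes `jq` is installed and fabricates a minimal `output.json` with the two fields the gate reads:

```shell
# Fake a minimal Promptfoo output file with the fields the gate reads
cat > output.json <<'EOF'
{"results": {"stats": {"successes": 12, "failures": 0}}}
EOF

# Same queries as the Quality gate step above
SUCC=$(jq -r '.results.stats.successes' output.json)
FAIL=$(jq -r '.results.stats.failures' output.json)
echo "successes=$SUCC failures=$FAIL"
test "$FAIL" -eq 0 && echo "gate passed"
```

A real `output.json` contains much more (per-test results, prompts, timings), but the gate only depends on `results.stats`.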
| Section | Purpose |
|---|---|
| `tools` | Declares tool versions Looper should provision. |
| `envs.global.variables` | Environment variables available to every step. |
| `triggers` | Determines when the workflow runs (`pr`, `manual`, cron, etc.). |
| `flows` | Ordered shell commands; execution stops on the first non-zero exit. |
## Caching

Looper lacks a first-class cache API. Two common approaches:

- Persist `${HOME}/.promptfoo/cache` (the `PROMPTFOO_CACHE_PATH` set above) on a reusable volume.
- Pull the cache directory at the start of the flow and push it back at the end with a files task.

## Stricter quality gates

Instead of requiring zero failures, you can gate on a minimum pass rate:

```yaml
    - (name Pass-rate gate) |
        TOTAL=$(jq '.results.stats.successes + .results.stats.failures' output.json)
        PASS=$(jq '.results.stats.successes' output.json)
        RATE=$(echo "scale=2; 100*$PASS/$TOTAL" | bc)
        echo "Pass rate: $RATE%"
        test $(echo "$RATE >= 95" | bc) -eq 1 # fail if < 95%
```
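If you only need whole-percent precision, the same threshold check works with integer shell arithmetic and no `bc` dependency. A standalone sketch with hardcoded counts (in the flow, `SUCC`/`FAIL` come from the `jq` queries):

```shell
SUCC=19
FAIL=1
TOTAL=$((SUCC + FAIL))
RATE=$((100 * SUCC / TOTAL))  # integer percent
echo "Pass rate: ${RATE}%"
[ "$RATE" -ge 95 ]            # non-zero exit fails the build below 95%
```

Integer division truncates (e.g. 94.9% reports as 94%), which makes the gate slightly stricter than the `bc` version.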
## Comparing environments

Evaluate both staging and production configs and compare failures:

```yaml
flows:
  compare-envs:
    - (name Eval-prod) |
        promptfoo eval \
          -c promptfooconfig.prod.yaml \
          --prompts "prompts/**/*.json" \
          -o output-prod.json
    - (name Eval-staging) |
        promptfoo eval \
          -c promptfooconfig.staging.yaml \
          --prompts "prompts/**/*.json" \
          -o output-staging.json
    - (name Compare) |
        PROD_FAIL=$(jq '.results.stats.failures' output-prod.json)
        STAGE_FAIL=$(jq '.results.stats.failures' output-staging.json)
        if [ "$STAGE_FAIL" -gt "$PROD_FAIL" ]; then
          echo "⚠️ Staging has more failures than production!"
        fi
```
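The compare step above only warns. If you want a staging regression to fail the build, exit non-zero instead. A standalone sketch with hardcoded counts (in the flow, they come from the `jq` queries):

```shell
PROD_FAIL=0
STAGE_FAIL=2
if [ "$STAGE_FAIL" -gt "$PROD_FAIL" ]; then
  echo "Staging regression: $STAGE_FAIL failures vs $PROD_FAIL in production"
  exit_code=1
else
  exit_code=0
fi
echo "exit_code=$exit_code"
# In a flow step, `exit "$exit_code"` here would stop the build on regression.
```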
## Posting results

To send evaluation results elsewhere (for example, as a pull-request comment), add a step such as:

```yaml
    - github --add-comment \
        --repository "$CI_REPOSITORY" \
        --issue "$PR_NUMBER" \
        --body "$(cat comment.md)" # set comment as appropriate
```
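One way to produce `comment.md` is a small formatting step before the comment is posted. Counts are hardcoded in this sketch; in the flow they come from `jq` queries against `output.json`, as in the quality gate:

```shell
SUCC=12
FAIL=0
# Write a short markdown summary for the PR comment
printf '### Promptfoo results\n\n- passed: %s\n- failed: %s\n' "$SUCC" "$FAIL" > comment.md
cat comment.md
```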
## Troubleshooting

| Problem | Remedy |
|---|---|
| `npm: command not found` | Add `nodejs:` under `tools`, or use an image with Node pre-installed. |
| Cache not restored | Verify the cache path and that the files pull task succeeds. |
| Long-running jobs | Split prompt sets into separate flows or raise `timeoutMillis` in the build definition. |
| API rate limits | Enable the Promptfoo cache and/or rotate API keys. |
## Tips

Pipe `git diff --name-only prompts/` into `promptfoo eval` to test only the prompts that changed.
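The changed-prompts tip can be sketched as follows. In CI, `CHANGED` would come from `git diff --name-only` against the base branch; it is simulated here, and the `promptfoo` invocation is left commented out:

```shell
# Simulated output of: git diff --name-only origin/main -- prompts/
CHANGED="prompts/summarize.json
prompts/classify.json"

if [ -n "$CHANGED" ]; then
  # shellcheck disable=SC2086  # intentional word splitting of the file list
  set -- $CHANGED
  echo "evaluating $# changed prompt file(s): $*"
  # promptfoo eval -c promptfooconfig.yaml --prompts "$@" -o output.json
else
  echo "no prompt changes; skipping eval"
fi
```

Skipping the eval entirely when no prompts changed keeps PR builds fast and avoids burning API quota on unrelated changes.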