site/docs/integrations/ci-cd.md
Integrate promptfoo into your CI/CD pipelines to automatically evaluate prompts, test for security vulnerabilities, and ensure quality before deployment. This guide covers modern CI/CD workflows for both quality testing and security scanning.
If you're using GitHub Actions, check out our dedicated GitHub Actions guide or the GitHub Marketplace action.
For other platforms, here's a basic example:
```bash
# Run eval (no global install required)
npx promptfoo@latest eval -c promptfooconfig.yaml -o results.json

# Run security scan (red teaming)
npx promptfoo@latest redteam run
```
Promptfoo supports two main CI/CD workflows:

**Eval** - Test prompt quality and performance:

```bash
npx promptfoo@latest eval -c promptfooconfig.yaml
```

**Red Teaming** - Security vulnerability scanning:

```bash
npx promptfoo@latest redteam run
```
See our red team quickstart for security testing details.
Promptfoo supports multiple output formats for different CI/CD needs:
```bash
# JSON for programmatic processing
npx promptfoo@latest eval -o results.json

# HTML for human-readable reports
npx promptfoo@latest eval -o report.html

# XML for enterprise tools
npx promptfoo@latest eval -o results.xml

# Multiple formats
npx promptfoo@latest eval -o results.json -o report.html
```
Learn more about output formats and processing.
:::info Enterprise Feature
SonarQube integration is available in Promptfoo Enterprise. Use the standard JSON output format and process it for SonarQube import.
:::
Fail the build when quality thresholds aren't met:
```bash
# Fail on any test failures
npx promptfoo@latest eval --fail-on-error

# Custom threshold checking
npx promptfoo@latest eval -o results.json
PASS_RATE=$(jq '.results.stats.successes / (.results.stats.successes + .results.stats.failures) * 100' results.json)
if (( $(echo "$PASS_RATE < 95" | bc -l) )); then
  echo "Quality gate failed: Pass rate ${PASS_RATE}% < 95%"
  exit 1
fi
```
See assertions and metrics for comprehensive validation options.
A complete workflow with caching, a quality gate, and artifact upload:

```yaml
name: LLM Eval
on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'promptfooconfig.yaml'

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'

      - name: Cache promptfoo
        uses: actions/cache@v4
        with:
          path: ~/.cache/promptfoo
          key: ${{ runner.os }}-promptfoo-${{ hashFiles('prompts/**') }}
          restore-keys: |
            ${{ runner.os }}-promptfoo-

      - name: Run eval
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          PROMPTFOO_CACHE_PATH: ~/.cache/promptfoo
        run: |
          npx promptfoo@latest eval \
            -c promptfooconfig.yaml \
            --share \
            -o results.json \
            -o report.html

      - name: Check quality gate
        run: |
          FAILURES=$(jq '.results.stats.failures' results.json)
          if [ "$FAILURES" -gt 0 ]; then
            echo "❌ Eval failed with $FAILURES failures"
            exit 1
          fi
          echo "✅ All tests passed!"

      - name: Upload results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: eval-results
          path: |
            results.json
            report.html
```
For red teaming in CI/CD:
```yaml
name: Security Scan
on:
  schedule:
    - cron: '0 0 * * *' # Daily
  workflow_dispatch:

jobs:
  red-team:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run red team scan
        uses: promptfoo/promptfoo-action@v1
        with:
          type: 'redteam'
          config: 'promptfooconfig.yaml'
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
```
See also: Standalone GitHub Action example.
See our detailed GitLab CI guide.
Example `.gitlab-ci.yml`:

```yaml
image: node:20

evaluate:
  script:
    - |
      npx promptfoo@latest eval \
        -c promptfooconfig.yaml \
        --share \
        -o output.json \
        -o output.xml \
        -o report.html
  variables:
    OPENAI_API_KEY: ${OPENAI_API_KEY}
    PROMPTFOO_CACHE_PATH: .cache/promptfoo
  cache:
    key: ${CI_COMMIT_REF_SLUG}-promptfoo
    paths:
      - .cache/promptfoo
  artifacts:
    reports:
      junit: output.xml
    paths:
      - output.json
      - report.html
```
See our detailed Jenkins guide.
Example `Jenkinsfile`:

```groovy
pipeline {
    agent any

    environment {
        OPENAI_API_KEY = credentials('openai-api-key')
        PROMPTFOO_CACHE_PATH = "${WORKSPACE}/.cache/promptfoo"
    }

    stages {
        stage('Evaluate') {
            steps {
                sh '''
                    npx promptfoo@latest eval \
                      -c promptfooconfig.yaml \
                      --share \
                      -o results.json
                '''
            }
        }

        stage('Quality Gate') {
            steps {
                script {
                    def results = readJSON file: 'results.json'
                    def failures = results.results.stats.failures
                    if (failures > 0) {
                        error("Eval failed with ${failures} failures")
                    }
                }
            }
        }
    }
}
```
Create a custom Docker image with promptfoo pre-installed:
```dockerfile
FROM node:20-slim
WORKDIR /app
RUN npm install -g promptfoo
COPY . .
CMD ["promptfoo", "eval"]
```
Test multiple models or configurations in parallel:
```yaml
# GitHub Actions example
strategy:
  matrix:
    model: [gpt-4, gpt-3.5-turbo, claude-3-opus]
steps:
  - name: Test ${{ matrix.model }}
    run: |
      npx promptfoo@latest eval \
        --providers.0.config.model=${{ matrix.model }} \
        -o results-${{ matrix.model }}.json
```
Run comprehensive security scans on a schedule:
```yaml
# GitHub Actions
on:
  schedule:
    - cron: '0 2 * * *' # 2 AM daily

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Full red team scan
        run: |
          npx promptfoo@latest redteam generate \
            --plugins harmful,pii,contracts \
            --strategies jailbreak,prompt-injection
          npx promptfoo@latest redteam run
```
:::info Enterprise Feature
Direct SonarQube output format is available in Promptfoo Enterprise. For open-source users, export to JSON and transform the results.
:::
For enterprise environments, integrate with SonarQube:
```yaml
# Export results for SonarQube processing
- name: Run promptfoo security scan
  run: |
    npx promptfoo@latest eval \
      --config promptfooconfig.yaml \
      -o results.json

# Transform results for SonarQube (custom script required)
- name: Transform for SonarQube
  run: |
    node transform-to-sonarqube.js results.json > sonar-report.json

- name: SonarQube scan
  env:
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
  run: |
    sonar-scanner \
      -Dsonar.externalIssuesReportPaths=sonar-report.json
```
See our SonarQube integration guide for detailed setup.
The output JSON follows this schema:
```typescript
interface OutputFile {
  evalId?: string;
  results: {
    stats: {
      successes: number;
      failures: number;
      errors: number;
    };
    outputs: Array<{
      pass: boolean;
      score: number;
      error?: string;
      // ... other fields
    }>;
  };
  config: UnifiedConfig;
  shareableUrl: string | null;
}
```
Example processing script:
```javascript
const fs = require('fs');

const results = JSON.parse(fs.readFileSync('results.json', 'utf8'));

// Calculate metrics
const passRate =
  (results.results.stats.successes /
    (results.results.stats.successes + results.results.stats.failures)) *
  100;

console.log(`Pass rate: ${passRate.toFixed(2)}%`);
console.log(`Shareable URL: ${results.shareableUrl}`);

// Check for specific failures
const criticalFailures = results.results.outputs.filter(
  (o) => o.error?.includes('security') || o.error?.includes('injection'),
);

if (criticalFailures.length > 0) {
  console.error('Critical security failures detected!');
  process.exit(1);
}
```
Post eval results to PR comments, Slack, or other channels:
```bash
# Extract and post results
SHARE_URL=$(jq -r '.shareableUrl' results.json)
PASS_RATE=$(jq '.results.stats.successes / (.results.stats.successes + .results.stats.failures) * 100' results.json)

# Post to GitHub PR
gh pr comment --body "
## Promptfoo Eval Results
- Pass rate: ${PASS_RATE}%
- [View detailed results](${SHARE_URL})
"
```
Optimize CI/CD performance with proper caching:
```yaml
# Set cache location
env:
  PROMPTFOO_CACHE_PATH: ~/.cache/promptfoo
  PROMPTFOO_CACHE_TTL: 86400 # 24 hours

# Cache configuration
cache:
  key: promptfoo-${{ hashFiles('prompts/**', 'promptfooconfig.yaml') }}
  paths:
    - ~/.cache/promptfoo
```
Key security considerations:

- **API Key Management** - Store provider keys as CI secrets (as in the examples above), never in the repository.
- **Network Security** - Limit outbound network access on CI runners to the provider endpoints your evals actually call.
- **Data Privacy** - Strip model responses and test variables before sharing results:

  ```bash
  export PROMPTFOO_STRIP_RESPONSE_OUTPUT=true
  export PROMPTFOO_STRIP_TEST_VARS=true
  ```

- **Audit Logging** - Archive eval artifacts (`results.json`, HTML reports) so runs can be audited later.
| Issue | Solution |
| --- | --- |
| Rate limits | Enable caching; reduce concurrency with `-j 1` |
| Timeouts | Increase timeout values; lower `--max-concurrency` |
| Memory issues | Use streaming mode; process results in batches |
| Cache misses | Check that the cache key includes all relevant files |
Enable detailed logging:
```bash
LOG_LEVEL=debug npx promptfoo@latest eval -c config.yaml
```