Save and analyze your evaluation results in various formats.
```bash
# Interactive web viewer (default)
promptfoo eval

# Save as HTML report
promptfoo eval --output results.html

# Export as JSON for further processing
promptfoo eval --output results.json

# Create CSV for spreadsheet analysis
promptfoo eval --output results.csv

# Generate XML for integration with other tools
promptfoo eval --output results.xml
```
Generate a visual, shareable report:
```bash
promptfoo eval --output report.html
```
Use when: Presenting results to stakeholders or reviewing outputs visually.
Export complete evaluation data:
```bash
promptfoo eval --output results.json
```
Structure:
```json
{
  "version": 3,
  "timestamp": "2024-01-15T10:30:00Z",
  "results": {
    "prompts": [...],
    "providers": [...],
    "outputs": [...],
    "stats": {...}
  }
}
```
Use when: Integrating with other tools or performing custom analysis.
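As a minimal sketch, you can read the export directly with Node's built-in modules; the field names below are taken from the structure shown above:

```js
import fs from 'node:fs';

// Load the JSON export and print a few top-level fields.
// Field names follow the structure shown above.
const data = JSON.parse(fs.readFileSync('results.json', 'utf8'));

console.log('Version:', data.version);
console.log('Ran at:', data.timestamp);
console.log('Outputs:', data.results.outputs.length);
```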
Create spreadsheet-compatible data:
```bash
promptfoo eval --output results.csv
```
Use when: Analyzing results in Excel, Google Sheets, or data science tools.
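To read the CSV in JavaScript rather than a spreadsheet, here is a minimal sketch assuming the third-party csv-parse package (an assumption; any CSV parser works):

```js
import fs from 'node:fs';
import { parse } from 'csv-parse/sync';

// Parse the export, using the header row as object keys.
const rows = parse(fs.readFileSync('results.csv', 'utf8'), {
  columns: true,
  skip_empty_lines: true,
});

console.log(`Loaded ${rows.length} rows`);
console.log('Columns:', Object.keys(rows[0] ?? {}));
```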
Human-readable structured data:
```bash
promptfoo eval --output results.yaml
```
Use when: Reviewing results in a text editor or version control.
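To load the YAML export programmatically, a minimal sketch assuming the js-yaml package (an assumption; any YAML parser works):

```js
import fs from 'node:fs';
import yaml from 'js-yaml';

// Parse the YAML export into a plain JavaScript object.
const data = yaml.load(fs.readFileSync('results.yaml', 'utf8'));
console.log('Top-level keys:', Object.keys(data));
```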
```bash
promptfoo eval --output results.jsonl
```

Each line contains one JSON result:

```json
{"testIdx":0,"promptIdx":0,"success":true,"score":1.0,"response":{"output":"Response 1"},"gradingResult":{"pass":true,"score":1.0,"reason":"All assertions passed","componentResults":[{"pass":true,"score":1.0,"reason":"Expected output to contain \"hello\"","assertion":{"type":"contains","value":"hello"}}]}}
{"testIdx":1,"promptIdx":0,"success":false,"score":0.0,"response":{"output":"Response 2"},"gradingResult":null}
```

Use when: Working with very large evaluations or when JSON export fails with memory errors.
For assertion-level details, inspect each row's `gradingResult.componentResults` array when present. The top-level `success`, `score`, and `gradingResult` fields describe the aggregate result for the row, while each `componentResults[]` entry contains the pass/fail, score, reason, and assertion metadata for one evaluated assertion. Both `gradingResult` and `componentResults` may be absent on error rows or rows without assertions.
To stream a JSONL file and read each row's component results:

```js
import fs from 'node:fs';
import readline from 'node:readline';

const rl = readline.createInterface({
  input: fs.createReadStream('results.jsonl', { encoding: 'utf8' }),
  crlfDelay: Infinity,
});

for await (const line of rl) {
  if (!line.trim()) {
    continue;
  }
  const row = JSON.parse(line);
  for (const component of row.gradingResult?.componentResults ?? []) {
    console.log({
      type: component.assertion?.type,
      pass: component.pass,
      score: component.score,
      reason: component.reason,
    });
  }
}
```
`?.` and `?? []` together cover the `gradingResult: null` case shown above and rows where a single top-level assertion produced no nested `componentResults`.
Structured data for enterprise integrations:
```bash
promptfoo eval --output results.xml
```
Structure:
```xml
<promptfoo>
  <evalId>abc-123-def</evalId>
  <results>
    <version>3</version>
    <timestamp>2024-01-15T10:30:00Z</timestamp>
    <prompts>...</prompts>
    <providers>...</providers>
    <outputs>...</outputs>
    <stats>...</stats>
  </results>
  <config>...</config>
  <shareableUrl>...</shareableUrl>
</promptfoo>
```
Use when: Integrating with enterprise systems or other XML-based workflows.
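To consume the XML export in code, a minimal sketch assuming the fast-xml-parser package (an assumption; any XML parser works). Element names follow the structure above:

```js
import fs from 'node:fs';
import { XMLParser } from 'fast-xml-parser';

// Parse the XML export into a plain object keyed by element name.
const parser = new XMLParser();
const doc = parser.parse(fs.readFileSync('results.xml', 'utf8'));

console.log('Eval ID:', doc.promptfoo.evalId);
console.log('Timestamp:', doc.promptfoo.results.timestamp);
```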
You can also set a default output file in `promptfooconfig.yaml`:

```yaml
# Specify default output file
outputPath: evaluations/latest_results.html

prompts:
  - '...'
tests:
  - '...'
```
Generate multiple formats simultaneously:
```bash
# Command line
promptfoo eval --output results.html --output results.json

# Or use shell commands
promptfoo eval --output results.json && \
  promptfoo eval --output results.csv
```
All formats include:
| Field | Description |
| --- | --- |
| `timestamp` | When the evaluation ran |
| `prompts` | Prompts used in the evaluation |
| `providers` | LLM providers tested |
| `tests` | Test cases with variables |
| `outputs` | Raw LLM responses |
| `results` | Pass/fail for each assertion |
| `stats` | Summary statistics |
:::warning
`json`, `yaml`, `yml`, `txt`, `html`, and `xml` outputs include the eval config. Sensitive fields are redacted using Promptfoo's sanitizer rules on a best-effort basis (not comprehensive). Non-sensitive `config.env` values may still appear in exports.
:::
Exported results can be processed programmatically. For example, to tally pass/fail counts by provider from the JSON export:
```js
const fs = require('fs');

// Load results
const results = JSON.parse(fs.readFileSync('results.json', 'utf8'));

// Analyze pass rates by provider
const providerStats = {};
results.results.outputs.forEach((output) => {
  const provider = output.provider;
  if (!providerStats[provider]) {
    providerStats[provider] = { pass: 0, fail: 0 };
  }
  if (output.pass) {
    providerStats[provider].pass++;
  } else {
    providerStats[provider].fail++;
  }
});

console.log('Pass rates by provider:', providerStats);
```
Or analyze the CSV export with pandas:

```python
import pandas as pd

# Load results
df = pd.read_csv('results.csv')

# Group by provider and calculate metrics
summary = df.groupby('provider').agg({
    'pass': 'mean',
    'latency': 'mean',
    'cost': 'sum',
})
print(summary)
```
A typical project layout for storing results:

```
project/
├── promptfooconfig.yaml
├── evaluations/
│   ├── 2024-01-15-baseline.html
│   ├── 2024-01-16-improved.html
│   └── comparison.json
```
```bash
# Include date and experiment name
promptfoo eval --output "results/$(date +%Y%m%d)-gpt4-temperature-test.html"
```
```gitignore
# .gitignore
# Exclude large output files
evaluations/*.html
evaluations/*.json

# But keep summary reports
!evaluations/summary-*.csv
```
```bash
#!/bin/bash
# run_evaluation.sh
TIMESTAMP=$(date +%Y%m%d-%H%M%S)

promptfoo eval \
  --output "reports/${TIMESTAMP}-full.json" \
  --output "reports/${TIMESTAMP}-summary.html"
```
The default web viewer (`promptfoo view`) lets you browse results interactively without exporting a file.
HTML outputs are self-contained:
```bash
# Generate report
promptfoo eval --output team-review.html

# Share via email, Slack, etc.
# No external dependencies required
```
For collaborative review:
```bash
# Share results with your team
promptfoo share
```
This creates a shareable link to your results.
For extensive evaluations:
```yaml
# Limit output size
outputPath: results.json
sharing:
  # Exclude raw outputs from file
  includeRawOutputs: false
```
Ensure proper encoding for international content:
```bash
# Explicitly set encoding
LANG=en_US.UTF-8 promptfoo eval --output results.csv
```