
Output Formats

Save and analyze your evaluation results in various formats.

Quick Start

bash
# Interactive web viewer (default)
promptfoo eval

# Save as HTML report
promptfoo eval --output results.html

# Export as JSON for further processing
promptfoo eval --output results.json

# Create CSV for spreadsheet analysis
promptfoo eval --output results.csv

# Generate XML for integration with other tools
promptfoo eval --output results.xml

Available Formats

HTML Report

Generate a visual, shareable report:

bash
promptfoo eval --output report.html

Features:

  • Interactive table with sorting and filtering
  • Side-by-side output comparison
  • Pass/fail statistics
  • Shareable standalone file

Use when: Presenting results to stakeholders or reviewing outputs visually.

JSON Output

Export complete evaluation data:

bash
promptfoo eval --output results.json

Structure:

json
{
  "version": 3,
  "timestamp": "2024-01-15T10:30:00Z",
  "results": {
    "prompts": [...],
    "providers": [...],
    "outputs": [...],
    "stats": {...}
  }
}

Use when: Integrating with other tools or performing custom analysis.
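
For quick post-processing, here's a minimal Python sketch that loads the export and prints the summary stats. The field names are taken from the structure above; verify them against an export from your promptfoo version.

python
import json

# Load the JSON export. The "version", "timestamp", and "results.stats"
# fields follow the structure shown above; treat them as assumptions and
# check a real export from your promptfoo version.
with open('results.json', encoding='utf-8') as f:
    data = json.load(f)

print('Schema version:', data.get('version'))
print('Timestamp:', data.get('timestamp'))

stats = data.get('results', {}).get('stats', {})
for name, value in stats.items():
    print(f'{name}: {value}')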

CSV Export

Create spreadsheet-compatible data:

bash
promptfoo eval --output results.csv

Columns include:

  • Test variables
  • Prompt used
  • Model outputs
  • Pass/fail status
  • Latency
  • Token usage

Use when: Analyzing results in Excel, Google Sheets, or data science tools.
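
Before deeper analysis, a quick standard-library sketch can confirm what the export actually contains; it assumes nothing about column names and simply reports the header and row count.

python
import csv

# Read the CSV export and report its shape. Exact column names can vary by
# promptfoo version, so this sketch only lists whatever header the file has.
with open('results.csv', newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))

print(f'Loaded {len(rows)} result rows')
if rows:
    print('Columns:', ', '.join(rows[0].keys()))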

YAML Format

Human-readable structured data:

bash
promptfoo eval --output results.yaml

Use when: Reviewing results in a text editor or version control.
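
If you need to post-process the YAML export, a small sketch using PyYAML works; it assumes the top-level keys mirror the JSON structure shown earlier.

python
import yaml  # PyYAML (pip install pyyaml)

# Load the YAML export and list its top-level keys. Assumes the layout
# mirrors the JSON export shown earlier; verify against your own file.
with open('results.yaml', encoding='utf-8') as f:
    data = yaml.safe_load(f)

print('Top-level keys:', sorted(data.keys()))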

JSONL Format

Each line contains one JSON result:

bash
promptfoo eval --output results.jsonl

Use when: Working with very large evaluations or when JSON export fails with memory errors.

jsonl
{"testIdx":0,"promptIdx":0,"success":true,"score":1.0,"response":{"output":"Response 1"},"gradingResult":{"pass":true,"score":1.0,"reason":"All assertions passed","componentResults":[{"pass":true,"score":1.0,"reason":"Expected output to contain \"hello\"","assertion":{"type":"contains","value":"hello"}}]}}
{"testIdx":1,"promptIdx":0,"success":false,"score":0.0,"response":{"output":"Response 2"},"gradingResult":null}

For assertion-level details, inspect each row's gradingResult?.componentResults array when present. The top-level success, score, and gradingResult fields describe the aggregate result for the row, while each componentResults[] entry contains the pass/fail, score, reason, and assertion metadata for one evaluated assertion. Both gradingResult and componentResults may be absent on error rows or rows without assertions.

To stream a JSONL file and read each row's component results:

ts
import fs from 'node:fs';
import readline from 'node:readline';

const rl = readline.createInterface({
  input: fs.createReadStream('results.jsonl', { encoding: 'utf8' }),
  crlfDelay: Infinity,
});

for await (const line of rl) {
  if (!line.trim()) {
    continue;
  }
  const row = JSON.parse(line);
  for (const component of row.gradingResult?.componentResults ?? []) {
    console.log({
      type: component.assertion?.type,
      pass: component.pass,
      score: component.score,
      reason: component.reason,
    });
  }
}

?. and ?? [] together cover the gradingResult: null case shown above and rows where a single top-level assertion produced no nested componentResults.

XML Format

Structured data for enterprise integrations:

bash
promptfoo eval --output results.xml

Structure:

xml
<promptfoo>
  <evalId>abc-123-def</evalId>
  <results>
    <version>3</version>
    <timestamp>2024-01-15T10:30:00Z</timestamp>
    <prompts>...</prompts>
    <providers>...</providers>
    <outputs>...</outputs>
    <stats>...</stats>
  </results>
  <config>...</config>
  <shareableUrl>...</shareableUrl>
</promptfoo>

Use when: Integrating with enterprise systems, XML-based workflows, or when XML is a requirement.
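
A short parsing sketch with Python's standard library, using the element names shown above (confirm them against a real export before wiring this into a pipeline):

python
import xml.etree.ElementTree as ET

# Element paths (evalId, results/version, results/timestamp) follow the
# structure shown above; confirm them against an actual results.xml.
root = ET.parse('results.xml').getroot()  # <promptfoo>

print('Eval ID:', root.findtext('evalId'))
print('Version:', root.findtext('results/version'))
print('Timestamp:', root.findtext('results/timestamp'))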

Configuration Options

Setting Output Path in Config

yaml
# Specify default output file
outputPath: evaluations/latest_results.html

prompts:
  - '...'
tests:
  - '...'

Multiple Output Formats

Generate multiple formats simultaneously:

bash
# Command line
promptfoo eval --output results.html --output results.json

# Or chain separate runs (each command re-runs the evaluation)
promptfoo eval --output results.json && \
promptfoo eval --output results.csv

Output Contents

Standard Fields

All formats include:

  • timestamp: When the evaluation ran
  • prompts: Prompts used in the evaluation
  • providers: LLM providers tested
  • tests: Test cases with variables
  • outputs: Raw LLM responses
  • results: Pass/fail for each assertion
  • stats: Summary statistics

:::warning

json, yaml, yml, txt, html, and xml outputs include the eval config. Sensitive fields are redacted using Promptfoo's sanitizer rules on a best-effort basis (not comprehensive). Non-sensitive config.env values may still appear in exports.

:::

Detailed Metrics

When available, outputs include the following (a small aggregation sketch follows this list):

  • Latency: Response time in milliseconds
  • Token Usage: Input/output token counts
  • Cost: Estimated API costs
  • Error Details: Failure reasons and stack traces
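
As a rough sketch, these metrics can be aggregated from the JSON export. The latencyMs and cost field names below are assumptions for illustration; inspect one row of your own results.json and rename as needed.

python
import json

# Aggregate per-result metrics from the JSON export. The "latencyMs" and
# "cost" field names are assumptions for illustration; inspect one row of
# your own export and adjust as needed.
with open('results.json', encoding='utf-8') as f:
    outputs = json.load(f).get('results', {}).get('outputs', [])

latencies = [r['latencyMs'] for r in outputs if isinstance(r.get('latencyMs'), (int, float))]
costs = [r['cost'] for r in outputs if isinstance(r.get('cost'), (int, float))]

print(f'Rows: {len(outputs)}')
if latencies:
    print(f'Average latency: {sum(latencies) / len(latencies):.1f} ms')
if costs:
    print(f'Total estimated cost: ${sum(costs):.4f}')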

Analyzing Results

JSON Processing Example

javascript
const fs = require('fs');

// Load results
const results = JSON.parse(fs.readFileSync('results.json', 'utf8'));

// Analyze pass rates by provider
const providerStats = {};
results.results.outputs.forEach((output) => {
  const provider = output.provider;
  if (!providerStats[provider]) {
    providerStats[provider] = { pass: 0, fail: 0 };
  }

  if (output.pass) {
    providerStats[provider].pass++;
  } else {
    providerStats[provider].fail++;
  }
});

console.log('Pass rates by provider:', providerStats);

CSV Analysis with Pandas

python
import pandas as pd

# Load results
df = pd.read_csv('results.csv')

# Group by provider and calculate metrics
summary = df.groupby('provider').agg({
    'pass': 'mean',
    'latency': 'mean',
    'cost': 'sum'
})

print(summary)

Best Practices

1. Organize Output Files

text
project/
├── promptfooconfig.yaml
├── evaluations/
│   ├── 2024-01-15-baseline.html
│   ├── 2024-01-16-improved.html
│   └── comparison.json

2. Use Descriptive Filenames

bash
# Include date and experiment name
promptfoo eval --output "results/$(date +%Y%m%d)-gpt4-temperature-test.html"

3. Version Control Considerations

gitignore
# .gitignore
# Exclude large output files
evaluations/*.html
evaluations/*.json

# But keep summary reports
!evaluations/summary-*.csv

4. Automate Report Generation

bash
#!/bin/bash
# run_evaluation.sh

TIMESTAMP=$(date +%Y%m%d-%H%M%S)
promptfoo eval \
  --output "reports/${TIMESTAMP}-full.json" \
  --output "reports/${TIMESTAMP}-summary.html"

Sharing Results

Web Viewer

The default web viewer (promptfoo view) provides:

  • Real-time updates during evaluation
  • Interactive exploration
  • Local-only (no data sent externally)

Sharing HTML Reports

HTML outputs are self-contained:

bash
# Generate report
promptfoo eval --output team-review.html

# Share via email, Slack, etc.
# No external dependencies required

Promptfoo Share

For collaborative review:

bash
# Share results with your team
promptfoo share

Creates a shareable link with:

  • Read-only access
  • Commenting capabilities
  • No setup required for viewers

Troubleshooting

Large Output Files

For extensive evaluations:

yaml
# Limit output size
outputPath: results.json
sharing:
  # Exclude raw outputs from file
  includeRawOutputs: false

Encoding Issues

Ensure proper encoding for international content:

bash
# Explicitly set encoding
LANG=en_US.UTF-8 promptfoo eval --output results.csv

Performance Tips

  1. Use JSONL for large datasets - avoids memory issues
  2. Use JSON for standard datasets - complete data structure
  3. Generate HTML for presentations - best visual format
  4. Use CSV for data analysis - Excel/Sheets compatible