# MCP Server
Expose promptfoo's eval tools to AI agents via Model Context Protocol (MCP).
:::info Prerequisites
A working Node.js installation — the commands below run promptfoo via `npx`.
:::
## Quick Start

Start the server with the transport that matches your client:

```bash
# For Cursor, Claude Desktop (STDIO transport)
npx promptfoo@latest mcp --transport stdio

# For web tools (HTTP transport)
npx promptfoo@latest mcp --transport http --port 3100
```
## Configure Your AI Tool

### Cursor

Create `.cursor/mcp.json` in your project root:

```json
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}
```
:::warning Development vs Production Configuration
- **For regular usage:** always use `npx promptfoo@latest` as shown above.
- **For promptfoo contributors:** the repository's `.cursor/mcp.json` runs the server from source code for development. It requires the repo's dev dependencies and won't work elsewhere.
:::
### Claude Desktop

Add the same server entry to your Claude Desktop config file. Config file locations:

- macOS: `~/Library/Application Support/Claude/claude_desktop_config.json`
- Windows: `%APPDATA%\Claude\claude_desktop_config.json`
- Linux: `~/.config/Claude/claude_desktop_config.json`

```json
{
  "mcpServers": {
    "promptfoo": {
      "command": "npx",
      "args": ["promptfoo@latest", "mcp", "--transport", "stdio"],
      "description": "Promptfoo MCP server for LLM evaluation and testing"
    }
  }
}
```
Restart your AI tool after adding the configuration. The promptfoo tools should then be available. Try asking:

> "List my recent evaluations using the promptfoo tools"
## Available Tools

- `list_evaluations` - Browse your evaluation runs with optional dataset filtering
- `get_evaluation_details` - Get comprehensive results, metrics, and test cases for a specific evaluation
- `run_evaluation` - Execute evaluations with custom parameters, test case filtering, and concurrency control
- `share_evaluation` - Generate publicly shareable URLs for evaluation results
- `generate_dataset` - Generate test datasets using AI for comprehensive evaluation coverage
- `generate_test_cases` - Generate test cases with assertions for existing prompts
- `compare_providers` - Compare multiple AI providers side-by-side for performance and quality
- `redteam_run` - Execute comprehensive security testing against AI applications with dynamic attack probes
- `redteam_generate` - Generate adversarial test cases for redteam security testing with configurable plugins and strategies
- `validate_promptfoo_config` - Validate configuration files using the same logic as the CLI
- `test_provider` - Test AI provider connectivity, credentials, and response quality
- `run_assertion` - Test individual assertion rules against outputs for debugging
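To confirm which tools your server version exposes, you can also connect a client programmatically. Here is a minimal sketch using the official MCP TypeScript SDK (`@modelcontextprotocol/sdk` is an assumed dependency; promptfoo itself is the server being spawned):

```ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the promptfoo MCP server over STDIO, exactly as the editor configs above do.
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['promptfoo@latest', 'mcp', '--transport', 'stdio'],
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// List the tools the server advertises and print their names.
const { tools } = await client.listTools();
console.log(tools.map((tool) => tool.name));
```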
"Help me run an evaluation. First, validate my config, then list recent evaluations, and finally run a new evaluation with just the first 5 test cases."
The AI will use these tools in sequence:
validate_promptfoo_config - Check your configurationlist_evaluations - Show recent runsrun_evaluation - Execute with test case filtering"Compare the performance of GPT-4, Claude 3, and Gemini Pro on my customer support prompt."
The AI will:
test_provider - Verify each provider workscompare_providers - Run side-by-side comparison"Run a security audit on my chatbot prompt to check for jailbreak vulnerabilities."
The AI will:
redteam_generate - Create adversarial test casesredteam_run - Execute security testsget_evaluation_details - Analyze vulnerabilities found"Generate 20 diverse test cases for my email classification prompt, including edge cases."
The AI will:
generate_dataset - Create test data with AIgenerate_test_cases - Add appropriate assertionsrun_evaluation - Test the generated casesChoose the appropriate transport based on your use case:
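The same sequences can be driven from your own code. A sketch of the first workflow, reusing the SDK client from the earlier snippet (`datasetId` appears in the HTTP example below, but the other argument names here are hypothetical — check `listTools()` output for the real schemas):

```ts
// 1. Validate the configuration (no arguments needed).
await client.callTool({ name: 'validate_promptfoo_config', arguments: {} });

// 2. List recent evaluations, optionally filtered by dataset.
const evals = await client.callTool({
  name: 'list_evaluations',
  arguments: { datasetId: 'my-dataset' },
});
console.log(evals);

// 3. Run an evaluation; `testCaseIndices` is a hypothetical parameter name
//    standing in for the tool's real test case filtering option.
await client.callTool({
  name: 'run_evaluation',
  arguments: { testCaseIndices: [0, 1, 2, 3, 4] },
});
```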
## Choosing a Transport

Choose the appropriate transport based on your use case:

- **STDIO** (`--transport stdio`): For desktop AI tools (Cursor, Claude Desktop) that communicate via stdin/stdout
- **HTTP** (`--transport http`): For web applications, APIs, and remote integrations that need HTTP endpoints
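The SDK client from the earlier sketch works over HTTP too; only the transport changes. A sketch, assuming the server from the quick start is listening on port 3100 at the `/mcp` path used in the integration example below:

```ts
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// Connect to an already-running HTTP-mode server instead of spawning one over STDIO.
const httpTransport = new StreamableHTTPClientTransport(
  new URL('http://localhost:3100/mcp'),
);
await client.connect(httpTransport);
```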
## Tips

- Begin with simple tools like `list_evaluations` and `validate_promptfoo_config` before moving to more complex operations.
- When working with large datasets, fetch results for specific runs with `get_evaluation_details` rather than pulling everything at once.
- When using redteam tools, review the generated adversarial test cases before running them against production systems.
## Troubleshooting

**Server won't start:**

```bash
# Verify promptfoo installation
npx promptfoo@latest --version

# Check if you have a valid promptfoo project
npx promptfoo@latest validate

# Test the MCP server manually
npx promptfoo@latest mcp --transport stdio
```
**Port conflicts (HTTP mode):**

```bash
# Use a different port
npx promptfoo@latest mcp --transport http --port 8080

# Check what's using port 3100
lsof -i :3100                  # macOS/Linux
netstat -ano | findstr :3100   # Windows
```
**AI tool can't connect:**

For HTTP mode, check that the server is responding:

```bash
curl http://localhost:3100/health
```

**Tools not appearing:**

Double-check the config file path, then restart your AI tool so it picks up the configuration.
"Eval not found":
list_evaluations first to see available evaluation IDs"Config error":
validate_promptfoo_config to check your configurationpromptfooconfig.yaml exists and is valid"Provider error":
test_provider to diagnose connectivity and authentication issuesFor HTTP transport, you can integrate with any system that supports HTTP:
```js
// Example: Call MCP server from Node.js
const response = await fetch('http://localhost:3100/mcp', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    // MCP messages are JSON-RPC 2.0, so include the envelope fields.
    jsonrpc: '2.0',
    id: 1,
    method: 'tools/call',
    params: {
      name: 'list_evaluations',
      arguments: { datasetId: 'my-dataset' },
    },
  }),
});
```
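The response body is a JSON-RPC 2.0 envelope. A sketch of unpacking it, continuing the snippet above (this assumes the server replies with plain JSON; MCP tool results carry a `content` array, typically of text items):

```js
const body = await response.json();

// JSON-RPC reports failures in an `error` field rather than an HTTP error status.
if (body.error) {
  throw new Error(`MCP error ${body.error.code}: ${body.error.message}`);
}

// Successful tool calls return MCP content items; print the text ones.
for (const item of body.result.content) {
  if (item.type === 'text') console.log(item.text);
}
```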
## Environment Variables

The MCP server respects all promptfoo environment variables:
```bash
# Set provider API keys
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...

# Configure promptfoo behavior
export PROMPTFOO_CONFIG_DIR=/path/to/configs
export PROMPTFOO_OUTPUT_DIR=/path/to/outputs

# Start server with environment
npx promptfoo@latest mcp --transport stdio
```
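When an MCP client spawns the server itself (as in the STDIO sketches above), the environment can also be passed per-connection. A sketch using the SDK's `StdioClientTransport`, whose server parameters accept an `env` map (an assumption worth verifying against your SDK version):

```ts
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server with an explicit environment instead of relying on the shell.
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['promptfoo@latest', 'mcp', '--transport', 'stdio'],
  env: {
    OPENAI_API_KEY: process.env.OPENAI_API_KEY ?? '',
    PROMPTFOO_CONFIG_DIR: '/path/to/configs',
  },
});
```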