examples/redteam-mcp/README.md
This example demonstrates red teaming an AI assistant that uses the Model Context Protocol (MCP) for tool use. It focuses on attack vectors specific to MCP implementations, such as function call exploits, system prompt leakage, unauthorized tool discovery, and other MCP-specific vulnerabilities.
You can run this example with:
npx promptfoo@latest init --example redteam-mcp
cd redteam-mcp
This example requires the following environment variable:
ANTHROPIC_API_KEY - Your Anthropic API key
You can set this in a .env file or directly in your environment:
export ANTHROPIC_API_KEY=your_anthropic_key_here
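Before starting a run, it can help to confirm that the key is actually visible from the example directory. The following is a minimal sketch, assuming a bash-compatible shell; `check_anthropic_key` is a hypothetical helper, not part of promptfoo, and it only inspects the current shell and a `.env` file in the current directory.

```shell
# Hypothetical pre-flight helper (not part of promptfoo).
# Succeeds if ANTHROPIC_API_KEY is set in this shell, or if a .env file in the
# current directory defines it with a non-empty value. Fails otherwise.
check_anthropic_key() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    return 0
  fi
  # '.' after '=' requires at least one character, so an empty value fails.
  grep -q '^ANTHROPIC_API_KEY=.' .env 2>/dev/null
}
```

For example, running `check_anthropic_key && npx promptfoo redteam run` from inside redteam-mcp only starts the red team run when a key is found.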
Initialize the example:
npx promptfoo@latest init --example redteam-mcp
Navigate to the example directory:
cd redteam-mcp
Run the red team evaluation:
npx promptfoo redteam run
This example evaluates an AI customer support agent that implements MCP against the MCP-specific attack vectors described above.
This example is configured to test the Anthropic Claude 4 Sonnet model with MCP enabled. The MCP server is specified as:
https://customer-service-mcp-server-example.promptfoo.app/mcp
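To see what the target server exposes before red teaming it, you can send it an MCP `initialize` request by hand. This is a sketch under stated assumptions, not part of the example: it assumes the endpoint speaks the MCP Streamable HTTP transport (JSON-RPC over HTTP POST), and the protocol version and `clientInfo` values are arbitrary choices; per the MCP lifecycle, a server that does not support the requested version replies with one it does support. The function names are hypothetical.

```shell
# Assumption: the server implements the MCP Streamable HTTP transport.
MCP_URL="https://customer-service-mcp-server-example.promptfoo.app/mcp"

# Hypothetical helper: print a JSON-RPC 2.0 "initialize" request body.
# clientInfo name/version are arbitrary labels identifying this probe.
mcp_initialize_payload() {
  printf '%s' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"readme-probe","version":"0.1.0"}}}'
}

# Hypothetical helper: POST the request. Streamable HTTP servers may reply
# with plain JSON or an SSE stream, so the client accepts both content types.
probe_mcp_server() {
  curl -sS -X POST "${1:-$MCP_URL}" \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json, text/event-stream' \
    --data "$(mcp_initialize_payload)"
}
```

If the server behaves as assumed, `probe_mcp_server` returns an initialize result describing the server's capabilities. Listing tools would additionally require the session handshake (the `notifications/initialized` message and any session ID the server issues), which this sketch leaves out.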
The red team evaluation combines multiple testing strategies, which are defined in promptfooconfig.yaml.
After running the evaluation, you'll see a report showing which attack vectors were successful and which were blocked by the system's defenses.
You can modify the promptfooconfig.yaml file to change the example's settings, for instance the target model (currently anthropic:claude-sonnet-4-6).
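Editing the file by hand is usually simplest, but if you want to script a model swap, here is a minimal sketch. `set_target_model` is a hypothetical helper, not a promptfoo command; it assumes the current ID appears verbatim in the config and that the new ID contains no `|` or `&` characters (both are special in this sed expression).

```shell
# Hypothetical helper (not part of promptfoo): replace the current target
# model ID in a promptfoo config file with a new one.
set_target_model() {
  local new_model="$1"
  local config="${2:-promptfooconfig.yaml}"
  local tmp
  tmp="$(mktemp)" || return 1
  # Write an edited copy, then move it over the original. This avoids sed -i,
  # whose syntax differs between GNU and BSD sed.
  sed "s|anthropic:claude-sonnet-4-6|${new_model}|g" "$config" > "$tmp" \
    && mv "$tmp" "$config"
}
```

The substitution rewrites every occurrence of the ID in the file, so review the result if the same model is referenced anywhere besides the target.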