examples/claude-agent-sdk/README.md
The Claude Agent SDK provider (aka Claude Code provider) enables you to run agentic evals with configurable tools, permissions, and environments.
npx promptfoo@latest init --example claude-agent-sdk
cd claude-agent-sdk
Install the Claude Agent SDK:
npm install @anthropic-ai/claude-agent-sdk
Export your Anthropic API key as ANTHROPIC_API_KEY:
export ANTHROPIC_API_KEY=your_api_key_here
This example shows Claude Agent SDK in its simplest form - running in a temporary directory with no file system access or tools enabled, behaving similarly to the standard Anthropic provider.
Location: ./basic/
Usage:
(cd basic && promptfoo eval)
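
As a rough illustration of what "simplest form" means, a config with no provider options at all might look like the sketch below. This is not the file shipped in ./basic/. The provider id `anthropic:claude-agent-sdk`, the prompt, and the assertion are assumptions made for the example.

```yaml
# promptfooconfig.yaml (illustrative sketch, not the shipped example)
description: Claude Agent SDK with no tools or file access

prompts:
  # Hypothetical prompt; any plain question works since no tools are enabled
  - 'Explain what a race condition is in two sentences.'

providers:
  # Assumed provider id; no config block, so the agent runs in a temporary directory
  - anthropic:claude-agent-sdk

tests:
  - assert:
      # Case-insensitive substring check on the model output
      - type: icontains
        value: race
```

With no `config` block, nothing distinguishes this from a standard chat-style eval, which is the point of the basic example.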
This example provides Claude Agent SDK with read-only access to a sample project containing Python, TypeScript, and JavaScript files with intentional bugs for analysis. Because the working_dir is set, Claude Agent SDK has access to the following read-only tools:

- Read - Read file contents
- Grep - Search file contents
- Glob - Find files by pattern
- LS - List directory contents

Location: ./working-dir/
Usage:
(cd working-dir && promptfoo eval)
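
A minimal sketch of a config that sets `working_dir`, assuming the provider id `anthropic:claude-agent-sdk`. The directory name, prompt, and rubric are hypothetical placeholders, not the contents of the shipped example.

```yaml
# promptfooconfig.yaml (illustrative sketch)
prompts:
  - 'Review the {{language}} files in this project and list any bugs you find.'

providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      # Setting working_dir is what enables the read-only tools
      working_dir: ./sample-project   # hypothetical path

tests:
  - vars:
      language: Python
    assert:
      # Model-graded check; the rubric wording is an assumption
      - type: llm-rubric
        value: Identifies at least one concrete bug and names the file that contains it
```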
This example shows Claude Agent SDK's ability to modify files:

- Write, Edit, and MultiEdit tools are added to the default set of read-only tools by setting append_allowed_tools
- permission_mode is set to acceptEdits for automatic approval of file edits
- The working directory (./workspace) uses beforeAll, afterEach, and afterAll extension hooks defined in hooks.js to prepare and reset the workspace, including cleaning up the .git directory after all tests
- maxConcurrency: 1 prevents race conditions between tests that share the workspace

Location: ./advanced/
Usage:
(cd advanced && promptfoo eval)
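
The settings named in this example could be combined roughly as follows. This is a sketch under assumptions: the provider id, the exported hook function name `extensionHook`, and the prompt are not taken from the shipped config.

```yaml
# promptfooconfig.yaml (illustrative sketch)
prompts:
  - 'Fix the failing test in this project.'   # hypothetical prompt

providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      working_dir: ./workspace
      # Adds write-capable tools on top of the default read-only set
      append_allowed_tools: ['Write', 'Edit', 'MultiEdit']
      # Approves file edits automatically so the eval does not block
      permission_mode: acceptEdits

# Extension hooks file; the function name after the colon is an assumption
extensions:
  - file://hooks.js:extensionHook

evaluateOptions:
  # Run tests one at a time because they all mutate the same workspace
  maxConcurrency: 1
```

Serializing the tests trades speed for isolation: each test sees a workspace that the hooks have reset, instead of one that another test is editing at the same time.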
This example shows Claude Agent SDK integration with:

- @h1deya/mcp-server-weather for weather data
- MCP tools from that server (mcp__weather__get-forecast, mcp__weather__get-alerts)

Location: ./mcp/
Usage:
(cd mcp && promptfoo eval)
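
One way the weather server might be wired in is sketched below. The shape of the `mcp` block (a `servers` list with `name`, `command`, and `args`) and the use of `append_allowed_tools` for MCP tools are assumptions, so check the shipped config in ./mcp/ for the exact keys. The server name `weather` is inferred from the `mcp__weather__` prefix of the tool names.

```yaml
# promptfooconfig.yaml (illustrative sketch; mcp block shape is assumed)
prompts:
  - 'What is the weather forecast for {{city}}?'

providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      mcp:
        servers:
          # Name inferred from the mcp__weather__* tool prefix
          - name: weather
            command: npx
            args: ['-y', '@h1deya/mcp-server-weather']
      # Allow only the two weather tools named in this README
      append_allowed_tools:
        - mcp__weather__get-forecast
        - mcp__weather__get-alerts

tests:
  - vars:
      city: San Francisco   # hypothetical test input
```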
This example demonstrates Claude Agent SDK's structured output feature, which returns validated JSON that conforms to a schema.
Location: ./structured-output/
Usage:
(cd structured-output && promptfoo eval)
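
To make "conforms to a schema" concrete, here is a hedged sketch. The `output_format` key and its `type: json_schema` shape are assumptions about the provider config, and the schema fields (`summary`, `severity`) are invented for illustration. The shipped example may use different names.

```yaml
# promptfooconfig.yaml (illustrative sketch; output_format key is assumed)
prompts:
  - 'Summarize this bug report and rate its severity: {{report}}'

providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      output_format:
        type: json_schema
        schema:
          type: object
          properties:
            summary: { type: string }                              # hypothetical field
            severity: { type: string, enum: [low, medium, high] }  # hypothetical field
          required: [summary, severity]

tests:
  - vars:
      report: 'App crashes when the upload button is clicked twice.'
    assert:
      # Checks that the output parses as JSON
      - type: is-json
```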
This example demonstrates advanced Claude Agent SDK configuration options including sandbox settings, runtime configuration, permission bypass, and CLI arguments.
Location: ./advanced-options/
Usage:
(cd advanced-options && promptfoo eval)
This example demonstrates handling the AskUserQuestion tool in automated evaluations. It shows how to supply automated answers when Claude asks the user a question.
Location: ./ask-user-question/
Usage:
(cd ask-user-question && promptfoo eval)
Features demonstrated:

- ask_user_question.behavior for simple automated responses
- Enabling AskUserQuestion via append_allowed_tools

This example demonstrates testing Agent Skills with the Claude Agent SDK. Skills are reusable capabilities defined as SKILL.md files that Claude automatically invokes when relevant.

- setting_sources: ['project'] to load skills from .claude/skills/
- Enabling the Skill tool via append_allowed_tools
- Checking metadata.skillCalls with the skill-used assertion

Location: ./skills/
Usage:
(cd skills && promptfoo eval)
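
The three pieces this example relies on (setting_sources, the Skill tool, and the skill-used assertion) might fit together as in the sketch below. The provider id, working directory, prompt, and skill name `code-review` are hypothetical, and the assertion is assumed to take the skill name as its `value`.

```yaml
# promptfooconfig.yaml (illustrative sketch)
prompts:
  - 'Review the changes in {{file}}.'

providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      working_dir: ./                # hypothetical; directory that contains .claude/skills/
      # Load project-level settings so skills under .claude/skills/ are discovered
      setting_sources: ['project']
      # The Skill tool must be allowed before Claude can invoke a skill
      append_allowed_tools: ['Skill']

tests:
  - vars:
      file: src/app.ts               # hypothetical input
    assert:
      # Assumed shape: passes if metadata.skillCalls includes the named skill
      - type: skill-used
        value: code-review           # hypothetical skill name
```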
This example demonstrates loading skills from a plugin instead of from setting_sources. Plugins are self-contained directories that bundle skills, agents, hooks, and MCP servers together.

- plugins: [{type: local, path: ./sample-plugin}] to load a local plugin
- Enabling the Skill tool via append_allowed_tools
- Checking metadata.skillCalls with the skill-used assertion

Location: ./plugins/
Usage:
(cd plugins && promptfoo eval)
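
Written out in block style, the plugin setting from this example could look like the sketch below. The provider id, prompt, and skill name are assumptions. Only the `plugins` entry and the `Skill` tool come from this README.

```yaml
# promptfooconfig.yaml (illustrative sketch)
prompts:
  - 'Use the available skills to answer: {{question}}'

providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      # Load skills from a self-contained plugin directory instead of setting_sources
      plugins:
        - type: local
          path: ./sample-plugin
      append_allowed_tools: ['Skill']

tests:
  - vars:
      question: 'What does this plugin provide?'   # hypothetical input
    assert:
      - type: skill-used
        value: sample-skill                        # hypothetical skill name
```

Compared with the skills example, the only structural change is swapping `setting_sources` for `plugins`. The assertion side stays the same.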
This example demonstrates testing AI agents against cyber espionage attack patterns based on Anthropic's "Disrupting AI Espionage" blog post. It includes:

- Red team plugins: harmful:cybercrime, harmful:cybercrime:malicious-code, ssrf, pii, excessive-agency, and more
- Strategies: jailbreak:meta, jailbreak:hydra, crescendo, and goat for sophisticated attacks
- Agent tools (Read, Grep, Glob, Bash) to test security boundaries

Location: ./cyber-espionage/
Usage:
(cd cyber-espionage && promptfoo eval)
⚠️ This example is for authorized security testing only. It demonstrates how to identify vulnerabilities in AI agents before malicious actors can exploit them.
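
For orientation, the plugin and strategy ids listed for this example could be arranged in a red team config along these lines. This is a sketch, not the shipped file: the provider id, working directory, and `purpose` text are assumptions, and the shipped example covers more plugins than the five shown here.

```yaml
# promptfooconfig.yaml (illustrative sketch for authorized testing only)
providers:
  - id: anthropic:claude-agent-sdk   # assumed provider id
    config:
      working_dir: ./target-project  # hypothetical path
      # Read, Grep, and Glob are described above as read-only defaults; Bash is added here
      append_allowed_tools: ['Bash']

redteam:
  # Hypothetical description of the system under test
  purpose: 'A coding assistant with file and shell access to an internal repository'
  plugins:
    - harmful:cybercrime
    - harmful:cybercrime:malicious-code
    - ssrf
    - pii
    - excessive-agency
  strategies:
    - jailbreak:meta
    - jailbreak:hydra
    - crescendo
    - goat
```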