examples/compare-agentic-sdks/README.md
Compare OpenAI Codex SDK, Claude Agent SDK, and OpenCode SDK on a security audit task.
npx promptfoo@latest init --example compare-agentic-sdks
npx promptfoo eval
npx promptfoo view
Four providers analyze an intentionally vulnerable Python codebase:
| Provider | How It Works | Output |
|---|---|---|
| Codex SDK | Reads files implicitly, uses output_schema | Structured JSON |
| Claude Agent SDK | Uses Read/Grep/Glob tools explicitly | Natural language |
| OpenCode SDK | Uses read/grep/glob tools, provider-agnostic | Natural language |
| Plain LLM | No file access (baseline) | Explains how to audit |
The vulnerable code lives in the test-codebase directory.
user_service.py:
payment_processor.py:
Codex SDK returns structured JSON that matches the output schema. It is fast and predictable, which suits automation, but it works only with OpenAI models.
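Because the Codex SDK output is structured JSON, a downstream script can parse it directly. The following is a minimal sketch, not this example's actual schema: it assumes a hypothetical report shape with a `vulnerabilities` list whose items carry `file`, `severity`, and `issue` fields, and the sample report is placeholder data. The real field names depend on the `output_schema` defined in the example's config.

```python
import json
from collections import Counter

# Placeholder report for illustration only. The field names below are
# assumptions, not the schema this example actually uses.
SAMPLE_OUTPUT = """
{
  "vulnerabilities": [
    {"file": "user_service.py", "severity": "high", "issue": "placeholder finding A"},
    {"file": "payment_processor.py", "severity": "medium", "issue": "placeholder finding B"},
    {"file": "user_service.py", "severity": "high", "issue": "placeholder finding C"}
  ]
}
"""

# Fields every finding is assumed to carry (hypothetical).
REQUIRED_FIELDS = {"file", "severity", "issue"}


def summarize_findings(raw: str) -> Counter:
    """Parse a JSON report and count findings per severity level."""
    report = json.loads(raw)
    findings = report.get("vulnerabilities", [])
    for finding in findings:
        missing = REQUIRED_FIELDS - finding.keys()
        if missing:
            raise ValueError(f"finding is missing fields: {sorted(missing)}")
    return Counter(finding["severity"] for finding in findings)


if __name__ == "__main__":
    print(dict(summarize_findings(SAMPLE_OUTPUT)))
```

A check like this can gate a CI job, for instance by failing the build when the count for a chosen severity is above zero.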
Claude Agent SDK explores the codebase with file system tools and returns natural language. It is more flexible and shows its reasoning, but it works only with Anthropic models.
OpenCode SDK uses file system tools much like Claude Agent SDK, but it supports 75+ LLM providers, including Anthropic, OpenAI, Google, and Ollama (local).
Plain LLM can't read files, so it explains how to do a security audit instead of doing one.
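One rough way to tell a real audit from a generic how-to is to check whether the output cites the files it claims to have inspected. The sketch below is a heuristic under stated assumptions, not part of this example: the function name `looks_like_real_audit` is hypothetical, and the `pass`/`score`/`reason` keys in the returned dict are chosen for illustration. Only the two file names come from the test codebase described above.

```python
# File names from the test-codebase directory described above.
AUDITED_FILES = ("user_service.py", "payment_processor.py")


def looks_like_real_audit(output: str, min_files: int = 1) -> dict:
    """Heuristic: a real audit tends to cite the files it inspected.

    Returns a dict with illustrative keys (pass, score, reason).
    """
    lowered = output.lower()
    # Collect every audited file name that appears in the output text.
    cited = [name for name in AUDITED_FILES if name.lower() in lowered]
    passed = len(cited) >= min_files
    return {
        "pass": passed,
        "score": len(cited) / len(AUDITED_FILES),
        "reason": f"cited files: {cited}" if cited else "no audited files cited",
    }


if __name__ == "__main__":
    agent_style = "Reviewed user_service.py and found an issue in the login path."
    baseline_style = "To audit code, review input handling and run static analysis."
    print(looks_like_real_audit(agent_style))
    print(looks_like_real_audit(baseline_style))
```

The heuristic is easy to fool: if the prompt itself names the files, a model without file access may simply echo them, so this kind of check works best alongside a rubric or manual review.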