compare-agentic-sdks (Agentic SDK Comparison)

Compare the OpenAI Codex SDK, Claude Agent SDK, and OpenCode SDK on a security audit task, with a plain LLM as a baseline.

Quick Start

npx promptfoo@latest init --example compare-agentic-sdks
npx promptfoo eval
npx promptfoo view

What This Compares

Four providers analyze an intentionally vulnerable Python codebase:

| Provider | How It Works | Output |
| --- | --- | --- |
| Codex SDK | Reads files implicitly, uses `output_schema` | Structured JSON |
| Claude Agent SDK | Uses Read/Grep/Glob tools explicitly | Natural language |
| OpenCode SDK | Uses read/grep/glob tools, provider-agnostic | Natural language |
| Plain LLM | No file access (baseline) | Explains how to audit |

Vulnerabilities Planted

The vulnerable code lives in the test-codebase directory.

user_service.py:

MD5 password hashing
Timing attack in authentication
Predictable session tokens
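
To make these three issues concrete, here is a minimal sketch of what such flaws typically look like, next to safer standard-library alternatives. All function names are hypothetical and are not taken from the example's user_service.py, and the PBKDF2 iteration count is only an illustrative value.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Tuple


# --- Vulnerable patterns (illustrative only) ---

def weak_hash(password: str) -> str:
    # MD5 is fast and unsalted here, so the hashes are cheap to brute-force.
    return hashlib.md5(password.encode()).hexdigest()


def weak_check(stored: str, supplied: str) -> bool:
    # `==` may stop at the first differing character, which leaks timing.
    return stored == supplied


def weak_token(user_id: int) -> str:
    # Derived from guessable inputs (user id and clock), so it is predictable.
    return hashlib.md5(f"{user_id}{int(time.time())}".encode()).hexdigest()


# --- Safer alternatives ---

def strong_hash(
    password: str, salt: Optional[bytes] = None, iterations: int = 200_000
) -> Tuple[bytes, bytes]:
    # Salted, deliberately slow key derivation instead of a bare digest.
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def safe_check(stored: bytes, supplied: bytes) -> bool:
    # Constant-time comparison of the two digests.
    return hmac.compare_digest(stored, supplied)


def strong_token() -> str:
    # Token drawn from the OS cryptographic random source.
    return secrets.token_urlsafe(32)
```

An agent with file access should be able to point at patterns like the first three functions; the last three show the kind of fix an audit report might recommend.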

payment_processor.py:

Float for currency (precision loss)
PCI-DSS violations (storing CVV)
Sensitive data in logs
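
The payment issues can be sketched the same way. The snippet below is an assumption about what such code might look like, not a copy of payment_processor.py; `StoredCard`, `charge`, and the other names are hypothetical.

```python
import logging
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP
from typing import List

logger = logging.getLogger("payments")


def float_total(prices: List[float]) -> float:
    # Vulnerable: binary floats cannot represent most decimal fractions exactly.
    return sum(prices)


def decimal_total(prices: List[str]) -> Decimal:
    # Safer: parse amounts as Decimal and round to cents explicitly.
    total = sum((Decimal(p) for p in prices), Decimal("0"))
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def mask_card(number: str) -> str:
    # Keep only the last four digits for logs and receipts.
    return "*" * (len(number) - 4) + number[-4:]


@dataclass
class StoredCard:
    # Hypothetical persisted record: no full card number and no CVV field.
    last4: str
    expiry: str


def charge(number: str, cvv: str, expiry: str, amount: Decimal) -> StoredCard:
    # Log the masked number only; never log the full number or the CVV.
    logger.info("Charging %s to card %s", amount, mask_card(number))
    # The CVV would be passed to the payment gateway here (call omitted)
    # and then discarded rather than stored.
    return StoredCard(last4=number[-4:], expiry=expiry)
```

For example, `float_total([0.1, 0.2])` does not equal `0.3`, while `decimal_total(["0.10", "0.20"])` equals `Decimal("0.30")`.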

Key Differences

Codex SDK returns structured JSON that matches the supplied `output_schema`. It is fast and predictable, which suits automation, but it works only with OpenAI models.
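
As a rough illustration of why structured output suits automation, the sketch below parses a JSON report and checks each finding for required keys. The `findings` field and the key names are hypothetical; the example's actual `output_schema` may differ.

```python
import json
from typing import Dict, List

# Hypothetical shape of one finding; not taken from the example's schema.
REQUIRED_FINDING_KEYS = {"file", "issue", "severity"}


def parse_findings(raw: str) -> List[Dict[str, str]]:
    """Parse a JSON audit report and reject findings with missing keys."""
    data = json.loads(raw)
    findings = data["findings"]
    for finding in findings:
        missing = REQUIRED_FINDING_KEYS - finding.keys()
        if missing:
            raise ValueError(f"finding is missing keys: {sorted(missing)}")
    return findings


def count_by_severity(findings: List[Dict[str, str]]) -> Dict[str, int]:
    """Tally findings per severity level, e.g. for a CI gate."""
    counts: Dict[str, int] = {}
    for finding in findings:
        counts[finding["severity"]] = counts.get(finding["severity"], 0) + 1
    return counts
```

A natural-language report would need a grading prompt or manual review to extract the same counts.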

Claude Agent SDK explores the codebase with file system tools and returns natural language. It is more flexible and shows its reasoning, but it works only with Anthropic models.

OpenCode SDK uses file system tools much like the Claude Agent SDK, but it supports 75+ LLM providers, including Anthropic, OpenAI, Google, and Ollama (local).

Plain LLM can't read files, so it explains how to do a security audit instead of doing one.
