
compare-agentic-sdks (Agentic SDK Comparison)

examples/compare-agentic-sdks/README.md

Compare OpenAI Codex SDK, Claude Agent SDK, and OpenCode SDK on a security audit task, with a plain LLM (no file access) as a baseline.

Quick Start

```bash
npx promptfoo@latest init --example compare-agentic-sdks
cd compare-agentic-sdks
npx promptfoo eval
npx promptfoo view
```

What This Compares

Four providers analyze an intentionally vulnerable Python codebase:

| Provider | How It Works | Output |
|---|---|---|
| Codex SDK | Reads files implicitly, uses `output_schema` | Structured JSON |
| Claude Agent SDK | Uses Read/Grep/Glob tools explicitly | Natural language |
| OpenCode SDK | Uses read/grep/glob tools, provider-agnostic | Natural language |
| Plain LLM | No file access (baseline) | Explains how to audit |

Vulnerabilities Planted

The vulnerable code lives in the test-codebase directory.

user_service.py:

  • MD5 password hashing
  • Timing attack in authentication
  • Predictable session tokens
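
To make these findings concrete, the sketch below pairs each weak pattern with a standard-library alternative. It is a hypothetical illustration, not the contents of the planted `user_service.py`; the function names and the PBKDF2 iteration count are assumptions.

```python
import hashlib
import hmac
import os
import random
import secrets
from typing import Optional, Tuple

# --- Hypothetical weak patterns (illustrative, not the planted file) ---

def weak_hash(password: str) -> str:
    # Unsalted MD5 is fast to compute, so leaked hashes are cheap to brute-force.
    return hashlib.md5(password.encode()).hexdigest()

def weak_check(stored: str, candidate: str) -> bool:
    # '==' may stop at the first differing character, so response time
    # can leak how much of the stored value matched.
    return stored == weak_hash(candidate)

def weak_token() -> str:
    # random is a Mersenne Twister PRNG, which is not meant for secrets.
    return str(random.getrandbits(64))

# --- Standard-library alternatives ---

ITERATIONS = 600_000  # assumed work factor; tune for your hardware

def strong_hash(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    # Salted, deliberately slow key derivation (PBKDF2-HMAC-SHA256).
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def strong_check(salt: bytes, stored: bytes, candidate: str) -> bool:
    # compare_digest avoids content-based short-circuiting,
    # so timing does not reveal where the inputs differ.
    _, digest = strong_hash(candidate, salt)
    return hmac.compare_digest(stored, digest)

def strong_token() -> str:
    # secrets draws from the operating system's CSPRNG.
    return secrets.token_urlsafe(32)
```

PBKDF2 is used here only because it ships with every CPython build; bcrypt, scrypt, or Argon2 are common alternatives for password storage.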

payment_processor.py:

  • Float for currency (precision loss)
  • PCI-DSS violations (storing CVV)
  • Sensitive data in logs
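
A similarly hypothetical sketch of the `payment_processor.py` issues follows. Binary floats cannot represent most decimal amounts exactly, while `decimal.Decimal` can, and card data should be masked before it reaches a log line. The helper names and the masking format are assumptions, not taken from the test codebase.

```python
import logging
from decimal import Decimal, ROUND_HALF_UP

log = logging.getLogger("payments")

# Float arithmetic accumulates binary rounding error.
float_total = 0.1 + 0.2  # 0.30000000000000004, not 0.3

# Decimal values built from strings keep exact cents.
decimal_total = Decimal("0.10") + Decimal("0.20")  # Decimal('0.30')

def to_cents(amount: Decimal) -> Decimal:
    # Quantize to two decimal places with conventional half-up rounding.
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def mask_pan(card_number: str) -> str:
    # Keep only the last four digits for display or logging.
    digits = "".join(ch for ch in card_number if ch.isdigit())
    return "*" * (len(digits) - 4) + digits[-4:]

def record_charge(card_number: str, amount: Decimal) -> dict:
    # No CVV parameter: PCI DSS forbids storing card verification codes
    # after authorization, so the stored record never includes one.
    masked = mask_pan(card_number)
    cents = to_cents(amount)
    log.info("charge %s to card %s", cents, masked)  # masked number only
    return {"card": masked, "amount": cents}
```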

Key Differences

Codex SDK returns structured JSON that matches the supplied `output_schema`. It is fast, predictable, and well suited to automation, but works only with OpenAI models.
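
Because the Codex SDK output follows a schema, downstream checks can be ordinary code. This sketch assumes a hypothetical schema with a top-level `vulnerabilities` array whose items carry `file`, `issue`, and `severity` strings; the example's actual `output_schema` may differ. It parses the JSON and reports which files were flagged.

```python
import json

# Hypothetical schema: {"vulnerabilities": [{"file", "issue", "severity"}, ...]}
REQUIRED_KEYS = {"file", "issue", "severity"}
PLANTED_FILES = {"user_service.py", "payment_processor.py"}

def flagged_files(raw: str) -> set:
    """Return the file names flagged in a structured audit result.

    Raises ValueError (json.JSONDecodeError is a subclass) when the
    payload does not match the assumed schema.
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("vulnerabilities"), list):
        raise ValueError("expected an object with a 'vulnerabilities' array")
    files = set()
    for item in data["vulnerabilities"]:
        if not isinstance(item, dict):
            raise ValueError("each finding must be an object")
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"finding missing keys: {sorted(missing)}")
        # Keep only the base name so "test-codebase/user_service.py" matches too.
        files.add(str(item["file"]).rsplit("/", 1)[-1])
    return files
```

For example, a result that flags only `user_service.py` leaves `{'payment_processor.py'}` in `PLANTED_FILES - flagged_files(result)`.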

Claude Agent SDK explores the codebase with file system tools and returns natural language. It is more flexible and shows its reasoning, but works only with Anthropic models.

OpenCode SDK uses file system tools similar to Claude Agent SDK, but supports 75+ LLM providers including Anthropic, OpenAI, Google, Ollama (local), and more.
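
Claude Agent SDK and OpenCode SDK both answer in prose, so grading them needs a looser check than schema validation. One crude option is keyword coverage over the planted issues. The keyword patterns and the 0.5 pass threshold below are hypothetical, and the `get_assert(output, context)` wrapper assumes promptfoo's file-based Python assertion hook; check the promptfoo docs for the exact contract.

```python
import re

# Hypothetical keyword patterns, one entry per planted vulnerability.
PLANTED = {
    "md5 hashing": r"\bmd5\b",
    "timing attack": r"timing|constant[- ]time|compare_digest",
    "predictable tokens": r"predictable|\brandom\b|secrets",
    "float currency": r"\bfloat|decimal|precision",
    "stored cvv": r"\bcvv\b|\bcvc\b",
    "sensitive logging": r"\blog",
}

def found_issues(text: str) -> list:
    """Return the names of planted issues whose pattern appears in text."""
    return [name for name, pattern in PLANTED.items()
            if re.search(pattern, text, flags=re.IGNORECASE)]

def get_assert(output: str, context) -> dict:
    # Assumed promptfoo Python-assertion hook; score is the fraction found.
    hits = found_issues(output)
    score = len(hits) / len(PLANTED)
    return {
        "pass": score >= 0.5,  # assumed threshold
        "score": score,
        "reason": f"mentioned {len(hits)}/{len(PLANTED)}: {', '.join(hits) or 'none'}",
    }
```

Keyword matching is crude: it rewards the plain-LLM baseline for merely naming an issue (and `\blog` also matches "login"), so treat the score as a rough signal rather than a verdict.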

Plain LLM can't read files, so it explains how to do a security audit instead of doing one.
