examples/anthropic/opus-4-8-coding/README.md
This example exercises Claude Opus 4.8 on hard coding tasks using the xhigh effort level with adaptive thinking.
You can run this example with:
npx promptfoo@latest init --example anthropic/opus-4-8-coding
cd opus-4-8-coding
Claude Opus 4.8 is Anthropic's most capable model for complex reasoning and agentic coding. This example evaluates:
- Adaptive thinking: set `thinking: { type: adaptive }` (as this example does) to let the model decide when and how much to reason per request. Without an explicit `thinking` block, the model runs without extended thinking, even at high effort.
- Effort: `effort` defaults to `high`, and `xhigh` is also available. Setting `effort: high` behaves the same as omitting it. Start with `xhigh` for coding and agentic work, and pair high effort with a large `max_tokens`.
- Sampling parameters: Opus 4.8 does not support `temperature`, `top_p`, and `top_k` at the model level. promptfoo omits them automatically, so don't set them in config.

# Set your API key
export ANTHROPIC_API_KEY=your_api_key_here
# Run the evaluation
npx promptfoo@latest eval
# View results
npx promptfoo@latest view
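
As a rough illustration of the settings described above, a provider entry might look like the sketch below. The key names (`thinking`, `effort`, `max_tokens`) come from this README, but their placement under `config` and the `max_tokens` value are assumptions made for illustration. Treat the `promptfooconfig.yaml` shipped with this example as the authoritative version.

```yaml
# Illustrative sketch only; key placement and values are assumptions.
providers:
  - id: anthropic:messages:claude-opus-4-8
    config:
      # Let the model decide when and how much to reason per request.
      thinking:
        type: adaptive
      # Defaults to high when omitted; xhigh suits coding and agentic work.
      effort: xhigh
      # Placeholder value: high effort needs a generous output budget.
      max_tokens: 32000
      # temperature, top_p, and top_k are intentionally left out.
```

Leaving the sampling parameters out entirely, rather than setting them to defaults, avoids the one-time warning that the Anthropic Messages provider logs when they are set explicitly.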
Opus 4.8 is also reachable through:
- Bedrock: `bedrock:us.anthropic.claude-opus-4-8` (or `bedrock:converse:us.anthropic.claude-opus-4-8`)
- Vertex: `vertex:claude-opus-4-8` with `config.region: global`
- Azure: `anthropic:messages:claude-opus-4-8` at `https://<resource>.services.ai.azure.com/anthropic` via `apiBaseUrl`

Across all four providers, promptfoo automatically omits the unsupported sampling parameters (`temperature`, `top_p`, `top_k`) for Opus 4.8. The Anthropic Messages provider also logs a one-time warning if you set them explicitly; the Bedrock, Vertex, and Azure paths omit them silently.
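
The alternative routes listed above could be sketched as provider entries like the following. The provider IDs, the `region: global` setting, and the Azure URL pattern are taken from this README. Placing `apiBaseUrl` under `config` is an assumption, and `<resource>` is a placeholder you would replace with your own Azure resource name.

```yaml
# Illustrative sketch only; pick the entry that matches your platform.
providers:
  # Bedrock (bedrock:converse:us.anthropic.claude-opus-4-8 is the alternative ID)
  - id: bedrock:us.anthropic.claude-opus-4-8

  # Vertex, using the global region
  - id: vertex:claude-opus-4-8
    config:
      region: global

  # Azure, reached through the Anthropic Messages provider
  - id: anthropic:messages:claude-opus-4-8
    config:
      # Assumed placement; <resource> is a placeholder, not a real host.
      apiBaseUrl: "https://<resource>.services.ai.azure.com/anthropic"
```

Each platform needs its own credentials in place of `ANTHROPIC_API_KEY`; see the corresponding promptfoo provider documentation for the exact variables.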