This example demonstrates Claude's "thinking" capability, which allows you to see the model's step-by-step reasoning process before it provides a final answer. The example compares thinking outputs from Claude Sonnet 4 (Anthropic API) and Claude Haiku 4.5 (AWS Bedrock).
You can run this example with:
    npx promptfoo@latest init --example claude-thinking
    cd claude-thinking
This example requires:
- ANTHROPIC_API_KEY - Your Anthropic API key from console.anthropic.com
- AWS_ACCESS_KEY_ID - Your AWS access key
- AWS_SECRET_ACCESS_KEY - Your AWS secret key
- Alternatively, AWS credentials set up with aws configure

After setting up environment variables:
    # From the example directory
    promptfoo eval
    promptfoo view
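Before running the commands above, it can help to confirm that the variables this example requires are actually set. The following is a minimal sketch, not part of the example itself: `missing_credentials` is a hypothetical helper, and it only inspects environment variables, so credentials set up with `aws configure` would still be reported as missing.

```python
import os

# Variable names are taken from the requirements listed in this README.
REQUIRED_VARS = ("ANTHROPIC_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict instead of the real environment makes the helper easy to try out without touching your shell.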
This example includes several test cases of increasing complexity. They are designed to showcase Claude's ability to break down complex problems and show detailed thinking steps.
The thinking feature is enabled by setting special parameters in the provider configuration:
    thinking:
      type: 'enabled'
      budget_tokens: 4096 # Controls how many tokens are allocated for thinking
    max_tokens: 8192 # Must be greater than budget_tokens
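The constraint in the comment above (max_tokens must be greater than budget_tokens) is easy to get wrong when tuning budgets. As a rough sketch, the check could be expressed as follows. `validate_thinking_config` is a hypothetical helper that works on a plain dict mirroring the config keys shown above. It is not part of promptfoo or the Anthropic SDK.

```python
def validate_thinking_config(config):
    """Return a list of problems found in a provider config dict (empty if none)."""
    problems = []
    thinking = config.get("thinking")
    if not thinking or thinking.get("type") != "enabled":
        return problems  # thinking is off, so the budget rule does not apply

    budget = thinking.get("budget_tokens")
    max_tokens = config.get("max_tokens")
    if not isinstance(budget, int):
        problems.append("thinking.budget_tokens must be an integer")
    elif not isinstance(max_tokens, int):
        problems.append("max_tokens must be an integer")
    elif max_tokens <= budget:
        problems.append(
            f"max_tokens ({max_tokens}) must be greater than budget_tokens ({budget})"
        )
    return problems


# The values from this README pass the check.
example = {"thinking": {"type": "enabled", "budget_tokens": 4096}, "max_tokens": 8192}
print(validate_thinking_config(example))  # prints []
```

Returning a list of messages rather than raising keeps the sketch usable both in scripts and in tests.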
When enabled, Claude's response will include a "Thinking:" section that shows its reasoning process before the final answer:
    Thinking: Let me solve this step by step...
    1. First, I'll divide the 8 balls into three groups...
    2. In the first weighing, I'll compare groups A and B...
    3. Based on the result, I can determine...
    Final answer: We need exactly 2 weighings to find the heavier ball.
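If you want to inspect the reasoning and the answer separately, for example in a custom assertion, a response shaped like the sample above can be split on its labels. This is a minimal sketch under the assumption that the output contains the literal "Thinking:" and "Final answer:" labels shown in the sample. `split_thinking` is a hypothetical helper, and real responses may be formatted differently.

```python
def split_thinking(text, thinking_label="Thinking:", answer_label="Final answer:"):
    """Split a response into (thinking, answer) using the labels from the sample."""
    answer_at = text.rfind(answer_label)
    if answer_at == -1:
        head, answer = text, ""  # no answer label found
    else:
        head = text[:answer_at]
        answer = text[answer_at + len(answer_label):]

    thinking_at = head.find(thinking_label)
    thinking = head[thinking_at + len(thinking_label):] if thinking_at != -1 else ""
    return thinking.strip(), answer.strip()


sample = (
    "Thinking: Let me solve this step by step...\n"
    "1. First, I'll divide the 8 balls into three groups...\n"
    "Final answer: We need exactly 2 weighings to find the heavier ball."
)
thinking, answer = split_thinking(sample)
print(answer)  # prints "We need exactly 2 weighings to find the heavier ball."
```

The helper returns empty strings when a label is missing, so callers can tell an unlabelled response apart from one with an empty section only by checking the raw text.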