integration-strands-agents (Strands Agents SDK example)

This example demonstrates how to evaluate an agent built with the Strands Agents SDK using promptfoo.

Strands Agents is an open-source AI agent framework developed by AWS that provides a model-driven approach to building AI agents.

You can run this example with:

    npx promptfoo@latest init --example integration-strands-agents
    cd integration-strands-agents

Overview

This example showcases:

    pip install -r requirements.txt

This installs:

    export OPENAI_API_KEY=your-api-key-here

    pip install 'strands-agents[anthropic]'
    export ANTHROPIC_API_KEY=your-key

Then modify agent.py to use AnthropicModel instead of OpenAIModel.

    pip install 'strands-agents[bedrock]'

    # Run evaluation
    npx promptfoo eval

    # View results in the web UI
    npx promptfoo view

The agent is defined in agent.py using the Strands Agent class with two tools:

get_weather: Returns mock weather data for cities (New York, London, Tokyo, Paris, Seattle, San Francisco)
convert_temperature: Converts temperatures between Fahrenheit and Celsius

Tools are defined with the @tool decorator, which exposes each function to the LLM and uses its docstring as the tool description.
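As a rough illustration of the two tools described above, the sketch below shows how they might look. The mock weather values, the fallback message, and the `convert_temperature` signature are assumptions made for this example and may differ from the real agent.py. The no-op fallback decorator is only there so the snippet runs without the SDK installed.

```python
# Use the real decorator when the SDK is available; otherwise fall back to a
# no-op stand-in so this sketch still runs.
try:
    from strands import tool
except ImportError:
    def tool(func):
        return func

# Hypothetical mock data; the values in the real agent.py may differ.
MOCK_WEATHER = {
    "new york": "72°F, partly cloudy",
    "london": "59°F, rainy",
    "tokyo": "68°F, clear",
}


@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    report = MOCK_WEATHER.get(city.strip().lower())
    if report is None:
        # Assumed fallback wording for cities missing from the mock data.
        return f"Weather data is not available for {city}."
    return f"Weather in {city}: {report}"


@tool
def convert_temperature(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a temperature between Fahrenheit and Celsius.

    Args:
        value: Temperature to convert.
        from_unit: Source unit, "F" or "C".
        to_unit: Target unit, "F" or "C".
    """
    src = from_unit.strip().upper()[:1]
    dst = to_unit.strip().upper()[:1]
    if src not in ("F", "C") or dst not in ("F", "C"):
        raise ValueError(f"Unsupported units: {from_unit!r} -> {to_unit!r}")
    if src == dst:
        return round(value, 1)
    if src == "F":
        return round((value - 32) * 5 / 9, 1)
    return round(value * 9 / 5 + 32, 1)
```

The docstrings matter here because they are what the model sees when deciding which tool to call.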

agent_provider.py exposes a call_api function that promptfoo's Python provider calls to interact with the Strands agent.
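A minimal sketch of the provider shape follows. promptfoo's Python provider calls `call_api(prompt, options, context)` and expects a dict with an `output` key (or an `error` key on failure). The `run_agent` helper is a hypothetical stand-in for invoking the Strands agent from agent.py; the real agent_provider.py may be organized differently.

```python
from typing import Any, Dict


def run_agent(prompt: str) -> str:
    """Hypothetical stand-in for calling the Strands agent defined in agent.py."""
    return f"(stub) You asked: {prompt}"


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Entry point that promptfoo's Python provider invokes for each test case."""
    try:
        # Coerce to str so promptfoo's string assertions can run on the output.
        return {"output": str(run_agent(prompt))}
    except Exception as exc:
        # Report failures to promptfoo instead of crashing the evaluation run.
        return {"error": f"{type(exc).__name__}: {exc}"}
```

Returning an `error` key rather than raising keeps a single failing test case from aborting the rest of the run.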

The promptfoo config includes five test cases that demonstrate different assertion types:

| Test | Description | Assertion types used |
| --- | --- | --- |
| Weather query for New York | Basic tool usage | `contains-any`, `llm-rubric`, `latency` |
| Weather query for London | Verify temperature format | `contains-any`, `javascript`, `latency` |
| Weather query for Tokyo | Case-insensitive matching | `icontains`, `javascript`, `latency` |
| Weather with temperature conversion | Multi-tool chaining | `llm-rubric`, `javascript`, `latency` |
| Weather for unknown city | Graceful fallback handling | `icontains`, `not-contains`, `latency` |

latency - Ensures responses complete within 30 seconds (applied to all tests via defaultTest)
contains-any - Verifies the agent returns expected city names and weather data from the mock tool
icontains - Case-insensitive matching to verify city names appear regardless of formatting
not-contains - Ensures the agent handles unknown cities gracefully without error messages
javascript - Validates temperature format (°F/°C symbols) and response length requirements
llm-rubric - Semantically evaluates whether the agent correctly chains weather lookup with temperature conversion
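To make the temperature-format and length checks concrete, here is an illustrative Python rendering of that kind of logic. It is not the JavaScript used in the config: the regular expression, the `MIN_LENGTH` threshold, and the function names are assumptions chosen for this sketch.

```python
import re

# Matches a number followed by a degree sign and F or C, e.g. "72°F" or "-3.5 °C".
TEMPERATURE_PATTERN = re.compile(r"-?\d+(?:\.\d+)?\s*°\s*[FC]\b")

# Hypothetical minimum response length; the real config may use another value.
MIN_LENGTH = 20


def has_temperature(output: str) -> bool:
    """Return True if the output contains a temperature with a °F or °C symbol."""
    return TEMPERATURE_PATTERN.search(output) is not None


def check_output(output: str) -> bool:
    """Combine the format check with a simple length requirement."""
    return has_temperature(output) and len(output) >= MIN_LENGTH
```

A check like this is deterministic and cheap, which is why it pairs well with the slower, model-graded `llm-rubric` assertion.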