
integration-strands-agents (Strands Agents SDK example)

examples/integration-strands-agents/README.md



This example demonstrates how to evaluate an agent built with the Strands Agents SDK using promptfoo.

Strands Agents is an open-source AI agent framework developed by AWS that provides a model-driven approach to building AI agents.

You can run this example with:

```bash
npx promptfoo@latest init --example integration-strands-agents
cd integration-strands-agents
```

Overview

This example showcases:

Prerequisites

Setup

1. Install Python dependencies

```bash
pip install -r requirements.txt
```

This installs:

2. Set environment variables

```bash
export OPENAI_API_KEY=your-api-key-here
```

Alternative: use Anthropic or Bedrock

Strands supports multiple model providers. To use Anthropic:

```bash
pip install 'strands-agents[anthropic]'
export ANTHROPIC_API_KEY=your-key
```

Then modify agent.py to use AnthropicModel instead of OpenAIModel.

To use Amazon Bedrock:

```bash
pip install 'strands-agents[bedrock]'
```

Running the example

```bash
# Run evaluation
npx promptfoo eval

# View results in the web UI
npx promptfoo view
```

How it works

Agent structure

The agent is defined in agent.py using the Strands Agent class with two tools:

  • get_weather: Returns mock weather data for cities (New York, London, Tokyo, Paris, Seattle, San Francisco)
  • convert_temperature: Converts temperatures between Fahrenheit and Celsius

Tools are defined with the @tool decorator, which exposes each function to the LLM as a tool and uses its docstring to describe what the tool does.
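A minimal sketch of what the two tools might look like follows. The mock temperatures and conditions, the parameter names, and the return-string format are assumptions, not the example's actual agent.py. It imports `tool` from the `strands` package and falls back to a no-op decorator when the SDK is not installed, so the functions can be exercised on their own.

```python
"""Sketch of the two mock tools (hypothetical data and signatures)."""

try:
    from strands import tool  # the real decorator from the Strands Agents SDK
except ImportError:
    # Fallback so this sketch runs without the SDK; NOT the Strands decorator.
    def tool(func):
        return func

# Hypothetical mock data; the example's actual values may differ.
MOCK_WEATHER = {
    "new york": {"temp_f": 68, "conditions": "partly cloudy"},
    "london": {"temp_f": 59, "conditions": "light rain"},
    "tokyo": {"temp_f": 75, "conditions": "sunny"},
    "paris": {"temp_f": 64, "conditions": "overcast"},
    "seattle": {"temp_f": 55, "conditions": "drizzle"},
    "san francisco": {"temp_f": 62, "conditions": "foggy"},
}


@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city, e.g. "London".
    """
    data = MOCK_WEATHER.get(city.strip().lower())
    if data is None:
        # Friendly fallback for unknown cities instead of an error message.
        known = ", ".join(name.title() for name in MOCK_WEATHER)
        return f"Weather data for {city} is not available. Try one of: {known}."
    return f"{city.strip().title()}: {data['temp_f']}°F, {data['conditions']}"


@tool
def convert_temperature(value: float, from_unit: str, to_unit: str) -> str:
    """Convert a temperature between Fahrenheit and Celsius.

    Args:
        value: The temperature to convert.
        from_unit: "F" or "C".
        to_unit: "F" or "C".
    """
    # Use only the first letter so "Celsius"/"C" and "Fahrenheit"/"F" both work.
    src = from_unit.strip().upper()[:1]
    dst = to_unit.strip().upper()[:1]
    if src == dst and src in ("F", "C"):
        result = value
    elif src == "F" and dst == "C":
        result = (value - 32) * 5 / 9
    elif src == "C" and dst == "F":
        result = value * 9 / 5 + 32
    else:
        raise ValueError(f"Unsupported units: {from_unit} -> {to_unit}")
    return f"{result:.1f}°{dst}"
```

With the SDK installed, functions like these would then be handed to the agent, for example `Agent(model=model, tools=[get_weather, convert_temperature])`, and the model decides when to call each one.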

Provider integration

agent_provider.py exposes a call_api function. promptfoo's Python provider calls it once per test case with the rendered prompt, and the function passes that prompt to the Strands agent and returns the agent's response.
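promptfoo's Python provider invokes `call_api(prompt, options, context)` and expects a dict with an `output` key, or an `error` key when the call fails. The sketch below shows only that contract: `run_agent` is a hypothetical helper, stubbed here so the snippet runs without the SDK. In agent_provider.py the equivalent step would invoke the Strands agent, something like `str(agent(prompt))`.

```python
"""Sketch of a promptfoo Python provider entry point (agent call is stubbed)."""
from typing import Any, Dict


def run_agent(prompt: str) -> str:
    # Hypothetical stand-in for invoking the Strands agent defined in agent.py.
    return f"(stub agent) {prompt}"


def call_api(prompt: str, options: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    """Called by promptfoo with the prompt, provider options and test context."""
    try:
        return {"output": run_agent(prompt)}
    except Exception as exc:
        # Report failures as provider errors instead of crashing the whole eval.
        return {"error": f"Agent call failed: {exc}"}
```

Catching exceptions inside `call_api` keeps one failing test case (for example, a model timeout) from aborting the rest of the run; promptfoo records it as an error for that case.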

Test cases and assertion types

The promptfoo config includes 5 test cases that demonstrate different assertion types:

| Test | Description | Assertion types used |
|------|-------------|----------------------|
| Weather query for New York | Basic tool usage | contains-any, llm-rubric, latency |
| Weather query for London | Verify temperature format | contains-any, javascript, latency |
| Weather query for Tokyo | Case-insensitive matching | icontains, javascript, latency |
| Weather with temperature conversion | Multi-tool chaining | llm-rubric, javascript, latency |
| Weather for unknown city | Graceful fallback handling | icontains, not-contains, latency |

Assertion types explained

  • latency - Ensures responses complete within 30 seconds (applied to all tests via defaultTest)
  • contains-any - Verifies the agent returns expected city names and weather data from the mock tool
  • icontains - Case-insensitive matching to verify city names appear regardless of formatting
  • not-contains - Ensures the agent handles unknown cities gracefully without error messages
  • javascript - Validates temperature format (°F/°C symbols) and response length requirements
  • llm-rubric - Semantically evaluates whether the agent correctly chains weather lookup with temperature conversion
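The temperature-format check done by the javascript assertions could also be written as a Python assertion, which promptfoo loads through a `get_assert(output, context)` function that may return a bool, a score, or a dict with `pass`, `score`, and `reason`. The sketch below is a Python translation, not the example's javascript, and `MAX_CHARS` is a hypothetical stand-in for the example's actual length requirement.

```python
"""Sketch of a Python assertion checking temperature format and length."""
from typing import Any, Dict

MAX_CHARS = 1000  # hypothetical limit; the example's real threshold may differ


def get_assert(output: str, context: Dict[str, Any]) -> Dict[str, Any]:
    # The response must mention a temperature with a °F or °C symbol.
    has_unit = "°F" in output or "°C" in output
    # The response must be non-empty and not excessively long.
    within_length = 0 < len(output) <= MAX_CHARS

    passed = has_unit and within_length
    if not has_unit:
        reason = "No °F/°C temperature found in the response"
    elif not within_length:
        reason = f"Response length {len(output)} is outside 1..{MAX_CHARS}"
    else:
        reason = "Temperature format and length look correct"
    return {"pass": passed, "score": 1.0 if passed else 0.0, "reason": reason}
```

Returning a dict rather than a bare bool puts the failure reason in the promptfoo results view, which makes it easier to tell a missing unit apart from an overlong answer.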