OpenAI Agents

Test multi-turn agentic workflows built with the @openai/agents SDK. Evaluate agents that use tools, hand off between specialists, and handle multi-step tasks.

:::note

This page covers the JavaScript @openai/agents SDK and the built-in openai:agents:* provider.

If you are using the Python openai-agents SDK, including SDK 0.14 SandboxAgent workflows or the experimental Python codex_tool, use the OpenAI Agents Python SDK guide and the openai-agents example instead.

:::

Prerequisites

Install the SDK: npm install @openai/agents zod (the tool examples use zod for parameter schemas)
Set the OPENAI_API_KEY environment variable
An agent definition, either inline in YAML or in a TypeScript/JavaScript file

Basic Usage

```yaml
providers:
  - id: openai:agents:my-agent
    config:
      agent:
        name: Customer Support Agent
        model: gpt-5-mini
        instructions: You are a helpful customer support agent.
      maxTurns: 10
```

Configuration Options

| Parameter | Description | Default |
| --- | --- | --- |
| `agent` | Agent definition (inline object or `file://path`) | - |
| `tools` | Additional tool definitions (inline array or `file://path`) | - |
| `handoffs` | Additional handoff definitions (inline array or `file://path`) | - |
| `maxTurns` | Maximum conversation turns | `10` |
| `model` | Override the model specified in the agent definition | - |
| `modelSettings` | SDK `ModelSettings` overrides, including reasoning, verbosity, and retry settings | - |
| `inputGuardrails` | Additional input guardrails (inline array or `file://path`) | - |
| `outputGuardrails` | Additional output guardrails (inline array or `file://path`) | - |
| `executeTools` | Execute function tools normally (`real`) or replace them with mocked results | `real` |
| `toolMocks` | Mocked tool outputs keyed by tool name, used when `executeTools` is `mock` or `false` | - |
| `tracing` | Enable OpenTelemetry OTLP tracing | `false` |
| `otlpEndpoint` | Custom OTLP endpoint URL for tracing | `http://localhost:4318` |
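The following sketch shows how the defaults in the table combine. It mirrors the options as a TypeScript type and fills in the documented defaults. `AgentsProviderConfig` and `withDefaults` are hypothetical names used only on this page, not Promptfoo's internal types, and values such as `agent` and `tools` are deliberately typed loosely.

```typescript
// Hypothetical shape mirroring the configuration table; not Promptfoo's internal type.
type FileRef = `file://${string}`;

interface AgentsProviderConfig {
  agent?: Record<string, unknown> | FileRef;
  tools?: unknown[] | FileRef;
  handoffs?: unknown[] | FileRef;
  maxTurns?: number;
  model?: string;
  modelSettings?: Record<string, unknown>;
  inputGuardrails?: unknown[] | FileRef;
  outputGuardrails?: unknown[] | FileRef;
  executeTools?: 'real' | 'mock' | false;
  toolMocks?: Record<string, unknown>;
  tracing?: boolean;
  otlpEndpoint?: string;
}

// Fill in the table's defaults for any option left unset (?? keeps an explicit false).
function withDefaults(config: AgentsProviderConfig) {
  return {
    ...config,
    maxTurns: config.maxTurns ?? 10,
    executeTools: config.executeTools ?? 'real',
    tracing: config.tracing ?? false,
    otlpEndpoint: config.otlpEndpoint ?? 'http://localhost:4318',
  };
}

const resolved = withDefaults({ agent: 'file://./agents/support-agent.ts', maxTurns: 15 });
console.log(resolved.maxTurns, resolved.executeTools, resolved.otlpEndpoint);
// prints: 15 real http://localhost:4318
```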

File-Based Configuration

Load agent and tools from external files:

```yaml
providers:
  - id: openai:agents:support-agent
    config:
      agent: file://./agents/support-agent.ts
      tools: file://./tools/support-tools.ts
      maxTurns: 15
      tracing: true
```

Top-level tools, handoffs, inputGuardrails, and outputGuardrails augment whatever is already defined on the loaded agent.
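In code terms, "augment" means the top-level entries are added alongside the agent's own entries instead of replacing them. The `mergeAgentLists` helper below is a hypothetical sketch on plain objects. It assumes the agent's entries come first, an ordering chosen for illustration and not taken from Promptfoo's source. Guardrail lists would combine the same way.

```typescript
// Hypothetical illustration of "augment": top-level entries are added to the agent's own.
interface Named {
  name: string;
}

interface AgentLists {
  tools?: Named[];
  handoffs?: Named[];
}

function mergeAgentLists(agent: AgentLists, topLevel: AgentLists): Required<AgentLists> {
  return {
    // Keep what the loaded agent already defines, then append the top-level entries.
    tools: [...(agent.tools ?? []), ...(topLevel.tools ?? [])],
    handoffs: [...(agent.handoffs ?? []), ...(topLevel.handoffs ?? [])],
  };
}

// get_account is a hypothetical tool defined inside the agent file.
const merged = mergeAgentLists({ tools: [{ name: 'get_account' }] }, { tools: [{ name: 'lookup_order' }] });
console.log(merged.tools.map((t) => t.name)); // [ 'get_account', 'lookup_order' ]
```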

Multimodal Input

If a rendered prompt is a JSON object or array that matches the SDK's AgentInputItem shape, Promptfoo passes it to run() as structured input instead of a plain string. This supports image, audio, and file inputs:

```yaml
prompts:
  - file://./prompts/vision-input.json

providers:
  - id: openai:agents:vision-agent
    config:
      agent: file://./agents/vision-agent.ts

tests:
  - vars:
      image: file://./images/cat.jpg
```

Example prompt file (prompts/vision-input.json):

```json
[
  {
    "role": "user",
    "content": [
      { "type": "input_text", "text": "What is in this image?" },
      { "type": "input_image", "image": "{{image}}" }
    ]
  }
]
```

Promptfoo resolves local image vars like file://./images/cat.jpg to data URLs before the prompt is passed to the SDK.
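A data URL holds the file's bytes, base64-encoded, behind a `data:<mime>;base64,` prefix. The `toDataUrl` helper and its small extension-to-MIME table are hypothetical. This is a minimal sketch using Node's `Buffer`, not Promptfoo's resolver.

```typescript
// Minimal sketch of turning image bytes into a data URL (not Promptfoo's implementation).
const MIME_BY_EXTENSION: Record<string, string> = {
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  png: 'image/png',
  gif: 'image/gif',
  webp: 'image/webp',
};

function toDataUrl(fileName: string, bytes: Uint8Array): string {
  const extension = fileName.split('.').pop()?.toLowerCase() ?? '';
  const mimeType = MIME_BY_EXTENSION[extension];
  if (!mimeType) {
    throw new Error(`Unsupported image extension: ${extension}`);
  }
  // Layout: data:<mime>;base64,<payload>
  return `data:${mimeType};base64,${Buffer.from(bytes).toString('base64')}`;
}

// In practice the bytes would come from reading ./images/cat.jpg from disk.
const url = toDataUrl('cat.jpg', new Uint8Array([0xff, 0xd8, 0xff]));
console.log(url); // data:image/jpeg;base64,/9j/
```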

Arbitrary JSON prompts that do not match an agent input item are still sent as plain text.
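The choice between structured and plain input can be modeled as a shape check on the parsed prompt. `toRunInput` below is a hypothetical, deliberately simplified heuristic that only looks for a string `role` or `type` field on each item. The SDK's real AgentInputItem schema is richer, and Promptfoo's actual check may differ.

```typescript
// Hypothetical, simplified stand-in for "matches the AgentInputItem shape".
interface InputItemLike {
  role?: unknown;
  type?: unknown;
}

function looksLikeInputItem(value: unknown): value is InputItemLike {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) {
    return false;
  }
  const item = value as InputItemLike;
  return typeof item.role === 'string' || typeof item.type === 'string';
}

// Structured items when every parsed item looks like an input item, otherwise the raw text.
function toRunInput(renderedPrompt: string): string | InputItemLike[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(renderedPrompt);
  } catch {
    return renderedPrompt; // not JSON at all: send as plain text
  }
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  if (items.length > 0 && items.every(looksLikeInputItem)) {
    return items as InputItemLike[];
  }
  return renderedPrompt; // JSON, but not input items: still plain text
}

console.log(typeof toRunInput('[{"role":"user","content":"hi"}]')); // object
console.log(typeof toRunInput('{"customer":"Ada"}')); // string
```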

Example agent file (agents/support-agent.ts):

```typescript
import { Agent } from '@openai/agents';

export default new Agent({
  name: 'Support Agent',
  model: 'gpt-5-mini',
  instructions: 'You are a helpful customer support agent.',
});
```

Example tools file (tools/support-tools.ts):

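```typescript
import { tool } from '@openai/agents';
import { z } from 'zod';

export const lookupOrder = tool({
  name: 'lookup_order',
  description: 'Look up order status by order ID',
  parameters: z.object({
    order_id: z.string().describe('The order ID'),
  }),
  // execute must be async (see Limitations). This stub returns a fixed result;
  // a real tool would query your order system with order_id.
  execute: async ({ order_id }) => {
    return { status: 'shipped', tracking: 'ABC123' };
  },
});

// Export the tools as an array so Promptfoo can load them from this file.
export default [lookupOrder];
```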

Agent Handoffs

Transfer conversations between specialized agents:

```yaml
providers:
  - id: openai:agents:triage
    config:
      agent:
        name: Triage Agent
        model: gpt-5-mini
        instructions: Route questions to the appropriate specialist.
      handoffs:
        - agent:
            name: Technical Support
            model: gpt-5-mini
            instructions: Handle technical troubleshooting.
          description: Transfer for technical issues
```
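The same triage setup can also be defined in an agent file, because the SDK's `Agent` constructor accepts a `handoffs` array. This sketch assumes the specialist's `handoffDescription` option corresponds to the inline `description` field above, which is an assumption about how Promptfoo maps the YAML form. The file name `agents/triage-agent.ts` is hypothetical.

```typescript
import { Agent } from '@openai/agents';

// Specialist the triage agent can hand off to.
const technicalSupport = new Agent({
  name: 'Technical Support',
  model: 'gpt-5-mini',
  instructions: 'Handle technical troubleshooting.',
  // Describes this agent to the triage agent when it decides whether to transfer.
  handoffDescription: 'Transfer for technical issues',
});

// Entry-point agent, referenced from config as agent: file://./agents/triage-agent.ts (hypothetical path).
const triageAgent = new Agent({
  name: 'Triage Agent',
  model: 'gpt-5-mini',
  instructions: 'Route questions to the appropriate specialist.',
  handoffs: [technicalSupport],
});

export default triageAgent;
```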

Guardrails

Validate the agent's input and final output with guardrails:

```yaml
providers:
  - id: openai:agents:secure-agent
    config:
      agent: file://./agents/secure-agent.ts
      inputGuardrails: file://./guardrails/input-guardrails.ts
      outputGuardrails: file://./guardrails/output-guardrails.ts
```

Input guardrails validate the input the agent receives, and output guardrails validate the agent's final output. When a guardrail's tripwire is triggered, the SDK aborts the run with a tripwire error. This supports content filtering, PII detection, and custom business rules.
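Here is a sketch of what guardrails/input-guardrails.ts might contain. It assumes the file's default export is an array, mirroring the tools file. Each guardrail uses the SDK's input guardrail shape: an object with a `name` and an async `execute` that returns `{ tripwireTriggered, outputInfo }`. Because that shape is structural, the file needs no SDK import. The SSN regex is a crude, hypothetical PII check. An output guardrail would follow the same pattern, except that the SDK passes it `agentOutput` instead of `input`.

```typescript
// Sketch of guardrails/input-guardrails.ts (hypothetical contents).
// Plain objects with the SDK's input guardrail shape, so no SDK import is needed.
interface GuardrailArgs {
  input: unknown; // the SDK passes a string or a list of input items
}

interface GuardrailResult {
  tripwireTriggered: boolean;
  outputInfo: unknown;
}

// Crude, illustrative PII check for US SSN-like strings such as 123-45-6789.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/;

export const blockSsn = {
  name: 'Block SSN-like input',
  execute: async ({ input }: GuardrailArgs): Promise<GuardrailResult> => {
    // Serialize structured input so the same pattern check covers every item.
    const text = typeof input === 'string' ? input : (JSON.stringify(input) ?? '');
    const found = SSN_PATTERN.test(text);
    // Returning tripwireTriggered: true asks the SDK to halt the run.
    return { tripwireTriggered: found, outputInfo: { ssnDetected: found } };
  },
};

export default [blockSsn];
```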

Retry Policies

OpenAI Agents SDK v0.7 added opt-in retry settings on modelSettings.retry. Promptfoo supports YAML-friendly retry policy presets and passes them to the SDK as runtime callbacks.

```yaml
providers:
  - id: openai:agents:support-agent
    config:
      agent: file://./agents/support-agent.ts
      modelSettings:
        retry:
          maxRetries: 2
          backoff:
            initialDelayMs: 250
            maxDelayMs: 2000
            multiplier: 2
            jitter: true
          policy:
            any:
              - providerSuggested
              - httpStatus:
                  - 429
                  - 503
```
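To see what the backoff settings above imply, assume the common exponential formula. The delay before retry n is initialDelayMs × multiplier^(n − 1), capped at maxDelayMs, and jitter then randomizes that value. This page does not specify the SDK's exact formula or jitter strategy, so `backoffDelayMs` is an illustrative helper, not SDK code.

```typescript
// Illustrative exponential backoff; the SDK's actual formula may differ.
interface BackoffSettings {
  initialDelayMs: number;
  maxDelayMs: number;
  multiplier: number;
}

// Delay in ms before retry number `attempt` (1-based), before any jitter is applied.
function backoffDelayMs(attempt: number, settings: BackoffSettings): number {
  const uncapped = settings.initialDelayMs * settings.multiplier ** (attempt - 1);
  return Math.min(uncapped, settings.maxDelayMs);
}

const settings: BackoffSettings = { initialDelayMs: 250, maxDelayMs: 2000, multiplier: 2 };

// maxRetries: 2 allows two retries, so only the first two delays are used.
const schedule = [1, 2].map((attempt) => backoffDelayMs(attempt, settings));
console.log(schedule); // [ 250, 500 ]

// With more retries the cap takes over: 250, 500, 1000, 2000, 2000.
console.log([1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt, settings)));
```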

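Supported preset policies are never, providerSuggested, networkError, and retryAfter. httpStatus takes a list of HTTP status codes, as in the example above.

You can combine policies with any (retry when at least one matches) or all (retry only when every one matches). If you are configuring Promptfoo in TypeScript or JavaScript instead of YAML, you can pass SDK retry callbacks directly.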
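You can think of `any` and `all` as combinators over predicates, each of which inspects a failed attempt. Everything below (`RetryContext`, its fields, and the predicate functions) is a hypothetical model for illustration, not the SDK's retry callback signature. Only the preset and combinator names come from the YAML above.

```typescript
// Hypothetical model of a failed attempt; the SDK's real callback arguments differ.
interface RetryContext {
  status?: number; // HTTP status code, if a response arrived
  isNetworkError?: boolean; // e.g. connection reset or DNS failure
  retryAfterMs?: number; // parsed Retry-After header, if present
  providerSuggestsRetry?: boolean; // provider marked the error as retryable
}

type RetryPredicate = (ctx: RetryContext) => boolean;

// Hypothetical counterparts of the YAML presets.
const neverRetry: RetryPredicate = () => false;
const providerSuggested: RetryPredicate = (ctx) => ctx.providerSuggestsRetry === true;
const networkError: RetryPredicate = (ctx) => ctx.isNetworkError === true;
const retryAfter: RetryPredicate = (ctx) => ctx.retryAfterMs !== undefined;

function httpStatus(codes: number[]): RetryPredicate {
  return (ctx) => ctx.status !== undefined && codes.includes(ctx.status);
}

// any: retry if at least one predicate matches; all: only if every predicate matches.
function anyOf(...predicates: RetryPredicate[]): RetryPredicate {
  return (ctx) => predicates.some((p) => p(ctx));
}

function allOf(...predicates: RetryPredicate[]): RetryPredicate {
  return (ctx) => predicates.every((p) => p(ctx));
}

// Mirrors the policy block in the YAML example above.
const policy = anyOf(providerSuggested, httpStatus([429, 503]));
console.log(policy({ status: 429 })); // true
console.log(policy({ status: 400 })); // false
```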

Mock Tool Execution

Use mocked tool outputs when you want deterministic evals without calling external systems:

```yaml
providers:
  - id: openai:agents:support-agent
    config:
      agent: file://./agents/support-agent.ts
      tools: file://./tools/support-tools.ts
      executeTools: mock
      toolMocks:
        lookup_order:
          status: shipped
          tracking: ABC123
```
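In mock mode, each function-tool call is answered from toolMocks instead of running the tool's code. The `resolveToolCall` helper below is a hypothetical sketch of that behavior, not Promptfoo's implementation. This page does not say what happens when a tool has no mock entry; the sketch simply throws.

```typescript
// Hypothetical illustration of executeTools: real vs mock; not Promptfoo's implementation.
type ToolExecutor = (args: Record<string, unknown>) => Promise<unknown>;

interface ToolCallOptions {
  executeTools: 'real' | 'mock' | false;
  toolMocks: Record<string, unknown>;
}

async function resolveToolCall(
  name: string,
  args: Record<string, unknown>,
  realExecute: ToolExecutor,
  options: ToolCallOptions,
): Promise<unknown> {
  if (options.executeTools === 'real') {
    return realExecute(args); // normal path: run the tool's own code
  }
  if (!Object.prototype.hasOwnProperty.call(options.toolMocks, name)) {
    throw new Error(`No mock configured for tool "${name}"`);
  }
  return options.toolMocks[name]; // deterministic output, no external call
}

const options: ToolCallOptions = {
  executeTools: 'mock',
  toolMocks: { lookup_order: { status: 'shipped', tracking: 'ABC123' } },
};

// The real tool would call an order system; in mock mode it is never invoked.
const realLookup: ToolExecutor = async () => {
  throw new Error('should not be called in mock mode');
};

resolveToolCall('lookup_order', { order_id: '123' }, realLookup, options).then((result) => {
  console.log(result); // { status: 'shipped', tracking: 'ABC123' }
});
```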

Tracing

Enable OpenTelemetry tracing to debug agent execution:

```yaml
providers:
  - id: openai:agents:my-agent
    config:
      agent: file://./agents/my-agent.ts
      tracing: true # Exports to http://localhost:4318
```

With a custom OTLP endpoint:

```yaml
providers:
  - id: openai:agents:my-agent
    config:
      agent: file://./agents/my-agent.ts
      tracing: true
      otlpEndpoint: https://otel-collector.example.com:4318
```

Or enable globally:

```bash
export PROMPTFOO_TRACING_ENABLED=true
npx promptfoo eval
```

Traces include agent execution spans, tool invocations, model calls, handoff events, and token usage.

Once Promptfoo is collecting those traces, you can assert on the agent's path instead of only its final message:

```yaml
tests:
  - vars:
      query: 'Find order 123 and tell me whether it shipped'
    assert:
      - type: trajectory:tool-used
        value: search_orders

      - type: trajectory:tool-args-match
        value:
          name: search_orders
          args:
            order_id: '123'

      - type: trajectory:tool-sequence
        value:
          steps:
            - search_orders
            - compose_reply

      - type: trajectory:goal-success
        value: 'Determine whether order 123 shipped and tell the user the correct status'
        provider: openai:gpt-5-mini
```

See Tracing for the eval-level OTLP setup required when you want Promptfoo to ingest and evaluate these traces directly.
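One way to picture trajectory:tool-sequence is the helper below, which checks that the expected steps appear in order within the recorded tool calls and allows other calls in between. The in-order-subsequence interpretation, the `matchesSequence` name, and the `get_shipping_status` tool are all assumptions for illustration. Promptfoo's assertion may support other matching modes.

```typescript
// Hypothetical in-order subsequence check; Promptfoo's matching rules may differ.
function matchesSequence(calledTools: string[], expectedSteps: string[]): boolean {
  let next = 0; // index of the next expected step to find
  for (const tool of calledTools) {
    if (next < expectedSteps.length && tool === expectedSteps[next]) {
      next += 1;
    }
  }
  return next === expectedSteps.length;
}

// Tool names as they might appear in the collected trace spans.
const called = ['search_orders', 'get_shipping_status', 'compose_reply'];
console.log(matchesSequence(called, ['search_orders', 'compose_reply'])); // true
console.log(matchesSequence(called, ['compose_reply', 'search_orders'])); // false
```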

Example: D&D Dungeon Master

Full working example with D&D mechanics, dice rolling, and character management:

```yaml
description: D&D Adventure with AI Dungeon Master

prompts:
  - '{{query}}'

providers:
  - id: openai:agents:dungeon-master
    config:
      agent: file://./agents/dungeon-master-agent.ts
      tools: file://./tools/game-tools.ts
      maxTurns: 20
      tracing: true

tests:
  - description: Dragon combat with attack roll
    vars:
      query: 'I draw my longsword and attack the red dragon!'
    assert:
      - type: llm-rubric
        value: Response includes dice rolls for attack and damage

  - description: Check character stats
    vars:
      query: 'What are my character stats and current HP?'
    assert:
      - type: contains-any
        value: ['Thorin', 'Fighter', 'level 5']
```
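The example's game-tools.ts is not reproduced on this page. As a hypothetical sketch of the logic a dice-rolling tool might wrap, here is a parser for standard dice notation such as 1d20+5. `rollDice` and its injectable random source are illustrative names. To expose it to the agent, you would follow the tool() pattern from the tools file earlier on this page.

```typescript
// Hypothetical dice helper a game tool could wrap; not the example's actual code.
interface RollResult {
  notation: string;
  rolls: number[];
  modifier: number;
  total: number;
}

// Parses notation like "1d20+5" or "2d6-1" and rolls it.
function rollDice(notation: string, random: () => number = Math.random): RollResult {
  const match = /^(\d+)d(\d+)([+-]\d+)?$/i.exec(notation.replace(/\s+/g, ''));
  if (!match) {
    throw new Error(`Invalid dice notation: ${notation}`);
  }
  const count = Number(match[1]);
  const sides = Number(match[2]);
  const modifier = match[3] ? Number(match[3]) : 0;
  if (count < 1 || sides < 2) {
    throw new Error(`Need at least 1 die with 2 or more sides: ${notation}`);
  }
  // random() is in [0, 1), so each roll lands in 1..sides.
  const rolls = Array.from({ length: count }, () => Math.floor(random() * sides) + 1);
  const total = rolls.reduce((sum, roll) => sum + roll, 0) + modifier;
  return { notation, rolls, modifier, total };
}

// Attack roll with a +5 bonus.
console.log(rollDice('1d20+5').total);
```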

:::tip

Try the interactive example: npx promptfoo@latest init --example openai-agents-basic

:::

Environment Variables

| Variable | Description |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI API key (required) |
| `PROMPTFOO_TRACING_ENABLED` | Enable tracing globally |
| `OPENAI_BASE_URL` | Custom OpenAI API base URL |
| `OPENAI_ORGANIZATION` | OpenAI organization ID |

Limitations

:::warning

Tools must be async functions. Synchronous tools will cause runtime errors.

:::

Agent definition files must be TypeScript or JavaScript
File paths require the file:// prefix (relative paths resolve from the config file's location)
Default maximum: 10 turns (configure with maxTurns)
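The path rule above amounts to stripping the file:// prefix and resolving the remainder against the config file's directory. `resolveFileRef` is a hypothetical helper written with Node's path module. It sketches that rule and is not Promptfoo's loader.

```typescript
import * as path from 'node:path';

// Hypothetical helper: resolve a file:// reference relative to the config file's directory.
function resolveFileRef(ref: string, configPath: string): string {
  const prefix = 'file://';
  if (!ref.startsWith(prefix)) {
    throw new Error(`Expected a file:// reference, got: ${ref}`);
  }
  const target = ref.slice(prefix.length);
  // path.resolve keeps absolute targets as-is and anchors relative ones at the config directory.
  return path.resolve(path.dirname(configPath), target);
}

console.log(resolveFileRef('file://./agents/support-agent.ts', '/work/evals/promptfooconfig.yaml'));
// /work/evals/agents/support-agent.ts (on POSIX systems)
```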

Related Documentation

OpenAI Provider - Standard OpenAI completions and chat
OpenAI Agents Python SDK Guide - Python SDK example with Promptfoo tracing, Sandbox Agents, and Codex tool span mapping
Red Team Guide - Test agent safety
Assertions - Validate agent responses
OpenAI Agents SDK - Official SDK documentation
