examples/openai-agents-basic/README.md
This example demonstrates how to use the OpenAI Agents SDK with promptfoo to create an interactive D&D adventure game powered by an AI Dungeon Master.
The agent has access to the roll_dice, check_inventory, describe_scene, and check_character_stats tools.

Prerequisites:
- Node.js (run nvm use to align with .nvmrc)
- @openai/agents SDK (installed via npm)

This example requires:
- OPENAI_API_KEY - Your OpenAI API key

You can set it in a .env file or directly in your environment:
export OPENAI_API_KEY=sk-...
You can run this example with:
npx promptfoo@latest init --example openai-agents-basic
cd openai-agents-basic
Or if you've cloned the repo:
cd examples/openai-agents-basic
npm install
npx promptfoo eval
This runs test cases simulating player actions and validates the DM's responses.
npx promptfoo view
Opens the evaluation results in a web interface showing how the DM handled different scenarios.
Modify promptfooconfig.yaml to add your own scenarios:
tests:
  - description: Negotiate with the dragon
    vars:
      query: 'I try to convince the dragon to let us pass peacefully'
    assert:
      - type: llm-rubric
        value: Response involves charisma check and dragon's reaction based on roll
openai-agents-basic/
├── agents/
│   └── dungeon-master-agent.ts   # D&D Dungeon Master agent
├── tools/
│   └── game-tools.ts             # D&D game mechanics (dice, inventory, stats, scenes)
├── promptfooconfig.yaml          # Test scenarios
├── package.json
└── README.md
The DM agent (agents/dungeon-master-agent.ts) orchestrates D&D adventures using proper game mechanics:
import { Agent } from '@openai/agents';
// Default export of tools/game-tools.ts; the path is relative to agents/
import gameTools from '../tools/game-tools';

export default new Agent({
  name: 'Dungeon Master',
  instructions: `You are an enthusiastic Dungeon Master running an epic fantasy D&D adventure.
Your role:
- Guide players through thrilling quests, combat encounters, and mysteries
- Use roll_dice for attack rolls, saving throws, ability checks, and damage (D&D 5e rules)
- Use check_inventory to see what items, equipment, and gold players have
- Use check_character_stats to view player abilities, HP, AC, and level
- Use describe_scene to paint vivid, atmospheric pictures of locations`,
  model: 'gpt-5-mini',
  tools: gameTools,
});
Four core tools (tools/game-tools.ts) power the D&D mechanics:
1. roll_dice - Simulates D&D dice rolls with modifiers and critical hit detection:
import { tool } from '@openai/agents';
import { z } from 'zod';

export const rollDice = tool({
  name: 'roll_dice',
  description: 'Roll dice for D&D mechanics: attack rolls, damage, saving throws, ability checks',
  parameters: z.object({
    sides: z.number(),
    count: z.number().default(1),
    modifier: z.number().default(0),
    purpose: z.string().default(''),
  }),
  execute: async ({ sides, count, modifier, purpose }) => {
    // Returns rolls, total, notation, and detects natural 20/1 for crits
  },
});
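The body of execute is elided above. As a rough illustration of the behavior its comment describes (rolls, total, notation, natural 20/1 detection), here is a minimal sketch. The name rollDiceSketch, the DiceResult shape, and the injectable rng parameter are assumptions made for this illustration, not the example's actual implementation.

```typescript
// Hypothetical helper: one possible way to implement the logic behind roll_dice.
type Rng = () => number; // returns a float in [0, 1), like Math.random

interface DiceResult {
  rolls: number[];
  total: number;
  notation: string;
  isNatural20: boolean;
  isNatural1: boolean;
  purpose: string;
}

function rollDiceSketch(
  sides: number,
  count = 1,
  modifier = 0,
  purpose = '',
  rng: Rng = Math.random,
): DiceResult {
  if (!Number.isInteger(sides) || sides < 2) {
    throw new Error('sides must be an integer >= 2');
  }
  if (!Number.isInteger(count) || count < 1) {
    throw new Error('count must be an integer >= 1');
  }

  // Each die lands on 1..sides with equal probability.
  const rolls = Array.from({ length: count }, () => Math.floor(rng() * sides) + 1);
  const total = rolls.reduce((sum, roll) => sum + roll, 0) + modifier;

  // Build standard notation such as "1d20+5" or "2d6-1".
  const sign = modifier > 0 ? `+${modifier}` : modifier < 0 ? `${modifier}` : '';
  const notation = `${count}d${sides}${sign}`;

  // Natural 20/1 only applies to a single d20, judged before the modifier.
  const isSingleD20 = sides === 20 && count === 1;
  return {
    rolls,
    total,
    notation,
    isNatural20: isSingleD20 && rolls[0] === 20,
    isNatural1: isSingleD20 && rolls[0] === 1,
    purpose,
  };
}
```

Passing a fixed rng makes the roll deterministic, which can help when writing unit tests for the tool separately from the promptfoo evaluation.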
2. check_inventory - Manages equipped weapons, armor, and carried items:
export const checkInventory = tool({
  name: 'check_inventory',
  description: 'Check what items, equipment, and gold the player character has',
  parameters: z.object({
    playerId: z.string().default('player1'),
  }),
  execute: async ({ playerId }) => {
    // Returns equipped weapon, armor, inventory items, and currency
  },
});
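The lookup behind check_inventory is not shown. One simple approach, sketched here under the assumption of an in-memory store keyed by player ID, is to return the stored record or an explicit error object for unknown players. The Inventory shape, the checkInventorySketch name, and all sample values are placeholders invented for this illustration.

```typescript
// Hypothetical inventory shape covering the fields the tool comment mentions.
interface Inventory {
  weapon: string;
  armor: string;
  items: string[];
  gold: number;
}

// Placeholder data for illustration only; the real example may store this differently.
const inventories = new Map<string, Inventory>([
  ['player1', { weapon: 'Longsword', armor: 'Chain Mail', items: ['Torch', 'Rope'], gold: 50 }],
]);

function checkInventorySketch(playerId = 'player1'): Inventory | { error: string } {
  const inventory = inventories.get(playerId);
  // Returning an error object (rather than throwing) lets the agent explain the problem in-character.
  return inventory ?? { error: `No inventory found for player "${playerId}"` };
}
```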
3. describe_scene - Generates atmospheric D&D location descriptions:
export const describeScene = tool({
  name: 'describe_scene',
  description: 'Generate vivid descriptions of D&D locations, encounters, and environments',
  parameters: z.object({
    location: z.string(),
    mood: z.string(),
  }),
  execute: async ({ location, mood }) => {
    // Returns immersive scene description with possible actions
  },
});
4. check_character_stats - Displays full D&D 5e character sheet:
export const checkCharacterStats = tool({
  name: 'check_character_stats',
  description: 'View player character stats, abilities, HP, AC, and other D&D 5e attributes',
  parameters: z.object({
    playerId: z.string().default('player1'),
  }),
  execute: async ({ playerId }) => {
    // Returns complete character: ability scores, HP, AC, skills, features
  },
});
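A character sheet usually shows each ability score with its modifier. In D&D 5e the modifier is the score minus 10, divided by 2 and rounded down. The sketch below shows that calculation; the function names abilityModifier and formatAbility are hypothetical and are not taken from the example's source.

```typescript
// D&D 5e ability modifier: floor((score - 10) / 2).
function abilityModifier(score: number): number {
  return Math.floor((score - 10) / 2);
}

// Formats a score the way character sheets usually show it, e.g. "STR: 16 (+3)".
function formatAbility(name: string, score: number): string {
  const mod = abilityModifier(score);
  const sign = mod >= 0 ? '+' : ''; // negative numbers already carry their own sign
  return `${name}: ${score} (${sign}${mod})`;
}
```

Math.floor matters for odd scores below 10: a score of 9 yields -1, whereas truncating toward zero would give 0.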
The config (promptfooconfig.yaml) includes engaging D&D test cases:
tests:
  - description: Dragon combat with attack roll
    vars:
      query: 'I draw my longsword and attack the red dragon!'
    assert:
      - type: llm-rubric
        value: Response includes dice rolls for attack and damage, describes combat outcome
  - description: Ridiculous player action
    vars:
      query: 'I attempt to seduce the ancient dragon using interpretive dance'
    assert:
      - type: llm-rubric
        value: DM responds with humor and wit while keeping the game engaging
Extend the game with new D&D mechanics:
export const castSpell = tool({
  name: 'cast_spell',
  description: 'Cast a D&D spell',
  parameters: z.object({
    spell: z.string(),
    target: z.string(),
    spellLevel: z.number(),
  }),
  execute: async ({ spell, target, spellLevel }) => {
    // Spell implementation with saving throws
  },
});
export default [rollDice, checkInventory, describeScene, checkCharacterStats, castSpell];
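The saving-throw part of cast_spell is left as a comment. A minimal sketch of how it might be resolved follows, assuming the standard 5e rule that a save succeeds when the d20 roll plus the target's modifier meets or exceeds the spell save DC. The names resolveSavingThrow and damageAfterSave are hypothetical, and halving damage on a successful save is a simplification that applies to many damaging spells but not all.

```typescript
interface SaveResult {
  roll: number;   // the raw d20 result
  total: number;  // roll plus the target's saving throw modifier
  dc: number;     // the caster's spell save DC
  saved: boolean; // true when total meets or beats the DC
}

function resolveSavingThrow(d20Roll: number, saveModifier: number, dc: number): SaveResult {
  const total = d20Roll + saveModifier;
  return { roll: d20Roll, total, dc, saved: total >= dc };
}

// Assumption: the spell deals half damage (rounded down) on a successful save.
function damageAfterSave(fullDamage: number, saved: boolean): number {
  return saved ? Math.floor(fullDamage / 2) : fullDamage;
}
```

The d20 roll is passed in rather than generated here so that the same dice logic used by roll_dice can supply it.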
Modify agents/dungeon-master-agent.ts to change DM personality:
instructions: `You are a dramatic Dungeon Master inspired by classic fantasy epics.
- Describe everything with cinematic flair and dramatic tension
- Include plot twists and moral dilemmas
- Reference classic D&D adventures with unique twists
- Make combat visceral and choices consequential`,
Add complex multi-step scenarios:
  - description: Multi-step puzzle challenge
    vars:
      query: 'I examine the ancient mechanism blocking the door'
    assert:
      - type: llm-rubric
        value: Response describes puzzle mechanics clearly with hints toward solution
      - type: javascript
        value: output.length > 150 # Ensures detailed description
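The javascript assertion above is a boolean expression evaluated against the model's output. To try the same length check locally before adding it to the config, it can be written as a small standalone function. The name isDetailedDescription and the minLength parameter are hypothetical and exist only for this sketch.

```typescript
// Mirrors the inline expression `output.length > 150` from the config.
function isDetailedDescription(output: string, minLength = 150): boolean {
  return output.length > minLength;
}
```

Note that the comparison is strict, so a response of exactly 150 characters fails the check.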
Tracing is enabled in the configuration to capture agent execution details:
config:
  tracing: true # Attempts to export traces via OTLP to http://localhost:4318
The agent will attempt to export OpenTelemetry traces of its execution.
To view traces, you'll need an OTLP-compatible collector running on http://localhost:4318, such as Jaeger.
Quick Setup with Jaeger:
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
Then visit http://localhost:16686 to view traces.
Note: If no trace collector is running, the agent will log warnings but continue working normally. Tracing failures don't affect evaluation results.
Combat Scenario:
Player: "I attack the goblin with my longsword!"
DM: *rolls 1d20+5* You rolled an 18 total! Your blade strikes true.
*rolls 1d8+4* You deal 9 slashing damage. The goblin staggers back,
clutching its wounded side...
Natural 20:
Player: "I attack the dragon!"
DM: *rolls 1d20+5* Natural 20! Critical hit! Your longsword finds a gap
in the dragon's scales. *rolls 2d8+4* You deal a devastating 16 damage!
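In the exchange above the damage roll goes from 1d8+4 on a normal hit to 2d8+4 on the critical. That follows the 5e rule that a critical hit doubles the damage dice while the flat modifier is added only once. A minimal sketch of that transformation on dice notation is shown below; criticalDamageNotation is a hypothetical helper, not part of the example's tools.

```typescript
// Doubles the number of dice in notation like "1d8+4", leaving the modifier unchanged.
function criticalDamageNotation(notation: string): string {
  const match = /^(\d+)d(\d+)([+-]\d+)?$/.exec(notation.trim());
  if (!match) {
    throw new Error(`Unsupported dice notation: ${notation}`);
  }
  const [, count, sides, modifier = ''] = match;
  return `${Number(count) * 2}d${sides}${modifier}`;
}
```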
Character Stats Check:
Player: "What are my current stats?"
DM: You're Thorin Ironforge, a Level 5 Mountain Dwarf Fighter:
- HP: 42/47
- AC: 18 (Chain Mail)
- STR: 16 (+3), DEX: 12 (+1), CON: 16 (+3)
- Special: Second Wind, Action Surge, Darkvision
Scene Description:
Player: "I enter the ancient crypt"
DM: *describes ominous crypt* Rows of stone sarcophagi line the walls,
some with their lids askew. The air is thick and stale. Strange scratch
marks mar the inside of several coffins. Your torch reveals fresh
footprints in the dust - heading deeper into the crypt.
What do you do?