# examples/openai-realtime/README.md

This example demonstrates how to use promptfoo to test OpenAI's Realtime API capabilities. The Realtime API allows for real-time communication with gpt-realtime and gpt-realtime-1.5 using WebSockets, supporting both text and audio inputs/outputs.
You can run this example with:

```bash
npx promptfoo@latest init --example openai-realtime
cd openai-realtime
```
This will create all necessary files and folder structure to get started quickly.
Set your OpenAI API key:

```bash
export OPENAI_API_KEY=your-api-key-here
```
You can point the Realtime provider at custom/proxy endpoints (including Azure-compatible gateways) or local/dev servers by setting apiBaseUrl. The provider automatically converts https:// → wss:// and http:// → ws:// for the WebSocket connection.
```yaml
providers:
  - id: openai:realtime:gpt-realtime-1.5
    config:
      # Custom hosted gateway
      apiBaseUrl: 'https://my-custom-api.com/v1' # connects to wss://my-custom-api.com/v1/realtime
      modalities: ['text']
      temperature: 0.7
```
For local development:
```yaml
providers:
  - id: openai:realtime:gpt-realtime-1.5
    config:
      apiBaseUrl: 'http://localhost:8080/v1' # connects to ws://localhost:8080/v1/realtime
      modalities: ['text']
```
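The scheme conversion described above can be sketched in a few lines. This is illustrative only — the actual provider performs this mapping internally:

```javascript
// Illustrative sketch of the https→wss / http→ws conversion the provider
// performs before opening the WebSocket connection.
function toWebSocketUrl(apiBaseUrl) {
  return (
    apiBaseUrl.replace(/^https:\/\//, 'wss://').replace(/^http:\/\//, 'ws://') +
    '/realtime'
  );
}

console.log(toWebSocketUrl('https://my-custom-api.com/v1'));
// → wss://my-custom-api.com/v1/realtime
```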
You can also set the `OPENAI_API_BASE_URL` or `OPENAI_BASE_URL` environment variable instead of `apiBaseUrl`.
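For example, using the hypothetical gateway URL from the configuration above:

```shell
# Equivalent to setting apiBaseUrl in promptfooconfig.yaml
# (the gateway URL here is illustrative)
export OPENAI_API_BASE_URL='https://my-custom-api.com/v1'
```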
This example includes the following files:

- `promptfooconfig.yaml`: Configuration file defining the providers and tests
- `promptfooconfig-gpt-realtime.yaml`: Comprehensive gpt-realtime-1.5 model demonstration with audio support
- `test-webui-audio.yaml`: Simple audio test for WebUI playback
- `realtime-input.json`: JSON template for the realtime input prompt
- `promptfooconfig-conversation.yaml`: Configuration for multi-turn conversation tests
- `realtime-conversation.js`: JavaScript prompt function for multi-turn conversations

The Realtime API supports maintaining conversation history across multiple turns, and this example includes a multi-turn conversation configuration that demonstrates this.
To run the multi-turn conversation example:
```bash
npx promptfoo eval -c examples/openai-realtime/promptfooconfig-conversation.yaml
```
The multi-turn conversation example demonstrates how the OpenAI Realtime API can maintain context across multiple exchanges. This is implemented using promptfoo's built-in support for conversation history through the `_conversation` variable and test metadata:

- The `_conversation` variable contains all previous turns in the conversation
- Tests that share the same `conversationId` metadata value belong to the same conversation thread

When tests include a `conversationId` in their metadata, promptfoo automatically groups them into a single thread, and for each conversation turn the `_conversation` variable is populated with all previous prompts and outputs. For example:

```text
User: What are some popular tourist destinations in Japan?
AI: Some popular tourist destinations in Japan include Tokyo, Kyoto, Osaka, Hiroshima, and Hokkaido...
User: Which of those places is best to visit in autumn?
AI: Kyoto is particularly beautiful in autumn with its colorful maple leaves...
User: What traditional foods should I try there?
AI: In Kyoto during autumn, you should try momiji manju (maple leaf-shaped cakes), kyo-kaiseki (traditional multi-course meal)...
```
The API maintains context throughout this exchange, understanding that follow-up questions refer to Japan and then to the specific autumn locations.
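Tests can be grouped into a thread by sharing a `conversationId`. A minimal sketch (the variable names and id are illustrative; see `promptfooconfig-conversation.yaml` for the actual configuration):

```yaml
tests:
  - vars:
      question: 'What are some popular tourist destinations in Japan?'
    metadata:
      conversationId: 'japan-trip' # same id = same conversation thread
  - vars:
      question: 'Which of those places is best to visit in autumn?'
    metadata:
      conversationId: 'japan-trip'
```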
This example uses a JavaScript prompt function (`realtime-conversation.js`) to properly format the conversation for the OpenAI Realtime API:
```javascript
module.exports = async function ({ vars, provider }) {
  // Create the messages array starting with the system message
  const messages = [
    {
      role: 'system',
      content: [
        {
          type: 'input_text',
          text: vars.system_message || 'You are a helpful AI assistant.',
        },
      ],
    },
  ];

  // Add previous conversation turns if they exist
  if (vars._conversation && Array.isArray(vars._conversation)) {
    for (const completion of vars._conversation) {
      // Add user message
      messages.push({
        role: 'user',
        content: [
          {
            type: 'input_text',
            text: completion.input,
          },
        ],
      });

      // Add assistant message
      messages.push({
        role: 'assistant',
        content: [
          {
            type: 'text',
            text: completion.output,
          },
        ],
      });
    }
  }

  // Add the current question as the final user message
  messages.push({
    role: 'user',
    content: [
      {
        type: 'input_text',
        text: vars.question || '',
      },
    ],
  });

  return messages;
};
```
This approach provides better flexibility and error handling than using JSON templates with Nunjucks.
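To see the message shape this function produces, here is a small standalone sketch that exercises the same logic with mocked data (the mock turn contents are illustrative):

```javascript
// Standalone sketch exercising the same message-building logic with mock data.
const buildMessages = async ({ vars }) => {
  const messages = [
    {
      role: 'system',
      content: [{ type: 'input_text', text: vars.system_message || 'You are a helpful AI assistant.' }],
    },
  ];
  if (Array.isArray(vars._conversation)) {
    for (const turn of vars._conversation) {
      messages.push({ role: 'user', content: [{ type: 'input_text', text: turn.input }] });
      messages.push({ role: 'assistant', content: [{ type: 'text', text: turn.output }] });
    }
  }
  messages.push({ role: 'user', content: [{ type: 'input_text', text: vars.question || '' }] });
  return messages;
};

buildMessages({
  vars: {
    _conversation: [
      { input: 'What are some popular tourist destinations in Japan?', output: 'Tokyo, Kyoto, Osaka...' },
    ],
    question: 'Which of those places is best to visit in autumn?',
  },
}).then((messages) => {
  console.log(messages.length); // 4: system + one user/assistant pair + current question
  console.log(messages[3].content[0].type); // 'input_text'
});
```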
We also provide a JSON template approach for reference:
```json
[
  {
    "role": "system",
    "content": [
      {
        "type": "input_text",
        "text": "{{ system_message }}"
      }
    ]
  }{% for completion in _conversation %},
  {
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "{{ completion.input }}"
      }
    ]
  },
  {
    "role": "assistant",
    "content": [
      {
        "type": "text",
        "text": "{{ completion.output }}"
      }
    ]
  }{% endfor %},
  {
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "{{ question }}"
      }
    ]
  }
]
```
Note: JSON validators may show errors for this template because of the Nunjucks expressions, but promptfoo will correctly process the file at runtime. The template uses the `_conversation` variable to maintain conversation history in a way that works with the Realtime API.
The configuration includes two separate conversation threads:
Each thread maintains its own independent context while tests are evaluated.
The provider implementation in promptfoo creates a direct WebSocket connection to the OpenAI Realtime API, following the official protocol. The endpoint is:

```text
wss://api.openai.com/v1/realtime?model=MODEL_ID
```

The connection is established as shown in the official OpenAI documentation:
```javascript
const wsUrl = `wss://api.openai.com/v1/realtime?model=${modelName}`;
const ws = new WebSocket(wsUrl, {
  headers: {
    Authorization: `Bearer ${apiKey}`,
    'OpenAI-Beta': 'realtime=v1',
    // Other headers...
  },
});
```
When sending messages to the OpenAI Realtime API, you must use the correct content type format:
```javascript
// When sending user or system messages, use the 'input_text' content type
ws.send(
  JSON.stringify({
    type: 'conversation.item.create',
    item: {
      type: 'message',
      role: 'user',
      content: [
        {
          type: 'input_text', // Must be 'input_text' for user/system inputs
          text: 'Your message here',
        },
      ],
    },
  }),
);

// When configuring modalities for response settings, use 'text' and 'audio'
const config = {
  modalities: ['text', 'audio'],
};
```
Important Note: The OpenAI Realtime API is in beta and the format requirements may change. The implementation in promptfoo follows the current requirements as of the latest API version.
The implementation supports the Realtime API's function calling capabilities:
In this example, we've configured a simple weather function to demonstrate the capability. In a real implementation, you would connect this to actual weather data.
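A provider-level function definition might look like the following sketch. The flat tool layout (`type`/`name`/`description`/`parameters` as siblings) follows the Realtime API's tool format; the exact schema in this example's config files may differ:

```yaml
providers:
  - id: openai:realtime:gpt-realtime-1.5
    config:
      modalities: ['text']
      tools:
        - type: function
          name: get_weather
          description: Get the current weather for a given location
          parameters:
            type: object
            properties:
              location:
                type: string
                description: City and state, e.g. San Francisco, CA
            required: ['location']
```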
To use function calling in your application, you would implement a function call handler. Here's an example of how you might implement a weather function handler in JavaScript:
```javascript
// In your application code
const functionCallHandler = async (name, args) => {
  // Parse the function arguments
  const parsedArgs = JSON.parse(args);

  if (name === 'get_weather') {
    const { location } = parsedArgs;
    // In a real implementation, you would call a weather API here.
    // This is just a mock example.
    return JSON.stringify({
      location,
      temperature: '72°F',
      condition: 'Sunny',
      humidity: '45%',
      forecast: 'Clear skies for the next 24 hours',
    });
  }

  // Handle unknown functions
  return JSON.stringify({ error: `Unknown function: ${name}` });
};

// You can then pass this handler in your prompt configuration
const config = {
  functionCallHandler,
};
```
The Realtime API supports both text and audio interactions, and promptfoo now includes full audio support:

- New voices `cedar` and `marin` (in addition to the existing voices)
- Run `promptfoo view` to open the WebUI and play generated audio files

To enable audio support, configure your provider with:
```yaml
providers:
  - id: openai:realtime:gpt-realtime-1.5
    config:
      modalities: ['text', 'audio']
      voice: 'cedar' # or 'marin', 'alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'
      instructions: 'Please respond with audio.'
```
Audio-focused configurations in this example:

- `test-webui-audio.yaml`: Simple audio test for WebUI playback
- `promptfooconfig-gpt-realtime.yaml`: Comprehensive gpt-realtime model demonstration

From the root directory of promptfoo, run:
```bash
npx promptfoo eval -c examples/openai-realtime/promptfooconfig.yaml
```
If you encounter a "WebSocket error: Unexpected server response: 403" error, it typically indicates one of the following issues:

1. **Network/Firewall Restrictions**: WebSocket connections may be blocked by your network or firewall.
2. **API Access**: Your OpenAI API key may not have access to the Realtime API beta.
3. **Rate Limits**: You may have hit rate limits or quotas for the Realtime API.
If you're unable to use the Realtime API due to WebSocket connection issues, you can still use the regular OpenAI chat API for most use cases. The configuration includes both providers, so you'll see results from the regular chat API even if the Realtime API fails to connect.
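A combined setup along those lines might look like this sketch (the chat model id is illustrative):

```yaml
providers:
  # Realtime provider (requires WebSocket access)
  - id: openai:realtime:gpt-realtime-1.5
    config:
      modalities: ['text']
  # Regular chat API fallback, evaluated alongside the realtime provider
  - id: openai:chat:gpt-4o
```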