examples/config-websockets/streaming/README.md
This example shows how to configure a websocket application that streams its responses. It includes a small Node.js server that exposes two WebSocket endpoints:
- Non-streaming (/ws) that returns a single message when the model finishes.
- Streaming (/ws-stream) that sends incremental deltas and a final message.

You’ll run the server locally and use promptfoo’s eval command to test the quality of the application.
You can run this example with:
npx promptfoo@latest init --example config-websockets/streaming
cd config-websockets/streaming
- promptfooconfig.yaml – Configures a target pointing at the local WebSocket server using the streaming endpoint
- server/ – Minimal Express + WebSocket server that calls the OpenAI Responses API and exposes the two endpoints

Prerequisite: an OpenAI API key, provided to the server as OPENAI_API_KEY.

From this directory:
cd server
npm install
# Option A: set environment variables in your shell
export OPENAI_API_KEY=your_key_here
# Optional:
# export CHATBOT_MODEL=gpt-4.1-mini # defaults to gpt-4.1-mini
# export PORT=3300 # defaults to 3300
# Start the server
npm start
You should see the server listening at http://localhost:3300.
Health check:
curl http://localhost:3300/health
# {"status":"ok"}
WebSocket Endpoints:
- ws://localhost:3300/ws – non-streaming
- ws://localhost:3300/ws-stream – streaming (sends delta updates and a final message)

In promptfooconfig.yaml, the target’s id is the URL of the streaming endpoint:
- id: 'ws://localhost:3300/ws-stream'
The target configuration defines a streamResponse function, streamResponse(accumulator, data, context?), that decides when to stop and what to return.
The server sends three types of messages:
- delta messages that include a partial response
- message messages that include the finalized response in full
- error messages that indicate an error occurred

Example of a successful message stream:
{"type":"delta","message":"Part of a thought"}
{"type":"message","message":"Part of a thought, now the thought is completed"}
The streamResponse function includes logic for handling these different cases. Note: the delta case is the fallback, which returns false for the second item in the tuple to indicate the response is not yet complete:
- id: 'ws://localhost:3300/ws-stream'
  config:
    messageTemplate: '{"input": {{prompt | dump}}}'
    streamResponse: |
      (accumulator, event, context) => {
        const { message, type } = JSON.parse(event.data);
        if (type === 'message') { return [{ output: message }, true]; }
        if (type === 'error') { return [{ error: message }, true]; }
        return [{ output: message }, false];
      }
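To see how the tuple drives completion, the handler above can be exercised outside promptfoo. The sketch below copies the handler into a plain function and replays the two sample frames through it. The replay helper is hypothetical and only approximates the calling loop: it assumes the previous result is passed back as the accumulator and that the first call starts from an empty object, neither of which is confirmed by this example.

```javascript
// Same logic as the streamResponse entry in promptfooconfig.yaml.
const streamResponse = (accumulator, event, context) => {
  const { message, type } = JSON.parse(event.data);
  if (type === 'message') { return [{ output: message }, true]; }
  if (type === 'error') { return [{ error: message }, true]; }
  return [{ output: message }, false];
};

// Hypothetical driver (not promptfoo's actual loop): feeds frames in order
// and stops as soon as the handler reports completion.
function replay(frames, handler) {
  let accumulator = {}; // assumed initial value
  for (const data of frames) {
    const [result, done] = handler(accumulator, { data }, undefined);
    accumulator = result;
    if (done) return { result, done };
  }
  // Ran out of frames without a final message or error.
  return { result: accumulator, done: false };
}

// The sample frames from the successful stream shown earlier.
const frames = [
  '{"type":"delta","message":"Part of a thought"}',
  '{"type":"message","message":"Part of a thought, now the thought is completed"}',
];

console.log(replay(frames, streamResponse));
```

With these frames the delta yields a partial output with done set to false, and the message frame replaces it with the full text and ends the loop.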
Tip: If you need to concatenate partials for UX, you can return an accumulator object with the concatenated value on delta frames and only return true when you receive the final message.
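One way the tip could look in practice is sketched below. It assumes each delta frame carries only the new fragment (not the text so far) and that the accumulator may be empty or undefined on the first call; both are assumptions about this server and about how promptfoo seeds the accumulator.

```javascript
// Accumulating variant: delta fragments are appended to the running output,
// and only a final message or an error marks the response as complete.
const streamResponse = (accumulator, event, context) => {
  const { message, type } = JSON.parse(event.data);
  // Text collected so far; falls back to '' when there is no accumulator yet.
  const soFar = (accumulator && accumulator.output) || '';
  if (type === 'error') { return [{ error: message }, true]; }
  if (type === 'message') { return [{ output: message }, true]; }
  return [{ output: soFar + message }, false];
};
```

In promptfooconfig.yaml the arrow function would go under streamResponse in place of the existing one. The final message still overrides the concatenated text, since it already contains the full response.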
With the server running, open a new terminal at this example directory and run:
promptfoo eval
This will evaluate the test cases against the streaming WebSocket endpoint.
View results in the browser UI:
promptfoo view
- Make sure OPENAI_API_KEY is set in the environment where the server is running.
- Check the server port (PORT, defaults to 3300) and that you’re using the correct ws:// URL.
- On the streaming endpoint, expect delta events followed by a final message.

Stop the server with Ctrl+C in its terminal.
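When troubleshooting, it can help to watch the raw frames outside of promptfoo while the server is still running. The sketch below is a minimal client, not part of this example: watchStream, buildRequest, and isFinal are hypothetical names, the request shape is inferred from the messageTemplate in promptfooconfig.yaml, and it assumes a runtime with a global WebSocket (a browser, or Node.js 22 or later).

```javascript
const STREAM_URL = 'ws://localhost:3300/ws-stream';

// Mirrors the messageTemplate: {"input": <prompt as a JSON string>}.
function buildRequest(prompt) {
  return JSON.stringify({ input: prompt });
}

// A response ends on a final message or an error frame.
function isFinal(frame) {
  return frame.type === 'message' || frame.type === 'error';
}

// Connects, sends one prompt, prints each frame, and closes on the final one.
function watchStream(prompt, url = STREAM_URL) {
  const socket = new WebSocket(url);
  socket.addEventListener('open', () => socket.send(buildRequest(prompt)));
  socket.addEventListener('message', (event) => {
    const frame = JSON.parse(event.data);
    console.log(frame.type, JSON.stringify(frame.message));
    if (isFinal(frame)) socket.close();
  });
  socket.addEventListener('error', () => console.error('connection failed:', url));
  return socket;
}

// Example (requires the server to be running):
// watchStream('Say hello in five words');
```

A healthy stream should print one or more delta lines followed by a single message line. An immediate connection failure usually points at the port or URL.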