config-websockets/streaming (WebSocket Streaming)

This example shows how to test a WebSocket application that streams its responses. It includes a small Node.js server that exposes two WebSocket endpoints:

  • A non-streaming endpoint (/ws) that returns a single message when the model finishes.
  • A streaming endpoint (/ws-stream) that sends incremental deltas and a final message.

You’ll run the server locally and use promptfoo’s eval command to test the quality of the application.

You can run this example with:

```bash
npx promptfoo@latest init --example config-websockets/streaming
cd config-websockets/streaming
```

What’s in this folder

  • promptfooconfig.yaml – Configures a target pointing at the local WebSocket server using the streaming endpoint
  • server/ – Minimal Express + WebSocket server that calls the OpenAI Responses API and exposes the two endpoints

Prerequisites

  • Node.js 20+
  • An OpenAI API key set as OPENAI_API_KEY

1) Start the local WebSocket server

From this directory:

```bash
cd server
npm install

# Option A: set environment variables in your shell
export OPENAI_API_KEY=your_key_here
# Optional:
# export CHATBOT_MODEL=gpt-4.1-mini  # defaults to gpt-4.1-mini
# export PORT=3300                   # defaults to 3300

# Start the server
npm start
```

You should see the server listening at http://localhost:3300.

Health check:

```bash
curl http://localhost:3300/health
# {"status":"ok"}
```

WebSocket Endpoints:

  • ws://localhost:3300/ws – non-streaming
  • ws://localhost:3300/ws-stream – streaming (sends delta updates and a final message)

2) How the WebSocket configuration works

In promptfooconfig.yaml, the WebSocket target is identified by its URL, which doubles as the provider id:

```yaml
- id: 'ws://localhost:3300/ws-stream'
```

The target configuration supplies a streamResponse function, streamResponse(accumulator, event, context?), which is called for each incoming frame and decides when to stop and what to return.
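
The contract can be sketched in plain JavaScript. This driver loop is illustrative only (not promptfoo internals), and it assumes the delta/message frame format described in the next section:

```javascript
// Illustrative driver for the streamResponse contract: each call returns
// a [result, done] tuple, and frames keep being fed in until done is true.
const streamResponse = (accumulator, event) => {
  const { message, type } = JSON.parse(event.data);
  return [{ output: message }, type === 'message'];
};

function drive(frames) {
  let result;
  let done = false;
  for (const frame of frames) {
    [result, done] = streamResponse(result, frame);
    if (done) break;
  }
  return result;
}

const final = drive([
  { data: '{"type":"delta","message":"partial"}' },
  { data: '{"type":"message","message":"complete answer"}' },
]);
console.log(final.output); // "complete answer"
```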

Server Response Format

The server sends three types of messages:

  1. delta messages that include a partial response
  2. message messages that include the finalized response in full
  3. error messages that indicate an error occurred

Example of a successful message stream:

```json
{"type":"delta","message":"Part of a thought"}
{"type":"message","message":"Part of a thought, now the thought is completed"}
```
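
To make the format concrete, here is a small sketch (not code from the example server) that produces such a frame sequence from a finished answer; the chunk size is arbitrary:

```javascript
// Emit several delta frames with partial text, then one final message
// frame carrying the complete response.
function toFrames(fullText, chunkSize = 8) {
  const frames = [];
  for (let i = 0; i < fullText.length; i += chunkSize) {
    frames.push(JSON.stringify({ type: 'delta', message: fullText.slice(i, i + chunkSize) }));
  }
  frames.push(JSON.stringify({ type: 'message', message: fullText }));
  return frames;
}

const frames = toFrames('Part of a thought, now the thought is completed');
console.log(frames[0]);     // {"type":"delta","message":"Part of "}
console.log(frames.at(-1)); // the final "message" frame with the full text
```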

The streamResponse function includes logic for handling each of these cases. Note that the delta case is the fallback: it returns false as the second item of the tuple to indicate the response is not yet complete:

```yaml
- id: 'ws://localhost:3300/ws-stream'
  config:
    messageTemplate: '{"input": {{prompt | dump}}}'
    streamResponse: |
      (accumulator, event, context) => {
        const { message, type } = JSON.parse(event.data);
        if (type === 'message') { return [{ output: message }, true]; }
        if (type === 'error')   { return [{ error: message }, true]; }
        return [{ output: message }, false];
      }
```
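
The messageTemplate is a Nunjucks template; its dump filter JSON-encodes the prompt so it can be embedded safely inside the JSON payload. A rough simulation in plain JavaScript, with JSON.stringify standing in for dump:

```javascript
// Approximates how messageTemplate renders before being sent over the
// socket: the prompt is JSON-encoded, then embedded in the payload.
function renderMessage(prompt) {
  return `{"input": ${JSON.stringify(prompt)}}`;
}

const payload = renderMessage('Summarize "streaming" in one line');
console.log(payload);                   // {"input": "Summarize \"streaming\" in one line"}
console.log(JSON.parse(payload).input); // Summarize "streaming" in one line
```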

Tip: If you need to concatenate partials for UX, you can return an accumulator object with the concatenated value on delta frames and only return true when you receive the final message.
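
As a sketch of that tip (the frames below are made up for illustration), an accumulating variant might look like:

```javascript
// Hypothetical accumulating variant: concatenate delta frames in the
// accumulator and only return true once the final message arrives.
const streamResponse = (accumulator, event) => {
  const { message, type } = JSON.parse(event.data);
  if (type === 'message') return [{ output: message }, true];
  if (type === 'error') return [{ error: message }, true];
  return [{ output: (accumulator?.output ?? '') + message }, false];
};

// Drive it with fake frames to show the accumulation:
let acc;
let done = false;
const frames = [
  '{"type":"delta","message":"Part of"}',
  '{"type":"delta","message":" a thought"}',
  '{"type":"message","message":"Part of a thought, now the thought is completed"}',
];
for (const data of frames) {
  [acc, done] = streamResponse(acc, { data });
}
console.log(done);       // true
console.log(acc.output); // Part of a thought, now the thought is completed
```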

3) Run the evaluation

With the server running, open a new terminal at this example directory and run:

```bash
promptfoo eval
```

This will evaluate the test cases against the streaming WebSocket endpoint.
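
The prompts and test cases themselves live in promptfooconfig.yaml. As an illustration only (this prompt and assertion are invented here, not copied from the example's config), a minimal case could look like:

```yaml
prompts:
  - 'Reply with exactly one word: pong'
tests:
  - assert:
      - type: icontains
        value: pong
```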

View results in the browser UI:

```bash
promptfoo view
```

Troubleshooting

  • If requests fail immediately, ensure OPENAI_API_KEY is set in the environment where the server is running.
  • If the client can’t connect, verify the server is listening on the expected port (PORT, defaults to 3300) and that you’re using the correct ws:// URL.
  • For streaming behavior, watch the server logs and confirm you’re receiving delta events followed by a final message.

Cleanup

Stop the server with Ctrl+C in its terminal.