site/docs/providers/websocket.md
The WebSocket provider allows you to connect to a WebSocket endpoint for inference. This is useful for real-time, bidirectional communication. WebSockets are often used to stream messages that contain partial responses to improve the perceived performance of LLM applications. Promptfoo supports a range of implementations from servers that respond with a single message containing the full response, to those that stream a series of partial responses.
To use the WebSocket provider, set the provider id to the WebSocket URL (or to websocket, with the URL supplied in the url option) and provide the necessary configuration in the config section.
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": "{{prompt}}", "model": "{{model}}"}'
      transformResponse: 'data.output'
      timeoutMs: 300000
      headers:
        Authorization: 'Bearer your-token-here'
- url (required unless the provider id is itself the WebSocket URL): The WebSocket URL to connect to.
- messageTemplate (required): A template for the message to be sent over the WebSocket connection. You can use placeholders like {{prompt}}, which will be replaced with the actual prompt.
- transformResponse (optional): A JavaScript snippet or function to extract the desired output from the WebSocket response given the data parameter. If not provided, the entire response will be used as the output. If the response is valid JSON, the object will be returned.
- streamResponse (optional): A JavaScript function to extract the desired output from streamed WebSocket messages when the server sends multiple messages per prompt. It receives (accumulator, data, context?) and must return [nextAccumulator, complete]. When streamResponse is provided, it is used instead of transformResponse.
- timeoutMs (optional): The timeout in milliseconds for the WebSocket connection. Default is 300000 (5 minutes).
- headers (optional): A map of HTTP headers to include in the WebSocket connection request. Useful for authentication or other custom headers.

You can use test variables in your messageTemplate:
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }}, "model": {{ model | dump }}, "language": {{ language | dump }} }'
      transformResponse: 'data.translation'
tests:
  - vars:
      model: 'gpt-4'
      language: 'French'
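The dump filter matters because prompts often contain quotes or newlines that would break a hand-quoted JSON template. Nunjucks' dump filter serializes a value with JSON.stringify, so the sketch below uses JSON.stringify directly to imitate the rendered message. The helpers renderQuoted, renderDumped, and isValidJson are hypothetical names used only for illustration; they are not promptfoo functions.

```javascript
// Hypothetical helpers that imitate two ways of writing a messageTemplate.
// Assumption: the dump filter serializes values the way JSON.stringify does.

// Imitates '{"prompt": "{{prompt}}"}' (raw substitution inside quotes).
function renderQuoted(prompt) {
  return `{"prompt": "${prompt}"}`;
}

// Imitates '{"prompt": {{ prompt | dump }} }' (JSON-escaped substitution).
function renderDumped(prompt) {
  return `{"prompt": ${JSON.stringify(prompt)} }`;
}

// Returns true when the text parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const prompt = 'Translate "hello" to French';

console.log(isValidJson(renderQuoted(prompt))); // false: the inner quotes end the string early
console.log(isValidJson(renderDumped(prompt))); // true
```

If the server expects JSON, the dumped form is the safer default for any variable whose content you do not control.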
Use the transformResponse property to extract specific values from the WebSocket response. For example:
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      transformResponse: 'data.choices[0].message.content'
This configuration extracts the message content from a response structure similar to:
{
  "choices": [
    {
      "message": {
        "content": "This is the response."
      }
    }
  ]
}
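The same extraction can be checked outside promptfoo before you put it in a config. The sketch below writes the expression as a plain function and applies it to the sample payload. The name extractContent is hypothetical, and the optional chaining is an added safeguard for unexpected response shapes, not something the provider requires.

```javascript
// Hypothetical stand-in for transformResponse: 'data.choices[0].message.content'.
// Optional chaining yields undefined instead of throwing on unexpected shapes.
const extractContent = (data) => data?.choices?.[0]?.message?.content;

const sample = {
  choices: [{ message: { content: 'This is the response.' } }],
};

console.log(extractContent(sample)); // This is the response.
console.log(extractContent({})); // undefined
```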
Some WebSocket endpoints stream their replies as multiple messages (for example, token-by-token deltas) before sending a final completion. Use streamResponse to handle these incremental messages and decide when you're done.
How streamResponse works: the function receives three arguments:

- accumulator: the current accumulated result. This should be a ProviderResponse-shaped object, e.g. { output: string }.
- data: the raw WebSocket message event. Access the payload via data.data. If your server sends JSON, you will typically start by parsing it, for example with JSON.parse(data.data).
- context (optional): the call context from callApi, including test vars and flags.

It must return [result, complete] where:

- result: the updated accumulated result you want to carry forward.
- complete (boolean): set true only when you’ve received the final message and want to stop streaming and return the result.

When complete is false, promptfoo keeps the WebSocket open and waits for the next message. When true, the connection is closed and result is returned (after being normalized as a ProviderResponse).
:::info
data is the browser/Node MessageEvent. Most servers send the useful payload in data.data as a string. Parse it if needed:
const message = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
:::
Imagine your server streams JSON like this while writing a travel suggestion:
{"type":"chunk","text":"You should visit "}
{"type":"chunk","text":"Kyoto in spring."}
{"type":"done"}
Here’s a streamResponse that concatenates the text fields until a type: done arrives:
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: |
        (accumulator, data, context) => {
          const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
          const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';
          if (msg?.type === 'chunk' && typeof msg.text === 'string') {
            return [{ output: previous + msg.text }, false];
          }
          if (msg?.type === 'done') {
            return [{ output: previous }, true];
          }
          return [accumulator, false];
        }
This will return a single final string: "You should visit Kyoto in spring." once the done message is received.
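To try a handler like this without a live server, you can replay recorded frames through it. The sketch below is a simplified simulation of the accumulate-until-complete behavior described above, not promptfoo's actual implementation. The helper replayFrames and the starting accumulator { output: '' } are assumptions made for illustration.

```javascript
// The handler from the example above, unchanged.
const streamResponse = (accumulator, data, context) => {
  const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
  const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';
  if (msg?.type === 'chunk' && typeof msg.text === 'string') {
    return [{ output: previous + msg.text }, false];
  }
  if (msg?.type === 'done') {
    return [{ output: previous }, true];
  }
  return [accumulator, false];
};

// Hypothetical helper: feeds frames to the handler until it reports completion.
// Each frame is wrapped as { data: ... } to mimic a MessageEvent.
function replayFrames(handler, frames, context = {}) {
  let accumulator = { output: '' }; // assumed starting value
  for (const frame of frames) {
    const [next, complete] = handler(accumulator, { data: frame }, context);
    accumulator = next;
    if (complete) {
      return accumulator;
    }
  }
  throw new Error('Stream ended before the handler reported completion');
}

const frames = [
  '{"type":"chunk","text":"You should visit "}',
  '{"type":"chunk","text":"Kyoto in spring."}',
  '{"type":"done"}',
];

console.log(replayFrames(streamResponse, frames).output);
// You should visit Kyoto in spring.
```

Throwing when the frames run out mirrors the situation where a real server never sends its final message; in promptfoo that case would instead surface through timeoutMs.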
Many realtime APIs emit interim deltas followed by a final message that carries a complete flag set to true (complete: true). Suppose the stream for a recipe-generation conversation looks like this:
{"role":"assistant","event":"delta","content":"Start by sautéing onions...","complete":false}
{"role":"assistant","event":"delta","content":" then add tomatoes and simmer.","complete":false}
{"role":"assistant","event":"final","content":"Start by sautéing onions, then add tomatoes and simmer.","complete":true}
If you only want to score the finished answer (not each partial), set complete to true only on the final frame and ignore everything else:
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: |
        (accumulator, data, context) => {
          const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
          if (msg?.complete === true) {
            return [{ output: msg.content }, true];
          }
          // Not complete yet; keep waiting and keep the previous accumulator
          return [accumulator, false];
        }
Sometimes you want the best of both worlds: concatenate partials for UI preview, but only finalize when the API says it’s done by sending complete: true. A common pattern for customer support answers:
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: |
        (accumulator, data, context) => {
          const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
          const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';
          if (msg?.event === 'delta' && typeof msg.content === 'string') {
            return [{ output: previous + msg.content }, false];
          }
          if (msg?.complete === true) {
            return [{ output: previous }, true];
          }
          return [accumulator, false];
        }
For larger handlers, keep the logic in a file and reference it:
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: 'file://scripts/wsStreamHandler.js'
You can also point to a named export: file://scripts/wsStreamHandler.js:myHandler.
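As a rough sketch of what such a file might contain, the module below assumes CommonJS exports and the chunk/done message format from the earlier example. The export name myHandler matches the reference above; the file contents and the parseFrame helper are hypothetical.

```javascript
// scripts/wsStreamHandler.js (hypothetical contents)

// Parses a frame payload; returns null for string frames that are not valid JSON.
function parseFrame(data) {
  if (typeof data.data !== 'string') {
    return data.data;
  }
  try {
    return JSON.parse(data.data);
  } catch {
    return null;
  }
}

// Concatenates "chunk" frames and finishes on a "done" frame.
function myHandler(accumulator, data, context) {
  const msg = parseFrame(data);
  const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';
  if (msg?.type === 'chunk' && typeof msg.text === 'string') {
    return [{ output: previous + msg.text }, false];
  }
  if (msg?.type === 'done') {
    return [{ output: previous }, true];
  }
  // Unknown or unparseable frame: keep waiting with the accumulator unchanged.
  return [accumulator, false];
}

module.exports = { myHandler };
```

Keeping the handler in a file also makes it easy to unit test with recorded frames before running an evaluation.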
If you are using promptfoo as a node library, you can provide the equivalent provider config:
{
  // ...
  providers: [{
    id: 'wss://example.com/ws',
    config: {
      messageTemplate: '{"prompt": "{{prompt}}"}',
      transformResponse: (data) => data.foobar,
      timeoutMs: 15000,
    }
  }],
}
Note that when using the WebSocket provider, the connection will be opened for each API call and closed after receiving the response or when the timeout is reached.
Supported config options:
| Option | Type | Description |
|---|---|---|
| url | string | The WebSocket URL to connect to. If not provided, the id of the provider will be used as the URL. |
| messageTemplate | string | A template string for the message to be sent over the WebSocket connection. Supports Nunjucks templating. |
| transformResponse | string or Function | A JavaScript expression or function that extracts the output from a single response. Ignored when streamResponse is provided. |
| streamResponse | string or Function | A function body, function expression, or file:// reference that receives (accumulator, data, context?) and returns [result, complete] for streamed messages. |
| timeoutMs | number | The timeout in milliseconds for the WebSocket connection. Defaults to 300000 (5 minutes) if not specified. |
| headers | object | A map of HTTP headers to include in the WebSocket connection request. Useful for authentication or other custom headers. |
Note: The messageTemplate supports Nunjucks templating, allowing you to use the {{prompt}} variable or any other variables passed in the test context.
In addition to a full URL, the provider id field accepts ws, wss, or websocket as values.
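For instance, a library-style provider entry using the short id might look like the sketch below. It assumes the url option supplies the endpoint when the id is not itself a URL, as the options table describes; the variable name providerConfig and the endpoint are placeholders.

```javascript
// Sketch: the short id selects the WebSocket provider and config.url supplies the endpoint.
const providerConfig = {
  id: 'websocket',
  config: {
    url: 'wss://example.com/ws', // placeholder endpoint
    messageTemplate: '{"prompt": {{ prompt | dump }} }',
    transformResponse: 'data.output',
  },
};

console.log(providerConfig.config.url); // wss://example.com/ws
```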
:::info
If you're using the OpenAI Realtime provider, you can configure custom endpoints via apiBaseUrl (or env vars). The provider automatically converts https:// → wss:// and http:// → ws://. See the OpenAI docs: /docs/providers/openai/#custom-endpoints-and-proxies-realtime.
:::