WebSockets

The WebSocket provider lets you connect to a WebSocket endpoint for inference, which is useful for real-time, bidirectional communication. WebSockets are often used to stream messages containing partial responses, which improves the perceived performance of LLM applications. Promptfoo supports a range of implementations, from servers that respond with a single message containing the full response to those that stream a series of partial responses.

Configuration

To use the WebSocket provider, set the provider `id` to the WebSocket URL (or to `websocket`, with the URL given in `config.url`) and provide the necessary configuration in the `config` section:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }}, "model": {{ model | dump }} }'
      transformResponse: 'data.output'
      timeoutMs: 300000
      headers:
        Authorization: 'Bearer your-token-here'
```

Configuration Options

- `url` (optional): The WebSocket URL to connect to. If omitted, the provider `id` is used as the URL.
- `messageTemplate` (required): A template for the message sent over the WebSocket connection. Placeholders such as `{{prompt}}` are replaced with the actual prompt.
- `transformResponse` (optional): A JavaScript snippet or function that extracts the desired output from the WebSocket response, which it receives as the `data` parameter. If not provided, the entire response is used as the output; if the response is valid JSON, the parsed object is returned.
- `streamResponse` (optional): A JavaScript function that extracts the desired output from streamed WebSocket messages when the server sends multiple messages per prompt. It receives `(accumulator, data, context?)` and must return `[nextAccumulator, complete]`. When `streamResponse` is provided, it is used instead of `transformResponse`.
- `timeoutMs` (optional): The timeout in milliseconds for the WebSocket connection. Defaults to 300000 (5 minutes).
- `headers` (optional): A map of HTTP headers to include in the WebSocket connection request, useful for authentication or other custom headers.

Using Variables

You can use test variables in your messageTemplate:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }}, "model": {{ model | dump }}, "language": {{ language | dump }} }'
      transformResponse: 'data.translation'

tests:
  - vars:
      model: 'gpt-4'
      language: 'French'
```
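The `| dump` filter matters because the rendered `messageTemplate` is sent as-is. Nunjucks' `dump` filter serializes its argument with `JSON.stringify`, so quotes and newlines inside a variable are escaped. The sketch below uses plain JavaScript string interpolation as a stand-in for Nunjucks rendering (an assumption made so it runs without dependencies); `renderQuoted`, `renderDumped`, and `isValidJson` are hypothetical helpers for illustration:

```javascript
// Stand-ins for Nunjucks rendering, using plain string interpolation.
// renderQuoted mimics '{"prompt": "{{prompt}}"}' (value inserted raw between quotes).
// renderDumped mimics '{"prompt": {{ prompt | dump }} }' (value JSON-encoded first).
const renderQuoted = (prompt) => `{"prompt": "${prompt}"}`;
const renderDumped = (prompt) => `{"prompt": ${JSON.stringify(prompt)} }`;

// Returns true if the rendered message parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// A prompt containing double quotes and a newline.
const prompt = 'Translate "hello"\ninto French';

console.log(isValidJson(renderQuoted(prompt))); // false: the inner quotes break the JSON
console.log(isValidJson(renderDumped(prompt))); // true
```

The same reasoning applies to every variable interpolated into a JSON message, not only `prompt`.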

Parsing the Response

Use the transformResponse property to extract specific values from the WebSocket response. For example:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      transformResponse: 'data.choices[0].message.content'
```

This configuration extracts the message content from a response structure similar to:

```json
{
  "choices": [
    {
      "message": {
        "content": "This is the response."
      }
    }
  ]
}
```
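Because a string `transformResponse` is evaluated with the response available as `data`, the expression above behaves roughly like the function below, assuming `data` holds the parsed JSON. This minimal sketch applies it to the sample payload. The optional chaining (`?.`) is an optional addition, not something promptfoo requires; it returns `undefined` instead of throwing when a message has no `choices`:

```javascript
// Function form of transformResponse: 'data.choices[0].message.content'.
// Optional chaining yields undefined (instead of a TypeError) for other message shapes.
const transformResponse = (data) => data?.choices?.[0]?.message?.content;

// The sample response shown above.
const sample = {
  choices: [
    {
      message: {
        content: 'This is the response.',
      },
    },
  ],
};

console.log(transformResponse(sample)); // This is the response.
console.log(transformResponse({ error: 'rate limited' })); // undefined
```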

Streaming Responses

Some WebSocket endpoints stream their replies as multiple messages (for example, token-by-token deltas) before sending a final completion. Use streamResponse to handle these incremental messages and decide when you're done.

How `streamResponse` works

It is called for every incoming WebSocket message and receives:

- `accumulator`: the current accumulated result. This should be a ProviderResponse-shaped object, e.g. `{ output: string }`.
- `data`: the raw WebSocket message event. Access the payload via `data.data`. If your server sends JSON, you will typically start by parsing it with `JSON.parse(data.data)`.
- `context` (optional): the call context from `callApi`, including test vars and flags.

It must return a tuple `[result, complete]` where:

- `result`: the updated accumulated result you want to carry forward.
- `complete` (boolean): set to `true` only when you’ve received the final message and want to stop streaming and return the result.

When complete is false, promptfoo keeps the WebSocket open and waits for the next message. When true, the connection is closed and result is returned (after being normalized as a ProviderResponse).

:::info

`data` is the browser/Node `MessageEvent`. Most servers send the useful payload in `data.data` as a string; parse it if needed:

```javascript
const message = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
```

:::
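To make the contract concrete, here is a small local harness that mimics the loop described above: it wraps each payload in a `MessageEvent`-like object `{ data }`, calls the handler once per message, and stops at the first `complete: true`. `simulateStream` and `lastTextHandler` are hypothetical names; this is not promptfoo's implementation, and starting the accumulator as `undefined` is an assumption, which is why handlers guard with `accumulator?.output` or `??`. The handler also shows one way to skip non-JSON frames such as a plain-text keepalive:

```javascript
// Hypothetical harness: one handler call per message, stop when complete is true.
function simulateStream(handler, payloads, context = {}) {
  let accumulator; // assumed initial value; handlers should not rely on it
  for (const payload of payloads) {
    const event = { data: payload }; // minimal MessageEvent-like object
    const [result, complete] = handler(accumulator, event, context);
    accumulator = result;
    if (complete) {
      return accumulator; // the provider would close the connection here
    }
  }
  // In this sketch, running out of payloads without completion is an error.
  throw new Error('stream ended without a complete message');
}

// Example handler: keeps the most recent "text" field and finishes on {"done": true}.
// Frames that are not valid JSON are ignored.
const lastTextHandler = (accumulator, data) => {
  let msg;
  try {
    msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
  } catch {
    return [accumulator, false];
  }
  if (msg?.done === true) {
    return [accumulator ?? { output: '' }, true];
  }
  if (typeof msg?.text === 'string') {
    return [{ output: msg.text }, false];
  }
  return [accumulator, false];
};

const result = simulateStream(lastTextHandler, [
  '{"text":"draft"}',
  'keepalive',
  '{"text":"final answer"}',
  '{"done":true}',
  '{"text":"ignored: arrives after completion"}',
]);
console.log(result); // { output: 'final answer' }
```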

Example: Concatenate streamed chunks into a single answer

Imagine your server streams JSON like this while writing a travel suggestion:

```json
{"type":"chunk","text":"You should visit "}
{"type":"chunk","text":"Kyoto in spring."}
{"type":"done"}
```

Here’s a `streamResponse` handler that concatenates the `text` fields until a message with `"type": "done"` arrives:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: |
        (accumulator, data, context) => {
          const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
          const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';

          if (msg?.type === 'chunk' && typeof msg.text === 'string') {
            return [{ output: previous + msg.text }, false];
          }
          if (msg?.type === 'done') {
            return [{ output: previous }, true];
          }
          return [accumulator, false];
        }
```

Once the `done` message arrives, this returns a single final string: "You should visit Kyoto in spring."
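Because the handler is plain JavaScript, you can check it outside promptfoo before wiring it into the config. This minimal sketch copies the function from the YAML above and feeds it the three sample frames, each wrapped as a `{ data }` object standing in for a `MessageEvent`. The loop is a simplified stand-in for the provider, not its actual code:

```javascript
// Same function as the streamResponse value in the YAML above.
const concatHandler = (accumulator, data, context) => {
  const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
  const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';

  if (msg?.type === 'chunk' && typeof msg.text === 'string') {
    return [{ output: previous + msg.text }, false];
  }
  if (msg?.type === 'done') {
    return [{ output: previous }, true];
  }
  return [accumulator, false];
};

// The sample stream from above, one string per WebSocket message.
const frames = [
  '{"type":"chunk","text":"You should visit "}',
  '{"type":"chunk","text":"Kyoto in spring."}',
  '{"type":"done"}',
];

// Simplified stand-in for the provider loop: one call per frame, stop on complete.
let acc;
for (const frame of frames) {
  const [next, complete] = concatHandler(acc, { data: frame }, {});
  acc = next;
  if (complete) break;
}
console.log(acc.output); // You should visit Kyoto in spring.
```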

Example: Filter out non-final messages using a `complete` flag

Many realtime APIs emit interim deltas followed by a final message that includes `complete: true`. Suppose the stream for a recipe-generation conversation looks like this:

```json
{"role":"assistant","event":"delta","content":"Start by sautéing onions...","complete":false}
{"role":"assistant","event":"delta","content":" then add tomatoes and simmer.","complete":false}
{"role":"assistant","event":"final","content":"Start by sautéing onions, then add tomatoes and simmer.","complete":true}
```

If you only want to score the finished answer (not each partial), signal completion only on the final frame and ignore everything else:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: |
        (accumulator, data, context) => {
          const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
          if (msg?.complete === true) {
            return [{ output: msg.content }, true];
          }
          // Not complete yet: keep waiting and keep the previous accumulator
          return [accumulator, false];
        }
```

Example: Accumulate partials and still stop on `complete`

Sometimes you want the best of both worlds: concatenate partials for UI preview, but only finalize when the API says it’s done. A common pattern for customer support answers:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: |
        (accumulator, data, context) => {
          const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
          const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';

          if (msg?.event === 'delta' && typeof msg.content === 'string') {
            return [{ output: previous + msg.content }, false];
          }
          if (msg?.complete === true) {
            return [{ output: previous }, true];
          }
          return [accumulator, false];
        }
```
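The two handlers above can return different text for the same stream. The sketch below runs both (adapted from the YAML, with the unused `context` parameter dropped) over the recipe frames from the previous example, using a simplified stand-in loop (`run` is a hypothetical helper, not promptfoo code). The filtering handler returns the final frame's `content`, while the accumulating handler ignores that content and returns the concatenated deltas, so the punctuation differs:

```javascript
// Adapted from the "filter" example: keep only the frame marked complete.
const finalOnly = (accumulator, data) => {
  const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
  if (msg?.complete === true) {
    return [{ output: msg.content }, true];
  }
  return [accumulator, false];
};

// Adapted from the "accumulate" example: concatenate deltas, stop on complete.
const accumulateDeltas = (accumulator, data) => {
  const msg = typeof data.data === 'string' ? JSON.parse(data.data) : data.data;
  const previous = typeof accumulator?.output === 'string' ? accumulator.output : '';
  if (msg?.event === 'delta' && typeof msg.content === 'string') {
    return [{ output: previous + msg.content }, false];
  }
  if (msg?.complete === true) {
    return [{ output: previous }, true];
  }
  return [accumulator, false];
};

// Hypothetical stand-in for the provider loop.
function run(handler, frames) {
  let acc;
  for (const frame of frames) {
    const [next, complete] = handler(acc, { data: frame }, {});
    acc = next;
    if (complete) return acc;
  }
  return acc;
}

const recipeFrames = [
  '{"role":"assistant","event":"delta","content":"Start by sautéing onions...","complete":false}',
  '{"role":"assistant","event":"delta","content":" then add tomatoes and simmer.","complete":false}',
  '{"role":"assistant","event":"final","content":"Start by sautéing onions, then add tomatoes and simmer.","complete":true}',
];

console.log(run(finalOnly, recipeFrames).output);
// Start by sautéing onions, then add tomatoes and simmer.
console.log(run(accumulateDeltas, recipeFrames).output);
// Start by sautéing onions... then add tomatoes and simmer.
```

If your API's final frame already carries the full answer, returning its `content` on completion avoids any drift between the deltas and the final text.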

Referencing a function from a file

For larger handlers, keep the logic in a file and reference it:

```yaml
providers:
  - id: 'wss://example.com/ws'
    config:
      messageTemplate: '{"prompt": {{ prompt | dump }} }'
      streamResponse: 'file://scripts/wsStreamHandler.js'
```

You can also point to a named export: file://scripts/wsStreamHandler.js:myHandler.

Using as a Library

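If you are using promptfoo as a Node.js library, you can provide the equivalent provider config:

```javascript
// Test suite object passed to promptfoo.evaluate() along with your prompts and tests.
const testSuite = {
  // ...
  providers: [
    {
      id: 'wss://example.com/ws',
      config: {
        messageTemplate: '{"prompt": {{ prompt | dump }} }',
        transformResponse: (data) => data.foobar,
        timeoutMs: 15000,
      },
    },
  ],
};
```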

Note that when using the WebSocket provider, the connection will be opened for each API call and closed after receiving the response or when the timeout is reached.

Reference

Supported config options:

| Option | Type | Description |
| --- | --- | --- |
| `url` | string | The WebSocket URL to connect to. If not provided, the `id` of the provider is used as the URL. |
| `messageTemplate` | string | A template string for the message sent over the WebSocket connection. Supports Nunjucks templating. |
| `transformResponse` | string or Function | A function, function body, or expression string that parses a single response. Ignored when `streamResponse` is provided. |
| `streamResponse` | string or Function | A function, function body, function expression, or `file://` reference that receives `(accumulator, data, context?)` and returns `[result, complete]` for streamed messages. |
| `timeoutMs` | number | The timeout in milliseconds for the WebSocket connection. Defaults to 300000 (5 minutes) if not specified. |
| `headers` | object | A map of HTTP headers to include in the WebSocket connection request. Useful for authentication or other custom headers. |

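Note: `messageTemplate` supports Nunjucks templating, so you can use the `{{prompt}}` variable or any other variable passed in the test context. When building a JSON message, use the `dump` filter (for example, `{{ prompt | dump }}`) so each value is JSON-encoded and quoted correctly.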

In addition to a full URL, the provider `id` field accepts `ws`, `wss`, or `websocket`. With one of these generic ids, set the connection URL in `config.url`.

:::info

If you're using the OpenAI Realtime provider, you can configure custom endpoints via `apiBaseUrl` (or environment variables). The provider automatically converts `https://` → `wss://` and `http://` → `ws://`. See the [OpenAI docs](/docs/providers/openai/#custom-endpoints-and-proxies-realtime).

:::
