Use voice mode

import Icon from "@site/src/components/icon";

:::warning Voice mode is deprecated as of Langflow 1.10.

The <Icon name="Mic" aria-hidden="true"/> Microphone button in the Playground now only enables speech-to-text, with no additional voice mode functionality. Speech-to-text transcribes speech into the Playground's chat input field, but does not provide voice interaction with the Langflow UI or text-to-speech responses.

The /api/v1/voice WebSocket endpoints described below are still available. :::

:::info Voice mode is not available in Langflow Desktop. To use voice mode, install the Langflow OSS Python package. :::

You can use Langflow's voice mode to interact with your flows verbally through a microphone and speakers.

Prerequisites

Voice mode requires the following:

A flow with Chat Input, Language Model, and Chat Output components.

If your flow has an Agent component, make sure the tools in your flow have accurate names and descriptions to help the agent choose which tools to use.

Additionally, be aware that voice mode overrides typed instructions in the Agent component's Agent Instructions field.
An OpenAI account and an OpenAI API key, because Langflow uses the OpenAI API to process voice input and generate responses.
Optional: An ElevenLabs API key to enable more voice options for the LLM's response.
A microphone and speakers.

A high-quality microphone and minimal background noise are recommended for optimal voice comprehension.

Test voice mode in the Playground

In the Playground, click the <Icon name="Mic" aria-hidden="true"/> Microphone icon to enable voice mode and verbally interact with your flows through a microphone and speakers.

The following steps use the Simple Agent template to demonstrate how to enable voice mode:

Create a flow based on the Simple Agent template.
Add your OpenAI API key credentials to the Agent component.
Click Playground.
Click the <Icon name="Mic" aria-hidden="true"/> Microphone icon to open the Voice mode dialog.
Enter your OpenAI API key, and then click Save. Langflow saves the key as a global variable.
If you are prompted to grant microphone access, allow it. Voice mode cannot receive verbal input while microphone access is blocked.
For Audio Input, select the input device to use with voice mode.
Optional: Add an ElevenLabs API key to enable more voices for the LLM's response. Langflow saves this key as a global variable.
For Preferred Language, select the language you want to use for your conversations with the LLM. This option changes both the expected input language and the response language.
Speak into your microphone to start the chat.

If voice mode is configured correctly, the waveform registers your input, and then the agent's logic and response are spoken aloud and shown in the Playground.

Develop applications with WebSocket endpoints

Langflow exposes an OpenAI Realtime API-compatible WebSocket endpoint for your flows. You can build applications against this endpoint the same way you would build against the OpenAI Realtime API's WebSocket interface.

The Langflow API's WebSocket endpoint requires an OpenAI API key for authentication, and supports an optional ElevenLabs integration with an ElevenLabs API key.

Additionally, the endpoint requires that you provide the flow ID in the endpoint path.
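Because the endpoint is described as compatible with the OpenAI Realtime API, a client would typically send audio as JSON events. The following is a minimal sketch of building one such event, assuming the endpoint accepts the Realtime API's `input_audio_buffer.append` client event with base64-encoded audio. The helper name `audio_append_event` is hypothetical, and how the OpenAI API key is passed to Langflow isn't specified here, so the sketch leaves authentication out.

```python
import base64
import json


def audio_append_event(audio_chunk: bytes) -> str:
    """Wrap raw audio bytes in an OpenAI Realtime-style client event.

    Assumption: the Langflow endpoint accepts the same
    `input_audio_buffer.append` event shape as the OpenAI Realtime API.
    """
    payload = {
        "type": "input_audio_buffer.append",
        # The Realtime API expects audio bytes as a base64 string.
        "audio": base64.b64encode(audio_chunk).decode("ascii"),
    }
    return json.dumps(payload)
```

Each string returned by this helper would be sent as one text frame over an open WebSocket connection, using whichever WebSocket client library your application already relies on.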

Speech-to-text audio transcription

The /ws/flow_tts/$FLOW_ID endpoint converts audio to text using OpenAI Realtime voice transcription, and then directly invokes the specified flow for each transcript.

This is the mode used in the Langflow Playground.
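As a rough illustration, the connection URL for this endpoint could be assembled as follows. This sketch assumes that the `/ws/flow_tts/$FLOW_ID` path is mounted under the `/api/v1/voice` prefix mentioned in the deprecation notice, and that your server is reachable at a base URL such as `http://localhost:7860`. Both are assumptions to verify against your own deployment, and `flow_tts_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Assumption: the voice endpoints live under this API prefix.
VOICE_PREFIX = "/api/v1/voice"


def flow_tts_url(base_url: str, flow_id: str) -> str:
    """Build the WebSocket URL for the flow_tts endpoint of a given flow."""
    parts = urlsplit(base_url)
    # Map the HTTP scheme to its WebSocket counterpart.
    scheme = "wss" if parts.scheme == "https" else "ws"
    path = (
        f"{parts.path.rstrip('/')}{VOICE_PREFIX}"
        f"/ws/flow_tts/{quote(flow_id, safe='')}"
    )
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

For example, `flow_tts_url("http://localhost:7860", "my-flow-id")` yields a `ws://` URL ending in `/ws/flow_tts/my-flow-id`, where `my-flow-id` is a placeholder for a real flow ID.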

Session IDs for WebSocket endpoints

The endpoint accepts an optional /$SESSION_ID path parameter to provide a unique ID for the conversation. If omitted, Langflow uses the flow ID as the session ID.

However, be aware that voice mode only maintains context within the current conversation instance. When you close the Playground or end a chat, verbal chat history is discarded and not available for future chat sessions.
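The optional session ID path parameter and its fallback can be sketched as follows. The helper names `with_session` and `effective_session_id` are hypothetical, and the sketch only mirrors the behavior described above: append the session ID to the endpoint path when one is given, and otherwise expect the flow ID to serve as the session ID.

```python
from typing import Optional
from urllib.parse import quote


def with_session(endpoint_url: str, session_id: Optional[str] = None) -> str:
    """Append an optional session ID as a trailing path segment."""
    if not session_id:
        # No session ID: leave the URL unchanged.
        return endpoint_url
    return f"{endpoint_url.rstrip('/')}/{quote(session_id, safe='')}"


def effective_session_id(flow_id: str, session_id: Optional[str] = None) -> str:
    """Return the session ID that is expected to apply to the conversation."""
    # Per the description above, the flow ID is used when no session ID is given.
    return session_id or flow_id
```

Using a distinct session ID per conversation keeps concurrent conversations against the same flow from sharing the default, flow-wide session.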
