sdk/voicelive/Azure.AI.VoiceLive/README.md
Azure VoiceLive is a managed service that enables low-latency, high-quality speech-to-speech interactions for voice agents. The API consolidates speech recognition, generative AI, and text-to-speech functionalities into a single, unified interface, providing an end-to-end solution for creating seamless voice-driven experiences.
Use the client library to:

- Establish real-time, bidirectional voice sessions over WebSockets
- Stream audio input and receive audio output, text transcriptions, and other events
- Configure models, voices, audio formats, and turn detection for voice agents
Source code | Package (NuGet) | API reference documentation | Product documentation | Samples
This section includes everything a developer needs to install the package and create their first VoiceLive client connection.
Install the client library for .NET with NuGet:

```dotnetcli
dotnet add package Azure.AI.VoiceLive
```
You must have an Azure subscription and an Azure AI Foundry resource to use this service.
The client library targets .NET Standard 2.0 and .NET 8.0, providing compatibility with a wide range of .NET implementations. To use the async streaming features demonstrated in the examples, you'll need .NET 6.0 or later.
The Azure.AI.VoiceLive client supports two authentication methods:
**Microsoft Entra ID (recommended):**

```csharp
using Azure.AI.VoiceLive;
using Azure.Identity;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```

**API key:**

```csharp
using Azure;
using Azure.AI.VoiceLive;

Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
AzureKeyCredential credential = new AzureKeyCredential("your-api-key");
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);
```
For the recommended keyless authentication with Microsoft Entra ID, you need to:
- Assign the **Cognitive Services User** role to your user account or managed identity in the Azure portal under **Access control (IAM)** > **Add role assignment**
- Use a `TokenCredential` implementation such as `DefaultAzureCredential` - the SDK automatically handles token acquisition and refresh with the appropriate scope

The client library targets the latest service API version by default. You can optionally specify the API version when creating a client instance.
You have the flexibility to explicitly select a supported service API version when instantiating a client by configuring its associated options:
```csharp
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();

VoiceLiveClientOptions options = new VoiceLiveClientOptions(VoiceLiveClientOptions.ServiceVersion.V2025_10_01);
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential, options);
```
The Azure.AI.VoiceLive client library provides several key classes for real-time voice interactions:
**`VoiceLiveClient`**: The primary entry point for the Azure.AI.VoiceLive service. Use this client to establish sessions and configure authentication.
**`VoiceLiveSession`**: Represents an active WebSocket connection to the VoiceLive service. This class handles bidirectional communication, allowing you to send audio input and receive audio output, text transcriptions, and other events in real-time.
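To make the relationship between the two classes concrete, here is a minimal lifecycle sketch based on the examples below. The assumption that disposing the session closes the WebSocket is ours; check the API reference for the exact disposal semantics.

```csharp
// The client is long-lived and reusable; each conversation gets its own session.
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);

// StartSessionAsync opens the underlying WebSocket connection.
using VoiceLiveSession session = await client.StartSessionAsync("gpt-4o-mini-realtime-preview");

// ... send audio and consume updates on the session ...
// Disposing the session is assumed to close the WebSocket connection.
```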
The service uses session configuration to control various aspects of the voice interaction, including the model, system instructions, voice, input and output audio formats, and turn detection.
The VoiceLive API supports multiple AI models with different capabilities:
| Model | Description | Use Case |
|---|---|---|
| `gpt-4o-realtime-preview` | GPT-4o with real-time audio processing | High-quality conversational AI |
| `gpt-4o-mini-realtime-preview` | Lightweight GPT-4o variant | Fast, efficient interactions |
| `phi4-mm-realtime` | Phi model with multimodal support | Cost-effective voice applications |
The VoiceLive API provides Azure-specific enhancements, such as Azure standard and custom neural voices (`AzureStandardVoice`, `AzureCustomVoice`) and Azure semantic voice activity detection (`AzureSemanticVadTurnDetection`).
We guarantee that all client instance methods are thread-safe and independent of each other (guideline). This ensures that the recommendation of reusing client instances is always safe, even across threads.
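Because client instances are safe to share across threads, a common pattern is to create one `VoiceLiveClient` for the process and start sessions from it as needed. A minimal sketch (the endpoint and credential setup mirrors the earlier examples):

```csharp
using System;
using Azure.AI.VoiceLive;
using Azure.Identity;

public static class SharedVoiceLive
{
    // One thread-safe client for the whole process; sessions are created per conversation.
    public static readonly VoiceLiveClient Client = new VoiceLiveClient(
        new Uri("https://your-resource.cognitiveservices.azure.com"),
        new DefaultAzureCredential());
}
```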
Client options | Accessing the response | Long-running operations | Handling failures | Diagnostics | Mocking | Client lifetime
<!-- CLIENT COMMON BAR -->

You can familiarize yourself with different APIs using Samples.
```csharp
using System;
using Azure.AI.VoiceLive;
using Azure.Identity;

// Create the VoiceLive client
Uri endpoint = new Uri("https://your-resource.cognitiveservices.azure.com");
DefaultAzureCredential credential = new DefaultAzureCredential();
VoiceLiveClient client = new VoiceLiveClient(endpoint, credential);

var model = "gpt-4o-mini-realtime-preview"; // Specify the model to use

// Start a new session
VoiceLiveSession session = await client.StartSessionAsync(model).ConfigureAwait(false);

// Configure session for voice conversation
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a helpful AI assistant. Respond naturally and conversationally.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        Threshold = 0.5f,
        PrefixPadding = TimeSpan.FromMilliseconds(300),
        SilenceDuration = TimeSpan.FromMilliseconds(500)
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Ensure modalities include audio
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions).ConfigureAwait(false);

// Process events from the session
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync().ConfigureAwait(false))
{
    if (serverEvent is SessionUpdateResponseAudioDelta audioDelta)
    {
        // Play audio response
        byte[] audioData = audioDelta.Delta.ToArray();
        // ... audio playback logic
    }
    else if (serverEvent is SessionUpdateResponseTextDelta textDelta)
    {
        // Display text response
        Console.Write(textDelta.Delta);
    }
}
```
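The loop above only consumes output. To drive a live conversation you also stream microphone audio into the session. The sketch below assumes a `SendInputAudioAsync(BinaryData, CancellationToken)` method on `VoiceLiveSession` (verify the exact name and overloads in the API reference) and a hypothetical `ReadMicrophoneChunkAsync` capture helper that yields raw PCM16 bytes:

```csharp
// Hedged sketch: pump captured PCM16 audio into the session until cancelled.
async Task PumpMicrophoneAsync(VoiceLiveSession session, CancellationToken cancellationToken)
{
    while (!cancellationToken.IsCancellationRequested)
    {
        // ReadMicrophoneChunkAsync is a hypothetical helper wrapping your audio capture stack.
        byte[] chunk = await ReadMicrophoneChunkAsync(cancellationToken).ConfigureAwait(false);
        if (chunk.Length == 0)
        {
            break; // capture source closed
        }

        // SendInputAudioAsync is an assumption; check the API reference for the exact signature.
        await session.SendInputAudioAsync(BinaryData.FromBytes(chunk), cancellationToken).ConfigureAwait(false);
    }
}
```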
To use an Azure custom voice instead of a standard one, configure the session with `AzureCustomVoice`, supplying the voice name and its deployment endpoint ID:

```csharp
VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a customer service representative. Be helpful and professional.",
    Voice = new AzureCustomVoice("your-custom-voice-name", "your-custom-voice-endpoint-id")
    {
        Temperature = 0.8f
    },
    TurnDetection = new AzureSemanticVadTurnDetection()
    {
        RemoveFillerWords = true
    },
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Ensure modalities include audio
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions).ConfigureAwait(false);
```
VoiceLive also supports function calling. Define the function's JSON schema and add it to the session's tools:

```csharp
// Define a function for the assistant to call
var getCurrentWeatherFunction = new VoiceLiveFunctionDefinition("get_current_weather")
{
    Description = "Get the current weather for a given location",
    Parameters = BinaryData.FromString("""
    {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city and state or country"
            }
        },
        "required": ["location"]
    }
    """)
};

VoiceLiveSessionOptions sessionOptions = new()
{
    Model = model,
    Instructions = "You are a weather assistant. Use the get_current_weather function to help users with weather information.",
    Voice = new AzureStandardVoice("en-US-AvaNeural"),
    InputAudioFormat = InputAudioFormat.Pcm16,
    OutputAudioFormat = OutputAudioFormat.Pcm16
};

// Add the function tool
sessionOptions.Tools.Add(getCurrentWeatherFunction);

// Ensure modalities include audio
sessionOptions.Modalities.Clear();
sessionOptions.Modalities.Add(InteractionModality.Text);
sessionOptions.Modalities.Add(InteractionModality.Audio);

await session.ConfigureSessionAsync(sessionOptions).ConfigureAwait(false);
```
When the model requests a function call, handle it in the update loop, send the result back to the session, and start the next response:

```csharp
// Process events from the session
await foreach (SessionUpdate serverEvent in session.GetUpdatesAsync().ConfigureAwait(false))
{
    if (serverEvent is SessionUpdateResponseFunctionCallArgumentsDone functionCall)
    {
        if (functionCall.Name == "get_current_weather")
        {
            // Extract parameters from the function call
            var parametersString = functionCall.Arguments;
            var parameters = System.Text.Json.JsonSerializer.Deserialize<Dictionary<string, string>>(parametersString);
            string location = parameters != null ? parameters["location"] : string.Empty;

            // Call your external weather service here and get the result
            string weatherInfo = $"The current weather in {location} is sunny with a temperature of 75°F.";

            // Send the function response back to the session
            await session.AddItemAsync(new FunctionCallOutputItem(functionCall.CallId, weatherInfo)).ConfigureAwait(false);

            // Start the next response.
            await session.StartResponseAsync().ConfigureAwait(false);
        }
    }
}
```
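The dictionary-based deserialization above works only while every argument is a flat string. If your schema grows nested or optional properties, parsing the arguments with `JsonDocument` is more defensive; a sketch:

```csharp
using System.Text.Json;

// Defensive parsing of function-call arguments instead of assuming a flat string dictionary.
string location = string.Empty;
using (JsonDocument doc = JsonDocument.Parse(functionCall.Arguments))
{
    if (doc.RootElement.TryGetProperty("location", out JsonElement locationElement) &&
        locationElement.ValueKind == JsonValueKind.String)
    {
        location = locationElement.GetString() ?? string.Empty;
    }
}
```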
To send text input instead of audio, add a user message item to the conversation and start a response:

```csharp
// Add a user message to the session
await session.AddItemAsync(new UserMessageItem("Hello, can you help me with my account?")).ConfigureAwait(false);

// Start the response from the assistant
await session.StartResponseAsync().ConfigureAwait(false);
```
**Authentication Errors**: If you receive authentication errors, verify that your endpoint URL is correct, your API key or credential is valid, and (for Microsoft Entra ID) the **Cognitive Services User** role has been assigned.

**WebSocket Connection Issues**: VoiceLive uses WebSocket connections. Ensure that your network, firewalls, and proxies allow outbound WebSocket traffic to `*.cognitiveservices.azure.com`.

**Audio Processing Errors**: For audio-related issues, confirm that the audio you send and expect to receive matches the configured input and output formats (for example, 16-bit PCM).
Enable logging to help diagnose issues:
```csharp
using Azure.Core.Diagnostics;

// Enable logging for Azure SDK
using AzureEventSourceListener listener = AzureEventSourceListener.CreateConsoleLogger();
```
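By default `CreateConsoleLogger` emits informational events. Passing an `EventLevel` raises the verbosity, which can help when diagnosing connection-level problems:

```csharp
using System.Diagnostics.Tracing;
using Azure.Core.Diagnostics;

// Capture verbose Azure SDK events, including low-level connection details.
using AzureEventSourceListener verboseListener =
    AzureEventSourceListener.CreateConsoleLogger(EventLevel.Verbose);
```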
The VoiceLive service enforces rate limits. Implement appropriate retry logic and connection management to handle throttling gracefully; a sketch follows.
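The SDK does not prescribe a reconnection strategy, so the following is only a sketch: exponential backoff around session creation, with the retry budget, delay cap, and the caught exception type all assumptions to tune for your workload:

```csharp
// Hedged sketch: retry session creation with exponential backoff when throttled.
async Task<VoiceLiveSession> StartSessionWithRetryAsync(
    VoiceLiveClient client, string model, CancellationToken cancellationToken)
{
    TimeSpan delay = TimeSpan.FromSeconds(1); // initial backoff (assumption)
    const int maxAttempts = 5;                // retry budget (assumption)

    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return await client.StartSessionAsync(model).ConfigureAwait(false);
        }
        catch (Exception) when (attempt < maxAttempts) // narrow to the throttling exception you observe
        {
            // Back off before the next attempt; double the delay up to a 30 second cap.
            await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
            delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, TimeSpan.FromSeconds(30).Ticks));
        }
    }
}
```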
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
<!-- LINKS -->