docs/decisions/0065-realtime-api-clients.md
Multiple model providers are starting to enable realtime voice-to-voice or even multi-modal, realtime, two-way communication with their models, this includes OpenAI with their Realtime API and Google Gemini. These API's promise some very interesting new ways of using LLM's for different scenario's, which we want to enable with Semantic Kernel.
The key feature that Semantic Kernel brings into this system is the ability to (re)use Semantic Kernel function as tools with these API's. There are also options for Google to use video and images as input, this will likely not be implemented first, but the abstraction should be able to deal with it.
[!IMPORTANT] Both the OpenAI and Google realtime api's are in preview/beta, this means there might be breaking changes in the way they work coming in the future, therefore the clients built to support these API's are going to be experimental until the API's stabilize.
At this time, the protocols that these API's use are Websockets and WebRTC.
In both cases there are events being sent to and from the service, some events contain content, text, audio, or video (so far only sending, not receiving), while some events are "control" events, like content created, function call requested, etc. Sending events include, sending content, either voice, text or function call output, or events, like committing the input audio and requesting a response.
Websocket has been around for a while and is a well known technology, it is a full-duplex communication protocol over a single, long-lived connection. It is used for sending and receiving messages between client and server in real-time. Each event can contain a message, which might contain a content item, or a control event. Audio is sent as a base64 encoded string in a event.
WebRTC is a Mozilla project that provides web browsers and mobile applications with real-time communication via simple APIs. It allows audio and video communication to work inside web pages and other applications by allowing direct peer-to-peer communication, eliminating the need to install plugins or download native apps. It is used for sending and receiving audio and video streams, and can be used for sending (data-)messages as well. The big difference compared to websockets is that it explicitly create a channel for audio and video, and a separate channel for "data", which are events and in this space that contains all non-AV content, text, function calls, function results and control events, like errors or acknowledgements.
| Content/Control event | Event Description | OpenAI Event | Google Event |
|---|---|---|---|
| Control | Configure session | session.update | BidiGenerateContentSetup |
| Content | Send voice input | input_audio_buffer.append | BidiGenerateContentRealtimeInput |
| Control | Commit input and request response | input_audio_buffer.commit | - |
| Control | Clean audio input buffer | input_audio_buffer.clear | - |
| Content | Send text input | conversation.item.create | BidiGenerateContentClientContent |
| Control | Interrupt audio | conversation.item.truncate | - |
| Control | Delete content | conversation.item.delete | - |
| Control | Respond to function call request | conversation.item.create | BidiGenerateContentToolResponse |
| Control | Ask for response | response.create | - |
| Control | Cancel response | response.cancel | - |
| Content/Control event | Event Description | OpenAI Event | Google Event |
|---|---|---|---|
| Control | Error | error | - |
| Control | Session created | session.created | BidiGenerateContentSetupComplete |
| Control | Session updated | session.updated | BidiGenerateContentSetupComplete |
| Control | Conversation created | conversation.created | - |
| Control | Input audio buffer committed | input_audio_buffer.committed | - |
| Control | Input audio buffer cleared | input_audio_buffer.cleared | - |
| Control | Input audio buffer speech started | input_audio_buffer.speech_started | - |
| Control | Input audio buffer speech stopped | input_audio_buffer.speech_stopped | - |
| Content | Conversation item created | conversation.item.created | - |
| Content | Input audio transcription completed | conversation.item.input_audio_transcription.completed | |
| Content | Input audio transcription failed | conversation.item.input_audio_transcription.failed | |
| Control | Conversation item truncated | conversation.item.truncated | - |
| Control | Conversation item deleted | conversation.item.deleted | - |
| Control | Response created | response.created | - |
| Control | Response done | response.done | - |
| Content | Response output item added | response.output_item.added | - |
| Content | Response output item done | response.output_item.done | - |
| Content | Response content part added | response.content_part.added | - |
| Content | Response content part done | response.content_part.done | - |
| Content | Response text delta | response.text.delta | BidiGenerateContentServerContent |
| Content | Response text done | response.text.done | - |
| Content | Response audio transcript delta | response.audio_transcript.delta | BidiGenerateContentServerContent |
| Content | Response audio transcript done | response.audio_transcript.done | - |
| Content | Response audio delta | response.audio.delta | BidiGenerateContentServerContent |
| Content | Response audio done | response.audio.done | - |
| Content | Response function call arguments delta | response.function_call_arguments.delta | BidiGenerateContentToolCall |
| Content | Response function call arguments done | response.function_call_arguments.done | - |
| Control | Function call cancelled | - | BidiGenerateContentToolCallCancellation |
| Control | Rate limits updated | rate_limits.updated | - |
There are multiple areas where we need to make decisions, these are:
Both the sending and receiving side of these integrations need to decide how to deal with the events.
This would mean there are two mechanisms in the clients, one deals with content, and one with control events.
This would mean that all events are turned into Semantic Kernel content items, and would also mean that we need to define additional content types for the control events.
This would introduce events, each event has a type, those can be core content types, like audio, video, image, text, function call or function response, as well as a generic event for control events without content. Each event has a SK type, from above as well as a service_event_type field that contains the event type from the service. Finally the event has a content field, which corresponds to the type, and for the generic event contains the raw event from the service.
Chosen option: 3 Treat Everything as Events
This option was chosen to allow abstraction away from the raw events, while still allowing the developer to access the raw events if needed.
A base event type is added called RealtimeEvent, this has three fields, a event_type, service_event_type and service_event. It then has four subclasses, one each for audio, text, function call and function result.
When a known piece of content has come in, it will be parsed into a SK content type and added, this content should also have the raw event in the inner_content, so events are then stored twice, once in the event, once in the content, this is by design so that if the developer needs to access the raw event, they can do so easily even when they remove the event layer.
It might also be possible that a single event from the service contains multiple content items, for instance a response might contain both text and audio, in that case multiple events will be emitted. In principle a event has to be handled once, so if there is event that is parsable only the subtype is returned, since it has all the same information as the RealtimeEvent this will allow developers to trigger directly off the service_event_type and service_event if they don't want to use the abstracted types.
RealtimeEvent(
event_type="service", # single default value in order to discriminate easily
service_event_type="conversation.item.create", # optional
service_event: { ... } # optional, because some events do not have content.
)
RealtimeAudioEvent(RealtimeEvent)(
event_type="audio", # single default value in order to discriminate easily
service_event_type="response.audio.delta", # optional
service_event: { ... }
audio: AudioContent(...)
)
RealtimeTextEvent(RealtimeEvent)(
event_type="text", # single default value in order to discriminate easily
service_event_type="response.text.delta", # optional
service_event: { ... }
text: TextContent(...)
)
RealtimeFunctionCallEvent(RealtimeEvent)(
event_type="function_call", # single default value in order to discriminate easily
service_event_type="response.function_call_arguments.delta", # optional
service_event: { ... }
function_call: FunctionCallContent(...)
)
RealtimeFunctionResultEvent(RealtimeEvent)(
event_type="function_result", # single default value in order to discriminate easily
service_event_type="response.output_item.added", # optional
service_event: { ... }
function_result: FunctionResultContent(...)
)
RealtimeImageEvent(RealtimeEvent)(
event_type="image", # single default value in order to discriminate easily
service_event_type="response.image.delta", # optional
service_event: { ... }
image: ImageContent(...)
)
This allows you to easily do pattern matching on the event_type, or use the service_event_type to filter on the specific event type for service events, or match on the type of the event and get the SK contents from it.
There might be other abstracted types needed at some point, for instance errors, or session updates, but since the current two services have no agreement on the existence of these events and their structure, it is better to wait until there is a need for them.
One open item is whether to include a extra field in these types for tracking related pieces, however this becomes problematic because the way those are generated differs per service and is quite complex, for instance the OpenAI API returns a piece of audio transcript with the following ids:
event_id: the unique id of the eventresponse_id: the id of the responseitem_id: the id of the itemoutput_index: the index of the output item in the responsecontent_index: The index of the content part in the item's content arrayFor an example of the events emitted by OpenAI see the details below.
While Google has ID's only in some content items, like function calls, but not for audio or text content.
Since the id's are always available through the raw event (either as inner_content or as .event), it is not necessary to add them to the content types, and it would make the content types more complex and harder to reuse across services.
Wrapping content in a (Streaming)ChatMessageContent first, this will add another layer of complexity and since a CMC can contain multiple items, to access audio, would look like this: service_event.content.items[0].audio.data, which is not as clear as service_event.audio.data.
The programming model for the clients needs to be simple and easy to use, while also being able to handle the complexity of the realtime api's.
In this section we will refer to events for both content and events, regardless of the decision made in the previous section.
This is mostly about the receiving side of things, sending is much simpler.
This would mean that the client would have a mechanism to register event handlers, and the integration would call these handlers when an event is received. For sending events, a function would be created that sends the event to the service.
This would mean that there are two queues, one for sending and one for receiving, and the developer can listen to the receiving queue and send to the sending queue. Internal things like parsing events to content types and auto-function calling are processed first, and the result is put in the queue, the content type should use inner_content to capture the full event and these might add a message to the send queue as well.
This would mean that the audio content is handled first and sent to a callback directly so that the developer can play it or send it onward as soon as possible, and then all other events are processed (like text, function calls, etc) and put in the queue.
This would mean that the clients implement a function that yields events, and the developer can loop through it and deal with events as they come.
This would mean that the audio content is handled first and sent to a callback directly so that the developer can play it or send it onward as soon as possible, and then all other events are parsed and yielded.
Chosen option: 3b AsyncGenerator that yields Events combined with priority handling of audio content through a callback
This makes the programming model very easy, a minimal setup that should work for every service and protocol would look like this:
async for event in realtime_client.start_streaming():
match event:
case AudioEvent():
await audio_player.add_audio(event.audio)
case TextEvent():
print(event.text.text)
This would mean that the client would have a mechanism to register audio handlers, and the integration would call these handlers when audio is received or needs to be sent. A additional abstraction for this would have to be created in Semantic Kernel (or potentially taken from a standard).
This would mean that the client would receive AudioContent items, and would have to deal with them itself, including recording and playing the audio.
Chosen option: Option 2: there are vast difference in audio format, frame duration, sample rate and other audio settings, that a default that works always is likely not feasible, and the developer will have to deal with this anyway, so it's better to let them deal with it from the start, we will add sample audio handlers to the samples to still allow people to get started with ease.
The following functionalities will need to be supported:
Each implementation would have to implements all of the above methods. This means that non-protocol specific elements are in the same class as the protocol specific elements and will lead to code duplication between them.
Two interfaces are created:
Currently neither the google or the openai api's support restarting sessions, so the advantage of splitting is mostly a implementation question but will not add any benefits to the developer. This means that the resultant split will actually be far simpler:
The send and listen/receive methods need to be clear in the way their are named and this can become confusing when dealing with these api's. The following options are considered:
Options for sending events to the service from your code:
Options for listening for events from the service in your code:
Chosen option: Use a single class for everything Chosen for send and receive as verbs.
This means that the interface will look like this:
class RealtimeClient:
async def create_session(self, chat_history: ChatHistory, settings: PromptExecutionSettings, **kwargs) -> None:
...
async def update_session(self, chat_history: ChatHistory, settings: PromptExecutionSettings, **kwargs) -> None:
...
async def close_session(self, **kwargs) -> None:
...
async def receive(self, chat_history: ChatHistory, **kwargs) -> AsyncGenerator[RealtimeEvent, None]:
...
async def send(self, event: RealtimeEvent) -> None:
...
In most cases, create_session should call update_session with the same parameters, since update session can also be done separately later on with the same inputs.
For Python a default __aenter__ and __aexit__ method should be added to the class, so it can be used in a async with statement, which calls create_session and close_session respectively.
It is advisable, but not required, to implement the send method through a buffer/queue so that events can be 'sent' before the sessions has been established without losing them or raising exceptions, since the session creation might take a few seconds and in that time a single send call would either block the application or throw an exception.
The send method should handle all events types, but it might have to handle the same thing in two ways, for instance (for the OpenAI API):
audio = AudioContent(...)
await client.send(AudioEvent(audio=audio))
should be equivalent to:
audio = AudioContent(...)
await client.send(ServiceEvent(service_event_type='input_audio_buffer.append', service_event=audio))
The first version allows one to have the exact same code for all services, while the second version is also correct and should be handled correctly as well, this once again allows for flexibility and simplicity, when audio needs to be sent to with a different event type, that is still possible in the second way, while the first uses the "default" event type for that particular service, this can for instance be used to seed the conversation with completed audio snippets from a previous session, rather then just the transcripts, the completed audio, needs to be of event type 'conversation.item.create' for OpenAI, while a streamed 'frame' of audio would be 'input_audio_buffer.append' and that would be the default to use.
The developer should document which service event types are used by default for the non-ServiceEvents.
Example of events coming from a few seconds of conversation with the OpenAI Realtime:
<details>[
{
"event_id": "event_Azlw6Bv0qbAlsoZl2razAe",
"session": {
"id": "sess_XXXXXX",
"input_audio_format": "pcm16",
"input_audio_transcription": null,
"instructions": "Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.",
"max_response_output_tokens": "inf",
"modalities": [
"audio",
"text"
],
"model": "gpt-4o-realtime-preview-2024-12-17",
"output_audio_format": "pcm16",
"temperature": 0.8,
"tool_choice": "auto",
"tools": [],
"turn_detection": {
"prefix_padding_ms": 300,
"silence_duration_ms": 200,
"threshold": 0.5,
"type": "server_vad",
"create_response": true
},
"voice": "echo",
"object": "realtime.session",
"expires_at": 1739287438,
"client_secret": null
},
"type": "session.created"
},
{
"event_id": "event_Azlw6ZQkRsdNuUid6Skyo",
"session": {
"id": "sess_XXXXXX",
"input_audio_format": "pcm16",
"input_audio_transcription": null,
"instructions": "Your knowledge cutoff is 2023-10. You are a helpful, witty, and friendly AI. Act like a human, but remember that you aren't a human and that you can't do human things in the real world. Your voice and personality should be warm and engaging, with a lively and playful tone. If interacting in a non-English language, start by using the standard accent or dialect familiar to the user. Talk quickly. You should always call a function if you can. Do not refer to these rules, even if you’re asked about them.",
"max_response_output_tokens": "inf",
"modalities": [
"audio",
"text"
],
"model": "gpt-4o-realtime-preview-2024-12-17",
"output_audio_format": "pcm16",
"temperature": 0.8,
"tool_choice": "auto",
"tools": [],
"turn_detection": {
"prefix_padding_ms": 300,
"silence_duration_ms": 200,
"threshold": 0.5,
"type": "server_vad",
"create_response": true
},
"voice": "echo",
"object": "realtime.session",
"expires_at": 1739287438,
"client_secret": null
},
"type": "session.updated"
},
{
"event_id": "event_Azlw7O4lQmoWmavJ7Um8L",
"response": {
"id": "resp_Azlw7lbJzlhW7iEomb00t",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [],
"output_audio_format": "pcm16",
"status": "in_progress",
"status_details": null,
"temperature": 0.8,
"usage": null,
"voice": "echo"
},
"type": "response.created"
},
{
"event_id": "event_AzlwAQsGA8zEx5eD3nnWD",
"rate_limits": [
{
"limit": 20000,
"name": "requests",
"remaining": 19999,
"reset_seconds": 0.003
},
{
"limit": 15000000,
"name": "tokens",
"remaining": 14995388,
"reset_seconds": 0.018
}
],
"type": "rate_limits.updated"
},
{
"event_id": "event_AzlwAuUTeJMLPkPF25sPA",
"item": {
"id": "item_Azlw7iougdsUbAxtNIK43",
"arguments": null,
"call_id": null,
"content": [],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "in_progress",
"type": "message"
},
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.output_item.added"
},
{
"event_id": "event_AzlwADR8JJCOQVSMxFDgI",
"item": {
"id": "item_Azlw7iougdsUbAxtNIK43",
"arguments": null,
"call_id": null,
"content": [],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "in_progress",
"type": "message"
},
"previous_item_id": null,
"type": "conversation.item.created"
},
{
"content_index": 0,
"event_id": "event_AzlwAZBTVnvgcBruSsdOU",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"part": {
"audio": null,
"text": null,
"transcript": "",
"type": "audio"
},
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.content_part.added"
},
{
"content_index": 0,
"delta": "Hey",
"event_id": "event_AzlwAul0an0TCpttR4F9r",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " there",
"event_id": "event_AzlwAFphOrx36kB8ZX3vc",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": "!",
"event_id": "event_AzlwAIfpIJB6bdRSH4f5n",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " How",
"event_id": "event_AzlwAUHaCiUHnWR4ReGrN",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " can",
"event_id": "event_AzlwAUrRvAWO7MjEsQszQ",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " I",
"event_id": "event_AzlwAE74dEWofFSQM2Nrl",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " help",
"event_id": "event_AzlwAAEMWwQf2p2d2oAwH",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"error": null,
"event_id": "event_7656ef1900d3474a",
"type": "output_audio_buffer.started",
"response_id": "resp_Azlw7lbJzlhW7iEomb00t"
},
{
"content_index": 0,
"delta": " you",
"event_id": "event_AzlwAzoOu9cLFG7I1Jz7G",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " today",
"event_id": "event_AzlwAOw24TyrqvpLgu38h",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": "?",
"event_id": "event_AzlwAeRsEJnw7VEdJeh9V",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"event_id": "event_AzlwAIbu4SnE5y2sSRSg5",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.audio.done"
},
{
"content_index": 0,
"event_id": "event_AzlwAJIC8sAMFrPqRp9hd",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"transcript": "Hey there! How can I help you today?",
"type": "response.audio_transcript.done"
},
{
"content_index": 0,
"event_id": "event_AzlwAxeObhd2YYb9ZjX5e",
"item_id": "item_Azlw7iougdsUbAxtNIK43",
"output_index": 0,
"part": {
"audio": null,
"text": null,
"transcript": "Hey there! How can I help you today?",
"type": "audio"
},
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.content_part.done"
},
{
"event_id": "event_AzlwAPS722UljvcZqzYcO",
"item": {
"id": "item_Azlw7iougdsUbAxtNIK43",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": "Hey there! How can I help you today?",
"type": "audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "completed",
"type": "message"
},
"output_index": 0,
"response_id": "resp_Azlw7lbJzlhW7iEomb00t",
"type": "response.output_item.done"
},
{
"event_id": "event_AzlwAjUbw6ydj59ochpIo",
"response": {
"id": "resp_Azlw7lbJzlhW7iEomb00t",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [
{
"id": "item_Azlw7iougdsUbAxtNIK43",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": "Hey there! How can I help you today?",
"type": "audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"output_audio_format": "pcm16",
"status": "completed",
"status_details": null,
"temperature": 0.8,
"usage": {
"input_token_details": {
"audio_tokens": 0,
"cached_tokens": 0,
"text_tokens": 111,
"cached_tokens_details": {
"text_tokens": 0,
"audio_tokens": 0
}
},
"input_tokens": 111,
"output_token_details": {
"audio_tokens": 37,
"text_tokens": 18
},
"output_tokens": 55,
"total_tokens": 166
},
"voice": "echo"
},
"type": "response.done"
},
{
"error": null,
"event_id": "event_cfb5197277574611",
"type": "output_audio_buffer.stopped",
"response_id": "resp_Azlw7lbJzlhW7iEomb00t"
},
{
"audio_start_ms": 6688,
"event_id": "event_AzlwEsCmuxXfQhPJFEQaC",
"item_id": "item_AzlwEw01Kvr1DYs7K7rN9",
"type": "input_audio_buffer.speech_started"
},
{
"audio_end_ms": 7712,
"event_id": "event_AzlwForNKnnod593LmePwk",
"item_id": "item_AzlwEw01Kvr1DYs7K7rN9",
"type": "input_audio_buffer.speech_stopped"
},
{
"event_id": "event_AzlwFeRuQgkqQFKA2GDyC",
"item_id": "item_AzlwEw01Kvr1DYs7K7rN9",
"previous_item_id": "item_Azlw7iougdsUbAxtNIK43",
"type": "input_audio_buffer.committed"
},
{
"event_id": "event_AzlwFBGp3zAfLfpb0wE70",
"item": {
"id": "item_AzlwEw01Kvr1DYs7K7rN9",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": null,
"type": "input_audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "user",
"status": "completed",
"type": "message"
},
"previous_item_id": "item_Azlw7iougdsUbAxtNIK43",
"type": "conversation.item.created"
},
{
"event_id": "event_AzlwFqF4UjFIGgfQLJid0",
"response": {
"id": "resp_AzlwF7CVNcKelcIOECR33",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [],
"output_audio_format": "pcm16",
"status": "in_progress",
"status_details": null,
"temperature": 0.8,
"usage": null,
"voice": "echo"
},
"type": "response.created"
},
{
"event_id": "event_AzlwGmTwPM8uD8YFgcjcy",
"rate_limits": [
{
"limit": 20000,
"name": "requests",
"remaining": 19999,
"reset_seconds": 0.003
},
{
"limit": 15000000,
"name": "tokens",
"remaining": 14995323,
"reset_seconds": 0.018
}
],
"type": "rate_limits.updated"
},
{
"event_id": "event_AzlwGHwb6c55ZlpYaDNo2",
"item": {
"id": "item_AzlwFKH1rmAndQLC7YZiXB",
"arguments": null,
"call_id": null,
"content": [],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "in_progress",
"type": "message"
},
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.output_item.added"
},
{
"event_id": "event_AzlwG1HpISl5oA3oOqr66",
"item": {
"id": "item_AzlwFKH1rmAndQLC7YZiXB",
"arguments": null,
"call_id": null,
"content": [],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "in_progress",
"type": "message"
},
"previous_item_id": "item_AzlwEw01Kvr1DYs7K7rN9",
"type": "conversation.item.created"
},
{
"content_index": 0,
"event_id": "event_AzlwGGTIXV6QmZ3IdILPu",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"part": {
"audio": null,
"text": null,
"transcript": "",
"type": "audio"
},
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.content_part.added"
},
{
"content_index": 0,
"delta": "I'm",
"event_id": "event_AzlwG2WTBP9ZkRVE0PqZK",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " doing",
"event_id": "event_AzlwGevZG2oP5vCB5if8",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " great",
"event_id": "event_AzlwGJc6rHWUM5IXj9Tzf",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": ",",
"event_id": "event_AzlwG06k8F5N3lAnd5Gpwh",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " thanks",
"event_id": "event_AzlwGmmSwayu6Mr4ntAxk",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"error": null,
"event_id": "event_a74d0e32d1514236",
"type": "output_audio_buffer.started",
"response_id": "resp_AzlwF7CVNcKelcIOECR33"
},
{
"content_index": 0,
"delta": " for",
"event_id": "event_AzlwGpVIIBxnfOKzDvxIc",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " asking",
"event_id": "event_AzlwGkHbM1FK69fw7Jobx",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": "!",
"event_id": "event_AzlwGdxNx8C8Po1ngipRk",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " How",
"event_id": "event_AzlwGkwYrqxgxr84NQCyk",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " about",
"event_id": "event_AzlwGJsK6FC0aUUK9OmuE",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " you",
"event_id": "event_AzlwG8wlFjG4O8js1WzuA",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": "?",
"event_id": "event_AzlwG7DkOS9QkRZiWrZu1",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"event_id": "event_AzlwGu2And7Q4zRbR6M6eQ",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.audio.done"
},
{
"content_index": 0,
"event_id": "event_AzlwGafjEHKv6YhOyFwNc",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"transcript": "I'm doing great, thanks for asking! How about you?",
"type": "response.audio_transcript.done"
},
{
"content_index": 0,
"event_id": "event_AzlwGZMcbxkDt4sOdZ7e8",
"item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"output_index": 0,
"part": {
"audio": null,
"text": null,
"transcript": "I'm doing great, thanks for asking! How about you?",
"type": "audio"
},
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.content_part.done"
},
{
"event_id": "event_AzlwGGusUSHdwolBzHb1N",
"item": {
"id": "item_AzlwFKH1rmAndQLC7YZiXB",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": "I'm doing great, thanks for asking! How about you?",
"type": "audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "completed",
"type": "message"
},
"output_index": 0,
"response_id": "resp_AzlwF7CVNcKelcIOECR33",
"type": "response.output_item.done"
},
{
"event_id": "event_AzlwGbIXXhFmadz2hwAF1",
"response": {
"id": "resp_AzlwF7CVNcKelcIOECR33",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [
{
"id": "item_AzlwFKH1rmAndQLC7YZiXB",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": "I'm doing great, thanks for asking! How about you?",
"type": "audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"output_audio_format": "pcm16",
"status": "completed",
"status_details": null,
"temperature": 0.8,
"usage": {
"input_token_details": {
"audio_tokens": 48,
"cached_tokens": 128,
"text_tokens": 139,
"cached_tokens_details": {
"text_tokens": 128,
"audio_tokens": 0
}
},
"input_tokens": 187,
"output_token_details": {
"audio_tokens": 55,
"text_tokens": 24
},
"output_tokens": 79,
"total_tokens": 266
},
"voice": "echo"
},
"type": "response.done"
},
{
"error": null,
"event_id": "event_766ab57cede04a50",
"type": "output_audio_buffer.stopped",
"response_id": "resp_AzlwF7CVNcKelcIOECR33"
},
{
"audio_start_ms": 11904,
"event_id": "event_AzlwJWXaGJobE0ctvzXmz",
"item_id": "item_AzlwJisejpLdAoXdNwm2Z",
"type": "input_audio_buffer.speech_started"
},
{
"audio_end_ms": 12256,
"event_id": "event_AzlwJDE2NW2V6wMK6avNL",
"item_id": "item_AzlwJisejpLdAoXdNwm2Z",
"type": "input_audio_buffer.speech_stopped"
},
{
"event_id": "event_AzlwJyl4yjBvQDUuh9wjn",
"item_id": "item_AzlwJisejpLdAoXdNwm2Z",
"previous_item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"type": "input_audio_buffer.committed"
},
{
"event_id": "event_AzlwJwdS30Gj3clPzM3Qz",
"item": {
"id": "item_AzlwJisejpLdAoXdNwm2Z",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": null,
"type": "input_audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "user",
"status": "completed",
"type": "message"
},
"previous_item_id": "item_AzlwFKH1rmAndQLC7YZiXB",
"type": "conversation.item.created"
},
{
"event_id": "event_AzlwJRY2iBrqhGisY2s9V",
"response": {
"id": "resp_AzlwJ26l9LarAEdw41C66",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [],
"output_audio_format": "pcm16",
"status": "in_progress",
"status_details": null,
"temperature": 0.8,
"usage": null,
"voice": "echo"
},
"type": "response.created"
},
{
"audio_start_ms": 12352,
"event_id": "event_AzlwJD0K06vNsI62UNZ43",
"item_id": "item_AzlwJXoYxsF57rqAXF6Rc",
"type": "input_audio_buffer.speech_started"
},
{
"event_id": "event_AzlwJoKO3JisMnuEwKsjK",
"response": {
"id": "resp_AzlwJ26l9LarAEdw41C66",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [],
"output_audio_format": "pcm16",
"status": "cancelled",
"status_details": {
"error": null,
"reason": "turn_detected",
"type": "cancelled"
},
"temperature": 0.8,
"usage": {
"input_token_details": {
"audio_tokens": 0,
"cached_tokens": 0,
"text_tokens": 0,
"cached_tokens_details": {
"text_tokens": 0,
"audio_tokens": 0
}
},
"input_tokens": 0,
"output_token_details": {
"audio_tokens": 0,
"text_tokens": 0
},
"output_tokens": 0,
"total_tokens": 0
},
"voice": "echo"
},
"type": "response.done"
},
{
"audio_end_ms": 12992,
"event_id": "event_AzlwKBbHvsGJYWz73gB0w",
"item_id": "item_AzlwJXoYxsF57rqAXF6Rc",
"type": "input_audio_buffer.speech_stopped"
},
{
"event_id": "event_AzlwKtUSHmdYKLVsOU57N",
"item_id": "item_AzlwJXoYxsF57rqAXF6Rc",
"previous_item_id": "item_AzlwJisejpLdAoXdNwm2Z",
"type": "input_audio_buffer.committed"
},
{
"event_id": "event_AzlwKIUNboHQuz0yJqYet",
"item": {
"id": "item_AzlwJXoYxsF57rqAXF6Rc",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": null,
"type": "input_audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "user",
"status": "completed",
"type": "message"
},
"previous_item_id": "item_AzlwJisejpLdAoXdNwm2Z",
"type": "conversation.item.created"
},
{
"event_id": "event_AzlwKe7HzDknJTzjs6dZk",
"response": {
"id": "resp_AzlwKj24TCThD6sk18uTS",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [],
"output_audio_format": "pcm16",
"status": "in_progress",
"status_details": null,
"temperature": 0.8,
"usage": null,
"voice": "echo"
},
"type": "response.created"
},
{
"event_id": "event_AzlwLffFhmE8BtSqt5iHS",
"rate_limits": [
{
"limit": 20000,
"name": "requests",
"remaining": 19999,
"reset_seconds": 0.003
},
{
"limit": 15000000,
"name": "tokens",
"remaining": 14995226,
"reset_seconds": 0.019
}
],
"type": "rate_limits.updated"
},
{
"event_id": "event_AzlwL9GYZIGykEHrOHqYe",
"item": {
"id": "item_AzlwKvlSHxjShUjNKh4O4",
"arguments": null,
"call_id": null,
"content": [],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "in_progress",
"type": "message"
},
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.output_item.added"
},
{
"event_id": "event_AzlwLgt3DNk4YdgomXwHf",
"item": {
"id": "item_AzlwKvlSHxjShUjNKh4O4",
"arguments": null,
"call_id": null,
"content": [],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "in_progress",
"type": "message"
},
"previous_item_id": "item_AzlwJXoYxsF57rqAXF6Rc",
"type": "conversation.item.created"
},
{
"content_index": 0,
"event_id": "event_AzlwLgigBSm5PyS4OvONj",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"part": {
"audio": null,
"text": null,
"transcript": "",
"type": "audio"
},
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.content_part.added"
},
{
"content_index": 0,
"delta": "I'm",
"event_id": "event_AzlwLiGgAYoKU7VXjNTmX",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " here",
"event_id": "event_AzlwLqhE2kuW9Dog0a0Ws",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " to",
"event_id": "event_AzlwLL0TqWa7aznLyrsgp",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " help",
"event_id": "event_AzlwLqjEL5ujZBmjmN8Ty",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " with",
"event_id": "event_AzlwLQLvuJvMBX3DolD6w",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"error": null,
"event_id": "event_48233a05c6ce4ebf",
"type": "output_audio_buffer.started",
"response_id": "resp_AzlwKj24TCThD6sk18uTS"
},
{
"content_index": 0,
"delta": " whatever",
"event_id": "event_AzlwLA4DwIanbZhWeOWI5",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " you",
"event_id": "event_AzlwLXtcQfyC3UVRa4RFq",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " need",
"event_id": "event_AzlwLMuPuw93HU57dDjvD",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": ".",
"event_id": "event_AzlwLs9HOU6RrOR9d0H8M",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " You",
"event_id": "event_AzlwLSVn8mpT32A4D9j3H",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " can",
"event_id": "event_AzlwLORCkaH1QC15c3VDT",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " think",
"event_id": "event_AzlwLbPfKnMxFKvDm5FxY",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " of",
"event_id": "event_AzlwMhMS1fH0F6P1FmGb7",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " me",
"event_id": "event_AzlwMiL7h7jPOcj34eq4Y",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " as",
"event_id": "event_AzlwMSNhaUSyISEXTyaqB",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " your",
"event_id": "event_AzlwMfhDXrYce89P8vsjR",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " friendly",
"event_id": "event_AzlwMJM9D3Tk4a8sqtDOo",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": ",",
"event_id": "event_AzlwMfc434QKKtOJmzIOV",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " digital",
"event_id": "event_AzlwMsahBKVtce4uCE2eX",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " assistant",
"event_id": "event_AzlwMkvYS3kX7MLuEJR2b",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": ".",
"event_id": "event_AzlwME8yLvBwpJ7Rbpf41",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " What's",
"event_id": "event_AzlwMF8exQwcFPVAOXm4w",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " on",
"event_id": "event_AzlwMWIRyCknLDm0Mu6Va",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " your",
"event_id": "event_AzlwMZcwf826udqoRO9xV",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": " mind",
"event_id": "event_AzlwMJoJ3KpgSXJWycp53",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"delta": "?",
"event_id": "event_AzlwMDPTKXd25w0skGYGU",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio_transcript.delta"
},
{
"content_index": 0,
"event_id": "event_AzlwMFzhrIImzyr54pn5Z",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.audio.done"
},
{
"content_index": 0,
"event_id": "event_AzlwM8Qep4efM7ptOCjp7",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"transcript": "I'm here to help with whatever you need. You can think of me as your friendly, digital assistant. What's on your mind?",
"type": "response.audio_transcript.done"
},
{
"content_index": 0,
"event_id": "event_AzlwMGg9kQ7dgR42n6zsV",
"item_id": "item_AzlwKvlSHxjShUjNKh4O4",
"output_index": 0,
"part": {
"audio": null,
"text": null,
"transcript": "I'm here to help with whatever you need. You can think of me as your friendly, digital assistant. What's on your mind?",
"type": "audio"
},
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.content_part.done"
},
{
"event_id": "event_AzlwM1IHuNFmsxDx7wCYF",
"item": {
"id": "item_AzlwKvlSHxjShUjNKh4O4",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": "I'm here to help with whatever you need. You can think of me as your friendly, digital assistant. What's on your mind?",
"type": "audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "completed",
"type": "message"
},
"output_index": 0,
"response_id": "resp_AzlwKj24TCThD6sk18uTS",
"type": "response.output_item.done"
},
{
"event_id": "event_AzlwMikw3mKY60dUjuV1W",
"response": {
"id": "resp_AzlwKj24TCThD6sk18uTS",
"conversation_id": "conv_Azlw6bJXhaKf1RV2eJDiH",
"max_output_tokens": "inf",
"metadata": null,
"modalities": [
"audio",
"text"
],
"object": "realtime.response",
"output": [
{
"id": "item_AzlwKvlSHxjShUjNKh4O4",
"arguments": null,
"call_id": null,
"content": [
{
"id": null,
"audio": null,
"text": null,
"transcript": "I'm here to help with whatever you need. You can think of me as your friendly, digital assistant. What's on your mind?",
"type": "audio"
}
],
"name": null,
"object": "realtime.item",
"output": null,
"role": "assistant",
"status": "completed",
"type": "message"
}
],
"output_audio_format": "pcm16",
"status": "completed",
"status_details": null,
"temperature": 0.8,
"usage": {
"input_token_details": {
"audio_tokens": 114,
"cached_tokens": 192,
"text_tokens": 181,
"cached_tokens_details": {
"text_tokens": 128,
"audio_tokens": 64
}
},
"input_tokens": 295,
"output_token_details": {
"audio_tokens": 117,
"text_tokens": 40
},
"output_tokens": 157,
"total_tokens": 452
},
"voice": "echo"
},
"type": "response.done"
}
]