docs/cookbooks/companions/voice-companion-openai.mdx
This guide demonstrates how to combine OpenAI's Agents SDK for voice applications with Mem0's memory capabilities to create a voice assistant that remembers user preferences and past interactions.
Before you begin, make sure you have the required packages installed:

```bash
pip install 'openai-agents[voice]'
pip install mem0ai
pip install numpy sounddevice pydantic
```
Let's break down the key components of this implementation:
```python
import os

# OpenAI Agents SDK imports
from agents import (
    Agent,
    function_tool
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions

# Mem0 imports
from mem0 import AsyncMemoryClient

# Set up API keys (replace with your actual keys)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
os.environ["MEM0_API_KEY"] = "your-mem0-api-key"

# Define a global user ID for simplicity
USER_ID = "voice_user"

# Initialize Mem0 client
mem0_client = AsyncMemoryClient()
```
This section handles importing the required libraries, setting your API keys, defining a global user ID, and initializing the Mem0 async client.
The @function_tool decorator transforms Python functions into callable tools for the OpenAI agent. Here are the key memory tools:
```python
import logging

# Set up logging at the top of your file
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    force=True
)
logger = logging.getLogger("memory_voice_agent")

# Then use logger in your function tools
@function_tool
async def save_memories(
    memory: str
) -> str:
    """Store a memory for the current user."""
    # This will be visible in your console
    logger.debug(f"Saving memory: {memory} for user {USER_ID}")

    # Store the preference in Mem0
    memory_content = f"User memory - {memory}"
    await mem0_client.add(
        memory_content,
        user_id=USER_ID,
    )
    return f"I've saved your memory: {memory}"
```
This function logs the incoming memory, stores it in Mem0 via the `add()` method, and returns a spoken confirmation.

```python
@function_tool
async def search_memories(
    query: str
) -> str:
    """
    Find memories relevant to the current conversation.

    Args:
        query: The search query to find relevant memories
    """
    print(f"Finding memories related to: {query}")
    results = await mem0_client.search(
        query,
        filters={"user_id": USER_ID},
        top_k=5,
        threshold=0.7,  # Higher threshold for more relevant results
    )

    # Format and return the results
    if not results.get('results', []):
        return "I don't have any relevant memories about this topic."
    memories = [f"• {result['memory']}" for result in results.get('results', [])]
    return "Here's what I remember that might be relevant:\n" + "\n".join(memories)
```
This tool searches Mem0 for memories relevant to the query, restricts results to the current user, and formats any matches into a short, voice-friendly summary.
```python
def create_memory_voice_agent():
    # Create the agent with memory-enabled tools
    agent = Agent(
        name="Memory Assistant",
        instructions=prompt_with_handoff_instructions(
            """You're speaking to a human, so be polite and concise.
            Always respond in clear, natural English.
            You have the ability to remember information about the user.
            Use the save_memories tool when the user shares important information worth remembering.
            Use the search_memories tool when you need context from past conversations or the user asks you to recall something.
            """,
        ),
        model="gpt-5-mini",
        tools=[save_memories, search_memories],
    )
    return agent
```
This function creates the agent, registers the two memory tools, and uses `prompt_with_handoff_instructions` to include the SDK's standard voice agent behaviors.

```python
async def record_from_microphone(duration=5, samplerate=24000):
    """Record audio from the microphone for a specified duration."""
    print(f"Recording for {duration} seconds...")

    # Create a buffer to store the recorded audio
    frames = []

    # Callback function to store audio data
    def callback(indata, frames_count, time_info, status):
        frames.append(indata.copy())

    # Start recording
    with sd.InputStream(samplerate=samplerate, channels=1, callback=callback, dtype=np.int16):
        await asyncio.sleep(duration)

    # Combine all frames into a single numpy array
    audio_data = np.concatenate(frames)
    return audio_data
```
This function captures microphone input for a fixed duration and returns it as a single NumPy `int16` array, the format the voice pipeline's `AudioInput` expects.
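As a sanity check on the recording parameters, you can compute the buffer size a recording should produce from the duration and sample rate. This is a standalone sketch; the `expected_samples` helper is illustrative and not part of either SDK:

```python
import numpy as np

def expected_samples(duration_s, samplerate):
    """Number of mono samples a recording of this length should contain."""
    return int(duration_s * samplerate)

# A 5-second mono recording at 24 kHz, stored as int16 like the pipeline expects:
buf = np.zeros(expected_samples(5, 24000), dtype=np.int16)
print(buf.shape[0])  # 120000 samples
print(buf.nbytes)    # 240000 bytes (2 bytes per int16 sample)
```

Comparing `audio_data.shape[0]` against this value after recording is a quick way to catch a misconfigured sample rate or a silent input device.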
```python
async def main():
    # Create the agent
    agent = create_memory_voice_agent()

    # Set up the voice pipeline
    pipeline = VoicePipeline(
        workflow=SingleAgentVoiceWorkflow(agent)
    )

    # Configure TTS settings
    pipeline.config.tts_settings.voice = "alloy"
    pipeline.config.tts_settings.speed = 1.0

    try:
        while True:
            # Get user input
            print("\nPress Enter to start recording (or 'q' to quit)...")
            user_input = input()
            if user_input.lower() == 'q':
                break

            # Record and process audio
            audio_data = await record_from_microphone(duration=5)
            audio_input = AudioInput(buffer=audio_data)
            result = await pipeline.run(audio_input)

            # Play response and handle events
            player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
            player.start()

            agent_response = ""
            print("\nAgent response:")
            async for event in result.stream():
                if event.type == "voice_stream_event_audio":
                    player.write(event.data)
                elif event.type == "voice_stream_event_content":
                    content = event.data
                    agent_response += content
                    print(content, end="", flush=True)

            # Save the agent's response to memory
            if agent_response:
                try:
                    await mem0_client.add(
                        f"Agent response: {agent_response}",
                        user_id=USER_ID,
                        metadata={"type": "agent_response"}
                    )
                except Exception as e:
                    print(f"Failed to store memory: {e}")
    except KeyboardInterrupt:
        print("\nExiting...")
```
This main function orchestrates the entire process: it records audio, runs it through the voice pipeline, plays back the spoken response while printing the text, and stores the exchange in Mem0.
Now that we've explained each component, here's the complete implementation that combines OpenAI Agents SDK for voice with Mem0's memory capabilities:
```python
import asyncio
import os
import logging
from typing import Optional, List, Dict, Any

import numpy as np
import sounddevice as sd
from pydantic import BaseModel

# OpenAI Agents SDK imports
from agents import (
    Agent,
    function_tool
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions

# Mem0 imports
from mem0 import AsyncMemoryClient

# Set up API keys (replace with your actual keys)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
os.environ["MEM0_API_KEY"] = "your-mem0-api-key"

# Define a global user ID for simplicity
USER_ID = "voice_user"

# Initialize Mem0 client
mem0_client = AsyncMemoryClient()

# Create tools that utilize Mem0's memory
@function_tool
async def save_memories(
    memory: str
) -> str:
    """
    Store a memory for the current user.

    Args:
        memory: The memory to save
    """
    print(f"Saving memory: {memory} for user {USER_ID}")

    # Store the preference in Mem0
    memory_content = f"User memory - {memory}"
    await mem0_client.add(
        memory_content,
        user_id=USER_ID,
    )
    return f"I've saved your memory: {memory}"

@function_tool
async def search_memories(
    query: str
) -> str:
    """
    Find memories relevant to the current conversation.

    Args:
        query: The search query to find relevant memories
    """
    print(f"Finding memories related to: {query}")
    results = await mem0_client.search(
        query,
        filters={"user_id": USER_ID},
        top_k=5,
        threshold=0.7,  # Higher threshold for more relevant results
    )

    # Format and return the results
    if not results.get('results', []):
        return "I don't have any relevant memories about this topic."
    memories = [f"• {result['memory']}" for result in results.get('results', [])]
    return "Here's what I remember that might be relevant:\n" + "\n".join(memories)

# Create the agent with memory-enabled tools
def create_memory_voice_agent():
    agent = Agent(
        name="Memory Assistant",
        instructions=prompt_with_handoff_instructions(
            """You're speaking to a human, so be polite and concise.
            Always respond in clear, natural English.
            You have the ability to remember information about the user.
            Use the save_memories tool when the user shares important information worth remembering.
            Use the search_memories tool when you need context from past conversations or the user asks you to recall something.
            """,
        ),
        model="gpt-5-mini",
        tools=[save_memories, search_memories],
    )
    return agent

async def record_from_microphone(duration=5, samplerate=24000):
    """Record audio from the microphone for a specified duration."""
    print(f"Recording for {duration} seconds...")

    # Create a buffer to store the recorded audio
    frames = []

    # Callback function to store audio data
    def callback(indata, frames_count, time_info, status):
        frames.append(indata.copy())

    # Start recording
    with sd.InputStream(samplerate=samplerate, channels=1, callback=callback, dtype=np.int16):
        await asyncio.sleep(duration)

    # Combine all frames into a single numpy array
    audio_data = np.concatenate(frames)
    return audio_data

async def main():
    print("Starting Memory Voice Agent")

    # Create the agent and context
    agent = create_memory_voice_agent()

    # Set up the voice pipeline
    pipeline = VoicePipeline(
        workflow=SingleAgentVoiceWorkflow(agent)
    )

    # Configure TTS settings
    pipeline.config.tts_settings.voice = "alloy"
    pipeline.config.tts_settings.speed = 1.0

    try:
        while True:
            # Get user input
            print("\nPress Enter to start recording (or 'q' to quit)...")
            user_input = input()
            if user_input.lower() == 'q':
                break

            # Record and process audio
            audio_data = await record_from_microphone(duration=5)
            audio_input = AudioInput(buffer=audio_data)
            print("Processing your request...")

            # Process the audio input
            result = await pipeline.run(audio_input)

            # Create an audio player
            player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
            player.start()

            # Store the agent's response for adding to memory
            agent_response = ""
            print("\nAgent response:")

            # Play the audio stream as it comes in
            async for event in result.stream():
                if event.type == "voice_stream_event_audio":
                    player.write(event.data)
                elif event.type == "voice_stream_event_content":
                    # Accumulate and print the text response
                    content = event.data
                    agent_response += content
                    print(content, end="", flush=True)

            print("\n")

            # Example of saving the conversation to Mem0 after completion
            if agent_response:
                try:
                    await mem0_client.add(
                        f"Agent response: {agent_response}",
                        user_id=USER_ID,
                        metadata={"type": "agent_response"}
                    )
                except Exception as e:
                    print(f"Failed to store memory: {e}")
    except KeyboardInterrupt:
        print("\nExiting...")

if __name__ == "__main__":
    asyncio.run(main())
```
This implementation offers several key features:
- **Simplified User Management**: Uses a global `USER_ID` variable for simplicity, but can be extended to manage multiple users.
- **Real Microphone Input**: Includes a `record_from_microphone()` function that captures actual voice input from your microphone.
- **Interactive Voice Loop**: Implements a continuous interaction loop, allowing for multiple back-and-forth exchanges.
- **Memory Management Tools**: `save_memories` stores user memories in Mem0, and `search_memories` retrieves relevant past information.
- **Voice Configuration**: Demonstrates how to configure TTS settings for the voice response.
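To extend the global `USER_ID` to multiple users, one option is to derive a stable Mem0 `user_id` from whatever speaker identifier your application has. This is a hypothetical sketch; `resolve_user_id` is not part of either SDK, and in a real app the identifier would likely come from your authentication layer:

```python
import hashlib

def resolve_user_id(speaker_name):
    """Derive a stable, opaque Mem0 user_id from a speaker identifier.

    Hypothetical helper: normalizes the name so casual variations
    ("Alice", " alice ") map to the same memory namespace.
    """
    digest = hashlib.sha256(speaker_name.strip().lower().encode()).hexdigest()
    return f"voice_user_{digest[:12]}"

# The same speaker always maps to the same id; different speakers do not.
assert resolve_user_id("Alice") == resolve_user_id("  alice ")
assert resolve_user_id("Alice") != resolve_user_id("Bob")
```

The resulting id can then be passed as `user_id=` to the Mem0 `add` and `search` calls instead of the global constant, giving each speaker an isolated memory store.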
To run this example, install the dependencies listed above, set your OpenAI and Mem0 API keys, and execute the script.
The agent will listen to your request, process it through the OpenAI model, utilize Mem0 for memory operations as needed, and respond both through text output and voice speech.
- **Optimizing Memory for Voice**: Keep memories concise and relevant for voice responses.
- **Forgetting Mechanism**: Implement a way to delete or expire memories that are no longer relevant.
- **Context Preservation**: Store enough context with each memory to make retrieval effective.
- **Error Handling**: Implement robust error handling for memory operations, as voice interactions should continue smoothly even if memory operations fail.
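One way to keep the voice loop running when Mem0 is unreachable is to route every memory call through a small guard that swallows failures and returns a fallback. This is a sketch under the assumption that degraded answers beat a crashed session; `safe_memory_op` is a hypothetical helper, shown here with a stub standing in for a real Mem0 call:

```python
import asyncio
import logging

logger = logging.getLogger("memory_voice_agent")

async def safe_memory_op(coro, fallback):
    """Await a memory operation, returning a fallback instead of raising,
    so the voice interaction continues even if Mem0 is unavailable."""
    try:
        return await coro
    except Exception as exc:
        logger.warning("Memory operation failed: %s", exc)
        return fallback

# Stub that always fails, standing in for e.g. mem0_client.add(...):
async def failing_add():
    raise ConnectionError("Mem0 unreachable")

result = asyncio.run(safe_memory_op(failing_add(), fallback="(memory unavailable)"))
print(result)  # (memory unavailable)
```

Inside the tools, a call like `await mem0_client.search(...)` could be wrapped the same way, with the fallback string returned to the agent so it can respond gracefully.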
By combining OpenAI's Agents SDK with Mem0's memory capabilities, you can create voice agents that maintain persistent memory of user preferences and past interactions. This significantly enhances the user experience by making conversations more natural and personalized.
As you build your voice application, experiment with different memory strategies and filtering approaches to find the optimal balance between comprehensive memory and efficient retrieval for your specific use case.
When working with the OpenAI Agents SDK, you might notice that regular print() statements inside @function_tool decorated functions don't appear in your console output. This is because the Agents SDK captures and redirects standard output when executing these functions.
To effectively debug your function tools, use Python's logging module instead:
```python
import logging

# Set up logging at the top of your file
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    force=True
)
logger = logging.getLogger("memory_voice_agent")

# Then use logger in your function tools
@function_tool
async def save_memories(
    memory: str
) -> str:
    """Store a memory for the current user."""
    # This will be visible in your console
    logger.debug(f"Saving memory: {memory} for user {USER_ID}")
    # Rest of your function...
```