docs/doc/developer/backend/chat_system.mdx
Omi's chat system is a sophisticated agentic AI pipeline that enables users to have intelligent conversations about their recorded memories, calendar events, health data, and more. This document provides a complete technical understanding of how questions flow through the system.
<CardGroup cols={3}>
  <Card title="Classifies" icon="filter">
    Determines if context is needed
  </Card>
  <Card title="Routes" icon="route">
    Simple, Agentic, or Persona path
  </Card>
  <Card title="Tools" icon="wrench">
    22+ integrated data sources
  </Card>
  <Card title="Retrieves" icon="magnifying-glass">
    Vector search & metadata filters
  </Card>
  <Card title="Cites" icon="link">
    Links to source conversations
  </Card>
  <Card title="Streams" icon="tower-broadcast">
    Real-time thinking & response
  </Card>
</CardGroup>

At a high level, a question flows through the system like this:

```mermaid
flowchart TD
    subgraph Client["📱 Flutter App"]
        Q[User Question]
    end
    subgraph Backend["🖥️ FastAPI Backend"]
        Router{LangGraph<br/>Router}
    end
    Q --> Router
    Router -->|Simple| NC[No Context Path]
    Router -->|Context Needed| A[Agentic Path]
    Router -->|Persona App| P[Persona Path]
    NC --> LLM1[Direct LLM<br/>Response]
    A --> Tools[Tool Calls<br/>22+ tools]
    Tools --> LLM2[LLM with<br/>Context]
    P --> LLM3[Persona LLM<br/>Response]
    LLM1 --> Stream[Streaming Response]
    LLM2 --> Stream
    LLM3 --> Stream
    Stream -->|with citations| Q
```
**No Context Path (Simple)**

**When triggered:** Simple greetings, general advice, brainstorming questions

**Classification criteria** (from the `requires_context()` function):
- Greetings: "Hi", "Hello", "How are you?"
- General knowledge: "What's the capital of France?"
- Advice without personal context: "Tips for productivity"
**Processing:**
```python
# Location: backend/utils/retrieval/graph.py
def no_context_conversation(state: ChatState, config: RunnableConfig):
    # Direct LLM call without tool access
    # Fast response, no memory retrieval
    ...
```
<Tip>
This path provides the fastest responses since no external data retrieval is needed.
</Tip>
**Agentic Path (Context Needed)**

**When triggered:** Questions requiring personal data, temporal queries, integration lookups
**Classification criteria:**
- References to "my", "I", personal data
- Temporal references: "yesterday", "last week", "this morning"
- Questions about conversations, memories, calendar, health
- Requests involving connected services
**Processing:**
```python
# Location: backend/utils/retrieval/agentic.py
# LangGraph ReAct agent with full tool access
# LLM autonomously decides which tools to call
# Can make multiple tool calls to gather comprehensive context
```
<Info>
This is the most powerful path - the LLM can call 22+ tools to gather comprehensive context before answering.
</Info>
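Concretely, the agentic path can be wired up with LangGraph's prebuilt ReAct agent. Here is a minimal sketch; `uid`, `question`, `system_prompt`, and `load_all_tools` are illustrative placeholders, not the actual Omi code:

```python
# Sketch of the agentic wiring using LangGraph's prebuilt ReAct agent.
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

llm = ChatOpenAI(model="gpt-5.1")  # main agent model (see model table below)
tools = load_all_tools(uid)        # hypothetical: core tools + enabled app tools

agent = create_react_agent(llm, tools, prompt=system_prompt)

# The agent loops autonomously: decide tool -> execute -> observe ->
# answer, or call more tools if context is still missing.
result = agent.invoke({"messages": [("user", question)]})
```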
**Persona Path**

**When triggered:** Questions directed at persona-based apps (e.g., "Ask Einstein")
**Processing:**
- Uses the app's configured `persona_prompt`
- Character-consistent responses
- May have limited tool access based on app configuration
<Note>
Persona apps can customize which tools are available, allowing for focused conversational experiences.
</Note>
The `requires_context()` function determines the routing path:

```python
# Location: backend/utils/retrieval/graph.py
def requires_context(messages: list) -> bool:
    """
    Uses gpt-4.1-mini for fast classification.
    Returns True if the question needs:
    - Personal memories/conversations
    - Calendar/email/health data
    - Temporal context
    - User-specific information
    """
```
The LangGraph ReAct agent follows this cycle:
<Steps>
  <Step title="Receive Question" icon="inbox">
    System prompt provides tool descriptions, user's timezone, and citation instructions
  </Step>
  <Step title="Decide Tools" icon="brain">
    LLM autonomously decides which tool(s) to call based on question intent
  </Step>
  <Step title="Execute Tools" icon="play">
    Tool calls are executed and results returned to the agent
  </Step>
  <Step title="Synthesize or Continue" icon="arrows-rotate">
    Agent synthesizes a response OR makes additional tool calls if more context is needed
  </Step>
  <Step title="Generate Answer" icon="message">
    Final answer generated with proper `[1][2]` citations linking to source conversations
  </Step>
</Steps>

**Conversation & Memory Tools**

| Tool | Purpose | Key Parameters |
|------|---------|----------------|
| `get_conversations_tool` | Retrieve by date range | `start_date`, `end_date`, `limit`, `include_transcript` |
| `search_conversations_tool` | Semantic search | `query`, `start_date`, `end_date`, `limit` |
| `get_memories_tool` | Personal facts about user | `limit`, `offset` |
**Action Item Tools**

| Tool | Purpose |
|------|---------|
| `get_action_items_tool` | Retrieve pending tasks |
| `create_action_item_tool` | Create new task |
| `update_action_item_tool` | Mark complete/update |
**Calendar Tools**

| Tool | Purpose |
|------|---------|
| `get_calendar_events_tool` | Fetch events by date/person |
| `create_calendar_event_tool` | Create meetings with attendees |
| `update_calendar_event_tool` | Modify existing events |
| `delete_calendar_event_tool` | Cancel meetings |
**Integration Tools**

| Tool | Service | Purpose |
|------|---------|---------|
| `get_gmail_messages_tool` | Gmail | Search emails |
| `get_whoop_sleep_tool` | Whoop | Sleep data |
| `get_whoop_recovery_tool` | Whoop | Recovery scores |
| `get_whoop_workout_tool` | Whoop | Workout history |
| `search_notion_pages_tool` | Notion | Search workspace |
| `get_twitter_tweets_tool` | Twitter/X | Recent tweets |
| `get_github_pull_requests_tool` | GitHub | Open PRs |
| `get_github_issues_tool` | GitHub | Open issues |
| `perplexity_web_search_tool` | Perplexity | Web search |
```python
# Location: backend/utils/retrieval/tools/app_tools.py
def load_app_tools(uid: str) -> List[Callable]:
    """
    Loads tools from user's enabled apps.
    Each app can define chat_tools in its configuration.
    """
```
<Tip>
See [Chat Tools for Apps](/doc/developer/apps/ChatTools) to learn how to build custom tools.
</Tip>
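For intuition, a chat tool is ultimately just a typed function the agent can call by name. A hypothetical sketch in LangChain's `@tool` style (`fetch_reading_list` is an invented app-side helper; the real app-tool schema is described in the guide above):

```python
# Illustrative only: a custom chat tool in LangChain's @tool style.
from langchain_core.tools import tool

@tool
def get_reading_list(limit: int = 10) -> str:
    """Return the user's saved reading list, most recent first."""
    items = fetch_reading_list(limit)  # hypothetical app-side fetch
    return "\n".join(f"- {item.title}" for item in items)
```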
The agent also enforces hard safety limits:

```python
# Maximum 10 tool calls per question (prevents runaway loops)
# Maximum 500K tokens in context (prevents context overflow)
# 30-second timeout per external API call
```
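Enforcing the tool-call cap amounts to counting iterations of the agent loop. A hypothetical guard (`agent_step` and the state keys are illustrative):

```python
# Hypothetical cap on agent iterations; mirrors the 10-call limit above.
MAX_TOOL_CALLS = 10

def run_with_cap(agent_step, state: dict) -> str:
    for _ in range(MAX_TOOL_CALLS):
        state = agent_step(state)  # one decide -> execute -> observe cycle
        if state.get("final_answer"):
            return state["final_answer"]
    # Cap reached: answer with whatever context was gathered so far
    return state.get("partial_answer", "")
```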
When the agent calls `search_conversations_tool`, the retrieval flow looks like this:

```mermaid
sequenceDiagram
    participant Agent as 🤖 LLM Agent
    participant Tool as 🔧 Vector Search Tool
    participant Embed as 📊 OpenAI Embeddings
    participant Pine as 🌲 Pinecone
    participant Fire as 🔥 Firestore
    Agent->>Tool: search_conversations_tool(query, dates)
    Tool->>Embed: embed_query("John project discussion")
    Embed-->>Tool: [0.012, -0.034, 0.056, ...] (3,072 dims)
    Tool->>Pine: query(vector, uid filter, date range)
    Pine-->>Tool: [conv_id_456, conv_id_789] ranked by similarity
    Tool->>Fire: get_conversations_by_id(ids)
    Fire-->>Tool: Full conversation data
    Tool-->>Agent: Formatted context with citations
```
**Vector Store Configuration**

| Setting | Value |
|---|---|
| Database | Pinecone (serverless) |
| Embedding Model | `text-embedding-3-large` (OpenAI) |
| Vector Dimensions | 3,072 |
| Namespace | `"ns1"` |
| Vector ID Format | `{uid}-{conversation_id}` |
**What Gets Embedded vs. Stored as Metadata**

| Data | Embedded? | Metadata? |
|---|---|---|
| Title | Yes | No |
| Overview/Summary | Yes | No |
| Action Items | Yes | No |
| Full Transcript | No (too large) | No |
| People Mentioned | No | Yes |
| Topics | No | Yes |
| Entities | No | Yes |
| Dates Mentioned | No | Yes |
| `created_at` | No | Yes (Unix timestamp) |
**Writing vectors:**

```python
# Location: backend/utils/conversations/process_conversation.py
# Triggered after conversation processing completes
def save_structured_vector(uid: str, conversation: Conversation):
    """
    1. Generate embedding from conversation.structured
       (title + overview + action_items + events)
    2. Extract metadata via LLM (people, topics, entities, dates)
    3. Upsert to Pinecone with metadata
    """
```
**Querying vectors:**

```python
# Location: backend/database/vector_db.py
def query_vectors(query: str, uid: str, starts_at: int, ends_at: int, k: int):
    """
    1. Embed query using text-embedding-3-large
    2. Query Pinecone with uid filter and optional date range
    3. Return top-k conversation IDs ranked by similarity
    """

def query_vectors_by_metadata(uid, vector, dates_filter, people, topics, entities, dates, limit):
    """
    Advanced query with metadata filters.
    Includes fallback: if no results with filters, retries without them.
    """
```
Memories are distinct from conversations: they are structured facts about the user, extracted over time.
| Category | Examples |
|---|---|
| `interesting` | Hobbies, opinions, stories |
| `system` | Preferences, habits |
| `manual` | User-defined facts |
**Extraction rules:**

```python
# From backend/utils/llm/chat.py
# Maximum 15 words per memory
# Must pass the "shareability test" - worth telling someone
# Max 2 interesting + 2 system memories per conversation
# NO duplicate/near-duplicate facts
# NO mundane details (eating, sleeping, commuting)
```

**Retrieval in chat:**

```python
# Tool: get_memories_tool
# Returns a formatted list of known facts about the user
# Used when questions ask "What do you know about me?"
```
**Chat sessions:**

```python
# Location: backend/database/chat.py
# Chat sessions group related messages
# Each session tracks:
#   - message_ids: List of message IDs
#   - file_ids: Uploaded files for this session
#   - openai_thread_id: For file-based chat
```

**Context window:**

```python
# Last 10 messages included in context
# Enables follow-up questions without re-stating context
# Older messages summarized or excluded
```
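A minimal sketch of the sliding-window behavior (the summarization step is elided and all names are illustrative):

```python
# Hypothetical context-window trim: keep the last 10 messages verbatim.
def build_chat_context(messages: list[dict], window: int = 10) -> list[dict]:
    recent = messages[-window:]
    older = messages[:-window]
    if older:
        # In practice older turns would be summarized; here we just note them.
        placeholder = {"role": "system",
                       "content": f"({len(older)} earlier messages omitted)"}
        return [placeholder] + recent
    return recent
```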
The LLM generates citations in `[1][2]` format:

```python
# Citation rules:
# - No space before citation: "discussed this[1]" not "discussed this [1]"
# - Citations map to conversation IDs
# - Post-processing extracts citations -> memories_id field
# - Frontend displays linked conversation cards
```
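The post-processing step boils down to a regex pass over the answer. A hypothetical sketch:

```python
# Hypothetical citation post-processing: map [n] markers back to the
# conversation IDs the tools returned, in order of first appearance.
import re

def extract_citations(answer: str, cited_conversations: list[str]) -> list[str]:
    ids = []
    for match in re.finditer(r"\[(\d+)\]", answer):
        idx = int(match.group(1)) - 1  # [1] is the first source
        if 0 <= idx < len(cited_conversations):
            conv_id = cited_conversations[idx]
            if conv_id not in ids:
                ids.append(conv_id)
    return ids  # stored on the message's memories_id field
```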
The main system prompt includes:

```python
# Location: backend/utils/llm/chat.py - _get_agentic_qa_prompt()
# 1. Current datetime in user's timezone
# 2. Tool usage instructions
# 3. DateTime formatting rules for tool calls
# 4. Conversation retrieval strategies (5-step strategy)
# 5. Citation format instructions
# 6. Memory extraction guidelines
```
# Good: "2024-01-19T00:00:00-08:00"
# Bad: "yesterday", "last week" (must be converted)
# The system prompt instructs the LLM to convert relative
# references to absolute ISO timestamps before tool calls
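For reference, converting "yesterday" into the required ISO bounds in the user's timezone looks roughly like this (a sketch, not Omi code):

```python
# Hypothetical helper showing the conversion the LLM is asked to perform.
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

def yesterday_bounds(tz_name: str) -> tuple[str, str]:
    tz = ZoneInfo(tz_name)
    yesterday = datetime.now(tz).date() - timedelta(days=1)
    start = datetime.combine(yesterday, time.min, tzinfo=tz)
    end = datetime.combine(yesterday, time.max.replace(microsecond=0), tzinfo=tz)
    return start.isoformat(), end.isoformat()

# yesterday_bounds("America/Los_Angeles")
# -> ("2024-01-19T00:00:00-08:00", "2024-01-19T23:59:59-08:00")
```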
The system prompt guides the LLM through a 5-step conversation-retrieval strategy; in brief: `get_conversations` for date-based queries, vector search for topic-based queries.

**Model Selection**

| Model | Use Case | Location |
|---|---|---|
| `gpt-4.1-mini` | Fast classification, date extraction | `requires_context()`, filters |
| `gpt-4.1` | Medium complexity, initial QA | QA with RAG context |
| `gpt-5.1` | Agentic workflows with tool calling | Main chat agent |
| `text-embedding-3-large` | Vector embeddings (3,072 dims) | Pinecone queries |
| Gemini Flash 1.5 | Persona responses | Via OpenRouter |
| Claude 3.5 Sonnet | Persona responses | Via OpenRouter |
The backend streams responses in Server-Sent Events (SSE) format:

```text
think: Searching conversations    # Tool call indicator
data: Yesterday you discussed...  # Response text chunks
done: {base64 encoded JSON}       # Final message with metadata
```

The Flutter app parses these events to render live thinking indicators, the streamed response text, and the final message with its citation links.
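On the backend side, FastAPI's `StreamingResponse` is the natural fit for this format. A hedged sketch (the route path, payload shape, and `generate_answer` generator are assumptions, not the exact Omi implementation):

```python
# Illustrative SSE-style streaming endpoint.
import base64
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/v2/messages")  # hypothetical route
async def send_message(question: str):
    async def event_stream():
        yield "think: Searching conversations\n\n"
        async for chunk in generate_answer(question):  # hypothetical LLM stream
            yield f"data: {chunk}\n\n"
        final = {"text": "...", "memories_id": ["conv_id_456"]}
        yield f"done: {base64.b64encode(json.dumps(final).encode()).decode()}\n\n"

    return StreamingResponse(event_stream(), media_type="text/event-stream")
```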
**Key Files**

| Component | File Path |
|---|---|
| Chat Router | `backend/routers/chat.py` |
| LangGraph Router | `backend/utils/retrieval/graph.py` |
| Agentic System | `backend/utils/retrieval/agentic.py` |
| Tools Directory | `backend/utils/retrieval/tools/` |
| Conversation Tools | `backend/utils/retrieval/tools/conversation_tools.py` |
| Memory Tools | `backend/utils/retrieval/tools/memory_tools.py` |
| Calendar Tools | `backend/utils/retrieval/tools/calendar_tools.py` |
| App Tools Loader | `backend/utils/retrieval/tools/app_tools.py` |
| LLM Clients | `backend/utils/llm/clients.py` |
| Chat Prompts | `backend/utils/llm/chat.py` |
| Vector Database | `backend/database/vector_db.py` |
**Worked Example**

User asks: *"What did I discuss with John yesterday about the project?"*
<Steps>
  <Step title="Classification" icon="filter">
    `requires_context()` → **TRUE** (temporal + person + topic reference).
    Route to: `agentic_context_dependent_conversation`
  </Step>
  <Step title="Tool Selection" icon="brain">
    Agent thinks: *"Need conversations from yesterday about project with John"*
  </Step>
  <Step title="Tool Execution" icon="play">
    Agent calls `search_conversations_tool` with:
    - `query`: "John project discussion"
    - `start_date`: "2024-01-19T00:00:00-08:00"
    - `end_date`: "2024-01-19T23:59:59-08:00"
  </Step>
  <Step title="Answer with Citations" icon="message">
    *"Yesterday you discussed the Q1 roadmap with John[1]. He mentioned the frontend refactoring is ahead of schedule[1][2]..."*
  </Step>
</Steps>