Omi uses a dual-collection architecture for storing user data:
<CardGroup cols={2}>
  <Card title="Conversations" icon="comments">
    **Primary storage** for recorded interactions: transcripts, audio, structured summaries
  </Card>
  <Card title="Memories" icon="brain">
    **Secondary storage** for facts/learnings extracted FROM conversations
  </Card>
</CardGroup>

This separation allows efficient retrieval of full conversation context as well as quick access to key facts about the user.
```mermaid
flowchart TD
    subgraph Recording["📱 Recording"]
        R[User Recording] --> AS[Audio Stream]
        AS --> T["Transcription<br/>Deepgram"]
    end
    T --> POST["POST /v1/conversations"]
    POST --> PC[process_conversation]
    PC --> GS["_get_structured<br/>title, overview,<br/>action_items, events"]
    PC --> EM["_extract_memories<br/>facts → memories"]
    PC --> SAI["_save_action_items<br/>standalone collection"]
    GS --> UC[upsert_conversation]
    EM --> SM[save_memories]
    UC --> FS[("Firestore:<br/>conversations/")]
    SM --> FSM[("Firestore:<br/>memories/")]
    UC --> Pine[("Pinecone:<br/>vectors")]
```
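The fan-out in the diagram can be sketched in plain Python. This is a minimal, hypothetical sketch: `InMemoryBackend` and `process_conversation_sketch` are illustrative names, the two callables stand in for the LLM-backed `_get_structured` and `_extract_memories` steps, and the "vector" is kept as raw text instead of an embedding. It only shows how one processed conversation produces records in three collections that all link back through `conversation_id`; the real logic lives in `process_conversation` and differs in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InMemoryBackend:
    """Hypothetical stand-in for the stores in the diagram."""
    conversations: Dict[str, dict] = field(default_factory=dict)  # Firestore: conversations/
    memories: List[dict] = field(default_factory=list)  # Firestore: memories/
    action_items: List[dict] = field(default_factory=list)  # standalone action_items/
    vectors: Dict[str, str] = field(default_factory=dict)  # Pinecone: text that would be embedded


def process_conversation_sketch(
    backend: InMemoryBackend,
    conversation: dict,
    get_structured: Callable[[str], dict],
    extract_memories: Callable[[str], List[str]],
) -> dict:
    cid = conversation["id"]
    transcript = " ".join(seg["text"] for seg in conversation["transcript_segments"])

    # _get_structured: title, overview, action_items, events
    structured = get_structured(transcript)
    stored = {**conversation, "structured": structured}

    # upsert_conversation: Firestore document plus a vector for semantic search
    backend.conversations[cid] = stored
    backend.vectors[cid] = f"{structured['title']}\n{structured['overview']}"

    # _extract_memories -> save_memories: each fact links back via conversation_id
    for fact in extract_memories(transcript):
        backend.memories.append({"conversation_id": cid, "content": fact})

    # _save_action_items: mirror inline action items into the standalone collection
    for item in structured.get("action_items", []):
        backend.action_items.append({**item, "conversation_id": cid})

    return stored
```

For simplicity the sketch writes the vector synchronously; the backend does this step in a background thread.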
```
users/
└── {uid}/
    ├── conversations/           # PRIMARY - Recorded interactions
    │   └── {conversation_id}/
    │       ├── id
    │       ├── created_at
    │       ├── started_at
    │       ├── finished_at
    │       ├── source
    │       ├── language
    │       ├── status
    │       ├── structured
    │       ├── transcript_segments
    │       ├── geolocation
    │       ├── photos/ (subcollection)
    │       ├── audio_files
    │       ├── apps_results
    │       ├── discarded
    │       ├── visibility
    │       ├── is_locked
    │       └── data_protection_level
    │
    ├── memories/                # SECONDARY - Extracted facts
    │   └── {memory_id}/
    │       ├── id
    │       ├── uid
    │       ├── conversation_id
    │       ├── content
    │       ├── category
    │       ├── tags
    │       ├── visibility
    │       ├── created_at
    │       ├── updated_at
    │       ├── reviewed
    │       ├── user_review
    │       ├── scoring
    │       └── data_protection_level
    │
    └── action_items/            # Standalone action items
        └── {action_item_id}/
            ├── description
            ├── completed
            ├── conversation_id
            ├── created_at
            ├── due_at
            └── completed_at
```
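Every document in this tree is addressed by a slash-separated path under `users/{uid}`. The hypothetical `doc_path` helper below builds those paths and rejects IDs containing `/`, which Firestore does not allow in a document ID. It is an illustrative sketch, not a function from the Omi codebase.

```python
# Per-user subcollections shown in the tree above
USER_SUBCOLLECTIONS = ("conversations", "memories", "action_items")


def doc_path(uid: str, collection: str, doc_id: str) -> str:
    """Return the Firestore path of one document under users/{uid}."""
    if collection not in USER_SUBCOLLECTIONS:
        raise ValueError(f"unknown subcollection: {collection}")
    for part in (uid, doc_id):
        # A '/' would silently change the path depth, so refuse it outright
        if not part or "/" in part:
            raise ValueError("IDs must be non-empty and must not contain '/'")
    return f"users/{uid}/{collection}/{doc_id}"
```

With the `google-cloud-firestore` client, such a path can be passed to `firestore.Client().document(path)` to obtain a document reference.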
Each document in `conversations/` has the following fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique conversation identifier |
| created_at | datetime | When the conversation record was created |
| started_at | datetime | When the actual conversation started |
| finished_at | datetime | When the conversation ended |
| source | enum | Source device (omi, phone, desktop, openglass, etc.) |
| language | string | Language code of the conversation |
| status | enum | Processing status: in_progress, processing, completed, failed |
| structured | object | Extracted structured information (see below) |
| transcript_segments | array | List of transcript segments |
| geolocation | object | Location data (latitude, longitude, address) |
| photos | array | Photos captured during conversation |
| audio_files | array | Audio file references |
| apps_results | array | Results from summarization apps |
| external_data | object | Data from external integrations |
| discarded | boolean | Whether conversation was marked as low-quality |
| visibility | enum | private, shared, or public |
| is_locked | boolean | Whether conversation is locked from editing |
| data_protection_level | string | standard or enhanced (encrypted) |
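The table maps naturally onto a typed model. The dataclass below is a hypothetical subset for illustration only; the backend's real model is in `backend/models/conversation.py`, and the defaults marked as assumed are not taken from it. `to_firestore()` shows one way to store the enums as plain strings.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional


class ConversationStatus(str, Enum):
    in_progress = "in_progress"
    processing = "processing"
    completed = "completed"
    failed = "failed"


class Visibility(str, Enum):
    private = "private"
    shared = "shared"
    public = "public"


@dataclass
class ConversationDoc:
    """Hypothetical subset of the conversation document; not the backend's real model."""
    id: str
    created_at: datetime
    source: str  # e.g. "omi", "phone", "desktop", "openglass"
    language: str
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    status: ConversationStatus = ConversationStatus.in_progress  # assumed default
    structured: dict = field(default_factory=dict)
    transcript_segments: list = field(default_factory=list)
    discarded: bool = False
    visibility: Visibility = Visibility.private  # assumed default
    is_locked: bool = False
    data_protection_level: str = "standard"  # or "enhanced"

    def to_firestore(self) -> dict:
        data = asdict(self)
        # Store enums as their plain string values
        data["status"] = self.status.value
        data["visibility"] = self.visibility.value
        return data
```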
The `structured` field contains LLM-extracted information:

| Field | Type | Description |
|---|---|---|
| title | string | Short descriptive title for the conversation |
| overview | string | Summary of key points discussed |
| emoji | string | Emoji representing the conversation |
| category | enum | Category (personal, work, health, etc.) |
| action_items | array | Tasks or to-dos mentioned |
| events | array | Calendar events to be created |
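Because `structured` comes from an LLM, it is worth validating before storage. The hypothetical `parse_structured` below assumes the model returns a JSON object with these keys; it requires `title` and `overview`, and fills the optional fields with empty defaults of its own choosing. It is a sketch, not the backend's parser.

```python
import json


def parse_structured(raw: str) -> dict:
    """Validate LLM output into the shape of the `structured` field (hypothetical)."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("structured output must be a JSON object")
    for key in ("title", "overview"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"structured.{key} must be a string")
    return {
        "title": data["title"].strip(),
        "overview": data["overview"].strip(),
        "emoji": data.get("emoji", ""),  # assumed default
        "category": data.get("category", ""),  # assumed default
        "action_items": list(data.get("action_items") or []),
        "events": list(data.get("events") or []),
    }
```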
Each segment in `transcript_segments` includes:

| Field | Type | Description |
|---|---|---|
| text | string | Transcribed text content |
| speaker | string | Speaker label (e.g., "SPEAKER_00") |
| start | float | Start time in seconds |
| end | float | End time in seconds |
| is_user | boolean | Whether spoken by the device owner |
| person_id | string | ID of identified person (if matched) |
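Since each segment carries its own timing and speaker, common views of a transcript are simple folds over the list. The helpers below are hypothetical: they render a readable transcript and total the speaking time per speaker, labelling the device owner (`is_user`) as "You", which is a display choice made only for this sketch.

```python
from collections import defaultdict
from typing import Dict, List


def speaker_label(seg: dict) -> str:
    # Owner segments become "You"; others keep their diarization label
    return "You" if seg.get("is_user") else seg.get("speaker", "UNKNOWN")


def format_transcript(segments: List[dict]) -> str:
    return "\n".join(
        f"[{seg['start']:.1f}-{seg['end']:.1f}] {speaker_label(seg)}: {seg['text']}"
        for seg in segments
    )


def talk_time(segments: List[dict]) -> Dict[str, float]:
    # Seconds spoken per speaker, from each segment's start/end
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[speaker_label(seg)] += seg["end"] - seg["start"]
    return dict(totals)
```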
Action items are stored both inline (in `structured.action_items`) and in a standalone collection:

| Field | Type | Description |
|---|---|---|
| description | string | The action item text |
| completed | boolean | Whether the item is done |
| created_at | datetime | When extracted |
| due_at | datetime | Optional due date |
| completed_at | datetime | When marked complete |
| conversation_id | string | Source conversation |
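Mirroring an inline item into the standalone collection mostly means adding timestamps and the link back to the source conversation. The function below is a hypothetical sketch of that mapping; setting `completed_at` to the creation time for items that arrive already completed is an assumption, not documented behavior.

```python
from datetime import datetime, timezone
from typing import List, Optional


def to_standalone_action_items(
    conversation_id: str, inline_items: List[dict], now: Optional[datetime] = None
) -> List[dict]:
    now = now if now is not None else datetime.now(timezone.utc)
    docs = []
    for item in inline_items:
        completed = bool(item.get("completed", False))
        docs.append({
            "description": item["description"],
            "completed": completed,
            "created_at": now,
            "due_at": item.get("due_at"),  # optional due date
            "completed_at": now if completed else None,  # assumption
            "conversation_id": conversation_id,  # link back to the source
        })
    return docs
```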
Calendar events extracted from conversations (`structured.events`):

| Field | Type | Description |
|---|---|---|
| title | string | Event title |
| description | string | Event description |
| start | datetime | Start date/time |
| duration | integer | Duration in minutes |
| created | boolean | Whether added to calendar |
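Because events store a start time and a duration in minutes rather than an end time, the end has to be derived. A small hypothetical sketch, including a filter for events not yet added to the calendar:

```python
from datetime import datetime, timedelta
from typing import List


def event_end(event: dict) -> datetime:
    # duration is stored in minutes
    return event["start"] + timedelta(minutes=event["duration"])


def pending_events(events: List[dict]) -> List[dict]:
    # Events not yet added to the user's calendar (created == False)
    return [e for e in events if not e.get("created", False)]
```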
Memories are facts about the user extracted from conversations. They represent learnings, preferences, habits, and other personal information.
During `process_conversation()`, the system calls `_extract_memories()` to pull facts out of the conversation and `save_memories()` to write them to the `memories/` collection. Each memory document has the following fields:

| Field | Type | Description |
|---|---|---|
| id | string | Unique memory identifier |
| uid | string | User ID |
| conversation_id | string | Source conversation (links back) |
| content | string | The actual fact/learning (max ~15 words) |
| category | enum | interesting, system, or manual |
| tags | array | Categorization tags |
| visibility | string | private or public |
| created_at | datetime | When memory was created |
| updated_at | datetime | Last modification time |
| reviewed | boolean | Whether user has reviewed |
| user_review | boolean | User's approval (true/false/null) |
| edited | boolean | Whether user edited the content |
| scoring | string | Ranking score for retrieval |
| manually_added | boolean | Whether user created manually |
| is_locked | boolean | Prevent automatic deletion |
| app_id | string | Source app (if from integration) |
| data_protection_level | string | Encryption level |
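A memory record can be assembled from the table above with a few guards. The hypothetical `make_memory` below treats the "max ~15 words" guideline as a hard cap and defaults `visibility` to `private`; both are assumptions made for this sketch, and `manually_added` is simply derived from the `manual` category. `scoring` and `app_id` are left out because the source does not describe how they are set.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

MAX_MEMORY_WORDS = 15  # hard cap assumed from the "max ~15 words" guideline
CATEGORIES = {"interesting", "system", "manual"}


def make_memory(uid: str, conversation_id: Optional[str], content: str,
                category: str = "interesting", now: Optional[datetime] = None) -> dict:
    content = " ".join(content.split())  # collapse stray whitespace
    if not content:
        raise ValueError("memory content is empty")
    if len(content.split()) > MAX_MEMORY_WORDS:
        raise ValueError(f"memory exceeds {MAX_MEMORY_WORDS} words")
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    now = now if now is not None else datetime.now(timezone.utc)
    return {
        "id": str(uuid.uuid4()),
        "uid": uid,
        "conversation_id": conversation_id,  # None for manual memories
        "content": content,
        "category": category,
        "tags": [],
        "visibility": "private",  # assumed default
        "created_at": now,
        "updated_at": now,
        "reviewed": False,
        "user_review": None,  # not yet approved or rejected
        "edited": False,
        "manually_added": category == "manual",
        "is_locked": False,
        "data_protection_level": "standard",
    }
```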
The system follows these guidelines when extracting memories:

- interesting + 2 system memories per conversation

Both conversations and memories support encryption for sensitive data.
<Tabs>
<Tab title="Standard" icon="unlock">
### Standard Protection Level

No encryption; data is stored as plaintext. This is the default for most users.

- Fastest read/write performance
- Data is visible in the Firestore console
- Suitable for general use
</Tab>
<Tab title="Enhanced">
### Enhanced Protection Level

Sensitive fields are encrypted with AES, which provides additional security for sensitive conversations.

**Encrypted Fields:**
- **Conversations**: `transcript_segments` (the actual transcript text)
- **Memories**: `content` (the memory text)

<Warning>
Enhanced encryption adds processing overhead to read/write operations.
</Warning>
</Tab>
</Tabs>
The database layer applies the protection level on every write and read:

```python
# Conversations: database/conversations.py
def _prepare_conversation_for_write(conversation_data, data_protection_level):
    if data_protection_level == 'enhanced':
        # Encrypt transcript_segments before storage
        ...
    # 'standard' data is written as plaintext
    return conversation_data


def _prepare_conversation_for_read(conversation_data, data_protection_level):
    if data_protection_level == 'enhanced':
        # Decrypt transcript_segments after retrieval
        ...
    return conversation_data
```
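The helpers above leave the encryption step itself elided. The sketch below makes the field-level idea concrete under stated assumptions: the protected field is JSON-serialized, passed through a pluggable `encrypt`/`decrypt` callable, and stored as base64 text. All names are hypothetical. `insecure_xor_placeholder` is **not** encryption; it only lets the sketch run with the standard library. A real deployment would pass AES-based callables instead (for example built on `AESGCM` from the `cryptography` package).

```python
import base64
import copy
import json
from typing import Callable

# Field encrypted under the "enhanced" level, per document kind
ENCRYPTED_FIELD = {"conversation": "transcript_segments", "memory": "content"}

Cipher = Callable[[bytes], bytes]


def prepare_for_write(doc: dict, kind: str, level: str, encrypt: Cipher) -> dict:
    if level != "enhanced":
        return doc  # standard: stored as plaintext, unchanged
    out = copy.deepcopy(doc)
    name = ENCRYPTED_FIELD[kind]
    plaintext = json.dumps(out[name]).encode("utf-8")  # works for lists and strings
    out[name] = base64.b64encode(encrypt(plaintext)).decode("ascii")
    return out


def prepare_for_read(doc: dict, kind: str, level: str, decrypt: Cipher) -> dict:
    if level != "enhanced":
        return doc
    out = copy.deepcopy(doc)
    name = ENCRYPTED_FIELD[kind]
    ciphertext = base64.b64decode(out[name])
    out[name] = json.loads(decrypt(ciphertext).decode("utf-8"))
    return out


def insecure_xor_placeholder(data: bytes, key: int = 0x5A) -> bytes:
    """NOT encryption. A reversible stand-in so this sketch runs without a crypto library."""
    return bytes(b ^ key for b in data)
```

Only one field per document changes, so metadata such as timestamps and IDs stays queryable while the transcript or memory text is protected.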
Conversations are also stored as vector embeddings in Pinecone for semantic search.
| Data | Embedded? | Stored in Metadata? |
|---|---|---|
| Title | Yes | No |
| Overview | Yes | No |
| Action Items | Yes | No |
| Full Transcript | No (too large) | No |
| People Mentioned | No | Yes |
| Topics | No | Yes |
| Entities | No | Yes |
| created_at | No | Yes |
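The split in the table can be sketched as a function returning the text to embed and the metadata to attach. This is a hypothetical illustration: `build_vector_payload` and the `extracted` argument (holding people, topics, and entities lists) are invented names, since the source does not say how those values are produced. Event titles are included following the `title + overview + action_items + events` description of `save_structured_vector()`. Pinecone metadata values must be strings, numbers, booleans, or lists of strings, so `created_at` is stored as a Unix timestamp here; that encoding is an assumption.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def build_vector_payload(conversation: dict, extracted: Dict[str, List[str]]) -> Tuple[str, dict]:
    """Return (text_to_embed, metadata) for one conversation (hypothetical)."""
    s = conversation["structured"]
    parts = [s.get("title", ""), s.get("overview", "")]
    parts += [item["description"] for item in s.get("action_items", [])]
    parts += [event["title"] for event in s.get("events", [])]
    # The full transcript is deliberately left out: it is too large to embed
    text = "\n".join(p for p in parts if p)

    created_at: datetime = conversation["created_at"]
    metadata = {
        "people_mentioned": extracted.get("people", []),
        "topics": extracted.get("topics", []),
        "entities": extracted.get("entities", []),
        "created_at": int(created_at.timestamp()),  # assumed encoding
    }
    return text, metadata
```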
Vectors are created in a background thread after conversation processing:
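```python
# utils/conversations/process_conversation.py
import threading

# Embed and upsert the conversation vector without blocking the request
threading.Thread(
    target=save_structured_vector,
    args=(uid, conversation)
).start()
```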
The `save_structured_vector()` function builds its embedding from `conversation.structured` (title + overview + action_items + events).

The relevant source files are:

| Component | File Path |
|---|---|
| Conversation Model | backend/models/conversation.py |
| Memory Model | backend/models/memories.py |
| Process Conversation | backend/utils/conversations/process_conversation.py |
| Database - Conversations | backend/database/conversations.py |
| Database - Memories | backend/database/memories.py |
| Router - Conversations | backend/routers/conversations.py |
| Router - Memories | backend/routers/memories.py |
| Vector Database | backend/database/vector_db.py |
Conversation endpoints:

| Method | Endpoint | Description |
|---|---|---|
| POST | /v1/conversations | Process and store a new conversation |
| GET | /v1/conversations | List user's conversations |
| GET | /v1/conversations/{id} | Get specific conversation |
| PATCH | /v1/conversations/{id}/title | Update conversation title |
| DELETE | /v1/conversations/{id} | Delete a conversation |
Memory endpoints:

| Method | Endpoint | Description |
|---|---|---|
| POST | /v3/memories | Create a manual memory |
| GET | /v3/memories | List user's memories |
| PATCH | /v3/memories/{id} | Edit a memory |
| DELETE | /v3/memories/{id} | Delete a memory |
| PATCH | /v3/memories/{id}/visibility | Change memory visibility |
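Any HTTP client can call these endpoints. The standard-library sketch below builds authenticated requests for them under explicit assumptions: `BASE_URL` is a placeholder, the `Authorization: Bearer <token>` header scheme is assumed, and request-body shapes for the POST and PATCH endpoints are not documented here, so none are implied. `build_request`, `item_path`, and `send` are hypothetical helpers.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote

# Placeholder: replace with the base URL of your backend deployment
BASE_URL = "https://api.example.com"


def build_request(method: str, path: str, token: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build an authenticated JSON request (bearer-token auth is an assumption)."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(BASE_URL + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def item_path(prefix: str, item_id: str, suffix: str = "") -> str:
    # Percent-encode the ID so it is always a single path segment
    return f"{prefix}/{quote(item_id, safe='')}{suffix}"


def send(req: urllib.request.Request, timeout: float = 30.0):
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None
```

For example, `build_request("GET", "/v1/conversations", token)` lists conversations and `build_request("DELETE", item_path("/v3/memories", memory_id), token)` deletes a memory; pass either to `send()` to execute it.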