Omi Desktop App — Flows & Exploration

desktop/e2e/SKILL.md

This skill teaches you the Omi macOS desktop app's navigation structure, screen architecture, and SwiftUI patterns. Use it when developing features (to understand how the app works), fixing bugs (to navigate to the affected screen), or verifying changes (to confirm your code works in the live app).

How to Explore the App

You can interact with the running app via agent-swift — a CLI that clicks elements, reads the accessibility tree, and captures screenshots through the macOS Accessibility API. It works with any macOS app; no app-side instrumentation is needed.

Setup

```bash
# App must be running via ./run.sh from desktop/
agent-swift doctor                                   # check Accessibility permission
agent-swift connect --bundle-id com.omi.desktop-dev  # connect to Omi Dev
agent-swift snapshot -i --json                       # see what's on screen
```

Commands

| Command | Purpose | Example |
|---|---|---|
| `snapshot -i --json` | See all interactive elements with refs, types, labels | `agent-swift snapshot -i --json` |
| `click @ref` | CGEvent click — SwiftUI elements (NavigationLink, gestures) | `agent-swift click @e3` |
| `press @ref` | AXPress — AppKit buttons, Settings sidebar items | `agent-swift press @e5` |
| `find role/text/key VALUE` | Find element and chain action | `agent-swift find text "Settings" click` |
| `fill @ref "text"` | Type into text field | `agent-swift fill @e7 "search"` |
| `scroll down/up` | Scroll current view | `agent-swift scroll down` |
| `wait text "X"` | Wait for element to appear | `agent-swift wait text "Loading" --timeout 5000` |
| `is exists @ref` | Assert element exists (exit 0/1) | `agent-swift is exists @e3` |
| `get PROP @ref` | Read property value | `agent-swift get value @e5 --json` |
| `screenshot PATH` | Capture app window | `agent-swift screenshot /tmp/screen.png` |

Key rules:

  • click = CGEvent mouse click (SwiftUI). Use for main sidebar icons, NavigationLink.
  • press = AXPress action (AppKit). Use for Settings sidebar sections.
  • Refs go stale after any mutation — always re-snapshot before the next interaction.
  • find with chained action is more stable than hardcoded @ref numbers.
  • --json flag on any command gives structured output for parsing.
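
The stale-ref rule above can be sketched as a minimal act-then-resnapshot loop. This is a sketch only: it assumes the app is already connected via the Setup commands, and it skips when `agent-swift` is unavailable.

```shell
# Act, then immediately re-snapshot: refs go stale after any mutation.
# Assumes `agent-swift connect` has already run (see Setup).
if command -v agent-swift >/dev/null 2>&1; then
  agent-swift find text "Settings" click          # stable: no hardcoded @ref
  agent-swift snapshot -i --json > /tmp/ui.json   # refs are stale now; refresh them
else
  echo "agent-swift not installed"
fi
```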

App Navigation Architecture

Screen Map

```
Main Window
├── Sidebar (SidebarView.swift) — use `click`
│   ├── Home (DesktopHomeView.swift)
│   ├── Conversation (ChatSessionsSidebar.swift)
│   ├── brain → Memories
│   ├── checklist → Action Items
│   ├── puzzlepiece.fill → Integrations
│   └── gearshape.fill → Settings
│
└── Settings (SettingsPage.swift) — use `press` for sidebar sections
    ├── General — app preferences
    ├── Rewind — screenshot/timeline settings
    ├── Transcription — Language Mode (Auto-Detect / Single Language)
    │   └── Language picker (popupbutton or button)
    ├── Notifications — alert preferences
    ├── Privacy — data settings
    ├── Account — user info
    ├── AI Chat — chat model settings
    ├── Advanced — developer options
    └── About — version info

System Tray Menu
├── openOmi — Open Omi
├── checkFor — Check for Updates
├── resetOnb — Reset Onboarding
├── reportIs — Report Issue
├── signOut — Sign Out
└── quitApp — Quit
```

Interaction Patterns

Main sidebar navigation:

  • Icons are image type elements with accessibility identifiers: sidebar_dashboard, sidebar_chat, sidebar_memories, sidebar_tasks, sidebar_rewind, sidebar_apps, sidebar_settings
  • Use find key sidebar_dashboard click for reliable navigation (survives UI changes)
  • Keyboard shortcuts: Cmd+1 (Dashboard), Cmd+2 (Chat), Cmd+3 (Memories), Cmd+4 (Tasks), Cmd+5 (Rewind), Cmd+6 (Apps), Cmd+, (Settings)
  • Use click — these are SwiftUI views with onTapGesture
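
A hedged example of identifier-based navigation. The wait target "Memories" is an assumption about the screen's visible text, not something the identifier list guarantees.

```shell
# Navigate via accessibility identifier, not position or @ref numbers.
# "Memories" as a wait target is an assumed on-screen label.
if command -v agent-swift >/dev/null 2>&1; then
  agent-swift find key sidebar_memories click
  agent-swift wait text "Memories" --timeout 5000
fi
```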

Settings sidebar navigation:

  • Sections are button type elements with section name labels
  • Use press — these are SwiftUI Button views that respond to AXPress
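
Putting the two rules together, a sketch of reaching a Settings section. Chaining `press` after `find` is assumed to work like the documented chained `click`.

```shell
if command -v agent-swift >/dev/null 2>&1; then
  agent-swift find key sidebar_settings click   # main sidebar: SwiftUI, use click
  agent-swift find text "Transcription" press   # settings sidebar: use press
fi
```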

Transcription language mode:

  • Two radio-button-style options: "Auto-Detect Multi-Language" and "Single Language Better Accuracy"
  • click on the text to switch modes
  • Single Language mode shows a language picker (popupbutton)
  • Click popupbutton → menu items appear as menuitem elements
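
The steps above, sketched end to end. Roles and labels are taken from the bullets; exact label text may differ in the live app.

```shell
if command -v agent-swift >/dev/null 2>&1; then
  agent-swift find text "Single Language" click   # switch mode
  agent-swift snapshot -i --json                  # re-snapshot: the picker ref is new
  agent-swift find role popupbutton click         # items then appear as menuitem
fi
```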

System tray menu:

  • Menu items have identifier prefixes for detection
  • Access via snapshot --json (includes menu bar items)

Known Flows

Reference flows in desktop/e2e/flows/*.yaml describe the app's key user journeys. Read these to understand navigation paths, expected elements, and UI state at each step.

| Flow | Covers | What it describes |
|---|---|---|
| `flows/navigation.yaml` | SidebarView, DesktopHomeView, OmiApp | Sidebar icons, section switching, text input, scroll, tray menu |
| `flows/language.yaml` | SettingsPage, SettingsSidebar, SidebarView | Settings nav, Transcription, language mode toggle, picker |
| `flows/dashboard.yaml` | DashboardPage, GoalsWidget, TasksWidget | Goals CRUD, task toggle, embedded conversations |
| `flows/chat.yaml` | ChatPage, ChatProvider | Send message, AI response, message actions |
| `flows/memories.yaml` | MemoriesPage, MemoryGraphPage | Tag filtering, search, visibility toggle, memory graph |
| `flows/tasks.yaml` | TasksPage, TasksStore | Task categories, filters, create task, toggle completion |
| `flows/settings.yaml` | SettingsPage, SettingsSidebar | All 9 settings sections (General through About) |

When you modify a Swift file, check whether any flow's `covers:` list includes it. That flow describes the user journey your change affects.
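
One way to check this is a plain grep over the flow files. `flows_covering` is a hypothetical helper, not something that ships with the repo; run it from the repo root.

```shell
# List flows whose YAML mentions a changed Swift file's name.
flows_covering() {
  local changed="$1" dir="${2:-desktop/e2e/flows}"
  grep -l "$(basename "$changed")" "$dir"/*.yaml 2>/dev/null || true
}

flows_covering "Desktop/Sources/path/to/YourView.swift"
```

An empty result means no flow covers the file and a new flow may be worth adding.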

Adding a New Flow

Create `desktop/e2e/flows/<name>.yaml`:

```yaml
name: my-flow
description: What this flow covers
covers:
  - Desktop/Sources/path/to/YourView.swift
setup: normal   # normal | fresh_auth | signed_out
steps:
  - name: Step description
    click: { type: image, label: "gearshape.fill" }
    screenshot: step-name
  - name: Verify result
    assert: { text: "Expected Text" }
```

Verification & Evidence

After making changes, verify them in the live app:

  1. Navigate to the affected screen using the commands above
  2. Check that your changes appear (`snapshot`, `screenshot`)
  3. Test interactions (click buttons, fill fields, scroll)
  4. Capture evidence: `agent-swift screenshot /tmp/evidence.png`
  5. Generate video: `ffmpeg -framerate 1 -pattern_type glob -i '/tmp/e2e-*.png' -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:-1:-1" -c:v libx264 -pix_fmt yuv420p /tmp/report.mp4`
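
The capture steps can be scripted so the screenshots match the ffmpeg glob. File names and the interaction under test are illustrative.

```shell
# Capture an evidence series named to match the /tmp/e2e-*.png glob.
if command -v agent-swift >/dev/null 2>&1; then
  agent-swift screenshot /tmp/e2e-01-before.png
  agent-swift find key sidebar_tasks click        # the interaction under test
  agent-swift screenshot /tmp/e2e-02-after.png
fi
```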

Decision Tree

| Problem | Solution |
|---|---|
| Element not found | Re-snapshot, try scrolling, check if on wrong screen |
| Click doesn't navigate | Try `press` instead (Settings sidebar = press, main sidebar = click) |
| Picker not responding | SwiftUI Picker `.menu` style may not expose as popupbutton — look for button with value label |
| App seems frozen | Check `agent-swift status --json`, re-connect, check `/private/tmp/omi-dev.log` |
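
A sketch of the frozen-app recovery steps, in order:

```shell
if command -v agent-swift >/dev/null 2>&1; then
  agent-swift status --json                             # is the connection alive?
  agent-swift connect --bundle-id com.omi.desktop-dev   # re-attach to Omi Dev
fi
tail -n 20 /private/tmp/omi-dev.log 2>/dev/null || echo "log not found"
```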

Guard Conditions

NEVER:

  • Automate the production app (com.omi.computer-macos)
  • Kill or restart the production Omi app
  • Use development env vars to bypass auth — test real auth flows
  • Set hasCompletedOnboarding to skip onboarding — test the real flow
  • Modify source code to make tests pass — report the failure instead