app/e2e/FLOW-WALKER-SKILL.md
This skill teaches you how to use flow-walker to execute E2E flows on the Omi Flutter app, verify results, and publish shareable HTML reports. flow-walker is an agent-first testing CLI that works with agent-flutter (Marionette) for UI interaction.
# flow-walker and agent-flutter must be installed globally
which flow-walker # /usr/local/bin/flow-walker or similar
which agent-flutter # agent-flutter-cli
# App must be running in debug mode
# Android emulator:
cd app && flutter run -d emulator-5554 --flavor dev > /tmp/omi-flutter.log 2>&1 &
# Physical device (via Mac Mini):
ssh beastoin-agents-f1-mac-mini "cd ~/omi/app && flutter run --flavor dev -d 33041JEHN18287" > /tmp/flutter-run.log 2>&1 &
# agent-flutter connected
AGENT_FLUTTER_LOG=/tmp/omi-flutter.log agent-flutter connect
Flows live in app/e2e/flows/<name>.yaml. Each flow defines a sequence of steps with expectations.
version: 2
name: flow-name
description: One-line description of what the flow tests
app: com.friend.ios.dev
evidence:
video: true
covers:
- app/lib/path/to/tested/file.dart
preconditions:
- auth_ready # User signed in, onboarding done, home screen visible
- no_omi_device # No Omi BLE device paired (mic button visible)
- omi_device_connected # Omi device paired and streaming
steps:
- id: S1
name: Short step title
do: "Detailed action description — what to tap, what to verify on screen"
verify: true
expect:
- kind: text_visible
values: ["Conversations", "Listening"]
- kind: interactive_count
min: 4
evidence:
- screenshot: step-S1.webp
note: "Implementation details, code paths, ADB coordinates, edge cases"
| Kind | Fields | Meaning |
|---|---|---|
text_visible | values: [...] | All listed strings must appear on screen |
interactive_count | min: N | At least N interactive widgets visible |
The pipeline has 6 stages. Always use --json for machine-readable output.
FLOW=app/e2e/flows/phone-capture.yaml
INIT=$(node $(which flow-walker) record init --flow $FLOW --output-dir app/e2e/runs/ --no-video --json)
RUN_ID=$(echo "$INIT" | jq -r '.id')
RUN_DIR=$(echo "$INIT" | jq -r '.dir')
echo "Run: $RUN_ID → $RUN_DIR"
Replay mode: If a .snapshot.json exists for the flow, record init returns a replay plan with cached coordinates. Check replay.mode — if "replay", use replay.steps[id].center coordinates for cached steps.
For each step, stream events to $RUN_DIR/events.jsonl with real wall-clock timestamps:
# Helper function to stream an event
stream_event() {
local type="$1" step_id="$2" extra="$3"
local ts=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
local event="{\"type\":\"$type\",\"step_id\":\"$step_id\",\"ts\":\"$ts\"$extra}"
echo "$event" >> "$RUN_DIR/events.jsonl"
}
# For each step:
stream_event "step_start" "S1"
# Perform the action (agent-flutter, ADB tap, etc.)
agent-flutter press 540 1200
# or: adb shell input tap 540 1200
# or: agent-flutter find text "Settings" press
# Take screenshot
adb exec-out screencap -p > /tmp/raw.png && \
cwebp -q 70 -resize 270 600 /tmp/raw.png -o "$RUN_DIR/step-S1.webp"
stream_event "screenshot" "S1" ",\"path\":\"step-S1.webp\""
# Mark step done
stream_event "step_end" "S1" ",\"outcome\":\"pass\""
Critical: Use real timestamps — fake/identical timestamps cause "2s" duration in reports.
node $(which flow-walker) record finish \
--run-id "$RUN_ID" --run-dir "$RUN_DIR" \
--status pass --flow "$FLOW" --json
This auto-saves a .snapshot.json for future replay runs.
node $(which flow-walker) verify "$FLOW" \
--run-dir "$RUN_DIR" --mode audit \
--output "$RUN_DIR/run.json" --json
Verify modes:
strict — all expectations must be met via automated checksbalanced — default, some flexibilityaudit — agent-attested mode; generates run.json from events without automated UI checksImportant: Verify in audit mode may report step outcomes as "fail" even when expectations are met. If you know the steps passed (you verified visually or via screenshots), override outcomes in run.json:
# Override all step outcomes to "pass" using jq
jq '.steps = [.steps[] | .outcome = "pass"]' "$RUN_DIR/run.json" > /tmp/run-fixed.json
mv /tmp/run-fixed.json "$RUN_DIR/run.json"
node $(which flow-walker) report "$RUN_DIR" --json
Produces report.html in the run directory with screenshots, timestamps, and pass/fail status.
PUSH=$(node $(which flow-walker) push "$RUN_DIR" --json)
echo "$PUSH" | jq -r '.htmlUrl'
# → https://flow-walker.beastoin.workers.dev/runs/<id>.html
Returns a shareable URL with the full HTML report.
adb exec-out screencap -p > /tmp/raw.png && \
cwebp -q 70 -resize 270 600 /tmp/raw.png -o "$RUN_DIR/step-S1.webp"
ssh beastoin-agents-f1-mac-mini "adb exec-out screencap -p" > /tmp/raw.png && \
cwebp -q 70 -resize 270 600 /tmp/raw.png -o "$RUN_DIR/step-S1.webp"
UDID=2B5E3617-491D-4D55-BF24-55E2F35D3177
ssh beastoin-agents-f1-mac-mini "xcrun simctl io $UDID screenshot /tmp/raw.png && sips -Z 600 /tmp/raw.png" && \
scp beastoin-agents-f1-mac-mini:/tmp/raw.png "$RUN_DIR/step-S1.png"
adb exec-out screencap -p > /tmp/raw.png && \
cwebp -q 80 -resize 540 1200 /tmp/raw.png -o "$RUN_DIR/step-S1.webp"
The reporter expects this exact schema. Using wrong field names (e.g., status instead of outcome) causes "undefined" in reports.
{
"result": "pass",
"flow": "phone-capture",
"steps": [
{
"id": "S1",
"name": "Verify home screen with mic button visible",
"do": "Verify the home screen shows...",
"outcome": "pass",
"events": [
{"type": "step_start", "ts": "2026-03-17T12:00:00.000Z"},
{"type": "screenshot", "path": "step-S1.webp", "ts": "2026-03-17T12:00:02.000Z"},
{"type": "step_end", "outcome": "pass", "ts": "2026-03-17T12:00:03.000Z"}
],
"expectations": [
{"kind": "text_visible", "values": ["Conversations"], "met": true},
{"kind": "interactive_count", "min": 5, "met": true}
]
}
]
}
Key fields:
result (not status) — "pass" or "fail"outcome (not status) — "pass" or "fail"events array (must exist, even if empty [])expectations array with met booleando field (action description from YAML)Snapshots save coordinates and timing from successful runs for fast re-execution.
# Save snapshot after a successful run
node $(which flow-walker) snapshot save \
--flow app/e2e/flows/phone-capture.yaml \
--run-dir "$RUN_DIR" --json
# Load snapshot for replay planning
node $(which flow-walker) snapshot load \
--flow app/e2e/flows/phone-capture.yaml --json
Snapshots are saved as <flow-name>.snapshot.json next to the YAML. When record init finds a snapshot, it returns a replay plan — agents skip full UI exploration and use cached coordinates.
Generate new flow YAML scaffolds by auto-exploring the app:
# BFS exploration — discovers screens and generates flows
AGENT_FLUTTER_LOG=/tmp/omi-flutter.log \
node $(which flow-walker) walk --max-depth 3 --output-dir app/e2e/flows/ --json
# Generate a named flow scaffold
AGENT_FLUTTER_LOG=/tmp/omi-flutter.log \
node $(which flow-walker) walk --name new-feature --output app/e2e/flows/new-feature.yaml --json
# Dry run — snapshot without pressing anything
node $(which flow-walker) walk --dry-run --json
Blocklist: By default, walk avoids destructive keywords: delete, sign out, remove, reset, unpair, logout, clear all. Customize with --blocklist.
| Problem | Cause | Fix |
|---|---|---|
| "2s" total duration | Fake/identical timestamps in events.jsonl | Use real date -u timestamps for every event |
| "undefined" outcomes | Wrong schema — status instead of outcome | Use outcome field in run.json and events |
| "0 pass" in report | Missing do, events, expectations fields | Run through full pipeline (verify → report) |
| 26-min meaningless video | record init auto-records during exploration | Use --no-video flag |
step.events is not iterable | Missing events array in run.json steps | Add "events": [] to each step |
| verify says "fail" but all OK | Audit mode can't check UI automatically | Override outcomes to "pass" in run.json |
| Stale agent-flutter refs | Flutter rebuilt widget tree | Re-snapshot before every interaction |
| Flow | Steps | What it tests |
|---|---|---|
login.yaml | 5 | Google Sign-In, auth state, home screen |
onboarding.yaml | 9 | First-launch permissions, language, setup |
logout.yaml | 5 | Sign out, auth clear, return to login |
ask-omi-chat.yaml | 9 | Chat with Omi AI, send/receive messages |
conversations.yaml | 9 | Conversation list, filters, detail view |
apps-marketplace.yaml | 7 | Browse, search, enable/disable apps |
memories.yaml | 6 | View memories, categories, detail |
action-items.yaml | 7 | Action items list, mark complete |
phone-capture.yaml | 9 | Phone mic recording, live transcription, process |
device-capture.yaml | 10 | Omi BLE device recording, mute, process |
device-connect.yaml | 10 | BLE scan, pair, device details, disconnect |
All flow YAMLs: app/e2e/flows/*.yaml
All snapshots: app/e2e/flows/*.snapshot.json
#!/bin/bash
# Run a flow end-to-end and publish the report
set -e
FLOW="$1" # e.g., app/e2e/flows/phone-capture.yaml
FLOW_NAME=$(basename "$FLOW" .yaml)
# 1. Init
INIT=$(node $(which flow-walker) record init --flow "$FLOW" --output-dir app/e2e/runs/ --no-video --json)
RUN_ID=$(echo "$INIT" | jq -r '.id')
RUN_DIR=$(echo "$INIT" | jq -r '.dir')
# 2. Execute steps (customize per flow)
for STEP in S1 S2 S3; do
TS=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
echo "{\"type\":\"step_start\",\"step_id\":\"$STEP\",\"ts\":\"$TS\"}" >> "$RUN_DIR/events.jsonl"
# ... perform action (agent-flutter press, ADB tap, etc.) ...
# Screenshot
adb exec-out screencap -p > /tmp/raw.png
cwebp -q 70 -resize 270 600 /tmp/raw.png -o "$RUN_DIR/step-$STEP.webp"
TS=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
echo "{\"type\":\"screenshot\",\"step_id\":\"$STEP\",\"path\":\"step-$STEP.webp\",\"ts\":\"$TS\"}" >> "$RUN_DIR/events.jsonl"
TS=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
echo "{\"type\":\"step_end\",\"step_id\":\"$STEP\",\"outcome\":\"pass\",\"ts\":\"$TS\"}" >> "$RUN_DIR/events.jsonl"
done
# 3. Finish
node $(which flow-walker) record finish --run-id "$RUN_ID" --run-dir "$RUN_DIR" --status pass --flow "$FLOW" --json
# 4. Verify
node $(which flow-walker) verify "$FLOW" --run-dir "$RUN_DIR" --mode audit --output "$RUN_DIR/run.json" --json
# 5. Override outcomes if needed
jq '.steps = [.steps[] | .outcome = "pass"]' "$RUN_DIR/run.json" > /tmp/run-fixed.json
mv /tmp/run-fixed.json "$RUN_DIR/run.json"
# 6. Report + Push
node $(which flow-walker) report "$RUN_DIR" --json
PUSH=$(node $(which flow-walker) push "$RUN_DIR" --json)
echo "Published: $(echo "$PUSH" | jq -r '.htmlUrl')"
Convert old v1 flows to v2 format:
node $(which flow-walker) migrate app/e2e/flows/old-flow.yaml --output app/e2e/flows/old-flow-v2.yaml --json