frameworks/motia/contributors/rfc/2025-06-02-observability-system.md
This RFC proposes implementing an observability system for Motia that provides comprehensive tracing and real-time monitoring through an intuitive horizontal timeline interface. The system will enable users to track execution traces, monitor performance, and debug issues with detailed insights into their workflow executions, focusing on Motia-specific concepts like States, Emit events, Streams, and step interactions through a clean, horizontal visualization.
Currently, Motia provides basic logging capabilities through:
However, users lack intuitive observability into their workflow executions, making it difficult to:
Real-world workflows often involve multiple related executions that form a single logical business process, but individual traces don't capture these relationships:
✅ Required: Correlated Logical Flows
Chat Thread → [Trace A] → [Trace B] → [Trace C]
└─────── All part of same conversation ──────┘
Order Flow → [API: Create] → [Event: Process] → [API: Confirm] → [Event: Ship]
└─────────── Complete business process visibility ────────────┘
await context.correlate('business_process_id')X-Motia-Correlation-Id for API callsChat Session: user_123_thread_abc (3 traces, 2 min duration)
├─ Trace 1: "Hello, weather?" (✓ 150ms) - 2 min ago
│ └─ get-weather → format-response → send-reply
├─ ⏱️ [Gap: 45s - user thinking]
├─ Trace 2: "What about tomorrow?" (✓ 200ms) - 1 min ago
│ └─ get-context → get-forecast → format-response
└─ Trace 3: "Thanks!" (🔄 running) - 10s ago
└─ process-gratitude → [running]
┌─────────────────────────────────────────────────────────────────┐
│ Motia Core System │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌──────────────────┐ ┌────────────┐ │
│ │ Enhanced │───►│ Trace Builder │───►│ Trace │ │
│ │ Logging │ │ Service │ │ Storage │ │
│ │ (call-step) │ │ │ │(In-Memory) │ │
│ └─────────────────┘ └──────────────────┘ └────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌──────────────────┐ ┌────────────┐ │
│ │ Structured │ │ Real-time │ │ Workbench │ │
│ │ Log Events │ │ Updates │ │ API │ │
│ └─────────────────┘ └──────────────────┘ └────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Workbench UI Layer │
│ ┌─────────────┐ ┌─────────────────┐ ┌─────────────────────┐ │
│ │ Traces List │ │ Horizontal │ │ Real-time Flow │ │
│ │ Page │ │ Timeline │ │ Status │ │
│ └─────────────┘ └─────────────────┘ └─────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────┐
│ Step Execution │
│ (any step) │
└─────┬───────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐
│ Enhanced Logger │────►│ ObservabilityEvent│
│ (logEvent) │ │ - step_start │
└─────────────────┘ │ - step_end │
│ │ - state_op │
│ │ - emit_op │
│ │ - stream_op │
▼ └──────────────────┘
┌─────────────────┐ │
│ Log Stream │◄─────────────┘
│ (existing) │
└─────┬───────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐
│ TraceBuilder │────►│ Trace │
│ Service │ │ - id │
│ (aggregator) │ │ - steps[] │
└─────────────────┘ │ - metadata │
│ └──────────────────┘
▼ │
┌─────────────────┐ │
│ In-Memory │◄─────────────┘
│ Trace Cache │
└─────┬───────────┘
│
▼
┌─────────────────┐ ┌──────────────────┐
│ API Endpoints │────►│ UI Components │
│ /motia/traces │ │ - TracesPage │
│ /motia/traces/:id│ │ - Timeline │
└─────────────────┘ │ - FlowStatus │
└──────────────────┘
interface Trace {
id: string
correlationId?: string // Links related traces across triggers
parentTraceId?: string // For child/continuation traces
flowName: string
status: 'running' | 'completed' | 'failed'
startTime: number
duration?: number
entryPoint: { type: 'http' | 'cron' | 'event', stepName: string }
steps: Step[]
metadata: {
totalSteps: number,
completedSteps: number,
errorCount: number,
traceIndex?: number, // Position in logical flow sequence
isChildTrace?: boolean, // Indicates continuation of another trace
correlationContext?: any // Additional correlation metadata
}
}
interface Step {
name: string
status: 'waiting' | 'running' | 'completed' | 'failed'
startTime?: number
duration?: number
operations: { state: number, emit: number, stream: number }
error?: { message: string, code?: string | number }
}
interface TraceGroup {
id: string // Unique identifier (same as correlationId)
correlationId: string // Primary correlation identifier
name: string // Business process name (e.g., "Chat Thread", "Order Processing")
status: 'active' | 'completed' | 'failed' | 'stalled'
startTime: number
lastActivity: number
totalDuration?: number
traces: Trace[] // All related traces in chronological order
metadata: {
totalTraces: number,
completedTraces: number,
activeTraces: number,
totalSteps: number,
averageStepDuration: number,
gapsCount: number, // Number of waiting periods between traces
totalGapDuration: number // Total time spent waiting between traces
}
}
interface ObservabilityEvent extends Log {
eventType: 'step_start' | 'step_end' | 'state_op' | 'emit_op' | 'stream_op'
| 'correlation_start' | 'correlation_continue' // 🆕 Correlation events
traceId: string
correlationId?: string // 🆕 Links to logical flow
parentTraceId?: string // 🆕 For trace relationships
stepName: string
duration?: number
metadata?: {
operation?: 'get' | 'set' | 'delete' | 'clear'
key?: string
topic?: string
streamName?: string
success?: boolean
correlationContext?: any // 🆕 Additional correlation data
correlationMethod?: 'automatic' | 'manual' | 'state-based' | 'event-based' // 🆕
}
}
When a user registration flow executes, the observability system will capture comprehensive data about each step and operation:
Flow: user-registration
Status: Not Started
Steps: 0/3 completed
Trace ID: trace_abc123_20241230_143052
Flow: user-registration
Status: Running
Current Step: validate-user (started at 143:052ms)
Step Operations:
├─ 🗄️ state.get('user.email') - 15ms ✓
├─ 🗄️ state.set('validation.status', 'pending') - 8ms ✓
├─ 🗄️ state.set('validation.result', validationData) - 12ms ✓
└─ 📡 emit('user.validated', userData) → triggers save-user - 5ms ✓
Step Result: ✓ Completed in 95ms
Operations: 3 state, 1 emit, 0 stream
Current Step: save-user (started at 143:147ms)
Step Operations:
├─ 🗄️ state.get('validation.result') - 8ms ✓
├─ 🌊 stream.set('users', userId, userData) - 45ms ✓
├─ 🗄️ state.set('user.saved', true) - 10ms ✓
└─ 📡 emit('user.saved', { userId, email }) → triggers send-email - 3ms ✓
Step Result: ✓ Completed in 155ms
Operations: 2 state, 1 emit, 1 stream
Current Step: send-email (started at 143:302ms)
Step Operations:
├─ 🗄️ state.get('user.saved') - 5ms ✓
├─ 🌊 stream.set('email-queue', emailId, emailData) - 25ms ✓
└─ 🗄️ state.set('email.queued', true) - 8ms ✓
Step Result: ✓ Completed in 75ms
Operations: 2 state, 0 emit, 1 stream
{
"id": "trace_abc123_20241230_143052",
"flowName": "user-registration",
"status": "completed",
"startTime": 1703943052000,
"duration": 325,
"entryPoint": {
"type": "api",
"stepName": "validate-user"
},
"steps": [
{
"name": "validate-user",
"status": "completed",
"startTime": 0,
"duration": 95,
"operations": { "state": 3, "emit": 1, "stream": 0 }
},
{
"name": "save-user",
"status": "completed",
"startTime": 95,
"duration": 155,
"operations": { "state": 2, "emit": 1, "stream": 1 }
},
{
"name": "send-email",
"status": "completed",
"startTime": 250,
"duration": 75,
"operations": { "state": 2, "emit": 0, "stream": 1 }
}
],
"metadata": {
"totalSteps": 3,
"completedSteps": 3,
"errorCount": 0
}
}
user-registration (✓ 325ms) - Started 2 minutes ago
Time → 0ms 100ms 200ms 300ms 400ms
| | | |
┌─────┐
│█████│ validate-user ✓ 95ms
└──┬──┘ 🗄️(3) 📡(1) 🌊(0)
│
▼
┌──────────────┐
│██████████████│ save-user ✓ 155ms
└──────┬───────┘ 🗄️(2) 📡(1) 🌊(1)
│
▼
┌─────────┐
│█████████│ send-email ✓ 75ms
└─────────┘ 🗄️(2) 📡(0) 🌊(1)
Total Operations: 7 state, 2 emit, 2 stream
Performance: All steps < 200ms ✓
This comprehensive observability data enables developers to understand exactly what happened during the workflow execution, identify performance bottlenecks, debug issues, and optimize their flows.
Time → 0ms 100ms 250ms 400ms [ongoing]
| | | |
StepA [████]
|
├─ 🗄️ StateOp(get): user.email (25ms)
├─ 📡 EmitOp: user.validated → StepB
└─ ✓ Complete: 95ms
|
v
StepB [████████]
| |
├─ 🗄️ StateOp(set): validation.result (15ms)
├─ 🌊 StreamOp: users.create (35ms)
├─ 📡 EmitOp: user.saved → StepC
└─ ✓ Complete: 155ms
|
v
StepC [██████]
| |
├─ 🌊 StreamOp: emails.queue (20ms)
└─ ✓ Complete: 75ms
TracesPage
├── TracesList (1/3 width)
│ ├── TraceSearch
│ ├── TraceFilters
│ └── TraceItems[]
│ └── TraceListItem
│ ├── Status Indicator
│ ├── Duration & Steps
│ └── Flow Information
└── TraceTimeline (2/3 width)
├── TimelineHeader
│ ├── TraceMetadata
│ └── TimeAxis
└── TimelineBody
└── StepRow[]
├── StepLabel
├── StepExecutionBar
├── OperationBadges[]
└── DurationInfo
┌─────────────────────────────────────────────────────────────────┐
│ User Registration Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ validate-user ────→ save-user ────→ send-email │
│ 🟢 2 🟡 1 ⚪ 0 │
│ (2 running) (1 running) (waiting) │
│ │
│ 📊 Live Stats: │
│ • Active traces: 3 │
│ • Avg duration: 245ms │
│ • Success rate: 98.5% │
│ │
└─────────────────────────────────────────────────────────────────┘
Step Execution
│
▼
Enhanced Logging ─────► ObservabilityEvent
│ │
▼ ▼
Log Stream ──────────► TraceBuilder
│ │
▼ ▼
WebSocket ────────────► UI Updates
│ │
▼ ▼
Live Status ──────────► Timeline Animation
call-step-file.ts with structured observability events/motia/traces endpoints for trace retrievalThis observability system provides comprehensive visibility into Motia workflows through an intuitive horizontal timeline interface. By leveraging the existing logging infrastructure and focusing on clean, performant UI components, we can deliver powerful debugging and monitoring capabilities with minimal complexity and overhead. The architecture prioritizes simplicity, performance, and developer experience while providing the essential observability features needed for complex workflow debugging.