# Observational Memory
Added in: @mastra/[email protected]
Observational Memory (OM) is Mastra's memory system for long-context agentic memory. Two background agents — an Observer that watches conversations and creates observations, and a Reflector that restructures observations by combining related items, reflecting on overarching patterns, and condensing where possible — maintain an observation log that replaces raw message history as it grows.
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: true,
    },
  }),
})
```
The observationalMemory option accepts true, a configuration object, or false. Setting true enables OM with google/gemini-2.5-flash as the default model. When passing a config object, a model must be explicitly set — either at the top level, or on observation.model and/or reflection.model.
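If you want the config-object form while keeping the default model, a minimal sketch (using only the options documented in the table below) is to set model to 'default':

```typescript
import { Memory } from '@mastra/memory'

// Config objects require an explicit model; 'default' opts into
// the default google/gemini-2.5-flash.
const memory = new Memory({
  options: {
    observationalMemory: {
      model: 'default',
    },
  },
})
```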
Observer input is multimodal-aware. OM keeps text placeholders like [Image #1: screenshot.png] in the transcript it builds for the Observer, and also sends the underlying image parts when possible. This applies to both single-thread observation and batched multi-thread observation. Non-image files appear as placeholders only.
OM performs thresholding with fast local token estimation. Text uses tokenx, and image-like inputs use provider-aware heuristics plus deterministic fallbacks when metadata is incomplete.
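As a rough illustration of the thresholding idea (not OM's internal code), a local estimate can be summed across unobserved text and compared against observation.messageTokens. This sketch assumes tokenx exposes an approximateTokenSize helper:

```typescript
import { approximateTokenSize } from 'tokenx'

// Illustrative only: OM also folds in provider-aware image heuristics
// and deterministic fallbacks before comparing against the threshold.
const MESSAGE_TOKENS = 30_000 // observation.messageTokens default

function shouldObserve(unobservedTexts: string[]): boolean {
  const estimated = unobservedTexts.reduce(
    (total, text) => total + approximateTokenSize(text),
    0,
  )
  return estimated > MESSAGE_TOKENS
}
```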
<PropertiesTable
content={[
{
name: 'enabled',
type: 'boolean',
description:
'Enable or disable Observational Memory. When omitted from a config object, defaults to true. Only enabled: false explicitly disables it.',
isOptional: true,
defaultValue: 'true',
},
{
name: 'model',
type: 'string | LanguageModel | DynamicModel | ModelWithRetries[]',
description:
'Model for both the Observer and Reflector agents. Sets the model for both at once. Cannot be used together with observation.model or reflection.model — an error will be thrown if both are set. When using observationalMemory: true, defaults to google/gemini-2.5-flash. When passing a config object, this or observation.model/reflection.model must be set. Use "default" to explicitly use the default model (google/gemini-2.5-flash).',
isOptional: true,
defaultValue: "'google/gemini-2.5-flash' (when using observationalMemory: true)",
},
{
name: 'scope',
type: "'resource' | 'thread'",
description:
"Memory scope for observations. 'thread' keeps observations per-thread. 'resource' (experimental) shares observations across all threads for a resource, enabling cross-conversation memory.",
isOptional: true,
defaultValue: "'thread'",
},
{
name: 'shareTokenBudget',
type: 'boolean',
description:
'Share the token budget between messages and observations. When enabled, the total budget is observation.messageTokens + reflection.observationTokens. Messages can use more space when observations are small, and vice versa. This maximizes context usage through flexible allocation. shareTokenBudget is not yet compatible with async buffering. You must set observation: { bufferTokens: false } when using this option (this is a temporary limitation).',
isOptional: true,
defaultValue: 'false',
},
{
name: 'retrieval',
type: 'boolean',
description:
"Experimental. Enable retrieval-mode observation groups as durable pointers to raw message history. Retrieval mode is only active when scope is 'thread'. If you set retrieval: true with scope: 'resource', OM keeps resource-scoped memory behavior but skips retrieval-mode context and does not register the recall tool.",
isOptional: true,
defaultValue: 'false',
},
{
name: 'observation',
type: 'ObservationalMemoryObservationConfig',
description: 'Configuration for the observation step. Controls when the Observer agent runs and how it behaves.',
isOptional: true,
properties: [
{
type: 'ObservationalMemoryObservationConfig',
parameters: [
{
name: 'model',
type: 'string | LanguageModel | DynamicModel | ModelWithRetries[]',
description:
'Model for the Observer agent. Cannot be set if a top-level model is also provided. If neither this nor the top-level model is set, falls back to reflection.model.',
isOptional: true,
},
{
name: 'instruction',
type: 'string',
description:
"Custom instruction appended to the Observer's system prompt. Use this to customize what the Observer focuses on, such as domain-specific preferences or priorities.",
isOptional: true,
},
{
name: 'threadTitle',
type: 'boolean',
description:
'When true, the Observer suggests short thread titles and updates the thread title when the conversation topic meaningfully changes. This is opt-in and defaults to disabled.',
isOptional: true,
defaultValue: 'false',
},
{
name: 'messageTokens',
type: 'number',
description:
'Token count of unobserved messages that triggers observation. When unobserved message tokens exceed this threshold, the Observer agent is called. Text is estimated locally with tokenx. Image parts are included with model-aware heuristics when possible, with deterministic fallbacks when image metadata is incomplete. Image-like file parts are counted the same way when uploads are normalized as files.',
isOptional: true,
defaultValue: '30000',
},
{
name: 'maxTokensPerBatch',
type: 'number',
description:
'Maximum tokens per batch when observing multiple threads in resource scope. Threads are chunked into batches of this size and processed in parallel. Lower values mean more parallelism but more API calls.',
isOptional: true,
defaultValue: '10000',
},
{
name: 'modelSettings',
type: 'ObservationalMemoryModelSettings',
description: 'Model settings for the Observer agent.',
isOptional: true,
defaultValue: '{ temperature: 0.3, maxOutputTokens: 100_000 }',
properties: [
{
type: 'ObservationalMemoryModelSettings',
parameters: [
{
name: 'temperature',
type: 'number',
description: 'Temperature for generation. Lower values produce more consistent output.',
isOptional: true,
defaultValue: '0.3',
},
{
name: 'maxOutputTokens',
type: 'number',
description: 'Maximum output tokens. Set high to prevent truncation of observations.',
isOptional: true,
defaultValue: '100000',
},
],
},
],
},
{
name: 'bufferTokens',
type: 'number | false',
description:
'Token interval for async background observation buffering. Can be an absolute token count (e.g. 5000) or a fraction of messageTokens (e.g. 0.25 = buffer every 25% of threshold). When set, observations run in the background at this interval, storing results in a buffer. When the main messageTokens threshold is reached, buffered observations activate instantly without a blocking LLM call. Must resolve to less than messageTokens. Set to false to explicitly disable all async buffering (both observation and reflection).',
isOptional: true,
defaultValue: '0.2',
},
{
name: 'bufferActivation',
type: 'number',
description:
'Controls how much of the message window to retain after activation. Accepts a ratio (0-1) or an absolute token count (≥ 1000). With a ratio, higher values remove more message history per activation: 0.8 means activate enough buffers to remove 80% of messageTokens, leaving 20% as active message history. With an absolute token count, higher values keep more message history: 4000 targets keeping ~4k message tokens remaining after activation.',
isOptional: true,
defaultValue: '0.8',
},
{
name: 'blockAfter',
type: 'number',
description:
'Token threshold above which synchronous (blocking) observation is forced. Between messageTokens and blockAfter, only async buffering/activation is used. Above blockAfter, a synchronous observation runs as a last resort, while buffered activation still preserves a minimum remaining context (min(1000, retention floor)). Accepts a multiplier (1 < value < 2, multiplied by messageTokens) or an absolute token count (≥ 2, must be greater than messageTokens). Only relevant when bufferTokens is set. Defaults to 1.2 when async buffering is enabled.',
isOptional: true,
defaultValue: '1.2 (when bufferTokens is set)',
},
{
name: 'previousObserverTokens',
type: 'number | false',
description:
"Optional token budget for the observer's previous-observations context. When set to a number, the observations passed to the Observer agent are tail-truncated to fit within this budget while keeping the newest observations and preserving highlighted 🔴 items when possible. When a buffered reflection is pending, the already-reflected observation lines are automatically replaced with the reflection summary before truncation. Set to 0 to omit previous observations entirely, or false to disable truncation explicitly.",
isOptional: true,
defaultValue: '2000',
},
],
},
],
},
{
name: 'reflection',
type: 'ObservationalMemoryReflectionConfig',
description: 'Configuration for the reflection step. Controls when the Reflector agent runs and how it behaves.',
isOptional: true,
properties: [
{
type: 'ObservationalMemoryReflectionConfig',
parameters: [
{
name: 'model',
type: 'string | LanguageModel | DynamicModel | ModelWithRetries[]',
description:
'Model for the Reflector agent. Cannot be set if a top-level model is also provided. If neither this nor the top-level model is set, falls back to observation.model.',
isOptional: true,
},
{
name: 'instruction',
type: 'string',
description:
"Custom instruction appended to the Reflector's system prompt. Use this to customize how the Reflector consolidates observations, such as prioritizing certain types of information.",
isOptional: true,
},
{
name: 'observationTokens',
type: 'number',
description:
'Token count of observations that triggers reflection. When observation tokens exceed this threshold, the Reflector agent is called to condense them.',
isOptional: true,
defaultValue: '40000',
},
{
name: 'modelSettings',
type: 'ObservationalMemoryModelSettings',
description: 'Model settings for the Reflector agent.',
isOptional: true,
defaultValue: '{ temperature: 0, maxOutputTokens: 100_000 }',
properties: [
{
type: 'ObservationalMemoryModelSettings',
parameters: [
{
name: 'temperature',
type: 'number',
description: 'Temperature for generation. Lower values produce more consistent output.',
isOptional: true,
defaultValue: '0',
},
{
name: 'maxOutputTokens',
type: 'number',
description: 'Maximum output tokens. Set high to prevent truncation of observations.',
isOptional: true,
defaultValue: '100000',
},
],
},
],
},
{
name: 'bufferActivation',
type: 'number',
description:
'Ratio (0-1) controlling when async reflection buffering starts. When observation tokens reach observationTokens * bufferActivation, reflection runs in the background. On activation at the full threshold, the buffered reflection replaces the observations it covers, preserving any new observations appended after that range.',
isOptional: true,
defaultValue: '0.5',
},
{
name: 'blockAfter',
type: 'number',
description:
'Token threshold above which synchronous (blocking) reflection is forced. Between observationTokens and blockAfter, only async buffering/activation is used. Above blockAfter, a synchronous reflection runs as a last resort. Accepts a multiplier (1 < value < 2, multiplied by observationTokens) or an absolute token count (≥ 2, must be greater than observationTokens). Only relevant when bufferActivation is set. Defaults to 1.2 when async reflection is enabled.',
isOptional: true,
defaultValue: '1.2 (when bufferActivation is set)',
},
],
},
],
},
]}
/>
OM persists token payload estimates (under part.providerMetadata.mastra) so repeated counting can reuse prior token estimation work. data-* and reasoning parts are skipped and don't receive cache entries.

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        scope: 'resource',
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 60_000,
        },
      },
    },
  }),
})
```
When shareTokenBudget is enabled, the total budget is observation.messageTokens + reflection.observationTokens (100k in the example below). If observations only use 30k tokens, messages can expand to use up to 70k. If messages are short, observations have more room before triggering reflection.
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        shareTokenBudget: true,
        observation: {
          messageTokens: 20_000,
          bufferTokens: false, // required when using shareTokenBudget (temporary limitation)
        },
        reflection: {
          observationTokens: 80_000,
        },
      },
    },
  }),
})
```
By passing a model in the config, you can use any model from Mastra's model router.
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        // highlight-next-line
        model: 'openai/gpt-5-mini',
      },
    },
  }),
})
```
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        // highlight-start
        observation: {
          model: 'google/gemini-2.5-flash',
        },
        reflection: {
          model: 'openai/gpt-5-mini',
        },
        // highlight-end
      },
    },
  }),
})
```
Customize what the Observer and Reflector focus on by providing custom instructions:
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'health-assistant',
  instructions: 'You are a health and wellness assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        observation: {
          // Focus observations on health-related preferences and goals
          instruction:
            'Prioritize capturing user health goals, dietary restrictions, exercise preferences, and medical considerations. Avoid capturing general chit-chat.',
        },
        reflection: {
          // Guide reflection to consolidate health patterns
          instruction:
            'When consolidating, group related health information together. Preserve specific metrics, dates, and medical details.',
        },
      },
    },
  }),
})
```
Async buffering is enabled by default. It pre-computes observations in the background as the conversation grows — when the messageTokens threshold is reached, buffered observations activate instantly with no blocking LLM call.
The lifecycle is: buffer → activate → remove messages → repeat. Background Observer calls run at bufferTokens intervals, each producing a chunk of observations. At threshold, chunks activate: observations move into the log, raw messages are removed from context. The blockAfter threshold forces a synchronous fallback if buffering can't keep up.
Default settings:
- observation.bufferTokens: 0.2 — buffer every 20% of messageTokens (e.g. every ~6k tokens with a 30k threshold)
- observation.bufferActivation: 0.8 — on activation, remove enough messages to keep only 20% of the threshold remaining
- Buffered observations include working-state fields (suggestedResponse, currentTask) that survive activation to maintain conversational continuity
- reflection.bufferActivation: 0.5 — start background reflection at 50% of the observation threshold

To customize:
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        observation: {
          messageTokens: 30_000,
          // Buffer every 5k tokens (runs in background)
          bufferTokens: 5_000,
          // Activate to retain 30% of threshold
          bufferActivation: 0.7,
          // Force synchronous observation at 1.5x threshold
          blockAfter: 1.5,
        },
        reflection: {
          observationTokens: 60_000,
          // Start background reflection at 50% of threshold
          bufferActivation: 0.5,
          // Force synchronous reflection at 1.2x threshold
          blockAfter: 1.2,
        },
      },
    },
  }),
})
```
To disable async buffering entirely:
```typescript
observationalMemory: {
  model: "google/gemini-2.5-flash",
  observation: {
    bufferTokens: false,
  },
}
```
Setting bufferTokens: false disables both observation and reflection async buffering. Observations and reflections will run synchronously when their thresholds are reached.
:::note
Async buffering isn't supported with scope: 'resource' and is automatically disabled in resource scope.
:::
Observational Memory emits typed data parts during agent execution that clients can use for real-time UI feedback. These are streamed alongside the agent's response.
### data-om-status

Emitted once per agent loop step, before model generation. Provides a snapshot of the current memory state, including token usage for both context windows and the state of any async buffered content.
```typescript
interface DataOmStatusPart {
  type: 'data-om-status'
  data: {
    windows: {
      active: {
        /** Unobserved message tokens and the threshold that triggers observation */
        messages: { tokens: number; threshold: number }
        /** Observation tokens and the threshold that triggers reflection */
        observations: { tokens: number; threshold: number }
      }
      buffered: {
        observations: {
          /** Number of buffered chunks staged for activation */
          chunks: number
          /** Total message tokens across all buffered chunks */
          messageTokens: number
          /** Projected message tokens that would be removed if activation happened now (based on bufferActivation ratio and chunk boundaries) */
          projectedMessageRemoval: number
          /** Observation tokens that will be added on activation */
          observationTokens: number
          /** idle: no buffering in progress. running: background observer is working. complete: chunks are ready for activation. */
          status: 'idle' | 'running' | 'complete'
        }
        reflection: {
          /** Observation tokens that were fed into the reflector (pre-compression size) */
          inputObservationTokens: number
          /** Observation tokens the reflection will produce on activation (post-compression size) */
          observationTokens: number
          /** idle: no reflection buffered. running: background reflector is working. complete: reflection is ready for activation. */
          status: 'idle' | 'running' | 'complete'
        }
      }
    }
    recordId: string
    threadId: string
    stepNumber: number
    /** Increments each time the Reflector creates a new generation */
    generationCount: number
  }
}
```
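How these parts arrive depends on your streaming setup. As a hedged sketch, assuming agent.stream() returns a result whose fullStream async iterable yields typed parts, a client could watch for status snapshots like this:

```typescript
// Sketch: surface OM token usage in a client UI.
const stream = await agent.stream('What did we decide about the schema?')

for await (const part of stream.fullStream) {
  if (part.type === 'data-om-status') {
    const { messages, observations } = part.data.windows.active
    // e.g. drive progress bars toward each threshold
    console.log(`messages: ${messages.tokens}/${messages.threshold}`)
    console.log(`observations: ${observations.tokens}/${observations.threshold}`)
  }
}
```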
buffered.reflection.inputObservationTokens is the size of the observations that were sent to the Reflector. buffered.reflection.observationTokens is the compressed result — the size of what will replace those observations when the reflection activates. A client can use these two values to show a compression ratio.
Clients can derive percentages and post-activation estimates from the raw values:
```typescript
// Message window usage %
const msgPercent = status.windows.active.messages.tokens / status.windows.active.messages.threshold

// Observation window usage %
const obsPercent =
  status.windows.active.observations.tokens / status.windows.active.observations.threshold

// Projected message tokens after buffered observations activate
// Uses projectedMessageRemoval which accounts for bufferActivation ratio and chunk boundaries
const postActivation =
  status.windows.active.messages.tokens -
  status.windows.buffered.observations.projectedMessageRemoval

// Reflection compression ratio (when buffered reflection exists)
const { inputObservationTokens, observationTokens } = status.windows.buffered.reflection
if (inputObservationTokens > 0) {
  const compressionRatio = observationTokens / inputObservationTokens
}
```
### data-om-observation-start

Emitted when the Observer or Reflector agent begins processing.
<PropertiesTable
content={[
{
name: 'cycleId',
type: 'string',
description: 'Unique ID for this cycle — shared between start/end/failed markers.',
},
{
name: 'operationType',
type: "'observation' | 'reflection'",
description: 'Whether this is an observation or reflection operation.',
},
{ name: 'startedAt', type: 'string', description: 'ISO timestamp when processing started.' },
{ name: 'tokensToObserve', type: 'number', description: 'Message tokens (input) being processed in this batch.' },
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
{ name: 'threadIds', type: 'string[]', description: 'All thread IDs in this batch (for resource-scoped).' },
{
name: 'config',
type: 'ObservationMarkerConfig',
description: 'Snapshot of messageTokens, observationTokens, and scope at observation time.',
},
]}
/>
### data-om-observation-end

Emitted when observation or reflection completes successfully.
<PropertiesTable
content={[
{ name: 'cycleId', type: 'string', description: 'Matches the corresponding start marker.' },
{ name: 'operationType', type: "'observation' | 'reflection'", description: 'Type of operation that completed.' },
{ name: 'completedAt', type: 'string', description: 'ISO timestamp when processing completed.' },
{ name: 'durationMs', type: 'number', description: 'Duration in milliseconds.' },
{ name: 'tokensObserved', type: 'number', description: 'Message tokens (input) that were processed.' },
{
name: 'observationTokens',
type: 'number',
description: 'Resulting observation tokens (output) after the Observer compressed them.',
},
{ name: 'observations', type: 'string', description: 'The generated observations text.', isOptional: true },
{ name: 'currentTask', type: 'string', description: 'Current task extracted by the Observer.', isOptional: true },
{
name: 'suggestedResponse',
type: 'string',
description: 'Suggested response extracted by the Observer.',
isOptional: true,
},
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
]}
/>
### data-om-observation-failed

Emitted when observation or reflection fails. The system falls back to synchronous processing.
<PropertiesTable
content={[
{ name: 'cycleId', type: 'string', description: 'Matches the corresponding start marker.' },
{ name: 'operationType', type: "'observation' | 'reflection'", description: 'Type of operation that failed.' },
{ name: 'failedAt', type: 'string', description: 'ISO timestamp when the failure occurred.' },
{ name: 'durationMs', type: 'number', description: 'Duration until failure in milliseconds.' },
{ name: 'tokensAttempted', type: 'number', description: 'Message tokens (input) that were attempted.' },
{ name: 'error', type: 'string', description: 'Error message.' },
{ name: 'observations', type: 'string', description: 'Any partial content available for display.', isOptional: true },
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
]}
/>
### data-om-buffering-start

Emitted when async buffering begins in the background. Buffering pre-computes observations or reflections before the main threshold is reached.
<PropertiesTable
content={[
{ name: 'cycleId', type: 'string', description: 'Unique ID for this buffering cycle.' },
{ name: 'operationType', type: "'observation' | 'reflection'", description: 'Type of operation being buffered.' },
{ name: 'startedAt', type: 'string', description: 'ISO timestamp when buffering started.' },
{ name: 'tokensToBuffer', type: 'number', description: 'Message tokens (input) being buffered in this cycle.' },
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
{ name: 'threadIds', type: 'string[]', description: 'All thread IDs being buffered (for resource-scoped).' },
{ name: 'config', type: 'ObservationMarkerConfig', description: 'Snapshot of config at buffering time.' },
]}
/>
### data-om-buffering-end

Emitted when async buffering completes. The content is stored but not yet activated in the main context.
<PropertiesTable
content={[
{ name: 'cycleId', type: 'string', description: 'Matches the corresponding buffering-start marker.' },
{ name: 'operationType', type: "'observation' | 'reflection'", description: 'Type of operation that was buffered.' },
{ name: 'completedAt', type: 'string', description: 'ISO timestamp when buffering completed.' },
{ name: 'durationMs', type: 'number', description: 'Duration in milliseconds.' },
{ name: 'tokensBuffered', type: 'number', description: 'Message tokens (input) that were buffered.' },
{
name: 'bufferedTokens',
type: 'number',
description: 'Observation tokens (output) after the Observer compressed them.',
},
{ name: 'observations', type: 'string', description: 'The buffered content.', isOptional: true },
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
]}
/>
### data-om-buffering-failed

Emitted when async buffering fails. The system falls back to synchronous processing when the threshold is reached.
<PropertiesTable
content={[
{ name: 'cycleId', type: 'string', description: 'Matches the corresponding buffering-start marker.' },
{ name: 'operationType', type: "'observation' | 'reflection'", description: 'Type of operation that failed.' },
{ name: 'failedAt', type: 'string', description: 'ISO timestamp when the failure occurred.' },
{ name: 'durationMs', type: 'number', description: 'Duration until failure in milliseconds.' },
{ name: 'tokensAttempted', type: 'number', description: 'Message tokens (input) that were attempted to buffer.' },
{ name: 'error', type: 'string', description: 'Error message.' },
{ name: 'observations', type: 'string', description: 'Any partial content.', isOptional: true },
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
]}
/>
### data-om-activation

Emitted when buffered observations or reflections are activated (moved into the active context window). This is an instant operation — no LLM call is involved.
<PropertiesTable
content={[
{ name: 'cycleId', type: 'string', description: 'Unique ID for this activation event.' },
{ name: 'operationType', type: "'observation' | 'reflection'", description: 'Type of content activated.' },
{ name: 'activatedAt', type: 'string', description: 'ISO timestamp when activation occurred.' },
{ name: 'chunksActivated', type: 'number', description: 'Number of buffered chunks activated.' },
{
name: 'tokensActivated',
type: 'number',
description:
'Message tokens (input) from activated chunks. For observation activation, these are removed from the message window. For reflection activation, this is the observation tokens that were compressed.',
},
{ name: 'observationTokens', type: 'number', description: 'Resulting observation tokens after activation.' },
{ name: 'messagesActivated', type: 'number', description: 'Number of messages that were observed via activation.' },
{ name: 'generationCount', type: 'number', description: 'Current reflection generation count.' },
{ name: 'observations', type: 'string', description: 'The activated observations text.', isOptional: true },
{ name: 'recordId', type: 'string', description: 'The OM record ID.' },
{ name: 'threadId', type: 'string', description: "This thread's ID." },
{ name: 'config', type: 'ObservationMarkerConfig', description: 'Snapshot of config at activation time.' },
]}
/>
Most users should use the Memory class above. Using ObservationalMemory directly is mainly useful for benchmarking, experimentation, or when you need to control processor ordering with other processors (like guardrails).
```typescript
import { ObservationalMemory } from '@mastra/memory/processors'
import { Agent } from '@mastra/core/agent'
import { LibSQLStore } from '@mastra/libsql'

const storage = new LibSQLStore({
  id: 'my-storage',
  url: 'file:./memory.db',
})

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: 'google/gemini-2.5-flash',
  scope: 'resource',
  observation: {
    messageTokens: 20_000,
  },
  reflection: {
    observationTokens: 60_000,
  },
})

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  inputProcessors: [om],
  outputProcessors: [om],
})
```
The standalone ObservationalMemory class accepts all the same options as the observationalMemory config object above, plus the following:
<PropertiesTable
content={[
{
name: 'storage',
type: 'MemoryStorage',
description:
'Storage adapter for persisting observations. Must be a MemoryStorage instance (from MastraStorage.stores.memory).',
isOptional: false,
},
{
name: 'onDebugEvent',
type: '(event: ObservationDebugEvent) => void',
description:
'Debug callback for observation events. Called whenever observation-related events occur. Useful for debugging and understanding the observation flow.',
isOptional: true,
},
{
name: 'obscureThreadIds',
type: 'boolean',
description:
'When enabled, thread IDs are hashed before being included in observation context. This prevents the LLM from recognizing patterns in thread identifiers. Automatically enabled when using resource scope through the Memory class.',
isOptional: true,
defaultValue: 'false',
},
]}
/>
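For example, a minimal debug hook can log every event for inspection. This is a sketch: it treats the exact shape of ObservationDebugEvent as opaque and simply dumps it.

```typescript
import { ObservationalMemory } from '@mastra/memory/processors'
import { LibSQLStore } from '@mastra/libsql'

const storage = new LibSQLStore({
  id: 'debug-storage',
  url: 'file:./memory.db',
})

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: 'google/gemini-2.5-flash',
  // Dump each observation-related event as it occurs.
  onDebugEvent: (event) => {
    console.dir(event, { depth: null })
  },
})
```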
When retrieval: true is set with scope: 'thread', OM registers a recall tool that the agent can call to page through the raw messages behind an observation group's _range. The tool is automatically added to the agent's tool list — no manual registration is needed.
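A minimal sketch of enabling retrieval, using only the options documented above:

```typescript
import { Memory } from '@mastra/memory'

// retrieval is experimental and only active with 'thread' scope;
// with scope: 'resource' the recall tool is not registered.
const memory = new Memory({
  options: {
    observationalMemory: {
      model: 'google/gemini-2.5-flash',
      retrieval: true,
      scope: 'thread',
    },
  },
})
```

The recall tool accepts the following parameters: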
<PropertiesTable
content={[
{
name: 'cursor',
type: 'string',
isOptional: false,
description:
'A message ID to anchor the recall query. Extract the start or end ID from an observation group range (e.g. from _range: startId:endId, use either startId or endId). If a range string is passed directly, the tool returns a hint explaining how to extract the correct ID.',
},
{
name: 'page',
type: 'number',
isOptional: true,
defaultValue: '1',
description:
'Pagination offset from the cursor. Positive values page forward (messages after the cursor), negative values page backward (messages before the cursor). 0 is treated as 1.',
},
{
name: 'limit',
type: 'number',
isOptional: true,
defaultValue: '20',
description: 'Maximum number of messages per page.',
},
{
name: 'detail',
type: "'low' | 'high'",
isOptional: true,
defaultValue: "'low'",
description:
"Controls how much content is shown per message part. 'low' shows truncated text and tool names with positional indices ([p0], [p1]). 'high' shows full content including tool arguments and results, clamped to one part per call with continuation hints.",
},
{
name: 'partIndex',
type: 'number',
isOptional: true,
description:
'Fetch a single message part at full detail by its positional index. Use this when a low-detail recall shows an interesting part at [p1] — call again with partIndex: 1 to see the full content without loading every part.',
},
]}
/>
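For illustration, the arguments the agent might pass when paging backward from a range's start ID could look like this (the cursor value is hypothetical):

```typescript
// Hypothetical recall call: page backward from an observation
// group's _range start ID at low detail.
const recallArgs = {
  cursor: 'msg_abc123', // extracted from a `_range: startId:endId` value
  page: -1, // the page immediately before the cursor
  limit: 10,
  detail: 'low' as const,
}
```

The tool returns: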
<PropertiesTable
content={[
{
name: 'messages',
type: 'string',
description: 'Formatted message content. Format depends on the detail level.',
},
{
name: 'count',
type: 'number',
description: 'Number of messages in this page.',
},
{
name: 'cursor',
type: 'string',
description: 'The cursor message ID used for this query.',
},
{
name: 'page',
type: 'number',
description: 'The page number returned.',
},
{
name: 'limit',
type: 'number',
description: 'The limit used for this query.',
},
{
name: 'hasNextPage',
type: 'boolean',
description: 'Whether more messages exist after this page.',
},
{
name: 'hasPrevPage',
type: 'boolean',
description: 'Whether more messages exist before this page.',
},
{
name: 'truncated',
type: 'boolean',
isOptional: true,
description:
'Present and true when the output was capped by the token budget. The agent can paginate or use partIndex to access remaining content.',
},
{
name: 'tokenOffset',
type: 'number',
isOptional: true,
description: 'Approximate number of tokens that were trimmed when truncated is true.',
},
]}
/>