Deep Analysis: Agent Context Gaps in Feed Simulation

packages/feed/docs/agent-context-analysis.md


Analysis of what context agents actually receive, what's missing, and why it produces poor simulation quality. Audited against the codebase on 2026-03-30. All claims verified — see audit annotations throughout. Updated 2026-03-31: Added Section 11 with live inspection results from production DB using context-inspector.


Executive Summary

The agent context system is a paradox: it's simultaneously over-engineered and under-delivering. There are elaborate context assembly pipelines, but critical data is either truncated to uselessness, built but never wired into prompts, or simply absent. The result is agents that feel like amnesiacs performing in a play they haven't read.

The core problem is not that the system lacks data — it's that the data doesn't reach the agents in a useful form.


1. What Agents Actually Receive for Trading Decisions

The Trading Context Pipeline

MarketDecisionEngine → MarketContextService → npc-market-decisions.ts prompt

Each NPC gets a "TRADER DASHBOARD" (MarketDecisionEngine.ts lines 641-649):

ID: {npcId} | Name: {npcName}
Archetype: {mapped_personality} | Strategy: {trend/contrarian/random}
Bias: {strategy_bias} | Cash: ${availableBalance}
Total PnL: {totalPnL} | Exposure: {exposure%}
Network: {top_4_relationships}           ← max 4, |sentiment| > 0.4 only
Positions: {top_3_positions}             ← top 3 by PnL only
Current Focus: {last_3_posts_20chars}    ← 20 characters each
PRIVATE INTEL: {last_2_group_msgs}       ← 2 messages, 120 chars each

What IS provided (and how it's crippled)

| Data | Included? | Limit | Problem |
|---|---|---|---|
| Current portfolio positions | Yes | Top 3 by PnL | Agents can't see their full book; may have 10 positions but only see 3 |
| Available balance | Yes | None | Works correctly |
| Perp market prices | Yes | All companies | Only current price + 24h change; no history, no trends |
| Prediction market prices | Yes | 15 markets max | Question text truncated to 120 chars; no price history |
| Recent feed posts | Yes | 50 posts, 200 chars each | Truncated to the point of losing meaning |
| Group chat messages | Partially | 2 messages, 120 chars each | Almost useless: 2 messages at 120 chars provide no conversational context |
| World events | Yes | 30 events, 150 chars each | Truncated descriptions lose critical detail |
| Relationships | Yes | Top 4 by sentiment | Just "Ally:Name, Rival:Name"; no history, no context for why |
| Event-market signals | Yes | Cached, all markets | Shows which events affect which markets; actually useful |
| Active questions | Yes | Top 10 | Question text + days until resolution |

What is NOT provided for trading

| Missing Data | Impact | Data Exists in DB? |
|---|---|---|
| Trading history | Agents can't learn from past trades; repeat same mistakes | Yes (npcTrades table) |
| Price history beyond 24h | No trend analysis, no support/resistance | Yes (predictionPriceHistory, stockPrices) |
| Order book / liquidity depth | Can't assess slippage or market depth | Partially (AMM formulas exist) |
| Other NPCs' positions | No awareness of market consensus or crowding | Yes (poolPositions table) |
| Resolved question outcomes | Can't learn from what happened before | Yes (questions table with outcomes) |
| Ongoing narrative arcs | Template variable {{ongoingNarrativesContext}} exists but is never populated | Yes (arc plans in DB) |
| Previous trades | Template variable {{previousTrades}} exists but is never populated | Yes (trade records in DB) |
| Detailed character profiles | Template variable {{detailedCharacterProfiles}} exists but is never populated | Yes (static data) |
| Character roster | Template variable {{characterRoster}} exists but is never populated | Yes (static data) |
| Resolved questions | Template variable {{resolvedQuestionsContext}} exists but is never populated | Yes (DB) |
| Market signal analysis | extractMarketSignals() is built (lines 1005-1071 in market-context-service.ts) and analyzes YES/NO signal strength and confidence, but is never exposed to NPCs | Built in code, never wired |

The "Ghost Variables" Problem

The prompt template (npc-market-decisions.ts) defines these variables that are never populated by the engine:

  • {{characterRoster}} — empty
  • {{detailedCharacterProfiles}} — empty
  • {{relationshipContext}} — empty
  • {{resolvedQuestionsContext}} — empty
  • {{previousTrades}} — empty
  • {{ongoingNarrativesContext}} — empty

[CONFIRMED] These are in the prompt loader's optionalVars list (prompts/loader.ts lines 55-141), so they don't throw errors — but they also don't render as empty. The literal strings like {{characterRoster}} are sent verbatim to the LLM alongside section headers like "=== ALL TRADERS IN WORLD ===". The LLM sees template syntax where content should be, which actively degrades output quality.
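
One way to keep unpopulated optional variables from leaking into the prompt is to resolve them explicitly at render time. The sketch below is illustrative only: `renderPrompt`, its parameters, and the fallback text are hypothetical and do not reflect the actual API of prompts/loader.ts. It substitutes a neutral fallback for any optional variable without a value and reports which ones were missing, so the gap is visible to developers instead of to the LLM.

```typescript
// Hypothetical renderer; names and behavior are assumptions, not the real loader.
interface RenderResult {
  text: string;
  missing: string[]; // optional variables that had no value
}

function renderPrompt(
  template: string,
  vars: Record<string, string | undefined>,
  optionalVars: readonly string[],
  fallback: string = "(no data available)",
): RenderResult {
  const missing: string[] = [];
  const text = template.replace(/\{\{(\w+)\}\}/g, (_match: string, name: string) => {
    const value = vars[name];
    if (value !== undefined && value.trim() !== "") return value;
    // Required variables should fail loudly rather than render as raw syntax.
    if (optionalVars.indexOf(name) === -1) {
      throw new Error(`Required prompt variable not provided: ${name}`);
    }
    if (missing.indexOf(name) === -1) missing.push(name);
    return fallback;
  });
  return { text, missing };
}

// Example: the roster is optional and unpopulated, the cash value is provided.
const rendered = renderPrompt(
  "=== ALL TRADERS IN WORLD ===\n{{characterRoster}}\nCash: {{cash}}",
  { cash: "$500" },
  ["characterRoster"],
);
```

Here `rendered.missing` lists `characterRoster`, which a caller could log or use to drop the whole section header rather than send an empty section.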


2. What Agents Receive for Social/Posting Decisions

The posting pipeline is significantly richer than the trading pipeline. FeedGenerator assembles per-character context via buildRichCharacterContext() (lines 455-539):

Context provided for posts

  • Full character identity (description, bio, domain, affiliations, tier)
  • Personality, voice style, post examples (up to 5, shuffled)
  • Social dynamics (allies/rivals with behavioral instructions)
  • Motivations (wealth/reputation/ideology/chaos)
  • Deception tendency
  • Emotional state (mood, luck, and how they affect tone)
  • Personal event history
  • Recent own posts (for anti-repetition)
  • Market positions and P&L
  • Relationship history with interaction notes
  • Complete event timeline (all previous days)
  • Recent feed posts from all NPCs
  • Ongoing storylines
  • Resolved question outcomes
  • World facts
  • Phase guidance (WILD/CONNECTION/CONVERGENCE/CLIMAX/RESOLUTION)
  • Trending topics
  • Group chat messages
  • Time-of-day energy modifiers

The asymmetry problem

Posts get far more context than trading decisions. An NPC writing a shitpost about a stock gets the full event timeline, resolved questions, ongoing narratives, and rich relationship history. The same NPC making a $10,000 trade gets 3 truncated positions, 2 group chat messages at 120 chars, and an empty {{ongoingNarrativesContext}} section.

This means NPC posts reference narratives and events that their trading decisions are blind to. An NPC might post "TSLAI is going to moon based on the leaked partnership" but then make a trading decision without any awareness of that leaked partnership, because the trading context doesn't include the narrative arc.


3. Agent Memory System: Exists But Shallow

What persists between ticks

NPC Memory Service (npc-memory-service.ts, 697 lines):

| State | Persists? | Storage | Cap |
|---|---|---|---|
| Recent memories | Yes | ActorState.recentMemories (JSONB) | 50 entries, FIFO eviction |
| Relationship sentiment | Yes | ActorState.relationships (JSONB) | Per-pair, 10 notes max |
| Activity state | Yes | ActorState columns | Posts today, last active, mood |
| Conversation history | Yes | messages table | Permanent storage |
| Trading positions | Yes | poolPositions / perpPositions | Until closed |

Memory types tracked: posted, replied_to, mentioned_by, witnessed_event, traded, running_bit
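
The capped, first-in-first-out behavior described for recent memories can be sketched as follows. This is a minimal illustration, not the code in npc-memory-service.ts: the `MemoryEntry` shape and `appendMemory` helper are assumptions, and only the memory type names and the 50-entry cap come from this analysis.

```typescript
// Illustrative only: the entry shape and helper name are assumptions.
type MemoryType =
  | "posted"
  | "replied_to"
  | "mentioned_by"
  | "witnessed_event"
  | "traded"
  | "running_bit";

interface MemoryEntry {
  type: MemoryType;
  summary: string;
  timestamp: number; // epoch milliseconds
}

const MAX_RECENT_MEMORIES = 50; // cap reported for ActorState.recentMemories

// Append a memory and evict the oldest entries once the cap is exceeded (FIFO).
function appendMemory(
  memories: readonly MemoryEntry[],
  entry: MemoryEntry,
  cap: number = MAX_RECENT_MEMORIES,
): MemoryEntry[] {
  const next = memories.concat(entry);
  return next.length > cap ? next.slice(next.length - cap) : next;
}

// Example: after 52 appends only the newest 50 entries remain.
let memoryLog: MemoryEntry[] = [];
for (let i = 0; i < 52; i++) {
  memoryLog = appendMemory(memoryLog, { type: "posted", summary: `post ${i}`, timestamp: i });
}
```

With a pure FIFO cap, a burst of low-value entries (for example many `posted` memories) can push out a single important `traded` or `witnessed_event` memory, which is one reason the cap alone does not give agents durable knowledge.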

What does NOT persist

  • No decision reasoning — agents don't remember WHY they made a trade
  • No beliefs or theories — no "I think TSLAI will go up because..." state
  • No learning from outcomes — resolved questions don't update agent beliefs
  • No cross-agent knowledge — agents can't share private conclusions
  • No conversation summaries — raw messages stored, but no distilled takeaways
  • No strategy evolution — trading strategy is static (mapped from personality), never adapts

How memory actually reaches the LLM

For posting: formatMemoriesForPrompt() creates a ## Recent Memories section with time-ago labels. This works reasonably well.

For trading: Memories are not included in the trading prompt at all. The npc-market-decisions.ts template has no {{memories}} variable. NPCs trade without any memory of their past actions or observations.
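
A memories section for the trading template could be assembled along these lines. This is a sketch under stated assumptions: `formatMemoriesSection` and `timeAgo` are hypothetical stand-ins, since the real signature of formatMemoriesForPrompt() is not shown in this analysis, and the `Memory` shape is invented for illustration.

```typescript
// Hypothetical shapes and helpers; not the real NpcMemoryService API.
interface Memory {
  type: string;
  summary: string;
  timestamp: number; // epoch milliseconds
}

// Convert an elapsed duration into a coarse "time ago" label.
function timeAgo(nowMs: number, thenMs: number): string {
  const minutes = Math.max(0, Math.floor((nowMs - thenMs) / 60000));
  if (minutes < 1) return "just now";
  if (minutes < 60) return `${minutes}m ago`;
  const hours = Math.floor(minutes / 60);
  if (hours < 24) return `${hours}h ago`;
  return `${Math.floor(hours / 24)}d ago`;
}

// Render the newest `limit` memories, most recent first, under a section header.
function formatMemoriesSection(
  memories: readonly Memory[],
  nowMs: number,
  limit: number = 10,
): string {
  const recent = memories
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit);
  if (recent.length === 0) return "## Recent Memories\n(none)";
  const lines = recent.map(
    (m) => `- [${timeAgo(nowMs, m.timestamp)}] ${m.type}: ${m.summary}`,
  );
  return ["## Recent Memories"].concat(lines).join("\n");
}
```

The resulting string could be passed as the value of a `{{memories}}` variable, with `limit` tuned against the batch size so the section stays within the trading prompt's token budget.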


4. Autonomous Agent System (User-Controlled Agents)

File: packages/agents/src/autonomous/AutonomousCoordinator.ts

User-controlled autonomous agents have an even thinner context than NPCs:

Per-tick context gathering

Each executeAutonomousTick() starts completely fresh (line 66). No in-memory state survives between ticks. Context is re-fetched every tick:

  • getAgentPositions() — current positions
  • getRecentPosts() — last 24h of posts (all users)
  • getAgentOwnPosts() — last 5 of agent's own posts
  • getAgentGroupChats() — list of group chats

Group chat context

  • Looks back over only the last 1 hour of messages
  • Limited to 10 messages per chat
  • Only last 5 messages included in DM context

This means an autonomous agent that had a detailed strategic conversation in a group chat 2 hours ago has zero memory of that conversation when making its next decision.
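
The effect of the lookback window can be made concrete with a small filter. The sketch below is an assumption-laden illustration, not the code in AutonomousGroupChatService.ts: `ChatMessage`, `LookbackOptions`, and `selectChatContext` are hypothetical names, and only the 1-hour window and 10-message cap come from the behavior described above.

```typescript
// Hypothetical types and helper for illustration.
interface ChatMessage {
  chatId: string;
  sender: string;
  text: string;
  sentAt: number; // epoch milliseconds
}

interface LookbackOptions {
  lookbackHours: number; // 1 in the behavior described above
  maxMessagesPerChat: number; // 10 in the behavior described above
}

// Keep only messages inside the lookback window, then the newest N of them,
// returned oldest first so the conversation reads in order.
function selectChatContext(
  messages: readonly ChatMessage[],
  nowMs: number,
  options: LookbackOptions,
): ChatMessage[] {
  const cutoff = nowMs - options.lookbackHours * 3600000;
  const inWindow = messages
    .filter((m) => m.sentAt >= cutoff && m.sentAt <= nowMs)
    .sort((a, b) => a.sentAt - b.sentAt);
  return inWindow.slice(Math.max(0, inWindow.length - options.maxMessagesPerChat));
}
```

With `lookbackHours: 1`, a message sent two hours ago is filtered out before the message cap is even applied; widening the window to 24 hours keeps it, at the cost of relying on the cap (or a summary) to bound prompt size.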

No trajectory or learning

  • Trajectory recording is disabled by default (line 69)
  • When enabled, it's for RL training data, not runtime behavior
  • No mechanism to learn from past trades, conversations, or market outcomes

5. The Production Path is Worse Than the Dev Path

A critical finding: the production code paths are more context-impoverished than the development/GameGenerator paths.

| Feature | GameGenerator Path | Production Path |
|---|---|---|
| Actor/org shuffling | Yes (lines 431-440) | No (fixed order from StaticDataRegistry) |
| Scenarios | LLM-generated, varied | Hardcoded scenarioId = 1 (line 1357) |
| Rich game context | Full event timeline | Cached, potentially stale (60s TTL) |
| World context detail | Comprehensive | realityGroundingLevel: 'minimal' |

6. Truncation Destroys Context Value

The system aggressively truncates everything to fit token budgets, but the truncation points destroy the informational value:

| Content | Truncation | What's Lost |
|---|---|---|
| Feed posts | 200 chars | A typical insight is 300-500 chars; agents see sentence fragments |
| Group chat msgs | 120 chars | Conversations are incomprehensible at this length |
| Event descriptions | 150 chars | Complex events ("Company X acquires Y for $Z, pending regulatory...") get cut mid-sentence |
| Question text | 120 chars | Prediction market questions are often >120 chars, so agents can't read the full question they're betting on |
| Post "focus" | 20 chars per post | "Current Focus: TSLAI looks like i..." is meaningless |

The token budget is real (4-20 NPCs batched per LLM call), but the current approach of "include everything, truncate everything" produces quantity without quality. Agents see 50 posts they can't understand rather than 5 posts they can.
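
A "fewer but readable" selection could look roughly like the following. This is a sketch, not the existing market-context-service.ts logic: `FeedPost`, `relevance`, `selectPosts`, and `truncateAtWord` are hypothetical, and scoring by mentions of held tickers is just one assumed relevance signal.

```typescript
// Hypothetical types and helpers for illustration.
interface FeedPost {
  id: string;
  text: string;
}

// Cut at a word boundary so the result, including "...", is at most maxChars long.
function truncateAtWord(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  const cut = text.slice(0, Math.max(0, maxChars - 3));
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "...";
}

// Score a post by how many of the agent's held tickers it mentions (case-insensitive).
function relevance(post: FeedPost, heldTickers: readonly string[]): number {
  const upper = post.text.toUpperCase();
  return heldTickers.filter((t) => upper.indexOf(t.toUpperCase()) !== -1).length;
}

// Pick the most relevant posts (ties broken by feed order), at most maxPosts of them.
function selectPosts(
  posts: readonly FeedPost[],
  heldTickers: readonly string[],
  maxPosts: number,
  maxChars: number,
): string[] {
  return posts
    .map((post, index) => ({ post, index, score: relevance(post, heldTickers) }))
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, maxPosts)
    .map(({ post }) => truncateAtWord(post.text, maxChars));
}
```

Calling `selectPosts(posts, heldTickers, 5, 500)` spends roughly a quarter of the characters that 50 posts at 200 chars would, while keeping each selected post long enough to carry a complete argument.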


7. The Information Asymmetry That Breaks Immersion

What players see vs. what agents "see"

A human player reading the feed sees:

  • Full-length posts with complete arguments
  • Complete news articles with analysis
  • Full prediction market questions with resolution dates
  • Price charts with historical trends
  • Complete group chat conversations

An NPC agent deciding to trade sees:

  • 3 truncated positions
  • 2 group chat snippets (120 chars)
  • 50 truncated posts (200 chars each), with only 3 surfaced in the dashboard's "Current Focus" at 20 chars each
  • No price history
  • No article content
  • Empty narrative context sections

The result: agents make decisions that are obviously uninformed compared to what a human player can see, breaking the illusion that they're participants in the same simulation.


8. Root Cause Map

WHY ARE AGENTS BAD?
├── Trading context is thin
│   ├── 6 prompt template variables are never populated (ghost variables)
│   ├── Market signal analysis is built but never wired to trading
│   ├── Only top 3 positions shown (agents don't know their full book)
│   ├── No trading history (can't learn from past trades)
│   └── No resolved question context (can't learn from outcomes)
│
├── Memory doesn't reach trading
│   ├── NPC memories exist (50 entries) but aren't in trading prompt
│   ├── No beliefs/theories persist between ticks
│   └── No decision reasoning is stored or recalled
│
├── Truncation destroys meaning
│   ├── 120-char group chat messages are incomprehensible
│   ├── 20-char "current focus" is meaningless
│   ├── 200-char post truncation loses core arguments
│   └── 120-char question text means agents can't read what they're betting on
│
├── Posting/trading context mismatch
│   ├── Posts get full narrative arc context; trades get empty sections
│   ├── Posts reference events that trading decisions are blind to
│   └── Creates incoherent agent behavior (posts contradict trades)
│
├── Autonomous agents are amnesiac
│   ├── 1-hour lookback on group chats
│   ├── No cross-tick state
│   ├── No learning from outcomes
│   └── Fresh context fetch every tick with no continuity
│
└── Production path is impoverished
    ├── No actor/org shuffling (deterministic LLM inputs)
    ├── Minimal reality grounding
    └── Hardcoded scenarioId = 1

9. Structural Weaknesses Summary

| Issue | Location | Severity | Impact |
|---|---|---|---|
| Ghost template variables (6 empty sections) | npc-market-decisions.ts | Critical | Trading decisions made without narrative, history, or character context |
| Market signals built but never used | market-context-service.ts:1005-1071 | High | Signal analysis exists but NPCs can't see it |
| Memories excluded from trading prompt | MarketDecisionEngine.ts | High | NPCs trade without any memory of past actions |
| Aggressive truncation | market-context-service.ts (multiple) | High | Context becomes noise: quantity without quality |
| Post-trade context asymmetry | FeedGenerator.ts vs MarketDecisionEngine.ts | High | Posts reference things trading decisions can't see |
| 1-hour group chat lookback for autonomous agents | AutonomousGroupChatService.ts:80 | Medium | Strategic conversations forgotten after 1 hour |
| No trading history in context | MarketDecisionEngine.ts | Medium | Agents repeat mistakes, can't develop strategies |
| No resolved question outcomes | MarketDecisionEngine.ts | Medium | Agents can't learn from market resolutions |
| Top-3-only position visibility | MarketDecisionEngine.ts:602 | Medium | Agents unaware of their full exposure |
| No price history beyond 24h | market-context-service.ts | Medium | No trend analysis possible |
| Production path lacks shuffling | QuestionManager.ts:1072-1097 | Low | Deterministic inputs reduce variety |
| Fixed trading archetypes | MarketDecisionEngine.ts | Low | Strategy never adapts based on performance |

10. Recommendations

Tier 1: Wire Up What Already Exists (Low effort, high impact)

  1. Populate the ghost variables: {{previousTrades}}, {{resolvedQuestionsContext}}, and {{ongoingNarrativesContext}} are already in the template, and the data exists in the DB. Wire them up in MarketDecisionEngine.generateDecisionsForContexts().

  2. Include NPC memories in trading context: NpcMemoryService.formatMemoriesForPrompt() already exists and works for posting. Add a {{memories}} section to the trading prompt.

  3. Expose market signal analysis: extractMarketSignals() already computes YES/NO signal strength and confidence. Add it to the trading prompt.

  4. Unify post and trading context: Use the same buildRichCharacterContext() pipeline for trading that posting already uses, or at minimum share the narrative/event context.

Tier 2: Fix Truncation Strategy (Medium effort, high impact)

  1. Quality over quantity — Instead of 50 posts at 200 chars, provide 10 most relevant posts at 500 chars. Use topic relevance (daily topic, held positions) to select which posts matter.

  2. Full question text — Never truncate prediction market questions. If agents can't read the question, they can't bet intelligently. Cut something else.

  3. Meaningful group chat context — Increase from 2 messages at 120 chars to 5-8 messages at 300 chars, or provide a summary of recent conversation themes.

  4. Remove "Current Focus" at 20 chars — It's noise. Either show full recent posts or remove the field entirely.

Tier 3: Add Missing Capabilities (Higher effort)

  1. Trading history context — Include last 5-10 trades with outcomes in the prompt. Let agents learn from their past decisions.

  2. Price trend data — Include 7-day price trend (direction, volatility, key levels) for held positions. The predictionPriceHistory and stockPrices tables already have this data.

  3. Belief/theory persistence — After each trading decision, store the agent's reasoning. Include it in the next tick's context so agents develop consistent strategies.

  4. Extend autonomous agent lookback — Increase group chat lookback from 1 hour to 24 hours, or implement conversation summarization.

  5. Cross-agent information sharing — Let agents who are "allies" share position information or trading theses through the relationship system.
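
For the price trend recommendation, a compact summary is likely more token-efficient than raw history. The following is a minimal sketch: `TrendSummary` and `summarizeTrend` are hypothetical, the input is assumed to be a chronological list of prices already read from predictionPriceHistory or stockPrices, and the 1% "flat" threshold is an arbitrary illustrative default.

```typescript
// Hypothetical summary shape for illustration.
interface TrendSummary {
  direction: "up" | "down" | "flat";
  changePct: number; // percent change from first to last price
  volatilityPct: number; // standard deviation of step-to-step returns, in percent
  low: number;
  high: number;
}

// Summarize a chronological price series (oldest first).
// Requires at least two prices, all strictly positive.
function summarizeTrend(
  prices: readonly number[],
  flatThresholdPct: number = 1,
): TrendSummary {
  if (prices.length < 2 || prices.some((p) => !(p > 0))) {
    throw new Error("Need at least two positive prices");
  }
  const first = prices[0];
  const last = prices[prices.length - 1];
  const changePct = ((last - first) / first) * 100;

  // Simple returns between consecutive observations.
  const returns: number[] = [];
  for (let i = 1; i < prices.length; i++) {
    returns.push((prices[i] - prices[i - 1]) / prices[i - 1]);
  }
  const mean = returns.reduce((sum, r) => sum + r, 0) / returns.length;
  const variance =
    returns.reduce((sum, r) => sum + (r - mean) * (r - mean), 0) / returns.length;

  const direction =
    Math.abs(changePct) < flatThresholdPct ? "flat" : changePct > 0 ? "up" : "down";

  return {
    direction,
    changePct,
    volatilityPct: Math.sqrt(variance) * 100,
    low: Math.min(...prices),
    high: Math.max(...prices),
  };
}
```

A seven-point daily series per held position reduces to five numbers, which fits in a single dashboard line and gives the agent direction, magnitude, and range without any chart data.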


11. Live Inspection Results (2026-03-31)

Using bun run inspect:context -- --agent <userId> --raw against the production DB, we inspected multiple autonomous agents. The findings below are observed behavior, not code analysis.

Most autonomous agents are effectively dead

Every agent inspected showed the same pattern:

| Agent | Balance | Lifetime PnL | Open Positions | Available Actions |
|---|---|---|---|---|
| Delta Lab | $0.00 | -$198B | 0 | REPLY_CHAT, FINISH, WAIT |
| Beta Edge | $0.39 | -$1.7T | 0 | REPLY_CHAT, FINISH, WAIT |
| Iota One | $0.00 | -$124B | 10 (all -100%) | REPLY_CHAT, FINISH, WAIT |
| Cosmic AI | $0.00 | -$74 | 0 | REPLY_CHAT, FINISH, WAIT |

Key observations:

  • Every agent has $0 or near-$0 balance with astronomical negative PnL
  • With $0 balance, the prompt correctly disables TRADE/POST/COMMENT actions
  • Agents can only REPLY_CHAT, FINISH, or WAIT — they are functionally inert
  • Agents with open positions (e.g., Iota One) have all positions at -100% with 0 shares
  • The simulation has essentially bankrupted every autonomous agent

The prompt is technically correct but practically useless

The buildMultiStepDecisionPrompt correctly reflects the agent's state. But the state itself is broken:

  • Balance: $0.00 with no mechanism to recover
  • PnL values in the trillions of dollars negative (suggests overflow or compounding bugs in the trading/settlement system)
  • Positions with entry prices in the billions of cents (e.g., entry: 250768773974¢) — clearly corrupted data
  • Even agents with autonomousTrading: true in config can't trade because they have no funds

Context utilization is extremely low

Even for the "richest" agent context inspected:

  • Total rendered prompt: ~2,300 tokens out of a 30,000 token budget
  • Utilization: 5-8% — the prompt is 92-95% empty
  • Markets, posts, and positions ARE gathered (shown in actionability summary) but sections like trading actions are gated behind balance checks
  • The quality rules, banned patterns, and examples consume more tokens (~800) than the actual agent-specific context (~400)
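
The utilization figure follows directly from the token counts above. As a worked check (the helper name `utilizationPct` is purely illustrative):

```typescript
// Percentage of the token budget actually used by the rendered prompt.
function utilizationPct(promptTokens: number, budgetTokens: number): number {
  if (budgetTokens <= 0) throw new Error("budgetTokens must be positive");
  return (promptTokens / budgetTokens) * 100;
}

// The richest inspected prompt: ~2,300 tokens against a 30,000-token budget.
const richestUtilization = utilizationPct(2300, 30000); // about 7.7
```

That is the top of the reported 5-8% range, so even the best case leaves well over 27,000 tokens of budget unused.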

Comparison: NPC prompt vs Agent prompt

| Metric | NPC Trading Prompt | Autonomous Agent Prompt |
|---|---|---|
| Total tokens | ~3,100 | ~2,300 |
| Lines | ~340 | ~210 |
| Market data | Full table with all perps + predictions | Counts only (sections gated) |
| Positions | All positions with PnL | None rendered (gated by features) |
| World context | Reality grounding + parody names + themes | None |
| Narrative context | Event signals, resolved questions | None |
| Character identity | Archetype, strategy, bias, relationships | Name only |
| Quality rules | Minimal | ~60% of prompt is quality/ban rules |

The autonomous agent prompt is dominated by quality/formatting rules rather than actual decision-relevant context. When an agent CAN trade, it gets less market/world context than NPCs.

New issues discovered

  1. Agent bankruptcy with no recovery — Agents that hit $0 are permanently stuck. There is no mechanism to refund, reset, or gradually restore agent balances. The simulation needs either balance resets, minimum balance guarantees, or income mechanics.

  2. Corrupted PnL/price data — Entry prices in the billions of cents and PnL in the trillions suggest either overflow bugs in the AMM, settlement errors, or missing validation in trade execution. This needs investigation before any context enrichment will matter.

  3. Prompt is mostly boilerplate when agents can't act — When balance is $0, the 2,300-token prompt is ~60% quality rules for content the agent will never generate (since it can only REPLY_CHAT). The prompt should be dramatically shorter for limited-action states.

  4. No world context for autonomous agents — Unlike NPCs which get reality grounding (parody names, world state, running themes), autonomous agents get zero world context. They have no awareness of the game's satirical setting, current events, or market narratives.
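
Until the root cause of the corrupted values is found, a defensive check at read or write time could at least flag implausible records. The sketch below is hypothetical throughout: `PositionRecord`, `validatePosition`, and the $1,000,000 sanity bound are assumptions chosen for illustration, not values taken from the codebase.

```typescript
// Hypothetical record shape for illustration.
interface PositionRecord {
  entryPriceCents: number;
  shares: number;
}

// Assumed sanity bound: no entry price above $1,000,000 (100,000,000 cents).
const MAX_ENTRY_PRICE_CENTS = 100000000;

// Return a list of human-readable problems; an empty list means the record looks sane.
function validatePosition(
  p: PositionRecord,
  maxEntryCents: number = MAX_ENTRY_PRICE_CENTS,
): string[] {
  const problems: string[] = [];
  if (!isFinite(p.entryPriceCents) || p.entryPriceCents <= 0) {
    problems.push("entry price must be a positive finite number");
  } else if (p.entryPriceCents > maxEntryCents) {
    problems.push(
      `entry price ${p.entryPriceCents}¢ exceeds sanity bound ${maxEntryCents}¢`,
    );
  }
  if (!isFinite(p.shares) || p.shares <= 0) {
    problems.push("open position must hold a positive number of shares");
  }
  return problems;
}
```

Applied to a record like the one observed above (entry 250768773974¢ with 0 shares), this reports both the out-of-range entry price and the empty position, which would let the context builder exclude the record instead of rendering it into a prompt.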