Minecraft Debug MCP Surface

Implementation source: /path/to/project/root/services/minecraft/src/debug/mcp-repl-server.ts.

Endpoints

  • Base server: http://localhost:3001
  • MCP endpoint: http://localhost:3001/sse
  • SSE fallback transport: GET /sse for the event stream plus POST /messages for client-to-server requests

The bot starts this server during normal runtime from:

  • /path/to/project/root/services/minecraft/src/cognitive/index.ts
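
A minimal connection sketch, assuming the official @modelcontextprotocol/sdk TypeScript client (the client name/version strings here are arbitrary):

```ts
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { SSEClientTransport } from "@modelcontextprotocol/sdk/client/sse.js";

// Connect to the debug server over the SSE transport.
const client = new Client({ name: "debug-probe", version: "0.0.1" });
const transport = new SSEClientTransport(new URL("http://localhost:3001/sse"));
await client.connect(transport);

// Sanity check: the tool list should match the Tools section below.
console.log(await client.listTools());
```

Later sketches in this document reuse this connected client.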

Resources

  • brain://state
    • Summary state: processing, queue length, turn, give-up timer.
  • brain://context
    • Current context view text.
  • brain://history
    • Conversation history JSON.
  • brain://logs
    • Latest LLM log entries JSON (last 50 in resource output).
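
A read sketch for these resources, reusing the connected client from the Endpoints example (the contents array shape is the generic MCP resource contract, not a verified server-specific schema):

```ts
// Read the summary brain state; the other brain:// URIs work the same way.
const res = await client.readResource({ uri: "brain://state" });
for (const item of res.contents) {
  // These resources return text payloads.
  if ("text" in item) console.log(item.uri, item.text);
}
```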

Tools

  • get_state()
    • Returns the current REPL/brain state JSON.
  • get_last_prompt()
    • Returns the latest LLM input JSON.
    • Returns an error when no prompt exists yet.
    • Compacted payload: omits systemPrompt and drops messages items with role: "system".
  • get_logs(limit?: number)
    • Returns recent LLM logs; start with small limits.
  • get_llm_trace(limit?: number, turnId?: number)
    • Returns structured LLM trace entries captured per attempt.
    • Includes turn/source metadata, messages, generated content, reasoning (if available), token usage, and duration.
    • Use turnId to isolate the trace for one injected test event.
    • Compacted payload: drops messages items with role: "system" to save tokens.
  • execute_repl(code: string)
    • Executes debug REPL code in the running brain context.
    • Use for focused inspection/action only.
    • The runtime global includes forget_conversation() for conversation-memory reset.
  • inject_chat(username: string, message: string)
    • Injects a synthetic chat perception event.
  • inject_event(type, payload, source)
    • type: perception | feedback | world_update | system_alert
    • source.type: minecraft | airi | system
    • source.id: string
    • Use only with deliberate, test-specific payloads.
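
Tool-call sketches reusing the same client (argument values are illustrative; the { name, arguments } shape is the standard MCP client call API, and the inject_event payload below is a hypothetical test marker):

```ts
// Small log read first, per the guidance above.
const logs = await client.callTool({ name: "get_logs", arguments: { limit: 5 } });
console.log(logs);

// Synthetic chat event; note this enters the real cognition pipeline.
await client.callTool({
  name: "inject_chat",
  arguments: { username: "codex-live-test", message: "hello" },
});

// Deliberate, test-specific event payload only.
await client.callTool({
  name: "inject_event",
  arguments: {
    type: "perception",
    payload: { kind: "test-marker" }, // hypothetical payload; shape not documented here
    source: { type: "system", id: "codex-live-test" },
  },
});
```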

Troubleshooting

  • Connection refused:
    • Ensure pnpm dev is running in the service directory.
    • Confirm the logs include MCP REPL server running at http://localhost:3001.
    • See the reachability probe sketch after this list.
  • 404/invalid endpoint:
    • Use /sse as the MCP entrypoint.
  • Empty prompt/logs:
    • Trigger activity first (for example via inject_chat), then retry get_last_prompt or get_logs.
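
A reachability probe sketch for the connection-refused case, assuming Node 18+ global fetch (it only checks that the SSE stream opens):

```ts
// Probe the SSE endpoint; a 200 with an open stream means the server is up.
const ctrl = new AbortController();
const timer = setTimeout(() => ctrl.abort(), 2000);
try {
  const res = await fetch("http://localhost:3001/sse", {
    headers: { Accept: "text/event-stream" },
    signal: ctrl.signal,
  });
  console.log("SSE endpoint status:", res.status); // expect 200
} catch (err) {
  console.error("connection refused or timed out:", err);
} finally {
  ctrl.abort(); // close the stream if it opened
  clearTimeout(timer);
}
```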

Live-Tested Behavior Notes

  • inject_chat is not a passive write: it enters the normal cognition pipeline and can cause the bot to send chat/actions.
  • get_last_prompt may be very large (full system prompt + history); avoid repeated calls unless needed.
  • get_last_prompt is now MCP-compacted (no raw system prompt text), which makes it cheaper for automation checks.
  • execute_repl response includes metadata (source, durationMs, actions, logs) and a stringified returnValue.
  • The query runtime now has LLM-friendly shortcuts for deterministic reads (exercised in the sketch after this list):
    • query.self()
    • query.inventory().count(name)
    • query.inventory().has(name, atLeast?)
    • query.inventory().summary()
    • query.snapshot(range?)
  • REPL runtime exposes a read-only patterns helper for known working recipes:
    • patterns.get(id)
    • patterns.find(query, limit?)
    • patterns.ids()
    • patterns.list(limit?)
  • Log verification pattern that worked reliably (sketched after this list):
    1. inject_chat(...)
    2. get_logs(limit: 10)
    3. Confirm sequence: turn_input -> llm_attempt -> feedback -> repl_result
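
A sketch of the query shortcuts and the log verification pattern, reusing the connected client (the settling delay and the text-content extraction are assumptions, not a verified response schema):

```ts
// Deterministic read via the query shortcuts (code runs inside the brain context).
const dirt = await client.callTool({
  name: "execute_repl",
  arguments: { code: "query.inventory().count('dirt')" },
});
console.log(dirt);

// Log verification: inject, wait for the pipeline, then confirm the sequence.
await client.callTool({
  name: "inject_chat",
  arguments: { username: "codex-live-test", message: "how much dirt do you have?" },
});
await new Promise((r) => setTimeout(r, 5_000)); // assumed settling delay

const logs = await client.callTool({ name: "get_logs", arguments: { limit: 10 } });
const text = (logs.content as Array<{ type: string; text?: string }>)
  .find((c) => c.type === "text")?.text ?? "";
for (const stage of ["turn_input", "llm_attempt", "feedback", "repl_result"]) {
  console.log(stage, text.includes(stage) ? "seen" : "MISSING");
}
```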

Repeatable Smoke Test Recipe

Use this exact sequence for fast live validation (scripted after the steps below):

  1. Baseline
    • get_state()
    • execute_repl("query.inventory().list().map(i => ({ name: i.name, count: i.count }))")
    • Optional clean slate:
      • execute_repl("forget_conversation()")
  2. Task trigger
    • inject_chat({ username: "codex-live-test", message: "please gather 3 dirt blocks" })
  3. Execution proof
    • get_logs({ limit: 10 })
    • Expect acknowledgement chat + collectBlocks success feedback + REPL summary.
    • get_llm_trace({ limit: 5 })
    • Assert expected LLM behavior (for example response code, or repeated await skip()).
    • Assert trace payload does not include role: "system" entries.
  4. Outcome proof
    • Run the same inventory execute_repl call again and compare item counts.
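
The same recipe as one script, reusing the connected client (the settling delay, the text extraction, and the system-role scan over raw trace text are assumptions to adapt to the real schemas):

```ts
const call = (name: string, args: Record<string, unknown> = {}) =>
  client.callTool({ name, arguments: args });
const textOf = (r: any): string =>
  (r.content as Array<{ type: string; text?: string }>)
    .find((c) => c.type === "text")?.text ?? "";

// 1. Baseline
console.log(textOf(await call("get_state")));
const inventoryCode =
  "query.inventory().list().map(i => ({ name: i.name, count: i.count }))";
const before = textOf(await call("execute_repl", { code: inventoryCode }));

// 2. Task trigger
await call("inject_chat", {
  username: "codex-live-test",
  message: "please gather 3 dirt blocks",
});

// 3. Execution proof (give the pipeline time before reading logs)
await new Promise((r) => setTimeout(r, 10_000)); // assumed settling delay
console.log(textOf(await call("get_logs", { limit: 10 })));
const trace = textOf(await call("get_llm_trace", { limit: 5 }));
if (/"role":\s*"system"/.test(trace)) {
  throw new Error("trace payload unexpectedly contains system-role messages");
}

// 4. Outcome proof
const after = textOf(await call("execute_repl", { code: inventoryCode }));
console.log({ before, after });
```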

Prompt-Behavior Check (Value-First)

To validate read->action behavior (sketched after the steps below):

  1. Inject a query-style chat (for example an inventory question).
  2. Confirm the first REPL result is a no-action turn with a concrete return value (via get_logs/get_llm_trace).
  3. Confirm the follow-up turn uses that returned value to perform chat/action.
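
A sketch of this check, reusing the connected client (the repl_result/actions/returnValue field names come from the execute_repl metadata note above; treat them as assumptions when parsing):

```ts
// 1. Query-style chat.
await client.callTool({
  name: "inject_chat",
  arguments: { username: "codex-live-test", message: "how many dirt blocks do you have?" },
});
await new Promise((r) => setTimeout(r, 5_000)); // assumed settling delay

// 2./3. Inspect the last few turns.
const out = await client.callTool({ name: "get_logs", arguments: { limit: 10 } });
const text = (out.content as Array<{ type: string; text?: string }>)
  .find((c) => c.type === "text")?.text ?? "";
// Expect: the first repl_result carries a concrete returnValue with no actions,
// and a follow-up turn uses that value in its chat/action output.
console.log(text);
```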

Runtime Caveat

  • The degraded environment sentinel appears only when the context has not been refreshed yet.
  • With the current fix, inject_chat refreshes reflex context first, so this should not appear in normal MCP chat-injection tests.