.qwen/design/2026-05-21-memory-pressure-monitor-design.md
Long-running Qwen Code sessions can accumulate memory through large tool results, repeated file reads, chat history, and native/external allocations. Before this change, the core package had diagnostics and session-reset cleanup, but no runtime response when memory pressure rises during normal tool execution.
The highest-value cache-specific gap was FileReadCache: it already had a
bounded FIFO size but no time-based eviction path. A session could therefore
retain inactive file-read metadata until the hard entry limit was hit, even
while the process was under memory pressure.
- Config instances isolated from parent session cleanup.
- --expose-gc.

Config.initialize() creates one MemoryPressureMonitor per initialized
Config. getMemoryPressureMonitor() mirrors the existing getFileReadCache()
Object.create isolation pattern: when a child config is created through
prototype delegation, the getter lazily installs an own monitor bound to that
child config.
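
The own-property check behind this isolation pattern can be sketched as follows. This is a minimal illustration, not the project's code: the two classes are simplified stand-ins, and the `monitor` field name and the `owner` property are hypothetical.

```typescript
// Hypothetical stand-ins for the real classes; only the isolation pattern matters.
class MemoryPressureMonitor {
  constructor(readonly owner: object) {}
}

class Config {
  private monitor?: MemoryPressureMonitor;

  getMemoryPressureMonitor(): MemoryPressureMonitor {
    // A child made with Object.create(parent) can see the parent's monitor
    // through the prototype chain, so only reuse a monitor stored as an OWN property.
    const own = Object.prototype.hasOwnProperty.call(this, 'monitor')
      ? this.monitor
      : undefined;
    if (own) return own;
    const created = new MemoryPressureMonitor(this);
    this.monitor = created; // assignment always lands on `this`, never on the prototype
    return created;
  }
}

const parent = new Config();
const child: Config = Object.create(parent); // prototype delegation, no constructor call

const parentMonitor = parent.getMemoryPressureMonitor();
const childMonitor = child.getMemoryPressureMonitor();
```

With this shape, the child's monitor is bound to the child even when the parent already created its own, so cleanup triggered for one config does not reach into the other.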
CoreToolScheduler.executeSingleToolCall() calls scheduleCheck() in its
finally block after ending the tool span. scheduleCheck() coalesces multiple
calls in the same event-loop turn with queueMicrotask, so concurrent read-like
tool batches do not run one memory check per tool result.
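
A minimal sketch of the coalescing idea, assuming a single boolean flag is enough to track a pending check. The class name `CoalescingChecker` and the injectable `defer` parameter are hypothetical; `defer` defaults to `queueMicrotask` and exists only so the behavior can be exercised deterministically.

```typescript
type Defer = (task: () => void) => void;

class CoalescingChecker {
  private scheduled = false;

  constructor(
    private readonly check: () => void,
    // Assumption: deferral is injectable for testing; production use keeps the default.
    private readonly defer: Defer = queueMicrotask,
  ) {}

  scheduleCheck(): void {
    // Calls made while a check is already pending are folded into that check.
    if (this.scheduled) return;
    this.scheduled = true;
    this.defer(() => {
      // Reset first so a check that throws cannot block future scheduling.
      this.scheduled = false;
      this.check();
    });
  }
}
```

If several tool calls finish in the same turn and each calls `scheduleCheck()`, only the first one queues a microtask, so the check callback runs once for the whole batch.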
The monitor uses the stronger of two pressure signals:

- RSS ratio: RSS divided by the memory limit. The limit is cgroup v2
  /sys/fs/cgroup/memory.max when it is a finite positive value, falling back to
  cgroup v1 /sys/fs/cgroup/memory/memory.limit_in_bytes, then to
  os.totalmem(). cgroup v1's huge "unlimited" sentinel values are ignored.
- V8 heap ratio: heapUsed divided by getHeapStatistics().heap_size_limit.

Using both signals matters because containers usually fail by hitting the RSS/cgroup limit, while local high-memory machines can hit V8 heap OOM long before RSS is a large fraction of total system memory.
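
The limit lookup and the "stronger of two signals" rule might look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, and the cutoff used to recognize cgroup v1's "unlimited" sentinel (`2 ** 60`) is an illustrative choice, not a value taken from the implementation.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as v8 from 'node:v8';

// Assumption: any limit at or above this is treated as an "unlimited" sentinel.
const UNLIMITED_SENTINEL_FLOOR = 2 ** 60;

/** Parses the content of a cgroup limit file; undefined means "no usable limit". */
function parseCgroupLimit(raw: string): number | undefined {
  const text = raw.trim();
  if (!/^\d+$/.test(text)) return undefined; // cgroup v2 writes "max" when unlimited
  const value = Number(text);
  if (!Number.isFinite(value) || value <= 0) return undefined;
  if (value >= UNLIMITED_SENTINEL_FLOOR) return undefined;
  return value;
}

function readLimit(path: string): number | undefined {
  try {
    return parseCgroupLimit(fs.readFileSync(path, 'utf8'));
  } catch {
    return undefined; // file missing (non-Linux host, other cgroup version) or unreadable
  }
}

/** cgroup v2 limit, then cgroup v1 limit, then total system memory. */
function resolveMemoryLimit(): number {
  return (
    readLimit('/sys/fs/cgroup/memory.max') ??
    readLimit('/sys/fs/cgroup/memory/memory.limit_in_bytes') ??
    os.totalmem()
  );
}

/** The stronger (larger) of the RSS ratio and the V8 heap ratio. */
function pressureRatio(rss: number, limit: number, heapUsed: number, heapLimit: number): number {
  return Math.max(rss / limit, heapUsed / heapLimit);
}

function currentPressureRatio(): number {
  const usage = process.memoryUsage();
  const heapLimit = v8.getHeapStatistics().heap_size_limit;
  return pressureRatio(usage.rss, resolveMemoryLimit(), usage.heapUsed, heapLimit);
}
```

Keeping `parseCgroupLimit` and `pressureRatio` free of I/O makes the fallback order and the sentinel handling easy to test without a container.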
Default thresholds are intentionally conservative enough to react before the OS or container OOM killer does:

- softPressureRatio = 0.50
- hardPressureRatio = 0.65
- criticalRatio = 0.80
- cleanupCooldownMs = 5000
- enableExplicitGC = false

Environment overrides:

- QWEN_MEMORY_PRESSURE_SOFT
- QWEN_MEMORY_PRESSURE_HARD
- QWEN_MEMORY_PRESSURE_CRITICAL
- QWEN_MEMORY_ENABLE_GC=1

Invalid ratios fall back to defaults. Valid ratios must be ordered as
soft < hard < critical, with a lower soft bound of 0.3 and an upper
critical bound of 0.98. Ratio env vars are parsed strictly with Number(),
so values such as 0.8extra are rejected instead of partially accepted.
Invalid memory-pressure env configuration writes a visible warning to stderr
and to the debug log before falling back to defaults.
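
The validation rules above can be sketched as a small pure function. The function name `resolveThresholds`, the `Thresholds` shape, and the warning text are hypothetical. The sketch also assumes that one invalid value discards the whole set of overrides, which the text does not state explicitly.

```typescript
interface Thresholds {
  soft: number;
  hard: number;
  critical: number;
}

const DEFAULT_THRESHOLDS: Thresholds = { soft: 0.5, hard: 0.65, critical: 0.8 };

function resolveThresholds(
  env: Record<string, string | undefined>,
  warn: (message: string) => void = console.warn, // console.warn writes to stderr in Node
): Thresholds {
  const read = (name: string, fallback: number): number => {
    const raw = env[name];
    if (raw === undefined) return fallback;
    // Number() yields NaN for trailing junk ("0.8extra"), unlike parseFloat().
    // Number("") is 0, so an empty value is rejected explicitly.
    return raw.trim() === '' ? Number.NaN : Number(raw);
  };

  const candidate: Thresholds = {
    soft: read('QWEN_MEMORY_PRESSURE_SOFT', DEFAULT_THRESHOLDS.soft),
    hard: read('QWEN_MEMORY_PRESSURE_HARD', DEFAULT_THRESHOLDS.hard),
    critical: read('QWEN_MEMORY_PRESSURE_CRITICAL', DEFAULT_THRESHOLDS.critical),
  };

  const valid =
    Number.isFinite(candidate.soft) &&
    Number.isFinite(candidate.hard) &&
    Number.isFinite(candidate.critical) &&
    candidate.soft >= 0.3 &&
    candidate.critical <= 0.98 &&
    candidate.soft < candidate.hard &&
    candidate.hard < candidate.critical;

  if (!valid) {
    warn('Invalid memory-pressure thresholds in environment; using defaults.');
    return { ...DEFAULT_THRESHOLDS };
  }
  return candidate;
}
```

For example, `resolveThresholds({ QWEN_MEMORY_PRESSURE_SOFT: '0.7' })` falls back to the defaults, because 0.7 is no longer below the default hard ratio of 0.65.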
Pressure levels map to increasingly strong cleanup:

- soft: evict stale FileReadCache entries not accessed in 60 minutes.
- hard: evict cache entries not accessed in 30 minutes.
- critical: clear the file-read cache and optionally trigger global.gc().

The monitor intentionally does not force chat compaction. Compaction can call the model backend and rewrite active chat state, so it should be triggered only from a call site that can safely coordinate with the conversation loop.
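
The level-to-cleanup mapping can be illustrated with a toy cache. `TinyFileReadCache` is a hypothetical stand-in that tracks only last-access timestamps; the real FileReadCache and the exact signature of its evictNotAccessedSince() are not shown in this document, so the cutoff-timestamp parameter here is an assumption.

```typescript
type PressureLevel = 'soft' | 'hard' | 'critical';

const MINUTE_MS = 60_000;

// Hypothetical stand-in: path -> last access time in epoch milliseconds.
class TinyFileReadCache {
  private readonly lastAccess = new Map<string, number>();

  touch(path: string, now: number): void {
    this.lastAccess.set(path, now);
  }

  /** Removes entries last accessed before `cutoff`; returns how many were evicted. */
  evictNotAccessedSince(cutoff: number): number {
    let evicted = 0;
    for (const [path, accessedAt] of this.lastAccess) {
      if (accessedAt < cutoff) {
        this.lastAccess.delete(path); // deleting during Map iteration is safe
        evicted += 1;
      }
    }
    return evicted;
  }

  clear(): void {
    this.lastAccess.clear();
  }

  get size(): number {
    return this.lastAccess.size;
  }
}

function runCleanup(
  level: PressureLevel,
  cache: TinyFileReadCache,
  now: number,
  enableExplicitGC = false,
): void {
  switch (level) {
    case 'soft':
      cache.evictNotAccessedSince(now - 60 * MINUTE_MS);
      break;
    case 'hard':
      cache.evictNotAccessedSince(now - 30 * MINUTE_MS);
      break;
    case 'critical': {
      cache.clear();
      // global.gc only exists when Node is started with --expose-gc.
      const gc = (globalThis as unknown as { gc?: () => void }).gc;
      if (enableExplicitGC && typeof gc === 'function') gc();
      break;
    }
  }
}
```

Passing `now` explicitly keeps the sketch deterministic; an entry touched 40 minutes ago survives a soft cleanup but is evicted by a hard one.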
Cleanup is fire-and-forget from the scheduler, but the monitor guards cleanup
steps with cleanupInProgress and a cooldown timestamp. A higher-pressure
cleanup can bypass the cooldown and queue behind an in-progress lower-pressure
cleanup, so a critical check is not lost while a soft cleanup is finishing.
After successful cleanup it logs an RSS delta on setImmediate(), but RSS
movement is diagnostic only: V8 and libc may retain freed pages even when
JavaScript objects became collectible. Consecutive failures count cleanup-step
exceptions, not unchanged RSS, and the counter is reset on a new session. If
three successful cleanup attempts in a row free less than 1% RSS, the monitor
emits memory-cleanup-ineffective as a diagnostic signal without treating the
cleanup step itself as failed.
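
The "three ineffective cleanups in a row" rule could be tracked with a small counter like the one below. The class name, constructor parameters, and callback are hypothetical, and resetting the streak after the signal fires is an assumption rather than documented behavior.

```typescript
class CleanupEffectivenessTracker {
  private ineffectiveStreak = 0;

  constructor(
    private readonly onIneffective: () => void,
    private readonly minFreedFraction = 0.01, // less than 1% of RSS freed counts as ineffective
    private readonly streakLimit = 3,
  ) {}

  /** Records one successful cleanup attempt using RSS sampled before and after it. */
  record(rssBefore: number, rssAfter: number): void {
    const freedFraction = (rssBefore - rssAfter) / rssBefore;
    if (freedFraction >= this.minFreedFraction) {
      this.ineffectiveStreak = 0; // an effective cleanup breaks the streak
      return;
    }
    this.ineffectiveStreak += 1;
    if (this.ineffectiveStreak >= this.streakLimit) {
      this.onIneffective(); // e.g. emit memory-cleanup-ineffective
      this.ineffectiveStreak = 0; // assumption: start counting again after signaling
    }
  }
}
```

This counter is deliberately separate from the consecutive-failure counter: an attempt that frees little RSS still counts as a successful cleanup step.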
The implementation is covered by:

- process.memoryUsage();
- memory.max and cgroup v1 memory.limit_in_bytes behavior;
- scheduleCheck() coalescing;
- scheduleCheck() after tool execution;
- Config monitor isolation through Object.create;
- FileReadCache.evictNotAccessedSince() behavior.