skills/cache-expert/references/debugging.md
This guide focuses on practical debugging for current dagql + filesync cache behavior.
Use a tight test repro before adding logs.
Recommended integration command format:
dagger --progress=plain call engine-dev test --pkg ./core/integration --run='TestNameOrSubtest'
Capture output to file for long runs:
dagger --progress=plain call engine-dev test --pkg ./core/integration --run='TestModule' > /tmp/cache-debug.log 2>&1
rg -n "panic:|--- FAIL:|^FAIL\s" /tmp/cache-debug.log
Avoid broad ./... when debugging cache; use focused package/test slices.
When debugging leaked dagql cache refs, start with Prometheus metrics before adding deep logs.
Enable metrics on the target engine:
_EXPERIMENTAL_DAGGER_METRICS_ADDR=0.0.0.0:9090
_EXPERIMENTAL_DAGGER_METRICS_CACHE_UPDATE_INTERVAL=1s
Key metrics:
- dagger_connected_clients
- dagger_dagql_cache_entries
- dagger_dagql_cache_ongoing_calls_entries
- dagger_dagql_cache_completed_calls_entries
- dagger_dagql_cache_completed_calls_by_content_entries
- dagger_dagql_cache_ongoing_arbitrary_entries
- dagger_dagql_cache_completed_arbitrary_entries

Interpretation:
- If connected_clients is 0 but dagql_cache_entries stays non-zero, refs are retained.
- completed_calls growth: call-result refs not released.
- ongoing_calls growth: waiter/cancel path likely stuck.
- *_arbitrary_* growth: opaque/arbitrary cache path leak.

dagger_dagql_cache_entries is an index-entry count, not a unique-result count. The same shared result may appear in multiple indexes.

Practical scrape tip for nested-engine integration tests:
curl http://dev-engine:9090/metrics

Useful correlation log (session teardown):
engine/server/session.go logs:
- released dagql cache refs for session with beforeEntries and afterEntries
- If afterEntries trends upward across completed sessions, session close is not releasing all refs.

dagql/objects.go
- preselect: log newID, returned cacheCfgResp.CacheKey.ID, and decoded args after rewrite
- newCacheKey: log ID, DoNotCache, TTL, ConcurrencyKey

dagql/cache.go
- GetOrInitCall: log callKey, storageKey, contentKey, hit path taken
- wait: log index insertion (storageKey, resultCallKey, contentDigestKey)

dagql/session_cache.go
- DoNotCache retries (noCacheNext)
- isClosed checks

- dagql/cache.go around DB select/update logic
- dagql/db/queries.go compare-and-upsert behavior

- engine/filesync/change_cache.go for change dedupe/wait/release
- engine/filesync/localfs.go for conflict detection (verifyExpectedChange) and release timing

Check in order:
- Did GetCacheConfig rewrite the ID unexpectedly?

Check:
- storageKey vs contentDigestKey?

Check:
- noCacheNext for this key?
- DoNotCache and then reinserted?

Check:
Prefer small, high-signal log lines with:
- storage, content, miss, ongoing

This usually narrows down the root cause quickly without overwhelming the logs.
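
The metric interpretation rules from the Prometheus section of this guide can be checked mechanically by comparing two scrapes of the metrics endpoint. The following is a minimal sketch, not part of the engine: the metric names come from this guide, while the helper names `parse_metrics` and `leak_hints` are hypothetical. It also assumes a simplified reading of the Prometheus text format (labels are ignored and label values do not contain `}`).

```python
import re

# Matches "name value" or "name{labels} value"; any trailing timestamp is ignored.
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")

PREFIX = "dagger_dagql_cache_"


def parse_metrics(text):
    """Parse Prometheus text exposition into {metric_name: value}.

    Label sets are ignored; samples sharing a metric name are summed.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if not match:
            continue
        try:
            value = float(match.group(2))
        except ValueError:
            continue
        name = match.group(1)
        values[name] = values.get(name, 0.0) + value
    return values


def leak_hints(before, after):
    """Apply this guide's interpretation rules to two metric snapshots."""

    def grew(name):
        return after.get(name, 0.0) > before.get(name, 0.0)

    hints = []
    if (after.get("dagger_connected_clients", 0.0) == 0
            and after.get(PREFIX + "entries", 0.0) > 0):
        hints.append("refs retained with no connected clients")
    if grew(PREFIX + "completed_calls_entries"):
        hints.append("completed_calls growth: call-result refs not released")
    if grew(PREFIX + "ongoing_calls_entries"):
        hints.append("ongoing_calls growth: waiter/cancel path likely stuck")
    if grew(PREFIX + "ongoing_arbitrary_entries") or grew(PREFIX + "completed_arbitrary_entries"):
        hints.append("arbitrary growth: opaque/arbitrary cache path leak")
    return hints
```

To use it, save the output of the curl scrape once before and once after a test run, pass each to `parse_metrics`, and hand both results to `leak_hints`. Growth while clients are still connected can be normal, so the hints are most meaningful when the second snapshot is taken after all sessions have closed.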