skills/cache-expert/references/egraph.md
This document describes the current dagql cache implementation centered on
the e-graph in dagql/cache.go and dagql/cache_egraph.go.
The code is the source of truth. This doc is meant to make the current model faster to load into your head, not replace reading the code.
For test-running and debugging workflow, see debugging.md. This doc is about
how the cache is structured and how it behaves.
This is not a full rewrite engine or a generic academic e-graph implementation. In practice, the important pieces are:
The useful mental model is:
selfDigest + canonical input eq-classes -> output eq-class.sharedResults hanging off the graph, not nodes in
the graph itself.So the "e-graph" part is mostly the union-find + congruence closure over terms. The cache behavior around ownership, sessions, persistence, and lazy payloads lives around it.
There are four distinct identity layers worth keeping separate in your head:
Recipe digest
ResultCall / call.ID.Extra digests
Structural term
Shared result ID
The most important invariant is:
sharedResults hold real payloads and lifecycle stateThat separation is why the implementation works as well as it does:
dagql/cache.go splits the cache into three main lock domains:
callsMu
sessionMu
egraphMu
This split matters both for correctness and performance. In particular, recent
fixes deliberately avoid cross-locking callsMu and egraphMu in observational
paths.
Eq-class state lives in Cache:
egraphDigestToClasseqClassToDigestseqClassExtraDigestsegraphParentsegraphRanksThis is a standard union-find setup:
The main entry points are:
ensureEqClassForDigestLockedfindEqClassLockedmergeEqClassesLockedrepairClassTermsLockedeqClassExtraDigests is worth calling out separately: it stores labeled
class-level digest facts known for an output eq-class. It is not the main
direct-hit index. Direct digest hits come from egraphResultsByDigest.
Terms live in dagql/cache_egraph.go as egraphTerm:
selfDigestinputEqIDstermDigestoutputEqIDImportant detail: terms are symbolic only. They do not hold payloads.
termDigest is the digest of:
That means if input eq-classes merge later, the term may need to move to a new
termDigest. That is what congruence repair does.
Term indexes:
egraphTermsegraphTermsByTermDigestinputEqClassToTermsoutputEqClassToTermstermInputProvenanceMaterialized results are sharedResults in dagql/cache.go.
These hold:
self, hasValue, isObject)ResultCalldeps)incomingOwnershipCount)This is a major design point: the e-graph decides equivalence, but the
sharedResult owns the payload and lifecycle.
The bridge between symbolic graph and concrete payload is:
resultOutputEqClassestermResultsresultTermsegraphResultsByDigestThese let the cache answer questions like:
That distinction matters for hit selection.
One output eq-class can have multiple materialized results.
For example:
Because of that, lookup prefers:
That behavior lives mainly in:
appendTermSetResultsLockedfirstResultForTermSetDeterministicallyAtLockedfirstResultForOutputEqClassDeterministicallyAtLockedThe e-graph consumes structural identity produced upstream by ResultCall /
call.ID.
The key functions are in dagql/call/id.go:
SelfDigestAndInputRefsSelfDigestAndInputsThe shape is:
That distinction is why structural equivalence works: the cache can say "same operation over equivalent inputs" without flattening the whole call into one undifferentiated digest.
Content-preferred digest is related but separate. In dagql/call/id_content.go,
it expresses "if outputs are interchangeable by content, what digest should we
prefer?" It is used as equivalence evidence, not as the authoritative recipe.
These are the functions that really matter when reading the system.
lookupCacheForRequestThis is the normal lookup path for a new call.
High-level flow:
The actual structural lookup is in lookupMatchForCallLocked.
lookupCacheForDigestsThis is a digest-only lookup path.
It does not compute a structural term. It looks up by:
This is useful when the caller already has digest evidence and does not need the full request-term path.
indexWaitResultInEgraphLockedThis is the main publication path for a newly completed result.
It:
sharedResultIDThis is the main place where fresh execution results become lookup-visible.
teachResultIdentityLockedThis is the "we got a hit, now teach the graph more about what this result is" path.
This matters because a cache hit may arrive through one route, but we still want future requests for the new recipe shape to resolve directly.
Typical uses:
TeachCallEquivalentToResultThis is an externalized way to say:
"this call frame is equivalent to this existing result"
It attaches the result if needed, derives the structural identity for the call, and then teaches that identity onto the result.
TeachContentDigestThis is the path for late output-equivalence evidence.
It updates the result call frame with a content digest and teaches that new digest into the e-graph without replacing recipe identity.
AttachResultThis is a big one conceptually even though it is not "e-graph logic" in the narrow sense.
It takes a detached result, normalizes any pending ResultCallRefs, tries to
resolve it against the cache, and if necessary publishes it as a new
sharedResult.
This is one of the main ways semantic attachment and result-call references get bridged into the e-graph world.
importPersistedStateOn engine startup, persisted mirror tables are read back into memory.
This reconstructs:
This is how the in-memory e-graph is restored after restart.
removeResultFromEgraphLockedThis removes a materialized result from the graph when ownership drains to zero.
It:
compactEqClassesLockedThis is maintenance rather than hot-path lookup, but it matters.
Union-find IDs only grow. After lots of merges and pruning, the class ID space can get sparse. Compaction rebuilds the live eq-class space to keep it smaller and more coherent.
Today this runs after prune when needed.
lookupMatchForDigestsLocked does:
If the request is a simple pure recipe hit:
then lookupCacheForRequestLocked takes a fast path and skips teaching the graph
anything new. This keeps exact-hit overhead down.
The direct-hit index here is egraphResultsByDigest, which indexes request and
response recipe digests plus extra digests for concrete results.
If exact digest lookup misses, lookupMatchForCallLocked:
termDigest = hash(selfDigest, canonical input eq IDs)Candidate gathering has an intentional preference order:
That means the cache prefers "we have seen this exact structural shape produce this result" over "something equivalent exists somewhere in the same output class."
Even if a result is structurally equivalent, it may depend on session-scoped resources the current session does not have.
selectLookupCandidateForSessionLocked filters candidates using:
requiredSessionResources on the resultsessionHandlesBySession for the current sessionSo the e-graph gives you semantic candidates, then session/resource filtering chooses a result that is actually usable.
Some paths do not start from a fresh request lookup. They start from an already attached result or a specific result ID and need the best reusable equivalent for the current session.
That is what canonicalEquivalentSharedResultLocked does.
This matters because one output eq-class can have multiple materialized results.
The cache can legitimately canonicalize from one sharedResult onto another
session-compatible sibling in the same equivalence region.
GetOrInitCall / waitNormal execution goes:
ongoingCallwait() calls initCompletedResultinitCompletedResultThis is the center of fresh result publication.
Important work done here:
sharedResultResultCallattachDependencyResultsThat temporary handoff hold is important. It prevents publication races where a freshly published result becomes unowned before the producing waiters or direct attach path have claimed the real session ownership.
The current publication order intentionally publishes before attachment is fully complete, because semantic module object attachment needs the parent result to already exist in the cache.
That creates a visibility race: another reader could otherwise observe the
payload while AttachDependencyResults is still rewriting it.
The fix is not "publish later." The fix is:
sharedResultensurePersistedHitValueLoadedThat preserves the required publication order for semantic attachment while blocking readers from seeing partially rewritten payloads.
The most e-graph-specific logic lives in mergeEqClassesLocked and
repairClassTermsLocked.
When output digests or extra digests are merged:
termDigest is recomputedThat last step is the actual congruence-closure behavior.
This is why the implementation is more than "union-find over digests." The term repair step is what propagates "equivalent inputs imply equivalent outputs."
Each term/result association stores per-input provenance:
resultdigestThat is subtle but important. The cache wants to remember not just that a result matched a term, but whether the match came from fully attached result-backed inputs or from digest-only witness inputs.
Today this mostly matters for preserving more honest associations and for keeping the door open to stricter behavior if provenance-sensitive cases matter more in the future.
Persisted import is not a side feature. It is a first-class part of the cache model.
importPersistedState reconstructs the in-memory graph from mirrored rows.
Imported results may start life in a partially materialized state:
persistedEnvelope != nilhasValue == falseThat means the cache knows the result exists and can index it, but has not yet decoded the typed payload.
ensurePersistedHitValueLoaded is the boundary that makes those hits safe before
they escape:
ResultCallThis function is why imported object hits no longer silently leak out as unresolved nil payloads.
The e-graph does not own liveness by itself.
Liveness comes from:
incomingOwnershipCount on sharedResult is the real truth for whether a result
is still live.
When that count reaches zero, the cache can collect the result and remove its graph associations.
This is an important architectural boundary:
A lot of the implementation is deliberately deterministic.
Important examples:
TreeSets are used for term/result indexes so iteration order is stablesharedResultIDThis is not just nice-to-have polish. It makes:
The fast path for pure recipe hits avoids extra teaching work.
If any input digest has never been seen before, structural term lookup does not try to invent anything. It records that primary lookup was not possible and falls back to miss behavior.
callsMuGetOrInitCall intentionally does not do a second lookup while holding
callsMu. That means it can occasionally re-execute work instead of catching a
very late hit, but it avoids adding more lookup cost and lock coupling on the
normal miss path.
Repair work is driven from inputEqClassToTerms, so merges only revisit terms
that actually mention the merged class as an input.
Eq-class slots are compacted after prune rather than continuously. That keeps the hot path simpler.
The graph finds semantic candidates first. Session resource compatibility is a second-stage filter. This keeps the graph structure global while still making session-scoped results safe to reuse.
WriteDebugCacheSnapshot writes JSON incrementally instead of building one huge
in-memory object. This matters because the cache can get large, and the debug
path itself should not become a second problem.
The live engine exposes two especially useful debug endpoints:
/debug/dagql/egraph
/debug/dagql/cache
The streamed cache snapshot is generally the more useful one when debugging real cache behavior, because it shows both the graph and the payload / lifecycle side around it.
A hit does not mean "return the canonical result call frame and forget the request." The cache teaches new request identity onto the result so future lookups can resolve through that route too.
If a result shares an output eq-class with a term, that does not mean it was directly observed for that term. The implementation intentionally tracks both.
Do not confuse term edges with real retention edges. Real retention lives in
sharedResult.deps, persisted edges, and session ownership.
That is valid. It is why ensurePersistedHitValueLoaded exists.
Two results can be semantically equivalent and still not both be valid hits for the same session.
If you are trying to load this implementation into your head, this order works well:
dagql/cache.go
CachesharedResultGetOrInitCallwaitinitCompletedResultAttachResultdagql/cache_egraph.go
lookupMatchForDigestsLockedlookupMatchForCallLockedlookupCacheForRequestLockedteachResultIdentityLockedindexWaitResultInEgraphLockedmergeEqClassesLockedrepairClassTermsLockedremoveResultFromEgraphLockedcompactEqClassesLockeddagql/cache_persistence_import.go
importPersistedStateensurePersistedHitValueLoadeddagql/call/id.go and dagql/call/id_content.go
If you only remember one sentence, make it this:
The current dagql cache is a symbolic equivalence engine over call structure and output digests, with concrete cached payloads managed separately by explicit ownership edges, persistence state, and session/resource constraints.