skills/cache-expert/references/lazy_evaluation.md
This document describes the current lazy-evaluation model used by the dagql cache and the core object implementations built on top of it.
At a high level, laziness means a resolver can return a real dagql result immediately while deferring some materialization work until that result is actually needed later. The returned result is still a normal cache-backed result with normal identity, dependency edges, session-resource requirements, and persistence behavior. What is deferred is the work needed to fully materialize its internal state.
In practice, this is used heavily for Directory, File, and Container objects. The typical pattern is:
Lazy implementation that knows how to finish materializing it later.
LazyAccessor, which calls into dagql.Cache.Evaluate.
There are three main reasons this exists:
This is different from DoNotCache. A lazy result is still expected to be attached to the cache. In fact, the cache rejects newly created lazy results from DoNotCache calls, because lazy evaluation depends on the result having an attached sharedResult.
A DoNotCache field can still return an already attached lazy result. What is rejected is creating a brand new lazy result that never becomes cache-backed.
The dagql cache does not know anything about Directory, File, Container, or any other concrete type-specific lazy shape. Its contract is intentionally small:
dagql.HasLazyEvaluation
dagql.LazyEvalFunc
dagql.Cache.Evaluate
Any result can participate in lazy evaluation if its wrapped value implements:
type HasLazyEvaluation interface {
	LazyEvalFunc() LazyEvalFunc
}
That means the mechanism is generic. In practice, the engine mostly uses it for object results, but the cache itself is not limited to objects.
The key dagql entry points are:
dagql.Cache.Evaluate
The public API for forcing one or more attached results to finish lazy evaluation.
dagql.Cache.evaluateOne
The per-result implementation that handles singleflight, recursion detection, cancelation, call-context restoration, and telemetry resumption.
dagql.Cache.registerLazyEvaluation
Stores the current lazy callback on the attached sharedResult when a result is first published or when a cache/persisted hit is re-wrapped.
dagql.HasPendingLazyEvaluation
Reports whether an attached result still has deferred work. Telemetry uses this to avoid treating a pending lazy hit as a fully satisfied cache hit.
Cache.Evaluate Guarantees
Cache.Evaluate(ctx, results...) does two different kinds of coordination:
First, if the caller passes multiple results, the cache evaluates those different results in parallel with an errgroup.
Second, for any single attached result, the cache uses per-sharedResult singleflight so that only one lazy callback is running for that result at a time. Other callers wait for the same work instead of duplicating it.
That gives two important properties:
The tests in dagql/cache_test.go cover both behaviors.
sharedResult Lazy State
Attached results carry cache-owned lazy state in dagql.sharedResult:
lazyEval
lazyEvalComplete
lazyEvalWaitCh
lazyEvalCancel
lazyEvalWaiters
lazyEvalErr
This state is guarded by lazyMu.
Conceptually:
lazyEval is the callback the cache should run
lazyEvalComplete means the attached result is fully materialized
lazyEvalWaitCh means evaluation is currently in flight
lazyEvalWaiters tracks how many callers are waiting on that in-flight evaluation
lazyEvalCancel lets the cache cancel the in-flight evaluation if the last waiter goes away
The cache registers lazy evaluation whenever it publishes or reconstructs an attached result that still has deferred work.
Important places where this happens:
initCompletedResult
sharedResult
This matters because the sharedResult is the stable cache-owned object, while typed wrappers may be re-created on hit paths. The cache re-derives the current LazyEvalFunc from the wrapped value and stores it on the sharedResult so later Evaluate calls have the right callback.
For a single result, Cache.evaluateOne works roughly like this:
sharedResult.
sharedResultIDs stored in context.
LazyEvalFunc from the wrapped value.
Two details are especially important.
The cache threads a linked stack of sharedResultIDs through context while evaluating lazy results. If a lazy callback tries to re-enter evaluation of itself, or any ancestor already on that stack, the cache returns:
recursive lazy evaluation detected
That prevents accidental infinite recursion when a lazy implementation evaluates the wrong result.
The actual callback runs under a context built from context.WithoutCancel(stackCtx), then wrapped in context.WithCancelCause.
That means one impatient caller does not immediately tear down the shared lazy callback for everyone else. Instead:
This is a shared-work model, not a per-caller callback model.
Before starting the lazy callback, the cache restores the result's authoritative ResultCall into the callback context with:
dagql.ContextWithCall(evalCtx, resultCall)
This is crucial. Lazy evaluation often runs much later than the original field resolver, but many core helpers still need the current dagql call.
One concrete example is DirectoryWithoutLazy.Evaluate, which calls:
dir.Without(ctx, lazy.Parent, dagql.CurrentCall(ctx), true, lazy.Paths...)
That only works because Cache.Evaluate restored the original call frame first.
Without this, lazy implementations that depend on dagql.CurrentCall(ctx) would behave differently from eager execution and could break equivalence-teaching, provenance, or other call-sensitive behavior.
The lazy model also restores telemetry lineage instead of treating lazy work as an unrelated background task.
When a result is first returned from GetOrInitCall or wait, the cache captures the session's current span context in captureSessionLazySpanContext.
Later, when some caller actually triggers Cache.Evaluate, the cache:
resume lazy evaluation or resume <field>
Then it wraps the callback context with resumedCallbackSpan, which deliberately reports the original span context to the lazy callback itself.
That gives the desired split:
This is why the telemetry tests verify both:
resume ... span linked to the original span
After the lazy callback returns successfully, the cache:
syncResultSnapshotLeases
lazyEvalComplete = true
lazyEval
If the callback fails, the cache does not mark the result complete. Future Evaluate calls can try again.
So the rule is simple:
core.Lazy[T]
The cache-level mechanism is only half the story. The object implementations use a second layer in core/lazy_state.go:
type Lazy[T dagql.Typed] interface {
	Evaluate(context.Context, T) error
	AttachDependencies(context.Context, func(dagql.AnyResult) (dagql.AnyResult, error)) ([]dagql.AnyResult, error)
	EncodePersisted(context.Context, dagql.PersistedObjectCache) (json.RawMessage, error)
}
This is the object-side contract.
Every concrete lazy type in core embeds a LazyState:
type LazyState struct {
	LazyMu           *sync.Mutex
	LazyInitComplete bool
}
LazyState.Evaluate gives per-instance idempotence:
LazyMu, runs the callback once, and marks the instance complete on success
This is distinct from the cache-level singleflight.
There are two separate jobs here:
core.LazyState makes each concrete lazy object implementation itself behave like a one-time materializer
The cache layer is the authoritative cross-caller coordination layer. The core layer keeps each lazy implementation internally disciplined and idempotent.
LazyAccessor: The Actual Field Boundary
The most important practical API for authors is LazyAccessor.
Examples:
Directory.Dir
Directory.Snapshot
File.File
File.Snapshot
Container.FS
Container.MetaSnapshot
LazyAccessor exists to make it hard to accidentally read a lazy-populated field without first evaluating the owning result.
GetOrEval
GetOrEval(ctx, ownerResult) is the normal access path.
It:
cache.Evaluate(ctx, ownerResult)
If evaluation succeeds but the accessor still was not populated, it returns an error. In other words, GetOrEval treats "the lazy callback forgot to set the field" as a bug.
Peek
Peek() returns the currently stored value without triggering lazy evaluation.
This is intentional and important. Many paths need to inspect already-known state without forcing full materialization, including:
Peek is for "use what is already present." GetOrEval is for "I need the real materialized value."
SetValue
SetValue is used by:
The comment on GetOrEval is important: the caller must pass the dagql result wrapper for the same owning object as the accessor. That pairing is not validated automatically today.
Every concrete core.Lazy type also implements AttachDependencies.
This is not the same thing as calling cache.Evaluate inside the lazy callback.
They serve different purposes:
AttachDependencies runs when the object is attached to the cache. It rewrites embedded result references to attached/cache-backed results and returns the exact dependency edges that should be recorded for ownership, pruning, and persistence closure.
cache.Evaluate inside the lazy callback runs later when the implementation actually needs those dependencies materialized.
This distinction is central to the design. A dependency can be known structurally and retained correctly long before its expensive value is actually needed.
Most schema resolvers follow the same shape:
Lazy implementation on the object.
Common examples:
container.rootfs
The schema returns a Directory shell immediately with:
Lazy: &core.ContainerRootFSLazy{...}
Dir pre-seeded to "/"
The expensive work of resolving the actual rootfs snapshot is deferred until somebody needs it.
container.directory(path)
The schema resolves env expansion and working-directory normalization immediately, then returns a Directory shell with:
Lazy: &core.ContainerDirectoryLazy{Parent: parent, Path: path}
Dir pre-seeded to the resolved path
Validation and snapshot reopening happen later during lazy evaluation.
file.withName(name)
The schema constructs a File shell immediately, stores FileWithNameLazy, and if the parent's current path is already known it pre-seeds the derived path in the accessor right away.
This is a very important practical point: the fields stored on a concrete lazy struct are the arguments that define eventual evaluation.
They are not required to match the outer GraphQL arg struct one-for-one.
In many places, schema code has already normalized the inputs before storing them on the lazy struct. For example:
dagql.ObjectResult[...] values before being stored
So when reading a lazy type, treat its fields as the real execution recipe for deferred evaluation, not as a copy of some public API shape.
Directory and File follow a very consistent shape:
AttachDependencyResults implementation just delegates to the current lazy op
LazyEvalFunc wrapper calls lazy.Evaluate(...) and then clears Lazy on success
That last point is important: for Directory and File, the wrapper method itself handles the common "clear Lazy after successful materialization" behavior.
Representative lazy types include:
DirectoryWithDirectoryLazy
DirectoryWithFileLazy
DirectorySubdirectoryLazy
DirectoryWithoutLazy
FileSubfileLazy
FileWithNameLazy
FileWithReplacedLazy
The common implementation shape is:
lazy.LazyState.Evaluate
GetOrEval on accessors when actual values are needed
Two representative examples:
DirectoryWithDirectoryLazy.Evaluate just delegates to dir.WithDirectory(...)
DirectorySubdirectoryLazy.Evaluate materializes the parent, validates the subdirectory only when needed, then reopens the parent snapshot by ID and populates the new directory shell
That second case is a good example of why laziness exists: path validation and snapshot reopening are deferred until somebody actually needs the subdirectory value.
Container uses the same overall model, but with more variation.
There are two large families of container lazy ops:
Directory or File view from a container
Examples:
ContainerWithRootFSLazy
ContainerWithDirectoryLazy
ContainerWithFileLazy
ContainerWithUnixSocketLazy
These usually follow this pattern:
cache.Evaluate
materializeContainerStateFromParent
WithDirectory, WithFile, WithUnixSocketFromParent, etc.
container.Lazy
The container lazy implementations typically clear container.Lazy themselves after success. Unlike Directory and File, container-wide lazy clearing is not centralized in Container.LazyEvalFunc.
Examples:
ContainerRootFSLazy
ContainerDirectoryLazy
ContainerFileLazy
These materialize detached Directory or File shells from container state. They often:
Peek
This detached-clone behavior is important. It keeps a child result shell from sharing mutable accessor state with the parent container object.
materializeContainerStateFromParent Exists
Container mutation lazies almost all need a concrete copy of the parent's current state before applying one more operation.
materializeContainerStateFromParent does that by:
FS
Mounts
MetaSnapshot
This avoids reimplementing the same copy logic in every lazy type and keeps container lazy evaluation deterministic.
Lazy evaluation is designed to survive persistence.
For Directory and File, persisted object encoding chooses between:
For Container, the persisted payload distinguishes:
Nested directory/file values inside containers are also encoded explicitly.
Each lazy type that supports persistence implements EncodePersisted, and the corresponding object decoder reconstructs the right lazy type from an explicit persisted lazy kind.
So persistence does not serialize "a function pointer." It serializes an explicit, typed lazy recipe plus references to the attached dependency results it needs.
Selector-style container lazies are supported as standalone top-level persisted forms.
The persisted directory/file payload carries a lazyKind alongside the lazy recipe JSON. This is necessary because some field names are ambiguous across parent types. For example, file can mean Directory.file or Container.file; decoding from the retained call field alone is not enough to know which lazy recipe to reconstruct.
The supported selector lazy kinds are:
container.rootfs
container.directory
container.file
These recipes store the parent container result ID, plus the selected path for container.directory and container.file. They do not evaluate the container or materialize snapshots during shutdown persistence.
Directory and File decoders require lazyKind for lazy forms. This is a hard persistence schema cut: older persisted lazy payloads without lazyKind are not decoded.
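The "explicit kind plus recipe" idea can be sketched with encoding/json. Everything concrete here is an assumption made for illustration: the `persistedLazy` envelope, the `recipe` and `parentResultID` JSON field names, the `selectorRecipe` shape, and the `decodeLazy` helper are hypothetical, and only the three selector kind names come from the text above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// persistedLazy is a hypothetical envelope; apart from lazyKind, the JSON
// field names are assumptions.
type persistedLazy struct {
	LazyKind string          `json:"lazyKind"`
	Recipe   json.RawMessage `json:"recipe"`
}

// selectorRecipe is a hypothetical recipe shape for selector-style lazies.
type selectorRecipe struct {
	ParentResultID uint64 `json:"parentResultID"`
	Path           string `json:"path,omitempty"`
}

// decodeLazy picks the recipe type from lazyKind rather than from the
// retained call field, and rejects payloads that carry no kind at all.
func decodeLazy(payload []byte) (string, *selectorRecipe, error) {
	var env persistedLazy
	if err := json.Unmarshal(payload, &env); err != nil {
		return "", nil, err
	}
	switch env.LazyKind {
	case "":
		return "", nil, errors.New("persisted lazy payload is missing lazyKind")
	case "container.rootfs", "container.directory", "container.file":
		var r selectorRecipe
		if err := json.Unmarshal(env.Recipe, &r); err != nil {
			return "", nil, err
		}
		return env.LazyKind, &r, nil
	default:
		return "", nil, fmt.Errorf("unsupported lazy kind %q", env.LazyKind)
	}
}

func main() {
	payload := []byte(`{"lazyKind":"container.file","recipe":{"parentResultID":12,"path":"/etc/hosts"}}`)
	kind, recipe, err := decodeLazy(payload)
	fmt.Println(kind, recipe.Path, err) // container.file /etc/hosts <nil>

	_, _, err = decodeLazy([]byte(`{"recipe":{"parentResultID":12}}`))
	fmt.Println(err) // persisted lazy payload is missing lazyKind
}
```

Keeping the recipe as `json.RawMessage` until the kind is known lets each kind decode into its own typed recipe.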
Peek is intentionally used throughout lifecycle, accounting, and persistence code to avoid triggering expensive materialization from read-only bookkeeping paths.
AttachDependencies should describe the real structural dependencies even if the lazy callback will not immediately materialize them.
DoNotCache result cannot be lazy.
Lazy cleared.
If you are trying to understand or modify this system, this is a good reading order:
dagql/cache.go
registerLazyEvaluation, HasPendingLazyEvaluation, Cache.Evaluate, Cache.evaluateOne
core/lazy_state.go
Lazy, LazyState, LazyAccessor
core/directory.go
Directory.LazyEvalFunc, DirectorySubdirectoryLazy, DirectoryWithoutLazy
core/file.go
File.LazyEvalFunc, FileSubfileLazy
core/container.go
Container.LazyEvalFunc, materializeContainerStateFromParent, ContainerRootFSLazy, ContainerDirectoryLazy, ContainerWithDirectoryLazy, ContainerWithFileLazy, ContainerWithUnixSocketLazy
core/schema/container.go, core/schema/directory.go, core/schema/file.go
The cleanest mental model is:
core owns what the deferred work actually is
LazyAccessor is the safety boundary that makes consumers go through evaluation instead of casually reaching into half-materialized state
If you keep those three layers distinct, the implementation becomes much easier to reason about.