Back to Dagger

Filesync (Host Engine File Sync)

skills/cache-expert/references/filesync.md

0.21.05.2 KB
Original Source

Filesync (Host <-> Engine File Sync)

Filesync transfers host files between client and engine. It powers host.directory / host.file imports and local exports.

This doc focuses on protocol flow, engine sync behavior, and cache interactions relevant to correctness/performance.

Key Components

Client side (engine/client/filesync.go)

Client exposes two gRPC services per session:

  1. FileSync (source)
  • streams host filesystem stats/data to engine
  • supports stat-only and single-file fast paths
  • can mark gitignored entries in stats
  1. FileSend (target)
  • receives filesystem/file streams from engine for exports

Protocol (internal/fsutil)

Wire protocol uses packetized stat/data/request frames:

  • sender walks filesystem and emits stats
  • receiver requests file contents by stat index when needed
  • paths are normalized for cross-platform transfer semantics

Server Side (engine/filesync)

Snapshot entry (FileSyncer.Snapshot)

High-level flow:

  1. resolve client metadata and absolute input path via stat-only filesync
  2. get/create a ClientFilesyncMirror-owned mutable mirror snapshot
  3. sync parent dirs when needed
  4. sync requested subtree into mirror and produce immutable snapshot

remoteFS (engine/filesync/remotefs.go)

remoteFS reads from client stream:

  • exposes Walk and ReadFile
  • lazily requests file data
  • skips content requests for gitignored regular files

localFS (engine/filesync/localfs.go)

localFS applies diff from remote stream into per-client mirror:

  • compares remote and local view
  • applies mkdir/symlink/hardlink/write/delete changes
  • computes content hash for resulting subtree
  • reuses existing immutable ref when content hash matches

Filesync Cache Model (Current)

There are two distinct cache layers involved now:

  1. The dagql-level mirror object:
  • core.ClientFilesyncMirror
  • persistable for stable clients via _clientFilesyncMirror
  • keyed by stable client ID plus drive
  • owns the mutable snapshot that acts as the long-lived mirror backing store
  1. The in-package filesync change cache:
  • engine/filesync/change_cache.go
  • key: local path (string)
  • value: *ChangeWithStat
  • behavior: in-memory singleflight + refcount + release
  • no TTL, no persistence, no content-key indexing, no generic adapters

The dagql object owns the mirror snapshot's lifecycle. The in-package change cache only dedupes and validates mutations during active syncs against that mirror.

Why the Change Cache Exists

localFS.Sync uses change-cache entries to:

  • dedupe equivalent concurrent mutations on same path
  • detect mid-sync host mutations (conflict detection)
  • avoid false conflicts after sync completes by releasing all held entries

This cache is shared across syncs for the same client mirror via localFSSharedState.

Conflict Detection

Applied changes are validated against expected remote stats (verifyExpectedChange).

If a cached/applied change does not match expected change, sync fails with conflict instead of silently producing mixed-state snapshot.

GitIgnore Behavior

With gitignore-enabled import:

  • client marks entries as ignored in stats
  • server treats ignored paths as present-but-ignored
  • updates/deletes under ignored prefixes are skipped to keep mirror stable and avoid cross-sync interference

Content Reuse

After sync operations:

  • content hash is computed for subtree
  • if matching immutable ref already exists, reuse it
  • otherwise copy changed paths into new ref and commit

This is separate from dagql call cache; it is snapshot-content reuse at filesync layer on top of the mutable mirror.

Mirror Lifetime

For stable clients, host imports route through a persistable _clientFilesyncMirror object keyed by stable client ID and drive. That object owns:

  • the long-lived mutable snapshot mirror
  • temporary mounted runtime state used while syncing (MirrorSharedState)

For clients without a stable ID, the engine uses an ephemeral mirror object instead.

Either way, each actual import still returns an immutable directory/file snapshot to the caller. The mutable mirror is just backing state that makes repeated syncs cheaper and more consistent.

noCache does not bypass the mirror. It adds a filesync cache-buster so the sync result is treated as fresh while still using the existing mirror as the base state.

Export Path (Engine -> Client)

Exports use client FileSend service:

  • tree exports via fsutil receive
  • single-file/tar streams via chunked bytes

Gotchas

  • Parent directory modtimes are not fully normalized to client today.
  • Device files and named pipes are skipped.
  • Filesync change cache is not durable state; it is lifecycle-scoped dedupe/consistency machinery.
  • The mirror itself is durable for stable clients, but the mounted runtime state around it is lazy and reference-counted during active syncs.

Code Map

  • Client handlers: engine/client/filesync.go
  • Filesync protocol: internal/fsutil/send.go, internal/fsutil/receive.go
  • Server sync logic: engine/filesync/filesyncer.go, engine/filesync/localfs.go, engine/filesync/remotefs.go
  • Filesync change cache: engine/filesync/change_cache.go
  • API glue: core/schema/host.go, core/host.go