packages/server/sandbox/CONTEXT.md
Glossary for the execution-cache / sandbox domain. Terms only — no implementation details.
A single pre-built artifact, scoped to one flow version, containing the flow definition + compiled custom Code Steps only. Built for every locked version (even ones with no code steps) so it is the uniform serving path — the run-time flow definition and code always come from the bundle in object storage rather than a per-run getFlowVersion fetch, offloading the API/DB. The bundle also carries a pieces.json manifest — the resolved PiecePackage[] the flow references — so the worker reads the piece list directly on extract instead of re-parsing the flow and re-resolving each piece via getPiece. Piece versions in the manifest are frozen at bundle-build time (resolved PiecePackage[]), so a locked version no longer floats ^-range pieces — it runs reproducibly forever (see ADR-0001). Piece binaries and the engine are still NOT in the bundle — they remain the shared, cross-flow, per-worker caches (piece-installer / engine-installer) exactly as today; only the manifest of which pieces is precomputed. Served by the API; the worker's code-step preparation degenerates to download + extract, while piece install is unchanged. The bundle serves all five flow-scoped job types (execute-flow, execute-trigger-hook, execute-webhook, execute-polling, renew-webhook) since they share one flow-scoped provision keyed by flowVersionId. Piece-scoped jobs (execute-property, execute-validation, extract-piece-info) never touch the bundle. Scope is deliberately the lowest-risk slice: no piece-loader changes, no per-flow piece dedup loss. Accelerating piece-install latency is explicitly out of scope.
Format is a .tgz (flow.json + pieces.json + codes/<stepName>/index.js), extracted to <cache>/v12/bundles/<flowVersionId>/; the codes/ subdir mounts to /root/codes/<flowVersionId> (same mount target as today). Invariant: the local-build path and the download path produce a byte-identical on-disk tree, so downstream mount logic has a single path regardless of cache hit/miss, and "upload" is just tarring the dir that was built.
A user-authored custom code action inside a flow (TypeScript). Built (bun install + esbuild) into a single clean JS file. Per flow version.
The act of producing a Flow Bundle (bun install + esbuild of code steps and pieces). Triggered lazily on first execution request. The API only stores and serves bundles — it never builds. On a cache hit the worker downloads + extracts. On a cache miss the worker builds the bundle inside its sandbox (the only place with the toolchain and isolation), then uploads it back to the API/file service, so the build happens once globally per flow version rather than once per worker. Keyed by immutable flowVersionId. On a cold burst, the build is serialized by a distributed lock per flowVersionId: the first worker builds + uploads; other workers block until it's ready, then download — guaranteeing exactly one build.
A "provisioned" hint carried in the flow job data (a boolean / the bundleFileId), stamped by the API at enqueue from the flow_version.bundleFileId column. It tells the worker only whether a bundle exists — it is not a URL. The download endpoint is deterministic from flowVersionId (/flow-bundles/{flowVersionId}), which the worker constructs itself; hitting it makes the API 307-redirect to a fresh presigned S3 URL (cloud) or stream the bytes from DB (self-hosted CE), reusing the existing signedFileTransport idiom. The hint exists purely to skip a wasted round-trip on cold/draft.
LOCKED, re-uploads it — serialized by the distributed lock so exactly one worker builds during a cold burst. Local build is therefore both the cold-boot path and the degraded fallback — execution never depends on the bundle store being warm.Bundles are only ever stored for LOCKED flow versions (immutable). DRAFT versions are mutable (builder/test runs) and are never uploaded — they always build locally.
Bundles are stored forever (no eviction/TTL) — small, immutable, write-once, so cheap to keep; this also means zero invalidation logic. (Optional future: delete-on-flow-deletion for data residency.) State of record is a nullable bundleFileId on the flow_version row (write-once; no build-version dimension). Upload + registration happen through a single idempotent API endpoint that stores the bytes via fileService (S3 PUT redirect or DB) and sets bundleFileId in the same handler, under the build lock.
One of the 400+ integrations a flow references (e.g. Slack, Gmail). Versioned and shared/deduped across flows on a worker. Not part of the Flow Bundle — installed by the existing piece-installer at provision time as today.
The shipped v1 implements the core mechanism; a few design points above are deliberately deferred (each is a pure optimization over a correct baseline):
{ flowVersion, pieces, codes: [{ stepName, compiledJs }] }), not a compressed .tgz — dep-free (no tar writer), and we control both ends. Compression is a later optimization.File with id = flowVersionId, type = FileType.FLOW_BUNDLE (idempotent upsert). No bundleFileId column and no DB migration — addressing by flowVersionId makes the column unnecessary.WorkerToApiContract — getFlowBundle / uploadFlowBundle — no HTTP routes.getFlowBundle on every flow-scoped provision (returns null on miss). The hint that skips that round-trip is a follow-up.LOCKED-version miss the worker builds locally and uploads best-effort (last-write-wins; bundles are immutable so redundant cold-burst builds are wasted CPU, not incorrectness). The "exactly one build" blocking lock is a follow-up.flowVersion.schemaVersion !== LATEST_FLOW_SCHEMA_VERSION and rebuilds — so a schema bump self-heals.index.js per code step (what the engine imports); on hit the worker writes them to the code cache and the resolve result reports code: { kind: 'materialized' } so install skips compile; on a miss it reports code: { kind: 'source', steps } to compile during install. Pieces install via the manifest as normal.Key files: cache/flow/flow-provisioning.ts (resolve hit/miss + publish), cache/flow/flow-bundle-store.ts (fetch/publish + codec), cache/flow/code/code-cache.ts (compiled-step layout), sandbox.ts (lifecycle), worker-rpc-service.ts (getFlowBundle/uploadFlowBundle), file.service.ts (getLocationForFile → S3 for FLOW_BUNDLE), worker-contract.ts (contract).
The provisioning code is organized as three nested modules so the bundle hit/miss decision is testable in isolation (each tested through its own interface with a faked apiClient / temp dir):
cache/flow/flow-provisioning.ts. The deep seam over "turn a flowVersionId into { flowVersion, pieces, code }." resolve() hides the Flow Bundle hit vs. fetch-resolve-and-publish decision (and the disable-on-missing-piece path, formerly flow-artifacts.ts, now inlined). It returns the code provenance explicitly (code: { kind: 'materialized' | 'source' }) and, for a LOCKED current-schema miss, a best-effort publishBundle handle the runtime invokes after install (compiled code only exists on disk post-install). sandbox keeps only sandbox lifecycle: resolve → acquire → install → publish. The flow-steps.ts helper is the single home for "which steps of a flow version are Code Steps / Piece steps," shared by Flow Provisioning and the Flow Bundle Store.
cache/flow/flow-bundle-store.ts. Owns the Flow Bundle end to end and is local-first (via cacheState): tryFetch() returns a cached, current-schema manifest from memory/disk with no RPC and no writes; only on a cold miss does it RPC getFlowBundle, schema-guard, and materialize compiled code via Code Cache. Misses are never persisted, so a later-published bundle is picked up next run. publish() reads code via Code Cache + RPC upload. The worker-side RPC pair is the adapter at this seam; tests use an in-memory adapter and assert a warm second fetch makes no further RPC.
cache/flow/code/code-cache.ts. Single owner of the compiled Code Step on-disk layout (<codes>/<flowVersionId>/<stepName>/index.js) — read/write/path. Both code-builder (Bundle Build miss) and the Flow Bundle Store (hit/publish) go through it, so the byte-identical-tree invariant is enforced by construction, not convention.