Turbo Run Architecture

This document serves as a sketch of the architecture of the turbo run command.

Overview

A run consists of the following steps:

  1. Build a package graph based on the JavaScript package manager settings
  2. Build a task graph based on package dependencies and configuration
  3. Determine global/task hashes
  4. Execute tasks in topological order
    1. Attempt to restore outputs from cache
    2. Execute task
    2. Cache task outputs in the background for future runs
  5. Collect and summarize execution results

Entry Point

  • CLI Entry: crates/turborepo/src/main.rs - Constructs TurboQueryServer (the concrete QueryServer implementation) and passes it to turborepo_lib::main
  • Command Handler: crates/turborepo-lib/src/commands/run.rs - Entry point for the run command, sets up signal handling and UI
  • Main Logic: crates/turborepo-lib/src/run/mod.rs - Core run implementation

Core Architecture Components

Signal-Driven Shutdown

Graceful shutdown and parent-death cleanup are separate responsibilities. Graceful shutdown happens while the Turbo process is still alive, so it should be handled internally by the run and process manager. Parent-death cleanup only applies when Turbo disappears before Rust cleanup code can run.

  • crates/turborepo-lib/src/commands/run.rs creates a shared SignalHandler and does not return until all shutdown subscribers finish their cleanup work.
  • The handler distinguishes signal-driven shutdown (ShutdownReason::Signal) from close-driven shutdown (ShutdownReason::Close). Normal command completion uses the close path to drain subscribers without printing signal-specific shutdown UX.
  • crates/turborepo-lib/src/run/mod.rs registers shutdown subscribers for task processes, cache writes, and the microfrontends proxy.
  • Task processes are spawned into dedicated process groups so Turbo can signal a task and all of its descendants together.
  • On the first SIGINT/SIGTERM, Turbo enters graceful shutdown: it prints a shutdown message, forwards SIGINT to running tasks, and waits for their process groups to exit.
  • Turbo must not treat direct-child exit as task-tree exit. Package managers, shells, and watch commands can leave descendants running after the leader exits, so the process manager should track the process targets it spawned and keep Turbo alive until all tracked process groups are gone.
  • Close-driven shutdown still flushes cache writes and stops processes, but it does not arm signal-specific force-shutdown timers.
  • If tasks are still running after 3 seconds, Turbo prints the remaining task list. In an interactive terminal it also prompts for a second Ctrl+C to force shut down. Without a terminal on stdin, Turbo instead prints the remaining time before the automatic force shutdown.
  • On Unix, a second signal escalates to a force kill. When stdin is not attached to a terminal, Turbo auto-escalates after 10 seconds instead.
  • On Windows, graceful shutdown falls back to an immediate kill because the platform does not support Unix-style signal forwarding to task process groups.
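
A minimal sketch of the escalation path described above, assuming tokio (signal, sync, time features). ShutdownReason mirrors the variants named earlier, while shutdown_loop, atty_is_tty, and the force-kill step are illustrative stand-ins rather than the real SignalHandler API; the demo main waits for Ctrl+C.

rust
use std::time::Duration;

use tokio::signal::unix::{signal, SignalKind};
use tokio::sync::watch;

#[derive(Clone, Copy, Debug)]
enum ShutdownReason {
    Signal, // SIGINT/SIGTERM received
    Close,  // normal command completion (not exercised in this sketch)
}

async fn shutdown_loop(tx: watch::Sender<Option<ShutdownReason>>) -> std::io::Result<()> {
    let mut sigint = signal(SignalKind::interrupt())?;
    let mut sigterm = signal(SignalKind::terminate())?;

    // First signal: broadcast graceful shutdown to all subscribers
    // (task processes, cache writes, the microfrontends proxy, ...).
    tokio::select! {
        _ = sigint.recv() => {},
        _ = sigterm.recv() => {},
    }
    let _ = tx.send(Some(ShutdownReason::Signal));

    // Second signal escalates to a force kill; without a terminal on stdin,
    // escalation happens automatically after a timeout instead.
    let second_signal = async {
        tokio::select! {
            _ = sigint.recv() => {},
            _ = sigterm.recv() => {},
        }
    };
    if atty_is_tty() {
        second_signal.await;
    } else {
        let _ = tokio::time::timeout(Duration::from_secs(10), second_signal).await;
    }
    // force_kill_tracked_process_groups() would run here in the real code.
    Ok(())
}

fn atty_is_tty() -> bool {
    // Placeholder: the real code checks whether stdin is attached to a terminal.
    false
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let (tx, mut rx) = watch::channel(None);
    tokio::spawn(shutdown_loop(tx));
    // A subscriber (e.g. the process manager) watches for the shutdown reason.
    rx.changed().await.ok();
    println!("shutting down: {:?}", *rx.borrow());
    Ok(())
}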

Parent-death cleanup is not part of normal graceful shutdown. An in-process map cannot help after SIGKILL, a crash, or OOM because the map dies with Turbo. Turbo should not start a per-task Unix watchdog for this case. If abnormal cleanup is required later, prefer a bounded run-level mechanism:

  • A run-level reaper, if used, should be owned by ProcessManager and shared by all tasks in the run.
  • Tasks register their process target (pid, pgid, and session identity) when spawned and unregister on normal exit or Turbo-managed shutdown.
  • If the Turbo process disappears and the control pipe reaches EOF, the reaper can signal the remaining registered process groups and escalate if needed.
  • Linux can use prctl(PR_SET_PDEATHSIG) as a best-effort no-helper option, but it only signals the direct child and cannot provide delayed escalation (see the sketch after this list)
  • Windows should continue using job objects for parent-death cleanup.
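
For reference, a minimal sketch of the prctl(PR_SET_PDEATHSIG) option mentioned above, assuming the libc crate. spawn_with_pdeathsig is a hypothetical helper, Linux-only, and not something Turbo currently does; it only covers the direct child, not its descendants.

rust
use std::os::unix::process::CommandExt;
use std::process::Command;

fn spawn_with_pdeathsig(mut cmd: Command) -> std::io::Result<std::process::Child> {
    unsafe {
        cmd.pre_exec(|| {
            // Ask the kernel to deliver SIGTERM to this child if the parent
            // (Turbo) dies before it does. Runs in the child after fork().
            // SAFETY: prctl with PR_SET_PDEATHSIG has no memory-safety requirements.
            let rc = unsafe { libc::prctl(libc::PR_SET_PDEATHSIG, libc::SIGTERM) };
            if rc != 0 {
                return Err(std::io::Error::last_os_error());
            }
            Ok(())
        });
    }
    cmd.spawn()
}

fn main() -> std::io::Result<()> {
    let child = spawn_with_pdeathsig(Command::new("true"))?;
    println!("spawned pid {}", child.id());
    Ok(())
}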

Regression coverage for shutdown changes should focus on observable lifecycle behavior:

  • A direct child exiting during shutdown must not let Turbo exit while a tracked descendant process group is still alive.
  • Graceful shutdown must wait for all tracked process groups, not just all direct children.
  • Forced shutdown must kill stubborn descendants and clear the tracked task records.
  • End-to-end turbo run signal tests should assert that descendants are not leaked after force shutdown.
  • Existing final-output coverage should continue proving that shutdown keeps the UI and log pipeline alive long enough to drain task output.

1. Run Builder (crates/turborepo-lib/src/run/builder.rs)

Key responsibilities:

  • Package discovery and lockfile analysis
  • Task filtering based on arguments (task names and --filter)
  • Root task scoping via FilterMode (from turborepo-types): when no filter or only exclude filters are active, root tasks defined in turbo.json are auto-included. Explicit include filters or --affected suppress root task injection. See calculate_filtered_packages and FilterMode.
  • Task graph construction and validation
  • Task-level affected detection (see below)
  • Cache setup (local and remote)
  • Activating shared HTTP client initialization once telemetry, remote cache, or linked analytics are known to be needed
  • Building a tracked repo index eagerly, then augmenting it with scoped untracked-file discovery once the selected package set is known
  • Producing a final Run struct ready for execution

Task-Level Affected Detection

When the affectedUsingTaskInputs future flag is enabled and --affected is active, the run builder applies a second filtering pass after engine construction:

  1. File change detection: SCM provides the set of changed files between refs
  2. Task input matching (turborepo-types/src/task_input_matching.rs): Each task's inputs globs are compiled and checked against the changed files. Shared with turbo query { affectedTasks }.
  3. Task change detection (turborepo-lib/src/task_change_detector.rs): Determines directly affected tasks, handling global deps and per-task inputs
  4. Engine pruning (Engine::retain_affected_tasks): Returns a new engine containing directly affected tasks, their transitive dependents, and all transitive dependencies required for execution (upstream tasks needed as cache hits)

This differs from the default --affected behavior which operates at the package level (all tasks in changed packages run).
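
A hedged sketch of the task-input matching in step 2 above, assuming the globset crate. The real matcher in turborepo-types also handles $TURBO_DEFAULT$, negations, and package-anchored paths; this only shows the core check of compiling a task's inputs globs and testing the changed files against them.

rust
use std::path::Path;

use globset::{Glob, GlobSet, GlobSetBuilder};

fn compile_inputs(globs: &[&str]) -> Result<GlobSet, globset::Error> {
    let mut builder = GlobSetBuilder::new();
    for g in globs {
        builder.add(Glob::new(g)?);
    }
    builder.build()
}

/// A task is directly affected if any changed file (package-relative here)
/// matches one of its compiled input globs.
fn task_is_affected(inputs: &GlobSet, changed_files: &[&Path]) -> bool {
    changed_files.iter().any(|f| inputs.is_match(f))
}

fn main() -> Result<(), globset::Error> {
    let inputs = compile_inputs(&["src/**/*.ts", "package.json"])?;
    let changed = [Path::new("src/index.ts"), Path::new("README.md")];
    assert!(task_is_affected(&inputs, &changed));
    Ok(())
}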

2. Package Graph (crates/turborepo-repository/src/package_graph/)

Represents the workspace structure and package dependencies:

  • Identifies the package manager being used
  • Discovers packages in workspace
  • Performs lockfile analysis
  • Builds dependency relationships between workspace packages
  • Validates that all non-root packages have a name field (PackageGraph::validate())

The package graph intentionally allows cyclic dependencies between packages — this aligns with how npm, pnpm, and yarn handle cyclic workspace deps. Cycle detection is deferred to the task graph layer (engine builder), since package-level cycles only matter when they produce task-level cycles via topological (^) dependencies.

3. Task Graph (crates/turborepo-lib/src/engine/)

The task graph is a graph of all tasks that will be part of the run and related configuration.

For purely historical reasons, this is referred to as the "engine" throughout the codebase.

The core task graph consists of:

Engine Builder (crates/turborepo-lib/src/engine/builder.rs)

  • Parses turbo.json and other configuration sources to determine task definitions
  • Resolves task dependencies (topological ^build and direct build)
  • Creates task nodes and dependency edges
  • Validates task definitions and is the sole layer that checks for circular dependencies (both cycles and self-dependencies in the task graph)

Engine Execution (crates/turborepo-lib/src/engine/execute.rs)

  • Orchestrates task execution in topological order
  • Enforces the user-set concurrency limit
  • Sends tasks to the visitor for execution
  • Handles early termination and error propagation

Task Graph Structure:

  • Nodes: Individual tasks identified by TaskId (package#task) or root
  • Root is an artifact of our Go graph library, which required all graphs to have a single entrypoint
  • Edges: Dependencies between tasks; at the moment no additional data (weights) is attached to the edges
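
A hedged illustration of that structure using petgraph. TaskNode and the edge direction are simplified stand-ins for the real Engine types; the point is the node shape (a synthetic root plus package#task nodes) and unweighted dependency edges walked in topological order.

rust
use petgraph::graph::DiGraph;

#[derive(Debug, Clone)]
enum TaskNode {
    Root,         // synthetic single entrypoint inherited from the Go graph library
    Task(String), // "package#task", e.g. "web#build"
}

fn main() {
    let mut graph: DiGraph<TaskNode, ()> = DiGraph::new();

    let root = graph.add_node(TaskNode::Root);
    let web_build = graph.add_node(TaskNode::Task("web#build".into()));
    let ui_build = graph.add_node(TaskNode::Task("ui#build".into()));

    // Edges carry no weights today: they only encode "depends on".
    graph.add_edge(root, web_build, ());
    graph.add_edge(web_build, ui_build, ()); // web#build depends on ui#build (^build)

    // A topological order, reversed, gives an execution order:
    // dependencies must finish before their dependents start.
    let order = petgraph::algo::toposort(&graph, None).expect("task graph has no cycles");
    for idx in order.into_iter().rev() {
        println!("{:?}", graph[idx]);
    }
}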

Engine Pruning (crates/turborepo-engine/src/lib.rs)

  • retain_affected_tasks keeps directly affected tasks, transitive dependents, and all transitive dependencies required for normal --affected execution
  • create_engine_for_subgraph is used by watch mode. It keeps changed package tasks, transitive dependents, and only cacheable upstream dependencies that can restore outputs without forcing non-cacheable tasks to rerun

4. Task Visitor (crates/turborepo-lib/src/task_graph/visitor/)

The task graph visitor handles task execution:

Visitor::visit (crates/turborepo-lib/src/task_graph/visitor/mod.rs)

  • Receives tasks from the engine when they can be executed
  • Calculates task hashes
  • Creates ExecContext for each task
  • Manages UI output and progress tracking
  • Collects errors and execution information

Task Executor (crates/turborepo-lib/src/task_graph/visitor/exec.rs)

  • ExecContext: Holds state required to execute a task
  • Attempts cache restoration before execution
  • Spawns and manages child processes using turborepo_process
  • Captures stdout/stderr output
  • Saves outputs to cache on success
  • Reports task result back to the execution engine

Execution Flow:

  1. Check cache for existing results
  2. If cache miss, execute the task
  3. Capture outputs and logs
  4. Save results to cache (if successful)
  5. Report status back to engine
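
A minimal sketch of that ordering. TaskCache, CacheHit, and TaskResult are stand-ins rather than the real turborepo-lib/turborepo-cache types; it only shows restore, run, save, report.

rust
struct CacheHit;

trait TaskCache {
    fn restore_outputs(&self, hash: &str) -> Option<CacheHit>;
    fn save_outputs(&self, hash: &str, exit_code: i32);
}

#[derive(Debug)]
enum TaskResult {
    CacheHit,
    Ran { exit_code: i32 },
}

fn execute_task(cache: &dyn TaskCache, hash: &str, mut run: impl FnMut() -> i32) -> TaskResult {
    // 1. Try to restore outputs and replay logs from cache.
    if cache.restore_outputs(hash).is_some() {
        return TaskResult::CacheHit;
    }
    // 2. Cache miss: run the task command, capturing stdout/stderr.
    let exit_code = run();
    // 3. On success, save outputs to cache (in the background in the real code).
    if exit_code == 0 {
        cache.save_outputs(hash, exit_code);
    }
    // 4. Report the result back to the engine so dependents can proceed.
    TaskResult::Ran { exit_code }
}

struct NoopCache;
impl TaskCache for NoopCache {
    fn restore_outputs(&self, _hash: &str) -> Option<CacheHit> { None }
    fn save_outputs(&self, _hash: &str, _exit_code: i32) {}
}

fn main() {
    let result = execute_task(&NoopCache, "abc123", || 0);
    println!("{result:?}");
}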

5. Caching System (crates/turborepo-lib/src/run/cache.rs and crates/turborepo-cache/)

Multi-layered caching system:

Cache Hierarchy

  1. Local FS Cache: Fast local file system cache
  2. Remote Cache: Shared cache (typically Vercel's service)
  3. Cache Multiplexer: Wraps local and remote to provide single cache to check

Task Cache Flow

  1. Cache Lookup: Check local cache first, then remote
  2. Cache Restoration: Extract and restore cached files
  3. Cache Storage: Compress and store task outputs
  4. Cache Metadata: Track cache hits, timing, and sources
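
A hedged sketch of the lookup order above, with in-memory stand-ins for the local and remote caches; the real turborepo-cache types are async and stream tar archives, and the write-back of remote hits to the local cache is an assumption of this sketch.

rust
use std::collections::HashMap;
use std::sync::Mutex;

// In-memory stand-in for the local filesystem cache.
struct LocalCache(Mutex<HashMap<String, Vec<u8>>>);
// In-memory stand-in for the remote HTTP cache.
struct RemoteCache(HashMap<String, Vec<u8>>);

fn fetch(local: &LocalCache, remote: Option<&RemoteCache>, hash: &str) -> Option<Vec<u8>> {
    // 1. Fast path: local cache.
    if let Some(artifact) = local.0.lock().unwrap().get(hash) {
        return Some(artifact.clone());
    }
    // 2. Fall back to the remote cache, if configured.
    let artifact = remote?.0.get(hash)?.clone();
    // 3. Warm the local cache (an assumption here) so the next lookup is local.
    local.0.lock().unwrap().insert(hash.to_string(), artifact.clone());
    Some(artifact)
}

fn main() {
    let local = LocalCache(Mutex::new(HashMap::new()));
    let mut remote = RemoteCache(HashMap::new());
    remote.0.insert("abc123".into(), b"tarball".to_vec());

    assert!(fetch(&local, Some(&remote), "abc123").is_some()); // remote hit, warms local
    assert!(fetch(&local, None, "abc123").is_some());          // now a local hit
}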

Key Components

  • RunCache: High-level cache coordination
  • TaskCache: Individual task cache management
  • AsyncCache: Handles async cache operations. Supports both local filesystem and remote HTTP caches
  • SharedHttpClient: Process-wide lazy/activatable reqwest::Client initialization shared by telemetry and remote-cache consumers

Shared HTTP Client Initialization

Network consumers do not construct an HTTP client speculatively at process startup. Instead:

  1. The CLI and run builder determine whether telemetry, remote cache, or linked analytics will actually need networking for the current invocation
  2. Once that need is known, they activate shared client initialization immediately so TLS setup overlaps with other startup work
  3. Telemetry flushes and remote-cache operations both reuse the same initialized reqwest::Client

This avoids paying client/TLS setup on invocations with no network use while still warming the client before the first network request in the common case.
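
A hedged sketch of the activation pattern using std's OnceLock and reqwest. The real SharedHttpClient API differs; this only shows the shape: build nothing until a consumer is known to need networking, warm the client on a background thread, then reuse it everywhere.

rust
use std::sync::OnceLock;

static CLIENT: OnceLock<reqwest::Client> = OnceLock::new();

/// Called once the CLI/run builder knows telemetry, remote cache, or linked
/// analytics will need networking for this invocation.
fn activate_shared_client() {
    std::thread::spawn(|| {
        // Pays client/TLS setup now, overlapping with other startup work.
        let _ = CLIENT.get_or_init(reqwest::Client::new);
    });
}

/// Used later by telemetry flushes and remote-cache operations.
fn shared_client() -> &'static reqwest::Client {
    CLIENT.get_or_init(reqwest::Client::new)
}

fn main() {
    activate_shared_client();
    let _client = shared_client();
}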

Two-Stage Repo Index Construction

turbo run builds SCM state in two stages:

  1. A background startup task reads .git/index and records committed blob IDs plus modified/deleted tracked files for the whole repo
  2. After package filtering finishes, Turborepo computes the package roots it actually needs for hashing and augments that tracked index with untracked files only for those prefixes

Those prefixes are relative to the repo index root, which is usually the Git root. This matters when the Turbo root is nested inside a larger Git repository: the root package should scope to the nested Turbo directory, not request an untracked walk of the entire parent repository.

This keeps the cheap tracked-index work overlapped with other startup work while avoiding a repo-wide untracked walk when only a subset of packages will be hashed.

Worktree Cache Sharing

When running in a Git linked worktree (created via git worktree add), Turborepo automatically shares the local file system cache with the main worktree. This enables:

  • Cache hits across worktrees: Builds on different branches share cache artifacts
  • Reduced disk usage: Avoids duplicate cache entries across worktrees
  • Faster iteration: Switching between feature branches benefits from existing cache

How it works:

  1. WorktreeInfo::detect() in turborepo-scm determines if the current directory is a linked worktree using Git commands (git rev-parse --show-toplevel and git rev-parse --git-common-dir)
  2. If in a linked worktree, ConfigurationOptions::resolve_cache_dir() returns the main worktree's .turbo/cache directory instead of the local one
  3. Users are notified via the run prelude message: "Remote caching {status}, using shared worktree cache"

Configuration:

  • Setting an explicit cacheDir in turbo.json disables worktree cache sharing
  • Detection failures (non-git repos, git errors) gracefully fall back to local cache
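
A hedged sketch of the detection step using the two git commands named above. The real implementation is WorktreeInfo::detect() in turborepo-scm; this simplified version ignores bare repositories and symlink edge cases, and linked_worktree_main_root is a hypothetical helper.

rust
use std::path::PathBuf;
use std::process::Command;

fn git_stdout(args: &[&str]) -> Option<PathBuf> {
    let out = Command::new("git").args(args).output().ok()?;
    if !out.status.success() {
        return None;
    }
    Some(PathBuf::from(String::from_utf8(out.stdout).ok()?.trim()))
}

/// Returns the main worktree root when the current directory is a linked
/// worktree, or None when it is the main worktree (or not a git repo at all).
fn linked_worktree_main_root() -> Option<PathBuf> {
    let toplevel = git_stdout(&["rev-parse", "--show-toplevel"])?;
    // --git-common-dir may be relative (".git" in the main worktree), so resolve it.
    let common_dir = std::fs::canonicalize(git_stdout(&["rev-parse", "--git-common-dir"])?).ok()?;
    // The main worktree root is the parent of the shared .git directory.
    let main_root = common_dir.parent()?.to_path_buf();
    if main_root == toplevel {
        None // already the main worktree: no sharing needed
    } else {
        Some(main_root) // linked worktree: share <main_root>/.turbo/cache
    }
}

fn main() {
    match linked_worktree_main_root() {
        Some(root) => println!("sharing cache with {}", root.display()),
        None => println!("using local cache"),
    }
}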

Atomic Cache Writes

Cache writes use an atomic write pattern (write-to-temp-then-rename) for concurrent safety:

  1. Cache archives are written to temporary files (.{filename}.{pid}.{counter}.tmp)
  2. On successful completion, temp files are atomically renamed to final destination
  3. CacheWriter implements Drop to clean up temp files if finish() is not called (e.g., on error or panic)

This ensures concurrent readers never see partially written cache files.
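
A hedged sketch of the pattern with a stand-in AtomicWriter. The real CacheWriter also compresses a tar archive and names temp files with the pid/counter scheme described above; this only shows the rename-for-atomicity and Drop-for-cleanup mechanics.

rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

struct AtomicWriter {
    temp_path: PathBuf,
    final_path: PathBuf,
    file: Option<fs::File>,
}

impl AtomicWriter {
    fn new(final_path: &Path) -> std::io::Result<Self> {
        let temp_path = final_path.with_extension("tmp");
        Ok(Self {
            file: Some(fs::File::create(&temp_path)?),
            temp_path,
            final_path: final_path.to_path_buf(),
        })
    }

    fn write_all(&mut self, bytes: &[u8]) -> std::io::Result<()> {
        self.file.as_mut().unwrap().write_all(bytes)
    }

    /// Atomically publish the archive: readers either see the old file or the
    /// complete new one, never a partial write.
    fn finish(mut self) -> std::io::Result<()> {
        self.file.take(); // close the handle before renaming
        fs::rename(&self.temp_path, &self.final_path)
    }
}

impl Drop for AtomicWriter {
    fn drop(&mut self) {
        // If finish() was never called (error or panic), close and remove the
        // temp file so it does not linger on disk.
        if self.file.take().is_some() {
            let _ = fs::remove_file(&self.temp_path);
        }
    }
}

fn main() -> std::io::Result<()> {
    let mut writer = AtomicWriter::new(Path::new("artifact.tar.zst"))?;
    writer.write_all(b"cache archive bytes")?;
    writer.finish()
}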

6. Task Hashing (crates/turborepo-lib/src/task_hash/)

Creates a "content identifier" for a specific task based on the current state of its inputs:

Hash Inputs

  • Global Hash: Package manager lockfile, global dependencies, environment variables
  • Task Hash: Task definition, package dependencies, input files, environment variables
  • File Hashing: Uses git for tracking file changes efficiently
  • Explicit Inputs: When tasks use custom inputs, glob matches still walk the filesystem, but clean tracked matches reuse blob OIDs from the repo index instead of re-hashing file contents
  • CRLF Normalization: When .gitattributes marks files as text or text=auto, git normalizes CRLF line endings to LF in blob objects. The crlf module in turborepo-scm replicates this so turbo's file hashes match git's regardless of the code path (git or manual/no-git after turbo prune). .gitattributes is included in the global hash inputs and preserved by turbo prune. Known limitations: only root-level .gitattributes is loaded; eol= is not handled.
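
A hedged sketch of the normalization itself: CRLF becomes LF before hashing so the content hash matches the blob OID git would compute. The real crlf module in turborepo-scm also mirrors git's text=auto binary detection and .gitattributes lookup; normalize_crlf here is a stand-in.

rust
fn normalize_crlf(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] == b'\r' && input.get(i + 1) == Some(&b'\n') {
            i += 1; // drop the '\r'; the '\n' is kept on the next iteration
            continue;
        }
        out.push(input[i]);
        i += 1;
    }
    out
}

fn main() {
    assert_eq!(normalize_crlf(b"a\r\nb\r\n"), b"a\nb\n");
    // Like git, a bare carriage return without a following '\n' is preserved.
    assert_eq!(normalize_crlf(b"lone\rcarriage"), b"lone\rcarriage");
}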

globalConfiguration and global.inputs

When the globalConfiguration future flag is enabled, global.inputs (formerly globalDependencies) files are not included in the global hash. Instead, they are prepended as implicit input globs to every task's TaskInputs during engine construction (see prepend_global_inputs in crates/turborepo-engine/src/task_definition.rs).

This means:

  • The global hash still exists (lockfile, engines, global env, root deps) but does not include global.inputs file hashes
  • Tasks can exclude specific global input files via negation globs (e.g. "inputs": ["$TURBO_DEFAULT$", "!$TURBO_ROOT$/tsconfig.json"])
  • Tasks with no explicit inputs key get default: true set so package files are still hashed alongside the global inputs
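
A hedged sketch of the prepending step with stand-in types; the real TaskInputs and prepend_global_inputs in turborepo-engine differ in detail. It shows the two effects above: global inputs become implicit leading globs, and tasks without an explicit inputs key keep hashing package files via default.

rust
#[derive(Debug, Clone)]
struct TaskInputs {
    globs: Vec<String>,
    default: bool, // include the package's default file set when hashing
}

fn prepend_global_inputs(task: &mut TaskInputs, global_inputs: &[String], had_explicit_inputs: bool) {
    // Global inputs come first so task-level negation globs can exclude them.
    let mut globs = global_inputs.to_vec();
    globs.extend(task.globs.drain(..));
    task.globs = globs;
    if !had_explicit_inputs {
        // No explicit `inputs` key: still hash the package's files alongside
        // the global inputs.
        task.default = true;
    }
}

fn main() {
    let mut build = TaskInputs { globs: vec![], default: false };
    prepend_global_inputs(&mut build, &["$TURBO_ROOT$/tsconfig.json".to_string()], false);
    println!("{build:?}"); // globs: ["$TURBO_ROOT$/tsconfig.json"], default: true
}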

Hash Calculation

  • Combines global and task-specific inputs
  • Calculated by leveraging Cap'n Proto (capnp) to serialize in-memory structs for hashing
  • This is an artifact of ensuring shared hashing logic between the Go and Rust implementations

7. Run Tracking and Summary (crates/turborepo-lib/src/run/summary/)

The summary module is responsible for any type of summary:

  • The "FULL TURBO" summary block at the end of a run
  • The summary produced by --summarize
  • Dry run output (--dry=json)

Run Tracker (crates/turborepo-lib/src/run/summary/mod.rs)

  • Tracks overall run metadata (start time, command, etc.)
  • Coordinates task tracking across execution
  • Takes final result from Visitor::visit
  • Generates final run summary

Task Tracker (crates/turborepo-lib/src/run/summary/execution.rs)

  • Tracks individual task execution states
  • Records timing, exit codes, and cache status
  • Receives information about tasks in real time

Summary Generation

  • Stitches together results from the visitor and the task tracker
  • Constructs the final summary depending on what the user asked for, e.g. --dry=json or --summarize

8. Query Subsystem

The query subsystem powers turbo query (GraphQL introspection of the package/task graph) and the Web UI mode (--ui=web).

Crate layout:

  • turborepo-query-api — Trait definitions (QueryServer, QueryRun) and shared error/result types. turborepo-lib depends on this thin interface crate instead of the heavy implementation.
  • turborepo-query — GraphQL implementation using async-graphql, axum, and oxc. Implements the resolvers and HTTP server.
  • turborepo/src/main.rs — Wires the two halves together via TurboQueryServer, which implements QueryServer by delegating to turborepo-query.

Data flow: main() constructs Arc<TurboQueryServer> → passes to turborepo_lib::main → threaded through shim → cli::run → commands::run → RunBuilder → Run. The Run struct stores the query_server and uses it in start_web_ui() and the turbo query command handler.
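
A hedged sketch of the dependency-inversion wiring; the real QueryServer and QueryRun traits in turborepo-query-api have a different surface. It only illustrates why turborepo-lib can depend on a thin trait while the heavy GraphQL implementation stays behind TurboQueryServer in the binary crate.

rust
use std::sync::Arc;

// turborepo-query-api: thin interface crate (trait surface is a stand-in).
trait QueryServer: Send + Sync {
    fn run_query(&self, query: &str) -> Result<String, String>;
}

// turborepo (binary crate): concrete implementation. In the real codebase it
// delegates to turborepo-query (async-graphql, axum, oxc).
struct TurboQueryServer;

impl QueryServer for TurboQueryServer {
    fn run_query(&self, query: &str) -> Result<String, String> {
        Ok(format!("resolved: {query}"))
    }
}

// turborepo-lib: only sees the trait object, threaded through to Run.
struct Run {
    query_server: Arc<dyn QueryServer>,
}

fn main() {
    let server: Arc<dyn QueryServer> = Arc::new(TurboQueryServer);
    let run = Run { query_server: server };
    println!("{:?}", run.query_server.run_query("{ packages { name } }"));
}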

Data Flow Overview

1. Task Graph Building

RunBuilder
  ├── Package Discovery → PackageGraph (validates package names)
  ├── Task Discovery → EngineBuilder
  ├── Task Graph Construction → Engine (built)
  └── Task Graph Validation (cycles, missing deps) → Ready Engine

Process:

  1. Discover packages and build package dependency graph
  2. Load turbo.json configurations for tasks
  3. Create task nodes for each package × task combination
  4. Build dependency edges based on dependsOn configurations
  5. Validate task graph for cycles and missing dependencies

2. Task Graph Traversal

Engine.execute()
  ├── Walker (topological order)
  ├── Semaphore (concurrency control)
  ├── Engine -[Task to Run]→ Visitor
  └── Engine ←[Task Result]- Visitor

Process:

  1. Walker traverses graph in topological order
  2. Semaphore controls maximum concurrent tasks
  3. Each ready task is sent to the Visitor
  4. Visitor executes task and reports back to Engine
  5. Walker continues with newly available tasks

3. Task Execution

Visitor.visit()
  ├── Calculate Hash
  ├── Check Cache → Cache Hit? → Restore & Done
  ├── Execute Task → Create ExecContext and `exec_context.exec()`
  ├── Save to Cache
  └── Track Results

Process:

  1. Calculate task hash from inputs
  2. Check local then remote cache
  3. If cache hit: restore outputs and logs
  4. If cache miss: execute task command
  5. Capture outputs and logs during execution
  6. Save results to cache (if successful)
  7. Track timing and results

4. Cache Operations

TaskCache.restore_outputs()
  ├── Check caching disabled?
  ├── Local Cache → exists?
  ├── Remote Cache → exists?
  ├── Fetch & Extract
  └── Return metadata

TaskCache.save_outputs()
  ├── Collect output files
  ├── Compress to tar
  ├── Save to Local Cache
  └── Upload to Remote Cache (async)

Incremental Cache (crates/turborepo-run-cache/src/incremental.rs)

Handles tool-managed incremental artifacts (e.g., .tsbuildinfo) that persist across runs via remote cache, speeding up cache misses by restoring prior incremental state before execution.

  • Gated behind the incrementalTasks future flag
  • Operates per-partition with independent cache keys
  • Fetch completes before task execution begins (strict ordering)
  • Upload happens after successful execution, in parallel with regular cache save
  • All blocking filesystem operations run on spawn_blocking threads
  • See SPEC.md for full specification

On Cache Miss:
  Visitor.visit()
    ├── Calculate Hash → Cache Miss
    ├── Fetch Incremental Artifacts (sequential per-partition, must complete before exec)
    ├── Execute Task
    ├── Save to Cache
    ├── Upload Incremental Artifacts (concurrent per-partition, parallel with cache save)
    └── Track Results

5. Data Collection and Summary

RunTracker
  ├── Task Events → ExecutionTracker
  ├── State Aggregation → SummaryState
  ├── Summary Generation → RunSummary
  └── Output (JSON/Console)

Process:

  1. Each task sends lifecycle events (start, success, failure, cache hit)
  2. ExecutionTracker aggregates state across all tasks
  3. Final summary includes timing, cache status, errors
  4. Summary is saved to .turbo/runs/ and optionally printed

9. Observability (crates/turborepo-run-summary/src/observability/ and crates/turborepo-otel/)

The observability subsystem enables exporting run metrics to external backends via OpenTelemetry.

Architecture

The system uses a two-layer design:

  1. turborepo-otel: Low-level OTLP exporter crate

    • Manages the OpenTelemetry SDK meter provider and instruments
    • Supports gRPC and HTTP/Protobuf protocols
    • Handles connection lifecycle and metric flushing
  2. turborepo-run-summary/observability: Integration layer

    • Provides a RunObserver trait for pluggable backends
    • Converts RunSummary data into metrics payloads
    • Enabled via the otel feature flag

Main Components

  • observability::Handle: Main entry point; wraps backend-specific implementations
  • RunObserver trait: Abstraction allowing future backends (Prometheus, etc.)
  • OtelObserver: OpenTelemetry implementation of RunObserver
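
A hedged sketch of how these three pieces relate; the payload fields and method names are stand-ins for the real observability module, not its API.

rust
struct RunMetricsPayload {
    duration_ms: u64,
    tasks_attempted: u64,
    tasks_failed: u64,
    tasks_cached: u64,
}

trait RunObserver {
    fn record(&self, payload: &RunMetricsPayload);
    fn shutdown(&self); // flush pending metrics to the backend
}

/// OpenTelemetry-backed implementation; the real one drives the
/// turborepo-otel exporter instead of printing.
struct OtelObserver;

impl RunObserver for OtelObserver {
    fn record(&self, payload: &RunMetricsPayload) {
        println!(
            "duration_ms={} attempted={} failed={} cached={}",
            payload.duration_ms, payload.tasks_attempted, payload.tasks_failed, payload.tasks_cached
        );
    }
    fn shutdown(&self) {}
}

/// Handle wraps whichever backend is configured (none today, OTel, or a
/// future backend such as Prometheus).
struct Handle {
    observer: Option<Box<dyn RunObserver>>,
}

fn main() {
    let handle = Handle { observer: Some(Box::new(OtelObserver)) };
    if let Some(observer) = &handle.observer {
        observer.record(&RunMetricsPayload {
            duration_ms: 1200,
            tasks_attempted: 5,
            tasks_failed: 0,
            tasks_cached: 4,
        });
        observer.shutdown();
    }
}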

Configuration

Observability is configured via experimentalObservability.otel in turbo.json:

jsonc
{
  "futureFlags": {
    "experimentalObservability": true
  },
  "experimentalObservability": {
    "otel": {
      "enabled": true,
      "protocol": "http/protobuf",
      "endpoint": "https://otel-collector.example.com:4318/v1/metrics",
      "resource": {
        "service.name": "turborepo"
      },
      "metrics": {
        "runSummary": true,
        "taskDetails": true,
        "runAttributes": {
          "id": false,        // turbo.run.id — unbounded cardinality
          "scmRevision": false // turbo.scm.revision — unbounded cardinality
        },
        "taskAttributes": {
          "id": false,    // turbo.task.id
          "hashes": false // turbo.task.hash, turbo.task.external_inputs_hash — unbounded
        }
      }
    }
  }
}

Configuration can also be set via environment variables (TURBO_EXPERIMENTAL_OTEL_*) or CLI flags (--experimental-otel-*).

Metrics Emitted

  • turbo.run.duration_ms - Run duration histogram
  • turbo.run.tasks.attempted - Tasks attempted counter
  • turbo.run.tasks.failed - Tasks failed counter
  • turbo.run.tasks.cached - Cache hit counter
  • turbo.task.duration_ms - Per-task duration histogram (when taskDetails enabled)
  • turbo.task.cache.events - Per-task cache events (when taskDetails enabled)

Attributes with unbounded cardinality (unique run IDs, Git SHAs, content hashes) are gated behind runAttributes and taskAttributes config flags, all defaulting to false. See the Metric Attributes and Cardinality section in crates/turborepo-otel/src/lib.rs for the full attribute inventory.

Data Flow

RunSummary.finish()
  ├── observability::Handle.record(&summary)
  │     ├── Convert to RunMetricsPayload
  │     └── Record via OpenTelemetry instruments
  └── observability::Handle.shutdown()
        └── Flush pending metrics to backend

10. User-Facing Logging (crates/turborepo-log/)

Structured event system for messages intended for end users (warnings, errors, informational output). Distinct from tracing, which remains for developer diagnostics.

Key Types

  • Logger — Dispatches events to registered sinks. Set globally via init() (once, at startup) or used directly via Logger::handle() for testing.
  • LogHandle — Source-scoped handle for emitting events. Created via log() (global) or Logger::handle() (specific logger). Resolves the global logger at .emit() time, not at handle or builder creation time — handles and builders created before init() work once the global logger is set.
  • LogSink — Trait for event destinations. Built-in sinks: CollectorSink (in-memory buffer for post-run summaries) and FileSink (newline-delimited JSON with optional size limiting).
  • LogEvent — Structured event with level, source, message, typed fields, and timestamp.

Relationship to turborepo-ui

turborepo-ui handles terminal rendering (TUI, console formatting). turborepo-log handles structured event capture and dispatch. A terminal sink in turborepo-ui can implement LogSink to bridge events into the rendering pipeline. turborepo-log intentionally has no dependency on turborepo-ui — it sits at the bottom of the dependency graph.
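
A hedged sketch of such a bridging sink. LogEvent and LogSink are simplified stand-ins for the turborepo-log types, and TerminalSink is hypothetical; the real terminal sink would hand events to the turborepo-ui rendering pipeline rather than printing directly.

rust
use std::fmt::Write as _;

#[derive(Debug)]
struct LogEvent {
    level: &'static str,
    source: String,
    message: String,
    fields: Vec<(String, String)>,
}

trait LogSink: Send + Sync {
    fn emit(&self, event: &LogEvent);
}

/// Bridges structured events into a rendering pipeline (turborepo-ui in the
/// real code); here it just formats one line per event.
struct TerminalSink;

impl LogSink for TerminalSink {
    fn emit(&self, event: &LogEvent) {
        let mut line = format!("[{}] {}: {}", event.level, event.source, event.message);
        for (key, value) in &event.fields {
            let _ = write!(line, " {key}={value}");
        }
        println!("{line}");
    }
}

fn main() {
    let sink = TerminalSink;
    sink.emit(&LogEvent {
        level: "warn",
        source: "task:web#build".into(),
        message: "output exceeded size limit".into(),
        fields: vec![("bytes".into(), "1048576".into())],
    });
}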

Data Flow

Subsystem / Task Executor
  └── LogHandle.warn("msg").field("k", v).emit()
        └── Logger.emit(&event)
              ├── CollectorSink → in-memory buffer → post-run summary
              └── FileSink → JSONL file → external tooling