eden/mononoke/docs/1.3-architecture-overview.md
This document provides a high-level overview of Mononoke's architecture. After reading this, you should understand how Mononoke's major components fit together, how data flows through the system, and the key architectural decisions that shape the codebase.
Reading time: ~30 minutes
Mononoke is a distributed source control server designed to scale along multiple dimensions: commit rate, repository size, file count, and branch count. Rather than being a monolithic application, Mononoke is structured as a collection of Rust libraries that implement source control functionality, which are then composed into various servers, services, and tools.
The architecture is designed around several core principles:
Mononoke is designed as a distributed system composed of multiple services that work together.
Mononoke is organized into three tiers: frontend services, internal microservices, and backend storage.
These services serve external clients and implement various protocols. They are stateless and can be horizontally scaled to handle client load.
Protocol Servers:
All frontend services authenticate requests, translate between protocol-specific formats and Mononoke's internal Bonsai format, and route operations to the appropriate backend storage or microservices.
These services handle expensive or serialized operations that are offloaded from the main frontend servers. This prevents resource contention and enables dedicated scaling for specific workloads.
Microservices: Land Service, Derived Data Service, Diff Service, and Bookmark Service.
By offloading these operations to dedicated services, frontend servers remain responsive to lightweight operations like fetches and clones.
All services (frontend and microservices) share common backend storage:
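Storage Systems: the blobstore (immutable file contents, commit metadata, and derived data) and the metadata DB (mutable bookmarks, VCS mappings, and commit graph).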
This shared storage enables stateless services and allows any server to handle any request for a given repository.
┌──────────────────────────────────────────────────────────────┐
│ Client Layer │
│ Sapling CLI │ EdenFS │ Git Clients │ Other Services │
└──────────────────────────────────────────────────────────────┘
▼
┌──────────────────────────────────────────────────────────────┐
│ Frontend Services (Stateless) │
│ │
│ Mononoke Server (SLAPI) │ Git Server │ LFS Server │
│ SCS Server (Thrift) │
└──────────────────────────────────────────────────────────────┘
▼
┌───────────────┴───────────────┐
▼ ▼
┌────────────────────────┐ ┌──────────────────────────────┐
│ Internal Microservices│ │ Backend Storage │
│ │ │ │
│ Land Service │◄────►│ Blobstore (immutable) │
│ Derived Data Service │ │ - File contents │
│ Diff Service │ │ - Commit metadata │
│ Bookmark Service │ │ - Derived data │
│ │ │ │
│ │ │ Metadata DB (mutable) │
│ │ │ - Bookmarks │
│ │ │ - VCS mappings │
│ │ │ - Commit graph │
└────────────────────────┘ └──────────────────────────────┘
This service-oriented architecture has the following characteristics:
Independent Scaling - Each tier can be scaled independently. Git servers can be added to increase capacity for Git operations, while derivation workers can be scaled separately.
Workload Isolation - Expensive operations (landing, derivation) run in dedicated services, isolating them from lightweight operations (fetches, file reads).
Sequential Coordination - Operations requiring mutual exclusion (e.g., landing to the same bookmark) are coordinated by routing through a single service instance.
Independent Deployment - Services can be deployed, updated, and monitored independently.
Resource Allocation - Different services have different resource profiles (CPU, memory, I/O) and can be provisioned accordingly.
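The Sequential Coordination point can be illustrated with a minimal in-process sketch. It assumes a hypothetical `LandCoordinator` that keeps one lock per bookmark, so lands to the same bookmark are serialized while lands to different bookmarks do not block each other. The type and its counter are invented for illustration and do not describe how the land service is actually implemented.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical coordinator holding one lock per bookmark.
#[derive(Default)]
pub struct LandCoordinator {
    // Bookmark name -> number of commits landed so far, each behind its own lock.
    heads: Mutex<HashMap<String, Arc<Mutex<u64>>>>,
}

impl LandCoordinator {
    /// Land one commit on `bookmark` and return the bookmark's new position.
    pub fn land(&self, bookmark: &str) -> u64 {
        // Brief lock on the map: find or create the lock for this bookmark.
        // The map guard is released at the end of this statement.
        let slot = self
            .heads
            .lock()
            .unwrap()
            .entry(bookmark.to_string())
            .or_default()
            .clone();
        // Only callers landing to the same bookmark wait on each other here.
        let mut position = slot.lock().unwrap();
        *position += 1;
        *position
    }
}
```

Routing every land for a bookmark through a single instance plays the same role across processes that the per-bookmark lock plays here within one process.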
It's important to understand that the service architecture and code architecture are orthogonal concerns:
Every Mononoke binary - whether it's the SLAPI server, the land service, or the admin tool - uses the same internal layered structure. They all share the same repository implementation, blobstore abstractions, and core types. The difference is in what protocol they serve or what operations they perform.
While the system architecture describes how Mononoke services are composed, each individual Mononoke application (whether a server, microservice, tool, or job) is structured internally using a consistent layered design. This code architecture is implemented as a collection of Rust libraries that are composed together.
Understanding this internal structure is key to navigating the codebase and understanding how to build new tools or modify existing ones.
Every Mononoke application - whether it's the main SLAPI server, the admin CLI tool, or the walker job - is built using the same layered architecture. This consistent structure makes it easier to understand any part of the codebase once you understand the pattern.
The layers are:
┌──────────────────────────────────┐
│ Application Binary │ (server, tool, or job main())
│ (uses cmdlib/mononoke_app) │
└──────────────────────────────────┘
▼
┌──────────────────────────────────┐
│ API Layer (optional) │ High-level abstractions
│ (mononoke_api) │ (Repo, Changeset, File objects)
└──────────────────────────────────┘
▼
┌──────────────────────────────────┐
│ Features │ High-level operations
│ (pushrebase, cross_repo_sync) │ (combines multiple facets)
└──────────────────────────────────┘
▼
┌──────────────────────────────────┐
│ Repository Attributes │ Composable capabilities
│ (repo_attributes/*) │ (facets providing specific functionality)
└──────────────────────────────────┘
▼
┌──────────────────────────────────┐
│ Common Components │ Foundation libraries
│ (blobstore, mononoke_types, │ (storage, types, utilities)
│ common/*, cmdlib) │
└──────────────────────────────────┘
Each layer is described below:
The foundation layer provides basic building blocks used by all Mononoke applications. These libraries know nothing about repositories or high-level version control concepts - they provide fundamental primitives.
Key components:
- context/ - Request context carrying permissions, logging, and tracing
- blobstore/ - Key-value storage interface and implementations
- common/ - Async utilities, SQL helpers, logging extensions
- mononoke_types/ - Fundamental data structures (Bonsai changesets, file content)
- metaconfig/ - Repository configuration system
- permission_checker/ - Access control primitives

These components are designed to be reusable and have minimal dependencies on each other.
Repository attributes are implemented as facets - trait-based components that provide specific capabilities. Each facet encapsulates a single responsibility and can contain state that forms part of the repository.
Core attribute categories:
Identity and Configuration:
- repo_identity - Repository ID and name
- repo_config - Repository configuration

Storage:
- repo_blobstore - Access to immutable blob storage
- filestore - File content storage with chunking

Commit Graph:
- commit_graph - Efficient ancestry queries and traversal
- commit_graph_writer - Write interface for graph updates

Derived Data:
- repo_derived_data - Manager for all derived data types
- repo_derivation_queues - Remote derivation coordination

VCS Integration:
- bonsai_hg_mapping - Bonsai ↔ Mercurial changeset mappings
- bonsai_git_mapping - Bonsai ↔ Git commit mappings
- bonsai_globalrev_mapping - Sequential integer IDs
- git_symbolic_refs - Git reference handling

Bookmarks and References:
- bookmarks - Branch management (read/write)
- bookmark_update_log - History of bookmark movements
- bookmarks_cache - Cached bookmark data

Operations:
- repo_permission_checker - Access control
- hook_manager - Pre-commit hooks
- repo_cross_repo - Cross-repository operations

Facets are located in repo_attributes/ and are composed together into complete repositories using the facet container pattern. This allows different operations to declare exactly which repository capabilities they need.
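A minimal sketch of the facet container idea follows. It is hand-written for illustration: the structs, the accessor traits (`RepoIdentityRef`, `BookmarksRef`), and the two containers are assumptions, and the real codebase may define or generate these differently. The point it shows is that a function names only the facets it needs, so it works with any container that provides them.

```rust
/// Hypothetical facet: repository identity.
pub struct RepoIdentity {
    pub id: u32,
    pub name: String,
}

/// Hypothetical facet: (bookmark name, changeset id) pairs.
pub struct Bookmarks {
    pub entries: Vec<(String, String)>,
}

/// One accessor trait per facet; callers depend on these, not on a concrete repo type.
pub trait RepoIdentityRef {
    fn repo_identity(&self) -> &RepoIdentity;
}

pub trait BookmarksRef {
    fn bookmarks(&self) -> &Bookmarks;
}

/// Container for an application that needs both facets.
pub struct ServerRepo {
    pub identity: RepoIdentity,
    pub bookmarks: Bookmarks,
}

/// Container for a small tool that only needs the identity facet.
pub struct ToolRepo {
    pub identity: RepoIdentity,
}

impl RepoIdentityRef for ServerRepo {
    fn repo_identity(&self) -> &RepoIdentity {
        &self.identity
    }
}

impl BookmarksRef for ServerRepo {
    fn bookmarks(&self) -> &Bookmarks {
        &self.bookmarks
    }
}

impl RepoIdentityRef for ToolRepo {
    fn repo_identity(&self) -> &RepoIdentity {
        &self.identity
    }
}

/// Needs only the identity facet, so it accepts either container.
pub fn describe(repo: &impl RepoIdentityRef) -> String {
    let identity = repo.repo_identity();
    format!("{} (id {})", identity.name, identity.id)
}

/// Needs identity and bookmarks; passing a `ToolRepo` would fail to compile.
pub fn summary(repo: &(impl RepoIdentityRef + BookmarksRef)) -> String {
    let count = repo.bookmarks().entries.len();
    format!("{}: {} bookmark(s)", repo.repo_identity().name, count)
}
```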
Features combine multiple facets to implement source control operations. Unlike facets, features don't hold state - they orchestrate operations across repository attributes.
Key features (in features/):
- features/pushrebase/ - Server-side rebasing for linear history
- features/cross_repo_sync/ - Repository synchronization
- features/commit_transformation/ - Rewriting commits
- features/redaction/ - Content removal
- features/diff/, features/history_traversal/ - Content comparison and traversal
- bookmarks/bookmarks_movement/ - Safe bookmark updates
- features/microwave/ - Cache warming

Features are typically implemented as functions that accept trait bounds specifying the required facets.
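As a rough sketch of a feature written this way, the example below implements a fast-forward-only bookmark move as a stateless function over two facets. The `CommitGraph` and `Bookmarks` traits, the `fast_forward_bookmark` function, and the in-memory `MemRepo` are all hypothetical and greatly simplified; they are not the interfaces used in bookmarks/bookmarks_movement/.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical facet: parent edges of the commit graph.
pub trait CommitGraph {
    fn parents(&self, cs: &str) -> Vec<String>;
}

/// Hypothetical facet: mutable bookmark storage.
pub trait Bookmarks {
    fn get(&self, name: &str) -> Option<String>;
    fn set(&mut self, name: &str, target: &str);
}

/// Stateless feature: the trait bounds list exactly the facets it requires.
pub fn fast_forward_bookmark<R>(repo: &mut R, name: &str, target: &str) -> Result<(), String>
where
    R: CommitGraph + Bookmarks,
{
    if let Some(current) = repo.get(name) {
        if !is_ancestor(&*repo, &current, target) {
            return Err(format!("moving {} to {} is not a fast-forward", name, target));
        }
    }
    repo.set(name, target);
    Ok(())
}

/// Walk parent edges from `descendant` looking for `ancestor`.
fn is_ancestor(graph: &impl CommitGraph, ancestor: &str, descendant: &str) -> bool {
    let mut stack = vec![descendant.to_string()];
    let mut seen = HashSet::new();
    while let Some(cs) = stack.pop() {
        if cs == ancestor {
            return true;
        }
        if seen.insert(cs.clone()) {
            stack.extend(graph.parents(&cs));
        }
    }
    false
}

/// In-memory repository used only to exercise the feature.
#[derive(Default)]
pub struct MemRepo {
    pub parents: HashMap<String, Vec<String>>,
    pub bookmarks: HashMap<String, String>,
}

impl CommitGraph for MemRepo {
    fn parents(&self, cs: &str) -> Vec<String> {
        self.parents.get(cs).cloned().unwrap_or_default()
    }
}

impl Bookmarks for MemRepo {
    fn get(&self, name: &str) -> Option<String> {
        self.bookmarks.get(name).cloned()
    }
    fn set(&mut self, name: &str, target: &str) {
        self.bookmarks.insert(name.to_string(), target.to_string());
    }
}
```

Because the function holds no state of its own, the same code can be called from a server, a microservice, or a test harness, as long as the repository type supplies both facets.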
The API layer (mononoke_api/ and mononoke_api_hg/) provides high-level abstractions over the underlying implementation details. Not all applications use this layer - it's primarily used by servers and interactive tools.
It exposes objects like:
This layer hides whether operations use commit graph vs. segmented changelog, which manifest type is used, etc. It provides a stable interface that isolates protocol implementations from internal representation changes.
All Mononoke binaries use the cmdlib/mononoke_app/ framework, which provides:
This ensures all Mononoke applications have consistent behavior, configuration, and operational characteristics.
Different types of applications use different subsets of these layers:
Servers (Mononoke, Git, SCS):
Server Binary (cmdlib/mononoke_app)
↓
API Layer (mononoke_api)
↓
Features (pushrebase, hooks)
↓
Repository Facets
↓
Common Components
Tools (admin CLI, walker):
Tool Binary (cmdlib/mononoke_app)
↓
Features (or direct facet access)
↓
Repository Facets
↓
Common Components
Microservices (Land, Derived Data):
Service Binary (cmdlib/mononoke_app)
↓
Features
↓
Repository Facets
↓
Common Components
The key insight is that all applications share the same library codebase. A function implementing pushrebase can be used by the SLAPI server (via API layer), the land service (directly), and the admin tool (for testing).
At the heart of Mononoke is Bonsai, a VCS-agnostic data model that serves as the single source of truth. Bonsai is designed for:
Core Bonsai data includes:
Everything else in Mononoke is derived from or maps to this core data.
Mononoke's architecture separates the write path (committing new data) from the read path (serving queries), optimizing each independently.
When a commit is pushed:
Derived data is not computed on the write path. This keeps the critical section minimal and push latency low.
Most read operations require data not present in the minimal Bonsai format. For example:
These are provided by derived data - indexes and representations computed asynchronously from Bonsai changesets. Derived data is:
Major derived data types:
The derived data service can coordinate derivation across workers, deduplicating work and enabling horizontal scaling of derivation workload.
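The following sketch illustrates derivation from changesets with reuse of earlier results. It uses a toy derived type (a generation number equal to one plus the maximum over the parents) and a hypothetical `Deriver` that stores each result once; real derived data types, their storage, and the coordination between workers are considerably more involved.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a Bonsai changeset: only its parent ids.
pub struct Changeset {
    pub parents: Vec<u32>,
}

/// Hypothetical derived data store: each result is computed once and reused.
#[derive(Default)]
pub struct Deriver {
    derived: HashMap<u32, u64>,
    /// Number of derivations actually computed (not served from the store).
    pub computations: usize,
}

impl Deriver {
    /// Derive the toy value for `id`, deriving underived ancestors first.
    /// Iterative, so long histories do not overflow the call stack.
    /// Panics if `commits` is missing a referenced changeset.
    pub fn derive(&mut self, commits: &HashMap<u32, Changeset>, id: u32) -> u64 {
        let mut stack = vec![id];
        while let Some(&cs) = stack.last() {
            if self.derived.contains_key(&cs) {
                stack.pop();
                continue;
            }
            let parents = &commits[&cs].parents;
            let missing: Vec<u32> = parents
                .iter()
                .copied()
                .filter(|p| !self.derived.contains_key(p))
                .collect();
            if missing.is_empty() {
                // All parents are derived, so this changeset can be derived now.
                let generation = 1 + parents.iter().map(|p| self.derived[p]).max().unwrap_or(0);
                self.derived.insert(cs, generation);
                self.computations += 1;
                stack.pop();
            } else {
                // Derive the missing parents before coming back to `cs`.
                stack.extend(missing);
            }
        }
        self.derived[&id]
    }
}
```

Requesting a value that is already stored costs no further computation, which is the property that lets many readers share the work of one derivation.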
A request flows through the system as follows:
- repo_derived_data to get the appropriate manifest type
- repo_blobstore to fetch manifest and file content
- bonsai_hg_mapping (if needed) to resolve a Mercurial hash to Bonsai

Throughout this flow, caching (cachelib, memcache) reduces load on storage backends.
Mononoke uses two complementary storage systems:
The blobstore is a key-value store for immutable data:
Blobstore implementations:
Blobstore decorator pattern:
Mononoke uses decorators to layer functionality:
Application
↓
prefixblob (repo-specific key namespacing)
↓
cacheblob (memcache + cachelib caching)
↓
multiplexedblob (write-all, read-any across backends)
↓
packblob (compression for space efficiency)
↓
Backend storage (sqlblob/manifoldblob/s3blob)
Additional decorators include redaction (content removal), sampling (observability), throttling (rate limiting), and chaos/delay (testing).
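The decorator stack can be sketched with a small synchronous trait. This is an assumption-laden simplification: the `Blobstore` trait below, `MemBlob`, `PrefixBlob`, and `CacheBlob` are illustrative stand-ins, not the actual interfaces of prefixblob or cacheblob, and only two of the layers are shown.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Simplified, synchronous stand-in for a blobstore interface.
pub trait Blobstore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&self, key: &str, value: Vec<u8>);
}

/// In-memory backend standing in for a real storage backend.
#[derive(Default)]
pub struct MemBlob {
    pub data: RefCell<HashMap<String, Vec<u8>>>,
    /// Counts reads that reached the backend.
    pub reads: Cell<usize>,
}

impl Blobstore for MemBlob {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.reads.set(self.reads.get() + 1);
        self.data.borrow().get(key).cloned()
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.data.borrow_mut().insert(key.to_string(), value);
    }
}

/// Decorator: namespaces every key with a repo-specific prefix.
pub struct PrefixBlob<B> {
    pub prefix: String,
    pub inner: B,
}

impl<B: Blobstore> Blobstore for PrefixBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.inner.get(&format!("{}{}", self.prefix, key))
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.inner.put(&format!("{}{}", self.prefix, key), value)
    }
}

/// Decorator: remembers blobs it has read. Safe because blobs are immutable.
pub struct CacheBlob<B> {
    pub cache: RefCell<HashMap<String, Vec<u8>>>,
    pub inner: B,
}

impl<B: Blobstore> Blobstore for CacheBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let cached = self.cache.borrow().get(key).cloned();
        if cached.is_some() {
            return cached;
        }
        let fetched = self.inner.get(key)?;
        self.cache.borrow_mut().insert(key.to_string(), fetched.clone());
        Some(fetched)
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.inner.put(key, value)
    }
}
```

Because every layer implements the same trait, a stack such as `PrefixBlob { inner: CacheBlob { inner: MemBlob } }` can be reordered or extended without changing the code that calls `get` and `put`.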
Multiplexing for availability:
To avoid cyclic dependencies (storage systems need source control to deploy fixes), Mononoke writes to multiple independent blobstores simultaneously. If one backend is down, reads succeed from others. A healer job ensures consistency across backends.
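A minimal sketch of the write-all, read-any behaviour is shown below. The trait, the `FlakyBlob` test backend, and `MultiplexedBlob` are hypothetical. In particular, treating a write as successful when at least one backend accepts it is an assumption made for this example; the write policy actually used by multiplexedblob is not described here.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Simplified, synchronous blobstore interface with fallible operations.
pub trait Blobstore {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String>;
    fn put(&self, key: &str, value: &[u8]) -> Result<(), String>;
}

/// In-memory backend that can be switched "down" to simulate an outage.
#[derive(Default)]
pub struct FlakyBlob {
    data: RefCell<HashMap<String, Vec<u8>>>,
    pub down: Cell<bool>,
}

impl Blobstore for FlakyBlob {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        if self.down.get() {
            return Err("backend unavailable".to_string());
        }
        Ok(self.data.borrow().get(key).cloned())
    }
    fn put(&self, key: &str, value: &[u8]) -> Result<(), String> {
        if self.down.get() {
            return Err("backend unavailable".to_string());
        }
        self.data.borrow_mut().insert(key.to_string(), value.to_vec());
        Ok(())
    }
}

/// Writes go to every backend; reads are served by the first backend that has the blob.
pub struct MultiplexedBlob<B> {
    pub backends: Vec<B>,
}

impl<B: Blobstore> Blobstore for MultiplexedBlob<B> {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        let mut last_error = None;
        for backend in &self.backends {
            match backend.get(key) {
                Ok(Some(value)) => return Ok(Some(value)),
                Ok(None) => {}
                Err(e) => last_error = Some(e),
            }
        }
        // "Not found" is only trustworthy if every backend answered.
        match last_error {
            Some(e) => Err(e),
            None => Ok(None),
        }
    }

    fn put(&self, key: &str, value: &[u8]) -> Result<(), String> {
        let mut successes = 0;
        for backend in &self.backends {
            if backend.put(key, value).is_ok() {
                successes += 1;
            }
        }
        // Assumed policy for this sketch: one successful write is enough.
        if successes > 0 {
            Ok(())
        } else {
            Err("no backend accepted the write".to_string())
        }
    }
}
```

If a backend is down during a write, it ends up missing that blob once it recovers. Copying such blobs back is the kind of gap the healer job closes.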
A SQL database (MySQL in production, SQLite for development) stores mutable repository state and mappings:
The metadata DB is used for small, frequently changing data, while the blobstore handles large, immutable content.
Mononoke employs multi-level caching:
Cache configuration can be controlled via command-line flags (e.g., --cache-mode=local-only).
Several design choices shape Mononoke's implementation:
VCS-Agnostic Data Model - Mononoke uses Bonsai as its internal representation rather than Git or Mercurial formats. This requires conversion when serving clients but allows a single backend to support multiple version control systems. VCS mappings maintain bidirectional relationships between Bonsai and external commit identifiers.
Asynchronous Derivation - Derived data (manifests, blame, filenodes) is computed off the critical write path. When a commit is pushed, only the Bonsai changeset and file contents are written synchronously. Indexes are computed asynchronously by derivation workers. This reduces push latency but introduces eventual consistency for derived data.
Facet Composition - Repository functionality is provided through composable facets (traits) rather than monolithic objects. Each facet encapsulates a specific capability such as blobstore access, commit graph traversal, or bookmark management. Functions declare their requirements through trait bounds, making dependencies explicit.
Stateless Services - Servers maintain no persistent state. All repository data resides in external storage (blobstore and metadata database). This enables horizontal scaling and simplifies deployments but requires caching for performance.
Decorator-Based Storage - Blobstore functionality is composed through decorators that add caching, multiplexing, compression, and other capabilities. This allows flexible configuration of the storage stack at the cost of additional abstraction layers.
Microservice Delegation - Expensive operations (landing, large diffs, coordinated derivation) are delegated to dedicated microservices. This isolates resource-intensive work from lightweight read operations and enables serialization of operations that require mutual exclusion.
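To make the bidirectional VCS mappings from the VCS-Agnostic Data Model point concrete, here is a small in-memory sketch. The struct name and its methods are hypothetical, and plain strings stand in for changeset identifiers. The bonsai_hg_mapping facet stores this relationship in the metadata database rather than in process memory.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory mapping between Bonsai and Mercurial changeset ids.
#[derive(Default)]
pub struct BonsaiHgMapping {
    to_hg: HashMap<String, String>,
    to_bonsai: HashMap<String, String>,
}

impl BonsaiHgMapping {
    /// Record a pair, rejecting it if either side is already mapped elsewhere.
    pub fn add(&mut self, bonsai: &str, hg: &str) -> Result<(), String> {
        if let Some(existing) = self.to_hg.get(bonsai) {
            if existing.as_str() != hg {
                return Err(format!("{} is already mapped to {}", bonsai, existing));
            }
        }
        if let Some(existing) = self.to_bonsai.get(hg) {
            if existing.as_str() != bonsai {
                return Err(format!("{} is already mapped to {}", hg, existing));
            }
        }
        self.to_hg.insert(bonsai.to_string(), hg.to_string());
        self.to_bonsai.insert(hg.to_string(), bonsai.to_string());
        Ok(())
    }

    /// Used when serving a Mercurial client: internal id -> external id.
    pub fn hg_for(&self, bonsai: &str) -> Option<&str> {
        self.to_hg.get(bonsai).map(String::as_str)
    }

    /// Used when a client sends a Mercurial hash: external id -> internal id.
    pub fn bonsai_for(&self, hg: &str) -> Option<&str> {
        self.to_bonsai.get(hg).map(String::as_str)
    }
}
```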
Understanding Mononoke's architecture requires understanding both the system architecture (how services are composed) and the code architecture (how each application is structured internally).
At the system level, requests flow through the service tiers:
Read Request (e.g., fetch a file):
Client (Sapling)
→ Frontend Service (Mononoke Server)
→ Backend Storage (Blobstore + Metadata DB)
→ Response
Write Request (e.g., push commits):
Client (Sapling)
→ Frontend Service (Mononoke Server)
→ Microservice (Land Service)
→ Backend Storage (Blobstore + Metadata DB)
→ Microservice (Derived Data Service, async)
→ Response
The frontend services are stateless and can scale horizontally. Microservices handle expensive or serialized operations. Backend storage is shared by all services.
Within each application, dependencies flow downward through the library layers:
Application Binaries (servers, tools, jobs)
↓ use
API Layer (mononoke_api) - optional, for high-level abstractions
↓ uses
Features (pushrebase, cross_repo_sync, hooks)
↓ use
Repository Facets (repo_attributes/*)
↓ use
Common Components (blobstore, mononoke_types, common/*)
Lower layers have no knowledge of upper layers. This keeps dependencies clean and makes testing easier.
Understanding how data flows helps clarify the separation of concerns:
Push (Write Path):
Client → Protocol Server → Hooks → Bonsai Creation →
Blobstore Write → Metadata DB Update → Bookmark Move →
Async Derivation (off critical path)
Pull (Read Path):
Client → Protocol Server → API Layer →
Check Derived Data → Fetch from Blobstore (via cache) →
Convert to VCS Format → Return to Client
Background Derivation:
Derivation Worker → Check What Needs Derivation →
Fetch Dependencies → Compute Derived Data →
Store in Blobstore → Update Derivation Status
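The ordering of the push path can be sketched as follows. Everything here is a simplified assumption: `Server`, `Push`, `Hook`, and `reject_empty` are hypothetical, Bonsai creation and the metadata update are collapsed into plain map inserts, and the queue stands in for whatever hands work to background derivation. What the sketch preserves is the order of steps and the fact that the push returns before any derivation happens.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical push request: one changeset with its content and target bookmark.
pub struct Push {
    pub changeset_id: String,
    pub content: Vec<u8>,
    pub bookmark: String,
}

/// Hypothetical hook signature: inspect the push and accept or reject it.
pub type Hook = fn(&Push) -> Result<(), String>;

/// Example hook that rejects pushes with empty content.
pub fn reject_empty(push: &Push) -> Result<(), String> {
    if push.content.is_empty() {
        Err("empty content".to_string())
    } else {
        Ok(())
    }
}

#[derive(Default)]
pub struct Server {
    pub blobstore: HashMap<String, Vec<u8>>, // immutable blobs
    pub bookmarks: HashMap<String, String>,  // mutable metadata
    pub derivation_queue: VecDeque<String>,  // work for background derivation
}

impl Server {
    pub fn push(&mut self, push: Push, hooks: &[Hook]) -> Result<(), String> {
        // 1. Hooks run first, so a rejected push writes nothing.
        for hook in hooks {
            hook(&push)?;
        }
        // 2. Write the immutable data.
        self.blobstore.insert(push.changeset_id.clone(), push.content);
        // 3. Move the bookmark only after the data it points to is stored.
        self.bookmarks.insert(push.bookmark, push.changeset_id.clone());
        // 4. Queue derivation; nothing is derived before the push returns.
        self.derivation_queue.push_back(push.changeset_id);
        Ok(())
    }
}
```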
The Mononoke codebase is organized to reflect both architectures:
Library Layers (Code Architecture):
- common/ - Foundation components (utilities, async helpers)
- blobstore/ - Storage abstractions and implementations
- mononoke_types/ - Core data types (Bonsai, content IDs)
- repo_attributes/ - Repository facets (~35 composable capabilities)
- features/ - High-level operations (pushrebase, cross-repo sync, etc.)
- mononoke_api/ - API layer abstractions
- cmdlib/ - Application framework

Applications (System Architecture):

Frontend Services (in servers/):
- servers/slapi/slapi_server/ - Mononoke Server (SLAPI)
- servers/slapi/slapi_service/ - SLAPI service implementation
- servers/git/git_server/ - Git protocol server
- servers/lfs/lfs_server/ - LFS protocol server
- scs/ - SCS Thrift API server (not yet migrated to servers/)

Microservices:
- servers/land_service/ - Landing operations
- derived_data/remote/ - Derived data coordination

Tools and Jobs:
- tools/admin/ - Admin CLI
- tools/testtool/ - Testing utilities
- jobs/walker/ - Graph walker
- jobs/blobstore_healer/ - Storage healer

This organization makes it clear which directories contain reusable libraries (used by all applications) and which contain application-specific code (servers, tools, jobs).
This overview provides the big picture of Mononoke's architecture. To dive deeper:
Component-specific documentation lives in the respective directories alongside the code.