Architecture Overview

This document provides a high-level overview of Mononoke's architecture. After reading this, you should understand how Mononoke's major components fit together, how data flows through the system, and the key architectural decisions that shape the codebase.

Reading time: ~30 minutes

Introduction

Mononoke is a distributed source control server designed to scale along multiple dimensions: commit rate, repository size, file count, and branch count. Rather than being a monolithic application, Mononoke is structured as a collection of Rust libraries that implement source control functionality, which are then composed into various servers, services, and tools.

The architecture is designed around several core principles:

Stateless servers - All persistent state lives in external storage (blobstore, databases)
Horizontal scalability - Servers can be scaled independently to handle load
VCS independence - A canonical data model supports multiple version control systems
Separation of concerns - Write path optimized separately from read path
Modular composition - Repository functionality built from composable facets

System Architecture: Service Composition

Mononoke is designed as a distributed system composed of multiple services that work together.

Service Tiers

Mononoke is organized into three tiers:

1. Frontend Services (Stateless)

These services serve external clients and implement various protocols. They are stateless and can be horizontally scaled to handle client load.

Protocol Servers:

Mononoke Server (SLAPI, formerly EdenAPI) - Serves Sapling CLI and EdenFS clients using the SLAPI protocol over HTTP
Git Server - Serves Git clients using the Git protocol over HTTP
LFS Server - Serves Git LFS requests
SCS Server - Provides a Thrift API for programmatic access to repositories

All frontend services authenticate requests, translate between protocol-specific formats and Mononoke's internal Bonsai format, and route operations to the appropriate backend storage or microservices.

2. Internal Microservices

These services handle expensive or serialized operations that are offloaded from the main frontend servers. This prevents resource contention and enables dedicated scaling for specific workloads.

Microservices:

Land Service - Handles commit landing (merging) operations, serializing writes to public bookmarks to prevent conflicts
Derived Data Service - Coordinates asynchronous computation of derived data across multiple workers
Diff Service - Computes diffs for code review and other tools
Bookmark Service - Maintains warm caches of bookmark state

By offloading these operations to dedicated services, frontend servers remain responsive to lightweight operations like fetches and clones.

3. Backend Storage

All services (frontend and microservices) share common backend storage:

Storage Systems:

Blobstore - Immutable key-value storage for file contents, commit metadata, derived data, and manifests
Metadata Database - Mutable SQL database storing bookmarks, VCS mappings, commit graph index, and repository state

This shared storage enables stateless services and allows any server to handle any request for a given repository.

System Diagram

┌──────────────────────────────────────────────────────────────┐
│                      Client Layer                            │
│  Sapling CLI  │  EdenFS  │  Git Clients  │  Other Services   │
└──────────────────────────────────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────────┐
│              Frontend Services (Stateless)                   │
│                                                              │
│  Mononoke Server (SLAPI)  │  Git Server  │  LFS Server       │
│  SCS Server (Thrift)                                         │
└──────────────────────────────────────────────────────────────┘
                             ▼
             ┌───────────────┴───────────────┐
             ▼                               ▼
┌────────────────────────┐      ┌──────────────────────────────┐
│  Internal Microservices│      │    Backend Storage           │
│                        │      │                              │
│  Land Service          │◄────►│  Blobstore (immutable)       │
│  Derived Data Service  │      │  - File contents             │
│  Diff Service          │      │  - Commit metadata           │
│  Bookmark Service      │      │  - Derived data              │
│                        │      │                              │
│                        │      │  Metadata DB (mutable)       │
│                        │      │  - Bookmarks                 │
│                        │      │  - VCS mappings              │
│                        │      │  - Commit graph              │
└────────────────────────┘      └──────────────────────────────┘

Service Architecture Characteristics

This service-oriented architecture has the following characteristics:

Independent Scaling - Each tier can be scaled independently. Git servers can be added to increase capacity for Git operations, while derivation workers can be scaled separately.

Workload Isolation - Expensive operations (landing, derivation) run in dedicated services, isolating them from lightweight operations (fetches, file reads).

Sequential Coordination - Operations requiring mutual exclusion (e.g., landing to the same bookmark) are coordinated by routing through a single service instance.

Independent Deployment - Services can be deployed, updated, and monitored independently.

Resource Allocation - Different services have different resource profiles (CPU, memory, I/O) and can be provisioned accordingly.
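
The sequential-coordination point above can be illustrated with a minimal sketch. It assumes a hypothetical in-process `LandQueue` type that guards bookmark history with a mutex; the real land service is a separate service, and none of these names come from the Mononoke codebase.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the single service instance that owns
/// all writes to public bookmarks.
#[derive(Default)]
struct LandQueue {
    // Bookmark name -> commits landed on it, in landing order.
    bookmarks: Mutex<HashMap<String, Vec<u64>>>,
}

impl LandQueue {
    /// Lands `commit` on `bookmark` and returns the new history length.
    /// The mutex serializes concurrent callers, so no landing is lost.
    fn land(&self, bookmark: &str, commit: u64) -> usize {
        let mut guard = self.bookmarks.lock().unwrap();
        let history = guard.entry(bookmark.to_string()).or_default();
        history.push(commit);
        history.len()
    }
}

/// Lands `n` commits from `n` threads and returns the resulting history.
fn land_concurrently(n: u64) -> Vec<u64> {
    let queue = Arc::new(LandQueue::default());
    let handles: Vec<_> = (0..n)
        .map(|commit| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || queue.land("main", commit))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let guard = queue.bookmarks.lock().unwrap();
    guard.get("main").cloned().unwrap_or_default()
}
```

The landing order depends on thread scheduling, but every commit appears exactly once, which is the property that routing through one instance is meant to provide.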

Relationship to Code Architecture

The service architecture and the code architecture are orthogonal concerns:

Service architecture describes how Mononoke is deployed as a distributed system (frontend services, microservices, storage)
Code architecture describes how each service/tool is built internally using shared libraries (common → facets → features → API)

Every Mononoke binary - whether it's the SLAPI server, the land service, or the admin tool - uses the same internal layered structure. They all share the same repository implementation, blobstore abstractions, and core types. The difference is in what protocol they serve or what operations they perform.

Code Architecture: Internal Application Structure

While the system architecture describes how Mononoke services are composed, each individual Mononoke application (whether a server, microservice, tool, or job) is structured internally using a consistent layered design. This code architecture is implemented as a collection of Rust libraries that are composed together.

Understanding this internal structure is key to navigating the codebase and understanding how to build new tools or modify existing ones.

Application Layering

Every Mononoke application - whether it's the main SLAPI server, the admin CLI tool, or the walker job - is built using the same layered architecture. This consistent structure makes it easier to understand any part of the codebase once you understand the pattern.

The layers are:

┌──────────────────────────────────┐
│   Application Binary             │  (server, tool, or job main())
│   (uses cmdlib/mononoke_app)     │
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   API Layer (optional)           │  High-level abstractions
│   (mononoke_api)                 │  (Repo, Changeset, File objects)
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   Features                       │  High-level operations
│   (pushrebase, cross_repo_sync)  │  (combines multiple facets)
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   Repository Attributes          │  Composable capabilities
│   (repo_attributes/*)            │  (facets providing specific functionality)
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   Common Components              │  Foundation libraries
│   (blobstore, mononoke_types,    │  (storage, types, utilities)
│    common/*, cmdlib)             │
└──────────────────────────────────┘

Each layer is described below:

1. Common Components (Foundation Layer)

The foundation layer provides basic building blocks used by all Mononoke applications. These libraries know nothing about repositories or high-level version control concepts - they provide fundamental primitives.

Key components:

Context (context/) - Request context carrying permissions, logging, and tracing
Blobstore abstractions (blobstore/) - Key-value storage interface and implementations
Common utilities (common/) - Async utilities, SQL helpers, logging extensions
Core types (mononoke_types/) - Fundamental data structures (Bonsai changesets, file content)
Configuration (metaconfig/) - Repository configuration system
Permission checking (permission_checker/) - Access control primitives

These components are designed to be reusable and have minimal dependencies on each other.

2. Repository Attributes (Composition Layer)

Repository attributes are implemented as facets - trait-based components that provide specific capabilities. Each facet encapsulates a single responsibility and can contain state that forms part of the repository.

Core attribute categories:

Identity and Configuration:

repo_identity - Repository ID and name
repo_config - Repository configuration

Storage:

repo_blobstore - Access to immutable blob storage
filestore - File content storage with chunking

Commit Graph:

commit_graph - Efficient ancestry queries and traversal
commit_graph_writer - Write interface for graph updates

Derived Data:

repo_derived_data - Manager for all derived data types
repo_derivation_queues - Remote derivation coordination

VCS Integration:

bonsai_hg_mapping - Bonsai ↔ Mercurial changeset mappings
bonsai_git_mapping - Bonsai ↔ Git commit mappings
bonsai_globalrev_mapping - Sequential integer IDs
git_symbolic_refs - Git reference handling

Bookmarks and References:

bookmarks - Branch management (read/write)
bookmark_update_log - History of bookmark movements
bookmarks_cache - Cached bookmark data

Operations:

repo_permission_checker - Access control
hook_manager - Pre-commit hooks
repo_cross_repo - Cross-repository operations

Facets are located in repo_attributes/ and are composed together into complete repositories using the facet container pattern. This allows different operations to declare exactly which repository capabilities they need.
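
As a rough illustration of the facet container pattern, the sketch below hand-writes two containers over the same facets. The struct fields, the `...Ref` accessor traits, and the helper functions are assumptions made for this example; how Mononoke actually defines and generates its containers is not shown here.

```rust
use std::sync::Arc;

// Two hypothetical facets, each with a single responsibility.
struct RepoIdentity {
    id: u32,
    name: String,
}

struct Bookmarks {
    main: Option<String>,
}

// Accessor traits: code asks for a facet by bounding on its accessor.
trait RepoIdentityRef {
    fn repo_identity(&self) -> &RepoIdentity;
}

trait BookmarksRef {
    fn bookmarks(&self) -> &Bookmarks;
}

// A container holding every facet that one application needs.
struct Repo {
    repo_identity: Arc<RepoIdentity>,
    bookmarks: Arc<Bookmarks>,
}

impl RepoIdentityRef for Repo {
    fn repo_identity(&self) -> &RepoIdentity {
        self.repo_identity.as_ref()
    }
}

impl BookmarksRef for Repo {
    fn bookmarks(&self) -> &Bookmarks {
        self.bookmarks.as_ref()
    }
}

// A smaller container for a tool that only needs the identity facet.
struct IdentityOnlyRepo {
    repo_identity: Arc<RepoIdentity>,
}

impl RepoIdentityRef for IdentityOnlyRepo {
    fn repo_identity(&self) -> &RepoIdentity {
        self.repo_identity.as_ref()
    }
}

/// Works with any container that exposes the identity facet.
fn describe(repo: &impl RepoIdentityRef) -> String {
    let identity = repo.repo_identity();
    format!("{} (id {})", identity.name, identity.id)
}

/// Works only with containers that expose the bookmarks facet.
fn current_main(repo: &impl BookmarksRef) -> Option<&str> {
    repo.bookmarks().main.as_deref()
}
```

Passing an `IdentityOnlyRepo` to `current_main` fails to compile, which is how a missing capability surfaces at build time rather than at run time.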

3. Features (High-Level Operations)

Features combine multiple facets to implement source control operations. Unlike facets, features don't hold state - they orchestrate operations across repository attributes.

Key features (in features/):

Pushrebase (features/pushrebase/) - Server-side rebasing for linear history
Cross-repo sync (features/cross_repo_sync/) - Repository synchronization
Commit transformation (features/commit_transformation/) - Rewriting commits
Redaction (features/redaction/) - Content removal
Diff and history (features/diff/, features/history_traversal/) - Content comparison and traversal
Bookmark movement (bookmarks/bookmarks_movement/) - Safe bookmark updates
Microwave (features/microwave/) - Cache warming

Features are typically implemented as functions that accept trait bounds specifying the required facets.
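
A stateless feature function of this shape might look like the following sketch. The `is_fast_forward` check, the simplified `CommitGraph` and `Bookmarks` types, and the accessor traits are all hypothetical and exist only to show how trait bounds name the required facets.

```rust
use std::collections::{HashMap, HashSet};

type ChangesetId = u32;

// Simplified stand-ins for two facets.
struct CommitGraph {
    parents: HashMap<ChangesetId, Vec<ChangesetId>>,
}

impl CommitGraph {
    /// Depth-first walk over parents; a changeset counts as its own ancestor.
    fn is_ancestor(&self, ancestor: ChangesetId, descendant: ChangesetId) -> bool {
        let mut stack = vec![descendant];
        let mut seen = HashSet::new();
        while let Some(cs) = stack.pop() {
            if cs == ancestor {
                return true;
            }
            if seen.insert(cs) {
                if let Some(ps) = self.parents.get(&cs) {
                    stack.extend(ps.iter().copied());
                }
            }
        }
        false
    }
}

struct Bookmarks {
    positions: HashMap<String, ChangesetId>,
}

trait CommitGraphRef {
    fn commit_graph(&self) -> &CommitGraph;
}

trait BookmarksRef {
    fn bookmarks(&self) -> &Bookmarks;
}

/// A stateless feature: it owns nothing and only orchestrates the
/// facets named in its bounds.
fn is_fast_forward<R>(repo: &R, bookmark: &str, target: ChangesetId) -> bool
where
    R: CommitGraphRef + BookmarksRef,
{
    match repo.bookmarks().positions.get(bookmark) {
        Some(&current) => repo.commit_graph().is_ancestor(current, target),
        None => true, // Creating a new bookmark never rewinds anything.
    }
}

struct ExampleRepo {
    commit_graph: CommitGraph,
    bookmarks: Bookmarks,
}

impl CommitGraphRef for ExampleRepo {
    fn commit_graph(&self) -> &CommitGraph {
        &self.commit_graph
    }
}

impl BookmarksRef for ExampleRepo {
    fn bookmarks(&self) -> &Bookmarks {
        &self.bookmarks
    }
}

/// History: 1 <- 2 <- 3 on one line, 1 <- 4 on a diverging line; "main" is at 2.
fn example_repo() -> ExampleRepo {
    let parents = HashMap::from([(1, vec![]), (2, vec![1]), (3, vec![2]), (4, vec![1])]);
    let positions = HashMap::from([("main".to_string(), 2)]);
    ExampleRepo {
        commit_graph: CommitGraph { parents },
        bookmarks: Bookmarks { positions },
    }
}
```

Because the bounds list only `CommitGraphRef` and `BookmarksRef`, a test can supply a tiny container like `ExampleRepo` instead of a fully configured repository.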

4. API Layer (Optional)

The API layer (mononoke_api/ and mononoke_api_hg/) provides high-level abstractions over the underlying implementation details. Not all applications use this layer - it's primarily used by servers and interactive tools.

It exposes objects like:

Repo - Repository operations
Changeset - Commit metadata and operations
File - File content and history
Tree - Directory listings

This layer hides implementation choices, such as whether an operation uses the commit graph or the segmented changelog and which manifest type backs a query. It provides a stable interface that isolates protocol implementations from changes to the internal representation.

5. Application Framework (cmdlib)

All Mononoke binaries use the cmdlib/mononoke_app/ framework, which provides:

Argument parsing and configuration loading
Monitoring and observability setup
Graceful shutdown handling
Repository initialization
Common command-line options

This ensures all Mononoke applications have consistent behavior, configuration, and operational characteristics.

How Applications Use These Layers

Different types of applications use different subsets of these layers:

Servers (Mononoke, Git, SCS):

Server Binary (cmdlib/mononoke_app)
    ↓
API Layer (mononoke_api)
    ↓
Features (pushrebase, hooks)
    ↓
Repository Facets
    ↓
Common Components

Tools (admin CLI, walker):

Tool Binary (cmdlib/mononoke_app)
    ↓
Features (or direct facet access)
    ↓
Repository Facets
    ↓
Common Components

Microservices (Land, Derived Data):

Service Binary (cmdlib/mononoke_app)
    ↓
Features
    ↓
Repository Facets
    ↓
Common Components

The key insight is that all applications share the same library codebase. A function implementing pushrebase can be used by the SLAPI server (via API layer), the land service (directly), and the admin tool (for testing).

Data Model and Flow

The Bonsai Data Model

At the heart of Mononoke is Bonsai, a VCS-agnostic data model that serves as the single source of truth. Bonsai is designed for:

VCS independence - Not tied to Git or Mercurial internals
High throughput writes - Minimal metadata, optimized for fast commit ingestion
Content addressing - Blake2b hashing creates a Merkle DAG ensuring data integrity
Multi-VCS support - Can be converted to/from Git and Mercurial formats

Core Bonsai data includes:

Bonsai changesets - Commit metadata (parents, author, date, message, file changes)
File content blobs - The actual file contents, stored with content-based keys enabling deduplication

Everything else in Mononoke is derived from or maps to this core data.
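
The deduplication that content-based keys enable can be sketched as follows. Note the loud assumption: this example uses the standard library's `DefaultHasher` purely as a stand-in, not Blake2b, and the `content.` key prefix and `ContentStore` type are invented for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Derives a key from the bytes alone. DefaultHasher is a stand-in here;
/// it is NOT Blake2b and is not suitable for integrity checking.
fn content_key(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("content.{:016x}", hasher.finish())
}

#[derive(Default)]
struct ContentStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl ContentStore {
    /// Stores `bytes` under their content key. Identical content maps to
    /// the same key, so a second upload adds no new blob.
    fn put(&mut self, bytes: &[u8]) -> String {
        let key = content_key(bytes);
        self.blobs.entry(key.clone()).or_insert_with(|| bytes.to_vec());
        key
    }

    fn get(&self, key: &str) -> Option<&[u8]> {
        self.blobs.get(key).map(|v| v.as_slice())
    }
}
```

Two files with the same contents, even at different paths or in different commits, resolve to one stored blob.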

Write Path vs. Read Path

Mononoke's architecture separates the write path (committing new data) from the read path (serving queries), optimizing each independently.

Write Path: Core Data Only

When a commit is pushed:

Validate and transform - Client's commit (Git/Mercurial format) is validated
Create Bonsai changeset - Converted to canonical Bonsai format
Store in blobstore - Bonsai changeset and new file contents written
Update metadata DB - VCS mapping (Bonsai ↔ Git/Mercurial hash) stored
Update commit graph - Parent relationships indexed
Run hooks - Pre-commit policy enforcement
Move bookmark - Branch pointer updated
Push completes - Next commit can now proceed

Derived data is not computed on the write path. This keeps the critical section minimal and push latency low.
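
The ordering of the steps above, and the fact that derivation is only queued, can be sketched with in-memory stand-ins. Everything here is hypothetical: the `Server` and `IncomingCommit` types, the string-encoded changeset, and the empty-message hook are assumptions chosen to keep the example small.

```rust
use std::collections::{HashMap, VecDeque};

type ChangesetId = u64;

/// In-memory stand-ins for the storage touched on the write path.
#[derive(Default)]
struct Server {
    blobstore: HashMap<String, String>,
    vcs_mapping: HashMap<String, ChangesetId>,
    commit_graph: HashMap<ChangesetId, Vec<ChangesetId>>,
    bookmarks: HashMap<String, ChangesetId>,
    derivation_queue: VecDeque<ChangesetId>, // work left for async derivation
    next_id: ChangesetId,
}

struct IncomingCommit {
    external_hash: String,
    parents: Vec<ChangesetId>,
    message: String,
}

impl Server {
    fn push(&mut self, commit: &IncomingCommit, bookmark: &str) -> Result<ChangesetId, String> {
        // Validate: every parent must already be known.
        if commit.parents.iter().any(|p| !self.commit_graph.contains_key(p)) {
            return Err("unknown parent".to_string());
        }
        // Create the canonical changeset (a formatted string in this sketch).
        let id = self.next_id;
        self.next_id += 1;
        let bonsai = format!("parents={:?};message={}", commit.parents, commit.message);
        // Store in the blobstore.
        self.blobstore.insert(format!("changeset.{}", id), bonsai);
        // Record the VCS mapping.
        self.vcs_mapping.insert(commit.external_hash.clone(), id);
        // Index parent relationships.
        self.commit_graph.insert(id, commit.parents.clone());
        // Run hooks (hypothetical policy: reject empty messages).
        if commit.message.trim().is_empty() {
            return Err("hook rejected: empty commit message".to_string());
        }
        // Move the bookmark.
        self.bookmarks.insert(bookmark.to_string(), id);
        // Derivation is only enqueued; nothing is derived before returning.
        self.derivation_queue.push_back(id);
        Ok(id)
    }
}
```

A rejected push leaves the bookmark where it was, so the already-written blobs are unreachable from any branch.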

Read Path: Derived Data

Most read operations require data not present in the minimal Bonsai format. For example:

Mercurial clients need Mercurial manifests and filenodes
Git clients need Git trees and commits
Blame needs line-by-line author attribution
File listings need efficient directory traversal

These are provided by derived data - indexes and representations computed asynchronously from Bonsai changesets. Derived data is:

Computed off the critical path - After push completes
Stored in blobstore - Cached for future reads
Dependency-aware - Some types depend on others (blame depends on unodes)
Derivable in parallel - For independent changesets and types
Versionable - Can be redesigned and recomputed (backfilled)
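
The dependency-aware property can be sketched as a recursive derivation that handles prerequisites first. Only the blame-on-unodes dependency comes from the text above; the `dependencies` and `derive` functions and the string type names are assumptions for illustration.

```rust
use std::collections::HashSet;

/// Dependencies between derived data types. Only "blame depends on
/// unodes" is taken from the text; other types are treated as having none.
fn dependencies(kind: &str) -> &'static [&'static str] {
    match kind {
        "blame" => &["unodes"],
        _ => &[],
    }
}

/// Derives `kind` after its dependencies, skipping anything already derived.
/// `order` records the sequence in which types were actually computed.
fn derive(
    kind: &'static str,
    derived: &mut HashSet<&'static str>,
    order: &mut Vec<&'static str>,
) {
    if derived.contains(kind) {
        return;
    }
    for dep in dependencies(kind) {
        derive(*dep, derived, order);
    }
    derived.insert(kind);
    order.push(kind);
}
```

Requesting blame for a changeset therefore triggers unodes first, while a later request for unodes alone finds the work already done.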

Major derived data types:

Manifests - Directory structures (fsnodes, unodes, skeleton manifests, Git trees)
File metadata - Filenodes (Mercurial), blame, fastlog
Git-specific - Git commits, delta manifests
Utilities - Changeset info, deleted manifest, case conflict detection

The derived data service can coordinate derivation across workers, deduplicating work and enabling horizontal scaling of derivation workload.

Request Flow Example: Fetching a File

A request flows through the system as follows:

Client request - Sapling client requests file content via SLAPI
Protocol server - Mononoke server authenticates and translates request
API layer - Calls high-level file fetch API
Facets - API uses:
- repo_derived_data to get appropriate manifest type
- repo_blobstore to fetch manifest and file content
- bonsai_hg_mapping (if needed) to resolve Mercurial hash to Bonsai
Storage layer - Blobstore returns data (from cache or backend)
Response - Data flows back through layers to client

Throughout this flow, caching (cachelib, memcache) reduces load on storage backends.

Storage Architecture

Mononoke uses two complementary storage systems:

Immutable Blobstore

The blobstore is a key-value store for immutable data:

File contents - Actual file data, chunked for large files
Bonsai changesets - Commit metadata
Derived data - Manifests, blame, filenodes, etc.
VCS-specific formats - Original Git/Mercurial bytes

Blobstore implementations:

Manifoldblob - Primary production backend (Facebook-internal)
SQLblob - SQL database backend
S3blob - Amazon S3 backend
Fileblob - Filesystem (development/testing)
Memblob - In-memory (testing)

Blobstore decorator pattern:

Mononoke uses decorators to layer functionality:

Application
    ↓
prefixblob (repo-specific key namespacing)
    ↓
cacheblob (memcache + cachelib caching)
    ↓
multiplexedblob (write-all, read-any across backends)
    ↓
packblob (compression for space efficiency)
    ↓
Backend storage (sqlblob/manifoldblob/s3blob)

Additional decorators include redaction (content removal), sampling (observability), throttling (rate limiting), and chaos/delay (testing).
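
A minimal sketch of the decorator idea follows, with two layers over an in-memory backend. It assumes a simplified synchronous `Blobstore` trait and hypothetical `PrefixBlob`, `CacheBlob`, and `Memblob` types that borrow the names above but not their real implementations.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Simplified, synchronous interface assumed for this sketch.
trait Blobstore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&self, key: &str, value: Vec<u8>);
}

/// In-memory backend that counts how often it is read.
#[derive(Default)]
struct Memblob {
    data: RefCell<HashMap<String, Vec<u8>>>,
    reads: Cell<usize>,
}

impl Blobstore for Memblob {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.reads.set(self.reads.get() + 1);
        self.data.borrow().get(key).cloned()
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.data.borrow_mut().insert(key.to_string(), value);
    }
}

/// Decorator: namespaces every key with a repo-specific prefix.
struct PrefixBlob<B> {
    prefix: String,
    inner: B,
}

impl<B: Blobstore> Blobstore for PrefixBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.inner.get(&format!("{}{}", self.prefix, key))
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.inner.put(&format!("{}{}", self.prefix, key), value)
    }
}

/// Decorator: answers repeated reads from a local map.
struct CacheBlob<B> {
    cache: RefCell<HashMap<String, Vec<u8>>>,
    inner: B,
}

impl<B: Blobstore> Blobstore for CacheBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        let cached = self.cache.borrow().get(key).cloned();
        if let Some(hit) = cached {
            return Some(hit);
        }
        let value = self.inner.get(key)?;
        self.cache.borrow_mut().insert(key.to_string(), value.clone());
        Some(value)
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.inner.put(key, value.clone());
        self.cache.borrow_mut().insert(key.to_string(), value);
    }
}

/// Builds the stack: prefix -> cache -> backend.
fn build_stack() -> PrefixBlob<CacheBlob<Memblob>> {
    PrefixBlob {
        prefix: "repo0001.".to_string(),
        inner: CacheBlob {
            cache: RefCell::new(HashMap::new()),
            inner: Memblob::default(),
        },
    }
}
```

Each layer implements the same trait it wraps, so layers can be added, removed, or reordered without changing the calling code.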

Multiplexing for availability:

A storage system may itself need source control to deploy fixes, which creates a cyclic dependency if Mononoke relies on that system alone. To break the cycle, Mononoke writes each blob to multiple independent blobstores. If one backend is down, reads are served from the others. A healer job restores consistency across backends.
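
The write-all, read-any behavior and the healer's role can be sketched as below. The `Backend` and `Multiplexed` types are hypothetical, and the rule that a write succeeds when at least one backend accepts it is an assumption of this sketch, not a statement about Mononoke's actual write policy.

```rust
use std::collections::HashMap;

struct Backend {
    available: bool,
    data: HashMap<String, Vec<u8>>,
}

impl Backend {
    fn new() -> Self {
        Backend { available: true, data: HashMap::new() }
    }
}

struct Multiplexed {
    backends: Vec<Backend>,
}

impl Multiplexed {
    /// Write-all: attempts every available backend and returns how many
    /// accepted the write (assumed policy: at least one must succeed).
    fn put(&mut self, key: &str, value: &[u8]) -> Result<usize, String> {
        let mut written = 0;
        for backend in self.backends.iter_mut().filter(|b| b.available) {
            backend.data.insert(key.to_string(), value.to_vec());
            written += 1;
        }
        if written == 0 {
            Err("no backend available".to_string())
        } else {
            Ok(written)
        }
    }

    /// Read-any: the first available backend holding the key answers.
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.backends
            .iter()
            .filter(|b| b.available)
            .find_map(|b| b.data.get(key).cloned())
    }

    /// Healer pass: copies blobs missing from any available backend and
    /// returns the number of copies made.
    fn heal(&mut self) -> usize {
        let mut all: HashMap<String, Vec<u8>> = HashMap::new();
        for backend in self.backends.iter().filter(|b| b.available) {
            for (k, v) in &backend.data {
                all.entry(k.clone()).or_insert_with(|| v.clone());
            }
        }
        let mut repaired = 0;
        for backend in self.backends.iter_mut().filter(|b| b.available) {
            for (k, v) in &all {
                if !backend.data.contains_key(k) {
                    backend.data.insert(k.clone(), v.clone());
                    repaired += 1;
                }
            }
        }
        repaired
    }
}
```

Because blobs are immutable, healing only ever copies missing keys; it never has to reconcile two different values for the same key.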

Mutable Metadata Database

A SQL database (MySQL in production, SQLite for development) stores mutable repository state and mappings:

Bookmarks - Current branch positions
VCS mappings - Bonsai ↔ Git/Mercurial/SVN/Globalrev hash mappings
Commit graph index - Parent/child relationships for fast queries
Phases - Draft vs. public commit status
Bookmark update log - Audit trail of all bookmark movements
Counters - Mutable counters for sync operations

The metadata DB is used for small, frequently changing data, while the blobstore handles large, immutable content.

Caching Strategy

Mononoke employs multi-level caching:

In-memory cache (Cachelib) - Per-server process cache
Shared cache (Memcache) - Cross-server cache within a region
Warm bookmark cache - Precomputed state for important branches

Cache configuration can be controlled via command-line flags (e.g., --cache-mode=local-only).
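
The lookup order across cache levels can be sketched as a read-through chain. The `TieredReader` type and `Source` enum are hypothetical; plain maps play the roles of the per-process cache, the shared cache, and the storage backend.

```rust
use std::collections::HashMap;

/// Which level answered a read.
#[derive(Debug, PartialEq)]
enum Source {
    Local,
    Shared,
    Backend,
}

#[derive(Default)]
struct TieredReader {
    local: HashMap<String, Vec<u8>>,   // per-process cache
    shared: HashMap<String, Vec<u8>>,  // cross-server cache
    backend: HashMap<String, Vec<u8>>, // storage backend
}

impl TieredReader {
    /// Checks the cheapest level first and fills the faster levels on the
    /// way back, so later reads stop earlier.
    fn get(&mut self, key: &str) -> Option<(Vec<u8>, Source)> {
        if let Some(v) = self.local.get(key) {
            return Some((v.clone(), Source::Local));
        }
        if let Some(v) = self.shared.get(key).cloned() {
            self.local.insert(key.to_string(), v.clone());
            return Some((v, Source::Shared));
        }
        let v = self.backend.get(key).cloned()?;
        self.shared.insert(key.to_string(), v.clone());
        self.local.insert(key.to_string(), v.clone());
        Some((v, Source::Backend))
    }
}
```

Clearing `local` models a different server process handling the next request: it misses its own cache but still avoids the backend through the shared level.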

Architectural Characteristics

Several design choices shape Mononoke's implementation:

VCS-Agnostic Data Model - Mononoke uses Bonsai as its internal representation rather than Git or Mercurial formats. This requires conversion when serving clients but allows a single backend to support multiple version control systems. VCS mappings maintain bidirectional relationships between Bonsai and external commit identifiers.

Asynchronous Derivation - Derived data (manifests, blame, filenodes) is computed off the critical write path. When a commit is pushed, only the Bonsai changeset and file contents are written synchronously. Indexes are computed asynchronously by derivation workers. This reduces push latency but introduces eventual consistency for derived data.

Facet Composition - Repository functionality is provided through composable facets (traits) rather than monolithic objects. Each facet encapsulates a specific capability such as blobstore access, commit graph traversal, or bookmark management. Functions declare their requirements through trait bounds, making dependencies explicit.

Stateless Services - Servers maintain no persistent state. All repository data resides in external storage (blobstore and metadata database). This enables horizontal scaling and simplifies deployments but requires caching for performance.

Decorator-Based Storage - Blobstore functionality is composed through decorators that add caching, multiplexing, compression, and other capabilities. This allows flexible configuration of the storage stack at the cost of additional abstraction layers.

Microservice Delegation - Expensive operations (landing, large diffs, coordinated derivation) are delegated to dedicated microservices. This isolates resource-intensive work from lightweight read operations and enables serialization of operations that require mutual exclusion.

Putting It All Together

Understanding Mononoke's architecture requires understanding both the system architecture (how services are composed) and the code architecture (how each application is structured internally).

Service Composition

At the system level, requests flow through the service tiers:

Read Request (e.g., fetch a file):

Client (Sapling)
    → Frontend Service (Mononoke Server)
        → Backend Storage (Blobstore + Metadata DB)
            → Response

Write Request (e.g., push commits):

Client (Sapling)
    → Frontend Service (Mononoke Server)
        → Microservice (Land Service)
            → Backend Storage (Blobstore + Metadata DB)
                → Microservice (Derived Data Service, async)
                    → Response

The frontend services are stateless and can scale horizontally. Microservices handle expensive or serialized operations. Backend storage is shared by all services.

Code Layer Dependencies

Within each application, dependencies flow downward through the library layers:

Application Binaries (servers, tools, jobs)
    ↓ use
API Layer (mononoke_api) - optional, for high-level abstractions
    ↓ uses
Features (pushrebase, cross_repo_sync, hooks)
    ↓ use
Repository Facets (repo_attributes/*)
    ↓ use
Common Components (blobstore, mononoke_types, common/*)

Lower layers have no knowledge of upper layers. This keeps dependencies clean and makes testing easier.

Data Flow Patterns

Understanding how data flows helps clarify the separation of concerns:

Push (Write Path):

Client → Protocol Server → Bonsai Creation → Blobstore Write →
    Metadata DB Update → Hooks → Bookmark Move →
        Async Derivation (off critical path)

Pull (Read Path):

Client → Protocol Server → API Layer →
    Check Derived Data → Fetch from Blobstore (via cache) →
        Convert to VCS Format → Return to Client

Background Derivation:

Derivation Worker → Check What Needs Derivation →
    Fetch Dependencies → Compute Derived Data →
        Store in Blobstore → Update Derivation Status

Directory Organization

The Mononoke codebase is organized to reflect both architectures:

Library Layers (Code Architecture):

common/ - Foundation components (utilities, async helpers)
blobstore/ - Storage abstractions and implementations
mononoke_types/ - Core data types (Bonsai, content IDs)
repo_attributes/ - Repository facets (~35 composable capabilities)
features/ - High-level operations (pushrebase, cross-repo sync, etc.)
mononoke_api/ - API layer abstractions
cmdlib/ - Application framework

Applications (System Architecture):

Frontend Services (in servers/):

servers/slapi/slapi_server/ - Mononoke Server (SLAPI)
servers/slapi/slapi_service/ - SLAPI service implementation
servers/git/git_server/ - Git protocol server
servers/lfs/lfs_server/ - LFS protocol server
scs/ - SCS Thrift API server (not yet migrated to servers/)

Microservices:

servers/land_service/ - Landing operations
derived_data/remote/ - Derived data coordination
Diff service, bookmark service (various locations)

Tools and Jobs:

tools/admin/ - Admin CLI
tools/testtool/ - Testing utilities
jobs/walker/ - Graph walker
jobs/blobstore_healer/ - Storage healer

This organization makes it clear which directories contain reusable libraries (used by all applications) and which contain application-specific code (servers, tools, jobs).

Next Steps

This overview provides the big picture of Mononoke's architecture. To dive deeper:

Key Concepts - Essential terminology and concepts
Navigating the Codebase - Finding your way around approximately 55-60 directories
Bonsai Data Model - Deep dive into the core data model
Repository Facets - Understanding the facet pattern
Derived Data - How indexes are computed and managed
Storage Architecture - Blobstore and database details

Component-specific documentation lives in the respective directories alongside the code.