Architecture Overview

eden/mononoke/docs/1.3-architecture-overview.md

This document provides a high-level overview of Mononoke's architecture. After reading this, you should understand how Mononoke's major components fit together, how data flows through the system, and the key architectural decisions that shape the codebase.

Reading time: ~30 minutes

Introduction

Mononoke is a distributed source control server designed to scale along multiple dimensions: commit rate, repository size, file count, and branch count. Rather than being a monolithic application, Mononoke is structured as a collection of Rust libraries that implement source control functionality, which are then composed into various servers, services, and tools.

The architecture is designed around several core principles:

  1. Stateless servers - All persistent state lives in external storage (blobstore, databases)
  2. Horizontal scalability - Servers can be scaled independently to handle load
  3. VCS independence - A canonical data model supports multiple version control systems
  4. Separation of concerns - Write path optimized separately from read path
  5. Modular composition - Repository functionality built from composable facets

System Architecture: Service Composition

Mononoke is designed as a distributed system composed of multiple services that work together.

Service Tiers

Mononoke is organized into three tiers:

1. Frontend Services (Stateless)

These services serve external clients and implement various protocols. They are stateless and can be horizontally scaled to handle client load.

Protocol Servers:

  • Mononoke Server (SLAPI, formerly EdenAPI) - Serves Sapling CLI and EdenFS clients using the SLAPI protocol over HTTP
  • Git Server - Serves Git clients using the Git protocol over HTTP
  • LFS Server - Serves Git LFS requests
  • SCS Server - Provides a Thrift API for programmatic access to repositories

All frontend services authenticate requests, translate between protocol-specific formats and Mononoke's internal Bonsai format, and route operations to the appropriate backend storage or microservices.

2. Internal Microservices

These services handle expensive or serialized operations that are offloaded from the main frontend servers. This prevents resource contention and enables dedicated scaling for specific workloads.

Microservices:

  • Land Service - Handles commit landing (merging) operations, serializing writes to public bookmarks to prevent conflicts
  • Derived Data Service - Coordinates asynchronous computation of derived data across multiple workers
  • Diff Service - Computes diffs for code review and other tools
  • Bookmark Service - Maintains warm caches of bookmark state

By offloading these operations to dedicated services, frontend servers remain responsive to lightweight operations like fetches and clones.

3. Backend Storage

All services (frontend and microservices) share common backend storage:

Storage Systems:

  • Blobstore - Immutable key-value storage for file contents, commit metadata, derived data, and manifests
  • Metadata Database - Mutable SQL database storing bookmarks, VCS mappings, commit graph index, and repository state

This shared storage enables stateless services and allows any server to handle any request for a given repository.
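To make the stateless idea concrete, here is a minimal Rust sketch. `SharedStorage` and `FrontendServer` are hypothetical names, and a single in-memory map stands in for both the blobstore and the metadata database. Two server instances hold nothing but a handle to the same storage, so a write handled by one is immediately readable through the other.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the shared backend (blobstore + metadata DB).
#[derive(Default)]
pub struct SharedStorage {
    blobs: RwLock<HashMap<String, Vec<u8>>>,
}

impl SharedStorage {
    pub fn put(&self, key: &str, value: &[u8]) {
        self.blobs.write().unwrap().insert(key.to_string(), value.to_vec());
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.blobs.read().unwrap().get(key).cloned()
    }
}

/// A frontend keeps no repository state of its own: only a handle to storage.
pub struct FrontendServer {
    pub name: &'static str,
    storage: Arc<SharedStorage>,
}

impl FrontendServer {
    pub fn new(name: &'static str, storage: Arc<SharedStorage>) -> Self {
        Self { name, storage }
    }

    pub fn handle_write(&self, key: &str, value: &[u8]) {
        self.storage.put(key, value);
    }

    pub fn handle_read(&self, key: &str) -> Option<Vec<u8>> {
        self.storage.get(key)
    }
}
```

Adding capacity in this model means starting another `FrontendServer` pointed at the same storage; no state has to be migrated between instances.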

System Diagram

┌──────────────────────────────────────────────────────────────┐
│                      Client Layer                            │
│  Sapling CLI  │  EdenFS  │  Git Clients  │  Other Services   │
└──────────────────────────────────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────────┐
│              Frontend Services (Stateless)                   │
│                                                              │
│  Mononoke Server (SLAPI)  │  Git Server  │  LFS Server       │
│  SCS Server (Thrift)                                         │
└──────────────────────────────────────────────────────────────┘
                             ▼
             ┌───────────────┴───────────────┐
             ▼                               ▼
┌────────────────────────┐      ┌──────────────────────────────┐
│  Internal Microservices│      │    Backend Storage           │
│                        │      │                              │
│  Land Service          │◄────►│  Blobstore (immutable)       │
│  Derived Data Service  │      │  - File contents             │
│  Diff Service          │      │  - Commit metadata           │
│  Bookmark Service      │      │  - Derived data              │
│                        │      │                              │
│                        │      │  Metadata DB (mutable)       │
│                        │      │  - Bookmarks                 │
│                        │      │  - VCS mappings              │
│                        │      │  - Commit graph              │
└────────────────────────┘      └──────────────────────────────┘

Service Architecture Characteristics

This service-oriented architecture has the following characteristics:

Independent Scaling - Each tier can be scaled independently. Git servers can be added to increase capacity for Git operations, while derivation workers can be scaled separately.

Workload Isolation - Expensive operations (landing, derivation) run in dedicated services, isolating them from lightweight operations (fetches, file reads).

Sequential Coordination - Operations requiring mutual exclusion (e.g., landing to the same bookmark) are coordinated by routing through a single service instance.
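The sketch below models sequential coordination with a hypothetical `LandCoordinator`. Because every land for a bookmark goes through one instance, a lock can make each read-modify-write of the bookmark atomic, so concurrent lands stack into a linear history instead of overwriting each other. Real landing also rebases file changes and writes to shared storage; none of that is modeled here.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical land coordinator: all lands are routed through one
/// instance, and the mutex serializes updates to each bookmark.
#[derive(Default)]
pub struct LandCoordinator {
    // bookmark name -> linear list of landed commit ids (oldest first)
    histories: Mutex<HashMap<String, Vec<String>>>,
}

impl LandCoordinator {
    /// Lands `commit` on top of the current head of `bookmark` and returns
    /// the head it was placed onto (None for the first land).
    pub fn land(&self, bookmark: &str, commit: &str) -> Option<String> {
        let mut histories = self.histories.lock().unwrap();
        let history = histories.entry(bookmark.to_string()).or_default();
        let parent = history.last().cloned();
        history.push(commit.to_string());
        parent
    }

    pub fn history(&self, bookmark: &str) -> Vec<String> {
        self.histories
            .lock()
            .unwrap()
            .get(bookmark)
            .cloned()
            .unwrap_or_default()
    }
}
```

Without the lock, two lands could read the same head and one of them would silently be lost; routing through a single instance is what makes the lock sufficient.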

Independent Deployment - Services can be deployed, updated, and monitored independently.

Resource Allocation - Different services have different resource profiles (CPU, memory, I/O) and can be provisioned accordingly.

Relationship to Code Architecture

The service architecture and the code architecture are orthogonal concerns:

  • Service architecture describes how Mononoke is deployed as a distributed system (frontend services, microservices, storage)
  • Code architecture describes how each service/tool is built internally using shared libraries (common → facets → features → API)

Every Mononoke binary - whether it's the SLAPI server, the land service, or the admin tool - uses the same internal layered structure. They all share the same repository implementation, blobstore abstractions, and core types. The difference is in what protocol they serve or what operations they perform.

Code Architecture: Internal Application Structure

While the system architecture describes how Mononoke services are composed, each individual Mononoke application (whether a server, microservice, tool, or job) is structured internally using a consistent layered design. This code architecture is implemented as a collection of Rust libraries that are composed together.

Understanding this internal structure is key to navigating the codebase and understanding how to build new tools or modify existing ones.

Application Layering

Every Mononoke application - whether it's the main SLAPI server, the admin CLI tool, or the walker job - is built using the same layered architecture. This consistent structure makes it easier to understand any part of the codebase once you understand the pattern.

The layers are:

┌──────────────────────────────────┐
│   Application Binary             │  (server, tool, or job main())
│   (uses cmdlib/mononoke_app)     │
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   API Layer (optional)           │  High-level abstractions
│   (mononoke_api)                 │  (Repo, Changeset, File objects)
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   Features                       │  High-level operations
│   (pushrebase, cross_repo_sync)  │  (combines multiple facets)
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   Repository Attributes          │  Composable capabilities
│   (repo_attributes/*)            │  (facets providing specific functionality)
└──────────────────────────────────┘
               ▼
┌──────────────────────────────────┐
│   Common Components              │  Foundation libraries
│   (blobstore, mononoke_types,    │  (storage, types, utilities)
│    common/*, cmdlib)             │
└──────────────────────────────────┘

Each layer is described below:

1. Common Components (Foundation Layer)

The foundation layer provides basic building blocks used by all Mononoke applications. These libraries know nothing about repositories or high-level version control concepts - they provide fundamental primitives.

Key components:

  • Context (context/) - Request context carrying permissions, logging, and tracing
  • Blobstore abstractions (blobstore/) - Key-value storage interface and implementations
  • Common utilities (common/) - Async utilities, SQL helpers, logging extensions
  • Core types (mononoke_types/) - Fundamental data structures (Bonsai changesets, file content)
  • Configuration (metaconfig/) - Repository configuration system
  • Permission checking (permission_checker/) - Access control primitives

These components are designed to be reusable and have minimal dependencies on each other.

2. Repository Attributes (Composition Layer)

Repository attributes are implemented as facets - trait-based components that provide specific capabilities. Each facet encapsulates a single responsibility and can contain state that forms part of the repository.

Core attribute categories:

Identity and Configuration:

  • repo_identity - Repository ID and name
  • repo_config - Repository configuration

Storage:

  • repo_blobstore - Access to immutable blob storage
  • filestore - File content storage with chunking

Commit Graph:

  • commit_graph - Efficient ancestry queries and traversal
  • commit_graph_writer - Write interface for graph updates

Derived Data:

  • repo_derived_data - Manager for all derived data types
  • repo_derivation_queues - Remote derivation coordination

VCS Integration:

  • bonsai_hg_mapping - Bonsai ↔ Mercurial changeset mappings
  • bonsai_git_mapping - Bonsai ↔ Git commit mappings
  • bonsai_globalrev_mapping - Bonsai ↔ globalrev mappings (sequential integer commit IDs)
  • git_symbolic_refs - Git reference handling

Bookmarks and References:

  • bookmarks - Branch management (read/write)
  • bookmark_update_log - History of bookmark movements
  • bookmarks_cache - Cached bookmark data

Operations:

  • repo_permission_checker - Access control
  • hook_manager - Pre-commit hooks
  • repo_cross_repo - Cross-repository operations

Facets are located in repo_attributes/ and are composed together into complete repositories using the facet container pattern. This allows different operations to declare exactly which repository capabilities they need.
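A minimal sketch of the facet container idea, assuming simplified synchronous facets. `RepoIdentity`, `Bookmarks`, the `...Ref` accessor traits, and `Repo` are written by hand here for illustration; the codebase's real facets carry more state and are wired together by its own tooling. The function at the end declares only the two facets it uses.

```rust
use std::collections::HashMap;

/// Each facet owns one capability (simplified for this sketch).
pub struct RepoIdentity {
    pub id: u32,
    pub name: String,
}

pub struct Bookmarks {
    pub positions: HashMap<String, String>,
}

/// Accessor traits let code ask for "something with facet X"
/// without naming a concrete repository type.
pub trait RepoIdentityRef {
    fn repo_identity(&self) -> &RepoIdentity;
}

pub trait BookmarksRef {
    fn bookmarks(&self) -> &Bookmarks;
}

/// A complete repository is a container bundling several facets.
pub struct Repo {
    identity: RepoIdentity,
    bookmarks: Bookmarks,
}

impl Repo {
    pub fn new(id: u32, name: &str, positions: HashMap<String, String>) -> Self {
        Repo {
            identity: RepoIdentity { id, name: name.to_string() },
            bookmarks: Bookmarks { positions },
        }
    }
}

impl RepoIdentityRef for Repo {
    fn repo_identity(&self) -> &RepoIdentity {
        &self.identity
    }
}

impl BookmarksRef for Repo {
    fn bookmarks(&self) -> &Bookmarks {
        &self.bookmarks
    }
}

/// Needs identity and bookmarks only; any container providing both works.
pub fn describe_bookmark(repo: &(impl RepoIdentityRef + BookmarksRef), name: &str) -> Option<String> {
    let target = repo.bookmarks().positions.get(name)?;
    Some(format!("{}:{} -> {}", repo.repo_identity().name, name, target))
}
```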

3. Features (High-Level Operations)

Features combine multiple facets to implement source control operations. Unlike facets, features don't hold state - they orchestrate operations across repository attributes.

Key features (in features/):

  • Pushrebase (features/pushrebase/) - Server-side rebasing for linear history
  • Cross-repo sync (features/cross_repo_sync/) - Repository synchronization
  • Commit transformation (features/commit_transformation/) - Rewriting commits
  • Redaction (features/redaction/) - Content removal
  • Diff and history (features/diff/, features/history_traversal/) - Content comparison and traversal
  • Bookmark movement (bookmarks/bookmarks_movement/) - Safe bookmark updates
  • Microwave (features/microwave/) - Cache warming

Features are typically implemented as functions that accept trait bounds specifying the required facets.
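As an illustration, the hypothetical feature below checks whether moving a bookmark would be a fast-forward. It stores nothing itself and names its requirements (`BookmarksRef + CommitGraphRef`) as trait bounds. The facet types are simplified, synchronous assumptions rather than Mononoke's actual definitions.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified commit graph facet: child -> parents.
pub struct CommitGraph {
    pub parents: HashMap<String, Vec<String>>,
}

impl CommitGraph {
    /// True if `ancestor` is reachable from `descendant` via parent edges
    /// (a commit counts as its own ancestor).
    pub fn is_ancestor(&self, ancestor: &str, descendant: &str) -> bool {
        let mut stack = vec![descendant.to_string()];
        let mut seen = HashSet::new();
        while let Some(cs) = stack.pop() {
            if cs == ancestor {
                return true;
            }
            if !seen.insert(cs.clone()) {
                continue;
            }
            if let Some(ps) = self.parents.get(&cs) {
                stack.extend(ps.iter().cloned());
            }
        }
        false
    }
}

pub struct Bookmarks {
    pub positions: HashMap<String, String>,
}

pub trait CommitGraphRef {
    fn commit_graph(&self) -> &CommitGraph;
}

pub trait BookmarksRef {
    fn bookmarks(&self) -> &Bookmarks;
}

/// A stateless "feature" that orchestrates the facets named in its bounds.
pub fn is_fast_forward<R>(repo: &R, bookmark: &str, new_target: &str) -> bool
where
    R: BookmarksRef + CommitGraphRef,
{
    match repo.bookmarks().positions.get(bookmark) {
        None => true, // a new bookmark may point anywhere
        Some(current) => repo.commit_graph().is_ancestor(current, new_target),
    }
}

/// A test repository providing just the two facets the feature needs.
pub struct TestRepo {
    pub graph: CommitGraph,
    pub bookmarks: Bookmarks,
}

impl CommitGraphRef for TestRepo {
    fn commit_graph(&self) -> &CommitGraph {
        &self.graph
    }
}

impl BookmarksRef for TestRepo {
    fn bookmarks(&self) -> &Bookmarks {
        &self.bookmarks
    }
}
```

Because the bounds name only two facets, a test can pass a small struct such as `TestRepo` instead of a fully populated repository.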

4. API Layer (Optional)

The API layer (mononoke_api/ and mononoke_api_hg/) provides high-level abstractions over the underlying implementation details. Not all applications use this layer - it's primarily used by servers and interactive tools.

It exposes objects like:

  • Repo - Repository operations
  • Changeset - Commit metadata and operations
  • File - File content and history
  • Tree - Directory listings

This layer hides whether operations use commit graph vs. segmented changelog, which manifest type is used, etc. It provides a stable interface that isolates protocol implementations from internal representation changes.

5. Application Framework (cmdlib)

All Mononoke binaries use the cmdlib/mononoke_app/ framework, which provides:

  • Argument parsing and configuration loading
  • Monitoring and observability setup
  • Graceful shutdown handling
  • Repository initialization
  • Common command-line options

This ensures all Mononoke applications have consistent behavior, configuration, and operational characteristics.

How Applications Use These Layers

Different types of applications use different subsets of these layers:

Servers (Mononoke, Git, SCS):

Server Binary (cmdlib/mononoke_app)
    ↓
API Layer (mononoke_api)
    ↓
Features (pushrebase, hooks)
    ↓
Repository Facets
    ↓
Common Components

Tools (admin CLI, walker):

Tool Binary (cmdlib/mononoke_app)
    ↓
Features (or direct facet access)
    ↓
Repository Facets
    ↓
Common Components

Microservices (Land, Derived Data):

Service Binary (cmdlib/mononoke_app)
    ↓
Features
    ↓
Repository Facets
    ↓
Common Components

The key insight is that all applications share the same library codebase. A function implementing pushrebase can be used by the SLAPI server (via API layer), the land service (directly), and the admin tool (for testing).

Data Model and Flow

The Bonsai Data Model

At the heart of Mononoke is Bonsai, a VCS-agnostic data model that serves as the single source of truth. Bonsai is designed for:

  • VCS independence - Not tied to Git or Mercurial internals
  • High throughput writes - Minimal metadata, optimized for fast commit ingestion
  • Content addressing - Blake2b hashing creates a Merkle DAG ensuring data integrity
  • Multi-VCS support - Can be converted to/from Git and Mercurial formats

Core Bonsai data includes:

  • Bonsai changesets - Commit metadata (parents, author, date, message, file changes)
  • File content blobs - The actual file contents, stored with content-based keys enabling deduplication

Everything else in Mononoke is derived from or maps to this core data.
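The sketch below shows how content addressing produces a Merkle DAG: a changeset's id covers its parents' ids, so amending any ancestor changes the id of every descendant, and identical file bytes always map to the same content key. Real Bonsai ids are Blake2b hashes of a serialized changeset; this sketch substitutes `std`'s `DefaultHasher`, which is neither cryptographic nor stable across Rust releases, and the struct fields are a simplified assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Content key of a file blob (DefaultHasher stands in for Blake2b).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ContentId(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct ChangesetId(u64);

pub fn content_id(bytes: &[u8]) -> ContentId {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    ContentId(h.finish())
}

/// Simplified Bonsai-like changeset with the fields listed above.
#[derive(Hash)]
pub struct Changeset {
    pub parents: Vec<ChangesetId>,
    pub author: String,
    pub date: i64,
    pub message: String,
    /// path -> new content, or None for a deletion (BTreeMap keeps order stable)
    pub file_changes: BTreeMap<String, Option<ContentId>>,
}

impl Changeset {
    /// The id hashes every field, including parent ids: the Merkle DAG property.
    pub fn id(&self) -> ChangesetId {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        ChangesetId(h.finish())
    }
}
```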

Write Path vs. Read Path

Mononoke's architecture separates the write path (committing new data) from the read path (serving queries), optimizing each independently.

Write Path: Core Data Only

When a commit is pushed:

  1. Validate and transform - Client's commit (Git/Mercurial format) is validated
  2. Create Bonsai changeset - Converted to canonical Bonsai format
  3. Store in blobstore - Bonsai changeset and new file contents written
  4. Update metadata DB - VCS mapping (Bonsai ↔ Git/Mercurial hash) stored
  5. Update commit graph - Parent relationships indexed
  6. Run hooks - Pre-commit policy enforcement
  7. Move bookmark - Branch pointer updated
  8. Push completes - Next commit can now proceed

Derived data is not computed on the write path. This keeps the critical section minimal and push latency low.
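The steps above can be sketched as a single function. Everything here is hypothetical: in-memory maps stand in for the blobstore and metadata database, `DefaultHasher` stands in for Bonsai hashing, and hooks are plain function pointers. Derivation is only queued, never run, and a hook rejection leaves the bookmark where it was.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical in-memory stand-ins for the storage the write path touches.
#[derive(Default)]
pub struct Repo {
    pub blobstore: HashMap<String, Vec<u8>>,  // immutable blobs
    pub hg_mapping: HashMap<String, u64>,     // hg hash -> bonsai id
    pub parents: HashMap<u64, Vec<u64>>,      // commit graph index
    pub bookmarks: HashMap<String, u64>,      // mutable branch pointers
    pub pending_derivation: Vec<u64>,         // derived later, off the critical path
}

pub struct IncomingCommit {
    pub hg_hash: String,
    pub parents: Vec<u64>,
    pub message: String,
    pub files: Vec<(String, Vec<u8>)>,
}

pub type Hook = fn(&IncomingCommit) -> Result<(), String>;

pub fn push(repo: &mut Repo, commit: IncomingCommit, bookmark: &str, hooks: &[Hook]) -> Result<u64, String> {
    // 1. Validate: every parent must already be known.
    if commit.parents.iter().any(|p| !repo.parents.contains_key(p)) {
        return Err("unknown parent".to_string());
    }
    // 2. Canonical id (stand-in for hashing a Bonsai changeset).
    let mut h = DefaultHasher::new();
    (&commit.parents, &commit.message, &commit.files).hash(&mut h);
    let id = h.finish();
    // 3. Store file contents (content-addressed) and the changeset.
    for (_path, bytes) in &commit.files {
        let mut fh = DefaultHasher::new();
        bytes.hash(&mut fh);
        repo.blobstore.insert(format!("content.{:016x}", fh.finish()), bytes.clone());
    }
    repo.blobstore.insert(format!("changeset.{:016x}", id), commit.message.as_bytes().to_vec());
    // 4. VCS mapping, 5. commit graph index.
    repo.hg_mapping.insert(commit.hg_hash.clone(), id);
    repo.parents.insert(id, commit.parents.clone());
    // 6. Hooks: any rejection stops the push before the bookmark moves.
    for hook in hooks {
        hook(&commit)?;
    }
    // 7. Move the bookmark.
    repo.bookmarks.insert(bookmark.to_string(), id);
    // 8. Done; derived data is only queued for later.
    repo.pending_derivation.push(id);
    Ok(id)
}
```

Blobs written before a hook rejection are unreachable in this model, since no bookmark points at them.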

Read Path: Derived Data

Most read operations require data not present in the minimal Bonsai format. For example:

  • Mercurial clients need Mercurial manifests and filenodes
  • Git clients need Git trees and commits
  • Blame needs line-by-line author attribution
  • File listings need efficient directory traversal

These are provided by derived data - indexes and representations computed asynchronously from Bonsai changesets. Derived data is:

  • Computed off the critical path - After push completes
  • Stored in blobstore - Cached for future reads
  • Dependency-aware - Some types depend on others (blame depends on unodes)
  • Derivable in parallel - For independent changesets and types
  • Versionable - Can be redesigned and recomputed (backfilled)
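Dependency awareness amounts to ordering: before deriving a type, derive whatever it depends on. The sketch below computes that order with a depth-first traversal and reports cycles. The dependency table is supplied by the caller (only "blame depends on unodes" comes from the text above), and the sketch does not model ordering across changesets.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first helper: appends `ty` to `order` after all of its dependencies.
fn visit<'a>(
    ty: &'a str,
    deps: &HashMap<&'a str, Vec<&'a str>>,
    in_progress: &mut HashSet<&'a str>,
    done: &mut HashSet<&'a str>,
    order: &mut Vec<&'a str>,
) -> Result<(), String> {
    if done.contains(ty) {
        return Ok(());
    }
    // Seeing a type again while it is still being visited means a cycle.
    if !in_progress.insert(ty) {
        return Err(format!("dependency cycle through {}", ty));
    }
    for &dep in deps.get(ty).map(|v| v.as_slice()).unwrap_or(&[]) {
        visit(dep, deps, in_progress, done, order)?;
    }
    in_progress.remove(ty);
    done.insert(ty);
    order.push(ty);
    Ok(())
}

/// Returns the types to derive, dependencies first, ending with `target`.
pub fn derivation_order<'a>(
    target: &'a str,
    deps: &HashMap<&'a str, Vec<&'a str>>,
) -> Result<Vec<&'a str>, String> {
    let mut order = Vec::new();
    visit(target, deps, &mut HashSet::new(), &mut HashSet::new(), &mut order)?;
    Ok(order)
}
```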

Major derived data types:

  • Manifests - Directory structures (fsnodes, unodes, skeleton manifests, Git trees)
  • File metadata - Filenodes (Mercurial), blame, fastlog
  • Git-specific - Git commits, delta manifests
  • Utilities - Changeset info, deleted manifest, case conflict detection

The derived data service can coordinate derivation across workers, deduplicating work and enabling horizontal scaling of derivation workload.
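A minimal sketch of the deduplication idea, assuming a single shared claim table: each (changeset, derived data type) pair is handed to exactly one worker, so several workers asked to derive the same commits still compute each pair once. `DerivationCoordinator` and `run_worker` are hypothetical; a real service would also persist results and handle worker failure.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical coordinator handing each (changeset, type) pair to one worker.
#[derive(Default)]
pub struct DerivationCoordinator {
    claimed: Mutex<HashSet<(u64, &'static str)>>,
}

impl DerivationCoordinator {
    /// Returns true if the caller won the claim and should derive the pair.
    pub fn try_claim(&self, changeset: u64, derived_type: &'static str) -> bool {
        self.claimed.lock().unwrap().insert((changeset, derived_type))
    }
}

/// Derives every requested pair no other worker has claimed and
/// returns how many pairs this worker computed itself.
pub fn run_worker(coord: &DerivationCoordinator, requests: &[(u64, &'static str)]) -> usize {
    let mut computed = 0;
    for &(cs, ty) in requests {
        if coord.try_claim(cs, ty) {
            // Fetching dependencies, computing, and storing would happen here.
            computed += 1;
        }
    }
    computed
}
```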

Request Flow Example: Fetching a File

A request flows through the system as follows:

  1. Client request - Sapling client requests file content via SLAPI
  2. Protocol server - Mononoke server authenticates and translates request
  3. API layer - Calls high-level file fetch API
  4. Facets - API uses:
    • repo_derived_data to get appropriate manifest type
    • repo_blobstore to fetch manifest and file content
    • bonsai_hg_mapping (if needed) to resolve Mercurial hash to Bonsai
  5. Storage layer - Blobstore returns data (from cache or backend)
  6. Response - Data flows back through layers to client

Throughout this flow, caching (cachelib, memcache) reduces load on storage backends.
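The facet lookups in steps 4 and 5 can be sketched as follows. The `Repo` fields are hypothetical flat maps standing in for `bonsai_hg_mapping`, a derived manifest, and `repo_blobstore`; real manifests are trees, and caching is omitted. If the manifest has not been derived yet, this sketch simply reports an error.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-ins for the facets in the flow above.
pub struct Repo {
    pub bonsai_hg_mapping: HashMap<String, u64>,                   // hg hash -> bonsai id
    pub derived_manifests: HashMap<u64, HashMap<String, String>>, // bonsai id -> (path -> content key)
    pub blobstore: HashMap<String, Vec<u8>>,                       // content key -> bytes
}

#[derive(Debug, PartialEq)]
pub enum FetchError {
    UnknownCommit,
    NotYetDerived,
    NoSuchPath,
    MissingBlob,
}

pub fn fetch_file(repo: &Repo, hg_hash: &str, path: &str) -> Result<Vec<u8>, FetchError> {
    // Resolve the client's Mercurial hash to a Bonsai id.
    let bonsai = repo.bonsai_hg_mapping.get(hg_hash).ok_or(FetchError::UnknownCommit)?;
    // Find the derived manifest for that changeset.
    let manifest = repo.derived_manifests.get(bonsai).ok_or(FetchError::NotYetDerived)?;
    // Look up the file's content key, then fetch the bytes.
    let key = manifest.get(path).ok_or(FetchError::NoSuchPath)?;
    repo.blobstore.get(key).cloned().ok_or(FetchError::MissingBlob)
}
```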

Storage Architecture

Mononoke uses two complementary storage systems:

Immutable Blobstore

The blobstore is a key-value store for immutable data:

  • File contents - Actual file data, chunked for large files
  • Bonsai changesets - Commit metadata
  • Derived data - Manifests, blame, filenodes, etc.
  • VCS-specific formats - Original Git/Mercurial bytes

Blobstore implementations:

  • Manifoldblob - Primary production backend (Facebook-internal)
  • SQLblob - SQL database backend
  • S3blob - Amazon S3 backend
  • Fileblob - Filesystem (development/testing)
  • Memblob - In-memory (testing)
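All of these backends sit behind one key-value interface. The sketch below is a simplified, synchronous version of such an interface with an in-memory implementation in the spirit of Memblob. Mononoke's actual `Blobstore` trait differs (it is asynchronous, among other things), so treat these signatures as assumptions.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Simplified, synchronous blobstore interface (an assumption for this sketch).
pub trait Blobstore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&self, key: &str, value: Vec<u8>);
}

/// In-memory backend in the spirit of Memblob (testing only).
#[derive(Default)]
pub struct MemBlob {
    data: RwLock<HashMap<String, Vec<u8>>>,
}

impl Blobstore for MemBlob {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.data.read().unwrap().get(key).cloned()
    }

    fn put(&self, key: &str, value: Vec<u8>) {
        // Blobs are immutable: this sketch keeps the first value written for a key.
        self.data.write().unwrap().entry(key.to_string()).or_insert(value);
    }
}
```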

Blobstore decorator pattern:

Mononoke uses decorators to layer functionality:

Application
    ↓
prefixblob (repo-specific key namespacing)
    ↓
cacheblob (memcache + cachelib caching)
    ↓
multiplexedblob (write-all, read-any across backends)
    ↓
packblob (compression for space efficiency)
    ↓
Backend storage (sqlblob/manifoldblob/s3blob)

Additional decorators include redaction (content removal), sampling (observability), throttling (rate limiting), and chaos/delay (testing).
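Because every decorator implements the same interface as the store it wraps, the stack can be assembled freely. The sketch below builds a prefix decorator over a caching decorator over an in-memory backend that counts reads, using a simplified synchronous trait (an assumption, not Mononoke's signature). `PrefixBlob` and `CacheBlob` only mimic the roles of prefixblob and cacheblob and are single-threaded for brevity.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

pub trait Blobstore {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
    fn put(&self, key: &str, value: Vec<u8>);
}

/// Lets a shared reference to a store be used wherever a store is expected.
impl<B: Blobstore + ?Sized> Blobstore for &B {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        (**self).get(key)
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        (**self).put(key, value)
    }
}

/// Bottom of the stack: an in-memory backend that counts reads.
#[derive(Default)]
pub struct CountingBackend {
    pub data: RefCell<HashMap<String, Vec<u8>>>,
    pub reads: Cell<usize>,
}

impl Blobstore for CountingBackend {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.reads.set(self.reads.get() + 1);
        self.data.borrow().get(key).cloned()
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.data.borrow_mut().insert(key.to_string(), value);
    }
}

/// Namespaces keys per repository (role of prefixblob).
pub struct PrefixBlob<B> {
    prefix: String,
    inner: B,
}

impl<B> PrefixBlob<B> {
    pub fn new(prefix: &str, inner: B) -> Self {
        Self { prefix: prefix.to_string(), inner }
    }
}

impl<B: Blobstore> Blobstore for PrefixBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.inner.get(&format!("{}{}", self.prefix, key))
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.inner.put(&format!("{}{}", self.prefix, key), value)
    }
}

/// Local cache in front of `inner` (role of cacheblob).
pub struct CacheBlob<B> {
    cache: RefCell<HashMap<String, Vec<u8>>>,
    inner: B,
}

impl<B> CacheBlob<B> {
    pub fn new(inner: B) -> Self {
        Self { cache: RefCell::new(HashMap::new()), inner }
    }
}

impl<B: Blobstore> Blobstore for CacheBlob<B> {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        if let Some(hit) = self.cache.borrow().get(key) {
            return Some(hit.clone());
        }
        let value = self.inner.get(key)?;
        self.cache.borrow_mut().insert(key.to_string(), value.clone());
        Some(value)
    }
    fn put(&self, key: &str, value: Vec<u8>) {
        self.cache.borrow_mut().insert(key.to_string(), value.clone());
        self.inner.put(key, value)
    }
}
```

A second, cold `CacheBlob` over the same backend reads it once and then serves repeats from its cache, which is the effect cache layers have on backend load.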

Multiplexing for availability:

Mononoke writes every blob to multiple independent blobstores simultaneously. This avoids a cyclic dependency: the storage systems themselves rely on source control to deploy fixes, so Mononoke cannot depend on any single one of them being available. If one backend is down, reads succeed from the others, and a healer job restores consistency across backends.
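Write-all, read-any can be sketched in a few lines. `Backend`, `Multiplexed`, and the `min_successful_writes` threshold are hypothetical; the `heal` method is a crude stand-in for the healer job, copying any blob that some live backend lacks.

```rust
use std::collections::HashMap;

/// Simplified backend that can be switched off to simulate an outage.
pub struct Backend {
    pub up: bool,
    pub data: HashMap<String, Vec<u8>>,
}

impl Backend {
    pub fn new() -> Self {
        Self { up: true, data: HashMap::new() }
    }
    fn put(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        if !self.up {
            return Err("backend down".to_string());
        }
        self.data.insert(key.to_string(), value.to_vec());
        Ok(())
    }
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, String> {
        if !self.up {
            return Err("backend down".to_string());
        }
        Ok(self.data.get(key).cloned())
    }
}

/// Write-all, read-any over several backends.
pub struct Multiplexed {
    pub backends: Vec<Backend>,
    pub min_successful_writes: usize, // hypothetical knob
}

impl Multiplexed {
    /// Attempts the write on every backend.
    pub fn put(&mut self, key: &str, value: &[u8]) -> Result<(), String> {
        let accepted = self.backends.iter_mut().map(|b| b.put(key, value)).filter(|r| r.is_ok()).count();
        if accepted >= self.min_successful_writes {
            Ok(())
        } else {
            Err(format!("only {} writes succeeded", accepted))
        }
    }

    /// Returns the first value any reachable backend has.
    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.backends.iter().find_map(|b| b.get(key).ok().flatten())
    }

    /// Healer-style pass: copy every blob to each live backend missing it.
    pub fn heal(&mut self) {
        let mut all: HashMap<String, Vec<u8>> = HashMap::new();
        for b in self.backends.iter().filter(|b| b.up) {
            for (k, v) in &b.data {
                all.entry(k.clone()).or_insert_with(|| v.clone());
            }
        }
        for b in self.backends.iter_mut().filter(|b| b.up) {
            for (k, v) in &all {
                b.data.entry(k.clone()).or_insert_with(|| v.clone());
            }
        }
    }
}
```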

Mutable Metadata Database

A SQL database (MySQL in production, SQLite for development) stores mutable repository state and mappings:

  • Bookmarks - Current branch positions
  • VCS mappings - Bonsai ↔ Git/Mercurial/SVN/Globalrev hash mappings
  • Commit graph index - Parent/child relationships for fast queries
  • Phases - Draft vs. public commit status
  • Bookmark update log - Audit trail of all bookmark movements
  • Counters - Mutable counters for sync operations

The metadata DB is used for small, frequently changing data, while the blobstore handles large, immutable content.
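The division of labor shows up in how bookmarks move. The sketch below treats the bookmarks table and the bookmark update log as in-memory structures and updates them together, assuming they are written atomically as a single SQL transaction would be. The compare-and-swap check (move only if the bookmark still points where the caller expects) is an illustrative assumption about how conflicting updates are rejected.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct LogEntry {
    pub id: u64,
    pub bookmark: String,
    pub from: Option<String>,
    pub to: String,
}

/// In-memory stand-in for two metadata DB tables: current bookmark
/// positions and the append-only bookmark update log.
#[derive(Default)]
pub struct MetadataDb {
    bookmarks: HashMap<String, String>,
    pub update_log: Vec<LogEntry>,
}

impl MetadataDb {
    /// Moves `bookmark` only if it still points at `expected`, recording
    /// the move in the log in the same step.
    pub fn move_bookmark(&mut self, bookmark: &str, expected: Option<&str>, to: &str) -> Result<(), String> {
        let current = self.bookmarks.get(bookmark).map(String::as_str);
        if current != expected {
            return Err(format!("{} moved concurrently (now at {:?})", bookmark, current));
        }
        self.update_log.push(LogEntry {
            id: self.update_log.len() as u64 + 1,
            bookmark: bookmark.to_string(),
            from: current.map(str::to_string),
            to: to.to_string(),
        });
        self.bookmarks.insert(bookmark.to_string(), to.to_string());
        Ok(())
    }

    pub fn get(&self, bookmark: &str) -> Option<&str> {
        self.bookmarks.get(bookmark).map(String::as_str)
    }
}
```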

Caching Strategy

Mononoke employs multi-level caching:

  1. In-memory cache (Cachelib) - Per-server process cache
  2. Shared cache (Memcache) - Cross-server cache within a region
  3. Warm bookmark cache - Precomputed state for important branches

Cache configuration can be controlled via command-line flags (e.g., --cache-mode=local-only).

Architectural Characteristics

Several design choices shape Mononoke's implementation:

VCS-Agnostic Data Model - Mononoke uses Bonsai as its internal representation rather than Git or Mercurial formats. This requires conversion when serving clients but allows a single backend to support multiple version control systems. VCS mappings maintain bidirectional relationships between Bonsai and external commit identifiers.

Asynchronous Derivation - Derived data (manifests, blame, filenodes) is computed off the critical write path. When a commit is pushed, only the Bonsai changeset and file contents are written synchronously. Indexes are computed asynchronously by derivation workers. This reduces push latency but introduces eventual consistency for derived data.

Facet Composition - Repository functionality is provided through composable facets (traits) rather than monolithic objects. Each facet encapsulates a specific capability such as blobstore access, commit graph traversal, or bookmark management. Functions declare their requirements through trait bounds, making dependencies explicit.

Stateless Services - Servers maintain no persistent state. All repository data resides in external storage (blobstore and metadata database). This enables horizontal scaling and simplifies deployments but requires caching for performance.

Decorator-Based Storage - Blobstore functionality is composed through decorators that add caching, multiplexing, compression, and other capabilities. This allows flexible configuration of the storage stack at the cost of additional abstraction layers.

Microservice Delegation - Expensive operations (landing, large diffs, coordinated derivation) are delegated to dedicated microservices. This isolates resource-intensive work from lightweight read operations and enables serialization of operations that require mutual exclusion.

Putting It All Together

Understanding Mononoke's architecture requires understanding both the system architecture (how services are composed) and the code architecture (how each application is structured internally).

Service Composition

At the system level, requests flow through the service tiers:

Read Request (e.g., fetch a file):

Client (Sapling)
    → Frontend Service (Mononoke Server)
        → Backend Storage (Blobstore + Metadata DB)
            → Response

Write Request (e.g., push commits):

Client (Sapling)
    → Frontend Service (Mononoke Server)
        → Microservice (Land Service)
            → Backend Storage (Blobstore + Metadata DB)
                → Response
                → Microservice (Derived Data Service, async, after the response)

The frontend services are stateless and can scale horizontally. Microservices handle expensive or serialized operations. Backend storage is shared by all services.

Code Layer Dependencies

Within each application, dependencies flow downward through the library layers:

Application Binaries (servers, tools, jobs)
    ↓ use
API Layer (mononoke_api) - optional, for high-level abstractions
    ↓ uses
Features (pushrebase, cross_repo_sync, hooks)
    ↓ use
Repository Facets (repo_attributes/*)
    ↓ use
Common Components (blobstore, mononoke_types, common/*)

Lower layers have no knowledge of upper layers. This keeps dependencies clean and makes testing easier.

Data Flow Patterns

Understanding how data flows helps clarify the separation of concerns:

Push (Write Path):

Client → Protocol Server → Bonsai Creation →
    Blobstore Write → Metadata DB Update → Hooks → Bookmark Move →
        Async Derivation (off critical path)

Pull (Read Path):

Client → Protocol Server → API Layer →
    Check Derived Data → Fetch from Blobstore (via cache) →
        Convert to VCS Format → Return to Client

Background Derivation:

Derivation Worker → Check What Needs Derivation →
    Fetch Dependencies → Compute Derived Data →
        Store in Blobstore → Update Derivation Status

Directory Organization

The Mononoke codebase is organized to reflect both architectures:

Library Layers (Code Architecture):

  • common/ - Foundation components (utilities, async helpers)
  • blobstore/ - Storage abstractions and implementations
  • mononoke_types/ - Core data types (Bonsai, content IDs)
  • repo_attributes/ - Repository facets (~35 composable capabilities)
  • features/ - High-level operations (pushrebase, cross-repo sync, etc.)
  • mononoke_api/ - API layer abstractions
  • cmdlib/ - Application framework

Applications (System Architecture):

Frontend Services (in servers/*):

  • servers/slapi/slapi_server/ - Mononoke Server (SLAPI)
  • servers/slapi/slapi_service/ - SLAPI service implementation
  • servers/git/git_server/ - Git protocol server
  • servers/lfs/lfs_server/ - LFS protocol server
  • scs/ - SCS Thrift API server (not yet migrated to servers/)

Microservices:

  • servers/land_service/ - Landing operations
  • derived_data/remote/ - Derived data coordination
  • Diff service, bookmark service (various locations)

Tools and Jobs:

  • tools/admin/ - Admin CLI
  • tools/testtool/ - Testing utilities
  • jobs/walker/ - Graph walker
  • jobs/blobstore_healer/ - Storage healer

This organization makes it clear which directories contain reusable libraries (used by all applications) and which contain application-specific code (servers, tools, jobs).

Next Steps

This overview provides the big picture of Mononoke's architecture. To dive deeper:

Component-specific documentation lives in the respective directories alongside the code.