Navigating the Mononoke Codebase

This guide helps developers find their way around Mononoke's approximately 55-60 top-level directories. Rather than providing an exhaustive catalog, this document teaches the organizational principles and patterns that make the codebase navigable.

Overview

Mononoke's codebase is organized around two key architectural concepts explained in the Architecture Overview:

System Architecture - How Mononoke is deployed as frontend services, microservices, and shared storage
Code Architecture - How each application is built internally using layered libraries

This document focuses on the code architecture and directory organization. The library code is organized into architectural layers, with each layer building on the ones below it. Understanding this layered structure is key to knowing where to find (or add) functionality:

Servers & Tools         (Entry points for users and operators)
         ↓
    API Layer           (High-level source control abstractions)
         ↓
   Features Layer       (Source control operations)
         ↓
Repo Attributes Layer   (Repository facets and capabilities)
         ↓
Base Components Layer   (Fundamental building blocks)

This layering principle guides the directory organization. Lower layers know nothing about higher layers - for example, base components know nothing about repositories, and repository attributes don't implement complete features.
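
The dependency rule can be written down as a small check. This is only an illustrative sketch: the layer identifiers and the `may_depend_on` helper are hypothetical names chosen for this example, and it assumes that a layer may use itself or anything below it.

```python
# Layers from the diagram above, ordered from lowest to highest.
LAYERS = [
    "base_components",
    "repo_attributes",
    "features",
    "api",
    "servers_and_tools",
]


def may_depend_on(user: str, dependency: str) -> bool:
    """Return True if layer `user` is allowed to depend on layer `dependency`.

    Assumption: a layer may use itself or any layer below it, but never
    a layer above it.
    """
    return LAYERS.index(dependency) <= LAYERS.index(user)
```

For instance, `may_depend_on("features", "repo_attributes")` is true, while `may_depend_on("base_components", "repo_attributes")` is false.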

Directory Organization Principles

Two-Level Hierarchy

Almost all components follow a two-level structure:

Category directory: Groups related components (e.g., blobstore/, repo_attributes/)
Component directory: Contains the actual implementation

Each component directory typically contains:

src/ - Rust source files
BUCK - Build definitions
Optional: Cargo.toml for OSS builds
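
As a rough illustration of the two-level layout, the sketch below groups components by category from a list of BUCK file paths relative to the Mononoke root. The function name is hypothetical, and it assumes components sit exactly at category/component/BUCK; anything shallower or deeper is ignored.

```python
from pathlib import PurePosixPath


def components_by_category(buck_paths):
    """Group component directories by category directory.

    `buck_paths` are paths to BUCK files relative to the Mononoke root,
    for example "blobstore/memblob/BUCK".
    """
    grouped = {}
    for raw in buck_paths:
        parts = PurePosixPath(raw).parts
        # Keep only paths of the form category/component/BUCK.
        if len(parts) == 3 and parts[2] == "BUCK":
            category, component = parts[0], parts[1]
            grouped.setdefault(category, []).append(component)
    return {category: sorted(names) for category, names in grouped.items()}
```

The input list could come from any file listing of the checkout; components that do not follow the two-level pattern simply do not appear in the result.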

Flat Namespace for Crates

Rust uses a flat namespace for crate names, so Mononoke components use descriptive names to avoid collisions. For example, the fsnodes component is named fsnodes_derivation rather than just fsnodes, and SQL storage implementations often use database-related prefixes (e.g., dbbookmarks, sql_commit_graph_storage).
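
To see why descriptive names matter, here is a minimal sketch that reports crate names claimed by more than one directory. The helper name and the `some_category/fsnodes` path are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def crate_name_collisions(crates):
    """Return crate names that are used by more than one directory.

    `crates` maps a component directory to the crate name it declares.
    """
    dirs_by_name = defaultdict(list)
    for directory, crate_name in crates.items():
        dirs_by_name[crate_name].append(directory)
    return {
        name: sorted(dirs)
        for name, dirs in dirs_by_name.items()
        if len(dirs) > 1
    }
```

If two directories both declared a crate called `fsnodes`, the function would report the clash; renaming one of them to `fsnodes_derivation` makes the result empty.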

Oncall Specification

All BUCK files declare oncall("scm_server_infra") near the top, immediately after any load() statements, to designate ownership. This is standard across the entire codebase.

Finding Components by Layer

Base Components (Foundation)

The lowest layer provides fundamental building blocks that know nothing about repositories or source control concepts.

common/ - Shared utilities and libraries

Async utilities: futures_watchdog, async_limiter
Logging: scuba_ext, logger_ext
Data structures: dedupmap, uniqueheap
Time and measurement: time_measuring, reloader
Graph algorithms: topo_sort
And many more specialized utilities

blobstore/ - Immutable key-value storage

Backend implementations: fileblob/, memblob/, sqlblob/, s3blob/
Caching: cacheblob/
Multi-backend: multiplexedblob/, multiplexedblob_wal/
Compression: packblob/ (see its README.md for details)
Storage decorators: redactedblobstore/, throttledblob/, prefixblob/
Testing utilities: chaosblob/, delayblob/
Ephemeral storage: ephemeral_blobstore/
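
The decorator arrangement can be pictured with a small sketch. This is not Mononoke's API: the real blobstores are Rust components, and the class names, method names, and the "repo0001." prefix below are hypothetical. It assumes a key-prefixing decorator wrapped around an in-memory backend.

```python
class MemBlob:
    """In-memory key-value store standing in for a backend such as memblob/."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str):
        # Returns None when the key is absent.
        return self._data.get(key)


class PrefixBlob:
    """Decorator that prepends a fixed prefix to every key, then delegates."""

    def __init__(self, inner, prefix: str):
        self._inner = inner
        self._prefix = prefix

    def put(self, key: str, value: bytes) -> None:
        self._inner.put(self._prefix + key, value)

    def get(self, key: str):
        return self._inner.get(self._prefix + key)
```

Because the decorator exposes the same put/get interface as the backend, decorators can be stacked in any order without the caller knowing which ones are present.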

mononoke_types/ - Core data type definitions

Defines Bonsai changesets, file changes, paths, and content IDs
See mononoke_types/docs/ for detailed type documentation
mononoke_types is currently a single monolithic crate, though it could be split up in the future

cmdlib/ - Command-line application framework

mononoke_app/ - Standard framework for new binaries (use this!)
base_app/ - Base application primitives
Argument handling: config_args/, commit_id/, sharding/
Environment setup: environment/, logging/, log/
Common capabilities: caching/, scrubbing/, cross_repo/

These directories provide the foundation but contain no repository-specific logic.

Repository Attributes (Facets)

The repo_attributes/ directory contains all the facets that compose a repository. Each subdirectory implements one aspect of repository functionality.

Finding facets: All repository attributes live in repo_attributes/. If you need to understand or modify a specific repository capability, start here.

Key facet categories:

Identity and Configuration

repo_identity/ - Repository name and ID
repo_bookmark_attrs/ - Bookmark configuration attributes

Storage Access

repo_blobstore/ - Repository-specific blobstore access
filestore/ - File content storage and retrieval
mutable_blobstore/ - Mutable blob operations

Commit Graph and History

commit_graph/ - Contains commit_graph/ for reading and sql_commit_graph_storage/ for storage
phases/ - Commit phase tracking (public, draft, etc.)

Derived Data

repo_derived_data/ - Per-repository derived data management
repo_derivation_queues/ - Derivation work queues

VCS Mappings

bonsai_hg_mapping/ - Bonsai ↔ Mercurial changeset mapping
bonsai_git_mapping/ - Bonsai ↔ Git commit mapping
bonsai_globalrev_mapping/ - Bonsai ↔ GlobalRev mapping
bonsai_svnrev_mapping/ - Bonsai ↔ SVN revision mapping
bonsai_tag_mapping/ - Tag object mappings
bonsai_blob_mapping/ - Blob content mappings

Bookmarks (Branches)

bookmarks/ - Contains bookmarks/ for bookmark access and dbbookmarks/ for storage

Git-Specific Attributes

git_ref_content_mapping/ - Git reference to content mappings
git_source_of_truth/ - Tracks Git source of truth
git_symbolic_refs/ - Git symbolic reference handling

Operations and Metadata

hook_manager/ - Hook execution management
pushrebase_mutation_mapping/ - Pushrebase mutation tracking
deletion_log/ - Deleted commit tracking
mutable_counters/ - Repository counters
mutable_renames/ - File rename tracking
repo_cross_repo/ - Cross-repository sync attributes
repo_lock/ - Repository locking
repo_permission_checker/ - Permission checking
repo_sparse_profiles/ - Sparse profile management
restricted_paths/ - Path access restrictions
sql_query_config/ - SQL query configuration
repo_metadata_checkpoint/ - Metadata checkpointing
repo_event_publisher/ - Event publishing

Legacy Filenode Storage

filenodes/ - Legacy filenode interface
newfilenodes/ - Newer filenode implementation

Pattern for using facets: Features and higher layers access repository capabilities through facet traits. Look at the dependencies in a feature's BUCK file to see which facets it uses.
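
A minimal sketch of the facet pattern follows. The real facets are Rust traits; every name here (`RepoIdentity`, `Bookmarks`, `Repo`, `describe_bookmark`, and the in-memory implementations) is hypothetical and only shows the shape: a repository is a bundle of facets, and higher-level code touches only the facets it needs.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class RepoIdentity(Protocol):
    def name(self) -> str: ...


class Bookmarks(Protocol):
    def get(self, bookmark: str) -> Optional[str]: ...


@dataclass
class StaticIdentity:
    repo_name: str

    def name(self) -> str:
        return self.repo_name


@dataclass
class InMemoryBookmarks:
    entries: dict

    def get(self, bookmark: str) -> Optional[str]:
        return self.entries.get(bookmark)


@dataclass
class Repo:
    """A repository is just a collection of facets."""
    identity: RepoIdentity
    bookmarks: Bookmarks


def describe_bookmark(repo: Repo, bookmark: str) -> str:
    """Higher-layer code: reads two facets and keeps no state of its own."""
    target = repo.bookmarks.get(bookmark)
    if target is None:
        return f"{repo.identity.name()}: {bookmark} is not set"
    return f"{repo.identity.name()}: {bookmark} -> {target}"
```

Swapping `InMemoryBookmarks` for a SQL-backed implementation would not change `describe_bookmark`, which is the point of programming against facet interfaces.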

Features Layer

The features/ directory contains source control operations implemented by combining repository facets. Features are stateless - they operate on facets but don't hold repository state themselves.

Current features in features/:

async_requests/ - Asynchronous request handling
cache_warmup/ - Repository cache preloading
changesets_creation/ - Changeset creation operations
commit_cloud/ - Commit cloud synchronization
commit_rewriting/ - Commit transformation and rewriting
commit_transformation/ - Commit rewriting and transformation
cross_repo_sync/ - Cross-repository synchronization
diff/ - Diff computation
history_traversal/ - History walking and traversal
hooks/ - Pre-commit and other hooks
microwave/ - Fast cache warming
pushrebase/ - Server-side rebasing
redaction/ - Content redaction
repo_metadata/ - Repository metadata operations
repo_stats_logger/ - Repository statistics logging
repo_update_logger/ - Repository update logging

Other feature locations: Some feature-related functionality is in other locations:

repo_attributes/bookmarks/bookmarks_movement/ - Bookmark updates

The features layer is being gradually populated as code is refactored.

Derived Data

The derived_data/ directory contains the derived data framework and all derived data type implementations.

Framework Components

manager/ - Derived data manager and coordination
remote/ - Remote derivation service
bulk_derivation/ - Batch derivation
constants/ - Shared constants
test_utils/ - Testing utilities

Derived Data Types (partial list - there are ~22 total)

Manifests: fsnodes/, unodes/, skeleton_manifest/, skeleton_manifest_v2/, basename_suffix_skeleton_manifest_v3/, deleted_manifest/, case_conflict_skeleton_manifest/, content_manifest_derivation/
File Metadata: filenodes_derivation/, blame/, fastlog/
Mercurial-specific: mercurial_derivation/
Utilities: changeset_info/, inferred_copy_from/
Testing: test_manifest/, test_sharded_manifest/

Finding derived data types: All implementations are subdirectories of derived_data/. Each type has its own directory with source and BUCK file.

API Layer

The mononoke_api/ directory provides high-level abstractions over Mononoke's internal data structures:

Repository objects
Changeset abstractions
Tree and file interfaces
VCS-agnostic operations

There's also mononoke_api_hg/ for Mercurial-specific API extensions.

Note: The API layer is intended to be the primary interface for servers and tools, though many components still access lower layers directly for historical reasons.

Servers, Jobs, and Tools

These directories contain the application binaries that use the library layers described above. As explained in the Architecture Overview, Mononoke is deployed as a collection of services (frontend servers and microservices) plus tools and background jobs. Each application uses the same layered library code.

Servers

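Most servers live under the servers/ directory, grouped by protocol, with one subdirectory per protocol. The Source Control Service (scs/) and the remote derivation service (derived_data/remote/) are located elsewhere, as listed below: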

Frontend Servers (serve external clients)

servers/slapi/slapi_server/ - Main Mononoke server (SLAPI for Sapling and EdenFS clients)
- Contains: repo_listener/, context/, qps/ subdirectories
- The actual binary is defined in the server's BUCK file
scs/ - Source Control Service (Thrift API for programmatic access)
- Contains: scs_server/ for the server and if/ for Thrift definitions
servers/git/git_server/ - Git protocol server (HTTP-based)
servers/lfs/lfs_server/ - Git LFS protocol server

Microservices (handle expensive operations)

servers/land_service/ - Landing (merge) service
derived_data/remote/ - Remote derivation service (within derived_data)

Protocol Handlers (libraries used by servers)

servers/slapi/wireproto_handler/ - Mercurial wire protocol handling
servers/slapi/slapi_service/ - SLAPI service implementation
servers/sshrelay/ - SSH relay server
hgproto/ - Mercurial protocol definitions

Jobs (Background Workers)

The jobs/ directory contains long-running background jobs for maintenance and async operations:

walker/ - Graph traversal, validation, scrubbing (see walker/src/README.md)
blobstore_healer/ - Storage durability and repair
cas_sync/ - Content-addressed storage synchronization
modern_sync/ - Modern sync job
statistics_collector/ - Repository statistics collection

Jobs are distinct from servers (which handle client requests) and tools (which are run on-demand).

Tools (Command-Line Utilities)

The tools/ directory contains command-line utilities for operators and developers:

Administrative Tools

admin/ - Main admin CLI (primary operational tool)
testtool/ - Testing and debugging utility

Import/Export

blobimport/ - Import Mercurial repositories
import/ - Import utilities
repo_import/ - Repository import tool

Verification

aliasverify/ - Verify content-addressed aliases
bonsai_verify/ - Bonsai changeset verification
check_git_wc/ - Git working copy verification

Maintenance

packer/ - Packblob utilities
sqlblob_gc/ - SQL blobstore garbage collection
backfill_mapping/ - Backfill mapping tables

Other

streaming_clone/ - Streaming clone generation
executor/ - Task executor
example/ - Example tool
tail-to-cloudwatch/ - CloudWatch log tailing

The admin/ tool is the primary interface for most operational tasks. Other tools are more specialized.

Clients

The clients/ directory contains client tools and libraries for interacting with Mononoke servers:

clients/scsc/ - Source Control Service CLI client (scsc command-line tool)
clients/git_pushrebase/ - Git pushrebase client utilities
clients/facebook/ - Facebook-internal client implementations

These client tools provide command-line and programmatic interfaces to Mononoke's various services.

VCS Integration

Git Support - git/ directory

git_types/ - Git-specific type definitions
protocol/, packfile/, packetline/ - Git protocol implementation
gitimport/, gitexport/ - Import and export
import_direct/, import_tools/ - Import utilities
git_env/ - Git environment setup
bundle_uri/ - Bundle URI support
git-pool/ - Git object pooling
check_git_wc/ - Working copy checking

Mercurial Support - mercurial/ directory

Mercurial type definitions
Revlog support
Legacy compatibility

LFS Support

lfs_protocol/ - LFS protocol definitions
servers/lfs/lfs_server/ - LFS server implementation
lfs_import_lib/ - LFS import utilities

Configuration and Metadata

metaconfig/ - Repository metadata configuration (legacy name, should be config/)
mononoke_configs/ - Global configuration system
mononoke_macros/ - Rust procedural macros

Testing

Integration Tests - tests/integration/

.t test files (Mercurial-style test format)
Test fixtures in tests/fixtures/
Library scripts: library.sh, library-commit.sh, library-git-lfs.sh, etc.
BUCK file defines dott_test targets for each test suite
See tests/integration/README.md for comprehensive documentation

Test Utilities - tests/utils/

Shared testing utilities

Unit Tests: Embedded in each component's src/ directory (standard Rust practice)

Other Key Directories

Operations

observability/ - Observability infrastructure
rate_limiting/ - Rate limiting implementations
permission_checker/ - Permission checking (base component)
repo_authorization/ - Repository authorization

Additional Components

blobrepo/, blobrepo_utils/ - Legacy repository abstraction (being phased out in favor of facets)
repo_factory/ - Repository creation and initialization
manifest/ - Manifest handling utilities
megarepo_api/ - Megarepo (monorepo) operations
acl_regions/ - ACL region management
cas_client/ - Content-addressed storage client
cats/ - Configuration as a Service integration
repo_client/ - Repository client implementation
gotham_ext/ - Gotham web framework extensions
quiet_stream/ - Stream utilities
time_window_counter/ - Time-windowed counters
adaptive_rate_limiter/ - Adaptive rate limiting
benchmarks/ - Performance benchmarks
third_party/ - Third-party code

Facebook-Internal: facebook/ contains Facebook-specific implementations mirroring the main hierarchy.

OSS Builds: public_autocargo/ contains auto-generated Cargo.toml files for open-source builds.

Finding Specific Functionality

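How to Find a Facet Implementation

All facets are in repo_attributes/
Facet names are descriptive (e.g., repo_identity, bookmarks, commit_graph)
SQL storage for a facet typically lives in a subdirectory within the facet, often named sql_* or db* (e.g., sql_commit_graph_storage, dbbookmarks)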

Example: To find bookmark implementation:

Facet interface: repo_attributes/bookmarks/src/
SQL storage: Look for dbbookmarks within the bookmarks directory

How to Find a Derived Data Type

All derived data types are in derived_data/
Directory names match the derived data type name
The manager and framework are in derived_data/manager/ and derived_data/remote/

Example: To find fsnodes derivation:

Implementation: derived_data/fsnodes/
Manager integration: Check derived_data/manager/ for registration

How to Find Tests

Unit tests: Look in the component's source directory

Tests are co-located with code in src/ files
May also be in component-specific test/ directories

Integration tests: Look in tests/integration/

Test files use .t extension
Named descriptively (e.g., test-pushrebase.t, test-gitimport.t)
Organized by functionality in BUCK file dott_test targets

Example: To find pushrebase tests:

Unit tests: features/pushrebase/src/ may contain #[test] functions
Integration tests: tests/integration/test-pushrebase*.t
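
The naming convention makes integration tests easy to filter. The sketch below applies the test-<topic>*.t pattern to a list of file names; the helper name is hypothetical, and any file names beyond those mentioned in this guide are made up for illustration.

```python
from fnmatch import fnmatchcase


def integration_tests_for(topic, test_files):
    """Return the .t files named test-<topic>*.t, sorted alphabetically."""
    pattern = f"test-{topic}*.t"
    return sorted(name for name in test_files if fnmatchcase(name, pattern))
```

Passing the file names found in tests/integration/ together with a topic such as "pushrebase" or "gitimport" yields the matching test files.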

How to Find a Server's Implementation

Servers are organized under the servers/ directory by protocol:

SLAPI server code: servers/slapi/slapi_server/, servers/slapi/slapi_service/
Git server code: servers/git/git_server/
LFS server code: servers/lfs/lfs_server/
SCS server code: scs/scs_server/ (not yet migrated to servers/)
Binary definitions: Check component BUCK files

How to Find Storage Implementation

Blobstore backends: All in blobstore/ with descriptive names

File-based: blobstore/fileblob/
SQL-based: blobstore/sqlblob/
S3-based: blobstore/s3blob/
In-memory: blobstore/memblob/

Metadata storage: Look for SQL storage subdirectories within facets (often named sql_* or db*)

Bookmarks: repo_attributes/bookmarks/ area
Commit graph: repo_attributes/commit_graph/sql_commit_graph_storage/
Phases: Within repo_attributes/phases/

BUCK File Patterns

BUCK files follow consistent patterns:

load("@fbsource//tools/build_defs:rust_library.bzl", "rust_library")

oncall("scm_server_infra")

rust_library(
    name = "component_name",
    srcs = glob(["src/**/*.rs"]),
    autocargo = {"cargo_toml_dir": "component_name"},  # For OSS
    deps = [
        # Dependencies listed here
    ],
)

Key patterns:

oncall("scm_server_infra") appears in every file
autocargo metadata enables OSS builds
glob(["src/**/*.rs"]) is standard for sources
Dependencies use full paths (e.g., "//eden/mononoke/mononoke_types:mononoke_types")
Third-party deps: "fbsource//third-party/rust:crate_name"

Finding dependencies: Look at the deps field in a component's BUCK file to see what it uses.

Integration tests: Use dott_test targets that reference test files and their binary dependencies.

Rules of Thumb

Repository capabilities? → Look in repo_attributes/
Source control operation? → Check features/, then top-level directories
Storage backend? → Look in blobstore/
Derived data type? → Look in derived_data/
Server implementation? → servers/ directory (scs/ for the Source Control Service)
Background job? → jobs/ directory
CLI tool? → tools/ directory (especially tools/admin/)
VCS-specific code? → git/ or mercurial/ directories
Common utility? → common/ directory
Test? → tests/integration/ for integration, src/ for unit tests
Configuration? → metaconfig/ or mononoke_configs/
Unknown? → Check the README.md in likely directories, or use grep/search
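
The directory-to-layer associations described in this guide can be condensed into a lookup table. This is a sketch only: the table covers just the directories this guide explicitly places in a layer, the layer labels and the `layer_of` helper are hypothetical, and any other directory returns None.

```python
# Top-level directories this guide explicitly assigns to a layer.
LAYER_BY_TOP_LEVEL_DIR = {
    "common": "base components",
    "blobstore": "base components",
    "mononoke_types": "base components",
    "cmdlib": "base components",
    "repo_attributes": "repo attributes",
    "features": "features",
    "mononoke_api": "api",
    "servers": "servers and tools",
    "scs": "servers and tools",
    "jobs": "servers and tools",
    "tools": "servers and tools",
}


def layer_of(path: str):
    """Return the layer for a path relative to the Mononoke root, or None."""
    top_level = path.strip("/").split("/", 1)[0]
    return LAYER_BY_TOP_LEVEL_DIR.get(top_level)
```

A None result is a cue to fall back on the README.md files or a search, as the last rule above suggests.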

Common Navigation Scenarios

"I need to modify bookmark behavior"

Facet interface: repo_attributes/bookmarks/
Bookmark updates: repo_attributes/bookmarks/bookmarks_movement/
Caching: repo_attributes/bookmarks/bookmarks_cache/, repo_attributes/bookmarks/warm_bookmarks_cache/
Storage: Look for SQL implementation within the bookmarks directory structure

"I need to add a new server endpoint"

Identify the server: servers/slapi/, servers/git/, scs/, etc.
Check protocol service: servers/slapi/slapi_service/, servers/slapi/wireproto_handler/, etc.
Look at similar endpoints in that server's source
Add integration tests in tests/integration/

"I need to add a new derived data type"

Create new directory in derived_data/
Look at existing types (e.g., derived_data/fsnodes/) as examples
Implement derivation logic
Register with framework in derived_data/manager/
Add tests

"I'm debugging a storage issue"

Identify the layer: blobstore or metadata database?
Blobstore: blobstore/ + check decorators like cacheblob/, multiplexedblob/
Metadata: Find the facet in repo_attributes/ and its SQL implementation
Check jobs/walker/ for validation tools
Check jobs/blobstore_healer/ for healing logic

"I need to understand a protocol"

Git: git/protocol/, git/packfile/, git/packetline/
Mercurial: hgproto/, servers/slapi/wireproto_handler/
SLAPI: servers/slapi/slapi_service/
LFS: lfs_protocol/, servers/lfs/lfs_server/
SCS: scs/if/source_control.thrift for interface definition

Documentation Locations

High-level docs: docs/ (this document and architecture docs)
Component-specific docs: In component directories (e.g., jobs/walker/src/README.md, blobstore/packblob/README.md)
Type documentation: mononoke_types/docs/
Integration test docs: tests/integration/README.md

Summary

Mononoke's directory structure follows consistent principles:

Layered architecture: Base components → Repo attributes → Features → API → Servers/Tools
Two-level hierarchy: Category directories contain component subdirectories
Descriptive naming: Component names clearly indicate their purpose
Consistent patterns: BUCK files, source layout, and testing follow standards

When navigating the codebase:

Start with the layer that matches your concern
Use directory names as guides - they're descriptive
Check README.md files in major directories
Look at BUCK files to understand dependencies
Follow the patterns established in existing code

The codebase is large, but its organization is systematic. Understanding the layers and patterns makes it navigable.
