Navigating the Mononoke Codebase

eden/mononoke/docs/1.4-navigating-the-codebase.md

This guide helps developers find their way around Mononoke's approximately 55-60 top-level directories. Rather than providing an exhaustive catalog, this document teaches the organizational principles and patterns that make the codebase navigable.

Overview

Mononoke's codebase is organized around two key architectural concepts explained in the Architecture Overview:

  1. System Architecture - How Mononoke is deployed as frontend services, microservices, and shared storage
  2. Code Architecture - How each application is built internally using layered libraries

This document focuses on the code architecture and directory organization. The library code is organized into architectural layers, with each layer building on the ones below it. Understanding this layered structure is key to knowing where to find (or add) functionality:

Servers & Tools         (Entry points for users and operators)
         ↓
    API Layer           (High-level source control abstractions)
         ↓
   Features Layer       (Source control operations)
         ↓
Repo Attributes Layer   (Repository facets and capabilities)
         ↓
Base Components Layer   (Fundamental building blocks)

This layering principle guides the directory organization. Lower layers know nothing about higher layers - for example, base components know nothing about repositories, and repository attributes don't implement complete features.
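
The downward-only rule can be sketched as a small check. The layer names come from the diagram above; the directory-to-layer table covers only a few directories described in this guide, and `may_depend_on` is a hypothetical helper for illustration, not part of Mononoke's build tooling. Allowing dependencies within the same layer is an assumption of this sketch.

```python
# Layers from the diagram, lowest first.
LAYERS = [
    "base_components",
    "repo_attributes",
    "features",
    "api",
    "servers_and_tools",
]

# Illustrative mapping of a few top-level directories to layers
# (an assumption for this sketch; the real codebase has many more).
DIRECTORY_LAYER = {
    "common": "base_components",
    "blobstore": "base_components",
    "mononoke_types": "base_components",
    "repo_attributes": "repo_attributes",
    "features": "features",
    "mononoke_api": "api",
    "servers": "servers_and_tools",
    "tools": "servers_and_tools",
}


def may_depend_on(user_dir: str, dependency_dir: str) -> bool:
    """Return True if code in user_dir may depend on code in dependency_dir.

    A dependency is allowed when it points to the same layer or a lower one.
    """
    user_rank = LAYERS.index(DIRECTORY_LAYER[user_dir])
    dep_rank = LAYERS.index(DIRECTORY_LAYER[dependency_dir])
    return dep_rank <= user_rank
```

With this table, `may_depend_on("features", "repo_attributes")` is true, while `may_depend_on("blobstore", "repo_attributes")` is false because a base component would be reaching upward.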

Directory Organization Principles

Two-Level Hierarchy

Almost all components follow a two-level structure:

  • Category directory: Groups related components (e.g., blobstore/, repo_attributes/)
  • Component directory: Contains the actual implementation

Each component directory typically contains:

  • src/ - Rust source files
  • BUCK - Build definitions
  • Optional: Cargo.toml for OSS builds
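
As a rough illustration of this layout, the following sketch walks a checkout and lists every `category/component` directory that contains both `src/` and `BUCK`. The function name `find_components` is hypothetical, and the sketch assumes exactly two levels, so components nested deeper or placed directly at the top level would be missed.

```python
from pathlib import Path


def find_components(root: Path) -> list[str]:
    """List "category/component" paths that contain both src/ and BUCK."""
    found = []
    for category in sorted(p for p in root.iterdir() if p.is_dir()):
        for component in sorted(p for p in category.iterdir() if p.is_dir()):
            has_src = (component / "src").is_dir()
            has_buck = (component / "BUCK").is_file()
            if has_src and has_buck:
                found.append(f"{category.name}/{component.name}")
    return found
```

Calling `find_components(Path("eden/mononoke"))` from the repository root would be one way to get a quick inventory, assuming the checkout lives at that path.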

Flat Namespace for Crates

Rust uses a flat namespace for crate names, so Mononoke components use descriptive names to avoid collisions. For example, the fsnodes component is named fsnodes_derivation rather than just fsnodes, and SQL storage implementations often use database-related prefixes (e.g., dbbookmarks, sql_commit_graph_storage).

Oncall Specification

All BUCK files declare oncall("scm_server_infra") near the top, after any load() statements, to designate ownership. This is standard across the entire codebase.

Finding Components by Layer

Base Components (Foundation)

The lowest layer provides fundamental building blocks that know nothing about repositories or source control concepts.

common/ - Shared utilities and libraries

  • Async utilities: futures_watchdog, async_limiter
  • Logging: scuba_ext, logger_ext
  • Data structures: dedupmap, uniqueheap
  • Time and measurement: time_measuring, reloader
  • Graph algorithms: topo_sort
  • And many more specialized utilities

blobstore/ - Immutable key-value storage

  • Backend implementations: fileblob/, memblob/, sqlblob/, s3blob/
  • Caching: cacheblob/
  • Multi-backend: multiplexedblob/, multiplexedblob_wal/
  • Compression: packblob/ (see its README.md for details)
  • Storage decorators: redactedblobstore/, throttledblob/, prefixblob/
  • Testing utilities: chaosblob/, delayblob/
  • Ephemeral storage: ephemeral_blobstore/
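
The decorator entries above wrap another blobstore and add behavior on top of it. The sketch below illustrates that pattern only: the class and method names are hypothetical and are not Mononoke's actual Rust interfaces, and modeling immutability as "first write wins" is an assumption made for the example.

```python
from typing import Optional


class MemBlob:
    """In-memory key-value store; keys are write-once in this sketch."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        # Keep the first value written for a key; later writes are ignored.
        self._data.setdefault(key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class PrefixBlob:
    """Decorator that prepends a fixed prefix to every key it forwards."""

    def __init__(self, inner, prefix: str) -> None:
        self._inner = inner
        self._prefix = prefix

    def put(self, key: str, value: bytes) -> None:
        self._inner.put(self._prefix + key, value)

    def get(self, key: str) -> Optional[bytes]:
        return self._inner.get(self._prefix + key)
```

Because the decorator exposes the same `put`/`get` surface as the store it wraps, two `PrefixBlob` instances with different prefixes can share one backing `MemBlob` without seeing each other's keys, and further decorators could be stacked in the same way.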

mononoke_types/ - Core data type definitions

  • Defines Bonsai changesets, file changes, paths, and content IDs
  • See mononoke_types/docs/ for detailed type documentation
  • This is monolithic but could be split up in the future

cmdlib/ - Command-line application framework

  • mononoke_app/ - Standard framework for new binaries (use this!)
  • base_app/ - Base application primitives
  • Argument handling: config_args/, commit_id/, sharding/
  • Environment setup: environment/, logging/, log/
  • Common capabilities: caching/, scrubbing/, cross_repo/

These directories provide the foundation but contain no repository-specific logic.

Repository Attributes (Facets)

The repo_attributes/ directory contains all the facets that compose a repository. Each subdirectory implements one aspect of repository functionality.

Finding facets: All repository attributes live in repo_attributes/. If you need to understand or modify a specific repository capability, start here.

Key facet categories:

Identity and Configuration

  • repo_identity/ - Repository name and ID
  • repo_bookmark_attrs/ - Bookmark configuration attributes

Storage Access

  • repo_blobstore/ - Repository-specific blobstore access
  • filestore/ - File content storage and retrieval
  • mutable_blobstore/ - Mutable blob operations

Commit Graph and History

  • commit_graph/ - Contains commit_graph/ for reading and sql_commit_graph_storage/ for storage
  • phases/ - Commit phase tracking (public, draft, etc.)

Derived Data

  • repo_derived_data/ - Per-repository derived data management
  • repo_derivation_queues/ - Derivation work queues

VCS Mappings

  • bonsai_hg_mapping/ - Bonsai ↔ Mercurial changeset mapping
  • bonsai_git_mapping/ - Bonsai ↔ Git commit mapping
  • bonsai_globalrev_mapping/ - Bonsai ↔ GlobalRev mapping
  • bonsai_svnrev_mapping/ - Bonsai ↔ SVN revision mapping
  • bonsai_tag_mapping/ - Tag object mappings
  • bonsai_blob_mapping/ - Blob content mappings

Bookmarks (Branches)

  • bookmarks/ - Contains bookmarks/ for bookmark access and dbbookmarks/ for storage

Git-Specific Attributes

  • git_ref_content_mapping/ - Git reference to content mappings
  • git_source_of_truth/ - Tracks Git source of truth
  • git_symbolic_refs/ - Git symbolic reference handling

Operations and Metadata

  • hook_manager/ - Hook execution management
  • pushrebase_mutation_mapping/ - Pushrebase mutation tracking
  • deletion_log/ - Deleted commit tracking
  • mutable_counters/ - Repository counters
  • mutable_renames/ - File rename tracking
  • repo_cross_repo/ - Cross-repository sync attributes
  • repo_lock/ - Repository locking
  • repo_permission_checker/ - Permission checking
  • repo_sparse_profiles/ - Sparse profile management
  • restricted_paths/ - Path access restrictions
  • sql_query_config/ - SQL query configuration
  • repo_metadata_checkpoint/ - Metadata checkpointing
  • repo_event_publisher/ - Event publishing

Legacy Filenode Storage

  • filenodes/ - Legacy filenode interface
  • newfilenodes/ - Newer filenode implementation

Pattern for using facets: Features and higher layers access repository capabilities through facet traits. Look at the dependencies in a feature's BUCK file to see which facets it uses.
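
One way to apply this is to scan a BUCK file for target labels that point into `repo_attributes/`. The sketch below does plain text matching rather than evaluating the build file, so it would also pick up labels that appear in comments; `facet_deps` and the regular expression are illustrative assumptions, not an existing tool.

```python
import re

# Matches quoted target labels such as "//eden/mononoke/foo:foo" or
# "fbsource//third-party/rust:name". load() paths start with "@" and
# are therefore skipped.
LABEL_RE = re.compile(r'"((?:fbsource)?//[^"\s]+)"')


def facet_deps(buck_text: str) -> list[str]:
    """Return the sorted, de-duplicated labels that point into repo_attributes/."""
    labels = LABEL_RE.findall(buck_text)
    return sorted({label for label in labels if "/repo_attributes/" in label})
```

For a feature whose deps include a label under `//eden/mononoke/repo_attributes/`, the function returns just those labels, which gives a quick list of the facets the feature relies on.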

Features Layer

The features/ directory contains source control operations implemented by combining repository facets. Features are stateless - they operate on facets but don't hold repository state themselves.
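
To make "stateless" concrete, here is a minimal sketch in which a feature is a plain function that receives the facets it needs. `RepoIdentity`, `Bookmarks`, and `describe_bookmark` are hypothetical stand-ins written for this example; the real facets are Rust types with their own interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class RepoIdentity:
    """Stand-in for an identity facet: repository name and ID."""
    name: str
    id: int


@dataclass
class Bookmarks:
    """Stand-in for a bookmarks facet: bookmark name -> changeset ID."""
    entries: dict = field(default_factory=dict)


def describe_bookmark(identity: RepoIdentity, bookmarks: Bookmarks, bookmark: str) -> str:
    """A 'feature': combines two facets and keeps no state of its own."""
    target = bookmarks.entries.get(bookmark)
    if target is None:
        return f"{identity.name}: bookmark {bookmark!r} not found"
    return f"{identity.name}: {bookmark} -> {target}"
```

All repository state lives in the objects passed in, so the same function can serve any repository that provides those two facets.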

Current features in features/:

  • async_requests/ - Asynchronous request handling
  • cache_warmup/ - Repository cache preloading
  • changesets_creation/ - Changeset creation operations
  • commit_cloud/ - Commit cloud synchronization
  • commit_rewriting/ - Commit transformation and rewriting
  • commit_transformation/ - Commit rewriting and transformation
  • cross_repo_sync/ - Cross-repository synchronization
  • diff/ - Diff computation
  • history_traversal/ - History walking and traversal
  • hooks/ - Pre-commit and other hooks
  • microwave/ - Fast cache warming
  • pushrebase/ - Server-side rebasing
  • redaction/ - Content redaction
  • repo_metadata/ - Repository metadata operations
  • repo_stats_logger/ - Repository statistics logging
  • repo_update_logger/ - Repository update logging

Other feature locations: Some feature-related functionality is in other locations:

  • repo_attributes/bookmarks/bookmarks_movement/ - Bookmark updates

The features layer is being gradually populated as code is refactored.

Derived Data

The derived_data/ directory contains the derived data framework and all derived data type implementations.

Framework Components

  • manager/ - Derived data manager and coordination
  • remote/ - Remote derivation service
  • bulk_derivation/ - Batch derivation
  • constants/ - Shared constants
  • test_utils/ - Testing utilities

Derived Data Types (partial list - there are ~22 total)

  • Manifests: fsnodes/, unodes/, skeleton_manifest/, skeleton_manifest_v2/, basename_suffix_skeleton_manifest_v3/, deleted_manifest/, case_conflict_skeleton_manifest/, content_manifest_derivation/
  • File Metadata: filenodes_derivation/, blame/, fastlog/
  • Mercurial-specific: mercurial_derivation/
  • Utilities: changeset_info/, inferred_copy_from/
  • Testing: test_manifest/, test_sharded_manifest/

Finding derived data types: All implementations are subdirectories of derived_data/. Each type has its own directory with source and BUCK file.

API Layer

The mononoke_api/ directory provides high-level abstractions over Mononoke's internal data structures:

  • Repository objects
  • Changeset abstractions
  • Tree and file interfaces
  • VCS-agnostic operations

There's also mononoke_api_hg/ for Mercurial-specific API extensions.

Note: The API layer is intended to be the primary interface for servers and tools, though many components still access lower layers directly for historical reasons.

Servers, Jobs, and Tools

These directories contain the application binaries that use the library layers described above. As explained in the Architecture Overview, Mononoke is deployed as a collection of services (frontend servers and microservices) plus tools and background jobs. Each application uses the same layered library code.

Servers

Servers are organized under the servers/ directory, grouped by protocol. Each protocol has its own subdirectory:

Frontend Servers (serve external clients)

  • servers/slapi/slapi_server/ - Main Mononoke server (SLAPI for Sapling and EdenFS clients)
    • Contains: repo_listener/, context/, qps/ subdirectories
    • The actual binary is defined in the server's BUCK file
  • scs/ - Source Control Service (Thrift API for programmatic access; not yet migrated to servers/)
    • Contains: scs_server/ for the server and if/ for Thrift definitions
  • servers/git/git_server/ - Git protocol server (HTTP-based)
  • servers/lfs/lfs_server/ - Git LFS protocol server

Microservices (handle expensive operations)

  • servers/land_service/ - Landing (merge) service
  • derived_data/remote/ - Remote derivation service (within derived_data)

Protocol Handlers (libraries used by servers)

  • servers/slapi/wireproto_handler/ - Mercurial wire protocol handling
  • servers/slapi/slapi_service/ - SLAPI service implementation
  • servers/sshrelay/ - SSH relay server
  • hgproto/ - Mercurial protocol definitions

Jobs (Background Workers)

The jobs/ directory contains long-running background jobs for maintenance and async operations:

  • walker/ - Graph traversal, validation, scrubbing (see walker/src/README.md)
  • blobstore_healer/ - Storage durability and repair
  • cas_sync/ - Content-addressed storage synchronization
  • modern_sync/ - Modern sync job
  • statistics_collector/ - Repository statistics collection

Jobs are distinct from servers (which handle client requests) and tools (which are run on-demand).

Tools (Command-Line Utilities)

The tools/ directory contains command-line utilities for operators and developers:

Administrative Tools

  • admin/ - Main admin CLI (primary operational tool)
  • testtool/ - Testing and debugging utility

Import/Export

  • blobimport/ - Import Mercurial repositories
  • import/ - Import utilities
  • repo_import/ - Repository import tool

Verification

  • aliasverify/ - Verify content-addressed aliases
  • bonsai_verify/ - Bonsai changeset verification
  • check_git_wc/ - Git working copy verification

Maintenance

  • packer/ - Packblob utilities
  • sqlblob_gc/ - SQL blobstore garbage collection
  • backfill_mapping/ - Backfill mapping tables

Other

  • streaming_clone/ - Streaming clone generation
  • executor/ - Task executor
  • example/ - Example tool
  • tail-to-cloudwatch/ - CloudWatch log tailing

The admin/ tool is the primary interface for most operational tasks. Other tools are more specialized.

Clients

The clients/ directory contains client tools and libraries for interacting with Mononoke servers:

  • clients/scsc/ - Source Control Service CLI client (scsc command-line tool)
  • clients/git_pushrebase/ - Git pushrebase client utilities
  • clients/facebook/ - Facebook-internal client implementations

These client tools provide command-line and programmatic interfaces to Mononoke's various services.

VCS Integration

Git Support - git/ directory

  • git_types/ - Git-specific type definitions
  • protocol/, packfile/, packetline/ - Git protocol implementation
  • gitimport/, gitexport/ - Import and export
  • import_direct/, import_tools/ - Import utilities
  • git_env/ - Git environment setup
  • bundle_uri/ - Bundle URI support
  • git-pool/ - Git object pooling
  • check_git_wc/ - Working copy checking

Mercurial Support - mercurial/ directory

  • Mercurial type definitions
  • Revlog support
  • Legacy compatibility

LFS Support

  • lfs_protocol/ - LFS protocol definitions
  • servers/lfs/lfs_server/ - LFS server implementation
  • lfs_import_lib/ - LFS import utilities

Configuration and Metadata

  • metaconfig/ - Repository metadata configuration (legacy name, should be config/)
  • mononoke_configs/ - Global configuration system
  • mononoke_macros/ - Rust procedural macros

Testing

Integration Tests - tests/integration/

  • .t test files (Mercurial-style test format)
  • Test fixtures in tests/fixtures/
  • Library scripts: library.sh, library-commit.sh, library-git-lfs.sh, etc.
  • BUCK file defines dott_test targets for each test suite
  • See tests/integration/README.md for comprehensive documentation

Test Utilities - tests/utils/

  • Shared testing utilities

Unit Tests: Embedded in each component's src/ directory (standard Rust practice)

Other Key Directories

Operations

  • observability/ - Observability infrastructure
  • rate_limiting/ - Rate limiting implementations
  • permission_checker/ - Permission checking (base component)
  • repo_authorization/ - Repository authorization

Additional Components

  • blobrepo/, blobrepo_utils/ - Legacy repository abstraction (being phased out in favor of facets)
  • repo_factory/ - Repository creation and initialization
  • manifest/ - Manifest handling utilities
  • megarepo_api/ - Megarepo (monorepo) operations
  • acl_regions/ - ACL region management
  • cas_client/ - Content-addressed storage client
  • cats/ - Configuration as a Service integration
  • repo_client/ - Repository client implementation
  • gotham_ext/ - Gotham web framework extensions
  • quiet_stream/ - Stream utilities
  • time_window_counter/ - Time-windowed counters
  • adaptive_rate_limiter/ - Adaptive rate limiting
  • benchmarks/ - Performance benchmarks
  • third_party/ - Third-party code

Facebook-Internal: facebook/ contains Facebook-specific implementations mirroring the main hierarchy.

OSS Builds: public_autocargo/ contains auto-generated Cargo.toml files for open-source builds.

Finding Specific Functionality

How to Find a Facet Implementation

  1. All facets are in repo_attributes/
  2. Facet names are descriptive (e.g., repo_identity, bookmarks, commit_graph)
  3. SQL storage for a facet is typically in a subdirectory with a database-related prefix within the facet (e.g., sql_commit_graph_storage, dbbookmarks)

Example: To find bookmark implementation:

  • Facet interface: repo_attributes/bookmarks/src/
  • SQL storage: Look for dbbookmarks within the bookmarks directory

How to Find a Derived Data Type

  1. All derived data types are in derived_data/
  2. Directory names match the derived data type name
  3. The manager and framework are in derived_data/manager/ and derived_data/remote/

Example: To find fsnodes derivation:

  • Implementation: derived_data/fsnodes/
  • Manager integration: Check derived_data/manager/ for registration

How to Find Tests

Unit tests: Look in the component's source directory

  • Tests are co-located with code in src/ files
  • May also be in component-specific test/ directories

Integration tests: Look in tests/integration/

  • Test files use .t extension
  • Named descriptively (e.g., test-pushrebase.t, test-gitimport.t)
  • Organized by functionality in BUCK file dott_test targets

Example: To find pushrebase tests:

  • Unit tests: features/pushrebase/src/ may contain #[test] functions
  • Integration tests: tests/integration/test-pushrebase*.t
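
The `test-pushrebase*.t` pattern above can be applied to a directory listing with a few lines of code. This is a sketch: `integration_tests_for` is a hypothetical helper, and it assumes test files follow the `test-<feature>*.t` naming shown in this section.

```python
from fnmatch import fnmatchcase


def integration_tests_for(feature: str, filenames: list[str]) -> list[str]:
    """Select .t files whose names match test-<feature>*.t (case-sensitive)."""
    pattern = f"test-{feature}*.t"
    return sorted(name for name in filenames if fnmatchcase(name, pattern))
```

Passing the output of `os.listdir("tests/integration")` as `filenames` would list the candidate tests for a feature; library scripts such as `library.sh` are excluded because they do not match the pattern.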

How to Find a Server's Implementation

Servers are organized under the servers/ directory by protocol:

  • SLAPI server code: servers/slapi/slapi_server/, servers/slapi/slapi_service/
  • Git server code: servers/git/git_server/
  • LFS server code: servers/lfs/lfs_server/
  • SCS server code: scs/scs_server/ (not yet migrated to servers/)
  • Binary definitions: Check component BUCK files

How to Find Storage Implementation

Blobstore backends: All in blobstore/ with descriptive names

  • File-based: blobstore/fileblob/
  • SQL-based: blobstore/sqlblob/
  • S3-based: blobstore/s3blob/
  • In-memory: blobstore/memblob/

Metadata storage: Look for SQL storage subdirectories within facets (often prefixed sql_ or db)

  • Bookmarks: repo_attributes/bookmarks/ area
  • Commit graph: repo_attributes/commit_graph/sql_commit_graph_storage/
  • Phases: Within repo_attributes/phases/

BUCK File Patterns

BUCK files follow consistent patterns:

load("@fbsource//tools/build_defs:rust_library.bzl", "rust_library")

oncall("scm_server_infra")

rust_library(
    name = "component_name",
    srcs = glob(["src/**/*.rs"]),
    autocargo = {"cargo_toml_dir": "component_name"},  # For OSS
    deps = [
        # Dependencies listed here
    ],
)

Key patterns:

  • oncall("scm_server_infra") appears in every file
  • autocargo metadata enables OSS builds
  • glob(["src/**/*.rs"]) is standard for sources
  • Dependencies use full paths (e.g., "//eden/mononoke/mononoke_types:mononoke_types")
  • Third-party deps: "fbsource//third-party/rust:crate_name"

Finding dependencies: Look at the deps field in a component's BUCK file to see what it uses.
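
When reading deps, it helps to split each label into its parts. The sketch below handles only the fully spelled-out forms shown above (`//path:name` and `fbsource//path:name`); `Target` and `parse_label` are hypothetical names, and treating a package that starts with `third-party/` as third-party is an assumption based on the example label.

```python
from typing import NamedTuple


class Target(NamedTuple):
    cell: str          # "" when no cell is given, otherwise e.g. "fbsource"
    package: str       # directory path between "//" and ":"
    name: str          # target name after ":"
    third_party: bool  # True when the package is under third-party/


def parse_label(label: str) -> Target:
    """Split a label such as "//eden/mononoke/mononoke_types:mononoke_types"."""
    cell, slashes, rest = label.partition("//")
    if not slashes:
        raise ValueError(f"not a full target label: {label!r}")
    package, colon, name = rest.partition(":")
    if not colon or not name:
        raise ValueError(f"label has no target name: {label!r}")
    return Target(cell, package, name, package.startswith("third-party/"))
```

For internal targets the `package` field is the directory to open next: `parse_label("//eden/mononoke/mononoke_types:mononoke_types").package` is `eden/mononoke/mononoke_types`.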

Integration tests: Use dott_test targets that reference test files and their binary dependencies.

Rules of Thumb

  1. Repository capabilities? → Look in repo_attributes/
  2. Source control operation? → Check features/, then top-level directories
  3. Storage backend? → Look in blobstore/
  4. Derived data type? → Look in derived_data/
  5. Server implementation? → servers/ directory (scs/ for the Source Control Service)
  6. Background job? → jobs/ directory
  7. CLI tool? → tools/ directory (especially tools/admin/)
  8. VCS-specific code? → git/ or mercurial/ directories
  9. Common utility? → common/ directory
  10. Test? → tests/integration/ for integration, src/ for unit tests
  11. Configuration? → metaconfig/ or mononoke_configs/
  12. Unknown? → Check the README.md in likely directories, or use grep/search
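
These rules amount to a lookup table. The sketch below encodes them as one; the key strings and the `where_to_look` helper are choices made for this example, while the directories come from this guide.

```python
# Concern -> directories to check first, following the rules of thumb.
RULES = {
    "repository capability": ["repo_attributes/"],
    "source control operation": ["features/"],
    "storage backend": ["blobstore/"],
    "derived data type": ["derived_data/"],
    "server implementation": ["servers/", "scs/"],
    "background job": ["jobs/"],
    "cli tool": ["tools/", "tools/admin/"],
    "vcs-specific code": ["git/", "mercurial/"],
    "common utility": ["common/"],
    "integration test": ["tests/integration/"],
    "configuration": ["metaconfig/", "mononoke_configs/"],
}


def where_to_look(concern: str) -> list[str]:
    """Return starting directories for a concern, or [] if no rule applies."""
    return RULES.get(concern.strip().lower(), [])
```

An empty result corresponds to the last rule: fall back to README.md files or a text search.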

Common Navigation Scenarios

"I need to modify bookmark behavior"

  1. Facet interface: repo_attributes/bookmarks/
  2. Bookmark updates: repo_attributes/bookmarks/bookmarks_movement/
  3. Caching: repo_attributes/bookmarks/bookmarks_cache/, repo_attributes/bookmarks/warm_bookmarks_cache/
  4. Storage: Look for SQL implementation within the bookmarks directory structure

"I need to add a new server endpoint"

  1. Identify the server: servers/slapi/, servers/git/, scs/, etc.
  2. Check protocol service: servers/slapi/slapi_service/, servers/slapi/wireproto_handler/, etc.
  3. Look at similar endpoints in that server's source
  4. Add integration tests in tests/integration/

"I need to add a new derived data type"

  1. Create new directory in derived_data/
  2. Look at existing types (e.g., derived_data/fsnodes/) as examples
  3. Implement derivation logic
  4. Register with framework in derived_data/manager/
  5. Add tests
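
The first step can be sketched as a small scaffold script that creates the directory skeleton described under "Two-Level Hierarchy" and a BUCK file following the pattern in "BUCK File Patterns". It covers only the file layout: `scaffold_component` is a hypothetical helper, the empty deps list and the `src/lib.rs` placeholder are assumptions, and derivation logic, framework registration, and tests still have to be written by hand.

```python
from pathlib import Path

# BUCK contents modeled on the pattern shown in "BUCK File Patterns".
BUCK_TEMPLATE = '''load("@fbsource//tools/build_defs:rust_library.bzl", "rust_library")

oncall("scm_server_infra")

rust_library(
    name = "{name}",
    srcs = glob(["src/**/*.rs"]),
    autocargo = {{"cargo_toml_dir": "{name}"}},
    deps = [],
)
'''


def scaffold_component(category_dir: Path, name: str) -> Path:
    """Create <category_dir>/<name>/ with src/lib.rs and a BUCK file.

    Raises FileExistsError if the component already exists, so nothing
    is overwritten.
    """
    component = category_dir / name
    (component / "src").mkdir(parents=True)
    (component / "src" / "lib.rs").write_text("// TODO: implement\n")
    (component / "BUCK").write_text(BUCK_TEMPLATE.format(name=name))
    return component
```

For a new derived data type, `category_dir` would be the `derived_data/` directory and `name` the type's directory name.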

"I'm debugging a storage issue"

  1. Identify the layer: blobstore or metadata database?
  2. Blobstore: blobstore/ + check decorators like cacheblob/, multiplexedblob/
  3. Metadata: Find the facet in repo_attributes/ and its SQL implementation
  4. Check walker/ for validation tools
  5. Check jobs/blobstore_healer/ for healing logic

"I need to understand a protocol"

  1. Git: git/protocol/, git/packfile/, git/packetline/
  2. Mercurial: hgproto/, servers/slapi/wireproto_handler/
  3. SLAPI: servers/slapi/slapi_service/
  4. LFS: lfs_protocol/, servers/lfs/lfs_server/
  5. SCS: scs/if/source_control.thrift for interface definition

Documentation Locations

  • High-level docs: docs/ (this document and architecture docs)
  • Component-specific docs: In component directories (e.g., walker/src/README.md, blobstore/packblob/README.md)
  • Type documentation: mononoke_types/docs/
  • Integration test docs: tests/integration/README.md

Summary

Mononoke's directory structure follows consistent principles:

  • Layered architecture: Base components → Repo attributes → Features → API → Servers/Tools
  • Two-level hierarchy: Category directories contain component subdirectories
  • Descriptive naming: Component names clearly indicate their purpose
  • Consistent patterns: BUCK files, source layout, and testing follow standards

When navigating the codebase:

  • Start with the layer that matches your concern
  • Use directory names as guides - they're descriptive
  • Check README.md files in major directories
  • Look at BUCK files to understand dependencies
  • Follow the patterns established in existing code

The codebase is large, but its organization is systematic. Understanding the layers and patterns makes it navigable.