Mercurial and Sapling Support

eden/mononoke/docs/5.2-mercurial-sapling-support.md

This document explains how Mononoke supports Mercurial and Sapling clients. Mononoke was originally designed as a scalable Mercurial backend and retains extensive support for Mercurial protocols and data formats. As Mercurial evolved into Sapling, Mononoke evolved alongside it, and today it primarily supports Sapling clients.

Historical Context

Mononoke originated as a scalable backend for Mercurial, which served as Meta's primary version control system. The initial design focused on supporting Mercurial clients while introducing Bonsai as an internal canonical format. This approach allowed Mononoke to scale beyond traditional Mercurial server implementations while preserving client compatibility.

As Mercurial development transitioned to the Sapling project, Mononoke continued serving clients through protocol evolution rather than architectural changes. The server maintained support for legacy Mercurial wire protocols while adding support for EdenAPI (now called SLAPI), an HTTP-based protocol that replaced older SSH-based communication.

Today, Mononoke serves primarily Sapling clients, and Mercurial wireproto support is being removed. For clarity, this document uses "Mercurial" to refer to the version control system and its data model, and "Sapling" to refer to the current client implementation. In the future we may evolve the data model as well, in which case we will likely refer to the new data model as the "Sapling" data model.

Mercurial Data Model

Mercurial uses a content-addressed data model based on nodes. Each node—whether a changeset, manifest, or file—is identified by a hash computed from its parents and content. This section describes the key Mercurial types that Mononoke handles.

Core Mercurial Types

Mercurial types are defined in mercurial/types/. The primary identifier types are:

HgChangesetId (mercurial/types/src/nodehash.rs)

  • Identifies a Mercurial changeset (commit)
  • SHA-1 hash computed from parent hashes and changeset content
  • 20-byte identifier, typically displayed in hexadecimal

HgManifestId (mercurial/types/src/nodehash.rs)

  • Identifies a Mercurial manifest (directory tree)
  • Computed from parent manifest hashes and directory contents
  • Forms a DAG independent of changesets
  • Note that Mononoke only supports tree manifests (not traditional Mercurial flat manifests)

HgFileNodeId (mercurial/types/src/nodehash.rs)

  • Identifies a file version (filenode)
  • Computed from parent filenode hashes and file content
  • Each file has its own history DAG

Node Computation

Mercurial node hashes follow a consistent pattern:

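node_hash = SHA1(min(p1_hash, p2_hash) || max(p1_hash, p2_hash) || content)

Where:

  • p1_hash is the first parent (the null hash, 20 zero bytes, if there is no parent)
  • p2_hash is the second parent (the null hash if there are fewer than two parents)
  • min and max compare the 20-byte hashes bytewise, so the parents are hashed in sorted order and swapping p1 and p2 does not change the result
  • content is the serialized content specific to the node type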

This computation creates content-addressed Merkle DAGs for changesets, manifests, and files. The same content with different parents produces different hashes, embedding history in the identity.
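The following Rust sketch shows the node computation. It hand-rolls SHA-1 only so the example has no dependencies, and it sorts the two parent IDs before hashing, which is how Mercurial's revlog hashing orders them (a missing parent is the 20-byte null ID). The names `sha1`, `hg_node_hash`, and `to_hex` are hypothetical and are not Mononoke's API.

```rust
/// Minimal SHA-1 (FIPS 180-4), included only to keep the sketch self-contained.
/// Real code would use a vetted hashing crate.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x6745_2301, 0xEFCD_AB89, 0x98BA_DCFE, 0x1032_5476, 0xC3D2_E1F0];
    // Padding: 0x80, zeros until length % 64 == 56, then the bit length (u64, big-endian).
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, expanded to 80.
        let mut w = [0u32; 80];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            let (f, k) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A82_7999u32),
                20..=39 => (b ^ c ^ d, 0x6ED9_EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1B_BCDC),
                _ => (b ^ c ^ d, 0xCA62_C1D6),
            };
            let t = a.rotate_left(5).wrapping_add(f).wrapping_add(e).wrapping_add(k).wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// The null node: 20 zero bytes, used for missing parents.
const NULL_ID: [u8; 20] = [0u8; 20];

/// Hypothetical helper: SHA-1 over the sorted parent IDs followed by the content.
fn hg_node_hash(p1: Option<[u8; 20]>, p2: Option<[u8; 20]>, content: &[u8]) -> [u8; 20] {
    let a = p1.unwrap_or(NULL_ID);
    let b = p2.unwrap_or(NULL_ID);
    // Bytewise lexicographic order: the smaller parent ID is hashed first.
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    let mut buf = Vec::with_capacity(40 + content.len());
    buf.extend_from_slice(&lo);
    buf.extend_from_slice(&hi);
    buf.extend_from_slice(content);
    sha1(&buf)
}

/// Lowercase hex rendering, the usual way node IDs are displayed.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}
```

Because the parents are sorted, `hg_node_hash(Some(a), Some(b), c)` equals `hg_node_hash(Some(b), Some(a), c)`, while changing either parent or the content changes the ID.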

Mercurial Structures

Changesets (mercurial/types/src/blobs/changeset/)

  • Contain commit metadata (author, date, message)
  • Reference a manifest ID (the root directory)
  • List parent changeset IDs (0-2 parents)
  • Include extra metadata (key-value pairs)

Manifests (mercurial/types/src/blobs/manifest.rs)

  • Map filenames to file IDs and metadata
  • Represent directory contents
  • Can reference sub-manifests for subdirectories
  • Stored as sorted lists of entries
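A minimal sketch of how one directory level of a tree manifest can be serialized, assuming the classic Mercurial manifest text layout: one line per entry containing the name, a NUL byte, the 40-character hex node ID, and a flag (empty for a regular file, `x` for executable, `l` for symlink, `t` for a subdirectory). `TreeManifest` and `EntryKind` are hypothetical names; the real types live in mercurial/types/src/blobs/manifest.rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical entry kinds, mapped to the one-character manifest flag.
#[derive(Clone, Copy)]
enum EntryKind {
    Regular,
    Executable,
    Symlink,
    Tree,
}

impl EntryKind {
    fn flag(self) -> &'static str {
        match self {
            EntryKind::Regular => "",
            EntryKind::Executable => "x",
            EntryKind::Symlink => "l",
            EntryKind::Tree => "t",
        }
    }
}

/// One directory level: name -> (node ID, kind).
/// BTreeMap iterates names in byte order, which yields the sorted entry list.
struct TreeManifest {
    entries: BTreeMap<String, ([u8; 20], EntryKind)>,
}

impl TreeManifest {
    /// Serialize each entry as "<name>\0<40 hex chars><flag>\n".
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        for (name, (node, kind)) in &self.entries {
            out.extend_from_slice(name.as_bytes());
            out.push(0);
            for byte in node {
                out.extend_from_slice(format!("{:02x}", byte).as_bytes());
            }
            out.extend_from_slice(kind.flag().as_bytes());
            out.push(b'\n');
        }
        out
    }
}
```

Since each subdirectory appears as a `t` entry carrying its own manifest ID, a change deep in the tree changes the IDs of every manifest on the path up to the root.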

Filenodes (mercurial/types/src/blobs/file.rs)

  • Represent specific file versions
  • Store file content or deltas
  • Include metadata (flags, copy information)
  • Form per-file history DAGs

Wire Formats

Mercurial wire protocols use several serialization formats:

Revlog Format (mercurial/revlog/)

  • On-disk storage format used by Mercurial
  • Contains compressed deltas and indexes
  • Mononoke can read revlog data for imports

Bundle2 Format (mercurial/bundles/)

  • Wire protocol format for exchanging data
  • Streams of parts containing changesets, manifests, files
  • Supports capabilities negotiation
  • Used for push and pull operations

Wirepack Format (mercurial/bundles/src/wirepack/)

  • Alternative wire format for file data
  • More efficient than changegroups for large files
  • Used by remotefilelog extension

Bonsai ↔ Mercurial Conversion

Mononoke stores commits internally as Bonsai changesets and converts to Mercurial format when serving clients. The BonsaiHgMapping facet maintains bidirectional mappings between these representations.

BonsaiHgMapping Facet

The BonsaiHgMapping facet (repo_attributes/bonsai_hg_mapping/) provides mapping operations:

Core Operations:

  • get_hg_from_bonsai - Convert Bonsai changeset ID to Mercurial changeset ID
  • get_bonsai_from_hg - Convert Mercurial changeset ID to Bonsai changeset ID
  • add - Store a new mapping entry
  • get - Batch lookup supporting both directions

Storage:

  • Mappings stored in metadata database
  • Cached for performance (via CachingBonsaiHgMapping)
  • Written during commit ingestion or derivation
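A minimal in-memory sketch of the mapping operations listed above. It is not the real facet: `InMemoryBonsaiHgMapping` and `IdSet` are hypothetical, IDs are plain byte arrays, and the metadata database and the CachingBonsaiHgMapping layer are left out. As an assumed design choice, re-adding an identical pair is a no-op and a conflicting pair is rejected.

```rust
use std::collections::HashMap;

// Hypothetical ID types: 32-byte Bonsai IDs, 20-byte Mercurial IDs.
type BonsaiId = [u8; 32];
type HgId = [u8; 20];

/// Either side of the mapping, so one batch lookup serves both directions.
enum IdSet {
    Bonsai(Vec<BonsaiId>),
    Hg(Vec<HgId>),
}

/// One map per direction keeps both lookups O(1).
#[derive(Default)]
struct InMemoryBonsaiHgMapping {
    bonsai_to_hg: HashMap<BonsaiId, HgId>,
    hg_to_bonsai: HashMap<HgId, BonsaiId>,
}

impl InMemoryBonsaiHgMapping {
    /// Store a mapping entry. Returns false if either ID is already mapped elsewhere.
    fn add(&mut self, bonsai: BonsaiId, hg: HgId) -> bool {
        let existing_hg = self.bonsai_to_hg.get(&bonsai).copied();
        let existing_bonsai = self.hg_to_bonsai.get(&hg).copied();
        match (existing_hg, existing_bonsai) {
            (None, None) => {
                self.bonsai_to_hg.insert(bonsai, hg);
                self.hg_to_bonsai.insert(hg, bonsai);
                true
            }
            // The identical pair is already present: idempotent success.
            (Some(h), Some(b)) => h == hg && b == bonsai,
            _ => false,
        }
    }

    fn get_hg_from_bonsai(&self, bonsai: &BonsaiId) -> Option<HgId> {
        self.bonsai_to_hg.get(bonsai).copied()
    }

    fn get_bonsai_from_hg(&self, hg: &HgId) -> Option<BonsaiId> {
        self.hg_to_bonsai.get(hg).copied()
    }

    /// Batch lookup in either direction; IDs without a mapping are skipped.
    fn get(&self, ids: &IdSet) -> Vec<(BonsaiId, HgId)> {
        match ids {
            IdSet::Bonsai(list) => list
                .iter()
                .filter_map(|b| self.get_hg_from_bonsai(b).map(|h| (*b, h)))
                .collect(),
            IdSet::Hg(list) => list
                .iter()
                .filter_map(|h| self.get_bonsai_from_hg(h).map(|b| (b, *h)))
                .collect(),
        }
    }
}
```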

Conversion Process

When a Mercurial client pushes commits:

  1. Receive Mercurial data - Server receives changesets in Mercurial format via wire protocol
  2. Convert to Bonsai - Transform Mercurial changeset structure to Bonsai format
  3. Store core data - Write Bonsai changeset and file contents to blobstore
  4. Create mapping - Store Bonsai ↔ Mercurial changeset ID mapping
  5. Store Mercurial format - Optionally preserve original Mercurial bytes

When a client requests data:

  1. Receive request - Client requests by Mercurial changeset ID
  2. Resolve to Bonsai - Look up corresponding Bonsai changeset ID
  3. Operate on Bonsai - Perform operations using Bonsai representation
  4. Derive Mercurial data - Generate Mercurial changeset if needed
  5. Return Mercurial format - Convert response to Mercurial wire format
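The read path can be sketched with a toy repository. This assumes `u64` stand-ins for real IDs and uses `derive_hg` as a placeholder for MappedHgChangesetId derivation: it derives parents first and mixes their IDs in, mirroring how a Mercurial ID depends on its parents, then records the result in both mapping directions. `Repo`, `derive_hg`, and `hg_parents` are hypothetical names.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: real IDs are 32-byte Bonsai and 20-byte Mercurial hashes.
type BonsaiId = u64;
type HgId = u64;

struct Repo {
    /// Bonsai commit graph: commit -> parents.
    parents: HashMap<BonsaiId, Vec<BonsaiId>>,
    /// Bonsai <-> Mercurial mapping, filled lazily by derivation.
    bonsai_to_hg: HashMap<BonsaiId, HgId>,
    hg_to_bonsai: HashMap<HgId, BonsaiId>,
}

impl Repo {
    /// Placeholder derivation: deterministic, and parents are derived first
    /// because a Mercurial ID depends on its parents' IDs.
    fn derive_hg(&mut self, cs: BonsaiId) -> HgId {
        if let Some(&hg) = self.bonsai_to_hg.get(&cs) {
            return hg; // already derived
        }
        let parents = self.parents.get(&cs).cloned().unwrap_or_default();
        let parent_hgs: Vec<HgId> = parents.into_iter().map(|p| self.derive_hg(p)).collect();
        // Toy mixing function standing in for the real SHA-1 computation.
        let mut hg = cs.wrapping_mul(0x9E37_79B9_7F4A_7C15);
        for p in parent_hgs {
            hg = hg.rotate_left(7) ^ p;
        }
        self.bonsai_to_hg.insert(cs, hg);
        self.hg_to_bonsai.insert(hg, cs);
        hg
    }

    /// Answer "what are the parents of this Mercurial commit?".
    fn hg_parents(&mut self, requested: HgId) -> Option<Vec<HgId>> {
        // Resolve to Bonsai; an unknown Mercurial ID yields None.
        let cs = *self.hg_to_bonsai.get(&requested)?;
        // Operate on the Bonsai graph.
        let parents = self.parents.get(&cs).cloned().unwrap_or_default();
        // Derive (or reuse) Mercurial IDs for the result.
        Some(parents.into_iter().map(|p| self.derive_hg(p)).collect())
    }
}
```

A production implementation would avoid unbounded recursion over long histories, but the shape is the same: requests arrive with Mercurial IDs, work happens on Bonsai, and results are translated back.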

Conversion Characteristics

The conversion between Bonsai and Mercurial has the following characteristics:

Not Bijective - Multiple Bonsai changesets can map to the same Mercurial changeset ID if they differ only in metadata not represented in Mercurial format. In practice, mappings are effectively one-to-one due to compatibility constraints.

Deterministic - Given a Bonsai changeset, the derived Mercurial changeset ID is deterministic. This allows regeneration and verification.

Async Derivation - Mercurial representations can be derived asynchronously after Bonsai commits are stored. See Mercurial-Specific Derived Data below.

Mercurial-Specific Derived Data

Several derived data types exist specifically to support Mercurial and Sapling clients. These types provide data in formats expected by Mercurial protocols.

Mapped Mercurial Changeset

MappedHgChangesetId (derived_data/mercurial_derivation/)

This derived data type generates a Mercurial changeset from a Bonsai changeset. The derivation:

  1. Converts Bonsai file changes to Mercurial manifest format
  2. Derives the manifest tree (see below)
  3. Constructs Mercurial changeset bytes with metadata
  4. Computes the Mercurial changeset ID
  5. Stores the mapping in BonsaiHgMapping

This type is required for serving Mercurial clients and is typically derived on-demand or by background workers.
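Step 3 can be illustrated with a sketch of the changeset text that is later hashed. It assumes the classic Mercurial changelog layout: the root manifest ID in hex, the user, a line with the timestamp, the timezone offset in seconds, and NUL-separated `key:value` extras, then the sorted list of touched files, a blank line, and the message. `HgChangesetFields`, `escape`, and `serialize_changeset` are hypothetical names, not Mononoke's. Hashing these bytes together with the parent changeset IDs gives the changeset ID (step 4).

```rust
use std::collections::BTreeMap;

/// Hypothetical bundle of the fields that go into the changeset text.
struct HgChangesetFields<'a> {
    manifest_hex: &'a str,             // 40-char hex root manifest ID
    user: &'a str,
    time: i64,                         // seconds since the Unix epoch
    tz_offset: i32,                    // seconds west of UTC
    extra: BTreeMap<&'a str, &'a str>, // BTreeMap keeps keys sorted
    files: Vec<&'a str>,               // paths touched by the commit
    message: &'a str,
}

/// Escape characters that would break the NUL-separated extras encoding.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\")
        .replace('\n', "\\n")
        .replace('\r', "\\r")
        .replace('\0', "\\0")
}

/// Lines: manifest, user, "time tz[ extras]", sorted files, blank line, message.
fn serialize_changeset(cs: &HgChangesetFields<'_>) -> String {
    let mut date = format!("{} {}", cs.time, cs.tz_offset);
    if !cs.extra.is_empty() {
        let extras: Vec<String> = cs
            .extra
            .iter()
            .map(|(k, v)| escape(&format!("{k}:{v}")))
            .collect();
        date.push(' ');
        date.push_str(&extras.join("\0"));
    }
    let mut files = cs.files.clone();
    files.sort_unstable();
    let mut lines = vec![cs.manifest_hex.to_string(), cs.user.to_string(), date];
    lines.extend(files.into_iter().map(str::to_string));
    lines.push(String::new());
    lines.push(cs.message.to_string());
    lines.join("\n")
}
```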

Mercurial Augmented Manifests

RootHgAugmentedManifestId (derived_data/mercurial_derivation/)

Augmented manifests are content-addressed directory trees in Mercurial format with additional metadata. Unlike traditional Mercurial manifests, augmented manifests include:

  • File sizes
  • Content hashes (SHA-1 and SHA-256)
  • File type information

Augmented manifests enable efficient serving of tree data to Sapling clients without requiring full manifest computation. The derivation constructs Mercurial-compatible tree structures from Bonsai file changes.

Augmented manifests are stored in the blobstore using sharded storage for large directories. See mercurial/types/src/sharded_augmented_manifest.rs for the data structure.

Filenodes

Filenodes (derived_data/filenodes_derivation/)

Filenodes represent Mercurial's per-file history tracking. Each filenode records:

  • File path
  • File node ID (Mercurial file hash)
  • Changeset ID (linkrev - the changeset that introduced this version)
  • Parent file node IDs

Filenodes are required for Mercurial wire protocol compatibility, particularly for operations like getfiles and getpackv1. The filenode derivation:

  1. Examines file changes in the Bonsai changeset
  2. Computes Mercurial file node IDs based on content and parents
  3. Records the linkrev (changeset containing the file change)
  4. Stores filenode information in the metadata database

Legacy filenode implementation is in repo_attributes/filenodes/, while the derived data implementation is in derived_data/filenodes_derivation/.

Derivation Dependencies

Mercurial-specific derived data types have dependencies:

  • MappedHgChangesetId depends on Fsnodes (for manifest structure)
  • RootHgAugmentedManifestId depends on Fsnodes
  • Filenodes depend on MappedHgChangesetId (for linkrevs)

These dependencies ensure that Mercurial data can be derived consistently from Bonsai commits.
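Given dependency edges like these, a valid derivation order is a topological sort. The sketch below is a plain depth-first sort over type names, fed with only the three edges listed above; `derivation_order` is a hypothetical helper and does not reflect how Mononoke's derived data manager is implemented.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: order derived data types so each type comes after
/// everything it depends on (depth-first topological sort). Errors on cycles.
fn derivation_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<&'a str>, String> {
    fn visit<'a>(
        ty: &'a str,
        deps: &BTreeMap<&'a str, Vec<&'a str>>,
        done: &mut BTreeSet<&'a str>,
        in_progress: &mut BTreeSet<&'a str>,
        order: &mut Vec<&'a str>,
    ) -> Result<(), String> {
        if done.contains(&ty) {
            return Ok(());
        }
        if !in_progress.insert(ty) {
            return Err(format!("dependency cycle involving {ty}"));
        }
        // Visit dependencies first; types with no entry have no dependencies.
        for &dep in deps.get(&ty).map(Vec::as_slice).unwrap_or(&[]) {
            visit(dep, deps, done, in_progress, order)?;
        }
        in_progress.remove(&ty);
        done.insert(ty);
        order.push(ty);
        Ok(())
    }

    let (mut done, mut in_progress, mut order) = (BTreeSet::new(), BTreeSet::new(), Vec::new());
    for &ty in deps.keys() {
        visit(ty, deps, &mut done, &mut in_progress, &mut order)?;
    }
    Ok(order)
}
```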

Protocol Support

Mononoke supports multiple protocols for Mercurial and Sapling clients, reflecting the evolution of client-server communication.

EdenAPI (SLAPI)

EdenAPI, now called SLAPI (Sapling Remote API), is an HTTP-based protocol that replaced older Mercurial wire protocols. The implementation is in servers/slapi/slapi_service/.

Protocol Characteristics:

  • HTTP/2 based
  • Structured requests and responses (CBOR or JSON serialization)
  • Streaming support for large responses
  • Authentication via headers
  • Supports concurrent requests

Major Endpoints (servers/slapi/slapi_service/src/handlers/):

  • Files (files.rs) - Fetch file contents by content hash or path
  • Trees (trees.rs) - Fetch directory manifests
  • Commit (commit.rs) - Upload commits and push operations
  • History (history.rs) - Fetch file or directory history
  • Bookmarks (bookmarks.rs) - Fetch bookmark values
  • Lookup (lookup.rs) - Resolve commit hashes and bookmarks
  • Blame (blame.rs) - Fetch blame annotations
  • Land (land.rs) - Landing (merge) operations
  • Commit Cloud (commit_cloud.rs) - Sync uncommitted work across machines

Serving Process:

  1. Client sends HTTP request to specific endpoint
  2. Middleware authenticates and extracts metadata
  3. Handler deserializes request
  4. Handler accesses repository via mononoke_api layer
  5. Repository operations use facets to fetch Bonsai data
  6. Derived data is generated if needed (Mercurial changesets, manifests)
  7. Response is serialized and streamed to client

EdenAPI/SLAPI is the primary protocol used by modern Sapling clients and EdenFS.

Legacy Wire Protocol

The legacy Mercurial wire protocol (servers/slapi/wireproto_handler/) supports older Mercurial clients using SSH-based communication. This protocol:

  • Uses bundle2 format for data exchange
  • Supports capabilities negotiation
  • Handles pushes via changegroup bundles
  • Provides pull operations via getbundle command

The wire protocol handler converts between bundle2 format and Mononoke's internal operations. While still supported, most clients have migrated to EdenAPI/SLAPI.

Remotefilelog

Remotefilelog is a Mercurial extension that avoids downloading full file history by fetching file contents on-demand. Mononoke supports remotefilelog through:

  • Getpack Endpoint - Fetches file contents in wirepack format
  • Getcommitdata - Fetches commit metadata without manifests
  • Gettreepack - Fetches tree manifests without file contents

Remotefilelog support is integrated into both the wire protocol handler and EdenAPI. File data is served from the blobstore, with Mercurial file node IDs resolved via filenodes derived data.

Bundle Formats

Mercurial uses bundle formats to transfer data between client and server. Mononoke supports reading and writing these formats for protocol compatibility.

Bundle2

Bundle2 (mercurial/bundles/) is the primary wire protocol format. A bundle2 stream consists of:

  • Header - Protocol version and capabilities
  • Parts - Typed chunks of data (changesets, manifests, files)
  • Parameters - Metadata about each part
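A minimal encoder makes the layout concrete. This sketch assumes the framing described in Mercurial's bundle2 specification: the `HG20` magic, an `int32` size for stream-level parameters, then for each part an `int32` header size, a header (type length, type, `u32` part ID, and mandatory/advisory parameter counts), and a payload sent as size-prefixed chunks ending with an empty chunk; a header size of zero ends the stream. All integers are big-endian. Part parameters, compression, and interrupts are omitted, and `encode_bundle2` is a hypothetical name, not the encoder in mercurial/bundles/.

```rust
/// Hypothetical minimal bundle2 encoder: one part per (type, id, payload),
/// no stream or part parameters.
fn encode_bundle2(parts: &[(&str, u32, &[u8])]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(b"HG20");
    out.extend_from_slice(&0i32.to_be_bytes()); // stream parameters: none
    for &(part_type, part_id, payload) in parts {
        // Part header: type length (u8), type, part ID (u32),
        // mandatory and advisory parameter counts (u8 each).
        let mut header = Vec::new();
        header.push(part_type.len() as u8);
        header.extend_from_slice(part_type.as_bytes());
        header.extend_from_slice(&part_id.to_be_bytes());
        header.push(0); // mandatory parameter count
        header.push(0); // advisory parameter count
        out.extend_from_slice(&(header.len() as i32).to_be_bytes());
        out.extend_from_slice(&header);
        // Payload as size-prefixed chunks, terminated by an empty chunk.
        for chunk in payload.chunks(4096) {
            out.extend_from_slice(&(chunk.len() as i32).to_be_bytes());
            out.extend_from_slice(chunk);
        }
        out.extend_from_slice(&0i32.to_be_bytes());
    }
    out.extend_from_slice(&0i32.to_be_bytes()); // header size 0: end of stream
    out
}
```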

Common Parts:

  • Changegroup - Contains changesets, manifests, and file deltas
  • Obsmarkers - Obsolescence markers (deprecated commits)
  • Pushkey - Bookmark updates
  • Reply parts - Server responses to client requests

Bundle2 parsing (mercurial/bundles/src/bundle2_encode.rs) handles:

  • Stream decompression
  • Part deserialization
  • Capability negotiation
  • Error handling

Changegroup Format

Changegroups (mercurial/bundles/src/changegroup/) are sequences of deltas for changesets, manifests, and files. Each entry contains:

  • Node ID
  • Parent node IDs
  • Delta or full content
  • Linkrev (for files and manifests)

Changegroup unpacking (unpacker.rs) reconstructs full content from deltas, while packing (packer.rs) generates efficient delta sequences for transmission.
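Delta payloads use Mercurial's binary patch encoding. Assuming that encoding (a series of hunks, each a big-endian `start`, `end`, and `length` followed by `length` bytes that replace `base[start..end]`), reconstructing one entry from its delta base looks roughly like this. `apply_delta` is a hypothetical name, and choosing the correct delta base for each entry is not shown.

```rust
/// Hypothetical helper: apply a Mercurial binary delta to `base`.
/// Hunks must be sorted by `start` and must not overlap.
fn apply_delta(base: &[u8], delta: &[u8]) -> Result<Vec<u8>, String> {
    let read_u32 = |at: usize| -> Result<usize, String> {
        delta
            .get(at..at + 4)
            .map(|b| u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as usize)
            .ok_or_else(|| "truncated hunk header".to_string())
    };
    let mut out = Vec::with_capacity(base.len());
    let mut last = 0; // end of the previous hunk in `base`
    let mut pos = 0; // cursor in `delta`
    while pos < delta.len() {
        let (start, end, len) = (read_u32(pos)?, read_u32(pos + 4)?, read_u32(pos + 8)?);
        pos += 12;
        if start < last || start > end || end > base.len() {
            return Err(format!("bad hunk {start}..{end}"));
        }
        let data = delta.get(pos..pos + len).ok_or("truncated hunk data")?;
        out.extend_from_slice(&base[last..start]); // unchanged bytes before the hunk
        out.extend_from_slice(data); // replacement bytes
        last = end;
        pos += len;
    }
    out.extend_from_slice(&base[last..]); // unchanged tail
    Ok(out)
}
```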

Wirepack Format

Wirepack (mercurial/bundles/src/wirepack/) is an alternative to changegroups used primarily for file data. Wirepack format:

  • Streams file content with metadata
  • Avoids delta chains
  • More efficient for large files
  • Used by remotefilelog and EdenAPI

Migration and Compatibility

Mononoke maintains compatibility with both legacy Mercurial clients and modern Sapling clients through careful protocol support and data format handling.

Protocol Version Support

Mononoke servers negotiate capabilities with clients:

  • Legacy clients - Use bundle2 wire protocol over SSH
  • Remotefilelog clients - Use getpack endpoints over HTTP
  • Sapling clients - Use EdenAPI/SLAPI over HTTP

Capability negotiation allows servers to advertise supported features while clients can fall back to compatible operation modes.
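A toy model of negotiation with fallback: intersect what the server advertises with what the client understands, then choose the most preferred mode in the shared set. The capability names (`slapi`, `getpack`, `bundle2`), the preference order, and `negotiate` itself are illustrative assumptions, not Mononoke's actual capability strings or code.

```rust
use std::collections::BTreeSet;

/// Hypothetical negotiation: keep the capabilities both sides share, then
/// pick the most preferred transport mode, falling back in order.
fn negotiate<'a>(server: &BTreeSet<&'a str>, client: &BTreeSet<&'a str>) -> (BTreeSet<&'a str>, &'static str) {
    let shared: BTreeSet<&'a str> = server.intersection(client).copied().collect();
    // Illustrative preference order; the names are placeholders.
    let mode = if shared.contains("slapi") {
        "slapi"
    } else if shared.contains("getpack") {
        "getpack"
    } else {
        "bundle2"
    };
    (shared, mode)
}
```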

Client Type Detection

The server identifies client type through:

  • User-agent headers (for HTTP requests)
  • Capability strings (for wire protocol)
  • Requested endpoints (EdenAPI vs legacy commands)

This allows serving different clients from the same server infrastructure.

Data Format Compatibility

Mononoke ensures Mercurial data format compatibility by:

Preserving Hashes - Derived Mercurial changeset IDs match what a native Mercurial server would produce for the same commits. This ensures clients can verify data integrity.

Supporting Legacy Formats - Bundle2, changegroup, and wirepack formats are fully supported for clients that require them.

Maintaining Mapping Consistency - The BonsaiHgMapping provides stable Bonsai ↔ Mercurial ID mappings across server restarts and derivation operations.

Evolution from Mercurial to Sapling

The transition from Mercurial to Sapling occurred gradually, with Mononoke adapting to serve both client types:

Protocol Migration - The shift from SSH-based wire protocol to HTTP-based EdenAPI improved performance and simplified deployment. Mononoke added EdenAPI support while maintaining wire protocol compatibility.

Client Rebranding - As Mercurial clients were rebranded to Sapling, the server continued serving the same protocols with updated user-agent strings.

Feature Additions - New features like Commit Cloud and suffix query were added to EdenAPI without breaking compatibility with older clients.

Deprecation Path - Legacy wire protocol support remains for compatibility but is not actively developed. New features are added to SLAPI.

Implementation Locations

Mercurial and Sapling support is distributed across several directories:

Core Types and Formats:

  • mercurial/types/ - Mercurial type definitions
  • mercurial/bundles/ - Bundle2 and wire format handling
  • mercurial/revlog/ - Revlog format support
  • mercurial/mutation/ - Mutation (history editing) tracking

Mappings:

  • repo_attributes/bonsai_hg_mapping/ - Bonsai ↔ Mercurial mapping facet

Derived Data:

  • derived_data/mercurial_derivation/ - Mercurial changeset and manifest derivation
  • derived_data/filenodes_derivation/ - Filenode derivation
  • repo_attributes/filenodes/ - Legacy filenode storage

Protocols:

  • servers/slapi/slapi_service/ - EdenAPI/SLAPI HTTP server
  • servers/slapi/wireproto_handler/ - Legacy wire protocol handler

Server:

  • servers/slapi/slapi_server/ - Main Mononoke server (serves SLAPI)

Component-specific documentation for Mercurial types, protocols, and derived data lives in the respective directories.