eden/mononoke/docs/5.1-git-support.md

Git Support

This document explains how Mononoke supports Git clients and repositories. Git support allows standard Git clients to interact with Mononoke-backed repositories through the Git protocol while maintaining Bonsai as the canonical internal representation.

Overview

Mononoke provides Git protocol support through several components:

  • Git server - HTTP-based server implementing the Git smart protocol
  • Format conversion - Bidirectional mapping between Bonsai changesets and Git commits
  • Git-specific derived data - Pre-computed Git trees and delta manifests
  • Reference management - Git symbolic refs, tags, and branch handling
  • LFS server - Large file storage compatible with Git LFS protocol
  • Source of truth tracking - Migration support for repositories transitioning between version control systems

This integration enables Git clients to clone, fetch, and push to Mononoke repositories using standard Git operations, while Mononoke maintains a single canonical representation in Bonsai format.

Git Protocol Server

The Git server (servers/git/git_server/) implements the Git smart HTTP protocol, allowing standard Git clients to interact with Mononoke repositories.

Protocol Implementation

The server handles Git protocol operations through several layers:

Protocol parsing (git/protocol/) - Implements Git's pack protocol, including negotiation, packfile generation, and reference advertisement.

Packfile handling (git/packfile/) - Generates and parses Git packfiles containing commits, trees, and blobs. Packfiles are the primary data transfer format in the Git protocol.

Packet-line format (git/packetline/) - Implements Git's packet-line protocol used for framing data in Git communication.
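
The two framing formats described above can be illustrated with a minimal sketch. It follows the public Git protocol documentation (a pkt-line is a 4-digit hexadecimal length that counts itself, followed by the payload; a packfile starts with the signature PACK, a version, and an object count). The function names are hypothetical and are not taken from the Mononoke crates.

```python
import struct

FLUSH_PKT = b"0000"  # zero-length packet that ends a section of the stream


def encode_pkt_line(payload: bytes) -> bytes:
    """Frame a payload as one pkt-line: 4 hex digits of total length, then data."""
    length = len(payload) + 4  # the length prefix counts its own 4 bytes
    if length > 65520:
        raise ValueError("payload too large for a single pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def decode_pkt_lines(stream: bytes) -> list:
    """Split a byte stream into payloads; a flush packet is reported as None."""
    packets, pos = [], 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            packets.append(None)  # flush-pkt carries no payload
            pos += 4
        elif length < 4:
            # 0001-0003 are reserved markers (used by protocol v2); not handled here
            raise ValueError(f"unsupported special packet {length:04x}")
        else:
            packets.append(stream[pos + 4:pos + length])
            pos += length
    return packets


def build_pack_header(object_count: int, version: int = 2) -> bytes:
    """12-byte packfile header: b'PACK', version, object count (big-endian)."""
    return struct.pack(">4sII", b"PACK", version, object_count)


def parse_pack_header(data: bytes) -> tuple:
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count
```

For example, `encode_pkt_line(b"hello\n")` yields `b"000ahello\n"`, because the six payload bytes plus the four-byte prefix give a length of ten.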

Supported Operations

The Git server handles standard Git operations:

Clone - Advertises available references and generates a packfile containing the requested commits, trees, and blobs. The server converts Bonsai changesets to Git commits on demand during packfile generation.

Fetch - Determines which objects the client needs based on negotiation and generates an incremental packfile. Uses commit graph traversal to identify objects not present on the client.

Push - Receives a packfile from the client, extracts Git objects, converts Git commits to Bonsai changesets, and updates references after validation and hook execution.

ls-remote - Lists available references (branches and tags) with their current commit SHAs.

Request Routing

The Git server uses Gotham-based HTTP routing (servers/git/git_server/src/service/) to handle protocol endpoints. Upload-pack requests (fetch/clone) and receive-pack requests (push) are routed to separate handlers that manage the respective protocol flows.

Bonsai-Git Conversion

Mononoke maintains a bidirectional mapping between Bonsai changesets and Git commits, enabling conversion in both directions.

Git Commits as Derived Data

Git commits are implemented as a derived data type (MappedGitCommitId) in git/git_types/src/derive_commit.rs. When a Bonsai changeset needs to be served to Git clients, the corresponding Git commit is derived.

The derivation process:

  1. Converts Bonsai author and date to Git signature format
  2. Handles committer information (Git always requires a committer)
  3. Translates file changes to Git tree structure
  4. Generates Git commit object with appropriate parent references
  5. Computes Git SHA-1 hash
  6. Stores the commit object in the blobstore
  7. Records the Bonsai ↔ Git SHA mapping
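
Steps 1 to 5 can be sketched in a few lines. The object layout and hashing rule (SHA-1 over the header `<type> <size>\0` followed by the body) come from Git's object format; the function names, the sign convention for the timezone offset (minutes east of UTC), and the fallback from a missing committer to the author are assumptions made for illustration, not a description of derive_commit.rs.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Git object ID: SHA-1 over '<type> <size>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def format_signature(name: str, email: str, timestamp: int, tz_offset_minutes: int) -> str:
    """Render 'Name <email> <unix-seconds> <+hhmm>' as used in commit headers."""
    sign = "-" if tz_offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(tz_offset_minutes), 60)
    return f"{name} <{email}> {timestamp} {sign}{hours:02d}{minutes:02d}"


def build_commit(tree_id, parent_ids, author, committer, message) -> bytes:
    """Serialize a commit body; author and committer are preformatted signatures."""
    if committer is None:
        committer = author  # assumption: reuse the author when no committer is recorded
    lines = [f"tree {tree_id}"]
    lines += [f"parent {parent}" for parent in parent_ids]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


# Usage: a root commit that points at the well-known empty tree.
EMPTY_TREE = git_object_id("tree", b"")
signature = format_signature("A U Thor", "author@example.com", 1700000000, -330)
commit_body = build_commit(EMPTY_TREE, [], signature, None, "Initial commit\n")
commit_id = git_object_id("commit", commit_body)
```

Because the ID is a hash of the serialized bytes, any difference in metadata conversion (for instance a different timezone rendering) produces a different Git SHA, which is why the conversion has to be deterministic.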

Metadata Conversion

Bonsai changeset metadata maps to Git commit fields:

Author and committer - Bonsai author and optional committer are converted to Git signature format (name, email, timestamp with timezone).

Commit message - Preserved directly. Extra metadata fields specific to other version control systems are omitted.

Parents - Parent Bonsai changesets are mapped to their corresponding Git commits through the mapping table.

File Changes to Tree Objects

Bonsai's flat list of file changes is converted to Git's hierarchical tree structure. Each directory level becomes a Git tree object referencing either subtrees (directories) or blobs (files).

File modes:

  • Regular files → mode 100644
  • Executable files → mode 100755
  • Symlinks → mode 120000
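
As an illustration of the flat-to-hierarchical conversion, the sketch below builds Git tree objects from a flat mapping of paths to file contents and returns the root tree ID. The tree entry encoding (`<mode> <name>\0<20-byte id>`, directories written with mode 40000 and sorted as if their names ended in "/") follows Git's object format. The input shape and function names are assumptions, and the sketch rebuilds every tree from the full file list rather than applying incremental changes.

```python
import hashlib

FILE_MODES = {"regular": b"100644", "executable": b"100755", "symlink": b"120000"}
TREE_MODE = b"40000"  # directory entries; Git writes this mode without a leading zero


def object_id(obj_type: bytes, body: bytes) -> bytes:
    """Raw 20-byte Git object ID for the given type and body."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).digest()


def build_tree(files: dict) -> bytes:
    """files maps 'dir/sub/name' to (kind, content); returns the root tree ID."""
    blobs, subdirs = {}, {}
    for path, (kind, content) in files.items():
        head, _, rest = path.partition("/")
        if rest:
            # Deeper path: hand the remainder to the subdirectory's own tree.
            subdirs.setdefault(head, {})[rest] = (kind, content)
        else:
            blobs[head] = (kind, content)

    entries = []
    for name, (kind, content) in blobs.items():
        entries.append((name.encode("utf-8"), FILE_MODES[kind], object_id(b"blob", content)))
    for name, children in subdirs.items():
        entries.append((name.encode("utf-8"), TREE_MODE, build_tree(children)))

    # Git orders entries by name, comparing directories as if they ended in "/".
    entries.sort(key=lambda entry: entry[0] + (b"/" if entry[1] == TREE_MODE else b""))
    body = b"".join(mode + b" " + name + b"\0" + oid for name, mode, oid in entries)
    return object_id(b"tree", body)


# Usage: two files in nested directories plus one at the root.
root = build_tree({
    "README.md": ("regular", b"docs\n"),
    "bin/run.sh": ("executable", b"#!/bin/sh\n"),
    "bin/latest": ("symlink", b"run.sh"),
})
```

A symlink is stored as a blob whose content is the link target, so the only difference from a regular file at the tree level is the mode.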

Content addressing - File contents are shared between Bonsai and Git representations. The Git blob hash is stored in Bonsai's content metadata.

Copy/move information - Git's object model does not represent copy or rename metadata (Git infers these during diff operations). This information is not included in Git commits but is preserved in the Bonsai representation.

Mapping Storage

The BonsaiGitMapping facet (repo_attributes/bonsai_git_mapping/) maintains the bidirectional mapping between Bonsai changeset IDs and Git commit SHAs. This mapping is stored in a SQL table and cached for performance.

The mapping is populated during:

  • Git import (when existing Git repositories are imported into Mononoke)
  • Git push (when new Git commits are converted to Bonsai)
  • Git commit derivation (when Bonsai changesets are converted to Git)

Additional mappings are provided by:

  • BonsaiTagMapping - Maps Git annotated tag objects to Bonsai
  • BonsaiBlobMapping - Maps Git blob objects to file content IDs
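
The contract of a bidirectional mapping can be shown with a small in-memory stand-in. This is a sketch only: the class and method names are hypothetical, and the real facet is backed by a SQL table with a cache in front of it rather than two dictionaries.

```python
class InMemoryBonsaiGitMapping:
    """Toy bidirectional map between Bonsai changeset IDs and Git commit SHAs."""

    def __init__(self):
        self._bonsai_to_git = {}
        self._git_to_bonsai = {}

    def add(self, bonsai_id: str, git_sha: str) -> None:
        """Record a pair; re-adding the same pair is a no-op, conflicts are errors."""
        known_git = self._bonsai_to_git.get(bonsai_id)
        known_bonsai = self._git_to_bonsai.get(git_sha)
        if known_git not in (None, git_sha) or known_bonsai not in (None, bonsai_id):
            raise ValueError("conflicting mapping entry")
        self._bonsai_to_git[bonsai_id] = git_sha
        self._git_to_bonsai[git_sha] = bonsai_id

    def git_sha_for(self, bonsai_id: str):
        """Lookup used when serving Git clients (Bonsai to Git)."""
        return self._bonsai_to_git.get(bonsai_id)

    def bonsai_for(self, git_sha: str):
        """Lookup used when handling pushes and imports (Git to Bonsai)."""
        return self._git_to_bonsai.get(git_sha)
```

Rejecting conflicting entries keeps the relation one-to-one, which is what allows parents to be translated in either direction without ambiguity.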

Import and Export

gitimport (git/gitimport/) - Imports existing Git repositories into Mononoke by converting Git commits to Bonsai changesets. Preserves commit graph structure and reference positions.

gitexport (git/gitexport/) - Exports Mononoke repositories in Git format. Used for migration scenarios and backup.

import_tools (git/import_tools/) - Utilities supporting import operations, including validation and metadata handling.

Git-Specific Derived Data

Beyond Git commits, Mononoke derives additional Git-specific data to optimize protocol operations.

Git Trees

Git tree objects (GitTreeId) represent directory structures. These are derived alongside Git commits, creating the hierarchical tree structure from Bonsai's flat file change representation.

Delta Manifests

Delta manifests optimize Git packfile generation by pre-computing object delta information. Generating deltas on demand for every fetch would be prohibitively expensive for large repositories.

GitDeltaManifestV2 (git/git_types/src/delta_manifest_v2.rs) - Tracks potential delta bases for each object in a commit, reducing packfile generation time.

GitDeltaManifestV3 (git/git_types/src/delta_manifest_v3.rs) - Enhanced version with chunking for better performance.

CompactedGitDeltaManifest (git/git_types/src/compacted_delta_manifest.rs) - For repositories with many small commits, fetching individual delta manifests per commit creates overhead. The compacted format aggregates delta information across multiple commits, reducing the number of blobstore fetches during packfile generation.

These derived data types are computed asynchronously off the critical write path, similar to other derived data in Mononoke.

Bundle URI Support

Git's bundle URI protocol allows servers to provide pre-generated repository bundles for efficient full clones. Mononoke can pre-compute bundles (git/bundle_uri/) and serve them to clients, avoiding the cost of generating packfiles for complete repository clones.

Git Reference Management

Git uses references (refs) to track branches, tags, and other named pointers. Mononoke provides several facets to manage Git reference semantics.

Symbolic References

The GitSymbolicRefs facet (repo_attributes/git_symbolic_refs/) handles Git symbolic references, which are references that point to other references rather than directly to commits.

The most common symbolic reference is HEAD, which typically points to the current branch (e.g., HEAD → refs/heads/main). The facet stores the symbolic reference name, target reference name, and target type (branch or tag).
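
Resolving a symbolic reference amounts to following name-to-name links until a direct reference is reached. The sketch below assumes two plain dictionaries (symbolic name to target name, and reference name to commit SHA); these shapes and the function name are illustrative and do not mirror the facet's storage schema.

```python
def resolve_ref(name: str, symbolic_refs: dict, direct_refs: dict, max_depth: int = 5):
    """Follow symbolic refs until a direct ref is reached.

    Returns (final_ref_name, commit_sha_or_None). Raises ValueError on a
    cycle or when the chain is longer than max_depth.
    """
    visited = []
    while name in symbolic_refs:
        if name in visited or len(visited) >= max_depth:
            raise ValueError(f"symbolic ref chain too deep or cyclic at {name}")
        visited.append(name)
        name = symbolic_refs[name]
    return name, direct_refs.get(name)


# Usage: HEAD points at the main branch, which points at a commit.
symbolic = {"HEAD": "refs/heads/main"}
direct = {"refs/heads/main": "a" * 40}
target, sha = resolve_ref("HEAD", symbolic, direct)
```

A depth limit guards against malformed data; the value of 5 here is arbitrary.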

Reference Content Mapping

The GitRefContentMapping facet (repo_attributes/git_ref_content_mapping/) maps Git references to their content. This is used for annotated tags and other reference types that contain content beyond a simple commit pointer.

Bookmarks Integration

Git branches are mapped to Mononoke bookmarks. When a Git client pushes to a branch, the corresponding bookmark is updated. When advertising references, the server queries bookmarks and formats them as Git refs.

Reference paths follow Git conventions:

  • Branches: refs/heads/<name>
  • Tags: refs/tags/<name>
  • HEAD: Symbolic reference to current branch

Git LFS Integration

The LFS server (servers/lfs/lfs_server/) implements the Git LFS protocol for large file storage. Git LFS replaces large files in Git repositories with small pointer files, storing the actual content separately.

LFS Protocol

The server implements the Git LFS batch API over HTTP:

Batch endpoint - Clients request upload or download URLs for multiple objects. The server returns signed URLs or direct endpoints for transferring content.

Upload/download endpoints - Handle actual file content transfer. Upload accepts file content via HTTP PUT. Download serves file content via HTTP GET.
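
To make the batch exchange concrete, the following sketch builds a batch request body and extracts transfer URLs from a response. The JSON field names follow the public Git LFS batch API specification (requests are sent as `application/vnd.git-lfs+json` to an `objects/batch` endpoint); the helper names and the example URL are hypothetical, and nothing here reflects Mononoke-specific URL layout.

```python
import json


def build_batch_request(operation: str, objects: list) -> str:
    """Build a batch request body for (oid, size) pairs."""
    if operation not in ("download", "upload"):
        raise ValueError("operation must be 'download' or 'upload'")
    return json.dumps({
        "operation": operation,
        "transfers": ["basic"],  # the default transfer adapter
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    })


def extract_hrefs(response_body: str, operation: str) -> dict:
    """Map each object ID to the URL the server offers for the operation."""
    response = json.loads(response_body)
    hrefs = {}
    for obj in response.get("objects", []):
        action = obj.get("actions", {}).get(operation)
        if action is not None:  # objects with errors carry no action
            hrefs[obj["oid"]] = action["href"]
    return hrefs


# Usage with a hand-written response; the href is a made-up example.
oid = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
request_body = build_batch_request("download", [(oid, 0)])
sample_response = json.dumps({
    "transfer": "basic",
    "objects": [{
        "oid": oid,
        "size": 0,
        "actions": {"download": {"href": "https://lfs.example.com/objects/" + oid}},
    }],
})
urls = extract_hrefs(sample_response, "download")
```

The client then issues an HTTP GET (download) or PUT (upload) against each returned URL, which corresponds to the upload/download endpoints described above.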

LFS Pointer Interpretation

Mononoke can optionally interpret LFS pointer files. When enabled:

  1. During Git push, pointer files are detected by pattern matching
  2. Both the pointer and full file contents are uploaded to the blobstore
  3. The full file contents are stored in the Bonsai changeset
  4. A git_lfs field in the Bonsai FileChange indicates the file should be represented as an LFS pointer in Git format

This approach stores actual file contents in Bonsai rather than pointers, providing:

  • Consistent file access across different APIs (SCS, Sapling)
  • Accurate repository size statistics
  • Data integrity checks covering actual content
  • Proper handling in cross-repository sync operations

When serving Git clients, files marked with the git_lfs field are converted back to pointer format.
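
The round trip between full contents and pointer files can be sketched as follows. The pointer layout (a `version` line, then `oid sha256:<hex>` and `size <bytes>`) and the size cap come from the Git LFS pointer specification; the function names are hypothetical, and the detection logic is a simplified stand-in for whatever pattern matching the server applies.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> bytes:
    """Build the pointer file that stands in for the given file contents."""
    oid = hashlib.sha256(content).hexdigest()
    text = f"version {LFS_SPEC_URL}\noid sha256:{oid}\nsize {len(content)}\n"
    return text.encode("utf-8")


def parse_pointer(data: bytes):
    """Return (oid, size) if data looks like an LFS pointer, otherwise None."""
    if len(data) >= 1024:  # the spec requires pointer files to be under 1024 bytes
        return None
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    fields = dict(line.split(" ", 1) for line in text.splitlines() if " " in line)
    if fields.get("version") != LFS_SPEC_URL:
        return None
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if not oid.startswith("sha256:") or not size.isdigit():
        return None
    return oid[len("sha256:"):], int(size)


# Usage: converting contents to a pointer and recognizing it again.
pointer = make_pointer(b"large binary payload")
detected = parse_pointer(pointer)
```

Since the pointer is fully determined by the content's SHA-256 and length, a server that stores the full contents can regenerate the pointer when serving Git clients.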

LFS Storage

LFS file contents are stored in the blobstore using the Filestore facet, which handles chunking for large files. The LFS server uses content-addressed storage, with object IDs derived from SHA-256 hashes of file contents.

The LFS server can be configured to use an upstream LFS server for federation, allowing Mononoke to act as a cache or proxy.

Source of Truth Tracking

The GitSourceOfTruthConfig facet (repo_attributes/git_source_of_truth/) tracks which VCS format is authoritative for a repository. This is used during migrations when repositories transition from external Git systems to Mononoke or vice versa.

The facet records:

  • Repository ID and name
  • Source of truth designation (Mononoke or external Git)
  • Migration status and mutation ID for tracking updates

This information ensures consistency when a repository is being served through multiple VCS protocols during migration periods.

Request Flow Examples

Git Clone

  1. Client sends HTTP GET to info/refs?service=git-upload-pack for reference discovery
  2. Git server advertises available references (branches, tags)
  3. Client sends HTTP POST to git-upload-pack with want/have negotiation
  4. Server determines which commits and objects the client needs
  5. Server derives Git commits from Bonsai changesets
  6. Server derives Git trees from file changes
  7. Server uses delta manifests to determine object deltas
  8. Server generates packfile with commits, trees, and blobs
  9. Packfile is streamed to client
  10. Client unpacks objects and checks out working directory

Git Push

  1. Client sends HTTP POST to git-receive-pack endpoint with packfile
  2. Git server parses packfile and extracts Git objects
  3. Server converts Git commits to Bonsai changesets
  4. Server validates commits and runs hooks (via HookManager facet)
  5. Server stores Bonsai changesets in blobstore
  6. Server updates Bonsai-Git mapping in metadata database
  7. Server updates commit graph index
  8. Server updates bookmark (Git ref) position
  9. Server confirms push success to client
  10. Derived data computation is queued asynchronously
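
In the Git protocol, the receive-pack request in step 1 also carries one ref update command per reference (`<old-id> <new-id> <refname>`, with capabilities appended to the first command after a NUL byte), and those commands drive the bookmark update in step 8. The sketch below parses such a command; the function name and the returned dictionary shape are assumptions for illustration.

```python
ZERO_ID = "0" * 40  # an all-zero ID marks a ref that does not exist on that side


def parse_ref_update(line: bytes, first: bool = False) -> dict:
    """Parse one receive-pack command from its pkt-line payload."""
    text = line.rstrip(b"\n")
    capabilities = []
    if first and b"\0" in text:
        # Only the first command carries the capability list after a NUL.
        text, raw_caps = text.split(b"\0", 1)
        capabilities = raw_caps.decode("utf-8").split()
    old_id, new_id, ref_name = text.decode("utf-8").split(" ", 2)
    if old_id == ZERO_ID:
        kind = "create"
    elif new_id == ZERO_ID:
        kind = "delete"
    else:
        kind = "update"
    return {
        "ref": ref_name,
        "old": old_id,
        "new": new_id,
        "kind": kind,
        "capabilities": capabilities,
    }


# Usage: the first command of a push that moves refs/heads/main.
command = parse_ref_update(
    b"a" * 40 + b" " + b"b" * 40 + b" refs/heads/main\0report-status side-band-64k\n",
    first=True,
)
```

The old ID lets the server check that the reference has not moved since the client last saw it before the new position is recorded.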

Git Fetch

  1. Client sends HTTP POST to git-upload-pack with have/want negotiation
  2. Server uses commit graph to identify missing commits
  3. Server generates incremental packfile with new objects
  4. Server uses delta manifests to optimize delta computation
  5. Packfile is streamed to client
  6. Client applies updates to local repository
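
Step 2 is essentially a graph difference: commits reachable from the client's wants minus commits reachable from its haves. The sketch below shows that idea over a plain parent map. The data shape and function name are assumptions, and a production commit graph would rely on indexed ancestry queries rather than walking every ancestor.

```python
from collections import deque


def missing_commits(parents: dict, wants: list, haves: list) -> list:
    """Return commits reachable from wants but not from haves.

    parents maps each commit known to the server to a list of its parents.
    """
    # Everything reachable from a known 'have' is already on the client.
    known = set()
    stack = [commit for commit in haves if commit in parents]  # skip unknown haves
    while stack:
        commit = stack.pop()
        if commit in known:
            continue
        known.add(commit)
        stack.extend(parents[commit])

    # Walk back from the wants, stopping at anything the client has.
    missing, seen = [], set()
    queue = deque(wants)
    while queue:
        commit = queue.popleft()
        if commit in seen or commit in known:
            continue
        seen.add(commit)
        missing.append(commit)
        queue.extend(parents.get(commit, ()))
    return missing


# Usage: linear history a <- b <- c <- d, where the client already has b.
history = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
to_send = missing_commits(history, wants=["d"], haves=["b"])
```

Here `to_send` contains `d` and `c`; the trees and blobs introduced by those commits are what the incremental packfile needs to carry.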

Component Organization

Git support is implemented across multiple directories:

Protocol and server:

  • servers/git/git_server/ - Git server binary and HTTP handlers
  • git/protocol/ - Git protocol implementation
  • git/packfile/ - Packfile generation and parsing
  • git/packetline/ - Packet-line protocol format

Data types and conversion:

  • git/git_types/ - Git object types, commit derivation, delta manifests
  • git/git_types/src/tree.rs - Git tree derivation from Bonsai
  • git/git_types/src/derive_commit.rs - Git commit derivation
  • git/git_types/src/delta_manifest_*.rs - Delta manifest types

Facets (in repo_attributes/):

  • bonsai_git_mapping/ - Bonsai ↔ Git SHA mapping
  • git_symbolic_refs/ - Symbolic reference handling
  • git_ref_content_mapping/ - Reference content mapping
  • git_source_of_truth/ - Source of truth configuration
  • bonsai_tag_mapping/ - Git annotated tag mapping
  • bonsai_blob_mapping/ - Git blob mapping

LFS:

  • servers/lfs/lfs_server/ - LFS server binary and protocol implementation
  • git/git_types/src/git_lfs.rs - LFS pointer handling

Import/export:

  • git/gitimport/ - Git repository import
  • git/gitexport/ - Git repository export
  • git/import_tools/ - Import utilities
  • git/import_direct/ - Direct import operations

Supporting components:

  • git/bundle_uri/ - Bundle URI protocol support
  • git/check_git_wc/ - Working copy validation
  • git/git_env/ - Git environment utilities