eden/mononoke/docs/5.1-git-support.md
This document explains how Mononoke supports Git clients and repositories. Git support allows standard Git clients to interact with Mononoke-backed repositories through the Git protocol while maintaining Bonsai as the canonical internal representation.
Mononoke provides Git protocol support through several components:
This integration enables Git clients to clone, fetch, and push to Mononoke repositories using standard Git operations, while Mononoke maintains a single canonical representation in Bonsai format.
The Git server (servers/git/git_server/) implements the Git smart HTTP protocol, allowing standard Git clients to interact with Mononoke repositories.
The server handles Git protocol operations through several layers:
Protocol parsing (git/protocol/) - Implements Git's pack protocol, including negotiation, packfile generation, and reference advertisement.
Packfile handling (git/packfile/) - Generates and parses Git packfiles containing commits, trees, and blobs. Packfiles are the primary data transfer format in the Git protocol.
Packet-line format (git/packetline/) - Implements Git's packet-line protocol used for framing data in Git communication.
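To make the framing concrete, here is a minimal Python sketch of Git's packet-line encoding as described in Git's protocol documentation. It is not Mononoke's Rust implementation in git/packetline/; the function names are hypothetical and chosen for illustration.

```python
FLUSH_PKT = b"0000"   # zero-length packet that marks the end of a message section
MAX_PKT_LEN = 65520   # largest total length Git allows for a single pkt-line


def encode_pkt_line(payload: bytes) -> bytes:
    """Frame a payload: 4 hex digits with the total length (header included), then the data."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for a single pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def decode_pkt_lines(stream: bytes) -> list:
    """Split a byte stream into payloads. A flush packet is reported as None."""
    packets = []
    pos = 0
    while pos < len(stream):
        length = int(stream[pos:pos + 4], 16)
        if length == 0:
            packets.append(None)
            pos += 4
        elif length < 4:
            # 0001 and 0002 are special packets in protocol v2; this sketch skips them.
            raise ValueError(f"unsupported special packet {length:04x}")
        else:
            packets.append(stream[pos + 4:pos + length])
            pos += length
    return packets


framed = encode_pkt_line(b"want 1234\n") + FLUSH_PKT
print(decode_pkt_lines(framed))
```

The length prefix counts its own four bytes, which is why an empty payload is encoded as `0004` and `0000` can be reserved for the flush packet.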
The Git server handles standard Git operations:
Clone - Advertises available references and generates a packfile containing the requested commits, trees, and blobs. The server converts Bonsai changesets to Git commits on-demand during packfile generation.
Fetch - Determines which objects the client needs based on negotiation and generates an incremental packfile. The server uses commit graph traversal to identify objects not present on the client.
Push - Receives a packfile from the client, extracts Git objects, converts Git commits to Bonsai changesets, and updates references after validation and hook execution.
ls-remote - Lists available references (branches and tags) with their current commit SHAs.
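The graph traversal behind fetch can be sketched as follows. This is a simplified Python illustration, assuming the commit graph is available as an in-memory dictionary from commit ID to parent IDs; the function name and the `wants`/`haves` arguments are hypothetical stand-ins for the result of protocol negotiation, not Mononoke's actual API.

```python
from collections import deque


def commits_to_send(parents: dict, wants: list, haves: list) -> list:
    """Return commits reachable from `wants` that are not reachable from `haves`."""
    # Everything reachable from the client's haves is already on the client.
    known = set()
    queue = deque(haves)
    while queue:
        commit = queue.popleft()
        if commit in known:
            continue
        known.add(commit)
        queue.extend(parents.get(commit, ()))

    # Walk back from the wanted tips and stop at the first commit the client has.
    missing = []
    seen = set()
    queue = deque(wants)
    while queue:
        commit = queue.popleft()
        if commit in seen or commit in known:
            continue
        seen.add(commit)
        missing.append(commit)
        queue.extend(parents.get(commit, ()))
    return missing


# Linear history a <- b <- c <- d, with e branching off b.
graph = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["b"]}
print(commits_to_send(graph, wants=["d"], haves=["b"]))
```

A clone is the special case where `haves` is empty, so every commit reachable from the wanted references is sent. A real server would then collect the trees and blobs of the missing commits into the packfile.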
The Git server uses Gotham-based HTTP routing (servers/git/git_server/src/service/) to handle protocol endpoints. Upload-pack requests (fetch/clone) and receive-pack requests (push) are routed to separate handlers that manage the respective protocol flows.
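As an illustration of the split between the two flows, the sketch below dispatches on the standard Git smart HTTP endpoints (`info/refs`, `git-upload-pack`, `git-receive-pack`). It is written in Python for brevity, and the handler names it returns are hypothetical; Mononoke's actual Gotham routes may be organized differently.

```python
from urllib.parse import parse_qs, urlsplit


def route(method: str, url: str) -> str:
    """Map a request to a (hypothetical) handler name using Git smart HTTP conventions."""
    parts = urlsplit(url)
    path = parts.path
    if method == "GET" and path.endswith("/info/refs"):
        # Reference advertisement; the service query parameter selects the flow.
        service = parse_qs(parts.query).get("service", [""])[0]
        if service == "git-upload-pack":
            return "advertise_for_fetch"
        if service == "git-receive-pack":
            return "advertise_for_push"
        return "unsupported_service"
    if method == "POST" and path.endswith("/git-upload-pack"):
        return "upload_pack"    # fetch and clone
    if method == "POST" and path.endswith("/git-receive-pack"):
        return "receive_pack"   # push
    return "not_found"


print(route("POST", "/repos/git/ro/example.git/git-upload-pack"))
```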
Mononoke maintains a bidirectional mapping between Bonsai changesets and Git commits, enabling conversion in both directions.
Git commits are implemented as a derived data type (MappedGitCommitId) in git/git_types/src/derive_commit.rs. When a Bonsai changeset needs to be served to Git clients, the corresponding Git commit is derived.
The derivation process:
Bonsai changeset metadata maps to Git commit fields:
Author and committer - Bonsai author and optional committer are converted to Git signature format (name, email, timestamp with timezone).
Commit message - Preserved directly. Extra metadata fields specific to other version control systems are omitted.
Parents - Parent Bonsai changesets are mapped to their corresponding Git commits through the mapping table.
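The field mapping above can be illustrated by building a Git commit object by hand. The Python sketch below follows Git's standard commit object layout and object hashing; the helper names are hypothetical, and falling back to the author when no committer is present is an assumption made for this example.

```python
import hashlib


def format_signature(name: str, email: str, timestamp: int, tz_offset_minutes: int) -> str:
    """Render 'Name <email> <unix time> <+hhmm>' as used in author and committer lines."""
    sign = "+" if tz_offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(tz_offset_minutes), 60)
    return f"{name} <{email}> {timestamp} {sign}{hours:02d}{minutes:02d}"


def serialize_commit(tree, parents, author, committer, message) -> bytes:
    """Build the body of a Git commit object from already-mapped fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]   # one line per mapped parent commit
    lines.append(f"author {author}")
    # Assumption for this sketch: reuse the author when no committer is recorded.
    lines.append(f"committer {committer or author}")
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def git_object_id(kind: str, payload: bytes) -> str:
    """Git object ID: SHA-1 over '<kind> <size>' plus a NUL byte, followed by the payload."""
    header = f"{kind} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).hexdigest()


author = format_signature("Ada", "ada@example.com", 1700000000, 60)
raw = serialize_commit("4b825dc642cb6eb9a060e54bf8d69288fbee4904", [], author, None, "Initial commit\n")
print(raw.decode("utf-8"))
print(git_object_id("commit", raw))
```

Because the commit ID is a hash over every field, the conversion has to be deterministic: the same Bonsai changeset must always serialize to the same bytes for the mapping to stay stable.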
Bonsai's flat list of file changes is converted to Git's hierarchical tree structure. Each directory level becomes a Git tree object referencing either subtrees (directories) or blobs (files).
File modes:
Content addressing - File contents are shared between Bonsai and Git representations. The Git blob hash is stored in Bonsai's content metadata.
Copy/move information - Git's object model does not represent copy or rename metadata (Git infers these during diff operations). This information is not included in Git commits but is preserved in the Bonsai representation.
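The conversion from a flat path list to nested tree objects can be sketched as below. This Python example uses Git's standard tree entry layout and mode strings (`100644`, `100755`, `120000`, `40000`); the function names and the input shape are hypothetical. It hashes blob contents directly for simplicity, whereas Mononoke, as noted above, reads the Git blob hash from content metadata.

```python
import hashlib


def object_digest(kind: str, payload: bytes) -> bytes:
    """Raw 20-byte Git object ID for the given object kind and payload."""
    header = f"{kind} {len(payload)}\0".encode("ascii")
    return hashlib.sha1(header + payload).digest()


def build_tree(files: dict) -> bytes:
    """files maps 'dir/sub/name' to (mode, content). Returns the root tree's raw ID."""
    blobs, subdirs = {}, {}
    for path, (mode, content) in files.items():
        head, _, rest = path.partition("/")
        if rest:
            subdirs.setdefault(head, {})[rest] = (mode, content)   # belongs to a subtree
        else:
            blobs[head] = (mode, content)                          # file at this level

    entries = []
    for name, (mode, content) in blobs.items():
        entries.append((name.encode(), mode, object_digest("blob", content)))
    for name, children in subdirs.items():
        # Directories sort as if their name ended in '/', so carry that in the sort key.
        entries.append((name.encode() + b"/", "40000", build_tree(children)))
    entries.sort(key=lambda entry: entry[0])

    payload = b"".join(
        mode.encode() + b" " + name.rstrip(b"/") + b"\0" + oid
        for name, mode, oid in entries
    )
    return object_digest("tree", payload)


flat = {
    "README.md": ("100644", b"hello world\n"),
    "src/main.rs": ("100644", b"fn main() {}\n"),
    "src/bin/tool.sh": ("100755", b"#!/bin/sh\n"),
}
print(build_tree(flat).hex())
```

Each directory level produces exactly one tree object, and an unchanged subdirectory hashes to the same tree ID, which is what lets Git share subtrees between commits.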
The BonsaiGitMapping facet (repo_attributes/bonsai_git_mapping/) maintains the bidirectional mapping between Bonsai changeset IDs and Git commit SHAs. This mapping is stored in a SQL table and cached for performance.
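A bidirectional lookup backed by SQL with a cache in front can be sketched as follows. This Python example uses an in-memory SQLite database; the table name, column names, and class are hypothetical and do not reflect Mononoke's actual schema or caching layer.

```python
import sqlite3


class BonsaiGitMappingSketch:
    """Toy two-way mapping between Bonsai changeset IDs and Git commit SHAs."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Both columns are unique, so each ID maps to exactly one counterpart.
        self.db.execute(
            "CREATE TABLE mapping (bcs_id TEXT PRIMARY KEY, git_sha1 TEXT NOT NULL UNIQUE)"
        )
        self.cache = {}

    def add(self, bcs_id: str, git_sha1: str) -> None:
        self.db.execute("INSERT INTO mapping VALUES (?, ?)", (bcs_id, git_sha1))

    def _lookup(self, key_column: str, value_column: str, key: str):
        cache_key = (key_column, key)
        if cache_key in self.cache:
            return self.cache[cache_key]
        # Column names come from the two fixed callers below, never from user input.
        row = self.db.execute(
            f"SELECT {value_column} FROM mapping WHERE {key_column} = ?", (key,)
        ).fetchone()
        if row is None:
            return None          # misses are not cached, so later inserts are visible
        self.cache[cache_key] = row[0]
        return row[0]

    def git_from_bonsai(self, bcs_id: str):
        return self._lookup("bcs_id", "git_sha1", bcs_id)

    def bonsai_from_git(self, git_sha1: str):
        return self._lookup("git_sha1", "bcs_id", git_sha1)


mapping = BonsaiGitMappingSketch()
mapping.add("bonsai-1", "3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
print(mapping.git_from_bonsai("bonsai-1"))
print(mapping.bonsai_from_git("3b18e512dba79e4c8300dd08aeb37f8e728b8dad"))
```

Entries are immutable once written, since both IDs are content hashes, which is what makes caching them safe.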
The mapping is populated during:
Additional mappings are provided by:
BonsaiTagMapping - Maps Git annotated tag objects to Bonsai
BonsaiBlobMapping - Maps Git blob objects to file content IDs
gitimport (git/gitimport/) - Imports existing Git repositories into Mononoke by converting Git commits to Bonsai changesets. Preserves commit graph structure and reference positions.
gitexport (git/gitexport/) - Exports Mononoke repositories in Git format. Used for migration scenarios and backup.
import_tools (git/import_tools/) - Utilities supporting import operations, including validation and metadata handling.
Beyond Git commits, Mononoke derives additional Git-specific data to optimize protocol operations.
Git tree objects (GitTreeId) represent directory structures. These are derived alongside Git commits, creating the hierarchical tree structure from Bonsai's flat file change representation.
Delta manifests optimize Git packfile generation by pre-computing object delta information. Generating deltas on-demand for every fetch would be prohibitively expensive for large repositories.
GitDeltaManifestV2 (git/git_types/src/delta_manifest_v2.rs) - Tracks potential delta bases for each object in a commit, reducing packfile generation time.
GitDeltaManifestV3 (git/git_types/src/delta_manifest_v3.rs) - Enhanced version with chunking for better performance.
CompactedGitDeltaManifest (git/git_types/src/compacted_delta_manifest.rs) - For repositories with many small commits, fetching individual delta manifests per commit creates overhead. The compacted format aggregates delta information across multiple commits, reducing the number of blobstore fetches during packfile generation.
These derived data types are computed asynchronously off the critical write path, similar to other derived data in Mononoke.
Git's bundle URI protocol allows servers to provide pre-generated repository bundles for efficient full clones. Mononoke can pre-compute bundles (git/bundle_uri/) and serve them to clients, avoiding the cost of generating packfiles for complete repository clones.
Git uses references (refs) to track branches, tags, and other named pointers. Mononoke provides several facets to manage Git reference semantics.
The GitSymbolicRefs facet (repo_attributes/git_symbolic_refs/) handles Git symbolic references, which are references that point to other references rather than directly to commits.
The most common symbolic reference is HEAD, which typically points to the current branch (e.g., HEAD → refs/heads/main). The facet stores the symbolic reference name, target reference name, and target type (branch or tag).
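Resolving a symbolic reference means following it to the reference it names and then reading that reference's commit. The Python sketch below assumes two plain dictionaries as stand-ins for the facet's storage and for the set of direct references; the names and the depth limit are hypothetical.

```python
def resolve_ref(name: str, symbolic_refs: dict, refs: dict, max_depth: int = 5):
    """Follow symbolic references until a direct reference is reached."""
    for _ in range(max_depth):
        if name in symbolic_refs:
            name = symbolic_refs[name]   # hop to the target reference
            continue
        return refs.get(name)            # direct reference, or None if it does not exist
    raise ValueError("symbolic reference chain is too deep or cyclic")


symbolic_refs = {"HEAD": "refs/heads/main"}
refs = {"refs/heads/main": "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"}
print(resolve_ref("HEAD", symbolic_refs, refs))
```

The depth limit guards against cycles such as two symbolic references pointing at each other.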
The GitRefContentMapping facet (repo_attributes/git_ref_content_mapping/) maps Git references to their content. This is used for annotated tags and other reference types that contain content beyond a simple commit pointer.
Git branches are mapped to Mononoke bookmarks. When a Git client pushes to a branch, the corresponding bookmark is updated. When advertising references, the server queries bookmarks and formats them as Git refs.
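The bookmark-to-ref translation used for reference advertisement can be sketched as below. This Python example assumes bookmark names of the form `heads/<name>` and `tags/<name>` and a lookup callback from Bonsai changeset ID to Git SHA; both are assumptions for illustration, and the output mimics the `git ls-remote` line format.

```python
def advertise_refs(bookmarks: dict, git_sha_for) -> list:
    """Format bookmarks as '<sha>\\t<ref>' lines, skipping changesets with no Git commit."""
    lines = []
    for bookmark in sorted(bookmarks):
        sha = git_sha_for(bookmarks[bookmark])
        if sha is None:
            continue   # no Git commit mapped for this changeset yet
        lines.append(f"{sha}\trefs/{bookmark}")
    return lines


# Hypothetical data: bookmark name -> Bonsai changeset ID, and a tiny mapping table.
bookmarks = {"heads/main": "bonsai-2", "tags/v1.0": "bonsai-1", "heads/wip": "bonsai-3"}
git_shas = {
    "bonsai-1": "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
    "bonsai-2": "3b18e512dba79e4c8300dd08aeb37f8e728b8dad",
}
for line in advertise_refs(bookmarks, git_shas.get):
    print(line)
```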
Reference paths follow Git conventions:
refs/heads/<name>
refs/tags/<name>
HEAD: Symbolic reference to current branch
The LFS server (servers/lfs/lfs_server/) implements the Git LFS protocol for large file storage. Git LFS replaces large files in Git repositories with small pointer files, storing the actual content separately.
The server implements the Git LFS batch API over HTTP:
Batch endpoint - Clients request upload or download URLs for multiple objects. The server returns signed URLs or direct endpoints for transferring content.
Upload/download endpoints - Handle actual file content transfer. Upload accepts file content via HTTP PUT. Download serves file content via HTTP GET.
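The batch exchange can be illustrated with the JSON shapes defined by the Git LFS batch API. The Python sketch below builds a response for a batch request; the function name, the `base_url` layout, and the in-memory set of stored object IDs are hypothetical and stand in for the real server's storage and URL scheme.

```python
import json


def batch_response(request_body: str, base_url: str, stored: set) -> str:
    """Answer a Git LFS batch request with per-object transfer actions."""
    request = json.loads(request_body)
    operation = request["operation"]          # "download" or "upload"
    objects = []
    for obj in request["objects"]:
        entry = {"oid": obj["oid"], "size": obj["size"]}
        href = f"{base_url}/objects/{obj['oid']}"
        if operation == "download":
            if obj["oid"] in stored:
                entry["actions"] = {"download": {"href": href}}
            else:
                entry["error"] = {"code": 404, "message": "Object does not exist"}
        elif operation == "upload" and obj["oid"] not in stored:
            # Objects the server already has get no actions, so the client skips them.
            entry["actions"] = {"upload": {"href": href}}
        objects.append(entry)
    return json.dumps({"transfer": "basic", "objects": objects})


request = json.dumps({
    "operation": "download",
    "transfers": ["basic"],
    "objects": [{"oid": "aa", "size": 3}, {"oid": "bb", "size": 4}],
})
print(batch_response(request, "https://lfs.example.invalid/repo", {"aa"}))
```

Batching lets a client resolve many objects in one round trip before starting the individual PUT or GET transfers.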
Mononoke can optionally interpret LFS pointer files. When enabled:
git_lfs field in the Bonsai FileChange indicates the file should be represented as an LFS pointer in Git format
This approach stores actual file contents in Bonsai rather than pointers, providing:
When serving Git clients, files marked with the git_lfs field are converted back to pointer format.
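For reference, a Git LFS pointer is a short text file with `version`, `oid`, and `size` lines, as defined by the Git LFS specification. The Python sketch below generates a pointer from file content and parses one back; the function names are hypothetical and this is not the code in git/git_types/src/git_lfs.rs.

```python
import hashlib

POINTER_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> bytes:
    """Build the pointer file that stands in for `content` in a Git tree."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {POINTER_VERSION}\noid sha256:{oid}\nsize {len(content)}\n".encode("utf-8")


def parse_pointer(data: bytes):
    """Return (sha256 hex, size) if `data` is an LFS pointer, otherwise None."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None          # every pointer line is 'key value'
        fields[key] = value
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if fields.get("version") != POINTER_VERSION:
        return None
    if not oid.startswith("sha256:") or not size.isdigit():
        return None
    return oid[len("sha256:"):], int(size)


pointer = make_pointer(b"large binary payload")
print(pointer.decode("utf-8"))
print(parse_pointer(pointer))
```

Parsing corresponds to the import direction (pointer recognized, full content stored), and generation corresponds to serving a file marked with git_lfs back to a Git client.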
LFS file contents are stored in the blobstore using the Filestore facet, which handles chunking for large files. The LFS server uses content-addressed storage, with object IDs derived from SHA-256 hashes of file contents.
The LFS server can be configured to use an upstream LFS server for federation, allowing Mononoke to act as a cache or proxy.
The GitSourceOfTruthConfig facet (repo_attributes/git_source_of_truth/) tracks which VCS format is authoritative for a repository. This is used during migrations when repositories transition from external Git systems to Mononoke or vice versa.
The facet records:
This information ensures consistency when a repository is being served through multiple VCS protocols during migration periods.
HookManager facet)
Git support is implemented across multiple directories:
Protocol and server:
servers/git/git_server/ - Git server binary and HTTP handlers
git/protocol/ - Git protocol implementation
git/packfile/ - Packfile generation and parsing
git/packetline/ - Packet-line protocol format
Data types and conversion:
git/git_types/ - Git object types, commit derivation, delta manifests
git/git_types/src/tree.rs - Git tree derivation from Bonsai
git/git_types/src/derive_commit.rs - Git commit derivation
git/git_types/src/delta_manifest_*.rs - Delta manifest types
Facets (in repo_attributes/):
bonsai_git_mapping/ - Bonsai ↔ Git SHA mapping
git_symbolic_refs/ - Symbolic reference handling
git_ref_content_mapping/ - Reference content mapping
git_source_of_truth/ - Source of truth configuration
bonsai_tag_mapping/ - Git annotated tag mapping
bonsai_blob_mapping/ - Git blob mapping
LFS:
servers/lfs/lfs_server/ - LFS server binary and protocol implementation
git/git_types/src/git_lfs.rs - LFS pointer handling
Import/export:
git/gitimport/ - Git repository import
git/gitexport/ - Git repository export
git/import_tools/ - Import utilities
git/import_direct/ - Direct import operations
Supporting components:
git/bundle_uri/ - Bundle URI protocol support
git/check_git_wc/ - Working copy validation
git/git_env/ - Git environment utilities