eden/mononoke/docs/4.2-cross-repo-sync.md
This document explains Mononoke's cross-repository synchronization system, the framework for automatically synchronizing commits between different repositories.
Cross-repo sync is a system that maintains bidirectional synchronization between repositories. It automatically replicates commits from one repository to another, transforming file paths and commit metadata as needed to match each repository's structure.
The most common use case is synchronizing between a large repository (monorepo) and smaller project-specific repositories. When a developer pushes to a small repository, those changes are automatically synced to the corresponding location in the large repository. Conversely, changes in the large repository can be synced back to the small repositories.
Cross-repo sync handles:
Cross-repo sync operates in two directions:
Forward Sync (Small → Large)
Backward Sync (Large → Small)
The relationship between repositories is configured rather than inferred from structure. A repository can participate in multiple sync relationships.
When syncing commits between repositories, the sync system transforms them to match the target repository's structure.
Files are moved according to configured path mappings. Each small repository has a map defining how paths are transformed:
Example transformation:
src/foo.rs (small repository) → projects/myproject/src/foo.rs (large repository)
The mapping is defined in the commit sync configuration using prefix replacements. More complex transformations can be expressed through multiple mapping entries or by using the default action configuration.
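To make the prefix replacement concrete, here is a minimal sketch of a small-to-large path mover. The names `PathMover`, `DefaultAction`, and `move_path` are hypothetical stand-ins and are not taken from the Mononoke source; the longest-prefix-wins rule and the two default actions are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// What to do with a path that matches no explicit prefix (assumed variants).
#[derive(Debug, Clone, PartialEq)]
pub enum DefaultAction {
    /// Keep the path unchanged.
    Preserve,
    /// Prepend a fixed directory to the path.
    PrependPrefix(String),
}

/// Hypothetical small-to-large path mover built from prefix replacements.
pub struct PathMover {
    /// Small-repo prefix -> large-repo prefix.
    pub map: BTreeMap<String, String>,
    pub default_action: DefaultAction,
}

/// True when `prefix` matches `path` on a path-component boundary.
fn is_component_prefix(prefix: &str, path: &str) -> bool {
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}

/// Joins a directory and a (possibly empty) relative path with a single slash.
fn join(base: &str, rest: &str) -> String {
    if rest.is_empty() {
        base.to_string()
    } else {
        format!("{}/{}", base.trim_end_matches('/'), rest)
    }
}

impl PathMover {
    /// Maps a small-repo path to its large-repo location.
    pub fn move_path(&self, path: &str) -> String {
        // Assumption: the longest matching prefix wins, so specific
        // entries override more general ones.
        let best = self
            .map
            .iter()
            .filter(|(from, _)| is_component_prefix(from, path))
            .max_by_key(|(from, _)| from.len());
        match best {
            Some((from, to)) => {
                let rest = path[from.len()..].trim_start_matches('/');
                join(to, rest)
            }
            None => match &self.default_action {
                DefaultAction::Preserve => path.to_string(),
                DefaultAction::PrependPrefix(prefix) => join(prefix, path),
            },
        }
    }
}
```

With an empty map and a default action of `PrependPrefix("projects/myproject")`, `move_path("src/foo.rs")` yields the large-repository path from the example transformation.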
Commits are rewritten during sync using the commit transformation library (features/commit_transformation/). The rewriting process:
Commits that would be empty after path transformation (no files in the mapped paths) are handled specially. These may be recorded as "not a sync candidate" or mapped to an equivalent working copy ancestor.
Merge Commits - Supported with limitations. Both parents must already be synced. Some complex merge scenarios are not supported and will cause sync to fail.
Copy/Move Information - File copy and move metadata is preserved when both source and destination paths map to the target repository.
Git Submodules - When configured, submodule expansion can be performed during sync, converting submodule pointers into expanded directory contents.
Cross-repo sync operates by tailing the bookmark update log of the source repository.
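As a rough illustration of what tailing a log means, the sketch below processes entries newer than a stored counter and advances the counter only after each entry syncs successfully. `LogEntry`, `tail_once`, and the counter handling are hypothetical simplifications, not the actual job's types or API, and the log is assumed to be sorted by id.

```rust
/// One row of a bookmark update log (hypothetical, simplified shape).
#[derive(Debug, Clone, PartialEq)]
pub struct LogEntry {
    pub id: u64,
    pub bookmark: String,
    pub to_changeset: Option<u64>,
}

/// Processes up to `limit` entries with id greater than `counter`.
/// Returns how many entries were synced in this iteration.
pub fn tail_once<F>(
    log: &[LogEntry],
    counter: &mut u64,
    limit: usize,
    mut sync: F,
) -> Result<usize, String>
where
    F: FnMut(&LogEntry) -> Result<(), String>,
{
    let start = *counter;
    let mut processed = 0;
    for entry in log.iter().filter(|e| e.id > start).take(limit) {
        // Stop at the first failure so the entry is retried next time.
        sync(entry)?;
        // Advance only after the entry was synced successfully.
        *counter = entry.id;
        processed += 1;
    }
    Ok(processed)
}
```

Because the counter moves only on success, a failed iteration leaves the position unchanged and the same entry is picked up again on the next pass.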
The x-repo sync job (features/commit_rewriting/mononoke_x_repo_sync_job/) performs forward sync:
Common pushrebase bookmarks (configured as common_pushrebase_bookmarks) receive special handling. These are bookmarks where pushrebase is used to maintain linear history in both repositories.
The backsyncer (features/commit_rewriting/backsyncer/) performs backward sync:
Backsyncer can run continuously or in catch-up mode to process historical commits.
All synced commits are recorded in the synced commit mapping database (features/commit_rewriting/synced_commit_mapping/). This mapping stores:
The mapping is bidirectional, allowing queries in either direction. It is consulted before syncing to avoid duplicate work and to remap parent commits correctly.
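The bidirectional lookup and parent remapping can be pictured with an in-memory stand-in. Everything here (`InMemoryMapping`, the integer `ChangesetId`, the method names) is a hypothetical sketch; the real mapping is database-backed, and this version assumes a one-to-one relationship for simplicity.

```rust
use std::collections::HashMap;

pub type RepoId = i32;
/// Stand-in for a real changeset hash.
pub type ChangesetId = u64;

/// Hypothetical in-memory mapping that can be queried in either direction.
#[derive(Default)]
pub struct InMemoryMapping {
    small_to_large: HashMap<(RepoId, ChangesetId), ChangesetId>,
    large_to_small: HashMap<(ChangesetId, RepoId), ChangesetId>,
}

impl InMemoryMapping {
    /// Records that `small_cs` in `small_repo` was synced as `large_cs`.
    pub fn record(&mut self, small_repo: RepoId, small_cs: ChangesetId, large_cs: ChangesetId) {
        self.small_to_large.insert((small_repo, small_cs), large_cs);
        self.large_to_small.insert((large_cs, small_repo), small_cs);
    }

    pub fn to_large(&self, small_repo: RepoId, small_cs: ChangesetId) -> Option<ChangesetId> {
        self.small_to_large.get(&(small_repo, small_cs)).copied()
    }

    pub fn to_small(&self, large_cs: ChangesetId, small_repo: RepoId) -> Option<ChangesetId> {
        self.large_to_small.get(&(large_cs, small_repo)).copied()
    }

    /// Remaps parents into the large repo; returns the first unsynced
    /// parent as an error so the caller can sync it first or fail.
    pub fn remap_parents(
        &self,
        small_repo: RepoId,
        parents: &[ChangesetId],
    ) -> Result<Vec<ChangesetId>, ChangesetId> {
        parents
            .iter()
            .map(|p| self.to_large(small_repo, *p).ok_or(*p))
            .collect()
    }
}
```

Returning the unsynced parent as the error value mirrors the requirement that a commit's parents must be synced before the commit itself.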
When a commit is synced, the result is recorded as a commit sync outcome:
RewrittenAs - The commit was transformed and created as a new commit in the target repository. This is the most common outcome.
EquivalentWorkingCopyAncestor - The commit would be empty after transformation (no files in mapped paths), so it maps to an ancestor commit with the same working copy state.
NotSyncCandidate - The commit should not be synced to the target repository. This is recorded when all file changes are outside mapped paths.
Multiple source commits may map to the same target commit when they only affect unmapped paths. The plural commit sync outcome type represents this many-to-one relationship.
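The three outcomes above map naturally onto an enum. The sketch below is a simplified illustration: the variant names come from this document, but the payloads, the integer `ChangesetId`, and the `remapped_parent` helper are assumptions rather than the library's actual definition.

```rust
/// Stand-in for a real changeset hash.
pub type ChangesetId = u64;

/// Simplified sketch of a commit sync outcome.
#[derive(Debug, Clone, PartialEq)]
pub enum CommitSyncOutcome {
    /// A new commit was created in the target repository.
    RewrittenAs(ChangesetId),
    /// No new commit; the source maps to an existing target ancestor.
    EquivalentWorkingCopyAncestor(ChangesetId),
    /// The commit has no counterpart in the target repository.
    NotSyncCandidate,
}

impl CommitSyncOutcome {
    /// The target-repo commit a child should use as its remapped parent,
    /// if there is one (hypothetical helper).
    pub fn remapped_parent(&self) -> Option<ChangesetId> {
        match self {
            CommitSyncOutcome::RewrittenAs(cs)
            | CommitSyncOutcome::EquivalentWorkingCopyAncestor(cs) => Some(*cs),
            CommitSyncOutcome::NotSyncCandidate => None,
        }
    }
}
```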
Cross-repo sync is configured through the CommitSyncConfig structure in repository metadata.
Each sync relationship defines:
Large Repository - The repository ID of the large (monorepo) repository
Small Repositories - Map of small repository IDs to their configurations, each containing:
Common Pushrebase Bookmarks - Bookmarks where changes from small repos are pushed via pushrebase
Version Name - Identifies this configuration version
Sync configurations are versioned using CommitSyncConfigVersion. This allows the sync rules to evolve over time:
Configuration is managed through live_commit_sync_config (features/commit_rewriting/live_commit_sync_config/), which provides access to both current and historical configuration versions.
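One way to picture versioned configuration is a store keyed by version name that also tracks which version is current. The structs below are a hypothetical simplification: field names follow the items listed above, but `SyncConfig`, `SmallRepoConfig`, `ConfigStore`, and their methods are illustrative and are not the API of live_commit_sync_config.

```rust
use std::collections::HashMap;

pub type RepoId = i32;

/// Per-small-repo settings (assumed to hold at least a path map).
#[derive(Debug, Clone, PartialEq)]
pub struct SmallRepoConfig {
    /// Small-repo prefix -> large-repo prefix.
    pub path_map: HashMap<String, String>,
}

/// Simplified stand-in for one version of a sync configuration.
#[derive(Debug, Clone, PartialEq)]
pub struct SyncConfig {
    pub large_repo_id: RepoId,
    pub small_repos: HashMap<RepoId, SmallRepoConfig>,
    pub common_pushrebase_bookmarks: Vec<String>,
    pub version_name: String,
}

/// Hypothetical store giving access to current and historical versions.
#[derive(Default)]
pub struct ConfigStore {
    by_version: HashMap<String, SyncConfig>,
    current_version: Option<String>,
}

impl ConfigStore {
    /// Adds a version; older versions stay available for lookups.
    pub fn add_version(&mut self, config: SyncConfig, make_current: bool) {
        let name = config.version_name.clone();
        if make_current {
            self.current_version = Some(name.clone());
        }
        self.by_version.insert(name, config);
    }

    /// Looks up the rules a historical commit was synced with.
    pub fn get_by_version(&self, version: &str) -> Option<&SyncConfig> {
        self.by_version.get(version)
    }

    /// Returns the configuration used for newly synced commits.
    pub fn current(&self) -> Option<&SyncConfig> {
        self.current_version
            .as_deref()
            .and_then(|v| self.by_version.get(v))
    }
}
```

Keeping old versions addressable by name is what lets a commit synced under earlier rules still be interpreted after the rules change.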
Small repository bookmarks can be mapped to large repository bookmarks with a configured prefix. For example:
main → projectname/main
The bookmark prefix is configured per small repository in the permanent configuration.
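The bookmark renaming amounts to adding or stripping a prefix. The two helper functions below are hypothetical and only illustrate the idea, assuming the prefix includes its trailing slash (for example `projectname/`).

```rust
/// Maps a small-repo bookmark name to its large-repo name.
pub fn small_to_large_bookmark(prefix: &str, small_name: &str) -> String {
    format!("{}{}", prefix, small_name)
}

/// Maps a large-repo bookmark back to the small repo, or returns `None`
/// when the bookmark does not carry this small repo's prefix.
pub fn large_to_small_bookmark<'a>(prefix: &str, large_name: &'a str) -> Option<&'a str> {
    large_name
        .strip_prefix(prefix)
        .filter(|rest| !rest.is_empty())
}
```

A large-repository bookmark without the prefix returns `None`, which in this sketch stands for a bookmark that has no counterpart in that small repository.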
Cross-repo sync is integrated into repositories through the repo_cross_repo facet (repo_attributes/repo_cross_repo/). This facet provides:
The facet is used by both sync jobs and by other repository operations that need to query or update sync state.
Cross-repo sync serves several use cases:
Monorepo and Project Repos - A large monorepo contains many projects. Each project can have a dedicated small repository. Developers can commit to either repository, and changes are automatically synchronized.
Code Sharing - Common code can be shared between repositories. Changes in the shared code are propagated to all repositories that include it.
Migration - Repositories can be gradually migrated into or out of a monorepo while maintaining dual operation during the transition period.
Megarepo Operations - Initial import of repositories into a megarepo uses cross-repo sync mechanisms to establish the initial mappings and history. See megarepo_api/ for megarepo-specific operations.
Directory Isolation - A large repository can be split into smaller repositories by directory, allowing teams to work in isolated repositories while maintaining the option to work in the full repository.
The primary sync job (features/commit_rewriting/mononoke_x_repo_sync_job/) runs continuously to sync commits from small repositories to the large repository. It can be run in sharded mode across multiple processes for scalability.
Operation:
The backsyncer (features/commit_rewriting/backsyncer/) syncs commits from the large repository to small repositories. It can run continuously or in catch-up mode.
Modes:
The admin CLI (tools/admin/) provides commands for cross-repo sync operations:
The cross-repo sync system has several limitations:
Merge Commits - Some merge commit scenarios are not supported. A merge fails to sync unless both of its parents have already been synced.
Root Commits - Root commits (commits with no parents) and their descendants may not sync unless merged into a main line of development.
Path Conflicts - If path transformations would create conflicts (multiple source paths mapping to the same target path), sync will fail.
Bookmark Filters - Not all bookmarks are synced. Only explicitly configured bookmarks participate in sync.
Sequential Processing - Commits must be synced in topological order, which can limit parallelism.
These limitations are documented in the sync job source code and are checked during sync operations.
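The path conflict case can be illustrated with a small check that groups source paths by their transformed target and reports any target reached from more than one source. `find_path_conflicts` and the closure-based mover are hypothetical; how the sync job actually detects conflicts is not described in this document.

```rust
use std::collections::BTreeMap;

/// Returns each target path that more than one source path maps to,
/// together with the conflicting sources. `mover` returns `None` for
/// paths that are not synced at all.
pub fn find_path_conflicts<F>(sources: &[&str], mover: F) -> BTreeMap<String, Vec<String>>
where
    F: Fn(&str) -> Option<String>,
{
    let mut by_target: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for source in sources {
        if let Some(target) = mover(source) {
            by_target.entry(target).or_default().push(source.to_string());
        }
    }
    // Keep only targets with two or more distinct sources.
    by_target.retain(|_, srcs| srcs.len() > 1);
    by_target
}
```

For instance, if both `a/` and `b/` are mapped to `shared/`, then `a/x.rs` and `b/x.rs` collide on `shared/x.rs`, while `a/y.rs` does not conflict with anything.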
Cross-repo sync is designed for continuous operation with commits arriving at high rates:
Incremental Sync - Only new commits since the last sync are processed, not the entire repository history.
Batching - Multiple commits can be processed in a single sync iteration when catching up.
Leasing - A sync lease prevents multiple servers from syncing the same commits concurrently, avoiding wasted work.
Derived Data - Derived data can be computed asynchronously after sync completes, not blocking the sync operation.
Caching - Live commit sync config is cached to avoid repeated configuration lookups.
Sharding - The sync job can be sharded across multiple processes, each handling a subset of repositories.
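The leasing idea can be sketched as a set of commits currently being synced, where only the first caller to claim a commit proceeds. This is an in-process illustration with hypothetical names (`SyncLease`, `try_acquire`, `release`); a lease shared between servers would need a shared store, which is not modelled here.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// Hypothetical in-process lease keyed by a stand-in changeset id.
#[derive(Default)]
pub struct SyncLease {
    held: Mutex<HashSet<u64>>,
}

impl SyncLease {
    /// Returns true if the caller now holds the lease for `changeset`,
    /// false if another worker is already syncing it.
    pub fn try_acquire(&self, changeset: u64) -> bool {
        self.held.lock().unwrap().insert(changeset)
    }

    /// Releases the lease once syncing has finished or failed.
    pub fn release(&self, changeset: u64) {
        self.held.lock().unwrap().remove(&changeset);
    }
}
```

A worker that fails to acquire the lease can wait and then consult the synced commit mapping instead of repeating the work.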
Cross-repo sync includes validation mechanisms:
Working Copy Verification - The verify_working_copy function confirms that synced commits have identical working copies (after path transformation).
Bookmark Diff - The find_bookmark_diff function identifies discrepancies between bookmarks in source and target repositories.
Commit Validator - The commit validator (features/commit_rewriting/commit_validator/) checks that forward and backward sync produce consistent results.
Scuba Logging - All sync operations are logged to Scuba for monitoring and debugging.
Validation can detect sync configuration errors, sync job failures, or unexpected commit transformations.
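To show what comparing working copies after path transformation can look like, the sketch below checks that every small-repository file appears at its moved path in the large repository with the same content. It is not the real verify_working_copy function: the function name, the integer content ids, and the one-directional check are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Returns the small-repo paths whose moved counterpart is missing from
/// the large repo or has different content. Working copies are modelled
/// as path -> content id maps; `mover` returns `None` for unsynced paths.
pub fn find_working_copy_mismatches<F>(
    small: &BTreeMap<String, u64>,
    large: &BTreeMap<String, u64>,
    mover: F,
) -> Vec<String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut mismatches = Vec::new();
    for (path, content) in small {
        // Paths the mover drops are not expected in the large repo.
        if let Some(target) = mover(path) {
            if large.get(&target) != Some(content) {
                mismatches.push(path.clone());
            }
        }
    }
    mismatches
}
```

An empty result means every small-repository file was found at its mapped location with matching content; files that exist only in the large repository are not examined by this one-directional sketch.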
Component-specific documentation:
features/cross_repo_sync/ - Core sync library implementation
features/commit_transformation/ - Commit rewriting logic
features/commit_rewriting/mononoke_x_repo_sync_job/ - Forward sync job
features/commit_rewriting/backsyncer/ - Backward sync job
features/commit_rewriting/synced_commit_mapping/ - Mapping database
features/commit_rewriting/live_commit_sync_config/ - Configuration management
repo_attributes/repo_cross_repo/ - Repository facet
megarepo_api/ - Megarepo operations