Cross-Repository Sync

This document explains Mononoke's cross-repository synchronization system, the framework for automatically synchronizing commits between repositories.

What is Cross-Repo Sync?

Cross-repo sync is a system that maintains bidirectional synchronization between repositories. It automatically replicates commits from one repository to another, transforming file paths and commit metadata as needed to match each repository's structure.

The most common use case is synchronizing between a large repository (monorepo) and smaller project-specific repositories. When a developer pushes to a small repository, those changes are automatically synced to the corresponding location in the large repository. Conversely, changes in the large repository can be synced back to the small repositories.

Cross-repo sync handles:

Path transformation (files in different locations between repos)
Bookmark mapping (branch names may differ)
Commit metadata preservation
History linearization through pushrebase
Merge commit handling (with limitations)

Sync Directions

Cross-repo sync operates in two directions:

Forward Sync (Small → Large)

Commits from small repositories are synced into the large repository
Each small repo's files are placed in a designated prefix within the large repo
This is the primary direction when small repos are the source of truth
Implemented by the x-repo sync job

Backward Sync (Large → Small)

Commits from the large repository are synced to small repositories
Only changes affecting the small repo's prefix are synced
Maintains the ability to develop in either repository
Implemented by the backsyncer

The relationship between repositories is configured rather than inferred from structure. A repository can participate in multiple sync relationships.

Commit Transformation

When syncing commits between repositories, the sync system transforms them to match the target repository's structure.

Path Transformation

Files are moved according to configured path mappings. Each small repository has a map defining how its paths are transformed.

Example transformation:

Small repo path: src/foo.rs
Large repo path: projects/myproject/src/foo.rs

The mapping is defined in the commit sync configuration using prefix replacements. More complex transformations can be expressed through multiple mapping entries or by using the default action configuration.
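
As a rough illustration of prefix replacement, the sketch below maps a small repo path to a large repo path by picking the longest matching prefix and falling back to a default action. The names `DefaultAction`, `PathMapper`, `map_path`, and the specific default-action variants are hypothetical and are not taken from Mononoke's API.

```rust
use std::collections::BTreeMap;

/// Hypothetical fallback for paths that match no explicit mapping entry.
#[derive(Debug, Clone, PartialEq)]
enum DefaultAction {
    /// Keep the path unchanged.
    Preserve,
    /// Place the path under the given prefix.
    PrependPrefix(String),
    /// Exclude the path from the target repository.
    DoNotSync,
}

/// Hypothetical mapper: explicit prefix replacements plus a default action.
struct PathMapper {
    /// Small repo prefix -> large repo prefix (no trailing slashes).
    map: BTreeMap<String, String>,
    default_action: DefaultAction,
}

impl PathMapper {
    /// Returns the mapped path, or None if the path should not be synced.
    fn map_path(&self, path: &str) -> Option<String> {
        // Pick the longest matching prefix so more specific entries win.
        let best = self
            .map
            .iter()
            .filter(|(from, _)| is_prefix(from, path))
            .max_by_key(|(from, _)| from.len());
        if let Some((from, to)) = best {
            let rest = path[from.len()..].trim_start_matches('/');
            return Some(join(to, rest));
        }
        match &self.default_action {
            DefaultAction::Preserve => Some(path.to_string()),
            DefaultAction::PrependPrefix(prefix) => Some(join(prefix, path)),
            DefaultAction::DoNotSync => None,
        }
    }
}

/// True if `prefix` matches `path` on a path-component boundary.
fn is_prefix(prefix: &str, path: &str) -> bool {
    path == prefix
        || path
            .strip_prefix(prefix)
            .map_or(false, |rest| rest.starts_with('/'))
}

/// Joins two path fragments, tolerating an empty side.
fn join(prefix: &str, rest: &str) -> String {
    match (prefix.is_empty(), rest.is_empty()) {
        (true, _) => rest.to_string(),
        (_, true) => prefix.to_string(),
        _ => format!("{}/{}", prefix, rest),
    }
}
```

Matching on component boundaries keeps an entry for `docs` from capturing a sibling directory such as `docs2`.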

Commit Rewriting

Commits are rewritten during sync using the commit transformation library (features/commit_transformation/). The rewriting process:

Loads the source commit
Applies path transformations to file changes
Remaps parent commits (already synced)
Preserves author, timestamp, and commit message
Stores the rewritten commit in the target repository

Commits that would be empty after path transformation (no files in the mapped paths) are handled specially. These may be recorded as "not a sync candidate" or mapped to an equivalent working copy ancestor.
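
The rewriting steps above can be sketched roughly as follows. This is a simplified model, not the code in features/commit_transformation/: the `Commit` struct, the `rewrite_commit` function, and the `mover` callback are hypothetical, and an empty result is reported as `None` so that the caller can decide how to record it.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical, simplified commit; real changesets carry much more data.
#[derive(Debug, Clone, PartialEq)]
struct Commit {
    parents: Vec<String>,
    author: String,
    timestamp: i64,
    message: String,
    /// Path -> new content id, or None for a deletion.
    file_changes: BTreeMap<String, Option<String>>,
}

/// Rewrites `source` for the target repository.
/// Returns Ok(None) when no file change survives path transformation.
fn rewrite_commit(
    source: &Commit,
    synced_parents: &HashMap<String, String>,
    mover: impl Fn(&str) -> Option<String>,
) -> Result<Option<Commit>, String> {
    // Every parent must already have a counterpart in the target repo.
    let parents = source
        .parents
        .iter()
        .map(|p| {
            synced_parents
                .get(p)
                .cloned()
                .ok_or_else(|| format!("parent {} is not synced yet", p))
        })
        .collect::<Result<Vec<_>, _>>()?;

    // Keep only the changes whose paths map into the target repo.
    let file_changes: BTreeMap<String, Option<String>> = source
        .file_changes
        .iter()
        .filter_map(|(path, change)| mover(path.as_str()).map(|p| (p, change.clone())))
        .collect();

    if file_changes.is_empty() {
        return Ok(None);
    }

    // Author, timestamp, and message are carried over unchanged.
    Ok(Some(Commit {
        parents,
        author: source.author.clone(),
        timestamp: source.timestamp,
        message: source.message.clone(),
        file_changes,
    }))
}
```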

Special Cases

Merge Commits - Supported with limitations. Both parents must already be synced. Some complex merge scenarios are not supported and will cause sync to fail.

Copy/Move Information - File copy and move metadata is preserved when both source and destination paths map to the target repository.

Git Submodules - When configured, submodule expansion can be performed during sync, converting submodule pointers into expanded directory contents.

Sync Process

Cross-repo sync operates by tailing the bookmark update log of the source repository.

Forward Sync Process

The x-repo sync job (features/commit_rewriting/mononoke_x_repo_sync_job/) performs forward sync:

Tail Bookmark Updates - Monitor bookmark update log for changes
Find Unsynced Ancestors - Identify commits not yet synced to target repo
Sync in Order - Process commits topologically (parents before children)
Transform Commits - Rewrite each commit for the target repository
Pushrebase - Land synced commits onto target bookmarks via pushrebase
Record Mapping - Store source-to-target commit mapping
Update Counter - Record progress for resumption

Common pushrebase bookmarks (configured as common_pushrebase_bookmarks) receive special handling. These are bookmarks where pushrebase is used to maintain linear history in both repositories.
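
The "Find Unsynced Ancestors" and "Sync in Order" steps amount to a graph walk that stops at commits already synced and emits parents before children. The sketch below shows one way to do that with an iterative post-order traversal over an in-memory parent map. The function name and the string commit ids are assumptions for illustration; the actual job works against Mononoke's commit graph.

```rust
use std::collections::{HashMap, HashSet};

/// Returns the unsynced ancestors of `head` (including `head` itself),
/// ordered so that parents always come before their children.
fn unsynced_ancestors_in_order(
    head: &str,
    parents: &HashMap<String, Vec<String>>,
    already_synced: &HashSet<String>,
) -> Vec<String> {
    let mut order = Vec::new();
    let mut visited = HashSet::new();
    // Iterative post-order DFS: the boolean marks "parents already pushed".
    let mut stack = vec![(head.to_string(), false)];
    while let Some((commit, parents_done)) = stack.pop() {
        if parents_done {
            order.push(commit);
            continue;
        }
        // Stop at synced commits and skip commits seen via another child.
        if already_synced.contains(&commit) || !visited.insert(commit.clone()) {
            continue;
        }
        stack.push((commit.clone(), true));
        if let Some(ps) = parents.get(&commit) {
            for p in ps {
                stack.push((p.clone(), false));
            }
        }
    }
    order
}
```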

Backward Sync Process

The backsyncer (features/commit_rewriting/backsyncer/) performs backward sync:

Tail Large Repo - Monitor bookmark updates in the large repository
Filter Changes - Select commits affecting the small repo's mapped paths
Transform Commits - Rewrite commits, removing unrelated paths
Sync to Small Repo - Push transformed commits to small repository
Record Mapping - Store bidirectional mapping

Backsyncer can run continuously or in catch-up mode to process historical commits.
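
The "Filter Changes" and "Transform Commits" steps can be pictured as selecting the changed paths under the small repo's prefix and stripping that prefix. The sketch below assumes a single prefix per small repo and a hypothetical `backsync_paths` helper; real configurations may use several mapping entries.

```rust
/// Returns the small repo view of the changed paths, or None if the
/// commit touches nothing under `prefix` (given without a trailing slash).
fn backsync_paths(changed_paths: &[&str], prefix: &str) -> Option<Vec<String>> {
    let kept: Vec<String> = changed_paths
        .iter()
        // Keep only paths that start with the prefix...
        .filter_map(|path| path.strip_prefix(prefix))
        // ...on a path-component boundary, dropping the separator.
        .filter_map(|rest| rest.strip_prefix('/'))
        .map(str::to_string)
        .collect();
    if kept.is_empty() {
        None
    } else {
        Some(kept)
    }
}
```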

Synced Commit Mapping

All synced commits are recorded in the synced commit mapping database (features/commit_rewriting/synced_commit_mapping/). This mapping stores:

Source repository and changeset ID
Target repository and changeset ID
Sync config version used
Sync outcome (rewritten, equivalent working copy, not a sync candidate)

The mapping is bidirectional, allowing queries in either direction. It is consulted before syncing to avoid duplicate work and to remap parent commits correctly.
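
A minimal in-memory model of a bidirectional mapping is sketched below. It is not the schema of the real database: the `MappingEntry` and `SyncedCommitMapping` types here, their fields, and the choice to key the large side by small repo id are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// (repository id, changeset id)
type Key = (u32, String);

#[derive(Debug, Clone, PartialEq)]
struct MappingEntry {
    small: Key,
    large: Key,
    /// Sync config version used to produce this mapping.
    version: String,
}

#[derive(Default)]
struct SyncedCommitMapping {
    by_small: HashMap<Key, MappingEntry>,
    // Assumption: one large repo commit may correspond to commits in
    // several small repos, so the large side is keyed by small repo id too.
    by_large: HashMap<(Key, u32), MappingEntry>,
}

impl SyncedCommitMapping {
    /// Records a mapping; returns false if the small commit is already mapped.
    fn add(&mut self, entry: MappingEntry) -> bool {
        if self.by_small.contains_key(&entry.small) {
            return false;
        }
        self.by_large
            .insert((entry.large.clone(), entry.small.0), entry.clone());
        self.by_small.insert(entry.small.clone(), entry);
        true
    }

    /// Small -> large lookup.
    fn large_for(&self, small: &Key) -> Option<&MappingEntry> {
        self.by_small.get(small)
    }

    /// Large -> small lookup for one particular small repo.
    fn small_for(&self, large: &Key, small_repo: u32) -> Option<&MappingEntry> {
        self.by_large.get(&(large.clone(), small_repo))
    }
}
```

Checking `add` before doing any rewriting work is what lets a sync job skip commits that were already processed.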

Commit Sync Outcomes

When a commit is synced, the result is recorded as a commit sync outcome:

RewrittenAs - The commit was transformed and created as a new commit in the target repository. This is the most common outcome.

EquivalentWorkingCopyAncestor - The commit would be empty after transformation (no files in mapped paths), so it maps to an ancestor commit with the same working copy state.

NotSyncCandidate - The commit should not be synced to the target repository. This is recorded when all file changes are outside mapped paths.

Multiple source commits may map to the same target commit when they only affect unmapped paths. The plural commit sync outcome type represents this many-to-one relationship.
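
The three outcomes can be modelled as an enum. The variant names below come from this document, but the payload fields and the `target_for_parent_remap` helper are assumptions, shown only to illustrate how a caller might use an outcome when remapping a parent.

```rust
/// Simplified model of a sync outcome; field layout is an assumption.
#[derive(Debug, Clone, PartialEq)]
enum CommitSyncOutcome {
    /// A new commit was created in the target repository.
    RewrittenAs { target: String, version: String },
    /// No new commit; `target` has an equivalent working copy.
    EquivalentWorkingCopyAncestor { target: String, version: String },
    /// The commit has no counterpart in the target repository.
    NotSyncCandidate { version: String },
}

impl CommitSyncOutcome {
    /// Target commit to use when remapping a parent, if there is one.
    fn target_for_parent_remap(&self) -> Option<&str> {
        match self {
            Self::RewrittenAs { target, .. }
            | Self::EquivalentWorkingCopyAncestor { target, .. } => Some(target.as_str()),
            Self::NotSyncCandidate { .. } => None,
        }
    }
}
```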

Configuration

Cross-repo sync is configured through the CommitSyncConfig structure in repository metadata.

Commit Sync Config Structure

Each sync relationship defines:

Large Repository - The repository ID of the large (monorepo) repository

Small Repositories - Map of small repository IDs to their configurations, each containing:

Default Action - What to do with paths not explicitly mapped
Path Map - Prefix replacements (small repo path → large repo path)
Submodule Config - Git submodule handling if applicable

Common Pushrebase Bookmarks - Bookmarks where changes from small repos are pushed via pushrebase

Version Name - Identifies this configuration version
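
The shape described above might look roughly like the following. Only the name `CommitSyncConfig` and the listed concepts come from this document; the field names, field types, `DefaultAction` variants, and the `validate` check are illustrative assumptions rather than the actual definitions.

```rust
use std::collections::HashMap;

/// What to do with paths that match no explicit mapping entry.
/// The variants are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
enum DefaultAction {
    Preserve,
    PrependPrefix(String),
}

#[derive(Debug, Clone, PartialEq)]
struct SmallRepoConfig {
    default_action: DefaultAction,
    /// Prefix replacements: small repo prefix -> large repo prefix.
    path_map: HashMap<String, String>,
    /// Set only when git submodule handling applies (opaque here).
    submodule_config: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
struct CommitSyncConfig {
    large_repo_id: u32,
    /// Small repository id -> its configuration.
    small_repos: HashMap<u32, SmallRepoConfig>,
    common_pushrebase_bookmarks: Vec<String>,
    version_name: String,
}

impl CommitSyncConfig {
    /// Hypothetical sanity check, not taken from Mononoke.
    fn validate(&self) -> Result<(), String> {
        if self.small_repos.contains_key(&self.large_repo_id) {
            return Err(format!(
                "repo {} is listed as both large and small",
                self.large_repo_id
            ));
        }
        if self.version_name.is_empty() {
            return Err("version name must not be empty".to_string());
        }
        Ok(())
    }
}
```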

Configuration Versioning

Sync configurations are versioned using CommitSyncConfigVersion. This allows the sync rules to evolve over time:

New path mappings can be added in a new version
Different commits may be synced using different config versions
The version is recorded in the synced commit mapping
Live config can change while preserving historical mappings

Configuration is managed through live_commit_sync_config (features/commit_rewriting/live_commit_sync_config/), which provides access to both current and historical configuration versions.
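
The idea of keeping historical versions queryable while the current version moves forward can be sketched with a map keyed by version name. `VersionedConfigs` and its methods are hypothetical and do not mirror the live_commit_sync_config interface; a path map stands in for a full configuration.

```rust
use std::collections::HashMap;

/// Hypothetical store of path maps keyed by config version name.
struct VersionedConfigs {
    current_version: String,
    versions: HashMap<String, Vec<(String, String)>>,
}

impl VersionedConfigs {
    fn new(version: &str, path_map: Vec<(String, String)>) -> Self {
        let mut versions = HashMap::new();
        versions.insert(version.to_string(), path_map);
        Self {
            current_version: version.to_string(),
            versions,
        }
    }

    /// Adds a new version and makes it current; old versions stay queryable.
    fn publish(&mut self, version: &str, path_map: Vec<(String, String)>) {
        self.versions.insert(version.to_string(), path_map);
        self.current_version = version.to_string();
    }

    /// Path map of the current version.
    fn current(&self) -> &[(String, String)] {
        &self.versions[&self.current_version]
    }

    /// Path map of a specific (possibly historical) version.
    fn by_version(&self, version: &str) -> Option<&[(String, String)]> {
        self.versions.get(version).map(Vec::as_slice)
    }
}
```

Because each synced commit records the version it was synced with, a later lookup can fetch the same rules that produced it even after newer versions are published.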

Bookmark Mapping

Small repository bookmarks can be mapped to large repository bookmarks with a configured prefix. For example:

Small repo bookmark: main
Large repo bookmark: projectname/main

The bookmark prefix is configured per small repository in the permanent configuration.
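
A prefix-based bookmark rename is simple enough to sketch in a few lines. The function names are hypothetical, and the sketch assumes the configured prefix includes its trailing separator (for example `projectname/`).

```rust
/// Small repo bookmark -> large repo bookmark.
fn to_large_bookmark(small_bookmark: &str, prefix: &str) -> String {
    format!("{}{}", prefix, small_bookmark)
}

/// Large repo bookmark -> small repo bookmark, or None if the bookmark
/// does not carry this small repo's prefix.
fn to_small_bookmark(large_bookmark: &str, prefix: &str) -> Option<String> {
    large_bookmark
        .strip_prefix(prefix)
        .filter(|rest| !rest.is_empty())
        .map(str::to_string)
}
```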

Repository Integration

Cross-repo sync is integrated into repositories through the repo_cross_repo facet (repo_attributes/repo_cross_repo/). This facet provides:

Synced Commit Mapping - Database of synced commits
Live Commit Sync Config - Current and historical sync configurations
Sync Lease - Coordination to prevent concurrent sync attempts
Submodule Dependencies - Repository IDs of submodule dependencies

The facet is used by both sync jobs and by other repository operations that need to query or update sync state.

Use Cases

Cross-repo sync serves several use cases:

Monorepo and Project Repos - A large monorepo contains many projects. Each project can have a dedicated small repository. Developers can commit to either repository, and changes are automatically synchronized.

Code Sharing - Common code can be shared between repositories. Changes in the shared code are propagated to all repositories that include it.

Migration - Repositories can be gradually migrated into or out of a monorepo while maintaining dual operation during the transition period.

Megarepo Operations - Initial import of repositories into a megarepo uses cross-repo sync mechanisms to establish the initial mappings and history. See megarepo_api/ for megarepo-specific operations.

Directory Isolation - A large repository can be split into smaller repositories by directory, allowing teams to work in isolated repositories while maintaining the option to work in the full repository.

Jobs and Tools

X-Repo Sync Job

The primary sync job (features/commit_rewriting/mononoke_x_repo_sync_job/) runs continuously to sync commits from small repositories to the large repository. It can be run in sharded mode across multiple processes for scalability.

Operation:

Tails bookmark update log
Processes commits in topological order
Uses pushrebase for common bookmarks
Records progress in mutable counters
Resumes from last processed position on restart
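
The tail-and-resume behaviour can be sketched as a loop over log entries newer than a stored counter, advancing the counter only after an entry has been synced. `LogEntry`, `tail_once`, and the in-memory counter are hypothetical stand-ins for the bookmark update log and mutable counters.

```rust
/// Hypothetical bookmark update log entry.
struct LogEntry {
    id: u64,
    bookmark: String,
    to_changeset: String,
}

/// Processes all entries with id greater than `counter`, in log order.
/// On failure the counter stays at the last successfully synced entry,
/// so a restart resumes from the entry that failed.
fn tail_once(
    log: &[LogEntry],
    counter: &mut u64,
    mut sync: impl FnMut(&LogEntry) -> Result<(), String>,
) -> Result<usize, String> {
    let start = *counter;
    let mut processed = 0;
    for entry in log.iter().filter(|e| e.id > start) {
        sync(entry)?;
        // Record progress only after the entry synced successfully.
        *counter = entry.id;
        processed += 1;
    }
    Ok(processed)
}
```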

Backsyncer

The backsyncer (features/commit_rewriting/backsyncer/) syncs commits from the large repository to small repositories. It can run continuously or in catch-up mode.

Modes:

Continuous tailing of large repo bookmarks
Catch-up mode to process historical commits
Validation mode to verify sync correctness

Admin Tools

The admin CLI (tools/admin/) provides commands for cross-repo sync operations:

Manually triggering sync of specific commits
Querying synced commit mappings
Validating working copy equivalence
Inspecting sync configuration

Limitations and Constraints

The cross-repo sync system has several limitations:

Merge Commits - Some merge commit scenarios are not supported. A merge fails to sync if either of its parents has not been synced yet.

Root Commits - Root commits (commits with no parents) and their descendants may not sync unless merged into a main line of development.

Path Conflicts - If path transformations would create conflicts (multiple source paths mapping to the same target path), sync will fail.

Bookmark Filters - Not all bookmarks are synced. Only explicitly configured bookmarks participate in sync.

Sequential Processing - Commits must be synced in topological order, which can limit parallelism.

These limitations are documented in the sync job source code and are checked during sync operations.
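
To make the path conflict limitation concrete, the sketch below groups source paths by their transformed target path and reports every target reached from more than one source. The `find_path_conflicts` function and the `mover` callback are hypothetical; this is not the check the sync job performs.

```rust
use std::collections::BTreeMap;

/// Returns target path -> source paths for every target that more than
/// one source path maps to. An empty result means no conflicts.
fn find_path_conflicts(
    source_paths: &[&str],
    mover: impl Fn(&str) -> Option<String>,
) -> BTreeMap<String, Vec<String>> {
    let mut by_target: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for &path in source_paths {
        // Paths that are not synced at all cannot conflict.
        if let Some(target) = mover(path) {
            by_target.entry(target).or_default().push(path.to_string());
        }
    }
    by_target.retain(|_, sources| sources.len() > 1);
    by_target
}
```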

Performance Considerations

Cross-repo sync is designed for continuous operation with commits arriving at high rates:

Incremental Sync - Only new commits since the last sync are processed, not the entire repository history.

Batching - Multiple commits can be processed in a single sync iteration when catching up.

Leasing - Sync lease prevents multiple servers from syncing the same commits concurrently, avoiding wasted work.

Derived Data - Derived data can be computed asynchronously after sync completes, not blocking the sync operation.

Caching - Live commit sync config is cached to avoid repeated configuration lookups.

Sharding - The sync job can be sharded across multiple processes, each handling a subset of repositories.
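
The leasing idea above can be illustrated with an in-process stand-in: the first caller to claim a key does the work and everyone else backs off. `SyncLease` as written here is hypothetical and only coordinates threads within one process; coordination across servers would need a shared store, which this sketch does not model.

```rust
use std::collections::HashSet;
use std::sync::Mutex;

/// In-process lease table keyed by an arbitrary string,
/// for example "<repo>:<changeset>".
#[derive(Default)]
struct SyncLease {
    held: Mutex<HashSet<String>>,
}

impl SyncLease {
    /// Returns true if the caller now holds the lease for `key`.
    fn try_acquire(&self, key: &str) -> bool {
        self.held.lock().unwrap().insert(key.to_string())
    }

    /// Releases the lease so that another caller can take it.
    fn release(&self, key: &str) {
        self.held.lock().unwrap().remove(key);
    }
}
```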

Monitoring and Validation

Cross-repo sync includes validation mechanisms:

Working Copy Verification - The verify_working_copy function confirms that synced commits have identical working copies (after path transformation).

Bookmark Diff - The find_bookmark_diff function identifies discrepancies between bookmarks in source and target repositories.

Commit Validator - The commit validator (features/commit_rewriting/commit_validator/) checks that forward and backward sync produce consistent results.

Scuba Logging - All sync operations are logged to Scuba for monitoring and debugging.

Validation can detect sync configuration errors, sync job failures, or unexpected commit transformations.
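
The working copy and bookmark checks can be approximated as plain map comparisons. The sketch below is written in the spirit of verify_working_copy and find_bookmark_diff but does not reproduce their signatures or behaviour; `working_copies_match`, `bookmark_diff`, and the `mover` callback are hypothetical, and working copies are modelled as path-to-content-id maps.

```rust
use std::collections::BTreeMap;

/// Path -> content id for every file in a working copy.
type WorkingCopy = BTreeMap<String, String>;

/// True if the large repo working copy, mapped through `mover`
/// (large path -> small path), equals the small repo working copy.
fn working_copies_match(
    large: &WorkingCopy,
    small: &WorkingCopy,
    mover: impl Fn(&str) -> Option<String>,
) -> bool {
    let expected: WorkingCopy = large
        .iter()
        .filter_map(|(path, content)| mover(path.as_str()).map(|p| (p, content.clone())))
        .collect();
    expected == *small
}

/// Names of bookmarks whose expected changeset is missing from, or
/// different in, `actual`. `expected` is assumed to be already renamed
/// and remapped into the target repository's terms.
fn bookmark_diff(
    expected: &BTreeMap<String, String>,
    actual: &BTreeMap<String, String>,
) -> Vec<String> {
    expected
        .iter()
        .filter(|(name, cs)| actual.get(*name) != Some(*cs))
        .map(|(name, _)| name.clone())
        .collect()
}
```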

Related Documentation

Architecture Overview - Cross-repo sync in system context
Repository Facets - RepoCrossRepo facet
Pushrebase - How commits are landed during sync
Jobs and Background Workers - Sync job operational context

Component-specific documentation:

features/cross_repo_sync/ - Core sync library implementation
features/commit_transformation/ - Commit rewriting logic
features/commit_rewriting/mononoke_x_repo_sync_job/ - Forward sync job
features/commit_rewriting/backsyncer/ - Backward sync job
features/commit_rewriting/synced_commit_mapping/ - Mapping database
features/commit_rewriting/live_commit_sync_config/ - Configuration management
repo_attributes/repo_cross_repo/ - Repository facet
megarepo_api/ - Megarepo operations
