eden/mononoke/docs/4.4-redaction.md
This document explains Mononoke's content redaction system, which allows removal of sensitive content from repositories while maintaining history integrity. Redaction is implemented at the blobstore layer and operates independently of the version control data model.
Content redaction is the process of preventing access to specific file contents in a repository. When content is redacted, the repository history remains intact—commits, file paths, and metadata are preserved—but attempts to access the actual file contents are blocked.
Redaction addresses several scenarios:
Unlike commit rewriting or history modification, redaction does not alter the repository's commit graph. The redacted content remains stored in the blobstore but is marked as inaccessible.
Redaction is implemented through the RedactedBlobstore decorator in blobstore/redactedblobstore/. This decorator wraps the repository's blobstore and intercepts all access attempts.
The RedactedBlobstore maintains a mapping of redacted blobstore keys (content IDs) to metadata. When a blob is requested:
- Redacted error containing the reason for redaction

This approach intercepts access at the storage layer regardless of which protocol (Sapling, Git, SCS) or operation (clone, fetch, blame) is being used.
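The interception described above can be sketched as a small decorator. This is an illustrative, synchronous stand-in and not the actual RedactedBlobstore: the names MemBlobstore, RedactingStore, GetError, and sample_store, the key strings, and the task reference are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum GetError {
    /// Returned instead of the blob; carries the reason from the redaction config.
    Redacted { key: String, reason: String },
}

/// Stand-in for the wrapped blobstore: a plain in-memory key/value map.
struct MemBlobstore {
    blobs: HashMap<String, Vec<u8>>,
}

impl MemBlobstore {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.blobs.get(key).cloned()
    }
}

/// Decorator: checks the redaction map first, then delegates to the inner store.
struct RedactingStore {
    inner: MemBlobstore,
    /// Redacted key -> reason (for example a task reference).
    redacted: HashMap<String, String>,
}

impl RedactingStore {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, GetError> {
        if let Some(reason) = self.redacted.get(key) {
            return Err(GetError::Redacted {
                key: key.to_string(),
                reason: reason.clone(),
            });
        }
        Ok(self.inner.get(key))
    }
}

/// Builds a store with one public blob and one redacted blob (made-up keys).
fn sample_store() -> RedactingStore {
    let mut blobs = HashMap::new();
    blobs.insert("content.aaaa".to_string(), b"public".to_vec());
    blobs.insert("content.bbbb".to_string(), b"secret".to_vec());
    let mut redacted = HashMap::new();
    redacted.insert("content.bbbb".to_string(), "T12345".to_string());
    RedactingStore {
        inner: MemBlobstore { blobs },
        redacted,
    }
}

fn main() {
    let store = sample_store();
    println!("{:?}", store.get("content.aaaa"));
    println!("{:?}", store.get("content.bbbb"));
}
```

Because the check lives in the store wrapper, every caller that goes through it sees the same behavior, whichever protocol or operation triggered the read.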
Redaction configuration consists of two components stored separately:
Redaction Sets - Configuration stored in Configerator (Meta's configuration management system) at a path specified in repository configuration. Each redaction set contains:
- reason - Reference to a task or SEV explaining why content was redacted
- id - RedactionKeyListId pointing to the list of redacted keys in the blobstore
- enforce - Boolean controlling whether access is denied (true) or only logged (false)

Redaction Key Lists - Lists of blobstore keys stored in a special RedactionConfigBlobstore. Each list is a serialized RedactionKeyList containing the actual content IDs to redact.
The configuration is reloaded periodically (every 60 seconds) from Configerator, allowing redaction policy updates without server restarts.
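As a rough illustration of how the two components fit together, the sketch below joins redaction sets with their key lists into a single lookup table. The field names reason, id, and enforce come from the description above; the Metadata struct, the build_redaction_map function, the use of plain strings for IDs, and the choice to let a later set overwrite an earlier one for the same key are assumptions of this example, not the real implementation.

```rust
use std::collections::HashMap;

/// One entry of the redaction configuration (fields as described above).
#[derive(Debug, Clone, PartialEq)]
struct RedactionSet {
    reason: String,
    id: String, // stands in for a RedactionKeyListId
    enforce: bool,
}

/// Hypothetical per-key metadata kept by the blobstore wrapper.
#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    reason: String,
    enforce: bool,
}

/// Joins sets with their key lists into a key -> metadata map.
/// In this sketch a set whose key list is missing contributes nothing,
/// and a later set overwrites an earlier one for the same key.
fn build_redaction_map(
    sets: &[RedactionSet],
    key_lists: &HashMap<String, Vec<String>>,
) -> HashMap<String, Metadata> {
    let mut map = HashMap::new();
    for set in sets {
        if let Some(keys) = key_lists.get(&set.id) {
            for key in keys {
                map.insert(
                    key.clone(),
                    Metadata {
                        reason: set.reason.clone(),
                        enforce: set.enforce,
                    },
                );
            }
        }
    }
    map
}

/// Example data: one enforced set and one log-only set (made-up IDs and keys).
fn sample_map() -> HashMap<String, Metadata> {
    let sets = vec![
        RedactionSet { reason: "T111".into(), id: "list-1".into(), enforce: true },
        RedactionSet { reason: "T222".into(), id: "list-2".into(), enforce: false },
    ];
    let mut key_lists = HashMap::new();
    key_lists.insert("list-1".to_string(), vec!["content.aaaa".to_string()]);
    key_lists.insert(
        "list-2".to_string(),
        vec!["content.bbbb".to_string(), "content.cccc".to_string()],
    );
    build_redaction_map(&sets, &key_lists)
}

fn main() {
    let map = sample_map();
    let mut keys: Vec<_> = map.keys().collect();
    keys.sort();
    for key in keys {
        println!("{} -> {:?}", key, map[key]);
    }
}
```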
Redaction operates on content IDs (Blake2b hashes of file contents), not file paths or commits. This means:
Redacting content involves multiple steps performed using the admin CLI tool in tools/admin/.
The first step is determining which content needs redaction. The admin tool provides commands for this:
From file paths - Given a commit and file paths, the tool:
From blobstore keys - For cases where content IDs are already known, a key list can be created directly from blobstore keys.
Listing redacted content - Given a commit, the tool can search for all redacted file paths by comparing content IDs against the current redaction configuration.
The create-key-list command creates a RedactionKeyList and stores it in the redaction config blobstore:
admin redaction create-key-list -i <commit-id> file1.txt file2.txt
This command:
- --force is used)
- RedactionKeyList containing the content IDs
- RedactionKeyListId (a Blake2b hash of the serialized list)

The key list can optionally be written to a file using --output-file.
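The idea of a content-addressed list ID can be sketched as follows. This is only an illustration of the principle: Rust's standard library has no Blake2b, so the example substitutes std's DefaultHasher, and the newline-joined serialization, the function names, and the key strings are all assumptions rather than the format Mononoke uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical serialization: keys joined by newlines, in the order given.
fn serialize_key_list(keys: &[&str]) -> Vec<u8> {
    keys.join("\n").into_bytes()
}

/// Derives an ID from the serialized bytes.
/// DefaultHasher is a stand-in here; the real ID is a Blake2b hash.
fn key_list_id(keys: &[&str]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(&serialize_key_list(keys));
    format!("{:016x}", hasher.finish())
}

fn main() {
    let id = key_list_id(&["content.aaaa", "content.bbbb"]);
    println!("key list id: {}", id);
}
```

The property being illustrated is that the ID depends only on the list's bytes, so storing the same list twice yields the same ID, and any change to the list yields a different one.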
Creating a key list does not activate redaction. The list ID must be added to the Configerator redaction configuration:
- RedactionSet entry to the repository's redaction configuration in Configerator
- RedactionKeyListId, a reason (task or SEV), and enforce status

Setting enforce: false in a redaction set enables log-only mode. Access to redacted content is permitted but logged. This allows:
Access logs include the session ID, username (if available), operation type, and enforcement status.
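A minimal sketch of how the enforce flag could separate log-only mode from denial is shown below. The log fields mirror the ones listed above (session ID, optional username, operation type, enforcement status); the type names, the check_access function, and the in-memory Vec used as a log sink are assumptions for illustration only.

```rust
/// One access-log record, with the fields listed above plus the key.
#[derive(Debug, PartialEq)]
struct AccessLogEntry {
    session_id: String,
    username: Option<String>,
    operation: String,
    key: String,
    enforced: bool,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny,
}

/// Hypothetical per-key metadata; only the enforce flag matters here.
struct Metadata {
    enforce: bool,
}

/// Decides whether a read may proceed and records access to redacted keys.
/// `meta` is None when the key is not redacted at all.
fn check_access(
    meta: Option<&Metadata>,
    session_id: &str,
    username: Option<&str>,
    operation: &str,
    key: &str,
    log: &mut Vec<AccessLogEntry>,
) -> Decision {
    let meta = match meta {
        None => return Decision::Allow, // not redacted: nothing to log
        Some(m) => m,
    };
    log.push(AccessLogEntry {
        session_id: session_id.to_string(),
        username: username.map(str::to_string),
        operation: operation.to_string(),
        key: key.to_string(),
        enforced: meta.enforce,
    });
    if meta.enforce {
        Decision::Deny
    } else {
        Decision::Allow // log-only mode: access permitted but recorded
    }
}

fn main() {
    let mut log = Vec::new();
    let log_only = Metadata { enforce: false };
    let decision = check_access(Some(&log_only), "s1", Some("alice"), "get", "content.bbbb", &mut log);
    println!("{:?} {:?}", decision, log);
}
```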
Redaction enforcement operates independently of repository permissions. When a redaction set is enforced, even users with read access to the repository cannot access the redacted content.
The CoreContext carries session information used for audit logging but does not grant exceptions to redaction. There is no mechanism to selectively grant access to redacted content based on user identity.
For operational purposes, the RedactedBlobstore provides an as_inner_unredacted() method that returns the underlying blobstore, but this is used only by internal tools that explicitly need to bypass redaction for maintenance operations.
Metadata Preservation - Redaction affects only file contents. Commit messages, file paths, author information, and other metadata remain visible. If metadata itself contains sensitive information, redaction alone is insufficient.
Historical Commits - Commits that include redacted content remain in the repository. The commit graph is not modified. Only access to the file contents is prevented.
Content-Addressed Nature - Redaction applies to content IDs. If the same content was previously accessible, copies may exist in caches (memcache, cachelib, local filesystems). Cache expiration eventually removes these copies, but immediate removal is not guaranteed.
Derived Data - Some derived data types (like Git trees or Mercurial manifests) may reference redacted content IDs. Accessing these derived data structures succeeds, but attempts to fetch the actual file contents fail.
No Content Deletion - Redaction prevents access but does not delete content from the blobstore. The blobs remain stored and continue consuming storage space. Physical deletion would require separate blobstore cleanup operations.
Configuration Propagation - Changes to redaction configuration take up to 60 seconds to propagate to all servers. During this window, some servers may enforce redaction while others do not.
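The propagation window can be illustrated with a toy simulation in which each server refreshes its view of the configuration once its own 60-second interval has elapsed. The Server type, the simulated clock in whole seconds, and the key string are assumptions for this sketch; real servers reload from Configerator rather than from a slice passed in by the caller.

```rust
const RELOAD_INTERVAL_SECS: u64 = 60;

/// Toy server holding its last-loaded view of the redacted keys.
struct Server {
    last_reload_secs: u64,
    redacted_keys: Vec<String>,
}

impl Server {
    fn new(last_reload_secs: u64) -> Self {
        Server {
            last_reload_secs,
            redacted_keys: Vec::new(),
        }
    }

    /// Reloads the published config only if this server's interval has elapsed.
    fn tick(&mut self, now_secs: u64, published: &[String]) {
        if now_secs.saturating_sub(self.last_reload_secs) >= RELOAD_INTERVAL_SECS {
            self.redacted_keys = published.to_vec();
            self.last_reload_secs = now_secs;
        }
    }

    fn is_redacted(&self, key: &str) -> bool {
        self.redacted_keys.iter().any(|k| k.as_str() == key)
    }
}

fn main() {
    // A new redaction is published at some point before t = 60.
    let published = vec!["content.bbbb".to_string()];
    let mut a = Server::new(0); // last reloaded at t = 0
    let mut b = Server::new(30); // last reloaded at t = 30

    a.tick(60, &published);
    b.tick(60, &published);
    println!("t=60: a={} b={}", a.is_redacted("content.bbbb"), b.is_redacted("content.bbbb"));

    b.tick(90, &published);
    println!("t=90: b={}", b.is_redacted("content.bbbb"));
}
```

In this simulation server a picks up the change at t = 60 while server b still serves the content until its own reload at t = 90, which is the kind of inconsistency the window above refers to.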
The redaction system is implemented across several components:
Blobstore Decorator (blobstore/redactedblobstore/) - The RedactedBlobstore implements the Blobstore trait and wraps another blobstore implementation. It maintains the redacted content map and performs access checks.
Feature Library (features/redaction/) - Provides high-level functions for creating and managing redaction key lists. Used by the admin tool.
Admin Commands (tools/admin/src/commands/redaction/) - Command-line interface for redaction operations:
- create-key-list - Create key list from file paths in a commit
- create-key-list-from-ids - Create key list from blobstore keys directly
- fetch-key-list - Retrieve a key list by ID
- list - List redacted paths in a commit

Type Definitions (mononoke_types/) - Defines RedactionKeyList and RedactionKeyListId types for serialization and content addressing.
Configuration (metaconfig/types/) - Defines RedactionConfig structure specifying the blobstore for key lists and the Configerator location for redaction sets.
The RedactedBlobstore is inserted into the blobstore decorator stack by the repository factory, positioned to intercept all content access regardless of caching or other decorators.
Component-specific implementation details are located in blobstore/redactedblobstore/ and features/redaction/.