Content Redaction

eden/mononoke/docs/4.4-redaction.md

This document explains Mononoke's content redaction system, which blocks access to sensitive file contents in a repository while leaving the repository's history intact. Redaction is implemented at the blobstore layer and operates independently of the version control data model.

What is Redaction?

Content redaction is the process of preventing access to specific file contents in a repository. When content is redacted, the repository history remains intact—commits, file paths, and metadata are preserved—but attempts to access the actual file contents are blocked.

Redaction addresses several scenarios:

  • Accidentally committed credentials or secrets
  • Sensitive data that must be removed for compliance or security reasons
  • Content that violates policies or legal requirements
  • Data subject to removal requests

Unlike commit rewriting or history modification, redaction does not alter the repository's commit graph. The redacted content remains stored in the blobstore but is marked as inaccessible.

How Redaction Works

Redaction is implemented through the RedactedBlobstore decorator in blobstore/redactedblobstore/. This decorator wraps the repository's blobstore and intercepts all access attempts.

Blobstore-Level Enforcement

The RedactedBlobstore maintains a mapping from redacted blobstore keys (content IDs) to redaction metadata: the reason for the redaction and whether it is enforced. When a blob is requested:

  1. Access Check - The blobstore key is checked against the redacted content list
  2. Decision - If the key is redacted:
    • Enforce mode - Returns a Redacted error containing the reason for redaction
    • Log-only mode - Logs the access attempt but allows the operation to proceed
  3. Audit - All access attempts to redacted content are logged to Scuba with session information, username, and enforcement status
  4. Passthrough - If not redacted, the request proceeds to the underlying blobstore

This approach intercepts access at the storage layer regardless of which protocol (Sapling, Git, SCS) or operation (clone, fetch, blame) is being used.
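The four steps above can be modelled in a few lines. The following is a minimal Python sketch of the decorator pattern, not Mononoke's Rust implementation: the class and field names (RedactedBlobstoreSketch, RedactionMetadata, RedactedError), the dict used as the inner blobstore, and the list used in place of Scuba are all assumptions made for illustration.

```python
from dataclasses import dataclass


class RedactedError(Exception):
    """Raised when an enforced redaction blocks a read (hypothetical name)."""


@dataclass(frozen=True)
class RedactionMetadata:
    reason: str    # task or SEV reference
    enforce: bool  # True = deny access, False = log only


class RedactedBlobstoreSketch:
    """Wraps an inner key/value store and checks every read against the redaction map."""

    def __init__(self, inner, redacted, audit_log):
        self.inner = inner          # dict standing in for the underlying blobstore
        self.redacted = redacted    # blobstore key -> RedactionMetadata
        self.audit_log = audit_log  # list standing in for the Scuba table

    def get(self, key, session_id, username=None):
        # Access check: is this key on the redacted list?
        meta = self.redacted.get(key)
        if meta is not None:
            # Audit: every access attempt to redacted content is recorded.
            self.audit_log.append({
                "key": key,
                "session_id": session_id,
                "username": username,
                "enforced": meta.enforce,
            })
            # Decision: enforce mode fails the read, log-only mode falls through.
            if meta.enforce:
                raise RedactedError(f"{key} is redacted: {meta.reason}")
        # Passthrough: unredacted and log-only keys reach the inner store.
        return self.inner.get(key)
```

Because the check lives in the wrapper's read path, callers never need to know about redaction; any code that reads through the wrapper gets the same behaviour.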

Redaction Configuration

Redaction configuration consists of two components stored separately:

Redaction Sets - Configuration stored in Configerator (Meta's configuration management system) at a path specified in repository configuration. Each redaction set contains:

  • reason - Reference to a task or SEV explaining why content was redacted
  • id - RedactionKeyListId pointing to the list of redacted keys in the blobstore
  • enforce - Boolean controlling whether access is denied (true) or only logged (false)

Redaction Key Lists - Lists of blobstore keys stored in a special RedactionConfigBlobstore. Each list is a serialized RedactionKeyList containing the actual content IDs to redact.

The configuration is reloaded periodically (every 60 seconds) from Configerator, allowing redaction policy updates without server restarts.
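To make the relationship between the two components concrete, here is a hedged Python sketch of how redaction sets and key lists could be joined into the key-to-metadata map used at lookup time. The RedactionSet fields mirror the three listed above; everything else is an assumption, including the use of plain strings for IDs, a dict as the key list store, and the rule that an enforcing set wins when two sets list the same key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedactionSet:
    reason: str    # task or SEV reference
    id: str        # RedactionKeyListId, modelled here as a plain string
    enforce: bool  # True = deny access, False = log only


def build_redaction_map(redaction_sets, key_list_store):
    """Join each redaction set with its key list into key -> (reason, enforce)."""
    redacted = {}
    for rs in redaction_sets:
        # key_list_store stands in for the RedactionConfigBlobstore.
        for key in key_list_store[rs.id]:
            previous = redacted.get(key)
            # Assumption: if several sets list the same key, an enforcing set wins.
            if previous is None or (rs.enforce and not previous[1]):
                redacted[key] = (rs.reason, rs.enforce)
    return redacted
```

A periodic reload then amounts to calling a function like this with the latest sets and replacing the old map with the result.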

Content Addressing

Redaction operates on content IDs (Blake2b hashes of file contents), not file paths or commits. This means:

  • The same content appearing in multiple files or commits is redacted everywhere
  • Renaming or moving a file does not bypass redaction
  • Different file contents at the same path are treated independently
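A short Python illustration of why path changes do not matter. It assumes a plain 32-byte Blake2b digest over the file bytes; Mononoke's exact hashing parameters may differ, and the file names and contents are made up.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Assumption: plain Blake2b with a 32-byte digest, rendered as hex.
    return hashlib.blake2b(data, digest_size=32).hexdigest()


# Hypothetical snapshot of a commit: path -> file contents.
files = {
    "config/prod.env": b"API_KEY=hunter2\n",
    "backup/prod.env.bak": b"API_KEY=hunter2\n",  # same bytes under a different path
    "config/dev.env": b"API_KEY=placeholder\n",   # different bytes, different ID
}

# Redacting one content ID covers every path that holds those bytes.
redacted_ids = {content_id(b"API_KEY=hunter2\n")}
blocked_paths = sorted(
    path for path, data in files.items() if content_id(data) in redacted_ids
)
```

Here blocked_paths contains both the original file and its renamed copy, while config/dev.env is unaffected.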

The Redaction Process

Redacting content involves multiple steps performed using the admin CLI tool in tools/admin/.

Identifying Content to Redact

The first step is determining which content needs redaction. The admin tool provides commands for this:

From file paths - Given a commit and file paths, the tool:

  1. Derives the fsnode manifest for the specified commit
  2. Looks up content IDs for the specified paths
  3. Checks whether any of that content is still reachable from the main bookmark
  4. Creates a redaction key list if checks pass

From blobstore keys - For cases where content IDs are already known, a key list can be created directly from blobstore keys.

Listing redacted content - Given a commit, the tool can search for all redacted file paths by comparing content IDs against the current redaction configuration.
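Listing redacted paths reduces to a lookup of each file's content ID in the redaction map. The sketch below assumes the commit's manifest has already been flattened into a path-to-content-ID dict and that the redaction map goes from content ID to reason; both shapes are simplifications for illustration, not the admin tool's actual data structures.

```python
def list_redacted_paths(manifest, redaction_map):
    """Return sorted (path, reason) pairs for files whose content ID is redacted.

    manifest:      path -> content ID for one commit (assumed pre-flattened)
    redaction_map: content ID -> reason, from the current redaction configuration
    """
    return sorted(
        (path, redaction_map[cid])
        for path, cid in manifest.items()
        if cid in redaction_map
    )
```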

Creating a Redaction Key List

The create-key-list command creates a RedactionKeyList and stores it in the redaction config blobstore:

admin redaction create-key-list -i <commit-id> file1.txt file2.txt

This command:

  1. Resolves file paths to content IDs in the specified commit
  2. Validates that content is not reachable from the main bookmark (unless --force is used)
  3. Creates a RedactionKeyList containing the content IDs
  4. Stores the list in the redaction config blobstore
  5. Returns a RedactionKeyListId (a Blake2b hash of the serialized list)

The key list can optionally be written to a file using --output-file.
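The five steps can be sketched as follows. This is a simplified Python model under stated assumptions: the manifest is a path-to-content-ID dict, reachability from the main bookmark is a precomputed set of content IDs, the list is serialized as JSON (the real serialization format is not specified here), and the function and exception names are hypothetical.

```python
import hashlib
import json


class ReachableFromMainError(Exception):
    """Raised when content to redact is still reachable from main (hypothetical name)."""


def create_key_list(manifest, paths, main_content_ids, config_blobstore, force=False):
    # 1. Resolve file paths to content IDs in the given commit.
    keys = sorted({manifest[path] for path in paths})

    # 2. Refuse if any content is still reachable from the main bookmark,
    #    unless the caller explicitly forces the operation.
    still_live = [key for key in keys if key in main_content_ids]
    if still_live and not force:
        raise ReachableFromMainError(f"still reachable from main: {still_live}")

    # 3. Build the key list (assumption: JSON stands in for the real serialization).
    serialized = json.dumps(keys).encode("utf-8")

    # 4. Store the list under its own hash in the redaction config blobstore.
    key_list_id = hashlib.blake2b(serialized, digest_size=32).hexdigest()
    config_blobstore[key_list_id] = serialized

    # 5. Return the ID that a redaction set will later reference.
    return key_list_id
```

Because the ID is a hash of the serialized list, creating the same list twice yields the same ID and overwrites nothing new.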

Activating Redaction

Creating a key list does not activate redaction. The list ID must be added to the Configerator redaction configuration:

  1. Add a new RedactionSet entry to the repository's redaction configuration in Configerator
  2. Specify the RedactionKeyListId, a reason (task or SEV), and enforce status
  3. Submit and deploy the configuration change
  4. Mononoke servers reload the configuration within 60 seconds and begin enforcing the redaction

Log-Only Mode

Setting enforce: false in a redaction set enables log-only mode. Access to redacted content is permitted but logged. This allows:

  • Testing redaction before enforcement
  • Monitoring access to sensitive content
  • Gradual rollout of redaction policies

Access logs include the session ID, username (if available), operation type, and enforcement status.

Access Control and Permissions

Redaction enforcement operates independently of repository permissions. Even users with read access to a repository cannot access redacted content.

The CoreContext carries session information used for audit logging but does not grant exceptions to redaction. There is no mechanism to selectively grant access to redacted content based on user identity.

For operational purposes, the RedactedBlobstore provides an as_inner_unredacted() method that returns the underlying blobstore. It is used only by internal tools that explicitly need to bypass redaction for maintenance operations.

Limitations and Considerations

Metadata Preservation - Redaction affects only file contents. Commit messages, file paths, author information, and other metadata remain visible. If metadata itself contains sensitive information, redaction alone is insufficient.

Historical Commits - Commits that include redacted content remain in the repository. The commit graph is not modified. Only access to the file contents is prevented.

Content-Addressed Nature - Redaction applies to content IDs. If the content was accessible before it was redacted, copies may still exist in caches (memcache, cachelib, local filesystems). Cache expiration eventually removes these copies, but immediate removal is not guaranteed.

Derived Data - Some derived data types (like Git trees or Mercurial manifests) may reference redacted content IDs. Accessing these derived data structures succeeds, but attempts to fetch the actual file contents fail.
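As a rough illustration of the derived data point, the Python sketch below models a tree as a path-to-content-ID dict. Walking the tree works because it only touches references; the failure appears only when the referenced bytes are read. All names and values are hypothetical.

```python
class RedactedError(Exception):
    """Raised when redacted file contents are requested (hypothetical name)."""


def read_file(tree, path, blobs, redacted_ids):
    content_id = tree[path]  # the tree lookup itself is unaffected by redaction
    if content_id in redacted_ids:
        raise RedactedError(path)
    return blobs[content_id]


tree = {"README.md": "id-readme", "deploy/secrets.env": "id-secret"}
blobs = {"id-readme": b"# Project\n", "id-secret": b"TOKEN=abc\n"}
redacted_ids = {"id-secret"}

# Listing paths succeeds even though one entry points at redacted content.
listing = sorted(tree)
```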

No Content Deletion - Redaction prevents access but does not delete content from the blobstore. The blobs remain stored and continue consuming storage space. Physical deletion would require separate blobstore cleanup operations.

Configuration Propagation - Changes to redaction configuration take up to 60 seconds to propagate to all servers. During this window, some servers may enforce redaction while others do not.
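The propagation window follows from each server reloading on its own schedule. The Python simulation below uses an explicit clock instead of real timers; the ServerSketch class and its staggered reload times are assumptions made for illustration, with only the 60-second interval taken from this document.

```python
RELOAD_INTERVAL_SECS = 60


class ServerSketch:
    """One server holding a snapshot of the redacted key set (hypothetical model)."""

    def __init__(self, name, first_reload_at):
        self.name = name
        self.next_reload_at = first_reload_at
        self.redacted = frozenset()

    def tick(self, now, published_config):
        # Reload only when the interval has elapsed; until then the old snapshot is used.
        if now >= self.next_reload_at:
            self.redacted = frozenset(published_config)
            self.next_reload_at = now + RELOAD_INTERVAL_SECS

    def is_redacted(self, key):
        return key in self.redacted
```

With two servers whose reloads fall at different points in the interval, a configuration change published between those points is visible on one server before the other, which is the inconsistency window described above.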

Implementation Details

The redaction system is implemented across several components:

Blobstore Decorator (blobstore/redactedblobstore/) - The RedactedBlobstore implements the Blobstore trait and wraps another blobstore implementation. It maintains the redacted content map and performs access checks.

Feature Library (features/redaction/) - Provides high-level functions for creating and managing redaction key lists. Used by the admin tool.

Admin Commands (tools/admin/src/commands/redaction/) - Command-line interface for redaction operations:

  • create-key-list - Create key list from file paths in a commit
  • create-key-list-from-ids - Create key list from blobstore keys directly
  • fetch-key-list - Retrieve a key list by ID
  • list - List redacted paths in a commit

Type Definitions (mononoke_types/) - Defines RedactionKeyList and RedactionKeyListId types for serialization and content addressing.

Configuration (metaconfig/types/) - Defines RedactionConfig structure specifying the blobstore for key lists and the Configerator location for redaction sets.

The RedactedBlobstore is inserted into the blobstore decorator stack by the repository factory, positioned to intercept all content access regardless of caching or other decorators.

Component-specific implementation details are located in blobstore/redactedblobstore/ and features/redaction/.