eden/mononoke/docs/6.1-rate-limiting-and-load-shedding.md
This document describes Mononoke's rate limiting and load shedding mechanisms, which control request load to maintain system stability at scale.
Mononoke serves source control traffic for repositories with thousands of commits per day and hundreds of thousands of clients. Without load control, individual clients or groups of clients can overwhelm system resources, degrading service for all users. Rate limiting and load shedding provide mechanisms to control this load.
These mechanisms operate at different layers of the system: rate limiting controls individual client behavior based on request patterns, while load shedding responds to system-wide resource pressure. Together, they maintain service availability under varying load conditions.
Rate limiting and load shedding serve distinct purposes:
Rate limiting restricts the number of requests from specific clients within a time window. Limits are configured based on metrics tracked per client, such as request rate or egress bandwidth. When a client exceeds its limit, additional requests are rejected with HTTP 429 (Too Many Requests) responses.
Characteristics:

- Targets specific clients or client groups rather than the system as a whole
- Based on per-client metrics tracked over a configured time window
- Rejects excess requests with HTTP 429 responses
Example: Reject requests from a specific client identity when that client's EdenAPI request rate exceeds 1 million requests per second globally.
Load shedding rejects requests when system resources reach critical thresholds. Rather than targeting individual clients, load shedding responds to resource utilization (CPU, memory, queue depth) across the service. When load shedding activates, requests are rejected based on priority, with lower-priority traffic dropped first.
Characteristics:

- Responds to system-wide resource utilization rather than individual client behavior
- Activates when resources (CPU, memory, queue depth) reach critical thresholds
- Rejects requests by priority, dropping lower-priority traffic first
Example: When regional CPU utilization exceeds 95%, reject all incoming requests until utilization returns to acceptable levels.
The rate limiting framework is implemented in rate_limiting/ and integrated into request handling middleware.
Rate limits can be configured for various metrics:
Request Metrics:

- EdenApiQps - Requests per second to the EdenAPI service
- LocationToHashCount - Commit location-to-hash queries

Data Transfer Metrics:

- EgressBytes - Bytes sent to clients
- TotalManifests - Manifest objects served
- GetpackFiles - File objects served via getpack

Commit Metrics:

- Commits - Commits created or modified
- CommitsPerAuthor - Commits attributed to a specific author
- CommitsPerUser - Commits from a specific user identity

Each metric is tracked over a configured time window (e.g., 1 second, 1 minute) to determine if limits are exceeded.
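As an illustration of windowed metric tracking, here is a minimal fixed-window counter in Rust. It is a sketch of the general technique, not Mononoke's actual counter implementation, and all names are hypothetical.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Fixed-window counter: per-client event counts are kept for the
// current window and reset when the window rolls over.
struct WindowedCounter {
    window: Duration,
    window_start: Instant,
    counts: HashMap<String, u64>,
}

impl WindowedCounter {
    fn new(window: Duration) -> Self {
        Self {
            window,
            window_start: Instant::now(),
            counts: HashMap::new(),
        }
    }

    /// Record one event for `client` and report whether `limit` is now exceeded.
    fn bump_and_check(&mut self, client: &str, limit: u64) -> bool {
        if self.window_start.elapsed() >= self.window {
            // Window rolled over: start counting afresh.
            self.window_start = Instant::now();
            self.counts.clear();
        }
        let count = self.counts.entry(client.to_string()).or_insert(0);
        *count += 1;
        *count > limit
    }
}
```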
Rate limits can apply at different scopes:
Global Scope - Limits apply across all regions and server instances. A client exceeding the global limit is throttled regardless of which region or server handles the request. Global limits require coordination across server instances.
Regional Scope - Limits apply within a single geographic region or data center. A client might be throttled in one region while remaining under limits in another region. Regional limits allow independent capacity management.
Rate limits target specific clients or groups:
Main Client ID - Identifies individual client applications or services. Each client is assigned a unique identifier used for quota tracking. The main client ID can include additional context, such as a Sandcastle job alias for CI traffic.
Identity Set - Targets clients matching a set of identities. Identities include service identity (e.g., SERVICE_IDENTITY:quicksand), machine tier (e.g., MACHINE_TIER:sandcastle), or other attributes. All specified identities must be present for the rule to apply.
Static Slice - Applies limits to a percentage of clients based on consistent hashing of client hostname. This allows gradual rollout of new limits by targeting an increasing slice of traffic.
When no target is specified, the rate limit applies to all traffic.
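A sketch of how these three target kinds might be matched against a request, with stand-in types; the real matching logic lives in rate_limiting/ and differs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

// The three target kinds described above, plus a catch-all for
// limits with no target.
enum Target {
    MainClientId(String),
    IdentitySet(HashSet<String>),
    StaticSlice { slice_pct: u64 },
    All,
}

// Consistent hash of the hostname into [0, 100) for static slices.
fn hostname_slice(hostname: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    hostname.hash(&mut hasher);
    hasher.finish() % 100
}

fn matches(target: &Target, client_id: &str, identities: &HashSet<String>, hostname: &str) -> bool {
    match target {
        Target::MainClientId(id) => client_id == id,
        // All specified identities must be present for the rule to apply.
        Target::IdentitySet(required) => required.is_subset(identities),
        Target::StaticSlice { slice_pct } => hostname_slice(hostname) < *slice_pct,
        Target::All => true,
    }
}
```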
Rate limits are configured using Thrift-based configuration files that can be updated dynamically. Each rate limit specifies:

- The metric being limited (e.g., EdenApiQps, EgressBytes)
- The time window over which the metric is tracked
- The threshold value at which requests are rejected
- The scope (global or regional)
- The target (main client ID, identity set, or static slice; omitted to apply to all traffic)
Configuration is loaded via cached config handles and updates are applied without service restart. Configuration files define arrays of rate limits, allowing multiple limits with different targets and metrics to be active simultaneously.
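For illustration, the same fields expressed as a hypothetical Rust struct; names and variants are assumptions, not the actual Thrift schema.

```rust
// All names here are assumptions made for illustration.
enum Metric {
    EdenApiQps,
    EgressBytes,
    Commits,
}

enum Scope {
    Global,
    Regional,
}

enum Target {
    MainClientId(String),
    IdentitySet(Vec<String>),
    StaticSlicePct(u64),
}

struct RateLimitEntry {
    metric: Metric,         // which counter this limit applies to
    window_secs: u64,       // time window the metric is tracked over
    limit: u64,             // threshold value for the window
    scope: Scope,           // global or regional enforcement
    target: Option<Target>, // None = the limit applies to all traffic
}
```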
Rate limiting is checked at the beginning of request processing, in the middleware layer, before expensive operations are performed.
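A minimal sketch of that flow, using simplified stand-in types rather than the actual middleware code:

```rust
// All types are simplified stand-ins for the real Mononoke ones.
struct Request {
    client_id: String,
}

enum Response {
    Ok(String),
    TooManyRequests, // rendered as HTTP 429
}

trait Limiter {
    fn check_rate_limit(&self, client_id: &str) -> Result<(), ()>;
    fn check_load_shed(&self) -> Result<(), ()>;
    fn bump_load(&self, client_id: &str);
}

fn handle_request(limiter: &dyn Limiter, req: &Request) -> Response {
    // Check limits before any expensive work: a rejected request
    // should cost almost nothing to serve.
    if limiter.check_rate_limit(&req.client_id).is_err() || limiter.check_load_shed().is_err() {
        return Response::TooManyRequests;
    }
    // Count the accepted request against the client's quota.
    limiter.bump_load(&req.client_id);
    Response::Ok(format!("served {}", req.client_id))
}
```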
This early rejection avoids consuming resources for requests that will be dropped, and the 429 response code triggers exponential backoff in Sapling clients.
Load shedding is implemented alongside rate limiting in the rate_limiting/ component and uses similar configuration structures.
Load shedding decisions are based on system resource metrics:
Local Counters - Metrics exposed by the Mononoke process itself via the fb303 interface, such as CPU utilization, memory usage, request queue depth, or thread pool saturation. These are prefixed with mononoke. and available without external queries.
External Counters - Metrics from external systems such as ODS (Operational Data Store), including storage backend latency, database connection pool utilization, or cache hit rates. External counters require periodic queries to external metrics systems, with results cached locally.
The choice between local and external counters depends on what resource triggers load shedding. Backend storage pressure might use external counters for storage system metrics, while frontend overload uses local CPU and memory metrics.
Each load shed limit specifies:

- The counter to monitor (a local fb303 counter or an external ODS counter)
- The threshold value at which shedding activates
- The target traffic to shed (omitted to apply to all traffic)
Multiple load shed limits can be configured with different targets, allowing progressive shedding. Lower-priority traffic is shed at lower thresholds, while critical traffic is only shed when resources are critically exhausted.
Example configuration: shed lower-priority traffic (e.g., clients matching SERVICE_IDENTITY:quicksand) at a lower threshold, and shed all traffic only at a higher, critical threshold. This progressive approach preserves service for high-priority traffic while protecting the system from complete failure.
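A sketch of such a progressive configuration, expressed as Rust struct literals; the counter name follows the local-counter convention above, but the thresholds and schema are invented for this sketch.

```rust
// Illustrative only: thresholds and field names are assumptions.
struct LoadShedLimit {
    counter: &'static str,        // metric to watch
    threshold: f64,               // shed when the counter exceeds this
    target: Option<&'static str>, // None = shed all traffic
}

const LIMITS: &[LoadShedLimit] = &[
    // Shed low-priority CI traffic first, at a lower threshold...
    LoadShedLimit {
        counter: "mononoke.cpu_utilization",
        threshold: 0.85,
        target: Some("SERVICE_IDENTITY:quicksand"),
    },
    // ...and shed everything only when resources are critically exhausted.
    LoadShedLimit {
        counter: "mononoke.cpu_utilization",
        threshold: 0.95,
        target: None,
    },
];
```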
Load shedding is checked for each request, in the same middleware position as rate limiting.
Shed requests receive the same HTTP 429 response as rate-limited requests, triggering client backoff behavior.
The Load Limiter service (facebook/load_limiter/) is an external microservice that coordinates load-aware job scheduling for continuous integration systems.
The Load Limiter implements the Jupiter External Dependencies interface, which allows jobs to block on arbitrary external conditions. When load is high, jobs wait in queue rather than executing and failing due to service overload.
The service implements three operations:
acquire - Attempts to acquire the dependency. Returns success if load is acceptable, or blocked if load exceeds thresholds. Jobs receiving blocked responses remain in the scheduler queue rather than executing.
canAcquire - Checks whether the dependency could be acquired without actually acquiring it. Used by schedulers to predict wait times.
release - Releases the dependency when a job completes. For source control dependencies, this is typically a no-op since Mononoke load is not quota-based.
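The three operations, sketched as a Rust trait; the actual service implements the Jupiter External Dependencies Thrift interface, and these signatures are assumptions for illustration.

```rust
// Signatures are assumptions; only the three operation names and
// their described semantics come from the text above.
enum AcquireResult {
    Acquired, // load is acceptable; the job may run
    Blocked,  // load exceeds thresholds; the job stays queued
}

trait ExternalDependency {
    /// Attempt to acquire the dependency for a job.
    fn acquire(&self, job_id: &str) -> AcquireResult;
    /// Predict whether acquire would currently succeed, without acquiring.
    fn can_acquire(&self) -> bool;
    /// Release on completion; a no-op here, since Mononoke load is not quota-based.
    fn release(&self, job_id: &str);
}
```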
The Load Limiter monitors Mononoke metrics (CPU utilization, request rates, error rates) and returns blocked responses when these metrics exceed configured thresholds. The relationship between Mononoke load and Load Limiter responses:
| Avg CPU Utilization | Mononoke Response | Load Limiter Response | Job Outcome |
|---|---|---|---|
| 80% | 200 (success) | Acquired | Job executes successfully |
| 97% | Intermittent 200/429 | Blocked | Job queues, executes when load decreases |
| 99% | 429 (rate limited) | Blocked | Job queues until system recovers |
When jobs are blocked, they remain in the scheduler queue rather than consuming worker resources and failing. This reduces the number of failed CI jobs caused by transient Mononoke overload.
The Load Limiter uses separate configuration (scm/mononoke/load_limiter/load_limiter) specifying which Mononoke metrics to monitor and the threshold values at which jobs are blocked.
The service logs decisions to Scuba (mononoke_load_limiter dataset) for analysis of blocking patterns and load correlation.
In addition to request-level rate limiting, Mononoke can throttle blobstore operations using the ThrottledBlob decorator.
ThrottledBlob (blobstore/throttledblob/) wraps a blobstore and applies rate limiting to individual operations. It can throttle on two dimensions:
- Operations per Second (QPS) - Caps the number of blobstore operations performed per second.
- Bytes per Second - Caps the volume of blob data transferred per second.
Throttling uses token bucket rate limiting (via the governor crate) with jitter to prevent thundering herd effects when multiple operations wait for tokens.
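A minimal sketch of this pattern using the governor crate directly, assuming tokio for the async runtime; ThrottledBlob's actual integration differs, and `fetch_blob` is a hypothetical stand-in.

```rust
use std::num::NonZeroU32;
use std::time::Duration;

use governor::{Jitter, Quota, RateLimiter};

async fn fetch_blob(_key: &str) -> Vec<u8> {
    Vec::new() // stand-in for the real blobstore call
}

#[tokio::main]
async fn main() {
    // Token bucket allowing 100 blobstore operations per second.
    let limiter = RateLimiter::direct(Quota::per_second(NonZeroU32::new(100).unwrap()));

    for key in ["a", "b", "c"] {
        // Wait for a token, with jitter so many waiters do not all wake
        // at the same instant (the thundering herd the text mentions).
        limiter
            .until_ready_with_jitter(Jitter::up_to(Duration::from_millis(10)))
            .await;
        let _blob = fetch_blob(key).await;
    }
}
```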
Blobstore throttling is used to:
Protect Storage Backends - Limit load on underlying storage systems (SQL, S3, Manifold) to prevent overwhelming them during traffic spikes or bulk operations.
Control Resource Consumption - Limit network bandwidth consumption when multiple repositories share network capacity.
Test Resilience - Simulate slow storage during testing by applying artificial throttling to verify timeout and retry behavior.
Throttled blobstores appear in the blobstore decorator stack between the application and storage backend, applying limits before requests reach the underlying storage.
Sapling and EdenFS clients implement retry behavior for rate limiting and load shedding responses:
- 429 Responses (Rate Limit/Load Shed) - Retried with exponential backoff, as described above.
- Other Error Responses - Handled by each client's standard retry policy.
- Data Preservation - Pending work is preserved across retries so that backoff does not lose data.
- EdenFS Behavior - EdenFS applies the same backoff handling to its remote requests.
This client behavior provides automatic recovery from transient overload while preventing indefinite hangs.
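For illustration, a generic exponential-backoff loop of the kind described; the base delay, cap, and attempt count are assumptions, not Sapling's actual tuning, and tokio is assumed for sleeping.

```rust
use std::future::Future;
use std::time::Duration;

/// Retry a request on HTTP 429, doubling the delay each time up to a
/// cap, and give up after a bounded number of attempts.
async fn retry_on_429<F, Fut>(mut request: F) -> Result<Vec<u8>, &'static str>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<Vec<u8>, u16>>,
{
    let mut delay = Duration::from_millis(500);
    for _ in 0..5 {
        match request().await {
            Ok(body) => return Ok(body),
            Err(429) => {
                tokio::time::sleep(delay).await;
                delay = (delay * 2).min(Duration::from_secs(30));
            }
            Err(_) => return Err("non-retriable error"),
        }
    }
    // Bounded attempts prevent an indefinite hang under sustained overload.
    Err("gave up after repeated 429 responses")
}
```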
Rate limiting and load shedding configuration is managed through Mononoke's repository configuration system:
- Configuration Storage - Limits are defined in Thrift-based configuration files alongside other Mononoke configuration and loaded through cached config handles.
- Configuration Validation - Changes are reviewed and validated before deployment through the standard configuration workflow.
- Monitoring Integration - Scuba logging and ODS counters record rate limiting decisions for observability.
Configuration updates follow standard Mononoke configuration deployment practices, with changes committed to version control and deployed through the configuration service.
The rate limiting implementation consists of several components:
RateLimiter Trait (rate_limiting/src/lib.rs) - a sketch of its shape follows the component list below.

- check_rate_limit - Verifies whether a request exceeds configured limits
- check_load_shed - Verifies whether system load requires shedding
- bump_load - Updates metric counters for accepted requests
- find_rate_limit - Retrieves applicable limits for a client

RateLimitEnvironment (rate_limiting/src/lib.rs)
Configuration Handling (rate_limiting/src/config.rs)
Middleware Integration (server/repo_listener/src/request_handler.rs)
Counter Management (rate_limiting/src/facebook.rs for internal builds)
The OSS build (rate_limiting/src/oss.rs) provides a minimal implementation that always allows requests, as full rate limiting depends on internal infrastructure.
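A hedged sketch of the RateLimiter trait's shape, inferred from the method names above; the real signatures in rate_limiting/src/lib.rs differ, and every type here is a stand-in.

```rust
// Only the four method names come from the component list above;
// everything else is invented for illustration.
struct ClientInfo {
    main_client_id: String,
}

enum Metric {
    EdenApiQps,
    EgressBytes,
    Commits,
}

trait RateLimiter {
    /// Err means the request should be rejected with HTTP 429.
    fn check_rate_limit(&self, metric: Metric, client: &ClientInfo) -> Result<(), String>;
    /// Err means current system load requires shedding this request.
    fn check_load_shed(&self, client: &ClientInfo) -> Result<(), String>;
    /// Record an accepted request against the client's counters.
    fn bump_load(&self, metric: Metric, client: &ClientInfo);
    /// Look up the limit threshold applicable to this client, if any.
    fn find_rate_limit(&self, metric: Metric, client: &ClientInfo) -> Option<u64>;
}
```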
Several Mononoke components interact with rate limiting:
Request Handlers - All frontend servers (Mononoke, SCS, Git, LFS) check rate limits in their request handling middleware before processing requests.
Repository Facets - The repo_permission_checker facet is consulted alongside rate limiting to enforce both authorization and load control.
Monitoring Infrastructure - Scuba logging and ODS counters provide visibility into rate limiting decisions and load patterns.
Configuration System - Cached config handles enable dynamic rate limit updates without service restart.
Mononoke's rate limiting and load shedding mechanisms operate at multiple layers to control request load:
Request-Level Rate Limiting - Tracks per-client metrics (QPS, egress bytes, commits) over time windows, rejecting requests from clients exceeding configured thresholds. Applies at global or regional scope.
System-Level Load Shedding - Monitors resource utilization (CPU, memory, queue depth) and sheds requests when thresholds are exceeded, with progressive shedding based on request priority.
Blobstore Throttling - Limits storage operations (QPS and bytes/sec) to protect backend storage systems from overload.
Load-Aware Job Scheduling - The Load Limiter service integrates with CI systems to queue jobs instead of executing them during high load periods.
These mechanisms use dynamic configuration, early request rejection, and coordinated client backoff to maintain service stability under varying load conditions. Configuration is deployed through standard configuration management with monitoring integration for observability.
Component-specific details are documented in rate_limiting/, facebook/load_limiter/, and blobstore/throttledblob/ directories.