doc/administration/gitaly/concurrency_limiting.md
To avoid overwhelming the servers running Gitaly, you can limit the concurrency of:

- RPCs.
- Pack-objects processes.

These limits can be fixed, or set as adaptive.
{{< alert type="warning" >}}

Enabling limits on your environment should be done with caution and only in select circumstances, such as to protect against unexpected traffic. When reached, limits do result in disconnects that negatively impact users. For consistent and stable performance, you should first explore other options, such as adjusting node specifications and reviewing large repositories or workloads.

{{< /alert >}}
## Limit RPC concurrency

When cloning or pulling repositories, various RPCs run in the background. In particular, the Git pack RPCs:
- `SSHUploadPackWithSidechannel` (for Git SSH).
- `PostUploadPackWithSidechannel` (for Git HTTP).

These RPCs can consume a large amount of resources, which can have a significant impact in situations such as unexpectedly high traffic or concurrent pulls from a large repository.
You can prevent these RPCs from overwhelming your Gitaly server in these scenarios by using the concurrency limits in the Gitaly configuration file. For example:
```ruby
# in /etc/gitlab/gitlab.rb
gitaly['configuration'] = {
  # ...
  concurrency: [
    {
      rpc: '/gitaly.SmartHTTPService/PostUploadPackWithSidechannel',
      max_per_repo: 20,
      max_queue_wait: '1s',
      max_queue_size: 10,
    },
    {
      rpc: '/gitaly.SSHService/SSHUploadPackWithSidechannel',
      max_per_repo: 20,
      max_queue_wait: '1s',
      max_queue_size: 10,
    },
  ],
}
```
In this configuration:

- `rpc` is the name of the RPC to set a concurrency limit for per repository.
- `max_per_repo` is the maximum number of in-flight RPC calls for the given RPC per repository.
- `max_queue_wait` is the maximum amount of time a request can wait in the concurrency queue to be picked up by Gitaly.
- `max_queue_size` is the maximum size the concurrency queue (per RPC method) can grow to before requests are rejected by Gitaly.

This configuration limits the number of in-flight RPC calls for the given RPCs, applied per repository. In the previous example, each repository can have at most 20 simultaneous `PostUploadPackWithSidechannel` and 20 simultaneous `SSHUploadPackWithSidechannel` RPC calls in flight.

{{< alert type="note" >}}

When these limits are reached, users are disconnected.

{{< /alert >}}
You can observe the behavior of this queue using the Gitaly logs and Prometheus. For more information, see the relevant documentation.
## Limit RPC concurrency for unauthenticated requests

{{< history >}}

- Introduced behind a feature flag named `gitaly_limit_unauthenticated`. Disabled by default.

{{< /history >}}

{{< alert type="flag" >}}

The availability of this feature is controlled by a feature flag. For more information, see the history. This feature is available for testing, but not ready for production use.

{{< /alert >}}
By default, RPC concurrency limits apply to all requests regardless of authentication status. However, you can configure separate, more restrictive limits for unauthenticated requests to protect your Gitaly server from potential abuse or resource exhaustion from anonymous traffic.
When you configure the `unauthenticated` field for an RPC, Gitaly uses two separate limiters:

- Authenticated requests are subject to the limits in the main RPC configuration.
- Unauthenticated requests are subject to the limits in the `unauthenticated` field.

This separation allows you to apply stricter limits to anonymous traffic without affecting authenticated users. If you don't configure the `unauthenticated` field, all requests (both authenticated and unauthenticated) share the same concurrency limits.
Consider configuring separate unauthenticated limits when your instance serves public repositories or other anonymous traffic that could exhaust Gitaly resources.
The following example shows how to configure separate static limits for authenticated and unauthenticated requests:
```ruby
# in /etc/gitlab/gitlab.rb
gitaly['configuration'] = {
  # ...
  concurrency: [
    {
      rpc: '/gitaly.SmartHTTPService/PostUploadPackWithSidechannel',
      # Limits for authenticated requests
      max_per_repo: 20,
      max_queue_wait: '1s',
      max_queue_size: 10,
      # Separate limits for unauthenticated requests
      unauthenticated: {
        max_per_repo: 5,
        max_queue_wait: '500ms',
        max_queue_size: 5,
      },
    },
  ],
}
```
In this example:

- Authenticated requests are limited to 20 in-flight calls per repository, with a maximum queue wait of 1 second and a maximum queue size of 10.
- Unauthenticated requests are limited to 5 in-flight calls per repository, with a maximum queue wait of 500 ms and a maximum queue size of 5.
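Conceptually, the split behaves like two independent counting semaphores for the same RPC, one per authentication state. The following Ruby sketch illustrates that idea only, with hypothetical class and method names; it is not Gitaly's actual implementation (which also enforces queue waits and queue sizes):

```ruby
# Simplified sketch: one independent limiter per authentication state.
# Hypothetical illustration only, not Gitaly's implementation.
class RpcLimiter
  def initialize(authenticated_max:, unauthenticated_max:)
    # A SizedQueue acts as a counting semaphore: push blocks once full.
    @slots = {
      authenticated: SizedQueue.new(authenticated_max),
      unauthenticated: SizedQueue.new(unauthenticated_max),
    }
  end

  # Runs the block while holding a slot in the matching limiter.
  def with_slot(authenticated)
    key = authenticated ? :authenticated : :unauthenticated
    @slots[key].push(true) # blocks when this limiter is saturated
    begin
      yield
    ensure
      @slots[key].pop
    end
  end
end

limiter = RpcLimiter.new(authenticated_max: 20, unauthenticated_max: 5)
limiter.with_slot(false) { "anonymous clone runs under the stricter limit" }
```

Because the limiters are independent, saturating the unauthenticated limiter never blocks authenticated requests.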
The `unauthenticated` field supports both static and adaptive concurrency limits, just like the main configuration. For example, to configure adaptive limits for unauthenticated requests:
```ruby
# in /etc/gitlab/gitlab.rb
gitaly['configuration'] = {
  # ...
  concurrency: [
    {
      rpc: '/gitaly.SmartHTTPService/PostUploadPackWithSidechannel',
      # Adaptive limits for authenticated requests
      adaptive: true,
      min_limit: 10,
      initial_limit: 20,
      max_limit: 40,
      max_queue_wait: '1s',
      max_queue_size: 10,
      # Adaptive limits for unauthenticated requests
      unauthenticated: {
        adaptive: true,
        min_limit: 2,
        initial_limit: 5,
        max_limit: 10,
        max_queue_wait: '500ms',
        max_queue_size: 5,
      },
    },
  ],
}
```
This configuration allows both authenticated and unauthenticated limits to adapt independently based on system resource usage, while maintaining the separation between the two traffic types.
## Limit pack-objects concurrency

Gitaly triggers `git-pack-objects` processes when handling both SSH and HTTPS traffic to clone or pull repositories. These processes generate a pack-file and can consume a significant amount of resources, especially in situations such as unexpectedly high traffic or concurrent pulls from a large repository. On GitLab.com, we also observe problems with clients that have slow internet connections.
You can prevent these processes from overwhelming your Gitaly server by setting pack-objects concurrency limits in the Gitaly configuration file. This setting limits the number of in-flight pack-objects processes per remote IP address.
{{< alert type="warning" >}}

Only enable these limits on your environment with caution and only in select circumstances, such as to protect against unexpected traffic. When reached, these limits disconnect users. For consistent and stable performance, you should first explore other options, such as adjusting node specifications and reviewing large repositories or workloads.

{{< /alert >}}
Example configuration:
```ruby
# in /etc/gitlab/gitlab.rb
gitaly['pack_objects_limiting'] = {
  'max_concurrency' => 15,
  'max_queue_length' => 200,
  'max_queue_wait' => '60s',
}
```
In this configuration:

- `max_concurrency` is the maximum number of in-flight pack-objects processes per key.
- `max_queue_length` is the maximum size the concurrency queue (per key) can grow to before requests are rejected by Gitaly.
- `max_queue_wait` is the maximum amount of time a request can wait in the concurrency queue to be picked up by Gitaly.

In the previous example, each remote IP address can have at most 15 simultaneous pack-objects processes in flight. Additional requests are queued, and are rejected once the queue already holds 200 requests or a request has waited longer than 60 seconds.
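The queueing behavior can be sketched as a simple per-key admission decision. This is a hypothetical illustration using the example values above, not Gitaly's actual code; a queued request is additionally rejected once it waits longer than `max_queue_wait`:

```ruby
# Hypothetical sketch of a keyed limiter's admission decision,
# using the example values above. Not Gitaly's actual code.
MAX_CONCURRENCY = 15   # max in-flight pack-objects processes per key
MAX_QUEUE_LENGTH = 200 # max queued requests per key

# in_flight and queued are the current counts for one key (remote IP address).
def admit(in_flight, queued)
  return :process if in_flight < MAX_CONCURRENCY
  return :queue   if queued < MAX_QUEUE_LENGTH
  :reject # the client is disconnected
end

admit(3, 0)    # => :process
admit(15, 10)  # => :queue
admit(15, 200) # => :reject
```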
When the pack-objects cache is enabled, pack-objects limiting applies only when the cache is missed. For more information, see Pack-objects cache.
You can observe the behavior of this queue using Gitaly logs and Prometheus. For more information, see Monitor Gitaly pack-objects concurrency limiting.
## Calibrate concurrency limits

When setting concurrency limits, you should choose appropriate values based on your specific workload patterns. This section provides guidance on how to calibrate these limits effectively.
Prometheus metrics provide quantitative insights into usage patterns and the impact of each type of RPC on Gitaly node resources. Gitaly executes most operations by shelling out to `git` processes, so the command usually shelled out to is the Git binary. Gitaly exposes metrics collected from those commands as logs and Prometheus metrics. Several key metrics are particularly valuable for this analysis:

- `gitaly_command_cpu_seconds_total` - Sum of CPU time spent by shelling out, with labels for `grpc_service`, `grpc_method`, `cmd`, and `subcmd`.
- `gitaly_command_real_seconds_total` - Sum of real time spent by shelling out, with similar labels.
- `gitaly_concurrency_limiting_in_progress` - Number of concurrent requests being processed.
- `gitaly_concurrency_limiting_queued` - Number of requests for an RPC for a given repository in a waiting state.
- `gitaly_concurrency_limiting_acquiring_seconds` - Duration a request waits because of concurrency limits before processing.

These metrics provide a high-level view of resource utilization at a given point in time. The `gitaly_command_cpu_seconds_total` metric is particularly effective for identifying specific RPCs that consume substantial CPU resources. Additional metrics are available for more detailed analysis, as described in Monitoring Gitaly.
While metrics capture overall resource usage patterns, they typically do not provide per-repository breakdowns, so logs serve as a complementary data source. Filter Gitaly log entries by RPC and aggregate them by repository to identify which repositories drive the most load.

This combined approach of using both metrics and logs provides comprehensive visibility into both system-wide resource usage and repository-specific patterns. Analysis tools such as Kibana or similar log aggregation platforms can facilitate this process.
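As a sketch of the aggregation step, a short Ruby script over JSON-formatted Gitaly logs might look like the following. The field names (`grpc.method`, `grpc.request.glRepository`) are assumptions for illustration; verify them against your Gitaly log schema:

```ruby
# Count requests per repository for one RPC in JSON-formatted logs.
# Field names below are assumptions; check your Gitaly log schema.
require 'json'

def top_repositories(log_lines, rpc:)
  counts = Hash.new(0)
  log_lines.each do |line|
    entry = JSON.parse(line) rescue next # skip non-JSON lines
    next unless entry['grpc.method'] == rpc
    counts[entry['grpc.request.glRepository']] += 1
  end
  counts.sort_by { |_repo, count| -count } # busiest repositories first
end
```

Feeding this the output of a log search scoped to a busy time window shows which repositories dominate a given RPC.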
If your initial limits prove ineffective, adjust them. With adaptive limiting, precise limits are less critical because the system automatically adjusts based on resource usage.
Remember that concurrency limits are scoped by repository. A limit of 30 means allowing at most 30 simultaneous in-flight requests per repository. If the limit is reached, requests are queued and only rejected if the queue is full or the maximum waiting time is reached.
## Adaptive concurrency limiting
Gitaly supports two concurrency limits:

- An RPC concurrency limit, which controls the number of in-flight requests for each RPC per repository.
- A pack-objects concurrency limit, which controls the number of concurrent `git-pack-objects` processes per remote IP address.

If either limit is exceeded, the request is either:

- Queued until capacity becomes available.
- Rejected, if the queue is full or the request waits in the queue for too long.
Both of these concurrency limits can be configured statically. Though static limits can yield good protection, they have drawbacks: a single static value can be too permissive under resource pressure yet needlessly restrictive during normal operation, and choosing the right value requires careful calibration.
You can overcome these drawbacks and keep the benefits of concurrency limiting by configuring adaptive concurrency limits. Adaptive concurrency limits are optional and build on the two concurrency limiting types. They use the Additive Increase/Multiplicative Decrease (AIMD) algorithm: each adaptive limit grows additively during normal operation and shrinks multiplicatively when the host is under resource pressure, always staying within its configured bounds. This mechanism provides some headroom for the machine to "breathe" and speeds up current in-flight requests.
The adaptive limiter calibrates the limits every 30 seconds:

- If the host machine is under resource pressure, the limits decrease multiplicatively, never dropping below the lower bound.
- Otherwise, the limits increase by one until reaching the upper bound.
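A single calibration step can be modeled as follows. This is a simplified sketch of AIMD, not Gitaly's implementation; the halving factor is an assumption for illustration:

```ruby
# Simplified model of one AIMD calibration step (illustrative only).
# The 0.5 backoff factor is an assumed value, not Gitaly's exact behavior.
def calibrate(limit, min_limit:, max_limit:, backoff:)
  if backoff
    # Multiplicative decrease under resource pressure, floored at min_limit.
    [(limit / 2.0).floor, min_limit].max
  else
    # Additive increase during normal operation, capped at max_limit.
    [limit + 1, max_limit].min
  end
end

limit = 60
3.times { limit = calibrate(limit, min_limit: 10, max_limit: 60, backoff: true) }
limit # => 10, after the sequence 60 -> 30 -> 15 -> 10
```

Repeated backoff events shrink the limit exponentially toward `min_limit`, while calm periods raise it one step at a time toward `max_limit`.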
Adaptive limiting is enabled for each RPC or for pack-objects individually. However, limits are calibrated at the same time. Adaptive limiting has the following configuration options:

- `adaptive` sets whether adaptiveness is enabled.
- `min_limit` is the minimum concurrency limit of the configured RPC. When the host machine has a resource problem, Gitaly quickly reduces the limit until it reaches this value. Setting `min_limit` to 0 could completely shut down processing, which is typically undesirable.
- `initial_limit` is the concurrency limit to start with, and provides a reasonable starting point between these extremes.
- `max_limit` is the maximum concurrency limit. Gitaly increases the current limit until it reaches this number. This should be a generous value that the system can fully support under typical conditions.

Prerequisites:

- Cgroups must be configured, as shown by the minimal `cgroups` entry in the following example.
The following is an example of configuring an adaptive limit for RPC concurrency:
```ruby
# in /etc/gitlab/gitlab.rb
gitaly['configuration'] = {
  # ...
  cgroups: {
    # Minimum required configuration to enable cgroups support.
    repositories: {
      count: 1
    },
  },
  concurrency: [
    {
      rpc: '/gitaly.SmartHTTPService/PostUploadPackWithSidechannel',
      max_queue_wait: '1s',
      max_queue_size: 10,
      adaptive: true,
      min_limit: 10,
      initial_limit: 20,
      max_limit: 40
    },
    {
      rpc: '/gitaly.SSHService/SSHUploadPackWithSidechannel',
      max_queue_wait: '10s',
      max_queue_size: 20,
      adaptive: true,
      min_limit: 10,
      initial_limit: 50,
      max_limit: 100
    },
  ],
}
```
For more information, see RPC concurrency.
Prerequisites:

- As with RPC concurrency, adaptive pack-objects limiting requires cgroups to be configured.

The following is an example of configuring an adaptive limit for pack-objects concurrency:
```ruby
# in /etc/gitlab/gitlab.rb
gitaly['pack_objects_limiting'] = {
  'max_queue_length' => 200,
  'max_queue_wait' => '60s',
  'adaptive' => true,
  'min_limit' => 10,
  'initial_limit' => 20,
  'max_limit' => 40
}
```
For more information, see pack-objects concurrency.
Adaptive concurrency limiting differs substantially from the usual way that GitLab protects Gitaly resources. Rather than relying on static thresholds that may be either too restrictive or too permissive, adaptive limiting responds to actual resource conditions in real time.
This approach eliminates the need to find "perfect" threshold values through extensive calibration as described in Calibrating concurrency limits. During failure scenarios, the adaptive limiter reduces limits exponentially (for example, 60 → 30 → 15 → 10) and then automatically recovers by incrementally raising limits when the system stabilizes.
When calibrating adaptive limits, you can prioritize flexibility over precision.
Expensive Gitaly RPCs, which should be protected, can be categorized into two general types:

- Pure Git data operations, such as pull, push, and fetch.
- Repository data access RPCs, which serve GitLab itself and other clients.

Each type has distinct characteristics that influence how concurrency limits should be configured. The following examples illustrate the reasoning behind limit configuration, and can also be used as a starting point.
### Pure Git data operations

These RPCs handle Git pull, push, and fetch operations, which can consume a significant amount of CPU and memory, especially for large repositories or high clone traffic. RPCs in `SmartHTTPService` and `SSHService` fall into this category. A configuration example:
```ruby
{
  rpc: "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel", # or "/gitaly.SSHService/SSHUploadPackWithSidechannel"
  adaptive: true,
  min_limit: 10,     # Minimum concurrency to maintain even under extreme load
  initial_limit: 40, # Starting concurrency when service initializes
  max_limit: 60,     # Maximum concurrency under ideal conditions
  max_queue_wait: "60s",
  max_queue_size: 300
}
```
### Repository data access RPCs

These RPCs serve GitLab itself and other clients, and have different characteristics: they are typically shorter-lived than Git data transfers and are subject to application-side timeouts. For these RPCs, the timeout configuration in GitLab should inform the `max_queue_wait` parameter. For instance, `get_tree_entries` typically has a medium timeout of 30 seconds in GitLab:
```ruby
{
  rpc: "/gitaly.CommitService/GetTreeEntries",
  adaptive: true,
  min_limit: 5,      # Minimum throughput maintained under resource pressure
  initial_limit: 10, # Initial concurrency setting
  max_limit: 20,     # Maximum concurrency under optimal conditions
  max_queue_size: 50,
  max_queue_wait: "30s"
}
```
To observe how adaptive limits are behaving in production environments, refer to the monitoring tools and metrics described in Monitor Gitaly adaptive concurrency limiting. Observing adaptive limit behavior helps confirm that limits are properly responding to resource pressures and adjusting as expected.