doc/administration/reference_architectures/sizing.md
To select an appropriate reference architecture, use a systematic approach to assess and size your GitLab environment.
The following information helps you determine the appropriate reference architecture and any required component-specific adjustments.
This level of detail is intended for complex environments. For less complex environments, you might not require it and can assess the size of your environment by using the guidance aimed at those environments.
[!note] Need expert guidance? Sizing your architecture correctly is critical for optimal performance. The GitLab Professional Services team can evaluate your specific architecture and provide tailored recommendations for performance, stability, and availability optimization.
To follow this documentation, you must have Prometheus monitoring deployed with the GitLab instance. Prometheus provides the accurate metrics required for proper sizing assessment.
If you haven't yet configured Prometheus, you can use the kube-prometheus-stack Helm chart to configure metrics scraping.

If you can't configure Prometheus monitoring:
If you are migrating from another platform, the following PromQL queries cannot be applied without existing GitLab metrics. However, the general assessment methodology remains valid.
How you run PromQL queries depends on the monitoring solution you use. As noted in the Prometheus monitoring documentation, you can access monitoring data either by connecting directly to Prometheus or by using a dashboard tool like Grafana.
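If you prefer to run the queries programmatically rather than through a dashboard, the following sketch shows one way to send a PromQL instant query to the Prometheus HTTP API (`/api/v1/query`). It is a minimal example only, not part of GitLab: the `PROMETHEUS_URL` value and the use of the `requests` library are assumptions to adapt to your environment.

```python
# Minimal sketch: run a PromQL instant query against the Prometheus HTTP API.
# Assumes Prometheus is reachable at PROMETHEUS_URL and exposes the standard
# /api/v1/query endpoint; adjust the URL and authentication for your setup.
import requests

PROMETHEUS_URL = "http://prometheus.example.com:9090"  # assumption: your Prometheus instance


def instant_query(promql: str) -> list:
    """Return the result vector for a PromQL instant query."""
    response = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": promql},
        timeout=60,
    )
    response.raise_for_status()
    payload = response.json()
    if payload.get("status") != "success":
        raise RuntimeError(f"Query failed: {payload}")
    return payload["data"]["result"]


if __name__ == "__main__":
    # Example: one of the peak queries from the following sections (peak API RPS over 7 days).
    query = (
        'max_over_time(sum(rate(gitlab_transaction_duration_seconds_count'
        '{controller=~"Grape", action!~".*/internal/.*", '
        'action!="POST /api/jobs/request"}[1m]))[7d:1m])'
    )
    for series in instant_query(query):
        _timestamp, value = series["value"]
        print(f"Peak API RPS: {float(value):.1f}")
```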
Requests per second (RPS) is the primary metric for sizing GitLab infrastructure. Different traffic types (API, Web, Git operations) stress different components, so each is analyzed separately to find true capacity requirements.
Run these queries to understand your maximum load. Keep the following in mind:

- If absolute peaks are rare anomalies, sizing for sustained load may be more appropriate.
- Adjust the time ranges in the queries based on your retention period (for example, change `[7d]` to `[30d]` if a longer history is available).
[!note] For high-activity environments, `max_over_time` or `quantile_over_time` queries may time out. If this occurs, remove the outer aggregation function and visualize the inner query with a graph. For example, for the API traffic peak, use:

```prometheus
sum(rate(gitlab_transaction_duration_seconds_count{controller=~"Grape", action!~".*/internal/.*"}[1m]))
```

Then visually identify the peak values from the graphed results over your monitoring period.
To identify maximum observed RPS over the specified time period:
Run these queries:
API traffic peak, to measure peak API requests from automation, external tools, and webhooks:
```prometheus
max_over_time(
  sum(rate(gitlab_transaction_duration_seconds_count{controller=~"Grape", action!~".*/internal/.*", action!="POST /api/jobs/request"}[1m]))[7d:1m]
)
```
Web traffic peak, to measure peak UI interactions from users in browsers:
```prometheus
max_over_time(
  sum(rate(gitlab_transaction_duration_seconds_count{controller!~"Grape|HealthController|MetricsController|Repositories::GitHttpController|GraphqlController"}[1m]))[7d:1m]
)
```
Git pull and clone peak, to measure peak repository clone and fetch operations:
```prometheus
max_over_time(
  (
    (sum(rate(gitlab_transaction_duration_seconds_count{action="git_upload_pack"}[1m])) or vector(0))
    +
    (sum(rate(gitaly_service_client_requests_total{grpc_method="SSHUploadPack"}[1m])) or vector(0))
  )[7d:1m]
)
```
Git push peak, to measure peak code push operations:
```prometheus
max_over_time(
  (
    (sum(rate(gitlab_transaction_duration_seconds_count{action="git_receive_pack"}[1m])) or vector(0))
    +
    (sum(rate(gitaly_service_client_requests_total{grpc_method="SSHReceivePack"}[1m])) or vector(0))
  )[7d:1m]
)
```
Record the results.
To identify typical high-load levels, filtering out rare spikes:
Run these queries:
API sustained peak:
```prometheus
quantile_over_time(0.95,
  sum(rate(gitlab_transaction_duration_seconds_count{controller=~"Grape", action!~".*/internal/.*", action!="POST /api/jobs/request"}[1m]))[7d:1m]
)
```
Web sustained peak:
```prometheus
quantile_over_time(0.95,
  sum(rate(gitlab_transaction_duration_seconds_count{controller!~"Grape|HealthController|MetricsController|Repositories::GitHttpController|GraphqlController"}[1m]))[7d:1m]
)
```
Git pull and clone sustained peak:
```prometheus
quantile_over_time(0.95,
  (
    (sum(rate(gitlab_transaction_duration_seconds_count{action="git_upload_pack"}[1m])) or vector(0))
    +
    (sum(rate(gitaly_service_client_requests_total{grpc_method="SSHUploadPack"}[1m])) or vector(0))
  )[7d:1m]
)
```
Git push sustained peak:
```prometheus
quantile_over_time(0.95,
  (
    (sum(rate(gitlab_transaction_duration_seconds_count{action="git_receive_pack"}[1m])) or vector(0))
    +
    (sum(rate(gitaly_service_client_requests_total{grpc_method="SSHReceivePack"}[1m])) or vector(0))
  )[7d:1m]
)
```
Record the results.
To map traffic to reference architectures using the results you recorded earlier:
Consult the available reference architectures to see which reference architecture each traffic type suggests.
Fill in an analysis table. Use the following table as a guide:
| Traffic type | Peak RPS | Peak suggested RA | Sustained RPS | Sustained suggested RA |
|---|---|---|---|---|
| API | ________ | _____ (up to ___ RPS) | _____________ | _____ (up to ____ RPS) |
| Web | ________ | _____ (up to ___ RPS) | _____________ | _____ (up to ____ RPS) |
| Git pull and clone | ________ | _____ (up to ___ RPS) | _____________ | _____ (up to ____ RPS) |
| Git push | ________ | _____ (up to ___ RPS) | _____________ | _____ (up to ____ RPS) |
Compare all reference architectures in the Peak suggested RA column and select the largest size. Repeat for the Sustained suggested RA column.
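If you want to automate this lookup, the following sketch maps each traffic type's RPS to the smallest reference architecture whose target covers it, and then takes the largest result. The RPS thresholds in the table are illustrative assumptions based on the published architecture names at the time of writing; confirm them against the current reference architectures documentation before relying on the output.

```python
# Sketch: map recorded RPS values to a reference architecture size.
# The thresholds below mirror the published "N RPS / M users" architecture
# names and are assumptions here; verify them against the current
# reference architectures documentation.
RA_TARGETS = [
    (20, "20 RPS / 1,000 users"),
    (40, "40 RPS / 2,000 users"),
    (60, "60 RPS / 3,000 users"),
    (100, "100 RPS / 5,000 users"),
    (200, "200 RPS / 10,000 users"),
    (500, "500 RPS / 25,000 users"),
    (1000, "1,000 RPS / 50,000 users"),
]
ABOVE_LARGEST = "Above largest published architecture: seek further guidance"


def suggested_ra(rps: float) -> str:
    """Return the smallest architecture whose RPS target covers the load."""
    for target, name in RA_TARGETS:
        if rps <= target:
            return name
    return ABOVE_LARGEST


def overall_ra(rps_by_type: dict) -> str:
    """Pick the largest architecture suggested by any traffic type."""
    picks = {traffic: suggested_ra(rps) for traffic, rps in rps_by_type.items()}
    for traffic, pick in picks.items():
        print(f"{traffic}: {rps_by_type[traffic]:.1f} RPS -> {pick}")
    # RA_TARGETS is ordered smallest to largest, so take the highest-ranked pick.
    order = [name for _, name in RA_TARGETS] + [ABOVE_LARGEST]
    return max(picks.values(), key=order.index)


if __name__ == "__main__":
    # Replace these example values with the peak (or sustained) RPS you recorded.
    print("Selected RA:", overall_ra({
        "API": 85.0,
        "Web": 12.0,
        "Git pull and clone": 9.5,
        "Git push": 1.2,
    }))
```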
Document the baseline:
At this point, there are two candidate reference architecture sizes: one based on the absolute peak RPS and one based on the sustained (95th percentile) RPS.
To choose a reference architecture:
General guidelines:
For environments under 40 RPS where high availability (HA) is a requirement, consult the high availability section to determine whether you need to switch to the 60 RPS / 3,000 user architecture with supported reductions.
Having completed this section, you've established your baseline reference architecture size. This forms the foundation, but the following sections identify whether your specific workload requires component adjustments beyond the standard configuration.
Before proceeding, ensure you've documented the details you've gathered in this section. You can use the following as a guide:
Reference architecture assessment summary:
- Selected reference architecture: _____
- Justification based on _____ RPS [absolute/sustained]
| Traffic Type | Peak RPS | Sustained RPS (95th) |
|:-------------------|:---------|:---------------------|
| API | ________ | ____________________ |
| Web | ________ | ____________________ |
| Git pull and clone | ________ | ____________________ |
| Git push | ________ | ____________________ |
Highest RPS Peak timestamp for workload analysis: _____
Total RPS is the primary sizing metric, but workload composition significantly impacts component resource requirements. Different request types stress different components with varying intensity.
Reference architecture RPS targets assume a typical workload composition based on production data.

Atypical compositions are environments where one request type significantly exceeds typical proportions. These environments may require component-specific adjustments even within target RPS ranges.
Use the RPS extraction queries from Extract peak traffic metrics to understand your workload composition, and compare your distribution to the typical patterns below (a short calculation sketch follows this list):
API-heavy workloads (API >90% of total RPS):
Web-heavy workloads (Web >20% of total RPS):
Git-intensive workloads (Git >15% of total RPS or pull rates notably above typical for your size):
[!note] Small variations (5-10 RPS difference in any category) do not require architecture changes. Monitor actual component saturation metrics (CPU, memory, queue depths) from production rather than making decisions based solely on RPS comparisons. Components under 70% sustained utilization generally have sufficient capacity regardless of minor RPS variations.
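As a quick way to apply the comparison above, the following sketch computes each traffic type's share of total RPS and flags the heavy patterns using the indicative percentages from this section (API above 90%, Web above 20%, Git above 15%). The example values are placeholders; as the note above explains, small variations should not drive architecture changes on their own.

```python
# Sketch: compute workload composition from recorded RPS values and flag
# heavy patterns using the indicative thresholds from this section
# (API > 90%, Web > 20%, Git > 15% of total RPS). Illustrative only.
def composition(rps: dict) -> None:
    total = sum(rps.values())
    if total == 0:
        print("No traffic recorded.")
        return
    shares = {traffic: value / total * 100 for traffic, value in rps.items()}
    for traffic, share in shares.items():
        print(f"{traffic}: {share:.1f}% of total RPS")
    git_share = shares.get("Git pull and clone", 0) + shares.get("Git push", 0)
    if shares.get("API", 0) > 90:
        print("Pattern: API-heavy workload")
    if shares.get("Web", 0) > 20:
        print("Pattern: Web-heavy workload")
    if git_share > 15:
        print("Pattern: Git-intensive workload")


if __name__ == "__main__":
    # Replace with the peak values you recorded earlier.
    composition({
        "API": 92.0,
        "Web": 5.0,
        "Git pull and clone": 2.5,
        "Git push": 0.5,
    })
```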
Workload assessment identifies specific usage patterns that require component adjustments beyond the base reference architecture. While RPS determines overall size, workload patterns determine the shape. Two environments with identical RPS can have vastly different resource needs.
Different workloads stress different parts of GitLab architecture:
Using the peak timestamp from the earlier section, identify which endpoints received the most traffic during maximum load.
[!note] If your RPS metrics show consistently high traffic during off-hours (>50% of peak), this suggests heavy automation beyond typical patterns. For example, peak traffic that reaches 100 RPS during business hours but maintains 50+ RPS during nights and weekends indicates significant automated workload. Consider this when evaluating component adjustments.
Run this query with visualization enabled (bar chart for distribution over time, or pie chart for general distribution):
```prometheus
topk(20,
  sum by (controller, action) (
    rate(gitlab_transaction_duration_seconds_count{controller!~"HealthController|MetricsController", action!~".*/internal/.*"}[1m])
  )
)
```
Review the results for the distribution of top endpoints during the absolute RPS peak. The results might have:
Record findings:
Workload pattern identified:
- [ ] Database-intensive
- [ ] Sidekiq- or Gitaly-intensive
- [ ] None detected
The indicators above provide initial signals of additional workloads. Because of built-in headroom in reference architectures, these workloads may be handled without adjustments. However, if strong indicators exist and high levels of automation are known, consider the following adjustments.
Based on the workload pattern identified earlier, different components require scaling:
| Workload type | When to apply | Components to scale |
|---|---|---|
| Database-intensive | <ul><li>Heavy API usage for non-Git traffic (webhooks, issues, groups, and projects)</li><li>Known extensive automation or integration workloads</li></ul> | <ul><li>Increase Rails resources</li><li>Database scaling</li></ul> |
| Sidekiq- or Gitaly-intensive | <ul><li>Heavy Git operations, CI/CD jobs, security scanning, import operations, and Git server hooks</li><li>Known CI/CD-heavy usage patterns</li></ul> | <ul><li>Increase Sidekiq specifications</li><li>Gitaly vertical scaling</li><li>Database scaling</li><li>Advanced: Configure specific job classes</li></ul> |
Resource adjustments vary based on workload intensity and saturation metrics:
If you are planning to deploy cloud-native GitLab, workload patterns identified in this assessment have additional implications for Kubernetes configuration:
Database scaling strategy depends on workload characteristics and might require multiple approaches:
Use this Prometheus query to identify read/write distribution:
```prometheus
# Percentage of READ operations
(
  (sum(rate(gitlab_transaction_db_count_total[5m])) - sum(rate(gitlab_transaction_db_write_count_total[5m]))) /
  sum(rate(gitlab_transaction_db_count_total[5m]))
) * 100
```
Having completed this section, you've identified workload patterns and determined any required component adjustments.
Before you proceed, record the complete workload assessment:
Workload pattern identified:
- [ ] Database-intensive
- [ ] Sidekiq- or Gitaly-intensive
- [ ] None detected
- Component adjustments needed: _____
In the next section, you assess special data characteristics that might require additional infrastructure considerations.
Repository characteristics and network usage patterns can significantly impact GitLab performance beyond what RPS metrics reveal.
Large monorepos, extensive binary files, and network-intensive operations require infrastructure adjustments that standard sizing doesn't account for.
Large monorepos (several gigabytes or more) fundamentally change how Git operations perform. A single clone of a 10 GB repository consumes more resources than hundreds of clones of typical repositories.
These repositories affect not just Gitaly, but also Rails, Sidekiq, and the database depending on the workload.
The profiling process focuses on identifying repositories that significantly exceed typical sizes:
To identify a repository's size:
Go to a project's usage quotas.
Review the Repository storage type.
Calculate the number of projects with repositories larger than 2 GB and larger than 10 GB.
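If you have many projects, checking usage quotas one by one doesn't scale. As an alternative, the following sketch uses the GitLab REST API projects endpoint with `statistics=true` to count repositories above the 2 GB and 10 GB thresholds. Repository size statistics for all projects generally require an administrator token; the instance URL and token below are placeholders to adapt, and you should verify the approach against your GitLab version.

```python
# Sketch: count repositories above the 2 GB and 10 GB thresholds using the
# GitLab REST API (GET /api/v4/projects?statistics=true). Repository size is
# reported in bytes under statistics.repository_size. Statistics for all
# projects are typically only visible to administrators; the URL and token
# below are placeholders.
import requests

GITLAB_URL = "https://gitlab.example.com"  # assumption: your instance URL
TOKEN = "glpat-..."                        # assumption: admin personal access token
GB = 1024 ** 3


def count_monorepos() -> tuple:
    medium, large = 0, 0
    page = 1
    while True:
        response = requests.get(
            f"{GITLAB_URL}/api/v4/projects",
            headers={"PRIVATE-TOKEN": TOKEN},
            params={"statistics": "true", "per_page": 100, "page": page},
            timeout=60,
        )
        response.raise_for_status()
        projects = response.json()
        if not projects:
            break
        for project in projects:
            size = project.get("statistics", {}).get("repository_size", 0)
            if size > 10 * GB:
                large += 1
            elif size > 2 * GB:
                medium += 1
        page += 1
    return medium, large


if __name__ == "__main__":
    medium, large = count_monorepos()
    print(f"Medium monorepos (2GB-10GB): {medium}")
    print(f"Large monorepos (>10GB): {large}")
```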
Record the results:
Number of medium monorepos (2GB - 10GB): _____
Number of large monorepos (>10GB): _____
Large repositories require both vertical scaling and operational adjustments. These repositories affect performance across the entire stack, from Git operations and CPU usage to memory consumption and network bandwidth.
| Scenario | Component adjustments |
|---|---|
| Several medium monorepos | <ul><li>Gitaly: 1.5x-2x specifications</li><li>Rails: 1.25x-1.5x specifications</li></ul> |
| Large monorepos | <ul><li>Gitaly: 2x-4x specifications</li><li>Rails: 1.5x-2x specifications</li><li>Consider sharding monorepo to dedicated Gitaly node</li></ul> |
Additional optimization strategies for monorepo environments are documented in Improving monorepo performance, including Git LFS for binary files and shallow cloning.
Network saturation causes unique problems that are often difficult to diagnose. Unlike CPU or memory bottlenecks that affect specific operations, network saturation can cause seemingly random timeouts across all GitLab functions.
Common network load sources:
Calculate peak and baseline network consumption to identify potential bottlenecks. Assess both to distinguish between occasional spikes (handled by burst capacity) and sustained high traffic (requiring network-enhanced VMs).
Run the following queries:
```prometheus
# Outbound traffic (Gbps) - top 10 nodes
topk(10, sum by (instance) (rate(node_network_transmit_bytes_total{device!="lo"}[5m]) * 8 / 1000000000))

# Inbound traffic (Gbps) - top 10 nodes
topk(10, sum by (instance) (rate(node_network_receive_bytes_total{device!="lo"}[5m]) * 8 / 1000000000))
```
Record both peak spikes and typical baseline observed across your monitoring period:
Peak outbound traffic: _____ Gbps (baseline: _____ Gbps)
Peak inbound traffic: _____ Gbps (baseline: _____ Gbps)
The thresholds below are approximate guidelines only. Actual network bandwidth guarantees vary significantly by cloud provider and VM type. Always verify the network specifications (baseline and burst limits) for your specific instance types to ensure they align with your workload patterns.
Based on outbound and inbound traffic measurements:
| Network load | Threshold | Why this threshold | Action required |
|---|---|---|---|
| Standard | <1 Gbps | Within baseline bandwidth of most standard instances | Standard instances sufficient |
| Moderate | 1-3 Gbps | May exceed AWS baseline but within GCP/Azure standard instances | <ul><li>AWS: Monitor for throttling, might need network-enhanced</li><li>GCP/Azure: Standard instances usually sufficient</li></ul> |
| High | 3-10 Gbps | Exceeds AWS baseline. Approaches limits of some standard instances | <ul><li>AWS: Network-enhanced VMs required</li><li>GCP/Azure: Verify instance bandwidth specifications</li></ul> |
| Very High | >10 Gbps | Exceeds most standard instance capabilities | <ul><li>Network-enhanced VMs required across all providers</li><li>For large artifacts, disable object proxy download</li></ul> |
Before you proceed, record the complete data profiling assessment:
Data Profile Summary:
- Medium monorepos (2GB-10GB): _____
- Large monorepos (>10GB): _____
- Gitaly adjustments needed: _____
- Rails adjustments needed: _____
- Peak outbound traffic: _____ Gbps (sustained baseline: _____ Gbps)
- Peak inbound traffic: _____ Gbps (sustained baseline: _____ Gbps)
- Network infrastructure changes: _____
Understanding the existing environment provides crucial context for recommendations:
Collect comprehensive environment data to establish the current state:
Compare the current environment to available reference architectures. Consider the following:
Record your findings:
Nearest Reference Architecture: _____
Custom configurations or deviations:
- _____
- _____
Compare the current environment against the recommended reference architecture you developed from the previous sections.

If the current environment is larger than the recommendation, it might be over-provisioned or might have valid reasons for the additional resources that need to be analyzed. Check CPU and memory utilization on Rails, Gitaly, the database, and Sidekiq.
Low utilization (<40%) suggests over-provisioning. High utilization might indicate specific workload requirements not captured in RPS analysis.
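One way to quantify this comparison is to pull average CPU utilization per node from Prometheus and flag hosts outside the 40%-70% band used on this page. The sketch below uses the standard node_exporter metric `node_cpu_seconds_total` through the Prometheus HTTP API; the Prometheus URL and the exact thresholds are assumptions to adjust for your environment.

```python
# Sketch: flag nodes whose average CPU utilization over the last 7 days falls
# outside the 40%-70% band discussed above. Uses node_exporter's
# node_cpu_seconds_total metric via the Prometheus HTTP API; adjust the URL
# and thresholds for your environment (both are assumptions).
import requests

PROMETHEUS_URL = "http://prometheus.example.com:9090"  # assumption

CPU_QUERY = (
    'avg_over_time((1 - avg by (instance) '
    '(rate(node_cpu_seconds_total{mode="idle"}[5m])))[7d:5m]) * 100'
)


def review_utilization() -> None:
    response = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query", params={"query": CPU_QUERY}, timeout=60
    )
    response.raise_for_status()
    for series in response.json()["data"]["result"]:
        instance = series["metric"].get("instance", "unknown")
        utilization = float(series["value"][1])
        if utilization < 40:
            verdict = "possibly over-provisioned"
        elif utilization > 70:
            verdict = "approaching saturation"
        else:
            verdict = "within expected range"
        print(f"{instance}: {utilization:.1f}% CPU ({verdict})")


if __name__ == "__main__":
    review_utilization()
```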
Review whether recommendations need adjustment for undiscovered requirements.
If the current environment has performance issues:
Having completed this section, you've analyzed the current environment and compared against recommendations.
Before you proceed, record the complete environment comparison:
Current Environment Analysis:
- Current RA (nearest): _____
- Recommended RA (from RPS and workload analysis): _____
- Resource comparison: [ ] Current < Recommended [ ] Current ≈ Recommended [ ] Current > Recommended
- Performance status: [ ] No issues [ ] Has issues
- Adjustments needed: _____
- Notes: _____
In the next section, you assess growth projections to ensure sizing remains appropriate over time.
Infrastructure changes require significant lead time for procurement, migration, and testing. Growth estimation ensures the recommended architecture remains viable throughout the implementation period and beyond.
Historical trends combined with business plans provide the most accurate growth projections.
Past growth patterns can help predict the future trajectory better than business projections alone:
Expected business changes that impact infrastructure needs:
Evaluate whether any of these factors (or other organizational changes) could affect load on the environment and require infrastructure adjustments. Document relevant changes and their expected timeline.
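To turn the historical trend into a concrete projection, a simple compound-growth calculation is usually enough. The sketch below projects peak RPS forward from two historical measurements over a planning horizon; the example numbers are placeholders, not recommendations, and any known business step changes should be layered on top.

```python
# Sketch: project future peak RPS from two historical measurements using a
# simple compound annual growth rate. Example values are placeholders.
def projected_rps(rps_then: float, rps_now: float,
                  years_between: float, years_ahead: float) -> float:
    """Compound the observed growth rate forward by years_ahead."""
    annual_growth = (rps_now / rps_then) ** (1 / years_between) - 1
    return rps_now * (1 + annual_growth) ** years_ahead


if __name__ == "__main__":
    # Example: peak RPS grew from 40 to 60 over the last 12 months;
    # project the load 2 years ahead to cover procurement and migration lead time.
    future = projected_rps(rps_then=40, rps_now=60, years_between=1, years_ahead=2)
    print(f"Projected peak RPS in 2 years: {future:.0f}")  # 60 * 1.5^2 = 135 RPS
```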
Based on historical trends and business projections, select the appropriate growth accommodation strategy:
Having completed this section, you've incorporated growth projections into the sizing decision.
Record the complete growth analysis:
Growth Assessment Summary:
- Historical RPS comparison: _____
- Business growth factors: _____
- Growth category: [ ] Stable/Minimal [ ] Moderate [ ] Significant
- Strategy: [ ] Current RA sufficient [ ] Size for projected growth
In the next section, you compile all findings into final architecture recommendations.
Compile findings from all previous sections to determine the optimal reference architecture and required adjustments.
Gather the key outputs from each section to form the sizing decision:
Based on the comprehensive assessment, record the complete architecture recommendation:
Final Architecture Recommendation
==================================
- Selected RA: [Size] based on [Absolute/Sustained] Peak RPS of [value]
- Component adjustments required:
- [ ] No adjustments needed - standard RA configuration sufficient
- [ ] Adjustments required:
- Rails: _____
- Sidekiq: _____
- Database: _____
- Gitaly: _____
- Network considerations: [ ] Standard instances [ ] Network-optimized instances
- Selected RA is aligned with existing environment: [Yes/No/Not applicable]
- Growth accommodation: [Current RA sufficient / Sized up for growth]
Assessment Summary:
├── RPS Analysis
│ ├── Absolute Peak RPS: _____ → Baseline RA: _____
│ └── Sustained Peak RPS: _____ → Sustained RA: _____
├── Workload Type
│ └── Type: [ ] Database-intensive [ ] Sidekiq- or Gitaly-intensive [ ] None
├── Data Profile
│ ├── Medium monorepos (2GB-10GB): _____ | Large monorepos (>10GB): _____
│ └── Network: Peak _____ Gbps | Baseline _____ Gbps
├── Current State
│ ├── Nearest RA: _____
│ └── Discrepancies and customizations: _____
└── Growth
├── Growth projection: _____
└── Growth buffer strategy: _____
Having completed all the sections, you have finished the sizing assessment. The final recommendation includes:
Regular monitoring remains essential to validate assumptions and adjust infrastructure as workload patterns evolve.