doc/development/architecture.md
There are two software distributions of GitLab: the open source Community Edition (CE) and the open core Enterprise Edition (EE).
The EE repository has been archived. GitLab now operates under a single codebase.
GitLab is available under different subscriptions.
New versions of GitLab are released from stable branches, and the main branch is used for
bleeding-edge development.
For more information, see the GitLab release process.
Both distributions require additional components. These components are described in the
Component details section, and all have their own repositories.
New versions of each dependent component are usually tags, but staying on the main branch of the
GitLab codebase gives you the latest stable version of those components. New versions are
generally released around the same time as GitLab releases, with the exception of informal security
updates deemed critical.
A typical install of GitLab is on GNU/Linux, but a growing number of deployments also use the Kubernetes platform. The largest known GitLab instance is GitLab.com, which is deployed using our official GitLab Helm chart and the official Linux package.
A typical installation uses NGINX or Apache as a web server to proxy through GitLab Workhorse and into the Puma application server. GitLab serves web pages and the GitLab API using the Puma application server. It uses Sidekiq as a job queue which, in turn, uses Redis as a non-persistent database backend for job information, metadata, and incoming jobs.
By default, communication between Puma and Workhorse is via a Unix domain socket, but forwarding
requests via TCP is also supported. Workhorse accesses the gitlab/public directory, bypassing the
Puma application server to serve static pages, uploads (for example, avatar images or attachments),
and pre-compiled assets.
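The split between static content and application requests can be sketched as a small routing function. This is a simplified model for illustration, not Workhorse's actual implementation: the `PUBLIC_FILES` set and the `route` function are hypothetical names, and real Workhorse applies additional rules that are not described here.

```python
from pathlib import PurePosixPath

# Hypothetical contents of the gitlab/public directory, for illustration only.
PUBLIC_FILES = {
    "assets/application.css",
    "uploads/avatar.png",
    "404.html",
}


def route(request_path: str) -> str:
    """Return which component answers the request in this simplified model."""
    relative = request_path.lstrip("/")
    # Never treat a path that tries to climb out of the public directory
    # as a static file; hand it to the application instead.
    if ".." in PurePosixPath(relative).parts:
        return "puma"
    if relative in PUBLIC_FILES:
        # Served directly from disk, bypassing the Puma application server.
        return "workhorse"
    # Everything else is forwarded to Puma (by default over a Unix domain socket).
    return "puma"


print(route("/assets/application.css"))
print(route("/users/sign_in"))
```

In this model only requests that match an existing file under the public directory skip the application server; all other requests are proxied.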
The GitLab application uses PostgreSQL for persistent database information (for example, users,
permissions, issues, or other metadata). GitLab stores the bare Git repositories in the location
defined in the configuration file, repositories: section.
It also keeps default branch and hook information with the bare repository.
When serving repositories over HTTP/HTTPS, GitLab uses the GitLab API to resolve authorization and access and to serve Git objects.
The add-on component GitLab Shell serves repositories over SSH. It manages the SSH keys within the
location defined in the configuration file, GitLab Shell section.
The file in that location should never be manually edited. GitLab Shell accesses the bare
repositories through Gitaly to serve Git objects, and communicates with Redis to submit jobs to
Sidekiq for GitLab to process. GitLab Shell queries the GitLab API to determine authorization and access.
Gitaly executes Git operations from GitLab Shell and the GitLab web app, and provides an API to the GitLab web app to get attributes from Git (for example, title, branches, tags, or other metadata), and to get blobs (for example, diffs, commits, or files).
You may also be interested in the production architecture of GitLab.com.
There are fundamental differences in how the application behaves when it is installed on a traditional Linux machine compared to a containerized platform, such as Kubernetes.
Compared to our official installation methods, some of the notable differences are:
In other words, the shared state between services needs to be carefully considered when architecting new features and adding new components. Services that need access to the same files must be able to exchange information through the appropriate APIs. Whenever possible, this should not be done with files.
Since components written with the API-first philosophy in mind are compatible with both methods, all new features and services must be written to consider Kubernetes compatibility first.
The simplest way to ensure this is to add support for your feature or service to the official GitLab Helm chart or reach out to the Distribution team.
Refer to the process for adding new service components for more details.
This is a simplified architecture diagram that can be used to understand the GitLab architecture.
A complete architecture diagram is available in our component diagram below.
%%{init: {"flowchart": { "useMaxWidth": false } }}%%
graph TB
%% Component declarations and formatting
HTTP((HTTP/HTTPS))
SSH((SSH))
GitLabPages(GitLab Pages)
GitLabWorkhorse(GitLab Workhorse)
GitLabShell(GitLab Shell)
Gitaly(Gitaly)
Puma("Puma (GitLab Rails)")
Sidekiq("Sidekiq (GitLab Rails)")
PostgreSQL(PostgreSQL)
Redis(Redis)
HTTP -- TCP 80,443 --> NGINX
SSH -- TCP 22 --> GitLabShell
NGINX -- TCP 8090 --> GitLabPages
NGINX --> GitLabWorkhorse
GitLabShell --> Gitaly
GitLabShell --> GitLabWorkhorse
GitLabWorkhorse --> Gitaly
GitLabWorkhorse --> Puma
GitLabWorkhorse --> Redis
Sidekiq --> PostgreSQL
Sidekiq --> Redis
Puma --> PostgreSQL
Puma --> Redis
Puma --> Gitaly
Gitaly --> GitLabWorkhorse
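To make the simplified diagram easier to reason about, the sketch below encodes its edges as a list and uses a breadth-first search to find the shortest chain of components between two nodes. The edge list is copied from the diagram above; the `shortest_path` helper is a hypothetical name introduced only for this example.

```python
from collections import deque

# Edges copied from the simplified architecture diagram.
EDGES = [
    ("HTTP", "NGINX"),
    ("SSH", "GitLabShell"),
    ("NGINX", "GitLabPages"),
    ("NGINX", "GitLabWorkhorse"),
    ("GitLabShell", "Gitaly"),
    ("GitLabShell", "GitLabWorkhorse"),
    ("GitLabWorkhorse", "Gitaly"),
    ("GitLabWorkhorse", "Puma"),
    ("GitLabWorkhorse", "Redis"),
    ("Sidekiq", "PostgreSQL"),
    ("Sidekiq", "Redis"),
    ("Puma", "PostgreSQL"),
    ("Puma", "Redis"),
    ("Puma", "Gitaly"),
    ("Gitaly", "GitLabWorkhorse"),
]


def shortest_path(start, goal):
    """Breadth-first search over the directed edges; returns a list or None."""
    graph = {}
    for source, target in EDGES:
        graph.setdefault(source, []).append(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(" -> ".join(shortest_path("HTTP", "PostgreSQL")))
print(" -> ".join(shortest_path("SSH", "Gitaly")))
```

The search only follows the arrows as drawn, so it returns `None` for pairs such as PostgreSQL to HTTP, where no connection is initiated in that direction.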
All connections use Unix sockets unless noted otherwise.
%%{init: {"flowchart": { "useMaxWidth": false } }}%%
graph LR
%% Anchor items in the appropriate subgraph.
%% Link them where the destination* is.
subgraph Clients
Browser((Browser))
Git((Git))
end
%% External Components / Applications
Geo{{GitLab Geo}} -- TCP 80, 443 --> HTTP
Geo -- TCP 22 --> SSH
Geo -- TCP 5432 --> PostgreSQL
Runner{{GitLab Runner}} -- TCP 443 --> HTTP
K8sAgent{{GitLab agent}} -- TCP 443 --> HTTP
%% GitLab Application Suite
subgraph GitLab
subgraph Ingress
HTTP[[HTTP/HTTPS]]
SSH[[SSH]]
NGINX[NGINX]
GitLabShell[GitLab Shell]
%% inbound/internal
Browser -- TCP 80,443 --> HTTP
Git -- TCP 80,443 --> HTTP
Git -- TCP 22 --> SSH
HTTP -- TCP 80, 443 --> NGINX
SSH -- TCP 22 --> GitLabShell
end
subgraph GitLab Services
%% inbound from NGINX
NGINX --> GitLabWorkhorse
NGINX -- TCP 8090 --> GitLabPages
NGINX -- TCP 8150 --> GitLabKas
NGINX --> Registry
%% inbound from GitLabShell
GitLabShell --> GitLabWorkhorse
%% services
Puma["Puma (GitLab Rails)"]
Puma <--> Registry
GitLabWorkhorse[GitLab Workhorse] <--> Puma
GitLabKas[GitLab agent server] --> GitLabWorkhorse
GitLabPages[GitLab Pages] --> GitLabWorkhorse
Mailroom
Sidekiq
end
subgraph Integrated Services
%% Mattermost
Mattermost
Mattermost ---> GitLabWorkhorse
NGINX --> Mattermost
%% Grafana
Grafana
NGINX --> Grafana
end
subgraph Metadata
%% PostgreSQL
PostgreSQL
PostgreSQL --> Consul
%% Consul and inbound
Consul
Puma ---> Consul
Sidekiq ---> Consul
Migrations --> PostgreSQL
%% PgBouncer and inbound
PgBouncer
PgBouncer --> Consul
PgBouncer --> PostgreSQL
Sidekiq --> PgBouncer
Puma --> PgBouncer
end
subgraph State
%% Redis and inbound
Redis
Puma --> Redis
Sidekiq --> Redis
GitLabWorkhorse --> Redis
Mailroom --> Redis
GitLabKas --> Redis
%% Sentinel and inbound
Sentinel <--> Redis
Puma --> Sentinel
Sidekiq --> Sentinel
GitLabWorkhorse --> Sentinel
Mailroom --> Sentinel
GitLabKas --> Sentinel
end
subgraph Git Repositories
%% Gitaly / Praefect
Praefect --> Gitaly
GitLabKas --> Praefect
GitLabShell --> Praefect
GitLabWorkhorse --> Praefect
Puma --> Praefect
Sidekiq --> Praefect
Praefect <--> PraefectPGSQL[PostgreSQL]
%% Gitaly makes API calls
%% Ordered here to ensure placement.
Gitaly --> GitLabWorkhorse
end
subgraph Storage
%% ObjectStorage and inbound traffic
ObjectStorage["Object storage"]
Puma -- TCP 443 --> ObjectStorage
Sidekiq -- TCP 443 --> ObjectStorage
GitLabWorkhorse -- TCP 443 --> ObjectStorage
Registry -- TCP 443 --> ObjectStorage
GitLabPages -- TCP 443 --> ObjectStorage
%% Gitaly can perform repository backups to object storage.
Gitaly --> ObjectStorage
end
subgraph Monitoring
%% Prometheus
Grafana -- TCP 9090 --> Prometheus[Prometheus]
Prometheus -- TCP 80, 443 --> Puma
RedisExporter[Redis Exporter] --> Redis
Prometheus -- TCP 9121 --> RedisExporter
PostgreSQLExporter[PostgreSQL Exporter] --> PostgreSQL
PgBouncerExporter[PgBouncer Exporter] --> PgBouncer
Prometheus -- TCP 9187 --> PostgreSQLExporter
Prometheus -- TCP 9100 --> NodeExporter[Node Exporter]
Prometheus -- TCP 9168 --> GitLabExporter[GitLab Exporter]
Prometheus -- TCP 9127 --> PgBouncerExporter
Prometheus --> Alertmanager
GitLabExporter --> PostgreSQL
GitLabExporter --> GitLabShell
GitLabExporter --> Sidekiq
%% Alertmanager
Alertmanager -- TCP 25 --> SMTP
end
%% end subgraph GitLab
end
subgraph External
subgraph External Services
SMTP[SMTP Gateway]
LDAP
%% Outbound SMTP
Sidekiq -- TCP 25 --> SMTP
Puma -- TCP 25 --> SMTP
Mailroom -- TCP 25 --> SMTP
%% Outbound LDAP
Puma -- TCP 389 --> LDAP
Sidekiq -- TCP 389 --> LDAP
%% Elasticsearch
Elasticsearch
Puma -- TCP 9200 --> Elasticsearch
Sidekiq -- TCP 9200 --> Elasticsearch
Elasticsearch --> Praefect
%% Zoekt
Zoekt --> Praefect
end
subgraph External Monitoring
%% Sentry
Sidekiq -- TCP 80, 443 --> Sentry
Puma -- TCP 80, 443 --> Sentry
%% Jaeger
Jaeger
Sidekiq -- UDP 6831 --> Jaeger
Puma -- UDP 6831 --> Jaeger
Gitaly -- UDP 6831 --> Jaeger
GitLabShell -- UDP 6831 --> Jaeger
GitLabWorkhorse -- UDP 6831 --> Jaeger
end
%% end subgraph External
end
click Alertmanager "#alertmanager"
click Praefect "#praefect"
click Geo "#gitlab-geo"
click NGINX "#nginx"
click Runner "#gitlab-runner"
click Registry "#registry"
click ObjectStorage "#object-storage"
click Mattermost "#mattermost"
click Gitaly "#gitaly"
click Jaeger "#jaeger"
click GitLabWorkhorse "#gitlab-workhorse"
click LDAP "#ldap-authentication"
click Puma "#puma"
click GitLabShell "#gitlab-shell"
click SSH "#ssh-request-22"
click Sidekiq "#sidekiq"
click Sentry "#sentry"
click GitLabExporter "#gitlab-exporter"
click Elasticsearch "#elasticsearch"
click Migrations "#database-migrations"
click PostgreSQL "#postgresql"
click Consul "#consul"
click PgBouncer "#pgbouncer"
click PgBouncerExporter "#pgbouncer-exporter"
click RedisExporter "#redis-exporter"
click Redis "#redis"
click Prometheus "#prometheus"
click Grafana "#grafana"
click GitLabPages "#gitlab-pages"
click PostgreSQLExporter "#postgresql-exporter"
click SMTP "#outbound-email"
click NodeExporter "#node-exporter"
Component statuses are linked to configuration documentation for each component.
| Component | Description | Omnibus GitLab | GitLab Environment Toolkit (GET) | GitLab chart | minikube Minimal | GitLab.com | Source | GDK | CE/EE |
|---|---|---|---|---|---|---|---|---|---|
| AI Gateway | GitLab AI-native features | ⤓ | ❌ | ✅ | ❌ | ✅ | ⤓ | ✅ | EE Only |
| GitLab Duo Workflow Service | GitLab AI-native features | ⤓ | ❌ | ✅ | ❌ | ✅ | ⤓ | ✅ | EE Only |
| Certificate Management | TLS Settings, Let's Encrypt | ✅ | ✅ | ✅ | ⚙ | ✅ | ⚙ | ⚙ | CE & EE |
| Consul | Database node discovery, failover | ⚙ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | EE Only |
| Database Migrations | Database migrations | ✅ | ✅ | ✅ | ✅ | ✅ | ⚙ | ✅ | CE & EE |
| Elasticsearch | Improved search within GitLab | ⤓ | ⚙ | ⤓ | ⤓ | ✅ | ⤓ | ⚙ | EE Only |
| Gitaly | Git RPC service for handling all Git calls made by GitLab | ✅ | ✅ | ✅ | ✅ | ✅ | ⚙ | ✅ | CE & EE |
| GitLab Exporter | Generates a variety of GitLab metrics | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | CE & EE |
| GitLab Geo | Geographically distributed GitLab site | ⚙ | ⚙ | ❌ | ❌ | ✅ | ❌ | ⚙ | EE Only |
| GitLab Pages | Hosts static websites | ⚙ | ⚙ | ⚙ | ❌ | ✅ | ⚙ | ⚙ | CE & EE |
| GitLab agent for Kubernetes | Integrate Kubernetes clusters in a cloud-native way | ⚙ | ⚙ | ⚙ | ❌ | ❌ | ⤓ | ⚙ | EE Only |
| GitLab self-monitoring: Alertmanager | Deduplicates, groups, and routes alerts from Prometheus | ⚙ | ⚙ | ✅ | ⚙ | ✅ | ❌ | ❌ | CE & EE |
| GitLab self-monitoring: Grafana | Metrics dashboard | ✅ | ✅ | ⚙ | ⤓ | ✅ | ❌ | ⚙ | CE & EE |
| GitLab self-monitoring: Jaeger | View traces generated by the GitLab instance | ❌ | ⚙ | ⚙ | ❌ | ❌ | ⤓ | ⚙ | CE & EE |
| GitLab self-monitoring: Prometheus | Time-series database, metrics collection, and query service | ✅ | ✅ | ✅ | ⚙ | ✅ | ❌ | ⚙ | CE & EE |
| GitLab self-monitoring: Sentry | Track errors generated by the GitLab instance | ⤓ | ⤓ | ⤓ | ❌ | ✅ | ⤓ | ⤓ | CE & EE |
| GitLab Shell | Handles git over SSH sessions | ✅ | ✅ | ✅ | ✅ | ✅ | ⚙ | ✅ | CE & EE |
| GitLab Workhorse | Smart reverse proxy, handles large HTTP requests | ✅ | ✅ | ✅ | ✅ | ✅ | ⚙ | ✅ | CE & EE |
| Inbound email (SMTP) | Receive messages to update issues | ⤓ | ⤓ | ⚙ | ⤓ | ✅ | ⤓ | ⤓ | CE & EE |
| Jaeger integration | Distributed tracing for deployed apps | ⤓ | ⤓ | ⤓ | ⤓ | ⤓ | ⤓ | ⚙ | EE Only |
| LDAP Authentication | Authenticate users against centralized LDAP directory | ⤓ | ⤓ | ⤓ | ⤓ | ❌ | ⤓ | ⚙ | CE & EE |
| Mattermost | Open-source Slack alternative | ⚙ | ⚙ | ⤓ | ⤓ | ⤓ | ❌ | ⚙ | CE & EE |
| Object storage | S3-compatible object storage service | ⤓ | ⤓ | ✅ | ✅ | ✅ | ❌ | ⚙ | CE & EE |
| NGINX | Routes requests to appropriate components, terminates SSL | ✅ | ✅ | ✅ | ⚙ | ✅ | ⤓ | ⚙ | CE & EE |
| Node Exporter | Prometheus endpoint with system metrics | ✅ | ✅ | N/A | N/A | ✅ | ❌ | ❌ | CE & EE |
| Outbound email (SMTP) | Send email messages to users | ⤓ | ⤓ | ⚙ | ⤓ | ✅ | ⤓ | ⤓ | CE & EE |
| Patroni | Manage PostgreSQL HA cluster leader selection and replication | ⚙ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | EE Only |
| PgBouncer Exporter | Prometheus endpoint with PgBouncer metrics | ⚙ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | CE & EE |
| PgBouncer | Database connection pooling, failover | ⚙ | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | EE Only |
| PostgreSQL Exporter | Prometheus endpoint with PostgreSQL metrics | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | CE & EE |
| PostgreSQL | Database | ✅ | ✅ | ✅ | ✅ | ✅ | ⤓ | ✅ | CE & EE |
| Praefect | A transparent proxy between any Git client and Gitaly storage nodes. | ✅ | ✅ | ⚙ | ❌ | ❌ | ⚙ | ✅ | CE & EE |
| Puma (GitLab Rails) | Handles requests for the web interface and API | ✅ | ✅ | ✅ | ✅ | ✅ | ⚙ | ✅ | CE & EE |
| Redis Exporter | Prometheus endpoint with Redis metrics | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | CE & EE |
| Redis | Caching service | ✅ | ✅ | ✅ | ✅ | ✅ | ⤓ | ✅ | CE & EE |
| Registry | Container registry, allows pushing and pulling of images | ⚙ | ⚙ | ✅ | ✅ | ✅ | ⤓ | ⚙ | CE & EE |
| Runner | Executes GitLab CI/CD jobs | ⤓ | ⤓ | ✅ | ⚙ | ✅ | ⚙ | ⚙ | CE & EE |
| Sentry integration | Error tracking for deployed apps | ⤓ | ⤓ | ⤓ | ⤓ | ⤓ | ⤓ | ⤓ | CE & EE |
| Sidekiq | Background jobs processor | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | CE & EE |
| Token Revocation API | Receives and revokes leaked secrets | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | EE Only |
This document is designed to be consumed by systems administrators and GitLab Support Engineers who want to understand more about the internals of GitLab and how they work together.
When deployed, GitLab should be considered the amalgamation of the below processes. When troubleshooting or debugging, be as specific as possible as to which component you are referencing. That should increase clarity and reduce confusion.
Layers
GitLab can be considered to have two layers from a process perspective:
Alertmanager (`alertmanager`) is a tool provided by Prometheus that "handles alerts sent by client applications such as the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration such as email, PagerDuty, or Opsgenie. It also takes care of silencing and inhibition of alerts." You can read more in issue #45740 about what we alert on.
GitLab AI Gateway is a standalone service that gives all GitLab users access to AI features, no matter which instance they use: GitLab Self-Managed, GitLab Dedicated, or GitLab.com.
You can read more:
GitLab Duo Workflow Service provides our agentic AI features. It is deployed via our Runway service.
You can read more about:
Consul is a tool for service discovery and configuration. Consul is distributed, highly available, and extremely scalable.
Elasticsearch is a distributed RESTful search engine built for the cloud.
Gitaly (`gitaly`) is a service designed by GitLab to remove our need for NFS for Git storage in distributed deployments of GitLab (think GitLab.com or High Availability Deployments). As of 11.3.0, this service handles all Git-level access in GitLab. You can read more about the project in the project's README.
Praefect (`praefect`) is a transparent proxy between each Git client and Gitaly. It coordinates the replication of repository updates to secondary nodes.
Geo is a premium feature built to help speed up the development of distributed teams by providing one or more read-only mirrors of a primary GitLab instance. This mirror (a Geo secondary site) reduces the time to clone or fetch large repositories and projects, or can be part of a Disaster Recovery solution.
GitLab Exporter (`gitlab-exporter`) is a process designed in house that allows us to export metrics about GitLab application internals to Prometheus. You can read more in the project's README.
The GitLab agent for Kubernetes is an active in-cluster component for solving GitLab and Kubernetes integration tasks in a secure and cloud-native way.
You can use it to sync deployments onto your Kubernetes cluster.
GitLab Pages is a feature that allows you to publish static websites directly from a repository in GitLab.
You can use it either for personal or business websites, such as portfolios, documentation, manifestos, and business presentations. You can also attribute any license to your content.
GitLab Runner runs jobs and sends the results to GitLab.
GitLab CI/CD is the open-source continuous integration service included with GitLab that coordinates the testing. The old name of this project was GitLab CI Multi Runner, but you should use GitLab Runner (without CI) from now on.
GitLab Shell is a program designed at GitLab to handle SSH-based Git sessions and to modify the list of authorized keys. GitLab Shell is not a Unix shell nor a replacement for Bash or Zsh.
GitLab Workhorse (`gitlab-workhorse`) is a program designed at GitLab to help alleviate pressure from Puma. You can read more about the historical reasons for developing it. It's designed to act as a smart reverse proxy to help speed up GitLab as a whole.
Grafana is an open source, feature rich metrics dashboard and graph editor for Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB.
Jaeger, inspired by Dapper and OpenZipkin, is a distributed tracing system. It can be used for monitoring microservices-based distributed systems.
GitLab comprises a large number of services that all log. We bundle our own Logrotate (`logrotate`) to make sure we log responsibly. This is just a packaged version of the common open source offering.
Mattermost is an open source, private cloud, Slack-alternative from https://mattermost.com.
GitLab requires an S3-compatible object storage for storing data such as CI artifacts, LFS objects, uploads, and container registry images. GitLab is compatible with any object storage provider that offers full S3 API compatibility. The choice of provider is your responsibility. Cloud-managed services such as Amazon S3, Google Cloud Storage, and Azure Blob Storage are commonly used, as is any self-hosted S3-compatible solution.
NGINX (`nginx`) has an Ingress port for all HTTP requests and routes them to the appropriate sub-systems within GitLab. We are bundling an unmodified version of the popular open source webserver.
Node Exporter (`node-exporter`) is a Prometheus tool that gives us metrics on the underlying machine (think CPU/Disk/Load). It's just a packaged version of the common open source offering from the Prometheus project.
Patroni (`patroni`) manages PostgreSQL HA cluster leader selection and replication. PgBouncer is a lightweight connection pooler for PostgreSQL.
Prometheus exporter for PgBouncer. Exports metrics at 9127/metrics.
GitLab packages the popular PostgreSQL database (`postgresql`) to provide storage for application metadata and user information.
`postgres_exporter` (`postgres-exporter`) is the community-provided Prometheus exporter that delivers data about PostgreSQL to Prometheus for use in Grafana dashboards.
Prometheus (`prometheus`) is a time-series tool that helps GitLab administrators expose metrics about the individual processes used to provide the GitLab service.
Redis (`redis`) is packaged to provide a place to store:
See our Redis guidelines for more information about how GitLab uses Redis.
Redis Exporter (`redis-exporter`) is designed to give specific metrics about the Redis process to Prometheus so that we can graph these metrics in Grafana.
The registry is what users use to store their own Docker images. The bundled
registry uses NGINX as a load balancer and GitLab as an authentication manager.
Whenever a client requests to pull or push an image from the registry, it
returns a 401 response along with a header detailing where to get an
authentication token, in this case the GitLab instance. The client then
requests a pull or push auth token from GitLab and retries the original request
to the registry. For more information, see
token authentication.
An external registry can also be configured to use GitLab as an auth endpoint.
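The challenge-and-retry exchange described above can be sketched with in-memory stand-ins, so no network is involved. Everything named here is hypothetical and only illustrates the order of the steps: `FakeRegistry`, `FakeGitLab`, the realm URL, and the token format (a real token is a signed JWT, not a plain string).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


class FakeGitLab:
    """Stands in for the GitLab instance acting as the auth endpoint."""

    def issue_token(self, user: str, scope: str) -> str:
        # Placeholder token; a real one would be signed and time-limited.
        return f"token-for-{user}-{scope}"


class FakeRegistry:
    """Stands in for the container registry."""

    def __init__(self, realm: str):
        self.realm = realm

    def pull(self, image: str, token: Optional[str] = None) -> Response:
        scope = f"repository:{image}:pull"
        if token is None or not token.endswith(scope):
            # 401 plus a header telling the client where to get a token.
            challenge = f'Bearer realm="{self.realm}",scope="{scope}"'
            return Response(401, {"WWW-Authenticate": challenge})
        return Response(200, body=f"manifest of {image}")


def pull_with_auth(registry, gitlab, user, image) -> Response:
    first = registry.pull(image)
    if first.status != 401:
        return first
    # Read the requested scope from the challenge, ask GitLab for a token,
    # then retry the original request with that token.
    challenge = first.headers["WWW-Authenticate"]
    scope = challenge.split('scope="')[1].rstrip('"')
    token = gitlab.issue_token(user, scope)
    return registry.pull(image, token)


registry = FakeRegistry("https://gitlab.example.com/jwt/auth")
print(pull_with_auth(registry, FakeGitLab(), "alice", "group/app").status)
```

The client never sends credentials to the registry itself in this sketch; it only presents the token it obtained from the auth endpoint.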
Sentry fundamentally is a service that helps you monitor and fix crashes in real time. The server is in Python, but it contains a full API for sending events from any language, in any application.
For monitoring deployed apps, see the Sentry integration docs.
Sidekiq (`sidekiq`) is a Ruby background job processor that pulls jobs from the Redis queue and processes them. Background jobs allow GitLab to provide a faster request/response cycle by moving work into the background.
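The idea of moving work out of the request/response cycle can be sketched with an in-memory queue. This is a toy model, not Sidekiq's API: the `deque` stands in for a Redis list, and `EmailWorker`, `enqueue`, `handle_request`, and `work_off` are hypothetical names used only for illustration.

```python
import json
from collections import deque

# Stand-in for a Redis list that holds serialized jobs.
queue = deque()

# Hypothetical worker registry: job class name -> function that does the work.
HANDLERS = {
    "EmailWorker": lambda user: f"sent email to {user}",
}


def enqueue(job_class, *args):
    """Serialize the job and push it onto the queue."""
    queue.append(json.dumps({"class": job_class, "args": list(args)}))


def handle_request(user):
    """The web request only enqueues the slow work and returns immediately."""
    enqueue("EmailWorker", user)
    return "202 Accepted"


def work_off():
    """The background processor pulls jobs in order and runs them."""
    results = []
    while queue:
        job = json.loads(queue.popleft())
        results.append(HANDLERS[job["class"]](*job["args"]))
    return results


print(handle_request("alice"))
print(work_off())
```

The request handler returns before any email is sent; the slow step runs later, when the background processor drains the queue.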
Starting with GitLab 13.0, Puma is the default web server.
Puma (`puma`) is a Ruby application server that is used to run the core Rails application that provides the user-facing features in GitLab. Often this displays in process output as bundle or config.ru depending on the GitLab version.
GitLab provides two "interfaces" for end users to access the service:
It's important to understand the distinction as some processes are used in both and others are exclusive to a specific request type.
When making a request to an HTTP Endpoint (think /users/sign_in) the request takes the following path through the GitLab Service:
Below we describe the different paths that HTTP vs. SSH Git requests take. There is some overlap with the Web Request Cycle but also some differences.
Git operations over HTTP use the stateless "smart" protocol described in the Git documentation, but responsibility for handling these operations is split across several GitLab components.
Here is a sequence diagram for git fetch. All requests pass through
NGINX and any other HTTP load balancers, but are not transformed in any
way by them. All paths are presented relative to a /namespace/project.git URL.
sequenceDiagram
participant Git on client
participant NGINX
participant Workhorse
participant Rails
participant Gitaly
participant Git on server
Note left of Git on client: git fetch<br/>info-refs
Git on client->>+Workhorse: GET /info/refs?service=git-upload-pack
Workhorse->>+Rails: GET /info/refs?service=git-upload-pack
Note right of Rails: Auth check
Rails-->>-Workhorse: Gitlab::Workhorse.git_http_ok
Workhorse->>+Gitaly: SmartHTTPService.InfoRefsUploadPack request
Gitaly->>+Git on server: git upload-pack --stateless-rpc --advertise-refs
Git on server-->>-Gitaly: git upload-pack response
Gitaly-->>-Workhorse: SmartHTTPService.InfoRefsUploadPack response
Workhorse-->>-Git on client: 200 OK
Note left of Git on client: git fetch<br/>fetch-pack
Git on client->>+Workhorse: POST /git-upload-pack
Workhorse->>+Rails: POST /git-upload-pack
Note right of Rails: Auth check
Rails-->>-Workhorse: Gitlab::Workhorse.git_http_ok
Workhorse->>+Gitaly: SmartHTTPService.PostUploadPack request
Gitaly->>+Git on server: git upload-pack --stateless-rpc
Git on server-->>-Gitaly: git upload-pack response
Gitaly-->>-Workhorse: SmartHTTPService.PostUploadPack response
Workhorse-->>-Git on client: 200 OK
The sequence is similar for git push, except git-receive-pack is used
instead of git-upload-pack.
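The two-step exchange in the diagram can be summarized as a pair of HTTP requests per operation. The helper below is a minimal sketch that only builds the method and URL for each step; `smart_http_requests` and the example host are hypothetical, and no request is actually sent.

```python
from typing import List, Tuple

# Service name used by each Git operation in the smart HTTP protocol.
SERVICES = {
    "fetch": "git-upload-pack",
    "push": "git-receive-pack",
}


def smart_http_requests(repo_url: str, operation: str) -> List[Tuple[str, str]]:
    """Return the (method, URL) pairs a client issues for the operation."""
    service = SERVICES[operation]
    base = repo_url.rstrip("/")
    return [
        # Step 1: reference advertisement.
        ("GET", f"{base}/info/refs?service={service}"),
        # Step 2: the pack exchange itself.
        ("POST", f"{base}/{service}"),
    ]


for method, url in smart_http_requests(
    "https://gitlab.example.com/namespace/project.git", "fetch"
):
    print(method, url)
```

Passing `"push"` instead of `"fetch"` yields the same two steps with `git-receive-pack` in place of `git-upload-pack`.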
Git operations over SSH can use the stateful protocol described in the Git documentation, but responsibility for handling them is split across several GitLab components.
No GitLab components speak SSH directly - all SSH connections are made between
Git on the client machine and the SSH server, which terminates the connection.
To the SSH server, all connections are authenticated as the git user; GitLab
users are differentiated by the SSH key presented by the client.
Here is a sequence diagram for git fetch, assuming Fast SSH key lookup
is enabled. AuthorizedKeysCommand is an executable provided by
GitLab Shell:
sequenceDiagram
participant Git on client
participant SSH server
participant AuthorizedKeysCommand
participant GitLab Shell
participant Rails
participant Gitaly
participant Git on server
Note left of Git on client: git fetch
Git on client->>+SSH server: ssh git fetch-pack request
SSH server->>+AuthorizedKeysCommand: gitlab-shell-authorized-keys-check git AAAA...
AuthorizedKeysCommand->>+Rails: GET /internal/api/authorized_keys?key=AAAA...
Note right of Rails: Lookup key ID
Rails-->>-AuthorizedKeysCommand: 200 OK, command="gitlab-shell upload-pack key_id=1"
AuthorizedKeysCommand-->>-SSH server: command="gitlab-shell upload-pack key_id=1"
SSH server->>+GitLab Shell: gitlab-shell upload-pack key_id=1
GitLab Shell->>+Rails: GET /internal/api/allowed?action=upload_pack&key_id=1
Note right of Rails: Auth check
Rails-->>-GitLab Shell: 200 OK, { gitaly: ... }
GitLab Shell->>+Gitaly: SSHService.SSHUploadPack request
Gitaly->>+Git on server: git upload-pack request
Note over Git on client,Git on server: Bidirectional communication between Git client and server
Git on server-->>-Gitaly: git upload-pack response
Gitaly -->>-GitLab Shell: SSHService.SSHUploadPack response
GitLab Shell-->>-SSH server: gitlab-shell upload-pack response
SSH server-->>-Git on client: ssh git fetch-pack response
The git push operation is very similar, except git receive-pack is used
instead of git upload-pack.
If fast SSH key lookups are not enabled, the SSH server reads from the
~git/.ssh/authorized_keys file to determine what command to run for a given
SSH session. This is kept up to date by an AuthorizedKeysWorker
in Rails, scheduled to run whenever an SSH key is modified by a user.
SSH certificates may be used
instead of keys. In this case, AuthorizedKeysCommand is replaced with an
AuthorizedPrincipalsCommand. This extracts a username from the certificate
without using the Rails internal API. The username is then used instead of key_id in the
/api/internal/allowed call later.
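As a rough illustration of how the forced command from the sequence diagram maps to the authorization check, the sketch below turns a command string such as `gitlab-shell upload-pack key_id=1` into the query shown in the diagram. The `allowed_request` helper is hypothetical, the path is copied from the diagram as written, and the real GitLab Shell sends more information than this.

```python
from urllib.parse import urlencode


def allowed_request(forced_command: str) -> str:
    """Build the authorization-check path for a forced command string."""
    program, git_command, identity = forced_command.split()
    if program != "gitlab-shell":
        raise ValueError(f"unexpected program: {program}")
    # The identity is a name=value pair, for example key_id=1.
    name, _, value = identity.partition("=")
    params = {
        # upload-pack -> upload_pack, receive-pack -> receive_pack
        "action": git_command.replace("-", "_"),
        name: value,
    }
    return "/internal/api/allowed?" + urlencode(params)


print(allowed_request("gitlab-shell upload-pack key_id=1"))
```

A push would carry `receive-pack` in the forced command, which changes only the `action` parameter.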
GitLab Shell also has a few operations that do not involve Gitaly, such as resetting two-factor authentication codes. These are handled in the same way, except there is no round-trip into Gitaly - Rails performs the action as part of the internal API call, and GitLab Shell streams the response back to the user directly.
In the pictures, ~git refers to the home directory of the git user, which is typically /home/git.
GitLab is primarily installed within the /home/git home directory as the git user. The home directory contains the GitLab server software as well as the repositories (though the repository location is configurable).
The bare repositories are located in /home/git/repositories. GitLab is a Ruby on Rails application, so the particulars of the inner workings can be learned by studying how a Ruby on Rails application works.
To serve repositories over SSH, there's an add-on application called GitLab Shell, which is installed in /home/git/gitlab-shell.
To summarize, here's the directory structure of the git user home directory.
ps aux | grep '^git'
GitLab needs several components to operate. It requires a persistent database
(PostgreSQL) and a Redis database, and uses Apache httpd or NGINX to proxy requests to
Puma. All these components should run as different system users to GitLab
(for example, postgres, redis, and www-data, instead of git).
GitLab starts Sidekiq and Puma (a simple Ruby HTTP server
running on port 8080 by default) as the git user. Under the GitLab user there are usually 4
processes: puma master (1 process), puma cluster worker
(2 processes), sidekiq (1 process).
Repositories get accessed via HTTP or SSH. HTTP cloning/push/pull uses the GitLab API and SSH cloning is handled by GitLab Shell (previously explained).
See the README for more information.
The GitLab init script starts and stops Puma and Sidekiq:
/etc/init.d/gitlab
Usage: service gitlab {start|stop|restart|reload|status}
Redis (key-value store/non-persistent database):
/etc/init.d/redis
Usage: /etc/init.d/redis {start|stop|status|restart|condrestart|try-restart}
SSH daemon:
/etc/init.d/sshd
Usage: /etc/init.d/sshd {start|stop|restart|reload|force-reload|condrestart|try-restart|status}
Web server (one of the following):
/etc/init.d/httpd
Usage: httpd {start|stop|restart|condrestart|try-restart|force-reload|reload|status|fullstatus|graceful|help|configtest}
/etc/init.d/nginx
Usage: nginx {start|stop|restart|reload|force-reload|status|configtest}
Persistent database:
/etc/init.d/postgresql
Usage: /etc/init.d/postgresql {start|stop|restart|reload|force-reload|status} [version ..]
GitLab (includes Puma and Sidekiq logs):
/home/git/gitlab/log/ usually contains application.log, production.log, sidekiq.log, puma.stdout.log, git_json.log and puma.stderr.log.
GitLab Shell:
/home/git/gitlab-shell/gitlab-shell.log
SSH:
/var/log/auth.log auth log (on Ubuntu).
/var/log/secure auth log (on RHEL).
NGINX:
/var/log/nginx/ contains error and access logs.
Apache httpd:
/var/log/apache2/ contains error and output logs (on Ubuntu).
/var/log/httpd/ contains error and output logs (on RHEL).
Redis:
/var/log/redis/redis.log there are also log-rotated logs there.
PostgreSQL:
/var/log/postgresql/*
GitLab has configuration files located in /home/git/gitlab/config/*. Commonly referenced
configuration files include:
gitlab.yml: GitLab Rails configuration
puma.rb: Puma web server settings
database.yml: Database connection settings
GitLab Shell has a configuration file at /home/git/gitlab-shell/config.yml.
Settings which belong in gitlab.yml include those related to:
Many other settings are better placed in the app itself, in ApplicationSetting. Managing settings in the UI is usually a better user experience compared to managing configuration files. With respect to development cost, modifying gitlab.yml often seems like a faster iteration, but when you consider all the deployment methods below, it may be a poor tradeoff.
When adding a setting to gitlab.yml:
GitLab provides Rake tasks with which you can see version information and run a quick check on your configuration to ensure it is configured properly within the application. See maintenance Rake tasks. In a nutshell, do the following:
sudo -i -u git
cd gitlab
bundle exec rake gitlab:env:info RAILS_ENV=production
bundle exec rake gitlab:check RAILS_ENV=production
It's recommended to sign in to the git user using either sudo -i -u git or
sudo su - git. Although the sudo commands provided by GitLab work in Ubuntu,
they don't always work in RHEL.
The GitLab.com architecture is detailed for your reference, but this architecture is only useful if you have millions of users.
A SaaS model gateway is available to enable AI-native features.