{{< details >}}
{{< /details >}}
{{< history >}}
{{< /history >}}
ClickHouse is an open-source column-oriented database management system. It can efficiently filter, aggregate, and query across large data sets.
GitLab uses ClickHouse as a secondary data store to enable advanced analytics features such as GitLab Duo, SDLC trends, and CI Analytics. GitLab only stores data that supports these features in ClickHouse.
You should use ClickHouse Cloud to connect ClickHouse to GitLab.
Alternatively, you can bring your own ClickHouse. For more information, see ClickHouse recommendations for GitLab Self-Managed.
After you configure ClickHouse, you can use the following analytics features:
| Feature | Description |
|---|---|
| Runner fleet dashboard | Displays runner usage metrics and job wait times. Provides export of CSV files containing job counts and executed runner minutes by runner type and job status for each project. |
| Contribution analytics | Provides analytics of group member contributions (push events, issues, merge requests) over time. ClickHouse reduces the likelihood of timeout issues for large instances. |
| GitLab Duo and SDLC trends | Measures the impact of GitLab Duo on software development performance. Tracks development metrics (deployment frequency, lead time, change failure rate, time to restore) alongside AI-specific indicators (GitLab Duo seat adoption, Code Suggestions acceptance rates, and GitLab Duo Chat usage). |
| GraphQL API for AI Metrics | Provides programmatic access to GitLab Duo and SDLC trend data through the AiMetrics, AiUserMetrics, and AiUsageData endpoints. Provides export of pre-aggregated metrics and raw event data for integration with BI tools and custom analytics. |
The supported ClickHouse version differs depending on your GitLab version:
dictGet), see the snippet.
ClickHouse Cloud is always compatible with the latest stable GitLab release.
[!warning] ClickHouse 25.12 introduced a backward-incompatible change to `ALTER MODIFY COLUMN`. This change breaks the migration process for the GitLab ClickHouse integration in GitLab versions earlier than 18.8. If you use ClickHouse 25.12, upgrade GitLab to 18.8 or later.
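The constraint in this warning can be expressed as a small pre-upgrade check. The following Ruby sketch is illustrative only: the method name `clickhouse_migration_safe?` is hypothetical, and it assumes the incompatibility applies to ClickHouse 25.12 and every later release, which the warning does not state explicitly.

```ruby
require "rubygems" # provides Gem::Version (loaded by default in most Ruby setups)

# Hypothetical helper, not part of GitLab.
# Returns false when the ClickHouse version includes the ALTER MODIFY COLUMN
# change (assumed here: 25.12 and later) and GitLab is older than 18.8.
def clickhouse_migration_safe?(gitlab_version, clickhouse_version)
  gitlab = Gem::Version.new(gitlab_version)
  clickhouse = Gem::Version.new(clickhouse_version)

  return true if clickhouse < Gem::Version.new("25.12")

  gitlab >= Gem::Version.new("18.8")
end

puts clickhouse_migration_safe?("18.7.2", "25.12.1.1") # false
puts clickhouse_migration_safe?("18.8.0", "25.12.1.1") # true
puts clickhouse_migration_safe?("18.7.2", "25.8.3.1")  # true
```

A check like this could run before a ClickHouse upgrade, using the version string returned by `SELECT version()`.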
Choose your deployment type based on your operational requirements:
After setting up your ClickHouse instance:
Prerequisites:
To set up ClickHouse Cloud:
9440 for secure connections)
[!note] ClickHouse Cloud automatically handles version upgrades and security patches. Enterprise Edition (EE) customers can schedule upgrades to control when they occur, and avoid unexpected service interruptions during business hours. For more information, see upgrade ClickHouse.
After you create your ClickHouse Cloud service, you then create the GitLab database and user.
Prerequisites:
[!warning] For ClickHouse for GitLab Self-Managed, you are responsible for planning and executing version upgrades, security patches, and backups. For more information, see Upgrade ClickHouse.
For a multi-node, high-availability (HA) setup, GitLab supports the Replicated table engine in ClickHouse.
Prerequisites:
`remote_servers` configuration section.
- `cluster`
- `shard`
- `replica`

When configuring the database for HA, you must run the statements with the `ON CLUSTER` clause.
For more information, see ClickHouse Replicated database engine documentation.
The GitLab application communicates with the ClickHouse cluster through the HTTP/HTTPS interface. For HA deployments, use an HTTP proxy or load balancer to distribute requests across ClickHouse cluster nodes.
Recommended load balancer options:
Basic chproxy configuration example:
server:
  http:
    listen_addr: ":8080"
clusters:
  - name: "clickhouse_cluster"
    nodes: [
      "http://ch-node1:8123",
      "http://ch-node2:8123",
      "http://ch-node3:8123"
    ]
users:
  - name: "gitlab"
    password: "your_secure_password"
    to_cluster: "clickhouse_cluster"
    to_user: "gitlab"
When using a load balancer, configure GitLab to connect to the load balancer URL instead of individual ClickHouse nodes.
For more information, see chproxy documentation.
After you configure your ClickHouse for GitLab Self-Managed instance, create the GitLab database and user.
Before configuring the database, verify ClickHouse is installed and accessible:
Check ClickHouse is running:
clickhouse-client --query "SELECT version()"
If ClickHouse is running, you see the version number (for example, 24.3.1.12).
Verify you can connect with credentials:
clickhouse-client --host your-clickhouse-host --port 9440 --secure --user default --password 'your-password'
[!note] If you have not configured TLS yet, use port `9000` without the `--secure` flag for initial testing.
To create the necessary user and database objects:
clickhouse-client.
PASSWORD_HERE with the generated password.
{{< tabs >}}
{{< tab title="Single-node or ClickHouse Cloud" >}}
CREATE DATABASE gitlab_clickhouse_main_production;
CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE';
CREATE ROLE gitlab_app;
GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE, dictGet ON gitlab_clickhouse_main_production.* TO gitlab_app;
GRANT SELECT ON information_schema.* TO gitlab_app;
GRANT gitlab_app TO gitlab;
{{< /tab >}}
{{< tab title="HA ClickHouse for GitLab Self-Managed" >}}
Replace CLUSTER_NAME_HERE with your cluster's name:
CREATE DATABASE gitlab_clickhouse_main_production ON CLUSTER CLUSTER_NAME_HERE ENGINE = Replicated('/clickhouse/databases/{cluster}/gitlab_clickhouse_main_production', '{shard}', '{replica}');
CREATE USER gitlab IDENTIFIED WITH sha256_password BY 'PASSWORD_HERE' ON CLUSTER CLUSTER_NAME_HERE;
CREATE ROLE gitlab_app ON CLUSTER CLUSTER_NAME_HERE;
GRANT SELECT, INSERT, ALTER, CREATE, UPDATE, DROP, TRUNCATE, OPTIMIZE, dictGet ON gitlab_clickhouse_main_production.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE;
GRANT SELECT ON information_schema.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE;
GRANT gitlab_app TO gitlab ON CLUSTER CLUSTER_NAME_HERE;
{{< /tab >}}
{{< /tabs >}}
{{< tabs >}}
{{< tab title="Linux package" >}}
To provide GitLab with ClickHouse credentials:
Edit /etc/gitlab/gitlab.rb:
gitlab_rails['clickhouse_databases']['main']['database'] = 'gitlab_clickhouse_main_production'
gitlab_rails['clickhouse_databases']['main']['url'] = 'https://your-clickhouse-host:port'
gitlab_rails['clickhouse_databases']['main']['username'] = 'gitlab'
gitlab_rails['clickhouse_databases']['main']['password'] = 'PASSWORD_HERE' # replace with the actual password
Replace the URL with:
- https://your-service.clickhouse.cloud:9440
- https://your-clickhouse-host:8443
- https://your-load-balancer:8080 (or your load balancer URL)

Save the file and reconfigure GitLab:
sudo gitlab-ctl reconfigure
{{< /tab >}}
{{< tab title="Helm chart (Kubernetes)" >}}
Save the ClickHouse password as a Kubernetes Secret:
kubectl create secret generic gitlab-clickhouse-password --from-literal="main_password=PASSWORD_HERE"
Export the Helm values:
helm get values gitlab > gitlab_values.yaml
Edit gitlab_values.yaml:
global:
  clickhouse:
    enabled: true
    main:
      username: gitlab
      password:
        secret: gitlab-clickhouse-password
        key: main_password
      database: gitlab_clickhouse_main_production
      url: 'https://your-clickhouse-host:port'
Replace the URL with:
- https://your-service.clickhouse.cloud:9440
- https://your-clickhouse-host:8443
- https://your-load-balancer:8080 (or your load balancer URL)

Save the file and apply the new values:
helm upgrade -f gitlab_values.yaml gitlab gitlab/gitlab
{{< /tab >}}
{{< /tabs >}}
[!note] For production deployments, configure TLS/SSL on your ClickHouse instance and use `https://` URLs. For GitLab Self-Managed installations, see the Network Security documentation.
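Before you save the configuration, it can help to sanity-check the connection URL. The following Ruby sketch is a minimal example, not a GitLab feature: the method name `clickhouse_url_problems` is hypothetical, and it only checks that the URL parses, uses `https`, and includes a host.

```ruby
require "uri"

# Hypothetical helper, not part of GitLab.
# Returns a list of problems found in a ClickHouse connection URL.
def clickhouse_url_problems(url)
  uri = URI.parse(url)
  problems = []
  problems << "URL should use https:// in production" unless uri.scheme == "https"
  problems << "URL has no host" if uri.host.nil? || uri.host.empty?
  problems
rescue URI::InvalidURIError
  ["URL could not be parsed; check for leftover placeholders"]
end

p clickhouse_url_problems("https://clickhouse.example.com:8443") # []
p clickhouse_url_problems("http://clickhouse.example.com:8123")
```

An empty list means the URL passed these basic checks. It does not prove that the host is reachable or that the credentials are valid.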
To verify that your connection is set up successfully:
Sign in to the Rails console.
Execute the following command:
ClickHouse::Client.select('SELECT 1', :main)
If successful, the command returns [{"1"=>1}].
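If you cannot open a Rails console, a similar check can be made directly against the ClickHouse HTTP interface. The following Ruby sketch is an assumption-laden example rather than a GitLab tool: the method names are hypothetical, the URL and credentials are placeholders, and it relies on ClickHouse accepting HTTP basic authentication and a query in the POST body.

```ruby
require "json"
require "net/http"
require "uri"

# Extracts the result rows from a ClickHouse `FORMAT JSON` response body.
def clickhouse_rows(response_body)
  JSON.parse(response_body).fetch("data")
end

# Sends `SELECT 1` to the ClickHouse HTTP interface and returns the rows.
# The URL, user, and password are placeholders for your own values.
def clickhouse_select_one(url, user, password)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.path.empty? ? "/" : uri.path)
  request.basic_auth(user, password)
  request.body = "SELECT 1 FORMAT JSON"

  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(request)
  end
  raise "ClickHouse returned HTTP #{response.code}: #{response.body}" unless response.is_a?(Net::HTTPSuccess)

  clickhouse_rows(response.body)
end

# Example (requires a reachable ClickHouse instance):
#   clickhouse_select_one("https://your-clickhouse-host:8443", "gitlab", "PASSWORD_HERE")
```

Keeping the response parsing in its own method makes it possible to test the logic without a live ClickHouse instance.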
If the connection fails, verify:
{{< tabs >}}
{{< tab title="Linux package" >}}
To create the required database objects, execute:
sudo gitlab-rake gitlab:clickhouse:migrate
{{< /tab >}}
{{< tab title="Helm chart (Kubernetes)" >}}
Migrations are executed automatically with the GitLab-Migrations chart.
Alternatively, you can run migrations by executing the following command in the Toolbox pod:
gitlab-rake gitlab:clickhouse:migrate
{{< /tab >}}
{{< /tabs >}}
After your GitLab instance is connected to ClickHouse, you can enable features that use ClickHouse:
Prerequisites:
To enable ClickHouse for Analytics:
To disable ClickHouse for Analytics:
Prerequisites:
To disable:
[!note] Disabling ClickHouse for Analytics stops GitLab from querying ClickHouse but does not delete any data from your ClickHouse instance. Analytics features that rely on ClickHouse will fall back to alternative data sources or become unavailable.
ClickHouse Cloud automatically handles version upgrades and security patches. No manual intervention is required.
For information about upgrade scheduling and maintenance windows, see ClickHouse Cloud upgrades.
[!note] ClickHouse Cloud notifies you in advance of upcoming upgrades. Review the ClickHouse Cloud changelog to stay informed about new features and changes.
For ClickHouse for GitLab Self-Managed, you are responsible for planning and executing version upgrades.
Prerequisites:
Before upgrading:
To upgrade ClickHouse:
[!warning] Always ensure the ClickHouse version remains compatible with your GitLab version. Incompatible versions might cause indexing to pause and features to fail. For more information, see supported ClickHouse versions.
For detailed upgrade procedures, see the ClickHouse documentation on updates.
Prerequisites:
To check the status of ClickHouse migrations:
Alternatively, check for pending migrations using the Rails console:
# Sign in to Rails console
# Run this to check migrations
ClickHouse::MigrationSupport::Migrator.new(:main).pending_migrations
If a ClickHouse migration fails:
Check the logs for error details. ClickHouse-related errors are logged in the GitLab application logs.
Address the underlying issue (for example, insufficient memory, connectivity problems).
Retry the migration:
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:migrate
# For self-compiled installations
bundle exec rake gitlab:clickhouse:migrate RAILS_ENV=production
[!note] Migrations are designed to be idempotent and safe to retry. If a migration fails partway through, running it again resumes from where it left off or skips already-completed steps.
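Because migrations are safe to retry, a transient failure (for example, a brief connectivity problem) can be handled with a simple retry loop. The following Ruby sketch is one possible approach, not a GitLab-provided tool: the helper name, the number of attempts, and the delays are all assumptions. It does not replace investigating the underlying error first.

```ruby
# Hypothetical helper, not part of GitLab.
# Runs the block until it returns a truthy value or the attempts run out,
# doubling the wait between attempts. Returns true on success, false otherwise.
def retry_until_success(attempts: 3, base_delay: 30, sleeper: ->(seconds) { sleep(seconds) })
  attempts.times do |index|
    return true if yield

    # Wait before the next attempt, but not after the last one.
    sleeper.call(base_delay * (2**index)) if index < attempts - 1
  end
  false
end

# Example for a Linux package installation (runs the real Rake task):
#   retry_until_success { system("sudo", "gitlab-rake", "gitlab:clickhouse:migrate") }
```

The `sleeper` parameter exists only so that the waiting behavior can be replaced in tests.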
GitLab provides several Rake tasks for managing your ClickHouse database.
The following Rake tasks are available:
| Task | Description |
|---|---|
| `sudo gitlab-rake gitlab:clickhouse:migrate` | Runs all pending ClickHouse migrations to create or update database schema. |
| `sudo gitlab-rake gitlab:clickhouse:drop` | Drops all ClickHouse databases. Use with extreme caution as this deletes all data. |
| `sudo gitlab-rake gitlab:clickhouse:create` | Creates ClickHouse databases if they do not exist. |
| `sudo gitlab-rake gitlab:clickhouse:setup` | Creates databases and runs all migrations. Equivalent to running create and migrate tasks. |
| `sudo gitlab-rake gitlab:clickhouse:schema:dump` | Dumps the current database schema to a file for backup or version control. |
| `sudo gitlab-rake gitlab:clickhouse:schema:load` | Loads the database schema from a dump file. |
[!note] For self-compiled installations, use `bundle exec rake` instead of `sudo gitlab-rake` and add `RAILS_ENV=production` to the end of the command.
To verify your ClickHouse connection is working:
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:info
# For self-compiled installations
bundle exec rake gitlab:clickhouse:info RAILS_ENV=production
This task outputs debugging information about the ClickHouse connection and configuration.
To run all pending migrations:
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:migrate
# For self-compiled installations
bundle exec rake gitlab:clickhouse:migrate RAILS_ENV=production
[!warning] This deletes all data in your ClickHouse database. Use only in development or when troubleshooting.
To drop and recreate the database:
# For installations that use the Linux package
sudo gitlab-rake gitlab:clickhouse:drop
sudo gitlab-rake gitlab:clickhouse:setup
# For self-compiled installations
bundle exec rake gitlab:clickhouse:drop RAILS_ENV=production
bundle exec rake gitlab:clickhouse:setup RAILS_ENV=production
You can use environment variables to control Rake task behavior:
| Environment variable | Data type | Description |
|---|---|---|
| `VERBOSE` | Boolean | Set to `true` to see detailed output during migrations. Example: `VERBOSE=true sudo gitlab-rake gitlab:clickhouse:migrate` |
[!note] For resource sizing and deployment recommendations based on your user count, see system requirements.
For information about ClickHouse architecture and performance tuning, see the ClickHouse documentation on architecture.
You should perform a full backup before upgrading the GitLab application. ClickHouse data is not included in GitLab backup tooling.
The backup and restore strategy depends on your deployment type.
ClickHouse Cloud automatically:
You do not have to do any additional configuration.
For more information, see ClickHouse Cloud backups.
If you manage your own ClickHouse instance, you should take regular backups to ensure data safety:
metrics or logs) to an object storage bucket, for example AWS S3.
This duplicates data for every full backup, but is the easiest approach to restore data.
Alternatively, use clickhouse-backup. This is a third-party tool that provides similar functionality with additional features like scheduling and remote storage management.
To ensure the stability of the GitLab integration, you should monitor the health and performance of your ClickHouse cluster.
ClickHouse Cloud provides a native Prometheus integration that exposes metrics through a secure API endpoint.
After generating the API credentials, you can configure collectors, such as a Prometheus deployment, to scrape metrics from ClickHouse Cloud.
ClickHouse can expose metrics in Prometheus format. To enable this:
Configure the prometheus section in your config.xml to expose metrics on a dedicated port (default is 9363).
<prometheus>
  <endpoint>/metrics</endpoint>
  <port>9363</port>
  <metrics>true</metrics>
  <events>true</events>
  <asynchronous_metrics>true</asynchronous_metrics>
</prometheus>
Configure Prometheus or a similar compatible server to scrape http://<clickhouse-host>:9363/metrics.
You should set up alerts for the following metrics to detect issues that may impact GitLab features:
| Metric Name | Description | Alert Threshold (Recommendation) |
|---|---|---|
| `ClickHouse_Metrics_Query` | Number of queries currently executing. A sudden spike might indicate a performance bottleneck. | Baseline deviation (for example > 100) |
| `ClickHouseProfileEvents_FailedSelectQuery` | Number of failed select queries | Baseline deviation (for example > 50) |
| `ClickHouseProfileEvents_FailedInsertQuery` | Number of failed insert queries | Baseline deviation (for example > 10) |
| `ClickHouse_AsyncMetrics_ReadonlyReplica` | Indicates if a replica has gone into read-only mode (often due to ZooKeeper connection loss). | > 0 (take immediate action) |
| `ClickHouse_ProfileEvents_NetworkErrors` | Network errors (connection resets/timeouts). Frequent errors might cause GitLab background jobs to fail. | Rate > 0 |
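In most setups, alerting on these metrics belongs in Prometheus Alertmanager or a similar system. To illustrate the threshold logic only, the following Ruby sketch parses a scraped metrics payload and reports which example thresholds are exceeded. The method names are hypothetical, the thresholds are the example values from the table, and only the two gauge-style metrics are checked because counters need a rate over time.

```ruby
# Example thresholds taken from the table above. Only gauge-style metrics are
# checked here; counters such as failed queries need a rate over time instead.
THRESHOLDS = {
  "ClickHouse_Metrics_Query" => 100,
  "ClickHouse_AsyncMetrics_ReadonlyReplica" => 0
}.freeze

# Parses Prometheus text exposition lines of the form `name value`.
# Comments, blank lines, and lines without a numeric value are skipped.
def parse_metrics(text)
  text.each_line.with_object({}) do |line, metrics|
    line = line.strip
    next if line.empty? || line.start_with?("#")

    name, value = line.split(/\s+/)
    number = Float(value, exception: false) if value
    metrics[name] = number if number
  end
end

# Returns the names of metrics whose current value exceeds its threshold.
def breached_thresholds(metrics, thresholds = THRESHOLDS)
  thresholds.select { |name, limit| metrics.fetch(name, 0) > limit }.keys
end

sample = <<~METRICS
  # TYPE ClickHouse_Metrics_Query gauge
  ClickHouse_Metrics_Query 142
  ClickHouse_AsyncMetrics_ReadonlyReplica 0
METRICS

p breached_thresholds(parse_metrics(sample)) # ["ClickHouse_Metrics_Query"]
```

Treat the threshold values as starting points and tune them against the baseline of your own instance.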
If ClickHouse is available behind a load balancer, you can use the HTTP /ping endpoint to check for liveness.
The expected response is Ok with HTTP Code 200.
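A liveness probe along these lines can be sketched in Ruby. This is a minimal example under stated assumptions: the method names are hypothetical, the base URL is a placeholder for your load balancer or node address, and any network error is treated as "not alive".

```ruby
require "net/http"
require "uri"

# True when a /ping response matches the expected result:
# HTTP 200 with a body that starts with "Ok".
def ping_ok?(status_code, body)
  status_code.to_i == 200 && body.to_s.strip.start_with?("Ok")
end

# Requests /ping on the given base URL and reports whether ClickHouse
# answered as expected. Returns false on timeouts and connection errors.
def clickhouse_alive?(base_url, timeout: 5)
  uri = URI.join(base_url, "/ping")
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == "https",
                             open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.path)
  end
  ping_ok?(response.code, response.body)
rescue StandardError
  false
end

# Example (requires a reachable ClickHouse instance or load balancer):
#   puts clickhouse_alive?("https://your-load-balancer:8080")
```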
To secure your data and support auditability, use the following security practices.
TLS Encryption: Configure ClickHouse servers to use TLS encryption to validate connections.
When configuring the connection URL in GitLab, you should use the https:// protocol (for example, https://clickhouse.example.com:8443) to specify this.
IP Allow lists: Restrict access to the ClickHouse port (default 8443 or 9440) to only the GitLab application nodes and other authorized networks.
The GitLab application does not maintain a separate audit log for individual ClickHouse queries. To satisfy specific requirements regarding data access (who queried what and when), you can enable logging on the ClickHouse side.
In ClickHouse Cloud, query logging is enabled by default.
You can access these logs by querying the system.query_log table.
For self-managed instances, ensure the query_log configuration parameter is enabled in your server configuration:
Verify that the query_log section exists in your config.xml or users.xml:
<query_log>
  <database>system</database>
  <table>query_log</table>
  <partition_by>toYYYYMM(event_date)</partition_by>
  <flush_interval_milliseconds>7500</flush_interval_milliseconds>
  <ttl>event_date + INTERVAL 30 DAY</ttl> <!-- Keep only 30 days -->
</query_log>
After query logging is enabled, all executed queries are recorded in the system.query_log table, which provides an audit trail.
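To answer "who queried what and when" for a single account, you can filter system.query_log by user. The following Ruby sketch only builds the SQL text. The helper name and the default time window and row limit are assumptions; the columns used (`event_time`, `event_date`, `user`, `type`, `query_duration_ms`, `query`) are standard query_log columns.

```ruby
# Hypothetical helper, not part of GitLab.
# Builds a query against system.query_log that lists recent finished queries
# for one ClickHouse user.
def audit_query_for(user, days: 7, limit: 100)
  # Reject anything that is not a plain identifier to avoid SQL injection.
  raise ArgumentError, "unexpected characters in user name" unless user.match?(/\A[A-Za-z0-9_]+\z/)

  <<~SQL
    SELECT event_time, user, query_duration_ms, query
    FROM system.query_log
    WHERE type = 'QueryFinish'
      AND user = '#{user}'
      AND event_date >= today() - #{Integer(days)}
    ORDER BY event_time DESC
    LIMIT #{Integer(limit)}
  SQL
end

puts audit_query_for("gitlab")
```

You can paste the generated statement into `clickhouse-client` or the ClickHouse Cloud SQL console.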
The recommended system requirements change depending on the number of users.
| Users | Primary recommendation | Comparable AWS ARM instance | Comparable GCP ARM instance | Comparable Azure ARM instance | Deployment type |
|---|---|---|---|---|---|
| 1K | ClickHouse Cloud Basic | - | - | - | Managed |
| 2K | ClickHouse Cloud Basic | m8g.xlarge | c4a-standard-4 | Standard_D4ps_v6 | Managed or Single Node |
| 3K | ClickHouse Cloud Scale | m8g.2xlarge | c4a-standard-8 | Standard_D8ps_v6 | Managed or Single Node |
| 5K | ClickHouse Cloud Scale | m8g.4xlarge | c4a-standard-16 | Standard_D16ps_v6 | Managed or Single Node |
| 10K | ClickHouse Cloud Scale | m8g.4xlarge | c4a-standard-16 | Standard_D16ps_v6 | Managed or Single Node/HA |
| 25K | ClickHouse for GitLab Self-Managed or ClickHouse Cloud Scale | m8g.8xlarge or 3×m8g.4xlarge | c4a-standard-32 or 3×c4a-standard-16 | Standard_D32ps_v6 or 3×Standard_D16ps_v6 | Managed or Single Node/HA |
| 50K | ClickHouse for GitLab Self-Managed high availability (HA) or ClickHouse Cloud Scale | 3×m8g.4xlarge | 3×c4a-standard-16 | 3×Standard_D16ps_v6 | Managed or HA Cluster |
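As an illustration of how to read the table, the following Ruby sketch maps a user count to the primary recommendation. The names `SIZING` and `primary_recommendation` are hypothetical, and rounding up to the next listed tier is an assumption; the table itself does not say how to treat counts between tiers.

```ruby
# Rows copied from the table above: [maximum users, primary recommendation].
SIZING = [
  [1_000, "ClickHouse Cloud Basic"],
  [2_000, "ClickHouse Cloud Basic"],
  [3_000, "ClickHouse Cloud Scale"],
  [5_000, "ClickHouse Cloud Scale"],
  [10_000, "ClickHouse Cloud Scale"],
  [25_000, "ClickHouse for GitLab Self-Managed or ClickHouse Cloud Scale"],
  [50_000, "ClickHouse for GitLab Self-Managed high availability (HA) or ClickHouse Cloud Scale"]
].freeze

# Returns the recommendation for the smallest tier that covers the user count,
# or nil above 50K users, where the table gives no guidance.
def primary_recommendation(user_count)
  tier = SIZING.find { |max_users, _| user_count <= max_users }
  tier&.last
end

puts primary_recommendation(4_200) # ClickHouse Cloud Scale
```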
Recommendation: ClickHouse Cloud Basic as it provides good cost efficiency with no operational complexity.
Recommendation: ClickHouse Cloud Basic as it offers best value with no operational complexity.
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
Recommendation: ClickHouse Cloud Scale
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
[!note] HA deployments are not cost-effective at this scale.
Recommendation: ClickHouse Cloud Scale
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
Recommendation: ClickHouse Cloud Scale
Alternative recommendation for ClickHouse for GitLab Self-Managed deployment:
Recommendation: ClickHouse Cloud Scale or ClickHouse for GitLab Self-Managed. Both options are economically feasible at this scale.
Recommendations for ClickHouse for GitLab Self-Managed deployment:
Single Node:
HA Deployment:
Storage: 400 GB per node with high performance tier.
Recommendation: ClickHouse for GitLab Self-Managed HA or ClickHouse Cloud Scale. The self-managed option is slightly more cost-effective at this scale.
Recommendations for ClickHouse for GitLab Self-Managed deployment:
Single Node:
HA Deployment (Preferred):
Storage: 1000 GB per node with high performance tier.
HA setup becomes cost-effective only at 10K users or above.
MergeTree is a table engine in ClickHouse designed for high data ingest rates and large data volumes.
It is the core storage engine in ClickHouse, providing features such as columnar storage, custom partitioning, sparse primary indexes, and support for background data merges.
[!warning] On GitLab 18.0.0 and earlier, running database schema migrations for ClickHouse may fail for ClickHouse 24.x and 25.x with the following error message:
Code: 344. DB::Exception: Projection is fully supported in ReplacingMergeTree with deduplicate_merge_projection_mode = throw. Use 'drop' or 'rebuild' option of deduplicate_merge_projection_mode
Without running all migrations, the ClickHouse integration will not work.
To work around this issue and run the migrations:
Sign in to the Rails console.
Execute the following command:
ClickHouse::Client.execute("INSERT INTO schema_migrations (version) VALUES ('20231114142100'), ('20240115162101')", :main)
Migrate the database again:
sudo gitlab-rake gitlab:clickhouse:migrate
This time, the database migration should finish successfully.
From GitLab 18.8, GitLab uses ClickHouse Dictionaries for data denormalization. The GRANT statements before 18.8 did not give the gitlab user permission to query dictionaries, so a manual modification step is needed:
clickhouse-client.
PASSWORD_HERE with the generated password.
{{< tabs >}}
{{< tab title="Single-node or ClickHouse Cloud" >}}
GRANT dictGet ON gitlab_clickhouse_main_production.* TO gitlab_app;
{{< /tab >}}
{{< tab title="HA ClickHouse for GitLab Self-Managed" >}}
Replace CLUSTER_NAME_HERE with your cluster's name:
GRANT dictGet ON gitlab_clickhouse_main_production.* TO gitlab_app ON CLUSTER CLUSTER_NAME_HERE;
{{< /tab >}}
{{< /tabs >}}
Without granting the permission, the ClickHouse migration (CreateNamespaceTraversalPathsDict) will fail with the following error:
DB::Exception: gitlab: Not enough privileges.
After granting the permission, the migration can be safely retried (ideally, wait 1-2 hours until the distributed migration lock clears).
In GitLab 18.5 and earlier, duplicate data could be inserted into ClickHouse tables
(such as ci_finished_pipelines and ci_finished_builds) when Sidekiq workers
retried after network timeouts. This issue caused materialized views to display incorrect
aggregated metrics in analytics dashboards, including the runner fleet dashboard.
This issue was fixed in GitLab 18.9 and backported to 18.6, 18.7, and 18.8. To resolve this issue, upgrade to GitLab 18.6 or later.
If you have existing duplicate data, a fix to rebuild the affected materialized views is planned for GitLab 18.10 in issue 586319. For assistance, contact GitLab Support.