doc/integration/zoekt/_index.md
{{< details >}}
{{< /details >}}
{{< history >}}
- index_code_with_zoekt and search_code_with_zoekt. Disabled by default.
- zoekt_cross_namespace_search. Disabled by default.
- index_code_with_zoekt and search_code_with_zoekt removed in GitLab 17.1.
- zoekt_rollout_worker added in GitLab 17.9. Disabled by default.
- zoekt_cross_namespace_search and zoekt_rollout_worker removed in GitLab 18.7.

{{< /history >}}
[!warning] This feature is in limited availability. For more information, see epic 9404. Provide feedback in issue 420920.
Zoekt is an open-source search engine designed specifically to search for code.
With this integration, you can use exact code search instead of advanced search to search for code in GitLab. You can use exact match and regular expression modes to search for code in a group or repository.
[!note] Zoekt handles only code search and does not replace Elasticsearch or OpenSearch. For all other search scopes, including comments, commits, epics, issues, merge requests, milestones, projects, users, and wikis, Elasticsearch or OpenSearch is still required.
Prerequisites:
To enable exact code search in GitLab, you must have at least one Zoekt node connected to the instance. The following installation methods are supported for Zoekt:
gitlab-zoekt.install=true)

The following installation methods are available for testing, not for production use:
Prerequisites:
To enable exact code search from the GitLab UI:
{{< history >}}
{{< /history >}}
Prerequisites:
You can manage exact code search with Rake tasks.
To enable indexing and search, run this task:
gitlab-rake gitlab:zoekt:index
This task enables zoekt_indexing_enabled, zoekt_search_enabled,
and zoekt_auto_index_root_namespace.
RolloutWorker indexes all root namespaces automatically, and
search becomes available when indices are ready.
To disable indexing and search, run this task:
gitlab-rake gitlab:zoekt:disable
This task disables both zoekt_indexing_enabled and zoekt_search_enabled.
To pause indexing (for example, during maintenance), run this task:
gitlab-rake gitlab:zoekt:pause_indexing
To resume indexing, run this task:
gitlab-rake gitlab:zoekt:resume_indexing
To estimate the storage required for your Zoekt nodes, run this task:
sudo gitlab-rake gitlab:zoekt:estimate_storage
For more information, see estimate storage.
{{< history >}}
- zoekt_critical_watermark_stop_indexing. Disabled by default.
- zoekt_critical_watermark_stop_indexing removed.

{{< /history >}}
Prerequisites:
Indexing performance depends on the CPU and memory limits on the Zoekt indexer nodes. To check indexing status:
{{< tabs >}}
{{< tab title="GitLab 17.10 and later" >}}
Run this Rake task:
gitlab-rake gitlab:zoekt:info
To have the data refresh automatically every 10 seconds, run this task instead:
gitlab-rake "gitlab:zoekt:info[10]"
{{< /tab >}}
{{< tab title="GitLab 17.9 and earlier" >}}
In a Rails console, run these commands:
Search::Zoekt::Index.group(:state).count
Search::Zoekt::Repository.group(:state).count
Search::Zoekt::Task.group(:state).count
{{< /tab >}}
{{< /tabs >}}
The gitlab:zoekt:info Rake task returns output similar to the following:
Exact Code Search
GitLab version: 18.9.0
Enable indexing: yes
Enable searching: yes
Pause indexing: no
Index root namespaces automatically: yes
Cache search results for five minutes: yes
Indexing CPU to tasks multiplier: 1.0
Probability of random force reindexing (percentage): 0.25
Number of parallel processes per indexing task: 1
Number of namespaces per indexing rollout: 32
Offline nodes automatically deleted after: 20m
Indexing timeout per project: 30m
Maximum number of files per project to be indexed: 500000
Maximum file size for indexing: 1MB
Maximum trigrams per file: 20000
Retry interval for failed namespaces: 1d
Number of replicas per namespace: 1
Nodes
# Number of Zoekt nodes and their status
Node count: 2 (online: 2, offline: 0)
Last seen at: 2025-11-21 22:58:09 UTC (less than a minute ago)
Max schema_version: 2531
Storage reserved / usable: 71.1 MiB / 124 GiB (0.06%)
Storage indexed / reserved: 42.7 MiB / 71.1 MiB (60.0%)
Storage used / total: 797 GiB / 921 GiB (86.54%)
Online node watermark levels: 2
- low: 2
Indexing status
Group count: 8
# Number of enabled namespaces and their status
EnabledNamespace count: 8 (without indices: 0, rollout blocked: 0, with search disabled: 0)
Replicas count: 8
- ready: 8
Indices count: 8
- ready: 8
Indices watermark levels: 8
- healthy: 8
Repositories count: 10
- ready: 10
Tasks count: 10
- done: 10
Tasks pending/processing by type: (none)
Storage buffer factor: 0.831× [static fallback (FF disabled)]
Feature Flags (Default Values)
- zoekt_too_many_replicas_event: disabled
Node Details
Node 1 - test-zoekt-hostname-1:
Status: Online
Last seen at: 2025-11-21 22:58:09 UTC (less than a minute ago)
Disk utilization: 86.54%
Unclaimed storage: 62 GiB
# Zoekt build version on the node. Must match GitLab version.
Zoekt version: 2025.11.20-v1.7.6-28-gb9a0fd8
Schema version: 2531
Node 2 - test-zoekt-hostname-2:
Status: Online
Last seen at: 2025-11-21 22:58:09 UTC (less than a minute ago)
Disk utilization: 86.54%
Unclaimed storage: 62 GiB
Zoekt version: 2025.11.20-v1.7.6-28-gb9a0fd8
Schema version: 2531
{{< history >}}
{{< /history >}}
Prerequisites:
Run a health check to understand the status of your Zoekt infrastructure, including:
To run a health check, execute the following task:
gitlab-rake gitlab:zoekt:health
This task provides:
- HEALTHY, DEGRADED, or UNHEALTHY
- 0=healthy, 1=degraded, or 2=unhealthy

To run health checks automatically every 10 seconds, execute the following task:
gitlab-rake "gitlab:zoekt:health[10]"
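The status names and numeric codes listed for this task lend themselves to scripting. The following is a minimal sketch, assuming the 0, 1, and 2 values are the process exit codes of the Rake task. The method names are hypothetical and are not part of GitLab.

```ruby
# Map the health task's numeric result to the status names from the text.
# Assumption: the 0/1/2 values are the process exit codes of the Rake task.
HEALTH_STATUS_BY_EXIT_CODE = {
  0 => :healthy,
  1 => :degraded,
  2 => :unhealthy
}.freeze

# Returns :healthy, :degraded, or :unhealthy, and :unknown for any other code.
def zoekt_health_status(exit_code)
  HEALTH_STATUS_BY_EXIT_CODE.fetch(exit_code, :unknown)
end

# Hypothetical wrapper: runs the task once and returns the mapped status.
# It is only defined here, not called, because it needs a GitLab installation.
def run_zoekt_health_check
  system('gitlab-rake', 'gitlab:zoekt:health')
  zoekt_health_status($?.exitstatus)
end
```

A monitoring job could call run_zoekt_health_check and alert on anything other than :healthy.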
The output includes colored status indicators and shows:
- HEALTHY, DEGRADED, or UNHEALTHY

{{< history >}}
{{< /history >}}
Prerequisites:
You can force reindex a range of projects.
Run this Rake task:
gitlab-rake gitlab:zoekt:reindex_projects ID_FROM=10 ID_TO=20
With the ID_FROM and ID_TO environment variables, you can limit force reindexing to a range of project IDs.
To reindex a single project, set both ID_FROM and ID_TO to the ID of that project.
To reindex all projects, omit the environment variables.
Prerequisites:
To pause indexing for exact code search:
When you pause indexing for exact code search, all changes in your repository are queued. To resume indexing, clear the Pause indexing for exact code search checkbox.
{{< history >}}
{{< /history >}}
Prerequisites:
You can index both existing and new root namespaces automatically. To index all root namespaces automatically:
When you enable this setting, GitLab creates indexing tasks for all projects in:
After a project is indexed, GitLab performs only incremental indexing when a repository change is detected.
When you disable this setting:
{{< history >}}
{{< /history >}}
Prerequisites:
You can cache search results for better performance. This feature is enabled by default and caches results for five minutes.
To cache search results:
{{< history >}}
{{< /history >}}
Prerequisites:
You can set the number of concurrent indexing tasks for a Zoekt node relative to its CPU capacity.
A higher multiplier means more tasks can run concurrently, which
improves indexing throughput at the cost of increased CPU usage.
The default value is 1.0 (one task per CPU core).
You can adjust this value based on the node's performance and workload. To set the number of concurrent indexing tasks:
1. In the upper-right corner, select Admin.
2. Select Settings > Search.
3. Expand Exact code search.
4. In the Indexing CPU to tasks multiplier text box, enter a value.
   For example, if a Zoekt node has 4 CPU cores and the multiplier is 1.5,
   the number of concurrent tasks for the node is 6.
5. Select Save changes.
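The arithmetic behind the multiplier can be sketched as follows. This is an illustration only: the method name is hypothetical, and rounding fractional results down with a minimum of one task is an assumption, because the text gives only an example with a whole-number result.

```ruby
# Concurrent indexing tasks for a node: CPU cores times the multiplier.
# Assumption: fractional results are rounded down, with a minimum of one task.
def concurrent_indexing_tasks(cpu_cores, multiplier = 1.0)
  [(cpu_cores * multiplier).floor, 1].max
end

concurrent_indexing_tasks(4, 1.5) # => 6, the example from the text
concurrent_indexing_tasks(4)      # => 4, one task per CPU core by default
```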
{{< history >}}
{{< /history >}}
Prerequisites:
You can define the probability that a project is
force reindexed instead of incrementally indexed.
The default value is 0.25 (0.25%).
Force reindexing helps prevent memory map (mmap) handlers from running out by periodically rebuilding indices from scratch. A higher percentage increases indexing load, especially for very large repositories.
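To make the percentage concrete, the following sketch shows one way a per-run random decision could work. It is not the GitLab implementation. The method name is hypothetical, and the accepted range of 0 to 100 is an assumption about the setting.

```ruby
# Decide whether an indexing run becomes a force reindex.
# percentage is the setting value: 0.25 means 0.25% of runs.
# Assumption: valid values are between 0 and 100.
# random is injectable so the decision can be tested deterministically.
def force_reindex?(percentage, random: Random.new)
  raise ArgumentError, 'percentage must be between 0 and 100' unless (0..100).cover?(percentage)

  # random.rand returns a float in [0, 1), so 0 never triggers and 100 always does.
  random.rand * 100 < percentage
end
```

With the default of 0.25, roughly one in every 400 indexing runs would rebuild the index from scratch.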
To define the probability of random force reindexing:
0 and 100.

{{< history >}}
{{< /history >}}
Prerequisites:
You can set the number of parallel processes per indexing task.
A higher number reduces indexing time at the cost of increased CPU and memory usage.
The default value is 1 (one process per indexing task).
You can adjust this value based on the node's performance and workload. To set the number of parallel processes per indexing task:
{{< history >}}
{{< /history >}}
Prerequisites:
You can set the number of namespaces per RolloutWorker job for initial indexing.
The default value is 32.
You can adjust this value based on the node's performance and workload.
To set the number of namespaces per indexing rollout:
{{< history >}}
{{< /history >}}
Prerequisites:
You can delete offline Zoekt nodes automatically after a specific period of time
along with their related indices, repositories, and tasks.
The default value is 12h (12 hours).
Use this setting to manage your Zoekt infrastructure and prevent orphaned resources. To define when offline nodes are automatically deleted:
30m (30 minutes), 2h (two hours), or 1d (one day)).
To disable automatic deletion, set to 0.

{{< history >}}
{{< /history >}}
Prerequisites:
You can define the indexing timeout for a project.
The default value is 30m (30 minutes).
To define the indexing timeout for a project:
30m (30 minutes), 2h (two hours), or 1d (one day)).

{{< history >}}
{{< /history >}}
Prerequisites:
You can set the maximum number of files in a project that can be indexed. Projects with more files than this limit in the default branch are not indexed.
The default value is 500,000.
You can adjust this value based on the node's performance and workload. To set the maximum number of files in a project to be indexed:
{{< history >}}
{{< /history >}}
Prerequisites:
You can set the maximum size for a file to be indexed.
The default value is 1MB.
Only filenames are indexed for files that exceed the specified size. You can search these files only by filename. To set maximum file size for indexing:
512B, 50KB, 2MB, or 1GB).
The value can also be in lowercase.

{{< history >}}
{{< /history >}}
Prerequisites:
You can set the maximum number of trigrams for a file to be indexed.
The default value is 20,000.
Trigrams are three-character sequences that Zoekt uses for efficient code search. For files that exceed this trigram limit, only filenames are indexed. A higher limit affects both indexing and search performance.
To set the maximum trigram count for indexing:
{{< history >}}
{{< /history >}}
Prerequisites:
You can define the retry interval for namespaces that previously failed.
The default value is 1d (one day).
A value of 0 means failed namespaces never retry.
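The interval format can be illustrated with a small parser. This is a sketch under stated assumptions: it handles only the m, h, and d suffixes that appear in this page's examples, the method and constant names are hypothetical, and it does not reflect how GitLab validates the setting.

```ruby
# Seconds per unit, for the suffixes used in this page's examples.
INTERVAL_UNIT_SECONDS = { 'm' => 60, 'h' => 3600, 'd' => 86_400 }.freeze

# Converts an interval such as "30m", "2h", or "1d" to seconds.
# Returns nil for "0", which the text describes as "never retry".
def parse_retry_interval(value)
  text = value.to_s
  return nil if text == '0'

  match = /\A(\d+)([mhd])\z/.match(text)
  raise ArgumentError, "unsupported interval: #{value.inspect}" unless match

  match[1].to_i * INTERVAL_UNIT_SECONDS.fetch(match[2])
end

parse_retry_interval('1d')  # => 86400, the default of one day
parse_retry_interval('30m') # => 1800
```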
To define the retry interval for failed namespaces:
30m (30 minutes), 2h (two hours), or 1d (one day)).

{{< history >}}
{{< /history >}}
Prerequisites:
You can set the number of replicas per namespace.
The default value is 1 (one replica per namespace).
Increasing the number of replicas per namespace improves search availability by distributing the load across multiple Zoekt nodes. More replicas increase storage requirements.
To set the number of replicas per namespace:
{{< history >}}
{{< /history >}}
Prerequisites:
To run Zoekt on a different server than GitLab:
The following recommendations might be over-provisioned for some deployments. You should monitor your deployment to ensure:
Adjust resources based on your specific workload characteristics, including:
The webserver and indexer have different memory usage patterns.
The webserver memory-maps index shards from disk into virtual memory. The operating system pages shard data in and out of physical memory as searches are served. Resident memory usage grows with the active working set. Nodes with larger indices or higher query volume require more webserver memory to avoid page thrashing and out-of-memory conditions.
The indexer processes Git object data in memory when it builds or rebuilds indices. Memory usage spikes when indexing large repositories or when multiple tasks run in parallel. You can control peak indexer memory by adjusting the number of parallel processes per indexing task and the indexing CPU to tasks multiplier.
On VM and bare metal deployments, the webserver and indexer share the same system memory.
For optimal performance, proper sizing of Zoekt nodes is crucial. Sizing recommendations differ between Kubernetes and VM deployments due to how resources are allocated and managed.
The following table shows recommended resources per node (per StatefulSet pod) for Kubernetes deployments based on index storage requirements. Each pod in the StatefulSet runs its own webserver and indexer containers with independent resource allocations and its own persistent volume for index storage. If you run multiple nodes, multiply these resources by the number of nodes to calculate total cluster resources.
| Disk | Webserver CPU | Webserver memory | Indexer CPU | Indexer memory |
|---|---|---|---|---|
| 128 GB | 1 | 16 GiB | 1 | 6 GiB |
| 256 GB | 1.5 | 32 GiB | 1 | 8 GiB |
| 512 GB | 2 | 64 GiB | 1 | 12 GiB |
| 1 TB | 3 | 128 GiB | 1.5 | 24 GiB |
| 2 TB | 4 | 256 GiB | 2 | 32 GiB |
To manage resources more granularly, you can allocate CPU and memory separately to different containers.
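As a worked example of multiplying the per-pod recommendations by the number of nodes, the following sketch copies the Kubernetes table into a lookup and computes cluster totals. The constant and method names are hypothetical, and the node count in the usage line is illustrative.

```ruby
# Per-pod recommendations copied from the Kubernetes sizing table.
# CPU values are cores; memory values are GiB.
K8S_POD_SIZING = {
  '128 GB' => { webserver_cpu: 1,   webserver_memory_gib: 16,  indexer_cpu: 1,   indexer_memory_gib: 6 },
  '256 GB' => { webserver_cpu: 1.5, webserver_memory_gib: 32,  indexer_cpu: 1,   indexer_memory_gib: 8 },
  '512 GB' => { webserver_cpu: 2,   webserver_memory_gib: 64,  indexer_cpu: 1,   indexer_memory_gib: 12 },
  '1 TB'   => { webserver_cpu: 3,   webserver_memory_gib: 128, indexer_cpu: 1.5, indexer_memory_gib: 24 },
  '2 TB'   => { webserver_cpu: 4,   webserver_memory_gib: 256, indexer_cpu: 2,   indexer_memory_gib: 32 }
}.freeze

# Total cluster resources: per-pod values multiplied by the number of nodes.
def cluster_totals(disk, node_count)
  K8S_POD_SIZING.fetch(disk).transform_values { |value| value * node_count }
end

# Three nodes with 512 GB of index storage each:
cluster_totals('512 GB', 3)
# webserver: 6 CPU and 192 GiB; indexer: 3 CPU and 36 GiB
```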
For Kubernetes deployments:
pd-balanced on GCP, which balances performance and cost.
Equivalent options include gp3 on AWS and Premium_LRS on Azure.

The following table shows recommended resources per node for VM and bare metal deployments based on index storage requirements. If you run multiple nodes, multiply these resources by the number of nodes to calculate total cluster resources.
| Disk | VM size | Total CPU | Total memory | AWS | GCP | Azure |
|---|---|---|---|---|---|---|
| 128 GB | Small | 2 cores | 16 GB | r5.large | n1-highmem-2 | Standard_E2s_v3 |
| 256 GB | Medium | 4 cores | 32 GB | r5.xlarge | n1-highmem-4 | Standard_E4s_v3 |
| 512 GB | Large | 4 cores | 64 GB | r5.2xlarge | n1-highmem-8 | Standard_E8s_v3 |
| 1 TB | X-Large | 8 cores | 128 GB | r5.4xlarge | n1-highmem-16 | Standard_E16s_v3 |
| 2 TB | 2X-Large | 16 cores | 256 GB | r5.8xlarge | n1-highmem-32 | Standard_E32s_v3 |
You can allocate these resources only to the entire node.
For VM and bare metal deployments:
Zoekt storage requirements depend on the size of your Git repositories and your replica configuration. Zoekt indexes only Git object data (source code and commit history). It does not index LFS files, CI/CD artifacts, packages, wikis, or other storage components.
To estimate storage requirements, run the Rake task:
sudo gitlab-rake gitlab:zoekt:estimate_storage
This task queries your GitLab database and outputs a storage estimate based on your current repository sizes and replica configuration.
If you prefer to calculate manually, use:
storage_per_replica = sum(repository_git_size) × buffer_factor
total_cluster_storage = storage_per_replica × number_of_replicas
In these formulas, repository_git_size is the Git object size of each repository.
This value does not include LFS objects, wikis, artifacts, or packages.
The buffer_factor is the headroom reserved during initial indexing.
You can read it from Search::Zoekt::Index.global_buffer_factor, which is 3 by default in most cases.
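The two formulas can be worked through in a few lines of Ruby. This is a minimal sketch: the method name and the input sizes are hypothetical, and the buffer factor of 3 is the usual default described here, not a value read from a real instance.

```ruby
# Manual storage estimate, following the two formulas above.
# repository_git_sizes: Git object sizes, all in the same unit (for example, GiB).
# buffer_factor: assumed to be 3, the usual default described in the text.
def zoekt_storage_estimate(repository_git_sizes, replicas: 1, buffer_factor: 3)
  storage_per_replica = repository_git_sizes.sum * buffer_factor
  {
    storage_per_replica: storage_per_replica,
    total_cluster_storage: storage_per_replica * replicas
  }
end

# Hypothetical input: repositories of 10, 25, and 65 GiB with two replicas.
estimate = zoekt_storage_estimate([10, 25, 65], replicas: 2)
estimate[:storage_per_replica]   # => 300
estimate[:total_cluster_storage] # => 600
```

The Rake task remains the more reliable source, because it reads actual repository sizes and the buffer factor in use.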
To view repository_git_size:
For the initial provisioning target, start with three times
your total repository_git_size multiplied by replica count.
For example:
GitLab reserves this buffer internally to ensure Zoekt has headroom during indexing.
After initial indexing is complete, actual disk usage is typically closer to
half the repository_git_size based on observed GitLab.com data.
Scale vertically or horizontally only when needed.
You can view the current buffer factor in use by running:
sudo gitlab-rake gitlab:zoekt:info
The output includes a Storage buffer factor line showing the value the planner
is currently using and whether it is dynamic or the static fallback.
To monitor Zoekt node storage, see check indexing status. If namespaces are not indexed due to low disk space, add nodes or increase disk capacity.
Zoekt implements a multi-layered authentication system to secure communication between GitLab, Zoekt indexer, and Zoekt webserver components. Authentication is enforced across all communication channels.
All authentication methods use the GitLab Shell secret.
Failed authentication attempts return 401 Unauthorized responses.
The Zoekt indexer authenticates to GitLab with JSON web tokens (JWT) to retrieve indexing tasks and send completion callbacks.
This method uses .gitlab_shell_secret for signing and verification.
Tokens are sent in the Gitlab-Shell-Api-Request header.
Endpoints include:
- GET /internal/search/zoekt/:uuid/heartbeat for task retrieval
- POST /internal/search/zoekt/:uuid/callback for status updates

This method ensures secure polling for task distribution and status reporting between Zoekt indexer nodes and GitLab.
{{< history >}}
{{< /history >}}
GitLab authenticates to the Zoekt webserver with JSON web tokens (JWT) to execute search queries. These tokens provide time-limited, cryptographically signed authentication consistent with other GitLab authentication patterns.
This method uses Gitlab::Shell.secret_token and the HS256 algorithm (HMAC with SHA-256).
Tokens are sent in the Authorization: Bearer <jwt_token> header
and expire in five minutes to limit exposure.
Endpoints include /webserver/api/search and /webserver/api/v2/search.
JWT claims are the issuer (gitlab) and the audience (gitlab-zoekt).
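To show what such a token looks like, the following sketch builds and verifies an HS256 JWT with only the Ruby standard library. It is not the GitLab or Zoekt implementation. The method names are hypothetical, the secret is a placeholder for the GitLab Shell secret, and the iat claim and the exact payload layout are assumptions beyond the issuer, audience, and five-minute expiry named here.

```ruby
require 'openssl'
require 'base64'
require 'json'

# Base64url encoding without padding, as used by JWT.
def base64url(data)
  Base64.urlsafe_encode64(data, padding: false)
end

# HMAC with SHA-256 over the signing input, base64url-encoded.
def hs256_signature(secret, signing_input)
  base64url(OpenSSL::HMAC.digest(OpenSSL::Digest.new('SHA256'), secret, signing_input))
end

# Builds an HS256 token with the issuer and audience from the text
# and a five-minute lifetime. The iat claim is an assumption.
def build_search_jwt(secret, now: Time.now)
  header  = { alg: 'HS256', typ: 'JWT' }
  payload = { iss: 'gitlab', aud: 'gitlab-zoekt', iat: now.to_i, exp: now.to_i + 300 }

  signing_input = [header, payload].map { |part| base64url(JSON.generate(part)) }.join('.')
  "#{signing_input}.#{hs256_signature(secret, signing_input)}"
end

# Returns the payload if the signature matches and the token has not expired.
# Returns nil otherwise, which a server would answer with 401 Unauthorized.
def verify_search_jwt(token, secret, now: Time.now)
  signing_input, _, signature = token.rpartition('.')
  return nil unless OpenSSL.secure_compare(hs256_signature(secret, signing_input), signature)

  payload = JSON.parse(Base64.urlsafe_decode64(signing_input.split('.').last))
  payload['exp'] > now.to_i ? payload : nil
end
```

A client would send the result as Authorization: Bearer followed by the token. OpenSSL.secure_compare requires Ruby 2.7 or later.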