Zoekt

Tier: Premium, Ultimate
Offering: GitLab.com, GitLab Self-Managed
Status: Limited availability

Introduced as a beta in GitLab 15.9 with flags named index_code_with_zoekt and search_code_with_zoekt. Disabled by default.
Enabled on GitLab.com and GitLab Self-Managed in GitLab 16.6.
Global code search introduced in GitLab 16.11 with a flag named zoekt_cross_namespace_search. Disabled by default.
Feature flags index_code_with_zoekt and search_code_with_zoekt removed in GitLab 17.1.
Feature flag zoekt_rollout_worker added in GitLab 17.9. Disabled by default.
Changed from beta to limited availability in GitLab 18.6.
Feature flags zoekt_cross_namespace_search and zoekt_rollout_worker removed in GitLab 18.7.

[!warning] This feature is in limited availability. For more information, see epic 9404. Provide feedback in issue 420920.

Zoekt is an open-source search engine designed specifically to search for code.

With this integration, you can use exact code search instead of advanced search to search for code in GitLab. You can use exact match and regular expression modes to search for code in a group or repository.

[!note] Zoekt handles only code search and does not replace Elasticsearch or OpenSearch. For all other search scopes, including comments, commits, epics, issues, merge requests, milestones, projects, users, and wikis, Elasticsearch or OpenSearch is still required.

Install Zoekt

Prerequisites:

Be an administrator of the instance.

To enable exact code search in GitLab, you must have at least one Zoekt node connected to the instance. The following installation methods are supported for Zoekt:

Zoekt chart (as a standalone chart or subchart of the GitLab Helm chart)
GitLab Operator (with gitlab-zoekt.install=true)

The following installation methods are available for testing, not for production use:

Enable exact code search

From the GitLab UI

Prerequisites:

Be an administrator of the instance.
Zoekt is installed.

To enable exact code search from the GitLab UI:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
Select the Enable indexing and Enable searching checkboxes.
Select Save changes.

With Rake tasks

Introduced in GitLab 18.10.

Prerequisites:

Be an administrator of the instance.
Zoekt is installed.

You can manage exact code search with Rake tasks.

Enable indexing and search

To enable indexing and search, run this task:

shell

gitlab-rake gitlab:zoekt:index

This task enables zoekt_indexing_enabled, zoekt_search_enabled, and zoekt_auto_index_root_namespace. RolloutWorker indexes all root namespaces automatically, and search becomes available when indices are ready.

Disable indexing and search

To disable indexing and search, run this task:

shell

gitlab-rake gitlab:zoekt:disable

This task disables both zoekt_indexing_enabled and zoekt_search_enabled.

Pause and resume indexing

To pause indexing (for example, during maintenance), run this task:

shell

gitlab-rake gitlab:zoekt:pause_indexing

To resume indexing, run this task:

shell

gitlab-rake gitlab:zoekt:resume_indexing

Estimate storage requirements

To estimate the storage required for your Zoekt nodes, run this task:

shell

sudo gitlab-rake gitlab:zoekt:estimate_storage

For more information, see estimate storage.

Check indexing status

Stopping indexing when Zoekt node storage exceeds the critical watermark introduced in GitLab 17.7 with a flag named zoekt_critical_watermark_stop_indexing. Disabled by default.
Enabled on GitLab.com, GitLab Self-Managed, and GitLab Dedicated in GitLab 18.0.
Generally available in GitLab 18.1. Feature flag zoekt_critical_watermark_stop_indexing removed.

Prerequisites:

You must have administrator access to the instance.

Indexing performance depends on the CPU and memory limits on the Zoekt indexer nodes. To check indexing status:

Run this Rake task:

shell

gitlab-rake gitlab:zoekt:info

To have the data refresh automatically every 10 seconds, run this task instead:

shell

gitlab-rake "gitlab:zoekt:info[10]"

In a Rails console, run these commands:

ruby

Search::Zoekt::Index.group(:state).count
Search::Zoekt::Repository.group(:state).count
Search::Zoekt::Task.group(:state).count

Sample output

The gitlab:zoekt:info Rake task returns an output similar to the following:

console

Exact Code Search
GitLab version:                                      18.9.0
Enable indexing:                                     yes
Enable searching:                                    yes
Pause indexing:                                      no
Index root namespaces automatically:                 yes
Cache search results for five minutes:               yes
Indexing CPU to tasks multiplier:                    1.0
Probability of random force reindexing (percentage): 0.25
Number of parallel processes per indexing task:      1
Number of namespaces per indexing rollout:           32
Offline nodes automatically deleted after:           20m
Indexing timeout per project:                        30m
Maximum number of files per project to be indexed:   500000
Maximum file size for indexing:                      1MB
Maximum trigrams per file:                           20000
Retry interval for failed namespaces:                1d
Number of replicas per namespace:                    1

Nodes
# Number of Zoekt nodes and their status
Node count:                   2 (online: 2, offline: 0)
Last seen at:                 2025-11-21 22:58:09 UTC (less than a minute ago)
Max schema_version:           2531
Storage reserved / usable:    71.1 MiB / 124 GiB (0.06%)
Storage indexed / reserved:   42.7 MiB / 71.1 MiB (60.0%)
Storage used / total:         797 GiB / 921 GiB (86.54%)
Online node watermark levels: 2
  - low: 2

Indexing status
Group count:                      8
# Number of enabled namespaces and their status
EnabledNamespace count:           8 (without indices: 0, rollout blocked: 0, with search disabled: 0)
Replicas count:                   8
  - ready: 8
Indices count:                    8
  - ready: 8
Indices watermark levels:         8
  - healthy: 8
Repositories count:               10
  - ready: 10
Tasks count:                      10
  - done: 10
Tasks pending/processing by type: (none)
Storage buffer factor:            0.831× [static fallback (FF disabled)]

Feature Flags (Default Values)
- zoekt_too_many_replicas_event: disabled

Node Details
Node 1 - test-zoekt-hostname-1:
  Status:                       Online
  Last seen at:                 2025-11-21 22:58:09 UTC (less than a minute ago)
  Disk utilization:             86.54%
  Unclaimed storage:            62 GiB
  # Zoekt build version on the node. Must match GitLab version.
  Zoekt version:                2025.11.20-v1.7.6-28-gb9a0fd8
  Schema version:               2531
Node 2 - test-zoekt-hostname-2:
  Status:                       Online
  Last seen at:                 2025-11-21 22:58:09 UTC (less than a minute ago)
  Disk utilization:             86.54%
  Unclaimed storage:            62 GiB
  Zoekt version:                2025.11.20-v1.7.6-28-gb9a0fd8
  Schema version:               2531

Run a health check

Introduced in GitLab 18.4.

Prerequisites:

You must have administrator access to the instance.

Run a health check to understand the status of your Zoekt infrastructure, including:

Online and offline nodes
Indexing and search settings
Search API endpoints
JSON web token generation

To run a health check, execute the following task:

shell

gitlab-rake gitlab:zoekt:health

This task provides:

The overall status: HEALTHY, DEGRADED, or UNHEALTHY
Recommendations for resolving detected issues
Exit codes for automation and monitoring integrations: 0=healthy, 1=degraded, or 2=unhealthy
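For monitoring integrations, the exit codes listed above can be mapped to status labels. The following is a minimal sketch, not part of GitLab: the helper names `zoekt_health_status` and `run_zoekt_health_check` are hypothetical, and the sketch assumes `gitlab-rake` is on the `PATH` of the user running it.

```ruby
# Maps the documented exit codes of gitlab:zoekt:health to status labels.
ZOEKT_HEALTH_STATUSES = {
  0 => "HEALTHY",
  1 => "DEGRADED",
  2 => "UNHEALTHY"
}.freeze

# Returns the status label for an exit code, or "UNKNOWN" for any other value.
def zoekt_health_status(exit_code)
  ZOEKT_HEALTH_STATUSES.fetch(exit_code, "UNKNOWN")
end

# Hypothetical wrapper: runs the health check and returns its status label.
# Assumes gitlab-rake is available on the PATH.
def run_zoekt_health_check
  system("gitlab-rake", "gitlab:zoekt:health")
  status = $?
  zoekt_health_status(status&.exitstatus)
end
```

A monitoring job could call `run_zoekt_health_check` and alert on any label other than `HEALTHY`.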

Run checks automatically

To run health checks automatically every 10 seconds, execute the following task:

shell

gitlab-rake "gitlab:zoekt:health[10]"

The output includes colored status indicators and shows:

Online and offline node counts, storage usage warnings, and connectivity issues
Core settings validation and namespace and repository indexing statuses
The overall status including a combined health assessment: HEALTHY, DEGRADED, or UNHEALTHY
Recommendations for resolving issues

Perform force reindexing

Introduced in GitLab 18.10.

Prerequisites:

You must have administrator access to the instance.

You can force reindex a range of projects.

Run this Rake task:

shell

gitlab-rake gitlab:zoekt:reindex_projects ID_FROM=10 ID_TO=20

With the ID_FROM and ID_TO environment variables, you can force reindex a limited range of projects. To reindex a single project, set both ID_FROM and ID_TO to the ID of that project. To reindex all projects, omit both environment variables.
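The three cases (a range, a single project, or all projects) can be sketched as a small helper that builds the environment variables for the task. The helper name `reindex_env` is hypothetical and only illustrates how ID_FROM and ID_TO are combined.

```ruby
# Builds the environment variables for gitlab:zoekt:reindex_projects.
# Pass a Range of project IDs, a single project ID, or nil for all projects.
def reindex_env(project_ids = nil)
  case project_ids
  when nil
    {} # No variables: every project is reindexed.
  when Integer
    { "ID_FROM" => project_ids.to_s, "ID_TO" => project_ids.to_s }
  when Range
    { "ID_FROM" => project_ids.min.to_s, "ID_TO" => project_ids.max.to_s }
  else
    raise ArgumentError, "expected nil, an Integer, or a Range"
  end
end

# Example invocation (not executed here; assumes gitlab-rake is on the PATH):
# system(reindex_env(10..20), "gitlab-rake", "gitlab:zoekt:reindex_projects")
```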

Pause indexing

Prerequisites:

You must have administrator access to the instance.

To pause indexing for exact code search:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
Select the Pause indexing checkbox.
Select Save changes.

When you pause indexing for exact code search, all changes in your repository are queued. To resume indexing, clear the Pause indexing checkbox.

Index root namespaces automatically

Introduced in GitLab 17.1.

Prerequisites:

You must have administrator access to the instance.

You can index both existing and new root namespaces automatically. To index all root namespaces automatically:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
Select the Index root namespaces automatically checkbox.
Select Save changes.

When you enable this setting, GitLab creates indexing tasks for all projects in:

All groups and subgroups
Any new root namespace

After a project is indexed, GitLab performs only incremental indexing when a repository change is detected.

When you disable this setting:

Existing root namespaces remain indexed.
New root namespaces are no longer indexed.

Cache search results

Introduced in GitLab 18.0.

Prerequisites:

You must have administrator access to the instance.

You can cache search results for better performance. This feature is enabled by default and caches results for five minutes.

To cache search results:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
Select the Cache search results for five minutes checkbox.
Select Save changes.

Set concurrent indexing tasks

Introduced in GitLab 17.4.

Prerequisites:

You must have administrator access to the instance.

You can set the number of concurrent indexing tasks for a Zoekt node relative to its CPU capacity.

A higher multiplier means more tasks can run concurrently, which improves indexing throughput at the cost of increased CPU usage. The default value is 1.0 (one task per CPU core).

You can adjust this value based on the node's performance and workload. To set the number of concurrent indexing tasks:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Indexing CPU to tasks multiplier text box, enter a value.

For example, if a Zoekt node has 4 CPU cores and the multiplier is 1.5, the number of concurrent tasks for the node is 6.
Select Save changes.
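The arithmetic in the example above can be written as a one-line function. This is a sketch only: the function name is hypothetical, and because this page does not say how a fractional product is rounded, the sketch returns the raw product.

```ruby
# Concurrent indexing tasks for a node: CPU cores multiplied by the
# Indexing CPU to tasks multiplier. Rounding of fractional results is
# not specified on this page, so the raw product is returned.
def concurrent_indexing_tasks(cpu_cores, multiplier)
  cpu_cores * multiplier
end

concurrent_indexing_tasks(4, 1.5) # => 6.0
concurrent_indexing_tasks(4, 1.0) # => 4.0 (the default: one task per core)
```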

Define the probability of random force reindexing

Introduced in GitLab 18.9.

Prerequisites:

You must have administrator access to the instance.

You can define the probability that a project is force reindexed instead of incrementally indexed. The default value is 0.25 (0.25%).

Force reindexing helps prevent memory map (mmap) handlers from running out by periodically rebuilding indices from scratch. A higher percentage increases indexing load, especially for very large repositories.

To define the probability of random force reindexing:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Probability of random force reindexing (percentage) text box, enter a number between 0 and 100.
Select Save changes.
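To make the percentage concrete, the following sketch shows one way a per-run random decision could work. It is an illustration under stated assumptions, not the GitLab implementation; the method name `force_reindex?` and its `rng` parameter are hypothetical.

```ruby
# Decides whether a single indexing run is a force reindex.
# percentage is the configured value from 0 to 100 (0.25 means 0.25%).
# rng is injectable so the decision can be reproduced in tests.
def force_reindex?(percentage, rng: Random.new)
  # rng.rand returns a float in [0, 1), so 0 never triggers and 100 always does.
  rng.rand * 100 < percentage
end
```

Under this model, the default of 0.25 means that on average one in 400 indexing runs rebuilds the index from scratch.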

Set the number of parallel processes per indexing task

Introduced in GitLab 18.1.

Prerequisites:

You must have administrator access to the instance.

You can set the number of parallel processes per indexing task.

A higher number reduces indexing time at the cost of increased CPU and memory usage. The default value is 1 (one process per indexing task).

You can adjust this value based on the node's performance and workload. To set the number of parallel processes per indexing task:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Number of parallel processes per indexing task text box, enter a value.
Select Save changes.

Set the number of namespaces per indexing rollout

Introduced in GitLab 18.0.

Prerequisites:

You must have administrator access to the instance.

You can set the number of namespaces per RolloutWorker job for initial indexing. The default value is 32. You can adjust this value based on the node's performance and workload.

To set the number of namespaces per indexing rollout:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Number of namespaces per indexing rollout text box, enter a number greater than zero.
Select Save changes.

Define when offline nodes are automatically deleted

Introduced in GitLab 17.5.
Delete offline nodes after 12 hours checkbox updated to Offline nodes automatically deleted after text box in GitLab 18.1.

Prerequisites:

You must have administrator access to the instance.

You can delete offline Zoekt nodes automatically after a specific period of time along with their related indices, repositories, and tasks. The default value is 12h (12 hours).

Use this setting to manage your Zoekt infrastructure and prevent orphaned resources. To define when offline nodes are automatically deleted:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Offline nodes automatically deleted after text box, enter a value (for example, 30m (30 minutes), 2h (two hours), or 1d (one day)). To disable automatic deletion, set the value to 0.
Select Save changes.
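The duration format used in this and similar settings can be illustrated with a small converter. This sketch is not the parser GitLab uses: the method name is hypothetical, and it accepts only the units shown in the examples (`m`, `h`, and `d`) plus the special value `0`.

```ruby
# Seconds per unit for the duration suffixes shown in the examples above.
DURATION_UNIT_SECONDS = { "m" => 60, "h" => 3_600, "d" => 86_400 }.freeze

# Converts a value such as "30m", "2h", or "1d" to seconds.
# "0" is returned as 0, which this page describes as disabling the setting.
def duration_to_seconds(value)
  text = value.to_s.strip
  return 0 if text == "0"

  match = /\A(\d+)([mhd])\z/.match(text)
  raise ArgumentError, "unsupported duration: #{value.inspect}" unless match

  match[1].to_i * DURATION_UNIT_SECONDS.fetch(match[2])
end

duration_to_seconds("12h") # => 43200 (the default for this setting)
```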

Define the indexing timeout for a project

Introduced in GitLab 18.2.

Prerequisites:

You must have administrator access to the instance.

You can define the indexing timeout for a project. The default value is 30m (30 minutes).

To define the indexing timeout for a project:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Indexing timeout per project text box, enter a value (for example, 30m (30 minutes), 2h (two hours), or 1d (one day)).
Select Save changes.

Set the maximum number of files in a project to be indexed

Introduced in GitLab 18.2.

Prerequisites:

You must have administrator access to the instance.

You can set the maximum number of files in a project that can be indexed. Projects with more files than this limit in the default branch are not indexed.

The default value is 500,000.

You can adjust this value based on the node's performance and workload. To set the maximum number of files in a project to be indexed:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Maximum number of files per project to be indexed text box, enter a number greater than zero.
Select Save changes.

Set maximum file size for indexing

Introduced in GitLab 18.7.

Prerequisites:

You must have administrator access to the instance.

You can set the maximum size for a file to be indexed. The default value is 1MB.

Only filenames are indexed for files that exceed the specified size. You can search these files only by filename. To set maximum file size for indexing:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Maximum file size for indexing text box, enter a value (for example, 512B, 50KB, 2MB, or 1GB). The value can also be in lowercase.
Select Save changes.

Set the maximum trigram count for indexing

Introduced in GitLab 18.8.

Prerequisites:

You must have administrator access to the instance.

You can set the maximum number of trigrams for a file to be indexed. The default value is 20,000.

Trigrams are three-character sequences that Zoekt uses for efficient code search. For files that exceed this trigram limit, only filenames are indexed. A higher limit affects both indexing and search performance.

To set the maximum trigram count for indexing:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Maximum trigrams per file text box, enter a number greater than zero.
Select Save changes.
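To illustrate what a trigram is, the following sketch splits a string into its three-character sequences. It is not Zoekt's indexer, and the method name is hypothetical. This page does not say whether the limit counts distinct or total trigrams, so the example shows both counts.

```ruby
# Returns every three-character window of the text, in order.
def trigrams(text)
  text.each_char.each_cons(3).map(&:join)
end

trigrams("zoekt")            # => ["zoe", "oek", "ekt"]
trigrams("aaaa").size        # => 2 (total windows)
trigrams("aaaa").uniq.size   # => 1 (distinct trigrams)
```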

Define the retry interval for failed namespaces

Introduced in GitLab 17.10.

Prerequisites:

You must have administrator access to the instance.

You can define the retry interval for namespaces that previously failed. The default value is 1d (one day). A value of 0 means failed namespaces never retry.

To define the retry interval for failed namespaces:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Retry interval for failed namespaces text box, enter a value (for example, 30m (30 minutes), 2h (two hours), or 1d (one day)).
Select Save changes.

Set the number of replicas per namespace

Introduced in GitLab 18.7.

Prerequisites:

You must have administrator access to the instance.

You can set the number of replicas per namespace. The default value is 1 (one replica per namespace).

Increasing the number of replicas per namespace improves search availability by distributing the load across multiple Zoekt nodes. More replicas increase storage requirements.

To set the number of replicas per namespace:

In the upper-right corner, select Admin.
Select Settings > Search.
Expand Exact code search.
In the Number of replicas per namespace text box, enter a number greater than zero.
Select Save changes.

Run Zoekt on a separate server

Authentication for Zoekt introduced in GitLab 16.3.

Prerequisites:

Be an administrator of the instance.

To run Zoekt on a different server than GitLab:

Sizing recommendations

The following recommendations might be over-provisioned for some deployments. You should monitor your deployment to ensure:

No out-of-memory events occur.
CPU throttling is not excessive.
Indexing performance meets your requirements.

Adjust resources based on your specific workload characteristics, including:

Repository size and complexity
Number of active developers
Frequency of code changes
Indexing patterns

Memory architecture

The webserver and indexer have different memory usage patterns.

The webserver memory-maps index shards from disk into virtual memory. The operating system pages shard data in and out of physical memory as searches are served. Resident memory usage grows with the active working set. Nodes with larger indices or higher query volume require more webserver memory to avoid page thrashing and out-of-memory conditions.

The indexer processes Git object data in memory when it builds or rebuilds indices. Memory usage spikes when indexing large repositories or when multiple tasks run in parallel. You can control peak indexer memory by adjusting the number of parallel processes per indexing task and the indexing CPU to tasks multiplier.

On VM and bare metal deployments, the webserver and indexer share the same system memory.

Nodes

For optimal performance, proper sizing of Zoekt nodes is crucial. Sizing recommendations differ between Kubernetes and VM deployments due to how resources are allocated and managed.

Kubernetes deployments

The following table shows recommended resources per node (per StatefulSet pod) for Kubernetes deployments based on index storage requirements. Each pod in the StatefulSet runs its own webserver and indexer containers with independent resource allocations and its own persistent volume for index storage. If you run multiple nodes, multiply these resources by the number of nodes to calculate total cluster resources.

Disk	Webserver CPU	Webserver memory	Indexer CPU	Indexer memory
128 GB	1	16 GiB	1	6 GiB
256 GB	1.5	32 GiB	1	8 GiB
512 GB	2	64 GiB	1	12 GiB
1 TB	3	128 GiB	1.5	24 GiB
2 TB	4	256 GiB	2	32 GiB

To manage resources more granularly, you can allocate CPU and memory separately to different containers.

For Kubernetes deployments:

Do not set CPU limits for Zoekt containers. CPU limits might cause unnecessary throttling during indexing bursts, which can significantly degrade performance. Instead, rely on resource requests to guarantee minimum CPU availability and let containers use additional CPU when it is available and needed.
Set appropriate memory limits to prevent resource contention and out-of-memory conditions.
Use high-performance storage classes for better indexing performance. GitLab.com uses pd-balanced on GCP, which balances performance and cost. Equivalent options include gp3 on AWS and Premium_LRS on Azure.
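The Kubernetes sizing table above can be encoded as a lookup to work out total cluster resources for several nodes. This is a sketch for planning purposes only; the constant and method names are hypothetical, and the values are copied from the table.

```ruby
# Recommended per-node resources for Kubernetes, keyed by disk size per node.
K8S_NODE_SIZING = {
  "128 GB" => { webserver_cpu: 1,   webserver_memory_gib: 16,  indexer_cpu: 1,   indexer_memory_gib: 6 },
  "256 GB" => { webserver_cpu: 1.5, webserver_memory_gib: 32,  indexer_cpu: 1,   indexer_memory_gib: 8 },
  "512 GB" => { webserver_cpu: 2,   webserver_memory_gib: 64,  indexer_cpu: 1,   indexer_memory_gib: 12 },
  "1 TB"   => { webserver_cpu: 3,   webserver_memory_gib: 128, indexer_cpu: 1.5, indexer_memory_gib: 24 },
  "2 TB"   => { webserver_cpu: 4,   webserver_memory_gib: 256, indexer_cpu: 2,   indexer_memory_gib: 32 }
}.freeze

# Multiplies the per-node recommendation by the number of nodes.
def cluster_resources(disk_per_node, node_count)
  K8S_NODE_SIZING.fetch(disk_per_node).transform_values { |amount| amount * node_count }
end

cluster_resources("512 GB", 3)
# webserver_cpu: 6, webserver_memory_gib: 192, indexer_cpu: 3, indexer_memory_gib: 36
```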

VM and bare metal deployments

The following table shows recommended resources per node for VM and bare metal deployments based on index storage requirements. If you run multiple nodes, multiply these resources by the number of nodes to calculate total cluster resources.

Disk	VM size	Total CPU	Total memory	AWS	GCP	Azure
128 GB	Small	2 cores	16 GB	`r5.large`	`n1-highmem-2`	`Standard_E2s_v3`
256 GB	Medium	4 cores	32 GB	`r5.xlarge`	`n1-highmem-4`	`Standard_E4s_v3`
512 GB	Large	4 cores	64 GB	`r5.2xlarge`	`n1-highmem-8`	`Standard_E8s_v3`
1 TB	X-Large	8 cores	128 GB	`r5.4xlarge`	`n1-highmem-16`	`Standard_E16s_v3`
2 TB	2X-Large	16 cores	256 GB	`r5.8xlarge`	`n1-highmem-32`	`Standard_E32s_v3`

You can allocate these resources only to the entire node.

For VM and bare metal deployments:

Monitor CPU, memory, and disk usage to identify bottlenecks.
Consider using SSD storage for better indexing performance.
Ensure adequate network bandwidth for data transfer between GitLab and Zoekt nodes.

Storage

Zoekt storage requirements depend on the size of your Git repositories and your replica configuration. Zoekt indexes only Git object data (source code and commit history). It does not index LFS files, CI/CD artifacts, packages, wikis, or other storage components.

Estimate storage

To estimate storage requirements, run the Rake task:

shell

sudo gitlab-rake gitlab:zoekt:estimate_storage

This task queries your GitLab database and outputs a storage estimate based on your current repository sizes and replica configuration.

If you prefer to calculate manually, use:

plaintext

storage_per_replica = sum(repository_git_size) × buffer_factor
total_cluster_storage = storage_per_replica × number_of_replicas

Where repository_git_size is the Git object size of each repository. This value does not include LFS objects, wikis, artifacts, or packages. buffer_factor is the headroom reserved during initial indexing. You can read the current value from Search::Zoekt::Index.global_buffer_factor, which is 3 by default in most cases.

To view repository_git_size:

In the upper-right corner, select Admin.
Select Overview > Projects.
In the Repository column, view the Git object size.

For the initial provisioning target, start with three times your total repository_git_size multiplied by replica count. For example:

100 GB of Git repository data and one replica: 300 GB of Zoekt storage.
100 GB of Git repository data and two replicas: 600 GB of Zoekt storage.
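The manual calculation can be sketched as a short function. The function name is hypothetical, sizes are in whichever unit you supply (GB in this example), and the default `buffer_factor` of 3 follows the provisioning guidance above rather than a value read from your instance.

```ruby
# Estimates Zoekt storage from per-repository Git object sizes.
#   storage_per_replica   = sum(repository_git_size) * buffer_factor
#   total_cluster_storage = storage_per_replica * number_of_replicas
def zoekt_storage_estimate(repository_git_sizes, buffer_factor: 3, replicas: 1)
  storage_per_replica = repository_git_sizes.sum * buffer_factor
  storage_per_replica * replicas
end

# Two repositories totalling 100 GB of Git data:
zoekt_storage_estimate([60, 40])               # => 300
zoekt_storage_estimate([60, 40], replicas: 2)  # => 600
```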

GitLab reserves this buffer internally to ensure Zoekt has headroom during indexing. After initial indexing is complete, actual disk usage is typically closer to half the repository_git_size based on observed GitLab.com data. Scale vertically or horizontally only when needed.

You can view the current buffer factor in use by running:

shell

sudo gitlab-rake gitlab:zoekt:info

The output includes a Storage buffer factor line showing the value the planner is currently using and whether it is dynamic or the static fallback.

To monitor Zoekt node storage, see check indexing status. If namespaces are not indexed due to low disk space, add nodes or increase disk capacity.

Security and authentication

Zoekt implements a multi-layered authentication system to secure communication between GitLab, Zoekt indexer, and Zoekt webserver components. Authentication is enforced across all communication channels.

All authentication methods use the GitLab Shell secret. Failed authentication attempts return 401 Unauthorized responses.

Zoekt indexer to GitLab

The Zoekt indexer authenticates to GitLab with JSON web tokens (JWT) to retrieve indexing tasks and send completion callbacks.

This method uses .gitlab_shell_secret for signing and verification. Tokens are sent in the Gitlab-Shell-Api-Request header. Endpoints include:

GET /internal/search/zoekt/:uuid/heartbeat for task retrieval
POST /internal/search/zoekt/:uuid/callback for status updates

This method ensures secure polling for task distribution and status reporting between Zoekt indexer nodes and GitLab.

GitLab to the Zoekt webserver

JWT authentication

JWT authentication introduced in GitLab Zoekt 1.0.0.

GitLab authenticates to the Zoekt webserver with JSON web tokens (JWT) to execute search queries. These tokens provide time-limited, cryptographically signed authentication consistent with other GitLab authentication patterns.

This method uses Gitlab::Shell.secret_token and the HS256 algorithm (HMAC with SHA-256). Tokens are sent in the Authorization: Bearer <jwt_token> header and expire in five minutes to limit exposure.

Endpoints include /webserver/api/search and /webserver/api/v2/search. JWT claims are the issuer (gitlab) and the audience (gitlab-zoekt).
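The token described above can be illustrated with a minimal HS256 sketch that uses only the Ruby standard library. This is not the GitLab implementation: the method names are hypothetical, the secret is a placeholder for the GitLab Shell secret, and only the claims named on this page (issuer, audience, and a five-minute expiry) are included.

```ruby
require "json"
require "openssl"

# Base64url encoding without padding, as used by JWT segments.
def base64url(data)
  [data].pack("m0").tr("+/", "-_").delete("=")
end

# Builds an HS256-signed JWT with the claims described on this page.
# secret is a placeholder for the GitLab Shell secret.
def build_zoekt_jwt(secret, now: Time.now)
  header = { alg: "HS256", typ: "JWT" }
  claims = { iss: "gitlab", aud: "gitlab-zoekt", exp: now.to_i + 300 }

  signing_input = [header, claims].map { |part| base64url(JSON.generate(part)) }.join(".")
  signature = OpenSSL::HMAC.digest("SHA256", secret, signing_input)

  "#{signing_input}.#{base64url(signature)}"
end

# The token would be sent as: Authorization: Bearer <jwt_token>
token = build_zoekt_jwt("placeholder-secret")
```

Because the signature is an HMAC over the header and claims, the receiving side can verify the token by recomputing it with the same shared secret and rejecting tokens whose `exp` has passed.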