{{< details >}}
{{< /details >}}
GitLab stores repositories on repository storage. Repository storage is either:

- Physical storage configured with a `gitaly_address` that points to a Gitaly node.
- Virtual storage that stores repositories on Gitaly Cluster (Praefect).

> [!warning]
> Repository storage could be configured as a `path` that points directly to the directory where the repositories are stored. GitLab directly accessing a directory containing repositories is deprecated. You should configure GitLab to access repositories through a physical or virtual storage.
For more information on:
Hashed storage stores projects on disk in a location based on a hash of the project's ID. This makes the folder structure immutable and eliminates the need to synchronize state from URLs to disk structure. This means that renaming a group, user, or project:

- Costs only the database transaction.
- Takes effect immediately.
The hash also helps spread the repositories more evenly on the disk. The top-level directory contains fewer folders than the total number of top-level namespaces.
The hash format is based on the hexadecimal representation of a SHA256, calculated with
`SHA256(project.id)`. The top-level folder uses the first two characters, followed by another folder
with the next two characters. They are both stored in a special `@hashed` folder so they can
co-exist with existing legacy storage projects. For example:
```ruby
# Project's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"

# Wiki's repository:
"@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
```
Troubleshooting problems with the Git repositories, adding hooks, and other tasks requires you to translate between the human-readable project name and the hashed storage path. You can translate from a project's name to its hashed path, and from a hashed path to a project's name.
{{< history >}}
{{< /history >}}
Administrators can look up a project's hashed path from its name or ID using the Admin area or a Rails console.
To look up a project's hash path in the Admin area:

1. In the upper-right corner, select Admin.
2. Select Overview > Projects and select the project.
3. Locate the Relative path field. The value is similar to:

   ```plaintext
   "@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
   ```
To look up a project's hash path using a Rails console:

1. Start a Rails console.
2. Run a command similar to this example (use either the project's ID or its full path):

   ```ruby
   Project.find(16).disk_path

   Project.find_by_full_path('group/project').disk_path
   ```
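As a cross-check outside GitLab, the hashed path can also be derived from the project ID alone. The following minimal Ruby sketch assumes that `SHA256(project.id)` means the SHA256 hex digest of the ID written as a decimal string; `hashed_repo_path` is a hypothetical helper, not a GitLab API.

```ruby
require "digest"

# Hypothetical helper, not part of GitLab. Assumes the hash input is the
# project ID written as a decimal string (for example, "16").
def hashed_repo_path(project_id, wiki: false)
  hash = Digest::SHA256.hexdigest(project_id.to_s)
  suffix = wiki ? ".wiki.git" : ".git"
  # First two hex characters, then the next two, then the full hash.
  "@hashed/#{hash[0..1]}/#{hash[2..3]}/#{hash}#{suffix}"
end

puts hashed_repo_path(16)
puts hashed_repo_path(16, wiki: true)
```

Compare the first line of output with the Relative path value shown for the same project in the Admin area. If they differ, the assumption about the hash input does not hold for your instance.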
Administrators can look up a project's name from its hashed relative path using:

- A Rails console.
- The `config` file in the `*.git` directory.

To look up a project's name using the Rails console:
1. Start a Rails console.
2. Run a command similar to this example:

   ```ruby
   ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project
   ```

The quoted string in that command is the repository's path relative to the repository storage directory on your GitLab server, with `.git` removed from the end of the directory name. For example, on a default Linux package installation the repository directory would be `/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`.

The output includes the project ID and the project name. For example:

```plaintext
=> #<Project id:16 it/supportteam/ticketsystem>
```
To look up a project's full path using the Rails console:

1. Start a Rails console.
2. Run a command similar to this example:

   ```ruby
   ProjectRepository.find_by(disk_path: '@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9').project.full_path
   ```

In the example, the quoted string is the repository's path relative to the repository storage directory on your GitLab server, with `.git` removed from the end of the directory name. For example, on a default Linux package installation, the repository directory would be `/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git`.

The output includes the full path of the project. For example:

```plaintext
=> "it/supportteam/ticketsystem"
```
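Converting a repository directory on disk into the `disk_path:` string used in these commands is a plain string operation. The following Ruby sketch is a hypothetical helper, not a GitLab tool. It assumes the default Linux package storage directory unless you pass another one, and it is intended for project repositories only (wiki directories end in `.wiki.git` and are not handled specially).

```ruby
# Hypothetical helper, not part of GitLab. Converts an on-disk repository
# directory into the value passed to ProjectRepository.find_by(disk_path: ...).
# The default storage root assumes a default Linux package installation.
DEFAULT_STORAGE_ROOT = "/var/opt/gitlab/git-data/repositories"

def disk_path_for(directory, storage_root: DEFAULT_STORAGE_ROOT)
  root = storage_root.end_with?("/") ? storage_root : "#{storage_root}/"
  unless directory.start_with?(root)
    raise ArgumentError, "#{directory} is not under #{storage_root}"
  end

  # Drop the storage root prefix, then drop the trailing ".git".
  directory.delete_prefix(root).delete_suffix(".git")
end

dir = "/var/opt/gitlab/git-data/repositories/@hashed/b1/7e/" \
      "b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9.git"
puts disk_path_for(dir)
```

For the example directory, this prints `@hashed/b1/7e/b17ef6d19c7a5b1ee83b907c595526dcb1eb06db8227d650d5dda0a9f4ce8cd9`, the same string used in the Rails console commands.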
Object pools are repositories used to deduplicate forks of public and internal projects and
contain the objects from the source project. Using `objects/info/alternates`, the source project and
forks use the object pool for shared objects. For more information, see
Git object deduplication information in the GitLab development documentation.
Objects are moved from the source project to the object pool when housekeeping is run on the source
project. Object pool repositories are stored similarly to regular repositories, in a directory called `@pools` instead of `@hashed`:

```ruby
# object pool paths
"@pools/#{hash[0..1]}/#{hash[2..3]}/#{hash}.git"
```
> [!warning]
> Do not run `git prune` or `git gc` in object pool repositories, which are stored in the `@pools` directory. This can cause data loss in the regular repositories that depend on the object pool.
To look up a project's object pool using a Rails console:

1. Start a Rails console.
2. Run a command similar to the following example:

   ```ruby
   # Look up the project by ID or by full path
   project_id = 1
   pool_repository = Project.find(project_id).pool_repository
   pool_repository = Project.find_by_full_path('group/project').pool_repository

   # Get more details about the pool repository
   pool_repository.source_project
   pool_repository.member_projects
   pool_repository.shard
   pool_repository.disk_path
   ```
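To check from the file system whether a repository borrows objects from a pool, you can read its `objects/info/alternates` file, which Git uses to list alternate object directories, one per line. Relative entries are resolved against the repository's `objects` directory, and lines starting with `#` are comments. The following Ruby sketch is a hypothetical, read-only helper, not a GitLab tool; it assumes a bare repository layout and never runs Git commands or modifies the pool.

```ruby
# Hypothetical read-only helper, not part of GitLab. Lists the object
# directories a bare repository borrows from through Git's alternates file.
def alternate_object_dirs(repo_dir)
  objects_dir = File.join(repo_dir, "objects")
  alternates = File.join(objects_dir, "info", "alternates")
  return [] unless File.file?(alternates)

  File.readlines(alternates, chomp: true).filter_map do |line|
    line = line.strip
    # Skip blank lines and comment lines.
    next if line.empty? || line.start_with?("#")

    # Relative entries are resolved against the repository's objects directory.
    File.expand_path(line, objects_dir)
  end
end

# True when any alternate object directory sits under an @pools directory.
def uses_object_pool?(repo_dir)
  alternate_object_dirs(repo_dir).any? do |dir|
    dir.split(File::SEPARATOR).include?("@pools")
  end
end
```

Pass it a repository directory such as the one under `@hashed` for a fork; a `true` result suggests the fork shares objects through a pool, which you can confirm with the Rails console commands above.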
Unlike project wikis that are stored in the `@hashed` directory, group wikis are stored in a directory called `@groups`.
Like project wikis, group wikis follow the hashed storage folder convention, but use a hash of the group ID rather than the project ID.

For example:

```ruby
# group wiki paths
"@groups/#{hash[0..1]}/#{hash[2..3]}/#{hash}.wiki.git"
```
If Gitaly Cluster (Praefect) is used, Praefect manages storage locations. The internal path used by Praefect for the repository differs from the hashed path. For more information, see Praefect-generated replica paths.
Users can download an archive of a repository, in formats such as `.zip` or `.tar.gz`, from the GitLab UI or by using the GitLab API.
GitLab stores this archive in a cache in a directory on the GitLab server.

The location of the cache depends on your installation method:

- Linux package (Omnibus): `/var/opt/gitlab/gitlab-rails/shared/cache/archive`. You can configure this with
  the `gitlab_rails['gitlab_repository_downloads_path']` setting in `/etc/gitlab/gitlab.rb`.
- Helm chart (Kubernetes): `/srv/gitlab/shared/cache/archive`. The directory cannot be configured.

A background job running on Sidekiq periodically cleans out stale archives from this directory. For this reason, this directory must be accessible by all Sidekiq and GitLab Workhorse nodes. If Sidekiq can't access the same directory used by GitLab Workhorse, the disk containing the directory fills up.
If you don't want to use a shared mount for Sidekiq and GitLab
Workhorse, you can instead configure a separate cron job to delete
files from this directory.
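A separate cron job could run a script like the following minimal Ruby sketch on each node that writes archives. It deletes regular files older than a retention period. GitLab does not ship this script: `delete_stale_archives`, the one-day retention, and the use of the Linux package default path are assumptions for illustration, so adjust the path and retention for your instance.

```ruby
require "find"

# Hypothetical cleanup script for the archive cache, intended to run from cron.
# ARCHIVE_DIR assumes a default Linux package installation; the one-day
# retention is an arbitrary example, not a GitLab default.
ARCHIVE_DIR = "/var/opt/gitlab/gitlab-rails/shared/cache/archive"
MAX_AGE_SECONDS = 24 * 60 * 60

# Deletes regular files under dir whose modification time is older than
# max_age seconds, and returns the list of deleted paths.
def delete_stale_archives(dir, max_age:, now: Time.now)
  return [] unless Dir.exist?(dir)

  deleted = []
  Find.find(dir) do |path|
    next unless File.file?(path)
    next unless now - File.mtime(path) > max_age

    File.delete(path)
    deleted << path
  end
  deleted
end

if __FILE__ == $PROGRAM_NAME
  removed = delete_stale_archives(ARCHIVE_DIR, max_age: MAX_AGE_SECONDS)
  puts "Removed #{removed.size} stale archive file(s)"
end
```

Only files are deleted; empty subdirectories are left in place, which keeps the sketch conservative.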
Alternatively, you can disable the cache entirely:
{{< tabs >}}
{{< tab title="Linux package (Omnibus)" >}}
To disable the cache:

1. Set the `WORKHORSE_ARCHIVE_CACHE_DISABLED` environment variable on all nodes that run Puma:

   ```shell
   sudo -e /etc/gitlab/gitlab.rb
   ```

   ```ruby
   gitlab_rails['env'] = { 'WORKHORSE_ARCHIVE_CACHE_DISABLED' => '1' }
   ```

2. Reconfigure the updated nodes for the change to take effect:

   ```shell
   sudo gitlab-ctl reconfigure
   ```
{{< /tab >}}
{{< tab title="Helm chart (Kubernetes)" >}}
To disable the cache, you can use `--set gitlab.webservice.extraEnv.WORKHORSE_ARCHIVE_CACHE_DISABLED="1"`, or
specify the following in your values file:

```yaml
gitlab:
  webservice:
    extraEnv:
      WORKHORSE_ARCHIVE_CACHE_DISABLED: "1"
```
{{< /tab >}}
{{< /tabs >}}
This table shows which objects can be stored in each storage type:
| Storable object | Hashed storage | S3 compatible |
|---|---|---|
| Repository | Yes | - |
| Attachments | Yes | - |
| Avatars | No | - |
| Pages | No | - |
| Docker Registry | No | - |
| CI/CD job logs | No | - |
| CI/CD artifacts | No | Yes |
| CI/CD cache | No | Yes |
| LFS objects | Similar | Yes |
| Repository pools | Yes | - |
Files stored in an S3-compatible endpoint can have the same advantages as
hashed storage, as long as they are not prefixed with
`#{namespace}/#{project_name}`. This is true for CI/CD cache and LFS objects.
Each avatar file is stored in a directory that matches the `id` assigned to it in the database. The
filename is always `avatar.png` for user avatars. When an avatar is replaced, the `Upload` model is
destroyed and a new one takes its place with a different `id`.
CI/CD artifacts are S3-compatible.
LFS objects in GitLab implement a similar storage pattern using two characters and two-level folders, following the Git implementation:

```ruby
"shared/lfs-objects/#{oid[0..1]}/#{oid[2..3]}/#{oid[4..-1]}"

# Based on object `oid`: `8909029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c`, path will be:
"shared/lfs-objects/89/09/029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c"
```
LFS objects are also S3-compatible.
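The LFS path layout can be sketched as a small Ruby function. `lfs_object_path` is a hypothetical helper, not a GitLab API. It assumes the `oid` is a 64-character lowercase hexadecimal SHA256 digest, as in the example above, and returns the path in the same relative form as the pattern.

```ruby
# Hypothetical helper, not part of GitLab. Builds the relative LFS object
# path from a SHA256 object ID using the two-level folder pattern above.
def lfs_object_path(oid)
  unless oid.match?(/\A[0-9a-f]{64}\z/)
    raise ArgumentError, "expected a 64-character lowercase hex oid"
  end

  # Two characters, the next two characters, then the remainder of the oid.
  "shared/lfs-objects/#{oid[0..1]}/#{oid[2..3]}/#{oid[4..-1]}"
end

puts lfs_object_path("8909029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c")
```

For the example `oid`, this prints `shared/lfs-objects/89/09/029eb962194cfb326259411b22ae3f4a814b5be4f80651735aeef9f3229c`, matching the path shown above. Note that the last segment drops the first four characters, unlike hashed repository paths, which repeat the full hash.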
After you configure multiple repository storages, you can choose where new repositories are stored:
Each repository storage path can be assigned a weight from 0 to 100. When a new project is created, these weights determine which storage the repository is created on.
The higher the weight of a given repository storage path relative to the other repository storage
paths, the more often it is chosen. The chance of a storage being chosen is `(storage weight) / (sum of all weights) * 100` percent.
By default, if repository weights have not been configured earlier:

- `default` is weighted `100.0`.
- All other storages are weighted `0`.

> [!note]
> If all storage weights are `0` (for example, when `default` does not exist), GitLab attempts to create new repositories on `default`, regardless of the configuration or whether `default` exists. See the tracking issue for more information.
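The weighting rule can be illustrated with a minimal Ruby sketch. This is not GitLab's implementation: `storage_chances`, `pick_storage`, and the storage names and weights are hypothetical. The sketch computes each storage's chance with the formula above, picks a storage at random in proportion to its weight, and falls back to `default` when all weights are `0`, as described in the note.

```ruby
# Hypothetical sketch of weighted storage selection; not GitLab's implementation.

# Percentage chance for each storage: weight / sum of all weights * 100.
def storage_chances(weights)
  total = weights.values.sum
  return {} if total.zero?

  weights.transform_values { |weight| weight * 100.0 / total }
end

# Picks one storage name at random, in proportion to its weight.
# When every weight is 0, returns "default" even if no storage has that name.
def pick_storage(weights, rng: Random.new)
  total = weights.values.sum
  return "default" if total.zero?

  target = rng.rand * total # Float in [0, total)
  running = 0
  weights.each do |name, weight|
    running += weight
    return name if target < running
  end

  # Only reachable through floating-point rounding; use the last weighted storage.
  weights.reject { |_, weight| weight.zero? }.keys.last
end

weights = { "default" => 50, "storage2" => 30, "storage3" => 20 }
storage_chances(weights).each { |name, pct| puts "#{name}: #{pct.round(1)}%" }
puts "Chosen: #{pick_storage(weights)}"
```

With these example weights, the sketch prints `default: 50.0%`, `storage2: 30.0%`, and `storage3: 20.0%`, followed by one randomly chosen storage. A storage with weight `0` is never chosen while any other weight is positive.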
To move a repository to a different repository storage (for example, from default to storage2), use the
same process as migrating to Gitaly Cluster (Praefect).