Git LFS development guidelines

To handle large binary files, Git Large File Storage (LFS) involves several components working together. These guidelines explain the architecture and code flow for working on the GitLab LFS codebase.

For user documentation, see Git Large File Storage.

The following is a high-level diagram that explains Git push when Git LFS is in use:

mermaid

%%{init: { "fontFamily": "GitLab Sans" }}%%
flowchart LR
accTitle: Git pushes with Git LFS
accDescr: Explains how the LFS hook routes new files depending on type

A[Git push] -->B[LFS hook]
    B -->C[Pointers]
    B -->D[Binary files]
    C -->E[Repository]
    D -->F[LFS server]
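
The split shown in the diagram can be sketched in a few lines of Python. This is an illustration only, not the Git LFS client's implementation: `build_pointer` and `split_push` are hypothetical helpers, and matching files by suffix stands in for the `.gitattributes` patterns that the real client uses. The pointer layout (`version`, `oid sha256:`, `size`) follows the Git LFS pointer specification.

```python
import hashlib

POINTER_VERSION = "https://git-lfs.github.com/spec/v1"


def build_pointer(content: bytes) -> str:
    """Return the small text pointer that is committed in place of `content`."""
    oid = hashlib.sha256(content).hexdigest()
    return f"version {POINTER_VERSION}\noid sha256:{oid}\nsize {len(content)}\n"


def split_push(files: dict[str, bytes], tracked_suffixes: tuple[str, ...]):
    """Route pointers to the repository and binary content to the LFS server.

    Suffix matching is a simplification (assumption) of .gitattributes rules.
    """
    repository: dict[str, bytes] = {}
    lfs_server: dict[str, bytes] = {}
    for path, content in files.items():
        if path.endswith(tracked_suffixes):
            repository[path] = build_pointer(content).encode()
            lfs_server[hashlib.sha256(content).hexdigest()] = content
        else:
            repository[path] = content
    return repository, lfs_server
```

Calling `split_push({"a.txt": b"hi", "b.psd": b"binary"}, (".psd",))` leaves `a.txt` untouched in the repository, replaces `b.psd` with a pointer, and stores the original bytes under their SHA256 `oid` on the LFS side.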

This diagram is a high-level explanation of a Git pull when Git LFS is in use:

mermaid

%%{init: { "fontFamily": "GitLab Sans" }}%%
flowchart LR
accTitle: Git pull using Git LFS
accDescr: Explains how the LFS hook pulls LFS assets from the LFS server, and everything else from the Git repository

A[User] -->|initiates git pull| B[Repository]
    B -->|Pull data and LFS transfers| C[LFS hook]
    C -->|LFS pointers| D[LFS server]
    D -->|Binary files| C
    C -->|Pull data and binary files| A

Controllers and Services

Repositories::GitHttpClientController

The methods for authentication defined here are inherited by all the other LFS controllers.

Repositories::LfsApiController

`#batch`

After authentication, the batch action is the first action called by the Git LFS client during downloads and uploads (such as pull, push, and clone).
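
As a rough illustration of what the client sends to the batch action, the following Python sketch builds a request body in the shape described by the Git LFS Batch API specification. `build_batch_request` is a hypothetical helper written for this example; it is not part of the Git LFS client or of GitLab.

```python
import json

# Media type defined by the Git LFS API specification.
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(operation: str, objects: list[tuple[str, int]]):
    """Return (headers, body) for a batch request.

    `objects` is a list of (oid, size) pairs taken from LFS pointers.
    """
    if operation not in ("download", "upload"):
        raise ValueError(f"unsupported operation: {operation}")
    body = {
        "operation": operation,
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    return headers, json.dumps(body)
```

A pull or clone uses `operation="download"`, and a push uses `operation="upload"`. The body would be sent with `POST` to the `info/lfs/objects/batch` path shown in the sequence diagrams on this page.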

Repositories::LfsStorageController

`#upload_authorize`

Provides a payload to Workhorse, including a path where Workhorse saves the file. The path can point to remote object storage.

`#upload_finalize`

Handles requests from Workhorse that contain information on a file that Workhorse already uploaded (see this middleware) so that GitLab can either:

Create an LfsObject.
Connect an existing LfsObject to a project with an LfsObjectsProject.

LfsObject and LfsObjectsProject

Only one LfsObject is created for a file with a given oid (a SHA256 checksum of the file) and file size.
LfsObjectsProject records associate LfsObjects with Projects. They determine if a file can be accessed through a project.
These objects are also used for calculating the amount of LFS storage a given project is using. For more information, see ProjectStatistics#update_lfs_objects_size.
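
The relationship between the two models can be pictured with a small in-memory sketch in Python. This is not the Rails implementation: `LfsStore` and its methods are hypothetical names, and keying objects by `oid` alone is a simplifying assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class LfsStore:
    # oid -> size: stands in for the LfsObject table (one entry per oid).
    objects: dict[str, int] = field(default_factory=dict)
    # (project_id, oid): stands in for the LfsObjectsProject join table.
    links: set[tuple[int, str]] = field(default_factory=set)

    def finalize_upload(self, project_id: int, oid: str, size: int) -> None:
        """Create the object only if it is new, then link it to the project."""
        self.objects.setdefault(oid, size)
        self.links.add((project_id, oid))

    def accessible(self, project_id: int, oid: str) -> bool:
        """A file is reachable through a project only if a link exists."""
        return (project_id, oid) in self.links

    def storage_size(self, project_id: int) -> int:
        """Sum the sizes of all objects linked to the project."""
        return sum(self.objects[oid] for pid, oid in self.links if pid == project_id)
```

When two projects push the same file, the sketch keeps one object and two links, so the file counts toward the storage size of both projects while being stored once.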

Repositories::LfsLocksApiController

Handles the lock API for LFS, delegating most of the work to the corresponding services:

Lfs::LockFileService
Lfs::UnlockFileService
Lfs::LocksFinderService

These services create and delete LfsFileLock records.

`#verify`

This endpoint responds with a payload that allows a client to check if there are any files being pushed that have locks that belong to another user.
A client-side lfs.locksverify configuration can be set so that the client aborts the push if locks exist that belong to another user.
The existence of locks belonging to other users is also validated on the server side.
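
The check described above can be sketched in Python as follows. The `ours` and `theirs` keys and the lock fields (`path`, `owner.name`) follow the Git LFS File Locking API specification. The functions `verify_locks` and `push_allowed` are hypothetical and only illustrate the idea; they are not GitLab or Git LFS code.

```python
def verify_locks(locks: list[dict], current_user: str) -> dict:
    """Split locks into those held by the current user and those held by others."""
    ours = [lock for lock in locks if lock["owner"]["name"] == current_user]
    theirs = [lock for lock in locks if lock["owner"]["name"] != current_user]
    return {"ours": ours, "theirs": theirs}


def push_allowed(verify_response: dict, paths_to_push: set[str]) -> bool:
    """Return False if any pushed path is locked by another user."""
    locked_by_others = {lock["path"] for lock in verify_response["theirs"]}
    return locked_by_others.isdisjoint(paths_to_push)
```

With `lfs.locksverify` set, a client would make a decision like `push_allowed` before sending data. The server-side validation applies the same rule regardless of the client configuration.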

Example authentication

mermaid

%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
autonumber
    alt Over HTTPS
        Git client-->>Git client: user-supplied credentials
    else Over SSH
        Git client->>gitlab-shell: git-lfs-authenticate
        activate gitlab-shell
        activate GitLab Rails
        gitlab-shell->>GitLab Rails:  POST /api/v4/internal/lfs_authenticate
        GitLab Rails-->>gitlab-shell: token with expiry
        deactivate gitlab-shell
        deactivate GitLab Rails
    end

Clients can be configured to store credentials in a few different ways. See the Git LFS documentation on authentication.
Running git-lfs-authenticate on gitlab-shell. See the Git LFS documentation concerning git-lfs-authenticate.
gitlab-shell makes a request to the GitLab API.
GitLab Rails responds to gitlab-shell with a token, which is used in subsequent requests. See Git LFS documentation concerning authentication.
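
Both branches of the diagram end with the client holding an Authorization header for later LFS requests. The Python sketch below shows one way to derive that header in each case. The JSON fields `href`, `header`, and `expires_in` follow the Git LFS documentation for the SSH response. The function names are hypothetical, and the token value in the usage note is made up.

```python
import base64
import json


def basic_auth_header(username: str, password: str) -> dict:
    """HTTPS case: user-supplied credentials become an HTTP Basic header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def parse_ssh_auth_response(raw: str):
    """SSH case: parse the JSON printed in response to git-lfs-authenticate.

    Returns (href, headers, expires_in); expires_in is None when absent.
    """
    payload = json.loads(raw)
    return payload["href"], dict(payload.get("header", {})), payload.get("expires_in")
```

For example, `parse_ssh_auth_response('{"href": "https://gitlab.example.com/group/project.git/info/lfs", "header": {"Authorization": "Basic abc"}, "expires_in": 1800}')` returns the LFS endpoint, the headers to attach to each request, and the token lifetime in seconds.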

Example clone

mermaid

%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
    Note right of Git client: Typical Git clone things happen first
    Note right of Git client: Authentication for LFS comes next
    activate GitLab Rails
    autonumber
    Git client->>GitLab Rails: POST project/namespace/info/lfs/objects/batch
    GitLab Rails-->>Git client: payload with objects
    deactivate GitLab Rails
    loop each object in payload
    Git client->>GitLab Rails: GET project/namespace/gitlab-lfs/objects/:oid/ (<- This URL is from the payload)
    GitLab Rails->>Workhorse: SendfileUpload
    Workhorse-->> Git client: Binary data
    end

Git LFS requests the files to download, sending the authorization header obtained during authentication.
GitLab responds with the list of objects and where to find them. See LfsApiController#batch.
Git LFS makes a request for each file to the href given in the previous response. See how downloads are handled with the basic transfer mode.
GitLab redirects to the remote URL if remote object storage is enabled. See SendFileUpload.
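
To make the loop over the payload concrete, here is a minimal Python sketch that extracts the download location of each object from a batch response. The response shape (`objects`, `actions.download.href`, optional `header`, per-object `error`) follows the Git LFS Batch API specification. `download_targets` is a hypothetical helper and performs no network requests.

```python
def download_targets(batch_response: dict) -> dict:
    """Map each oid to the (href, headers) the client should GET.

    Objects that carry an `error`, or that have no download action,
    are skipped.
    """
    targets = {}
    for obj in batch_response.get("objects", []):
        if "error" in obj:
            continue
        action = obj.get("actions", {}).get("download")
        if action:
            targets[obj["oid"]] = (action["href"], action.get("header", {}))
    return targets
```

The client then issues one `GET` per entry, which corresponds to the loop in the sequence diagram.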

Example push

mermaid

%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
    Note right of Git client: Typical Git push things happen first.
    Note right of Git client: Authentication for LFS comes next.
    autonumber
    activate GitLab Rails
        Git client ->> GitLab Rails: POST project/namespace/info/lfs/objects/batch
        GitLab Rails-->>Git client: payload with objects
    deactivate GitLab Rails
    loop each object in payload
    Git client->>Workhorse: PUT project/namespace/gitlab-lfs/objects/:oid/:size (URL is from payload)
    Workhorse->>GitLab Rails: PUT project/namespace/gitlab-lfs/objects/:oid/:size/authorize
    GitLab Rails-->>Workhorse: response with the path to upload to
    Workhorse->>Workhorse: Upload
    Workhorse->>GitLab Rails: PUT project/namespace/gitlab-lfs/objects/:oid/:size/finalize
    end

Git LFS requests to upload files.
GitLab responds with the list of objects and where to upload them. See LfsApiController#batch.
Git LFS makes a request for each file to the href given in the previous response. See how uploads are handled with the basic transfer mode.
GitLab responds with a payload that includes a path where Workhorse saves the file. The path can point to remote object storage. See LfsStorageController#upload_authorize.
Workhorse does the work of saving the file.
Workhorse makes a request to GitLab with information on the uploaded file so that GitLab can create an LfsObject. See LfsStorageController#upload_finalize.
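
The authorize, upload, and finalize sequence can be simulated locally with the Python sketch below. It is a simplified model, not the Rails or Workhorse code: the function names and the `temp_path` key are hypothetical, and the checksum and size comparison in `finalize` is an assumption about what a server would reasonably verify before recording the object.

```python
import hashlib
import os


def authorize(oid: str, storage_root: str) -> dict:
    """Rails side: tell the uploader where to save the file."""
    return {"temp_path": os.path.join(storage_root, oid)}


def upload(temp_path: str, data: bytes) -> dict:
    """Workhorse side: save the file and report what was written."""
    with open(temp_path, "wb") as handle:
        handle.write(data)
    return {
        "path": temp_path,
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
    }


def finalize(oid: str, size: int, uploaded: dict, known_oids: set) -> bool:
    """Rails side: accept the upload only if it matches the requested oid and size."""
    if uploaded["sha256"] != oid or uploaded["size"] != size:
        return False
    known_oids.add(oid)
    return True
```

Splitting the work this way keeps the large file body out of the application process: only the small authorize and finalize requests reach Rails, while the uploader handles the bytes.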

Including LFS blobs in project archives

The following diagram illustrates how GitLab resolves LFS files for project archives:

mermaid

%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
    autonumber
    Client->>+Workhorse: GET /group/project/-/archive/master.zip
    Workhorse->>+Rails: GET /group/project/-/archive/master.zip
    Rails->>+Workhorse: Gitlab-Workhorse-Send-Data git-archive
    Workhorse->>Gitaly: SendArchiveRequest
    Gitaly->>Git: git archive master
    Git->>Smudge: OID 12345
    Smudge->>+Workhorse: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
    Workhorse->>+Rails: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
    Rails->>+Workhorse: Gitlab-Workhorse-Send-Data send-url
    Workhorse->>Smudge: <LFS data>
    Smudge->>Git: <LFS data>
    Git->>Gitaly: <streamed data>
    Gitaly->>Workhorse: <streamed data>
    Workhorse->>Client: master.zip

1. The user requests the project archive from the UI.
2. Workhorse forwards this request to Rails.
3. If the user is authorized to download the archive, Rails replies with an HTTP header of Gitlab-Workhorse-Send-Data with a base64-encoded JSON payload prefixed with git-archive. This payload includes the SendArchiveRequest binary message, which is encoded again in base64.
4. Workhorse decodes the Gitlab-Workhorse-Send-Data payload. If the archive already exists in the archive cache, Workhorse sends that file. Otherwise, Workhorse sends the SendArchiveRequest to the appropriate Gitaly server.
5. The Gitaly server calls git archive <ref> to begin generating the Git archive on-the-fly. If the include_lfs_blobs flag is enabled, Gitaly enables a custom LFS smudge filter with the -c filter.lfs.smudge=/path/to/gitaly-lfs-smudge Git option.
6. When Git identifies a possible LFS pointer using the .gitattributes file, Git calls gitaly-lfs-smudge and provides the LFS pointer through the standard input. Gitaly provides GL_PROJECT_PATH and GL_INTERNAL_CONFIG as environment variables to enable lookup of the LFS object.
7. If a valid LFS pointer is decoded, gitaly-lfs-smudge makes an internal API call to Workhorse to download the LFS object from GitLab.
8. Workhorse forwards this request to Rails. If the LFS object exists and is associated with the project, Rails sends ArchivePath with either a path where the LFS object resides (for local disk) or a pre-signed URL (when object storage is enabled), using the Gitlab-Workhorse-Send-Data HTTP header with a payload prefixed with send-url.
9. Workhorse retrieves the file and sends it to the gitaly-lfs-smudge process, which writes the contents to the standard output.
10. Git reads this output and sends it back to the Gitaly process.
11. Gitaly sends the data back to Workhorse.
12. The archive data is sent back to the client.
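
The decision the smudge filter makes on its standard input can be sketched in Python. This is not the gitaly-lfs-smudge source: `parse_pointer` and `smudge` are hypothetical helpers, and `fetch` stands in for the internal API call. The pointer layout follows the Git LFS pointer specification.

```python
import re

POINTER_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_pointer(data: bytes):
    """Return (oid, size) if `data` is a valid LFS pointer, else None."""
    try:
        lines = data.decode("utf-8").splitlines()
    except UnicodeDecodeError:
        return None
    if not lines or lines[0] != POINTER_VERSION_LINE:
        return None
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", oid):
        return None
    if not re.fullmatch(r"[0-9]+", size):
        return None
    return oid[len("sha256:"):], int(size)


def smudge(data: bytes, fetch) -> bytes:
    """Replace a pointer with the object content; pass other input through."""
    pointer = parse_pointer(data)
    if pointer is None:
        return data
    oid, _size = pointer
    return fetch(oid)
```

Passing non-pointer input through unchanged matters because a path can match an LFS pattern in `.gitattributes` without its content being an actual pointer.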

In step 7, the gitaly-lfs-smudge filter must talk to Workhorse, not to Rails, or an invalid LFS blob is saved. To support this, GitLab changed the default Linux package configuration to have Gitaly talk to Workhorse instead of Rails.

One side effect of this change: the correlation ID of the original request is not preserved for the internal API requests made by Gitaly (or gitaly-lfs-smudge), such as the one made in step 8. The correlation IDs for those API requests are random values until this Workhorse issue is resolved.
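
The Gitlab-Workhorse-Send-Data header used in this flow carries a prefix such as git-archive or send-url followed by a base64-encoded JSON payload. The Python sketch below shows how such a value could be built and decoded. The `:` separator and the URL-safe base64 variant are assumptions made for this example, and the `URL` key in the usage note is illustrative only.

```python
import base64
import json


def encode_send_data(prefix: str, params: dict) -> str:
    """Build a header value: prefix, separator, base64-encoded JSON."""
    payload = base64.urlsafe_b64encode(json.dumps(params).encode()).decode()
    return f"{prefix}:{payload}"


def decode_send_data(value: str):
    """Split a header value into (prefix, params)."""
    prefix, _, payload = value.partition(":")
    return prefix, json.loads(base64.urlsafe_b64decode(payload))
```

For example, `decode_send_data(encode_send_data("send-url", {"URL": "https://storage.example.com/object"}))` returns the prefix and the original parameters, which is the information the receiving side needs to choose how to serve the response body.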

Related topics

Blog post: Getting started with Git LFS
User documentation: Git Large File Storage (LFS)
GitLab Git Large File Storage (LFS) Administration for GitLab Self-Managed
