doc/development/lfs.md
To handle large binary files, Git Large File Storage (LFS) involves several components working together. These guidelines explain the architecture and code flow for working on the GitLab LFS codebase.
For user documentation, see Git Large File Storage.
The following is a high-level diagram that explains Git push when Git LFS is in use:
%%{init: { "fontFamily": "GitLab Sans" }}%%
flowchart LR
accTitle: Git pushes with Git LFS
accDescr: Explains how the LFS hook routes new files depending on type
A[Git push] -->B[LFS hook]
B -->C[Pointers]
B -->D[Binary files]
C -->E[Repository]
D -->F[LFS server]
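
As an illustration of the pointer branch in the diagram above, the following minimal sketch builds the text of a pointer file in the format defined by the Git LFS specification (a `version` line, a SHA256 `oid`, and a `size` in bytes). The function name `make_pointer` is hypothetical and is not part of GitLab or the Git LFS client.

```python
import hashlib

# Version URL used by Git LFS pointer files.
POINTER_VERSION = "https://git-lfs.github.com/spec/v1"


def make_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given binary content."""
    oid = hashlib.sha256(content).hexdigest()  # the oid is the file's SHA256
    return (
        f"version {POINTER_VERSION}\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # The small pointer goes to the repository; the binary goes to the LFS server.
    print(make_pointer(b"hello"), end="")
```

The pointer is only a few lines long regardless of the size of the binary file, which is why it can be committed to the repository in place of the file itself.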
This diagram is a high-level explanation of a Git pull when Git LFS is in use:
%%{init: { "fontFamily": "GitLab Sans" }}%%
flowchart LR
accTitle: Git pull using Git LFS
accDescr: Explains how the LFS hook pulls LFS assets from the LFS server, and everything else from the Git repository
A[User] -->|initiates<br>git pull| B[Repository]
B -->|Pull data and<br>LFS transfers| C[LFS hook]
C -->|LFS pointers| D[LFS server]
D -->|Binary<br>files| C
C -->|Pull data and<br>binary files| A
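
On the pull side, something has to decide whether a checked-out blob is an LFS pointer before asking the LFS server for the binary file. The following sketch shows one way to recognize and parse a pointer, assuming the pointer format from the Git LFS specification. `Pointer` and `parse_pointer` are hypothetical names used only for illustration.

```python
import re
from typing import NamedTuple, Optional


class Pointer(NamedTuple):
    oid: str   # SHA256 of the real file, as 64 hex characters
    size: int  # size of the real file in bytes


_OID_RE = re.compile(r"^[0-9a-f]{64}$")


def parse_pointer(text: str) -> Optional[Pointer]:
    """Return the pointer's oid and size, or None if text is not an LFS pointer."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # every pointer line is "key value"
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        return None
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or not _OID_RE.match(digest):
        return None
    size = fields.get("size", "")
    if not (size.isascii() and size.isdigit()):
        return None
    return Pointer(oid=digest, size=int(size))
```

Content that does not parse as a pointer is treated as an ordinary file and comes from the Git repository unchanged.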
The methods for authentication defined here are inherited by all the other LFS controllers.
#batch
After authentication, the batch action is the first action called by the Git LFS
client during downloads and uploads (such as pull, push, and clone).
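
To make the batch call more concrete, here is a minimal sketch of the request a client sends, assuming the field names and media type from the Git LFS batch API specification (`operation`, `transfers`, `objects`, and `application/vnd.git-lfs+json`). The helper name `build_batch_request` is hypothetical.

```python
import json

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(operation, objects):
    """Build headers and a JSON body for a Git LFS batch request.

    operation: "download" (pull, clone) or "upload" (push).
    objects: iterable of (oid, size) pairs taken from LFS pointers.
    """
    if operation not in ("download", "upload"):
        raise ValueError(f"unsupported operation: {operation}")
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    body = {
        "operation": operation,
        "transfers": ["basic"],  # the basic transfer mode
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }
    return headers, json.dumps(body)
```

The same endpoint serves both directions; only the `operation` value changes between a download and an upload.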
#upload_authorize
Provides a payload to Workhorse, including a path for Workhorse to save the file to. The path could be in remote object storage.
#upload_finalize
Handles requests from Workhorse that contain information on a file that Workhorse already uploaded (see this middleware) so that GitLab can either:
- Create an LfsObject.
- Connect an existing LfsObject to a project with an LfsObjectsProject.

LfsObject and LfsObjectsProject
- Only one LfsObject is created for a file with a given oid (a SHA256 checksum of the file) and file size.
- LfsObjectsProject records associate LfsObjects with Projects. They determine if a file can be accessed through a project.
- For how a project's LFS storage size is calculated, see ProjectStatistics#update_lfs_objects_size.

Handles the lock API for LFS. Delegates mostly to corresponding services:
- Lfs::LockFileService
- Lfs::UnlockFileService
- Lfs::LocksFinderService

These services create and delete LfsFileLock records.
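
The behavior of the lock, unlock, and finder services can be pictured with a small in-memory model. This is only a sketch: `LockStore`, `LockConflict`, and the method names are hypothetical and do not mirror the GitLab service classes. The `ours` and `theirs` split in `verify` follows the response shape of the Git LFS locking API.

```python
class LockConflict(Exception):
    """Raised when a path is locked by a different user."""


class LockStore:
    """Hypothetical in-memory stand-in for file locks (path -> owner)."""

    def __init__(self):
        self._locks = {}

    def lock(self, path, user):
        # Only one lock may exist per path.
        if path in self._locks:
            raise LockConflict(f"{path} is locked by {self._locks[path]}")
        self._locks[path] = user

    def unlock(self, path, user, force=False):
        owner = self._locks.get(path)
        if owner is None:
            raise KeyError(path)
        if owner != user and not force:
            raise LockConflict(f"{path} is locked by {owner}")
        del self._locks[path]

    def verify(self, user):
        # Split existing locks into the caller's and everyone else's.
        ours = sorted(p for p, o in self._locks.items() if o == user)
        theirs = sorted(p for p, o in self._locks.items() if o != user)
        return {"ours": ours, "theirs": theirs}
```

A client that receives a non-empty `theirs` list knows that a push touching those paths conflicts with another user's lock.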
#verify
The lfs.locksverify configuration can be set on the client so that the client aborts the push if locks exist that belong to another user.

This diagram shows an example of LFS authentication over HTTPS and over SSH:
%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
autonumber
alt Over HTTPS
Git client-->>Git client: user-supplied credentials
else Over SSH
Git client->>gitlab-shell: git-lfs-authenticate
activate gitlab-shell
activate GitLab Rails
gitlab-shell->>GitLab Rails: POST /api/v4/internal/lfs_authenticate
GitLab Rails-->>gitlab-shell: token with expiry
deactivate gitlab-shell
deactivate GitLab Rails
end
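
For the SSH branch of the diagram, the token arrives at the client as JSON. The sketch below turns that JSON into request headers, assuming the response shape described in the Git LFS documentation for git-lfs-authenticate (`href`, `header`, and `expires_in`). The function name `lfs_auth_from_ssh` is hypothetical.

```python
import json

LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def lfs_auth_from_ssh(stdout: str):
    """Parse git-lfs-authenticate output into (href, headers, expires_in)."""
    payload = json.loads(stdout)
    headers = {"Accept": LFS_MEDIA_TYPE}
    # The server supplies the Authorization header to reuse on LFS requests.
    headers.update(payload.get("header", {}))
    return payload.get("href"), headers, payload.get("expires_in")
```

Over HTTPS this step is skipped, because the client already has user-supplied credentials to send with each request.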
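1. Over HTTPS, the Git client uses user-supplied credentials.
2. Over SSH, the Git client runs git-lfs-authenticate on gitlab-shell. See the Git LFS documentation concerning git-lfs-authenticate.
3. gitlab-shell makes a request to the GitLab API.
4. GitLab Rails responds to gitlab-shell with a token that has an expiry.

This diagram shows an example clone with Git LFS:
%%{init: { "fontFamily": "GitLab Sans" }}%%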
sequenceDiagram
Note right of Git client: Typical Git clone things happen first
Note right of Git client: Authentication for LFS comes next
activate GitLab Rails
autonumber
Git client->>GitLab Rails: POST project/namespace/info/lfs/objects/batch
GitLab Rails-->>Git client: payload with objects
deactivate GitLab Rails
loop each object in payload
Git client->>GitLab Rails: GET project/namespace/gitlab-lfs/objects/:oid/ (<- This URL is from the payload)
GitLab Rails->>Workhorse: SendfileUpload
Workhorse-->> Git client: Binary data
end
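
The loop in the diagram above iterates over the objects in the batch response. A minimal sketch of that step follows, assuming the response shape from the Git LFS batch API specification (`objects`, each with `actions.download.href` or an `error`). The function name `download_plan` is hypothetical.

```python
def download_plan(batch_response: dict):
    """Split a batch response into downloads to perform and per-object errors.

    Returns (plan, errors), where plan is a list of (oid, href, headers)
    and errors is a list of (oid, message).
    """
    plan, errors = [], []
    for obj in batch_response.get("objects", []):
        if "error" in obj:
            errors.append((obj["oid"], obj["error"].get("message", "")))
            continue
        action = obj.get("actions", {}).get("download")
        if action is None:
            continue  # the server offered nothing to download for this object
        plan.append((obj["oid"], action["href"], action.get("header", {})))
    return plan, errors
```

The client then issues one GET per entry in the plan, using the `href` exactly as the server returned it.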
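1. Git LFS requests the objects to download through the batch endpoint.
2. GitLab responds with the list of objects and where to find them. See
   LfsApiController#batch.
3. Git LFS makes a request for each object, using the href in the previous response. See
   how downloads are handled with the basic transfer mode.
4. GitLab redirects to the remote URL if remote object storage is enabled. See
   SendFileUpload.

This diagram shows an example push with Git LFS:
%%{init: { "fontFamily": "GitLab Sans" }}%%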
sequenceDiagram
Note right of Git client: Typical Git push things happen first.
Note right of Git client: Authentication for LFS comes next.
autonumber
activate GitLab Rails
Git client ->> GitLab Rails: POST project/namespace/info/lfs/objects/batch
GitLab Rails-->>Git client: payload with objects
deactivate GitLab Rails
loop each object in payload
Git client->>Workhorse: PUT project/namespace/gitlab-lfs/objects/:oid/:size (URL is from payload)
Workhorse->>GitLab Rails: PUT project/namespace/gitlab-lfs/objects/:oid/:size/authorize
GitLab Rails-->>Workhorse: response with the path to upload to
Workhorse->>Workhorse: Upload
Workhorse->>GitLab Rails: PUT project/namespace/gitlab-lfs/objects/:oid/:size/finalize
end
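
The per-object upload loop in the diagram above, together with the create-or-connect behavior of upload_finalize, can be sketched as follows. The request paths are copied from the diagram. Everything else (`upload_requests`, `LfsStore`, and its methods) is a hypothetical in-memory model and not GitLab's implementation.

```python
def upload_requests(project_path: str, oid: str, size: int):
    """List the three PUT requests made for one object, in order."""
    base = f"{project_path}/gitlab-lfs/objects/{oid}/{size}"
    return [
        ("Git client -> Workhorse", "PUT", base),
        ("Workhorse -> GitLab Rails", "PUT", base + "/authorize"),
        ("Workhorse -> GitLab Rails", "PUT", base + "/finalize"),
    ]


class LfsStore:
    """Hypothetical model: objects keyed by oid, plus project links."""

    def __init__(self):
        self.objects = {}   # oid -> size (stands in for LfsObject)
        self.links = set()  # (oid, project) (stands in for LfsObjectsProject)

    def finalize(self, project: str, oid: str, size: int) -> bool:
        """Create the object if it is new, then link it to the project.

        Returns True when a new object was created, False when an
        existing object was only connected to the project.
        """
        created = oid not in self.objects
        if created:
            self.objects[oid] = size
        self.links.add((oid, project))
        return created

    def accessible(self, project: str, oid: str) -> bool:
        # Access is decided by the link, not by the object's existence.
        return (oid, project) in self.links
```

In this model, pushing the same file to a second project stores nothing new; it only adds a link, which is what makes the file reachable through that project.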
1. Git LFS requests the ability to upload objects through the batch endpoint.
2. GitLab responds with the list of objects and where to upload them. See
   LfsApiController#batch.
3. Git LFS makes a request for each object, using the href in the previous response. See
   how uploads are handled with the basic transfer mode.
4. GitLab responds with a payload including a path for Workhorse to save the file to.
   The path could be in remote object storage. See
   LfsStorageController#upload_authorize.
5. Workhorse uploads the file.
6. Workhorse makes a request to GitLab with information on the uploaded file so
   that GitLab can create an LfsObject. See
   LfsStorageController#upload_finalize.

The following diagram illustrates how GitLab resolves LFS files for project archives:
%%{init: { "fontFamily": "GitLab Sans" }}%%
sequenceDiagram
autonumber
Client->>+Workhorse: GET /group/project/-/archive/master.zip
Workhorse->>+Rails: GET /group/project/-/archive/master.zip
Rails->>+Workhorse: Gitlab-Workhorse-Send-Data git-archive
Workhorse->>Gitaly: SendArchiveRequest
Gitaly->>Git: git archive master
Git->>Smudge: OID 12345
Smudge->>+Workhorse: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
Workhorse->>+Rails: GET /internal/api/v4/lfs?oid=12345&gl_repository=project-1234
Rails->>+Workhorse: Gitlab-Workhorse-Send-Data send-url
Workhorse->>Smudge: <LFS data>
Smudge->>Git: <LFS data>
Git->>Gitaly: <streamed data>
Gitaly->>Workhorse: <streamed data>
Workhorse->>Client: master.zip
1. The client requests the project archive.
2. Workhorse forwards this request to Rails.
3. Rails replies with a Gitlab-Workhorse-Send-Data HTTP header that contains a base64-encoded
   JSON payload prefaced with git-archive. This payload includes the
   SendArchiveRequest binary message, which is encoded again in base64.
4. Workhorse decodes the Gitlab-Workhorse-Send-Data payload. If the
   archive already exists in the archive cache, Workhorse sends that
   file. Otherwise, Workhorse sends the SendArchiveRequest to the
   appropriate Gitaly server.
5. The Gitaly server calls git archive <ref> to begin generating
   the Git archive on-the-fly. If the include_lfs_blobs flag is enabled,
   Gitaly enables a custom LFS smudge filter with the -c filter.lfs.smudge=/path/to/gitaly-lfs-smudge
   Git option.
6. When git identifies a possible LFS pointer using the
   .gitattributes file, git calls gitaly-lfs-smudge and provides the
   LFS pointer through the standard input. Gitaly provides GL_PROJECT_PATH
   and GL_INTERNAL_CONFIG as environment variables to enable lookup of
   the LFS object.
7. If the input is a valid LFS pointer, gitaly-lfs-smudge makes an
   internal API call to Workhorse to download the LFS object from GitLab.
8. Rails returns the location of the LFS object through ArchivePath, either
   as a path where the LFS object resides (for local disk) or as a
   pre-signed URL (when object storage is enabled), in the
   Gitlab-Workhorse-Send-Data HTTP header with a payload prefaced with
   send-url.
9. Workhorse sends the LFS data to the gitaly-lfs-smudge
   process, which writes the contents to the standard output.
10. git reads this output and sends it back to the Gitaly process.
11. Gitaly streams the data back to Workhorse.
12. Workhorse sends the archive to the client.

In step 7, the gitaly-lfs-smudge filter must talk to Workhorse, not to
Rails, or an invalid LFS blob is saved. To support this, GitLab
changed the default Linux package configuration to have Gitaly talk to
Workhorse instead of Rails.
One side effect of this change: the correlation ID of the original
request is not preserved for the internal API requests made by Gitaly
(or gitaly-lfs-smudge), such as the one made in step 8. The
correlation IDs for those API requests are random values until
this Workhorse issue is
resolved.
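
The Gitlab-Workhorse-Send-Data header used in the archive flow above carries a prefix (git-archive or send-url) followed by a base64-encoded JSON payload. The following sketch shows that shape. It assumes a colon between the prefix and the payload and URL-safe base64, neither of which is stated on this page, so check both against the Workhorse source before relying on them. The function names and the `URL` parameter in the usage example are hypothetical.

```python
import base64
import json


def encode_send_data(prefix: str, params: dict) -> str:
    """Build a header value of the form '<prefix>:<base64 JSON>'.

    Assumptions: ':' as the separator and URL-safe base64 encoding.
    """
    raw = json.dumps(params).encode("utf-8")
    return f"{prefix}:{base64.urlsafe_b64encode(raw).decode('ascii')}"


def decode_send_data(value: str):
    """Split a header value into (prefix, params), reversing encode_send_data."""
    prefix, _, payload = value.partition(":")
    return prefix, json.loads(base64.urlsafe_b64decode(payload))


if __name__ == "__main__":
    # Hypothetical payload; the real parameter names are not given here.
    header = encode_send_data("send-url", {"URL": "https://example.com/lfs/object"})
    print(decode_send_data(header))
```

Only Workhorse interprets this header, which is consistent with the requirement above that gitaly-lfs-smudge talk to Workhorse rather than directly to Rails.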