doc/development/packages/new_format_development.md
This document guides you through adding support to GitLab for a new package management system.
For the formats that are already supported, see the Packages and registries documentation.
It is possible to add a new format with only backend changes. This guide is a high-level overview and does not cover how the code should be written. However, you can find good examples in the following merge requests:
The existing database model requires the following:
Package systems work with GitLab through the API. For example, lib/api/npm_project_packages.rb
implements the API endpoints that npm clients use. So the first step is to
add a new lib/api/your_name_project_packages.rb file with the API endpoints
that the package system client needs in order to work. Usually that means having
endpoints like:
Because packages belong to a project, a project-level endpoint (remote) is expected for uploading and downloading them. For example:
GET https://gitlab.com/api/v4/projects/<your_project_id>/packages/npm/
PUT https://gitlab.com/api/v4/projects/<your_project_id>/packages/npm/
Group-level and instance-level endpoints should only be considered after the project-level endpoint is available in production.
Packages are scoped within various levels of access, which is generally configured by setting your remote. A remote endpoint may be set at the project level, meaning when installing packages, only packages belonging to that project are visible. Alternatively, a group-level endpoint may be used to allow visibility to all packages in a given group. Lastly, an instance-level endpoint can be used to allow visibility to all packages in an entire GitLab instance.
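To make the three levels concrete, here is a small Ruby sketch that builds the remote URL a client would use at each level. The project-level shape matches the npm example above, and the prefixes match the package rate-limit routes listed later in this guide. The `remote_url` helper and the `your_name` format segment are hypothetical.

```ruby
# Hypothetical helper: builds the remote URL a client would be configured with
# at each level of the remote hierarchy. "your_name" stands in for the new format.
def remote_url(host:, scope:, package_format:, id: nil)
  base = "https://#{host}/api/v4"

  case scope
  when :project
    raise ArgumentError, "a project remote needs a project ID" if id.nil?

    "#{base}/projects/#{id}/packages/#{package_format}/"
  when :group
    raise ArgumentError, "a group remote needs a group ID" if id.nil?

    "#{base}/groups/#{id}/-/packages/#{package_format}/"
  when :instance
    # Instance-level remotes are not tied to any project or group ID.
    "#{base}/packages/#{package_format}/"
  else
    raise ArgumentError, "unknown scope: #{scope.inspect}"
  end
end

puts remote_url(host: "gitlab.example.com", scope: :project, package_format: "your_name", id: 42)
# prints https://gitlab.example.com/api/v4/projects/42/packages/your_name/
puts remote_url(host: "gitlab.example.com", scope: :instance, package_format: "your_name")
# prints https://gitlab.example.com/api/v4/packages/your_name/
```

The wider the scope, the more packages a single remote can see, which is why the instance level needs the stricter naming rules described next.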
As an MVC, we recommend beginning with a project-level endpoint. A typical iteration plan for remote hierarchies is to go from the project level, to the group level, and then to the instance level.
Using instance-level endpoints requires stricter naming conventions.
> [!note]
> Composer package naming scope is instance level.
To avoid name conflicts for instance-level endpoints, you must define a package naming convention that identifies the project the package belongs to. This generally involves using the project ID or full project path in the package name. For more information and an example, see Package recipe naming convention for instance remotes.
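One way to satisfy this is to embed the owning project's full path in every instance-level name. The Ruby sketch below assumes a hypothetical convention, `<project path with / replaced by +>/<package name>`, which is not the rule of any particular existing format. It also assumes project paths never contain `+`, so the prefix can be turned back into a project path.

```ruby
# Hypothetical naming convention for instance-level remotes:
#   "<project full path with / replaced by +>/<package name>"
# The prefix is derived from the project, so two projects cannot produce the
# same instance-level name (assuming project paths never contain "+").
def instance_package_name(project_full_path, package_name)
  "#{project_full_path.tr('/', '+')}/#{package_name}"
end

# Recovers the owning project's full path from an instance-level name,
# or returns nil when the name does not follow the convention.
def project_path_from_name(instance_name)
  prefix, package_name = instance_name.split("/", 2)
  return nil if package_name.nil? || package_name.empty? || prefix.empty?

  prefix.tr("+", "/")
end

name = instance_package_name("my-group/sub-group/my-project", "my-lib")
puts name                         # prints my-group+sub-group+my-project/my-lib
puts project_path_from_name(name) # prints my-group/sub-group/my-project
```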
For group-level and project-level endpoints, naming can be less constrained, and it is up to the group and project members to ensure that no two packages conflict by name. However, the system should prevent a user from reusing an existing name within a given scope.
In addition, naming should follow the package manager's naming conventions, enforced by a validation in the package.rb
model for that package type.
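A model-level validation could look like the plain-Ruby sketch below. `NAME_REGEX` is a placeholder for the real package manager's naming rules, and `PackageRecord` stands in for the model. The duplicate check assumes that a name may be reused for a new version, but not for the same version within a project.

```ruby
# Placeholder naming rule for a hypothetical "your_name" format:
# lowercase letters, digits, dots, underscores, and hyphens.
NAME_REGEX = /\A[a-z0-9][a-z0-9._-]*\z/

# Stand-in for the package model.
PackageRecord = Struct.new(:project_id, :name, :version, keyword_init: true)

# Returns a list of validation errors; an empty list means the package is valid.
def validation_errors(package, existing_packages)
  errors = []
  errors << "name is invalid" unless NAME_REGEX.match?(package.name.to_s)

  # Assumption: a name may be reused for a new version, but an existing
  # name/version pair may not be reused within the same project.
  duplicate = existing_packages.any? do |other|
    other.project_id == package.project_id &&
      other.name == package.name &&
      other.version == package.version
  end
  errors << "name and version already exist in this project" if duplicate

  errors
end

existing = [PackageRecord.new(project_id: 1, name: "my-lib", version: "1.0.0")]
p validation_errors(PackageRecord.new(project_id: 1, name: "my-lib", version: "1.0.0"), existing)
# prints ["name and version already exist in this project"]
```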
Logic for tasks such as creating package or package file records, or finding packages, should not live in the API file. It belongs in services and finders. Use or extend existing services and finders where possible, to keep common package logic in one place.
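The following plain-Ruby sketch shows that split. The finder only looks packages up, and the service reuses the finder before creating a record, so an API endpoint only has to parse parameters and call the service. Following the GitLab convention, both classes expose `#execute`. The class names and the in-memory array standing in for the database are hypothetical.

```ruby
# Stand-in for a package record.
SketchPackage = Struct.new(:project_id, :name, :version)

# Hypothetical finder: lookup logic only, no side effects.
class YourNamePackagesFinder
  def initialize(packages, project_id)
    @packages = packages
    @project_id = project_id
  end

  # Returns the project's packages with this name, optionally filtered by version.
  def execute(name:, version: nil)
    @packages.select do |pkg|
      pkg.project_id == @project_id && pkg.name == name &&
        (version.nil? || pkg.version == version)
    end
  end
end

# Hypothetical service: reuses the finder instead of duplicating lookup logic.
class FindOrCreateYourNamePackageService
  def initialize(packages, project_id)
    @packages = packages
    @project_id = project_id
  end

  def execute(name:, version:)
    finder = YourNamePackagesFinder.new(@packages, @project_id)
    existing = finder.execute(name: name, version: version).first
    return existing if existing

    SketchPackage.new(@project_id, name, version).tap { |pkg| @packages << pkg }
  end
end

# An endpoint body then reduces to a single service call.
store = []
FindOrCreateYourNamePackageService.new(store, 42).execute(name: "my-lib", version: "1.0.0")
FindOrCreateYourNamePackageService.new(store, 42).execute(name: "my-lib", version: "1.0.0")
puts store.size # prints 1
```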
GitLab has a packages section in its configuration file (gitlab.rb or gitlab.yml).
It applies to all package systems supported by GitLab. Usually you don't need
to add anything there.
Packages can be configured to use object storage, so your code must support it.
New package systems are integrated into GitLab as an MVC, so the first iteration should support only the bare minimum of user actions:
Required actions are all the additional requests that GitLab must handle so the corresponding package manager CLI can work properly. It could be a search feature or an endpoint providing meta information about a package. For example:
- A metadata request from npm to get the tarball URL.

For the first MVC iteration, it's recommended to stay at the project level of the remote hierarchy. Other levels can be tackled in future merge requests.
The MVC usually has two phases: analysis and implementation.
When implementing a new package manager, it is tempting to create one large merge request containing all of the endpoints and services needed to support basic usage. Instead:
During the analysis phase, the idea is to collect as much information as possible about the API used by the package system. Here are some aspects that can be useful to include:
The analysis usually takes a full milestone to complete, though it's not impossible to start the implementation in the same milestone.
In particular, the upload request can have requirements in the GitLab Workhorse project, which has a different release cycle than the Rails backend. We strongly recommend opening an issue there as soon as the upload request analysis is done, so that GitLab Workhorse is ready by the time the upload request is implemented in the Rails backend.
The implementation of the different merge requests varies between package system integrations. Contributors should take into account some important aspects of the implementation phase.
The MVC must support personal access tokens right from the start. We support two options for these tokens: OAuth and Basic Access.
OAuth authentication is already supported. You can see an example in the npm API.
Basic Access authentication
support is implemented by overriding a specific function in the API helpers, like
this example in the Conan API.
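Both options ultimately deliver a personal access token in the `Authorization` header. The sketch below assumes OAuth-style clients send `Bearer <token>` and Basic Access clients send the token as the password. `token_from_authorization` is a hypothetical helper, not the GitLab API helper that the Conan example overrides.

```ruby
# Hypothetical helper: extracts a personal access token from an Authorization
# header, for either a Bearer (OAuth-style) or a Basic Access client.
def token_from_authorization(header)
  return nil if header.nil?

  scheme, credentials = header.split(" ", 2)
  return nil if credentials.nil?

  case scheme.downcase
  when "bearer"
    credentials.strip
  when "basic"
    # Base64-decode with core Ruby ("m" directive), then split "user:token".
    decoded = credentials.strip.unpack1("m").force_encoding(Encoding::UTF_8)
    _username, password = decoded.split(":", 2)
    password
  end
end

basic = "Basic " + ["alice:glpat-example"].pack("m0")
puts token_from_authorization(basic)                  # prints glpat-example
puts token_from_authorization("Bearer glpat-example") # prints glpat-example
```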
For this authentication mechanism, keep in mind that some clients send an unauthenticated
request first, wait for the 401 Unauthorized response with the WWW-Authenticate
header, and then send an updated (authenticated) request. This case is more involved because
GitLab must handle the 401 Unauthorized response. The NuGet API
supports this case.
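The round trip can be simulated in plain Ruby. In the sketch below, `handle_request` plays the server and `client_fetch` plays a client that only sends credentials after receiving the challenge. The realm value, the token table, and both method names are hypothetical.

```ruby
# Hypothetical token table standing in for a real personal access token lookup.
VALID_TOKENS = { "alice" => "tok123" }.freeze
CHALLENGE = { "WWW-Authenticate" => 'Basic realm="example"' }.freeze

# Server side: answers 401 with a WWW-Authenticate challenge until valid
# Basic credentials are supplied.
def handle_request(headers)
  auth = headers["Authorization"]
  return [401, CHALLENGE] unless auth&.start_with?("Basic ")

  decoded = auth.delete_prefix("Basic ").unpack1("m").force_encoding(Encoding::UTF_8)
  user, token = decoded.split(":", 2)
  return [200, {}] if token && VALID_TOKENS[user] == token

  [401, CHALLENGE]
end

# Client side: sends no credentials first, then retries once with Basic
# credentials, but only if the server issued a challenge.
def client_fetch(user, token)
  status, headers = handle_request({})
  return status unless status == 401 && headers.key?("WWW-Authenticate")

  credentials = ["#{user}:#{token}"].pack("m0")
  handle_request({ "Authorization" => "Basic #{credentials}" }).first
end

puts client_fetch("alice", "tok123") # prints 200
```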
Project permissions and group permissions exist for read_package, create_package, and destroy_package. Each
endpoint should
authorize the requesting user
against the project or group before continuing.
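A standalone sketch of such a check follows. The three ability names come from this guide, but the role names and the role-to-ability table are hypothetical and do not reflect GitLab's actual permission model.

```ruby
class ForbiddenError < StandardError; end

# Illustrative only: hypothetical roles mapped to the package abilities.
ABILITIES_BY_ROLE = {
  viewer: %i[read_package],
  publisher: %i[read_package create_package],
  admin: %i[read_package create_package destroy_package]
}.freeze

Membership = Struct.new(:user, :subject, :role)

# Raises unless the user's role on the project or group grants the ability.
def authorize_package_action!(ability, user, subject, memberships)
  role = memberships.find { |m| m.user == user && m.subject == subject }&.role
  allowed = role && ABILITIES_BY_ROLE.fetch(role, []).include?(ability)
  raise ForbiddenError, "#{user} cannot #{ability} on #{subject}" unless allowed

  true
end

memberships = [Membership.new("alice", "project-1", :publisher)]
puts authorize_package_action!(:create_package, "alice", "project-1", memberships) # prints true
```

Calling the check first in every endpoint means an unauthorized request fails before any package or file record is touched.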
The current database model allows you to store a name and a version for each package.
Every time you upload a new package, you can either create a new Package record
or add files to an existing record. PackageFile should be able to store all file-related
information, like the file name, size, SHA-1 checksum, and so on.
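The create-or-append behavior and the per-file information can be sketched in plain Ruby as below. The `StoredPackage` and `StoredPackageFile` structs and their field names are assumptions standing in for the real models. The SHA-1 is computed locally only for illustration; with accelerated uploads, Workhorse supplies the checksums.

```ruby
require "digest"

# Stand-ins for the Package and PackageFile records; field names are assumptions.
StoredPackageFile = Struct.new(:file_name, :size, :file_sha1, keyword_init: true)
StoredPackage = Struct.new(:name, :version, :package_files, keyword_init: true)

# Appends a file to the existing name/version record, or creates the record first.
def add_package_file(packages, name:, version:, file_name:, content:)
  package = packages.find { |pkg| pkg.name == name && pkg.version == version }
  unless package
    package = StoredPackage.new(name: name, version: version, package_files: [])
    packages << package
  end

  package.package_files << StoredPackageFile.new(
    file_name: file_name,
    size: content.bytesize,
    file_sha1: Digest::SHA1.hexdigest(content)
  )
  package
end

packages = []
add_package_file(packages, name: "my-lib", version: "1.0.0", file_name: "my-lib-1.0.0.tgz", content: "abc")
add_package_file(packages, name: "my-lib", version: "1.0.0", file_name: "my-lib-1.0.0.pom", content: "xyz")
puts packages.size                     # prints 1
puts packages.first.package_files.size # prints 2
```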
If a package system needs to store data that only it uses,
consider creating a separate metadata model. See the packages_maven_metadata table
and Packages::Maven::Metadatum model for an example of package-specific data, and the packages_conan_file_metadata table
and Packages::Conan::FileMetadatum model for an example of package file-specific data.
If there is package specific behavior for a given package manager, add those methods to the metadata models and delegate from the package model.
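A minimal sketch of that pattern using Ruby's standard `Forwardable` module follows. In a Rails model you would more likely use `delegate ... to:`. The `YourNameMetadatum` class, its `target_platform` field, and the `installable_on?` behavior are hypothetical.

```ruby
require "forwardable"

# Hypothetical metadata model holding data that only one package format needs.
class YourNameMetadatum
  attr_reader :target_platform

  def initialize(target_platform:)
    @target_platform = target_platform
  end

  # Format-specific behavior lives on the metadata model.
  def installable_on?(platform)
    target_platform == "any" || target_platform == platform
  end
end

# Generic package model: stores the common fields and delegates the rest.
class GenericPackage
  extend Forwardable

  attr_reader :name, :version, :your_name_metadatum

  def_delegators :your_name_metadatum, :target_platform, :installable_on?

  def initialize(name:, version:, your_name_metadatum:)
    @name = name
    @version = version
    @your_name_metadatum = your_name_metadatum
  end
end

pkg = GenericPackage.new(
  name: "my-lib",
  version: "1.0.0",
  your_name_metadatum: YourNameMetadatum.new(target_platform: "linux")
)
puts pkg.installable_on?("linux") # prints true
```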
The existing package UI only displays information in the packages_packages and packages_package_files
tables. If the data stored in the metadata tables must be displayed, a ~frontend change is required.
File uploads should be handled by GitLab Workhorse using object accelerated uploads. This means that the Workhorse proxy, which checks all incoming requests to GitLab, intercepts the upload request, uploads the file, and forwards a request to the main GitLab codebase containing only the metadata and file location rather than the file itself. An overview of this process can be found in the development documentation.
In terms of code, this means a route must be added to the GitLab Workhorse project for each upload endpoint being added (instance, group, project). This merge request demonstrates adding an instance-level endpoint for Conan to Workhorse. You can also see the Maven project-level endpoint implemented in the same file.
After the route has been added, you must add an additional /authorize version of the upload endpoint to your API file.
This example
shows the additional endpoint added for Maven. The /authorize endpoint verifies and authorizes the request from Workhorse.
The regular upload endpoint, implemented below it, then consumes the metadata that Workhorse provides to
create the package record. Workhorse provides a variety of file metadata such as type, size, and different checksum formats.
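The division of work between Workhorse and Rails can be simulated as below. This is a hypothetical sketch: the hash keys, the 403/413/400 responses, and the idea that the authorize response carries a maximum size are assumptions for illustration, not the actual Workhorse contract. What it does reflect is the order of operations described above: authorize first, let Workhorse store the file, then create the package record from metadata only.

```ruby
require "digest"

# Step 1 (hypothetical): the /authorize endpoint checks permissions before
# Workhorse touches the file body. In this sketch it also returns a size limit.
def authorize_upload(user_can_upload:, max_file_size:)
  return { status: 403 } unless user_can_upload

  { status: 200, max_size: max_file_size }
end

# Step 2 (hypothetical): the upload endpoint receives only file metadata and
# creates the package record from it.
def finalize_upload(packages, name:, version:, file_metadata:)
  missing = %i[file_name size sha1 remote_path] - file_metadata.keys
  return { status: 400, error: "missing #{missing.join(', ')}" } unless missing.empty?

  packages << { name: name, version: version, file: file_metadata }
  { status: 201 }
end

# Plays the Workhorse role: authorize, store the bytes, then forward metadata.
def simulate_workhorse_upload(packages, name:, version:, file_name:, body:,
                              user_can_upload:, max_file_size:)
  auth = authorize_upload(user_can_upload: user_can_upload, max_file_size: max_file_size)
  return auth[:status] unless auth[:status] == 200
  return 413 if body.bytesize > auth[:max_size]

  metadata = {
    file_name: file_name,
    size: body.bytesize,
    sha1: Digest::SHA1.hexdigest(body),
    remote_path: "uploads/#{Digest::SHA256.hexdigest(body)}" # placeholder location
  }
  finalize_upload(packages, name: name, version: version, file_metadata: metadata)[:status]
end

store = []
puts simulate_workhorse_upload(store, name: "my-lib", version: "1.0.0", file_name: "my-lib-1.0.0.tgz",
                               body: "package bytes", user_can_upload: true, max_file_size: 1024) # prints 201
```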
For testing purposes, you may want to enable object storage in your local development environment.
File size limits in the GitLab package registry are set per format. On GitLab.com, these are typically set to 5 GB to help prevent timeout issues and abuse.
When a new package type is added to the Packages::Package model, a size limit must be added
similar to this example,
or the related test
must be updated if file size limits do not apply. A size limit can only be skipped when
the package format does not upload and store package files.
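The sketch below models a per-format limit table, plus a check that every package type has a limit unless it never stores files. The table, the exemption list, and the method names are hypothetical. The 5 GB value only mirrors the typical GitLab.com setting mentioned above.

```ruby
# Illustrative value: 5 GB (computed here as 5 * 1024**3 bytes).
FIVE_GB = 5 * 1024**3

# Hypothetical per-format limits; every format that stores files needs an entry.
MAX_FILE_SIZE_BY_TYPE = {
  "npm" => FIVE_GB,
  "maven" => FIVE_GB,
  "your_name" => FIVE_GB
}.freeze

# Hypothetical formats that never upload or store package files.
TYPES_WITHOUT_FILES = %w[index_only].freeze

# Raises KeyError for a type with no limit, so a missing entry fails loudly.
def file_size_exceeded?(package_type, size)
  limit = MAX_FILE_SIZE_BY_TYPE.fetch(package_type) do
    raise KeyError, "no size limit defined for #{package_type}"
  end
  size > limit
end

# Lists package types that store files but have no size limit yet.
def types_missing_size_limit(package_types)
  package_types - MAX_FILE_SIZE_BY_TYPE.keys - TYPES_WITHOUT_FILES
end

puts file_size_exceeded?("npm", 6 * 1024**3) # prints true
p types_missing_size_limit(%w[npm maven your_name index_only new_format]) # prints ["new_format"]
```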
Package manager clients can make rapid requests that exceed the
GitLab.com standard API rate limits.
This results in a 429 Too Many Requests error.
We have opened a set of paths to allow higher rate limits. Unless it is impossible, new package managers should follow these conventions to take advantage of the expanded package rate limit.
These route prefixes guarantee a higher rate limit:

- /api/v4/packages/
- /api/v4/projects/:project_id/packages/
- /api/v4/groups/:group_id/-/packages/
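A quick way to confirm that a new endpoint qualifies is to test its path against these prefixes. The sketch below is hypothetical: the regular expressions are one reading of the prefixes above, with `:project_id` and `:group_id` matching any single path segment, including URL-encoded project paths.

```ruby
# Hypothetical patterns for the route prefixes that get the expanded package
# rate limit. :project_id and :group_id match any single path segment.
PACKAGE_PATH_PATTERNS = [
  %r{\A/api/v4/packages/},
  %r{\A/api/v4/projects/[^/]+/packages/},
  %r{\A/api/v4/groups/[^/]+/-/packages/}
].freeze

def package_rate_limited_path?(path)
  PACKAGE_PATH_PATTERNS.any? { |pattern| pattern.match?(path) }
end

puts package_rate_limited_path?("/api/v4/projects/42/packages/npm/my-lib") # prints true
puts package_rate_limited_path?("/api/v4/projects/42/issues")              # prints false
```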
When adding support to GitLab for a new package manager, the first iteration must contain the following features. You can add the features across as many merge requests as needed, but all the features must be implemented by the time the feature flag is removed.
- Development fixtures in db/fixtures/development/26_packages.rb

While working on the MVC, contributors might find features that are not mandatory for the MVC but can provide a better user experience. It's generally a good idea to keep track of those and open issues for them.

Here are some examples:
This document only provides guidelines for implementing a package manager that matches the structure and logic already present in GitLab. The structure is intended to be extendable and flexible enough to support any package manager. However, if there is a good reason to deviate because of the constraints or needs of a given package manager, raise it in the implementation issue or merge request and discuss it there, to work towards the most efficient outcome.