rfc/20231108-registry-automatic-version-bumping.md
Issue: https://github.com/opentofu/opentofu/issues/837
> [!NOTE]
> This RFC was originally written by @RLRabinowitz and was ported from the old RFC process. It should not be used as a reference for current RFC best practices.
This proposal is part of the Homebrew-like Registry Design.
This proposal lays out how existing providers and modules would be updated, adding new versions to existing providers and modules. It is based on scheduled automatic updates that check whether there are any new versions for the providers/modules, and it also allows manual updates for the providers/modules if necessary.
In order to update a provider/module, one would simply need to update the relevant JSON file for the module/provider. Once the JSON file is updated, a process starts that eventually makes the newly-updated provider/module available. This process will be further explained in a different RFC.
So here I'll explain how the provider/module JSON files are going to be updated.
From @Yantrio: Right now the intended behavior for authors of modules/providers is as follows: a module or provider author should not need to perform any action, other than cutting a release.
As new provider and module versions are constantly being published, we'd want the version bump process to be mostly automatic. This RFC takes an approach of a simple update process.
Once an hour, via a scheduled GitHub action, the update process will begin. It will go over all the existing providers and modules, and check if there are new versions for them.
The process will end up rebuilding some of the provider and module JSON files, and pushing those changed files directly to the main branch of the registry repository.
We will go by the following heuristic in order to minimize the number of API calls we make to the GH API (specifically, the "releases" API):
- Fetch the RSS feed at `https://github.com/<NAMESPACE>/terraform-provider-<PROVIDER>/releases.atom` to get the latest tags of the repository. Then find the latest semver tag that has a `v` prefix (so, the first matching element in the RSS feed). This action is very quick, and will most likely not be susceptible to throttling by GitHub.
  - Note: without the `Authorization: Bearer <GITHUB_TOKEN>` header, you get 429 errors pretty quickly. Adding the Bearer token mitigates those throttling errors. I was not able to find those limitations in the GitHub docs, but this does not take from the REST API call pool.
  - Note: the tag name is taken from the `id` field, which is in the format `tag:github.com,2008:Repository/<REPO_ID>/<TAG_NAME>`. Other fields are not as good for resolving the tag name (`link` isn't displayed correctly if the tag name has a `+` sign for a semver build, and `title` might not represent the tag name in some specific scenarios).

In this manner, if any new release exists, then we'd find its tag in the RSS feed and attempt to rebuild the provider JSON file. The only "false-positive" cases, in which we'd attempt to rebuild the JSON file unnecessarily, would be cases where the latest tag is a pre-release tag for a release, or a tag that is never going to get a release (very unlikely; this mainly happens for old, unmaintained providers).
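To make the feed lookup concrete, here is a minimal Python sketch of it. It is an illustration only, not the registry's actual implementation: the function names `fetch_feed` and `latest_semver_tag` are hypothetical, and the semver pattern is a simplified check rather than a full semver parser.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Simplified check (assumption): vMAJOR.MINOR.PATCH with optional
# pre-release and build parts. Not a full semver parser.
SEMVER_V_TAG = re.compile(r"^v\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def fetch_feed(namespace: str, provider: str, token: str) -> str:
    """Download releases.atom, sending the Bearer token to avoid 429s."""
    url = f"https://github.com/{namespace}/terraform-provider-{provider}/releases.atom"
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def latest_semver_tag(feed_xml: str) -> Optional[str]:
    """Return the first v-prefixed semver tag in the feed, or None."""
    root = ET.fromstring(feed_xml)
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        # id format: tag:github.com,2008:Repository/<REPO_ID>/<TAG_NAME>
        _prefix, found, rest = entry_id.partition(":Repository/")
        if not found:
            continue
        _repo_id, found, tag = rest.partition("/")
        if found and SEMVER_V_TAG.match(tag):
            return tag
    return None
```

Reading the tag from `id` rather than `link` or `title` keeps tags with a `+` build suffix intact, which is the reason given in the note above.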
There are a couple of advantages to getting the tags from the RSS feed, as compared to the `git ls-remote` git command:
- With `git ls-remote` we only get the tag names, without any metadata like the time of creation. We could only rely on the semver itself to attempt to guess which tag was created when. That's not as good because:
However, the RSS feed of releases.atom is not documented well, and its behaviour might change in the future. For example, there's no documentation regarding its limitations and quotas, or regarding using a bearer token.
We overwrite the entire content of `versions` in the provider JSON file. We will only keep `repository` as-is, if it exists.
- `version` - Will be the release's tag, with the prefix `v` removed
- `protocols` - Will be taken from the `*_manifest.json` artifact of the release. Otherwise, it would default to `["5.0"]`
- `shasums_url` - Will be the download URL of the `*_SHA256SUMS` artifact
- `shasums_signature_url` - Will be the download URL of the `*_SHA256SUMS.sig` artifact
- `targets` - A target will be created per each release artifact in format `*_<OS>_<ARCH>.zip`
  - `os` and `arch` will be taken from the file name of the release artifact
  - `filename` and `download_url` will be taken from the release artifact's information
  - `shasum` can be taken from the `_SHA256SUMS` file, cross-referenced with the current release artifact

For modules, we will simply list all tags, using `git ls-remote`, and pick the semver tags (with or without a `v` prefix).
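For the provider `targets` entries described above, here is a minimal Python sketch of how they could be derived. It assumes each release asset is a dict with `name` and `browser_download_url` keys (the field names the GitHub releases API returns for assets) and that the SHA256SUMS file uses the usual `<hash>  <filename>` line layout. The function names are hypothetical.

```python
import re
from typing import Dict, List

# Matches artifacts named *_<OS>_<ARCH>.zip and captures OS and ARCH.
ARTIFACT_RE = re.compile(r"^.+_(?P<os>[^_]+)_(?P<arch>[^_]+)\.zip$")


def parse_shasums(text: str) -> Dict[str, str]:
    """Map file name -> sha256 from the contents of a SHA256SUMS file."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            # sha256sum marks binary-mode entries with a leading '*'
            sums[parts[1].lstrip("*")] = parts[0]
    return sums


def build_targets(assets: List[dict], shasums_text: str) -> List[dict]:
    """Create one target per *_<OS>_<ARCH>.zip release artifact."""
    sums = parse_shasums(shasums_text)
    targets = []
    for asset in assets:
        match = ARTIFACT_RE.match(asset["name"])
        if match is None:
            continue  # manifest, SHA256SUMS, signature, and so on
        targets.append({
            "os": match.group("os"),
            "arch": match.group("arch"),
            "filename": asset["name"],
            "download_url": asset["browser_download_url"],
            # Empty string when the artifact is missing from SHA256SUMS
            # (an assumption; the RFC does not say how to handle that case).
            "shasum": sums.get(asset["name"], ""),
        })
    return targets
```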
Whether there's any difference compared to the existing JSON file or not, the process will build the JSON file fresh from the discovered semver tags. Building the file is extremely fast, and building it fresh would make sure that any new version (even new patch versions for a prior major release) would be added to the JSON file
Very simple. Each version should simply be a semver tag
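As a rough illustration of the module flow, the sketch below lists tags with `git ls-remote --tags` and keeps the semver ones, with or without a `v` prefix. The function names are hypothetical, and the semver pattern is a simplified check rather than a full parser.

```python
import re
import subprocess
from typing import List

# Simplified check (assumption): optional v, then MAJOR.MINOR.PATCH with
# optional pre-release and build parts.
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def list_remote_tags(repo_url: str) -> str:
    """Return the raw output of `git ls-remote --tags <repo_url>`."""
    result = subprocess.run(
        ["git", "ls-remote", "--tags", repo_url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def semver_tags_from_ls_remote(output: str) -> List[str]:
    """Pick the semver tags out of `git ls-remote --tags` output."""
    tags: List[str] = []
    for line in output.splitlines():
        _sha, _tab, ref = line.partition("\t")
        if not ref.startswith("refs/tags/"):
            continue
        tag = ref[len("refs/tags/"):]
        if tag.endswith("^{}"):
            continue  # peeled entry of an annotated tag, a duplicate
        if SEMVER_TAG.match(tag) and tag not in tags:
            tags.append(tag)
    return tags
```

Because no GitHub API is involved, rebuilding the full tag list on every run stays cheap.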
This approach is very simple, and makes sure the update process is fast and not so prone to GH API throttling. We will only call the GH API if the latest semver tag of a provider does not yet exist in its JSON file, and no GH API calls will be made at all for the modules.
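The check that gates the GH API calls for providers can be sketched as follows. This assumes the provider JSON has a top-level `versions` list whose entries carry a `version` field without the `v` prefix; the exact file layout and the name `needs_rebuild` are assumptions for illustration.

```python
def needs_rebuild(provider_json: dict, latest_tag: str) -> bool:
    """True when the latest tag is not yet listed in the provider JSON."""
    # Versions are stored with the leading "v" removed.
    version = latest_tag[1:] if latest_tag.startswith("v") else latest_tag
    known = {entry["version"] for entry in provider_json.get("versions", [])}
    return version not in known
```

Only when this returns `True` would the releases API be queried to rebuild the file.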
We would allow manually updating a provider's or module's JSON file via a PR. This would not be necessary in most cases, as the automatic version bump process should add new versions once an hour for all modules and providers. However, it might be necessary, for example, if a provider's already released artifacts have been changed and re-uploaded.
In this case, one could open a Pull Request to the registry repository, with the necessary changes. The core maintainers of OpenTofu would manually go over the Pull Request, and decide whether the change is OK and legitimate and can be put into the registry
This approach tries making the automatic version bumps for providers and modules as simple as possible, without the need to start working around GH API throttling issues and errors. The update process is pretty quick, and re-running it (if necessary) should be very easy
However, as the registry scales with more providers, this approach might not scale well if more providers are added that have "stranded" semver tags that are never going to be actually released. If there are more providers with such tags, this would mean that the update process would make more GH API calls
This is probably not much of a concern, as this is a rare case, and we can work around it if we'd like (remove those providers from the auto-update process, for example)
From @Yantrio: For pre-releases: We have to bear in mind that some providers are already heavily using GitHub Release pre-releases and their users expect these pre-releases to be consumed. We've hit this issue with our current registry implementation here: https://github.com/kbst/terraform-provider-kustomization/issues/240
We should be careful here when discussing this though. This is with regards to GitHub Releases being marked as pre-release, and not the semver version having a suffix to mark it as a pre-release.