rfc/20251027-provider-registry-hashes.md
Related Issues:
Locking providers is currently a necessary, and usually painless process for the standard use case. There exist several advanced settings which alter the provider installation behavior enough to cause speed-bumps in this workflow.
OpenTofu locks providers to ensure that the provider installed has not been tampered with or adulterated either in transport or after installation. Locking is done with two different hashing mechanisms, zh and h1.
Zip Hashing is the older mechanism which consists of a sha256sum calculation of the entire provider release bundle (zip). In the tofu registry, this hash is provided in multiple ways and can be signed with a GPG key for added security.
This hash only protects the provider during initial download and extraction during the installation process, but is useless to check the unzipped/installed provider.
Hash Scheme 1 is a newer mechanism which uses go's sumdb Hash1 algorithm and DirHash function to provide a snapshot of what is expected post-installation. This ensures that any modification to the provider post-installation is detected and reported.
The population of the .terraform.lock.hcl file is managed by tofu init. The registry is queried for acceptable hashes for the providers present in the configuration, as well as for the corresponding download links. tofu init then downloads and extracts the providers to .terraform/providers. It records all zh's in the registry's response, as well as the single h1 hash for the current platform.
On the developer's machine, subsequent calls to tofu in the project directory will use the lockfile to ensure the locally installed providers have not been tampered with.
When running tofu in automation (Github actions, GitLab workflows, TACOs, etc...), tofu init will re-download the providers from the registry and will require at least one ZH to match. If the architecture does not match, a new h1 hash will be appended to the lockfile for each provider.
In automation, subsequent calls to tofu in the project directory will use the lockfile to ensure the locally installed providers have not been tampered with.
As mentioned above: If the architecture between a developer's machine and the automation system does not match (A developer running on a mac vs their CI/CD platform being a linux machine), a new h1 entry will be added to the lockfile as the provider is installed. For security reasons, some automation systems consider the lockfile read-only and will interpret this change to the lockfile as a blocking issue and refuse to continue. This is enforced via tofu init -lockfile=readonly.
This is usually fixed by running tofu providers lock on the developer's machine (sometimes as a git-hook) with additional architectures listed and committing the resulting lock file.
For efficiency, many people rely on a internal Network or Filesystem mirror for providers. This usually takes the form of an internal service or pre-built cache.
When utilizing a network mirror, tofu init will only fetch the download link and zh for the current platform to store in the lockfile. If this is during a developer's first tofu init or tofu init -upgrade, the lockfile will not contain anything other than the zh and h1 for the developer's platform and will fail on systems with a different architecture. Again, this can be addressed with tofu providers lock.
Filesystem mirrors can also be configured to contain only the pre-extracted providers. This has some significant performance advantages, but again is a stumbling block when multiple architectures are at play.
A developer may have run init on a standard machine, populating all of the zh entries and a single h1 entry for a different platform. An automation system with a filesystem mirror would not have any matching hash entries and would therefore fail to init.
The global plugin cache (TF_PLUGIN_CACHE_DIR) suffers from the same issue.
The only way to work around this is to allow TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE (potentially dangerous) or to use tofu providers lock on every provider upgrade/installation.
tofu providers lock can be given multiple platforms and will update the lockfile for each
External tools would like to manage the provider lock file
Technical limitations:
Given that we maintain the registry code and data in the OpenTofu org, we can add the additional h1 hashes to the registry metadata (more notes on this below) and extend our api responses to include that data. Once the registry is able to serve this new information, OpenTofu could optionally trust this new source of hashes.
The current registry trust chain is as follows for a given provider release bundle:
"signing_keys" signs the list of zh hashes linked to via "shasums_url" and "shasums_signature_url"
"shasum" exists within the list of signing keys given by "shasums_url", which has been verified in the previous step"shasum" matches the calculated value for the downloaded provider release bundleWe propose amending that trust chain:
This means that on tofu init, all applicable zh and h1 hashes will be stored in the dependency lock file (assuming all validation passes).
If you are:
Your current workflow will involve some form of tofu providers lock -platform=... -platform=... for every platform you might interact with, and committing the lockfile. Alternatively, you have disabled provider locking with tofu init -upgrade or similar (not recommended).
In practice, many users just run tofu init and are confused/annoyed when the plan they produce does not run correctly on other environments (different architecture).
With these proposed changes, that stumbling block goes away for this common scenario. tofu init will produce a fully functional lockfile across all supported platforms (depending on provider releases), no second step required.
For users that want to provider additional safety guarantees, they can also use tofu providers lock, which will keep the same functionality that it has today. It will have the added benefit that it will also compare the
h1 registry data to the downloaded archive and ensure no local corruption has occurred during the bundle extraction.
The largest technical hurdle with this approach is providing the h1 hashes in the registry v1 api response data. In practice, this means that we need to calculate this every time a new provider release is added to the registry, as well as back-filling the existing release data.
Calculating the h1 and zh values for a given provider binary is quite simple and relies entirely on standard library calls, a dozen lines of code. Amending the registry codebase to support this and writing a backfill tool is not terribly difficult, I have prototyped that in this commit for reference and to gather performance metrics.
Right now, we scan every single provider the registry knows about every 15 minutes and update the registry data if there are new version. This process (Bump Versions Github Action) takes ~4 minutes on average and we want to keep it in that same ballpark so we can maintain our bump rate even as the registry grows. For most providers this does not add any significant time, the entirety of the k8s provider (70 releases) took under 4 minutes to backfill. 3.5 min to download interleaved with 56s hashing, corresponding to under 4s added per "medium" sized provider release. The AWS provider is much larger and therefore slower, but in practice we can still bump a single release with hashing in under 20s.
We have also run the Bump Versions action in a different Github repo for a week with the patch linked above. It has not seen any dramatic slowdowns compared to the main repository.
The only scenario in which this cause a hiccup in bump times would be if a massive provider like AWS with thousands of release files was added via the standard submission pipeline. If we are concerned about that, we could modify the "Submit Provider" command to generate a more complete release file instead of relying on the bump process to pick it up later.
As with any API design, there are multiple ways to accomplish the same thing. In practice, our requirements are fairly clear given the above discussion.
OpenTofu fetches the json data at ${registryBaseURL}/${providerNamespace}/${providerType}/${providerVersion}/download/${OS}/${ARCH}. This includes everything that is necessary to download and check a release bundle. We propose adding the following optional field to this structure:
{
"packages": {
"${platform}": {
"hashes": [
"zh:${hashvalue}",
"h1:${hashvalue}"
],
"package_size": ${calculated_size_bytes},
}
}
}
This describes what additional hashes are valid for a provider release bundle across all platforms. It also leaves room to add additional types of hashes in the future.
By providing the "package_size" field, we also can reduce the likelihood of hash collision attacks.
Once Version Bump process above has been patched to support calculating this field for new versions, we can begin the backport process.
This process should:
Improving the performance of tofu providers lock
Have tofu init download and h1 checksum all the provider release platforms
The above solution works well for scenarios where the main source of truth is the OpenTofu registry. However, there are environments in which the lockfile is entirely generated from provider mirrors.
Today, OpenTofu only trusts a single locally verified hash (platform specific) from provider mirrors. This means that the lockfile generated in this scenario is incredibly sparse and subject to the speed-bumps described above.
Given that provider mirrors usually are part of a secure and managed internal environment, we should add a config option to the .tofurc file to trust all platform hashes reported by mirrors if we have successfully downloaded and verified the package for the current platform.
The other tweak we would likely need to make is extending tofu providers mirror to record the zh's along with the already present h1 hashes.
Although not common, we do occasionally encounter issues with our registry host (#3508). Users who have TF_PLUGIN_CACHE_DIR setup have found that their cache can not be used without the registry, even if the lockfile is fully populated.
We propose the following changes to the provider package installer (from @apparentlymart):
Before we make any request to the registry, we'd check whether the dependency lock file selection corresponds to a package already in the global cache directory. If so, we just install from that cache directory and don't interact with the registry or change the checksums in the dependency lock file at all, and the rest of the steps are skipped.
If the cache contents could not be immediately verified with the hashes in the dependency lock file then OpenTofu would make the registry API request, add new hashes to the lock file as appropriate, and re-validate the cache contents against the hashes returned from the registry.
If this succeeds then we can skip fetching the package itself from the registry and just install from the cache. Between this and the previous step we will use the cache if either the lock file or the registry can provide a hash that matches the cache directory.
If none of the hashes we can find in either the lock file or the registry match the cache directory then we fall back to fetching the package from the URL the registry indicated, just as we would if the cache directory were not present at all.
This also has the benefit of reducing the number of calls to the OpenTofu registry when tofu init is run massively in parallel, i.e Terragrunt and other automation tools.