notes/pep508.md
This document collects my notes on PEP 508.
A package manager for Python is naturally constrained by the metadata format that the community embraces. While a package manager can in theory add a custom metadata format like poetry did, it limits the ability to publish such packages and prevents inter-tool compatibility.
The metadata format in the Python ecosystem is the pyproject.toml and the most significant
limitation is the dependency specification from PEP 508.
For a package manager the most important piece of information are dependencies. In Python there are two dependency sections, both are very limited in that they are lists of strings.
For instance project.dependencies is an array where each item is a PEP 508 string. This
means that within that array, only information can be carried that PEP 508 supports. If a
package manager requires additional information (temporary or permanently) it does not find
place in the array for it.
Example:
[project]
dependencies = [
'Flask>=2.0',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]
If additional information wants to be stored with either one of those dependencies, the only reasonable correlation today would be to either duplicate it elsewhere, or to index by number.
[tool.rye.dependencies_meta.0]
extra_information = 42
No matter how, this is always a challenge. In Cargo.toml the information is both kept as
array of objects, but additionally "merging" sections are provided. This means that platform
and target specific dependencies are merged in at a higher level. Additionally if a dependency
is contained more than once, it can be renamed. For Python the Cargo.toml format does not make
too much sense, but some alternatives might work:
[project.dependencies]
Flask = { version = ">=2.0" }
[project.dependencies.more-itertools]
match = [
{ version=">=4.0.0,<6.0.0", python_version = "<=2.7" },
{ version=">=4.0.0", python_version = ">2.7" }
]
In that case at all times an object is placed in the format where extra keys could be added:
[project.dependencies]
Flask = { version = ">=2.0", extra_information = 42 }
That however would be a significant departure from where the format is today. A pragmatic
alternative could be to utilize the URL portion of the dependency to funnel additional
information into the package. Unfortunately PyPI will refuse to accept any PEP 508 marker
that contains a URL. Additionally URLs also have restrictions in that they cannot be
combined with version markers. The former issue might not be an issue as such a
dependency could be rewritten before publishing into an automated format.
cargo does something like this today already with Cargo.toml which is rewritten for
publishing.
Given that PEP 508 cannot really be amended without causing a massive rift in the ecosystem the solution has to come down to encoding it into something that is already part of the structure. My current thinking is that the format should permit objects where the strings are today and to re-write on publish. So this format might become possible:
[project]
dependencies = [
{ version = 'Flask@>=2.0', extra_information = 42 },
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]
# turns into
[project]
dependencies = [
'Flask@>=2.0',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]
[project.dependencies_meta.0]
extra_information = 42
Unfortunately that would still cause issues for tools that locally interpret pyproject.toml
files.
Alternatively a correlation marker could be added. This could be done by inventing a new marker (requires double quoting):
[project]
dependencies = [
'Flask@>=2.0 ; "id" == "x"',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]
[project.dependencies_meta.x]
extra_information = 42
If the correlation marker is missing, the package name would be assumed (in this case Flask).
In a very similar manner local dependencies do not really work in that world yet. Personally
I believe this is similar to the issue above as we have no reasonable way to encode both the
version and the path location into one place. Contrast this to Cargo.toml:
[dependencies]
foo = { version = "1.0", path = "../crates" }
Upon publishing, cargo removes the path marker. Before publishing, the path takes
precedence over the lookup from the index.
If one were to be able to encode a correlation marker maybe this format could work:
[project]
dependencies = [
'Flask@>=2.0 ; "id" == "flask"',
'more-itertools>=4.0.0,<6.0.0;python_version<="2.7"',
'more-itertools>=4.0.0;python_version>"2.7"',
]
[project.dependencies_meta.flask]
path = "../packages/flask"
(Again, if there is no marker, the package name could be directly used).