How to use vers: with pkg:? #386

Open
jaimergp opened this issue Feb 3, 2025 · 11 comments

@jaimergp

jaimergp commented Feb 3, 2025

@pombredanne I might have missed something obvious, but how is vers combined with purl? That is, how do I describe a version range instead of a specific version with a purl?

These comments https://github.com/package-url/purl-spec/pull/139/files#r811475168 in the PR are touching on this point as well but I don't see anything mentioned in the final specs.

Originally posted by @tobiasdiez in #66


Echoing this comment ^ by @tobiasdiez, what's the current recommendation? The prior conversation hinted that putting the vers string in the URL would mean characters like > have to be %-encoded, which hinders readability.

So is the current recommendation to just AND the two URIs together? For example, to refer to the PyPI requests library with versions >=3,<4 do I need to spell it like ["pkg:pypi/requests", "vers:pypi/>=3,<4"]?
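To make the "AND the two URIs together" spelling concrete, here is a minimal sketch of carrying the two strings as one unit. The `VersionedRequirement` class and its attribute names are hypothetical, not part of purl or vers, and the constraint text after the scheme is treated as an opaque string rather than parsed.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VersionedRequirement:
    """Hypothetical pairing of a versionless purl with a vers range."""

    purl: str  # e.g. "pkg:pypi/requests"
    vers: str  # e.g. "vers:pypi/>=3,<4"

    def __post_init__(self) -> None:
        # Only the URI schemes are checked; nothing else is validated.
        if not self.purl.startswith("pkg:"):
            raise ValueError(f"expected a pkg: URI, got {self.purl!r}")
        if not self.vers.startswith("vers:"):
            raise ValueError(f"expected a vers: URI, got {self.vers!r}")

    @property
    def purl_type(self) -> str:
        # The purl type is the first path segment after "pkg:".
        return self.purl[len("pkg:"):].split("/", 1)[0]

    @property
    def vers_scheme(self) -> str:
        # The versioning scheme is the segment after "vers:".
        return self.vers[len("vers:"):].split("/", 1)[0]

    def as_list(self) -> List[str]:
        # The two-element spelling used in the question above.
        return [self.purl, self.vers]


req = VersionedRequirement("pkg:pypi/requests", "vers:pypi/>=3,<4")
print(req.as_list())
```

The sketch deliberately does not require `purl_type` and `vers_scheme` to match, since nothing in the two-URI spelling forces them to be the same.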

It would be nice if we could repurpose the @version field in pkg instead. With no operators, it's a version (implied exact equality). If there's one or more operators, then it's a range. The namespace would be the same as the pkg if unspecified, or a generic one if necessary.

Or if backwards compatibility is a concern, is this a case for a new protocol dep: (or req:?) where pkg and vers are merged to exhibit the proposed behavior?

@matt-phylum
Contributor

It sounds nice but there are some potential issues:

I've seen, because of software bugs or user error, PURLs that look like pkg:pypi/requests@>=3,<4. >=3,<4 is not a valid version number for PyPA. It might be possible to imply vers:pypi in this case, but not all Python dependencies are going to be vers:pypi. Poetry has its own specifier format which is slightly different from PyPA's, and not just in how it is formatted. This could mean that for pkg:pypi this feature is not supported. However, even for package types where today there is only one commonly used version range specification format, that doesn't mean there won't be another one in the future, so it could be risky to have an implicit range conversion for any format because there's no way to tell what people were doing at the time that the PURL was written.

I would be surprised if, for any single package type, it were really ambiguous whether a string refers to a specific version or to a range (that is, something that can contain versions other than the string itself). But because there are so many different formats, I would also be surprised if there weren't cases where what's a range in one format is a version in another (e.g. using "-" for a range instead of as a semver separator). So the rules for determining whether a version is actually a range would need to be package-type dependent, which could complicate the spec and implementations.

Having PURLs with version ranges can pose problems for use cases where ranges are not allowed. I work on software which deals with PURLs for specific package types that have specific versions, and we check for versions which are invalid because we know the rules for those package types and we get stuff like ranges where we're expecting specific versions. However, I think general SBOM tools etc will just expect that the PURL is correct and at most canonicalize it. This may lead to problems where dependencies are under-specified or they are specified multiple times because in one place it is a range and in another it is a version, or in one place it is a range with one syntax and in another it is a range with equivalent but different syntax.

If it's explicit that you write pkg:pypi/requests@vers:pypi%2F>=3,<4 or whatever, that avoids the first two problems about ambiguity, but doesn't avoid potential issues with software that is expecting specific versions. However, it'd be much easier to check for vers: in software that needs to detect ranges, without having to implement different rules for all the package types, so it could be that such tools just need to start doing that.

Although, is it true that vers: version ranges cannot be valid specific version numbers? pkg:deb versions can contain colons. Maybe putting vers: in the version field of pkg: is also problematic.
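As a rough illustration of the "check for vers: in the version field" idea, the sketch below pulls the version component out of a purl string and tests its prefix. The function names (`purl_version`, `looks_like_vers_range`, `embed_vers`) are made up for this example, the parsing is a simplification rather than a full purl parser, and encoding every reserved character is an assumption about how the range would be embedded.

```python
from typing import Optional
from urllib.parse import quote, unquote


def purl_version(purl: str) -> Optional[str]:
    """Return the percent-decoded version component, or None if absent."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a purl: {purl!r}")
    remainder = purl[len("pkg:"):]
    remainder = remainder.split("#", 1)[0]  # drop the subpath
    remainder = remainder.split("?", 1)[0]  # drop the qualifiers
    if "@" not in remainder:
        return None
    return unquote(remainder.rsplit("@", 1)[1])


def looks_like_vers_range(purl: str) -> bool:
    """Prefix check only; this cannot rule out the ambiguity raised above."""
    version = purl_version(purl)
    return version is not None and version.startswith("vers:")


def embed_vers(versionless_purl: str, vers: str) -> str:
    """Percent-encode a vers string into the version slot (hypothetical)."""
    return f"{versionless_purl}@{quote(vers, safe='')}"


print(looks_like_vers_range("pkg:pypi/requests@vers:pypi%2F>=3,<4"))
print(looks_like_vers_range("pkg:pypi/requests@2.31.0"))
print(embed_vers("pkg:pypi/requests", "vers:pypi/>=3,<4"))
```

The last line shows the readability cost mentioned earlier in the thread: once every reserved character is encoded, the range is hard to read by eye.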

@rgommers

rgommers commented Feb 3, 2025

These comments https://github.com/package-url/purl-spec/pull/139/files#r811475168

The most important comment on this topic seems to be slightly higher up: https://github.com/package-url/purl-spec/pull/139/files#discussion_r809185075 contains an expanded version of the three options with pros and cons:

  • A - EITHER fold back vers entirely in purl [...]
  • B - OR instead adopt most of the purl semantics with a vers URI scheme. [...]
  • C - OR instead keep things as they are now. If there is a new versioning scheme that is discovered, either the package(s) using it merit their very own Package URL type or we register a new versioning scheme here

@jaimergp is asking about A, which is not what was chosen. The problems here are (1) that it's not very clear if B or C was chosen, and (2) if A wasn't possible due to backwards compatibility (which makes sense), why did vers: not end up as a superset of pkg: so that it can be migrated to? Now there seem to be two schemes that both have some feature that may be needed, but it does not seem possible to use a single design (purl, vers, or something else) that can describe both individual versions and version ranges.

@jaimergp
Author

jaimergp commented Feb 4, 2025

Although, is it true that vers: version ranges cannot be valid specific version numbers?

Some version syntaxes do allow things like == for exact equality, but it changes from ecosystem to ecosystem, so we cannot generalize. Since most version schemes can be extended arbitrarily (e.g. a package versioned as 0.1.0 can also have a post release 0.1.0.1), a range will always include an infinite number of versions, no matter how specific. So the only way to identify a single version is to use @version with exact equality; i.e. version MUST be 0.1.0.
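A toy comparison shows the point about ranges never pinning a single version. This is only a sketch: `parse_release` and `in_range` are hypothetical helpers that handle dotted integers and nothing else, so they are not an implementation of any ecosystem's real version rules.

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a dotted numeric version; trailing zeros are dropped so 0.1 == 0.1.0."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def in_range(version: str, lower: str, upper: str) -> bool:
    """True if lower <= version < upper under the toy ordering above."""
    return parse_release(lower) <= parse_release(version) < parse_release(upper)


# Both the release and a longer variant fall inside >=0.1.0,<0.2 ...
print(in_range("0.1.0", "0.1.0", "0.2"))
print(in_range("0.1.0.1", "0.1.0", "0.2"))
# ... and even a much narrower range still admits longer versions.
print(in_range("0.1.0.0.0.5", "0.1.0", "0.1.0.1"))
# Exact equality is the only check that singles out 0.1.0.
print(parse_release("0.1.0.1") == parse_release("0.1.0"))
```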

That's why I was trying to gather more information, but mostly to confirm that if we want a single URL scheme for "PURLs with versions or version ranges" then we need another scheme?

@matt-phylum
Contributor

I mean vers: could be the start of a version number if the package manager allows version numbers matching ^[a-z]+:[a-z]+/, in which case it would not be possible to put vers: in the version component of a PURL to indicate that the version is vers because it would be ambiguous for such a package type even when explicitly written out.
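The pattern in the comment above can be tried directly. In this sketch the regular expression is taken from the comment, while `could_be_mistaken_for_vers` and the `epoch:branch/1.0` string are invented for illustration; no claim is made that any real package type produces such versions.

```python
import re

# Pattern quoted in the comment above: scheme-like prefix, colon, word, slash.
VERS_LIKE = re.compile(r"^[a-z]+:[a-z]+/")


def could_be_mistaken_for_vers(version: str) -> bool:
    """True if a version string has the same leading shape as a vers URI."""
    return VERS_LIKE.match(version) is not None


print(could_be_mistaken_for_vers("vers:pypi/>=3,<4"))  # an actual vers string
print(could_be_mistaken_for_vers("epoch:branch/1.0"))  # hypothetical version
print(could_be_mistaken_for_vers("1:7.50.3-1"))        # colon, but digits first
print(could_be_mistaken_for_vers("2.31.0"))
```

If a package type's version grammar admits strings of the second kind, a literal `vers:` prefix in the version slot cannot be told apart from a plain version by shape alone.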

@jaimergp
Author

jaimergp commented Feb 5, 2025

Ok, I see it now, thanks for the explanation. So in terms of next steps, what would be more desirable?

  • Push for pkg:pypi/requests@vers:pypi%2F>=3,<4, or an alternative "vers" operator?
  • Push for a new scheme like dep: where the only difference with pkg is that the @ field is a vers range.

@jloehel

jloehel commented Feb 5, 2025

One question. A PURL should identify exactly one package, right? The package which the package manager selected during the build/test/deployment, right? If I build/test/deploy again, a new SBOM should be generated, right? Why would I want to use vers, which can also specify a range of versions?

@jaimergp
Author

jaimergp commented Feb 6, 2025

A PURL should identify exactly one package, right?

The version field seems to be optional, so a PURL can either identify all versions or only one version. With vers support you could also select some of them.

Even if we only consider @version (no ranges), some ecosystems can have several artifacts per version (e.g. architectures, different GPU/CPU backends) or even dependency trees (e.g. extras in PyPI). So I'm not sure if the "one package" promise is universally guaranteed across ecosystems. You need to start using qualifiers to uniquely identify an artifact (e.g. ?file_name).

What's a "package" after all? The source tarball, the installable artifact...

For context, the use case I have for vers-ready PURLs is annotating external dependencies that are not available in the PyPI ecosystems but are nonetheless necessary to build a package. Think compilers, build tools, system libraries, etc. So I don't need to specify a single package, just a subset of them that are compatible with my project.

@jloehel

jloehel commented Feb 6, 2025

A PURL should identify exactly one package, right?

The version field seems to be optional, so a PURL can either identify all versions or only one version. With vers support you could also select some of them.

Okay, fair enough. You are right. But the PURL specifier represents exactly one component in an SPDX or CycloneDX file, right? Shouldn't a PURL then address one exact component? The software is not using multiple versions of a dependency at the same time. Otherwise, what benefit does a PURL have? In the end I want to download an archive and check if the LICENSE is correct and if the CVE that matched is really reproducible on my system.

Even if we only consider @version (no ranges), some ecosystems can have several artifacts per version (e.g. architectures, different GPU/CPU backends) or even dependency trees (e.g. extras in PyPI). So I'm not sure if the "one package" promise is universally guaranteed across ecosystems. You need to start using qualifiers to uniquely identify an artifact (e.g. ?file_name).

Okay, the arch should be specified if necessary. The rest should be noarch. You build/run/test it on a specific arch. Dependency trees are the job of the SBOM, not of PURL.

What's a "package" after all? The source tarball, the installable artifact...

The archive or binary blob the package manager selects, downloads and installs.

For context, the use case I have for vers-ready PURLs is annotating external dependencies that are not available in the PyPI ecosystems but are nonetheless necessary to build a package. Think compilers, build tools, system libraries, etc. So I don't need to specify a single package, just a subset of them that are compatible with my project.

Can you explain this in a little more detail, please? Do you use PURLs in your own package manager?

@jaimergp
Author

jaimergp commented Feb 6, 2025

Can you explain this in a little more detail, please? Do you use PURLs in your own package manager?

We are proposing PURLs as a way to identify external dependencies in the PyPI ecosystem. See the PEP draft and, if you need some "light" reading, this forum thread. The relevant section of the PEP is "Specifying external dependencies".

PURLs have also come up in the conda ecosystem (see draft CEP) to allow unequivocal mappings between PyPI and conda-forge.

The package managers wouldn't handle PURLs directly as input specifiers, but they might rely on mappings created with the underlying PURL identifier metadata as a "foreign key" between in principle unrelated packaging sources.

I'm happy to provide more specific details if you want, but that's the basic overview.

@rgommers

rgommers commented Feb 6, 2025

But the PURL specifier represents exactly one component in a SPDX or CycloneDX file, right?

@jloehel I think your questions stem from this assumption, however if you look at the Problem-Solution content of the README of this repo, it seems quite clear that PURL is for a much wider range of tools and use cases than only SBOMs. For example: "A purl is a URL string used to identify and locate a software package in a mostly universal and uniform way across programming languages, package managers, packaging conventions, tools, APIs and databases."

@jloehel

jloehel commented Feb 7, 2025

@jaimergp Thank you very much, I will read it.

But the PURL specifier represents exactly one component in a SPDX or CycloneDX file, right?

@jloehel I think your questions stem from this assumption, however if you look at the Problem-Solution content of the README of this repo, it seems quite clear that PURL is for a much wider range of tools and use cases than only SBOMs. For example: "A purl is a URL string used to identify and locate a software package in a mostly universal and uniform way across programming languages, package managers, packaging conventions, tools, APIs and databases."

The definition says a software package. I guess it needs to be clearer what a software package means. For me that's a specific archive which I can download and install through a package manager. A single resource. A tarball or a wheel from PyPI, for example. But if I can locate multiple resources with a single PURL, that should be made clearer. How am I supposed to locate the package/resource behind pkg:generic/zlib? Why is it necessary to give up DNS and introduce a second mapping from pypi to pypi.org?
