File name reuse issue #12724

Open
GergelyKalmar opened this issue Dec 27, 2022 · 4 comments

@GergelyKalmar

Describe the bug
I'm receiving the "This filename has already been used" error for a fresh project (mindlab) when uploading the source tar file for v1.0.0 (mindlab-1.0.0.tar.gz). Note that I'm not aware of any previous use of this project name, and I don't see anything under the 'Security history' section of the project either. From my perspective, this is just a new project like any other.

It's an awfully weird user experience. There seems to be no way to identify which filenames are already "taken" for a project I created myself, so I don't know when my PyPI uploader CI/CD jobs will break because of a random conflict from the invisible past.

I understand that filename reuse was prohibited to prevent an attacker from replacing an existing package with a malicious one. However, prohibiting filename reuse over past "phantom" projects while still allowing the upload of a (potentially malicious) zipped replacement source doesn't really prevent that. The correct approach against this attack vector is simply to use package hashes, in which case it should not matter whether a source file existed in the past.
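
To make the hash-based approach concrete, here is a minimal sketch of checking a downloaded file against a pinned SHA-256 digest. The function names `sha256_of` and `matches_pinned_hash` are made up for this illustration and are not part of pip or twine; the sketch only shows the idea that the file's contents, not its name, decide whether it is trusted.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path: Path, pinned_sha256: str) -> bool:
    """True only if the file's digest equals the pinned digest."""
    # Hypothetical helper: the filename plays no role in the decision.
    return hmac.compare_digest(sha256_of(path), pinned_sha256.lower())
```

With a check like this, a re-uploaded mindlab-1.0.0.tar.gz with different contents would simply fail verification. pip offers the same kind of check through its hash-checking mode (`pip install --require-hashes -r requirements.txt`, with `--hash=sha256:...` entries in the requirements file).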

Expected behavior
The source tar file uploads.

To Reproduce
twine upload mindlab-1.0.0.tar.gz

GergelyKalmar added the "bug 🐛" and "requires triaging" labels on Dec 27, 2022
di added the "feature request" label and removed the "requires triaging" and "bug 🐛" labels on Dec 28, 2022
@di
Member

di commented Dec 28, 2022

Sorry about that. FYI, both mindlab-1.0.0.tar.gz and mindlab-0.0.1.tar.gz were previously used (circa 2018) by a previous owner of the project.

> There seems to be no way to identify which filenames are already "taken" for a project I created myself, so I don't know when my PyPI uploader CI/CD jobs will break because of a random conflict from the invisible past.

Where would you expect to see this information if it were available?

> The correct approach against this attack vector is simply to use package hashes, in which case it should not matter whether a source file existed in the past.

I agree. In reality, however, very few Python users are hash-pinning, so we're fairly unlikely to reverse this policy anytime soon.

@GergelyKalmar
Author

I suppose there are two separate use cases:

  1. It would be great to surface this information in the normal package search result page (at least for non-existing projects), so that when someone searches for a new project they are aware of past usage (and particularly this filename reuse limitation).
  2. It would also be great to be able to see this for projects that a user owns, so that they are able to work around existing "phantom" releases from the past for example.
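
As a rough illustration of the second use case, the sketch below models a registry that keeps filenames reserved after a project is deleted and lets an owner list them. This is a hypothetical in-memory model, not Warehouse's actual schema or API; `FilenameRegistry` and all of its methods are invented for this example.

```python
from collections import defaultdict


class FilenameRegistry:
    """Hypothetical in-memory model; not Warehouse's actual code."""

    def __init__(self) -> None:
        # Files currently visible for each project name.
        self._live: dict[str, set[str]] = defaultdict(set)
        # Every filename ever uploaded under each project name.
        self._ever_used: dict[str, set[str]] = defaultdict(set)

    def upload(self, project: str, filename: str) -> None:
        """Accept a file unless its name was ever used for this project."""
        if filename in self._ever_used[project]:
            raise ValueError(f"This filename has already been used: {filename}")
        self._live[project].add(filename)
        self._ever_used[project].add(filename)

    def delete_project(self, project: str) -> None:
        """Remove visible files; the filenames themselves stay reserved."""
        self._live.pop(project, None)

    def phantom_filenames(self, project: str) -> list[str]:
        """Reserved filenames that no longer correspond to a visible file."""
        return sorted(self._ever_used[project] - self._live[project])


registry = FilenameRegistry()
registry.upload("mindlab", "mindlab-0.0.1.tar.gz")  # previous owner
registry.upload("mindlab", "mindlab-1.0.0.tar.gz")  # previous owner
registry.delete_project("mindlab")                  # name later re-registered
phantoms = registry.phantom_filenames("mindlab")    # what a new owner could be shown
```

In this model, a new owner who could call something like `phantom_filenames` would know in advance that version 1.0.0 is unavailable, instead of finding out from a failed upload.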

The tooling support for using hashes is terrible at the moment (e.g. pypa/pip#4732), so of course usage is going to be low. It's a major supply chain vulnerability of the Python packaging ecosystem, yet progress on it seems to be quite slow.

@di
Member

di commented Dec 28, 2022

> It would be great to surface this information in the normal package search result page (at least for non-existing projects), so that when someone searches for a new project they are aware of past usage (and particularly this filename reuse limitation).

It might be confusing to show users results for things they can't install. How would you suggest working around that?

> It would also be great to be able to see this for projects that a user owns, so that they are able to work around existing "phantom" releases from the past for example.

I think that's probably something we could do now. Would you like to work on this?

@GergelyKalmar
Author

I suppose the "There were no results for 'package'" sentence could be extended along the lines of "There were no results for 'package' (see used filenames from past releases »)" when there are past filenames that can be shown.

I have plenty to do at the moment on my own projects (open source and otherwise), where I'm more effective anyway, so I don't think I can contribute to a new project anytime soon. However, if the two features need to be prioritized, I believe it would be more important to implement this for existing project owners first. That should reduce the number of tickets related to this topic (perhaps you should also update the twine error message to point to the list of all used filenames for a project).
