option to confirm expected package hashes when installing #1175
One problem to deal with is that you may (rightly) get two different files for the same project + version. For instance, if there's a binary Wheel uploaded for Windows you might get that on Windows (or not, if you don't use --use-wheel), and you might get something completely different on Linux. I think peep just ignores this problem and says that a peep-flavored requirements file is only promised to be valid when using the same install-time options and the same target system. |
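One way to picture handling the "two different files for the same version" problem is to pin a set of acceptable digests per requirement instead of a single one. The sketch below is only an illustration under that assumption; the helper names and the archive contents are hypothetical, and this is not how peep or pip is stated to work in this thread.

```python
import hashlib

def sha256_hex(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()

def matches_any(data, allowed_hashes):
    """True if the archive's digest is one of the pinned digests."""
    return sha256_hex(data) in allowed_hashes

# Hypothetical archives: a Windows wheel and an sdist for the same version.
windows_wheel = b"contents of the Windows wheel"
sdist = b"contents of the source tarball"

# Pin both digests so either platform's download is accepted.
pinned = {sha256_hex(windows_wheel), sha256_hex(sdist)}

print(matches_any(windows_wheel, pinned))        # True
print(matches_any(b"tampered archive", pinned))  # False
```

With a set of digests, the same pinned file could stay valid across target systems, at the cost of having to collect every acceptable digest up front.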
So, we have a couple of options:
I'm getting pretty close to starting to merge peep into pip. I'll probably take option number 2 unless somebody stops me. Any further thoughts? I'm happy to put forward a more concrete strawman proposal so we don't waste time upfront bikeshedding. |
One other horrible truth: currently, requirements.txt supports any pip version specifier, including things like |
I don't think we can break the large number of users who may be using https://github.com/search?l=&p=2&q=%3E%3D++path%3Arequirements.txt&ref=advsearch&type=Code It's probably more widely used in private repos too. To be fair you are breaking http://pythonhosted.org/setuptools/pkg_resources.html#requirement-objects Which EBNF is
So the extralist item should be an identifier and not contain |
Yes, obviously we would be changing the grammar. But we'd break old versions of pip only when used against versions of requirements.txt that actually include hashes. That's actually a desired behavior, as it fails safe. But, like I said, I favor option 2 anyway. |
In any case, I certainly didn't mean to suggest dropping support for range specifiers. I was more curious whether anyone had worked out a semantic boundary between using them in requirements files and using them in |
I would prefer a lockfile (vs. a hashfile) because ideally it would do more than just hashes. I'm thinking the "normal" requirements.txt would no longer need to list every dependency: if the lockfile doesn't exist, pip will install from requirements.txt, resolve the deps, and write out a lockfile with a complete list of deps + hashes. If the lockfile does exist, it can just skip resolving deps and act as if --no-deps was passed to it. |
But that may be more than you want to get into. It's something I've sketched out in my head a few times and just haven't had time to work on yet. |
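The lockfile flow described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the JSON lock format, the function name, and the `resolve` callable (a stand-in for dependency resolution) are hypothetical and do not correspond to any existing pip API.

```python
import json
import os

def load_or_create_lock(req_path, lock_path, resolve):
    """Return pinned entries, resolving only when no lockfile exists.

    `resolve` is a hypothetical callable: it takes the list of top-level
    requirement strings and returns a dict such as
    {name: {"version": ..., "hashes": [...]}}.
    """
    if os.path.exists(lock_path):
        # Lockfile present: skip resolution, as if --no-deps were passed.
        with open(lock_path) as f:
            return json.load(f)

    # No lockfile: read the top-level requirements, ignoring blanks/comments.
    with open(req_path) as f:
        top_level = [line.strip() for line in f
                     if line.strip() and not line.lstrip().startswith("#")]

    locked = resolve(top_level)

    # Write the full pinned graph so the next install is repeatable.
    with open(lock_path, "w") as f:
        json.dump(locked, f, indent=2, sort_keys=True)
    return locked
```

Note that this sketch writes the lockfile next to wherever `lock_path` points, so it assumes the caller has picked a writable location.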
You're focused on the repeatability use case for requirements files. Currently, requirements files are just a way to get in a lot of There are 3 main purposes I think:
|
hmm, the phrase "no longer need" here concerns me. whether it's a requirements file or a "lock file", when repeatability is concerned, many people will want an overt user artifact that is managed in version control. Not sure of your intention, but the lock file can't just be a run-time by-product that isn't specifically referenced.
the lockfile would need to be something that can be used explicitly, not just used if it exists.
btw, we can't assume write access next to the requirements file. I'm concerned about growing a new "lock file" concept in addition to requirements files. We'll end up with three notions: install_requires, requirements files, and lock files. I'm thinking maybe just "requirements 2.0". Just a better requirements file format that supports hashes. Conceptually the same, just better. |
Even if we have requirements 2.0 (which is also something I'd love to do), I think we want a lockfile. Bundler has this and it's really nice. In order to have a repeatable process right now, you have to manually manage the entire dependency graph inside of the requirements file, pinning every single dependency and updating or removing each one when it is no longer needed. A lock file means that you only need to declare what your project depends on, and pip itself manages pinning the entire dependency graph. In Bundler land you have Gemfile instead of requirements.txt, and the lockfile is called Gemfile.lock. If Gemfile.lock exists it uses that; otherwise it resolves the dependencies, creates Gemfile.lock, and then installs from it. If you want to force the use of a .lock you run |
manual? we have
hmm, I would say gemfile is like our |
pip freeze helps, but there's still a bunch of manual busy work in doing it. It's annoying to keep updated.
No, install_requires would be gemspec. install_requires, requirements.txt, and the proposed lockfile do not include the same information (nor should they).
I'm -1 on including another file for just the hashes. Either we should go all out and include a real lock file or we should figure out how to get the hashes into requirements.txt |
To be specific.
Now it's true you can approximate a lockfile with requirements.txt, but that's not a very user-friendly UX. In the best case they'll want two files anyway, and they'll need to delete a virtualenv, install everything, then |
I don't want to linger on gem anymore, because I don't claim to know it, but I think gemfile just exists for the case where an "app" is not gemified or whatever. If the "app" were a gem, the gemfile and gemspec would likely be redundant.
Understood.
Again, "Locking" has been a defining use case for requirements files for years. Stripping "locking" away from requirements files is a big deal.
I'll post a more detailed breakdown of the 2 workflows. 1) the current approach using requirements files as lock files, and 2) the new lockfile approach. Whether "apps" use a "setup.py" or not is an important point in analyzing this. I'll include that in my breakdown. |
let me emphasize that. If your default is to always create setup.py (and use otoh, if you're creating apps without a setup.py, then you'll end up wanting two requirements files (as @dstufft mentioned above). One to hold the "abstract" requirements, and the other for deployment repeatability. |
I disagree about the AFAICT there is exactly one case where you can truly treat One of the problems with |
Besides the fact that pip freeze loses information (for example if you install a tarball). |
isn't extras the only case? ultimately, your app installation gets off the ground with a pip command. that could include the extras for your top-level app. Running the installation this way would be something that's done at specific times, when you're interested in dealing with upgrading dependency versions. "human format"? it's the minimally-defined requirements that produce different results as pypi changes. |
well, there's VCS and url formats, if that's what you mean? |
@qwcode If I have a Django project then "gunicorn" isn't a dependency of my application so it's not appropriate to put it in |
Basically I believe it should be (in that it makes the most sense and streamlines things for the most use cases).
The only case where you don't need 3 distinct formats is the case where your requirements.txt (as above) is a collection with a length of 1. As soon as your collection is more than one thing, you need an intermediate format to serve as the "collection of things to install". As has also been touched on, setup.py cannot properly specify every dependency type (because it contains abstract dependencies, not concrete ones; see https://caremad.io/blog/setup-vs-requirement/). Additionally, you cannot specify where to install something from in a setup.py, but you can in a requirements.txt (which is also an important feature to have). |
ok, this breakdown works for me. the abstract/concrete distinction is nice. good stuff to get into the packaging user guide. About the synchronization between the files... Assuming it will be a practice to check in the lock files in some cases, I can imagine requirements getting "ahead" of the lock file in VCS (i.e. you update your requirements, but never install again before committing), but then the lock file gets used later as part of a deployment process in a separate environment (that only works against lock files). maybe the lock file needs to contain a hash of the source requirements file? |
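The idea of storing a hash of the source requirements file inside the lock file could look something like the sketch below. The `requirements_sha256` key and both function names are hypothetical, chosen only to illustrate the staleness check; no lock format has been agreed on in this thread.

```python
import hashlib

def requirements_digest(requirements_text):
    """Hex SHA-256 of the requirements file contents."""
    return hashlib.sha256(requirements_text.encode("utf-8")).hexdigest()

def lock_is_stale(requirements_text, lock):
    """True if the lock was generated from different requirements.

    `lock` is a dict with a hypothetical "requirements_sha256" key that
    was recorded when the lock file was written.
    """
    return lock.get("requirements_sha256") != requirements_digest(requirements_text)

# The lock was written when the requirements file only listed django.
lock = {"requirements_sha256": requirements_digest("django\n")}

print(lock_is_stale("django\n", lock))            # False
print(lock_is_stale("django\ngunicorn\n", lock))  # True
```

A deployment step that only consumes lock files could run a check like this and refuse to proceed (or just warn) when the requirements have moved ahead of the lock.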
👍 |
Having recently gotten used to Bundler's Gemfile.lock and npm shrinkwrap's npm-shrinkwrap.json, returning to pip and having to manually keep the list of inherited dependencies up to date in our requirements file is pretty painful.
npm shrinkwrap (though it does have its own set of bugs/issues) warns when out of sync - and things like |
Btw, expect a PR from me in a day or two for basic hash checking. |
will it conflict with #3125 ? |
I don't think so. |
ok, no biggie if it did. I would just wait before merging to see what you have. |
Btw this seems strongly related to #468 |
It looks like commit 1e41f01 introduced a test that fails on Windows (sorry, I can't track back to the PR that contained the commit). @dstufft @erikrose do either of you have any insight? The test is

    def test_unsupported_hashes(self, data):
        """VCS and dir links should raise errors when --require-hashes is
        on.

        In addition, complaints about the type of requirement (VCS or dir)
        should trump the presence or absence of a hash.

        """
        reqset = self.basic_reqset(require_hashes=True)
        reqset.add_requirement(
            list(process_line(
                'git+git://github.com/pypa/pip-test-package --hash=sha256:123',
                'file',
                1))[0])
        dir_path = data.packages.join('FSPkg')
        reqset.add_requirement(
            list(process_line(
                'file://%s' % (dir_path,),
                'file',
                2))[0])
        finder = PackageFinder([data.find_links], [], session=PipSession())
        sep = os.path.sep
        if sep == '\\':
            sep = '\\\\'  # This needs to be escaped for the regex
        assert_raises_regexp(
            HashErrors,
            r"Can't verify hashes for these requirements because we don't "
            r"have a way to hash version control repositories:\n"
            r" git\+git://github\.com/pypa/pip-test-package \(from -r file "
            r"\(line 1\)\)\n"
            r"Can't verify hashes for these file:// requirements because they "
            r"point to directories:\n"
            r" file://.*{sep}data{sep}packages{sep}FSPkg "
            "\(from -r file \(line 2\)\)".format(sep=sep),
            reqset.prepare_files,
            finder)

and it fails with an error in the url2pathname function in the stdlib:
It looks like a problem with the arcane rules for how Windows drive letters get encoded in URLs. But I don't understand the logic well enough to fix it. For now, I've marked this test as an expected fail on Windows, but someone should take a look at it. |
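As a side note on the manual separator escaping in that test: `re.escape` from the standard library can do the same job. The sketch below only covers building the directory part of the pattern (the helper name is hypothetical) and does not address the url2pathname failure itself.

```python
import re

def dir_url_pattern(sep):
    """Build a regex fragment for a file:// URL ending in data/packages/FSPkg.

    `sep` is the platform path separator; re.escape makes it safe to embed
    in a regex, so a backslash does not need to be doubled by hand.
    """
    escaped = re.escape(sep)
    return r"file://.*" + escaped + escaped.join(["data", "packages", "FSPkg"])

# Both separator styles are matched by their respective patterns.
print(bool(re.search(dir_url_pattern("\\"), "file://C:\\x\\data\\packages\\FSPkg")))  # True
print(bool(re.search(dir_url_pattern("/"), "file:///tmp/data/packages/FSPkg")))       # True
```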
@pfmoore It sounds worth opening a separate issue about. Do you have a traceback handy? I might be able to take a guess if I could see one. |
pip should offer some way to confirm expected hashes when installing
(mainly due to it being possible for the same pypi distribution to have a different hash over time; project authors are allowed to delete a distribution, and then upload a new one with the same name and version, but different hash.)
to be clear, pip already validates the hash url fragment in the PyPI link. this is not about doing that. this is about confirming that you're getting the same hash that you've installed before, and want to keep getting.
peep offers this feature using requirement file comments: https://pypi.python.org/pypi/peep
as for implementing, here's one idea from @dstufft
"I keep playing with the idea of a lock file. e.g. in your requirements.txt you'd just specify your top level dependencies, then when you install it would resolve the dependencies, put them + hashes in requirements.txt.lock and next time you install it will look for a lockfile and install from there (implying --no-deps) so we don't get this thing where in order to do repeatable builds you basically have to manually pin every dependency inside of a requirements file by hand"
cc @erikrose