Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utitlity to lazily acquire wheel metadata over HTTP #8467

Merged
merged 3 commits into from
Jun 30, 2020

Conversation

McSinyx
Copy link
Contributor

@McSinyx McSinyx commented Jun 18, 2020

This is created as discussed in GH-7819, GH-8448 and GH-8442.

TODO: Finalize the interface to integrate it into the current codebase.

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Jun 19, 2020

Thank you for putting this up!!

A question on strategy: what are your thoughts on potentially merging #8448 beforehand, but removing anything that hooks up the code in pip._internal.network.shallow elsewhere, and then using this PR to expose it as an unstable feature? I'm especially just thinking about:

Write tests

Since #8448 happens to split up the httpfile, zipfile, and wheel submodules and unit test them individually, I'm thinking that merging that first could reduce some of the implementation work in this PR, and then allow us to focus on more integration tests than unit tests in this PR. Integration testing is actually the only remaining TODO in #8448 right now, so that separation of responsibility would remove the final blocker to merging it!

What are your thoughts on that approach?

@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Jun 19, 2020

#8448 adds the ability to lazily download requirements, but it kind of hacks it into the RequirementSet class in order to make it easy to extract later in the pip download task. I would love to be able to just remove that entirely from #8448, and then let this PR figure out how to expose everything to the pip download task.

If we were to do that, I'm thinking the LazyZip class you've created here could potentially turn into essentially what the ShallowWheelDistribution class added in #8448 does.

You'll note that #8448 currently adds this hacky .get_session() method to Downloader. It might be better to instead add a .download_shallow() method to Downloader (or to a new LazyDownloader? not sure at all) that returns an object like the ShallowWheelDistribution object from #8448.

@McSinyx
Copy link
Contributor Author

McSinyx commented Jun 19, 2020

Thanks for your suggestions, @cosmicexplorer! I'll see what I can do, soon-ish (because of sudden things from school again 😞). On the merging part, I'm not somebody who can make the call and I don't have the confident to decide in this case either, so I think we need to get the maintainers to speak up their opinions.

That being said, I'll try to limit the scope of this PR down to only adding (a supposingly compatible version of, as you suggested) the utility instead, since I don't want to get down the rabbit hole of too many tangled decisions at the same time.

@cosmicexplorer
Copy link
Contributor

Ok!! I have plenty of free time and don't need to get #8448 in under any particular timeline, so I'm totally comfortable letting you set the pace with your work and modifying #8448 to react to whatever we figure out here, if necessary. I totally agree it would be good to avoid too many tangled decisions at once.

Super pumped about all this!!! :D

@McSinyx McSinyx force-pushed the lazy-wheel branch 2 times, most recently from f6ecdb9 to 62d2c14 Compare June 21, 2020 07:40
@McSinyx
Copy link
Contributor Author

McSinyx commented Jun 21, 2020

I've been thinking about this for a while, and I imagine that it would be easier now and in the future to use the shallow wheel only for dependency resolution. I'm speaking of this with the upcoming JSON API in mind and with that currently pip's codebase is tossing around path to the wheel (unpacked or not). So I'm proposing to use the outcome of this PR only for e.g. LinkCandidate.iter_dependencies and not for the wheel to be installed. If that's the case, I'll finalize this PR with unit tests to get it ready for merging.

@cosmicexplorer
Copy link
Contributor

I think that sounds great!

@McSinyx
Copy link
Contributor Author

McSinyx commented Jun 24, 2020

Per a private discussion with @pradyunsg yesterday, usage could look something like this:

patch on resolution/resolvelib/candidates.py

diff --git a/src/pip/_internal/resolution/resolvelib/candidates.py b/src/pip/_internal/resolution/resolvelib/candidates.py
index c7a30e18..8c736815 100644
--- a/src/pip/_internal/resolution/resolvelib/candidates.py
+++ b/src/pip/_internal/resolution/resolvelib/candidates.py
@@ -1,11 +1,13 @@
 import logging
 import sys
 
+from pip._vendor.contextlib2 import suppress
 from pip._vendor.packaging.specifiers import InvalidSpecifier, SpecifierSet
 from pip._vendor.packaging.utils import canonicalize_name
 from pip._vendor.packaging.version import Version
 
 from pip._internal.exceptions import MetadataInconsistent
+from pip._internal.network.lazy_wheel import dist_from_wheel_url
 from pip._internal.req.constructors import (
     install_req_from_editable,
     install_req_from_line,
@@ -222,30 +224,35 @@ class _InstallRequirementBackedCandidate(Candidate):
         self._prepare()
         return self._dist
 
-    def _get_requires_python_specifier(self):
-        # type: () -> Optional[SpecifierSet]
-        requires_python = get_requires_python(self.dist)
+    def _get_requires_python_specifier(self, dist):
+        # type: (Distribution) -> Optional[SpecifierSet]
+        requires_python = get_requires_python(dist)
         if requires_python is None:
             return None
         try:
             spec = SpecifierSet(requires_python)
         except InvalidSpecifier as e:
+            name = canonicalize_name(dist.project_name)
             logger.warning(
-                "Package %r has an invalid Requires-Python: %s", self.name, e,
+                "Package %r has an invalid Requires-Python: %s", name, e,
             )
             return None
         return spec
 
-    def iter_dependencies(self):
-        # type: () -> Iterable[Optional[Requirement]]
-        for r in self.dist.requires():
+    def _iter_dependencies(self, dist):
+        # type: (Distribution) -> Iterable[Optional[Requirement]]
+        for r in dist.requires():
             yield self._factory.make_requirement_from_spec(str(r), self._ireq)
         python_dep = self._factory.make_requires_python_requirement(
-            self._get_requires_python_specifier(),
+            self._get_requires_python_specifier(dist),
         )
         if python_dep:
             yield python_dep
 
+    def iter_dependencies(self):
+        # type: () -> Iterable[Optional[Requirement]]
+        return self._iter_dependencies(self.dist)
+
     def get_install_requirement(self):
         # type: () -> Optional[InstallRequirement]
         self._prepare()
@@ -283,6 +290,23 @@ class LinkCandidate(_InstallRequirementBackedCandidate):
             version=version,
         )
 
+    def iter_dependencies(self):
+        # type: () -> Iterable[Optional[Requirement]]
+        dist = None  # type: Optional[Distribution]
+        # TODO: Opt-out of hashing is needed.
+        # Awaiting https://github.com/pypa/pip/pull/8411.
+        # TODO: Opt-in as unstable feature.
+        if self._link.is_wheel:
+            url = self._link.url.split('#', 1)[0]
+            session = self._factory.preparer.downloader._session
+            logger.info('Collecting %s %s', self._name, self._version)
+            assert self._name is not None
+            with suppress(RuntimeError):
+                # TODO: Check for consistency of name and version
+                # of candidate and distribution.
+                dist = dist_from_wheel_url(self._name, url, session)
+        return self._iter_dependencies(dist or self.dist)
+
     def _prepare_abstract_distribution(self):
         # type: () -> AbstractDistribution
         return self._factory.preparer.prepare_linked_requirement(

Regarding testing, is it OK to fetch from PyPI?

Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding testing, is it OK to fetch from PyPI?

Yes. The important think there is to make sure to mark the test as a network test (with pytest.mark.network).

src/pip/_internal/network/utils.py Show resolved Hide resolved
src/pip/_internal/network/lazy_wheel.py Outdated Show resolved Hide resolved
@McSinyx McSinyx marked this pull request as ready for review June 26, 2020 08:17
@McSinyx McSinyx changed the title Draft lazy zip over HTTP Add utitlity to lazily acquire wheel metadata over HTTP Jun 26, 2020
@McSinyx McSinyx requested a review from pradyunsg June 26, 2020 11:19
Copy link
Member

@pradyunsg pradyunsg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pradyunsg pradyunsg merged commit 18431be into pypa:master Jun 30, 2020
@McSinyx McSinyx deleted the lazy-wheel branch June 30, 2020 13:39
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants