-
-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix .files and inferred packages_distributions for .egg-info packages #437
Conversation
The build_files() helper which handles these FilesDef nested dicts already has code to handle values of type 'bytes' (they get written verbatim instead of first being processed by DALS()). Changing the FilesDef type to reflect this prevents Mypy from failing when encountering legitimate bytes values in these dicts.
This corresponds to the qiskit[1] meta-package which: - does not contain any (runtime) Python code itself, but serves as a mechanism to install its transitive dependencies (which populate the qiskit package namespace). - is distributed as a source archive. - includes a top_level.txt which is empty (contains a single newline), arguably correct given that it does not directly install any importable packages/modules. - when installed as an egg, provides a SOURCES.txt which is incorrect from a runtime POV: it references 3 .py files, a setup.py and two files under test/, none of which are actually installed. - when installed (as an egg) by pip, provides an installed-files.txt file which is _more_ accurate than SOURCES.txt, since it reflects the files that are actually available after installation. importlib_metadata reports incorrect .files for this package, because we end up using SOURCES.txt. It is better to use installed-files.txt when it is available. Furthermore, as a result of this, packages_distributions() also incorrectly reports that this packages provides imports names that do not actually exist ("setup" and "test", in qiskit's case). This commit adds EggInfoPkgPipInstalledNoModules, a test project that mimics the egg installation of qiskit, and adds it to existing test cases, as well as adding a new test cases specifically for verifying packages_distributions() with egg-info packages. The following tests fail in this commit, but will be fixed in the next commit: - PackagesDistributionsTest.test_packages_distributions_on_eggs - APITests.test_files_egg_info See the python#115 issue for more details. [1]: qiskit is found at https://pypi.org/project/qiskit/0.41.1/#files
When listing the files in a *.egg-info distribution, prefer using *.egg-info/installed-files.txt instead of *.egg-info/SOURCES.txt. installed-files.txt is written by pip[1] when installing a package, whereas the SOURCES.txt is written by setuptools when creating a source archive[2]. installed-files.txt is only present when the package has been installed by pip, so we cannot depend on it always being available. However, when it _is_ available, it is an accurate record of what files are installed. SOURCES.txt, on the other hand, is always avaiable, but is not always accurate: Since it is generated from the source archive, it will often include files (like 'setup.py') that are no longer available after the package has been installed. Fixes python#115 for the cases where a installed-files.txt file is available. [1]: https://pip.pypa.io/en/stable/news/#v0-3 [2]: https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest
As established in previous commits, the SOURCES.txt file is not always an accurate source of files that are present after a package has been installed. One situation where this inaccuracy is problematic is when top_level.txt is also missing, and packages_distributions() is forced to infer the provided import names based on Distribution.files. In this situation we end up with incorrect mappings between import packages and distribution packages, including import packages that clearly do not exist at all. For example, a SOURCES.txt that lists setup.py (which is used _when_ installing, but is not available after installation), will see that setup.py returned from .files, which then will cause packages_distributions() to claim a mapping from the non-existent 'setup' import name to this distribution. This commit adds EggInfoPkgSourcesFallback which demostrates such a scenario, and adds this new class to a couple of relevant tests. A couple of these tests are currently failing, to demonstrate the issue at hand. These test failures will be fixed in the next commit. See the python#115 issue for more details.
Add an extra filter on the paths returned from Distribution.files, to prevent paths that don't exist on the filesystem from being returned. This attempts to solve the issue of .files returning incorrect information based on the inaccuracies of SOURCES.txt. As the code currently is organized, it is more complicated to write this such that it only applies to the information read from SOURCES.txt specifically, hence we apply it to _all_ of .files instead. This fixes python#115, also in the case where there is no installed-files.txt file available. [1]: https://pip.pypa.io/en/stable/news/#v0-3 [2]: https://setuptools.pypa.io/en/latest/deprecated/python_eggs.html#sources-txt-source-files-manifest
f3fecf8
to
22d9ea5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. I am just gonna wait a couple days to give Jason a bit more time to review, otherwise I'll just merge the fixes, unblocking this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jherland for the contrib. I can see you took great care to be minimally invasive while addressing the core concerns. Well done, and kudos for your approach, especially considering the wonky structure of the fixtures and test parameterization.
Overall, this approach looks acceptable. My main concern is with the non-metadata files as presented by installed-files. I have some style/quality concerns, but nothing serious. Feel free to reject or ignore any of my comments and I can address them to my satisfaction.
I'll want to spend a bit more time with this change before merging it in, but feel free to address some of my comments as your time permits.
Move test_packages_distributions_on_eggs() method into a new class, PackagesDistributionsEggTest, to prevent applying unnecessary fixtures to existing tests.
…URCES.txt Rename starved_egg to sources_fallback.
Rename empty_egg to egg_with_no_modules
Remove unnecessary list() call.
…iles for all the entries in RECORD
The import names that were created by these tests were not valid Python identifiers. Fix that, and furthermore: add another check to verify that _all_ import names returned from packages_distributions() are always valid Python identifiers. Ideally we should check that all keys returned from packages_distributions() are valid import names (i.e. can be imported), but this is at least a step in the right direction.
…port names" This reverts commit 70ff991. This behavior was adopted in 5e8260c and subsequently adapted as part of python#443.
…isting files for all the entries in RECORD" This reverts commit fa9cca4.
The first three commits fix #115 for the case where a
.egg-info
package includesinstalled-files.txt
:We prefer
installed-files.txt
overSOURCES.txt
as a source forDistribution.files
, as the former is deemed more accurate when it is available.The last two commits attempts to fix #115 for the case where we have no choice but to parse
SOURCES.txt
:This assumes that
Distribution.files
should never contain a path to a non-existent file.We simply remove non-existent files with a filter, thus removing any entries from
SOURCES.txt
that refer to files that were part of the source package, but not part of the final installation.Commits:
tests/fixtures
: FixFilesDef
type to includebytes
valuesDistribution.files
: Prefer*.egg-info/installed-files.txt
toSOURCES.txt
.files
from inaccurateSOURCES.txt
Distribution.files
: Only return files that actually exist