fix: validate wheel files in a RAM friendly way #183

ralbertazzi · 2023-05-22T17:56:16Z

Content validation of a wheel record currently loads the entire file into memory with self._zipfile.read(item). This is extremely inefficient for big wheels (the well-known PyTorch now has wheel files larger than 2 GB) and leads to very high RAM consumption. This PR fixes this behaviour by reading the zip file content in a buffered way, as other parts of the codebase already do. Unfortunately, this required a small change to some signatures: record.validate now receives a binary stream and the member's file size instead of the raw bytes.
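A minimal sketch of the buffered approach described above. It assumes RECORD-style hashes (URL-safe base64 SHA-256 digests with the padding stripped). `hash_and_size`, `validate_member` and the 64 KiB chunk size are hypothetical names and values chosen for illustration; this is not the code in this PR. The key point is that `zipfile.ZipFile.open()` returns a file object that decompresses lazily, so peak memory is bounded by the chunk size. `ZipFile.read()`, by contrast, returns the whole member as a single `bytes` object.

```python
import base64
import hashlib
import io
import zipfile
from typing import IO, Tuple

# Assumed chunk size (hypothetical); any modest value bounds memory use.
CHUNK_SIZE = 64 * 1024


def hash_and_size(stream: IO[bytes], algorithm: str = "sha256") -> Tuple[str, int]:
    """Hypothetical helper: hash a stream chunk by chunk.

    Returns (URL-safe base64 digest without '=' padding, total byte count).
    """
    hasher = hashlib.new(algorithm)
    size = 0
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
        size += len(chunk)
    digest = base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=").decode("ascii")
    return digest, size


def validate_member(
    zf: zipfile.ZipFile, name: str, expected_hash: str, expected_size: int
) -> bool:
    """Hypothetical check of one zip member against its RECORD hash and size."""
    # zf.open() streams the decompressed data; zf.read(name) would instead
    # materialise the entire member in memory at once.
    with zf.open(name, "r") as stream:
        digest, size = hash_and_size(stream)
    return digest == expected_hash and size == expected_size
```

With this shape, validating a multi-gigabyte member costs roughly one chunk of memory at a time rather than the member's full size.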

dimbleby · 2023-05-22T18:49:27Z

src/installer/sources.py

-                        f"In {self._zipfile.filename}, hash / size of {item.filename} didn't match RECORD"
-                    )
+                with self._zipfile.open(item, "r") as stream:
+                    if not record.validate(cast("BinaryIO", stream), item.file_size):


(Straying into changes beyond the scope of this MR: as far as I can see, this repository could use IO[bytes] everywhere it currently uses BinaryIO, which would save some casting.)
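To illustrate the typing suggestion: `typing.BinaryIO` is declared as a subclass of `typing.IO[bytes]`, so a parameter annotated `IO[bytes]` accepts both. This assumes typeshed's stubs annotate `ZipFile.open()` as returning `IO[bytes]`, which would explain why passing its result to a `BinaryIO` parameter needs `cast("BinaryIO", stream)`. The functions below are hypothetical. The annotations only matter to static checkers such as mypy and do not change runtime behaviour.

```python
import io
import zipfile
from typing import IO


def count_bytes(stream: IO[bytes], chunk_size: int = 8192) -> int:
    """Hypothetical consumer typed with IO[bytes] instead of BinaryIO."""
    total = 0
    # iter(callable, sentinel) keeps calling read() until it returns b"".
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        total += len(chunk)
    return total


def member_size(zf: zipfile.ZipFile, name: str) -> int:
    """Hypothetical caller: the ZipFile.open() result is passed without a cast."""
    with zf.open(name, "r") as stream:
        return count_bytes(stream)
```

Because `BinaryIO` values such as `io.BytesIO` also satisfy `IO[bytes]`, widening the annotation does not break existing callers.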

pradyunsg · 2023-05-29T19:46:48Z

I've filed #185 so that this PR has an associated issue, in case there are any high-level details to discuss. There likely aren't, but it can't hurt to have an issue to close and a place for discussion that isn't specific to this PR.

Other than that, I don't think a backwards-incompatible change is necessary here. I've filed #186, which contains no backwards-incompatible API changes; instead, it adds a new method and deprecates the (problematic) RecordEntry.validate(data) method.
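A sketch of the backwards-compatible pattern described above. `RecordEntryLike`, `validate_stream` and the `sha256=<digest>` hash format are illustrative assumptions, not necessarily the names or code in #186. The old bytes-based `validate(data)` keeps working but emits a `DeprecationWarning` and delegates to a new stream-based method. Existing callers therefore don't break, and new callers can avoid loading whole files into memory.

```python
import base64
import hashlib
import io
import warnings
from typing import IO, Optional


class RecordEntryLike:
    """Hypothetical stand-in for one RECORD row: path, hash and size."""

    def __init__(self, path: str, hash_value: Optional[str], size: Optional[int]) -> None:
        self.path = path
        self.hash_value = hash_value  # assumed format: "sha256=<urlsafe b64, no padding>"
        self.size = size

    def validate_stream(self, stream: IO[bytes]) -> bool:
        """New API (hypothetical name): hash and count bytes chunk by chunk."""
        algorithm = "sha256"
        expected: Optional[str] = None
        if self.hash_value is not None:
            algorithm, _, expected = self.hash_value.partition("=")
        hasher = hashlib.new(algorithm)
        size = 0
        for chunk in iter(lambda: stream.read(64 * 1024), b""):
            size += len(chunk)
            hasher.update(chunk)
        if self.size is not None and size != self.size:
            return False
        if expected is not None:
            actual = base64.urlsafe_b64encode(hasher.digest()).rstrip(b"=").decode("ascii")
            return actual == expected
        return True  # no hash recorded: only the size (if any) is checked

    def validate(self, data: bytes) -> bool:
        """Old in-memory API, kept for compatibility but deprecated."""
        warnings.warn(
            "validate(data) is deprecated; use validate_stream(stream) instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.validate_stream(io.BytesIO(data))
```

Wrapping the legacy bytes in `io.BytesIO` means there is only one validation code path to maintain while the deprecated method is phased out.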

fix: validate wheel files in a RAM friendly way

fbe5ca1

dimbleby reviewed May 22, 2023


ralbertazzi mentioned this pull request May 23, 2023

fix: disable wheel content validation python-poetry/poetry#7987

Merged

pradyunsg mentioned this pull request May 29, 2023

Validate RECORD file using streams instead of reading in-memory #186

Merged

pradyunsg closed this in #186 May 29, 2023
