Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Literal for compression in zipfile #9346

Merged
merged 3 commits into from
Dec 14, 2022
Merged

Use Literal for compression in zipfile #9346

merged 3 commits into from
Dec 14, 2022

Conversation

Viicos
Copy link
Contributor

@Viicos Viicos commented Dec 10, 2022

According to docs:

compression is the ZIP compression method to use when writing the archive, and should be ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2 or ZIP_LZMA; unrecognized values will cause NotImplementedError to be raised.

Literal could be used for the compresslevel parameter as well:

The compresslevel parameter controls the compression level to use when writing files to the archive. When using ZIP_STORED or ZIP_LZMA it has no effect. When using ZIP_DEFLATED integers 0 through 9 are accepted (see zlib for more information). When using ZIP_BZIP2 integers 1 through 9 are accepted (see bz2 for more information).

But this would require extra overloads for each compression value, so considering the fact that there's already a split between 3.8 and 3.11 this would make things more complicated.

Note: We could use Literal for the module constants, e.g. ZIP_STORED, ZIP_DEFLATED, etc. I can add another commit to this PR if required.

@github-actions

This comment has been minimized.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: We could use Literal for the module constants, e.g. ZIP_STORED, ZIP_DEFLATED, etc. I can add another commit to this PR if required.

We need to either make both changes together, or not at all. Otherwise, type checkers will emit false positives on code like this, due to the fact that they will infer ZIP_DEFLATED as being an int, while the function requires a Literal:

import zipfile

zip_output = zipfile.ZipFile("some/path.zip", "w", zipfile.ZIP_DEFLATED, allowZip64=True)

This is the cause of the pip error in the mypy_primer output: https://github.com/pypa/pip/blob/5f3f592c4581a059ff9de0fb8052ef5c6ef25fd4/src/pip/_internal/req/req_install.py#L714-L719

Note: I'm not yet sure I'm sold on the idea of making a change here at all, due to the risk of false-positive errors being emitted. But in order to see the impact of the change, we'll need to see what it looks like if we change the module-level constants to Literal values as well.

@Viicos
Copy link
Contributor Author

Viicos commented Dec 13, 2022

We need to either make both changes together, or not at all. Otherwise, type checkers will emit false positives on code like this, due to the fact that they will infer ZIP_DEFLATED as being an int, while the function requires a Literal

Yes I've been checking the reported mypy_primer errors and they all seem related to this. I'll add another commit and see what is the updated report.

@github-actions

This comment has been minimized.

@Viicos
Copy link
Contributor Author

Viicos commented Dec 13, 2022

Related code failing:

# vision (https://github.com/pytorch/vision)

_ZIP_COMPRESSION_MAP: Dict[str, int] = {
    ".bz2": zipfile.ZIP_BZIP2,
    ".xz": zipfile.ZIP_LZMA,
}


def _extract_zip(from_path: str, to_path: str, compression: Optional[str]) -> None:
    with zipfile.ZipFile(
        from_path, "r", compression=_ZIP_COMPRESSION_MAP[compression] if compression else zipfile.ZIP_STORED
    ) as zip:
        zip.extractall(to_path)
# jinja (https://github.com/pallets/jinja)
zip_file = ZipFile(
    target, "w", dict(deflated=ZIP_DEFLATED, stored=ZIP_STORED)[zip]
)

This would require a manual fix on these repos. So this is still breaking things, but imo it would result in safer code, as ZipFile will throw a NotImplementedError if the compression method is not supported.

Copy link
Collaborator

@srittau srittau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think we need to bite the bullet and need to make these kind of changes rather sooner than later.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM too, though I'm not sure it makes as much sense for ZIP64_LIMIT, ZIP_FILECOUNT_LIMIT and ZIP_MAX_COMMENT to be made Literals as it does for the other constants. Those are very big numbers, and the constants serve a different purpose to ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2 and ZIP_LZMA.

@srittau
Copy link
Collaborator

srittau commented Dec 14, 2022

I agree. Personally, I would have left them as int. But it doesn't hurt to annotate them, either.

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Viicos!

@AlexWaygood
Copy link
Member

The stubtest failures are due to the recent release of Python 3.10.9, and are unrelated to this PR.

@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

vision (https://github.com/pytorch/vision)
+ torchvision/datasets/utils.py:301: error: Argument "compression" to "ZipFile" has incompatible type "int"; expected "Literal[0, 8, 12, 14]"  [arg-type]

jinja (https://github.com/pallets/jinja)
+ src/jinja2/environment.py:864: error: Argument 3 to "ZipFile" has incompatible type "int"; expected "Literal[0, 8, 12, 14]"  [arg-type]

@AlexWaygood AlexWaygood merged commit 034cfab into python:main Dec 14, 2022
JukkaL added a commit to JukkaL/typeshed that referenced this pull request Dec 15, 2022
This reverts commit 034cfab.

The commit exposed the value of the compression constants, which seem
to be implementation details, in the public API. I don't see anything
in the documentation about the values of the constants such as
`ZIP_STORED`:
https://docs.python.org/3/library/zipfile.html?highlight=zipfile#zipfile.ZipFile

Example where this makes a difference:
```
from typing import Literal
import zipfile

def f1(p: str, compression: int) -> None:
    """Error: compression should have the type Literal[0, 8, 12, 14]"""
    zipfile.ZipFile(p, compression=compression)

def f2(p: str, compression: Literal[0, 8, 12, 14]) -> None:
    """Works, but cryptic and exposes internal implementation details"""
    zipfile.ZipFile(p, compression=compression)
```

The values are of constants need to be explicitly specified if somebody
wants to wrap `zipfipe.ZipFile`, which arguably exposes implementation
details in a problematic fashion.

Here is a real-world example where this caused a regression:
https://github.com/pytorch/vision/blob/main/torchvision/datasets/utils.py#L301
AlexWaygood pushed a commit that referenced this pull request Dec 16, 2022
This reverts commit 034cfab.

The commit exposed the value of the compression constants, which seem
to be implementation details, in the public API. I don't see anything
in the documentation about the values of the constants such as
`ZIP_STORED`:
https://docs.python.org/3/library/zipfile.html?highlight=zipfile#zipfile.ZipFile

Example where this makes a difference:
```
from typing import Literal
import zipfile

def f1(p: str, compression: int) -> None:
    """Error: compression should have the type Literal[0, 8, 12, 14]"""
    zipfile.ZipFile(p, compression=compression)

def f2(p: str, compression: Literal[0, 8, 12, 14]) -> None:
    """Works, but cryptic and exposes internal implementation details"""
    zipfile.ZipFile(p, compression=compression)
```

The values are of constants need to be explicitly specified if somebody
wants to wrap `zipfipe.ZipFile`, which arguably exposes implementation
details in a problematic fashion.

Here is a real-world example where this caused a regression:
https://github.com/pytorch/vision/blob/main/torchvision/datasets/utils.py#L301
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants