Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gzip raising exception when closing with buffer backed by BytesIO #129726

Closed
xrmx opened this issue Feb 6, 2025 · 16 comments
Closed

gzip raising exception when closing with buffer backed by BytesIO #129726

xrmx opened this issue Feb 6, 2025 · 16 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@xrmx
Copy link
Contributor

xrmx commented Feb 6, 2025

Bug report

Bug description:

Hello,

the following snippet raises an exception in Python 3.13.2 while it's fine with Python 3.12.

import io
import gzip

def foo():
    buffer = gzip.GzipFile(fileobj=io.BytesIO(), mode="w")

foo()

Running this with python3.12 is silent, with Python 3.13.2 instead:

Exception ignored in: <gzip on 0x7fa4fd99c550>
Traceback (most recent call last):
  File "/usr/lib/python3.13/gzip.py", line 359, in close
    fileobj.write(self.compress.flush())

CPython versions tested on:

3.13

Operating systems tested on:

Linux

Linked PRs

@xrmx xrmx added the type-bug An unexpected behavior, bug, or error label Feb 6, 2025
@xrmx xrmx changed the title gzip raising exception when closing empty buffer backed by bytesio gzip raising exception when closing empty buffer backed by BytesIO Feb 6, 2025
@xrmx xrmx changed the title gzip raising exception when closing empty buffer backed by BytesIO gzip raising exception when closing with buffer backed by BytesIO Feb 7, 2025
@picnixz picnixz added stdlib Python modules in the Lib dir extension-modules C modules in the Modules dir and removed extension-modules C modules in the Modules dir labels Feb 7, 2025
@cmaloney
Copy link
Contributor

cmaloney commented Feb 9, 2025

Explicit buffer.close() and del buffer in the method make the error go away, so I think this has to do with ordering of cleanup of references when buffer is actually deallocated. Moving to with gzip.GzipFile ... as buffer: also doesn't error. The issue seems to be that the io.BytesIO is cleaned up before the gzip.GzipFile.

The same behavior occurs with an explicit reference to the io.BytesIO:

def explicit_reference():
    bio = io.BytesIO()
    buffer = gzip.GzipFile(fileobj=bio, mode="w")

Can work around it so no longer error at least by updating GzipFile.close() to check if self.fileobj.closed and fast-path exiting (See: https://github.com/python/cpython/compare/main...cmaloney:cpython:exp/gzip?expand=0). Not sure that is the right approach though vs. getting the objects to destruct in the right order.

@xrmx
Copy link
Contributor Author

xrmx commented Feb 9, 2025

Explicit buffer.close() and del buffer in the method make the error go away, so I think this has to do with ordering of cleanup of references when buffer is actually deallocated. Moving to with gzip.GzipFile ... as buffer: also doesn't error.

Thanks for testing these, I cannot use the context manager but I may be able to close the buffer explictly.

The issue seems to be that the io.BytesIO is cleaned up before the gzip.GzipFile.

Yeah, that is my impression too

The same behavior occurs with an explicit reference to the io.BytesIO:

def explicit_reference():
bio = io.BytesIO()
buffer = gzip.GzipFile(fileobj=bio, mode="w")

Can work around it so no longer error at least by updating GzipFile.close() to check if self.fileobj.closed and fast-path exiting (See: https://github.com/python/cpython/compare/main...cmaloney:cpython:exp/gzip?expand=0). Not sure that is the right approach though vs. getting the objects to destruct in the right order.

Yeah would be nice to keep the code working instead of not just crashing :) The gzip code should already have references to the fileobj so to me is surprising it gets deallocated while we are in in GzipFile.close.

@xrmx
Copy link
Contributor Author

xrmx commented Feb 10, 2025

So my original code when getting the gzipped content is this so am not sure I can workaround this other than stop using GzipFile.

    fileobj = buffer.fileobj  # get a reference to the fileobj before closing the gzip file
    buffer.close()

    data = fileobj.getbuffer()

Irony of the comment 😅

The documentation explicitly says that calling GzipFile's close() does not close the fileobj, so I guess it also means it should not be freed:

Calling a GzipFile object’s close() method does not close fileobj, since you might wish to append more material after the compressed data.
This also allows you to pass an io.BytesIO object opened for writing as fileobj, and retrieve the resulting memory buffer using the io.BytesIO object’s getvalue() method.

@cmaloney
Copy link
Contributor

I still haven't found what is causing the fileobj to be closed... The line which is causing the exception print was added to 3.12 in https://github.com/python/cpython/pull/105920/files it looks like to fix #105808

@xrmx
Copy link
Contributor Author

xrmx commented Feb 10, 2025

@cmaloney the one you linked touches flush, here is crashing in close. AFAICS the only commit that touches close (and coincidentally the gc 😅 ) that is in 3.13 but not in 3.12 is b52fc70#diff-ad9b54ac8ef847cbb11fb0550f7e8cede55b1d92de15899e0a885a94a124838a

@cmaloney
Copy link
Contributor

I think it's the move to using a BufferedWriter (gh-89550) changed the behavior around flush and close in some unintended ways (https://github.com/python/cpython/blob/main/Modules/_io/bufferedio.c#L543-L598 being one of the parts). That the stream is being marked as "closed", but later gets flushed to. Still investigating / don't have any verifiable answers yet, just hunches.

@danifus
Copy link
Contributor

danifus commented Feb 11, 2025

Bisected this issue to #105104

@xrmx
Copy link
Contributor Author

xrmx commented Feb 11, 2025

Ok so with 3.12 and development mode I see the same warning, don't see the warning in 3.10 and 3.11 though:

$ python3.13 gzipfoo.py 
Exception ignored in: <gzip on 0x7f39c7098580>
Traceback (most recent call last):
  File "/usr/lib/python3.13/gzip.py", line 359, in close
    fileobj.write(self.compress.flush())
ValueError: I/O operation on closed file.
$ python3.12 gzipfoo.py 
$ python3.12 -X dev gzipfoo.py 
Exception ignored in: <gzip on 0x7f1450669590>
Traceback (most recent call last):
  File "/usr/lib/python3.12/gzip.py", line 357, in close
    fileobj.write(self.compress.flush())
ValueError: I/O operation on closed file.
$ /usr/bin/python3.10 -X dev gzipfoo.py 
$ /usr/bin/python3.11 -X dev gzipfoo.py

So there is something that changed the behavior in 3.12.

@xrmx
Copy link
Contributor Author

xrmx commented Feb 12, 2025

@danifus thanks for your bisection. Any chance you can bisect with the -X dev param? I think there are good chances the code was failing already, just not printing the exception.

@danifus
Copy link
Contributor

danifus commented Feb 12, 2025

I bisected with -X dev and it was introduced by #89550 - the commit @cmaloney identified :)

Could this be due to the GC delaying the finalization of the GzipFile obj due to a reference cycle causing the BytesIO obj to be finalized before GzipFile's IOBase.__del__ ancestor can call GzipFile.close()?

The reference cycle is:
GzipFile._buffer -> io.BufferedWriter
io.BufferedWriter._raw -> _WriteBufferStream
_WriteBufferStream.gzip_file -> back to GzipFile

@cmaloney
Copy link
Contributor

Confirmed that breaking the reference cyle (adding a weakref.proxy()) resolves the issue. Working on a test + PR for that:

            self._buffer = io.BufferedWriter(_WriteBufferStream(weakref.proxy(self)),
                                             buffer_size=self._buffer_size)

cmaloney added a commit to cmaloney/cpython that referenced this issue Feb 12, 2025
A reference loop was resulting in the `fileobj` held by the `GzipFile`
being closed before the `GzipFile`.

The issue started with pythongh-89550 in 3.12, but was hidden in most cases
until 3.13 when pythongh-62948 made it more visible.
vstinner pushed a commit that referenced this issue Feb 28, 2025
A reference loop was resulting in the `fileobj` held by the `GzipFile`
being closed before the `GzipFile`.

The issue started with gh-89550 in 3.12, but was hidden in most cases
until 3.13 when gh-62948 made it more visible.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2025
A reference loop was resulting in the `fileobj` held by the `GzipFile`
being closed before the `GzipFile`.

The issue started with pythongh-89550 in 3.12, but was hidden in most cases
until 3.13 when pythongh-62948 made it more visible.
(cherry picked from commit 7f39137)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 28, 2025
A reference loop was resulting in the `fileobj` held by the `GzipFile`
being closed before the `GzipFile`.

The issue started with pythongh-89550 in 3.12, but was hidden in most cases
until 3.13 when pythongh-62948 made it more visible.
(cherry picked from commit 7f39137)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
@vstinner
Copy link
Member

Fixed by change 7f39137. Backports to 3.12 and 3.13 will follow.

@xrmx
Copy link
Contributor Author

xrmx commented Feb 28, 2025

Thanks a lot!

@vstinner
Copy link
Member

Maybe GzipFile should emit a ResourceWarning if the file is not closed explicitly to make sure that data are written on disk.

vstinner pushed a commit that referenced this issue Feb 28, 2025
…130670)

gh-129726: Break `gzip.GzipFile` reference loop (GH-130055)

A reference loop was resulting in the `fileobj` held by the `GzipFile`
being closed before the `GzipFile`.

The issue started with gh-89550 in 3.12, but was hidden in most cases
until 3.13 when gh-62948 made it more visible.
(cherry picked from commit 7f39137)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
vstinner pushed a commit that referenced this issue Feb 28, 2025
…130669)

gh-129726: Break `gzip.GzipFile` reference loop (GH-130055)

A reference loop was resulting in the `fileobj` held by the `GzipFile`
being closed before the `GzipFile`.

The issue started with gh-89550 in 3.12, but was hidden in most cases
until 3.13 when gh-62948 made it more visible.
(cherry picked from commit 7f39137)

Co-authored-by: Cody Maloney <cmaloney@users.noreply.github.com>
@cmaloney
Copy link
Contributor

I implemented a ResourceWarning on BufferedWriter checking for unwritten data, and just the CPython I/O tests it fires a lot. Even with conditions (only if raw isn't closed, etc) many cases to work through. I think worthwhile to help track down and cleanup inconsistencies at least in CPython itself. Not sure if it's going to be too noisy in general Python ecosystem codebases though.

@vstinner
Copy link
Member

vstinner commented Mar 2, 2025

The test suite can be fixed. The risk of losing data justify emitting ResourceWarning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

5 participants