Reset file pointer to 0 when reading file stream #7304
Conversation
Instead of raising a ValueError about the file pointer not being at the start of the file, reset the file pointer automatically to zero, and warn that the pointer has been reset.
Traceback from the 2 test failures at https://github.com/pydata/xarray/actions/runs/3516849430/jobs/5893926099#step:9:252

```
=================================== FAILURES ===================================
____________________ TestH5NetCDFFileObject.test_open_twice ____________________
[gw2] linux -- Python 3.10.7 /home/runner/micromamba-root/envs/xarray-tests/bin/python
self = <xarray.tests.test_backends.TestH5NetCDFFileObject object at 0x7f211de81e40>

    def test_open_twice(self) -> None:
        expected = create_test_data()
        expected.attrs["foo"] = "bar"
>       with pytest.raises(ValueError, match=r"read/write pointer not at the start"):
E       Failed: DID NOT RAISE <class 'ValueError'>

/home/runner/work/xarray/xarray/xarray/tests/test_backends.py:3034: Failed
___________________ TestH5NetCDFFileObject.test_open_fileobj ___________________
[gw2] linux -- Python 3.10.7 /home/runner/micromamba-root/envs/xarray-tests/bin/python
self = <xarray.tests.test_backends.TestH5NetCDFFileObject object at 0x7f211de82530>

    @requires_scipy
    def test_open_fileobj(self) -> None:
        # open in-memory datasets instead of local file paths
        expected = create_test_data().drop_vars("dim3")
        expected.attrs["foo"] = "bar"
        with create_tmp_file() as tmp_file:
            expected.to_netcdf(tmp_file, engine="h5netcdf")

            with open(tmp_file, "rb") as f:
                with open_dataset(f, engine="h5netcdf") as actual:
                    assert_identical(expected, actual)

                f.seek(0)
                with open_dataset(f) as actual:
                    assert_identical(expected, actual)

                f.seek(0)
                with BytesIO(f.read()) as bio:
                    with open_dataset(bio, engine="h5netcdf") as actual:
                        assert_identical(expected, actual)

                f.seek(0)
                with pytest.raises(TypeError, match="not a valid NetCDF 3"):
                    open_dataset(f, engine="scipy")

            # TODO: this additional open is required since scipy seems to close the file
            # when it fails on the TypeError (though didn't when we used
            # `raises_regex`?). Ref https://github.com/pydata/xarray/pull/5191
            with open(tmp_file, "rb") as f:
                f.seek(8)
                with pytest.raises(
                    ValueError,
                    match="match in any of xarray's currently installed IO",
                ):
>                   with pytest.warns(
                        RuntimeWarning,
                        match=re.escape("'h5netcdf' fails while guessing"),
                    ):
E                   Failed: DID NOT WARN. No warnings of type (<class 'RuntimeWarning'>,) matching the regex were emitted.
E                    Regex: 'h5netcdf'\ fails\ while\ guessing
E                    Emitted warnings: [UserWarning('cannot guess the engine, file-like object read/write pointer not at the start of the file, so resetting file pointer to zero. If this does not work, please close and reopen, or use a context manager'),
E                   RuntimeWarning("deallocating CachingFileManager(<class 'h5netcdf.core.File'>, <_io.BufferedReader name='/tmp/tmpoxdfl12i/temp-720.nc'>, mode='r', kwargs={'invalid_netcdf': None, 'decode_vlen_strings': True}, manager_id='b62ec6c8-b328-409c-bc5d-bbab265bea51'), but file is not already closed. This may indicate a bug.")]

/home/runner/work/xarray/xarray/xarray/tests/test_backends.py:3076: Failed
=========================== short test summary info ============================
FAILED xarray/tests/test_backends.py::TestH5NetCDFFileObject::test_open_twice - Failed: DID NOT RAISE <class 'ValueError'>
FAILED xarray/tests/test_backends.py::TestH5NetCDFFileObject::test_open_fileobj - Failed: DID NOT WARN. No warnings of type (<class 'RuntimeWarning'>,) matching the regex were emitted.
= 2 failed, 14608 passed, 1190 skipped, 203 xfailed, 73 xpassed, 54 warnings in 581.98s (0:09:41) =
```
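The second failure stems from the test moving the pointer with `f.seek(8)` before handing the object to `open_dataset`, which then sniffs the magic number to guess an engine. A minimal `BytesIO` demo of why pointer position matters for magic-number detection (the HDF5 magic bytes are real; the padding and overall layout are illustrative):

```python
import io

# First 8 bytes of any HDF5 file, followed by illustrative padding
data = b"\x89HDF\r\n\x1a\n" + b"\x00" * 8
f = io.BytesIO(data)

f.seek(8)  # simulate a consumer that moved the pointer mid-stream
assert f.read(8) != b"\x89HDF\r\n\x1a\n"  # a magic-number check fails here

f.seek(0)  # the fix: rewind before sniffing the magic number
assert f.read(8) == b"\x89HDF\r\n\x1a\n"  # now the check succeeds
f.seek(0)  # and restore the pointer for the real reader
```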
Fixes the `Failed: DID NOT RAISE <class 'ValueError'>` test failures. The ValueError and RuntimeWarning aren't raised anymore.
xarray/core/utils.py
Outdated

```python
warnings.warn(
    "cannot guess the engine, "
    "file-like object read/write pointer not at the start of the file, "
    "so resetting file pointer to zero. If this does not work, "
    "please close and reopen, or use a context manager"
)
```
Is a warning actually necessary here, or can we remove the warning and let the file pointer reset to zero silently?
Edit: warning has been removed in 929cb62
@martindurant do you have time to take a look here please?
It looks reasonable to me. I'm not sure whether the warning is needed or not; we don't expect anyone to see it, or, if they do, to necessarily do anything about it. It's not unusual for code interacting with a file-like object to move the file pointer.
Hmm, in that case I'm leaning towards removing the warning. The file pointer is reset anyway after reading the magic byte number, and that hasn't caused any issues (as mentioned in #6813 (comment)), so it should be more or less safe. Let me push another commit. Edit: done at 929cb62.
File pointer is reset to zero after reading the magic byte number anyway, so should be ok not to warn about this.
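The peek-then-rewind pattern the commit message refers to can be sketched like this. This is a hypothetical stand-in for xarray's magic-number helper (the name `read_magic_number` and the default `count` are assumptions, and real NetCDF/HDF5 detection compares against known signatures):

```python
import io


def read_magic_number(filename_or_obj, count=8):
    """Read the first ``count`` bytes, then seek back to 0.

    Hypothetical helper: rewinds silently (per the review decision to
    drop the warning) so downstream readers see the whole stream.
    """
    if filename_or_obj.tell() != 0:
        filename_or_obj.seek(0)  # silent reset instead of raising
    magic = filename_or_obj.read(count)
    filename_or_obj.seek(0)  # pointer ends up back at the start either way
    return magic


f = io.BytesIO(b"CDF\x01rest-of-file")  # classic NetCDF 3 magic bytes
assert read_magic_number(f, 3) == b"CDF"
assert f.tell() == 0  # downstream readers start from byte 0
```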
Thanks @weiji14 !
* upstream/main: (39 commits)
  Support the new compression argument in netCDF4 > 1.6.0 (pydata#6981)
  Remove setuptools-scm-git-archive, require setuptools-scm>=7 (pydata#7253)
  Fix mypy failures (pydata#7343)
  Docs: add example of writing and reading groups to netcdf (pydata#7338)
  Reset file pointer to 0 when reading file stream (pydata#7304)
  Enable mypy warn unused ignores (pydata#7335)
  Optimize some copying (pydata#7209)
  Add parse_dims func (pydata#7051)
  Fix coordinate attr handling in `xr.where(..., keep_attrs=True)` (pydata#7229)
  Remove code used to support h5py<2.10.0 (pydata#7334)
  [pre-commit.ci] pre-commit autoupdate (pydata#7330)
  Fix PR number in what's new (pydata#7331)
  Enable `origin` and `offset` arguments in `resample` (pydata#7284)
  fix doctests: suppress urllib3 warning (pydata#7326)
  fix flake8 config (pydata#7321)
  implement Zarr v3 spec support (pydata#6475)
  Fix polyval overloads (pydata#7315)
  deprecate pynio backend (pydata#7301)
  mypy - Remove some ignored packages and modules (pydata#7319)
  Switch to T_DataArray in .coords (pydata#7285)
  ...
Bump xarray from 2022.11.0 to 2022.12.0 which contains a [bugfix](pydata/xarray#7304) useful for reading multiple groups from ICESat-2 HDF5 files in AWS S3 buckets. Also taking the opportunity to sort packages alphabetically in the `environment.yml` and reorganize some sections originally added in #2. Hopefully this will make it clearer on where new conda packages can be added in the future!