Numeric scalar variable attributes (including fill_value, scale_factor, add_offset) are 1-d instead of 0-d with h5netcdf engine, triggering ValueError: non-broadcastable output on application when loading single elements #4471
Comments
I just tested this with some more combinations:
Probably related: when reading an open file through a file system instance, the attributes differ:

```python
import xarray
from fsspec.implementations.local import LocalFileSystem

fn = "/data/gholl/cache/fogtools/abi/2017/03/14/20/06/7/OR_ABI-L1b-RadF-M3C07_G16_s20170732006100_e20170732016478_c20170732016514.nc"
ds1 = xarray.open_dataset(fn, decode_cf=True, mask_and_scale=False)
print(ds1["esun"].attrs["_FillValue"])
print(ds1["Rad"].attrs["scale_factor"])
with LocalFileSystem().open(fn) as of:
    ds2 = xarray.open_dataset(of, decode_cf=True, mask_and_scale=False)
    print(ds2["esun"].attrs["_FillValue"])
    print(ds2["Rad"].attrs["scale_factor"])
```

Result: the attributes are scalars when opening by filename, but arrays of length 1 when opening through the file system instance. I strongly suspect that this is what causes the error.
Some further digging shows it's due to differences between the netcdf4 and h5netcdf engines:

```python
import xarray

fn = "/data/gholl/cache/fogtools/abi/2017/03/14/20/06/7/OR_ABI-L1b-RadF-M3C07_G16_s20170732006100_e20170732016478_c20170732016514.nc"
with xarray.open_dataset(fn, decode_cf=False, mask_and_scale=False, engine="netcdf4") as ds:
    print(ds["esun"].attrs["_FillValue"])
    print(ds["Rad"].attrs["scale_factor"])
with xarray.open_dataset(fn, decode_cf=False, mask_and_scale=False, engine="h5netcdf") as ds:
    print(ds["esun"].attrs["_FillValue"])
    print(ds["Rad"].attrs["scale_factor"])
```

Results in: scalar attribute values with the netcdf4 engine, but 1-d arrays of length 1 with the h5netcdf engine.
However, a simple `xarray.open_dataset(fn, engine="h5netcdf")` still fails with the ValueError only if passed an open file, so there appear to still be other differences, apart from the dimensionality of the variable attributes, depending on the backend.
My last comment was inaccurate. Although the open succeeds, the non-scalar scale factor does trigger failure upon accessing data (due to lazy loading), even without any open file:

```python
import xarray

fn = "OR_ABI-L1b-RadF-M3C07_G16_s20170732006100_e20170732016478_c20170732016514.nc"
with xarray.open_dataset(fn, engine="h5netcdf") as ds:
    print(ds["Rad"][400, 300])
```

The data file is publicly available at: s3://noaa-goes16/ABI-L1b-RadF/2017/073/20/OR_ABI-L1b-RadF-M3C07_G16_s20170732006100_e20170732016478_c20170732016514.nc
Interestingly, the problem is prevented if one adds … before the print statement.
According to The NetCDF User's Guide, attributes are supposed to be vectors. That suggests that, strictly speaking, the h5netcdf engine is right and the netcdf4 engine is wrong, and that other components (such as the code where the scale factor and add_offset are applied) need to be adapted to handle arrays of length 1 for those values.
Great diagnosis, @gerritholl. This could be fixed here (I think): `xarray/coding/variables.py`, lines 245 to 263 at commit 333e8db.
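For illustration, a minimal sketch (not the actual xarray source; names chosen for illustration) of how the decode step in that region could be adapted so that `scale_factor` and `add_offset` are coerced to scalars, whether the backend hands them over as scalars or as length-1 arrays:

```python
import numpy as np

def _ensure_scalar(value):
    # Hypothetical helper: np.asarray(...).item() returns a plain Python
    # scalar for a Python scalar, a 0-d array, or a length-1 vector, and
    # raises ValueError if the attribute holds more than one element.
    return np.asarray(value).item()

def _scale_offset_decoding(data, scale_factor=None, add_offset=None, dtype=np.float64):
    # Sketch of the decode transform: without the coercion above, an
    # in-place multiply of a 0-d element by a shape-(1,) scale_factor
    # raises the non-broadcastable-output ValueError from this issue.
    data = np.array(data, dtype=dtype, copy=True)
    if scale_factor is not None:
        data *= _ensure_scalar(scale_factor)
    if add_offset is not None:
        data += _ensure_scalar(add_offset)
    return data
```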
I can try to fix this in a PR, I just need to be sure what the fix should look like: either change the dimensionality of the attributes (which has the potential to break backward compatibility), or adapt other components to handle either scalars or length-1 arrays (the safer alternative, but it may need to happen in more locations both inside and outside xarray, so in this case a note in the documentation could be in order as well). I don't know if xarray strives for consistency between what the different engines expose on opening the same file.
I think we can make this change safely in the decoding machinery. As you point out, it will be backwards compatible.
I agree, xarray's decoding should be robust as to whether these attributes are scalars or vectors of length one. This should probably be considered a bug in h5netcdf, which I guess should adopt the assumption from netCDF4-python that vector attributes of length 1 are scalars. (h5netcdf can store true scalar attributes in HDF5 files, but it's probably better to be consistent with netCDF.)
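To illustrate the difference, a sketch comparing how the two libraries expose the same single-valued attribute; the filename and attribute value are illustrative, and the expected outputs in the comments follow the behaviour described in this issue:

```python
import h5py
import netCDF4
import numpy as np

# Write a single-valued attribute the netCDF way...
with netCDF4.Dataset("demo.nc", "w") as nc:
    nc.createDimension("x", 1)
    var = nc.createVariable("Rad", "i2", ("x",))
    var.scale_factor = np.float32(0.01)

# ...then read it back with both libraries.
with netCDF4.Dataset("demo.nc", "r") as nc:
    print(np.ndim(nc["Rad"].scale_factor))  # 0: netCDF4-python squeezes it to a scalar

with h5py.File("demo.nc", "r") as f:
    print(np.ndim(f["Rad"].attrs["scale_factor"]))  # 1: h5py (and hence h5netcdf) sees a length-1 vector
```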
The h5netcdf engine exposes single-valued attributes as arrays of shape (1,), which is correct according to the NetCDF standard, but may cause a problem when reading a value of shape () before the scale_factor and add_offset have been applied. This PR adds a check for the dimensionality of add_offset and scale_factor and ensures they are scalar before they are used for further processing, adds a unit test to verify that this works correctly, and a note to the documentation to warn users of this difference between the h5netcdf and netcdf4 engines. Fixes pydata#4471.
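A sketch of what such a regression test could look like (not necessarily the exact test added in the PR; `CFScaleOffsetCoder` is the coder referenced above):

```python
import numpy as np
import pytest
from xarray import Variable
from xarray.coding.variables import CFScaleOffsetCoder

@pytest.mark.parametrize("scale", [0.01, np.float32(0.01), np.array([0.01])])
def test_decode_single_element_with_length1_scale(scale):
    # Decoding must work whether scale_factor is a scalar or a
    # length-1 array, including when a single (0-d) element is read.
    var = Variable(("y", "x"), np.ones((2, 3), dtype="i2"),
                   {"scale_factor": scale})
    decoded = CFScaleOffsetCoder().decode(var)
    assert np.isclose(decoded[0, 0].values, 0.01)
```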
Add a whats-new entry for the fix to issue pydata#4471, corresponding to PR pydata#4485.
* Handle scale_factor and add_offset as scalar (see the PR description above; fixes #4471)
* DOC: Add whats-new entry for fixing #4471 (PR #4485)
* Update doc/io.rst

Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
What happened:

When I try to open a NetCDF file using the `h5netcdf` engine, accessing a single data point before scale factors have been applied results in `ValueError: non-broadcastable output operand with shape () doesn't match the broadcast shape (1,)`. The MCVE (see below) results in this ValueError.

What you expected to happen:

I expect the data access to work similarly as when opening with other engines.
Minimal Complete Verifiable Example:
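The reproduction from the comments above, using the publicly available GOES-16 file:

```python
import xarray

fn = "OR_ABI-L1b-RadF-M3C07_G16_s20170732006100_e20170732016478_c20170732016514.nc"
with xarray.open_dataset(fn, engine="h5netcdf") as ds:
    print(ds["Rad"][400, 300])  # raises the ValueError quoted above
```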
Anything else we need to know?:
An earlier version of this issue, and some comments, refer to fsspec or working on open files, but that proved to have nothing to do with the problem.
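For reference, the failing broadcast itself can be reproduced with plain numpy; a sketch of (presumably) the kind of in-place operation the error message points to:

```python
import numpy as np

data = np.array(812, dtype=np.float64)  # a single element read lazily: shape ()
scale_factor = np.array([0.01])         # scale_factor as exposed by h5netcdf: shape (1,)
data *= scale_factor                    # ValueError: non-broadcastable output operand
                                        # with shape () doesn't match the broadcast shape (1,)
```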
Environment:
I've confirmed this issue with xarray installed from latest master, which means xarray 0.16.2.dev11+gf821fe20 at the time of writing.
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.5 | packaged by conda-forge | (default, Sep 24 2020, 16:55:52)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp150.12.82-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4
xarray: 0.16.2.dev11+gf821fe20
pandas: 1.1.2
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.6
cfgrib: None
iris: None
bottleneck: None
dask: 2.27.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: None
seaborn: None
numbagg: None
pint: None
setuptools: 49.6.0.post20200917
pip: 20.2.3
conda: None
pytest: 6.0.2
IPython: None
sphinx: None