ENH: Add default fill values for decode_cf #5680
Unit Test Results: 6 files, 6 suites, 56m 11s ⏱️ For more details on these failures, see this check. Results for commit 26f1b32. ♻️ This comment has been updated with latest results.
xarray/coding/variables.py
Outdated
@@ -183,6 +184,8 @@ def decode(self, variable, name=None):
pop_to(attrs, encoding, attr, name=name)
for attr in ("missing_value", "_FillValue")
]
raw_fill_values.append(netCDF4.default_fillvals["f8"])
I think we should vendor this dictionary to avoid adding netcdf4 as a required dependency.
{'S1': '\x00',
'i1': -127,
'u1': 255,
'i2': -32767,
'u2': 65535,
'i4': -2147483647,
'u4': 4294967295,
'i8': -9223372036854775806,
'u8': 18446744073709551614,
'f4': 9.969209968386869e+36,
'f8': 9.969209968386869e+36}
We'll also have to pick the appropriate one based on dtype.
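A minimal sketch of what vendoring plus dtype-based selection could look like. The values are copied from the dictionary above; the names DEFAULT_FILLVALS and default_fill_value are hypothetical and not part of xarray or netCDF4. The helper takes a type code string such as "f8" (a NumPy dtype exposes a similar string as dtype.str, with a byte-order prefix that is stripped here).

```python
# Vendored copy of the dictionary quoted above (netCDF4.default_fillvals).
DEFAULT_FILLVALS = {
    "S1": "\x00",
    "i1": -127,
    "u1": 255,
    "i2": -32767,
    "u2": 65535,
    "i4": -2147483647,
    "u4": 4294967295,
    "i8": -9223372036854775806,
    "u8": 18446744073709551614,
    "f4": 9.969209968386869e+36,
    "f8": 9.969209968386869e+36,
}


def default_fill_value(type_code):
    """Return the default fill value for a type code such as "f8" or "<i4".

    Hypothetical helper: byte-order prefixes ("<", ">", "=", "|") are
    stripped before the lookup, and None is returned for any type code
    that has no entry in the table.
    """
    key = type_code.lstrip("<>=|")
    return DEFAULT_FILLVALS.get(key)


print(default_fill_value("<f8"))
print(default_fill_value("|i1"))
```

Returning None for unknown type codes leaves it to the caller to decide whether a variable without a known default should be masked at all.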
I would have liked to add the dictionary in … It seems that a lot of test failures are caused by something like:
> xarray.core.dtypes.maybe_promote(dtype('int64'))
(dtype('float64'), nan)
So the answer to this must really be yes, am I right?
Could you clarify where these default fill values come from? Are they just an arbitrary choice by netCDF4-Python? Or are they part of some broader standard?
It's in the standard (partly?): https://www.unidata.ucar.edu/software/netcdf/documentation/4.7.4-pre/file_format_specifications.html#atts_spec
and
EDIT: I remember reading some text about how the default _FillValues are "close to the largest or smallest number representable by a datatype", but I cannot find it now.
AFAIK, these values were chosen because their binary representation is good for compression. For instance, the 32-bit float 9.969209968386869e+36 is 0x7CF00000 in hex. Unfortunately, I can't find a link describing that.
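The bit pattern can be checked with the standard library. This sketch only confirms the hex representation of the float fill value quoted above; it says nothing about whether compression was the actual motivation. The helper names are made up for illustration.

```python
import struct

# Default float fill value quoted in the dictionary above.
FILL = 9.969209968386869e+36


def float32_hex(value):
    """Big-endian IEEE 754 single-precision bytes of value, as hex."""
    return struct.pack(">f", value).hex()


def float64_hex(value):
    """Big-endian IEEE 754 double-precision bytes of value, as hex."""
    return struct.pack(">d", value).hex()


print(float32_hex(FILL))  # 7cf00000
print(float64_hex(FILL))  # 479e000000000000

# The value is exactly 1.875 * 2**122, so only a few mantissa bits are set.
print(FILL == 1.875 * 2.0**122)  # True
```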
Right, so netCDF3 has a default value used for filling out variables before any data is written. My concern is that there are two (overlapping) use cases for fill values:
Certainly these sometimes coincide, but that isn't necessarily the case.
To follow up, from a practical perspective, there are two problems with assuming that there are always "truly missing values" (case 2):
Both of these issues are problematic for faithful "round tripping" of Xarray data into netCDF and back. For this reason, Xarray needs an unambiguous way to know if a netCDF variable could contain semantically missing values. So far, we've used the presence of
pre-commit run --all-files
whats-new.rst
This is a work in progress, mostly so that I can ask some clarifying questions.
I see that netCDF4 is an optional dependency for xarray, so probably import netCDF4 can't be used. Should xarray simply hard-code default fill values?
From the issue's conversation, it wasn't clear to me whether an argument should control the use of the default fill value. Since some tests fail now, I guess the answer is yes.