-
Notifications
You must be signed in to change notification settings - Fork 303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent call to obj.__dict__ when obj is a compound variable. #3069
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for starting this fix. I have a few comments below.
satpy/readers/netcdf_utils.py
Outdated
if obj.dtype.kind == 'V': | ||
print(f'Warning: Cannot load attributes for compound variable ({obj.name})') | ||
return {} | ||
except (AttributeError, KeyError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what cases do we hit these exceptions? When does obj
not have a dtype
or dtype
not have a kind
...or is obj.name
the problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally you could probably replace the except
and pass
with a else: obj.__dict__
. But actually, if the exception isn't possible in the real world then maybe removing the try/except altogether would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When obj is a group there is no dtype == KeyError
When obj is variable, and variable is vlen, then dtype = <class string> == AttributeError class string has no attribute kind (at least in the Tropomi L1B i'm using)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.dtype
is a KeyError? Strange, but ok. I was just wondering, would it be more flexible to do the inverse of these conditions and try first to return obj.__dict__
and if there's an attribute error then warn and return if compound variable otherwise raise the original exception? That way, if it is possible that netcdf4-python adds the ability to read attributes in the future then things just work. Or is that too much wishful thinking on my part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O.K. on looking deeper into the problem with Tropomi, the problem is that the attributes 'units' and 'comment' are compound types of strings whereas '_FillValue' is an appropriate compound type of appropriate numerical value.
I can load the _FillValue attribute (getncattr) and get a tuple consistent with the number of elements in the compound data. I can load the data (variable[:]). however, with the 'units' and 'comment' attributes, I get e.g. 'KeyError: "attribute b'comment' has unsupported datatype".
I have tried to search online (see github.com/Unidata/netcdf4-python/issues/773) and it appears that 'it is not possible to create compound types with variable length arrays' (2018) (numpy issue?). I cannot find anything more recent. So, reading of strings in compound types, as such, seems not to be supported in netCDF4.py, however, ncdump (see below) and panoply (panoply without data loading however) seem to manage - don't understand how.
So, for now,
- i) the simplest would be to replace obj.__dict__ in _get_object_attrs with a call to getncattr from loop over ncattrs with a try/except KeyError to return the loadable attributes and drop the others.
ii) put a try/except on dict as you wanted, and then implement (i) on failure. - otherwise solve the string issue somehow.
This would then parse Tropomi and I would be happy - but it doesn't address any deeper issues that satpy might have with compound data.
From ncdump:
compound msmt_to_det_row_table_type {
short det_start_row ;
short det_end_row ;
}; // msmt_to_det_row_table_type
compound msmt_to_det_row_table_type_str {
string det_start_row ;
string det_end_row ;
}; // msmt_to_det_row_table_type_str
msmt_to_det_row_table_type measurement_to_detector_row_table(time, scanline, ground_pixel) ;
msmt_to_det_row_table_type_str measurement_to_detector_row_table:units = {"1", "1"} ;
msmt_to_det_row_table_type measurement_to_detector_row_table:_FillValue = {-32767, -32767} ;
msmt_to_det_row_table_type_str measurement_to_detector_row_table:comment = {"Detector start row for measurement row", "Detector end row for measurement row"} ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So overall I think I like option 1(ii) best as far as getting most support. BUT as you point out, the rest of satpy isn't going to like having multiple values for things it assumes should be scalars (_FillValue
is typically deleted, units
must be a string, etc).
Panoply is written in Java, ncdump is written in C, so whatever those versions of the NetCDF library are using to store variable length data must allow for these variable length strings. That makes sense given the lower-level nature of those languages.
I'm not sure I've asked this specifically, but what is/are the compound types in any of the files you've seen (ex. tropomi)? I mean, why did they decide a compound type was needed and could we (satpy's readers) split those variables into individual variables?
Any idea if the netcdf4-python library makes it possible to extract one of the "units" elements at a time? Like my_nc_file["my_variable"].getncattr("units", index=0)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of the current satellite sensors that I have come across, Tropomi L1B data are the first I have seen with compound data types (recently). In fact compound datatypes are not listed in the cf convention's datatypes and, in fact, files which include these do not pass the cf-checker afaik (actually, on checking, it does - somehow). I have come across compound types in point-source station data iirc.
When KNMI decided to do this they 'expected that many software products will be upgraded in time to support the features of the enhanced data model.' (Input-output-data-specification-for-the-TROPOMI-L01b-data-processor.pdf, 2014), basically the compound data in Tropomi is just metadata information on the processing/instrument status.
Splitting into individual variables is the simplest and most obvious way forward without interlacing code to manage compound data into satpy - but the consequence would be a lot of single point variables, possibly with duplicate naming conflicts.
As far as I can see, getncattr, getattr, getatrr, getattribute> are, currently, relatively primitive and don't appear to have options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but the consequence would be a lot of single point variables, possibly with duplicate naming conflicts.
What do you mean by this? My understanding of what you said, coming from a place of mostly Visible/IR satellite instrument understanding, is that tropomi has a compound variable. One part of each element is the actual measurement, the rest of the information in a single element is mostly metadata that (hopefully) Satpy's basic usage doesn't need. If we can load the data and some/most of the attributes, then could we pull out just one portion of the compound type that has the observation measurement and throw away the rest to produce the single 2D "image" array that Satpy expects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are correct, sorry I was being distracted by other thoughts. Strikethrough 'single point variables'.
2953e33
to
26aa442
Compare
I assume that: docs/readthedocs.org:satpy — Read the Docs build failed! Is nothing to do with me. |
Correct. It's a bug/memory leak in the new version of sphinx that I'm working on fixing. |
…butes on failure.
AUTHORS.md
if not there already