Prevent call to obj.dict when obj is a compound variable. #3069

mataeui · 2025-02-20T16:59:57Z

Closes NetCDF4FileHandler faults when using obj.__dict__ called when object is a compound variable. #3066 -->
Tests added
Fully documented
Add your name to AUTHORS.md if not there already

djhoese

Thanks for starting this fix. I have a few comments below.

satpy/readers/netcdf_utils.py

djhoese · 2025-02-20T17:17:47Z

satpy/readers/netcdf_utils.py

+            if obj.dtype.kind == 'V':
+                print(f'Warning: Cannot load attributes for compound variable ({obj.name})')
+                return {}
+        except (AttributeError, KeyError):


In what cases do we hit these exceptions? When does obj not have a dtype or dtype not have a kind...or is obj.name the problem?

Additionally you could probably replace the except and pass with a else: obj.__dict__. But actually, if the exception isn't possible in the real world then maybe removing the try/except altogether would be good.

When obj is a group there is no dtype == KeyError
When obj is variable, and variable is vlen, then dtype = <class string> == AttributeError class string has no attribute kind (at least in the Tropomi L1B i'm using)

.dtype is a KeyError? Strange, but ok. I was just wondering, would it be more flexible to do the inverse of these conditions and try first to return obj.__dict__ and if there's an attribute error then warn and return if compound variable otherwise raise the original exception? That way, if it is possible that netcdf4-python adds the ability to read attributes in the future then things just work. Or is that too much wishful thinking on my part?

O.K. on looking deeper into the problem with Tropomi, the problem is that the attributes 'units' and 'comment' are compound types of strings whereas '_FillValue' is an appropriate compound type of appropriate numerical value.

I can load the _FillValue attribute (getncattr) and get a tuple consistent with the number of elements in the compound data. I can load the data (variable[:]). however, with the 'units' and 'comment' attributes, I get e.g. 'KeyError: "attribute b'comment' has unsupported datatype".

I have tried to search online (see github.com/Unidata/netcdf4-python/issues/773) and it appears that 'it is not possible to create compound types with variable length arrays' (2018) (numpy issue?). I cannot find anything more recent. So, reading of strings in compound types, as such, seems not to be supported in netCDF4.py, however, ncdump (see below) and panoply (panoply without data loading however) seem to manage - don't understand how.

So, for now,

i) the simplest would be to replace obj.__dict__ in _get_object_attrs with a call to getncattr from loop over ncattrs with a try/except KeyError to return the loadable attributes and drop the others.
ii) put a try/except on dict as you wanted, and then implement (i) on failure.

otherwise solve the string issue somehow.

This would then parse Tropomi and I would be happy - but it doesn't address any deeper issues that satpy might have with compound data.

From ncdump:
compound msmt_to_det_row_table_type {
short det_start_row ;
short det_end_row ;
}; // msmt_to_det_row_table_type

compound msmt_to_det_row_table_type_str { string det_start_row ; string det_end_row ; }; // msmt_to_det_row_table_type_str msmt_to_det_row_table_type measurement_to_detector_row_table(time, scanline, ground_pixel) ; msmt_to_det_row_table_type_str measurement_to_detector_row_table:units = {"1", "1"} ; msmt_to_det_row_table_type measurement_to_detector_row_table:_FillValue = {-32767, -32767} ; msmt_to_det_row_table_type_str measurement_to_detector_row_table:comment = {"Detector start row for measurement row", "Detector end row for measurement row"} ;

So overall I think I like option 1(ii) best as far as getting most support. BUT as you point out, the rest of satpy isn't going to like having multiple values for things it assumes should be scalars (_FillValue is typically deleted, units must be a string, etc).

Panoply is written in Java, ncdump is written in C, so whatever those versions of the NetCDF library are using to store variable length data must allow for these variable length strings. That makes sense given the lower-level nature of those languages.

I'm not sure I've asked this specifically, but what is/are the compound types in any of the files you've seen (ex. tropomi)? I mean, why did they decide a compound type was needed and could we (satpy's readers) split those variables into individual variables?

Any idea if the netcdf4-python library makes it possible to extract one of the "units" elements at a time? Like my_nc_file["my_variable"].getncattr("units", index=0)?

Of the current satellite sensors that I have come across, Tropomi L1B data are the first I have seen with compound data types (recently). In fact compound datatypes are not listed in the cf convention's datatypes and, in fact, files which include these do not pass the cf-checker afaik (actually, on checking, it does - somehow). I have come across compound types in point-source station data iirc.

When KNMI decided to do this they 'expected that many software products will be upgraded in time to support the features of the enhanced data model.' (Input-output-data-specification-for-the-TROPOMI-L01b-data-processor.pdf, 2014), basically the compound data in Tropomi is just metadata information on the processing/instrument status.

Splitting into individual variables is the simplest and most obvious way forward without interlacing code to manage compound data into satpy - but the consequence would be a lot of single point variables, possibly with duplicate naming conflicts.

As far as I can see, getncattr, getattr, getatrr, getattribute> are, currently, relatively primitive and don't appear to have options.

but the consequence would be a lot of single point variables, possibly with duplicate naming conflicts.

What do you mean by this? My understanding of what you said, coming from a place of mostly Visible/IR satellite instrument understanding, is that tropomi has a compound variable. One part of each element is the actual measurement, the rest of the information in a single element is mostly metadata that (hopefully) Satpy's basic usage doesn't need. If we can load the data and some/most of the attributes, then could we pull out just one portion of the compound type that has the observation measurement and throw away the rest to produce the single 2D "image" array that Satpy expects?

Yes, you are correct, sorry I was being distracted by other thoughts. Strikethrough 'single point variables'.

mataeui · 2025-02-21T13:40:12Z

I assume that:

docs/readthedocs.org:satpy — Read the Docs build failed!

Is nothing to do with me.

djhoese · 2025-02-21T14:44:27Z

Correct. It's a bug/memory leak in the new version of sphinx that I'm working on fixing.

…butes on failure.

mataeui requested review from djhoese and mraspaud as code owners February 20, 2025 16:59

djhoese changed the title ~~#3066 Prevent call to obj.__dict__ when obj is a compound variable.~~ Prevent call to obj.__dict__ when obj is a compound variable. Feb 20, 2025

djhoese reviewed Feb 20, 2025

View reviewed changes

mataeui force-pushed the main branch 4 times, most recently from 2953e33 to 26aa442 Compare February 20, 2025 18:52

To get object attributes use __dict__ but retrieve all possible attri…

07adb7a

…butes on failure.

mataeui force-pushed the main branch from 26aa442 to 07adb7a Compare February 24, 2025 18:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prevent call to obj.dict when obj is a compound variable. #3069

Prevent call to obj.dict when obj is a compound variable. #3069

mataeui commented Feb 20, 2025

djhoese left a comment

djhoese Feb 20, 2025

djhoese Feb 20, 2025

mataeui Feb 20, 2025 •

edited

Loading

djhoese Feb 21, 2025

mataeui Feb 24, 2025 •

edited

Loading

djhoese Feb 24, 2025

mataeui Feb 25, 2025 •

edited

Loading

djhoese Feb 25, 2025

mataeui Feb 25, 2025 •

edited

Loading

mataeui commented Feb 21, 2025

djhoese commented Feb 21, 2025

Prevent call to obj.__dict__ when obj is a compound variable. #3069

Are you sure you want to change the base?

Prevent call to obj.__dict__ when obj is a compound variable. #3069

Conversation

mataeui commented Feb 20, 2025

djhoese left a comment

Choose a reason for hiding this comment

djhoese Feb 20, 2025

Choose a reason for hiding this comment

djhoese Feb 20, 2025

Choose a reason for hiding this comment

mataeui Feb 20, 2025 • edited Loading

Choose a reason for hiding this comment

djhoese Feb 21, 2025

Choose a reason for hiding this comment

mataeui Feb 24, 2025 • edited Loading

Choose a reason for hiding this comment

djhoese Feb 24, 2025

Choose a reason for hiding this comment

mataeui Feb 25, 2025 • edited Loading

Choose a reason for hiding this comment

djhoese Feb 25, 2025

Choose a reason for hiding this comment

mataeui Feb 25, 2025 • edited Loading

Choose a reason for hiding this comment

mataeui commented Feb 21, 2025

djhoese commented Feb 21, 2025

Prevent call to obj.dict when obj is a compound variable. #3069

Prevent call to obj.dict when obj is a compound variable. #3069

mataeui Feb 20, 2025 •

edited

Loading

mataeui Feb 24, 2025 •

edited

Loading

mataeui Feb 25, 2025 •

edited

Loading

mataeui Feb 25, 2025 •

edited

Loading