Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent call to obj.__dict__ when obj is a compound variable. #3069

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mataeui
Copy link

@mataeui mataeui commented Feb 20, 2025

@djhoese djhoese changed the title #3066 Prevent call to obj.__dict__ when obj is a compound variable. Prevent call to obj.__dict__ when obj is a compound variable. Feb 20, 2025
Copy link
Member

@djhoese djhoese left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting this fix. I have a few comments below.

if obj.dtype.kind == 'V':
print(f'Warning: Cannot load attributes for compound variable ({obj.name})')
return {}
except (AttributeError, KeyError):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what cases do we hit these exceptions? When does obj not have a dtype or dtype not have a kind...or is obj.name the problem?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally you could probably replace the except and pass with a else: obj.__dict__. But actually, if the exception isn't possible in the real world then maybe removing the try/except altogether would be good.

Copy link
Author

@mataeui mataeui Feb 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When obj is a group there is no dtype == KeyError
When obj is variable, and variable is vlen, then dtype = <class string> == AttributeError class string has no attribute kind (at least in the Tropomi L1B i'm using)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.dtype is a KeyError? Strange, but ok. I was just wondering, would it be more flexible to do the inverse of these conditions and try first to return obj.__dict__ and if there's an attribute error then warn and return if compound variable otherwise raise the original exception? That way, if it is possible that netcdf4-python adds the ability to read attributes in the future then things just work. Or is that too much wishful thinking on my part?

Copy link
Author

@mataeui mataeui Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

O.K. on looking deeper into the problem with Tropomi, the problem is that the attributes 'units' and 'comment' are compound types of strings whereas '_FillValue' is an appropriate compound type of appropriate numerical value.

I can load the _FillValue attribute (getncattr) and get a tuple consistent with the number of elements in the compound data. I can load the data (variable[:]). however, with the 'units' and 'comment' attributes, I get e.g. 'KeyError: "attribute b'comment' has unsupported datatype".

I have tried to search online (see github.com/Unidata/netcdf4-python/issues/773) and it appears that 'it is not possible to create compound types with variable length arrays' (2018) (numpy issue?). I cannot find anything more recent. So, reading of strings in compound types, as such, seems not to be supported in netCDF4.py, however, ncdump (see below) and panoply (panoply without data loading however) seem to manage - don't understand how.

So, for now,

  1. i) the simplest would be to replace obj.__dict__ in _get_object_attrs with a call to getncattr from loop over ncattrs with a try/except KeyError to return the loadable attributes and drop the others.
    ii) put a try/except on dict as you wanted, and then implement (i) on failure.
  2. otherwise solve the string issue somehow.

This would then parse Tropomi and I would be happy - but it doesn't address any deeper issues that satpy might have with compound data.

From ncdump:
compound msmt_to_det_row_table_type {
short det_start_row ;
short det_end_row ;
}; // msmt_to_det_row_table_type

    compound msmt_to_det_row_table_type_str {
      string det_start_row ;
      string det_end_row ;
    }; // msmt_to_det_row_table_type_str

    msmt_to_det_row_table_type measurement_to_detector_row_table(time, scanline, ground_pixel) ;
    msmt_to_det_row_table_type_str measurement_to_detector_row_table:units = {"1", "1"} ;
    msmt_to_det_row_table_type measurement_to_detector_row_table:_FillValue = {-32767, -32767} ;
    msmt_to_det_row_table_type_str measurement_to_detector_row_table:comment = {"Detector start row for measurement row", "Detector end row for measurement row"} ;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So overall I think I like option 1(ii) best as far as getting most support. BUT as you point out, the rest of satpy isn't going to like having multiple values for things it assumes should be scalars (_FillValue is typically deleted, units must be a string, etc).

Panoply is written in Java, ncdump is written in C, so whatever those versions of the NetCDF library are using to store variable length data must allow for these variable length strings. That makes sense given the lower-level nature of those languages.

I'm not sure I've asked this specifically, but what is/are the compound types in any of the files you've seen (ex. tropomi)? I mean, why did they decide a compound type was needed and could we (satpy's readers) split those variables into individual variables?

Any idea if the netcdf4-python library makes it possible to extract one of the "units" elements at a time? Like my_nc_file["my_variable"].getncattr("units", index=0)?

Copy link
Author

@mataeui mataeui Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the current satellite sensors that I have come across, Tropomi L1B data are the first I have seen with compound data types (recently). In fact compound datatypes are not listed in the cf convention's datatypes and, in fact, files which include these do not pass the cf-checker afaik (actually, on checking, it does - somehow). I have come across compound types in point-source station data iirc.

When KNMI decided to do this they 'expected that many software products will be upgraded in time to support the features of the enhanced data model.' (Input-output-data-specification-for-the-TROPOMI-L01b-data-processor.pdf, 2014), basically the compound data in Tropomi is just metadata information on the processing/instrument status.

Splitting into individual variables is the simplest and most obvious way forward without interlacing code to manage compound data into satpy - but the consequence would be a lot of single point variables, possibly with duplicate naming conflicts.

As far as I can see, getncattr, getattr, getatrr, getattribute> are, currently, relatively primitive and don't appear to have options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but the consequence would be a lot of single point variables, possibly with duplicate naming conflicts.

What do you mean by this? My understanding of what you said, coming from a place of mostly Visible/IR satellite instrument understanding, is that tropomi has a compound variable. One part of each element is the actual measurement, the rest of the information in a single element is mostly metadata that (hopefully) Satpy's basic usage doesn't need. If we can load the data and some/most of the attributes, then could we pull out just one portion of the compound type that has the observation measurement and throw away the rest to produce the single 2D "image" array that Satpy expects?

Copy link
Author

@mataeui mataeui Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct, sorry I was being distracted by other thoughts. Strikethrough 'single point variables'.

@mataeui mataeui force-pushed the main branch 4 times, most recently from 2953e33 to 26aa442 Compare February 20, 2025 18:52
@mataeui
Copy link
Author

mataeui commented Feb 21, 2025

I assume that:

docs/readthedocs.org:satpy — Read the Docs build failed!

Is nothing to do with me.

@djhoese
Copy link
Member

djhoese commented Feb 21, 2025

Correct. It's a bug/memory leak in the new version of sphinx that I'm working on fixing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NetCDF4FileHandler faults when using obj.__dict__ called when object is a compound variable.
2 participants