Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reader for NWC SAF HRW (high resolution winds) data #3052

Open
pnuu opened this issue Feb 7, 2025 · 8 comments · May be fixed by #3070
Open

Reader for NWC SAF HRW (high resolution winds) data #3052

pnuu opened this issue Feb 7, 2025 · 8 comments · May be fixed by #3070
Assignees
Labels
component:readers enhancement code enhancements, features, improvements help wanted

Comments

@pnuu
Copy link
Member

pnuu commented Feb 7, 2025

Feature Request

I'm looking at creating a reader for NWC SAF (v2023.3) HRW data (in NetCDF4). The format is complicated with two nested compound data types:

$ ncdump -h S_NWC_HRW_MSG3_MSG-N-BS_20250206T130000Z.nc
netcdf S_NWC_HRW_MSG3_MSG-N-BS_20250206T130000Z {
types:
  compound segment {
    float latitude ;
    float longitude ;
    float latitude_increment ;
    float longitude_increment ;
    float air_temperature ;
    float air_pressure ;
    float air_pressure_error ;
    float air_pressure_correction ;
    float wind_speed ;
    float wind_from_direction ;
    ubyte quality_index_with_forecast ;
    ubyte quality_index_without_forecast ;
    ubyte quality_index_iwwg_value ;
    ubyte tracer_correlation_method ;
    ubyte tracer_type ;
    ubyte height_assignment_method ;
    ubyte orographic_index ;
    ubyte cloud_type ;
    ubyte correlation ;
  }; // segment
  segment(*) trajectory ;
  compound Wind {
    uint wind_idx ;
    uint previous_wind_idx ;
    ubyte number_of_winds ;
    ubyte correlation_test ;
    ushort quality_test ;
    uint segment_x ;
    uint segment_y ;
    uint segment_x_pix ;
    uint segment_y_pix ;
    float latitude ;
    float longitude ;
    float latitude_increment ;
    float longitude_increment ;
    float air_temperature ;
    float air_pressure ;
    float air_pressure_error ;
    float air_pressure_correction ;
    float air_pressure_nwp_at_best_fit_level ;
    float wind_speed ;
    float wind_from_direction ;
    float wind_speed_nwp_at_amv_level ;
    float wind_from_direction_nwp_at_amv_level ;
    float wind_speed_nwp_at_best_fit_level ;
    float wind_from_direction_nwp_at_best_fit_level ;
    float wind_speed_difference_nwp_at_amv_level ;
    float wind_from_direction_difference_nwp_at_amv_level ;
    float wind_speed_difference_nwp_at_best_fit_level ;
    float wind_from_direction_difference_nwp_at_best_fit_level ;
    ubyte quality_index_with_forecast ;
    ubyte quality_index_without_forecast ;
    ubyte quality_index_iwwg_value ;
    ubyte tracer_correlation_method ;
    ubyte tracer_type ;
    ubyte height_assignment_method ;
    ubyte orographic_index ;
    ubyte cloud_type ;
    ubyte correlation ;
    trajectory trajectory ;
  }; // Wind
dimensions:
	number_of_observations_HRVIS = UNLIMITED ; // (7263 currently)
	number_of_observations_VIS06 = UNLIMITED ; // (14313 currently)
	number_of_observations_VIS08 = UNLIMITED ; // (15614 currently)
	number_of_observations_IR108 = UNLIMITED ; // (18415 currently)
	number_of_observations_IR120 = UNLIMITED ; // (0 currently)
	number_of_observations_WV062 = UNLIMITED ; // (4954 currently)
	number_of_observations_WV073 = UNLIMITED ; // (11404 currently)
variables:
	Wind wind_hrvis(number_of_observations_HRVIS) ;
		wind_hrvis:standard_name = "Atmospheric winds" ;
		wind_hrvis:long_name = "NWC/GEO High Resolution Winds" ;
		wind_hrvis:wind_computation_method = 2U ;
		wind_hrvis:first_guess = 2U ;
		wind_hrvis:sensor_band_identifier = "HRV" ;
		wind_hrvis:sensor_band_central_radiation_frequency = 3.997e+14f ;
		wind_hrvis:sensor_band_central_radiation_width = 7.495e+14f ;
		wind_hrvis:cycle = 53.f ;
		wind_hrvis:manual_automatic_quality_control = 0U ;
		wind_hrvis:time_period = 15.f ;
		wind_hrvis:number_of_nwp_wind_levels = 15U ;
		wind_hrvis:validation_nwp_forecast_or_analysis = 1U ;
	Wind wind_vis06(number_of_observations_VIS06) ;
		wind_vis06:standard_name = "Atmospheric winds" ;
		wind_vis06:long_name = "NWC/GEO High Resolution Winds" ;
		wind_vis06:wind_computation_method = 2U ;
		wind_vis06:first_guess = 2U ;
		wind_vis06:sensor_band_identifier = "VIS006" ;
		wind_vis06:sensor_band_central_radiation_frequency = 4.721e+14f ;
		wind_vis06:sensor_band_central_radiation_width = 1.9986e+15f ;
		wind_vis06:cycle = 53.f ;
		wind_vis06:manual_automatic_quality_control = 0U ;
		wind_vis06:time_period = 15.f ;
		wind_vis06:number_of_nwp_wind_levels = 15U ;
		wind_vis06:validation_nwp_forecast_or_analysis = 1U ;
	Wind wind_vis08(number_of_observations_VIS08) ;
		wind_vis08:standard_name = "Atmospheric winds" ;
		wind_vis08:long_name = "NWC/GEO High Resolution Winds" ;
		wind_vis08:wind_computation_method = 2U ;
		wind_vis08:first_guess = 2U ;
		wind_vis08:sensor_band_identifier = "VIS008" ;
		wind_vis08:sensor_band_central_radiation_frequency = 3.701e+14f ;
		wind_vis08:sensor_band_central_radiation_width = 2.1414e+15f ;
		wind_vis08:cycle = 53.f ;
		wind_vis08:manual_automatic_quality_control = 0U ;
		wind_vis08:time_period = 15.f ;
		wind_vis08:number_of_nwp_wind_levels = 15U ;
		wind_vis08:validation_nwp_forecast_or_analysis = 1U ;
	Wind wind_ir108(number_of_observations_IR108) ;
		wind_ir108:standard_name = "Atmospheric winds" ;
		wind_ir108:long_name = "NWC/GEO High Resolution Winds" ;
		wind_ir108:wind_computation_method = 1U ;
		wind_ir108:first_guess = 2U ;
		wind_ir108:sensor_band_identifier = "IR_108" ;
		wind_ir108:sensor_band_central_radiation_frequency = 2.78e+13f ;
		wind_ir108:sensor_band_central_radiation_width = 1.499e+14f ;
		wind_ir108:cycle = 53.f ;
		wind_ir108:manual_automatic_quality_control = 0U ;
		wind_ir108:time_period = 15.f ;
		wind_ir108:number_of_nwp_wind_levels = 15U ;
		wind_ir108:validation_nwp_forecast_or_analysis = 1U ;
	Wind wind_ir120(number_of_observations_IR120) ;
		wind_ir120:standard_name = "Atmospheric winds" ;
		wind_ir120:long_name = "NWC/GEO High Resolution Winds" ;
		wind_ir120:wind_computation_method = 1U ;
		wind_ir120:first_guess = 2U ;
		wind_ir120:sensor_band_identifier = "IR_120" ;
		wind_ir120:sensor_band_central_radiation_frequency = 2.5e+13f ;
		wind_ir120:sensor_band_central_radiation_width = 1.499e+14f ;
		wind_ir120:cycle = 53.f ;
		wind_ir120:manual_automatic_quality_control = 0U ;
		wind_ir120:time_period = 15.f ;
		wind_ir120:number_of_nwp_wind_levels = 15U ;
		wind_ir120:validation_nwp_forecast_or_analysis = 1U ;
	Wind wind_wv062(number_of_observations_WV062) ;
		wind_wv062:standard_name = "Atmospheric winds" ;
		wind_wv062:long_name = "NWC/GEO High Resolution Winds" ;
		wind_wv062:wind_computation_method = 7U ;
		wind_wv062:first_guess = 2U ;
		wind_wv062:sensor_band_identifier = "WV_062" ;
		wind_wv062:sensor_band_central_radiation_frequency = 4.8e+13f ;
		wind_wv062:sensor_band_central_radiation_width = 1.666e+14f ;
		wind_wv062:cycle = 53.f ;
		wind_wv062:manual_automatic_quality_control = 0U ;
		wind_wv062:time_period = 15.f ;
		wind_wv062:number_of_nwp_wind_levels = 15U ;
		wind_wv062:validation_nwp_forecast_or_analysis = 1U ;
	Wind wind_wv073(number_of_observations_WV073) ;
		wind_wv073:standard_name = "Atmospheric winds" ;
		wind_wv073:long_name = "NWC/GEO High Resolution Winds" ;
		wind_wv073:wind_computation_method = 7U ;
		wind_wv073:first_guess = 2U ;
		wind_wv073:sensor_band_identifier = "WV_073" ;
		wind_wv073:sensor_band_central_radiation_frequency = 4.08e+13f ;
		wind_wv073:sensor_band_central_radiation_width = 2.998e+14f ;
		wind_wv073:cycle = 53.f ;
		wind_wv073:manual_automatic_quality_control = 0U ;
		wind_wv073:time_period = 15.f ;
		wind_wv073:number_of_nwp_wind_levels = 15U ;
		wind_wv073:validation_nwp_forecast_or_analysis = 1U ;

// global attributes:
		:Conventions = "CF-1.6" ;
		:title = "NWC/GEO-High Resolution Winds Product" ;
		:history = "2025-02-06T13:18:58Z satman Product Created by NWC/GEO v2021.3\n2025-02-06T13:18:58Z satman GEO-HRW-v622 2025-02-06T13:00:00Z safnwc_MSGN.cfg safnwc_HRW.cfm" ;
		:institution = "FMI" ;
		:source = "NWC/GEO version v2021.3" ;
		:comment = "Copyright 2025, EUMETSAT, All Rights Reserved" ;
		:references = "http://nwc-saf.eumetsat.int" ;
		:contact = "safnwchd@aemet.es" ;
		:summary = "High Resolution Winds Product of the NWC/GEO. Detailed sets of Atmospheric Motion Vectors and Trajectories throughout all hours of the day, considering visible, infrared and water vapour channel data" ;
		:keywords = "Atmospheric Motion Vectors, Satellite winds, Satellite trajectories" ;
		:keywords_vocabulary = "GCMD Science Keywords" ;
		:id = "S_NWC_HRW_MSG3_MSG-N-VISIR_20250206T130000Z.nc" ;
		:naming_authority = "FMI" ;
		:cdm_data_type = "Bulletin" ;
		:date_created = "2025-02-06T13:18:58Z" ;
		:creator_name = "FMI" ;
		:creator_url = "https://en.ilmatieteenlaitos.fi/" ;
		:creator_email = "@fmi.fi" ;
		:project = "NWC/GEO" ;
		:processing_level = "Level 2" ;
		:time_coverage_start = "2025-02-06T13:09:08Z" ;
		:time_coverage_end = "2025-02-06T13:12:13Z" ;
		:license = "EUMETSAT user policy" ;
		:saf = "NWC/GEO" ;
		:product_name = "HRW" ;
		:product_algorithm_version = "6.2.2" ;
		:satellite_identifier = "MSG3" ;
		:sub-satellite_longitude = 0.f ;
		:centre_projection_longitude = 0.f ;
		:nominal_product_time = "2025-02-06T13:00:00Z" ;
		:region_id = "MSG-N" ;
		:region_name = "MSG-N; CENTRE=42N 0E; SIZE=928x3712 VISIR pix" ;
		:spatial_resolution = 3.f ;
		:product_quality = 89.2335f ;
		:product_completeness = 3.137833f ;
}

The structure is such that netCDF4 backend just says the compound datatype is unsupported, so none of the data are accessible:

fid = xr.open_dataset(fname, engine="netcdf4")
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: unsupported VLEN type, skipping...
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: unsupported Compound type, skipping...
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_hrvis' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_vis06' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_vis08' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_ir108' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_ir120' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_wv062' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)
/home/lahtinep/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/file_manager.py:217: UserWarning: WARNING: variable 'wind_wv073' has unsupported compound datatype, skipping ..
  file = self._opener(*self._args, **kwargs)

Using h5netcdf I get further:

In [70]: fid = xr.open_dataset(fname, engine="h5netcdf")

In [71]: wind_vis06 = fid["wind_vis06"]

In [72]: wind_vis06
Out[72]: 
<xarray.DataArray 'wind_vis06' (number_of_observations_VIS06: 14313)> Size: 2MB
[14313 values with dtype={'names': ['wind_idx', 'previous_wind_idx', 'number_of_winds', 'correlation_test', 'quality_test', 'segment_x', 'segment_y', 'segment_x_pix', 'segment_y_pix', 'latitude', 'longitude', 'latitude_increment', 'longitude_increment', 'air_temperature', 'air_pressure', 'air_pressure_error', 'air_pressure_correction', 'air_pressure_nwp_at_best_fit_level', 'wind_speed', 'wind_from_direction', 'wind_speed_nwp_at_amv_level', 'wind_from_direction_nwp_at_amv_level', 'wind_speed_nwp_at_best_fit_level', 'wind_from_direction_nwp_at_best_fit_level', 'wind_speed_difference_nwp_at_amv_level', 'wind_from_direction_difference_nwp_at_amv_level', 'wind_speed_difference_nwp_at_best_fit_level', 'wind_from_direction_difference_nwp_at_best_fit_level', 'quality_index_with_forecast', 'quality_index_without_forecast', 'quality_index_iwwg_value', 'tracer_correlation_method', 'tracer_type', 'height_assignment_method', 'orographic_index', 'cloud_type', 'correlation', 'trajectory'], 'formats': ['<u4', '<u4', 'u1', 'u1', '<u2', '<u4', '<u4', '<u4', '<u4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', '<f4', 'u1', 'u1', 'u1', 'u1', 'u1', 'u1', 'u1', 'u1', 'u1', 'O'], 'offsets': [0, 4, 8, 9, 10, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 60, 64, 68, 72, 76, 80, 84, 88, 92, 96, 100, 104, 105, 106, 107, 108, 109, 110, 111, 112, 120], 'itemsize': 136}]
Dimensions without coordinates: number_of_observations_VIS06
Attributes:
    standard_name:                            Atmospheric winds
    long_name:                                NWC/GEO High Resolution Winds
    wind_computation_method:                  2
    first_guess:                              2
    sensor_band_identifier:                   VIS006
    sensor_band_central_radiation_frequency:  472100000000000.0
    sensor_band_central_radiation_width:      1998600000000000.0
    cycle:                                    53.0
    manual_automatic_quality_control:         0
    time_period:                              15.0
    number_of_nwp_wind_levels:                15
    validation_nwp_forecast_or_analysis:      1

But now I can't find a way to access the actual data:

In [74]: data = wind_vis06.data
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[74], line 1
----> 1 data = wind_vis06.data

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/dataarray.py:795, in DataArray.data(self)
    783 @property
    784 def data(self) -> Any:
    785     """
    786     The DataArray's data as an array. The underlying array type
    787     (e.g. dask, sparse, pint) is preserved.
   (...)
    793     DataArray.values
    794     """
--> 795     return self.variable.data

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/variable.py:415, in Variable.data(self)
    413     return self._data
    414 elif isinstance(self._data, indexing.ExplicitlyIndexed):
--> 415     return self._data.get_duck_array()
    416 else:
    417     return self.values

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/indexing.py:835, in MemoryCachedArray.get_duck_array(self)
    834 def get_duck_array(self):
--> 835     self._ensure_cached()
    836     return self.array.get_duck_array()

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/indexing.py:832, in MemoryCachedArray._ensure_cached(self)
    831 def _ensure_cached(self):
--> 832     self.array = as_indexable(self.array.get_duck_array())

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/indexing.py:789, in CopyOnWriteArray.get_duck_array(self)
    788 def get_duck_array(self):
--> 789     return self.array.get_duck_array()

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/indexing.py:652, in LazilyIndexedArray.get_duck_array(self)
    648     array = apply_indexer(self.array, self.key)
    649 else:
    650     # If the array is not an ExplicitlyIndexedNDArrayMixin,
    651     # it may wrap a BackendArray so use its __getitem__
--> 652     array = self.array[self.key]
    654 # self.array[self.key] is now a numpy array when
    655 # self.array is a BackendArray subclass
    656 # and self.key is BasicIndexer((slice(None, None, None),))
    657 # so we need the explicit check for ExplicitlyIndexed
    658 if isinstance(array, ExplicitlyIndexed):

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/h5netcdf_.py:55, in H5NetCDFArrayWrapper.__getitem__(self, key)
     54 def __getitem__(self, key):
---> 55     return indexing.explicit_indexing_adapter(
     56         key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR, self._getitem
     57     )

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/core/indexing.py:1013, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    991 """Support explicit indexing by delegating to a raw indexing method.
    992 
    993 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
   1010 Indexing result, in the form of a duck numpy-array.
   1011 """
   1012 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1013 result = raw_indexing_method(raw_key.tuple)
   1014 if numpy_indices.tuple:
   1015     # index the loaded np.ndarray
   1016     indexable = NumpyIndexingAdapter(result)

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/xarray/backends/h5netcdf_.py:62, in H5NetCDFArrayWrapper._getitem(self, key)
     60 with self.datastore.lock:
     61     array = self.get_array(needs_lock=False)
---> 62     return array[key]

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/h5netcdf/core.py:553, in BaseVariable.__getitem__(self, key)
    547     h5ds = self._h5ds
    549 if (
    550     isinstance(self.datatype, CompoundType)
    551     and (view := self.datatype.dtype_view) is not None
    552 ):
--> 553     return h5ds[key].view(view)
    554 else:
    555     return h5ds[key]

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File ~/mambaforge/envs/py312/lib/python3.12/site-packages/h5py/_hl/dataset.py:864, in Dataset.__getitem__(self, args, new_dtype)
    862 mspace = h5s.create_simple(selection.mshape)
    863 fspace = selection.id
--> 864 self.id.read(mspace, fspace, arr, mtype, dxpl=self._dxpl)
    866 # Patch up the output for NumPy
    867 if arr.shape == ():

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5d.pyx:241, in h5py.h5d.DatasetID.read()

File h5py/_proxy.pyx:143, in h5py._proxy.dset_rw()

File h5py/_conv.pyx:669, in h5py._conv.vlen2ndarray()

File h5py/_conv.pyx:741, in h5py._conv.conv_vlen2ndarray()

ValueError: Cannot create cython.array from NULL pointer

Describe the solution you'd like
Anything to access the actual values listed in the dtype dictionary of the Wind compound data.

Describe any changes to existing user workflow
Looks like reading the data needs h5netcdf backend, while the other NWC SAF GEO data are read with netcdf4. So one additional library might be needed.

Additional context
"My brain hurts." - Monty Python

@pnuu pnuu added component:readers enhancement code enhancements, features, improvements help wanted labels Feb 7, 2025
@pnuu pnuu self-assigned this Feb 7, 2025
@pnuu
Copy link
Member Author

pnuu commented Feb 7, 2025

Looking at https://numpy.org/doc/2.1/user/basics.rec.html I tried a bunch of ways accessing fid["wind_vis06"].data all of which resulted in a variety of errors:

In [115]: fid = xr.open_dataset(fname, engine="h5netcdf")
In [116]: fid["wind_vis06"][0].data
...
TypeError: Cannot change data-type for array of references.

In [116]: fid["wind_vis06"].data
...
ValueError: Cannot create cython.array from NULL pointer

In [118]: fid["wind_vis06"][0].data["wind_idx"]
...
TypeError: Cannot change data-type for array of references.

As I haven't defined chunks kwarg, it should be None and .data should be a Numpy array.

When using h5py, there's a success:

In [109]: fid = h5py.File(fname, "r")

In [110]: fid["wind_vis06"][0]["wind_idx"]
Out[110]: np.uint32(50388)

@TAlonglong
Copy link
Collaborator

Like this can also work?

Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import xarray as xr
>>> d=xr.open_dataset("/home/trygveas/Git/departments/S_NWC_HRW_MSG3_MSG-N-BS_20250207T091500Z.nc", engine="h5netcdf")
>>> d.wind_vis06[0].values['latitude']
array(30.832611, dtype=float32)

@pnuu
Copy link
Member Author

pnuu commented Feb 7, 2025

@TAlonglong that doesn't work for me 🤔 I'm now using XArray v2025.1.2, Numpy 2.0.2 and Python 3.12. I get the TypeError: Cannot change data-type for array of references. error.

@pnuu
Copy link
Member Author

pnuu commented Feb 20, 2025

Tried updating h5py to v3.13.0, still fails identically with XArray.

I think I'll just have a look how terrible it would be to open the HRW files with h5py whereas everything else is opened with xr.open_dataset() 🙈

@pnuu
Copy link
Member Author

pnuu commented Feb 20, 2025

The next hurdle: how to organize the data?

  • there are 7 channels
  • there are 38 fields for each observation for each channel
  • each observation also has an array of trajectory objects with 19 separate fields
    ** for the file I'm looking at, the array has 24 trajectory segments in it

Even with discarding the trajectory segments, there are 7 * 37 = 259 different datasets, so the datasets would need to be dynamic.

How would the user request for example wind speeds and directions for wind_vis06 variable? Would it be

scn.load(["wind_vis06_wind_speed", "wind_vis06_wind_from_direction"])

I also think this might need to be a completely separate reader to the other NWC SAF data.

@pnuu
Copy link
Member Author

pnuu commented Feb 20, 2025

I think I accidentally found out what causes the ValueError: Cannot create cython.array from NULL pointer when opening the files with xr.open_dataset(fname, engine="h5netcdf").

Running

fid = h5py.File(fname, "r")
wind_vis06 = fid["wind_vis06"]
for key in wind_vis06:
    print(key)

all the data are printed, until there's the crash. So I think XArray is mapping out all the data at opening. So if I find the offending dataset(s), I could do fid = xr.open_dataset(..., drop_variables=[bad1, ... badN]) to make opening work.

@pnuu
Copy link
Member Author

pnuu commented Feb 20, 2025

The drop_variables won't work as a fix because the problem is with the trajectory item of the Wind compound data type.

@lahtinep
Copy link
Contributor

The netCDF4.Dataset() has cmptypes argument where it would be possibly to define the mapping for compound data types, but this is not exposed in xarray.backends.NetCDF4BackendEntrypoint.open_dataset(). So I'll go for creating a completely new reader using h5py.

@pnuu pnuu linked a pull request Feb 21, 2025 that will close this issue
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component:readers enhancement code enhancements, features, improvements help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants