-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework grib2 #198
Rework grib2 #198
Conversation
Seems to still need the physical file
@rsignell-usgs , this will need the HRRR notebook example to be remade, because scan_grib doesn't merge by itself, but creates a list of dataset refs to be merged afterwards. See the new kerchunk.grib2.example_combine . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can review in depth later.
Do you intend the new codecs to be compatible with old output?
It seems like it should be. Is there already a test that asserts the correct values read using the codec? That would make it easier to show the behavior has not changed.
There was no test before, but now there is one which shows that kerchunk/zarr and xarray/cfgrib get the same set of array values. Yes, I believe the codec is backward compatible, but of course that is a generally tricky thing in general. I don't know how numcodecs intends, in general, to deal with codec versions. |
@jakirkham, @joshmoore , I wonder if you have an opinion on a |
Filed a Zarr spec issue ( zarr-developers/zarr-specs#148 ) to discuss this. Thanks for surfacing Martin! 🙏 |
@martindurant Got a better solution for the tmp/index files from the ecmwf folks! |
Thanks @emfdavid . We no longer call cfgrib.dataset at all here, but work with the Message directly, so there was no need for that workaround; but I did add it for the case of the test where a .grib2 is loaded directly with xarray and would have left an index file in the source directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @martindurant
Sorry I am late to review. Last week was crazy.
One question about the logging setup call.
Otherwise it looks like a big improvement though I don't fully grok the details yet.
|
||
logger = logging.getLogger("grib2-to-zarr") | ||
fsspec.utils.setup_logging(logger=logger) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this appropriate to do in a library?
I don't think you want to add your own handlers here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, this does seem to be from my own debugging. Generally, people probably do want this logging, but indeed it should be up to the caller to decide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you release 0.0.7 with these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's on my list :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Released - thank you!
if "typeOfLevel" in m and "level" in m: | ||
name = m["typeOfLevel"] | ||
data = np.array([m["level"]]) | ||
attrs = cfgrib.dataset.COORD_ATTRS[name] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martindurant Updating after release 0.0.7 I am now getting the following error using scan_grib with HRRR: gcs://high-resolution-rapid-refresh/hrrr.20220720/conus/hrrr.t00z.wrfsfcf21.grib2
zarr_meta = scan_grib(input_path, **SCAN_SURFACE_GRIB)
File "/Users/dstuebe/.cache/bazel/_bazel_dstuebe/76bbf57da584b86027104686797623fa/execroot/ritta/bazel-out/k8-dbg/bin/ingestion/noaa_nwp/run_local.runfiles/common_deps_kerchunk/kerchunk/grib2.py", line 160, in scan_grib
attrs = cfgrib.dataset.COORD_ATTRS[name]
KeyError: 'surface'
where
SCAN_SURFACE_GRIB = dict(
common=["time", "step", "latitude", "longitude", "valid_time"],
filter=dict(stepType="instant", typeOfLevel="surface"),
storage_options=dict(token=None),
)
I am afraid I am a bit lost on how to start debugging this and wishing I tested last week. I tried shuffling around my filter keywords a bit but I can't grok how this changed from the previous iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had meant to merge #204 before release... Can you try with that? I believe it runs, but note that the grib scanner no longer does any merging by itself, but creates a list of output dicts that you need to pass to MultiZarrToZarr. With that PR, I think you will still get a coordinate for the typeOfLevel parameter, which you might argue would be better as a dataset attribute. Also, I think I see a bug (which won't stop you, but the attributes of that one-value coordinate variable will be wrong), so maybe it's just as well I didn't merge yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works - thank you.
Will look forward to the release 0.0.8 when you have time.
With ideas from cogrib and gribscan: