add to xarray backends docs #461
Comments
@Anu-Ra-g, would you like to do this?
Yes. Will this tutorial be related to the zarr data format?
It would show how to use
In https://docs.xarray.dev/en/stable/user-guide/io.html I would just add a kerchunk section and link here, maybe. We just want people to know that it's possible and to have a place to go for more information.
I've made this pull request: pydata/xarray#9146
I've done some performance testing of Kerchunk against something the NCAS-CMS team has developed called CFA, and I've discovered something interesting about the Xarray Kerchunk engine compared to the old get_mapper/open_zarr method and CFA. It looks like the data loading is happening at the point of slicing rather than at compute time for the Kerchunk engine?
Can I see some minimal code to get this behaviour, please?
For reference, the kerchunk backend essentially does:

```python
m = fsspec.get_mapper("reference://", fo=filename_or_obj, **storage_options)
return xr.open_dataset(m, engine="zarr", consolidated=False, **open_dataset_options)
```

Are you passing

In general, I am happy to help try to get the best performance out of kerchunk and your implementation too; we have the same overall goal.
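To make the `reference://` idea above concrete, here is a stdlib-only sketch (hypothetical class; the real logic lives in fsspec's `ReferenceFileSystem`) of what a kerchunk reference set is: a mapping from Zarr keys either to inline metadata or to `(path, offset, length)` byte ranges, so "reading a chunk" is just fetching a byte range from the original file.

```python
import json
import os
import tempfile

# Toy stand-in (NOT fsspec's implementation): kerchunk references map
# each Zarr key either to an inline string or to a byte range in a file.
class TinyReferenceMapper:
    def __init__(self, refs):
        self.refs = refs

    def __getitem__(self, key):
        ref = self.refs[key]
        if isinstance(ref, str):        # inline metadata, e.g. .zgroup
            return ref.encode()
        path, offset, length = ref      # byte-range reference
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(length)

# Demo: write a fake "data file" and reference one chunk inside it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"headerCHUNKDATAtrailer")
    path = tmp.name

refs = {
    ".zgroup": json.dumps({"zarr_format": 2}),  # inline metadata
    "var/0": [path, 6, 9],                      # bytes 6..14 of the file
}
mapper = TinyReferenceMapper(refs)
chunk = mapper["var/0"]
print(chunk)  # b'CHUNKDATA'
os.unlink(path)
```

The point of the sketch is that no data is copied when the references are built; bytes are only read when a key is accessed, which is what makes laziness on top of such a mapper possible.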
I'm running individual cells in a Jupyter notebook to obtain the different values for those different sections:

- Wall time: 7.81 s
- Wall time: 2.75 s
- Wall time: 0.8 ms
- Wall time: 984 ms
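For what it's worth, the same phase-by-phase timings can be collected outside a notebook with a small stdlib helper (a hypothetical stand-in for `%%time` cells), which makes it easy to separate the open, slice, and compute costs in a plain script:

```python
import time

def timed(label, fn):
    """Run fn(), print its wall time, and return (result, elapsed).

    A stdlib stand-in for Jupyter's %%time cells, so each phase
    (open / slice / compute) can be timed separately.
    """
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    print(f"{label}: {elapsed * 1000:.1f} ms")
    return result, elapsed

# Example with a cheap stand-in workload instead of a real dataset:
data, t_open = timed("open", lambda: list(range(1_000_000)))
view, t_slice = timed("slice", lambda: data[:10])
mean, t_mean = timed("compute", lambda: sum(data) / len(data))
```

If slicing is genuinely lazy, `t_slice` should stay near zero regardless of dataset size, while the compute phase carries the real cost.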
Can I have the "example_CMIP_file.json"?
Uncompressed, this file is 64 MB; I can probably make a smaller version if needed. This file has https:// links now, so you should have no issues with accessing the data.
I will test to see if using open_dataset with the zarr engine gives a different result. I've considered `m2` as:
Ah, I see these are local files, so there's not much testing I can do about that.
For reference, the CFA conventions/specification can be found here: https://github.com/NCAS-CMS/cfa-conventions/blob/main/source/cfa.md and my in-development implementation is here: https://github.com/dwest77a/CFAPyX

My module is just the CFA reader backend for Xarray, which reads CFA-netCDF files like a normal file but with additional 'decoding' of
FYI the variable
Correct, the data should not be touched (except the coordinates) until you ask for it, whether via xarray's built-in laziness or backed by dask.
With the use of get_mapper/open_zarr instead, the object

Using the Kerchunk engine with the exact same slice/mean operation I get this:

Edit: Interestingly, the Zarr engine
As you can see in this notebook (at output 5) from our recent blog, in that case the object definitely stays as an uncomputed dask array after a
It looks like at the moment it's either the

That second look at h2, where we can see the array, does not happen when you use the old method without open_dataset. I think there must be something different in the Xarray backend for both the Kerchunk and Zarr engines where the mean is being applied and is still tied to other objects, hence why
@norlandrhagen, @TomNicholas, knowing something about xarray engines, do you have any idea how to explain what is described above? Why would calling
It would not for a dask array. However, what you have is not a dask array but one of Xarray's internal lazy arrays. These will compute for any operation but indexing. I think you're missing a
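The distinction being drawn here can be sketched with a toy stdlib class (hypothetical; this is not xarray's actual implementation): indexing composes a new lazy wrapper without touching the data, while any other operation, such as a reduction, triggers the load.

```python
class ToyLazyArray:
    """Toy model (NOT xarray's real lazy array) of a wrapper that stays
    lazy under indexing but loads data for any other operation."""

    def __init__(self, loader, loads):
        self._loader = loader   # callable producing the real data
        self.loads = loads      # shared list recording actual loads

    def __getitem__(self, key):
        # Indexing stays lazy: just compose a new deferred loader.
        return ToyLazyArray(lambda: self._loader()[key], self.loads)

    def mean(self):
        # Any non-indexing operation computes immediately.
        self.loads.append(1)    # record that real I/O happened here
        data = self._loader()
        return sum(data) / len(data)

loads = []
arr = ToyLazyArray(lambda: list(range(100)), loads)
view = arr[10:20]       # slicing: no load has happened yet
assert loads == []
m = view.mean()         # the reduction is what triggers the load
assert loads == [1]
print(m)  # 14.5
```

Under this model, timing the slice measures almost nothing, and the full cost of reading the chunks shows up at the first non-indexing operation, which matches the behaviour reported above.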
I mentioned
Was this closed by the merging of pydata/xarray#9163?
It appears so, yes.
Now that there is a kerchunk engine, it'd be nice to mention it in the Xarray docs:
https://docs.xarray.dev/en/stable/user-guide/io.html
And maybe a super simple example in the tutorial repo: https://tutorial.xarray.dev that then directs people to a more in-depth cookbook?