
cfgrib loads all chunks into memory when indexing #311

Open
guidocioni opened this issue Sep 2, 2022 · 2 comments


@guidocioni

Related to dask/dask#9451 (and probably to fsspec/kerchunk#198).

When indexing (with either sel or isel) over (lat, lon) on GRIB files loaded with open_mfdataset (and thus containing chunked data), cfgrib attempts to load all chunks into memory. This causes excessive RAM consumption and slow performance.

From the discussion we had, the hypothesis is that cfgrib needs to scan the entire file even when subsetting along only a few dimensions.
Still, it should be possible to avoid loading the entire dataset into memory when performing the operation.
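For reference, a minimal sketch of the access pattern described above. Since a runnable reproduction would need actual GRIB files, this uses a small synthetic, dask-chunked dataset as a stand-in; the variable name `t2m` and all sizes are illustrative. With chunked data, the expectation is that `sel` stays lazy and only the touched chunks are materialised, which is what fails for cfgrib-backed datasets in this issue:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a dataset opened with
# xr.open_mfdataset("*.grib", engine="cfgrib") -- sizes are arbitrary.
ds = xr.Dataset(
    {"t2m": (("time", "lat", "lon"), np.zeros((4, 10, 20)))},
    coords={
        "time": np.arange(4),
        "lat": np.linspace(-90, 90, 10),
        "lon": np.linspace(0, 360, 20, endpoint=False),
    },
).chunk({"time": 1})  # chunked along time, as open_mfdataset would produce

# Label-based subsetting over (lat, lon); with a dask-backed dataset this
# should remain lazy, deferring I/O until .compute() / .values is called.
subset = ds["t2m"].sel(lat=slice(0, 45), lon=slice(0, 90))

print(subset.shape)   # still a lazy dask array at this point
print(subset.chunks)  # chunk layout is preserved through the selection
```

The reported behaviour is that, with the cfgrib engine, evaluating such a selection pulls every chunk of the underlying files into memory rather than only the chunks overlapping the requested region.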

@matteodefelice

I'm interested in this too. I am trying to extract a small subset from an ERA5-Land file, but, independently of the chunk size, xarray/dask tries to read the entire file into memory.

@iainrussell
Member

If I understand the problem correctly, this issue is partly because ecCodes can only read the whole message (field) from disk, even if you only want some meta-data. We have plans to improve that situation, but there is no firm time-frame for it yet. When we do, cfgrib should benefit enormously from it.
