Optionally cache small data variables and file handles #981
In the netcdf utility reader, cache small data variables to avoid needlessly opening and closing the data files over and over.
In the FCI reader, use the data variable caching implemented in the previous commit. This should address pytroll#972.
For strings, I cannot measure the size: their .dtype is a Python type, not a dtype, so there is no itemsize and I don't know how large they will be (they're also variable-length). Strings are not cached for now; I'm not using them anyway.
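A minimal sketch of the size check described above, using only the standard library. It assumes a byte threshold decides what gets cached. `FakeDType`, `FakeVariable`, `estimated_size`, `collect_cache` and `max_cached_bytes` are hypothetical names made up for illustration; they are not the names used in netcdf_utils. The point is only that a variable whose dtype has no `itemsize` (as with strings) cannot be measured and is therefore skipped.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeDType:
    """Hypothetical stand-in for a numeric dtype that knows its element size."""
    itemsize: int


@dataclass
class FakeVariable:
    """Hypothetical stand-in for a variable inside an open data file."""
    dtype: object  # a FakeDType, or the Python type ``str`` for strings
    shape: tuple
    data: list


def estimated_size(var):
    """Return the size in bytes, or None if the dtype has no itemsize."""
    itemsize = getattr(var.dtype, "itemsize", None)
    if not isinstance(itemsize, int):
        return None  # e.g. dtype is ``str``: size unknown, do not cache
    return itemsize * math.prod(var.shape)  # an empty shape counts as one element


def collect_cache(variables, max_cached_bytes):
    """Cache every measurable variable smaller than the threshold."""
    cache = {}
    for name, var in variables.items():
        size = estimated_size(var)
        if size is not None and size < max_cached_bytes:
            cache[name] = var.data  # add a key; never rebind the whole dict
    return cache


variables = {
    "gain": FakeVariable(FakeDType(4), (), [0.5]),
    "radiance": FakeVariable(FakeDType(4), (1000, 1000), []),
    "label": FakeVariable(str, (1,), ["example"]),
}
cache = collect_cache(variables, max_cached_bytes=10_000)
```

Here only `gain` ends up in the cache: `radiance` is too large, and `label` is skipped because its size cannot be determined.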
Codecov Report
@@ Coverage Diff @@
## master #981 +/- ##
=========================================
+ Coverage 86.96% 87% +0.03%
=========================================
Files 181 181
Lines 27531 27581 +50
=========================================
+ Hits 23943 23997 +54
+ Misses 3588 3584 -4
There's still a lot of work to be done on this. It only works in combination with #845.
Fix a bug in the small variable caching, where I was overwriting rather than adding a key to the cache dictionary.
Fix a small bug in the ncutils small var caching: wrong variable name.
Downstream, we need at least the attributes of some of the cached variables. Therefore we do need to turn them into xarray DataArrays again.
Fix bug in small var caching method, should be xr not xarray
In netcdf_utils, add an option to avoid the slow xarray.open_dataset completely. Instead, this option keeps the file open for as long as the file handler object exists and creates xarray.DataArray objects manually. The coordinates are missing for now.
The FCI reader now uses the new option (introduced in the previous commit) to bypass xarray.open_dataset completely; this should further improve performance.
Fix a bug introduced a couple of commits ago, where a return statement went AWOL for cases where __getitem__ on the NetCDF4FileHandler is retrieving an attribute or shape.
Fix a bug where an import statement for dask was missing in the netcdf-utils.
The previous commit cannot possibly have been running at all.
Add a test case to cover the newly implemented caching feature in netcdf-utils
PEP8/flake8 fixes in netcdf_utils and test_netcdf_utils
This speeds up the FCI reading-but-not-reading from 40 minutes / 10 GB RAM to 80 seconds / 2 GB RAM.
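To illustrate why keeping the file open for the lifetime of the file handler helps, here is a rough standard-library sketch. All names (`FakeFile`, `ReopeningHandler`, `PersistentHandler`) are hypothetical stand-ins, not the actual satpy classes, and the sketch only counts open operations; it says nothing about the real timings quoted above.

```python
class FakeFile:
    """Hypothetical stand-in for a data file; counts how often it is opened."""
    open_count = 0

    def __init__(self, contents):
        type(self).open_count += 1
        self._contents = contents
        self.closed = False

    def read(self, name):
        if self.closed:
            raise ValueError("file is closed")
        return self._contents[name]

    def close(self):
        self.closed = True


class ReopeningHandler:
    """Opens and closes the file on every single access."""

    def __init__(self, contents):
        self._contents = contents

    def __getitem__(self, name):
        fh = FakeFile(self._contents)
        try:
            return fh.read(name)
        finally:
            fh.close()


class PersistentHandler:
    """Keeps one file handle open for as long as the handler is in use."""

    def __init__(self, contents):
        self._fh = FakeFile(contents)

    def __getitem__(self, name):
        return self._fh.read(name)

    def close(self):
        self._fh.close()


contents = {"a": [1, 2], "b": [3]}

FakeFile.open_count = 0
slow = ReopeningHandler(contents)
values_slow = [slow["a"], slow["b"], slow["a"]]
opens_slow = FakeFile.open_count  # one open per access

FakeFile.open_count = 0
fast = PersistentHandler(contents)
values_fast = [fast["a"], fast["b"], fast["a"]]
opens_fast = FakeFile.open_count  # a single open in total
fast.close()
```

Both handlers return the same values; the difference is that the persistent variant opens the file once instead of once per access, at the cost of having to close it explicitly when the handler is no longer needed.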
Improve test coverage for netcdf_utils. Test coverage for this module is now 100% according to my local pytest.
Fix PEP8 / flake8 complaints
A few cosmetic changes to the netcdf utils caching. Improve the API documentation, change an argument name to better reflect its role, and point out in additional places that we're not doing coordinates when caching variables.
LGTM. @gerritholl is this ready to merge?
Nice job. Thanks for all the documentation on the changes.
I have one request that shouldn't stop this from being merged but would be nice to have: the documentation of the class mentions uncached datasets before caching has been discussed at all. Do you think it would be possible to explain how the loading/caching works before talking about uncached variables or other caching-related things?
In the docstring for the optimised netcdf_utils, clarify the first reference to caching.