Optimize readers searching for matching filenames #1178
I think I'd like to look at caching these globified patterns so that in MultiScene cases they are only globified once. That's my next step, although I don't expect it to make a big difference with my current test case.
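A minimal sketch of what that caching could look like, assuming the conversion is a pure function of the pattern string. `cached_globify` below is a hypothetical stand-in that just replaces every `{field}` with `*`; it is not trollsift's actual `globify`, which handles format specs more carefully.

```python
import re
from functools import lru_cache

# Matches one "{field}" or "{field:format}" placeholder (no nested braces).
_FIELD = re.compile(r"\{[^{}]*\}")


@lru_cache(maxsize=None)
def cached_globify(pattern: str) -> str:
    """Hypothetical stand-in: turn each {field} placeholder into a '*' wildcard."""
    return _FIELD.sub("*", pattern)


pattern = "{platform}_{start_time:%Y%m%d%H%M}_{channel}.nc"

# Simulate many scenes asking for the same pattern: only the first call does
# the substitution, the rest are served from the cache.
for _ in range(1000):
    glob_pattern = cached_globify(pattern)

print(glob_pattern)
print(cached_globify.cache_info())
```

Since the pattern strings are hashable and the result depends only on the input, `functools.lru_cache` is enough here; whether the cache should live in satpy or in trollsift is a separate question.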
If globify isn't listed anymore, then maybe the optimization should happen somewhere else now?
Codecov Report
Coverage diff, master vs. #1178:
  Coverage: 89.61% -> 89.61% (no change)
  Files:    200 -> 200
  Lines:    29504 -> 29537 (+33)
  Hits:     26439 -> 26471 (+32)
  Misses:   3065 -> 3066 (+1)
We could think about using If we wanted this semi-automated we could add it to the integration tests on Jenkins (running on the SSEC's bumi server). |
Ok, after answering messages and taking a break for a bit, I started playing around with On slack @mraspaud had brought up making I'd still like to restructure some of the stuff in Satpy as it is needlessly calling some of these functions. I'll see what I can do. |
Ok I think I'm done with all of the satpy-specific optimizations, but I have a related refactor I want to do. @mraspaud what are your thoughts on renaming some of the internal functions so they have Edit: I still need to make a trollsift PR. I'm not sure any of the changes here need tests since they are all refactors. |
See pytroll/trollsift#25 for the trollsift optimization.
LGTM. We might want to plan for using asv in the future...
I tried this with the latest trollsift master and with #1169 merged, using the same test script as before. New times are:
I think I've addressed everything @gerritholl mentioned and I'm ready for a re-review and merge, assuming the tests pass.
Excellent work, thanks. All good as far as I can tell :)
LGTM
This addresses some of the performance issues mentioned in #1172. Thanks to @gerritholl's investigation, we discovered that the base reader's functions for searching for files were taking a long time. We've tracked it down to a few key parts, the main one being that globify was called thousands of times for only hundreds of files. The changes in this PR seem to make a big difference.
I profiled my test script in PyCharm and looked at the most called or longest running functions in the call graph (the profiling graphs are not reproduced here).
Before this PR: total run time of ~16.5s.
After this PR: total run time of ~4.7s, and globify is not even listed.
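For anyone who wants to check call counts without PyCharm, here is a rough sketch using the standard library's `cProfile` and `pstats`. The functions `to_glob` and `find_matches` are hypothetical toy stand-ins, not satpy code; they only reproduce the shape of the problem, where a pattern conversion is repeated for every file.

```python
import cProfile
import fnmatch
import pstats


def to_glob(pattern):
    # Hypothetical stand-in for an expensive pattern-to-glob conversion.
    return pattern.replace("{name}", "*")


def find_matches(filenames, patterns):
    matches = []
    for fname in filenames:
        for pattern in patterns:
            # The conversion is redone for every (file, pattern) pair.
            if fnmatch.fnmatch(fname, to_glob(pattern)):
                matches.append(fname)
    return matches


filenames = [f"file_{i}.nc" for i in range(200)]
patterns = ["{name}.nc", "{name}.h5", "{name}.tif"]

profiler = cProfile.Profile()
profiler.enable()
matches = find_matches(filenames, patterns)
profiler.disable()

stats = pstats.Stats(profiler)
# stats.stats maps (file, line, function name) -> (primitive calls, total calls, ...)
to_glob_calls = sum(
    entry[1] for key, entry in stats.stats.items() if key[2] == "to_glob"
)
print(f"to_glob called {to_glob_calls} times for {len(filenames)} files")
```

In this toy case the conversion runs once per file per pattern, which is the same pattern of redundant calls described above; converting each pattern once outside the file loop would bring it down to one call per pattern.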
flake8 satpy