Improve performance of pyam.concat()
#510
Added Patrick Jürgens
@pjuergens, it would be great if you could try out this implementation for your use cases and report any unexpected issues. It would also be interesting to hear about the performance improvement in a real-world application.
Codecov Report
@@           Coverage Diff           @@
##             main     #510   +/-   ##
=======================================
  Coverage    93.1%    93.1%
=======================================
  Files          43       43
  Lines        4907     4926   +19
=======================================
+ Hits         4569     4588   +19
  Misses        338      338
@danielhuppmann, thanks for working on this issue. I'll try to test the implementation today. Concerning the performance improvement in a real-world application: we have code in which we aggregate a large number of hourly-resolved timeseries, and the bottleneck there was the pyam append() function. By switching from append to concat in major parts of the code, using my original implementation, I reduced the calculation time from 24 to 15 minutes. I'll take a deeper look at it today when I test the new implementation.
My script ran without any issues, thanks again!
LGTM. I can't find any faults in that implementation. Thanks for the nice speed-up!
The next low-hanging fruit on the speed-up quest is to add a (non-copying!?) fast path to __init__ for a pd.Series in perfect condition, i.e., one whose index has the right names and is unique (plus maybe other conditions that one would have to extract from format_data), to avoid having to reindex.
At some point, it will be so fast that you won't even have time for coffee while your scripts are running 😜...
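The fast-path idea from the review could look roughly like the minimal sketch below, assuming plain pandas. The index names in EXPECTED_INDEX and the helpers is_perfect_series() and init_data() are hypothetical and not part of pyam; the check covers only the two conditions named in the comment (expected index names, unique index), and any further conditions from format_data are left out.

```python
import pandas as pd

# Hypothetical expected index level names, assumed for illustration only.
EXPECTED_INDEX = ["model", "scenario", "region", "variable", "unit", "year"]


def is_perfect_series(data) -> bool:
    """Return True if `data` could skip the general formatting path.

    Checks only the conditions named in the review comment: a pd.Series
    whose index has the expected names and is unique. A real fast path
    would likely need further checks extracted from format_data().
    """
    return (
        isinstance(data, pd.Series)
        and list(data.index.names) == EXPECTED_INDEX
        and data.index.is_unique
    )


def init_data(data):
    """Hypothetical constructor helper that dispatches on the check above."""
    if is_perfect_series(data):
        # fast path: reuse the input object as-is, no copy and no reindex
        return data
    # placeholder for the general path (format_data() in pyam), not sketched
    # here; returning a copy only marks that this path does not reuse the input
    return data.copy()
```

Returning the input object itself is what makes the fast path non-copying; the trade-off is that later in-place changes to the caller's Series would also affect the constructed object.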
Description of PR
This PR refactors the implementation of pyam.concat() to use pd.concat() on the timeseries data rather than iteratively calling IamDataFrame.append().
closes #500
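A minimal sketch of the difference in approach, using plain pandas rather than pyam itself. The index level names and the helpers make_series(), concat_iterative() and concat_single() are hypothetical, and the duplicate check in concat_single() is an assumption for illustration, not necessarily what pyam.concat() does.

```python
import pandas as pd

# Hypothetical index levels, assumed for illustration (similar in spirit to
# a long-format model/scenario/region/variable/unit/year timeseries index).
INDEX = ["model", "scenario", "region", "variable", "unit", "year"]


def make_series(scenario: str, n_years: int = 3) -> pd.Series:
    """Build a small timeseries as a pd.Series with a named MultiIndex."""
    idx = pd.MultiIndex.from_tuples(
        [("m", scenario, "World", "Primary Energy", "EJ/yr", 2020 + i)
         for i in range(n_years)],
        names=INDEX,
    )
    return pd.Series(range(n_years), index=idx, dtype=float, name="value")


def concat_iterative(series_list):
    """Old pattern: grow the result one object at a time."""
    result = series_list[0]
    for s in series_list[1:]:
        # each step copies everything accumulated so far, so the total
        # work grows roughly quadratically with the number of objects
        result = pd.concat([result, s])
    return result


def concat_single(series_list):
    """New pattern: one pd.concat call over all objects at once."""
    result = pd.concat(series_list)
    # assumption of this sketch: reject overlapping rows after concatenating
    if result.index.has_duplicates:
        raise ValueError("duplicate rows in concatenated data")
    return result
```

Both functions return the same data; the single call copies each input once instead of re-copying the growing result on every step. Since pandas 2.0 removed Series.append, the iterative variant here uses pairwise pd.concat to mimic the old pattern.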