Improve performance of `pyam.concat()` #510

danielhuppmann · 2021-03-21T15:02:33Z

Please confirm that this PR has done the following:

Tests Added
Documentation Added
Name of contributors Added to AUTHORS.rst
Description in RELEASE_NOTES.md Added

Description of PR

This PR refactors the implementation of `pyam.concat()` to use `pd.concat()` on the timeseries data rather than iteratively calling `IamDataFrame.append()`.
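To illustrate the idea, here is a minimal sketch in plain pandas, not the actual pyam implementation. The helper `make_series` and the index names are assumptions made for this example. It contrasts growing a result step by step, which copies the accumulated data on every iteration, with a single `pd.concat()` call over all parts.

```python
import pandas as pd


def make_series(scenario, n_years=3):
    """Build a small timeseries with a pyam-like MultiIndex (illustrative only)."""
    index = pd.MultiIndex.from_product(
        [
            ["model_a"],
            [scenario],
            ["World"],
            ["Primary Energy"],
            ["EJ/yr"],
            range(2010, 2010 + n_years),
        ],
        # Assumed index names, chosen for this example
        names=["model", "scenario", "region", "variable", "unit", "year"],
    )
    return pd.Series([float(i) for i in range(n_years)], index=index, name="value")


parts = [make_series(f"scen_{i}") for i in range(5)]

# Iterative approach: each step copies everything accumulated so far
iterative = parts[0]
for part in parts[1:]:
    iterative = pd.concat([iterative, part])

# Single-call approach: all parts are combined in one pass;
# verify_integrity=True raises a ValueError if index entries overlap
combined = pd.concat(parts, verify_integrity=True)
```

Both approaches yield the same data here; the difference is only in how often the intermediate results are copied.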

closes #500

danielhuppmann · 2021-03-21T15:05:02Z

@pjuergens, it would be great if you could try out this implementation for your use cases and report any unexpected issues. Also, it would be interesting to hear about the performance improvements in a real-world application.

codecov · 2021-03-21T15:11:42Z

Codecov Report

Merging #510 (7c1b183) into main (ebceba1) will increase coverage by 0.0%.
The diff coverage is 100.0%.

@@          Coverage Diff          @@
##            main    #510   +/-   ##
=====================================
  Coverage   93.1%   93.1%           
=====================================
  Files         43      43           
  Lines       4907    4926   +19     
=====================================
+ Hits        4569    4588   +19     
  Misses       338     338

| Impacted Files | Coverage Δ | |
|---|---|---|
| pyam/core.py | `92.5% <100.0%> (+<0.1%)` | ⬆️ |
| tests/conftest.py | `100.0% <100.0%> (ø)` | |
| tests/test_core.py | `100.0% <100.0%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ebceba1...7c1b183.

pjuergens · 2021-03-22T08:58:11Z

@danielhuppmann thanks for working on this issue. I'll try to test the implementation today.

Concerning the performance improvement in a real-world application: we have code that aggregates a lot of hourly-resolved timeseries. The bottleneck there was the pyam `append()` function. By switching from append to concat in major parts of the code, using my original implementation, I could reduce the calculation time from 24 to 15 minutes. I'll have a deeper look at it again today when I test the new implementation.

pjuergens · 2021-03-22T11:16:29Z

My script ran without any issues, thanks again!

coroa

LGTM. I can't find any faults in that implementation. Thanks for the nice speed-up!

The next low-hanging fruit on the speed-up quest is to add a (non-copying!?) fast-path to `__init__` for a `pd.Series` in perfect condition, i.e., one whose index is unique and has the right names, plus possibly other conditions that one would have to extract from `format_data` to avoid having to reindex.
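A rough sketch of what such a pre-check could look like follows. The function `is_fast_path_candidate` and the constant `EXPECTED_NAMES` are hypothetical names for this example, not part of pyam, and the real conditions would have to be derived from `format_data`.

```python
import pandas as pd

# Assumed index layout, for illustration only
EXPECTED_NAMES = ["model", "scenario", "region", "variable", "unit", "year"]


def is_fast_path_candidate(data):
    """Return True if `data` might be usable without reformatting (hypothetical)."""
    return (
        isinstance(data, pd.Series)
        and list(data.index.names) == EXPECTED_NAMES
        and data.index.is_unique
    )


good_index = pd.MultiIndex.from_tuples(
    [
        ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2010),
        ("model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 2020),
    ],
    names=EXPECTED_NAMES,
)
good = pd.Series([1.0, 2.0], index=good_index)

# Repeating the first row twice produces a non-unique index
duplicated = pd.Series([1.0, 2.0], index=good_index[[0, 0]])
```

Only `good` passes the check; `duplicated` and any non-Series input would fall back to the regular, slower initialization path.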

danielhuppmann · 2021-03-22T18:05:46Z

At some point, it will be so fast that you don't even have time for coffee while your scripts are running 😜...

* Fix iterable regression in concat introduced by #510
* Cast to list instead of to iterable
* Update pyam/core.py

Co-authored-by: Daniel Huppmann <dh@dergelbesalon.at>
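The details of the regression are not spelled out in this thread. One plausible reading of "cast to list instead of to iterable" is that a one-shot iterable such as a generator cannot be indexed or traversed twice, so it has to be materialized first. The helper `materialize` below is a hypothetical name for this sketch, not pyam code.

```python
def materialize(objs):
    """Cast any iterable (list, tuple, generator) to a list (hypothetical helper)."""
    objs = list(objs)
    if not objs:
        raise ValueError("No objects to concatenate")
    return objs


# A generator supports neither indexing nor a second pass
gen = (name for name in ["a", "b", "c"])

items = materialize(gen)

# After casting to a list, indexing and slicing work as expected
first, rest = items[0], items[1:]
```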

danielhuppmann and others added 6 commits March 20, 2021 15:54

Cherry-pick from branch pjuergens:concat (@pjuergens)

9adc1e5

Update AUTHORS.rst

3abcac4

Added Patrick Jürgens

Fix docstrings

b180ce5

Add test for calling concat with pd.DataFrame

b6aad6d

Rework concat implementation using pd.concat instead of append

c1f6644

Add test that single-element list for concat() returns identical object

bae4c68

danielhuppmann added the enhancement label Mar 21, 2021

danielhuppmann requested a review from coroa March 21, 2021 15:02

danielhuppmann self-assigned this Mar 21, 2021

danielhuppmann mentioned this pull request Mar 21, 2021

Concat #501

Closed

4 tasks

danielhuppmann added 2 commits March 21, 2021 16:06

Add to release notes

23cd875

Fix typos and docstrings

fc86e72

danielhuppmann added 2 commits March 22, 2021 09:05

Add more tests

231f48f

Make black

9e9efd9

danielhuppmann marked this pull request as ready for review March 22, 2021 08:31

coroa approved these changes Mar 22, 2021

Merge branch 'main' into performance/concat

7c1b183

danielhuppmann mentioned this pull request Mar 22, 2021

Add fast-path initialization from pd.Series #511

Open

danielhuppmann merged commit 32b06f7 into IAMconsortium:main Mar 22, 2021

danielhuppmann deleted the performance/concat branch March 22, 2021 18:06

coroa added a commit to coroa/pyam that referenced this pull request Mar 29, 2021

Fix iterable regression in concat introduced by IAMconsortium#510

6085fcd

coroa mentioned this pull request Mar 29, 2021

Fix iterable regression in concat introduced by #510 #512

Merged
