Refactoring density integrals for efficiency #591
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##           dev/gfdl     #591      +/-   ##
============================================
+ Coverage     35.93%   37.08%   +1.14%
============================================
  Files           268      271       +3
  Lines         79919    80897     +978
  Branches      15106    15075      -31
============================================
+ Hits          28722    29998    +1276
+ Misses        45523    45299     -224
+ Partials       5674     5600      -74

View full report in Codecov by Sentry.
Force-pushed from 7cff497 to 1797f52.
I would guess that you should see equivalent performance using 2D arrays, although I am guessing that would require new 2D interfaces. There might be further benefit to breaking out of the single i-loop and doing it over the full i-/j- arrays, although that would also require 3D interfaces.
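For illustration, here is a minimal Fortran sketch of the interface question: a scalar density function called point by point versus a hypothetical row interface that evaluates a whole i-row per call (a 2D interface would extend the same idea to whole i-/j- slabs). The routine names and the linear equation of state are stand-ins, not MOM6's actual EOS API.

```fortran
! Hypothetical sketch (not MOM6's actual EOS API): the same linear-EOS density
! evaluated pointwise in a loop versus through a 1D "row" interface.
module eos_sketch
  implicit none
  real, parameter :: rho0 = 1025.0, alpha = -0.2, beta = 0.8  ! made-up linear EOS coefficients
contains
  pure function density_scalar(T, S, p) result(rho)
    real, intent(in) :: T, S, p
    real :: rho
    rho = rho0 + alpha*T + beta*S + 4.0e-7*p
  end function density_scalar

  pure subroutine density_row(T, S, p, rho)
    real, intent(in)  :: T(:), S(:), p(:)
    real, intent(out) :: rho(:)
    integer :: i
    do i = 1, size(T)
      rho(i) = rho0 + alpha*T(i) + beta*S(i) + 4.0e-7*p(i)
    enddo
  end subroutine density_row
end module eos_sketch

program compare_interfaces
  use eos_sketch
  implicit none
  integer, parameter :: ni = 16
  real :: T(ni), S(ni), p(ni), rho_a(ni), rho_b(ni)
  integer :: i
  call random_number(T) ; call random_number(S) ; call random_number(p)
  ! Pointwise: one call per grid point (many small calls).
  do i = 1, ni
    rho_a(i) = density_scalar(T(i), S(i), p(i))
  enddo
  ! Row interface: one call per i-row (fewer, larger calls).
  call density_row(T, S, p, rho_b)
  print *, 'max difference:', maxval(abs(rho_a - rho_b))
end program compare_interfaces
```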
While refactoring this PR, I found a few issues in the dz PCM integral, and another likely issue in the dz PPM when the Stanley EOS is enabled. These would not be detected by our current testing, since AFAIK our layered tests use analytic EOS integration rather than PCM quadrature.
These were detected in the benchmark test with EOS_QUADRATURE turned on.
Although my changes are complete, we will apply my changes on top of this PR, rather than rewriting this one.
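For context, here is a minimal sketch of what quadrature-based integration looks like for a PCM layer, assuming a five-point Boole's-rule quadrature and a stand-in linear equation of state; the names and numbers below are illustrative and are not the MOM6 routines under discussion.

```fortran
! A minimal sketch of 5-point (Boole's rule) quadrature of density over one layer,
! in the spirit of an EOS_QUADRATURE-style PCM integral: T and S are layer constants
! and only the pressure varies across the 5 subgrid locations.  The linear EOS and
! all names here are illustrative.
program pcm_quadrature_sketch
  implicit none
  real, parameter :: g = 9.8, rho_ref = 1035.0
  real, parameter :: w5(5) = (/ 7.0, 32.0, 12.0, 32.0, 7.0 /) / 90.0  ! Boole's rule weights
  real :: T, S, p_top, dz, p(5), rho(5), dpa
  integer :: m

  T = 10.0 ; S = 35.0 ; p_top = 2.0e5 ; dz = 50.0
  ! Pressure at 5 evenly spaced subgrid points through the layer (approximate,
  ! assuming the density anomaly is small compared to rho_ref).
  do m = 1, 5
    p(m) = p_top + g*rho_ref*dz * real(m-1)/4.0
  enddo
  rho = 1025.0 - 0.2*T + 0.8*S + 4.0e-7*p   ! stand-in linear EOS
  ! Quadrature estimate of the pressure increase across the layer, g * integral(rho dz).
  dpa = g * dz * sum(w5 * rho)
  print *, 'layer pressure increment (Pa):', dpa
end program pcm_quadrature_sketch
```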
Force-pushed from 1797f52 to bae962b.
Thank you for spotting these bugs. I have updated this commit to address the 4 bugs noted above, plus two others that I found after expanding my testing to set EOS_QUADRATURE = True.
Refactored 4 routines (int_density_generic_pcm, int_density_generic_ppm, int_spec_vol_generic_pcm and int_spec_vol_generic_plm) in density integrals for greater computational efficiency by making fewer calls to the equation of state routines and instead calculating entire rows of densities at subgrid locations with each call, replicating what was already being done in int_density_dz_generic_plm. To accomplish this, a number of variables now use larger arrays than previously. The total computational cost of the non-Boussinesq pressure gradient force calculation was more than 50% greater with the previous code in some tests. All answers are bitwise identical.
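As a rough illustration of the packing pattern described above (a sketch under assumed structure, not the actual int_density_generic_pcm code), the subgrid values for an entire i-row can be gathered into contiguous 1D arrays so that one EOS call per row replaces many small per-point calls:

```fortran
! A simplified sketch of the refactoring pattern: instead of one EOS call per
! quadrature point, pack all nq subgrid pressures for an entire i-row into one
! array and evaluate the EOS once per row.  Names and the EOS are stand-ins.
module packed_row_sketch
  implicit none
contains
  pure subroutine calc_density_row(T, S, p, rho)  ! stand-in for a vectorized EOS call
    real, intent(in)  :: T(:), S(:), p(:)
    real, intent(out) :: rho(:)
    rho = 1025.0 - 0.2*T + 0.8*S + 4.0e-7*p
  end subroutine calc_density_row
end module packed_row_sketch

program packed_row_demo
  use packed_row_sketch
  implicit none
  integer, parameter :: ni = 8, nq = 5     ! row length and quadrature points per cell
  real :: T(ni), S(ni), p_top(ni), dp(ni)
  real :: T_pk(nq*ni), S_pk(nq*ni), p_pk(nq*ni), rho_pk(nq*ni)
  integer :: i, m, n

  call random_number(T) ; call random_number(S)
  call random_number(p_top) ; call random_number(dp)

  ! Pack the subgrid values: nq quadrature pressures per cell, with T and S constant
  ! within each cell (the PCM assumption), laid out contiguously so that a single
  ! EOS call covers the whole row.
  do i = 1, ni
    do m = 1, nq
      n = nq*(i-1) + m
      T_pk(n) = T(i) ; S_pk(n) = S(i)
      p_pk(n) = p_top(i) + dp(i) * real(m-1) / real(nq-1)
    enddo
  enddo

  ! One EOS call for the whole row, replacing ni separate nq-point calls.
  call calc_density_row(T_pk, S_pk, p_pk, rho_pk)
  print *, 'packed densities computed:', size(rho_pk)
end program packed_row_demo
```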
Force-pushed from bae962b to 9d90e04.
Could you document the other two changes that were needed here? I will need to confirm that they were fixed in my changes.
The other two bugs in the original pull request were an incorrect memory declaration for
I have profiled these changes and they show a very clear speedup. Results for a 6hr run are below.

The current code:
This PR:
This 2.5x speedup is very reproducible. The suggested refactor is currently unable to achieve this level of performance. The conversion from a 1D to a 3D array appears to introduce a slowdown from 6s to about 8s, which appears to be due to converting the 15N loop into nested 3x5xN element loops; the compilers are either unable or unwilling to convert these into single loops. @adcroft and I have tested a potential version using assumed-size arrays which appears to get very close to the optimized speed (0.13/6.03 sec, ~2%), and we believe there may be additional speedups using this technique. Since there are no plans to replace this commit, I will approve this one and will continue to develop the improvements to the EOS, which will be submitted in a separate PR.
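A sketch of the assumed-size idea mentioned above, as one plausible reading (the routine and array layout are assumptions, not the code that was actually profiled): a contiguous 3D array is passed to a dummy argument declared assumed-size, so the density evaluation runs as a single flat loop rather than triply nested 3x5xN loops.

```fortran
! Illustrative only: an assumed-size dummy lets the routine sweep a contiguous
! 3D actual argument as one flat run of memory in a single loop.
module flat_eos_sketch
  implicit none
contains
  subroutine density_flat(T, S, p, rho, npts)
    integer, intent(in) :: npts
    real, intent(in)  :: T(*), S(*), p(*)   ! assumed-size: one flat, contiguous run
    real, intent(out) :: rho(*)
    integer :: n
    do n = 1, npts                          ! single loop of length 3*5*N
      rho(n) = 1025.0 - 0.2*T(n) + 0.8*S(n) + 4.0e-7*p(n)   ! stand-in linear EOS
    enddo
  end subroutine density_flat
end module flat_eos_sketch

program flat_loop_demo
  use flat_eos_sketch
  implicit none
  integer, parameter :: ni = 16, nq = 5, nw = 3   ! N i-points, 5 quadrature points, 3 sub-positions (illustrative sizes)
  real :: T(nw,nq,ni), S(nw,nq,ni), p(nw,nq,ni), rho(nw,nq,ni)
  call random_number(T) ; call random_number(S) ; call random_number(p)
  ! The 3D arrays are contiguous, so sequence association lets the routine
  ! sweep all nw*nq*ni values in one loop.
  call density_flat(T, S, p, rho, nw*nq*ni)
  print *, 'mean density:', sum(rho) / real(nw*nq*ni)
end program flat_loop_demo
```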
Gaea regression: https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/23267 ✔️
The previous runtimes only contained updates to 3d array times (which dominated the profile). After updating the 2d profiles, the runtimes are equivalent. The accepted PR using packed 1D arrays:
Proposed changes based on 2d/3d arrays (not yet submitted):
So there is no longer any reason not to refactor this PR to replace the packed arrays with equivalent 2D and 3D arrays.
I would not normally keep bantering on in a merged PR, but this PR appears to have changed answers, at least in
I used ifort (IFORT) 2021.7.1 20221019 with -O2 optimization (plus essential flags). I am not sure if this requires special action. I am unsure if anyone is even using EOS_QUADRATURE in production. I first noticed it because my refactoring of this PR produced a third set of answers (equally small in size).
I repeated using all of the GFDL flags:
and the answers now agree (and in many cases are faster, although that is somewhat expected). I would have thought that