
Refactoring density integrals for efficiency #591

Merged

Conversation

Hallberg-NOAA
Member

Refactored 4 routines (int_density_generic_pcm(), int_density_generic_ppm(), int_spec_vol_generic_pcm() and int_spec_vol_generic_plm()) in the density integrals for greater computational efficiency by making fewer calls to the equation of state routines: entire rows of densities at subgrid locations are now calculated with each call, replicating what was already being done in int_density_dz_generic_plm(). To accomplish this, a number of variables now use larger arrays than previously. The total computational cost of the non-Boussinesq pressure gradient force calculation was more than 50% greater with the previous code in some tests. All answers are bitwise identical.
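
As a rough illustration of the pattern (a minimal, self-contained Fortran sketch with a toy linear EOS and hypothetical names such as calc_density_row; the real MOM6 interfaces take additional arguments), the values at all of the quadrature points of every cell in an i-row are packed into long 1-D arrays so that the equation of state is evaluated once per row rather than once per point:

  ! Illustrative sketch only (hypothetical names, toy EOS; not the actual MOM6 code).
  module eos_demo
    implicit none
  contains
    ! Stand-in for an equation-of-state routine that works on 1-D arrays.
    subroutine calc_density_row(T, S, p, rho)
      real, intent(in)  :: T(:), S(:), p(:)
      real, intent(out) :: rho(:)
      rho(:) = 1000.0 + 0.78*S(:) - 0.2*T(:) + 4.0e-7*p(:)  ! toy linear EOS
    end subroutine calc_density_row
  end module eos_demo

  program packed_row_demo
    use eos_demo, only : calc_density_row
    implicit none
    integer, parameter :: ni = 8, nq = 5   ! cells in a row, quadrature points per cell
    real :: T(ni), S(ni), p(ni)
    real :: T5(nq*ni), S5(nq*ni), p5(nq*ni), r5(nq*ni)
    integer :: i, m, pos

    T = 5.0 ; S = 35.0 ; p = 2.0e7

    ! Pack all quadrature points of the row; with PCM the cell values are simply
    ! replicated, whereas a PLM/PPM scheme would interpolate instead.
    do i = 1, ni
      pos = nq*(i-1)
      do m = 1, nq
        T5(pos+m) = T(i) ; S5(pos+m) = S(i) ; p5(pos+m) = p(i)
      enddo
    enddo

    call calc_density_row(T5, S5, p5, r5)   ! one EOS call for the whole row
    print *, 'densities at the first cell quadrature points:', r5(1:nq)
  end program packed_row_demo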

@Hallberg-NOAA Hallberg-NOAA added the refactor Code cleanup with no changes in functionality or results label Mar 31, 2024

codecov bot commented Mar 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 37.08%. Comparing base (01b2ea9) to head (1797f52).

❗ Current head 1797f52 differs from pull request most recent head 9d90e04. Consider uploading reports for the commit 9d90e04 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##           dev/gfdl     #591      +/-   ##
============================================
+ Coverage     35.93%   37.08%   +1.14%     
============================================
  Files           268      271       +3     
  Lines         79919    80897     +978     
  Branches      15106    15075      -31     
============================================
+ Hits          28722    29998    +1276     
+ Misses        45523    45299     -224     
+ Partials       5674     5600      -74     


@Hallberg-NOAA Hallberg-NOAA force-pushed the efficient_density_integrals branch from 7cff497 to 1797f52 Compare March 31, 2024 19:17
@marshallward
Member

I would guess that you would see equivalent performance with 2D arrays like T_N(1:N,iscB+c1:isec+c2) rather than the staggered 5-tuple and 15-tuple 1D arrays introduced here, which add a lot of complexity. Although the 2D array is not required to be contiguous, there is hardly a compiler in the world where this is not true. (If needed, the contiguous keyword could be added to calculate_density, calculate_spec_vol, etc., although I would not recommend it.)

I am guessing that would require new 2D interfaces for calculate_density etc.

There might be further benefit to breaking out of the single i-loop and doing this over the full i-/j-arrays, although that would also require 3D interfaces.
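
To make the layout point concrete (a hedged sketch with hypothetical names and a toy EOS, not the actual calculate_density interface): in Fortran's column-major storage a 2-D array Tq(1:nq,isc:iec) occupies the same contiguous memory as a packed 1-D array of length nq*(iec-isc+1), so a rank-2 interface, optionally declared contiguous, could in principle see exactly the same data without copying:

  ! Illustrative sketch only (hypothetical names, toy EOS; not the actual MOM6 code).
  module eos_demo_2d
    implicit none
  contains
    subroutine calc_density_2d(T, S, p, rho)
      ! Rank-2, contiguous dummies standing in for a hypothetical 2-D EOS interface.
      real, contiguous, intent(in)  :: T(:,:), S(:,:), p(:,:)
      real, contiguous, intent(out) :: rho(:,:)
      rho = 1000.0 + 0.78*S - 0.2*T + 4.0e-7*p   ! toy linear EOS
    end subroutine calc_density_2d
  end module eos_demo_2d

  program layout_demo
    use eos_demo_2d, only : calc_density_2d
    implicit none
    integer, parameter :: nq = 5, isc = 1, iec = 8
    real :: Tq(nq,isc:iec), Sq(nq,isc:iec), pq(nq,isc:iec), rq(nq,isc:iec)
    Tq = 5.0 ; Sq = 35.0 ; pq = 2.0e7
    call calc_density_2d(Tq, Sq, pq, rq)   ! one call covers the whole row
    print *, 'rho range:', minval(rq), maxval(rq)
  end program layout_demo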

Member

@marshallward left a comment


While refactoring this PR, I found a few issues in the dz PCM integral, and another likely issue in the dz PPM when the Stanley EOS is enabled. These would not be detected by our current testing, since AFAIK our layered tests use analytic EOS integration rather than PCM quadrature.

These were detected in the benchmark test with EOS_QUADRATURE turned on.

Although my changes are complete, we will apply them on top of this PR rather than rewriting this one.

@Hallberg-NOAA Hallberg-NOAA force-pushed the efficient_density_integrals branch from 1797f52 to bae962b Compare April 26, 2024 08:56
@Hallberg-NOAA
Member Author

Thank you for spotting these bugs. I have updated this commit to address the 4 bugs noted above, plus two others that I found after expanding my testing to set EOS_QUADRATURE = True.

  Refactored 4 routines (int_density_generic_pcm, int_density_generic_ppm,
int_spec_vol_generic_pcm and int_spec_vol_generic_plm) in density integrals for
greater computational efficiency by doing fewer calls to the equation of state
routines but calculating entire rows of densities at subgrid locations with
each call, replicating what was already being done in int_density_dz_generic_plm.
To accomplish this, a number of variables now use larger arrays than
previously.  The total computational cost of the non-Boussinesq pressure
gradient force calculation was more than 50% greater with the previous code in
some tests.  All answers are bitwise identical.
@Hallberg-NOAA Hallberg-NOAA force-pushed the efficient_density_integrals branch from bae962b to 9d90e04 Compare April 26, 2024 09:01
@marshallward
Member

Could you document the other two changes that were needed here? I will need to confirm that they were fixed in my changes.

@Hallberg-NOAA
Member Author

The other two bugs in the original pull request were an incorrect memory declaration for S5 in int_spec_vol_dp_generic_pcm() and a missing pos = assignment in int_density_dz_generic_ppm(). Both bugs would have been eliminated in the much simpler refactoring of this module that you are doing.

@marshallward
Member

I have profiled these changes and they show a very clear speedup. The results below are for a 6-hour run of the benchmark test using the legacy ("buggy") Wright EOS with EOS_QUADRATURE enabled.

The current code:

(Ocean pressure force)                  24     15.356693
(Ocean pressure force)                  24     15.422615
(Ocean pressure force)                  24     15.515217

This PR:

(Ocean pressure force)                  24      6.039421
(Ocean pressure force)                  24      6.056434
(Ocean pressure force)                  24      6.034422

This 2.5x speedup is very reproducible.

The suggested refactor is currently unable to achieve this level of performance. The conversion from 1D to 3D arrays introduces a slowdown from about 6 s to about 8 s, which appears to be due to converting the 15N loop into nested 3x5xN loops. The compilers are either unable or unwilling to collapse these into single loops. (The contiguous keyword was no help here.)

@adcroft and I have tested a potential version using assumed-size arrays, which appears to get very close to the optimized speed (0.13/6.03 sec ~ 2%)

(Ocean pressure force)                  24      6.221991
(Ocean pressure force)                  24      6.187612
(Ocean pressure force)                  24      6.166069

and believe there may be additional speedups using this technique.
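
For context, a hedged sketch of what the assumed-size / sequence-association approach could look like (hypothetical names, toy EOS, and an old-style external procedure; the actual MOM6 interfaces differ): a contiguous rank-3 actual argument such as Tq(3,5,ni) is associated with a flat assumed-size dummy, so the callee runs a single loop of length 3*5*ni instead of triple-nested loops that the compiler may not collapse:

  ! Illustrative sketch only (hypothetical names, toy EOS; not the actual MOM6 code).
  subroutine calc_density_flat(npts, T, S, p, rho)
    implicit none
    integer, intent(in) :: npts
    real, intent(in)  :: T(*), S(*), p(*)   ! assumed-size: flat views of the packed data
    real, intent(out) :: rho(*)
    integer :: n
    do n = 1, npts                          ! one flat loop over every quadrature point
      rho(n) = 1000.0 + 0.78*S(n) - 0.2*T(n) + 4.0e-7*p(n)   ! toy linear EOS
    enddo
  end subroutine calc_density_flat

  program flat_loop_demo
    implicit none
    integer, parameter :: ni = 8
    real :: Tq(3,5,ni), Sq(3,5,ni), pq(3,5,ni), rq(3,5,ni)
    Tq = 5.0 ; Sq = 35.0 ; pq = 2.0e7
    ! Sequence association: the whole rank-3 arrays are seen as flat arrays inside.
    call calc_density_flat(3*5*ni, Tq, Sq, pq, rq)
    print *, 'rho at first and last points:', rq(1,1,1), rq(3,5,ni)
  end program flat_loop_demo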

Since there are no plans to replace this commit, I will approve this one and continue developing the improvements to the EOS, which will be submitted in a separate PR.

@marshallward
Member

Gaea regression: https://gitlab.gfdl.noaa.gov/ogrp/MOM6/-/pipelines/23267 ✔️

@marshallward marshallward merged commit 4058462 into NOAA-GFDL:dev/gfdl Apr 30, 2024
10 checks passed
@marshallward
Member

The previous runtimes only included the updates to the 3D array routines (which dominated the profile). After also updating the 2D profiles, the runtimes are equivalent.

The accepted PR using packed 1D arrays:

(Ocean pressure force)                  24      5.995598
(Ocean pressure force)                  24      6.019213
(Ocean pressure force)                  24      5.997528

Proposed changes based on 2d/3d arrays (not yet submitted):

(Ocean pressure force)                  24      5.955973
(Ocean pressure force)                  24      5.997446
(Ocean pressure force)                  24      5.985435

So there is no longer any reason not to refactor this PR to replace the packed arrays with equivalent 2D and 3D arrays.

@marshallward
Member

I would not normally keep bantering on in a merged PR, but this PR appears to have changed answers, at least in benchmark using the Wright EOS and EOS_QUADRATURE = True.

4c4
<     24,       0.250,     0, En 5.4681542507953240E-01, CFL  0.01994, SL -1.1407E-08, M 7.90973E+19, S 35.0000, T  5.0639, Me  1.93E-16, Se  6.51E-15, Te  3.38E-15
---
>     24,       0.250,     0, En 5.4681542498895175E-01, CFL  0.01994, SL -1.1405E-08, M 7.90973E+19, S 35.0000, T  5.0639, Me  1.95E-16, Se  6.65E-15, Te  3.39E-15

I used ifort (IFORT) 2021.7.1 20221019 with -O2 optimization (plus essential flags).

I am not sure if this requires special action. I am unsure whether anyone is even using EOS_QUADRATURE in production. I first noticed it because my refactoring of this PR produced a third set of answers (with differences of equally small magnitude).

@marshallward
Member

marshallward commented May 1, 2024

I repeated using all of the GFDL flags:

  • -O2
  • -fp-model source
  • -march=core-avx-i
  • -qno-opt-dynamic-align

and the answers now agree (and in many cases are faster, although that is somewhat expected).

I would have thought that -fprotect-parens was sufficient for bit reproducibility, but there appears to be more going on here. But that is a conversation for another place and time. By our definition, this PR is fine and there is no reproducibility issue.
