Fix gas optics interpolation #312

tjahns · 2024-12-13T23:02:14Z

This series of fixes addresses a number of bugs and performance problems the original kernel has. While the code is fine with respect to the Fortran standard, it contains a number of places where contemporary CPUs and GPUs can profit from less latency- and optimizer-sensitive idioms.

@skosukhin This works fine for various experiments I did in ICON with the Intel compiler so I'd really like to see this in production, even if that needs additional work on my side. Since I'm not particularly familiar with the rte-rrtmgp external, I'd appreciate any hint on how to refine this PR.

* The following is bad in any performance-sensitive codes: + Using arrays instead of scalars. + Assigning to the same array element more than once in a loop-nest. While the compiler can in principle fix this up, it's actually too hard to ask for that and in the case fixed here, seriously trips up the Intel Fortran compiler.

* In some cases these conversions are carried out via memory such that reading data just written to memory back into a register stalls the FPU.

RobertPincus · 2024-12-15T20:07:32Z

@tjahns If you think these changes are generally beneficial could you also port them to kernels/accel/mo_gas_optics_rrtmgp.F90 ?

Do you know for sure that the workaround for Intel ifort is no longer needed? What version are you testing with?

tjahns · 2024-12-16T09:48:51Z

I tried the versions installed at DKRZ (2022.0.1 and 2023.2.1). but it's hard to make sure that it works in all configurations and versions. I initially looked into this because the work-around failed for some OpenMP setups.
I'll first look into what's going wrong for nvhpc and then investigate applicability to kernels/accel/mo_gas_optics_rrtmgp.F90, now.

tjahns · 2024-12-16T16:07:04Z

Is there a recipe to reproduce what happens during the tests exactly? I can't seem to find the all_tests.sh that seems to run extended tests in the CI setup. When I try to test the same setup as (nvfortran, default, SP), by using the following installed packages on Levante

/sw/spack-levante/nvhpc-24.5-ipi3ad
/sw/spack-levante/netcdf-fortran-4.6.1-y65bmi
/sw/spack-levante/mambaforge-22.9.0-2-Linux-x86_64-kptncg
and configure like this

 ../../icon-mpim/externals/rte-rrtmgp/configure --disable-silent-rules --enable-gpu=no --enable-single-precision=yes --enable-tests  FC=nvfortran FCFLAGS='-Mallocatable=03 -Mstandard -Mbounds -Mchkptr -Kieee -Mchkstk' --with-netcdf-root=/sw/spack-levante/netcdf-fortran-4.6.1-y65bmi

all tests run successfully in a make -j4 check.

RobertPincus · 2024-12-16T16:42:37Z

@tjahns Executing make libs tests check on develop runs all the tests there are to run. If this command executes successfully the tests have passed.

tjahns · 2024-12-16T17:55:53Z

Okay, my bad: I used the autoconf branch that's currently in use in ICON. I'll retry locally with the develop branch.

tjahns · 2024-12-16T23:55:40Z

I was now able to successfully reproduce the error and, after adding some diagnostics, the resulting deviation is about 2 times the expected value:

+ ./check_equivalence test_atmospheres.nc ../rrtmgp-data/rrtmgp-gas-sw-g112.nc
 gas optics is for the shortwave
   pressure    limits (Pa):    1.005184        109663.3    
   temperature limits (K):    160.0000        355.0000    
   Intialized atmosphere twice
   Default calculation
   Vertical orientation invariance
   TSI invariance
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   halving/doubling fails
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   Incrementing with 1scl fails
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   Incrementing with 2str fails
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   Incrementing with nstr fails
   Incrementing
Warning: ieee_invalid is signaling
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
ERROR STOP 1

My change to get these values printed:

diff --git a/tests/check_equivalence.F90 b/tests/check_equivalence.F90
index 7ece94a..67529c6 100644
--- a/tests/check_equivalence.F90
+++ b/tests/check_equivalence.F90
@@ -655,7 +655,8 @@ contains
     real(wp), dimension(:,:), intent(in) :: array1, array2
     real(wp), optional,       intent(in) :: tol
 
-    real(wp) :: tolerance 
+    real(wp) :: tolerance, maxdev
+    integer :: position(2)
     if (present(tol)) then
       tolerance = tol
     else
@@ -663,6 +664,13 @@ contains
     end if
 
     allclose = all(abs(array1-array2) <= tolerance * spacing(array1))
+    if (.not. allclose) then
+      position = maxloc(abs(array1-array2))
+      maxdev = abs(array1(position(1), position(2)) &
+           - array2(position(1), position(2)))
+      write (0, *) 'max deviation: ', maxdev, &
+           ' tolerance: ', spacing(array1(position(1), position(2)))
+    end if
   end function allclose
   ! ----------------------------------------------------------------------------
   subroutine report_err(error_msg)

tjahns · 2024-12-16T23:58:33Z

The deviation is a result of the 3 replacements of division by multiplication with the inverse. So I guess it's your call if the deviation introduced is worth relaxing the criterion or if you'd rather stay within the limits currently set. When I revert only be0801c that's already enough to get the tests to pass in my setup.

RobertPincus · 2024-12-17T02:35:05Z

@tjahns Thanks for this careful investigation. It looks like the variation is precisely a factor of 2 larger than we allow for in the tolerance arguments at line 41-414, does that seem right? There's nothing magic about those tolerance arguments - perhaps you can propose the smallest possible increases to the tolerances that allows the tests to pass?

tjahns · 2024-12-17T08:31:52Z

@RobertPincus Since the deviation is rather miniscule (i.e. 0.2441406E-03 is for a value of array2(16, 1) = 1181.387, perhaps the interesting question is the accuracy you expect the measurements underlying the experiments to have in this regard and what errors you expect to result from other parts of the system. In terms of the numeric analysis, perhaps the interesting question is if the change introduces bias. Looking into sum gives

sum(array1-array2) = 0.6545106E-01

so there is a tiny uptick, but the sum of array2 is 1394896 meaning the change is rather small overall.
There's another way to look at this, that I found when investigating what made the nvfortran setup so special: the -Kieee flag prevents the use of fma instructions which can recapture some of the precision here (i.e. removing that flag and running on a system with AVX2 also allows the test to pass).
Edit: perhaps I should elaborate: modern hardware can compute a*b+c with slightly higher precision than a/b+c because the intermediate result of the multiplication can be used directly instead of the truncation which occurs when the result of the division is stored back to a register before the addition. The -Kieee enforces strict semantics which the IEEE754 floating point standard provides for the basic operations of addition, subtraction, multiplication and division, even though such a result is less precise than possible with fused-multiply-add instructions.
But I should perhaps add that numerical analysis is not my specialty and neither are the physics of radiative transfer models, rather my work is in HPC software engineering, so I can only give you hints here.
Your current criterion allows for a difference in the lowest 3 bits only, each doubling gives you another bit. The way an IEEE754 float is stored gives effectively 24 bits for the mantissa. So from my point of view it makes sense to determine how much of that precision actually still fits your data within typical accuracy and allow for the rest to wiggle and rather make sure that practically no bias occurs.

RobertPincus · 2024-12-17T23:13:05Z

@tjahns It seems I don't get notified that you've written unless you tag me.

The code in question is the unit tests where we check variants of the calculation that should reproduce the same answers. The test that's failing is where we calculate flux, divide the opacity by two, add the atmosphere back to itself (we should recover the original), and check the fluxes again. The addition isn't simple but involves some algebra.

Since this is a little arbitrary I don't have a strong reason for choosing the thresholds I have in place - they are tight enough to pass across compilers. I don't see a problem if you make them somewhat loser, especially since the reasons will be documented in this PR.

I don't remember why we use -Kieee for nvfortran during testing - maybe @skosukhin knows/remembers (is it related to Icon?)

RobertPincus

Thanks very much for these @tjahns

tjahns added 7 commits December 14, 2024 01:10

Replace unnecessary log operation.

adaff34

Replace division by multiplication of inverse.

9712a07

Replace array reference with scalar.

76f159e

Replace division by multiplication with inverse.

be0801c

Keep data in float format.

5954bc8

* In some cases these conversions are carried out via memory such that reading data just written to memory back into a register stalls the FPU.

Fix mix in precisions.

e55d549

tjahns force-pushed the fix-gas-optics-interpolation branch from c9a38fd to e55d549 Compare December 14, 2024 00:12

RobertPincus changed the base branch from develop to feature-tjahns-gas-optics-interp December 20, 2024 19:20

RobertPincus approved these changes Dec 20, 2024

View reviewed changes

RobertPincus merged commit 1f504c9 into earth-system-radiation:feature-tjahns-gas-optics-interp Dec 20, 2024
24 of 25 checks passed

RobertPincus mentioned this pull request Dec 20, 2024

Thomas Jahn's gas optics interpolation fixes #315

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix gas optics interpolation #312

Fix gas optics interpolation #312

tjahns commented Dec 13, 2024

RobertPincus commented Dec 15, 2024

tjahns commented Dec 16, 2024

tjahns commented Dec 16, 2024

RobertPincus commented Dec 16, 2024

tjahns commented Dec 16, 2024

tjahns commented Dec 16, 2024

tjahns commented Dec 16, 2024

RobertPincus commented Dec 17, 2024

tjahns commented Dec 17, 2024 •

edited

Loading

RobertPincus commented Dec 17, 2024

RobertPincus left a comment

Fix gas optics interpolation #312

Fix gas optics interpolation #312

Conversation

tjahns commented Dec 13, 2024

RobertPincus commented Dec 15, 2024

tjahns commented Dec 16, 2024

tjahns commented Dec 16, 2024

RobertPincus commented Dec 16, 2024

tjahns commented Dec 16, 2024

tjahns commented Dec 16, 2024

tjahns commented Dec 16, 2024

RobertPincus commented Dec 17, 2024

tjahns commented Dec 17, 2024 • edited Loading

RobertPincus commented Dec 17, 2024

RobertPincus left a comment

Choose a reason for hiding this comment

tjahns commented Dec 17, 2024 •

edited

Loading