Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix gas optics interpolation #312

Conversation

tjahns
Copy link
Contributor

@tjahns tjahns commented Dec 13, 2024

This series of fixes addresses a number of bugs and performance problems the original kernel has. While the code is fine with respect to the Fortran standard, it contains a number of places where contemporary CPUs and GPUs can profit from less latency- and optimizer-sensitive idioms.

@skosukhin This works fine for various experiments I did in ICON with the Intel compiler so I'd really like to see this in production, even if that needs additional work on my side. Since I'm not particularly familiar with the rte-rrtmgp external, I'd appreciate any hint on how to refine this PR.

* The following is bad in any performance-sensitive codes:
  + Using arrays instead of scalars.
  + Assigning to the same array element more than once in a loop-nest.
While the compiler can in principle fix this up, it's actually
too hard to ask for that and in the case fixed here, seriously trips
up the Intel Fortran compiler.
* In some cases these conversions are carried out via memory such that
  reading data just written to memory back into a register stalls the
  FPU.
@tjahns tjahns force-pushed the fix-gas-optics-interpolation branch from c9a38fd to e55d549 Compare December 14, 2024 00:12
@RobertPincus
Copy link
Member

@tjahns If you think these changes are generally beneficial could you also port them to kernels/accel/mo_gas_optics_rrtmgp.F90 ?

Do you know for sure that the workaround for Intel ifort is no longer needed? What version are you testing with?

@tjahns
Copy link
Contributor Author

tjahns commented Dec 16, 2024

I tried the versions installed at DKRZ (2022.0.1 and 2023.2.1). but it's hard to make sure that it works in all configurations and versions. I initially looked into this because the work-around failed for some OpenMP setups.
I'll first look into what's going wrong for nvhpc and then investigate applicability to kernels/accel/mo_gas_optics_rrtmgp.F90, now.

@tjahns
Copy link
Contributor Author

tjahns commented Dec 16, 2024

Is there a recipe to reproduce what happens during the tests exactly? I can't seem to find the all_tests.sh that seems to run extended tests in the CI setup. When I try to test the same setup as (nvfortran, default, SP), by using the following installed packages on Levante

  • /sw/spack-levante/nvhpc-24.5-ipi3ad
  • /sw/spack-levante/netcdf-fortran-4.6.1-y65bmi
  • /sw/spack-levante/mambaforge-22.9.0-2-Linux-x86_64-kptncg
    and configure like this
 ../../icon-mpim/externals/rte-rrtmgp/configure --disable-silent-rules --enable-gpu=no --enable-single-precision=yes --enable-tests  FC=nvfortran FCFLAGS='-Mallocatable=03 -Mstandard -Mbounds -Mchkptr -Kieee -Mchkstk' --with-netcdf-root=/sw/spack-levante/netcdf-fortran-4.6.1-y65bmi

all tests run successfully in a make -j4 check.

@RobertPincus
Copy link
Member

@tjahns Executing make libs tests check on develop runs all the tests there are to run. If this command executes successfully the tests have passed.

@tjahns
Copy link
Contributor Author

tjahns commented Dec 16, 2024

Okay, my bad: I used the autoconf branch that's currently in use in ICON. I'll retry locally with the develop branch.

@tjahns
Copy link
Contributor Author

tjahns commented Dec 16, 2024

I was now able to successfully reproduce the error and, after adding some diagnostics, the resulting deviation is about 2 times the expected value:

+ ./check_equivalence test_atmospheres.nc ../rrtmgp-data/rrtmgp-gas-sw-g112.nc
 gas optics is for the shortwave
   pressure    limits (Pa):    1.005184        109663.3    
   temperature limits (K):    160.0000        355.0000    
   Intialized atmosphere twice
   Default calculation
   Vertical orientation invariance
   TSI invariance
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   halving/doubling fails
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   Incrementing with 1scl fails
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   Incrementing with 2str fails
 max deviation:    2.4414063E-04  tolerance:    1.2207031E-04
   Incrementing with nstr fails
   Incrementing
Warning: ieee_invalid is signaling
Warning: ieee_underflow is signaling
Warning: ieee_inexact is signaling
ERROR STOP 1

My change to get these values printed:

diff --git a/tests/check_equivalence.F90 b/tests/check_equivalence.F90
index 7ece94a..67529c6 100644
--- a/tests/check_equivalence.F90
+++ b/tests/check_equivalence.F90
@@ -655,7 +655,8 @@ contains
     real(wp), dimension(:,:), intent(in) :: array1, array2
     real(wp), optional,       intent(in) :: tol
 
-    real(wp) :: tolerance 
+    real(wp) :: tolerance, maxdev
+    integer :: position(2)
     if (present(tol)) then
       tolerance = tol
     else
@@ -663,6 +664,13 @@ contains
     end if
 
     allclose = all(abs(array1-array2) <= tolerance * spacing(array1))
+    if (.not. allclose) then
+      position = maxloc(abs(array1-array2))
+      maxdev = abs(array1(position(1), position(2)) &
+           - array2(position(1), position(2)))
+      write (0, *) 'max deviation: ', maxdev, &
+           ' tolerance: ', spacing(array1(position(1), position(2)))
+    end if
   end function allclose
   ! ----------------------------------------------------------------------------
   subroutine report_err(error_msg)

@tjahns
Copy link
Contributor Author

tjahns commented Dec 16, 2024

The deviation is a result of the 3 replacements of division by multiplication with the inverse. So I guess it's your call if the deviation introduced is worth relaxing the criterion or if you'd rather stay within the limits currently set. When I revert only be0801c that's already enough to get the tests to pass in my setup.

@RobertPincus
Copy link
Member

@tjahns Thanks for this careful investigation. It looks like the variation is precisely a factor of 2 larger than we allow for in the tolerance arguments at line 41-414, does that seem right? There's nothing magic about those tolerance arguments - perhaps you can propose the smallest possible increases to the tolerances that allows the tests to pass?

@tjahns
Copy link
Contributor Author

tjahns commented Dec 17, 2024

@RobertPincus Since the deviation is rather miniscule (i.e. 0.2441406E-03 is for a value of array2(16, 1) = 1181.387, perhaps the interesting question is the accuracy you expect the measurements underlying the experiments to have in this regard and what errors you expect to result from other parts of the system. In terms of the numeric analysis, perhaps the interesting question is if the change introduces bias. Looking into sum gives

sum(array1-array2) = 0.6545106E-01

so there is a tiny uptick, but the sum of array2 is 1394896 meaning the change is rather small overall.
There's another way to look at this, that I found when investigating what made the nvfortran setup so special: the -Kieee flag prevents the use of fma instructions which can recapture some of the precision here (i.e. removing that flag and running on a system with AVX2 also allows the test to pass).
Edit: perhaps I should elaborate: modern hardware can compute a*b+c with slightly higher precision than a/b+c because the intermediate result of the multiplication can be used directly instead of the truncation which occurs when the result of the division is stored back to a register before the addition. The -Kieee enforces strict semantics which the IEEE754 floating point standard provides for the basic operations of addition, subtraction, multiplication and division, even though such a result is less precise than possible with fused-multiply-add instructions.
But I should perhaps add that numerical analysis is not my specialty and neither are the physics of radiative transfer models, rather my work is in HPC software engineering, so I can only give you hints here.
Your current criterion allows for a difference in the lowest 3 bits only, each doubling gives you another bit. The way an IEEE754 float is stored gives effectively 24 bits for the mantissa. So from my point of view it makes sense to determine how much of that precision actually still fits your data within typical accuracy and allow for the rest to wiggle and rather make sure that practically no bias occurs.

@RobertPincus
Copy link
Member

@tjahns It seems I don't get notified that you've written unless you tag me.

The code in question is the unit tests where we check variants of the calculation that should reproduce the same answers. The test that's failing is where we calculate flux, divide the opacity by two, add the atmosphere back to itself (we should recover the original), and check the fluxes again. The addition isn't simple but involves some algebra.

Since this is a little arbitrary I don't have a strong reason for choosing the thresholds I have in place - they are tight enough to pass across compilers. I don't see a problem if you make them somewhat loser, especially since the reasons will be documented in this PR.

I don't remember why we use -Kieee for nvfortran during testing - maybe @skosukhin knows/remembers (is it related to Icon?)

@RobertPincus RobertPincus changed the base branch from develop to feature-tjahns-gas-optics-interp December 20, 2024 19:20
Copy link
Member

@RobertPincus RobertPincus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for these @tjahns

@RobertPincus RobertPincus merged commit 1f504c9 into earth-system-radiation:feature-tjahns-gas-optics-interp Dec 20, 2024
24 of 25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants