-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix gas optics interpolation #312
Fix gas optics interpolation #312
Conversation
* The following is bad in any performance-sensitive codes: + Using arrays instead of scalars. + Assigning to the same array element more than once in a loop-nest. While the compiler can in principle fix this up, it's actually too hard to ask for that and in the case fixed here, seriously trips up the Intel Fortran compiler.
* In some cases these conversions are carried out via memory such that reading data just written to memory back into a register stalls the FPU.
c9a38fd
to
e55d549
Compare
@tjahns If you think these changes are generally beneficial could you also port them to Do you know for sure that the workaround for Intel |
I tried the versions installed at DKRZ (2022.0.1 and 2023.2.1). but it's hard to make sure that it works in all configurations and versions. I initially looked into this because the work-around failed for some OpenMP setups. |
Is there a recipe to reproduce what happens during the tests exactly? I can't seem to find the all_tests.sh that seems to run extended tests in the CI setup. When I try to test the same setup as (nvfortran, default, SP), by using the following installed packages on Levante
all tests run successfully in a |
@tjahns Executing |
Okay, my bad: I used the autoconf branch that's currently in use in ICON. I'll retry locally with the develop branch. |
I was now able to successfully reproduce the error and, after adding some diagnostics, the resulting deviation is about 2 times the expected value:
My change to get these values printed: diff --git a/tests/check_equivalence.F90 b/tests/check_equivalence.F90
index 7ece94a..67529c6 100644
--- a/tests/check_equivalence.F90
+++ b/tests/check_equivalence.F90
@@ -655,7 +655,8 @@ contains
real(wp), dimension(:,:), intent(in) :: array1, array2
real(wp), optional, intent(in) :: tol
- real(wp) :: tolerance
+ real(wp) :: tolerance, maxdev
+ integer :: position(2)
if (present(tol)) then
tolerance = tol
else
@@ -663,6 +664,13 @@ contains
end if
allclose = all(abs(array1-array2) <= tolerance * spacing(array1))
+ if (.not. allclose) then
+ position = maxloc(abs(array1-array2))
+ maxdev = abs(array1(position(1), position(2)) &
+ - array2(position(1), position(2)))
+ write (0, *) 'max deviation: ', maxdev, &
+ ' tolerance: ', spacing(array1(position(1), position(2)))
+ end if
end function allclose
! ----------------------------------------------------------------------------
subroutine report_err(error_msg) |
The deviation is a result of the 3 replacements of division by multiplication with the inverse. So I guess it's your call if the deviation introduced is worth relaxing the criterion or if you'd rather stay within the limits currently set. When I revert only be0801c that's already enough to get the tests to pass in my setup. |
@tjahns Thanks for this careful investigation. It looks like the variation is precisely a factor of 2 larger than we allow for in the |
@RobertPincus Since the deviation is rather miniscule (i.e. 0.2441406E-03 is for a value of
so there is a tiny uptick, but the sum of array2 is 1394896 meaning the change is rather small overall. |
@tjahns It seems I don't get notified that you've written unless you tag me. The code in question is the unit tests where we check variants of the calculation that should reproduce the same answers. The test that's failing is where we calculate flux, divide the opacity by two, add the atmosphere back to itself (we should recover the original), and check the fluxes again. The addition isn't simple but involves some algebra. Since this is a little arbitrary I don't have a strong reason for choosing the thresholds I have in place - they are tight enough to pass across compilers. I don't see a problem if you make them somewhat loser, especially since the reasons will be documented in this PR. I don't remember why we use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for these @tjahns
1f504c9
into
earth-system-radiation:feature-tjahns-gas-optics-interp
This series of fixes addresses a number of bugs and performance problems the original kernel has. While the code is fine with respect to the Fortran standard, it contains a number of places where contemporary CPUs and GPUs can profit from less latency- and optimizer-sensitive idioms.
@skosukhin This works fine for various experiments I did in ICON with the Intel compiler so I'd really like to see this in production, even if that needs additional work on my side. Since I'm not particularly familiar with the rte-rrtmgp external, I'd appreciate any hint on how to refine this PR.