Div-by-0 in clubb pdf_closure_module.F90 #2194

Closed
amametjanov opened this issue Mar 27, 2018 · 4 comments

@amametjanov
Member

Logging this error for ne120-wcycl runs on Theta (SMS.ne120_oRRS18v3_ICG.A_WCYCL1950S_CMIP6_HR.theta_intel.cam-cosplite)

forrtl: error (73): floating divide by zero
Image              PC                Routine            Line        Source
e3sm.exe           0000000009356540  Unknown               Unknown  Unknown
e3sm.exe           00000000023A7281  pdf_closure_modul         875  pdf_closure_module.F90
e3sm.exe           00000000020E60D1  advance_clubb_cor        1231  advance_clubb_core_module.F90
e3sm.exe           00000000016ABC08  clubb_intr_mp_clu        1855  clubb_intr.F90
e3sm.exe           000000000084D44A  physpkg_mp_tphysb        2461  physpkg.F90
e3sm.exe           0000000000846C2B  physpkg_mp_phys_r        1027  physpkg.F90
e3sm.exe           00000000005FC48D  cam_comp_mp_cam_r         250  cam_comp.F90
e3sm.exe           00000000005E8B00  atm_comp_mct_mp_a         348  atm_comp_mct.F90
e3sm.exe           0000000000439581  component_mod_mp_         267  component_mod.F90
e3sm.exe           0000000000423FF5  cime_comp_mod_mp_        1958  cime_comp_mod.F90
e3sm.exe           0000000000430BFC  MAIN__                     92  cime_driver.F90

The location of the error does not immediately reveal the source of the div-by-0:

 873     ! Account for subplume correlation in qt-thl
 874     rtprcp = mixt_frac * ( (rt_1-rtm)*rc_1 + (crt_1*varnce_rt_1)*cloud_frac_1 ) &
>875            + (one-mixt_frac) * ( (rt_2-rtm)*rc_2 + (crt_2*varnce_rt_2)*cloud_frac_2 ) &
 876            - mixt_frac*rrtthl*cthl_1*sqrt( varnce_rt_1*varnce_thl_1 )*cloud_frac_1 &
 877            - (one-mixt_frac)*rrtthl*cthl_2*sqrt( varnce_rt_2*varnce_thl_2 )*cloud_frac_2

The optimized build (-O2) may be inlining divisions whose results are stored in intermediate variables. The run was built with -fpe0, so this is the earliest point at which such an error occurs, and it aborts the run.

@wlin7
Contributor

wlin7 commented Apr 4, 2018

@amametjanov, as you said, this statement does not include an explicit division operation. Which part of it is most likely to involve inlined divisions through intermediate variables? Sorry, I am clueless. I think this is part of your effort to investigate why high-res coupled + cosp is not able to run stably. Is cosp to blame? What happens if you run this test with cosp manually turned off?

@worleyph
Contributor

worleyph commented Apr 4, 2018

Looking at the code, the following variables in the expression at line 875 (seems strange to point into the middle of this by the way) involve divisions:

crt_1  = one/( one + beta1*rsatl_1)
crt_2  = one/( one + beta2*rsatl_2)
cthl_1 = ( (one + beta1 * rt_1) / ( one + beta1*rsatl_1)**2 ) &
         * ( Cp/Lv ) * beta1 * rsatl_1 * exner
cthl_2 = ( (one + beta2 * rt_2) / ( one + beta2*rsatl_2 )**2 ) & 
         * ( Cp/Lv ) * beta2 * rsatl_2 * exner

and the following components of the above do also:

beta1 = ep * ( Lv/(Rd*tl1) ) * ( Lv/(Cp*tl1) )
beta2 = ep * ( Lv/(Rd*tl2) ) * ( Lv/(Cp*tl2) )

Also,

rc_coef = Lv / (exner*Cp) - ep2 * thv_ds

is used in the next executable statement, in case the line number is not correct.

There are probably others, but you might start by looking at these?
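
One way to narrow this down is to test each candidate denominator before the division is evaluated. The following is only a minimal standalone sketch, not code from CLUBB: the program name, the `check` helper, the constant values, and the input values are all assumptions made for illustration, and `tl1` is deliberately set to zero so that the check fires.

```fortran
program check_denominators
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: one = 1.0_dp
  ! Illustrative constants (assumed values, not taken from the CLUBB source).
  real(dp), parameter :: Cp = 1004.67_dp, Lv = 2.5e6_dp
  real(dp), parameter :: Rd = 287.04_dp, ep = 0.622_dp
  real(dp) :: tl1, rsatl_1, exner, beta1
  integer :: nbad

  ! Made-up inputs; tl1 = 0 deliberately triggers the check.
  tl1     = 0.0_dp
  rsatl_1 = 1.0e-2_dp
  exner   = 1.0_dp
  nbad    = 0

  ! Denominators that appear in beta1 and rc_coef.
  call check('Rd*tl1',   Rd*tl1,   nbad)
  call check('Cp*tl1',   Cp*tl1,   nbad)
  call check('exner*Cp', exner*Cp, nbad)

  ! Only evaluate beta1 once its own denominators are known to be nonzero,
  ! then check the denominator shared by crt_1 and cthl_1.
  if (nbad == 0) then
    beta1 = ep * ( Lv/(Rd*tl1) ) * ( Lv/(Cp*tl1) )
    call check('one + beta1*rsatl_1', one + beta1*rsatl_1, nbad)
  end if

  print '(a,i0)', 'zero denominators found: ', nbad

contains

  ! Report a denominator that is exactly zero and count it.
  subroutine check(label, denom, nbad)
    character(len=*), intent(in)    :: label
    real(dp),         intent(in)    :: denom
    integer,          intent(inout) :: nbad
    if (denom == 0.0_dp) then
      print '(a,a)', 'zero denominator: ', label
      nbad = nbad + 1
    end if
  end subroutine check

end program check_denominators
```

In the real code the same kind of check would have to sit just before the statements that compute `beta1`, `crt_1`, `cthl_1`, and `rc_coef`, and it would need to print the column and level indices to be useful.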

@wlin7
Contributor

wlin7 commented Apr 5, 2018

Thanks @worleyph , I see now what @amametjanov meant by intermediate variables.

@amametjanov
Member Author

This was probably due to aggressive optimizations on KNLs, and that was fixed by #2208. We can re-open if this is still an issue.
