Turn on FP exception checking on KNL nodes with Intel compiler #2208

Merged 2 commits on Apr 3, 2018

Conversation

amametjanov
Member

@amametjanov amametjanov commented Mar 29, 2018

Turn on FP exception checking on KNL nodes with Intel compiler.

Also, don't halt on div-by-0 in a clubb MKL call on KNLs when FP exception-checking (-fpe0) is turned on. This makes debug and non-debug runs behave the same with -fpe0 on.

[BFB]
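
As a rough illustration of the second change (keep exception trapping on globally, but do not halt on a divide-by-zero inside one library call), here is a minimal Python sketch. It uses the standard `decimal` module purely as an analogy, because its contexts have per-exception traps that can be switched off locally. It is not how the E3SM/CLUBB code or MKL handles this, and `library_call`, `call_without_halting` and `demo` are hypothetical names.

```python
import decimal
from decimal import Decimal


def library_call(x):
    """Stand-in for an opaque library routine that may divide by zero."""
    return Decimal(1) / x


def call_without_halting(x):
    """Run library_call with the divide-by-zero trap masked, then restore it."""
    with decimal.localcontext() as ctx:
        ctx.traps[decimal.DivisionByZero] = False  # mask only this exception
        return library_call(x)
    # Leaving the with-block restores the caller's trap settings.


def demo():
    with decimal.localcontext() as ctx:
        ctx.traps[decimal.DivisionByZero] = True  # loose analogue of -fpe0
        masked = call_without_halting(Decimal(0))  # does not halt
        try:
            library_call(Decimal(0))  # unwrapped call still traps
            halted = False
        except decimal.DivisionByZero:
            halted = True
    return masked, halted
```

In this sketch the wrapped call returns an infinity instead of raising, while the same call made outside the wrapper still raises, which mirrors the intent of masking the halt around one call only.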

@amametjanov
Member Author

Additional reference about the -fpe0 flag with Intel v18 compiler: https://software.intel.com/en-us/fortran-compiler-18.0-developer-guide-and-reference-fpe

Sets option -fp-speculation=strict (Linux* and macOS*) or /Qfp-speculation:strict (Windows*) for any program unit compiled with -fpe0 (Linux* and macOS*) or /fpe:0 (Windows*). This disables certain optimizations in cases where speculative execution of floating-point operations could lead to floating-point exceptions that would not occur in the absence of speculation. For example, this may prevent the vectorization of some loops containing conditionals.

Disables certain optimizations that generate calls to the Short Vector Math Library that could lead to floating-point exceptions for extreme input arguments that would not occur if libm was called instead. For example, this may prevent the vectorization of some loops containing calls to transcendental math functions.

This appears to fix all of the NaN, FP invalid and div-by-0 errors in ne120-wcycl runs on Theta (~10 different runs).

@mt5555
Contributor

mt5555 commented Mar 29, 2018

@amametjanov - is this really needed? I would have thought that divide-by-zeros in MKL, from speculative execution, would be harmless.

IIRC, @ndkeen had figured out a way in DEBUG mode to allow speculative execution, but not abort on the (harmless) NaNs sometimes produced by speculative execution.

@rljacob
Member

rljacob commented Mar 29, 2018

If it makes the random failures while running high-res on KNL go away, then yes, it's necessary.

@amametjanov
Member Author

Yes, this enables harmless divide by zeros inside MKL in production mode. Previously, it was enabled only in debug mode: #1183.

Debug mode sets -fpe0, and because of that none of the errors seen in production runs could be reproduced in debug mode.

@mt5555
Contributor

mt5555 commented Mar 29, 2018

I'm still confused: in non-debug mode, we want to enable vectorization, and we want to allow the harmless divide-by-zeros, since the NaNs produced by speculative execution should not show up in the data we care about?

@amametjanov
Member Author

-fpe0 does not completely disable vectorization; it should only disable potentially dangerous speculative vectorization and compiler-generated calls to SVML functions. Somewhere in the E3SM code, these two optimizations are producing NaNs.

We can try to track down NaN-sensitive source files and put them in Depends.intel to be compiled with -fp-speculation=strict, -fimf-use-svml=false or similar flags and eventually remove the -fpe0 flag.
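
The speculation effect discussed above can be sketched with a toy model in Python. This is only an analogy and not what the Intel compiler actually emits: "speculative" here means every element of a guarded loop is evaluated before the guard is applied, and Python's `ZeroDivisionError` plays the role of a trapped FP exception. All function names are hypothetical.

```python
import math


def trapping_reciprocal(x):
    """1/x with the divide-by-zero 'trap' on: Python raises ZeroDivisionError."""
    return 1.0 / x


def nonstop_reciprocal(x):
    """1/x with the trap off: return a signed infinity, as IEEE 754 would."""
    return 1.0 / x if x != 0.0 else math.copysign(math.inf, x)


def guarded_reciprocals(xs, reciprocal, speculative):
    """Compute 1/x where x != 0 and 0.0 elsewhere."""
    if speculative:
        # Toy model of a vectorized loop: evaluate every lane, then mask.
        all_lanes = [reciprocal(x) for x in xs]
        return [r if x != 0.0 else 0.0 for x, r in zip(xs, all_lanes)]
    # Strict model: only evaluate the division when the guard holds.
    return [reciprocal(x) if x != 0.0 else 0.0 for x in xs]
```

With `nonstop_reciprocal`, the speculative and strict variants return the same values, because the infinity computed in the masked lane is discarded. With `trapping_reciprocal`, the speculative variant raises on an input containing zero even though the guard would have skipped that element, which is the kind of exception that would not occur without speculation.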

@worleyph
Contributor

Just tried master

-compset A_WCYCL1950S_CMIP6_HR -res ne120_oRRS18v3_ICG

on Titan using PGI, once with 1 thread and once with 2 threads, same number of MPI tasks otherwise. The two runs diverge at timestep 7 in atm.log .
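
For reference, locating the first point of divergence between two runs can be sketched as below. This is a minimal sketch that assumes the per-timestep values have already been extracted from each atm.log into two sequences; the extraction step and the `first_divergence` helper are hypothetical and not part of CIME.

```python
from itertools import zip_longest


def first_divergence(run_a, run_b):
    """Return the index of the first entry that differs, or None if identical.

    Entries are compared exactly, so any difference counts as non-BFB.
    A run that ends early is reported as diverging at its first missing entry.
    """
    for step, (a, b) in enumerate(zip_longest(run_a, run_b)):
        if a != b:
            return step
    return None
```

Comparing the extracted values as strings rather than parsed floats keeps the check bit-for-bit with respect to what the log prints.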

@worleyph
Contributor

Sorry - this should go on one of the github issue pages, or the Confluence page, but it appears that this case and master have problems beyond KNL systems and the intel compiler.

@PeterCaldwell
Contributor

Thanks for checking this, Pat. I've been wanting to know how the model does on machines other than KNL...

@PeterCaldwell
Contributor

I was so excited last night that someone had tried running on Titan that I forgot to respond to the bad news part of your post - did you do these runs using -fpe0 or not? It might be worth repeating your experiments with -fpe0 set to the opposite of what it was for your runs just to get more data points. Also, could you add your results to the end of the https://acme-climate.atlassian.net/wiki/spaces/SIM/pages/626721264/KNL+Reproducibility+-+v1+High-Res+Coupled page, where I added a table for non-bfb behavior on titan? Thanks!

@worleyph
Contributor

Not -fpe0 yet. Not even sure that this is a thing with the PGI compiler / for the AMD processor. I want to do some more studies first: see if results are deterministic, then see if same issue arises for a low res case, then see which component introduces non-b4b behavior when adding threading, then start seeing when this behavior first occurred. I will add some information to the Confluence page as soon as I get the chance. For the Titan results, I am assuming that this is a model bug, not a compiler bug.

@amametjanov
Member Author

Pat, were the two runs (1 and 2 threads) with the same executable (or one pureMPI and the other threaded)? A quick way to test reproducibility with the default pelayout is to

./cime/scripts/create_test ERS.ne120_oRRS18v3_ICG.A_WCYCL1950S_CMIP6_HR.titan_pgi.cam-cosplite

IIRC, there were PGI compiler upgrades recently and we don't run high-res tests on Titan yet.

@worleyph
Contributor

> Pat, were the two runs (1 and 2 threads) with the same executable (or one pureMPI and the other threaded)?

I did a --clean-all and a fresh build after changing the PE layout. Since BUILD_THREADED is TRUE, they could have been identical. I did not check.

@amametjanov
Member Author

By default, if all NTHRDS are 1, then BUILD_THREADED=FALSE (unless explicitly set to TRUE prior to building). So the first one must have been a pureMPI run and the second -- threaded. In this case, reproducibility is not expected, and so we are still OK.
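
The default described here can be written down as a small sketch. It only restates the rule from this comment and is not CIME's actual implementation; the `build_threaded` function, its arguments and the component keys are hypothetical.

```python
def build_threaded(nthrds, explicit_true=False):
    """Return True if the build would be threaded under the rule above.

    nthrds: mapping of component name to its thread count (NTHRDS value).
    explicit_true: True if BUILD_THREADED was set to TRUE before building.
    """
    # Threaded if requested explicitly, or if any component uses >1 thread.
    return explicit_true or any(n > 1 for n in nthrds.values())
```

Under this rule, a layout with every thread count at 1 gives a pure-MPI build, and raising any one of them to 2 gives a threaded build, so the two executables differ.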

@worleyph
Contributor

You are correct - I thought we set BUILD_THREADED to be TRUE by default so that we could better evaluate reproducibility. Am I dreaming? If not, when was this changed back?

@amametjanov
Member Author

It's been like this for a while (a couple of years). To check threaded reproducibility, there are PET (PE Threaded) tests that are part of the e3sm_integration test suite, and they pass for the Intel compiler. Min (@minxu74) can switch from the current e3sm_developer to e3sm_integration for regular (twice-a-week) testing on Titan to get coverage of threaded runs.

@worleyph
Contributor

Never mind. It was broken for a while, and we fixed it to be what it was before (MPI-only had BUILD_THREADED=FALSE).

@worleyph
Contributor

> The two runs diverge at timestep 7 in atm.log.

So one build was without -mp and one was with. I do not trust this result as indicating a problem. Sorry about that, and thanks to @amametjanov for pointing this out. I'll do the "correct" comparison next. I'll be sticking with the production case since this is what is showing the problems on other systems.

@worleyph
Contributor

Update: On Titan/pgi, the low-res (ne30_oECv3_ICG) compset A_WCYCL1950S_CMIP6_LR is not b4b with a change in threading. I tracked it down to ATM only, and it is fixed by reintroducing the istanbul CPU target. Either the compiler upgrade or the use of COSP reintroduced this need (probably the latter). In any case, this does not shed any light on the problems this PR is trying to address. Sorry for the distraction. I have nothing to add on high-resolution runs on Titan for this case.

@rljacob
Member

rljacob commented Apr 2, 2018

@worleyph can you open a new issue about the threading BFB problem you found?

@worleyph
Contributor

worleyph commented Apr 2, 2018 via email

@rljacob
Member

rljacob commented Apr 2, 2018

How do you know it's only PGI and Titan? I'm not sure a PET test has been run with this compset on other platforms.

@worleyph
Contributor

worleyph commented Apr 2, 2018 via email

@rljacob
Member

rljacob commented Apr 2, 2018

@amametjanov go ahead and merge this to next.

amametjanov added a commit that referenced this pull request Apr 2, 2018
…2208)

Turn on FP exception checking on KNL nodes with Intel compiler.
Also, don't halt on div-by-0 in a clubb MKL call on KNLs when
FP exception-checking (-fpe0) is turned on. This makes debug and
non-debug runs behave the same with -fpe0 on.

[BFB]
@amametjanov
Member Author

Merged to next.

@rljacob
Member

rljacob commented Apr 3, 2018

@amametjanov please merge this to master.

@amametjanov amametjanov merged commit d1b0715 into master Apr 3, 2018
amametjanov added a commit that referenced this pull request Apr 3, 2018
Turn on FP exception checking on KNL nodes with Intel compiler.

Also, don't halt on div-by-0 in a clubb MKL call on KNLs when
FP exception-checking (-fpe0) is turned on. This makes debug and
non-debug runs behave the same with -fpe0 on.

[BFB]
@amametjanov amametjanov deleted the azamat/knl/rm-clubb-mkl-div-by-0-halts branch April 3, 2018 19:52
@amametjanov
Member Author

Merged to master.

jgfouca pushed a commit that referenced this pull request Apr 11, 2018
Turn on FP exception checking on KNL nodes with Intel compiler.

Also, don't halt on div-by-0 in a clubb MKL call on KNLs when
FP exception-checking (-fpe0) is turned on. This makes debug and
non-debug runs behave the same with -fpe0 on.

[BFB]
brhillman pushed a commit that referenced this pull request Apr 5, 2023
…ns_output

Automatically Merged using E3SM Pull Request AutoTester
PR Title: Improve jenkins output when things fail.
PR Author: jgfouca