Many tests with LND fail with GNU on cori-knl and cori-haswell #3270
Comments
Adjusting the DEBUG flags, I get this error:
|
Noting that this test also fails with Intel DEBUG, as in #3284. The above issue is fixed (intel+debug). |
In tests on Theta, |
I tested using |
Using repo from Nov 6th, I tried I also tried The case dirs are here: |
Looks like an issue with an uninitialized variable, hitting a NaN. After replacing
|
That looks oddly familiar - see #1832 |
Thanks for pointing to that issue and the one it points to, https://github.com/ESMCI/cime/issues/1974. So the latest GNU compilers, 8.2.0 and 8.3.0, still have that issue of throwing SIGFPE on calls to My vote is to remove
Doing that allowed There's still an issue with DEBUG runs multi-threaded:
But at least there's a way to run DEBUG in MPI-only mode. |
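The trade-off in this comment, between trapping on an invalid floating-point operation and letting the run continue, can be illustrated with a small analogy. The sketch below uses Python's standard `decimal` module, not GNU Fortran, so it only mirrors the concept; the helper name `inf_minus_inf` is hypothetical and the compiler flags discussed in this thread are not reproduced here.

```python
from decimal import Decimal, InvalidOperation, localcontext


def inf_minus_inf(trap_invalid: bool):
    """Evaluate Infinity - Infinity with the invalid-operation trap on or off.

    Analogy only: this is Python's decimal context, not a compiler FPE trap.
    """
    with localcontext() as ctx:
        # Enable or disable the trap for invalid operations in this context.
        ctx.traps[InvalidOperation] = trap_invalid
        try:
            result = Decimal("Infinity") - Decimal("Infinity")
        except InvalidOperation:
            # Trap enabled: execution stops at the offending operation.
            return ("trapped", None)
        # Trap disabled: the operation quietly yields NaN and execution continues.
        return ("continued", result.is_nan())


print(inf_minus_inf(True))
print(inf_minus_inf(False))
```

With the trap enabled, the error surfaces at the operation that produced it, which is what makes such traps useful in DEBUG builds; with it disabled, the NaN propagates silently and has to be detected later.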
Huh, I had forgotten about 1832. And you're right that it looks similar. I would be OK with adjusting the compiler flags to stop on fewer conditions, but |
Update:
I've also found that I can remove the |
I've been running a few more experiments and I've found that I would want to make this change and allow GNU tests to work now. Then later try to work allowing the invalid trap to work again. My guess with |
I was going to make 2 PR's: One to remove
I think I can effectively use These changes allow the GNU tests (that I have run so far on cori) to pass, but the optimization level change is likely not BFB (bit-for-bit) compared to previous runs.
|
Cori now has a new version of GNU compilers ( |
With #3629, we can now use GNU with DEBUG, but currently the issue still exists without DEBUG. It's still true that changing the Fortran flag for GNU from |
For Cori machines building with GNU compilers, append -O2 to flags for non-debug builds. Fixes #3270
After the Cori upgrade, I had trouble with GNU. In fact, all GNU tests failed immediately with a runtime error that was eventually attributed to the hugepages module. NERSC added this module by default, and we now remove it. After the PR to remove hugepages, I ran e3sm_developer with GNU and some tests passed, but most failed. I think most or all fail with
Program received signal SIGILL: Illegal instruction.
Example of how to reproduce (I just tried with master as of Oct 23rd):
ERS.f09_g16.I1850CLM45CN --compiler=gnu
Looking back at e3sm_developer test after the PR to remove hugepages, the only tests that passed were:
Trying with DEBUG=TRUE gives the same failure, and I do get a stack trace (below).
I also tried using only 1 thread: same failure.
Same error using gnu 8.3.0 (currently we use 8.2.0) and a higher version of cray-libsci.
Same error on cori-haswell with gnu 8.2.0.
However, with gnu 8.3.0 (as well as a higher version of cray-libsci), the test works on cori-haswell.
Stack from DEBUG runs:
/global/cscratch1/sd/ndk/acme_scratch/cori-knl/m17-oct23/ERS_D.f09_g16.I1850CLM45CN.cori-knl_gnu.20191030_115956_s1tz98