ERS.f19_g16.I1850CLM45.edison_intel.clm-betr
fails on edison with threads and intel15 (the default)
#1555
Comments
@ndkeen I am trying to get FATES to run on multiple threads as well, so I have been following along. If I figure anything out on my side, I will share my experience. For now, can we verify that ERS.f19_g16.I1850CLM45.edison_intel.clm-betr actually passed at some point? If so, we should look at what changed when it stopped passing. FYI: my problem has something to do with counting the total number of columns with natural vegetation during initialization, using that count to allocate IO space, and then the ordering of the column indices for natural vegetation.
I looked through past runs of acme_developer, and some had passed: they were the ones where I used intel17. As noted above, I thought I had tried intel17, but the test was not actually using it. I've got a few jobs in the queue now that will verify whether using intel17 "fixes" the problem.
OK, got it. Is it possible that the tests that passed didn't use multithreading by default on those grids (and that this has since changed)?
Well, it looks like a very recent test.
OK, it passes when I use the intel17 compiler.
Did you end up specifying it in the test name or with a flag, i.e., ERS.f19_g16.I1850CLM45.edison_intel17.clm-betr or --compiler=intel17? It sounds like the first one, right?
Yes, the first one. Because this test has a modifier after the compiler name, I guess the
Update testreporter and change hobart queue to medium

Update testreporter.py to handle compare failures that were being missed. Remove tagname from the testdb comments that were added to the GENERATE and BASELINE lines in TestStatus. Change the default queue on hobart from short to medium to handle tests that were running a little long.

Test suite: scripts_regression_tests.py, populated testdb for alpha06m
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #1555
User interface changes?:
Code review: jedwards
This test failure is a known issue with intel15 on edison; the test works fine with intel17.
After the July 2017 edison upgrade, I don't see Intel v15 installed anymore, so maybe we can close this as "won't fix"?
This test looks new and may have always failed. I assumed the failure was due to an easy-to-adjust runtime setting. However, I've been trying a few things and thought I should share my experiences.
The error sounds like it might be running out of memory, so I tried increasing and decreasing OMP_STACKSIZE from its current setting of 64M. I tried 32M, 64M, 128M, 256M, 512M, and 1024M, and all of them fail.
This test uses 96 MPI tasks with 4 threads each, so on edison it needs 16 nodes. If I force 1 thread, the test passes; with 2 threads, it also fails.
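The 16-node figure follows from simple arithmetic. Below is a minimal Python sketch of that calculation, assuming 24 physical cores per edison compute node (two 12-core Ivy Bridge sockets), one core per OpenMP thread, and no use of hyperthreading; `nodes_needed` is a hypothetical helper for illustration, not part of CIME.

```python
import math

# Assumption: each edison compute node has 24 physical cores and every
# OpenMP thread is pinned to its own core (hyperthreads are not used).
CORES_PER_NODE = 24


def nodes_needed(mpi_tasks: int, threads_per_task: int,
                 cores_per_node: int = CORES_PER_NODE) -> int:
    """Return the number of whole nodes needed to give every thread its own core."""
    total_cores = mpi_tasks * threads_per_task
    return math.ceil(total_cores / cores_per_node)


if __name__ == "__main__":
    # The layout used by this test: 96 MPI tasks at 1, 2, and 4 threads.
    for threads in (1, 2, 4):
        print(f"96 tasks x {threads} threads -> {nodes_needed(96, threads)} nodes")
    # prints 4, 8, and 16 nodes respectively
```

With 4 threads per task, 96 x 4 = 384 cores, and 384 / 24 = 16 nodes, which matches the node count quoted above.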
The test passes with --compiler=gnu.
I also tried the --compiler=intel17 option and haven't seen it behave differently. There are several other tests with what look like similar grids (all using 16 nodes on edison) that pass.

Whoops, it looks like I thought I was testing with intel17, but the test wasn't actually using intel17; it was still using 15.
It looks like specifying --compiler=intel17 isn't working with stand-alone create_test (I tested this with acme_developer).

The problem is that I was specifying the compiler by using "_intel" in the test name, which overrides my --compiler setting. The better way to try the test with intel17 is to run
create_test ERS.f19_g16.I1850CLM45.edison_intel17.clm-betr

Trying that now, and the test passes with intel17.
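The reason --compiler=intel17 had no effect is that the machine_compiler field is part of the test name itself. The sketch below is a simplified, hypothetical splitter (not CIME's own parser) that assumes the TESTTYPE.GRID.COMPSET.MACHINE_COMPILER.TESTMODS layout and a machine name containing no underscore; `split_test_name` and `effective_compiler` are illustrative names only. It models the precedence described in this thread: a compiler given in the test name wins over the command-line option.

```python
from typing import NamedTuple, Optional


class CaseNameParts(NamedTuple):
    test_type: str
    grid: str
    compset: str
    machine: Optional[str]
    compiler: Optional[str]
    testmods: Optional[str]


def split_test_name(name: str) -> CaseNameParts:
    """Split TESTTYPE.GRID.COMPSET[.MACHINE_COMPILER[.TESTMODS]] into fields."""
    parts = name.split(".")
    if not 3 <= len(parts) <= 5:
        raise ValueError(f"unexpected test name: {name!r}")
    test_type, grid, compset = parts[:3]
    machine = compiler = testmods = None
    if len(parts) >= 4:
        # Assumption: machine and compiler are joined by the first underscore.
        machine, _, comp = parts[3].partition("_")
        compiler = comp or None
    if len(parts) == 5:
        testmods = parts[4]
    return CaseNameParts(test_type, grid, compset, machine, compiler, testmods)


def effective_compiler(name: str, cli_compiler: Optional[str] = None) -> Optional[str]:
    """A compiler embedded in the test name takes priority over the CLI option."""
    return split_test_name(name).compiler or cli_compiler


if __name__ == "__main__":
    # "_intel" in the name overrides --compiler=intel17, as seen in this issue.
    print(effective_compiler("ERS.f19_g16.I1850CLM45.edison_intel.clm-betr", "intel17"))
    # Putting intel17 in the name selects it directly.
    print(effective_compiler("ERS.f19_g16.I1850CLM45.edison_intel17.clm-betr"))
```

Under these assumptions the first call yields intel and the second intel17, which is why renaming the test, rather than passing a flag, was what actually switched compilers.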