`ERS.f19_g16.I1850CLM45.edison_intel.clm-betr` fails on edison with threads and intel15 (the default) #1555

ndkeen · 2017-05-24T17:23:13Z

ERS.f19_g16.I1850CLM45.edison_intel.clm-betr
This test looks new and may have always failed. I assumed it was due to an easy-to-adjust runtime setting. However, I've been trying a few things and thought I should share my experiences.

The error sorta sounds like it might be running out of memory. So I tried increasing and decreasing OMP_STACKSIZE from the current setting of 64M. I tried 32, 64, 128, 256, 512, and 1024M, and all fail.

This test uses 96 MPI tasks with 4 threads each, so on edison it needs 16 nodes. If I run the test forcing 1 thread, it passes. With 2 threads, it still fails.
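The node count follows from the layout arithmetic. A minimal sketch, assuming 24 cores per edison node and one core per thread (neither is stated in this thread; `nodes_needed` is a hypothetical helper):

```python
import math

def nodes_needed(mpi_tasks, threads_per_task, cores_per_node):
    """Return the number of whole nodes needed for a tasks x threads layout."""
    total_cores = mpi_tasks * threads_per_task
    return math.ceil(total_cores / cores_per_node)

# Assumption: 24 cores per edison node (not stated in this thread).
CORES_PER_NODE = 24

# 96 tasks * 4 threads = 384 cores; 384 / 24 = 16 nodes
print(nodes_needed(96, 4, CORES_PER_NODE))
```

Under the same assumption, changing the thread count changes the number of nodes requested, so the forced 1-thread run is not using the same job layout as the failing run.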

The test passes with --compiler=gnu

I also tried the --compiler=intel17 option and I haven't seen it behave differently. There are several other tests with what look like similar grids (all using 16 nodes on edison) that pass.
Whoops -- looks like I thought I was testing with intel17, but it's not actually using intel17 -- still 15.

Looks like specifying --compiler=intel17 isn't working with stand-alone create_test (I tested this with acme_developer)

edison11% create_test ERS.f19_g16.I1850CLM45.edison_intel.clm-betr --force-threads=1 --compiler=intel17 -ttest
No handlers could be found for logger "CIME.utils"
Using project from .cesm_proj: acme
Creating test directory /global/cscratch1/sd/ndk/acme_scratch/edison/n24may19/ERS_PMx1.f19_g16.I1850CLM45.edison_intel.clm-betr.test

If it worked, the edison_intel would instead be edison_intel17

The problem is that I was specifying the compiler by using "_intel" in the test name, which will override my --compiler setting. The better way to try the test with intel17 is to create_test ERS.f19_g16.I1850CLM45.edison_intel17.clm-betr. Trying that now -- and the test passes with intel17.
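To make the naming issue concrete, here is a rough sketch of how the dot-separated test name carries the compiler. `split_test_name` and `with_compiler` are hypothetical helpers written for illustration, not CIME's own parser, and the field layout (TESTTYPE.GRID.COMPSET.MACHINE_COMPILER.TESTMOD) is inferred from the names in this thread:

```python
def split_test_name(name):
    """Split TESTTYPE.GRID.COMPSET.MACHINE_COMPILER[.TESTMOD] into a dict.

    Hypothetical helper for illustration; this is not CIME's own parser.
    """
    parts = name.split(".")
    if len(parts) < 4:
        raise ValueError("expected at least 4 dot-separated fields: " + name)
    # The fourth field is MACHINE or MACHINE_COMPILER.
    machine, _, compiler = parts[3].partition("_")
    return {
        "testtype": parts[0],
        "grid": parts[1],
        "compset": parts[2],
        "machine": machine,
        "compiler": compiler or None,  # None: no compiler given in the name
        "testmod": ".".join(parts[4:]) or None,
    }

def with_compiler(name, compiler):
    """Return the test name with its compiler field replaced."""
    parts = name.split(".")
    machine = parts[3].partition("_")[0]
    parts[3] = machine + "_" + compiler
    return ".".join(parts)

name = "ERS.f19_g16.I1850CLM45.edison_intel.clm-betr"
print(split_test_name(name)["compiler"])  # the compiler baked into the name
print(with_compiler(name, "intel17"))     # the name to pass to create_test instead
```

Since the name already says `intel`, passing `--compiler=intel17` alongside it had no effect in the run above; putting the compiler in the name itself is what selected intel17.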


rgknox · 2017-05-24T21:58:05Z

@ndkeen I am trying to get fates to run on multiple threads as well, so I have been following along. If I figure anything out on my side, I will share my experience.

For now, can we verify that ERS.f19_g16.I1850CLM45.edison_intel.clm-betr indeed passed tests once upon a time? Maybe we should look at what happened when it stopped passing this one.

FYI: My problem has something to do with counting the number of total columns with natural vegetation during initialization, using that to allocate IO space, and then the ordering of the column indices for natural vegetation.

ndkeen · 2017-05-24T22:18:27Z

I looked thru past runs of acme_developer and some had passed. It was the ones where I used intel17. As noted above, I thought I had tried intel17 with this test, but the test did not actually use it. I've got a few jobs in the Q now that will verify if using intel17 "fixes" the problem.

rgknox · 2017-05-24T22:23:23Z

ok, got it. Is it possible that the tests that passed didn't use multi-threading by default on those grids (and that this has since changed)?

ndkeen · 2017-05-24T22:28:09Z

Well it looks like a very recent test.

ndkeen · 2017-05-24T23:08:34Z

Ok, it's passing when I use intel17 compiler.

rgknox · 2017-05-24T23:12:02Z

did you end up specifying it in the test name, or with a flag?

ERS.f19_g16.I1850CLM45.edison_intel17.clm-betr

or

--compiler=intel17

sounds like the first one right?

ndkeen · 2017-05-24T23:13:11Z

Yes, the first one. Because this test has a modifier after the compiler name, I guess the --compiler= trick won't work.

Update testreporter and change hobart queue to medium.

Update testreporter.py to handle compare failures that were being missed. Remove tagname from the testdb comments that were added to the GENERATE and BASELINE lines in TestStatus. Change the default queue on hobart from short to medium to handle tests that were running a little long.

Test suite: scripts_regression_tests.pr, populated testdb for alpha06m
Test baseline:
Test namelist changes:
Test status: bit for bit

Fixes #1555

User interface changes?:
Code review: jedwards

bishtgautam · 2017-06-07T16:27:22Z

This test failure is a known issue with intel15+edison, but the test works fine with intel17.

ndkeen · 2017-07-27T21:34:33Z

After edison upgrade July 2017, I don't see the Intel v15 installed. So maybe can close as "won't fix"?

ndkeen added the Edison label May 24, 2017

ndkeen changed the title ~~ERS.f19_g16.I1850CLM45.edison_intel.clm-betr fails on edison (one of the tests) with threads~~ ERS.f19_g16.I1850CLM45.edison_intel.clm-betr fails on edison with threads and intel15 (the default) May 24, 2017

rljacob assigned bishtgautam and rgknox and unassigned bishtgautam Jun 7, 2017

ndkeen closed this as completed Aug 16, 2017
