Master fails PET_Ln9.ne30_oECv3_ICG.A_WCYCL1850S.cori-knl_intel #1850

Closed
mark-petersen opened this issue Oct 18, 2017 · 12 comments

@mark-petersen
Contributor

mark-petersen commented Oct 18, 2017

Tested at commit 15780e3, from Oct 17.

The identical test passes with gnu. G cases pass with both gnu and intel.

@mark-petersen
Contributor Author

The message in *nc.base.cprnc.out is:

 A total number of    407 fields were compared
          of which    283 had non-zero differences

@mark-petersen
Contributor Author

The following lower resolution G cases PASS.

PET_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_intel
PET_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_gnu
PET_Ln9.T62_oQU240.GMPAS-IAF.cori-haswell_intel
PET_Ln9.T62_oQU240.GMPAS-IAF.cori-haswell_gnu
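
For readers comparing the test names in this thread, here is a minimal sketch that splits a name into its parts. It assumes the usual CIME layout `TESTTYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER]`; the `ParsedName` type and `parse_test_name` helper are hypothetical names for illustration and are not part of CIME.

```python
from typing import List, NamedTuple, Optional


class ParsedName(NamedTuple):
    """Pieces of a test name (hypothetical helper type, not a CIME class)."""
    test_type: str            # e.g. "PET"
    modifiers: List[str]      # e.g. ["Ln9"]
    grid: str                 # e.g. "ne30_oECv3_ICG"
    compset: str              # e.g. "A_WCYCL1850S"
    machine: Optional[str]    # e.g. "cori-knl", or None if omitted
    compiler: Optional[str]   # e.g. "intel", or None if omitted


def parse_test_name(name: str) -> ParsedName:
    # Assumed layout: TESTTYPE[_MODIFIERS].GRID.COMPSET[.MACHINE_COMPILER]
    parts = name.split(".")
    if len(parts) < 3:
        raise ValueError(f"expected at least TEST.GRID.COMPSET, got {name!r}")

    # The first field holds the test type followed by optional modifiers.
    test_type, *modifiers = parts[0].split("_")
    grid, compset = parts[1], parts[2]

    machine = compiler = None
    if len(parts) > 3:
        if "_" in parts[3]:
            # The compiler is whatever follows the last underscore.
            machine, _, compiler = parts[3].rpartition("_")
        else:
            machine = parts[3]
    return ParsedName(test_type, modifiers, grid, compset, machine, compiler)


failing = parse_test_name("PET_Ln9.ne30_oECv3_ICG.A_WCYCL1850S.cori-knl_intel")
passing = parse_test_name("PET_Ln9.T62_oQU240.GMPAS-IAF.cori-knl_intel")
print(failing.compset, passing.compset)
```

Splitting on `.` first keeps the underscores inside the grid and compset fields intact, so the failing and passing cases can be compared field by field.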

@mark-petersen
Contributor Author

Identical case passes using gnu:

PET_Ln9.ne30_oECv3_ICG.A_WCYCL1850S.cori-knl_gnu

@rljacob
Member

rljacob commented Oct 19, 2017

Try PET_Ln5.ne4_ne4.FC5AV1C-L.cori-knl_intel

@ndkeen
Contributor

ndkeen commented Oct 19, 2017

PET_Ln5.ne4_ne4.FC5AV1C-L.cori-knl_intel that Rob suggested failed for me.

I can't run the PET test with DEBUG as all threaded tests will fail with a known bug (#1829).

So I also tried the following which passed (debug, forcing 1 thread):

SMS_D_PMx1.T62_oQU240.GMPAS-IAF.cori-knl_intel
SMS_D_PMx1_Ln9.ne30_oECv3_ICG.A_WCYCL1850S.cori-knl_intel

Could be related to #1477. I tried with intel18 and it also failed compare:

PET_Ln5.ne4_ne4.FC5AV1C-L.cori-knl_intel18

And then when I use -fp-model consistent instead of -fp-model source, the PET_Ln5.ne4_ne4.FC5AV1C-L.cori-knl_intel passes.

So if we want to use the consistent flag, I can make a PR, but it does slow down the code overall.

I personally think we need a BFB build flag for testing and then a way to run fast in production.
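
As a general illustration of why bit-for-bit comparisons are this sensitive, the sketch below shows that regrouping a floating-point sum can change the result. It is not a reproduction of the failure in this issue, and it says nothing about what the Intel flags do internally; the `chunked_sum` helper and the sample values are made up for the example.

```python
import math


def serial_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_chunks):
    """Sum contiguous chunks separately, then combine the partial sums.

    This mimics, very loosely, a reduction split across n_chunks workers.
    """
    size = math.ceil(len(values) / n_chunks)
    partials = [serial_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return serial_sum(partials)


# Hypothetical data chosen so that rounding differs between groupings.
values = [1.0e16, 1.0, -1.0e16, 1.0]

one_chunk = chunked_sum(values, 1)
two_chunks = chunked_sum(values, 2)
print(one_chunk, two_chunks, one_chunk == two_chunks)
```

The two groupings give different answers even though the inputs are identical, which is why a test that demands identical output is a strict check on how the compiler orders and rounds operations.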

@rljacob
Member

rljacob commented Oct 19, 2017

The failure of that test means the PET problem is in the atmosphere or land. Attn. @singhbalwinder @bishtgautam

@ndkeen
Contributor

ndkeen commented Oct 19, 2017

Yea, I think it's the same issue as #1477

@mark-petersen
Contributor Author

@amametjanov FYI I just tested this for PR#1862, and this is also an issue on theta. This fails:

./create_test PET_Ln9.ne30_oECv3_ICG.A_WCYCL1850S -p OceanClimate

on theta with the same error as cori:

SUMMARY of cprnc:
 A total number of    407 fields were compared
          of which    283 had non-zero differences
               and      0 had differences in fill patterns
 A total number of      0 fields could not be analyzed
 A total number of      0 fields on file 1 were not found on file2.
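
When checking many of these files, the summary can be read programmatically. The following is a minimal sketch that assumes the summary lines have exactly the wording shown above; `parse_cprnc_summary` and `is_bit_for_bit` are hypothetical helpers, not part of cprnc or CIME.

```python
import re

# Patterns follow the wording of the summary lines quoted in this thread.
SUMMARY_PATTERNS = {
    "compared": r"A total number of\s+(\d+)\s+fields were compared",
    "nonzero_diffs": r"of which\s+(\d+)\s+had non-zero differences",
    "fill_diffs": r"and\s+(\d+)\s+had differences in fill patterns",
    "not_analyzed": r"A total number of\s+(\d+)\s+fields could not be analyzed",
}


def parse_cprnc_summary(text):
    """Return a dict of counts; a count is None if its line is missing."""
    counts = {}
    for key, pattern in SUMMARY_PATTERNS.items():
        match = re.search(pattern, text)
        counts[key] = int(match.group(1)) if match else None
    return counts


def is_bit_for_bit(counts):
    """True only if both difference counts were found and are zero."""
    return counts["nonzero_diffs"] == 0 and counts["fill_diffs"] == 0


summary = """SUMMARY of cprnc:
 A total number of    407 fields were compared
          of which    283 had non-zero differences
               and      0 had differences in fill patterns
 A total number of      0 fields could not be analyzed
"""

counts = parse_cprnc_summary(summary)
print(counts, is_bit_for_bit(counts))
```

A missing line yields `None` rather than zero, so a truncated or malformed file is never mistaken for a passing comparison.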

@mark-petersen
Contributor Author

G cases pass, so it is most likely a threading mismatch in the atmosphere or land.
@singhbalwinder and @bishtgautam, any progress on this? Land-only and atmosphere-only (I and F?) PET tests would tell us which of the two components is responsible.

@ndkeen
Contributor

ndkeen commented Oct 26, 2017

Mark, if you read more in my other issues (commented above), I think we know that the issue is the floating-point treatment of the Intel compiler on KNL. Right now, if you modify the flags for Intel to use fp-model consistent instead of fp-model source, it will pass.

@mark-petersen
Contributor Author

OK, thanks. I just looked up further and realized that.

asteyer pushed a commit that referenced this issue Oct 30, 2017
fix error message

Do not print message about leaving case directory if the directory was never created

Test suite: hand tests of create_newcase
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #1824

User interface changes?:

Update gh-pages html (Y/N)?:

Code review:
@ndkeen
Contributor

ndkeen commented Feb 1, 2018

Actually, #1919 should have fixed this. I forgot to add this as an issue the PR addressed.

ndkeen closed this as completed Mar 6, 2018