Switch to -fp-model fast in standalone HOMME runs #1960

amametjanov · 2017-12-06T01:52:39Z

Switch to -fp-model fast in standalone HOMME runs

[non-BFB] - for the HOMME test due to cflag change
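"Non-BFB" here means answers change at roundoff level. One reason a value-unsafe mode such as -fp-model fast can cause that is that it lets the compiler reorder (reassociate) floating-point operations, for example when vectorizing a reduction. The Python sketch below only illustrates that IEEE double-precision addition is not associative. It does not model what the Intel compiler actually emits for HOMME.

```python
def sequential_sum(values):
    """Add values strictly left to right, like an unvectorized loop."""
    total = 0.0
    for v in values:
        total += v
    return total

# Reassociating three terms changes the last bit of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Reordering a reduction can change the answer by much more than one ulp
# when large terms cancel: 1e16 + 1.0 rounds back to 1e16.
data = [1.0, 1e16, -1e16]
print(sequential_sum(data), sequential_sum(data[::-1]))  # 0.0 1.0
```

A compiler that is free to split a loop sum into vector lanes effectively picks a different order, which is why results can differ from a strict left-to-right build even though both are "correct".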

mt5555 · 2017-12-07T18:05:50Z

Will this be OK on KNL? @ndkeen is seeing reproducibility problems on KNL with -fp-model fast for EAM.

mt5555 · 2017-12-07T20:27:43Z

With -fp-model fast:

skybridge, with intel16: HOMME tests pass
anvil, intel17: HOMME tests pass
cori-knl, intel18: HOMME tests pass, but had to increase swtc6 tolerance to 5e-14

So this looks good to go (after I push the update to swtc6 tol).

amametjanov · 2017-12-07T20:37:05Z

Checked for reproducibility for intel18 on cori-knl with baselines generated from git master v1.0.0-beta.2-2897-g82cf067 (+ -fp-model fast):

azamat@cori11:/global/cscratch1/sd/azamat/acme_scratch/cori-knl/HOMME_P24.f19_g16_rx1.A.cori-knl_intel18.C.20171207_111804_ne1i1z/run> cat homme.log
[  2%] Built target timing
[  8%] Built target pio
[ 20%] Built target swtcA
[ 28%] Built target swtcB
[ 42%] Built target baroC
[ 57%] Built target theta-nlev30
[ 71%] Built target baroCam
[ 85%] Built target baroCam-acc
[100%] Built target theta-nlev20
Scanning dependencies of target check
Test project /global/cscratch1/sd/azamat/acme_scratch/cori-knl/HOMME_P24.f19_g16_rx1.A.cori-knl_intel18.C.20171207_111804_ne1i1z/bld/test_execs
      Start  1: verifyBaselineResults
 1/17 Test  #1: verifyBaselineResults ............   Passed    5.46 sec
      Start  2: swtc1
 2/17 Test  #2: swtc1 ............................   Passed   33.86 sec
      Start  3: swtc2
 3/17 Test  #3: swtc2 ............................   Passed   21.43 sec
      Start  4: swtc5
 4/17 Test  #4: swtc5 ............................   Passed  101.60 sec
      Start  5: swtc6
 5/17 Test  #5: swtc6 ............................   Passed   30.23 sec
      Start  6: baro2b
 6/17 Test  #6: baro2b ...........................   Passed  130.89 sec
      Start  7: baro2c
 7/17 Test  #7: baro2c ...........................   Passed   52.31 sec
      Start  8: baro2d
 8/17 Test  #8: baro2d ...........................   Passed  172.19 sec
      Start  9: baroCamMoist
 9/17 Test  #9: baroCamMoist .....................   Passed  102.46 sec
      Start 10: baroCamMoistSL
10/17 Test #10: baroCamMoistSL ...................   Passed   22.72 sec
      Start 11: baroCamMoist-acc
11/17 Test #11: baroCamMoist-acc .................   Passed   57.80 sec
      Start 12: thetah-test22
12/17 Test #12: thetah-test22 ....................   Passed   52.14 sec
      Start 13: thetanh-test22
13/17 Test #13: thetanh-test22 ...................   Passed   48.77 sec
      Start 14: thetah-TC
14/17 Test #14: thetah-TC ........................   Passed   16.88 sec
      Start 15: thetanh-TC
15/17 Test #15: thetanh-TC .......................   Passed   30.83 sec
      Start 16: thetanhwet-TC
16/17 Test #16: thetanhwet-TC ....................   Passed   31.47 sec
      Start 17: templates
17/17 Test #17: templates ........................   Passed   22.97 sec

100% tests passed, 0 tests failed out of 17

Total Test time (real) = 937.85 sec
[100%] Built target check
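When comparing runs like this across machines and compilers, it can help to pull the per-test status out of the log mechanically. The following is a minimal sketch; the regular expression is an assumption fitted to the ctest summary lines shown in this thread (including the "***Failed" form), not a documented ctest output format.

```python
import re

# Fitted to lines such as:
#  " 1/17 Test  #1: verifyBaselineResults ............   Passed    5.46 sec"
#  "5/17 Test #5: swtc6 ............................***Failed 37.15 sec"
CTEST_LINE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(?P<name>\S+)\s+\.*\s*(?:\*{3})?"
    r"(?P<status>Passed|Failed)\s+(?P<secs>[\d.]+)\s+sec"
)

def parse_ctest_log(text):
    """Return (test name, status, seconds) for every ctest result line."""
    results = []
    for line in text.splitlines():
        m = CTEST_LINE.match(line)
        if m:
            results.append((m["name"], m["status"], float(m["secs"])))
    return results

def failed_tests(results):
    """Names of tests whose status is not 'Passed'."""
    return [name for name, status, _ in results if status != "Passed"]

sample = """\
      Start  1: verifyBaselineResults
 1/17 Test  #1: verifyBaselineResults ............   Passed    5.46 sec
5/17 Test #5: swtc6 ............................***Failed 37.15 sec
"""
results = parse_ctest_log(sample)
print(results)
print(failed_tests(results))  # ['swtc6']
```

"Start" lines are skipped because they lack the "N/M Test" prefix.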

mt5555 · 2017-12-07T20:40:02Z

Here's the issue I saw with swtc6:

5/17 Test #5: swtc6 ............................***Failed 37.15 sec
Submitting 1 jobs
Running test swtc6-run ... /global/homes/t/taylorm/scratch2/regtest/homme/tests/swtc6/swtc6-run.sh > swtc6-run.out 2> swtc6-run.err
test swtc6-run was run successfully
Test name = swtc6
Examining cprnc reference comparison output files
file = exodus-swtc61.nc
The files are different: DIFF_RESULT=DIFFERENT
Checking RMS differences with tol = 1E-14
CPRNC returned the following RMS differences
RMS geop 1.7501E-12 NORMALIZED 1.9114E-16
RMS u 1.7622E-14 NORMALIZED 5.4737E-16
RMS v 2.4322E-14 NORMALIZED 1.1928E-15
RMS geop 4.6121E-12 NORMALIZED 5.0375E-16
RMS u 1.1562E-13 NORMALIZED 3.5950E-15
RMS v 1.0836E-13 NORMALIZED 5.2769E-15
RMS geop 6.4231E-12 NORMALIZED 7.0133E-16
RMS u 2.1307E-13 NORMALIZED 6.6225E-15
RMS v 2.1734E-13 NORMALIZED 1.0649E-14
1.9114E-16 <= 1E-14 OK
5.4737E-16 <= 1E-14 OK
1.1928E-15 <= 1E-14 OK
5.0375E-16 <= 1E-14 OK
3.5950E-15 <= 1E-14 OK
5.2769E-15 <= 1E-14 OK
7.0133E-16 <= 1E-14 OK
6.6225E-15 <= 1E-14 OK
1.0649E-14 > 1E-14 ERROR: TOL EXCEEDED
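The failure comes down to a single field: one NORMALIZED value (1.0649E-14, the last v entry) sits just above tol = 1E-14. The sketch below replays that check on the values copied from the log, assuming the test script simply compares each reported NORMALIZED value against tol, as the OK/ERROR lines suggest. It then repeats the check with the 5e-14 tolerance mentioned earlier in the thread.

```python
# NORMALIZED RMS values copied from the swtc6 log above, in printed order.
NORMALIZED_RMS = [
    ("geop", 1.9114e-16), ("u", 5.4737e-16), ("v", 1.1928e-15),
    ("geop", 5.0375e-16), ("u", 3.5950e-15), ("v", 5.2769e-15),
    ("geop", 7.0133e-16), ("u", 6.6225e-15), ("v", 1.0649e-14),
]

def check_tolerance(values, tol):
    """Print OK/ERROR lines like the log and return the entries that exceed tol."""
    failures = []
    for name, value in values:
        if value <= tol:
            print(f"{value:.4E} <= {tol:.0E} OK")
        else:
            print(f"{value:.4E} > {tol:.0E} ERROR: TOL EXCEEDED")
            failures.append((name, value))
    return failures

old_failures = check_tolerance(NORMALIZED_RMS, 1e-14)  # one failure: v
new_failures = check_tolerance(NORMALIZED_RMS, 5e-14)  # no failures
```

Raising the tolerance to 5e-14 leaves roughly a factor of 5 of headroom over the largest observed value.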

amametjanov · 2017-12-07T20:49:56Z

+1 on increasing the tol


mt5555 · 2017-12-15T20:43:13Z

Merged to next, but all tests passed on skybridge because skybridge uses custom, machine-specific Fortran flags. I modified the branch so that skybridge will use the default Intel settings, and re-merged to next. The HOMME test should now show a diff tomorrow.
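Because a machine file that forces its own Fortran flags can silently mask a default-flag change, a cheap guard is to confirm the flag actually appears on the compile line. This sketch assumes you can capture a compile command (for example from a verbose build). The command strings and the file name example.F90 are hypothetical. It only reports which -fp-model values are present; it does not decide which one wins if several are given.

```python
import shlex

def fp_model_values(compile_cmd):
    """Return every argument given to -fp-model on a compile command line."""
    tokens = shlex.split(compile_cmd)
    return [tokens[i + 1] for i, tok in enumerate(tokens)
            if tok == "-fp-model" and i + 1 < len(tokens)]

def uses_fp_model_fast(compile_cmd):
    """True if -fp-model fast (or fast=1 / fast=2) appears on the line."""
    return any(v == "fast" or v.startswith("fast=")
               for v in fp_model_values(compile_cmd))

# Hypothetical compile lines, for illustration only.
with_default = "ifort -c -O2 -fp-model fast example.F90"
overridden = "ifort -c -O2 example.F90"
print(fp_model_values(with_default), uses_fp_model_fast(overridden))
```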


…rtens/fix_datacomps This fixes issue #1960 that arose when a stream domain file is not equal to the model domain

Fix several data component issues

This PR fixes several outstanding issues:

src/components/data_comps/dshare was moved to src/share/streams in order to resolve issue #1802.

For aquaplanet runs, the landmask was reset in docn_comp_mod.F90, and in fact this does work if you are reading an input sst file as the aquaplanet forcing. The right place to put this is in shr_strdata_init.F90, right after the model grid is read in. An optional argument (reset_domain_mask) was introduced in shr_strdata_init.F90 that allows this in a backwards-compatible manner. This resolves issue #1960.

If the data component domain file is equal to 'UNSET', the 'domainfile' namelist variable should be set to 'null', which will assume that the model domain is contained in the first stream file. This resolves issue #1937.

Test suite: scripts_regressions_tests
The following extra tests were also run on cheyenne and compared to cesm2_0_alpha07f:
ERI.T62_g16.C1850ECO.cheyenne_intel.pop-ecosys
ERI.T62_g37.G.cheyenne_intel.pop-cice
ERP_D_Ln9.f19_f19_mg17.QPC6.cheyenne_intel.cam-outfrq9s
ERP_D_Ln9.f19_f19_mg17.QSC6.cheyenne_intel.cam-outfrq9s
ERP_Ln9.f09_f09_mg17.F1850_DONOTUSE.cheyenne_intel.cam-outfrq9s
ERP_P180x2_D_Ld5.f19_g17_gl4.I1850Clm50BgcCropG.cheyenne_intel.clm-default
ERR.f45_g37_rx1.A.cheyenne_intel
ERS_IOP.T62_g16.CIAF.cheyenne_intel.pop-default
ERS_IOP.T62_g16.GIAF.cheyenne_intel.pop-default
ERS_Lm3.T62_g16.AIAF.cheyenne_intel
ERS_Ly3.f09_g16_gl4.T1850G.cheyenne_intel

Test baseline: cesm2_0_alpha07b
Test namelist changes: none
Test status: bit for bit
Fixes #1802
Fixes #1960
Fixes #1937
User interface changes?: None
Update gh-pages html (Y/N)?: N
Code review: sacks, edwards
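The domain-file rule and the backwards-compatible optional argument described in that commit message can be sketched as follows. This is a Python illustration, not the Fortran in shr_strdata_init.F90. All function names and file names are hypothetical. What the mask reset actually does is an assumption here: it is taken to mark every cell active (1), as for an all-ocean aquaplanet.

```python
def resolve_domainfile(domain_file):
    """Map an 'UNSET' data component domain file to the 'null' namelist value."""
    return "null" if domain_file == "UNSET" else domain_file

def domain_source(domainfile, stream_files):
    """Pick the file the model domain is read from (hypothetical logic)."""
    if domainfile == "null":
        # 'null' means: assume the model domain is in the first stream file.
        if not stream_files:
            raise ValueError("domainfile is 'null' but no stream files were given")
        return stream_files[0]
    return domainfile

def apply_domain_mask(model_mask, reset_domain_mask=False):
    """The optional flag defaults to the old behavior, so existing callers are unchanged.

    Assumption: a reset marks every cell active (1), as for an aquaplanet.
    """
    if reset_domain_mask:
        return [1] * len(model_mask)
    return list(model_mask)

src = domain_source(resolve_domainfile("UNSET"), ["sst_stream.nc"])
print(src)  # sst_stream.nc
```

Defaulting the new argument to False is what keeps the change backwards compatible: only callers that opt in see the reset.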


Switch to -fp-model fast in standalone HOMME runs

c6e223c

amametjanov added the HOMME and non-BFB (PR makes roundoff changes to answers) labels Dec 6, 2017

rljacob assigned mt5555 Dec 6, 2017

increase tol for swtc6 from 1e-14 to 5e-14

b0f4c04

mt5555 added a commit that referenced this pull request Dec 14, 2017

Merge branch 'azamat/homme/fp-fast' into next (PR #1960)

7ff0024

Switch to -fp-model fast in standalone HOMME runs [non-BFB] - for the HOMME test due to cflag change

remove forced fortran options from sandiatoss3.cmake

b78895f

Test system (sandiatoss3) was overriding all the compiler flags, meaning that it was not testing the changes in this PR. Tweak the machine file so sandiatoss3 systems will use the default compiler flags.

mt5555 added a commit that referenced this pull request Dec 15, 2017

Merge branch 'azamat/homme/fp-fast' into next (PR #1960)

0f70152

Remerging branch

amametjanov mentioned this pull request Dec 18, 2017

standalone HOMME NaN check bugfix #1966

Merged

mt5555 merged commit b78895f into master Dec 20, 2017

mt5555 added a commit that referenced this pull request Dec 20, 2017

Merge 'origin/azamat/homme/fp-fast' (PR #1960)

406216f

Switch to -fp-model fast in standalone HOMME runs [non-BFB] - for the HOMME test due to cflag change

amametjanov deleted the azamat/homme/fp-fast branch December 21, 2017 17:51

jgfouca pushed a commit that referenced this pull request Jan 23, 2018

Merge 'origin/azamat/homme/fp-fast' (PR #1960)

52b5113

Switch to -fp-model fast in standalone HOMME runs [non-BFB] - for the HOMME test due to cflag change

rljacob pushed a commit that referenced this pull request Apr 21, 2021

Merge 'origin/azamat/homme/fp-fast' (PR #1960)

411a900

Switch to -fp-model fast in standalone HOMME runs [non-BFB] - for the HOMME test due to cflag change

rljacob pushed a commit that referenced this pull request May 6, 2021

Merge remote-tracking branch 'origin/mvertens/fix_aquapfile' into mve…

13ea7e3

…rtens/fix_datacomps This fixes issue #1960 that arose when a stream domain file is not equal to the model domain

rljacob pushed a commit that referenced this pull request May 6, 2021

Merge 'origin/azamat/homme/fp-fast' (PR #1960)

356b541

Switch to -fp-model fast in standalone HOMME runs [non-BFB] - for the HOMME test due to cflag change
