Decomposition test failure #1103

Closed
MinsukJi-NOAA opened this issue Mar 10, 2022 · 74 comments
Labels
bug Something isn't working

Comments

@MinsukJi-NOAA
Contributor

Description

cpld_decomp_p8 fails with a different domain decomposition

To Reproduce:

What compilers/machines are you seeing this with? Intel/Hera
Give explicit steps to reproduce the behavior.

  1. Check out the latest ufs weather model (9b6b740)
  2. cd ufs-weather-model/tests; ./rt.sh -n cpld_decomp_p8. This test will PASS
  3. Modify ufs-weather-model/tests/tests/cpld_decomp_p8:
diff --git a/tests/tests/cpld_decomp_p8 b/tests/tests/cpld_decomp_p8
index cbf1b68f..bbbde45b 100644
--- a/tests/tests/cpld_decomp_p8
+++ b/tests/tests/cpld_decomp_p8
@@ -66,8 +66,10 @@ export RESTART_INTERVAL="${RESTART_N} -1"
 
 export TASKS=$TASKS_cpl_dcmp
 export TPN=$TPN_cpl_dcmp
-export INPES=$INPES_cpl_dcmp
-export JNPES=$JNPES_cpl_dcmp
+#export INPES=$INPES_cpl_dcmp
+#export JNPES=$JNPES_cpl_dcmp
+export INPES=8
+export JNPES=3
 export THRD=$THRD_cpl_dcmp
 export WRTTASK_PER_GROUP=$WPG_cpl_dcmp
  4. Repeat step 2. This test will FAIL.

For a comparison,
5. Check out the previous commit 38aa634 of the ufs weather model
6. Repeat steps 2, 3, and 4 above. Both tests will PASS.

@MinsukJi-NOAA added the bug label on Mar 10, 2022
@junwang-noaa
Collaborator

@MinsukJi-NOAA Is this issue showing up in previous revisions?

@MinsukJi-NOAA
Contributor Author

MinsukJi-NOAA commented Mar 10, 2022

> @MinsukJi-NOAA Is this issue showing up in previous revisions?

The previous version 38aa634 does not have this issue.

@junwang-noaa
Collaborator

Since the physics suite is changed and there is no dycore update in 9b6b740, it might be related to physics.

@yangfanglin @JessicaMeixner-NOAA @DeniseWorthen FYI.

@yangfanglin
Collaborator

Is there an uncoupled atmosphere decomposition test? Is it working?

@JessicaMeixner-NOAA
Collaborator

Yes, there is an uncoupled decomposition test. I'm running it with the 8,3 combination now.

@JessicaMeixner-NOAA
Collaborator

The standalone atm decomposition test w/the 8,3 combination passed.

@JessicaMeixner-NOAA
Collaborator

So 3,8 is what the baseline is created with. 4,6 also gives the same answer, but 8,3 does not. It's my understanding that the only guarantee is for repro mode, so are we compiling in repro mode? I don't see any instructions on that here: https://github.com/ufs-community/ufs-weather-model/wiki/Building-model

I believe @DeniseWorthen's usual suggestion for debugging something is to write out and check the mediator history files. For that, I believe we follow the instructions here: https://github.com/ufs-community/ufs-weather-model/wiki/Advanced-Topics#using-the-cmeps-mediator-to-understand-the-coupling-fields-under-construction Is that correct?

@MinsukJi-NOAA
Contributor Author

@JessicaMeixner-NOAA I believe repro mode compilation can be done with -DREPRO=ON

@DeniseWorthen
Collaborator

UFS discussion 934 contains the instructions for writing the mediator history files. In this case, you could add the following to nems.configure:

      history_n_atm_inst = 1
      history_option_atm_inst = nsteps
      history_n_ice_inst = 1
      history_option_ice_inst = nsteps
      history_n_ocn_inst = 1
      history_option_ocn_inst = nsteps
      history_tile_atm = 96

This will write the ATM mediator history as a single file containing all 6 tiles on every pass through the coupling loop. ICE and OCN will get their own history files.

@JessicaMeixner-NOAA
Collaborator

Thanks @MinsukJi-NOAA and @DeniseWorthen

@junwang-noaa
Collaborator

@JessicaMeixner-NOAA To clarify: 1) the cpld_control_p8 and cpld_decomp_p8 RTs use decompositions 4x6 and 3x8; 2) the ORT runs cpld_control_p8 with 8x3. Both have run in PROD mode for some time. This was working in previous PRs; the 8x3 setting stopped working with PR #1071.
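
The layouts named in this thread (3x8, 4x6, 6x4, 8x3) all have INPES x JNPES = 24. As a minimal sketch, assuming that product is the quantity held fixed when the decomposition changes, the candidate layouts can be listed as follows. The helper `layouts` is hypothetical and is not part of rt.sh or opnReqTest.

```python
def layouts(ranks):
    """Return every (INPES, JNPES) pair whose product equals `ranks`."""
    return [(inpes, ranks // inpes)
            for inpes in range(1, ranks + 1)
            if ranks % inpes == 0]


# 24 is taken from the 3x8 / 4x6 / 8x3 layouts quoted in this thread;
# treating it as the fixed product is an assumption of this sketch.
for inpes, jnpes in layouts(24):
    print(f"INPES={inpes} JNPES={jnpes}")
```

Not every pair printed here is necessarily a valid layout for a given resolution; the list only shows which combinations keep the product unchanged.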

@DeniseWorthen
Collaborator

@JessicaMeixner-NOAA If you can point me to a run directory containing cpld_control_p8, I can start with the mediator history files.

@JessicaMeixner-NOAA
Collaborator

I just started a run with the extra outputs, they will be here: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_RT/rt_13823

@DeniseWorthen
Collaborator

DeniseWorthen commented Mar 11, 2022

I copied your run directory and made a sandbox. I used it to create 3 run directories:

/scratch1/NCEPDEV/stmp2/Denise.Worthen/decomp96/decomp38
/scratch1/NCEPDEV/stmp2/Denise.Worthen/decomp96/decomp83
/scratch1/NCEPDEV/stmp2/Denise.Worthen/decomp96/decomp46

where each varies only in the input.nml layout variable. I set fhmax=2 in model_configure.

The decomp38 and decomp46 directories are b4b after 2 hours. The decomp38 and decomp83 directories differ on the 2nd coupling step in the coupling fields sent by ATM (rain, snow, shum, tbot, height). The differing values are randomly scattered on each tile---they are not associated w/ land fraction for example, which I've seen before. A diff file at the second timestep is in the run directory (atm.diff.23040.nc).

60: RMS atmImp_Faxa_rain                 4.2134E-19            NORMALIZED  1.1768E-14
76: RMS atmImp_Faxa_snow                 1.0935E-20            NORMALIZED  2.7293E-15
148: RMS atmImp_Sa_shum                   6.0702E-19            NORMALIZED  6.7205E-17
157: RMS atmImp_Sa_tbot                   2.5005E-15            NORMALIZED  8.7086E-18
180: RMS atmImp_Sa_z                      8.8095E-17            NORMALIZED  8.4206E-18
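
The RMS and NORMALIZED columns above summarize how far the same field differs between two runs. A minimal sketch of that kind of comparison follows, assuming the normalized value is the RMS difference divided by the RMS of the reference field; the definition used by the actual diff tool is not stated in this thread, and the function name and sample values are hypothetical.

```python
import math


def rms_diff(ref, test):
    """Return (rms, normalized_rms) of test - ref for two equal-length sequences."""
    if len(ref) != len(test):
        raise ValueError("fields must have the same number of points")
    n = len(ref)
    rms = math.sqrt(sum((t - r) ** 2 for r, t in zip(ref, test)) / n)
    ref_rms = math.sqrt(sum(r * r for r in ref) / n)
    # Assumed normalization: divide by the RMS of the reference field.
    normalized = rms / ref_rms if ref_rms > 0.0 else rms
    return rms, normalized


# Bit-for-bit identical fields give exactly zero for both measures.
print(rms_diff([280.0, 281.5, 279.25], [280.0, 281.5, 279.25]))
# A single changed point gives a nonzero RMS and normalized RMS.
print(rms_diff([3.0, 4.0], [3.0, 5.0]))
```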

@JessicaMeixner-NOAA
Collaborator

Repro mode did not help: /scratch1/NCEPDEV/stmp2/Jessica.Meixner/FV3_RT/rt_10951 (comparing the output between the two directories, there are many diffs starting at fhr001).

@pjpegion
Collaborator

I did some runs saving every time step with different configurations.
Decompositions 4x6 and 6x4 both pass the regression test; it fails only when switching to 3x8. The difference in meteorology first appears in precipitation 30 minutes into the run on tile 1, but I see differences in the first time step in several tracers related to aerosols (hydrophobic black carbon mixing ratio, SO2 mixing ratio, sulfate mixing ratio, etc.).
This is something I noticed that differs between the 2 runs:
for 4x6: Aspect Ratio : min: 0.10000000000000E+01 max: 0.10654828722430E+01 avg: 0.61158324642517E+01
and 3x8: Aspect Ratio : min: 0.10000000000000E+01 max: 0.10654828722430E+01 avg: 0.61158324642518E+01

I then compiled with REPRO=ON, and running with 3x8 and 4x6 gets the same answers, but these answers are different from the regression test baseline, which I expect since the model was optimized differently. In the REPRO cases, the aspect ratio average is the same as in the 3x8 case above.
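
A last-digit change like the one in the Aspect Ratio average is consistent with the same values being summed in a different order, since floating-point addition is not associative. Whether that is what happens in the model is an assumption; the sketch below only illustrates the effect in isolation, with made-up values.

```python
values = [0.1, 0.2, 0.3]

# Same three numbers, two groupings, as two decompositions might
# combine their partial sums in a different order.
left_first = (values[0] + values[1]) + values[2]
right_first = values[0] + (values[1] + values[2])

print(repr(left_first))           # 0.6000000000000001
print(repr(right_first))          # 0.6
print(left_first == right_first)  # False
```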

@pjpegion
Collaborator

pjpegion commented Mar 17, 2022

I also changed cplchm=F and turned off the aerosols by changing the nems.configure and field_table, and 4x6 and 3x8 give the same results with the original RT executable.

@yangfanglin
Collaborator

Phil, thanks. This is interesting. I believe Jessica was also testing the coupled model without GOCART aerosols.
Even though there are no feedbacks between GOCART aerosols and the met, the met tracers might have been affected.

Is Raffaele's tracer fix (related to Thompson MP) included in these tests?

@pjpegion
Collaborator

@yangfanglin I don't know. @rmontuoro?

@yangfanglin
Collaborator

I am running a test now on WCOSS. I checked out the latest ufs-weather-model and pointed fv3atm to https://github.com/rmontuoro/fv3atm/tree/bugfix/thompson-tracer-index. I am only testing cpld_decomp_p8 in rt.sh. Is this sufficient? In any case, I will report back when the test is done.

@yangfanglin
Collaborator

My RT returns "+ echo REGRESSION TEST WAS SUCCESSFUL". Does this mean the decomposition bug is also fixed with Raffaele's tracer bug fix ? Should I change the layout manually to test different configurations ?

@pjpegion
Collaborator

@yangfanglin follow the steps at the top of this thread to try a different processor layout

@yangfanglin
Collaborator

yangfanglin commented Mar 17, 2022

@pjpegion Got it. Running step 3 now.

@JessicaMeixner-NOAA
Collaborator

@yangfanglin I have run with @rmontuoro's PR changes and they solve some of the decomp issues, but there's still something going on, as the results do not completely reproduce. To figure out what is going on, I'm going to try running this test w/aerosols but with the older physics options, to see if it again points to an interaction of physics/aerosols that is not otherwise being seen.

@yangfanglin
Collaborator

@JessicaMeixner-NOAA Was your coupled model test without gocart aerosol successful ?

@JessicaMeixner-NOAA
Collaborator

I haven't tried with the very top of develop, but with the 9b6b740 commit, I get results consistent with @pjpegion's: it worked. I'm trying to figure out why, since the updated code from @rmontuoro seems to resolve some of these issues while leaving aerosols on.

@yangfanglin
Collaborator

My test with 8x3 decomposition did not reproduce using the latest ufs-weather-model develop branch and Raffaele's fv3-atm.

@junwang-noaa junwang-noaa mentioned this issue Mar 18, 2022
@yangfanglin
Collaborator

Please see https://docs.google.com/document/d/11vo2-DyrR2LWbQoqprlVoTxhSEnpvblWPw30WqOl6uA/edit for a record of the changes made to the fv3-atm and ccpp repos after March 4 and before March 10, when Minsuk first reported the decomposition failure. Can we revert the "fix 2phases intermediate restart" update and see if the decomposition RT works?

@JessicaMeixner-NOAA
Collaborator

I have not run the ORT yet, but will try to do that now. I personally find it easier to create new regression tests over running the ORT.

@arunchawla-NOAA

The ORT is a requirement for PRs, so I would like confirmation that this works.

@JessicaMeixner-NOAA
Collaborator

It's running now; I will post as soon as I have the answer to your question. I'm using the compiler flags @rmontuoro gave me, but I'm not sure if that will be the final decision for the flags. I will run the ORT and regression tests again once I know what the final flags are.

@JessicaMeixner-NOAA
Collaborator

The ORT failed:

/scratch2/NCEPDEV/climate/Jessica.Meixner/p8b/ufs-decomp-03/tests$ ./opnReqTest -n cpld_control_c96_p8 
hecflow01
ECF_HOST = hecflow01, ECF_PORT = 26510
test name: cpld_control_c96_p8
cases to compile: std bit dbg
compiling std with compile option -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1,FV3_GFS_v16_coupled_p7_rrtmgp,FV3_GFS_v17_coupled_p8
done compiling std
compiling bit with compile option -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1,FV3_GFS_v16_coupled_p7_rrtmgp,FV3_GFS_v17_coupled_p8 -D32BIT=ON
Died with error code 2

I will look into why after the meeting.

@JessicaMeixner-NOAA
Collaborator

Okay so this failed because we shouldn't be trying to compile the coupled model with -D32BIT=ON (yet).

I'll resubmit specifying which tests and report back how the tests go.

@JessicaMeixner-NOAA
Collaborator

Next error:

compiling dbg with compile option -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1,FV3_GFS_v16_coupled_p7_rrtmgp,FV3_GFS_v17_coupled_p8 -DDEBUG=ON
Died with error code 2

I'm assuming the debug build fails because of waves. I thought we could at least build, though, so I'll look into debugging that issue. In the meantime:

Running now with the following options:
./opnReqTest -n cpld_control_p8 -c thr,mpi,dcp,rst

Please let me know if I should be testing/compiling with different compiler flags or using different options in the ORT test.

@arunchawla-NOAA

Can we run the opnreqtest with the flags that @rmontuoro identified ?

@DeniseWorthen
Collaborator

I think your debug failure for S2SWA requires the fix that Kyle provided here.

@JessicaMeixner-NOAA
Collaborator

Thanks @DeniseWorthen I agree.
@arunchawla-NOAA I am running with the flags that @rmontuoro identified. It's unclear to me if everyone agrees that's the best/final option.

Next ORT failure:

./opnReqTest -n cpld_control_p8 -c thr,mpi,dcp,rst
hecflow01
ECF_HOST = hecflow01, ECF_PORT = 26510
test name: cpld_control_p8
cases to compile: std
compiling std with compile option -DAPP=S2SWA -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1,FV3_GFS_v16_coupled_p7_rrtmgp,FV3_GFS_v17_coupled_p8
done compiling std
cases to run: std_base thr mpi dcp rst
Running test for std_base with
    THRD: 1; INPES: 3; JNPES: 8; TASKS: 200; TPN: 40
Running test for thr with
    THRD: 2; INPES: 3; JNPES: 4; TASKS: 120; TPN: 20
Coupled application not yet implemented for mpi
Died with error code 1

Where can I find which tests and which options are expected to work with the ORT? I don't see anything here: https://ufs-weather-model.readthedocs.io/en/latest/BuildingAndRunning.html#using-the-operational-requirement-test-script

@arunchawla-NOAA

For now I would like to know if these flags do work. We have reached out to GDIT about appropriate flags. @MinsukJi-NOAA can you address the questions that @JessicaMeixner-NOAA has raised ?

@JessicaMeixner-NOAA
Collaborator

@arunchawla-NOAA in terms of knowing if the flags worked, I did the following:
I created an extra "cpld_decomp_p8_2" test which uses the 8,3 decomposition. I created a baseline (for the non-debug S2SWA tests), then ran the regression tests against that baseline, and the tests were successful. I did not test the debug test because, in previous tests, it gave the same answers whereas the non-debug tests gave different ones. This test was on Hera. On Orion, I ran the cpld_decomp_p8 setup with the 3,8, 8,3, and 4,6 decompositions, did diffs of the directories, and also got reproducibility. In the meantime, I'll keep working on the ORT tests to provide further evidence that the tests are working.
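
Diffing run directories, as described above, can be scripted. The following is a minimal sketch using only the Python standard library; it is not the procedure used in this thread, and the function name, file names, and throwaway directories are hypothetical. It compares files byte for byte, so output files that embed run-specific metadata may need a format-aware comparison instead.

```python
import filecmp
import os
import tempfile


def differing_files(dir_a, dir_b):
    """Compare the regular files of two run directories byte for byte.

    Returns (mismatched, only_in_one): names present in both directories
    whose contents differ (or could not be read), and names present in
    just one of the two directories.
    """
    def regular_files(path):
        return {n for n in os.listdir(path)
                if os.path.isfile(os.path.join(path, n))}

    names_a, names_b = regular_files(dir_a), regular_files(dir_b)
    common = sorted(names_a & names_b)
    # shallow=False compares file contents instead of trusting os.stat() data.
    _, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)
    return sorted(mismatch + errors), sorted(names_a ^ names_b)


def _write(path, data):
    with open(path, "wb") as handle:
        handle.write(data)


# Demo with temporary directories standing in for two run directories.
with tempfile.TemporaryDirectory() as run_38, tempfile.TemporaryDirectory() as run_83:
    _write(os.path.join(run_38, "same.nc"), b"identical")
    _write(os.path.join(run_83, "same.nc"), b"identical")
    _write(os.path.join(run_38, "atm.nc"), b"value-a")
    _write(os.path.join(run_83, "atm.nc"), b"value-b")
    _write(os.path.join(run_38, "extra.log"), b"only here")
    demo = differing_files(run_38, run_83)

print(demo)
```

Two empty lists mean the directories hold the same file names with identical contents.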

@JessicaMeixner-NOAA
Collaborator

The following test passed:
./opnReqTest -n cpld_control_c96_p8 -c thr,dcp,rst
with the following compiler flags

GOCART$ git diff CMakeLists.txt
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 12d5a32..4371571 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -33,7 +33,9 @@ if (UFS_GOCART)
 # Ensure we build as 32-bit
   message ("Force 32-bit build for GOCART")
   if (CMAKE_Fortran_COMPILER_ID MATCHES "Intel")
-    string (REPLACE "-real-size 64" "" CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS}")
+#   string (REPLACE "-real-size 64" "" CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS}")
+    set (CMAKE_Fortran_FLAGS "-g -traceback -fp-model source -ftz -align array64byte -qno-opt-dynamic-align")
+#   set (CMAKE_Fortran_FLAGS "-g -traceback -fpp -fno-alias -auto -safe-cray-ptr -ftz -assume byterecl -nowarn -sox -align array64byte -qno-opt-dynamic-align -O2 -debug minimal -fp-model source -qoverride-limits -qopt-prefetch=3 -prec-div -prec-sqrt -mp1 -O2 -fPIC")
   elseif (CMAKE_Fortran_COMPILER_ID MATCHES "GNU")
     string (REPLACE "-fdefault-real-8" "" CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS}")
   endif()

I'll work on making the debug test pass as well as any other desired combination/test.

@arunchawla-NOAA

So if we were to replace the fp-model flag with source instead of consistent in repro mode, the ORT test would pass (instead of having to create these separate flags just for UFS GOCART)?

@JessicaMeixner-NOAA
Collaborator

This is the original code I tried: https://github.com/JessicaMeixner-NOAA/ufs-weather-model/tree/bug/decompissue. I have now also confirmed that, with the fix @DeniseWorthen pointed me to, dbg works as well.

I'm now trying the update @arunchawla-NOAA asked about: repro mode with the fp-model flag set to source instead of consistent. The code is here: https://github.com/JessicaMeixner-NOAA/ufs-weather-model/tree/bug/decompissuerepro and I'm trying the following test:
./opnReqTest -n cpld_control_c96_p8 -c std,thr,dcp,rst,dbg,fhz

I'll report back when I have more information.

@JessicaMeixner-NOAA
Collaborator

Here's my output of the ORT with turning on repro mode:

$ ./opnReqTest -n cpld_control_c96_p8 -c std,thr,dcp,rst,dbg,fhz
hfe10
ECF_HOST = hecflow01, ECF_PORT = 26510
test name: cpld_control_c96_p8
cases to compile: std dbg
compiling std with compile option -DAPP=S2SWA -DREPRO=ON -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1,FV3_GFS_v16_coupled_p7_rrtmgp,FV3_GFS_v17_coupled_p8
done compiling std
compiling dbg with compile option -DAPP=S2SWA -DREPRO=ON -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1,FV3_GFS_v16_coupled_p7_rrtmgp,FV3_GFS_v17_coupled_p8 -DDEBUG=ON
done compiling dbg
cases to run: std_base thr dcp rst dbg_base fhz
Running test for std_base with
    THRD: 1; INPES: 3; JNPES: 8; TASKS: 192; TPN: 40
Running test for thr with
    THRD: 2; INPES: 3; JNPES: 4; TASKS: 120; TPN: 20
Running test for dcp with
    THRD: 1; INPES: 8; JNPES: 3; TASKS: 192; TPN: 40
Running test for rst with
    THRD: 1; INPES: 3; JNPES: 8; TASKS: 192; TPN: 40
Running test for dbg_base with
    THRD: 1; INPES: 3; JNPES: 8; TASKS: 192; TPN: 40
Died with error code 1

So the debug test failed (perhaps I should be building with both -DREPRO=ON and -DDEBUG=ON). It looks to me like all the other tests passed, although the log file didn't get created.

@rmontuoro
Collaborator

I was able to successfully run the operational requirement test for cpld_control_c96_p8 on Hera following the steps below:

  1. Check out ufs-weather-model rev. 1bd68ca
  2. Add @kgerheiser's fix to WW3/CMakeLists.txt
  3. Append -DREPRO=ON to all options in the ORT script:
diff --git a/tests/opnReqTest b/tests/opnReqTest
index e6aa93e0..802e98e7 100755
--- a/tests/opnReqTest
+++ b/tests/opnReqTest
@@ -102,6 +102,7 @@ build_opnReqTests() {
         ;;
     esac
     MAKE_OPT=$(echo $MAKE_OPT | sed -e 's/^ *//' -e 's/ *$//')
+    MAKE_OPT="${MAKE_OPT} -DREPRO=ON"
     export COMPILE_NR=${name}
 
       cat <<-EOF > ${RUNDIR_ROOT}/compile_${name}.env
  4. Run:
 ./opnReqTest -n cpld_control_c96_p8 -c std,thr,dcp,rst,dbg,fhz

The original REPRO flags were used in this test. These include -fp-model consistent:

elseif(REPRO)
  add_definitions(-DREPRO)
  set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model consistent -qoverride-limits")
  set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O2 -debug minimal")
else()

@junwang-noaa
Collaborator

@rmontuoro Thanks for confirming the ORT does work with REPRO mode. Can you build GOCART only with REPRO compile option and the rest of model in PROD mode to see if it still passes?

@JessicaMeixner-NOAA
Collaborator

By removing -no-prec-div and -no-prec-sqrt from the prod options for non-32-bit builds (see: https://github.com/JessicaMeixner-NOAA/ufs-weather-model/tree/bug/twoflags), the ORT test
./opnReqTest -n cpld_control_c96_p8 -c thr,dcp,rst,dbg,fhz
passes. If this does not affect timing, would this be an acceptable solution that does not require REPRO mode for any or all of the components? I saw in other PRs that the effect on timing seemed to be judged by the timing of the regression tests. Would that be sufficient in this case, or will other timing tests be required? (FYI: @arunchawla-NOAA @rmontuoro )

@arunchawla-NOAA

I would think this would be a better solution than having conditional builds with different flags for different components. @junwang-noaa and @DusanJovic-NOAA, what is the best way to assess the impact of this change on the timing of the runs? Since this change keeps the AVX options for GOCART, which seem to be critical for speed based on what @climbfuji has stated, this should work, right?

@junwang-noaa
Collaborator

I'd suggest running RT and maybe one 35 day benchmark test to confirm the timing is not impacted much.

@rmontuoro
Collaborator

Preliminary tests performed on Hera using cpld_bmark_p8 returned the following timings based on 3 identical runs per case:

Case                    Run #      Total wall time
current flags           01         1044.692037
current flags           02         1049.168395
current flags           03         1034.183660
                        Average    1042.68
                        Std. Dev.  7.69
remove -no-prec flags   01         1039.495619
remove -no-prec flags   02         1030.581493
remove -no-prec flags   03         1041.315700
                        Average    1037.13
                        Std. Dev.  5.74

Note that the -no-prec flags were removed as follows:

diff --git a/cmake/Intel.cmake b/cmake/Intel.cmake
index a91bc30c..b71b172d 100644
--- a/cmake/Intel.cmake
+++ b/cmake/Intel.cmake
@@ -24,7 +24,7 @@ else()
     if(32BIT)
       set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model source -qoverride-limits -qopt-prefetch=3")
     else()
-      set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model source -qoverride-limits -qopt-prefetch=3 -no-prec-div -no-prec-sqrt")
+      set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model source -qoverride-limits -qopt-prefetch=3")
     endif()
     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O2 -debug minimal")
     set(FAST "-fast-transcendentals")

These flags are still used in GFDL_atmos_cubed_sphere:
https://github.com/NOAA-GFDL/GFDL_atmos_cubed_sphere/blob/43f7ed39fcd302e8404a152011f7a02c5d76ddc9/cmake/compiler_flags_Intel_Fortran.cmake#L4
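
The averages and standard deviations in the timing results above can be reproduced from the six wall times. The sketch below assumes the reported "Std. Dev." is the sample standard deviation (n - 1 in the denominator); the variable names are illustrative only.

```python
import statistics

# Total wall times copied from the timing results in this comment.
current_flags = [1044.692037, 1049.168395, 1034.183660]
no_prec_removed = [1039.495619, 1030.581493, 1041.315700]

for label, runs in (("current flags", current_flags),
                    ("remove -no-prec flags", no_prec_removed)):
    mean = statistics.mean(runs)
    stdev = statistics.stdev(runs)  # sample standard deviation (n - 1)
    print(f"{label}: mean={mean:.2f} stdev={stdev:.2f}")

delta = statistics.mean(current_flags) - statistics.mean(no_prec_removed)
print(f"difference of means: {delta:.2f}")
```

The difference between the two averages is smaller than either standard deviation, so with only three runs per case these timings do not show a measurable cost from removing the flags.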

@JessicaMeixner-NOAA
Collaborator

Okay, so I somehow majorly goofed. Many apologies for this: I misread, or read what I wanted to see, but removing just the two flags at the top level does in fact not work for the decomposition tests. I kept re-running tests last night because they were suddenly not working, and I kept thinking I had done something wrong. This morning I rechecked the original log file, which does in fact say the test failed:
https://github.com/JessicaMeixner-NOAA/ufs-weather-model/blob/bug/twoflags/tests/OpnReqTests_cpld_control_c96_p8_hera.intel.log#L390-L394

I still think what @rmontuoro ran and discovered is valuable information: if it holds that other configurations also get no boost from these flags, then perhaps they should be removed. However, it still appears we need to find a clean solution for the decomposition issue.

@rmontuoro
Collaborator

Below are results from my latest ORT on Hera:

  • ufs-weather-model revision a46c7ed
  • WW3/CMakeLists.txt fix included
  • Fortran compiler flags changed as:
diff --git a/cmake/Intel.cmake b/cmake/Intel.cmake
index a91bc30c..d9d5f767 100644
--- a/cmake/Intel.cmake
+++ b/cmake/Intel.cmake
@@ -24,7 +24,7 @@ else()
     if(32BIT)
       set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model source -qoverride-limits -qopt-prefetch=3")
     else()
-      set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model source -qoverride-limits -qopt-prefetch=3 -no-prec-div -no-prec-sqrt")
+      set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -O2 -debug minimal -fp-model consistent -qoverride-limits -qopt-prefetch=3")
     endif()
     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -O2 -debug minimal")
     set(FAST "-fast-transcendentals")

Test      Case                 Result
dbg_base  cpld_control_c96_p8  PASS
dcp       cpld_control_c96_p8  PASS
fhz       cpld_control_c96_p8  PASS
rst       cpld_control_c96_p8  PASS
std_base  cpld_control_c96_p8  PASS
thr       cpld_control_c96_p8  PASS

@MinsukJi-NOAA
Contributor Author

Using the latest weather model (846f0e4, #1171), the decomposition test is successful: the cpld_decomp_p8 regression test works not only as it is (INPES=4 and JNPES=6), but also with INPES=8 and JNPES=3.

@junwang-noaa
Collaborator

@MinsukJi-NOAA Thanks for testing. Since the problem is resolved, I will close the issue.

@junwang-noaa
Collaborator

junwang-noaa commented Oct 11, 2022 via email
