-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] First iteration of overhauling WE2E tests #686
[develop] First iteration of overhauling WE2E tests #686
Conversation
- No need to test GFS_v15p2 on more than one 3km domain - Two RRFS_CONUScompact_25km tests were practically identical to others - grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional is identical to grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot except for the plotting tasks at the end
test an old obscure functionality where the number of leading zeroes in NUM_ENS_MEMBERS would be carried over to the task names, but this hasn't been the case for a long time (ensemble numbers are padded to 3 digits regardless)
community_ensemble_2mems_stoch into a single test.
…dlmp and community_ensemble_2mems_stoch into a single test.
inline_post into a single test.
specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE into a single test; combine grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta and specify_DOT_OR_USCORE into a single test.
specify_RESTART_INTERVAL into a single test.
MET_verification into a single test; combine grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR and MET_ensemble_verification into a single test.
- Track start time and total walltime in yaml file - Print total walltime of a given experiment when it is complete - Refuse user interrupt with ctrl-c if YAML file is being currently written (to avoid printing malformed YAML)
- Change config.grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16.yaml to use HRRR suite (cuts run time in half) - Remove known failure grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v16 - Fix grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional_plot test name
to use RRFS_v1beta; also cuts runtime and core hours in half
… tasks now do that
…risons are easier
- Only run two cycles for grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp - Change physics suite for grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_HRRR to grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16
…NUM_THREADS_MAKE_OROG, increase NNODES_MAKE_ICS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkavulich Thanks for creating the first iteration of changes related to overhauling the WE2E tests!
Overall, these changes look good to me, however, I did note that some of the changes, especially to either the physics suite being used or ICs/LBCs being derived weren't updated in the descriptions. I have noted these issues when I found them in my review.
...n_mdls_suites_community/config.grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta.yaml
Show resolved
Hide resolved
...extrn_mdls_suites_community/config.grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml
Show resolved
Hide resolved
..._extrn_mdls_suites_community/config.grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16.yaml
Outdated
Show resolved
Hide resolved
...s_extrn_mdls_suites_community/config.grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR.yaml
Show resolved
Hide resolved
...uites_community/config.grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta.yaml
Show resolved
Hide resolved
@MichaelLueken Thanks for your comments, I have made all those necessary changes. An update on testing: I did not re-run the fundamental tests after incorporating changes from develop until now. When I did, I found that verification tests were failing on Hera (since fundamental tests on Hera are run in NCO mode); this is a known issue described in issue #688 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkavulich Thank you very much for addressing my comments! I will go ahead and approve this work now.
Please note that neither Jenkins nor I are able to connect to Gaea right now, so the tests will likely take some time to complete.
@mkavulich The Jenkins tests have passed for all systems, except for Gaea (still down) and Cheyenne Intel, where the |
@MichaelLueken Can you point me to the location of failing tests on Cheyenne? I suspect this is due to missing data in the old staged-data location since #674 has not been merged yet. |
@mkavulich Sure, the directory on Cheyenne is:
|
Okay good, it is indeed a failure due to missing data as I suspected. I will talk to @natalie-perlin to see about getting that data moved to the new data locations. |
@mkavulich The Gaea test logs were manually reviewed and all successfully passed. Do you know whether the necessary data has been moved to the new data locations? Once Jenkins comes back up, I can resubmit the Cheyenne Intel suite. Thanks! |
@MichaelLueken I am still waiting on one set of observations to be staged so all verification tests can pass. I will let you know when they are in place. |
…_WE2E_test_suites
@MichaelLueken I have merged the latest changes and it looks like all the data is in place that needs to be. I think I'm ready for you to kick off the Jenkins workflow now. |
@mkavulich Thanks! I have resubmitted the Jenkins tests now. |
@MichaelLueken I see from other PR discussions that the "aborted" message is a Jenkins pipeline problem communicating with Jet. I can see by checking manually that all tests passed on platforms I have access to (Cheyenne, Orion, Jet) and Hera definitely passes since that's where I ran all my manual tests. Can this be merged? |
@mkavulich Yes, the Jenkins tests successfully passed and my own manual tests on Jet also passed. There are no XML file changes, so I will move forward with merging this work now. |
Update the WE2E test used in srw_metric_example.sh to reflect the changes that the WE2E test "grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16" has been substituted with "grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0" in #686 (The GFS v16 suite is not intended for use at such high resolutions, so using the WoFS_v0 suite was more appropriate in this case).
DESCRIPTION OF CHANGES:
This is the first set of changes overhauling the WE2E test suites. This represents steps 1 and 2 of the overhaul process as described in #587 (comment). In addition, some quality-of-life improvements to the WE2E test scripts are included.
Major takeaways:
To avoid cluttering this PR message, a more detailed summary of changes and the test savings can be found in this comment on issue 587: #587 (comment)
Test changes
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp_regional
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_HRRR
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_GFS_v15p2
community_ensemble_008mems
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_TRUE
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019101818
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2021010100
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019101818
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021010100
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2021062000
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
+community_ensemble_2mems_stoch
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_2017_gfdlmp
+community_ensemble_2mems
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
+inline_post
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
+specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
+specify_DOT_OR_USCORE
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
+specify_RESTART_INTERVAL
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_HRRR
+MET_verification
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16
+MET_ensemble_verification
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16.yaml
Changed physics suite to more resolution-appropriate and cheaper FV3_GFS_v16, reduced forecast hours from 6 to 3grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR
Changed physics suite to more resolution-appropriate and cheaper FV3_HRRR, reduced forecast hours from 6 to 3.grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Reduced forecast hours from 6 to 3.grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
Saved ~50% core hours (~1100 -> ~600)grid_RRFS_NA_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta
Reduced forecast hours from 6 to 3, increased DT_ATMOS to 40, increased OMP_NUM_THREADS_MAKE_OROG and NNODES_MAKE_ICS.SUBCONUS_Ind_3km
-domain tests were re-arranged to ensure all input IC/LBC combinations are tested on a 3-km domain.Testing script updates:
Type of change
TESTS CONDUCTED:
Ran comprehensive and fundamental tests on Hera and Jet, as described above. Ran fundamental tests on Orion (intel).
DEPENDENCIES:
None
DOCUMENTATION:
WE2E test documentation will need updating, but will handle this in next round of changes.
ISSUE:
Partially addresses #587
CHECKLIST