MOM6 IAU and atmos stochy restart test #668

Merged (19 commits) on Jul 22, 2021

Collaborator

@pjpegion pjpegion commented Jun 25, 2021

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • If new or updated input data is required by this PR, it is clearly stated in the text of the PR.

The newly added test will need a new input file

@[INPUTDATA_ROOT]/MOM6_IC/@[OCNRES]/2011100100/mom6_increment.nc, this file currently resides on hera in /scratch2/BMC/gsienkf/Philip.Pegion/MOM6-IAU/mom6_increment.nc ./INPUT/

Instructions: All subsequent sections of text should be filled in as appropriate.

The code changes are contained within MOM6. One additional file was added to the MOM6 repo, so mom6_files.cmake was updated.

Description

This PR brings in the ability for the ocean model to read in either analysis increments or a full analysis field and apply the increments over a specified time interval. A new regression test was added for this. In addition, a regression test was added to test the atmospheric stochastic physics restart capability.
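For readers unfamiliar with incremental analysis updates, here is a minimal sketch of the idea in Python. It is not the MOM6 implementation: the function names are hypothetical, the state is a plain list of numbers, and it assumes the increment is spread with equal weight over every model step in the window. When a full analysis field is supplied instead of an increment, the increment is taken as analysis minus background.

```python
def increment_from_analysis(analysis, background):
    """Derive an increment when a full analysis field is supplied."""
    return [a - b for a, b in zip(analysis, background)]


def apply_iau(state, increment, window_hours, dt_hours):
    """Add an analysis increment gradually over the update window.

    Assumption: constant weights, i.e. each step receives the same
    fraction of the increment. The actual model may weight differently.
    """
    n_steps = int(round(window_hours / dt_hours))
    if n_steps <= 0:
        raise ValueError("update window must cover at least one step")
    # Fraction of the increment applied at every step.
    per_step = [inc / n_steps for inc in increment]
    for _ in range(n_steps):
        # A real model would also advance its dynamics here.
        state = [s + d for s, d in zip(state, per_step)]
    return state


if __name__ == "__main__":
    background = [10.0, 20.0]
    increment = increment_from_analysis([10.6, 18.8], background)
    print(apply_iau(background, increment, window_hours=6, dt_hours=1))
```

Spreading the increment over the window, rather than adding it all at once, is what distinguishes this approach from a direct replacement of the model state.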

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

Testing

How were these changes tested? What compilers / HPCs was it tested with? Are the changes covered by regression tests? (If not, why? Do new tests need to be added?) Have regression tests and unit tests (utests) been run? On which platforms and with which compilers? (Note that unit tests can only be run on tier-1 platforms)

Dependencies

If testing this branch requires non-default branches in other repositories, list them. Those branches should have matching names (ideally).

Do PRs in upstream repositories need to be merged first?
If so add the "waiting for other repos" label and list the upstream PRs

@DeniseWorthen added the "New Input Data Req'd" label (This PR requires new data to be sync across platforms) on Jun 28, 2021
@DeniseWorthen
Collaborator

@pjpegion I don't think a new MOM-IAU_input_template_100 is required. You could implement the additional required parameters by having a new variable for ODA_INCUPD (set false by default). We did something similar when we removed the extra MOM template that included waves. You can look at how the variable MOM6_USE_WAVES works.

When you want to include your feature, your test would set the appropriate variable to true. I'd also suggest a variable for the ODA_INCUPD_NHOURS instead of a hard-wired "6". Since the value you're using is different from the default, I'm assuming it might be something which changes.
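To illustrate the suggestion, here is a rough Python sketch of a single template with defaults that individual tests can override. It is not the regression-test code: the `render` helper is hypothetical, the substitution variable names simply reuse the MOM6 parameter names, and the default shown for `ODA_INCUPD_NHOURS` is a placeholder. Only the `@[NAME]` placeholder style is taken from this PR.

```python
import re

# Hypothetical defaults: the feature stays off unless a test opts in.
DEFAULTS = {
    "ODA_INCUPD": "False",
    "ODA_INCUPD_NHOURS": "3.0",  # placeholder value, an assumption
}

# One shared template instead of a second, IAU-specific copy.
TEMPLATE = (
    "ODA_INCUPD = @[ODA_INCUPD]\n"
    "ODA_INCUPD_NHOURS = @[ODA_INCUPD_NHOURS]\n"
)


def render(template, overrides=None):
    """Fill @[NAME] placeholders from the defaults plus per-test overrides."""
    values = {**DEFAULTS, **(overrides or {})}

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for template variable {name}")
        return str(values[name])

    return re.sub(r"@\[(\w+)\]", substitute, template)


if __name__ == "__main__":
    # An ordinary test renders the template with the defaults.
    print(render(TEMPLATE))
    # The IAU test switches the feature on and sets its own window.
    print(render(TEMPLATE, {"ODA_INCUPD": "True", "ODA_INCUPD_NHOURS": "6"}))
```

With this layout only the tests that need the feature mention it, and the window length is no longer hard-wired in the template.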

@pjpegion
Collaborator Author

@DeniseWorthen I will make those changes. Thanks

@pjpegion
Collaborator Author

I have removed the MOM-IAU_input_template_100 and added variables to MOM_input_template_100.

@DusanJovic-NOAA added the "Baseline Updates" label (Current baselines will be updated.) on Jul 20, 2021
@DeniseWorthen
Collaborator

I am going to remove the label for input data needed; the required file was added in the previous PR.

@DeniseWorthen removed the "New Input Data Req'd" label (This PR requires new data to be sync across platforms) on Jul 20, 2021
@pjpegion
Collaborator Author

There is a bugfix for stochastic physics that I am going to pull in with this PR.

pjpegion and others added 2 commits July 21, 2021 10:33
change baseline date to 20210721
@BrianCurtis-NOAA
Collaborator

Machine: jet
Compiler: intel
Job: BL
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/677884559/20210721170015/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_285769
Baseline creation and move successful
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/677884559/20210721185533/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_280735
Test control_c384gdas 024 failed failed
Test control_c384gdas 024 failed in run_test failed
Please make changes and add the following label back:
jet-intel-BL

@DusanJovic-NOAA
Collaborator

Machine: jet
Compiler: intel
Job: BL
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/677884559/20210721170015/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_285769
Baseline creation and move successful
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/677884559/20210721185533/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_280735
Test control_c384gdas 024 failed failed
Test control_c384gdas 024 failed in run_test failed
Please make changes and add the following label back:
jet-intel-BL

$ cat /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_280735/control_c384gdas/err
...
  0:  in fcst,init total time:    17.4488160610199     
181: [181:x331] unexpected DAPL connection event 0x4008 from 358
181: Fatal error in PMPI_Wait: Internal MPI error!, error stack:
181: PMPI_Wait(219)...........: MPI_Wait(request=0x1d694d20, status=0x1) failed
181: MPIR_Wait_impl(81).......: fail failed
181: PMPIDI_CH3I_Progress(850): fail failed
181: (unknown)(): Internal MPI error!
  0: slurmstepd: error: *** STEP 55251043.0 ON x9 CANCELLED AT 2021-07-21T20:02:04 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: x330: task 150: Killed
...

@BrianCurtis-NOAA
Collaborator

Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/677884559/20210721214508/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_144572
Test control_2threads 018 failed failed
Test control_2threads 018 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

@DusanJovic-NOAA
Collaborator

Machine: jet
Compiler: intel
Job: RT
Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/677884559/20210721214508/ufs-weather-model
Please manually delete: /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_144572
Test control_2threads 018 failed failed
Test control_2threads 018 failed in run_test failed
Please make changes and add the following label back:
jet-intel-RT

$ tail /lfs4/HFIP/h-nems/emc.nemspara/RT_RUNDIRS/emc.nemspara/FV3_RT/rt_144572/control_2threads/err
72:  Size of pset is:          104
72:  datset is: GFSFLX
72:  MXFLD is:          999
72:  size of lvlsxml:       150000
72:  size of post_avblflds param         104
72:  in readxml. nfld=         104 nrecout=         113
72:  post_fname=GFSFLX.GrbF24
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
 0: slurmstepd: error: *** STEP 55257334.0 ON x371 CANCELLED AT 2021-07-21T22:44:58 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 55257334 ON x371 CANCELLED AT 2021-07-21T22:44:58 DUE TO TIME LIMIT ***

export SKEB=0.3
export SHUM=0.003
export SPPT=0.2


export FV3_RUN=control_run.IN
export FV3_RUN=control_stochy_run.IN
Collaborator

It seems to me the only difference between control_run.IN and control_stochy_run.IN is that the stochy test uses 6-hour restart files while control uses 12-hour ones. Maybe in the future we can add a restart file time to the script and unify the two, so that we don't need to maintain two scripts.

Collaborator Author

Yes, that can be added so we don't have 2 scripts.

@pjpegion
Collaborator Author

Submodules and .gitmodules now point to the authoritative repos, and this PR is ready to merge.

@DusanJovic-NOAA DusanJovic-NOAA merged commit 40b6975 into ufs-community:develop Jul 22, 2021
@pjpegion pjpegion deleted the mom6-iau branch September 30, 2021 17:49
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023

## DOCUMENTATION:
This PR removes the `FV3_CPT_v0`, `FV3_GSD_v0`, and `FV3_GSD_SAR` suites from the workflow.  This consists of:
1. Removing these suites from ex-scripts, templates, and the set of valid values for the variable `CCPP_PHYS_SUITE`,
2. Removing the `diag_table_...` and `field_table_...` files for these suites.
3. Removing WE2E tests in the `grids_extrn_mdls_suites_community` category (which are tests to make sure that specific combinations of grids, external models, and suites work well together) that use these suites.
4. Modifying the three WE2E tests in the `wflow_features` category (`get_from_HPSS_ics_HRRR_lbcs_RAP`, `get_from_HPSS_ics_RAP_lbcs_RAP`, and `specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE`) that happen to use the `FV3_GSD_SAR` suite such that they now use the `FV3_HRRR` suite. (There are no such tests that use the `FV3_CPT_v0` and `FV3_GSD_v0` suites.)  Note that we don't remove these tests because their purpose is not to test the suite but to test fetching of files from HPSS (`get_from_HPSS_ics_HRRR_lbcs_RAP` and `get_from_HPSS_ics_RAP_lbcs_RAP`) and to test that the experiment variables `DT_ATMOS`, `LAYOUT_X`, `LAYOUT_Y`, and `BLOCKSIZE` can be correctly specified in the user's experiment configuration file (`specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE`).
5. Updating comments in scripts that may refer to one of these three suites.

This PR also makes improvements to the `tests/get_expts_status.sh` script that is used to check the status of a set of experiments in a specified directory.

## DEPENDENCIES:
PR #[224](ufs-community/ufs-srweather-app#224) in the `ufs-srweather-app` repo.

## TESTS CONDUCTED:
Ran the following tests on Hera:
```
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1alpha
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
nco_grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
get_from_HPSS_ics_HRRR_lbcs_RAP
get_from_HPSS_ics_RAP_lbcs_RAP
specify_DT_ATMOS_LAYOUT_XY_BLOCKSIZE
```
All succeeded.  Also, since the modifications to the `FV3.input.yml` file affect the `FV3_RRFS_v1alpha`, `FV3_RRFS_v1beta`, and `FV3_HRRR` suites, the `input.nml` files for these suites generated using the (original) `develop` branch were compared to the ones generated using this branch/PR, and all were found to be identical.

## ISSUE (optional): 
Resolves Issue #668.
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
## DESCRIPTION OF CHANGES: 
Several paths in the machine-specific files point to locations in user paths or old locations of static data. This PR updates paths of static data in regional_workflow/ush/machine/ to point to the official, centralized locations on Cheyenne, Hera, and Jet.

## TESTS CONDUCTED: 
Ran the following suite of end-to-end tests on Cheyenne and Jet prior to the latest ufs-weather-model hash update. All passed. This list of tests was chosen because all of these tests are known to succeed on all tested platforms, and this tests a variety of input and boundary condition types.

- grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
- grid_RRFS_CONUS_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
- grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2
- grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_HRRR_suite_RRFS_v1beta
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_HRRR
- grid_RRFS_CONUS_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
- grid_RRFS_CONUS_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta


On Hera, I ran tests with the latest SRW hash, which included the updated weather model. Because of this, many tests could not be generated due to using old, removed CCPP suites (see issue #668). To get around this issue, I tested with the fixes from #697 incorporated into my branch. With those extra commits, all "get_extrn_ics" and "get_extrn_lbcs" tasks completed successfully, which indicates that all data is in its correct place.

## ISSUE (optional): 
Will resolve a few issues in #673; many remain, however.
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
…rom PointStat and EnsembleStat tasks) (#668)

Split calls to METplus's Pb2nc tool out of the tasks that run:

 1. deterministic verification (vx) using the PointStat tool,
 2. ensemble vx using the EnsembleStat tool, and
 3. verification on the ensemble mean and ensemble probabilities using the PointStat tool

into separate tasks. This new task is run for each cycle to convert NDAS prep-bufr files to NetCDF format. Also, include additional cleanup of vx tasks.