fix nstf setting for coupled model, update atm aerosol test to use P8 configuration #1218

Merged

Conversation

DeniseWorthen
Collaborator

@DeniseWorthen DeniseWorthen commented May 13, 2022

PR Checklist

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR. Please consult the ufs-weather-model wiki if you are unsure how to do this.

  • This PR has been tested using a branch which is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR

  • An Issue describing the work contained in this PR has been created either in the subcomponent(s) or in the ufs-weather-model. The Issue should be created in the repository that is most relevant to the changes contained in the PR. The Issue and the dependent sub-component PR
    are specified below.

  • Results for one or more of the regression tests change and the reasons for the changes are understood and explained below.

  • New or updated input data is required by this PR. If checked, please work with the code managers to update input data sets on all platforms.

Instructions: All subsequent sections of text should be filled in as appropriate.

The information provided below allows the code managers to understand the changes relevant to this PR, whether those changes are in the ufs-weather-model repository or in a subcomponent repository. Ufs-weather-model code managers will use the information provided to add any applicable labels, assign reviewers and place it in the Commit Queue. Once the PR is in the Commit Queue, it is the PR owner's responsibility to keep the PR up-to-date with the develop branch of ufs-weather-model.

Description

Provide a detailed description of what this PR does. What bug does it fix, or what feature does it add? Is a change of answers expected from this PR? Are any library updates included in this PR (modulefiles etc.)?

Issue(s) addressed

Link the issues to be closed with this PR, whether in this repository, or in another repository.
(Remember, issues must always be created before starting work on a PR branch!)

Testing

This branch was tested against the existing develop-20220517 at 062a0fd for both hera.intel and hera.gnu. The following tests failed, as expected:

  • all cpld tests (bug fix for DOGP_LWSCAT and change to NSTF_NAME)
  • atmaero_control_p8 (new test; no baseline)

For the new atmaero_control_p8 test, the input.nml and aerosol rc files were compared against those of the existing cpld_control_p8 test. The files were identical except for the following:

<   ccpp_suite = 'FV3_GFS_v17_p8'
---
>   ccpp_suite = 'FV3_GFS_v17_coupled_p8'
26c26
<   domains_stack_size = 3000000
---
>   domains_stack_size = 8000000
150,151c150,151
<   min_seaice   = 0.15
<   use_cice_alb = .false.
---
>   min_seaice   = 1.0e-6
>   use_cice_alb = .true.
227,230c227,230
<   cplflx       = .false.
<   cplice       = .false.
<   cplwav       = .false.
<   cplwav2atm   = .false.
---
>   cplflx       = .true.
>   cplice       = .true.
>   cplwav       = .true.
>   cplwav2atm   = .true.
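
A comparison like the one above can also be made key by key, which is less sensitive to whitespace and line order than a plain diff. The following is only a minimal sketch: the function names are hypothetical, the two sample strings are abridged excerpts built from the settings listed above (taking the `<` side to be atmaero_control_p8), and the parser ignores namelist group names, so a key repeated in two groups would collide.

```python
def parse_settings(text):
    """Collect 'key = value' pairs from namelist-style text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, group headers (&name) and group terminators (/).
        if not line or line.startswith(("&", "/")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().rstrip(",")
    return settings


def differing_settings(text_a, text_b):
    """Map each key whose value differs to a (value_a, value_b) pair."""
    a, b = parse_settings(text_a), parse_settings(text_b)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Abridged excerpts from the diff above, not the full input.nml files.
ATMAERO_NML = """
  ccpp_suite = 'FV3_GFS_v17_p8'
  domains_stack_size = 3000000
  min_seaice   = 0.15
  use_cice_alb = .false.
  cplflx       = .false.
"""
CPLD_NML = """
  ccpp_suite = 'FV3_GFS_v17_coupled_p8'
  domains_stack_size = 8000000
  min_seaice   = 1.0e-6
  use_cice_alb = .true.
  cplflx       = .true.
"""

if __name__ == "__main__":
    for key, (left, right) in differing_settings(ATMAERO_NML, CPLD_NML).items():
        print(f"{key}: {left} -> {right}")
```

Values are compared as strings, so `1.0e-6` and `0.000001` would be reported as different; that is acceptable for spotting configuration drift but not for numeric equivalence.
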
  • hera.intel
  • hera.gnu
  • orion.intel
  • cheyenne.intel
  • cheyenne.gnu
  • gaea.intel
  • jet.intel
  • wcoss_cray
  • wcoss_dell_p3
  • wcoss2
  • opnReqTest for newly added/changed feature. Tested at cc508ee: the atmaero_control_p8 test passed the bit_base, dbg_base, dcp, fhz, mpi, rst, std_base and thr oRT tests.
  • CI

Dependencies

None. Script level changes only.

DeniseWorthen and others added 30 commits March 27, 2021 12:30
This reverts commit 7b826d4.
@DeniseWorthen
Collaborator Author

Original labels on Cheyenne don't seem to have started, so re-trying.

@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: jet
Compiler: intel
Job: RT
[RT] Repo location: /lfs4/HFIP/h-nems/emc.nemspara/autort/pr/936244575/20220531223017/ufs-weather-model
[RT] Error: Test rap_sfcdiff_debug 083 failed in run_test failed
[RT] Error: Test hafs_regional_docn_oisst 093 failed in run_test failed
Please make changes and add the following label back: jet-intel-RT
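
The failed tests can be pulled out of a notification like this one with a short script. This is a sketch only: the regular expression is inferred from the two `[RT] Error:` lines shown here, not from the script that generates the notification, and the function name is hypothetical.

```python
import re

# Pattern inferred from the notification lines above (an assumption).
FAILURE_RE = re.compile(
    r"^\[(?P<job>\w+)\] Error: Test (?P<name>\S+) (?P<number>\d+) failed"
)


def failed_tests(notification):
    """Return (job, test name, test number) for each failure line."""
    found = []
    for line in notification.splitlines():
        match = FAILURE_RE.match(line.strip())
        if match:
            found.append((match["job"], match["name"], match["number"]))
    return found


SAMPLE = """\
Machine: jet
Compiler: intel
Job: RT
[RT] Error: Test rap_sfcdiff_debug 083 failed in run_test failed
[RT] Error: Test hafs_regional_docn_oisst 093 failed in run_test failed
"""

if __name__ == "__main__":
    for job, name, number in failed_tests(SAMPLE):
        print(job, number, name)
```

The test number is kept as a string so that leading zeros such as `083` survive.
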

@DeniseWorthen
Collaborator Author

On jet.intel, the hafs failure was a timeout. The rap_sfcdiff_debug test failed with

144:  file: module_write_netcdf.F90 line:          207 Permission denied
144: Abort(1) on node 144 (rank 144 in comm 496): application called MPI_Abort(comm=0x84000002, 1) - process 144

The job appeared to actually run but failed writing something at Finalize.

@DeniseWorthen
Collaborator Author

On Cheyenne.intel, a lot of baseline jobs have failed. They all seem to be associated with the same error in the run_*.log:

run_001_cpld_control_p8.log:chdir: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory

and then later

322 min. TEST 001 cpld_control_p8 is waiting in a queue,  status: Q jobid 4476288
run_test.sh terminated PID=34571
++ [[ pbs = \p\b\s ]]
++ echo 'run_util.sh: interrupt_job qsub_id = 4476288'
run_util.sh: interrupt_job qsub_id = 4476288
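
To see how many run logs hit the same error, the logs can be scanned for the message. A rough sketch, assuming the logs follow the `run_*.log` naming seen above and sit in a single directory; the function name and defaults are hypothetical.

```python
from pathlib import Path

# Message fragment taken from the log line quoted above.
GETCWD_ERROR = "getcwd: cannot access parent directories"


def logs_containing(log_dir, message=GETCWD_ERROR, pattern="run_*.log"):
    """Return sorted names of log files in log_dir that contain message."""
    hits = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # Replace undecodable bytes rather than failing on a damaged log.
        text = path.read_text(errors="replace")
        if message in text:
            hits.append(path.name)
    return hits


if __name__ == "__main__":
    for name in logs_containing("."):
        print(name)
```
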

@BrianCurtis-NOAA
Collaborator

Didn't Cheyenne tend to kill PBS jobs at midnight, that machine's local time? 5 hours and 22 minutes is a long time to run, too. I have never seen that Cheyenne error before, though.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Jun 1, 2022

I put in a ticket to the help desk on Cheyenne. The job queue seems stuck: only a few jobs are running, and almost all the rest (not just mine) are sitting in the queue.

(edit: I started the jobs manually on Cheyenne early this morning)

@climbfuji
Collaborator

climbfuji commented Jun 1, 2022 via email

on-behalf-of @ufs-community <brian.curtis@noaa.gov>
@BrianCurtis-NOAA
Collaborator

Automated RT Failure Notification
Machine: cheyenne
Compiler: intel
Job: BL
[BL] Repo location: /glade/scratch/epicufsrt/autort/tests/auto/pr/936244575/20220601094513/ufs-weather-model
[BL] Error: Test atmaero_control_p8 098 failed in run_test failed
Please make changes and add the following label back: cheyenne-intel-BL

@DeniseWorthen
Collaborator Author

I think I need to increase resources on cheyenne.intel for the new atmaero p8 test:

0:  ... DONE microphysical lookup tables
108:MPT ERROR: Could not register RMA window with the HCA. There may not be
108:    enough memory.
108:MPT ERROR: Rank 108(g:108) is aborting with error code 0.
108:    Process ID: 60233, Host: r3i6n6, Program: /glade/scratch/epicufsrt/FV3_RT/rt_30952/atmaero_control_p8/fv3.exe
108:    MPT Version: HPE MPT 2.25  08/14/21 03:05:20

* configuration requires more resources to run on cheyenne
on-behalf-of @ufs-community <brian.curtis@noaa.gov>
@DeniseWorthen
Collaborator Author

All RTs are finished.

@DeniseWorthen
Collaborator Author

@MinsukJi-NOAA Please review

@DeniseWorthen DeniseWorthen merged commit 66c7291 into ufs-community:develop Jun 2, 2022
@DeniseWorthen DeniseWorthen deleted the feature/fix_nstf_hafs_aerosols branch June 15, 2022 11:59
Labels
Baseline Updates: Current baselines will be updated.
Waiting for Reviews: The PR is waiting for reviews from associated component PR's.
6 participants