Bugfixes on previous PR that was merged prior to testing #45

Merged 14 commits into release/wafs.v7 on Sep 4, 2024

Conversation

aerorahul
Contributor

@aerorahul aerorahul commented Aug 30, 2024

#44 was merged without a review while testing was still in progress.

During the testing of #44, a few bugs were uncovered. They are fixed in this PR.
Additionally, this PR:

  • allows the user to set up an ecflow case for any arbitrary date and experiment ID. Running dev/ecf/setup_ecf.sh YYYYMMDDHH expid produces ecf/def/wafsexpid.def, so any number of suites can be run from a single clone of the WAFS package.
  • removes grib1 processing of forecast hours 00, 06, 54, and 66, because the exwafs_grib.sh script has no condition to handle these forecast hours.
  • clones upp.fd as wafs_upp.fd per EE2 requirement.
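
As a rough illustration of the first bullet, the sketch below shows how a date and experiment ID could map to a per-experiment suite definition file. It is not the actual dev/ecf/setup_ecf.sh: the function name ecf_def_path is hypothetical, and the assumption that expid is substituted directly into the file name (wafs<expid>.def) is inferred from the description, not confirmed.

```shell
# Hypothetical helper (not part of the WAFS package): validate the arguments
# and print the suite definition path for one experiment.
# Assumption: the definition file is named wafs<expid>.def under ecf/def.
ecf_def_path() {
    pdy_cyc="$1"
    expid="$2"
    # Require exactly ten digits (YYYYMMDDHH).
    case "${pdy_cyc}" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "ERROR: date must be YYYYMMDDHH, got '${pdy_cyc}'" >&2
           return 1 ;;
    esac
    if [ -z "${expid}" ]; then
        echo "ERROR: experiment ID must not be empty" >&2
        return 1
    fi
    echo "ecf/def/wafs${expid}.def"
}

# Two experiments from the same clone map to two separate definition files.
ecf_def_path 2024083000 x001
ecf_def_path 2024083006 x002
```

Because each experiment ID yields its own definition file, several suites can coexist in one clone without overwriting each other.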

Throughout testing, the wafs_grib2_1p25 jobs for forecast hours 06-36 consistently fail during the execution of tocgrib2. This behavior is reproducible. Please see the attached screenshot. @YaliMao-NOAA, can you please assist?
[Image: Screenshot 2024-08-30 at 6 10 12 PM]

A snippet from a failed log looks like:

12 + /apps/ops/prod/libs/intel/19.1.3.304/grib_util/1.2.3/bin/tocgrib2
12 + export err=30
12 + err=30
12 + err_chk

-------------------------------------------------------------
-- FATAL ERROR: Job wafs_grib2_1p25_f006_00.150736122.cbqs01 failed, ERROR IN /apps/ops/prod/libs/intel/19.1.3.304/grib_util/1.2.3/bin/tocgrib2 RETURN CODE 30
-- ABNORMAL EXIT at Fri Aug 30 21:57:44 UTC 2024 on nid001589
-------------------------------------------------------------
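
For readers unfamiliar with the pattern in the trace above (run a program, export its return code as err, then call err_chk), here is a minimal sketch. The production err_chk utility is not reproduced here; my_err_chk and run_and_check are hypothetical stand-ins, and unlike a real job they return the code to the caller instead of aborting.

```shell
# Hypothetical stand-in for the production err_chk utility.
# It only inspects ${err} and reports a failure; it does not exit the shell.
my_err_chk() {
    if [ "${err:-0}" -ne 0 ]; then
        echo "FATAL ERROR: ${pgm:-unknown} RETURN CODE ${err}" >&2
        return "${err}"
    fi
    return 0
}

# Run a program, record its return code in err, then check it,
# mirroring the "export err=...; err_chk" sequence in the log.
run_and_check() {
    pgm="$1"
    shift
    "${pgm}" "$@"
    export err=$?
    my_err_chk
}

run_and_check true && echo "true: ok"
run_and_check sh -c 'exit 30' || echo "sh: failed with code $?"
```

In the failed job, tocgrib2 plays the role of the second command: it returned 30, so the check reported a fatal error.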

The DATA directory for the above failed job is at:
/lfs/h2/emc/stmp/rahul.mahajan/wafsx001/tmp/wafs_grib2_1p25_f006_00.150736122.cbqs01

The log file is at:
/lfs/h2/emc/stmp/rahul.mahajan/wafsx001/output/wafs_grib2_1p25_f006_00.o150736122

The relevant paths for review are available at:
HOMEwafs: /lfs/h2/emc/eib/noscrub/rahul.mahajan/eibWork/EE2/wafs/wafs.v7.0.0
COM: /lfs/h2/emc/ptmp/rahul.mahajan/wafsx001/com
OUTPUT: /lfs/h2/emc/stmp/rahul.mahajan/wafsx001/output
DATA: /lfs/h2/emc/stmp/rahul.mahajan/wafsx001/tmp

@aerorahul aerorahul requested a review from YaliMao-NOAA August 30, 2024 22:09
@YaliMao-NOAA
Collaborator

I found the root cause, which comes from UPP: UPP failed to generate the WAFS products. I will point out inline where the issue is.

@YaliMao-NOAA
Collaborator

@aerorahul The ecflow server failed to start. Error message:
ERROR: Cannot locate directory /lfs/h2/emc/ptmp/yali.mao/ecflow_test/ecflow/com

A simple fix:
instead of this line: mkdir -p ${ECF_ROOT}
change it to: mkdir -p ${ECF_COMDIR}
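
A small sketch of why the suggested change is sufficient, assuming ECF_COMDIR sits under ECF_ROOT (as the path in the error message suggests, though that relationship is an assumption here). The temporary directory is for illustration only.

```shell
# Illustrative layout only. ECF_ROOT and ECF_COMDIR are the names used in the
# comment above; placing ECF_COMDIR at ${ECF_ROOT}/com is an assumption.
ECF_ROOT="$(mktemp -d)/ecflow"
ECF_COMDIR="${ECF_ROOT}/com"

# mkdir -p creates any missing parent directories, so this single call
# creates ECF_ROOT as well as ECF_COMDIR.
mkdir -p "${ECF_COMDIR}"

[ -d "${ECF_ROOT}" ] && echo "ECF_ROOT exists"
[ -d "${ECF_COMDIR}" ] && echo "ECF_COMDIR exists"
```

Creating only ECF_ROOT leaves the com subdirectory missing, which matches the reported error.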

@aerorahul
Contributor Author

@YaliMao-NOAA
I ran a second test with the updates and all jobs succeeded.
[Image: Screenshot 2024-09-03 at 1 59 16 PM]

@YaliMao-NOAA
Collaborator

@aerorahul
Should the HOMEwafs line be removed from all job scripts?
export HOMEwafs=${HOMEwafs:-${NWROOT}/wafs.${wafs_ver}}

@aerorahul
Contributor Author

> @aerorahul Should HOMEwafs line be removed from all job scripts? export HOMEwafs=${HOMEwafs:-${NWROOT}/wafs.${wafs_ver}}

There is no HOMEwafs in any of the j-jobs (and there shouldn't be). It is only present in the dev/driver scripts.
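
For context, the quoted line uses the shell's default-value expansion: HOMEwafs keeps any value already set by the caller and otherwise falls back to ${NWROOT}/wafs.${wafs_ver}. The sketch below demonstrates that behavior; the function name resolve_homewafs and the placeholder paths are hypothetical.

```shell
# Resolve HOMEwafs the way the quoted line does: keep a value that is already
# set and non-empty, otherwise fall back to ${NWROOT}/wafs.${wafs_ver}.
resolve_homewafs() {
    echo "${HOMEwafs:-${NWROOT}/wafs.${wafs_ver}}"
}

# Placeholder values, for illustration only.
NWROOT="/path/to/nwroot"
wafs_ver="v7.0.0"

unset HOMEwafs
resolve_homewafs          # falls back to the default under NWROOT

HOMEwafs="/path/to/my/clone"
resolve_homewafs          # keeps the caller's value
```

This is why the line is useful in a development driver, where a developer points HOMEwafs at a personal clone.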

@YaliMao-NOAA YaliMao-NOAA merged commit b6a254f into release/wafs.v7 Sep 4, 2024
@aerorahul aerorahul deleted the feature/correct-grib-fhrs branch September 5, 2024 01:57
aerorahul added a commit that referenced this pull request Oct 8, 2024
* remove rdhpcs options (#42)

* remove hera/orion modulefiles. rename drivers without wcoss2 and remove detect_machine.sh ush scripts

* cleanup versions and make fix files not exec

* Update README.md

* EE2 review updates (#44)

* update wafs_upp to EE2

* update upp job per EE2 standards

* fix scripting errors

* ignore the dirty upp.fd directory as it creates build artifacts that are not captured in its .gitignore

* itag is not a namelist in this version of UPP. Go figure!

* remove copying of analysis master file, and move setting of some variables to exscript

* EE2 mods for grib2 1p25 and 0p25 (no blending)

* update blending scripts for EE2

* fix grib1 jobs

* apply EE2 fixes to gcip

* some more updates on gcip

* Bugfixes on previous PR that was merged prior to testing (#45)

* remove unnecessary hours for grib, the offline UPP executable should match EE2 convention, setting up ecflow for development use with multiple expids

* revert changes .gitmodules

* move upp.fd to wafs_upp.fd per EE2

* ensure git submodule update is performed in the right directory

* fix a couple of COMIN bugs

* update experiment paths

* update JWAFS_GFS_MANAGER so it is similar to all other jjobs

* bugfixes discovered while testing

* Copy the folder of upp parm to WAFS/parm after copying gtg.config.gfs from GTG repository to upp parm

* UPP didn't generate WAFS master file correctly. To fix it,
add a line (even blank) between 'flxfile' and '&nampgb' to UPP itag.

* Made the non-ecflow version work again and added HOMEwafs flexibility

* only copy relevant UPP parm files to WAFS vertical structure

* update doco

* add draft of release_notes

---------

Co-authored-by: yali mao <yali.mao@clogin03.cactus.wcoss2.ncep.noaa.gov>

* Update script document blocks, bug fixes of previous PRs (#48)

* Change all commands using "``" to "$()"

* GCIP doesn't need SENDDBN.

* Don't need wmo folder since WMO headers are not added to GRIB2_0P25 products.

* Change back to {EXECwafs}/${pgm} from {DATA}/${pgm}

* For UPP, move environment variables from scripts to jobs

* Add SENDDBN_NTC to jobs and correct dbn_alert for SENDDBN_NTC and SENDDBN

* Update document blocks of the scripts

* For WAFS GRIB1 scripts, move definition of jobsuffix from ush/mkwfsgbl.sh to
script/exwafs_grib.sh since fhr doesn't have the same value.

* Add descriptions of JWAFS_GFS_MANAGER

---------

Co-authored-by: yali mao <yali.mao@dlogin08.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin07.dogwood.wcoss2.ncep.noaa.gov>

* Extend waiting time window of UK data to 25 minutes (#49)

Co-authored-by: yali mao <yali.mao@dlogin07.dogwood.wcoss2.ncep.noaa.gov>

* Adjust forecast hours up to 36 for the additional levels per AWC request (#50)

* AWC needs extra levels up to F036

* Update branch of UPP in .gitmodules

---------

Co-authored-by: yali mao <yali.mao@dlogin09.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin06.dogwood.wcoss2.ncep.noaa.gov>

* Update UPP tag to upp_wafs_v7.0.0 (#52)

* update UPP code revision to upp_wafs_v7.0.0

* Update UPP tag in .gitmodules

---------

Co-authored-by: yali mao <yali.mao@dlogin06.dogwood.wcoss2.ncep.noaa.gov>

* AWC request adjusted, blending wall time extended (#53)

* Extend the wall time of the job card for the additional 5 minutes of waiting for UK data

* Modified scripts for the additional levels on the second request from AWC

---------

Co-authored-by: yali mao <yali.mao@dlogin06.dogwood.wcoss2.ncep.noaa.gov>

* NRT with ecflow (#54)

* add possibility of running in NRT

* depend on all GFS data, not just log files

* fix extensions to atm and sfc files

* GCIP jobs in addition to JWAFS_GFS_MANAGER, have a time trigger in NRT

* gcip time trigger can be anytime after the time specified

* remove GFS forecast job triggers for NRT and rely on time

* remove GFS job triggers in experimental runs.

* add time triggers for blending jobs based on PR review comments

* First version of Release Note for WAFS.v7.0.0 (#55)

* First version of Release Note for WAFS.v7.0.0

* Update docs/Release_Notes.md

Co-authored-by: Kate Friedman <kate.friedman@noaa.gov>

* Update docs/Release_Notes.md

Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>

* Update docs/Release_Notes.md

Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>

* Modified Release Notes based on feedback from Rahul and Huiya

* Adjust a table in Release Notes

* Update Release Notes according to the WAFS separation kickout slides

---------

Co-authored-by: yali mao <yali.mao@dlogin01.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: Kate Friedman <kate.friedman@noaa.gov>
Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
Co-authored-by: yali mao <yali.mao@dlogin07.dogwood.wcoss2.ncep.noaa.gov>

* Add ecflow manual text to .ecf files (#58)

Co-authored-by: yali mao <yali.mao@dlogin08.dogwood.wcoss2.ncep.noaa.gov>

* Remove processing for fhrs = 1,2,3,4,5 for UPP in WAFS (#59)

* remove processing of forecast hours 1-5 for UPP in WAFS

* unindent the task

* update exwafs_gfs_manager.sh for hrs 1-5 in upp

* Update release note and .ecf manuals (#60)

* Add more details to .ecf manuals of upp and grib2_0p25

* Update Release Notes of stopping WAFS master files when FFF is between [001-005]

* Update UPP com size after removing WAFS master files for forecast hours between [001-005]

---------

Co-authored-by: yali mao <yali.mao@clogin01.cactus.wcoss2.ncep.noaa.gov>

* Update blending script to send email when UK data is missing (#61)

* Update blending script to send email when UK data is missing
1. usonly.emailbody is differentiated for each forecast hour with missing UK data
2. Remove the condition of sending UK unblended data if US unblended data is missing. It won't happen because the job itself won't get triggered if US unblended data is missing

* Added an ecflow client test script

* Update dev/ecf/README.md

* Update ecf README.md

---------

Co-authored-by: yali mao <yali.mao@clogin03.cactus.wcoss2.ncep.noaa.gov>

* make the NRT suite repeat daily (#62)

* To fix bugzilla 1370 and 1371 for WAFS blending job, (#68)

1. change variable name 'maillist' to 'MAILTO'
2. assign the value in job cards instead of in scripts

Co-authored-by: yali mao <yali.mao@clogin05.cactus.wcoss2.ncep.noaa.gov>

* Change blending job to MPMD to fix bugzilla 1593. Fix bugzilla 1226 (#69)

* Change blending job to MPMD to fix bugzilla 1593, meanwhile fix bugzilla 1226

The MPMD change for bugzilla 1593 is for NCO who wants to receive one single email
combining all forecast hours with missing UK data

For bugzilla 1226, AWC is fine with dbn_alert of US unblended data earlier in JWAFS_GRIB2_0P25 job

Bugzilla 1593 - Improve email notification for missing UK WAFS data
Bugzilla 1226 - Eliminate the duplicated dbn_alert for unblended gfs wafs data

* Update release note and modify the driver

* 1. If US unblended data is missing, don't quit silently, instead send out email and dbn_alert.
2. Add not-blended email and dbn_alert if both UK and US unblended files are missing
3. Change fhours from a string to an array
4. Bug fix and code improvement

* Update blending scripts

* Bug fix

* Update the way of handling err and remove np variable for MPIRUN

* Update scripts/exwafs_grib2_0p25_blending.sh

Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>

---------

Co-authored-by: yali mao <yali.mao@clogin09.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin03.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin07.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin05.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>

* Update ecflow after switching blending to MPMD parallel run. (#72)

* Update ecflow after switching blending to MPMD parallel run.
1. Don't need to setup ecflow links for blending
2. In ecflow definitions, change event triggers of each forecast hour to f048 of the upstream completion
Change COMROOT from 'com' to '%ENVIR%/com'

* Remove ecf/scripts/grib2/0p25/blending/jwafs_grib2_0p25_blending_f*.ecf from .gitignore

---------

Co-authored-by: yali mao <yali.mao@clogin05.cactus.wcoss2.ncep.noaa.gov>

---------

Co-authored-by: Rahul Mahajan <aerorahul@users.noreply.github.com>
Co-authored-by: yali mao <yali.mao@clogin03.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin08.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin07.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin09.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin06.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@dlogin01.dogwood.wcoss2.ncep.noaa.gov>
Co-authored-by: Kate Friedman <kate.friedman@noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin01.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin05.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin09.cactus.wcoss2.ncep.noaa.gov>
Co-authored-by: yali mao <yali.mao@clogin07.cactus.wcoss2.ncep.noaa.gov>
2 participants