Add WCOSS-Dell capability to coupled-crow and migrate to hpc-stack #273

Conversation

WalterKolczynski-NOAA
Contributor

Restores the capability to run the coupled workflow on WCOSS-Dell. Also partially migrates the coupled workflow to hpc-stack.

The capability to run on WCOSS suffered code rot because one of the primary maintainers was unable to run there. That capability is now restored for WCOSS-Dell. It has been tested only on phase 3; it would likely run on phase 3.5, but resource allocation would be suboptimal due to the larger number of cores. Additional configuration optimized for phase 3.5 can be added at a later time.

This merge also migrates library imports to hpc-stack for programs that are located within global-workflow (i.e. not the model or post) and are currently built in the coupled configuration. Two programs fall into both categories: reg2grb2 and waveprepost. Also included in this migration is the runtime modulefile (module_base). Modulefiles are migrated on WCOSS-Dell, Hera, and Orion.

New commit hashes will be needed to update the components. @KateFriedman-NOAA is working on getting these squared away for the develop branch for component versions using the release version of hpc-stack (#164) (some are currently using a test version). The coupled model will need to be similarly updated (it also currently uses a test version), along with any other coupled-only components.

The hpc-stack changes are also relevant to the cmake migration for reg2grb2 (#268).

Refs: #164, #268
Closes: #175

The builds for the ww3 and reg2grb2 programs are updated to work on
WCOSS Dell. New modules are created for each. The two modules are both
using development modules; this needs to be updated to NCO modules in
the future. ww3 uses the newer versions of modules under NCEPLIBS-v1.2.0,
but the reg2grb2 was unable to use that location due to an issue with
wgrib2/2.0.8. reg2grb2 is also using older modules than ww3 on Hera, so
this mismatch already exists.

The build script for reg2grb2 is updated to use the environment
variables set by the wgrib2 module on WCOSS. It should remain backwards
compatible for Hera/Orion, but this needs to be tested.

Refs: NOAA-EMC#175
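The "use the module's environment variables, but stay backwards compatible" approach described in the reg2grb2 commit above could look roughly like the sketch below. This is not the actual build_reg2grb2.sh; the variable names `WGRIB2_INC` and `WGRIB2_LIB` and the fallback paths are hypothetical and only illustrate the pattern.

```shell
# Sketch only: WGRIB2_INC, WGRIB2_LIB and the fallback paths below are
# hypothetical names, not taken from the real build script.

# Print the first non-empty argument. Used to prefer a value exported by a
# loaded module over a hard-coded default.
first_nonempty() {
  local candidate
  for candidate in "$@"; do
    if [ -n "${candidate}" ]; then
      printf '%s\n' "${candidate}"
      return 0
    fi
  done
  return 1
}

# On a platform where the wgrib2 module exports these variables, the module's
# values win; elsewhere the script falls back to its previous defaults.
wgrib2_inc="$(first_nonempty "${WGRIB2_INC:-}" "/hypothetical/wgrib2/include")"
wgrib2_lib="$(first_nonempty "${WGRIB2_LIB:-}" "/hypothetical/wgrib2/lib/libwgrib2.a")"

echo "wgrib2 include dir: ${wgrib2_inc}"
echo "wgrib2 library:     ${wgrib2_lib}"
```

Keeping the fallback in one helper means the Hera/Orion behaviour stays unchanged unless a module happens to export the same variables there.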
…kflow into feature/port_coupled_wcoss_dell

* 'feature/coupled-crow' of github.com:NOAA-EMC/global-workflow:
  Feature/gfsv16intocoupled (NOAA-EMC#194)
  feature/coupled-crow fractional grid  (NOAA-EMC#218)
  updating module file so that atm pos will work (NOAA-EMC#214)
  feature/coupled-crow updating to use latest ups-weather (NOAA-EMC#210)
  feature/coupled-crow: Updates for consistency  (NOAA-EMC#199)
  modify modulefile.reg2grb2.orion, modulefile.reg2grb2.hera and build_reg2grb2.sh to use hpc-stack library
  update readme to current wave post status
  rest of post jobs that should have been committed
  at this point, the post jobs for the waves post run, but the gridded grib2 files are not correct
  modifications for wave jobs after updating to be closer to v16 wave scripts but work for feature/coupled-crow
  update ENV files for wave jobs
  update wave jobs
  updates from gfsv16b-waves
  add correct path to wave exe
  remove exit
  update ww3 build script
  add building of WW3 pre and prep jobs currently requires extra module files (can be removed when base uses hpc-stack)
  update for module files on orion and adding building of wave prep
  updating scrip files for waves to match gfsv16b
  updating ush files for waves to match gfsv16b (includes restructure)
  cleanup
  (1) set IC dir path as /work/noaa/marine/Partha.Bhattacharjee/IC_Dir in workflow/platforms/orion.yaml (2) add extra white space in ush/nems_configure.sh in line 56 to eliminate warning in run log (3) correction on restart_n template variable setting in ush/nems.configure.med_atm_ocn_ice.IN
  corrections on E-P calculation in regdiag.f
  final updates to make cmeps forecasts run
  add pointer to which mediator restart for cmeps
  updates for CMEPS mediator
  adding MED pets so that they only use the forecast not write grid pets
  updates for nems.configure for running with CMEPS
  updating ice_in see NOAA-EMC/CICE5#25
  adding nems.configure to the log file
  add note to use rocoto/1.3.2 on orion as this version has fix for the UNAVAILABLE issue
  reg2grb2 now builds on orion
  updates of an lmod->lmod/lmod update to build of fv3_coupled
  updates for wave post jobs - changing exe dir location
  updating build to work with CMEPS
  cleanup from duplications
  adding cycle definition to wave post job
  fixing syntax error in JWAVE_PREP job for variables that are not actively being used
  change for lmod -> lmod/lmod on orion updated README
  updates - everything but wave post seems to run now, part of wave post does updating model to latest tag
  add waves to jglobal forecast config
  updated wave-related scripts to match gfsv16 branch updated environment files env/ORION.env updated configuration system
  fix nems.configure for cold start
  at this point the ocean post is now running
  updates for forecast with waves
  updating forecast job for wave jobs
  updates to run wave prep for s2s case
  wave init now works on orion
  put ic type variable into config.fv3ic
  add ocean, ice and wave configurable variables
  add ocean, ice and wave configurable variables
  adding missing file from last commit
  wave init job now works on hera
  ocean post hours at boundary are processed twice, now fixed.
  point WW3-related executables through HOMEgfs/fv3_coupled.fd/exec
  updates for running wave jobs
  updates for wave defaults for s2s p4
  fix bugs in config.wave and coupled_ic.sh
  first try for adding wave ICs
  move OCNPETS, ICEPETS and WAVEPETS into dedicated model-specific sessions instead of fv3_gfs_settings
  bug fix in layout file
  bugfix in layout file
  changes to wave case
  cleaning up double cplwav
  adding updates to wave scripts from gfsv16b branch by @ajhenrique
  add ocn-ice build to readme
  Merged and ready for test
  reg2grb2 Build on Hera and Orion
  orion hera build test
  reg2grb2 for test on feature coupled-crow
  wave post-process job update
  Revise CCPP_SUITE variable name, default option, to "IPD"
  Fix suite name of CCPP Move cplwav and cplwav2atm variable
  1, Moved some of the variables into configuration system from JWAVE scripts 2, Wave configuration system built out 3, Modularized scripts updated to accommodate wave variables
  fix module-setup.sh.inc script, which causes module purge unsuccessful
  wave configuration update
  wave configuration files
  both forecast and atm post seem to be running okay on orion
  updating module_base.orion that allows cold start to run. Not sure if other jobs will now fail though or if what is in your .bashrc file will affect the success/failure of the run.
  fix typo
  updates for tar files and do not keep data
  adding ncks, updating ocean post, some fixes for tar files
  mediator cold start succeeds on hera and forecast job starts post jobs for atmosphere are no longer triggered on cold starts
  updates to readme for orion
  updating module files on hera
  we don't want to cycle/have 4 runs per day so setting the cases to be just 1 IC
  updating module files to match s2s model and ncep post modules
  last of archive updates for p4
  updating for reorganized tar files, pulling in changes from wcoss-p2 workflow
  making a fix dir of all available ccpp suite files copying suite xml file to rundirectory
  updates for CCPP suite used in p4
  updates for ice_in to match prototype 4
  update diag_table for ocean
  renaming exe to reflect updated build
  updates for new fix (which includes prep for multiple resolutions) and some clean-up
  more updates for ICs
  updates for copying ICs
  replace MOM6IC with BASE_CPLIC for base directory location of IC for s2s model and made it platform dependent. (Orion location is temporary and needs to be replaced, can also be changed in cases/coupled_free_forecast.yaml to user's preferred location)
  fixed error when linking on hera
  removing fix from .gitignore
  making Externals_coupled.cfg consistent with checkout.sh
  update UPP version to v1.0.8
  updates to checkout, build and link to run prototype 4
  automate fix file linking, to this location: /work/noaa/marine/jmeixner/tempFixICdir/fix/fix_prep_benchmark3
  expand gitignore
  case file adjustment for Orion IC
  update checkout and External.cfg
  checkout and External.cfg update
  remove HERA-like paths in platform.yaml of Orion
  port to Orion
  reiflag default value changed to 1
  generalize fix file directory for imp_physics 8
  configuration update for CCPP scripts
  turn off stochastic features in case file
  update on sandbox platform, for FIX_SCRUB settings
  delete falseful copy after namelist and model_configure parsing add FNMSKH conditional clause, coupled vs non-coupled
  stochastic variables update
  OCNTIM and ICETIM are included in config.fv3 ice variable istep0 bug fix
  stochy variable namelist update, add hard-coded variables into config.fcst (fcst.yaml)
  capital case CDUMP values in config.fv3
  add stoch variables to config.fcst scripts update: CCPP and IPD works
  merge in the latest revision from IPD work
  model_configure_DATM
  move all default values into /defaults from /schema
  clean up unused schema move default value into defaults/
  Clean up exwave_post_sbs
  Moving standalone fv3 model_config exglobal_fcst block into if/else/fi cplwav model_config block. Reinstating config.wave block in JGLOBAL_FORECAST. Pointing EXECwave to HOMEgfs/exec directory for WW3 util executables (changed link_fv3gfs.sh accordingly). Removing debug options from compile.sh line in build_fv3.sh.
  Fix to JWAVE_PREP to look back a day for rtofs
  Squashing commits to wave2global
  bug fix, POSTGRB2TBL undefined
  Pointing to v3.1 tag for ufs-s2s-model making readme consistent with user.yaml.default
The build of WW3 pre/post was returning as successful despite failing
to compile some of the executables. The issue was traced back to some
of the library paths not being set correctly. The build script is
updated to use the existing environment variables for those paths if
they are already set, as they are by the modules on WCOSS.

Refs: NOAA-EMC#175
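The failure mode described above (a build that reports success even though some executables were never produced) can also be guarded against with an explicit post-build check. The sketch below is not part of this commit, which fixes the library paths instead; it is a complementary check, and the directory and program names in the usage comment are illustrative.

```shell
# Return non-zero if any expected executable is missing or not executable.
# Arguments: the exec directory, then one or more program names.
check_executables() {
  local exec_dir="$1"
  shift
  local prog
  local missing=0
  for prog in "$@"; do
    if [ ! -x "${exec_dir}/${prog}" ]; then
      echo "ERROR: ${prog} was not built in ${exec_dir}" >&2
      missing=$((missing + 1))
    fi
  done
  [ "${missing}" -eq 0 ]
}

# Hypothetical usage at the end of a build script:
#   check_executables "${EXEC_DIR}" ww3_grid ww3_outp || exit 1
```

Calling this as the last step makes the script's exit status reflect what was actually built rather than the status of the last command that ran.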
The build of reg2grb2 is fixed on WCOSS by adding a missing module for
Jasper and fixing the library paths in the build script.

Refs: NOAA-EMC#175
Updates the locations for various directories on WCOSS. Some had previously
just been placeholders, while others were outdated.

The paths for initial conditions and fix files are temporary, as they currently
point to a personal directory synchronized from Hera. These need to be moved
under glopara at a later date.

Refs: NOAA-EMC#175
Ocean post requires NCL. NCL is not available by default on WCOSS, so has to be
loaded via a module.

Refs: NOAA-EMC#175
Some of the changes made to the build script during the WCOSS port broke
compatibility on Hera. These issues are now resolved.

As part of this resolution, WW3 pre/post and reg2grb2 were updated to
use hpc-stack 1.1.0. This required some additional libraries to be
explicitly linked for reg2grb2. Previously, modules were being loaded
from a beta version of hpc-stack that included shared objects in the
library folders that would automatically include necessary additional
libraries. The final version does not have these shared objects,
requiring the additional linking.

Refs: NOAA-EMC#164, NOAA-EMC#175
Updates the modulefiles to use hpc-stack 1.1.0 on WCOSS-Dell. Other than
the compiler platform module, modules are now identical to Hera. This
only applies to reg2grb2 and ww3 pre/post, as the coupled model and post
manage their own modules.

Refs: NOAA-EMC#164, NOAA-EMC#175
Some locations were not being properly set, resulting in incorrect
locations being used on some platforms.
Updates the jasper version to fix a bug that was causing constant-valued
grib files for wave output.

Refs: NOAA-EMC#161 NOAA-EMC#164 NOAA-EMC/hpc-stack#137
Transitions the module_base file for Hera to use hpc-stack.

There are still five modules used by module_base that are not available
via hpc-stack: hpss, nco, gempak, ncl, cdo.

Refs: NOAA-EMC#164
Migrates the coupled-crow branch to use hpc-stack. Only jobs run by
coupled-crow are migrated. Modules handled in component repos are
not updated.

Refs: NOAA-EMC#164
@WalterKolczynski-NOAA
Contributor Author

This should be it, but I am going to reclone and start from scratch on each machine on Monday to confirm everything works. I am testing using the coupled_free_forecast_wave case, assuming that encompasses all programs for now.

@WalterKolczynski-NOAA WalterKolczynski-NOAA added the port Porting the system to a new platform label Feb 22, 2021
@WalterKolczynski-NOAA WalterKolczynski-NOAA linked an issue Feb 22, 2021 that may be closed by this pull request
@JessicaMeixner-NOAA
Contributor

Thanks @WalterKolczynski-NOAA, I will review this afternoon. One quick comment though: this is a new port, not "restoring" a capability, as feature/coupled-crow has never been ported to WCOSS-Dell.

@JessicaMeixner-NOAA
Contributor

@WalterKolczynski-NOAA one question: since everything is using hpc-stack and the grib issues with WW3 are resolved, the extra WW3 build modules aren't really needed anymore. Do we want to remove those in this PR or in a subsequent PR?

@JessicaMeixner-NOAA
Contributor

My test on hera seems to have worked as expected: /scratch2/NCEPDEV/climate/Jessica.Meixner/p6pr273/test01/COMROOT/test01/gfs.20130401

Adds the WGRIB2 environment variable to the Hera and Orion module_base files
to match the addition to the WCOSS version until it is added as a variable
to the hpc-stack wgrib2 module.

This also allows the removal of a line defining the variable in wave post.

Refs: NOAA-EMC#164
@WalterKolczynski-NOAA WalterKolczynski-NOAA changed the title Restore WCOSS-Dell capability and migrate to hpc-stack Add WCOSS-Dell capability to coupled-crow and migrate to hpc-stack Feb 24, 2021
@WalterKolczynski-NOAA WalterKolczynski-NOAA marked this pull request as ready for review February 25, 2021 06:27
@WalterKolczynski-NOAA
Contributor Author

coupled_free_forecast_wave passed on all three machines (Hera, Orion, WCOSS-Dell). I did encounter some issues on Venus that I didn't encounter on Mars (had to use mpiifort for reg2grb2, ocean post hitting the wallclock limit). I will keep an eye on that and possibly increase the runtime or otherwise optimize for WCOSS.

Also discovered that ocean post does not resume from where it left off; it redoes at least some of the processing from scratch. That may deserve its own issue, but it would be low-priority (mostly needed for operations) and might be mooted when we switch to the unified workflow anyway.
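For reference, "resuming from where it left off" could be as simple as skipping forecast hours whose output already exists. The sketch below shows that pattern under stated assumptions: `process_hour`, the `ocn_f<hour>.out` naming, and the directory layout are hypothetical stand-ins, not the actual ocean post script.

```shell
# Hypothetical stand-in for the real per-hour ocean post work.
process_hour() {
  local fhr="$1" out_file="$2"
  # Write to a temporary name and rename at the end, so a job killed
  # mid-write never leaves a partial file that looks complete.
  echo "ocean post output for f${fhr}" > "${out_file}.tmp"
  mv "${out_file}.tmp" "${out_file}"
}

# Process each forecast hour only if its output is missing or empty, so a
# rerun after a wallclock kill continues instead of starting over.
# Arguments: the output directory, then the forecast hours to process.
run_resumable() {
  local out_dir="$1"
  shift
  local fhr out_file
  for fhr in "$@"; do
    out_file="${out_dir}/ocn_f${fhr}.out"
    if [ -s "${out_file}" ]; then
      echo "skip f${fhr}"
      continue
    fi
    process_hour "${fhr}" "${out_file}"
    echo "done f${fhr}"
  done
}
```

A second call with the same arguments only redoes hours whose output file is absent or empty.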

@WalterKolczynski-NOAA WalterKolczynski-NOAA merged commit 017ec94 into NOAA-EMC:feature/coupled-crow Feb 25, 2021
@WalterKolczynski-NOAA WalterKolczynski-NOAA deleted the feature/port_coupled_wcoss_dell branch February 25, 2021 15:11
@WalterKolczynski-NOAA WalterKolczynski-NOAA restored the feature/port_coupled_wcoss_dell branch February 25, 2021 16:55
WalterKolczynski-NOAA added a commit to WalterKolczynski-NOAA/global-workflow that referenced this pull request Feb 25, 2021
The build of reg2grb2 was not linking the MPI libraries on WCOSS-Dell,
causing the build to fail (I don't know why this wasn't an issue before).
Modified the modulefile to now set the fortran compiler to the $MPI_FC
set by the impi module (which should be mpiifort) instead of ifort.

Refs: NOAA-EMC#273
zhanglikate pushed a commit to zhanglikate/global-workflow that referenced this pull request Oct 6, 2022
* If ufs.cpld.cpl.r.* files do not reproduce, try nccmp
* Modify/add to run cpld 35d bmark tests
* Fix nccmp implementation. Fix a bug for 35d tests
* Use env var NCCMP
* Add hera.intel to rt_35 and rt_wave_35d
* Use which to find nccmp path. Apply nccmp to all files that fail cmp. Fix a bug in fv3_ccpp_wrtGauss_netcdf_parallel
* Change compare method from nccmp to compare_ncfile.py
* Netcdf compare changes on Hera; skip-ci
* Netcdf compare changes on WCOSS Dell P3; skip-ci
* Modify compare_ncfile.py
* Netcdf compare changes on wcoss cray
* Implement Dusan's ecflow fix NOAA-EMC#273
* Move miniconda3 to emc.nemspara on Hera and Orion. Minor change in default_vars.sh
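The "try cmp first, then fall back to nccmp for files that fail" idea from the list above can be sketched as follows. This is an illustration, not the regression test code itself: reading the tool from `NCCMP` follows the "Use env var NCCMP" item, while defaulting to `nccmp` on the PATH and comparing data only with `-d` are assumptions.

```shell
# Compare two files byte-for-byte first. If that fails, retry with a
# netCDF-aware tool, since two netCDF files may differ at the byte level
# while still holding the same variable data.
# Assumption: NCCMP names the comparison tool and defaults to "nccmp".
compare_files() {
  local baseline="$1" candidate="$2"
  local nccmp_cmd="${NCCMP:-nccmp}"
  if cmp -s "${baseline}" "${candidate}"; then
    echo "identical"
    return 0
  fi
  # -d asks nccmp to compare variable data.
  if "${nccmp_cmd}" -d "${baseline}" "${candidate}" >/dev/null 2>&1; then
    echo "data match"
    return 0
  fi
  echo "different"
  return 1
}
```

If the tool named by `NCCMP` is not installed, the fallback fails and the files are reported as different, which is the conservative outcome for a regression test.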
@emcbot emcbot added the CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion label Apr 18, 2024
Labels
CI-Orion-Building **Bot use only** CI testing is cloning/building on Orion port Porting the system to a new platform
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port feature/coupled-crow to WCOSS-Dell
3 participants