Add WCOSS-Dell capability to coupled-crow and migrate to hpc-stack #273
The builds of the ww3 and reg2grb2 programs are updated to work on WCOSS Dell. New modules are created for each. Both currently use development modules; these need to be replaced with NCO modules in the future. ww3 uses the newer versions of modules under NCEPLIBS-v1.2.0, but reg2grb2 was unable to use that location due to an issue with wgrib2/2.0.8. reg2grb2 also uses older modules than ww3 on Hera, so this mismatch already exists. The build script for reg2grb2 is updated to use the environment variables set by the wgrib2 module on WCOSS. It should remain backwards compatible for Hera/Orion, but this needs to be tested. Refs: NOAA-EMC#175
…kflow into feature/port_coupled_wcoss_dell * 'feature/coupled-crow' of github.com:NOAA-EMC/global-workflow: Feature/gfsv16intocoupled (NOAA-EMC#194) feature/coupled-crow fractional grid (NOAA-EMC#218) updating module file so that atm pos will work (NOAA-EMC#214) feature/coupled-crow updating to use latest ups-weather (NOAA-EMC#210) feature/coupled-crow: Updates for consistency (NOAA-EMC#199) modify modulefile.reg2grb2.orion, modulefile.reg2grb2.hera and build_reg2grb2.sh to use hpc-stack library update readme to current wave post status rest of post jobs that should have been committed at this point, the post jobs for the waves post run, but the gridded grib2 files are not correct modifications for wave jobs after updating to be closer to v16 wave scripts but work for feature/coupled-crow update ENV files for wave jobs update wave jobs updates from gfsv16b-waves add correct path to wave exe remove exit update ww3 build script add building of WW3 pre and prep jobs currently requires extra module files (can be removed when base uses hpc-stack) update for module files on orion and adding building of wave prep updating scrip files for waves to match gfsv16b updating ush files for waves to match gfsv16b (includes restructure) cleanup (1) set IC dir path as /work/noaa/marine/Partha.Bhattacharjee/IC_Dir in workflow/platforms/orion.yaml (2) add extra white space in ush/nems_configure.sh in line 56 to eliminate warning in run log (3) correction on restart_n template variable setting in ush/nems.configure.med_atm_ocn_ice.IN corrections on E-P calculation in regdiag.f final updates to make cmeps forecasts run add pointer to which mediator restart for cmeps updates for CMEPS mediator adding MED pets so that they only use the forecast not write grid pets updates for nems.configure for running with CMEPS updating ice_in see NOAA-EMC/CICE5#25 adding nems.configure to the log file add note to use rocoto/1.3.2 on orion as this version has fix for the UNAVAILABLE issue reg2grb2 now 
builds on orion updates of an lmod->lmod/lmod update to build of fv3_coupled updates for wave post jobs - changing exe dir location updating build to work with CMEPS cleanup from duplications adding cycle definition to wave post job fixing syntax error in JWAVE_PREP job for variables that are not actively being used change for lmod -> lmod/lmod on orion updated README updates - everything but wave post seems to run now, part of wave post does updating model to latest tag add waves to jglobal forecast config updated wave-realted scripts to match gfsv16 branch updated environment files env/ORION.env updated configuration system fix nems.configure for cold start at this point the ocean post is now running updates for forecast with waves updating forecast job for wave jobs updates to run wave prep for s2s case wave init now works on orion put ic type variable into config.fv3ic add ocean, ice and wave configurable variables add ocean, ice and wave configurable variables adding missing file from last commit wave init job now works on hera ocean post hours at boundary are processed twice, now fixed. 
point WW3-related executables through HOMEgfs/fv3_coupled.fd/exec updates for running wave jobs updates for wave defaults for s2s p4 fix bugs in config.wave and coupled_ic.sh first try for adding wave ICs move OCNPETS, ICEPETS and WAVEPETS into dedicated model-specific sessions instead of fv3_gfs_settings bug fix in layout file bugfix in layout file changes to wave case cleaning up double cplwav adding updates to wave scripts from gfsv16b branch by @ajhenrique add ocn-ice build to readme Merged and ready for test reg2grb2 Build on Hera and Orion orion hera build test reg2grb2 for test on feature coupled-crow wave post-process job update Revise CCPP_SUITE variable name, default option, to "IPD" Fix suite name of CCPP Move cplwav and cplwav2atm variable 1, Moved some of the variables into configuration system from JWAVE scripts 2, Wave configuration system built out 3, Modularized scripts updated to accomodate wave variables fix module-setup.sh.inc script, which causes module purge unsuccessful wave configuration update wave configuration files both forecast and atm post seem to be running okay on orion updating module_base.orion that allows cold start to run. Not sure if other jobs will now fail though or if what is in your .bashrc file will effect the success/failure of the run. 
fix typo updates for tar files and do not keep data adding ncks, updating ocean post, some fixes for tar files mediator cold start succeeds on hera and forecast job starts post jobs for atmosphere are no longer triggered on cold starts updates to readme for orion updating module files on hera we don't want to cycle/have 4 runs per day so setting the cases to be just 1 IC updating module files to match s2s model and ncep post modules last of archive updates for p4 updating for reorganized tar files, pulling in changes from wcoss-p2 workflow making a fix dir of all available ccpp suite files copying suite xml file to rundirectory updates for CCPP suite used in p4 updates for ice_in to match prototype 4 update diag_table for ocean renaming exe to reflect updated build updates for new fix (which includes prep for multiple resolutions) and some clean-up more updates for ICs updates for copying ICs replace MOM6IC with BASE_CPLIC for base directory location of IC for s2s model and made it platform dependent. 
(Orion location is temporary and needs to be replaced, can also be changed in cases/coupled_free_forecast.yaml to users prefered location) fixed error when linking on hera removing fix from .gitignore making Externals_coupled.cfg consistent with checkout.sh update UPP version to v1.0.8 updates to checkout, build and link to run prototype 4 automate fix file linking, to this location: /work/noaa/marine/jmeixner/tempFixICdir/fix/fix_prep_benchmark3 expand gitignore case file adjustment for Orion IC update checkout and External.cfg checkout and External.cfg update remove HERA-like paths in platform.yaml of Orion port to Orion reiflag default value changed to 1 generalize fix file directory for imp_physics 8 configuration update for CCPP scripts turn off stochastic features in case file update on sandbox platform, for FIX_SCRUB settings delete falseful copy after namelist and model_configure parsing add FNMSKH conditional clause, coupled vs non-coupled stochastic variables update OCNTIM and ICETIM are included in config.fv3 ice variable istep0 bug fix stochy variable namelist update, add hard-coded variables into config.fcst (fcst.yaml) capital case CDUMP values in config.fv3 add stoch variables to config.fcst scripts update: CCPP and IPD works merge in the latest revision from IPD work model_configure_DATM move all default values into /defaults from /schema clean up unused schema move default value into defaults/ Clean up exwave_post_sbs Moving standalone fv3 model_config exglobal_fcst block into if/else/fi cplwav model_config block. Reinstating config.wave block in JGLOBAL_FORECAST. Pointing EXECwave to HOMEgfs/exec directory for WW3 util executables (changed link_fv3gfs.sh accordingly). Removing debug options from compile.sh line in build_fv3.sh. Fix to JWAVE_PREP to look back a day for rtofs Squashing commits to wave2global bug fix, POSTGRB2TBL undefined Pointing to v3.1 tag for ufs-s2s-model making readme consistant with user.yaml.default
The build of WW3 pre/post was returning as successful despite failing to compile some of the executables. The issue was traced back to some of the library paths not being set correctly. The build script is updated to use the existing environment variables for those paths if they are already set, as they are by the modules on WCOSS. Refs: NOAA-EMC#175
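The approach described in this commit (reuse a library path when a module has already exported it) can be sketched as below, together with one extra safeguard against a build reporting success after a partial compile. This is only a minimal sketch, not the actual build script: the variable `EXAMPLE_LIB_DIR`, the fallback path, and the `check_executables` helper are all hypothetical names chosen for illustration, and the executable check is an assumption about how such a failure could be caught rather than something this commit states it added.

```shell
# Hypothetical sketch; names and paths are illustrative only.

# Keep the value exported by a loaded module if there is one;
# otherwise fall back to a default location.
EXAMPLE_LIB_DIR="${EXAMPLE_LIB_DIR:-/hypothetical/fallback/lib}"

# Return non-zero if any expected executable is missing from the
# given directory, so the caller cannot report success after a
# partial compile.
check_executables() {
    exec_dir="$1"
    shift
    missing=0
    for exe in "$@"; do
        if [ ! -x "${exec_dir}/${exe}" ]; then
            echo "ERROR: ${exe} was not built in ${exec_dir}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A build script could end with something like `check_executables "$exec_dir" prog_a prog_b || exit 1`, where `prog_a` and `prog_b` stand in for whatever executables the build is expected to produce.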
The build of reg2grb2 is fixed on WCOSS by adding a missing module for Jasper and fixing the library paths in the build script. Refs: NOAA-EMC#175
Updates the locations for various directories on WCOSS. Some had previously just been placeholders, while others were outdated. The paths for initial conditions and fix files are temporary, as they currently point to a personal directory synchronized from Hera. These need to be moved under glopara at a later date. Refs: NOAA-EMC#175
Ocean post requires NCL. NCL is not available by default on WCOSS, so it has to be loaded via a module. Refs: NOAA-EMC#175
…kflow into feature/port_coupled_wcoss_dell
…kflow into feature/port_coupled_wcoss_dell
Some of the changes made to the build script during the WCOSS port broke compatibility on Hera. These issues are now resolved. As part of this resolution, WW3 pre/post and reg2grb2 were updated to use hpc-stack 1.1.0. This required some additional libraries to be explicitly linked for reg2grb2. Previously, modules were being loaded from a beta version of hpc-stack that included shared objects in the library folders, which automatically pulled in the necessary additional libraries. The final version does not have these shared objects, so the additional libraries must be linked explicitly. Refs: NOAA-EMC#164, NOAA-EMC#175
Updates the modulefiles to use hpc-stack 1.1.0 on WCOSS-Dell. Other than the compiler platform module, modules are now identical to Hera. This only applies to reg2grb2 and ww3 pre/post, as the coupled model and post manage their own modules. Refs: NOAA-EMC#164, NOAA-EMC#175
Some locations were not being properly set, resulting in incorrect locations being used on some platforms.
…kflow into feature/port_coupled_wcoss_dell
…kflow into feature/port_coupled_wcoss_dell
Updates the jasper version to fix a bug that was causing constant-valued grib files for wave output. Refs: NOAA-EMC#161 NOAA-EMC#164 NOAA-EMC/hpc-stack#137
Transitions the module_base file for Hera to use hpc-stack. There are still five modules used by module_base that are not available via hpc-stack: hpss, nco, gempak, ncl, cdo. Refs: NOAA-EMC#164
Migrates the coupled-crow branch to use hpc-stack. Only jobs run by coupled-crow are migrated. Modules handled in component repos are not updated. Refs: NOAA-EMC#164
This should be it, but I am going to reclone and start from scratch on each machine on Monday to confirm everything works. I am testing using the coupled_free_forecast_wave case, assuming that encompasses all programs for now.
Thanks @WalterKolczynski-NOAA I will review this afternoon. One quick comment though: this is a new port, not "restoring" a capability, as feature/coupled-crow has never been ported to WCOSS-DELL.
@WalterKolczynski-NOAA one question: since everything is using hpc-stack and the grib issues with WW3 are resolved, the extra WW3 build modules are no longer really needed. Do we want to remove those in this PR or in a subsequent PR?
My test on hera seems to have worked as expected: /scratch2/NCEPDEV/climate/Jessica.Meixner/p6pr273/test01/COMROOT/test01/gfs.20130401
Adds the WGRIB2 environment variable to the Hera and Orion module_base files to match the addition to the WCOSS version, until it is added as a variable to the hpc-stack wgrib2 module. This also allows the removal of a line defining the variable in wave post. Refs: NOAA-EMC#164
coupled_free_forecast_wave passed on all three machines (Hera, Orion, WCOSS-Dell). Did encounter some issues on Venus I didn't encounter on Mars (had to use mpiifort for reg2grb2, ocean post hitting wallclock limit). Will keep an eye on that and possibly increase the runtime or otherwise optimize for WCOSS. Also discovered that ocean post does not resume from where it left off; it redoes at least some of the processing from scratch. That may deserve its own issue, but it would be low-priority (mostly needed for operations) and might be mooted when we switch to the unified workflow anyway.
The build of reg2grb2 was not linking the MPI libraries on WCOSS-Dell, causing the build to fail (I don't know why this wasn't an issue before). Modified the modulefile to now set the fortran compiler to the $MPI_FC set by the impi module (which should be mpiifort) instead of ifort. Refs: NOAA-EMC#273
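The compiler selection described in this commit can be illustrated in shell form as below. The actual change was made in a modulefile, so this is only a rough sketch of the same idea: the helper name `select_fortran_compiler` is hypothetical, and falling back to `ifort` when `MPI_FC` is unset is an assumption added for illustration, not something the commit states.

```shell
# Hypothetical sketch of the compiler choice; not the actual modulefile.
# MPI_FC is expected to be exported by the MPI module (mpiifort on
# WCOSS-Dell, per the commit message). The ifort fallback is an
# assumption for environments where MPI_FC is not set.
select_fortran_compiler() {
    printf '%s\n' "${MPI_FC:-ifort}"
}

FC="$(select_fortran_compiler)"
export FC
```

Using the MPI wrapper as the compiler means the MPI libraries are linked automatically, which is what the plain `ifort` build was missing.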
* If ufs.cpld.cpl.r.* files do not reproduce, try nccmp
* Modify/add to run cpld 35d bmark tests
* Fix nccmp implementation. Fix a bug for 35d tests
* Use env var NCCMP
* Add hera.intel to rt_35 and rt_wave_35d
* Use which to find nccmp path. Apply nccmp to all files that fail cmp. Fix a bug in fv3_ccpp_wrtGauss_netcdf_parallel
* Change compare method from nccmp to compare_ncfile.py
* Netcdf compare changes on Hera; skip-ci
* Netcdf compare changes on WCOSS Dell P3; skip-ci
* Modify compare_ncfile.py
* Netcdf compare changes on wcoss cray
* Implement Dusan's ecflow fix NOAA-EMC#273
* Move miniconda3 to emc.nemspara on Hera and Orion. Minor change in default_vars.sh
Restores the capability to run the coupled workflow on WCOSS-Dell. Also partially migrates the coupled workflow to hpc-stack.
The capability to run on WCOSS suffered code rot due to an inability of one of the primary maintainers to run there. That capability is now restored for WCOSS-Dell. It is only tested on phase 3; it would likely run on phase 3.5, but resource allocation would be suboptimal due to the larger number of cores. Additional configuration optimized for phase 3.5 can be added at a later time.
This merge also migrates library imports to hpc-stack for programs that are located within global-workflow (i.e. not the model or post) and are currently built in the coupled configuration. There are two that fall into both categories: reg2grb2 and waveprepost. Also included in this migration is the runtime modulefile (module_base). Modulefiles are migrated on WCOSS-Dell, Hera, and Orion.
New commit hashes will be needed to update the components. @KateFriedman-NOAA is working on getting these squared away for the develop branch for component versions using the release version of hpc-stack (#164) (some are currently using a test version). The coupled model will need to be similarly updated (it also currently uses a test version), along with any other coupled-only components.
The hpc-stack changes are also relevant to the cmake migration for reg2grb2 (#268).
Refs: #164, #268
Closes: #175