Update module files to build gsi on Gaea-C5 #746
@DavidHuber-NOAA, if you have time would you please review this PR? @DavidBurrows-NCO, are there any Gaea users we should ask to review this PR?
I would suggest @jswhit.
@RussTreadon-NOAA just to clone and build?
It would be good to get verification from users that they can clone, build, and execute @DavidHuber-NOAA , do you know if we have the GSI and EnKF ctests |
@RussTreadon-NOAA It looks like ctests are run from the regression directory. Is that correct? I want to dig into the ctest workflow |
GSI Wiki GSI Ctests (regression tests) provides a brief overview of GSI ctests. These are not ctests in the sense of unit tests. The CMake ctest capability is used to sequentially submit scripts in I do not recall running the GSI ctests on Gaea. I logged into Gaea this morning. It appears my last login dates back to March 2023. I do not use Gaea so my user support ability is limited. |
Thanks! The mechanics are there to run on Gaea in |
@DavidBurrows-NCO @RussTreadon-NOAA There is some old regression test data in |
I should note that that commit is ~3 months old, so there may be some additional changes needed.
Excellent. Thanks @DavidHuber-NOAA
I see now that I also had to make these changes: 04a737d. I believe that's it.
0f0d82b
to
e48c7bf
Compare
regression/regression_var.sh
Outdated
@@ -190,7 +186,7 @@ export savdir="$ptmp"
 export JCAP="62"

 # Case Study analysis dates
-export global_adate="2024022300"
+export global_adate="2022110900"
I think this should remain at 2024022300, though that may require updated RT data.
Agreed. /gpfs/f5/epic/world-shared/GSI_data/CASES/regtest/gfs/prod/ should be populated with the enkfgdas.202402* and gdas.202402* directories and files from Hera /scratch1/NCEPDEV/da/Russ.Treadon/CASES/regtest/gfs/prod
@DavidHuber-NOAA @RussTreadon-NOAA Yep, will do. I was just trying to get a run going before copying files, but that's exactly what I'll do. Revert that global_adate change and copy those dirs from Hera to Gaea
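The copy step agreed on above could be sketched roughly as follows. This is a minimal illustration, not the command actually used: `copy_case_dirs` is a hypothetical helper, it assumes the source and destination are both reachable as local paths (a transfer between Hera and Gaea would need a transfer tool in between), and it selects directories purely by the `gdas.<prefix>*` and `enkfgdas.<prefix>*` name patterns mentioned in the comment.

```shell
# Hypothetical helper: copy gdas.<prefix>* and enkfgdas.<prefix>* directories
# from a source case directory into a destination case directory.
copy_case_dirs() {
  src="$1"; dst="$2"; prefix="$3"
  mkdir -p "$dst" || return 1
  for d in "$src"/gdas."$prefix"* "$src"/enkfgdas."$prefix"*; do
    # An unmatched glob stays literal, so skip anything that is not a directory.
    [ -d "$d" ] || continue
    # -a preserves permissions and timestamps and copies recursively.
    cp -a "$d" "$dst"/ || return 1
  done
}
```

For the February 2024 case discussed here, a call would look like `copy_case_dirs <source prod dir> <destination prod dir> 202402`.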
@DavidHuber-NOAA @RussTreadon-NOAA Looks like I don't have write permissions at
ls -l /gpfs/f5/epic/world-shared/
drwxr-sr-x 4 role.epic epic 4.0K Jan 22 10:00 GSI_data
unless I request a role.epic account. Should I do that?
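The `ls -l` line above already explains the failure: in the mode string `drwxr-sr-x` neither the group nor others have a write bit, so only the owner `role.epic` can create entries in that directory. As a small sketch, the group write bit can be read straight from the mode string. `group_can_write` is a hypothetical helper written for illustration; it is not part of GSI.

```shell
# Hypothetical helper: succeed if an ls-style mode string grants group write.
# Layout: 1 type character, then owner rwx, group rwx, other rwx,
# so the group write bit is the sixth character.
group_can_write() {
  case "$1" in
    ?????w*) return 0 ;;
    *)       return 1 ;;
  esac
}

if group_can_write "drwxr-sr-x"; then
  echo "group may write"
else
  echo "group may not write"
fi
```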
@DavidHuber-NOAA @RussTreadon-NOAA I reverted this change, began populating /gpfs/f5/ufs-ard/world-shared/GSI_data/ with regression test data from Hera, and updated GSI_BINARY_SOURCE_DIR. Let me know if you still have issues cloning on Gaea.
583f403
to
c44a771
Compare
modulefiles/gsi_gaea.intel.lua
Outdated
unload("cray-mpich")
unload("cray-python")
unload("darshan")
prepend_path("MODULEPATH", "/ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/unified-env/install/modulefiles/Core")
is this really needed? (I think gsi-addon-dev provides everything)
@jswhit I tested with just gsi-addon-dev load and don't see any issues. I committed an updated module file.
Clone The build aborts while trying to copy GSI binary fix files from |
Looks like that directory needs to be copied to |
Morning... @RussTreadon-NOAA I am seeing similar errors in all 7 jobs except my global_4denvar now failed with |
Thank you @DavidBurrows-NCO for confirming that you also are encountering problems. I am most familiar with the global |
@DavidBurrows-NCO Looking into the global_4denvar failure (to create subdirectories), it seems the directory it is trying to make already exists (that's the meaning of ierror=17). And based on the duplicate messages of
INIT_DIRECTORIES: ***ERROR** Failed to create directory dir.0000 for PE 0000 ierror= 17
****STOP2**** ABORTING EXECUTION w/code= 678
INIT_DIRECTORIES: ***ERROR** Failed to create directory dir.0000 for PE 0000 ierror= 17
****STOP2**** ABORTING EXECUTION w/code= 678
it seems to me that multiple PEs have a
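On Linux, error number 17 is `EEXIST` ("File exists"), which matches the reading above. A minimal shell sketch of the same failure mode, independent of the GSI code: a plain `mkdir` fails when the directory is already there, whereas `mkdir -p` treats an existing directory as success. The directory name `dir.0000` is taken from the log; everything else is illustrative and says nothing about how `INIT_DIRECTORIES` is implemented.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir dir.0000                       # first creator succeeds
if mkdir dir.0000 2>/dev/null; then  # second attempt fails with EEXIST
  echo "second mkdir unexpectedly succeeded"
else
  echo "second mkdir failed: directory already exists"
fi

# -p makes the call idempotent: an existing directory is not an error.
mkdir -p dir.0000 && echo "mkdir -p tolerates the existing directory"
```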
Remove
However, |
Prints added to |
@RussTreadon-NOAA I retried the runs after removing --mpi=pmi2. I placed the output here: |
Morning @RussTreadon-NOAA @DavidHuber-NOAA I made some progress on the ctests. In the original module file, there were |
@RussTreadon-NOAA I made one more update to gsi_gaea module file |
@DavidBurrows-NCO , thank you for recent updates to
The hafs_4denvar_glbens failure is due to
A check of the
This being the first time I have run GSI ctests on Gaea I do not know if this is a fatal fail. Wall time variability is observed on other machines. It is usually viewed as a non-fatal fail. As mentioned in other GSI issues and/or PRs, the checks in |
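One way to picture a non-fatal timing check is a comparison of the update run's wall time against the control run's wall time plus some allowance. The sketch below is only an illustration under assumed names: `within_time_tolerance`, its arguments, and the percentage are hypothetical and are not taken from the GSI regression scripts.

```shell
# Hypothetical check: succeed if the updat wall time (seconds) is no more than
# the contrl wall time plus pct percent. Integer arithmetic only.
within_time_tolerance() {
  contrl_s="$1"; updat_s="$2"; pct="$3"
  limit=$(( contrl_s + contrl_s * pct / 100 ))
  [ "$updat_s" -le "$limit" ]
}

# Example with made-up numbers: 320 s against a 300 s control and 10% allowance.
if within_time_tolerance 300 320 10; then
  echo "wall time within tolerance"
else
  echo "wall time exceeds tolerance (treated as non-fatal in this sketch)"
fi
```

Because run-to-run variability is expected, a check of this kind is better reported as a warning than used to fail the whole test.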
Reran hafs_4denvar_glbens and this time the test passed.
This time the
Thank you @DavidHuber-NOAA for filling in the blanks in my memory. While |
@RussTreadon-NOAA I meant to ask this earlier, but are the numbers in |
@DavidBurrows-NCO, I don't use Gaea so I am not familiar with the system configuration. Which of the machines on which we currently build GSI and run ctests is most similar to Gaea in terms of cores and memory per node? If the Gaea configuration is similar to machine-a, I'd look at the job configuration values for machine-a as the starting point for the Gaea job configuration.
@RussTreadon-NOAA Following wcoss2 makes sense then. Thanks.
Install Ran ctests on each machine with the following results Cactus
Gaea
Hera
Orion
The Passed results are expected. This PR does not alter the build or execution on WCOSS2, Hera, or Orion. On Gaea the Passed is also expected since both the contrl and updat are |
ctests have been run on various machines with acceptable results.
Approve.
@DavidHuber-NOAA, do you have any more requests or comments for this PR? ctests have been run on various platforms with acceptable results.
Looks good, thanks @DavidBurrows-NCO!
@RussTreadon-NOAA Thanks for the nudge. Everything looks in order.
Thank you @DavidHuber-NOAA for the approval. I'll work with the GSI Handling Review team to schedule merger of this PR into GSI
@DavidHuber-NOAA @RussTreadon-NOAA Thank you both for your help through the process!
Description
Update modulefiles/gsi_gaea.intel.lua and ush/module-setup.sh to build GSI on Gaea-C5. The new module file is minimal and follows gsi_hera.intel.lua.
Refs #696
Refs NOAA-EMC/global-workflow/issues/2535
Type of change
How Has This Been Tested?
Cloned and compiled on Gaea and Hera
Checklist