
Upcoming Orion OS upgrade (June 12-13, 2024) #981

Closed
climbfuji opened this issue Jan 31, 2024 · 14 comments
Assignees: RatkoVasic-NOAA
Labels: INFRA (JEDI Infrastructure), OAR-EPIC (NOAA Oceanic and Atmospheric Research and Earth Prediction Innovation Center)

Comments

@climbfuji
Collaborator

climbfuji commented Jan 31, 2024

Is your feature request related to a problem? Please describe.

UPDATE - the OS upgrade was postponed to June 12-13, 2024.

Email from the Orion sysadmins:

Orion OS and Software Stack Upgrade - Scheduled for 04/24/24
The Operating System (OS) used on the Orion system is currently CentOS 7. The CentOS 7 OS will reach End-of-Support on June 30, 2024. End-of-Support means that there will be no user support, package updates, or security patches. As a result, the system must be migrated to a new OS before June 30, 2024. The OS used on the Hercules system is based on Rocky 9, another derivative of Red Hat Linux. So the current plan is to replicate Hercules's entire software stack onto Orion. This update will bring consistency between the two MSU systems and include the OS, module files, supported compilers, and newer versions of all "/apps" software. You should expect that this change will require that all models and all user maintained software ("/apps/contrib") will have to be rebuilt on Orion. You should also expect that you may run into technical issues when trying to rebuild your codes. So please plan accordingly.

Due to Hercules being fully available prior to the migration, there will be no incremental migration. The entire Orion system will be upgraded all at once. The Orion OS and software stack upgrade is currently scheduled to occur during the April 24th, 2024 maintenance downtime. In the meantime, we encourage you and your project to utilize the Hercules system. Getting your models and workflows to run properly on Hercules will help you to greatly minimize the impact of the OS upgrade on Orion.

Describe the solution you'd like
After the OS upgrade, we need to rebuild whichever versions of spack-stack we want to support (the current release for sure; how many releases back is TBD).
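
For reference, a rough sketch of the usual spack-stack rebuild flow once the new system compilers/MPI are in place (the environment and template names here are illustrative, not the final Orion choices):

git clone --recurse-submodules https://github.com/JCSDA/spack-stack.git
cd spack-stack && source setup.sh
spack stack create env --site orion --template unified-dev --name unified-env-rocky9
spack env activate envs/unified-env-rocky9
spack concretize 2>&1 | tee log.concretize
spack install 2>&1 | tee log.install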

Additional context
n/a

@climbfuji added the INFRA (JEDI Infrastructure) and OAR-EPIC (NOAA Oceanic and Atmospheric Research and Earth Prediction Innovation Center) labels on Jan 31, 2024
@ulmononian
Collaborator

@RatkoVasic-NOAA fyi

@RatkoVasic-NOAA self-assigned this on Mar 6, 2024
@climbfuji changed the title from "Upcoming Orion OS upgrade (April 2024)" to "Upcoming Orion OS upgrade (May 22nd 2024)" on Apr 10, 2024
@RatkoVasic-NOAA
Collaborator

Upgrade moved to May 22nd.

@RatkoVasic-NOAA
Collaborator

RatkoVasic-NOAA commented Jun 18, 2024

For now, the system gcc@12 is not yet functional and the sysadmins are working on it; in the meantime we are using our own installation (/work/noaa/epic/role-epic/spack-stack/orion/modulefiles).

Here is the TODO list:

  • spack-stack-1.5.1 - unified-env-rocky9 (INTEL) (installed with old intel, failing with new)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/envs/unified-env-rocky9/install/modulefiles/Core

  • spack-stack-1.5.1 - unified-env-rocky9 (GNU)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/envs/unified-env-rocky9/install/modulefiles/Core

  • spack-stack-1.5.1 - gsi-addon-rocky9 (INTEL)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/envs/gsi-addon-rocky9/install/modulefiles/Core

  • spack-stack-1.5.1 - gsi-addon-rocky9 (GNU)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.5.1/envs/gsi-addon-rocky9/install/modulefiles/Core

  • spack-stack-1.6.0 - unified-env-rocky9 (INTEL) (installed with old intel, failing with new)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/unified-env-rocky9/install/modulefiles/Core/

  • spack-stack-1.6.0 - unified-env-rocky9 (GNU)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/ue-gcc/install/modulefiles/Core

  • spack-stack-1.6.0 - gsi-addon-rocky9 (INTEL) (installed with old intel, failing with new)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-env-rocky9/install/modulefiles/Core

  • spack-stack-1.6.0 - gsi-addon-rocky9 (GNU)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/gsi-addon-gcc/install/modulefiles/Core

  • spack-stack-1.7.0 - ue-intel (INTEL)

  • spack-stack-1.7.0 - ue-gcc (GNU)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/envs/ue-gcc/install/modulefiles/Core/

  • spack-stack-1.7.0 - gsi-addon-intel (INTEL)

  • spack-stack-1.7.0 - gsi-addon-gcc (GNU)
    /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/envs/gsi-addon-gcc/install/modulefiles/Core

@RaghuReddy-NOAA

Hi Ratko, this GitHub issue says the spack-stack team is waiting on some action from the admins. I assume you are referring to the MSU Orion admins?

If so, what do you need from them? Is there a test case they can investigate?

@RatkoVasic-NOAA
Collaborator

RatkoVasic-NOAA commented Jun 20, 2024

Hi @RaghuReddy-NOAA, so far the problem has been with gcc@12.2.0.
I just installed gcc@12.2.0 and openmpi@4.1.6 myself, and as we speak spack-stack is compiling with those two.
For my first Intel installation of spack-stack it looks like I used the old ifort (intel@2021.9.0), and it worked, but @climbfuji pointed out that it was the old installation. I then tried the new one (intel-oneapi-compilers/2023.1.0), and with that one the build started failing immediately.
BTW, the rocoto/ruby installation is also missing.
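
For context, a rough sketch of how a freshly installed GNU/OpenMPI pair like this typically gets registered with spack before a build (the module names match the paths above; the exact steps differ on spack-stack sites, where these entries are usually added to the site's compilers.yaml/packages.yaml by hand):

module use /work/noaa/epic/role-epic/spack-stack/orion/modulefiles
module load gcc/12.2.0 openmpi/4.1.6
spack compiler find          # adds gcc@12.2.0 to compilers.yaml
spack external find openmpi  # adds openmpi@4.1.6 as an external package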

@RatkoVasic-NOAA
Collaborator

CURRENT ORION STATUS (from email):

I started with version 1.6.0, since SRW and ufs-weather-model use it, and the unified-env-rocky9 environment.
The Intel part worked OK, but GNU failed immediately.
I did the same for 1.5.1.
Again, Intel compiled OK, but GNU failed.

Dom suggested that there were some problems with the system GNU installation, so I installed gcc@12.2.0 and openmpi@4.1.6
(/work/noaa/epic/role-epic/spack-stack/orion/modulefiles)

Using our own GNU/OpenMPI installations, I first tested spack-stack 1.7.0 with the ue-gcc environment.
After fixing a couple of small problems, all but three libraries were installed. The three that failed are wgrib2, cdo, and py-matplotlib.

  1. CDO: failed with an "internal compiler error" (the same error that Natalie encountered on Hera with gcc@12.x).

  2. WGRIB2: failed while compiling ipolates.F90 and ipolatev.F90, which include config.h;
    the compiler complains about the first character in config.h:

#define USE_SPECTRAL 0
/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/spack/lib/spack/env/gcc/gfortran  -c -O2 -fopenmp -fdefault-real-8 -fdefault-double-8 -cpp -DLSIZE=8 ipolates.F90
config.h:1:2:

    1 | #define USE_SPECTRAL 0
      |  1
Error: Invalid character in name at (1)
make[1]: *** [Makefile:27: ipolates.o] Error 1

  3. py-matplotlib: cannot find crti.o (which IS in /usr/lib64/, but that directory is somehow missing from the LIBRARY_PATH environment variable; a quick check is sketched at the end of this comment):
  /usr/bin/ld: cannot find crti.o: No such file or directory
  collect2: error: ld returned 1 exit status

Run directories of failing packages:

/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/cache/build_stage/spack-stage-cdo-2.0.5-iykdtbemxqnlz42yzxbthnffwfqsj3ht
/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/cache/build_stage/spack-stage-wgrib2-2.0.8-ransj2qqnjbepoitmw72y3ciufxm2yub
/work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/cache/build_stage/spack-stage-py-matplotlib-3.7.4-ozycdt7vbzyq42dteyonynfwelnz2rlm
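
A quick way to double-check the py-matplotlib diagnosis above (crti.o present in /usr/lib64 but that directory absent from the linker search path) would be something like the following; whether prepending /usr/lib64 to LIBRARY_PATH for the failing link step actually fixes it is an assumption, not something verified here:

ls -l /usr/lib64/crti.o
gcc -print-search-dirs
echo "LIBRARY_PATH=${LIBRARY_PATH}"
# possible workaround for the link step, untested here:
# export LIBRARY_PATH=/usr/lib64:${LIBRARY_PATH}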

@RatkoVasic-NOAA
Collaborator

RatkoVasic-NOAA commented Jun 25, 2024

Good news:

  1. Installing wgrib2@3.1.1 instead of wgrib2@2.0.8 worked OK.
  2. Turning off the requirement for cdo@2.0.5 and installing cdo@2.3.0 solved the problem.

We still have to fix py-matplotlib!
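
For anyone reproducing this outside the spack-stack environment, the equivalent standalone check with the gcc@12.2.0 toolchain would look roughly like this (illustrative specs):

spack spec -I wgrib2@3.1.1 %gcc@12.2.0
spack install wgrib2@3.1.1 %gcc@12.2.0
spack install cdo@2.3.0 %gcc@12.2.0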

@huston-rogers

This is Huston @ MSU, doing the gcc-12 rebuild for Orion. Please confirm the expected packages for gcc-12.2.0; here's what was built today, which will be synced after confirmation:

gcc-12.2.0, no mpi:

Note: ncview, ncl, and cdo had errors.

 -  cdo@2.1.0%gcc@12.2.0
[+] eccodes@2.25.0%gcc@12.2.0
[+] esmf@8.3.1%gcc@12.2.0 ~~mpi
[+] ffmpeg@4.4.1%gcc@12.2.0
[+] fftw@2.1.5%gcc@12.2.0 ~~mpi++openmp
[+] fftw@3.3.10%gcc@12.2.0 ~~mpi++openmp
[+] hdf5@1.12.2%gcc@12.2.0 ~~mpi
[+] mpich@4.0.2%gcc@12.2.0 +fortran+hwloc+hydra+libxml2+pci+romio+slurm+wrapperrpath
[+] mpich@4.1.1%gcc@12.2.0 +fortran+hwloc+hydra+libxml2+pci+romio+slurm+wrapperrpath
[+] nccmp@1.9.0.1%gcc@12.2.0
 -  ncl@6.6.2%gcc@12.2.0
[+] nco@5.0.1%gcc@12.2.0
 -  ncview@2.1.8%gcc@12.2.0
[+] netcdf-c@4.9.0%gcc@12.2.0 ~~mpi
[+] netcdf-cxx@4.2%gcc@12.2.0
[+] netcdf-fortran@4.6.0%gcc@12.2.0
[+] openmpi@4.1.4%gcc@12.2.0 +pmi+romio+rsh+static+vt+wrapper-rpath fabrics=none schedulers=slurm
[+] parallel-netcdf@1.12.2%gcc@12.2.0
[+] su2@7.3.1%gcc@12.2.0
[+] wgrib2@3.1.1%gcc@12.2.0

gcc-12.2.0, openmpi-4.1.4
Note: namd requires an extra step, and hasn't been completed yet.

[+] esmf@8.3.1%gcc@12.2.0
[+] fftw@2.1.5%gcc@12.2.0 ++openmp
[+] fftw@3.3.10%gcc@12.2.0 ++openmp
[+] hdf5@1.12.2%gcc@12.2.0 +fortran+hl++mpi
[+] hdf5@1.14.3%gcc@12.2.0 +fortran+hl++mpi
 -  namd@2.14%gcc@12.2.0
[+] ncl@6.6.2%gcc@12.2.0
[+] nco@5.0.1%gcc@12.2.0
[+] netcdf-c@4.9.0%gcc@12.2.0 ++mpi
[+] netcdf-cxx@4.2%gcc@12.2.0
[+] netcdf-fortran@4.6.0%gcc@12.2.0
[+] parallel-netcdf@1.12.2%gcc@12.2.0

gcc-12.2.0, mpich-4.1.1
Note: same namd caveat as above (requires an extra step).

[+] esmf@8.3.1%gcc@12.2.0
[+] fftw@2.1.5%gcc@12.2.0 ++openmp
[+] fftw@3.3.10%gcc@12.2.0 ++openmp
[+] hdf5@1.12.2%gcc@12.2.0 ++mpi
 -  namd@2.14%gcc@12.2.0
[+] ncl@6.6.2%gcc@12.2.0
[+] nco@5.0.1%gcc@12.2.0
[+] netcdf-c@4.9.0%gcc@12.2.0 ++mpi
[+] netcdf-cxx@4.2%gcc@12.2.0
[+] netcdf-fortran@4.6.0%gcc@12.2.0
[+] parallel-netcdf@1.12.2%gcc@12.2.0

gcc-12.2.0, mpich-4.0.2
Note: mpich-4.0.2 now conflicts with hdf5, so nothing builds correctly. Still working on it.

 -  esmf@8.3.1%gcc@12.2.0
 -  fftw@2.1.5%gcc@12.2.0 ++openmp
 -  fftw@3.3.10%gcc@12.2.0 ++openmp
 -  gptl@8.0.3%gcc@12.2.0
 -  namd@2.14%gcc@12.2.0
 -  ncl@6.6.2%gcc@12.2.0
 -  nco@5.0.1%gcc@12.2.0
 -  netcdf-c@4.9.0%gcc@12.2.0 ++mpi
 -  netcdf-cxx@4.2%gcc@12.2.0
 -  netcdf-fortran@4.6.0%gcc@12.2.0
 -  parallel-netcdf@1.12.2%gcc@12.2.0

Question 1: is there a use case for gcc-12 + Intel MPI (intel-impi)? I would sooner encourage using the Intel compilers with Intel MPI over any other combination.

Question 2: should openmpi or mpich be updated to other versions?

Extra Information:

Environment location (we are going to work towards leveraging environments more, i.e. one environment file per compiler+MPI combination): /apps/spack-managed/spack-devel/var/spack/environments/

Getting Spack onto the HPC systems: the tentative plan is to make this a permanent path on all MSU systems: /apps/spack-managed/spack-devel

Our config is in there, under etc/spack, and the setup script is in the expected share/spack/
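
For reference, picking up that shared instance should just be a matter of sourcing the standard setup script from the path above (illustrative):

source /apps/spack-managed/spack-devel/share/spack/setup-env.sh
spack env list   # lists the per compiler+MPI environments under var/spack/environments/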

@climbfuji
Collaborator Author

@snowbird294 Thanks for the info. I wasn't involved in any of the conversations regarding the sysadmin spack builds on Orion after the Rocky 9 transition, but what we usually do with spack-stack is use the compilers and MPI libraries from the system/sysadmins and build the rest ourselves. @AlexanderRichert-NOAA @RatkoVasic-NOAA please correct me if a different path was chosen for Orion.

As far as compilers and MPI are concerned, gcc@12.x.y with openmpi@4.1.x or 5.0.x should do. We don't usually mix and match GNU compilers with Intel MPI; we use the Intel compilers (so far mostly the classic compilers, up to the very last oneAPI release that still shipped them, 2023.??.??) with Intel oneAPI MPI.
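
In spec terms, that pairing policy looks roughly like this (illustrative specs only, using versions already mentioned in this issue):

spack spec esmf %gcc@12.2.0 ^openmpi@4.1.6
spack spec esmf %intel@2021.9.0 ^intel-oneapi-mpi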

@RatkoVasic-NOAA
Collaborator

This time on Orion we went ahead and installed GNU and OpenMPI ourselves:
/work/noaa/epic/role-epic/spack-stack/orion/modulefiles/
gcc/12.2.0
openmpi/4.1.6

@climbfuji
Collaborator Author

Update from JCSDA: spack-stack-1.7.0 is working for them, with the exception that the git-lfs module has to be loaded manually after loading all other modules. This is because the Orion site config packages.yaml hasn't been updated in the release/1.7.0 branch; it was simply left as-is from the old CentOS system. I fixed that for develop after discovering the issue, but not for 1.7.0. Given that only JCSDA uses spack-stack-1.7.0, that they are aware of the workaround, and that spack-stack-1.8.0 will be released in about two months, that's good enough. The workaround is also documented in the spack-stack wiki for other users.
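
For other Orion users of spack-stack-1.7.0, the workaround amounts to loading git-lfs by hand after the usual stack modules; a sketch using the ue-gcc path listed earlier in this issue (the metamodule names are illustrative):

module use /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.7.0/envs/ue-gcc/install/modulefiles/Core
module load stack-gcc stack-openmpi stack-python
# ...load the rest of your application's modules as usual...
module load git-lfs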

@climbfuji
Collaborator Author

@AlexanderRichert-NOAA Where are we at with the 1.5.1 and 1.6.0 installs? Are they all done? If so, then we can close this issue as completed - I did 1.7.0 and we just merged the updates for develop.

@AlexanderRichert-NOAA
Collaborator

All done. The only ones not checked off in the list are two of the 1.7.0 ones, so if those are done, then let's close this puppy.

@climbfuji
Collaborator Author

All done. The only ones not checked off in the list are two of the 1.7.0 ones, so if those are done, then let's close this puppy.

They were done; I just forgot to update the list. Checked the boxes, closing!
