
Omega on GPUs with >1 node need MPICH_GPU_SUPPORT_ENABLED=1 #275

Open
mark-petersen opened this issue Feb 3, 2025 · 4 comments

Comments

@mark-petersen

Polaris runs with Omega on GPUs fail when using more than one node. We need MPICH_GPU_SUPPORT_ENABLED=1 set in the environment.

When I add this setting by hand, the test
ocean/planar/manufactured_solution/convergence_both/default
passes. Without it, the test fails at the forward_50km_75s step.
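A minimal sketch of the manual workaround (the srun flags are just the ones from the test sequence below and will vary by case; the key point is exporting the variable before launching):

# enable GPU-aware MPI in Cray MPICH before any multi-node GPU launch
export MPICH_GPU_SUPPORT_ENABLED=1
srun -N 2 -n 8 --ntasks-per-gpu=1 --gpu-bind=closest -c 1 ./omega.exe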

@mark-petersen
Author

mark-petersen commented Feb 3, 2025

This problem was reported in E3SM-Project/Omega#196.

Here is my exact test sequence, using polaris to set up the test case:


# choose one of:
export COMPILER=gnu           # CPU
export COMPILER=crayclang     # CPU
export COMPILER=gnugpu        # GPU
export COMPILER=crayclanggpu  # GPU
export MPICH_GPU_SUPPORT_ENABLED=1  # required on GPUs
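# MPICH_GPU_SUPPORT_ENABLED=1 enables GPU-aware MPI in Cray MPICH (MPI calls on GPU buffers);
# without it, the multi-node GPU runs described above fail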

CODEDIR=opr

export DATE=`date +"%y%m%d"`
export r=/lustre/orion/cli115/scratch/mpetersen/runs
export RUNDIR=$r/${DATE}_omega_${CODEDIR}_${COMPILER}

source /ccs/home/mpetersen/repos/polaris/main/load_dev_polaris_0.5.0-alpha.2_frontier_${COMPILER}_mpich.sh
export PARMETIS_ROOT=/ccs/proj/cli115/software/polaris/frontier/spack/dev_polaris_0_5_0_${COMPILER}_mpich/var/spack/environments/dev_polaris_0_5_0_${COMPILER}_mpich/.spack-env/view

rm -rf $RUNDIR 
mkdir -p ${RUNDIR}/build
cd $RUNDIR/build

module load cmake
cmake \
   -DOMEGA_CIME_COMPILER=${COMPILER} \
   -DOMEGA_BUILD_TYPE=Release \
   -DOMEGA_CIME_MACHINE=frontier \
   -DOMEGA_PARMETIS_ROOT=${PARMETIS_ROOT} \
   -DOMEGA_BUILD_TEST=ON \
   -DOMEGA_MPI_ON_DEVICE=ON \
   -Wno-dev \
   -S /ccs/home/mpetersen/repos/E3SM/${CODEDIR}/components/omega \
   -B .
./omega_build.sh
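# Note: the build above sets -DOMEGA_MPI_ON_DEVICE=ON, so MPI exchanges presumably use
# device (GPU) buffers, which would be why GPU-aware MPI support must also be enabled
# at run time via MPICH_GPU_SUPPORT_ENABLED=1.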

polaris --list
  59: ocean/planar/manufactured_solution/convergence_space/default
  60: ocean/planar/manufactured_solution/convergence_time/default
  61: ocean/planar/manufactured_solution/convergence_both/default
  62: ocean/planar/manufactured_solution/convergence_both/del2
  63: ocean/planar/manufactured_solution/convergence_both/del4

polaris setup -p $RUNDIR/build  --model=omega -w $RUNDIR -n 61 

# choose one of:
salloc -A cli115 -J inter -t 40:00 -q debug -N 1 -S 0  # CPU
salloc -A cli115 -J inter -t 1:00:00 -q debug -N 4 -p batch  #GPU


source /ccs/home/mpetersen/repos/polaris/main/load_dev_polaris_0.5.0-alpha.2_frontier_${COMPILER}_mpich.sh
cd $RUNDIR
polaris serial # runs the full suite

# test individually:
cd ocean/planar/manufactured_solution/default/forward/100km_150s/
srun -N 2 -n 8 --ntasks-per-gpu=1 --gpu-bind=closest -c 1 ./omega.exe

@mark-petersen
Author

@cbegeman and @xylar, once I ran with MPICH_GPU_SUPPORT_ENABLED=1 set in the environment, the srun flags didn't seem to matter. These all worked on a 4-node interactive reservation:

cd ocean/planar/manufactured_solution/convergence_both/default/forward/50km_75s
time srun -N 4 -c 1 ./omega.exe # uses 1 GPU/node
real	0m13.933s
time srun -N 4 -n 56 --ntasks-per-gpu=7 --gpu-bind=closest -c 1 ./omega.exe # uses 2 GPUs/node
real	0m30.768s
time srun -N 4 -n 112 --ntasks-per-gpu=7 --gpu-bind=closest -c 1 ./omega.exe # uses 4 GPUs/node
real	0m41.862s
time srun -N 4 -n 64 --ntasks-per-gpu=2 --gpu-bind=closest -c 1 ./omega.exe
real	0m31.663s
time srun -N 4 -n 64 -c 1 ./omega.exe
real	0m31.764s
time srun -N 4 -n 56 -c 1 ./omega.exe
real	0m30.436s
# except this one, which would be 8 GPUs/node:
time srun -N 4 -n 228 --ntasks-per-gpu=7 --gpu-bind=closest -c 1 ./omega.exe
srun: error: Invalid generic resource (gres) specification

Here -c 1 means --cpus-per-task=1; values greater than 1 are only useful for multi-threading.
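As a side note, 4 nodes × 8 GPUs/node × 7 tasks per GPU would be 224 tasks rather than 228, so the gres error may simply be a task count that does not match --ntasks-per-gpu. An untested guess at the corrected line:

time srun -N 4 -n 224 --ntasks-per-gpu=7 --gpu-bind=closest -c 1 ./omega.exe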

I was watching the GPU usage with

module load rocm
watch -n 0.5 rocm-smi

and could see that the GPUs were in use for all of the above commands:

[rocm-smi screenshot showing GPU utilization]

I need to understand why fewer GPUs are faster above, but for these convergence tests it appears that

srun -N [number_nodes]  -c 1 ./omega.exe

works fine. I will need to do more research into these flag settings for the performance tests.

@cbegeman
Collaborator

cbegeman commented Feb 3, 2025

@mark-petersen Glad to hear it. We can follow up elsewhere on whether we want ntasks-per-gpu to be a polaris config option.
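For reference, a purely hypothetical sketch of what such an option could look like in a user config file (the [parallel] section and ntasks_per_gpu option name are made up for illustration, not an existing polaris interface):

[parallel]
# hypothetical: tasks to assign per GPU when launching on GPU nodes
ntasks_per_gpu = 7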

@xylar
Collaborator

xylar commented Feb 6, 2025

I have added the missing environment variables to mache here:
E3SM-Project/mache#231

For this to propagate to Polaris, we will need a new mache release and then to update the mache version here in Polaris. That should happen pretty soon.
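In the meantime, the manual workaround from above still applies; a minimal sketch for anyone hitting this before the update lands:

# set by hand in the job environment until mache/polaris export it automatically
export MPICH_GPU_SUPPORT_ENABLED=1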
