
smilei v5.0 problems with compiling on GPU on new HPC #674

Closed
spadova-a opened this issue Nov 27, 2023 · 58 comments

@spadova-a

Hello there,

I would like to use Smilei with GPU on the Karolina cluster at IT4I in Ostrava and I am not sure how to compile it. So, I asked the administrator to help me with it, but he encountered the following problem - GPU compilation for A100 fails with:

src/Diagnostic/DiagnosticScalar.cpp(802): error: expected a ";"
                                        maxval = fieldval;
                                        ^
src/Diagnostic/DiagnosticScalar.cpp(803): error #547: nonstandard form for taking the address of a member function
                              ATOMIC(write)
                                     ^
src/Diagnostic/DiagnosticScalar.cpp(804): error: expected a ";"
                              i_max=i;
                              ^
src/Diagnostic/DiagnosticScalar.cpp(805): error #547: nonstandard form for taking the address of a member function
                              ATOMIC(write)
                                     ^
src/Diagnostic/DiagnosticScalar.cpp(806): error: expected a ";"
                                        j_max=j;
                                        ^
src/Diagnostic/DiagnosticScalar.cpp(807): error #547: nonstandard form for taking the address of a member function
                              ATOMIC(write)
                                     ^
src/Diagnostic/DiagnosticScalar.cpp(808): error: expected a ";"
                                        k_max=k;

I will share this issue with the administrator, since I don't know any details of his procedure. Could you please help us find the problem? Note that there were no problems with the CPU compilation.

@spadova-a spadova-a added the installation compilation, installation label Nov 27, 2023
@charlesprouveur
Contributor

charlesprouveur commented Nov 27, 2023

Hello,
We will need a bit more information to help you, i.e. what make command did you use? Did you try to use a machine file? The documentation will be updated in the near future to better guide Smilei users through compilation targeting GPU acceleration. In the meantime, there has been a discussion for V100 on the Element channel, and you can find on this GitHub an issue detailing the compilation process targeting AMD GPUs, which may nonetheless serve as inspiration.

Typically for an A100 you can look at the machine file:

smilei/scripts/compile_tools/machine/jean_zay_gpu_A100  

Your make command would look something like:

make -j 12 machine="jean_zay_gpu_A100" config="gpu_nvidia noopenmp verbose"

An example of a working environment we can recommend would be:

module purge
module load anaconda-py3/2020.11
module load nvidia-compilers/23.1
module load cuda/11.2
module load openmpi/4.1.1-cuda
module load hdf5/1.12.0-mpi-cuda
# For HDF5, note that module show can give you the right path
export HDF5_ROOT_DIR=/DIRECTORY_NAME/hdf5/1.12.0/pgi-20.4-HASH/

Regarding your specific error, it looks like you did not compile with nvc++, likely because no machine file for GPU was specified in the make command.
Edit: actually this is because you did not use a machine file: the compilation flags are missing "-DSMILEI_OPENACC_MODE"

@spadova-a
Author

Hi, sorry for the late answer, I will try to summarise what we tried.

Loaded modules:

Python/3.10.8-GCCcore-12.2.0 
HDF5/1.14.0-gompi-2022b 
NVHPC/23.7
CUDA/11.7

Created new machine file containing:

SMILEICXX.DEPS = nvcc
THRUSTCXX = nvcc

ACCELERATOR_GPU_FLAGS += -DSMILEI_OPENACC_MODE
ACCELERATOR_GPU_KERNEL_FLAGS += -DSMILEI_OPENACC_MODE

LDFLAGS += -ta=tesla:cc70 -std=c++14 -Mcudalib=curand -lcudart -lcurand -lacccuda -L${EBROOTCUDA}lib64/
CXXFLAGS +=  -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1
LDFLAGS = $LDFLAGS:$LD_LIBRARY_PATH
HDF5_ROOT_DIR = ${EBROOTHDF5}

and used the make command:
make -j 12 machine="karolina_IT4I" config="gpu_nvidia noopenmp verbose"

But we still had no luck with the compilation; the error we are getting is:

src/Params/Params.h: In static member function ‘static constexpr int Params::getGPUClusterWidth(int)’:
src/Params/Params.h:421:5: error: body of ‘constexpr’ function ‘static constexpr int Params::getGPUClusterWidth(int)’ not a return-statement
  421 |     }
      |     ^

Do you have an idea what the problem could be? Wrong compilation flags, a missing or incorrect module?

@charlesprouveur
Contributor

charlesprouveur commented Dec 4, 2023

Hi,
Looking at the modules, I do see a couple of issues:

HDF5/1.14.0-gompi-2022b 

means your HDF5 module was not compiled with your NVHPC module / the NVIDIA compiler nvc++.

I can also already predict some issues with your NVHPC module, which is quite recent:

NVHPC/23.7

it will require using '-gpu=cc70', as -ta=tesla:cc70 is deprecated after nvhpc 23.5; -Mcudalib=curand should also be removed as it is deprecated. We also know that we have an issue with the newest curand library, so you will need a fix for the header file gpuRandom.h in src/Tools/ ...
As for the CUDA version, I would recommend either 11.2 or 11.8. We currently have an issue with CUDA > 12.0.

Finally, for your specific error, could you print the command make is trying to execute (which you should be seeing thanks to the "verbose" configuration) to be sure there is nothing else?

To sum things up, the quickest way for you to use Smilei on GPU would be to:

  1. use cuda <= 11.8 and nvhpc <= 23.1 (23.2 and 23.3 may work with no changes, I just have not tested this specific configuration before)
  2. compile an HDF5 module with the NVIDIA compiler

@spadova-a
Author

So, concerning the HDF5 - there is no module on the cluster compiled with the NVIDIA compiler and with the parallel option enabled; this means I have to download and compile it myself, right?

About the CUDA - there is no CUDA 11.8 nor 11.2; will 11.3, 11.4 or 11.7 do?

Concerning the error, I am not sure where I can find this. But these are the last lines, and the error occurred, in fact, multiple times:
make: *** [build/src/Diagnostic/DiagnosticFieldsAM.o] Error 1
make: *** [build/src/Diagnostic/DiagnosticFields2D.o] Error 1
make: *** [build/src/Diagnostic/DiagnosticTrack.o] Error 1
make: *** [build/src/Diagnostic/DiagnosticParticleList.o] Error 1
make: *** [build/src/Checkpoint/Checkpoint.o] Error 1
make: *** [build/src/Collisions/BinaryProcesses.o] Error 1

@beck-llr
Contributor

beck-llr commented Dec 4, 2023

The way to go is normally to ask the administrator to make it available to you. It will benefit other potential users too.

@charlesprouveur
Contributor

charlesprouveur commented Dec 4, 2023

Regarding HDF5, as beck-llr said, that should be the job of your support team/admins (they would do something like in this comment: https://forums.developer.nvidia.com/t/how-to-build-parallel-hdf5-with-nvhpc/181361/4).

For the CUDA versions you mention, it should not be a problem. You do have to watch out for the NVHPC module though: do you have anything <= 23.1?

Finally, the errors you mentioned are just make terminating because of the error you partially showed previously.
Can you show what

 make -j 1 machine="karolina_IT4I" config="gpu_nvidia noopenmp verbose"

returns you?

@spadova-a
Author

spadova-a commented Dec 5, 2023

Hi,
ok, I will ask the support team to compile HDF5.

Yes, there is NVHPC/23.1, 22.7 and 22.2.

Still not sure what exactly you want me to show... this is everything written to the terminal:

Compiling src/Checkpoint/Checkpoint.cpp
mpicxx -Wno-reorder -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -D__VERSION=\"5.0-13-g4f145b3-master\" -DOMPI_SKIP_MPICXX -std=c++11 -Wall -Wextra -I/apps/all/HDF5/1.14.0-gompi-2022b/include -Isrc -Isrc/Profiles -Isrc/Params -Isrc/Projector -Isrc/Checkpoint -Isrc/picsar_interface -Isrc/ElectroMagnBC -Isrc/ElectroMagn -Isrc/Tools -Isrc/Patch -Isrc/Diagnostic -Isrc/PartCompTime -Isrc/ParticleBC -Isrc/Radiation -Isrc/Merging -Isrc/Interpolator -Isrc/DomainDecomposition -Isrc/Collisions -Isrc/MultiphotonBreitWheeler -Isrc/Pusher -Isrc/MovWindow -Isrc/Field -Isrc/Particles -Isrc/SmileiMPI -Isrc/ElectroMagnSolver -Isrc/Python -Isrc/Ionization -Isrc/Species -Isrc/ParticleInjector -Ibuild/src/Python -I/apps/all/Anaconda3/2023.09-0/include/python3.11 -I/apps/all/Anaconda3/2023.09-0/include/python3.11 -I/apps/all/Anaconda3/2023.09-0/lib/python3.11/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -O3 -g -DSMILEI_OPENACC_MODE -DSMILEI_ACCELERATOR_MODE -c src/Checkpoint/Checkpoint.cpp -o build/src/Checkpoint/Checkpoint.o

Edit: only the relevant part of the terminal message was kept, so the post is not too long.

@charlesprouveur
Contributor

charlesprouveur commented Dec 5, 2023

For future reference, this is what I meant:

mpicxx -Wno-reorder -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -D__VERSION=\"5.0-13-g4f145b3-master\" -DOMPI_SKIP_MPICXX -std=c++11 -Wall -Wextra -I/apps/all/HDF5/1.14.0-gompi-2022b/include -Isrc -Isrc/Profiles -Isrc/Params -Isrc/Projector -Isrc/Checkpoint -Isrc/picsar_interface -Isrc/ElectroMagnBC -Isrc/ElectroMagn -Isrc/Tools -Isrc/Patch -Isrc/Diagnostic -Isrc/PartCompTime -Isrc/ParticleBC -Isrc/Radiation -Isrc/Merging -Isrc/Interpolator -Isrc/DomainDecomposition -Isrc/Collisions -Isrc/MultiphotonBreitWheeler -Isrc/Pusher -Isrc/MovWindow -Isrc/Field -Isrc/Particles -Isrc/SmileiMPI -Isrc/ElectroMagnSolver -Isrc/Python -Isrc/Ionization -Isrc/Species -Isrc/ParticleInjector -Ibuild/src/Python -I/apps/all/Anaconda3/2023.09-0/include/python3.11 -I/apps/all/Anaconda3/2023.09-0/include/python3.11 -I/apps/all/Anaconda3/2023.09-0/lib/python3.11/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -O3 -g -DSMILEI_OPENACC_MODE -DSMILEI_ACCELERATOR_MODE -c src/Checkpoint/Checkpoint.cpp -o build/src/Checkpoint/Checkpoint.o

Because you were using a recent module (nvhpc 23.7), you were missing flags such as -gpu=cc70,cc80, -acc, etc.
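(For context, one more detail visible in that command is -std=c++11 with a GNU-based mpicxx: the earlier "body of 'constexpr' function ... not a return-statement" error is exactly what g++ reports for a multi-statement constexpr body in C++11 mode, while the GPU machine files compile with -std=c++14. A minimal illustration, not actual Smilei code:

// Rejected by g++ -std=c++11 ("body of 'constexpr' function ... not a
// return-statement"); accepted from -std=c++14 on ("relaxed constexpr").
constexpr int getClusterWidth( int dimension )
{
    if( dimension == 2 ) {  // body with more than a single return statement
        return 4;
    }
    return -1;
}

int main() { return getClusterWidth( 2 ); }
)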

To simplify your first compilation, please use nvhpc 23.1 and cuda 11.3 as you have these, and have support compile HDF5 with the compiler that comes with it.

Finally, your machine file should look like this (I saw that the Karolina cluster uses AMD CPUs + A100s):

SMILEICXX.DEPS = nvcc
THRUSTCXX = nvcc

ACCELERATOR_GPU_FLAGS += -w
ACCELERATOR_GPU_FLAGS += -tp=zen3 -ta=tesla:cc80 -std=c++14  -lcurand

ACCELERATOR_GPU_KERNEL_FLAGS += -O3 --std c++14 $(DIRS:%=-I%)
ACCELERATOR_GPU_KERNEL_FLAGS += --expt-relaxed-constexpr
ACCELERATOR_GPU_KERNEL_FLAGS += $(shell $(PYTHONCONFIG) --includes)
ACCELERATOR_GPU_KERNEL_FLAGS += -arch=sm_80
ACCELERATOR_GPU_FLAGS        += -Minfo=accel # what is offloaded/copied 
ACCELERATOR_GPU_FLAGS += -DSMILEI_OPENACC_MODE
ACCELERATOR_GPU_KERNEL_FLAGS += -DSMILEI_OPENACC_MODE

LDFLAGS += -ta=tesla:cc80 -std=c++14  -lcudart -lcurand -lacccuda -L${EBROOTCUDA}lib64/
CXXFLAGS +=  -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1
LDFLAGS = $LDFLAGS:$LD_LIBRARY_PATH
HDF5_ROOT_DIR = ${EBROOTHDF5}

If you want to try nvhpc 23.7, it should look like this:

SMILEICXX.DEPS = nvcc
THRUSTCXX = nvcc

ACCELERATOR_GPU_FLAGS += -w
ACCELERATOR_GPU_FLAGS += -tp=zen3 -gpu=cc80 -acc  -std=c++14  -lcurand

ACCELERATOR_GPU_KERNEL_FLAGS += -O3 --std c++14 $(DIRS:%=-I%)
ACCELERATOR_GPU_KERNEL_FLAGS += --expt-relaxed-constexpr
ACCELERATOR_GPU_KERNEL_FLAGS += $(shell $(PYTHONCONFIG) --includes)
ACCELERATOR_GPU_KERNEL_FLAGS += -arch=sm_80
ACCELERATOR_GPU_FLAGS        += -Minfo=accel # what is offloaded/copied 
ACCELERATOR_GPU_FLAGS += -DSMILEI_OPENACC_MODE
ACCELERATOR_GPU_KERNEL_FLAGS += -DSMILEI_OPENACC_MODE

LDFLAGS += -gpu=cc80 -std=c++14 -acc -cuda  -lcudart -lcurand -lacccuda -L${EBROOTCUDA}lib64/
CXXFLAGS +=  -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1
LDFLAGS = $LDFLAGS:$LD_LIBRARY_PATH
HDF5_ROOT_DIR = ${EBROOTHDF5}

@spadova-a
Author

Ok, thank you. I will let you know once the proper HDF5 module is ready and I try the compilation again.

@spadova-a
Author

Hi,
so I finally got the right HDF5 module available. Nonetheless, I still wasn't successful with the compilation. The latest error is this:

Linking smilei . . . -L/apps/all/HDF5/1.14.0-nvompi-2022.07/lib DFLAGS:D_LIBRARY_PATH -lhdf5 -L/apps/all/Python/3.10.4-GCCcore-11.3.0/lib -lpython3.10 -lcrypt -ldl -lm -lpthread -lutil -lm -lm -Xlinker -export-dynamic
/apps/all/binutils/2.38-GCCcore-11.3.0/bin/ld: cannot find DFLAGS:D_LIBRARY_PATH: No such file or directory
make: *** [smilei] Error 2

I guess the problem is that it is looking for DFLAGS:D_LIBRARY_PATH instead of LDFLAGS:LD_LIBRARY_PATH. However, I don't know why or how to fix it. Any ideas?

@mccoys
Contributor

mccoys commented Jan 17, 2024

Something is very wrong in your setup. Can you show the result of make env?

@iltommi
Contributor

iltommi commented Jan 17, 2024

Also, a make config=verbose can help.

@spadova-a
Author

make env:

VERSION : 5.0-57-gc23dd35-master
SMILEICXX : mpicxx
OPENMP_FLAG : -fopenmp -D_OMP
HDF5_ROOT_DIR :
FFTW3_LIB_DIR :
SITEDIR : /home/spadoalz/.local/lib/python3.10/site-packages
PYTHONEXE : python
PY_CXXFLAGS : -I/apps/all/Python/3.10.4-GCCcore-11.3.0/include/python3.10 -I/apps/all/Python/3.10.4-GCCcore-11.3.0/include/python3.10 -I/apps/all/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION
PY_LDFLAGS : -L/apps/all/Python/3.10.4-GCCcore-11.3.0/lib -lpython3.10 -lcrypt -ldl -lm -lpthread -lutil -lm -lm -Xlinker -export-dynamic
CXXFLAGS : -D__VERSION=\"5.0-57-gc23dd35-master\" -DOMPI_SKIP_MPICXX -std=c++14 -Isrc -Isrc/Profiles -Isrc/Params -Isrc/Projector -Isrc/Checkpoint -Isrc/picsar_interface -Isrc/ElectroMagnBC -Isrc/ElectroMagn -Isrc/Tools -Isrc/Patch -Isrc/Diagnostic -Isrc/PartCompTime -Isrc/ParticleBC -Isrc/Radiation -Isrc/Merging -Isrc/Interpolator -Isrc/DomainDecomposition -Isrc/Collisions -Isrc/MultiphotonBreitWheeler -Isrc/Pusher -Isrc/MovWindow -Isrc/Field -Isrc/Particles -Isrc/SmileiMPI -Isrc/ElectroMagnSolver -Isrc/Python -Isrc/Ionization -Isrc/Species -Isrc/ParticleInjector -Ibuild/src/Python -I/apps/all/Python/3.10.4-GCCcore-11.3.0/include/python3.10 -I/apps/all/Python/3.10.4-GCCcore-11.3.0/include/python3.10 -I/apps/all/Python/3.10.4-GCCcore-11.3.0/lib/python3.10/site-packages/numpy/core/include -DSMILEI_USE_NUMPY -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -O3 -g -fopenmp -D_OMP
LDFLAGS : -lhdf5 -L/apps/all/Python/3.10.4-GCCcore-11.3.0/lib -lpython3.10 -lcrypt -ldl -lm -lpthread -lutil -lm -lm -Xlinker -export-dynamic -lm -fopenmp -D_OMP
COMPILER_INFO : pgc++

and I used the machine file that was recommended a few comments above.

@charlesprouveur
Contributor

charlesprouveur commented Jan 17, 2024

remove the line:

LDFLAGS = $LDFLAGS:$LD_LIBRARY_PATH

make clean, and try again.
To add some details: LDFLAGS is supposed to contain only the flags added at the linking step, but your script redefined it in an attempt to append the LD_LIBRARY_PATH environment variable to it. Make does not expand $LDFLAGS the way a shell would: it reads the single-character variable $L (empty) followed by the literal text "DFLAGS", and likewise for $LD_LIBRARY_PATH, which is what produced the "DFLAGS:D_LIBRARY_PATH" in your error message.

I should have removed this line from your script when I adapted it.
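A minimal standalone sketch of the expansion pitfall (GNU make; not part of the Smilei makefile):

# make parses $LDFLAGS as $(L)DFLAGS: the single-character variable L
# (typically undefined, hence empty) followed by the literal text "DFLAGS".
# $LD_LIBRARY_PATH degrades to "D_LIBRARY_PATH" the same way.
LDFLAGS = $LDFLAGS:$LD_LIBRARY_PATH
$(info LDFLAGS is now "$(LDFLAGS)")   # prints: LDFLAGS is now "DFLAGS:D_LIBRARY_PATH"

all: ;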

@spadova-a
Author

spadova-a commented Jan 18, 2024

Hello,
so I was able to compile Smilei, but a test run failed:

error.txt

Do you think there is a problem with some of the modules I used to compile the code? These are the modules I used:

  1) GCCcore/11.3.0
  2) zlib/1.2.12-GCCcore-11.3.0
  3) binutils/2.38-GCCcore-11.3.0
  4) numactl/2.0.14-GCCcore-11.3.0
  5) CUDA/11.7.0
  6) NVHPC/22.7-CUDA-11.7.0
  7) XZ/5.2.5-GCCcore-11.3.0
  8) libxml2/2.9.13-GCCcore-11.3.0
  9) libpciaccess/0.16-GCCcore-11.3.0
 10) hwloc/2.7.1-GCCcore-11.3.0
 12) libevent/2.1.12-GCCcore-11.3.0
 13) UCX/1.12.1-GCCcore-11.3.0
 14) GDRCopy/2.3-GCCcore-11.3.0
 15) UCX-CUDA/1.12.1-GCCcore-11.3.0-CUDA-11.7.0
 16) libfabric/1.15.1-GCCcore-11.3.0
 17) PMIx/4.1.2-GCCcore-11.3.0
 18) UCC/1.0.0-GCCcore-11.3.0
 19) NCCL/2.12.12-GCCcore-11.3.0-CUDA-11.7.0
 20) UCC-CUDA/1.0.0-GCCcore-11.3.0-CUDA-11.7.0
 21) OpenMPI/4.1.4-NVHPC-22.7-CUDA-11.7.0
 23) Szip/2.1.1-GCCcore-11.3.0
 24) HDF5/1.14.0-nvompi-2022.07
 25) bzip2/1.0.8-GCCcore-11.3.0
 26) ncurses/6.3-GCCcore-11.3.0
 27) libreadline/8.1.2-GCCcore-11.3.0
 28) Tcl/8.6.12-GCCcore-11.3.0
 29) SQLite/3.38.3-GCCcore-11.3.0
 30) GMP/6.2.1-GCCcore-11.3.0
 31) libffi/3.4.2-GCCcore-11.3.0
 32) Python/3.10.4-GCCcore-11.3.0

@charlesprouveur
Contributor

charlesprouveur commented Jan 18, 2024

There should be nothing wrong with your modules. We are now encountering a completely different class of problems: runtime issues. From your message (please try to format it if you can; EDIT: thanks for the formatting), it crashes while computing a scalar diag.

First, what test case are you trying to run? What diags are in the namelist?
Please post the output file as well; we are missing a lot of info.

EDIT: are you using the latest version of Smilei? Post-November we added some fixes.

@spadova-a
Author

I tried to run two of the basic tutorials: thermal plasma (the error in my previous comment comes from this one) and laser propagation in vacuum (this one failed at the Fields diagnostics).
I git cloned the new version yesterday, so it should be the latest one.
Now, this is the output file:
smilei.out.txt

@charlesprouveur
Contributor

charlesprouveur commented Jan 18, 2024

In the smilei.out.txt you just provided, the reason for the failure is clear: you do not have the numpy package in the python module that is loaded. Make sure you have the packages required, as in the doc:

sphinx, h5py, numpy, matplotlib, pint (you can also add scipy).
You can do that with pip install sphinx h5py numpy matplotlib pint ffmpeg if your cluster allows it, or ask your support (they may have an anaconda package with everything already).
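As a quick sanity check that the python module you load actually provides the packages (a minimal sketch; run it with the same modules loaded as at runtime):

python -c "import numpy, h5py; print(numpy.__version__, h5py.__version__)"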

For the other tutorial that failed (thermal plasma, I think), please provide the exact input and output files. You may want to do that after you have installed the python packages and run it again.

@spadova-a
Author

yeah sorry, I loaded the wrong module. Sending the current error file
smilei.out.txt

@beck-llr
Contributor

Does it still occur with non-frozen species? It could be that the time-frozen option is not supported on GPU.

@charlesprouveur
Contributor

> yeah sorry, I loaded the wrong module. Sending the current error file smilei.out.txt

I'd like to look at your input file as well to check.
Also, the machine file you used was for execution on an A100; can you confirm this is the hardware you are trying to run Smilei on?
Finally, what does your slurm script look like?

@spadova-a
Author

input file: input.txt
yes, the cluster has NVIDIA A100 (link to the website: https://docs.it4i.cz/karolina/compute-nodes/)
slurm script: srun.txt (I also tried to run it as an interactive job allocating one gpu node)

@charlesprouveur
Contributor

charlesprouveur commented Jan 18, 2024

You are running a test case in 1D, which is not currently supported on GPU :) (it might be soon-ish; check the list of currently supported features here).
Edit: an additional comment: trying such a small test case on 8 GPUs might be an issue (here you would have 4 points plus the ghost cells for each patch, with one patch per GPU); in theory it should be ok, but...

@spadova-a
Author

spadova-a commented Jan 18, 2024

ok, that was a pretty silly mistake... I tried another case (input file: input.txt, https://github.com/SmileiPIC/Smilei/files/13980089/input.txt)
but it is still not working (output: out.txt)

@charlesprouveur
Contributor

charlesprouveur commented Jan 18, 2024

So we are back to the CUDA device error.

In your slurm script I don't see you loading the environment you used at compile time. Typically mine looks like this:

#!/bin/bash
#SBATCH --job-name=smilei            # Job name
#SBATCH -A account
#SBATCH --partition=YOUR_GPU_PARTITION_NAME            # Partition to use
##SBATCH --qos=YOUR_QUEUE
#SBATCH --ntasks=8                   # total Number of MPI processes (= total number of GPU)
#SBATCH --ntasks-per-node=8    # number of MPI rank per node 
#SBATCH --gres=gpu:8                 # GPU number per node
#SBATCH --cpus-per-task=6           
#SBATCH --hint=nomultithread         
#SBATCH --time=00:10:00             
#SBATCH --output=output        # Name of the output file
#SBATCH --error=error         # Name of the error file

# Smilei specific env
source smilei_gpu_env_23.1.sh

set -x

# execution with binding via bind_gpu.sh : 1 GPU per MPI.
srun /gpfslocalsup/pub/idrtools/bind_gpu.sh  ./smilei input.py

while bind_gpu.sh looks like this (it might not be required here though):

#!/bin/bash

LOCAL_RANK=${MPI_LOCALRANKID} # mpirun Intel MPI
if [ -z "${LOCAL_RANK}" ]; then LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK}; fi # mpirun OpenMPI
if [ -z "${LOCAL_RANK}" ]; then LOCAL_RANK=${SLURM_LOCALID}; fi  # srun 

export CUDA_VISIBLE_DEVICES=${LOCAL_RANK}

"$@"

Try again with sourcing the compilation environment in your slurm script; you might just be missing that.
If that does not work:
Doing a bit of googling (https://forums.developer.nvidia.com/t/cudalaunchkernel-returned-status-98-invalid-device-function/169958), this seems to confirm my suspicion that something could have gone wrong with the machine file. Can you do make clean and recompile + execute with the new binary, just to be sure?
If that does not work, share the machine file you are currently using, and also look in nvcc -h at what --gpu-architecture shows you (as in, what sm options are available).
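For reference, recent CUDA toolkits can also list the supported targets directly, which is quicker than digging through nvcc -h (assuming your nvcc is new enough to have these flags):

nvcc --list-gpu-arch    # supported compute_XX (virtual) architectures
nvcc --list-gpu-code    # supported sm_XX (real) architectures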

@spadova-a
Author

Hi, sorry, I have never used an environment for the compilation before; I just loaded the modules and did the same thing in the submission script. Therefore, I don't really know what an environment should look like; I did some googling but it did not help me much... Could you please provide me with an example or some guidelines?

@charlesprouveur
Contributor

In your slurm script I can only see:

ml purge
ml load HDF5/1.14.0-nvompi-2022.07
ml Python/3.10.4-GCCcore-11.3.0

ergo, unless the running environment includes nvhpc, cuda & openmpi by default, I don't see how your executable can access its dependencies.

Can you add "module list" in your slurm script and run it so we can see what is available at runtime?
What i call an environment is simply the module & environment variables available to your executable. Usually one uses a script to load the appropriate modules at runtime (or lists the 'module load' commands in the slurm script)
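For example, a minimal environment script for your case could look like this (a sketch built from the module names you listed earlier; the script name is arbitrary, so adjust to what your cluster provides):

#!/bin/bash
# smilei_env.sh -- source this both before compiling and in the slurm script
ml purge
ml HDF5/1.14.0-nvompi-2022.07     # pulls in NVHPC, CUDA and OpenMPI as dependencies
ml Python/3.10.4-GCCcore-11.3.0
export HDF5_ROOT_DIR=${EBROOTHDF5}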

Also, in your latest output I see one MPI process and 8 patches. Are you trying to run on 1 or 8 GPUs?

@spadova-a
Author

Here is the output file with the module list (the HDF5 module loads a lot of other modules as its dependencies): out.txt

I am trying to run on 8 GPUs, as I am only able to allocate a full node, which has 8 GPUs. I also had 8 MPI processes in the slurm script and got the same error, with every process printing the same error message in the output file, so for testing purposes I set only 1 MPI process so the output file wouldn't be so long.

@mccoys
Contributor

mccoys commented Jan 25, 2024

You should load NVHPC when you compile Smilei

@charlesprouveur
Contributor

That seems to be the case, although the fact that there is another cuda module loaded is not great.

@spadova-a Can you do make clean and recompile + execute with the new binary, just to be sure?
If that does not work, share the machine file you are currently using, and also look in nvcc -h at what --gpu-architecture shows you (as in, what sm options are available).

@spadova-a
Author

Hi, sorry for the inactivity, right now I have a lot of work to do. I will give the installation a new try soon.

@Horymir001

Dear colleagues,
I have observed this discussion for some time. Karolina underwent an upgrade recently and there are new modules available now, so I tried to compile Smilei with GPU acceleration as well. However, I have not succeeded so far.

I tried the compilation on an accelerated node with 8 A100 GPUs.

(base) [it4i-vojtech@login2.karolina Smilei]$ salloc -A DD-23-157 -p qgpu_exp -N 1 --ntasks-per-node 16 --gpus 8 -t 00:40:00
salloc: Granted job allocation 1032620
salloc: Waiting for resource configuration
salloc: Nodes acn17 are ready for job
(base) [it4i-vojtech@acn17.karolina Smilei]$ 

There, I tried to load the proper modules, and there is a good candidate indeed.


(base) [it4i-vojtech@acn17.karolina Smilei]$ module spider HDF5

...
     Versions:
        HDF5/1.12.1-gompi-2021b
        HDF5/1.12.2-gompi-2022a
        HDF5/1.12.2-iimpi-2022a
        HDF5/1.14.0-gompi-2023a
        HDF5/1.14.0-iimpi-2022b-serial
        HDF5/1.14.0-iimpi-2022b
        HDF5/1.14.3-gompi-2023b
        HDF5/1.14.3-iimpi-2023b
        HDF5/1.14.3-NVHPC-24.1-CUDA-12.4.0
        HDF5/1.14.3-NVHPC-24.3-CUDA-12.3.0

...

Let's try the last one then.

(base) [it4i-vojtech@acn17.karolina Smilei]$ ml HDF5/1.14.3-NVHPC-24.3-CUDA-12.3.0

These are all the loaded modules:

(base) [it4i-vojtech@acn17.karolina Smilei]$ module list

Currently Loaded Modules:
  1) GCCcore/12.2.0
  2) zlib/1.2.12-GCCcore-12.2.0
  3) binutils/2.39-GCCcore-12.2.0
  4) numactl/2.0.16-GCCcore-12.2.0
  5) CUDA/12.3.0
  6) NVHPC/24.3-CUDA-12.3.0
  7) XZ/5.2.7-GCCcore-12.2.0
  8) libxml2/2.10.3-GCCcore-12.2.0
  9) libpciaccess/0.17-GCCcore-12.2.0
 10) hwloc/2.8.0-GCCcore-12.2.0
 11) libevent/2.1.12-GCCcore-12.2.0
 12) UCX/1.16.0-GCCcore-12.2.0
 13) GDRCopy/2.4.1-GCCcore-12.2.0
 14) UCX-CUDA/1.16.0-GCCcore-12.2.0-CUDA-12.3.0
 15) libfabric/1.16.1-GCCcore-12.2.0
 16) PMIx/4.2.2-GCCcore-12.2.0
 17) UCC/1.3.0-GCCcore-12.2.0
 18) NCCL/2.21.5-GCCcore-12.2.0-CUDA-12.3.0
 19) UCC-CUDA/1.3.0-GCCcore-12.2.0-CUDA-12.3.0
 20) OpenMPI/4.1.6-NVHPC-24.3-CUDA-12.3.0
 21) Szip/2.1.1-GCCcore-12.2.0
 22) HDF5/1.14.3-NVHPC-24.3-CUDA-12.3.0

We have OpenMPI,

 (base) [it4i-vojtech@acn17.karolina Smilei]$ which mpicc
/apps/all/OpenMPI/4.1.6-NVHPC-24.3-CUDA-12.3.0/bin/mpicc
(base) [it4i-vojtech@login2.karolina Smilei]$ ls /apps/all/OpenMPI/4.1.6-NVHPC-24.3-CUDA-12.3.0/bin/
aggregate_profile.pl  mpif90        ortecc       oshcc           shmemc++
mpic++                mpifort       orte-clean   oshCC           shmemcc
mpicc                 mpirun        orted        oshcxx          shmemCC
mpiCC                 ompi-clean    orte-info    oshfort         shmemcxx
mpicxx                ompi_info     orterun      oshmem_info     shmemfort
mpiexec               ompi-server   orte-server  oshrun          shmemrun
mpif77                opal_wrapper  oshc++       profile2mat.pl

and proper python

 (base) [it4i-vojtech@acn17.karolina Smilei]$ python
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>>

I created a primitive machine file "karolina" containing just two lines:

SMILEICXX_DEPS = g++
CXXFLAGS += -gpu=cc80 -acc

Then I tried

(base) [it4i-vojtech@acn17.karolina Smilei]$ make machine="karolina" config="gpu_nvidia" 
Compiling src/Checkpoint/Checkpoint.cpp
"src/Tools/H5.h", line 11: catastrophic error: #error directive: "HDF5 was not built with --enable-parallel option"
  #error "HDF5 was not built with --enable-parallel option"
   ^

1 catastrophic error detected in the compilation of "src/Checkpoint/Checkpoint.cpp".
Compilation terminated.
make: *** [makefile:369: build/src/Checkpoint/Checkpoint.o] Error 2

It seems to me that despite the name of the module HDF5/1.14.3-NVHPC-24.3-CUDA-12.3.0, HDF5 was not built properly. Do you think that is possible?

I might eventually try to compile HDF5 myself according to your instructions as well.

@charlesprouveur
Contributor

charlesprouveur commented May 8, 2024

Hi,
It is very likely HDF5 was not properly built.

Preface: no test has been done with the latest nvhpc versions (i.e. 24.0 and above), but it "should" work.

Here is an example of how I do it on my machine with nvhpc 23.11 that you can use as a reference:
( note that in your case "/.../YOUR_DIRECTORY/modulefiles/nvhpc/23.11" should be replaced with NVHPC/24.3-CUDA-12.3.0 )

cd YOUR_DIRECTORY
mkdir tools
cd tools
 
wget https://github.com/HDFGroup/hdf5/releases/download/hdf5-1_14_2/hdf5-1_14_2.tar.gz
 
tar xzfv hdf5-1_14_2.tar.gz
cd hdfsrc/
mkdir build
cd build
module load /.../YOUR_DIRECTORY/modulefiles/nvhpc/23.11 cmake
cmake -DCMAKE_C_COMPILER=`which mpicc` -DCMAKE_INSTALL_PREFIX=/gpfswork/rech/YOUR_DIRECTORY/tools/hdfsrc/install -DHDF5_ENABLE_PARALLEL=ON ..
make
make install

It seems the person who installed your HDF5 module did not include the "-DHDF5_ENABLE_PARALLEL=ON" option in their install script.
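(If you want to double-check an existing install, one option is the libhdf5.settings file that HDF5 ships in its lib directory; the path below is just an example:)

grep -i parallel /apps/all/HDF5/1.14.3-NVHPC-24.3-CUDA-12.3.0/lib/libhdf5.settings
# a parallel build reports a line like:  Parallel HDF5: yes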

Once your hdf5 install is finished you should

export HDF5_ROOT_DIR=YOUR_DIRECTORY/tools/hdfsrc/install
export LD_LIBRARY_PATH=YOUR_DIRECTORY/tools/hdfsrc/install/lib/:$LD_LIBRARY_PATH

at compile time and runtime.
At compile time you might need to change your machine file:

SMILEICXX_DEPS = g++ -I/YOUR_DIRECTORY/tools/hdfsrc/install/include/

GPU_COMPILER = nvcc -I/YOUR_DIRECTORY/tools/hdfsrc/install/include/ 

@Horymir001

Hi Charles,
thanks for your advice.
I compiled HDF5 in the following way:

salloc -A DD-23-157 -p qgpu_exp -N 1 --ntasks-per-node 16 --gpus 8 -t 00:40:00
ml OpenMPI/4.1.6-NVHPC-23.11-CUDA-12.2.0 # SAME NVHPC YOU RECOMMEND
ml CMake/3.24.3-GCCcore-12.2.0

cd
mkdir myHDF5
cd myHDF5
mkdir tools
cd tools
 wget https://github.com/HDFGroup/hdf5/releases/download/hdf5-1_14_2/hdf5-1_14_2.tar.gz
 tar xzfv hdf5-1_14_2.tar.gz
cd hdfsrc/
mkdir build
cd build
cmake -DCMAKE_C_COMPILER=`which mpicc` -DCMAKE_INSTALL_PREFIX=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install/ -DHDF5_ENABLE_PARALLEL=ON ..
make -j 50
make install
export HDF5_ROOT_DIR=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install
export LD_LIBRARY_PATH=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install/lib/:$LD_LIBRARY_PATH

This installation was successful.

Then I prepared this machine file karolina.

SMILEICXX_DEPS = g++ -I//home/it4i-vojtech/myHDF5/tools/hdfsrc/install/include/
GPU_COMPILER = nvcc -I//home/it4i-vojtech/myHDF5/tools/hdfsrc/install/include/ 
CXXFLAGS += -gpu=cc80 -acc

Then I attempted to compile Smilei

make clean
make -j 50 machine="karolina" config="gpu_nvidia" > output.log 2> error.log

error.log
output.log

Typical errors are:

src/Projector/Projector2D2OrderGPUKernelCUDAHIP.cu(1186): error: calling a constexpr __device__ function("Params::getGPUClusterWidth(int)") from a __host__ function("currentDepositionKernel2D") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
          do { if( !( Params::getGPUClusterWidth( 2 ) != -1 && Params::getGPUClusterGhostCellBorderWidth( 2 ) != -1 ) ) { {{{std::string line = " "; for (int __ic =0; __ic < 80 ; __ic++) line += "-"; std::cerr << "\033[1;31m" << line << "\n [" << "ERROR" << "] " << "src/Projector/Projector2D2OrderGPUKernelCUDAHIP.cu" << ":" << 1186 << " (" << __FUNCTION__ << ") " << "Params::getGPUClusterWidth( 2 ) != -1 && Params::getGPUClusterGhostCellBorderWidth( 2 ) != -1" << "\n" << line << "\033[0m" << std::endl;}; raise(

and

src/Particles/nvidiaParticles.cu(697): error: calling a constexpr __device__ function("_ZN6Params18getGPUClusterWidthE1?") from a __host__ function("computeParticleClusterKey") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
                                       Cluster3D<Params::getGPUClusterWidth( 3 )>{ parameters.res_space[0],
                                                 ^

"/apps/all/CUDA/12.2.0/include/crt/host_defines.h", line 86: warning: incompatible redefinition of macro "__forceinline__" (declared at line 39 of "/apps/all/CUDA/12.2.0/include/cuda/std/detail/__config") [bad_macro_redef]
  #define __forceinline__ \
          ^

8 errors detected in the compilation of "src/Particles/nvidiaParticles.cu".
make: *** [makefile:374: build/src/Particles/nvidiaParticles.o] Error 2
NVC++-W-1053-External and Static

I think I need to specify the flags better; however, I do not know how.

@mccoys
Contributor

mccoys commented May 9, 2024

Try to add --expt-relaxed-constexpr in the variable GPU_COMPILER_FLAGS

@Horymir001

Done. Different errors popped up:
error.log
output.log

They are of this kind

src/Projector/Projector3D2OrderGPUKernelCUDAHIP.cu(85): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)
                      ::atomicAdd( a_pointer, a_value );
src/Profiles/Function.h(655): warning #611-D: overloaded virtual function "Function::valueAt" is only partially overridden in class "Function_Polygonal2D"
  class Function_Polygonal2D : public Function
        ^

make: *** [makefile:374: build/src/Projector/Projector3D2OrderGPUKernelCUDAHIP.o] Error 1

@charlesprouveur
Contributor

Assuming you did a "make clean" before compiling again, I am thinking you do not have the -arch option specified in your machine file for GPU_COMPILER_FLAGS.
As an example in my machine file:

(...)
CXXFLAGS += -w
CXXFLAGS += -acc=gpu -gpu=cc86,fastmath -std=c++14  -lcurand # do not put -cuda here

GPU_COMPILER_FLAGS += -O2 --std c++14 $(DIRS:%=-I%) 

GPU_COMPILER_FLAGS += --expt-relaxed-constexpr
GPU_COMPILER_FLAGS += $(shell $(PYTHONCONFIG) --includes)
GPU_COMPILER_FLAGS += -arch=sm_86 #native #--generate-code arch=compute_86,code=sm_86  
CXXFLAGS        += -Minfo=accel # what is offloaded/copied

LDFLAGS += -acc=gpu -gpu=cc86  -cudalib=curand  # ccnative also works
CXXFLAGS += -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1  -std=c++14 

@Horymir001

Horymir001 commented May 9, 2024

Thank you both. We can move forward, as the compilation was successful. It failed at runtime though.

  1. Compilation of HDF5 as in my post 4 hours ago.
  2. Machine file karolina
SMILEICXX_DEPS = g++ -I//home/it4i-vojtech/myHDF5/tools/hdfsrc/install/include/
GPU_COMPILER = nvcc -I//home/it4i-vojtech/myHDF5/tools/hdfsrc/install/include/ --expt-relaxed-constexpr
CXXFLAGS += -gpu=cc80 -acc
CXXFLAGS += -w
CXXFLAGS += -acc=gpu -gpu=cc80,fastmath -std=c++14  -lcurand # do not put -cuda here
GPU_COMPILER_FLAGS += -O2 --std c++14 $(DIRS:%=-I%) 
GPU_COMPILER_FLAGS += --expt-relaxed-constexpr
GPU_COMPILER_FLAGS += $(shell $(PYTHONCONFIG) --includes)
GPU_COMPILER_FLAGS += -arch=sm_80 #native #--generate-code arch=compute_80,code=sm_80  
CXXFLAGS        += -Minfo=accel # what is offloaded/copied
LDFLAGS += -acc=gpu -gpu=cc80  -cudalib=curand  # ccnative also works
CXXFLAGS += -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1  -std=c++14 
  3. Compilation of Smilei GPU
export HDF5_ROOT_DIR=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install
export LD_LIBRARY_PATH=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install/lib/:$LD_LIBRARY_PATH
make clean
make -j 50 machine="karolina" config="gpu_nvidia"
  4. Launching attempt

I took a slightly modified example file for 2D LWFA with GPU computing on.
tst2d_04_laser_wake.py.txt

salloc -A DD-23-157 -p qgpu_exp -N 1 --ntasks-per-node 16 --gpus 8 -t 00:40:00

ml OpenMPI/4.1.6-NVHPC-23.11-CUDA-12.2.0 # SAME NVHPC YOU RECOMMEND
ml CMake/3.24.3-GCCcore-12.2.0
export HDF5_ROOT_DIR=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install
export LD_LIBRARY_PATH=/home/it4i-vojtech/myHDF5/tools/hdfsrc/install/lib/:$LD_LIBRARY_PATH
srun /home/it4i-vojtech/Smilei/smilei tst2d_04_laser_wake.py > output_smilei.log 2> error_smilei.log

It runs for half a minute, writes some outputs, and then fails. I watched the nvidia-smi output over time; it ran on up to three GPUs (of 8). Here are the outputs:

error_smilei.log
output_smilei.log

I think it is only a question of proper submission now. Accelerated nodes at Karolina have 128 cores and 8 x NVIDIA A100, i.e. 16 cores per GPU. For some reason, 16 processes run with the run command shown above.

@mccoys
Contributor

mccoys commented May 9, 2024

You probably want -arch=sm_80 instead of -arch=sm_86, as suggested by the error.

@Horymir001

> You probably want -arch=sm_80 instead of -arch=sm_86, as suggested by the error.

I edited the previous reply.

@mccoys
Contributor

mccoys commented May 9, 2024

It could be a memory issue. Try with fewer particles?

@Horymir001

I tried now even with one particle per cell. Still the same error.

@charlesprouveur
Contributor

As far as I can see, the input file contains not-yet-supported features, such as the filter and the load balancing, for instance.

@Horymir001

Oh, I did not think about it! Could you please recommend some safe input for a test?

@charlesprouveur
Contributor

charlesprouveur commented May 9, 2024

Here is a namelist that I used to benchmark an A100 (note that this is in 3D with no moving window; also, we use one patch as that is best for GPUs, and for multiple GPUs you have to increase the number of patches proportionally):


import math as m
import numpy as np
import os

c = 299792458
lambdar = 1e-6                  # reference wavelength
wr = 2*m.pi*c/lambdar

temperature   = 100./511.                               # electron & ion temperature in me c^2

density  = 0.01

# plasma wavelength
lambdap = 2*m.pi/density

# Debye length in units of c/\omega_{pe}
Lde = m.sqrt(temperature)

dx = 0.5*Lde
dy = dx
dz = dx

dt  = 0.5 * dx /m.sqrt(3.)              # timestep (0.95 x CFL)

Lx = 128*dx
Ly = 128*dy
Lz = 128*dz

# Simulation time
simulation_time  = 100*dt

particles_per_cell = 8

number_of_patches = [1,1,1]

position_initialization = 'random'

gpu_computing = True
vectorization = "off"

Main(
    geometry = "3Dcartesian",

    interpolation_order = 2,

    timestep = dt,
    simulation_time = simulation_time,

    cell_length  = [dx,dy,dz],
    grid_length = [Lx,Ly,Lz],

    number_of_patches = number_of_patches,

    EM_boundary_conditions = [ ["periodic"] ],

    print_every = 100,

    gpu_computing = gpu_computing,

    random_seed = smilei_mpi_rank,
)

Vectorization(
   mode=vectorization,
)

Species(
    name = "proton",
    position_initialization = position_initialization,
    momentum_initialization = "mj",
    particles_per_cell = particles_per_cell,
    c_part_max = 1.0,
    mass = 1836.0,
    charge = 1.0,
    charge_density = density,
    mean_velocity = [0., 0.0, 0.0],
    temperature = [temperature],
    pusher = "boris",
    boundary_conditions = [
        ["periodic", "periodic"],
        ["periodic", "periodic"],
        ["periodic", "periodic"],
    ],
)
Species(
    name = "electron",
    position_initialization = "proton",
    momentum_initialization = "mj",
    particles_per_cell = particles_per_cell,
    c_part_max = 1.0,
    mass = 1.0,
    charge = -1.0,
    charge_density = density,
    mean_velocity = [0., 0.0, 0.0],
    temperature = [temperature],
    pusher = "boris",
    boundary_conditions = [
        ["periodic", "periodic"],
        ["periodic", "periodic"],
        ["periodic", "periodic"],
    ],
)

DiagScalar(every = 10)

fields = ["Ex", "Ey", "Ez", "Jx","Jy","Jz","Rho"]

diag_species_list = ["Jx","Jy","Jz","Rho"]
species_list = ["electron", "proton"]

for diag in diag_species_list:
    for species in species_list:
        fields.append(diag + "_" + species)

DiagFields(
    #name = "my field diag",
    every = 50,
    fields = fields,
    #subgrid = None
)

DiagParticleBinning(
    deposited_quantity = "weight",
    every = 50,
    time_average = 1,
    species = ["electron"],
    axes = [
        ["x", 0., Lx, 128],
        ["y", 0., Ly, 128],
        ["z", 0., Lz, 128]
    ]
)

DiagParticleBinning(
    deposited_quantity = "weight",
    every = 50,
    time_average = 1,
    species = ["proton"],
    axes = [
        ["x", 0., Lx, 128],
        ["y", 0., Ly, 128],
        ["z", 0., Lz, 128]
    ]
)

DiagParticleBinning(
    deposited_quantity = "weight_ekin",
    every = 50,
    time_average = 1,
    species = ["electron"],
    axes = [
        ["x", 0., Lx, 128],
        ["y", 0., Ly, 128]
    ]
)

DiagProbe(
    #name = "my_probe",
    every    = 50,
    origin   = [0., 0., 0.5*Lz],
    corners  = [
        [Lx,0.,0.5*Lz],
        [0.,Ly,0.5*Lz],
    ],
    number   = [32, 32],
    fields   = fields,
)

@Horymir001

Great, this one runs to the end!
I increased the number of timesteps and observed the output of nvidia-smi. It seems it uses only one GPU out of the 8 available. Do you have an idea how to improve this?

(base) [it4i-vojtech@acn17.karolina ~]$ nvidia-smi 
Thu May  9 16:15:43 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:07:00.0 Off |                    0 |
| N/A   46C    P0            182W /  400W |    7347MiB /  40960MiB |     93%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          Off |   00000000:0B:00.0 Off |                    0 |
| N/A   34C    P0             65W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          Off |   00000000:48:00.0 Off |                    0 |
| N/A   29C    P0             63W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          Off |   00000000:4C:00.0 Off |                    0 |
| N/A   31C    P0             67W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          Off |   00000000:88:00.0 Off |                    0 |
| N/A   28C    P0             62W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          Off |   00000000:8B:00.0 Off |                    0 |
| N/A   31C    P0             64W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          Off |   00000000:C8:00.0 Off |                    0 |
| N/A   29C    P0             63W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          Off |   00000000:CB:00.0 Off |                    0 |
| N/A   29C    P0             63W /  400W |     425MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei             7338MiB |
|    1   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A     66146      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
+-----------------------------------------------------------------------------------------+
(base) [it4i-vojtech@acn17.karolina ~]$ 

@mccoys
Contributor

mccoys commented May 9, 2024

You must define a binding between processes and GPUs, typically using a binding file, or using the proper options for your queue manager (such as slurm); see the sketch below.
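A sketch of two common options (exact slurm flag availability depends on your site's configuration; the wrapper approach uses the bind_gpu.sh script shown earlier in this thread):

# let slurm hand each MPI rank its own GPU (recent slurm versions):
srun --ntasks=8 --gpus-per-task=1 ./smilei input.py
# or wrap the executable so each local rank sets CUDA_VISIBLE_DEVICES itself:
srun --ntasks=8 ./bind_gpu.sh ./smilei input.py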

@Horymir001

Well, thank you. I am not sure I am capable of figuring it out myself. I guess I should try to discuss it with the cluster user support.

@charlesprouveur
Contributor

charlesprouveur commented May 9, 2024

The fact that it ran on one GPU is what we asked for in the input file, since there was only one patch.

As for the binding script, it may not be necessary in your case; simply change


Lx = 128*dx
Ly = 128*dy
Lz = 128*dz

# Simulation time
simulation_time  = 100*dt

particles_per_cell = 8

number_of_patches = [1,1,1]

to


Lx = 256*dx
Ly = 256*dy
Lz = 256*dz

# Simulation time
simulation_time  = 100*dt

particles_per_cell = 8

number_of_patches = [2,2,2]

(increasing the size of the problem and the number of patches to have an equivalent load on each GPU)

and in your slurm command you would specify something like:

#SBATCH --ntasks=8                   # Number of MPI processes (= total number of GPU)
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8          #  MPI tasks  per node (= number of GPU per node)
#SBATCH --gres=gpu:8                 # number of GPU per node
#SBATCH --cpus-per-task=4           # number of  CPU core per task

See if that crashes / works

@Horymir001

I could not do #SBATCH --cpus-per-task=4. But otherwise, it seems fine to me so far!

I will try to do some more testing tomorrow! Thanks.

(base) [it4i-vojtech@acn33.karolina ~]$ nvidia-smi
Thu May  9 16:46:11 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:07:00.0 Off |                    0 |
| N/A   35C    P0             81W /  400W |   11067MiB /  40960MiB |     88%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          Off |   00000000:0B:00.0 Off |                    0 |
| N/A   35C    P0             68W /  400W |   11067MiB /  40960MiB |     88%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          Off |   00000000:48:00.0 Off |                    0 |
| N/A   33C    P0            142W /  400W |   11067MiB /  40960MiB |     94%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          Off |   00000000:4C:00.0 Off |                    0 |
| N/A   36C    P0             70W /  400W |   11067MiB /  40960MiB |     94%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          Off |   00000000:88:00.0 Off |                    0 |
| N/A   34C    P0             67W /  400W |   10757MiB /  40960MiB |     94%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          Off |   00000000:8B:00.0 Off |                    0 |
| N/A   36C    P0             66W /  400W |   11067MiB /  40960MiB |     93%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          Off |   00000000:C8:00.0 Off |                    0 |
| N/A   35C    P0             93W /  400W |   11067MiB /  40960MiB |     63%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          Off |   00000000:CB:00.0 Off |                    0 |
| N/A   36C    P0            160W /  400W |   11067MiB /  40960MiB |     89%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
|    0   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    0   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    0   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    0   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    0   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    0   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    0   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
|    1   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    1   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
|    2   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    2   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
|    3   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    3   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei             7794MiB |
|    4   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    4   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
|    5   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    5   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    6   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
|    6   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879260      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879261      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879262      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879263      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879264      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879265      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879266      C   /home/it4i-vojtech/Smilei/smilei              416MiB |
|    7   N/A  N/A    879267      C   /home/it4i-vojtech/Smilei/smilei             8104MiB |
+-----------------------------------------------------------------------------------------+
(base) [it4i-vojtech@acn33.karolina ~]$ 

@charlesprouveur
Contributor

Glad we could help :)
Considering that the original issue is solved, I think we can close this one unless @spadova-a has further questions.

@mccoys
Contributor

mccoys commented May 9, 2024

Side note: we really need to explain, in the GPU documentation, that there should be 1 process per GPU, and that it is best to have about 1 patch per GPU.

@charlesprouveur
Contributor

For one patch per GPU: it is here (hidden in Parallelization & optimization).
For the "one MPI rank per GPU" part, it should indeed be added there.

@mccoys
Contributor

mccoys commented May 9, 2024

Ok, I think we really need one page dedicated to GPU, with links to other places if necessary.

@charlesprouveur
Contributor

Agreed

@mccoys mccoys closed this as completed May 9, 2024
@Horymir001

Hi, could you include the machine file in the code? Here are my suggestions; the comments include the installation description.
karolina.txt

@charlesprouveur
Contributor

We might add it in /scripts/compile_tools/machine/ with the other machine scripts, likely under "karolina_gpu".
