Compilation with GPU accelerated nodes on the HPC Niflheim #750
Replies: 8 comments 8 replies
-
Thank you kindly.
And yes, I believe I need to install nvhpc. I will try to set the flags correctly as well.
-
Hi again,
Sorry for the late reply. I have been trying to compile OpenMPI and HDF5 locally on Niflheim, but it has proven too difficult due to the setup on the HPC. However, we managed to compile SMILEI locally with GPU support on a Linux computer.
The compilation was successful, but the simulation seems unable to complete a timestep when running with GPU acceleration.
PACE_2D_1.0.txt is the namelist in txt format, with gpu_computing = True set for the GPU run. We also ran the simulation with CPUs only.
Attached are the outputs from the CPU and the GPU runs.
The CPU version was run with
mpirun -n 4 build_cpu/smilei PACE_2D_1.0_cpu.py &> out_cpu.txt
The GPU version was run with
mpirun -n 1 build_nvidia/smilei PACE_2D_1.0.py &> out_gpu.txt
We also changed the number of patches from [16, 16] to [1, 1] when running with GPU acceleration.
The CPU run completes a few timesteps, but the GPU run doesn't complete any. 7 GB is allocated on the GPU, so it seems to be initializing just fine.
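As a rough cross-check of the 7 GB figure, the particle count and the average per-particle footprint can be estimated from the namelist values (128 x 128 cells, 4096 particles per cell, two species) and the 6400 MB that the Smilei log reports for particles. This is only a sketch: the function names are illustrative, "MB" is assumed to mean MiB (consistent with 6400 MB being reported as 6.25 GB), and the bytes-per-particle value is derived from the log, not from Smilei's internals.

```python
def total_particles(cells_per_dim, particles_per_cell, n_species):
    """Number of macro-particles for a uniformly loaded 2D grid."""
    nx, ny = cells_per_dim
    return nx * ny * particles_per_cell * n_species


def bytes_per_particle(reported_mib, n_particles):
    """Average footprint implied by a memory figure reported in MiB."""
    return reported_mib * 1024**2 / n_particles


# Values taken from the namelist (N_cells = 128, particles_per_cell = 4096, 2 species)
n = total_particles((128, 128), 4096, 2)
print(n)                            # 134217728, i.e. 2 x 67108864 as in the log
print(bytes_per_particle(6400, n))  # 50.0
```

The particle arrays alone therefore account for most of the memory seen on the GPU, which suggests that allocation and the initial copy went through.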
Finally, nvidia_env is the one from the guide used, but we had to replace one $NVDIR with gcc to recover from a bug.
I was wondering if you by chance could see what our mistake is? 🙂
Many thanks again for your patience, and best wishes, Johan
On Friday, October 11, 2024 at 12:01 PM, charlesprouveur wrote to Johan Kølsen de Wit:
I am afraid I don't have any experience with EasyBuild.
As a first step, we can try the OpenMPI bundled with nvhpc (although I expect issues at runtime) to complete the compilation of Smilei.
Once that is done and we see runtime issues, we can go into the details of a local OpenMPI install.
Were you able to create a module of HDF5 compiled with nvc++?
If not, you can look at https://smileipic.github.io/Smilei/Use/install_linux_GPU.html for a local install of HDF5 with nvc++.
Attachment out_gpu.txt (output of the GPU run):
_ _
___ _ | | _ \ \ Version : 5.1-34-g60be16288-master
/ __| _ __ (_) | | ___ (_) | |
\__ \ | ' \ _ | | / -_) _ | |
|___/ |_|_|_| |_| |_| \___| |_| | |
/_/
Reading the simulation parameters
-------------------------------------------------------------------------------
HDF5 version 1.14.2
Python version 3.10.12
Parsing pyinit.py
Parsing 5.1-34-g60be16288-master
Parsing pyprofiles.py
Parsing PACE_2D_1.0.py
Parsing pycontrol.py
Check for function preprocess()
python preprocess function does not exist
Calling python _smilei_check
Calling python _prepare_checkpoint_dir
Calling python _keep_python_running() :
CAREFUL: Patches distribution: hilbertian
Smilei will run on GPU devices
[WARNING] src/Params/Params.cpp:1171 (compute) simulation_time has been redefined from 18.221237 to 18.219817 to match timestep.
Geometry: 2Dcartesian
-------------------------------------------------------------------------------
Interpolation order : 2
Maxwell solver : Yee
simulation duration = 18.219817, total number of iterations = 2856
timestep = 0.006379 = 0.950000 x CFL, time resolution = 156.752402
Grid length: 1.21559, 1.21559
Cell length: 0.0094968, 0.0094968, 0
Number of cells: 128, 128
Spatial resolution: 105.299, 105.299
Cell sorting: activated
Electromagnetic boundary conditions
-------------------------------------------------------------------------------
xmin silver-muller, absorbing vector [1, 0]
xmax silver-muller, absorbing vector [-1, -0]
ymin silver-muller, absorbing vector [0, 1]
ymax silver-muller, absorbing vector [-0, -1]
Vectorization:
-------------------------------------------------------------------------------
Mode: adaptive
Default mode: off
Time selection: never
Calling python writeInfo
Initializing MPI
-------------------------------------------------------------------------------
MPI_THREAD_MULTIPLE enabled
Number of MPI processes: 1
OpenMP disabled
OpenMP task parallelization not activated
Number of patches: 1 x 1
Number of cells in one patch: 128 x 128
Dynamic load balancing: never
Initializing the restart environment
-------------------------------------------------------------------------------
Initializing species
-------------------------------------------------------------------------------
Creating Species #0: electron
Pusher: boris
Boundary conditions: thermalize thermalize thermalize thermalize
CAREFUL: For species 'electron' Using thermal_boundary_temperature[0] in all directions
Density profile: 2D user-defined function (uses numpy)
Creating Species #1: deuteron
Pusher: boris
Boundary conditions: thermalize thermalize thermalize thermalize
CAREFUL: For species 'deuteron' Using thermal_boundary_temperature[0] in all directions
Density profile: 2D user-defined function (uses numpy)
Initializing External fields
-------------------------------------------------------------------------------
External field Bz: 2D built-in profile `constant` (value: 0.275099)
Binary processes #0 within species (0 1)
1. Collisions with Coulomb logarithm: auto
Initializing Patches
-------------------------------------------------------------------------------
First patch created
All patches created
Creating Diagnostics, antennas, and external fields
-------------------------------------------------------------------------------
Diagnostic Fields #0 :
Ex Ey Ez Rho_electron Rho_deuteron
Created performances diagnostic
Finalize MPI environment
-------------------------------------------------------------------------------
Done creating diagnostics, antennas, and external fields
Minimum memory consumption (does not include all temporary buffers)
-------------------------------------------------------------------------------
Particles: Master 6400 MB; Max 6400 MB; Global 6.25 GB
Fields: Master 2 MB; Max 2 MB; Global 0.00198 GB
scalars.txt: Master 0 MB; Max 0 MB; Global 0 GB
Fields0.h5: Master 0 MB; Max 0 MB; Global 0 GB
Performances.h5: Master 0 MB; Max 0 MB; Global 0 GB
Initial fields setup
-------------------------------------------------------------------------------
Applying external fields at time t = 0
Applying prescribed fields at time t = 0
Applying antennas at time t = 0
GPU allocation and copy of the fields and particles
-------------------------------------------------------------------------------
Open files & initialize diagnostics
-------------------------------------------------------------------------------
Running diags at time t = 0
-------------------------------------------------------------------------------
Species creation summary
-------------------------------------------------------------------------------
Species 0 (electron) created with 67108864 particles
Species 1 (deuteron) created with 67108864 particles
Expected disk usage (approximate)
-------------------------------------------------------------------------------
WARNING: disk usage by non-uniform particles maybe strongly underestimated,
especially when particles are created at runtime (ionization, pair generation, etc.)
Expected disk usage for diagnostics:
File Fields0.h5: 1.79 G
File Performances.h5: 5.91 M
File scalars.txt: 390.67 K
Total disk usage for diagnostics: 1.80 G
Keeping or closing the python runtime environment
-------------------------------------------------------------------------------
Checking for cleanup() function:
python cleanup function does not exist
Closing Python
Time-Loop started: number of time-steps n_time = 2856
-------------------------------------------------------------------------------
CAREFUL: The following `push time` assumes a global number of 1 cores (hyperthreading is unknown)
timestep sim time cpu time [s] ( diff [s] ) push time [ns]
Attachment out_cpu.txt (output of the CPU run):
_ _
___ _ | | _ \ \ Version : 5.1-34-g60be16288-master
/ __| _ __ (_) | | ___ (_) | |
\__ \ | ' \ _ | | / -_) _ | |
|___/ |_|_|_| |_| |_| \___| |_| | |
/_/
Reading the simulation parameters
-------------------------------------------------------------------------------
HDF5 version 1.10.7
Python version 3.10.12
Parsing pyinit.py
Parsing 5.1-34-g60be16288-master
Parsing pyprofiles.py
Parsing PACE_2D_1.0_cpu.py
Parsing pycontrol.py
Check for function preprocess()
python preprocess function does not exist
Calling python _smilei_check
Calling python _prepare_checkpoint_dir
Calling python _keep_python_running() :
[WARNING](0) src/Params/Params.cpp:696 (Params) Resources allocated 48 underloaded regarding the total number of patches 4
CAREFUL: Patches distribution: hilbertian
Smilei will run on CPU devices
[WARNING](0) src/Params/Params.cpp:1170 (compute) simulation_time has been redefined from 18.221237 to 18.219817 to match timestep.
[WARNING](0) src/Params/Params.cpp:1262 (compute) Particles cluster width `cluster_width` set to : 32
[WARNING](0) src/Params/Params.cpp:1276 (compute) Particles cluster width set to: 64 for the adaptive vectorization mode
Geometry: 2Dcartesian
-------------------------------------------------------------------------------
Interpolation order : 2
Maxwell solver : Yee
simulation duration = 18.219817, total number of iterations = 2856
timestep = 0.006379 = 0.950000 x CFL, time resolution = 156.752402
Grid length: 1.21559, 1.21559
Cell length: 0.0094968, 0.0094968, 0
Number of cells: 128, 128
Spatial resolution: 105.299, 105.299
Cell sorting: activated
Electromagnetic boundary conditions
-------------------------------------------------------------------------------
xmin silver-muller, absorbing vector [1, 0]
xmax silver-muller, absorbing vector [-1, -0]
ymin silver-muller, absorbing vector [0, 1]
ymax silver-muller, absorbing vector [-0, -1]
Vectorization:
-------------------------------------------------------------------------------
Mode: adaptive
Default mode: off
Time selection: never
Calling python writeInfo
Initializing MPI
-------------------------------------------------------------------------------
MPI_THREAD_MULTIPLE enabled
Number of MPI processes: 4
Number of threads per MPI process : 12
OpenMP task parallelization not activated
Number of patches: 2 x 2
Number of cells in one patch: 64 x 64
Dynamic load balancing: never
Initializing the restart environment
-------------------------------------------------------------------------------
Initializing species
-------------------------------------------------------------------------------
Creating Species #0: electron
Pusher: boris
Boundary conditions: thermalize thermalize thermalize thermalize
CAREFUL: For species 'electron' Using thermal_boundary_temperature[0] in all directions
Density profile: 2D user-defined function (uses numpy)
Creating Species #1: deuteron
Pusher: boris
Boundary conditions: thermalize thermalize thermalize thermalize
CAREFUL: For species 'deuteron' Using thermal_boundary_temperature[0] in all directions
Density profile: 2D user-defined function (uses numpy)
Initializing External fields
-------------------------------------------------------------------------------
External field Bz: 2D built-in profile `constant` (value: 0.275099)
Binary processes #0 within species (0 1)
1. Collisions with Coulomb logarithm: auto
Initializing Patches
-------------------------------------------------------------------------------
First patch created
All patches created
Creating Diagnostics, antennas, and external fields
-------------------------------------------------------------------------------
Diagnostic Fields #0 :
Ex Ey Ez Rho_electron Rho_deuteron
Created performances diagnostic
Finalize MPI environment
-------------------------------------------------------------------------------
Done creating diagnostics, antennas, and external fields
Minimum memory consumption (does not include all temporary buffers)
-------------------------------------------------------------------------------
Particles: Master 3200 MB; Max 3200 MB; Global 12.5 GB
Fields: Master 0 MB; Max 0 MB; Global 0.00213 GB
scalars.txt: Master 0 MB; Max 0 MB; Global 0 GB
Fields0.h5: Master 0 MB; Max 0 MB; Global 0 GB
Performances.h5: Master 0 MB; Max 0 MB; Global 0 GB
Initial fields setup
-------------------------------------------------------------------------------
Solving Poisson at time t = 0
Initializing E field through Poisson solver
-------------------------------------------------------------------------------
Poisson solver converged at iteration: 0, relative err is ctrl = 0.000000 x 1e-14
Poisson equation solved. Maximum err = 0.000000 at i= -1
Time in Poisson : 0.000249
Applying external fields at time t = 0
Applying prescribed fields at time t = 0
Applying antennas at time t = 0
Open files & initialize diagnostics
-------------------------------------------------------------------------------
Running diags at time t = 0
-------------------------------------------------------------------------------
Species creation summary
-------------------------------------------------------------------------------
Species 0 (electron) created with 67108864 particles
Species 1 (deuteron) created with 67108864 particles
Expected disk usage (approximate)
-------------------------------------------------------------------------------
WARNING: disk usage by non-uniform particles maybe strongly underestimated,
especially when particles are created at runtime (ionization, pair generation, etc.)
Expected disk usage for diagnostics:
File Fields0.h5: 1.79 G
File Performances.h5: 7.28 M
File scalars.txt: 390.67 K
Total disk usage for diagnostics: 1.80 G
Keeping or closing the python runtime environment
-------------------------------------------------------------------------------
Checking for cleanup() function:
python cleanup function does not exist
Closing Python
Time-Loop started: number of time-steps n_time = 2856
-------------------------------------------------------------------------------
CAREFUL: The following `push time` assumes a global number of 48 cores (hyperthreading is unknown)
timestep sim time cpu time [s] ( diff [s] ) push time [ns]
1/2856 9.5692e-03 1.7488e+01 ( 1.7488e+01 ) 6254
2/2856 1.5949e-02 3.4516e+01 ( 1.7028e+01 ) 6089
3/2856 2.2328e-02 5.2292e+01 ( 1.7776e+01 ) 6357
Attachment PACE_2D_1.0.txt (namelist for the GPU run):
import math
import scipy.constants
import numpy as np
# Constants
c = scipy.constants.speed_of_light
q = scipy.constants.electron_volt
m = scipy.constants.electron_mass
eps0 = scipy.constants.epsilon_0
# EPOCH input values
B_EPOCH = 0.057
n_EPOCH = 7.4971e16
T_EPOCH = 5
l_EPOCH = 0.01 # MW
t_end_EPOCH = 5e-10# 3e-9 #
N_cells = 128 # MW
CR_EPOCH = l_EPOCH/(N_cells) * 1/c
I0_EPOCH = 1e3 #W/m²
omega_r = 5.8e9*2*math.pi
B_r = m*omega_r/q
n_r = eps0*m*omega_r**2/q**2
L_r = c/omega_r
t_r = 1/omega_r
# SMILEI parameters
T_SMILEI = T_EPOCH/511e3
n_SMILEI = n_EPOCH/n_r
B_SMILEI = B_EPOCH/B_r
l_SMILEI = l_EPOCH / L_r
x0_SMILEI = l_SMILEI/2
t_end_SMILEI = t_end_EPOCH/t_r
l_cav_SMILEI = l_EPOCH/L_r
a0_SMILEI = 0.86*c/(omega_r/(2*math.pi)) * 10**6 * math.sqrt(I0_EPOCH/1e18)
dx_sim = l_EPOCH/N_cells
dt_CR = 0.95*dx_sim/c/np.sqrt(2)
dt_SIM = 2*np.pi/(20.3*omega_r) # originally 10.3
field_step = 1 # MW int(dt_SIM/dt_CR) # save fields every field_step
def super_gaussian(x, y):
    # Super-Gaussian (order 6) density profile centred in the box
    return n_SMILEI * np.exp(-((np.sqrt((x-x0_SMILEI)**2 + (y-x0_SMILEI)**2))/(l_cav_SMILEI/3))**6)
Main(
    geometry = "2Dcartesian",
    interpolation_order = 2,
    number_of_cells = [N_cells, N_cells],
    grid_length = [l_SMILEI, l_SMILEI],
    #number_of_patches = [ 16, 16 ], # MW
    number_of_patches = [ 1, 1 ], # MW
    gpu_computing = True, #MW
    timestep = CR_EPOCH * omega_r * 0.95/np.sqrt(2),
    simulation_time = t_end_SMILEI,
    EM_boundary_conditions = [ ['silver-muller'], ['silver-muller' ]],
    reference_angular_frequency_SI = omega_r,
    print_every = int(1) #
)
Species(
    name = "electron",
    position_initialization = "regular",
    momentum_initialization = "maxwell-juettner",
    charge = -1.0,
    mass = 1.0,
    particles_per_cell = 4096,
    number_density = super_gaussian,
    temperature=[T_SMILEI],
    boundary_conditions = [["thermalize", "thermalize"], ["thermalize", "thermalize"]],
    thermal_boundary_temperature = [T_SMILEI],
)
Species(
    name = "deuteron",
    position_initialization = "regular",
    momentum_initialization = "maxwell-juettner",
    charge = 1.0,
    mass = 1.0*1836.2,
    particles_per_cell = 4096,
    number_density = super_gaussian,
    temperature=[T_SMILEI],
    boundary_conditions = [["thermalize", "thermalize"], ["thermalize", "thermalize"]],
    thermal_boundary_temperature = [T_SMILEI],
)
Collisions(
    species1 = ["electron", "deuteron"],
    species2 = ["electron", "deuteron"],
)
ExternalField(
    field = "Bz",
    profile = constant(B_SMILEI)
)
DiagPerformances(
    every = field_step,
    #flush_every = field_step,
)
DiagFields(
    every = field_step,
    fields = ['Ex','Ey','Ez','Rho_electron', 'Rho_deuteron']
)
DiagScalar(
    every = field_step,
    vars = ["Utot", "Ukin", "Uelm", "Uelm_Ex", "Ukin_bnd", "Uelm_bnd"],
    precision = 10
)
"""
Checkpoints(
    #restart_dir = '../PACE_2D_1.0/',
    dump_minutes = 1410,
    exit_after_dump = True,
    keep_n_dumps = 2,
)
"""
Attachment nvidia_env (environment script):
export BUILD_DIR=build_nvidia
export NVDIR="/home/matthias/Projekte/Smilei/NVDIR"
export PATH=$NVDIR/Linux_x86_64/23.11/compilers/bin:$PATH
export PATH=$NVDIR/Linux_x86_64/23.11/comm_libs/mpi/bin:$PATH
export HDF5_ROOT_DIR=$NVDIR/hdfsrc/install/
export LD_LIBRARY_PATH=$HDF5_ROOT_DIR/lib
export LDFLAGS="-acc=gpu -gpu=ccnative -cudalib=curand "
export CXXFLAGS="-acc=gpu -gpu=ccnative,fastmath -std=c++14 -lcurand -Minfo=accel -w -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -I$NVDIR/Linux_x86_64/23.11/math_libs/include/"
export GPU_COMPILER_FLAGS="-O3 --std c++14 -arch=sm_86 --expt-relaxed-constexpr --compiler-bindir gcc -I$NVDIR/Linux_x86_64/23.11/comm_libs/12.3/openmpi4/openmpi-4.1.5/include/ -I$NVDIR/hdfsrc/install/include/"
export SMILEICXX_DEPS=g++
export SLURM_LOCALID=0
-
Hi Johan,
Sorry to hear about the compilation issues on Niflheim.
As for your execution, I see a couple of issues. You are using features that are currently not supported:
momentum_initialization = "maxwell-juettner",
...
boundary_conditions = [["thermalize", "thermalize"], ["thermalize", "thermalize"]],
I have not tested maxwell-juettner so far, so this could be an issue (or not; I will have a look if necessary). EDIT: as long as you don't use injection or a moving window, it should not be a problem.
The thermalize BC is currently not supported on GPU (simply a question of time; I can add it if you only need that).
To check your install, you can run this small test case, which runs on my laptop GPU:
input.txt<https://github.com/user-attachments/files/17488844/input.txt>
Best,
Charles
PS: can you be more specific about this: "Finally, nvidia_env is the one from the guide used, but we had to replace one $NVDIR with gcc to recover from a bug."
-
Hi Charles,
Thank you kindly for your reply.
Apologies for being vague about the bug we encountered during compilation. I'm away from the local Linux computer, so a colleague is helping me compile SMILEI and run the comparison tests. I have asked him to clarify the bug.
I only need the thermalized BC. If you could add that, it would be of great help!
Do you have any recommendations for other momentum-initializations apart from maxwell-juettner that are GPU supported?
Best wishes, Johan
On Wednesday, October 23, 2024 at 11:59 AM, charlesprouveur wrote to Johan Kølsen de Wit:
Hi Johan,
Sorry to hear about the compilation issues on Niflheim.
As for your execution, I see a couple of issues. You are using features that are currently not supported:
momentum_initialization = "maxwell-juettner",
...
boundary_conditions = [["thermalize", "thermalize"], ["thermalize", "thermalize"]],
I have not tested maxwell-juettner so far, so this could be an issue (or not; I will have a look if necessary).
The thermalize BC is currently not supported on GPU (simply a question of time; I can add it if you only need that).
To check your install, you can run this small test case, which runs on my laptop GPU:
input.txt<https://github.com/user-attachments/files/17488844/input.txt>
Best,
Charles
PS: can you be more specific about this: "Finally, nvidia_env is the one from the guide used, but we had to replace one $NVDIR with gcc to recover from a bug."
-
Hi Johan,
NVDIR is an environment variable that we set ourselves to the root of the nvhpc folder we set up, ie:
In your case, since you are using an RTX 3090, the exact options in our example work (ie -arch=sm_86).
Looking at your output, there is nothing unusual. We do see "smilei will run on GPU", so on that side I think you are good.
Now onto the "acceleration" aspect, there are two things here:
Your theoretical performance is slashed by a factor of 70 between single and double precision.
I will add another point: at high performance, writing outputs (the .h5 files) will be a huge bottleneck, which is why their frequency should be reduced to the minimum. In your two runs, diagnostics took 82% and 87% of the computing time, respectively. At this point any performance comparison between the two chips is pointless; you are only seeing the performance of your SSD. If you want to look at pure performance, you can run a much bigger test case with a single output at the end, for instance.
Best regards,
Charles
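One way to act on this advice is to derive the diagnostics period from the total number of iterations instead of writing at every step (the namelist in this thread uses field_step = 1). The sketch below is an assumption about how one might choose the value passed to `every` in DiagFields, DiagScalar and DiagPerformances; the helper name `diag_every` is hypothetical and not part of Smilei.

```python
def diag_every(total_iterations, n_outputs):
    """Period (in iterations) that yields roughly n_outputs dumps over the run."""
    if n_outputs < 1:
        raise ValueError("n_outputs must be at least 1")
    return max(1, total_iterations // n_outputs)


total_iterations = 2856  # from the log: "total number of iterations = 2856"

# About ten dumps over the run instead of one per iteration
field_step = diag_every(total_iterations, 10)
print(field_step)  # 285

# A single output at the end, for a pure performance comparison
print(diag_every(total_iterations, 1))  # 2856
```

In the namelist, the resulting `field_step` would simply replace `field_step = 1`.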
-
Hi Johan,
The thermal BC has been ported to GPU. A new version of Smilei has been pushed to the GitHub repo.
Best regards,
Charles
-
Dear Charles,
Thank you very much for making this update. I am looking forward to using it!
Best wishes, Johan
On Tuesday, November 19, 2024 at 7:04 PM, charlesprouveur wrote to Johan Kølsen de Wit:
Hi Johan,
The thermal BC has been ported to GPU. A new version of Smilei has been pushed to the GitHub repo.
Best regards,
Charles
-
I heard that VPIC will have a pure GPU version coming soon. Does Smilei have plans to port to GPU in the future?
-
This discussion is a continuation of a Q&A mistakenly opened under 'issues'. I will post my question here underneath:
Dear SMILEI team,
I'm having trouble compiling SMILEI with GPU acceleration for the NVIDIA A100 nodes of the HPC cluster Niflheim.
I'm not experienced with GPU-accelerated nodes, and it is very likely that I am the one making the mistakes. However, if it is not too inconvenient, I was hoping you could help me compile SMILEI with GPU acceleration on Niflheim.
I successfully compile with intel/2023a for CPU but encounter an issue when using CUDA for the A100 GPUs. After exporting the GPU compiler to nvcc and running make config=gpu_nvidia, I get the following error:
src/Params/Params.h:421:5: error: body of ‘constexpr’ function ‘static constexpr int Params::getGPUClusterWidth(int)’ not a return-statement
  421 |     }
      |     ^
src/Params/Params.h: In static member function ‘static constexpr int Params::getGPUClusterGhostCellBorderWidth(int)’:
src/Params/Params.h:469:5: error: body of ‘constexpr’ function ‘static constexpr int Params::getGPUClusterGhostCellBorderWidth(int)’ not a return-statement
  469 |     }
      |     ^
src/Params/Params.h: In static member function ‘static constexpr int Params::getGPUInterpolationClusterCellVolume(int, int)’:
src/Params/Params.h:494:5: error: body of ‘constexpr’ function ‘static constexpr int Params::getGPUInterpolationClusterCellVolume(int, int)’ not a return-statement
  494 |     }
      |     ^
I see in your installation guide that several flags must be supplied in $CXXFLAGS and $GPU_COMPILER_FLAGS, and that your environment variables for jean_zay_gpu_A100 are set to:
export CXXFLAGS="-O3 -std=c++14 -fopenmp -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1"
export GPU_COMPILER_FLAGS="-O3 --std=c++14 -arch=sm_80 --expt-relaxed-constexpr"
export LDFLAGS="-lcudart -lcurand -lgomp"
However, neither I nor the Niflheim admins are sure what these flags must be set to in my case.
Could you please advise me on the correct CXXFLAGS and GPU_COMPILER_FLAGS for Niflheim’s A100 partition? :-)
Thank you for your help, and kind regards, Johan
With the kind answer:
To your question:
Could you specify what environment you are using and which modules you have loaded?
You mention compiling for CPU with Intel oneAPI.
For GPU you should only use the compilers provided in an nvhpc package (currently I recommend 24.5).
Our dependencies are mostly HDF5 and OpenMPI; these should be compiled with nvc++ after installing nvhpc.
HDF5 is simple enough, and you could install it locally following our guide; OpenMPI should be handled by your support team.
Regarding flags, there is a lot to be said. You should take inspiration from the Jean Zay A100 example in scripts/compile_tools/machine/ and also from https://smileipic.github.io/Smilei/Use/install_linux_GPU.html
For starters:
-fopenmp should NOT be here, and neither should -lgomp
you are missing the GPU-specific flags for CXXFLAGS, such as -acc=gpu -gpu=cc80
so it should look like:
export LDFLAGS="-acc=gpu -gpu=cc80 -cudalib=curand "
export CXXFLAGS="-acc=gpu -gpu=cc80,fastmath -std=c++14 -lcurand -Minfo=accel -w -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1 -I$NVDIR/Linux_x86_64/23.11/math_libs/include/"
export GPU_COMPILER_FLAGS="-O3 --std c++14 -arch=sm_80 --expt-relaxed-constexpr -I$NVDIR/hdfsrc/install/include/"
while specifying NVDIR (or you can remove those if you have a module that exports the include and lib folders)
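The flags recommended here for the A100 use compute capability 80, while the local RTX 3090 build earlier in the thread uses -arch=sm_86. A small sketch that assembles the three variables for a given capability, following the pattern above. The function name is hypothetical, and the NVDIR include paths are deliberately left out, so the -I options would still need to be appended for a real build.

```python
def nvhpc_flags(compute_capability):
    """Assemble Smilei GPU build flags for a capability such as 80 (A100) or 86 (RTX 3090)."""
    cc = int(compute_capability)
    return {
        "LDFLAGS": f"-acc=gpu -gpu=cc{cc} -cudalib=curand",
        "CXXFLAGS": (
            f"-acc=gpu -gpu=cc{cc},fastmath -std=c++14 -lcurand "
            "-Minfo=accel -w -D__GCC_ATOMIC_TEST_AND_SET_TRUEVAL=1"
        ),
        "GPU_COMPILER_FLAGS": f"-O3 --std c++14 -arch=sm_{cc} --expt-relaxed-constexpr",
    }


# Print export lines for an A100 (compute capability 8.0)
for name, value in nvhpc_flags(80).items():
    print(f'export {name}="{value}"')
```

Keeping the capability in one place avoids a mismatch between the -gpu=cc value passed to nvc++ and the -arch=sm_ value passed to nvcc.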