NEXT STEP: wait for prgenv-gnu/24.11:v1 to be released for testing
From the release notes and spelunking the HPC SDK, we have the following version matrix:
nvhpc | cufftmp | nvshmem |
---|---|---|
24.9 | 11.2.6 | 3.0.6-4 |
24.11 | 11.2.6 | 3.0.6-4 |
We will target versions 11.2.6 and 3.0.6 of cufftmp and nvshmem respectively.
The notes below are a bit more generic, to make spelunking future releases a bit easier.
We build nvshmem from source, which is available for download as versioned tar balls from NVIDIA.
It can be downloaded using `wget` or `curl`.
The full version that is available for download is 3.0.6-4, which can be downloaded as follows:
wget https://developer.download.nvidia.com/compute/redist/nvshmem/3.0.6/source/nvshmem_src_3.0.6-4.txz
tar -xvJf nvshmem_src_3.0.6-4.txz
Note: it is compressed with xz, hence the `J` flag passed to `tar`.
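To make this easier to repeat for future releases, the download URL can be derived from the full version string. The sketch below is illustrative only: `nvshmem_src_url` is a hypothetical helper, and it assumes that other releases follow the same URL layout as 3.0.6-4, which has not been checked here.

```shell
# Hypothetical helper: build the NVSHMEM source URL from the full version.
# Assumes the URL layout of 3.0.6-4 also holds for other releases.
nvshmem_src_url() {
    full="$1"            # full version including build suffix, e.g. 3.0.6-4
    base="${full%%-*}"   # drop the build suffix, e.g. 3.0.6
    echo "https://developer.download.nvidia.com/compute/redist/nvshmem/${base}/source/nvshmem_src_${full}.txz"
}

nvshmem_src_url 3.0.6-4
# to download with curl instead of wget (-L follows redirects, -O keeps the remote file name):
#   curl -LO "$(nvshmem_src_url 3.0.6-4)"
```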
NVSHMEM_DEBUG=OFF
NVSHMEM_DEVEL=OFF
NVSHMEM_DEFAULT_PMI2=OFF
NVSHMEM_DEFAULT_PMIX=OFF
NVSHMEM_DISABLE_COLL_POLL=ON
NVSHMEM_ENABLE_ALL_DEVICE_INLINING=OFF
NVSHMEM_GPU_COLL_USE_LDST=OFF
NVSHMEM_LIBFABRIC_SUPPORT=ON
NVSHMEM_MPI_SUPPORT=ON
NVSHMEM_NVTX=ON
NVSHMEM_PMIX_SUPPORT=ON
NVSHMEM_SHMEM_SUPPORT=ON
NVSHMEM_TEST_STATIC_LIB=OFF
NVSHMEM_TIMEOUT_DEVICE_POLLING=OFF
NVSHMEM_TRACE=OFF
NVSHMEM_USE_DLMALLOC=OFF
NVSHMEM_USE_NCCL=ON
NVSHMEM_USE_GDRCOPY=ON
NVSHMEM_VERBOSE=OFF
NVSHMEM_DEFAULT_UCX=OFF
NVSHMEM_UCX_SUPPORT=ON
NVSHMEM_IBGDA_SUPPORT=ON
NVSHMEM_IBGDA_SUPPORT_GPUMEM_ONLY=OFF
NVSHMEM_IBDEVX_SUPPORT=ON
NVSHMEM_IBRC_SUPPORT=ON
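One possible way to apply settings like the ones above is to pass them to CMake as `-D` cache variables. The following is a rough sketch, not the build recipe used here: it assumes the settings are accepted as CMake options, uses only three of them for brevity, and the source and build directory names (`nvshmem_src`, `build`) are placeholders.

```shell
# Sketch: turn KEY=VALUE settings into -D flags for cmake.
# Assumption: the settings are accepted as CMake cache variables.
cmake_args=""
for opt in NVSHMEM_MPI_SUPPORT=ON NVSHMEM_LIBFABRIC_SUPPORT=ON NVSHMEM_USE_NCCL=ON; do
    cmake_args="$cmake_args -D$opt"
done

# Print the command rather than run it; nvshmem_src and build are placeholder paths.
echo "cmake -S nvshmem_src -B build$cmake_args"
```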
As of the time of writing, version 3 of nvshmem is not supported in the Spack package for nvshmem:
https://packages.spack.io/package.html?name=nvshmem
The build system in version 2 was based on a Makefile, while this was changed to CMake in version 3.
As a result, updating the Spack package requires writing a new `package.py` file.
https://docs.nvidia.com/hpc-sdk/cufftmp/index.html
cufftmp is only available as a pre-built binary inside the NVIDIA HPC SDK.
As a result, the library, headers, etc. need to be extracted from the SDK and put into a stand-alone tar ball.
Create a tar ball of a path with the following structure:
cufftmp-aarch64-X.Y.Z
├─ lib
│ ├─ libcufftMp.so
│ ├─ libcufftMp.so.X
│ └─ libcufftMp.so.X.Y.Z
└─ include
  ├─ cufftXt.h
  ├─ cufftMp.h
  ├─ cufft.h
  └─ cudalibxt.h
where X, Y and Z are the `major`, `minor` and `patch` components of the version of cufftmp.
The first step is to download the NVIDIA HPC SDK.
cufftmp_major=11
cufftmp_minor=2
cufftmp_patch=6
# for 24.9
major=24
minor=9
cuda=12.6
# for 24.11
major=24
minor=11
cuda=12.6
nvpath=nvhpc_2024_${major}${minor}_Linux_aarch64_cuda_${cuda}
wget https://developer.download.nvidia.com/hpc-sdk/${major}.${minor}/${nvpath}.tar.gz
tar -xzf ${nvpath}.tar.gz
# the following commands can be used to find files that we are looking for
# libraries
find $nvpath | grep libcufftMp
# header path
find $nvpath -name cufftmp
cufftpath=cufftmp-aarch64-${cufftmp_major}.${cufftmp_minor}.${cufftmp_patch}
mkdir $cufftpath
mkdir $cufftpath/include
mkdir $cufftpath/lib
cp $(find $nvpath -name cufftmp)/* $cufftpath/include
libpath=$nvpath/install_components/Linux_aarch64/${major}.${minor}/math_libs/${cuda}/targets/sbsa-linux/lib
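The commands above stop after defining `libpath`; the libraries still have to be copied and the tar ball created. A minimal sketch of those remaining steps follows. `package_cufftmp` is a hypothetical helper, the use of `cp -P` assumes that `libcufftMp.so` and `libcufftMp.so.X` are symlinks in the SDK, and gzip compression is an arbitrary choice.

```shell
# Sketch: copy the cufftmp libraries into the staging directory and archive it.
# package_cufftmp is a hypothetical helper, not part of the HPC SDK.
package_cufftmp() {
    src_lib="$1"   # SDK directory that holds libcufftMp.so*, e.g. $libpath
    stage="$2"     # staging directory (relative path), e.g. $cufftpath
    mkdir -p "$stage/lib"
    # -P copies symlinks as symlinks instead of duplicating the library
    cp -P "$src_lib"/libcufftMp.so* "$stage/lib/"
    # gzip is an assumption; any compression tar supports would do
    tar -czf "$stage.tar.gz" "$stage"
}

# usage, with the variables defined above:
#   package_cufftmp "$libpath" "$cufftpath"
```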
Using `libtree`, we can see that the only other package that `libcufftMp.so` depends on is NVSHMEM:
> libtree $cufftpath/lib/libcufftMp.so
libcufftMp.so.11
├── libnvshmem_host.so.3 [LD_LIBRARY_PATH]
│ ├── libpthread.so.0 [default path]
│ └── librt.so.1 [default path]
│ └── libpthread.so.0 [default path]
├── libpthread.so.0 [default path]
└── librt.so.1 [default path]
Note: I had to set `LD_LIBRARY_PATH` to find `libnvshmem_host.so.3`.
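For reference, setting the search path might look like the sketch below. The `nvshmem_lib` variable and its default value are placeholders; point it at whichever directory actually contains `libnvshmem_host.so.3`.

```shell
# nvshmem_lib is a placeholder for the directory containing libnvshmem_host.so.3
nvshmem_lib="${nvshmem_lib:-/path/to/nvshmem/lib}"

# Prepend it, keeping any existing search path (no trailing ':' if it was empty)
export LD_LIBRARY_PATH="$nvshmem_lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# then re-run:
#   libtree $cufftpath/lib/libcufftMp.so
```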