
Measuring XGC with TAU on Summit and Spock

Kevin Huck edited this page Sep 24, 2021 · 7 revisions

Measuring XGC with TAU on Summit (Spock coming soon)

Here are some instructions for using TAU, and the related APEX tool, to measure XGC on Summit at OLCF. Instructions for Spock are coming soon.

CAMTIMERS with PerfStubs support

A version of the CAMTIMERS library has been integrated with PerfStubs. On OLCF resources, that library is installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/camtimers-perfstubs/nvhpc21.7. For more information on PerfStubs, see https://github.com/khuck/perfstubs/blob/master/perfstubs_api/README.md

The source for this installation is in https://github.com/khuck/camtimers.

TAU installation

TAU has tool support for the PerfStubs interface. On Summit, TAU is installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7. A separate build for use with GENE is installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/gcc9.3.
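If job scripts need to pick between the two TAU builds, a small helper can map the toolchain to the install prefix. This is a minimal sketch: the function name `tau_prefix_for` and the keywords `nvhpc` and `gcc` are hypothetical; only the two install paths and the `ibm64linux/bin` subdirectory come from this page.

```bash
#!/bin/bash
# Hypothetical helper: print the TAU install prefix for a toolchain keyword.
# Only the two install paths below come from this page.
tau_prefix_for() {
    local base=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2
    case "$1" in
        nvhpc) echo "${base}/nvhpc21.7" ;;   # default build, used for XGC
        gcc)   echo "${base}/gcc9.3" ;;      # build for use with GENE
        *)     echo "unknown toolchain: $1" >&2; return 1 ;;
    esac
}

# The tau_exec wrapper lives under <prefix>/ibm64linux/bin on Summit
export PATH="$(tau_prefix_for nvhpc)/ibm64linux/bin:$PATH"
```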

How was TAU configured?

TAU was installed from https://github.com/UO-OACISS/tau2/, using these commands:

git clone https://github.com/UO-OACISS/tau2.git
cd tau2
module load nvhpc/21.7 spectrum-mpi cuda/11.4 papi/6.0.0.1 binutils/2.36.1
./configure -mpi \
-c++=mpicxx \
-cc=mpicc \
-fortran=mpif90 \
-iowrapper \
-otf=download \
-ompt \
-cuda=/sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.4 \
-papi=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/papi-6.0.0.1-fsjlc4klbur5qax5vkys7jwqdorsof6x \
-bfd=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/binutils-2.36.1-abgveowfozcbngvoli6duel7zsfguvui \
-prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7 \
-tag=nvhpc21.7_omp
make -j16 install

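The TAU configuration for use with GENE was built with these commands. The OLCF_*_ROOT variables are defined by the corresponding OLCF modules (otf2, cuda, papi, binutils), so those modules must be loaded first: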

module load otf2/2.3
./configure -mpi \
-c++=mpicxx \
-cc=mpicc \
-fortran=mpif90 \
-iowrapper \
-otf=${OLCF_OTF2_ROOT} \
-cuda=${OLCF_CUDA_ROOT} \
-papi=${OLCF_PAPI_ROOT} \
-bfd=${OLCF_BINUTILS_ROOT} \
-prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/gcc9.3 \
-tag=gcc9.3
make -j16 install
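Each TAU configuration installs a stub Makefile whose name includes the configuration options and the -tag value, so listing them is a quick way to confirm that both builds were installed where expected. A minimal sketch, assuming TAU's usual layout of `<prefix>/<arch>/lib/Makefile.tau-*` with `ibm64linux` as the architecture directory (matching the tau_path used in the TAU job script); the function name `list_tau_configs` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: list installed TAU Makefiles whose names contain a tag.
# Assumes TAU's usual layout: <prefix>/<arch>/lib/Makefile.tau-*
list_tau_configs() {
    local prefix=$1 tag=$2 arch=${3:-ibm64linux}
    local f found=1
    for f in "${prefix}/${arch}/lib"/Makefile.tau-*"${tag}"*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        basename "$f"
        found=0
    done
    return $found                    # non-zero if no configuration matched
}

# Usage on Summit:
# list_tau_configs /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7 nvhpc21.7_omp
# list_tau_configs /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/gcc9.3 gcc9.3
```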

Running with TAU:

TAU is used at runtime with the tau_exec wrapper script, which preloads the TAU libraries and sets the appropriate environment variables. A sample LSF job script for running the XGC test program is:

#!/bin/bash
#BSUB -P PROJECTID
#BSUB -W 0:02
#BSUB -nnodes 1
#BSUB -J JOB_NAME
#BSUB -o OUTPUT_NAME.%J
#BSUB -e ERROR_OUTPUT_NAME.%J
#BSUB -N EMAIL_ADDRESS@pppl.gov
#BSUB -B EMAIL_ADDRESS@pppl.gov

# Load modules and set paths, same as the build environment
source /ccs/home/khuck/ECP-WDM/src/sourceme-summit.sh
# VERY IMPORTANT!  Darshan and TAU both try to wrap MPI_Init/MPI_Finalize, but only one library can...
module unload darshan-runtime

cd /gpfs/alpine/world-shared/projectid/userid/summit/XGC1Example

# create restart file directory
mkdir -p restart_dir

export OMP_NUM_THREADS=4
export xgc_bin_path=/ccs/home/userid/ECP-WDM/src/XGC-Devel/build_full_summit/bin/xgc-es-cpp-gpu
export tau_path=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7/ibm64linux/bin
PATH=${tau_path}:$PATH
cmd="tau_exec -T nvhpc21.7_omp -ompt"
jsrun -n 4 -r 4 -a 1 -g 1 -c 7 -b rs $cmd $xgc_bin_path --test
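The jsrun line above requests 4 resource sets on the node (-n 4 -r 4), each with 1 MPI task, 1 GPU and 7 cores (-a 1 -g 1 -c 7), and each rank runs OMP_NUM_THREADS=4 threads. A quick arithmetic check against a Summit node (42 cores available to applications, 6 GPUs) catches layouts that cannot fit. This is a minimal sketch: the function name `check_rs_layout` is hypothetical, and it only checks per-node totals, not socket placement.

```bash
#!/bin/bash
# Hypothetical helper: check a jsrun layout against one Summit node.
# Summit node: 42 cores usable by applications (2 sockets x 21), 6 GPUs.
check_rs_layout() {
    local rs_per_node=$1 cores_per_rs=$2 gpus_per_rs=$3 threads=${4:-1}
    local node_cores=42 node_gpus=6
    if (( rs_per_node * cores_per_rs > node_cores )); then
        echo "too many cores: $((rs_per_node * cores_per_rs)) > ${node_cores}" >&2
        return 1
    fi
    if (( rs_per_node * gpus_per_rs > node_gpus )); then
        echo "too many GPUs: $((rs_per_node * gpus_per_rs)) > ${node_gpus}" >&2
        return 1
    fi
    # Treat more OpenMP threads than cores per resource set as oversubscription
    # (a conservative choice; POWER9 cores also have hardware threads).
    if (( threads > cores_per_rs )); then
        echo "OMP_NUM_THREADS=${threads} exceeds ${cores_per_rs} cores per resource set" >&2
        return 1
    fi
}

# TAU script: -r 4 -c 7 -g 1 with OMP_NUM_THREADS=4 -> 28 cores, 4 GPUs
check_rs_layout 4 7 1 4 && echo "layout fits"
```

The APEX script below uses -c 10 instead, which also fits (40 of 42 cores).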

APEX installation

APEX also has tool support for the PerfStubs interface. APEX is related to TAU, but is a slightly different tool. For more information on using APEX, see https://github.com/UO-OACISS/apex.

On Summit, APEX is installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7.

How was APEX configured?

APEX was installed with these commands:

module load nvhpc/21.7 spectrum-mpi cuda/11.4 papi/6.0.0.1 binutils/2.36.1
module load otf2/2.3
module load gperftools/2.8.1
git clone https://github.com/UO-OACISS/apex.git
cd apex
cwd=`pwd`
builddir=apex_build_summit
instdir=apex_install_summit

rm -rf ${builddir} ${instdir}/include ${instdir}/lib
mkdir ${builddir}
cd ${builddir}

set -x
cmake \
-DCMAKE_C_COMPILER=`which nvc` \
-DCMAKE_CXX_COMPILER=`which nvc++` \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCMAKE_INSTALL_PREFIX=/gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7 \
-DAPEX_SANITIZE=FALSE \
-DAPEX_WITH_CUDA=TRUE \
-DCUDAToolkit_ROOT=${OLCF_CUDA_ROOT}/11.0 \
-DPAPI_ROOT=${OLCF_PAPI_ROOT} \
-DGPERFTOOLS_ROOT=${OLCF_GPERFTOOLS_ROOT} \
-DOTF2_ROOT=${OLCF_OTF2_ROOT} \
-DBFD_ROOT=${OLCF_BINUTILS_ROOT} \
-DAPEX_WITH_OMPT=TRUE \
-DAPEX_BUILD_OMPT=FALSE \
-DAPEX_WITH_BFD=TRUE \
-DAPEX_WITH_PLUGINS=TRUE \
-DAPEX_WITH_PAPI=TRUE \
-DAPEX_WITH_OTF2=TRUE \
-DAPEX_WITH_TCMALLOC=TRUE \
-DAPEX_WITH_LM_SENSORS=TRUE \
-DAPEX_BUILD_TESTS=FALSE \
-DAPEX_BUILD_EXAMPLES=FALSE \
-DAPEX_WITH_ACTIVEHARMONY=FALSE \
-DAPEX_WITH_MPI=TRUE \
${cwd}

make -j8
make -j install
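After the install step, it is worth confirming that the apex_exec launcher used in the APEX job script is present and executable. A minimal sketch, assuming apex_exec is installed in the bin directory under CMAKE_INSTALL_PREFIX; the function name `check_apex_install` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: confirm an APEX install prefix provides an executable apex_exec.
# Assumes apex_exec is installed as <prefix>/bin/apex_exec.
check_apex_install() {
    local prefix=$1
    if [ -x "${prefix}/bin/apex_exec" ]; then
        echo "${prefix}/bin/apex_exec"
    else
        echo "apex_exec not found under ${prefix}/bin" >&2
        return 1
    fi
}

# Usage on Summit:
# check_apex_install /gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7
```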

Running with APEX:

Here's a sample job script for running with APEX:

#!/bin/bash
#BSUB -P PROJECTID
#BSUB -W 0:02
#BSUB -nnodes 1
#BSUB -J JOB_NAME
#BSUB -o OUTPUT_NAME.%J
#BSUB -e ERROR_OUTPUT_NAME.%J
#BSUB -N EMAIL_ADDRESS@pppl.gov
#BSUB -B EMAIL_ADDRESS@pppl.gov

# Load modules and set paths, same as the build environment
source /ccs/home/khuck/ECP-WDM/src/sourceme-summit.sh
# VERY IMPORTANT!  Darshan and APEX both try to wrap MPI_Finalize, but only one library can...
module unload darshan-runtime

cd /gpfs/alpine/world-shared/projectid/userid/summit/XGC1Example

# create restart file directory
mkdir -p restart_dir

export OMP_NUM_THREADS=4
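export xgc_bin_path=/ccs/home/userid/ECP-WDM/src/XGC-Devel/build_full_summit/bin/xgc-es-cpp-gpu
# Make sure apex_exec is in PATH (assumes it is installed in bin/ under the APEX install prefix)
export PATH=/gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7/bin:$PATH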
# Options: see the output of `apex_exec --apex:help` for more info
apex_cmd="apex_exec --apex:quiet --apex:ompt --apex:kokkos --apex:cuda --apex:gtrace"
jsrun -n 4 -r 4 -a 1 -g 1 -c 10 -b rs $apex_cmd $xgc_bin_path --test
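Because it is easy to forget the `module unload darshan-runtime` step, a job script can refuse to launch while the module is still loaded. A minimal sketch, assuming the module system records loaded modules in the colon-separated LOADEDMODULES variable (as Lmod and Environment Modules do); the function name `darshan_loaded` and the module versions in the comment are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a darshan-runtime module is listed in LOADEDMODULES.
# LOADEDMODULES is a colon-separated list, e.g. "gcc/9.3.0:darshan-runtime/3.3.0".
darshan_loaded() {
    local mod
    local IFS=:                      # split LOADEDMODULES on colons
    for mod in ${LOADEDMODULES:-}; do
        [[ $mod == darshan-runtime* ]] && return 0
    done
    return 1
}

# In the job script, after "module unload darshan-runtime":
if darshan_loaded; then
    echo "darshan-runtime is still loaded; it conflicts with the TAU/APEX MPI wrappers" >&2
    exit 1
fi
```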

Example output

Here's a view of the Google Trace Events trace generated by the above command, visualized in Google Chrome:

[Figure: XGC trace visualized in Google Chrome]

And the same trace, visualized in Perfetto:

[Figure: XGC trace visualized in Google Perfetto]

Known Issues

  • There is one problematic timer in the current XGC code base: the “F_SOURCE_FIRST_PART” timer. It is started in update_analytic_wrap (in XGC_core/main_loop_f90_routines.F90) and stopped in add_particle_and_grid_dist_funcs_wrap, so it overlaps with other timers (“UPDATE_ANALYTIC_F0”, for example) instead of nesting with them. It should either be removed or, if it is intended to measure a phase, promoted up a function call or two. With it commented out, TAU handles the timers fine.
  • Also, even though nvc/nvc++/nvfortran claim to support OpenMP 5.0 OMPT callbacks, TAU and APEX are not receiving any. More investigation is needed, but it appears that the NVIDIA/PGI implementation provides the omp-tools.h header without actually providing any OMPT support in the runtime. Otherwise, we do see all the camtimers, Kokkos, and CUDA (host and device) events in the measurements.
  • Darshan and TAU don't play well together - see above. Make sure the Darshan runtime module is unloaded when running the simulation with TAU.
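The timer problem above comes down to nesting: a tool that keeps a stack of running timers expects each stop to match the most recently started timer, and overlapping start/stop pairs break that. The sketch below shows the stack check on a simple event log. The "start NAME" / "stop NAME" log format and the function name `check_nesting` are hypothetical, and the event order in the usage example is illustrative only, not the actual XGC call sequence.

```bash
#!/bin/bash
# Hypothetical checker: read "start NAME" / "stop NAME" lines on stdin and report
# the first stop that does not match the most recently started timer.
check_nesting() {
    local -a stack=()
    local action name top n lineno=0
    while read -r action name; do
        lineno=$((lineno + 1))
        case "$action" in
            start) stack+=("$name") ;;           # push the new timer
            stop)
                n=${#stack[@]}
                if (( n == 0 )); then
                    echo "line ${lineno}: stop ${name} with no running timer"
                    return 1
                fi
                top=${stack[n-1]}                # most recently started timer
                if [[ $top != "$name" ]]; then
                    echo "line ${lineno}: stop ${name} overlaps running timer ${top}"
                    return 1
                fi
                unset "stack[$((n - 1))]"        # pop it
                ;;
        esac
    done
    if (( ${#stack[@]} > 0 )); then
        echo "timers never stopped: ${stack[*]}"
        return 1
    fi
}

# Illustrative event order only; not the actual XGC call sequence.
printf '%s\n' "start F_SOURCE_FIRST_PART" "start UPDATE_ANALYTIC_F0" \
    "stop F_SOURCE_FIRST_PART" "stop UPDATE_ANALYTIC_F0" | check_nesting \
    || echo "timers are not properly nested"
# prints:
#   line 3: stop F_SOURCE_FIRST_PART overlaps running timer UPDATE_ANALYTIC_F0
#   timers are not properly nested
```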