Measuring XGC with TAU on Summit and Spock
Here are some instructions for using TAU to measure XGC on Summit and Spock at OLCF.
There is a version of the CAMTIMERS library that has been integrated with PerfStubs support. On OLCF resources, that library is now installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/camtimers-perfstubs/nvhpc21.7 (for more information on PerfStubs, see https://github.com/khuck/perfstubs/blob/master/perfstubs_api/README.md).
The source for this installation is in https://github.com/khuck/camtimers.
TAU has tool support for the PerfStubs interface.
TAU is installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7 on Summit. The installation in /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/gcc9.3 is for use with GENE.
TAU was installed from https://github.com/UO-OACISS/tau2/, using these commands:
git clone https://github.com/UO-OACISS/tau2.git
cd tau2
module load nvhpc/21.7 spectrum-mpi cuda/11.4 papi/6.0.0.1 binutils/2.36.1
./configure -mpi \
-c++=mpicxx \
-cc=mpicc \
-fortran=mpif90 \
-iowrapper \
-otf=download \
-ompt \
-cuda=/sw/summit/nvhpc_sdk/rhel8/Linux_ppc64le/21.7/cuda/11.4 \
-papi=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/papi-6.0.0.1-fsjlc4klbur5qax5vkys7jwqdorsof6x \
-bfd=/sw/summit/spack-envs/base/opt/linux-rhel8-ppc64le/gcc-8.3.1/binutils-2.36.1-abgveowfozcbngvoli6duel7zsfguvui \
-prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7 \
-tag=nvhpc21.7_omp
make -j16 install
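To check which TAU configurations an installation provides (and therefore which tags tau_exec -T can select), one option is to list the Makefile.tau-* stub makefiles in the architecture's lib directory. The following is a minimal sketch, not part of TAU: the helper name `list_tau_configs` is hypothetical, and the `ibm64linux/lib` location is an assumption based on the `ibm64linux/bin` path used in the job script on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of TAU): print the configuration suffix of
# every Makefile.tau-* stub makefile found in a TAU lib directory.
list_tau_configs() {
    local libdir="$1"
    local makefile
    for makefile in "$libdir"/Makefile.tau-*; do
        # An unmatched glob is left as a literal string, so skip non-files.
        [ -f "$makefile" ] || continue
        # Strip everything up to and including "Makefile.tau-".
        echo "${makefile##*/Makefile.tau-}"
    done
}

# Example (assumed path layout on Summit):
# list_tau_configs /gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7/ibm64linux/lib
```

Each printed suffix is a dash-separated list of the options that configuration was built with, which is what the -T tags are matched against.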
The TAU installation for use with GENE was built with this configuration:
module load otf2/2.3
./configure -mpi \
-c++=mpicxx \
-cc=mpicc \
-fortran=mpif90 \
-iowrapper \
-otf=${OLCF_OTF2_ROOT} \
-cuda=${OLCF_CUDA_ROOT} \
-papi=${OLCF_PAPI_ROOT} \
-bfd=${OLCF_BINUTILS_ROOT} \
-prefix=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/gcc9.3 \
-tag=gcc9.3
make -j16 install
TAU is used at runtime with the tau_exec wrapper script. The script preloads the TAU libraries and sets the appropriate environment variables. A sample script for running the XGC test program is:
#!/bin/bash
#BSUB -P PROJECTID
#BSUB -W 0:02
#BSUB -nnodes 1
#BSUB -J JOB_NAME
#BSUB -o OUTPUT_NAME.%J
#BSUB -e ERROR_OUTPUT_NAME.%J
#BSUB -N EMAIL_ADDRESS@pppl.gov
#BSUB -B EMAIL_ADDRESS@pppl.gov
# Load modules and set paths, same as the build environment
source /ccs/home/khuck/ECP-WDM/src/sourceme-summit.sh
# VERY IMPORTANT! Darshan and TAU both try to wrap MPI_Init/MPI_Finalize, but only one library can...
module unload darshan-runtime
cd /gpfs/alpine/world-shared/projectid/userid/summit/XGC1Example
# create restart file directory
mkdir -p restart_dir
export OMP_NUM_THREADS=4
export xgc_bin_path=/ccs/home/userid/ECP-WDM/src/XGC-Devel/build_full_summit/bin/xgc-es-cpp-gpu
export tau_path=/gpfs/alpine/world-shared/phy122/lib/install/summit/tau2/nvhpc21.7/ibm64linux/bin
PATH=${tau_path}:$PATH
cmd="tau_exec -T nvhpc21.7_omp -ompt"
jsrun -n 4 -r 4 -a 1 -g 1 -c 7 -b rs $cmd $xgc_bin_path --test
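By default, TAU writes one profile.<node>.<context>.<thread> file per measured thread into the working directory (or into $PROFILEDIR if that variable is set). A quick sanity check after the job finishes is to count those files before opening them with TAU's pprof or paraprof viewers. This is a minimal sketch; `count_tau_profiles` is a hypothetical helper, not a TAU command.

```shell
#!/bin/bash
# Hypothetical helper: count TAU profile files in a directory.
# Uses the first argument, else $PROFILEDIR, else the current directory.
count_tau_profiles() {
    local dir="${1:-${PROFILEDIR:-.}}"
    local count=0
    local f
    for f in "$dir"/profile.*; do
        # An unmatched glob stays literal, so only count real files.
        [ -f "$f" ] && count=$((count + 1))
    done
    echo "$count"
}

# Example, run from the directory the job was launched in:
# echo "TAU wrote $(count_tau_profiles) profile file(s)"
```

A count of zero usually means the measurement library was never loaded, which is worth ruling out before investigating missing timers.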
APEX also has tool support for the PerfStubs interface. APEX is a slightly different, but related, tool to TAU. For more information on using APEX, see https://github.com/UO-OACISS/apex.
APEX is installed in /gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7 on Summit.
APEX was installed with these commands:
module load nvhpc/21.7 spectrum-mpi cuda/11.4 papi/6.0.0.1 binutils/2.36.1
module load otf2/2.3
module load gperftools/2.8.1
git clone https://github.com/UO-OACISS/apex.git
cd apex
cwd=`pwd`
builddir=apex_build_summit
instdir=apex_install_summit
rm -rf ${builddir} ${instdir}/include ${instdir}/lib
mkdir ${builddir}
cd ${builddir}
set -x
cmake \
-DCMAKE_C_COMPILER=`which nvc` \
-DCMAKE_CXX_COMPILER=`which nvc++` \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DCMAKE_INSTALL_PREFIX=/gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7 \
-DAPEX_SANITIZE=FALSE \
-DAPEX_WITH_CUDA=TRUE \
-DCUDAToolkit_ROOT=${OLCF_CUDA_ROOT}/11.0 \
-DPAPI_ROOT=${OLCF_PAPI_ROOT} \
-DGPERFTOOLS_ROOT=${OLCF_GPERFTOOLS_ROOT} \
-DOTF2_ROOT=${OLCF_OTF2_ROOT} \
-DBFD_ROOT=${OLCF_BINUTILS_ROOT} \
-DAPEX_WITH_OMPT=TRUE \
-DAPEX_BUILD_OMPT=FALSE \
-DAPEX_WITH_BFD=TRUE \
-DAPEX_WITH_PLUGINS=TRUE \
-DAPEX_WITH_PAPI=TRUE \
-DAPEX_WITH_OTF2=TRUE \
-DAPEX_WITH_TCMALLOC=TRUE \
-DAPEX_WITH_LM_SENSORS=TRUE \
-DAPEX_BUILD_TESTS=FALSE \
-DAPEX_BUILD_EXAMPLES=FALSE \
-DAPEX_WITH_ACTIVEHARMONY=FALSE \
-DAPEX_WITH_MPI=TRUE \
${cwd}  # repository root: cwd was captured after "cd apex"
make -j8
make -j install
Here's a sample job script for running with APEX:
#!/bin/bash
#BSUB -P PROJECTID
#BSUB -W 0:02
#BSUB -nnodes 1
#BSUB -J JOB_NAME
#BSUB -o OUTPUT_NAME.%J
#BSUB -e ERROR_OUTPUT_NAME.%J
#BSUB -N EMAIL_ADDRESS@pppl.gov
#BSUB -B EMAIL_ADDRESS@pppl.gov
# Load modules and set paths, same as the build environment
source /ccs/home/khuck/ECP-WDM/src/sourceme-summit.sh
# VERY IMPORTANT! Darshan and APEX both try to wrap MPI_Finalize, but only one library can...
module unload darshan-runtime
cd /gpfs/alpine/world-shared/projectid/userid/summit/XGC1Example
# create restart file directory
mkdir -p restart_dir
export OMP_NUM_THREADS=4
export xgc_bin_path=/ccs/home/userid/ECP-WDM/src/XGC-Devel/build_full_summit/bin/xgc-es-cpp-gpu
# Options: see the output of `apex_exec --apex:help` for more info
apex_cmd="apex_exec --apex:quiet --apex:ompt --apex:kokkos --apex:cuda --apex:gtrace"
jsrun -n 4 -r 4 -a 1 -g 1 -c 10 -b rs $apex_cmd $xgc_bin_path --test
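The APEX job script above calls apex_exec without adding it to PATH, so it presumably relies on the sourced environment script to do that. If apex_exec is not found, the install directory can be added by hand, mirroring what the TAU script does with tau_path. This sketch assumes apex_exec is installed under bin/ in the APEX install prefix; `prepend_path` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: put a directory at the front of PATH
# unless it is already present.
prepend_path() {
    local dir="$1"
    case ":${PATH}:" in
        *":${dir}:"*) ;;                  # already on PATH, nothing to do
        *) PATH="${dir}:${PATH}" ;;
    esac
    export PATH
}

# Assumed location of apex_exec, derived from the install prefix given above.
apex_bin=/gpfs/alpine/world-shared/phy122/lib/install/summit/apex/nvhpc21.7/bin

# Example use in the job script, before the jsrun line:
# prepend_path "${apex_bin}"
# command -v apex_exec >/dev/null || echo "apex_exec not found on PATH" >&2
```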
The Google Trace Events trace generated by the above command can be visualized in Google Chrome or in Perfetto.
[Figure: the trace visualized in Google Chrome]
[Figure: the same trace visualized in Perfetto]
- There is one problematic timer in the current XGC code base: the “F_SOURCE_FIRST_PART” timer. It is started in update_analytic_wrap (in XGC_core/main_loop_f90_routines.F90) and stopped in add_particle_and_grid_dist_funcs_wrap, so it overlaps with other timers (“UPDATE_ANALYTIC_F0”, for example). It should either be removed or, if it is intended to measure a phase, promoted up a function call or two. If it is commented out, TAU handles the timers fine.
- Even though nvc/nvc++/nvfortran claim to support OpenMP 5.0 OMPT callbacks, TAU and APEX are not getting any. More investigation is needed, but it appears that the NVIDIA/PGI implementation provides the omp-tools.h header without actually providing any OMPT support in the runtime. Otherwise, we do see all the camtimers, Kokkos, and CUDA (host and device) events in the measurements.
- Darshan and TAU don't play well together (see above). Make sure the Darshan runtime module is unloaded when running the simulation with TAU.
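The Darshan conflict noted above can be caught before launch with a small guard in the job script. This is a minimal sketch, assuming the module system records loaded modules in the colon-separated LOADEDMODULES variable (as Lmod and Environment Modules do); `darshan_is_loaded` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a darshan-runtime module appears in the
# colon-separated LOADEDMODULES list maintained by the module system.
darshan_is_loaded() {
    case ":${LOADEDMODULES:-}:" in
        *":darshan-runtime"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use in a job script, before the jsrun line:
# if darshan_is_loaded; then
#     module unload darshan-runtime
# fi
```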
Still have questions? Check out the official documentation or contact tau-bugs@cs.uoregon.edu for help.