Frequently Asked Questions (FAQ)
NOTE: This FAQ is being updated; some instructions may be out of date.
- Who do I contact for help (after reading this page)?
- What is TAU?
- What are the pre-requisites / dependencies for TAU?
- How do I get TAU?
- How do I build TAU?
- What systems are supported?
- What languages / compilers are supported?
- Can you give me a simple example with GCC and OpenMP?
- What is instrumentation?
- Why should I instrument my code?
- How should I instrument my code?
- How do I configure and use PDT?
- Why SHOULDN'T I instrument my code?
- What about binary instrumentation? Does TAU support that?
- What is sampling?
- Why should I sample my code?
- How should I sample my code?
- Can I instrument and sample my code? How would I do that?
- What are the environment variables for controlling TAU at runtime?
- How do I collect a profile?
- How do I collect a trace?
When in doubt, you could read the manuals. If you can't find what you are looking for, please send an email to TAU Bugs.
Tuning and Analysis Utilities (TAU) is a project of the Performance Research Laboratory in the Computer and Information Science Department at the University of Oregon. TAU is an instrumentation API and measurement library for performance analysis, particularly of large-scale parallel (HPC) applications. However, TAU can be used to measure nearly any application, including those written in C, C++, Fortran, Python, and Java. TAU includes source-to-source compilers and binary tools for automatic instrumentation and/or periodic sampling, as well as analysis and visualization tools. For more information, see the TAU web site.
The only hard dependency for TAU is a working C++ compiler. Depending on the additional features you want to use, you may have other dependencies. Common dependencies include:
- Low overhead instrumentation: PDT (HIGHLY recommended for instrumentation)
- Program counter lookup: Binutils (included with TAU if not preinstalled; recommended for compiler-based instrumentation, i.e. without PDT, and also for OpenMP and/or sampling measurement). See these instructions for building binutils on your own (if necessary).
- Hardware counters: PAPI
- Callstack unwinding: Libunwind (included with TAU if not preinstalled, recommended for sampling measurement)
- MPI (probably pre-installed on your system)
- CUDA/CUPTI (probably pre-installed on your system)
Unusual dependencies include:
- SHMEM (probably pre-installed on your system)
- Tracing: VTF, OTF, or EPILOG library
- I/O profiling: Darshan
- Score-P
- Java: a working JDK
- Python
Aside from the GitHub mirror of the current TAU master branch (this repo), official TAU releases are available for download from http://tau.uoregon.edu. Follow the instructions on the "Downloads" page, or download it directly using wget from a terminal:
wget http://tau.uoregon.edu/tau.tgz
# ...expand the tarball...
tar -xzf tau.tgz
# ...and change to the TAU directory. The release number will vary.
cd tau-<major_version>.<minor_version>.<point_version>
TAU is typically configured with:
./configure
make install
# ...and then you need to put TAU in your shell command path:
export PATH=$PATH:/$arch/bin # bash
set path ($path /$arch/bin) # csh
...although this only supports measuring single-threaded, single-process applications, with no hardware counter support or any other options. If you want TAU, you are likely interested in those options (see below). For a list of options, try ./configure -help or ./configure -fullhelp, although you will likely end up back on this page. The most common configuration is the specification of a compiler (other than the autodetected default). Here is an example using the Intel compiler:
./configure -cc=icc -c++=icpc -fortran=intel ...
TAU has been ported to all major systems. The simplified (and not exhaustive) list includes:
- any/all versions of *nix, Linux clusters
- Cray Systems (XT3, CNL, XMT, etc.)
- IBM BG/Q, BG/P, BG/L, AIX, ppc64le
- Sun
- SGI
- Windows
- OSX
- GPGPU systems
...and many more.
If you are instrumenting your code, TAU supports many languages and compilers. The list includes:
- C: cc, gcc, clang, bgclang, gcc4, scgcc, KCC, pgcc, guidec, xlc, ecc, pathcc, orcc
- C++: CC, KCC, g++, xlC, cxx, pgCC, pgcpp, FCC, guidec++, aCC, c++, ecpc, clang++, bgclang++, g++4, icpc, scgcc, scpathCC, pathCC, orCC
- Fortran: gnu, sgi, ibm, ibm64, hp, cray, pgi, absoft, fujitsu, sun, compaq, g95, open64, kai, nec, hitachi, intel, lahey, nagware, pathscale, gfortran, gfortran4
- Unified Parallel C: upc/gcc (GNU UPC), upcc (Berkeley UPC), cc (Cray CCE UPC)
- Python
- Java
...If you are NOT instrumenting your code, you can use TAU sampling to measure any executable.
Sure!
./configure -openmp # configures TAU
make install # builds TAU
export PATH=$PATH:`pwd`/`arch`/bin # puts TAU utilities in your execution path
export TAU_MAKEFILE=`pwd`/`arch`/lib/Makefile.tau-openmp # tells tau_cc.sh what settings to use
That's it! GCC is the default compiler, and if it is in your path, you are ready to go. For example, if you have a simple program in one C file, you would instrument and build it with TAU like this:
tau_cc.sh test.c -fopenmp -o test
...will parse, instrument and compile your code. When you execute it with N OpenMP threads, you will get one profile per thread, profile.0.0.0 ... profile.0.0.N-1:
OMP_NUM_THREADS=2 ./test
...and to see a summary of the profile data, run the pprof program.
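Before running pprof, it can help to confirm that the run produced the expected number of profile files. The following is a minimal sketch, assuming the profile.<node>.<context>.<thread> file naming shown above; the function name count_profiles is hypothetical and not part of TAU.

```shell
# Hypothetical helper (not part of TAU): count the profile.* files in a
# directory. Defaults to the current directory when no argument is given.
count_profiles() {
    dir="${1:-.}"
    count=0
    for f in "$dir"/profile.*; do
        # When nothing matches, the glob is left unexpanded, so only
        # count entries that are real files.
        if [ -f "$f" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Example: after running with OMP_NUM_THREADS=2, this should report 2.
# count_profiles .
```

If the count is lower than the number of threads, the program may not have been built with tau_cc.sh, or the profiles may have been written to a different directory.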
The examples directory is full of useful examples for different TAU configurations. The simplest (and most commonly used) example is in the examples/mm directory. The full set of instructions for building TAU and running that example is:
./configure -openmp
make install
export PATH=$PATH:`pwd`/`arch`/bin
export TAU_MAKEFILE=`pwd`/`arch`/lib/Makefile.tau-openmp
cd examples/mm
make clean
make
export OMP_NUM_THREADS=2
./matmult
pprof
...which will generate output something like this:
pprof
Reading Profile files in profile.*
NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 4 36 1 1 36522 .TAU application
89.0 0.086 32 1 1 32502 main
88.8 0.332 32 1 11 32416 do_work
48.4 0.149 17 1 1 17668 compute
48.0 17 17 1 1 17519 OpenMP_PARALLEL_REGION: compute.omp_fn.1
31.8 0.049 11 1 1 11610 compute_interchange
31.7 11 11 1 1 11561 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0
5.1 1 1 3 3 619 initialize
2.4 0.88 0.88 3 0 293 allocateMatrix
1.2 0.409 0.422 3 3 141 OpenMP_PARALLEL_REGION: initialize.omp_fn.0
1.1 0.405 0.405 1 0 405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1
0.7 0.251 0.251 1 0 251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0
0.2 0.069 0.069 3 0 23 freeMatrix
0.0 0.013 0.013 3 0 4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0
NODE 0;CONTEXT 0;THREAD 1:
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 0.539 29 1 5 29793 .TAU application
58.8 17 17 1 0 17509 OpenMP_PARALLEL_REGION: compute.omp_fn.1
38.8 11 11 1 0 11551 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0
0.7 0.194 0.194 3 0 65 OpenMP_PARALLEL_REGION: initialize.omp_fn.0
FUNCTION SUMMARY (total):
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 4 66 2 6 33158 .TAU application
52.8 34 35 2 1 17514 OpenMP_PARALLEL_REGION: compute.omp_fn.1
49.0 0.086 32 1 1 32502 main
48.9 0.332 32 1 11 32416 do_work
34.9 22 23 2 1 11556 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0
26.6 0.149 17 1 1 17668 compute
17.5 0.049 11 1 1 11610 compute_interchange
2.8 1 1 3 3 619 initialize
1.3 0.88 0.88 3 0 293 allocateMatrix
0.9 0.603 0.616 6 3 103 OpenMP_PARALLEL_REGION: initialize.omp_fn.0
0.6 0.405 0.405 1 0 405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1
0.4 0.251 0.251 1 0 251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0
0.1 0.069 0.069 3 0 23 freeMatrix
0.0 0.013 0.013 3 0 4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0
FUNCTION SUMMARY (mean):
---------------------------------------------------------------------------------------
%Time Exclusive Inclusive #Call #Subrs Inclusive Name
msec total msec usec/call
---------------------------------------------------------------------------------------
100.0 2 33 1 3 33158 .TAU application
52.8 17 17 1 0.5 17514 OpenMP_PARALLEL_REGION: compute.omp_fn.1
49.0 0.043 16 0.5 0.5 32502 main
48.9 0.166 16 0.5 5.5 32416 do_work
34.9 11 11 1 0.5 11556 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0
26.6 0.0745 8 0.5 0.5 17668 compute
17.5 0.0245 5 0.5 0.5 11610 compute_interchange
2.8 0.718 0.928 1.5 1.5 619 initialize
1.3 0.44 0.44 1.5 0 293 allocateMatrix
0.9 0.301 0.308 3 1.5 103 OpenMP_PARALLEL_REGION: initialize.omp_fn.0
0.6 0.203 0.203 0.5 0 405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1
0.4 0.126 0.126 0.5 0 251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0
0.1 0.0345 0.0345 1.5 0 23 freeMatrix
0.0 0.0065 0.0065 1.5 0 4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0
Instrumentation is the process of inserting observation/measurement into an executable. It can be done at either the source code or binary stage or hooked into the runtime in the case of Java and Python. TAU instrumentation includes timers, counters and other specialized measurement.
Instrumentation is the most portable way to observe the behavior of an application. In the case of C/C++ and Fortran, TAU timers and counters are simple C function calls, and are linked in just like any other library. Instrumentation is not the only method of measurement, and sometimes it is not advised (see below).
There are two methods for source based instrumentation. The first and recommended method is to use PDT. PDT will parse the source code, find function (and optionally outer loop) boundaries, insert TAU timers, and pass the instrumented code to the regular compiler. The second method is to use compiler based instrumentation. Compiler based instrumentation is just what it sounds like - the compiler actually inserts the TAU calls during the compilation process. We strongly recommend using PDT whenever possible, as the overhead associated with compiler-based instrumentation is typically much higher. Regardless of the method, once TAU is configured and built you can use TAU like any other compiler:
tau_cc.sh -O3 -g -c test.c -o test.o
PDT is available for download from http://tau.uoregon.edu/pdt. Follow the instructions on the "Downloads" page, or download it directly using wget from a terminal:
wget http://tau.uoregon.edu/pdt.tgz
# ...expand the tarball...
tar -xzf pdt.tgz
# ...and change to the PDT directory. The release number will vary.
cd pdtoolkit-[major_version].[minor_version]
Configure and build PDT (optionally provide an installation prefix, otherwise the installation is in-place). The following example will build PDT with GCC and install it in /path/to/pdt/installation:
./configure -GNU -prefix=/path/to/pdt/installation
make install
# Then, when configuring TAU, use the -pdt option:
cd tau-[major_version].[minor_version].[point_version]
./configure -pdt=/path/to/pdt/installation
make install
# ...for a full list of configuration options, do this:
./configure -help
The primary reason to not instrument your source code is that you don't have access to the source code. In that case, you could try binary instrumentation. The primary reason to not instrument your code AT ALL is if you have many lightweight functions that will be called millions of times and introduce too much overhead. A common example is a C++ program with many getter and setter functions. In those cases you should try sampling.
Yes, TAU supports binary instrumentation with either MAQAO, DynInstAPI or PEBiL. The choice of instrumentor is configuration-dependent. After compiling your program normally, use the tau_rewrite program to instrument your program:
tau_rewrite ./myprogram
Sampling is when the program is periodically interrupted, and the running state of the program is examined. The samples are aggregated and a histogram of where the program spends its time is built. Statistically, functions that are executed more often and/or for longer durations accumulate more samples, so the histogram approximates where the time is spent.
The first reason to sample your program is that you don't have access to the source code. The second reason to sample your code is if you have many lightweight functions that will be called millions of times and introduce too much overhead if instrumented. A common example is a C++ program with many getter and setter functions. In those cases you should try sampling.
The way to sample a program which is not instrumented by TAU is to run it with the tau_exec program. For example, a program with MPI is executed with:
tau_exec -T mpi -ebs ./myprogram
For another example, a program with just OpenMP or pthread concurrency is executed with:
tau_exec -T serial,openmp -ebs ./myprogram
tau_exec -T serial,pthread -ebs ./myprogram
...where the "serial" configuration tells TAU to not use the MPI configuration. As a final example, a program with both MPI and OpenMP is executed this way:
tau_exec -T mpi,openmp -ebs ./myprogram
...or (because MPI is the default):
tau_exec -T openmp -ebs ./myprogram
For more information, run tau_exec with no parameters to get help.
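The mapping from concurrency model to tau_exec tags described above can be captured in a small wrapper. This is only a sketch: the function name tau_sample_cmd and the model labels (mpi, openmp, pthread, mpi+openmp) are hypothetical, and the function prints the command rather than running it so that it can be inspected first.

```shell
# Hypothetical helper (not part of TAU): print the tau_exec sampling command
# for a given concurrency model, following the examples in this FAQ.
tau_sample_cmd() {
    model="$1"
    shift
    case "$model" in
        mpi)        tags="mpi" ;;
        openmp)     tags="serial,openmp" ;;   # "serial" means: no MPI
        pthread)    tags="serial,pthread" ;;
        mpi+openmp) tags="mpi,openmp" ;;
        *)
            echo "unknown model: $model" >&2
            return 1
            ;;
    esac
    echo "tau_exec -T $tags -ebs $*"
}

# Example: print the command for an OpenMP-only program.
# tau_sample_cmd openmp ./myprogram
```

Printing the command keeps the sketch usable on machines where TAU is not installed; to execute it, the output could be passed to eval once it looks right.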
Yes. Essentially, you would either manually or automatically instrument your program, and then set the TAU_SAMPLING environment variable before executing the program:
export TAU_SAMPLING=1 #bash
setenv TAU_SAMPLING 1 #csh
./myprogram
Multiprocess Questions
All of them, for the most part. In particular:
- MPICH
- MVAPICH
- LAM/MPI
- Open MPI
- IBM MPI
- Cray MPI
- HP MPI
- ...
TAU uses the PMPI profiling interface to measure all MPI calls. Because PMPI is part of the MPI standard, any standards-compliant implementation should be supported.
First, make sure MPI (mpicc, mpiCC, mpif77, etc.) is in your path. Then, configure TAU with:
./configure -mpi
If your MPI installation has predictable locations for include and lib directories, the configure process should find them. For example, if your mpicc compiler is in /home/user/mpi/bin, configure will use /home/user/mpi/include and /home/user/mpi/lib. If that is NOT the case, you also need to tell TAU the path to those directories:
./configure -mpi -mpiinc=/some/path/to/include -mpilib=/some/other/path/to/lib
If your MPI implementation has an unusual library name(s) or additional library dependencies, you can also specify the library name(s):
./configure -mpi -mpiinc=/some/path/to/include -mpilib=/some/other/path/to/lib \
    -mpilibrary="-lmy_mpi -lmy_mpi2 -L/path/to/dependency/lib -lmpi_dependency"
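To see which include and lib directories correspond to a given mpicc, the "predictable locations" rule described above can be reproduced by hand. The sketch below is not TAU's actual configure logic; the function name mpi_flags_from_compiler is hypothetical, and it simply assumes include and lib sit next to the compiler's bin directory.

```shell
# Hypothetical helper (not part of TAU): given the full path to mpicc, print
# the -mpiinc/-mpilib flags for the include and lib directories that sit
# next to the compiler's bin directory.
mpi_flags_from_compiler() {
    mpicc_path="$1"
    bindir=$(dirname "$mpicc_path")    # e.g. /home/user/mpi/bin
    prefix=$(dirname "$bindir")        # e.g. /home/user/mpi
    echo "-mpi -mpiinc=$prefix/include -mpilib=$prefix/lib"
}

# Possible use, assuming mpicc is in your path and you are in the TAU directory:
# ./configure $(mpi_flags_from_compiler "$(command -v mpicc)")
```

If the printed directories do not exist on your system, the MPI installation has a non-standard layout and the paths need to be given explicitly.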
Building TAU for OpenMP is straightforward:
./configure -openmp
For the best results, if you are using GCC, Open64 or OpenUH we recommend you configure with the following options (it allows for the most data collection flexibility):
./configure -openmp -bfd=download -unwind=download
If you are using Intel compilers, you should configure like this (OMPT is an interface for OpenMP runtime introspection. For more information, see http://openmp.org/mp-documents/ompt-tr2.pdf or http://link.springer.com/chapter/10.1007%2F978-3-642-40698-0_13):
./configure -openmp -bfd=download -unwind=download -ompt=download
If you are using some other compiler vendor and you want to use OPARI to instrument OpenMP regions, you should configure with -opari (see above for information on configuring PDT):
./configure -openmp -opari -pdt=/path/to/pdt/installation
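The compiler-specific recommendations above can be summarized as a lookup. This is a rough sketch only: the function name tau_openmp_configure_flags and the compiler labels (gcc, open64, openuh, intel) are hypothetical, and the PDT path defaults to the placeholder used elsewhere on this page.

```shell
# Hypothetical helper (not part of TAU): print the OpenMP configure flags
# this FAQ recommends for a given compiler family.
tau_openmp_configure_flags() {
    compiler="$1"
    pdt_dir="${2:-/path/to/pdt/installation}"   # placeholder path
    case "$compiler" in
        gcc|open64|openuh)
            echo "-openmp -bfd=download -unwind=download" ;;
        intel)
            echo "-openmp -bfd=download -unwind=download -ompt=download" ;;
        *)
            # Other vendors: OPARI-based instrumentation, which needs PDT.
            echo "-openmp -opari -pdt=$pdt_dir" ;;
    esac
}

# Possible use from the TAU source directory:
# ./configure $(tau_openmp_configure_flags gcc)
```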
Building TAU for pthreads is straightforward:
./configure -pthread
There are several. A complete list is here. Common variables include:
Variable | Description | Default Value |
---|---|---|
TAU_PROFILE | Set to 1 to have TAU profile your code. | 1 |
TAU_TRACE | Set to 1 to have TAU trace your code. | 0 |
TAU_METRICS | Colon-delimited list of TAU/PAPI metrics to profile. | TIME |
PROFILEDIR | Specifies the directory where profile files are to be stored. See Chapter 2, Profiling. | current working directory |
TAU_CALLPATH | When set to 1, TAU will generate call-path data. Use with TAU_CALLPATH_DEPTH. | 0 |
TAU_CALLPATH_DEPTH | Sets the depth of the call-path profiling. Use with the TAU_CALLPATH environment variable. | 1 |
TAU_TRACK_MESSAGE | Track MPI message statistics (profiling) or message lines (tracing). | 0 |
TAU_COMM_MATRIX | Generate MPI communication matrix data. | 0 |
TAU_THROTTLE | Enables the runtime throttling of events that are lightweight. See Section 1.3, “Selectively Profiling an Application”. | 1 |
TAU_THROTTLE_NUMCALLS | Set the maximum number of calls that will be profiled for any function when TAU_THROTTLE is enabled. See Section 1.3, “Selectively Profiling an Application”. | 100000 |
TAU_THROTTLE_PERCALL | Set the minimum inclusive time (in milliseconds) a function has to have to be instrumented when TAU_THROTTLE is enabled. See Section 1.3, “Selectively Profiling an Application”. | 1 |
TAU_TRACEFILE | Specifies the name of the trace file. | 'traces.*' |
TRACEDIR | Specifies the directory where trace files are to be stored. See Section 3.1, “Generating Event Traces”. | current working directory |
TAU_VERBOSE | When set, TAU will print out information about its configuration when running an instrumented application. | 0 |
TAU_PROFILE_FORMAT | When set to snapshot, TAU will generate condensed snapshot profiles instead of the default kind (they merge together different metrics so there is only one file per node). When set to merged, TAU will pre-compute mean and std. dev. at the end of execution. | profiles |
TAU_SYNCHRONIZE_CLOCKS | When set, TAU will correct for any time discrepancies between nodes because of their CPU clock lag. This should produce more reliable trace data. | 0 |
TAU_SAMPLING | When set to 1, TAU collects additional profile or trace information (depending on whether TAU_PROFILE or TAU_TRACE is set, respectively) via periodic sampling at runtime. The metrics collected and the sampling period are controlled by the TAU_EBS_SOURCE and TAU_EBS_PERIOD variables, respectively. The TAU_EBS_UNWIND variable determines if callstack unwinding is enabled at each sample. | 0 |
TAU_EBS_PERIOD | Sets the period between samples. How the value is interpreted depends on TAU_EBS_SOURCE. | varies by platform, usually 30000 microseconds |
TAU_SUMMARY | Set this variable to 1 to generate just min/max/stddev/mean statistics instead of per-node data. Use paraprof -dumpsummary and then pprof -f profile.Max/Min to see the data. | 0 |
TAU_CUPTI_API | Options: runtime, driver, both. Controls which layer of CUDA is tracked within the CUPTI measurement system. See for example: tau_exec -T serial,cupti -cupti ./matmult. The option should be set based on which layer the CUDA program uses: runtime when the program uses the CUDA runtime API, driver when the program uses the driver API. NOTE: Both the PGI accelerator and the HMPP compilers use the driver API. | runtime |
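As an illustration of how several of these variables combine, the sketch below sets up a call-path profiling run. The values are examples rather than recommendations, and ./myprogram stands in for any hypothetical instrumented executable.

```shell
# Illustrative settings only; adjust the values for your application.
export TAU_CALLPATH=1          # generate call-path data
export TAU_CALLPATH_DEPTH=5    # assumption: an arbitrary example depth
export TAU_THROTTLE=1          # keep throttling of lightweight events on (the default)
export TAU_VERBOSE=1           # print TAU configuration information when running

# Then run the instrumented program as usual (hypothetical name):
# ./myprogram
```

Because these are exported, they stay in effect for every later run in the same shell; unset them (or use a subshell) to return to the defaults.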
Profiling is the default mode of TAU - whether instrumenting or sampling, profiles will be collected by default.
To collect a trace, set the appropriate environment variable before executing your program:
export TAU_TRACE=1 #bash
setenv TAU_TRACE 1 #csh
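To enable tracing for a single run without changing the rest of the shell session, the variables can be passed through env. This is a minimal sketch for sh-compatible shells: the function name tau_trace_run is hypothetical, and it assumes TRACEDIR behaves as described in the table above.

```shell
# Hypothetical helper (not part of TAU): run one command with tracing enabled
# and trace files directed to the given directory.
tau_trace_run() {
    tracedir="$1"
    shift
    # Create the directory first so the run does not fail on a missing path.
    mkdir -p "$tracedir" || return 1
    env TAU_TRACE=1 TRACEDIR="$tracedir" "$@"
}

# Example (hypothetical program and directory names):
# tau_trace_run ./traces ./myprogram
```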
Here are the steps to build binutils 2.23.2 for TAU ($installation_dir is the location where you want to install binutils):
# Get the tarball...
wget http://www.cs.uoregon.edu/research/paracomp/tau/tauprofile/dist/binutils-2.23.2.tar.gz
# ...expand the tarball...
tar -xvzf binutils-2.23.2.tar.gz
# ...change to the source directory...
cd binutils-2.23.2
# ...configure...
./configure CFLAGS=-fPIC CXXFLAGS=-fPIC --prefix=$installation_dir --disable-nls --disable-werror
# ...build...
make
# ...install...
make install
# ...copy additional resources required by TAU to the installation directory...
cp bfd/*.h $installation_dir/include/.
cp -r include/* $installation_dir/include/.
cp libiberty/libiberty.a $installation_dir/lib
cp libopcodes/libopcodes.a $installation_dir/lib
# ...and edit the bfd.h header to not require config.h.
sed -e 's/#if !defined PACKAGE && !defined PACKAGE_VERSION/#if 0/' $installation_dir/include/bfd.h > /tmp/bfd.h
mv /tmp/bfd.h $installation_dir/include
Done!
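After these steps, a quick check can confirm that the files copied by hand actually landed in the installation directory. The sketch below only tests for the files named in the steps above; the function name check_binutils_install is hypothetical.

```shell
# Hypothetical helper (not part of TAU or binutils): verify that the files
# copied in the steps above exist under the installation directory.
check_binutils_install() {
    dir="$1"
    missing=0
    for f in include/bfd.h lib/libiberty.a lib/libopcodes.a; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Returns 0 when everything is present, 1 otherwise.
    return "$missing"
}

# Example:
# check_binutils_install "$installation_dir" && echo "binutils looks complete"
```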
Still have questions? Check out the official documentation or contact tau-bugs@cs.uoregon.edu for help.