Frequently Asked Questions (FAQ)

Frequently Asked Questions about TAU

NOTE: This FAQ is being updated, and some instructions may be out of date.

General Questions

Who do I contact for help (after reading this page)?
What is TAU?
What are the pre-requisites / dependencies for TAU?
How do I get TAU?
How do I build TAU?
What systems are supported?
What languages / compilers are supported?
Can you give me a simple example with GCC and OpenMP?

Instrumentation Questions

What is instrumentation?
Why should I instrument my code?
How should I instrument my code?
How do I configure and use PDT?
Why SHOULDN'T I instrument my code?
What about binary instrumentation? Does TAU support that?

Sampling Questions

What is sampling?
Why should I sample my code?
How should I sample my code?
Can I instrument and sample my code? How would I do that?

Multiprocess Questions

Which MPI implementations are supported?
How do I build TAU for MPI?

Multithread Questions

How do I build TAU for OpenMP?
How do I build TAU for pthreads?

Runtime Questions

What are the environment variables for controlling TAU at runtime?
How do I collect a profile?
How do I collect a trace?

General Questions

Who do I contact for help (after reading this page)?

When in doubt, read the manuals first. If you can't find what you are looking for, please send an email to TAU Bugs (tau-bugs@cs.uoregon.edu).

What is TAU?

Tuning and Analysis Utilities (TAU) is a project of the Performance Research Laboratory in the Computer and Information Science Department at the University of Oregon. TAU is an instrumentation API and measurement library for performance analysis, particularly of large-scale parallel (HPC) applications. However, TAU can be used to measure nearly any application, including those written in C, C++, Fortran, Python, and Java. TAU includes source-to-source compilers and binary tools for automatic instrumentation and/or periodic sampling, as well as analysis and visualization tools. For more information, see the TAU web site.

What are the pre-requisites / dependencies for TAU?

The only hard dependency for TAU is a working C++ compiler. Depending on the additional features you want to use, you may have other dependencies. Common dependencies include:

Low overhead instrumentation: PDT (HIGHLY recommended for instrumentation)
Program counter lookup: Binutils (included with TAU if not preinstalled; recommended for compiler-based instrumentation, i.e. without PDT, and for OpenMP and/or sampling measurement). See these instructions for building binutils on your own (if necessary).
Hardware counters: PAPI
Callstack unwinding: Libunwind (included with TAU if not preinstalled, recommended for sampling measurement)
MPI (probably pre-installed on your system)
CUDA/CUPTI (probably pre-installed on your system)

Unusual dependencies include:

SHMEM (probably pre-installed on your system)
Tracing: VTF, OTF, or EPILOG library
I/O profiling: Darshan
Score-P
Java: a working JDK
Python

How do I get TAU?

Aside from the GitHub mirror of the current TAU master branch (this repo), official TAU releases are available for download from http://tau.uoregon.edu. Follow the instructions on the "Downloads" page, or download it directly using wget from a terminal:

wget http://tau.uoregon.edu/tau.tgz
# ...expand the tarball...
tar -xzf tau.tgz
# ...and change to the TAU directory. The release number will vary.
cd tau-<major_version>.<minor_version>.<point_version>

How do I build TAU?

TAU is typically configured with:

./configure
make install
# ...and then you need to put TAU in your shell command path:
export PATH=$PATH:`pwd`/`arch`/bin       # bash
set path = ($path `pwd`/`arch`/bin)      # csh

...although this only supports measuring single-threaded, single-process applications, with no hardware counter support or any other options. If you want TAU, you are likely interested in those options (see below). For a full list of options, try ./configure -help or ./configure -fullhelp, although you will likely end up back on this page. The most common configuration option is the specification of a compiler (other than the autodetected default). Here is an example using the Intel compiler:

./configure -cc=icc -c++=icpc -fortran=intel ...

What systems are supported?

TAU has been ported to all major systems. The simplified (and not exhaustive) list includes:

any/all versions of *nix, Linux clusters
Cray Systems (XT3, CNL, XMT, etc.)
IBM BG/Q, BG/P, BG/L, AIX, ppc64le
Sun
SGI
Windows
OSX
GPGPU systems

...and many more.

What languages / compilers are supported?

If you are instrumenting your code, TAU supports many languages and compilers. The list includes:

C: cc, gcc, clang, bgclang, gcc4, scgcc, KCC, pgcc, guidec, xlc, ecc, pathcc, orcc
C++: CC, KCC, g++, xlC, cxx, pgCC, pgcpp, FCC, guidec++, aCC, c++, ecpc, clang++, bgclang++, g++4, icpc, scgcc, scpathCC, pathCC, orCC
Fortran: gnu, sgi, ibm, ibm64, hp, cray, pgi, absoft, fujitsu, sun, compaq, g95, open64, kai, nec, hitachi, intel, lahey, nagware, pathscale, gfortran, gfortran4
Unified Parallel C: upc/gcc (GNU UPC), upcc (Berkeley UPC), cc (Cray CCE UPC)
Python
Java

...If you are NOT instrumenting your code, you can use TAU sampling to measure any executable.

Can you give me a simple example with GCC and OpenMP?

Sure!

./configure -openmp                   # configures TAU
make install                          # builds TAU
export PATH=$PATH:`pwd`/`arch`/bin    # puts TAU utilities in your execution path
export TAU_MAKEFILE=`pwd`/`arch`/lib/Makefile.tau-openmp   # tells tau_cc.sh what settings to use

That's it! GCC is the default compiler, and if it is in your path, you are ready to go. For example, if you have a simple program in one C file, you would instrument and build it with TAU like this:

tau_cc.sh test.c -fopenmp -o test

...which will parse, instrument and compile your code. When you execute it, you will get one profile per thread: N profiles named profile.0.0.0 ... profile.0.0.N-1, where N is the number of threads:

OMP_NUM_THREADS=2 ./test

...and to see a summary of the profile data, run the pprof program.
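
As a rough end-to-end sketch of the steps above, the following script writes a tiny OpenMP program, lists the profile file names it expects, and then builds and runs the program if tau_cc.sh is on the path. The program contents, the scratch directory and the variable names are illustrative assumptions, not part of TAU. It also assumes TAU_MAKEFILE is already set as shown earlier.

```shell
# Work in a scratch directory so the example does not clutter the TAU tree.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A tiny OpenMP program (hypothetical contents; any single C file would do).
cat > test.c <<'EOF'
#include <stdio.h>
#include <omp.h>

int main(void) {
    #pragma omp parallel
    {
        printf("hello from thread %d\n", omp_get_thread_num());
    }
    return 0;
}
EOF

# One profile per thread is expected: profile.0.0.0 ... profile.0.0.N-1.
export OMP_NUM_THREADS=2
expected_profiles=""
t=0
while [ "$t" -lt "$OMP_NUM_THREADS" ]; do
    expected_profiles="$expected_profiles profile.0.0.$t"
    t=$((t + 1))
done
echo "expecting:$expected_profiles"

# Build, run and summarize only if the TAU compiler wrapper is available.
if command -v tau_cc.sh >/dev/null 2>&1; then
    tau_cc.sh test.c -fopenmp -o test && ./test && pprof
else
    echo "tau_cc.sh not found; check PATH and TAU_MAKEFILE" >&2
fi
```

If the profile files do not appear after the run, checking that TAU_MAKEFILE points at the Makefile.tau-openmp stub is a reasonable first step.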

The examples directory is full of useful examples for different TAU configurations. The simplest (and most commonly used) example is in the examples/mm directory. The full set of instructions for building TAU and running that example is:

./configure -openmp 
make install
export PATH=$PATH:`pwd`/`arch`/bin
export TAU_MAKEFILE=`pwd`/`arch`/lib/Makefile.tau-openmp
cd examples/mm
make clean
make
export OMP_NUM_THREADS=2
./matmult
pprof

...which will generate output something like this:

pprof
Reading Profile files in profile.*

NODE 0;CONTEXT 0;THREAD 0:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call 
---------------------------------------------------------------------------------------
100.0            4           36           1           1      36522 .TAU application
 89.0        0.086           32           1           1      32502 main 
 88.8        0.332           32           1          11      32416 do_work 
 48.4        0.149           17           1           1      17668 compute 
 48.0           17           17           1           1      17519 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
 31.8        0.049           11           1           1      11610 compute_interchange 
 31.7           11           11           1           1      11561 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
  5.1            1            1           3           3        619 initialize 
  2.4         0.88         0.88           3           0        293 allocateMatrix 
  1.2        0.409        0.422           3           3        141 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
  1.1        0.405        0.405           1           0        405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1 
  0.7        0.251        0.251           1           0        251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0 
  0.2        0.069        0.069           3           0         23 freeMatrix 
  0.0        0.013        0.013           3           0          4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0 

NODE 0;CONTEXT 0;THREAD 1:
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call 
---------------------------------------------------------------------------------------
100.0        0.539           29           1           5      29793 .TAU application
 58.8           17           17           1           0      17509 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
 38.8           11           11           1           0      11551 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
  0.7        0.194        0.194           3           0         65 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 

FUNCTION SUMMARY (total):
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call 
---------------------------------------------------------------------------------------
100.0            4           66           2           6      33158 .TAU application
 52.8           34           35           2           1      17514 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
 49.0        0.086           32           1           1      32502 main 
 48.9        0.332           32           1          11      32416 do_work 
 34.9           22           23           2           1      11556 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
 26.6        0.149           17           1           1      17668 compute 
 17.5        0.049           11           1           1      11610 compute_interchange 
  2.8            1            1           3           3        619 initialize 
  1.3         0.88         0.88           3           0        293 allocateMatrix 
  0.9        0.603        0.616           6           3        103 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
  0.6        0.405        0.405           1           0        405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1 
  0.4        0.251        0.251           1           0        251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0 
  0.1        0.069        0.069           3           0         23 freeMatrix 
  0.0        0.013        0.013           3           0          4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0 

FUNCTION SUMMARY (mean):
---------------------------------------------------------------------------------------
%Time    Exclusive    Inclusive       #Call      #Subrs  Inclusive Name
              msec   total msec                          usec/call 
---------------------------------------------------------------------------------------
100.0            2           33           1           3      33158 .TAU application
 52.8           17           17           1         0.5      17514 OpenMP_PARALLEL_REGION: compute.omp_fn.1 
 49.0        0.043           16         0.5         0.5      32502 main 
 48.9        0.166           16         0.5         5.5      32416 do_work 
 34.9           11           11           1         0.5      11556 OpenMP_PARALLEL_REGION: compute_interchange.omp_fn.0 
 26.6       0.0745            8         0.5         0.5      17668 compute 
 17.5       0.0245            5         0.5         0.5      11610 compute_interchange 
  2.8        0.718        0.928         1.5         1.5        619 initialize 
  1.3         0.44         0.44         1.5           0        293 allocateMatrix 
  0.9        0.301        0.308           3         1.5        103 OpenMP_PARALLEL_REGION: initialize.omp_fn.0 
  0.6        0.203        0.203         0.5           0        405 OpenMP_IMPLICIT_BARRIER: compute.omp_fn.1 
  0.4        0.126        0.126         0.5           0        251 OpenMP_IMPLICIT_BARRIER: compute_interchange.omp_fn.0 
  0.1       0.0345       0.0345         1.5           0         23 freeMatrix 
  0.0       0.0065       0.0065         1.5           0          4 OpenMP_IMPLICIT_BARRIER: initialize.omp_fn.0

Instrumentation Questions

What is instrumentation?

Instrumentation is the process of inserting observation/measurement into an executable. It can be done at either the source code or binary stage or hooked into the runtime in the case of Java and Python. TAU instrumentation includes timers, counters and other specialized measurement.

Why should I instrument my code?

Instrumentation is the most portable way to observe the behavior of an application. In the case of C/C++ and Fortran, TAU timers and counters are simple C function calls, and are linked in just like any other library. Instrumentation is not the only method of measurement, and sometimes it is not advised (see below).

How should I instrument my code?

There are two methods for source based instrumentation. The first and recommended method is to use PDT. PDT will parse the source code, find function (and optionally outer loop) boundaries, insert TAU timers, and pass the instrumented code to the regular compiler. The second method is to use compiler based instrumentation. Compiler based instrumentation is just what it sounds like - the compiler actually inserts the TAU calls during the compilation process. We strongly recommend using PDT whenever possible, as the overhead associated with compiler-based instrumentation is typically much higher. Regardless of the method, once TAU is configured and built you can use TAU like any other compiler:

tau_cc.sh -O3 -g -c test.c -o test.o

How do I configure and use PDT?

PDT is available for download from http://tau.uoregon.edu/pdt. Follow the instructions on the "Downloads" page, or download it directly using wget from a terminal:

wget http://tau.uoregon.edu/pdt.tgz
# ...expand the tarball...
tar -xzf pdt.tgz
# ...and change to the PDT directory. The release number will vary.
cd pdtoolkit-[major_version].[minor_version]

Configure and build PDT (optionally provide an installation prefix, otherwise the installation is in-place). The following example will build PDT with GCC and install it in /path/to/pdt/installation:

./configure -GNU -prefix=/path/to/pdt/installation
make install
# Then, when configuring TAU, use the -pdt option:
cd tau-[major_version].[minor_version].[point_version]
./configure -pdt=/path/to/pdt/installation
make install
# ...for a full list of configuration options, do this:
./configure -help

Why SHOULDN'T I instrument my code?

The primary reason to not instrument your source code is that you don't have access to the source code. In that case, you could try binary instrumentation. The primary reason to not instrument your code AT ALL is if you have many lightweight functions that will be called millions of times and introduce too much overhead. A common example is a C++ program with many getter and setter functions. In those cases you should try sampling.

What about binary instrumentation? Does TAU support that?

Yes, TAU supports binary instrumentation with either MAQAO, DynInstAPI or PEBiL. The choice of instrumentor is configuration-dependent. After compiling your program normally, use the tau_rewrite program to instrument your program:

tau_rewrite ./myprogram

Sampling Questions

What is sampling?

In sampling, the program is periodically interrupted and its running state is examined. The samples are aggregated into a histogram of where the program spends its time. Statistically, more samples land in the functions that are executed more often and/or for longer durations, so the histogram approximates the distribution of execution time.
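
To make the aggregation step concrete, here is a minimal sketch using standard shell tools. The sample data is made up for illustration (one line per interrupt, naming the function that was running), and this is not how TAU stores or processes its samples.

```shell
# Hypothetical samples: one line per interrupt, naming the function that was
# executing at that moment (made-up data, not TAU output).
samples="compute
compute
initialize
compute
compute_interchange
compute
compute_interchange"

# Count the samples per function and list the most frequently sampled first.
histogram=$(printf '%s\n' "$samples" | sort | uniq -c | sort -rn | awk '{print $2, $1}')
printf '%s\n' "$histogram"
```

Here compute collects four of the seven samples, so it is estimated to account for a little over half of the run time.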

Why should I sample my code?

The first reason to sample your program is that you don't have access to the source code. The second reason to sample your code is if you have many lightweight functions that will be called millions of times and introduce too much overhead if instrumented. A common example is a C++ program with many getter and setter functions. In those cases you should try sampling.

How should I sample my code?

The way to sample a program which is not instrumented by TAU is to run it with the tau_exec program. For example, a program with MPI is executed with:

tau_exec -T mpi -ebs ./myprogram

For another example, a program with just OpenMP or pthread concurrency is executed with:

tau_exec -T serial,openmp -ebs ./myprogram
tau_exec -T serial,pthread -ebs ./myprogram

...where the "serial" configuration tells TAU to not use the MPI configuration. As a final example, a program with both MPI and OpenMP is executed this way:

tau_exec -T mpi,openmp -ebs ./myprogram

...or (because MPI is the default):

tau_exec -T openmp -ebs ./myprogram

For more information, run tau_exec with no parameters to get help.
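
The choice of -T argument can be wrapped in a small helper. This is only a sketch: tau_config_for and the model names it accepts are hypothetical and not part of TAU, and ./myprogram is a placeholder.

```shell
# Hypothetical helper (not part of TAU): map a program's parallel model
# to the -T argument used in the tau_exec examples above.
tau_config_for() {
    case "$1" in
        mpi)        echo "mpi" ;;
        openmp)     echo "serial,openmp" ;;
        pthread)    echo "serial,pthread" ;;
        mpi+openmp) echo "mpi,openmp" ;;
        *)          echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# Sample ./myprogram (a placeholder name) if tau_exec is installed;
# otherwise just show the command that would be run.
model=openmp
config=$(tau_config_for "$model")
if command -v tau_exec >/dev/null 2>&1 && [ -x ./myprogram ]; then
    tau_exec -T "$config" -ebs ./myprogram
else
    echo "would run: tau_exec -T $config -ebs ./myprogram"
fi
```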

Can I instrument and sample my code? How would I do that?

Yes. Essentially, you would either manually or automatically instrument your program, and then set the TAU_SAMPLING environment variable before executing the program:

export TAU_SAMPLING=1 #bash
setenv TAU_SAMPLING 1 #csh
./myprogram

Multiprocess Questions

Which MPI implementations are supported?

All of them, for the most part. In particular:

MPICH
MVAPICH
LAM/MPI
Open MPI
IBM MPI
Cray MPI
HP MPI
...

TAU uses the PMPI profiling interface to measure all MPI calls. Because PMPI is part of the MPI standard, any standards-compliant implementation should be supported.

How do I build TAU for MPI?

First, make sure MPI (mpicc, mpiCC, mpif77, etc.) is in your path. Then, configure TAU with:

./configure -mpi

If your MPI installation has predictable locations for include and lib directories, the configure process should find them. For example, if your mpicc compiler is in /home/user/mpi/bin, configure will use /home/user/mpi/include and /home/user/mpi/lib. If that is NOT the case, you also need to tell TAU the path to those directories:

./configure -mpi -mpiinc=/some/path/to/include -mpilib=/some/other/path/to/lib

If your MPI implementation has unusual library names or additional library dependencies, you can also specify the library names:

./configure -mpi -mpiinc=/some/path/to/include -mpilib=/some/other/path/to/lib \
    -mpilibrary="-lmy_mpi -lmy_mpi2 -L/path/to/dependency/lib -lmpi_dependency"

Multithread Questions

How do I build TAU for OpenMP?

Building TAU for OpenMP is straightforward:

./configure -openmp

For the best results, if you are using GCC, Open64 or OpenUH, we recommend you configure with the following options (this allows for the most data collection flexibility):

./configure -openmp -bfd=download -unwind=download

If you are using Intel compilers, you should configure like this (OMPT is an interface for OpenMP runtime introspection; for more information, see http://openmp.org/mp-documents/ompt-tr2.pdf or http://link.springer.com/chapter/10.1007%2F978-3-642-40698-0_13):

./configure -openmp -bfd=download -unwind=download -ompt=download

If you are using some other compiler vendor and you want to use OPARI to instrument OpenMP regions, you should configure with -opari (see above for information on configuring PDT):

./configure -openmp -opari -pdt=/path/to/pdt/installation
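
The recommendations above can be summarized as a lookup by compiler family. This is a sketch only: openmp_configure_flags and its family names are hypothetical and not part of TAU, and the PDT path is the same placeholder used earlier on this page.

```shell
# Hypothetical helper (not part of TAU): print the configure options
# recommended above for a given compiler family.
openmp_configure_flags() {
    case "$1" in
        gcc|open64|openuh)
            echo "-openmp -bfd=download -unwind=download" ;;
        intel)
            echo "-openmp -bfd=download -unwind=download -ompt=download" ;;
        *)
            # Other vendors: OPARI instrumentation, which needs PDT.
            echo "-openmp -opari -pdt=/path/to/pdt/installation" ;;
    esac
}

# Show the configure line for the Intel compilers without running it.
flags=$(openmp_configure_flags intel)
echo "./configure $flags"
```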

How do I build TAU for pthreads?

Building TAU for pthreads is straightforward:

./configure -pthread

Runtime Questions

What are the environment variables for controlling TAU at runtime?

There are several. A complete list is here. Common variables include:

Variable	Description	Default Value
`TAU_PROFILE`	Set to 1 to have TAU profile your code	1
`TAU_TRACE`	Set to 1 to have TAU trace your code	0
`TAU_METRICS`	Colon delimited list of TAU/PAPI metrics to profile	TIME
`PROFILEDIR`	Specifies the directory where profile files are to be stored. See Chapter 2, Profiling	current working directory
`TAU_CALLPATH`	When set to 1 TAU will generate call-path data. Use with `TAU_CALLPATH_DEPTH`.	0
`TAU_CALLPATH_DEPTH`	Sets the depth of the callpath profiling. Use with `TAU_CALLPATH` environment variable.	1
`TAU_TRACK_MESSAGE`	Track MPI message statistics (profiling), message lines (tracing).	0
`TAU_COMM_MATRIX`	Generate MPI communication matrix data.	0
`TAU_THROTTLE`	Enables the runtime throttling of events that are lightweight. See Section 1.3, “Selectively Profiling an Application”	1
`TAU_THROTTLE_NUMCALLS`	Set the maximum number of calls that will be profiled for any function when `TAU_THROTTLE` is enabled. See Section 1.3, “Selectively Profiling an Application”	100000
`TAU_THROTTLE_PERCALL`	Set the minimum inclusive time (in milliseconds) a function has to have to be instrumented when `TAU_THROTTLE` is enabled. See Section 1.3, “Selectively Profiling an Application”	1
`TAU_TRACEFILE`	Specifies the name of trace file.	'traces.*'
`TRACEDIR`	Specifies the directory where trace files are to be stored. See Section 3.1, “Generating Event Traces”	current working directory
`TAU_VERBOSE`	When set, TAU will print out information about its configuration when running an instrumented application.	0
`TAU_PROFILE_FORMAT`	When set to snapshot, TAU will generate condensed snapshot profiles instead of the default kind (they merge together different metrics so there is only one file per node). When set to merged, TAU will pre-compute mean and std. dev. at the end of execution.	profiles
`TAU_SYNCHRONIZE_CLOCKS`	When set TAU will correct for any time discrepancies between nodes because of their CPU clock lag. This should produce more reliable trace data.	0
`TAU_SAMPLING`	Default value is 0 (off). When `TAU_SAMPLING` is set, TAU collects additional profile or trace information (depending on whether `TAU_PROFILE` or `TAU_TRACE` is set, respectively) via periodic sampling at runtime. The metrics collected and the sampling period are controlled by the `TAU_EBS_SOURCE` and `TAU_EBS_PERIOD` variables, respectively. The `TAU_EBS_UNWIND` variable determines if callstack unwinding is enabled at each sample.	0
`TAU_EBS_PERIOD`	Sets the period between samples. The semantics of this value depend on `TAU_EBS_SOURCE`.	varies by platform, usually 30000 microseconds
`TAU_SUMMARY`	Set this variable to 1 to generate just min/max/stddev/mean statistics instead of per-node data. Use paraprof -dumpsummary and then pprof -f profile.Max/Min to see the data.	0
`TAU_CUPTI_API`	Default: runtime, options: runtime,driver,both. Controls which layer of CUDA is tracked within the CUPTI measurement system. See for example: `tau_exec -T serial,cupti -cupti ./matmult`. The option should be set based on which layer the CUDA program uses: runtime when the program uses the CUDA runtime API, driver when the program uses the driver API. NOTE: Both the PGI accelerator and the HMPP compilers use the driver API.	runtime
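
As an illustration of combining several of these variables, the sketch below enables call-path profiling before a run. The depth of 5 and the program name ./myprogram are placeholders chosen for the example, not recommended values.

```shell
# Keep profiling on (the default) and add call-path data, five levels deep.
export TAU_PROFILE=1
export TAU_CALLPATH=1
export TAU_CALLPATH_DEPTH=5     # illustrative depth; the default is 1

# Ask TAU to print information about its configuration at startup.
export TAU_VERBOSE=1

# Run the instrumented program if it exists (./myprogram is a placeholder).
if [ -x ./myprogram ]; then
    ./myprogram
fi
```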

How do I collect a profile?

Profiling is the default mode of TAU - whether instrumenting or sampling, profiles will be collected by default.

How do I collect a trace?

To collect a trace, set the appropriate environment variable before executing your program:

export TAU_TRACE=1 #bash
setenv TAU_TRACE 1 #csh
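
Trace files can get large, so it may help to send them to a dedicated directory with TRACEDIR. This is a minimal bash sketch: the directory name tau_traces and the program name ./myprogram are placeholders, and creating the directory up front is a precaution rather than a documented requirement.

```shell
# Switch tracing on and keep the trace files out of the source tree.
export TAU_TRACE=1
export TRACEDIR="$PWD/tau_traces"   # hypothetical directory name
mkdir -p "$TRACEDIR"                # assumption: create it up front to be safe

# Run the instrumented program if it exists (./myprogram is a placeholder).
if [ -x ./myprogram ]; then
    ./myprogram
fi
```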

Other Questions

I can't use -bfd=download (no network access). How can I configure binutils for TAU?

Still have questions? Check out the official documentation or contact tau-bugs@cs.uoregon.edu for help.
