Profiling MPI
This page describes how to use the Intel Trace Analyzer and Collector (ITAC) to profile MPI code which uses Intel MPI. ITAC is part of the Intel Cluster Studio and is enabled when you run module load ics. Documentation can be found in the directory /opt/intel/itac/8.0.3.007/doc.
An alternative MPI implementation is provided by SGI's Message Passing Toolkit, which uses a different profiling method (see SgiMessagePassingToolkit).
MPI profiling is also available using SGI's MpInside, which works with both Intel MPI and SGI MPT.
There is more information on general code optimisation in the code optimisation check list on the Hector website. Some of that check list is specific to Hector, but it gives an overview of things to consider when optimising code.
How to use the Intel Trace Analyzer and Collector (ITAC)
ITAC can profile an existing binary without recompilation by using the itcpin command (see section 3.5 of the ITC reference guide). The following instructions describe how to profile an application by re-linking (and re-compiling if profiling of user code is required).
ITAC can be used to profile non-MPI code but it is more complicated than profiling MPI code. The steps described here assume you are using an MPI code.
- Build an executable which is linked to the ITAC version of the MPI libraries. This is done by giving the linker a -trace switch e.g.
> mpif90 -trace -o myprog.exe mysrc1.o mysrc2.o
- Run the executable in the usual way, e.g. by submitting your job script:
> qsub myjob.pbs
- Look at the results using traceanalyzer. traceanalyzer can be CPU and memory hungry, so consider running it on zen-viz.
> traceanalyzer myprog.stf
The steps described above will profile the MPI calls but will not profile user code. To profile your own code, add a -tcollect flag to the compile step for the routines you wish to profile. When viewing the output in traceanalyzer, a breakdown by subroutine can be obtained by right-clicking "Group application" and selecting the "Ungroup Application" option.
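As an illustration of selective instrumentation, the following sh sketch prints (but does not run) the compile and link commands for a build in which only chosen source files are compiled with -tcollect. The function name itac_build_cmds and the .f90 source file names are hypothetical; mpif90, -trace and -tcollect are taken from the text above, and a real build will probably need further flags.

```shell
# itac_build_cmds EXE "PROFILED_SOURCES" SOURCE...
# Prints one compile command per source file, adding -tcollect only to the
# files named in PROFILED_SOURCES (a space-separated list), followed by a
# link command that uses -trace. Nothing is executed.
itac_build_cmds() {
    exe="$1"
    profiled="$2"
    shift 2
    objs=""
    for src in "$@"; do
        obj="${src%.*}.o"          # mysrc1.f90 -> mysrc1.o
        flags=""
        case " $profiled " in
            *" $src "*) flags=" -tcollect" ;;   # instrument this file
        esac
        echo "mpif90${flags} -c ${src} -o ${obj}"
        objs="${objs} ${obj}"
    done
    echo "mpif90 -trace -o ${exe}${objs}"
}

# Hypothetical example: instrument mysrc2.f90 only.
itac_build_cmds myprog.exe "mysrc2.f90" mysrc1.f90 mysrc2.f90
```

Check the printed commands first; they can then be run by hand or piped to sh.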
Problems and Solutions
Several things can prevent a trace file from being produced.
- The trace file is usually produced after the call to MPI_FINALIZE. If the job exits before calling MPI_FINALIZE then a trace file may not be produced.
- ITAC works by linking to an alternative MPI library. If the linker is given flags which force it to link with the standard MPI libraries, rather than the ITAC versions, this will result in a trace file not being produced.
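Both causes can be checked from the shell after a run. The sketch below rests on two assumptions that are not from the text above: that the ITAC tracing libraries have names beginning with libVT, and that the executable is dynamically linked so that those libraries appear in ldd output. The function names are hypothetical.

```shell
# Succeeds if the ldd output read from standard input mentions an ITAC
# tracing library. Assumption: those libraries are named libVT*.
# A statically linked executable will not be detected this way.
links_itac() {
    grep -q 'libVT'
}

# Succeeds if the named trace file exists and is not empty.
have_trace() {
    [ -s "$1" ]
}

# Example use (file names as in the steps above):
#   ldd myprog.exe | links_itac || echo "myprog.exe is not linked against ITAC"
#   have_trace myprog.stf || echo "no trace file was written"
```

If the link check fails, look through the link line for flags that name the standard MPI libraries explicitly.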
There have been problems when profiling code which makes a very large number of subroutine calls. traceanalyzer can have difficulty coping and uses a large amount of CPU and memory. One solution is to be selective about which parts of your code are compiled with -tcollect. Alternatively, it is possible to reduce the amount of profiling information using the folding capability in ITAC (see section 3.4.3 of the ITC reference guide).
Problems have been seen when ITAC tries to flush output to /tmp. The solution was to set the VT_CONFIG environment variable to point to a configuration file which specified a different temporary directory e.g.
- Set the environment variable:
> setenv VT_CONFIG ${HOME}/itac_conf
- Then create the configuration file containing the line:
FLUSH-PREFIX "/scratch/user/tmp"
where user is your user name.
- Make the temporary directory if required:
> mkdir /scratch/user/tmp
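For users of sh or bash rather than csh, the three steps above could be combined along these lines. This is a sketch only: the function name setup_itac_flush is hypothetical, and the paths in the usage comment are the example paths from the text, with user standing for your user name.

```shell
# setup_itac_flush TMPDIR CONFFILE
# Creates TMPDIR if needed, writes a configuration file containing a single
# FLUSH-PREFIX line pointing at it, and exports VT_CONFIG for this shell.
setup_itac_flush() {
    tmpdir="$1"
    conf="$2"
    mkdir -p "$tmpdir" || return 1
    printf 'FLUSH-PREFIX "%s"\n' "$tmpdir" > "$conf" || return 1
    VT_CONFIG="$conf"
    export VT_CONFIG
}

# Example use, with the paths from the text:
#   setup_itac_flush /scratch/user/tmp "${HOME}/itac_conf"
```

The function overwrites any existing configuration file, so merge the FLUSH-PREFIX line by hand if you already keep other settings there. VT_CONFIG must also be set in the environment of the job itself, for example in the job script.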
The ITAC documentation says that version 10 or later of the Intel compilers is required to profile user code using -tcollect.
Calls to the ITAC API
Calls to the ITAC API can be made from your own code, for example to profile a particular section of it. The subroutine which is being profiled needs to include the correct header file, e.g. for Fortran
INCLUDE 'vt.inc'
and the compiler needs to be able to find the include files, which may require a compiler flag such as -I${VT_ROOT}/include.
Some things to look out for are:
- Calls to the ITAC API from Fortran are different to those given in the ITC Reference Guide. For example, VT_begin in the guide is VTbegin in the Fortran version of the library.
- The Fortran versions of the routines need an extra integer argument, which is an error code returned by the call. Failing to provide this argument results in a segmentation fault.
--
DavidAcreman - 04 Jan 2008