Name | Debugging tool | Performance analysis | Parallel programming | Origin | Exemplar
---|---|---|---|---|---
Assure, AssureView | x | | x | x |
cvd/cvperf | x | x | | x |
CXdb | x | | | | x
CXperf | | x | | | x
Guide, GuideView | | x | x | x |
HPF | | x | x | x |
IRIS Explorer | | | | x |
KCC | | | | x |
MPI | | | x | x | x
MPICH | | | x | x |
MPICL | | x | | x |
OpenMP | | | x | x |
ParaGraph | | x | | x |
Perfex | | x | | x |
Pixie | | x | | x |
PVM/XPVM | x | x | x | x | x
ROMIO | | | x | x |
Scaltool | | x | | x |
SpeedShop | | x | | x |
SvPablo | | x | x | x |
TotalView | x | | | x |
MPICL and ParaGraph have been installed on modi4.
MPICL is a subroutine library for collecting information on communication and user-defined events in message-passing parallel programs
written in C or FORTRAN. In particular, for MPI programs it uses the MPI profiling interface to automatically intercept calls to MPI
communication routines, eliminating the need to add more than a few statements to the source code in order to collect the information. By using
the MPI_Pcontrol interface to the instrumentation commands, a single version of the MPI program can be used whether the instrumentation
library is linked with the executable or not.
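For illustration only, the following minimal Fortran sketch brackets a region of interest with MPI_Pcontrol calls. The levels 1 and 0 used here follow the generic MPI convention for enabling and disabling profiling; the control values that MPICL itself recognizes should be taken from its example programs (see the Paragraph and MPICL section below), not from this sketch.
      PROGRAM PCTRL
C     Illustrative sketch: turn tracing on only around the region of
C     interest. The levels used here (1 = on, 0 = off) are the generic
C     MPI convention; consult the MPICL examples for its control values.
      INCLUDE 'mpif.h'
      INTEGER IERR
      CALL MPI_INIT(IERR)
C     Enable profiling/tracing for the communication phase.
      CALL MPI_PCONTROL(1)
      CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
C     Disable it again.
      CALL MPI_PCONTROL(0)
      CALL MPI_FINALIZE(IERR)
      END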
ParaGraph is a graphical display tool for visualizing the behavior and performance of parallel programs that use MPI (Message-Passing Interface). The visual animation of a parallel program is based on execution trace information gathered during an actual run of the program on a
message-passing parallel computer system. The resulting trace data are replayed pictorially to provide a dynamic depiction of the behavior of the parallel program, as well as graphical summaries of its overall performance. The same performance data can be viewed from many different
visual perspectives to gain insights that might be missed by any single view. The necessary execution trace data are produced by MPICL,
which uses the profiling interface of MPI to provide timestamped records of MPI events.
Note: In the above, MPI really means MPI plus a few other message-passing
libraries, such as PVM and NX.
The usage of these two libraries on modi4 is described in the Paragraph and MPICL section below.
Assure and AssureView
Assure is a tool that can identify bugs in a
parallel program and serve as a parallelization tool at the same time.
To use Assure to parallelize your code, identify a candidate parallel loop
in your code and insert OpenMP directives. Identify any variables that
obviously need to be private, and place them in an OpenMP PRIVATE() clause
(read more about OpenMP here).
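As a purely illustrative sketch (the arrays and the loop are hypothetical, not taken from the matmul example used below), a candidate loop with an OpenMP directive and a PRIVATE() clause might look like this:
      PROGRAM LOOPEX
C     Hypothetical candidate loop: TEMP is recomputed in every
C     iteration, so it must be listed in the PRIVATE() clause.
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), B(N), C(N), TEMP
      DO 10 I = 1, N
         A(I) = I
         C(I) = 1.0
 10   CONTINUE
C$OMP PARALLEL DO PRIVATE(I, TEMP)
      DO 20 I = 1, N
         TEMP = 2.0 * A(I)
         B(I) = TEMP + C(I)
 20   CONTINUE
C$OMP END PARALLEL DO
      PRINT *, 'B(N) = ', B(N)
      END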
Next, use 'assuref77' in place of the Fortran compiler:
assuref77 -c matmul.f
assuref77 -o matmul matmul.o
Assure builds a parallel computer simulator that identifies all potential
differences between the serial and OpenMP parallel versions of your code
for a given input data set.
AssureView can be used to browse the error lists alongside your source code; you
can run it by typing "assureview" on the Origin.
AssureView is a graphical tool which requires X windows for remote display.
More information on Assure can be found by reading the files
/usr/local/apps/tools/kai/readme
or
/usr/local/apps/tools/kai/assure37/docs
or by visiting the KAI website.
Cvd/cvperf
The WorkShop debugger on the Origin can be invoked simply by
cvd a.out core
where a.out has been compiled with the debug option, -g. More information
on its use can be obtained from the online help by issuing "man cvd".
A graphical tool for looking at the performance of an executable is cvperf.
The user first creates a file with counts using pixie.
Several types of experiments can be made, and the user should consult the
on-line documentation of cvperf for further info.
CXdb
CXdb is a window-based symbolic debugger that can be used to debug Fortran,
C, and C++ programs compiled on the Exemplar. You can create, profile,
and debug thread-parallel applications with CXdb. You can debug an
executable, a core file, or a running process.
To compile and link your program in a single step you would use
f77 -g file.f
and if you need to do it in two steps, you would need the debugging option
in both steps
f77 -g -c file.f sub1.f sub2.f
f77 -g file.o sub1.o sub2.o
Similarly for f90 you can type
f90 -g file.f foo.o -W1,+tools
You can invoke the debugger by "cxdb a.out" or "cxdb -c core" or
"cxdb a.out core" depending on which files you have available
during debugging. An MPI executable can be debugged by issuing
"cxdb -mpi mpiexecutable". For a tutorial, issue the command
"cxdb -tutorial".
More information on how to
compile your program and how to use the different options with cxdb
can be found on the man page "man cxdb".
CXperf
CXperf is a performance analyzer for programs compiled with the Exemplar
Fortran90, C, and C++ compilers. You can profile MPI and PVM application programs
as well. You can use CXperf in GUI, line, or batch mode.
For instance, for a Fortran program in line mode, you would first compile and link with the options
+pa and +O2 (or +O3):
f90 +O2 +pa prog.f
Then start the line version of cxperf by typing
cxperf -nw a.out
Now you have entered cxperf. Online help can now be invoked by typing "help".
First you have to decide which routines you want to
profile (here we select all)
select routine all
Then you can use the collect command to select the metrics:
collect cpu
Then run the program by typing "run", and to view the performance reports,
you can type "analyze".
More information on
this performance analyzer tool can be found on the Exemplar man page
"man cxperf" or by invoking help in cxperf itself.
Guide and GuideView
Guide is a high performance, instrumented implementation of OpenMP for
Fortran on the Origin. Guide is used in place of the Fortran compiler to create a parallel
program from Fortran source containing OpenMP directives.
To use Guide,
replace the command 'f77' with 'guidef77' on the Origin
guidef77 -c matmul.f
guidef77 -o matmul matmul.o
To create an instrumented version of the program, specify "-WGstats"
on the link line:
guidef77 -WGstats -o matmul matmul.o
To run in parallel, set the environment variable OMP_NUM_THREADS equal
to the number of processors.
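For example, to run the matmul program above on four processors (the count is only an illustration), in csh:
setenv OMP_NUM_THREADS 4
./matmul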
When an instrumented OpenMP program runs, it creates a file called
'guide_stats'. GuideView is a graphical performance data browser, designed
to make it easy to identify and eliminate parallel performance bottlenecks.
To invoke GuideView, issue 'guideview' with the name of a statistics file:
guideview guide_stats
GuideView provides several views of the performance data. It shows wall
clock time spent in parallel and in serial, grouped by parallel region
or by thread. It also displays overheads due to thread synchronization
and OpenMP itself. With GuideView, you can quickly find and eliminate
problem spots in your code.
More information on Guide can be found by reading the files
/usr/local/apps/tools/kai/readme
or
/usr/local/apps/tools/kai/guide37/docs
or by visiting the KAI website.
HPF
Release 2.4 of the
Portland Group HPF
(High Performance Fortran) compiler (pghpf) is
available along with the associated profiler (pgprof) in the
directory:
/usr/local/apps/PGIHPF/pghpf-2.4
To use release 2.4 of the PGI HPF compiler, add the following lines
to the end of your .cshrc file:
setenv PGI /usr/local/apps/PGIHPF/pghpf-2.4
set path=($PGI/bin $path)
setenv MANPATH "${MANPATH}:${PGI}/man"
Other online manuals can be found at the
pghpf site.
IRIS Explorer
IRIS Explorer from NAG is installed on the Origin. IRIS Explorer is a visual
programming system for data visualization, animation, manipulation and
analysis.
To run it, type both of the following commands:
source /usr/local/apps/math/nag_new/iris_explorer.start
explorer
More information on what IE can do, as well as more information and help on all
the features, can be found at the
NAG site and at the IRIS
EXPLORER site.
KCC
The KCC C++ compiler from Kuck and Associates (KAI)
has been installed
on modi4. KCC is the top level driver for the KAI C++ compiler. It can
generate intermediate C, object, or executable files. KCC is a full
draft-standard C++ implementation, including templates,
exception handling and a complete draft-standard class library.
Version 3.4 of the KAI C++ compiler has been installed on modi4.
It can be found in
/usr/local/apps/tools/kai/kcc3.4
and to invoke it you can type
KCC file.c
Users of KAI KCC should be aware that dbx no longer works with
executables built by KCC. Users should instead use the KAI debugger kdb or TotalView.
The kdb debugger can be used as follows:
setenv KAILMD_LICENSE_FILE /usr/apps/tools/kai/kcc3.4/KCC_BASE/kdb/bin/license.dat
/usr/apps/tools/kai/kcc3.4/KCC_BASE/kdb/bin/kdb a.out
Users may wish to add kdb to their path.
Users can find manpages for KCC either on modi4 in the installation directory or on
the KAI on-line documentation pages.
MPI
MPI, the Message Passing Interface, is a software package that enables the
user to write portable parallel programs across several computing
platforms. The novice user should first consult the
MPI
website for information on MPI and its syntax. The native implementation
on both the Exemplar and the Origin is described in "man mpi". You should also
consult the Message
Passing at NCSA documentation for these machines.
In brief, once your code has the relevant MPI calls, you can run your program simply
by issuing
mpirun -np num_procs a.out
where num_procs is the number of processors you want to use.
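As an illustrative sketch only (not taken from the NCSA examples), a minimal Fortran MPI program that can be launched this way looks like:
      PROGRAM HELLO
C     Minimal sketch: each process reports its rank and the total
C     number of processes started by mpirun.
      INCLUDE 'mpif.h'
      INTEGER IERR, RANK, NPROCS
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NPROCS, IERR)
      PRINT *, 'Hello from process ', RANK, ' of ', NPROCS
      CALL MPI_FINALIZE(IERR)
      END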
MPICH
The Portable MPI Model Implementation (MPICH),
Version 1.1.2 (released February 1999) has been installed on modi4 in
/usr/apps/MessPass/mpich
MPICH is a freely available, portable implementation of MPI, the standard for
message-passing libraries. You can also find the
GLOBUS-compatible version of MPICH there.
A change log for all changes from the previous release is available from
the MPICH website.
Documentation can be found on the web page or in the /doc directory. Please
also read the README files in each directory when using this software.
Use the compilers in /usr/apps/MessPass/mpich/bin when using MPICH.
This version includes support for viewing the message queues with debuggers
such as TotalView that know how to access the generic queue routines. These
are dynamically linked routines and require that the MPICH configure script figure
out how to build a shared library for your system.
OpenMP
OpenMP is a standard for portable compiler
directives. There are two
versions of OpenMP available on the NCSA Origin.
The SGI-supported version can be used with the f77 compiler (the notation is
the same when using OpenMP with the C compiler) by setting the flag
f77 -MP:open_mp=ON file.f
This is also the default for the MIPSpro f77 compiler when the -mp option is used.
Usage directions for the OpenMP directives can be
found in the MIPSpro Fortran 77 Programmer's Guide . Other
languages are discussed further at the SGI OpenMP site.
Release 3.5 of the KAP/Pro Toolset for OpenMP (KPTS) is also available
on the Origin. It is located on the Origin in
/usr/local/apps/tools/kai
and is also discussed under Assure and Guide above. These
tools let you put OpenMP directives directly into your program.
The KAI Toolset also includes translators to help you move existing code from older
directive sets, such as Cray, SGI, X3H5, and KPTS directives, to OpenMP. To
translate the file "original.f" from SGI directives to OpenMP syntax,
for example, you can issue:
sgi2omp.pl original.f > original_omp.f
Paragraph and MPICL
1) ParaGraph
To use ParaGraph with the pre-built trace files:
alias PG "/usr/local/apps/math/paragraph/ParaGraph.mpi/PG"
cp /usr/local/apps/math/paragraph/ParaGraph.mpi/tracefiles/fft16.trf .
PG fft16.trf &
2) MPICL
To generate your own trace files to feed to ParaGraph, add a few MPICL calls
to your MPI code and link to MPICL using something like the following in your
makefile:
MPICL_DIR = /usr/local/apps/math/picl/mpicl
MPICL_INC = $(MPICL_DIR)/INCLUDE
MPICL_LIB = $(MPICL_DIR)/LIBS/sgi-origin
MPICL = -L$(MPICL_LIB) -lmpicl
To see which MPICL calls you need, look at the example programs in
/usr/local/apps/math/picl/mpicl/examples
Run your MPI executable as usual. Assuming your PICL traces are in a file
called "tracefile", sort it with
sort +2n -3 +1rn -2 +0rn -1 tracefile > mytrace.trf
After this you can do
PG mytrace.trf &
Pixie
Pixie is a tool that adds profiling information to an executable.
Pixie reads an executable program, partitions it into basic blocks,
and writes an equivalent program containing additional code that
counts the execution of each basic block. (A basic block is a
region of the program that can
be entered only at the beginning and exited only at the end.)
You can issue the command
pixie my_executable
to make the instrumented file, and then issue
prof my_executable.Counts
If your program has several processes running in parallel, each output file
will have a process id attached to it, and prof must be run on each
one of them individually. A similar tool to pixie is
SpeedShop. See "man pixie" for further information.
Perfex
Perfex is a tool that reports the hardware counts for selected events from the R10000 event counters.
With perfex you can either profile the whole program's event counts or only the event counts
of a small section of your program. You can get the exact counts of two selected counters or
you can get the counts of 32 events with some statistical error. If you want the absolute
event count of two events (see events for a full list of counters), you can
specify the events by
perfex -e 10 -e 20 a.out
However, if you only want the average counts of all events, you would specify
perfex -a -x a.out
You can limit the event counts by specifying in your program which part you want to
profile (see "man libperfex" for how to do this), and then link with the library
(the same applies for cc):
f77 -o a.out -lperfex
If you want to know the time spent in each event counter, you can use the option -y:
perfex -a -x -y a.out
More information on perfex can be found on the perfex reference page,
at the perfex manpage
or at the tuning tools
page on the Origin.
PVM/XPVM
PVM, the Parallel
Virtual Machine system,
is a software system that enables a collection of heterogeneous
computers to be used as a coherent and flexible concurrent computational
resource.
The individual computers may be shared- or local-memory multiprocessors,
vector supercomputers, specialized graphics engines, or scalar
workstations, interconnected by a variety of networks
such as Ethernet or FDDI.
User programs written in C, C++ or Fortran access PVM through library
routines.
There are two versions of PVM available on each machine, a native version
and a version provided by the PVM research group. The native version of PVM
can be invoked on the Origin and the Exemplar by issuing the command "pvm";
usage of the native version on these machines can be found here.
Version 3.3 of PVM from Oak Ridge National Lab is located on the Origin in
/usr/local/apps/math/pvm3
To use this 64 bit version set the following variable:
setenv PVM_ROOT /usr/local/apps/math/pvm3
To run it, issue "run $PVM_ROOT/lib/pvm" on the
command line. C, C++ and Fortran programs must be linked with pvm3/lib/ARCH/libpvm3.a
and
Fortran programs must also be linked with pvm3/lib/ARCH/libfpvm3.a .
Include C/C++ header file pvm3/include/pvm3.h for constants and function
prototypes. A Fortran header file is in pvm3/include/fpvm3.h.
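As a purely illustrative sketch (not part of the NCSA examples), a minimal Fortran PVM program that enrolls in the virtual machine, prints its task id, and leaves again looks roughly like this:
      PROGRAM PVMHELLO
C     Minimal sketch: enroll in PVM, report the task id, and exit.
C     Link with libfpvm3.a and libpvm3.a as described above.
      INCLUDE 'fpvm3.h'
      INTEGER MYTID, INFO
C     pvmfmytid enrolls the process and returns its PVM task id.
      CALL PVMFMYTID(MYTID)
      PRINT *, 'My PVM task id is ', MYTID
C     Leave the virtual machine cleanly.
      CALL PVMFEXIT(INFO)
      END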
An n32-bit PVM and XPVM are also available on the Origin.
XPVM is a graphical interface to the PVM commands as well as a real time
performance monitor for PVM tasks.
First put the following commands in your .login file:
setenv PVM_ROOT /usr/apps/MessPass/pvm3/pvm3
setenv XPVM_ROOT /usr/apps/MessPass/pvm3/xpvm
setenv TCL_LIBRARY /usr/apps/MessPass/pvm3/tcl/library
setenv TK_LIBRARY /usr/apps/MessPass/pvm3/tk/library
You need to compile your program using
the PVM library, and then type
$XPVM_ROOT/src/SGI64/xpvm
on the Origin to start using it. There
are three views that you can use to monitor the progress of your program:
the network, space-time and utilization views. You can also use XPVM as a simple
debugger by looking at the trace events which are the entry and exit points of a
PVM routine. On-line help is included.
ROMIO
ROMIO, a high-performance, portable MPI-IO implementation,
is installed on modi4. The version installed on modi4 in
/usr/local/apps/MessPass/romio/romio
uses the native SGI MPI. Please read the documentation in
each directory. You can find background information on the
Parallel-I/O interface in
/usr/local/apps/MessPass/romio/romio/doc
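As an illustrative sketch only (the file name and data layout are made up for the example, and an MPI-2 capable Fortran compiler such as f90 is assumed), each process can write its rank into a shared file through the MPI-IO interface:
      PROGRAM MPIIO
C     Illustrative sketch: every process writes its rank (one 4-byte
C     integer) into the shared file 'outfile' at a rank-based offset.
      INCLUDE 'mpif.h'
      INTEGER IERR, RANK, FH
      INTEGER STATUS(MPI_STATUS_SIZE)
      INTEGER (KIND=MPI_OFFSET_KIND) OFFSET
      CALL MPI_INIT(IERR)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, RANK, IERR)
C     Open (and create if necessary) the shared output file.
      CALL MPI_FILE_OPEN(MPI_COMM_WORLD, 'outfile',
     &     MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, FH, IERR)
C     The offset is given in bytes.
      OFFSET = RANK * 4
      CALL MPI_FILE_WRITE_AT(FH, OFFSET, RANK, 1, MPI_INTEGER,
     &     STATUS, IERR)
      CALL MPI_FILE_CLOSE(FH, IERR)
      CALL MPI_FINALIZE(IERR)
      END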
Scaltool
Scaltool Version 1.0, released on July 9, 1999, has been installed on modi4.
Scaltool is a tool to pinpoint performance bottlenecks of parallel codes
on Distributed Shared Memory machines such as the Origin2000. Scaltool
can quantify the effects (in cycles) of capacity misses (due to limited L2
cache size), synchronization, and load imbalance. Thus, it can be a
valuable tool for parallel programmers to discover bottlenecks of their
applications, to understand the behavior of their applications better,
and to help them hand-tune the applications to get the desired performance.
Scaltool was developed by Yan Solihin with the help of his colleague
Vinh Lam, and his advisor Prof Josep Torrellas, at the University of
Illinois at Urbana-Champaign. The tool is based on an empirical mathematical
model that has been published in CSRD Technical Report 1563 and will be
published in Supercomputing 99. Interested users are referred to either
the technical report or the paper. An abstract of the talk in WSSMM '99
is included in the release.
Version 1.0 is the first release of the tool. It has been made
straightforward to use. It also has a self-error-detection feature that
reports possible inaccuracy in the prediction. Please note that
not all inaccuracies and errors can be detected, and some accurate
predictions may be reported as inaccurate, although this is very
unlikely.
Usage information and contact info can be found at the
NCSA Scaltool page.
SpeedShop
SpeedShop is the generic name for an integrated package of performance
tools on the Origin to locate performance problems in executables.
It also supports starting a process, in
such a way as to permit a debugger to attach to it, and it supports
running Purify on executables.
There are several different ways to use SpeedShop: by collecting
data (using ssrun), by creating reports (using prof) and by inserting
caliper points into your code (using the ssapi interface). Each of these is
discussed at length in the
NCSA Origin 2000 tuning page
or in the SGI online manual. Next we will briefly
review the most common ways to use SpeedShop (except ssapi, which you can
find in the pages above).
When using SpeedShop, you should first collect data from your program, by
issuing ssrun with various options:
ssrun -fpcsamp my_executable
The fpcsamp option, for instance, uses statistical program counter sampling based on
user and system time. Many more options exist;
users should consult the man pages for SpeedShop.
The file that is created by ssrun
can now be analyzed by prof:
prof my_executable.fpcsamp.332
will list the number of calls to each function and the time spent in each function in
the program in descending order. If your program creates several processes, the process
id will be attached to each output file, and each one must be analyzed separately by prof.
Another interesting sampling experiment uses the ideal option in ssrun, where your
program is instrumented into basic blocks with one entrance and one exit. The profile
of such a basic-block program is then reported. You can issue
ssrun -ideal my_executable
prof my_executable.ideal.4432
to see the number of times that a basic block was encountered. Sometimes during
benchmarking it is interesting to know the total number of instructions or floating
point operations executed:
prof -op my_executable.ideal.4432
The prof command alone only gives the total number of calls of each function.
If you use the -gprof option with prof, you can also obtain information on the
calling tree hierarchy:
prof -gprof my_executable.ideal.4432
More information on prof can be found in the man pages.
SvPablo
SvPablo
is a graphical source code browser and performance visualizer that
integrates the Pablo project's dynamic performance instrumentation software
with PGI HPF, the Portland Group's commercial HPF compiler, and the
MIPS R10000 hardware performance counters.
SvPablo is installed on the Origin in
/usr/local/apps/Pablo/SvPablo/
In order to use SvPablo, you should set the following variables:
setenv SVPABLO /usr/local/apps/Pablo/SvPablo/Install
set path = ($path $SVPABLO/bin)
setenv MANPATH "${MANPATH}:${SVPABLO}/Man"
You can run SvPablo by typing "runSvPablo" on the Origin.
More information can be found on-line in
/usr/local/apps/Pablo/SvPablo/Documentation
Other information can be found on the
SvPablo
website or the
Pablo Research group
website.
TotalView
TotalView is a full-featured, source-level,
graphical debugger, based on the X Window System, for C, C++, Fortran (77 and 90),
assembler, and mixed source/assembler codes.
TotalView supports PVM, MPI, and HPF.
In order to use the debugger,
add the following to the end of your .cshrc file:
set path=(/usr/apps/tools/totalview/totalview.3.9.0-0/irix6-mips/bin $path)
setenv MANPATH "${MANPATH}:/usr/apps/tools/totalview/totalview.3.9.0-0/irix6-mips/man"
Start up the debugger (after setting your display and xhost appropriately)
with the command "totalview". See "man totalview" for command syntax and
options.
The Release Notes are available at
/usr/apps/tools/totalview/
Some example C codes are available in subdirectories of
/usr/apps/tools/totalview/totalview.3.9.0-0/irix6-mips
A short description of how to run TotalView to debug MPI jobs can be found
here.
More information is available from the
Etnus website.
There you can also find a
tutorial.
More Tools Here