Parallel Compilers and Tools

List of all software



Table of Software Tools at NCSA



Name                Debugging   Performance  Parallel      Origin  Exemplar
                    tool        analysis     programming

Assure, AssureView  x                        x             x
cvd/cvperf          x           x                          x
CXdb                x                                              x
CXperf                          x                                  x
Guide, GuideView                x            x             x
HPF                             x            x             x
Iris Explorer                                               x
KCC                                          x             x
MPI                                          x             x       x
MPICH                                        x             x
MPICL                           x                          x
OpenMP                                       x             x
ParaGraph           x           x                          x
Perfex                          x                          x
Pixie                           x                          x
PVM/XPVM            x           x            x             x       x
ROMIO                                        x             x
Scaltool                        x                          x
SpeedShop                       x                          x
SvPablo             x           x            x             x
TotalView           x                                      x

Assure and AssureView

Assure is a tool that identifies bugs in a parallel program and serves as a parallelization tool at the same time. To use Assure to parallelize your code, identify a candidate parallel loop and insert OpenMP directives. Identify any variables that obviously need to be private, and place them in an OpenMP PRIVATE() clause (read more about OpenMP in the OpenMP section below).
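For example, a candidate loop with an OpenMP directive and a PRIVATE() clause might look like the following sketch (the array and variable names are illustrative):

      program candidate
      integer i, n
      parameter (n = 1000)
      real a(n), b(n), c(n), tmp
      do i = 1, n
         a(i) = i
         c(i) = 1.0
      end do
C$OMP PARALLEL DO PRIVATE(tmp)
      do i = 1, n
C        tmp is scratch storage, so each thread needs its own copy
         tmp  = 2.0 * a(i)
         b(i) = tmp + c(i)
      end do
C$OMP END PARALLEL DO
      print *, b(1), b(n)
      end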
Next, use 'assuref77' in place of the Fortran compiler:

assuref77 -c matmul.f
assuref77 -o matmul matmul.o


Assure builds a parallel computer simulator that identifies all potential differences between the serial and OpenMP parallel versions of your code for a given input data set.

AssureView can be used to browse the error lists alongside your source code; run it by typing "assureview" on the Origin. AssureView is a graphical tool and requires X Windows for remote display. More information on Assure can be found by reading the files

/usr/local/apps/tools/kai/readme
or
/usr/local/apps/tools/kai/assure37/docs

or by visiting the KAI website.

Cvd/cvperf

The WorkShop debugger on the Origin can be invoked simply by

cvd a.out core

where a.out has been compiled with the debug option, -g. More information on its use can be obtained from the online help by issuing "man cvd".
A graphical tool for examining the performance of an executable is cvperf. The user first creates a file with counts using pixie. Several types of experiments can be made; consult the online documentation for cvperf for further information.

CXdb

CXdb is a window-based symbolic debugger for Fortran, C, and C++ programs compiled on the Exemplar, and it can be used to debug thread-parallel applications. You can debug an executable, a core file, or a running process.

To compile and link your program in a single step you would use

f77 -g file.f

and if you need to do it in two steps, you would need the debugging option in both steps

f77 -g -c file.f sub1.f sub2.f
f77 -g file.o sub1.o sub2.o

Similarly for f90 you can type

f90 -g file.f foo.o -W1,+tools

You can invoke the debugger with "cxdb a.out", "cxdb -c core", or "cxdb a.out core", depending on which files are available during debugging. An MPI executable can be debugged by issuing "cxdb -mpi mpiexecutable". For a tutorial, issue the command "cxdb -tutorial".
More information on how to compile your program and how to use the different options with cxdb can be found on the man page "man cxdb".

CXperf

CXperf is a performance analyzer for programs compiled with the Exemplar Fortran 90, C, and C++ compilers. You can profile MPI and PVM application programs as well. CXperf can be used in GUI, line, or batch mode. For instance, for a Fortran program in line mode, you would first compile and link with the +pa option and either +O2 or +O3:

f90 +O2 +pa prog.f

Then start the line version of cxperf by typing

cxperf -nw a.out

Now you have entered cxperf. Online help can be invoked by typing "help". First, decide which routines you want to profile (here we select all):

select routine all

Then you can use the collect command to select the metrics:

collect cpu

Then run the program by typing "run", and to view the performance reports, type "analyze". More information on this performance analyzer can be found on the Exemplar man page "man cxperf" or by invoking help within cxperf itself.

Guide and GuideView

Guide is a high performance, instrumented implementation of OpenMP for Fortran on the Origin. Guide is used in place of the Fortran compiler to create a parallel program from Fortran source containing OpenMP directives. To use Guide, replace the command 'f77' with 'guidef77' on the Origin

guidef77 -c matmul.f
guidef77 -o matmul matmul.o


To create an instrumented version of the program, specify "-WGstats" on the link line:

guidef77 -WGstats -o matmul matmul.o

To run in parallel, set the environment variable OMP_NUM_THREADS equal to the number of processors.
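For example, to run the program above on four processors under csh:

setenv OMP_NUM_THREADS 4
./matmul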

When an instrumented OpenMP program runs, it creates a file called 'guide_stats'. GuideView is a graphical performance data browser, designed to make it easy to identify and eliminate parallel performance bottlenecks. To invoke GuideView, issue 'guideview' with the name of a statistics file:

guideview guide_stats

GuideView provides several views of the performance data. It shows wall clock time spent in parallel and in serial, grouped by parallel region or by thread. It also displays overheads due to thread synchronization and OpenMP itself. With GuideView, you can quickly find and eliminate problem spots in your code.

More information on Guide can be found by reading the files

/usr/local/apps/tools/kai/readme
or
/usr/local/apps/tools/kai/guide37/docs


or by visiting the
KAI website.

HPF

Release 2.4 of the Portland Group HPF (High Performance Fortran) compiler (pghpf) is available along with the associated profiler (pgprof) in the directory:

/usr/local/apps/PGIHPF/pghpf-2.4

To use release 2.4 of the PGI HPF compiler, add the following lines to the end of your .cshrc file:

setenv PGI /usr/local/apps/PGIHPF/pghpf-2.4
set path=($PGI/bin $path)
setenv MANPATH "${MANPATH}:${PGI}/man"
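As a minimal sketch (the file name is illustrative, and the run-time and profiling flags should be checked against "man pghpf" and "man pgprof"), an HPF program can be compiled and then run on several processors with:

pghpf -o myprog myprog.hpf
myprog -pghpf -np 4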


Other online manuals can be found at the pghpf site.

IRIS Explorer

IRIS Explorer from NAG is installed on the Origin. IRIS Explorer is a visual programming system for data visualization, animation, manipulation and analysis.

In order to run it you should type both of the following commands:

source /usr/local/apps/math/nag_new/iris_explorer.start
explorer

More information on what IRIS Explorer can do, as well as help on all of its features, can be found at the NAG site and at the IRIS Explorer site.

KCC

The KCC C++ compiler from Kuck and Associates (KAI) has been installed on modi4. KCC is the top level driver for the KAI C++ compiler. It can generate intermediate C, object, or executable files. KCC is a full draft-standard C++ implementation, including templates, exception handling and a complete draft-standard class library.

Version 3.4 of the KAI C++ compiler has been installed on modi4. It can be found in

/usr/local/apps/tools/kai/kcc3.4

and to invoke it you can type

KCC file.c

Users of KAI's KCC should be aware that dbx no longer works with this product. Use the KAI debugger kdb or TotalView instead. The kdb debugger can be used as follows:

setenv KAILMD_LICENSE_FILE /usr/apps/tools/kai/kcc3.4/KCC_BASE/kdb/bin/license.dat
/usr/apps/tools/kai/kcc3.4/KCC_BASE/kdb/bin/kdb a.out


Users may wish to add kdb to their path.

Users can find manpages for KCC either on modi4 in the installation directory or on the KAI on-line documentation pages.

MPI

MPI, the Message Passing Interface, is a message-passing library specification that enables the user to write portable parallel programs across several computing platforms. The novice user should first consult the MPI website for information on MPI and its syntax. Information on the native implementations on the Exemplar and the Origin can be found by reading "man mpi". You should also consult the Message Passing at NCSA documentation for these machines.
In brief, once your code has the relevant MPI calls, you can run your program simply by issuing

mpirun -np num_procs a.out

where num_procs is the number of processors you want to use.
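To make the calls concrete, here is a minimal Fortran MPI program (standard MPI, not specific to either machine); on the Origin it can be linked against the native library with "f77 hello.f -lmpi":

      program hello
      include 'mpif.h'
      integer ierr, rank, nprocs
C     initialize MPI and find this process's rank and the total
C     number of processes
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      print *, 'Hello from process', rank, ' of', nprocs
      call MPI_FINALIZE(ierr)
      end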

MPICH

The Portable MPI Model Implementation (MPICH), Version 1.1.2 (released February 1999), has been installed on modi4 in

/usr/apps/MessPass/mpich

MPICH is a freely available, portable implementation of MPI, the standard for message-passing libraries. A Globus-compatible version of MPICH can also be found there. A log of all changes from the previous release is available from the MPICH website.

Documentation can be found on the web page or in the doc directory of the installation. Please also read the README files in each directory when using this software. Use the compilers in /usr/apps/MessPass/mpich/bin when using MPICH.
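For example, to build and run a Fortran MPI program with the MPICH wrappers (the program name is illustrative):

/usr/apps/MessPass/mpich/bin/mpif77 -o hello hello.f
/usr/apps/MessPass/mpich/bin/mpirun -np 4 hello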

This version includes support for viewing the message queues with debuggers, such as TotalView, that know how to access the generic queue routines. These routines are dynamically linked and require that the MPICH configure script can determine how to build a shared library for your system.

OpenMP

OpenMP is a standard for portable compiler directives. There are two versions of OpenMP available on the NCSA Origin. The SGI-supported version can be used with the f77 compiler (the notation is the same when using OpenMP with the C compiler) by setting the flag:

f77 -MP:open_mp=ON file.f

This is also the default for the MIPSpro f77 compiler when the -mp option is used. Usage directions for the OpenMP directives can be found in the MIPSpro Fortran 77 Programmer's Guide. Other languages are discussed further at the SGI OpenMP site.
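As a small illustration of the directive syntax, the following program sums an array in parallel using a REDUCTION clause (the array and variable names are illustrative); it can be compiled with "f77 -mp sum.f":

      program sumup
      integer i, n
      parameter (n = 1000)
      real a(n), s
      do i = 1, n
         a(i) = 1.0
      end do
      s = 0.0
C$OMP PARALLEL DO REDUCTION(+:s)
      do i = 1, n
C        each thread accumulates a partial sum; OpenMP combines them
         s = s + a(i)
      end do
C$OMP END PARALLEL DO
      print *, 'sum =', s
      end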

Release 3.5 of the KAP/Pro Toolset for OpenMP (KPTS) is also available on the Origin. It is located on the Origin in

/usr/local/apps/tools/kai

and is also discussed in the Assure and Guide sections above; these tools process the OpenMP directives in your program directly. The KAI toolset also includes translators to help you move existing code from older directive sets, such as Cray, SGI, X3H5, and KPTS directives, to OpenMP. To translate the file "original.f" from SGI directives to OpenMP syntax, for example, you can issue:

sgi2omp.pl original.f > original_omp.f

Paragraph and MPICL

(M)PICL and ParaGraph have been installed on modi4. MPICL is a subroutine library for collecting information on communication and user-defined events in message-passing parallel programs written in C or FORTRAN. In particular, for MPI programs it uses the MPI profiling interface to automatically intercept calls to MPI communication routines, eliminating the need to add more than a few statements to the source code in order to collect the information. By using the MPI_Pcontrol interface to the instrumentation commands, a single version of the MPI program can be used whether the instrumentation library is linked with the executable or not.

ParaGraph is a graphical display tool for visualizing the behavior and performance of parallel programs that use MPI (Message-Passing Interface). The visual animation of a parallel program is based on execution trace information gathered during an actual run of the program on a message-passing parallel computer system. The resulting trace data are replayed pictorially to provide a dynamic depiction of the behavior of the parallel program, as well as graphical summaries of its overall performance. The same performance data can be viewed from many different visual perspectives to gain insights that might be missed by any single view. The necessary execution trace data are produced by MPICL, which uses the profiling interface of MPI to provide timestamped records of MPI events.

Note: In the above, "MPI" really means MPI as well as a few other message-passing libraries, such as PVM and NX.

The usage of these libraries on modi4 is as follows:

1) ParaGraph

Use ParaGraph with pre-built tracefiles:

   alias PG "/usr/local/apps/math/paragraph/ParaGraph.mpi/PG"

   cp /usr/local/apps/math/paragraph/ParaGraph.mpi/tracefiles/fft16.trf .

   PG fft16.trf &

2) MPICL

To generate your own tracefiles to feed to ParaGraph, you need to add a few MPICL calls to your MPI code, and link to MPICL using something like the following in your makefile

   MPICL_DIR = /usr/local/apps/math/picl/mpicl
   MPICL_INC = $(MPICL_DIR)/INCLUDE
   MPICL_LIB = $(MPICL_DIR)/LIBS/sgi-origin
   MPICL     = -L$(MPICL_LIB) -lmpicl

To see which MPICL calls you need, look at the example programs in

   /usr/local/apps/math/picl/mpicl/examples
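As a rough sketch, tracing is switched on and off around the region of interest through the MPI_Pcontrol profiling interface; MPICL's own control commands take additional arguments (trace file names, trace levels, and so on), so treat the calls below as placeholders and copy the exact calling sequences from those example programs:

      program traced
      include 'mpif.h'
      integer ierr, rank
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
C     enable tracing around the region of interest (placeholder call;
C     see the MPICL examples for the actual control commands)
      call MPI_PCONTROL(1)
      call MPI_BARRIER(MPI_COMM_WORLD, ierr)
      call MPI_PCONTROL(0)
      call MPI_FINALIZE(ierr)
      end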

Run your MPI executable as usual. Assuming your PICL traces are in a file called `tracefile`, you need to sort it

   sort +2n -3 +1rn -2 +0rn -1  tracefile > mytrace.trf

After this you can do

   PG mytrace.trf &

Pixie

Pixie is a tool that adds profiling information to an executable. Pixie reads an executable program, partitions it into basic blocks, and writes an equivalent program containing additional code that counts the execution of each basic block. (A basic block is a region of the program that can be entered only at the beginning and exited only at the end.) You can issue the command

pixie my_executable

to make the instrumented file, and then issue

prof my_executable.Counts

If your program has several processes running in parallel, each process id is attached to its own .Counts file, and prof must be run on each one individually. A similar tool to pixie is SpeedShop. See "man pixie" for further information.

Perfex

Perfex is a tool that reports hardware counts for selected events from the R10000 performance counters. With perfex you can profile either the whole program's event counts or only the event counts of a small section of your program. You can get exact counts for two selected events, or counts for all 32 events with some statistical error. If you want the absolute event counts of two events (see "man perfex" for the full list of events), you can specify the events by

perfex -e 10 -e 20 a.out

However, if you want approximate counts of all events, you would specify

perfex -a -x a.out

You can limit the event counts to a particular part of your program by inserting calls around it (see "man libperfex" for how to do this) and then linking with the library (the same applies for cc):

f77 -o a.out -lperfex
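A sketch of how the corresponding source calls might look in Fortran is shown below; the routine names and event numbers follow "man libperfex" and "man perfex", so verify them there before use:

      program counted
      integer e0, e1
      integer*8 c0, c1
C     event 0 counts cycles; event 21 counts graduated floating-point
C     instructions (see "man perfex" for the event numbers)
      e0 = 0
      e1 = 21
      call start_counters(e0, e1)
C     ... the section of code to be measured goes here ...
      call read_counters(e0, c0, e1, c1)
      print *, 'cycles =', c0, ' flops =', c1
      end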

If you want to know the time associated with each event count, you can use the -y option:

perfex -a -x -y a.out

More information on perfex can be found on the perfex reference page, at the perfex manpage or at the tuning tools page on the Origin.

PVM/XPVM

PVM, the Parallel Virtual Machine system, is a software system that enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource. The individual computers may be shared- or local-memory multiprocessors, vector supercomputers, specialized graphics engines, or scalar workstations, interconnected by a variety of networks such as Ethernet or FDDI. User programs written in C, C++ or Fortran access PVM through library routines.

There are two versions of PVM available on each machine: a native version and a version provided by the PVM research group. The native version of PVM can be invoked on the Origin and the Exemplar with the command "pvm"; usage information can be found here. Version 3.3 of PVM from Oak Ridge National Lab is located on the Origin in

/usr/local/apps/math/pvm3

To use this 64 bit version set the following variable:

setenv PVM_ROOT /usr/local/apps/math/pvm3

To run it, issue "run $PVM_ROOT/lib/pvm" on the command line. C, C++, and Fortran programs must be linked with pvm3/lib/ARCH/libpvm3.a, and Fortran programs must also be linked with pvm3/lib/ARCH/libfpvm3.a. Include the C/C++ header file pvm3/include/pvm3.h for constants and function prototypes; the Fortran header file is pvm3/include/fpvm3.h.
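As a minimal illustration of the Fortran interface (see the PVM documentation for the full calling sequences), the following program enrolls in PVM, prints its task id, and leaves:

      program pvmtest
      include 'fpvm3.h'
      integer mytid, info
C     enroll this process in PVM and obtain its task id
      call pvmfmytid(mytid)
      print *, 'my task id is', mytid
C     leave PVM before exiting
      call pvmfexit(info)
      end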

An n32-bit PVM and XPVM are also available on the Origin. XPVM is a graphical interface to the PVM commands as well as a real-time performance monitor for PVM tasks. First, put the following commands in your .login file:

setenv PVM_ROOT /usr/apps/MessPass/pvm3/pvm3
setenv XPVM_ROOT /usr/apps/MessPass/pvm3/xpvm
setenv TCL_LIBRARY /usr/apps/MessPass/pvm3/tcl/library
setenv TK_LIBRARY /usr/apps/MessPass/pvm3/tk/library

You need to compile your program using the PVM library, and then type

$XPVM_ROOT/src/SGI64/xpvm

on the Origin to start using it. There are three views that you can use to monitor the progress of your program: the network, space-time and utilization views. You can also use XPVM as a simple debugger by looking at the trace events which are the entry and exit points of a PVM routine. There is an on-line help included.


ROMIO

ROMIO, a high-performance, portable MPI-IO implementation, is installed on modi4. The version installed in

/usr/local/apps/MessPass/romio/romio

uses the native SGI MPI. Please read the documentation in each directory. Background information on the parallel I/O interface can be found in

/usr/local/apps/MessPass/romio/romio/doc
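To give a flavor of the MPI-IO interface that ROMIO implements, here is a minimal Fortran sketch in which each process writes its rank to a shared file at a distinct offset. The calls are standard MPI-2, the file name is illustrative, and depending on the installation the MPI-IO constants may be declared in mpif.h or in ROMIO's own mpiof.h; check the documentation above before building it with f90:

      program pario
      include 'mpif.h'
      integer ierr, rank, fh
      integer status(MPI_STATUS_SIZE)
      integer (kind=MPI_OFFSET_KIND) offset
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
C     open (or create) a file shared by all processes
      call MPI_FILE_OPEN(MPI_COMM_WORLD, 'testfile',
     &     MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierr)
C     each process writes one integer at an offset based on its rank
      offset = rank * 4
      call MPI_FILE_WRITE_AT(fh, offset, rank, 1, MPI_INTEGER,
     &     status, ierr)
      call MPI_FILE_CLOSE(fh, ierr)
      call MPI_FINALIZE(ierr)
      end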



Scaltool

Scaltool Version 1.0, released on July 9, 1999, has been installed on modi4.

Scaltool is a tool to pinpoint performance bottlenecks of parallel codes on Distributed Shared Memory machines such as the Origin2000. Scaltool can quantify the effects (in cycles) of capacity misses (due to limited L2 cache size), synchronization, and load imbalance. Thus, it can be a valuable tool for parallel programmers to discover bottlenecks of their applications, to understand the behaviour of their applications better, and to help them hand-tune the applications to get the desired performance.

Scaltool was developed by Yan Solihin with the help of his colleague Vinh Lam, and his advisor Prof Josep Torrellas, at the University of Illinois at Urbana Champaign. The tool is based on an empirical mathematical model that has been published in CSRD Technical Report 1563 and will be published in Supercomputing 99. Interested users are referred to either the technical report or the paper. An abstract of the talk in WSSMM '99 is included in the release.

Version 1.0 is the first release of the tool. It has been made straightforward to use, and it has a self-checking feature that reports possible inaccuracies in its predictions. Please note that not all inaccuracies and errors can be detected, and an accurate prediction may occasionally be reported as inaccurate, although this is very unlikely.

Usage information and contact info can be found at the
NCSA Scaltool page.

SpeedShop

SpeedShop is the generic name for an integrated package of performance tools on the Origin for locating performance problems in executables. It also supports starting a process in such a way as to permit a debugger to attach to it, and it supports running Purify on executables. There are several different ways to use SpeedShop: collecting data (using ssrun), creating reports (using prof), and inserting caliper points into your code (using the ssapi interface). Each of these is discussed at length in the NCSA Origin 2000 tuning page or in the SGI online manual. Next we briefly review the most common ways to use SpeedShop (except ssapi, which is covered in the pages above).

When using SpeedShop, you should first collect data from your program, by issuing ssrun with various options:

ssrun -fpcsamp my_executable

The fpcsamp option, for instance, samples the program counter statistically based on user and system time. Many more options exist; users should consult the SpeedShop man pages. The file created by ssrun can now be analyzed by prof:

prof my_executable.fpcsamp.332


will list the number of calls to each function and the time spent in each function, in descending order. If your program creates several processes, the process id is attached to each output file, and prof must be run on each one separately.
Another interesting experiment uses the ideal option of ssrun, which instruments your program into basic blocks (each with one entrance and one exit) and reports a profile based on basic-block counts. You can issue

ssrun -ideal my_executable
prof my_executable.ideal.4432


to see the number of times that a basic block was encountered. Sometimes during benchmarking it is interesting to know the total number of instructions or floating point operations executed:

prof -op my_executable.ideal.4432


The prof command alone gives only the total number of calls to each function. If you use the -gprof option, you can also obtain information on the calling-tree hierarchy:

prof -gprof my_executable.ideal.4432


More information on prof can be found in the man pages.

SvPablo

SvPablo is a graphical source code browser and performance visualizer that integrates the Pablo project's dynamic performance instrumentation software with PGI HPF, the Portland Group's commercial HPF compiler, and the MIPS R10000 hardware performance counters.

SvPablo is installed on the Origin in

/usr/local/apps/Pablo/SvPablo/

In order to use SvPablo, you should set the following variables

setenv SVPABLO /usr/local/apps/Pablo/SvPablo/Install
set path = ($path $SVPABLO/bin)
setenv MANPATH "${MANPATH}:${SVPABLO}/Man"


You can run SvPablo by typing "runSvPablo" on the Origin. More information can be found on-line in

/usr/local/apps/Pablo/SvPablo/Documentation

Other information can be found on the SvPablo website or the Pablo Research group website.

TotalView

TotalView is a full-featured, source-level, graphical debugger, based on the X Window System, for C, C++, Fortran (77 and 90), assembler, and mixed source/assembler codes. TotalView supports PVM, MPI, and HPF. To use the debugger, add the following to the end of your .cshrc file:

set path=(/usr/apps/tools/totalview/totalview.3.9.0-0/irix6-mips/bin $path)
setenv MANPATH "${MANPATH}:/usr/apps/tools/totalview/totalview.3.9.0-0/irix6-mips/man"

Start up the debugger (after setting your display and xhost appropriately) with the command "totalview". See "man totalview" for command syntax and options.

The Release Notes are available at

/usr/apps/tools/totalview/


Some example C codes are available in subdirectories of

/usr/apps/tools/totalview/totalview.3.9.0-0/irix6-mips

A short description of how to run TotalView to debug MPI jobs can be found here. More information, including a tutorial, is available from the Etnus website.

More Tools Here