Message Passing Toolkit (MPT) User's Guide
(document number: 007-3773-012 / published: 2009-10-22)
Chapter 6. Profiling MPI Applications
This chapter describes the use of profiling tools to obtain performance
information. Compared to the performance analysis of sequential applications,
characterizing the performance of parallel applications can be challenging.
Often it is most effective to first focus on improving the performance
of MPI applications at the single process level.
It may also be important to understand the message traffic generated
by an application. A number of tools can be used to analyze this aspect
of a message passing application's performance, including Performance
Co-Pilot and various third-party products. This chapter explains how to
use these tools with MPI applications. It covers the following
topics:
Using Profiling Tools with MPI Applications
Two of the most common SGI profiling tools are profile.pl and histx+.
The following sections describe how to invoke these tools. Performance
Co-Pilot tools and tips for writing your own tools are also included.
You can also use the perfcatch utility to profile
the performance of an MPI program. For more information, see Chapter 8, “MPI Performance Profiling”.
You can use profile.pl to obtain procedure-level profiling
as well as information about the hardware performance monitors. For further
information, see the profile.pl(1) and pfmon(1) man pages.
General format:
% mpirun mpirun_entry_object [mpirun_entry_object ...] profile.pl [profile.pl_options] executable
Example:
% mpirun -np 4 profile.pl -s1 -c4,5 -N 1000 ./a.out
histx+ is a small set of tools that can assist with performance analysis
and bottleneck identification.
General formats for histx (Histogram) and lipfpm (Linux IPF Performance Monitor):
% mpirun -np 4 histx [histx_options] ./a.out
% lipfpm [lipfpm_options] mpirun -np 4 ./a.out
Examples:
% mpirun -np 4 histx -f -o histx.out ./a.out
% lipfpm -f -e LOADS_RETIRED -e STORES_RETIRED mpirun -np 4 ./a.out
You can write your own profiling library by using the PMPI_* calls
defined in the MPI-1 standard. In addition, either within your own
profiling library or within the application itself, you can use the
MPI_Wtime function call to time specific calls or sections of your code.
The following example is actual output for a single rank of a program
that was run on 128 processors, using a user-created profiling library
that performs call counts and timings of common MPI calls. Notice that
for this rank most of the MPI time is being spent in MPI_Waitall
and MPI_Allreduce.
Total job time 2.203333e+02 sec
Total MPI processes 128
Wtime resolution is 8.000000e-07 sec
activity on process rank 0
comm_rank calls 1 time 8.800002e-06
get_count calls 0 time 0.000000e+00
ibsend calls 0 time 0.000000e+00
probe calls 0 time 0.000000e+00
recv calls 0 time 0.00000e+00 avg datacnt 0 waits 0 wait time 0.00000e+00
irecv calls 22039 time 9.76185e-01 datacnt 23474032 avg datacnt 1065
send calls 0 time 0.000000e+00
ssend calls 0 time 0.000000e+00
isend calls 22039 time 2.950286e+00
wait calls 0 time 0.00000e+00 avg datacnt 0
waitall calls 11045 time 7.73805e+01 # of Reqs 44078 avg data cnt 137944
barrier calls 680 time 5.133110e+00
alltoall calls 0 time 0.0e+00 avg datacnt 0
alltoallv calls 0 time 0.000000e+00
reduce calls 0 time 0.000000e+00
allreduce calls 4658 time 2.072872e+01
bcast calls 680 time 6.915840e-02
gather calls 0 time 0.000000e+00
gatherv calls 0 time 0.000000e+00
scatter calls 0 time 0.000000e+00
scatterv calls 0 time 0.000000e+00
activity on process rank 1
...
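The per-call lines in this output follow a regular "name calls N time T" pattern, so a short script can rank the calls by time instead of reading the columns by eye. The following Python sketch is illustrative only: it assumes the line layout shown in this example (a user-created library may print something different), and the names parse_rank_profile, rank_by_time, and SAMPLE are hypothetical helpers, not part of MPT.

```python
import re

# Assumed layout, taken from the sample output in this chapter:
#   <name> calls <count> time <seconds> [extra fields ...]
LINE_RE = re.compile(
    r"^\s*(\w+)\s+calls\s+(\d+)\s+time\s+([0-9.]+(?:[eE][+-]?\d+)?)"
)


def parse_rank_profile(text):
    """Return a list of (name, calls, seconds) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2)), float(m.group(3))))
    return rows


def rank_by_time(rows):
    """Sort by time, descending, and attach each call's share of the total."""
    total = sum(seconds for _, _, seconds in rows)
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    return [
        (name, seconds, seconds / total if total else 0.0)
        for name, _, seconds in ordered
    ]


# Non-zero lines for rank 0, copied from the example output.
SAMPLE = """\
comm_rank calls 1 time 8.800002e-06
irecv calls 22039 time 9.76185e-01 datacnt 23474032 avg datacnt 1065
isend calls 22039 time 2.950286e+00
waitall calls 11045 time 7.73805e+01 # of Reqs 44078 avg data cnt 137944
barrier calls 680 time 5.133110e+00
allreduce calls 4658 time 2.072872e+01
bcast calls 680 time 6.915840e-02
"""

if __name__ == "__main__":
    for name, seconds, share in rank_by_time(parse_rank_profile(SAMPLE)):
        print(f"{name:10s} {seconds:12.4f} s  {share:6.1%}")
```

Run on the rank 0 lines, the ranking places waitall first and allreduce second, which matches the observation made about this output. The shares are relative to the summed MPI call times, not to the total job time.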
MPI keeps track of certain resource utilization statistics. These can be used
to determine potential performance problems caused by lack of MPI message
buffers and other MPI internal resources.
To display MPI internal statistics, use the MPI_STATS environment
variable or the -stats option on the mpirun command. MPI internal
statistics are always being gathered, so displaying them does not cause
significant additional overhead. In addition, you can sample the MPI
statistics counters from within an application, allowing for
finer-grained measurements. If the MPI_STATS_FILE variable is set, the
internal statistics are written to the file specified by this variable
when the program completes. For information about these MPI extensions,
see the mpi_stats man page.
These statistics can be very useful in optimizing codes in the following
ways:
For additional information on how to use the MPI statistics counters
to help tune the run-time environment for an MPI application, see Chapter 7, “Run-time Tuning”.
Two third-party tools that you can use with the SGI MPI implementation
are Vampir from Pallas (www.pallas.com) and Jumpshot, which is part
of the MPICH distribution. Both of these tools are effective for smaller,
short duration MPI jobs. However, the trace files these tools generate
can be enormous for longer running or highly parallel jobs. Writing
such traces causes a program to run more slowly, but the bigger problem
is that the tools used to analyze the traces are often overwhelmed by
the amount of data.