Message Passing Toolkit (MPT) User's Guide
(document number: 007-3773-019 / published: 2011-11-15)
Chapter 9. MPI Performance Profiling
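This chapter describes the perfcatch utility, which profiles the
performance of an MPI program, and other tools that can be used to
profile MPI applications. It covers the following topics: an overview of
the perfcatch utility, using the perfcatch utility, an example
MPI_PROFILING_STATS results file, the MPI performance profiling
environment variables, the MPI supported profiled functions, and
profiling MPI applications.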
Overview of perfcatch Utility
The perfcatch utility
runs an MPI program with a wrapper profiling library that prints MPI call
profiling information to a summary file when the MPI program completes.
By default, this MPI profiling result file is called MPI_PROFILING_STATS
(see “MPI_PROFILING_STATS Results File Example”). It is created in the
current working directory of the MPI process with rank 0.
Using the perfcatch Utility
The syntax of the perfcatch utility is as follows:

    perfcatch [-v | -vofed | -i] cmd args

The perfcatch utility accepts the following options:
| Option | Description |
| No option | Supports MPT |
| -v | Supports Voltaire MPI |
| -vofed | Supports Voltaire OFED MPI |
| -i | Supports Intel MPI |
To use perfcatch with an SGI Message Passing Toolkit MPI program, insert
the perfcatch command in front of the executable name. Here are some
examples:

    mpirun -np 64 perfcatch a.out arg1

and

    mpirun host1 32, host2 64 perfcatch a.out arg1

To use perfcatch with Intel MPI, add the -i option. An example is as
follows:

    mpiexec -np 64 perfcatch -i a.out arg1

For more information, see the perfcatch(1) man page.
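The command lines above all follow one pattern: launcher arguments, then perfcatch with at most one option, then the program and its arguments. The following Python sketch only assembles such an argument list; it does not run anything. The function name perfcatch_argv and the keys of PERFCATCH_FLAGS are hypothetical names chosen for this illustration, and the launcher arguments are supplied by the caller because the right launcher depends on the MPI implementation in use.

```python
# Option for each MPI implementation, taken from the options table above.
# The dictionary keys are illustrative labels, not perfcatch terminology.
PERFCATCH_FLAGS = {
    "mpt": [],                  # no option: SGI MPT
    "voltaire": ["-v"],         # Voltaire MPI
    "voltaire-ofed": ["-vofed"],  # Voltaire OFED MPI
    "intel": ["-i"],            # Intel MPI
}


def perfcatch_argv(launcher, program, program_args=(), mpi="mpt"):
    """Return launcher + perfcatch [option] + program + args as a list."""
    if mpi not in PERFCATCH_FLAGS:
        raise ValueError(f"unknown MPI implementation: {mpi!r}")
    return [*launcher, "perfcatch", *PERFCATCH_FLAGS[mpi], program, *program_args]


# Rebuild two of the example command lines shown above.
print(" ".join(perfcatch_argv(["mpirun", "-np", "64"], "a.out", ["arg1"])))
print(" ".join(perfcatch_argv(["mpiexec", "-np", "64"], "a.out", ["arg1"], mpi="intel")))
```

Keeping the command as a list rather than a single string means it can be passed to a process launcher without shell quoting concerns.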
MPI_PROFILING_STATS Results File Example
The MPI profiling result file has a summary
statistics section followed by a rank-by-rank profiling information section.
The summary statistics section reports some overall statistics, including
the percent time each rank spent in MPI functions, and the MPI process
that spent the least and the most time in MPI functions. Similar reports
are made about system time usage.
The rank-by-rank profiling information
section lists every profiled MPI function called by a particular MPI process.
The number of calls and the total time consumed by these calls are reported.
Some functions report additional information such as average data counts
and communication peer lists.
An example MPI_PROFILING_STATS results file is as follows:

============================================================
PERFCATCHER version 22
(C) Copyright SGI. This library may only be used
on SGI hardware platforms. See LICENSE file for
details.
============================================================
MPI program profiling information
Job profile recorded Wed Jan 17 13:05:24 2007
Program command line: /home/estes01/michel/sastest/mpi_hello_linux
Total MPI processes 2
Total MPI job time, avg per rank 0.0054768 sec
Profiled job time, avg per rank 0.0054768 sec
Percent job time profiled, avg per rank 100%
Total user time, avg per rank 0.001 sec
Percent user time, avg per rank 18.2588%
Total system time, avg per rank 0.0045 sec
Percent system time, avg per rank 82.1648%
Time in all profiled MPI routines, avg per rank 5.75004e-07 sec
Percent time in profiled MPI routines, avg per rank 0.0104989%
Rank-by-Rank Summary Statistics
-------------------------------
Rank-by-Rank: Percent in Profiled MPI routines
Rank:Percent
0:0.0112245% 1:0.00968502%
Least: Rank 1 0.00968502%
Most: Rank 0 0.0112245%
Load Imbalance: 0.000771%
Rank-by-Rank: User Time
Rank:Percent
0:17.2683% 1:19.3699%
Least: Rank 0 17.2683%
Most: Rank 1 19.3699%
Rank-by-Rank: System Time
Rank:Percent
0:86.3416% 1:77.4796%
Least: Rank 1 77.4796%
Most: Rank 0 86.3416%
Notes
-----
Wtime resolution is 5e-08 sec
Rank-by-Rank MPI Profiling Results
----------------------------------
Activity on process rank 0
Single-copy checking was not enabled.
comm_rank calls: 1 time: 6.50005e-07 s 6.50005e-07 s/call
Activity on process rank 1
Single-copy checking was not enabled.
comm_rank calls: 1 time: 5.00004e-07 s 5.00004e-07 s/call
------------------------------------------------
recv profile
cnt/sec for all remote ranks
local ANY_SOURCE 0 1
rank
------------------------------------------------
recv wait for data profile
cnt/sec for all remote ranks
local 0 1
rank
------------------------------------------------
recv wait for data profile
cnt/sec for all remote ranks
local 0 1
rank
------------------------------------------------
send profile
cnt/sec for all destination ranks
src 0 1
rank
------------------------------------------------
ssend profile
cnt/sec for all destination ranks
src 0 1
rank
------------------------------------------------
ibsend profile
cnt/sec for all destination ranks
src 0 1
rank
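The rank-by-rank summary lines in the example above have a regular Rank:Percent layout, so they can be read back by a short script. The Python sketch below assumes the layout is exactly as printed in this example (a "Rank-by-Rank: <title>" heading followed by "rank:value%" pairs); other perfcatch versions may format the file differently. The names parse_rank_sections and least_and_most are hypothetical helpers, not part of MPT.

```python
import re

# One "rank:percent%" pair, for example "0:0.0112245%".
RANK_PERCENT = re.compile(r"(\d+):([0-9.eE+-]+)%")

SAMPLE = """\
Rank-by-Rank: Percent in Profiled MPI routines
Rank:Percent
0:0.0112245% 1:0.00968502%
Least: Rank 1 0.00968502%
Most: Rank 0 0.0112245%
Rank-by-Rank: User Time
Rank:Percent
0:17.2683% 1:19.3699%
Rank-by-Rank: System Time
Rank:Percent
0:86.3416% 1:77.4796%
"""


def parse_rank_sections(text):
    """Map each 'Rank-by-Rank: <title>' section to {rank: percent}."""
    sections = {}
    title = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Rank-by-Rank:"):
            title = line.split(":", 1)[1].strip()
        elif title is not None:
            pairs = RANK_PERCENT.findall(line)
            if pairs:
                section = sections.setdefault(title, {})
                section.update({int(r): float(p) for r, p in pairs})
    return sections


def least_and_most(percent_by_rank):
    """Return the ranks with the smallest and largest percentage."""
    least = min(percent_by_rank, key=percent_by_rank.get)
    most = max(percent_by_rank, key=percent_by_rank.get)
    return least, most


sections = parse_rank_sections(SAMPLE)
for title, values in sections.items():
    least, most = least_and_most(values)
    print(f"{title}: least rank {least}, most rank {most}")
```

Comparing the least and most ranks in each section is a quick way to spot the ranks that dominate MPI or system time in a larger job.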
MPI Performance Profiling Environment Variables
The MPI performance profiling environment variables are as follows:
| Variable | Description |
| MPI_PROFILE_AT_INIT | Activates MPI profiling immediately, that is, at the start of MPI program execution. |
| MPI_PROFILING_STATS_FILE | Specifies the file where MPI profiling results are written. If not specified, the file MPI_PROFILING_STATS is written. |
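As an illustration of how these two variables might be set from a launch script, the Python sketch below builds an environment dictionary and works out which results file to expect. It does not start any MPI job. The helper names profiling_env and results_path are hypothetical, and the value "1" for MPI_PROFILE_AT_INIT is an assumption: the table above says only that the variable activates profiling, not which value it expects.

```python
import os


def profiling_env(stats_file=None, profile_at_init=False, base=None):
    """Return a copy of the environment with the profiling variables set."""
    env = dict(os.environ if base is None else base)
    if profile_at_init:
        # Assumed value; the guide does not state what the variable must contain.
        env["MPI_PROFILE_AT_INIT"] = "1"
    if stats_file is not None:
        env["MPI_PROFILING_STATS_FILE"] = stats_file
    return env


def results_path(env):
    """Name of the results file: the variable if set, else the default."""
    return env.get("MPI_PROFILING_STATS_FILE", "MPI_PROFILING_STATS")


env = profiling_env(stats_file="run42.stats", profile_at_init=True, base={})
print(sorted(env.items()))
print(results_path(env))
print(results_path({}))
```

A dictionary built this way can be handed to a process launcher through its environment argument, for example the env parameter of subprocess.run.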
MPI Supported Profiled Functions
The MPI supported profiled functions are as follows:
Note: Some functions may not be implemented in all languages,
as indicated below.
| Languages | Function |
| C, Fortran | mpi_allgather |
| C, Fortran | mpi_allgatherv |
| C, Fortran | mpi_allreduce |
| C, Fortran | mpi_alltoall |
| C, Fortran | mpi_alltoallv |
| C, Fortran | mpi_alltoallw |
| C, Fortran | mpi_barrier |
| C, Fortran | mpi_bcast |
| C, Fortran | mpi_comm_create |
| C, Fortran | mpi_comm_free |
| C, Fortran | mpi_comm_group |
| C, Fortran | mpi_comm_rank |
| C, Fortran | mpi_finalize |
| C, Fortran | mpi_gather |
| C, Fortran | mpi_gatherv |
| C | mpi_get_count |
| C, Fortran | mpi_group_difference |
| C, Fortran | mpi_group_excl |
| C, Fortran | mpi_group_free |
| C, Fortran | mpi_group_incl |
| C, Fortran | mpi_group_intersection |
| C, Fortran | mpi_group_range_excl |
| C, Fortran | mpi_group_range_incl |
| C, Fortran | mpi_group_union |
| C | mpi_ibsend |
| C, Fortran | mpi_init |
| C | mpi_init_thread |
| C, Fortran | mpi_irecv |
| C, Fortran | mpi_isend |
| C | mpi_probe |
| C, Fortran | mpi_recv |
| C, Fortran | mpi_reduce |
| C, Fortran | mpi_scatter |
| C, Fortran | mpi_scatterv |
| C, Fortran | mpi_send |
| C, Fortran | mpi_sendrecv |
| C, Fortran | mpi_ssend |
| C, Fortran | mpi_test |
| C, Fortran | mpi_testany |
| C, Fortran | mpi_wait |
Profiling MPI Applications
This
section describes the use of profiling tools to obtain performance information.
Compared to the performance analysis of sequential applications, characterizing
the performance of parallel applications can be challenging. Often it
is most effective to first focus on improving the performance of MPI applications
at the single process level.
It may also be important to understand the message traffic generated
by an application. A number of tools can be used to analyze this aspect
of a message passing application's performance, including Performance
Co-Pilot and various third-party products. In this section, you can learn
how to use these various tools with MPI applications. It covers writing
your own profiling with the MPI profiling interface, MPI internal
statistics, and third-party products.
You can write your own profiling by using
the MPI-1 standard PMPI_* calls. In addition, either
within your own profiling library or within the application itself you
can use the MPI_Wtime function call to time specific
calls or sections of your code.
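The bookkeeping that such a profiling library performs (count each call, time it, then forward to the real routine) can be sketched outside MPI. The Python example below is an analogy only: it is not MPI code, time.perf_counter stands in for MPI_Wtime, and the decorated barrier function is a hypothetical placeholder for a wrapper that would forward to the corresponding PMPI_* routine.

```python
import time
from collections import defaultdict
from functools import wraps

# Per-routine counters, as a profiling library would keep for each MPI call.
calls = defaultdict(int)
seconds = defaultdict(float)


def profiled(func):
    """Wrap func so that each call is counted and its elapsed time recorded."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()  # stand-in for MPI_Wtime
        try:
            # In a real PMPI wrapper this would be the PMPI_* call.
            return func(*args, **kwargs)
        finally:
            calls[func.__name__] += 1
            seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@profiled
def barrier():
    """Placeholder for a communication call; it only sleeps briefly."""
    time.sleep(0.001)


for _ in range(3):
    barrier()

# Report in the same "name calls N time T" style as a per-rank summary.
for name in calls:
    print(f"{name} calls {calls[name]} time {seconds[name]:e}")
```

Recording the time in a finally clause keeps the counters consistent even if the wrapped call raises an error.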
The following example is actual output for a single rank of a program
that was run on 128 processors, using a user-created profiling library
that performs call counts and timings of common MPI calls. Notice that
for this rank most of the MPI time is being spent in MPI_Waitall
and MPI_Allreduce.

Total job time 2.203333e+02 sec
Total MPI processes 128
Wtime resolution is 8.000000e-07 sec
activity on process rank 0
comm_rank calls 1 time 8.800002e-06
get_count calls 0 time 0.000000e+00
ibsend calls 0 time 0.000000e+00
probe calls 0 time 0.000000e+00
recv calls 0 time 0.00000e+00 avg datacnt 0 waits 0 wait time 0.00000e+00
irecv calls 22039 time 9.76185e-01 datacnt 23474032 avg datacnt 1065
send calls 0 time 0.000000e+00
ssend calls 0 time 0.000000e+00
isend calls 22039 time 2.950286e+00
wait calls 0 time 0.00000e+00 avg datacnt 0
waitall calls 11045 time 7.73805e+01 # of Reqs 44078 avg data cnt 137944
barrier calls 680 time 5.133110e+00
alltoall calls 0 time 0.0e+00 avg datacnt 0
alltoallv calls 0 time 0.000000e+00
reduce calls 0 time 0.000000e+00
allreduce calls 4658 time 2.072872e+01
bcast calls 680 time 6.915840e-02
gather calls 0 time 0.000000e+00
gatherv calls 0 time 0.000000e+00
scatter calls 0 time 0.000000e+00
scatterv calls 0 time 0.000000e+00
activity on process rank 1
...
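Per-rank listings like the one above are easier to read once the calls are sorted by time. The Python sketch below assumes the "name calls N time T" line layout of this particular user-created library, which is not a standard format; SAMPLE copies a few lines from the output above, and parse_calls is a hypothetical helper name.

```python
import re

# Matches the leading "name calls N time T" part of each line; any trailing
# fields such as datacnt are ignored.
CALL_LINE = re.compile(r"^\s*(\w+)\s+calls\s+(\d+)\s+time\s+([0-9.eE+-]+)")

SAMPLE = """\
irecv calls 22039 time 9.76185e-01 datacnt 23474032 avg datacnt 1065
isend calls 22039 time 2.950286e+00
waitall calls 11045 time 7.73805e+01 # of Reqs 44078 avg data cnt 137944
barrier calls 680 time 5.133110e+00
allreduce calls 4658 time 2.072872e+01
bcast calls 680 time 6.915840e-02
"""


def parse_calls(text):
    """Return (name, call count, seconds) for every matching line."""
    rows = []
    for line in text.splitlines():
        match = CALL_LINE.match(line)
        if match:
            rows.append((match.group(1), int(match.group(2)), float(match.group(3))))
    return rows


# Largest time first, so the dominant MPI calls appear at the top.
by_time = sorted(parse_calls(SAMPLE), key=lambda row: row[2], reverse=True)
for name, count, secs in by_time:
    print(f"{name:<10} calls {count:>6}  time {secs:10.4f} s")
```

Sorted this way, waitall and allreduce come first for rank 0, which matches the observation made about this rank above.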
MPI keeps track of certain resource utilization
statistics. These can be used to determine potential performance problems
caused by lack of MPI message buffers and other MPI internal resources.
To turn on the displaying of MPI internal statistics, use the
MPI_STATS environment variable or the -stats
option on the mpirun command. MPI internal statistics
are always being gathered, so displaying them does not cause significant
additional overhead. In addition, one can sample the MPI statistics counters
from within an application, allowing for finer grain measurements. If
the MPI_STATS_FILE variable is set, when the program
completes, the internal statistics will be written to the file specified
by this variable. For information about these MPI extensions, see the
mpi_stats man page.
These statistics can be very useful in optimizing codes in the following
ways:
For additional information on how to use the MPI statistics counters
to help tune the run-time environment for an MPI application, see Chapter 8, “Run-time Tuning”.
Two third-party tools that you can use with the SGI MPI implementation
are Vampir from Pallas (www.pallas.com) and Jumpshot,
which is part of the MPICH distribution. Both of these tools are effective
for smaller, short-duration MPI jobs. However, the trace files these tools
generate can be enormous for longer-running or highly parallel jobs. Writing
these traces causes a program to run more slowly, but even more problematic
is that the analysis tools are often overwhelmed by the amount of data.