Message Passing Toolkit (MPT) User's Guide
(document number: 007-3773-012 / published: 2009-10-22)

Chapter 6. Profiling MPI Applications

This chapter describes the use of profiling tools to obtain performance information. Compared to the performance analysis of sequential applications, characterizing the performance of parallel applications can be challenging. Often it is most effective to first focus on improving the performance of MPI applications at the single process level.

It may also be important to understand the message traffic generated by an application. A number of tools can be used to analyze this aspect of a message passing application's performance, including Performance Co-Pilot and various third-party products. This chapter describes how to use these tools with MPI applications.

Using Profiling Tools with MPI Applications

Two of the most common SGI profiling tools are profile.pl and histx+. The following sections describe how to invoke these tools. Performance Co-Pilot tools and tips for writing your own tools are also included.

You can also use the perfcatch utility to profile the performance of an MPI program. For more information, see Chapter 8, “MPI Performance Profiling”.

profile.pl

You can use profile.pl to obtain procedure level profiling as well as information about the hardware performance monitors. For further information, see the profile.pl(1) and pfmon(1) man pages.

General format:

% mpirun  mpirun_entry_object [mpirun_entry_object ...] profile.pl [profile.pl_options] executable

Example:

% mpirun -np 4 profile.pl -s1 -c4,5 -N 1000 ./a.out

histx+

histx+ is a small set of tools that can assist with performance analysis and bottleneck identification.

General formats for histx (Histogram) and lipfpm (Linux IPF Performance Monitor):

% mpirun -np 4 histx [histx_options] ./a.out

% lipfpm [lipfpm_options] mpirun -np 4 ./a.out

Examples:

% mpirun -np 4 histx -f -o histx.out ./a.out

% lipfpm -f -e LOADS_RETIRED -e STORES_RETIRED mpirun -np 4 ./a.out

Profiling Interface

You can write your own profiling library by using the PMPI_* entry points that the MPI-1 standard defines for this purpose. In addition, within either your own profiling library or the application itself, you can use the MPI_Wtime function to time specific calls or sections of your code.

The following example is actual output for a single rank of a program that was run on 128 processors, using a user-created profiling library that performs call counts and timings of common MPI calls. Notice that for this rank most of the MPI time is being spent in MPI_Waitall and MPI_Allreduce.

Total job time 2.203333e+02 sec
Total MPI processes 128
Wtime resolution is 8.000000e-07 sec

activity on process rank 0
comm_rank calls 1      time 8.800002e-06
get_count calls 0      time 0.000000e+00
ibsend calls    0      time 0.000000e+00
probe calls     0      time 0.000000e+00
recv calls      0      time 0.00000e+00   avg datacnt 0   waits 0       wait time 0.00000e+00
irecv calls     22039  time 9.76185e-01   datacnt 23474032 avg datacnt 1065
send calls      0      time 0.000000e+00
ssend calls     0      time 0.000000e+00
isend calls     22039  time 2.950286e+00
wait calls      0      time 0.00000e+00   avg datacnt 0
waitall calls   11045  time 7.73805e+01   # of Reqs 44078  avg data  cnt 137944
barrier calls   680    time 5.133110e+00   
alltoall calls  0      time 0.0e+00       avg datacnt 0
alltoallv calls 0      time 0.000000e+00
reduce calls    0      time 0.000000e+00
allreduce calls 4658   time 2.072872e+01
bcast calls     680    time 6.915840e-02
gather calls    0      time 0.000000e+00
gatherv calls   0      time 0.000000e+00
scatter calls   0      time 0.000000e+00
scatterv calls  0      time 0.000000e+00  

activity on process rank 1 
...
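The observation about MPI_Waitall and MPI_Allreduce can be checked with a little arithmetic. The following C sketch copies the nonzero timings for rank 0 from the output above and computes each call's share of the summed MPI time. The struct and function names are hypothetical, and the calculation assumes that the per-call timings do not overlap.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero timings for rank 0, in seconds, copied from the output above. */
struct mpi_timing { const char *name; double seconds; };

static const struct mpi_timing rank0[] = {
    { "comm_rank", 8.800002e-06 },
    { "irecv",     9.76185e-01  },
    { "isend",     2.950286e+00 },
    { "waitall",   7.73805e+01  },
    { "barrier",   5.133110e+00 },
    { "allreduce", 2.072872e+01 },
    { "bcast",     6.915840e-02 },
};
static const size_t n_timings = sizeof rank0 / sizeof rank0[0];
static const double total_job_time = 2.203333e+02;

/* Sum of all per-call timings for the rank. */
double total_mpi_time(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < n_timings; i++)
        sum += rank0[i].seconds;
    return sum;
}

/* Fraction of the summed MPI time spent in one call; 0.0 if not listed. */
double share_of_mpi_time(const char *name)
{
    for (size_t i = 0; i < n_timings; i++)
        if (strcmp(rank0[i].name, name) == 0)
            return rank0[i].seconds / total_mpi_time();
    return 0.0;
}

/* Print each call's share, then MPI time as a fraction of job time. */
void print_summary(void)
{
    for (size_t i = 0; i < n_timings; i++)
        printf("%-10s %5.1f%% of MPI time\n", rank0[i].name,
               100.0 * share_of_mpi_time(rank0[i].name));
    printf("MPI time is %.1f%% of total job time\n",
           100.0 * total_mpi_time() / total_job_time);
}
```

With these numbers, MPI_Waitall accounts for roughly 72% of the measured MPI time on rank 0 and MPI_Allreduce for roughly 19%. The listed MPI calls together take a little under half of the total job time.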

MPI Internal Statistics

MPI keeps track of certain resource utilization statistics. These can be used to determine potential performance problems caused by lack of MPI message buffers and other MPI internal resources.

To display MPI internal statistics, use the MPI_STATS environment variable or the -stats option on the mpirun command. MPI internal statistics are always being gathered, so displaying them does not cause significant additional overhead. In addition, you can sample the MPI statistics counters from within an application, which allows finer-grained measurements. If the MPI_STATS_FILE variable is set, the internal statistics are written to the file specified by this variable when the program completes. For information about these MPI extensions, see the mpi_stats man page.

These statistics can be very useful in optimizing codes in the following ways:

  • To determine if there are enough internal buffers and if processes are waiting (retries) to acquire them

  • To determine if single copy optimization is being used for point-to-point or collective calls

For additional information on how to use the MPI statistics counters to help tune the run-time environment for an MPI application, see Chapter 7, “Run-time Tuning”.

Third Party Products

Two third-party tools that you can use with the SGI MPI implementation are Vampir from Pallas (www.pallas.com) and Jumpshot, which is part of the MPICH distribution. Both of these tools are effective for smaller, short-duration MPI jobs. However, the trace files these tools generate can be enormous for longer-running or highly parallel jobs. Generating such traces causes a program to run more slowly, and, more problematically, the tools that analyze the data are often overwhelmed by its volume.
