
MPI(1)

 NAME

     MPI - Introduction to the Message Passing Interface (MPI)

 DESCRIPTION

     The Message Passing Interface (MPI) is a component of the Message Passing
     Toolkit (MPT), which is a software package that supports parallel
     programming across a network of computer systems through a technique
     known as message passing.  The goal of MPI, simply stated, is to develop
     a widely used standard for writing message-passing programs. As such, the
     interface establishes a practical, portable, efficient, and flexible
     standard for message passing.

     This MPI implementation supports the MPI 1.2 standard, as documented by
     the MPI Forum in the spring 1997 release of MPI: A Message Passing
     Interface Standard.  In addition, certain MPI-2 features are also
     supported.  In designing MPI, the MPI Forum sought to make use of the
     most attractive features of a number of existing message passing systems,
     rather than selecting one of them and adopting it as the standard.  Thus,
     MPI has been strongly influenced by work at the IBM T. J. Watson Research
     Center, Intel's NX/2, Express, nCUBE's Vertex, p4, and PARMACS. Other
     important contributions have come from Zipcode, Chimp, PVM, Chameleon,
     and PICL.

     MPI requires the presence of an Array Services daemon (arrayd) on each
     host that is to run MPI processes. In a single-host environment, no
     system administration effort should be required beyond installing and
     activating arrayd. However, users wishing to run MPI applications across
     multiple hosts will need to ensure that those hosts are properly
     configured into an array.  For more information about Array Services, see
     the arrayd(1M), arrayd.conf(4), and array_services(5) man pages.

     When running across multiple hosts, users must set up their .rhosts files
     to enable remote logins. Note that MPI does not use rsh, so it is not
     necessary that rshd be running on security-sensitive systems; the .rhosts
     file was simply chosen to eliminate the need to learn yet another
     mechanism for enabling remote logins.

     Other sources of MPI information are as follows:

     *   Man pages for MPI library functions

     *   A copy of the MPI standard as PostScript or hypertext on the World
         Wide Web at the following URL:

              http://www.mpi-forum.org/

     *   Other MPI resources on the World Wide Web, such as the following:

              http://www.mcs.anl.gov/mpi/index.html
              http://www.erc.msstate.edu/mpi/index.html
              http://www.mpi.nd.edu/lam/

   Getting Started
     For IRIX systems, the Modules software package is available to support
     one or more installations of MPT.  To use the MPT software, load the
     desired mpt module.

     After you have initialized modules, enter the following command:

          module load mpt

     To unload the mpt module, enter the following command:

          module unload mpt

     MPT software can be installed in an alternate location for use with the
     modules software package.  If MPT software has been installed on your
     system for use with modules, you can access the software with the module
     command shown in the previous example.  If MPT has not been installed for
     use with modules, the software resides in default locations on your
     system (/usr/include, /usr/lib, /usr/array/PVM, and so on), as in
     previous releases.  For further information, see Installing MPT for Use
     with Modules, in the Modules relnotes.

   Using MPI
     Compile and link your MPI program as shown in the following examples.

     IRIX systems:

     To use the 64-bit MPI library, choose one of the following commands:

          cc -64 compute.c -lmpi
          f77 -64 -LANG:recursive=on compute.f -lmpi
          f90 -64 -LANG:recursive=on compute.f -lmpi
          CC -64 compute.C -lmpi++ -lmpi

     To use the 32-bit MPI library, choose one of the following commands:

          cc -n32 compute.c -lmpi
          f77 -n32 -LANG:recursive=on compute.f -lmpi
          f90 -n32 -LANG:recursive=on compute.f -lmpi
          CC -n32 compute.C -lmpi++ -lmpi

     Linux systems:

     To use the 64-bit MPI library on Linux IA64 systems, choose one of the
     following commands:

          g++ -o myprog myprog.C -lmpi++ -lmpi
          gcc -o myprog myprog.c -lmpi

     For Altix, the libmpi++.so library is not binary compatible with code
     generated by g++ 3.0 compilers.  For this reason, an additional library,
     libg++3mpi++.so, is supported for g++ 3.0 users as well as Intel C++ 8.0
     users.  Link it by using -lg++3mpi++ instead of -lmpi++.

     For IRIX systems, if Fortran 90 compiler 7.2.1 or higher is installed,
     you can add the -auto_use option as follows to get compile-time checking
     of MPI subroutine calls:

          f90 -auto_use mpi_interface -64 compute.f -lmpi
          f90 -auto_use mpi_interface -n32 compute.f -lmpi

     For IRIX with MPT version 1.4 or higher, and Altix with MPT 1.9 or
     higher, the Fortran 90 USE MPI feature is supported.  You can replace the
     include 'mpif.h' statement in your Fortran 90 source code with USE MPI.
     This facility includes MPI type and parameter definitions, and performs
     compile-time checking of MPI function and subroutine calls.

     For Altix users, if you USE MPI you must supply a -I option with the efc
     command line to specify the directory in which the MPI.mod file resides.
     efc will fail to find MPI.mod unless you supply a -I option; there is no
     default search path for Fortran module files.  For default-location
     installations, -I/usr/include is correct; replace /usr/include with the
     corresponding directory in your non-default-location installation if
     necessary.

     The Intel efc compiler does not support "allow any type" formal
     arguments, so definitions for routines such as MPI_Send and MPI_Recv,
     whose buffer or other arguments may be of any type, are omitted from USE
     MPI on Altix.  Compile-time checking of these routines is therefore not
     available on Altix.

     NOTE:  Do not use the IRIX Fortran 90 -auto_use mpi_interface option to
     compile IRIX Fortran 90 source code that contains the USE MPI statement.
     They are incompatible with each other.

     For IRIX systems, applications compiled under a previous release of MPI
     should not require recompilation to run under this new (3.3) release.
     However, it is not possible for executable files running under the 3.2
     release to interoperate with others running under the 3.3 release.

     The C version of the MPI_Init(3) routine ignores the arguments that are
     passed to it and does not modify them.

     Stdin is enabled only for those MPI processes with rank 0 in the first
     MPI_COMM_WORLD (which does not need to be located on the same host as
     mpirun).  Stdout and stderr are enabled for all MPI processes in the
     job, whether launched via mpirun or by one of the MPI-2 spawn
     functions.

     This version of the IRIX MPI implementation is compatible with the sproc
     system call and can therefore coexist with doacross loops.  SGI MPI can
     likewise coexist with OpenMP on Linux systems. By default MPI is not
     threadsafe.  Therefore, calls to MPI routines in a multithreaded
     application will require some form of mutual exclusion.  The
     MPI_Init_thread call can be used to request thread safety.  In this case,
     MPI calls can be made within parallel regions.  MPI_Init_thread is
     available on IRIX only.

     For IRIX and Linux systems, this implementation of MPI requires that all
     MPI processes call MPI_Finalize eventually.

   Buffering
     The current implementation buffers messages unless the MPI_BUFFER_MAX
     environment variable is set, or unless the message is large enough and
     certain safe MPI functions are used.

     Buffered messages are grouped into two classes based on length: short
     (messages with lengths of 64 bytes or less) and long (messages with
     lengths greater than 64 bytes).

     When MPI_BUFFER_MAX is set, messages greater than this value are
     candidates for single-copy transfers.  For IRIX systems, the data from
     the sending process must reside in the symmetric data, symmetric heap, or
     global heap segment and be a contiguous type.  For Linux systems, the
     data from the sending process can reside in the static region, stack, or
     private heap and must be a contiguous type.

     For more information on single-copy transfers, see the MPI_BUFFER_MAX and
     MPI_DEFAULT_SINGLE_COPY_OFF environment variables.
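
     The size rules above can be restated as a small C sketch.  This is not
     the library's code; it only models the 64-byte short/long boundary and
     the "greater than MPI_BUFFER_MAX" test described in this section.  The
     names classify_message and placement_ok are hypothetical; placement_ok
     stands in for the residency and contiguity requirements on the send
     data, and the default single-copy optimization is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Buffered messages of 64 bytes or less are "short"; longer ones are "long". */
#define SHORT_MESSAGE_MAX 64u

enum message_class {
    MSG_SHORT,                 /* buffered, 64 bytes or less */
    MSG_LONG,                  /* buffered, more than 64 bytes */
    MSG_SINGLE_COPY_CANDIDATE  /* larger than MPI_BUFFER_MAX and eligible */
};

/*
 * length:       message length in bytes.
 * buffer_max:   points to the MPI_BUFFER_MAX value, or NULL if it is unset.
 * placement_ok: nonzero if the send data meets the residency and
 *               contiguity requirements (hypothetical flag).
 */
enum message_class classify_message(size_t length, const size_t *buffer_max,
                                    int placement_ok)
{
    if (buffer_max != NULL && placement_ok && length > *buffer_max)
        return MSG_SINGLE_COPY_CANDIDATE;
    return length <= SHORT_MESSAGE_MAX ? MSG_SHORT : MSG_LONG;
}

/* Expected results, written as assertions. */
void classify_message_examples(void)
{
    const size_t limit = 2048;

    assert(classify_message(64, NULL, 0) == MSG_SHORT);
    assert(classify_message(65, NULL, 0) == MSG_LONG);
    assert(classify_message(4096, &limit, 1) == MSG_SINGLE_COPY_CANDIDATE);
    assert(classify_message(4096, &limit, 0) == MSG_LONG);  /* data not eligible */
    assert(classify_message(2048, &limit, 1) == MSG_LONG);  /* not greater than */
}
```

     Call classify_message_examples() from a main() of your own to check the
     boundaries.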

   Myrinet (GM) Support
     This release provides support for use of the GM protocol over Myrinet
     interconnects on IRIX systems. Support is currently limited to 64-bit
     applications.

   Using MPI with cpusets
     You can use cpusets to run MPI applications (see cpuset(4)).  However, it
     is highly recommended that the cpuset have the MEMORY_LOCAL attribute.
     On Origin systems, if this attribute is not used, you should disable NUMA
     optimizations (see the MPI_DSM_OFF environment variable description in
     the following section).

   Default Interconnect Selection
     Beginning with the MPT 1.6 release, the search algorithm for selecting a
     multi-host interconnect has been significantly modified.  By default, if
     MPI is being run across multiple hosts, or if multiple binaries are
     specified on the mpirun command, the software now searches for
     interconnects in the following order (for IRIX systems):

          1) XPMEM (NUMAlink - only available on partitioned systems)
          2) GSN
          3) MYRINET
          4) TCP/IP

     The only supported interconnects on Linux systems are XPMEM and TCP/IP.

     MPI uses the first interconnect it can detect and configure correctly.
     There will only be one interconnect configured for the entire MPI job,
     with the exception of XPMEM.  If XPMEM is found on some hosts, but not on
     others, one additional interconnect is selected.

     The user can specify a mandatory interconnect to use by setting one of
     the following new environment variables.  These variables will be
     assessed in the following order:

          1) MPI_USE_XPMEM
          2) MPI_USE_GSN
          3) MPI_USE_GM
          4) MPI_USE_TCP

     For a mandatory interconnect to be used, all of the hosts on the mpirun
     command line must be connected via the device, and the interconnect must
     be configured properly.  If this is not the case, an error message is
     printed to stdout and the job is terminated.  XPMEM is an exception to
     this rule, however.

     If MPI_USE_XPMEM is set, one additional interconnect can be selected via
     the MPI_USE variables.  Messaging between the partitioned hosts will use
     the XPMEM driver while messaging between non-partitioned hosts will use
     the second interconnect.  If a second interconnect is required but not
     selected by the user, MPI will choose the interconnect to use, based on
     the default hierarchy.
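
     As a rough illustration of the assessment order, the following C sketch
     picks the mandatory interconnect from a set of flags that say which
     MPI_USE_* variables are set.  It is an assumption-laden model, not the
     MPT implementation: the type and function names are hypothetical, and
     it does not check whether hosts are actually connected or configured,
     nor does it model the fallback to the default hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/* Order matches the documented assessment order of the MPI_USE_* variables. */
enum interconnect {
    IC_NONE = -1,
    IC_XPMEM,     /* MPI_USE_XPMEM */
    IC_GSN,       /* MPI_USE_GSN   */
    IC_MYRINET,   /* MPI_USE_GM    */
    IC_TCP,       /* MPI_USE_TCP   */
    IC_COUNT
};

struct selection {
    enum interconnect primary;
    enum interconnect second;   /* only meaningful when primary is IC_XPMEM */
};

/* First interconnect at or after `start` whose flag is nonzero. */
static enum interconnect first_set(const int requested[IC_COUNT], int start)
{
    for (int i = start; i < IC_COUNT; i++)
        if (requested[i])
            return (enum interconnect)i;
    return IC_NONE;
}

/*
 * requested[i] is nonzero if the corresponding MPI_USE_* variable is set.
 * If second comes back as IC_NONE and a second interconnect is needed,
 * the text above says MPI falls back to the default hierarchy (not modeled).
 */
struct selection choose_mandatory(const int requested[IC_COUNT])
{
    struct selection s;

    assert(requested != NULL);
    s.primary = first_set(requested, IC_XPMEM);
    s.second = (s.primary == IC_XPMEM) ? first_set(requested, IC_GSN) : IC_NONE;
    return s;
}
```

     For example, with only MPI_USE_GSN and MPI_USE_GM set, the sketch
     returns GSN as the primary interconnect and no second interconnect.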

     If the global -v verbose option is used on the mpirun command line, a
     message is printed to stdout, indicating which multi-host interconnect is
     being used for the job.

     The following interconnect selection environment variables were
     deprecated in the MPT 1.6 release: MPI_GSN_ON, MPI_GM_ON, and
     MPI_BYPASS_OFF.  If any of these variables is set, MPI prints a warning
     message to stdout and otherwise ignores the variable.

   Using MPI-2 Process Creation and Management Routines
     This release provides support for MPI_Comm_spawn and
     MPI_Comm_spawn_multiple.  However, options must be specified as an
     argument on the mpirun command line or as an environment variable to
     enable this feature.  On IRIX, this feature is only supported for MPI
     jobs running within a single host running IRIX 6.5.2 or later.  Support
     on Linux is restricted to Altix numalinked systems.  Consult the mpirun
     man page for details on how to enable spawn support.

 ENVIRONMENT VARIABLES

     This section describes the variables that specify the environment under
     which your MPI programs will run. Unless otherwise specified, these
     variables are available for both Linux and IRIX systems.  Environment
     variables have predefined values.  You can change some variables to
     achieve particular performance objectives; others are required values for
     standard-compliant programs.

     MPI_ARRAY
          Sets an alternative array name to be used for communicating with
          Array Services when a job is being launched.

          Default:  The default name set in the arrayd.conf file

     MPI_BAR_COUNTER (IRIX systems only)
          Specifies the use of a simple counter barrier algorithm within the
          MPI_Barrier(3) and MPI_Win_fence(3) functions.

          Default:  Enabled for jobs using fewer than 64 MPI processes.

     MPI_BAR_DISSEM
          Specifies the use of a dissemination/butterfly algorithm within
          the MPI_Barrier(3) and MPI_Win_fence(3) functions. This algorithm
          has generally been found to provide the best performance.  By
          default on IRIX systems this algorithm is used for MPI_COMM_WORLD
          and congruent communicators.  Explicitly specifying this environment
          variable also enables the use of this algorithm for other
          communicators on both IRIX and Linux systems.

          Default:  On IRIX systems, enabled for MPI_COMM_WORLD for jobs using
          more than 64 processes.  On Altix systems, enabled by default for
          all MPI communicators for all process counts.

     MPI_BAR_TREE
          Specifies the use of a tree barrier within the MPI_Barrier(3) and
          MPI_Win_fence(3) functions.  This variable can also be used to
          change the default arity (fan-in) of the tree barrier algorithm.
          Typically this barrier is slower than the butterfly/dissemination
          barrier.

          Default:  Not enabled.  Default arity is 8 when enabled.

     MPI_BUFFER_MAX
          Specifies a minimum message size, in bytes, for which the message
          will be considered a candidate for single-copy transfer.

          On IRIX, this mechanism is available only for communication between
          MPI processes on the same host. The sender data must reside in
          either the symmetric data, symmetric heap, or global heap. The MPI
          data type on the send side must also be a contiguous type.

          On IRIX, if the XPMEM driver is enabled (for single host jobs, see
          MPI_XPMEM_ON and for multihost jobs, see MPI_USE_XPMEM), MPI allows
          single-copy transfers for basic predefined MPI data types from any
          sender data location, including the stack and private heap.  The
          XPMEM driver also allows single-copy transfers across partitions.

          On IRIX, if cross mapping of data segments is enabled at job
          startup, data in common blocks will reside in the symmetric data
          segment.  On systems running IRIX 6.5.2 or higher, this feature is
          enabled by default.  You can employ the symmetric heap by using the
          shmalloc (C) or shpalloc (Fortran) functions available in LIBSMA.

          On Linux, this feature is supported for both single host MPI jobs
          and MPI jobs running across partitions. MPI uses the xpmem module to
          map memory from one MPI process onto another during job startup.
          The mapped areas include the static region, private heap, and stack
          region.  Single-copy is supported for contiguous data types from any
          of the mapped regions.

          Memory mapping is enabled by default on Linux.  To disable it, set
          the MPI_MEMMAP_OFF environment variable.  In addition, the xpmem
          kernel module must be installed on your system for single-copy
          transfers. The xpmem module is released with the OS.

          Testing of this feature has indicated that most MPI applications
          benefit more from buffering of medium-sized messages than from
          buffering of large messages, even though buffering of medium-sized
          messages requires an extra copy of data.  However, highly
          synchronized applications that perform large message transfers can
          benefit from the single-copy pathway.

          Single-copy can occur by default for certain MPI functions that
          transfer large messages.  See MPI_DEFAULT_SINGLE_COPY_OFF for more
          information, including how to disable this behavior.

          Default:  Not enabled

     MPI_BUFS_PER_HOST
          Determines the number of shared message buffers (16 KB each) that
          MPI is to allocate for each host.  These buffers are used to send
          long messages and interhost messages.

          Default:  32 pages (1 page = 16 KB)

     MPI_BUFS_PER_PROC
          Determines the number of private message buffers (16 KB each) that
          MPI is to allocate for each process.  These buffers are used to send
          long messages and intrahost messages.

          Default:  32 pages (1 page = 16 KB)
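
          When sizing MPI_BUFS_PER_HOST or MPI_BUFS_PER_PROC, it can help to
          translate a buffer count into bytes.  The following C sketch does
          only that arithmetic, using the 16 KB buffer size stated above; the
          macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Each message buffer is 16 KB, per the variable descriptions above. */
#define MPI_MSG_BUFFER_BYTES ((size_t)16 * 1024)

/* Documented default for both MPI_BUFS_PER_HOST and MPI_BUFS_PER_PROC. */
#define MPI_DEFAULT_BUFFER_COUNT 32

/* Memory, in bytes, occupied by `count` message buffers. */
size_t buffer_pool_bytes(size_t count)
{
    return count * MPI_MSG_BUFFER_BYTES;
}

/* Expected results, written as assertions. */
void buffer_pool_examples(void)
{
    /* Default pool: 32 buffers x 16 KB = 512 KB. */
    assert(buffer_pool_bytes(MPI_DEFAULT_BUFFER_COUNT) == (size_t)512 * 1024);
    /* Raising the count to 128 gives 2 MB. */
    assert(buffer_pool_bytes(128) == (size_t)2 * 1024 * 1024);
}
```

          The per-process figure applies to every MPI process, so the total
          on a host grows with the number of processes placed there.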

     MPI_CHECK_ARGS
          Enables checking of MPI function arguments. Segmentation faults
          might occur if bad arguments are passed to MPI, so this is useful
          for debugging purposes.  Using argument checking adds several
          microseconds to latency.

          Default:  Not enabled

     MPI_COMM_MAX
          Sets the maximum number of communicators that can be used in an MPI
          program.  Use this variable to increase internal default limits.
          (Might be required by standard-compliant programs.)  MPI generates
          an error message if this limit (or the default, if not set) is
          exceeded.

          Default:  256

     MPI_COREDUMP
          Controls which ranks of an MPI job can dump core on receipt of a
          core-dumping signal.  Valid values are NONE, FIRST, ALL, or INHIBIT.
          NONE means that no rank should dump core.  FIRST means that the
          first rank on each host to receive a core-dumping signal should dump
          core.  ALL means that all ranks should dump core if they receive a
          core-dumping signal.  INHIBIT disables MPI signal-handler
          registration for core-dumping signals.

          When MPI_Init() is called, the MPI library attempts to register a
          signal handler for each signal for which reception causes a core
          dump. If a signal handler was previously registered, MPI removes the
          MPI registration and restores the other signal handler for that
          signal. If no previously-registered handler is present, the MPI
          handler is invoked if and when the rank receives a core-dumping
          signal.

          When the MPI signal handler is invoked, it displays a stack
          traceback for the first rank entering the handler on each host, and
          then consults MPI_COREDUMP to determine if a core dump should be
          produced.

          Note that process limits on core dump size interact with this
          setting.  First a process decides to dump core or is inhibited from
          dumping core based on the MPI_COREDUMP setting. Then "limit
          coredump" applies to the resulting core dump file(s), if any.

          Default: FIRST
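
          The four settings can be summarized in a short C sketch.  It is an
          illustration of the rules above, not the MPT signal handler: the
          names are hypothetical, and treating the values as case-sensitive
          and reporting anything else as unrecognized are assumptions, since
          this page does not say how other values are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum coredump_policy { CD_NONE, CD_FIRST, CD_ALL, CD_INHIBIT, CD_UNKNOWN };

/* Map the MPI_COREDUMP string (NULL if unset) to a policy. */
enum coredump_policy parse_coredump_policy(const char *value)
{
    if (value == NULL)                  return CD_FIRST;   /* documented default */
    if (strcmp(value, "NONE") == 0)     return CD_NONE;
    if (strcmp(value, "FIRST") == 0)    return CD_FIRST;
    if (strcmp(value, "ALL") == 0)      return CD_ALL;
    if (strcmp(value, "INHIBIT") == 0)  return CD_INHIBIT;
    return CD_UNKNOWN;                  /* assumption: not specified here */
}

/*
 * Decision made inside the MPI signal handler.  is_first_on_host is nonzero
 * if this rank is the first on its host to receive a core-dumping signal.
 * With INHIBIT the MPI handler is never registered, so this function would
 * not be reached; the precondition below documents that.
 */
int handler_should_dump_core(enum coredump_policy policy, int is_first_on_host)
{
    assert(policy != CD_INHIBIT && policy != CD_UNKNOWN);
    switch (policy) {
    case CD_ALL:   return 1;
    case CD_FIRST: return is_first_on_host != 0;
    default:       return 0;   /* CD_NONE */
    }
}
```

          Any core file that results is still subject to the process limit on
          core dump size, as noted above.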

     MPI_COREDUMP_DEBUGGER (Linux only)
          This variable lets you optionally specify which debugger should be
          used by MPT to display the stack traceback when your program
          receives a core-dumping signal.  Set MPI_VERBOSE to have MPT display
          the debugger command just before it executes it. If the environment
          variable is not defined, MPT uses the idb debugger.

          You can specify this variable in any of the following formats:

               Format                   Meaning

               Basename of a debugger   If you specify idb or gdb, MPT uses
                                        that debugger, customizing the command
                                        line argument and debugger commands
                                        sent to the debugger, as appropriate.

                                        Note that the program you specify must
                                        be located in one of the directories
                                        specified by the PATH environment
                                        variable in the MPT job. This might be
                                        different from the PATH variable in
                                        your interactive sessions.  If you
                                        receive a message similar to sh: idb:
                                        command not found in the stack
                                        traceback, you can use the pathname to
                                        the debugger (described in the
                                        following format) to supply a full
                                        pathname instead.

               Pathname to a debugger   If you specify a value that contains a
                                        /, but no spaces, MPT takes the value
                                        as the pathname to the debugger you
                                        wish to use. The final four characters
                                        of the value must be /idb or /gdb.
                                        Command-line arguments are not
                                        supplied to the debugger, but debugger
                                        commands are customized according to
                                        the debugger specified.  If you need
                                        to specify command-line arguments to
                                        the debugger, use a complete command
                                        line (described in the following
                                        format).

               Complete command line    If the value contains a space, it is
                                        taken as the complete command line to
                                        be passed to system(1).  Up to four
                                        occurrences of %d in the command line
                                        are replaced by the process ID of the
                                        process upon which the debugger should
                                        be run.  You will need to arrange for
                                        debugger commands to be sent to the
                                        debugger.  The third and fourth
                                        examples below show samples of this.

          Examples:  (There are four examples here, each of which must be
          typed all on one line)

          setenv MPI_COREDUMP_DEBUGGER gdb
          setenv MPI_COREDUMP_DEBUGGER /my/test/version/of/idb
          setenv MPI_COREDUMP_DEBUGGER "(echo print my_favorite_variable; echo where; echo quit) | gdb -p %d"
          setenv MPI_COREDUMP_DEBUGGER '(echo set \$stoponattach = 1; echo attach %d /proc/%d/exe; echo where; echo quit) | /sw/com/intel-compilers/7.1.013/compiler70/ia64/bin/idb | sed -e "s/^/coredump: /"'

          Default: idb
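
          The three formats and the %d substitution can be modeled in C as
          follows.  This is a sketch of the rules in the table above, not
          MPT's parser; all names are hypothetical.  Two points are
          assumptions because the page does not specify them: values that
          match no format are reported as unrecognized, and a fifth or later
          %d is left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum debugger_format {
    DBG_BASENAME,       /* "idb" or "gdb" */
    DBG_PATHNAME,       /* contains '/', no spaces, ends in /idb or /gdb */
    DBG_COMMAND_LINE,   /* contains a space */
    DBG_UNRECOGNIZED    /* assumption: behavior is not documented */
};

enum debugger_format classify_debugger(const char *value)
{
    size_t len;

    assert(value != NULL);
    len = strlen(value);
    if (strchr(value, ' ') != NULL)
        return DBG_COMMAND_LINE;
    if (strchr(value, '/') != NULL) {
        const char *tail = (len >= 4) ? value + len - 4 : "";
        if (strcmp(tail, "/idb") == 0 || strcmp(tail, "/gdb") == 0)
            return DBG_PATHNAME;
        return DBG_UNRECOGNIZED;
    }
    if (strcmp(value, "idb") == 0 || strcmp(value, "gdb") == 0)
        return DBG_BASENAME;
    return DBG_UNRECOGNIZED;
}

/*
 * Copy tmpl to out, replacing up to four occurrences of %d with pid.
 * Returns the number of replacements, or -1 if out is too small.
 */
int expand_pid(const char *tmpl, long pid, char *out, size_t outsz)
{
    size_t used = 0;
    int replaced = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    while (*tmpl != '\0') {
        int n;
        if (replaced < 4 && tmpl[0] == '%' && tmpl[1] == 'd') {
            n = snprintf(out + used, outsz - used, "%ld", pid);
            tmpl += 2;
            replaced++;
        } else {
            n = snprintf(out + used, outsz - used, "%c", *tmpl);
            tmpl++;
        }
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;              /* output would be truncated */
        used += (size_t)n;
    }
    return replaced;
}
```

          Applied to the first two examples above, classify_debugger reports
          a basename and a pathname respectively; the third and fourth
          examples contain spaces and are treated as complete command lines.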

     MPI_COREDUMP_VERBOSE
          Instructs mpirun(1) to print information about coredump control and
          traceback handling.   Notably, a message will be printed if a user-
          or library-registered signal handler overrides a signal handler
          which the MPT library would otherwise have installed.  Output is
          sent to stderr.

          Default: Not enabled

     MPI_DEFAULT_SINGLE_COPY_OFF
          Disables the default single-copy optimization.  Unless this
          variable is set, the optimization causes transfers of more than
          2000 bytes that use MPI_Isend, MPI_Sendrecv, MPI_Alltoall,
          MPI_Bcast, MPI_Allreduce, and MPI_Reduce to use single-copy mode.
          Users of MPI_Send should continue to use the MPI_BUFFER_MAX
          environment variable to enable single-copy.

          Default:  Not enabled
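
          The rule for this variable can be written as a small predicate.
          The following C sketch is only a restatement of the description
          above (function list and 2000-byte threshold); the names are
          hypothetical, and the other conditions for single-copy, such as
          where the send data resides, are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Transfers of more than this many bytes qualify, per the text above. */
#define DEFAULT_SINGLE_COPY_THRESHOLD 2000u

/* MPI functions named in the description of the default optimization. */
static const char *const single_copy_functions[] = {
    "MPI_Isend", "MPI_Sendrecv", "MPI_Alltoall",
    "MPI_Bcast", "MPI_Allreduce", "MPI_Reduce"
};

/*
 * Returns nonzero if the default single-copy optimization would apply.
 * optimization_off is nonzero when MPI_DEFAULT_SINGLE_COPY_OFF is set.
 */
int default_single_copy_applies(const char *function, size_t bytes,
                                int optimization_off)
{
    size_t i;
    size_t n = sizeof single_copy_functions / sizeof single_copy_functions[0];

    assert(function != NULL);
    if (optimization_off || bytes <= DEFAULT_SINGLE_COPY_THRESHOLD)
        return 0;
    for (i = 0; i < n; i++)
        if (strcmp(function, single_copy_functions[i]) == 0)
            return 1;
    return 0;
}
```

          Under this model a 2001-byte MPI_Bcast qualifies, a 2000-byte one
          does not, and MPI_Send never qualifies regardless of size.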

     MPI_DIR
          Sets the working directory on a host. When an mpirun(1) command is
          issued, the Array Services daemon on the local or distributed node
          responds by creating a user session and starting the required MPI
          processes. The user ID for the session is that of the user who
          invokes mpirun, so this user must be listed in the .rhosts file on
          the corresponding nodes. By default, the working directory for the
          session is the user's $HOME directory on each node. You can direct
          all nodes to a different directory (an NFS directory that is
          available to all nodes, for example) by setting the MPI_DIR variable
          to a different directory.

          Default:  $HOME on the node. If using the -np option of mpirun(1),
          the default is the current directory.

     MPI_DPLACE_INTEROP_OFF (IRIX systems only)
          Disables an MPI/dplace interoperability feature available beginning
          with IRIX 6.5.13.  By setting this variable, you can obtain the
          behavior of MPI with dplace on older releases of IRIX.

          Default:  Not enabled

     MPI_DSM_CPULIST
          Specifies a list of CPUs on which to run an MPI application.  To
          ensure that processes are linked to CPUs, this variable should be
          used in conjunction with the MPI_DSM_MUSTRUN variable.

          For an explanation of the syntax for this environment variable, see
          the section titled "Using a CPU List."

     MPI_DSM_CPULIST_TYPE
          Specifies the way in which MPI should interpret the CPU values given
          by the MPI_DSM_CPULIST variable.  This variable can be set to the
          following values:

               Value          Action

               hwgraph        This tells MPI to interpret the CPU numbers
                              designated by the MPI_DSM_CPULIST variable as
                              cpunum values as defined in the hardware
                              graph (see hwgraph(4)).  This is the default
                              interpretation when running MPI outside of a
                              cpuset (see cpuset(4)).

               cpuset         This tells MPI to interpret the CPU numbers
                              designated by the MPI_DSM_CPULIST variable as
                              relative processors within a cpuset.  This is
                              the default interpretation of this list when MPI
                              is running within a cpuset.  Setting
                              MPI_DSM_CPULIST_TYPE to this value when not
                              running within a cpuset has no effect.

     MPI_DSM_DISTRIBUTE (Linux systems only)
          Ensures that each MPI process gets a unique CPU and physical memory
          on the node with which that CPU is associated.  Currently, the CPUs
          are chosen by simply starting at relative CPU 0 and incrementing
          until all MPI processes have been forked.  To choose specific CPUs,
          use the MPI_DSM_CPULIST environment variable.  This feature is most
          useful if running on a dedicated system or running within a cpuset.
          Some batch schedulers, including LSF 5.1, cause
          MPI_DSM_DISTRIBUTE to be set automatically when using dynamic
          cpusets.

          Default:  Not enabled

     MPI_DSM_MUSTRUN
          Enforces memory locality for MPI processes.  Use of this feature
          ensures that each MPI process will get a CPU and physical memory on
          the node to which it was originally assigned.  This variable has
          been observed to improve program performance on IRIX systems running
          release 6.5.7 and earlier, when running a program on a quiet system.
          With later IRIX releases, under certain circumstances, setting this
          variable is not necessary. Internally, this feature directs the
          library to use the process_cpulink(3) function instead of
          process_mldlink(3) to control memory placement.

          MPI_DSM_MUSTRUN should not be used when the job is submitted to
          miser (see miser_submit(1)) because program hangs may result.

          The process_cpulink(3) function is inherited across process fork(2)
          or sproc(2).  For this reason, when using mixed MPI/OpenMP
          applications, it is recommended either that this variable not be
          set, or that _DSM_MUSTRUN also be set (see pe_environ(5)).

          On Linux systems, this environment variable has been deprecated and
          will be removed in a future release. Use the MPI_DSM_DISTRIBUTE
          environment variable instead.

          Default:  Not enabled

     MPI_DSM_OFF
          Turns off nonuniform memory access (NUMA) optimization in the MPI
          library.

          Default:  Not enabled

     MPI_DSM_PLACEMENT (IRIX systems only)
          Specifies the default placement policy to be used for the stack and
          data segments of an MPI process.  Set this variable to one of the
          following values:

               Value          Action

               firsttouch     With this policy, IRIX attempts to satisfy
                              requests for new memory pages for stack, data,
                              and heap memory on the node where the requesting
                              process is currently scheduled.

               fixed          With this policy, IRIX attempts to satisfy
                              requests for new memory pages for stack, data,
                              and heap memory on the node associated with the
                              memory locality domain (mld) with which an MPI
                              process was linked at job startup. This is the
                              default policy for MPI processes.

               roundrobin     With this policy, IRIX attempts to satisfy
                              requests for new memory pages in a round-robin
                              fashion across all of the nodes associated with
                              the MPI job. It is generally not recommended to
                              use this setting.

               threadroundrobin
                              This policy is intended for use with hybrid
                              MPI/OpenMP applications only. With this policy,
                              IRIX attempts to satisfy requests for new memory
                              pages for the MPI process stack, data, and heap
                              memory in a round-robin fashion across the nodes
                              allocated to its OpenMP threads. This placement
                              option might be helpful for large OpenMP/MPI
                              process ratios.  For non-OpenMP applications,
                              this value is ignored.

          Default:  fixed

     MPI_DSM_PPM
          Sets the number of MPI processes per memory locality domain (mld).
          For Origin 2000 systems, values of 1 or 2 are allowed. For Origin
          3000 and Origin 300 systems, values of 1, 2, or 4 are allowed. On
          Altix systems, values of 1 or 2 are allowed.

          Default:  Origin 2000 systems, 2; Origin 3000 and Origin 300
          systems, 4; Altix systems, 2.
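
          The allowed values and defaults listed above can be summarized as a
          small lookup table.  The following Python sketch is illustrative
          only: the system keys, the resolve_ppm name, and the decision to
          raise an error for a disallowed value are assumptions, not MPT
          behavior.

```python
# Allowed and default MPI_DSM_PPM values, transcribed from the text above.
# The dictionary keys and the function name are hypothetical.
PPM_RULES = {
    "origin2000": {"allowed": (1, 2), "default": 2},
    "origin3000": {"allowed": (1, 2, 4), "default": 4},
    "origin300": {"allowed": (1, 2, 4), "default": 4},
    "altix": {"allowed": (1, 2), "default": 2},
}


def resolve_ppm(system, value=None):
    """Return the processes-per-mld value to use, or raise ValueError."""
    rule = PPM_RULES[system]
    if value is None:          # variable not set: use the documented default
        return rule["default"]
    ppm = int(value)           # environment values arrive as strings
    if ppm not in rule["allowed"]:
        # Assumption: this page does not say how MPT reacts to a bad value.
        raise ValueError(f"MPI_DSM_PPM={ppm} is not allowed on {system}")
    return ppm
```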

     MPI_DSM_TOPOLOGY (IRIX systems only)
          Specifies the shape of the set of hardware nodes on which the PE
          memories are allocated.  Set this variable to one of the following
          values:

               Value          Action

               cube           A group of memory nodes that form a perfect
                              hypercube.  The number of processes per host
                              must be a power of 2.  If a perfect hypercube is
                              unavailable, a less restrictive placement will
                              be used.

               cube_fixed     A group of memory nodes that form a perfect
                              hypercube.  The number of processes per host
                              must be a power of 2.  If a perfect hypercube is
                              unavailable, the placement will fail, disabling
                              NUMA placement.

               cpucluster     Any group of memory nodes.  The operating system
                              attempts to place the group numbers close to one
                              another, taking into account nodes with disabled
                              processors.  (Default for IRIX 6.5.11 and
                              higher).

               free           Any group of memory nodes.  The operating system
                              attempts to place the group numbers close to one
                              another.  (Default for IRIX 6.5.10 and earlier
                              releases).

     MPI_DSM_VERBOSE
          Instructs mpirun(1) to print information about process placement for
          jobs running on nonuniform memory access (NUMA) machines (unless
          MPI_DSM_OFF is also set). Output is sent to stderr.

          Default:  Not enabled

     MPI_DSM_VERIFY (IRIX systems only)
          Instructs mpirun(1) to run some diagnostic checks on proper memory
          placement of MPI data structures at job startup. If errors are
          found, a diagnostic message is printed to stderr.

          Default:  Not enabled

     MPI_GM_DEVS (IRIX systems only)
          Sets the order for opening GM(Myrinet) adapters. The list of devices
          does not need to be space-delimited (0321 is valid).  In this
          release, a maximum of 8 adapters is supported on a single host.

          Default:  MPI will use all available GM(Myrinet) devices.

     MPI_GM_VERBOSE
          Setting this variable allows some diagnostic information concerning
          messaging between processes using GM (Myrinet) to be displayed on
          stderr.

          Default:  Not enabled

     MPI_GROUP_MAX
          Determines the maximum number of groups that can simultaneously
          exist for any single MPI process.  Use this variable to increase
          internal default limits. (This variable might be required by
          standard-compliant programs.)  MPI generates an error message if
          this limit (or the default, if not set) is exceeded.

          Default:  32

     MPI_GSN_DEVS (IRIX 6.5.12 systems or later)
          Sets the order for opening GSN adapters. The list of devices does
          not need to be quoted or space-delimited (0123 is valid).

          Default:  MPI will use all available GSN devices

     MPI_GSN_VERBOSE (IRIX 6.5.12 systems or later)
          Allows additional MPI initialization information to be printed in
          the standard output stream. This information contains details about
          the GSN (ST protocol) OS bypass connections and the GSN adapters
          that are detected on each of the hosts.

          Default:  Not enabled

     MPI_MAPPED_HEAP_SIZE (Linux systems only)
          Sets the new size (in bytes) for the amount of heap that is memory
          mapped per MPI process.  The default size of the mapped heap is the
          physical memory available per CPU less the static region size. For
          more information regarding memory mapping, see MPI_MEMMAP_OFF.

          Default:  The physical memory available per CPU less the static
          region size

     MPI_MAPPED_STACK_SIZE (Linux systems only)
          Sets the new size (in bytes) for the amount of stack that is memory
          mapped per MPI process.  The default size of the mapped stack is the
          stack limit size. If the stack is unlimited, the mapped region is
          set to the physical memory available per CPU. For more information
          regarding memory mapping, see MPI_MEMMAP_OFF.

          Default:  The stack limit size

     MPI_MEMMAP_OFF (Linux systems only)
          Turns off the memory mapping feature.

          The memory mapping feature provides support for single-copy
          transfers and MPI-2 one-sided communication on Linux.  These
          features are supported for single host MPI jobs and MPI jobs that
          span partitions. At job startup, MPI uses the xpmem module to map
          memory from one MPI process onto another.  The mapped areas include
          the static region, private heap, and stack.

          Memory mapping is enabled by default on Linux.  To disable it, set
          the MPI_MEMMAP_OFF environment variable.

          For memory mapping, the xpmem kernel module must be installed on
          your system.  The xpmem module is released with the OS.

          Default: Not enabled

     MPI_MEMMAP_VERBOSE (Linux systems only)
          Allows MPI to display additional information regarding the memory
          mapping initialization sequence.  Output is sent to stderr.

          Default: Not enabled

     MPI_MSG_RETRIES
          Specifies the number of times the MPI library will try to get a
          message header, if none are available.  Each MPI message that is
          sent requires an initial message header.  If one is not available
          after MPI_MSG_RETRIES, the job will abort.

          Note that this variable no longer applies to processes on the same
          host, or when using the GM (Myrinet) protocol. In these cases,
          message headers are allocated dynamically on an as-needed basis.

          Default:  500

     MPI_MSGS_MAX
          This variable can be set to control the total number of message
          headers that can be allocated.  This allocation applies to messages
          exchanged between processes on a single host, or between processes
          on different hosts when using the GM(Myrinet) OS bypass protocol.
          Note that the initial allocation of memory for message headers is
          128 Kbytes.

          Default:  Allow up to 64 Mbytes to be allocated for message headers.
          If you set this variable, specify the maximum number of message
          headers.

     MPI_MSGS_PER_HOST
          Sets the number of message headers to allocate for MPI messages on
          each MPI host. Space for messages that are destined for a process on
          a different host is allocated as shared memory on the host on which
          the sending processes are located. MPI locks these pages in memory.
          Use the MPI_MSGS_PER_HOST variable to allocate buffer space for
          interhost messages.

          Caution:  If you set the memory pool for interhost packets to a
          large value, you can cause allocation of so much locked memory that
          total system performance is degraded.

          The previous description does not apply to processes that use the
          GM(Myrinet) OS bypass protocol. In this case, message headers are
          allocated dynamically as needed. See the MPI_MSGS_MAX variable
          description.

          Default:  1024 messages

     MPI_MSGS_PER_PROC
          This variable is effectively obsolete. Message headers are now
          allocated on an as needed basis for messaging either between
          processes on the same host, or between processes on different hosts
          when using the GM (Myrinet) OS bypass protocol.  The new
          MPI_MSGS_MAX variable can be used to control the total number of
          message headers that can be allocated.

          Default:  1024

     MPI_NAP
          This variable affects the way in which ranks wait for events to
          occur. For example, when a receive is issued for which there are as
          yet no matching sends, the receiving rank waits for the event that
          signals that a matching send has been issued.

          When MPI_NAP is not defined (that is, unsetenv MPI_NAP), the library
          spins in a tight loop when awaiting events. While this provides the
          best possible response time when the event occurs, each waiting rank
          uses CPU time at wall-clock rates until then.  Leaving MPI_NAP
          undefined is best if sends and matching receives occur nearly
          simultaneously.

          If defined with no value (that is, setenv MPI_NAP), the library
          makes a system call while waiting, which might yield the CPU to
          another eligible process that can use it.  If no such process
          exists, the rank receives control back nearly immediately, and CPU
          time accrues at near wall-clock rates.  If another process does
          exist, it is given some CPU time, after which the MPI rank is again
          given the CPU to test for the event.  This is best if the system is
          oversubscribed (there are more processes ready to run than there are
          CPUs).  This option was previously available in MPT, but was not
          documented.

          If defined with a positive integer value (for example, setenv
          MPI_NAP 10), the rank sleeps for that many milliseconds before again
          testing to determine if an event has occurred.  This dramatically
          reduces the CPU time that is charged against the rank, and might
          increase the system's "idle" time.  This setting is best if there is
          usually a significant time difference between the times that sends
          and matching receives are posted.

          Default:  Not applicable - one of the cases above always applies.
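
          The three cases above can be modeled as follows.  This Python
          sketch is a toy model, not the library's implementation: the
          nap_mode and wait_for names are hypothetical, time.sleep(0) stands
          in for the unspecified yielding system call, and rejecting
          non-positive values is an assumption.

```python
import os
import time


def nap_mode(environ=None):
    """Classify the MPI_NAP setting into the three cases described above."""
    env = os.environ if environ is None else environ
    if "MPI_NAP" not in env:
        return ("spin", None)        # undefined: tight loop, lowest latency
    value = env["MPI_NAP"].strip()
    if value == "":
        return ("yield", None)       # defined, no value: may yield the CPU
    millis = int(value)
    if millis <= 0:
        # Assumption: the page only describes positive integer values.
        raise ValueError("MPI_NAP must be empty or a positive integer")
    return ("sleep", millis)         # sleep this many ms between tests


def wait_for(event_ready, mode, millis=None):
    """Poll event_ready() using the given strategy; return failed polls."""
    polls = 0
    while not event_ready():
        polls += 1
        if mode == "yield":
            time.sleep(0)            # stand-in for a CPU-yielding call
        elif mode == "sleep":
            time.sleep(millis / 1000.0)
        # mode == "spin": test again immediately
    return polls
```

          The trade-off is visible in wait_for: spinning retests at once and
          burns CPU time, while sleeping trades response time for lower CPU
          usage.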

     MPI_OPENMP_INTEROP
          Setting this variable modifies the placement of MPI processes to
          better accommodate the OpenMP threads associated with each process.
          For more information, see the section titled Using MPI with OpenMP.

          NOTE: This option is available only on Origin 300 and Origin 3000
          servers and Altix systems.

          Default:  Not enabled

     MPI_REQUEST_MAX
          Determines the maximum number of nonblocking sends and receives that
          can simultaneously exist for any single MPI process.  Use this
          variable to increase internal default limits.  (This variable might
          be required by standard-compliant programs.)  MPI generates an error
          message if this limit (or the default, if not set) is exceeded.

          Default:  16384

     MPI_SHARED_VERBOSE
          Setting this variable allows for some diagnostic information
          concerning messaging within a host to be displayed on stderr.

          Default:  Not enabled

     MPI_SIGTRAP  (Linux systems only)
          Specifies if MPT's signal handler should override any existing
          signal handlers for signals SIGSEGV, SIGQUIT, SIGILL, SIGABRT,
          SIGBUS, and SIGFPE.  If set to ON, the MPT signal handler will
          override any pre-existing signal handler for these signals.   If
          OFF, then the existing signal handlers will remain in effect.

          These signals are sometimes handled by compiler-language-specific
          runtime libraries.  In some cases, the signal handler in the runtime
          library makes inappropriate references to memory-mapped fetchop
          areas, which may result in a system panic.  This has been observed
          with Intel's efc 7.x compilers.

          Default:  ON  (This may change in future releases.)

     MPI_SIGTRAP_VERBOSE (Linux systems only)
          If set, MPT will display the value of the MPI_SIGTRAP environment
          variable, and messages about the actions taken if MPT overrides a
          pre-existing signal handler.  See also MPI_COREDUMP_VERBOSE.

          Default:  Not enabled

     MPI_SLAVE_DEBUG_ATTACH
          Specifies the MPI process to be debugged. If you set
          MPI_SLAVE_DEBUG_ATTACH to N, the MPI process with rank N prints a
          message during program startup, describing how to attach to it from
          another window using the dbx debugger on IRIX or the gdb or idb
          debugger on Linux.  The message includes the number of seconds you
          have to attach the debugger to process N.  If you fail to attach
          before the time expires, the process continues.

     MPI_STATIC_NO_MAP (IRIX systems only)
          Disables cross mapping of static memory between MPI processes.  This
          variable can be set to reduce the significant MPI job startup and
          shutdown time that can be observed for jobs involving more than 512
          processors on a single IRIX host.  Note that setting this shell
          variable disables certain internal MPI optimizations and also
          restricts the usage of MPI-2 one-sided functions.  For more
          information, see the MPI_Win man page.

          Default:  Not enabled

     MPI_STATS
          Enables printing of MPI internal statistics.  Each MPI process
          prints statistics about the amount of data sent with MPI calls
          during the MPI_Finalize process.  Data is sent to stderr.  To prefix
          the statistics messages with the MPI rank, use the -p option on the
          mpirun command. For additional information, see the MPI_SGI_stats
          man page.

          NOTE: Because the statistics-collection code is not thread-safe,
          this variable should not be set if the program uses threads.

          Default:  Not enabled

     MPI_TYPE_DEPTH
          Sets the maximum number of nesting levels for derived data types.
          (Might be required by standard-compliant programs.) The
          MPI_TYPE_DEPTH variable limits the maximum depth of derived data
          types that an application can create.  MPI generates an error
          message if this limit (or the default, if not set) is exceeded.

          Default:  8 levels

     MPI_TYPE_MAX
          Determines the maximum number of data types that can simultaneously
          exist for any single MPI process. Use this variable to increase
          internal default limits.  (This variable might be required by
          standard-compliant programs.)  MPI generates an error message if
          this limit (or the default, if not set) is exceeded.

          Default:  1024

     MPI_UNBUFFERED_STDIO
          Normally, mpirun line-buffers output received from the MPI processes
          on both the stdout and stderr standard IO streams.  This prevents
          lines of text from different processes from possibly being merged
          into one line, and allows use of the mpirun -prefix option.

          Of course, there is a limit to the amount of buffer space that
          mpirun has available (currently, about 8,100 characters can appear
          between new line characters per stream per process).  If more
          characters are emitted before a new line character, the MPI program
          will abort with an error message.

          Setting the MPI_UNBUFFERED_STDIO environment variable disables this
          buffering.  This is useful, for example, when a program's rank 0
          emits a series of periods over time to indicate progress of the
          program. With buffering, the entire line of periods will be output
          only when the new line character is seen.  Without buffering, each
          period will be immediately displayed as soon as mpirun receives it
          from the MPI program.   (Note that the MPI program still needs to
          call fflush(3) or FLUSH(101) to flush the stdout buffer from the
          application code.)

          Additionally, setting MPI_UNBUFFERED_STDIO allows an MPI program
          that emits very long output lines to execute correctly.

          NOTE: If MPI_UNBUFFERED_STDIO is set, the mpirun -prefix option is
          ignored.

          Default:  Not set
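
          The line-buffering behavior described above can be pictured with a
          toy model.  The following Python sketch is not mpirun's
          implementation: the LineBuffer class is hypothetical, 8100 is the
          approximate limit quoted above, and raising OverflowError stands in
          for the abort with an error message.

```python
LINE_LIMIT = 8100  # approximate per-stream, per-process limit quoted above


class LineBuffer:
    """Toy model of line-buffered forwarding for one stream of one process."""

    def __init__(self, limit=LINE_LIMIT):
        self.limit = limit
        self.pending = ""            # characters seen since the last new line

    def feed(self, chunk):
        """Accept a chunk of output; return the complete lines now ready."""
        self.pending += chunk
        # Everything before the final "\n" is complete; the rest stays pending.
        *lines, self.pending = self.pending.split("\n")
        if any(len(part) > self.limit for part in lines + [self.pending]):
            raise OverflowError("too many characters before a new line")
        return lines
```

          In this model, a series of progress periods stays in pending and
          produces no output until a new line character arrives, which is the
          effect that setting MPI_UNBUFFERED_STDIO avoids.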

     MPI_UNIVERSE  (Linux systems only)
          When running MPI applications on partitioned Altix systems which use
          the MPI_Comm_spawn and MPI_Comm_spawn_multiple functions, it may be
          necessary to explicitly specify the partitions on which additional
          MPI processes may be launched.  The MPI_UNIVERSE environment
          variable may be used for this purpose.

          For more information, see the section titled "Launching Spawn
          Capable Jobs on Altix Partitioned Systems" from the mpirun man page.

          Default:  Not set

     MPI_UNIVERSE_SIZE  (Linux systems only)
          When running MPI applications on partitioned Altix systems which use
          the MPI_Comm_spawn and MPI_Comm_spawn_multiple functions, users can
          now specify MPI_UNIVERSE_SIZE instead of using the -up option on the
          mpirun command.

          For more information, see the section titled "Launching Spawn
          Capable Jobs on Altix Partitioned Systems" from the mpirun man page.

          Default:  Not set

     MPI_USE_GM  (IRIX systems only)
          Requires the MPI library to use the Myrinet (GM protocol) OS bypass
          driver as the interconnect when running across multiple hosts or
          running with multiple binaries.  If a GM connection cannot be
          established among all hosts in the MPI job, the job is terminated.

          For more information, see the section titled "Default Interconnect
          Selection."

          Default:  Not set

     MPI_USE_GSN (IRIX 6.5.12 systems or later)
          Requires the MPI library to use the GSN (ST protocol) OS bypass
          driver as the interconnect when running across multiple hosts or
          running with multiple binaries.  If a GSN connection cannot be
          established among all hosts in the MPI job, the job is terminated.

          GSN imposes a limit of one MPI process using GSN per CPU on a
          system. For example, on a 128-CPU system, you can run multiple MPI
          jobs, as long as the total number of MPI processes using the GSN
          bypass does not exceed 128.

          Once the maximum allowed number of MPI processes using GSN is
          reached, subsequent MPI jobs return an error to the user output, as
          in the following example:

                 MPI: Could not connect all processes to GSN adapters. The maximum
                      number of GSN adapter connections per system is normally equal
                      to the number of CPUs on the system.

          If there are a few CPUs still available, but not enough to satisfy
          the entire MPI job, the error will still be issued and the MPI job
          terminated.
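
          The all-or-nothing rule above amounts to a simple capacity check.
          The following Python sketch is illustrative only; the gsn_job_fits
          name and its parameters are hypothetical.

```python
def gsn_job_fits(total_cpus, gsn_processes_in_use, job_size):
    """Return True if a whole job of job_size processes can still use GSN.

    GSN allows one MPI process per CPU on a system, and a job that only
    partly fits is rejected rather than partially connected.
    """
    free_slots = total_cpus - gsn_processes_in_use
    return job_size <= free_slots
```

          For example, on a 128-CPU system with 100 MPI processes already
          using GSN, a 28-process job fits but a 29-process job does not.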

          For more information, see the section titled "Default Interconnect
          Selection."

          Default:  Not set

     MPI_USE_TCP
          Requires the MPI library to use the TCP/IP driver as the
          interconnect when running across multiple hosts or running with
          multiple binaries.

          For more information, see the section titled "Default Interconnect
          Selection."

          Default:  Not set

     MPI_USE_XPMEM  (IRIX 6.5.13 systems or later and Linux systems)
          Requires the MPI library to use the XPMEM driver as the interconnect
          when running across multiple hosts or running with multiple
          binaries.  This driver allows MPI processes running on one partition
          to communicate with MPI processes on a different partition via the
          NUMAlink network.  The NUMAlink network is powered by block transfer
          engines (BTEs).  BTE data transfers do not require processor
          resources.

          For IRIX, the XPMEM (cross partition) device driver is available
          only on Origin 3000 and Origin 300 systems running IRIX 6.5.13 or
          greater.

          NOTE: Due to possible MPI program hangs, you should not run MPI
          across partitions using the XPMEM driver on IRIX versions 6.5.13,
          6.5.14, or 6.5.15.  This problem has been resolved in IRIX version
          6.5.16.

          For Linux, the XPMEM device driver requires the xpmem kernel module
          to be installed.  The xpmem module is released with the OS.

          If not all of the hosts specified on the mpirun command reside in
          the same partitioned system, you can select one additional
          interconnect via the MPI_USE variables.  MPI communication between
          partitions will go through the XPMEM driver, and communication
          between non-partitioned hosts will go through the second
          interconnect.

          For more information, see the section titled "Default Interconnect
          Selection."

          Default:  Not set

     MPI_XPMEM_ON (IRIX 6.5.15 systems or later)
          Enables the XPMEM single-copy enhancements for processes residing on
          the same host.

          The XPMEM enhancements allow single-copy transfers for basic
          predefined MPI data types from any sender data location, including
          the stack and private heap.  Without enabling XPMEM, single-copy is
          allowed only from data residing in the symmetric data, symmetric
          heap, or global heap.

          Both the MPI_XPMEM_ON and MPI_BUFFER_MAX variables must be set to
          enable these enhancements.  Both are disabled by default.

          If the following additional conditions are met, the block transfer
          engine (BTE) is invoked instead of bcopy, to provide increased
          bandwidth:

               *   Send and receive buffers are cache-aligned.

               *   Amount of data to transfer is greater than or equal to the
                   MPI_XPMEM_THRESHOLD value.

          NOTE:  The XPMEM driver does not support checkpoint/restart at this
          time. If you enable these XPMEM enhancements, you will not be able
          to checkpoint and restart your MPI job.

          The XPMEM single-copy enhancements require Origin 3000 or Origin
          300 servers running IRIX release 6.5.15 or greater.

          Default: Not set

     MPI_XPMEM_THRESHOLD (IRIX 6.5.15 systems or later)
          Specifies a minimum message size, in bytes, for which single-copy
          messages between processes residing on the same host will be
          transferred via the BTE, instead of bcopy.  The following conditions
          must exist before the BTE transfer is invoked:

               *   Single-copy mode is enabled (MPI_BUFFER_MAX).

               *   XPMEM single-copy enhancements are enabled (MPI_XPMEM_ON).

               *   Send and receive buffers are cache-aligned.

               *   Amount of data to transfer is greater than or equal to the
                   MPI_XPMEM_THRESHOLD value.

          Default: 8192
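
          The four conditions above combine as a logical AND.  The following
          Python sketch is illustrative only: the uses_bte name is
          hypothetical, and the 128-byte cache line is an assumed alignment
          unit that this man page does not state.

```python
CACHE_LINE_BYTES = 128  # assumed alignment unit; not stated in this page


def uses_bte(buffer_max_set, xpmem_on, send_addr, recv_addr, nbytes,
             threshold=8192, cache_line=CACHE_LINE_BYTES):
    """Return True when all four documented conditions hold.

    buffer_max_set: single-copy mode enabled (MPI_BUFFER_MAX is set)
    xpmem_on:       XPMEM enhancements enabled (MPI_XPMEM_ON is set)
    threshold:      MPI_XPMEM_THRESHOLD in bytes (documented default 8192)
    """
    aligned = send_addr % cache_line == 0 and recv_addr % cache_line == 0
    return bool(buffer_max_set and xpmem_on and aligned
                and nbytes >= threshold)
```

          If any condition fails, the transfer falls back to bcopy.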

     MPI_XPMEM_VERBOSE
          Setting this variable allows additional MPI diagnostic information
          to be printed in the standard output stream. This information
          contains details about the XPMEM connections.

          Default:  Not enabled

     PAGESIZE_DATA (IRIX systems only)
          Specifies the desired page size in kilobytes for program data areas.
          On Origin series systems, supported values include 16, 64, 256,
          1024, and 4096.  Specified values must be integers.

          NOTE:  Setting MPI_DSM_OFF disables the ability to set the data
          page size via this shell variable.

          Default:  Not enabled

     PAGESIZE_STACK (IRIX systems only)
          Specifies the desired page size in kilobytes for program stack
          areas.  On Origin series systems, supported values include 16, 64,
          256, 1024, and 4096.  Specified values must be integers.

          NOTE:  Setting MPI_DSM_OFF disables the ability to set the stack
          page size via this shell variable.

          Default:  Not enabled

     SMA_GLOBAL_ALLOC (IRIX systems only)
          Activates the LIBSMA based global heap facility.  This variable is
          used by 64-bit MPI applications for certain internal optimizations,
          as well as support for the MPI_Alloc_mem function. For additional
          details, see the intro_shmem(3) man page.

          Default: Not enabled

     SMA_GLOBAL_HEAP_SIZE (IRIX systems only)
          For 64-bit applications, specifies the per process size of the
          LIBSMA global heap in bytes.

          Default: 33554432 bytes

   Using a CPU List
     You can manually select CPUs to use for an MPI application by setting the
     MPI_DSM_CPULIST shell variable.  This setting is treated as a comma
     and/or hyphen delineated ordered list, specifying a mapping of MPI
     processes to CPUs.  If running across multiple hosts or when using
     multiple executables, the per host and per executable components of the
     CPU list are delineated by colons. The shepherd process(es) and mpirun
     are not included in this list.  This feature is not compatible with job
     migration features available in IRIX.

     Examples when launching an MPI job with the following syntax:

                    mpirun -np 3 a.out

               Value          CPU Assignment

               8,16,32        Place three MPI processes on CPUs 8, 16, and 32.

               32,16,8        Place the MPI process rank zero on CPU 32, one
                              on 16, and two on CPU 8.

     Examples when launching an MPI job with the following syntax:

                    mpirun -np 16 a.out

               Value          CPU Assignment

               8-15,32-39     Place the MPI processes 0 through 7 on CPUs 8 to
                              15.  Place the MPI processes 8 through 15 on
                              CPUs 32 to 39.

               39-32,8-15     Place the MPI processes 0 through 7 on CPUs 39
                              to 32.  Place the MPI processes 8 through 15 on
                              CPUs 8 to 15.

     Example when launching an MPI job with the following syntax:

                    mpirun host1,host2 8 a.out

               Value          CPU Assignment

               8-15:16-23     Place the MPI processes 0 through 7 on the first
                              host on CPUs 8 through 15.  Place MPI processes
                              8 through 15 on CPUs 16 to 23 on the second
                              host.

     Example when launching an MPI job with the following syntax:

                    mpirun host1,host2 8 a.out : host2 8 b.out

               Value          CPU Assignment

               8-15:16-23:28-35
                              Place the MPI processes 0 through 7 running
                              application a.out on the first host on CPUs 8
                              through 15.  Place MPI processes 8 through 15
                              running a.out on CPUs 16 to 23 on the second
                              host.  Place MPI processes 16 to 23 running
                              b.out on CPUs 28 to 35 on the second host.

     Note that the process rank is the MPI_COMM_WORLD rank.  The
     interpretation of the CPU values specified in the MPI_DSM_CPULIST depends
     on whether the MPI job is being run within a cpuset.  If the job is run
     outside of a cpuset, the CPUs specify cpunum values given in the hardware
     graph (hwgraph(4)).  When running within a cpuset, the default behavior
     is to interpret the CPU values as relative processor numbers within the
     cpuset.  To specify cpunum values instead, you can use the
     MPI_DSM_CPULIST_TYPE shell variable.

     On Linux systems, the CPU values are always treated as relative processor
     numbers within the cpuset.  It is assumed that the system will always
     have a default (unnamed) cpuset consisting of the entire system of
     available processors and nodes.

     The number of processors specified should equal the number of MPI
     processes (excluding the shepherd process) that will be used.  The number
     of colon delineated parts of the list must equal the number of hosts or
     executables used for the MPI job. If an error occurs in processing the
     CPU list, the default placement policy is used.  If the number of
     specified processors is smaller than the total number of MPI processes,
     only a subset of the MPI processes will be placed on the specified
     processors.  For example, if four processors are specified using the
     MPI_DSM_CPULIST variable, but five MPI processes are started, the last
     MPI process will not be attached to a processor.
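
     The list syntax and the rank-to-CPU mapping described in this section
     can be sketched in Python.  This is an illustrative model, not the
     parser used by MPT: the expand_cpulist and assign_ranks names are
     hypothetical, and the sketch raises an error on a part-count mismatch,
     whereas MPI falls back to the default placement policy.

```python
def expand_cpulist(spec):
    """Expand 'a-b,c' style text into one ordered CPU list per colon part."""
    parts = []
    for part in spec.split(":"):
        cpus = []
        for item in part.split(","):
            if "-" in item:
                first, last = (int(x) for x in item.split("-"))
                step = 1 if last >= first else -1   # 39-32 counts downward
                cpus.extend(range(first, last + step, step))
            else:
                cpus.append(int(item))
        parts.append(cpus)
    return parts


def assign_ranks(spec, ranks_per_part):
    """Map MPI_COMM_WORLD ranks to CPUs; None means 'not attached'."""
    parts = expand_cpulist(spec)
    if len(parts) != len(ranks_per_part):
        # Assumption: MPI would use the default placement policy here.
        raise ValueError("colon-delineated parts must match hosts/executables")
    mapping, rank = {}, 0
    for cpus, count in zip(parts, ranks_per_part):
        for i in range(count):
            # Ranks beyond the end of the list are left unattached.
            mapping[rank] = cpus[i] if i < len(cpus) else None
            rank += 1
    return mapping
```

     For instance, assign_ranks("8-15:16-23:28-35", [8, 8, 8]) models the
     three-part example above, placing rank 0 on CPU 8 and rank 16 on CPU 28.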

     This feature should not be used with MPI jobs running in spawn capable
     mode.

   Using MPI with OpenMP
     Hybrid MPI/OpenMP applications might require special memory placement
     features to operate efficiently on ccNUMA Origin and Altix servers.  A
     method for realizing this memory placement is available.  The basic idea
     is to space out the MPI processes to accommodate the OpenMP threads
     associated with each MPI process.  In addition, assuming a particular
     ordering of library init code (see the DSO(5) man page), procedures are
     employed to ensure that the OpenMP threads remain close to the parent MPI
     process. This type of placement has been found to improve the performance
     of some hybrid applications significantly when more than four OpenMP
     threads are used by each MPI process.

     To take partial advantage of this placement option, the following
     requirements must be met:

               *   The user must set the MPI_OPENMP_INTEROP shell variable
                   when running the application.

               *   On IRIX systems, the user must use a MIPSpro compiler and
                   the -mp option to compile the application.  This placement
                   option is not available with other compilers.

               *   The user must run the application on an Origin 300, Origin
                   3000, or Altix series server.

     To take full advantage of this placement option on IRIX systems, the user
     must be able to link the application such that the libmpi.so init code is
     run before the libmp.so init code.  This is done by linking the
     MPI/OpenMP application as follows:

          cc -64 -mp compute_mp.c -lmp -lmpi
          f77 -64 -mp compute_mp.f -lmp -lmpi
          f90 -64 -mp compute_mp.f -lmp -lmpi
          CC -64 -mp compute_mp.C -lmp -lmpi++ -lmpi

     This linkage order ensures that the libmpi.so init runs procedures for
     restricting the placement of OpenMP threads before the libmp.so init is
     run.  Note that this is not the default linkage if only the -mp option is
     specified on the link line.

     On IRIX systems, you can use an additional memory placement feature for
     hybrid MPI/OpenMP applications by using the MPI_DSM_PLACEMENT shell
     variable. Specification of a threadroundrobin policy results in the
     parent MPI process stack, data, and heap memory segments being spread
     across the nodes on which the child OpenMP threads are running.  For more
     information, see the ENVIRONMENT VARIABLES section of this man page.

     MPI reserves nodes for this hybrid placement model based on the number of
     MPI processes and the number of OpenMP threads per process, rounded up to
     the nearest multiple of 4 on IRIX systems and 2 on Altix systems.  For
     instance, on IRIX systems, if 6 OpenMP threads per MPI process are going
     to be used for a 4 MPI process job, MPI will request a placement for 32
     (4 X 8) CPUs on the host machine.  You should take this into account when
     requesting resources in a batch environment or when using cpusets.  In
     this implementation, it is assumed that all MPI processes start with the
     same number of OpenMP threads, as specified by the OMP_NUM_THREADS or
     equivalent shell variable at job startup.
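
     The reservation arithmetic above can be checked with a short Python
     sketch.  The reserved_cpus name and the system keys are hypothetical;
     the rounding multiples (4 on IRIX, 2 on Altix) come from the text.

```python
def reserved_cpus(mpi_processes, omp_threads, system="irix"):
    """CPUs requested for the hybrid placement model described above."""
    multiple = {"irix": 4, "altix": 2}[system]
    # Round the per-process thread count up to the next multiple
    # (ceiling division written with integer arithmetic).
    per_process = -(-omp_threads // multiple) * multiple
    return mpi_processes * per_process
```

     With 4 MPI processes and 6 OpenMP threads each, reserved_cpus(4, 6)
     gives 32 on IRIX, matching the example above, while the same job on
     Altix rounds 6 up to 6 and gives 24.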

     NOTE:  This placement is not recommended when setting _DSM_PPM to a
     non-default value (for more information, see pe_environ(5)).  This
     placement is also not recommended when running on a host with partially
     populated nodes.  Also, on IRIX systems, if you are using
     MPI_DSM_MUSTRUN, it is important to also set _DSM_MUSTRUN to properly
     schedule the OpenMP threads.

     On Linux systems, the OpenMP threads are not actually pinned to specific
     CPUs but are limited to the set of CPUs near the MPI rank.  Actual
     pinning of the threads will be supported in a future release.

 SEE ALSO

     mpirun(1), shmem_intro(1)

     arrayd(1M)

     MPI_Buffer_attach(3), MPI_Buffer_detach(3), MPI_Init(3), MPI_IO(3)

     arrayd.conf(4)

     array_services(5)

     For more information about using MPI, including optimization, see the
     Message Passing Toolkit: MPI Programmer's Manual. You can access this
     manual online at http://techpubs.sgi.com.

     Man pages exist for every MPI subroutine and function, as well as for the
     mpirun(1) command.  Additional online information is available at
     http://www.mcs.anl.gov/mpi, including a hypertext version of the
     standard, information on other libraries that use MPI, and pointers to
     other MPI resources.



