DPLACE
NAME
       dplace - a tool for controlling placement of processes onto cpus

SYNOPSIS
       dplace [-e] [-c cpu_numbers] [-s skip_count] [-n process_name] \
             [-x skip_mask] [-r [l|L|b|B|A|t]] [-o log_file] [-v 1|2] \
             command [command-args]
       dplace [-p placement_file] [-o log_file] command [command-args]
       dplace [-q] [-qq] [-qqq]

DESCRIPTION
       The  given  program  is  executed after scheduling and memory placement
       policies are set up according to the command line arguments.

       By default, memory is allocated to a process on the node that the  pro-
       cess  is  executing on. If a process moves from node to node during its
       lifetime, a higher percentage of memory references will  be  to  remote
       nodes. Remote accesses typically have higher access times, so process
       performance may suffer.

       Dplace is used to bind a related set of processes to specific  cpus  or
       nodes  to  prevent process migrations. In some cases, this will improve
       performance since a higher percentage of memory accesses will be to
       the local node.

       Processes always execute within a cpuset. The cpuset specifies the cpus
       that are available for a process to execute on. By  default,  processes
       usually execute in a cpuset that contains all the cpus in the system.

       The  cpu numbers specified on the command line or in the placement file
       are always cpuset-relative.

       Dplace invokes a kernel module to create a job placement container con-
       sisting of all (or a subset of) the cpus of the cpuset.  In the current
       version, version 2, an LD_PRELOAD library (libdplace.so) is used to
       intercept calls to fork(), exec(), and pthread_create() to do placement
       of tasks being created. Note that tasks created internally by glibc
       are not intercepted by the preload library; these tasks will not be
       placed.

       If no placement file is being used, then the dplace process  is  placed
       in  this  container  and  (by default) is bound to the first cpu of the
       cpuset associated with the container. Dplace then "execs" the
       <command>. The command executes within this placement container and
       remains bound to the first cpu of the container. As the command
       forks  child processes, they inherit the container and are bound to the
       next available cpu of the container.

       If a placement file is being used,  then  the  dplace  process  is  not
       placed  at  the  time the job placement container is created. Placement
       occurs as processes are forked and exec'd. The placement file may  con-
       tain  a  directive for placing this first task. See dplace(5) for addi-
       tional details.

       dplace maintains a global count of the number of active processes  that
       have been placed (by dplace) on each cpu.

       dplace supports two placement modes: load-balanced and exact placement.

       If  load  balanced  placement (default) is selected, dplace will bind a
       process to the cpu that has the lowest number of  processes  that  were
       placed by dplace AND is also in the user's cpu list.  For example, if
       the current cpuset consists of physical cpus 2, 5, 8, and  9,  and  the
       user types:

               mpirun -np 2 dplace -s1 app1
               mpirun -np 2 dplace -s1 app2

       app1 will run on cpus 2 and 5, but app2 will run on cpus 8 and 9
       because they have a lower load (the -s1 above causes dplace to not
       place  an  MPI  helper  process that is mostly inactive).  This assumes
       that no other processes that were placed by dplace are still running on
       these cpus.

       If exact placement is selected (-e), processes are bound to cpus in the
       exact order that the cpus are specified in the cpu  list.  Cpu  numbers
       may  appear  multiple  times  in the list. A cpu value of "x" indicates
       that binding should not be done for that process. If  the  end  of  the
       list is reached, binding starts over at the beginning of the list.
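
       For example, a minimal sketch using a hypothetical program "app" in a
       cpuset with at least four cpus:

               dplace -e -c 2,3,x,1 app  # app is bound to cpu 2, its first
                                         # child to cpu 3, its second child
                                         # is not bound, its third child to
                                         # cpu 1; a fourth child would wrap
                                         # around to cpu 2.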

OPTIONS
       -c     Cpu numbers. Specified as a list of cpus, optionally strided cpu
              ranges, or a striding pattern. Example: "-c 1",  "-c  2-4",  "-c
              1,4-8,3",  "-c  2-8:3",  "-c CS", "-c BT". The specification "-c
              2-4" is equivalent to "-c 2,3,4" and "-c 2-8:3" is equivalent to
              2,5,8.  Ranges may also be specified in reverse order: "-c 12-8"
              is equivalent to 12,11,10,9,8. Cpu numbers are NOT physical  cpu
              numbers.  They are logical cpu numbers that are relative to the
              cpus that are in the set of allowed cpus as specified by the
              current cpuset.  A cpu value of "x" (or "*") in the argument
              list for the -c option indicates that binding should not be
              done for that process.  "x" should be used only if the -e option is
              also used. Cpu numbers start at 0.  For  striding  patterns  any
              subset of the characters (B)lade, (S)ocket, (C)ore, (T)hread may
              be used and their ordering specifies the nesting of  the  itera-
              tion.  For  example  "SC"  means  to  iterate all the cores in a
              socket before moving to the next CPU socket, while "CB" means to
              pin  to  the  first  core of each blade, then the second core of
              every blade, etc. For best results, use the -e option when using
              stride  patterns. If the -c option is not specified, all cpus of
              the current cpuset are available. The command itself  (which  is
              exec'd  by  dplace)  is the first process to be placed by the -c
              cpu_numbers.
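
              For example, the following sketches (using a hypothetical
              program "app" in a sufficiently large cpuset) illustrate the
              list, range, and stride-pattern forms:

                      dplace -c 2,3,4,5,8 app   # explicit list
                      dplace -c 2-5,8 app       # same cpus, using a range
                      dplace -e -c SC app       # stride pattern: fill the
                                                # cores of one socket before
                                                # moving to the next socket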

       -e     Exact placement. As processes are created,  they  are  bound  to
              cpus  in  the exact order that the cpus are specified in the cpu
              list. Cpu numbers may appear multiple times in the list.  A  cpu
              value  of "x" indicates that binding should not be done for that
              process. If the end of the list is reached, binding starts  over
              at the beginning of the list.

       -o     Write a trace file to <log_file> that describes the placement
              actions that were made for each fork, exec, etc. Each line  con-
              tains  a  timestamp, process id:thread number, cpu that task was
              executing on, taskname | placement action. Works with version  2
              only.
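
              For example, to log the placement actions taken for a hypothet-
              ical program "app" (the log file path is chosen for illustra-
              tion):

                      dplace -c 0-3 -o /tmp/dplace.log app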

       -s     Skip  the  first <skip_count> processes before starting to place
              processes  onto  cpus.  This  option  is  useful  if  the  first
              <skip_count>  processes  are  "shepherd" processes that are used
              only for launching the application. If <skip_count> is not spec-
              ified, a default value of 0 is used.
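
              For example, a sketch that leaves a hypothetical launcher
              script unplaced and starts placement with the processes it
              creates:

                      dplace -s1 -c 0-3 launch.sh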

       -n     Only  processes named <process_name> are placed. Other processes
              are ignored and are not explicitly bound to  cpus.   Note:  pro-
              cess_name is the basename of the executable.
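
              For example, to bind only processes named "myapp" (a hypothet-
              ical name) and leave any other forked helpers unbound:

                      dplace -c 0-3 -n myapp myapp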

       -r     Specifies that text and/or data should be replicated on the node
              or nodes where the application is running. In some cases, repli-
              cation  will  improve  performance  by reducing the need to make
              offnode memory references. The replication option applies to all
              programs  placed by the dplace command.  See dplace(5) for addi-
              tional information on text replication.  The replication options
              are a string of one or more of the following characters:

              l      replicate library text

              L      replicate library RW data

              b      replicate binary (a.out) text

              B      replicate binary (a.out) RW data

              A      replicate all library and DSO text & RW data

              t      thread round-robin option
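
              For example, to replicate both binary and library text for a
              hypothetical program "app" (the "b" and "l" options given as a
              single string):

                      dplace -c 0-7 -r bl app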

       -x     Provides the ability to skip placement of processes. <skip_mask>
              is a bitmask. If bit N of <skip_mask> is  set,  then  the  N+1th
              process that is forked is not placed. For example, setting the
              mask to 6 prevents the 2nd and 3rd processes from being placed.
              The first process (the process named by the <command>) will be
              assigned to the first cpu. The second and third processes are
              not placed. The fourth process is assigned to the second cpu,
              etc.  This option is useful for certain classes of
              threaded  apps  that spawn a few helper processes that typically
              do not use much cpu time.  (Hint: Intel OpenMP applications cur-
              rently  should be placed using -x 2. This could change in future
              versions of OpenMP).
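
              For example, using the mask value 6 described above with a
              hypothetical program "app" that forks several workers:

                      dplace -x 6 -c 0-3 app   # app is placed on the first
                                               # cpu; its 2nd and 3rd forked
                                               # processes are not placed; the
                                               # 4th goes to the second cpu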

       -v     Provides the ability to run in version 1 or version  2  compati-
              bility  mode  if  the kernel support is available. If not speci-
              fied, version 2 compatibility  is  selected.  See  COMPATIBILITY
              section  for more details.  Note: version 1 requires kernel sup-
              port for PAGG.
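
              For example, to request version 1 compatibility mode for a
              hypothetical program "app" (usable only on kernels with PAGG
              support):

                      dplace -v 1 -c 0-3 app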

       -p     Specifies a placement file that contains  additional  directives
              that  are  used  to control process placement. See dplace(5) for
              additional details and for a description of the  placement  file
              syntax.
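
              For example, with a placement file named "my.place" (a hypo-
              thetical name; see dplace(5) for the file syntax):

                      dplace -p my.place app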

       -q     If  specified  once,  lists  the  global  count of the number of
              active processes that have been placed (by dplace) on  each  cpu
              in  the  current  cpuset.  Note that cpu numbers are logical cpu
              numbers within the cpuset, NOT physical cpu numbers.  If  speci-
              fied  twice,  lists the current dplace jobs that are running. If
              specified 3 times, lists the current dplace jobs and  the  tasks
              that are in each job.
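
              For example:

                      dplace -q     # per-cpu counts of dplace-placed
                                    # processes in the current cpuset
                      dplace -qq    # list the running dplace jobs
                      dplace -qqq   # list the jobs and the tasks in each job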

EXAMPLES
       The following examples assume the command is executed from a shell run-
       ning in a cpuset consisting of physical cpus 8-15.

       To execute a process on a specific set of logical cpus:

               dplace -c 2 date        # date runs on physical cpu 10.

               dplace make linux       # gcc and related processes run on
                                       # physical cpus 8-15.

               dplace -c 0-4,6 make linux      # make (gcc and related
                                               # processes) run on physical
                                               # cpus 8-12 or 14.

       The following example assumes the application is NOT run in  a  cpuset;
       in other words, the cpuset is the entire system.

               dplace -e -c 6,x,x,2 app # app will run on cpu 6. The first two
                                        # threads created by app (either by fork
                                        # or pthread_create) will not be bound.
                                         # The 3rd thread is bound to cpu 2. If a
                                        # 4th thread is created, it is bound to
                                        # cpu 6.

                dplace -e -c 31,x,30-0 app  # app will run on cpu 31. The first
                                            # app-created thread (either by fork
                                            # or pthread_create) will not be bound.
                                            # The second app-created thread is
                                            # bound to cpu 30, the third app-created
                                            # thread is bound to cpu 29, etc.

MPI EXAMPLE
       Most  SGI Message Passing Toolkit (MPT) MPI jobs are launched by mpirun
       and use N+1 threads. The first thread is mainly  inactive  and  usually
       does not need to be placed.

       To launch an MPI application use the following syntax:

               mpirun -np <process_count> dplace [dplace_args] app [args]

       Example:

               mpirun -np 8 dplace -s1 lu.8

OPENMP EXAMPLE
       Intel  OpenMP  jobs use an extra thread that is unknown to the user and
       need not be placed. In addition,  OpenMP  jobs  have  Intel-driven  cpu
       placement  functionality  that  must  be disabled by setting KMP_AFFIN-
       ITY=disabled when running OpenMP jobs with dplace.

       To launch an OpenMP application  while  pinning  the  user  threads  to
       unique logical cpus, use the following syntax:

               env KMP_AFFINITY=disabled OMP_NUM_THREADS=<thread count> \
                       dplace -e -x 2 [dplace_args] app [args]

       Example:

               env KMP_AFFINITY=disabled OMP_NUM_THREADS=4 \
                       dplace -e -x 2 -c 0-3 lu-omp.4

       If  you  have SGI's MPT MPI package installed, you can conveniently use
       omplace for a simpler syntax that hides some of the details  associated
       with launching OpenMP apps that were described in the previous example.
       To launch an OpenMP application  while  pinning  the  user  threads  to
       unique logical cpus starting at 0, use the following syntax:

               omplace -nt <thread count> [omplace_args] app [args]

       Example:

               omplace -nt 4 lu-omp.4

COMPATIBILITY
       Version  1 of numatools required kernel support for PAGG process place-
       ment groups. This support is no longer available in  all  kernel  vari-
       ants.

       Version  2  of  numatools  uses a preload library to intercept calls to
       fork(), exec() (all variants), pthread_create() and pthread_exit(). The
       intercept  code performs placement as part of the library call. In most
       cases, version 1 and version 2 are compatible. In some cases,  however,
       a user will notice differences:

               preload libraries do not work with statically linked binaries

               preload libraries do not intercept fork() or exec() calls that
               come from glibc itself. Specifically, the system() call is not
               intercepted and no placement of tasks that result from a system()
               call will be done. In most cases, this is not an issue although
               you may need to adjust the <skip_count> if you use this option
               to skip tasks created by system().

       In some cases, version 2 of numatools will give better performance than
       version 1. Assuming first-touch placement  policy,  in  version  1  all
       thread-private data and a few stack pages will be located on the parent
       node, not the node that the task is placed on. In version 2, this  mem-
       ory is usually allocated local to the task's node.

       Dplace sets an environment variable to indicate whether version 1 or version
       2 placement is being done. This variable can be tested by  applications
       or scripts:

              __DPLACE_ = 1
                     # version 1 placement (requires PAGG kernel support)

              __DPLACE_ = 2
                     # version 2 placement.

BUGS
       The <skip_mask> is a kludge. A better solution is needed.

       The  "-n  <process_name>"  option  is only marginally useful. The <pro-
       cess_name> is checked at fork time but not at "exec" time.

       Tasks created internally by glibc are not intercepted by the preload
       library.  These tasks are not placed and will run on any cpu in the
       cpuset. For example, tasks created by the system()  call  will  not  be
       placed. THIS IS AN INCOMPATIBILITY with version 1.x of numatools.

       Unless  running  in  version 1 compatibility mode, dplace does not work
       with statically linked binaries.

       Because LD_PRELOAD is ignored for SUID programs,  dplace  will  not  do
       correct placement of child processes of SUID programs.

ERRORS
       Dplace  depends  on a loadable kernel module named "numatools". If this
       module is not loaded, dplace will fail and print a  message  to  remind
       the user to load the numatools module.
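
       For example, one might check for and load the module with the stan-
       dard module utilities (exact procedures can vary by distribution):

               lsmod | grep numatools     # check whether the module is loaded
               modprobe numatools         # load it (run as root)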

SEE ALSO
       cpuset(1), dplace(5), dlook(1), omplace(1)

2.0                              26 June 2012                        dplace(1)
