SCSL User's Guide
(document number: 007-4325-001 / published: 2003-12-30)
Chapter 2. Basic Linear Algebra Subprogram (BLAS) Routines
The SCSL BLAS routines are a library of routines
that perform basic operations involving matrices and vectors. The BLAS
are used in a wide range of software, including LINPACK, LAPACK, and many
other algorithms commonly in use today. They have become a de facto standard
for elementary vector and matrix operations.
There are three levels of BLAS routines:
Level 1: these routines perform vector-vector
operations such as dot-product and the adding of a multiple
of one vector to another.
Level 2: these routines perform matrix-vector
operations that occur frequently in the implementation of many of the
most common linear algebra algorithms. Note that algorithms that use Level
2 BLAS can be very efficient on vector computers, but are not well suited
to computers with a hierarchy of memory (that is, cache memory).
Level 3: these routines are used
for matrix-matrix operations.
See the remaining subsections in this chapter for details about
each type of BLAS.
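To make the three levels concrete, the following sketch gives a naive reference loop for one representative operation at each level. The names ref_daxpy, ref_dgemv, and ref_dgemm are hypothetical helpers written for this illustration; they are not SCSL routines and do not reproduce the SCSL calling sequences. The sketch assumes unit strides and column-major storage.

```c
#include <stddef.h>

/* Level 1 (vector-vector): y := alpha*x + y. */
void ref_daxpy(int n, double alpha, const double *x, double *y)
{
    for (int i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Level 2 (matrix-vector): y := A*x + y, where A is m-by-n, stored
 * column-major with leading dimension lda. */
void ref_dgemv(int m, int n, const double *a, int lda,
               const double *x, double *y)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            y[i] += a[i + (size_t)j * lda] * x[j];
}

/* Level 3 (matrix-matrix): C := A*B + C, where A is m-by-k, B is k-by-n,
 * and C is m-by-n, all stored column-major. */
void ref_dgemm(int m, int n, int k,
               const double *a, int lda,
               const double *b, int ldb,
               double *c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int p = 0; p < k; p++)
            for (int i = 0; i < m; i++)
                c[i + (size_t)j * ldc] +=
                    a[i + (size_t)p * lda] * b[p + (size_t)j * ldb];
}
```

The Level 3 loop performs on the order of m*n*k operations on only m*k + k*n + m*n values, so each value loaded can be reused many times; this reuse is what blocking for cache exploits. The Level 1 and Level 2 loops touch each matrix or vector element only a small, fixed number of times.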
BLAS 2 and BLAS 3 modules in SCSL are optimized and parallelized
to take advantage of SGI's hardware architecture. Best performance is
achieved with BLAS 3 routines where outer-loop unrolling and blocking
techniques have been applied to take advantage of the memory cache.
SCSL's LAPACK algorithms make extensive use of BLAS 3 modules and
are more efficient than the older, BLAS 1-based LINPACK algorithms.
The BLAS routines use the following
data types:
Single precision: Fortran “real” data types, C/C++ “float”
data types, 32-bit floating point. These routine names begin with
S.
Single precision complex: Fortran “complex”
data type, C/C++ “scsl_complex” data type (defined in
<scsl_blas.h>), C++ STL “complex<float>” data
type (defined in <complex.h>), two 32-bit
floating point reals. These routine names begin with C.
Double precision: Fortran “double precision” data
type, C/C++ “double” data type, 64-bit floating point.
These routine names begin with D.
Double precision complex: Fortran “double complex”
data type, C/C++ “scsl_zomplex” data type (defined in
<scsl_blas.h>), C++ STL “complex<double>” data
type (defined in <complex.h>), two 64-bit
floating point doubles. These routine names begin with Z.
The man(1) command
can find a man page online by the single precision, single precision
complex, double precision, or double precision complex name, as shown
in the following table:
--------------------------------------------------------------
                                    Single         Double
          Single       Double       Precision      Precision
          Precision    Precision    Complex        Complex
--------------------------------------------------------------
form:     Sname        Dname        Cname          Zname
example:  SGEMM        DGEMM        CGEMM          ZGEMM
--------------------------------------------------------------
C Interface to the BLAS Routines
SCSL supports two different C
interfaces to the BLAS:
The C interface described in individual BLAS man pages
follows the same conventions used for the C interface to the SCSL signal
processing library.
SCSL also supports the C interface to the legacy BLAS
set forth by the BLAS Technical Forum. This interface supports row-major
storage of multidimensional arrays; see the
INTRO_CBLAS(3S) man page for details.
By default, the integer arguments are
4 bytes (32 bits) in size; this is the size obtained when the SCSL library
is linked with -lscs or -lscs_mp. Another
version of SCSL is available, however, in which integers are 8 bytes (64
bits). This version allows the user access to larger memory sizes and
helps when porting legacy Cray codes. It can be loaded by using either
the -lscs_i8 or -lscs_i8_mp link option.
Any program may use only one of the two versions; 4-byte integer and 8-byte
integer library calls cannot be mixed.
C/C++ function prototypes for Level
1 BLAS routines are provided in <scsl_blas.h>, when
using the default 4-byte integers, and in <scsl_blas_i8.h>
when using 8-byte integers. These header files define the complex
types scsl_complex and scsl_zomplex,
which are used in the prototypes. Alternatively, C++ programs may declare
arguments using the types complex<float> and
complex<double> from the standard template library. But if
these types are used, <complex.h> must be included
before <scsl_blas.h> (or <scsl_blas_i8.h>).
Both complex types are equivalent: they simply represent
(real, imaginary) pairs of floating point numbers stored contiguously
in memory. With the proper casts, you can simply pass arrays of floating
point data to the routines where complex arguments are expected.
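The layout equivalence described above can be sketched as follows. The type pair_complex is a hypothetical stand-in for a single precision complex type; the actual scsl_complex definition lives in <scsl_blas.h> and is not reproduced here. The routine ref_csum is likewise an illustrative helper, not an SCSL call. The sketch assumes a C11 compiler for _Static_assert.

```c
#include <stddef.h>

/* Hypothetical stand-in for a single precision complex type. */
typedef struct { float re; float im; } pair_complex;

/* The layout the text describes: imaginary part directly after the real
 * part, with no padding between or after the two components. */
_Static_assert(sizeof(pair_complex) == 2 * sizeof(float),
               "unexpected padding in pair_complex");
_Static_assert(offsetof(pair_complex, im) == sizeof(float),
               "imaginary part must follow the real part");

/* Illustrative routine: sums n complex values stored as interleaved
 * (real, imaginary) floats. The real part of the result is written to
 * sum[0] and the imaginary part to sum[1]. */
void ref_csum(int n, const void *x, float sum[2])
{
    const float *f = (const float *)x;
    sum[0] = 0.0f;
    sum[1] = 0.0f;
    for (int i = 0; i < n; i++) {
        sum[0] += f[2 * i];       /* real part of element i */
        sum[1] += f[2 * i + 1];   /* imaginary part of element i */
    }
}
```

Because both representations occupy the same bytes in the same order, an array of pair_complex and a plain float array holding the same values in (real, imaginary) order give identical results when passed to ref_csum.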
Casts, however, can be avoided. The header files
<scsl_blas.h> and <scsl_blas_i8.h>
directly support the use of user-defined complex types or disabling prototype
checking for complex arguments completely. By defining the symbol
SCSL_VOID_ARGS before including <scsl_blas.h>
or <scsl_blas_i8.h> all complex arguments will be
prototyped as void *. To define the symbol
SCSL_VOID_ARGS at compile time use the -D
compiler option (for example, -DSCSL_VOID_ARGS) or
use an explicit #define SCSL_VOID_ARGS in the source
code. This allows the use of any complex data structure without warnings
from the compiler, provided the structure meets the following requirements:
The real and imaginary components must be contiguous in memory.
Sequential array elements must also be contiguous in memory.
While this allows the use of non-standard
complex types without generating compiler warnings, it has the disadvantage
that the compiler does not catch type mismatches.
Strong type checking can be enabled by employing user-defined complex
types instead of SCSL's standard complex types. To do this, define
SCSL_USER_COMPLEX_T=my_complex and
SCSL_USER_ZOMPLEX_T=my_zomplex, where
my_complex and my_zomplex are
the names of user-defined complex types. These complex types must be defined
before including the <scsl_blas.h> or
<scsl_blas_i8.h> header file.
Fortran 90 users on IRIX systems can perform compile-time checking
of SCSL BLAS subroutine and function calls by adding USE SCSL_BLAS
(for 4-byte integer arguments) or USE SCSL_BLAS_I8
(for 8-byte integer arguments) to the source code from which
the BLAS calls are made. Alternatively, the compile-time checking can
be invoked without any source code modifications by using the
-auto_use compiler option, as in the following example:
% f90 -auto_use SCSL_BLAS test.f -lscs
% f90 -auto_use SCSL_BLAS_I8 -i8 test.f -lscs_i8
A vector's description
consists of the name of the array (x or
y) followed by the storage spacing (increment) in the array
of vector elements (incx or incy).
The increment can be positive or negative. When a vector
x consists of n elements, the
corresponding actual array arguments must be of a length at least
1+(n-1)*|incx|. For a negative increment, the first element
of x is assumed to be x(1+(n-1)*|incx|)
for Fortran arrays and x[(n-1)*|incx|] for
C/C++ arrays. The standard specification of _SCAL,
_NRM2, _ASUM, and I_AMAX
does not define the behavior for negative increments, so this functionality
is an extension to the standard BLAS.
Note that setting an increment argument to 0
can cause unpredictable results.
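The increment rules above can be sketched in C as follows. The helpers min_array_len, elem_index, and ref_ddot are hypothetical names written for this illustration and are not SCSL routines. The sketch assumes n is at least 1, uses 0-based C indexing, and assumes that a negative increment steps backward through the array from the starting element given above.

```c
#include <stddef.h>

/* Minimum number of array elements needed to hold an n-element vector
 * stored with the given increment: 1 + (n-1)*|inc|. Assumes n >= 1. */
size_t min_array_len(int n, int inc)
{
    size_t step = (size_t)(inc < 0 ? -inc : inc);
    return 1 + (size_t)(n - 1) * step;
}

/* 0-based array index of logical vector element k (k = 0 .. n-1).
 * For a negative increment the first logical element sits at
 * (n-1)*|inc| and later elements are found at decreasing indexes. */
size_t elem_index(int n, int inc, int k)
{
    if (inc >= 0)
        return (size_t)k * (size_t)inc;
    return (size_t)(n - 1 - k) * (size_t)(-inc);
}

/* Reference dot product that honors both increments. */
double ref_ddot(int n, const double *x, int incx,
                const double *y, int incy)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++)
        sum += x[elem_index(n, incx, k)] * y[elem_index(n, incy, k)];
    return sum;
}
```

For example, with n = 3 and incx = 2, the vector occupies array elements 0, 2, and 4, so the array must hold at least 5 elements. With incy = -1, the same three-element vector is read from array elements 2, 1, and 0, in that order.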
Array Storage (BLAS 2 and BLAS 3)
Multidimensional arrays passed as arguments to
BLAS routines must be stored in column-major order, the storage convention
used in Fortran programs. C and C++ users must explicitly store multidimensional
arrays column-by-column.
One way to do this is to reverse the order of array dimensions with
respect to the Fortran declaration (for example, x(ldx,n)
in Fortran versus x[n][ldx] in C/C++). Because of the
prototypes used in <scsl_blas.h>, the array should
be cast as a pointer to the appropriate type when passed as an argument
to a BLAS routine in order to avoid potential compiler type mismatch errors
or warning messages.
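A minimal sketch of the reversed-dimension declaration follows. The sizes LDX and N, the array x, and the helpers colmajor_get and fill_example are all hypothetical and exist only for this illustration; no SCSL routine is called.

```c
#define LDX 3   /* leading dimension: rows in the Fortran x(ldx,n) */
#define N   2   /* number of columns */

/* Fortran element x(i,j), 1-based, corresponds to x[j-1][i-1] here,
 * so each C row holds one Fortran column. */
static double x[N][LDX];

/* Illustrative consumer: reads element (i,j), 0-based, from a
 * column-major array passed as a flat pointer plus leading dimension. */
double colmajor_get(const double *a, int lda, int i, int j)
{
    return a[i + j * lda];
}

/* Fills x so that element (i,j) holds 10*i + j, then returns the
 * array cast to a flat pointer, as it would be passed to a routine
 * that expects column-major data. */
const double *fill_example(void)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < LDX; i++)
            x[j][i] = 10.0 * i + j;
    return (const double *)x;
}
```

With this layout, colmajor_get(fill_example(), LDX, 2, 1) reads x[1][2], the element in the third row and second column of the matrix.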
C and C++ users who want to employ row-major storage for multidimensional
arrays when calling the BLAS routines should see the
INTRO_CBLAS(3S) man page for details.
The Level 1 BLAS routines perform vector-vector linear algebra operations.
The following types of vector-vector operations are available:
Dot products and various vector norms
Scaling, copying, swapping, and computing linear combinations
of vectors
Generating or applying plane or modified plane rotations
You should
use Fortran type declarations for functions. Declaring the data type of
the complex Level 1 BLAS functions is important because, based on the
first letter of the name of the routine and the Fortran data typing rules,
the default implied data type would be REAL.
Fortran type declarations for function names are as follows:
--------------------------------------------------------------
Type               Function Name
--------------------------------------------------------------
REAL               SASUM, SCASUM, SCNRM2, SDOT, SNRM2, SSUM
COMPLEX            CDOTC, CDOTU, CSUM
DOUBLE PRECISION   DASUM, DZASUM, DDOT, DNRM2, DZNRM2, DSUM
DOUBLE COMPLEX     ZDOTC, ZDOTU, ZSUM
INTEGER            ISAMAX, IDAMAX, ICAMAX, IZAMAX, ISAMIN,
                   IDAMIN, ISMAX, IDMAX, ISMIN, IDMIN
--------------------------------------------------------------
The following routines are available in the SCSL BLAS 1:
SASUM, DASUM: Sums
the absolute values of the elements of a real vector (also called the
l1 norm).
SCASUM, DZASUM:
Sums the absolute values of the real and imaginary parts of the elements
of a complex vector.
SAXPBY*, DAXPBY*,
CAXPBY*, ZAXPBY*: Adds a scalar multiple
of a real or complex vector to a scalar multiple of another vector.
SAXPY, DAXPY,
CAXPY, ZAXPY: Adds a scalar multiple of a
real or complex vector to another vector.
SCOPY, DCOPY,
CCOPY, ZCOPY: Copies a real or complex vector
into another vector.
CDOTC, ZDOTC: Computes
a dot product of the conjugate of a complex vector and another complex
vector.
SHAD*, DHAD*,
CHAD*, ZHAD*: Computes the Hadamard product
of two vectors.
SNRM2, DNRM2: Computes
the Euclidean norm (also called l2 norm) of a
real vector.
SCNRM2, DZNRM2:
Computes the Euclidean norm (l2 norm) of a complex
vector.
CSROT*, ZDROT*,
CROT*, ZROT*: Applies a real plane rotation
to a pair of complex vectors.
SROT, DROT: Applies
an orthogonal plane rotation.
SROTG, DROTG,
CROTG*, ZROTG*: Constructs a Givens plane
rotation.
SROTM, DROTM: Applies
a modified Givens plane rotation.
SROTMG, DROTMG: Constructs
a modified Givens plane rotation.
SSCAL, DSCAL,
CSCAL, ZSCAL, CSSCAL,
ZDSCAL: Scales a real or complex vector.
SSUM*, DSUM*,
CSUM*, ZSUM*: Sums the elements of a real
or complex vector.
SSWAP, DSWAP,
CSWAP, ZSWAP: Swaps two real or two complex
vectors.
ISAMAX, IDAMAX,
ICAMAX, IZAMAX: Searches a vector for the
first occurrence of the maximum absolute value.
ISAMIN*, IDAMIN*:
Searches a vector for the first occurrence of the minimum absolute value.
ISMAX*, IDMAX*:
Searches a vector for the first occurrence of the maximum value.
ISMIN*, IDMIN*:
Searches a vector for the first occurrence of the minimum value.
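A few of the Level 1 descriptions above are easy to misread, so the following sketch spells out three of them as reference loops. The names ref_scasum, ref_isamax, and ref_ismax are hypothetical helpers, not the SCSL routines, and the sketch makes two assumptions that the text does not confirm for the SCSL C interface: unit increments, and 0-based result indexes (a Fortran caller would see 1-based indexes).

```c
/* Absolute value helper, so the sketch needs no math library. */
static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* SCASUM-style sum: adds |real| + |imaginary| for each of n complex
 * values stored as interleaved floats. This is not the sum of moduli. */
float ref_scasum(int n, const float *x)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += absf(x[2 * i]) + absf(x[2 * i + 1]);
    return s;
}

/* ISAMAX-style search: index of the first element with the largest
 * absolute value, or -1 if n < 1. */
int ref_isamax(int n, const float *x)
{
    if (n < 1)
        return -1;
    int best = 0;
    for (int i = 1; i < n; i++)
        if (absf(x[i]) > absf(x[best]))   /* strict: keeps first hit */
            best = i;
    return best;
}

/* ISMAX-style search: index of the first element with the largest
 * signed value, or -1 if n < 1. */
int ref_ismax(int n, const float *x)
{
    if (n < 1)
        return -1;
    int best = 0;
    for (int i = 1; i < n; i++)
        if (x[i] > x[best])
            best = i;
    return best;
}
```

For the vector (1, -5, 5, 2), the absolute-value search selects the element -5 because it is the first with magnitude 5, while the signed search selects the element 5.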
The Level 2 BLAS routines perform matrix-vector linear algebra operations.
The following routines are available:
CHBMV, ZHBMV: Multiplies
a complex vector by a complex Hermitian band matrix.
CHEMV, ZHEMV: Multiplies
a complex vector by a complex Hermitian matrix.
CHER, ZHER: Performs
Hermitian rank 1 update of a complex Hermitian matrix.
CHER2, ZHER2: Performs
Hermitian rank 2 update of a complex Hermitian matrix.
CHPMV, ZHPMV: Multiplies
a complex vector by a packed complex Hermitian matrix.
CHPR, ZHPR: Performs
Hermitian rank 1 update of a packed complex Hermitian matrix.
CHPR2, ZHPR2: Performs
Hermitian rank 2 update of a packed complex Hermitian matrix.
SGBMV, DGBMV,
CGBMV, ZGBMV: Multiplies a real or complex
vector by a real or complex general band matrix.
SGEMV, DGEMV,
CGEMV, ZGEMV: Multiplies a real or complex
vector by a real or complex general matrix.
SGER, DGER: Performs
rank 1 update of a real general matrix.
CGERC, ZGERC: Performs
conjugated rank 1 update of a complex general matrix.
CGERU, ZGERU: Performs
unconjugated rank 1 update of a complex general matrix.
SGESUM*, DGESUM*,
CGESUM*, ZGESUM*: Adds a scalar multiple
of a real or complex matrix to a scalar multiple of another real or complex
matrix.
SSBMV, DSBMV: Multiplies
a real vector by a real symmetric band matrix.
SSPMV, DSPMV,
CSPMV*, ZSPMV*: Multiplies a real or complex
vector by a real or complex symmetric packed matrix.
SSPR, DSPR,
CSPR*, ZSPR*: Performs symmetric rank 1 update
of a real or complex symmetric packed matrix.
SSPR2, DSPR2: Performs
symmetric rank 2 update of a real symmetric packed matrix.
SSYMV, DSYMV,
CSYMV*, ZSYMV*: Multiplies a real or complex
vector by a real or complex symmetric matrix.
SSYR, DSYR,
CSYR*, ZSYR*: Performs symmetric rank 1 update
of a real or complex symmetric matrix.
SSYR2, DSYR2: Performs
symmetric rank 2 update of a real symmetric matrix.
STBMV, DTBMV,
CTBMV, ZTBMV: Multiplies a real or complex
vector by a real or complex triangular band matrix.
STBSV, DTBSV,
CTBSV, ZTBSV: Solves a real or complex triangular
band system of equations.
STPMV, DTPMV,
CTPMV, ZTPMV: Multiplies a real or complex
vector by a real or complex triangular packed matrix.
STPSV, DTPSV,
CTPSV, ZTPSV: Solves a real or complex triangular
packed system of equations.
STRMV, DTRMV,
CTRMV, ZTRMV: Multiplies a real or complex
vector by a real or complex triangular matrix.
STRSV, DTRSV,
CTRSV, ZTRSV: Solves a real or complex triangular
system of equations.
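Two of the Level 2 operations listed above, the rank 1 update and the triangular solve, can be sketched as reference loops. The names ref_dger and ref_dtrsv_lower are hypothetical helpers, not the SCSL routines; the sketch assumes unit increments, column-major storage, and a lower triangular matrix with a nonzero, non-unit diagonal.

```c
/* Rank 1 update of a real general matrix: A := alpha*x*y^T + A,
 * where A is m-by-n, column-major, with leading dimension lda. */
void ref_dger(int m, int n, double alpha,
              const double *x, const double *y,
              double *a, int lda)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
            a[i + j * lda] += alpha * x[i] * y[j];
}

/* Triangular solve L*x = b by forward substitution. L is n-by-n lower
 * triangular, column-major, with leading dimension ldl. On entry x
 * holds b; on exit it holds the solution. */
void ref_dtrsv_lower(int n, const double *l, int ldl, double *x)
{
    for (int j = 0; j < n; j++) {
        x[j] /= l[j + j * ldl];               /* solve for unknown j */
        for (int i = j + 1; i < n; i++)
            x[i] -= l[i + j * ldl] * x[j];    /* eliminate it below */
    }
}
```

Both loops walk down one column at a time in the inner loop, which matches the column-major storage order and keeps memory accesses contiguous.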
The Level 3 BLAS routines perform matrix-matrix linear algebra operations.
The following routines are available:
SGEMM, DGEMM,
CGEMM, ZGEMM: Multiplies a real or complex
general matrix by a real or complex general matrix.
CGEMM3M*, ZGEMM3M*:
Multiplies a complex general matrix by a complex general matrix, using
3 real matrix multiplications and 5 matrix additions.
DGEMMS*: Multiplies a double precision
general matrix by a double precision general matrix, using a variation
of Strassen's algorithm.
SSYMM, DSYMM,
CSYMM, ZSYMM: Multiplies a real or complex
general matrix by a real or complex symmetric matrix.
CHEMM, ZHEMM: Multiplies
a complex general matrix by a Hermitian matrix.
SSYR2K, DSYR2K,
CSYR2K, ZSYR2K: Performs symmetric rank 2k
update of a real or complex symmetric matrix.
CHER2K, ZHER2K:
Performs Hermitian rank 2k update of a complex Hermitian matrix.
SSYRK, DSYRK,
CSYRK, ZSYRK: Performs symmetric rank k update
of a real or complex symmetric matrix.
CHERK, ZHERK: Performs
Hermitian rank k update of a complex Hermitian matrix.
STRMM, DTRMM,
CTRMM, ZTRMM: Multiplies a real or complex
general matrix by a real or complex triangular matrix.
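The count of 3 real matrix multiplications and 5 matrix additions given for CGEMM3M and ZGEMM3M can be illustrated with one well-known arrangement of the complex product. For A = Ar + i*Ai and B = Br + i*Bi, form T1 = Ar*Br, T2 = Ai*Bi, and T3 = (Ar + Ai)*(Br + Bi); the real part of A*B is then T1 - T2 and the imaginary part is T3 - T1 - T2. The source does not state that SCSL uses this exact arrangement, so the following is only a sketch under that assumption. The name ref_gemm3m is hypothetical, and for simplicity the sketch takes square n-by-n matrices with real and imaginary parts stored in separate column-major arrays, which is not the interleaved layout the SCSL routines expect.

```c
#include <stdlib.h>

/* out := a*b for n-by-n real column-major matrices. */
static void real_matmul(int n, const double *a, const double *b,
                        double *out)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++) {
            double s = 0.0;
            for (int p = 0; p < n; p++)
                s += a[i + p * n] * b[p + j * n];
            out[i + j * n] = s;
        }
}

/* out := a + sign*b, element by element, over len values. */
static void real_add(int len, const double *a, double sign,
                     const double *b, double *out)
{
    for (int i = 0; i < len; i++)
        out[i] = a[i] + sign * b[i];
}

/* Complex product C = A*B from 3 real multiplications and 5 additions.
 * Returns 0 on success, -1 if workspace allocation fails. */
int ref_gemm3m(int n,
               const double *ar, const double *ai,
               const double *br, const double *bi,
               double *cr, double *ci)
{
    int len = n * n;
    double *w = malloc(5 * (size_t)len * sizeof *w);
    if (w == NULL)
        return -1;
    double *s1 = w, *s2 = w + len;
    double *t1 = w + 2 * len, *t2 = w + 3 * len, *t3 = w + 4 * len;

    real_add(len, ar, 1.0, ai, s1);     /* addition 1: Ar + Ai */
    real_add(len, br, 1.0, bi, s2);     /* addition 2: Br + Bi */
    real_matmul(n, ar, br, t1);         /* multiplication 1: T1 */
    real_matmul(n, ai, bi, t2);         /* multiplication 2: T2 */
    real_matmul(n, s1, s2, t3);         /* multiplication 3: T3 */
    real_add(len, t1, -1.0, t2, cr);    /* addition 3: Cr = T1 - T2 */
    real_add(len, t3, -1.0, t1, ci);    /* addition 4: T3 - T1 */
    real_add(len, ci, -1.0, t2, ci);    /* addition 5: Ci = ... - T2 */

    free(w);
    return 0;
}
```

The conventional product needs 4 real matrix multiplications and 2 additions. Trading one multiplication for three extra additions pays off for large matrices, because a multiplication costs on the order of n^3 operations while an addition costs only n^2; the rounding behavior of the imaginary part differs from that of the conventional product.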