Linux » Books » End-User »
SCSL User's Guide
(document number: 007-4325-001 / published: 2003-12-30)
table of contents | additional info | download find in page | jump to first hit | clear highlight
Chapter 5. Signal Processing Routines
SCSL provides three types of signal processing routines:
See the release notes that are provided with your system for details
about converting FFT routines for CHALLENGERcomplib to SCSL.
The
FFT routines have been highly optimized for single-processor use. The
two-dimensional, three-dimensional, and one-dimentional multiple routines
are also multitasked (multithreaded) for all sizes for which there is
a performance benefit. The one-dimensional routines are multitasked if
the data size exceeds the size of the largest processor cache. Each routine
can compute either a forward or an inverse Fourier transform.
The following data types are used in
these routines: Single precision: Fortran "real" data type, C/C++ "float"
data type, 32-bit floating point; these routine names begin with
S.
Single precision complex: Fortran "complex" data type,
C/C++ "scsl_complex" data type (defined in <scsl_fft.h>),
C++ STL "complex<float>" data type (defined in <complex.h>
, two 32-bit floating point reals; these routine names begin
with C.
Double precision: Fortran "double precision" data type,
C/C++ "double" data type, 64-bit floating point; these routine names begin
with D.
Double precision complex: Fortran "double complex" data
type, C/C++ “scsl_zomplex" data type (defined in <scsl_fft.h>
), C++ STL "complex<double>" data type (defined in
<complex.h>), two 64-bit floating point doubles; these routine
names begin with Z.
When using the C++ Standard Template
Library (STL) to define complex types, the include files must be used
in the following order: #include <complex.h>
#include <scsl_fft.h> |
Often little or no difference
exists between these versions, other than the data types of some inputs
and outputs. In this case, the routines are described on the same man
page, and that man page is named after the real or complex routine.
The man(1) command can find a man
page online by either the real, complex, double precision, or double complex
name.
The data types for the scale,
table, and work arguments in
these routines vary, depending on the function which is called. In the
CC, SC, and CS routines,
the arguments are single precision. In the ZZ,
DZ and ZD routines, the arguments are double
precision.
By default, the integer arguments are 4 bytes (32
bits) in size; this is the size obtained when one links to the SCSL library
with -lscs or -lscs_mp. Another version
of SCSL is available, however, in which integers are 8 bytes (64 bits).
This version allows the user access to larger memory sizes and helps when
porting legacy codes. It can be loaded by using either the -lscs_i8
or -lscs_i8_mp link option. Note that any program
may use only one of the two versions; 4-byte integer and 8-byte integer
library calls cannot be mixed.
C/C++ function prototypes for the signal processing
routines are provided in <scsl_fft.h>, when using
the default 4-byte integers, and <scsl_fft_i8.h>
when using 8-byte integers. These header files define the complex types
scsl_complex and scsl_zomplex, which are
used in the prototypes. Alternatively, C++ programs may declare arguments
using the types complex<float> and complex<double>
from the standard template library (STL). But if these types
are used, <complex.h> must be included before
<scsl_fft.h> (or <scsl_fft_i8.h>).
Note, though, that both complex types are equivalent: they simply represent
(real, imaginary) pairs of floating point numbers stored contiguously
in memory. With the proper casts, you can simply pass arrays of floating
point data to the routines where complex arguments are expected.
Casts, however, can be avoided. The header
files <scsl_fft.h> and <scsl_fft_i8.h>
directly support the use of user-defined complex types or disabling
prototype checking for complex arguments completely. By defining the
symbol SCSL_VOID_ARGS before including <scsl_fft.h>
or <scsl_fft_i8.h> all complex arguments
will be prototyped as void *. To define the
symbol SCSL_VOID_ARGS at compile time use the
-D compiler option (for example, -DSCSL_VOID_ARGS)
or use an explicit #define SCSL_VOID_ARGS in the source
code. This allows the use of any complex data structure without warnings
from the compiler, provided the structure is as described above: The real and imaginary components must be contiguous in
memory.
Sequential array elements must also be contiguous
in memory.
While this allows the use of non-standard
complex types without generating compiler warnings, it has the disadvantage
that the compiler will not catch type mismatches.
Strong type checking can be enabled employing user-defined complex
types instead of SCSL's standard complex types. To do this, define
SCSL_USER_COMPLEX_T=my_complex and
SCSL_USER_ZOMPLEX_T=my_zomplex, where
my_complex and my_zomplex are
the names of user-defined complex types. These complex types must be defined
before including the <scsl_fft.h> (or
<scsl_fft_i8.h>) header file.
Fortran 90 users on IRIX systems can perform compile-time checking
of SCSL FFT subroutine calls by adding USE SCSL_FFT (for
4-byte integer arguments) or USE SCSL_FFT_I8 (for 8-byte
integer arguments) to the source code from which the FFT calls are made.
Alternatively, the compile-time checking can be invoked without any source
code modifications by using the -auto_use compiler option.
For example:
% f90 -auto_use SCSL_FFT test.f -lscs
% f90 -auto_use SCSL_FFT_I8 -i8 test.f -lscs_i8 |
The following list describes
the supported FFT routines. Each of these routines is highly optimized
for single-processor use. The two-dimensional, three-dimensional, and
one-dimensional multiple routines are alsomultitasked (multi-threaded)
for all sizes for which there is a performance benefit; the one-dimensional
routines are multitasked if the data size exceeds the size of the largest
processor cache. Each routine can compute either a forward or an inverse
Fourier transform. Rows of the table represent input and output data typesfor
the routines in each column:
C->C implies 32-bit complex input and
output.
Z->Z implies 64-bit double complex
input and 64-bit double complex output. Each routine in this row is documented
with the complex routine in the above row.
S->C implies 32-bit real input and
32-bit complex output.
D->Z implies 64-bit double precision
real input and 64-bit double precision complex output. Each routine in
this row is documented with the real-to-complex routine in the above row.
C->S implies 32-bit complex input and
32-bit real output.
Z->D implies 64-bit double complex
input and 64-bit double precision output. Each routine named in this row
is documented with the complex-real routine in the above row.
Columns of the table represent the number of dimensions for which
the FFT is calculated for the routines in each row:
One-dimensional (single) calculates one FFT in one dimension.
One-dimensional (multiple) calculates an FFT in one dimension
for each column of a two-dimensional matrix.
Two-dimensional calculates one FFT in two dimensions.
Three-dimensional calculates one FFT in three dimensions.
---------------------------------------------------------------------------
1-dimensional 1-dimensional 2-dimensional 3-dimensional
(single) (multiple)
---------------------------------------------------------------------------
C->C CCFFT CCFFTM CCFFTMR CCFFT2D CCFFT3D
Z->Z ZZFFT ZZFFTM ZZFFTMR ZZFFT2D ZZFFT3D
S->C SCFFT SCFFTM SCFFT2D SCFFT3D
D->Z DZFFT DZFFTM DZFFT2D DZFFT3D
C->S CSFFT CSFFTM CSFFT2D CSFFT3D
Z->D ZDFFT ZDFFTM ZDFFT2D ZDFFT3D
--------------------------------------------------------------------------- |
Implementation Notes: work and
table arrays
The FFT routines were designed so that they
can be implemented efficiently on many different architectures. The calling
sequence is the same in any implementation. Certain details, however,
depend on the particular implementation.
One area of difference is the size of the table
and work arrays. Different systems may need
different sizes. The subroutine call requires no change, but you may have
to change array sizes in the DIMENSION or
type statements that declare the arrays. The following are the
required array sizes for the Origin series (the values of
NR and NFR are explained later):
CCFFT table: 2n + NF REAL
work: 2n REAL WORDS |
ZZFFT table: 2n + NF DBL PREC WORDS
work: 2n DBL PREC WORDS |
CCFFTMR table: 2n + NF REAL WORDS
work: 2n REAL WORDS |
ZZFFTMR table: 2n + NF DBL PREC WORDS
work: 2n DBL PREC WORDS |
CCFFT2D table: (2*n1+NF) + (2*n2+NF) REAL WORDS
work: 2*MAX(n1,n2) REAL WORDS |
ZZFFT2D table: (2*n1+NF) + (2*n2+NF) DBL PREC WORDS
work: 2*MAX(n1,n2) DBL PREC WORDS |
CCFFT3D table: (2*n1+NF) + (2*n2+NF) + (2*n3+NF) REAL WORDS
work: 2*MAX(n1,n2,n3) REAL WORDS |
ZZFFT3D table: (2*n1+NF) + (2*n2+NF) + (2*n3+NF) DBL PREC WORDS
work: 2*MAX(n1,n2,n3) DBL PREC WORDS |
CCFFTM table: (NF + 2 * n) REAL
work: 2n REAL WORDS |
ZZFFTM
table: (NF + 2 * n) DBL PREC
work: 2n DBL PREC WORDS |
SCFFT, CSFFT table: (n+NFR) REAL
work: n+2 REAL WORDS |
DZFFT, ZDFFT table: (n+NFR) DBL PREC
work: n + 2 DBL PREC WORDS |
SCFFT2D, CSFFT2D table: (n+NFR) + (2*n2+NF) REAL
work: n1+4*n2 REAL WORDS |
DZFFT2D, ZDFFT2D table: (n1+NFR) + (2*n2+NF) DBL PREC WORDS
work: n1 + 4 * n2 DBL PREC WORDS |
SCFFT3D, CSFFT3D table: (n1+NFR) + (2*n2+NF) + (2*n3+NF) REAL WORDS
work: n1 + 4 * n3 REAL WORDS |
DZFFT3D, ZDFFT3D table: (n1+NFR) + (2*n2+NF) + (2*n3+NF) DBL PREC WORDS
work: n1 + 4 * n3 DBL PREC WORDS |
SCFFTM, CSFFTM table: (n+NFR) REAL WORDS
work: n + 2 REAL WORDS |
DZFFTM, ZDFFTM table: (n+NFR) DBL PREC
work: n + 2 DBL PREC WORDS |
Implementation Notes: isys Parameter
Array
The
second area of difference is the isys parameter array, an array that gives
certain implementation-specific information. All features and functions
of the FFT routines specific to any particular implementation are confined
to this isys array. On any implementation,
you can use the default values by using an argument value of 0.
In the Origin series implementation, isys
(0)=0 and isys(0)=1
are supported. In SCSL versions prior to 1.3, only isys
(0)=0 was allowed. For isys
(0)=0, NF=30
and NFR=15, and for
isys(0)=1, NF=NFR
=256. The NF(R) words
of storage in the table array contain a factorization
of the length of the transform.
The smaller values of NF and
NFR for isys(0)=0 are historical.
They are too small to store all the required factors for the highest performing
FFT, so when isys(0)=0, extra space is allocated
when the table array is initialized. To avoid
memory leaks, this extra space must be deallocated when the
table array is no longer needed. The routines
CCFFTF, CCFFTMF, etc., are used to release
this memory. Due to the potential for memory leaks, the use of
isys(0)=0 should be avoided.
For isys(0)=1, the
values of NF and NFR
are large enough so that no extra memory needs to be allocated, and there
is no need to call CCFFTF, etc. to release memory.
(If called, these routines do nothing.) isys(0)
= 1 means that isys
is an integer array with two elements. The second element,
isys(1), will not be accessed.
Implementation Notes: Scratch Space
Finally,
in addition to the work array, the FFT routines
also dynamically allocate scratch space from the stack. The amount of
space allocated can be slightly bigger than the size of the largest processor
cache. For single processor runs, the default stack size is large enough
that these allocations generally cause no problems. But for parallel runs,
you need to ensure that the stack size of slave threads is big enough
to hold this scratch space. Failure to reserve sufficient stack space
will cause programs to dump core due to stack overflows. The stack size
of MP library slave threads is controlled via the MP_SLAVE_STACKSIZE
environment variable or the mp_set_slave_stacksize()
library routine. See the mp(3c), mp(3F) and pe_environ(5)man
pages for more information on controlling the slave stack size.
For pthreads applications, the thread's stack size is specified
as one of many creation attributes provided in the pthread_attr_t
argument to pthread_create(3P).
The stacksize attribute should be set explicitly to a non-default value
using the pthread_attr_setstacksize(3P)
call, described in the pthread_attr_init(3P)
man page.
Convolution and Correlation Routines
The convolution routines feature
convolution for Finite Impulse Response (FIR) as well as correlations.
Each routine is highly optimized for single-processor use. The routines
which use two-dimensional input sequences are multitasked (multi-threaded).
The convolution and correlation routines are very general. To achieve
this generality and maximum flexibility, one-dimensional sequences are
defined by 3 parameters. Six parameters are necessary for two-dimensional
sequences. One drawback of this generality are the long calling sequences.
The following table contains a summary of the filter and correlation
routines. In this table, rows of the table represent data types for the
routines in each column: C implies 32-bit complex data.
Z implies 64-bit double complex data
S implies 32-bit real data.
D implies 64-bit double precision real
data.
Columns of the table represent the type of computation as well as
the number of dimensions for which the convolution or correlation is calculated
for the routines in each row:
One-dimensional FIR applies a Finite Impulse Response
filter to one-dimensional signals.
One-dimensional (multiple) FIR applies a Finite Impulse
Response filter to multiple one-dimensional signals.
Two-dimensional FIR applies a Finite Impulse Response
filter to two-dimensional signals.
One-dimensional COR calculates the correlation of one-dimensional
sequences.
One-dimensional (multiple) COR calculates the correlation
of multiple one-dimensional sequences.
Two-dimensional COR calculates the correlation of two-dimensional
sequences.
------------------------------------------------------------
Type 1D (single) 1D (multiple) 2D
------------------------------------------------------------
C CFIR1D CFIRM1D CFIR2D
Z ZFIR1D ZFIRM1D ZFIR2D
S SFIR1D SFIRM1D SFIR2D
D DFIR1D DFIRM1D DFIR2D
------------------------------------------------------------
C CCOR1D CCORM1D CCOR2D
Z ZCOR1D ZCORM1D ZCOR2D
S SCOR1D SCORM1D SCOR2D
D DCOR1D DCORM1D DCOR2D
------------------------------------------------------------ |
SCSL User's Guide
(document number: 007-4325-001 / published: 2003-12-30)
table of contents | additional info | download
Front Matter
About This Guide
Chapter 1. Introduction
Chapter 2. Basic Linear Algebra Subprogram (BLAS) Routines
Chapter 3. LAPACK
Chapter 4. Using Sparse Linear Equation Solvers
Chapter 5. Signal Processing Routines
Appendix A. Supported SCSL Routines
Glossary
Index
home/search |
what's new |
help
|