SGI Techpubs Library

Linux  »  Books  »  End-User  »  
SCSL User's Guide
(document number: 007-4325-001 / published: 2003-12-30)    table of contents  |  additional info  |  download
find in page | jump to first hit | clear highlight

Chapter 5. Signal Processing Routines

SCSL provides three types of signal processing routines:

  • Fast Fourier Transform (FFT) routines

  • Convolution routines

  • Correlation routines

See the release notes that are provided with your system for details about converting FFT routines for CHALLENGERcomplib to SCSL.

FFT Routines

The FFT routines have been highly optimized for single-processor use. The two-dimensional, three-dimensional, and one-dimentional multiple routines are also multitasked (multithreaded) for all sizes for which there is a performance benefit. The one-dimensional routines are multitasked if the data size exceeds the size of the largest processor cache. Each routine can compute either a forward or an inverse Fourier transform.

Data Types

The following data types are used in these routines:

  • Single precision: Fortran "real" data type, C/C++ "float" data type, 32-bit floating point; these routine names begin with S.

  • Single precision complex: Fortran "complex" data type, C/C++ "scsl_complex" data type (defined in <scsl_fft.h>), C++ STL "complex<float>" data type (defined in <complex.h> , two 32-bit floating point reals; these routine names begin with C.

  • Double precision: Fortran "double precision" data type, C/C++ "double" data type, 64-bit floating point; these routine names begin with D.

  • Double precision complex: Fortran "double complex" data type, C/C++ “scsl_zomplex" data type (defined in <scsl_fft.h> ), C++ STL "complex<double>" data type (defined in <complex.h>), two 64-bit floating point doubles; these routine names begin with Z.

When using the C++ Standard Template Library (STL) to define complex types, the include files must be used in the following order:

#include <complex.h>
#include <scsl_fft.h>

Implementation Details

Often little or no difference exists between these versions, other than the data types of some inputs and outputs. In this case, the routines are described on the same man page, and that man page is named after the real or complex routine.

The man(1) command can find a man page online by either the real, complex, double precision, or double complex name.

The data types for the scale, table, and work arguments in these routines vary, depending on the function which is called. In the CC, SC, and CS routines, the arguments are single precision. In the ZZ, DZ and ZD routines, the arguments are double precision.

By default, the integer arguments are 4 bytes (32 bits) in size; this is the size obtained when one links to the SCSL library with -lscs or -lscs_mp. Another version of SCSL is available, however, in which integers are 8 bytes (64 bits). This version allows the user access to larger memory sizes and helps when porting legacy codes. It can be loaded by using either the -lscs_i8 or -lscs_i8_mp link option. Note that any program may use only one of the two versions; 4-byte integer and 8-byte integer library calls cannot be mixed.

C/C++ function prototypes for the signal processing routines are provided in <scsl_fft.h>, when using the default 4-byte integers, and <scsl_fft_i8.h> when using 8-byte integers. These header files define the complex types scsl_complex and scsl_zomplex, which are used in the prototypes. Alternatively, C++ programs may declare arguments using the types complex<float> and complex<double> from the standard template library (STL). But if these types are used, <complex.h> must be included before <scsl_fft.h> (or <scsl_fft_i8.h>). Note, though, that both complex types are equivalent: they simply represent (real, imaginary) pairs of floating point numbers stored contiguously in memory. With the proper casts, you can simply pass arrays of floating point data to the routines where complex arguments are expected.

Casts, however, can be avoided. The header files <scsl_fft.h> and <scsl_fft_i8.h> directly support the use of user-defined complex types or disabling prototype checking for complex arguments completely. By defining the symbol SCSL_VOID_ARGS before including <scsl_fft.h> or <scsl_fft_i8.h> all complex arguments will be prototyped as void *. To define the symbol SCSL_VOID_ARGS at compile time use the -D compiler option (for example, -DSCSL_VOID_ARGS) or use an explicit #define SCSL_VOID_ARGS in the source code. This allows the use of any complex data structure without warnings from the compiler, provided the structure is as described above:

  1. The real and imaginary components must be contiguous in memory.

  2. Sequential array elements must also be contiguous in memory.

While this allows the use of non-standard complex types without generating compiler warnings, it has the disadvantage that the compiler will not catch type mismatches.

Strong type checking can be enabled employing user-defined complex types instead of SCSL's standard complex types. To do this, define SCSL_USER_COMPLEX_T=my_complex and SCSL_USER_ZOMPLEX_T=my_zomplex, where my_complex and my_zomplex are the names of user-defined complex types. These complex types must be defined before including the <scsl_fft.h> (or <scsl_fft_i8.h>) header file.

Fortran 90 users on IRIX systems can perform compile-time checking of SCSL FFT subroutine calls by adding USE SCSL_FFT (for 4-byte integer arguments) or USE SCSL_FFT_I8 (for 8-byte integer arguments) to the source code from which the FFT calls are made. Alternatively, the compile-time checking can be invoked without any source code modifications by using the -auto_use compiler option. For example:

% f90 -auto_use SCSL_FFT test.f -lscs
% f90 -auto_use SCSL_FFT_I8 -i8 test.f -lscs_i8

Supported Routines

The following list describes the supported FFT routines. Each of these routines is highly optimized for single-processor use. The two-dimensional, three-dimensional, and one-dimensional multiple routines are alsomultitasked (multi-threaded) for all sizes for which there is a performance benefit; the one-dimensional routines are multitasked if the data size exceeds the size of the largest processor cache. Each routine can compute either a forward or an inverse Fourier transform. Rows of the table represent input and output data typesfor the routines in each column:

  • C->C implies 32-bit complex input and output.

  • Z->Z implies 64-bit double complex input and 64-bit double complex output. Each routine in this row is documented with the complex routine in the above row.

  • S->C implies 32-bit real input and 32-bit complex output.

  • D->Z implies 64-bit double precision real input and 64-bit double precision complex output. Each routine in this row is documented with the real-to-complex routine in the above row.

  • C->S implies 32-bit complex input and 32-bit real output.

  • Z->D implies 64-bit double complex input and 64-bit double precision output. Each routine named in this row is documented with the complex-real routine in the above row.

Columns of the table represent the number of dimensions for which the FFT is calculated for the routines in each row:

  • One-dimensional (single) calculates one FFT in one dimension.

  • One-dimensional (multiple) calculates an FFT in one dimension for each column of a two-dimensional matrix.

  • Two-dimensional calculates one FFT in two dimensions.

  • Three-dimensional calculates one FFT in three dimensions.

     ---------------------------------------------------------------------------
             1-dimensional        1-dimensional       2-dimensional 3-dimensional
               (single)            (multiple)
     ---------------------------------------------------------------------------
     C->C       CCFFT         CCFFTM       CCFFTMR       CCFFT2D      CCFFT3D

     Z->Z       ZZFFT         ZZFFTM       ZZFFTMR       ZZFFT2D      ZZFFT3D

     S->C       SCFFT         SCFFTM                     SCFFT2D      SCFFT3D

     D->Z       DZFFT         DZFFTM                     DZFFT2D      DZFFT3D

     C->S       CSFFT         CSFFTM                     CSFFT2D      CSFFT3D

     Z->D       ZDFFT         ZDFFTM                     ZDFFT2D      ZDFFT3D
     ---------------------------------------------------------------------------

Implementation Notes: work and table arrays

The FFT routines were designed so that they can be implemented efficiently on many different architectures. The calling sequence is the same in any implementation. Certain details, however, depend on the particular implementation.

One area of difference is the size of the table and work arrays. Different systems may need different sizes. The subroutine call requires no change, but you may have to change array sizes in the DIMENSION or type statements that declare the arrays. The following are the required array sizes for the Origin series (the values of NR and NFR are explained later):

  • CCFFT

    table: 2n + NF REAL
    work: 2n REAL WORDS

  • ZZFFT

    table: 2n + NF DBL PREC WORDS
    work: 2n DBL PREC WORDS

  • CCFFTMR

    table: 2n + NF REAL WORDS
    work: 2n REAL WORDS

  • ZZFFTMR

    table: 2n + NF DBL PREC WORDS 
    work: 2n DBL PREC WORDS

  • CCFFT2D

    table: (2*n1+NF) + (2*n2+NF) REAL WORDS
    work: 2*MAX(n1,n2) REAL WORDS

  • ZZFFT2D

    table: (2*n1+NF) + (2*n2+NF) DBL PREC WORDS 
    work: 2*MAX(n1,n2) DBL PREC WORDS

  • CCFFT3D

    table: (2*n1+NF) + (2*n2+NF) + (2*n3+NF) REAL WORDS
    work: 2*MAX(n1,n2,n3) REAL WORDS

  • ZZFFT3D

    table: (2*n1+NF) + (2*n2+NF) + (2*n3+NF) DBL PREC WORDS
    work:  2*MAX(n1,n2,n3) DBL PREC WORDS

  • CCFFTM

    table: (NF + 2 * n) REAL
    work: 2n REAL WORDS

  • ZZFFTM

    table: (NF + 2 * n) DBL PREC
    work: 2n DBL PREC WORDS

  • SCFFT, CSFFT

    table: (n+NFR) REAL
    work: n+2 REAL WORDS

  • DZFFT, ZDFFT

    table: (n+NFR) DBL PREC
    work: n + 2 DBL PREC WORDS

  • SCFFT2D, CSFFT2D

    table: (n+NFR) + (2*n2+NF) REAL
    work: n1+4*n2 REAL WORDS

  • DZFFT2D, ZDFFT2D

    table: (n1+NFR) + (2*n2+NF) DBL PREC WORDS
    work: n1 + 4 * n2 DBL PREC WORDS

  • SCFFT3D, CSFFT3D

    table: (n1+NFR) + (2*n2+NF) + (2*n3+NF) REAL WORDS
    work: n1 + 4 * n3 REAL WORDS

  • DZFFT3D, ZDFFT3D

    table: (n1+NFR) + (2*n2+NF) + (2*n3+NF) DBL PREC WORDS
    work: n1 + 4 * n3 DBL PREC WORDS

  • SCFFTM, CSFFTM

    table: (n+NFR) REAL WORDS
    work: n + 2 REAL WORDS

  • DZFFTM, ZDFFTM

    table: (n+NFR) DBL PREC
    work: n + 2 DBL PREC WORDS

Implementation Notes: isys Parameter Array

The second area of difference is the isys parameter array, an array that gives certain implementation-specific information. All features and functions of the FFT routines specific to any particular implementation are confined to this isys array. On any implementation, you can use the default values by using an argument value of 0.

In the Origin series implementation, isys (0)=0 and isys(0)=1 are supported. In SCSL versions prior to 1.3, only isys (0)=0 was allowed. For isys (0)=0, NF=30 and NFR=15, and for isys(0)=1, NF=NFR =256. The NF(R) words of storage in the table array contain a factorization of the length of the transform.

The smaller values of NF and NFR for isys(0)=0 are historical. They are too small to store all the required factors for the highest performing FFT, so when isys(0)=0, extra space is allocated when the table array is initialized. To avoid memory leaks, this extra space must be deallocated when the table array is no longer needed. The routines CCFFTF, CCFFTMF, etc., are used to release this memory. Due to the potential for memory leaks, the use of isys(0)=0 should be avoided.

For isys(0)=1, the values of NF and NFR are large enough so that no extra memory needs to be allocated, and there is no need to call CCFFTF, etc. to release memory. (If called, these routines do nothing.) isys(0) = 1 means that isys is an integer array with two elements. The second element, isys(1), will not be accessed.

Implementation Notes: Scratch Space

Finally, in addition to the work array, the FFT routines also dynamically allocate scratch space from the stack. The amount of space allocated can be slightly bigger than the size of the largest processor cache. For single processor runs, the default stack size is large enough that these allocations generally cause no problems. But for parallel runs, you need to ensure that the stack size of slave threads is big enough to hold this scratch space. Failure to reserve sufficient stack space will cause programs to dump core due to stack overflows. The stack size of MP library slave threads is controlled via the MP_SLAVE_STACKSIZE environment variable or the mp_set_slave_stacksize() library routine. See the mp(3c), mp(3F) and pe_environ(5)man pages for more information on controlling the slave stack size.

For pthreads applications, the thread's stack size is specified as one of many creation attributes provided in the pthread_attr_t argument to pthread_create(3P). The stacksize attribute should be set explicitly to a non-default value using the pthread_attr_setstacksize(3P) call, described in the pthread_attr_init(3P) man page.

Convolution and Correlation Routines

The convolution routines feature convolution for Finite Impulse Response (FIR) as well as correlations. Each routine is highly optimized for single-processor use. The routines which use two-dimensional input sequences are multitasked (multi-threaded).

The convolution and correlation routines are very general. To achieve this generality and maximum flexibility, one-dimensional sequences are defined by 3 parameters. Six parameters are necessary for two-dimensional sequences. One drawback of this generality are the long calling sequences.

The following table contains a summary of the filter and correlation routines. In this table, rows of the table represent data types for the routines in each column:

  • C implies 32-bit complex data.

  • Z implies 64-bit double complex data

  • S implies 32-bit real data.

  • D implies 64-bit double precision real data.

Columns of the table represent the type of computation as well as the number of dimensions for which the convolution or correlation is calculated for the routines in each row:

  • One-dimensional FIR applies a Finite Impulse Response filter to one-dimensional signals.

  • One-dimensional (multiple) FIR applies a Finite Impulse Response filter to multiple one-dimensional signals.

  • Two-dimensional FIR applies a Finite Impulse Response filter to two-dimensional signals.

  • One-dimensional COR calculates the correlation of one-dimensional sequences.

  • One-dimensional (multiple) COR calculates the correlation of multiple one-dimensional sequences.

  • Two-dimensional COR calculates the correlation of two-dimensional sequences.

     ------------------------------------------------------------
          Type       1D (single)    1D (multiple)         2D
     ------------------------------------------------------------
           C            CFIR1D         CFIRM1D          CFIR2D
           Z            ZFIR1D         ZFIRM1D          ZFIR2D
           S            SFIR1D         SFIRM1D          SFIR2D
           D            DFIR1D         DFIRM1D          DFIR2D
     ------------------------------------------------------------
           C            CCOR1D         CCORM1D          CCOR2D
           Z            ZCOR1D         ZCORM1D          ZCOR2D
           S            SCOR1D         SCORM1D          SCOR2D
           D            DCOR1D         DCORM1D          DCOR2D
     ------------------------------------------------------------

SCSL User's Guide
(document number: 007-4325-001 / published: 2003-12-30)    table of contents  |  additional info  |  download

    Front Matter
    About This Guide
    Chapter 1. Introduction
    Chapter 2. Basic Linear Algebra Subprogram (BLAS) Routines
    Chapter 3. LAPACK
    Chapter 4. Using Sparse Linear Equation Solvers
    Chapter 5. Signal Processing Routines
    Appendix A. Supported SCSL Routines
    Glossary
    Index


home/search | what's new | help