SGI Techpubs Library

Origin 2000 and Onyx2 Performance Tuning and Optimization Guide
(document number: 007-3430-003 / published: 2001-08-02)

List of Figures

Figure 1-1. Block Diagrams of 4-, 8-, 16-, 32-, and 64-CPU SN0 Systems
Figure 1-2. Block Diagram and Approximate Appearance of a Node Board
Figure 1-3. Block Diagram of Memory, Hub, and Cache Directory
Figure 1-4. XIO and XBOW Provide I/O Attachment to a Node
Figure 2-1. Program Address Space Versus SN0 Architecture
Figure 2-2. Parallel Process Memory Access Pattern Versus SN0 Architecture
Figure 2-3. Parallel Processes at Opposite Corners of SN0 System
Figure 2-4. Parallel Processes and Memory at Bad Locations
Figure 2-5. Parallel Program Ideally Placed in SN0 System
Figure 2-6. Parallel Program Mapped to a Pair of MLDs
Figure 2-7. Parallel Program Mapped to an MLD Set with Hypercube Topology and Affinity to a Graphics Device
Figure 2-8. Parallel Program Mapped through MLDs to Hardware
Figure 4-1. Code Residence Versus Data Reference
Figure 4-2. On-Screen Plot of dprof Output
Figure 4-3. dprof Output for a Program with Poor Memory
Figure 4-4. dprof Output for a Program with Good Memory
Figure 5-1. DAXPY Software Pipeline Schedule
Figure 5-2. Inlining and the Call Hierarchy
Figure 6-1. Processing Directions in adi2.f
Figure 6-2. Memory Use in Matrix Multiply
Figure 6-3. Cache Blocking of Matrix Multiplication
Figure 6-4. Schematic of Data Motion in Radix-2 Fast Fourier Transform
Figure 7-1. Table of Loop-Unrolling Parameters for Matrix Multiply
Figure 7-2. Performance of Vector Intrinsic Functions on an Origin 2000
Figure 8-1. Possible Speedup for Different Values of p
Figure 8-2. Performance of Weather Model Before and After Tuning
Figure 8-3. Calculated Bandwidth for Different Placement Policies
Figure 8-4. Calculated Iteration Times for Different Placement Policies
Figure 8-5. Cumulative Run Time for Different Placement Policies
Figure 8-6. Effect of Migration Level on Iteration Time
Figure 8-7. Effect of Page Granularity in First-Touch Allocation
Figure 8-8. Data Partition for NAS FT Kernel
Figure 8-9. NAS FT Kernel Data Redistributed
Figure 8-10. Some Possible Regular Distributions for Four Processors
Figure 8-11. Possible Outcomes of Distribute ONTO Clause
Figure 8-12. Reshaped Distribution of Three-Dimensional Array for Four CPUs
Figure 8-13. Copying By Cache Lines for Summation
Figure 8-14. Placement File and its Results

    Front Matter
    About This Guide
    Chapter 1. Understanding SN0 Architecture
    Chapter 2. SN0 Memory Management
    Chapter 3. Tuning for a Single Process
    Chapter 4. Profiling and Analyzing Program Behavior
    Chapter 5. Using Basic Compiler Optimizations
    Chapter 6. Optimizing Cache Utilization
    Chapter 7. Using Loop Nest Optimization
    Chapter 8. Tuning for Parallel Processing
    Appendix A. Bentley's Rules Updated
    Appendix B. R10000 Counter Event Types
    Appendix C. Useful Scripts and Code
    Glossary
    Index