Origin 2000 and Onyx2 Performance Tuning and Optimization Guide
(document number: 007-3430-003 / published: 2001-08-02)

List of Examples

Example 1-1. Parallel Code Using Directives for Simple Scheduling
Example 1-2. Parallel Code Using Directives for Dynamic Scheduling
Example 4-1. Experimenting with perfex
Example 4-2. Output of perfex -a
Example 4-3. Output of perfex -a -y
Example 4-4. Performing an ssrun Experiment
Example 4-5. Example Run of ssruno
Example 4-6. Default prof Report from ssrun Experiment
Example 4-7. Profile at the Source Line Level Using prof -heavy
Example 4-8. Ideal Time Profile Run
Example 4-9. Default Report of Ideal Time Profile
Example 4-10. Ideal Time Report Truncated with -quit
Example 4-11. Ideal Time Report by Lines
Example 4-12. Ideal Time Profile Using -lines and -only Options
Example 4-13. Ideal Time Architecture Information Report
Example 4-14. Extract from a Butterfly Report
Example 4-15. Usertime Call Hierarchy
Example 4-16. Application of dprof
Example 4-17. Example of Default dlook Output
Example 5-1. Simple Summation Loop
Example 5-2. Unrolled Summation Loop
Example 5-3. Basic DAXPY Loop
Example 5-4. Unrolled DAXPY Loop
Example 5-5. Compiler-Generated DAXPY Schedule
Example 5-6. Basic DAXPY Loop Code
Example 5-7. Sample Software Pipeline Report Card
Example 5-8. C Implementation of DAXPY Loop
Example 5-9. SWP Report Card for C Loop with Default Alias Model
Example 5-10. SWP Report Card for C Loop with Alias=Restrict
Example 5-11. C Loop Nest on Multidimensional Array
Example 5-12. SWP Report Card for Stencil Loop with Alias=Restrict (output obtained using version 7.2.1.3m compiler)
Example 5-13. SWP Report Card for Stencil Loop with Alias=Disjoint (output obtained using version 7.3 compiler)
Example 5-14. Indirect DAXPY Loop
Example 5-15. SWP Report Card on Indirect DAXPY (7.2.1.3 compiler)
Example 5-16. Indirect DAXPY in Fortran with ivdep
Example 5-17. Indirect DAXPY in C with ivdep
Example 5-18. SWP Report Card for Indirect DAXPY with ivdep
Example 5-19. Loop with Two Types of Dependency
Example 5-20. C Loop with Obvious Loop-Carried Dependence
Example 5-21. C Loop with Lexically-Forward Dependency
Example 5-22. C Loop Test Using Dereferenced Pointer
Example 5-23. C Loop Test Using Local Copy of Dereferenced Pointer
Example 5-24. C Loop with Disguised Invariants
Example 5-25. SWP Report Card for Loop with Disguised Invariance
Example 5-26. C Loop with Invariants Exposed
Example 5-27. SWP Report Card for Modified Loop
Example 5-28. Conventional Code to Avoid an Exception
Example 5-29. Speculative Equivalent Permitting an Exception
Example 5-30. Code Suitable for Inlining
Example 5-31. Subroutine Candidates for Inlining
Example 5-32. Inlined Code from w2f File
Example 6-1. Simple Loop Nest with Poor Cache Use
Example 6-2. Reversing Loop Nest to Achieve Stride-One Access
Example 6-3. Loop Using Three Vectors
Example 6-4. Three Vectors Combined in an Array
Example 6-5. Fortran Code That May Cause Thrashing
Example 6-6. Perfex Data for adi2.f based on 250 MHz IP27 MIPS R10000 CPU
Example 6-7. Perfex Data for adi5.f based on 250 MHz IP27 MIPS R10000 CPU
Example 6-8. Perfex Data for adi53.f based on 250 MHz IP27 MIPS R10000 CPU
Example 6-9. Sequence of DAXPY and Dot-Product on a Single Vector
Example 6-10. DAXPY and Dot-Product Loops Fused
Example 6-11. Matrix Multiplication Loop
Example 7-1. Matrix Multiplication Subroutine
Example 7-2. SWP Report Card for Matrix Multiplication
Example 7-3. Matrix Multiplication Unrolled on Outer Loop
Example 7-4. Matrix Multiplication Unrolled on Middle Loop
Example 7-5. Matrix Multiplication Unrolled on Outer and Middle Loops
Example 7-6. Simple Loop Nest with Poor Cache Use
Example 7-7. Simple Loop Nest Interchanged for Stride-1 Access
Example 7-8. Loop Nest with Data Recursion
Example 7-9. Recursive Loop Nest Interchanged and Unrolled
Example 7-10. Matrix Multiplication in C
Example 7-11. Cache-Blocked Matrix Multiplication
Example 7-12. Fortran Nest with Explicit Cache Block Sizes for Middle and Inner Loops
Example 7-13. Fortran Loop with Explicit Cache Block Sizes and Interchange
Example 7-14. Transformed Fortran Loop
Example 7-15. Adjacent Loops That Cannot Be Fused
Example 7-16. Adjacent Loops Fused After Peeling
Example 7-17. Sketch of a Loop with a Long Body
Example 7-18. Sketch of a Loop After Fission
Example 7-19. Loop Nest that Cannot Be Interchanged
Example 7-20. Loop Nest After Fission and Interchange
Example 7-21. Simple Reduction Loop Needing Prefetch
Example 7-22. Simple Reduction Loop with Prefetch
Example 7-23. Reduction with Conditional Prefetch
Example 7-24. Reduction with Prefetch Unrolled Once
Example 7-25. Reduction Loop Unrolled with Two-Ahead Prefetch
Example 7-26. Reduction Loop Unrolled Four Times
Example 7-27. Fortran Use of Manual Prefetch
Example 7-28. Typical Fortran Declaration of Local Arrays
Example 7-29. Common, Improper Fortran Practice
Example 7-30. Fortran Loop to which Gather-Scatter Is Applicable
Example 7-31. Fortran Loop with Gather-Scatter Applied
Example 7-32. Fortran Loop That Processes a Vector
Example 7-33. Fortran Loop Transformed to Vector Intrinsic Call
Example 8-1. Typical C Loop
Example 8-2. Amdahl's Law: Speedup(n) Given p
Example 8-3. Amdahl's Law: p Given Speedup(2)
Example 8-4. Amdahl's Law: p Given Speedup(n) and Speedup(m)
Example 8-5. Fortran Loop with False Sharing
Example 8-6. Fortran Loop with False Sharing Removed
Example 8-7. Easily Parallelized Fortran Vector Routine
Example 8-8. Fortran Vector Operation, Parallelized
Example 8-9. Fortran Vector Operation with Distribution Directives
Example 8-10. Parallel Loop with Affinity in Data
Example 8-11. Parallel Loop with Affinity in Threads
Example 8-12. Loop Parallelized with the NEST Clause
Example 8-13. Loop Parallelized with NEST Clause with Data Affinity
Example 8-14. Loop Parallelized with NEST, AFFINITY, and ONTO
Example 8-15. Fortran Code for Explicit Page Placement
Example 8-16. Declarations Using the Distribute_Reshape Directive
Example 8-17. Valid and Invalid Use of Reshaped Array
Example 8-18. Corrected Use of Reshaped Array
Example 8-19. Gathering Reshaped Data with Copying
Example 8-20. Gathering Reshaped Data with Cache-Friendly Copying
Example 8-21. Reshaped Array as Actual Parameter—Valid
Example 8-22. Reshaped Array as Actual Parameter—Invalid
Example 8-23. Differently Reshaped Arrays as Actual Parameters
Example 8-24. Typical Output of _DSM_VERBOSE
Example 8-25. Test Placement Display from First-Touch Allocation
Example 8-26. Test Placement Display from Round-Robin Placement
Example 8-27. Scalable Placement File
Example 8-28. Scalable Placement File for Two Threads Per Memory
Example 8-29. Various Ways of Distributing Threads to Memories
Example 8-30. Calling dplace Dynamically from Fortran
Example 8-31. Using a Script to Capture Redirected Output from an MPI Job
Example A-1. Naive Function to Find Nearest Point
Example A-2. Nearest-Point Function with Short-Circuit Test
Example C-1. Program adi2.f
Example C-2. Program adi5.f
Example C-3. Program adi53.f
Example C-4. Basic Makefile
Example C-5. Shell Script swplist
Example C-6. SpeedShop Experiment Script ssruno
Example C-7. Awk Script to Analyze Output of perfex -a
Example C-8. Awk Script to Extrapolate Amdahl's Law from Measured Times
Example C-9. Routine va2pa() Returns the Physical Page of a Virtual Address
Example C-10. Routine cpuclock() Gets the Clock Speed from the Hardware Inventory
