OpenGL Optimizer Programmer's Guide: An Open API for Large-Model Visualization
(document number: 007-2852-002 / published: 1998-06-09)
Chapter 5. Culling Unneeded Objects From the Scene Graph
With one exception, the tools discussed in this chapter reduce the number of objects and vertices submitted to OpenGL processing. The tools cull unnecessary objects from the scene graph before a draw traversal. The effect of these tools is to reduce the load on at least one of the transformation, rasterization, and display stages of the graphics pipeline.
The following list shows the main culling topics discussed in this chapter:
View-frustum culling identifies csGeoSets in a scene graph whose geometry is not in the viewing frustum, and prevents their further processing in the graphics pipeline, clearly a potential benefit for all downstream resources.
Cosmo3D provides integrated, hierarchical view-frustum culling, which runs as part of the rendering process. OpenGL Optimizer provides an additional method for multiprocess view-frustum culling as part of the occlusion culler discussed in the next section.
When to Use View-Frustum Culling
View-frustum culling is beneficial if the viewpoint is near or inside a complex scene where much of the scene is outside the viewing frustum, for example, during a walkthrough of a building.
A view-frustum test is not helpful if the scene fits in the viewing frustum, for example, when you view an entire building from outside. The hierarchical containment test used to implement view frustum culling in Cosmo3D ensures that unneeded processing is avoided in such “all-visible” cases by detecting geometry that is completely within the culling frustum and skipping subordinate frustum tests.
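The hierarchical containment test can be sketched roughly as follows. This is an illustrative sketch only, not Cosmo3D code: the names `Plane`, `Sphere`, `Node`, `classify`, and `cull` are hypothetical, bounding spheres stand in for whatever bounding volumes Cosmo3D actually uses, and leaf nodes stand in for csGeoSets. The point it shows is that once a node is found to be completely inside the frustum, its descendants are accepted without further tests.

```cpp
#include <cassert>
#include <vector>

// Plane with inward-facing unit normal: points inside satisfy n.p + d >= 0.
struct Plane  { float nx, ny, nz, d; };
struct Sphere { float x, y, z, r; };

enum class Containment { Outside, Intersects, Inside };

// Classify a bounding sphere against a convex frustum given as planes.
Containment classify(const Sphere& s, const std::vector<Plane>& frustum) {
    assert(s.r >= 0.0f);
    bool fullyInside = true;
    for (const Plane& p : frustum) {
        float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.r) return Containment::Outside;  // entirely behind one plane
        if (dist < s.r)  fullyInside = false;          // straddles this plane
    }
    return fullyInside ? Containment::Inside : Containment::Intersects;
}

// Hypothetical scene-graph node; a node without children plays the csGeoSet role.
struct Node {
    Sphere bound;
    std::vector<Node> children;
    int triangles;
};

struct CullStats { int tests = 0; int trianglesDrawn = 0; };

// Recursive cull: skip frustum tests below any node that is fully inside.
void cull(const Node& n, const std::vector<Plane>& frustum,
          bool parentInside, CullStats& st) {
    bool inside = parentInside;
    if (!parentInside) {
        ++st.tests;
        Containment c = classify(n.bound, frustum);
        if (c == Containment::Outside) return;         // prune the whole subtree
        inside = (c == Containment::Inside);
    }
    if (n.children.empty()) { st.trianglesDrawn += n.triangles; return; }
    for (const Node& child : n.children) cull(child, frustum, inside, st);
}
```

In the "all-visible" case the root is classified as `Inside`, so the traversal performs a single test no matter how many nodes lie below it.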
View-Frustum Culling and Pipeline Load Balancing
View-frustum culling usually reduces the work done by the graphics hardware. But it may either increase or decrease the load on the host, depending on whether the time needed to perform the cull tests is greater or smaller than the time saved by eliminating pieces of the scene graph from a draw traversal.
For few csGeoSets with many triangles, a view-frustum cull is quite fast on the host, but unneeded triangles slow down the graphics hardware.
Many smaller csGeoSets with few triangles result in more precise culling and fewer unneeded triangles sent to the graphics hardware, because a larger fraction of the member triangles are likely to intersect the view frustum. However, the cost is a larger number of intersection tests.
For optimal performance, adjust the csGeoSet size to balance the time spent intersection testing with the time spent transforming off-screen triangles.
If the host is a bottleneck, send more triangles to the rendering hardware.
If the rendering hardware is the bottleneck, more precise culling might be a good use for the free CPU cycles.
You can use OpenGL Optimizer spatialization tools to control the average number of triangles in a csGeoSet (see Chapter 6, “Organizing the Scene Graph Spatially”).
To illustrate the load balancing issues, consider viewing a lug nut of a car for the following two extreme scene graphs for rendering a car model:
One graph consists of one million triangles in one csGeoSet. No time would be spent on a view-frustum cull. When rendering a close-up view of the lug nut, all one million triangles pass through the graphics hardware, creating a transform bottleneck, because the few triangles making up the lug nut were in the viewing frustum.
A second graph for the same car consists of one million csGeoSets, each containing a single triangle. After a view-frustum cull, only the on-screen triangles go to the graphics hardware, minimizing its load. However, the view-frustum cull test would cause a host bottleneck.
Because databases and views are almost never at these extremes, view-frustum culling is beneficial in nearly all cases. Balancing the pipeline enhances the benefits.
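The trade-off in the lug-nut example can be made concrete with a toy cost model. The sketch below is an assumption-laden illustration, not a measurement: it assumes a flat (non-hierarchical) cull with one test per csGeoSet, triangles spread evenly over csGeoSets, visible triangles clustered in as few csGeoSets as possible, and host and graphics hardware running in parallel. The function name `estimate` and the per-test and per-triangle costs are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

struct FrameCost { double hostMs; double gfxMs; double frameMs; };

// Toy model: host time grows with the number of csGeoSets tested,
// graphics time grows with the number of triangles actually sent.
FrameCost estimate(long totalTris, long visibleTris, long geoSets,
                   double cullUsPerGeoSet, double drawUsPerTri) {
    assert(geoSets >= 1 && geoSets <= totalTris);
    long trisPerSet  = totalTris / geoSets;
    long setsTouched = (visibleTris + trisPerSet - 1) / trisPerSet;  // ceiling
    if (setsTouched < 1) setsTouched = 1;
    long trisSent = setsTouched * trisPerSet;       // whole csGeoSets are drawn
    double host = geoSets  * cullUsPerGeoSet / 1000.0;
    double gfx  = trisSent * drawUsPerTri   / 1000.0;
    return {host, gfx, std::max(host, gfx)};        // pipelined: slower stage wins
}
```

Evaluating the model for one csGeoSet, one million csGeoSets, and an intermediate granularity shows the graphics-bound and host-bound extremes and a shorter frame time in between; the real optimum depends on measured costs for your hardware and database.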
Occlusion culling identifies triangles in a scene graph that are occluded by objects in the foreground and prevents their further processing in the graphics pipeline.
Note: If you set opDrawAction::setVFCullMode to true, the occlusion culler performs view-frustum culling before occlusion culling to reduce the number of objects for which occlusion culling has to be performed.
You can control what counts as “occluded”: the occlusion culler lets you eliminate objects when a specified fraction of their bounding boxes is occluded by foreground objects. This partial-occlusion control lets you further reduce the load on the graphics pipeline; culling becomes more aggressive as you decrease the fraction, at the possible cost of eliminating partially visible objects.
The default fraction for culling, 100%, is conservative in that the occlusion culler never eliminates a visible triangle; however, it might not cull all occluded triangles. You can change the fraction according to your needs, and update it dynamically in response to graphics-pipeline load as a closed-loop frame rate control mechanism.
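The role of the fraction can be illustrated with a small decision function. This is a hedged sketch, not the occlusion culler's implementation: how the culler actually measures the occluded portion of a bounding box is not described here, so the sample counts and the name `shouldCull` are hypothetical.

```cpp
#include <cassert>

// Decide whether to cull an object, given how many of the samples taken on
// its bounding box were found occluded (the sampling scheme is an assumption).
bool shouldCull(int occludedSamples, int totalSamples, double requiredFraction) {
    assert(totalSamples > 0);
    assert(occludedSamples >= 0 && occludedSamples <= totalSamples);
    assert(requiredFraction > 0.0 && requiredFraction <= 1.0);
    return occludedSamples >= requiredFraction * totalSamples;
}
```

With `requiredFraction` at 1.0, the conservative default, an object is dropped only when every sample is occluded; lowering it to 0.75 would also drop an object whose bounding box is three-quarters hidden.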
Rendering occluded triangles does not generate an incorrect image because the depth buffer test eliminates occluded pixels, but that test occurs late in the rasterization stage after vertices have been transformed, so relying on depth-buffer testing for occlusion culling wastes graphics hardware processing cycles.
Just like view-frustum culling, occlusion culling is clearly a potential benefit for all downstream processing resources.
When to Use Occlusion Culling
Occlusion culling is appropriate for scenes with high depth complexity, that is, with many objects that may be occluded. For example, 95% of the triangles in a typical view of an automobile or other complicated mechanical assembly are occluded. Occlusion culling provides less of a benefit for scenes with less depth complexity. In a visual simulation application, where objects do not contain internal parts, more than half the triangles are commonly visible. In this case, an occlusion culler would have a much smaller effect on frame rate.
Figure 5-1 illustrates how view frustum and occlusion culling work together to greatly reduce the amount of geometry that needs to be rendered. This is the first step in high-fidelity, large-model visualization.
You can run the occlusion culler on multiple processes, and you can choose the number of processes. Even on a single-processor machine, you may benefit from using multiple processes because the host can cull while the OpenGL process is blocked, waiting for the graphics FIFO to unclog.
Occlusion Culling and Pipeline Load Balancing
If you set opDrawAction::setVFCullMode to true, the occlusion culler performs view frustum culling before occlusion culling. As a result, all view frustum performance characteristics also apply to occlusion culling.
If the time required for occlusion culling is greater than the rendering time saved, culling only moves a bottleneck to the host and increases the processing time of the graphics pipeline. If occlusion culling takes less time than drawing, you can use the extra time to eliminate more triangles from a scene graph, thus further reducing the load on the graphics hardware and shifting the balance of tasks in the graphics pipeline.
Note: You get lower-quality culling if a scene occupies only a portion of the total z-range of the depth buffer. For the best precision, set the z-clipping tightly around your scene.
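The effect of the z-clipping range on depth precision can be checked numerically. The sketch below uses the standard OpenGL perspective depth mapping, depth = f(z - n) / (z(f - n)) for eye distance z between near plane n and far plane f, and quantizes it to a fixed number of bits. The function names and the 16-bit depth used in the usage note are assumptions for illustration; the occlusion culler's own depth representation is not documented here.

```cpp
#include <cassert>
#include <cstdint>

// Window-space depth in [0, 1] for a point at eye distance dist,
// using the standard OpenGL perspective projection.
double windowDepth(double dist, double zNear, double zFar) {
    assert(zNear > 0.0 && zFar > zNear);
    assert(dist >= zNear && dist <= zFar);
    return zFar * (dist - zNear) / (dist * (zFar - zNear));
}

// Quantize a depth value to an integer depth-buffer value of the given width.
std::uint32_t quantize(double depth, int bits) {
    assert(bits > 0 && bits < 32);
    double maxValue = static_cast<double>((1u << bits) - 1u);
    return static_cast<std::uint32_t>(depth * maxValue);
}

// True if two surfaces at distances a and b land in different depth values.
bool distinguishable(double a, double b, double zNear, double zFar, int bits) {
    return quantize(windowDepth(a, zNear, zFar), bits) !=
           quantize(windowDepth(b, zNear, zFar), bits);
}
```

For example, with a 16-bit depth value, surfaces at distances 100 and 100.01 are distinguishable when the clipping planes are at 50 and 150, but collapse to the same depth value when the planes are at 0.01 and 100000.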
Spatialization to Balance Pipeline Load When Occlusion Culling
You can adjust the execution times of the host and the graphics hardware by controlling the number of triangles in each csGeoSet (see Chapter 6, “Organizing the Scene Graph Spatially”).
Coarser granularities, which are characterized by a few large csGeoSets, make culling run faster at the risk of drawing more occluded geometry.
Finer granularities give more precise culling at the cost of extra culling time.
The culler uses bounding boxes to determine whether a csGeoSet is occluded. Although it may increase the time spent culling, creating smaller csGeoSets with tighter bounding boxes may have a particularly dramatic impact on graphics hardware processing. For example, in many tightly packed mechanical assemblies, the corner of a bounding box may be visible even though its enclosed csGeoSet is fully occluded. In that case, the graphics hardware is engaged in an unproductive rendering task. In summary, long csGeoSets are bad; small rectangular ones are good.
Changing the Fraction of the Bounding Box Required for Elimination
You can dynamically shift the load between the host and the OpenGL pipeline by varying the fraction of a bounding box that must be occluded before it is eliminated from the pipeline: thus you can create a closed-loop frame rate control mechanism.
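One possible shape for such a closed-loop control is sketched below. It is a minimal illustration under stated assumptions: the class name `FractionController`, the step size, the lower bound, and the 80% hysteresis band are all hypothetical, and the way the resulting fraction is handed to the occlusion culler is not shown.

```cpp
#include <algorithm>
#include <cassert>

// Adjusts the occlusion fraction from the measured frame time.
class FractionController {
public:
    explicit FractionController(double targetMs)
        : targetMs_(targetMs), fraction_(1.0) {   // start at the conservative 100%
        assert(targetMs > 0.0);
    }

    // Call once per frame with the last measured frame time in milliseconds.
    double update(double lastFrameMs) {
        if (lastFrameMs > targetMs_) {
            fraction_ -= kStep;                   // too slow: cull more aggressively
        } else if (lastFrameMs < 0.8 * targetMs_) {
            fraction_ += kStep;                   // headroom: restore image quality
        }
        fraction_ = std::min(1.0, std::max(kMinFraction, fraction_));
        return fraction_;
    }

    double fraction() const { return fraction_; }

private:
    static constexpr double kStep = 0.05;         // hypothetical step size
    static constexpr double kMinFraction = 0.5;   // hypothetical lower bound
    double targetMs_;
    double fraction_;
};
```

The dead band between 80% and 100% of the target keeps the fraction from oscillating when the frame time hovers near the target.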
View-Frustum and Occlusion Cull Draw Traversal: opDrawAction
The class opDrawAction is a csDrawAction that allows you to traverse a scene graph and draw the scene with occlusion culling, view-frustum culling, or both. You can also set the background color for the scene by specifying RGBA values.
To draw with an opDrawAction, follow these steps:
Call setScene() to set the scene graph to be drawn.
Call initializeScene() to initialize the scene graph.
In each frame:
If the scene graph is modified while occlusion culling is enabled, the method opDrawAction::reset() must be called after the scene graph modification.
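The call sequence can be sketched with a stand-in class so that it compiles without the Optimizer headers. `RecordingDrawAction` and `SceneNode` are hypothetical and only record the order of calls; they are not opDrawAction or csNode. The per-frame calls to apply() and postDraw() are inferred from the method descriptions that follow, and the order of reset() and initializeScene() after a modification is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct SceneNode {};  // hypothetical stand-in for csNode

// Stand-in that mirrors method names from the opDrawAction declaration
// and records the order in which they are called.
class RecordingDrawAction {
public:
    void setScene(SceneNode*) { log_.push_back("setScene"); }
    void initializeScene()    { log_.push_back("initializeScene"); }
    void apply(SceneNode*)    { log_.push_back("apply"); }
    void postDraw()           { log_.push_back("postDraw"); }
    void reset()              { log_.push_back("reset"); }
    const std::vector<std::string>& log() const { return log_; }
private:
    std::vector<std::string> log_;
};

// Draws `frames` frames; the scene graph is modified before frame `modifiedAt`.
template <class DrawAction>
void renderFrames(DrawAction& action, SceneNode* scene, int frames, int modifiedAt) {
    assert(frames >= 0);
    action.setScene(scene);
    action.initializeScene();
    for (int f = 0; f < frames; ++f) {
        if (f == modifiedAt) {
            // The application edits the scene graph here, then re-syncs.
            action.reset();            // required when occlusion culling is on
            action.initializeScene();  // order relative to reset() is a guess
        }
        action.apply(scene);           // draw the frame
        action.postDraw();             // post-processing; no draw threads active
    }
}
```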
Class Declaration for opDrawAction
The class has the following main methods:
class opDrawAction : public csDrawAction
{
public:
opDrawAction( int nProcs=1,opBool computeStats=false);
virtual ~opDrawAction();
// Drawing the scene
virtual void setScene ( csNode* node);
virtual void initializeScene ( );
virtual csTravDirective apply ( csNode* node);
virtual void postDraw ( );
//Accessor functions
csNode* getScene ( );
void setLights (int nLights, csLight** lights);
void setWindowSize (int width, int height);
void setConservativeMode (opBool enabled=1);
void setVFCullMode (opBool enabled=1);
void setOCCullMode (opBool enabled=1);
void setDrawCulledMode (opBool enabled=1);
void setBPCullMode (csDrawAction::BpcModeEnum bpcMode);
int getVFShapesCount ();
int getVFTrisCount ();
int getShapesDrawnCount ();
int getTrisDrawnCount ();
void setClearColor (const csVec4f& c);
void getClearColor ( csVec4f& c);
};
setScene(node)
    Sets the scene graph to be drawn by this opDrawAction.
initializeScene()
    Performs the necessary initializations of the scene graph. This method must be called every time the scene graph is modified so that the new scene graph is initialized correctly.
    initializeScene() modifies the scene graph. Therefore, when rendering with multiple parallel threads, initializeScene() cannot be called by any thread while the draw threads are drawing.
apply(node)
    Draws a scene.
postDraw()
    Performs necessary post processing of the scene. This method modifies the scene graph. Therefore, when you are rendering with multiple parallel threads, postDraw() cannot be called by any thread while the draw threads are drawing.
The remaining methods allow you to control the types of culling applied, window size, and lights, and to recover statistics about the scene.
Rendering With View-Frustum and Occlusion Culling: opOccDrawImpl
To use the occlusion-culling algorithm in a rendering application, you can register an opOccDrawImpl in an opViewer. An example appears in Appendix C, “opviewer Sample Application.”
The class opOccDrawImpl is an opDrawImpl, which is the base class for drawing implementations discussed in “Controlling Rendering: opKeyCallback and opDrawImpl”.
opOccDrawImpl defines key bindings that control its rendering options in an opViewer application, and that allow you to record a sequence of control operations so that you can save a “tour” of a scene.
opOccDrawImpl uses opDrawAction to render the scene in an opViewer application.
Class Declaration for opOccDrawImpl
The class has the following main methods:
class opOccDrawImpl : public opDrawImpl
{
public:
opOccDrawImpl(opViewer *viewer,int nProcs = 2);
~opOccDrawImpl();
virtual void draw(unsigned frame);
virtual void activated();
virtual void deactivated();
virtual void reset();
static opBool keyHandler(opDrawImpl *,int);
void setConservativeMode(opBool enabled);
void setDrawCulledMode(opBool enabled);
void setOCullMode(opBool enabled);
void setVFCullMode(opBool enabled);
opBool getConservativeMode() const;
opBool getDrawCulledMode() const;
opBool getOCullMode() const;
opBool getVFCullMode() const;
int loadRecording(const char *filename);
void saveRecording(const char *filename);
void beginRecording();
int endRecording();
void playback(opViewer *viewer, void *userData);
};
opOccDrawImpl()
    Registers the occlusion culler with the opViewer, thus making key bindings effective, and allocates the number of processors to use when performing the occlusion or view-frustum culling.
draw()
    Is inherited from opDrawImpl. Implements occlusion culling for each frame update in opViewer::eventLoop(). The other inherited functions do nothing.
keyHandler()
    Defines the effects of the keyboard commands registered by calls to registerKey(). An opOccDrawImpl has the keyboard control definitions described in “Key Bindings for opOccDrawImpl”.
get...() and set...()
    Provide interactions with the control parameters.
loadRecording(), and so on
    Provide control over recording, writing, reading, and playing a sequence of manipulations of your scene graph. You can store up to 1000 frames.
registerKey()
    Registers a keyboard command and specifies the function (keyHandler()) that interprets the command.
    The registerKey() method is inherited from opDrawImpl, which is discussed in “Controlling Rendering: opKeyCallback and opDrawImpl”. See the file /usr/share/optimizer/src/libopGUI/opOccDrawImpl.cxx for details.
Key Bindings for opOccDrawImpl
The class constructor for opOccDrawImpl uses the methods registerKey() and keyHandler() to register the following keyboard commands (see the file opOccDrawImpl.cxx):
c
    Toggles “conservative” occlusion culling. If you use “non-conservative” occlusion culling, the culler runs faster, but the screen may flash during rendering; with conservative culling, no flashing occurs.
o
    Toggles occlusion culling on and off. Initially, occlusion culling is disabled and all geometry is rendered. The algorithm removes only geometry that is not visible, so you do not see any change in the scene; the frame rate, however, increases.
O
    Toggles rendering of occluded and foreground geometry. This feature lets you see exactly which portions of your scene are completely occluded. Note that all the occluded geometry is rendered when this option is enabled, so for a scene with many layers, the occluded geometry renders much more slowly than the foreground geometry.
v
    Toggles view-frustum culling on and off. OpenGL Optimizer allows you to use multiple processors to perform view-frustum culling.
+ and -
    Increase and decrease the threshold fraction that specifies how much of an object's bounding box must be occluded to cull the object.
[
    Starts recording keyboard commands.
]
    Stops recording.
\
    Plays back the last recording.
!
    Saves the recording.
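The culling-related key bindings amount to a small state machine, which can be sketched as follows. This is not the code in opOccDrawImpl.cxx: `CullOptions` and `handleKey` are hypothetical names, the step used for + and - is assumed, and the initial values other than occlusion culling (documented as initially disabled) and the 100% fraction are guesses. The recording keys are left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical mirror of the culling options controlled from the keyboard.
struct CullOptions {
    bool conservative    = true;   // assumed initial value
    bool occlusionCull   = false;  // documented: initially disabled
    bool drawCulled      = false;  // assumed initial value
    bool viewFrustumCull = true;   // assumed initial value
    double fraction      = 1.0;    // documented default of 100%
};

// Applies one key press; returns true if the key was recognized.
bool handleKey(CullOptions& opts, char key) {
    const double step = 0.05;      // hypothetical increment for + and -
    switch (key) {
    case 'c': opts.conservative    = !opts.conservative;    break;
    case 'o': opts.occlusionCull   = !opts.occlusionCull;   break;
    case 'O': opts.drawCulled      = !opts.drawCulled;      break;
    case 'v': opts.viewFrustumCull = !opts.viewFrustumCull; break;
    case '+': opts.fraction = std::min(1.0, opts.fraction + step); break;
    case '-': opts.fraction = std::max(0.0, opts.fraction - step); break;
    default:  return false;        // not a culling key
    }
    assert(opts.fraction >= 0.0 && opts.fraction <= 1.0);
    return true;
}
```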
Tuning Tips for Occlusion Culling
The central concern in tuning occlusion culling is load balance. The goal is to avoid bottlenecks (see “Bottlenecks in the Pipeline”). The tuning controls include the number of processors, the size of the csGeoSets, spatialization, and the z-resolution of the framebuffer. Because every database is different, you have to measure performance to identify bottlenecks. An iterative process of measuring performance, adjusting tuning parameters, and measuring performance again is usually appropriate. The sections below describe some common problems and their likely causes.
Culling Takes Longer Than Rendering
Possible causes and solutions:
Not enough geometry is being culled, either because most is visible, or because the bounding boxes are too long.
The csGeoSets are too small, so that the time required to cull one is longer than the time required to draw it. To address this problem, combine csGeoSets to make them bigger (see “Merging csGeoSets in a Scene Graph: opCombineGeoSets”).
Not enough processors. To address this problem, increase the nProcs parameter for the constructor opDrawAction() up to the number of processors on your system. On a single CPU system, use the value 2; this allows the host to cull while the OpenGL process is blocked, waiting for the graphics first-in-first-out queue to clear.
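The rule of thumb for nProcs can be written down directly. In this sketch the helper names `detectCpuCount` and `chooseCullProcs` are hypothetical; `std::thread::hardware_concurrency()` is standard C++ and may return 0 when the count is unknown, which is treated here as a single CPU. The returned value is the upper end of the range suggested above, to be passed as the nProcs constructor argument.

```cpp
#include <cassert>
#include <thread>

// Number of CPUs reported by the runtime, or 1 if it cannot be determined.
unsigned detectCpuCount() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1u : n;
}

// Upper bound for nProcs: 2 on a single-CPU system, so the host can cull
// while the OpenGL process is blocked; otherwise the number of CPUs.
unsigned chooseCullProcs(unsigned cpuCount) {
    unsigned procs = (cpuCount <= 1) ? 2u : cpuCount;
    assert(procs >= 2);
    return procs;
}
```

A typical use would be `chooseCullProcs(detectCpuCount())`, followed by measurement, since using every CPU for culling is not necessarily the fastest setting for a given database.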
Occluded Geometry Is Not Culled
Possible causes and solutions:
Bounding boxes are not tight enough.
Too much downsampling in x-y space.
Not enough z-resolution.
Geometry is actually visible through cracks in model.
Very Small Speedup and Fast Culling
Possible causes and solutions:
Level-of-detail nodes are useful for adjusting the number of vertices associated with any given object. In some cases, however, it is most appropriate not to render objects below a certain size. The methods of opDetailSimplify allow you to remove geometry from csShapes that are “small.” Small is determined by a threshold for the ratio of shape size to overall scene graph size, calculated from the radii of their respective bounding spheres. You can explicitly set the large-scale dimension and thus have more direct control over which objects are culled.
Small csShape nodes are not removed from the graph; the scene graph structure remains the same. You can therefore use a detail-simplified scene graph as an LOD.
Class Declaration for opDetailSimplify
The class has the following main methods:
class opDetailSimplify
{
public:
opDetailSimplify (void);
~opDetailSimplify (void);
// --- ratio of shape size to overall size
void setSizeRatio (float ratio);
float getSizeRatio ();
// --- detail cull scene graph below root
void apply (csNode *root);
void setRootRadius(float radius);
};
Methods in opDetailSimplify
apply()
    Traverses the graph below root and culls small objects. Whether an object is “small” is determined by:
    The radius of the bounding sphere of the object.
    The value set by setSizeRatio().
    The radius of the bounding sphere of the root node. You can explicitly set this maximum scale by calling setRootRadius().
setSizeRatio() and getSizeRatio()
    Sets and gets the threshold for culling small objects.
setRootRadius()
    Explicitly sets the dimension to which all objects are compared.
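The size test behind detail culling can be sketched as a ratio of bounding-sphere radii. This is an illustration, not the opDetailSimplify implementation: the function names are hypothetical, and whether the real comparison is strict or inclusive is an assumption.

```cpp
#include <cassert>
#include <vector>

// True if a shape counts as "small": the ratio of its bounding-sphere radius
// to the reference (root) radius falls below the size ratio threshold.
bool isSmall(float shapeRadius, float rootRadius, float sizeRatio) {
    assert(shapeRadius >= 0.0f && rootRadius > 0.0f && sizeRatio >= 0.0f);
    return shapeRadius / rootRadius < sizeRatio;
}

// Counts how many shapes would have their geometry removed.
int countSmall(const std::vector<float>& shapeRadii,
               float rootRadius, float sizeRatio) {
    int n = 0;
    for (float r : shapeRadii) {
        if (isSmall(r, rootRadius, sizeRatio)) ++n;
    }
    return n;
}
```

Passing a smaller reference radius, which is what setRootRadius() lets you do, makes every shape look relatively larger, so fewer shapes fall below the threshold.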
Typically, triangles should not be rendered when their front sides do not face the viewpoint. Such pieces of a surface are called back faces. Figure 5-2 illustrates the back faces of an open and a capped cylinder: the back faces are those for which the normals point away from the viewpoint.
Back-face culling keeps these triangles from being rasterized, thus saving on pixel fill time. Because the cull operation depends on the orientation of the triangles relative to the viewer, back-face culling occurs in the graphics pipeline after the transform stage: only rasterization and display stages are affected.
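The geometric test for a back face can be sketched in object space. This is an illustration with hypothetical names, assuming counter-clockwise vertex order defines the front side; OpenGL itself decides facing from the winding of the projected triangle, which gives the same answer for a perspective view from the eye point.

```cpp
#include <cassert>

struct Vec3 { double x, y, z; };

Vec3 sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if triangle (a, b, c), wound counter-clockwise on its front side,
// faces away from the eye: its normal points away from the viewpoint.
bool isBackFacing(const Vec3& a, const Vec3& b, const Vec3& c, const Vec3& eye) {
    Vec3 normal = cross(sub(b, a), sub(c, a));
    return dot(normal, sub(eye, a)) < 0.0;
}
```

A triangle seen exactly edge-on (dot product of zero) is treated as front-facing here; it covers no pixels either way.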
It is not always appropriate to cull back faces. If a surface has any holes, you should render the back faces because they may be visible through the holes at certain viewing angles. For example, if you can see into a pipe, render the pipe's back face. Figure 5-2 illustrates this point by showing the effects of back-face culling on an open and a capped cylinder.
Occasionally surface normals are inconsistent or inappropriate. For example, the normals to a car body part might point towards the interior. Rather than maintain consistent normals, many CAD applications ignore sidedness of surfaces and light scenes with two-sided lighting: the front and back sides of triangles are made renderable that way. To make this work, materials must be set to be two-sided. The right-most panel in Figure 5-2 illustrates the effect.
Two-sided lighting is inefficient for two reasons:
Two-sided triangles do not have a back face and so cannot be culled, even for only one light source.
Levels of optimization may differ for the different rendering paths.
For example, the rendering path with a single light and single-sided material is on the optimized path in Silicon Graphics machines, but rendering modes with two or more lights or with two-sided materials are on the unoptimized path, which may run at half the speed of the optimized path.
An OpenGL Optimizer tool that accommodates inconsistent normals and gives faster rendering than two lights is the Gaussian light reflection map, discussed in “Gaussian Map”.
Setting Back-Face Culling
You have two options for controlling back-face culling.
For a single csGeoSet, control rendering of the back face of a surface with the method csGeoSet::setCullFace(). See the Cosmo 3D Programmer's Guide for more information.
For an entire scene, use csContext if you want to set back face culling. See the Cosmo 3D Programmer's Guide for more information on this feature.