What do I need to know about performance?
First, read chapters 11 through 14 of the
book OpenGL on Silicon Graphics Systems. Although some of the information is SGI machine
specific, most of the information applies to OpenGL
programming on any platform. It's invaluable reading for the
performance-minded OpenGL programmer.
Consider a performance tuning analogy: A
database application spends 5 percent of its time looking up
records and 95 percent of its time transmitting data over a
network. The database developer decides to tune the
performance. He sits down and looks at the code for looking
up records and sees that with a few simple changes he can
reduce the time it takes to look up records by more
than 50 percent. He makes the changes, compiles the database,
and runs it. To his dismay, there's little or no noticeable
performance difference.
What happened? The developer didn't
identify the bottleneck before he began tuning. The most
important thing you can do when attempting to boost your
OpenGL program's performance is to identify where the
bottleneck is.
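The arithmetic behind the database analogy can be sketched as follows. This is only an illustration based on Amdahl's law. The function name overall_speedup is hypothetical, and "more than 50 percent" is treated as exactly 50 percent, so the tuned code runs twice as fast.

```c
/* Overall speedup when only part of the run time is tuned (Amdahl's law).
 * tuned_fraction: share of total time spent in the tuned code (0..1).
 * local_speedup:  how many times faster the tuned code becomes.
 * Hypothetical helper, for illustration only. */
double overall_speedup(double tuned_fraction, double local_speedup)
{
    /* Time left after tuning, as a fraction of the original run time. */
    double remaining = (1.0 - tuned_fraction) + tuned_fraction / local_speedup;
    return 1.0 / remaining;
}
```

Calling overall_speedup(0.05, 2.0) gives about 1.026, so doubling the speed of the record lookup shortens the run by only about 2.5 percent. Tuning the network transmission instead, overall_speedup(0.95, 2.0), gives about 1.9.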
Graphics applications can be bound in
several places. Generally speaking, bottlenecks fall into
three broad categories: CPU limited, geometry limited, and
fill limited.
CPU limited is a general term. Specifically,
it means performance is limited by the speed of the CPU. Your
application may also be bus limited, in which the bus
bandwidth prevents better performance. Cache size and amount
of RAM can also play a role in performance. For a true CPU-limited
application, performance will increase with a faster CPU.
Another way to increase performance is to reduce your
application's demand on CPU resources.
A geometry limited application is bound by
how fast the computer or graphics hardware can perform vertex
computations, such as transformation, clipping, lighting,
culling, vertex fog, and other OpenGL operations performed on
a per-vertex basis. For many very low-end graphics devices,
this processing is performed in the CPU. In this case, the
line between CPU limited and geometry limited becomes fuzzy.
In general, CPU limited implies that the bottleneck is CPU
processing unrelated to graphics.
In a fill-limited application, the rate you
can render is limited by how fast your graphics hardware can
fill pixels. To go faster, you'll need to find a way to
either fill fewer pixels, or simplify how pixels are filled,
so they can be filled at a faster rate.
It's usually quite simple to discern
whether your application is fill limited. Shrink the window
size, and see if rendering speeds up. If it does, you're fill
limited.
If you're not fill limited, then you're
either CPU limited or geometry limited. One way to test for a
CPU limitation is to change your code, so it repeatedly
renders a static, precalculated scene. If the performance is
significantly faster, you're dealing with a CPU limitation.
The part of your code that calculates the scene or does other
application-specific processing is causing your performance
hit. You need to focus on tuning this part of your code.
If it's not fill limited and not CPU
limited, congratulations! It's geometry limited. The per-vertex
features you've enabled or the sheer volume of
vertices you're rendering is causing your performance hit.
You need to reduce the geometry processing either by reducing
the number of vertices or reducing the calculations OpenGL
must use to process each vertex.
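The elimination procedure described above can be sketched as a small decision function. Everything here is an assumption made for illustration: the names, the idea of passing in three measured frame times, and the min_gain threshold that decides what counts as "significantly faster".

```c
/* The three bottleneck categories discussed in the text. */
typedef enum { FILL_LIMITED, CPU_LIMITED, GEOMETRY_LIMITED } Bottleneck;

/* full_ms:   frame time of the unmodified application
 * small_ms:  frame time with the window shrunk
 * static_ms: frame time when repeatedly rendering a static,
 *            precalculated scene
 * min_gain:  fractional improvement treated as significant
 *            (0.2 means 20 percent); an arbitrary choice for this sketch */
Bottleneck classify_bottleneck(double full_ms, double small_ms,
                               double static_ms, double min_gain)
{
    double significant = full_ms * (1.0 - min_gain);

    if (small_ms < significant)
        return FILL_LIMITED;      /* fewer pixels made it faster */
    if (static_ms < significant)
        return CPU_LIMITED;       /* skipping scene calculation helped */
    return GEOMETRY_LIMITED;      /* neither test helped */
}
```

The tests are applied in the same order as in the text: fill first, then CPU, with geometry as the remaining case.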
How can I measure my application's
performance?
You usually do this by getting the system
time, doing some rendering, and getting the system time again.
The difference between the two time measurements tells you
how long it took to render. You can do other quick
calculations to determine frames per second, triangles per
second, and vertices per second.
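The quick calculations mentioned above might look like the sketch below. The struct and function names are hypothetical, and the two timestamps are assumed to be in seconds from whatever system timer your platform provides. Because OpenGL can queue commands, it is common to call glFinish() before taking the second timestamp so that the rendering has actually completed.

```c
/* Rates derived from one timed rendering run (hypothetical type). */
typedef struct {
    double frames_per_second;
    double triangles_per_second;
    double vertices_per_second;
} RenderRates;

/* start_s, end_s: system time before and after rendering, in seconds.
 * frames, triangles, vertices: totals rendered between the timestamps. */
RenderRates compute_rates(double start_s, double end_s,
                          long frames, long triangles, long vertices)
{
    RenderRates rates = { 0.0, 0.0, 0.0 };
    double elapsed = end_s - start_s;

    if (elapsed <= 0.0)
        return rates;   /* timer resolution too coarse; render more frames */

    rates.frames_per_second    = (double)frames / elapsed;
    rates.triangles_per_second = (double)triangles / elapsed;
    rates.vertices_per_second  = (double)vertices / elapsed;
    return rates;
}
```

Timing many frames rather than one reduces the influence of timer resolution on the result.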
Calculating pixels per second is a little
tougher. The easiest way to calculate it is to write a small
benchmark program that renders primitives of a known pixel
size.
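As a rough sketch of that calculation, assume the benchmark draws screen-aligned quads that are fully visible and unclipped, so each one covers exactly width times height pixels. The function name and parameters are hypothetical.

```c
/* Fill rate for a benchmark that draws quads of a known size.
 * quad_width, quad_height: size of each quad in pixels
 * quads_drawn:             number of quads rendered while timing
 * elapsed_s:               measured rendering time in seconds
 * Assumes every quad is fully on screen and not clipped. */
double pixels_per_second(int quad_width, int quad_height,
                         long quads_drawn, double elapsed_s)
{
    if (elapsed_s <= 0.0)
        return 0.0;
    return (double)quad_width * (double)quad_height
           * (double)quads_drawn / elapsed_s;
}
```

For example, 1000 quads of 100 by 100 pixels drawn in half a second works out to 20 million pixels per second.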
Some benchmark software is free to download.
GLUT 3.7 comes with a benchmark called progs/bucciarelli/gltest
that measures OpenGL rendering performance.
You can also visit the Standard
Performance Evaluation Corporation,
which has many benchmarks you can download and the latest
performance results from several OpenGL hardware vendors.
22.030 Which primitive type is the fastest?
GL_TRIANGLE_STRIP is generally recognized
as the fastest OpenGL primitive type. Be aware that the
primitive type might not make a difference unless you're
geometry limited.
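One reason strips tend to do well is that they send fewer vertices for the same number of triangles. The sketch below compares the vertex counts. The function names are hypothetical, and the strip count assumes a single unbroken strip.

```c
/* Vertices sent for n independent triangles (GL_TRIANGLES). */
long independent_triangle_vertices(long triangles)
{
    return triangles > 0 ? 3 * triangles : 0;
}

/* Vertices sent for n triangles in one unbroken GL_TRIANGLE_STRIP:
 * the first triangle needs three vertices, each later one adds one. */
long triangle_strip_vertices(long triangles)
{
    return triangles > 0 ? triangles + 2 : 0;
}
```

For 100 triangles that is 300 vertices as independent triangles against 102 as one strip, which only matters if per-vertex work is your bottleneck.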
What's the cost of redundant
calls?
While some OpenGL implementations make
redundant calls as cheap as possible, making redundant calls
generally is considered bad practice. Certainly you shouldn't
count on redundant calls being cheap. Good application
developers avoid them when possible.
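One way an application might avoid redundant calls is to remember the last value it set and skip the call when nothing changes. The sketch below is an assumption about how that could look, not a prescribed technique. BindCache, bind_cached, and counting_bind are hypothetical names, and counting_bind stands in for a real wrapper around a call such as glBindTexture().

```c
#include <stdbool.h>

/* Signature of the function that performs the real state change. */
typedef void (*BindFn)(unsigned int texture);

/* Remembers the texture the application last bound. */
typedef struct {
    bool valid;           /* false until the first bind */
    unsigned int bound;   /* last texture passed through */
} BindCache;

/* Stand-in for the real call; it only counts how often it is reached. */
int g_forwarded_calls = 0;
void counting_bind(unsigned int texture)
{
    (void)texture;
    ++g_forwarded_calls;
}

/* Forward the bind only when the texture actually changes. */
void bind_cached(BindCache *cache, unsigned int texture, BindFn bind)
{
    if (cache->valid && cache->bound == texture)
        return;           /* redundant: skip the call entirely */
    bind(texture);
    cache->valid = true;
    cache->bound = texture;
}
```

Binding texture 7 twice and then texture 9 reaches the underlying call only twice. The cache is only correct if every bind goes through it.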
I have (n) lights on, and when I
turn on (n+1), performance suddenly drops dramatically. What
happened?
Your graphics device supports (n) lights in
hardware, but because you turned on more lights than what's
supported, you were kicked off the hardware and are now
rendering in software. The only solution to this problem,
other than using fewer lights, is to buy better hardware.
I'm using (n) different texture
maps, and when I start using (n+1) instead, performance
drops drastically. What happened?
Your graphics device has a limited amount
of dedicated texture map memory. Your (n) textures fit well
in the texture memory, but there wasn't room left for any
more texture maps. When you started using (n+1) textures,
suddenly the device couldn't store all the textures it needed
for a frame, and it had to swap them in from the computer's
system memory. The additional bus bandwidth required to
download these textures in each frame killed your performance.
You might consider using smaller texture
maps at the expense of image quality.
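To judge whether a set of textures is likely to fit, you can estimate their size. The sketch below is only an approximation: the function names are hypothetical, the memory budget is a figure you would have to supply yourself, and a driver may store textures with different padding or internal formats than assumed here.

```c
#include <stddef.h>

/* Estimated bytes for one 2D texture.
 * with_mipmaps != 0 adds every smaller level down to 1x1. */
size_t texture_bytes(unsigned width, unsigned height,
                     unsigned bytes_per_texel, int with_mipmaps)
{
    size_t total = 0;

    if (width == 0 || height == 0)
        return 0;

    for (;;) {
        total += (size_t)width * height * bytes_per_texel;
        if (!with_mipmaps || (width == 1 && height == 1))
            break;
        if (width > 1)  width /= 2;    /* next mipmap level */
        if (height > 1) height /= 2;
    }
    return total;
}

/* Returns 1 if the estimated sizes fit in budget_bytes, else 0.
 * budget_bytes is an assumed value for the device's texture memory. */
int textures_fit(const size_t *sizes, size_t count, size_t budget_bytes)
{
    size_t used = 0;
    for (size_t i = 0; i < count; ++i)
        used += sizes[i];
    return used <= budget_bytes;
}
```

Halving both dimensions of a texture cuts its estimate to a quarter, which is why smaller texture maps can bring a working set back under the limit.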
Why are glDrawPixels() and
glReadPixels() so slow?
While performance of the OpenGL 2D path (as
it's called) is acceptable on many higher-end UNIX workstation-class
devices, some implementations (especially low-end inexpensive
consumer-level graphics cards) never have had good 2D path
performance. One can only expect that corners were cut on
these devices or in the device driver to bring their cost
down and decrease their time to market. When this was written
(early 2000), if you purchased a graphics device for under $500,
chances were the OpenGL 2D path performance would be
poor.
If your graphics system should have decent
performance but doesn't, there are some steps you can
take to boost the performance.
First, all glPixelTransfer() state should
be set to its default values. Also, all glPixelStore() state should
be set to its default values, with the exception of GL_PACK_ALIGNMENT
and GL_UNPACK_ALIGNMENT (whichever is relevant), which should
be set to 8. Your data pointer will need to be
correspondingly double-word aligned.
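With an alignment of 8, each row of pixel data starts on an 8-byte boundary, so rows may need padding. The sketch below shows one way to compute the padded row size and check that a buffer is suitably aligned. The function names are hypothetical, and the code assumes byte-sized components and that malloc() returns memory aligned to at least 8 bytes, which is typical on mainstream platforms but is checked explicitly here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes per row once padded up to a multiple of alignment. */
size_t padded_row_bytes(size_t width, size_t bytes_per_pixel,
                        size_t alignment)
{
    size_t row = width * bytes_per_pixel;
    return (row + alignment - 1) / alignment * alignment;
}

/* Returns 1 if p is a multiple of alignment, else 0. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocate an image whose rows are padded for an alignment of 8.
 * Stores the row stride in *stride_out. Returns NULL on failure
 * or if the allocation is not 8-byte aligned. Caller frees. */
void *alloc_pixel_rows(size_t width, size_t height,
                       size_t bytes_per_pixel, size_t *stride_out)
{
    size_t stride = padded_row_bytes(width, bytes_per_pixel, 8);
    void *pixels;

    if (stride_out)
        *stride_out = stride;
    if (stride == 0 || height == 0)
        return NULL;

    pixels = malloc(stride * height);
    if (pixels && !is_aligned(pixels, 8)) {
        free(pixels);
        return NULL;
    }
    return pixels;
}
```

A 3-pixel-wide RGB row occupies 9 bytes and is padded to 16. A 4-pixel-wide RGBA row is already 16 bytes and needs no padding.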
Second, examine the parameters to
glDrawPixels() or glReadPixels(). Do they correspond to the
framebuffer layout? Think about how the framebuffer is
configured for your application. For example, if you know you're
rendering into a 24-bit framebuffer with eight bits of
destination alpha, your format parameter should be GL_RGBA, and
your type parameter should be GL_UNSIGNED_BYTE. If your
type and format parameters don't correspond to the
framebuffer configuration, it's likely you'll suffer a
performance hit due to the per-pixel processing that's
required to translate your data between your parameter
specification and the framebuffer format.
Finally, make sure you don't have
unrealistic expectations. Know your system bus and memory
bandwidth limitations.
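A quick estimate of the data rate you are asking for can help set expectations. The sketch below is an illustration with hypothetical names. The bus figure is something you must look up for your own system, and real transfers rarely reach a bus's theoretical peak.

```c
/* Bytes per second moved by drawing or reading a full image
 * of the given size once per frame. */
double transfer_bytes_per_second(int width, int height,
                                 int bytes_per_pixel,
                                 double frames_per_second)
{
    return (double)width * (double)height
           * (double)bytes_per_pixel * frames_per_second;
}

/* Returns 1 if the required rate is within the assumed bus rate.
 * bus_bytes_per_second is supplied by the caller. */
int transfer_is_plausible(double required_bytes_per_second,
                          double bus_bytes_per_second)
{
    return required_bytes_per_second <= bus_bytes_per_second;
}
```

For example, a 640 by 480 RGBA image transferred 60 times per second needs about 73.7 million bytes per second before any other bus traffic is counted.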
Is it faster to use absolute
coordinates or to use relative coordinates?
By using absolute (or world)
coordinates, your application doesn't have to change the
ModelView matrix as often. By using relative (or object)
coordinates, you can cut down on data storage of redundant
primitives or geometry.
A good analogy is an architectural software
package that models a hotel. The hotel model has hundreds of
thousands of rooms, most of which are identical. Certain
features are identical in each room, and maybe each room has
the same lamp or the same light switch or doorknob. The
application might choose to keep only one doorknob model and
change the ModelView matrix as needed to render the doorknob
for each hotel room door. The advantage of this method is
that data storage is minimized. The disadvantage is that
several calls are made to change the ModelView matrix, which
can reduce performance. Alternatively, the application could
instead choose to keep hundreds of copies of the doorknob in
memory, each with its own set of absolute coordinates. These
doorknobs all could be rendered with no change to the
ModelView matrix. The advantage is the possibility of
increased performance due to fewer matrix changes. The
disadvantage is additional memory overhead. If memory
overhead gets out of hand, paging can become an issue, which
certainly will be a performance hit.
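The trade-off in the doorknob example can be made concrete with a sketch. The functions below are hypothetical. bake_instance() converts one shared model from object coordinates to absolute coordinates using a column-major 4x4 matrix, the layout OpenGL uses, and assumes an affine transform so that w stays 1. The two size functions compare the storage of each approach, assuming three floats per vertex and one 16-float matrix per instance.

```c
#include <stddef.h>

/* Transform xyz vertices from object to world coordinates.
 * m is a column-major 4x4 matrix; the transform is assumed affine. */
void bake_instance(const float m[16], const float *object_xyz,
                   float *world_xyz, size_t vertex_count)
{
    for (size_t i = 0; i < vertex_count; ++i) {
        float x = object_xyz[3 * i + 0];
        float y = object_xyz[3 * i + 1];
        float z = object_xyz[3 * i + 2];

        world_xyz[3 * i + 0] = m[0] * x + m[4] * y + m[8]  * z + m[12];
        world_xyz[3 * i + 1] = m[1] * x + m[5] * y + m[9]  * z + m[13];
        world_xyz[3 * i + 2] = m[2] * x + m[6] * y + m[10] * z + m[14];
    }
}

/* Relative coordinates: one model plus one matrix per instance. */
size_t shared_model_bytes(size_t vertex_count, size_t instances)
{
    return vertex_count * 3 * sizeof(float)
         + instances * 16 * sizeof(float);
}

/* Absolute coordinates: a full copy of the model per instance. */
size_t baked_copies_bytes(size_t vertex_count, size_t instances)
{
    return instances * vertex_count * 3 * sizeof(float);
}
```

With 4-byte floats, a 100-vertex doorknob used 500 times takes about 33 KB as a shared model with matrices and about 600 KB as baked copies. Whether the saved matrix changes are worth that memory is what you would need to benchmark.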
There is no clear answer to this question.
It's model- and application-specific. You'll need to
benchmark to determine which method is best for your model or
application.
Are display lists or vertex
arrays faster?
Which is faster varies from system to
system.
If your application isn't geometry limited,
you might not see a performance difference at all between
display lists, vertex arrays, or even immediate mode.