The G3D Graphics Engine

C/C++ Users Journal December, 2004

Advanced language features for simplicity and safety in a graphics API

By Morgan McGuire

Morgan McGuire has worked in the research labs of IBM, NEC, and Mitsubishi. He is currently an independent consultant while completing his Ph.D. at Brown University. Morgan has been the project manager for G3D since 2000 and he can be contacted at matrix@graphics3d.com.

G3D is a commercial-grade 3D Engine available as open source under the BSD License. It is maintained by a team of six programmers, contains 100,000 lines of C++, and has been stable for four years. The library is used in games, tech demos, research papers, military simulators, and university courses. Its mandate is to provide a set of routines and structures so common that they are needed in almost every graphics program. These have been highly optimized and debugged to provide a rock-solid base from which programmers can quickly write 3D applications.

G3D is built around OpenGL and focuses on real-time hardware graphics. The library abstracts 3D math, the platform windowing system, and the graphics card, but does not impose a 3D model class, application structure, or GUI on applications. Because of this, programmers using G3D have a lot of flexibility but must already be familiar with 3D graphics algorithms and have a cursory knowledge of hardware rendering. Although almost all OpenGL functionality is wrapped by the API, G3D provides a breakable abstraction that exposes the underlying OpenGL and platform-specific handles so that programmers can dip down to lower level APIs and are never locked out of functionality.

In this article, I provide a behind-the-scenes look at how we used C++ to make the library easy to use, fast, and safe. The theme of this article is that language features can solve domain-specific problems without complicating an API. The samples given have been simplified from the real source code [1] to reduce line count.

"Easy to Use"

G3D is intentionally a C++ API. We use the full feature set of C++ in the interfaces and in their implementations. The design challenge is doing this in a way that is transparent to and convenient for programmers, while keeping the implementation simple enough for our small team to maintain and document. Three design strategies make the library easy to use:

Don't require extensive knowledge of C++.
Fail loudly, preferably at compile time, when misused.
Eschew boilerplate.

From a user's perspective, the API uses few C++-specific paradigms and avoids the less-intuitive language rules. Notable exceptions are iterators, explicit single template arguments (std::vector<int>, for instance), and our reference-counting variation on the autopointer. In each of these cases, the paradigm is likely known to most C++ programmers anyway, offers tremendous safety and power to justify its complexity, and can be used in an idiomatic fashion by pattern matching even if not understood. This simplicity keeps expert programmers focused on graphics instead of the C++rules, and lets novice programmers (especially students) dive in without hitting C++ roadblocks. Finally, it makes the API more friendly and accessible to everyone and results in more understandable error messages during debugging sessions. We have intentionally passed over several opportunities to extend the API in a useful direction via a more complicated interface because it would be too hard to document or too easy to accidentally misuse.

To make code fail when it is misused, we rely heavily on static type checking. Enumerations provide named type-safe constants. Templates and overloaded inline functions keep the speed and flexibility of macros and void*, while verifying correctness at compile time. For cases where static checking is impossible, we use extensive runtime assertions and carefully track runtime types, especially for graphics card resources that are traditionally typeless byte arrays.

Raw OpenGL cannot open a window on its own. Even once platform windowing boilerplate has been satisfied, OpenGL requires hundreds of lines of code to initialize and configure extensions. OpenGL operates on floats and byte arrays, providing no facilities for loading images or computing 3D coordinates. Compare that to Listing 1, which shows an extremely simple program that demonstrates how G3D eliminates boilerplate, and provides an easy-to-use API.

The first two lines of Listing 1 create and initialize a RenderDevice, the central class in the API that abstracts most of OpenGL. Optional arguments to the init function describe the width, height, antialiasing, refresh, and other properties of the window that will be created. It is even possible to subclass the GWindow interface and provide a new window class on initialization; for example, to render inside an ActiveX control.

The next line loads a JPG image and creates a texture (G3D supports a host of popular 2D and 3D formats, including JPG, PCX, TGA, BMP, ICO, MD2, IFS, 3DS, OBJ, PLY, PNG, DDS, and SM).

The main loop uses System::time to run an animation for 10 seconds. That routine takes daylight savings and timezone into effect and is accurate to one billionth of a second, so it can be used for profiling.

Modern graphics cards essentially perform only one function. They take a list of 3D triangles with optional colors, textures, and lighting properties and draw those triangles really fast. G3D renders triangles in the same way as OpenGL, by executing beginPrimitive, sending a series of vertices to the graphics card, and then executing endPrimitive. The texture-mapped triangle in Figure 1 is the output of Listing 1.

There is also a faster mode that lets triangles be rendered using indexes into a vertex array, which is discussed at the end of this article. For the single triangle rendered in the code listing, note the use of overloaded math operators for the vector operations (which are inlined and use MMX and SIMD instructions), including mixing integers, floats, scalars, and vectors. A common vector operation is the swizzle, which returns a permutation of the coordinates, say, (z,x,y). The vector library contains programmatically generated code for all possible permutations on vector types, so this can be expressed as Vector3::zxy(). This is also handy for dropping a coordinate in an expression: Vector2 v2 = v3.xy().

The second batch of rendering calls enables a light and draws a set of 3D axes. The Draw class is mainly for debugging; it provides an easy way to visualize axes, boxes, spheres, and other primitives without sending individual triangles.

Every line of the program is necessary. All boilerplate has been rolled into the classes and the most common arguments are defaults. There is also no explicit memory management anywhere in the program, which means there are no memory leaks possible. The listing shows G3D at its most stripped down. Many G3D programs take advantage of the optional application framework classes, GApp and GApplet. These set up a rendering and user input loop, provide a default camera that responds to mouse and keyboard controls, print frame rate information to the screen, and provide a file log and on-screen display to which debugging output can be sent. These large infrastructure pieces of the library dramatically reduce the size of infrastructure code in a project. For example, one G3D sample program is a networked game client and server for three platforms in only 300 lines of code.

Floating-Point Math

G3D provides a number of handy constants, from PI and inf to colors and identity matrices. Unfortunately, the four most common techniques for supplying constants in C++ all have drawbacks: static and global constants are subject to order-of-initialization bugs, functions have overhead (particularly for Matrix4, whose constructor takes 30 cycles), macros do not obey namespace rules, and enum constants must be unique integers.

A better alternative is a static inline member function that returns a const reference:

static inline const Color3& Color3::white() {
    static const Color3 c(1, 1, 1);
    return c;
}

This is as fast as a global constant but is guaranteed to always return the correct value. In this example, the Color3 constructor will be called once and subsequent calls will only cost one cycle for the assignment of a reference.

Having avoided problems with floating-point constants, comparisons between them are still dangerous. Algebraically, A*B/A == B, but the result is subject to floating-point round-off on a computer. Code that compares two floats using the == operator is almost always a bug and inequalities can often give surprising results as well. A constant epsilon value is not the solution—the error on a large float value may be five orders of magnitude higher than the error on a small float value.

G3D offers a set of fuzzy comparison operators with an epsilon value that scales with the magnitude of the first argument. To see why this is valid, consider the case for fuzzyEq(a, b). For a and b to be nearly equal, they must have nearly the same magnitude. This means that b either has the same magnitude as a (so we can ignore it) or it has a different magnitude, in which case the comparison will fail anyway and it doesn't matter if we use a lousy epsilon value.

We start with an initial fuzzy epsilon (0.000001) and scale it by the magnitude of a plus one, so that epsilon does not go to zero as a becomes very small. Two numbers are fuzzy equal if they are exactly equal or if their difference is less than the scaled epsilon. We perform the exact test first to short-circuit computation of epsilon in the common case:

double eps(double a);
   const double aa = abs(a) + 1;
   return (aa == inf()) ? 
      fuzzyEpsilon : fuzzyEpsilon * aa;
}
inline bool fuzzyEq(double a, double b) {
   return (a == b) || 
      (abs(a - b) <= eps(a));
}

NaN is a special float value that cannot be tested for using the == operator. It is produced by operations such as 0.0/sin(0.0) (which does not create a compile-time divide-by-zero) and 0*inf(), where there is no sensible result. The problem with NaN is that it produces false for almost every comparison: NaN == NaN is false and although some versions of math.h provide one, there is no compiler-independent isnan function. To test for NaN, we must either match bit patterns or, more safely, see if two failed comparisons lead to a contradiction:

inline bool isNaN(double x) {
    bool b1  = (x < 0.0);
    bool b2  = (x >= 0.0);
    bool b3  = !(b1 || b2);
    return b3;
}

Reference Counting

The key to good software development in C++ is good resource management. This is because all resources must be manually allocated and freed in C++ except for memory on the stack. The front line of resource management is, of course, carefully written constructors and destructors. G3D has these, and furthermore encourages you to allocate most small objects on the stack. It also goes beyond these to address graphics-specific resource issues. In addition to files on disk, network sockets, and memory buffers in the heap, graphics programs allocate resources in AGP memory and on the graphics and sound cards. Resources such as sounds, textures, and models do not have a natural owner and their lifetime is hard to presuppose. Consider an image of a brick wall used as a texture map for a 3D model of a building. The same texture map is likely shared among several building models and cannot be deallocated while any of those models is still in memory. In fact, it might be needed even after those models are deallocated—the Graphics Processing Unit (GPU) might still be rendering from that texture even though the CPU no longer needs it.

The natural solution to this problem is automatically managing memory using garbage collection. It is possible to implement a conservative version of the full-blown garbage collection used by languages such as C# and Java, and the Boehm-Demers-Weiser collector is a library that does just that for C++ [2]. This solution is overkill for our problem; it can introduce noticeable delays when reclaiming memory and imposes excessive structure on the users of the library. G3D instead uses reference counting, the style of memory management employed by most scripting languages and by Microsoft's COM API. Every object tracks the number of pointers referencing it and calls delete on itself when that number hits zero. Our implementation, which is based on an earlier one by Justin Miller [3], uses the same technology as auto and smart pointers and pairs inheritance with templates and overloading.

A class T that uses reference counting publicly inherits from ReferenceCountedObject, which contains a single integer that is the number of references:

class ReferenceCountedObject {
public:
   int refCount;
   virtual ~ReferenceCountedObject() {}
protected:
   ReferenceCountedObject() : 
   ReferenceCountedObject_refCount(0) {}
};

ReferenceCountedPointer<T> (Listing 2) overloads the dereference operator to act like a pointer and is used in place of T*. The assignment operator increments the referenced object's count, and the destructor decrements it. The assignment operator is itself a template to make reference-counted pointers act more like normal pointers. Normally in C++, there is no subclass relationship between templates instantiated on subclasses. For example, if subclass S inherits from class T, there is no type relationship between std::vector<S> and std::vector<T>. But with our pointers, we want ReferenceCountedPointer<S> to be a subtype of ReferenceCountedPointer<T>, just as S* is a subtype of T*. To do this, we allow assignment (and the constructor, not shown in the listing) to accept any reference-counted pointer and allow normal subtyping rules to test the pointer comparison. This gives the expected behavior and even produces useful error messages. Assigning upwards on the type hierarchy or between unrelated types gives the message "cannot convert from 'class A*' to 'class B*' Types pointed to are unrelated."

Early versions of G3D had an implicit cast-to-pointer method to allow interoperation with external routines that take T* arguments. However, the implicit cast allowed delete to be called through the cast, causing memory corruption without producing a compile-time error. Too many programmers accidentally called delete, so we removed the implicit cast.

As an algorithm, reference counting has two drawbacks. It cannot collect cyclic references, so it is important to never have a chain of pointers that circles back to the original. The reference-counted classes that we collect typically contain no reference-counted pointers themselves, so this is not a concern. The other drawback is a performance hit: Every dereference and assignment takes at least twice as long as the same operation on a regular pointer. Because of this, we only use reference counting on objects where this cost will be dwarfed by the cost of the method calls. For example, textures and fonts are reference counted but their methods take thousands of cycles to execute. It would be convenient to reference count the BinaryInput class; however, its methods take only a few cycles each and would be measurably slower if the extra dereference time was added.

We use static factory methods on classes to prevent raw pointers from ever appearing and typedef the pointer names to shorten names. The final implementation lets you create and manipulate automatically managed objects (as if writing Java code) without necessarily understanding the C++ mechanism behind them:

GFontRef font = GFont::fromFile(renderDevice, "courier.fnt");
font->draw2D("hello world", Vector2(0, 0), 10);

Compile-time checking guarantees that type safety is preserved during implicit casts to base classes and that programmers do not accidentally use operators new or delete with the automatically managed classes.

Augmenting the STL

We generally advocate using the STL instead of duplicating its functionality. However, there are some cases where the general-purpose STL does not meet the needs of graphics programmers, so G3D augments it with five templated data structures: Array, Queue, Table, Set, and AABSPTree. These are optimized for speed, not space, and have constants that are tuned for the access patterns we've observed in over hundreds of graphics programs. They also fix some bugs observed in various STL implementations. For example, while the STL-port library is great, the MSVC++ one fails to call copy constructors when std::vector uses realloc.

Many G3D users are students who are new to C++ but know Java and C. Where we match STL functionality, we try to provide a friendlier interface, starting with friendly names (more than one novice has been confused by "std::vector" since, to a graphics programmer, a "vector" is a 3D direction). For example, Array provides sort, randomize, fastRemove (swap with the last element and shrink by one), insert, next (resize by one and return a reference to the last element), and append methods that accept one to five elements or an array of elements. In debug mode, the [] operator checks bounds on all data structures, providing a level of safety not enjoyed by the STL, which makes callers choose between the at and [] methods.

Most critically, MMX and SIMD processor instructions used for graphics can only operate on 16-byte aligned data. The G3D data structures allocate on these boundaries automatically, and the System class provides fast allocation, deallocation, memcopy, and memset routines that use these advanced processor instructions. It is not generally possible to use STL data types with these instructions.

While the other data structures are optimized for graphics, the AABSPTree class is used exclusively for graphics. The axis-aligned binary space partition tree is a special kind of set that uses a binary tree for log time operations and a matching hash table for constant time ones. Each member of the set has a 3D bounding box (given by getBounds, analogous to a hashCode method) that describes its position. The root node describes a plane perpendicular to the x-, y-, or z-axis that divides the set in half. Subsequent nodes divide the remaining subsets.

As with a hash set, AABSPTree::isMember is constant expected, amortized time and enumerating all members is linear in the set size. 3D queries such as "find all members intersecting this box" are generally log time instead of linear because they use the splitting planes to quickly eliminate members.

One of the most useful methods is the ray iterator, which produces the members that are intersected by a ray through space in order of distance from the start of the ray. This can be applied to tracing bullets in a video game and light photons in a photorealistic ray tracer. Although it follows the C++ iterator design pattern, it is really an unrolled recursive program with an internal stack frame and a cache to avoid recomputing common expressions. The 355-line iterator implementation is too complicated to describe further here, but is an excellent design example for mapping recursive tree algorithms to the iterator pattern.

These data structures continue the trend we observed in the first simple listing and expanded with reference counting. Because all dynamic memory allocation occurs inside the provided data structures and reference-counted objects, it is rarely necessary to explicitly call new or delete in a G3D program. The C++ pass-by-reference (&) calling option eliminates the need for explicit pointers, and most small objects such as matrices can be stack allocated, so there are few opportunities for memory leaks. This is even more important in graphics applications than other consumer software because rendering moves a lot of data. A new game like Doom 3 uses half a gigabyte of data per level and makes about 100 megabytes of read/write operations per second, so a memory leak will quickly exhaust system resources.

Debugging

The release build of the library primarily depends on type checking for verifying the correctness of the program to which it is linked. The debug build is loaded with assertions that check the input to routines, however. Three assertion macros are used throughout the library itself and recommended to programmers using it. The debugAssert and debugAssertM macros are like C++'s assert macro, but they give a more useful dialog (see Figure 2 and Listing 3). On Linux, the assertion dialog appears at the command line because that operating system provides no built-in GUI library.

The assertion dialog shows the assertion message, expression that failed, file and line number, and the OS error code. The last may be unrelated to the assertion failure but is often helpful for debugging. Many 3D programs grab the keyboard and mouse focus. Under Windows, these are automatically released under the regular C++ assert, although the mouse cursor may not appear and user input will be extremely lagged for the first few seconds. On other operating systems, you may be stuck and never regain keyboard control. To avoid both undesirable situations, the G3D assertion routines automatically detect focus lock and release it, restoring it only if you choose to continue the program.

The dialog offers four options that are more useful than the default Windows "Abort, Retry, Ignore." The debug option brings up the line that caused the assertion failure—note that it breaks inside the routine where the problem occurred, and not inside the assert routine. This is accomplished by an inline assembler, which invokes interrupt number 3 on Intel and AMD processors and SIGINT on PowerPC. Ignore lets the program continue as if the assertion had not failed. Graphics programs tend to have tight loops that repeat many times per second, so a single ignore is not always sufficient. Ignore Always uses a static local variable to record the programmer's first ignore choice and then skips future failures for the specific assertion line. Note the use of a nested local scope in the macro so that each assertion instance has its own flag. The exit option terminates the program.

To provide useful information in the dialog, the macro uses the stringizing operator # to convert the expression code to a printable form and uses the built-in __FILE__ and __LINE__ macros to record the failure location.

G3D contains many classes that parse human-readable formats. These respond to data errors with a modified assertion that uses a fake file and line number. This is because the assertion location is not the location of the error—the parser is correct and the data file is the problem! By printing the data filename, error line, and error message to stderr (or the MSVC++ debug pane with OutputDebugString) in a format that mimics a real compiler output, G3D tricks the debugger into showing the data file and not the parser. Under MSVC++, pressing F4 will then take you to the corrupt data; similar functionality exists in other IDEs.

The aforementioned behavior is for debug mode only. In release mode, the assertions compile away to improve performance. Given enough time to write code in an ideal fashion, you would check data files and other input for errors and have your programs respond with appropriate custom dialogs. In practice, you rarely have enough time for proper error handling and make assertions carry the load. G3D offers alwaysAssertM for this situation. It is as easy to call as an assertion but remains in the release build. In the debug build, alwaysAssertM acts like an assertion and brings up the debugging dialog. In release mode, it prints a friendlier error for users explaining that an internal error has occurred and offering an "ok" button to shut down the program. The actual error, file, and line number are written to the default log file, which can then be e-mailed to the developer.

The log file is managed by the Log class. It offers printf-style output to a file (Figure 3) that flushes after every write, guaranteeing output is written before a crash. RenderDevice and other major classes accept an optional Log on initialization. They record their initialization progress and a profile of the machine to give you enough information to reproduce bugs.

Tricks for Portability

Programs that use G3D generally compile on all supported platforms without any changes or #ifdefs. G3D itself is compiled from the same source code on Windows, Linux, and OS X, under the MSVC++ 6, MSVC++ 7, GCC, and Xcode compilers. These compilers are not 100-percent compatible, but a few tricks bring them much closer.

The C99 Standard changed the scope of variables declared inside the preamble of a for-loop from outside to inside the loop body. The following macro produces this behavior on the pre-C99 MSVC++ 6 compiler:

#if (_MSC_VER <= 1200)
  #pragma warning (disable : 4127)
  #define for if (false) {} else for
#endif

The pragma disables a spurious warning and the #define forces for to use the if-scoping rules for local variables. We #define other key words on Linux and OS X; for example:

#define __cdecl __attribute__((cdecl))

In MSVC++, pragmas give us control of the linker. We use this to automatically link the G3D library into the user's program:

#pragma comment(lib, "g3d.lib")

and to ensure that the correct version of the C runtime is used:

#ifndef _MT
  #define _MT 1
#endif

#ifdef _STATIC_CPPLIB
  #undef _STATIC_CPPLIB
#endif

#pragma comment (linker, "/NODEFAULTLIB:LIBCMTD.LIB")

The result is that under MSVC++, you can use default project settings and G3D adjusts them as needed. On other platforms, you must manually configure the compiler and linker (or use iCompile [4], which automatically compiles G3D and OpenGL programs with no customization).

The memory layout of some classes is critical. For example, Color3uint8 must occupy exactly 3 bytes in memory and be packed tightly in an array because hardware interfaces are specified at the byte level. Fortunately, compilers support these rigid byte layouts, albeit with different syntax. Under MSVC++, #pragma pack(push, 1) enables tight byte alignment before a class definition and #pragma pack(pop) disables it afterward. On Linux and OS X, instead insert __attribute((aligned(1))) between the class definition and trailing semicolon. It is, of course, critical that byte-aligned classes contain no virtual members; doing otherwise would add an extra invisible pointer for the vtable. We take advantage of the known alignment for Vector and Color types to allow them to act as arrays with overloaded operators:

class Color3uint8 {
  ...
  inline uint8& operator[] (int i) {
    debugAssert((uint)i < 3);
    ((uint8*)this)[i];
  }
  inline operator uint8*() {
    return (uint8*)this;
  }
};

The unsigned comparison of i against 3 verifies that i is not negative (which would cause it to appear as a large unsigned number) and less than three with a single if-test in debug mode. In release mode, the bounds check is removed entirely for performance. The implicit cast also lets this type be passed to a C interface (such as OpenGL) that operates on byte pointers and void*.

Motorola and PowerPC processors used by Sun and Macintosh computers are Big endian—they put the most significant byte of an integer first in the memory layout. Intel processors are Little endian and place the most significant byte last. This is completely transparent except when casting a series of bytes to a larger type, such as int or float, a process that occurs frequently when reading and writing from files and sockets. G3D provides BinaryInput and BinaryOutput classes to abstract endianness and simplify the process of handling binary data. Internally, they convert data as needed based on the processor endianness, which is determined by casting an integer pointer to an array of bytes and testing where the least significant bit lies:

int32 a = 1;
_machineEndian = (*(uint8*)&a == 1) ?
   G3D_LITTLE_ENDIAN :
   G3D_BIG_ENDIAN;

Vertex Arrays

Instead of sending individual vertices to the graphics card, it is possible to pack the vertices of a 3D mesh in an array of float triples. The triangles of the mesh are then stored in an index array of int, where every three indices into the vertex array forms a triangle. Although the index array may be stored in main memory, the vertex array (which is much larger) is generally allocated in video RAM and set once when a model is first loaded. This reduces the amount of data that must move across the AGP or PCI-express bus per frame. Because the on-card memory bandwidth is over 10 times greater than the bus bandwidth, storing vertex arrays on the graphics card is essential to keeping the graphics processor from being data starved.

Unfortunately, the underlying OpenGL APIs for managing vertex arrays are complex—they combine all the challenges of data synchronization, manual resource management, explicit cache hints, platform-specific APIs, and area memory management. It is important that this complex OpenGL API is supported though a simple G3D API. Not only are vertex arrays critical for performance, but a single 3D model actually contains multiple arrays of per-vertex properties, such as color, normal (the orientation of the surface), texture mapping information, and the actual vertex position. Thus, you will frequently create vertex arrays, and we want the process to be safe and simple.

Listing 4 presents the G3D vertex array API. You create a (reference counted) memory area of type VARArea, then upload blocks of data into VAR arrays. This VARArea is reference counted, so it is automatically released when all pointers to it and the VARs are dropped. The VARArea also selects the appropriate memory allocation technique at runtime based on the user's OpenGL version and graphics card. It can even fall back to main memory and emulate vertex arrays on very old graphics cards.

The VAR structures hide the real memory handles and store type information. They contain only a few bytes and can be passed by value without substantial overhead.

The complexity lies beneath the surface, hidden from you. The graphics card must know the fundamental data type (float, int, double, and so on) of the data in the array, the number of coordinates per element, and the total size of the array. This information is all present in the C++ type system, so we want to avoid requiring you to write boilerplate to express it—and to avoid potential bugs where you express the type incorrectly. Our solution is to use templates to access and work with types.

Listing 5 shows the VAR constructor. It extracts the type T on which the Array has been parameterized (there is also a T* version, not shown). The number of elements is easily obtained from the array, and the sizeof macro tells us the number of bytes in each element—all that is needed is a trick to obtain the number of components and their format from a type. The glFormatOf macro provides this trick. Like sizeof, it is a macro that maps types to values. It invokes the static method _GLFormat<T>::type that returns an enum representing the coordinate type of T. The _GLFormat class uses partial template specification to supply different values for each of the known vector types. You can even introduce your own classes with the DECLARE_GLFORMATOF macro. According to the C++ specification, we could have used a partially specified function template instead of a class. However, because the MSVC++ 6 compiler's template implementation is extremely buggy, this does not work. Fortunately, the STL uses partially specified templates for classes, so the compiler must support them correctly.

A computer with a graphics card is really an asynchronous parallel processing machine because the graphics card contains its own processor(s). This creates a synchronization problem similar to multithreaded programming. The graphics card cannot read from a vertex buffer while the CPU is writing to it. The vertex arrays are dynamic (VAR::update and the methods for reusing space in a VARArea are not shown in the listing), so most programs have the potential for disaster—reading while writing can crash the computer.

Different graphics cards provide different functionality for determining when memory is being accessed. The Milestone class detects the graphics card mechanism and maps it to a platform-independent interface. A milestone is semantically a flag that is inserted at the beginning of the graphics card pipeline and flows through to the end. The milestone is "unset" when it is inserted and "set" when it emerges. Although the milestone functionality is available to programmers, most are unaware of it because it is primarily used deep within the VAR implementation.

We use the milestone as a kind of mutex. When a vertex array is used for rendering, we attach a fresh, unset milestone to it and send the milestone through the pipeline after the triangles that use the array. Because all read operations against the array must complete before the milestone is set, we know that a set milestone indicates an array that is safe to write to. If a new batch of triangles is rendered, a new milestone overrides the previous one. Writing is now safe because the VAR::update routine simply blocks on the CPU if the vertex array has an unset milestone.

There are two further wrinkles to the implementation. First, even if the milestone is overwritten before it emerges from the pipeline, we cannot prematurely deallocate it. The milestone wraps hardware resources that will leak if not properly freed and, on some platforms, those resources cannot themselves be deallocated until the milestone emerges from the pipeline. We solve this with our standard resource solution—milestones are reference counted and the RenderDevice holds a list of pending milestone pointers. (Interestingly, this is the equivalent of a program stack or root set for objects currently in the pipeline. Graphics-card pipelines are hundreds of stages long and unlike traditional garbage collectors, we can't afford to flush the pipeline before collection.) The second wrinkle is that about 12,000 rendering calls are issued per second. Allocating and deallocating a few bytes on the heap for each is not unreasonable, but allocating the underlying resources on the graphics card is prohibitively expensive. Our solution is pooled storage buffers, the common C++ design pattern for this situation, albeit typically used for heap allocation. The first Milestone allocated initializes a static array of the underlying graphics card milestones. Future constructor invocations pop the last element from that array and destructor calls push onto the array. Should the array size ever hit zero, a new batch of card resources is allocated and appended.

Conclusion

G3D uses almost every language feature in a C++ programmer's arsenal:

compile-time types
overloading functions
operators and casts
inheritance
templates
static factory methods
partial template specification
inline assembly
generated code
pooled allocation
macros
destructors
stringizing operator
fake file and line numbers
pragmas
static local variables

At the same time, G3D ensures that you need not understand these C++ features or how they are used. This style of using language complexity to achieve simplicity and safety is the key to the success of the library. Many of the techniques discussed have applicability beyond graphics. Yet, what is perhaps most interesting is how G3D uses language features to solve problems that are specific to graphics. I encourage you to consider that theme for your own code. Mediocre programmers design by rote, applying the same formulaic patterns everywhere. They allow the complexity of C++ to make their interfaces complex. Instead, consider your problem domain and usage scenarios carefully. Then use the complexity of the language to create powerfully simple interfaces.

References

[1] http://g3d-cpp.sf.net/.

[2] ftp://parcftp.xerox.com/pub/gc/gc.html.

[3] Miller, Justin. "Clean Up: C++ Garbage Collection," BYTE, January 1996 (http://www.byte.com/art/9601/sec12/art3.htm).

[4] iCompile, http://ice.sf.net/.