December 1995/Smart Pointers for C++ Garbage Collection

Memory Management

Smart Pointers for C++ Garbage Collection

Gregory Colvin

Dr. Colvin has been squeezing large programs into small computers for over twenty years. He is a Senior Scientist at Information Management Resources, in Englewood Colorado. He may be reached at gregor@netcom.com.

Introduction
Creating objects in C++ is easy: just say new. Destroying objects is equally easy: just say delete. Nonetheless, memory management in C++ is a difficult problem. The problem is clearly expressed by Bjarne Stroustrup in The C++ Programming Language, Second Edition:
"The fundamental question about memory management can be stated in this way: If f() passes or returns a pointer to an object to g(), who is responsible for the object's destruction? A secondary question must also be answered: When can it be destroyed? ... From the point of view of a library provider, the ideal answers are 'the system, and 'whenever nobody is using the object any longer.'"
Many other languages provide built-in "garbage collection" that comes very close to this ideal. Using a garbage collector means never having to say delete: the garbage collection system itself figures out which objects are no longer in use and deletes them automatically. The draft C++ Standard has no garbage collection, so if you need it you must borrow it, buy it, or build it. Below I present one approach to building a garbage collector.
Many implementations of garbage collection are possible, most too complex to present in a short article. I have chosen to present a very simple "mark and sweep" collector. Fortunately, the performance of a well implemented mark and sweep collector can be excellent for programs not requiring bounded real-time response.
A mark and sweep collector operates in two phases:

In the mark phase it starts from the root pointers and marks every garbage collected object which can be reached from those pointers.

In the sweep phase it destroys all the unmarked objects.
A root pointer is any pointer located in memory that is not garbage collected. When garbage collection is part of the language, the compiler can help in tracking root pointers and traversing objects, but writing a new compiler is beyond my skill and interests. Instead, I have chosen to replace critical raw pointers with "smart pointers" that behave much like pointers, but that also implement the necessary bookkeeping for tracing the roots.

Smart Pointer Interface and Usage
Before tackling the implementation, let's look at the interface and intended usage. The interface is as follows:

template <class alloc> void* operator new(size_t n,alloc& a) { return a.allocate(n); } template <class X> struct gc { X* allocate(size_t); }; template<class X> struct gc_ptr { gc_ptr(X* p=0); ~gc_ptr(); gc_ptr& operator=(const gc_ptr&); gc_ptr& operator=(X*); operator X*() const; X* operator->() const; X operator[](size_t) const; X& operator[](size_t); };
The template operator new is a recent addition to the draft C++ Standard.
The template gc is an allocator for garbage collected objects. To be garbage collected, an object must be allocated with a gc allocator using the template operator new.
The gc_ptr template declares classes of "smart pointers." Objects allocated with the gc allocator will be destroyed by the garbage collector when they are no longer reachable through any root gc_ptr. A root is a gc_ptr that is not itself in a garbage-collected object.
The usage of a gc_ptr is much the same as for an ordinary pointer. The gc_ptr template provides conversions to and from ordinary pointers, so that a gc_ptr can be used in most all places a pointer can be, with the added convenience of not needing to call delete. This convenience is not without costs. One obvious cost is that implementing an efficient garbage collector is a non-trivial task. Another cost is that the overloaded pointer operations may require function calls and other overhead, and the actual garbage collection can take enough time to cause noticeable pauses in program execution.
A more subtle cost is the uncertainty of when, and in what order, the destructors for garbage collected objects will be called. This can cause difficulties in multi-threaded systems, or if the destructor for a garbage-collected object tries to access another object that has already been destroyed by the garbage collector.
To get maximum efficiency and safety from gc_ptr, follow these simple rules:
1. Always wrap pointers to newly allocated garbage collected objects with a gc_ptr.
2. Avoid unnecessary construction and assignment of gc_ptr objects. In particular, do not define gc_ptr variables as function parameters. Use raw pointers instead.
3. Always use gc_ptr variables for class members that may point at collected objects.
4. Do not access any gc_ptr in the destructor for a garbage-collected object.
Note that the following statements, though allowable, do not create garbage-collected objects:

gc_ptr<T> p(new T); gc_ptr<T> q(&t);
To create a garbage-collected object you must instead use:

gc_ptr<T> p(new(gc<T>())T);
Also, beware of mistakes like:

T* p = new(gc<T>())T;
which will compile just fine, but will not register the new object with the garbage collector. A similar, but more subtle, mistake would be:

void f(T*); f (new(gc<T>()) T;
which should instead be written with a gc_ptr temporary:

f (gc_ptr<T>(new(gc<T>()) T));

MS-DOS Implementation
Over the years I have become convinced that efficient memory management requires non-portable code. Memory is a precious resource on all systems, and systems vary widely in all aspects of its implementation and use. So rather than present portable code, I present code fine-tuned for the Borland compiler, targeting real-mode MS-DOS on 16-bit 80x86 processors. However, I have also isolated system-dependent code as much as possible, so that my code should be reasonably easy to reimplement on other systems.
The source code is organized into five files. GC.H (Listing 1) provides the portable interface specified above, using the non-portable gc_link class, gc_data class, and gc_value template specified in GC_IMP.H (Listing 2) . GC.CPP (Listing 3) contains the non-portable implementation of these classes. RUNTIME.H (Listing 4) and RUNTIME.CPP (Listing 5) provide access to the portions of the runtime library I need, as well as patches to track the relevant portions of the draft C++ Standard.
MS-DOS runs only in real mode on the Intel 8086 and compatible processors. Two facts about this processor in this mode dominate my code: memory is segmented, and memory is scarce. A real-mode pointer consists of a 16-bit segment and a 16-bit offset. Segments are indexes to 16-byte paragraphs, thus limiting the effective address space to one megabyte. The design of MS-DOS further limits easily usable memory to less than 640 kilobytes.
I take advantage of the segmented architecture by aligning all object allocations on paragraph boundaries, so that the address of each allocated object has a unique segment. Thus, in much of the code I can compute with 16-bit segments, which will fit in a single 8086 register, rather than 32-bit pointers, which require two registers. I take advantage of the scarce memory by organizing the free store as a contiguous sequence of allocated and free paragraphs, accessed through a simple rover algorithm. Larger, virtual memories would require a more sophisticated page-management approach.
Before the first allocation, the gc_statics constructor grabs all but 4 Kb of the memory provided by MS-DOS, then builds an empty heap by zero filling each free paragraph. The allocate function simply walks the heap, skipping over allocated segments and gathering up contiguous empty paragraphs until it satisfies the request or runs out of heap. Allocated segments begin with a 32-bit gc_hdr structure which stores the size and state of the allocation. Also stored in the gc_hdr is a count of the number of root pointers pointing to that segment.
When the allocate function fails, the operator new functions call the current new-handler. The gc_handler function is installed as a new-handler that performs garbage collection. If portions of your code must run uninterrupted, you can disable garbage collection by removing this new_hander with set_new_handler(0). Collection is re-enabled by calling set_new_handler(gc_handler).
The gc_data::Mark function performs the mark phase of collection. It turns on a static IsMarking flag and proceeds to walk the heap, calling the gc_data::mark_all function for each allocated, unmarked segment with a non-zero root count. The mark_all function causes all gc_ptr objects within a segment to recursively mark the segments they point to. Thus, at the end of the mark phase, all and only objects reachable from root gc_ptr objects are marked. The gc_data::Sweep function then performs the sweep phase of collection. It simply walks the heap, calling gc_data::destroy_all for each marked object, which in turn causes the appropriate destructor to be called.
How does gc_data::mark_all accomplish such magic? All garbage collected objects are allocated as sub-objects of a class derived via the gc_value template from the gc_data class. The mark_all function is a virtual function generated by the gc_value template which invokes the gc_data::mark_loop(size_t) function. This mark_loop function loops over the possibly multiple elements in a segment, invoking the virtual mark(void*) function also generated by the gc_value template.
The mark function appears to accomplish nothing. It simply casts its argument and assigns an object to itself. But looks are deceiving. The assignment operator for gc_link checks the IsMarking flag, and when it is set it calls the mark_all function for the gc_data it points at. Since the default assignment operators for a class will invoke the assignment operator for each of its members, including any gc_link members, and since all gc_ptr classes are derived from gc_link, we have effectively tricked all reachable objects into marking themselves.
Given the gc_link assignment trick, all that remains is to keep track of the root pointers. The copy constructor, assignment operator, and destructor for gc_link accomplish this by incrementing and decrementing gc_hdr:: root counters according to whether the gc_link itself is stored in a garbage collected object.

Alternatives
A more sophisticated commercial implementation of garbage collected smart pointers is available from Geodesic Systems (support@geodesic.com). An alternative approach to C++ garbage collection, which does not require smart pointers and which better handles the timing of object destruction, is the Ellis interface to the Boehm collector. C/C++ source is available on the Internet via anonymous FTP from Xerox PARC (parcftp.xerox.com: pub/gc/gc.tar.Z).

References
[1] J.O. Coplien. Advanced C++ Programming Styles and Idioms (Addison-Wesley, 1992).
[2] D.R. Edelson and I. Pohl. "Smart Pointers: They're smart but they're not pointers." In Proceedings of the 1991 Usenix C++ Conference.
[3] J.R. Ellis and D.L. Detlefs. "Safe, efficient garbage collection for C++." In Proceedings of the 1994 Usenix C++ Conference.
[4] B.S. Stroustrup. The C++ Programming Language, Second Edition (Addison-Wesley, 1991).
[5] B.S. Stroustrup. The Design and Evolution of C++ (Addison-Wesley, 1994).
[6] P.R. Wilson. "Uniprocessor Garbage Collection Techniques." In Proceedings of the 1992 International Workshop on Memory Management.