C++ Smart Pointers & Tags

C/C++ Users Journal November, 2005

Memory-efficient recovery from catastrophic shutdowns

By Edward Walker

Edward Walker is currently a Research Associate at the Texas Advanced Computing Center, in The University of Texas at Austin. His research interests include building robust distributed systems, network protocols, compilers, and object-oriented systems. He can be contacted at ewalker@tacc.utexas.edu.

Recovery of long-running, system-level daemons or services in the event of catastrophic failure is an important consideration in any software design of such processes. When these systems are written in C++ and use the reference-counted smart-pointer idiom, efficient recovery of smart pointers is critical to enable scalable recovery, and even correctness of its resumed operation.

In this article, I describe a method to efficiently recover reference-counted pointers on software recovery after a shutdown. In particular, the method is memory efficient, avoiding object duplication, which results from traditional techniques of serialization and unserialization of references to pointers in C++. The technique uses tags to uniquely identify pointer objects when unserialized from an archive, an object cache to keep pointer references on recovery, and a method for smart pointers to impersonate other pointer objects.

I present this technique in the context of the development of a real event-based commerical job-scheduling system that is being used by a large market intelligence company to trigger many thousands of computational jobs in a cluster of workstations everyday.

Many C++ projects use automatic memory-management techniques because they engender benefits such as:

Code safety. Avoids the risk of deallocating an object too soon.
Code accuracy. Avoids the risk of forgetting to deallocate unused memory.
Programming simplicity. Assumes a computation model with unlimited memory.
Programmer productivity. Programmers are relieved from taking care of memory management.
Code modularity. Program does not have to be interspersed with bookkeeping code not related to the application.

A number of different C++ memory-management techniques exist and prior work has elucidated the relative merits of the different schemes [1, 4]. Reference-counted smart pointers are one such memory-management technique.

Reference-counted smart pointers [1], sometimes called the "counted body idiom," are lifetime-managed objects that have an internal counter of the number of references to it. These objects destroy themselves automatically when its internal reference count reaches zero. This is a useful technique that is used in many C++ production-software projects because it is simple and does not require any extensions to the language or compiler.

Reference-counted smart pointers can be further defined as either attached or detached [2], where an attached pointer has the reference counter within the body being counted, and a detached pointer places the reference counter outside of the object. In my experience, described in this article, my team was using the detached smart pointer idiom. This required an additional level of indirection, by overloading the -> and * operators in a smart pointer template object, to access the actual object pointer being managed. This is essentially a specific instance of the PROXY design pattern [3].

As far as I know, there is no previous work describing a scheme for recovering smart pointers in a memory-efficient manner. The traditional method of C++ serialization and unserialization of objects causes memory inefficiencies because a new object is always created when a reference to it is encountered in an object that is unserialized [5]. In the worst case, as my team encountered, this bloats the memory consumption of a recovered daemon to unacceptable levels, preventing the software from even starting up. A scheme to intelligently recover these smart pointers was needed to recover the original memory state of the daemon on software recovery.

The Problem

Memory-inefficient recovery of smart pointers can result with the naive implementation of traditional object serialization/unserialization schemes. In these traditional schemes, when an object is serialized, the member pointers in the object are dereferenced and its contents are serialized together with the object into an archive. The problem with this approach is that when the object is unserialized, the member pointers are reconstructed again, one for each object that is recovered.

An illustration of this is shown in the event-based job-scheduling system where users submit template job definitions that are instantiated by the system when appropriate events trigger them. A job definition is defined in a CJobDef object that contains the static attributes of the job, such as its executable command, its working directory, and the user ID under which the job is to be executed. Running instances of the triggered job definition are encapsulated in the CJobInst object, and contain instance-specific attributes like its process ID, execution parameters, and runtime history information. In the systems class hierarchy, every CJobInst object contains a member that references to the original CJobDef object that triggered this job instance.

Figure 1 shows the system before a software shutdown, where multiple instances of the runtime CJobInst object may reference a single CJobDef object. After software shutdown and recovery, current object recovery serialization techniques result in a CJobDef object created for each running CJobInst object, as shown in Figure 2.

This situation results from the traditional method of serializing and unserializing pointer members in a C++ class object. Listing 1, a corresponding code snippet for a CArchive class with operators >> and << overloaded to serialize and unserialize pointers to the CJobInst and CJobDef classes, illustrates this point.

Serializing the private CJobDef member m_def in CJobInst involves calling the appropriate overloaded << operator of the CArchive class. The overloaded << operator serializes the CJobDef pointer by serializing the attributes of the object into a persistent archive. Unserializing the CJobDef pointer involves constructing a new object and invoking the overloaded >> operator to update its attributes from the persistent archive.

Reference-counted smart pointers have mostly the same semantics as raw pointers, and the traditional method of serializing/unserializing these pointers exhibit the same memory-inefficiency issue.

The Solution

Reference-counted smart pointers are implemented by deriving an object from a CReferable class that contains a private reference counter and an increaseReferenceCount() and decreaseReferenceCount() method to modify the value of this counter. A corresponding Ref template class, with the ->, *, and = operators overloaded, is also implemented to manage the lifetime, and access, of this object. The Ref template causes an assignment of a smart pointer to increase the object's reference count, and its destruction to decrease the count. The object in the smart pointer is destroyed only when its reference count reaches zero. In the job-scheduling system, the CJobDef object is encapsulated in a CJobDefPtr type, defined with the statement:

typedef Ref<CJobDef> CJobDefPtr;

It is this CJobDefPtr type that the main CScheduler class uses. When users submit a job definition to the event job scheduler, a new object of type CJobDefPtr is created and the CJobDef object is assigned to it. Later, when an instance of this job is created, it is this CJobDefPtr type that is assigned to the instance. Figure 3 shows the CJobDefPtr type used by the CScheduler class.

In the CJobDefPtr class, the assignment operator = increases the reference counter in the base CReferable class of the CJobDef object, and the delete operator decreases this reference counter. The CJobDef object encapsulated in the CJobDefPtr object is not destroyed until its reference count reaches zero. This indicates that no other object is referencing the CJobDef object in the system, and it is safe to destroy it.

Again, instances of jobs created from a job definition are encapsulated in a CJobInst class. In the same way as CJobDef, the CScheduler class knows only of its corresponding smart pointer version CJobInstPtr, and instances of this object persist while there are references to it.

Our system includes three additional features that lets the event-scheduling system recover efficiently:

The CReferable class is augmented with a tag attribute to uniquely identify each created pointer instance, and a getTag() base method to access this tag.
The Ref template class manages its object in a global object cache called CReferableCache, accessible by other smart pointer objects.
The Ref template class adds an impersonate() method to allow a smart pointer to become another pointer given its tag.

When a new CJobDefPtr or CJobInstPtr object is created, a unique tag is assigned to the object in the CReferable base class constructor. This unique tag can be generated in a number of ways, but any method chosen should guarantee a unique ID across software restarts. A simple scheme is to use a static, global-persistent counter object that stores the last generated ID in a persistent archive, thereby guaranteeing a monotonically increasing ID even across software restarts. Other methods exist, but those are beyond the scope of this article.

The tag assigned to a smart pointer uniquely identifies the pointer, and it is the responsibility of the object serializer to store this tag into the archive. This tag itself is accessible to the object serializer through the getTag() method in the CReferable base class. The object unserializer then uses this tag to restore references to the correct object-pointer instance when the software is recovered. The sequence of actions that the object unserializer has to perform are:

Recover the tag from the archive.
Recover the object attributes from an archive identified by this tag.
Invoke the impersonate() method with the tag as its perimeter to recover the reference to the correct pointer object.

The impersonate() method checks if a tag indexes an object in the global CReferableCache object collection. If no object is found for the tag, then this object is added to the CReferableCache with the tag as its index, and nothing else is done. However, if an object already exists in the CReferableCache object collection, you discard the old reference by calling the set() method with this new reference, and the extraneous object copy is automatically deleted. The implementation of the CReferable and Ref classes is available at http://www.cuj.com/code/. Listing 2 uses the technique for archiving smart pointers.

Figure 4 describes the interaction between the CScheduler, CJobDefPtr, CJobDef, and CReferableCache classes in the system. The CReferableCache class has static member methods getUniqueTag(), addObject(), and deleteObject(). When a smart pointer to CJobDef is created, for example, like this:

CJobDefPtr jobDefPtr = new CJobDef

CScheduler constructs CJobDefPtr and a CJobDef object. The getUniqueTag() method is called by the CJobDef base CRefereable constructor when the object is constructed. This gives every CJobDef object created a unique identification tag. The CJobDef object is then assigned to the CJobDefPtr object, which calls its set() method to add the CJobDef object to it.

The addObject() method is called when the CJobDefPtr assignment operator is called. This adds the CJobDef object into the global CReferableCache collection if this is its first assignment. The getObject() method is called by the impersonate() method, when the smart pointer is asked to replace its internal object reference to that identified by the tag. If the impersonate() method does not find the tagged object in CReferableCache, the CJobDefPtr object replaces its internal objects tag and adds itself to the collection. Finally, the deleteObject() method is called when the CJobDefPtr is deleted and the reference count of the object becomes zero.

The production event-scheduling system described here was used in a market intelligence data company to trigger compute jobs in a network cluster of workstations. Jobs were triggered every week over a period of three days when data became available from retail vendors across the world. At any point within this three-day period, the system could have had up to 200,000 jobs running in this compute cluster. Thus, it was important for the software to be able to support restarts within reasonable memory and CPU consumption limits. Table 1 shows the memory consumption of the event scheduler daemon across restarts with various numbers of running jobs in the system. The small percentage memory overhead in our system restart was due to modules in our software that did not use our scheme to serialize/unserialize less-often used class objects. The memory is eventually reclaimed when jobs complete.

References