Design Issues in Persistent Systems


We had to make a few major design choices in implementing our persistent list system. Our rationale for these choices may be of interest if you are involved in your own object-oriented design project. The major issues in question were:


Interfacing to the ISAM Engine

We could have made do without class ISAMFile, and defined the methods of ObjFile to access the third-party ISAM engine directly. This kind of design would have provided about the same level of encapsulation and hiding of the third-party ISAM API. We chose not to do so for two reasons. The first was to simplify the insulating layer between our code and the third-party API. Class ISAMFile exists for the sole purpose of providing this encapsulation, and is therefore simple to understand and reuse. More importantly, by placing the encapsulation code in class ISAMFile we avoided some of the code bloat that would have occurred had this code been incorporated into class ObjFile, and hence into every instantiation of this template class.
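The layering can be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not show how ObjFile and ISAMFile are connected, so the sketch arbitrarily uses containment, and a std::map stands in for the third-party engine. The names AddRec, ReadRec, RecLen, Add, Read, Point, and PointFile are hypothetical; only ISAMFile, ObjFile, ConvObjToRec, and ConvRecToObj come from the text.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// Non-template insulating layer. In a real system this would be the only
// class that calls the vendor's ISAM API; here a std::map stands in for
// the engine (an assumption made purely so the sketch can run).
class ISAMFile {
public:
    explicit ISAMFile(std::size_t recLen) : recLen_(recLen), nextId_(1) {}

    std::size_t RecLen() const { return recLen_; }

    // Stores one fixed-length record and returns its record id.
    long AddRec(const char* rec) {
        long id = nextId_++;
        records_[id] = std::string(rec, recLen_);
        return id;
    }

    // Copies record `id` into rec; returns false if the id is unknown.
    bool ReadRec(long id, char* rec) const {
        std::map<long, std::string>::const_iterator it = records_.find(id);
        if (it == records_.end()) return false;
        std::memcpy(rec, it->second.data(), recLen_);
        return true;
    }

private:
    std::size_t recLen_;
    long nextId_;
    std::map<long, std::string> records_;
};

// Template layer: only this thin, type-dependent code is generated again
// for each stored class X. All engine access is delegated to ISAMFile.
template <class X>
class ObjFile {
public:
    explicit ObjFile(std::size_t recLen) : isam_(recLen) {}
    virtual ~ObjFile() {}

    long Add(X* obj) {
        std::string rec(isam_.RecLen(), '\0');
        ConvObjToRec(obj, &rec[0]);
        return isam_.AddRec(rec.data());
    }

    bool Read(long id, X* obj) {
        std::string rec(isam_.RecLen(), '\0');
        if (!isam_.ReadRec(id, &rec[0])) return false;
        ConvRecToObj(&rec[0], obj);
        return true;
    }

protected:
    virtual void ConvObjToRec(X* obj, char* rec) = 0;
    virtual void ConvRecToObj(char* rec, X* obj) = 0;

private:
    ISAMFile isam_;  // containment is an assumption of this sketch
};

// Hypothetical stored type and its data file class, for illustration only.
struct Point { int x; int y; };

class PointFile : public ObjFile<Point> {
public:
    PointFile() : ObjFile<Point>(sizeof(Point)) {}
protected:
    void ConvObjToRec(Point* obj, char* rec) { std::memcpy(rec, obj, sizeof(Point)); }
    void ConvRecToObj(char* rec, Point* obj) { std::memcpy(obj, rec, sizeof(Point)); }
};
```

However many classes ObjFile is instantiated for, the bodies of AddRec and ReadRec are compiled once; each instantiation adds only the short Add and Read wrappers.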

Inheritance vs. Templates

As mentioned elsewhere in the article, class ObjFile uses two methods to convert live application objects of class X to and from storable ISAM records. These methods are ConvObjToRec( X * obj, char * rec ) and ConvRecToObj( char * rec, X * obj).

The design presented two options regarding the signatures for these methods, and for that matter, all other methods of ObjFile that take objects of type X as parameters. We wanted to be able to replace X in the declarations listed above with any class for which we may want to define a data file. In the actual code that implements this design we needed to replace X with something the compiler could accept.

The most direct solution was to define a new common base class, and to inherit from it all classes whose objects we want to store. The type X in the method signatures above would then be replaced by the name of this base class. This alternative is quite acceptable for the many programming problems where you have the luxury of writing from scratch the classes whose objects you want to store. However, we found the constraint that all stored classes must inherit from a common base class a bit limiting. We found it especially limiting because we were trying to construct a general purpose mechanism which could be reused, refined, and extended in the future. In particular, we wanted to be able to store objects of any arbitrary class without having to change these classes to inherit from a specific base class.

An alternative solution, which we chose, was to make ObjFile a template class parametrized by the class of the objects we wanted to store. Using templates introduces some minor problems. The treatment of templates is not yet universally consistent across different compilers (see "Templates and Today's Compilers," by Anil Admal and Chris Tarr, CUJ, January 1997), and templates tend to add code overhead to the application. On the other hand, templates provide tighter type checking and generally faster code. But more importantly, using templates made it easier to design and implement new data files. We could grab any existing class whose instances we wanted to store, instantiate a copy of the ObjFile template for this class, and then derive the data file class to handle the schema and conversion details. We did not need to modify these classes to inherit from some other base class. This template approach allowed us to totally separate the objects the application used from the mechanisms used for managing and traversing lists of such objects, as well as from the mechanisms used to translate and store these objects on disk. The objects to be stored remain blissfully unaware that they are being put to sleep and reawakened by this mechanism.
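As a rough illustration of the template approach, the sketch below stores objects of a class that knows nothing about persistence. It is not the article's actual code: Customer, CustomerFile, Add, Read, and the record layout are hypothetical, and ObjFile keeps its records in a std::map rather than in a real ISAM file so that the example is self-contained.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <string>

// An existing application class, written with no knowledge of persistence
// and not derived from any storage-related base class.
class Customer {
public:
    Customer() : id_(0) {}
    Customer(long id, const std::string& name) : id_(id), name_(name) {}
    long Id() const { return id_; }
    const std::string& Name() const { return name_; }
private:
    long id_;
    std::string name_;
};

// Template parametrized by the stored class. Storage is an in-memory map
// here (an assumption); the conversions are left to the derived class.
template <class X>
class ObjFile {
public:
    explicit ObjFile(std::size_t recLen) : recLen_(recLen), nextId_(1) {}
    virtual ~ObjFile() {}

    long Add(X* obj) {
        std::string rec(recLen_, '\0');
        ConvObjToRec(obj, &rec[0]);
        long id = nextId_++;
        records_[id] = rec;
        return id;
    }

    bool Read(long recId, X* obj) {
        std::map<long, std::string>::iterator it = records_.find(recId);
        if (it == records_.end()) return false;
        ConvRecToObj(&it->second[0], obj);
        return true;
    }

protected:
    virtual void ConvObjToRec(X* obj, char* rec) = 0;
    virtual void ConvRecToObj(char* rec, X* obj) = 0;

private:
    std::size_t recLen_;
    long nextId_;
    std::map<long, std::string> records_;
};

// Data file class: owns the record layout and conversions for Customer.
class CustomerFile : public ObjFile<Customer> {
    // Hypothetical layout: a long id followed by a NUL-terminated name.
    enum { kNameLen = 32, kRecLen = sizeof(long) + kNameLen };
public:
    CustomerFile() : ObjFile<Customer>(kRecLen) {}
protected:
    void ConvObjToRec(Customer* obj, char* rec) {
        std::memset(rec, 0, kRecLen);
        long id = obj->Id();
        std::memcpy(rec, &id, sizeof id);
        // Names longer than kNameLen - 1 characters are truncated.
        std::strncpy(rec + sizeof id, obj->Name().c_str(), kNameLen - 1);
    }
    void ConvRecToObj(char* rec, Customer* obj) {
        long id;
        std::memcpy(&id, rec, sizeof id);
        *obj = Customer(id, std::string(rec + sizeof id));
    }
};
```

Passing anything other than a Customer* to CustomerFile::Add is rejected at compile time, and Customer itself needed no changes. The conversions rely only on Customer's public accessors and constructor.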

This method of separating stored objects from the schema and conversion information is not without its problems. It can be difficult to convert objects whose internal state we cannot adequately access, either directly or through accessor methods. For example, important object state information may be stored in private data members, or may be calculated based on some external information. However, because each class derived from ObjFile is free to implement the conversions in whichever fashion is most appropriate for the class of objects to be stored, we can deal with awkward designs on a case-by-case basis.

Our main concern in making ObjFile a template class was to allow objects of any arbitrary type to be stored, with type-safety as a very desirable side-effect.

Maintaining Object Identity

Rudimentary schemes for object persistence usually stream objects out in a specified order, and stream them back in the same order. Every object is identified solely by its position in the stream. In such a scheme, we know that if we write object x to the stream, followed immediately by object y, then after we read in object x, the next one will be object y. Once loaded into memory, each object is identified by its RAM address until the entire list is again streamed out to disk. This type of scheme amounts to nothing more than taking a snapshot of a selected set of application objects at a given moment, and then later restoring those objects to their previous state. This scheme is all well and good for such things as swapping to disk to free up more RAM, or storing the state of the user's workspace until the next time the program is invoked.

But object identity is not such a simple matter in our persistent lists. In the streaming approach above, the stored objects have only a single identity relevant to the application at any given time. An object either resides in memory, or it is stored on disk. Our persistent lists present problems once we have accessed the object from disk, because now two physical copies exist (RAM and disk) of the same logical entity, and both copies now have different unique identities. Both copies are relevant to the application since the code must know whether an object it wishes to access resides in RAM or on disk in order to use the proper addressing mechanism to get at the object. The problem is exacerbated when the same object is simultaneously accessed several times, since it will clone itself in RAM with a different identity each time.

Several techniques are available for resolving this ambiguity problem. They range from tailoring the memory management schemes to putting the burden on the application. One such scheme uses reference counting and a static list of identifiers of objects that have already been loaded into RAM. Another scheme declares as a new abstract data type a pointer to the object class, and overloads the address-of and dereferencing operators.

We took the easy way out of this problem. Our application does not need to hold on to these objects for very long; nor is it a transaction-intensive application; furthermore, it needs to access between one and at most twenty objects at a time. So we let this application access an object it needs and return it to the object store immediately thereafter. In other words, we let the ISAM engine's cache keep our objects live in RAM. In this scheme we identify these objects always by their record id. The ISAM engine then determines whether an object must be read in from disk, and always returns a RAM identifier for the object.
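The access pattern might look something like the following sketch. It is a simplification under stated assumptions: Account, AccountFile, Add, Read, Update, and Deposit are all hypothetical names, a std::map stands in for the data file and the engine's cache, and the object is copied out rather than handed back as a pointer into the cache.

```cpp
#include <map>
#include <string>

// Hypothetical stored type.
struct Account {
    std::string owner;
    long balance;
};

// Stand-in for a data file backed by the ISAM engine and its cache
// (an assumption; the real engine decides whether to touch the disk).
class AccountFile {
public:
    AccountFile() : nextId_(1) {}

    long Add(const Account& a) {
        long id = nextId_++;
        records_[id] = a;
        return id;
    }

    bool Read(long recId, Account* out) const {
        std::map<long, Account>::const_iterator it = records_.find(recId);
        if (it == records_.end()) return false;
        *out = it->second;
        return true;
    }

    bool Update(long recId, const Account& a) {
        std::map<long, Account>::iterator it = records_.find(recId);
        if (it == records_.end()) return false;
        it->second = a;
        return true;
    }

private:
    long nextId_;
    std::map<long, Account> records_;
};

// The application holds only the record id. The in-memory object exists
// for the duration of this one scope and is returned to the store at once.
bool Deposit(AccountFile& file, long recId, long amount) {
    Account acct = {"", 0};
    if (!file.Read(recId, &acct)) return false;  // fetch by record id
    acct.balance += amount;                      // work on the live object
    return file.Update(recId, acct);             // give it straight back
}
```

Because nothing outside Deposit ever sees the in-memory object, no second identity for the record can leak into the rest of the application.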

The only time our application refers to these objects by their RAM addresses is within a single block or scope, where we know there will not be other accesses to an object with the same id. Preemptive multithreaded applications using this technique must either declare a critical section around the code that manipulates the object via a RAM pointer, or make use of the ISAM engine's record locking. The latter approach is a bit more complex to code, since the code that accesses the objects must now deal with the possibility of being blocked. With critical sections, if the object is blocked, the operating system deals with the fact, and the code simply sits idle until the requested object is no longer blocked, never knowing the difference.
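The critical-section option can be sketched as below. This is an illustration only: it uses the standard std::mutex and std::lock_guard as a portable stand-in for an operating system critical section, and Counter, CounterFile, Fetch, RecordHit, and g_storeLock are hypothetical names. Fetch plays the part of the engine returning a RAM address for a record id.

```cpp
#include <map>
#include <mutex>

// Hypothetical stored type.
struct Counter { long hits; };

// Stand-in for the data file and the engine's cache (an assumption).
class CounterFile {
public:
    CounterFile() : nextId_(1) {}

    long Add(const Counter& c) {
        long id = nextId_++;
        cache_[id] = c;
        return id;
    }

    // Returns the RAM address of the cached object, or 0 if the id is unknown.
    Counter* Fetch(long recId) {
        std::map<long, Counter>::iterator it = cache_.find(recId);
        return it == cache_.end() ? 0 : &it->second;
    }

private:
    long nextId_;
    std::map<long, Counter> cache_;
};

// Plays the role of the critical section guarding RAM-pointer access.
std::mutex g_storeLock;

bool RecordHit(CounterFile& file, long recId) {
    std::lock_guard<std::mutex> guard(g_storeLock);  // enter critical section
    Counter* c = file.Fetch(recId);  // pointer is used only inside this scope
    if (!c) return false;
    ++c->hits;
    return true;
}  // guard is destroyed here, leaving the critical section
```

A thread that calls RecordHit while another thread holds the lock simply waits inside the lock_guard constructor; the calling code contains no explicit handling for the blocked case.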