Persistent objects aren't all that hard to implement, but they do take a bit of, well, persistence.
Introduction
About two years ago, I wrote a program to simulate the oscillation of a spring for physics students. I wanted to turn this program into an "authoring" program, which would allow a teacher (not a programmer) to prepare a sequence of simulations for students. The authoring program would enable the teacher to control initial values and other parameters. It needed to pass various data structures from the user interface (initially Delphi) to the simulator, which was implemented as a DLL. Each simulation would also be accompanied by text, and might include a multiple-choice question or a graphic.
I saved all this information in a Paradox database, which already had appropriate fields for the text and graphics. However, saving the various numeric parameters required a field for each of them. Despite saving a lot of flags in the bits of a long integer, the record turned out to have a messy 30 fields, of which 25 were parameters. Worse, even though those values started and ended as organized data structures, they had to be moved individually in and out of database fields. Debugging and extending this part of the program was quite difficult [1].
When I next had to write a similar authoring program, I tried to find a more flexible mechanism that would write to the database any needed object with a minimum of coding. I also wanted to avoid disorganized storage of information among various database fields; I wanted to maintain type safety; and I wanted reusability in the sense that I could easily add new object types in a new application. What I came up with was a simple object persistence technique.
An Approach to Persistence
My persistence technique relies on a combination of templates, RTTI (Run Time Type Information), and the map container from STL (Standard Template Library). I write each of the objects I need to one Blob (Binary Large Object) field [2]. Once I've written an object to a Blob field, I can later get a copy of it and find out its type.
I easily extended the mechanism to write my objects to a disk file, and read them back sequentially. Since this is a more standard solution, I am going to present that technique here [3].
User Interface to Persistence
The program in Listing 1 demonstrates the use of the simple object I/O mechanism. You need two classes, declared in the header file UIOObj.h (Listing 2), that do the I/O for your objects. One is TIOObjectBase. The other is a class template derived from TIOObjectBase:
template<class T> class TIOObject
The template parameter T is a placeholder for the types (structs or classes) for which you want to provide file I/O. As a template, TIOObject creates (instantiates) a different class for every type of object you use.
TIOObject has one public constructor with one argument of the same type as the template parameter, the object that you want to write.
Writing Simple Objects
Look at the writeFile function (Listing 1), which writes some objects of type X, Y, and XY defined in the beginning of the program, to see some equivalent ways of using TIOObject for output. To write an object x of type X to a file stream pointed to by fstr, you must construct an object of type TIOObject<X> with x as a parameter. Then, call its write(fstream*) method with a pointer to a file stream as a parameter. The two steps can be combined into one:
TIOObject<X>(x).write(fstr);
or
TIOObject<X>(X(20)).write(fstr);
if your class has a suitable constructor like X does.
Reading Simple Objects
Input is trickier. Function useFile (Listing 1) shows how to get input and process it.
Since the program does not know what type of object it is going to read, the input cannot go to TIOObject<X> or TIOObject<Y>, etc. Instead, you must use a pointer to the common base class TIOObjectBase:
TIOObjectBase* pIOObjectBase;
pIOObjectBase = TIOObjectBase::Create(*pfstream);
If the read is successful, pIOObjectBase points to a TIOObject<X> or a TIOObject<Y> type object, etc., which contains one of your objects (here an X, Y, or XY). But to use the TIOObject, you need to know the kind of object it contains and obtain a pointer to that object. So, you must write code for each class you want to process.
Every TIOObject template class provides its own static function PtrData(TIOObjectBase*), which returns either zero or a pointer to the contained object. For the X objects, for example, the code for obtaining the pointer is
if ( X* p = TIOObject<X>::PtrData(pIOObjectBase) ) {
    // process p as an X*
}
That's all there is to doing input. However, notice the try block enclosing the Create(*pfstream) input code. End of file is recognized by catching a member exception of TIOObjectBase. The TIOObjectBase::EUnknownClass exception can be used for debugging.
Object Limitations
There are two limitations on the structures or classes you can use. First, they must be "plain old data" classes that contain no pointers or references to other objects, nor any other data members that are not simple objects themselves. In other words, it must be possible to initialize their objects with a simple memory copy of their contents. A char * field will not do, since the contents don't have a fixed size and are not stored sequentially with the rest of the object. But you can use a char something[20], as well as any other C-style array of simple objects. Second, the classes must have an implicit or explicit default constructor.
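On a newer compiler, part of the "plain old data" requirement can be checked at compile time. The following is only a sketch, assuming C++11 or later and its <type_traits> header, which postdate the compilers discussed in this article. The types Good, HasString, and HasPointer and the helper isStorable are hypothetical names invented for the illustration; they do not appear in the listings.

```cpp
#include <string>
#include <type_traits>

// Hypothetical example types, not taken from the listings.
struct Good       { double amplitude; char label[20]; };      // fixed-size members only
struct HasString  { double amplitude; std::string label; };   // text lives on the heap
struct HasPointer { double amplitude; char* label; };         // unsuitable, but see the caveat

// Hypothetical helper: true when the compiler considers T copyable byte for
// byte and T has the default constructor the library needs.
template <class T>
constexpr bool isStorable() {
    return std::is_trivially_copyable<T>::value &&
           std::is_default_constructible<T>::value;
}

static_assert(isStorable<Good>(), "Good is safe to store");
static_assert(!isStorable<HasString>(), "std::string members are rejected");
// Caveat: a raw pointer is itself trivially copyable, so the check passes
// even though the pointed-to text would not be saved.
static_assert(isStorable<HasPointer>(), "raw pointers slip through the check");
```

The check catches members with their own copy logic, such as std::string, but a raw char * still has to be ruled out by inspection.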
Initialization and Cleanup
See the top of Listing 1 for the initialization you need. Declare a pointer to the disk file stream:
fstream* pfstream;
See also the init function of Listing 1. You must have a statement like:
TIOObject<X>::Register(); [4]
for every type of object you expect to read from your disk file, even though you might not want to process it in that program. Writing objects requires no such initialization, but input won't be able to continue after encountering an object of a missing type. However, the TIOObjectBase::EUnknownClass exception will make the name of the missing class available in the string returned by its mess member (see Listing 1, useFile function).
Implementing Simple Persistence
Listings 2 and 3 contain the implementation of the TIOObjectBase class and TIOObject class template.
There are ways to do similar object I/O that do not require rather new language features like templates, RTTI, and STL, but not without intruding on your I/O classes, either by deriving them from some common class or by overloading the ios << and >> operators for each of them [5].
I did not want to have to derive all my I/O classes from some common base that would do the job for them, so I used templates. The TIOObjectBase class and the derived template classes TIOObject<T> will provide I/O for any type T that satisfies the requirements I mentioned above.
TIOObjectBase has no data members. TIOObject<T> has only one data member, T Data, containing the simple structure for which we want to provide I/O.
Writing to a File
Writing objects is not a problem, because the object's type is known. The problem is how to read arbitrary types of objects, with arbitrary sizes, in arbitrary order. The program must know how many bytes to read each time and what type of object it has read. The requirements for reading back objects determine the kind of information that must be written to the file when an object is stored.
For starters, when the objects are read back from the file, they must be able to identify themselves to the reading function. I used the typeid(T).name() function of RTTI (see C++ RTTI documentation) to get the class names and write them to the file.
Take a look at the write members [6] of TIOObjectBase (Listing 3). The public TIOObjectBase::write(fstream&) is the one users will call. This function calls the protected TIOObjectBase::write(ostrstream &). write(ostrstream &) uses the name method of TIOObject (declared virtual abstract in TIOObjectBase, so that the correct TIOObject<T> method is called) to write two pieces of information to a temporary memory buffer: the size of the actual class name and the class name itself [7].
write(ostrstream &) also copies the contents of the "simple I/O" object to the memory buffer, again using virtual methods dataStart and dataSize. TIOObjectBase::write(fstream&) then copies everything from the memory buffer to the disk.
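The record layout described above (size of the class name, the class name, then the raw bytes of the object) can be sketched with standard streams. This is not the code of Listing 3. It assumes std::ostream and std::ostringstream in place of fstream and ostrstream, a hypothetical free function writeRecord, a 32-bit length field chosen for the example, and the name of T itself rather than the name of the TIOObject<T> wrapper. The string returned by typeid(T).name() is implementation-defined, so a file written this way should be read back by a program built with the same compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <typeinfo>

struct X { int value; };   // hypothetical simple object

// Hypothetical helper: builds one record in memory, then copies it to `out`.
template <class T>
void writeRecord(std::ostream& out, const T& data) {
    const std::string name = typeid(T).name();
    const std::uint32_t nameSize = static_cast<std::uint32_t>(name.size());

    // Step 1: assemble the whole record in a temporary memory buffer.
    std::ostringstream buffer(std::ios::binary);
    buffer.write(reinterpret_cast<const char*>(&nameSize), sizeof nameSize);
    buffer.write(name.data(), nameSize);
    buffer.write(reinterpret_cast<const char*>(&data), sizeof data);

    // Step 2: copy the finished buffer to the real destination.
    const std::string bytes = buffer.str();
    assert(bytes.size() == sizeof nameSize + name.size() + sizeof data);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```

Building the record in memory first mirrors the two-step write described above: the same buffer could be sent to a disk file or to any other destination.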
Reading from a File
The implementation of input is more complicated, because the program does not know what type it is reading. It just gets a string containing the object's name, and C++ provides no way to get a type from a string with the type's name [8]. That's where the create static member functions come in.
If the object's type is known, each TIOObject<T> type knows how to construct itself from file input with its static TIOObjectBase* create(fstream&) function (Listing 2). This function just creates a new object of its class type, reads as many bytes from the file stream as it takes to hold all the class's data members, and copies the bytes to the object.
You can think of the static create functions as special kinds of constructors, because they construct new objects and return pointers to them.
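A static create function of this kind might look roughly like the sketch below. The names IOBase and IOObject are hypothetical stand-ins for TIOObjectBase and TIOObject<T>, std::istream replaces fstream, and returning a null pointer on a short read is an assumption made for the example, not necessarily what Listing 2 does.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

struct IOBase {                       // stands in for TIOObjectBase
    virtual ~IOBase() {}
};

template <class T>
struct IOObject : IOBase {            // stands in for TIOObject<T>
    T Data;
    IOObject() : Data() {}

    // Acts like a constructor: builds a new wrapper from the bytes in the stream.
    // The signature mentions no T, so it is identical for every instantiation.
    static IOBase* create(std::istream& in) {
        IOObject<T>* obj = new IOObject<T>();
        in.read(reinterpret_cast<char*>(&obj->Data), sizeof(T));
        if (!in) {                    // short read: discard the partial object
            delete obj;
            return nullptr;
        }
        return obj;
    }
};

struct X { int value; };              // hypothetical simple object
```

The caller owns the returned object and is responsible for deleting it.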
The real problem is how to call the correct constructor for each object we read. To do that, we will use function pointers. Although the create(fstream&) constructors construct a new TIOObject<T>, they return a TIOObjectBase*, and their declaration does not contain a T. I did this on purpose to make them all have exactly the same signature, the same argument type, and the same return type.
Because of this common signature, it is possible to assign all the create(fstream&) functions to the same type of pointer to function. The TIOObjectBase class (Listing 2) defines a pfCreate type, which can hold a pointer to any create function:
typedef TIOObjectBase* (*pfCreate)(fstream&);
Before reading the object's data from the file, the program reads a string with the object's class name. So we just need to associate this string with the create function of that particular type. If I read a "TIOObject<SomeType>" string from the file, then I must use the constructor (create(fstream&)) for the TIOObject<SomeType> template class.
We can also access these constructor functions through the pfCreate pointers introduced above. This is what makes the whole idea work. Suppose I have defined a struct:
struct createFunction {
    string className;
    pfCreate pcreate;
};
and an array of createFunction, for which each element has been assigned the name and the pointer to the create(fstream&) function for one of the classes instantiated from TIOObject<T>.
Then each time I read a class name string from the file, I can search the array for the element whose class name matches the string just read, and then call the create function pointed to by the element's pcreate member [9]. I should get the correct type of object.
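The array search just described can be sketched as follows. Only createFunction and pfCreate come from the text; IOBase, XObject, YObject, createX, createY, table, and createByName are hypothetical names, the creation functions are written by hand instead of being generated from a template, and returning a null pointer for an unknown name is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

struct IOBase { virtual ~IOBase() {} };
typedef IOBase* (*pfCreate)(std::istream&);

struct createFunction {
    std::string className;
    pfCreate    pcreate;
};

struct XObject : IOBase { int value = 0; };
struct YObject : IOBase { double value = 0; };

IOBase* createX(std::istream& in) {
    XObject* p = new XObject;
    in.read(reinterpret_cast<char*>(&p->value), sizeof p->value);
    return p;
}

IOBase* createY(std::istream& in) {
    YObject* p = new YObject;
    in.read(reinterpret_cast<char*>(&p->value), sizeof p->value);
    return p;
}

// One element per class: its name and the function that creates it.
const createFunction table[] = {
    { "X", &createX },
    { "Y", &createY },
};

// Linear search for the name, then a call through the stored pointer.
IOBase* createByName(const std::string& name, std::istream& in) {
    for (std::size_t i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (table[i].className == name) {
            assert(table[i].pcreate != nullptr);
            return table[i].pcreate(in);
        }
    }
    return nullptr;   // unknown class name
}
```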
Of course, what I get is a TIOObjectBase*, not a TIOObject<X>* or a TIOObject<Y>*. The type information has been lost again. But that can be easily fixed with the RTTI dynamic_cast operator. I can test the TIOObjectBase* pointer against each TIOObject<T> that I am interested in. Users of object I/O do the testing indirectly through TIOObject<T>::PtrData(TIOObjectBase*) (see Listing 2). This function conveniently returns a pointer not to the TIOObject wrapper, but to the ready-to-use simple object itself.
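A PtrData function along these lines can be written in a few lines with dynamic_cast. The sketch below is not Listing 2: IOBase and IOObject are hypothetical stand-ins for TIOObjectBase and TIOObject<T>, and valueOrDefault is an invented helper that shows the calling idiom.

```cpp
#include <cassert>

struct IOBase { virtual ~IOBase() {} };   // polymorphic, so dynamic_cast works

template <class T>
struct IOObject : IOBase {
    T Data;
    explicit IOObject(const T& d) : Data(d) {}

    // Returns a pointer to the contained object, or null when `base`
    // is null or does not point to an IOObject<T>.
    static T* PtrData(IOBase* base) {
        IOObject<T>* typed = dynamic_cast<IOObject<T>*>(base);
        return typed ? &typed->Data : nullptr;
    }
};

struct X { int value; };       // hypothetical simple objects
struct Y { double value; };

// Test and use in one statement, as in the reading code shown earlier.
int valueOrDefault(IOBase* base, int fallback) {
    if (X* p = IOObject<X>::PtrData(base)) {
        return p->value;
    }
    return fallback;
}
```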
dynamic_cast is the second of the less common language features that makes this mechanism possible. (Function pointers were the first.)
Improving on the Array
The problem of simple object I/O is essentially solved by the use of templates (so that each simple object type can have its own methods of saving and reading itself), RTTI (to get the class name to write to disk, and to test the kind of object by dynamic_cast), and function pointers (to treat different creation functions as belonging to the same type, so that they can be referred to from the TIOObjectBase base class and accessed through a container).
However, the new STL map container can be used to provide a more elegant and direct interface for object creation than the previously described array.
The map container is an "associative container." It associates values of one type with values of another. Its elements are pairs of values; the first value is the key, which acts as an index to the second value, much like an integer indexes the values in an array by denoting their positions.
In my case, I need to index the create function pointers by the names of the classes they instantiate. So, my keys are strings containing the class names, and their associated values are the function pointers used to create instances of those classes. See the TIOObjectBase declaration (Listing 2) [10]:
typedef map<string, pfCreate, less<string> > CreateMap;
TIOObjectBase holds a static member pointer to a CreateMap. The map is allocated when needed by the TIOObject<T>::Register function, which is also called by users of the classes to populate the map with all the classes. TIOObjectBase::reset can be used either to empty the map, so that a new set of classes can be registered, or to clean up after object I/O is no longer needed. Finally, the static TIOObjectBase* TIOObjectBase::Create(fstream&) function (Listing 3) brings everything together by reading a class name from file, indexing the map with that name, and using the function pointer to construct a new object from the file.
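Put together, the map-based registry might look like the sketch below. It is an illustration under several assumptions, not the code of Listings 2 and 3: IOBase and IOObject stand in for TIOObjectBase and TIOObject<T>, std::istream replaces fstream, a function-local static map replaces the static pointer to a CreateMap, std::runtime_error replaces the library's own exception classes, the name written to the file is that of T, and the name length is assumed to be a 32-bit field.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <typeinfo>

struct IOBase {
    typedef IOBase* (*pfCreate)(std::istream&);
    typedef std::map<std::string, pfCreate> CreateMap;

    virtual ~IOBase() {}

    static CreateMap& registry() {
        static CreateMap theMap;          // created on first use
        return theMap;
    }

    // Reads "name length, name", then calls the creator registered under that name.
    // The length field is trusted as read; real input would need a sanity limit.
    static IOBase* Create(std::istream& in) {
        std::uint32_t nameSize = 0;
        if (!in.read(reinterpret_cast<char*>(&nameSize), sizeof nameSize)) {
            throw std::runtime_error("end of input");
        }
        std::string name(nameSize, '\0');
        if (!in.read(&name[0], nameSize)) {
            throw std::runtime_error("truncated class name");
        }
        CreateMap::const_iterator it = registry().find(name);
        if (it == registry().end()) {
            throw std::runtime_error("unknown class: " + name);
        }
        return it->second(in);            // call through the function pointer
    }
};

template <class T>
struct IOObject : IOBase {
    T Data;
    IOObject() : Data() {}

    static IOBase* create(std::istream& in) {
        IOObject<T>* obj = new IOObject<T>();
        in.read(reinterpret_cast<char*>(&obj->Data), sizeof(T));
        if (!in) {
            delete obj;
            throw std::runtime_error("truncated object data");
        }
        return obj;
    }

    // Associates this class's name with its creation function.
    static void Register() {
        registry()[typeid(T).name()] = &IOObject<T>::create;
    }
};

struct X { int value; };                  // hypothetical simple object
```

Compared with the array, the lookup is a single find call, and registering a new type does not require editing a central table.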
Afterthoughts
It should be easy to extend this library to write an additional field to disk, which would indicate the length of the class data. This field would enable recovery from reading an unregistered object from the file by just moving on to the next object. But since I initially implemented my technique for Paradox Blob fields containing only one object, I could recover by just going on to the next record. So, I haven't implemented a length field. I also have not implemented a method for writing objects to a stream or for reading them back from a stream. Possibly, I should have implemented these features first.
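The length field suggested above is not implemented in the library, so the following is purely a sketch of how it could work. The functions writeSized and readNextKnown, the 32-bit size fields, and the use of a std::set of known names in place of the registration map are all assumptions made for the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Writes one record: name length, name, data length, data bytes.
void writeSized(std::ostream& out, const std::string& name,
                const void* data, std::uint32_t dataSize) {
    const std::uint32_t nameSize = static_cast<std::uint32_t>(name.size());
    out.write(reinterpret_cast<const char*>(&nameSize), sizeof nameSize);
    out.write(name.data(), nameSize);
    out.write(reinterpret_cast<const char*>(&dataSize), sizeof dataSize);
    out.write(static_cast<const char*>(data), dataSize);
}

// Reads records until one with a known name is found. Unknown records are
// skipped using their length field. Returns false at end of input.
bool readNextKnown(std::istream& in, const std::set<std::string>& known,
                   std::string& name, std::string& payload) {
    for (;;) {
        std::uint32_t nameSize = 0;
        if (!in.read(reinterpret_cast<char*>(&nameSize), sizeof nameSize)) return false;
        name.assign(nameSize, '\0');
        if (!in.read(&name[0], nameSize)) return false;

        std::uint32_t dataSize = 0;
        if (!in.read(reinterpret_cast<char*>(&dataSize), sizeof dataSize)) return false;

        if (known.count(name) == 0) {
            in.ignore(dataSize);          // unregistered type: step over its bytes
            continue;
        }
        payload.assign(dataSize, '\0');
        return static_cast<bool>(in.read(&payload[0], dataSize));
    }
}
```

With the data length stored in the file, an unregistered type costs only a skip instead of ending the read.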
Finally, C++ is a great language, and the new extensions make it even greater. If templates and STL were not such terrific concepts, I would never have gone on using them, at least with the compilers I have. You can read about the problem I had with the less<string> template in the beginning of Listing 2. Unfortunately, that wasn't the only one. Still, I think templates and STL are well worth the trouble.
Acknowledgement
This article would not have been possible without the consent and support of the Ministry of Education and Culture of Cyprus. This article was conceived in part as a result of work I am doing for the Ministry.
Notes
[1] For example, since I wanted the option of two simulations of the same type to run simultaneously, I had to duplicate each field.
[2] Paradox naming for fields that contain arbitrary-length binary data.
[3] Partial code for the C++ Builder Blob field is included in the files UBlobClass.h and UBlobClass.cpp, available on the CUJ ftp site (see p. 3 for downloading instructions).
[4] If I don't use that specific class, then my linker gives an error "Unresolved external reference TIOObject<X>::Register() referenced from". In that case, I use TIOObject<X> (X()).Register(), forcing instantiation.
[5] I looked at Borland's "persistent objects" how-to and decided I definitely did not need all that complexity. Of course, it is a different story if you need persistence for non-simple classes.
[6] I split the write member into two methods: a protected method that writes to memory through an ostrstream, and one that copies the ostrstream to disk. I did that to use the memory write to do all the work before writing to other outputs, such as a disk file or the Paradox Blob type field.
[7] Not the T's name, but the TIOObject<T> class's name. If the program writes to file an object of class SomeClass, it will write its name as "TIOObject<SomeClass>."
[8] Object Pascal can. I did the same thing in Delphi, but did not get around to using it before I decided I should use C++ Builder.
[9] To call a function when you have a pointer to that function, simply use the pointer name in place of the function name. If pcreate points to a TIOObject<X>::create(fstream&), then TIOObjectBase* p = pcreate (fstr) will be equivalent to writing TIOObjectBase* p = TIOObject<X>::create(fstr).
[10] less<string> just tells the map to use the string class's overloaded operator<(const string&) to determine an order relation for the keys. The definition of less<string> at the top of UIOObj.h (Listing 2) should not be needed. In fact, C++ Builder's compiler does not require it. But Borland's C++ v5.02 returned a compilation error without this definition, even though BC++ v5.02 uses the same STL implementation as C++ Builder. In theory, their two cstring.h headers should be the same.
Alberto Florentin has been a secondary education physics teacher employed by the Ministry of Education and Culture of Cyprus since 1988. Before that he worked as a programmer for various companies. For the last few years he has been assigned by the ministry to write educational computer simulations for the teaching of physics (among other duties). He enjoys writing these simulations in C++, since it combines two of his intellectual interests, physics and programming. He is sure that computers, and especially simulations, can have a large impact in the teaching and understanding of physics, although a lot of work needs to be done for this to materialize. He welcomes any email with comments on these issues. He may be reached at albertos@spidernet.com.cy.