Think of a persistent object broker as a poor man's database. Then think of all the ways you might use it.
In many applications, objects that are created exist only while the program is executing. However, there are times when it would be useful to preserve the state of an object from one run of the program to the next. Such object persistence can be achieved by saving the object's state to a file or database. Using a database often amounts to overkill; a simpler solution is needed. One simpler solution is to use a Persistent Object Broker, or POB. This article will show how you can use a POB to save objects to files for reuse. The POB presented here also supports modification of persistent objects, permanent deletion, and fast object retrieval.
The POB gives each instance of a class a unique OID (Object Identifier), which the POB uses to reference and access the instance. The OID is separate from the class data members and cannot be changed by your application. A POB's OID is not to be confused with a UID (Universal Object Identifier) OIDs are unique only within a given scope, so you can't use the same OIDs on two different POB sets. However, you can change the OID implementation to make it universal and give it the ability to be used across different domains.
POB Structure
The POB consists of two groups of classes. The first group provides an interface for users. It includes the templates POBbase and POBroker, and the POBObj class. The second group provides the core functionality of the POB and includes the classes Gbiostream, Gfactory, and DiskMBTree. I first discuss the POB from a user's perspective, which includes requirements on user classes, and functions the POB provides that enable users to store and retrieve persistent objects.
The User Interface
Every class you want to make persistent must inherit from the POBObj class (see Listing 1). The two overloaded operators shown in Listing 1 act as the data channel for the object, providing bidirectional streaming of data between the object and the disk object file. The assignment operator, operator=, takes a string as its input parameter. The POB calls this operator in the process of reading an object from the object data file. The other operator, operator const string &, converts an object into a string. The POB uses this conversion operator for writing an object into the object file.
Your class must override these two operators to translate its internal data from and to std::strings. Specifically, the assignment operator must convert the input string into values and assign them to data members; the conversion operator must convert data members to an output string, and return it.
The snippet below shows a sample persistent class ABC, and its implementation of the assignment and conversion operators:
class ABC: public POBObj { public: static POBObj* create() { (POBObj*) return new ABC; } ... operator const string& (); void operator=(const string&); ... private: int i; double d; string s; ... } // assignment operator void ABC::operator=(const string& str) { putData(str); //* your code start from here *this >> i; *this >> d; *this >> s; } // conversion operator ABC::operator const string& () { clear(); //* your code start from here *this << i; *this << d; *this << s; return getData(); }Your implementation of the assignment operator must call the function putData before attempting to stream in any data. putData is a member function of class Gbiostream, which is POBObj's immediate base class. The putData function takes the input string that was passed to the assignment operator and copies it into a data buffer, which is also a member of Gbiostream. Function putData is defined as follows:
void Gbiostream::putData(const string& str) { _data = str; // copy string to // data buffer idx = 0; }After calling putData, your assignment operator can stream in data from the data buffer using overloaded versions of operator>>. These operators are inherited from class Gbiostream; they extract values from the data buffer into their arguments.
Your implementation of the conversion operator (operator const string &) must call Gbiostream's member function clear before attempting to stream out any data. clear simply cleans out the data buffer:
void Gbiostream::clear() { _err[0] = _err[1] = false; _idx = 0; _data.resize(0); }After calling clear, your conversion operator can stream data to the data buffer using overloaded versions of operator<<. These operators are also inherited from class Gbiostream; they convert their argument values to strings and insert them into the data buffer.
Finally, your conversion operator must call the getData function before exiting. getData is another function inherited from Gbiostream. It simply returns a reference to the data buffer:
const string& Gbiostream::getData() { return _data; }The order in which you send or receive data members doesn't have to match their order in the class declaration; however, the order that you send data in the conversion operator must match the order that you receive it in the assignment operator. This principle is illustrated in the sample class ABC above.
Note that Gbiostream's data member, _idx, is reset to zero in both the putData and clear functions. _idx is an index into the data buffer. Gbiostream's overloaded >> operators use this index (indirectly) to determine where in the data buffer to begin extracting data. This index must be set to zero before your implementation of the assignment operator starts streaming in data from the buffer.
Your classes must also provide a static class creation function, as shown in the ABC class above. The POB will use this function to create new instances of your classes. You register both your class, and the creation function, by calling the POB's register_creation_function:
// create persistent object broker POBroker<POBObj> pob("TEST.OBJ"); ... // register class ABC and its create function pob.register_creation_function(typeid(ABC).name(), ABC::create) ...When you call register_creation_function, the POB in turn registers your class creation function with the GFactory class a data member of the POBbase class. You call register_creation_function by passing in the class typeid and a pointer to the class creation function. (The typeid is a string returned by C++'s RTTI facility. It returns a unique name for an expression or a type.)
When the POB retrieves an object from the disk file, GFactory uses the specified class creation function to construct the object instance.
The POBroker Class
The POB consists of two templates: POBroker (Listing 2) and POBbase.
POBroker acts as the public interface to the objects on the disk file, and provides both sequential and random retrieval, as well as the ability to save, modify, and delete objects. Specifically, using POBroker, you can carry out the following operations:
- Insertion. You can call the function:
POBbase<T>::OBJ_ID writeObj(const T&)to save your object, of type T, to the object file. If the write is successful, writeObj returns an object identifier.- Reading. The function:
POBbase<T>::OBJ_ID readObj(T*&)performs a sequential read from the object file. As an example of its use, the following code shows how you can easily "dump" all the objects in an object file:int main() { POBroker<POBObj> pob("TEST.OBJ"); pob.register_create_function(typeid(ABC).name(), ABC::create); pob.register_create_function(typeid(EFG).name(), EFG::create); pob.register_create_function(typeid(HIJ).name(), HIJ::create); ... ABC* pabc; EFG* pefg; HIJ* phij; while(!pob.eof()) { POBObj* po; if(pob.readObj(po)) { if(typeid(*po) == typeid(ABC)) { ABC* pabc = (ABC*)po; ... } else if(typeid(*po) == typeid(EFG)) { EFG* pefg = (EFG*)po; ... } ... POB_DEL_REF (po); } } return 0; }readObj(T*&) will return zero if there is no object at the current disk position.Note that in the above example, the template POBroker is specialized with the template argument POBObj, the base class for persistent objects. Since so far I have presented POBObj as the only base class to be used for all persistent objects, there will apparently be no need for other specializations it may seem strange to have made POBroker a template in the first place. I made POBroker a template for design reasons: if I develop different base classes, they can use the same persistent mechanism, so as to separate the object details from the persistent carrier tasks performed on the object.
A second version of readObj can be used to read a specific object from the object file:
T* readObj(POBbase<T>::OBJ_ID)If you pass this function a valid object identifier it will retrieve the object from the disk file for you. readObj(OBJ_ID) will return a null pointer if no object is found.
- Deleting. After you have finished working with an object, use the function:
bool POB_DEL_REF(T*);to free the object from memory. There is another function,bool deleteObj(POBbase<T>::OBJ_ID, T*&)which should not be confused with POB_DEL_REF. The POB_DEL_REF function just releases the object from memory, but deleteObj removes the object from both memory and the disk file. In its second parameter, deleteObj will return a pointer to the object deleted from the disk file it is the user's job to release this object from memory by using POB_DEL_REF.- Updating. As I have shown, you can use readObj to read an object from the disk file and to get a pointer to the object. You can then modify the attributes and data members of that object. After changing the object, you can resave it using:
POBbase<T>::OBJ_ID modifyObj(T*);When the POB writes an object back to the disk file, it may relocate the object in the file if the modified object's size exceeds its original size. Figure 1 shows the object file format.
The POBbase Class
POBbase (see Listing 3) serves as a base class for the POBroker class. POBbase contains subobjects of type GFactory<>, objAvl, and DskMBTree; POBroker uses these subobjects to accomplish its POB tasks.
The sole purpose of template class GFactory (see Listing 4) is to create object instances. In the above example, when the code calls
pob.register_create_function(typeid(ABC).name(), ABC::create);the call is forwarded to the method
add_function(const string &, <function pointer>)of the GFactory class.
GFactory uses an STL map to store the association between the object's typeid and its creation function. It stores the typeid string as the key, and the pointer to the creation function as the value.
The map enables the POB to create objects of a requested type. The POB calls the GFactory method, create, passing it the typeid of the object to be created. The GFactory uses this typeid to look up the correct creation function from its map. This is why it is important to register your class creation function before calling any POB functions.
objAvl (see Listing 5) is a class that keeps track of empty (available) spaces in the object file. When an object is deleted from the object file, the space freed up is recorded in the objAvl object. By consulting the objAvl object first, the POB can sometimes store new objects to the object file without having to increase its size. Since objects that are stored in the object file have variables sizes, a special data structure is required to keep track of available storage locations. objAvl uses an STL deque of SIZE_POS for this purpose.
SIZE_POS is defined as follows:
typedef pair<size_t, streampos> SIZE_POS;The first member of the pair is the size of the empty slot in the object file; the second member is the slot's location.
Class objAvl is itself a persistent class, and inherits from class POBObj; a single instance of class objAvl is embedded in the object disk file along with the other stored objects.
The deque is kept in sorted order by empty slot size, so that a best-fit strategy can be used when searching for space to hold a new object. The best-fit strategy finds the first available slot closest to the size required to hold the new object.
The contents of the objAvl object will change frequently, as other objects are deleted, or modified and put back with a different size. To avoid having to frequently relocate the objAvl object on the disk file, the size it occupies on the disk file must be kept from changing. This is accomplished by padding the deque with dummy entries immediately before it is written to the disk, so that the deque always requires the same amount of storage.
Class objAvl maintains a const data member, _max, which is initialized when an instance of objAvl is constructed. It sets the initial size of the deque that will be written out to disk (with padding).
objAvl::objAvl():_max(100), _pad_size(0) { _acc_size = _max; }_acc_size is the current value for the size of the deque that will be written to the disk file. (It does not indicate the current value of the deque itself; that value is given by the deque's size member function. The deque will be padded to _acc_size immediately before it is written.)
_pad_size indicates how many dummy entries will be added to the deque before it is written to the disk. _pad_size is recalculated immediately before the deque is written to the disk.
If the number of entries in the deque exceeds objAvl's current limit (given by _acc_size), this limit is enlarged by _max/2. It remains at this new value until it has to be increased again.
The padding of the deque takes place within objAvl's conversion operator, which is called by the POB to write the contents of the objAvl to the disk file:
objAvl::operator const string& () { ... if((_pad_size = _acc_size - _avl.size()) < 0) { spaceRecollection(); // empty slot recollection if((_pad_size = _acc_size - _avl.size()) < 0) { _acc_size += _max/2; _pad_size = _acc_size - _avl.size(); } } SIZE_POS ps(0, 0); _avl.insert(_avl.begin(), _pad_size, ps); //padding data .... return getData(); }When the object is retrieved from disk file, the padding is filtered out:
void objAvl::operator=(const string& str) { putData(str); *this >> _acc_size; *this >> _pad_size; *this >> _avl; removepad(); // filter out padding data }objAvl also does some space reclamation work for you, combining two empty slots into a single large slot if two slots on the availability list are physically adjacent.
The Disk Interface
The current version of the POB maintains three disk files. One is an object file for storing all saved objects; one is a multiway tree file which records all object positions in the object file, thus acting as an "index" to access objects from the object file; and the last file enables the POB to manage the multiway tree file disk space this file records all unused disk space in the multiway tree file. The whole multiway tree resides in the disk file, and only the head node is loaded into RAM when the POB is started.
As mentioned previously, the POB's goal is not only to save an object for reuse, but also to provide fast retrieval. This is why I chose the multiway tree as the backbone of the POB. In general, reading a block of storage from disk is very slow compared to reading one from memory; but the multiway tree improves the efficiency considerably.
Normally, users will not need to interact directly with the multiway tree to use the POB. For more information on the multiway tree, see the sidebar.
Using the POB
Using the POB consists of three steps:
1. Derive any classes you want to make persistent from the POBObj class.
2. Implement your class creation function.
3. Implement the two operators previously discussed. These operators must override the virtual operators provided in class POBObj.
Two examples of how to use the POB are shown here. Listing 6 demonstrates how to modify an object obtained from the POB. In this example, the program first saves the ABC object into the POB. Then the program retrieves the object using its OID, which was returned by the write function. Some of the object's data members are then modified, and the object is saved back into the POB. Finally, the same OID is used to retrieve the object again and display the values of its current data members.
Listing 7 shows how to delete an object from the POB. The program of this example passes the OID from Listing 6 into the POB's deleteObj function to remove the object from the POB. The program also calls the POB_DEL_REF function to release the object from memory, using a pointer which was returned from the deleteObj function.
Other Considerations
Using a best-fit strategy to select an empty slot is intuitively appealing, but it exacts a price. The strategy requires that the POB search through at least a part of the list, not only when it needs to find an empty slot for a new object, but also when recording the space freed up by a newly deleted object. This extra processing time could be significant in some real-time applications.
The current POB is a simple persistence implementation. A persistent object's data member, which is also an object, doesn't have an object identifier associated with it. The POB does not directly retrieve this data member; rather, this object is simply saved and retrieved as a part of the parent object. Thus, saving two persistent objects that both reference the same data member object will cause two copies of the object to be saved.
Although the POB relies on object identifiers to insert, search, delete, and modify persistent objects, it is possible to access such objects in a more user-friendly manner. By pairing some unique attribute of an object with its OID, the object can be accessed using that attribute as a key. This is accomplished through use of a map, as shown in the following example.
Class ABC is a persistent class that has a unique attribute, an order number:
class ABC : public POBObj { public: ... string& orderNumber() { return OrderNumber; } ... private: string OrderNumber; ... };Class OrderTable is another persistent class that associates order numbers with OIDs:
class OrderTable : public POBObj { typedef string order_number; typedef unsigned long OID; map<order_number, OID> odrtbl; ... public: ... // add key-value pair(odrnum, oi) into odrtbl by using // odrtbl's insert() function void add_order(order_number odrnum, OID oi); // * get OI by passing order number into odrtbl's // find() function OBJ_ID getOrderOI(order_number); ... ... }The program creates both an ABC object and an OrderTable object:
POBroker<POBObj> pob("TEST.OBJ"); ... OrderTable ordtbl; ABC abc; ...The ABC object is written to disk and the object's order number is associated with its OID through a call to the add_order function:
... unsigned long oi = pob.writeObj(abc); ordtbl.add_order(abc.orderNumber(), oi); ...Later, the program uses the object's order number as a key, to indirectly retrieve the ABC object:
ABC* pabc = (ABC*)pob.findObj (ordtbl.getOrderOI("AAA0001"));Tested Platforms
All POB files and examples have been compiled and tested on Solaris X86 and Redhat 6.0 using GNU C++ and SGI STL.
Acknowledgment
I would like to thank my friend and colleague John Ousey for his help, without which this article would not have been possible.
Gary Hsiao works as a programmer at RBC Dominion Securities Corporation. He has an M.A. in Computer Science from Queens College, City University of New York. He can be reached at ghsiao@rbcds.com.