April 1997/Persistent Lists Using ISAM Files

Databases/Persistent Objects

Persistent Lists Using ISAM Files

Lionel Lopez

Persistent objects needn't involve complex file management if you can lean on an existing manager of sufficient complexity.

Not long ago I worked on a distributed system for automating on-air playback of music and commercials for radio stations. We needed a system that could iterate through playlists, merge them, randomly jump around in them, and add, delete, and modify items in the lists. In short, the system had to manage the playlists in much the same way that current class libraries (e.g., STL, Rogue Wave's Tools.h++) manage various types of container classes. There was an added twist, however: the lists had to be persistent, and they needed to be of arbitrary length, which precluded the use of simple streaming I/O for transferring entire lists between primary and secondary storage (i.e., between RAM and hard disk).

We also wanted to sort and access items in the lists according to their different attributes. The attributes could be many, because the playlists contained information about the music, commercials, and other items scheduled to play on the air. Even so, we did not need all the horsepower (or licensing fees) that came with existing relational systems.

The solution we came up was a persistence mechanism implemented with the Indexed Sequential Access Method (ISAM) of managing data files. Since a class definition can be thought of as the schema definition for a traditional data table [1] , objects of this class correspond to individual records in such a table. This correspondence made a simple file-record manager (such as ISAM) seem like the most obvious and straightforward implementation for our persistent playlists. In particular, ISAM would allow us to keep the entire list on disk and to access items in any order or at random without having to read the entire list into memory. The end design resulted in a small group of classes which offered this functionality while hiding gory details, and which allowed permanent storage to be defined for any arbitrary class by the addition of a single class.

Design Requirements

The general specifications for the playlist system were as follows:

The program would be able to freely exchange objects with the data file, access them in any arbitrary ordering, and at no point worry about the details of either the object store, or the object itself. We wanted a collection of objects that exhibited ISAM-like behavior, that was not limited by RAM, and that could persist between invocations of the program. Ideally, the application would not perceive that it was manipulating objects, which did not really reside in RAM. The application would only visit them as long as they were needed before they went back to rest on disk again. Because of the iterative nature of our development process, we wanted such persistent object lists to be easy to define and modify.

The ISAM system had to integrate with the rest of our application, which was implemented in C++, making thorough use of OO design. A good old C++ wrapper would have sufficed to make the interface to the ISAM file manager easy to use and portable, but to make the layout of the data files easy to define and change, the data files would have to return live objects. A flat ISAM record running naked through an application full of objects was bound to develop unnecessarily strong affinities with the methods that access it. We needed the data abstraction and information hiding of well defined objects to maintain data implementation details under cover, and allow the definition of the data types to be easily changed.

The General Implementation

The design of the persistent lists of objects is based on a triad of classes. First, there is a common base class, ISAMFile, which does the grunt work of managing and exchanging records with the ISAM file. Next, a translation class, ObjFile, is derived from this base class. ObjFile is a template class which provides the schema details for each specific type of data file and converts records from this file to and from the corresponding objects. The third group of classes are those whose objects will be stored in the data files.
Class ISAMFile provides a wrapper around third-party ISAM engines. Its interface provides the basic functionality expected of any ISAM package. The interface makes no reference to any data types specific to a third-party ISAM API, making it easier to switch to another ISAM engine in the future should the need arise. The methods of class ISAMFile, the skeleton of which is shown in Listing 1, export functionality in the following categories:

manipulating ISAM files (creating, deleting, renaming)

informational (file name, number of records, number of indexes, etc.)

sequential and random read, write, and update of records

building and changing indexes on the fly

transaction management (begin, commit, rollback)

file and record lock management

Class ObjFile (Listing 2) , which is derived from ISAMFile, provides an interface to objects that closely parallels ISAMFile's interface to ISAM records. As a template class, ObjFile really represents a family of classes, each of which is instantiated for the particular type of object it must handle.

Class ObjFile is the workhorse of the lot. It handles all the calls to the underlying ISAM engine (through base class ISAMFile), and provides the application with a clean object-oriented interface to the list manipulation and ordering functionality. The application exchanges objects with ObjFile, and ObjFile exchanges flat records with the ISAM engine. This implies that somewhere along the way a conversion must take place between a live object and a flat record. I discuss this conversion process later.

ISAMFile Implementation

Most of class ISAMFile is relatively straightforward. This class maintains state information about the file name, the access mode, the number of records, the current record number, and the currently defined indexes. It does not keep information about the internal format of the record, nor about which is the current index. This information is maintained external to this class, and is passed in as needed to the index management methods of class ISAMFile.

Class ISAMFile's declaration exports several enumerations and structures for use in specifying index keys. One enumeration lists the data types the ISAM engine can use as an index key field. The structure ISAMFile::KeyField contains information about each field used in an index key: its data type, its position in the record layout, and whether the ordering for this field is ascending/descending or case-sensitive. The structure ISAMFile::IndexKey contains a list of the fields, represented as ISAMFile::KeyField structures, which make up the key. It also contains a flag field which indicates whether the index must be unique and whether the fields in the entire index may be modified for any given record.

ISAMFile's constructor takes a parameter indicating the name of the data file which the object will create and manage. New ISAM files are created using method Build:
AccessID
Build( IndexKey * primary_key,
       int rec_size,
       ISAMError * error = 0,
       LockModes lock_mode = exclusive,
       AccessModes access_mode = readWrite );
This method takes a parameter indicating the size of the records in the file, and optionally a description of an index key to use as the primary key for the file. The size of the record is really all the information that the ISAM engine needs to create and manage a data file. This is because an ISAM engine manages fixed-sized records, and is not concerned with what is contained in each record. The description of the primary index key is used to indicate that one or more regions of this amorphous record space are to be interpreted as specified data types and are to be used in creating an ordered index of the records in the file.

On opening or building a file with methods Open or Build, class ISAMFile returns a file access id. All subsequent calls to file access methods must pass in this id. Intuitively, it seems logical to store this file access id as a member of class ISAMFile. We chose not to because the particular ISAM engine we used allows a single data file to be opened more than once, returning a different access id each time it is opened. The application can then select different indexes for each different access id and maintain different current record pointers for each one. We have leveraged this feature to support iteration, as explained below. The only hole in this design is that, conceivably, you could create an instance of ISAMFile for one data file and pass this instance the access id for another data file. The object would allow you to access the second data file without complaints. This is not a major drawback, just a minor blemish in class ISAMFile's overall cohesiveness.

The index manipulation methods DelIndex and SelectIndex use the ISAMFile::IndexKey structure to identify which index to delete or use. The method SelectIndex also allows you to specify where to position the record pointer initially after switching to the new index:
ISAMError
AddIndex( AccessID file_id,
          struct IndexKey &  new_index );

ISAMError
DelIndex( AccessID file_id,
          struct IndexKey &  old_index );

ISAMError
SelectIndex( AccessID file_id,
struct IndexKey & index,
SearchModes mode,
char * start_rec,
int significant_key_bytes );
For example, the record pointer may be placed at the first record, the last record, or a record matching certain criteria. The last parameter to this method may be used to indicate that only a subportion of the key should be used in positioning the pointer. This is useful in cases where, for example, you want to select a last name/first name index and you want to position the pointer at the first Smith record. Since in this case you do not know the first name of the first Smith record, you would use this parameter to ignore the first name portion of the index.

The Object Implementation

As mentioned previously, class ObjFile is a template class. (The sidebar gives reasons for our use of templates, as opposed to extensive use of inheritance.) ObjFile provides basically the same set of methods as does ISAMFile, but the interface is specified in terms of objects rather than flat records (Listing 2) . While class ISAMFile requires a pointer to char for manipulation of records, the methods of ObjFile take a reference to the object of the specific type being stored:
ISAMError GetCur( ObjType & obj,
                  boolean lock =
                     FALSE );
The way indexes are specified in class ObjFile differs slightly from how they are specified in class ISAMFile. Since the application code that makes use of ISAMFile objects will be dealing with flat records, the index key fields should be specified in terms of position within the record. This is how they are specified in the ISAMFile::KeyField structure described above. But the application code that uses ObjFile will be dealing with objects instead of flat records. In this case it makes sense to enumerate the attributes of the classes whose objects are being stored, and to use this enumeration to specify which attributes are to be used in an index key. The structure ObjKeyField, exported as part of the interface declarations supporting class ObjFile, is declared accordingly:
struct ObjKeyField {
    // holds enum values
    int fieldNum;
    // same as in ISAMFile
    short flags;
};
Since the mapping between class attributes and the record layout is specified and managed by each class derived from ObjFile, these derived classes are also responsible for providing an enumeration of the attributes which may be used in an index key. These classes must also provide the mapping between the ObjKeyField and ISAMFile::KeyField structures. Class ObjFile provides two overloaded versions of method ConvertKey, which provide a default conversion between these two representation of index key fields. These versions of ConvertKey assume that the attributes of the stored objects are enumerated in the same order as are the fields in the record. These two overloads are declared virtual so that the derived classes may override the conversion logic if the enumerations do not match.

Apart from these methods, ObjFile's most interesting method is its constructor:
template < class ObjType >
ObjFile<ObjType>::ObjFile(
    const char * file_name,
    ObjFieldInfo * field_info,
    int num_fields ) : ISAMFile( file_name )
{
    fieldInfo_ = field_info;
    numFields_ = num_fields;

    fileId_ = -1;
    recSize_ =
        fieldInfo_[ numFields_ - 1 ].start +
        fieldInfo_[ numFields_ - 1 ].len;
    isamRec_ = new char[ recSize_ ];

}
The constructor expects a file name, an array of structures describing each field in the record, and the number of fields in the record (i.e., in the array). Since the constructor for this class is protected, the responsibility for providing the field information falls to each derived class. ObjFile's constructor passes the file name along intact to base class ISAMFile. The constructor stores the field information array and number of fields for later use in converting index key field information between ObjKeyField and ISAMFile::KeyField structures.

The constructor also uses the field information to calculate the size of the record needed to store the object. If only a subset of the attributes are specified in this array (since not all of them may be adequate for use in an index), then a padding field must be added to the array so that the size of the record used is calculated correctly. The constructor then allocates a buffer of the correct size for later use in exchanging records with ISAMFile methods.

The class ObjFile declares two methods, ConvObjToRec( X * obj, char * rec ) and ConvRecToObj( char * rec, X * obj), which it uses to convert live application objects of class X into storable records and to bring these objects back to life again from the ISAM records.

If X were fixed to a specific class and the above methods were implemented accordingly, then class ObjFile would be useless for managing lists of any other types besides X. Instead, we wanted to keep only the common interface and implementation in ObjFile, and to move the conversion details of these routines elsewhere. This is where the derived classes come in. Declaring these two methods pure virtual pushes down to derived classes the knowledge of how an object of a particular class maps into its storage record. The constructor for class ObjFile is declared protected as well to emphasize the fact that this is an abstract base class which should only be instantiated as part of the instantiation of a derived class.

Supporting Iteration

Class ObjFile provides all the methods necessary for traversing and managing the objects in the list. In our radio playback system, however, we needed to simultaneously access the same list more than once, according to different indexes. For example, the disc jockey needed to see the playlist in play sequence order, while some background processes needed to access objects in this same list according to their status and unique id. We satisfied this requirement by implementing an iterator class, ObjFileIter (Listing 3) , for class ObjFile, which would allow different simultaneous views and orderings of the same list. This iterator is also a template class and is instantiated with the same class used to instantiate ObjFile.

As mentioned above, our implementation of these iterators relied that the third-party ISAM engine allowing the same data file to be opened more than once. Each time it is opened the engine returns a different file access id, and for each such id it maintains a separate current index and record position. The trick then is to associate each ObjFile and ObjFileIter instance with a single access id. Each of these instances can then maintain a different ordering and record pointer in the list, thereby presenting a separate view of the list.

Class ObjFileIter's interface is nearly identical to that of class ObjFile for opening and closing a file, for record access, and for index management. Class ObjFile provides private versions of all its record and index management methods and declares ObjFileIter a friend class to allow it to access these methods. Class ObjFileIter maintains a reference to the ObjFile object over which it is iterating, and routes all calls to the corresponding private methods of ObjFile. The public methods of ObjFile also route to the corresponding private versions.

For syntactic convenience, class ObjFileIter also provides several operators for the basic list iteration functionality. These operators cannot return status codes indicating end-of-list or error conditions, so some other mechanism must be used. The operators return "null" objects to indicate these conditions. Class ObjFileIter exports method IsNull to check if an object returned by one of these methods is "null." Internally, it calls methods IsNull and MakeNull of class ObjFile. These methods are declared virtual in class ObjFile, since it is the classes derived from ObjFile that know the specifics of the stored classes, and how best to check and set their objects' null status.

An Interesting Example

Listing 4 shows an example from the radio playback system that demonstrates creating and using permanent lists of objects. This listing shows part of the declaration for a PlayEvent class, which maintains information about the items in the playlists scheduled for on-air playback. Class PlayEvent is basically just a data structure with some additional exported functionality and types to create and manage a permanent list of PlayEvent objects. In this example I use the PlayEvent sequence number and status as keys for accessing the list. Most other attributes are informational.

Listing 5 shows how to manage permanent lists of PlayEvent objects, with a new class PlayListFile derived from ObjFile. The first item of action is to instantiate ObjFile using class PlayEvent, and to inherit PlayListFile publicly from ObjFile<PlayEvent>. This new template instance provides a class whose record and index management records are strictly typed for PlayEvent objects.

The next thing to do is to define the schema for the data file used to store PlayEvent objects, and pass this information to the constructor of ObjFile<PlayEvent> in the base class initializer. Listing 6 shows the declaration for the structure PlayEventRec (private to the compilation unit) as the flat record used for storing PlayEvent objects. Aside from data fields corresponding to PlayEvent's attributes, this structure also exports some declarations for enumerating its fields, their positions, and their lengths. The code in this listing also declares an array of ObjFieldInfo (with internal linkage) and initializes it to contain all the pertinent information, including data type, for all the fields in PlayEventRec. This array, together with the number of fields and file name, is passed as a parameter to the ObjFile<PlayEvent> constructor in the base class initializer for the PlayListFile constructor.

primaryIndexKey, another internally linked variable of type ObjIndexKey, holds the primary index key for PlayListFile. This index key is initialized to define a key with no duplicates. It is comprised of a single field, the sequence number, whose ordering sequence is ascending (value zero means use the default ordering). This key becomes a parameter to method SetPrimaryKey of ObjFile<PlayEvent>, which is called from the body of the PlayListFile constructor.

Listing 6 also shows definitions for the conversions routines ConvRecToObj and ConvObjToRec, which were declared pure virtual in class ObjFile. These routines utilize two methods, PutFieldStr and GetFieldStr. These latter routines convert any of the class attributes to and from string representations. They were originally created for use with the system's GUI, but they come in handy here as well.

The next functions to be implemented are IsNull and MakeNull. In this case a null PlayEvent is one whose unique id attribute is -1.

Note that class PlayListFile does not need to export an enumeration of the fields in PlayEvent. These fields are already specified by the enumeration PlayEvent::Fields. We created this enumeration to be used with the GetFieldStr and PutFieldStr methods of class PlayEvent. The classes whose objects are to be stored usually will not have any such an enumeration as part of their interface, so it will be necessary for the classes deriving from ObjFile to provide them. Also note that we did not need to override the ConvertKey overloaded methods since the PlayEvent::Fields enumeration maps precisely to the PlayEventRec::RecFields enumeration. In other circumstances a programmer would have had to override these methods to correctly map the object field to the record field.

Listing 7 shows the new permanent list in action. First, a PlayListFile object is created and associated with a file name. (For the sake of this example assume that this file already contains interesting data.) The example code opens the playlist and goes to the first item. Using the default key based on sequence number, the code iterates through all play events and prints out some information for each. Next, the code creates an index based on the status field. An iterator is opened on the playlist, and using this new index, the code iterates until it finds the first item whose status is Cued.

Summary

It took some effort to design and implement the infrastructure to manage persistent lists of objects. Even so, we ended up with a flexible and functionally rich toolkit which is easy to use and extend. We can easily create and manage lists of objects with sizes not limited by RAM. We can simultaneously access the same list in different sequence orders and maintain different iteration pointers for each. The programmer can freely manipulate objects in lists without ever having to be concerned about the details of how the lists are managed or how the objects are handled internally.

Notes

[1] I intentionally use the wording "can be thought of," which means something different from "is equivalent to." Relating classes and tables is common in current DBMSs and in the minds of most software engineers. For an enlightening discussion of why this is not a desirable way of associating the OO and relational models, I would encourage you to read the writings of C.J. Date and Hugh Darwen:

C.J. Date. An Introduction to Database Systems, 6th Ed., Addison-Wesley (Especially part VI).
C.J. Date, Hugh Darwen. "The Third Manifesto," ACMSIGMOD Record 24, No. 1 (March 1995).
Also of Interest: "Interview with C.J. Date," DBMS Magazine, October 1994.

Lionel Lopez is President and Senior Consultant with Volks Media Corporation, a consulting and software development firm in the Washington, D.C. area. He specializes in client/server systems combining relational and OO architectures. He may be reached at (703)-998-6908 or lionel@volksmedia.com.