C++ Experts Forum


The (B)Leading Edge: Building an Indexed File Class Using XDR_Stream

by Jack W. Reeves


I want to kick off this column like I have many others with a little historical perspective. Early in my C++ career, I got involved in an effort to build a medical imaging workstation using an ordinary personal computer — in this case an Apple Macintosh IIfx. The images were stored in individual files on the largest hard disk that could be installed in the machine. The images were then indexed by several different criteria such as patient name, type of study, etc. I ended up being responsible for building this index. Eventually, it was intended that the system have a real database behind it, but for an initial prototype it was decided to build the index using a commercial ISAM file product.

Partially because I was part of the "a DBMS is overkill for this" crowd, I got stuck developing the index using the ISAM library. Actually, if you think of the hard disk as an image database, with the full path name of an image being its primary key, then I was building a number of secondary indexes. In any case, the project turned out to be relatively straightforward with the commercial ISAM product providing more than enough capability. Even now, nine years later, I am still impressed by the performance that was achieved on the hardware that we were working with. I also remain impressed by the quality of the design and the code that was produced. The development team was one of the most talented I have ever been associated with.

The only problem, from my perspective, was that the ISAM product was completely C based, whereas the rest of the project was object-oriented C++. Naturally, I hid the C-based code behind a C++ interface, but I wanted to go even further and create a general-purpose C++ wrapper for the C-based API of the ISAM mechanism. Unfortunately, I realized from the beginning that what I really needed was some type of template, and the compiler we were using did not yet support templates. Even if it had, I wasn't sure that I had enough insight at the time to come up with a good C++ ISAM API. As a result, the idea went into the "someday" file in the back of my brain.

A couple of years later, I was introduced to an early version of the STL (Standard Template Library). I quickly realized that the associative containers, especially map and multimap, had almost precisely the interface that I needed for a C++ class representing an indexed file. In fact, the internal class that implements the red-black tree usually used as the underlying implementation of all associative containers in the STL provided the most inspiration.

This class is a template that takes four arguments: Key, Item, Compare, and KeyFromItem (ignoring the Allocator for now). The first two are data types, and the second two are function object types. The red-black tree stores Items. The KeyFromItem functor is used to extract the key portion of an item so that it can be compared using the Compare functor. When used to implement a set, the Item and Key are the same type, and KeyFromItem is an identity function. When used to implement a map, the Item is a pair<const Key, Value> and the KeyFromItem functor returns the first element of the pair. I realized that this was a perfect generalization of a database that stored Items that were indexed by Keys [1].

I still had one problem however. Consider trying to build an IndexedFile with a key of type std::string. You would expect this to be quite common. While it is easy enough to write a string object to a file, it is a waste of time since the actual string data is not typically stored in the string object itself. In order to correctly store a string in a file, you have to do more than just store the object. Furthermore, strings can vary in length. I rejected outright any idea that an IndexedFile class could only be built using data types that were fixed size and stored all their data internally. Being able to use a std::string as a key is simply one example of a capability too useful to ignore. Many classes in C++ store their data on the free store and only hold pointers in the object itself. In addition, I wanted this to be a true object-oriented facility that could handle storing items that were actually objects from derived classes in a hierarchy. Taken together, this all meant that IndexedFile itself could not be expected to know how to store and retrieve the actual data it was working with; some help would have to be provided by the user.

Right after I encountered the STL, I had to do some work on a project that was using XDR. When exposed to XDR as a presentation-layer communications protocol, I was not overly impressed; it seemed too simplistic and tightly coupled to its applications. On the other hand, I quickly realized that those very traits made it a reasonable protocol for persisting objects in a stream. XDR even comes reasonably close to supporting the storage of derived data types with its concept of a discriminated union.

With XDR as a platform-independent object storage mechanism, I now had all the pieces I needed to design and implement a C++ IndexedFile class template that could store arbitrary Items — or at least I thought I had all the pieces. When I sat down to actually do it however, I found there were a challenging number of details to be addressed. On the chance that you may find some of those details interesting, I intend to describe my resulting IndexedFile class and its associated BtreeIndex class in this column. The next column will get inside the implementation.

Before I begin, let me get some caveats and qualifications out of the way. I am not trying to build a database engine with IndexedFile. The underlying metaphor is still that of an ordinary "file," more specifically the underlying metaphor is that of an XDRStream, which is in turn modeled after IOStreams from the Standard C++ library. The intent is to add a layer of additional organization to an ordinary file to make it more useful under certain circumstances. As such, the intended usage is basically the same as any other XDRStream or IOStream.

In particular, there is no provision in IndexedFile for record locking, so the concept of multiple simultaneous users of an IndexedFile is a little foreign — at least if any of them is doing updates. The implementation depends upon whatever locking is provided by the underlying XDRStream mechanism, which basically means whatever is provided by the underlying OS file system. I will admit that it might be nice if IndexedFile was more robust in this area, but I am a firm believer in keeping solutions as simple as the problem will allow. IndexedFile is intended for use by single applications that need to be able to access external data by index. If your needs are more sophisticated, then you will have to build them yourself. Maybe you could use IndexedFile (or something similar) as a starting point, or maybe you should just use a DBMS.

Class IndexedFile

Template Arguments

Listing 1 contains the public interface of class IndexedFile. Let me describe certain parts of it in detail. First, you will note that it is a template with several arguments. While IndexedFile's public interface is modeled after an associative container (multimap to be precise), it requires considerably more help from the user than the typical associative container. The class declaration is as follows:

template<typename Key,
         typename Item,
         typename Compare = std::less<Key>,
         typename KeyTraits = XDR_Traits<Key>,
         typename ItemTraits = XDR_Traits<Item>,
         size_t PageSize = 4096>
class IndexedFile;

The first two template arguments are the Key used for the indexing and the actual data Item to be stored. The third template argument is the type of a function object to be used to compare the Key values. As is usually the case, if no Compare argument is supplied, it defaults to the standard less<Key> functor.

Right off the bat, you will notice that there is no KeyFromItem argument. After some thought, I decided to forgo the idea of insisting that the Item contain the Key. My experience with the medical imaging database was a key factor in my decision. In that case, the actual items being stored were simply filenames that did not contain the key information — the key information had to be supplied externally. I decided it was simpler in the long run to have clients supply both the key and the item as part of an insert rather than having to supply a KeyFromItem functor. If the item does contain the key, the client can simply extract it before calling insert. In essence, I require the client to do whatever KeyFromItem would have done. After I made this decision, I realized that it eliminates the problem of trying to allow the user to update an Item without forcing them to remove it and re-insert it just to make sure the key portion doesn't change without a corresponding update of the index. When design decisions make things simpler for both the implementation and the client, that is usually a sign that they are the right decisions.

Up to this point, the template arguments are the same as for multimap (or map) and the requirements on the types are pretty similar. Unfortunately, there are some additional requirements imposed by IndexedFile on Key and Item types.

The most obvious is that it must be possible to encode and decode both a Key and an Item to/from an XDRStream. The next two template arguments are concerned with supplying this capability. In order to store Keys and Items in the external file, IndexedFile has to have functions like operator<< and operator>> for those types. Originally, I thought about just using those operators and requiring that they be available. This is a little restrictive, however, so I decided I had to do better. Rather than add four additional template arguments for the functions, I decided to incorporate a traits type — in this case an XDR_Traits type. As with other traits types, the XDR_Traits type is a struct that must supply certain typedefs and static functions that in turn provide information and operations on the underlying type. For IndexedFile, XDR_Traits must supply a typedef for the stream type to be used, along with static functions that encode and decode the underlying type to and from that stream.

The header file provides a template XDR_Traits that supplies a default xio_type from the XIO.h header and encode/decode operations based on the standard operator<< and operator>> functions. The XIOFileStream type uses an XIOFileBuf type that is based on the std::filebuf type in the Standard C++ library. Clients are free to provide their own version of the xio_type. It might be especially useful to provide an xio_type that is more directly connected to the underlying file system.

In addition to being able to store Key and Item types in an XDRStream, the encoding/decoding of different objects must be order independent. It should be obvious with an IndexedFile that Key/Items can be inserted and then retrieved in random order. Most of the time this should pose no problems, but recall that one of my goals is to store different object types from a class hierarchy by just being able to encode/decode a base class reference. In my previous column, I discussed how an encoder could store a type id with the object, and a decoder could recover the type id and then create the correct derived object type. The mechanism I described then was based on assigning unique type ids sequentially on an as-needed basis. The first time an id was used, a user-assigned type name was also stored with the id. That would work for a sequential object store, but not for IndexedFile. For IndexedFile, the association between the type name and the id has to be made externally. In retrospect, I think this is probably a better idea anyway. I will discuss this more when I talk about support for polymorphic class hierarchies below. For now, just note that a requirement of the Key and Item encoders and decoders is that each operation must be independent of all previous operations.

Finally, Key and Item types must also be able to handle their own memory management. Originally, I had hoped to be able to allow any type that could be encoded into an XDRStream to be used as the Key or Item type. If you have followed previous columns, you will recall that if I have a hierarchy of types derived from a class Base that I wish to be able to encode and decode in an XDRStream, I typically create the following functions:

oXDR_Stream& operator<<(oXDR_Stream&, const Base*);
iXDR_Stream& operator>>(iXDR_Stream&, Base*&);

The first function puts the actual type information in the XDRStream (a type discriminant) and then dispatches to a member function to write the object itself into the stream. The second function is responsible for reading the type information back, creating the correct derived type object, and then dispatching to a member function to actually read the object from the stream. I use a pointer in the operator<< function to emphasize that the function handles a hierarchy of types, rather than just the Base type. The operator>> function has to take a reference to a pointer since it must create the object after it figures out what type it should be. In the latter case, the function acts as a Factory, creates the correct type of object, and then turns ownership of the object over to the client. It is up to the client to make sure that the object is destroyed later.

As I said, I had hoped to be able to use such types with IndexedFile, as in:

IndexedFile<std::string, Base*>

In this case, the Item type is "pointer to Base." Assuming I have the functions described above (encapsulated in an XDR_Traits type), then originally I thought that everything should work. Unfortunately it doesn't — or at least it might not. I immediately realized that it was not possible to use something like Base* as the Key type, and after some thought, I decided it is probably not a good idea even for the Item type. For a Key type, the IndexedFile will have to decode keys from the external index as part of searching. These keys will never be seen outside of IndexedFile, but IndexedFile does not know that they might have to be explicitly deleted after being decoded. Using a pointer for Key is guaranteed to leak memory. For the Item type, the situation is a little more ambiguous. Items are only decoded from external storage when explicitly requested by a user (by dereferencing an iterator). Therefore it seemed theoretically possible to still have the user be responsible for deleting the object created by the decode operation. It may be, but I decided to make no promises that would limit my implementation options. In particular, I wanted to allow the possibility of caching. All this means that it must be possible for IndexedFile to decode both Keys and Items from external storage without worrying about whether they need to be passed to a client to be deleted.

In order to have Keys or Items that represent hierarchies of object types, you must either be using a real garbage collector, or the actual types used to specialize the IndexedFile template must be smart pointers that do reference-counted memory management. I don't see a class hierarchy being used as a Key type very often (if ever), but I expect it to be quite common as an Item type. Fortunately, reference-counted pointer types are easily obtained. See [2] or <www.boost.org>.

The final template argument is the page size to be used in the BtreeIndex implementation. This parameter will be discussed in more detail when I describe the BtreeIndex class.

Operations

The actual interface for IndexedFile is similar to the interface of the Standard C++ library's multimap container. This involved another decision that took me a long time to make. I do not claim to be a database expert, but I know enough about relational data modeling to know that every table should have a unique key. Therefore, my original idea was that IndexedFile would support only unique keys (i.e., it should look like a map).

Again, my experience with the medical imaging application provided a different perspective. As I noted in the beginning, the idea was to index images by things like patient name. Obviously, a given patient was likely to have multiple images in the database. The more I thought about it, the more I realized that support for non-unique keys was essential. Furthermore, I realized that it wasn't really any harder to implement. So, I decided that IndexedFile would be patterned after multimap instead of map. I am still a little ambivalent about this since it means that clients that want only unique keys will have to handle the elimination of duplicates themselves. While I am a firm believer in minimal interfaces whenever possible, I am also a firm believer in providing for common situations, even if it does bloat the interface somewhat. I am still thinking about this, but for now, IndexedFile looks very much like a persistent multimap. There are some differences, however.

First, since IndexedFile is a file, there are almost no const operations. The underlying file can be opened for input only, but the actual IndexedFile object is not a const object. As a result there are no const_iterators.

Another important thing to note is that dereferencing an Iterator does not return a reference to the value_type. Technically speaking, this probably means that an IndexedFile::Iterator does not meet the Standard's requirements for an iterator (although they are probably close enough for most purposes). Unlike a typical iterator that mimics a pointer, the IndexedFile Iterator is more like a Proxy object. It contains enough information to allow the Key/Item pair to be decoded from the external XDRStream when needed. Since the Key/Item pair is not stored directly as part of the IndexedFile object, the Iterator cannot return a reference to it.

Originally, when an Iterator was dereferenced, I just decoded the key (which was probably already cached in memory) and the Item and returned the resulting pair. But this left me with no way to actually update an item while keeping the same key. The alternative was to remove the existing item and then insert the new item. This seemed like an inefficient way to do things, especially for an external Btree index. I figured this was an important enough capability that I actually added a member function to handle it. The function (named replace) took an iterator and a new Item value. Then I realized that I really needed an IndexedFile::Iterator to have correct iterator semantics, and there was a way I could provide them.

Now, when an Iterator is dereferenced, instead of returning a value_type object directly, it returns std::pair<const Key, ItemProxy>. The ItemProxy class provides two operators: a conversion operator that converts the ItemProxy into an Item, and an assignment operator that takes a new Item and stores it in the IndexedFile under the existing Key value.

There are a couple of special operations that are more a part of the file interface than part of the multimap interface. One of these is the compact function. This function is a minor exposure of the underlying implementation, and this seems like a good point to at least outline that implementation. IndexedFile is actually implemented as two XDR file-based streams. One of these contains the keys, and the other contains the items. Because items are not required to be the same size (they are not even required to be the same type), the items are simply written sequentially into the data file as they are inserted. When an item is removed, or a new item replaces an existing one, the old item remains in the file, but without a key to access it. No attempt is made to reclaim this space. As a result, the data file part of IndexedFile can grow indefinitely large. The compact function is there to deal with this. When invoked, it creates a new data file and then steps through the index and copies every valid item from the old data file into the new one. It updates the index to reference the new file in the process. When done, the new compacted data file replaces the old one.

This is another of those little design decisions that often seem to make the difference between really good class design and just so-so. Besides providing the compact function so clients can explicitly compact the IndexedFile, I also thought about making it something that happened automatically when the file was closed. I quickly rejected this idea since compacting a file that doesn't need to be compacted would be a real waste of time. I then thought about adding a compact parameter to the close function with the default being no-compaction. In the end, I just stuck with the explicit function call. Unfortunately, the current design does not provide any way to determine the amount of wasted space in an IndexedFile, so it is practically impossible to know when compaction is needed or when it would be a waste of time. This is one design area that I am still working on.

The other utility function is the read function. This is deliberately there to allow external indexes to read the data portion of an IndexedFile. Again, it might be nice to be able to create multiple indices all within the same IndexedFile object, but I am not trying to build a database engine. For now, it seems sufficient to just be able to read the data Items.

Finally, I need to comment on supporting the storage of polymorphic objects in an IndexedFile. In one sense, it is totally up to the XDR encode/decode functions to handle any polymorphic objects. The simplest approach is to treat polymorphic objects like a discriminated union of all the types in the hierarchy. An enumerated type could be created to map a discriminant value into the actual type of the object. The encoder would determine the actual object type, write out the discriminant for that type, and then write out the actual object value. The latter would probably be done via a virtual function call. The decoder would read the discriminant, create a new object of the corresponding type, and read the value into the new object. Again, the last step would probably involve a virtual function call. The prior step would probably be done with a switch statement. Whenever a new derived class was added to the hierarchy, the enumeration would have to be updated, and the encoder and decoder changed to handle the update. For hierarchies that do not change very often, this is probably acceptable. On the other hand, it is not what most of us consider good object-oriented programming style.

For this reason I originally added some support to iXDR_Stream and oXDR_Stream to provide a more general solution. iXDR_Stream provides an object of type IdToTypeMap. It is supposed to map unique ids into strings that represent type names. The strings are then expected to be used by the decoder function to create new objects of the requested type. This still might be done with the equivalent of a switch statement, or it might use something like the FactoryCollection I described in a previous column, or a Factory pattern based upon a Prototype. The point is that the IdToTypeMap, as well as the corresponding TypeToIdMap in the oXDR_Stream class, were intended to allow data initialization to replace programming. (This is also the point of the FactoryCollection class.)

Using this capability with IndexedFile means being able to initialize the maps in the XDR_Stream so that they will be available to the encode/decode functions. For this reason, I provide two accessor functions. index returns a reference to the IndexedFile's BtreeIndex. The XDR_Stream used for Keys can be obtained from this. xdrstream returns a reference to the XDR_Stream used for the Items. With access to the underlying XDRStream classes, the user can initialize the TypeToIdMap with the appropriate type names and corresponding ids (likewise for the IdToTypeMap).

I must note that I added an additional function to the TypeToIdMap class in the XDRStream header. Before, inserting a type name would just assign the next available id. While this capability still exists, I added a function that allows a specific id to be assigned to a type name. This way, users can ensure that the same id is assigned to the same type every time. This is essential for IndexedFile. No checks are made to ensure that the same id is not assigned to more than one type name, and in fact, it is not unreasonable that the same ids can be used for classes in different hierarchies. The only real requirement is that every class in a hierarchy has a unique id within the hierarchy.

Class BtreeIndex

While the ability to create an IndexedFile, essentially a persistent multimap, is what I set out to provide, the heart of IndexedFile is its Btree index. Naturally, I created this as a class in its own right, and since it turns out to be useful by itself, I made it a public class. The class BtreeIndex has an interface that is almost identical to that of IndexedFile, so I will limit myself to pointing out the important differences.

First, BtreeIndex is a template, but it only stores Keys, so it does not have the Item and ItemTraits template arguments.

The PageSize template argument is of special interest to BtreeIndex. A Btree implementation stores keys in external pages, which are fixed size. It is assumed that retrieving pages from external storage will completely swamp other performance aspects, even if the keys are processed linearly within a page. Therefore, the intent of a Btree implementation is to minimize the number of external page fetches (also called probes) that it has to do. Note that it is not so much the number of keys per page that determines the number of probes, but the depth of the search tree. Obviously the more keys per page, the shallower the tree will be, but equally obvious are the practical limits to how large a page can be. Typically a page should be able to hold "many" keys. A good number seems to be at least 100.

Note that since BtreeIndex allows variable length keys, PageSize is specified in raw characters rather than as the number of keys per page. Again, the page size is typically specified as some multiple of the actual virtual-memory page size of the machine, which is usually some power of two. The default of 4,096 is based on the assumption that if a string is used as the Key, and the average length is 32 characters, then a 4,096-character page could typically store 128 keys.

Another thing to note is that PageSize is actually used only when a new BtreeIndex is first created. When an existing BtreeIndex is opened, the PageSize of the existing index is retrieved and checked against that of the class. If the existing page size is not the same as the argument, an exception is thrown. This is perhaps not the best approach. I would like to get rid of the template argument, but some way must be provided for the user to specify the parameter. I considered making it an argument to the open function (and the corresponding constructor), but decided it was more fundamental than that -- likewise for making it something that could be set/retrieved. The jury is still out on this, and I intend to re-examine it after I gain some experience using the class.

Moving on, the value_type of BtreeIndex is std::pair<const Key, size_t>. The second field is actually the stream position of a file. I decided not to use std::streampos since it is potentially too heavy a class for what I need. The idea of BtreeIndex is that a key value references an offset in some file where the actual data can be retrieved. It is up to the client to make the connection between the index and the data file being indexed. For IndexedFile, the position field refers to an XDR_Stream, and the value is in terms of XDR_Chars. Other types of indexes are obviously possible; BtreeIndex doesn't care.

One obvious use for a BtreeIndex separate from IndexedFile is when the entire object to be stored is the key itself or can be reasonably treated as such. In such a case, the second field of value_type can be set to zero (or any value). A special version of the insert method supports this use by simply taking a Key value to be inserted.

Like IndexedFile, when BtreeIndex::Iterator is dereferenced, it does not return a reference to the value_type. Instead it returns a std::pair<const Key, StreamposProxy>. The latter class is similar to the ItemProxy in IndexedFile — primarily it provides an assignment operator that updates the stream position value for the specific key.

Finally, there are three utility functions: reindex, set_delete_strategy, and set_update_strategy. The first function is similar in concept to the compact function in IndexedFile, but it requires a little more explanation, which starts with the second function. One of the key characteristics of a Btree is that all pages, except the root, are at least half full at all times. When the Btree is first built, this happens automatically as new keys are inserted; when a new key needs to be inserted into a full page, the page is split and the new key inserted into one of the new pages. The resulting two pages are both at least half full. The problem happens when keys are removed.

If a key is removed from a page that was just half full, there are three possibilities (that I could see anyway). First is just to do nothing. (Obviously, when a page gets to be completely empty, the page itself will have to be removed.) In this case, the Btree is no longer technically correct, but it may not matter very much. The second possibility is to see if either of the page's siblings is also half full or less. This would allow two pages to be consolidated into one page that was more than half full. Finally, assuming that no consolidation was possible, the theoretically correct approach would be to transfer a key from a sibling page to keep the current page at least half full. Unfortunately, shuffling keys and consolidating pages could be a lot of unnecessary work — and a significant performance penalty — for many applications.

The theoretician in me wanted to implement the last option, but the pragmatist in me wondered what was the point. This is not just an idle question asked out of laziness. The most important characteristic of a Btree is its depth. This determines the number of external probes that are typically necessary in a search. Once a Btree has grown to a certain depth, nothing is going to affect performance until enough keys have been removed to shrink the tree's depth. Of course, the problem is how to determine when you can shrink the tree unless you actually try the consolidation. I decided to pass this buck back to the user.

There is an enumeration type DeleteStrategy that provides three values corresponding to the three cases mentioned above: None (do nothing), Compact (consolidate pages together if possible), and Shuffle (move keys from one page to another to keep pages at least half full). Note that Shuffle implies Compact. The default is Compact. This seemed to me like a reasonable compromise between performance and letting pages get too empty. The user can change it if desired. One thing a user might want to do is change the delete strategy to None to increase performance when a lot of deletes occur. This could leave the Btree with a lot of mostly empty pages. To allow a Btree to be restored to its proper state in such a case, I provide the reindex method. This walks through the existing Btree and rebuilds the index. Note that re-indexing is only useful if None or Compact have been the delete strategies when keys are removed. If Shuffle is always the delete strategy, then no re-indexing is necessary.

I will describe the utility function set_update_strategy when I discuss the implementation in the next column. For now, simply note that the default update strategy is MostlySafe.

Final Thoughts

That pretty much covers the interface to IndexedFile and BtreeIndex. It really doesn't look like there is very much there — which is good. On the other hand, I think it should be sufficient for a lot of different applications. I want to reiterate what I said at the beginning about this not being intended to substitute for a full-fledged database engine. Let me give one simple example to illustrate both points.

One obvious use for a stand-alone BtreeIndex is as a secondary index for an existing IndexedFile. Let's assume we have an IndexedFile of Parts, keyed by PartNo, and we want to build a secondary index based on a common name for the part. Further, let's assume that the name is a field of a Part. Assume we have the following:

typedef IndexedFile<PartNo, Part>   Parts;
Parts parts;

Our secondary index creation might look something like this:

typedef BtreeIndex<string>       PartNames;
PartNames names;

parts.open("Parts", std::ios_base::in);
names.open("Names");

Parts::Iterator it;

for (it = parts.begin(); it != parts.end(); ++it) {
    Part p = (*it).second;
    names.insert(PartNames::value_type(p.name, (*it.base()).second));
}

This steps through all the records in the parts file and inserts the name and the stream position into the secondary index. The stream position is obtained by calling the base function of the Iterator. This returns the underlying BtreeIndex::Iterator. Dereferencing that provides the key and the stream position.

The point is that this is not too difficult. The use of iterators and the typical STL-like interface make it straightforward to write. On the other hand, it is not automatic. For example, the application has to keep track of what the secondary index refers to. This is what I mean by not trying to duplicate a database.

Next time, I will walk through the implementation.

Notes and References

[1] My actual IndexedFile class did not end up with the same interface as the rBtree.

[2] Jack Reeves. "The (B)Leading Edge: Simple Memory Management Classes," C++ Report, July 2000.

About the Author

Jack W. Reeves is an engineer and consultant specializing in object-oriented software design and implementation. His background includes Space Shuttle simulators, military CCCI systems, medical imaging systems, financial data systems, and numerous middleware and low-level libraries. He currently is living and working in Europe and can be contacted via jack_reeves@bleading-edge.com.