This month's edition of "The (B)Leading Edge" is going to step back from my usual modus operandi of looking at code and instead look at some larger issues. A columnist in one of C/C++ Users Journal's sister publications has remarked that unlike a book, where the author usually presents a finished product, a column provides both the opportunity and (often) the necessity of presenting a work in progress. That is certainly the case with what I am doing now. For those who may not have followed my previous columns all that closely, let me give a brief summary. I am developing an IndexedFile class. This class is intended to provide a persistent object storage mechanism that allows retrieval of items by key. In my previous column [1], I presented the first draft of my IndexedFile class interface. The class interface is modeled on that of the map/multimap classes in the C++ Standard library, but its underlying implementation is based on the XDRStream classes that I have developed previously [2]. XDRStreams are in turn intended to provide IOStreams-like interfaces to the XDR presentation layer protocol.
The IndexedFile class provides a number of features that I think are important, but that I have not found in any other similar libraries I have looked at. First, it allows variable-length data items and keys. There is an upper limit on key size, but even that limit is under the control of the user. Second, the data items stored in an IndexedFile do not all have to be of the same type; an IndexedFile can handle objects from an entire class hierarchy. Finally, the index part of an IndexedFile is represented by a separate class of its own: BtreeIndex. While objects of this type can be used independently of an IndexedFile, one of my primary assumptions was that they would be most useful in building secondary indexes for an IndexedFile.
Given the fact that BtreeIndex is a separate class that maintains an index in an external file, it should be fairly obvious and understandable that the underlying implementation of an IndexedFile consists of two actual files: one an ordinary XDRStream that holds the actual data items, and a second that is the XDRStream for the BtreeIndex.
I noted in my previous column that I was not trying to build a replacement for a database engine. Nevertheless, I am aware that most database engines have something similar to this mechanism at their heart. I was recently having dinner with a good friend of mine. This friend happens to have a couple of orders of magnitude more database experience than I do. He also happens to be a top-notch object-oriented software architect. I was describing my IndexedFile class to him, and, in return, he was asking some very probing questions about its implementation, most of which I could answer. One of his questions, recast into the current vernacular, was "if you delete an item from the IndexedFile, do iterators referring to other items in the file remain valid?" I answered "no." The iterators for IndexedFile are based upon the iterator for the BtreeIndex. Recall that a Btree consists of pages, each of which contains several (key,link) pairs. The BtreeIndex iterator contains the page ID and slot number of the key it represents. Deleting an item in the IndexedFile means deleting the key in the BtreeIndex. In turn, that would cause all the keys with slot numbers higher than the current key on its page to shift down one slot, invalidating all iterators that refer to those keys. Furthermore, if the page gets consolidated, then all the iterators for that page will be invalidated. While not all iterators would be invalidated by a deletion, it did not seem possible to present a useful algorithm for determining which iterators would still be valid and which would not. So I simply noted that all iterators should be considered invalid after a deletion.
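A minimal sketch may make the invalidation problem concrete. Everything here (Page, BtreeIterator, key_at) is a hypothetical illustration of the (page ID, slot number) scheme just described, with an in-memory vector standing in for a page:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustrative page: several (key, link) pairs held in slot order.
struct Page {
    std::vector<std::pair<std::string, long> > slots;

    // Deleting a key shifts every later entry down one slot.
    void erase_slot(std::size_t slot) {
        slots.erase(slots.begin() + slot);
    }
};

// The BtreeIndex iterator records only where a key lives, not the key itself.
struct BtreeIterator {
    long        page_id;
    std::size_t slot;
};

// Which key does an iterator currently refer to on its page?
const std::string& key_at(const Page& p, const BtreeIterator& it) {
    return p.slots[it.slot].first;
}
```

After erasing an earlier slot, an iterator built for slot 2 quietly refers to a different key than before; this is the invalidation that makes a "which iterators survive?" rule so hard to state.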
My friend remarked that an alternative might be to simply mark the removed item as "dead" and leave it in the file. This would allow iterators to remain valid across deletions, but would require that the file be compacted periodically to get rid of the dead items. I noted that such an approach would solve only the problem with deletions; I would still have a problem whenever a new item was added to the IndexedFile and a new key had to be inserted into the index. At this point, the conversation turned to other topics.
Afterward, I spent some time thinking about my friend's question. If you are familiar with the associative containers of the Standard C++ library (and I hope you are), then you know that one of the features of those containers (and list) is that their iterators remain valid over insertions and deletions (except for the iterator of a deleted item, of course). I have actually designed code that exploited this fact [3]. I wondered how important it might be for an IndexedFile class to have the same type of behavior. The truth is, I do not have any idea. Since I do not have any real project experience with IndexedFile yet, I do not have any idea about what might turn out to be important details. The fact that my friend had asked the question, however, seemed to indicate that the issue could be important, at least in some circumstances.
So I thought about what changes I would need to make to the design of IndexedFile in order to ensure that iterators into the file remained valid across insertions and deletions. It is said that all problems in computer science can be solved by another level of indirection, and I was pretty sure this was one of them. I fairly quickly realized that the data file portion of my IndexedFile already had the right properties. I never try to reclaim space in the data file; new items are always added at the end. This means the file has to be compacted periodically, but there is a member function to do just that. If an IndexedFile iterator contained the actual (key,link) pair that referenced the item in the data file, instead of containing a BtreeIndex iterator, then that iterator would remain valid until the file was compacted or the item was deleted.
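The stable-offset property of the append-only data file can be sketched with a toy stand-in (the real data file is an XDRStream; the DataFile class here is purely illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for the append-only data file. The real one is an
// XDRStream, but the stable-offset property is the same.
class DataFile {
    std::string bytes_;
public:
    // Appending returns the item's offset. That offset is the "link" an
    // iterator could carry, and it never moves until the file is compacted.
    long append(const std::string& item) {
        long link = static_cast<long>(bytes_.size());
        bytes_ += item;
        return link;
    }
    std::string read(long link, std::size_t len) const {
        return bytes_.substr(static_cast<std::size_t>(link), len);
    }
};
```

Because space is never reclaimed, later insertions cannot move earlier items, so a (key,link) pair stays valid until compaction or deletion of that item.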
Adopting this change, however, opened up all kinds of other considerations. First, the iterator would now have to carry the actual key value. I had decided early on that I did not want my BtreeIndex iterator carrying the actual key value, and the same reasoning applies to the IndexedFile iterator. Iterators are supposed to be lightweight objects (in some intuitive sense). In particular, iterators are usually passed by value and are often copied or assigned one to another. Now consider the simple case where the key is a string. Many standard library implementations provide a reference counted string class, but they are not required to do so. The only portable assumption has to be that if key is a string, then copying an iterator containing such a key would involve copying the string data as well. Since strings can vary in size, this probably means doing a free store allocation. At this point, the supposedly lightweight object is demonstrating some pretty heavy behavior. I had no illusions that IndexedFile iterators (or BtreeIndex iterators) would meet the requirements specified in the Standard for true STL iterators, but I did not want to gratuitously mess with their behavior either, so I was reluctant to have iterators carry actual keys.
If I did not want the iterator to carry the key, then I could go back to where I started. In my original plans for IndexedFile, I envisioned it storing just items, where each item would contain its own key, and the user would have to provide a functor that would extract the key from the item. I had changed to the present idea where keys and items were separate because it seemed more flexible in the long run.
Another aspect that would change (if IndexedFile iterators were no longer based upon BtreeIndex iterators) was how to actually increment the iterator. Incrementing a BtreeIndex iterator is fairly straightforward: just increment the slot number and then check to see if you have run off the page. Since the pages at a given level in the index tree are linked together, running off the page means using the first slot of the next page. (Note that you do not have to actually access the next page in this case; it is sufficient to know that it exists.) Incrementing an IndexedFile iterator that does not contain a BtreeIndex iterator first involves finding the corresponding BtreeIndex iterator, which would mean searching the index for the corresponding (key,link) pair. (Here we see that extra level of indirection that was mentioned above.) Like having the iterator carry the key value, doing it this way would seem to add another layer of complexity onto what is typically a fairly lightweight iterator operation. Also, I should mention that my current design calls for an IndexedFile to support non-unique keys (like a multimap). Searching an index with multiple identical keys for a specific entry would technically be an O(n) operation over the number of identical keys that had to be searched. If I were to restrict IndexedFile to unique keys, then incrementing an iterator and doing a find by key would be basically the same operation.
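The increment rule for a BtreeIndex iterator can be sketched as follows. The class names and the in-memory page map are assumptions for illustration, not the actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative leaf page: (key, link) pairs plus a sibling link, since
// pages at a given level of the tree are linked together.
struct LeafPage {
    std::vector<std::pair<std::string, long> > slots;
    long next_page;   // page ID of the next page on this level; -1 at end
};

struct Iter {
    long        page_id;
    std::size_t slot;
};

// Increment: bump the slot; if we run off the page, step to slot 0 of the
// next page on this level. (The real code need not load the next page;
// knowing that it exists is enough.)
Iter increment(const std::map<long, LeafPage>& pages, Iter it) {
    const LeafPage& p = pages.at(it.page_id);
    if (++it.slot >= p.slots.size() && p.next_page != -1) {
        it.page_id = p.next_page;
        it.slot = 0;
    }
    return it;
}
```

The contrast with the indirect scheme is the point: this is a couple of integer operations, while an iterator that carries only a (key,link) pair would have to search the index first just to find its own position.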
So the question of how important it is to have iterators remain valid across insertions and deletions leads to other questions: how important is it to support non-unique keys? How important is it to be able to copy iterators efficiently? And how important is it that sequential access be more efficient than keyed access? I spent some time thinking about all of this without reaching any conclusions, at least not yet.
Now let me turn to the actual implementation and look at some other issues. I implemented the BtreeIndex class and IndexedFile over a couple of long weekends. When I first started testing, I promptly discovered that my interface was inadequate in terms of error handling. I expected IndexedFile to have a simple file-like interface, so it had operations like open and close. Unfortunately, since I did not derive IndexedFile from class XDR_Stream, it did not inherit the base class' interface to the file status information. So, I quickly added that.
Next, I discovered that I had a problem with the IOStreams openmode. The IndexedFile open function accepts an openmode parameter that is the same as for IOStreams. This parameter is passed through unchanged to the open function for the data file, but having an index file opened with a mode of out does not make a lot of sense (i.e., even if you are just adding items to the index, you have to be able to read the file). So, internally, I always added in to the openmode for the index file. Unfortunately, the IOStreams requirements state that if you open a file for in or in|out, then the file must exist. If the file does not exist, then the open fails. Not only does this cause a problem for the index, but it seemed reasonable to me that many times someone might want to both create a new IndexedFile and then use it immediately for retrieval. With the standard IOStreams interpretation of openmode, you first have to open a file with mode == out to guarantee that it gets created if it does not exist, and then close the file and reopen it with mode == in|out to be able to both write to it and read from it. I could handle the problem for the index file internally (which I do), but I wondered about trying to change the behavior for the overall IndexedFile class. I thought briefly about just eliminating the openmode parameter. If I did that, then an IndexedFile would always be opened for both reading and writing, and a new one would be created if it did not exist. I rejected that idea because I wanted to keep some type of openmode just to be able to open an IndexedFile in read-only mode.
Next, I wondered if I should not just go ahead and create a file if one did not exist when an openmode of in|out was specified. This would have been a change from the documented, standard behavior for openmode, however, and I am very reluctant to change standard behavior even to make it better. I decided, for now, to keep the behavior of openmode the way it is specified in the Standard.
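As a rough illustration of the internal index-file handling, here is one way to force in into the openmode and work around the IOStreams create-on-open restriction. std::fstream stands in for the XDRStream class, and the helper name and exact strategy are illustrative assumptions, not the column's actual code:

```cpp
#include <fstream>
#include <ios>
#include <string>

// Illustrative helper: the index file must always be readable, so "in"
// is forced into the mode. If the in|out open fails because the file
// does not exist, create it with plain out and then reopen it.
bool open_index(std::fstream& f, const std::string& name,
                std::ios_base::openmode mode) {
    mode |= std::ios_base::in;
    f.open(name.c_str(), mode);
    if (!f.is_open() && (mode & std::ios_base::out)) {
        // IOStreams refuses in|out on a nonexistent file: create it
        // first, then reopen for both reading and writing.
        std::ofstream create(name.c_str());
        f.open(name.c_str(), mode);
    }
    return f.is_open();
}
```

Note that this two-step dance stays hidden inside the class; the documented openmode behavior that users see is unchanged.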
Finally, let me toss out a couple of truly internal implementation issues. One of my pet peeves about many of the object-oriented designs that I have reviewed as a consultant is that they often have reasonably good high-level abstractions, but seem to be implemented in C for all practical purposes. In other words, there are no intermediate abstractions in the implementation. Naturally, my IndexedFile implementation is based upon several intermediate classes. One obvious internal class of BtreeIndex is class Page. Briefly recall that a BtreeIndex consists of fixed-size pages that are stored in the external file. Each such page contains some number of keys and links to other pages. Since my external file was an XDRStream, my actual Page class does not look exactly like its external representation, but is instead an abstraction that can be written to or read from the external file by the appropriate XDRStream insert or extract functions. In typical object-oriented fashion, these XDRStream functions are member functions of class Page.
There is also a RootPage class. RootPage is a different class because it contains some different information from an ordinary Page. Nevertheless, it also contains some of the same information, so class RootPage derives from Page. This allows the root page to be passed like an ordinary page to functions that recursively search the tree. Again, in typical object-oriented fashion, the inserter and extractor functions of Page are virtual and are overridden by class RootPage.
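The Page/RootPage relationship might look something like the following sketch, with standard streams standing in for the XDRStream classes and the data members reduced to placeholders:

```cpp
#include <cassert>
#include <iostream>
#include <sstream>

// Illustrative sketch: the transfer functions are members, and virtual
// so that RootPage can extend them with its extra information.
class Page {
public:
    virtual ~Page() {}
    virtual void insert_into(std::ostream& xdr) const {
        xdr << key_count_ << ' ';
        // ... keys and links would follow in the real representation ...
    }
    virtual void extract_from(std::istream& xdr) {
        xdr >> key_count_;
    }
protected:
    int key_count_ = 0;
};

class RootPage : public Page {
public:
    void insert_into(std::ostream& xdr) const override {
        Page::insert_into(xdr);         // shared part first
        xdr << free_list_head_ << ' ';  // root-only data, e.g. the free-page list
    }
    void extract_from(std::istream& xdr) override {
        Page::extract_from(xdr);
        xdr >> free_list_head_;
    }
private:
    long free_list_head_ = -1;
};
```

Because the functions are virtual, code that recursively searches the tree can transfer the root page through a Page* without knowing it is the root.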
Besides being able to transfer the class' data to and from the external representation, class Page also provides a function to insert a new (key,link) pair into the page. Again, this seems like a perfectly reasonable object-oriented design. But now it gets tricky. When a page is full, attempting to insert a new key should cause the existing page to be split into two new pages. Additionally, a link to the new page must be inserted into the parent of the original page. Of course, if the root page is split, a new root page must be created and links to both of the existing pages inserted into it. Traditional object-oriented design would say that a Page should be able to tell if it was full or not and know how to split itself if required. Originally, that is what I planned to do, but I ran into a few problems.
Unlike the data file, the index attempts to reuse empty pages rather than just always creating a new page. These empty pages are held in a linked list chained off the root page. So first, splitting a page requires checking to see if there are any existing free pages available, which means accessing the root page.
Then there is the problem of the page cache. From the very beginning, I knew that BtreeIndex would have a cache of recently read pages. This was essential for any kind of reasonable performance. Specifically, I did not want a find operation to read one or more pages into memory in order to locate a key, return an iterator to that key, and then have the dereference operation of the iterator re-read the same pages into memory. So, all requests for a page (other than for a new page) go through the page cache. Naturally, the PageCache is also a class. Besides being responsible for fetching pages from external storage when they are not in the cache, class PageCache is also responsible for making sure that internal pages are flushed back to external storage and then deleted when they fall out of the cache. When a page is split, if the new page is taken from the empty list, then that page is retrieved from the cache by requesting it by its page ID. If a new page is created, however, then somehow the new page needs to be given back to the cache so that it can be managed properly from then on.
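A bare-bones sketch of these responsibilities (fetch by page ID, hand new pages back via add_page, flush-and-delete on eviction) might look like this; the eviction policy and all names are illustrative assumptions:

```cpp
#include <cassert>
#include <cstddef>
#include <map>

struct CachedPage {
    long id;
    bool dirty;
};

// Illustrative page cache. All page requests (other than for brand-new
// pages) come through here, keyed by page ID.
class PageCache {
public:
    explicit PageCache(std::size_t capacity)
        : capacity_(capacity), flushes_(0) {}
    ~PageCache() {
        // Real code would flush remaining dirty pages here as well.
        for (std::map<long, CachedPage*>::iterator i = pages_.begin();
             i != pages_.end(); ++i)
            delete i->second;
    }

    CachedPage* fetch(long id) {
        std::map<long, CachedPage*>::iterator it = pages_.find(id);
        if (it != pages_.end()) return it->second;  // cache hit: no I/O
        CachedPage* p = new CachedPage;             // stands in for a disk read
        p->id = id;
        p->dirty = false;
        add_page(p);
        return p;
    }

    // New pages created during a split are handed back to the cache so
    // that it can manage (and eventually flush) them like any other page.
    void add_page(CachedPage* p) {
        if (pages_.size() >= capacity_) evict_one();
        pages_[p->id] = p;
    }

    std::size_t size() const { return pages_.size(); }
    std::size_t flushes() const { return flushes_; }

private:
    void evict_one() {
        // Falling out of the cache means: flush to storage, then delete.
        std::map<long, CachedPage*>::iterator victim = pages_.begin();
        ++flushes_;                 // stands in for writing the page back
        delete victim->second;
        pages_.erase(victim);
    }
    std::size_t capacity_;
    std::size_t flushes_;
    std::map<long, CachedPage*> pages_;
};
```

The key property is that a find followed by an iterator dereference touches the cache twice but the external storage only once.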
Finally, there is the question of how to take care of inserting the link to the new page into the parent page. Of the three things I have mentioned so far, this is the only one I had really thought about before I started doing the implementation. I figured it was simple; each page would just contain a link to its parent. I gave up this idea rather quickly when I realized that splitting a parent would require accessing half the pages linked from that page in order to update their parent links. Given that I expected a typical page to contain 100 or more links when it was full, this did not seem like such a good idea. Without a link to its parent, however, there is no way a page can completely handle the case where it needs to split itself.
After playing with several options (there are limits to even what a columnist might present), I settled on the following functionality.
- Page provides a function named must_split. This returns a boolean to indicate whether the page is overfull or not. It is up to the upper-layer BtreeIndex function to call this function whenever it has performed an operation (such as inserting a new key) that could cause the Page to become overfull.
- Page provides a split function. This is a virtual function, since the RootPage split is different from that for a regular Page. This function takes a pointer to the BtreeIndex class itself. That allows it to access the get_free_page function of BtreeIndex that will return an empty page, either from the free list maintained in the root page, or by creating a new page from the free store.
- The get_free_page function will call the PageCache::add_page function if it has to create a new page.
- The split function returns the pointer to the new page to the calling function, which is responsible for seeing that it gets inserted into the parent page.
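Put together, the protocol in the list above might be declared along these lines. The bodies are toy stand-ins (integer keys, an in-memory free list), and only the shape of the interfaces follows the description:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class BtreeIndex;   // forward declaration

class Page {
public:
    Page() : overfull_limit_(4) {}
    virtual ~Page() {}

    // The upper-layer BtreeIndex code calls this after any operation
    // (such as inserting a key) that could leave the page overfull.
    bool must_split() const { return keys_.size() > overfull_limit_; }

    // Virtual, because a RootPage splits differently than a regular Page.
    // The BtreeIndex pointer gives access to get_free_page(). The caller
    // receives the new page and must insert it into the parent itself.
    virtual Page* split(BtreeIndex* index);

    void insert_key(int k) { keys_.push_back(k); }
    std::size_t key_count() const { return keys_.size(); }

protected:
    std::vector<int> keys_;
    std::size_t overfull_limit_;
};

class BtreeIndex {
public:
    // Returns a page from the free list kept in the root page, or a new
    // page from the free store (which the real code then hands to the
    // page cache via PageCache::add_page).
    Page* get_free_page() {
        if (!free_list_.empty()) {
            Page* p = free_list_.back();
            free_list_.pop_back();
            return p;
        }
        return new Page;
    }
private:
    std::vector<Page*> free_list_;
};

Page* Page::split(BtreeIndex* index) {
    Page* right = index->get_free_page();
    // Move the upper half of this page's keys into the new page.
    std::size_t half = keys_.size() / 2;
    right->keys_.assign(keys_.begin() + half, keys_.end());
    keys_.resize(half);
    return right;   // caller links this into the parent
}
```

Notice how the division of labor follows from the problems above: the Page never touches the root page, the free list, or the cache directly, and it never needs a parent link, because the caller handles the parent insertion.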
If all of the issues I have discussed above are not perfectly clear, do not worry about it. The point of this column was not the details of this design. Instead, the whole point of this column was to set the stage for me to drag out one of my standard soapboxes. My desire to do this was triggered by two recent events: first, Dr. Robert Martin offered to reprint in his upcoming book an article I wrote back in 1992 titled "What is Software Design?" [4]. My argument in that article was that the source code is the design. The second event in this chain was a presentation that I recently saw by Ivar Jacobson about future trends in software. In that presentation, Dr. Jacobson argued for "UML all the way down." I agree with a lot of Dr. Jacobson's dreams, but this one I have a problem with.
What I have tried to do in this column is present just a few of the typical considerations that often go into any type of design effort. The issue about the lifetime of iterators is a fundamental aspect of the class's design, yet where would it have turned up in any type of typical design documentation? Maybe, if this class design had been part of a formal project, with the typical requirements for design documents and reviews, the question would have been raised early. Certainly, I think that if someone with my friend's level of experience had been involved, it probably would have been. But that is itself a problem. My friend has literally several decades' worth of experience. How often does the typical design review actually have participants with that kind of relevant background?
The issue about needing the interface to provide functions allowing the underlying file status to be checked probably should have been caught by me. But I didn't catch it until I actually had the entire class implemented and started testing. Again, if this had been part of the typical "first we do a design and freeze it, then we code" type of project, would this oversight have been caught earlier? I do not know, but my own feeling is that if I didn't catch it, I would have to say the odds were probably less than 50/50 that anyone else would either.
Finally, the issues that came up over what interface the Page class needed in order to provide a reasonable abstraction supporting the splitting of pages, while at the same time keeping the Page class reasonably self-contained, are ones that I defy just about anyone to anticipate without actually getting into implementing the code to do a page split. As I noted above, I had thought about one aspect of the functionality, the need to insert a new reference into a parent, but I got even that wrong.
And the point of all this? The point is that what I wrote in 1992 seems even more true to me today than it did back then: the source code is the design. Designing software is tough. Maybe there are a few Mozart-like geniuses out there that can see an entire design in their head before they write one line of code, but for the vast majority of us, designing software is a continuous feedback loop of discovery and refinement. In other words, designing software and implementing software are the same thing. In the grand scheme of things, IndexedFile is not that big of a class. If I can come up with so many different issues during the implementation of a relatively small piece of software, what is the reality for large-scale systems?
It may be true that something like UML can represent 80% of the code in a software module (this is Jacobson's figure, not mine). However, that remaining 20% is critical to the functioning of the software, and I contend that until you actually do that 20%, your other 80% is nothing more than a preliminary guess. It is not unusual for that last 20% to cause a 100% rewrite of the other 80%. Even under good circumstances, figuring out that last 20% is likely to change at least 20% of the other 80%. At the end of my "What is Software Design?" article, I noted that what we really need in the software industry is more expressive programming languages. I noted that C++ was a step in the right direction, but only a step. Instead of Dr. Jacobson's "UML all the way down," what I really think we need is a programming language X that is both powerful and flexible enough that we can say "X all the way up." In the meantime, we will have to make do with what we have, and I still think C++ is a pretty good choice.
Notes
[1] Jack Reeves. "The (B)Leading Edge: Building an Indexed File Class using XDR_Stream," C/C++ Users Journal C++ Experts Forum, March 2002, <www.cuj.com/experts/2003/reeves.htm>.
[2] Jack Reeves. "The (B)Leading Edge: Using the XDR_Stream Class, Part II," C/C++ Users Journal C++ Experts Forum, November 2001, <www.cuj.com/experts/1911/reeves.htm>.
[3] Jack Reeves. "The (B)Leading Edge: Simple Memory Management Classes," C++ Report, July 2000.
[4] Jack Reeves. "What is Software Design?" The C++ Journal, July 1992.
Jack W. Reeves is an engineer and consultant specializing in object-oriented software design and implementation. His background includes Space Shuttle simulators, military CCCI systems, medical imaging systems, financial data systems, and numerous middleware and low-level libraries. He currently is living and working in Europe and can be contacted via jack_reeves@bleading-edge.com.