Caching logic is another of those program components that's well worth recycling, particularly if someone else has written it for you already.
Memory caching is a technique for keeping recently-accessed data in memory instead of retrieving it each time from disk [1]. For most applications, patterns of disk access are not random: depending on the application and how it uses disk files, it is likely that some previously-accessed data (sectors, blocks, objects, etc.) will be used again within a certain time. Retaining some of this data in memory reduces the average access time at the cost of increased memory use and increased code size.
All modern operating systems employ some sort of cache for accessing raw disk sectors, below the logical level of files. This system-wide cache manager processes requests from applications (via the OS) to read and write the sectors that constitute data files (see Figures 1 and 2). It begins by checking the cache for the presence of the requested sector. If present, the cache manager either reads it from or writes it to (updates) the cache directly. If the sector is not present, the manager responds to a sector-read request from the application by first fetching the desired sector from disk into the cache. The manager responds to a sector-write request by first copying the sector (supplied by the application) into cache memory. The cache manager then either commits the updated sector to disk immediately (for safety), or marks the sector to be written to disk at a later time (for speed).
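In outline, the logic looks something like the following sketch. The class, function, and member names here (SectorCache, read_from_disk, and so on) are purely illustrative and do not represent any particular operating system's implementation:

#include <cstring>
#include <map>
#include <utility>

// Minimal, illustrative sector cache (hypothetical; not an OS implementation).
// A std::map stands in for the real lookup structure.
const int SECTOR_SIZE = 512;

struct Sector { char data[SECTOR_SIZE]; bool dirty; };

class SectorCache {
    std::map<long, Sector> cache;   // sector number -> cached copy
    bool write_through;
public:
    SectorCache(bool wt) : write_through(wt) { }

    void read(long num, void *buf) {
        std::map<long, Sector>::iterator it = cache.find(num);
        if (it == cache.end()) {                       // miss: fetch from disk first
            Sector s; read_from_disk(num, s.data); s.dirty = false;
            it = cache.insert(std::make_pair(num, s)).first;
        }
        std::memcpy(buf, it->second.data, SECTOR_SIZE);
    }

    void write(long num, const void *buf) {
        Sector &s = cache[num];                        // claim (or reuse) an entry
        std::memcpy(s.data, buf, SECTOR_SIZE);
        if (write_through) { write_to_disk(num, s.data); s.dirty = false; }
        else s.dirty = true;                           // mark for a later write
    }

private:
    void read_from_disk(long, char *) { /* platform-specific */ }
    void write_to_disk(long, const char *) { /* platform-specific */ }
};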
Although this overview illustrates the fundamental operation of nearly any type of cache, all caches are not created equal. The system cache just described processes sector I/O requests from applications indiscriminately, as a sort of jack-of-all-trades. It cannot make any distinctions regarding file usage among different applications. For example, one application may read or write a file sequentially, filling up cache memory as it accesses each sector exactly once. In this case it would have been faster to access the disk directly and bypass the cache entirely. In a multitasking OS, sequential access of files by one or more applications may flush the cache of sectors currently in use by your application.
As a general rule, though, system-level disk caching usually increases overall system performance. To address some of the problems encountered with a system cache and improve performance for specific applications, you can use the technique of file-level caching. A program designer usually knows how an application will interact with a specific disk file and can often use file-level caching to advantage. File-level caching uses a private section of memory to hold a subset of the data contained in a specific file. The file-level cache makes no assumptions about how that file is logically organized. (I will henceforth refer to a cached file as a collection of generic "objects.") The overall operation of a file cache parallels that outlined earlier (Figures 1 and 2), except that it is independent of the system cache.
Caching and Application Performance
In the hypothetical case where file access is perfectly random (I know of no actual cases), the probability of finding a requested object (a hit) in a cache holding n objects, attached to a file of b objects, is n/b. When the cache size equals or exceeds the total file size (n >= b), the probability of a cache hit is 1. However, the increased search times for larger caches also reduce performance. At some limiting value of cache size it becomes more efficient to read entire files into memory sequentially and access them directly than to read them first into a cache. At the other extreme, overly small caches also create inefficient overhead because the probability of a cache hit becomes small.
Most real-world applications access files in a non-random fashion, making file-level caching a viable optimization technique. (Of course, sequential file access, as mentioned earlier, may be considered the ultimate in non-randomness. But it does not lend itself well to caching.) As an example of non-random access, a tree-structured scheme for database indexing will need frequent access to the root and adjacent index nodes of the tree kept on file. This results in a highly skewed, non-random access pattern. Caching these nodes makes sense because their probability of access becomes significantly greater than n/b.
The primary goal in caching a file is to reduce the average access time for objects contained in that file. This is the principal measure of general cache performance, and it is subject to a number of variables, such as cache size, file-access characteristics, and hardware-related details. The significant variable over which you have complete control is cache size. Keeping in mind the usual tradeoff of memory size versus execution time, you can adjust the cache size to maximize the ratio of cache hits to the total number of searches performed. The cache manager can record this data to help you determine the ideal cache size.
The type of operating system you're running also plays a part. For a single-task system like MS-DOS, any gain in performance from using a file-level cache manager will be nearly undetectable on a fast CPU. Taken over a sufficient length of time, however, you should discern at least some improvement in average access time, since retrieving data directly from memory still involves less execution overhead than calling an MS-DOS system function to retrieve data from the system cache. I would expect the greatest performance gains when running on a true multitasking, pre-emptive OS. Maintaining data close at hand in a private cache circumvents the performance degradation that results when other currently running programs monopolize the system cache.
The M/LRU Caching Algorithm
This article presents a cache manager class in C++ for accessing and maintaining collections of objects on disk. The cache manager incorporates the M/LRU algorithm, an acronym for "Most/Least Recently Used." This simple algorithm delivers adequate performance relative to more complex, code-intensive designs employing Frequency-Based Replacement or other predictive techniques [2].
The M/LRU algorithm exploits non-random file usage by placing the most-recently-accessed object at the head end (the MRU node) of a linked priority list of cache nodes. If client code requests the same object again immediately, the cache manager will find it there (at the MRU node) in the shortest possible search time. Otherwise, the next object accessed from either the cache or disk is placed in the MRU node. This action pushes all other nodes down one place to the next highest priority. The previous MRU object thus becomes the "next-most-recently-used." Unless the application re-accesses a cached object, it gets placed into increasingly less favorable locations as new objects are added to the cache. Eventually an unused object reaches the end of the list (the LRU node, or least-recently-used node) where it ultimately becomes overwritten with new object data. If the cache manager needs space for a new object, it always reuses the LRU node.
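In outline, promoting a node amounts to unlinking it from a double-linked list and relinking it at the head. The following sketch illustrates just that step, using hypothetical node and list types rather than the MruCache classes from the listings:

// Illustrative move-to-front step for an M/LRU list (hypothetical names).
struct Node { Node *prev, *next; long tag; };

struct MruList {
    Node *mru, *lru;   // head (most recently used) and tail (least recently used)

    // Unlink node n from its current position and relink it at the MRU end.
    void make_mru(Node *n) {
        if (n == mru) return;               // already in the MRU position
        n->prev->next = n->next;            // unlink n
        if (n->next) n->next->prev = n->prev;
        else lru = n->prev;                 // n was the LRU node; its predecessor is now LRU
        n->prev = 0;                        // relink n at the head
        n->next = mru;
        mru->prev = n;
        mru = n;
    }
};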
Figure 3 helps illustrate the foregoing. Figure 3a shows the cache in some initial phase, such as during a search for object 1. Since the cache list currently contains object 1, the cache manager will find it and score a hit.
Before returning, the cache manager must first remove node 1 from the list as shown in Figure 3b. Since node 1 was also in the LRU position, the node immediately preceding it (node 7) becomes the new LRU. The cache manager then reconnects node 1 to the list as the new MRU node (Figure 3c) . The application can then either read or refresh the object stored at this location.
If the manager didn't find the requested object (a cache miss), it must reuse the lowest-priority (LRU) node to receive a new object. If the application marked this node for delayed processing, the manager writes it to disk at this time. Otherwise, the object present at that node is simply overwritten. As in the preceding paragraph, the manager first removes, then reconnects the LRU node as the new MRU node. At this point the application regains control and can either refresh the MRU node from disk or write an updated object to the same location.
Implementing the Cache Manager
Viewed conceptually, the cache manager consists of a specialized interface to a double-linked list container. The manager supervises cache searches and indicates the correct place in the cache where clients can retrieve or update object data. The manager neither controls the physical transfer of object data into and out of the cache, nor is it attached to any specific disk file. Applications can handle such object I/O themselves directly, but this is preferably managed by an intermediate interface layer.
The cache manager can employ any container class library for actual maintenance of the cache list. A separate class interface module for the container library maintains the independence of the cache manager from any specific class library implementation.
I also constructed the cache manager to handle generic (void *) objects. I felt the other alternative, templates, was generally unacceptable due to excessive bloat resulting from code expansion for different object types. However, I still needed type safety for the cache, so I constructed a templatized wrapper class for the generic cache manager. The wrapper class is small enough to circumvent the "balloon clause" penalty you frequently get with templates.
Listings 1 and 2 (mru.h and mru.cpp) present the definition and implementation for MruCache, the cache manager class. Protected data in MruCache include pointers to both ends of the cache list (the MRU and LRU nodes), as well as pointers to a double-linked list object and a list iterator object. The cache list (Cm) stores pointers to objects of type CacheNode (Listing 1). Each CacheNode in turn maintains a pointer (_obj) to a non-specific object buffer.
Data member Cm points to an indirect double-linked list container (see container interface code in Listing 3). I could have instead used a direct double-linked list container, which would have contained actual CacheNode objects rather than pointers. However, I felt that this might impede cache performance to some degree. A direct container usually incurs additional overhead due to implicit object movement or copying. Instead of moving whole objects, I'd rather pass pointers to them at the cost of juggling memory management for CacheNodes myself. Also, a list of CacheNode objects would still obligate me to manually allocate space for the associated object buffer and, in addition, provide deep-level copy and assignment operators. Including the cached object directly as part of a CacheNode (instead of a pointer to it) would also compromise its generic character, involving templates and unnecessary code inflation.
Each CacheNode maintains a description of each cached object. You stipulate how many CacheNodes you want and the size of cached objects as parameters to the MruCache class constructor. Adjusting the number of CacheNodes provides you with the greatest amount of control over cache performance.
State data within each CacheNode indicates the node's status and identifies the object contained at that node. A node's status can be either free (released) or marked. When a node is marked it means that its object buffer contains data flagged for update that hasn't yet been committed to disk. A free CacheNode is either empty or points to a recently-accessed object.
A CacheNode identifies an object with a unique ID number (_tag). The cache manager does not specify any purpose or usage for this number. In most applications the ID will refer to the read/write offset of that object in the cached file. Alternatively, it could also refer to a "block" number in a file that you have logically divided into an array-like sequence of equal-sized sections. To summarize, your client code must determine how to locate and access each object in the cached file. The cache manager won't handle this task for you.
For simplicity, I declared all member data in class CacheNode as public. Ideally, you would hide these members from clients and provide access to private data only through member functions.
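For orientation, a CacheNode along the lines just described might look roughly like the following. This is only an approximation for readers without the listings in hand; the actual definition appears in Listing 1 and differs in detail:

#include <new>
#include <cstddef>

// Rough approximation of a CacheNode as described in the text.
class CacheNode {
public:
    enum Status { FREE, MARKED };   // released, or awaiting a delayed write

    void   *_obj;      // generic object buffer
    long    _tag;      // unique object ID, typically a file offset or block number
    Status  _status;

    CacheNode(std::size_t objSize)
        : _obj(::operator new(objSize)), _tag(-1), _status(FREE) { }
    ~CacheNode() { ::operator delete(_obj); }

private:
    CacheNode(const CacheNode &);              // no copying: _obj is owned
    CacheNode &operator=(const CacheNode &);
};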
You can't instantiate MruCache objects directly because the class definition contains a pure virtual member function, MruCache::Proc (discussed later). This restriction contributes indirectly to maintaining the type safety of the cache. As I outlined earlier, a non-type-specific cache avoids excess code expansion resulting from the overuse of templates, but does not offer type security. To instantiate MruCache, you must derive an instance class from it. For the demo code (available electronically, see p. 3), I derived the templatized wrapper class TMruCache from MruCache. This wrapper lets you construct a completely type-safe cache for any data type without concern for bloated code.
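In spirit, such a wrapper does little more than forward calls and cast the generic pointers to the proper type. The sketch below approximates that idea only; it is not the actual TMruCache, and the constructor parameters and the signatures of Search and Proc are assumptions:

// Sketch of a thin type-safe wrapper in the spirit of TMruCache.
template <class T>
class TMruCache : public MruCache {
public:
    // Assumes the base constructor takes a node count and an object size.
    TMruCache(int nodes) : MruCache(nodes, sizeof(T)) { }

    // Narrow the generic interface to type T; the cast is the only added code.
    T *Find(long tag) {
        CacheNode *n = Search(tag);
        return n ? (T *) n->_obj : 0;
    }
protected:
    // Proc is pure virtual in MruCache; a real instance class supplies the
    // actual object processing (e.g., writing the buffer to disk).
    virtual void Proc(CacheNode *) { }
};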
Public member functions in MruCache (Search, AddNew, and Mark) that return pointers to generic CacheNodes may also appear to jeopardize type safety. I made this concession to again avoid the template "balloon clause." These member functions operate at a relatively low level and ideally you shouldn't use them directly in your application. Preferably, you should employ an intermediate layer that mediates data movement between disk, the cache, and your application. You can configure this I/O interface to reestablish type safety.
This month's code disk and online sources contain code for CachedFile, a file I/O class that accesses objects at specified offsets in a file. This class automatically creates an object cache of any appropriate size for the file you specify. Class CachedFile is generic and thus not type safe. As with TMruCache, the templatized wrapper class TCachedFile supports type security for base CachedFile objects, but contributes little to code bloat.
TCachedFile also supports the use of temporary cache files, which disappear when your program shuts down. An application may generate data (a compiler symbol table, for example) that need temporary file-based storage. Using delayed processing, TCachedFile would create a scratch file for storage only if the generated data overflow the cache.
Managing Cache Memory
The class MruAlloc in Listing 1 centralizes memory management for the cache manager. All calls to operators new and delete (from MruCache and CacheNode) end up here. By default, MruAlloc depends on global new and delete. You can customize this class to suit the requirements of practically any run-time environment.
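In its simplest form, such a class merely funnels everything through the global operators. The simplified stand-in below shows the idea; the member names here are illustrative and do not match the actual interface in Listing 1:

#include <new>
#include <cstddef>

// Simplified stand-in for a centralized cache allocator.
class MruAlloc {
public:
    static void *allocate(std::size_t n)  { return ::operator new(n); }
    static void  release(void *p)         { ::operator delete(p); }
    // Replace the bodies above to plug in a pool allocator, shared memory,
    // or whatever the run-time environment requires.
};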
Some readers may be puzzled by the derivation of MruAlloc from Borland's default TStandardAllocator class for container objects. This resulted from a forced marriage of convenience with Borland's container class library. The Borland library has its own peculiarities regarding memory management. I'm still at a loss to interpret some aspects of its behavior with regard to TStandardAllocator. Any allocator class you modify to work with the Borland library must conform to the public interface defined in TStandardAllocator.
I placed my own versions of new and delete within the derived class MruAlloc because of the conflicts that resulted when I tried to use TStandardAllocator directly from MruCache and CacheNode. This procedure hides MruAlloc from the Borland Container Library, yet still puts memory management under one roof. Even though this works, I'm still unsatisfied with this resolution of the problem.
Operating the Cache
You initiate cache searches by calling member function MruCache::Search (Listing 2). If it finds the requested object, Search passes a pointer to the relevant CacheNode to function MruCache::make_mru. This utility function removes the given CacheNode from the linked cache list and reconnects it in the MRU position. Search then returns a pointer to the MRU node. Your application can either retrieve or update the object contained at CacheNode::_obj.
MruCache::Search returns NULL if the object you requested wasn't found. You then direct the cache manager to prepare new space in the cache by calling MruCache::AddNew. AddNew always uses the LRU node for a new object. AddNew may find that the LRU object has been marked for update (via CacheNode::Mark). In this case, AddNew calls a special member function Proc. Proc's purpose is to "process" the object, which usually involves writing it to disk. After processing the object, AddNew renumbers the node with the given tag number, then calls make_mru. Recall that make_mru moves the specified node (in this case, the LRU node) into the MRU position. At this point, the application either copies an updated object into the MRU node or fetches one from disk.
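Put together, reading an object through the cache follows a search-then-add pattern. The fragment below sketches that sequence; the exact signatures of Search and AddNew, and the helper read_object_from_disk, are assumptions for illustration:

// Hypothetical read-through-the-cache fragment.
void *fetch_object(MruCache &cache, long tag)
{
    CacheNode *node = cache.Search(tag);        // promotes the node to MRU on a hit
    if (node == 0)                              // miss: claim the LRU node for this tag
    {
        node = cache.AddNew(tag);               // writes out a marked LRU object via Proc
        read_object_from_disk(tag, node->_obj); // client code fills the buffer
    }
    return node->_obj;                          // caller reads (or updates) the object here
}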
The function MruCache::Proc is declared virtual to permit customized object processing by descendant class objects. This allows the cache manager to remain independent from application-dependent I/O routines.
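For example, a descendant class might commit a marked object by seeking to its offset in the cached file and writing the buffer. The following is only a sketch, with an assumed signature for Proc and assumed constructor parameters:

#include <cstdio>
#include <cstddef>

// Sketch of a descendant class that processes marked objects by writing
// them back to the cached file.
class FileMruCache : public MruCache {
    FILE *fp;                 // the cached file, opened elsewhere
    std::size_t objSize;
public:
    FileMruCache(FILE *f, int nodes, std::size_t size)
        : MruCache(nodes, size), fp(f), objSize(size) { }

    virtual void Proc(CacheNode *node)
    {
        // Interpret the tag as the object's offset in the file and commit it.
        std::fseek(fp, node->_tag, SEEK_SET);
        std::fwrite(node->_obj, objSize, 1, fp);
    }
};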
When you write objects into the cache, you can follow one of two strategies. The most efficient procedure is to simply leave the object there and mark it for update. When the cache manager must reuse the same space, it will commit the object to disk using the procedure provided in your version of Proc. You can also use a conservative, write-through strategy. This procedure immediately commits any object you place in the cache to disk (again using Proc) instead of marking it for update.
Testing the Cache Manager
To compile the cache manager, you must modify the container interface module (Listing 3) to fit your preferred container class library. I adapted both TIGenDoubleList and its Iterator class to work with the Borland Container Class Library shipped with the Borland C++ 4.0 compiler. I consider the Borland container library somewhat difficult to use. However, it works as advertised. Certainly better libraries exist; you may find the Standard Template Library (STL) a good place to start looking [3] [4].
Cache demo code is available on the ftp site and code disk (see p. 3). This consists of a simple interactive display loop to exercise the public functions in MruCache. It doesn't create a cached file nor does any data movement occur. It works only with the object tag numbers you supply when responding to prompts.
The test program automatically pre-loads each node of the cache with a tag number before asking for input. You provide arbitrary object tag numbers to perform searches, add a new object, mark an object, or flush the cache. When running the test program, you may find that the cache manager does not prevent certain kinds of user errors. In particular, you can call AddNew to add a tag number that already exists in the cache. This action will corrupt the cache list.
The correct procedure would be to search the cache list for the tag number first. If not present, you can safely call AddNew to add it to the cache. Instead of direct calls like this, however, you should provide an intermediate I/O interface that sits between the cache manager and your application. The interface will call MruCache member functions in their proper sequence to avoid errors like this (refer to cfile.h on the code disk).
Maximizing Performance
Since the primary goal in caching a file is to reduce the average access time, you might consider using a profiler to help you fine tune your application in its use of that file. Configure the profiler to record the execution times for object read and write operations, then average these data over a suitable length of time. Run the test procedure for different cache sizes using the same set of test data each time. This should give you a good idea of how the average access time responds to different cache sizes.
You can also evaluate the performance statistics (cache hits and misses) collected by the cache manager. These will give you a rough idea of how your application responds to caching. The value to watch is the cache hit ratio, defined as the number of hits divided by the total number of searches performed:
hit_ratio = hits/(hits + miss);

Bear in mind, however, that the hit ratio approaches 1 as you make the cache larger. For a cache as big as the file, the number of misses falls to 0 once the cache is full. This case represents superfluous execution overhead, since you don't need the cache manager anymore. Instead, try going in the other direction. Use the cache hit ratio to determine how small you can make the cache and still realize improved performance.
You could also use the hit ratio to evaluate how other variables affect performance for a given cache size. For instance, you might use this to test the effect of linking with different container libraries, to observe how the same cached application performs on different hardware, or to investigate a modified caching algorithm.
Hashing
As a further refinement, consider hashing to expedite cache lookups. A hash table could operate in parallel with the cache list to associate a key (usually a string, such as an employee name, or a number) with an object tag number. The contents of the hash table would mirror those of the cache list.
A hashing interface should act as a wrapper for MruCache member functions. This maintains the independence of the cache manager from any specific hashing technique or key type. All cache searches would be preceded by a hash table lookup using a specified key. If the object is present in the hash table, it still needs to be located in the cache list using MruCache::Search. The payoff comes when the object is not found in the hash table. In this case, you need not search the cache list, a time-consuming procedure relative to a hash table lookup. Of course, any changes in the contents of the cache list (additions and deletions) must also be reflected in the hash table.
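One way to arrange such a wrapper is sketched below. It uses a standard library map as a stand-in for the hash table, and the class and member names are illustrative only:

#include <map>
#include <string>

// Sketch of a keyed front end to MruCache. The wrapper keeps the table
// and the cache list in step.
class KeyedCache {
    MruCache &cache;
    std::map<std::string, long> index;   // key -> object tag, mirroring the cache
public:
    KeyedCache(MruCache &c) : cache(c) { }

    // Returns the node for the keyed object, or 0 if it isn't cached.
    CacheNode *Find(const std::string &key) {
        std::map<std::string, long>::iterator it = index.find(key);
        if (it == index.end())
            return 0;                     // not in the table, so skip the list search
        return cache.Search(it->second);  // locate (and promote) the node in the list
    }

    // Record a newly cached object in the table as well.
    void Add(const std::string &key, long tag) { index[key] = tag; }

    // Remove the entry when its object leaves the cache.
    void Remove(const std::string &key) { index.erase(key); }
};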
Conclusion
This article has demonstrated a simple caching algorithm and implemented a cache manager useful for improving access to data files. In contrast to installing faster hardware, file caching represents an extremely cost-effective option for optimizing the performance of I/O-intensive applications. Some readers may have suggestions for improvements; if so, I would appreciate hearing about them.
Notes and References
[1] I draw a distinction here between caching and I/O buffering techniques such as those used for iostreams. Iostream buffering manages character data in streams of variable length. The memory caching technique presented here handles discrete objects of fixed size.
[2] See "Enhancing the UNIX Korn Shell Using Predictor Techniques," by Philip Thomas and Shmuel Rotenstreich, CUJ, March 1994, p. 83, for a look at one type of predictor technique.
[3] P.J. Plauger. "Standard C/C++," CUJ, December 1995 through October 1996. Discussion of Standard Template Library (STL).
[4] Check out the CUJ home page, http://www.cuj.com/links.htm, for more references to STL information.
Tom Nelson is an independent programmer and technical writer. His current interests are PC systems programming and OOP design in C++. He may be reached at tnelso39@alliance.net. All code appearing in this article copyright © 1995 T.W. Nelson. Permission is hereby granted to use this code in any manner provided the user display this copyright notice appropriately in source.