Features


Weight Reduction Techniques in C++

Randy Kamradt


Randy Kamradt has been programming in C/C++ for the past seven years. He is currently working for TEAM Software, a fashionably lean consulting company developing an integrated database package for the Ventura County Superintendent of Schools in Ventura, California.

Introduction

I have been working on a large-scale database downsizing project for the Ventura County School District since the beginning of 1993. Recently we turned the first application, a teaching-credential tracking database, over to the users. Instead of the ticker tape parade we had expected, all they did was comment on how slow it seemed to run. Even though we used all the latest buzz words — such as "object-oriented" and "client-server" — we had to admit it was very sluggish. So we began an intense optimization effort, putting our application on a crash course of exercise and diet, to get it down to size and up to speed. This article deals with the potential problems facing object-oriented programming in C++ and some of the solutions we used.

One of the major advantages of object-oriented programming is that it can hide complexity. Very complicated subsystems can be wrapped by classes. These classes provide well-defined interactions, making their objects relatively immune to environmental factors. The downside is that the application programmer may not be aware of all the ramifications of using an object. Objects may encapsulate the acquisition of system resources, or they may create sub-objects. Just declaring a local variable can be a costly procedure.

One problem that faced us early on was from a class of the Commonbase Database class library, DBTable. Just creating a variable of type DBTable had the side effect of opening a communication channel (socket) to the database server. We were allocating DBTable objects from free store and in rare cases they never got deleted. Since our networking software had only 20 sockets available, the application would run fine for a while and then run out of sockets. This is actually an old problem, familiar to C programmers, but with a new twist: more than memory might be allocated by creating a user-defined type. Since this caused the program to die it was fixed early on, but other similar problems existed. We had to automate the creation of each DBTable, so as not to create it until it was needed, and to delete it as soon as possible. It could not be left up to the application programmer.

Another side of complexity hiding is that an object may be a front for many sub-objects. One of our objects, BMOJoin (Business Model Object Join), contained a list of DBTables. A join is database jargon for a relational combination of tables. Since a DBTable holds one socket, a BMOJoin can hold many. To add to this problem, another object, BMOIterator, has a pointer to a BMOJoin. As I will explain later, the BMOIterator was the key to controlling all these resources, and minimizing their allocation.

Another class, ACond, represents a logical condition. It is implemented as an expression tree, so doing a simple assignment of an ACond could be costly. Again this is actually an old problem, the only difference being that in C such copying can be done only by calling a function, while in C++ the function can be invoked with an assignment operator. C programmers know that calling a function can take time and resources, but using built-in operators always takes (relatively small) constant time and never takes system resources. These assumptions go out the window in C++.
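To make that cost visible, here is a toy sketch of a condition tree (the real ACond's internals do not appear in this article, so the CondNode class and its counter are invented for illustration). Copying the tree clones every node, which is exactly the kind of work a C++ copy or assignment can silently perform:

```cpp
#include <cassert>

// Toy condition tree; the counter makes visible how much work one copy
// performs. A real expression-tree class would do this on every '='.
static int nodes_cloned = 0;

struct CondNode {
    int op;
    CondNode *left, *right;
    CondNode(int o, CondNode *l = 0, CondNode *r = 0)
        : op(o), left(l), right(r) { }
    CondNode(const CondNode &other)          // deep copy, node by node
        : op(other.op), left(0), right(0) {
        ++nodes_cloned;
        if (other.left)  left  = new CondNode(*other.left);
        if (other.right) right = new CondNode(*other.right);
    }
    ~CondNode() { delete left; delete right; }
private:
    CondNode &operator=(const CondNode &);   // omitted in this sketch
};
```

Copying a three-node tree allocates and initializes three new nodes; a tree representing a complicated condition multiplies that cost accordingly.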

Free-store usage is another problem. Just opening a window and tying it to a database in our application takes thousands of memory allocations, some of them just for a few bytes. Memory allocation is the lifeblood of C++ programming because objects should be made autonomous, and cannot rely on outside sources such as local or global variables for their internal memory space.

A string class, for instance, typically contains a pointer to memory containing the string (see Figure 1). That memory must come from free store if the class is to be considered general purpose. If the memory came from outside the class, the string could not delete it when it was done, and the user would have to delete it if necessary. The issue of pointer ownership is important in making a safe application, and helps alleviate some of the common C pointer troubles such as deleting memory twice, or forgetting to delete it at all. It also means that memory allocation and copying may be done more often than absolutely necessary.

Another problem introduced with object-oriented programming is over-generalization. With base classes one can provide the common subset of functions available from a set of derived classes. If the programmer uses the base class, the full set of functions may not be available. An example of this is the use of a list object. A list object might be represented either by a linked list or by an array. If the programmer uses a list and needs to get to the n-th object, indexing may not be an option. The programmer will have to iterate through the list, keeping a count. If the programmer uses an array object, indexing is available.
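The list-versus-array trade-off can be sketched with standard containers standing in for our own list classes: reaching the n-th element through the general list interface means walking the list and counting, while the array interface indexes directly.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Through the general list interface, indexing isn't available, so
// reaching the n-th element is an O(n) walk.
int nth_via_list(const std::list<int> &l, std::size_t n)
{
    std::list<int>::const_iterator it = l.begin();
    while (n--)
        ++it;                   // iterate, keeping a count
    return *it;
}

// The concrete array type supports O(1) indexing.
int nth_via_array(const std::vector<int> &v, std::size_t n)
{
    return v[n];
}
```

Both functions return the same value; the difference is purely in the work done to get there, and a programmer holding only the base-class interface is stuck with the slower path.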

Another example of this is the use of a complex number class. The operation (2 + 0i) * (6 + 0i) can be calculated much more easily as 2 * 6. Because the more general form is used, the optimization for a special case is missed. The ability to generalize is a major plus for C++, but can limit optimization.

Code Bloat

Another issue I deal with here is code bloat. I used to think a program was big if I had to leave small model (greater than 64 kilobytes of code or data). Our current application is about 1.5 megabytes and growing (from about 30,000 lines of code and two libraries). Part of this growth stems from the tendency to make objects all-purpose (or worse, multi-purpose). Poorly designed objects often give liberal access to their internals, or multiple methods of access. A well designed object should do one thing, one way, and do it well. The public interface should be minimal in order to guide the programmer to the expected usage, and to keep things simple.

Making matters worse, the linkers I currently use (Borland's tlink and HP-UX ld) always link in all virtual functions even if they are never used. This is because every virtual function is referenced in the class's virtual table, which is pulled in along with the class constructor. I have heard that newer linkers are addressing this problem, but without a thorough evaluation of all the code, they will have trouble determining whether any one virtual function will assuredly be called (or assuredly not be called).

Another culprit in code bloat is the inline function. In the C++ style of object-oriented programming, small functions are very prevalent, and inline functions are necessary for a well-oiled program. The factor that should decide whether a function is inline is the ratio of the time spent calling the function to the time spent inside it. If the time spent inside the function is much greater than the time spent calling it, making it inline won't help speed much and will simply contribute to excessive code size. This implies that to use inlines effectively one should be aware of the inner workings of the compiler, and all of the side effects caused by some C++ functions.

One of the best ways to optimize is to use a profiler. I used the Borland Windows profiler extensively while optimizing. In spite of its occasional tendency to crash or reboot my computer, it was a tremendous help not only in optimizing, but in finding bugs and memory leaks. The Borland profiler is interactive, much like a visual debugger. It allows you to stop the program at any point to examine the statistics collected, to turn profiling on and off during a single run, and to selectively profile portions of the code. It reports the time spent on a single line of code, and the number of times a single line is executed. This can be very handy for finding hidden bugs (see Figure 2).

Profiling event-driven programs can be a challenge since the main loop of the program may be inaccessible. The method I used was to profile each of the main classes of the program individually. Since we keep all of the code for a class in one module, that meant compiling that module with the debugging option on. In the profiler I could then set a profile area on every line in the module. After running the program and doing a few transactions that would exercise the class I was profiling, the statistics window would show me where the most time was spent. I used this technique for finding candidates for inlining and other optimizations. By profiling the main classes first, I was able to tell which of the sub-classes were in need of optimizing, focusing my effort where it was needed.

There are some limitations on inlining virtual functions. If a virtual function is called via a pointer or reference to an object, the actual function that gets called depends on the original type of that object. The compiler cannot determine the correct function at compile time, and therefore it cannot inline it. Virtual functions that are called via an actual object, or ones that are explicitly called with the :: operator, can be inlined (see Figure 3). In at least one case we changed a virtual function to a non-virtual function in order to take advantage of inlining. This step must be taken with caution, however, since removing the virtual mechanism can change the program's behavior, and any such change must be adjusted for.
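These rules can be illustrated with a small hypothetical hierarchy (Figure 3 shows our real code); the comments mark which calls a compiler is free to inline.

```cpp
#include <cassert>

class Shape {
public:
    virtual int Sides() const { return 0; }
    virtual ~Shape() { }
};

class Square : public Shape {
public:
    virtual int Sides() const { return 4; }
};

int demo()
{
    Square sq;
    int via_object = sq.Sides();         // called via an actual object:
                                         // exact type known, inlinable
    int qualified = sq.Square::Sides();  // explicit :: call bypasses the
                                         // virtual mechanism: inlinable
    Shape *p = &sq;
    int via_pointer = p->Sides();        // resolved at run time through
                                         // the virtual table: not inlinable
    return via_object + qualified + via_pointer;
}
```

All three calls land in Square::Sides here, but only the first two can be resolved (and therefore expanded inline) at compile time.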

The decision to inline doesn't need to be all or nothing for a function. There might be a situation where a function can be split apart to facilitate inlining. Figure 4 shows that the member function GetWidgetPointer gets called 10,000 times, but only has to create a Widget 10 times. By splitting this function in two, the part that gets executed 10,000 times can be inlined, and the main part of the code can be isolated in a private member function.
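A sketch of that split, with hypothetical Widget and WidgetHolder types standing in for Figure 4's code: the hot path (a test and a return) stays inline, while the rare creation path moves to an out-of-line private member function.

```cpp
#include <cassert>

// Hypothetical stand-ins for Figure 4's types.
struct Widget { int id; };

class WidgetHolder {
    Widget *widget;
    Widget *MakeWidget();              // heavyweight path, out of line
public:
    WidgetHolder() : widget(0) { }
    ~WidgetHolder() { delete widget; }
    Widget *GetWidgetPointer() {       // the part called 10,000 times:
        return widget ? widget         // small enough to inline
                      : MakeWidget();
    }
};

Widget *WidgetHolder::MakeWidget()     // executed only the few times a
{                                      // Widget actually needs creating
    widget = new Widget;
    widget->id = 0;
    return widget;
}
```

Every call after the first takes only the inlined test and return; the allocation code is emitted once, in the private function.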

Some special precautions should be taken when inlining constructors and destructors. The compiler may add code that you didn't call explicitly. For example, with an inline constructor the compiler will also inline for you the setting of the virtual table pointer, calls to any base class constructors, and calls to the constructors of any data members that have constructors. Borland C++ also adds code to check whether the constructor is being called on behalf of a new statement, and if so calls a memory allocation routine. Inline destructors carry similar hidden costs.

One more precaution for inlines: if the inline contains a function call, that function may itself be inline, and may in turn contain another inline function, and so on. Make sure you know what you put inline. As an aside, some current compilers forget to call the destructors for local variables in inline functions. This is important if the local variables hold memory or system resources, because it can cause a leak. For now I wouldn't create local variables of user-defined types, or pass such objects by value, in an inline function.

Pointer Tricks

One of the best ways to address rampant free-store allocation is by counting pointers. This is a simple but effective technique that can be used on any class to speed up assignments and passing by value. It is best targeted at utility classes that are used in many different places and cannot be isolated.

A good example is a string class. I pointed out above that a class always allocating its own memory can lead to excessive memory allocation and copying. But if the string class contains a pointer to an intermediate data structure with a count, assignment becomes a simple matter of decrementing and incrementing a counter (see Figure 5).
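A minimal sketch of such a counted string (the real class in Figure 5 will differ in detail): copies and assignments only adjust a use count, and the character buffer is freed when the last owner goes away.

```cpp
#include <cassert>
#include <cstring>

class String {
    struct Rep {               // shared implementation structure
        char *text;
        int count;             // number of Strings pointing at this Rep
    } *rep;
    void Detach() {            // drop our reference, freeing if last
        if (--rep->count == 0) {
            delete [] rep->text;
            delete rep;
        }
    }
public:
    String(const char *s) {
        rep = new Rep;
        rep->text = new char[std::strlen(s) + 1];
        std::strcpy(rep->text, s);
        rep->count = 1;
    }
    String(const String &other) : rep(other.rep) { ++rep->count; }
    String &operator=(const String &other) {
        ++other.rep->count;    // increment first: safe for self-assignment
        Detach();
        rep = other.rep;
        return *this;
    }
    ~String() { Detach(); }
    const char *Text() const { return rep->text; }
    int Count() const { return rep->count; }
};
```

Note the order in operator=: incrementing the right-hand side's count before releasing our own makes self-assignment harmless without a special test.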

By adding a little smarts to the string class we can reduce the amount of work done, but we don't have to change the external appearance of the class. It is important to maintain the external appearance if you need to change the class, but don't want to make changes to all modules that use the class.

One problem still remains. If you assign string1 to string2, then modify string1, string2 will also get modified. This problem is solved by using "copy-on-write" semantics. Copy-on-write is a strategy where any member function of the string class that modifies the string will first call a function that splits off a private copy of the internal implementation (see Figure 6).

In our application we use both pointer counting and pointer counting with copy-on-write. The class BMOIterator mentioned above uses pointer counting as a smart pointer. Besides constructors, assignment operators, and a destructor, the only member function is the overloaded operator->. An object used with this operator appears as simply a pointer (see Figure 7). In order to work intuitively as a pointer, we did not implement copy-on-write. Also, since a BMOIterator holds system resources, we judged it best that no copying take place unless explicitly asked for.

In another class, AttributeList, which is basically a list of integers, we did implement copy-on-write. Since these AttributeLists are not often modified, adding copy-on-write doesn't add significantly to run time. However, after running the AttributeList code through the profiler, we discovered that many thousands of empty lists were being created, and much time was being spent allocating arrays of zero length. To alleviate this situation, we established a special case in which a zero-length list is represented by a null implementation pointer (see Figure 8). Copying and assignment become slightly longer, but the default constructor is simplified. Only after using the profiler did we become aware of this special-case optimization.
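Here is a sketch of the special case, using an invented class with the same name and idea (Figure 8 shows our real code): a null implementation pointer stands for the empty list, so default construction allocates nothing at all.

```cpp
#include <cassert>
#include <cstddef>

class AttributeList {
    struct Rep {                      // counted implementation structure
        int *items;
        std::size_t length;
        int count;
    } *rep;                           // null means "empty list"
public:
    AttributeList() : rep(0) { }      // the common case: no allocation
    AttributeList(const int *src, std::size_t n) {
        rep = new Rep;
        rep->items = new int[n];
        for (std::size_t i = 0; i < n; ++i)
            rep->items[i] = src[i];
        rep->length = n;
        rep->count = 1;
    }
    AttributeList(const AttributeList &other) : rep(other.rep) {
        if (rep)                      // empty lists share nothing
            ++rep->count;
    }
    ~AttributeList() {
        if (rep && --rep->count == 0) {
            delete [] rep->items;
            delete rep;
        }
    }
    bool Empty() const { return rep == 0; }
    std::size_t Length() const { return rep ? rep->length : 0; }
private:
    AttributeList &operator=(const AttributeList &); // omitted here
};
```

Copying and the destructor each pay one extra null test; in exchange, the thousands of empty lists cost no allocations at all.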

Overloading new and delete

Another strategy for addressing memory-allocation bottlenecks is to overload the new and delete operators for a class. By using a special-purpose memory-allocation algorithm rather than the general-purpose algorithm used by malloc and free, you can squeeze out additional speed. By using a different heap for different allocation sizes you can reduce the time needed to search for a chunk of free memory, and reduce fragmentation as well.
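As a simpler cousin of the pool in the listings, here is a sketch of a class-specific operator new that recycles fixed-size nodes through a free list; the Node class and its instrumentation counter are invented for illustration, and the sketch never returns memory to the system.

```cpp
#include <cstddef>
#include <cstdlib>

// Fixed-size Nodes are recycled through a free list, so only the first
// allocation of each slot ever reaches malloc.
class Node {
    Node *next_free;                 // linkage while on the free list
    static Node *free_list;
public:
    static int malloc_calls;         // instrumentation for the sketch
    int value;

    void *operator new(std::size_t size) {
        if (free_list) {             // fast path: reuse a freed Node
            Node *n = free_list;
            free_list = n->next_free;
            return n;
        }
        ++malloc_calls;
        return std::malloc(size);    // slow path: the general heap
    }
    void operator delete(void *p, std::size_t) {
        Node *n = static_cast<Node *>(p);
        n->next_free = free_list;    // push back onto the free list
        free_list = n;
    }
};
Node *Node::free_list = 0;
int Node::malloc_calls = 0;
```

Since every allocation through this operator is for a Node, the size is constant and no searching is needed; allocation and deallocation are each a couple of pointer moves.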

An example of this is presented in Listing 1, Listing 2, and Listing 3, which implement a memory-pool class and a linked list class that uses the memory pool. The memory-pool class is a heap for a specific allocation size. It is used by associating one instance of the memory-pool class with a given class (via a static data member) and overloading that class's new and delete operators to use the pool. Since the class-specific operator new is used only for objects of that class, the allocation size is always the same.

The heap works by allocating 32 objects at a time, and maintaining a bit map of used areas. The pool has two lists, one for chunks that have at least one space, and the other for chunks that are completely used up. Therefore the allocation logic has to search only one chunk for a free spot. (I have to thank my brother Mark for the bit searching method, without which this method was actually slower than the built-in malloc.) This special version of new could also be rewritten in assembly language for even greater speed.
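The bit-searching idea can be sketched as a halving search over a 32-bit mask of used slots (this is my reconstruction of the approach, not the actual code in the listings): each step rules out half of the remaining bits, so at most five tests find the first free slot.

```cpp
// 'used' is a 32-bit mask of occupied slots in a chunk. Returns the
// index of the lowest clear bit (a free slot), or -1 if the chunk is
// full, using a halving search instead of testing all 32 bits.
int first_free_slot(unsigned long used)
{
    if ((used & 0xFFFFFFFFUL) == 0xFFFFFFFFUL)
        return -1;                    // chunk completely occupied
    int slot = 0;
    unsigned long mask = 0xFFFFUL;    // covers the low half
    for (int width = 16; width >= 1; width /= 2) {
        if ((used & mask) == mask) {  // low half full: look in high half
            slot += width;
            used >>= width;
        }
        mask >>= width / 2;           // narrow to half the remaining bits
    }
    return slot;
}
```

A linear scan would test up to 32 bits; the halving search does five tests no matter where the free slot is, which is the difference that made the pool faster than malloc.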

For classes that hold limited resources (such as our BMOJoin which controls one or more network sockets), a good strategy is to delay creation until the last possible moment. We used code similar to that in Figure 4b inside the BMOIterator, calling GetJoinPointer rather than accessing the BMOJoin pointer itself. In this way we can create BMOIterators as we put up a window, but the BMOJoin isn't created until a search is performed. We added a member function called Disconnect to the BMOIterator, and call it when a window is put into the background to delete the BMOJoin pointer and free resources for the foreground window (see Figure 9).
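A sketch of the delayed-creation pattern, with stripped-down stand-ins for BMOJoin and BMOIterator (the socket counter is invented to make the resource usage visible): the join isn't created until a search needs it, and Disconnect releases it early.

```cpp
#include <cassert>

// Stand-in for a resource-hungry class: constructing one "opens" a
// scarce socket, destroying it releases the socket.
static int sockets_in_use = 0;

class Join {
public:
    Join()  { ++sockets_in_use; }
    ~Join() { --sockets_in_use; }
    int Search() { return 1; }         // placeholder query
};

class Iterator {
    Join *join;
    Join *GetJoinPointer() {           // create on first use
        if (join == 0)
            join = new Join;
        return join;
    }
public:
    Iterator() : join(0) { }           // window opens: no socket yet
    ~Iterator() { Disconnect(); }
    int Search() { return GetJoinPointer()->Search(); }
    void Disconnect() {                // window backgrounded: free it
        delete join;
        join = 0;
    }
};
```

The iterator can be created freely when a window goes up; the socket is consumed only for the duration between the first search and the Disconnect call.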

Conclusion

Finally, running the profiler through a particularly sluggish area, we discovered that a lot of time was being spent padding strings with spaces. The problem here was that operator+= for the string was not as efficient as we had hoped. By going to a lower level, and accessing C-style strings directly, we managed to speed up the padding process considerably.

I should mention that a good string class would have had a padding function, or at least the ability to create a blank string of a given length, and this low-level access then wouldn't have been necessary. But since the string class was part of a commercial library, we didn't want to modify or add to it. It's good to know that all the old C tricks are available, even if used only at the lowest levels. Credit has to be given here to the profiler for finding this bottleneck.

Many of the optimizations described above are fairly simple because of the separation of interface from implementation. The idea is to be able to grease up a class without having to worry about side effects that might be possible in a less restrictive interface. Other optimizations are possible by dipping into C++'s C heritage as a low-level language.

Of course there is no substitute for a good design effort up front. A temptation in design is to make a lot of "friendly" classes, which access each other's internals. But what you wind up with is "spaghetti objects." By maintaining integrity between classes, you can twiddle with the internal bits to your heart's content without having to worry about unforeseen side effects.

Figure 10: Going low-level to improve efficiency