Memory Management


Reducing Allocation Overhead in 2-D Arrays

Glen W. Deen


Glen Deen has a BSEE degree from the University of Texas. He is the publisher of MicroSky(tm), a microfiche reproduction of the Palomar Observatory Sky Survey. He uses C++ to process astronomical data catalogues and to plot star charts. His internet address is glen@metronet. com.

If you allocate a large two-dimensional array of small objects from the heap, you may be shocked at the amount of memory wasted using conventional techniques. For example, suppose you allocate 1,000 eight-byte null-terminated C-strings on a PC using the Borland C++ compiler. Each eight-byte string, if allocated separately with operator new, will carry eight additional bytes of overhead. (The overhead may vary with different compilers and machines.) You also need 1,000 pointers, which may be two or four bytes each, depending on your memory model. So the 8,000-byte overhead works out to 40% of the total memory needed for the strings, assuming four-byte pointers (% = overhead/(overhead + data + pointers) = 8000/(8000 + 8000 + 4000)). With two-byte pointers the overhead climbs to 44%. A conventional method for allocating 1,000 arrays of arrays of small objects is shown as follows:

// type of small object
class   Type;
const int m = 1000, n = 8;
// pointer to array of Type*
Type** array = new Type*[m];
// for each row . . .
for (int i = 0; i < m; i++) {
   // allocate an n-size array for Type objects
   array[i] = new Type[n];
   // If Type is built-in
   memset(array[i], 0, n*sizeof(Type));
}
Figure 1 shows what this structure looks like when Type is of type char. The shaded portion represents overhead bytes, operator new cannot currently initialize an array of builtin types, such as char, so you can't write

// won't work!
array[i] = new char[n](0);
Therefore you must initialize the allocated array of built-in types with a for loop or a call to memset as shown above. If Type is a class, however, its default constructor will be called by operator new to initialize each element in the array.

To dereference the element in the rth row and the cth column, write

Type element = array [r][c];
Calling operator new to allocate each row of a 2-D array can be costly in time as well as memory overhead. Stroustrup [1] suggests two possible time-saving solutions when allocating many small objects on the free store: provide a better general purpose allocator, or take over free store management for objects of a particular class by defining appropriate allocation and deallocation functions.

These suggested solutions are probably not worth the effort required in most cases if they are not already available. But if the large number of small objects are defined in a multi-dimensional array, there is an easier way to eliminate the waste. Just allocate one block for the entire array. (I got this idea from a suggestion made to me by a Borland customer-support engineer.):

class Type;
const int m = 1000, n = 8;
// pointer to array of Type*
Type** array = new Type*[m];
// allocate an m*n array of Type objects
Type* block = new Type[m*n];
memset(block, 0, m*n*sizeof(Type));
// If Type is built-in simulate
// allocation of each row
for (int i = 0; i < m; i++)
     array[i] = &block[i*n];
After you allocate the array of m pointers, you allocate a singly-dimensioned array of Types, called block in the example, which contains m times n elements. (If Type is a built-in type, you also need to initialize the block for the reason given above.)

The for loop simply assigns each row's address to the pointer for that row, stepping m times through block in multiples of n elements. The 1,000 arrays of eight-element arrays are packed together, and only one eight-byte overhead block is needed for the entire aggregate. If Type is of type char (depicted in Figure 2) , you save 8,000 - 8 = 7,992 bytes. Finally, to dereference an element in the rth row and cth column, again, you write

Type element = array [r][c];
So you can still use the conventional syntax. Not bad for an extra line of code!

Reference

[1] Bjarn Stroustrup. The C++ Programming Language, Second Edition (Addison-Wesley, 1991), p. 176.