March 1997/C++ Theory and Practice

Columns

C++ Theory and Practice

Dan Saks

Class-Specific new and delete

If an object can construct and destroy itself, why not have it buy and sell its own storage? Well it can, as Dan shows, but you have to be careful.

Copyright © 1997 by Dan Saks

A couple of months ago, I started to take a detailed look at new and delete expressions. (See "C++ Theory and Practice: new and delete," CUJ, January 1997.) I got sidetracked last month by recent events on the standards front, but now I'm back on track.

In my January column, I explained the basic structure of new expressions and their interaction with both constructors and the global functions named operator new and operator new[]. At the same time, I explained the basics of delete expressions, and their interaction with destructors and the global functions named operator delete and operator delete[]. This month, I'll look at allocation and deallocation functions as class members.

Global Allocation and Deallocation Functions

An allocation function is any function whose name is either operator new or operator new[]. A deallocation function is any function whose name is either operator delete or operator delete[].

The draft Standard C++ library provides definitions for global allocation functions declared as:
void *operator new(std::size_t n)
    throw (std::bad_alloc);
void *operator new[](std::size_t n)
    throw (std::bad_alloc);
std is a namespace that contains nearly all the standard library components. In fact, the only standard library components not nested within namespace std, aside from macros, are allocation functions and deallocation functions. Even though they are not declared in namespace std, the global operator new and operator new[] are declared in terms of the names size_t and bad_alloc, which are declared in std.

As in C, size_t is a typedef name defined as the unsigned integer type that is the result type of the sizeof operator. In other words, size_t is an unsigned integer type (either unsigned int or unsigned long int) that can represent the size of any object in the execution environment.

bad_alloc is the standard exception type for reporting storage allocation failures. The throw specification
throw(std::bad_alloc)
at the end of both function declarations indicates that the functions can throw only exceptions of type bad_alloc or a type derived from bad_alloc.

For the remainder of this article, I shall refer to size_t and bad_alloc without the std:: prefix. That is, you should assume that the using-directive
using namespace std;
is in force from here on.

The draft Standard C++ library also provides definitions for global deallocation functions, declared as:
void operator delete(void *p)
    throw();
void operator delete[](void *p)
    throw();
Again, these functions are truly global; they are not members of any namespace. The throw specification
throw()
in both functions indicates that neither function can throw any exceptions at all.

The draft Standard C++ library provides implementations of these global allocation and deallocation functions that can allocate and deallocate objects of any size. This generality is quite appropriate (and, in fact, essential) for a standard library, but it rarely yields optimal performance for any particular program. Replacing the general-purpose algorithm with one specifically tuned to the program may lead to significantly faster execution time or lower memory consumption.

Any C++ program can define its own version of any of the aforementioned allocation and deallocation functions to replace the definition it would otherwise get from the library. Thus, if your program defines its own global allocation function, then every new expression that would have called the library's function with the same name will call yours instead. Similarly, if your program defines its own global deallocation function, then every delete expression that would have called the library function with the same name will call yours instead.

Although you can easily find memory management algorithms in textbooks and journals, writing an industrial-strength general-purpose memory manager (a set of corresponding allocation and deallocation functions) is still not an easy task. The memory manager must be able to allocate and deallocate objects of any size, and it must comply with the memory alignment requirements of the target machine. (Some architectures require that objects of certain types reside only at addresses that are multiples of some power of 2. For example, it may be that each object of type double must be aligned at an address that is a multiple of 8.) Rather than write your own memory manager, you should first consider purchasing one. I believe ads for such products occasionally appear in this magazine. (I have no financial stake in any such products.)

Class-specific Allocation and Deallocation Functions

In many applications, the majority of dynamically-allocated objects tend to be of just a few types. You may be able to achieve significant performance improvements by using special-purpose allocation and deallocation functions for just those few heavily-used types, while still using the library's general-purpose allocation and deallocation functions for all the other (less often used) types.

C++ lets you implement special-purpose memory managers for objects of a given class by defining allocation and deallocation functions as members of that class. For example,
class widget
    {
public:
    void *operator new(size_t n)
        throw (bad_alloc);
    void operator delete(void *p)
        throw ();
    ...
    };
defines class widget with class-specific versions of operator new and operator delete. Thereafter, a new expression such as
pw = new widget;
allocates memory using widget::operator new rather than the global operator new. Likewise,
delete pw;
deallocates memory using widget::operator delete.

In the previous example, class widget does not declare operator new[] and operator delete[] as members. Therefore, a new expression such as
pw = new widget[N];
uses the global operator new[] to allocate an array of N widgets. Similarly,
delete [] pw;
uses the global operator delete[] to deallocate that array. However, were widget to declare operator new[] and operator delete[] as members, then these array-new and array-delete expressions would use the member functions instead.

When declared as a class member, a deallocation function can have a second parameter of type size_t:
void operator delete(void *p, size_t n)
    throw ();
void operator delete[](void *p, size_t n)
    throw ();
In that case, each delete expression passes the size of the object to be deleted as that second parameter.

For example, if class widget had a member operator delete declared as above, then
delete pw;
would call
widget::delete(pw, sizeof(widget));
Declaring allocation and deallocation functions for class widget has no effect on allocation and deallocation for other types. That is, new and delete expressions for objects of built-in types (or arrays thereof), and for class types that do not declare their own operator new and operator delete (or arrayss thereof), continue to use the global allocation and deallocation functions.

A program can bypass any member allocation or deallocation function and use a global one instead by explicitly writing the scope resolution operator :: at the beginning of a new or delete expression. For example,
pw = ::new widget;
invokes ::operator new (the global operator new) despite the presence of widget::operator new. Similarly,
::delete pw;
invokes ::operator delete (the global operator delete).

Static Members

When declared as a class member, an allocation or deallocation function is always a static member function, even if the keyword static is absent from the declaration. For example, the declarations for widget::operator new and widget::operator delete shown above behave exactly as if they had been declared as
class widget
    {
public:
    static void *operator new(size_t n)
        throw (bad_alloc);
    static void operator delete(void *p)
        throw ();
    ...
    };
A static member function is a member function that does not have an implicitly-declared parameter named this. The primary use for static member functions is to manipulate static data members. A static data member is a statically-allocated object declared in class scope. The primary use for static data members is to define data that is shared by various members of a particular class, yet not part of any one object of that class. (I explained static members in some detail in "Stepping Up to C++: Other Assorted Changes, Part 1," CUJ, July 1995.)

The reason that member allocation and deallocation functions are static is that they don't really act upon objects of their class type; they act upon raw storage.

For example, widget::operator new does not do anything with a widget object. Rather, it allocates storage for a widget-to-be from a pool of available memory. A new expression that invokes widget::operator new passes the address of that storage as the this parameter of a widget constructor, and that constructor transforms that storage into a widget.

A delete expression that invokes widget::operator delete first passes the address of a widget as the this parameter of the widget destructor. In a sense, that destructor breaks the widget down into raw storage by destroying the widget's value. The delete expression then passes the address of that ex-widget to widget::operator delete, which places that storage back into the memory pool.

Whenever possible, the variables that represent that memory pool should probably be static data members. Indeed, they should have static storage duration so they can retain their values between calls to the class-specific allocation and deallocation functions. However, these variables need not be global because only the class member functions need to access them. Therefore, the variables should be static data members, declared private to prevent access from outside the class.

For example, one simple technique for speeding deallocation and subsequent reallocation of widgets is to maintain the memory for them in a linked list. Rather than return storage for ex-widgets to the free store, widget::operator delete could place that storage into a list of available storage chunks that are just the right size for widgets-to-be. Subsequent calls to widget::operator new could then grab the next available chunk from the head of that list rather than go hunting through the free store. widget::operator new would need to dip into the free store only when the list is empty.

The pointer to the head of this list should be a static member of class widget, as sketched in Listing 1. Listing 1 shows that pointer, widget::available, as a static member of type widget::chunk *. widget::chunk is a struct conjured by some implementation-dependent magic so that it has the same size as a widget.

Access Control

When declared as class members, allocation and deallocation functions are subject to access control. That is, the functions can be private and protected as well as public. I believe you will usually want them to be public; but private, and maybe even protected, access can be useful.

For example, if you would like to prevent your application from allocating objects of a certain type X on the free store, simply declare operator new as a private member of X:
class X
    {
    ...
private:
    void *operator new(size_t)
        throw (bad_alloc);
    ...
    };
Don't define the function either. Subsequently, a new expression such as
px = new X;
that attempts to create an X object outside the scope of X will trigger a compilation error because X::operator new is inaccessible. Any new expression within the scope of X will trigger a linker error because, even though X::operator new is accessible, it has no definition.

The previous example does not declare operator new[] as a member of X. Therefore, a program can still allocate arrays of X objects using
px = new X[N];
which invokes the global operator new[].

Declaring X::operator new as private does not completely prevent a program from dynamically allocating individual X objects. The program can still employ the global operator new in explicitly-qualified new expressions such as
px = ::new X;
This is rather unfortunate because it weakens a useful idiom.

Inheritance

Since allocation and deallocation functions can be members, they can also be inherited. For example, given
using namespace std;
...
class B
    {
public:
    void *operator new(size_t n)
        throw (bad_alloc);
    void operator delete(void *p)
        throw ();
     ...

    };
class D : public B
    {
    ...
    };
where D does not declare its own operator new or operator delete, then
pd = new D;
invokes B::operator new, and
delete pd;
invokes B::operator delete. This behavior is certainly consistent with the usual rules for inherited functions, but I don't believe it's very useful, and it can be dangerous.

For example, suppose that class D declares additional data members beyond those it inherited from class B. Then sizeof(D) will be greater than sizeof(B). If B::operator new allocates only enough storage for a B object, then
pd = new D;
will attempt to construct a D object in storage (allocated by B::operator new) that is too small for a D.

There's a fairly simple way to guard against this error. B::operator new should check that the size of the storage request is the same as sizeof(B). If not, B::operator new should either reject the request, as in
void *B::operator new(size_t n)
    throw (bad_alloc)
    {
    if (n != sizeof(B))
        throw bad_alloc();
    ...
    }
or pass the storage request to some other allocation function (such as the global operator new), as in:
void *B::operator new(size_t n)
    throw (bad_alloc)
    {
    if (n != sizeof(B))
        return ::operator new(n);
    ...
    }
Many compilers will remind you to do something about this problem because they will issue a warning if B::operator new ignores the value of its parameter.

Virtual Functions

Since member allocation and deallocation functions are static they cannot be virtual. For allocation functions, this doesn't really matter because constructors can't be virtual either. The entire process of creating an object (allocating and initializing storage) depends on knowing exactly the type of the object. In that case, the virtual calling mechanism is no help.

On the other hand, since destructors can be virtual, it seems that deallocation functions should be also. But they aren't. However, when a delete expression destroys an object of class type that has a virtual destructor, it uses the deallocation function corresponding to the class of the destructor actually called. Thus, the delete expression behaves as if the deallocation function were virtual as well.

For example, consider this simple hierarchy:
class B
    {
public:
    virtual ~B();
    void operator delete(void *);
    };
class D : public B
    {
public:
    ~D();
    void operator delete(void*);
         
};
Here, base class B has a virtual destructor, and so does derived class D. Now suppose you create a D object using
B* bp = new D;
so that bp points to an object that is actually a D. Subsequently, the delete expression
delete bp;
destroys *bp (the object addressed by bp). Since B has a virtual destructor, the delete expression actually invokes D's destructor, which is good because that's the proper way to destroy a D. Since it invokes D's destructor, it also invokes D's operator delete, which is also good.

Different compilers use different techniques to pull this off. Some compilers generate code that simply calls the deallocation function from within the destructor. Other compilers generate code wherein the destructor returns the address of the appropriate deallocation function. Yet other compilers generate calls to additional "helper" functions that select the appropriate deallocation function.

A word of caution: this virtual calling mechanism does not work with array-delete expressions. For example, consider the program in Listing 2. Here, base class B and derived class D each have their own member deallocation functions and virtual destructors. However, the array-delete expression in
void delete_array(B *bp)
    {
    delete [] bp;
    }
always invokes B::operator delete[], even if bp points to an array of D objects. In array-delete expressions, the compiler always selects the deallocation function corresponding to the declared type (the "static" type) of the pointer operand rather than the actual type (the "dynamic" type) of the object to be deleted.

The draft C++ Standard says the array-delete expression in Listing 2 has undefined behavior. If you compile Listing 2 using different compilers, you may see different results. For example, using Watcom C++ 10.6 under Windows 95, the program produces seemingly correct output:
B()
D()
B()
D()
~D()
~B()
~D()
~B()
Using Borland C++ 5.0, the same program yields:
B()
D()
B()
D()
~B()
~B()
I suspect some compilers might produce a program that terminates abnormally.

Meyers [1] expands on this and other related issues under the general guideline of "don't treat arrays polymorphically." (In the acknowledgement, he says he got the idea from me.) o

References

[1] Scott Meyers. More Effective C++. Addison-Wesley.

Dan Saks is the president of Saks &Associates, which offers consulting and training in C++ and C. He is secretary of the ANSI and ISO C++ committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield OH, 45504-4906, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.