December 1997/The Learning C/C++urve

The Learning C/C++urve

Bobby Schmidt

NULL and void *

Bobby works hard to make nothing out of something.

Copyright © 1997 Robert H. Schmidt

This month we resume exploring auto_pointer, left dangling since September. And while I had intended to bring NULL into play this month anyway, I've received added incentive from Diligent Readers objecting to the loose semantics of null pointers. One such reader (David R. Tribble) even mentioned the September column in his posting to comp.std.c++.

My original motivation for crafting auto_pointer was to overcome deficiencies in real C++ pointers. By analogy, I'll create a "null" auto_pointer to overcome deficiencies in real C++ null pointers. As those deficiencies originate with C, we'll start by tracing the etymology of C's NULL.

ints and Pointers

To begin, consider int, C's most basic type. Fundamentally, an int object always holds some int value:
int i = 123; // i holds 123
even if that value is implicit:
// i holds whatever was passed in
void f(int i)
or undefined:
// i holds some random value
auto int i;
Every value you can give an int object is a genuine int value. Whether or not that value makes sense in the current program context is a different question.

Now consider pointers. You may be tempted to extrapolate from int and decide pointer objects always hold some pointer value; or, more precisely, that pointer objects always hold the address of some other object:
int i;
/* OK, p holds address of i */
int *p = &i;
But what about an uninitialized pointer:
/* p holds the address of...what? */
auto int *p;
Does it contain the address of some object, perhaps a "default" object?

In the world of int, an uninitialized object always holds some int value. That value may be unknown, but it is always an int:
auto int a;
cout << a; /* dubious but OK */
By comparison, in the world of pointers an uninitialized object most likely contains a value that is not a genuine pointer (i.e., not the address of any existing object):
auto int *leeloo;
/* may go badaboom! */
cout << *leeloo;
As a solution, you could create a dummy object:
extern void *dummy = &dummy_;
/* OK, i contains a real pointer */
int *i = dummy;
But if you happen to accidentally dereference such a dummy pointer:
int *i = dummy;
int j = *i; // OK but probably unwise
neither the translator nor the run-time system is likely to catch the error (the behavior is undefined).

floats and Pointers

Instead of ints, perhaps floats are a better inspiration for pointers. Unlike integer objects, real number objects can contain values that correspond to no genuine number. For example,
#include <math.h>
float f = log(-2);
cannot correctly yield a genuine real number, since taking the logarithm of a negative number is a mathematical no-no. But since f has to contain something, what value should be stored? For just such occasions, math.h has special implementation-defined values corresponding to no actual real numbers. In this example, f is often set to the special value NAN, or "not-a-number" [1] .
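
As a minimal sketch of that not-a-number behavior, the fragment below takes the logarithm of a negative number and tests the result. It assumes an IEEE 754 floating-point implementation, where a NaN compares unequal to everything, itself included; the names is_not_a_number and check_log_of_negative are hypothetical, invented for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Returns true when x holds a not-a-number value.
// Assumption: IEEE 754 arithmetic, where a NaN compares unequal to itself.
bool is_not_a_number(double x)
{
    return x != x;
}

// Hypothetical self-check: the log of a negative number is a NaN,
// while the log of a positive number is a genuine real number.
void check_log_of_negative()
{
    double const f = std::log(-2.0);
    assert(is_not_a_number(f));
    assert(!is_not_a_number(std::log(2.0)));
}
```

On an implementation without IEEE 754 semantics, the self-comparison trick may not apply.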

By analogy, pointers need a peaceful NAP, or "not-a-pointer," some expression that is of pointer type but not an actual pointer (address) value. Fortunately, as we all learned in CS 101, Standard C has such not-a-pointers, formally known as null pointers.

Null Pointers

According to the C9X Draft Standard [2] :

An integral constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant. If a null pointer constant is assigned to or compared for equality to a pointer, the constant is converted to a pointer of that type. Such a pointer, called a null pointer, is guaranteed to compare unequal to a pointer to any object or function.

Two null pointers, converted through possibly different sequences of casts to pointer types, shall compare equal.

The macro NULL is defined in <stddef.h> as a null pointer constant.

Possible null pointer constants include

0

0L

0LL (C9X long long)

(void *) cast of the above
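
The guarantees quoted above can be exercised with a short sketch. It is written in C++, so it uses only the integer forms of the null pointer constant; is_null and check_null_pointer_rules are hypothetical names, not anything from the Standard.

```cpp
#include <cassert>

// Hypothetical helper: true when p is a null pointer.
bool is_null(void const *p)
{
    return p == 0;    // 0 converts to a null pointer before the comparison
}

void check_null_pointer_rules()
{
    int object = 123;
    int  *a = 0;      // plain integer constant 0
    char *b = 0L;     // a long constant 0 qualifies as well

    assert(is_null(a));
    assert(is_null(b));

    // A null pointer compares unequal to a pointer to any object.
    assert(!is_null(&object));

    // Two null pointers compare equal, whatever types they started as.
    assert(static_cast<void *>(a) == static_cast<void *>(b));
}
```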

The (in)famous macro NULL is defined to be one of the possible null pointer constants. Which one is up to the implementation, although in practice C vendors typically define NULL as a void *. Why? Two possible motivations spring to mind.

First, such a definition keeps NULL away from non-pointers:
int n;
n = 0;          /* OK (well *duh*) */
n = (void *) 0; /* error, type mismatch */
turning a mysterious misuse into a compile-time flag.

Second, void * is usually the correct size for an unchecked pointer argument:
printf("%p", 0);          /* possible size error */
printf("%p", (void *) 0); /* OK, correct size */
In fact, you may wonder how NULL could ever be defined as an integer. Remember, the Standard says that a null pointer constant such as 0 converts to a null pointer — not that 0 is a pointer, but that it converts to one. The distinction is subtle but real. Keep this point in mind, for I'll refer to it again shortly.

Null Pointers in C++

Standard C++ generally adopts C's definition of null pointer constants and null pointers. One exception: NULL cannot be defined as a void *.

In C a void * is assignment compatible with other pointer types:
void *v = NULL;
int *i = v; /* OK, v converts to int * */
This allows NULL to be a void * and still work:
int *i = NULL;       /* OK */
int *j = (void *) 0; /* equivalent */
In C++, were NULL a void *, such an assignment would draw a compile-time diagnostic:
int *i = NULL;       // error, type mismatch
int *j = (void *) 0; // equivalent
since C++'s void * won't convert to other pointer types. Were NULL so defined, you'd have to use a cast:
int *i = (int *) NULL; // OK
breaking ported C code and defying programmer intuition. To circumvent this problem, the C++ Standard requires that NULL be a plain integer constant with the value 0:
int *i = 0; // OK
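
To make the asymmetry concrete, here is a small sketch: any object pointer converts to void * silently, the trip back needs a cast, and integer 0 needs no cast at all. The names as_int_pointer and check_void_pointer_round_trip are hypothetical.

```cpp
#include <cassert>

// Recover a typed pointer from a void *. C would accept the assignment
// without a cast; C++ insists that the conversion be spelled out.
int *as_int_pointer(void *v)
{
    // int *i = v;                  // error in C++: no implicit conversion
    return static_cast<int *>(v);   // OK: explicit conversion
}

void check_void_pointer_round_trip()
{
    int value = 42;
    void *v = &value;               // int * converts to void * implicitly
    int *i = as_int_pointer(v);
    assert(i == &value);
    assert(*i == 42);

    int *n = 0;                     // integer 0 converts with no cast
    assert(n == 0);
}
```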

Why Null Pointers are Evil

Remember that for an arbitrary type t, the statement
t *p = NULL;
requires that the null pointer constant (defined by NULL) must convert to type t *. Because 0 is a possible value for NULL, this implies the statement
t *p = 0; // OK
will work. By implication, constructs like
void f(t *);
// OK, 0 converts to t *
// during call
f(0);
work too. Such conversion leads to confusion in overload resolution:
void f(void *);
void f(long);
int i;
f(&i);   // OK, converts argument to void *
f(1);    // OK, converts argument to long
f(NULL); // error, ambiguous conversion from 0
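
One way out of the ambiguity is for the caller to state the intended type. The sketch below uses a hypothetical pair of overloads named which, each returning a letter identifying itself; the ambiguous call is left commented out so the fragment compiles.

```cpp
#include <cassert>

// Hypothetical overloads mirroring f(void *) and f(long).
char which(void *) { return 'p'; }   // 'p' for pointer
char which(long)   { return 'l'; }   // 'l' for long

void check_overloads()
{
    int object = 0;
    assert(which(&object) == 'p');   // int * converts to void *
    assert(which(1) == 'l');         // int converts to long

    // which(0);                     // error: 0 converts equally well to both

    // Stating the type removes the ambiguity.
    assert(which(static_cast<void *>(0)) == 'p');
    assert(which(0L) == 'l');        // exact match on long
}
```

The casts work, but they also show how much the caller must know about the overload set just to pass "nothing."
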
Also note a subtle point that confuses many: when null pointer constants (like 0) convert to null pointers, the representation may no longer be a machine zero. That is, after the statement
char *p = 0;
the bit pattern inside p may not be that of the original integer 0. This implies that zeroing-out a pointer is not necessarily the same as nulling-out that pointer.
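
If you are curious about your own platform, a sketch like the following reports whether a null int * happens to be stored as all-zero bytes. The function names are hypothetical, and the answer is a property of the implementation, not something the language promises; other pointer types could in principle differ.

```cpp
#include <cassert>
#include <cstring>

// Reports whether this implementation stores a null int * as all-zero bytes.
bool null_is_all_zero_bytes()
{
    int *null_pointer = 0;                      // proper conversion from 0
    unsigned char zeros[sizeof null_pointer];
    std::memset(zeros, 0, sizeof zeros);
    return std::memcmp(&null_pointer, zeros, sizeof zeros) == 0;
}

// Whatever the bit pattern, a pointer nulled by assignment tests as null.
void check_null_by_value()
{
    int *p = 0;
    assert(p == 0);
    assert(!p);
}
```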

As an example,
int **p = (int **) calloc(100, sizeof(*p));
and the equivalent
int **p = (int **) malloc(100 * sizeof(*p));
memset(p, 0, 100 * sizeof(*p));
dynamically allocate an array of 100 pointers to int, setting every bit of that array to 0. But since all-bits-zero is not necessarily the encoding of a null pointer, this array may not be properly initialized.

A more portable initialization is
int **p = (int **) malloc(100 * sizeof(*p));
for (int i = 0; i < 100; ++i)
    p[i] = 0; // tantamount to p[i] = (int *) 0;
allowing the proper conversion of 0 to a null pointer before assignment to p[i].
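
Wrapped up as a function, the portable initialization might look like the sketch below. The name make_null_table and the choice to report allocation failure by returning a null pointer are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocates count pointers to int and nulls each one by assignment.
// Returns a null pointer if the allocation fails.
// The caller releases the table with std::free.
int **make_null_table(std::size_t count)
{
    int **table = static_cast<int **>(std::malloc(count * sizeof *table));
    if (table == 0)
        return 0;
    for (std::size_t i = 0; i < count; ++i)
        table[i] = 0;    // 0 converts to a null int * before the store
    return table;
}

void check_null_table()
{
    int **table = make_null_table(100);
    assert(table != 0);
    for (std::size_t i = 0; i < 100; ++i)
        assert(table[i] == 0);
    std::free(table);
}
```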

Diligent Reader David Tribble's quandary was especially perverse. In compiling the POSIX-compliant call
if (gmtime_r(&t, &s) != 0)
    return -1; /* failed */
he found one implementation declaring gmtime_r as
int gmtime_r(const time_t *, struct tm *);
/* returns 0 on success */
and another declaring it as
struct tm *gmtime_r(const time_t *, struct tm *);
/* returns null pointer on failure */
Note the difference in return type and return value semantics: in the former declaration, a 0 integer implies success, while in the latter a null pointer implies failure. Both versions of the call compile, but clearly, only the first runs correctly. Because logical false, null pointers, and arithmetic zero morph into one another so easily, C and C++ offer many opportunities for such type mischief.
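
The mischief is easy to reproduce in miniature. In the sketch below, status_style and pointer_style are hypothetical stand-ins for the two gmtime_r flavors (they are not the real declarations), and caller_sees_failure plays the part of the caller's "!= 0" test.

```cpp
#include <cassert>

// Hypothetical stand-in: returns 0 on success, -1 on failure.
int status_style(bool succeed)
{
    return succeed ? 0 : -1;
}

// Hypothetical stand-in: returns out on success, a null pointer on failure.
int *pointer_style(bool succeed, int *out)
{
    return succeed ? out : 0;
}

// The caller's test, written once for both flavors: nonzero means failure.
template<class result>
bool caller_sees_failure(result r)
{
    return r != 0;    // compiles for integers and pointers alike
}

void check_type_mischief()
{
    int slot = 0;
    // With the integer flavor, the test means what the caller intended.
    assert(!caller_sees_failure(status_style(true)));
    assert( caller_sees_failure(status_style(false)));
    // With the pointer flavor, the same test is silently inverted.
    assert( caller_sees_failure(pointer_style(true, &slot)));
    assert(!caller_sees_failure(pointer_style(false, &slot)));
}
```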

As I mentioned in September's column, were it up to me, NULL would act like Pascal's nil, a true null pointer constant convertible only to pointers. Believe it or not, we can create such a nil look-alike that gets remarkably close to this goal.

Null Pointers Redeemed

In that column I disabled NULL-initialized auto_pointers:
template<class t>
class auto_pointer
    {
    // ...
private:
    auto_pointer(void *);
    };
auto_pointer<int> p = NULL; // error
Remember, NULL is either a void * matching auto_pointer(void *), or an integer 0 converting to a void * that matches auto_pointer(void *). In either case, the matched member is private and inaccessible, leading to a compile-time flag.

But if NULL and null pointers are no longer assignable to auto_pointers, how do we indicate an auto_pointer is not bound to anything? That is, how do we get the benefits of NULL without the unpleasant after-taste? The same way we code ourselves out of most jams: we write a new class.

Departing from my usual style, I'll show my version of such a class first, then explain the rationale behind it:
//
//  header nil.h
//
class nil
    {
public:
    ~nil();
    nil(int, int);
    };
inline nil::~nil()
    {
    }
     
inline nil::nil(int, int)
    {
    }
     
static nil const nil(0, 0);
 
//
//  modified header auto_pointer.h
//
template<class t>
class auto_pointer
    {
    // ...
public:
    auto_pointer(class nil const &);
    };
// ...
     
template <class t>
inline auto_pointer<t>::auto_pointer(class nil const &)
        :
        is_owner_(true),
        pointer_(NULL)
    {
    }

nil Dissected

I've adopted the name of Pascal's nil, my inspiration for this class. Java-come-latelys may opt to call this class null. I avoid that name to reduce confusion with C++'s use of the term "null."

Note the lookup tricks I play in nil.h:

nil is a class, meaning the name nil appears in both the tagged and untagged namespaces. I can reference the class as either class nil or plain nil.

I want the name nil to represent the conceptual not-a-pointer value, not the nil class. I therefore declare an object nil of type class nil. This means plain nil now refers to the object, hiding the implicit use as a class. To reference the class, I must explicitly use class nil.

I also want it to appear as if one object of this nil class exists in the whole program. To ensure this nil object is available to other static objects at initialization, I declare a static copy of nil in each translation unit [3] . Because class nil has no data members and its member functions have empty inline bodies, such object duplication should be fairly benign.

At the same time, I don't want users minting other objects of type nil. By declaring nil::nil(int, int), I prevent the compiler from synthesizing a nil default constructor, leaving a non-obvious constructor for my own use. Note that the compiler still synthesizes a copy constructor; a user determined to copy the nil object can do so, although hiding the class name behind the object makes that harder to do by accident.

To use nil with auto_pointer:
auto_pointer<int> p = nil; // same as p(nil)
auto_pointer<char> q;
q = nil;
Because nil does not convert to NULL or some other null pointer constant, you can't treat it as a real pointer or pointer constant:
int *p = nil;    // error
int i = nil;     // error
if (nil == NULL) // error
nil is compatible only with auto_pointer.
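
To try these rules out, here is a self-contained sketch with a stripped-down stand-in for auto_pointer. It keeps only a raw pointer, omits ownership entirely, and adds a hypothetical is_nil member purely so the result can be inspected; none of that is meant as the real auto_pointer design.

```cpp
#include <cassert>

class nil
    {
public:
    nil(int, int) { }
    };

static nil const nil(0, 0);    // from here on, plain nil names the object

// Simplified stand-in for auto_pointer: no ownership, no destructor.
template<class t>
class auto_pointer
    {
public:
    auto_pointer(t *p) : pointer_(p) { }
    auto_pointer(class nil const &) : pointer_(0) { }
    bool is_nil() const { return pointer_ == 0; }   // hypothetical accessor
private:
    auto_pointer(void *);      // declared, never defined: blocks NULL
    t *pointer_;
    };

void check_nil_construction()
{
    int value = 7;
    auto_pointer<int> p = nil;     // bound to nothing
    auto_pointer<int> q(&value);   // bound to a real object
    assert(p.is_nil());
    assert(!q.is_nil());
    // int *r = nil;               // error: nil converts to no real pointer
    // auto_pointer<int> s = 0;    // error: 0 cannot initialize auto_pointer
    }
```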

In fact, the only current use of nil is as a dummy argument for an auto_pointer constructor. To further mimic the behavior of real pointers and NULL, we can allow equality comparisons to nil:
template<class t>
class auto_pointer
    {
    // ...
public:
    bool operator==
        (auto_pointer const &);
    bool operator!=
        (auto_pointer const &);
    };
auto_pointer<int> p;
if (p == nil)
    // ...
You may wonder how this works, since operator== and operator!= accept auto_pointer, not nil. The secret is the auto_pointer conversion constructor that turns nil into an auto_pointer. The compiler constructs a temporary auto_pointer object, which it then passes into the comparison operator.

Thus, the expression
p == nil
is really
p.operator==(nil)
which in turn becomes
p.operator==(auto_pointer<int>(nil))
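
The hidden temporary can be observed with a counter. The sketch below uses hypothetical stand-ins (empty_tag for class nil, handle for auto_pointer<int>) and counts how many times the converting constructor runs; it is only an illustration of the mechanism, not part of auto_pointer.

```cpp
#include <cassert>

struct empty_tag { };                  // hypothetical stand-in for class nil

class handle                           // hypothetical stand-in for auto_pointer<int>
    {
public:
    explicit handle(int *p) : pointer_(p) { }
    handle(empty_tag const &) : pointer_(0) { ++conversions; }
    bool operator==(handle const &that) const
        { return pointer_ == that.pointer_; }
    static int conversions;            // handles built from an empty_tag
private:
    int *pointer_;
    };

int handle::conversions = 0;

void check_comparison_builds_temporary()
    {
    empty_tag const empty = empty_tag();
    int value = 5;
    handle bound(&value);
    handle unbound(empty);             // one explicit conversion

    int const before = handle::conversions;
    bool const same = (unbound == empty);   // builds a temporary handle
    assert(same);
    assert(handle::conversions == before + 1);

    bool const differ = !(bound == empty);  // and another
    assert(differ);
    assert(handle::conversions == before + 2);
    }
```
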
This definition of operator== and operator!= allows comparison with all the types acceptable to auto_pointer single-argument constructors:
auto_pointer<int> p;
int *q = NULL;
// #1, OK, no conversion
p == p;
// #2, OK, converts from int *
p == q;
// #3, OK, converts from class nil
p == nil;
For your designs, you must decide if comparing auto_pointers to real pointers (#2 above) is desirable. I find such comparisons do not wantonly violate my sense of auto_pointer's meaning.

You may also object to constructing temporary objects this way. You could create specialized operator== and operator!= overloads tuned to real pointers and nil, but I leave that as an exercise for the student [4] .

Going Once, Going Twice...

Next month's bacchanalia promises the latest C committee gossip, a new Mac OS development environment, a core dump of SD '97 East, and my odyssey into the Internet's Wayback Machine. All this, and your normal dosage of programming nostrum. Tell your family, tell your friends: as they say in the local phone ads, life's better here. Well, certainly better here than in that other guy's decl-spec column.

Notes

1. Today's koan: a number that is not a number. See there, you can now add "path of enlightenment" to this magazine's portfolio of virtues.

2. Section 6.2.2.3 (titled "Pointers"), paragraphs 3-4 and footnote 44.

3. I could use the Standard Library's trick for similarly making cout and company available for static initialization, but that's too complex for this exercise.

4. These same comments apply to operator=, which also relies on construction of a temporary.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE Bothell WA 98011; by phone at +1-425-488-7696, or via Internet e-mail as rschmidt@netcom.com.