Columns


Standard C

The Header <stddef.h>

P.J. Plauger


P.J. Plauger is senior editor of The C Users Journal. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee, WG14. His latest book is The Standard C Library, published by Prentice-Hall. You can reach him at PJP@wsa.oz; or uunet!munnari!wsa.oz!pjp.

Background

The header <stddef.h> is yet another invention of committee X3J11 in forming the C Standard. The name follows the usual cryptic pattern for naming headers in the Standard C library. It is meant to suggest that here is where you find certain "standard definitions." The only other suitable parking spot for the definitions in this header might be <stdlib.h>. That too is a committee invention. It earned its (equally) vague name as a place to declare various functions, old and new, that had no traditional associated standard headers. It may seem silly to create two such catchall repositories. Nevertheless, the committee had its reasons — <stddef.h> is one of the four headers available even in a freestanding environment.

The types and macros defined in <stddef.h> have an interesting thing in common. Every one has been, at one time or another, a candidate for inclusion in the language proper. That's because every one is, in the end, defined by the translator in a private way. It is not easy to write portable code that can take the place of any of these definitions. Sometimes it is essentially impossible.

On the other hand, all the types and macros defined in <stddef.h> can, as a rule, be written as conventional type and macro definitions. The implementor simply needs to be privy to how a given translator defines certain types and operations.

Consider the three type definitions in this header — ptrdiff_t, size_t, and wchar_t. Each is a synonym for one of the standard integer types. An implementation cannot, for example, make short 16 bits, wchar_t 24 bits, and int 32 bits. It must make wchar_t the same as some type that you can specify in a type definition. The same constraint applies to the other two type definitions.

Implementing the macro NULL simply requires that you choose the most suitable of several possible options — 0, 0L, or (void *)0. You pick a form that works properly as an argument of type pointer to void (or pointer to char, signed char, or unsigned char) in the absence of a function prototype. (I discuss the macro NULL in greater detail below.)

It might be more elegant, perhaps, to include a null-pointer constant in the C language proper. The suggestion has been raised any number of times. Nevertheless, one of these forms usually suffices for the ways in which NULL tends to be used.

That leaves the macro offsetof. You use it to determine the offset in bytes of a structure member from the start of the structure. Standard C defines no portable way to write this macro. Each implementation, however, must have some non-standard way to implement it. An implementation may, for example, reliably evaluate some expression whose behavior is undefined in the C Standard.

You can look on offsetof as a portable way to perform a nonportable operation. That is true of many macros and type definitions in the Standard C library. In each instance, the need to actually extend the C language proper is not quite there. That's why the header <stddef.h> exists.

What The C Standard Says

7.1.6 Common Definitions <stddef.h>

The following types and macros are defined in the standard header <stddef.h>. Some are also defined in other headers, as noted in their respective subclauses.

The types are

ptrdiff_t
which is the signed integral type of the result of subtracting two pointers;

size_t
which is the unsigned integral type of the result of the sizeof operator; and

wchar_t
which is an integral type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero and each member of the basic character set defined in 5.2.1 shall have a code value equal to its value when used as the lone character in an integer character constant.

The macros are

NULL
which expands to an implementation-defined null pointer constant; and

offsetof(type, member-designator)
which expands to an integral constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The member-designator shall be such that given

static type t;
then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.)

Forward references: localization (7.4).

Using <stddef.h>

The uses for type and macro definitions in the header <stddef.h> are essentially unrelated. You include this header if you need one or more of the definitions it provides. Note, however, that only the type definition ptrdiff_t and the macro offsetof are unique to this header. You will often find that including another standard header will supply the definition you need. I discuss each of the type and macro definitions separately.

Type ptrdiff_t

When you subtract two pointers in a C expression, the result has type ptrdiff_t. It is an integer type that can represent negative values. Almost certainly it is either int or long. It is always the signed type that has the same number of bits as the unsigned type chosen for size_t, described below. (I said above that the use of these definitions is essentially unrelated. These two definitions are themselves highly related.)

You can subtract two pointers only if they have compatible data-object types. One may have a const type qualifier and the other not, for example, but both must point to the same data-object type. The translator can check types and complain if they are inappropriate. It generally cannot verify the additional constraint — both pointers must point to elements within the same array data object. Write an expression that violates this constraint and you often get a nonsense result from the subtraction.

The arithmetic essentially proceeds as follows. The program represents both pointers as offsets in bytes from a common origin in a common address space. It subtracts the two offsets algebraically, producing a signed intermediate result. It then divides this intermediate result by the size in bytes of the data object pointed to by both pointers. If both pointers point to elements of a common array, the division will yield no remainder. The final result is the difference in subscripts of the two array elements, regardless of the type of the elements.

That means, for example, that the expression &a[5] - &a[2] always has the value 3, of type ptrdiff_t. Similarly &a[2] - &a[5] always has the value -3. I assume in both cases that a is an array data object with at least 5 elements. (Pointer arithmetic is still defined for the element "just off the end" of an array, in this case &a[5] if a has exactly 5 elements.)

ptrdiff_t can be an inadequate type, in some instances. Consider an implementation where size_t is the type unsigned int. Then ptrdiff_t is the type int. Let's say further that you can declare a data object x as an array of char whose size N is greater than INT_MAX bytes. (The header <limits.h> defines the macro INT_MAX as the largest positive value representable by type int.) Then you might write something like:

#include <limits.h>
#include <stddef.h>

#define N ((size_t)INT_MAX + 10)
...
   char x[N];
   ptrdiff_t n = &x[N] - &x[0];
What is the result of the expression that initializes n? An overflow occurs because the result is too large to represent as an integer of type ptrdiff_t. The result is undefined. You can't get around this problem. It is an intrinsic weakness of the Standard C language.

Having painted this bleak picture, I must now tell you that such a situation rarely arises. It can only happen with arrays whose elements occupy only one byte. Typically, these are elements of type char, signed char, or unsigned char. Rarely are they anything else. Overflow can happen on small computer architectures where type int has, say, a 16-bit representation. It can also happen on architectures that let you create enormous data objects.

Even then, you get an overflow only if you subtract pointers to two character array elements more than half an address-space apart. And even then the overflow may cause no problems because two's-complement arithmetic (the commonest form today) forgives many sins. Your program may well pass through all these perils and do what you intend anyway.

I recite all this esoterica to justify a simple conclusion. You will seldom, if ever, need to use the type definition ptrdiff_t. Its only practical use that I can imagine is to store the result of a pointer subtraction or the difference between two subscripts. Usually, your program consumes such results on the fly. This type has the intrinsic limitation that it cannot reliably capture all results of pointer subtractions. That limits its usefulness in a portable program. It's nice to know that you can determine the type of the result of a pointer subtraction. But I don't know why you would care most of the time.

Type size_t

When you apply the sizeof operator in a C expression, the result has type size_t. It is an unsigned integer type that can represent the size of the largest data object you can declare. Almost certainly it is either unsigned int or unsigned long. It is always the unsigned type that has the same number of bits as the signed type chosen for ptrdiff_t, described above.

Unlike ptrdiff_t, however, size_t is very useful. It is the safest type to represent any integer data object you use as an array subscript. You don't have to worry if a small array evolves to a very large one as the program changes. Subscript arithmetic will never overflow when performed in type size_t. You don't have to worry if the program moves to a machine with peculiar properties, such as 32-bit bytes and 1-byte longs. Type size_t offers the greatest chance that your code won't be unduly surprised. The only sensible type to use for computing the sizes of data objects is size_t.

The Standard C library makes extensive use of the type size_t. You will find that many function arguments and return values are declared to have this type. That is a deliberate change from older practice in C that often led to program bugs. It is part of a general trend away from declaring almost all integers as type int.

You should make a point of using type size_t anywhere your program performs array subscripting or address arithmetic. Be warned, however, that unsigned-integer arithmetic has more pitfalls than signed. You cannot run an unsigned counter down until it goes negative — it never will. If the translator doesn't warn you of a silly test expression, the program may loop forever. You may find, in fact, that counting down to zero sometimes leads to clumsy tests. You will occasionally miss the convenience of using negative values (such as EOF, defined in <stdio.h> to signal end-of-file) and testing for them easily. Nevertheless, the improvement in robustness is well worth the learning investment.

Type wchar_t

You write a wide character constant as, for example, L'x'. It has type wchar_t. You write a wide character string literal as, for example, L"hello". It has type array of wchar_t. wchar_t is an integer type that can represent all the code values for all wide-character encodings supported by the implementation.

For an implementation with only minimal support for wide characters, wchar_t may be as small as char. For a very ambitious implementation, it may be as large as unsigned long. More likely, wchar_t is a synonym for an integer type that has at least a 16-bit representation, such as short or unsigned short.

You use wchar_t to represent all data objects that must hold wide characters. Several functions declared in <stdlib.h> manipulate wide characters, either one at a time or as part of null-terminated strings. You will find that many function arguments and return values in this group are declared to have this type. For this reason, the header <stdlib.h> also defines type wchar_t.

Macro NULL

The macro NULL serves as an almost-universal null pointer constant. You use it as the value of a data-object pointer that should point to no data object declared (or allocated) in the program. As I mentioned earlier, the macro can have any of the definitions 0, 0L, or (void *)0.

The last definition is compatible with any data object pointer. It is not, however, compatible with a function pointer. That means you cannot write:

int (*pfun)(void) = NULL; /* WRONG */
The translator may complain that the expression type is incompatible with the data object you wish to initialize.

An important traditional use for NULL has largely gone away. Early versions of the C language had no function prototypes. The translator could not check whether a function-call argument expression was compatible with the corresponding function parameter declaration. Hence, it could not adjust the representation of an expression that was compatible but had a different type (such as changing tan(1) to tan(1.0)). The programmer had to ensure that each argument value had the proper representation.

Modern programming style is to declare function prototypes for all functions that you call. Nevertheless, an important context still exists where a function argument has no corresponding parameter declaration. That is when you call a function that accepts a variable argument list (such as printf, declared in <stdio.h>). For the extra arguments, the older C rules apply. A few standard type conversions occur, but mostly it is up to you, the programmer, to get each such argument right.

In the earliest implementations of C, all pointers had the same representation. Usually, this representation was the same size as one of the integer types int or long. Thus, one of the decimal constants 0 or 0L masqueraded nicely as a null pointer of any type. Define NULL as one of these two constants and you could assign it to an arbitrary pointer. The macro was particularly useful as an argument expression. It advertised that the expression had some pointer type and was a null-pointer constant.

Then along came implementations where pointers looked quite different than any of the integer types. The only safe way to write a null pointer was with a type cast, as in (char *)0. If all pointers looked the same, you could still define NULL as, say, (char *)0. The macro still served as a useful way to write argument expressions.

Standard C permits different pointer types to have different representations. You are guaranteed that you can convert any data object pointer to type pointer to char (or pointer to signed char or pointer to unsigned char) and back again with no loss of information. The newly introduced type pointer to void has the same representation as pointer to char, but is assignment-compatible with all data-object pointers. You use pointer to void as a convenient generic data-object pointer type, particularly for declaring function arguments and return values.

The safest definition for NULL on such an implementation is (void *)0. There is no guarantee, however, that pointer to void has the same representation as any other pointer. It isn't even assignment-compatible with function pointers. That means that you can't write NULL as a universal null-pointer constant. Nor can you safely use it as an argument expression in place of an arbitrary data-object pointer. It is guaranteed to masquerade properly only as a character pointer or as a generic pointer to void.

One modern style of writing C is to avoid the use of NULL altogether. Write every null pointer constant religiously with an appropriate type cast, as in (int *)0. That can lead to wordy programs, but has the virtue of being most unambiguous. A modification of this style is to write a simple 0 as a null-pointer constant wherever possible. That can lead to programs clear enough to the translator but not to human readers.

You will find the macro NULL defined in half a dozen different headers. It is easy for you to use the macro if you so choose. My only advice is that you choose a uniform style, as always, and stick with it.

Macro offsetof

You use the macro offsetof to determine the offset in bytes of a member from the start of the structure that contains it. That can be important if you wish to manipulate the individual members of a structure using a table-driven function.

The result of this macro is an integer constant expression of type size_t. That means you can use it to initialize a static data object such as a constant table with integer elements. It is the only portable way to do so. If you write code such as:

struct xx {
   int a, b;
   } x;
static size_t off =
      (char *)&x.b - (char *)&x;
the behavior of the last declaration is undefined. Some implementations can choose to evaluate the initializer and obtain the obvious result. Others can choose to diagnose the expression instead.

Nor can you reliably step from member to member by performing pointer arithmetic. The macros defined in <stdarg.h> let you step from argument to argument in a function that accepts a variable argument list. Those macros, or others like them, are not guaranteed to work within a structure. That's because the holes between structure members can differ from the holes between function arguments. They need not follow any documented rules, in fact.

You need the macro offsetof to write code that is portable:

#include <stddef.h>

struct xx {
   int a, b;
   } x;
static size_t off =
   offsetof(struct xx, b);

Implementing <stddef.h>

Listing 1 shows the file stddef.h. It is fairly simple. Once again, I use the internal header <yvals.h> to supply information that can vary among implementations. In this case, that information determines all three type definitions and the form of the macro NULL. The header <yvals.h> typically contains the following definitions:

typedef int _Ptrdifft;
typedef unsigned int _Sizet;
typedef unsigned short _Wchart;
#define _NULL (void *)0
These definitions work for a wide variety of implementations. Nevertheless, certain implementations may require that one or more of them change. That's why I chose to parametrize them.

Macro offsetof

For the macro offsetof I chose to use a common trick. Many implementations let you type cast an integer zero to a data-object pointer type, then perform pointer arithmetic on the result. That is certainly undefined behavior, so you may well find an implementation that balks at this approach.

The translator must indulge you a bit further for this definition of the macro to work properly. It must let you type cast the zero-based address back to an integer type, in this case size_t in disguise. Moreover, it must tolerate such antics in an integer constant expression. That's what you need to initialize static data objects.

Luckily, quite a few translators grant such a triple indulgence. If you encounter one that doesn't, you will have to research how its implementors expect you to define offsetof. (Just list the file <stddef.h> that comes with the translator.) To comply with the C Standard, each implementation must provide some method.

This article is excerpted from P.J. Plauger, The Standard C Library, (Englewood Cliffs, N.J.: Prentice-Hall, 1992).