Dan Saks is the owner of Saks & Associates, which offers training and consulting in C and C++. He is a member of X3J11, the ANSI C committee. He has an M.S.E. in computer science from the University of Pennsylvania. You can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or call (513) 324-3601.
In "Writing Your Own Standard Headers: The String Functions" (The C Users Journal, Jan. 1990), I presented some basic rules for creating standard headers, and then I showed you how to apply those rules to create <string.h>. This article shows how to write five other headers you most likely need. But first, here is a non-standard header that simplifies writing the standard ones and eliminates some irritating portability problems.
<quirks.h>
The standard headers frequently use void and void * types. void indicates that a function returns no value, as in
void exit(int);
or that a function accepts no arguments, as in
int rand(void);
void * is the "generic data pointer" type used in declarations like
void *malloc(size_t);
void free(void *);
Many old compilers don't recognize void as a keyword. For these compilers, 'void' functions are written without a return type in the function declaration (it defaults to int), and char * is used instead of void * for generic pointers. You can express your intent more clearly if you define
typedef int void;
typedef char *void_star;
These let you write declarations like
void_star malloc();
void free();
which look more like Standard C.
If your compiler generates code so that functions return ints the same way they return chars, then you can safely define
typedef char void;
and write declarations like
void *malloc();
void free();
which look even more like Standard C.
Some compilers, like cc on UNIX 4.2 BSD, implement void as a keyword, but don't allow void * as a type. On these systems, you need only define void_star.
After putting your definitions for void or void_star in a header called <quirks.h>, you should include it at the beginning of every standard header. These types will then almost appear to be built-in. You will need to include <quirks.h> explicitly only in source files that use none of the standard headers.
<quirks.h> can smooth out other differences in dialects. For example, if your compiler doesn't implement the const and volatile keywords, you can add
#define const
#define volatile
Listing 1 shows a version of <quirks.h> for DECUS C. The protective wrapper prevents repeated definitions of void.
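To make the idea concrete, here is a minimal sketch of such a header. It is not the text of Listing 1. The configuration macros NO_VOID_KEYWORD, NO_VOID_POINTER, and NO_CONST_VOLATILE are hypothetical names that you would define by hand to match your compiler, and as_generic is a made-up function that only shows void_star in use.

```c
/* quirks.h: a sketch of a dialect-smoothing header.
 * NO_VOID_KEYWORD, NO_VOID_POINTER and NO_CONST_VOLATILE are
 * hypothetical switches; define the ones that describe your compiler. */
#ifndef QUIRKS_H
#define QUIRKS_H

#if defined(NO_VOID_KEYWORD)
typedef int void;             /* "void" functions really return int */
typedef char *void_star;      /* generic pointer is char * */
#elif defined(NO_VOID_POINTER)
typedef char *void_star;      /* void exists, but void * does not */
#else
typedef void *void_star;      /* Standard C: nothing to fake */
#endif

#ifdef NO_CONST_VOLATILE
#define const                 /* make the qualifiers vanish */
#define volatile
#endif

#endif /* QUIRKS_H */

/* Hypothetical user of the header: return p as a generic pointer. */
void_star as_generic(char *p)
{
    return (void_star)p;
}
```

On a Standard C compiler none of the switches is defined, so void_star is simply another name for void *.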
<stdlib.h>
Like <string.h>, <stdlib.h> was invented by the ANSI standard. It declares the general utility functions in the standard library, summarized in Table 1.
EXIT_SUCCESS and EXIT_FAILURE are codes used with the exit function to indicate a program's success or failure to the host environment. They expand to integral expressions that need not be constants. (An integral type is any of the signed or unsigned forms of char, short int, int or long int, or any enumerated type.) On MS-DOS and UNIX, the codes are usually defined by
#define EXIT_SUCCESS 0
#define EXIT_FAILURE 1
Some systems, such as RT-11, define multiple levels of failure (warning, error, severe error, and so on), one of which you must pick for EXIT_FAILURE. You can define additional codes like EXIT_WARNING, but they will clearly be non-portable.
MB_CUR_MAX expands to a positive integer expression whose value is the maximum number of bytes in a multibyte character as determined by the current locale. This is meaningful only if you already have multibyte character support, in which case MB_CUR_MAX is already in your header. Otherwise, just set it to 1, as I did.
RAND_MAX is the maximum value that can be returned by the rand function. It must be integral and constant. The return type of rand is int, so RAND_MAX is typically the value of the largest positive signed integer. The Standard stipulates that RAND_MAX must be at least 32767, but if your rand operates over a smaller range, use the smaller value until you rewrite the function.
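A typical use of RAND_MAX is scaling the result of rand to some other range. The sketch below assumes nothing beyond <stdlib.h>; the helper names unit_random and roll are hypothetical.

```c
#include <stdlib.h>

/* Map rand() onto the half-open interval [0, 1).
 * Dividing by RAND_MAX + 1.0 (a double) avoids integer overflow
 * when RAND_MAX is as large as INT_MAX. */
double unit_random(void)
{
    return rand() / (RAND_MAX + 1.0);
}

/* Roll an n-sided die, returning 1..n. Assumes n >= 1. */
int roll(int n)
{
    return 1 + (int)(unit_random() * n);
}
```

Because the code refers to RAND_MAX rather than a literal such as 32767, it keeps working when you later replace rand with a version that has a larger range.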
div_t and ldiv_t are structure types returned by the div and ldiv functions, respectively. You can define them as
typedef struct {int quot, rem;} div_t;
typedef struct {long quot, rem;} ldiv_t;
where quot and rem may be in either order.
wchar_t is the wide character type, an integral type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales. Like MB_CUR_MAX, this symbol relates to multibyte and wide character support. If you don't have it, use
typedef char wchar_t;
Listing 2 shows my <stdlib.h> for UNIX 4.2 BSD. Notice that the definition of NULL uses void_star from <quirks.h>. A protective wrapper surrounds wchar_t because it also appears in headers other than <stdlib.h>. No wrapper protects div_t and ldiv_t because they appear only in <stdlib.h> and the protective wrapper around the entire header prevents them from being redefined.
The abs and labs macros are just interim implementations, because the ANSI standard requires that, unless explicitly exempted, all library functions must be implemented as functions (so they are addressable). Functions declared in headers may also be implemented as macros, provided that each macro is "safe" (i.e., it expands to code that evaluates each of its arguments only once), but abs and labs aren't safe. When abs is a macro,
abs(*p++)
will evaluate *p++ twice, producing both unpredictable results and unwanted side effects.
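The following sketch shows the double evaluation in action. UNSAFE_ABS is a hypothetical name for a typical macro definition of abs (not necessarily the one in Listing 2), and safe_abs stands in for a true function.

```c
/* UNSAFE_ABS is a hypothetical macro version of abs. */
#define UNSAFE_ABS(x) ((x) < 0 ? -(x) : (x))

/* A real function evaluates its argument exactly once. */
static int safe_abs(int x)
{
    return x < 0 ? -x : x;
}

/* Each demo returns the result and stores how far p advanced. */
int macro_demo(int *steps)
{
    int a[3] = { -3, 7, 9 };
    int *p = a;
    int r = UNSAFE_ABS(*p++);   /* tests a[0], then negates a[1] */

    *steps = (int)(p - a);
    return r;
}

int function_demo(int *steps)
{
    int a[3] = { -3, 7, 9 };
    int *p = a;
    int r = safe_abs(*p++);     /* reads a[0] once */

    *steps = (int)(p - a);
    return r;
}
```

With the macro, the test reads a[0] and the negation reads a[1], so the result is -7 and p has moved two elements. With the function, the result is 3 and p has moved one element.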
<stddef.h>
<stddef.h> contains some commonly used definitions, three of which (NULL, size_t, and wchar_t) are also in <stdlib.h>. <stddef.h> also introduces a new type, ptrdiff_t, and a new macro, offsetof. My DECUS C implementation appears as Listing 3.
ptrdiff_t is the type of the result of pointer subtraction. It is a signed integral type, either int or long int. It doesn't need a protective wrapper because it isn't defined anywhere else.
The macro call offsetof(t, m) returns the offset (in bytes) of member m within structure type t. offsetof expands to a constant expression of type size_t. m cannot be a bit-field. In the rationale for the standard, the ANSI committee suggests three possible definitions for offsetof:
#define offsetof(t, m) \
    ((size_t)&(((t *)0)->m))
or
#define offsetof(t, m) \
    ((size_t)(char *)&(((t *)0)->m))
or
#define offsetof(t, m) \
    ((size_t)((char *)&(((t *)X)->m) - (char *)&(X)))
where X is some predeclared address. None of these definitions is guaranteed to be portable, but so far, the first one has worked on every system I've tried.
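Whichever definition your header ends up with, client code uses offsetof the same way. Here is a small sketch that reaches a member through a generic pointer plus an offset. It relies on the offsetof supplied by the compiler's own <stddef.h>; struct record and count_via_offset are hypothetical names.

```c
#include <stddef.h>

struct record {            /* hypothetical structure for illustration */
    char tag;
    long count;
    double value;
};

/* Read the count member given only a generic pointer to a record. */
long count_via_offset(const void *rec)
{
    const char *base = (const char *)rec;

    return *(const long *)(base + offsetof(struct record, count));
}
```

The cast through char * matters because offsetof counts in bytes, and only character pointers advance one byte at a time.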
<stdarg.h>
This header defines a type, va_list, and three macros, va_start, va_arg, and va_end, which access the arguments to a function with a variable length argument list (like printf or scanf). <stdarg.h> is very similar to <varargs.h> found with many UNIX C compilers.
Listing 4 shows a simple function, concat, that uses <stdarg.h>. The function heading is a prototype whose parameter list ends with an ellipsis (, ...), indicating that the length of the list is variable. va_list is the type of a data object that tracks the current position in the argument list. va_start initializes ap (the va_list object declared in concat) so that the first call to va_arg returns the value of the first argument in the list's variable part. Subsequent calls to va_arg return the values of the succeeding arguments. You must supply the argument's type to each call to va_arg since arguments in a variable length list may be of different types. (Bear in mind that the type of an argument in a variable length parameter list will be promoted so that it will not be an integer type smaller than int, nor will it be float.) va_end does any cleanup that might be needed.
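For readers without the listing at hand, here is a minimal sketch of a function in the same spirit. It is not necessarily the text of Listing 4. I assume that the variable part holds char * arguments and that the caller marks the end of the list with a null char pointer.

```c
#include <stdarg.h>
#include <string.h>

/* Build in dest the concatenation of every string in the variable
 * part of the list. The list ends with a null char pointer.
 * The caller must make dest large enough. Returns dest. */
char *concat(char *dest, ...)
{
    va_list ap;
    char *s;

    dest[0] = '\0';
    va_start(ap, dest);                 /* dest is the last fixed parameter */
    while ((s = va_arg(ap, char *)) != NULL)
        strcat(dest, s);
    va_end(ap);
    return dest;
}
```

A call looks like concat(buf, "ab", "cd", (char *)0). The cast on the terminator is deliberate: without a prototype for the variable part, a bare 0 would be passed as an int, which need not have the same size as a pointer.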
The implementation of <stdarg.h> depends on the compiler's parameter-passing conventions. Most compilers pass arguments by pushing them onto the run-time stack. The rationale for the ANSI standard states that <stdarg.h> was designed to accommodate newer machines that may pass arguments in machine registers. Having no experience with C compilers for these machines, I will stick to the more common stack-oriented methods.
Most MS-DOS compilers push arguments so that the first argument has the lowest address. Figure 1 shows the argument list format for a call to
printf("%d %f %d\n", i, x, n)
using a typical MS-DOS C compiler (where i and n are 16-bit ints and x is a 64-bit double). SP represents the value of the stack pointer. The figure shows the state of the stack just before jumping to printf.
Listing 5 presents an implementation of <stdarg.h> for most MS-DOS C compilers. va_start(ap, p) initializes ap to
(va_list)(&(p) + 1)
which is the address of the first parameter in the list's variable part (the parameter after p). Some implementations write this expression as
(va_list)&(p) + sizeof(p)
which is equivalent as long as va_list is char *. va_start should expand to a void expression, but many compilers erroneously omit the void cast.
va_arg(ap, t) returns the value of the current argument addressed by ap (cast to type t), and advances ap to point to the next argument. On many compilers, you can implement va_arg as
#define va_arg(ap, t) (*((t *)(ap))++)
This auto-increment expression may be a little easier to understand than the one in Listing 5, but it relies on an extension to the C Standard. The standard states that a cast expression, such as
(t *)(ap)
is not an lvalue, so it cannot be the operand of ++. The version of va_arg in Listing 5 increments ap before applying the cast, then subscripts backwards to obtain the argument originally referenced by ap. It's more obscure, but stays within the standard.
If your compiler lets you use the auto-increment expression, is there any reason not to? Yes. Consider Microsoft C v5.1. By default, the compiler lets you use various language extensions. You can implement va_arg as an auto-increment expression, but compiling your code with the /Za option (disable language extensions) produces a warning from the compiler. Microsoft implements va_arg as in Listing 5 so that it will work with every compiler option.
On the other hand, Zortech C v1.07 uses the auto-increment version of va_arg. However, if you compile code using va_arg with the -A option (enforce ANSI compatibility), you don't get a warning. This means the compiler can't warn you about using this language extension in your code.
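The "increment, then subscript backwards" idiom is easier to follow when you can run it. The sketch below does not touch the real argument stack, which modern compilers manage in their own ways. Instead it walks an ordinary int array that stands in for arguments pushed lowest address first. my_va_list, MY_VA_ARG, and sum_ints are hypothetical names, and the macro is my guess at the general shape of such a definition, not a copy of Listing 5.

```c
typedef char *my_va_list;   /* hypothetical stand-in for va_list */

/* Advance ap past one object of type t, then index backwards by one
 * element to fetch the object ap addressed before the advance.
 * No cast expression is ever the operand of ++. */
#define MY_VA_ARG(ap, t) (((t *)((ap) += sizeof(t)))[-1])

/* Sum n ints stored contiguously, lowest address first. */
int sum_ints(int *area, int n)
{
    my_va_list ap = (my_va_list)area;
    int total = 0;

    while (n-- > 0)
        total += MY_VA_ARG(ap, int);
    return total;
}
```

The assignment operates on ap itself, which is an lvalue of type char *, so the expression stays within the standard. A real implementation would also have to round sizeof(t) up to the size of a stack slot.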
In most implementations va_end does nothing, but the standard states that it should expand to a void expression. If your compiler complains that ((void)0) is a useless expression, you can try using
#define va_end(ap) ((void)((ap) = 0))
If generating unnecessary code bothers you, you can
#define va_end(ap)
which works fine when va_end is called in a separate statement (as in Listing 4), but produces a syntax error when va_end is embedded in nasty (but legal) expressions like
va_end(ap), n = 1;
If your compiler pushes the arguments so that the first one is at the highest address, then you should use an implementation of <stdarg.h> like the one in Listing 6. It differs from Listing 5 in two ways:
- va_start initializes ap to point to (instead of beyond) the last fixed argument, and
- va_arg uses a pre-decrement (instead of a post-increment) to step to the next argument.
<limits.h>
This header contains macros that define limits for the sizes and ranges of integral types. Table 2 lists the macro names and their meanings. The standard specifies a minimum magnitude (absolute value) for each limit. The version of <limits.h> in Listing 7 uses these minimums. An implementation may increase the magnitude of the limits (the standard explicitly permits this), but any program that relies on extended limits will not be portable to all implementations.
For example, SHRT_MIN and SHRT_MAX define the range of values for type short int. The standard requires the range to be at least -32767 to +32767 (decimal), the set of values that can be represented using 16-bit ones'-complement or sign-magnitude arithmetic. On a two's-complement machine, you can increase the magnitude of SHRT_MIN to -32768, but any program that stores -32768 in a short int might not work on other architectures.
The standard allows the range of int to be as small as the range of short int. Hence, the minimum magnitudes for INT_MIN and INT_MAX are the same as for SHRT_MIN and SHRT_MAX, respectively. At the opposite extreme, INT_MIN and INT_MAX could be as large as LONG_MIN and LONG_MAX, respectively.
I recommend that you write your <limits.h> to use the actual ranges supported by your compiler. This lets you take full advantage of your architecture when efficiency is more important than portability. When portability is important, you must remember to avoid depending on the larger limits.
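One way to keep both goals in view is to test values against the guaranteed minimums when portability matters and against the actual limits otherwise. This is only a sketch; the PORTABLE_ macros and the two function names are hypothetical.

```c
#include <limits.h>

/* Minimum magnitudes the standard guarantees for short int.
 * These macro names are hypothetical. */
#define PORTABLE_SHRT_MIN (-32767)
#define PORTABLE_SHRT_MAX 32767

/* Nonzero if v fits in a short int on every conforming implementation. */
int fits_any_short(long v)
{
    return v >= PORTABLE_SHRT_MIN && v <= PORTABLE_SHRT_MAX;
}

/* Nonzero if v fits in a short int on this implementation. */
int fits_this_short(long v)
{
    return v >= SHRT_MIN && v <= SHRT_MAX;
}
```

A value such as -32768 passes the second test on a typical two's-complement machine but fails the first, which is exactly the portability hazard described above.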
CHAR_MIN and CHAR_MAX define the range of values for "plain" char. A compiler can choose to represent plain char as either signed char or unsigned char. If your compiler treats plain char as signed, then use
#define CHAR_MAX SCHAR_MAX
#define CHAR_MIN SCHAR_MIN
Otherwise, use
#define CHAR_MAX UCHAR_MAX
#define CHAR_MIN 0
Some compilers let you select the representation of "plain" char. For example, Microsoft C v5.1 normally treats char as signed, but the /J option changes it to unsigned. This option also defines the macro _CHAR_UNSIGNED to allow conditional compilation (as in Listing 7) to determine the appropriate settings for CHAR_MIN and CHAR_MAX.
Borland's Turbo C v2.0 provides a switch for selecting the representation of "plain" char, but doesn't define a macro like _CHAR_UNSIGNED. In place of
#ifndef _CHAR_UNSIGNED
it uses
#if (((int)((char)0x80)) < 0)
According to the standard, #if expressions cannot use type casts or the sizeof operator. Therefore, this technique can be used only on a compiler that supports this language extension. It also means the compiler won't warn you about using this feature even when you ask it to disable language extensions.
The standard states that every macro, except CHAR_BIT and MB_LEN_MAX, is defined as an expression that has "the same type as would an expression that is an object of the corresponding type converted according to the integral promotions." For example, INT_MAX is defined as an expression of type int, and UINT_MAX is an expression of type unsigned int. On the other hand, the character range limits (such as UCHAR_MAX) are defined as int expressions, rather than as (signed or unsigned) char expressions, because character types are promoted to int when used in an expression.
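Once <limits.h> exists, client code has a portable alternative to the cast trick: compare CHAR_MIN with zero, which is legal in #if because it involves no cast. The sketch below pairs that with a run-time check. The names are hypothetical, and I have swapped the 0x80 constant for -1, whose conversion to char does not depend on how out-of-range values are handled.

```c
#include <limits.h>

/* Compile-time test: legal in #if because it uses no casts. */
#if CHAR_MIN < 0
#define PLAIN_CHAR_IS_SIGNED 1    /* hypothetical macro name */
#else
#define PLAIN_CHAR_IS_SIGNED 0
#endif

/* Run-time test: casts are fine in ordinary expressions.
 * Converting -1 to char stays negative only if plain char is signed. */
int plain_char_is_signed(void)
{
    return (char)-1 < 0;
}
```

The two tests should always agree. If they don't, the CHAR_MIN in your header doesn't match the compiler option in effect.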
Notice that the unsigned limits are defined as unsigned constants. For example, UINT_MAX is defined by
#define UINT_MAX 65535u
in Listing 7. The u suffix on the constant makes it unsigned. Without the u, a decimal constant is either a signed int or a signed long int, depending on the compiler. For example, DECUS C treats 65535 as (-1), but Microsoft C treats it as 65535L (a long int).
If your compiler doesn't support the u suffix, you can try to write unsigned int constants in octal or hex. For instance, some compilers with 16-bit ints treat 0100000 through 0177777 and 0x8000 through 0xFFFF as unsigned int constants. If that doesn't work, you can try
#define UINT_MAX ((unsigned)65535)
which might introduce another problem. Limits like UINT_MAX are supposed to be usable in #if expressions; however, this definition uses a cast, which (according to the standard) isn't usable. Even if your preprocessor won't accept casts in #if expressions, you might still find this definition useful in other contexts.
A similar problem occurs when you try to set INT_MIN to -32768 on some two's-complement machines (such as a PC) using 16-bit ints. In Microsoft C, 32768 is greater than INT_MAX, so it's a long int. Therefore, the definition
#define INT_MIN (-32768)
is wrong because it makes INT_MIN a long int. On the other hand,
#define INT_MIN (-32767-1)
only uses constants of type int, and so correctly defines INT_MIN as an int.
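You can watch the same effect on a current machine by scaling the numbers up. The sketch below assumes a 32-bit int and a compiler that offers some wider integer type, so the constant 2147483648 plays the role that 32768 plays on a 16-bit system. Both function names are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Size of the type given to the "obvious" spelling of the most
 * negative 32-bit value. The constant 2147483648 exceeds a 32-bit
 * INT_MAX, so it, and therefore its negation, takes a wider type. */
size_t size_of_literal_form(void)
{
    return sizeof(-2147483648);
}

/* Size of the type of the subtraction form, which is built only
 * from values of type int. */
size_t size_of_subtraction_form(void)
{
    return sizeof(-INT_MAX - 1);
}
```

Under those assumptions the first function reports a size larger than sizeof(int), while the second always reports sizeof(int).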
What's Been Gained?
I have shown how to write five standard headers: <string.h>, <stdlib.h>, <stddef.h>, <stdarg.h>, and <limits.h>. I have also presented <quirks.h>, which fakes a few new keywords that are missing from older compilers. With just these few headers, it's much easier to port Standard C code to older compilers.