February 1990/Doctor C's Pointers

Doctor C's Pointers ®

Header Design And Management

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex.
All too often, programs just "happen." There is little if any serious design done, and programmers "design on the fly," using an approach I call stepwise refinement. That is, you code a bit and test it, then iteratively refine it until it's somewhere close to what you think you want. And after you have hard-coded the same macro definitions and function declarations in ten different places, you think perhaps it would be a good idea to create a header instead. However, this either doesn't get done or it's done at the local level to solve just the particular problem in the code you are currently working on. For the most part, I find people program defensively.
Designing and managing headers is an integral part of a C project design. It must be done before any code is written to ensure that the design is consistent, can be managed easily, and that a high degree of quality assurance can result. The lack of properly designed headers is a likely recipe for added development, debugging, and maintenance time, as well as significantly reduced reliability.
There are many aspects to designing headers. In this article I will look at those I've recognized. However, before I begin, a definition of the term header is in order. I think you all know what a header is, but for the purposes of this discussion I will consider a header to be a collection of declarations that can be shared across multiple source files via the #include preprocessing directive. And while a header is typically represented as a source code file on disk, it need not exist as such. For example, a header might actually be built into the compiler (at least the standard ones like math.h could be) or it could be compiled into some binary form that the preprocessor can more easily or efficiently handle. The specific representation details are left to the implementer's choice and will not be further discussed here. As such, I prefer to use the term header rather than header file or include file, since the last two names imply a file representation. Whatever term you use, be consistent.

Header Categories
There are four categories into which headers can be classified: standard, system, library, and application.
A standard header is one of the 15 defined by ANSI C, such as stdio.h, math.h, and string.h. ANSI requires you to include standard headers using the notation #include <header.h>. Do so even if #include "header.h" appears to work for them. A standard header is stored in some special place such that it can be accessed from all places in which a source file can be compiled.
A system header is one supplied by the compiler vendor that can be used to interface to and/or exploit the host hardware and/or operating system. Examples on MS-DOS systems include bios.h and dos.h; on VAX/VMS, headers rms.h, rab.h, and fab.h are used to access the RMS file system; and on UNIX, the special set sys/*.h is provided. An implementer can provide as many system headers as he needs. VAX C, for example, comes with about 200. Since system headers are useful to all applications, they are typically stored in the same place as standard headers.
A library header is one provided with a third-party library such as a windows, graphics, or statistical package. Again, a product may include many headers and you may use a number of different libraries in the same application. Library headers are also universally shareable and will likely reside with standard and system headers.
An application header is one you design for a particular application and, as such, it should be located in a place separate from headers in the other three categories. It is possible, however, that over the course of designing an application, you build a header that is useful beyond the life of the current system. This header, then, should really be treated as a miscellaneous library header. If each programmer on the project develops his own private miscellaneous headers, naming conflicts can easily arise, so you must ensure that private headers are not used.
During testing stages of a project, it can be very tempting to provide a quick (and often dirty) fix to a given problem by simply changing a header and recompiling the offending source module. However, this can cause other nasty side-effects later on when the system as a whole is rebuilt. Also, you must never, never, ever even think of changing a standard, system, or library header; these are sacred. For example, you might discover you need macros called TRUE and FALSE in several modules and, since stdio.h is included in all of them, why not simply add definitions for these macros to that header? After all, it can't hurt any existing uses of these headers, can it? Apart from reflecting bad style, these changes are lost when you next (re)install the compiler. One solution to this is to make all headers, including application headers that have been moved to production, read-only. That way, if you should ever try to change or overwrite them, you are reminded of the seriousness of such an action.
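Rather than editing stdio.h, the two macros can live in a small application header of your own. The following is only a minimal sketch: the header name boolean.h and the function is_even are hypothetical, and the header's contents are shown inline so the fragment stands alone.

```c
/* --- boolean.h: hypothetical application header --- */
#define TRUE  1
#define FALSE 0

/* --- a source module that would contain: #include "boolean.h" --- */
int is_even(int n)
{
    /* Uses the shared macros instead of hard-coded 1 and 0. */
    return (n % 2 == 0) ? TRUE : FALSE;
}
```

Because the definitions sit in an application header, reinstalling the compiler leaves them untouched.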

Header Names
ANSI C requires the standard header names to be written in lower case. Do so even if your file system is case insensitive (as is the case with MS-DOS and VAX/VMS). In fact, ANSI does not require that filenames of the form header.h be supported by your file system. The compiler must accept #include <stdio.h>, but is allowed to map the period or any other part of that header name to other characters.
The convention of naming headers with a .h suffix is exactly that, a convention, and need not be followed by user-written headers. Certainly, it's a useful default convention if you have no good reason to do otherwise.
If you wish to port code, keep in mind that the length of significance, case distinction, and format of filename (assuming a header is a file), are all implementation-defined.
It is generally considered bad style to specify device and/or directory information in a header name. Considering that almost all compilers provide compile-time options and/or environment variables to specify an include search path, I see no reason to unduly reduce your flexibility.

Header Contents
Just what should go in a header and how big should headers be? It is relatively easy to answer the "what." If something cannot be shared, it does not belong in a header. For the record, candidates for inclusion in a header are: macros; typedefs; templates for structures, unions, and enumerations; function prototypes; extern data declarations; and preprocessing directives. Placing anything else in a header needs careful scrutiny. In particular, including executable code that is not inside a macro definition is very bad style.
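To make the list of candidates concrete, here is a minimal sketch of a header holding one item of each kind, followed by the single source file that supplies the definitions. All names (stock.h, stock.c, MAX_ITEMS, quantity, item, status, total_items, check_stock) are hypothetical, and the two files are shown in one fragment for brevity.

```c
/* --- stock.h: hypothetical application header --- */
#define MAX_ITEMS 100                            /* macro */
typedef unsigned int quantity;                   /* typedef */
struct item {                                    /* structure template */
    char     name[20];
    quantity on_hand;
};
enum status { IN_STOCK, BACK_ORDERED };          /* enumeration template */
extern quantity total_items;                     /* extern data declaration */
enum status check_stock(const struct item *ip);  /* function prototype */

/* --- stock.c: the one place where storage and code are defined --- */
quantity total_items = 0;

enum status check_stock(const struct item *ip)
{
    return (ip->on_hand > 0) ? IN_STOCK : BACK_ORDERED;
}
```

Note that the header only declares total_items and check_stock; nothing in it allocates storage or generates code.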
My rule of thumb is to put all related stuff together in one header. However, if that makes for a very large header and the contents can easily be broken into logical subsets, then I prefer each subset be in its own header. It's useful to give such headers names with the same prefix so you can easily determine they are related. The only difference here is whether the preprocessor has to process one big header instead of just those parts it needs. Don't get too hung up on worrying how much work the preprocessor has to do unnecessarily since that's what CPU cycles are for. In fact, in the extreme case where you put each declaration in its own header, the preprocessor won't need to do any extra work, except for opening and closing all those headers.
It's quite likely that, while most things will fit neatly into related groups each in a header, some miscellaneous bits will be left over. About the only way to handle these reasonably is a miscellaneous header. ANSI C has one of these, called stddef.h. Whatever organization you choose, everything that can be shared should be shared. That is, you should make sure that all macros, function prototypes, etc., are part of some header and not hard-coded in source files directly.
Each header should be self-contained. If one header refers to something in another header, the first should directly include the second. Forcing the programmer to know and remember the order in which related headers need be included is burdensome and unnecessary.
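A self-contained header might look like the following sketch. The names report.h, report.c, and write_total are hypothetical; the point is only that the header includes stdio.h itself because its prototype mentions FILE.

```c
/* --- report.h: hypothetical header --- */
#include <stdio.h>   /* report.h mentions FILE, so it includes stdio.h itself */

int write_total(FILE *fp, int total);

/* --- report.c would need only: #include "report.h" --- */
int write_total(FILE *fp, int total)
{
    /* Returns the number of characters written, or a negative value on error. */
    return fprintf(fp, "total: %d\n", total);
}
```

A source file that includes report.h does not have to know that stdio.h must come first.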

Protecting Header Contents
It is very likely that in some source modules you will include the same header multiple times, once directly and one or more times indirectly via other headers. Since everything in a header is supposed to be shareable, there should be no problem in processing the same header multiple times except the extra work of preprocessing. Right? Well, that's not quite true. Specifically, if the same typedef or structure, union, or enumeration template definition is seen more than once, the compiler produces an error so they must be somehow protected. The best way to achieve this is to place a conditional compilation protective wrapper around the whole header as follows:

    /* header local.h */
    #ifndef LOCAL_H
    #define LOCAL_H
    ...
    #endif
I prefer to use a macro spelled in upper case the same as the header, along with a suffix of _H. This naming convention is easy to understand and is very unlikely to be used for other macros elsewhere in the set of headers. A name like LOCAL, on the other hand, could easily be used as a different macro elsewhere, leading to confusion.
Since the standard headers can also be included multiple times and some of them contain typedefs and structure templates, these too must be protected. Check those provided with your compiler to see if they indeed are protected. The only difference between your wrapper and that used by the standard headers is that you must not begin your private macro name with an underscore while they must, since that's the implementer's namespace.
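The effect of the wrapper can be seen by simulating two inclusions of the same header in one source file. In this sketch the body of local.h (the type coord, the structure point, and the function point_sum are all hypothetical) is pasted in twice, which is what the preprocessor effectively sees when the header is included both directly and indirectly.

```c
/* First inclusion: LOCAL_H is not yet defined, so the body is compiled. */
#ifndef LOCAL_H
#define LOCAL_H
typedef int coord;
struct point { coord x; coord y; };
#endif /* LOCAL_H */

/* Second inclusion: LOCAL_H is now defined, so the body is skipped
   and struct point is not defined a second time. */
#ifndef LOCAL_H
#define LOCAL_H
typedef int coord;
struct point { coord x; coord y; };
#endif /* LOCAL_H */

coord point_sum(struct point p)
{
    return p.x + p.y;
}
```

Without the #ifndef lines, the second copy of struct point would be rejected as a redefinition.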
It is preferable to have each thing defined in one, and only one, header. However, for various reasons it may be desirable to duplicate something in multiple headers. The problem here is to make sure that all of those headers containing duplicates can be included at the same time. For example, consider the case of having a typedef for count in two headers as in Listing 1.
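Listing 1 is not reproduced here. One way such a duplicate might be protected, offered only as a sketch, is to give the typedef its own guard macro that every header defining it tests. The header names stock.h and orders.h, the macro COUNT_DEFINED, and the function add_counts are assumptions made for illustration.

```c
/* --- stock.h (hypothetical) --- */
#ifndef COUNT_DEFINED
#define COUNT_DEFINED
typedef unsigned long count;
#endif

/* --- orders.h (hypothetical): carries the same typedef, same guard --- */
#ifndef COUNT_DEFINED
#define COUNT_DEFINED
typedef unsigned long count;
#endif

/* A source module including both headers sees count defined exactly once. */
count add_counts(count a, count b)
{
    return a + b;
}
```

Either header can be included alone, and both can be included together in any order.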
You should also check your standard headers for this kind of protection since size_t, the type of the result of the sizeof operator, is required to be typedefed in five of them. Note that ANSI C places strict rules on whether a standard header can include another standard header. Most identifiers defined in a standard header are only "reserved" if their parent header is included. For example, if you don't include one of the six standard headers that define NULL, you are perfectly safe in defining your own identifier NULL, even though it would be bad style. So, if assert.h included stdio.h, all the names in stdio.h would become defined as well, even though they are not defined in assert.h. And while assert.h could contain #undefs to remove these, there is no way for it to remove any typedefs or template definitions.
Many mainstream compilers claiming ANSI conformance or claiming to be tracking the ANSI standard break this rule. As such, they are not ANSI-conforming. Check your standard headers for this.

Conditional Inclusion
There are a number of ways to conditionally include headers as necessary. Perhaps the best is to conditionally compile a subset of #include directives inside a header, based on the existence or value of a macro defined using a compiler option. That is, the compilation path is specified outside all source modules. This way, you can trigger any possible conditional compilation path from as few as one macro.
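As a sketch of this approach, the header below chooses its #include directives from the value of a single macro. The names sysdeps.h, TARGET, and target_id are hypothetical, the standard headers merely stand in for real target-specific ones, and the default value exists only so the fragment compiles when no option is given (the spelling of the option, often something like -DTARGET=2, varies by compiler).

```c
/* --- sysdeps.h: hypothetical header that hides target differences --- */
#ifndef TARGET
#define TARGET 1              /* assumed default when no compiler option is given */
#endif

#if TARGET == 1
#include <time.h>             /* stand-in for target 1's system headers */
#elif TARGET == 2
#include <signal.h>           /* stand-in for target 2's system headers */
#else
#error unknown TARGET value
#endif

/* --- any source module simply includes sysdeps.h --- */
int target_id(void)
{
    return TARGET;
}
```

Changing the one macro on the compiler command line switches every source module to the other set of headers.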
You also have the ANSI invention of #include macro where macro must expand to a header name of the form <...> or "...". You also can use the stringize and token pasting preprocessor operators # and ## respectively, to construct a macro that is to expand to a header name.
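Both forms can be sketched briefly. The macro names NUMERIC_HEADER, TEXT_HEADER, STR, and XSTR and the two functions are hypothetical; only the stringize operator is shown, not token pasting, and the exact handling of a macro that expands to the <...> form is left to the implementation.

```c
/* Form 1: the macro expands directly to a header name. */
#define NUMERIC_HEADER <limits.h>
#include NUMERIC_HEADER

/* Form 2: build a "..." header name with the stringize operator. */
#define STR(x)  #x
#define XSTR(x) STR(x)         /* extra level so the argument is expanded first */
#define TEXT_HEADER string.h
#include XSTR(TEXT_HEADER)     /* becomes #include "string.h" */

int fits_in_int(long v)
{
    return v >= INT_MIN && v <= INT_MAX;   /* INT_MIN, INT_MAX from limits.h */
}

size_t name_length(const char *s)
{
    return strlen(s);                      /* strlen from string.h */
}
```

In a real project the macro holding the header name would typically be set by a compiler option or by another header.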
I have also found that it is a good idea to remove as many preprocessing directives as possible from source modules into headers. In particular, I find conditional compilation directives in source code to be most distracting, especially when there are more than two compilation paths. The aim is to isolate such dependencies into headers so you can forget about them and get on with the business of implementing or maintaining the application. For example, consider the following fragment from a source module:

    #if TARGET == 1
    fp = fopen("DBAO:[direct]master.date", "r");
    #else
    fp = fopen("A:\\direct\\master.date", "r");
    #endif
This can be implemented in a much clearer way by abstracting the filename into a header as in Listing 2.
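Listing 2 is not reproduced here, but the idea might be sketched as follows. The header name files.h, the macro MASTER_FILE, the function open_master, and the default value of TARGET are assumptions; the two filenames are those from the fragment above.

```c
/* --- files.h: hypothetical header holding all target-specific names --- */
#ifndef TARGET
#define TARGET 1              /* assumed default; normally set by a compiler option */
#endif

#if TARGET == 1
#define MASTER_FILE "DBAO:[direct]master.date"
#else
#define MASTER_FILE "A:\\direct\\master.date"
#endif

/* --- source module: no conditional compilation in sight --- */
#include <stdio.h>

FILE *open_master(void)
{
    return fopen(MASTER_FILE, "r");   /* returns NULL if the file cannot be opened */
}
```

The source module now reads the same on every target, and adding a third target means touching only the header.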

Planning For Debugging And Maintenance
People who don't design programs are unlikely to plan for debugging and maintenance. They probably don't even write a shopping list for that matter. Unfortunately, there are lots of these people programming, many of them in C. It is very naive and probably irresponsible to believe that with a non-trivial program, debugging will be a mere formality and that you will always be around to maintain the code.
Over the years I have found it a useful idea to include a header called something like debug.h into every source file I write when working on a non-trivial project. If the header is empty, that's fine. However, it makes it very easy to add or change that header's contents and recompile all or part of the system for testing. Since you have one header included everywhere, it is trivially easy to make powerful changes and to experiment. And the cost of having this flexibility is practically nothing, if you cater for it at the beginning.
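What goes into such a header is up to you. As one possibility, and only as a sketch, it might supply a tracing macro that disappears unless a DEBUG macro is defined. The macro names DEBUG and TRACE and the function square are hypothetical.

```c
/* --- debug.h: hypothetical contents; the header could equally be empty --- */
#ifndef DEBUG_H
#define DEBUG_H

#include <stdio.h>

#ifdef DEBUG
/* Report the source position and a message on stderr. */
#define TRACE(msg) fprintf(stderr, "%s(%d): %s\n", __FILE__, __LINE__, (msg))
#else
/* Expands to a harmless expression when debugging is off. */
#define TRACE(msg) ((void)0)
#endif

#endif /* DEBUG_H */

/* --- every source module would contain: #include "debug.h" --- */
int square(int n)
{
    TRACE("square called");
    return n * n;
}
```

Since every module already includes the header, turning tracing on or off is a matter of one compiler option and a rebuild.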

Concatenating Headers
There are always people who try to stretch a language's capabilities to the extreme. For example, they place part of a source file in one header and the rest in another and include them both to form a valid source module. Cute, but very bad style.
Let's look at just what can and cannot be split across multiple source modules, and therefore across multiple headers. A source module must contain complete tokens. That is, a source token cannot be split across two files. Specifically, the notation of backslash/new-line continuation cannot be used in the last line of a source file. Likewise, a comment cannot span two files.
With string literal concatenation now supported by ANSI, you could have a string in one file concatenated with a string in another, but that would require the strings to be outside a macro definition and I have already said that's very bad style. You could also split a structure template definition across multiple files, but I see no benefit.
One thing not immediately obvious in ANSI C is that each #if directive, its matching #endif, and any corresponding #elif and #else directives must all be contained within the same source file.

Conclusion
I have addressed many issues here, most of which have arisen from my own experiences. I am sure there are others that could be added. For the most part, I find header design to be simply a matter of common sense once you know and understand the tools the language and preprocessor provide. But then again, I find that to be pretty much the solution to a vast number of problems. It's sad that common sense is not all that common.