January 1997/The Learning C/C++urve

Columns

The Learning C/C++urve

Bobby Schmidt

Driving You to Abstraction

The difference between a single file and a module is much like the difference between information and knowledge a matter of abstraction.

Copyright © 1997 Robert H. Schmidt

Happy New Year! Do you realize that according to sci-fi TV and movies this year the Jupiter 2 will launch from Alpha Centauri, two years later Moon Base Alpha will explode, and two years later still we'll send Discovery to Jupiter? I remember as a kid watching all this and thinking the notion of inter-planetary travel 30 years on seemed inevitable. Now we'll be lucky to stet foot on Mars in my lifetime [1] . Sigh ...

I'm also beginning to think we'll be lucky to have the next C standard out before that Mars landing. We on the ANSI C committee colloquially call this emerging standard C9X, where "9X" represents the year we're supposed to publish the standard. I'm slowly believing we should start calling it C0X. Maybe the next one after that we'll publish in 2014. Then we could call it C14 and use it to carbon-date the age of legacy C code — and legacy C coders.

In fact, it is you hardened C types that I address in these next months. As you may recall, I ended last month's column promising to start a new series building a bridge to the 21st century ... no wait, sorry, that came from writing this too soon after watching the Presidential debate ... building a bridge from C to C++. In past articles, I've discussed ways to convert your code from C to C++. Now I try to convert you from a C to a C++ programmer.

My intentions are not so nefarious as they sound. Many techniques promoted by C++ also happen to work well in C; thus, if you don't want to make the jump completely to C++, you can still profitably adapt these techniques to your C designs. Regardless of the language you use, I trust you'll find the discussion illuminating. So instead of saying I'll turn you into a C++ programmer, perhaps I should say I'll expose you to C++'s design principles, and let you decide which language to use as those principles' expressive medium.

C++ is often advertised as the latest programming be-all end-all [2] , a PL/I or Ada for the '90s, a magic elixir you must take because it's good for you and keeps you employed. I hope to show that C++ does offer some tangible, demonstrable advantages (and disadvantages) compared to C. While in the balance I find C++ superior to C, in the end what matters more is what you design, not the language in which you design it.

Definition and Rationale

I assert that the single most important design technique you can apply to your code is abstraction: the principle that says each piece of code should perceive other pieces for what they do, not for how they do it. In programming vernacular, abstraction describes a thing's interface, not its implementation.

A necessary means to the end of abstraction is encapsulation: showing exactly the relevant pieces of something, and hiding exactly the irrelevant pieces of that same thing. Encapsulation enables abstraction by showing interface and hiding implementation. The definitions of "relevant," "showing," and "hiding" depend on context, but the general concepts are fairly universal, even outside the programming realm.

Why is all this important? I suppose entire books could be written on the subject, but I'll distill my rationale down to a few points:

By insulating yourself from a particular implementation, you give the implementor free rein to change that implementation without affecting you.

By conceiving of and manipulating entities by their abstracted interfaces, you raise the abstraction level of your solution to more closely match the level of the real-world problem you are modeling.

If your programming language does not support a feature you want, you can create an abstraction that makes the feature look built-in to the language. Even though it's actually implemented by you, outside the language's specification, you can treat this new feature as if it were originally specified as part of the language.

Have You Driven a Ford Lately?

Let me give a non-programming example. In college I had a 1971 Ford Mustang. I knew how to start it and drive it, but didn't know, or care to know, how it all worked under the hood. To me, the car was an abstraction. I thought of it in terms of what it did, not how it did it; that is, I perceived it as being synonymous with its interface (steering wheel, brake, shifter, etc.) instead of as its implementation.

By treating cars as synonymous with their interfaces, and by exploiting the general sameness of all cars' interfaces, I could drive other people's cars with little retraining. Oh, maybe I'd have to adjust to a new power curve or clutch behavior, but the learning curve was short.

Unfortunately, after all those snowy Ohio winters with their winter road salting, my beloved Mustang quickly disintegrated into what we dubbed the "Rustang." In short order, the parking brake cable broke, the master cylinder leaked, wiring harnesses corroded, and I found myself having to understand how this abstracted car actually worked.

Because dealing with the pure interface was no longer enough, I bought the car's "source code"— the Ford shop manuals — and learned how to maintain it. From then on, I became aware of every change in its driving behavior, and mentally visualized all the pieces that could fall apart at a moment's notice. The driving task became severely more complicated and much less fun, as the domain of relevant car behavior had increased.

Segue to present day, and my 1995 Ford Probe GT. This car is almost pure interface; the bulk of the car is encapsulated beyond my perception. I take it in every two weeks for detailing, every 3,000 miles for an oil change, and every 6,000 miles for scheduled maintenance. I open the hood only to add washer fluid. I pay little attention to the car's implementation, and pay great attention to the joy of driving it.

Moral: by restricting your perception of a thing to its abstraction, you focus your attention on how that thing integrates into the rest of your life. Returning to programming vernacular, by focusing on a piece of code's interface, you get a clearer understanding of how it integrates with the rest of the code.

Cohesion and Coupling

Two commonly used terms that characterize a design's level of abstraction are cohesion and coupling [3] . My colloquial definitions for them are:

Cohesion — why things are intentionally connected together

Coupling — how changes in one thing affect another
Cohesion, like cholesterol, comes in both good and bad forms. The good form — functional cohesion — describes things that combine to perform a similar task, or operate on a common set of data. A functionally cohesive entity encapsulates and abstracts several other entities, yet has a single identifiable purpose. In effect, the entity raises the abstraction level of the encapsulated entities, further insulating users of the cohesive entity from the contained entities' implementation. For example, a binary tree node typically wraps three pieces (some data and two pointers) inside a conceptually single abstraction. You can chose to see the node as a bag of pieces that is, at the level of its implementation or as a cohesive whole at the level of its interface.

The simple rule for good abstraction says to maximize functional cohesion and minimize coupling. The more specific rule says to maximize functional cohesion and to localize, but not eliminate, coupling.

Low coupling is arguably always better than high coupling, but zero coupling is not possible. Consider a multithreading multitasking system. Assume that every thread works on its own unique data set, and that no task communicates with any other task. Although these pieces may operate independently of one another, they still interact indirectly via the operating system and user.

On some level, the OS must schedule execution, arbitrate device contention, synchronize user mouse input with screen output, and so on. Whether the pieces are aware of it or not, they are coupled together. Indeed, if the system is a fairly brittle one such as the MacOS or 16-bit Windows, one piece can crash all the other pieces, exposing the coupling.

The Internet extends coupling even further. Your machine and my PowerBook are completely independent systems; yet once we both connect to the Internet, our presence influences the network's resources and routing, so that my download of Apple's latest OS patch slows down while you're browsing http://www.elvissightings.com.

In the programming realm, abstraction must ultimately map to something concrete, and some parts of the implementation must become aware of one another. If you build a string library, and that library implements strings as arrays of char accessed via char *'s, some code somewhere must be aware of and coupled to that char * implementation. The trick is to minimize the coupling, so that if you replace the char * with something else, the smallest possible domain of code is aware and affected.

C and Abstraction

Given the above discussion, I can offer a redefinition:

Abstraction — the art of selectively exposing functionally cohesive interfaces and hiding coupled implementations.

C's origins as a high-level assembler prevent it from having sophisticated abstraction mechanisms; indeed, many aspects of C (e.g., signed vs. unsigned integers, short vs. long integers, bit operations) force a C programmer to be consciously aware of a particular implementation.

Nonetheless, C does support some abstraction mechanisms. While I expect none of these language features to be foreign to you, it's possible you may not have considered their properties as abstraction techniques.

Macros

Macros are a sledgehammer: they obey no scope, can cause non-obvious side effects, and allow apparent violations of normal syntax rules. Where possible, I recommend other techniques before macros, especially in C++; that language's support for inline functions, function overloading, templates, and constant objects replaces much of the historic need for macros [4] .

The one place macros work where other abstraction techniques fail is in solutions requiring textual substitution. Suppose you define a set of character string variables, and want to initialize each variable based on that variable's name:
function f()
    {
    char *s = "initial s value";
    char *error = "initial error value";
    /* ... and so on */
    }
If you have many such definitions, the code can become tedious and error-prone. Macros offer a solution:
#define string_construct(name)\
    char * name = "initial " #name " value"
     
string_construct(s);
string_construct(error);
/* ... and so on */
Using macros here not only simplifies the code, but also creates a functional cohesion: the simultaneous definition and initialization of a variable is abstracted into that variable's construction.

Modifying the example, if you initialize every scalar variable in your program to zero:
function f()
    {
    char c = 0;
    int i = 0;
    char *p = 0;
    /* ... and so on */
    }
you can instead use a macro:
#define scalar_construct(type, variable)\
    type variable = 0
     
function f()
    {
    scalar_construct(char, c);
    scalar_construct(int, i);
    scalar_construct(char *, p);
    }
Macros and structs

To zero-construct both scalars and structures:
#define zero_construct(type, variable)\
    type variable;\
    memset(&variable, 0, sizeof variable)
     
function f()
    {
    zero_construct(char, c);
    zero_construct(struct s, x);
    zero_construct(void *, p);
    }
To construct a particular structure (in this case, struct node):
struct node
    {
    char *data;
    struct node *next;
    };
     
#define node_construct(name)\
    struct node name;\
    name.data = "";\
    name.next = NULL
     
function f()
    {
    node_construct(a);
    node_construct(b);
    node_construct(c);
    }
For this latter example, you could also define the macro as
#define node_construct(name)\
    struct node name = {"", NULL}
node_construct, of course, maps to C++'s notion of a node class constructor. This technique not only creates a functionally cohesive interface, but it also insulates changes to the implementation; if you later decide to initialize the node members with some other values, you change the construction implementation in one place, leaving all the other code unaffected.

Later I will show a more versatile but less compact pseudo-construction technique using functions. In that discussion I will also give a (hopefully) more compelling argument for wanting to use such construction techniques in the first place.

typedefs

Perhaps the most obvious C data abstraction technique is the typedef. In the declarations
typedef int celsius;
celsius absolute_zero = -273;
celsius water_melting_point = 0;
celsius water_boiling_point = 100;
celsius is an abstracted type; instead of perceiving celsius variables as their underlying int implementation, you think of them as abstracted temperature quantities. If you later need to represent a larger range of temperatures, you simply redefine the typedef:
typedef long celsius;
For simple examples like this, typedefs work reasonably well. Once you start making the aliased types more complex, the mechanism shows its limitations. Consider
typedef char *string;
     
string a = "a";
*a = 'b';
which works. But what if you want to define a string object that doesn't allow you to change the pointed-to string? The definition
const string a = "a";
doesn't work, for you can still write
*a = 'b'; /* OK, but undesired */
typedefs and Type Qualifiers

typedefs were introduced into C a decade before the type qualifiers const and volatile, so I'm not surprised to find that they don't work seamlessly together. As Dan "const-correct" Saks has explained in the past few months, const in such contexts is part of the declarator (the thing being declared). To make const part of the decl-specifier (the type of the thing being declared) you need to bundle it into the typedef:
typedef char const *string; /* added 'const' */
     
string a = "a";
*a = 'b'; /* error, as desired */
From one perspective, this bundling breaks the abstraction model: it forces you to be aware that this form of constness is somehow buried inside the string definition. However, I'd argue that this does not affect the abstraction — provided you have the discipline to think of string as an abstracted textual entity, and not as a named wrapper for a char *. From that perspective, it makes no sense to change what a string points to since, in the abstracted view, string doesn't point to anything! The char * pointer is the implementation, not the interface.

Unfortunately, while the above interpretation may match your mental model, the compiler does not share this insight; to avoid unintentionally changing what the (conceptually hidden) char * implementation references, and to allow string variables to initialize from char const * variables, you still must put the const in the typedef.

typedefs and structs

I advocate that you always add the thin abstraction layer of a typedef to all your tagged types (enums, structs, and unions) [5] :
struct complex
    {
    double r;
    double i;
    };
     
typedef struct complex complex;
     
struct complex a; /* OK */
complex b;        /* also OK */
I call this an abstraction because the declaration
complex b;
hides the fact that complex is implemented as a structure. However, I also call it a "thin" abstraction because it doesn't take much to expose that implementation:
/* the '.' exposes 'b' as a struct */
b.i = 0.0; 
As an abstraction technique, typedefs are pretty weak; they require a fair amount of programmer discipline, and are always prone to revealing their underlying implementation. However, where credible, I recommend you use them over raw built-in type names; if/when you switch to a more robust abstraction method (like a real C++ class), your code will already have reserved that user-defined type name and integrated its use.

Segue

Next month, we'll continue surveying C abstraction techniques. After that, we'll start developing parts of a "real world" application that uses these techniques, exposes their limitations, and catalyzes exploration of more sophisticated (and eventually C++-specific) methods. I'll not mire you in the intricacies of developing a full-blown application; rather, I'll focus on the problems and solutions relating to abstraction that come up along the way.

I honestly don't know what progression this development will take; I only know that, by the time we're finished, we'll have used most (if not all) of the colors in these languages' abstraction palettes. o

Notes

1. However, I just read today that NASA is landing a rover on Mars on 4 July 1997, and that the rover will send live video feed over the Internet! July 4 is American Independence Day, so maybe the rover can video the space aliens as they come to invade.

2. Someone even more cynical than I might say Java has replaced C++ as the latest programming salvation.

3. I first learned of these terms in Structured Design by Edward Yourdon and Larry Constantine, Prentice-Hall 1979.

4. Some form of inlining should be available in C9X, or C0X, or whatever it ends up being called.

5. Because C++ automatically synthesizes these typedefs for you, you'll make your C code more C++-compatible by adding them. I made the case for this in my "C->C++ Mutations" series last year.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE Bothell WA 98011; by phone at +1-206-488-7696, or via Internet e-mail as rschmidt@netcom.com.