May 1997/The Learning C/C++urve

Columns

The Learning C/C++urve

Bobby Schmidt

Let Me Say That About this

Our intrepid columnist leads us down a nice garden path, which ends in a very slippery slope.

Copyright © 1997 Robert H. Schmidt

This month's column features a wonderful synchronicity. I had planned all along to introduce C pseudo-member functions. I had also planned to introduce a new periodic series on C Committee happenings. What I hadn't planned was that the Committees would concern themselves with real C member functions, neatly tying these two pieces together.

I'll start with the what you can do in C today, how that maps into C++ today, and how life may change with C9X tomorrow.

Data/Function Independence

Consider the Standard C Library functions declared in header string.h, featuring such familiar routines as strlen, strcpy, and the like. When coding with these routines, you may well think in terms of functional control flow:
Functional control currently lives with your routine.

You push a couple arguments on the stack, then call the library routine.

Functional control now lives in the library; also, as a side-effect, your data is changed and/or a return value gets pushed and sent back to you.

Once the library returns, your routine resumes functional control.

Rather than focusing on variables and thinking they have something done to them, you focus on the functions and think they do something — a subtle but important distinction.

String routines operate on parameters of type char *. However, you don't think of strlen, et al, as inherent properties of char *, let alone as defining the complete set of char * properties. In fact, the opposite is more true — the routines rely on char *'s properties (e.g., pointing to a null-terminated char array).

You have a body of char *s you use in your program, and a detached set of library routines. You decide when or even if you wed your char *s with these routines; there is no innate connection between them.

Data/Function Symbiosis

Now turn attention to the Standard Library header stdio.h. Like string.h, this header features a bevy of routines. Unlike string.h, it also features a data type (FILE) upon which many of the routines operate.

All the stdio.h routines starting with the letter f either take a FILE * parameter or return a FILE *. Unfortunately, the organization is inconsistent. Some f-prefixed routines take FILE * as their first parameter:
int fgetpos(FILE *, fpos_t *);
some take FILE * as their last parameter:
int fputc(int, FILE *);
some that use FILE * don't start with f:
void rewind(FILE *);
and some don't mention FILE * at all:
int printf(char const *, ...);
However, even these latter routines use FILE indirectly, for
int printf(char const *, ...);
is tantamount to
int fprintf(FILE *, char const *, ...);
with stdout (a predefined FILE *) passed as the first argument. Indeed, many of the FILE-less declarations are really thin wrappers around equivalent declarations that accept FILEs.

FILE and the routines are symbiotic — FILE has little purpose without the routines, and the routines have little purpose without FILE. Unlike char *, which could live in happy ignorance of the string.h routines, FILE inhabits exactly the same design universe as do the stdio.h routines.

Also unlike char *, FILE is abstracted behind a struct definition. With C strings, you are aware of the underlying implementation (char *). With C files, you don't know (or more properly, don't need to know) the implementation inside the struct; the particular canon of struct members implementing a FILE is unimportant.

When you use FILE variables, you design around the data/function symbiosis. You think less of the functional flow independent of data lifetime, and more of data and functions as existing together.

Data + Functions = Class

This symbiosis exists not only from your (calling) perspective, but also from the library's perspective. Actually, the linkage between data and functions is greater there, for not only do the routines manipulate variables of type FILE, they also manipulate FILE's innards (the data members comprising a FILE implementation).

In fact, this linkage is so strong that, were it not for the syntactic and semantic limitations of C, you would be tempted to bundle the FILE structure and the associated routines as if they were part of a single "thing." In C++ we'd call that "thing" a class. C doesn't have classes, but it does let you get amazingly close.

Consider the following code that copies one file to another:
FILE *in, *out;
in = fopen("in", "r");
out = fopen("outv, "w");
while (!feof(in))
    {
    int c = fgetc(in);
    fputc(c, out);
    }
fclose(in);
fclose(out);
Change the names so that the FILE-centric nature is more evident:
FILE *in, *out;
in = FILE_open("in", "r");
out = FILE_open("out", "w");
while (!FILE_eof(in))
    {
    int c = FILE_getc(in);
    FILE_putc(c, out);
    }
FILE_close(in);
FILE_close(out);
Next, change the in and out declarations from FILE * to FILE, and arrange the function prototypes so that FILE * — the special object in symbiosis with these routines — is always the first parameter:
FILE in, out;
FILE_open(&in, "inv, "r");
FILE_open(&out, "out", "w");
while (!FILE_eof(&in))
    {
    int c = FILE_getc(&in);
    FILE_putc(&out, c);
    }
FILE_close(&in);
FILE_close(&out);
Within the library's function implementations, always call that first parameter this:
int FILE_eof(FILE *this)
    {
    /* ... */
    }

void FILE_putc(FILE *this, int c)
    {
    /* ... */
    }

/* and so on */
This forces an equivalence relationship between the magic identifier this and the FILE variable being referenced by every routine, so that you think of this and the FILE bound to this as one and the same.

Finally, group the FILE type definition and the FILE_ routines into a common header, as is done with stdio.h. The result is something like
struct FILE
    {
    /* FILE data members */
    };

typedef struct FILE FILE;

void FILE_open(FILE *);
void FILE_close(FILE *);
int FILE_getc(FILE *);
void FILE_putc(FILE *, int);
int FILE_printf(FILE *, char const *, ...);
/* and so on */
While none of this changes how the library or your code works, it does change your perception of how these pieces interrelate. The type definition and function declarations together form a single abstracted interface; you think of this interface not just as a FILE and not just as a set of functions, but as a meta-entity arising from the synthesis.

Comparison to C++

By orienting your C type/function declarations around these principles, you gain two advantages. First, as the related data and functions are more obviously tied together; the syntax reflects and reinforces the design. Further, because your code resembles "native" C++, you can more easily migrate to that language later.

How easily? The changes are largely syntactic:

Move the function declarations within the FILE structure definition, and remove the FILE_ prefix from those function declarations.

Remove the leading FILE * parameter from each function.

Remove the typedef.

The result is: [1]
struct FILE
    {
    void open();
    void close();
    int getc();
    void putc(int);
    int printf(char const *, ...);
    /* and so on, followed by FILE data members */
    };
For the function implementations:

Change the FILE_ prefix to FILE::.

Remove the first FILE * parameter's declaration. The parameter still exists, and is still called this; but C++ declares it implicitly, and doesn't need (or want) you to declare it explicitly. Using the implied hidden parameter, the function implementation still refers to the current FILE variable as this.

These changes yield function definitions like
int FILE::eof()
    {
    /* ... */
    }
void FILE::putc(int c)
    {
    /* ... */
    }

/* and so on */
Your code must now reflect these changes in two ways. First, functions are now members of the FILE struct, so use the . notation accordingly. Second, there is no longer an explicit first parameter of type FILE *, so you no longer pass in an explicit FILE * argument. The result for our example is
FILE in, out;
in.open("in", "r");
out.open("out", "w");
while (!in.eof())
    {
    int c = in.getc();
    out.putc(c);
    }
in.close();
out.close();
What's Next

By following these C design strategies, you more clearly represent intent today, and pave the way to easier C++ migration tomorrow. Throughout my C programming career, I've used such type/function symbiosis with consistent success.

Among other things, this shows that you need not use an OOP language to gain OOP features. In fact, I find that, for many projects, most of C++'s advantage over C distills out to:

C++ formally binds functions and their data, as I've shown here.

C++ makes some members accessible (public) and others hidden (private).

to a lesser extent, C++ provides implicit object construction/destruction.

Next month, I'll explore other aspects of C++ from the vantage point of C. I will then tie that together with last month's COM background to work through the larger MAPI example I've been threatening.

WG14 Follies

With this column, I start a new periodic feature: pulling back the curtain on the C Standard committees and our efforts to bring forth Standard C Version 2. Witness the spectacle as the Titans of C battle for supremacy over your development dollar — and all without having to leave the comfort of your La-Z-Boy or ante up the $US750/year committee fees.

I refer you to Dan Saks' wonderful C++ Standard organization taxonomy on page 68 of the March 1997 CUJ. Where Dan writes about the American and International C++ committees (X3J16 and WG21, respectively), I will unveil machinations of the corresponding C committees (X3J11 and WG14).

Disclaimer: I submit my columns quite a while before you read them, especially if you live outside the States. For example, it is mid-February as I write, and you are reading the May issue. Thus, any proposed C language changes may no longer be relevant by the time you learn of them from me. Caveat emptor.

This month's burning topic: proposals extending C structs to include

member functions

public and private access

the :: operator

These proposals don't include:

constructors and destructors

protected access

inheritance

function overloading

static members

pointer-to-member operators

If the proposals make it into the C9X standard, you will be able to write the earlier C++ FILE class definition as C. Now, long-time readers may be thinking I would unambiguously hail such a change. I clearly endorse the design such C additions would favor. However, the law of unintended consequences [2] looms large here:

You can't use the names private, public, and this for your own identifiers.

Without constructors, you must explicitly call a member function to set private data.

By moving functions inside a struct, you change the order in which those functions look up names.

You can't create a strictly conforming pointer to a member function without adding more C++-like features (::* and the like).

Member functions require structs to have linkage, which will in turn require a change in C's type compatibility rules. Implementations must embed linkage information (probably by C++-like name mangling) without compromising the guaranteed minimum of 31 significant characters in an external name.

In a separate proposal, we are considering inline functions, which would further complicate linkage and name lookup.

Without constructors, how do you initialize const private data?

What is the size of a structure with only function members?

I'd like to explore this last point. Currently in C, you cannot have a struct with no members:
struct s
    {
    }; /* error */
At the very least, you must stick in a dummy data member:
struct s
    {
    char unused;
    }; /*OK */
Why would you want an empty struct? To create objects with strong compile-time information and no run-time state information. Possible applications:

the run-time equivalent of enumeration constants

a limited form of abstract bases

"generic" pointers with semantics stronger than void *

a unique resource "handle" [3]

If the size of a function-only structure is non-zero, we add unnecessary padding for the above applications and risk alignment surprises. If the size is zero, we break other assumptions within the C language. Some committee members have suggested the size be non-zero unless the structure is serving as a pseudo-base or is aggregated in another struct, but others have argued this is difficult to specify in the Standard and non-intuitive for users.

My biggest concern in all these proposals is the ultimate rationale. C++ already provides a C-emulation environment that supports these proposed extensions. What is the compelling advantage in duplicating this environment? If C offered features that C++ didn't, features that were obviously enhanced by member functions, then perhaps I'd understand.

I also wonder about the proverbial slippery slope. If we add member functions in this limited form, will we be giving people enough advantage to use C-with-member-functions instead of C++? Or to make the choice of C compelling, will we end up reinventing so much stuff that C becomes largely indistinguishable from C++?

By asking these questions and discussing these design issues, I'm exposing you to the difficulties we encounter getting this language out the door. From an outsider's perspective, you may marvel at how many years these Standards take to gestate. From an insider's perspective, it often amazes me they are ever born at all.

Erratica

The Moving Finger writes; and having writ, Moves on; nor all your Piety nor Wit Shall lure it back to cancel half a Line, Nor all your Tears wash out a Word of it.
— Omar Khayyam

In March's column, I mentioned my earlier misspelling of the German "Drang" in "Sturm und Drang." I then went on to make the same mistake in the paragraph immediately following the correction! So consider this a meta-correction.

In April I had some fun at the expense of the Microsoft Systems Journal. Along the way I mentioned that the technical and acquisitions editor (Dave Edson) was a good friend. I recently found out just how good.

Literally one day after I submitted that April column, Dave contacted me. One of his authors for an upcoming issue had dropped out suddenly. Dave was desperate — all his other pinch hitters were unavailable. For the afternoon, my name became Obi-Wan Kenobi, for it seemed I was his only hope.

After meeting with him and a Microsoft program manager at their Redmond campus, I agreed. Aware that my reputation for dissing Microsoft is at stake, you must realize that 1) he's been a friend for a long time, 2) the subject is one I know something about, and 3) he's throwing a lot of money at me.

So, if you are curious, check out my article in the May 1997 MSJ. Since you are currently reading the May 1997 CUJ, odds are the MSJ will be available around the time your read this.

(Another synchronicity: the day before I sent CUJ my final corrections for this column, Dave and Kate Edson's first child Dominique Keanna was born. Both parents are professional photographers, with a darkroom in their basement — so Dave processed, printed, scanned, and e-mailed same-day pictures from home. I dedicate this column to the Edson trio, and look forward to resuming our perpetual 9-Ball tournament.) o

Notes

[1] Note that I still use the keyword struct even though I'm calling this a class. The only difference between the keywords class and struct is the default access specification, a topic I'm purposely avoiding this month.
[2] See P.J. Plauger's Editor's Forum in the March 1997 CUJ for more on unintended consequences.
[3] As I've mentioned in previous columns, Microsoft Windows has used such a scheme for several years.

Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE Bothell WA 98011; by phone at +1-206-488-7696, or via Internet e-mail as rschmidt@netcom.com.