Creating C++-Like "Objects" In C

Christopher Skelly


Christopher Skelly is chief executive officer and co-founder of Insight Resource Inc. Insight provides C, Advanced C and C++ training. Insight Resource also produces the popular KO-PILOT line of help and training help products, which Brit Hume called "the best add-ins ever written." Mr. Skelly can be reached at 914-631-5032.

With all the talk about objects these days, one might expect the Standard C definition of "object" to be somewhat more glamorous than a region of storage. Yet that is precisely how Standard C and the emerging C++ standard define object in its most fundamental sense, as a region of storage. Not even in terms of memory or disk storage, mind you, just plain storage.

If an object is just a region of storage, with no particular characteristics whatsoever, what then should we call the instances of C++ classes or other object-oriented systems? I would say that if "overloading" refers to using the same word or operator for different things, then the word "object" has certainly been overloaded. For some, an object is an elegant thing, combining persistence, concurrency, multiple inheritance and a host of rarified attributes. Anything less is less than an object. Others, however, will be satisfied with the qualities of abstraction and data hiding.

Some say C is already oriented towards objects with its struct feature. Add a few headers, they suggest, and C is practically as object-oriented as C++. Others point out that an object must have both data and functions, difficult to do with C structs.

No doubt, the overloaded term "object" will continue to be used frequently to describe class instances. In the perennial search to escape from the demon of "fuzzy-thinking," I will use the term "class-object" to identify a typical C++ class instance.

Class objects in C++ offer C programmers a valuable set of added resources, the two most basic of which are member functions and data hiding. All of the more spectacular features of C++, such as inheritance and polymorphism, depend on the underlying interface of member functions and hidden private data. The fundamental resource provided by C++ class objects is the ability to perform a general action, and yet have that action affect only the private data of one particular class object.

The C programmer can create member functions and hidden data as well. This article explores how to add member functions and private data to a simple C struct and, in so doing, illustrates some of the workings of a C++ pre-compiler. The C++ cfront pre-compiler translates all of C++ into corresponding C. You'll see that cfront must do a great deal more than simple preprocessing to generate the C code for C++ objects.

Adding class object features to simpler C objects is a challenging endeavor. How do you link up functions to structs, and have the functions act only on the particular information included in the specific linked-up structure? That's how C++ member functions behave. Pressing the question, how can you have private data members in C structs? Unlike C++, C has no keyword private. Data in a C struct has the same scope as the struct itself. On the surface, there appears to be no way to "hide" data inside a C struct. Yet C's resources are in fact powerful enough to support both member functions and hidden data in structs.

Creating Member Functions

Developing member functions in C must be done in two parts. First, you must associate functions with a struct type. Then you must make these functions operate only on the particular "object" or struct to which they are attached.

We'll use a simple CIRCLE type with four members, to which I'll add member functions and hidden data. The techniques illustrated here can be applied to any type of data-holding struct.

typedef struct { int x, y; int radius, color; }
      CIRCLE;
The primary mechanism in creating member functions will be the sometimes infamous pointer to function. I say infamous because while pointers to functions are certainly one of C's most powerful resources, they are also potentially one of the trickiest. This sad fact is attested to by any number of crashed hard disks with overwritten File Allocation Tables.

/* pfi is a pointer to a
   function returning int */
int (*pfi)(void);
The value of a pointer to function in C or any language is the ability to delay deciding exactly what function will be executed until the program itself is executed, at run-time. This "late-binding" of pointers to functions allows action strategies that are much more sophisticated than would be possible if every action were hard-wired into the source program. Complicated software, compilers for instance, makes heavy use of pointers to functions for all manner of purposes. New operators in the compiled language may be implemented as functions and accessed by means of a table of function pointers.
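The table-of-function-pointers idea mentioned above can be sketched in a few lines. This is only an illustration: the names op_add, op_sub, op_mul, op_table, and apply are hypothetical and do not come from any listing in this article.

```c
/* Hypothetical operations; the names are illustrative only. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

/* Indices into the table; OP_COUNT is the number of entries. */
enum { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

/* Table of pointers to functions taking two ints and returning int. */
static int (*const op_table[OP_COUNT])(int, int) = { op_add, op_sub, op_mul };

/* The function to execute is chosen by a value known only at run-time. */
int apply(int op, int a, int b)
{
    if (op < 0 || op >= OP_COUNT)
        return 0;   /* unknown operation: an arbitrary fallback for this sketch */
    return (*op_table[op])(a, b);
}
```

A call such as apply(OP_MUL, 2, 3) selects op_mul through the table at run-time. Adding a new operation means adding one function and one table entry, without touching the callers.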

When adding member actions to the CIRCLE type, your first thought might be to simply add a pointer to a function, say a getcolor function that returns the current circle color.

typedef struct { int x, y; int radius, color;
  int (*getcolor)(void); } CIRCLE;
Now every CIRCLE has its own pointer to a getcolor function. Every time you declare a circle you will have to explicitly initialize getcolor to some function you have written.

int foo(void);
CIRCLE c;
c.getcolor = foo;
Furthermore, every CIRCLE will have its own specific pointer to its own specific getcolor function. But suppose you later decide to modify the action performed by the getcolor function? Now you must deal with hundreds of little structs, each running around with its own little action package. Instead, it would be very helpful to abstract the action portion of the struct from the rest of the struct. Then changing the action portion might require no changes at all to the rest of the struct.

A CIRCLE_ACTIONS struct will now go along with the CIRCLE type.

typedef struct { int (*getcolor) (void);
  int (*setcolor) (void);
  } CIRCLE_ACTIONS;
This first fundamental abstraction separates the actions of circles from the circles themselves. Now you can refine the action package without worrying about the rest of CIRCLEs for a while.

Of course, you must redefine CIRCLE to connect with CIRCLE_ACTIONS, by adding a pointer to the action package:

typedef struct {
  int x, y;
  int radius, color;
  /* pointer to CIRCLE_ACTIONS struct */
  CIRCLE_ACTIONS *pcact;
  } CIRCLE;
Now you can hook all CIRCLEs with the pcact pointer to a single CIRCLE_ACTIONS struct. But there are problems. Right now, the *getcolor function doesn't know which particular CIRCLE it should operate upon. We'll remedy that problem by passing *getcolor the address of the particular circle to access.

typedef struct {
  /* *getcolor will work on *pc */
  int (*getcolor)(CIRCLE *pc);
  int (*setcolor)(CIRCLE *pc, int val);
  } CIRCLE_ACTIONS;

With the definition of a CIRCLE and its actions out of the way, the code in Listing 1 can create a circle c.

Now you can access c's data directly through its own member function.

int i = (*c.pcact->getcolor)(&c);
i receives the value 4, the color assigned to c. Our object-oriented system has gained the beginnings of member functions.
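For readers without the listing at hand, here is a minimal sketch of how the pieces described so far might fit together. It is not Listing 1. The struct tag circle, the variable cact, the function demo_getcolor, the initial values other than the color 4, and the choice to have circle_setcolor return the new color are all assumptions made for illustration. Because CIRCLE and CIRCLE_ACTIONS refer to each other, the sketch forward-declares CIRCLE first.

```c
/* Forward declaration so CIRCLE_ACTIONS can mention CIRCLE * (an assumption
   of this sketch; the struct tag "circle" is hypothetical). */
typedef struct circle CIRCLE;

typedef struct {
    int (*getcolor)(CIRCLE *pc);
    int (*setcolor)(CIRCLE *pc, int val);
} CIRCLE_ACTIONS;

struct circle {
    int x, y;
    int radius, color;
    CIRCLE_ACTIONS *pcact;      /* pointer to the shared action package */
};

/* The actions operate only on the circle whose address they receive. */
static int circle_getcolor(CIRCLE *pc) { return pc->color; }
static int circle_setcolor(CIRCLE *pc, int val) { pc->color = val; return val; }

/* One action package shared by every CIRCLE. */
CIRCLE_ACTIONS cact = { circle_getcolor, circle_setcolor };

int demo_getcolor(void)
{
    CIRCLE c = { 10, 20, 5, 4, &cact };     /* color is 4 */

    /* The parentheses matter: the call operator binds more tightly than
       unary *, so the pointer is dereferenced first, then called. */
    return (*c.pcact->getcolor)(&c);
}
```

Since C also lets a function pointer be called without an explicit dereference, c.pcact->getcolor(&c) would behave the same way; the explicit form is kept here to match the article's notation.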

By the way, few will say that the syntax of this last construction is not a bit tricky. One of the great resources provided by C++ is the syntactical ease with which you can perform complicated actions like this.

Translated into "formal English," the above assignment to i means:

Access the getcolor pointer in the CIRCLE_ACTIONS struct pointed to by c's pcact pointer. Execute the function located at the address stored in this getcolor pointer. Pass the executing function the address of this specific CIRCLE. When the *getcolor function executes, it returns the color of the CIRCLE whose address it received.

CIRCLE and CIRCLE_ACTIONS can evolve independently, yet work together to provide the behavior of member functions. As a next step, I'll organize the action package a little differently, and implement a simple message system between the CIRCLE definitions file and the application program.

I will use an array of pointers to functions in the CIRCLE_ACTIONS typedef. I will also place the definitions of both CIRCLEs and CIRCLE_ACTIONS in a header file called OBJ.H, shown in Listing 2.

Now actions in the action package can be executed with a simple index into the pactions array. To call a member function in the application you no longer need to know the actual member name, only the index. I will use #defines for creating the index. A "class object pre-compiler" would create a header file of the #defines for you. Right now I will just show the new "messages" as simple #defines, one each for GETCOLOR and SETCOLOR.
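The index-based "message" scheme might look roughly like the sketch below. It is not Listing 2. One assumption deserves emphasis: every entry in a C array must have the same type, so this sketch gives both actions the same signature (a CIRCLE pointer plus an int that the getter ignores). NUM_ACTIONS, cact, demo_messages, and the struct tag circle are hypothetical names.

```c
/* The "messages" are plain indices into the pactions array. */
#define GETCOLOR    0
#define SETCOLOR    1
#define NUM_ACTIONS 2           /* hypothetical name for the table size */

typedef struct circle CIRCLE;

typedef struct {
    /* Assumption: one uniform signature for every action in the table. */
    int (*pactions[NUM_ACTIONS])(CIRCLE *pc, int val);
} CIRCLE_ACTIONS;

struct circle {
    int x, y;
    int radius, color;
    CIRCLE_ACTIONS *pcact;
};

static int circle_getcolor(CIRCLE *pc, int unused)
{
    (void)unused;               /* the getter has no use for the value */
    return pc->color;
}

static int circle_setcolor(CIRCLE *pc, int val)
{
    pc->color = val;
    return val;
}

/* Order of initializers must match the message indices. */
CIRCLE_ACTIONS cact = { { circle_getcolor, circle_setcolor } };

int demo_messages(void)
{
    CIRCLE c = { 0, 0, 1, 4, &cact };

    (*c.pcact->pactions[SETCOLOR])(&c, 7);          /* send SETCOLOR */
    return (*c.pcact->pactions[GETCOLOR])(&c, 0);   /* send GETCOLOR */
}
```

The caller now needs only the index, not the member name. The cost is that the compiler can no longer check that a given index receives arguments appropriate to that particular action.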

It is also time to separate the internals of the CIRCLE_ACTIONS package from the application code that uses CIRCLEs. I'll create two source files, OBJ.C and CIRCLE.C, to store the application code and the developing CIRCLE_ACTIONS function definitions respectively. For the moment, both files include OBJ.H, which contains typedefs for both CIRCLEs and CIRCLE_ACTIONS. Listing 3 and Listing 4 contain the initial versions of OBJ.C and CIRCLE.C. Note that the array of pointers to functions in the CIRCLE_ACTIONS package is initialized exactly as before.

There are several problems still to solve. For instance, right now there is nothing special about the functions in the action package. As global functions, circle_getcolor() and circle_setcolor() could be used without declaring a CIRCLE object at all. Member functions should be accessible only through class objects. How can you hide functions effectively in C?

At moments like this, I remember something I tell our C and C++ students: "There are times when even after studying C for years, the light dawns on some internal facet in such a way that I imagine seeing a sign in the landscape which says 'Dennis Ritchie was here.'" The resource necessary for hiding functions already exists in C.

Storage classes provide the tool for hiding the action package from all non-class oriented access. I will use file-static functions, and later, file-static variables, to hide the internals of the CIRCLE definition from the rest of the program. I will isolate the "member functions" by giving them static storage class in the file CIRCLE.C. Because of their "file static" scope, functions like circle_getcolor() are visible only inside the file CIRCLE.C. This file-specific visibility is all that is necessary to hook the function addresses into the CIRCLE_ACTIONS struct, which itself is defined in CIRCLE.C. The CIRCLE_ACTIONS struct itself can still be accessed directly from another file, but I'll remedy that lingering vulnerability shortly. In a class-object oriented system, the application program must interact with CIRCLE objects only through the defined member functions.

Listing 5 shows the new CIRCLE_ACTIONS definition file, called CIRCLE.C. The application program in Listing 6 is a small but illustrative main function, in a file called OBJ.C. These listings are more object-oriented than Listing 3 and Listing 4 by far. The applications define and create CIRCLEs and use the member functions of the action package to access their data. The member functions' indices must be documented at some point for others to know what the class interface is, just as they would have to be in any other class-object oriented system.

The system still has problems though. For one thing the syntax requires a hefty built-in "Cortex Compiler," or CC as I like to call it. Your Cortex Compiler is your ability to read a code fragment and know exactly what the compiler will make of it, character by character. Your Cortex Compiler is the most valuable pure C resource I know of. Still, syntax simplification is necessary if we're to survive the complexity of these new C "objects."

A greater problem now, however, is that all the circle's data is wide open to the public. No one should be able to get inside an object with a routine access or assignment. You must have private data for each object.

At this point it becomes clear that part of the problem is the method by which each CIRCLE object is initialized. The initialization is too simple. Rather than just assigning a few members when a class object is created, you must allocate some private storage for each new object's private data.

The system needs what in C++ is called a constructor. A constructor is a function that is called whenever a class object is created. The constructor creates the unique identity of the new class object, as well as giving it whatever private storage it needs. The constructor, of course, will also handle the simple assignments that initialize a CIRCLE. Finally, creating a constructor function lets you hide the CIRCLE_ACTIONS struct from any access outside CIRCLE.C.

Look at the changes to CIRCLE.C and OBJ.C in Listing 7 and Listing 8. Observe that only the constructor is external, allowing the constructor to be called whenever a CIRCLE is created in any file. The constructor connects each circle to the generic action package for CIRCLEs. Most of the internals of the CIRCLE class are now hidden from the application.
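A rough sketch of this arrangement follows. It is not Listing 7 or Listing 8. The constructor name circle_construct, its parameter list, the variable cact, and the uniform action signature are assumptions. The sketch is written as a single file for brevity, with a comment marking the part that would live in CIRCLE.C.

```c
/* ---- what a shared header such as OBJ.H might contain ---- */
#define GETCOLOR 0
#define SETCOLOR 1

typedef struct circle CIRCLE;

typedef struct {
    int (*pactions[2])(CIRCLE *pc, int val);    /* assumed uniform signature */
} CIRCLE_ACTIONS;

struct circle {
    int x, y;
    int radius, color;
    CIRCLE_ACTIONS *pcact;
};

/* ---- what would live in CIRCLE.C ---- */

/* static: these names are invisible outside this source file. */
static int circle_getcolor(CIRCLE *pc, int unused)
{
    (void)unused;
    return pc->color;
}

static int circle_setcolor(CIRCLE *pc, int val)
{
    pc->color = val;
    return val;
}

/* The action package itself is now file-static as well. */
static CIRCLE_ACTIONS cact = { { circle_getcolor, circle_setcolor } };

/* The only external name: a hypothetical constructor that initializes
   the circle and connects it to the generic action package. */
void circle_construct(CIRCLE *pc, int x, int y, int radius, int color)
{
    pc->x = x;
    pc->y = y;
    pc->radius = radius;
    pc->color = color;
    pc->pcact = &cact;
}
```

An application would declare a CIRCLE, call circle_construct(&c, ...) explicitly, and from then on reach the actions only through c.pcact and the GETCOLOR and SETCOLOR indices.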

Now I'll let the constructor allocate memory for the private data variable in each circle. The member functions will always access the correct CIRCLE's data. A "destructor" will have to free the circle's private allocated storage when the circle dies. Note that in our example, the constructor and destructor are called explicitly. In C++ these functions are called automatically when objects are created and destroyed. The automatization of the constructor/destructor process is something else the pre-compiler would have to handle, inserting calls after object definitions and when objects go out of scope.

The final source files, OBJ.C and CIRCLE.C, are shown in Listing 9 and Listing 10. The CIRCLE constructor now allocates some private storage. A corresponding destructor function frees the storage when CIRCLEs go out of scope.

In Listing 10 a new, file-static struct, struct cir_pri_data, stores all the information about a circle's private data. A CIRCLE's private data is linked to a CIRCLE by the pprivate pointer to void. You therefore cannot use the pprivate pointer to directly access private data. Only in the file CIRCLE.C is the internal structure of struct cir_pri_data known. Several times in Listing 10, CIRCLE.C casts the void * pprivate to point at a struct cir_pri_data in order to access the CIRCLE's hidden data. Note that CIRCLE users, writing their own main functions, have no way to access a CIRCLE's private data except through the defined GETCOLOR and SETCOLOR indices. Our C structs now have both member functions and their own hidden data.
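The private-data technique can be sketched as follows. It is not Listing 9 or Listing 10. The names pprivate and struct cir_pri_data come from the description above; the names circle_construct and circle_destruct, the decision to make all four data members private, the uniform action signature, and the error return from the constructor are assumptions of this sketch.

```c
#include <stdlib.h>

/* ---- public view, as a header might present it ---- */
#define GETCOLOR 0
#define SETCOLOR 1

typedef struct circle CIRCLE;

typedef struct {
    int (*pactions[2])(CIRCLE *pc, int val);    /* assumed uniform signature */
} CIRCLE_ACTIONS;

struct circle {
    CIRCLE_ACTIONS *pcact;
    void *pprivate;             /* opaque to the application */
};

/* ---- known only inside CIRCLE.C ---- */
struct cir_pri_data {
    int x, y;
    int radius, color;
};

static int circle_getcolor(CIRCLE *pc, int unused)
{
    struct cir_pri_data *p = (struct cir_pri_data *)pc->pprivate;
    (void)unused;
    return p->color;
}

static int circle_setcolor(CIRCLE *pc, int val)
{
    struct cir_pri_data *p = (struct cir_pri_data *)pc->pprivate;
    p->color = val;
    return val;
}

static CIRCLE_ACTIONS cact = { { circle_getcolor, circle_setcolor } };

/* Hypothetical constructor: returns 0 on success, -1 if allocation fails. */
int circle_construct(CIRCLE *pc, int x, int y, int radius, int color)
{
    struct cir_pri_data *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;
    p->x = x;
    p->y = y;
    p->radius = radius;
    p->color = color;
    pc->pprivate = p;
    pc->pcact = &cact;
    return 0;
}

/* Hypothetical destructor: releases the private storage. */
void circle_destruct(CIRCLE *pc)
{
    free(pc->pprivate);
    pc->pprivate = NULL;
    pc->pcact = NULL;
}
```

Because the application sees pprivate only as a void pointer and never sees the definition of struct cir_pri_data, it has no legitimate way to reach the color except by sending GETCOLOR or SETCOLOR through the action package.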

Conclusion

In this article, I have presented some of the hows of implementing an object-oriented system. I have not yet taken on the next task of examining how to put the new C "objects" to work. I would probably want to create a compiler or interpreter to allow a programmer using these new C "classes" to refer to them with a simpler syntax than the examples here! The compiler will read something written in the new language and translate the code into C. The programmer will then compile the C code and be done. I would actually be creating a pre-compiler, but one that does a lot more than handle a few #defines. The pre-compiler will create the appropriate temporary files to segregate the different class definitions from each other and from the application. The pre-compiler will also generate the necessary header for the type definitions and the message codes for C class objects to use with their member functions. Finally, the pre-compiler will make sure that the constructor and destructor functions are called whenever class objects are created or destroyed.

I might extend this new language and give it a host of other features, such as polymorphism, multiple inheritance, and overloaded operators. I might even think of calling this new language C++, until I happen upon the small sign neatly tucked into the landscape that says "Stroustrup was here."