July 1995/Questions and Answers

Questions and Answers

A Pitfall Inherited from C

Pete Becker

Pete Becker is Senior QA Project Manager for C++ at Borland International. He has been involved with C++ as a developer and manager at Borland for the past six years, and is Borland's principal representative to the ANSI/ISO C++ standardization committee.
Q
I'm having a problem making two dimensional arrays of my Complex variable class. The following works:
Complex *a = new Complex[5];
Complex b[3][3];
I can assign/view the elements in both arrays. What I'd like to do however, is:

Complex **c = new Complex[3][3];
When I try and assign values to the array elements I get a bus error. To set the values I do this:

Complex temp(10.0,1.0); // this works fine...
Complex **c = new Complex[3][3];
c[0][0] = temp; // and so forth...
What else should I be doing to make the dynamic array allocation work for more than one dimension?
A
This question should be as relevant to C programmers as C++ programmers; one of the wonderful things about our profession is that there is always more to learn. I've seen this question many times on the Internet and always passed it by because I felt that it wasn't very difficult, and I knew how to solve it if I ever needed to. The other day I needed to: someone dropped by my office with this same question, and it took half an hour to get something that worked. I attribute part of the blame to the lateness of the hour, but most of it must lie squarely on my own inadequate understanding of the issues involved with arrays and pointers.
The complications in dealing with multidimensional arrays come from a feature of C that's usually a convenience: array names and pointers are interchangeable in most contexts. When this interchangeability does what you expect it's a great convenience; when it does something unexpected it can be very hard to debug. For example:

void f( char * );
char text[] = "Hello, world!\n";
f( text );
In the call to f, the array name text is converted to a pointer to its first element, that is, a pointer to char, which is then passed to the function f. We're all used to seeing this happen, and it doesn't surprise anybody.
This also occurs in a more subtle form:

f( "Hello, world!\n" );
Here, the compiler translates the literal string "Hello, world!\n" into an unnamed array of char. This array is converted to a pointer to its first element, and that pointer is passed to f. This, too, is not surprising, although the mechanism is a bit more complicated, since it does not involve an explicit array declaration.
However, if you become complacent about this interconvertibility you can get into trouble. Have you ever stumbled over this one?

// FILE1.C
char text[] = "Hello, world!\n";

// FILE2.C
extern char *text;

int main()
{
    puts( text );
    return 0;
}
In FILE1 we've told the compiler that the identifier text is the name of an array of char, that is, the name refers to a block of memory that contains the ASCII codes for the characters "Hello, world!\n". In FILE2 we've told the compiler that text is the name of a pointer to char, that is, it refers to a memory location that contains the address of a block of memory containing ASCII codes. These two definitions of text are not the same, and the program that results from linking these two files will not run correctly. In order to make it work, we must either change the definition of text in FILE1 from array of char to pointer to char, or we must make the opposite change in its definition in FILE2, like this:

// FILE1A.C
char *text = "Hello, world!\n";

// FILE2A.C
extern char *text;

int main()
{
    puts( text );
    return 0;
}
or like this:

// FILE1B.C
char text[] = "Hello, world!\n";

// FILE2B.C
extern char text[];

int main()
{
    puts( text );
    return 0;
}
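The difference between the two declarations can also be seen inside a single file. The following is only a sketch: the names textArray and textPointer and the four helper functions are made up for illustration, and the pointer is declared const char * because current C++ compilers require that for string literals.

```cpp
#include <cstddef>

char textArray[] = "Hello, world!\n";         // the name refers to the 15 bytes themselves
const char *textPointer = "Hello, world!\n";  // the name refers to a pointer-sized cell

// Size of the object that each name designates.
std::size_t arraySize()   { return sizeof textArray; }
std::size_t pointerSize() { return sizeof textPointer; }

// For an array, the address of the object is the address of its first element.
bool arrayStartsAtItsOwnAddress()
{
    return static_cast<const void *>(&textArray) ==
           static_cast<const void *>(&textArray[0]);
}

// For a pointer, the object holds an address that refers to some other block of memory.
bool pointerStartsAtItsOwnAddress()
{
    return static_cast<const void *>(&textPointer) ==
           static_cast<const void *>(textPointer);
}
```

When FILE2 declares text as a pointer but FILE1 defines it as an array, the code in FILE2 reads the first few characters of the string as if they were an address, which is why the mismatched program misbehaves.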
You've probably guessed by now that the problem raised in the question is the result of confusing arrays and pointers. This occurs in the declaration

Complex **c = new Complex[3][3];
This declaration attempts to create a variable of type pointer to pointer to Complex and initialize it with the result of the new-expression. Let's take new out of the picture for the moment, so we can understand the types a bit better:

typedef struct
{
    double re, im;
} Complex;

int main()
{
    Complex Initializer[3][3];
    Complex **c = Initializer; // line 9
    return 0;
}
When I try to compile this code as C code with Borland C++ 4.5 I get the following message:

Warning test.c 9: Suspicious pointer conversion in function main
This message is a sign of trouble, although it doesn't tell us much about what the compiler thinks is going on. If we compile the same code as C++ code we get a more detailed message:

Error test.cpp 9: Cannot convert 'Complex ( *)[3]' to 'Complex * *' in function main()
A C compiler lets this conversion through with only a warning; a C++ compiler rejects it. It's not that the C++ compiler can't do the same thing that the C compiler does with it; in fact, if you add the right cast you can persuade the C++ compiler to accept this conversion. The problem is that the conversion rarely makes sense, and C++ is much less tolerant than C of constructs that probably won't work.
In this case, we've stumbled into the same old complacency that we ran into before: we've assumed that [] can be replaced with *, but the compiler is telling us that it doesn't agree. If you read the C++ error message carefully, it contains all the information you need to figure out what's wrong. The compiler is looking at an object of type Complex (*)[3], and it is trying to convert it to Complex **. The left-hand side of the assignment operation is a Complex **, so the compiler must have decided that 'Initializer' is of type Complex (*)[3]. Once you work through C's inside-out type declarations, this means that the compiler is treating 'Initializer' as a pointer to an array of three objects of type Complex. We declared it as an array of arrays; the compiler is converting it to a pointer to an array, but refusing to convert it to a pointer to a pointer. In fact, that's the rule in both C and C++ — an array can only be treated as a pointer when it is the outermost type. Arrays buried on the inside of a type declaration cannot be converted to pointers.
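With a current compiler the rule can be checked directly. This sketch assumes the C++11 header <type_traits>, which did not exist when the question was asked; the typedef name Decayed and the three helper functions are illustrative only.

```cpp
#include <type_traits>

struct Complex { double re, im; };

// std::decay applies the array-to-pointer conversion to the outermost array only.
typedef std::decay<Complex[3][3]>::type Decayed;

// The array of arrays becomes a pointer to an array of three Complex...
bool decaysToPointerToRow()
{
    return std::is_same<Decayed, Complex (*)[3]>::value;
}

// ...not a pointer to a pointer...
bool decaysToPointerToPointer()
{
    return std::is_same<Decayed, Complex **>::value;
}

// ...and no implicit conversion leads from one to the other.
bool rowPointerConvertsToPointerToPointer()
{
    return std::is_convertible<Complex (*)[3], Complex **>::value;
}
```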
Knowing this, the way to fix line 9 is to declare the variable as a pointer to an array, rather than a pointer to a pointer. If we replace line 9 with the following

Complex (*c)[3] = Initializer; // line 9
the code compiles and runs correctly, both as C and as C++. Note that the parentheses in the type declaration are necessary. Without them, c would be an array of pointers, not a pointer to an array.
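The effect of the parentheses shows up in the sizes of the objects involved. In this sketch the variable names and helper functions are invented for illustration; only Complex comes from the question.

```cpp
#include <cstddef>

struct Complex { double re, im; };

Complex (*pointerToRow)[3];   // one pointer; it points at a whole row of three Complex
Complex *arrayOfPointers[3];  // three pointers; each points at a single Complex

std::size_t sizeOfArrayOfPointers() { return sizeof arrayOfPointers; }
std::size_t sizeOfOneRow()          { return sizeof *pointerToRow; }

// Incrementing a pointer to a row steps over the entire row.
std::ptrdiff_t bytesSkippedByIncrement()
{
    static Complex grid[3][3];
    Complex (*row)[3] = grid;
    return reinterpret_cast<char *>(row + 1) - reinterpret_cast<char *>(row);
}
```

Because row + 1 advances by a full row, row[i][j] lands on the same element that grid[i][j] does, which is what makes the pointer-to-array declaration a drop-in replacement for the array name.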
Once this correction has been made, adding the dynamic allocation back in is easy. In C we just need to call malloc and allocate the correct number of bytes:

Complex (*c)[3] = malloc( sizeof(Complex)*3*3 );
In C++, instead of calling malloc, we use operator new:

Complex (*c)[3] = new Complex[3][3];
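Putting the pieces together, a complete allocate, assign, and release cycle might look like the sketch below. The Complex class here is a stand-in with an assumed two-argument constructor, and roundTrip is a hypothetical helper written only to exercise the allocation.

```cpp
struct Complex
{
    double re, im;
    Complex( double r = 0.0, double i = 0.0 ) : re(r), im(i) {}
};

// Allocates a rows-by-3 grid, stores one element, reads it back, and frees the grid.
double roundTrip( int rows )
{
    // Only the first dimension may vary at run time; the 3 is part of the type.
    Complex (*c)[3] = new Complex[rows][3];

    Complex temp( 10.0, 1.0 );
    c[rows - 1][2] = temp;                  // last element of the last row
    double result = c[rows - 1][2].re + c[rows - 1][2].im;

    delete [] c;                            // array new is paired with array delete
    return result;
}
```

Note that the second dimension has to be a compile-time constant, since it is part of the type of c. If both dimensions must be chosen at run time, this declaration will not do and a different layout is needed.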
Q
Is anyone else as tired as I am writing set...() and get...() methods in C++ classes? Instead of having something such as:

class foo
{
    int i;
public:
    seti (const int i_new)
    {
        PRECONDITION (...);
        i = i_new;
    };
};
why can't we have something that goes:

class foo
{
readonly:
    int i;
public:
    i( const int i_new ) {...};
};
Basically, when the value of i is set using foo_object.i = ..., i( const int ) is called. If i( const int ) does not exist, then foo_object.i cannot be set since i is declared readonly. readonly declares a private member that is similar to a member declared const, but a readonly member can be modified with a "modifier," i(). i() does not need a return value since (I believe) the return value should always be the return value of foo.i = ..., according to the C++ specification today. (i is of course declared public or protected in this case.)
I propose this because I am very much against making class properties public. I like controlled access. What do you think?
A
I agree that the internal representation of a class's properties should not be made public. However, I don't think that making them read-only is a good solution. To see why not, let's begin at the beginning, with a simple C struct, wrapped in a typedef so that its name can be used in the same way as a C++ struct's name.

typedef struct
{
    int x;
    int y;
} Point;
In C we can create an object of this type and modify the object's internal data:

Point p;
p.x = 0; /* locate at origin */
p.y = 0;
If we want to move this object so that it refers to a different location we do so by assigning new values to the internal data:

p.x = 20; /* move to (20,20) */
p.y = 20;
One of the primary goals of C++ is to provide better mechanisms for encapsulation, so we may be inclined to rewrite this struct to encapsulate its internals:

class Point
{
public:
    Point( int x1, int y1 ) : x(x1), y(y1) {}
    void setX( int x1 ) { x = x1; }
    void setY( int y1 ) { y = y1; }
    int getX() const { return x; }
    int getY() const { return y; }
private:
    int x, y;
};
Now, instead of assigning directly to data members, users of Point call member functions:

Point p(0,0); // locate at origin
p.setX(20); // move to (20,20)
p.setY(20);
This version of Point provides better encapsulation because all of its operations are written as member functions. As a result, we can change the internal representation of a Point in such a way that users of the current class will not have to rewrite their code to use the new one. For example, suppose we change Point to use polar coordinates instead of rectangular coordinates. The new version looks something like this:

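#include <math.h>

class Point
{
public:
    Point( int x1, int y1 ) { reset( x1, y1 ); }
    void setX( int x1 ) { reset( x1, getY() ); }
    void setY( int y1 ) { reset( getX(), y1 ); }
    // theta is measured from the y axis, so x = r*sin(theta) and y = r*cos(theta).
    // Round to the nearest integer so that stored values survive the round trip.
    int getX() const { return toInt( radius*sin(theta) ); }
    int getY() const { return toInt( radius*cos(theta) ); }
private:
    double radius, theta;
    static int toInt( double v ) { return (int)floor( v + 0.5 ); }
    void reset( int x, int y )
    {
        radius = sqrt( (double)x*x + (double)y*y );
        if( x==0 && y==0 )
            theta = 0.0;
        else
            theta = atan2( (double)x, (double)y );
    }
};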
With this new version of Point old code will continue to run correctly, although a bit more slowly. By writing member functions instead of directly exposing data members we produce a class that greatly simplifies program maintenance. We have insulated users of the class from changes in its internal representation.
Adding a readonly keyword will allow us to write classes that prevent direct assignments to their data members, but such classes do not insulate their users from changes to the internal representation. For example:

class Point
{
readonly:
    int x, y;
};

Point p;
cout << p.x << endl;
What happens to this code if we change its internal representation from cartesian coordinates to polar coordinates? The readonly members x and y go away, and any code that used them must be rewritten. readonly may seem convenient when you are first writing and using a class, but ultimately it exposes implementation details of the class and leads to code that is hard to maintain.
Incidentally, if I were writing new code, I would not write a Point class with setX() and setY() member functions. As the comments in the code examples indicate, a Point has two fundamental properties: it describes a location, and that "point" can be moved. I would design the interface to a Point class to directly reflect these properties. It would look something like this:

class Point
{
public:
    Point( int x1, int y1 );
    void move( int x1, int y1 );
    int getX() const;
    int getY() const;
};
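One possible implementation of this interface, using the rectangular representation from earlier, is sketched below. The member function bodies and the helper movedPoint are assumptions added for illustration; the interface itself is the one shown above.

```cpp
class Point
{
public:
    Point( int x1, int y1 ) : x(x1), y(y1) {}
    void move( int x1, int y1 ) { x = x1; y = y1; }  // both coordinates change together
    int getX() const { return x; }
    int getY() const { return y; }
private:
    int x, y;
};

// Usage: locate a point at the origin, then move it to (20,20).
Point movedPoint()
{
    Point p( 0, 0 );
    p.move( 20, 20 );
    return p;
}
```

A single move() call also means the point never sits at an intermediate location such as (20,0), which is what happens between the calls to setX() and setY().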
Q
A simple example:

class A;
class B;

// file1.cpp
A foo1;

// file2.cpp
B foo2;
Sometimes the compiler first builds object foo1, sometimes foo2. How can you tell the compiler the order of object creation? How do you work around this C++ problem?
A
My usual answer to this question is "don't do that." Most of the time, when I'm tempted to create objects with this sort of interdependency, it's because they look clever, not because they represent the best solution to the problem. But I'm in a good mood today, so I'll concede that there are times when this sort of dependency is necessary. I'm willing to talk about how to make it work.
There's no portable way to tell the compiler to construct foo1 before foo2 except to put them into the same compilation unit. In that case, the compiler will construct foo1 and foo2 in the order of their appearance within the compilation unit. But that's not much help if you really need to have them in separate files.
Borland's compilers call constructors for global objects in the same order that the files occur on the linker command line. So, for example, the command line

bcc file1 file2
would produce an executable file in which foo1 was initialized before foo2, and

bcc file2 file1
would produce an executable file in which foo2 was initialized before foo1. Other compilers (including future versions of BCC) may do things differently, so check your compiler documentation if you're going to rely on this behavior. Oh, don't forget: if you put the resulting object files into a library and link with that library instead of the individual object files, all bets are off.
Let's look a little deeper, and see if we can come up with a better solution. For the purposes of this discussion, let's assume that this order dependency arises because the constructor for B calls a member function on the object foo1, like this:

extern A foo1;

B::B()
{
    foo1.Register(this);
}
Clearly, if foo1 has not been constructed, this code will not execute correctly. One very simple mechanism (which I picked up from Jerry Schwarz) for guaranteeing that foo1 has been constructed is to access it through a function call instead of accessing it directly:

A& GetFoo1();

B::B()
{
    GetFoo1().Register(this);
}
The function GetFoo1() contains a static variable:

A& GetFoo1()
{
    static A foo1;
    return foo1;
}
With this code, foo1 will be constructed the first time the function GetFoo1 is executed. This guarantees that foo1 will be constructed before any use, and also that foo1 will not be constructed if it is never used.
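Here is a small self-contained sketch of the technique collapsed into one file. The bodies of A and B are invented for illustration: in particular, the signature of Register and the registered() accessor are assumptions, since the question does not show them.

```cpp
class A
{
public:
    A() : count(0) {}
    void Register( const void * ) { ++count; }   // assumed signature
    int registered() const { return count; }     // hypothetical accessor for checking
private:
    int count;
};

A& GetFoo1()
{
    static A foo1;   // constructed the first time control passes through here
    return foo1;
}

class B
{
public:
    B() { GetFoo1().Register(this); }
};

B foo2;   // a global object: its constructor runs before main
```

By the time main starts, the constructor for foo2 has already forced foo1 into existence and registered itself with it, no matter which file the definition of GetFoo1 lives in.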
However, this technique is a bit more expensive than accessing foo1 directly, since it requires a function call each time. If this function call becomes a bottleneck in your program you may be tempted to turn it into an inline function. Be warned: static variables in inline functions are tricky, and your compiler might not do what you expect. I've tried this with both BCC and MSVC, and both ignore the inline declaration. The resulting code works correctly. Other compilers might produce a separate copy of foo1 in each compilation unit that uses GetFoo1. If that happens, this technique won't work. Check that it works correctly before you make GetFoo1 inline.
You can also use the "nifty counter" trick, which avoids the cost of the function call but introduces more overhead at startup. Some implementations use this technique to ensure that cout and the other standard streams get constructed properly before execution of any code that uses them. The nifty counter technique requires adding a new class to the header, declaring the object to be initialized, and creating an object of the new class, like this:

// FOO1.H
class Foo1Counter
{
    static unsigned long Foo1Initialized;
public:
    Foo1Counter()
    {
        if( Foo1Initialized++ == 0 )
            ConstructFoo1();
    }
    ~Foo1Counter()
    {
        if( --Foo1Initialized == 0 )
            DestroyFoo1();
    }
};

static Foo1Counter LocalFoo1Counter;
extern A foo1;
Every translation unit that needs to know about foo1 should include this header. The result is that every such translation unit gets a static object, LocalFoo1Counter, whose constructor checks whether foo1 has been constructed, and if it hasn't, constructs it. Similarly, LocalFoo1Counter's destructor gets rid of foo1 when destructors are run for the last translation unit that needs it. This technique avoids the overhead of a function call when foo1 is used, but imposes the cost of a constructor and destructor in every translation unit that might use foo1, even when it isn't actually used.
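The header does not show what ConstructFoo1 and DestroyFoo1 do. One common way to write them, sketched here in a single file, is to reserve raw storage and build the object with placement new. Everything beyond the Foo1Counter class is an assumption: the storage buffer, the pointer foo1Ptr (used here in place of the extern A foo1 declaration), the stand-in class A, the two instrumentation counters, and the use of alignas, which requires C++11.

```cpp
#include <new>

struct A                       // stand-in for the real class
{
    int registrations;
    A() : registrations(0) {}
    void Register( const void * ) { ++registrations; }
};

// Suitably aligned raw storage; no A lives here until ConstructFoo1 runs.
alignas(A) static unsigned char foo1Storage[sizeof(A)];
static A *foo1Ptr = 0;         // initialized with a constant, before any constructor runs
static int constructions = 0;  // instrumentation for this sketch only
static int destructions = 0;

void ConstructFoo1() { foo1Ptr = new (foo1Storage) A; ++constructions; }
void DestroyFoo1()   { foo1Ptr->~A(); foo1Ptr = 0; ++destructions; }

class Foo1Counter
{
    static unsigned long Foo1Initialized;
public:
    Foo1Counter()  { if( Foo1Initialized++ == 0 ) ConstructFoo1(); }
    ~Foo1Counter() { if( --Foo1Initialized == 0 ) DestroyFoo1(); }
};
unsigned long Foo1Counter::Foo1Initialized = 0;

// In a real program this line sits in the header, one copy per translation unit.
static Foo1Counter LocalFoo1Counter;
```

The counter and the pointer are both initialized with constants, so they are in place before any constructor runs. That is what lets whichever translation unit is initialized first construct the object safely.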
Another drawback to this approach is that it weakens locality of reference during program startup. Every translation unit that might need to reference foo1 accesses the counter Foo1Initialized while it is initializing its global objects. On a virtual memory system this means that the page that holds this counter will probably be in memory throughout program startup. If you have several of these in your program these extra pages might well increase the amount of memory needed for program startup beyond the amount of memory physically available in the system. That will force much more paging to disk than otherwise necessary, and program startup may become unacceptably slow.
Finding a simple, general method to control order of initialization has proven very difficult. Several very bright people on the ANSI/ISO C++ Committee have worked on it for three or four years. Every simple solution has had major flaws. One of the most important lessons that I've learned from this is that it's easy to find a solution that handles most of the problem cases; it's very hard to find one that handles them all. And handling most of the cases may be worse than not doing anything, especially if a program can create a "bad" case that's also hard to detect.
Q
Speaking as one dissatisfied user of C++ libraries, I would like to see name mangling standards incorporated into the proposed C++ ANSI standards. That way I wouldn't have to keep purchasing different versions of other people's libraries when I change compilers.
A
Name mangling is one of the least significant problems involved in creating a single library that can be used with multiple compilers. Consider that the C++ language definition also does not dictate how objects should be laid out in memory, how parameters are passed to functions, how virtual function calls should be implemented, or how virtual base classes should be implemented. As long as different compilers resolve these issues differently, differences in name mangling actually provide a major benefit: they make it much harder to accidentally link a program with a library that uses a different binary layout.
The C language also does not prescribe a standard for name mangling. Most vendors have settled on the convention of adding an underscore at the beginning of a name, but this is by no means required. If a single C library can be used with several different compilers it's because the compiler vendors have tried to use compatible binary formats. In C++ we're still exploring ways of implementing classes efficiently. It's almost certainly too soon to settle on a single way of doing things. Maybe in five years...
The questions above were taken from various online sources, including the Internet and CompuServe. To ask Pete a question about C or C++, send e-mail to pbecker@wpo.borland.com, use subject line: Questions and Answers; or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.