September 1996/Questions and Answers

Columns

Questions & Answers

Pete Becker

Little-Known Effects of Defining Constructors

To ask Pete a question about C or C++, send e-mail to pbecker@wpo.borland.com, use subject line: Questions and Answers, or write to Pete Becker, C/C++ Users Journal, 1601 W. 23rd St., Ste. 200, Lawrence, KS 66046.

Find out why constructors and initializers don't always get along, and when to disobey the gods of OOP.

Q
I've been self teaching C++ for about 1 1/2 years now, and I'm comfortable with most of the language features (I mean, it doesn't scare me on a daily basis, as it used to do some months ago). I've tried to compile the following code with two versions of Borland C++ (3.1 and 4.5), and it always give me the error message:
Error test.cpp 19: Objects of type
    'C' cannot be initialized with {}

test.cpp:
class A {
public:
     A( char c = '\0' ) : ch( c ) {}
     ~A() {}
     // other members...
private:
     char ch;
};

// this is Ok, no complaint
A vowels[5] = {'a','e','i','o','u'};
struct B {
     char  a, b;
};

// this, obviously, is also Ok
B a_B_struct = { 'a', 'b' };

struct C {
A a, b;
};

// compiler error !!
C a_C_struct = { 'a', 'b' };
The compiler accepts the array initialization for both built-in and user-defined types, calling the proper constructor for the last. With structs, however, I get the error message. The help for the message says: "C++ classes can only be initialized with constructors if the class has constructors, private members, functions, or base classes that are virtual." If so, how can the array be initialized with no error?

I've found no reasonable explanation for this "feature," but it seems to be contrary to the (so-called) "spirit of C++," by presenting inconsistent behavior between the language native types and user-defined types.

Is that a language feature a compiler bug? Am I doing anything wrong? Any hints would be greatly appreciated.
-- Marcos Capelini

A
You're right, the compiler doesn't like your struct A in a context where it would be happy with a built-in type. On the surface this compiler behavior is inconsistent. As Emerson reminds us, however, "A foolish consistency is the hobgoblin of little minds...."[1] So let's look for the reason for this inconsistency, and see if we think it's defensible.

Let's begin by figuring out what the rule here really is. I could start this sort of a discussion off with a citation from the ANSI/ISO working paper, but in reality, I usually do what all of you do: I start off by playing with the compiler to see what it likes and what it doesn't like. Then when I think I've figured out what rules the compiler is applying I go to the authoritative source and try to figure out what rules it says the compiler should be applying. In most cases they're the same.

So, let's start out with something that we know works in C: embedding a built-in type in a struct.
Struct s1
{
int i,j;
};

s1 s = { 1, 2 };
As I'm sure you expected, this is perfectly acceptable to my compiler. It's straight C code, initializing a static object.

Now let's replace the built-in type with a struct:
struct D
{
int a;
};

struct s2
{
D i,j;
};

s2 s = { 1, 2 };
This, too, is acceptable, although BC5 gives me two warnings:
Warning test.cpp 11: Initialization
    is only partially bracketed
Warning test.cpp 11: Initialization
    is only partially bracketed
If you count lines in my source code, you'll see that both warnings are occurring on the very last line, the one that creates our object of type s2 and initializes it. The warning is legitimate: s2 contains two structs, and the initialization can be made more specific by adding brackets to delineate the initializers for each of the structs:
s2 s = { {1}, {2} };
This will quiet the compiler down. The compiler issues this warning because the potential for error is high: when you don't use brackets to set off the initializers for each of the contained structs, you have to count initializers more carefully. Let's change D a bit to see what sort of problems can arise:
struct D3
{
int a;
int b;
};

struct s3
{
D3 i,j;
};

s3 s = { 1, 2 };
Now the warning is pointing us to a potentially serious problem: the two fields in s.i are being initialized, and the two fields in s.j are not. That's because in the absence of braces to tell the compiler what to do, the compiler initializes each of the items in the struct in the order it encounters them. The leftover fields are left uninitialized. If we really want to do the initialization this way we can leave it alone. If we want to initialize a member of each of the D3 objects, as we did in the previous examples, we need to use a fully bracketed initialization:
s3 s = { {1}, {2} };
This initializes s.i.a and s.j.a, and leaves s.i.b and s.j.b with their default initialization. Finally, if we want to explicitly initialize all the fields in s3, we can do so in two ways:
s3 sa = { {1,3}, {2,4} };
s3 sb = { 1, 3, 2, 4 };
These two initializers do exactly the same thing. It's a bit harder to tell what's happening with the second form, so I prefer the first. Both are valid in C and C++.

So, the problem your code is having is not because you're using a struct as a member of C. That still works. How is your struct A different from my struct D above? Well, for one thing, it has a constructor. Let's see if adding a constructor to D leads to problems:
struct D4
{
D4(int);
int a;
};

struct s4
{
D4 i,j;
};

s4 s = { 1, 2 };
Now I get an error:
Error test.cpp 12: Objects of type 's4' cannot be
initialized with { }
Does that look familiar? Apparently adding the constructor made the difference. Be careful, though: always dig a little deeper. Let's try a simpler version of this code:
struct D4
{
D4(int);
int a;
};

struct s4
{
D4 i,j;
};

s4 s;
Now I get a different error:
Error test.cpp 12: Compiler could not generate default
constructor for class 's4'
That's right: we've created a class definition that's completely invalid. We got distracted for a moment by our attempts to apply aggregate initialization, and forgot to look at whether we could create one of these things in the first place. So let's help the compiler out by providing a constructor for our outermost struct:
struct D5
{
D5(int);
int a;
};

struct s5
{
s5(int);
D5 i,j;
};

s5 s = { 1, 2 };
Now we're back to our original problem:
Error test.cpp 13: Objects of type 's5' cannot be
initialized with { }
Let's try one more thing: we can get rid of the explicit constructor in s5 if we give D5 a default constructor. Does that work?
struct D6
{
D6();
int a;
};

struct s6
{
D6 i,j;
};

s6 s = { 1, 2 };

Warning test.cpp 12: Initialization is only partially
bracketed
Warning test.cpp 12: Initialization is only partially
bracketed
Yes, it works. What if we provide a default constructor that does the same thing as the compiler-generated constructor, that is, it simply calls the default constructor for each of the D subobjects?
struct D7
{
D7();
int a;
};

struct s7
{
s7() : i(),j() {}
D7 i,j;
};

s7 s = { 1, 2 };

Error test.cpp 13: Objects of type 's7' cannot be
initialized with { }
So it looks like the compiler is happy with aggregate initialization of objects that do not have any user-defined constructors, but won't accept objects that have constructors. We haven't really determined that yet, though: we've only tried default constructors. What happens if we add a constructor that takes an int, so it matches the types that we're using in the initializer list?
struct D8
{
D8(int);
int a;
};
struct s8
{
s8(int);
D8 i,j;
};

s8 s = { 1, 2 };

Error test.cpp 13: Objects of type 's8' cannot be
initialized with { }
What about a fully bracketed initialization?
s8 s = { {1}, {2} };
Error test.cpp 13: Objects of type 's8' cannot be
initialized with { }
Let's try the last resort of the desperate: a cast.
s8 s = { s8(1), (2) };

Error test.cpp 13: Objects of type 's8' cannot be
initialized with { }
Still no luck. The rule seems to be ironclad: objects with user-defined constructors cannot be initialized through aggregate initialization. If you look it up in the ARM or in the ANSI/ISO working paper you'll see that that's what it says. You can't do it.

Of course, all this exploration does not get at the answer to the ultimate question: why can't you do it? The logic is quite simple: by adding a constructor to your class you've suggested that the default action that the compiler would have taken would not work correctly. That really shouldn't be much of a surprise: the same thing happens with the default constructor. If you define any constructors in your class the compiler won't generate a default constructor. Add this one to your list of rules: if you define any constructors in your class the compiler won't do aggregate initialization. It believes you when you tell it that it doesn't know enough about your class to do these things right.

[1] "... adored by little statesmen and philosophers and divines." This is from R.W. Emerson, Essays, "Self Reliance", (First Series, 1841). I do not, of course, intend to accuse those who seek consistency in programming languages of being in league with "little statesmen."

Q
In C++ a class can have public, protected, and private sections. You can give other classes or functions access to the protected and private stuff using friend. Is there an acquaintance? That is, if I said that some other class was an acquaintance of my class, the other class could do public (of course) and protected, but not private.

Actually, I'm pretty durn sure that this doesn't exist now in C++. Do you know if it's going to exist in the future? I know that with some finagling I can accomplish what I want to do. I'd rather do it the easy (but not yet existing) way :-)
-- Brenda Molina

A
No, there's no acquaintance, and there's not much chance of getting one. There has been no lack of proposed changes to the Standard's specifications of access specifiers, to provide finer control over who can access what. But for the most part, these proposals tend to be work arounds. That is, they attempt to work around design problems in specific programs, rather than provide broadly useful language changes. In such cases, it's better to fix your program design rather than hack the language definition.

The C++ language definition, whether you get it from the ARM or from the ANSI/ISO working paper, does not tell you how to use the things that it defines. That's something you have to figure out for yourself. Used properly, protected members provide a simple, powerful, and controllable mechanism for permitting changes to the behavior of a class. Make sure you understand how this mechanism works, and that you've used it properly, before you decide that it's too limiting.

Public members of a class C, of course, can be accessed by any code that can access an object of type C. They define the behavior that the class C implements. That is part of the contract between the designer and the user of the class C. When C has a member function named show, and show is documented to display the object on which it is called, you ought to be able to rely on that behavior in all objects of type C and in all objects of types derived from C. If you can't rely on that, someone has messed up the class hierarchy.

Protected members are used to help derived classes implement the documented behavior of the base class. Protected members typically do one of three different things: they provide access to important information about the base class that functions in the derived class may need; they provide implementations of the basic operations that derived classes may need; and they provide hooks so that derived classes can modify the details of how the base class behaves.

Let's take the classic example of a base class for objects that can be drawn on the screen:
class Drawable
{
public:
virtual ~Drawable();

void draw() const;
void erase() const;
void move( int x, int y );
protected:
    Drawable( int x, int y );

    virtual void do_erase() const = 0;
    virtual void do_draw() const = 0;

    int get_x_pos() const;
    int get_y_pos() const;
private:
    int x_pos;
    int y_pos;
};
void Drawable::draw() const
{
do_draw();
}

void Drawable::erase() const
{
do_erase();
}

void Drawable::move( int x, int y )
{
erase();
x_pos = x;
y_pos = y;
draw();
}
The documentation for this class should have two parts, one describing the public interface and one describing the protected interface. The public interface is available to users of the class and classes derived from it. This interface supports drawing a derived object, erasing a derived object, moving a derived object, and destroying a derived object. (Of course, the documentation would describe each of these operations in more detail.) The protected interface is available to designers and implementers of classes derived from Drawable. The protected interface supports creation of an object at a specified position through the constructor. It also enables those with acess permission to determine the object's current location, through get_x_pos and get_y_pos. The protected interface also provides hooks through do_erase and do_draw, so that the implementer of a derived class can implement erasing and drawing appropriately.

Let's look at a Circle class based on Drawable:
class Circle : public Drawable
{
public:
    Circle( int x, int y, int r );
protected:
    int do_draw() const;
    int do_erase() const;
private:
    int radius;
};

Circle::Circle( int x, int y, int r )
    : Drawable(x,y), radius(r)
{
}

int Circle::do_draw() const
{
draw_circle( get_x_pos(), get_y_pos(), radius, black );
}

int Circle::do_erase() const
{
draw_circle( get_x_pos(), get_y_pos(), radius, white );
}
The class Circle adds its constructor to the public interface inherited from Drawable. That is, Circle promises us that we can create a circle at a specified location with a specified radius, that we can draw and erase a circle, and that we can move a circle to a different location. To do these things, Circle uses protected members of Drawable for all three of the basic capabilities that I mentioned earlier. The constructor uses Drawable's constructor to pass the Circle's initial position down to Drawable. The functions do_draw and do_erase use get_x_pos and get_y_pos to get the basic data they need from Drawable, and they provide implementations of draw and erase that are appropriate for a Circle.

Now we can create a Circle and move it around:
int main()
{
    Drawable *object = new Circle( 0, 0, 10 );
    object->show();
    object->move( 5, 5, );
    object->erase();
    delete object;
    return 0;
}
There's a problem with Circle, though. Suppose you have an object manager that keeps track of Drawable objects, and tries to be smart about not drawing objects that are not currently on the screen. To do that it needs to know the positions of all of the Drawable objects. That seems simple enough: just have the object manager call get_x_pos and get_y_pos whenever it needs to know where an object is. The problem is, it can't, because these functions are protected. Your first reaction might be to make the manager class a friend of Drawable so that it can call these functions. But this is overkill, since manager will then have access to Drawable's private data as well. Wouldn't it be nice if you could say that the object manager can get at Drawable's protected members but not its private members? Alternatively, wouldn't it be nice to let the object manager call get_x_pos and get_y_pos but no other protected or private members?

Tinkering with access rights might seem like a good answer at first glance, but there's a much better solution: make get_x_pos and get_y_pos public. After all, you will almost certainly encounter other situations in which you need to find the location of a Drawable object. Make this change and the problem goes away, and we have a class with a broader and more usable public interface.

No doubt some of you will object at this point that I've set you up: making get_x_pos and get_y_pos protected instead of public is obviously wrong; you suspected it all along. I plead guilty, but with an explanation: it's easy to get so focused on the details of making your code work that you lose sight of the overall design. When you run into an access problem, think about your class's public interface and protected interface. Don't just add friend declarations or wish for fancier options for controlling access. Look at how you got into your current situation, and consider all the possible ways out. Often the best solution is to rethink the interfaces to your class. If the design is wrong, fix it.

Q
We are trying to develop a Windows application that has multi-lingual support. To do this we have implemented a Msgs class in C++ that reads in a text file as an array of strings (TArray<string>). Since almost every subclass in the application needs to generate error/information messages it seems inefficient to pass the Msgs instance down the tree to make its members accessible. Declaring the Msgs object as a global would break OO rules (wouldn't it?).

A colleague has tried using a pointer to the parent class containing the Msgs object. Since this pointer has a type defined by the parent then the class using the Msgs object needs to know that type, thus again breaking the OO rules.

Inheritance could be used but the subclasses are not forms of Msgs, but have Msgs as part of them. For example:
class Msgs
{
private:
    TArray<string> tasMsgs;
public:
	   string& operator[](int index);
};

class OtherClass
{
public:
    void error_out(int id)
    {
    cout << tasMsgs[id]; // an error since
                         // OtherClass doesn't
                         // know about tasMsgs
    }
};

class Parent
{
public:
    Msgs mTheMessages;
    OtherClass oAnObject;
};
Any suggestions? -- Tom Robinson

A
There's a saying among those who trek the back country: if the map doesn't match the terrain, trust the terrain. The same thing applies to the "rules" that we use to suggest good design and coding techniques. If the rules don't fit your application, make sure the application works right.

You've mentioned two "rules" of object-oriented programming: global variables are bad and class interdependencies should be minimized. Both are generally good rules; both should also be violated when appropriate, and not taken as absolutes.

Sometimes a program uses data that is truly global. If that's the case, make it global. That's often the case with error messages. Sure, you could define each message in the place where it's needed, but then you run into exactly the problem you're trying to solve: when you want to localize the application (that is, translate messages into a language appropriate for your customers), it's much easier to have all the messages together in one place. Let's take a look at your code rewritten with the messages stored globally:
class Msgs
{
public:
    Msgs();
    string operator[](int index) const;
};

const Msgs ErrorMessages;

class OtherClass
{
public:
    void error_out(int id)
    {
    cout << ErrorMessages[id];
    }
};
I've made several changes to the sketch of your class Msgs. I added a declaration of a default constructor; I assume this was part of your original class, and you left it out of your note to make the class definition shorter. I added the default constructor back in because it's important to remember that it's there: it's responsible for loading the messages from wherever they're stored, presumably in some file that the program knows about. I also removed the actual array of messages. When we're talking about program design we're not interested in the internal details of how a class does whatever it does. All that matters is that you can index into an object of type Msgs and get back a string. I changed the return type of the index operator from string& to string, and I've added a const qualifier to it, to indicate that using it won't modify the object that you're calling it on. That means that the object ErrorMessages which I've added can be a const object.

Maybe you've noticed my change puts a heavy emphasis on const-ness. There's a reason: one of the biggest dangers in using global data is getting confused about which functions change the data. You may find that you no longer know whether the object actually contains what you think it contains. If you forbid your code to modify your global data you can have a great deal more confidence in its integrity. Once the ErrorMessages object has been initialized no one is going to change it. Your messages are safe.

Even when making things like ErrorMessages const, you can still carry out important optimizations such as lazy evaluation. Suppose you don't want to take the time to read the message file at startup, but instead want to wait until the program needs to display an error message. You can do that and still insist that ErrorMessages is a const object. We refer to objects like this as logically const: calling any member functions on ErrorMessages produces the same results, regardless of the preceding sequence of calls to its members. You'll always get the same message when you look up the same index.

To implement this behavior, keep a flag inside Msgs to indicate whether you've read the message file, and check that flag in the index operator. Both the flag and the message array will be modified by the index operator on its first pass. Since I've marked the index operator as a const operation, we have to tell the compiler that it's okay for const member functions to modify these two data members. That's the job of the keyword mutable.
class Msgs
{
public:
    Msgs() : data_present(false) {}
    string operator[](int) const;
private:
    mutable bool data_present;
    mutable Tarray<string> messages;
};

string Msgs::operator[](int index) const
{
if( data_present == false )
    {
    // insert code to read message file
    data_present = true;
    }
return messages[index];
}
The other risk in using global data is that two or more modules may use the same name for different global objects. That's easily solved: just put the global objects into a namespace:
Namespace Errors
{
class Msgs
{
public:
    Msgs();
    string operator[](int index) const;
};

const Msgs ErrorMessages;
};
The construct
Namespace Errors
{
. . .
};
defines everything between the curly braces to be within the Errors namespace scope. To access names defined in this scope, code outside the scope must add the namespace name as a qualifier:
class OtherClass
{
public:
    void error_out(int id)
    {
    cout << Errors::ErrorMessages[id]; //qualifier req'd
    }
};
So far I've been assuming you have only one set of error messages to be used by all classes in your application. The techniques I've described work fine in that situation. If you want to break your error messages down into smaller groups to make them more manageable or more flexible you've got do to a bit more coding. Let's say you've designed your class OtherClass to be used as a member of two separate classes, Class1 and Class2. You want it to display one set of error messages when it is used in Class1 and a different set when it is used in Class2. This situation is fairly common, for example when OtherClass is some sort of helper class, providing services to Class1 and Class2. In that case, the program must report a failure in the helper class in terms that make sense for Class1 and Class2 respectively. You must be able to tell OtherClass which set of messages to use, and that means that you must give it a pointer or reference to a Msgs object holding the appropriate messages. Here's a sketch of how this would be done:
class Msgs
{
public:
    Msgs(const char *);
    string operator[](int index) const;
};
class OtherClass
{
public:
    OtherClass( const Msgs& msg)
        : ErrorMessages(msg) {};
    void error_out(int id)
    {
    cout << ErrorMessages[id];
    }
private:
    const Msgs& ErrorMessages;
};

class Class1
{
public:
    Class1() : ErrorMessages(
        "Class1msg.txt" );
          other(ErrorMessages) {}
private:
    const Msgs ErrorMessages;
    OtherClass other;
};

class Class2
{
public:
    Class2() : ErrorMessages(
        "Class2msg.txt" );
        other(ErrorMessages) {}
private:
    const Msgs ErrorMessages;
    OtherClass other;
};
In every instance of OtherClass we've replaced the global variable ErrorMessages with a reference to the object that holds its error messages. Now, whenever we create an object of type OtherClass we specify the object that it uses to get its error messages. This technique gives us the most flexibility in structuring our error messages. Note that we've done this without requiring OtherClass to know anything about the structure of Class1 or Class2. Rather, those classes know their own structure, and know that they need to give an object of type Msgs to OtherClass for its message handling. OtherClass has to know only about Msgs, which is information that it already had anyway.

You're right to be concerned about global data and about structural interdependencies. Don't overdo it, though. One of the biggest challenges in program design is to recognize when it's all right to break the rules.

Pete Becker is Senior Development Manager for C++ Quality Assurance at Borland International. He has been involved with C++ as a developer and manager at Borland for the past six years, and is Borland's principal representative to the ANSI/ISO C++ standardization committee.