October 1997/C++ Theory and Practice

Columns

C++ Theory and Practice

Dan Saks

Initializing and Copying Subobjects

C++ lets you omit many details about copying objects when you define a class but not all. You may be surprised to learn what's not done for you.

Copyright © 1997 by Dan Saks

In my column last month, I described the rules by which C++ generates copy constructors and copy assignment operators for classes that lack them. I also explained a few techniques you can use to prevent compilers from generating these functions when you don't want them. (See "C++ Theory and Practice: Workarounds for a Mistake," CUJ, September 1997.)

On those occasions when you write a copy constructor or copy assignment operator for a class, you must be careful to copy exactly those subobjects (bases and members) of the class that need to be copied. If you fail to specify how to copy some subobject, your compiler will decide how to do it for you. It might decide to do nothing. It depends on whether the copy operation is a constructor or assignment.

This month, I will look at the rules for initializing and copying subobjects in constructors and copy assignments. I'll also alert you to some of the surprises that lurk therein.

Initializing Bases and Members

The definition for a constructor for a class can specify initial values for base and member subobjects by using a ctor-initializer. Although I have used ctor-initializers in many of my columns over the years, I think it's worth taking a moment to look at them in greater detail.

Table 1 shows the general form of a ctor-initializer. Table 1 uses the EBNF grammar notation, summarized in Table 2. A ctor-initializer is a sequence of one or more mem-initializers preceded by a colon and separated by commas. Each mem-initializer has the form
identifier ( expression-list )
The expression-list is a sequence of zero or more expressions separated by commas.

The identifier in a mem-initializer must be the name of a base class or member of the constructor's class. For a base, or for a member of class type, the expression list becomes the argument list to a constructor for that base or member.

For example, suppose class D has base class B, members m and n, and a constructor defined as:
D::D(T *p, int i)
:   B(p, i), m(i), n(0)
    {
    }
B(p, i) is a mem-initializer whose identifier names base class B. The parenthesized expression list (p, i) is the argument list to a B constructor. If B has no constructor that will accept arguments p and i, the program is in error.

For a member of non-class type, the expression-list can have at most one expression. If the expression is present, it specifies that the program initializes the member by assigning it the expression's value. For example, the mem-initializer m(i) in the previous constructor definition initializes member m by assigning it the value of i. If there is no conversion from i's type (int) to m's type, the program is in error.

Default and No Initialization

A mem-initializer can omit the expression-list. A constructor can also omit the mem-initializer for some or all of its bases and members. For example, class D might also have a constructor defined as:
D::D(T *p)
:   B(p, 0), n()
    {
    }
Here, the mem-initializer n() has an empty expression list, and the mem-initializer or member m is missing entirely.

For a subobject with class type, a mem-initializer with an empty expression list has the same effect as omitting the mem-initializer entirely. In particular, they both invoke the default constructor for that subobject. Of course, if the subobject doesn't have a default constructor and the compiler can't generate one, the program is in error.

For a member of non-class type, an empty expression list is not the same as no mem-initializer at all. In bygone days they were the same, but they aren't any longer. For member m, the mem-initializer m() initializes m with the value zero converted to m's type. In contrast, a missing mem-initializer leaves m uninitialized.

(This distinction is also true in a new expression. For example,
p = new int;
creates an uninitialized int object.
p = new int();
creates an int whose value is zero.)

Setting Up for Some Examples

The easiest way to describe the rules for copying subobjects is with some simple examples. For these examples, I'll use a class D with a base class B and a couple of members m and n:
class D : public B
    {
    ...
private:
    M m;
    N n;
    };
Each of the types B, M, and N has a public default constructor, copy constructor, and copy assignment. For my examples, I don't really care what these functions do, but I do want to see whether they execute. Therefore, the body of each member function contains little more than an output expression that displays a message announcing that the function has executed.

Rather than crank out three nearly identical classes B, M, and N, I produced them from a single class template T shown in Listing 1. T has a non-type parameter c of type char, which represents the class name in the messages coming from the member functions. For example,
typedef T<'B'> B;
defines type B as a class with a public default constructor, copy constructor, and copy assignment. The effect of a declaration such as
B b;
is to display the message
B's default constructor
to standard output.

Listing 2 shows class D with base B and data members m and n. It declares no member functions, so the compiler generates a default constructor, a copy constructor and a copy assignment operator when needed by the program.

In main, the declaration
D d1;
gives the compiler reason to generate a default constructor for D. That default constructor simply calls the default constructors for base B and members m and n. The output from the declaration confirms the expected behavior:
B's default constructor
M's default constructor
N's default constructor
The second declaration in main
D d2(d1);
prompts the compiler to generate a copy constructor for D. That copy constructor calls the copy constructors for each of the D's bases and members, as demonstrated by the output:
B's copy constructor
M's copy constructor
N's copy constructor
The lone assignment
d1 = d2;
in main causes the compiler to synthesize a copy assignment operator for D. That copy assignment operator calls the copy assignments for each of D's bases and members. Again, the output clearly demonstrates the expected behavior:
B's copy assignment
M's copy assignment
N's copy assignment
Constructors with Missing Pieces

It appears that C++ follows a very simple rule for initializing and copying subobjects in generated functions, namely, that a generated constructor or assignment for a class invokes the corresponding function for each base and member of that class. On the surface it appears that, if you write one of these functions that would otherwise be generated, but fail to mention some subobject, then the compiler will act on that subobject as if it had generated the function. But that's not really how C++ works.

Listing 3 shows class D (from Listing 2) with an explicitly- defined default constructor, copy constructor, and copy assignment operator. The explicitly-defined default constructor
D::D()
    {
    }
behaves exactly like the generated one (from Listing 2) . That is, the declaration
D d1;
in main produces the same output as it did before.

All the bases and member of D have class type, so this default constructor is equivalent to
D::D()
:   B(), m(), n()
    {
    }
However, if any member, say m, had a non-class type, the default constructor in Listing 3 would leave m unitialized, while the one above would initialize m with zero.

The explicitly-defined copy constructor in Listing 3:
D::D(D const &d)
    {
    }
does not behave at all like the generated one. Rather than call the copy constructor for each subobject, it calls the default constructor. Thus, the declaration
D d2(d1);
in main produces the same output as the default constructor.

The only time a program will implicitly call the copy constructor for a subobject is in a generated copy constructor. C++ treats the subobjects of an explicitly-defined copy constructor as it treats the subobjects in any other explicitly-defined constructor. Once again, that treatment is:

For a subobject of class type, call the default constructor.

For a (member) subobject of non-class type, do nothing.

If you want to completely reproduce the behavior of the generated copy constructor, you must supply mem-initializers for every subobject, as in:
D::D(D const &d)
:   B(d), m(d.m), n(d.n)
    {
    }
An initializer such as m(d.m) is pretty straightforward. It initializes member m of *this by copying the corresponding member of object d. The meaning of the initializer B(d) is not so obvious. It initializes the base B subobject of *this by copying the base B subobject of d, as follows.

B has a copy constructor, effectively declared as
B(B const &b);
The mem-initializer B(d) binds formal parameter b to object d. Inside B's copy constructor, b refers to the B subobject of d. In effect, the reference binding in the mem-initializer quietly converts a D to a B.

Copy Assignments with Missing Pieces

The explicitly-defined copy assignment in Listing 3:
D &D::operator=(D const &d)
    {
    }
also does not behave like the generated one. You might reasonably expect it to assign the subobjects it fails to mention, but it simply ignores them instead. Thus, the assignment
d1 = d2;
in main does nothing. It generates no output at all.

If you want to reproduce the behavior of the generated copy constructor, you must write assignments for every subobject. Assignments for the members are simply
m = d.m;
n = d.n;
The assignment for the base class subobject is not quite as simple. The base class subobject is not a named member, so composing an expression that refers to that subobject requires a little cognitive effort. There are various ways to do it.

One way is with a cast expression, as in:
*(B *)this = *(B *)&d;
Here, the left-hand side converts this from a D * to a B * and then derefences the converted pointer. The result is the base class B subobject of *this. Similarly, the right-hand side obtains the B subobject of d. Since the entire left-hand operand has type B, this assignment calls B's assignment operator
B &B::operator = (B const &b);
You can achieve the same effect with reference casts instead of pointer casts, as in:
(B &)*this = (B &)d;
Actually, you don't really need the cast operator on the right- hand side. If you omit the cast, then the right-hand side has type D. This is still okay, because the right operand of B::operator= is a B const &, which will bind to the B part of d nonetheless.

On the other hand, you cannot dispense with the cast from the left-hand side. Writing
*this = d;
within the body of
D &D::operator=(D const &d);
generates a recursive call. The function will call itself until it overflows the call stack.

As long as you are writing a cast, you might as well use the new style. In this case, the appropriate cast operator is static_cast. Either
*static_cast<B *>(this) = d;
or
static_cast<B &>(*this) = d;
will do.

As an alternative to a cast, you can use an explicitly-qualified call. For an object x of class type X, an assignment such as
x = v;
compiles as
x.operator=(v);
As with any member function call, you can qualify the function name with a class name, as in
x.T::operator=(v);
Here, T must be class X (the class of x) or a direct or indirect base of X. The expression just above calls an operator= from class T or one of T's bases..

Since T::operator= is a member of T, its this pointer has type T *. (Actually, it's T *const, but the const is not critical to this discussion.) Therefore, the call
x.T::operator=(v);
passes the address of the T part of x as the this pointer of T::operator=.

Let's apply this to class D from Listing 3. Instead of writing the assignment of the base class part as
*static_cast<B *>(this) = d;
you can write it as
(*this).B::operator=(d);
which is equivalent to:
this->B::operator=(d);
However, every reference to a declared or inherited class member m from within a member function compiles as if it were written as this->m. Therefore, you can write the expression just above as simply
B::operator=(d);
In summary, if you want to reproduce the behavior of the generated copy assignment for class D in Listing 3, you should define the copy assignment as
D &D::operator=(D const &d)
    {
    B::operator=(d);
    m = d.m;
    n = d.n;
    return *this;
    }
An Interesting Pitfall

As is often the case in C++, a minor typographical error can produce astonishing behavior. Here's one of my favorites.

If you accidentally write only one colon instead of two, as in
D &D::operator=(D const &d)
    {
    B:operator=(d); // : instead of ::
    ...
    }
the code still compiles. It even starts running. It just doesn't stop for quite a while.

The single : (colon) turns B: into a statement label. C++, like C, maintains statement labels in a namespace separate from all other identifiers. A program can have a statement label with the same spelling as a class name.

There are no gotos to the label in the function body above, so the compiler does nothing with that label. The function call compiles as if it were simply
operator=(d);
This is equivalent to
*this = d;
which, as I mentioned earlier, is an infinitely recursive call. Is this a programming language, or what? o

Dan Saks is the president of Saks & Associates, which offers training and consulting in C++ and C. He is active in C++ standards, having served nearly seven years as secretary of the ANSI and ISO C++ standards committees. Dan is coauthor of C++ Programming Guidelines, and codeveloper of the Plum Hall Validation Suite for C++ (both with Thomas Plum). You can reach him at 393 Leander Dr., Springfield, OH 45504-4906 USA, by phone at +1-937-324-3601, or electronically at dsaks@wittenberg.edu.