August 2002/Using Constructed Types in C++ Unions

Using Constructed Types in C++ Unions

Kevin T. Manley

Toward a more perfect union.

The C++ Standard (C++98) states that a union cannot have a member with a non-trivial constructor or destructor. While this seems unreasonable at first, further thought makes it clear why this is the case:
class A { public: A() {} };
class B { public: B() {} };
union U { A a; B b; }; // rejected
U u; // which constructor should be 
     // called? A::A() or B::B()?
The crux of the problem is that unions don’t have built-in semantics for denoting when a member is the “current” member of the union. Therefore, the compiler can’t know when it’s appropriate to call constructors or destructors on the union members.

Still, there are good reasons for wanting to use constructed object types in a union. For example, you might want to implement a scripting language with a single variable type that can be an integer, a string, or a list. A union is the perfect candidate for implementing such a composite type, but the restriction on constructed union members may prevent you from using an existing string or list class (for example, from the STL) to provide the underlying functionality.

Luckily, a feature of C++ called placement new can provide a workaround. What I’ll show next is an idiom you can use to get the same effect as having object members in a union. The idea is that instead of declaring object members, you declare a raw buffer and instantiate the needed objects on the fly. To use the idiom, just follow these steps:
1. For each constructed object type, declare a typedef. I’ll illustrate the next steps with one integer member and two object members of types TYPE1 and TYPE2.

2. Declare a struct MYUNION with a protected union member U.

3. Give MYUNION an enumeration uniontype with the value NONE, plus one value for each type, both non-object and object. For instance:
enum uniontype { NONE, _INT, _TYPE1, 
                 _TYPE2};
4. Give MYUNION a member variable currtype of type uniontype. This is the discriminator that will keep track of the union’s current type. Define a constructor for MYUNION that sets currtype to NONE.

5. In union U, declare an unsigned character buffer buff with a size equal to that of the largest object type. If you want to have non-object types in the union, include them in U also. For example:
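union {
  int i; // non-object type
  // ?: keeps the array size a compile-time
  // constant; a function call such as
  // std::max() is not one in C++98
  unsigned char buff[
    sizeof(TYPE1) > sizeof(TYPE2) ?
    sizeof(TYPE1) : sizeof(TYPE2)];
} U;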
6. For each object type, give MYUNION a member function as follows:
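TYPE1& MYUNION::gettype1()
{
  if( currtype==_TYPE1 ) {
    // already a TYPE1: view the buffer as one
    return *
      (reinterpret_cast<TYPE1*>(U.buff));
  } else {
    cleanup(); // destroy previous object, if any
    // construct in place (requires <new>)
    TYPE1* ptype1 = new(U.buff) TYPE1();
    currtype=_TYPE1;
    return *ptype1;
  } // else
}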
This serves as the accessor for the given type and takes the place of referencing the object member directly by name. Because it returns a reference, the accessor can be used in place of a named member as either an l-value (left side of assignment operator) or r-value (right side of assignment operator). First, the function checks whether the “current type” of the union is the requested type. If it is, the function simply casts the buffer to that type and returns a reference to the existing object. If it isn’t, the function calls cleanup to destroy any previously constructed object member and then calls placement new to construct the requested type. Placement new doesn’t allocate any additional memory. Instead, it constructs the object within the previously declared raw buffer. Since the buffer is sized to the largest of the union’s object members, any of them fits within it, provided the buffer is also suitably aligned for the type being constructed. After constructing the object, the function updates the current type and returns a reference to the new object.
7. For each non-object type, give MYUNION a similar member function that returns a reference to the appropriate union member. For example:
int& MYUNION::getint()
{
  if( currtype==_INT ) {
    return U.i;
  } else {
    cleanup();
    currtype=_INT;
    return U.i;
  } // else
}
Note that you could treat non-object types the same as objects as shown in step 6, instantiating them in the raw buffer. But the generated code is a little more efficient if you handle the non-objects separately.

8. Give MYUNION a member function cleanup as follows, with one case statement for each constructed object type. You should make access to this function protected.
void MYUNION::cleanup()
{
  switch( currtype ) {
    case _TYPE1 : {
      TYPE1& ptype1 = gettype1();
      ptype1.~TYPE1();
      break;
    } // case
// ... repeat above for _TYPE2, _TYPE3, etc...
    default: break;
  } // switch
  currtype=NONE;
}
This function destroys the current object type, if any. The trick here is the explicit destructor call. Since placement new didn’t allocate memory, it wouldn’t work to call delete to deallocate the object. Instead, you have to explicitly invoke the destructor by name.

9. Finally, give MYUNION a destructor that just calls cleanup.
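
The two language features the steps above rely on, placement new and the explicit destructor call, can be seen in isolation in the following minimal sketch. The names Tracked, TrackedStorage, and placement_demo are hypothetical and exist only for illustration. The extra members of TrackedStorage are an assumption: a common way to encourage suitable alignment of the raw buffer, not a guarantee.

```cpp
#include <cassert>
#include <new>      // declares placement operator new
#include <string>

// Hypothetical helper type that counts live instances so that
// construction and destruction can be observed.
struct Tracked {
  static int live;
  std::string name;
  explicit Tracked(const std::string& n) : name(n) { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

// Raw storage for one Tracked. The fundamental-type members are an
// assumption: they nudge the union's alignment upward.
union TrackedStorage {
  unsigned char buff[sizeof(Tracked)];
  long   align_long;
  double align_double;
  void*  align_ptr;
};

// Constructs a Tracked in raw storage, then destroys it by hand.
// Returns the number of live objects seen while it existed.
int placement_demo()
{
  TrackedStorage storage;
  // Constructs in place; no storage is allocated for the Tracked.
  Tracked* p = new (storage.buff) Tracked("temp");
  assert(static_cast<void*>(p) == static_cast<void*>(storage.buff));
  int during = Tracked::live;
  p->~Tracked();  // explicit destructor call; never "delete p"
  return during;
}
```

Calling placement_demo() should report one live object while the Tracked exists, and Tracked::live should be back at zero afterward, because the destructor ran even though delete was never called.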

By using this idiom, you can create unions that freely mix primitive and object types. When you reference an object type by its accessor, it is automatically instantiated. Later, if you access a different object type in the union, the old object is automatically destroyed, and the new object instantiated in its place. I hope you find this technique useful. For a complete working example, see Listing 1.
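
As a rough illustration of how the nine steps fit together, here is a minimal sketch. It is not the author’s Listing 1. It assumes std::string and std::vector<int> as stand-ins for TYPE1 and TYPE2. The type() helper, the demo() function, the alignment members in U, and the disabled copy operations are additions made for this sketch. The enumerators are spelled T_INT, T_TYPE1, and T_TYPE2 because identifiers that begin with an underscore followed by an uppercase letter are reserved for the implementation.

```cpp
#include <cassert>
#include <new>      // placement new
#include <string>
#include <vector>

typedef std::string      TYPE1;  // assumed stand-in for an object type
typedef std::vector<int> TYPE2;  // assumed stand-in for an object type

struct MYUNION {
  enum uniontype { NONE, T_INT, T_TYPE1, T_TYPE2 };

  MYUNION() : currtype(NONE) {}
  ~MYUNION() { cleanup(); }

  uniontype type() const { return currtype; }  // hypothetical helper

  int& getint() {
    if (currtype != T_INT) { cleanup(); currtype = T_INT; }
    return U.i;
  }
  TYPE1& gettype1() {
    if (currtype != T_TYPE1) {
      cleanup();              // destroy the previous object, if any
      new (U.buff) TYPE1();   // construct in the raw buffer
      currtype = T_TYPE1;
    }
    return *reinterpret_cast<TYPE1*>(U.buff);
  }
  TYPE2& gettype2() {
    if (currtype != T_TYPE2) {
      cleanup();
      new (U.buff) TYPE2();
      currtype = T_TYPE2;
    }
    return *reinterpret_cast<TYPE2*>(U.buff);
  }

protected:
  void cleanup() {
    switch (currtype) {
      case T_TYPE1: gettype1().~TYPE1(); break;
      case T_TYPE2: gettype2().~TYPE2(); break;
      default: break;  // NONE and T_INT need no destructor
    }
    currtype = NONE;
  }

  uniontype currtype;  // discriminator
  union {
    int i;
    unsigned char buff[sizeof(TYPE1) > sizeof(TYPE2)
                       ? sizeof(TYPE1) : sizeof(TYPE2)];
    double align_double;  // assumption: encourages suitable alignment
    void*  align_ptr;     // assumption: same purpose
  } U;

private:
  // Assumption: copying is disabled, since a default copy would
  // duplicate raw bytes without running any copy constructor.
  MYUNION(const MYUNION&);
  MYUNION& operator=(const MYUNION&);
};

// Hypothetical usage: one variable holds each type in turn.
std::string demo()
{
  MYUNION v;
  v.getint() = 42;             // accessor used as an l-value
  int n = v.getint();          // ... and as an r-value
  v.gettype1() = "hello";      // string constructed in place
  std::string s = v.gettype1();
  v.gettype2().push_back(n);   // string destroyed, vector constructed
  assert(v.type() == MYUNION::T_TYPE2);
  return s;                    // v's destructor destroys the vector
}
```

Each accessor tests the discriminator once and falls through to a single return, which is equivalent to the if/else form shown in step 6. The copy operations are declared but never defined, so an accidental copy fails to compile or link.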

Kevin T. Manley is a software developer and consultant working in Seattle, WA. He’s currently working toward his Master’s degree in Computer Science at the University of Washington. You can reach him at kmanley@cs.washington.edu.