Examining Whidbey C++

Dr. Dobb's Journal December, 2004

The next version of Managed Extensions for C++

By Richard Grimes

Richard is principal architect with iDesign Inc. and the author of Programming with Managed Extensions for Microsoft Visual C++ .NET 2003 (Microsoft Press, 2003). He can be contacted at richard@idesign.net.
Where to Get Whidbey C++

Visual Studio 2005, the Whidbey release, will contain the next version of .NET. It will also contain a new version of C++ that provides access to the new features available in .NET. Although these new features—specifically generics—are welcome, they do not represent the biggest change to C++. The language has been changed to make it easier to use and write .NET code, to provide better access to .NET features, and to give you the full range of power and flexibility in .NET that C++ has always given in the unmanaged world. In this installment of a three-part article, I outline the new syntax of the language and compare it with the current version of Managed Extensions for C++.

Introducing the New Language

The biggest change you'll notice with Whidbey C++ is that the language has dropped the pointer syntax for object references and uses a handle instead. (For information on getting Whidbey C++, see the accompanying text box entitled "Where to Get Whidbey C++.") In the first version of Managed C++, objects on the managed heap were accessed through managed pointer syntax: __gc *. In the next version of C++, __gc * is replaced with the "^" symbol. When you declare a variable, or class member, with the ^ symbol, you are defining a handle that the garbage collector will use to track the usage of a managed object. Managed objects are created with the gcnew operator and the new operator will only create unmanaged C++ objects.

For example, Listing One shows some simple code to instantiate and access a StringBuilder object. Where you would have used __gc * and the new operator in the first version of the Managed Extensions, in the new version you use ^ and gcnew. However, members are still accessed through the -> operator. Note that the meaning of literal strings has changed. The first version assumed that a literal string without a prefix was an unmanaged char string, but in the new version the compiler chooses the string type from the context. In Listing One, Append takes a String, so the literal string is treated as a managed string. If the string literal is used to initialize a const char* parameter, the literal is treated as an ANSI string.

The first version of the compiler let you use a C++ reference (&) to a managed object. This syntax is no longer allowed; instead, a new symbol, % (called a "tracking reference"), has been introduced. Like a C++ reference, a tracking reference must be initialized, and you cannot assign it to null. Tracking references can only be used as stack variables or method parameters; they cannot be used as class members. You use tracking references much as you would use C++ references: When you change an object through a tracking reference, the underlying object is changed. A tracking reference to a handle (for example, String^%) used as a method parameter behaves like the ref keyword in C#: The method can access and change the object, and it can also make the caller's handle refer to another object.

Another change you'll notice is that the language now supports automatic boxing. For example, the DayOfYear property is an Int32, a value type, and in the following, I have used it as a parameter to the StringBuilder::AppendFormat method:

sb->AppendFormat("Today is {0} days "
"from the beginning of the year",
DateTime::Now.DayOfYear);

This won't compile with the first version of Managed C++ because the nearest overload takes a string and an object as parameters, and the value type has to be boxed to get an object. The first version used explicit boxing through the __box operator. The rationale was that boxing a value type creates a new object, so the operator made this explicit. However, it was a pain to remember to use this operator, and it made code more verbose and harder to read. The Whidbey version of C++ has implicit boxing, so the previous code compiles without an error.

Declaring Types

class and struct retain their original C++ meaning; that is, members of a class are private by default, whereas members of a struct are public. Both classes and structs can be reference or value types, which is in contrast to C#, which treats a class as a reference type and a struct as a value type. A reference type is always created on the managed heap but a value type is created on the stack or is a member of a reference type. In the first version of the compiler, the class declaration is decorated with the __gc modifier to indicate that it is a reference type, or with the __value modifier to indicate that it is a value type; in the new version, these modifiers become ref and value. Listing Two shows the declaration of two classes, and where __gc class would have been used in the current version of C++, ref class is used in the Whidbey version.

Enumerations have a new syntax, too. The older compiler required the __value modifier but the new version uses the syntax enum class:

enum class primary_color {red, blue, green};
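Because the enumerators of an enum class are scoped to the type, they must be qualified with the enumeration name, as in this sketch (assuming cl /clr; Next is a hypothetical helper).

```cpp
// Hypothetical example: compile with cl /clr
enum class primary_color { red, blue, green };

// Enumerators are always written as primary_color::name.
primary_color Next(primary_color c)
{
    switch (c)
    {
    case primary_color::red:  return primary_color::blue;
    case primary_color::blue: return primary_color::green;
    default:                  return primary_color::red;
    }
}
```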

Interface definition also has a new syntax: The language uses the interface class keyword. Similar to ref class and enum class, interface class is a single keyword that contains whitespace, and it mirrors the name used by MSIL when defining an interface. However, I am uneasy about this keyword because my C++ and COM background tells me that a class and an interface are two different things. An interface class is what the first version of the Managed Extensions called a __gc __interface; that is, an interface that will be implemented by a managed object. The term "interface class" implies a class that has interface semantics, but this is not the case because an interface reference is always a reference to an instance of a class: You cannot create an instance of an interface. For this reason, I would have preferred if the designers of the new language had used ref interface instead of interface class. A further confusion with the new keyword is that the compiler even lets you define an interface struct. Because the difference between a struct and class is the default accessibility of members, this would imply that the members of an interface class are private by default and those of an interface struct are public by default. This is not the case. All members of a managed interface have to be public and this means that there is no difference between an interface class and an interface struct. This confusion would be eliminated if ref interface was used instead of interface class or interface struct.

On the positive side, managed interfaces in C++ now support more of the interface features described by the ECMA specification. In particular, interfaces now support static members: fields, methods, and constructors. Static members are associated with a type rather than with specific instances, whereas an interface's instance members must be implemented by a class and are accessed through an instance of that class. Static members are therefore a logical extension of the concept of an interface: they give the interface type behavior of its own without adding anything that a class must implement. In the first version of the runtime, no language supported static members in interfaces, but the Whidbey version of C++ supports declaring them. Note, however, that C# does not support calling static methods through interface references, whereas in beta 1, VB.NET lets you access static interface members, although the compiler issues a warning.

Implementing Classes

There are several new keywords that you can use to implement your classes. Some, such as abstract, sealed, and finally, have double-underscored equivalents in the current version of the compiler; others are new keywords. C# has always had the new and override keywords to indicate how a virtual member is treated with respect to an inherited member with the same signature. C++ had only one behavior: to override the inherited member. This has changed in Whidbey with the C++ override and new keywords. The override keyword means that the derived method overrides a virtual method inherited from a base class, so the derived class method is called whether the method is called through a base class reference or through a derived class reference. The new keyword indicates that the derived class method is assigned a new slot in the v-table, and this new slot is used only when the method is called through a derived class reference. When the method is called through a base class reference, the old v-table slot is used, so the base class method is called. Listing Two uses these keywords.

C++ has always had the ability to declare constants. In the current version of the compiler, if you use the C++ const keyword on a field, the compiler creates a constant field and inserts the IsConstModifier into the metadata for the item. The const keyword tells the compiler that the field is a literal constant; that is, the value is held in, and accessed from, metadata. A literal constant is not a variable at all: It consumes no memory at runtime and requires no initialization. The new version of the compiler lets literal constants be declared using the literal keyword. The difference between the literal keyword and the const keyword is that literal does not add the IsConstModifier modifier to the metadata. literal can be applied to static fields that are integral types or strings.

In addition, C++ also provides the initonly keyword. This is less restrictive than literal because it can be applied to static fields of any type. Such fields must be initialized in the declaration of the field or in a static constructor. So in contrast to literal fields, initonly fields require initialization at some point during runtime and they consume memory. This keyword has the same effect as the C# readonly keyword.

Properties

A property in .NET is just a piece of metadata that indicates the accessor methods that give access to a value. Client code accesses the property as if it were a field, but the compiler uses the property metadata to determine the appropriate accessor method to call. The current version of C++ requires that you declare property accessor methods individually and decorate each with the __property modifier so that the compiler knows which methods are used to generate the metadata. This mechanism works well and is flexible because it lets you declare accessor methods that have different access levels. However, it means that in your class definition, the accessor methods are not associated with each other. In the Whidbey release, C++ gets property blocks that have all the advantages of C# property blocks as well as the advantages of C++ accessor methods.

Listing Three is a simple use of properties. In this class, the Name property is declared in a property block with a getter and a setter method. The property declaration gives the type of the property and this type is also the return type of the getter and the type of the final parameter of the setter. Even though the type of the property block appears to be redundant, C++ requires that you provide it and a complete method header for the accessor methods.

In this example, the setter returns void, which is the usual way to implement a setter because it is normally used only to set a value. However, the advantage of declaring the complete method header is that you can indicate that the setter returns a value, which lets you use the property in a statement that has multiple assignments. The Name accessors have the same accessibility, which they gain from the accessibility of the property block (in this case, public), but it is possible to apply an access specifier within the property block to give different accessibilities to the getters and setters. These access specifiers are scoped to the property block. Listing Four illustrates this. The Data property gives access to the data field and, because this field is private, it can only be accessed by members of the Base class. The Data property is declared as public so, by default, the accessors are publicly accessible. However, the setter is marked as protected, so the setter can only be accessed by derived classes, as shown by the constructor of the Derived class. This property is read-only to other classes, but derived classes have write access.

The Age property in Listing Three is an example of another way to declare a property. The Age property does not have a property block and so the compiler adds simple accessors. These access a private field that has the same type as the property and has the name __backing_store_ appended to the property name (so, in this case, the field is __backing_store_Age).

Events

Events, like properties, are just metadata. This metadata tells users of a class which member methods add or remove a delegate from the event, and which method the class calls to raise the event. The compiler generates these methods for you, as well as a delegate field to hold the delegates that are invoked when the event is raised. A delegate is a managed object that acts as a managed "function pointer" to a method that has the __clrcall calling convention; all methods on ref and value types have the __clrcall calling convention. Declaring and using delegates is essentially the same as with the current version of Managed C++, with a couple of differences.

As with the first version of C++, the Whidbey release requires that a delegate constructor is passed the object and a pointer to the instance method on that object that is called through the delegate. However, in the Whidbey release, if the method is static, you can omit the object parameter and the compiler automatically passes a null value for you. In the current version of the compiler, you have to pass zero for the object parameter when you want to call a static method.

Delegates contain a linked list of delegates, known as the invocation list. To add a delegate to the invocation list of another, you call the static method Delegate::Combine, which creates a new delegate object whose invocation list is the combination of the lists of the two delegates passed to it. The corresponding static method, Delegate::Remove, returns a delegate whose invocation list is that of the first delegate with the invocation list of the second removed. The Whidbey release provides the + and - operators as shorthand for calling these methods. In Listing Five, two delegates are combined using the + operator, then one of the delegates is removed using the - operator.

An event is a mechanism for an object to hold a delegate that can be invoked to notify other objects when an event has occurred. The event keyword in Whidbey behaves much the same as __event does in the current version of C++. As with properties, the big difference is that the keyword supports event blocks for declaring custom event methods. Again, similar to properties, you can give the event methods different access levels, but you should rarely stray from making the add and remove accessors public and the raise method protected.

If you don't provide an event block, the compiler generates the event methods for you and adds a delegate field to your class. However, there are compelling reasons to implement the event methods yourself. The first reason is memory: If your class has many events but few of them are likely to be used by users of the class, you get many delegate fields that take up memory but are never used. In this case, follow the example of the Windows Forms Control class, which implements custom add and remove methods based on the EventHandlerList class.

The reason for implementing the raise method is more important: The compiler-generated version merely checks whether the event delegate is null and, if it is not, invokes the delegate. It does not take threading into account (another thread could change the delegate while it is being invoked), and it does not take into account that all event handlers are invoked serially, so one lengthy event handler blocks the invoking thread and delays the invocation of the other handlers. Worse, if one errant handler throws an exception, the remaining event handlers are not invoked. All robust code that uses events provides a raise method that addresses these issues.

Operators

The .NET languages available before Whidbey let you write operators for types; however, the mechanism to write and use operators in Managed C++ was somewhat cumbersome. .NET has special names for operators (for example, op_Addition), and Managed C++ required that you use these special names rather than the equivalent C++ operator that you would use in your code (even though it's not a huge leap of faith for the compiler to assume that operator + is the same as op_Addition). Operators in the current version of C++ work fine for value types but have problems with reference types, because the language accesses objects through managed pointers, so the compiler assumes that the operator applies to the pointers rather than to the objects. This could be solved with appropriate use of C++ references to the managed objects, but it makes the code messy.

The Whidbey version of C++ uses handles to access objects, so the operators are applied to handles and not to pointers. The compiler also lets you use C++ syntax to define operators, and it generates methods with the appropriate standard .NET names. Managed operators are always static members of a class; they take the operands as parameters and return the result of the operation.

Listing Six shows an example of a class that represents a month. The increment operator method (operator ++) is used for both the C++ pre- and postincrement operators. In addition, I have overridden the ToString method to return a string describing the value of the object. Notice that Console::WriteLine calls this method automatically when I pass the object as a format argument.

Wrap Up

In this first installment of this article on the new version of C++, I've shown the new syntax for creating classes, instantiating objects, and accessing them. In the next installment, I'll outline the new switches for the compiler and linker.

DDJ



Listing One
using namespace System;
using namespace System::Text;
StringBuilder^ sb = gcnew StringBuilder;
sb->Append("Today is ");
sb->Append(DateTime::Now.DayOfYear);
sb->Append(" days from the beginning of the year");


Listing Two
using namespace System;
ref class Base
{
public:
   virtual void a(){Console::WriteLine("called Base::a");}
   virtual void b(){Console::WriteLine("called Base::b");}
};
ref class Derived : Base
{
public:
   virtual void a () override {Console::WriteLine("called Derived::a override");}
   virtual void b () new {Console::WriteLine("called Derived::b new");}
};
void main()
{
   Derived^ d = gcnew Derived;
   Console::WriteLine("call a through a Derived reference");
   d->a();
   Console::WriteLine("call b through a Derived reference");
   d->b();

   Base^ b = d;
   Console::WriteLine("call a through a Base reference");
   b->a();
   Console::WriteLine("call b through a Base reference");
   b->b();
}
Results:
call a through a Derived reference
called Derived::a override
call b through a Derived reference
called Derived::b new
call a through a Base reference
called Derived::a override
call b through a Base reference
called Base::b


Listing Three
using namespace System;
ref class Person
{
public:
   property String^ Name
   {
      String^ get() { return name; }
      void set(String^ n) { name = n; }
   }
   property int Age;
private:
   String^ name;
};


Listing Four
ref class Base
{
   int data;
public:
   property int Data
   {
      int get(){return data;}
   protected:
      void set(int x){data = x;}
   }
};
ref class Derived : public Base
{
public:
   Derived(int i)
   {
      Data = i;
   }
};


Listing Five
ref struct Test
{
   delegate void Del();
   void CalleeOne(){}
   void CalleeTwo(){}

   void Caller()
   {
      Del^ del1 = gcnew Del(this, &Test::CalleeOne);
      Del^ del2 = gcnew Del(this, &Test::CalleeTwo);

      // Create a delegate that will call both CalleeOne and CalleeTwo
      Del^ del3 = del1 + del2; // Calls Delegate::Combine
      // Create a delegate that will call just CalleeTwo
      Del^ del4 = del3 - del1; // Calls Delegate::Remove
   }
};


Listing Six
using namespace System;
ref class Month
{
   int month;
public:
   Month(int m) : month(m){}
   static Month^ operator ++ (Month^ m)
   {
      m->month = m->month == 12 ? 1 : m->month + 1;
      return m;
   }
   virtual String^ ToString () override
   {
      return String::Format("value is {0}", month);
   }
};
void main()
{
   Month^ m = gcnew Month(6);
   Console::WriteLine("before {0}", m);
   ++m;
   Console::WriteLine("after {0}", m);
}