Inside Whidbey C++

C/C++ Users Journal November, 2004

The improvements in the next version of the Managed Extensions for C++

By Richard Grimes

Richard Grimes is the author of Programming with Managed Extensions for Microsoft Visual C++ .NET 2003 (Microsoft Press, 2003). He can be contacted at richard@richardgrimes.com.

When .NET was first released, Microsoft provided a version of its C++ compiler that could produce .NET managed code. Microsoft called this variant of the language the "Managed Extensions for C++," but colloquially it was known as "Managed C++." The emphasis in the first version of Managed C++ was on interoperation with native code and on creating .NET assemblies from any C++ that the Microsoft compiler could compile, whether that code was managed or unmanaged. That was a phenomenal feat. Little was done to the language itself: the required extensions were expressed as double-underscore-prefixed modifiers, and managed arrays used a different access syntax, but other than that, Managed C++ looked just like Standard C++ [1].

The next version of .NET, codenamed "Whidbey," provides a new version of the Microsoft C++ compiler. The most noticeable difference is that the C++ language used for managed code has changed. In this article, I present the new version of C++ that Visual Studio .NET 2005 will use to produce .NET managed code. You can try out these features by downloading the May Community edition of Visual Studio .NET 2005, which is available at no charge to MSDN subscribers; nonsubscribers must pay a shipping fee [2]. In addition, the full C++ compiler is available through the Express version of Visual C++, which does not include the unmanaged MFC or ATL libraries [3]. This site requires a .NET Passport and registration information. After registration, you are sent a confirmation e-mail with a link to the download site.

Reference and Value Types

The .NET runtime maintains its own heap for .NET objects, and objects created on this heap are known as "reference types." The managed heap is maintained by the .NET Garbage Collector (GC), and when memory runs low, the GC compacts the heap, moving objects as it does so. This means that the GC must be able to track every use of every .NET object allocated on the heap, which it does through a "handle." Managed C++ indicates that a variable or parameter is a handle to a managed object with the ^ symbol (the * symbol is used exclusively for pointers to unmanaged memory). Managed objects are allocated with a new operator, gcnew; the new operator is used exclusively for unmanaged C++ objects.

Reference types are defined using the ref class or ref struct keywords. Note that the keyword is ref class; that is, two words are used for a single keyword. The class or struct part of the keyword has the same meaning as it does in C++: it determines the default accessibility of the members. Members of reference types are accessed through the arrow operator (->) on a handle. .NET reference types can have a single base class but can implement any number of interfaces.

.NET has support for stack-allocated objects called "value types." These are allocated just as they are in unmanaged C++, by declaring a stack-based instance. Since these are not allocated on the managed heap, they are not tracked by the garbage collector. Value types are accessed directly, and variables do not use the handle syntax. Members of value types are accessed through the dot operator (.). Value types are defined using the value class or value struct aggregate keywords and, like reference types, they can implement any number of interfaces.

Managed C++ also provides a tracking reference, declared with the % symbol. This behaves as an alias for an object, much like a native C++ reference. A handle (^) can only refer to objects on the managed heap, but a tracking reference can refer to a handle, to an instance of a value type, or to an unmanaged object on the C++ free store or on the stack. For unmanaged and value types, a tracking reference essentially behaves like a native C++ reference.

For example, Listing 1 accesses the managed type StringBuilder, which is part of the .NET Framework and is used to construct managed strings. Since StringBuilder is a reference type, an instance is created with gcnew and accessed through a handle. The string is constructed using the current date and time through the static Now property of the DateTime value type. The Append method of the StringBuilder class is overloaded. In this code I use two versions: one takes a 32-bit integer and the other takes a managed string. For the latter, I pass literal values. In .NET, managed strings are always Unicode, but in this code the literal is not prefixed with a modifier to specify that it is a managed string (the first version of Managed C++ required an S prefix). The compiler uses the context to determine the type of the literal.

Properties

.NET properties are accessed as if they are data members, but they are implemented with accessor methods that let you define read-write, read-only, or write-only properties, depending on which accessor you implement. In fact, a property is just a piece of metadata that indicates the accessor methods. In the new version of Managed C++, properties are declared in property blocks, which is similar to how C# has always defined properties. However, the C++ syntax is more flexible.

Listing 2 is a class that has two properties, Name and Age. The Name property is read-write and is declared in a property block with a getter and setter method. The property declaration gives the type of the property and this type is also the return type of the getter and the type of the final parameter of the setter. The code simply accesses a field member of the class, but typically, accessor methods would calculate values or perform validation.

In this example, the setter returns void, which is the usual choice because a setter normally does nothing but set the property. However, because you declare the complete method header, you can instead have the setter return a value, which lets you use the property in a statement with chained assignments. The Name accessors share the accessibility of the property block (in this case, public), but you can put access specifiers inside the property block to give the getter and setter different accessibilities. These access specifiers are scoped to the property block.

The Age property gives an example of another way to declare a property. The Age property does not have a property block, so the compiler provides simple accessors. These access a private field that has the same type as the property and whose name is the property name prefixed with __backing_store_ (so, in this case, the field is called __backing_store_Age).

Events

.NET events are a notification mechanism [4]. A client provides a callback method through a delegate, and a delegate can contain a reference to a single method or to multiple methods. These methods can be static or they may be implemented on one or more objects. An event indicates that an object provides a notification through a specified delegate. In this respect, an event is like a property—it is just metadata. This metadata indicates the member methods that will be used to add a delegate to or remove a delegate from the event, and indicates the method that the event class can call to raise the event.

The simplest use of an event is to declare a delegate, then add an event member of this type to the class; in general, the event should be public. The compiler adds a delegate field to the class and generates methods to add delegates to and remove them from the field. The accessibility of these methods is determined by the access level of the event. The raise method is always protected, and the delegate field is always private.

Listing 3 is an example of using events. The Calculator class has an event called Completed that is raised to indicate that the work has finished. The InformMe class has two methods with the same signature as the CompleteHandler delegate, and the address of each is used to initialize a delegate; both delegates are added to the Completed event. The compiler calls the add method in response to the += operator. If the method is an instance method, you must call the two-parameter version of the delegate constructor to pass the object instance and a pointer to the method; in the new version of Managed C++, you can omit the object parameter if the delegate method is static. Furthermore, the new version of Managed C++ provides the + and - operators to create a third delegate by combining two delegates or by removing one delegate from another.

The compiler-generated add, remove, and raise methods are sufficient for most cases, but if you have a class with many events, you may decide that a delegate field for each event is inefficient. The new version of Managed C++ lets you declare an event block similar to a property block so that you can implement these event methods and give them different access levels.

Interfaces

An interface is a contract guaranteed by the type that implements it. The new syntax uses the aggregate keyword interface class or interface struct to declare an interface. There is no difference between the two: regardless of which you use, the members of an interface are always public. Interfaces cannot contain instance storage; apart from static members, they can hold only methods and metadata, which means methods, properties, and events.

A class that implements an interface must implement all of its members. In general, it does this by implementing methods with the same names as the corresponding interface members. However, in the new version of the compiler you can give the method a different name as long as you tell the compiler explicitly which interface member it implements. Listing 4 is an example of this. The class Foo implements two interfaces that have a member with the same name and signature. Foo handles this by providing two methods, each of which names the interface member it implements.

Managed interfaces in C++ now support static members: fields, methods, and constructors. This follows the ECMA .NET specification, and C++ is the only .NET language provided by Microsoft that supports this feature. Static members are associated with a type, whereas an interface declares members that a type must implement but that are accessed through an instance. An interface represents a behavior of an object and static members are a behavior of the type, so static members are a logical extension of the concept of an interface. In beta 1, VB.NET lets you access static interface members, but the compiler issues a warning; this may change in future betas. C# does not support calling static methods through interface references (I've been told that this is by design).

Implementing Classes

.NET supports single implementation inheritance. There is no concept of an access level for the base class: in .NET, inheritance is always public, so you can omit the public keyword when deriving from a base class. The runtime supports virtual methods, so you can use types polymorphically. Interface methods are implicitly virtual, but you can make a method explicitly virtual with the virtual keyword. When a derived class provides a method with the same name and signature as a base class virtual method, you have to indicate whether you want the derived class method to override the base class method or to be a new method. Apply the override keyword to get the Standard C++ behavior, in which the derived method pointer replaces the base class method pointer in the object's v-table; this gives the usual polymorphic behavior. .NET also lets you indicate that the derived method provides a new implementation with a new entry in the v-table. To do this, apply the new keyword to the method; then, if the method is called through a base class handle, the base class implementation is called.

Managed C++ lets you define constants. If you use the C++ const keyword, the compiler creates a constant field by inserting the IsConstModifier into the metadata for the item. However, this modifier only means something to the C++ compiler; other compilers ignore it. C++ provides two other ways to declare constants. The literal keyword can be applied to integer or string fields, and it implies that the field is static. A literal constant must be initialized in its declaration. The initonly keyword is less restrictive than literal because it can be applied to static fields of any type. Such fields must be initialized in the field's declaration or in a static constructor.

These three kinds of constants are treated differently by calling code. The value of a literal constant is stored in metadata; when the compiler compiles code that uses the constant, it copies the value from this metadata into the target assembly. A const constant behaves the same way if it is defined in the same assembly, but if it is defined in another assembly, the field is read from that assembly when the value is needed. In other words, the other assembly has a memory location that contains the constant value. If the client is C++, you will not be able to assign another value to this field; however, since other languages ignore the IsConstModifier modifier, they can alter the value of a const constant. initonly fields are initialized at some point during execution and they consume memory, so in all cases the value is accessed as a field, and the metadata of the client code will show the name of the constant. Like literal constants, initonly fields cannot be changed once they have been initialized.

Verifiable Assemblies

One of the great facilities of Managed C++ is its ability to compile native C++ into an assembly. This means that existing static libraries and C++ source code can be reused in .NET assemblies. The downside of this facility is that the managed code that is generated is not verifiable. When the .NET runtime loads an assembly, it goes through a series of checks, which include stepping through all of the intermediate language (IL) code to check that the code does not do anything unsafe, such as manipulating pointers. Previous versions of the Managed C++ compiler always marked assemblies as not verifiable so that the runtime skips verification. A nonverifiable assembly can only be used if it has full trust, which usually means that it is installed on the user's hard disk.

The next version of .NET introduces the next version of SQL Server (codenamed "Yukon") as a host of .NET code, and it will not run nonverifiable code, regardless of the source of the code. To address this, the next version of the Managed C++ compiler can generate verifiable assemblies. Of course, doing this requires that you follow some draconian rules.

In fact, C++ can produce three types of assemblies based on the parameter passed to the /clr command-line switch. To produce a verifiable assembly, you use /clr:safe, which means that you cannot use native code of any kind, including code in DLLs accessed through Platform Invoke and the C runtime library. The compiler passes the code it generates through the .NET verifier and if any nonverifiable code was generated, the verifier emits an error.

Most native C++ code can be compiled entirely to IL. The memory used by that code is not managed by the garbage collector, so the code does not produce managed objects. Such code is not verifiable because it still uses and manipulates pointers. An assembly that contains only IL is called a "pure" assembly, and code for a pure assembly is created with the /clr:pure switch. Pure assemblies can use native code in other DLLs through Platform Invoke and can use the C runtime library (CRT) through a special managed version of the library. The CRT uses global variables, and native code can have global objects or static members; however, .NET does not support global objects.

C++ gets around this issue by providing initialization code in a static constructor for the main module of the assembly. A .NET static constructor is guaranteed to be called sometime before the type is used. An assembly is made up of one or more modules and each module contains the types defined for the assembly, so a module static constructor is the ideal location for initialization code used by those types. In .NET, the unit of execution isolation is called an "application domain" and a process can have more than one application domain. The native global and static objects for pure assemblies (initialized by the module static constructor) are initialized for each application domain.

The final type of assembly is a mixed assembly, which can contain code that calls native code of any type. The code can access native code in DLLs through Platform Invoke, and it can use native C++ classes and code compiled into static libraries. A static library contains compiled x86 code, which is embedded in the assembly. You can use the CRT, but in this case global and static objects are initialized once for the entire process, regardless of how many application domains run in the process.

Generics

Managed C++ developers are privileged because not only can they create and consume generics, but they can also use C++ templates. A template is expanded to a specialized type at compile time and it is this type that is put in the final assembly. A generic type, on the other hand, is not "expanded" until execution time, so an assembly can export a generic type. At execution time, the .NET runtime creates a concrete type from the generic type, either through simple substitution (for reference type parameters) or by creating a new type (for parameters that are value types).

Templates, of course, offer useful features such as explicit specialization, partial specialization, default parameters, and the ability to use a template parameter as the base class of the templated type. Generics allow none of these. Generics are not Common Language Specification (CLS) compliant, but the other major .NET languages (VB.NET and C#) can define and consume them, so in essence, most developers can use generics.
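For contrast, here is a small template sketch in ISO C++ (which also compiles under /clr) showing the template features listed above: explicit specialization, partial specialization, a default template argument, and a template parameter used as a base class. Describe, Logged, and the strings they return are hypothetical.

```cpp
#include <string>

// Primary template: the fallback description.
template <typename T>
struct Describe {
    static std::string Name() { return "some type"; }
};

// Explicit specialization for int.
template <>
struct Describe<int> {
    static std::string Name() { return "int"; }
};

// Partial specialization for any pointer type; recurses on the pointee.
template <typename T>
struct Describe<T*> {
    static std::string Name() { return "pointer to " + Describe<T>::Name(); }
};

// A template parameter used as the base class, with a default argument.
template <typename Base = Describe<int> >
struct Logged : Base {
    std::string Log() const { return "logged " + Base::Name(); }
};
```

For example, Describe<char**>::Name() yields "pointer to pointer to some type", and Logged<>().Log() yields "logged int"; none of these four techniques can be expressed with a generic.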

Declaring a generic type is straightforward. In Listing 5, the class has a generic type parameter T that can be used throughout the class. T is used without the handle syntax (^), both in the declaration of the parameter and when you use the parameter in your code. When you access a member through an instance of T, you use the arrow (->) syntax, even though T may be substituted with a value type.

Such code can only use the Object members of T because those are the only members that T is guaranteed to implement. To get around this, you can declare a constraint on the parameter. Constraints are declared with a where clause that names the parameter and lists the required base class and/or interfaces. If client code tries to create a concrete type with arguments that do not satisfy the constraints, the compiler issues an error. Listing 6 shows a generic type, DisposableType, whose parameter must implement the IDisposable interface. In this case, the compiler happily compiles the Close method because instances of the generic parameter are guaranteed to implement the members of IDisposable.

Conclusion

The Whidbey release of Managed C++ offers new features that let you produce better .NET code than the other .NET languages, while still having the power of C++. This article is based on the first beta. The next beta is touted to have improvements to the Standard Library for managed code, which will make Managed C++ an outstanding language for .NET development.

References

[1] http://www.cuj.com/documents/s=8212/cujcnet0209hodapp1/0209k.htm (Nick Hodapp outlines the syntax of the first version of Managed C++).
[2] http://lab.msdn.microsoft.com/vs2005/get/default.aspx (This site requires a .NET Passport and registration information. After registration, you are sent a confirmation e-mail with a link to the download site.)
[3] http://lab.msdn.microsoft.com/express/visualc/default.aspx
[4] http://www.cuj.com/documents/s=8009/cuj0209smith/smith.htm (J. Daniel Smith explains delegates and events in the first version of Managed C++).