C/C++ Users Journal January, 2005

C++/CLI by Example: Getting Started

Examining a new dialect of Standard C++

By Rex Jaeschke

Rex Jaeschke is an independent consultant, author, and seminar leader. He serves as editor of the Standards for C++/CLI, CLI, and C#. Rex can be reached at rex@RexJaeschke.com.

Welcome to the world of C++/CLI [1], a new dialect of Standard C++. In this and future articles, I examine how C++ has been extended to exploit the CLI (Common Language Infrastructure) platform [2]. Apart from a good dose of enthusiasm and time, you'll need a compiler. For purposes here, I'm using the Visual C++ 2005 Express Edition Beta [3]. (You should also get the latest draft of the C++/CLI Standard [4].) This compiler is in beta release. While it is updated from time to time, it might not support all the syntax described in the draft Standard. In addition, the online help/documentation isn't complete. In many cases, function signatures are shown in the Managed Extensions for C++ form, rather than C++/CLI. However, the price is right—it's free!

Assemblies and Metadata

The traditional C++ compilation model involves compiling each source file separately to object form, then linking all objects together along with library functions to make executables. The CLI model is quite different. It involves the creation and use of assemblies.

Simply stated, an assembly is the output from a single compilation, regardless of how many input source files are involved. If that output has an entry point (a main function, for example), it is an .exe file; if it does not, it's a .dll file. Any compilation that refers to something from outside the assembly being created must access that dependent assembly. There is no header-like mechanism to promise what will ultimately be available at link time. Such external information must be accessible during compilation by having the compiler look inside dependent assemblies.

An assembly contains metadata, which describes the types and functions contained therein, and instructions in the CIL (Common Intermediate Language) [5], which Microsoft calls "MSIL." The metadata and instructions are then loaded, and the instructions executed, by the platform-independent Virtual Execution System (VES) [5].

CLI Types

Listing 1 is a class that models a two-dimensional point. Various source lines (or blocks of lines) are labeled with comments of the form /*1*/, /*2*/, /*3a*/, and so on. These shall be referred to as "cases," as in Case 1, Case 2, Case 3(a), and so on.

Namespaces. All CLI Standard Library types reside in the namespace System or in namespaces nested inside that one. Examples are the types System::Object and System::String, and the namespaces System::IO, System::Text, and System::Runtime::CompilerServices. Case 1 removes the need to qualify names with their namespace.

Defining a Ref Class. In Case 2, we define a ref class called Point. A ref class is a CLI reference type. When taken together, ref and class (with intervening whitespace) make up a new keyword.

The public prefix indicates that this type is visible outside its parent assembly. (There are two kinds of visibility, public and private. By default, types have private visibility.) Only types can have visibility; as such, nonmember functions, global variables, and file-scope typedefs cannot be made visible outside their parent assembly.

As C++ programmers would expect, except for the default member accessibility, a ref struct is just like a ref class. Here I refer to both as ref classes.

Every ref class has a base type. If one is not explicitly specified, the default base is System::Object. A ref class can have only one base class.

Properties. Regardless of how a Point is represented internally, think of that point as having an X and a Y property. If the point actually uses Cartesian representation, the implementation of these properties is trivial. If it uses polar representation, that's more complicated, but is still a hidden implementation detail.

A scalar property is a member that provides field-like access to an instance. In Case 3(a), I define a property X with type int. The token property is a contextual keyword, not a globally reserved keyword (although the editor color codes it as if it were, which is not a bad thing). Its use is only reserved in this context.

A property can have a get accessor, a set accessor, or both. I simply call them the getter and the setter. The job of a getter, see Case 3(b), is to return the value of the given property (by retrieving it from some internal storage, computing it, or reading it from a file, for example). The job of a setter, see Case 3(c), is to set the value of the given property using the programmer-supplied value. These accessors are defined as separate functions with the names get and set, respectively (these names are not keywords); the getter must return, and the setter must take, the declared type of the property, in this case int. The getter and setter can have different accessibilities, although that can hinder language interop because other CLI languages may not support it.

A simple example of using the setter can be seen in the default constructor—Cases 5(b) and 5(c)—in which X and Y are set to zero. Note carefully that X=Y=0 cannot be used instead. Since the setter has a void return type, the subexpression Y=0 cannot occur inside another expression.
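The accessor pair and the void-returning setter can be mirrored in Standard C++. The following is only an analogue, not C++/CLI and not Listing 1: the names GetX, SetX, GetY, and SetY and the private members x_ and y_ are hypothetical, chosen for illustration.

```cpp
// Standard C++ analogue of a scalar property (assumed names, not Listing 1).
class Point {
    int x_;  // hidden representation; callers see only the accessors
    int y_;
public:
    // Two separate statements, mirroring the default constructor in the text.
    Point() { SetX(0); SetY(0); }

    int  GetX() const    { return x_; }    // plays the role of the getter
    void SetX(int value) { x_ = value; }   // plays the role of the setter

    int  GetY() const    { return y_; }
    void SetY(int value) { y_ = value; }
};

// SetX(SetY(0)) would not compile: SetY returns void, so its result
// cannot be used inside another expression, just as X=Y=0 cannot.
```

Because callers go through the accessors only, the class could switch to a polar representation later without changing any calling code.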

A scalar property can be made trivial simply by replacing its body with a semicolon. In this case, the compiler assumes that the property needs to be backed by storage, which it allocates and manages using a compiler-generated getter and setter. Since our scalar property really is trivial, it could have been defined in this manner.

Type Equality. For a ref class, equality is implemented via a function called Equals, as in Case 8(a), rather than by overloading operator==. As Point overrides System::Object::Equals, Point::Equals must be declared virtual and have the override function modifier. Again, the token override is a contextual keyword, not a reserved keyword. For this function to override the one in Object, it needs to take an Object as its parameter, not a Point.

Actually, the parameter has type Object^, which is read as "handle to Object" and points to an object on the managed (garbage-collected) heap. ("Handle" is a C++/CLI term; CLI actually calls such a thing a "reference," but C++ already has references, which are quite different.)

Experienced C++ class designers should notice there are two important things missing from this function's definition: The function is not const-qualified, and the parameter is not passed as a handle to const. Why is that? Member functions of ref classes cannot yet be const-qualified; the CLI has no notion of const-qualified functions. Declaring the parameter to be a handle to const makes it a different type, such that it would no longer be an override of System::Object::Equals. (Handles to const are permitted, but they can only be used within a C++/CLI context, and then never with any CLI Standard Library function, as the CLI has no notion of const. Future versions of C++/CLI will likely add full support for const, although again const will not be supported by other languages.)

In Case 8(b), I compare obj against nullptr. This keyword represents the null value constant. When used in the context of a handle, it represents the null handle—a handle that does not lead to an object. When used in the context of a pointer, it represents the null pointer—a pointer that does not contain an address.

To detect the (unusual) case of comparing something with itself, I compare obj with this in Case 8(c). In a nonref (that is, native) class, this is a pointer to the object on which the instance function was called, optionally with a const qualifier. In a ref class, this is a handle to the object on which the instance function was called. (Again, no const qualifier is permitted.) Just as the arrow operator (->) is used to access a member via a pointer, it is also used to access a member via a handle.

Equals must ensure that the two objects it's comparing have exactly the same type. You achieve this in Case 8(d) by calling System::Object::GetType, which "returns an instance of System::Type that represents the runtime type of the current instance. Two System::Type object references refer to the same object if, and only if, they represent the same type." Note that we are comparing two handles here, not two Type objects.

Once you know both objects have the same type, you can safely down-cast the Object handle to a Point handle and perform the data comparison without worrying about a type-mismatch exception; hence, I use static_cast.

Hash Codes. For hashtable data structures to work properly, objects they contain must have a function called GetHashCode. Basically, if a type defines Equals, it should also define GetHashCode, which overrides System::Object's version, as in Case 9. (I make no claims about the reasonableness of the hashing algorithm used.)
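The sequence of checks described for Equals, and its pairing with GetHashCode, can be sketched in Standard C++. This is an analogue under stated assumptions, not the code of Listing 1: the Object class below is a hypothetical stand-in for System::Object, pointers stand in for handles, and the hash formula is a guess that merely happens to reproduce the two hash codes (0 and 11) in the sample output shown later.

```cpp
#include <typeinfo>

// Hypothetical stand-in for System::Object.
class Object {
public:
    virtual ~Object() {}
    virtual bool Equals(const Object* obj) const { return this == obj; }
    virtual int GetHashCode() const { return 0; }
};

class Point : public Object {
    int x_;
    int y_;
public:
    Point(int x, int y) : x_(x), y_(y) {}

    bool Equals(const Object* obj) const override {
        if (obj == nullptr) return false;                  // nothing to compare with
        if (obj == this) return true;                      // comparing with itself
        if (typeid(*this) != typeid(*obj)) return false;   // must be exactly the same type
        // The type check above makes this down-cast safe.
        const Point* p = static_cast<const Point*>(obj);
        return x_ == p->x_ && y_ == p->y_;
    }

    // Assumed formula; equal points always produce equal hash codes.
    int GetHashCode() const override {
        unsigned ux = static_cast<unsigned>(x_);
        unsigned uy = static_cast<unsigned>(y_);
        return static_cast<int>(ux ^ (uy << 1));
    }
};
```

Unlike the C++/CLI version, this sketch can const-qualify both the member functions and the parameter, because Standard C++ has no CLI base-class signature to match.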

Value Formatting. Like equality, value formatting is implemented via a function that overrides one in System::Object, as in Case 10(a), rather than by overloading operator<<. This function, called ToString, is required to "create and return a string representation of the current instance." You achieve this by calling System::String::Concat to concatenate three string literals and two ints.

Clearly, Concat can't have a different overload for every possible combination of argument number and type. How then does Concat deal with these arguments? While there are overloads for common combinations, no overload takes more than four arguments. The overload used in this case is:

static String^ Concat(... array<Object^>^ list);

The ellipsis notation at the beginning of the final (in this case, the only) parameter declaration (which must have a managed array type) indicates that this parameter accepts an arbitrary number of arguments of the given element type. That is, it's a type-safe version of varargs, called a "parameter array." The parameter, list, is a handle to a managed array of handles to Object. (I'll look at parameter arrays and managed arrays in general in a future article.)
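For comparison, Standard C++ can offer a similar type-safe "any number of arguments" call through a variadic template. The sketch below is only an analogue: it is not C++/CLI, it says nothing about how System::String::Concat is implemented, and the helper name Append is hypothetical. One difference worth noting is that a template is expanded at compile time, whereas a parameter array packs its arguments into a managed array at run time.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to append.
inline void Append(std::ostringstream&) {}

// Recursive case: stream the first argument, then handle the rest.
template <typename T, typename... Rest>
void Append(std::ostringstream& os, const T& first, const Rest&... rest) {
    os << first;
    Append(os, rest...);
}

// Accepts any number of streamable arguments and joins their text forms.
template <typename... Args>
std::string Concat(const Args&... args) {
    std::ostringstream os;
    Append(os, args...);
    return os.str();
}
```

A call such as Concat("(", 5, ",", 7, ")") mixes string literals and ints in the same way the ToString function described above does.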

How do the two ints, X and Y, get converted to Object^? There is an implicit conversion from an expression of any primitive type to Object^. This process is called "boxing," and involves the allocation of an Object on the managed heap, with that object containing the value of the primitive. The reverse process is called "unboxing," and requires an explicit cast.

Naming Conventions. The CLI naming guidelines [4,5] specify that classes, functions, and properties have names written in PascalCase; that is, with the first letter of each word being capitalized. The CLI Standard Library follows this approach.

An Application

Listing 2 is a simple application that uses the Point class.

Allocating Managed Memory. In Case 1, I define a handle to type Point, and initialize it with the location returned from the operator gcnew, which allocates space on the managed heap for a new Point object. (gcnew is a keyword.) As you should expect, the default constructor is called.

In this release of C++/CLI, objects of ref class type can reside only on the managed heap or on the stack. Unlike other CLI languages, C++/CLI lets you author ref classes that can be passed and assigned by value, using a copy constructor or operator=. You can still implement a virtual (deep) assignment using the normal CLI convention of overriding a Clone function.

Formatted Output. CLI provides a family of I/O types whose functions are called explicitly using functional notation. The simplest ones are the System::Console Write and WriteLine (see Case 2) function overloads, which write out text to the standard output device. WriteLine appends a newline while Write does not.

There are numerous overloads of these functions; however, the most common form takes a format string containing text and optional format specifications, each of which is delimited by braces, followed by the arguments whose values are to be formatted. A specification of {0} corresponds to the first argument passed following the format string, one containing {1} corresponds to the second such argument, and so on. Like Concat, there are overloads that take a small number of fixed arguments as well as some that take a fixed plus a variable number of arguments. In this case, the following overload is used:

	static void WriteLine(String^ format,
		Object^ arg0, Object^ arg1);

The string literal is implicitly converted to String^. As p1 is a Point^, and Point is derived from Object, p1 is passed as is. GetHashCode returns an int, so that is boxed to Object^ before being passed. Once WriteLine gets control, it calls the second and third arguments' ToString functions and writes out the resulting strings. For completeness, here is the output:

	p1 = (0,0), p1's HashCode = 0
	p1 = (5,7), p1's HashCode = 11
	p1 Equals Point(9, 1) = False
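The way {0} and {1} select arguments by position can be illustrated with a small Standard C++ sketch. It is an analogue only: Format is a hypothetical helper, not part of any library and not how System::Console::WriteLine is implemented. It assumes the arguments have already been converted to strings, and it handles plain decimal indices only, with no alignment, format codes, or escaped braces.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Replaces each {N} in fmt with args[N]; anything else is copied unchanged.
std::string Format(const std::string& fmt,
                   const std::vector<std::string>& args) {
    std::string out;
    std::size_t i = 0;
    while (i < fmt.size()) {
        std::size_t close =
            (fmt[i] == '{') ? fmt.find('}', i) : std::string::npos;
        if (close == std::string::npos) {   // ordinary character or unmatched brace
            out += fmt[i++];
            continue;
        }
        // Parse the decimal index between the braces.
        std::size_t index = 0;
        bool valid = close > i + 1;
        for (std::size_t j = i + 1; j < close && valid; ++j) {
            if (fmt[j] < '0' || fmt[j] > '9') {
                valid = false;
            } else {
                index = index * 10 + static_cast<std::size_t>(fmt[j] - '0');
            }
        }
        if (valid && index < args.size()) {
            out += args[index];             // substitute the selected argument
            i = close + 1;
        } else {
            out += fmt[i++];                // not a usable placeholder: keep the brace
        }
    }
    return out;
}
```

Passing the strings "(5,7)" and "11" with the format "p1 = {0}, p1's HashCode = {1}" yields the second line of the output above.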

Garbage Collection. The memory referred to by the handle p1 resides on the managed heap, which is under the watchful eye of the garbage collector. When a handle goes out of scope, the object to which it referred has one fewer handle leading to it. Once no handles lead to an object, its memory can be reclaimed automatically. If a handle doesn't go out of scope for some time, yet you are no longer interested in the object to which it refers, you can release that handle's hold on the object by setting the handle to nullptr. There is no way to explicitly free managed memory; you can call delete on a handle, and that runs the destructor (Dispose function) immediately, but the memory will not be reclaimed until the garbage collector decides it needs to collect it.

Compiling the Code

To put Point and the main program in separate assemblies, you create two projects—project Point for the Point class, and project Main for the application program.

To create the Point project, select File|New|Project|Empty Project. (Do not choose "Class Library.") In the Solution Explorer in Source Files, right-click to Add|New Item|C++ File, and specify Point. To this file, add the source code from Listing 1, and save that file.

In the Solution Explorer, right-click on the project name Point, select Configuration Properties|General, and change Configuration Type to Dynamic Library. Then in Linker|Output File, change the .exe suffix to .dll.

(Although this is all done automatically if you chose the Class Library option, it would give you a bunch of other support files that you don't need.) Select Build and Point.dll is produced in the Point\debug folder.

Creating the Main project is much like creating the Point project, except that this new project is called "Main," and the source file is Main.cpp. (You can do this by running a second copy of the compiler; that way, you can work with both projects at the same time.) By default, selecting an Empty Project results in an .exe file, which is what you want. Since Main.cpp refers to the Point type, you need to tell the compiler where to find that type's parent assembly. To do that, in Solution Explorer, right-click on the project name Main, select Common Properties|References|Add New Reference|Browse, and navigate your way to the file Point.dll in the Point project folder created earlier. Select Add|OK, and OK. Select Build, and Main.exe is produced in the Main\debug folder. Execute the program. If you want the output window to persist, you must set a breakpoint at the closing brace of main.

Reader Exercises

Here are some things you might want to do to reinforce what I've presented:

Look at the names and purpose of the members in System::Object, System::Type, and System::String. In particular, look at the documentation for System::Object::Equals to see the rules by which equality should be determined, since any function overriding Equals should follow these rules.
Learn more about the format specifications available in System::Console::WriteLine.
Since every ref class inherits from System::Object, if the ToString function were omitted from Point, what would the application produce when writing out a Point?
Create an icon on the desktop for the program ildasm.exe that resides in the VC++ Express installation directory. This program is a disassembler; you can use it to examine any assembly.
Run ildasm against the Point.dll and Main.exe assemblies, and check out the various menu options. For the Point assembly, confirm that the actual names of the property X accessor functions are really get_X and set_X.
Change the Point class to use trivial properties for X and Y, and use ildasm to inspect the private backing storage and accessors.
The CLI Standard Library types are contained in three assemblies, mscorlib.dll, System.dll, and System.Xml.dll, all of which reside in the system directory C:\Windows\Microsoft.NET\Framework\v2.0.40607 (or some such). Use ildasm to inspect the type [mscorlib]System::Object, for example. (The notation [...] indicates the type's parent assembly.)

References

[1] CLI stands for "Common Language Infrastructure," the subset of .NET that was standardized by Ecma Technical Committee TC39/TG3, and adopted by ISO/IEC.

[2] .NET is the name of a Microsoft product that is a superset of the CLI Standard. Another implementation of the CLI is Mono, from Novell/Ximian, which runs on Windows and Linux. See http://www.mono-project.com/about/index.html.

[3] http://lab.msdn.microsoft.com/express/visualc/. It is updated periodically as new features from the draft Standard are implemented.

[4] http://www.plumhall.com/ecma/index.html. (Tom Plum is convener of the C++/CLI Standards committee.)

[5] CIL and VES are part of the CLI Standard, ECMA-335 (http://www.ecma-international.org/publications/index.html).