C/C++ Users Journal January, 2005
Welcome to the world of C++/CLI [1], a new dialect of Standard C++. In this and future articles, I examine how C++ has been extended to exploit the CLI (Common Language Infrastructure) platform [2]. Apart from a good dose of enthusiasm and time, you'll need a compiler. For purposes here, I'm using the Visual C++ 2005 Express Edition Beta [3]. (You should also get the latest draft of the C++/CLI Standard [4].) This compiler is in beta release. While it is updated from time to time, it might not support all the syntax described in the draft Standard. In addition, the online help/documentation isn't complete. In many cases, function signatures are shown in the Managed Extensions for C++ form, rather than C++/CLI. However, the price is right: it's free!
The traditional C++ compilation model involves compiling each source file separately to object form, then linking all objects together along with library functions to make executables. The CLI model is quite different. It involves the creation and use of assemblies.
Simply stated, an assembly is the output from a single compilation, regardless of how many input source files are involved. If that output has an entry point (a main function, for example), it is an .exe file; if it does not, it's a .dll file. Any compilation that refers to something from outside the assembly being created must access that dependent assembly. There is no header-like mechanism to promise what will ultimately be available at link time. Such external information must be accessible during compilation by having the compiler look inside dependent assemblies.
An assembly contains metadata, which describes the types and functions contained therein, and instructions in the CIL (Common Intermediate Language) [5], which Microsoft calls "MSIL." Together, the metadata and instructions can then be executed by the platform-independent Virtual Execution System (VES) [5].
Listing 1 is a class that models a two-dimensional point. Various source lines (or blocks of lines) are labeled with comments of the form /*1*/, /*2*/, /*3a*/, and so on. These shall be referred to as "cases," as in Case 1, Case 2, Case 3(a), and so on.
Namespaces. All CLI Standard Library types reside in the namespace System or in namespaces nested inside that one. Examples are the types System::Object and System::String, and the namespaces System::IO, System::Text, and System::Runtime::CompilerServices. Case 1 removes the need to qualify these names with their namespace.
Defining a Ref Class. In Case 2, we define a ref class called Point. A ref class is a CLI reference type. When taken together, ref and class (with intervening whitespace) make up a new keyword.
The public prefix indicates that this type is visible outside its parent assembly. (There are two kinds of visibility, public and private. By default, types have private visibility.) Only types can have visibility; as such, nonmember functions, global variables, and file-scope typedefs cannot be made visible outside their parent assembly.
As C++ programmers would expect, except for the default member accessibility, a ref struct is just like a ref class. Here I refer to both as ref classes.
Every ref class has a base type. If one is not explicitly specified, the default base is System::Object. A ref class can have only one base class.
Properties. Regardless of how a Point is represented internally, think of that point as having an X and a Y property. If the point actually uses Cartesian representation, the implementation of these properties is trivial. If it uses polar representation, that's more complicated, but is still a hidden implementation detail.
A scalar property is a member that provides field-like access to an instance. In Case 3(a), I define a property X with type int. The token property is a contextual keyword, not a globally reserved keyword (although the editor color codes it as if it were, which is not a bad thing). Its use is only reserved in this context.
A property can have either or both a get accessor and a set accessor. I simply call them the getter and the setter. The job of a getter, see Case 3(b), is to return the value of the given property (by retrieving it from some internal storage, computing it, or reading it from a file, for example). The job of a setter, see Case 3(c), is to set the value of the given property using the programmer-supplied value. These accessors are defined as separate functions with the names get and set, respectively, and they must return and take, respectively, the declared type of the property, in this case, int. (These names are not keywords.) The getter and setter can have different accessibilities, although that can hinder language interop because other CLI languages may not be able to support that.
A simple example of using the setter can be seen in the default constructor, Cases 5(b) and 5(c), in which X and Y are set to zero. Note carefully that X=Y=0 cannot be used instead. Since the setter has a void return type, the subexpression Y=0 cannot occur inside another expression.
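The void-returning setter can be illustrated without a CLI compiler. The following is only a rough analogue in standard C++, which has no property syntax: the accessors are spelled out as ordinary member functions, and the names get_X, set_X, get_Y, and set_Y are hypothetical, chosen for this sketch rather than taken from Listing 1.

```cpp
#include <type_traits>
#include <utility>

// Native C++ stand-in for a ref class with scalar properties X and Y.
// The accessor names are assumptions made for this sketch.
class Point {
public:
    Point() {
        set_X(0);  // counterpart of X = 0;
        set_Y(0);  // counterpart of Y = 0;
        // set_X(set_Y(0)); would not compile: set_Y yields no value,
        // which mirrors why X = Y = 0 is rejected for properties.
    }

    int  get_X() const { return x_; }      // getter: returns the property type
    void set_X(int value) { x_ = value; }  // setter: takes it, returns void
    int  get_Y() const { return y_; }
    void set_Y(int value) { y_ = value; }

private:
    int x_;
    int y_;
};

// The setter call is a void expression, so it cannot feed another assignment.
static_assert(std::is_void<decltype(std::declval<Point&>().set_Y(0))>::value,
              "setter returns void");
```

Being native code, the getters here are const-qualified, something a ref class member function cannot be.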
A scalar property can be made trivial simply by replacing its body with a semicolon. In this case, the compiler assumes that the property needs to be backed by storage, which it allocates and manages using a compiler-generated getter and setter. Since our scalar property really is trivial, it could have been defined in this manner.
Type Equality. For a ref class, equality is implemented via a function called Equals, as in Case 8(a), rather than by overloading operator==. As Point overrides System::Object::Equals, Point::Equals must be declared virtual and have the override function modifier. Again, the token override is a contextual keyword, not a reserved keyword. For this function to override the one in Object, it needs to take an Object as its parameter, not a Point.
Actually, the parameter has type Object^, which is read as "handle to Object" and points to an object on the managed (garbage-collected) heap. ("Handle" is a C++/CLI term; CLI actually calls such a thing a "reference," but C++ already has references, which are quite different.)
Experienced C++ class designers should notice there are two important things missing from this function's definition: The function is not const-qualified, and the parameter is not passed as a handle to const. Why is that? Member functions of ref classes cannot yet be const-qualified; the CLI has no notion of const-qualified functions. Declaring the parameter to be a handle to const makes it a different type, such that it would no longer be an override of System::Object::Equals. (Handles to const are permitted, but they can only be used within a C++/CLI context, and then never with any CLI Standard Library function, as the CLI has no notion of const. Future versions of C++/CLI will likely add full support for const, although again const will not be supported by other languages.)
In Case 8(b), I compare obj against nullptr. This keyword represents the null value constant. When used in the context of a handle, it represents the null handle: a handle that does not lead to an object. When used in the context of a pointer, it represents the null pointer: a pointer that does not contain an address.
To detect the (unusual) case of comparing something with itself, I compare obj with this in Case 8(c). In a nonref (that is, native) class, this is a pointer to the object on which the instance function was called, optionally with a const qualifier. In a ref class, this is a handle to the object on which the instance function was called. (Again, no const qualifier is permitted.) Just as the arrow operator (->) is used to access a member via a pointer, so too is it used to access a member via a handle.
Equals must ensure that the two objects it's comparing have exactly the same type. You achieve this in Case 8(d) by calling System::Object::GetType, which "returns an instance of System::Type that represents the runtime type of the current instance. Two System::Type object references refer to the same object if, and only if, they represent the same type." Note that we are comparing two handles here, not two Type objects.
Once you know both objects have the same type, you can safely down-cast the Object handle to a Point handle, and perform the data comparison without worrying about a type-mismatch exception; hence, I use static_cast.
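The four steps of Equals (null check, identity check, exact-type check, then cast and compare) carry over to native C++ almost unchanged. The sketch below is an analogue only, not Listing 1: Object is a hypothetical stand-in for System::Object, raw pointers stand in for handles, and typeid plays the role of GetType. Because this is native code, it can also use the const qualifiers that the ref class version must omit.

```cpp
#include <typeinfo>

// Hypothetical native stand-in for System::Object.
struct Object {
    virtual ~Object() = default;
    // Default behavior: identity comparison.
    virtual bool Equals(const Object* obj) const { return this == obj; }
};

class Point : public Object {
public:
    Point(int x, int y) : x_(x), y_(y) {}

    bool Equals(const Object* obj) const override {
        if (obj == nullptr) return false;   // nothing to compare against
        if (obj == this) return true;       // comparing an object with itself
        if (typeid(*this) != typeid(*obj))  // must be exactly the same type
            return false;
        // The type check above makes this unchecked down-cast safe.
        const Point* p = static_cast<const Point*>(obj);
        return x_ == p->x_ && y_ == p->y_;
    }

private:
    int x_;
    int y_;
};
```

Comparing a Point against a plain Object, or against any other type derived from Object, stops at the typeid test and returns false before the cast is reached.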
Hash Codes. For hashtable data structures to work properly, objects they contain must have a function called GetHashCode. Basically, if a type defines Equals, it should also define GetHashCode, which overrides System::Object's version, as in Case 9. (I make no claims about the reasonableness of the hashing algorithm used.)
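The rule that equality and hashing travel together also holds for the hashed containers in standard C++, which makes a convenient native analogue. In the sketch below, HashCode is a hypothetical free function, and its formula is an assumption: it was picked only because it gives 0 for (0,0) and 11 for (5,7), the two hash codes that appear in the sample output for Listing 2. It is not necessarily the algorithm used in Listing 1.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Plain native point type; equality compares both coordinates.
struct Point {
    int x;
    int y;
    bool operator==(const Point& other) const {
        return x == other.x && y == other.y;
    }
};

// Assumed formula (not confirmed by the article): equal points always
// produce equal hash codes, which is the property a hash table requires.
inline int HashCode(const Point& p) {
    return p.x ^ (p.y << 1);
}

// Hook the hash into the standard containers, much as overriding
// GetHashCode hooks a type into CLI hash tables.
namespace std {
template <>
struct hash<Point> {
    size_t operator()(const Point& p) const noexcept {
        return static_cast<size_t>(HashCode(p));
    }
};
}  // namespace std
```

With both pieces in place, a std::unordered_set<Point> finds an element by value: two distinct Point objects holding (5,7) hash to the same bucket and compare equal.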
Value Formatting. Like equality, value formatting is implemented via a function that overrides one in System::Object, as in Case 10(a), rather than by overloading operator<<. This function, called ToString, is required to "create and return a string representation of the current instance." You achieve this by calling System::String::Concat to concatenate three string literals and two ints.
Clearly, Concat can't have a different overload for every possible combination of argument number and type. How then does Concat deal with these arguments? While there are overloads for common combinations, no overload takes more than four arguments. The overload used in this case is:
static String^ Concat(... array<Object^>^ list);
The ellipsis notation at the beginning of the final (in this case, the only) parameter declaration (which must have a managed array type) indicates that this parameter accepts an arbitrary number of arguments of the given element type. That is, it's a type-safe version of varargs, called a "parameter array." The parameter, list, is a handle to a managed array of handles to Object. (I'll look at parameter arrays and managed arrays in general in a future article.)
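Standard C++ reaches a similar goal, one function accepting any number of arguments without giving up type checking, by a different route: variadic templates. The sketch below is an analogue of the idea only, not of the CLI mechanism. Nothing is packed into an array or converted to Object; each argument keeps its own static type and is written to a string stream. Concat and AppendAll are hypothetical names for this sketch.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to append.
inline void AppendAll(std::ostringstream&) {}

// Recursive case: append the first argument, then the rest.
template <typename First, typename... Rest>
void AppendAll(std::ostringstream& out, const First& first,
               const Rest&... rest) {
    out << first;
    AppendAll(out, rest...);
}

// Accepts any number of streamable arguments and returns them
// concatenated in order.
template <typename... Args>
std::string Concat(const Args&... args) {
    std::ostringstream out;
    AppendAll(out, args...);
    return out.str();
}
```

A call such as Concat("(", 5, ",", 7, ")") mixes three string literals with two ints, as ToString does, and yields the string (5,7).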
How do the two ints, X and Y, get converted to Object^? There is an implicit conversion from an expression of any primitive type to Object^. This process is called "boxing," and involves the allocation of an Object on the managed heap, with that object containing the value of the primitive. The reverse process is called "unboxing," and requires an explicit cast.
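Boxing can be modeled in native C++ to show what the conversion costs: a heap allocation that wraps the value in an object of a class type. The sketch below is a rough model under stated assumptions, not how the CLI implements it. Object, Boxed, Box, and Unbox are hypothetical names, std::shared_ptr stands in for a garbage-collected handle, and boxing is an explicit call here, whereas C++/CLI performs it implicitly.

```cpp
#include <memory>
#include <typeinfo>

// Hypothetical stand-in for System::Object.
struct Object {
    virtual ~Object() = default;
};

// A heap object whose only job is to carry a value of type T.
template <typename T>
struct Boxed : Object {
    explicit Boxed(T v) : value(v) {}
    T value;
};

// "Boxing": allocate an object on the heap that contains the value.
template <typename T>
std::shared_ptr<Object> Box(T value) {
    return std::make_shared<Boxed<T>>(value);
}

// "Unboxing": the caller must name the type explicitly, and the
// request fails if the box holds something else.
template <typename T>
T Unbox(const std::shared_ptr<Object>& obj) {
    auto boxed = std::dynamic_pointer_cast<Boxed<T>>(obj);
    if (!boxed) throw std::bad_cast();
    return boxed->value;
}
```

Here Unbox<int>(Box(5)) returns 5, while asking the same box for a double throws std::bad_cast, the native counterpart of a failed unboxing cast.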
Naming Conventions. The CLI naming guidelines [4,5] specify that classes, functions, and properties have names written in PascalCase; that is, with the first letter of each word being capitalized. The CLI Standard Library follows this approach.
Listing 2 is a simple application that uses the Point class.
static void WriteLine(String^ format, Object^ arg0, Object^ arg1);
p1 = (0,0), p1's HashCode = 0
p1 = (5,7), p1's HashCode = 11
p1 Equals Point(9, 1) = False
To put Point and the main program in separate assemblies, you create two projects: project Point for the Point class, and project Main for the application program.
To create the Point project, select File|New|Project|Empty Project. (Do not choose "Class Library.") In the Solution Explorer in Source Files, right-click to Add|New Item|C++ File, and specify Point. To this file, add the source code from Listing 1, and save that file.
In the Solution Explorer, right-click on the project name Point, select Configuration Properties|General, and change Configuration Type to Dynamic Library. Then in Linker|Output File, change the .exe suffix to .dll.
(Although this is all done automatically if you chose the Class Library option, it would give you a bunch of other support files that you don't need.) Select Build and Point.dll is produced in the Point\debug folder.
Creating the Main project is much like creating the Point project, except that this new project is called "Main," and the source file is Main.cpp. (You can do this by running a second copy of the compiler; that way, you can work with both projects at the same time.) By default, selecting an Empty Project results in an .exe file, which is what you want. Since Main.cpp refers to the Point type, you need to tell the compiler where to find that type's parent assembly. To do that, in Solution Explorer, right-click on the project name Main, select Common Properties|References|Add New Reference|Browse, and navigate your way to the file Point.dll in the Point project folder created earlier. Select Add|OK, and OK. Select Build, and Main.exe is produced in the Main\debug folder. Execute the program. If you want the output window to persist, you must set a breakpoint at the closing brace of main.
Here are some things you might want to do to reinforce what I've presented:
[1] CLI stands for "Common Language Infrastructure," the subset of .NET that was standardized by Ecma Technical Committee TC39/TG3, and adopted by ISO/IEC.
[2] .NET is the name of a Microsoft product that is a superset of the CLI Standard. Another implementation of the CLI is Mono, from Novell/Ximian, which runs on Windows and Linux. See http://www.mono-project.com/about/index.html.
[3] http://lab.msdn.microsoft.com/express/visualc/. It is updated periodically as new features from the draft Standard are implemented.
[4] http://www.plumhall.com/ecma/index.html. (Tom Plum is convener of the C++/CLI Standards committee.)
[5] CIL and VES are part of the CLI Standard, ECMA-335 (http://www.ecma-international.org/publications/index.html).