September 2002 C++ .NET Solutions/Standard C++ Meets Managed C++

C++ .NET Solutions

Standard C++ Meets Managed C++

Herb Sutter

Managed C++: as small a superset of Standard C++ as possible, but no smaller.

These are exciting times for C++ no matter where you look. In this article, I’ll focus on giving an overview of two overlapping and hopefully converging paths:

The roadmap for Standard C++.

The roadmap for C++ within Microsoft and with .NET.

In particular, this article surveys some of the major points of compatibility and incompatibility between Standard C++ and Microsoft’s managed extensions for C++ (a.k.a. Managed C++), and how the two are planned to converge in the future.

Standard C++

The first C++ Standard was published in 1998; it’s also known as C++98. Work is now just getting underway for Standard C++’s next revision, which will be complete sometime this decade and so is being referred to as C++0x, where the “x” will be filled in later.

In other articles, I and other writers have given an overview of where Standard C++ is today and where it’s going in the near and medium-term future; see [1] for Matt Austern’s introduction, and [2], [3], and [4] for my own overviews of the state of the union and the C++ standardization process’s organization, history, and next steps. Figure 1 reproduces a figure from [3] that outlines the standards process and the major efforts under way that can or will influence the development of C++0x. See [3] for more details about the committees and other entities in that figure that I don’t discuss again here.

Without rehashing that earlier material, there are several general trends or interests worth noting. The following areas are of particular interest to the standards committee as we work toward C++0x:

Threading (and other concurrent programming support). Standard C++ today is notoriously silent on the subject of threads. That’s just odd in a world where it’s common to write multithreaded systems, and where other languages and pretty much all major operating system environments do provide explicit concurrency support. Concurrency is therefore a specific area where C++0x is likely to adopt a solution that acknowledges threads in the C++ language and library and sets out the guaranteed semantics and effects you should be able to portably rely on.

Running on a virtual machine. A commonly-asked question about Standard C++ is, “Can I compile C++ to run on the JVM?” As managed virtual-machine environments grow in popularity and become a more common platform target that real programmers are writing real code for, it’s important that the next version of Standard C++ have something to say about managed environments so that C++ continues to be a first-class language for all platforms.

Distributed and network programming. In Standard C++ right now, there’s no notion of proxy objects, marshalling, network streams, or anything like that. That’s a little dated, because modern programs do a lot of those things these days, especially in the most quickly growing categories of software development. The people working on C++0x are specifically interested in finding ways to better support such facilities in a standard and portable way.

Garbage collection. Today’s Standard C++ says nothing about GC (garbage collection). You can create an add-on GC library for C++ and design it to be intrusive or non-intrusive, and you can define extended semantics for running C++ in your own GC-enabled environment, but neither of those is truly satisfying because neither gives the same level of portable support as in-language GC support. Here are two particular examples:

Many GC environments use nondeterministic destruction, possibly with some notion of finalization separate from memory reclamation. This means that, unlike in normal C++, you can’t tell when objects will actually be destroyed or when their memory will be reclaimed, and those two events might or might not happen at the same time on a particular system.

Some GC environments use compacting garbage collectors, which can be problematic because C++ programs generally don’t expect objects to shift around in memory on their own. (Think about what happens to pointers to those objects.) Conservative garbage collectors do not require the ability to move objects.

Dealing with GC is also of interest in C++0x. The GC issue also touches on other things under way in the standards world, such as the C++ community’s recurring interest in adding to Standard C++ a new concept called a move constructor, a kind of copy constructor that knows that the source object will immediately cease to exist. The concept of move constructors happens to fit well with compacting garbage collectors, which move objects around in just such a way.

There are other areas of interest in C++0x, but those are several of the larger ones.
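To make the move constructor idea mentioned under garbage collection more concrete, here is a minimal sketch. It assumes the rvalue-reference syntax that C++11 later standardized, which was not available in C++98; the Buffer class and the helper functions are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical buffer type, used only to illustrate the idea.
class Buffer {
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Copy constructor: the source stays usable, so its storage is duplicated.
    Buffer(const Buffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        for (std::size_t i = 0; i < size_; ++i) data_[i] = other.data_[i];
    }

    // Move constructor: the source is about to cease to exist, so the new
    // object takes over its storage and leaves the source empty.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_) {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    Buffer& operator=(const Buffer&) = delete;  // assignment left out of the sketch

    ~Buffer() { delete[] data_; }

    std::size_t size() const { return size_; }
    const int* data() const { return data_; }

private:
    std::size_t size_;
    int* data_;
};

// Returns true if moving transferred the storage without copying it.
bool move_transfers_storage() {
    Buffer source(4);
    const int* storage = source.data();
    Buffer target(std::move(source));
    return target.data() == storage && target.size() == 4 &&
           source.size() == 0 && source.data() == nullptr;
}

// Returns true if copying produced separate storage of the same size.
bool copy_duplicates_storage() {
    Buffer source(4);
    Buffer target(source);
    return target.data() != source.data() && target.size() == source.size();
}

// Usage: both properties hold for the sketch above.
void run_move_checks() {
    assert(move_transfers_storage());
    assert(copy_duplicates_storage());
}
```

The point of the sketch is only the contrast between the two constructors: a copy must leave the source intact, while a move may hand over the source's resources because the source will not be used again.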

C++ at Microsoft

Microsoft has a large investment in the C++ language. Virtually all Microsoft products are written in C++: that includes all flavors of Windows, Internet Explorer, Office, FrontPage, Exchange, SQL Server, the games and sims, Visual Studio itself (including the C# compiler and other languages, and excluding only some parts such as the Base Class Libraries that ship with the .NET framework which were written in C#) — pretty much the whole shooting match. Does Microsoft care about C++? They clearly do, and they equally clearly need to continue to care, because of their heavy vested interest in C++ code.

For years, C++ has been central to Microsoft, but regrettably Standard C++ has not.

Microsoft may use C++ a lot, but for some years now it hasn’t had a great reputation for standards conformance in the C++ world. For much of the 1990s, it was a laggard: it didn’t care much about implementing the Standard and didn’t even show up at our standards meetings. Things started changing a couple of years ago; for the past two years or so Microsoft has again been actively attending and contributing at standards meetings, and finally there’s some real product evidence in the pipeline that conformance to the C++ Standard is important at Microsoft.

But you can soon judge for yourself: a lot of catch-up work has been done in the past year, to the point where the version of Visual C++ currently under development (which should be in beta by the time you read this) is one of the most standards-compliant C++ compilers running today on any platform. It’s hard to overstate just how big a turnaround that is — sadly overdue, but welcome nonetheless given the large number of people who use the Visual C++ compiler and the influence it has well beyond the Windows world, as many of the world’s C++ programs that target non-Windows environments target Windows too.

The other Big Thing happening at Microsoft is the .NET platform. Given .NET’s importance at Microsoft, it’s clearly in everyone’s best interests that C++ work well on that virtual-machine managed platform. But hey, .NET defines a lot of platform facilities, and it makes them available through object-oriented, not just procedural, access to a common framework via the CLR (Common Language Runtime).

And, interestingly enough, .NET happens to provide support for, among other things, all of the points mentioned in the previous section as large areas of interest for C++0x:

Threading (and other concurrent programming support). The CLR provides a common threading model and support for locking and other primitives necessary for robust concurrent programming.

Running on a virtual machine. A key idea of .NET is to abstract the OS platform, as Windows already abstracts the hardware platform. In particular, code runs in a virtual machine in a managed environment.

Distributed and network programming. .NET provides built-in services for proxies, marshalling, network streams, distributed services, and other important network programming tools.

Garbage collection. The current .NET garbage collector is a compacting GC, which means that objects can be moved around in memory in a way that a naive C++ program cannot easily detect. Today, a program has to “pin” objects in place during times when it has to be able to assume they don’t move. In the future the CLR might allow custom garbage collectors, which would permit conservative GCs that don’t need to perform memory compaction, but in the meantime defining move semantics for C++ objects could go a long way toward making compacting GCs compatible with C++ objects. Further, the notion of separating finalization (destruction) from memory deallocation would allow deterministic finalization while still running in a managed environment. These are opportunities for improved convergence between Standard C++ and Managed C++.

There are many other things .NET provides, but those are the ones that happen to directly target some of the same areas that are also of interest in C++0x standardization.
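For readers less familiar with what deterministic destruction buys you in Standard C++, the following sketch shows the behavior that the garbage collection discussion contrasts with finalization. It is plain Standard C++ (written with C++11 conveniences); the Tracer class and run_scopes function are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type: records its construction and destruction in a log.
class Tracer {
public:
    Tracer(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("ctor " + name_);
    }
    Tracer(const Tracer&) = delete;
    Tracer& operator=(const Tracer&) = delete;
    ~Tracer() { log_.push_back("dtor " + name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Builds a log showing exactly where each destructor runs.
std::vector<std::string> run_scopes() {
    std::vector<std::string> log;
    {
        Tracer outer("outer", log);
        {
            Tracer inner("inner", log);
        }  // inner is destroyed here, at the closing brace
        log.push_back("between");
    }  // outer is destroyed here
    return log;
}

// Usage: the order of events is fixed by the language, not by a collector.
void run_scope_checks() {
    const std::vector<std::string> log = run_scopes();
    assert(log.size() == 5);
    assert(log[0] == "ctor outer");
    assert(log[1] == "ctor inner");
    assert(log[2] == "dtor inner");
    assert(log[3] == "between");
    assert(log[4] == "dtor outer");
}
```

Under a garbage collector with nondeterministic finalization, the two "dtor" entries could not be pinned to those positions, which is why separating finalization from memory reclamation matters for C++ code that relies on this ordering.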

Standard C++ is designed to be highly portable and platform neutral so that it can be implemented on pretty much any platform, or at least all important ones. Is that at odds with .NET? Maybe not. .NET definitely has strong Windows roots, but two things are interesting: a) the core parts of the CLR, known as the CLI, are now an ECMA standard and may soon become an ISO standard; and b) even now multiple implementations of that Standard, including an implementation by Microsoft for FreeBSD of all things, are available or under development for non-Windows platforms. It’s a little early to tell, but the indications at the moment are that .NET might end up being a platform applicable for production use much more widely than just on Windows boxes. We’ll see.

The Question: What about Standard C++ and Managed C++?

Today, Managed C++ is Microsoft’s attempt to extend Standard C++ in ways that directly support .NET’s CLR and its threading, virtual machine, and garbage collection facilities, as well as its related GUI, network, and other libraries. Those things are also of interest to future Standard C++. Perhaps in the future there can be some synergy here, depending on the various participants’ interest, or perhaps not. Regardless, what is interesting and worth noting is how people are being driven to solve the same problems, and how they’re being solved in production in something reasonably close to Standard C++ today.

Managed C++ is a strict superset of Standard C++, but it’s as minimal a superset as possible. The Visual C++ engineers say, convincingly, that this is not a case of “embrace and extend.” They tried, and are trying, to create as few differences with Standard C++ as possible. Indeed, Managed C++ will continue to work toward closer convergence with Standard C++, further increasing the overlap; perhaps, in adopting semantics for things like garbage collection, Standard C++ might in the future even move toward closer convergence with managed environments.

Where does Standard C++ fit with Managed C++, and how do they work together? Let’s consider the current status of C++ on .NET. To follow along, see Figure 2, which attempts to illustrate pictorially the overlap that Managed C++ provides with Standard C++ and .NET programming.

Compiling Standard C++ Code to the .NET CLR

“Can I compile my existing C++ program with /clr and run it on .NET?” This is the first and probably most basic scenario for moving code to the CLR, and it works completely. (The exceptions you’re likely to encounter aren’t actually Standard C++, as I’ll describe in a moment.)

This scenario is important to you as a C++ programmer with existing code and skills, and for the same reasons it’s important to the vast majority of Microsoft developers. As noted, virtually all Microsoft products are written in C++, so Microsoft clearly has a heavy vested interest in C++. So do you. Today, you can take any conforming C++ program and compile it with /clr... and it just works. For example, you can take the many thousands of lines of code in the Office application suite, compile it with /clr to .NET’s IL, and have Office run entirely on the .NET runtime. That’s a pretty important accomplishment.

The one slight gotcha is actually outside the area of the C++ Standard, but only just beyond its edge. Let’s take a moment to look at it in a little more detail.

Meet the One Definition Rule

Standard C++ has something called the ODR (One Definition Rule), which says that your program can contain more than one definition of the same class, inline function, or template in different places in the program, and happily link them all together, and that’s perfectly legal as long as all those definitions are exactly identical. Now, clearly, in the normal course of affairs, you don’t want a class X in one source file to have a different definition from class X in another source file, and usually you don’t, which is all kosher, fat-free, and morally sublime.

But here’s the rub: in the real world, real programs occasionally do let some minor ODR violations creep in, and they usually don’t mind (or even know) because the violations are so slight that they don’t always actually interfere noticeably with the meaning of the program. By the way, if all the qualifications in that last sentence worry you slightly, good; they ought to, even though they do reflect real-world practice and what many code bases are living with, knowingly or not.

For example, consider template functions declared in header files. The way the Microsoft compiler implements the template inclusion model, a template will be instantiated in every translation unit that uses it. Of course, being neat and orderly minded, you don’t want to just keep them all and have lots of copies of the template littering your executable image in the end, not a bit of it — and so Something Must Be Done about all the copies. Not surprisingly, the Someone Who Must Do It is the linker.

In native mode, the Visual C++ linker just picks one instantiation and throws away the rest, assuming that all the copies are the same — after all, Standard C++ says they must be, right? So even though the template might have slightly different definitions in different translation units, the effect of those slight differences is often innocuous or at least unnoticed, even though they could be potentially harmful. I’m certainly not implying that you should be writing code that knowingly violates the ODR, because such code does harbor pitfalls and violates the Standard; I’m only pointing out the reality that such code exists in the wild, that it can creep in undetected, and that you might have some like it already and just not know about it if the effects happen to be benign. “So what if a class has an extra friend in one place, or otherwise looks a little different here or there?” say many compilers/linkers, including native-mode Visual C++, and happily accept your program with nary a burp, ignoring the minor breach of etiquette.
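A rough sketch of the kind of mismatch described above follows. A real ODR violation needs two separate source files that both define the same name, which cannot be shown in one compilable file, so this sketch uses two namespaces, unit_a and unit_b, as stand-ins for two translation units. Those namespaces, the Widget struct, and the helper functions are all hypothetical. In a real program both definitions would be the same global Widget, perhaps diverging because of an #ifdef in a shared header, and the Standard would not require any diagnostic.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a.cpp: the definition of Widget this source file saw.
namespace unit_a {
struct Widget {
    int id;
};
std::size_t widget_size() { return sizeof(Widget); }
}  // namespace unit_a

// Stand-in for b.cpp: same name, but an extra member crept in.
namespace unit_b {
struct Widget {
    int id;
    char flag;
};
std::size_t widget_size() { return sizeof(Widget); }
}  // namespace unit_b

// Returns true when the two "translation units" disagree about the layout.
// On typical platforms the second Widget is larger than the first.
bool definitions_disagree() {
    return unit_a::widget_size() != unit_b::widget_size();
}

// Usage: a linker that just keeps one copy never notices this disagreement.
void run_odr_checks() {
    assert(definitions_disagree());
    assert(unit_b::widget_size() > unit_a::widget_size());
}
```

If both definitions really shared one name across source files, code compiled against the smaller layout could be linked with code compiled against the larger one. A mismatch like this is not benign, although a difference such as an extra friend declaration often is.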

Life in .NET under the CLR is stricter. In C++ compiled for .NET, such slight ODR violations that may have crept into your code base will probably manifest as a linker error. Why? Because, whereas the native linker just picks one of the copies and leaves the rest alone, the URT (Universal Runtime) actively merges metadata from different translation units, and if it sees something inconsistent it will complain. (Given that it’s complaining about things that really were already potentially harmful and almost certainly unnoticed in the first place, this is probably a good thing.)

As long as you know what you’re looking for and what to expect, such errors are usually easy to resolve and good to resolve. For purists, it doesn’t affect the point that all standards-conforming C++ programs compile fine to .NET, because they do. For realists, it’s worth noting that while Standard C++ maps just fine to .NET, not all real-world programs limit themselves strictly to the Standard, and compiling with /clr may flag existing potential errors you may not already be aware of.

Goodies, Limitations, and Opportunities

Beyond just compiling Standard C++, there is some pretty strong support in Managed C++ for targeting the CLR’s managed environment. In particular, Managed C++ allows you to create managed types, identified with the __gc keyword, whose objects will live on the managed heap and that can be consumed by other .NET languages. Here “can be consumed” means that other .NET languages can not only create and use instances, but can do things like seamlessly inherit from the class. Managed C++ also makes it possible to consume (including to inherit from) types and facilities from the .NET framework itself, as well as ones someone may choose to write in other .NET languages.

There are, however, two major classes of current limitations between today’s Managed C++ and today’s CLR:

First, you can’t take all of Standard C++ into the managed world. In particular, as shown in Figure 2, while you can create class templates and you can create managed __gc classes, you can’t create __gc class templates. It would clearly be nice to remove this category of limitations.

Second, non-__gc C++ objects can’t be made available for seamless consumption by other .NET languages. Some of these cases exist because of technical barriers of the kind that compiler writers need to solve; others, however, fall squarely into the “clear low-hanging fruit” category, because for example there’s little reason not to emit full metadata for C++ PODs (“plain old data,” a Standard C++ term for simple data structs without full class-like behavior and restrictions).
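As a small illustration of what counts as a POD, the sketch below checks two hypothetical types with the type traits that C++11 later added to the standard library (they did not exist in C++98). Point, Named, and is_plain_old_data are names invented for this example, and the combination of "trivial" and "standard layout" is the C++11 formulation of the POD idea rather than the C++98 wording.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical POD: just data, no user-written special member functions.
struct Point {
    int x;
    int y;
};

// Hypothetical non-POD: a std::string member and a virtual destructor give it
// nontrivial construction, destruction, and layout.
struct Named {
    std::string name;
    virtual ~Named() {}
};

// A type is treated as plain old data here if it is both trivial and
// standard layout.
template <typename T>
constexpr bool is_plain_old_data() {
    return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(is_plain_old_data<Point>(), "Point should be a POD");
static_assert(!is_plain_old_data<Named>(), "Named should not be a POD");

// Usage: the same checks at run time.
void run_pod_checks() {
    assert(is_plain_old_data<Point>());
    assert(!is_plain_old_data<Named>());
}
```

A type like Point has a layout that is fully known from its declaration, which is why describing it completely in metadata looks like low-hanging fruit.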

Finally, Microsoft is working to extend the CLR in particular ways, notably to add support for run-time generics; compile-time generics are already powerfully available in C++’s templates. How well the CLR version of generics will overlap with the C++ facility remains to be seen, but clearly providing good compatibility and allowing the CLR versions to be used natively by C++ and vice versa is an important feature. The Visual C++ team knows it.

Summary

C++ is the premier language for systems and performance-oriented development on Windows and .NET, period — it certainly is that at Microsoft, and few modern programs in any organization can afford to say, “oh, well, we’ll just take a 30% performance hit and use some other language.” Most of us work on projects where the next release’s feature queue includes several performance-related requests, and in many application domains such as scientific and engineering work, “pretty fast” is never fast enough. Many applications likewise just can’t live (easily) without the power of being able to work at every level from bit twiddling to high-level abstraction all within the same language. That means that a large number of our software projects now need, and will continue to need, the power and flexibility of a language like C++, even as we use it to write distributed and web applications and server software.

Microsoft’s stated goal for Managed C++ is to allow everything you can do in Standard C++ to have full first-class support on the .NET CLR. That is, you should be able to use all C++ features on managed types (so Microsoft has to allow constructs like template<class T> __gc class Whatever, which is not supported today). Conversely, you should be able to write to the CLR with the same power as any other .NET language, so that other languages can consume our safe, fast, and efficient code as easily as code written in their own language. Some of this is here today, and more needs to come in the next releases. In some areas, achieving full convergence will be very difficult; for example, C++ has multiple inheritance and the CLR does not. But Standard-Managed convergence is still a valuable and important goal for C++ at Microsoft, and for the wider C++ community who are not well served by competing “flavors” of a language. That’s the kind of situation that a standard is supposed to prevent, and Standard C++ has done a great job of unifying pre-standard implementations.

As the time now comes to consider Standard C++’s continued role as a premier player in the new world of web services and virtual machines and network programming, Managed C++ provides probably the most prominent design for using C++ in such modern environments. Overlap with Standard C++ could be better, and the team knows it and is working hard to resolve the remaining limitations.

It will be interesting to see what convergence and improvements the next releases of both Standard C++ and Managed C++ will bring.

References

[1] Matt Austern. “The Standard Librarian: And Now for Something Completely Different,” C/C++ Users Journal, January 2002, <www.cuj.com/experts/2001/austern.htm>.

[2] Herb Sutter. “Sutter’s Mill: Toward a Standard C++0x Library, Part 1,” C/C++ Users Journal, January 2002, <www.gotw.ca/publications/mill20.htm>.

[3] Herb Sutter. “The New C++,” C/C++ Users Journal C++ Experts Forum, February 2002, <www.cuj.com/experts/2002/sutter.htm>.

[4] Herb Sutter. “The New C++: The Group of Seven — Extensions Under Consideration for the C++ Standard Library,” C/C++ Users Journal C++ Experts Forum, April 2002, <www.cuj.com/experts/2004/sutter.htm>.

Herb Sutter (<www.gotw.ca>) is secretary of the ISO/ANSI C++ standards committee, author of the acclaimed books Exceptional C++ and More Exceptional C++, and one of the instructors of The C++ Seminar (<www.gotw.ca/cpp_seminar>). In addition to his independent writing and consulting, he is also a C++ community liaison for Microsoft.