March 2003/Uncaught Exceptions

Uncaught Exceptions: Off the Road Again

Bobby Schmidt

Tracking pointers from Who Knows Where and other revelations.

Each of my questions this week mixes C and C++. And that’s fitting, as I’ve just returned from a three-week road trip that included back-to-back C and C++ Standard committee meetings in Santa Cruz.

When the C++ meeting was in Seattle in 1990, and in Redmond in 2001, I sat in for a couple of hours each time. This time, for the first time, I attended every day of both meetings as an official observer from Microsoft.

The committees meet twice each year: autumn within North America, and spring elsewhere. Accordingly, each committee condenses six months of work into one week. This cramming resulted in long days, some starting at 8:30 A.M. and not wrapping up until 12 hours later. By the end of the second week, many of us were obviously worn out.

Those who attend these meetings consistently are serious about the C and C++ Standards, their implementations, and their customers. You can’t be casual and make this level and length of investment. Now it’s easy to take potshots at the Standards and those who create and maintain them. Lord knows I’ve been guilty of that sin more than once in this column. But after having spent two weeks immersed in this process, I have a much deeper appreciation for the work and players involved.

After the meetings, I drove north for The (3rd) C++ Seminar (a.k.a. TCS 3). The highlight for me was emceeing our first ever C++ trivia contest on the final day. Among the contestants we grilled on stage were Chuck Allison and David Burggraaf, top dudes of CUJ and Microsoft’s Visual C++ team. Sadly for the honors of both CUJ and Microsoft, neither man won. But happily for everyone else, both organizations graciously supplied t-shirts as prizes.

If you attended — and especially if you played — I trust you had as much fun as I did.

Will there be a TCS 4 in 2003? I truly hope so. Will there be more committee meetings? For sure, and I had planned to be at both, in Oxford, England this April, and Kona, Hawaii this October; but now it looks as if I’ll spend April in America. I think Chuck will attend at least one of the C++ meetings. Maybe the committee’s questions will be easier than the seminar’s. [Griller is always easier than Grillee, no? — cda]

Point of Origin

Q. Hi Bobby,

Despite many searches through comp.lang.c++.*, I have not received a relevant answer.

Say I have a pointer to some C library struct that I’m accessing from C++. I’d like to know if this (legal) address is from an object allocated on the heap or an object allocated on the stack. I will use this only for debugging and assertions. I’m using Microsoft Visual C++ 6.0, so I don’t insist on portability.

Thanks in advance.

— Adi Shavit

A. Pointers are pretty naive little beings. They contain only a single (presumed) address to some other thing and have a type that (presumably) tells what kind of thing the address references. But they have no memory of where their contained address comes from, or anything else about the context of that address. Further, there is no Standard C++ library call that gives you this information.

Here are a few simple ways you can overcome this problem:

1. Find a system-specific call that can tell you whether an address is in the stack or the heap.
2. Wrap the C struct in a C++ class. Add a tag member to the class, and set the tag to a different value depending on where the object was allocated. Create and destroy the wrapper objects; do not create and destroy the C structure directly.
3. Same as method 2, except that you selectively enable the ways in which wrapper objects can be created.

Method #1 requires knowledge of Windows programming that I don’t have. There may well be a call you can make that tells you where an address lives, or some algorithm you can apply to an address’s value. I just don’t know.

Keep in mind that there are multiple ways an object may be allocated: statically, automatically (on the stack), from the heap (via malloc), from the free store (via new), from a system-managed store, from an application-managed store, and so on. That an address doesn’t live on the heap doesn’t imply that it’s from the stack.

Method #2 requires that you wrap the C structure in such a way that the wrapper object knows where it came from. I show a simple possibility as Listing 1. Highlights:

The constructor senses the value of the per-class member new_called_. That member is true if and only if operator new is called for a C_wrapper object.
The constructor saves that value in the per-object tag_.
The wrapper object yields the tag’s state through is_dynamic.
The wrapper also converts itself to the original C structure.
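
Listing 1 is not reproduced here. As a stand-in, the following is a minimal sketch of the technique those highlights describe; it is not the original listing. The names C_wrapper, new_called_, tag_, and is_dynamic come from the text. The struct C_struct, its member value, and the function check_tags are hypothetical placeholders. The sketch assumes single-threaded code and plain new (no arrays, no wrappers embedded in other objects).

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-in for the C library struct.
struct C_struct
    {
    int value;
    };

class C_wrapper
    {
public:
    C_wrapper() : tag_(new_called_)
        {
        new_called_ = false;  // reset, so the next non-new object is tagged false
        data_.value = 0;
        }
    static void *operator new(std::size_t size)
        {
        void *p = ::operator new(size);  // may throw; the flag is untouched if it does
        new_called_ = true;              // the constructor that runs next will see this
        return p;
        }
    static void operator delete(void *p)
        {
        ::operator delete(p);
        }
    bool is_dynamic() const
        {
        return tag_;
        }
    operator C_struct *()
        {
        return &data_;
        }
private:
    static bool new_called_;  // per-class: set only by operator new
    bool tag_;                // per-object: copy of the flag at construction
    C_struct data_;
    };

bool C_wrapper::new_called_ = false;

// Hypothetical check: an automatic object is tagged false, a new'ed one true.
bool check_tags()
    {
    C_wrapper on_stack;
    C_wrapper *on_heap = new C_wrapper;
    bool const ok = !on_stack.is_dynamic() && on_heap->is_dynamic();
    delete on_heap;
    assert(ok);
    return ok;
    }
```

One caveat with this sketch: the compiler-generated copy constructor copies tag_ along with everything else, so a copy of a heap object placed on the stack would still claim to be dynamic. A fuller version would define the copy constructor as well.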

Method #3 requires that you hide certain C_wrapper members:

To disable new on C_wrapper, declare operator new as private.
To disable delete on C_wrapper, declare operator delete as private. This also has the side effect of preventing new on C_wrapper [1].
To disable creation of automatic or static C_wrapper objects, declare the C_wrapper constructor as private. Doing so also prevents code outside the class from creating C_wrapper objects with new. You may therefore want to abandon new and write a custom function to create C_wrapper objects.
Ditto, except you declare the C_wrapper destructor as private, can’t call delete on C_wrapper pointers, and need a custom way to destroy C_wrapper objects.
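
To make the last two options concrete, here is a small sketch under the same assumptions as before: C_struct and its member value are hypothetical placeholders, and the names create, destroy, and use_wrapper are mine, not the column's. With both the constructor and the destructor private, the only way to obtain a C_wrapper is through create, so every wrapper is known to be dynamically allocated and no tag is needed.

```cpp
#include <cassert>

// Hypothetical stand-in for the C library struct.
struct C_struct
    {
    int value;
    };

class C_wrapper
    {
public:
    static C_wrapper *create()           // the one sanctioned way to make a wrapper
        {
        return new C_wrapper;
        }
    static void destroy(C_wrapper *p)    // the one sanctioned way to get rid of it
        {
        delete p;
        }
    operator C_struct *()
        {
        return &data_;
        }
private:
    C_wrapper()
        {
        data_.value = 0;
        }
    ~C_wrapper()
        {
        }
    C_wrapper(C_wrapper const &);             // declared but never defined: no copies
    C_wrapper &operator=(C_wrapper const &);
    C_struct data_;
    };

int use_wrapper()
    {
    // C_wrapper w;                   // would not compile: private constructor/destructor
    // C_wrapper *q = new C_wrapper;  // would not compile outside the class either
    C_wrapper *p = C_wrapper::create();
    C_struct *raw = *p;               // conversion to the underlying C structure
    raw->value = 42;
    int const result = raw->value;
    C_wrapper::destroy(p);
    return result;
    }
```

The commented-out lines show what the private members rule out; uncommenting either one should produce an access error at compile time.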

These are certainly not the only approaches; for example, by trafficking in smart pointers or proxies instead of raw pointers, you could make the solution much more robust. Still, these approaches are simple and should give you some ideas for further development.

One last comment: my general experience is that if you find yourself fighting how the language semantics work, or are trying to divine meta-information beyond what the type system provides, you may well have some underlying weakness in your design. I’m not saying that’s necessarily true; for all I know, what you want makes sense in your program. Still, you might want to step back from your design and design assumptions and make sure that what you want truly makes sense.

Back to the Future

Q. Hello Bobby,

I am looking for a small example that ports a simple C++ non-virtual class type to C. This is not a task I’m wholly in favor of, but nevertheless I must paint the barn blue, and I wish to paint it properly. I have a few thoughts on the general approach, but I would like to hear ideas from others who have transformed C++ to C on large projects.

Regards

— Wanda Simoneaux

A. (In all of my comments below, “C” means “C90.”)

Listing 2 shows a simple C++ program. The class in that listing has these features:

A single private int data member.
A combination conversion/default constructor that sets the data member and increments a class-wide instance counter.
A destructor that decrements the counter.
A copy-assignment operator.
A conversion operator yielding the private int value.
A static member function returning the current instance count.
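
Listing 2 is not reproduced here. The following sketch is my reading of a class with the features just listed, not the original listing. The names X, value_, count_, and count appear in the text; the function demo and the test values are illustrative assumptions.

```cpp
#include <cassert>

class X
    {
public:
    X(int value = 0) : value_(value)   // conversion and default constructor in one
        {
        ++count_;
        }
    ~X()
        {
        --count_;
        }
    X &operator=(X const &that)        // copy assignment
        {
        value_ = that.value_;
        return *this;
        }
    operator int() const               // conversion yielding the private value
        {
        return value_;
        }
    static int count()                 // current number of live X objects
        {
        return count_;
        }
private:
    int value_;
    static int count_;
    };

int X::count_ = 0;

int demo()
    {
    X x1(7);
    X x2;                      // default argument supplies 0
    assert(X::count() == 2);
    x2 = x1;
    return x2;                 // operator int converts x2 to 7
    }
```

Because construction and destruction are automatic in C++, the instance count is back to zero once demo returns, without the caller doing anything.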

Listing 3 shows my elementary and simplistic translation of Listing 2 into C. Some observations about that translation:

All C++ member functions map to global-scope C functions sharing the common name prefix X_.
Each instanced (or non-static) C++ member function maps to a C function declaring an explicit this pointer as the first parameter. const member functions map to C functions declaring this as X const *const, while non-const members map to C functions declaring this as X *const. This preserves the C++ behavior in C, by always preventing changes to the this pointer itself, while selectively allowing or disallowing changes through the pointer.
The static C++ member function count has no this parameter.
The instanced C++ private class member value_ maps to a C struct member of the same name. This name is not private in C and can be accessed throughout the program. There are ways to better emulate a private instanced member in C, but all the ones I know add source complexity, portability constraints, or run-time overhead.
The static C++ private member count_ maps to a global-scope static C variable of the same name. Because the C variable is static, it is effectively “private” with respect to other translation units.
You must manually remember to construct and destruct X objects. This becomes especially cumbersome if you distribute the definitions of automatic X objects throughout a function or have multiple return paths from that function. It also precludes your C code from mimicking the behavior of C++ global-scope X objects, since such objects are constructed and initialized before main is entered and destructed after main is exited.
As C lacks default arguments, what was one constructor in C++ maps to two construction functions in C.
You could assign the C variables directly, as in x1 = x2, without using the assignment function X_assign. But in more complex examples — especially those with pointer members — you will need an assignment function. In general, the simple built-in = operator will work in C if and only if the compiler-synthesized assignment operator works in C++.
The assignment function returns void, where the C++ assignment operator returns the expected X &. As C lacks references, the closest you could come to the C++ behavior is to declare a return type of X * and return this. That would allow awkward chaining such as X_assign(&x1, X_assign(&x2, &x3)) if you really insist on an analogue of C++’s x1 = x2 = x3.
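
Listing 3 is not reproduced here either. The sketch below follows the observations above but is my own reconstruction, not the original listing. It sticks to the subset common to C90 and C++, so it compiles as either. For that reason the explicit object pointer is named self rather than this, which is a keyword in C++; a pure C build could use this as the text describes. Apart from the X_ prefix, value_, and count_, the function names are assumptions.

```cpp
/* Common subset of C90 and C++; names other than X_, value_, count_ are illustrative. */
#include <assert.h>

typedef struct X
    {
    int value_;              /* not private in C: any code can touch it */
    } X;

static int count_ = 0;       /* internal linkage hides it from other translation units */

void X_construct_default(X *const self)          /* C++: X() */
    {
    self->value_ = 0;
    ++count_;
    }

void X_construct(X *const self, int const value) /* C++: X(int) */
    {
    self->value_ = value;
    ++count_;
    }

void X_destruct(X *const self)                   /* C++: ~X() */
    {
    (void) self;
    --count_;
    }

void X_assign(X *const self, X const *const that) /* C++: operator= */
    {
    self->value_ = that->value_;
    }

int X_to_int(X const *const self)                /* C++: operator int() const */
    {
    return self->value_;
    }

int X_count(void)                                /* C++: static count(), no object pointer */
    {
    return count_;
    }

int X_demo(void)
    {
    X x1, x2;
    int result;
    X_construct(&x1, 7);
    X_construct_default(&x2);
    assert(X_count() == 2);
    X_assign(&x2, &x1);
    result = X_to_int(&x2);
    X_destruct(&x2);         /* forget these calls and the count drifts */
    X_destruct(&x1);
    return result;
    }
```

X_demo has a single return path, which keeps the manual destruction manageable; with several return statements, each one would need its own pair of X_destruct calls.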

You specifically asked for a C++ example involving no virtual functions. I’m glad you did, for virtual functions, let alone inheritance in general, complicate this simple model dramatically [2].

What, Were, They, Thinking?

Q. Bobby,

I would always have thought that statements like:

if (a, b)

or more simply:

if (7, 11)

were blatant errors. The comma operator is not a logical operator. Is it? That cannot be considered a logical statement. Can it?

Well, I just lost a bet on it.

The following is from the ISO C99 specification (6.5.17):

The left operand of a comma operator is evaluated as a void expression; there is a sequence point after its evaluation. Then the right operand is evaluated; the result has its type and value.

...

[T]he comma operator... can be used within a parenthesized expression or within the second expression of a conditional operator... In the function call f(a, (t=3, t+2), c) the function has three arguments, the second of which has the value 5.

So, in fact, if (7, 0) compiles as if (0), and if (0, 11) compiles as if (11). I tried this, and my compiler seems to agree. But how has something so weird and unintuitive made it into the Standard? Why?

— Dan Watkins

A. Note that while I specifically mention C below, the same general analysis applies to C++ as well.

The comma operator, along with its two operands, forms an expression. And like most C expressions, the comma expression has a type and value [3]. But what type and value? What makes sense, independent of the language rules?

In the simple comma expression within:

(1, 2)

I see two reasonable candidates for the expression’s type: int and void. Given your question, you apparently think the type should be void so that the expression has no usable value in contexts such as:

if (1, 2)

But that would also preclude:

int n = (1, 2);

and:

double y = (f(x++), g(x));

and:

return x > 0 ? sin(x) :
    (x = -x, cos(x));

Is this a reasonable limitation?

I think not. I’d argue that comma expressions would have insufficient merit if they could only have type void and wouldn’t be worth their cost to the language [4]. If we allow comma expressions to exist, I think we must allow them to have non-void type.

The question then becomes: which type should they have? In the earlier simple example:

(1, 2)

the answer seems to be int, as that is the type of the two operands. But what about:

(1.0, 2)

Should the type of the expression be double? int? If this were an arithmetic expression instead of a comma expression:

(1.0 + 2)

the int operand would convert to double. Should that rule apply to comma expressions, so that:

(1.0, 2)

has type double? Should the answer change for:

(1, 2.0)

where the types are switched?

Then consider:

char *cp;
int *ip;
(cp, ip)

What’s the type of the comma expression? In this case, the two operand types have no standard conversion to one another. Does that mean the comma expression should be disallowed?

For a different take on this question, contemplate the extended sequence:

int f(int n)
    {
    return n;
    }

double g(double n)
    {
    return n;
    }

double h()
    {
    return
        (f(1), f(2), f(3), f(4), g(5));
    }

What should the returned value be? I’d say either 1 or 5.0, the first and last operand values of the comma expression. From a code generation perspective, the net result would be:

f(1);
f(2);
f(3);
f(4);
return g(5);

if the expression takes on the last operand value, and:

int temp = f(1);
f(2);
f(3);
f(4);
g(5);
return temp;

if it takes on the first. The former interpretation needs no temporary and is therefore the more attractive choice from an efficiency perspective.

For an alternate perspective, consider the chained arithmetic expression:

return
    (f(1) + f(2) + f(3) + f(4) + g(5));

where the only reasonable interpretation is that the return value cannot be known until the last operand is evaluated. It seems consistent to me that in:

return (f(1), f(2), f(3), f(4), g(5));

the return value also cannot be known until the last operand is evaluated. This suggests that the expression’s value is that of the final operand, which further implies that the expression’s type is that of the final operand too.

In summary, then, my sense of how comma operators and expressions ought to work matches the language rules.
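
For readers who want to see those rules in action, here is a small C++ sketch. The functions f, g, and h follow the earlier fragments; the counter calls and the helpers is_double and condition are illustrative additions of mine. The overloaded is_double relies on overload resolution to report the static type of a comma expression.

```cpp
#include <cassert>

static int calls = 0;      // counts evaluations, to show that left operands still run

int f(int n)
    {
    ++calls;
    return n;
    }

double g(double n)
    {
    ++calls;
    return n;
    }

// Overload resolution picks one of these from the type of the argument expression.
bool is_double(int)
    {
    return false;
    }

bool is_double(double)
    {
    return true;
    }

double h()
    {
    return (f(1), f(2), f(3), f(4), g(5));   // value and type come from g(5)
    }

// Mirrors the reader's if (a, b): only the right operand decides the branch.
bool condition(int left, int right)
    {
    if (f(left), f(right))
        return true;
    return false;
    }
```

Under these definitions, h() should yield 5.0 after calling all five functions, is_double((f(1), g(2))) should be true while is_double((g(1), f(2))) is false, and condition(7, 0) and condition(0, 11) should behave like if (0) and if (11) respectively.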

Now I’ll grant that the very existence of the comma operator is perhaps “weird and unintuitive,” especially given the long-established use of the comma as an argument/parameter separator in many languages. (Indeed, I think I would have preferred the colon for this operator.) But once you accept that the comma operator exists, I hope you’ll agree that the rules make some sense.

Notes

[1] What’s that? How can hiding operator delete prevent new? I leave that as an Exercise For The Reader.

[2] Those who have used Microsoft’s COM in C should nod vigorously at this point.

[3] The one exception comes if the expression type is void, since void expressions have no value.

[4] And yes, there is a cost, if only the confusion beginners suffer distinguishing the comma operator from the comma in a function declaration or function call.

About the Author

Although Bobby Schmidt makes most of his living as a writer and content strategist for the Microsoft Developer Network (MSDN), he runs only Apple Macintoshes at home. In previous career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him on the Internet via BobbySchmidt@mac.com.