The Perils of operator const char*()

C/C++ Users Journal July, 2004

Avoiding traps and pitfalls related to conversions

By Michael H. Manov

Mike Manov is a senior computer system design engineer with Lockheed Martin. He can be contacted at mmanov@comcast.net.

Recently, a coworker came into my office and asked for a second opinion on a core dump. "It's dumping core when it tries to allocate something on the heap, and the stack trace ends in malloc(). What's going on?" The stack trace showed a crash on the following line, which should not have caused a problem:

std::vector<int>* v = new std::vector<int>;  // nothing wrong here

When a core dump stack trace ends in malloc() or another heap-control routine, this is near-certain proof that the heap has been corrupted. As usually happens with heap corruption, the crash occurred some time after the heap was first corrupted. The crash on instantiating the vector<int> is a red herring; we have to find the real error.

Modern tools such as IBM's Rational Purify (http://www-306.ibm.com/software/awdtools/purifyplus/) make it much easier to diagnose these problems. A Purify run on the executable flagged a double delete, i.e., the same pointer being deleted twice, which causes undefined behavior. The memory in question was the char* payload of an automatic (on the stack, not on the heap) instance of a simple string class.

class SimpleStringClass {
  private:
    char* s_;
  public:
    // The usual: ctor, cctor, dtor, operator=(), etc.,
    // plus an ill-advised conversion operator:
    operator const char*() { return s_; }
};
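For concreteness, here is one way the "usual" members might look. Everything except the conversion operator is an assumed implementation (the article does not show it), and demoSimpleString is a hypothetical usage function. It shows the convenience the conversion buys: the object can be passed straight to C functions such as strlen() and strcmp().

```cpp
#include <cassert>
#include <cstring>

// Hypothetical completion of SimpleStringClass: only the conversion
// operator comes from the article; the other members are assumptions.
class SimpleStringClass {
  private:
    char* s_;

  public:
    SimpleStringClass(const char* text = "")                       // ctor
        : s_(new char[std::strlen(text) + 1]) {
        std::strcpy(s_, text);
    }
    SimpleStringClass(const SimpleStringClass& other)              // cctor
        : s_(new char[std::strlen(other.s_) + 1]) {
        std::strcpy(s_, other.s_);
    }
    ~SimpleStringClass() { delete[] s_; }                          // dtor
    SimpleStringClass& operator=(const SimpleStringClass& other) { // =()
        if (this != &other) {
            char* copy = new char[std::strlen(other.s_) + 1];
            std::strcpy(copy, other.s_);
            delete[] s_;
            s_ = copy;
        }
        return *this;
    }

    // The ill-advised implicit conversion: the object silently becomes a
    // pointer wherever a pointer makes the expression valid.
    operator const char*() { return s_; }
};

// Hypothetical usage: the conversion makes C-API calls convenient.
void demoSimpleString() {
    SimpleStringClass a("hello");
    SimpleStringClass b(a);               // deep copy of the buffer
    b = SimpleStringClass("hi");          // assignment replaces b's buffer
    assert(std::strlen(a) == 5);          // a.operator const char*() is implicit
    assert(std::strcmp(b, "hi") == 0);
}
```

The same implicit conversion that makes strlen(a) work is what later lets delete accept the object.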

We were able to isolate the code that was causing the crash:

1   void foo() {
2      SimpleStringClass aString;
3      // ... intervening code
4      delete aString;  // first char* buffer delete occurs here
5   }                   // second delete occurs after end of function

How did this code compile in the first place? My coworker said, "Oops. I was creating the aString instance on the heap, but then I put it on the stack, and I forgot to take out the delete at the end of the function." The variable aString is an instance, not a pointer, and the type system should have caught this error at compile time. A quick experiment confirmed that the analogous code using std::string does not compile:

6   std::string otherString;
7   delete otherString;  // this line won't compile

The reason line 4 compiled turned out to be the answer to an old, unanswered question of mine. The ANSI C++ std::string is the only string class I have ever used that does not provide an operator const char*(), opting instead for an explicit c_str() member function. I always wondered why, but had no answer until now.

Here's why. A string class with an operator const char*() creates a nasty little loophole in the type system. The compiler is able to find a syntactically valid interpretation of line 4 because the user-defined operator const char*() conversion supplies the pointer that the delete operator needs. Line 4a shows how the compiler interprets line 4. You can see the conversion operator in action when you step through the code in a debugger.

4a  delete aString.operator const char*();

Once this line compiled, the code was doomed. The same problem will occur with any string class that provides an operator const char*(). An error that should have been caught at compile time has been deferred to runtime, and the program has entered the realm of undefined behavior.
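The loophole can be observed without running anything. The sketch below assumes a C++11 compiler and uses a small detection helper, IsDeletable, which is hypothetical (it is not part of any library). It asks the compiler whether "delete t" would be well-formed for an lvalue of a given type; the delete expression appears only inside decltype, so nothing is ever deleted.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Minimal stand-in for the article's class; it is only inspected at
// compile time and never instantiated.
class SimpleStringClass {
    char* s_;
  public:
    operator const char*() { return s_; }
};

// Hypothetical detection helper: chosen when "delete t" compiles...
template <class T>
auto canDelete(int) -> decltype(delete std::declval<T&>(), std::true_type{});

// ...otherwise substitution fails and this fallback is chosen.
template <class T>
std::false_type canDelete(...);

template <class T>
struct IsDeletable : decltype(canDelete<T>(0)) {};

// Line 4's mistake compiles: the conversion supplies the pointer delete needs.
static_assert(IsDeletable<SimpleStringClass>::value,
              "delete aString is accepted via operator const char*()");

// Line 7's mistake is rejected: std::string has no conversion to a pointer.
static_assert(!IsDeletable<std::string>::value,
              "delete otherString is a compile-time error");
```

Both checks pass at compile time, which is exactly the asymmetry between lines 4 and 7.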

Those of you who program with the Microsoft compiler will find a different situation. Although the MFC CString class provides an operator const char*(), line 9 will not compile in Visual C++ 6.0:

8  CString c(foo);
9  delete c;         // won't compile with Microsoft compiler

This is because the Visual C++ 6.0 compiler will not apply the delete operator to a pointer to const. This behavior does not conform to the C++ Standard, which states that it is not necessary to cast away const before applying the delete operator to a pointer (section 5.3.5, paragraph 2).
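A minimal sketch of a delete that a conforming compiler accepts: a buffer is released through a const char*, the same kind of pointer an operator const char*() hands out, with no const_cast. The function name releaseThroughConst is hypothetical.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies text into a new buffer, then releases the
// buffer through a pointer to const. Standard C++ accepts the delete[]
// below without a const_cast (C++98 5.3.5, paragraph 2).
std::size_t releaseThroughConst(const char* text) {
    char* buffer = new char[std::strlen(text) + 1];
    std::strcpy(buffer, text);
    const char* view = buffer;        // same kind of pointer a conversion operator returns
    std::size_t n = std::strlen(view);
    assert(view[n] == '\0');          // the copy is properly terminated
    delete[] view;                    // well-formed: const is no obstacle to delete
    return n;
}
```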

Other classes also offer instance-to-pointer conversion operators; operator const char*() is not the only offender. Any user-defined conversion that converts an instance into a pointer will create the same hole in the type system. For example, this compiles:

10   delete std::cin;
10a  delete std::cin.operator void*();  // how the compiler interprets line 10

Some compilers, notably gcc, will warn that the delete operator is being applied to a void*, which is not a pointer-to-object type. However, the SGI, Solaris, and Microsoft compilers don't issue a warning, even at their maximum warning level.
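Readers on newer compilers will find that C++11 changed the standard library here: std::basic_ios no longer has operator void*() and instead provides explicit operator bool(). Under a C++11 (or later) library, line 10 is therefore rejected. The sketch below checks this with a hypothetical detection helper, IsDeletable (not a library facility), and shows that a stream can still be tested in a condition; readInt is likewise a hypothetical name.

```cpp
#include <istream>
#include <type_traits>
#include <utility>

// Hypothetical detection helper: does "delete t" compile for an lvalue of T?
template <class T>
auto canDelete(int) -> decltype(delete std::declval<T&>(), std::true_type{});
template <class T>
std::false_type canDelete(...);
template <class T>
struct IsDeletable : decltype(canDelete<T>(0)) {};

// With explicit operator bool() and no operator void*(), line 10 no longer
// compiles against a C++11 (or later) standard library.
static_assert(!IsDeletable<std::istream>::value,
              "delete std::cin is rejected in C++11 and later");

// The intended use still works: an explicit operator bool() is applied
// when the stream is used as a condition.
bool readInt(std::istream& in, int& out) {
    if (in >> out) {
        return true;
    }
    return false;
}
```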

It is worthwhile to look at the alternative strategies employed by some standard library classes. std::string requires that you use the c_str() member to access the underlying character array. The deceptively simple std::auto_ptr<T> class does not offer an operator T*(). Instead, it overloads operator*() and operator->(), and provides an explicit member function, get(), for access to its pointer. Even though operator->() returns a T*, the compiler uses that operator only when an explicit -> token appears in the code. An operator T*() is very different: the compiler applies it implicitly in any context where a T* makes the expression valid. These members let you use std::auto_ptr much like a T*, but without the loss of type safety that an operator T*() brings.
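The access pattern described above can be sketched with a hypothetical owning handle, Handle<T>, that offers operator*(), operator->(), and get() but no operator T*(). This is not std::auto_ptr itself: auto_ptr's ownership transfer on copy is omitted here (copying is simply disabled with C++11 = delete). std::auto_ptr was deprecated in C++11 and removed in C++17 in favor of std::unique_ptr, which follows the same access pattern.

```cpp
#include <cassert>
#include <string>

// Hypothetical owning handle with auto_ptr-style access but no operator T*().
template <class T>
class Handle {
  public:
    explicit Handle(T* p = nullptr) : p_(p) {}
    ~Handle() { delete p_; }
    Handle(const Handle&) = delete;              // copying disabled for brevity
    Handle& operator=(const Handle&) = delete;

    T& operator*() const { return *p_; }
    T* operator->() const { return p_; }         // used only when "->" is written
    T* get() const { return p_; }                // explicit access to the raw pointer

  private:
    T* p_;
};

// Hypothetical usage: pointer-like syntax without an implicit conversion.
void demoHandle() {
    Handle<std::string> h(new std::string("abc"));
    assert(h->size() == 3);          // operator->()
    assert((*h)[0] == 'a');          // operator*()
    assert(h.get()->size() == 3);    // get() when a raw pointer is really needed
    // delete h;                     // would not compile: no conversion to a pointer
}
```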

Few programs are completely independent of the C character API. Any string class must therefore provide access to its char* buffer. In these situations, it is important to distinguish between what you need (access to the const char*) and how you are going to get it (conversion operator or member function). Sometimes, you don't have a choice. You need a particular capability and you need it to be implemented in a specific way. For example, any object that will be stored in an STL container class, such as std::vector, has to have an operator=(). The compiler will create one for you, but it's up to you to make it work the way you want. A member function called assignMe() is not an acceptable alternative to an operator=() in this situation. In contrast, we have options with the string class. The operator const char*() offers the most convenience, but that convenience comes at too high a cost, namely the loss of type safety. The type-safe solution is a named member function that the caller must invoke explicitly; this is why std::string makes you call its c_str() function to access its const char* payload.
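A sketch of the type-safe alternative, assuming a hypothetical SaferString class that stores its characters in a std::string for brevity. It exposes the buffer through a named c_str() member, as std::string does. It also shows an option that did not exist when this article was written: since C++11, a conversion operator can be declared explicit, in which case the compiler applies it only when the caller writes a cast, so neither strlen(s) nor delete s compiles.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <type_traits>

// Hypothetical string class with no implicit instance-to-pointer conversion.
class SaferString {
  public:
    explicit SaferString(const char* text) : s_(text) {}

    // Named accessor: the caller must ask for the pointer by name.
    const char* c_str() const { return s_.c_str(); }

    // C++11 alternative: usable only through an explicit cast.
    explicit operator const char*() const { return s_.c_str(); }

  private:
    std::string s_;   // storage delegated to std::string for brevity
};

// No implicit conversion means no loophole for delete or C-API calls.
static_assert(!std::is_convertible<SaferString, const char*>::value,
              "SaferString does not convert to const char* implicitly");

// Hypothetical usage: both access paths are visible at the call site.
std::size_t demoSaferString() {
    SaferString s("hello");
    std::size_t a = std::strlen(s.c_str());                    // named accessor
    std::size_t b = std::strlen(static_cast<const char*>(s));  // explicit cast
    assert(a == b);
    // std::strlen(s);  // would not compile: no implicit conversion
    // delete s;        // would not compile either
    return a;
}
```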

Many traps and pitfalls lurk in a language as powerful and complex as C++. The ones we have examined here are just a few, but they show how tricky user-defined conversions can be. As with type casts, conversion operators should be used only with the greatest of care. Instance-to-pointer conversions such as operator const char*() are especially hazardous. If you don't really need a conversion operator, prevent a potential bug and just don't write it.