September 1994/Questions & Answers

Questions & Answers

Parentheses with new Operator

Kenneth Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C and C++ language courses for corporations. He is the author of All On C, C for COBOL Programmers, and UNIX for MS-DOS Users, and was a member of the ANSI C committee. He also does custom C/C++ programming and provides SystemArchitectonics(SM) services. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 489-5239. Ken also receives email at kpugh@allen.com (Internet) and on CompuServe 70125,1142.
Q
In the April issue starting on page 115, you discuss certain uses of new expressions. You say, "He (Plauger) suggested that there are some subtleties in the C++ standard that dictate when you need to write parentheses and when sometimes even parentheses won't do. Different implementors probably interpret the rules differently and may even vary in how well they get their own rules right. Perhaps a reader can come up with an explanation of what these allocations really do."
OK, allow me to set it all straight. The rules of early C++ only allowed a simple name (one word) to be used as the type of a new-expression. It was C++ 2.0 that introduced an extended syntax. The extended syntax has not changed to this day, and has two forms. The first form figures out for itself where the type expression ends, and allows a restricted form of types. The second form is explicitly parenthesized. Like an if or while statement, the parens are part of the syntax and are not grouping parens. This is what happens in (1).
char* pc5= new (char*) [n];  (1)
The new-expression subexpression here is not new(char*) [n] as intended, but new(char*) only. You might consider this an order-of-precedence goof. To make it perfectly clear, (1) does the same thing as (2) and (3) below.

char** temp= new (char*);  (2)
char* pc5= temp[n];  (3)
You can see that this does produce a char*, but it is a totally nonsensical char*. The expression will allocate one (char*) and return a pointer to it, a char**. That is then subscripted in the normal way, adding n and dereferencing. There is no mystery why the final type is a char*, not a char** as originally expected.
The misconception here is the usage of the parentheses. The '(' indicates that this is the "explicitly parenthesized form" of new, and the type to allocate is delimited by the balanced parentheses. Either new char*[n] or new (char*[n]) would have been correct.
There are no subtleties here to worry about. A C++ programmer should be taught that the parentheses are part of the syntax of new, as with if and while, and are not grouping parentheses in an expression. There is a second form of new that does not use the parentheses. If a '(' appears where the type name goes, the parser uses the parenthesized form, otherwise it doesn't.
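The two correct spellings can be compared side by side. What follows is a minimal sketch, not code from the letter: N, greeting, and fill_and_count are hypothetical names. The bound is deliberately a compile-time constant, on the assumption that the parenthesized form takes an ordinary type name, in which an array bound has to be constant.

```cpp
#include <cassert>

const int N = 4;   // hypothetical bound, a compile-time constant on purpose

// Allocates with each correct form and counts the slots it was able to fill.
int fill_and_count()
{
    static char greeting[] = "hi";

    char** a   = new char*[N];     // unparenthesized form: array of N char*
    char** b   = new (char*[N]);   // parenthesized form: the bound sits inside the parens
    char** one = new (char*);      // parenthesized form, no bound: a single char*

    int count = 0;
    for (int i = 0; i < N; ++i) {
        a[i] = greeting;           // every slot of both arrays is valid storage
        b[i] = greeting;
        count += 2;
    }
    *one = greeting;               // exactly one slot here
    ++count;

    delete [] a;
    delete [] b;
    delete one;
    return count;
}
```

All three expressions yield a char**, which is why the assignments compile without a cast. Only the first two give you N slots to work with.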
In addition, I would like to point out that (4) does the same thing as (1). The extra parentheses are redundant.

char* pc6= new (char(*))[n]; (4)
Eric Nagler has tried these (1 and 4) on a variety of compilers, and surprised me by getting compile errors on some. So it looks like there is some truth to Plauger's statement about compilers differing on the application of the rules. But, this is just a case of a botched parser, not a moving target or unclear specifications. The way new parses was well documented in the C++ 2.0 reference manual, is explained in the Annotated C++ Reference Manual [1] (ARM), and sits today in the draft working paper without change to this aspect of its meaning.
Using a new-expression in C++ is not as difficult as some people make it out to be. You just have to know the rules. Banging on it to see what happens may lead to enlightenment, but often leads to confusion.
P.S. For details on the grammar of new-expressions, see section 5.3.3 in the ARM or the draft working paper. Also my own work The C++ Test of Knowledge (or, the Guru's Handbook) contains an explanation of this exact problem.
John Dlugosz
Warminster, PA
A
Thanks for your clarification. Sometimes it just takes a pinpoint of light to illuminate the entire problem. For readers transitioning from C, let me suggest the following code for comparison. This analogy will show why you probably have not encountered the same problem with C. In C, the malloc function allocates memory in a fashion similar to new. In C, you use:
char * pc;
pc = (char *) malloc(sizeof(char));
In C++, you allocate a char with new and get back a pointer to char, as in:
char * pc_cpp;
pc_cpp = new char;
In C, you could assign a value to this pointer to char with:
pc = ((char **) malloc(sizeof(char *)))[2];
This is syntactically correct, although you probably would never write something like this. The hideous expression on the right converts the void * pointer into a char ** pointer and then dereferences it one level by using the []'s. Note that the resulting value is read from memory that has not been allocated to the program. The cast is required since malloc returns a void * and you cannot subscript or dereference a pointer of that type.
The equivalent assignment in C++ is:
char *pc_cpp;
pc_cpp = new (char *)[2];
Note that the new operator in C++ returns a typed pointer (char **), so that a cast is not required to use the subscript. This is the form which John comments on in his reply. Again, the resulting value is read from non-allocated memory, so it's a meaningless assignment. The above form is typographically close to a meaningful assignment, such as:
char **pc_cpp_good;
pc_cpp_good = new char * [2];
This allocates an array of two pointers to chars, and it does represent valid, working code.
You probably would never write the erroneous C statement, as it looks ugly. However, just as the original questioner did, you might accidentally attempt to code the meaningless assignment in C++.
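Let me use some actual numbers to further illustrate the problem. If malloc returns 100, then the subscript reads the pointer-sized value stored at address 100 + 2 * sizeof(char *), which is 108 with four-byte pointers, and that value is what gets assigned to pc. Only the bytes at 100 through 103 were allocated, so the contents of 108 are whatever happens to be there. In C++, if new returns 100, then pc_cpp receives its value the same way.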
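The address arithmetic behind the subscript can be checked safely on a block that really does have three slots. This is a minimal sketch with hypothetical names (offset_of_slot_2, subscript_reads_contents). It shows where slot [2] lives relative to the start of the block, and that the subscript yields whatever that slot holds.

```cpp
#include <cassert>
#include <cstddef>

// Byte distance from the start of a three-slot block to slot [2].
std::size_t offset_of_slot_2()
{
    char** block = new char*[3];
    std::size_t offset = static_cast<std::size_t>(
        reinterpret_cast<char*>(&block[2]) - reinterpret_cast<char*>(block));
    delete [] block;
    return offset;
}

// The subscript yields the contents of slot [2], not the slot's address.
bool subscript_reads_contents()
{
    static char text[] = "x";
    char** block = new char*[3];
    block[2] = text;                       // store a known pointer in the slot
    char* value = block[2];                // same as *(block + 2)
    bool same = (value == text) &&
                (value != reinterpret_cast<char*>(&block[2]));
    delete [] block;
    return same;
}
```

With four-byte pointers the offset is 8, which matches the 100-to-108 step in the numbers above. With eight-byte pointers it would be 16.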
If you attempt to execute either of
*pc = 7;
*pc_cpp = 7;
then you will be writing over memory that is not allocated to you. For MS-DOS users, you may receive a "Memory allocation error - System halted" message when your program exits. On other systems, there may be no apparent error.
Note that the original statements which perform the allocation do not save the address of the allocated memory. You have lost those values. If you attempt to execute either of
free (pc);
delete pc_cpp;
you will be passing addresses to free and delete which have not been allocated. The resulting error can be similar to or worse than that caused by the previously mentioned assignments.
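Keeping the address that the allocator returned avoids both problems. Here is a minimal sketch under that assumption. The names c_block, cpp_block, and read_through_saved_blocks are hypothetical, and the blocks are sized for three pointers so that slot [2] really exists.

```cpp
#include <cassert>
#include <cstdlib>

// Reads through slot [2] of a malloc'ed block and a new'ed block, then
// releases exactly the addresses that the allocators handed out.
char read_through_saved_blocks()
{
    static char text[] = "ok";

    // C style: keep the malloc result so it can be passed to free later.
    char** c_block = static_cast<char**>(std::malloc(3 * sizeof(char*)));
    if (c_block == 0)
        return '\0';
    c_block[2] = text;
    char* pc = c_block[2];

    // C++ style: keep the new result so it can be passed to delete [] later.
    char** cpp_block = new char*[3];
    cpp_block[2] = text;
    char* pc_cpp = cpp_block[2];

    char result = (pc == pc_cpp) ? *pc : '\0';

    std::free(c_block);      // the address malloc returned, not pc
    delete [] cpp_block;     // the address new returned, not pc_cpp
    return result;
}
```

Note that pc and pc_cpp are never passed to free or delete. They point at text, which was not allocated by either mechanism.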
As John stated, section 5.3.3 of the ARM shows that parentheses can be used around the type name. It does take a bit of parsing practice to create all the possible forms using the new operator. Section 5.3.3 has examples of only the simpler forms. One sample shows that you can code the following:
int (* p_array)[10];
p_array = new int[20][10];
This will allocate space for 20 elements in an array, each the size of int [10]. The type returned by the new operator, as shown by the ARM, is int (*) [10], so the assignment works properly.
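To see the two-dimensional form at work, the following sketch fills the 20-by-10 block and reads back its last element. The function name and the fill pattern are hypothetical, chosen only so that the result is easy to check.

```cpp
#include <cassert>

// Fills a 20-by-10 block allocated with new and returns its last element.
int last_element_of_20_by_10()
{
    int (*p_array)[10] = new int[20][10];   // 20 rows, each an int[10]

    for (int row = 0; row < 20; ++row)
        for (int col = 0; col < 10; ++col)
            p_array[row][col] = row * 10 + col;

    int last = p_array[19][9];              // row 19, column 9
    delete [] p_array;                      // array form of delete
    return last;
}
```

The first subscript steps in units of int [10], and the second steps in units of int.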
Given John's explanation above, you could have:
int i;
i = new (int[20])[10];
The new operator allocates space for 20 ints and returns a pointer to int. The subscript dereferences the pointer and returns the value of the 11th element in the array that was allocated.
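Since compilers are reported above to disagree on such one-line forms, the sketch below spells the same steps out as two statements. It is an illustration with hypothetical names (block, eleventh_element). Unlike the one-line version, it keeps the pointer so the array can be released, and it stores values first, because new leaves the ints uninitialized.

```cpp
#include <cassert>

// Allocates 20 ints with the parenthesized form and reads the 11th one.
int eleventh_element()
{
    int* block = new (int[20]);    // parenthesized form: one array of 20 ints

    for (int k = 0; k < 20; ++k)
        block[k] = k * k;          // give every element a known value

    int i = block[10];             // the 11th element
    delete [] block;
    return i;
}
```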
If this all seems difficult to understand, then do as I typically do with complicated expressions. Hide this complexity in layers of objects. You should be able to arrange your objects so that you never need to use expressions that are difficult to understand. For example, you could design the first sample as:
class X
   {
private:
   int array[10];
   };

X * px;

px = new X[20];
px points to the first of 20 X objects, each of which holds 10 ints, so together they represent an array of 20 by 10 ints. This design avoids multiple []'s and multiple *'s.
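As written, class X offers no way to reach its private array, so a usable version needs some access functions. The sketch below assumes two hypothetical members, set and get, which are not part of the original sample.

```cpp
#include <cassert>

class X
{
public:
    // Hypothetical accessors; the sample class in the text has none.
    void set(int col, int value) { array[col] = value; }
    int  get(int col) const      { return array[col]; }
private:
    int array[10];
};

// Stores into the last int of the last X and reads it back.
int corner_value()
{
    X* px = new X[20];             // only one [] and one * in sight
    px[19].set(9, 199);
    int value = px[19].get(9);
    delete [] px;
    return value;
}
```

The caller deals with one subscript on px. The second dimension is hidden behind the member functions.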
I figure that if compiler writers (as shown in letters last month) and ex-compiler writers (such as P.J. Plauger) have questions on a syntax matter with parenthesized expressions and the new operator, then mere mortals should probably avoid such constructions.

Persistent Objects — An Alternative
Q
In the March 1994 issue of C Users Journal, you answered a question from Paul Waldo...it was interesting to note that his problem could have been solved using Objective-C...instead of the kludge you suggested using C++. It is too bad that Objective-C does not get as much airplay as C++...considering that C++ is so garbled in the first place. Mind you, this is just one engineer's opinion (-:
A short code fragment (Listing 1) demonstrates one solution to the question posed by Paul. This example is not complete or checked for accuracy; it is merely an example of what can be done with Objective-C. Other things I like about Objective-C:
1. Objective-C is more elegant than C++.
2. It is portable to Mac, Sun, SGI, IBM (AIX and OS/2), etc.
3. It offers dynamic binding and run-time checking.
More can be found in the Objective-C FAQ on Usenet.
Once the object is written, it can be brought back in. This is just a rough idea of how it could be written. StoreOn: is inherited from Object. You would have to implement a Store:On: method to save the class vars for Entry.
Jeff Bakst
MediaScape
A
Thanks for your reply. I looked at Objective-C a few years back, but stayed with C++. It's always good to look at another language to see what features your favorite one may be lacking.

Persistent Objects — Another Alternative
Q
I would like to suggest an improvement on one of your answers to Paul Waldo about run-time typing. You provided a good example of an Entry class with a save method, but it would be best if the Entry class didn't have to know what children it had, if any. Not knowing allows the class to be extensible in the future without modification of the Entry class.
For example, as it stands, if we create a new type of Entry, say Adjustment, then the Entry class and the Account class must both be modified. The problem is compounded if the class is in a library or immutable for some reason.
An alternate solution is to move the save and restore capabilities outside of the Entry and Account classes altogether, using separate classes that I usually refer to as builder and probe classes. They are often the same class, in which case I usually call the class a Conversion, and the prototype might look like:

class EntryConversion
   {
public:
   virtual Entry* Retrieve(istream&) const = 0;
   virtual int Save(ostream&, const Entry*) const = 0;
   };
The Retrieve and Save methods are used to read and write to the streams. Note that Retrieve will check the type id at the head of the stream and rewind the stream if it is of the wrong type. (It is probably even safer to read the type outside of the Conversion object, since not all streams may be rewound, but that just complicates the explanation.)
Here are some typical implementations:

// Returns NULL for failure.
//
Entry* CheckConversion::Retrieve(istream& is) const
   {
   Check* rval = (Check*)NULL;
   Entry_type type;
   if (is >> type)
      {
      if (type == Entry_check)
         {
         rval = new Check;
         // Read entries from
         // stream into Check...
         }
      else
         {
         // rewind stream
         }
      }
   return(rval);
   }

// This implementation will not write
// the object unless it is of the
// proper type. See implementation
// of Entry::Save().
//
int CheckConversion::Save(ostream& os,
   const Entry* entry) const
   {
   int rval = 0; // failure
   if (entry != NULL &&
      entry->type_of() == Entry_check)
      {
      os << Entry_check; // write the type id first
      // Write remainder of Check
      // values into stream...
      rval = 1; // success
      }
   return(rval);
   }
Here, the base Entry class knows about nothing except EntryConversions, and we could perhaps have some static methods/attributes dealing with the list of known Conversions:

class Entry
   {
public:
   //...
   static void Learn(EntryConversion*);
   static Entry* Retrieve(istream&);
   static int Save(ostream&, const Entry*);
private:
   static EntryConversion** known;
   };
When we want to save or restore an Entry, we simply examine our dynamically created list of conversions until we find a match, and then use that to save or restore it. The Entry class itself no longer needs to know what, if anything, inherits from it:

// Returns NULL for failure.
//
Entry* Entry::Retrieve(istream& is)
   {
   Entry* rval = (Entry*)NULL;
   for (int i = 0; known[i] != NULL; i++)
      {
      rval = known[i]->Retrieve(is);
      if (rval != NULL)
         break; // Ah! success
      }
   return(rval);
   }

// Returns 0 for failure, 1 for success.
//
int Entry::Save(ostream& os, const Entry* entry)
   {
   int rval = 0; // failure
   for (int i = 0; known[i] != NULL; i++)
      {
      rval = known[i]->Save(os, entry);
      if (rval != 0)
         break; // Ah! success
      }
   return(rval);
   }
The drawback to this approach is that we must "teach" the Entry class its conversions. Entry could still learn the default classes in the base Entry class constructor, or we could provide some external initialization. Either way, what we would end up with is some code like:

Entry::Learn(new CheckConversion());
Entry::Learn(new DepositConversion());
Entry::Learn(new WithdrawalConversion());
Now, if we add an Adjustment class inheriting from Entry, none of the existing framework needs change, even if the base Entry class bootstraps itself with the default Conversions. All we need do is define our AdjustmentConversion class, and somewhere in our new application teach the Entry class the new Conversion. This is particularly important when attempting to distribute object libraries.
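The Adjustment scenario can be sketched end to end. What follows is an illustration under several assumptions rather than the code from the letter. It uses standard-library streams and a std::vector in place of the NULL-terminated array. The template ConversionFor, the handles member, the amount field, and round_trip_amount are hypothetical. The type id is read once in Entry::Retrieve, which the earlier parenthetical remark suggests is the safer choice, so no rewinding is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <vector>

enum Entry_type { Entry_check = 1, Entry_adjustment = 2 };

class EntryConversion;

class Entry {
public:
    Entry() : amount(0) {}
    virtual ~Entry() {}
    virtual Entry_type type_of() const = 0;
    int amount;                                   // hypothetical payload

    static void   Learn(EntryConversion* c) { known.push_back(c); }
    static Entry* Retrieve(std::istream& is);
    static int    Save(std::ostream& os, const Entry* entry);
private:
    static std::vector<EntryConversion*> known;   // assumption: vector, not a raw array
};
std::vector<EntryConversion*> Entry::known;

class EntryConversion {
public:
    virtual ~EntryConversion() {}
    virtual Entry_type handles() const = 0;       // hypothetical: the id this conversion reads
    virtual Entry* Retrieve(std::istream&) const = 0;
    virtual int    Save(std::ostream&, const Entry*) const = 0;
};

// One conversion per concrete class; T must derive from Entry.
template <class T, Entry_type Id>
class ConversionFor : public EntryConversion {
public:
    Entry_type handles() const { return Id; }
    Entry* Retrieve(std::istream& is) const {
        T* e = new T;
        if (!(is >> e->amount)) { delete e; return 0; }
        return e;
    }
    int Save(std::ostream& os, const Entry* e) const {
        if (e == 0 || e->type_of() != Id) return 0;   // not ours: write nothing
        os << static_cast<int>(Id) << ' ' << e->amount << ' ';
        return os ? 1 : 0;
    }
};

Entry* Entry::Retrieve(std::istream& is) {
    int id;
    if (!(is >> id)) return 0;                    // the type id is read once, here
    for (std::size_t i = 0; i < known.size(); ++i)
        if (known[i]->handles() == id) return known[i]->Retrieve(is);
    return 0;                                     // nobody was taught this id
}

int Entry::Save(std::ostream& os, const Entry* entry) {
    for (std::size_t i = 0; i < known.size(); ++i)
        if (known[i]->Save(os, entry)) return 1;
    return 0;
}

class Check : public Entry {
public:
    Entry_type type_of() const { return Entry_check; }
};

class Adjustment : public Entry {                 // added later; Entry itself is untouched
public:
    Entry_type type_of() const { return Entry_adjustment; }
};

// Saves an Adjustment and reads it back through the taught conversions.
int round_trip_amount(int amount) {
    static ConversionFor<Check, Entry_check> check_conv;
    static ConversionFor<Adjustment, Entry_adjustment> adj_conv;
    Entry::Learn(&check_conv);
    Entry::Learn(&adj_conv);

    Adjustment a;
    a.amount = amount;
    std::stringstream store;
    if (!Entry::Save(store, &a)) return -1;

    Entry* back = Entry::Retrieve(store);
    if (back == 0) return -1;
    int result = (back->type_of() == Entry_adjustment) ? back->amount : -1;
    delete back;
    return result;
}
```

Supporting Adjustment took one new class and one more Learn call. Nothing inside Entry or the existing conversion changed.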
Another big advantage is that this totally separates the storage mechanism from the class itself. With a little more indirection, we can create the Conversions in such a way that the Entry class does not even need to know that we are using streams; we might use non-volatile RAM or some other non-stream method of object storage and the Entry hierarchy still need not change.
There are several disadvantages. First, the code is obviously more complex, and this is often a big hurdle, especially for people who are relatively new to developing OO applications. Second, it depends a lot more on convention for success; if someone forgets to teach their new Conversion to the Entry class, it will not be readily apparent why the system is failing.
Last, the initialization problem is nontrivial. Even if we decide that the Entry class should bootstrap itself with the known EntryConversions, what if we do a Retrieve before we have created an Entry? The known array of Conversions will be empty, and it will fail. Blah.
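One possible way around the ordering problem, offered as a swapped-in technique and not as part of the design described above, is to build the list on first use inside a function-local static. In this minimal sketch, known() and KnownCount are hypothetical names, and CheckConversion is reduced to an empty stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class EntryConversion {
public:
    virtual ~EntryConversion() {}
};

class CheckConversion : public EntryConversion {};   // empty stand-in for illustration

class Entry {
public:
    static void Learn(EntryConversion* c) { known().push_back(c); }
    static std::size_t KnownCount() { return known().size(); }   // hypothetical helper
private:
    // The list is constructed the first time anyone asks for it, so it
    // exists, with its default conversion, before any Entry is created.
    static std::vector<EntryConversion*>& known() {
        static std::vector<EntryConversion*> list;
        if (list.empty()) {
            static CheckConversion default_check;    // bootstrap default
            list.push_back(&default_check);
        }
        return list;
    }
};

// Teaches one more conversion and reports how many entries that added.
std::size_t count_after_learning_one()
{
    static CheckConversion extra;
    std::size_t before = Entry::KnownCount();
    Entry::Learn(&extra);
    return Entry::KnownCount() - before;
}
```

A Retrieve or Save built on known() would find the defaults already in place, whether or not an Entry had been constructed yet. The convention problem remains: a conversion that nobody teaches is still silently missing.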
Anyway, the article piqued my interest, since I have been dealing with similar problems quite a bit recently.
Burt Smith
Jeffersonville, PA
A
Thanks for your solution. As is usual, there are many ways to climb a mountain. The long way is more gradual and the short way is steeper. Each route has its tradeoffs.

References
[1] Ellis, Margaret A., and Bjarne Stroustrup. 1990. The Annotated C++ Reference Manual. Reading, MA: Addison-Wesley.