Where you declare things in C++ really matters, as Bobby shows through several answers to readers' questions.
Copyright © 1999 Robert H. Schmidt
As the mythical "they" say, be careful what you wish for...
I always wanted to be a professional writer full time, and now it looks as though I'll have my chance. Along with my gig for CUJ, I am also now a columnist and writer for MSDN (Microsoft Developer Network). My first column in their Online Voices magazine appeared this past May [1].
Compared to writing for a print publication like CUJ, writing for an online magazine presents some advantages. For instance, I'm not bound to a fixed printing schedule, since online bits can be published and updated any time. I'm also not constrained by surrounding ad space, fixed column widths, or page counts. And I hope to make good use of hyperlinks to draw in related material.
On the other hand, so much drivel oozes into the Internet that I wonder if online publications will ever seem as authoritative or glamorous as print publications. Even though I'm fairly comfortable with Internet culture, and keep many of my technical references online, I must admit that printed books and magazines still feel more real. Then there's the whole online profit model, which I don't pretend to understand.
These differences are real today, but they won't be forever. I fully expect that in my lifetime the choice of "paper or plastic" publications will be largely arbitrary. Still, I wonder what we'll call a real newspaper printed only on bits (a newspaperless?) and, most importantly, when I'll be able to read it in the tub.
First Things First
Q
I have been introduced to a project in development which I will be maintaining. While reviewing the code I was assigned to I was looking at some of the criticisms by the previous programmer assigned to do the maintenance but who quit. I am having a problem with both versions: the original code and his proposed revision. This is a large project and I am including only one of the functions. The original function is called getFileName and the revised function is getFileName1. The first parameter is an input and the second is an output.
I am concerned about the fact that both functions are using pointers without initializing them, and secondly I am also concerned about the second function with regards to assigning the first parameter fullpath to the pointer variable ptr, which is then assigned to the argument variable filename. My concern is that ptr being a local variable will be lost and the value of the parameter filename1 will be possibly lost after leaving the function. I have a feeling that the second function contains too many opportunities for errors. I had created a small test program to test both functions which is attached. What is your thought on this? Thanks for your time. Joao Coelho
A
(I've included a slightly reformatted version of your small program as Listing 1.) My principal recommendation: ensure that the design and code are specified.
I'm guessing from their titles that the functions accept a DOS or Windows fully-qualified path name, parse out the base file name, then return that file name. I'm also guessing that the base file name is that part between the directory specification and the file extension.
The specification must define and constrain the domain of valid input data. For example, can the fullpath argument contain zero backslashes? Contiguous backslashes, as in UNC names? Forward slashes? More than one dot? A drive specifier such as C:? More than 255 characters?
I can infer some of these answers by reading and testing the code. My inference is incomplete, however, because the functions react differently to certain input strings. To see an example, test both functions with the string "\\1.2.3". To see a more pathological example, test with "1.2.3". Normally I'd ascribe these differences to inadequate testing; but without a specification, you can't easily craft meaningful test cases.
The specification must also define how the functions behave in the face of invalid data. These are C functions, so they can't throw exceptions. They could return an error signal, but since they have only one outbound conduit (the filename parameter), they would have to add a second outbound parameter, supply a return value, or overload the meaning of filename.
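To make the point concrete, here is a minimal sketch of what one possible specification might look like once it is pinned down in code. Everything in it is an assumption made for illustration, not the project's actual contract: the name get_base_name, the PathStatus codes, the rule that only backslashes separate directories, and the rule that the base name ends at the last dot after the last backslash. It is written as C++ here, although the original functions are C.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical status codes; the original project defines none.
enum PathStatus { PATH_OK, PATH_BAD_ARG, PATH_EMPTY_NAME, PATH_TOO_LONG };

// Assumed specification (illustrative only):
//  - fullpath is a null-terminated string; directories end in '\\'.
//  - The base name starts after the last backslash (or at the start of
//    the string if there is none) and ends before the last dot that
//    follows that point (or at the end of the string if there is none).
//  - filename receives the base name; capacity is the size of that buffer.
//  - On any error, filename is left untouched.
PathStatus get_base_name(const char* fullpath, char* filename,
                         std::size_t capacity)
{
    if (!fullpath || !filename || capacity == 0)
        return PATH_BAD_ARG;

    const char* start = std::strrchr(fullpath, '\\');
    start = start ? start + 1 : fullpath;

    const char* dot = std::strrchr(start, '.');
    std::size_t len = dot ? static_cast<std::size_t>(dot - start)
                          : std::strlen(start);

    if (len == 0)
        return PATH_EMPTY_NAME;   // e.g. "C:\\dir\\" or ".txt"
    if (len >= capacity)
        return PATH_TOO_LONG;     // no room for the terminator

    std::memcpy(filename, start, len);
    filename[len] = '\0';
    return PATH_OK;
}
```

Under these assumed rules, both "\\1.2.3" and "1.2.3" yield 1.2. A different specification could legitimately choose otherwise, which is exactly why the choice needs to be written down before anyone argues about the implementation.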
Bottom line: before worrying about implementation concerns like uninitialized pointers, you first need to determine your requirements, then codify those requirements in a specification. Complete and accurate specifications mitigate or eliminate many other deficiencies. Scott Meyers and I talked about this recently, and we both agree: inadequate specification is arguably the single biggest deficiency in most programming projects.
Heisenberg's Turtle
Q
Hi Bobby,
Can you please help me with the following problem. I have posted it to various discussion forums and asked my colleagues, but no one seems to have an explanation for it.
I am using the VC++ compiler. I was generally going through some C++ columns and experimenting with some code. While trying to declare a class and including an object of that same class in its definition:
    class A { A a1; };
I discovered that the compiler gave the error
    'a1' : uses 'A', which is being defined
which is quite understandable. But when I preceded the declaration with the static keyword:
    class A { static A a1; };
the code compiled fine.
I fail to understand why making the declaration static lets the class compile. Any light on this matter is appreciated.
Thanking you in anticipation. Kabir Khanna
A
I'm turning your question around somewhat. You want to know why the compiler accepts the static version, but I'm first going to show why the compiler rejects the non-static version. I'm also going to make the question generic to both C and C++.
When you define a class or structure type, you explicitly and implicitly imbue that type with various properties. One of those implicit properties is size. Consider this small program:
    #include <stdio.h>
    struct X;
    typedef struct X X;
    struct X
    {
        char m1;
    };
    int main(void)
    {
        printf("%d\n", (int) sizeof(X));
        return 0;
    }
which you can build in either language. We know that sizeof(X) is at least the combined size of its parts, which in this case means sizeof(m1). Since the Standards define sizeof(char) to be exactly 1, we infer that sizeof(X) is at least 1.
To mimic your original problem, add a second data member:
    struct X
    {
        char m1;
        X m2;
    };
We already know that the compiler will reject this definition. But let's assume, for the sake of argument, that the compiler accepts this definition. What is the size of X now?
As before, the size of the entire structure is at least the combined size of its data members. The algebra for our example is
    sizeof(X) >= sizeof(m1) + sizeof(m2)
Substituting for sizeof(m1), which we know to be a constant 1:
    sizeof(X) >= 1 + sizeof(m2)
The only variable left is sizeof(m2). Since every complete object has non-zero size [2],
    sizeof(m2) >= 1
By implication,
    sizeof(X) >= 1 + 1
or
    sizeof(X) >= 2
Remember, though, that m2 is of type X. Since we have shown that sizeof(X) >= 2, we also know that sizeof(m2) >= 2. In our original relationship
    sizeof(X) >= sizeof(m1) + sizeof(m2)
we can now substitute
    sizeof(X) >= 1 + 2
or
    sizeof(X) >= 3
I hope you can see where this is going. The pattern is recursive, and sizeof(X) will approach infinity. Long-time Diligent Readers might recognize this phenomenon as "turtles all the way down." [3]
As a remedy for such reptilian excess, the Standards disallow definitions of objects having incomplete type. If you want to define an object, that object's type must be complete at the point of the definition. If the act of defining an object changes the object type's size, that type is clearly not yet complete [4].
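By contrast, a pointer to X is a complete type of fixed size even while X itself is still incomplete, so the recursion never starts. The following is a minimal sketch of that contrast rather than anything from the reader's code; chain_length is a hypothetical helper added only to show the structure in use.

```cpp
#include <cstddef>

struct X {
    char m1;
    X*   m2;   // OK: X is incomplete here, but "pointer to X" is complete
};

// Hypothetical helper: counts the X objects reachable through m2,
// stopping at the first null pointer.
std::size_t chain_length(const X* head)
{
    std::size_t n = 0;
    for (const X* p = head; p; p = p->m2)
        ++n;
    return n;
}
```

Here sizeof(X) depends only on sizeof(char) and sizeof(X*), both of which the compiler knows before it reaches the closing brace, so the earlier algebra terminates after one step.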
Now to your original question. In the C++ definition
    class A
    {
        A a; // error
    };
the member a is defined within class A. But in the variation
    class A
    {
        static A a; // OK
    };
the member a is declared but not defined within class A. The actual storage for a exists outside any A object. From the C++ Standard (subclause 9.4.2, "Static data members"):
A static data member is not part of the subobjects of a class. There is only one copy of a static data member shared by all the objects of the class.
The declaration of a static data member in its class definition is not a definition and may be of an incomplete type other than cv-qualified void.
Thus, the size of A is in no way dependent on the size of a. Further, by the time a is actually defined:
    class A
    {
        static A a; // declaration OK
    };
    A A::a; // definition OK
the type A is complete. Net result: both the a declaration (within A) and the a definition (outside of A) are valid.
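As a rough check of both quoted claims, the sketch below gives A one ordinary char member alongside the static one and compares it with a struct that has the char alone. The member m, the struct JustChar, and the helper shares_static are assumptions added for illustration; they do not appear in the reader's code.

```cpp
#include <cstddef>

class A {
public:
    static A a;   // declaration only; A is still incomplete here
    char m;       // the only non-static data member
};

A A::a;           // definition; A is complete by this point

// Same non-static layout as A, but with no static member at all.
struct JustChar {
    char m;
};

// Hypothetical helper: true if two A objects refer to the same
// static member, which they should, since only one copy exists.
bool shares_static(const A& x, const A& y)
{
    return &x.a == &y.a;
}
```

On typical implementations sizeof(A) equals sizeof(JustChar), which suggests that the static member contributes nothing to the size of an A object, and shares_static returns true for any pair of A objects.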
Typedefs and Templates
Q
Hi,
This is my first time programming in C++/Unix and I have a combined function pointer and template function. The compiler complains about the following code on the typedef line:
    template<class T>
    struct node
    {
    };
    // error on following line
    typedef int (*func_ptr)(node<T>);
Now this is the C phrasing for declaring a function pointer and I thought I would mix it with the C++ template idea but all the syntactic variations I tried come up with one complaint or another. Is there a better way to type define a function pointer that has template arguments?
Thanks. Jason Balkman
A
(The example above is greatly simplified from your original.) While you don't say so explicitly in your email, I assume you want
    func_ptr<char> p;
to behave as if it were written
    int (*p)(node<char>);
Unfortunately for you, typedefs can't take template arguments. You can get close with
    #define func_ptr(T, name) \
        int (*name)(node<T>)
    func_ptr(char, p); // OK
You can also instrument the node template to expose the typedef:
    template<typename T>
    struct node
    {
        typedef int (*func_ptr)(node<T>);
    };
    node<char>::func_ptr p; // OK
While compact, this solution creates a tight coupling between node and func_ptr. node should not have to change simply because another type wants to reference it. To remove this coupling, you can move func_ptr into a second template:
    template<typename T>
    struct node
    {
    };
    template<typename T>
    struct node_
    {
        typedef int (*func_ptr)(node<T>);
    };
    node_<char>::func_ptr p;
I Do Declare!
Q
While writing some code I stumbled upon a compile-time error which has me a little baffled. The basic structure of the code is as follows:
    class A
    {
    };
    class B
    {
    public:
        B(A const &);
    };
    void f(B const &);
    int main()
    {
        A a;
        f(B(a)); // LINE1 - OK
        B(a);    // LINE2 - error
    }
LINE1 compiles fine, while LINE2 produces the following compile-time error:
    'a' : redefinition; different basic types
Furthermore, while my Unix boxes also state the problem with redefinition, they also suggest that a has no linkage.
Is it not true that the statement B(a) should produce a temporary unnamed object? I have returned objects from functions using the "return value optimization" many times with no problems.
Could the problem be that the compiler sees LINE2 as an ambiguous expression? The reason I suggest this is because the code compiles fine when LINE2 is changed to (B)a, hence forcing a cast. In other words, it may be that the compiler cannot decide whether LINE2 is a cast of the object a to type B, or whether it is an instantiation of a temporary (and therefore unnamed) object of type B. Philip C. Couchara
A
Your example does indeed show a lovely syntactic ambiguity in C++. Given that B is a type name, you can interpret the statement
    B(a);
as either
1. an expression converting object a to type B, or
2. the declaration of object a having type B, with redundant parentheses around the a.
You want the compiler to choose Interpretation 1 (conversion expression), but the evidence suggests that the compiler actually chooses Interpretation 2 (declaration). Which is correct? Interpretation 2. Why? Because the C++ Standard says so.
In subclauses 6.8 and 8.2, the Standard spells out tie-breaking rules designed to resolve ambiguities like B(a). Put simply, whenever the compiler considers a construct that could be either an expression or a declaration, the compiler interprets the construct as a declaration. Furthermore, other than determining if the affected names are type names, the compiler considers only syntactic properties to disambiguate the statement.
Since B is a type name, the compiler can potentially interpret B(a) as either an expression or a declaration. According to the tie-breaking rules, the compiler chooses to interpret B(a) as a declaration even though that interpretation leads to the redefinition error that you found.
Now consider the correctly-compiling line
    f(B(a));
Because f is not a type name, the compiler cannot interpret this statement as a declaration, but instead reckons f(B(a)) as a function call expression. Because no ambiguity exists, the tie-breaking rules of subclauses 6.8 and 8.2 don't come into play.
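The tie-breaking rule can be observed directly with a small sketch. The counters, the default constructor on B, and the function demo are additions made for illustration and are not part of the reader's code. The default constructor is what lets the declaration form compile once it sits in its own scope, where it no longer redefines the outer a.

```cpp
struct A { };

struct B {
    static int conversions;   // B objects constructed from an A
    static int defaults;      // B objects default-constructed
    B()          { ++defaults; }
    B(A const &) { ++conversions; }
};

int B::conversions = 0;
int B::defaults = 0;

void demo()
{
    A a;

    (B(a));              // cannot be a declaration: a temporary B built from a
    static_cast<B>(a);   // also an expression: another temporary B from a

    {
        B(a);            // a declaration: a new, default-constructed B
                         // named a, hiding the outer a in this block
    }
}
```

After one call to demo, the two expression forms account for two conversions, while the bare B(a); accounts for one default construction and no conversion at all. Wrapping the statement in parentheses, or spelling out the cast, is a simple way to keep it from parsing as a declaration.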
Notes
[1] <http://msdn.microsoft.com/voices/deep.asp>
[2] This is true even for empty objects. See C++ Standard subclause 1.8 ("The C++ object model").
[3] For the etymology of this phrase, check out my very first CUJ column, in the November 1995 issue.
[4] Given both this observation and the current design pattern vogue among C++ cognoscenti, I hereby brand this design the Heisenberg Turtle Pattern. You read it here first.
Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also an alumnus of Microsoft, a speaker at the Software Development and Embedded Systems Conferences, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him on the Internet via rschmidt@netcom.com.