Bobby and friends gang up on Hungarian notation.
Copyright © 1997 Robert H. Schmidt
In the June CUJ, I published this excerpt from a letter sent by Andreas Arff:
As you might have understood by now, if you have bothered to read this far, the question is how we (the programmers) get information from reading source code. Not whether your editor supports #if 0...#endif coloring or if you put an m_ in front of a member. Point is, if you flame hungarian you flame syntax coloring and vice versa, because you actually flame the concept of conveying information in the source code. Give it some thought.
Two months later, I noted that entirely too many Diligent Readers had weighed in with opinions about Andreas' letter. Making good on my promise/threat from then, I now explore your views on this scintillating, life-defining debate.
Well, "debate" may be too strong a term, since nobody who wrote me sides with Andreas! [1] So rather than turn the column into a lopsided Point-No-Counterpoint, I'll look deeper into the responses for philosophical and technical insight. Compared to the past few columns, you should find this one shorter, softer, more subjective, and more personal.
Caveat the First: While converting your email messages from one mail client to another, I lost some of those messages' header information. If your mail did not contain your return address in the message text, I had no way of replying to you directly. Further, if that text didn't include your name, I can't even cite you in this column. My apologies to anyone so affected.
Caveat the Second: I got so many messages that I can't excerpt them all. Instead, I aim to capture the broad themes, while excerpting letters representative of those themes.
Science and Logic
Stuart D. Gatham and Bear Giles find that syntax coloring is a deterministic process, while Hungarian notation is a human convention, presumably prone to inconsistent interpretation. Bear specifically likens Hungarian to hand-optimizing, opining that at best each can match the success of automation, and that at worst each can actually degrade the code.
Greg Miller is more blunt:
This is a classic fallacy. Opposing a single method of conveying information is not at all equivalent to opposing all such methods. It's perfectly possible for one method to be a good thing and another a bad one.
a perspective shared by David "Ditto" Cattarin:
[Andreas has] fallen prey to a common logic fallacy [by stating] that two things are equivalent because they share a common property. Obviously, this is wrong. The real difference between Hungarian and syntax coloring is that one is an encoding scheme and the other is a presentation scheme. There is a *big* difference between the two!
Michael Lee Finney sent a long and fascinating letter, tying Hungarian to normal forms in database theory. Part of his mail reads:
ANY form of name decoration is subject to code update anomalies. More generally, if X can be converted to Y in the language such that existing code does not need to change, then any naming standard and name decoration should be identical for X and Y.
...I would say that if, for a given language, you form a basic set of possible code transformation, and then design a coding standard to be invariant over those transformations then you have reached 3rd normal form. Under C++, I can for instance, transform a macro constant or macro function into a variable or a function, respectively. I can transform a function to a pointer to a function (which is a variable). I can certainly transform the type of any variable to anything else within the semantic restrictions imposed by the code. I can (sometimes) transform a type into a class or vice-versa.
As demonstration, consider this example:
int (*function)() = some_function; int i = function();Because function is a pointer, Hungarian notation dictates that it start with the wart p and be mixed case:
int (*pFunction)() = some_function; int i = pFunction();Later you could turn the pointer into a real function:
int pFunction() { // ... }The rest of the code is oblivious to the change the calls to pFunction look and act the same, so that
int i = pFunction();still works. Unfortunately, because pFunction is now a real function, its name violates Hungarian's tenets. You are faced with the dilemma of either changing the name and all the calls, or leaving the misleading wart p and possibly confusing readers.
As I argued in my first column about Hungarian (one year ago this month!), embedding such meta-information in an entity's name is often tantamount to exposing part of that entity's implementation, a violation of abstraction. Contrast this with syntax-coloring, which as Zoltan Leskowsky points out, "provide[s] easy visualization of information that is already present in the code itself."
Naming schemes (like Hungarian) can expose other than type implementation. For example, where C typically uses macros to simulate small inline functions, C++ uses real inline functions or function templates, features not present in C. Long-standing C convention puts the names of macros in all upper case, while C++ convention puts functions into lower or mixed-case. If you port C macros to C++, then change the macros to functions, you almost certainly will want to change the macro names away from all upper-case.
The change is not required by the code all references to the function could continue using the upper-case name. Instead, the name-change requirement comes from humans attempting to hew to an inconsistent nomenclature.
I have no formal grounding in database theory, so I can't speak to Michael's points about normal forms. But with all the naming complexity in C++, and all the opportunities for an entity's attributes (and therefore its potential Hungarian-style name) to change over a project's lifetime, I find his thesis merits serious thought.
Information Overload
Bear Giles goes on to make this interesting point:
On a related note, I'm kinda disturbed that Andreas used the "conveying information in the source code" argument. Information is best judged by quality, not quantity. Which contains more useful information, a stack of business section stock price listings from the newspaper or a paragraph saying "buy XYZ for the following reasons:..."? (The stock price listings are best described as *data*, not information.)
One of the fastest ways to degrade the quality of information is to repeat it unnecessarily. A quick way to reduce the quality of information is to unnecessarily duplicate it. Unneeded repetition destroys the quality of information.
Redundancy in the face of noise is useful, but no compiler will (or at least should) read code in Hungarian notation, parse the information encoded in the prefix and then act on it.
Ultimately, the code must stand on its own regardless of any Hungarian notation used by the author. Since the compiler can't use the Hungarian notation and, if properly done, it duplicates information already present, the HN is an unnecessary redundancy. That means information arguments should *discourage* the use of Hungarian notation, not encourage it.
This raises an aspect I hadn't addressed directly: independent of the kind and quality of information presented, at some point the information's volume overwhelms its content. Ronald Michaels says this goes counter to "presenting information [with] minimization of cognitive effort."
Sometimes the problem is too much of the wrong kinds of information and too little of the right kinds, as Bear also notes:
[M]y experience is that many people will joyously embrace Hungarian notation as the solution to their problems, while fighting tooth and nail the necessity of merely documenting loop invariants or pre- and post-conditions.
As a long-time developer of test code, I can only say "amen, brother."
No Pain, More Gain
Several of you including John Dearden, Tom Bjorkholm, M. Seitz, and self-proclaimed "less than diligent reader" Frank Albe found that Hungarian made the job of coding harder while syntax coloring made it easier. As reader Seitz put it, "Hungarian notation means extra work for the programmer, while syntax highlighting means extra work for the editor."
I have to wonder if syntax coloring uniformly makes the job easier. Maybe a more appropriate view is that syntax coloring helps you understand a program while costing you some ability to read that program. I know that, after spending so many years reading code as black type on a white page (or later, as green images on a black terminal), I had trouble first adjusting to multi-hued editors.
Imagine syntax coloration extending to normal prose, so that nouns appeared in red, verbs in blue, and so on. While such a scheme would help the underlying sentence structure stand out, I doubt many of us would want to actually read such text. Even so, such coloring would be preferable to Hungarianized prose, for as Arch Robison points out
If Hungarian is so great, d_why v_do a_its n_proponents d_not v_use n_it p_in a_their n_comments? (d=adverb v=verb a=adjective n=noun p=preposition).
The Persistence of Warts
Many of you noted this dichotomy: syntax coloring changes how something appears, while Hungarian changes the thing itself. Hugh S. Myers put it this way:
We might play with point size or even use italics and differing fonts for a variety of effects but the words remain themselves. Hungarian notation insists that this is not good enough, that the word itself is deficient! Nonsense! If the word is wrong, use the right word this is after all what communication is about.
Dave Callaway writes that warts are "indelible" make one change to a class name, and all objects related to that class also require a name change. He also notes that the wart prorogation is most severe in the presence of public data (another fave of mine), since a change in the data member's type would require changes throughout the code.
In contrast, syntax coloring is an artifact of your view into the code, and not part of the code itself; or, as Tim Whitehouse puts it, Hungarian lasts the lifetime of the code, while coloring lasts only for the editing session. Dennis R. Sherman calls Hungarian "early and immutable" binding, while presumably coloring is a late and mutable binding a nice symmetry with C++'s notions of early (static) and late (dynamic) object type binding.
Arch Robison, David Cattarin, and Bear Giles wonder why Hungarian couldn't appear as an editor option; that way, the alleged advantages of Hungarian would not infest the actual code. This may not be so far-fetched. In current technology, C and C++ source can appear to be rich text (with different colors, styles, and so on), even though the underlying real text is flat. Similarly, I can envision an editor parsing source and artificially displaying warts as part of the pseudo-rich text.
Such an editor would address Arch's and Chris Waters' complaint that, for programmers sharing a project, syntax coloring is an individual choice different programmers viewing the same code can use different coloration or even different editors while Hungarian imposes on everyone. Donovan Balli, who claims to "enjoy [my] articles, and quite often agree with [my] views," says he would actually use prefer a Hungarianizing editor over syntax coloring.
Sidebars and Sundries
Apparently I'm not the only one who's deigned to publish anti-Hungarian sentiment. Paul Koning writes:
I recently read some nasty comments about Hungarian notation in the Linux coding standards. Linux said "Encoding the type of a function into the name ... is brain damaged the compiler knows the types anyway and can check those, and it only confuses the programmer."
On a different tack, Paul goes on to write:
And by the way, putting an "m_" in front of a member name is another case of Hungarian notation; I view it the same way. On the other hand, I couldn't conceive of ever doing the member initialization hackery that you described on the next page anyone who codes an initializer of the form "i(i)" in production code deserves to be shot.
An interesting dilemma, showing the difficulty in staying pure. I find that C++ forces all sorts of such compromises in places where extra features have been layered atop C. While I don't know for certain, I feel safe guessing that C was not designed to be simply and consistently extended.
I often wonder how many of these problems would go away if C++ were designed with more reliance on explicit syntax (with some extra keywords) and less reliance on "overloaded" punctuation symbols, positional clues, and arbitrary tie-breakers. Even a single change adopting a Pascal-like syntax for declarations would disambiguate C++ greatly.
Finally, I'd like to end this discussion with a few honorable mentions:
- Zoltan Leskowsky, for writing "my memory of Hungarian notation is very sketchy (inexcusable in my case, since both of my parents are Hungarian-born)." His punishment: 24 straight hours of Green Acres reruns.
- Michael Heyman and his perceptive observation that the "thing about [syntax] color [that] sets it apart from Hungarian is that it is in color."
- Jamie Blustein calling my column "impressive." But y'know, so is Windows' market share, so I'm not sure what to make of this.
Erratica
On page 67 of my January column, I suggested this C pseudo-construction technique:
#define zero_construct(type, variable)\ type variable;\ memset(&variable, 0, sizeof variable) void f() { zero_construct(char, c); zero_construct(struct s, x); zero_construct(void *, p); }Unfortunately, as Diligent Reader Bob Kamins points out, this code compiles in C++ but not in C! To see why, run the above through either language's preprocessor. The result is
void f() { char c; memset(&c, 0, sizeof c); struct s x; memset(&x, 0, sizeof x); void * p; memset(&p, 0, sizeof p); }C++ allows such interspersing of declarations with other statements. C does not, rendering my supposed C++-ish technique useless in C.
Bob's mail, originally sent to PJP, ends charitably:
Thanks for a really fine publication. It is always a pleasure to read.
A pleasure, even when it's buggy. I'm pretty sure I know what happened. I build all my code examples before publishing them. I suspect I had this example in a .cpp file, which my compiler interpreted as C++ instead of C. Live and learn.
Fibrillation
The fame and glory that inevitably follow me like a second shadow are taking their toll. Several non-CUJ obligations draw nigh, with preparations for SD '97 East looming most large. Therefore, as was the case last year, this column will skip a beat in November. In December, I'll resume The Learning C/C++urve's abstracted meanderings across the MAPI landscape. Until then, I hope to see you in Washington DC at SD '97 East.
The Fault is not in our Stars
If you read the tiny capsule bio at the bottom of this column's first page, you'll learn that I once was an astronomer. Growing up in SW Ohio, I spent uncountable moonless nights outside, summer and winter, dusk to dawn, armed with little more than my venerable 6" f/8 Newtonian, Norton's star atlas, and red-lens flashlight. Once astronomy became my job I had a planetarium and larger scopes available, but never lost my fondness for that original reflector.
Sadly, since I've moved to rainy Puget Sound, my astronomical pursuits has been mostly from the armchair; I reluctantly left my Newtonian with an old friend back home, and now rely on CNN and JPL's web sites for astronomy pictures and news.
Still, as the saying goes, they can't take the country or the astronomy out of the boy. As I write, two events have caused me to spontaneously regress to those boyhood days:
- The Pathfinder's Sojourner rover has been successfully scooting about the surface of Mars. I remember watching the first pictures from the Viking lander in '76, as they appeared stripe by agonizingly slow stripe across NASA's monitors. While that grabbed the imaginations of us hard-core devotees, it pales beside the collective consciousness surrounding Pathfinder's "live" color stereo pictures on computers world-wide. [2]
- The motion picture Contact has just been released in the U.S. While I could probably write a whole column on the movie [only to have it cut by some joyless managing editor mb], I'll mention only this: if you've ever looked out to the stars and wondered if anyone was looking back at the same time, you must see this film. I cannot recommend it strongly enough. o
Notes
[1] This does not mean everyone sides against Andreas. It means that everyone who wrote me sides against Andreas. Maybe only malcontents send me mail. (I can hear my managing editor muttering "birds of a feather...")
[2] When my parents first moved to Sedona AZ, I dubbed their backyard "Mars," since the ruddy rocky terrain reminded me of those early Viking pictures.
Bobby Schmidt is a freelance writer, teacher, consultant, and programmer. He is also a member of the ANSI/ISO C standards committee, an alumnus of Microsoft, and an original "associate" of (Dan) Saks & Associates. In other career incarnations, Bobby has been a pool hall operator, radio DJ, private investigator, and astronomer. You may summon him at 14518 104th Ave NE Bothell WA 98011; by phone at +1-425-488-7696, or via Internet e-mail as rschmidt@netcom.com.