January 1999/We Have Mail

Departments

We Have Mail

Letters to the editor may be sent via email to cujed@mfi.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.

Dear CUJ,

Here is a real mystery I have been unable to solve. Maybe a CUJ reader can suggest a solution. The following are the first ten lines of an MFC C++ .cpp source file:
m_pProfileProperties->m_StartHour =
    m_timeStart.GetHour();
m_pProfileProperties->m_StartMin  =
    m_timeStart.GetMinute();
m_pProfileProperties->m_EndHour   =
    m_timeEnd.GetHour();
m_pProfileProperties->m_EndMin    =
    m_timeEnd.GetMinute();
m_pProfileProperties->m_StartHour =
    m_timeStart.GetHour();
m_pProfileProperties->m_StartMin  =
    m_timeStart.GetMinute();
m_pProfileProperties->m_EndHour   =
    m_timeEnd.GetHour();
m_pProfileProperties->m_EndMin    =
    m_timeEnd.GetMinute();
// Includes
#include "stdafx.h"
The question is, simply, why does this compile? The compiler (MSVC 5.0) should absolutely refuse to accept the first eight lines, shouldn't it?

m_pProfileProperties, m_timeStart, and m_timeEnd are member variables of a class that is not until defined after stdafx.h is included (so they're not in the precompiled header). Even if they were global variables (they're not, we checked), this code shouldn't be valid before the variables are defined — the compiler lacks all type information for them. I would expect I'd get an error message saying something such as "unknown identifier m_pProfileProperties."

Further compounding the mystery is that according to Source Safe, these lines have been in the code base for close to a month, and been through many complete rebuilds, including several beta versions, several Gold Candidate CDs and a final Gold Master CD release. They apparently do no harm.

What on earth is the compiler thinking it sees?

Steve Cohen

Here's what Microsoft has to say about it:

From what Steve says in his letter I gather that this source code is compiled using a precompiled stdafx.h. What he describes happens because the compiler, when using a precompiled header, looks for the include named in the /Yu switch. Everything up to that include is treated as a comment.

Kerry Loynd
Microsoft Visual C++ compiler team

To the editor:

In the October 1998, issue of C/C++ User's Journal, Tredd Barton submitted a one-line routine which swaps two integers using three exclusive OR operations. This is a technique I did learn in college, in the 1960s, and it is disappointing that Mr. Barton did not learn this in his courses. The line of code which performed these operations is
x ^= y = (x ^= y) ^ y;
Unfortunately, this is not valid C++ (or C, for that matter). The C++ Standard contains the following statement (Chapter 5, Introduction):

"Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression."

The C Standard has similar text. Mr. Barton's code depends upon the value in x first being updated by the parenthesized expression, and this modified value is used in the leftmost assignment to x. This means that x gets updated twice. It also assumes that the value of y is modified by the parenthesized expression before y's value is used in the leftmost operation. This is a side effect of the expression. The same section in the Standard indicates that the ordering of side effects is unspecified.

It is very common to believe that parentheses affect the order of evaluation in C and C++. This belief is often reinforced because many compilers generate code to evaluate expressions which is simple and follows the evaluation order which the writer desired. But the C and C++ Standards do not require this, and a compiler is free to optimize code to eliminate multiple stores to the same variable, or to rearrange the order. This sometimes results in surprises (i.e., bugs) when previously working code is run through a different compiler, or through a different version of the same compiler, or when the level of optimization is changed.

Mr. Barton's code can be easily broken up into three statements, which do conform to the C and C++ Standards:
x ^= y;
y ^= x;
x ^= y;
Sincerely,

Michael Eager
Eager Consulting
eager@eagercon.com

Thanks, Michael. We received numerous letters alerting us to the undefined behavior in Barton's code, but I think yours offers the most clear explanation. Well, I must confess I have been caught by the old sequence-point problem, and this time I can't blame my senior editor for not catching it. We were on deadline and I slipped the letter in outside our normal techical review process. I guess I was blinded by the apparent cleverness of the scheme. Never again! — mb

I like to think I would have caught it, and suggested the use of commas to separate (and properly sequence) the three expressions. — pjp

Dear Dan Saks,

I just finished reading your Classes vs. Namespaces article in the July 1998 C/C++ User's Journal and have some input.

I agree with your conclusion that classes are superior to namespaces for creating abstractions. However, I have found one particular scenario where namespaces have been incredibly useful in "real life" programs — updating poorly written C code.

I have recently completed a series of projects where my primary goal was to add features and remove bugs in several fairly large C programs. The original programs were classic "spaghetti code." Before I could proceed with my primary tasks, I had to make sense of the existing mess. However, project constraints did not allow a full rewrite using an object-oriented approach. The code would remain strictly procedural.

My first step was to recompile the code as C++, which was straightforward. I then added namespaces according to the following three simple rules:

1) All functions (except main) and all variables must reside within namespaces.

2) Each namespace declaration is divided into exactly two blocks, between exactly one header file and exactly one source file (using namespace extension). This allows for a "public" part and a "private" part.

3) Variables could reside only within the "private" part.

4) using directives were not allowed.

The improvement to the code's readability was incredible. All functions and variables "belonged" somewhere, and I now had a logical basis for reorganizing them into modules. I could immediately look at any function call and determine in which module the declaration resided and therefore what data the function potentially modified. I caught numerous instances of functions without prototypes, misuse of variables, etc. The code "felt" like it was encapsulated. As an unexpected benefit, my code browser (Thomas Goerner's SourceAnalyzer) recognized the namespaces and divided my functions accordingly.

Would I recommend this approach for starting a new project of significant size? Absolutely not. A true object-oriented approach would be greatly superior. However, I found namespaces to be of great benefit when using C++ as a "better C" when faced with legacy procedural code.

Bill D. Pirkle
bpirkle@oico.com

Thanks for sharing your thoughts. — Dan Saks

Dear CUJ,

Just several months ago, I discovered the <tchar.h> generic text mappings header and learned of its purpose. It came as quite a surprise to me, because of all the C++ books I've read (and I've read my share), I cannot recall a single one that ever brought up the subject. Now, I've been programming C++ for about two years. I certainly don't consider myself an expert programmer, but I would like to think that I am in the upper intermediate group. Whatever the case, at 19, I get paid the same as some people at my company ten years older than myself (who have CS degrees). Somehow, though, I was able to go for all this time without even being slightly aware of what the generic text mappings do and why they are so important.

After looking into the <tchar.h> header for a few minutes and toying around with it in some of my code ("Hey, look at that — it works!"), I still hadn't embraced it fully. I thought it was kind of interesting, but had my doubts about whether it was actually worth the effort to adopt in my code. Then, about a month later, I was informed that a Japanese company was interested in purchasing some software I'd written at work. That changed my mind, very rapidly. This program had been written entirely without wide-character support, and I didn't much enjoy the idea of looking back through 15,000 lines of code and trying to internationalize all of it. I got lucky in this particular case, and didn't have to go to all that trouble. But that experience made me resolve to never get into a situation where I would. So I started using the macros, and to my surprise, it was very easy to get up and running using them.

Maybe back when all the computers on earth existed in U.S. universities and research facilities, it was OK to ignore international support. But those days are long gone. <tchar.h> provides an excellent opportunity to easily increase the portability of applications, and we should all take advantage of it whenever possible.

Brenden Mecleary
bman2@ix.netcom.com
http://www.gsidigital.com/bman/

The <tchar.h> mechanism is unique to Microsoft. While that is a very large playground, indeed, it is not the whole world. Both Standard C and Standard C++ have quite a lot of support for programming using large character sets. If you really want to be prepared for an internationalized world, I suggest you study these standards. Several books are available that discuss such matters. (I've written a couple of them myself.) — pjp

Dear CUJ,

The hash code in P.J. Plauger's column in the latest issue is clearly broken. When called from the following simple test loop:
for ( int i = 0; i < 20; i += 3 )
   a[i] = i * 10;
the line:
return (_Pairib(_P, false))
in the primary insert function is executed upon the attempt to insert the value 15.

Jonathon H. Lundquist

Yep. It's a bug all right. The test loop in operator[] should read:
for (_P = _Vec[_N + 1],
     _Pe = _Vec[_N];
     _P != _Pe; )
   if (comp(_Kfn()(_V),
            _Kfn()(*-_P)))
                ;
   // next three lines added
   else if (comp(_Kfn()(*_P),
                 _Kfn()(_V)))
      {++_P; break; }
   else if (_Multi)
      break;
   else
     return (_Pairib(_P, false));
Thanks very much for reporting the problem. — pjp

Dear PJP,

Thank you for taking the time to acknowledge the bug I reported. Unfortunately, after I determined a correct implementation for that portion, I next ran into the following bug:
if (_Maxidx <= size()/bucket_size) {
   if (_Vec.size()-1 <= _Maxidx) {

      // _Vec.size() == 8,
      // _Maxidx == 7
      _Mask = (_Vec.size()<<1) - 1;

      // _Mask == 15
      _Vec.resize(_Mask+1, end());

      // _Vec.size() == 16
      }
   size_type _N =
      _Maxidx - (_Vec.size() >> 1);

   // _N == -1  !!WRONG!!
   for (_P = _Vec[_N],
        _Pe = _Vec[_N + 1];
        _P != _Pe; ) {
      if ((comp(_Kfn()(*_P))&_Mask)
                ==_N)
         ++_P;
      else {
         for(_N = _Maxidx;
             _Vec[_N] == end();
             -_N ) {

            _Vec[_N] = _P;
            if ( _N == 0 )
               break;
         }
      _List.splice(end(),_List,_P++);
      }
   }
   ++_Maxidx;
}
At that point I gave up entirely on the published code, and ported your iterator concept, which was ingenious, to a hash table I had previously written which applies the expansion scheme yours was based on correctly.

You also found the second major bug in the code, which involved a bit of restructuring to fix. That's always the danger of publishing new code that's been only lightly tested. I do appreciate the feedback, however humbling. — pjp o