Departments


We Have Mail


P.J. Plauger

I should preface my remarks here by saying that I have been a fan of yours for many years — ever since reading Software Tools and The Elements of Programming Style. I am therefore inclined to be positively biased when looking at anything that is associated with your name.

I know that you have held a senior editorial position at The C Users Journal for a good while and this factor caused me to move from glancing at the cover of the December 1993 issue to buying it.

The particular article that prompted my purchase was listed on the cover as "On-The-Fly Compression" and I thought it might have a solution for a task I need to handle in the near future. It didn't, but I accept that it was my fault for misinterpreting "on-the-fly" — I wanted something that would have been implemented as a pair of library functions that would look something like:

int compress(char *src, int srclen, char *dest, int destlen, int *state);
int uncompress(char *src, int srclen, char *dest, int destlen, int *state);
These could then be used inside other software that wanted to store, e.g., database records in a compressed format to reduce network traffic in a client/server system or maybe just to reduce storage requirements. As it happens, the article in question was just another file compressor — so I'll still have to write my own little library. No big deal.
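To make the idea concrete, here is a minimal sketch of the kind of pair I mean. The run-length scheme is purely a placeholder (a real library would use a stronger algorithm and would thread dictionary state through the state parameter across calls, which this sketch ignores):

```cpp
// Toy run-length coder with the suggested interface shape.
// The `state` parameter is unused here; a real incremental
// compressor would keep its dictionary state in it.
int compress(const char *src, int srclen, char *dest, int destlen, int *state)
{
    (void)state;
    int out = 0;
    for (int i = 0; i < srclen; ) {
        int run = 1;
        while (i + run < srclen && src[i + run] == src[i] && run < 255)
            ++run;
        if (out + 2 > destlen)
            return -1;                       // destination too small
        dest[out++] = (char)(unsigned char)run;  // count byte
        dest[out++] = src[i];                    // data byte
        i += run;
    }
    return out;                              // compressed length
}

int uncompress(const char *src, int srclen, char *dest, int destlen, int *state)
{
    (void)state;
    int out = 0;
    for (int i = 0; i + 1 < srclen; i += 2) {
        int run = (unsigned char)src[i];
        if (out + run > destlen)
            return -1;                       // destination too small
        for (int k = 0; k < run; ++k)
            dest[out++] = src[i + 1];
    }
    return out;
}
```

Each call returns the number of bytes written, or -1 if the destination buffer is too small, so a caller can compress individual database records before shipping them over the network.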

However, in order to determine whether I could use the material in the article, I had to read it and that was when I became very disillusioned. I am quite willing to believe that you personally may not have read the article, and I hope this is so — it hardly meets the standards of The Elements of Programming Style, except as an illustration of what not to do.

Since I know you are quite able to read it for yourself and to see the faults, I won't waste your time here with a long listing of every nit I could pick with it. However, I should at least indicate the nature of the most disturbing deficiencies so that my concern is clarified.

My other concern about the magazine is the amount of space devoted to C++ even though the title doesn't mention C++. I have no argument with your choice to cover this extra language, but I do think that you should come clean on the cover if that's your intention. I have read every word in a couple of Stroustrup's books (including the ARM — twice) and nearly every word in several other books as well as numerous articles on C++ and I am now quite satisfied that I can do without it for as long as C remains. For this reason, I'm not interested in reading magazines that give more than passing coverage of C++.

If you're still reading this, I hope you'll appreciate that my goal was not to sit down and give you a good beating over the head. I still look forward with interest to opportunities to read your own words wherever they may appear — I particularly liked your editorial column in the CUJ issue that I'm complaining about.

Anyway, I'm not asking you to publish this very lengthy letter in the magazine, although I do hope that the points I have made might inform future practice there.

It seems to me that many programmers look to the CUJ for guidance in good programming. Speaking as somebody who has worked as a professional programmer for more years than I want to count and who has had to do a great deal of training of professional programmers, I would hate to think that I could not recommend your magazine to people.

Best wishes and thanks for your books and articles,

Greg Black
681 Park Street
Brunswick, Vic. 3056
Australia
gjb@gba.oz.au

Much as I personally care about portability and rigorous style rules, I've long since come to appreciate that the larger community doesn't care nearly as much. Many programmers can write C for PC compatibles and never miss the incremental benefits of avoiding compiler, system, or architecture dependencies. Editorially, we strongly favor tidy code, but our bottom line is that each article have something that many of our readers can profit from learning.

So it is with C++. We refuse to abandon C, as some new publications dedicated to C++ have done. Equally, we refuse to ignore C++, even though a measurable fraction of our readers hope it will go away (at least that's what they say, from time to time). We decided to change the name of the magazine, in part to meet legitimate criticism such as yours about truth in packaging. We continue to refine our editorial focus. Feedback such as you provide helps us in that regard. (Please note, by the way, that we not only read but printed your entire letter.) — pjp

Dear Mr. Plauger,

How much longer is this going to continue? Almost one year ago, when I first saw Natural Language mentioned on the cover of your April 1993 issue ("A Natural Language Processor," Suereth, Russell) I was thrilled. It's not often that natural language is mentioned in public these days. But, alas, the article did little more than demonstrate some simple natural language patterns, useful (in my estimation) only for learning how not to do things. The second article, which followed in the June 1993 issue, did little more than expand the number of IF statement blocks for your readers to type in. What kept me from writing a letter then was the number of complaints that I read in your column about the relative worth of these articles. I thought that the matter was put to rest.

But now, having received my April 1994 issue, we can celebrate the year anniversary of this creation with yet another article ("Expanding a Conversation Processor for Time") that again does little more than expand on the number of IF blocks and patterns to recognize. This series is doing little more than frustrating those that recognize the uses of a well written natural language parser, as well as sending those novices that do not down the wrong path. If Mr. Suereth keeps this up, he may eventually produce a program capable of parsing every possible sentence pattern in the English language ... but I certainly do not want to be the one to type in the program!

Perhaps if the sentence patterns that Mr. Suereth's program could parse were somewhat more useful, I could see the point. But just how often is one going to tell a computer that "Jim will be running during the day for one hour," as opposed to something more pragmatic, for example, a scheduling program: "Free one hour every Monday morning in April." Possibly, and this is stretching the program's usefulness in my eyes, one might be able to implement a personal scheduler, such that statements like "I run for two hours every Saturday at 10:00 am" might be useful to tell a computer, but I am hard pressed to think of many other uses.

At the very least I would like to suggest that Mr. Suereth stop the repetitive IF blocks. He claims to be creating a natural language parser. If so, then perhaps he should turn it in on itself and give it the ability to read a pattern string from an external file, for example "PRON AUX VERB PREP DET NOUN" or "NAME AUX VERB PREP DET NOUN", and process these patterns accordingly. (This is still little better than a kludge, but at least it should be a smaller kludge.)
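To sketch what I mean (the pattern strings and the actions attached to them are invented for illustration), the whole cascade of IF blocks collapses to a table and one loop:

```cpp
#include <cstddef>
#include <cstring>

// Each recognized part-of-speech pattern and its action live in a
// table; the table could just as easily be read from an external file.
struct Pattern {
    const char *tags;     // e.g. "PRON AUX VERB PREP DET NOUN"
    const char *action;   // what to do with a matching sentence
};

static const Pattern patterns[] = {
    { "PRON AUX VERB PREP DET NOUN", "schedule-activity" },
    { "NAME AUX VERB PREP DET NOUN", "schedule-named-activity" },
};

// Return the action for a tagged sentence, or a null pointer
// if no pattern in the table matches.
const char *lookup(const char *tagged)
{
    for (size_t i = 0; i < sizeof patterns / sizeof patterns[0]; ++i)
        if (strcmp(patterns[i].tags, tagged) == 0)
            return patterns[i].action;
    return 0;
}
```

Adding a new sentence form then means adding one table entry (or one line in the external file), not another block of IF statements.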

Duane Morin
dmorin@world.std.com

We got the message that you don't like Mr. Suereth's approach. I apologized in an earlier We Have Mail installment for any false expectations we might have created, but not for running either of the first two articles. We decided to run the third (and last in our collection) because a) we had already accepted it, b) we didn't feel it deserved to be killed, and c) even Mike Swaine, the opinionated Editor at Large for our friendly competitor Dr. Dobb's Journal, allowed as how he enjoyed reading the original article.

If the experts in natural-language parsing fear that we are leading innocents astray, there is an easy fix. Send us articles that you feel better illustrate the state of the art. If they meet our criteria for readability, we'll cheerfully publish them. — pjp

P.J. Plauger

The solution of using typedef/sizeof to check a size is clever. Anyhow, you don't have to make a struct; just do:

typedef char _check[sizeof(foo) == 20];
However, it doesn't work in GNU C 2.x, because zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure which is really a header for a variable-length object:

struct line {
    int length;
    char contents[0];
};

{
    struct line *thisline = (struct line *)
    malloc (sizeof (struct line) + this_length);
    thisline->length = this_length;
}
In Standard C, you would have to give contents a length of 1, which means either you waste space or complicate the argument to malloc. You need to do:

gcc -pedantic
to make gcc complain.

marty
leisner@sdsp.mc.xerox.com
leisner@eso.mc.xerox.com

The correct way to achieve this effect in Standard C is to declare the array with the maximum size it can possibly attain. Otherwise, a conforming implementation is free to complain about a subsequent array subscript being out of range — or just plain get the subscripting wrong. To allocate the structure to a tailored size, you have to ask for the size in bytes of the structure, minus the declared size of the array member, plus the actual size of the array member. Admittedly, this is a messier size argument than that required by GNU C, but it's portable across all conforming implementations. — pjp
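In outline (the maximum size here is invented for the example), the portable version looks like this:

```cpp
#include <cstdlib>

#define MAX_LINE 512    /* assumed maximum the array can attain */

struct line {
    int  length;
    char contents[MAX_LINE];    /* declared at its maximum size */
};

/* Allocate a line tailored to this_length bytes of contents:
   size of the structure, minus the declared size of the array
   member, plus the actual size of the array member. */
struct line *make_line(int this_length)
{
    struct line *p;

    p = (struct line *)malloc(sizeof *p
        - sizeof p->contents + this_length);
    if (p != NULL)
        p->length = this_length;
    return p;
}
```

The size expression is messier than GNU C's, but every subscript of contents stays within the declared bounds, so a conforming implementation has nothing to complain about.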

P.J. Plauger:

I've used over 20 C compilers over a dozen years on a dozen platforms, ranging from CP/M (with BDS, Whitesmiths, and Aztec), 6502 Aztec (CP/M cross), and 8086/MS-DOS Aztec (hosted and embedded systems), through Microsoft, to a host of UNIX platforms (DEC PDP-11, VAX, Sun 386i, Sun3, Sun4/SPARC, National 32xxxx running Genix).

By far and away I prefer gcc. There's help available. You understand what's wrong (this is very important). And fixes are often given to you within days. And if you're capable, you can patch the source.

gcc/gdb is currently a wonderful combination. It has a number of features, has matured over the years, and is very stable on popular platforms (i.e., Sparc, 386/486). There are newsgroups and mailing lists for help and bug fixes. Contrast that with commercial vendors, where you have to pay just to report bugs, and even then the bugs aren't fixed unless enough people complain.

If you're doing serious code development, you should get your code running on a number of platforms/operating systems. UNIX has a wonderful ability to core dump, and the debugger runs in a separate process (none of the MS-DOS weirdness where a program runs fine under the debugger but not by itself).

If you're writing graphics applications there are a number of libraries to hide the windowing system from you. So you can write code for X-Window and Windows. Running code on a variety of platforms leads to more robust code.

I'm not sure about C++. One of the most wonderful things about C is the quality and quantity of source code you can build on (X-Window, BSD, GNU, comp.sources.*, and the C User's Group). One of the best ways to become a better programmer is to read other people's code, and there are oodles of C code to read. I will gladly grant that C is not the best language for everything. Also, I've been using very nice tools to analyze and work with C code. C++ is another story (the tools either aren't there or are immature).

I've been using mkid for five years (at one time I ported it to DOS; I'm trying again to produce a version I can distribute). It's an incredible tool for analyzing C code on small and large projects. What it does is build a symbol table, to which you can then pose queries (e.g., what names are used in a given file). It was posted to comp.sources.unix in volume 24 (mkid2). I'm working on porting it to MS-DOS and cleaning up the UNIX port.

marty
leisner@sdsp.mc.xerox.com
leisner@eso.mc.xerox.com

I omitted your more detailed description of mkid, but I trust you'll answer any queries that come your way. — pjp

Hi Mr Plauger at C Users Journal,

I have been a subscriber since Dec91.

The articles that were most useful to me in the Apr. '94 issue are:

p. 39 — "Creating Spin Controls" by Keith E. Bugg.

p. 91 — "Visibility in C" by Chuck Allison.

Clear, understandable. I learned and reinforced important C items that are easy to take for granted and to get rusty.

p. 113 — "Pointers and Arrays" by Kenneth Pugh. A closer understanding of how to use pointers in various ways.

Thanks,

Howard C Hoagland
Beckman Instruments, Inc.
Brea, California
hchoagland@biivax.dp.beckman.com

Thanks for the feedback. — pjp

Dear Mr Plauger,

I have a problem with qsort. Incidentally, in one of your editorials you mentioned the inventiveness of "letters to the editor" authors in spelling your name. One of the variations, you noticed, is to add an 'h' after the 'g' in your name. Believe me, for some inexplicable reason one does want to put an 'h' there; it is almost an urge. Just thought you might want to know.

Anyway, back to the qsort. Listing 1 shows the code on which I was trying to use it.

Ok, here is what happens (I am using Borland's C++ 4.0 compiler, and I'm using the huge model at Borland's suggestion):

1. If data is less than 4,000 structures (hint, hint: data size is 16 * 4,000 = 64,000), then all is ok;

2. If data > 4,000 then the sort fails.

Undaunted, I use the qsort from your Standard C Library book (typed it in, too, and in the process changed its name to q_sort). Same result, no go. Yet you mention the possibility of sorting 1,000,000 pieces of data. How? Please help. There is no information anywhere I could find in dealing with large amounts of data (Microsoft devotes one measly paragraph to huge; Borland, in its example of qsort, sorts 1,000 pieces of data, but not a million.)

Cordially,

Nikita ANDREIEV
aleksey@netcom.com

As you've obviously guessed, the problem lies not specifically with the sorting algorithm — either mine or Borland's — but with how the compiler addresses data within an object larger than a 65,536-byte segment on an 80X86 computer. My bet is that you haven't doctored up the code in qsort to treat its data pointers as huge. That's where the address arithmetic is doubtless getting curdled. — pjp
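For reference, the sort itself is unremarkable. A sketch with an invented 16-byte record layout (the original Listing 1 is not reproduced here) looks like this; on a flat-memory system qsort handles any number of such records, and on a segmented target it fails only when the array outgrows a 64KB segment:

```cpp
#include <cstdlib>
#include <cstring>

/* Invented stand-in for the letter's 16-byte records; the layout
   and key field are assumptions, not the original Listing 1. */
struct rec {
    char key[12];
    long id;
};

/* qsort comparison function: order records by key. */
static int cmp_rec(const void *a, const void *b)
{
    return strcmp(((const struct rec *)a)->key,
                  ((const struct rec *)b)->key);
}
```

In the 80X86 huge model, every pointer inside qsort that walks the array must itself be huge once the data passes 64KB; that is the doctoring referred to above.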

Dear Mr. Plauger,

1.) The C Users Journal is doing a fine job — I read it cover to cover each month. Please keep it up.

2.) I have a question which may spring from naivete on my part, but which might be a springboard for discussion in one of your future columns: Why doesn't the delete operator set a pointer to NULL after it has released the storage?

Best regards,

Jim Matey

The glib answer is that such extra work is not in "the spirit of C++." It's probably also close to the truth. — pjp
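Nothing stops you from adding the extra work yourself, of course. A minimal wrapper (the name destroy is mine) must take the pointer by reference so it clears the caller's copy; note that it still can't reach other pointers aliasing the same object, which is one practical argument for leaving delete alone:

```cpp
// Wrapper that deletes an object and nulls the caller's pointer.
// The reference parameter is essential: with pass-by-value the
// function would null only its own local copy.
template <class T>
void destroy(T *&p)
{
    delete p;
    p = 0;
}
```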

Dear Bill,

Having read my first ever issue of CUJ — the January 1994 issue, which I mostly enjoyed very much — I am afraid that I find myself pushed to write a very critical letter about the article "A Short Floating-Point Type in C++," by William Smith.

Mr. Smith lists the limitations of his package: float must be IEEE 32 bits, unsigned short must be 16 bits, and long must be 32 bits. In actual fact, at least some of these restrictions could easily be removed by minor changes to the code. However, he doesn't warn us of much more important restrictions than these.

The heart of the package is the two routines:

sfloat::sfloat(float)
and

sfloat::operator float()
which convert between short floats and IEEE ones. I found no less than three major errors or portability constraints in these:

(1) Both routines have short cuts if fsfBias is zero. Firstly, this happens if the two types have the same bias. Mr. Smith assumes this will only happen if the types have the same number of exponent bits; this is true for IEEE 32 bit, but might be false on other systems.

More important, however, is the assumption that it is the second short in the conversion union, and not the first, which holds the more significant bits. While this is true on MS-DOS systems, it is false on many others meeting the criteria listed by Mr. Smith. For more portable code, this short cut should be removed.

(2) When moving the sign from one type to the other, it is shifted left or right by sfBits. This should actually be fBits - sfBits: the difference in the bit numbers of the two sign bits (i.e. the sign is bit 31 in float and bit 15 in sfloat). Luckily, in this case, the two happen to be equal, but it does make me wonder how well the ideas in the code are thought out.

(3) The code that places the exponent into the correct place in the float shifts the value left, then right, then left again; similarly, the code placing the mantissa shifts the value left in two stages. In trying to work out why, I was left with the conclusion that the code assumes that exactly the bits not required will be truncated by the various explicit and implicit casts. I doubt very much whether this code will work on systems with 32-bit (or even 16-bit) ints.

A better method would be to construct appropriate masks in the routine sfloatrange, and use these. For example:

In sfloatrange:
// Two new fields
sfloat::sfExpMask = ((1 << sfloat::sfExpBits) - 1)
    << sfloat::sfManBits;
sfloat::sfManMask = (1 << sfloat::sfManBits) - 1;

In sfloat::operator float():
// Get exponent
u.l |= (unsigned long)(((s & sfExpMask) >> sfManBits)
    + fsfBias) << fManBits;

// Get mantissa
u.l |= (unsigned long)(s & sfManMask) << sfManShift;
The last complaint I have is with the relational operators. These are downright wrong. All four work by treating the short floats as if they were short integers. Now, if both are positive, then indeed the larger float corresponds to the larger integer. However, negative floats correspond to positive integers strictly greater than the ones that positive floats correspond to, and the comparison order is reversed.

Suppose our short floats are signed, with 7 exponent bits and 8 mantissa bits. Then the following ten values will sort in this order (maximum at the top), clearly wrong!

float       bit pattern         hex
-42.0       1 1000100 01010000  0xC450
-2.0        1 1000000 00000000  0xC000
-1.0        1 0111111 00000000  0xBF00
-0.0859375  1 0111011 01100000  0xBB60
-0.0        1 0000000 00000000  0x8000
+42.0       0 1000100 01010000  0x4450
+2.0        0 1000000 00000000  0x4000
+1.0        0 0111111 00000000  0x3F00
+0.0859375  0 0111011 01100000  0x3B60
+0.0        0 0000000 00000000  0x0000
The only fix for this is to do the comparisons properly. Either:

inline int operator <= (sfloat sf1, sfloat sf2)
{ return (float) sf1 <= (float) sf2; }
or, by analyzing the bits:

inline int operator <= (sfloat sf1, sfloat sf2)
{
// If both short floats are non-negative,
// just compare the bit patterns
if (!sfloat::Signed ||
    !((sf1.s | sf2.s) & sfloat::sfManSignMask))
        return sf1.s <= sf2.s;

// +0 == -0, but the bit patterns differ
// This test is only true if one number is -0
// and the other is +0 or -0
if ((sf1.s | sf2.s) == sfloat::sfManSignMask)
    return 1;   // 0 if defining < or >

// Otherwise one is negative, so the larger int
// corresponds to the smaller float
return sf1.s >= sf2.s;
}
Finally, I close on a personal note. In your editorial on Technical Corrigendum 1, you refer to the "perversity" of believing that x<3&&0>0 must be parsed as containing a header file name. While it looks perverse, it is what the C Standard says. Some of us feel that we have to go by the words of the C Standard, not "what it ought to say." Otherwise why have a C Standard at all?

Clive D.W. Feather
Santa Cruz Operation
Croxley Centre
Hatters Lane, Watford
WD1 8YN, United Kingdom

Clive Feather has distinguished himself in recent years within the ISO C committee SC22/WG14 as a master picker of nits. (And I mean that in the nicest possible way.) That he could find mostly portability bugs in Mr. Smith's article should be taken as a compliment, in its own way. And speaking of complements (almost), my favorite way of testing signed-magnitude values using two's-complement integer arithmetic uses the magical (integer to integer) conversion:

y = 0 <= x ? x : -x + SIGN_BIT;
Here, the constant SIGN_BIT is an integer of the proper size with just its sign bit set. It's fun to study what this expression does with a negative zero.
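A quick demonstration on the sample bit patterns above (arithmetic is carried out in plain int, which is wider than 16 bits, so negating the pattern cannot overflow; the helper that sign-extends the pattern is mine):

```cpp
const int SIGN_BIT = -0x8000;   /* 16-bit integer with only its sign bit set */

/* Map a 16-bit signed-magnitude pattern to an integer whose
   ordinary two's-complement order matches the float order. */
int ordered(int pattern)
{
    int x = (pattern & 0x8000) ? pattern - 0x10000 : pattern; /* sign-extend */
    return 0 <= x ? x : -x + SIGN_BIT;
}
```

Run over the table from the letter, ordered produces strictly increasing values from -42.0 up to +42.0, and it maps -0.0 and +0.0 to the same integer, which answers the negative-zero teaser.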

As for Clive's last remark, I agree — perhaps more than he thinks — that a standard should be readable on its own merits. I still brand as perversity the all too common challenge, "Well, I suppose one can read the standard the way you intended, but I can also make a case for reading a silly meaning into it, and I insist on doing so." I take such assaults as guidance for making future drafts of the standard clearer, but not as proof that the developers of the original draft screwed up and produced a horribly flawed document. (That's a popular indoor sport among those who have the luxury of second guessing.) I'm not saying that Clive Feather plays this game, but I think he drinks an occasional (warm) beer with people who do. — pjp

Dear Sir,

I have just read February 1994 (Volume 12, Number 2) edition of The C Users Journal, and would like to make a comment about the technical content of the feature article, "Intuitive Access to Bit Arrays." I am distressed that, while in general the article is a good introduction to the use of C++ concepts, in the Other Applications section of the article the author seems to show a lack of understanding of the use of the array operator[], both in C++ and C, and thus promotes the common mis-belief that it is not possible to permit multiple arguments for operator[].

Surely the author has used C before and used arrays like this:

int TwoDimensional[10][5];
TwoDimensional[5][3] =
   TwoDimensional[9][3];
How is it possible for the author then to suggest the absolute kludge operator(), and to use it in a way which is not at all array-like in nature?

It is extremely simple to get the normal array operation in C++. Using his example, we could let a VideoDisplay contain a Vector of Lines, and a Line contain a Vector of Pixels. The operator[] for VideoDisplay would return a Line, and the operator[] for a Line would return a Pixel, allowing his code to be written as:

VideoDisplay v;
v[5][6] = v[20][10];
where

class VideoDisplay {
   .....
   DisplayLine& operator[](unsigned int x);
   .....
};

class DisplayLine {
....
   Pixel& operator[](unsigned int x);
.....
};
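Fleshed out with placeholder dimensions and a stand-in Pixel type (all invented for the example), the whole scheme is only a few lines:

```cpp
typedef int Pixel;   // stand-in pixel type for illustration

class DisplayLine {
    Pixel pixels[80];                // invented line width
public:
    Pixel &operator[](unsigned int x) { return pixels[x]; }
};

class VideoDisplay {
    DisplayLine lines[25];           // invented line count
public:
    DisplayLine &operator[](unsigned int y) { return lines[y]; }
};
```

Because each operator[] returns a reference, v[5][6] works on both sides of an assignment, exactly as a built-in two-dimensional array would.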
Thanks again for an interesting magazine.

Regards,

Rohan Lenard
rjl@iassf.easams.com.au
+61-2-367-4555

One programmer's kludge is another's clever notation. Your approach is eminently readable to me, but then I've been reading C code since the language was first invented. — pjp

PJP,

The week does not go by without someone needing to quote the font of all wisdom, saying something like "Yeah, but Plow-grrr says to learn a subset of C++ you are comfortable with...," or "Play-jer says not to trust software you didn't write yourself..." Finding that the someone is sometimes me, I'd really like to be able to say "He says his name is pronounced..." :-)

So, how does the well informed computer professional (one who stops all activity until at least the editorial and letters of CUJ are read) pronounce your surname correctly?

respect, admiration etc.

Ron Chernich

My family has always pronounced it PLAW-grr, but what do we know? I'll take credit for the first quote, but not the second. Take that as a warning if you quote me: I might not have said it, or I may have been wrong. — pjp

Editor,

I was reading through Ballay and Storn's article, "A Tool for Checking C Coding Conventions," in the (evolutionary) C/C++ Users Journal. On page 42 they declare:

enum E_BOOL {TRUE, FALSE};
Isn't this dangerous? Wouldn't the following readable code fail?

while (TRUE)
{
 ....
}
Better might be:

enum E_BOOL {FALSE=0, TRUE=1};
Rich Gossweiler

Yes, it's dangerous. Yes, your way is better. It got by me. — pjp
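Spelled out (the _BAD/_OK suffixes are ours, so the two declarations can coexist):

```cpp
// With the article's declaration the first enumerator is 0,
// so TRUE would be false and while (TRUE) would never loop.
enum E_BOOL_BAD  { TRUE_BAD, FALSE_BAD };

// Mr. Gossweiler's fix pins the values explicitly.
enum E_BOOL_GOOD { FALSE_OK = 0, TRUE_OK = 1 };
```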