Departments


We Have Mail


Dear C Users Journal,

Your article on pointers by Christopher Skelly in the February 1993 issue had several good points to it. The most useful was the author's careful distinction on page 98 regarding the mistake often made by teachers of C, that "an array name is a pointer," which, as the author points out, "is in fact quite false..." His example of the sizeof operator evaluating to the size of the entire array instead of the size of a pointer is a good one. I also find his "Ladder of Indirection" a very simple and useful visualization concept.

However, there are several serious errors of distinction that your author makes philosophically which should be considered important by any human being to whom precision of thought is important to their life and their work. All programmers would want to be included in this category.

The first error is on page 94 in the heading "Key fact #2", wherein Mr. Skelly states that "a pointer always 'knows' the type of thing it addresses." His use of the quotation marks around the word "knows" does not eliminate the error: neither a computer nor the code used to run it "knows" anything.

Perhaps you'll find this distinction a bit silly or trivial. But consider living in a world where machines are elevated to the status of the human and the human is relegated to the status of the animal. Mr. Skelly moves us just a bit closer.

Consider the second paragraph on page 94, "Each pointer has a built in sense of the type of the object stored at the address which the pointer contains." The word "sense" applies to a human ability. The proper word here would be description. Also consider the following paragraph containing the sentence, "Even the special case pointer to void points at a specific type and such a pointer has its own set of resources and limitations." The correct word instead of resources could be "attributes", or possibly "functions".

Consider the author's worst error, the use of the word lives, on page 98 toward the top of the second column in the sentence, "The [] operator, usually thought of as being related to arrays, is also a dereferencing operator. p[n] lives on the plane below p." Mr. Skelly, life is a self-perpetuating and regenerative activity engaged in by an organism. Can your p[n] make this claim?

A computer is a machine, a tool without volitional consciousness, without the choice to think or not to think. It must be turned on by the human operator. The application of these types of expressions and descriptions to inanimate objects or to coded central processor instructions is a key to the author's psychology as well as to that of many lonely programmers. Can not some of the best brains in the world, the programmers, consider the importance of a human-centered philosophy? After all, they are not Tron, they are human. I wonder what Mr. Skelly thinks the differences are, if any, between a human and a computer. Perhaps sex?

These errors are diverting to the student programmer in what could have been a much shorter, simpler, and more informative article in one part instead of two. Mr. Skelly should get to the solutions sooner, including, especially, a detailed explanation for his teaser, ++*--*pppp[0]. The entire first section up to "Key Fact #1" could have been eliminated, thereby improving the article substantially. In other words, cut the B.S. and get to the precise facts. Most writers are paid by the word, however.

The final blow against humanity in this potentially useful article was struck by Mr. Skelly (wherein he delineates his philosophy very clearly) toward the beginning of the article on page 94 wherein he states that, "...I would rather be trout fishing than practically anything else." Mr. Skelly, I would rather be controlling my environment through the use of my computers, electronics, chemistry, machinery and any other method I can devise through the use of my living human brain to get the trout to swim right onto my plate, than anything else. Even if I had the surgical ability to tie flies, I doubt I would want to re-live the primitive life of savages like "John the Apostle Fisherman of Biblical times" in a set of wading boots. I compute for a living yes, but I also enjoy the steadily increasing control I obtain through my computer of my environment more than "practically" anything else.

Most Sincerely,

Hutchinson Persons
Engineer

I don't mean this to sound rude, but I know that Chris Skelly is a lot of fun at parties. After reading your letter, I'm not too sure about you. Translation: lighten up a little. I'm not an animist any more than Chris is. Still, at three o'clock in the morning, it's kind of fun to look on life as a kind of Warner Brothers cartoon played for keeps. I believe that the best teachers can package this kind of edgy near insanity for the enjoyment and edification of their students. The Reverend Charles Dodgson (aka Lewis Carroll) had that gift in spades. Chris Skelly has it. I aspire to it too without for a moment abandoning the notion of the universe as a marvelously contrived clockwork.

One man's B.S. is another man's tutorial. One man's reversion to savagery is another man's recreation (re-creation). I'm glad the world has people like you steadily improving the technology. I'm glad it also has people like Skelly explaining it and enjoying it. And I wish Dennis Ritchie hadn't confused arrays and pointers for all time. — pjp

Editor's name game (PJP):

Are you (The CUJ) the League of Women Voters? Are you the ACM? Or are you a Users Journal? What kind of users? Surely, C users, it's written on the cover!

But what should I do if I want to read something about C++? Stand in a book and news store reading all the titles? Mainly, I would look for a cover marked C++, but not PJP? No, he knows that inside CUJ there are C++ articles. But from what do you derive the impression that the normal reader has X-ray eyes detecting C++ inside?

Enter and shop: all articles are marked well. Read an advertisement; always it's C a-n-d C++, and they are proud if they reached this C a-n-d C++.

Imagine the book and periodical store of a company, and the clerk with the task of making a magazine about C++ available, C++, n-o-t C. Could he present a magazine with only C on the cover to his clients?

Try to complete your cover, otherwise you will have a veritable

EDITOR'S NAME GAME
and risk falling short in C++ recognition on the CUJ cover!

Sincerely, an old C and newer C++ reader of CUJ

Hans G.W. Muller

Yup. We're continually working on better ways to show the world what's inside our magazine. — pjp

Dear CUJ:

Matt Weisfeld's article on "Solving Linear Equations Using C" (CUJ, Feb. 1993) contained a small slip in the problem's setup. The graph described by Figure 1 is not the solution space for the given constraint equations x1 = 4, x2 = 6 and 3x1 + 4x2 = 24. Nonetheless, if we switch the coefficients in the third equation, thus obtaining 4x1 + 3x2 = 24, the graph becomes correct. The correct wording, then, should have been: "Each ton of diet needs four pounds of the secret ingredient, while each ton of regular needs three pounds."

Sincerely,

Sergio Gonik
Computational Mechanics Company, Inc.
7701 North Lamar, Suite 200
Austin, TX 78752

Oops. — pjp

Dear CUJ:

Having just finished reading the February 1993 issue, I would like to comment on two articles:

1) I found Frederick Hegeman's article "Sorting Networks" very interesting, and immediately proceeded to run some tests of my own. I found that the Microsoft 7.0 qsort function does indeed show the problems he mentioned for small numbers of comparisons; however, a slightly better-tuned qsort avoids this problem, making (on average) roughly the log(n!)/log(2) comparisons he mentions as optimal worst-case performance.

I would also point out that MergeSort comes very close to the limit, even for small n. Compare worst-case behavior for MergeSort to log(n!)/log(2) and to the worst-case behavior for Bose-Nelson nets:

    n  log(n!)/log(2)  MergeSort  Bose-Nelson
    3               3          3            3
    4               5          5            5
    5               7          8            9
    6              10         11           12
    7              13         14           16
   10              22         25           32
   16              45         49           65
   32             118        129          211
   50             215        237          487
  100             525        573         1511
 1000            8530       8977        57158
10000          118459     123617      2426723

Note also that this is worst-case behavior. MergeSort can do better (though not much so); Bose-Nelson networks will always require the same number of compares. MergeSort's only drawback is that it requires a scratch space as large as the input array. (Code for an easy-to-understand MergeSort, more efficient MergeSort, improved qsort, and test code used to produce the above table is enclosed. I doubt you'll want to print it, for space reasons if nothing else, but you might want to put it on your monthly code disk.)

MergeSort seems to stay within about 10 percent of optimum. It would be interesting, though probably not very useful, to know how much more closely the optimum value can be approached.

As an aside, I should note that Microsoft's qsort goes to n² behavior if the array is almost, but not quite, sorted, or if the array is in descending order. Why this is left so is beyond me, since a simple change of pivot value could fix it.

2) Two small quibbles with David Burki's article on "Date Conversions": First, it drops out the leap day February 29, 4800 BC, making dates before then off by one day. Second, the code treats the year before 1 AD as 0, and the year before that as -1. This is in accord with how astronomers see things (it's the reasonable way to do it), but contrary to how historians view things. They treat the year before 1 AD as 1 BC. Thus, the solar eclipse of 28 Aug 1203 BC (as astronomers reckon it) happened on 28 Aug 1204 BC (as historians reckon it).

The point is made slightly moot by the fact that dates before 1582 AD are almost always Julian anyway. (Code to do the date conversions without encountering these problems is enclosed. Same comment about printing it as before.)

As long as I'm here... Ever thought about putting the C Users Group code on CD-ROM?

Bill J. Gray
Ridge Rd, Box 1607
Bowdoinham, ME 04008

We can always benefit from more lore on sorting. And I always enjoy more calendar trivia. — pjp

Berney Williams, New Products Manager, replies:

CUG Vols. 101-360, including the full text of all three volumes of the directory, are available from Walnut Creek CD-ROM, 1-800-786-9907.

Dear Mr. Plauger,

Since I started using Gimpel's PCLINT, I've become more aware of typing issues in my C code. I am curious about the "proper" use of the standard library function write. I use Microsoft C v6 under MS-DOS where an int is 16 bits. The prototype

int write(int fid, void *buf, unsigned cnt);
returns the number of bytes written if the operation is successful and a -1 otherwise.

I certainly accept that a returned value of -1 implies that I cannot distinguish between an error and a successful write of 65,535 bytes. But if I want to write, say, 48,000 bytes, how is write supposed to return a successful value? This problem would have existed on the good old 16-bit PDP-11s, the machine C was originally designed for.

More generally, shouldn't (in a perfect world) write be prototyped to return unsigned and the error value be defined as ~0? Is the fact that write accepts an unsigned count but returns an int count a semantic flaw in the standard C definition of this function?

I've been programming in C for 18 years and have an appreciation of difficulties the ANSI committee must have had putting a lasso around the definition of C. It's just not obvious to me why write couldn't have been prototyped differently to avoid this particular problem.

Sheldon Hoffman
Reflective Computing
917 Alanson Dr.
St. Louis, MO. 63132
(314) 993-6132

You've discovered an ancient type weakness in the UNIX system calls. Actually, *write* originally took an *int* count argument. Looks like Microsoft tried to half reform it. The C standards committee didn't even try — we left *write* out of the Standard C library. (Other functions have similar problems, however.)

In practice, the problem is not too severe if you follow the pattern

if (write(fd, buf, n) != n)
    <error>

On twos-complement machines, this does what you want in all cases except a write error on a write of 65,535 bytes (not a likely occurrence if that's essentially all the bytes in your data space!).

Your "perfect world" fix for write is about the best you can do, I believe. — pjp

Dear Mr. Plauger:

I have been a subscriber to The C Users Journal for a couple of years now. Before that, I read the magazine sporadically. I'm generally pleased, since I find that I learn something new from almost every issue. On occasion, I have watched silently as computational and mathematical atrocities were committed within your pages. However, I can sit silently no longer. The article by P. J. LaBrocca [Mixed Numbers in C] in your April 1993 issue requires a response.

LaBrocca asserts that "[t]o reduce fractions to lowest terms, you need to know the prime factors of the numerator and denominator." [p.71] LaBrocca expands on this assertion by stating that "taking out gcds does not guarantee lowest terms." [p.73] The problem with these statements is that they are completely and absolutely false. If g is the gcd of two integers a and b, then it is necessarily the case that the integers a/g and b/g have no prime factors in common — that is the essence of the definition of the gcd. It follows that the fraction a/b, when written in lowest terms, must have the form (a/g)/(b/g).

Since LaBrocca obviously believes the falsehood that I quoted above, the code presented in the article is incredibly inefficient. It trades increased space and increased time for increased complexity in the algorithm. All in all, that seems like a poor trade. To be more specific, the data structure that LaBrocca uses to store the ratio of two four-byte integers occupies at least 330 bytes, since it contains room for 40 prime factors each for the numerator and denominator. No wonder that the article must point out that "stack overflow is likely!" [p.83]

The code is also inefficient in the choice of the algorithm. In order to reduce fractions to lowest terms, the author uses trial division to find the prime factors. Factorization is hard — that's why public key cryptography works. Moreover, trial division is a lousy way to factor numbers. The author is again forced to reveal the inadequacies in the code, saying that "[n]umbers with [large] prime factors ... produce all kinds of annoying errors." [p.83] Of course they do. You are solving the wrong problem, using the wrong methods.

Perhaps the worst aspect of LaBrocca's approach is that the correct solution has been known for 2500 years. Euclid's algorithm for computing gcds is a model of elegance and efficiency. If you combine the algorithm with back-substitution, then you can produce not only the gcd of a and b, but the reductions a/g and b/g with a minimal amount of work. (It is also interesting to note that you can determine the worst case performance of Euclid's algorithm, since you know the worst case: consecutive Fibonacci numbers. It's easy to see from this that the behavior is O(log(n)) in the size of the inputs.)

Since I've already shifted into critical mode, let me complain about one more aspect of the design. LaBrocca has confused the Model with the View. (Smalltalk programmers will know exactly what I mean.) Mixed numbers are, at best, a View of the underlying Model for rational numbers. Every arithmetic operation requires LaBrocca to convert the mixed number representation into an "improper" fractional representation. It seems to me that it would be better to store and use the fractional representation internally, and only convert to mixed numbers for input or output. (This depends, of course, on whether you expect to spend more time using numbers for input/output or for computation. Usually, computation wins. With the MixCalc application that was presented in the article, it's probably an even trade.)

Kevin R. Coombes
Department of Mathematics
University of Maryland
College Park, MD 20742
email: krc@math.umd.edu

I knew the approach was suboptimal in several ways, but we can't throw out every article that doesn't take the best possible approach to solving a problem. The botch about the GCD algorithm is my fault, since I edited this piece and I should know better. — pjp

Dear Sir:

I have been wondering about the distribution of the ISO/ANSI C Standard. I know that paper copies can be purchased, but is there an electronic version available? Physically, I'm sure it exists; you made reference to using it in The Standard C Library. But is it available to others? For that matter, what is the legal status of the standard? Who actually owns it, or is it held in some form of public trust?

Thanks much

Kevin Nickerson
nickerson@bix.com

ANSI and ISO are still wrestling with the logistics of electronic distribution of standards documents. Don't expect an answer soon. Yes, the machine-readable text exists, but it is only available "for the purposes of standard formation." Both ANSI and ISO jealously guard their right to make money selling copies of standards.

ANSI, at least, has asserted copyright ownership of the standards it develops. That ownership can be, and has been, challenged, but I for one have better things to do with my time and money than to take on such a battle. I obtained formal permission from ISO to reproduce large portions of the C Standard in my book, and I'm grateful for their cooperation. — pjp

Dear Mr. Plauger,

Apologies for the rather late comment, but it's always some time before CUJ turns up at the local newsstand.

I'm wondering what is the reason for making time_t an unsigned long holding seconds since 1/1/1900 0:00. What comes first to my mind would be a structure similar to struct tm, sans the tm_?day and tm_isdst fields, probably. That would also, I guess, make coding the time conversion functions easier, since you would not have to worry that much about possible overflows. Also, a larger range of dates could be represented using such a time_t.

I can imagine that time_t was chosen the way it is in your implementation because a UNIX system usually returns seconds since 1970 when you ask the current time of it. Still, the question remains: What was the reason for making UNIX behave that way?

Yours sincerely

Soenke Behrens
uunet!informatik.tumuenchen.de!behrenss

The C Standard requires that *time_t* be an arithmetic type, and that the *tm_year* field of *struct tm* count from 1900. I could have chosen a floating-point type for *time_t*, but the tradition is to use an integer type. An *unsigned long* is big enough for most purposes, anyway.

I believe the UNIX folk chose a 32-bit integer representation because that was the largest integer type you could manipulate at all comfortably on the PDP-11. (At that, you had to execute two or three instructions to do most moves, adds, or compares.) They chose the starting date as the start of the decade in which UNIX was born, just a year or so before it started catching on. — pjp