June 1992/We Have Mail

Departments

We Have Mail

Dear Mr. Plauger,
I would like to make a comment with respect to a letter to the editor from Philip D. Pokorny which was published in the January 1992 issue (V10N1). In his letter Mr. Pokorny complains about having to define a structure or structure pointer in order to use the sizeof() operator on a member of that structure. I don't know if the following technique is guaranteed by Standard C, but I've used the following macro to handle this

#define MemSize(s, m) sizeof(((s *) 0)->m)
This macro is used as follows

typedef struct S MyStruct MyStruct; struct S MyStruct { int count; char name[20]; long ref; printf("Size of the name member is %u\n", MemSize(MyStruct, name));
I enjoy the Journal and look forward to each issue. Keep up the good work.
Regards,
Bob Withers
649 Meadowbrook St.
Allen, Texas 75002
As you suspect, the macro is not guaranteed to be portable. Often, however, it does work. It's based on the same trick often used to implement the offsetof macro. — pjp
Mr. Plauger,
Concerning your reply to Mr. Pokorny (philip@cel.cummins.com) about using the sizeof operator to get the size of fields in a structure, this sounds like a perfect use of the fldsiz macro. I know that fldsiz isn't a standard C macro but it is very useful. fldsiz is defined like this

#define fldsiz( str, fld ) (sizeof( ( (str *) 0)->fld))
Basically it typecasts zero as a pointer to a structure to satisfy sizeof's need for a valid pointer. Pretty tricky. This was found in the struct.h header file on a UNIX system. Hope this helps.
Samuel R. Blackburn
76300.326@compuserve.com
Yes, that's a neat macro. Too bad it's not portable. See above. — pjp
Dear Sir,
I am writing in regards to Mr. Hegeman's article on factorial-base arithmetic. It is quite an interesting idea. The possibility for exactness in fractional computation using this system is intriguing. After reading the article, however, I did notice a problem with the representation that Mr. Hegeman neglected to mention.
The problem is with the representation of fractions with denominators with prime factors larger than the number of "places" in the representation. Mr. Hegeman correctly states that every rational number has a finite representation in factorial base, but many numbers have representations that extend to many places.
For example, suppose the largest prime factor of the integer I is P. The factorial-base representation of 1 / I requires the use of P "places" in the fractional part of the factorial-base representation of this number. This is because all of the numbers in { 1!, 2!, 3!,. (P-I) } do not have P as a factor. So, given any rational number represented by N / D, where N and D are integers and are mutually prime, and P is the largest prime factor of D, the fractional part of the factorial-base representation of N / D has P "places." (Whew!)
Suppose we were to try to compute the number 1/101. 101 is prime, so the fractional part of the representation of 1/101 requires 101 "places." Likewise the factorial-base representation of 1000/1021 requires 1021 places just for the fractional part of the representation, which is at least 2KB for this one number in Mr. Hegeman's type of representation.
The code in Mr. Hegeman's article uses 100 "places" for each of the integral and fractional parts of a number. The numbers 1/101 and 1000/1021, therefore, are not representable exactly in Mr. Hegeman's implementation of factorial base numbers, and any attempt to compute them causes an underflow error. This is not a criticism of Mr. Hegeman's implementation, because it approximates the correct value, and the error may simply be ignored. It does, however, mean that there are many numbers that are not representable using a fixed size factorial-base representation, particularly those numbers with denominators with prime factors larger than this fixed size.
Mr. Hegeman's fixed sized factorial-base representation is very interesting, and is able to represent a significant number more rational numbers exactly than a fixed-size, fixed-base representation such as a float or double. I wonder, however, if it might not be simpler — and more exact — to implement a large integer data type using an array of ints, and then simply use this type to implement a ratio data type, as Mr. Hegeman suggests at the start of his article.
Sincerely,
David McCammond-Watts
Programmer, The Bradley Group, Inc.
Chicago, IL 60647
Mr. Hegeman responds:
Mr. McCammond-Watts is quite right that there are fractions not representable using a fixed size factorial-base representation. I mentioned a couple myself. I wish that I had been capable of framing an explanation as clear and concise as his.
However, it should be clear that fractions like 1/101 and 1000/1021 ARE exactly representable within this system. The algorithm is entirely general and perfectly extensible, as I took some pains to explain. It is true that they cannot be represented in a fraction of 100 factorial digits, but expanding to accommodate them is trivial.
He is right that at some point we reach a limit beyond which we cannot go and should be prepared for some divisor with which we cannot cope perfectly. I suggested what that limit might be.
The example of 1000/1021 is particularly interesting. First of all, Mr. McCammond-Watts is mistaken — 1021 places call for longs and that means at least 4KB! Not very efficient compared to a ratio of 2 ints. On the other hand, recall how to approximate e:

e = 1 + 1/1! + 1/2! + 1/3! + . . . + 1/99! + 1/100! . . .
Given that 1/100! is in the neighborhood of 1/10^157.97), ratios and bignums are no more accurate and grossly less efficient than directly initializing a fixed size array of 103 elements.
Approximating e in factorial base (I don't say computing) may seem like a parlor trick, but consider the more useful calculations:

e^x : 1 + x + x^2/2! + x^3/3! + x^4/4! + . . . sin x = x - x^3/3! + x^5/5! - x^7/7! + . . . cos x = 1 - x^2/2! + x^4/4! - x^6/6! + . . .
It is not even necessary to use the factorial divisors! Having altered our perspective to that of factorial-base numbers, each division becomes a multiplication by the reciprocal, which becomes a simple "shift" (by divfbyi ()) since there is only one factorial digit to worry about and the contents of the array at that subscript will only, and always, be 1. Now an integer loop counter serves as an efficient coding of the reciprocal of an arbitrary factorial. Hard to beat that.
There are practical calculations that have a simple, clear, and natural fourmulation using factorial-base numbers. There are those that don't.
Finally, factorial-base calculations behave reasonably when and if they do lose perfect accuracy. The same calculations carried out with ratios and expandable bignums may simply fail altogether when memory allocation fails. A good tool box holds more than one size screwdriver, plus something else to open cans of paint.
Dear CUJ,
The "Wildcard Subdirectory Searches" article in your February 1992 issue opened by stating: "I have often been asked, 'How many C source modules do we have in our Image Processing system?, or, Is there an easy way to delete all of the *.SAV files in all 40 subdirectories?'"
The author then used the next nine pages to present his non-portable and limited (e.g., 127 subdirectories per directory) solution.
Under UNIX, those two questions are answered, in order by the following two one-line commands

find image_system_dir -type f -name '*.c' -print | wc -1 find start dir -type f -name '* .SAV' -exec rm {} \;
These commands will run under any variant of UNIX, from PC through Cray, and suffer no self-imposed limitations.
I can hear someone say, "But I want to use a C function rather than a command line program." If by some chance an appropriate UNIX utility is not available (rarely the case), then basically just pass the same command to popen and read the results using fgets or scanf — again simple, universal, and portable.
If the article's author had been running UNIX on his PC and AMIGA rather than MS-DOS and AMIGA-DOS, he likely could have written a one-line command rather than a nine-page article. And that is why I am more appreciative than ever that I use UNIX rather than proprietary and restrictive operating systems.
Andy Levinson
The Institute OSM Ltd.
11575 Sunshine Terrace
Studio City, CA 91604-3835
I love UNIX too whenever I have to play such games. (I keep a copy of my old Idris system that I can start up in seconds under Windows 3.0.) When I want to choose among 20,000 commercial software packages at a fraction of the selling price, however, I lean toward DOS. — pjp
To the editor:
The article "Hashing From Good To Perfect" (February, 1992, V10N2) was a good explanation of the topic but there is a bug in the shuffling algorithm used. I have coded exactly the same shuffling algorithm and it seemed to work fine. Once I needed to verify the shuffling of a deck of cards or a keno number list, the effects of the bug showed up. After many million runs, a statistical analysis showed that there was a skewing of results toward the lower numbers in the array toward ending up in a lower position of the array.
The incorrect (published) algorithm was: "...for each element in the array, pick a random number from 0 to 255 and exchange the current element with the array element indexed by the random number."
The correct algorithm is: "For each element in the array, pick a random number from X to 255 (where X is the element of the current index) and exchange the current element with the array element indexed by the random number."
Knuth and others have published the correct algorithm. It isn't obvious why it is better and the code for the first version is simpler and cleaner and seems to work. (I know — I have made the same mistake several times now!)
When you examine the second algorithm, you will see that the new version of the array is being built from the bottom up and that an element that has made it into the bottom never moves again. With the incorrect version, the initially lower elements may be swapped back and forth many times. As the shuffle proceeds, the initially higher numbers are given fewer opportunities to be swapped. The effect is that of multiple shuffles with the elements being treated unequally. Another way to look at the correct algorithm is: "Select one of the remaining elements; move it into the new list; repeat until done."
For most applications of the shuffle algorithm (graphic effects, game events, etc.), there might not be a difference but there sure was when I had to actually model a real deck of cards.
Love the magazine — keep up the great work!
Glenn R. Sogge
Lee & Randall Software Associates
104 Janet Avenue
Streamwood, IL 60107-1302
Ron Burk replies: What a cogent explanation! I lucked out, since this particular application is fairly forgiving if the shuffle is not perfectly random. In fact, if you go on to create a perfect hashing function, the heuristics applied after the shuffle can also skew the randomness measurably, but that is a fair trade for a perfect hash. Thanks for the letter; I am fixing my code and scribbling your explanation in the margins of my copy of Knuth.
Dear Mr. Plauger,
I would like to suggest that a GLOSSARY OF TERMS AND ABBREVIATIONS be added as a permanent feature of The C Users Journal.
My suggestion is supported by the following considerations:

Documentation Standards recommend such practice;

there is nothing more frustrating when reading an article than missing a point just because the writer assumes that you are american and/or that his jargon is good enough English;

if such a glossary existed and assuming that it will have to be limited to two or three pages, then your writers will have either to limit themselves to the glossary or start their articles with clear definitions; (they may even think about using plain English rather than americanisms);

After all they want to be understood, don't they?
Note: I am not suggesting to include "standard terms" like C, UNIX, bit etc.
Regards,
A. Bernay
bernay@aldetec.uniwa.munnari.oz.au
Dear P.J. Plauger,
As a "disgrace to the profession," I'm writing to let you know (and Mr. Paul Mann as well) that I've resigned my position as Program/System Administrator at a trucking company in Atlanta. Not only do I not know the difference between bit fields and bit variables but I do not have a degree and have no intention of obtaining one. Thus, my obligation is to resign and take up menial labor. Now, I do have seven years practical experience with both hardware and software, using various languages (with C as my special favorite!) but I guess I'm just a jerk after all!
Seriously Sir, and with no personal offense to Mr. Mann, why is that when somebody decides that we should be categorized according to a set of specific standards, these standards always seem to parallel the qualifications of the individual making the suggestion?
As a person who learned his trade the hard way, I suggest that words have power and should be used with care. Many times while learning C and Assembler, I nearly lost confidence in myself because some well-meaning individual made one of those "real programmers do it this way" statements. Much of the work I do now involves dBase dialects and when I was working for a company that markets operating systems it was considered anathema to admit to such a thing. Our obligation as professionals is to produce product the quickest and best way available. My boss Would be singularly unimpressed if I used C to do a job in twice the time it would take if I did it in Clipper. Similarly, when C is called for, it's my obligation to know the language and how to implement its various features. Even if it means learning the difference between bit fields and bit variables!
My best to you all.
Sincerely,
Michael Peirce
Mac's Customized Dist. Service
Nicely put. — pjp