November 1991/We Have Mail

Departments

We Have Mail

Dear Editor,
Ref: Articles on Skip Lists in February issue.
I haven't been able to make sense out of skip lists nor can I follow the conclusions of your authors. But correct me, if I am mistaken.
For one, there seems to be a general misunderstanding about the virtues of using a random number generator. "Skip lists rely on probability to randomly determine the level of any given node in the list." So, instead of making every, say, 25th node a level 2 node, and every 100th node a level 3 node etc., we leave it to a random number generator to set the levels, so that now e.g. node number 1, 20, 24, etc. are level 2, node number 200, 260 are level 3 etc. Use a random number generator at the end of a conveyor belt, to put different randomly-colored sweets into 5 different boxes. After so many trials, the result will be the same, as if you had used no random number generator and had gone 1, 2, 3, 4, 5, 1, 2, 3 etc... The risk that all reds end up in box number 1, all blues in number 2 is still there... There may be some point, that skip lists are better off than binary trees when the incoming list is already sorted according to the criterion we use for sorting. But there is no point in using a random number generator. If this were the cure, why not use one, two, three superposed random number generators? Random doesn't get more random!
For two, why do searches in skip lists start with the first element? If you started in the center, going left or right depending on the result of the comparison, you would cut search time in half. This for each level. But then you would have a binary tree, or some variation of a binary tree! Now that also the random number generator is out, how far are we away from a binary tree?
For three, if "chances that such a list will be noticeably unbalanced" are about one in 5,000, then I wouldn't like to be the one who runs the application program in this one-out-of-5000 cases. The basic problem is that the program seems to be hanging in such a case. There is a basic rule which is being overlooked here:
We program for the worst case.
When driving, you are prepared for a truck blocking the road just around the bend. Now, if statistically it is proven that this is a 1-in-5000 case, does it make you go faster in a bend? Programs must produce predictable results and have predictable performance.
To develop a good algorithm, one needs to be aware of the resources. If computer memory, programmer's time, and user's time were such resources, with the first especially scarce, good algorithms would look differently than if programmer's time were the true scarcity factor. As memory becomes less and less of a constraint, I would give binary trees with some safeguards (corrections for some reasonable or even complete balance) more of a chance in tomorrow's computer journals than skip lists. I do not share the view that recursion adds to overhead. With only some 20 recursive calls you can access 2 raised to the power 20 nodes in a balanced binary tree! Make that 40 for a reasonably balanced binary tree? Say 30 bytes for each recursive call. Where is the overhead? Of course, binary trees are not intrinsically linked with recursion; recursion simply makes things easier, by letting stack take care of your data which might otherwise have to go to an array. What I think is basically missing in the articles, is a performance comparison along those lines. I won't do that, because I think that binary trees have won already just on account of the above very general considerations. "The more I come to look at binary trees, the more I am convinced that they are better than their predecessors and their successors." Correct me, where I am wrong.
Sincerely yours,
L. Engbert
Engbert UB
Taunusstrasse 8
D-6384 Schmitten 1
Germany
I may be wrong, but I believe that the main reason for making random choices in building skip lists is to avoid correlations. You can wire in a simpler choice function with the same mean behavior and win on performance most of the time. But if the data you are organizing follows the same wired-in pattern, it can easily end up perversely organized. You see this phenomenon with certain implementations of Quicksort when you present data that is exactly in reverse sort. A random choice of pivot element is more expensive but more robust.
We program for the worst case far less often than you might think. The Law of Large Numbers tells us that statistically safe algorithms are pragmatically safe even if the extreme behavior is unacceptable. That's because the extreme happens so seldom. Only hard real-time systems need protect themselves against these extremes.
Stick with binary trees if you want. I for one celebrate every algorithm 1 can learn. You never know when a particular ugly duckling will prove to be a swan. — pjp
Dear Mr. Plauger:
In your response to Maynard A. Wright (August 1991, p. 136) you wrote, "By the way, the C Standard requires that functions returning structures be reentrant now."
Can you please specify, by section, where in the standard this is stated? I have not been able to find any mention of this in any of my references ("The C Programming Language, 2nd ed.", Kernighan & Richie; "C: A Reference Manual, 2nd ed.", Harbison & Steele; "Draft Proposed ANSI Standard Programming Language C, Dec., 7 1988," ANSI X3J11 Committee ). In the future, please cite your references.
Also: is the burden of this requirement the responsibility of the programmer or the compiler? Reentrancy most often has to do with recursion and interrupt handling. PL/M for example, requires that any procedure that is recursive (directly or mutually) be defined with the reentrant attribute. This tells the compiler to allocate space for local variables on the stack, instead of in static memory. This way, when the function recurses, the local variables don't get corrupted.
In C, there is no special syntax to specify that a function is recursive. This means that the compiler must recognize when a function is recursive or always store local variables on the stack.
Can you give an example of a function that is not reentrant?
On another topic, in the "Readers' Replies" section of Ken Pugh's "Question & Answers" column (December 1990, p.93), Bob Withers provided a very nice solution to Bob van der Poel's problem with external declarations (see August 1991, p.126).
In file global.h:

#ifdef DRIVE # define CLASS #else # define CLASS extern #endif CLASS int i;
In file main.c:

#define DRIVER #include global.h
In file foo.c:

#include global.h
He defines a macro called CLASS, which takes on one of two values. If a second macro, DRIVER, is defined, then CLASS is defined to be a null string. If DRIVER is not defined, then CLASS is defined to the string "extern". Thus, in main.c, i is declared as int and in foo.c, i is declared as extern int. This way, all externals are declared in exactly one place and declarations and definitions are forced to match.
I greatly enjoy your magazine. Keep up the good work.
Sincerely,
Terence J. Griffin
711 SE Tacoma St.
Portland, OR 97202
Here is one of those places where the C Standard enforces a requirement by being silent. Nowhere does it say that an implementation can botch a returned structure value if a signal comes along and calls the same function. Thus, the implementor must get it right. X3J11 explicitly discussed this issue. We probably put words about it in the Rationale, but I don't have a copy handy to check. That's the only place where you're likely to find a positive statement of this decision.
All functions in C can be called recursively. Section 2.1.2.3 of the draft you have should state that autos are private to each invocation of a function. You can, of course, craft a function that misbehaves when called recursively. Simply declare all your local data to have static storage class. Leave it auto, however, and it will come out right. — pjp
Sir (via e-mail):
Please accept my gratitude for your contribution and clarification of the C language. Although most of my work requires dBASE, I find that C is the medium that I pursue with the greatest enthusiasm. I read all of your articles in The C Users Journal with great interest but nothing has ignited my curiosity as intensely as the July article on Math Functions.
You mention FORTRAN. I have always supposed FORTRAN to be the preferred language for numerical applications. I have no idea why this is and I know nothing of that language. I am in anticipation that you are about to explode that supposition, however, and perhaps I should be patient until you finish the series before postulating this inquiry. But I am not patient and so I ask.
1. If FORTRAN is indeed the premiere language for mathematical applications, is this due to some "algorithmic" characteristic of it's libraries or to some other attribute?
2. If the above is true, then is it not possible for the C language to perform equally as well or better than FORTRAN for numerical use if the same algorithms are applied? If so, will you be demonstrating this for us in your current series? And then could one apply the C language to the same equations and expect at least as good or perhaps even better results?
3. If this is not the case, then would one be better off learning FORTRAN for the math stuff?
Perhaps the question "Numerical analysis FORTRAN vs. C" has been discussed somewhere in the literature. I would be glad for references if you are aware of any.
Thank you for your kind attention. I sincerely anticipate your response and look forward to reading many more of your expositions.
P.S. I have previously sent this via COMPUSERVE mail. I fear you did not get it. If you did please forgive this redundency.
Ron Irvine
Compuserve 72770,112
or
INET: 72770.112@compuserve.com
C is generally just as expressive as FORTRAN, if not more so. Where it has lost out in the past is on performance. That's because C had an unfortunate tendency to convert type float to double at the drop of a hat. The C Standard fixed most of that silliness. Unfortunately, C still loses out to FORTRAN on the really big number crunchers. And that's because it is easier to do flow analysis of a FORTRAN program and introduce parallelism inside loops more often. The tremendous power of C's pointers is a disadvantage here.
The Numerical C Extensions Group (now X3J11.1) is exploring ways to overcome these last hurdles to the acceptance of C by the FORTRAN old guard. But unless you are doing calculations that eat hours of supercomputer time, you will find C quite an adequate successor to FORTRAN even now.
I have a Compuserve account, but I haven't been reading my mail there while in Australia. My Internet address is better even when I'm in town, however. — pjp
Dear Mr. Plauger,
I just finished spending two hours searching back issues of The C Users Journal for a particular article. Is there an on-line index available anywhere? If not, I see a need for such a beastie and suggest the creation of a mailing list (CUGindex@castrov.uucp ?) for those people interested in the index's creation. I imagine people contributing bits and pieces as they have time/need.
I see similar need for a machine readable index to the disks.
Contingent on your response, I plan to write a letter for publication to the Journal announcing the existence of the mailing list.
As an aside, do you know if in Standard C, realloc(ptr, 0) is implementation defined (as is malloc(0))?
Thanks,
Brett Wuth
I've passed your letter on to R&D. I doubt they maintain an electronic index of CUJ, but I'm sure they have one of the CUG software. Whether and when they elect to make it available to the browsing electronic public I don't know.
Sadly, realloc(x, 0) is also undefined behaviour. — pjp
R&D Publications maintains an index of The C Users Journal for 1988, 1989, and 1990. We also maintain a hard copy directory of CUG disks. For information on a particular topic or disk call Customer Relations. — rlw