Letters to the editor may be sent via email to cujed@cmp.com, or via the postal service to Letters to the Editor, C/C++ Users Journal, 1601 W. 23rd St., Ste 200, Lawrence, KS 66046-2700.
Correction: In Richard Smereka's article, "A TCP/IP Socket Location Server" (September 2000, p. 34), we misspelled the name of Smereka's company. The correct spelling is Future Lab Inc. We extend our apologies to the author.
Dear CUJ,
I read the article about tree traversal (Valery Creux in the July 2000 issue) and my first reaction was, "You can't have a cake and eat it too." In order to remember your position in a tree you need the "path" the list of parents of the current node. This path could be implicit in a recursive algorithm, or explicit in a stack.
So how did the author manage to traverse a tree without any recursion? The catch is the innocent looking call, get_parent(current). Trees that have a parent pointer at each node (and, presumably a sibling pointer as well), indeed don't need recursion for traversal. For instance, trees that have to be quickly rebalanced are usually implemented with parent (and sometimes sibling) pointers at each node. The author's algorithm makes sense for such data structures.
Traditional trees, however, don't have parent pointers. So if the readers of the article were to change the implementation of their trees only to make such traversal possible, they should be warned about the cost. In a balanced tree, recursion depth is proportional to log(N) and so is the size of the traversal stack. Parent pointers, on the other hand, require storage proportional to N. So parent pointers take up more space than a traversal stack. Morover, space on the stack is allocated only during the traversal and can be reclaimed when the traversal is done. And finally, there is a non-trivial overhead in managing parent (and sibling) pointers during insertions.
Bartosz Milewski
www.relisoft.comI don't know what constitutes a "traditional" tree, but most of the ones I end up using do have parent pointers. But in any event, it takes more than parent pointers to recurse a tree without an external stack (either a call stack or an explicit stack of pointers). The point of the article is that you can do the job with just one additional bit of information in each node, beyond the parent pointer. pjp
Dear CUJ,
I found the July 2000 article "Tree Traversal in C without Recursion" by Valery Creux quite intersting, but I believe I've found some small problems with the source code shown in Figure 2 of the article. The code assumes the root node has no siblings, which might be true in practice, but seems to force the programmer to make a special exception for the root node when it isn't necessary. I suggest treating the root node as any other node in the tree and stop the loop when the current node reaches the root's parent. This allows for a cleaner, shorter, and more satisfying implementation of this idea while still maintaining, if not improving, its correctness.
Listing 1 is my modified version of the source code. As in the original source. It assumes a well-formed tree.
Yours truly,
Scott Haug
Valery Creux replies:
Scott, I'm replying to you (see point 3) and also to some others readers. I should thank all the diligent readers of my article and I should say that this article could have had a better conclusion. The algoritm has naturally some disadvantages/drawbacks:
1) It requires a pointer from a node to its parent node. When the application does not need such information, this pointer in all the nodes takes up more space than a recursive traversal. But see also point 4. Because of this pointer you could not (easily) share common sub-trees of a bigger tree.
2) When a node has a (small) maximum/fixed number of sub-nodes, managing the sub-nodes with a list costs more in terms of memory usage than, for example, an array of pointers.
3) The algorithm processes the root node of a tree a little bit differently than the others. One reader pointed out that I should process it like the others and should stop the traversal when I reached the parent of the root node. In this case I will also process the sibling of the root node. I do not expect that the root node should not have siblings. In my opinion, when I say that this node is a root node, I should not process the siblings of this node even if there are some. This is a question of taste or requirements.
4) If you examine the algorithm carefully, you will see that the parent information is needed only on the last sub-node of a particular node. In this case, the node's sibling pointer is null. If you don't need the parent information otherwise, you could share both pieces of information in one pointer and save the cost of a pointer, with of course the extra cost of a bit on each node.
This algorithm will be far from optimal in some traditional cases. In some others it will be helpful. Just add it to your bag of algorithms and be ready to use it when needed.
Valery Creux
Dear CUJ:
Pete Becker's article on "Floating-Point Basics" (June 2000, "The Journeyman's Shop") is a good start. He cautions that floating-point numbers are not exact, which can lead to an endless loop if the programmer tries to use an exact match of a floating-point number for a loop termination test:
// incorrect for (...; index==1.0; ...)Pete's "fix" changes the equality test to inequality:
// incorrect for (...; index<1.0; ...)which has the eminent virtue of being a finite loop. Unfortunately, it introduces other bugs!
Pete described his "fix" as partial, so Part II of his article may provide more info. Pending its publication, the following may be useful, both as the correct way to handle floating-point loops and to illustrate ways floating-point computations differ from integers.
To loop from 0.0 to 1.0 by 0.1, the "fix" used:
float index; // incorrect loop for (index=0; index<1.0; index+=0.1){ ...; };Since the programs now runs, this might leave the programmer satisfied. However, it also has some undesirable characteristics:
- The loop count is uncertain.
- The value of the loop parameter is increasingly inaccurate as the loop continues.
The following code illustrates this by successively increasing the upper limit of the loop, by powers of ten:
#include <iostream.h> void main ( ) { cout.setf(ios::fixed, ios::floatfield); cout << " Upper Limit " << "Number Loops" << endl; for (float fmax=1.0; fmax<=100000.0; fmax*=10.0) { long kount = 0; // incorrect loop for (float index=0.0; index<fmax; index+=0.1) { kount++; // loop body } cout.width(13); cout << fmax << " " << kount << endl; } }Results using Microsoft VC++ 6.0, using float data type (four-byte values), with optimization off [1]:
Upper Limit Number Loops 1.0 10 // OK 10.0 100 100.0 1001 // Incorrect [4] 1000.0 10001 10000.0 100015 100000.0 990564 // Incorrect [5]The first two trials with small numbers look good. Besides, the program logic follows well proven integer loop coding styles and the math formulas are correct. So we can stop testing, right? :-) Wrong!
When tested with larger numbers, the program shows a relationship between the loop upper limit and the loop count! This sort of bug passes easy-to-test small values, but fails inconsistently with real world (often large) values, causing much gnashing of teeth. For floating-point calculations, desk checks with small numbers are not enough, the code must be tested over its full value range.
The reason for the incorrect calculation is that the loop parameter (index) is updated using a floating point calculation, which in this case is approximate. (Pete Becker's article explains why 0.1 is represented in binary as an approximate value. Its numeric value implemented in binary can vary from 0.0999999 to 0.100001, depending on the platform and compiler.)
The loop parameter index is updated by 0.1 with each cycle. After N cycles, this simple looking code represents N approximated computations, each with a round-off error [2] As N gets larger, the program is increasingly inaccurate.
One guideline is to avoid floating-point computations until they are necessary. A better way to code this loop is:
for (long iloop=0; (iloop*10)<fmax; iloop++) { float index = iloop/10.0; kount++; // loop body }This code works. The loop count is correct, and the loop parameter index is accurate.
But the code takes advantage of knowing the loop increment is 0.1, in two places (iloop*10 and iloop/10.0). How can we handle the general case of floating-point loops from x1 to x2 by delx?
if (x2>=x1 && delx>0) { // [3] long nloop = (long) ((x2 - x1) / delx ); for (iloop=0; iloop<nloop; iloop++) { float index = iloop*delx + x1; kount++; // loop body } }This code first computes the loop count nloop, as an integer. This tells us exactly how many times the code should loop. The number of intervals is the loop range x2-x1 divided by the step size delx.
Within the body of the loop, the floating-point parameter index is computed based on the current loop count. This is important in assuring the value will be accurate. It is derived using only two floating-point (approximate) calculations, the exact same two calculations on every cycle. Thus, the parameter has uniform accuracy over the entire loop value range.
The cost for this approach is more CPU time (probably insignificant) and adding two more variables (iloop, nloop) to the program code.
Summary:
- Be careful using floating-point. Do not blindly copy integer programming logic.
- Programs can run correctly with some test values and be wildly incorrect with other values, so it is important to test over the full range of input values for the problem.
- When approximate computations are involved, it is generally better to replace N calculations by code using a constant number of calculations.
- For program accuracy, consistency and portability, even though a loop is over a floating-point range, it is sometimes necessary to use integer variables as the loop limits, and derive the floating point data as needed [6].
Notes:
[1] Optimization was turned off to force VC++ to use the four-byte floating point values, as coded, rather than the PC's internal 80-bit floating-point registers.
The point of this article remains the same, as the switch from internal registers to smaller memory words happens naturally as the code grows in size, or does procedure calls, and the compiler can no longer optimize the loop variables into registers. (By Murphy's rules, this switch happens when you least expect it :-). It is a common reason why numeric results from a small floating point test program may differ from a production version.)
The same effect can be demonstrated with the 80-bit registers, indeed with any floating-point computer, by selecting appropriate values.
[2] The cross-platform uncertainty effect on this code is magnified by varying computation round-off rules on different platforms, although these differences are theoretically removed when the IEEE floating-point rules are implemented.
[3] Since we are dealing with the general case of x1 to x2 by delx, we need a guard test (delx>0) in front of the loop to avoid a possible floating-point divide by zero.
[4] Probably 0.1 is represented internally on the PC as 0.0999999.
[5] This example illustrates additional floating-point calculation effects. As the loop upper limit increases, e.g. 100 and 10000, the number of loops is 1001 and 100015 respectively. This is larger than it should be, as we might expect if the loop step 0.1 is internally represented with a value that is slightly too small, e.g. 0.0999999.
But the last trial, with 100000, has 990564 loops, which is less than it should be. This breaks the pattern established with previous loop upper limit values. The reason is that the growing partial sum for the loop parameter has a much larger order of magnitude than 0.1, so when the two are added, the round-off occurs with a different value, as if 0.1 were represented as a slightly larger value, e.g. 0.10001.
[6] Sometimes one can gain accuracy by replacing floating-point calculations by an equivalent integer series. Caution: it can also complicate the code by obscuring the original user problem, and lead to its own bugs, when values exceed the range that can be handled by integers. Sometimes using extended precision (double) is a better method. These tradeoffs are a more extensive discussion.
Saul J Rosenberg
saul@rosenberg.com
Object Developers Group (ODG)I know that Pete was being intentionally simplistic. He focused on avoiding an infinite loop, not on getting the loop count exactly right. Nevertheless, your points are well taken with a vengeance. I applied your suggested "safe" method to your original test program, and got all nines for every test case. (I used gcc under Linux, which fails the same way as your VC++ run on the original test program.) The fix changes a mixture of successes and different failures to a uniform set of off-by-one errors. Try rounding the loop count (by adding 0.5 before truncating to an integer). It makes all the test cases behave sensibly, at least for this particular program.
Pete Becker and I do quite a bit of floating-point work these days, and we are humbled almost daily. I'm sure he will have more useful advice on floating-point arithmetic, in these pages, as time goes by. pjp