LETTERS

A Reader Over Your Shoulder

Dear DDJ,

In his February 1991 "Programming Paradigms" column "A Programmer Over Your Shoulder," Michael Swaine presents several examples to illustrate Antonetta van Gasteren's ideas in streamlined mathematical arguments. I'd like to comment on some of them.

In the maximization example, defining the sequence elements instead as elements of two "sets" (the mathematical equivalent of "bags") would clarify the problem somewhat, although subscripts and permutations would still be required. Also, I wonder whether Hardy, Littlewood, and Polya were really "deluded" by their own notation. In mathematics, looking at a problem from a different perspective (recasting a geometric problem in an equivalent algebraic form, say) is a common, often enlightening practice. Perhaps they knew what they were doing.

The point about retaining symmetry whenever possible is well taken; however, in the game involving bit strings, restating the proposition in terms of x and y accomplishes nothing. The critical fact is that the leftmost bit never changes; once that observation has been made, the conclusion follows directly from mathematical induction (which Swaine uses implicitly in his argument for termination). The only way to see that the leftmost bit never changes, though, is to refer back to the original transformations. I believe a better example could have been found to illustrate the point.

Steve Medvedoff

Santa Maria, California

Dear DDJ,

Our work as programmers often leads us to confuse computations and concepts. Michael Swaine, LISPer-extraordinaire, endorses recursion in place of the concept. In his February 1991 review of a book by Antonetta van Gasteren, he gives a problem simply and correctly: Match up pairs of natural numbers from two finite sets of the same size and maximize the sum of the products of the pairs. Then he defers to van Gasteren's recursive statement of the problem and claims that her recursive solution stands out among "generally messy" proofs.

Hardly. The solution is to sort the two sets of numbers the same way (ascending or descending) and pair them. A neat proof says: Starting from this solution, suppose we exchange the members of two pairs. Do a little algebra (omitted here) by writing the larger numbers as the smaller ones plus nonnegative differences. Compute and compare the required sums. The correct matching is greater by the nonnegative product of the difference terms, so no alternative can be larger.

The statement of the problem by van Gasteren introduces unnecessary recursion. I maintain that the above proof, even written out with a few variables, is clearer than a recursive one.

Charles Pine

Oakland, California

Multimedia Muscles

Dear DDJ,

I read with interest the "Editorial" in the February 1991 issue where the cost/performance aspects of various multimedia platforms are considered. The conclusion is that the Amiga is the sensible choice. The Amiga's especially versatile, sophisticated, and compact preemptive multitasking operating system makes it ideal for multimedia. The editorial goes on to say that the Amiga has never made it to the broad-based market. Although there is an Amiga for every two Macs, neither of those machines is broad-based in the sense that MS-DOS machines are.

Accurate assessments of the Amiga, especially those dealing with multimedia, coupled with such powerful programs as AmigaVision and hardware/ software exclusive to the Amiga such as the Video Toaster, should continue to propel it into the "mainstream."

Mainstream publications, such as the March 1991 Video and Popular Science magazines, mention the Amiga more often than computer magazines seem to. The city of Atlanta won its bid to host the Olympics on the strength of an Amiga running the multimedia presentation. A Mac II was originally tried, but was relegated to a supporting role when it was found inadequate to the task -- due to an inability to multitask, perhaps? Now I'm beginning to sound like Michael Swaine.

Jeff Johnson

Cincinnati, Ohio

In A Snap

Dear DDJ,

I'm writing to register my vote for the best article in the February 1991 issue: "Screen Capturing for Windows 3.0" by Jim Conger. While possibly one of the shortest code-related articles in DDJ's history, it came at just the time that I needed such a utility. Thanks!

While the program worked perfectly, there was one small bug (which Conger fixed via a kludge). While initializing the wndclass structure, the icon should have been loaded as follows (substituting hInstance for NULL):

wndclass.hIcon - LoadIcon (hInstance,   szAppName);

Had this been done, the entire if (IsIconic (h Wnd)) clause within the WM_PAINT case could have been eliminated, and the matching else clause just made part of the straight line WM_PAINT code.

Brian R. Anderson

Burnaby, British Columbia

Jim responds: Thanks. Note that this ties the icon to the window class, so that any window created from this class will minimize with that icon. That is fine in a small application like Snap3, but probably not what you want in a larger program with multiple calls to CreateWindow( ).

Protecting Protected Mode

Dear DDJ,

I read "Accessing Hardware from 80386 Protected Mode, Part I," by Stephen Fried (May 1990) with great interest, mostly because I've spent the past few years programming for Intel's iRMX, which runs in protected mode. While I generally enjoyed Mr. Fried's article, there are two subjects upon which I must comment.

First, 64 terrabytes is 2**46, not 2**48. The size of the virtual address space is (2**14 selectors) * (2**32 maximum segment size in bytes). The lower two bits of the selector contain the requestor's privilege level (RPL), which is the reason there are not 2**16 possible selectors.

Second, I do not like the convention of overlapping code and data segments in a flat address model, especially when executing in ring 0. I specifically refer to Figure 1, in which selectors Och (user code) and 14h (user data) overlap, sharing a base at 100000h with a 2ffh offset limit. Since a task running at privilege level 0 is at supervisor level with respect to the page-level protection mechanisms, it is not subject to the page R/W restrictions of user-level tasks, either. Such a combination defeats an important protection mechanism: preventing a program from overwriting its own code. I do not know of any compiler that will generate such code (although I've seen library implementations of int86() that modify code inline), so any time compiled code writes to the code segment, it is due to indirection through an invalid pointer, i.e., a programming error.

Stephen Fried disagrees with me. He believes programs should run at ring 3 only if the memory mapping described in his Figure 1 is used. In other words, a program should have read, write, and execute privileges for every byte of allocated memory. Perhaps he could satisfy himself and me both by allowing two different types of loading. By including offset fixup data in the executable (a cross between relocation data in a standard DOS executable and FIXUPP records in OMF-86 object modules), the loader could set descriptor bases either in the Fried-preferred method or in a manner such that code and data do not overlap.

I think that programmers benefit by having invalid memory references trapped by the hardware. Protection should be utilized as a developer's tool. Let's not cripple the 286/386/486 chips to satisfy our DOS programming prejudices.

Charles Scott Nichol

Philadelphia, Pennsylvania

Persistence Pays Off

Dear DDJ,

Here's a simpler way to have persistent objects other than that described by Scott Ladd in his article "Persistent Objects in Turbo Pascal" (September 1990). I was writing a program that had to store several objects of different types in a file, and I read the Turbo Pascal manual to find out the details of the storage of objects in memory. The format makes it easy to store and retrieve objects from disk.

Objects are stored exactly like a record except for a VMT (Virtual Method Table) if there are any virtual methods. The VMT is 2 bytes, and it should not be stored on disk. Furthermore, the problem with SizeOf( <object> ) does not apply when the object has a VMT. The VMT points to the actual size of the object and not the size of the base class. This makes it easy to create a base Storable class:

  Type
  Storable = object
        Procedure Load( var F : File );
        Procedure Save( var F : File );
        Destructor Done; virtual;
  end;
  Procedure Storable.Load;
         Type
              ObjectData = Array [0..2] of Byte;
         Begin
              BlockRead (F, ObjectData(Self)[2],
                                             Sizeof(Self) - 2);
         End;
  Procedure Storable.Save;
         Type
             ObjectData = Array [0..2] of Byte;
         Begin
              BlockWrite (F, ObjectData(Self)[2],
                                             Sizeof(Self) - 2);
          End;
  Destructor Storable.Done
          Begin;
          End;

Any other object that should be persistent can be inherited from Storable. Storable's Load and Save methods will automatically find the descendant type's size and store it. The descendant types do not have to write each individual field as they do in Mr. Ladd's program.

Amit J. Patel

Houston, Texas

Graph Decomposition Redux

Dear DDJ,

I read Edward Allburn's article "Graph Decomposition" (January 1991) with some interest. Although my work has nothing to do with graphs, I am fond of algorithms and I also use the Phar Lap DOS Extender with MetaWare's High-C compiler. Mr. Allburn wrote that the most expensive section of the algorithm was in determining if both vertices are already in the same set. A technique for reducing this effort comes readily to mind.

An additional data item will be needed for each element of an array. The size of this item in bits is up to you. More bits will reduce the effort required. (You could either steal some of the high-order bits of each entry, or you could make each entry take more space.)

Each time an adjacency list is created, the additional data items will be set to a given value. This value is based on incrementing a counter modulo 2 to the N where N is the number of bits you have available in the additional udata item.

Whenever an additional element is added, it gets its data item from the list it is being added to. If both vertices have been seen before, you first compare their additional data items. If they are different, then there is no chance that they are currently in the same adjacency list. If they are the same, then you must use your current technique for finding this out. Whenever adjacency lists are merged, you can either set all the combined elements additional data items to a new value, or you can pick one of the lists, and set all of the other entries to the first's value.

This reduces the number of times that you must find out the hard way if two vertices are in the same set. The amount that your effort is reduced is proportional to the number of bits you can allocate for the additional data item. Just four bits would reduce the number of times you would have to work things out the hard way by an order of 1 over 16.

Neal Somos

Seven Hills, Ohio

Dear DDJ,

I was intrigued by Edward Allburn's article "Graph Decomposition" (January 1991). I do not have a sufficient background in graph theory to really follow his argument, so I set myself to the challenge of trying to discover an algorithm which would solve the problem as I understand it.

Given: 1. A set of integers 1..Ncount 2. A set of pairs P(a,b), where a & b are in 1..Ncount. Pcount = number of pairs.

Then:

Assign each element in 1..Ncount to a subset so that all the subsets are disjoint and P(a,b) ==> (a in Subset <==> b in Subset).

I am a software developer specializing in graphics and calculations for manufacturing and industrial customers. I consider my main strength to be in the area of algorithm design (incidentally, I agree with Mr. Allburn's assessment of Sedgewick). Over the past year, I have worked with a small team which developed a 3-D modeling and rendering package, and alone I developed a time management package for large assembly lines. I got my math training as a physics major, and consequently I'm weak on graph theory.

I'm wondering if Mr. Allburn could take a look at the algorithm that follows. I am really curious to know if my "flash of insight" amounted to anything.

Graph Decomposition II

Let each connected component be referred to by the number of its minimum vertex. In Figure 2c (see DDJ, January 1991, page 90) we have

   component 1 (2,3,5,6)
   component 4 (7,8)
   component 9
   component 10
   component 11 (12)

For each vertex i, let G[i] store the minimum vertex. We want an algorithm to produce G as follows:

    G[1] = 1         G[7] = 4
    G[2] = 1         G[8] = 4
    G[3] = 1         G[9] = 9
    G[4] = 4         G[10] = 10
    G[5] = 1         G[11] = 11
    G[6] = 1        G[12] = 12

Then determining if a path exists between a and b is equivalent to checking if G[a] == G[b]. In pseudocode:

   Step 1: for each i, G[i] = 1
   Step 2: for each pair (a,b)
               find min and max of G[a], G[b]
               G[max] = G[a] = G[b] = min
   Step 3: for each i, G[i] = G[G[i]]

Step 1: Initialize G by marking each i as connected to itself.

Step 2: When a new pair is introduced, we actually have four vertices:

  Two members of the pair: a & b
  Their "minimum" connections (to date) G[a] & G[b]

Question: Which of these is the absolute minimum: Once we've found this, label the other three with it. Since G[i] <= i for all i, it suffices to just check the Gs. For example, imagine that at some point G is:

   G[1] = 1        G[4] = 4
   G[2] = 1        G[5] = 4
   G[3] = 1        G[6] = 4

Now comes the pair P(6,3). Max = G[6] = 4, min = G[3] = 1. G changes to:

   G[1] = 1
   G[2] = 1
   G[3] = 1 <- 1     G[b] = min
   G[4] = 4 <- 1     G[max] = min
   G[5] = 4
   G[6] = 4 <- 1   G[a] = min

I call P(6,3) a "late arrival" because until it arrived, it looked like we had two separate components, notice that this step has "informed everyone in the group" except #5. Step 3 handles that. Step 3: Final assignments: G[i] = G[G[i]]; in this example:

   G[1] = 1 <- 1
   G[4] = 1 <- 1
   G[2] = 1 <- 1
   G[5] = 4 <- 1
   G[3] = 1 <- 1
   G[6] = 1 <- 1

This is really a check to see if G[i] has been reassigned without i's knowledge. If G[i] is not equal to G[G[i]], such as G[5] = 4 in the previous example, then G[5] has been reassigned. In this case, vertex 4 (=G[5]) was reassigned by P(6,3) without #5's knowledge. If G[i] already equals G[G[i]], then G[i] has not been reassigned, and this step is redundant but does no harm.

Bill Polson

Saginaw, Michigan

Ed responds: Neal's suggestion is similar to one that a colleague of mine posed several months ago. Basically, the idea is to assign each of the connected components an ID number, and to have each vertex carry the ID number of the connected component the vertex belongs to. Thus, when determining if two vertices are already in the same connected component, the first step would be to compare the ID numbers attached to each vertex. It would then only be necessary to traverse one of the connected components if the ID numbers happened to match. Thus, one would theoretically save a significant amount of time when merging a pair of connected components. The drawback to this is that the ID numbers of one of the connected components have to all be updated to match the other. When I experimented with this idea, however, I found that this process of updating all of the ID numbers consumed all of the time that the ID numbers saved.

Bill Polson's algorithm is right on the money. In some respects, his algorithm is similar to Neal's suggestion of using ID numbers on each connected component. In Bill's algorithm, the smallest vertex number of the connected component basically serves this purpose. I implemented both his algorithm and GAD using Watcom C 386 8.0. When I compared them side by side, I found that Bill's algorithm consistently ran twice as fast as GAD when counting connected components in worst-case graphs.

The main drawback of Bill's algorithm compared to GAD is that it takes significantly more time to determine what vertices are in a particular connected component. I recently had the opportunity to write a DIFF program that is used to compare large graphs at my workplace. When a difference is found, all of the vertices involved in the difference must be logged into a report. Since GAD represents each connected component as a linked list of vertices, one simply follows this list to find all of the vertices. With Bill's algorithm, it would be necessary to scan the entire array to find all of the vertices. Bill, too, has released his algorithm into the public domain. It deserves serious consideration by anyone who is working with graph data structures.


Copyright © 1991, Dr. Dobb's Journal