LETTERS

Interpreting the Constitution

Dear DDJ,

According to Michael Swaine in his June 1991 "Programming Paradigms," Laurence Tribe argues that in order to avoid errors of the past in dealing with constitutional issues, one must "remain true to the values represented in the Constitution" and that "fidelity to the values requires flexibility in textual interpretation."

The entire nation should find it of great comfort that, at long last, someone is going to fully define the values represented in the Constitution for "the rest of us." Tribe's "flexibility" in interpreting the text will, undoubtedly, produce a translation that is uncannily in line with his own personal opinions and political leanings.

As I see it, there are four ways of interpreting the Constitution, none of them fully acceptable:

  1. What did they write? That "the right of the people to keep and bear arms shall not be infringed." Look up infringed in the dictionary. Any regulation, including all restrictions on carrying concealed weapons, should not be allowed. Strike one.
  2. What did they mean? They meant that only white, Protestant, land-owning males should be allowed to vote. Strike two.
  3. What would they write today? The drafters of the Constitution have been dead for more than 150 years. I guess we'll have to consult our crystal balls to find out. Strike three.
  4. What do I want it to say? This is the method used by most people, apparently Tribe included. Actually, judging by the number of times I've heard people say, "that's unconstitutional!" I think that very few people have ever read the Constitution. Strike four. (You say there are only three strikes? Where does it say that in the Constitution?)
Technological advances and their application in the judicial process should be weighed by the degree to which they advance the cause of justice and provide protection of the innocent.

David Rago

Livonia, Michigan

Compressing to the Minimum

Dear DDJ,

Reading Mark Nelson's article on data compression (DDJ, February 1991) inspired me to experiment with text compression. In particular, I hoped to find an algorithm that compressed text data to near its theoretical minimum (if such a thing exists) without the punishing time and memory requirements Mark described for high-order modeling.

I started with the assumption that text data can be characterized as a stream of symbols in which some symbols are more frequent than others, certain pairs of symbols are more frequent than others, and certain words and phrases are more frequent than others. I searched for an algorithm which exploits this "repetition" of strings or "frequency differentials" on all scales with one mechanism.

I came up with something which performed comparably to PKZIP. And I found the simplicity of the program quite surprising.

This algorithm sounds a little like Mark's Order-1 modeling, yet I found no need to go to higher orders. I had an array of "symbols," initialized with one symbol for each character encountered in the text, but with room for many more elements. Then I went through the file and constructed a table of correlations between each possible pair of symbols. For every pair of symbols occurring more often than some threshold, I established a new symbol to replace the pair. Then I repeated the process until it failed to generate new symbols, at which point I reasoned that I had "removed" all context regularities from the file, and all that remained was to apply the Arithmetic Coding method to code these symbols.

I found that, as I expected, most words coagulated together to be represented by a single symbol, and even uncommon words were represented by only two or three symbols, so that in a sense it was working like a dictionary model. If text was repetitious on all scales, like a fractal, I'd be able to compress the file to just a few symbols, with all the information of the file contained in the array of symbols, but as it is, the array of symbols is always small compared with the processed file.

For example,

  
while(true) {
		break;

was encoded by four symbols:

	while(
	true
	) {\n\r\t\t
	break;

Now, of course, I ended up with a large number of "symbols," up to several thousand for a large English file. To construct the table of correlations, I couldn't very well have an array: int c[2000][2000]. But I accomplished the same effect by splitting the theoretical table into squares of some fixed size, so that on each pass I constructed part of the hypothetical 2000 x 2000 correlation table. It slowed things down, but I was not particularly concerned about speed. Firstly, I thought that the primary concerns were compression ratios and reconstruction times. Secondly, as it turned out, I was never waiting too long for files to be compressed.

Can I conclude that I may have compressed the file to its theoretical minimum? Something I believe, which someone may someday prove, is that Arithmetic Coding is the theoretically optimal way of encoding a sequence of symbols, provided that the sequence has no context regularities. A sequence of symbols without context regularities can be permuted in any way and remain statistically equivalent to the original sequence. If my processed file of symbols had this property, I should be able to permute it in any way and get something completely comprehensible. This is not the case. I haven't tackled the problem that symbols occur with different frequencies in different regions of the text (more so with programs than with English), because I expected only modest gains from that optimization. Also, more pertinently, I know the rules of C syntax (and layout) and English grammar, and so all the "comprehensible" files share some regularities this method does not employ. Some of these more superficial regularities could be programmed into a compression program as culture-specific information, but there are also deeper regularities, e.g., meaning and function, which would be difficult to exploit.

Tim Cooper

Eastwood, Australia

Mark responds: Mr. Cooper has come up with an interesting algorithm for tokenizing a file. Whether it can be effectively used to compress a file is another question. By developing a comprehensive dictionary full of strings, it is possible to drastically reduce the size of a file well beyond the ratios achievable by compression programs most of us use. The problem is, in order to decompress the file, a copy of the dictionary has to be passed to the decompression program along with the data. So the critical factor is how much storage space is taken up by the dictionary and the compressed data together, not just the data. Mr. Cooper seems to be neglecting that factor in his presentation.

In the event that you are going to be compressing the same type of files on a regular basis, it makes sense to build a dictionary and keep it online for both the compressor and the decompressor to use. For the most part, today's users seem to prefer algorithms such as LZW that create dictionaries on-the-fly. It seems possible that applications involving the storage of massive amounts of homogeneous data, such as reference works on CD-ROM, could benefit from the Cooper approach.

Until Proven Otherwise

Dear DDJ,

I am writing in response to the letters from Steve Medvedoff and Charles Pine ("Letters," May 1991) regarding Michael Swaine's February 1991 "Programming Paradigms" column ("A Programmer Over Your Shoulder"). These letters present a wonderful opportunity to reinforce the points he made in his article about On the Shape of Mathematical Arguments by Antonetta van Gasteren.

The special irony is that the second writer disproves his own point. Briefly, he claims that he has a "hardly" messy proof of the natural number pair-matching problem. His proof is clear. It is also incorrect. The first part of this letter attempts to find out why.

It is hard to show where the "misproof" breaks down because the conclusion, after all, is the correct one! This problem is very similar to one often faced by programmers: Your program has been getting the correct answer, but suddenly it's not. By looking at the code, you eventually discover that it was getting the correct answer purely by accident. This incorrect proof manifests a similar bug: Its technique appears to work in the present situation, but in other circumstances it will actually prove incorrect statements.

One way to find this kind of bug is to represent the program, or proof, abstractly. In this case, then, the argument goes as follows: Begin with the pairing that you believe is optimal. Show that every pairing derivable from yours in a certain way is not better than yours. Conclude that yours is best.

The missing piece emerges in the phrase "in a certain way." The pairings examined by the procedure in the misproof include only those reachable by a single step of "exchang[ing] the members of two pairs." These are not all possible pairings (when there are more than two pairs, anyway), and so it does not necessarily follow that the putative optimum is best. In other situations (where the function to maximize is different, for example), this technique can "prove" incorrect conclusions.

Perhaps the "messy" proofs are messy because they are correct? The second point of this letter is to expound on the similarities between the spirit of modern mathematical thinking and code reuse. This is mentioned by the first letter writer, Steve Medvedoff, where he notes that "in mathematics, looking at a problem from a different perspective...is a common, often enlightening practice."

The same example will do nicely to illustrate. The sum of products appears in many branches of mathematics: as a dot product of vectors, for example, or as a line integral over a polygon, or the square of the diagonal of some hyperbox. In this case it is especially fruitful to look a little farther afield, to probability theory.

You should recognize the sum of products expression as the covariance of a bivariate population (the set of pairs). Now for the "code reuse": Those familiar with the relationship between covariance and correlation will see immediately that maximizing the covariance (while keeping the coordinate sets fixed) is achieved by maximizing the correlation coefficient. (By the way, this is where I hide most of the algebra and hand-waving that appears "in-line" in other proofs.)
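The identity behind this step can be written out explicitly; this is my paraphrase of the standard definitions, not the letter's own notation:

```latex
% Sample covariance and correlation of the paired data (x_i, y_i):
\operatorname{Cov}(X,Y) \;=\; \frac{1}{n}\sum_{i=1}^{n} x_i y_i \;-\; \bar{x}\,\bar{y},
\qquad
r \;=\; \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y},
\qquad -1 \le r \le 1 .
```

Re-pairing the points permutes only the matching, so the means and standard deviations are constants; maximizing the sum of products is therefore equivalent to maximizing Cov(X,Y), and hence r, which attains its maximum when the scatterplot is as close to a line of positive slope as the data allow.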

The lights begin to flash. To wit:

  1. The problem needn't be restricted to natural numbers, or even positive numbers, at all; the coordinates can be allowed to be any sets of real numbers.
  2. The correlation coefficient is invariant under translations and rescalings of the coordinates (i.e., of all Xs at once, or of all Ys at once). (I can't resist pointing out that this extends the useful symmetries in the problem from a small, finite permutation group to an (infinite) four-dimensional group.)
  3. The problem now has a direct geometrical interpretation: How can we match the points so that their scatterplot most closely approximates a straight line? This is an especially nice bonus, because it lets us visualize a solution to the problem.

Bringing in all these results from elementary probability theory is akin to reusing old code. "Why reinvent the wheel?" is the cry of today's programmer, seeking the Holy Grail of perfect code in no time with no work. It is a traditional technique of mathematics, which derives much of its power from building on previous results.

"Reusing" these mathematical results, let's translate the X-values so that the smallest is 0, and the Y-values so that their smallest also is 0. (In other words, shift the geometric picture so that the bottommost, leftmost coordinate is the origin. We can do this because of observation 2 above.) Now it is algebraically trivial (when you see this word "trivial," you know you're not reading a real proof! to see that the smallest X must be paired with the smallest Y in order to maximize the sum of products. Any other possibility can be equalled or improved by using this matching instead. The key is that the zeros simplify the computation.

This is where the beauty of the recursive formulation mentioned by Swaine comes in. We're done! The problem now is "one size" smaller, since we are left to pair the remaining points among themselves. By recursion, we see immediately that the ordered X-values must be paired with the ordered Y-values, smallest with smallest, through largest with largest.

The same kind of magic instantaneous solution arises in well-written recursive code. Recursive-descent parsers, for example, can appear just as mysteriously effective.

One point, worth a lot of reflection (I think), is that the hard part was over as soon as the connection with probability theory was made. In other words, the achievement lies not in the reuse of techniques (code) per se; it lies in discovering which code is relevant, and using it ("interfacing" with it) appropriately.

I have been able only to touch upon parallels between the disciplines of programming and "proof-making" here. Proofs and programs can have bugs, but sometimes appear to work correctly. Reusable code and insights from other mathematical/scientific disciplines can be remarkably effective. Visualizing a problem can be enlightening. These points, and the spirit of this discussion, are worthy of further musing, further pursuit. I hope that Swaine keeps on track.

William A. Huber, Ph.D.

Philadelphia, Pennsylvania


Copyright © 1991, Dr. Dobb's Journal