Paregmenon and Paradigms

Dr. Dobb's Journal January 2004

By Michael Swaine

Michael is editor-at-large for DDJ. He can be contacted at mike@swaine.com.

Jane Austen's Sense and Sensibility is one of the great works of Western literature. That may seem like a strange lead for a column in a programmer's magazine, but:

1. If we post this piece to the Web, it should change our Googlizing demographic profile in interesting ways, and

2. I wanted to see if former DDJ contributor and Jane Austen enthusiast Allen Holub is reading my column. If so, I should be hearing from him any minute now.

Also, "sense and sensibility" is a paradigm (that is, example) of a paregmenon—a paregmenon being a rhetorical device that I'm abusing this month. Paregmenon is the juxtaposition of words that have a common derivation. Like sense and sensibility. Paradigm means, among other things, an example. Paregmenon comes from the Greek and means literally "to bring side by side." Paradigm also comes from the Greek and means "to show side by side," so "paregmenon and paradigms" is a paradigm of paregmenon.

Yowzah. If anyone other than a hypothetical reader Holub is still with me, here's what I'm talking about this month, now that we've cleared up Jane Austen's place in Western lit: a book on XML, a book on UNIX, a follow-up to a recent discussion of some of the issues raised by electronic voting, and a consideration of whether biotech obeys Moore's Law.

Elections and Electrons

Okay, "election" and "electron" do not have a common derivation. They derive from unrelated roots, Latin and Greek respectively, as near as I can tell. But they look like an example of paregmenon.

But you know all those problems regarding electronic voting that I raised in a recent column? The Australians think they've got the solution. They tested it out two years ago, it worked, and they figure the U.S. ought to adopt their model. Sort of a new Australian ballot, you might say. Should the U.S. follow the Aussie model? Here's the skinny from down under, via Wired News (http://www.wired.com/news/ebiz/0,1272,61045,00.html?tw=wn_tophead_1): The actual system is called "eVACS," or Electronic Voting and Counting System. But while the system itself is interesting, the model by which the system was arrived at is even more interesting.

The Aussies had been using a hand-counting system much like the system in Florida in 2000. And much like the system in Florida in 2000, the Aussie system had a few problems. In 1998, a couple of candidates were separated by a vote margin considerably smaller than the error rate of the system, making the result statistically meaningless, although far from actually meaningless in the eyes of the voters, I'm sure.

So what the government did was to put out a call for proposals to fix the system. Of the proposals that came in, exactly one was an open-source plan. Phillip Green, the electoral commissioner for the territory involved, said it was an easy choice to go with the open-source plan. He'd been observing the American experiments with e-voting, you see, and he had a good idea of what to avoid. "We were wary of using proprietary software that no one was allowed to see," he said. "We were very keen for the whole process to be transparent so that everyone—particularly the political parties and the candidates, but also the world at large—could be satisfied that the software was actually doing what it was meant to be doing." Thus: open source.

Despite the obvious good sense of Green's choice, the government went ahead with it. From start to finish, the project was posted on the Web for all to see. Also, an independent verification and validation company was hired to audit every line of code. Mistakes got spotted and corrected. And the whole project, from concept to product, took just six months and cost $125,000.

Swiping Votes

The eVACS system is not terribly sophisticated. Voters get a bar code to swipe over a reader, and that lets them cast a vote. It doesn't record any personal info; the validation that the voter is the voter happens in the usual low-tech way prior to handing out the vote-enabling bar code, I guess. The voter then uses an on-screen ballot, and the results are sent securely to a local server that burns two identical discs of results, along with digital signatures. These get delivered independently to a central polling place and counted (electronically).
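The counting end of that flow is easy to caricature in code. Here is a toy sketch of the idea behind the signed, duplicated discs (mine, not eVACS code, and a bare SHA-256 hash stands in for a real digital signature): the central count accepts a delivery only if both discs match their signatures and each other.

```python
# Toy sketch of the duplicate-disc check described above -- not eVACS code.
import hashlib

def digest(results: bytes) -> str:
    """Digest that accompanies each burned disc (stand-in for a signature)."""
    return hashlib.sha256(results).hexdigest()

def verify_delivery(disc_a: bytes, sig_a: str, disc_b: bytes, sig_b: str) -> bool:
    """At the central count: both discs must match their signatures and each other."""
    return (digest(disc_a) == sig_a
            and digest(disc_b) == sig_b
            and disc_a == disc_b)

results = b"candidate-1: 1042\ncandidate-2: 987\n"
sig = digest(results)
print(verify_delivery(results, sig, results, sig))  # True: intact delivery
```

Tamper with either disc in transit and the check fails, which is the whole point of delivering the two copies independently.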

The system doesn't include a voter-verifiable receipt, and the designers admit that this needs to be added. The receipt is a printout from the machine; without it there is no paper audit trail of the vote, which is needed both to assure voters that their votes got counted and in case of a recount. But that's easy enough to add, and New Jersey Congressman Rush Holt has introduced a bill in the U.S. House that would make such receipts mandatory in all American e-voting systems.

Passing such a law would be good. It would also be a very smart thing to pass a law mandating that all e-voting systems have to be open source, like the Australian model. In my opinion.

Pedantically Semantic

Here's an argument that XML has come of age: Effective XML: 50 Specific Ways to Improve Your XML, by Elliotte Rusty Harold (Addison-Wesley, 2004; ISBN 0-321-15040-6). Apparently, XML has matured to the point that a book of best practices is in order. And apparently, XML is complicated enough to fill such a book.

If the latter is true, it may be because the book is extended enough in its coverage to address beginners as well as old hands at XML. (In XML, old hands are those who have been at it for five years.) The author starts by lecturing us that an element is not a tag. Some readers may need to have that distinction clarified, I guess. Later in the book, he's telling us how to organize variable definitions for large stylesheets, which is more like it.
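For readers in the first camp, the distinction is easy to show with a parser from the Python standard library (the sample document is my own, not one of Harold's):

```python
# Harold's element-vs-tag distinction in miniature.
import xml.etree.ElementTree as ET

doc = "<title>Sense and Sensibility</title>"
# "<title>" and "</title>" are tags: the start and end markers in the text.
# The element is the whole unit -- start tag, content, end tag -- that
# the parser builds from them.
element = ET.fromstring(doc)
print(element.tag)   # title
print(element.text)  # Sense and Sensibility
```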

I don't mean to be flippant. Or I do, but that's just a personality quirk: The book deserves better, even if its author does have an odd name. He covers syntactic issues like the use of whitespace and how and when to parameterize DTDs. He explains how to use tags to make structure explicit. He gives implementation advice like: Always use Unicode.

The section on semantics may be the most sophisticated. It lays out the APIs and other tools for processing XML, and offers such advice as: Always use a parser. That would be superfluous advice for "real" programmers, but it's quite possible to cobble together your own system for processing XML. Harold's advice: Don't do it. My XML writing is limited, but I'm keeping this book on hand; I think it'll be useful.
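A sketch of why that advice earns its place: a hand-rolled substring hack chokes on perfectly legal XML that any real parser handles. (My toy example, not one of Harold's.)

```python
# Why "always use a parser": cobbled-together string matching versus
# the standard library's parser on the same legal document.
import xml.etree.ElementTree as ET

doc = '<note to="Allen"><!-- <to>not this</to> --><to>Allen &amp; co.</to></note>'

# Naive cobbling: grab whatever sits between the first <to> and </to>.
start = doc.index("<to>") + len("<to>")
cobbled = doc[start:doc.index("</to>", start)]
print(cobbled)  # not this -- fooled by the comment, entity never expanded

# A real parser skips comments and expands entities.
parsed = ET.fromstring(doc).findtext("to")
print(parsed)  # Allen & co.
```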

Biotech Potential

Biotic potential is defined as the capacity of a population of organisms to increase in numbers under optimal environmental conditions. Biotech potential, then, might just as well be defined as the capacity of that domain of technology to increase in importance under a version of Moore's Law.

According to a recent article in Silicon Insider (which came to my attention thanks to the indefatigable Jeffrey Harrow), that's just what's happening with biotech. "[B]y 2000, the total costs of sequencing had fallen by a factor of 100 in 10 years, with costs falling by a factor of 2 approximately every 18 months." See? Moore's Law.
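The two quoted figures do agree, as a little arithmetic confirms: 10 years is 120 months, which holds about six and two-thirds 18-month halvings of cost.

```python
# Sanity-checking the quoted sequencing-cost figures: a factor-of-2 drop
# every 18 months, compounded over 10 years (120 months), should come
# out to roughly a factor of 100.
doublings = 120 / 18      # about 6.67 halvings
factor = 2 ** doublings
print(round(factor))      # 102 -- close enough to "a factor of 100"
```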

So what might we expect if biotech is in fact tracking the growth of computer technology?

Think of turning all that basic science in genetics into practical technology along the lines of computer-aided design. CAD for biological systems. That's wild, but plausible. Now map the developments in plain, old computer CAD onto that picture: From dedicated bio-CAD hardware to under $2000 PC-based bio-CAD systems in a generation—meaning home biological labs and everybody designing organisms on their kitchen tables.

That sounds fairly crazy, and it might not work out that way. But if in fact biotech is growing under the same kind of forces that have been driving growth in computer technology for the past couple of generations, something on this order of craziness probably will come to pass. If so, the consequences will be a lot more startling than anything the computer revolution has thrown at us. Some imaginable biotech futures, like some imaginable nanotech futures, don't look promising for human survival. As Harrow points out, "There are already public calls by scientists and politicians to restrict access to certain technologies, to regulate the direction of biological research, and to censor publication of some new techniques and data."

Fat chance.

Search Inside

That genie is probably out of the bottle. Some of the tools you and I and terrorists and teenagers might use for tracking biotech potential are themselves instances of trends in the technologization of society.

There's the Public Library of Science, which has launched its first publication, PLoS Biology (http://www.plosbiology.org/). Everything published by PLoS is free to the public, and it's solid refereed scientific work, including primary research and research summary articles. If PLoS were trying to do it all, it would be at best a quixotic endeavor, but PLoS Biology is intended as a demonstration to scientists, scientific organizations, academic institutions, and publishers that open access publishing of basic science works and is the right thing to do.

Then there's the proposed Cable Science Network (http://www.csntv.org/), which hopes to do for science what C-SPAN did for afternoon naps—sorry, what C-SPAN did for public access to political antics. All science, all the time. Jokes about naps aside, I love the premise, but I can't figure out how real the concept is. So far, what I see is a proposal, not a plan. I'm hopeful about this one, but I'll believe it when I see signs of funding.

And then there is a very interesting tool that isn't restricted to biotech or even to scientific knowledge, but should be a very useful tool for tracking biotech developments: Amazon's Search Inside feature. This ability to freely search the text of books without buying them probably won't hurt sales of novels, but it looks like a remarkable tool for the researcher.

I assume that Amazon planners think that they will be able to turn off the feature for certain classes of books—it's already only available for a part of Amazon's virtual holdings—if it starts cutting into sales. I wonder if they can, though. Google is already looking at acquiring the technology (and rights) to replicate what Amazon is doing, so that genie seems to be out of the bottle, too.

Incidentally, the exponential or Moore's Law or runaway—pick your adjective—growth of the biotech realm is creating a problem that could inspire some new technology. The U.S. Patent Office says it's bogged down with a half million patents to process, and it blames the backlog chiefly on biotech. Clearly what is needed is software to process patent applications.

Or maybe just new rules that make it easier for patent examiners to give a quick "No way!" to stupid patent applications.

History and Other Stories

I didn't really need to mention Jane Austen to raise the cultural tone of this column. The remainder of this month's offering will be chock-full of culture, thanks to culture critic Eric S. Raymond.

To ruthlessly paraphrase Eric: Expertise is what knowledge becomes after you get over yourself. The Art of UNIX Programming (Addison-Wesley, 2004; ISBN 0-13-142901-9) is an unusual book. I'm glad Eric didn't title his book UNIX Design Patterns as he considered doing. Ripping off Donald Knuth is much more tasteful. And like Knuth's classic The Art of Computer Programming, this is a book that shares expertise, not merely knowledge.

The tricky thing about sharing expertise is that, unlike mere knowledge, the nuts and bolts of expertise have already grooved down below the veneer of consciousness. You know, but without some effort, you can't always say why you know. Dredging up the details of expertise, which have become obvious or reflexive, in order to share them with others requires an effort that the merely knowledgeable person doesn't have to expend.

In packing the book with expertise, Eric drew on the expertise of some legendary experts, including Ken Thompson and Brian Kernighan and Ken Arnold and Steve Johnson.

The expertise shows up most blatantly in the section titled "Design," much of which consists of case studies and design patterns and other pedagogical devices; and the section on UNIX tools, which is titled "Implementation," although Eric seems to think he named it "Tools." (He should have.)

Eric's case studies offer brief sketches of such linchpins of UNIX as POP3 and IMAP and fetchmail, awk and Emacs Lisp. He discusses such UNIX interface-design patterns as the basic filter pattern and its variations, the interactive pattern seen in the ed line editor, and the separated engine and interface pattern, the last of which he calls "probably the one most characteristic interface design pattern of UNIX." If you think, as I did, that the filter pattern is the most characteristic UNIX interface-design pattern, you'll be interested in Eric's defense of his position. (Eric can almost always defend his position.)
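The basic filter pattern is small enough to show whole. A minimal sketch (mine, not one of Eric's case studies): the program is a pure transformation from an input stream to an output stream, which is what lets it compose in a pipeline.

```python
# The basic UNIX filter pattern: read lines from one stream, write
# transformed lines to another, and touch nothing else.
import io

def filter_stream(instream, outstream):
    for line in instream:
        outstream.write(line.upper())

# In a real filter the streams are sys.stdin and sys.stdout, so the
# program composes in a pipeline:  grep pattern file | ./upcase | sort
out = io.StringIO()
filter_stream(io.StringIO("unix\nplan 9\n"), out)
print(out.getvalue())  # UNIX, then PLAN 9
```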

Another kind of expertise is shown in the sections titled "Context" (history, philosophy, and comparisons with other operating systems) and "Community" (more history and philosophy, but this time regarding portability and standards and openness and documentation and licensing and another operating system comparison, this time with Plan 9 from Bell Labs). In the latter section, Eric is so blunt about the shortcomings of UNIX that one comes away thinking that it will survive, if at all, only in the Metcalfe sense (see below). But I suspect that Eric is just rooting for the culture of UNIX to survive. If so, I find myself sympathetic and optimistic.

It Takes a Village

There is one statement in the book to which I must take exception. In discussing the classic Mac OS (in a section on operating system comparisons), Eric says, "[t]he incremental cost of becoming a developer, assuming you have a Macintosh already, has never been high." I guess I know what he means, but man oh man, I get a sharp pain between my eyebrows remembering the incremental cost of trying to become a developer for Macintosh in 1984: You had to buy a Lisa and you had to grok the gist of Inside Macintosh, which, if memory serves, consisted of 47-ring-bound volumes of badly photocopied riddles written in First-Century Aramaic. But that's a quibble.

Curiously, for all the practical expertise that this book shares, it is at its core a very philosophical work. Maybe it's not so curious: Programming seems to me to be a domain where deep philosophical questions and relentlessly pragmatic details bump into each other with some regularity. Eric brings up the pragmatic flaws of UNIX only to show that it is the underlying philosophy that really matters. He quotes Ken Thompson paraphrasing Bob Metcalfe: Ethernet will never die, because when something comes along to replace it, the replacement will also be called Ethernet. And in the case of UNIX, the code gets replaced like cells in a body while the organism remains the same individual.

Exactly what defines that individual, in the case of UNIX, is the culture and the community and the philosophy of it all.

In discussing the community of UNIX, Eric gets more personal than the average programming book. It's appropriate. To explain why GNU and the FSF have had the influence they have had, and to understand why Linux and open source have shot past GNU and FSF in immediate influence, you really do have to talk about the personalities of Richard Stallman and Linus Torvalds.

Elucidating the culture, community, and philosophy of UNIX also requires an exploration of its history, which Eric supplies just enough of. Whether it also requires eight Zen Koans featuring Master Foo is not so clear, but I'd hate to have missed them.

DDJ