PROGRAMMING PARADIGMS

Natural Language

Michael Swaine

I got dem I.O. blues. That's I.O., not I/O. I/O is input/output and Io is a satellite and "I owe" is motivation for working, but I.O. is an ever-more-common syndrome in this Age of Information: Information Overload. I suffer from it; maybe you do, too.

If you do, you know that we, the afflicted, get little sympathy, since everybody thinks that we do this to ourselves. And maybe we do.

Info Gets Routed

Still, a lot of us who never went to either library school or trucking school are wondering why we spend so much of our time moving information from one location to another, figuring out where to stash the latest load of information, tracking down information that got lost in the stacks or in the information warehouse or out on the information highway. I'm sure that as we build more capable agents to move our information down that information highway they'll just get lost or stoned, make unscheduled stops, pick up hitchhikers, unionize, and strike.

I had somehow, naively, imagined that the Age of Information would bring about a different kind of work, a more intellectual labor; that soldiers would turn into video-game players, longshoremen into poets, and ditchdiggers into satellite engineers. Instead, it seems that the heavy lifting has just rolled onto our LANs and floated into our wetware. It's still manual labor: manual labor in the head.

And not all in the head, either. Let me tell you about my magazines.

I subscribe to, at last count, umpteen magazines, and the shelves on which they reside fill the long wall of my office, floor to ceiling, plus one wall of the spare room. Prior to their entombment, they hang on the 50 bars of a large magazine rack in the living room, the overflow spilling onto end tables and, occasionally, the floor, running ahead of my ability to keep up, like corpses in the plague years.

But I read them all.

Understand, I'm a professional wordsmith and feel free to use the word "read" in all its nuances simultaneously, from the somewhat superficial skim that I give to that Software License Agreement just before I rip open the package, to the impressive thoroughness that Zelda, our 11-week-old Labrador retriever, brings to her scrutiny of any periodical that happens to spill off that end table.

That said, I repeat: I read them all, or at least all that Zelda doesn't get to first. That's why I feel justified in occasionally opening the locks and letting some of this flood of information run off before it soaks into the water table of my library stacks, if you'll forgive a sloppy metaphor that may become more literal than I'd like if I don't get the office roof patched.

But I digress. This month, then, the locks are open.

Connections Get Mooted

I've learned to watch for the word "amid" in stories in the Wall Street Journal.

It's one of those weasel words that fake profundity by allowing the writer to seem to be saying more than he or she is actually prepared to say. The Wall Street Journal doesn't have a lock on the use of "amid," but the word does seem to crop up awfully often in economic reporting, for what you may agree, when you see what I'm driving at, are obvious reasons. The context is typically something like this:

Stocks tumbled as skittish investors bailed out of utility and financial stocks amid fears that interest rates are headed higher. [Wall Street Journal, November 4, 1993]

Notice that the sentence does not explicitly state any connection between the fall of these particular stocks and the fears of unspecified persons regarding interest rates, except that they occupied more or less the same time frame; that is, they coincided. The strong implication, though, is that there is a causal connection; otherwise what's the point of drawing attention to the coincidence?

But you can be sure that the writer has no credible evidence for a causal connection; otherwise, why not state it? "Amid" seems almost always to be a signal that the writer is about to indulge in guesswork. The guesswork may be eminently plausible, but that only makes it that much easier for readers to overlook the fact that it's just guesswork.

My advice is: Watch out for those "amid"s.

What in the ever-lovin' blue-eyed world, you inquire, does any of this have to do with the price of debuggers?

Trust me, I reassure, wading deeper into it.

There are more legitimate ways for writers to suggest a connection between ideas without actually stating the connection. One is juxtaposition: placing the ideas next to each other and letting the reader figure out the connection, if any.

Harper's magazine does this extremely well in its "Harper's Index" feature. In case you haven't seen the feature or any of the half-executed imitations of it, it consists of a list of factoids, like this:

Chances that an unemployed European has not worked in more than a year: 1 in 2.

Chances that an unemployed American has not worked in more than a year: 1 in 9. [Harper's, November 1993]

Harper's editors construct this list carefully so that there are connections between adjacent factoids, but they leave the discovery of the connection--causal, contrastive, ironic--to the reader.

I like "Harper's Index," and it reminds me of why I like the semicolon. I may write an article about the semicolon some day; if I do, it will sound something like this:

There are basically three ways to deal with relationships between ideas in running prose (as opposed to formal structures like "Harper's Index" that convey information in their structure). You can make the relationship explicit, for example, by using conjunctive adverbs like "therefore" or "nevertheless." Or you can leave it for the reader to discover the existence and nature of the relationship, by simply putting each idea in its own sentence. Or you can use a semicolon.

The first two choices, in my opinion, encourage passive reading. In one case the reader is given the relationship; in the other the relationship can easily be overlooked. Only the semicolon has the virtue of making the existence of a relationship between two ideas explicit without hinting what that relationship is. It signals plainly that there's something left unstated. It invites the reader to examine the connection between two ideas. The semicolon engages the reader; it makes prose more interactive.

Gabriel's Horn Gets Tooted

By jing, you persist, this writin' stuff ain't got jellybeans to do with programmin'.

Well dog my cats, I remonstrate, if Dick Gabriel can get away with it, why can't I, huh, once in a while? And get away with it Gabriel did, in the October 1993 installment of his "Critic-at-Large" column in Journal of Object-Oriented Programming. Not only that, he justified it.

Gabriel had just returned from a week-long nature poetry workshop at which he was intrigued to hear poet Gary Snyder exhort would-be poets to "get the science right."

Gabriel has long encouraged scientists and engineers to "get the writing right." Most computer scientists, he complains, are persistent dilettantes. Despite the fact that they spend a quarter to half their professional careers writing, they do not approach it professionally, or seriously, and as a result do not communicate their ideas effectively.

Gabriel gives his list of tips on how to improve your writing, ending with one that his readers may find surprising: Start a writing workshop. Professional writers know that writing workshops are the fastest way possible to get a lot of useful feedback on your writing. They are exceptionally useful to most beginning writers (and more writers than would like to admit it are beginners).

A writer's workshop consists of a few to a couple dozen writers sitting in a circle criticizing each other's writing. There are few rules, but the few are important. Gabriel presents one set of rules, but there are others. Science-fiction writers seem to be especially good at workshops; what is called the "Clarion" model is excellent. In the Clarion model, each participant reads and critiques each other participant's work, while the victim remains silent until all criticisms are heard.

Gabriel suggests that computer scientists get together with colleagues in related fields to hold workshops, critiquing papers before submitting them to conferences. Then this year, he implies, there may not be, as there was last year, a 91 percent rejection rate on OOPSLA submissions.

I think this is a very good idea. Is anybody out there doing it? Let me know.

Plauger's Language Gets Booted

On a higher linguistic plane, P.J. Plauger drew criticism from a computational linguist in the November 1993 issue of C User's Journal.

Plauger had published an article on natural-language processing in the April 1993 issue of the magazine, and an interesting piece it was. Reader M. Boot, a computational linguist by profession, took Plauger to task for the simplistic level of the piece. His criticisms were, roughly:

  1. The author uses the terminology of computational linguistics in the article, but the associated code doesn't live up to the language of the article.
  2. The techniques demonstrated are 19 years out of date.
  3. This is adventure-game linguistics.

Maybe Plauger should read Computational Models of American Speech by M. Margaret Withgott and Francine R. Chen (University of Chicago Press, 1993), which Jon Erickson reviewed in this journal in October 1993.

Maybe a lot of us should.

Boot's beefs don't mean that the C User's Journal piece wasn't interesting and useful. In fact, Plauger, no dummy, found it interesting; he claims that his readers did; and I admit that I did, too. So I, for one, am happy to judge it a good article for its intended audience, but what about that audience? Are we all ignorant?

In this one area, yeah, I suspect that we are. This is only a guess (maybe I should work in an "amid"), but I suspect that the distance between academic and commercial work in computational linguistics is greater than the corresponding gap in a lot of other areas of computer science.

If true, doesn't that suggest an opportunity? Isn't it possible that computational linguistics could be a fruitful area for the kitchen-table software entrepreneur?

Granted, if the distance between academic and commercial work in computational linguistics is greater than the corresponding gap in a lot of other areas of computer science, it may be because computational linguistics is a lot harder than a lot of other areas of computer science. But Fermat's Last Theorem was hard, and its cracking last year just demonstrates that hard problems can often be broken down into smaller, more manageable problems.

Maybe there are small advances to be made in computational linguistics that are open to the kitchen-table programmer. And, not to be overlooked, maybe these advances could become successful commercial products. Many natural-language applications do not require a complete model of the English language.
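To see why a complete model of English isn't always necessary, consider keyword spotting, which is roughly the "adventure-game linguistics" Boot derides, yet is enough for plenty of command-and-control applications. Here is a toy sketch (the function and examples are mine, not from Plauger's article):

```c
/* Toy keyword-spotting "parser": a command is recognized if
   every keyword appears somewhere in the input. No grammar,
   no complete model of English -- and naive about word
   boundaries ("on" matches inside "front") -- but adequate
   for small command vocabularies. */
#include <stdio.h>
#include <string.h>

/* Return 1 if every keyword occurs as a substring of input,
   0 otherwise. */
int matches(const char *input, const char *keywords[], int nkeys)
{
    int i;
    for (i = 0; i < nkeys; i++)
        if (strstr(input, keywords[i]) == NULL)
            return 0;
    return 1;
}
```

With keywords "light" and "on", this accepts both "turn the light on" and "switch on the hall light" while rejecting "open the door"; word order, articles, and everything else in the sentence are simply ignored.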

Computational linguistics is an area of interest to me, but I'm sure M. Boot would judge me to also be 19 years out of date. If any DDJ readers are doing interesting work in this area, and are willing to talk about it, I'd love to hear from you.

Negroponte Gets Hooted

In the November 1993 issue of New Media, editor-in-chief David Bunnell ridiculed the idea that there is a convergence happening in the area of multimedia, and passed along the intelligence that the word "convergence" was invented by Nicholas Negroponte as a marketing gimmick for his MIT Media Lab.

Did Southern Pacific Railroad and U.S. Rubber merge to create the auto industry, he asked, or G.E. team up with the Royal Shakespeare Company to launch the movie industry?

Historically, new industries are created and dominated by new companies, and Bunnell predicted that the multimedia heroes will be new companies, still in the garage today.

Ah, you say, but the new industry of multimedia depends on content, and the big companies are buying up all the content. But Bunnell also questions the notion of repurposing existing content.

Nicholas Negroponte had his own say on the issue of repurposing in the November 1993 issue of Wired. At least that's what I think he was talking about. I honestly believe that Negroponte consciously tries to write like Marshall McLuhan. I'd better let him speak for himself:

Modern multimedia must include the automatic transcoding from one medium into another, or the translation of a single representation into many media. Books that read themselves when you are dozing off, or movies that explain themselves with text are good examples.

I don't know about you, but I'm encouraged. I've been writing this column all along so that it would read itself if you fell asleep.

Issues Get Disputed

Also in that November issue of Wired, which is the first monthly issue, are an interview with Alvin Toffler (touching on such Toffleresque predictions as the breakup of China, a Constitutional crisis in the USA, a global revolt of the rich, and niche wars with personal nukes) and a whole slew of what Wired calls idées fortes and any other magazine would call "viewpoints."

One of these idées mused on the issue of the viability of copyright out on the information highway. Another was billed as being about digital archaeology, and darned if that wasn't a fair description. Can we assume that we are leaving a readable record behind as we generate all this electronic data? Anyone who can read German can read the first book ever printed, but I can't read my Osborne 1 disks. What will information archaeologists of the future make of our era, and on the basis of what data?

I cite these idées as evidence that discussion of the social implications of technological change is alive in computer publications. But Wired is a special case, and not actually written by or for the agents of that change.

There are magazines that are written by and for, et cetera. This one, for example. And I observe with pride that the issues of several programmer's magazines that I have before me do indeed touch on these social issues.

Here's the October/November issue of PC Techniques, in which editor-in-chief Jeff Duntemann debates encryption legislation, drug policy, and crime control with a reader. Here's November's Windows Tech Journal, in which Zack Urlocker talks about copyright law. And as we know, Jon's editorials often delve into the social consequences of technological change and of governmental reaction (or lack thereof) to that change.

Two thoughts about this: 1. It's important, because ignorance is power, placed in the hands of others. What you don't know can hurt you and what others don't understand can, too. 2. The best such discussions tend to be among the most technically knowledgeable. It's encouraging that the technical community is thinking about these things, and it's a laugh in the face of the common view that engineers and technologists don't consider the consequences of their work.

Which brings us back to writing, since ideas poorly expressed are not well understood. And it brings a chance for writer/editor/programmer P.J. Plauger to redeem himself.

In his "State of the Art" column in the November 1993 issue of Embedded Systems Programming, Plauger talks about the "other" interfaces of embedded systems. Most products are designed to be easy for the daily user, he says. But there are also the rare reconfiguration uses that may crop up monthly or yearly, and these typically sport interfaces and documentation that are all but unusable to anyone but a trained technician. Bad. As he puts it:

Any interface you provide that gets only occasional use had better do lots of prompting. Favor menu-style choices over open-ended command sets that must be memorized or looked up in a manual. Provide at least brief hints about what each option actually means.

In other words, consider your audience. The ultimate practical advice for writers and software developers.
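Plauger's prescription for occasional-use interfaces can be sketched in a few lines of C. Everything here, the option names and hints included, is hypothetical and mine, not from his column; the point is simply the shape he recommends: menu-style choices, each with a brief hint, so nothing must be memorized or looked up in a manual.

```c
/* Sketch of a prompting, menu-style interface for a rarely
   used reconfiguration mode. All option names and hints are
   invented for illustration. */
#include <stdio.h>

struct option {
    char key;           /* keypress that selects the option   */
    const char *label;  /* what the choice is                 */
    const char *hint;   /* brief hint about what it means     */
};

static const struct option config_menu[] = {
    { 'b', "Baud rate", "Speed of the serial link (e.g. 9600)" },
    { 'p', "Parity",    "Error-check bit: none, even, or odd"  },
    { 'q', "Quit",      "Leave configuration mode"             },
};

#define NOPTIONS (sizeof config_menu / sizeof config_menu[0])

/* Print every choice with its hint, so the occasional user
   never has to remember a command set. */
void show_menu(FILE *out)
{
    size_t i;
    fprintf(out, "Configuration:\n");
    for (i = 0; i < NOPTIONS; i++)
        fprintf(out, "  %c) %-10s - %s\n",
                config_menu[i].key,
                config_menu[i].label,
                config_menu[i].hint);
    fprintf(out, "Choose an option: ");
}

/* Map a keypress to a menu index, or -1 if it isn't offered. */
int lookup_option(char key)
{
    size_t i;
    for (i = 0; i < NOPTIONS; i++)
        if (config_menu[i].key == key)
            return (int)i;
    return -1;
}
```

The daily-use interface of the same product might accept terse commands; the rarely visited configuration mode, by contrast, re-teaches itself on every entry.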


Copyright © 1994, Dr. Dobb's Journal