Random Thoughts

Dr. Dobb's Journal January, 2005

By Michael Swaine

Michael is editor-at-large for DDJ. He can be contacted at mike@swaine.com.

Randomness is a glittering snowflake of a concept: utterly familiar yet beautiful and intricate when examined closely. And when you examine it too closely, it evaporates, leaving you wondering if it was just something you imagined. Science writer Ian Stewart recently posed the question of whether randomness even exists, or if it is just an invention of our superstitious minds ("Randomness," New Scientist, September 25, 2004).

Real or imaginary, randomness comes in flavors and displays itself in facets: It can be classical or quantum, a mark of superstition or a measure of ignorance, certain or uncertain, ergodic or chaotic. It's related to heat, and you can improve algorithms by "heating" them up with some gratuitous randomness; but it's also related to noise, and noises can be sorted according to their "colors."

Randomize bits and you have a cryptogram. Randomize letters and you have an anagram. Randomize thoughts and you have a Paradigm, or at least a "Paradigms" column. At the end of this month's random walk, I'll derandomize the anagram I published here three months ago.

Superstition

Conventional wisdom (which is to say, random public opinion) holds that there is "real" randomness at the heart of all things, a peculiar quantum randomness that cannot be removed. This, CW says, is different from "classical" randomness, which is just a measure of our own ignorance and can be reduced by acquiring more information.

But if classical randomness is simply a statistical measure of our own ignorance, that's not how we talk about it. We speak of random processes as though the processes in question were not deterministic. When we can't find the pattern behind a sequence of events, we give the unknown pattern a generic name—randomness—rather than admit that the case is not closed. But pretending that you've said something meaningful when you've merely given a name to your ignorance: Isn't that superstition?

Uncertainty

Quantum randomness, though, is "real" randomness. Physicists say that there is no statistical process underlying quantum randomness. It just is. And the case seems solid, offering no loophole. Or at least, the possibilities for getting around Bell's Theorem and the empirical evidence for it are few and unpretty. You have to assume, for example, that certain quantities simply cannot be computed, or that a certain form of argument is inadmissible.

But if you're willing to reject one of the assumptions of Bell's Theorem, you can in fact construct a hidden-variable theory, in which the observed quantum randomness is merely a statistical property of unobserved variables. So the question remains: Is quantum uncertainty certain?

Contrariness

Randomness, or unpredictability, seems to have some affinity to irreversibility. Fill the grid in the game of Life or some other cellular automaton with random bits, and let it evolve through many generations. What you see is "a complex web of structure and activity...with various levels of organization" (Toffoli and Margolus, Cellular Automata Machines, MIT Press 1987; ISBN 0262200600).

But if the rule of evolution of the automaton is time reversible, the system looks just as random at every step. Only irreversible rules lead to interesting outcomes. Does this say something interesting about reversibility and the universe in which we live?
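The experiment Toffoli and Margolus describe is easy to rerun in miniature. Here's a minimal sketch of Conway's Life on a small toroidal grid seeded with random bits; the grid size and generation count are arbitrary choices:

```python
import random

def life_step(grid):
    """One generation of Conway's Life on a toroidal grid of 0/1 cells."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbors, wrapping around at the edges.
            n = sum(grid[(r + dr) % rows][(c + dc) % cols]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            # Birth on exactly 3 neighbors; survival on 2 or 3.
            nxt[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return nxt

# Fill the grid with random bits and let it evolve.
random.seed(1)
grid = [[random.randint(0, 1) for _ in range(16)] for _ in range(16)]
for _ in range(50):
    grid = life_step(grid)
```

Print the grid between generations and you can watch the "complex web of structure and activity" condense out of the random soup.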

Anarchy

Whatever randomness is, it is certainly related in some way to unpredictability. Yet the relationship between randomness and unpredictability is far from straightforward. Randomness can lead to predictability, while deterministic processes can turn out to be unpredictable. In fact, they can be expected to, in certain circumstances. It's predictable unpredictability. You doubt? Consider: Statistical mechanics is the mathematical engine of thermodynamics. The ideal gas law (of which Boyle's Law is the constant-temperature special case) deterministically relates the pressure, volume, and temperature of a quantity of gas. If you know two of these quantities, you can always compute the third. It's the law.

What's going on at the molecular level, though, is as anarchic as a food fight at Animal House: individual gas molecules randomly bouncing off one another. (I don't mean to slander these molecules; their ricocheting properly obeys Fast Eddie Felson's Law of Combinations. But in the sense that you have no idea of the positions and velocities of the particles, their motion is random.) Yet out of this randomness comes the strict orderliness of Boyle's Law.
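The "know two, compute the third" claim is the ideal gas law, PV = nRT. A sketch in SI units; the single-mole default and ideal-gas assumption are doing all the work here:

```python
R = 8.314  # molar gas constant, J/(mol*K)

def ideal_gas(pressure=None, volume=None, temperature=None, moles=1.0):
    """Solve PV = nRT for whichever one of P, V, T is left as None."""
    if pressure is None:
        return moles * R * temperature / volume
    if volume is None:
        return moles * R * temperature / pressure
    if temperature is None:
        return pressure * volume / (moles * R)
    raise ValueError("leave exactly one of P, V, T unspecified")

# One mole at standard temperature and pressure: about 22.4 liters.
v = ideal_gas(pressure=101325.0, temperature=273.15)  # cubic meters
```

No matter how anarchic the molecules, the macroscopic answer always comes out the same.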

Chaos

So much for predictability out of randomness.

To get unpredictability out of determinism, we need look no farther than chaos (which, in my office, is never far away). Chaotic motion, Manfred Schroeder of Bell Labs claims, is much more widespread in nature than regular motion (Fractals, Chaos, Power Laws, W.H. Freeman, 1991; ISBN 0716721368). In chaotic motion, small initial errors grow exponentially until they swamp the effects of regular motion. It was mathematician and meteorologist Edward Lorenz who found the poetic image for this exquisite sensitivity to initial conditions: Does the Flap of a Butterfly's Wings in Brazil Set Off a Tornado in Texas? Answer: It might. Tweak the initial conditions a tiny bit in a chaotic model of the weather, and you get radically different weather.
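Lorenz's weather model takes some machinery, but the same sensitivity shows up in the simplest chaotic system there is, the logistic map (a standard stand-in here, not Lorenz's model). It is perfectly deterministic, yet a perturbation of one part in ten billion soon swamps everything:

```python
def logistic(x, r=4.0):
    """One step of the logistic map; fully chaotic at r = 4."""
    return r * x * (1.0 - x)

def orbit(x0, steps, r=4.0):
    """Iterate the map, returning the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(logistic(xs[-1], r))
    return xs

# Two trajectories that start 1e-10 apart soon bear no resemblance.
a = orbit(0.4, 60)
b = orbit(0.4 + 1e-10, 60)
```

The initial error roughly doubles each step, so after a few dozen iterations the two "weathers" have nothing to do with each other.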

So random processes can have effects that are highly predictable, and deterministic processes can have effects that are essentially unpredictable.

Confusion

Still, on the spectrum of exoticness, we think of chaos and unpredictability and randomness as being toward the exciting and mysterious end, and determinism and statistics as being over near the dull and understood end. Chaos theory lends itself to colorful metaphors like "butterfly effect" and to trippy philosophical discussions. Statistics, in contrast, seems like the most settled of disciplines.

However, there are real differences among statisticians regarding the deep meaning of it all. Statistics is, on careful examination, rather mysterious.

The two philosophical paradigms of statistics are classical (frequentist) and Bayesian. The more you examine either of them, the less satisfactory it seems. (It seems to me.) One is disturbingly circular in its logic, and the other is disturbingly subjective.

In classical statistics, the statement that a 95 percent confidence interval for an unknown parameter runs from -1.5 to +2.4 makes it sound as though the parameter lies in that interval with 95 percent probability. But if you thought that, you would be thinking right-side-up, which will get you nowhere in classical statistics. All you are allowed to conclude is that if you were to carry out similar procedures time after time, then the unknown parameter would lie in the confidence intervals you constructed 95 percent of the time. Whatever good that is. The developers of classical statistics constructed a lot of very handy tools, and told the people using them that they couldn't actually use them to come to any conclusions that a researcher would be interested in reaching.
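That repeated-procedure claim is at least easy to check by simulation. A sketch, assuming normally distributed data and the usual z = 1.96 interval; the sample size and trial count are illustrative choices:

```python
import math
import random

def coverage(true_mean=0.0, n=30, trials=2000, z=1.96, seed=7):
    """Fraction of 95 percent confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        m = sum(sample) / n
        # Sample standard deviation, then the half-width of the interval.
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        half = z * s / math.sqrt(n)
        if m - half <= true_mean <= m + half:
            hits += 1
    return hits / trials
```

Run it and roughly 95 percent of the intervals do capture the unknown parameter. Whatever good that is: No single interval ever tells you whether it is one of the lucky ones.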

Bayesian statistics treats probability as ignorance and provides methods for determining how much your ignorance is reduced by successive collections of evidence. But you can't begin to apply Bayesian methods without first taking a leap of faith for whose justification Bayesian statistics is utterly unhelpful. If you asked a Bayesian weatherman whether it was going to rain tomorrow, his first step would be to ask, "What do you think?"
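The weatherman's question supplies the prior; everything after that is mechanical. A sketch of the update step itself, with entirely made-up numbers for the prior and the likelihoods:

```python
def bayes_update(prior, likelihoods):
    """Update a discrete prior {hypothesis: probability} given the
    likelihoods {hypothesis: P(evidence | hypothesis)}."""
    posterior = {h: prior[h] * likelihoods[h] for h in prior}
    total = sum(posterior.values())
    return {h: p / total for h, p in posterior.items()}

# The subjective starting point: what do YOU think?
prior = {"rain": 0.3, "dry": 0.7}
# Evidence: a falling barometer, assumed 80 percent likely if rain
# is coming and 20 percent likely if it isn't (hypothetical numbers).
posterior = bayes_update(prior, {"rain": 0.8, "dry": 0.2})
```

The arithmetic is impeccable; it's that first dictionary of prior probabilities that Bayesian statistics can't help you justify.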

And these unsatisfying options are the only philosophical models for statistics we have to choose from.

The fact that we can't even agree about what we're really doing when we do statistics is further evidence that we really don't know enough about ignorance. Should we be concerned? Is what you know that you don't know more or less dangerous than what you don't know that you don't know? We don't know.

Turmoil

We do know that injecting a little randomness into a system can sometimes improve its performance. In his 1951 essay, "Can Digital Computers Think?" (cited in Alan Turing: The Enigma, Andrew Hodges, Simon and Schuster, 1983; ISBN 0671492071), Alan Turing played around with the idea of granting "free will" to a computer program by injecting it with a little randomness, and even alluded to by-then decades-old arguments that the indeterminism of quantum processes could help to explain human free will. It seems plausible, based on the extant evidence, that he was very serious about this latter notion (Computationalism: New Directions, edited by Matthias Scheutz, MIT Press, 2002; ISBN 0262194783).

But adding randomness to produce smarter algorithms is more than a notion. "Simulated annealing" is a proven technique for raising the "temperature" of a search algorithm, thus increasing the likelihood that it will jump out of a locally optimal state and discover more nearly globally optimal states. In an uncertain universe, a little uncertainty in our algorithms proves to be advantageous.
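A sketch of the technique on a hypothetical one-dimensional landscape. The temperature schedule and step size are arbitrary tuning choices, not part of the method proper:

```python
import math
import random

def anneal(energy, neighbor, state, t0=10.0, cooling=0.995, steps=5000, seed=0):
    """Simulated annealing: always accept downhill moves, accept uphill
    moves with probability exp(-dE/T), and lower the temperature as we go."""
    rng = random.Random(seed)
    t, best = t0, state
    for _ in range(steps):
        cand = neighbor(state, rng)
        d = energy(cand) - energy(state)
        if d <= 0 or rng.random() < math.exp(-d / t):
            state = cand           # "hot" runs jump out of local minima
        if energy(state) < energy(best):
            best = state
        t *= cooling               # gradually freeze into a good state
    return best

def bumpy(x):
    # Many local minima; the global minimum is near x = -0.3, value ~= -2.9.
    return x * x + 3.0 * math.sin(5.0 * x)

def step(x, rng):
    return x + rng.uniform(-0.5, 0.5)

x = anneal(bumpy, step, state=8.0)
```

While the system is hot, the gratuitous randomness lets the search climb out of mediocre valleys; as it cools, the search settles into a deep one.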

Noise

One system for categorizing different types of randomness is the color system for noise. Noise, or time-based random data, can be characterized as white, brown, pink, or black, at least.

"[T]he power spectra (squared magnitude of the Fourier transform) of statistical time series, often known as noises, seem addicted to simple, homogeneous power laws in the form f^b as functions of frequency f" [Schroeder]. For white noise, the exponent is b = 0. For brown noise, akin to Brownian motion, the exponent is -2. Pink noise, with an exponent of -1, lies between; black noise has an exponent beyond -2. The color expresses how time-dependent the randomness is.
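Brown noise's f^-2 spectrum has an especially simple interpretation: It is integrated white noise, which is to say, a random walk. A sketch (pink and black noise require actual spectral shaping, which is omitted here):

```python
import random

def white_noise(n, rng):
    """White noise (b = 0): independent samples, no memory at all."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def brown_noise(n, rng):
    """Brown noise (b = -2): a running sum of white noise. Integration
    steepens the flat spectrum by a factor of 1/f^2, so each sample
    remembers everything that came before it."""
    out, total = [], 0.0
    for _ in range(n):
        total += rng.gauss(0.0, 1.0)
        out.append(total)
    return out
```

The exponent, in other words, is a measure of memory: white noise forgets instantly, brown noise never forgets.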

Hodges says that Turing enciphered Winston Churchill's voice into white noise while doing cryptographic work during WWII, but Hodges isn't necessarily being as precise here as Schroeder. It could have been some other color of noise. In fact, Schroeder points out, white noise is a fiction: Its spectrum can be flat only over some finite frequency range. Whatever the range, it's definitely noise, staticky and unpleasant to the human ear.

Pink and black noise are widespread. Black noise governs power blackouts, floods, droughts, and bear markets. Pink noise has equal power in octave frequency bands, and because of the unbiased way it stimulates the inner ear, it is the psychoacoustic equivalent of white noise. It crops up in many natural systems. Brown noise shows up in computer graphics systems that produce fractal mountain ranges, and in all sorts of other places.

Obfuscation

The signature use of randomness, though, is in cryptography. The voracious appetite of crypto applications for random numbers is rich in irony. Not only are large, inherently uninteresting numbers now a commodity desired for their very uninterestingness, but the use to which these uninteresting numbers are put is to disinform. Crypto celebrates randomness not in its informing use as a measure of ignorance but in its disinforming use as a means of ensuring ignorance.

But what exactly is a random number? If "random" just means "I don't know where that came from," how do you, as programmer, generate one? We've all written random-number generators, but we all know that the numbers generated were only "random enough" for our purposes. Programs that are said to produce random numbers actually generate pseudorandom numbers unless they are attached to gamma-ray detectors or some other source of measurements of quantum uncertainty.
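The classic pseudorandom recipe is the linear congruential generator. A minimal sketch, using the multiplier and increment listed in Numerical Recipes; note that determinism is both the whole point and the whole problem:

```python
class LCG:
    """A linear congruential generator, the classic pseudorandom recipe:
    state -> (a * state + c) mod 2**32, with constants from Numerical
    Recipes. The same seed replays the same stream forever."""
    A, C, M = 1664525, 1013904223, 2 ** 32

    def __init__(self, seed):
        self.state = seed % self.M

    def next(self):
        self.state = (self.A * self.state + self.C) % self.M
        return self.state
```

The output passes casual statistical tests, but anyone who knows the constants and one state value can predict every number that follows, which is exactly what a crypto application cannot tolerate.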

But that's not quantum crypto. Quantum crypto draws on quantum randomness in a more fundamental way. It requires specialized hardware that can store information in the form of qubits in superpositions of quantum states, and specialized algorithms that can take advantage of the quantum storage. Quantum crypto promises to be uncrackable, assuming all those aforementioned Bell's Theorem assumptions about quantum uncertainty, along with many more mundane assumptions regarding the effectiveness of the specific implementation's safeguards against spoofing and other nonquantum tricks.

Frankly, the idea of uncrackable encryption scares me. I don't trust myself that much. I recently turned on Apple's FileVault on-the-fly file encryption, and despite the fact that I have two safeguards for getting into the files if the on-the-fly encryption goes on-the-fritz, I'm still nervous.

Concealment

Another approach to obfuscation is steganography. I like the name and I love the concept—hiding in plain sight. Steganography is an alternative to encryption, and is as old as codes. In ancient Greece, they would etch messages in a block of wood, then fill the grooves with wax. Melt the wax and read the message. Or they'd shave a slave's head, tattoo a message on his scalp, and let his hair grow back.

According to Wikipedia, "[s]teganography works by replacing bits of useless or unused data in regular computer files (such as graphics, sound, text, HTML, or even floppy disks) with bits of different, invisible information. This hidden information can be plain text, cipher text, or even images." Although steganography is an alternative to encryption, it can be used in combination with encryption, the encrypted message being further hidden steganographically. Or a separate message can be hidden steganographically in the same file with an encrypted message.

Steganography took a back seat to cryptography for years, but began to get some attention in the '90s, even getting its own international conferences. The application that kicked up the popularity of steganography is watermarking. Hiding a copyright message in a file is a way of protecting the file that depends on the message's existence going undetected. This is the unique feature of steganography: Encryption scrambles a message, but it doesn't hide the message's existence.

Hiding the fact that information has been embedded in a digital file is a tricky process. It's a game, really: You have to guess what statistical properties of the file might be analyzed. One trick is to embed the message not in individual bits but in the parity of a set of bits. That way you can change whichever bit in the set is least informative to potential snoops. Google "steganography" for algorithms and more info.
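The parity trick is simple enough to sketch. Here the flipped bit is always the first one, purely for brevity; choosing the least-informative bit to flip is the part a real implementation would sweat over:

```python
def embed_bit(block, bit):
    """Hide one message bit in the parity of a block of cover bits.
    If the block's parity already matches the message bit, nothing
    changes at all; otherwise flip a single bit. (Here we always flip
    the first bit; a real scheme would flip whichever bit least
    disturbs the cover file's statistics.)"""
    block = list(block)
    if sum(block) % 2 != bit:
        block[0] ^= 1
    return block

def extract_bit(block):
    """Recover the hidden bit: It is simply the block's parity."""
    return sum(block) % 2
```

At most one cover bit changes per message bit, and half the time none does, which is what gives the snoop's statistical tests so little to work with.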

Troublemaking

All this discussion of cryptography and steganography set me thinking.

My personal views on the Second Amendment are: (1) that the founders were describing a means of safeguarding one's rights and one's person against any and all threats, (2) that the idea that the government might be one such threat was no stranger to their thinking, but (3) that in the 21st century it's more than a little naive to think that you're going to defend yourself against the government of the United States with your home arsenal.

I believe that what the authors of the Second Amendment would be defending today is the right to use crypto. I think that crypto occupies the role that guns did two centuries ago. You may disagree, but you could grant me the point for purposes of argument; I'm going somewhere else with this anyway.

I admit that crypto lacks some of the features of a gun. One important feature of the right to carry a gun is that it is self-reinforcing, or self-defending: If you are carrying a gun, people are reluctant to challenge your right to carry a gun. They may decide not to raise the issue. Using crypto doesn't have this virtue: It doesn't make you a less appealing target for suppression of your right to use crypto, but a more appealing one.

Steganography, however, does make you less of a target. Hiding that you're hiding, that's the trick to hiding from those who want to deny you the right to hide. Stego plus crypto: The un-gun gun?

Distraction

In my October 2004 column, I asked you to consider what obsolete machinery was suggested by the letters "ISO and the URL," and added, "[a]t the risk of sending you off on the wrong track, I'll suggest that you think about locomotives, focusing on the rail sound in south Ireland."

The solution is a Linotype machine. Along with "ISO and the URL" there are two other anagrams in the clues—"the rail sound" and "south Ireland"—that resolve to the strings "etaoin shrdlu."

These are not words, but they are by some estimates the 12 most common letters in English and, more to the point, they are the first two banks of keys on a Linotype machine.
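You can check all three clues mechanically, since an anagram test is just a comparison of sorted letter multisets:

```python
def letters(s):
    """Canonical form for anagram checking: the sorted lowercase letters."""
    return sorted(ch for ch in s.lower() if ch.isalpha())

clues = ["ISO and the URL", "the rail sound", "south Ireland"]
target = "etaoin shrdlu"
all_match = all(letters(clue) == letters(target) for clue in clues)
```

Three different surface readings, one underlying multiset of letters: randomize and derandomize at will.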

Error

In that same column, I also wrote: "By this logic, all sorts of portable devices should be banned, including memory sticks. Hard to police this policy...Not a crazy idea, just a hard one to enforce, as are all attempts to bottle up information."

A reader writes: "I wanted to make you aware of Microsoft's efforts to help people police this policy: http://news.zdnet.com/2100-3513_22-5356485.html. Personally, I agree that it's futile to try and bottle up information."

DDJ