Dr. Dobb's Journal, February 2006
Of course, this daunting business of inflicting on yourself a radical paradigm shift over to the search model isn't really something you'd want to task your imagination with every day. Writing a spreadsheet application, for example, isn't obviously helped much by viewing the task as a search through a space of programs looking for one that has a lot of little boxes with numbers and formulas in them. Or any other problem-space-search representation that I can think of. Still, it can be a powerful technique, and in fact, it was responsible for bringing about the first successes in artificial intelligence: the early puzzle-solving and game-playing programs. So say Avron Barr and Edward Feigenbaum in The Handbook of Artificial Intelligence (William Kaufmann, 1981; ISBN 0865760047), and they should know.
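For a concrete feel for what "the search model" means in those early puzzle-solving programs, here is a minimal sketch in Python. The puzzle (the classic two-jug water-measuring problem), the function name solve_jugs, and the choice of breadth-first search are all illustrative assumptions of mine, not anything taken from Barr and Feigenbaum. The point is only that the program never "knows" how to solve the puzzle; it just explores a space of states until it reaches one that satisfies the goal.

```python
from collections import deque


def solve_jugs(cap_a, cap_b, target):
    """Breadth-first search over (a, b) jug states.

    Returns the shortest list of states from (0, 0) to a state in which
    either jug holds `target` units, or None if no such state is reachable.
    """
    start = (0, 0)
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        a, b = state
        if a == target or b == target:
            # Walk the parent links back to the start to recover the path.
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        pour_ab = min(a, cap_b - b)  # amount that fits when pouring A into B
        pour_ba = min(b, cap_a - a)  # amount that fits when pouring B into A
        successors = [
            (cap_a, b), (a, cap_b),          # fill a jug
            (0, b), (a, 0),                  # empty a jug
            (a - pour_ab, b + pour_ab),      # pour A into B
            (a + pour_ba, b - pour_ba),      # pour B into A
        ]
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Measure 4 units using a 3-unit jug and a 5-unit jug.
    for step in solve_jugs(3, 5, 4):
        print(step)
```

Because the search is breadth-first, the first goal state it reaches is also one reached in the fewest moves. Swapping the queue for a stack would turn it into depth-first search, which still finds a solution here but not necessarily the shortest one.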
An entirely different idea regarding the desirable ubiquity of search is the notion that every viable 21st century software business model can and should be built around search. A corollary of this rather bold theorem is the idea that Microsoft wants to or needs to become Google, a notion that one would be tempted to discount as a gross oversimplification if it weren't for the fact that the person seemingly most responsible for putting the thought into the technological public's imagination is Bill Gates. And he should know.
What we all know by now is that Microsoft really is searching for a new business model. Maybe they'll find it in one of the many search-related projects underway in Microsoft labs. Maybe they'll adopt Google's when Google gets done with it.
What I can tell you is that you'll find here some random observations on search and a brief look at another of those fat books with ambitious titles for which I have an odd fondness. On my Fat Book Shelf right next to Stephen Wolfram's A New Kind of Science (1197 pages) and Stephen Jay Gould's The Structure of Evolutionary Theory (1433 pages) stands Roger Penrose's The Road to Reality: A Complete Guide to the Laws of the Universe (1099 pages). I've wanted to have someone explain to me the laws of the universe, and because Penrose won a Wolf Prize with Stephen Hawking "for their joint contribution to our understanding of the universe," he should know.
As I write this, search is much in the news (with Google indexing blogs, news is much in the search, too). Google itself is much in the news. Stock price? You could drive across the United States in an SUV at summer 2005 gas prices for less than the price of a single share of Google stock. CNet writer Declan McCullagh recently wrote a piece canonizing the old search protocol Gopher and its Veronica server; while in The New York Times, John Battelle was raving about the Web 2.0 conference, much of which was about search. Meanwhile, there's a campaign to save the search mascot, Jeeves, and we read that "Bill Gates Visits the Holy Land and Talks Search." Search is everywhere.
News aggregators? Ping servers? Mapping, GPS, people search, social bookmarking, tagging, communities of interest. Just more ways to search.
Some communities of interest, particularly ones that lead to people making contact off the Internet, make some people nervous. And what interests are we talking about? Suicide, for example? Should search engines build technologies to push people who are searching for suicide information toward help? Or is it always a bad idea to subvert the proper working of a search engine? And before you answer, have you ever engaged in Googlebombing?
Which raises the rather naïve but important question of whether you can trust search. Clearly you can't, so what can be done about this? Dogpile is a search aggregator that could suggest the way: Don't trust one engine, but apply some metric of trust over all of them to route around bias and error and Googlebombing.
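What might "some metric of trust over all of them" look like? The following is a rough sketch in Python, and emphatically not a description of how Dogpile works. It assumes each engine returns an ordered list of URLs and that you have somehow assigned each engine a trust weight between 0 and 1; the function name aggregate, the engine names, and the scoring rule (a trust-weighted sum of reciprocal ranks) are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def aggregate(rankings, trust):
    """Merge ranked result lists from several engines into one list.

    rankings: dict mapping engine name -> list of URLs, best first.
    trust:    dict mapping engine name -> weight (assumed 0.0 to 1.0);
              engines missing from this dict contribute nothing.
    Each URL scores weight / rank for every engine that returned it, so
    a result that many trusted engines agree on beats a result that one
    engine (possibly Googlebombed) ranks first.
    """
    scores = defaultdict(float)
    for engine, urls in rankings.items():
        weight = trust.get(engine, 0.0)
        for rank, url in enumerate(urls, start=1):
            scores[url] += weight / rank
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    rankings = {
        "engine_a": ["x.example", "y.example"],
        "engine_b": ["y.example", "z.example"],
        "engine_c": ["bomb.example", "y.example"],  # the manipulated one
    }
    trust = {"engine_a": 1.0, "engine_b": 1.0, "engine_c": 0.5}
    print(aggregate(rankings, trust))
```

The hard part, of course, is the one this sketch waves away: deciding where the trust weights come from in the first place.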
If the Internet is the center of your work, then search is the main task you have to perform.
There was a lot of second guessing in the press about the meaning of Google and Sun agreeing to cooperate on technologies. Mostly this was speculation about how they would target Microsoft. Would the two companies use OpenOffice.org to challenge Microsoft in the Office suite software arena? Or is that yesterday's platform, and would they push harder on the idea of the Internet as the center of the computer user's world?
I don't think anyone knows exactly what Sun and Google will accomplish together, but some scenarios seem more likely than others. Here's a question that I think brings some of the speculation into focus: Which is more likely, that Google copies Microsoft, or that Microsoft copies Google?
It's fun to ask questions like that, but it's not much fun to watch judges grapple with tricky technological issues. When they have trouble deciding whether "Intelligent Design" is religion or science, I worry about their ability to determine the senses in which BitTorrent resembles Grokster. "Torrent files don't contain any data," a defender argued. "This is a search engine scenario. Why aren't Google, Yahoo, or Microsoft getting sued?"
In fact, Google has been threatened over its Google Print technology, despite the fact that it has gone out of its way to avoid copying copyrighted material.
As I understand it, Google Print, which allows searching inside books, indexes those books rather than caching their content. This means that the book is nowhere copied, and that it would be extremely difficult to reassemble the book from the index.
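The distinction between indexing and caching can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Google's implementation: the function names build_index and search are hypothetical, and the index deliberately records only which words appear on which pages, not where on the page or in what order. (Real search indexes often do store word positions, which would weaken the argument.)

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(pages):
    """Map each word to the set of page numbers on which it appears.

    pages: list of page texts; page numbers start at 1.
    The page text itself is not kept, only word -> pages.
    """
    index = defaultdict(set)
    for page_number, text in enumerate(pages, start=1):
        for word in WORD.findall(text.lower()):
            index[word].add(page_number)
    return index


def search(index, query):
    """Return the sorted page numbers that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    pages = ["Call me Ishmael", "the whale the whale", "call the whale"]
    index = build_index(pages)
    print(search(index, "the whale"))   # pages containing both words
    print(dict(index))                  # what is actually stored
```

Looking at what is stored makes the point: from the index alone you can tell that "whale" is on pages 2 and 3, but not how many times it appears or what words surround it.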
But Google does cache content, of course: the content of web pages.
The truth is, this routine caching of web pages is much more clearly a case of copying than anything Google is doing with books. When I Google a topic and find several news stories on the topic, and click on a link and find that it has expired and the news service has removed the story or pulled it behind a subscriber wall, I just click back to the search results and go to the cached version. This clearly undermines the intentions of the news service. Is it illegal? Should it be?
Making caching illegal could cause serious damage to the way search engines work and the way the Internet works. But this underscores one of the ways in which the Internet, working as it was intended, calls copyright and other intellectual property laws into question.
Personally, I think that a lot of problems are better described as challenges in visualization rather than search tasks. Sometimes you know where the information is and you just want to make some sense of it. Yes, you could define that as some sort of search. And I do suspect that when I'm staring at a spreadsheet I may be getting bogged down in data rather than viewing a solution or insight. But I think that sometimes we want to be intimately involved in the search process, and that converts the process into something other than pure search.
Advertising is certainly search. Search with a hook, which is to say fishing, and much of it is of a type of fishing known as "chumming." Throw the bait out on the water and hope that some big fish comes along and you'll be able to snag it.
The opposite of this is targeted advertising, which at first blush seems like a powerful idea that can improve the efficiency of advertising by orders of magnitude. The idea is not new, but technology today makes much greater targeting possible. To a scary degree. But on second glance (and after watching Glengarry Glen Ross again), it seems to me that it's not so simple. Selling is about converting a nonprospect into a prospect and into a customer. There is an inherent problem in defining the search space, due to the unwillingness of the salesperson to refine (and thereby reduce) it. Or at least there are conflicting desires. So maybe the picture of advertising as search is not so clear.
I finally reached the end of the road; that is, the last page of Roger Penrose's The Road to Reality (Alfred A. Knopf, 2004; ISBN 0679454438). It was a tough slog. The math made my head hurt, and I like math. I was going to dedicate this whole column to a review of it, on the principle that if I have to wade through 1100 pages of complex manifolds and holomorphic functions, you should be forced to suffer proportionately. But the truth is, I don't understand this book a whole column's worth.
According to its jacket, this book is addressed to the serious lay reader. Yeah, right. The audience is a little more rarefied than that: Nobody who hasn't done graduate work in mathematics is going to get much out of this book. Not only is it richly endowed with dense footnotes, but most of the footnotes have homework problems in them. Like footnote 27.16:
Give a general argument to show why a connected (3-)space cannot be isotropic about two distinct points without being homogeneous.
And make it snappy, serious lay reader.
Whatever its virtues, this book is not for the faint of heart. Reading it, or anyway wading through it, was for me a humbling experience. It rubbed in my face how much math I've forgotten and how little I knew to begin with. And although I've been critical in the past of some things Penrose has written in his more popular and accessible work, here I wouldn't dream of critiquing him. Just over my head.
But I did get something out of the book, I do think that some DDJ readers might find it interesting, and I'm not sorry that I made the effort.
Penrose is a brilliant, important thinker, a collaborator with Stephen Hawking, and he's not kidding about the title. This book is a whirlwind tour through all the important questions in modern physics and all the math needed to truly understand the questions. I've written here about some unorthodox approaches (Wolfram, Fredkin) to understanding the laws of the universe, approaches that make those laws look like computer programs. Penrose has a different view of these things, but his approach is also challenging to orthodoxy. Although orthodoxy is probably the wrong term when any theory that fits the empirical data that quantum physics works with has to be flat-out crazy. What makes Penrose et al. germane to this admittedly wide-ranging column is that information is central in all their theories. Information seems to be at the heart of everything; for example, in a brief moment of accessibility to that serious lay reader, Penrose exposes the common misconception that we depend on the energy from the sun for our survival. Nope: Entropy, not energy, is the key. We consume the sun's information.
It took me a while to figure out what Penrose was doing in this book. This, I think, is what he's up to: He wants you to be able to visualize mathematical structures. Spaces, fields, bundle spaces. He provides an enormous number of illustrations, mostly looking like something left in the oven too long: surfaces or solids of odd shapes curled back on themselves. It's a truism that you can't really visualize 4D space, but you can use visualizations to gain insight into four dimensions, just as you can't fully represent 3-space objects on 2D paper, but we manage to model them usefully via projections of various kinds. Penrose pushes this as far as he can. Even if you don't understand all of his helpful diagrams, it's impressive to realize that he works so hard to find a way to visualize every one of these extremely abstract concepts.
In the later chapters, we see what all this visualization work was for, as he introduces concepts in physics that he, as a mathematician, sees mathematically. Now the foundation work in the early math chapters helps you get an idea how he visualizes the bizarre quantum properties of the universe. If you really work to understand the visualizations in the early chapters, you'll be able to visualize the tough stuff in the later, physics, chapters. Like: Ah, this is one of those things where the pie crust has lots of little fishhooks coming out of its upper surface.
I realize that I'm not giving much of a sense of what Penrose covers in the book. Okay, for example, he critiques ontologies for quantum theory, including the Copenhagen interpretation, the many-worlds view, environmental decoherence, consistent histories, and pilot wave approach, and presents his own unorthodox view. He covers the Special and General Theories of Relativity, Quantum Theory and quantum phenomena, and candidates for a Theory of Everything. And like that.
When he gets to the Theories of Everything, he provides much material for IDers to get excited about. Like his picture of an anthropomorphic Creator performing an extremely low probability act to place the universe in the immensely low entropy (thus special) Big Bang state. But it's a metaphor, just as he's being metaphorical when he presents Lee Smolin's notion of multiple universes spawned by multiple universes, with a kind of intergalactic natural selection leading to the evolution of better fitted universes. Darwin on the largest scale.
Finally, in chapters 30 and 33, Penrose presents his own approach, which I won't attempt to summarize except to say that he takes on all the giants of quantum theory. His theory is unorthodox, and he acknowledges that it's also short on testable predictions. Its virtues are at present chiefly aesthetic.
However, his theory is not alone in being hard to test. Penrose cautions against being seduced by mathematical beauty when the kind of theories you're dealing with are Big Science, where empirical refutability can be hard to come by.
As for me, I'm all for beauty and simplicity and the parsimonious explanation. I'm starting to lean toward that model that says that the universe can be most simply described using 11 dimensions. But maybe just because I want to refer to it as "Occam's Eleven."
Sorry.
DDJ