The Perl Journal December, 2003
Over the years, I've produced my fair share of commercial CD-ROMs. Along the way, the one thing I've learned is that a good search tool is hard to find.
Well, I should say it's hard to find if you want one that is lightweight, indexes without crashing, works on a variety of platforms, and doesn't cost an arm and a leg to distribute on CD. I've looked at a lot of content-indexing systems, from the cheap-and-dirty right up to the expensive-and-feature-laden.
The problem is complicated by the fact that search tools aren't one application. They're two: an indexer and a search client. For CD-ROM content, you only have to run the indexer once, to produce an index. That index is distributed on your CD-ROM along with the search client, which reads the index and lets users search the disc. Both of these apps need to be reliable.
Some search tools have indexers that tie you to one platform. If you want to do all your CD-ROM production work on one machine, the indexer ends up dictating which machine that will be. Some tools have Java-based search applets that are supposed to work on any end user's VM. We all know how that goes: write once, debug everywhere. Yet another problem is that some tools require you to distribute an app on your CD-ROM that must be installed on the user's machine before the disc can be searched. Granted, performance may be better than with a cross-platform search tool, but I still don't like CD-ROMs that do this.
So you can imagine my delight in running across jsFind, an open-source tool put together by Shawn Garbett (http://elucidsoft.net/projects/jsfind/). It's really just a nice customization of a popular open-source indexer called swish-e, coupled with some Perl to massage the swish-e output into an XML B-tree, and a JavaScript component to search that B-tree. Swish-e is available as source, so you can compile it on nearly any platform you like (I did it on Mac OS X). Shawn provides a patch for swish-e that lets it export its index as XML, but by the time you read this, that patch may well have been incorporated into the main swish-e source. The back end of this tool, then, is usable on any machine where you can compile it successfully. So far, so good.
But what about that JavaScript component? That, too, seems to be a thing of beauty. It works in any DOM Level 2-compliant browser. That includes many relatively recent browsers, and it gives users a choice of at least two different browsers on each of the big three platforms: Windows, Linux, and Mac OS X. (Probably Solaris, too, though I don't have a Solaris installation to test this.) The searches are fast, and the whole user experience is customizable because you have access to all the code.
From the standpoint of compatibility for end users, a JavaScript client makes much more sense than a Java client because JavaScript is built into the browser. There's no separate install just to support the search client. Most of the time, the browser the user already has is all that's needed.
jsFind isn't perfect: on some OS/browser combinations, a JavaScript timeout bug leaves the search stuck on a "Please wait..." page. Resubmitting the search usually clears it, though. Shawn says he's looking for a way to squash that one. If you think you know the answer, he'd probably like to hear from you.
Kevin Carlson
Executive Editor
The Perl Journal