Through Thick and Thin

Dr. Dobb's Journal July 2003

By Gregory V. Wilson

Greg is a DDJ contributing editor and can be contacted at gvwilson@ddj.com.

Processing XML with Java
Elliotte Rusty Harold
Addison-Wesley, 2003
1071 pp., $54.99
ISBN 0201771861

Practical Debugging in C++
Ann R. Ford and Toby J. Teorey
Pearson Education, 2001
112 pp., $20.80
ISBN 0130653942

How to Break Software
James A. Whittaker
Addison-Wesley, 2002
208 pp., $26.25
ISBN 0201796198

Hacker's Delight
Henry S. Warren, Jr.
Addison-Wesley, 2002
306 pp., $39.99
ISBN 0201914654

Bioinformatics Computing
Bryan P. Bergeron
Prentice Hall, 2002
325 pp., $39.99
ISBN 0131008250

Bioinformatics for Dummies
Jean-Michel Claverie and Cedric Notredame
John Wiley & Sons, 2003
480 pp., $29.99
ISBN 0764516965

Regular readers of this column know that I usually don't like thick books. Partly, it's because they're a pain to carry around. Mostly, though, it's because we live in a networked world. I often find that most of the "thick" in a thick book is reference material that is soon outdated, and that I would rather search on-line anyway.

I make an exception for Elliotte Rusty Harold's 1000-page Processing XML with Java. Yes, a lot of the book is devoted to describing various APIs, but it also includes commentary on them, and lucid, pointed explanations of when and how to use SAX, DOM, JDOM, JAXP, XSLT, and other technologies.

Harold starts as he means to continue, with a careful, opinionated discussion of what XML is, what it is good for, and how to convert existing flat files to XML. SAX comes next (three chapters), followed by DOM (five chapters), and JDOM (a friendlier-to-Java variation on the DOM theme, which gets two chapters). XPath and XSLT get one chapter each, while the appendices present quick reference guides for various APIs, SOAP schemas, and two recommended reading lists (one of books, the other of specs). Throughout, the examples are well chosen and well analyzed, and the print is exceptionally clear.
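For readers who have not used either API, the difference between SAX and DOM can be sketched in a few lines with the standard JAXP classes. This is an illustration only, not code from Harold's book: the class name ElementCounter, its two methods, and the sample document in the usage note are all made up for the purpose. The SAX version reacts to parser events as they stream past, while the DOM version builds the whole tree in memory and then queries it.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: counts elements with a given name, once per API.
final class ElementCounter {

    // SAX: the parser calls back on each start tag; no tree is kept.
    static int countWithSax(String xml, String elementName) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The default factory is not namespace-aware, so qName
                // holds the element name exactly as written.
                if (qName.equals(elementName)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return count[0];
    }

    // DOM: the parser builds the full document tree, which is then queried.
    static int countWithDom(String xml, String elementName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName(elementName).getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Given a small document with two book elements, both ElementCounter.countWithSax(xml, "book") and ElementCounter.countWithDom(xml, "book") return 2. The usual trade-off is that SAX suits large inputs read once, while DOM suits documents that need to be navigated or modified.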

At the opposite end of the size scale, though no less useful, is Ann Ford and Toby Teorey's Practical Debugging in C++, a book that began life as a lab manual for novice programmers. It carefully steps through the kinds of errors programs can contain, different ways to trace program execution, and the use of an interactive debugger. The focus is C++, but most of the concepts apply equally well to other statically typed languages such as Java. While the book probably will not hold any revelations for most DDJ readers, anyone who is teaching or learning C++ will find it a valuable resource.

James Whittaker's How to Break Software: A Practical Guide to Testing is almost as thin and just as useful.

As its title suggests, the book is a catalog of different ways to make programs fail. Out-of-range input, badly formatted files, external resources like sockets or heaps that aren't as robust as they should be—they're all here in a "how to" format that explains when to apply an attack, what faults to target, how to determine if the attack is exposing a failure, and how to actually conduct the attack. As a bonus, the illustrations are drawn from widely known, real-world applications.

I liked this book a lot, and will be stealing ideas from it the next time I teach an undergraduate software engineering course. My only real complaint is that the book doesn't include exercises for readers to try themselves. Still, I suppose there isn't likely to be a shortage of bad software to play with any time soon...

Next up is Henry Warren's Hacker's Delight. The word "hacker" is used in its original sense to mean "a person who enjoys exploring the details of programmable systems and how to stretch their capabilities." If that's who you are, this book will be a delight, indeed. Want a fast algorithm for counting the number of leading zeros in a word? Have some bit matrices to transpose, or some constants to multiply by? Do you feel that conditional jumps are too expensive for casual use, and you'd like the compiler you're writing to produce fewer of them? The answers are all here, along with brief proofs that they are in fact answers. It's not light reading, but it reminded me of the "gee whiz" feeling I had as a teenager when I first started to program. For that alone, thank you, Mr. Warren.
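To give a flavor of the kind of problem described above, here is a minimal sketch of two well-known bit-level techniques: counting leading zeros by binary search, and computing an absolute value without a conditional jump. These are common textbook formulations written for illustration and are not reproduced from Warren's book; the class and method names are hypothetical.

```java
// Hypothetical illustration of two classic bit-manipulation tricks.
final class BitTricks {

    // Counts leading zero bits in a 32-bit word by binary search:
    // each step tests whether the top half of the remaining bits is
    // empty and, if so, shifts the word left by that amount.
    static int leadingZeros(int x) {
        if (x == 0) {
            return 32;
        }
        int n = 0;
        if ((x & 0xFFFF0000) == 0) { n += 16; x <<= 16; }
        if ((x & 0xFF000000) == 0) { n += 8;  x <<= 8; }
        if ((x & 0xF0000000) == 0) { n += 4;  x <<= 4; }
        if ((x & 0xC0000000) == 0) { n += 2;  x <<= 2; }
        if ((x & 0x80000000) == 0) { n += 1; }
        return n;
    }

    // Absolute value with no conditional jump. The arithmetic shift
    // yields 0 for non-negative x and -1 (all one bits) for negative x;
    // XOR with the mask then subtracting it negates x only when the
    // mask is -1. Like Math.abs, it returns Integer.MIN_VALUE unchanged.
    static int branchFreeAbs(int x) {
        int mask = x >> 31;
        return (x ^ mask) - mask;
    }
}
```

Both methods can be checked against the library equivalents, Integer.numberOfLeadingZeros and Math.abs, which is a reasonable way to gain confidence in any hand-rolled bit trick before relying on it.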

The last two books in my pile today are Bryan Bergeron's Bioinformatics Computing, and Jean-Michel Claverie and Cedric Notredame's Bioinformatics for Dummies (yes, really). Both books are wide-ranging overviews of a topic that many seem to think is going to be the Next Big Thing (dot-gene, perhaps?). Both are well-written and well-illustrated, and have useful indices. However, there is one important difference: While the "For Dummies" book concentrates on the core concepts and tools of bioinformatics, Bergeron's book tries to cover pretty much the whole of modern computing: databases, networks, visualization, data mining, and collaborative environments. I applaud his ambition, but it's just too much for one book. Many sections feel like little more than a summary of key terms, and anyone who doesn't understand these topics before opening the book probably still won't after they close it.

Claverie and Notredame, on the other hand, achieve more by aiming at less. Their discussions of concepts, data formats, and tools are detailed enough to be useful to a nonspecialist like myself, and their language is light enough to avoid intimidation. Many of the details of the interfaces they describe will soon be out of date, but if you're entering the field today, this is a pretty good road map. Plus, it's a cool title.

DDJ