XML and Then Some

Dr. Dobb's Journal May 2001

By Gregory V. Wilson

Greg, a DDJ contributing editor, is the author of Practical Parallel Programming (MIT Press, 1995), and coeditor with Paul Lu of Parallel Programming Using C++ (MIT Press, 1996). Greg can be reached at gvwilson@ddj.com.

XML for the World Wide Web
Elizabeth Castro
Peachpit Press, 2001
270 pp., $19.99
ISBN 0201710986

Writing Effective Use Cases
Alistair Cockburn
Addison-Wesley, 2001
270 pp., $34.95
ISBN 0201702258

3D Game Engine Design: A Practical Approach to Real-Time Computer Graphics
David H. Eberly
Morgan Kaufmann, 2001
560 pp., $59.95
ISBN 1558605932

Open Source Development with CVS
Karl Fogel
Coriolis, 1999
316 pp., $39.95
ISBN 1576104907

Computational Molecular Biology: An Algorithmic Approach
Pavel A. Pevzner
MIT Press, 2000
314 pp., $44.95
ISBN 0262161974

SSL and TLS: Designing and Building Secure Systems
Eric Rescorla
Addison-Wesley, 2001
499 pp., $39.95
ISBN 0201615983

Understanding SOAP
Kennard Scribner and Mark C. Stiver
SAMS, 2000
514 pp., $39.95
ISBN 0672319225

I received an early Christmas present this past holiday season — a book on XML that's up to date and packed full of useful information. What's more, like other books in Peachpit's Visual Quickstart series, Elizabeth Castro's XML for the World Wide Web is beautifully designed and easy to read without ever being condescending.

I have more than 20 XML books on my shelves and have reviewed several in these columns. Most get bogged down in premature technicalia, and describe the many features of XML without ever really showing why they exist, or how they ought to be used. The 16 chapters and four appendices in Castro's book, on the other hand, are organized into one- and two-page explanations of particular topics, from writing nonempty elements to namespaces, schemas, and XML transformation. Throughout, Castro strikes a perfect balance between "what," "why," and "how," and provides a surprising amount of detail without ever overwhelming you.

While most of the credit for this goes to her lucid writing style, the book's two-column layout deserves a mention as well. The main idea is explained in one column; the other contains two-color illustrations with captions that elaborate on the main theme. I have to admit that I initially filed the book as "nontechnical" because of this scheme's user friendliness: like far too many programmers, I subconsciously assume that if it ain't ugly, it can't be technically sweet. I'm glad that Castro has proved me wrong...

Karl Fogel's Open Source Development with CVS is almost as useful a book, and perhaps even more so, given the near-total lack of other books on the subject. The Concurrent Versions System (CVS) is the version-control system of choice for almost all open-source development projects. It has some flaws, but is robust, flexible, and available for just about every combination of hardware and operating system in existence.

Until now, CVS has been poorly documented compared to other important open-source tools. Fogel's book does an excellent job of fixing that, and of explaining how you can take full advantage of what CVS can do for you. While even-numbered chapters describe CVS's commands, options, and internals, odd-numbered chapters cover design for decentralized development, and the ins and outs of building, testing, and releasing open-source code. The author is now helping develop a successor to CVS called "Subversion" (see http://www.subversion.org/).

Eric Rescorla's SSL and TLS: Designing and Building Secure Systems does an equally good job of presenting the context around a complex technology while explaining the technology itself. In this case, the technology is a pair of network security protocols: the Secure Sockets Layer (SSL) and its emerging successor, Transport Layer Security (TLS). Chapter 1 is a short (40-page) summary of cryptography and secure communication protocols that sets the stage for most of what follows. Chapters 2-4 describe the basic and advanced features of SSL, while Chapter 5 analyzes the protocol's security properties. The remaining six chapters cover topics such as performance tuning, HTTP over SSL, and alternative approaches. While the book is clearly not for everyone, I think many programmers will find it worth buying for the first chapter alone.

Kennard Scribner and Mark Stiver's Understanding SOAP is just as technical as Rescorla's book. The Simple Object Access Protocol (SOAP) is a standard for encoding procedure calls in XML. Its aim is to provide a standard way for one process to ask another to do some work and send back the result. SOAP is fairly modest in that it does not address many essential features of full-blown distributed object systems, such as activation and callbacks. In practice, that may turn out to be an asset, because developers without the budget or patience for CORBA, Enterprise JavaBeans (EJB), and the latest incarnation of COM might still be willing to SOAPify their applications.
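To make "encoding procedure calls in XML" concrete, here is a minimal sketch of my own (not taken from the book) that wraps a call in a SOAP 1.1 envelope and unpacks it again using Python's standard library. The envelope namespace is the one defined by SOAP 1.1; the application namespace, the method name, and the parameter are hypothetical, and a real service would also need a transport such as HTTP, which is left out here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:stock-quotes"  # hypothetical application namespace


def build_call(method, params):
    """Encode a procedure call as a SOAP 1.1 envelope and return it as text."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method becomes an element inside Body; its children are the arguments.
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        # Arguments are left unqualified in this sketch (an assumption).
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_call(xml_text):
    """Recover the method name and arguments from an envelope built above."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of Body is the call element
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    return method, {child.tag: child.text for child in call}


# Hypothetical request: ask a quote service for a stock's last trade price.
request = build_call("GetLastTradePrice", {"symbol": "DIS"})
method, params = parse_call(request)
```

The receiving process would run something like parse_call, do the work, and send back a second envelope holding the result. Note that argument values come back as strings; recovering their types is the job of SOAP's data encoding rules.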

Chapters 1 and 2 of Understanding SOAP compare its key features to those of other object technologies, and introduce the main features of XML on which SOAP relies. Chapters 4-6 cover SOAP's encoding of data; Chapter 7 looks at how remote methods are invoked; and the last three chapters look at BizTalk (a framework for publishing XML schemas), the future of SOAP, and SOAP's COM binding. As might be guessed from the inclusion of BizTalk and a chapter on SOAP's COM binding, most of the discussion is Microsoft-centric; other platforms are mentioned as afterthoughts, if they are mentioned at all. That aside, the descriptions are clear, the examples are well chosen, and when the authors take the time to explain why certain features are the way they are, they do so well.

David Eberly's 3D Game Engine Design is subtitled, "A Practical Approach to Real-Time Computer Graphics," which is actually a more accurate description of its contents. The book summarizes the mathematics and algorithms involved in rendering objects, detecting overlaps between them, generating terrain, and so on. "Summarizes" is the key word here: while the book includes proofs of many important theorems, it is a reference rather than a tutorial. One sign of this is the relative scarcity of illustrations and colored plates; another, its unapologetic assumption that its readers have their freshman calculus and linear algebra at their fingertips. You don't need to know this stuff to use a modern 3D graphics library, but I expect that many of the programmers who build those libraries for the rest of us will find it a useful reference.

Alistair Cockburn's Writing Effective Use Cases is also a useful reference, although I think it would have been even more useful if it had been somewhat shorter. It's not a long book (22 chapters, four appendices, and an index take up just 270 pages) and it's definitely worthwhile, but I often felt as though the author had stretched a couple of paragraphs into three or four pages. I also found the hand-drawn icons of kites, waves, and fish about as mnemonic as the UNIX command line...

Those criticisms aside, this is a solid survey of how to translate a customer's vague thoughts and wishes into a concrete specification of a buildable piece of software. The book is full of useful checklists and reminders, but has a refreshingly pragmatic "do what works" approach to the design process. Far too many projects fail to deliver what users want because their developers start coding before actually figuring that out. Going through the steps this book recommends won't necessarily put an end to that, but it'll sure help.

The last book on this month's list, Pavel Pevzner's Computational Molecular Biology, is one of the first to survey the techniques now being used to identify, classify, and search gene sequences. Physicists and geologists have been relying on high-performance computing for almost half a century, but it is only in the last decade or so that a significant number of biologists have started using software to study the chemical processes that we call "life." Doing this well has turned out to be at least as important to endeavors like the Human Genome Project as actually gathering raw data, since the latter only becomes useful when it has been analyzed and classified.

Computational Molecular Biology describes dozens of different algorithms for sequencing genetic data, finding similarities between gene sequences, and so on, and contains a great many proofs of these algorithms' properties. Be warned, however, that it assumes a great deal of background knowledge about biological concepts and terminology. At the same time, I doubt whether anyone without at least a bachelor's degree in computing would be able to follow the discussion either. This book is therefore most likely to be of use in a graduate-level course for specialists.
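For programmers who have never seen what "finding similarities between gene sequences" looks like in code, the sketch below scores the best global alignment of two strings with the classic dynamic-programming recurrence usually credited to Needleman and Wunsch. It is my own illustration rather than an excerpt from the book, and the match, mismatch, and gap values are arbitrary assumptions; real tools use scoring schemes tuned to biological data.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best global alignment score of sequences a and b."""
    # prev[j] is the best score for aligning the prefix of a seen so far with b[:j].
    # Before any of a is consumed, aligning with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with the empty string costs i gaps
        for j in range(1, len(b) + 1):
            # Pair a[i-1] with b[j-1], either as a match or a mismatch...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ...or pair one of them with a gap.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


same = alignment_score("GATTACA", "GATTACA")  # identical sequences
close = alignment_score("ACGT", "AGT")        # one base missing
```

Higher scores mean more similar sequences. The function keeps only two rows of the table, so it reports the score but not the alignment itself; recovering that would mean keeping the whole table and tracing back through it.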

DDJ