Small is Beautiful -- Kind Of

Dr. Dobb's Journal February 1999

By Gregory V. Wilson

Greg is the author of Practical Parallel Programming (MIT Press, 1995), and coeditor with Paul Lu of Parallel Programming Using C++ (MIT Press, 1996). Greg can be reached at gvwilson@interlog.com.

The Essence of SQL: A Guide to Learning Most of SQL in the Least Amount of Time
David Rozenshtein
SQL Forum Press, 1996
119 pp., $25.00
ISBN 0-9649812-1-1

The Perl Cookbook
Tom Christiansen and Nathan Torkington
O'Reilly & Associates, 1998
757 pp., $39.95
ISBN 1-56592-243-3

High Performance Computing, Second Edition
Kevin Dowd and Charles Severance
O'Reilly & Associates, 1998
446 pp., $29.95
ISBN 1-56592-312-X

JavaScript for the World Wide Web, Second Edition
Tom Negrino and Dori Smith
Peachpit Press, 1998
195 pp., $17.95
ISBN 0-201-69648-7

AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis
William J. Brown, Raphael C. Malveau, Hays W. McCormick III, and Thomas J. Mowbray
John Wiley & Sons, 1998
309 pp., $39.99
ISBN 0-471-19713-0

Beginning Object-Oriented Analysis and Design: With C++
Jesse Liberty
Wrox Press, 1998
359 pp., $34.95
ISBN 1-861001-33-9

I sometimes think it's a shame that Canadian courts aren't as freewheeling as their American counterparts when it comes to damage suits. If they were, I'd probably have sued some big-name computer-book publishers by now to recover the cost of my chiropractor's bills. Personal computers were supposed to make paper obsolete, but the parcels I get from publishers, and the contents of my pack, seem to get heavier and heavier. Perhaps someone could persuade publishers to let books shrink back to a manageable size, and grab shelf space by packaging them in large, mostly empty boxes, just like the software they describe.

Until that happens, I will continue to take a special pleasure in books that say everything they need to in just a few pages. David Rozenshtein's The Essence of SQL is one good example of this. The book "is dedicated to the proposition that one can often accomplish 80% of the task in 20% of the time." In this case, the task is learning SQL, and Rozenshtein meets his objective by showing how to formulate SQL queries to answer 13 common types of questions. The questions themselves are arranged in order of increasing complexity, so that (for example) negation is explained before aggregation, which in turn appears before SQL's handling of NULL is covered. The result can be read by a complete neophyte in less than two hours, but still manages to cover everything needed to formulate nontrivial queries against nontrivial databases. The writing is overly formal at times (the book itself is referred to as "this essay," for example), but it is still an excellent, useful little guide.
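To make that ordering of topics concrete, here is a minimal sketch of the three kinds of query mentioned above (negation, aggregation, and NULL handling), run through Python's built-in sqlite3 module. The student and take tables, their columns, and the sample rows are invented for illustration and are not taken from Rozenshtein's book.

```python
import sqlite3

def build_demo_db():
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE student (sno INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE take (sno INTEGER, course TEXT, grade INTEGER);
        INSERT INTO student VALUES (1, 'Ann');
        INSERT INTO student VALUES (2, 'Bob');
        INSERT INTO student VALUES (3, 'Cy');
        INSERT INTO take VALUES (1, 'CS101', 90);
        INSERT INTO take VALUES (1, 'CS102', NULL);  -- grade not yet known
        INSERT INTO take VALUES (2, 'CS101', 70);
    """)
    return conn

def run(conn, sql):
    """Execute one query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

conn = build_demo_db()

# Negation: who does NOT take CS101?
not_in_cs101 = run(conn, """
    SELECT name FROM student
    WHERE sno NOT IN (SELECT sno FROM take WHERE course = 'CS101')
    ORDER BY name
""")

# Aggregation: how many courses does each student take (including zero)?
courses_per_student = run(conn, """
    SELECT s.name, COUNT(t.course)
    FROM student s LEFT JOIN take t ON s.sno = t.sno
    GROUP BY s.sno, s.name
    ORDER BY s.name
""")

# NULL handling: COUNT(*) counts rows, while COUNT(grade) and AVG(grade)
# skip rows whose grade is NULL.
grade_stats = run(conn, "SELECT COUNT(*), COUNT(grade), AVG(grade) FROM take")

# Comparing with "= NULL" never matches; "IS NULL" is the test to use.
equals_null = run(conn, "SELECT sno FROM take WHERE grade = NULL")
is_null = run(conn, "SELECT sno FROM take WHERE grade IS NULL")

print(not_in_cs101, courses_per_student, grade_stats, equals_null, is_null)
```

The NULL queries show why that topic is worth leaving until last: the missing grade silently drops out of the average, and a test written with "=" finds nothing.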

The second edition of Tom Negrino and Dori Smith's JavaScript for the World Wide Web is longer and much glossier than The Essence of SQL, but just as useful. Like Rozenshtein, the authors teach by example. Each technique is introduced by showing a page that uses it, then explaining the corresponding JavaScript source line by line. Color is used sparingly, but effectively, to highlight points of interest, and the code samples are readable despite their small size. All of JavaScript's commonly used capabilities -- highlighting icons as they are brushed by the user's mouse, controlling the content of one frame from another, checking user input, and handling cookies -- are covered, and links to further examples at the book's web site are plentiful.

Tom Christiansen and Nathan Torkington's The Perl Cookbook is several times larger than either of the previous books, but no less useful. Perl is a complex language, even by computing's rather forgiving standards; descriptions of it are littered with words like "except," "unless," and "however." While the language's motto might be, "There's more than one way to do it," beginners and harassed web site administrators would often be happy to be shown just one, so long as it worked.

Enter The Perl Cookbook. Each of its 20 chapters contains a dozen or more sections, each of which has "Problem," "Solution," "Discussion," and "See Also" headings. The "Problem" descriptions are typically quite brief, as in:

You have a data file containing comma-separated values that you need to read in, but these data fields may have quoted commas or escaped quotes in them.

The "Solution" presents at least one way to do what's required, and often shows a couple of alternatives as well. Solutions are taken from common practice, other O'Reilly books on Perl and regular expressions, or from widely used Perl modules -- the best answer to the aforementioned problem, for example, is to use the quotewords function from the Text::ParseWords module.
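For readers who want to see why the comma-separated-values problem needs more than a simple split, here is a rough sketch in Python rather than Perl. It is an analogue built on Python's standard csv module, not the book's solution, and the function name parse_csv_line and the sample lines are made up for illustration.

```python
import csv
import io

def parse_csv_line(line, backslash_escapes=False):
    """Split one line of comma-separated values into a list of fields.

    By default a quote inside a quoted field is written as two quotes ("").
    With backslash_escapes=True it is written as backslash-quote instead.
    """
    if backslash_escapes:
        reader = csv.reader(io.StringIO(line), escapechar="\\", doublequote=False)
    else:
        reader = csv.reader(io.StringIO(line))
    return next(reader)

# A field containing a comma and a field containing doubled quotes.
line = 'widget,"Smith, Jane","said ""hi""",42'

naive_fields = line.split(",")        # breaks "Smith, Jane" in two
parsed_fields = parse_csv_line(line)  # respects the quoting

# The same idea with backslash-escaped quotes.
escaped_fields = parse_csv_line('a,"b \\"c\\", d"', backslash_escapes=True)

print(naive_fields)
print(parsed_fields)
print(escaped_fields)
```

The naive split yields five pieces instead of four, which is exactly the trap the "Problem" statement describes.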

Other problems posed and solved in this book include passing parameters by name (instead of position), extracting values from C header files for use in Perl scripts, and handling TCP connections in client/server systems. The writing is clear, concise, and thankfully free of in-jokes, and the 22-page index and "See Also" links always led me to the answers I wanted with a minimum of fuss and backtracking. If only it were small enough to carry around without back strain.

The Perl Cookbook is the sort of reference and how-to that we've come to expect from O'Reilly, but as Kevin Dowd and Charles Severance's High Performance Computing shows, the company can put out a good survey when it wants to. The surtitle on the cover of this book says, "RISC Architectures, Optimization & Benchmarks," but the book covers a lot more than this. In fact, it covers just about everything that someone doing numerically intensive programming, such as statistics, graphics, or signal processing, needs to know about modern desktop computers.

The first section describes modern computer architectures: What RISC is and isn't, how memory subsystems are organized, what effect they have on performance, and how floating-point numbers are represented and manipulated. The second section, "Programming and Tuning Software," starts with a good summary of what compilers do to optimize programs, and what the limitations of current-generation commercial optimizers are. The other three chapters in this section look at timing and profiling, ways programmers can eliminate clutter that might prevent automatic optimization, and what can be done to make loops run faster. Since loops over arrays of values consume most of the time in number-crunching programs, this chapter is especially worth reading.

The third and fourth sections move on to parallel programming. Shared-memory multiprocessors, which are now affordable even for desktop use, are covered first, along with ways programs can be tweaked to take advantage of such hardware. More esoteric architectures and specialized programming languages are next. Having worked in this field in the 1980s, I was a bit depressed to see how little progress has been made, but that takes nothing away from the clarity and usefulness of the book.

The final section discusses benchmarks, and the pitfalls of benchmarking, while the appendices touch on threading in Fortran, Intel's next-generation IA-64 processor, and a variety of other topics. Overall, the book is very well written, very informative, and very good at focusing on the things that practicing programmers actually need to know. I only wish it had been available 15 years ago, when I first needed to learn all these things.

The last two books in this month's review have less to say about the details of particular languages, and more about how to go about designing and building large software systems. Or, in the case of AntiPatterns: Refactoring Software, Architectures, and Projects in Crisis, how not to do this. Design patterns are one of the bigger bandwagons to come along in the last 10 years. Since the publication of the "Gang of Four" book, it has become fashionable to try to label anything that anyone has done more than once as a pattern. This book is therefore doubly refreshing, both because its authors aren't trying too hard, and because it's always fun to see dirty laundry aired in public.

So what is an antipattern? Lava Flow is probably the best example. The anecdotal evidence teaser at the start of the section on this antipattern reads:

Oh that! Well Ray and Emil (they're no longer with the company) wrote that routine back when Jim (who left last month) was trying a workaround for Irene's input processing code (she's in another department now, too). I don't think it's used anywhere now, but I'm not really sure. Irene didn't really document it very clearly, so we figured we would just leave well enough alone for now.

A Lava Flow is a program that has grown by accretion. Instead of throwing away code, successive waves of engineers have wrapped it, buried it, or worked around it, until only a small fraction of the code in the program actually does any useful work. The problem, of course, is that no one knows which fraction, and so the cycle continues.

As with most other books in the area, the authors describe each pattern in terms of its general form, symptoms, typical causes, and possible solutions. Unlike most other books, the authors of AntiPatterns include software project management in their remit, and describe such common catastrophes as "Death By Planning" and "Smoke and Mirrors." I found these sections much less interesting than the others, and felt they should have been saved for a second book. I also felt, looking back, that there was less substance in the book than I had thought when reading through it: More examples, and even a few question-and-answer problems, would do a lot to alleviate this.

Finally, Jesse Liberty's Beginning Object-Oriented Analysis and Design is hard to label, but a very good book nonetheless. His earlier Clouds to Code was a journal-like description of the development of a medium-sized commercial application. This book is more like a textbook, but is still very practically oriented. As the "Introduction" says:

When methodologists write their books, they must be exhaustive...[which] makes it all too easy for the reader to get lost in the details...[T]his book...is a working-programmer's guide to building commercial software using state-of-the-art object-oriented analysis and design. You will see how software is conceived, how you build a requirements document, how you make the "build/buy" decisions...[and] how to translate an object-oriented design into solid and reliable C++.

That's a pretty tall order, but Liberty manages to carry it off. The book really does show what analysis is, how to go about formalizing the user's understanding of a problem domain (not least so that the user's expectations can be contained), and how to translate all of that into a design for a program, and then into an actual program. The author uses the recently standardized Unified Modeling Language (UML) notation, and is up front about saying that for most commercial software developers, 32-bit Windows and MFC are the only platforms that matter, so you might as well get used to them early.

While it is the best general introduction to object-oriented analysis and design that I've come across yet, the book does have two significant flaws. The first, and smaller problem, is that some of the material on concurrency and persistence seems out of place. While these things are a necessary part of today's real-world applications, I think the book would have been stronger if it had focused on things that other books don't cover this well.

The second flaw is more important. Simply put, this book is not structured as a textbook. There are no questions at the end of the chapters, which would make it difficult for an overworked college instructor to use this book in a course. I think that's a shame, particularly since other books in this area (such as McConnell's now-classic Rapid Development and Software Project Survival Guide) have the same shortcoming. Perhaps there's room here for someone to write a cookbook.

But then, that would just add to the weight in my pack.

DDJ


Copyright © 1999, Dr. Dobb's Journal