This article is not about Windows 95 -- I promise -- although it might seem like it at first. By now you've probably heard every voice in the media harrumph and weigh in with an opinion on Windows 95 -- from Doonesbury to Jay Leno to San Francisco newspaper columnist Herb Caen ("Windows 95 needs so many memory and disk resources from your computer that it should be called 'Hoggin' DOS.'")
In case you've been on Bora Bora the last month, here's a quick digest of recent commentary: The user interface is a tremendous improvement over Windows 3.1 (but an unabashed rip-off of the Mac and OS/2), the software has bugs (but no severe ones), is slow (but comfortable if you have 16 MB of RAM), is a disk hog (but disk storage is dirt cheap), has an installation program that goes to heroic lengths to autodetect your hardware (yet nevertheless fails a good portion of the time), and is surprisingly compatible with DOS and Windows 3.1 programs (but manifests anomalies with a number of well-known applications). The product was launched with the theme song "Start Me Up," by the Rolling Stones, which cost $12 million to license, according to the British press (The Wall Street Journal says $4 million). Given the witch's brew of DOS code and 16-bit Windows code that remains within the operating system, one wag says Microsoft should have licensed "Goat's Head Soup" instead. There's a clear consensus that the software will get better, and that, regardless, Microsoft system software will continue to dominate the desktop (so what else is new?). It's also likely that many corporate MIS managers will resist the upgrade at first; therefore, the early adopters will likely be home users (paralleling the way the Pentium entered the market), motivated by a launch budget ($500 million, counting third-party efforts) equal to the GNP of many developing nations.
In the month prior to the launch, there was suspense as to whether there would be any "showstopper" bugs - bugs severe enough to abort the festivities. None turned up before the release to manufacturing (RTM), and all systems were go. However, in the weeks before RTM, it seemed that a different "final beta" or "golden master" would arrive in the mail each week, fueling this kind of speculation. Never mind that up to 1 million people have been using the product for several months, more than the installed base of many major software products.
Those who've shipped a software product can surely empathize with the anxiety prior to any major release, wondering if the fragile mountain of (let's face it) tangled spaghetti code will crash and burn as soon as the first user touches it. So endemic is this problem that one software-engineering expert says, "If you deny having encountered this, then you must be lying."
One of Microsoft's boasts about Windows 95 is that "the flight control software for the entire U.S. Space Shuttle program is roughly 500,000 lines of code, or 1/29th the size of Windows 95." (Chicago Tribune, August 16, 1995). If true, this means the Windows 95 code base tips the scales at 14.5 million lines of code. It is this boast that leads us to the focus of this article: big programs - not just Windows 95 - and the process by which large software gets written. Have there been any changes to the classical approaches to this problem? And what is it about space-shuttle software that makes a company want to measure its own software against it? These questions are substantial enough that the answer will carry over into the next issue of the Developer Update. It means stepping back to when Windows 95 was but a blip on the horizon, like an elephant marching across the plains of Kenya, viewed from Kilimanjaro. It turns out there are other elephants on the plain.
You may remember when Brad Silverberg, head of the Windows 95 division at Microsoft, said in May '94 that Windows 95 was "95% code complete," and that only performance improvements and testing remained. Yet the process dragged on, past slipped milestones and an interminable series of beta releases, long enough to garner a top spot on PC Letter's vaporware list.
As you're aware, software project delays are the rule. Ed Yourdon has remarked that the PC industry has a near-perfect record of missed milestones - just recall Lotus 1-2-3/W, System 7, OS/2, dBase IV, FullWrite, and Windows NT. Once products are released, the inevitable bugs and disappointing performance follow. Microsoft's software is not the worst of the lot by any means.
Frederick Brooks followed his classic book The Mythical Man-Month with a landmark essay in the April 1987 issue of IEEE Computer, comparing a software project to a werewolf: "usually innocent and straightforward but capable of becoming a monster of missed schedules, blown budgets and flawed products."
You may have read this before, but Brooks' prescient prediction is worth re-reading:
But as we look to the horizon of a decade hence, we see no silver bullet. There is no single development, either in technology or management technique, which by itself promises even one order of magnitude improvement in productivity, in reliability, in simplicity. Not only are there no silver bullets in view, the very nature of software makes it unlikely there will be any.

It is curious that anyone would boast of the size of the source code to their software. Any experienced developer, given a choice between two unknown, newly released pieces of software, one at 15 million lines of code and the other at, say, 5 million lines (the size of the Windows NT code base), would certainly choose the smaller, all else being equal. After all, even the space-shuttle software, developed to be as close to zero defects as humans can achieve and an order of magnitude better than the best-quality commercial software, still has a rate of one defect per ten thousand lines of code. Extrapolating this rate to a code base the size of Windows 95 means the product shipped with roughly 1500 active and festering bugs. Of course, developing software at these antiseptic levels means a cost of $1000 per line of code (about $15 billion, big bucks even for Bill Gates).
Using the $400 million cost of Windows NT as a gauge, it seems likely that the cost of the Windows 95 software is closer to $100 per line of code, resulting in a total of $1.5 billion (actual expenditures may be less, due to reuse of legacy 16-bit Windows code), and that there are probably more like 15,000 bugs (and you wonder why it just crashed on you). This figure matches ballpark estimates of the development cost of OS/2, reported to total $2 billion. Way at the other end of the scale, small development projects I've recently measured have a tab of $1 to $5 per line of code, for projects around 25,000 lines, and a detected-bug rate of 1 per thousand lines of code (there are likely many more bugs, but testing is not extensive). These are standard Windows apps in C and C++ for in-house corporate use, not shrink-wrapped system software.
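The back-of-the-envelope arithmetic above can be sketched in a few lines; all of the inputs are the article's own published estimates, not measured data:

```python
# Extrapolating the article's figures. Every constant below is an
# estimate quoted in the text, not a measured value.

SHUTTLE_LINES = 500_000            # Microsoft's cited shuttle figure
WIN95_LINES = 29 * SHUTTLE_LINES   # "1/29th the size of Windows 95"

# Shuttle-grade process: ~1 defect per 10,000 lines, ~$1,000 per line.
shuttle_defect_rate = 1 / 10_000
shuttle_cost_per_line = 1_000

# Commercial-grade process (gauged from Windows NT): ~$100 per line,
# and roughly ten times the shuttle defect rate.
commercial_defect_rate = 10 / 10_000
commercial_cost_per_line = 100

print(WIN95_LINES)                                   # 14,500,000 lines
print(round(WIN95_LINES * shuttle_defect_rate))      # ~1,450 bugs at shuttle quality
print(WIN95_LINES * shuttle_cost_per_line)           # ~$14.5 billion at shuttle cost
print(round(WIN95_LINES * commercial_defect_rate))   # ~14,500 bugs
print(WIN95_LINES * commercial_cost_per_line)        # ~$1.45 billion
```

The point of the exercise is how sensitive the totals are: a one-order-of-magnitude change in process quality moves the bug count from about 1,500 to about 15,000, and the cost from $15 billion to $1.5 billion.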
The flight-control software, written by IBM's Federal Systems division (sold to Loral Corporation in 1994), consists of 420,000 lines of code running on five redundant computers. The project has been in existence for over 20 years, and in that time has seen a two-order-of-magnitude improvement in the defect rate and a 300 percent improvement in productivity. There have been 17 releases of the software, almost one per year. The team has developed the capacity to predict costs to within 10 percent, and has missed only one deadline in the last 15 years. This is certainly a sterling record, perhaps even of silver-bullet quality - although some of this sterling appears to be chrome plated.
How do they do it? According to the authors of Capability Maturity Model, the project is equivalent to a "Level 5" organization. This Level 5 classification is one of a set of levels that comprise the "capability maturity model" (CMM), which is a framework for understanding and improving the process of developing software. (You may know of this under its previous name, the Process Maturity Model.) In CMM, there are five levels that a software development organization must work through, starting with Level 1, the most immature and undisciplined stage. As in a video game, you can't go to a high level without first going through the preceding lower levels. There are no shortcuts.
The model has many details that we can't get into here, but we'll give a quick summary. At Level 1, activities are ad hoc and chaotic, and "success depends on individual effort and heroics." Level 2 (Repeatable) introduces basic project-management mechanisms, such as schedule tracking, that allow the organization to repeat past successes on new projects that are similar. In Level 3 (Defined), the software process is documented so that different projects in the organization can use the same overall process. Level 4 (Managed) adds more-detailed measures of software quality. Level 5 (Optimizing) has additional mechanisms that allow the process to be improved continuously.
The Software Engineering Institute (SEI), which developed the CMM, is funded by the U.S. Department of Defense (DoD). The DoD recently stated that no defense contractor will get development contracts unless the organization is at least at Level 3. This proclamation provides a real incentive to get to know the model, but I must confess that the formal presentation in this book is so dense as to be almost impenetrable. In reading it, you enter a land of $64 words, passive-voice sentence construction, and nouns turned into verbs - bureaucratese, in other words. If you're a hacker or independent-minded developer, you'll want to bring along some antiemetic pills on this journey. An example: "Software work products are decomposed to the granularity needed to meet the estimating objectives." Before swallowing the model whole, a bit of due diligence reveals some interesting facts.
A hallmark of CMM is that capability belongs to the organization rather than the project. The final sentence of the SEI report on the shuttle project reads: "One of the most important lessons from this project is that, while low-maturity organizations look on talented staff as the best way to save troubled projects, mature organizations look on talented staff as the best way to transfer the culture and methods to new applications."
It is therefore disappointing to find out that the same Loral division is responsible for a recent process failure: the aborted FAA Advanced Automation System, which was to replace the nation's geriatric air-traffic-control system, now reaching the end of its life. Over the past year, the 30-year-old IBM 9020E computers that comprise the current system have had multiple failures. The equipment is so old that the technicians who know how to repair this system are retiring, and new technicians don't have the skills to deal with such ancient technology. The FAA is the nation's largest purchaser of vacuum tubes. Despite the dire need for a replacement, FAA Administrator David Hinson last year canceled most of the project, which had been underway since the early 1980s.
Some members of Congress blame cumbersome federal procurement procedures that "can still only buy yesterday's technology the day after tomorrow at government prices" (The Wall Street Journal, August 18, 1995) and argue for privatization. However, the real story seems to be primarily a failure of the software-development process. It's difficult to get information on this, but one person I spoke to who is familiar with, but not directly involved in, the project provided some details. Reportedly, the project was very late when canceled, already $2 billion over an original budget of $4 billion, and would not have been ready to test until the year 2000. According to The Washington Post (June 4, 1994), a study for the FAA by the Center for Naval Analysis found the software design used by Loral to be "seriously flawed" and riddled with errors.
Further, it turns out that even the shuttle software itself is not above reproach. A 1993 study for NASA, entitled "An Assessment of the Space Shuttle Flight Software Development Process," commended NASA but also had some criticisms. These are summarized by Ivars Peterson in Fatal Defect (Times Books, 1995):
NASA had not adopted the strict safety and process methods appropriate for such a large, complex, high-profile undertaking - much of what happened during the software-development process proved undocumented. There was a lack of adequately detailed, written descriptions of actions and decisions taken by the people involved. Instead, panel members encountered a strong tradition of passing this lore orally from person to person.

Sound familiar? From this account, it does not sound like a Level 5 outfit. Further investigation reveals that the only Level 5 organization that the SEI has information on is a Motorola site in India. It is an additional small irony that the CMM book comes with a post-release bug-fix, a piece of paper glued over the copyright page that changes the date from 1994 to 1995. It seems that source code is not the only thing that is late or needs patching.
Of course, anyone who's been in the software field will not be shocked by any of this. I'll briefly share one experience. Some years ago, a corporation I worked for spent $1 million to license a hot new graphics application from a talented startup company. The application consisted of 100,000 lines of code. The corporate managers decided to "improve" the product by assigning 80 engineers to the project. The resulting requirements, specifications, plans, and budgets created a paper stack about four feet high. After two years and $10 million in development costs, the project was scrapped because of extended delays, an inordinately high bug count and dramatically decreased performance (60 seconds between clicking the mouse on a simple object and seeing it highlighted on the screen).
Other examples abound. For a long time, the CASE tools market was led by Index Technology, which foundered when a new release of its flagship product was two years late. A developer I know who worked there confided that the code was in terminal spaghetti state, and needed a complete rewrite. Much more recently, the marketing VP of a company that sells a CASE tool priced in the low four-figures admitted that the company's developers didn't use its own tool to design and build this product.
It is likely experiences such as these that have soured many experienced programmers on software-process methods in general and on CASE tools in particular. One estimate is that 70 percent of CASE tools go unused (IEEE Software, May 1992). Eric Raymond writes in the New Hacker's Dictionary:
There is the false belief that large, innovative designs can be completely specified in advance, then painlessly magicked out of the void by the normal efforts of a team of normally talented programmers. In fact, experience has shown repeatedly that good designs arise only from evolutionary, exploratory interaction between one (or at most a handful) of exceptionally able designers and an active user population - and that the first try at a big new idea is always wrong. Unfortunately, because these truths don't fit the planning models beloved of management, they are generally ignored.

This is as pure an expression of the "hacker ethic" with regard to the development process as can be found. In this worldview, the steps of something like CMM lead not upward to mature enlightenment, but downward through descending circles of Hell.
Recent writings on the software process seem to have less of an ivory-tower view, and acknowledge a world where budgets are real and humans fallible. For lack of a better term, I refer to this as the "new realism" in software development. Next month, I'll examine this approach and how it is being pitched in a number of recently published books.
Ray Valdés is senior technical editor for Dr. Dobb's Journal and can be contacted at