I've been speculating about the future of programming and what it will look like in the next couple of decades. I invite you to respond with your own notions about where we might be headed.
Programming has gone through several shifts since its beginnings about 50 years ago. Each new programming model tried to make the task of programming easier and more disciplined so that software systems are timely, responsible, extensible, reliable, and maintainable. Noble objectives.
A timely system is available when it is needed. A responsible system supports the requirements of its user. An extensible system can grow and change when the requirements change. A reliable system rarely fails. A maintainable system can be repaired. These are the attributes that describe a system of good quality.
To maintain the social level established by fellow columnist Michael Swaine, I had a swimming pool installed a couple of years ago in my Florida winter home. (It's also my summer home, but Michael doesn't need to know that.) I asked the contractor in advance what variables could affect the schedule. There were two: weather and the remote possibility that my backyard would turn out to be an undiscovered archeological site or burial ground. I asked what variables could affect the quality of a pool. Again, there were two: the expertise of the installers and the quality of the materials. Those guys had a clear understanding of their craft. The weather cooperated, no departed Seminoles or Spanish ruins were unearthed, the installers were a family with years of experience, they used trusted materials, and a good pool was finished on time.
For years our industry has tried to achieve the same level of performance, to remove the obstacles that impair our ability to deliver quality software on time. Many so-called methodologies have been advanced. Writers sold books, lecturers hit the seminar trail, consultants ran their meters, and few, if any, of the methodologies ever worked consistently. The excuse offered by the methodology peddlers is always the same: Their clients did not adhere dogmatically to the disciplines of the methodology. Otherwise, the methodology would have worked.
For a software-development methodology to have any chance at all, it must be thoroughly understood by its practitioners, and it must have the total commitment of everyone on the team from start to end of the project. That never happens. The looming specter of a missed deadline always eliminates all those good intentions, and strict adherence to some time-consuming methodology is sacrificed. All able coders drop whatever else they are doing and get down to writing code. Those who cannot write code stand around wringing their hands.
The Psychology of Computer Programming (Van Nostrand Reinhold, 1971), written by Gerald Weinberg 25 years ago, advances the notion of "egoless programming" as the basis for overcoming the problems of human ego versus technical merit in the resolution of technical issues. Presumably your ego is emotionally wrapped around your work, and any suggestion from someone else that you modify your work is taken as an assault on your competence and intelligence. According to Weinberg, if everyone on the team regularly reviews everyone else's work, each individual is less likely to react personally to criticism.
Review begets change. Programmers resist change, not so much because they don't like interference, but because they don't want to revisit finished work. Some insecure programmers view each suggestion as a mustache painted on their Mona Lisa, but those poor souls are usually in the minority. Most programmers do not have that particular problem. Consequently, I think that although much of Weinberg's wisdom has endured the test of years, egoless programming is a naive concept upon which to base a methodology. First, where ego is a factor, you cannot eliminate it no matter how you organize the people and the work. Everyone has an ego. Second, ego is not the real problem.
Every programmer knows what a pain it will be to change his or her own work to implement someone else's suggestion. Every programmer is the sole authority on the complexities of his or her own code. That's the problem. Periodic code reviews do not change this condition.
Consider this. If I agree that your change has merit, I have just signed up for a bunch of work that might compromise my ability to complete other assignments. Only I know by how much because only I know what has to be changed. It is usually far easier to argue against the merit of your suggestion, no matter how ineffective the argument and no matter how reasonable the suggestion, than it is to implement the change. Knowing that, I naturally seek the path of least resistance and beat you down with rhetoric rather than search for the merit in your idea. You, knowing how you would have built my part of the system, cannot understand why I am resisting what should be such a trivial change. We argue endlessly from two different perspectives, neither one related to the original issue. The question is eventually solved based on the relative debating skills of the combatants rather than on the technical merits of the question.
Such debates sink to the level of ego only because there is no common technical platform, even when one should naturally exist. The original issue gets lost when the overriding issue becomes who is right. The person making the suggestion is defending his or her right to make suggestions and be respected. The person arguing against the change might be guarding what could be exposed as a nonextensible piece of work and is usually defending the schedule, too. The technical merits of the issue get lost in the fray almost from the start.
Why does this happen? It happens because programmers work alone, designing and building their components mostly out of view of the rest of the team. A feature does not get effectively examined and reviewed until a significant body of individual work has been invested in the feature.
Weinberg tries to solve that problem by making everyone's work the collective intellectual property of the team. Programmers tend to resist that idea. No one wants to expose unfinished code. Everyone wants time to smooth the bumps before peers, bosses, and the public get a look.
I've seen projects attempt to implement egoless programming after someone in charge has read Weinberg. If the development team is a hierarchy, only the code of the lower levels ever gets reviewed. Those in positions of authority always exempt their own work from the common scrutiny of their subordinates. That's not what Weinberg had in mind, but that's what has always happened.
All this chaos is supported and encouraged by what I call the "private code paradigm" in which a programmer codes in private. Not until we eliminate the private code paradigm will we begin to find solutions to the problem.
The makers of methodology often try to associate software design and implementation with other construction disciplines. The designer is an architect. The programmer is a carpenter. This analogy fails. A real architect designs a building, and, when you look at the design, it looks like a picture of a building viewed from different angles. Furthermore, the architect does not have to design known components, such as toilet tanks, breaker boxes, and down spouts, from scratch, but simply inserts those reusable components into the design where they are known to fit. What a concept.
Anyone can, prior to the construction of that building, look at the design and have a good idea of what the building will look like. Anyone can, during the construction of that building, look at the design, look at the building, and make a reasonable guess as to whether the building complies with the design. As carpenters, masons, electricians, plumbers, roofers, drywall hangers, painters, tile layers, and so on, add their individual components to the building, everyone gets a look at the collective work in progress. Everyone can tell at a glance if things are shaping up, how the schedule is looking, how the budget is holding up, if codes are being met, and so on. The designers, builders, inspectors, and users can walk through the construction, see what is happening, and make concrete comments in a language that everyone understands. Everyone's work is public.
It has been observed that the practitioners of building design and construction have simply gotten it right after centuries of experience. That is true. It has been observed that software designers and builders, with only about 50 years of experience and with a constantly moving technology, have had little opportunity to get it right. That, too, is true.
The construction industry is regulated by local governments and their building codes. Inspectors who have no vested interest in the outcome can declare the product to be unusable because it does not adhere to established standards. Bring the building "up to code" or you may not occupy it. Luckily, software builders are not regulated that way--we have no inspectors--but the freedom that we enjoy is part of the problem. Our work is private.
Several years ago, I had an abominable assignment, mercifully brief, but one that I hope never to repeat. I was the government's technical advisor on a hardware/software project being built by a contractor. There was a bunch of new hardware and a big program designed and written by one programmer. My role was like that of a building inspector. I was to observe the project in progress and tell the government if they were getting what they were paying for. The programmer, engineers, and managers who worked for the contractor hated me. During what was to be the final acceptance test, the government operators sat at the console and ran the system. The contractor programmer sat beside them. Every time an operator had a problem, the programmer explained why the procedure didn't work and what to do differently. No one else--not the engineers, not the documentation writers, not the managers, certainly not me--knew how to operate that software. I suggested that, as an experiment, they put a muzzle on the programmer and see how far the operators could get on their own. Guess how far they got? The government project manager asked me to write my opinion of the test results. I reported that the system should be accepted only if that particular programmer was part of the delivery and available to baby-sit the system 24 hours a day. Without him, in my opinion, the system was not usable.
We've got to get programmers out of the closet so that everyone can view all the work in progress all the time. Last month, I described a visual, virtual-reality, software development environment of the future in which designers and programmers meander around the design, adding components and working with the ones already in place. Taken one step further, that environment is populated by all the members of the team. As you cobble away at your part, you can look a few nodes over and see virtual renderings of your coworkers busily doing their parts. You'll be like the drywall hanger who knows not to hang anything because the insulation isn't up, the electrician isn't finished running wire through the studs, and the in-wall plumbing is incomplete.
C FILE Stream Text and Binary Mode
The Standard C Library, as defined by ANSI, provides two modes for the fopen function. You can open the file as a text file or as a binary file. If you do not specify a mode, the default is text mode.
MS-DOS programmers understand this requirement well. MS-DOS C and C++ programs translate newline characters ('\n') in memory into the CRLF ("\r\n") pair when writing to a text file. Those programs convert a CRLF pair into a single newline when reading a text file into memory. UNIX programs make no such conversion. The newline in memory is a newline in a text file. Therefore, no apparent difference exists between text and binary files that are read and written by UNIX programs. UNIX programmers are aghast when they hear that MS-DOS programs employ two different file formats. The following quote, taken from a newsgroup discussion among programmers of both platforms, is typical of the UNIX programmer's reaction when they discover text and binary modes: "It's not clear MS-DOS needs a different file format for text (they may, but it's a mistake)."
This argument is wrong on three counts. First, the two modes have nothing to do with operating systems and everything to do with compilers and hardware. Second, MS-DOS compilers do use a different format than those of UNIX. Third, it was no mistake. The difference was intentional, and when you understand why, you see that it was a reasonable solution to a prevailing problem.
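You can observe the translation (or its absence) directly with nothing beyond the Standard C Library: write a newline through a text-mode stream, then count the bytes through a binary-mode stream. This is a minimal sketch; the function name is mine, not part of any library.

```cpp
#include <cstdio>

// Write a single '\n' through a text-mode stream, then measure the
// file's size through a binary-mode stream. On an MS-DOS compiler the
// library expands '\n' to CRLF, so the answer is 2; on UNIX there is
// no translation and the answer is 1.
long newline_bytes_on_disk(const char *path)
{
    FILE *out = std::fopen(path, "w");   // "w"  = text mode (the default)
    if (out == nullptr)
        return -1;
    std::fputc('\n', out);
    std::fclose(out);

    FILE *in = std::fopen(path, "rb");   // "rb" = binary mode: no translation
    if (in == nullptr)
        return -1;
    long n = 0;
    while (std::fgetc(in) != EOF)
        ++n;
    std::fclose(in);
    std::remove(path);
    return n;
}
```

Run it on both platforms and the two file formats reveal themselves without any reference to the operating system's API.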
This is how I remember it, although some details are fuzzy. Maybe this perspective will enable those who were not involved with computers 20 years ago to understand why MS-DOS and UNIX compilers have different formats for text files and why, given the perfect wisdom of 20/20 hindsight, neither approach is inferior. I invite those of you whose memories are better than mine to correct any errors I might make in this reminiscence.
When Dennis Ritchie built the C language, he worked with a PDP-11. The PDP-11's console, a typewriter-like device called a "DECwriter," behaved, either in hardware or through its device driver, like a typewriter with respect to the Enter key. The Enter keystroke sent to the computer the line-feed character, which, when echoed to the console, moved the type ball down one line and to the left margin. Consequently, C adopted the convention wherein a single newline character, which is the line feed in ASCII character sets, signifies left margin, down one line.
Early, so-called "home" computers--the ones that predate the IBM PC--used TTY devices as consoles. Some were traditional paper-based TeleTypes, others were dumb video terminals, called "glass TeleTypes." The early terminals did not translate a newline character into a CRLF pair. Send a newline character to one of those terminals, and the cursor, type ball, or whatever simply moves down one line without returning to the left margin. Send only a carriage return, and the type ball moves to the left margin staying on the same line. It takes two characters to get the effect of what we now think of as '\n'. Many of the video terminals had an option to enable a CRLF insertion, but the programmer could not count on everyone having the same setting.
The single newline character did not work on the TeleType, either when typing or when copying a text file to the console or printer device. When CP/M, an anarchistic operating system, became the dominant OS for microcomputers, nothing concrete could be assumed about its character devices and their device drivers. Consequently, C compiler builders for those early machines built into their I/O libraries the translation of LF in memory to CRLF on output and CRLF on input to LF in memory. This convention gave natural birth to the two modes, text and binary, for file streams because, of course, such translations mutate nontextual data.
The text-mode translation was contrived in the interest of portable source code but at the expense of portable data files. No problem, because the C statement while ((c=getchar())!=EOF) putchar(c);, when compiled with a PC compiler, converts a UNIX text file into a PC text file.
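The translation that the MS-DOS stream library performs can be sketched in isolation as a pair of string transformations. The function names here are mine for illustration, not part of any library:

```cpp
#include <cstddef>
#include <string>

// What the MS-DOS stdio layer does on text-mode output:
// expand each '\n' in memory to "\r\n" on disk.
std::string lf_to_crlf(const std::string &text)
{
    std::string out;
    for (char c : text) {
        if (c == '\n')
            out += '\r';
        out += c;
    }
    return out;
}

// What it does on text-mode input:
// collapse each "\r\n" pair on disk back to a single '\n' in memory.
std::string crlf_to_lf(const std::string &text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue;                // drop the CR, keep the LF
        out += text[i];
    }
    return out;
}
```

The round trip is lossless for text, which is why the convention works--and why it mutates nontextual data, which is why binary mode exists.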
When UNIX programmers cite this two-mode contrivance as evidence of yet another weakness of MS-DOS and of the clear superiority of UNIX, they aim their shots in the wrong direction. Nothing in the MS-DOS API has anything to do with text and binary file open modes. The DOS API functions open and close files and read and write binary streams just like UNIX. Text and binary modes were invented for and implemented in language translators and file-system libraries to accommodate hardware that UNIX compilers did not anticipate. Early IBM PC C compilers perpetuated the convention presumably so that files and programs would be convertible from the then-dominant CP/M platform. Eventually, the PC's overwhelming dominance of the desktop market mandated that the text/binary convention become a part of the ANSI C Standard, and, whether we like it or not, text and binary modes are with us to stay.
In an ideal world, the solution to this problem would have been implemented in the device drivers of the character devices instead of in the file systems of the compilers. Inasmuch as CP/M depended on installers to write their own device drivers in assembly language, such an assumption could have been made except for one thing: C compilers for microcomputers came along after there was already a substantial installed CP/M base with devices and drivers that performed no such translation.
If the designers of the PC had known that their machine would have become a dominant C platform, and if they had had sufficient vision, they could have put the translation in the MS-DOS character device drivers, an obvious choice in retrospect. That's a bit of a stretch, however. No one could have accurately made that prediction. Because the PC designers did not anticipate the problem and because the PC's behavior became a de facto standard, the C compiler builders had to do something about newlines. They had no choice.
Microcomputer systems programmers of the 1970s and early 1980s could not know that an obscure cult language would take over the microcomputer programming world, bringing with it assumptions about their hardware.
Quincy 96
Quincy 96 is the current ongoing "C Programming" column project. It is a Windows 95 GUI application that serves as an integrated development environment for the Win32 port of the GNU C and C++ compilers. I am using Quincy 96 as the environment for programming exercises in a C and C++ training CD-ROM that DDJ and I are developing.
Quincy 96 is close to being completed. There are a few knots left to untangle, and I'm sure that new requirements will surface as I use it in the development of the training tutorial. Since writing last month's column, I added an expression parser to the debugger and the ability for an external program to send commands to Quincy 96.
Parsing Debug Expressions
The expression parser supports the examining and watching of variables during a debug session of a C or C++ program. During debugging you often want to look at a subscripted element in an array or a member of a class. You want to use variables for subscripts and arithmetic operators in the expression. You want to dereference pointers and references. The expression parser provides that capability. It's similar to the parser in an interpreter or compiler. In fact, I adapted the old Quincy expression parser for just this purpose. The new parser is not a complete C++ expression parser. You cannot call functions or perform floating-point math. The parser does not include assignment or logical operators. It's just a simple expression parser.
The parser needs the cooperation of the stabs section of the debugger. I described stabs last month. They are the debugging information tables that are embedded in a program compiled by GNU C or C++ with the -g option. In parsing an expression, the parser eventually will need a value from the debugged program's memory as defined by a symbol in the stabs symbol table. If the identifier turns out to be the name of a structure or class object, the parser needs to tuck it away until it finds a member operator and the identifier of a member. And more. So, it stands to reason that you cannot parse an expression unless a target program has been loaded and its stabs tables have been initialized from its .EXE file.
The parser uses a typical recursive-descent algorithm, so it takes advantage of C++ exception handling to find its way out of the descent when it finds an error in the expression. C++ exception handling does what setjmp and longjmp do in C except that in addition to unwinding the stack, a thrown exception calls destructors for any local objects that were declared along the way. C++ exception handling is a language feature rather than the function kludge that setjmp and longjmp use, so exception handling is a lot more intuitive.
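The destructor behavior is the part that setjmp and longjmp cannot give you. This is a minimal sketch of the point, not taken from Quincy's code: a local object in each level of a "descent," with a log that shows the destructors running innermost-first as the exception propagates out.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A record of which scopes were cleaned up, so the unwinding is visible.
static std::vector<std::string> g_unwound;

struct ScopeGuard {                  // a local object with a destructor
    std::string name;
    explicit ScopeGuard(std::string n) : name(std::move(n)) {}
    ~ScopeGuard() { g_unwound.push_back(name); }
};

// Deliberately fail three levels deep in a "descent."
void level3() { ScopeGuard g("level3"); throw std::runtime_error("bad token"); }
void level2() { ScopeGuard g("level2"); level3(); }
void level1() { ScopeGuard g("level1"); level2(); }

// Returns the order in which destructors ran while the exception
// propagated out of the descent: innermost scope first.
std::vector<std::string> unwind_demo()
{
    g_unwound.clear();
    try {
        level1();
    } catch (const std::runtime_error &) {
        // one catch at the top replaces the setjmp/longjmp pair
    }
    return g_unwound;
}
```

With longjmp, those three locals would simply be abandoned on the stack; with a throw, each one gets its destructor called on the way out.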
Expression parsing is a two-step process. First comes the lexical scan, which translates the expression into tokens and eliminates unnecessary white space. Then comes the evaluation, which scans the tokens left to right and evaluates them one at a time. Operator precedence and associativity are managed by the algorithm. I discussed these subjects several times over the years in this column when I published interpreters, scripts, and query languages. Rather than revisit them, I'll refer you to the Dr. Dobb's/CD Release 3, which has the text and code from January 1988 to June of 1995 and a very fast search engine. If you want to look at Quincy 96's expression parser, download the code and open the parser.h and parser.cpp files.
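To illustrate the two steps without revisiting the old columns, here is a toy sketch of my own--emphatically not Quincy 96's parser--that follows the same shape: a lexical scan into tokens, then a left-to-right evaluation in which the grammar itself supplies operator precedence.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Step 1: the lexical scan. Translate the expression into tokens and
// discard white space. 'n' marks a number token; '+' and '*' mark operators.
struct Token { char kind; long value; };

std::vector<Token> scan(const std::string &expr)
{
    std::vector<Token> tokens;
    for (std::size_t i = 0; i < expr.size(); ) {
        unsigned char c = static_cast<unsigned char>(expr[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            long v = 0;
            while (i < expr.size() &&
                   std::isdigit(static_cast<unsigned char>(expr[i])))
                v = v * 10 + (expr[i++] - '0');
            tokens.push_back({'n', v});
        } else if (expr[i] == '+' || expr[i] == '*') {
            tokens.push_back({expr[i++], 0});
        } else {
            throw std::runtime_error("bad character in expression");
        }
    }
    return tokens;
}

// Step 2: the evaluation. Scan the tokens left to right; putting term()
// above factor() in the descent gives '*' its higher precedence.
struct Evaluator {
    const std::vector<Token> &t;
    std::size_t pos;

    long factor()
    {
        if (pos >= t.size() || t[pos].kind != 'n')
            throw std::runtime_error("number expected");
        return t[pos++].value;
    }
    long term()                      // handles '*'
    {
        long v = factor();
        while (pos < t.size() && t[pos].kind == '*')
            { ++pos; v *= factor(); }
        return v;
    }
    long expression()                // handles '+'
    {
        long v = term();
        while (pos < t.size() && t[pos].kind == '+')
            { ++pos; v += term(); }
        return v;
    }
};

long eval(const std::string &expr)
{
    std::vector<Token> tokens = scan(expr);
    Evaluator e{tokens, 0};
    long v = e.expression();
    if (e.pos != tokens.size())
        throw std::runtime_error("trailing tokens");
    return v;
}
```

A malformed expression escapes the descent by throwing, exactly the error-handling shape described above.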
The Tutorial and the IDE

The interactive tutorial, which I am building using Asymetrix ToolBook Multimedia, must be able to launch Quincy 96 with the source files loaded for a specific exercise. The tutorial must also be able to specify the variables to watch and the breakpoints for the exercise. I decided to use the format of the Windows API's private profile variables to define each of the exercises. An .INI file for each exercise has text settings that specify everything the tutorial needs to run the exercise automatically for the student. By overriding the CWinApp::OnDDECommand function and intercepting opens of .INI files, the program can load the exercise's source-code files and set the watches and breakpoints.
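The column does not list the actual key names Quincy 96 uses, so the settings below are hypothetical, but an exercise profile in private-profile format might look something like this:

```ini
; Hypothetical exercise profile -- the real key names used by
; Quincy 96 are not given in the column.
[Exercise]
Source1=HELLO.C
Source2=HELLO.H
Watch1=count
Watch2=total
Breakpoint1=HELLO.C,27
```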
The tutorial also needs to send Step, Run, Step Over, and Stop commands to Quincy 96's debugger. I decided to use DDE commands for these operations as well. Then I discovered a neat hack for testing. Long before the interactive tutorial is ready, I can test every exercise. I used the Windows 95 registry to associate files with the .Qcmd extension with the Quincy 96 executable program. I put dummy-text files on the desktop with the names Step.Qcmd, Run.Qcmd, and so on. Dragging and dropping an exercise's .INI file onto the Quincy 96 icon starts Quincy 96 with the associated exercise loaded and ready to go. Double clicking the .Qcmd file icons sends DDE commands to Quincy 96 to step through, run, and stop the program as if the user had clicked the buttons or the tutorial had sent the DDE commands.
As a final aid to development, I put a tool button onto Quincy 96's toolbar. It's off to the right and labeled TUT. It won't be in the released version on the CD-ROM, but I'll leave it intact on the download version so that you can play with it. I can manually set up a tutorial exercise in the Quincy 96 IDE with source-code files, breakpoints, and watch variables. When I click that button, the program generates an .INI file that appropriately describes the exercise.
The Color Bar Cursor
Last month I reported that I had not figured out how to display a color cursor bar for the program counter during a source-level debugging session. Now I know how, and I also know why Visual C++ and other Windows-hosted debuggers use a token in the margin for that purpose instead of a cursor bar.
Example 1 is the code that you put into a member function of a class derived from the MFC CEditView class. In this example, "newfont" is the name of a CFont object representing the font that the text editor is using. The nLine and nWidth values are the line number and the width in characters of the text to be displayed. The lpText pointer points to the text of the line, a value that you get by using the technique that I described in the February column.
This technique works fine, except in one case: If the user does any horizontal scrolling, the cursor bar still displays text from the left margin. There is no adjustment in display units that I know of for the CDC::TextOut function. It uses character units. I must decide if this behavior is acceptable for Quincy 96 or find another way to define a cursor bar.
Source Code
The source-code files for the Quincy 96 project are free. You can download them from the DDJ forum on CompuServe and on the Internet by anonymous ftp; see "Availability," page 3. To run Quincy, you'll need the GNU Win32 executables from the Cygnus port. They can be found at ftp.cygnus.com/pub/sac. Get Quincy 96 first and check its README file to see which version of gnu-win32 you need. Every time they release a new beta, I have to make significant changes to Quincy 96. As I write this, the latest beta is Version 12 and Quincy 96 build 1 works with Version 10.
If you cannot get to one of the online sources, send a 3.5-inch high-density diskette and an addressed, stamped mailer to me at Dr. Dobb's Journal, 411 Borel Avenue, San Mateo, CA 94402, and I'll send you the Quincy source code (not the GNU stuff, however--it's too big). Make sure that you include a note that says which project you want. The code is free, but if you care to support my Careware charity, include a dollar for the Brevard County Food Bank.
Example 1: Displaying a Color Cursor Bar in a CEditView Control.
HideCaret();                                    // keep the caret out of the repaint
CDC* pCDC = GetDC();
CFont* pOldFont = pCDC->SelectObject(&newfont); // select the editor's display font
pCDC->SetBkColor(0x00ff00);                     // COLORREF 0x00bbggrr: bright green
pCDC->TextOut(0, nLine, lpText, nWidth);        // redraw the line on the colored background
pCDC->SelectObject(pOldFont);                   // restore the DC before releasing it
ReleaseDC(pCDC);
ShowCaret();