Columns


Illustrated C

Processing Code Listings For Publication, Part 2

Leor Zolman


A long time ago, Leor Zolman wrote and distributed the BDS C Compiler for CP/M (what's that?). Following a several-year hiatus from computer-compulsiveness to learn some people skills, he got married, dragged his disbelieving wife to Kansas and joined the staff of R&D Publications, Inc. Two years later his wife has almost forgiven him. You can reach him at leor@rdpub.com or uunet!bdsoft!rdpub!leor.

Last time, I introduced plist, a program to translate tabs into spaces and right-justify comments in C source listings. While plist effectively reduces the column width required to display a listing, it is actually the second of two programs I typically employ to fit a listing more comfortably into one of the narrower columnar magazine formats. This month I'll describe plist's companion program, maxl, a tool that finds the longest lines in any text file. With the longest lines of a file identified, you are in the best position to decide how to achieve the largest reduction in code girth with the least amount of editing.

The Strategy

I started with the premise that input lines will remain below a reasonable maximum length, 300 characters, then tabulated the lengths of all lines in a file by using the value of each line's length as an index into an array of counters.

After scanning an entire input file, the information I wanted included the complete text of the longest line(s), the location (line number) of those lines, and a frequency count for each of the largest line lengths encountered.

The fact that several pieces of information might need to be saved relating to each input line suggested the use of an array of structures (rather than an array of simple counter variables) for storing the required information. I've assigned a tag name of line to the structure definition and defined an array named lines to contain MAXLINE instances of this structure type (lines 34-38 of Listing 1). Each element of the lines array therefore holds information pertaining to lines of a particular length (equal to the array element's index). This arrangement will support lines ranging anywhere from zero to 298 characters in length. Note that two character positions are lost due to the need for both a terminating newline and a null byte at the end of each line read in through the use of the fgets() function.
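
To see where the 298-character limit comes from, here is a small sketch of reading one line with fgets() into a MAXLINE-byte buffer. The helper name read_line and its return convention are assumptions made for illustration; they are not taken from Listing 1.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 300   /* buffer size; the value 300 comes from the text */

/* Read one line into buf (which must hold MAXLINE bytes) and store
   the number of characters before the newline in *len.
   Returns 1 for a complete line, 0 if no newline was read (the line
   was too long, or the final line of the file lacks one), and -1 at
   end of file. */
int read_line(FILE *fp, char *buf, int *len)
{
    size_t n;

    if (fgets(buf, MAXLINE, fp) == NULL)
        return -1;

    n = strlen(buf);
    if (n > 0 && buf[n - 1] == '\n') {
        *len = (int)(n - 1);   /* newline and null byte use two slots */
        return 1;
    }
    *len = (int)n;             /* buffer filled before a newline arrived */
    return 0;
}
```

Since fgets() stores at most MAXLINE - 1 characters and then appends the null byte, a line of 298 characters plus its newline is the longest one that fits in a single call.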

The Variables

The line structure contains three elements:

Unlike the plist program, maxl can operate on only one file at a time. maxl is designed for repeated use upon a single file, until that file is narrow enough for further processing with plist.

maxl takes three optional command line parameters:

For example, you can write

maxl maxl.c
or its equivalent, with the default option values written out,

maxl -t4 -d3 -f79 maxl.c
The result is shown in Figure 1. Note that maxl.c refers in this example to the original C source file, not the listing file created by plist shown in Listing 1. Therefore, the line of length 70 that shows up in the maxl output does not actually affect the width of the final printed listing, because the comment-alignment feature of plist removes enough of the whitespace between the code and the comment to keep the line from standing out.

Scanning A File

After processing command-line parameters and opening the input file, maxl initializes the how_many counter for each index position in the lines array (lines 95-96). This isn't strictly necessary, since external data is automatically initialized to zero, but it can't hurt to make sure. In the last preparatory step, the line number counter lineno is initialized to 1.

Lines 99-141 comprise the main processing loop. Each time through, the program reads a new line of input text into the buf character buffer. It calculates the length of that line by keeping track of a "virtual" column number in the local variable v_col, adjusting for hard tabs and carriage returns as the text is scanned from left to right in lines 104-119. When a tab character is found, v_col is incremented as necessary to simulate an advance to the next logical tab position according to the current logical hard-tab setting, tabstop.

Either a newline or a null byte terminates the line scanning loop. Carriage returns are not usually seen in the middle of normal lines of text. Under DOS, the fgets function returns a line of text where the CR of CR-LF pairs has been stripped out, leaving only the LF, otherwise known as a newline. Under UNIX-like systems, CRs are not part of the normal text file line-termination sequence anyway. So, maxl displays a warning (line 113) when a CR is encountered, and sets the virtual column number back to 1.

All other characters simply cause v_col to be incremented to the next column (line 118).
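
The scan just described can be sketched roughly as follows. This is not the code from Listing 1: the function name virtual_length is hypothetical, the CR warning is left out, and columns are counted from 0 here as a simplifying assumption (the listing itself may count from 1).

```c
#include <stddef.h>

/* Return the displayed width of one line of text, expanding hard tabs
   to the next multiple of tabstop. Scanning stops at a newline or a
   null byte. Columns are counted from 0 in this sketch (an assumption). */
int virtual_length(const char *buf, int tabstop)
{
    int v_col = 0;
    size_t i;

    if (tabstop < 1)
        tabstop = 1;                    /* guard against division by zero */

    for (i = 0; buf[i] != '\0' && buf[i] != '\n'; i++) {
        if (buf[i] == '\t')
            v_col += tabstop - (v_col % tabstop);  /* jump to next tab stop */
        else if (buf[i] == '\r')
            v_col = 0;                  /* carriage return: back to left margin */
        else
            v_col++;                    /* ordinary character: one column */
    }
    return v_col;
}
```

With a tab setting of 4, the text "ab", a tab, and "c" occupies five columns: the tab advances from column 2 to column 4, and the c fills one more.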

After the line has been fully scanned, maxl tests whether the line exceeds the maximum allowed length. If it does, a warning is issued (lines 121-124).

Now we have all the information necessary to characterize this line in the data structures. The value of v_col provides the index into the lines array of the length structure we are interested in. Therefore, the expression

lines[v_col]
evaluates to the structure containing information about all lines of length v_col. The member how_many of that structure tells how many lines of that length have been previously encountered. The expression in line 126 tests whether this is the first line of that length: it yields 1 (true) if how_many is 0, meaning no earlier line of that length has been found. how_many is then incremented, regardless of its previous value.

If this is the first instance of a line of length v_col, then the statements in lines 128-136 save the line number and complete text of the line in the structure lines[v_col].
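
As a rough sketch of this bookkeeping, assuming a structure layout along the lines described earlier: only the tag line, the array lines, the member how_many, and MAXLINE are named in the text. The members lineno and text, and the helper record_line, are hypothetical names chosen for this example.

```c
#include <string.h>

#define MAXLINE 300

struct line {
    int  how_many;        /* number of lines seen with this length */
    int  lineno;          /* assumed member: number of the first such line */
    char text[MAXLINE];   /* assumed member: text of the first such line */
};

struct line lines[MAXLINE];   /* index == line length; starts out zeroed */

/* Count a line of length v_col. The line number and text are saved
   only for the first line of each length.
   Returns 1 if this was the first line of that length, else 0. */
int record_line(int v_col, int lineno, const char *buf)
{
    struct line *lp;

    if (v_col < 0 || v_col >= MAXLINE)
        return 0;                      /* out of range: nothing recorded */

    lp = &lines[v_col];
    if (!lp->how_many++) {             /* true only when the count was 0 */
        lp->lineno = lineno;
        strncpy(lp->text, buf, MAXLINE - 1);
        lp->text[MAXLINE - 1] = '\0';
        return 1;
    }
    return 0;
}
```

The test and the increment share one expression, so the counter goes up on every call while the save happens only once per length.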

And The Winners Are...

After all input text has been processed, lines 142-154 display the results. The loop iterates through the structures in the lines array, from last (largest index value) backward. Any entry with a zero how_many value is skipped over completely (lines 145-146). Every time a non-zero value of how_many is found, all required information is displayed and the counter j is incremented. Once j reaches the value of diffsizes (the number of different sizes to display), we're done.
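
The backward walk can be sketched like this. To keep the example short, it works on a plain array of counts rather than the lines array of structures, and it collects the winning lengths instead of printing them. The function name longest_lengths and its parameters are assumptions for illustration.

```c
/* Walk counts[0..n-1] from the largest index downward and store up to
   diffsizes indexes whose count is non-zero into found[].
   Returns the number of indexes stored. */
int longest_lengths(const int counts[], int n, int diffsizes, int found[])
{
    int i;
    int j = 0;

    for (i = n - 1; i >= 0 && j < diffsizes; i--) {
        if (counts[i] == 0)
            continue;          /* no lines of this length: skip */
        found[j++] = i;        /* one more distinct length reported */
    }
    return j;
}
```

Starting from the top of the array means the first non-zero entry found is always the longest line length in the file, so the loop can stop as soon as enough distinct lengths have been reported.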

The putline() function displays a line of text, performing tab expansion in the process. putline() calls putrule() before and after the line is displayed, to provide a visual reference for the exact physical length of the line. putrule() displays a line made up of exactly the supplied number of = characters. That way, if the trailing characters of a given line are spaces or tabs, the rule line will extend beyond the apparent end of the text line and thereby reveal the superfluous trailing spaces.
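
Here is a minimal sketch of building such a rule line. The putrule() in the listing prints its output directly; this hypothetical variant, format_rule, fills a caller-supplied buffer instead so that the result can be inspected.

```c
#include <string.h>

/* Fill dst with exactly width '=' characters followed by a null byte.
   dst must have room for width + 1 bytes. Returns dst. */
char *format_rule(char *dst, int width)
{
    if (width < 0)
        width = 0;
    memset(dst, '=', (size_t)width);
    dst[width] = '\0';
    return dst;
}
```

For a line such as "x = 1;" followed by two trailing spaces, the width is 8, so the rule is two characters longer than the visible text.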

Errata

After publication of August's column describing the plist program, I realized there were some cases when the behavior of plist might not yield exactly the expected results (i.e., bugs). I'd like to mention them now just for closure.

First, plist assumes that all tab characters appearing on a line of text always precede the text. In most of my C source files that is in fact the case, but other styles of C programming involve more liberal use of tabs. The most obvious case involves writing a series of symbolic definitions where the symbols being defined have greatly varying lengths. Then, tabs are often used to align the text to be substituted. Take the following code fragment for example:

#define FOO                 100
#define A_LONG_NAME_INDEED  50
#define X                   25
If tabs were used after FOO and after X, then tab translation might alter the vertical alignment of the values in the right "column." Since I could find no easy (i.e., non-kludgy) fix for this problem, the only way to preserve alignment in these cases is to write the lines without tabs after the identifier name.

Second, plist as published issues a warning message when the length of a line containing a comment exceeds the line length specified in the -c option (the right-most column for comment justification), but ignores lines without comments that exceed that specified length. This, at least, has an easy fix:

Find the following section in August's listing of plist.c:

235:  if (!(cmnt_start =
       strstr(line, "/*")) ||
236:     !(cmnt_end =
        strstr(line, "*/")))
237:   return;
and change it to read as follows:

if (!(cmnt_start =
        strstr(line, "/*")) ||
   !(cmnt_end = strstr(line, "*/")))
{     /* If no complete comment,
        just check length: */
   if (strlen(line) > cmnt_col)
      fprintf(stderr,
   "\aWarning: line %d too long.\n",
              lineno);
   return;
}
This concludes a short series on processing C program listings for publication. With these tools at hand, I'm hoping contributors to technical journals such as CUJ, and the editorial personnel of those journals, may get a bit of relief from the "fat C code" syndrome.