August 1991/Illustrated C

Illustrated C

Processing Code Listings For Publication, Part 1

Leor Zolman

Leor Zolman bought his first microcomputer (an IMSAI 8080) while in high school in L.A., carried it to M.I.T., withdrew, and wrote the BDS C compiler with it in assembly language. That was enough assembly language hacking to last a lifetime, so now he enjoys UNIX/Xenix system administration, article writing, and raising his newborn daughter Katelyn. You can reach him at leor@rdpub.com or uunet!bdsoft!rdpub!leor.
If you leaf through this or most any other magazine, you may notice how the material is organized into columns. While advertising material and editorial narrative tend to make effective use of their allotted number of columns, long program listings are usually a different story.
The trouble is something inherent to the nature of structured programming: liberal use of indentation is standard, reasonable practice for maintaining readability. But once you've indented enough tab stops to the right there may not be much space remaining for code — especially if the entire width of the listing needs to fit in a space narrower than one full page.
Then there is the whole, messy issue of comments. Unless you take advantage of the C++ // comment delimiter, it costs at least six columns of overhead just to delimit a complete comment on a single line of code ("/* " and " */", shown with a space character at each end to keep the comment text from bumping right up to each delimiter).
Two traditional solutions to this excessive-code-girth problem are
a) photographically reducing code listings
b) asking authors to re-write listings to reduce maximum line length
The first approach tends to produce printed listings that are difficult both to read and to scan. In this and next month's columns, I'll present a pair of C programs that together provide a reasonably effective way to automate the second approach.

Those Pesky Tabs
Most program editing today is probably performed using fairly intelligent text editors, such as EMACS/Epsilon, Brief, etc. One feature common to such editors is the ability to specify a custom "hard tab" value: When you insert a tab character, the cursor moves over to the next column of the display that is a multiple of n, where you have specified n as a global setting. The usual default for this tab setting is eight. With this default in 80 columns, it doesn't take many nested loops to cause C programs to begin storming the right margin. To make matters worse, the space normally available for code listings tends to run even less than that — actually around 65 character columns.
The varying interpretation of hard tabs is one of the most confusing factors in processing code listings. While an author may have written his code with a tab setting of four to conserve horizontal space, the program used by a magazine publisher to print out listings for publication may assume that a hard tab setting of eight was used in the preparation of the text. The result is an incorrectly formatted listing.
To deal with this possible conflict, an author may choose to use only 8-column hard tabs in listings to be published, leading to the line-length problem. Or, he can provide listings with all tabs converted to spaces. While this eliminates the possibility of misalignment, it also makes editing of the listing more difficult and time consuming.
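To make the tab-stop arithmetic concrete, here is a minimal sketch (not taken from plist.c; the helper names next_tab_stop and displayed_width are hypothetical) that computes how wide a line appears under a given hard tab setting. Columns are counted from zero.

```c
/* Column reached when a tab is typed at zero-based column col,
   with tab stops every tab columns. */
int next_tab_stop(int col, int tab)
{
    return col + (tab - col % tab);
}

/* Displayed width of the line s when its tabs are expanded
   using the given hard tab setting. */
int displayed_width(const char *s, int tab)
{
    int col = 0;

    for (; *s != '\0' && *s != '\n'; s++)
        col = (*s == '\t') ? next_tab_stop(col, tab) : col + 1;
    return col;
}
```

A statement indented by three tabs, such as "x = 1;", occupies 30 columns with a tab setting of eight but only 18 with a setting of four, which is why the same file can look fine to its author and overflow the margin at the printer.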

Line Numbers
I prefer to include line numbers in listings appearing with my articles because it makes referring to specific lines of code so much easier. The line numbers themselves, however, take up several valuable columns of space. If we know how many lines a listing contains, we can at least limit the number of columns reserved for line numbers to the minimum necessary.
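The minimum field width is simply the number of decimal digits in the last line number. A small sketch, assuming the line count is already known (num_field_width is a hypothetical helper, not part of plist.c):

```c
/* Smallest field width that can hold every line number
   from 1 through line_count. */
int num_field_width(long line_count)
{
    int width = 1;

    while (line_count >= 10) {
        line_count /= 10;
        width++;
    }
    return width;
}
```

A listing of 313 lines, for example, needs a three-column field; one of 99 lines needs only two.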

The plist and maxl Utilities
The programs I shall describe in this short series of columns had their genesis in a trivial program I wrote many years ago to attach line numbers to a text file. That original program grew into plist.c as presented this month.
The three main jobs that plist tackles are:
1. generation of line numbers (with arbitrary field width)
2. translation of tabs into spaces, with automatic translation from one hard tab setting to another
3. alignment of single-line code comments along an arbitrary specified right margin, with automatic preservation of vertical comment alignment (via adjustment for the hard-tab translation performed by job 2)
Line number generation is trivial. A numbering sequence is prepended onto each line of output.
Tab translation has the greatest potential for actually reducing code width. It works by de-tabbing the original code (converting tabs to spaces) using a hard tab value perhaps different from the one originally in effect when the code was written. Setting aside the issue of comment alignment for the time being, it happens that tabs mostly appear at the beginning of code lines. Because leading tabs only set the indentation depth, changing the hard tab value narrows the indentation without affecting the structure of the code.
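Plain de-tabbing, ignoring comments entirely, can be sketched as follows. This is an illustration under simple assumptions (a single line per call, a caller-supplied output buffer) and not the code from Listing 1; the name detab is hypothetical.

```c
#include <stddef.h>

/* Copy src to dst, replacing each tab with enough spaces to reach
   the next multiple of tab. Output is truncated to fit dstsize.
   Returns the length of the resulting string. */
size_t detab(const char *src, char *dst, size_t dstsize, int tab)
{
    size_t col = 0;

    if (dstsize == 0)
        return 0;
    for (; *src != '\0' && col + 1 < dstsize; src++) {
        if (*src == '\t') {
            do {
                dst[col++] = ' ';
            } while (col % (size_t)tab != 0 && col + 1 < dstsize);
        } else {
            dst[col++] = *src;
        }
    }
    dst[col] = '\0';
    return col;
}
```

Running the same two-tab-indented line through detab with a setting of eight yields 16 leading spaces; with a setting of two it yields four, and the nesting depth is still obvious.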
The tricky part is dealing with comments. A listing without comments that has tab translation performed ends up being pretty much just a "squashed" version of the original listing. When comments are present, however, all bets are off. This is because the author of a C program will often meticulously format comments according to some personal aesthetic standard. Typically, tabs are used to align comments in consecutive lines of code neatly beneath each other whenever possible. (OK, not everyone does it, but the code is sure more pleasant to read when they do!)
Once such explicitly-formatted code is put through tab translation, however, the number of tabs required in various places to reach a common column is altered, and neatly aligned comments turn into a random mess. More on this shortly.
The final job plist performs is to align in-line comments along the right margin. An arbitrary right-margin column number may be specified. Since a major goal is to reduce the overall listing width, the right-margin value should be the smallest that allows the existing comments to fit in the available space without overlapping program code. plist will add or remove white space as necessary between code and comments in order to align comments on the right. Any lines containing comments that would not fit in the available space are flagged with an error message.
What about aligning the left end of comment lines? If the author painstakingly created perfectly aligned comments, this structure should certainly be preserved even following tab translation. The algorithm I use in plist to achieve this preservation requires that comments be aligned in the original listings in order to remain aligned through the process. plist will not produce perfectly aligned comments from raggedly aligned input (except occasionally, for odd sequences of lines by random luck).
plist does the trick by keeping track of both original and translated column numbers while processing a line of text. When an inline comment is encountered, tabs are converted to spaces according to the original hard tab setting, not the translated one. Thus, the exact length of comments is preserved regardless of their horizontal placement in the output file, and comments that were aligned originally remain that way in the processed listing. Note that all this happens only when the -c# option is used to enable right-hand alignment for comments, and the -tm,n option is used to specify both the original (n) and new (m) hard tab settings.

Command Line Options And Compilation
Line numbering is activated with the -n[#] option. By default, line numbers are generated using a field width of NUM_WIDTH columns. This default may be overridden by giving an explicit field-width value of #. A colon and single space character always separate the line number from the start of text lines.
To line up comments along the right margin, use the -c# option, where # is the desired column width of the output text. This value must include provision for columns occupied by line numbers (if -n is used). The -t option controls tab-to-space conversion. Both the original and new hard tab settings default to the value TAB_SETTING. The -t option may be used to specify both the old and new hard tab setting (separated by a comma, new first), or only the new tab setting. If comment alignment is not required, then the old tab setting may be omitted.
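The two forms of the -t argument could be parsed along these lines. This is a sketch only; parse_tab_option is a hypothetical name, and the caller is assumed to have preset both values to the TAB_SETTING default before calling.

```c
#include <stdlib.h>

/* Parse the text following "-t": either "m" (new setting only) or
   "m,n" (new setting first, then the original setting).
   Leaves *old_tab untouched when n is omitted.
   Returns 0 on success, -1 on a malformed argument. */
int parse_tab_option(const char *arg, int *new_tab, int *old_tab)
{
    char *end;
    long m, n;

    m = strtol(arg, &end, 10);
    if (end == arg || m <= 0)
        return -1;
    if (*end == '\0') {
        *new_tab = (int)m;
        return 0;
    }
    if (*end != ',')
        return -1;

    arg = end + 1;
    n = strtol(arg, &end, 10);
    if (end == arg || *end != '\0' || n <= 0)
        return -1;

    *new_tab = (int)m;
    *old_tab = (int)n;
    return 0;
}
```

Given "2,8" the sketch sets the new tab size to two and the old to eight; given "4" it changes only the new size.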
plist takes its input from each file named on the command line and writes the processed listing to the standard output by default. If the -o option is used, then the output for each file processed is written to a new file having the same base name as the input file and the extension .lst.
In the continuing spirit of portability, both plist.c and maxl.c are generalized for compilation under either DOS or Xenix. Because I could find neither the strstr function nor any equivalent function in the Xenix library, I wrote one myself and include it for the convenience of Xenix users (lines 282-313 of Listing 1).
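For readers without Listing 1 at hand, a substring search with the same contract as the standard strstr can be written in a few lines. The version below is an independent sketch, not the function from the listing, and is named my_strstr (a hypothetical name) to avoid clashing with a library that does supply strstr.

```c
#include <stddef.h>

/* Return a pointer to the first occurrence of needle within
   haystack, or NULL if there is none. An empty needle matches
   at the start of haystack. */
char *my_strstr(const char *haystack, const char *needle)
{
    const char *h, *n;

    if (*needle == '\0')
        return (char *)haystack;
    for (; *haystack != '\0'; haystack++) {
        for (h = haystack, n = needle; *n != '\0' && *h == *n; h++, n++)
            ;
        if (*n == '\0')
            return (char *)haystack;
    }
    return NULL;
}
```

A program that right-justifies comments needs exactly this kind of search to locate the "/*" and "*/" tokens within a line.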

Sample Input And Output
A minimal, contrived C source listing is shown in Listing 2. The source file, named sample.c, was originally prepared using Epsilon with a "standard" 8-column hard tab setting. When displayed on the console, the file appears as shown in Listing 2. To actually generate this listing, plist was run on sample.c as follows:
plist -t8,8 sample.c >listing.2
-t8,8 tells plist to preserve the original eight-column tab style in the translated (tabless) output. Since the hard tab setting was not changed, comments remain naturally aligned as in the original source file.
To illustrate the effect of simple tab translation on comment alignment, Listing 3 was produced by the command:
plist -t4,8 sample.c >listing.3
As the output shows, reducing tab size shortens the listing and preserves code indentation, but the comments become misaligned relative to the original source file if tabs were used around or within those comments.
For the final example, Listing 4 was created via:
plist -n2 -t2,8 -c60 sample.c >listing.4
The option -n2 enables line-number generation in a two-column numeric field. -c60 tells plist to right-justify comments at column 60 (with automatic vertical alignment). By specifying a new tab size of two and the old size of eight via -t2,8, I enabled plist to preserve the exact comment alignment of the original source code while recovering nearly 75 percent of the space originally consumed by tab indentation.
By the way, Listing 1 (plist.c) was produced with:
plist -t2,4 -n3 -c65 plist.c >listing.1

The main Program
Lines 65-97 of the program in Listing 1 process command-line options. Except for the detail of the individual case statements, this is "stock" code I copy into any new application I write that needs the ability to process options. It assumes options:
a) are always named by a single case-insensitive character
b) are preceded by a dash
c) contain no white space
d) can all be processed before mainline program execution commences.
After each option is processed, the code in lines 93-96 removes it from the argc/argv parameter list. By the time the general usage check is reached (lines 99-101), the only remaining command-line arguments are filenames or similar non-optional parameters not prefixed by dashes.
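The general shape of such an option loop, including the step that removes each processed option from the argument list, might look like the sketch below. It handles only two illustrative options, and the structure plist_opts and function strip_options are hypothetical names rather than identifiers from Listing 1.

```c
#include <ctype.h>
#include <stdlib.h>

struct plist_opts {          /* hypothetical option record */
    int numbering;           /* nonzero if -n was given */
    int numwidth;            /* field width from -n# */
    int cmnt_col;            /* right margin from -c# */
};

/* Process every dash-prefixed argument, deleting each one from
   argv as it is handled. Returns the new argc, or -1 when an
   unknown option is seen. */
int strip_options(int argc, char **argv, struct plist_opts *opt)
{
    int i = 1, j;

    while (i < argc) {
        if (argv[i][0] != '-') {     /* not an option: keep it */
            i++;
            continue;
        }
        switch (tolower((unsigned char)argv[i][1])) {
        case 'n':
            opt->numbering = 1;
            if (argv[i][2] != '\0')
                opt->numwidth = atoi(&argv[i][2]);
            break;
        case 'c':
            opt->cmnt_col = atoi(&argv[i][2]);
            break;
        default:
            return -1;
        }
        for (j = i; j < argc - 1; j++)   /* close the gap */
            argv[j] = argv[j + 1];
        argc--;
    }
    return argc;
}
```

After the call, argv[1] through argv[argc-1] hold only the filenames, so the rest of the program never has to look at a dash again.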
If the -n option was used to generate line numbers, then line 104 initializes the string fmt to the format specification that will be used for creating line numbers. This statement is a bit tricky, since it includes a format specification being used to create another format specification. The first two % characters generate a single % in the output, the %d sequence generates the value numwidth, and the final portion of the string, "d: ", carries over literally. If run with a numwidth value of, say, three, then fmt ends up containing the text "%3d: " (minus the quotes, of course).
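The format-within-a-format trick is easier to see in isolation. The sketch below follows the description above; the wrapper function make_num_format is a hypothetical name added for illustration.

```c
#include <stdio.h>

/* Build the printf format used for line numbers. "%%" yields a
   literal '%', "%d" is replaced by numwidth, and "d: " is copied
   through unchanged. fmt must have room for the result. */
void make_num_format(char *fmt, int numwidth)
{
    sprintf(fmt, "%%%dd: ", numwidth);
}
```

With numwidth equal to three, fmt becomes "%3d: ", and printing line number 7 with that format produces the number right-justified in three columns, followed by a colon and a space.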
Lines 106-107 call dofile for each input file named on the command line.

The dofile Function
All file input and output housekeeping is performed by the function dofile. Before any lines are processed, the variable fpo is initialized to the output-stream handle. If the -o option was given, then a .lst file is created for each input file. Otherwise, the standard output stream receives the processed text.
The loop in lines 143-145 calls do_line once for each line of input text, and lines 147-149 clean up after the entire file has been processed.
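Deriving the output filename for the -o case is a small exercise in string handling. One way to do it, offered as a sketch with a hypothetical helper make_lst_name rather than as the code in dofile:

```c
#include <stddef.h>
#include <string.h>

/* Build the output name for -o: the input name with its extension
   (if any) replaced by ".lst". Returns 0 on success, or -1 if the
   result would not fit in out. */
int make_lst_name(const char *in, char *out, size_t outsize)
{
    const char *dot = strrchr(in, '.');
    size_t base_len;

    /* A dot inside a directory component is not an extension. */
    if (dot != NULL &&
        (strchr(dot, '/') != NULL || strchr(dot, '\\') != NULL))
        dot = NULL;

    base_len = (dot != NULL) ? (size_t)(dot - in) : strlen(in);
    if (base_len + sizeof ".lst" > outsize)
        return -1;

    memcpy(out, in, base_len);
    strcpy(out + base_len, ".lst");
    return 0;
}
```

The sketch accepts both / and \ as path separators so that the same source serves under Xenix and DOS.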

Processing The Lines
The code to process each line of text is broken up into two passes, performed by the functions pass1 and pass2. do_line ties them together and generates the leading line numbers (if needed).
Tab translation on a single line of input text is performed by the pass1 function. As a text line is processed from left to right, the variable in_cmnt records whether an open comment token (/*) has been encountered yet. (I've ignored close comment tokens, under the assumption that executable code always fully precedes comments on lines where both are present.)
The variables col and old_col track the current column number from the perspectives of the new and old hard tab settings, respectively. When a tab character is encountered in the input line, that tab is translated into spaces in one of two manners:

If in_cmnt is true (we're in a comment), then the old hard tab value (old_tab) is used to control the number of spaces that are generated (lines 208-210).

If in_cmnt is false (lines 199-206), then two things happen. First, the tab is translated as per the new hard tab setting (tabstop) in lines 200-202. Then, in lines 203-205, old_col is updated to represent the new column position as per the old tab setting. This step is the key to maintaining comment alignment. By knowing the precise column at which the comment began in the original text, plist can reproduce that comment in the output listing with the same total column width as it had before.
When combined with right-margin justification, these features ensure that any input containing vertically aligned comments produces an output listing with those comments still aligned.
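The two-counter technique described above can be sketched as follows. This is a simplified reconstruction from the description, not the pass1 function from Listing 1; the function name translate_line and the helper put_char are hypothetical, while col, old_col, in_cmnt and old_tab follow the names used in the text.

```c
#include <stddef.h>

/* Append c to out if room remains, always leaving space for
   the terminating '\0'. */
static void put_char(char *out, size_t outsize, size_t *len, char c)
{
    if (*len + 1 < outsize)
        out[(*len)++] = c;
}

/* Expand the tabs in one line. Tabs in code use new_tab; tabs
   inside a comment use old_tab. Returns the width the line had
   under the original tab setting. */
int translate_line(const char *in, char *out, size_t outsize,
                   int new_tab, int old_tab)
{
    int col = 0;       /* column under the new tab setting */
    int old_col = 0;   /* column under the original setting */
    int in_cmnt = 0;   /* set once an opening comment token is seen */
    size_t len = 0;

    if (outsize == 0)
        return 0;
    for (; *in != '\0' && *in != '\n'; in++) {
        if (in[0] == '/' && in[1] == '*')
            in_cmnt = 1;
        if (*in != '\t') {
            put_char(out, outsize, &len, *in);
            col++;
            old_col++;
        } else if (in_cmnt) {
            do {                        /* pad to an old tab stop */
                put_char(out, outsize, &len, ' ');
                col++;
                old_col++;
            } while (old_col % old_tab != 0);
        } else {
            do {                        /* pad to a new tab stop */
                put_char(out, outsize, &len, ' ');
                col++;
            } while (col % new_tab != 0);
            old_col += old_tab - old_col % old_tab;
        }
    }
    out[len] = '\0';
    return old_col;
}
```

Because old_col keeps counting as if the original tabs were still in force, a tab inside a comment always expands to the same number of spaces it occupied in the author's editor, no matter how far the code to its left has been squeezed.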
The function pass2 handles right-margin justification after tab translation has been performed by pass1. Wherever a comment is found, it is shifted right or left by just enough to make the total resulting line length equal to exactly cmnt_col characters.
pass2 begins by checking to see if there is indeed a complete comment present (lines 235-237). If opening and closing comment delimiters are not both detected, pass2 returns leaving the text unchanged. This exempts multi-line comments (which usually begin at the left margin) from being right-justified. When a comment is detected, it is immediately copied to a holding buffer, cmntbuf, and its length is computed.
Lines 243-245 scan backwards from the start of the comment, searching for the last character of executable code. A terminating null is installed immediately after that last character. There is now enough information available to determine if the comment can even fit in the desired number of columns; lines 247-254 detect the case where the comment will not fit and diagnose it.
When enough room is available for the comment, the necessary number of spaces is appended to the shortened input line (lines 257-259), and then the saved comment text is appended onto that (line 260). Finally, a trailing newline is restored and the line processing is complete.
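The steps just described can be condensed into a short sketch. It is a reconstruction under stated assumptions rather than the pass2 code itself: the function name right_justify_comment, the CMNT_MAX limit, and the rule that at least one space must separate code from comment are all assumptions of the sketch, and where plist prints a diagnostic, the sketch simply returns -1 and leaves the line untouched.

```c
#include <stddef.h>
#include <string.h>

#define CMNT_MAX 256    /* assumed limit on comment length */

/* Shift a complete comment so the line is exactly cmnt_col
   characters long (plus newline). Lines without both comment
   delimiters are left alone. Returns 0 on success or when there
   is nothing to do, -1 if the comment cannot fit. */
int right_justify_comment(char *line, size_t linesize, size_t cmnt_col)
{
    char cmntbuf[CMNT_MAX];
    char *open, *close, *end;
    size_t cmnt_len, code_len, need, pad;

    open = strstr(line, "/*");
    if (open == NULL)
        return 0;
    close = strstr(open + 2, "*/");
    if (close == NULL)
        return 0;

    /* Save the comment text before the line is overwritten. */
    cmnt_len = (size_t)(close + 2 - open);
    if (cmnt_len >= sizeof cmntbuf)
        return -1;
    memcpy(cmntbuf, open, cmnt_len);

    /* Scan backwards to just past the last character of code. */
    for (end = open; end > line && (end[-1] == ' ' || end[-1] == '\t'); end--)
        ;
    code_len = (size_t)(end - line);

    need = code_len + cmnt_len + (code_len > 0 ? 1 : 0);
    if (need > cmnt_col || cmnt_col + 2 > linesize)
        return -1;

    pad = cmnt_col - code_len - cmnt_len;
    memset(end, ' ', pad);
    memcpy(end + pad, cmntbuf, cmnt_len);
    end[pad + cmnt_len] = '\n';
    end[pad + cmnt_len + 1] = '\0';
    return 0;
}
```

Since every justified line ends at the same column, any two comments of equal length automatically start at the same column as well, which is what makes the preserved comment widths from the first pass pay off.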
In my next column, I'll present another tool to help with trimming down source listings — a program that locates and displays the longest lines in a listing.

Exercise
The only comments supported by plist are the standard K&R-style comments delimited by /* and */. Adapt the program to support C++ style comments, where the token // specifies the beginning of a comment that includes all text to the end of the line.