LETTERS

Find That Function With AWK

Dear DDJ,

In the article "Find That Function!" (August 1988), Marvin Hymowech provides two C programs for building and searching a file of where-defined information about C functions and their related source files. I don't want to denigrate what Marvin has done (because he has obviously put a substantial amount of work into the programs), but this task (among many others!) is perfectly suited to awk.

I have provided two awk programs that do basically the same thing as his C programs. Note that the code is much more compact (by an order of magnitude or two). The bldfuncs routine has been implemented using a different algorithm than Marvin's. I consider only those lines of C source where the last character on the line is a closing parenthesis. This set of lines includes all function definition lines. (This is coding style dependent, but it's true for the way I and Marvin and 99 percent of others code C.) Of these lines, the ones that begin with if, for, while, ||, and && are discarded. This leaves just the function definition lines (with a slight possibility of a comment line).

The functionality of the second program, getf, has actually been enhanced in the awk version. One can search for function names using regular expression syntax -- decidedly more powerful than wildcards. Note also that the awk version invokes Marvin's editor directly instead of relying on DOS environment variables.

I didn't benchmark the two versions of the program, but my guess would be that the awk version runs a little slower. Since awk is not compiled and linked, but rather interpreted, it will be inherently slower (but not slow). But this is actually an advantage when going through the code-test-debug iteration cycle.

Although awk is certainly not suitable for developing major applications, it is beautifully suited to developing this kind of utility program. With C, these "quick little utility functions" take on many of the aspects of major application development. With awk, they truly are quick and little. See Examples 1 and 2, below.

Example 1

#----------------------------------------------------------------------
#    Bldfuncs            Joseph L. Kidd
#    This awk program mimics the function of the same name given
#    in ``Find That Function'', by Marvin Hymowech in DDJ, Aug. 1988
#
#    Usage: awk -f bldfuncs.awk srcfile1 srcfile2 ...
#         where srcfile1, ... may contain wildcard characters.
#    Output: A file named funcs.txt with the format:
#         srcfile1:
#              function_1 lineno1
#              function_2 lineno2
#              ...
#              function_n lineno3
#         srcfile2:
#              ...
#         (where lineno1,... are the line numbers where the functions are
#         defined.)
#----------------------------------------------------------------------
FNR == 1  { print FILENAME ":" >"funcs.txt" }
/\)$/     {
               if ($1 ~ /^if\(*/ || $1 ~ /^for\(*/ || $1 ~ /^while\(*/ ||
                   $1 ~ /^\|\|/ || $1 ~ /^\&\&/)
                    next
               else
                    for (i=1; i<=NF; i++)
                         if (x=index($i,"(")) {
                              if (x==1)
                                   print " " $(i-1) " " FNR >"funcs.txt"
                              else
                                   print " " substr($i,1,x-1) " " FNR >"funcs.txt"
                              break
                         }
}

Example 2

#--------------------------------------------------------------------------
#    GETF                Joseph L. Kidd
#    This awk program mimics the program of the same name given
#    in the article ``Find That Function'', by Marvin Hymowech in DDJ,
#    Aug. 1988.
#    Usage: awk -f getf.awk req=<req_func_name> funcs.txt
#         where <req_func_name> is the requested function name.
#         Note that <req_func_name> may be a regular expression,
#         which is significantly more powerful than DOS's wildcards.
#--------------------------------------------------------------------------
$1 ~ /:$/ { file = substr($1,1,length($1)-1) }
$1 ~ req  { print "Pattern=\"",req,"\". ",
                    $1, "can be found in file", file,
                    "at line", $2 "."
               req_func = $1
               req_file = file
               req_count++
}
END       {    if (req_count==0)
                    print "Could not find", req
               if (req_count>1)
                    print "Multiple functions found."
               if (req_count==1)   {
                    cmd = "b  -m\"search_fwd " req_func "\" " req_file
                    system (cmd)
               }
}

Joseph L. Kidd

San Jose, Calif.

Dear DDJ,

I have found that two changes should be made to Marvin Hymowech's function finding programs.

1. In bldfuncs.c, get_names_one_file() is confused by the declaration of a pointer to a function, as in:

     int (*point1)( )=puts;

It eats through the next function body and records int as a function. To fix this, after

        if(c == ';' || c == ',')     /* function type check declaration, */
             continue;               /* so bypass it */

add

        if(c == PARENS)              /* something pointing to a function */
             continue;               /* so bypass it */

Actually, it could still be fooled by a function that returned a pointer to a function, as in:

        int (*function( ))(){ ... }

or odd but perfectly legal declarations like

        (main)(argc, argv) { ... }

or

        (main(argc, argv)){ ... }

2. In patn_match(), change

       while( *s++)           ;

to

        while( *s)       s++;

Otherwise, s gets incremented past the terminating zero, and the returned value can be incorrect.
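
The difference between the two loop forms can be seen in a small sketch. The rest of patn_match() is not reproduced here; the two function names below are hypothetical and exist only to measure how far each loop advances the pointer from the start of the string.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Original loop form. The post-increment is applied even on the
 * test that finds the terminating zero, so s finishes one position
 * past the terminator. (Hypothetical name, for illustration only.)
 */
size_t advance_postincrement(const char *s)
{
    const char *start = s;

    assert(s != NULL);
    while (*s++)
        ;
    return (size_t)(s - start);
}

/*
 * Corrected loop form. The increment happens only while *s is
 * nonzero, so s finishes exactly on the terminating zero.
 * (Hypothetical name, for illustration only.)
 */
size_t advance_corrected(const char *s)
{
    const char *start = s;

    assert(s != NULL);
    while (*s)
        s++;
    return (size_t)(s - start);
}
```

For the string "abc", the first form advances four positions and the second advances three, which is the string's length. Any later arithmetic that assumes s points at the terminator is off by one with the first form.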

I also changed bldfuncs to write the line number of each function definition so that getf could use the line number rather than the function name in the command line for the editor. I believe I'll add Unix-style pattern matching as well. Thank you for a very handy tool!

James R. Van Zandt

Nashua, New Hampshire

Dear DDJ,

In his article "Find That Function!", Marvin Hymowech puts his finger on a major practical problem with C, namely the poor recognizability of function definitions. I have a solution to the problem that, though so crude and low-tech it's almost embarrassing, may be of use to your readers. Whenever I write or revise a C source-code file, I put the token FUNC (#defined to be nothing, either in stdio.h or at the head of the file) at the beginning of every function definition, and the comment /* FUNC */ after the #define in every macro definition, as in:

        #define FUNC

        FUNC char *strend(register char *s)
        {
             while (*s++)
                  ;
             return s;
        }

        #define /* FUNC */ max(a,b) ((a)>(b)?(a):(b))

To find a specific definition I use one of the two greps as follows:

             grep "FUNC .*strend(" *.c

or:

             grep "FUNC .* max(" *.c

To list all the definitions I do:

             grep "FUNC" *.c

To list just function definitions:

             grep "^FUNC" *.c

And to list just macro definitions:

             grep "/\* FUNC \*/" *.c

It should be fairly simple to write a program that would insert FUNCs into a "raw" C file, especially with awk. But whatever you do, always make sure that FUNC does not denote any real C object. If it does, and you #define it away, there's bound to be trouble. Normally it will show up as a syntax error, but occasionally it won't.
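
A minimal sketch of the convention, with one possible safeguard against the collision just described, might look like this. The names str_end and MAX_OF are hypothetical stand-ins chosen to avoid clashing with anything a library header might define. Note that the #ifdef test only catches an earlier macro named FUNC; it cannot detect a variable, typedef, or function with that name.

```c
#include <assert.h>
#include <stddef.h>

/* Refuse to compile if some header has already defined FUNC as a macro. */
#ifdef FUNC
#error "FUNC is already defined; choose a different marker token"
#endif

#define FUNC    /* expands to nothing; exists only so grep can find it */

/* Macro definitions carry the marker as a comment after #define. */
#define /* FUNC */ MAX_OF(a,b) ((a)>(b)?(a):(b))

/*
 * Function definitions carry the marker as their first token.
 * Returns a pointer to the terminating zero of s.
 */
FUNC const char *str_end(const char *s)
{
    assert(s != NULL);
    while (*s)
        s++;
    return s;
}
```

Since comments are replaced by white space before preprocessing directives are processed, the tagged #define behaves exactly like an untagged one, and the empty FUNC token disappears from the function definition.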

Margaret Armstrong

Cambridge, Mass.

Compiler Review: Clarification and Crotchet

Dear DDJ,

I enjoyed the compiler review in "The State-of-the-Art in Modula-2." (See September 1988.) I was encouraged to see that Kent Porter was as enthusiastic about JPI TopSpeed Modula-2 as I was. Although a 286 with hard disk and color is nicer, I found it to work quite well on a dual floppy monochrome system. (A small RAM disk with COMMAND.COM is quite helpful.) The multiple window environment worked well for learning the language. I could work on a program in one window, pull over a section of code into another window to test an unfamiliar feature of the language, and use a third window to look at library .DEF files.

There are a few things I felt the article may have overlooked, and so I have a request for clarification and a mention of a small "crotchet."

No mention was made about Library Source Code. JPI includes it on a third disk. Some vendors charge extra. One of the libraries is a 34-function window library. Again, some other vendors charge extra.

I had trouble with the formula 0=(C-N)/C. Running it through my calculator, I got the following results: FTL = 59.03, Logitech = 62.70, TopSpeed = 83.69, Stony Brook = 88.48. As you can see, the only number that agrees with the article is the one for Stony Brook. By the way, what was the formula used for Geometric Average? I looked "Geometric Average" up in a CRC and tried to use the formula but couldn't get the results shown in the article.

One reason Logitech code sizes are so large is that all of the library object is linked in, even if only one procedure is used. If that library uses a procedure from another library, then all of the object code from the other library is also included, and so forth. (This is from page 283 of Modula-2, A Complete Guide by K.N. King.)

JPI TopSpeed used LONGREAL for all MATHLIB functions, which might explain the large difference in size for the FPMATH code when compared with other code sizes.

It would be nice if, in addition to the benchmarks themselves, you could show the changes needed to make the benchmarks run on each compiler. By the way, what was the environment used in making the speed and size tests? I suspect it was not a dual floppy monochrome 8088 XT compatible under DOS 3.2 at 4.77 MHz. I also suspect it was not a 386 with 1-Mbyte RAM disk and 1-Mbyte disk cache at 16 MHz. I wanted to get some idea of Modula-2 speed and code size when compared with the C compilers you had reviewed in a recent issue.

And now for the small "crotchet." It would be very helpful if all compiler reviewers could pick a small suite of programs --say Sieve, Acker, and Dhrystone (which almost everybody uses anyway)-- to run against a standard environment, say 286 with 20-Mbyte hard disk at 6 MHz, so that readers could get some idea of relative compiler efficiencies. That way, readers could see if the improvement of Microsoft C 6.0 (whenever it comes out) over Microsoft C 5.1 would justify the big bucks or (more interesting to me) how an efficient C compiler stacks up against an efficient Modula-2 compiler. This could be in addition to any other benchmarks they felt necessary.

Robert A. Durtschi

Jolon, Calif.

Kent responds: The algorithm used to compute geometric average is too complex to reproduce in this limited space. If anyone wants it, they can send a request and a 5 1/4-inch floppy to DDJ, and we'll furnish the source code and .EXE for a geometric average spreadsheet that I've written in TopSpeed Modula-2.

The test platform for the benchmarks appears at the top of page 74. We'll also furnish the code for the timing program if you request it.

The only changes made in the benchmark programs were the necessary changes in IMPORT lists, plus a couple of output statements required to adapt to TopSpeed's nonstandard calls.

As for the "crotchet," I agree in principle, but there are practical problems, like what constitutes a "standard" environment. Our contributors are freelancers whose machines have little in common. We haven't the time to rerun benchmarks in the editorial offices. Thus, the best we can do in a practical sense is to make sure all benchmarks in a given article are run in the same environment.

Codeview Can Work With Turbo C

Dear DDJ,

Microsoft's Codeview and Turbo C 1.5 can work together. One compiles the source code with Turbo C's line numbering option on. Microsoft's linker is then used to link the object code with the /CO option enabled, which tells the linker that Codeview is going to be used on the program. The program is now ready for source level debugging.

I had purchased MASM 5.1 to help optimize some of the critical routines written in Turbo C. I ran into a bug while testing a serial port driver and was wishing for a good source level debugger that would work with Turbo C. After some tinkering, I found Codeview will work, provided one uses Microsoft's linker.

Richard J. Clark

Tetragenics Company

Butte, Montana