Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and was a member on the ANSI C committee. He also does custom C programming for communications, graphics, image databases, and hypertext. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukemvs.ac.duke.edu (Internet).
Q
What are YACC and Lex? I am currently developing a mapping package that reads xyz values, posts them on a map grid, contours them, and does various other manipulations with the data. It turned out to be that the number of options you have with the mapping part is enormous, e.g. title blocks, which pen to use, linear grid or geographic, etc...
I am using with all my programs the getopt function for command line options processing. I don't think setting all the available possible options from the command line would be practical.
I am looking for a utility that would allow me to create my own macro language with specific keywords that accepts options, sort of like many getopts. This would obviously call for another utility to parse my macros.
Will YACC and Lex help? If you think they will, I would appreciate a quick review on what they are and how they can solve my problems. If they won't, can you please suggest something else.
Sami Kurdy
United Arab EmiratesA
Lex can help you with your problem. What you want to do is to recognize different strings of characters that satisfy certain rules.
The Lex (LEXical analyzer) program produces a C source file that looks for specified sequences of characters and produces actions when it finds the sequences. The output of Lex is compiled into your C program. The resultant program may be longer and slower than one you could write, but it will take you less time to develop it.
Let me summarize some of the features of Lex. You specify any number of regular expressions. For example, abcd matches the string abcd. [abcd] matches a single character that is a or b or c or d. [abcd]* matches a string containing any combination of a, b, c, and d, such as abcd, bbcda, or aabbbbcccc. [a-zA-Z]* matches any string containing upper or lower case letters, such as ABCabcd.
For each regular expression, you specify a C code fragment to be executed when the expression is found. For example:
[abcd] printf("Found a string of abcd");To resolve problems in recognizing strings, Lex uses the longest string possible that matches. If there are still ambiguities, it executes the first rule that matches.Lex creates a C routine called yylex( ). When you call this routine from your program, it reads the input for characters. When it recognizes a specified character sequences, it performs the matching action, if any. For example, if you were using it to recognize keywords, you could have it return the corresponding numeric token values, which you could then process. You might specify this as:
#define IDENTIFIER 10 #define PLUS 9 #define INTEGER_NUMBER 11 #define SET_COMMAND 12 [_a-zA-Z][_a-zA-Z0-9]* return IDENTIFIER; '+' return PLUS; [0-9]* return INTEGER_NUMBER; SET return SET_COMMAND;Lex may be sufficient for your needs, if the command language consists of a number of commands with optional parameters following. The string that Lex has matched is kept in an array called yytext[], so you could call sscanf to do conversion of numeric values that were found. If you used the above values, you might write a code fragment that looked like the code in Listing 1.This would look for the word SET followed by an integer number. If an integer number did not follow SET, then a message would be generated and the program would exit. With a simple command language, using Lex to analyze the input for commands would decrease the amount of your coding.
From the above, you see that Lex can be used to generate tokens. YACC (Yet Another Compiler Compiler) can use these tokens with specified rules to produce additional actions. You could write your own compiler, if you wish, using YACC to do the parsing.
The YACC set of rules can be more complex than the Lex rules, since one rule can depend on the results of other rules. The rules can specify a grammar, rather than simple pattern matching. The general form of YACC input is:
declarations %% rules %% subroutinesThe rules look somewhat like the Backus-Naur Form (BNF) for a programming language. For example, the BNF rules for C can be found in the ANSI C specification or in Appendix D of All on C. Using the above Lex tokens, you might make up YACC specifications as:
%token IDENTIFIER %token PLUS %token INTEGER_NUMBER %token SET_COMMAND %% ADD_EXPRESSION : IDENTIFIER PLUS IDENTIFIER { printf("Add expression")} ; SET_EXPRESSION : SET_COMMAND INTEGER_NUMBER %% yylex() /* Generated by lex or your own */ ;Developing a language (with or without YACC) is a bit complicated. However, if your command language involves computing expressions or control loops, using YACC is simpler than trying to write your own parser.Language construction is a bit beyond the scope of this column. Perhaps Robert or P.J. can get an article written about that topic. (KP)
Q
I am trying to develop a TSR screen saving routine, but I am running into problems. I have an IBM compatible, 80286 AT computer with VGA graphics. I would like to be able to save a screen from the DAK soul snatcher program VID. EXE. Can you offer a solution in Turbo C++?
Also, what is a debugger (like Turbo Debugger), and what is it used for?
Matt Nowicki
Millersville, MDA
There are many aspects to TSR programs, which would greatly exceed the space in this column. I suggest you obtain one of the libraries of TSR functions (e.g. Blaise Tools), or check the past articles in The C Users Journal or Computer Language for how to create TSRs.
A debugger allows you to step through your C program, either by one line at a time or by blocks of code. At each step, you can examine the values of variables. You can set a breakpoint in the code, which will allow you to execute all the code up until that point and then permit you to go into a single step mode. A debugger is an excellent way to learn how C works, as well as a way of checking out small routines or programs.
Debuggers vary in their features. For example, most debuggers permit you to set breakpoints on a particular line of code. Whenever your program reaches that line, a break will occur and you need to type a key to resume execution. Other debuggers allow you to specify an expression (such as "i = = 50"), which must be true for the break to occur. This can eliminate a lot of keystrokes. (KP)
qsort Comparison Function
Two months ago, Firdaus Irani of Chestnut Hill, MA wrote about an error he got when trying to compile the program in Listing 2. The error read:
Turbo C++ Version 1.00 Copyright (c) 1990 Borland International q_sort.c: Error q_sort.c 12: Type mismatch in parameter '__fcmp' in call to 'qsort' in function main *** 1 errors in Compile ***I checked with Robert Jervis, who is another member of the ANSI C committee. As I expected, the error is due to a function prototype nested in a prototype. This is the expected behavior, and it results from how function pointers in function prototypes are evaluated.For simple function parameters, such as ints and doubles or data pointers, if the actual parameter is assignment compatible with the formal parameter, the conversion takes place silently. A parameter which is a pointer to a function has a type that includes the types of its parameters. For example, for a function prototyped as
qsort(void *array, int element_size, int element_count, int _fcmp(void *, void *)); ***** which is first, size oryou can call this with:
char char_array[10]; int compare_function(void *, void *); qsort(char_array, 10.1, (char) 1, compare_function);The compiler will ensure that the parameters are initialized as:
array = char_array element_count = 10/* Converted and rounded to 10 */ element_size = 1 /* Increased to a int */ _fcmp = compare_functionHowever, you cannot call it with:
int compare_function(char *, char *);since the types which the function expects differ (even though they are assignment compatible).Note this means that if you attempt to use strcmp( ) as a parameter to qsort( ) and you include <string.h>, you will get this parameter mismatch error.
You could change the prototype definition to read:
int _fcmp( )This way, it will accept a pointer to any function, regardless of type or number of parameters. (This is ANSI, but disparaged.) (KP)
Readers' Replies
External Name Headers
I am a Staff Systems Software Engineer at Seagate Technologies. Several engineers in my department read the journals, and find The C Users Journal to be one of the best.I read the question by Andreas Lang in the October 1990 issue of The C Users Journal regarding keeping external variables in one .h file. I found a technique in Dr. Dobb's Journal, curiously the same month of October 1990, in a letter by Robert White. Following is a memo I wrote on this technique, using it to solve the problem M. Lang wrote about:
I ran across a letter in the October 90 The C Users Journal asking how variable declarations and the extern declarations of those variables could be kept in sync, especially when you want to initialize the variable at compile time.
The whole point of the technique is to declare all the information you need in all cases in a macro call, then define the proper macro to extract those parameters (pieces of information) that you want for the specific case and put them in the desired format.
For the case of variable declaration, see Listing 3. You select one of the sets of macros via a defined symbol or its lack of a definition as shown in Listing 4. This would generate two different cases, which are shown in Listing 5. This is the cleanest solution I have found yet to this problem.
Don bandit Gangwere
Scotts Valley, CAThe letter that Don referred to in Doctor Dobb's Journal used a similar technique to keep enum values and corresponding strings straight. See Listing 6.
In an include file, say table.h, are the values:
TABLE_VALUE(sunday, "SUNDAY"), TABLE_VALUE(monday, "MONDAY"), TABLE_VALUE(tuesday, "TUESDAY"),To use these values, you could use the declarations shown in Listing 7.This seems like a good idea to keep enums and strings in sync. (KP)
A response in your October Q?/A! column in The C Users Journal told Andreas Lang that there was no solution to having one .h file serve for both definition and externals declaration. It isn't especially elegant, but the code in Listing 8 works for me.
This isn't original with me, but I can't remember to whom to attribute it.
Terry Shankland
Mesa, AZThe one specific comment I would make on your coding is to avoid declaring a structure template and a variable at the same time. For example, it ought to look like:
struct range { int xmin, xmax, ymin, ymax; }; GLOBAL struct range data_range #ifdef ALLOCATE_SPACE = { 0, 0, 0, 0 } #endif ;The next letter mentions my preference for a separate header file. Either that or duplicating the defintion just reads better to me. For example, to keep everything in the same source file, I might write it as shown in Listing 9. I made the 0 value explicit for variable2, just for consistency's sake.I find the fewer #ifdefs that I have to deal with, the easier it is for me to understand. (KP)
I'm responding to the letter from Andreas Lang in the October issue of The C Users Journal. Lang writes: "I am trying to keep all external variables in just one .H file." He offers his solution but notes, "it isn't very elegant."
Lang's goal and his problem draw attention to organization, the heart of the C language. In C, the question is always, "How shall I organize my code and my variables for maximum readability and productivity?" At this point in my own C career, I usually can produce the code that I need rapidly; but I spend a great deal of time reorganizing it. My own struggle with C is less with the syntax than with the underlying concepts of organization that are built into the language. I still struggle with this. I have made my own efforts in the direction that Lang is giving, with success and frustrations similar to his. I no longer believe that the organizational goal expressed by Andreas Lang is suited to the nature of C.
The nature of C dictates that you minimize the number of external variables. I have determined this by trial and error. Given that code reusability is a key reason for working in C, one strives to organize in such a way that code re-use is simplified. I try to move new code rapidly into a temporary library.
First of all, I must organize the source in a certain way so that it is suited to being in a library. Then, creating the library produces an ASCII file list of public symbols in the library (the .LST file produced by TurboC is an example). This ASCII file is an important reference tool. It should be a concise reference to the functions and external variables in the library-ized source. It shouldn't be littered with names that could have been eliminated by reorganization.
The nature of C, I believe, is not suited to collecting all external variables in a single header file. Rather, the nature of C dictates that you minimize the number of external variables that you use. Often a variable need be shared by only a few functions. I take this as a signal that those functions belong together in a common source file. If this cannot be done, the variable can probably be passed as a parameter, rather than an external. When this cannot be done, I want the declaration of an external variable together with the code that requires the external. The reference serves as a warning notice to myself to be careful.
Sometimes an entire set of variables must be shared by a number of functions. This is my clue to create a structure to house that set of variables. Not only does this reduce the number of variables to one, it also solidifies concepts forming in my code. The name which seems reasonable to apply to a new structure often identifies the concept. Thus I find the organizational demands of C help me to formulate my ideas.
The apparent redundancy of variable and function declarations used to bother me. Now they hardly seem redundant at all. It used to bother me that function declarations were a necessary prerequisite to function calls. Now I view the function declarations as a valuable portion of my code. They summarize important information in a convenient place for handy lookup.
It used to bother me that variables must be declared before they can be used. Now I think of the variable declarations as an organizational tool. Now, the undeclared variables of GWBASIC make me uncomfortable. BASIC provides no built-in method of assuring that I'll be able to go back to a certain spot in my code and double check the name of a variable, or to see what names have been used.
The thing that bothers me now about C is that I find little discussion that addresses the nature of C. One has to hunt everywhere, read everything, and glean what one can. Many analyses offer solutions at the syntax level, rather than emphasizing the underlying concepts of organization which are built into the language.
In your reply to Lang, you write, "There is no write-once solution... You need to repeat the prototype." This cold, impersonal rule of thumb states the facts. But the wisdom in your personal coding habits does reflect the nature of C: "I don't try to initialize in a header file, but I do include a copy of the header file in the source... The compiler should check for agreement [between the extern reference and the definition]."
"The compiler should check for agreement." This solves a problem that I have run across, and never managed to solve. Including a copy of the header in the source was something that, until now, seemed to me unnecessary and redundant. Now I see it as necessary and unredundant. Thanks.
To me, your remarks on your personal coding habits address the nature of C. Perhaps you view personal coding habits as a subjective issue. I would disagree with that. Obviously, some aspects are personal. Generally, I think the things that I try are personal. But when I've been forced by the language to modify my coding habits, I know I've bumped up against the nature of C. And when I finally make the right guess, and things start to work, I know that I have an objective solution. I'd like to see more emphasis placed on this thing that I call the nature of C.
Aside and apart from all of the above, let me say that I really enjoy your column. The Q&A format is always instructive. Also, you're very brave to publicly offer solutions to spur-of-the-moment problems. And you're very honest to print replies from readers who have a month or more to come up with solutions that may be even better than yours! But seriously, I keep all my back issues of CUJ on my desk, and I always refer to your column.
Art Shipman
New York, NYThanks for your ideas. It makes explicit for me what I have been espousing implicitly. I agree that there is a nature to C, as you so well explain. I don't like to make dogmatic rules, but rather to make suggestions as to its nature.
For example, if one finds themselves using external variables to communicate between source files, one should check the organization of the code. If you find yourself ever doing a cut and paste of a set of code, you should examine why and whether a general routine can be created. If you go beyond a single member operator (e.g. employee.hire_date.month), you should examine the modularity of your code.
What you've described in your letter is part of the object-oriented design/modular programming concept that is also part of the essence of C. I have been an advocate of structures, even before structure assignment was made part of the language. Using structures and developing functions to operate on them is the greatest step one can take to object-oriented programming. As I described in my talk at the C Forum in October, if you can create these structures and functions, you can have many of the benefits of object-oriented design, such as code reusability, without learning a new language. (KP)