Columns


Questions & Answers

Stringizing, Replies

Ken Pugh


Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707.

You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP).

Q

I usually program in BASIC, and now I'm self-taught in C. However, I have two questions to ask you:

1. I'd like to use the C program in Listing 1 to read a file which has been created by BASIC PRINT# command.

Why did the output of rec.first_name exceed 15 characters while it was defined as 15 bytes?

2. How to print the output to printer 2 (LPT2) instead of printer 1 (LPT1) because in C there is only one stdprn.

Chaiyos Gosolsatit
Lewiston, NY

A

You were lucky that there was a NUL (a binary zero byte) immediately following the rec variable. Otherwise your printout might have looked even more garbagey (or is it garbagy?). printf doesn't know the width of the rec.first_name field. The "%s" format specifier tells printf to print characters starting at the address in the corresponding argument and ending at the first NUL character. (By convention NUL signifies the end of the string.)

The symbol rec.first_name evaluates to the address of the first character in the first name field. printf() starts printing from there to the first NUL, which by sheer accident happened to follow rec. Thus printf() printed both the first_name and last_name.

To stop printf() without relying on a NUL, use the field width and the precision specifiers with "%s". For "%s", the precision (the number after the decimal point) gives the maximum number of characters. Your print statements would then read:

printf("First Name: %15.15s\n", rec.first_name);
printf("Last Name : %15.15s\n", rec.last_name);
The field width specifier can be omitted when the strings will be at least the length of the precision value and the printout will still line up. I usually include a width specifier so that the printout lines up even if an unexpected NUL appears in a string.

2. Under MS-DOS, a MODE command on the DOS command line before executing your program will set the printer to be any device (LPT1, LPT2, COM1, etc.) that you want to use. Some compilers also define symbols, e.g. stdlst, for alternate devices.

Q

Here is a funny concerning function pointers that I am at a loss to explain:

Suppose you define a function, e.g.

int func(void)
   {
   ...
   }
then define a function pointer with

int (*ptr)();
then point ptr to the function with

ptr = func;
Both func and ptr now presumably point to the starting address of the function and the latter can indeed be called either with

func();
or

ptr();
However it can also be called with

(*ptr)();
but *ptr is the object pointed to by ptr, which is presumably the beginning of the function body and not its address. How come they both work?

Peter Sington
Essex, England

A

When you call ptr(), you are using a shortcut that is now permitted by the ANSI standard. It was not in K&R.

Under K&R, calling functions with pointers was consistent with data types. You called the function with a pointer and the indirection operator. For example, with a data pointer:

int i, j;
int *p;
p = &i;   /*  Refers to the variable p
        (pointer to int) */
*p = 5;   /*  Refers to the object p
        points at (an int) */
j = *p;   /*  Refers to the object p
        points at (an int) */
The use of p alone represents a pointer to an int; *p represents an int. Following along these lines for a function pointer:

int f();
int (*pf)();
pf = f;            /*  Refers to the variable pf */
another_func(pf);  /*  Passes the value
                of variable pf */
another_func((*pf)());  /*  Passes the
                    return value of the
                    function (an int) */
The appearance of the form of a declaration (except for the data type) in an expression is an object of that data type. The (*pf)() form for calling the function matches this for function pointers. Note that the parentheses are needed due to the precedence of the operators. Without them, the expression *pf() would attempt to apply the indirection operator to the value that a function called pf returns.

Some compilers permitted pf(). According to the Rationale for the standard, this construct "is unambiguous, invalidates no old code, and can be an important shorthand. The shorthand is useful for packages that present only one external name, which designates a structure full of pointers to objects and functions: member functions can be called as graphics.open(file) instead of (*graphics.open)(file)".

Accordingly, the Rationale states you can use any of the following:

(&f)();  (*f)();   (**f)();   (***f)();
pf();    (*pf)();  (**pf)();  (***pf)();
I prefer to use the (*pf)() form. This alerts the maintenance programmer to the fact that a reference to pf will not appear in the external function listing.

Q

I want to stuff two characters into an integer. I used the following code and it works with Microsoft 5.0. Is there a better way?

int i;
/*  Stick the first character in */
(char) i = 'A';
/*  Stick the second character in */
*(&(char)i + 1) = 'B';
Larry Meyers
Raleigh, NC

A

I'm amazed that it works. The standard states that "a cast converts the value of the expression to the named type". It also states that a "cast that specifies an implicit conversion or no conversion has no effect on the type or value of an expression."

The address operator can only be applied to an lvalue or a function designator. An lvalue is something that represents a memory address where a value can be stored; an lvalue can be the left side of an assignment statement. The address operator cannot be applied to an expression that is not an lvalue. For example, &(i + 1) is illegal.

Listing 2 shows a program with the casts and the results (with an offending error removed).

The cast of the char c should be okay, since it does not do any conversion. I tried the program with two compilers. The Manx Aztec compiler complained that (char) c is not an lvalue and gave many more errors. The Microsoft compiler allowed everything except for the assignment to (double) i, which it complained was not an lvalue.

If you need to stuff the characters into an integer, it might be better to use the memcpy function or do bit shifting:

memcpy(&i, "AB", 2);
i = 'A' << 8 | 'B';
Q

I was pleased to discover in your February column an explanation of the offsetof() macro of the new ANSI standard. Unfortunately my compilers (Turbo C and Power C) do not provide this macro and after making a dozen phone calls I have not been able to find a copy of the draft standard. I'm wondering if you might be able to provide me a copy of the offsetof() macro definition.

A

The offsetof operator can be defined:

#define offsetof(type,member)  \
   ((size_t) &(((type *) NULL)->member))
where size_t is an ANSI standard typedef used for values that represent sizes in memory units (e.g. bytes). The standard requires it to be an unsigned integral type, so you could implement size_t with:

typedef unsigned int size_t;
or

typedef unsigned long size_t;
This macro simply finds the address of a member of a structure whose base address is 0 (the NULL address) and casts it into an integer.

Q

Is there a way under MS-DOS to link to C library functions at runtime? My executable files are too long. Presumably each contains a copy of each library function it calls — printf(), scanf() and the like. How can I reduce or avoid this replication, even at the cost of some execution speed?

Dale Wharton
Montreal, Canada

A

I have just received a copy of RTLink by Pocket Soft, Inc. (P.O. Box 821049, Houston, TX 77282 (713) 460-5600). RTLink has two features that will help you with large executable files. First, it supports more overlays than the Microsoft linker does. Second, it supports runtime libraries (RTLs). These are similar to what OS/2 offers. The RTLs contain common functions that are used by multiple executable files.

RTLink was recommended to me by some of my associates. I haven't used it yet, but I have a program that is approaching memory limitations with a single level of overlays.

Q

The problem that Josh Cohen has (The C Users Journal, vol. 8, no. 3), is similar to a problem that I had recently. I had written a dozen small parsers, and the startup code for each was nearly identical, making it natural to use one file of code and setting things up so that the preprocessor would make the necessary minor changes. The files were carefully named so that prog1.c would use the headers prog1_a.h, prog1_b.h, etc. There were corresponding headers for prog2 and the rest. The stereotyped code was put in file progx.h.

What I wanted to do was write something like (for prog1.c):

#define PROG   prog1
#include "progx.h"
which might be like

#define INCL_A(i)  #i "_a.h"
#define INCL_B(i)  #i "_b.h"

#include INCL_A(PROG)
#include INCL_B(PROG)
or something that would have a similar effect. Nothing works, either in Turbo C 2.0 or Microsoft C 5.1. I've tried several variations, such as

#define STR(s)     #s
#define INCL_A(i)  STR(i) "_a.h"

#include INCL_A(PROG)
and even the direct

#include STR (PROG)
Nothing works. Sometimes the compiler refuses to recognize the construction, and other times says that it can't open the file

"prog1" "_a.h"
where for some reason the strings weren't concatenated. The token-pasting operator helps, but not enough.

What I ended up doing was, in the main source file, defining

#define INCL_A "prog1_a.h"
#define INCL_B "prog1_b.h"
and then in progx.h using

#include INCL_A
#include INCL_B
which works, even if it is more of a hassle.

I've gone through the manuals and the books, and I can't find any reason why this shouldn't work, but it doesn't. Do you know of a way around this, or at least, can you give me an explanation of why the compilers behave this way?

Jim Howell
Lafayette, Colorado

A

The translation of the source file takes place in a number of phases. These are defined in the ANSI Standard section 2.1.1.2. For example, in the second phase physical lines are transformed into logical lines. In this phase, the \ immediately followed by a new-line is eliminated. In the fourth phase, the preprocessing directives are executed. This brings in the #include files. You are using the:

#include DEFINE_LABEL
version of the #include file (a new ANSI feature). The DEFINE_LABEL must be something that has been #defined. Not until the sixth phase are the adjacent string literals concatenated. All of these phases may take place in a single physical pass of the compiler; they are ordered in the Standard to clarify the logical operations that should take place.

As you can see, string literal concatenation does not take place until after the #include filename has been accessed. Thus you get the prog1_a.h filename error. The best I can suggest is to add a few #ifdefs in your header. For example:

#ifdef PROG1
#define INCL_A "prog1_a.h"
#define INCL_B "prog1_b.h"
#endif
#ifdef PROG2
#define INCL_A "prog2_a.h"
#define INCL_B "prog2_b.h"
#endif
When you compile, you would simply use:

cc /DPROG1
or

cc /DPROG2
to set up the necessary names. Most compilers provide a /D or -D option to define a name on the command line.

More On Stringizing

From the readers' response on this issue, I am reminded of an old story that I tell my classes during the introductory talk. A couple had a child who did not speak a single word. As the child grew up, they took him to all sorts of specialists, but to no avail. Suddenly, at the dinner table on the child's tenth birthday, he asked, "Could you please pass the sugar?" The astonished parents exclaimed, "You can speak! Why haven't you said anything before?" The child replied, "Well, up to now, everything's been fine."

Responses were received via fax, email and postal service. Last month I published the only one received before press time. This month there were too many to print.

As I mentioned in the last column, once the solution appears, it is obvious. For whatever reason, I had a mental model of how the token replacement worked. The printed explanation in the standard always translated into my mental model, even though the explanation was not in line with my model, especially since there was no specific example. With normal stuff, it doesn't matter, at least for the macro expressions I generally use. For example, given:

#define MAX   10
#define X(Y)  Y + 1
#define Z(Q)  X(Q * 2)
then

Z(MAX)
can be interpreted as either (no expansion):

X(MAX * 2)
MAX * 2 + 1
10 * 2 + 1
or as (expansion of the tokens)

X(10 * 2)
10 * 2 + 1
With the # operator, these two ways are different. Given

#define MAX 10
#define A(Y)  #Y
#define B(Q)  A(Q)
then B(MAX) can be interpreted as (no expansion):

A(MAX)  /*  Simply substitute */
#MAX    /*  Don't substitute, simply expand */
"MAX"   /*  Quote operator */
or as (expansion along the way):

A(10)  /*  Since Q was in parameter list
         but not part of # or ##,
         it is expanded */
#10
"10"
For ANSI and me, the correct model is now:

"The tokens of a macro represent either a name or a value. If it is used with a # or ## in the replacement string, it represents the name itself (e.g. unexpanded). Otherwise it represents a value (the expansion)."

If you have this mental model, then it all works out fine. As you'll see in the responses, at least one compiler maker does not follow the ANSI model.

Replies on this problem were received from many people. Although there is not room to print them in full, I thought it would be interesting to show how many variations there were in naming the "stringifying" macro. I'm going to go out on a limb and suggest a "standard name" for this macro, just like argc and argv are "standard names" for the main function's parameters. Let's call it "quote(x)".

Replies

You will probably be flooded with responses to Josh Cohen's question in the March 1990 C Users Journal, but just in case nobody else writes:

#define quote(x) #x
#define STRING_WITH_VALUE(x) "Error message" quote(x)
James Janney
Salt Lake City, Utah

I received comparable advice from:

Mark Grand, Concord CA; Shamus McBride, Seattle, WA; Carl Paukstis, Spokane, WA; Ian Cargill, Surrey, United Kingdom; Mary Kirtland, Arlington, VA; Mike Higginbottom, Sturtevant, WI; and Al Williams, League City, TX.

Josh Cohen, in the March, 1990, issue of The C Users Journal, had a problem with getting the preprocessor to put the value of a macro into a string. I had a similar problem where I was trying to insert a constant into the width parameter of a format string. I solved it this way:

#define STR_LEN    15
#define STR(s)     #s
and used these in a variable definition

char *fmt = "%" STR(STR_LEN) "s";
The contents of fmt are now "%15s", which was exactly what I wanted. This solved the problem for Turbo C 2.0, and your code at the top of page 35 works here, too.

Unfortunately, neither of these work with Microsoft C 5.1, which produces the format string "%STR_LENs" and Mr. Cohen's problem, but with another layer of indirection both can be made to work. To solve Mr. Cohen's problem, write

#define MAX       10
#define STR(s)    #s
#define MSG(m)    "Error message " STR(m)
and use the call

DoMsg (MSG (MAX));
The string "Error message 10" is now passed to DoMsg, as we want.

My problem needed a similar change, namely

#define STR_LEN     15
#define STR(s)      #s
#define FMT(s)      "%" STR (s) "s"
char *fmt = FMT(STR_LEN);
Both of these behave the same way in Turbo C. Thankfully.

This naturally brings up the problem of which is the correct behavior for a compiler that conforms to the C standard. This seems to be a problem of precedence, over whether the stringizing comes before the translation of STR or after it, and from what I can tell from the standard, the MSC compiler behaves correctly. But then the whole thing is murky. It took some experimentation to figure these out, and I have no confidence that any of them would port to another "conforming" compiler, or even to the next version of these.

Jim Howell
Lafayette, Colorado

Constants Into Strings

I read with interest the letter from Josh Cohen and Stuart Downing of Dexter Michigan, and your response. You can use the * in the specification, as:

printf ("\n Record is %*s %*s",
   WFIRST_NAME, record.first_name,
      WSECOND_NAME, record.last_name);
The * may also be used in place of the precision part of the conversion specification, as:

printf ("%10.*f", precision, value);
Obviously, the width specified need not be a constant, allowing you to specify a variable field width or precision. Note, however, that you will incur a run-time penalty, which would not be the case if the pre-processor generated the string you wanted.

I realize this doesn't solve the original problem, but I couldn't find any way to do that either.

David Hansen
Chaska, Minnesota

Similar letters were received from John D. Bowman of Maryland Hts., MO and Michael S. Alt of Rockville, MD.

Reply To The Replies

Thank you all for your comments on the * specifier. I think I put down a version of my problem that I simplified to something that the * can now handle.

What I really wanted to do was initialize a table of format strings. I had this problem six years ago (would that be 6 BAC — Before ANSI C?). The * specifier did not exist in the compiler I was using. As I recall the situation, there was a record on disk that looked like:

#define WFIRST_NAME 10
#define WLAST_NAME 30
#define WNUMBER 5
struct s_record
   {
   char first_name[WFIRST_NAME];
   char last_name[WLAST_NAME];
   int number;
      ...
   };
struct s_record record;
I created an array of field information that looked something like:

struct s_field
   {
   char *location;  /*  Location of data */
   char *format;    /*  Format for data */
   };

struct s_field fields[] = {
   { (char *) &record.first_name, "%10s"},
   { (char *) &record.last_name, "%30s"},
   { (char *) &record.number, "%5d"},
      ...
   };
The fields array could be used in a loop to retrieve and print the information. The fact that the sizes of each field appeared in two places made it more difficult to maintain. So it would have been quite helpful to have the quote operator and implicit concatenation then.

I eventually wound up using a table such as:

struct s_field
   {
   char *location;
   int size;      /*  Number of bytes */
   int type;     /*  Type - STRING, INTEGER, etc. */
   char format[10];    /*  Format string */
   };
where the types were #defined. (This compiler did not even have enumerated types). Then in an initialization routine, I did some sprintfs to the format field in a switch statement. This looked something like:

switch (field.type)
   {
case STRING:
   sprintf(field.format, "%%%ds", field.size);
   break;
case INTEGER:
   sprintf(field.format, "%%%dd", field.size);
   break;
      ...
   }
Note that the %% yields a single % in the output. (KP)

Mouse Hardware

You may determine a mouse's hardware hookup through interrupt 33h function 36 (decimal). The mouse driver must be compatible with the Microsoft Mouse v6.0 or later, released in September of 1986. Exact information is in the Microsoft MOUSE Programmer's Reference, pages 203-4, by Microsoft Press.

However, if you call function 36 and the mouse type is serial (CH = 2), the IRQ number (CL = 4 or 3) tells you that the serial port is COM1 or COM2, respectively.

Code samples in assembly language and in C appear as Listing 3 and Listing 4. They unrightfully assume familiarity with the IBM-PC hardware interrupt structure.

Daniel R. Haney
Cambridge, MA