Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707.
You may fax questions for Ken to (919) 493-4390. When you hear the answering message, press the * button on your telephone. Ken also receives email at kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP).
Q
Yesterday I spent several hours tracking down the following error. I have multiple source files, each of which accesses certain global variables. For example, consider these two files:
------------file #1--------------

    char date[9];

    main()
    {
        strcpy ( date, "01-01-90" );
        foo();
        puts ( date );
    }

--------------file #2--------------

    extern char *date;

    foo()
    {
        puts ( date );
        strcpy ( date, "02-02-90");
        puts (date );
    }

Both files compile without warnings (I had #included the appropriate header files). The link step generated no warnings. However, when I ran the program, my computer entered the twilight zone. A little detective work with a symbolic debugger revealed that the value of date was 0, despite the fact that the link map showed a valid storage address for date.

The compiler I used yesterday was QuickC 2.0. Today I tried this small example on a VAX and got a memory protection fault. I had an associate try the problem under UNIX; he got a similar error. I also tried MSC 5.1; the program ran without errors, but the output seemed to show that date in main was stored in a different place than date in foo(). Turbo C ran the program without any difficulty.
I am sure you are aware that the solution is to replace the declaration in file #2 with:
    extern char date[];

My question is this: why does this error occur? When is a pointer not the same as an array name? I feel that I am missing something fundamental about pointers and arrays. Am I?

Richard J. Wissbaum
Aurora, CO
A
You suffer from common pointer/array confusion, which can arise for many reasons, including the fact that square brackets have multiple meanings. To work towards a cure, let's examine some facets of arrays, pointers, and their symbols. Although your problem deals with external arrays, for the benefit of our readers, I'll start with automatic arrays. Assume that ints are two bytes long and that addresses (pointers) are four bytes long.
In Listing 1, function_1 contains an array, good_auto_array, declared as a local variable. The compiler allocates twenty bytes (10 times sizeof(int)) on the stack for good_auto_array, and four bytes for integer_pointer. You can make memory references through integer_pointer by using either form shown. Of course, you must first initialize it to point somewhere valid; nothing forces you to, but leaving it uninitialized is the easy way to destroy your program.
The indexed form, integer_pointer[1], is normally transformed into the indirect form, *(integer_pointer+1), by the compiler. (This translation does not always take place. I recently examined code produced by the DEC VAX compiler, and it generated different assembly instructions depending on whether the source used the indexed form or the indirect form.) Likewise, good_auto_array[1] is transformed into *(good_auto_array + 1). Unlike integer_pointer, however, good_auto_array represents a constant address. Thus, you cannot assign anything to it, as in:

    good_auto_array = integer_pointer;   /* Does not compile */

If you apply the compile-time operator sizeof to these two variables, sizeof(good_auto_array) is 20 and sizeof(integer_pointer) is four.

The following erroneous declaration keeps function_2() from compiling, since you must state the size of an array when declaring it as a local variable:
    function_2()
    {
        int bad_auto_array[];   /* Does not compile */
    }

Now for passing an array as a parameter. Listing 2 contains two functions, each of which is passed an array as its sole parameter. The name of an array used without a subscript has the value of the address of the array. The address of array_passed is given to both function_3 and function_4. Regardless of which way you declare array_parameter, you can reference it using either function_4's indirect form or function_3's indexed form. In both cases array_parameter is four bytes long (sizeof(array_parameter) yields 4) and, like all function parameters, acts like a local variable. Thus, you could assign

    array_parameter = local_array;

As for globals:
    extern int global_array[];

    function_5()
    {
        global_array[1] = 5;
    }

function_5 makes a reference to a global array. You must also define this global array somewhere, using the form that includes its size. If you attempt sizeof(global_array), the compiler should report an error, as it cannot determine the actual size. The following declaration should appear either in this source file or in exactly one other source file linked with it:

    int global_array[10];   /* Sets aside 10 times sizeof(int) */

When declaring global variables, you make either a definition or a reference. The form with the explicit size is a definition and sets aside storage. The form without the size and with the keyword extern simply makes a reference to a variable that will be linked to its definition at link time.

You can make lots of references to global_array within all the linked source, but you may write only one definition. The ANSI standard sort of waffles on this point. A strictly conforming program will have only one definition for a global in all the source files linked together. Some linkers will permit multiple definitions of the same global and treat all the ones after the first as references. The standard also permits the form
    extern int global_array[10] = { 0 };   /* extern with an initializer */

to act as a definition, a form not permitted in K&R. To cover all possible extern declarations, the committee came up with the concept of the "tentative definition". Section 3.7.2 of the Standard gives examples of this, though I won't elaborate on it here.
I prefer the general rule that works across all compilers:
One declaration of the form

    int global_array[10];

and all other declarations of the form

    extern int global_array[10];

The definition of the global sets the storage aside. When the linker matches a reference to that global, it checks only for a matching name, not whether the two items are the same data type. (I once had a linker that did not even care if the definition was for a data item and the reference was to a function. That caused some interesting debugging problems.)

Now if you declare global variables differently in two source files that are linked, the linker will not produce an error. The ANSI committee stated that this problem was outside its jurisdiction. You can avoid this problem with something similar to function prototypes, as shown shortly.
Now to your problem (finally).
    char date[9];

    main()
    {
        strcpy ( date, "01-01-90" );
        foo();
        puts ( date );
    }

The external definition of date allocates nine bytes. strcpy copies "01-01-90" into those bytes, and sizeof(date) is nine.
    extern char *date;

    foo()
    {
        puts ( date );
        strcpy ( date, "02-02-90");
        puts (date );
    }

You declared date to be a pointer (sizeof(date) is four). The linker matched this reference to the date variable defined in the other file as an array of char. Referencing date in foo() uses the values in the first four bytes as an address, and the strcpy in foo() copies "02-02-90" to that address.

The strcpy in main() placed values in the first four bytes of date. Those bytes, as ASCII characters and in hexadecimal, are:

    ASCII   0    1    -    0
    HEX     30   31   2D   30

Since the PC stores addresses in reverse byte order, strcpy() will be passed 0x302D3130. If you are lucky, this address is in your data space, and you will simply wipe out nine bytes of some trivial variables. If you are unlucky, the address is in your code space, and your instructions will change.

Using function prototypes (even ones without parameter indications) and including them in all your source files will make the compiler check the function return types. It can then check that the function is actually defined using the same type as the prototype.
Similarly, if you set up a header file of extern declarations for all your external variables and include the header in all your source files, the compiler will check that your actual definition of each external matches the references to it.
Along these lines, I would have a header file called extern.h which contains the code in Listing 3. The other files would #include this file.
Since you #include extern.h in the file where the variable is defined, changing the definition without changing the header file produces a compilation error for that file.
An easier way to check external function and data definitions, and the references to them, would be to have the compiler create a symbol table that includes function return types, parameter types, and external data types. The compiler already gathers this information, so the check could be performed without the added baggage of including the source headers in every file.
The checking algorithm that the compiler already performs on function prototypes could be employed in a program that reads the symbol files for all object files that are to be linked. Function prototype checking and external data type checking could occur without having to write extra source code specifically for that purpose.
Replies
Stringizing
As many have decried, there's plenty in the C language with which to pervert the meaning of "elegance", such as the ternary "?:", taking the value of assignments, the wonders of the comma operator, etc. With so much density permitted by the language proper, why waste time trying to get cute with the primitive C preprocessor?

I refer, of course, to Josh Cohen's letter (CUJ 3/90 "Q&A", p 34). I see things very differently than you do. What you call the "uglier" solution is in fact the simplest and therefore (dare I say it?) the most truly elegant.
What's so painful about defining not one but two symbolic constants in this situation? I do it routinely, myself, where I know I'll need a numeral and a string. I make it easy mnemonically by using the same symbol but with a postfixed q for the "quoted" version, e.g.
    #define MAX    10
    #define MAX_q  "10"

and then get off and running. Let us remember that related #defines are usually placed in a common header location, so how horrible is it to do two of them? I mean, unless your program has hundreds of symbols that have to be available in both quoted and unquoted form, what's the big deal? Are you using EDLIN or something?

And who even needs "plenty of comments"? I know for my own use, and could document it easily for others, the one simple commentary fact that "In the modules comprising this program, any preprocessor symbol ending with lower case q expands to the enquoted expansion of a similar symbol written without the q."
When it comes to the code that depends on such symbol pairs, things get prettier, not uglier (i.e., you don't see one macro symbol as a parameter to another). I expect the preprocessor runs slightly faster, to boot, without going through substitution contortions. Certainly, the program runs much faster indeed than if we run sscanf() each time! (How could you even suggest it?!)
Diehards who refuse to accept the preprocessor's limitations may wish to switch to Turbo C, which at least at version 2.0 sports the (we now know non-ANSI-conforming) preprocessing "feature" of expanding tokens after the stringizer, i.e., under Turbo they can indeed code:
    #define BOZO Clown
    #define STRINGIZE(X) #X

    printf("Bozo the %s", STRINGIZE(BOZO));

and get what the Standard does not provide. Then they can complete their clever machinations with

    #if !defined(__TURBOC__)
    #error PORT FAILURE: Stringizing \
    macro contraindicated!
    #endif

and be the envy of the neighborhood.

J. A. Jaffe
Walnut Creek, CA

Your comments are interesting. If I only had a couple of symbols, I'd probably do it your way. However, I like to make only one change. If two things need to change (even if they are one line apart), then according to Murphy's law, only one of them will be changed.
I would like to thank William M. Raike of Auckland, New Zealand, who also sent a reply regarding this topic. He uses Turbo C, which incorrectly does the expansion first. It is a bug, according to ANSI, rather than a feature. You people who are using it, would you rather have Borland leave it alone or fix it? (KP)
Function Redeclarations
In replying to the first inquiry by Mr. Glenn Jordan in the April CUJ, you failed to address one of his points, which was the "Error 10: type mismatch in redeclaration of getarray" issued by Turbo C. The reason for the error is that when main () called getarray (), Turbo C built an implicit forward declaration for getarray (), which gave the function the implied return type of int. When getarray () was later defined as type void, the compiler perceived a conflict.

Of course, the other compilers Mr. Jordan tried should have issued the same error (unless, perhaps, ANSI has downgraded it to a warning; even then, the compiler should have said something). Note that had the getarray () function definition appeared before its first call, there would have been no error, since the definition of a function serves as its declaration if and only if the definition occurs before the first time the function is called (and no explicit forward declaration is present). Tricky business!
John Lowenthal
Brooklyn, New York
sizeof Operator
Your answer in the April C Users Journal to Glenn Jordan's question concerning sizeof could have been more concise had it stressed that sizeof is a compile-time operator, not a run-time function call as it appears to be (ref. K&R Rev 2, page 135). The fact that it is a compile-time operator makes it obvious that it cannot be used inside a called function to determine quantities like the size of a passed array.

The answer to his question is that a function cannot get an array's allocated size; it must be passed this information.
Incidentally, I enjoy your column very much and find it to be of great value. Keep up the good work.
Harry N. Bearman
Fort Worth, TX
Keyboard Interrupt Processing
Mike Drew wanted to know how to do interrupt processing when a key is pressed. You suggested that he chain int 0x16. This will not cause his toggle to be set until some later time when he tries to read a keystroke from BIOS. If he chains int 0x09, the toggle can be set immediately, and the spurious keystrokes will never get into the buffer. The only drawback is that you are running off the hardware scan code, so the key is otherwise unavailable.

An outline for the package in Turbo C follows (Listing 4). I have also attached a simple Turbo C routine (Listing 5) I have used to map the scan codes on my keyboard. This works in the described manner, but does not chain the old interrupt.
(Note: Bulletproof use of this technique requires capturing other interrupts to ensure that the old 0x09 interrupt is restored when the program exits, or placing the new interrupt handler in a TSR or device driver. In the latter case you don't need install and restore routines. Also, if you screw this up, it's reboot time.)
Joseph W. Gibson
Pasadena, CA
Of Mice And Men
Here are additional replies to the question on the IBM PC mouse. I have included both listings, as I find it instructive to see how two people tackle the same problem.

In the March Q?/A! column of CUJ, Michael Wiedmann asked how to determine whether a mouse was connected to a specific COM port. He went on to mention using int 0x33, function 0, so I presume that he is using the Microsoft mouse and driver. I am not exactly sure what he wants to determine. Your answer appeared to assume that he meant, "Is a mouse actually attached to the appropriate port?" Function 0 should detect that for him. If, instead, he means, "Which port is allocated for mouse support?", there is a different function to determine that. The Microsoft interrupt function 36 (decimal) returns the following information:
1. Which type of mouse is being supported (serial, bus, InPort, PS/2, or HP);
2. Which version of the mouse driver is installed;
3. Which IRQ is used by the mouse driver.
The last point is the relevant one for determining which COM port the mouse uses. However, there are two potential "gotchas". First, the PC uses IRQ 3 for COM1 and IRQ 4 for COM2, while the AT uses IRQ 4 for COM1 and IRQ 3 for COM2. So you have to find out which machine is being used. Address 0xFFFFE in the BIOS identifies the machine as follows:
    0xFC is an AT
    0xFD is a PCjr
    0xFE is an XT
    0xFF is a PC

[from The Programmer's Problem Solver by Robert Jourdain, Brady Books, ISBN 0-89303-787-7]

The second possible pitfall is that very old mouse drivers do not support function 36. My original Microsoft Mouse User's Guide has functions 0-16 and 19 only. I don't know when function 36 was added.
As a minor point, I should add that it is possible to configure a bus mouse to use COM1 or COM2 IRQs without their physically being attached to the appropriate COM port. But, if the system is configured with an IRQ conflict, detecting the mouse is the least of your problems.
Pages 203 and 204 in the Microsoft Mouse Programmer's Reference [Microsoft Press, ISBN 1-55615-191-8] cover this function. I strongly recommend the book to Mr. Wiedmann.
Thomas R. Clune
Boston, MA

This is a response to the question from Michael Wiedmann in the March 1990 Q?/A! column. He asked about determining which COM port on a PC compatible is used by a serial mouse. This can be accomplished with Interrupt 33H Function 24H (Get Mouse Information), one of the more recent functions added to this interrupt. Microsoft originally used a MOUSE.SYS driver loaded through CONFIG.SYS during boot, with the following old load syntax:

    DEVICE=\path\MOUSE.SYS

Some time back (I believe it was about two years ago) Microsoft changed to a MOUSE.COM driver which is loaded as a TSR. This driver can be placed in AUTOEXEC.BAT, but this is not required as long as it is loaded before an application program needs it. The MOUSE.COM driver can be removed from memory (MOUSE OFF), while the older MOUSE.SYS was permanent. The new load syntax for MOUSE.COM is simply:

    MOUSE
The code provided in my enclosed listing (Listing 6) includes a function which calls INT 33H Function 24H and interprets the results. This interrupt returns information about the interrupt line used by the mouse. PC compatibles have a well-defined relationship between port addresses and interrupts for ports COM1 and COM2. Unfortunately, no such relationship is guaranteed for additional serial ports such as COM3 or COM4. Therefore, the information provided here is only useful for COM1 and COM2. Port COM1 uses interrupt IRQ4, while COM2 uses IRQ3. The mouse_info() function returns the decoded COM number.

The mouse type (serial, bus, etc.), interrupt line, and mouse driver version number are also returned by mouse_info(). If the mouse driver is not loaded, the interrupt vector points to a return instruction, and no registers are changed. All returned values from mouse_info() will then be zero. Please note that the REGS union and int86() function are extensions to ANSI C available in Microsoft C and QuickC. Similar methods of calling 8086 family interrupts and communicating between registers and C variables should be available in other C implementations for the PC.
Bill Byrom
Irving, Texas

This is an answer to your request in the March 1990 issue, concerning the question by Michael Wiedmann of West Germany on page 37.
To determine if a mouse is using a serial COM port, and which one, I have provided the listing mcheck.c (Listing 7) . The mcheck routine first calls the mouse_status() routine to determine if a mouse driver is present. The mouse_status() routine will call interrupt 33 (hex) with a function code of zero to get the mouse status. If the mouse driver is available, then the mouse_status() routine will return TRUE (1) and report in the mouse_info structure the number of buttons the mouse has. If the mouse driver is not present, then the mouse_status() routine will return FALSE (0).
If a mouse driver is present, then the mouse_config() routine will be called to get the configuration of the mouse. This routine will call interrupt 33 (hex) with a function code of 24 (hex) to obtain the mouse configuration. The mouse_config() routine will fill in the mouse driver revision, the mouse type, and the Interrupt Request Level (IRQ) in the mouse_info structure. The m_info.m_type has a value in the range from one to five as currently defined in the Microsoft Mouse specification. All other values are considered unknown by this routine. See the mouse_type character array for valid mouse types.
Once the mouse information is obtained, the next step is to determine whether the mouse is a serial one and, if so, which COM port it is using. If m_info.m_type has a value of two, which means serial mouse, then the which_port() routine is called to find out which serial interrupt vector the mouse is using. The which_port() routine compares the COM1 and COM2 interrupt vector segments against the mouse driver's interrupt vector segment using the far pointers com1_vec, com2_vec and mouse_vec. If neither vector matches, then which_port() will return zero; otherwise it will return one (1), indicating that the mouse is using COM1, or two (2), indicating that the mouse is using COM2.
The mcheck.c routine was tested under Microsoft's QuickC v2.0, Microsoft's C v5.1 and Borland's Turbo C v2.0. Tests were run against Microsoft's Mouse Driver v6.36 in both Bus and Serial modes and against Mouse Systems Mouse Driver v5.03 in serial mode. It should be noted that the Mouse Systems driver I was using appears to have some problems when calling interrupt 33 (hex) function 24 (hex). It reported a driver major version of zero and an IRQ level of 89. It sometimes even reported a mouse type of zero instead of two.
To avoid mouse driver incompatibilities such as those described above, simply call the mouse_status() routine to see if a mouse driver is present, and if so, call the which_port() routine directly to see if the mouse is using a serial port.
David W. Gunnell
Leesburg, VA
Floating Point Format
I received four replies to this question. In addition to the ones below, I also heard from Michael Peppler of Geneva, Switzerland. (KP)

This is in regard to the letter from Finnbarr P. Murphy in your C Users Journal column in the May 1990 issue. He wants to know about converting floating point numbers in the MS BASIC format to the IEEE format.
This is a familiar problem to people who program in compiled BASIC such as Microsoft QuickBASIC. While I haven't used QuickBASIC in the last few years, I am aware of the solution provided by Microsoft. Versions 3.0 and higher of QuickBASIC handle floating point numbers internally in the IEEE format, but functions are provided which allow you to read a file containing MS BASIC format floating point numbers. These functions automatically convert the numbers to the IEEE format. You can then output the numbers to another file in the IEEE format so that in the future you have to deal only with the IEEE format. Page 320 of the QuickBASIC Programming in BASIC manual presents a sample program for converting a file from one format to the other.
Perhaps more importantly for the C programmer, the library for Microsoft C contains functions fmsbintoieee and dmsbintoieee which convert single and double precision numbers respectively from the MS BASIC format to the IEEE format. According to the excellent book Microsoft C Bible by Barkakati, published by Howard Sams & Co., these functions are in the libraries for MS C v4.0 and up. They are also available in Microsoft QuickC.
I hope this information is of use to you and Mr. Murphy.
Bernard H. Robinson, Jr.
DeBary, FL

Prior to QuickBASIC 4.0, all versions of BASIC (interpreters and compilers) stored floating-point numbers in Microsoft Binary Format.

Microsoft has provided functions in their C library to convert between this format and IEEE format. They are as follows:
dmsbintoieee Converts Microsoft binary double-precision to IEEE format.
fmsbintoieee Converts Microsoft binary single-precision to IEEE format.
dieeetomsbin Converts IEEE double-precision to Microsoft binary format.
fieeetomsbin Converts IEEE single-precision to Microsoft binary format.
Vince Du Beau
Carteret, New Jersey

I'm no BASIC expert (or even a C expert), but I may be able to help with Finnbarr P. Murphy's question about BASIC floating point numbers in the May C Users Journal.
Microsoft C uses the IEEE format for floating point numbers. Microsoft QuickBASIC before v4.0 (and I think, but am not sure, their other implementations, including the interpreters) used the so-called Microsoft Binary Format (MBF). Starting with QB 4.0, the programmer could choose either MBF or IEEE. The latter was the default. The QB 4.0 manual entitled Learning and Using Microsoft QuickBASIC, Appendix B, pp. 247-252, discusses this. Pages 132-133 of the BASIC Language Reference cover the functions CVSMBF and CVDMBF, which can be used to convert MBF to IEEE.
The encoding of MBF is given in the MASM 5.0 Programmer's Guide, pp. 133-134. As in IEEE format, zero is represented by all bits clear, and for non-zero values the integral portion of the mantissa is assumed to be one and is not stored. Short real format is encoded as follows: bits zero through 22: fractional part of the mantissa; bit 23: sign bit (set means negative); bits 24-31: exponent, biased by adding 81h to the "real" exponent. Long real format: bits zero through 54: fractional part of the significand; bit 55: sign bit (set means negative); bits 56-63: exponent, again biased by adding 81h to the "real" exponent.
The easiest solution to the problem would probably be to write a BASIC program to read the files, convert the numbers to IEEE using CVSMBF or CVDMBF, and write the records back out. A combined C/BASIC program could also be concocted. Since the program would have to start up in BASIC (see the sections in the manual on mixed-language programming), this solution would probably be more trouble than it is worth.
I enjoy your columns each month and never fail to learn from them. I hope this has helped you and Mr. Murphy.
Howard C. Sanner, Jr.
Bladensburg, Md.