Dear Mr. Plauger,

I have just read Mr. Kolias' letter in the December issue concerning Borland's BGI stroked fonts. There is a free font editor available directly from Borland: if you are a registered user of one of Borland's compilers, you can ask Borland to send you the "BGI Developer's Disk." You need to specify your registration number. The German subsidiary, Borland GmbH, has shipped these disks free of charge. Borland America may have a different policy, but I don't think so. The disk contains a font editor, some new fonts, some new drivers (including VGA 256-color and Hercules), and some example programs. If you have problems getting the disk, try:
Borland GmbH
Lindwurmstrasse 88
W - 8000 Muenchen 2
GERMANY
Tel. +49 89 720100

I can be reached using wilhelms@uniaug.de or
Gerhard Wilhelms
Universitaet Augsburg
Lehrstuhl fuer Informatik I
Universitaetsstr. 2
W - 8900 Augsburg
GERMANY

PS: Thanks for your fine magazine!
Thank you. pjp
Dear CUJ,
I am starting to wonder if I made a mistake when I recently renewed my subscription to this magazine. It is disheartening to see how casually authors attribute errors to compiler bugs. In each issue there are at least two or three problems dismissed as compiler bugs that, with a minimum of research, can be shown to be user errors. I would have thought that the editorial staff would eliminate such unfounded remarks, or at least require the author to provide some evidence of the supposed bug. It worries me when I read an article full of phrases such as "the compiler seems to..." and "presumably."
Instead, we are subjected to articles such as Stuart Baird's "Using Large Arrays in Turbo C" peppered with mistaken compiler bug claims and erroneous suppositions. I use Turbo C daily and have done some difficult UNIX-to-MS-DOS ports. My main difficulties in such ports are the UNIX assumptions about arrays and memory architectures, so I have encountered just about every segmentation related problem at least once.
Given the declaration

int a[20000], b[20000];

there is no memory model that will handle this as we might assume it would. In Turbo C, static data is limited to a total of 64K except in the huge memory model, where each source file may contain up to 64K of static data. Since this source file has more than 64K of static data, it cannot be compiled correctly by Turbo C. This is documented behavior, so I'd hardly call it a bug. A major inconvenience brought about by the CPU, but definitely not a compiler bug.

Incidentally, this could be done by creating two source files, each containing one of the declarations above, and compiling under the huge memory model. Each source file then has less than 64K of static data, though the total amount of static data is well over 64K. If the declaration above is made outside the scope of a function, or inside a function but declared static, the compiler will generate the error message "Too much global data defined in file...". However, if the same declaration appears within a function and is not static, no error is generated, because the data is no longer allocated in the global data area but on the stack at each function invocation. Such a declaration should generate a similar error at compile time, since stack space is always limited to 64K. It cannot truly be considered a bug, as there are no guarantees about available stack space: a deeply recursive function with one byte of local data can run out of stack space. I think, however, Borland did overlook the cases which could be caught at compile time.
Another mistaken claim is that the following code uses long arithmetic in the array subscript operator:

long i;
unsigned long count = 100000UL;
char far *a;

a = (char far *) farmalloc(count);
for (i = 0L; i < count; i++)
    a[i];

The array subscript operator takes two operands. One is of course a pointer expression; the other is an integral expression. The documentation says, "When an integral type is added to or subtracted from a 'pointer to type,' the result is also of type 'pointer to type.'" It also says that the type of the integral expression is unsigned int, unless the pointer is a huge pointer, in which case the integral expression is signed long. Of course, if the pointer is huge, normalization also occurs.

Since a is not a huge pointer, i is converted to unsigned int, added to the far pointer, and the result is a far pointer which points someplace in the same 64K segment as a does. Had a been declared a huge pointer, the example would work as expected, as the author points out. Incidentally, unless a is a huge pointer, the code above will produce the warning message "Conversion may lose significant digits" if warnings are enabled.
Turbo C can catch many of the subtler problems in user code if all warnings are enabled. This is precisely why I always compile with warnings fully enabled, and correct and/or confirm all code which generates warnings.
As for the "small bug" in printf in the huge model, my tests failed to produce anything but correct results. I am using Turbo C++ in the C compilation mode. Earlier versions might well produce different results, but I do not recall any such problem, and I have used Turbo C 1.5 and 2.0 before switching to Turbo C++.
While the article contained useful information and caveats, these repeated casual, unsupported references to supposed compiler and library bugs soured it for me. Mr. Baird is not alone, nor can he truly be faulted. I'd reword an old saying: if I had a dime for every time someone "discovered" a compiler bug that was actually bad user code, I would be a very rich man.
Michael S. Percy
420 Gallo Way
Seneca, SC 29678
grimlok@hubcap.clemson.edu
(803) 885-1132

Where were you when I was selling C compilers? If half my customers were as tolerant as you, I'd have a lot more dimes in my pocket today. I remember telling users where the manual let the compiler off the hook. It did no good. A user expecting an error message wants an error message. As far as the unhappy user is concerned, a missing error message is a compiler bug.
I too use the C part of Turbo C++. It is a fine product. It also contains bugs. That inclines me to believe an author who states that bizarre behavior is "probably" a compiler bug. I too am wary of an offhand tendency to blame the tools for what may well be a program bug. That inclines me to demand supporting arguments. In the end, I simply accept the fact that one man's bug is another man's feature. pjp
Dear Mr. Plauger,
I am concerned about the article "Automated Software Testing" in the February 1991 C Users Journal. My concern is not with Mr. McLaughlin but with the review process that the C Users Journal uses for its articles. Mr. McLaughlin had the difficult job of reducing many years of testing experience into a three-page article.
My first concern is about the third paragraph in the section "Testing All the Code" (p. 105):
"... The only way to ensure that code does not break down in an unexpected manner is to use a profiler, [...], to ensure that every line is being exercised, that all calls to all routines are used. If this is not done then a program that passes a test plan may break down in the field because a subroutine was not tested under all conditions."
Exercising every line of code does not mean that the code has been tested under all conditions. Example: I can exercise every line of Listing 1 on page 107 with the letter d. Further, there are other ways to help ensure that the code does not break down in unexpected ways; for example: code inspections or code reviews. Both code reviews/inspections and testing need to be used to minimize errors in software.
My second concern involves the last paragraph of the summary (p. 107):
"In this sample testing program more than 1,000 tests of the program were run."
The range of printing characters, defined by isprint, is from 0x20 to 0x7E: a total of 95 characters. Since the test cases are generated at random, I do not know whether each character has been tested at least 10 times, or whether the letter b has been tested 1,000 times. Running the test several times does not tell me that I have covered all of the possible printing characters. Can I have confidence in the testing process if I don't know what has been tested?
As The C Users Journal continues to publish more articles on software development practices, I feel you owe it to your readers (and their customers) to be both correct and sound in the practices that are shown in your articles.
Sincerely,
Calvin Hertel
249 126th Ave. NW
Coon Rapids, MN 55433

Both of your examples are valid, but I think you're being a bit harsh. It is indeed an overstatement to say that using a profiler is "the only way" to ensure code quality. It is a necessary component in adequate testing, but it is not sufficient. Still, the other methods you cited are simply alternate ways to ensure that all corners of the code get reviewed.
Testing a program by generating random test cases does raise the confidence level. You can't always sniff out those chunks that have a finite set of inputs and test them exhaustively. Even if you do, you raise the prospect that an error in these specialized tests will overlook one or more inputs. Just don't assume that random testing raises confidence in proportion to the number of inputs generated.
I am glad you appreciate the difficulty of reviewing so much material in a small amount of space. It is tough on the reviewer, the editor, and the stuff getting reviewed. Nevertheless, reviews provide an important service to readers. We can only try to keep doing this tough job as fairly as possible. In that light, I accept your remarks as constructive criticism. pjp
Dear Sirs,
I would like to make some comments concerning the questions of Mr. Ken Yerves about the location of specific data in an .EXE file (CUJ, December 1990).
Each .EXE file begins with a specific file header containing information that can help determine the location of data in the file. You can see part of this file header in the following structure:
struct EXE_HEADER {
    char MZ[2];            /* "MZ" signature for .EXE files */
    unsigned last_sector;  /* length of last used sector in file */
    unsigned file_size;    /* size of file, including header, in 512-byte pages */
    unsigned reloc_count;  /* number of relocation table items */
    unsigned header_size;  /* size of header in 16-byte paragraphs */
} exe_header;

After opening your .EXE file in binary mode, you can read this header information:
fread(&exe_header, sizeof(exe_header), 1, file_ptr);

Then you can position the file pointer. The following two macros can simplify the code (assuming the data you are looking for is a structure variable named config_data):
#define MK_FP(s,o) \
    ((void far *)(((unsigned long)(s) << 16) | (o)))
#define POSITION \
    ((char huge *)&config_data \
     - (char huge *)MK_FP(_psp, 0x100))

fseek(file_ptr, POSITION, SEEK_SET);
And there you are! Just another fread will get your data from your .EXE file without searching for a unique key:

fread(&config_data, sizeof(config_data), 1, file_ptr);

After reconfiguring your application and before exiting, you can write your config_data back to the .EXE file in the same manner.

Yours sincerely,
Michael Wiedmann
Innsbrucker Str. 35
D-1000 Berlin 62
Germany

Thanks. pjp
Dear CUJ:
I was interested in the article "A Login Shell for MS-DOS" by Leor Zolman in the February 1991 issue of CUJ. However, in implementing his code on my system, I ran into several problems for which I propose solutions, along with a few suggested improvements.
The first concerns his recommendation that the last line of the user's login script (i.e., batch file) specify the login command itself. The reason is to maintain security by returning the user to the LOGIN prompt instead of the DOS prompt. While the reason is sound, its implementation is not. As given, the login script is executed using the system() function, which loads a copy of COMMAND.COM after the current invocation of LOGIN. When the last line of the script specifies login, LOGIN is invoked again, leaving the first copy of LOGIN "resident." One can easily see that each time the user executes a script file, memory will be quickly chewed up as each copy of LOGIN loads after the previous one.
One solution to this would be to execute the script file by loading COMMAND.COM as a child process so that it overlays the parent LOGIN.EXE at the same memory location. The code in Listing 1 demonstrates.
Instead of the command login on the last line of the script file, as Mr. Zolman suggests, I use the DOS command exit, which returns execution to AUTOEXEC.BAT, from where LOGIN was originally invoked at boot-up. However, instead of ending AUTOEXEC.BAT and returning to the DOS prompt, simply execute LOGIN from within an endless loop. This ensures that LOGIN will reload when the script file terminates. The following fragment from AUTOEXEC.BAT shows how:
:NEXT
LOGIN
GOTO NEXT

To give yourself "superuser" status and the ability to get out of your own trap, write a special batch file (protected by a password) that loads COMMAND.COM again. This will put you back at the DOS prompt, which will give you full access. Typing exit at the DOS prompt will reload LOGIN again. (For a bit of additional security, change the attributes of all login script files to "read-only" and the password file to "hidden/read-only.")

In giving my adaptation of LOGIN a final polish, I deleted zgetch(), zputs(), and zgets(), and instead used the DOS Input String function as implemented by cgets() and getpass() in the Turbo C library. Using these functions lets you limit cursor travel to 8 bytes (the length of a file name) and provides "full" DOS editing capability. In addition, getpass() does not echo characters to the screen when a user types a password.
I hope these comments will help others trying to implement LOGIN for themselves. My thanks for an enjoyable and useful magazine and to Mr. Zolman in particular for a great idea.
Sincerely,
Thomas Nelson
5004 W. Mt. Hope Rd.
Lansing, MI 48917

Leor responds:
Thank you, Mr. Nelson. I feel pretty stupid to have overlooked the stacking effect of repeated login invocations! And your usage of execlp is also a clear improvement. In examining your code, I discovered another oversight on my part: when using the cprintf() function, a "newline" is displayed by specifying \r\n, not just \n as I had done throughout the original program.
By the way, I don't actually use the login program under DOS (since most of my programming is under Xenix these days), and I guess it shows from those goofs!
I'll now offer an improvement based on your login-loop concept. I would expand the loop at the end of AUTOEXEC.BAT to the following:
:next
if not exist login.on goto :end
cls
login
goto :next
:end

I would then create a dummy file in the root directory named LOGIN.ON. Its presence enables the login program. In order to disable login completely, I'd log in as the super-user (as per your suggestion, where the super-user batch file contains an explicit invocation of the command processor), rename LOGIN.ON to LOGIN.OFF, then give the exit command. This will return control to the AUTOEXEC.BAT loop, whereupon the test condition will become true (LOGIN.ON not found) and AUTOEXEC.BAT will terminate completely, freeing up ALL available memory for the "super-user". To re-enable login, just rename the file back to LOGIN.ON. Leor Zolman

Editor:
The article "Complex Function Library," by Maynard A. Wright in your September 1990 issue is a prime example of misdirected programming efforts. In this article, Mr. Wright goes to great pains to find the most efficient way to program some complex transcendental functions via standard "textbook algorithms" when, in fact, his time might have been better spent in developing improved algorithms.
In Table 1 of this article the author lists the sizes and run times of three functions for calculating the hyperbolic sine (csinh) when the argument is complex. The three functions differ in the way they handle arguments and return values. From this table we learn that, although the .obj and .exe sizes vary by only a few percent, the run times vary by ten percent, from the slowest to the fastest of the three functions.
My point is that while some time can be saved through careful programming, often much more time can be saved by using a more efficient algorithm. Let me use the same function, csinh, as an example. Wright's method for csinh follows:
#include <math.h>
. . .
double arg_real, arg_imag, sinh_real, sinh_imag;
. . .
sinh_real = cos(arg_imag) * sinh(arg_real);
sinh_imag = sin(arg_imag) * cosh(arg_real);

While the code above is mathematically correct, it is very inefficient. Both the trigonometric and the hyperbolic functions are expensive to calculate; essentially all of the calculation time is spent in the four library function calls. In much less time than it takes to write the code for the three variations of csinh in the article, the writer could have cut the execution time by 40 percent by noting that both sinh and cosh are combinations of exp(arg_real). In fact, it is likely that the library routines for these functions actually call the library exp function. By writing the hyperbolic functions in terms of exponentials (after all, the hyperbolic functions are nothing but shorthand for combinations of exponentials), one can calculate csinh as follows:
#include <math.h>
. . .
double arg_real, arg_imag, sinh_real, sinh_imag, a, b;
. . .
b = 1.0 / (a = exp(arg_real));
sinh_real = cos(arg_imag) * 0.5 * (a - b);
sinh_imag = sin(arg_imag) * 0.5 * (a + b);

The number of library calls is reduced from four to three, but one of the three is the exponential function, which, as mentioned above, is probably called indirectly by both sinh and cosh in Wright's method. I coded and timed the two methods and found the time savings to be about 40% in favor of the latter. Similar savings can be had for most of the other functions in Wright's paper.

Your journal serves an important function in educating C programmers, and I agree that programming style is important, especially for program maintenance. However, as the above example illustrates, it is almost always the case that a more efficient algorithm will save more computational time than the most effective programming techniques applied to an inefficient algorithm.
Yours truly,
Jim Sharpiro, Ph.D.
Professional Software, Inc.
2460 Hawthorne Ave.
Boulder, CO 80302

Your point is well taken, but you overlook an equally important point. Math functions must return accurate results for all valid inputs. I believe that Mr. Wright's functions do so. Yours does not. For small values of arg_real, your computation of sinh_real loses most or all of its precision, thanks to the subtraction of nearly equal values. For a band of large values, your approach suffers an overflow even though the function value is defined.
Here is a case where you are better off suffering the performance penalty and letting carefully crafted library functions do the hard bits for you. pjp