LETTERS

Example 1.

  (a)
  char * double2dollarascii (char *pbuff, double dollars);

  (b)
  printf("%s", double2dollarascii(&tempbuff, dollar));

  (c)
  #define va_start(ap,v) ap = (va_list) &v + sizeof(v)
  #define va_arg(ap,t) ((t _FAR_ *) (ap += sizeof(t))) [-1]
  #define va_end(ap) ap = NULL

  (d)
  typedef char *va_list;
  #define va_dcl int va_alist;
  #define va_start(list) list = (char *) &va_alist
  #define va_end(list)
  #define va_arg(list,mode) ((mode *) (list += sizeof(mode))) [-1]

This use would have the added advantage of being reentrant. Readers may recall that this strategy is often used for date and time. Oh bored computer science majors! Oh back room hackers!
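A minimal sketch of the caller-supplied buffer style in Example 1(a) and 1(b) follows. Only the prototype comes from the letter. The function body, the "$%.2f" format, and the DOLLAR_BUFF_SIZE constant are assumptions made for illustration, and snprintf requires a C99 library.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed buffer size; the letter does not specify one. */
#define DOLLAR_BUFF_SIZE 32

/*
 * Formats `dollars` into the caller's buffer and returns that buffer.
 * No static storage is used, so two calls with two different buffers
 * cannot overwrite each other's results.
 * Assumption: pbuff points to at least DOLLAR_BUFF_SIZE bytes.
 */
char *double2dollarascii(char *pbuff, double dollars)
{
    assert(pbuff != NULL);
    snprintf(pbuff, DOLLAR_BUFF_SIZE, "$%.2f", dollars);
    return pbuff;
}
```

Because the function returns the buffer it was given, the call can be nested directly inside printf, as in Example 1(b), provided the argument is a char array of at least the assumed size.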

The same article had an excellent discussion of functions with a variable number of parameters. However, I would like to warn your readers that the macros provided are not always portable. In Microsoft C 6.0 (large model), they are defined as in Example 1(c), while on my Unix (Sun3 4.2BSD) system, they are like the macros in Example 1(d). The two sets differ in how the list is started: va_start in Example 1(c) takes the last fixed parameter as a second argument, while va_start in Example 1(d) takes only the list and relies on the va_alist declaration supplied by va_dcl.
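One way to sidestep the difference is to include the compiler's own header rather than copying its macros. The sketch below assumes an ANSI C compiler that provides <stdarg.h>, whose va_start has the two-argument form of Example 1(c). The function name sum_ints is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * Adds up `count` int arguments that follow the fixed parameter.
 * The macros come from <stdarg.h>, so their expansion is whatever the
 * local compiler needs; nothing here depends on how va_list is laid out.
 */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;
    int i;

    assert(count >= 0);
    va_start(ap, count);            /* names the last fixed parameter */
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch the next int argument */
    va_end(ap);
    return total;
}
```

Older systems that offer only the <varargs.h> style of Example 1(d) would need a separate function definition using va_alist and va_dcl, typically selected with a preprocessor conditional.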

Rayaz Jagani

Sunnyvale, California

VGA BIOS's and Bias's

Dear DDJ,

I found Christopher Howard's article "Super VGA Programming" (July 1990) very enlightening. I particularly appreciate the time he spent to differentiate between the Tseng, Paradise, and Video Seven windowing schemes. I am exposed to all these boards daily and I just purchased the Orchid Technology ProDesigner II for my personal system.

I, too, agree with Mr. Howard that video board manufacturers should allow for a method of identifying the capabilities of the board. Compliance with the Video Electronics Standards Association (VESA) standards will provide for much easier identification of video adapters and their supported resolutions. I refer the reader back to the excellent article in the April 1990 issue of Dr. Dobb's Journal, "VESA VGA BIOS Extensions," by Bo Ericsson, or directly to the Super VGA BIOS Extension from the VESA committee (October 1, 1989).

No doubt my next statement will be considered as launching the first volley in this round of the Programming Styles War. I found Mr. Howard's method of determining the type of VGA chipset rather archaic. It was not because of his choice of assembly language over C, but rather because of his method of coding for string compares to avoid static data (see Listing Three, page 84, DDJ July 1990).

Mr. Howard's coding style violates the primary maxim of engineering. He strays from the rule "Keep it simple and stupid," otherwise known as the KISS principle. Should Mr. Howard have to change his code in the months to come to reflect a change in case sensitivity of any of the strings for which he is searching (i.e., Paradise or Tseng), he would have to wade through the code to manually change each letter. Software engineering emphasizes better coding practices.

Assembly language definitely has its place and I do a rather large amount of mixed C and assembly language programming myself. Mr. Howard could have placed the strings in the code segment rather than in a separate data segment, and thus would have maintained the readability and maintainability that structured programming requires. He should have defined the strings between the EXIT macro found next to the label svQC_exit and before the ENDP directive in Listing Three on page 85. By declaring memory locations and defining their contents with the assembler directives ParadiseStr DB "PARADISE" and TsengStr DB "Tseng", Mr. Howard could have used the repeat string compare (repe cmpsb) to determine if any of the desired strings could be found in the video BIOS.

I also feel that some routines are better left to a higher-level language. Therefore, I chose to implement a method of determining the type of video adapter using C because more people are likely to be familiar with a higher-level language.
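The idea of keeping the signature strings as named data and scanning the BIOS for them can be sketched in C as follows. This is an illustration only, not the implementation referred to in the letter: the function names, the signature table, and the choice to pass in a copy of the BIOS bytes as a buffer (rather than reading adapter memory directly) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Signature strings named in the letter, kept in one place so a change
   in spelling or case means editing a single table entry. */
static const char *const signatures[] = { "PARADISE", "Tseng" };

/* Returns 1 if `sig` occurs anywhere in the first `len` bytes of `bios`. */
static int contains_signature(const unsigned char *bios, size_t len,
                              const char *sig)
{
    size_t n = strlen(sig);
    size_t i;

    assert(bios != NULL && sig != NULL);
    if (n == 0 || n > len)
        return 0;
    for (i = 0; i + n <= len; i++)
        if (memcmp(bios + i, sig, n) == 0)   /* case-sensitive compare */
            return 1;
    return 0;
}

/* Returns the first signature found in the buffer, or NULL if none match. */
const char *identify_chipset(const unsigned char *bios, size_t len)
{
    size_t k;

    for (k = 0; k < sizeof signatures / sizeof signatures[0]; k++)
        if (contains_signature(bios, len, signatures[k]))
            return signatures[k];
    return NULL;
}
```

A caller would first copy the video BIOS region into an ordinary array by whatever means the platform allows, then pass that array and its length to identify_chipset.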

Richard Heffel

The Networking Group

Hayward, California

Extending Optimal Extents

Dear DDJ,

I was quite interested in the article "Optimal Determination of Object Extents," by Duvanenko, Gyurcsik, and Robbins (October 1990). To test the algorithm, I implemented the code on my Apple IIGS computer, using the ORCA/C ANSI standard compiler (version 1.1 from Byte Works, Inc.). The IIGS has 3 megabytes of memory, uses software floating point (SANE), with 4 bytes for float and 10 bytes for extended (both IEEE format). Times in Table 1 are elapsed (same as CPU time on the IIGS). All optimization options available were used (a very limited set since there are no registers available for optimization or holding temporary results).

Table 1

                            Base      New     Percent  Base      New
   Machine         Items    MIN&MAX  MIN&MAX  Change   Average   Average
 ------------------------------------------------------------------------

 Apple
  IIGS float       100,000    228      171    -25.0%   .0022800  .0017100
  SANE MC68881     100,000    187      141    -24.6%   .0018700  .0014100
  Call MC68881     100,000     38       31    -18.4%   .0003800  .0003100

 IIGS
  extended         100,000    157      115    -26.8%   .0015700  .0011500
  SANE MC68881     100,000    126      104    -17.5%   .0012600  .0010400
  Call MC68881     100,000     78       70    -10.3%   .0007800  .0007000

 IIGS long
  integer          100,000     15       12    -20.0%   .0001500  .0001200

 IBM 3090
  300J          60,000,000     27       17    -37.0%   .0000004  .0000003

 IBM RISC
  S/6000        60,000,000     42       32    -23.8%   .0000007  .0000005

Due to memory size and processor speed (the IIGS is no match for the other machines in Table 1), I held the number of elements down to 100,000. Clock values were only accurate to one second (good enough for this comparison).

My initial run used float numbers. The new algorithm gives exactly the expected reduction in time. However, upon reading the SANE documentation, I discovered that all float numbers are internally converted to extended numbers before use. I made a second run using extended numbers (80-bit floating point IEEE format).
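The letter does not reproduce the algorithm itself. A 25 percent reduction is what one would expect from the well-known pairwise technique, which spends three comparisons on every two elements instead of four, so the sketch below assumes that this is the approach being timed. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: every element is compared against both the minimum and the
   maximum, for two comparisons per element. */
void minmax_base(const float *a, size_t n, float *lo, float *hi)
{
    size_t i;

    assert(a != NULL && n > 0);
    *lo = *hi = a[0];
    for (i = 1; i < n; i++) {
        if (a[i] < *lo) *lo = a[i];
        if (a[i] > *hi) *hi = a[i];
    }
}

/* Pairwise: order two neighbors first (one comparison), then test only
   the lesser against the minimum and the greater against the maximum,
   for three comparisons per pair of elements. */
void minmax_pairs(const float *a, size_t n, float *lo, float *hi)
{
    size_t i;

    assert(a != NULL && n > 0);
    *lo = *hi = a[0];
    for (i = 1; i + 1 < n; i += 2) {
        float lesser = a[i];
        float greater = a[i + 1];

        if (lesser > greater) {
            float t = lesser;
            lesser = greater;
            greater = t;
        }
        if (lesser < *lo) *lo = lesser;
        if (greater > *hi) *hi = greater;
    }
    if (i < n) {                    /* one unpaired element remains */
        if (a[i] < *lo) *lo = a[i];
        if (a[i] > *hi) *hi = a[i];
    }
}
```

When comparisons dominate the run time, as they do with software floating point, the elapsed time should track the comparison count closely; with faster arithmetic, loop control and data movement take a larger share and the gain shrinks.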

The second run removed all the time for converting numbers from 4-byte float to 10-byte extended for the comparison. The reduction is significant, at the expense of 600,000 bytes of memory (6 extra bytes for each of the 100,000 items). This points out that the floating point operations in SANE, here the comparisons, dominate both algorithms.

I made an additional run using long integers instead of floating point. This was done to determine the effect of using SANE compared to essentially the basic loop overhead. As we can see from Table 1, the loop itself, even with long integer operations, takes very little time when compared to the floating point operations. If this problem were solved using integer operations, other items, such as subscript calculations and movement of data, would be more important. Of course, this is still probably true when using floating point hardware for the operations. In this case, the algorithms themselves should be optimized with regard to the subscript calculations and data movement (if not automatically done by the compiler).

For optimal speed, I would choose to code this algorithm in assembly language (a practical necessity on the IIGS). In doing so, one often discovers how well (or poorly) a particular compiler generates code. A good compiler on poor hardware can sometimes do as well as a poor compiler on good hardware.

I took this comparison one step further on the IIGS. By installing the Floating Point Engine (FPE) from Innovative Systems, the numeric operations were performed by hardware floating point (MC68881 processor). This is the same chip used with the Motorola 68000. The chip is used by replacing SANE on the IIGS, or by directly generated compiler code (supported by the ORCA/C compiler). Table 2 provides the comparison for both. Note that FPE handles float (4-byte) operands directly without conversion to extended (10-byte) operands. For the directly generated compiler code, this provides the fastest speed without memory penalty. Note also that the performance improvement for the new algorithm is not as good when better hardware is used. The loops are no longer compare intensive, and the timing is more dependent on loop control code and data movement (an indication of code quality from the compiler).

Table 2

    Machine         C compiler  Memory  Co-proc?   float   CPU elap  Opts
 ------------------------------------------------------------------------

 Apple IIGS float  ORCA/C V1.1   3 meg     no     4 bytes    elap     all
  MC68881                                  yes

 IIGS extended     ORCA/C V1.1   3 meg     no     4 bytes    elap     all
  MC68881                                  yes

 IIGS extended     ORCA/C V1.1   3 meg     no     4 bytes    elap     all
  long integer

 IBM 3090 300J     C/370 V2    384 meg     no     4 bytes     CPU     OPT

 IBM RISC S/6000   RISC C      256 meg     no     4 bytes     CPU      -O
  Model 540

Finally, I have access to an IBM 3090 300J processor with 384 Mbytes of memory, and an IBM RISC S/6000 Model 540 processor with 256 Mbytes of memory. These computers are the fastest models available in their respective product lines. These processors were timed using 60 million items. My experience with the C compilers on these two machines is limited to this program. The IBM C/370 compiler had one optimization level (OPT), but it cut the time by more than half from the unoptimized run (NOOPT). The RISC C compiler also had one optimization level (-O), but it cut the time even more. This kind of reduction is expected with RISC C, since this compiler is specifically set up to generate optimal code for the RISC instruction set (especially overlapping instruction sequences for the instruction set processors).

Ken Kashmarek

Eldridge, Iowa