32-BIT FLOATING-POINT MATH

Taking full advantage of the 386 instruction set

Al Williams

Al is the author of DOS 6: A Developer's Guide (M&T Books, 1993) and DOS and Windows Protected Mode (Addison-Wesley, 1993). You can reach Al at 310 Ivy Glen Court, League City, TX 77573 or on CompuServe at 72010,3574.

It's no secret that floating-point calculations can slow program execution. For those PCs that have them, coprocessors can make a difference but, when compared to floating point, integer arithmetic is still faster. Libraries that emulate coprocessors are also available; however, these emulators are notoriously slow.

In this article, I present an alternative approach to floating-point math that takes advantage of 32-bit instructions. Even though I use the 80386 as an example platform, these techniques can be applied to other 32-bit processors as well. (You could even adapt the routines to run on 16-bit processors, but you would see some loss of performance.)

To use FPM (floating-point math), you'll need MASM or TASM to assemble the main source file. You can then use FPM from assembly, C, or C++ programs. Since FPM takes advantage of several 386 instructions, you'll also need a 386- or 486-based PC. One caveat: FPM is not as accurate as a coprocessor (or an emulation library) that uses an 80-bit word. Still, it is accurate enough for many programs.

Underlying Principles

When you do arithmetic by hand, you don't work directly with floating-point numbers. Instead, you calculate an integer answer and decide where to put the decimal point. To add, for example, you align the two numbers' decimal points, then add. The answer's decimal point remains in the same position. When multiplying by hand, you treat the numbers as integers, and place the answer's decimal point by counting the number of digits to the right of the decimal point in the two multiplicands. FPM uses similar procedures on binary numbers.

Using 32-bit integer arithmetic, the 386 can easily perform real-number math with reasonable accuracy. Since FPM uses binary numbers, there is no "decimal" point. To avoid confusion, I'll refer to the point in a FPM number as the "radix point."

Figure 1 shows how FPM stores floating-point numbers. The val field stores a 32-bit binary number. The scale field determines the location of the assumed radix point. The sign field is 0 for positive numbers, or 0xFF for negative numbers.

FPM uses two routines to adjust numbers. RADJ shifts numbers right until bit 0 is a 1, and LADJ shifts them left until the leftmost bit is a 1. These routines update the scale field by the amount of the shift so that the number's value doesn't change--just its representation.

Before adding two numbers, FPM must line up their radix points using the algorithm in Figure 2(a). FPM uses RADJ on both numbers to allow for as many significant digits as possible; see Figure 3. If the scales are then equal, the addition proceeds. If they are not, FPM adjusts the number with the smallest scale to the left until the scales are equal, or until a left shift would cause a 1 bit to fall off the end of the word. If the scales are still not equal, FPM will right shift the number with the largest scale. While some bits on the right may be lost, these bits have less significance than the bits on the left.

Figure 2: (a) Pseudocode for FPM addition; (b) pseudocode for FPM subtraction; (c) pseudocode for FPM multiplication; (d) pseudocode for FPM division.

  (a)

  if (x.value==0) return y

  if (y.value==0) return x

  right_adjust(y)
  right_adjust(x)
  if x.scale!=y.scale then
       A=number with larger scale
       B=number with smaller scale
       sdiff=A.scale-B.scale
       idx=left_bit_index(B)  /* find index of leftmost '1' bit */
       if 31-idx>=sdiff then
            B.value=B.value<<sdiff
            B.scale+=sdiff
       else
            B.value=B.value<<31-idx
            B.scale+=31-idx
            A.value=A.value>>sdiff-(31-idx)
            A.scale-=sdiff-(31-idx)
       endif
  endif
  if bit 31 of A.value or B.value is set then
       A.value=A.value>>1
       B.value=B.value>>1
       A.scale--
       B.scale--
  endif
  if A.sign==0xff A.value=-A.value
  if B.sign==0xff B.value=-B.value
  result.value=A.value+B.value
  if result.value<0 then
       result.value=-result.value
       result.sign=0xff
  endif
  result.scale=A.scale

  (b)

  if y.value!=0 y=-y
  jump to add routine

  (c)

  right_adjust(x)
  right_adjust(y)
  if x.value==0||y.value==0 return 0
  product=x.value*y.value /* product is 64 bit */
  shift product left until top 32 bits are all zero
  x.scale=x.scale-amount_shifted
  if bit 31 of product set then
       product=product>>1
       x.scale--
  endif
  result.value=product
  result.scale=x.scale+y.scale
  result.sign=x.sign^y.sign /* exclusive-or */

  (d)

  left_adjust(x)
  right_adjust(y)
  if x.value==0 return 0
  if y.value!=1 then
       idx=left_bit_index(y)  /* find index of leftmost '1' bit */
       divx=x.value           /* divx is 64 bit */
       divx=divx<<idx
       x.scale+=idx
       result.value=divx/y.value
  else
       result.value=x
  endif
  result.scale=x.scale-y.scale
  result.sign=x.sign^y.sign   /* exclusive-or */

Figure 3: Addition with and without right adjustment.

  Problem: Add 10110.100 and 10010.000
  (both binary) using 8 bits.

           With no adjustment:

             10110.100
           + 10010.000
           ___________
            101000.100

        Overflow!

           With right adjustment:

             0010110.1
           + 0010010.0
           ___________
             0101000.1

Subtraction is simple; see Figure 2(b). FPM inverts the sign of the second argument, calls the addition routine, and restores the second argument's original sign.

The 386's multiply instruction can directly act on the two numbers in the val fields. The scale of the result is the sum of the two original scales. Figure 2(c) outlines the multiplication algorithm. The MUL instruction places a 64-bit result in the EDX:EAX register pair. FPM shifts any significant bits out of EDX to the EAX register (and adjusts the answer's scale, of course). The DIV instruction divides a 64-bit number (in EDX:EAX) by a 32-bit number. To maximize precision, FPM shifts the divisor to the right until its bit 0 is a 1 (adjusting the scale, of course). If the divisor is equal to 1, the division is complete. Otherwise, FPM shifts the 64-bit dividend left by the number of significant bits in the divisor. This assures a 32-bit result with the maximum number of significant digits.

Next, the DIV instruction divides the two numbers. FPM then subtracts the divisor's scale from the dividend's scale to locate the result's radix point. Figure 2(d) illustrates this process.

Implementation

Listing One (page 76) contains FPM.ASM, the core functions for FPM. FPM.H (Listing Two, page 78) and FPMC.C (Listing Three, page 78) simplify using FPM from C programs.

The faddxy(), fsubxy(), fmulxy(), and fdivxy() functions provide the basic mathematical operations. To round out the library, two routines (fixtofloat() and floattofix()) perform conversions between FPM numbers and standard C floats.

Listing Four (FPMGRF.C, page 80) shows an example of FPM at work and pits FPM against the standard floating-point library. FPMGRF plots a line twice using a real-number coordinate system.

First, the standard library plots the line, then fpmline() plots the same line. Variations in color show where FPM and the math library compute different results. Table 1 contains the results of running FPMGRF.C on different machines.

Table 1: FPMGRF.C timing results. Values are in clock ticks (about 1/18 of a second) and represent averages over 12 executions.

                 Math Library   FPM
  ---------------------------------

  25-MHx 386
  without 80387          26      7

  25-MHz 386
  with 80387              4      7

Optimizations

FPM takes full advantage of the 386 instruction set. Of course, the 32-bit register width is a significant advantage, but other 386 instructions help, too.

FPM often shifts words until their rightmost (or leftmost) bit is a 1 (see LADJ and RADJ in Listing One). A conventional DOS program might use the code in Example 1(a) to right-adjust EAX. The 386-bit scan instructions, BSF and BSR, can considerably speed up this operation. BSF starts at bit 0 and returns the index of the first 1 bit that it finds. BSR is similar to BSF, but it starts with bit 31 (for a 32-bit operand). If there are no 1 bits in the word, the bit-scan instructions set the zero flag (ZF). Example 1(b) shows how to rewrite the right-adjust routine using BSR and BSF. This avoids the expensive looping and jumping that the first method requires.

Example 1: (a) Conventional DOS program that right-adjust EAX; (b) using the 386's BSF and BSR instruction to right-adjust EAX.

  (a)

  SLOOP:   TEST EAX,1
           JZ DONE
           SHR EAX,1
           JMP SLOOP
  DONE:

  (b)

  BSF  ECX,EAX
  SHR  EAX,ECX

FPM provides a C-language interface to the bit-scan instructions (bitscan() in Listing One). The prototype is: int bitscan(int dir, unsigned long lword);. If the dir flag is 0, bitscan() uses the BSF instruction. Otherwise, it uses BSR. The lword parameter is a 32-bit integer to scan. Unlike BSF and BSR, bitscan() numbers the bits from 1 (rightmost) to 32 (leftmost). If no bits are set in lword, bit-scan()returns 0.

FPM uses other 386 instructions in several places. For instance, Example 2 shows how the SHLD instruction shifts an operand left, filling the rightmost bits from its other operand and leaving 10FAH in EAX.

Example 2: Using the 386 SHLD instruction.

  MOV EAX, 10H
  MOV EBX, 0FA000000H
  SHLD EAX, EBX,8

Conclusions

If you are a C++ programmer, you have probably already thought about writing a class wrapper for FPM. By using C++'s operator overloading, you could make FPM calls with conventional C operators. In fact, an FPM class is simple to construct. Unfortunately, the constructor overhead that C++ imposes degrades FPM's performance. I have experimented with several class wrappers for FPM--all ran more slowly than the normal C float type. If FPM isn't faster than the math library, there is little point in using it.

FPM can provide your programs with a fast alternative for real-number arithmetic. It also shows that you don't have to run in protected mode to benefit from the power of a 386 or 486 CPU.



_32-BIT FLOATING-POINT MATH_
by Al Williams

[LISTING ONE]



;##########################################################
;# File: FPM.ASM                                          #
;# 386 Floating point package by Al Williams              #
;##########################################################
.MODEL         SMALL,C

; enable 386 instructions
.386
.DATA

; Number format
FPM_NUM        STRUC
SIGN           DB     ?            ; 0 for + FF for -
SCALE          DB     ?            ; position of decimal point
NUM            DD     ?            ; unsigned magnitude
FPM_NUM        ENDS

.CODE
; C  routines
PUBLIC         faddxy,fsubxy,fmulxy,fdivxy,bitscan

;************************************************
; Helper for C routines bitscan(dir,lword)
; if dir==0 do BSF on num
; if dir!=0 do BSR on num
; Returns index 1-32 or 0 if word was zero
bitscan        PROC
               ARG    DIR:WORD, LWORD:DWORD
               MOV    AX,DIR
               OR     AX,AX
               MOV    EAX,LWORD
               JZ     SHORT SCANFWD
               BSR    EAX,EAX
               JMP    SHORT BITSC0
SCANFWD:       BSF    EAX,EAX
BITSC0:        JNZ    SHORT BITSC1
               XOR    AX,AX
               RET
BITSC1:        INC    AX
               RET
bitscan        ENDP

;************************************************
; Adjust temp register @BX so that rightmost bit is 1
RADJ           PROC
; Find last set bit
               BSF    ECX,[BX].NUM
; If all zeros, jump to ZOUT
               JZ     SHORT ZOUT
; Shift right and adjust scale
               SHR    [BX].NUM,CL
               SUB    [BX].SCALE,CL
               RET
ZOUT:
               MOV    [BX].SCALE,0
               RET
RADJ           ENDP

;************************************************
; Adjust temp register @BX so leftmost bit is 1
LADJ           PROC
; Find first bit set
               BSR    ECX,[BX].NUM
; If all zero's goto zout (above in proc RADJ)
               JZ     ZOUT
; 31-Bit_index -> # places to shift
               MOV    CH,31
               SUB    CH,CL
               MOV    CL,CH
; Shift number and adjust scale
               SHL    [BX].NUM,CL
               ADD    [BX].SCALE,CL
               RET
LADJ           ENDP

;************************************************
; Add x+y->x
faddxy         PROC   USES SI
               ARG    ANSOFF:WORD,ANSSEG:WORD,XARG:FPM_NUM,\
                      YARG:FPM_NUM
; Check for zero addition (x=0)
               MOV    EAX,XARG.NUM
               OR     EAX,EAX
               JNZ    SHORT YZA
               MOV    EAX,YARG.NUM
               MOV    CH,YARG.SIGN
               MOV    CL,YARG.SCALE
               JMP    DOADD3
; Check for y=0
YZA:           MOV    EAX,YARG.NUM
               OR     EAX,EAX
               JNZ    SHORT NZA
               MOV    EAX,XARG.NUM
               MOV    CH,XARG.SIGN
               MOV    CL,XARG.SCALE
               JMP    DOADD3
NZA:
               LEA    BX,YARG
               CALL   RADJ
               LEA    BX,XARG
               CALL   RADJ
; if eq then ok
               MOV    CL,XARG.SCALE
               MOV    CH,YARG.SCALE
               CMP    CL,CH
               JE     SHORT DOADD
; if xscale>yscale ...
               LEA    SI,YARG
               JG     SHORT PLUSADJY
               XCHG   CH,CL
               XCHG   BX,SI
PLUSADJY:
;    amt=xscale-yscale
               SUB    CL,CH
;    find top bit in y
               BSR    EAX,[SI].NUM
               MOV    CH,31
               SUB    CH,AL
               CMP    CH,CL
;    if 31-topbit>=amt shift y left by amt; yscale+=amt
               JL     SHORT PLUSELSE
               SHL    [SI].NUM,CL
               ADD    [SI].SCALE,CL
               JMP    SHORT DOADD
PLUSELSE:
;    else ...
;       shift y left by 31-topbit; yscale+=31-topbit
               XCHG   CH,CL
               SHL    [SI].NUM,CL
               ADD    [SI].SCALE,CL
               XCHG   CH,CL
               SUB    CL,CH
;       shift x right by amt-(31-topbit); xscale-amt-(31-topbit)
               SHR    [BX].NUM,CL
               SUB    [BX].SCALE,CL
DOADD:
; Make certain bit 31 of each number is off and keep scales equal
               MOV    EAX,XARG.NUM
               MOV    EBX,YARG.NUM
               MOV    CL,XARG.SCALE

               MOV    EDX,EAX
               OR     EDX,EBX
               JNS    SHORT SETSIGN
               SHR    EAX,1
               SHR    EBX,1
               DEC    CL
SETSIGN:
               XOR    DH,DH        ; Set positive flag
               MOV    DL,XARG.SIGN
               OR     DL,DL
               JZ     SHORT DOADD1
               NEG    EAX
               INC    DH
DOADD1:        MOV    DL,YARG.SIGN
               OR     DL,DL
               JZ     SHORT DOADD2
               NEG    EBX
               INC    DH
DOADD2:
                                   ; Assume positive sign in CH
               XOR    CH,CH
               ADD    EAX,EBX
               JNS    SHORT DOADD3
               OR     DH,DH
               JZ     SHORT DOADD3 ; Large positive number
               NEG    EAX
               NOT    CH
DOADD3:
               PUSH   DS
               LDS    BX,DWORD PTR ANSOFF
; Store sign, scale, and value
               MOV    [BX].SIGN,CH
               MOV    [BX].SCALE,CL
               MOV    [BX].NUM,EAX
               POP    DX
               RET
faddxy         ENDP

;************************************************
fsubxy         PROC
               PUSH   BP
               MOV    BP,SP
; If X-0 then compute X+0
               MOV    EAX,[BP+14].NUM
               OR     EAX,EAX
               JZ     SHORT SUBBY0
; else compute X+-1*Y
               NOT    [BP+14].SIGN
SUBBY0:
               POP    BP
; Let + do the work
               JMP    faddxy
fsubxy         ENDP


;************************************************
fmulxy         PROC
               ARG    ANSOFF:WORD,ANSSEG:WORD,XARG:FPM_NUM,\
                      YARG:FPM_NUM
; Right adjust X
               LEA    BX,XARG
               CALL   RADJ
; Right adjust Y
               LEA    BX,YARG
               CALL   RADJ
; Check for X*0 or 0*Y
               MOV    EAX,XARG.NUM
               OR     EAX,EAX
               JZ     SHORT RETZM
               MOV    EBX,YARG.NUM
               OR     EBX,EBX
               JZ     SHORT RETZM
; Do it
               MUL    EBX
TSTZLP:
; Find bits in EDX (overflow of 32 bits)
               BSR    ECX,EDX
; If none, goto DXZ
               JZ     SHORT DXZ
; Shift bits to EAX and adjust scale
               INC    CL
               SHRD   EAX,EDX,CL
               SUB    XARG.SCALE,CL
DXZ:

               OR     EAX,EAX
               JNS    SHORT NOOF
; keep answer positive
               SHR    EAX,1
               DEC    XARG.SCALE
NOOF:
; store answer
               PUSH   ES
               LES    BX,DWORD PTR ANSOFF
               MOV    ES:[BX].NUM,EAX
; compute new scale
               MOV    AL,YARG.SCALE
               ADD    AL,XARG.SCALE
               MOV    ES:[BX].SCALE,AL
; compute new sign
               MOV    AL,YARG.SIGN
               XOR    AL,XARG.SIGN
               MOV    ES:[BX].SIGN,AL
               POP    ES
               RET
; Jump here to return a zero
; shared by MUL & DIV
RETZM:         PUSH   ES
               LES    BX,DWORD PTR ANSOFF
               XOR    EAX,EAX
               MOV    ES:[BX].NUM,EAX
               MOV    ES:[BX].SCALE,AL
               MOV    ES:[BX].SIGN,AL
               POP    ES
               RET
fmulxy         ENDP

;************************************************
fdivxy         PROC
               ARG    ANSOFF:WORD,ANSSEG:WORD,XARG:FPM_NUM,\
                      YARG:FPM_NUM
; LEFT ADJUST X
               LEA    BX,XARG
               CALL   LADJ
; RIGHT ADJUST Y
               LEA    BX,YARG
               CALL   RADJ
; CHECK FOR DIVIDE BY ZERO
               MOV    EAX,XARG.NUM
               OR     EAX,EAX
               JZ     RETZM
; CHECK FOR DIVIDE BY 1
               MOV    EBX,YARG.NUM
               CMP    EBX,1
; DON'T DIVIDE BY 1 -- BESIDES IF WE ROTATE EDX:EAX LEFT,
; / BY 1 WOULD RESULT IN OVERFLOW
               JZ     SHORT DODIV
; SHIFT X BITS INTO EDX BASED ON SIZE OF Y
               XOR    EDX,EDX
               BSR    ECX,EBX
               SHLD   EDX,EAX,CL
               SHL    EAX,CL
; ADJUST SCALE
               ADD    XARG.SCALE,CL
               DIV    EBX
DODIV:
               PUSH   ES
               LES    BX,DWORD PTR ANSOFF
               MOV    ES:[BX].NUM,EAX
; COMPUTE ANSWER'S SCALE
               MOV    AL,XARG.SCALE
               SUB    AL,YARG.SCALE
               MOV    ES:[BX].SCALE,AL
; FIND ANSWER'S SIGN
               MOV    AL,YARG.SIGN
               XOR    AL,XARG.SIGN
               MOV    ES:[BX].SIGN,AL
               POP    ES
               RET
fdivxy         ENDP

               END

[LISTING TWO]



/***************************************************
 * FPM.H - C language header for FPM               *
 ***************************************************/
#ifndef FPMHEADER
/* fixed point type */
typedef struct
 {
 char sign;
 char scale;
 unsigned long num;
 } FIXED;

/* Scale factor
  use - 1000.0 for 3 places, 100.0 for 2 places etc. */
#define SFACTOR (10000.0)

#define CVTTYPE float

#ifdef __cplusplus
extern "C"
{
#endif

/* Assembly core functions (from FPM.ASM) */
extern FIXED faddxy(FIXED,FIXED),fsubxy(FIXED,FIXED),
       fmulxy(FIXED,FIXED),fdivxy(FIXED,FIXED);
extern int bitscan(int dir,unsigned long lword);

/* C helpers (from FPMC.C) */
FIXED floattofix(CVTTYPE f);
CVTTYPE fixtofloat(FIXED f);
int fixtoint(FIXED f);

#ifdef __BORLANDC__
/* this is the same */
extern FIXED faddxy (FIXED,FIXED),
       FIXED fsubxy (FIXED,FIXED),
       FIXED fmulxy (FIXED,FIXED),
       FIXED fdivxy (FIXED,FIXED);
#else
/* Microsoft definitions */
FIXED fpmmsc_rv;
extern void faddxy (FIXED _far *, FIXED,FIXED),
            fsubxy (FIXED _far *, FIXED,FIXED),
            fmulxy (FIXED _far *, FIXED,FIXED),
            fdivxy (FIXED _far *, FIXED,FIXED);
#define faddxy(a,b) (faddxy(&fpmmsc_rv,a,b),fpmmsc_rv)
#define fsubxy(a,b) (fsubxy(&fpmmsc_rv,a,b),fpmmsc_rv)
#define fmulxy(a,b) (fmulxy(&fpmmsc_rv,a,b),fpmmsc_rv)
#define fdivxy(a,b) (fdivxy(&fpmmsc_rv,a,b),fpmmsc_rv)
#endif

#ifdef __cplusplus
}
#endif

#endif

[LISTING THREE]



/***************************************************
 * FPMC.C - C language routines for FPM            *
 ***************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include "fpm.h"

/* convert fpm to float */
FIXED floattofix(CVTTYPE f)
  {
  FIXED ans;
  unsigned long i,whole,frac=0L;
  CVTTYPE fp=.5;
  ans.sign=0;
/* special case */
  if (f==0.0)
    {
    ans.scale=0;
    ans.num=0L;
    return ans;
    }
  if (f<0.0)
    {
    ans.sign=0xff;
    f=-f;
    }
  whole=(unsigned long)f;
  f-=whole;
/* scale is number bits in whole */
  ans.scale=32-bitscan(1,(unsigned long)whole);
  ans.num=(unsigned long)whole<<ans.scale;
  if (ans.scale)
/* compute fractional part */
    for (i=1L<<(ans.scale-1);i!=0&&f!=0.0;fp/=2)
      {
      if (f>=fp)
        {
        f-=fp;
        frac|=i;
        }
      if (!frac&&!whole)
        ans.scale++;
      else
        i>>=1;
      }
  ans.num|=frac;
  return ans;
  }
/* convert FPM number to float */
/* Uses SFACTOR for rounding (see FPM.H) */
CVTTYPE fixtofloat(FIXED f)
  {
  unsigned long i;
  CVTTYPE fp=0.5,res,sg=1.0;
  if (f.sign)
    {
    sg=-1.0;
    }
  if (f.scale<0)
    res=(CVTTYPE)(f.num<<-f.scale);
  else
    {
    res=(CVTTYPE)(f.num>>f.scale);
    while (f.scale>>5)
      {
      f.scale--;
      fp/=2.0;
      }
    for (i=1L<<f.scale-1;i!=0;i>>=1,fp/=2)
      if (f.num&i) res+=fp;

    }
  return floor(sg*res*SFACTOR+.5)/SFACTOR;
  }
/* convert FPM number to int (truncate fraction) */
int fixtoint(FIXED f)
  {
  int r;
  if (f.scale>=0) r=f.num>>f.scale; else r=0;
  if (f.sign) r=-r;
  return r;
  }

[LISTING FOUR]



/***************************************************
 * FPMGRF.C - Graphical demo of FPM with benchmark *
 ***************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <dos.h>
#include <conio.h>
#include <time.h>
#include "fpm.h"

/* Macro to call BIOS */
#define INT10(r) int86(0x10,&r,&r)
/* Parameters for line (y=Mx+B) */
#define M .291    /* Slope */
#define B .08     /* y-intercept */

main(int argc,char *argv[])
  {
  clock_t s,e;
  long fpmtime,floattime;
/* Set CGA mode */
  setmode(4);
/* mark time */
  s=clock();
/* draw line with math library */
  floatline();
  e=clock();
/* compute time */
  floattime=e-s;
/* wait for a key */
  printat(1,10,"Press any key to continue");
  if (!getch()) getch();
  printat(1,10,"                         ");
/* mark time again */
  s=clock();
/* draw line with FPM */
  fpmline();
  e=clock();
/* compute time */
  fpmtime=e-s;
/* wait for a key */
  printat(1,10,"Press any key to continue\n");
  if (!getch()) getch();
  setmode(3);
/* print results */
  printf("Float time=%ld ticks\nFPM time=%ld ticks\n",floattime,fpmtime);
  }
/* Set video mode */
setmode(int n)
  {
  union REGS r;
  r.x.ax=n;
  INT10(r);
  return 0;
  }
/* plot a floating point pixel */
floatplot(float x,float y)
  {
  int x1,y1;
  x1=x*320;
  y1=y*200;
  plot(x1,y1,2);
  return 0;
  }
/* ploat a point with FPM numbers */
fpmplot(FIXED x,FIXED y)
 {
 static FIXED xscale;
 static FIXED yscale;
 static int first=0;
 if (!first)
   {
   xscale=floattofix(320.0);
   yscale=floattofix(200.0);
   first=1;
   }
 plot(fixtoint(fmulxy(x,xscale)),fixtoint(fmulxy(y,yscale)),3+128);
 }
/* plot a hardware pixel */
plot(int x,int y,int color)
  {
  union REGS r;
  r.h.ah=0x0c;
  r.h.al=color;
  r.h.bh=0;
  r.x.cx=x;
  r.x.dx=y;
  INT10(r);
  return 0;
  }
/* Floating point library line drawing routine */
floatline()
  {
  float x,y;
  for (x=0.0;x<1.0;x+=.001)
    {
    y=M*x+B;
    floatplot(x,y);
    }
  }
/* FPM line drawing routine */
fpmline()
  {
  FIXED x,y,m,b,incr;
  m=floattofix(M);
  b=floattofix(B);
  x=floattofix(0.0);
  incr=floattofix(.001);
  while (fixtoint(x)==0)
    {
    y=faddxy(fmulxy(m,x),b);
    fpmplot(x,y);
    x=faddxy(x,incr);
    }
  }
/* print string via BIOS at x,y */
printat(int x,int y,char *s)
  {
  union REGS r,r1;
  r.h.ah=2;
  r.h.bh=0;
  r.h.dh=y;
  r.h.dl=x;
  INT10(r);
  r.h.ah=0xe;
  r.h.bl=7;
  while (r.h.al=*s++)
    int86(0x10,&r,&r1);
  }