Features


Arithmetic In Factorial-Base

Frederick W. Hegeman


Frederick Hegeman is an amateur programmer and computer language hobbyist. He can be reached at P.O. Box 2368, Rapid City, SD 57709, telephone (605) 343-7014.

C takes a machine-oriented approach to numbers; its arithmetic types are coupled to the physical architecture of the processor. Arithmetic operations on ints, longs, and doubles use a compact representation and provide fast and well-defined performance. The cost is limits on the magnitude and accuracy of operands and results.

Other languages take a less mechanical approach. Common Lisp supports bignums, integral values limited only by available memory, and ratios, rational values which allow exact computations by storing separate integral valued numerators and denominators (which can be bignums, of course).

For C applications that need to operate beyond the restraints imposed by float.h and limits.h, similar types could be coded, but doing so would be a nightmare. What kind of code would be required to compare (4129181559/4294967296) to (449/467)? To multiply or divide them? Even worse, memory usage could grow without limit, up to the point where the program fails.

Happily, there is a more practical approach that works within bounded memory and almost allows computation to unlimited magnitude and perfect accuracy.

Factorial-Base Arithmetic

Any rational number can be represented exactly in a number system based, not on increasing powers of a particular number (e.g., 100,101, 102, ...), but on the series of factorial numbers, (1!, 2!, 3!, ... ). Any integer in any static-base system can also be represented in factorial-base.

The decimal integer 231 can be represented as 2x102+3x101+1x100 or as 1x5!+4x4!+2x3!+1x2!+1x1!. And any rational fraction has an exact representation in factorial-base, even if infinitely repeating in a static-base.

The ratio 1/3 can be represented as

Whatever numbers a computer manipulates must be rational numbers. If I type in 3.1415926535897932384, I am not typing in p, but a rational approximation. I can't calculate the square root of 2; I can calculate a rational number than approximates it. Every operand that can be used, and every result that can be returned, is a rational number. I find it ironic that Georg Cantor, who discovered those computer incompatible values, the transfinite number, was also responsible for factorial-base representation. In Cantor's scheme every number that can be computed by machine can, given the resources, be computed exactly.

The limiting resources of the factorial-base system are memory and the implementation language. The language is the most important factor. Time, in theory, is not a factor. Memory is the limiting factor only for very small systems such as micro controllers. In C, the limiting factor is related to the integer square root of LONG_MAX. Because ANSI conforming compilers are allowed generous amounts of slack, the limiting value may be one less than the quotient of INT_MAX divided by sizeof(long). For a subset without longs, the limits are set by the integer square root of INT_MAX.

Representing Factorial-Base Numbers In C

Factorial-base numbers have a natural representation in C as arrays of some signed integral type. Consider the representation of decimal 231 given previously. The value of each factorial place is an integer to be stored in a separate element of the array. Arrange things so that references begin at array[1], rather than at array[0], and the subscripts track the factorial places

int factorial_integer[6] = {0,1,1,2,4,1};
Likewise for the ratio 1/3

int factorial_fraction[4] = {0,0,0,2};
To represent the real number 231 + 1/3, some scheme is needed to keep track of both parts. Declaring a struct with members to hold the separate arrays does so in a straightforward manner. However, since many subset compilers, and cross compilers, do not implement structs and since the idea of a tiny controller limited to eight-bit "words" calculating to the limits of 10! and 1/10! has a certain charm, I chose to implement both parts in a single array, as illustrated in
Figure 1.

The numbers are represented in a sign and magnitude scheme. The equivalent of 2's complement simply does not exist and since the subscript 0 is free, it holds the sign. The factorial-base integer follows. Each element of the integer portion will hold a maximum value equal to its subscript. If it ever happens that the value exceeds the subscript, the number can be rearranged so that it does not. Consider array element 2, which represents 2!. The maximum entry it should hold is 2. The value of 2 at subscript 2 is 2*2!. Suppose the entry at element 2 was 3 — the value of 3 at subscript 2 would be 3*2!. That is the definition of 3!, and the number can be normalized by representing it in terms of 3!. Note that it is not necessary to actually calculate the value of 3!.

Consider that the subscript i references the ith factorial "digit". What is the maximum value for i? Suppose the number is represented as an array of ints. The element i may have to temporarily contain the value of (i+1)*i before being normalized. This value must be representable as a positive int, and must not be allowed to suddenly turn negative, in order to propagate a carry value into the guard, G, and mark integer overflow. Since we are using ints, (i+1)*i cannot exceed INT_MAX. Assume INT_MAX is 32767 and the maximum value of i is 180, or 1 less than the integer square root of INT_MAX.

The rest of the array contains the factorial-base fraction. Consider it as a separate array beginning at &array[G]. The value of factorial place 1, at subscript G+1, is some integer x/1!, which is the same as x*1!, which is the same as x. This subscript is referenced to propagate carrys and borrows from the fraction part to the integer part during normalization. Otherwise, the array value at subscript G+1 is 0.

The remaining array elements represent the fraction proper. Each element will hold a maximum value equal to 1 less than its subscript relative to G. The maximum subscript (relative to G) is 1 less than the integer square root of INT_MAX. Underflow, or loss of significance, spills into subscript G2.

Now, assume the number is represented an array of longs and LONG_MAX is 2147483647. Most applications will not use up 46,339 factorial places. ANSI conforming compilers need only provide objects, including arrays, of up to 32,767 bytes and since each array element uses 4 bytes, arrays of at least 8,191 longs are presumably available.

facbase.c

Procedures and declarations for factorial-base numbers, the operations +, --, /, *, conversion to and from ASCII decimal, and support functions, are in Listing 1. Since that code contains only simple macros and no structs, unions, longs, or pointers to anything more exotic than chars and ints, it should compile under even the most rudimentary systems.

Only two things might lead to trouble on a system. First, some subset compilers may not allow compile-time initialization of arrays and variables. In that case, a short routine will have to be written to do runtime initialization of the integer variable nowarning and the constant arrays zero[] and one[]. The second problem may be lack of stack space. Some of the routines can use prodigious amounts of stack. On systems that allot a fixed amount of memory for use as a program stack, it may be necessary to declare all or most of the temporary numbers as static arrays, rather than the default to auto.

As described above, either integer or fractional parts can be adjusted freely up to about 180 factorial digits. The system is written for 100 factorial digits in both integer and fractional parts for the simple reason that the values of factorials in that range can be checked against the tables in the Chemical Rubber Co. Standard Mathematical Tables.

If some of the code looks oddly familiar, it is because operations on factorial-base numbers built around arrays are almost exactly the same as binary operations on arrays of bits. Shifting left and right are different because they are done by overt multiplication and division, while binary operations shift left and right in order to multiply and divide quickly. The only routine that might seem totally unfamiliar is the normalization routine that cleans up after operations. The software must mimic normalization usually performed in silicon.

In the absence of overflow or underflow, mathematic operations in factorial-base are exact. No rounding or truncation need ever be performed except when converting to a static-base on output. This code converts to decimal. To the limits of the system, the maximum error of any computation, no matter how large or how small the result, need never be greater than .5 units in the last place displayed. The actual results can always be exact and no errors propagate through a series of calculations. (Rounding to nearest, as that implies, might not be as appropriate as rounding towards zero or always rounding towards x. Note that the code provided truncates, rather than rounds.)

Included at the end of Listing 1 is code for a simple and slow RPN calculator program, which compiles when DEBUG is #defined. The calculator includes a routine to display a factorial-base number in factorial-base. Factorial-base integers are well-behaved but factorial-base fractions are confusing unless viewed as separate digits. Consider the infinite series for e:

The series can be coded directly into an array, normalized, and printed out. If the fraction is 100 factorial digits, it will print out correctly to 159 decimal places. On the other hand, .000 000 000 000 000 000 000 000 1 will underflow; it only has an exact representation in 105 factorial digits. Note, though, how it prints in decimal after underflow. If the fraction is 180 factorial digits, the value of e prints out correctly to 333 decimal places, 1.0E--44 requires all 180 factorial digits and 1.0E--45 underflows.

Conclusion

The factorial-base arithmetic is not practical for recalculation of a spreadsheet. But, for applications where accuracy is more important that speed, factorial-base arithmetic may be the answer. For programs that need to deal in exact fractions, factorial-base arithmetic is probably a good choice.

Suppose you want both fast and accurate arithmetic. You have what you think are fast and simple algorithms that perform calculations correct to 20 decimal places. Or do they? How do you verify that? How do you debug your code? Don't dismiss factorial-base math out of hand as too slow for practical applications.

Bibliography

Wayner, Peter. "Error-Free Fractions," BYTE, Vol. 13, No. 6, June, 1988, pp. 289-298. Presents a limited implementation in Pascal and cites a submission to CACM that has not yet appeared.

The Chemical Rubber Co., Standard Mathematical Tables, 18th ed., Cleveland, OH, 1970. Later editions have slimmed down, but earlier ones like this contain exact values for factorials up to 20!

Wozniak, Stephen. "The Impossible Dream: Computing e to 116,000 Places with a Personal Computer," BYTE, Vol. 6, No. 6, June, 1981, Pp. 392-407. Where else can you find the value of e printed out to 1,500 decimal places?