June 1994/Multi-Precision Integer Arithmetic Using C++

Scientific/Numerical Applications

Multi-Precision Integer Arithmetic Using C++

John K. Gotwals

The author has a Ph.D. in physics, and has worked with computers for 30 years. He is an associate professor of computer technology at Purdue University. His current interests are object-oriented programming using C++, and Windows programming using Visual C++. You may contact him by mail at CPT Dept., Knoy 242, Purdue University, W. Lafayette, IN 47907 or by e-mail at gotwal s@vm.cc.purdue.edu.

Introduction
In the last few years a need has developed for programming packages that can carry out integer arithmetic involving hundreds or thousands of digits. The advent of the RSA public-key crypto-system has contributed to this need by creating interest in factorization and testing for primality. Multi-precision arithmetic also allows researchers to observe patterns that appear in calculations, formulate conjectures about the patterns, and perhaps find counter-examples to others' erroneous conjectures.
This article describes the implementation and usage of a multi-precision integer arithmetic package on a 32-bit computer running Windows NT. I've written the high-level portions of the package in C++; I've found a clear and compelling reason for using C++ when operator overloading leads to a simplified programming interface. The system can perform arithmetic with integers of any reasonable length. For reasons to be discussed shortly, I've written the low-level routines in assembly language, but the interfaces to these routines are fully documented in the complete package. I've built and tested this package with Microsoft's 32-bit Visual C++, version 1.0, and I've made the full version, including assembly language source, available on the CUJ monthly code disk. I've also included a program (Listing 1) to demonstrate the capabilities of this package.

Design Issues and Implementation Choices
This design uses the signed-magnitude representation for numbers, in which negative and positive numbers of the same magnitude receive identical representation except in their signs. The implementation represents negative, zero, and positive numbers by sign values of -1, 0, and +1 respectively. Furthermore, the implementation assumes that at the end of every arithmetic operation the result will be normalized. With the exception of zero, a normalized number is one that does not have any leading zero digits. Zero is a special case and consists of one digit which contains zero.
The design process always involves tradeoffs between such things as performance, portability, size, ease of implementation and ease of use. In any extended-precision system it is necessary to detect overflow or underflow before applying a carry or borrow to the next most-significant digit. Since C doesn't provide a standard and efficient way to detect overflow in computations involving unsigned operands, I decided to implement the low-level arithmetic routines in assembly language. (I also used assembly language for performance reasons.)
The use of assembly language threatens to make a good portion of the system non-portable, and to throw me into the nefarious intricacies of extended-precision arithmetic at the CPU register level. To avoid this scenario in my particular implementation, I "wrap" a multiple-precision integer as an object of the LargeInt class. As shown in Listing 2, just below the access specifier private:, each object of this class contains a pointer (adr) to the most significant digit of a radix-2³² multi-precision integer, a count (len) of the number of radix-2³² digits, and the sign of the LargeInt object.
My implementation allocates memory for multi-precision integers from the heap. Since the arithmetic and assignment operations will always result in the creation and deletion of LargeInt objects, a large amount of heap activity occurs during program execution. I made no attempt to provide for a custom memory manager, since preliminary measurements made while running under Windows NT indicated a performance hit of only about 10% from the system memory manager.

Constructing a Multi-Precision Integer
Listing 2 shows the header file for the LargeInt class and lists the five LargeInt constructors available for creation of multi-precision integers. The following example invokes all five contractors to create and initialize n1 through n5. In this code fragment the default constructor initializes n1 to zero and the copy constructor is used to initialize n5 with the value contained in n4:
LargeInt nl, n2 = -123, n3 = 1234u;
LargeInt n4 = "12345678901234567890";
LargeInt n5 = n4;
Note that although n4 has been initialized with a string constant, it could have been initialized from a file input buffer. The only upper limit on the string initializer's length (assuming adequate virtual memory) is the maximum length that can be returned by strlen. The string initializer is converted from decimal to radix-2³² binary via the C function strtoul, which converts blocks of nine decimal digits (starting at the left of the string) at a time to binary. The constructor adds each converted block to the radix-2³² multiple-precision integer, and multiplies the resultant integer by 10⁹. The LargeInt member function decToBin carries out the conversion. A portion of this code is presented in Listing 3.
decToBin uses Homer's method to evaluate a polynomial whose coefficients are formed from blocks of nine decimal digits each. (Note that the listing does not include the code which picks off the optional sign from the front of the string, gets rid of any leading zeros, and tests for the special case of zero.) str is the address of the string to be converted, PackFactor is a constant with a value of 9 for a 32-bit system, and the implicit this pointer points to the LargeInt object, which contains the result of the conversion.

Comparing Multi-Precision Integers
Almost any calculation creates a need to perform arithmetic comparisons. Because C++ has provisions for overloading operators, a multi-precision comparison of lint1 with lint2 can be as simple as writing the code

if (lint1 < lint2) // do this else // do this
As currently written the LargeInt package overloads only the == and < operators, but additional comparison operators could be easily added. In this implementation, the left operand must be a LargeInt object, but the right operand can be either another LargeInt object or the built-in int type. Listing 4 contains the code for the overloading of the operator <.
The routine begins by checking the signs and number of radix-2³² digits of each operand. Only if the signs and lengths are the same is it necessary to perform a digit-by-digit compare. Although the code is fairly simple, I endured a fair amount of mental anguish while testing until I realized that I could not use the system memcmp function to compare digits, since it compares bytes instead of 32-bit words. See the bottom of Listing 2 for the definition of the memcmpInt function.

Multi-Precision Arithmetic
As shown in Listing 2, LargeInt objects can participate in the standard signed arithmetic operations of addition, subtraction, multiplication and division by using the overloaded operators +, -, *, and /. In addition, I've also overloaded the modulus, multiply assignment, and unary minus operators %, *=, and -, although the *= operator requires the right operand to be of type int. The code for the overloading of the * operator is given in Listing 5.
The * routine performs three steps:
1. Check whether either operand is zero (in which case, the routine is essentially finished).
2. Create a LargeInt result with sufficient space to hold the product.
3. Determine the sign of the result, and call the low-level multiply routine.
As shown by multiply's prototype at the top of the listing, multiply takes five parameters. Pointers u and v point to the multipliers, w points to their product, and n and m are the number of radix-2³² digits for u and v respectively.
The low-level multiply routine is shown in Listing 9. (The full assembly language source for all routines is provided on the code disk.) The LargeInt package includes several additional useful functions such as powerTwo, sqrt, divRem, and evalFrac. These functions' prototypes are given in Listing 2. The friend function divRem performs division between LargeInt variables, but unlike the division operator, divRem generates both the quotient and the remainder. The friend function evalFrac evaluates the fraction u/v, where u and v are both multi-precision integers, to any desired precision. This function is useful when you want to approximate some quantity with the rational number u/v.

Binary to Decimal Conversion
In the factorial calculation program shown in Listing 1, LargeInt variable result contains the factorial of a number that was entered by the user. To convert this result to a decimal number, the statement
result.binToDec(buf, bufSize)
calls the binToDec function with parameter buf pointing to an output string buffer and bufSize equal to the size of the string buffer. The function binToDec converts the radix-2³² integer into a decimal integer string and stores the string in the buffer. The function returns buf if the conversion is successful, but if there is insufficient string buffer space, the conversion process is halted and binToDec returns NULL.
A description of the binary to decimal conversion procedure is as follows: The routine creates blocks of nine decimal integers at a time by dividing result repeatedly by 10⁹ and stopping when the final quotient is zero. After each division the routine calls sprintf to convert the remainder from the division to a string of nine decimal digits. Each string of nine digits is stored at successive positions in the string buffer, and the quotient from the division becomes the dividend for the next division. Since this conversion procedure generates the least-significant block of digits first and the most-significant block last, the output string must be reversed, in blocks of nine.
Listing 6 contains the code for the binary to decimal conversion. (Code to remove leading zeros from the result string and possibly insert a minus sign has been omitted from this listing.) PackFactor is a constant with a value of 9 for a 32-bit system. Since the Largelnt object to be converted may not be changed, the routine must create a copy (stored in variable copy in the listing). The low-level division routine divrem carries out the real work of the conversion process. divrem's prototype is shown at the top of Listing 6.
This routine divides the m-place integer pointed at by u by the single-precision integer v. When the function has finished the division, the dividend has been replaced by the quotient and the remainder is returned on the stack.

Testing
Testing a package of this type is difficult. As an example, the low-level divide function is quite complex and consists of over 200 lines of assembly language code with a fair number of conditional branches. The code is a derivative of a program by Knuth which was written for his fictional MIX computer. Knuth has written that "some portions of that program would probably never get tested even if a million random test cases were tried." Since I developed this package for "recreational computing," my main method of testing was to run some programs which have known results.
For very basic debugging and testing, the function lintDump displays the sign and hex contents of each radix-2³² digit. The following code fragment provides an example; the output can be checked by using a scientific calculator.

LargeInt n1 = "-12345678901234567890"; printf ("n1 = "); n1.lintDump(); // n1: sign = -1 AB54A98C EB1F0AD2
At a more advanced level of testing, a test program can use the sqrt function to take the square root of a large integer, square the result and compare it with the original. Listing 7 shows a surprisingly simple function from this package which uses Newton's method to compute the square root of an integer. Using this function will exercise the add, multiply, and divide operations.
A CPU resource-intensive test is the Lucas-Lehmer algorithm for testing primality of a Mersenne number. Mersenne numbers are those of the form 2p - 1. The Lucas-Lehmer test determines if for a particular prime number p, the corresponding Mersenne number is also prime. Listing 8 shows a program which asks the user to enter a prime number and then determines if 2p - 1 is prime. This program calls the timer function to measure execution time, and the if statement inside the for loop displays information about the progress of the calculation.

Performance
A program which can calculate and display the factorial of any reasonably sized number is given in Listing 1. (The bufSize constant will have to be adjusted if the result contains 50,000 or more digits.)
Although this package performs signed arithmetic on integers of almost any length, as the precision increases the processing time will also increase, and probably in a nonlinear fashion. This time increase is illustrated by the program in Listing 1. Table 1 summarizes the data obtained from several different runs on a 33Mhz 486DX computer under Windows NT. Note the significant amount of time needed to perform the binary to decimal conversions.
The Lucas-Lehmer primality test is another CPU-intensive process. Although the algorithm is simple, the program processes large numbers. Table 2 shows some representative times.

Conclusion
Using C++ to overload the arithmetic operators and to create a LargeInt class of extended precision integer objects provides an easy-to-use package for multi-precision integer arithmetic. A package of this type allows the user to carry out calculations that are simply not feasible with most standard languages.
Be forewarned that your calculation results may sometimes be challenged. After calculating and printing all 1,512,852 digits of 300,000 factorial, I proudly showed the results to my neighbor. After looking at several pages of the output, he stopped on page 367 and circled one of the numbers with his pen. He then informed me that my output was wrong, because the digit he circled should be a "4" and not a "3" as was represented by my calculations. We had quite a heated exchange of words, and he told me that I could only make him change his mind by redoing the calculations by hand!

References
Bressoud, D., Factorization and Primality Testing, SpringerVerlag, 1989.
Knight, D., "An Ada Package for Multi-Precision Integer Arithmetic," SIGSMALL/PC Notes, ACM Press, Nov. 1993.
Knuth, D., The Art of Computer Programming, Volume 2, Addison-Wesley, 1981.
Nievergelt, J., Farrar, J., Reingold E., Computer Approaches to Mathematical Problems, Prentice-Hall, 1974.