UNDOCUMENTED CORNER

Inside the Pentium FDIV Bug

Tim Coe

Tim, who received his BSEE and MSEE from MIT in 1986, is a chip designer at Vitesse Semiconductor. He can be contacted at coe@vitsemi.com.

Introduction

by Andrew Schulman

I take out a cheap pocket calculator--actually a $19.95 Roget's Thesaurus & Spell Checker, to which Seiko threw in an 8-digit calculator "for free"--and divide 4195835 by 3145727. I get the answer 1.3338204. How do I know this is correct? By multiplying 1.3338204 and 3145727. This should give the result 4195835.

Of course, it doesn't. Division often produces a result whose decimal representation is inexact. On my cheap calculator, 1.3338204'3145727 yields not 4195835, but 4195834.8--the answer is too low by 0.2. Close enough for government work, perhaps, but to get a better answer, we need more digits.

My $20.00 calculator offers only eight digits. But my $3000 Pentium-based Dell Dimension XPS P60, which (according to the Pentium Processor User's Manual) complies with IEEE Standard 754 for Binary Floating-Point Arithmetic, guarantees 15-16 significant decimal digits (double precision). On a Pentium, then, you should be able to produce a far more accurate answer than on a cheap 8-digit calculator.

Unfortunately, as even David Letterman's audience has heard by now ("how about some defective Pentium salsa to go with those defective Pentium chips?"), the Pentium floating-point divider has a bug. If you do a floating-point divide (FDIV) of 4195835.0 by 3145727.0, you get the answer 1.333739068902. Multiplying this by 3145727 yields, not 4195835 or even the 4195834.8 produced by the calculator, but 4195579--a full 256 too low! Whereas the cheap calculator's answer would require only additional digits to gain precision, the Pentium's answer is simply wrong starting in the fifth digit. This is less precision than even IEEE single-precision numbers are supposed to have.

Given that the division operation has a trivially simple algorithm which we all learned in school, how is it that the Pentium sometimes divides incorrectly? The Pentium manual page for FDIV simply shows DEST <-- DEST/SCR, reinforcing the idea that division is a trivial, atomic operation. However, division is anything but simple if you want to do it quickly. (Growing up, I could always tell when my father was doing division on his Friden or Marchant calculator, because the whole house would shake for about five minutes.) Entire books, journals, and conferences are devoted to discovering newer, faster ways of performing the seemingly simple arithmetic operations, particularly division.

In this month's "Undocumented Corner," Tim Coe shows that the Intel Pentium uses a division algorithm, first discovered in 1958, called "radix 4," or "SRT." Even to many programmers, the idea that there is a division algorithm--that a complex piece of software runs "inside" the simple-looking / operator or FDIV instruction--is a revelation. It's also quite revealing that we're talking about an algorithm that dates back to 1958: Many of the ideas in "cutting edge" products are real-ly quite old. At any rate, there is a small bug (basically, a 0 appears several places in a table where a +2 ought to be) in the Pentium's implementation of the SRT divider.

Tim's article is a model of reverse engineering. Not only did he use the pattern of numbers whose reciprocals the Pentium calculates incorrectly to deduce the division algorithm used by the Pentium, but he constructed a model that predicted which other divisions would fail. He then confirmed these predictions on an actual Pentium. Most so-called "software engineering" is never like this, alas. Tim has demonstrated a genuinely scientific approach to software analysis.

As Tim points out, thanks are due to Dr. Thomas Nicely, not only for uncovering this bug, but also "for providing this window into the Pentium architecture." Flaws reveal far more than success, and the full story of the Pentium debacle is far from over. Intel lost nearly half a billion dollars because it tried to cover up the bug. The company recently instituted new policies to inform customers about future processor defects.

You can expect further "Undocumented Corners" on the Pentium processor. For some time, I have been trying to put together a piece on the undocumented "Appendix H" features of the Pentium, particularly its Virtual-8086 mode extensions. If you have any comments on this or any other important, undocumented, or buggy interface, please contact me on CompuServe at 76320,302.

On October 30,1994, Professor Thomas Nicely sent e-mail to several people (including Andrew Schulman) regarding a bug in the Pentium divider. For example, he wrote, 1 divided by the prime number 824,633,702,441 (a twin-prime pair with 824,633,702,443) "is calculated incorrectly (all digits beyond the eighth significant digit are in error)." Dr. Nicely provided several other values for which the Pentium produces an incorrect reciprocal, noting that the bug can be observed "by calculating 1/(1/x) for the above values of x. The Pentium FPU will fail to return the original x (in fact, it will often return a value exactly 3072=6*0x0200 larger)."

Schulman (who didn't have a Pentium machine at the time) forwarded the e-mail to Richard Smith of Phar Lap Software, asking him if he knew anything about the bug. After verifying the bug, Smith reposted Nicely's message to the CompuServe Canopus forum hosted by Will Zachmann.

Alexander Wolfe at the Electronic Engineering Times saw this post and contacted Terje Mathisen in the comp.sys.intel Internet newsgroup. Mathisen also verified the bug and wrote a small program to test for it. Mathisen then posted his work, a brief excerpt from which appears in Listing One , to comp.sys.intel, starting the thread "Glaring FDIV bug in Pentium!" Andreas Kaiser saw this and promptly wrote a program to do reciprocals of random numbers and let it run for a day. On November 4, he posted the divide failures he saw back to comp.sys.intel. Wolfe's article, the first published description of the FDIV bug, appeared on the front page of EE Times (November 7, 1994).

Kaiser wrote that he had performed roughly 25,000,000,000 reciprocals and that the division was usually correct. He knew that the exponent did not matter: If X fails, then X*(2^N) will also fail, so he divided each failure by 2 until odd. The 23 numbers shown in Listing Two were the failing reciprocals he found; I will refer to these numbers as "Kaiser's list."

Radix 4 and SRT Division

At that time, I was considering buying a Pentium machine and was following several PC-related newsgroups. On November 6, the two snippets in Listings One and Two flowed across my terminal.

There was a pattern in Kaiser's list. As a floating-point designer, I wondered what could be derived about the Pentium divider design from that pattern, so I started writing the numbers out in binary. Listing Three shows Nicely's prime and one number from Kaiser's list in binary. Analysis of the numbers from Kaiser's list reveals that all but two of them are of the form in Figure 1(a), where J and K are integers greater than or equal to 0 and delta is a real number that has varying ranges depending on J, generally between 0 and 1.

The 2^K factors common to all the terms in Figure 1(a) reflect the arbitrariness of the exponent in the occurrence of an error. The 2^(-2*J) factors common to the terms that express the deviation of an operand from a binary scaled 3 indicate that the Pentium divider must be an iterative divider that computes two bits of the quotient per cycle. This is because this deviation must be creating some specific pattern in the remainder that acts like a key--unlocking the bug and releasing it to do its damage.

That this key can be multiplied by a somewhat arbitrary power of 4 (note: 2^(-2*J)= 4^-J) and still unlock the bug, means two things: 1. On each cycle, the Pentium is multiplying the key by a factor of 4; and 2. greater values of J represent this key starting deeper in the remainder and therefore reaching the point where it unlocks the bug on a later cycle. To multiply the remainder by 4 on each cycle, two bits of quotient must be generated.

Two bits/cycle is in rough agreement with the quoted 39 cycles/extended long division from the Pentium data book: 32 cycles are needed to generate roughly 64 bits of quotient, with a few extra bits generated to allow for correct rounding and a couple of additional cycles required to set the divide up and finish it off.

The technical name for this type of divider is "radix 4," which essentially means the operation is performed in base 4. The longhand decimal division taught in school is a radix 10 iterative divide algorithm. The algorithm selects an appropriate quotient digit on each iteration and then recalculates the remainder and quotient according to Figures 1(b) and 1(c).

An "appropriate quotient digit" is defined as a digit for which the remainder after the application of equation Figure 1(b) is both greater than a geometric sum along the radix of the least-possible quotient digit, times the divisor, and less than a geometric sum along the radix of the greatest-possible quotient digit, times the divisor; see Figures 1(d), 1(e), and 1(f).

There are several multiplies in Figures 1(b) and 1(c). Hardware to implement multiplies by anything other than a power of 2 is very expensive both in chip area and time. Multiplications by positive and negative powers of two are just shifts and inverts. Having radix 4 and the possible quotient digits of -2, -1, 0, 1, and 2 meet the multiplication criteria nicely.

Having five possible digits in a radix 4 divider brings us to what is known as an "SRT divider," named after its three independent discoverers--D. Sweeney (IBM), J.E. Robertson (University of Illinois), and K.D. Tocher (Imperial College of London). The Robertson and Tocher papers were published in 1958.

Selecting a quotient digit when the number of possible digits equals the radix leaves no margin for error. By turning around the equations in Figures 1(b) and 1(c) for longhand radix 10, we get the equation in Figure 1(g). Making perfect digit selections would be very expensive in terms of hardware. What the originators of the SRT algorithm proposed was that, by having more possible digits than the radix, a certain amount of slop in quotient-digit selection would be recoverable. Turning around the equations in Figures 1(b) and 1(c) for radix 4 and the digit set -2, -1, 0, 1, 2 leads to the equation in Figure 1(h).

Now there is only one possible digit selection only if it is well within the two bounds. A reasonable implementation of the equation in Figure 1(h) can be achieved using only a limited number of the most-significant bits of both the divisor and the remainder. In hardware, this can be realized with a table, comparators, or random logic. The Pentium uses a lookup table (I found this out from Intel much later in the game); my models use functionally equivalent sets of comparisons. Listing Four is an example of longhand division in base 4.

History

Having made the multiplies in the equation in Figure 1(b) easy to perform, the next issue is how to perform the add. The most conceptually straightforward way to do this is with a regular adder. But regular adders that produce a normal sum have to deal with the problem of propagating carries.

For example, examine the two adds in Listing Five . Note that only changing one input bit resulted in many of the output bits changing through the propagation of a carry. A regular adder must handle all possible occurrences of this situation anywhere in the add, and it must handle this situation correctly. Propagating carries is expensive in terms of both hardware and time.

Carry-save adders are often used in hardware floating-point design to perform adds in such a manner that carries need not be propagated. Carry-save adders can be of several different types. One of the simpler types that is adequate for the remainder calculation in a radix 4 divider is known as a "3-to-2 carry-save adder." At each individual bit position, the carry-save adder takes three bits (one from each of the input operands) and computes their sum. This sum is expressed as a sum bit, which has the same significance as the input bits, and a carry bit, which has twice the significance as the input bits. The truth table in Table 1 illustrates the logic performed at each individual bit position of the carry-save adder.

The input words to the carry-save adder are the old-remainder sum word, old-remainder carry word, and -digit*divisor for the given cycle. The outputs of the carry-save adder are the new-remainder sum word (shifted left 2 to reflect the multiplication by the radix) and the new-remainder carry word (shifted left 2 to reflect the multiplication by the radix and shifted left 1 more to reflect the extra significance of the carry bits). Table 1 is very easy to implement in hardware.

The true remainder value on any given cycle is the sum of the remainder sum word and the remainder carry word. When a normal carry-propagate adder is used to calculate the remainder, there is only one way to represent a remainder of a given value. When a carry-save adder is used to calculate the remainder, the way in which the true remainder value is apportioned between the sum word and the carry word depends upon the history of carry-save adds performed.

In particular, I had noticed a while back that long, coincident sequences of 1s in both the sum word and the carry word occurred following a very specific history. To get a coincident sequence of 1s of length N+1 in the remainder at the end of a cycle, there must have been a coincident sequence of 1s in the remainder of at least length N at the beginning of a cycle and the -digit*divisor must have a sequence of consecutive 1s of at least length N aligned with the sequence in the remainder. The initial remainder, which is just the dividend in the sum word, has all 0s in the carry word.

If a very specific history is necessary to create a pattern, that pattern will be extremely rare. All sources indicated that divide errors were extremely rare (1 in 10¹⁰ for random divides quoted from Intel in the November 7 EE Times and 1 in 10⁹ for random reciprocals from Kaiser's list). However, the only way I could imagine the error being this rare was that the Pentium divider was using a carry-save adder to do the remainder calculation and that the long, specific, and therefore rare history associated with the buildup of long coincident sequences of 1s in carry-save remainders was involved with the failure.

Terje Mathisen had provided the quotients that resulted from taking the reciprocal of Nicely's prime on both the Pentium and the 486. I already knew the divisor and initial remainder (1 for a reciprocal), and from the quotients I could extract the digit sequences, giving me the history of -digit*divisor. It turns out that multiple digit sequences are possible (see Listing Six) that can give the same quotient, and indeed there was some ambiguity near the cycle where I surmised the error was occurring.

When I saw the long sequences of -1 digits, I wrote a simple model of a divider with the digit sequence hardwired to +1, followed by endless -1s and a carry-save adder to do the remainder recalculation. The bit patterns that developed included large (>= 5 bits long near the failure), coincident sequences of 1s in the remainder. I started running the numbers included in Kaiser's list and numbers near them that I surmised were not failing. I noted two conditions had to be met at the beginnings of cycles 14 and 15 for numbers near Nicely's prime to fail.

The first condition was associated with the selection of -1 as the quotient digit on cycle 14 and could be expressed thus: The sum of the eight most-significant bits in my representation of the remainder sum word and the remainder carry word (chopped remainder sum, note that chopping of the 56 least-significant bits comes first and summing comes second) at the beginning of the cycle could not be greater than 250 or less than 239. Since I was carrying five bits to the left of the binary point (four are adequate, but I left plenty of room) and representing everything in two's-complement format, this corresponds to a chopped remainder sum between -17/8 and -6/8, inclusive.

The second condition was associated with the value of the remainder on cycle 15, the cycle after the last selection of a -1 digit for the quotient. I determined that to get an error, the value of the chopped remainder sum had to be 30 (30/8 if the binary point is taken into account) and the first three bits in both the remainder sum word and the remainder carry word that were chopped off (six bits total of significance ranging from 2^-4 to 2^-6) all had to be 1. (I found out three weeks later that this condition, while empirically correct about whether an error would occur, is actually a consequence of the root cause of the error.)

The 5-bit long sequence of coincident 1s in the carry-save remainder at bit positions 2^-4 to 2^-8 and at least one at bit position 2^-9 at the beginning of cycle 14 is the requirement to generate the conditions tested for at the beginning of cycle 15.

Exactly the same conditions applied to other numbers on Kaiser's list, but they applied to cycles 12 and 13; J=2 in Figure 1(a) for Nicely's prime.

Move it Up

The pileup of 1s in the remainder that was a precondition for the error didn't start until about halfway through the long sequence of -1 digits. A long sequence of 1s was necessary in the divisor for the pileup. When doing reciprocals, both this sequence and the starter pattern associated with the 1149 term in Figure 1(a) had to have the divisor as their root source. I knew that hardware dividers were by no means restricted to doing reciprocals, so I wondered whether I might be able to move the starter pattern into the dividend and just use the divisor as a bed upon which pileups of 1s could grow. I thought I might be able to get the pileup to start growing in the first cycle, thereby meeting the conditions I had determined necessary for an error in the seventh or eighth cycle.

After playing with the divisors and dividends, I determined that the following dividend and divisor constituted the smallest pair of integers that would induce the error on the earliest cycle possible:

hex     800bf6 / bffffc
decimal 4195835 / 3145727

I had been doing all my modeling on a Sun IPC running UNIX, and I had no access to a Pentium. After predicting the above divide would fail, I drove down to my local CompUSA where I asked a salesperson how to load up Windows calculator on a Pentium (I had never used Windows before). I did the above divide, then multiplied back by the divisor, and what do you know--the answer was off by 256. This represented an error of 1 part in ~16000. Listing Seven provides an algebraic analysis of the relationship between the numbers on Kaiser's list and the above ratio.

I mulled this over for a couple of days and then wrote an abbreviated description of my reasoning. On November 14, I posted the reasoning and program (TWO-THIRDS.C, available electronically; see "Availability," page 3) along with an explanation of the output of TWOTHIRDS.C and ITERATOR.C to comp.sys.intel.

More of the Same

It turns out I was not the only one writing the numbers on Kaiser's list in binary. Dik Winter had also been doing so and queried Andreas Kaiser about the two entries on his list that don't match the equation in Figure 1(a); Kaiser's response is shown in Listing Eight .

To address these final two cases, the divider model had to be fleshed out with a full quotient-digit-selection algorithm and a fully capable calculator of -digit*divisor. The second item was trivial to accomplish (do the appropriate shift, invert, or clear). The creation of a quotient-digit selector required the application of theory and the appropriate choices where the algorithm allowed multiple possibilities for the quotient digit; see Figure 1(h).

Due to the nature of the condition for failure on cycle 14 for numbers near Nicely's prime, the approximation of the remainder used to determine the quotient digit had to be the previously defined chopped remainder sum. The quotient digit of -1 selected for the chopped remainder sum of 250 (-6/8) could equally well have been assigned to 0. I took this to mean that I should choose the quotient digit farthest away from 0 consistent with correct divider operation versus a given combination of remainder and divisor.

So what approximation of the divisor would be appropriate to use in quotient-digit selection? My analysis convinced me that I needed the first six bits of the divisor. But it turns out there was an error in my analysis; only five bits of the divisor are necessary for quotient-digit selection. This error led me to believe that the full divider model would not handle certain divisors correctly, but this turned out to be wrong; the error had no adverse impact on the divider model's ability to predict errors.

The quotient-digit selector is implemented in two parts. Once, at the beginning of the divide, the most-significant six bits of the divisor are checked against several thresholds to pick the thresholds to be checked against the remainder. At the beginning of each cycle of the divide the chopped remainder sum is checked against these thresholds to determine the appropriate quotient digit, and -digit*divisor is subsequently calculated.

I ran the final two cases from Kaiser's list through the model and saw that big pileups of 1s and conditions very similar to the original second condition for failure were occurring on corresponding cycles for these divides. The only differences in the conditions for failure were different values for the chopped remainder sum. The requirement for six 1s in the most-significant chopped-off bits of the remainder remained the same.

To model the occurrence and amount of the error, a check for the conditions for failure was inserted into the program; if an error condition was detected, the program would ask the user whether a correct or incorrect result was desired. The amount that appeared to be incorrectly subtracted from the remainder on the cycle of interest was too small to be due to a remainder overflow. Also, the amount subtracted off the remainder was 3 for divisors beginning with 0x8f and 4 in other cases.

I could not conceive of a specific error in the logic design that would result in this somewhat bizarre specification and consequences of failure. So I attributed the error to some complexity (unknown to me) used to speed up digit selection in the Pentium divider's quotient-digit selector and remainder calculator. To model the amount of error precisely, I simply subtracted the appropriate amounts from the remainder when an error condition was encountered.

I posted my more-complete divider model to comp.sys.intel on November 16. At this point, the following divisors and digit sequences near the error had been found to produce errors:

0xbf.. -1 +2
0x8f.. -1 +2
0xa7.. -2 +2

Upon examination of the quotient-digit-selection algorithm, the following divisors and digit sequences appeared to create very similar situations:

0x8f.. -2 +2
0xbf.. -2 +2
0xd7.. -2 +2
0xef.. -1 +2
0xef.. -2 +2

I played around with the possible combinations of dividend and divisor that would appear to induce failure and discovered that the following fairly simple conditions would generate error conditions along the -2 +2 digit sequence:

(intdividend-deltadividend)/
    (intdivisor-deltadivisor)

intdivisor=3, 9, 15, 21, 27
intdividend>0

10-8>deltadividend/intdividend>
     deltadivisor/intdivisor>0

Either intdividend modulo intdivisor=intdivisor/3 or intdividend modulo intdivisor= 2*intdivisor/3 must hold (which one depends on the relative binary exponents of the operands); for example, 6.9999995/2.99999999. These operands were simple enough to generate on the fly. I also constructed a test case for the remaining -1 +2 digit sequence case and made another trip to CompUSA. All of the cases listed earlier produced divide errors. I also tested a couple of cases halfway between these cases on the outside chance that they would produce divide errors; these divides were performed correctly.

I updated the divider model to test for and correctly model all the divide errors that I knew of and posted my results along with some general characteristics of operands that were at risk and an analysis of what the probabilities of error actually were. The post went out to comp.sys.intel on November 20. The divider model available electronically (ITERATOR.C) has more extensive comments and improved variable names; the original error test and modeling are included but commented out.

A Final Insight

Over the two weeks following my November 20 posting, I was contacted by Cleve Moler of The Mathworks and became involved in his and Mathisen's effort to produce an efficient and accurate software workaround to the Pentium divide problem. We determined that in order to achieve accuracy, the software patch would have to do a divisor check prior to each division to determine if the divide had any risk of failing. If this check came back positive, both the divisor and dividend would be scaled by 15/16 and the divide would then be performed as normal. This scaling was guaranteed to map any at-risk divisor into a divisor that was not at risk. My part in this effort was to determine what constituted an at-risk divisor.

I had been investigating the possible quotient-digit sequences and divisor-bit sequences that could cause an error by using my model of the Pentium divider. In particular, I had been trying to construct erroneous divides with the least possible consecutive 1s in the divisor starting at bit position 2^-5. I had been able to construct an error with only eight consecutive 1s (bit positions 2^-5 to 2^-12) but no fewer. So we decided that a bit mask checking for 1s in these positions along with a table lookup to ensure that the first five bits of the divisor were also at risk would be a good divisor check.

At this point Moler, Mathisen, and I got together with the Intel compiler group and Peter Tang to jointly produce a software patch that Intel would then distribute to all compiler and assembler vendors. We immediately concluded that a divisor check based only upon empirical results would not be of sufficient quality. What was needed was closed-form proof that a certain number of 1s was required to address the flaw in Intel's P-D (Partial Remainder-Divisor) table. This proof would use the design itself, not empirical results, as its starting point.

On December 2, I received a copy of the Intel white paper on the flaw, but I had a difficult time reconciling the flaw in the P-D table with my empirically determined conditions for failure. I initially thought that I would need to know some additional design details to produce a proof. Later that day, however, I realized that what I really needed was better insight into the mechanism of failure.

For at-risk divisors just less than 3/2 (first five bits 1.0111), my empirically determined conditions for failure were a chopped remainder sum of 30 (30/8) and six 1s in the highest bits of the chopped-off portion of the carry-save remainder. The flawed P-D entry for this divisor was associated with a chopped remainder sum of 31 (31/8).

The insight was that my model was detecting the error one cycle before the error actually occurred! Meeting the model's condition for failure on cycle[flaw-1] always resulted in the selection of a +2 digit and the addressing of the flaw on cycle[flaw]. In addition, this was the only way to reach the flaw. Using this insight, Tang and I were able to produce a closed-form proof that showed that six 1s in bit positions 2^-5 to 2^-10 in the divisor were required to address the flaw. I was later able to construct a divide failure that had a 0 in bit position 2^-11 showing that my originally determined eight 1s was incorrect.

After addressing the flaw, which incorrectly selects a 0 digit instead of a +2, the remainder overflows. This gives the appearance that 16 has been subtracted from the remainder at the end of cycle[flaw]. For divisors beginning with 1.0100, 1.0111, 1.1010, and 1.1101, this results in a remainder that abides by the equation in Figure 1(f), and the divide continues normally. For a divisor beginning with 1.0001, the remainder is out of the bounds of the equation in Figure 1(f) at the beginning of cycle[flaw+1]. A 0 digit is selected again on this cycle and the remainder overflows again in the opposite direction. Since this overflow occurred on the next cycle, it is a factor of 4 less than the first overflow, leading to a net effective subtraction from the remainder of 12. The remainder is then back in bounds, and the divide proceeds normally.

With the exception of the five flawed P-D entries, no other design details are relevant to these divide errors other than those that were reverse-engineered. Intel's P-D table is functionally identical to my thresholding mechanism in selecting quotient digits. Modification of the model to reflect the flawed P-D entries actually made my modeling of the error conditions simpler. A model of the flaw is included in ITERATOR.C.

I would like to thank Cleve Moler, Terje Mathisen, Peter Tang, and the forces at Intel for involving me in their effort to create a software workaround to this problem; Andreas Kaiser for providing such incredibly valuable information; and my employer, Vitesse Semiconductor, for supporting my involvement in this firestorm. Most of all, I would like to thank Dr. Thomas Nicely for opening this window into the Pentium's divider design.

Bibliography

Atkins, Daniel E. "Higher-Radix Division Using Estimates of the Divisor and Partial Remainders." IEEE Transactions on Computing, 1968.

Koren, Israel. Computer Arithmetic Algorithms. Englewood Cliffs, NJ: Prentice Hall, 1993.

Omondi, Amos R. Computer Arithmetic Systems: Algorithms, Architecture and Implementations. Englewood Cliffs, NJ: Prentice Hall, 1994.

Sharangpani, H.P. and M.L. Barton. "Statistical Analysis of Floating Point Flaw in the Pentium Processor (1994)." Intel Corp., November 30, 1994. (Available from http://www.intel.com/product/pentium/white11/ index.html.)

Figure 1 Equations describing (a) the common features of the failing reciprocals; (b - f) the correct operation of dividers in general; (g) radix 10; and (h) radix 4 SRT dividers. Table 1 Carry-save adder truth table.

Listing One


Pentium (60 & 90)
  8.24633702441000E+0011 = 4026BFFFFFB829000000  824633702441
  1.00000000000000E+0000 = 3FFF8000000000000000             1
  1.21265962489116E-0012 = 3FD7AAAAAADFDB8E4CCB  1/824...
  9.99999996274710E-0001 = 3FFEFFFFFFF000000001  (1/824..)*824...
486DX
  8.24633702441000E+0011 = 4026BFFFFFB829000000
  1.00000000000000E+0000 = 3FFF8000000000000000
  1.21265962940867E-0012 = 3FD7AAAAAAEA8638FB73
  1.00000000000000E+0000 = 3FFF8000000000000000

Listing Two


    3221224323           12884897291          206158356633
  824633702441         1443107810341         6597069619549
 9895574626641        13194134824767        13194134826115
13194134827143        13194134827457        13194138356107
13194139238995        26388269649885        26388269650425
26388269651561        26388276711601        26388276712811
52776539295213        52776539301125        52776539301653
52776539307823        52776553426399

Listing Three


1011111111111111111111111011100000101001       = 824633702441
1011111111111111111110111000001000110111101101 = 52776539295213

Listing Four


In base 4 representation the operands are:

dividend   1.00000113323  X  2^22
divisor    1.13333333332  X  2^21

step number  ==>  1 2 3 4 5 6 7 8 9
plus digits  ==>  1.0 0 0 0 0 0 2 ?   <==  Add these two numbers
minus digits ==> -0.1 1 1 1 1 1 0 ?   <==  to create the quotient
                -------------------------
  1.13333333332 | 1.0 0 0 0 0 1 1 3 3 2 3   [dividend=remainder 1]
                 -1.1 3 3 3 3 3 3 3 3 3 2   (minus 1*divisor)
                  -----------------------
                   -1.3 3 3 3 2 2 0 0 0 3 0
                    1.1 3 3 3 3 3 3 3 3 3 2   (minus -1*divisor)
                    -----------------------
                     -1.3 3 3 2 2 0 0 0 3 2 0
                      1.1 3 3 3 3 3 3 3 3 3 2   (minus -1*divisor)
                      -----------------------
                       -1.3 3 2 2 0 0 0 3 2 2 0
                        1.1 3 3 3 3 3 3 3 3 3 2   (minus -1*divisor)
                        -----------------------
                         -1.3 2 2 0 0 0 3 2 2 2 0
                          1.1 3 3 3 3 3 3 3 3 3 2   (minus -1*divisor)
                          -----------------------
                           -1.2 2 0 0 0 3 2 2 2 2 0
                            1.1 3 3 3 3 3 3 3 3 3 2   (minus -1*divisor)
                            -----------------------
                             -0.2 0 0 0 3 2 2 2 2 2 0
                              1.1 3 3 3 3 3 3 3 3 3 2   (minus -1*divisor)
                              -----------------------
                                3.3 3 3 0 1 1 1 1 1 2 0   [remainder 8]
                               -2.3 3 3 3 3 3 3 3 3 3 0   (minus 2*divisor)
                                -----------------------
                                  3.3 3 0 1 1 1 1 1 3 0 0   [remainder 9]
                                 ??.? ? ? ? ? ? ? ? ? ? ?   (minus ??*divisor)
                                  -----------------------
                                   ??.? ? ? ? ? ? ? ? ? ? ?
plus digits  ==>  1.0 0 0 0 0 0 2 0 0 0...
minus digits ==> -0.1 1 1 1 1 1 0 0 0 1...
                -------------------------
                                  3.3 3 0 1 1 1 1 1 3 0 0
[The Pentium's Choice :-)]        0.0 0 0 0 0 0 0 0 0 0 0   (minus 0*divisor)
                                  -----------------------
[overflow remainder and           3 3.3 0 1 1 1 1 1 3 0 0
wrap around to negative ==>]       -0.0 3 2 2 2 2 2 1 0 0 0
[The division continues]            0.0 0 0 0 0 0 0 0 0 0 0  (minus 0*divisor)
                                    -----------------------
                                     -0.3 2 2 2 2 2 1 0 0 0 0
                                      1.1 3 3 3 3 3 3 3 3 3 2 (minus -1*dvsr.)
                                      -----------------------
                                        2.1 1 1 1 1 2 3 3 3 2 0
                                        .
                                        .
plus digits  ==>  1.0 0 0 0 0 0 2 2 2 2...
minus digits ==> -0.1 1 1 1 1 1 0 0 0 0...
                -------------------------
                                  3.3 3 0 1 1 1 1 1 3 0 0
[Correct Answer]                 -2.3 3 3 3 3 3 3 3 3 3 0   (minus 2*divisor)
                                  -----------------------
                                    3.3 0 1 1 1 1 1 3 1 0 0
                                   -2.3 3 3 3 3 3 3 3 3 3 0  (minus 2*divisor)
                                    -----------------------
                                      3.0 1 1 1 1 1 3 1 1 0 0
                                     -2.3 3 3 3 3 3 3 3 3 3 0  (minus 2*dvsr.)
                                      -----------------------
                                        0.1 1 1 1 1 3 1 1 1 0 0
                                        .
                                        .

Listing Five


   10011010111011001       10011010111011001
  + 1000101000101001      + 1000101000100001
                ^                       ^
   -----------------       -----------------
   11100000000000010       11011111111111010

Listing Six


Cycle number         13 14 15

Pentium ==>
    +1 -1 (ten -1's) -1 -1 +2  0  0 ...
               or
    +1 -1 (ten -1's) -1  0 -2  0  0 ...

Correct ==>
    +1 -1 (ten -1's) -1 -1 +2 +2 +2 ...
               or
    +1 -1 (ten -1's) -1  0 -1 -1 -2 ...

Listing Seven


All but two of the numbers posted by Andreas Kaiser
(including Nicely's prime) had the form:

3*(2^(K+30)) - 1149*(2^(K-(2*J))) - delta*(2^(K-(2*J)))

Normalize this expression to a number in [1,2) by
dividing by 2^(K+31):

3/2 - 1149*(2^(-31 - 2*J)) - delta*(2^(-31 - 2*J))

Delta has to be generally between 0 and 1 but these bounds
vary with J. Taking note that 1149 = 1152 - 3 = (9/8)*1024 - 3,
the above can be restated as:

3/2 - (3/2)*((3/4)*(2^(-21 - 2*J)) - delta*(2^(-31 - 2*J)))
             ---------------------x-----------------------

The restrictions on delta are now that it must be in general between
4/3 and 2.  It turns out the upper limit on delta of 2 varies very
little with J but the lower limit of 4/3 varies greatly with J.  (For
J large it goes towards a limit of 0 and for J negative it is greater
than 2, i.e., no failures.)

It is now clear that the criteria for failure of a divide is an
at-risk divisor (an at-risk divisor is one with a bit sequence
appropriate for the buildup of a pile of 1's for a given quotient
digit selection sequence; for this digit sequence this means ~19
consecutive 1's) coupled with a specific digit selection sequence
during the divide.  A specific digit selection sequence is precisely
equivalent to getting a specific quotient.  Do a simple one term
Taylor series expansion to get a simple expression for the quotient:

(1 + y)/(3/2 - (3/2)*x) = (2/3)*((1 + y)/(1 - x)) ~=
(2/3)*(1 + y)*(1 + x) ~=
(2/3)*(1 + y + x)

The variable named 'y' is 0 in the initial reciprocals.  The reason
there are no failures for J negative in the above expression for 'x'
is that the divisor becomes not sufficiently at risk.  If more of the
contribution to the quotient is moved from 'x' to 'y' J can be
brought negative and while still maintaining an at risk divisor:

y + x = (3/4)*(2^(-21 - 2*J)) - delta*(2^(-31 - 2*J))
0 < x < 2^-21     : This is the restriction associated
                  : with the lower bound on delta

For the pair 4195835/3145727 the analysis works as follows:

4195835 = 2^22 + 1531 = 2^22 + (3/4)*2^11 - 5
        = (2^22)*(1 + (3/4)*(2^-11) - (5/2)*(2^-21))
3145727 = 3*(2^20) - 1 = (2^21)*((3/2) - (2^-21))
        = (2^21)*(3/2)*(1 - (2/3)*(2^-21))
===>
y = (3/4)*(2^-11) - (5/2)*(2^-21)
x = (2/3)*(2^-21)
y + x = (3/4)*(2^-11) - (11/6)*(2^-21)

This corresponds nicely to the above equation with
J = -5 and delta = 11/6.

Listing Eight


dik@cwi.nl writes in article <CytpMv.6D2@cwi.nl>:

> 1000111111111111111000110101111000010101000100 = 9895574626641
> 1010011111111111111101101101011000010010100000 = 1443107810341
> 1011111111111111111110111000001000110111101101 = 52776539295213
>
> Except for the first two there is a common definite pattern:
> a leading 10, followed by a bunch of 1's, followed by 0111000001.
> If the random numbers are random enough this seems to be
> significant.  I would like to see verification of the first
> two numbers listed (perhaps a transcription error or some-such?).

No error of mine, but the results of these two numbers have a
significantly longer correct mantissa as the results of all others:

       9895574626641
9.895574626641000e+12 = 402A8FFFE35E15100000
1.000000000000000e+00 = 3FFF8000000000000000
1.010552734661427e-13 = 3FD3E38E6622AB7F2614
9.999999999998295e-01 = 3FFEFFFFFFFFFFD00000
       1443107810341
1.443107810341000e+12 = 4027A7FFF6D612800000
1.000000000000000e+00 = 3FFF8000000000000000
6.929489209567026e-13 = 3FD6C30C3B66AAA79320
9.999999999999858e-01 = 3FFEFFFFFFFFFFFC0000