Differential and Linear Cryptanalysis

Attacking the Data Encryption Standard

Bruce Schneier

Bruce is a DDJ contributing editor and author of Applied Cryptography, Second Edition (John Wiley & Sons, 1996). He can be contacted at schneier@counterpane.com.

The Data Encryption Standard (DES) has been the workhorse of cryptography for almost 20 years. Recently, two powerful new attacks on DES have been invented: differential and linear cryptanalysis. Both are statistical in that an attacker collects a large amount of plaintext and ciphertext associated with a given key, then uses that information to determine the key. In this article, I'll explain the attacks by showing how they work against DES.

A DES Backgrounder

Since the mid-1970s, DES has been analyzed and discussed in every book and magazine (including DDJ) that covers cryptography. DES is a 16-round block cipher. Its input is a 64-bit data element, x. You first permute the bits of x in a fixed pattern, then divide x into two 32-bit halves: L and R. Then, for i=1 to 16:

L_i=R_i-1
R_i=L_i-1f(R_i_-1,K_i)

After the sixteenth round, you swap L and R. Next, you recombine L and R and permute the bits again to get the ciphertext; see Figure 1.

Function f, the round function, is where the security lies. First, the 32 input bits are permuted and expanded to 48 bits. Then, the 48 bits are divided into eight 6-bit chunks. Each chunk goes through an "S-box," the output of which is a 4-bit number. These output 32 bits are permuted again.

DES has many constants, including the fixed permutations and the eight different S-boxes. These are all specified, but the designers of DES give no reasons why they were chosen instead of others.

Differential Cryptanalysis

In 1990 and 1991, Eli Biham and Adi Shamir introduced differential cryptanalysis, which looks specifically at pairs of ciphertexts whose plaintexts have particular differences. Differential cryptanalysis analyzes the evolution of these differences as the plaintexts propagate through the rounds of DES when they are encrypted with the same key.

Put simply, the technique chooses pairs of plaintexts with a fixed difference. The two plaintexts can be chosen at random, as long as they satisfy particular difference conditions; you don't even have to know their values. Then, using the differences in the resulting ciphertexts, you assign different probabilities to different keys. As you analyze more ciphertext pairs, one key will emerge as the most probable. This is the correct key.

The details are, of course, more complicated. Consider Figure 2, the DES round function. Imagine a pair of inputs, X and X', that have the difference X. The outputs, Y and Y' are known, therefore, so is the difference, Y. Both the expansion permutation and the P-box are known, so A and C are known. B and B' are not known, but their difference B is known and equal to A. (When looking at the difference, the XORing of K_i with A and A' cancels out.) So far, so good. Here's the trick: For any given A, not all values of C are equally likely. The combination of A and C suggests values for bits of A XOR K_i and A' XOR K_i. Since A and A' are known, this gives us information about K_i.

Look at the last round of DES. (Differential cryptanalysis ignores the initial and final permutations since they have no effect on the attack--except to make it harder to explain.) If we can identify K₁₆, then we have 48 bits of the key. (Remember, the subkey in each round consists of 48 bits of the 56-bit key.) The other eight bits we can get by brute force. Differential cryptanalysis will get us K₁₆.

Certain differences, called "characteristics," in plaintext pairs have a high probability of causing certain differences in the resulting ciphertext pairs. Characteristics extend and define a path through several rounds. There is an input difference, a difference at each round, and an output difference--with a specific probability.

You can find these characteristics by generating a table where the rows represent the possible input XORs (the XOR of two different sets of input bits), the columns represent the possible output XORs, and the entries represent the number of times a particular output XOR occurs for a given input XOR. You can generate such a table for each of DES's eight S-boxes.

Figure 3 (a), for example, is a one-round characteristic. The input difference of the left side is L; it could be anything. The input difference of the right side is 0. (The two inputs have the same right side, so their difference is 0.) Since there is no difference going into the round function, there is no difference coming out of it. Therefore, the output difference of the left side is L xor 0=L, and that of the right side is 0. This is a trivial characteristic, and is true with probability 1.

Figure 3 (b) is a less obvious characteristic. Again, the input difference to the left side is arbitrary--L. The input difference to the right side is 0x60000000; the two inputs differ in only the first and third bits. With a probability of 14/64, the output difference of the round function is L xor 0x00808200, so the output difference of the left side is L xor 0x00808200 and that of the right side is 0x60000000--with probability 14/64.

Different characteristics can be joined and, assuming the rounds are independent, the probabilities can be multiplied. Figure 4 joins the two characteristics just described. The input difference to the left side is 0x00808200, and that to the right side is 0x60000000. At the end of the first round, the input difference and the output of the round function cancel out, leaving an output difference of 0. This feeds into the second round, where the final output difference of the left side is 0x60000000 and that of the right side is 0. This two-round characteristic has a probability of 14/64.

A plaintext pair that satisfies the characteristic is a correct pair; the pair that does not is a wrong pair. A correct pair will suggest the correct round key (for the last round of the characteristic); a wrong pair will suggest a random round key. To find the correct round key, simply collect enough guesses so that one subkey is suggested more often than all the others. In effect, the correct subkey will rise out of all the random alternatives.

So, the basic differential attack on n-round DES will recover the 48-bit subkey used in round n, and the remaining eight key bits are obtained by brute-force guessing.

There are still considerable problems. There is a negligible chance of success until you reach some threshold; that is, until you accumulate sufficient data, you can't tell the correct subkey from all the noise. And the attack isn't practical: You have to use counters to assign different probabilities to 2⁴⁸ possible subkeys, and too much data is required to make this work.

Consequently, Biham and Shamir tweaked their attack. Instead of using a 15-round characteristic on 16-round DES, they used a 13-round characteristic and some tricks to get the last few rounds. A shorter characteristic with a higher probability worked better. And they used some clever mathematics to obtain 56-bit key candidates which could be tested immediately, eliminating the need for counters. This attack succeeds as soon as a right pair is found; this avoids the threshold and gives a linear success probability. If you have 1000 times fewer pairs, you have 1000 times smaller chance of success--but there is always some chance of immediate success.

The results are most interesting. DES variants with fewer rounds are highly susceptible to differential cryptanalysis. The best attack against full 16-round DES requires 2⁴⁷ chosen plaintexts. This can be converted to a known-plaintext attack that requires 2⁵⁵ known plaintexts. The analysis requires 2³⁷ DES operations.

Differential cryptanalysis works against DES and other similar algorithms with constant S-boxes. The attack is heavily dependent on the structure of the S-boxes; the ones in DES just happen to be optimized against differential cryptanalysis. The attack works against DES in any of its operating modes--ECB, CBC, CFB, and OFB--with the same complexity.

DES's resistance can be improved by increasing the number of rounds. Chosen-plaintext differential cryptanalysis DES with 17 or 18 rounds takes about the same time as a brute-force search. At 19 rounds or more, differential cryptanalysis becomes impossible because it requires more than 2⁶⁴ chosen plaintexts: Remember, DES has a 64-bit block size, so it only has 2⁶⁴ possible plaintext blocks. (In general, an algorithm is resistant to differential cryptanalysis if the amount of plaintext required to mount such an attack is greater than the amount of plaintext possible.)

Realize that this attack is largely theoretical. The enormous time and data requirements to mount a differential cryptanalytic attack put it beyond the reach of almost everyone. To get the requisite data for this attack against a full DES, you would have to encrypt a 1.5-MB/sec data stream of chosen plaintext for almost three years. Furthermore, this is primarily a chosen-plaintext attack. To convert it to a known-plaintext attack, you have to sift through all of the plaintext-ciphertext pairs looking for the useful ones. For full 16-round DES, this makes the attack slightly less efficient than brute force (the differential cryptanalytic attack requires 2^55.1 operations, and brute force requires 2⁵⁵). Properly implemented, DES is still secure against differential cryptanalysis.

Why is DES so resistant to differential cryptanalysis? Why are the S-boxes optimized to make this attack as difficult as possible? Why are there as many rounds as required, but no more? Because, as Don Coppersmith of IBM admitted, the designers knew about it in the mid-1970s (but it was classified government information).

Linear Cryptanalysis

Linear cryptanalysis, invented by Mitsuru Matsui in 1993, is a type of cryptanalytic attack that uses linear approximations to describe the action of DES. This means that if you XOR some of the plaintext bits together, XOR some ciphertext bits together, and then XOR the results, you will get a single bit that is the XOR of some of the key bits. This is a linear approximation, and will hold with some probability p. If p is not equal to 1/2, then this bias can be exploited. You use collected plaintexts and associated ciphertexts to guess the values of the key bits. The more data you have, the more reliable the guess. The greater the bias, the greater the success rate with the same amount of data.

How do you identify good linear approximations for DES? For starters, you find good one-round linear approximations and join them together. (Again, ignore the initial and final permutations; they don't effect the attack.) Look at the S-boxes. There are six input bits and four output bits. The input bits can be combined using XOR in 63 useful ways (2⁶-1), and the output bits can be combined in 15 useful ways. For each S-box, you can evaluate the probability that for a randomly chosen input, an input XOR combination equals some output XOR combination. If a combination has a high-enough bias, then linear cryptanalysis may work.

If the linear approximations are unbiased, then they will hold for 32 of the 64 possible inputs. I'll spare you the pages of tables, but the most biased S-box is S-box 5. In fact, the second input bit is equal to the XOR of all four output bits for only 12 inputs. This translates to a probability of 3/16, or a bias of 5/16, and is the most extreme bias in all the S-boxes.

Figure 5 shows how to turn this into an attack against the DES round function. The input bit into S-box 5 is b₂₆. (I am numbering the bits from left to right, and from 1 to 64. Matsui ignores this convention with DES and numbers his bits from right to left and from 0 to 63. It's enough to drive you mad.) The four output bits from S-box 5 are c₁₇, c₁₈, c₁₉, and c₂₀. We can trace b₂₄ backwards from the input to the S-box. The bit a₂₆ is XORed with a bit from the subkey K_i_,26 to obtain b₂₆. Bit X₁₇ goes through the expansion permutation to become a₂₆. After the S-box, the four output bits go through the P-box to become four output bits of the round function: Y₃, Y₈, Y₁₄, and Y₂₅. This means that with probability 1/2-5/16:

X₁₇Y₃Y₈Y₁₄Y₂₅=K_i,26

Linear approximations for different rounds can be joined in a manner similar to that discussed under differential cryptanalysis. Figure 6 is a three-round approximation with a probability of 1/2+.0061. The individual approximations are of varying quality: The first and last are pretty good, and the middle is bad. Together, the three one-round approximations give a very good three-round approximation.

The basic attack is to use the best linear approximation for 16-round DES. It requires 2⁴⁷ known-plaintext blocks and will result in one key bit. This is clearly not very useful. If you interchange the role of plaintext and ciphertext and use decryption as well as encryption, you can get two key bits. This still isn't very useful.

There are refinements. Use a 14-round linear approximation for rounds 2 through 15. Guess the six subkey bits relevant to S-box 5 for the first and last rounds (12 key bits in all). Effectively, you are doing 2¹² linear cryptanalyses in parallel and picking the correct one based on probabilities. This recovers the 12 bits, plus the b₂₆, and reversing plaintext and ciphertext recovers another 13 bits. To get the remaining 30 bits, use exhaustive search. There are other tricks, but that's basically it.

Against full 16-round DES, this attack can recover the key with an average of 2⁴³ known plaintexts. A software implementation of this attack recovered a DES key in 50 days using 12 HP9735 workstations.

Linear cryptanalysis depends heavily on the structure of the S-boxes, and the S-boxes in DES are not optimized against this attack. According to Don Coppersmith, resistance to linear cryptanalysis "was not part of the design criteria of DES." Either they didn't know about linear cryptanalysis, or they knew about something else even more powerful whose resistance criteria took precedence.

Linear cryptanalysis is newer than differential cryptanalysis, and there may be more improvements in the years to come.

Future Directions

Work has been done to extend the concept of differential cryptanalysis to higher-order differentials. Lars Knudsen uses "partial differentials" to attack 6-round DES; it requires 32 chosen plaintexts and 20,000 encryptions. The attack is too new to know if these extensions will make it easier to attack full 16-round DES.

Another avenue of attack is differential-linear cryptanalysis: combining differential and linear cryptanalysis. Susan Langford and Martin Hellman have an attack on eight-round DES that recovers ten key bits with an 80 percent probability of success with 512 chosen plaintexts and a 95 percent probability of success with 768 chosen plaintexts. However, it doesn't seem to extend easily to more rounds.

These attacks are still new, and work continues. There may be a breakthrough sometime during the next few years. Maybe we will see a practical statistical attack against DES. Who knows?

References

Biham E. and A. Shamir. Differential Cryptanalysis of the Data Encryption Standard. New York, NY: Springer-Verlag, 1993.

Matsui, M. "Linear Cryptanalysis Method for DES Cipher," Advances in Cryptology-EUROCRYPT '93 Proceedings. New York, NY: Springer-Verlag, 1994.

------. "The First Experimental Cryptanalysis of the Data Encryption Standard." Advances in Cryptology-CRYPTO '94 Proceedings. New York, NY: Springer-Verlag, 1994.

Schneier, B. Applied Cryptography, Second Edition. New York, NY: John Wiley & Sons, 1996.

Differential and Linear Cryptanalysis

Attacking the Data Encryption Standard

Bruce Schneier

A DES Backgrounder

Differential Cryptanalysis

Linear Cryptanalysis

Future Directions

References

Figure 1: DES.

Figure 2: DES round function.

Figure 3: DES characteristics. (a) With probability 1; (b) with probability 14/16 (X=0x60000000, Y=0x00808200).

Figure 4: A two-round DES characteristic with probability 14/64 (X=0x60000000, Y=0x00808200).

Figure 5: A one-round linear approximation for DES.

Figure 6: A three-round linear approximation for DES with probability 1/2+6.1x10^-3(A=[3,8,14,25], B=[8,14,25]).

Differential and Linear Cryptanalysis

Attacking the Data Encryption Standard

Bruce Schneier

A DES Backgrounder

Differential Cryptanalysis

Linear Cryptanalysis

Future Directions

References

Figure 1: DES.

Figure 2: DES round function.

Figure 3: DES characteristics. (a) With probability 1; (b) with probability 14/16 (X=0x60000000, Y=0x00808200).

Figure 4: A two-round DES characteristic with probability 14/64 (X=0x60000000, Y=0x00808200).

Figure 5: A one-round linear approximation for DES.

Figure 6: A three-round linear approximation for DES with probability 1/2+6.1x10-3 (A=[3,8,14,25], B=[8,14,25]).

Figure 6: A three-round linear approximation for DES with probability 1/2+6.1x10^-3(A=[3,8,14,25], B=[8,14,25]).