Ben is a C programmer and employee-owner at Lockheed Missiles & Space Co., 0/6215, B/551, PO Box 3504, Sunnyvale, CA 94088-3504.
Since the earliest days of electronic computation and data communication, equipment designers have had to grapple with data transmission errors. For example, data being transmitted over a noisy channel, such as a telephone line, may have some bits garbled by external phenomena to the extent that they no longer have their original value. In addition, data stored in certain types of computer memory occasionally lose the value in a single bit. In either case, if the data are not somehow corrected or retransmitted, they are worthless. If an error is undetected, the results can be catastrophic -- imagine that the data represent a withdrawal transaction from a bank's automatic teller machine.
Because of the need for reliable, accurate data transmission and storage, various methods have been developed over the years to deal with this problem. By introducing a certain amount of redundancy into the transmitted data, an error can be detected, and even corrected, at the receiving end. A very simple-minded usage of redundancy is shown in the following example. Suppose that we wish to transmit data in 8-bit bytes. By transmitting each byte twice, the receiver can compare corresponding bits and flag any discrepancy as an error. For example, compare the two transmitted bytes:
11010011 11011011
These two bytes differ at the fifth bit from the left, indicating an error. There is no way to know which value (0 or 1) is the correct value for that bit, so the receiver must request retransmission. By using triple redundancy, this problem can be eliminated, as shown:
11010011 11010011 11011011
Under this scheme, each byte is transmitted three times, and we again note that there is a discrepancy in the fifth bit. The bit appears as a 0 in two of the transmitted bytes, and as a 1 in only one byte. We can assume with a high degree of confidence that the correct value for that bit is 0, and retransmission is not necessary. This self-correcting capability is enormously useful in situations such as satellite communications, where signal propagation takes a long time.
The problem with the above approach is that it is inefficient. With triple redundancy, only 33 percent of the channel capacity is used for actual data. The use of Hamming codes is a widely used method of achieving the same, or better, results. A subset of the bits in each data word is assigned to (overlapping) groups, and a "check bit" is given to each group. This allows the detection/correction of errors in received code words with much more efficient use of the channel than the method previously described.
To illustrate this procedure, let's assume that we wish to transmit 4-bit message (data) words across a noisy channel. We also want to be able to detect and correct any single bit error occurring during transmission. It will be necessary to create three check groups, and assign three of the message bits to each group. Number the message and check bits as follows:
P1 P2 M3 P4 M5 M6 M7
P1, P2, and P4 are check bits (P stands for parity). M3, M5, M6, and M7 are the message bits. Before transmitting the 7-bit code word, values must be assigned to the check bits. This is done as follows: P1 is assigned odd parity over M3, M5, and M7. P2 is assigned odd parity over M3, M6, and M7. Finally, P4 is assigned odd parity over M5, M6, and M7. For example, take the code word:
M3 M5 M6 M7 1 0 1 1
Because the exclusive-OR of M3, M5, and M7 is 0, P1 will be 1. The 4-bit group, including the check bit itself, must have odd parity. By the same reasoning, P2 is 0, and P4 is 1. The complete code word to be transmitted is:
P1 P2 M3 M5 M6 M7 1 0 1 0 1 1
You may notice that the P1 check bit represents the parity over all message bit positions with the 21 bit present, as in positions 3, 5, and 7 (011, 101, 111 in binary). P2 is parity for the 22 bit positions, and P4 is parity for the 24 bit positions. The placement of check bits in the completed code word is done to reinforce this structure conceptually. In actual practice, any ordering of the bits will do, as long as sender and receiver agree on the same order.
To see that the added check bits provide enough redundancy to detect and correct any single bit error, look at the receiving and decoding process. When the receiver has all 7 bits of the code word, the grouping of bits is repeated, and parity of each 4-bit group is checked. P1, M3, M5, and M7 are exclusive-ORed together and checked for odd parity; the same process occurs for the P2 and P4 groups.
If one or more of the three groups do not have odd parity, a transmission error has occurred. Assume that the received code word is 1010011, which differs from the transmitted code word. The bits of the P1 group are 1101, the P2 group is 0111, and the P4 group is 0011. The exclusive-OR of the 4 bits in each group is 0, 0, and 1, respectively. As one of the groups, P4, does not have odd parity, an error has been detected.
We have all of the information needed to locate the bit position of the error and we can correct it. The three parity bits computed by the receiver are arranged in descending order, left to right, by group number, P4, then P2, then P1. Arranged this way, the bits are called the "syndrome." In the above example, the syndrome is 100, which is the binary number four (4). It just so happens that bit 4 (counting from the left) in the received code word is the bit in error. Simply invert the bit to correct it. Bit 4 also happens to be one of the check bits, and nicely illustrates that the Hamming method of error detection works for any bit in the code word, whether it is a message or a check bit.
The Hamming method can be adapted to work for longer message words. Efficiency is gained in terms of the check bits to message bits ratio, but the probability of an error occurring in a given code word is increased. On the other hand, if more check bits are used, greater numbers of bit errors in a word can be detected and corrected.
Methods used in logic designs to add check bits to message words include parity-generation circuits (Figure 1) and address-table lookups (as through a ROM). In the first case, exclusive-OR gates are used to generate odd parity from 2-bit pairs. The outputs are exclusive-ORed again to compute parity for a 4-bit group. The parity (check) bits are then transmitted with the message bits.
For lookup-based check-bit generation, the message bits are presented as the address to a memory. The data at each memory location is the appropriate combination of check bits for that message word. Again, assume a 4-bit message, and three check bits for single-error correction. The resulting memory lookup table would require four address bits and 3-bit data words, or a 16-word x 3-bit memory. As with the exclusive-OR network, the check bits produced from the lookup table are transmitted with the message bits.
The traditional method for decoding received data is with an exclusive-OR circuit, as shown in Figure 2, for the 4-bit message and 3-bit check scheme. The 7 bits of the code word are presented in parallel to the exclusive-OR gates. These are divided into three groups, corresponding to the three check bits. The odd parity of each of the 4 bits in the group is calculated, and the resulting syndrome is presented to a decoder. The decoder can be considered to be the error detector. If an error is indicated by the syndrome, one of the outputs Y1 to Y7 will be asserted. If the syndrome is 0, Y0 will be asserted, meaning no error was detected.
The decoder outputs are used to drive the correction logic, consisting of one exclusive-OR gate for each message bit. The check bits do not need to be explicitly corrected, as they are used only internally by the receiver. This is the reason nothing is connected to the Y1, Y2, and Y4 outputs of the decoder.
Normally, the exclusive-OR gates will pass the message bit through unchanged. In some instances, the decoder asserts one of the outputs Y3, Y5, Y6, or Y7. This will cause the corresponding gate to invert its received message bit and correct the error automatically.
An alternate approach to decoding, which is similar to encoding by ROM table lookup, boasts the same simplicity of design. For the decoding case, the ROM needs a 7-bit address and 4-bit words, or a 128-word x 4-bit ROM (see Figure 3). The trick is in deciding what to store in each word of the ROM.
In those cases where the incoming code word is correct, the answer is clear. There are 16 possible message words (24) and 16 different correct code words. The message word is stored at each of these 16 ROM addresses. When the 7-bit code word is presented at the inputs, the embedded message word will be delivered at the outputs. It can then be passed on by the receiver as correct message word.
For example, if the (valid) code word 1011011 is received, it is applied to the address inputs of the ROM. The data stored at that location is 1011, the four embedded message bits from the code word. The data at address X = X, for the 16 correct code words.
The remaining 112 (27 - 24) ROM locations correspond to erroneous code words. Given a set of 16 possible correct code words and assuming that one single-bit error occurs, each original code word can change in seven different ways. This means there are 16*7 = 112 possible erroneous code words, but because of the redundancy built in, a received code word containing a single error has one unambiguous correct code word. The reason this is so has to do with the "Hamming distance" of the code scheme, a concept that would, unfortunately, require a rather lengthy explanation.
We must store something at these 112 locations that will automatically correct the message portion of the code word. If each erroneous code word is a ROM address, then the data stored at that address should be the corrected code word. In other words, the data stored at address Y = X, where X is the corresponding correct code word.
Assume that the code word 1010011 is received and presented at the address inputs of the ROM. As it happens, this is an erroneous code word. The data stored at location 1010011 would be the corrected code word, 1011, without the check bits.
By considering each permutation in advance (an easily computable task for any realistic code word length), it is possible to program a ROM to do the decoding at the receiving end. The ROM-based design is simpler and cleaner than the exclusive-OR circuit and, for small word sizes, approximately as fast.
Furthermore, ROM-based designs can easily be extended to handle longer message and code word lengths. With longer code words, the efficiency ratio of message bits to check bits increases, although the chances of an error occurring in a given code word are increased. The ROM-based design is also able to accommodate different code schemes that can detect and/or correct more errors in a code word by using more check bits and assigning the groups differently. In these cases, a different size ROM is substituted and programmed. With the exclusive-OR circuit, a complete redesign is often necessary.
Hamming, R.W. "Error Detecting and Error Correcting Codes," Bell System Technical Journal, vol. 29, (April, 1950), 147 - 160.
Kohavi, Zvi. Switching and Finite Automata Theory, Second edition. New York: McGraw-Hill, 1978. 14 - 19.
Copyright © 1989, Dr. Dobb's Journal