3.2 Error Detection and Correction
Error detection is generally cheaper (in terms of additional overhead bits)
than error correction. Neither is always needed: audio and video can often
tolerate some errors without noticeably affecting the perceived transmission
quality. Error detection makes sense whenever the data must be absolutely
reliable (an ATM cash machine transaction) or when the medium is very error
prone (phone lines, wireless). Error correction is reasonable when
retransmitting the data is not feasible (e.g. a probe designed to crash land on
Saturn) or very expensive. Much of the current practice in error detection and
correction is based on work by the mathematician Richard Hamming. Applications
include not only data transmission but also data storage (e.g. use of a
checksum to verify data integrity on a storage device).
3.2.1 Error Correcting Codes - Codes that allow the original data to
be reconstructed when one or more errors occur. Generally, the more errors
that can be corrected, the larger the correcting code required (in bits).
- Code word - A data frame generally consists of:
- m data bits (message)
- r code bits
- m + r = n bit code word.
- Hamming distance - The number of bit positions in which two code words
differ. 000 and 111 have a Hamming distance of 3; 101 and 000 have a distance
of 2. The XOR (eXclusive OR) of two code words marks the bits that differ, so
counting the 1 bits in the result gives the distance. For example,

    100010
XOR 011010
    ------
    111000    Distance = 3

A B | A xor B
0 0 |    0
0 1 |    1
1 0 |    1
1 1 |    0
The Hamming distance is significant because, for two codewords a distance d
apart, d single-bit errors are enough to convert one into the other. For a
distance of 1, a single error can convert one codeword into another; for
example, 000000 is a distance of 1 from 000001, so a single error changes
000000 to 000001.
What is the Hamming distance between 000000 and 111100?
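The distance calculation is easy to express in code. The following is a
minimal Python sketch (the name hamming_distance is ours, not from the notes)
that pairs up the bits of two equal-length codewords and counts the positions
that differ:

def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must be the same length")
    # Comparing each pair of bits is the same as XORing them and counting 1s.
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

print(hamming_distance("100010", "011010"))   # 3, as in the example above
print(hamming_distance("000000", "111100"))   # 4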
Parity Example

No parity (m=2, r=0, d=1):
  00   valid
  01   valid
  10   valid
  11   valid
The change of any one bit results in a valid codeword; no error can be
detected.

Even parity (m=2, r=1, d=2):
  000  valid
  001  invalid
  010  invalid
  011  valid
  100  invalid
  101  valid
  110  valid
  111  invalid
Adding parity doubles the number of codewords, but only half are valid. Any
single bit error produces an invalid code.
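A minimal sketch of even parity, using assumed helper names (even_parity_bit,
valid_even_parity) rather than anything from the notes:

def even_parity_bit(data: str) -> str:
    """Parity bit chosen so the total count of 1 bits (data + parity) is even."""
    return "1" if data.count("1") % 2 == 1 else "0"

def valid_even_parity(codeword: str) -> bool:
    """A codeword (data bits followed by the parity bit) is valid when the
    count of 1 bits is even."""
    return codeword.count("1") % 2 == 0

print(even_parity_bit("01"))       # '1', so 01 is sent as 011
print(valid_even_parity("011"))    # True  (valid codeword)
print(valid_even_parity("001"))    # False (a single bit error is detected)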
- What is the odd parity bit for the ASCII data 11111111 and 11111110?
- Is the data-plus-parity codeword 111100001 valid for even parity?
- Suppose that one million bits were sent with a single parity bit for
error detection. Would a 1-bit error be detected? Would all 2-bit errors
be detected?
- What was sent if 000011 is received and we assume a 1-bit error occurred?
- How many errors occurred, at a minimum, if 011001 is received? Can it be
corrected reliably? What then should be done on receiving 000011?
- Error correcting code construction - We want to construct an error
correcting code with the minimum number of check bits of overhead. For single
bit error correction the limit is, for:
- m data bits
- r check bits
- m + r + 1 <= 2^r
- r=3 can correct one error in m=4 data bits, since m+3+1 <= 2^3 = 8, or m=4.
- r=4 can correct one error in m=11 data bits, since m+4+1 <= 2^4 = 16, or m=11.
- r=5 can correct one error in m=26 data bits, since m+5+1 <= 2^5 = 32, or m=26.
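The bound can be checked mechanically. The sketch below (the name
min_check_bits is ours) finds the smallest r satisfying m + r + 1 <= 2^r for
a given number of data bits m:

def min_check_bits(m: int) -> int:
    """Smallest r with m + r + 1 <= 2**r (single bit error correction)."""
    r = 1
    while m + r + 1 > 2 ** r:
        r += 1
    return r

for m in (4, 11, 26, 1000, 1000000):
    print(m, min_check_bits(m))    # 4->3, 11->4, 26->5, 1000->10, 1000000->20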
The following is an example of a method by Hamming for constructing a minimal
single bit error correcting code. The code has m=4 data bits, and thus can
encode 16 data values (binary 0000 through 1111), and r=3 check bits.
There are seven bits, numbered from 1 to 7, with four data bits (m3, m2, m1,
m0) and three check bits (p2, p1, p0). Note that check bits are placed at
positions numbered as a power of 2 (e.g. check bit p2 is at position 4 = 2^2)
between the data bits. Data bits can be in any order but below are arranged
in the standard high-bit-at-left order. The m data bits (m3 m2 m1 m0) and r
check bits (p2 p1 p0) are then organized into a vector as follows:
POSITION   1    2    3    4    5    6    7
BIT        p0   p1   m3   p2   m2   m1   m0
Data bits are checked by the check bits whose positions sum to the position
of the data bit. In this example:

m3 is checked by p0 and p1:      position of m3 = position of p0 + position of p1 = 1 + 2 = 3
m2 is checked by p0 and p2:      position of m2 = position of p0 + position of p2 = 1 + 4 = 5
m1 is checked by p1 and p2:      position of m1 = position of p1 + position of p2 = 2 + 4 = 6
m0 is checked by p0, p1 and p2:  position of m0 = position of p0 + position of p1 + position of p2 = 1 + 2 + 4 = 7
The check bit values are computed from the data bits by forming the exclusive
OR (xor) of all data bits checked by that check bit:

p2 = m2 xor m1 xor m0
p1 = m3 xor m1 xor m0
p0 = m3 xor m2 xor m0

The sender computes p0, p1 and p2. For example, with the data m3 m2 m1 m0 =
1100 used below:

p2 = m2 xor m1 xor m0 = 1 xor 0 xor 0 = 1

The receiver can then perform the same calculation for the check bits, and if
any differ a transmission error has occurred.
Error position vector: The binary representation of the error position is
given by the vector (C2, C1, C0), where:

C0 = p0 xor m3 xor m2 xor m0
C1 = p1 xor m3 xor m1 xor m0
C2 = p2 xor m2 xor m1 xor m0

The receiver computes C0, C1 and C2 from the received bits. Continuing the
example, with p2 = 1 and no errors in p2, m2, m1, m0:

C2 = p2 xor m2 xor m1 xor m0 = 1 xor 1 xor 0 xor 0 = 0
Example: The sender computes the check bits and transmits both data and check
bits as in the vector above. The receiver computes the error position vector
from the received data and check bits. For example, to send the data 1100 the
vector transmitted would be 0111100. Should a one bit error occur in position
4, the received vector would be 0110100.

POSITION   1    2    3    4    5    6    7
BIT        p0   p1   m3   p2   m2   m1   m0
TRANSMIT   0    1    1    1    1    0    0
RECEIVED   0    1    1    0    1    0    0
Computing the error vector yields (C2, C1, C0) = (1, 0, 0), indicating that
position 4 (decimal 4 = binary 100) of the received frame is in error and
should be inverted to correct the error.

C0 = 0 xor 1 xor 1 xor 0 = 0
C1 = 1 xor 1 xor 0 xor 0 = 0
C2 = 0 xor 1 xor 0 xor 0 = 1
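The whole procedure is small enough to sketch in code. The following Python
sketch (the names hamming_encode and hamming_correct are ours) follows the
bit layout and equations above, with a 7-element list holding positions 1
through 7:

def hamming_encode(m3, m2, m1, m0):
    """Build the 7 bit vector p0 p1 m3 p2 m2 m1 m0 (positions 1..7)."""
    p2 = m2 ^ m1 ^ m0
    p1 = m3 ^ m1 ^ m0
    p0 = m3 ^ m2 ^ m0
    return [p0, p1, m3, p2, m2, m1, m0]

def hamming_correct(v):
    """Recompute the check equations on a received vector and invert the bit
    at the position given by the error vector (C2, C1, C0)."""
    p0, p1, m3, p2, m2, m1, m0 = v
    c0 = p0 ^ m3 ^ m2 ^ m0
    c1 = p1 ^ m3 ^ m1 ^ m0
    c2 = p2 ^ m2 ^ m1 ^ m0
    position = 4 * c2 + 2 * c1 + c0     # 0 means no error was detected
    if position:
        v[position - 1] ^= 1            # list index 0 corresponds to position 1
    return v

sent = hamming_encode(1, 1, 0, 0)       # data 1100 -> [0, 1, 1, 1, 1, 0, 0]
received = sent.copy()
received[3] ^= 1                        # one bit error in position 4
print(hamming_correct(received) == sent)   # True: the error was corrected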
3.2.2 Error Detecting Codes - To detect d errors requires a code with a
Hamming distance of d+1, since no combination of d errors can then change a
valid codeword into another valid codeword.
Parity
The ASCII code uses 8 data bits, so all possible 8-bit codes are valid. The
distance is one, since each valid code is 1 bit from another valid code.
Hence one error transforms any valid code into another valid code.
ASCII code with parity, 8 data bits and 1 parity bit for error checking, has
a distance of two, meaning each valid code plus parity is at least 2 bits
different from any other valid code. Every valid code is transformed by a
1 bit error into an invalid code, and the invalid code is detected as an
error.
Using even parity (an even number of 1 bits across the data and parity bit),
the letter A = 01000001 0 (the last bit is the parity, calculated by the
sender). A 1-bit error anywhere in the data or parity will transform the
codeword into an invalid code. Suppose the parity bit is changed from 0 to 1;
the received code is then 01000001 1. The receiver calculates the parity and
recognizes the codeword as invalid, so an error occurred somewhere in the
data or parity.
Note that more than one error has only about a 50% chance of detection, since
only an odd number of bit errors is detected. For 11110000 0 sent, two errors
could produce 11000000 0, which is still a valid code and would not be
detected as an error by the receiver. Three errors producing 10000000 0 would
be detected. Four errors producing 11000011 0 would not, and so on.
It is generally cheaper to detect an error and retransmit the data than to
send error correcting codes.
Sending 1,000,000 data bits in frames of 1000 bits using error correcting
Hamming codes requires 10 check bits per 1000-data-bit frame, or 10,000 extra
bits, to correct single bit errors: a total of 1,010,000 bits transmitted
(i.e. m+r+1 <= 2^r, or 1000+10+1 = 1011 <= 2^10 = 1024).
- Alternatively, 20 check bits could correct a 1 bit error in the full
1,000,000 data bits, for a total of 1,000,020 bits (i.e. m+r+1 <= 2^r, or
1,000,000+20+1 <= 2^20 = 1,048,576). Why is this a bad idea?
- A single parity bit can detect one error in a 1,000,000 bit message, but
the message would have to be retransmitted when an error was detected. Under
what conditions is this a bad idea or a good idea?
Using error detection and retransmission on a detected error requires 1
parity bit per 1000 data bits, or 1000 check bits for the data, plus 1
additional check bit for the 1000 parity bits: a total of 1,001,001 bits
transmitted error free. At 1 error per million bits, error detection and
retransmission requires 1,002,002 bits to be transmitted (i.e. an additional
1001 bits retransmitted).
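A rough sketch of the arithmetic above, under the notes' assumptions
(1000-bit frames, one error per million bits):

# Forward error correction: 10 Hamming check bits per 1000-bit frame.
data_bits = 1000000
frames = data_bits // 1000
correction_total = data_bits + frames * 10            # 1,010,000 bits

# Detection and retransmission: 1 parity bit per frame, 1 extra check bit for
# the 1000 parity bits, and one 1001-bit frame retransmitted per million bits.
detection_total = data_bits + frames + 1 + 1001       # 1,002,002 bits

print(correction_total, detection_total)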
One key problem is the lack of robustness of error detection using parity:
it detects 100% of single bit errors but only about half of errors affecting
more than one bit. This can be improved by observing that most errors occur
in bursts and reorganizing how blocks of data are sent.
Suppose that we send two 3 bit numbers, 101 and 001, with even parity, giving
1010 and 0011. Sending them as 1010 0011, a two bit error burst in the second
and third bits might produce 1100 0011, which is not detected as an error by
the parity check bit. Instead of sending all of one message's data and parity
bits at once, which can only detect a one bit error, a more robust approach
sends the first bit of each message, then the second, and so on. This detects
any 2 bit error burst, since the burst changes at most one bit in each
message (though not every arbitrary 2 bit error): better than before but not
good enough. The data and parity of both messages are sent as:
10   first bit of each message
00   second bit of each message
11   third bit of each message
01   parity bit of each message
Sending 1010 0011 in this interleaved order transmits 10001101. A two bit
error burst in the third and fourth bits would be received as 10111101. Such
a burst would be detected by the parity bits when the messages are
reconstructed by the receiver. In general, n frames each carrying a parity
bit, sent in this interleaved fashion, can detect any single error burst of
length n or less.
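A minimal sketch of this column-wise interleaving (the helper names
interleave, deinterleave and any_parity_error are ours):

def interleave(frames):
    """Send the first bit of every frame, then the second bit, and so on.
    Each frame already ends with its even parity bit."""
    return "".join(frame[i] for i in range(len(frames[0])) for frame in frames)

def deinterleave(stream, n_frames):
    """Undo interleave(): frame k is every n_frames-th bit starting at k."""
    return [stream[k::n_frames] for k in range(n_frames)]

def any_parity_error(frames):
    """Even parity: every frame (data + parity) must have an even 1 count."""
    return any(frame.count("1") % 2 != 0 for frame in frames)

sent = interleave(["1010", "0011"])       # '10001101', as in the text
burst = sent[:2] + "11" + sent[4:]        # 2 bit burst -> '10111101'
print(any_parity_error(deinterleave(burst, 2)))   # True: the burst is detected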
- Polynomial codes - CRC (cyclic redundancy check) codes can be constructed
that provide significantly better error detection than parity. The sender
computes a checksum and sends it with the data. The receiver recomputes the
checksum on the received data using the same method; if the received and
computed checksums differ, an error has been detected and the data is
retransmitted.
- The method is roughly based on:
- Divide the data by an agreed upon divisor; the remainder is the checksum.
- Transmit the data and the checksum remainder.
- Divide the received data by the agreed upon divisor. The computed and
received remainders should be equal.
- The method is straightforward and is illustrated below by an example (a
code sketch follows at the end of this list).
- Convert the data to binary: 'a' = 61h = 01100001
M(x) = 0*x^7 + 1*x^6 + 1*x^5 + 0*x^4 + 0*x^3 + 0*x^2 + 0*x^1 + 1*x^0 = 01100001
- To compute the checksum, divide the data M(x) by a selected generator
polynomial G(x). First append r zero bits to M(x), where r is the degree of
G(x):
G(x) = x^4 + x + 1 = 10011
x^r M(x) = 01100001 0000
- Divide x^r M(x) by G(x) to get the checksum, the remainder R(x). Use
exclusive OR rather than binary subtraction; the divisor "goes into" the
dividend wherever the leading bit of the current remainder is 1.
Schematically, dividing by G(x) gives quotient Q(x) and remainder R(x) = 1110:

             1101010                Q(x)
10011 / 011000010000                x^r M(x)
         10011
         -----
          10110
          10011
          -----
            10110
            10011
            -----
              10100
              10011
              -----
                1110                R(x)
- The message to be transmitted, T(x), consists of the data and checksum:
T(x) = x^r M(x) xor R(x)

x^r M(x)      011000010000
R(x)      xor 000000001110
              ------------
T(x)          011000011110

Note that the exclusive OR operation is effectively subtraction, so the
dividend T(x) is 011000010000 - 1110, and what is left is divisible by G(x).
Example: 123/10 has remainder 3; (123-3)/10 has remainder 0.
- The receiver recomputes the checksum of T(x); the remainder is 0 when no
error is detected, again because after subtracting the remainder from the
dividend to form T(x), T(x) is divisible by G(x).

             1101010
10011 / 011000011110
         10011
         -----
          10110
          10011
          -----
            10111
            10011
            -----
              10011
              10011
              -----
                0000

A remainder of 0 implies no error was detected.
- CRC generator selection - G(x) is selected for robustness of error
detection. For example, a G(x) with x+1 as a prime factor detects all odd
numbers of errors. Three polynomials are international standards; one is:
CRC-12 = x^12 + x^11 + x^3 + x^2 + x + 1 = 1100000001111
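The division above is easy to reproduce in code. The following is a minimal
sketch (the name mod2_div is ours) that performs the modulo-2 (XOR) long
division used in the worked example:

def mod2_div(dividend: str, generator: str) -> str:
    """Remainder of modulo-2 (XOR) long division of one bit string by another."""
    bits = list(dividend)
    r = len(generator) - 1                 # degree of G(x)
    for i in range(len(bits) - r):
        if bits[i] == "1":                 # the divisor "goes into" the dividend here
            for j, g in enumerate(generator):
                bits[i + j] = "0" if bits[i + j] == g else "1"
    return "".join(bits[-r:])

data = "01100001"                          # 'a' = 61h
gen = "10011"                              # G(x) = x^4 + x + 1
r_x = mod2_div(data + "0000", gen)         # append r zero bits; remainder '1110'
t_x = data + r_x                           # T(x) = '011000011110'
print(r_x, mod2_div(t_x, gen))             # '1110 0000': zero remainder, no error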