All About Computer network: [ COMPUTER NETWORK ] Error Detection and Correction

3.2 Error Detection and Correction

Error detection is generally cheaper than error correction in terms of the overhead bits added. Neither is always needed: audio and video can often tolerate some errors without noticeably affecting the perceived transmission quality. Error detection makes sense whenever the data must be absolutely reliable (e.g. an ATM cash machine transaction) or when the medium is very error prone (phone lines, wireless). Error correction is reasonable when retransmitting the data is not feasible (e.g. a probe designed to crash land on Saturn) or is very expensive. Much of the current practice in error detection and correction is based on work by the mathematician Hamming. Applications include not only data transmission but also data storage (e.g. use of a checksum to verify data integrity on a storage device).
3.2.1 Error Correcting Codes - Codes that allow the original data to be reconstructed even when one or more errors occur. Generally, the more errors that can be corrected, the more check bits the code requires.

Code word - A data frame generally consists of:
- m data bits (message)
- r code bits
- m + r = n bit code word.

Hamming distance - The number of bit positions in which two code words differ. 000 and 111 have a Hamming distance of 3; 101 and 000 have a distance of 2. The XOR (eXclusive OR) of two code words has a 1 in every position where they differ, so the number of 1 bits in the result is the distance. For example,

    100010
XOR 011010
    111000    Distance = 3

A B | A xor B
0 0 |    0
0 1 |    1
1 0 |    1
1 1 |    0

000000 is a distance of 1 from 000001: a single error changes 000000 to 000001.
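
The XOR-and-count rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes; the function name `hamming_distance` and the use of bit strings are assumptions made for the example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count the bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("code words must have the same length")
    # XOR the integer values; each 1 bit in the result marks a differing position.
    return bin(int(a, 2) ^ int(b, 2)).count("1")


print(hamming_distance("100010", "011010"))  # 3, as in the worked example
print(hamming_distance("000", "111"))        # 3
print(hamming_distance("101", "000"))        # 2
```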

What is the Hamming distance between 000000 and 111100?

**Parity Example**
No parity (m=2, r=0, d=1): the change of any one bit results in a valid codeword, so no error can be detected.

CODEWORD	STATUS
00	valid
01	valid
10	valid
11	valid

Even parity (m=2, r=1, d=2): adding parity doubles the number of codewords, but only half are valid. Any single bit error produces an invalid code.

CODEWORD	STATUS
000	valid
001	invalid
010	invalid
011	valid
100	invalid
101	valid
110	valid
111	invalid
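
As a rough sketch of how a parity bit is generated and checked, the following Python computes even (or odd) parity over a bit string. The helper names `parity_bit` and `is_valid` are hypothetical and chosen only for this illustration.

```python
def parity_bit(data: str, even: bool = True) -> str:
    """Return the parity bit that makes the total number of 1s even (or odd)."""
    bit = data.count("1") % 2  # 1 if the data has an odd number of 1s
    if not even:
        bit ^= 1               # odd parity is the complement of even parity
    return str(bit)


def is_valid(codeword: str, even: bool = True) -> bool:
    """Check a codeword (data bits followed by its parity bit)."""
    odd_ones = codeword.count("1") % 2 == 1
    return not odd_ones if even else odd_ones


# Reproduce the valid even parity codewords for m=2 data bits.
for data in ("00", "01", "10", "11"):
    print(data + parity_bit(data))  # 000, 011, 101, 110
```

Flipping any single bit of a valid codeword changes the count of 1s from even to odd, which is why `is_valid` rejects it.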

What is the odd parity for the ASCII data: 11111111 and 11111110?

Is data and parity bit 111100001 valid for even parity?

Suppose that one million bits were sent with a single parity bit for error detection. Would a 1-bit error be detected? Would all errors in two bits be detected?

Error correcting codes - To correct d errors requires a distance of 2d+1. With that distance, d errors transform the codeword sent into a word that is still at least one bit closer to the original than to any other legal codeword. The following codewords have a distance of 3, so a one bit error can be corrected. For example, if 000000 was sent and one error occurs, 100000 might be received. The closest codeword to 100000 is the original 000000, so the error can be corrected. Two errors might result in 110000, which is closer to 111000, leading to an erroneous correction.

Codewords for correcting a 1-bit error

000000

000111

111000

111111
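
Correcting to the closest legal codeword can be sketched as a nearest-neighbour search over the four codewords listed above. This is an illustrative sketch only; `CODEWORDS`, `distance`, and `correct` are names made up for the example, and ties are resolved arbitrarily (the first codeword in the list wins), which a real decoder would need to treat as an uncorrectable error.

```python
CODEWORDS = ["000000", "000111", "111000", "111111"]


def distance(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def correct(received: str) -> str:
    """Return the legal codeword closest to the received word."""
    return min(CODEWORDS, key=lambda c: distance(c, received))


print(correct("100000"))  # 000000: one error, corrected properly
print(correct("110000"))  # 111000: two errors lead to an erroneous correction
```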

What was sent if 000011 is received and we assume a 1-bit error occurred?

How many errors occurred at a minimum if 011001 is received? Can it be corrected reliably? Then what to do on receiving 000011?

Error correcting code construction - We want to construct an error correcting code with the minimum number of check bits as overhead. The r check bits must be able to distinguish m+r+1 cases: an error in any one of the m+r bit positions, or no error at all. For single bit error correction with m data bits and r check bits, the limit is therefore:

m+r+1 <= 2^r

r=3 can correct one error in up to m=4 data bits, since m+3+1 <= 2³ = 8 gives m <= 4.

r=4 can correct one error in up to m=11 data bits, since m+4+1 <= 2⁴ = 16 gives m <= 11.

r=5 can correct one error in up to m=26 data bits, since m+5+1 <= 2⁵ = 32 gives m <= 26.
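
The bound can be checked numerically. The sketch below searches for the smallest r satisfying m+r+1 <= 2^r; the function name `min_check_bits` is an assumption made for this example.

```python
def min_check_bits(m: int) -> int:
    """Smallest r with m + r + 1 <= 2**r (single bit error correction)."""
    r = 0
    while m + r + 1 > 2 ** r:
        r += 1
    return r


for m in (4, 11, 26):
    print(m, min_check_bits(m))  # 4 -> 3, 11 -> 4, 26 -> 5
```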

The following is an example of Hamming's method for constructing a minimal single bit error correcting code. The code has m=4 data bits, and thus can encode 16 data values (0000₂-1111₂), and r=3 check bits. The seven bits are numbered from 1 to 7, with four data bits (m₃, m₂, m₁, m₀) and three check bits (p₂, p₁, p₀). Note that check bits are placed between the data bits at positions that are powers of 2 (e.g. check bit p₂ is at position 4 = 2²). Data bits can be in any order, but below they are arranged in the standard order with the high bit at the left. The m data bits (m₃m₂m₁m₀) and r check bits (p₂p₁p₀) are then organized into a vector as follows:

POSITION	1	2	3	4	5	6	7
BIT	p₀	p₁	m₃	p₂	m₂	m₁	m₀

Each data bit is checked by the check bits whose positions sum to the position of the data bit. In this example:

m₃ is checked by p₀ and p₁: position of m₃ = position of p₀ + position of p₁ = 1 + 2 = 3
m₂ is checked by p₀ and p₂: position of m₂ = position of p₀ + position of p₂ = 1 + 4 = 5
m₁ is checked by p₁ and p₂: position of m₁ = position of p₁ + position of p₂ = 2 + 4 = 6
m₀ is checked by p₂, p₁ and p₀: position of m₀ = position of p₂ + position of p₁ + position of p₀ = 4 + 2 + 1 = 7

Each check bit is the xor of the data bits it checks:

p₂ = m₂ xor m₁ xor m₀
p₁ = m₃ xor m₁ xor m₀
p₀ = m₃ xor m₂ xor m₀

Note that the sender computes p₀, p₁, p₂.
For example, with data m₃m₂m₁m₀ = 1100: p₂ = m₂ xor m₁ xor m₀ = 1 xor 0 xor 0 = 1

Error position vector: (C₂, C₁, C₀)

C₀ = p₀ xor m₃ xor m₂ xor m₀
C₁ = p₁ xor m₃ xor m₁ xor m₀
C₂ = p₂ xor m₂ xor m₁ xor m₀

Note that the receiver computes C₀, C₁, C₂.
Using p₂ = 1 from the computation above, and with no errors in p₂, m₂, m₁, m₀:

C₂ = p₂ xor m₂ xor m₁ xor m₀ = 1 xor 1 xor 0 xor 0 = 0

Example:


POSITION	1	2	3	4	5	6	7
BIT	p₀	p₁	m₃	p₂	m₂	m₁	m₀
TRANSMIT	0	1	1	1	1	0	0
RECEIVED	0	1	1	0	1	0	0

Computing the error vector (C₂, C₁, C₀) yields (1, 0, 0), indicating that POSITION 4 (4₁₀ = 100₂) of the received frame is in error and should be inverted to correct the error.

C₀ = 0 xor 1 xor 1 xor 0 = 0
C₁ = 1 xor 1 xor 0 xor 0 = 0
C₂ = 0 xor 1 xor 0 xor 0 = 1
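
The encoding and correction steps of this 7-bit code can be sketched in Python as follows. The function names `encode` and `correct` and the list-based frame layout (index 0 holds POSITION 1) are assumptions made for this illustration; the check equations are the ones given above.

```python
def encode(m3: int, m2: int, m1: int, m0: int) -> list:
    """Sender: compute the check bits and lay out positions 1..7."""
    p2 = m2 ^ m1 ^ m0
    p1 = m3 ^ m1 ^ m0
    p0 = m3 ^ m2 ^ m0
    return [p0, p1, m3, p2, m2, m1, m0]


def correct(frame: list) -> tuple:
    """Receiver: compute (C2, C1, C0) and invert the bit at that position.

    Returns the corrected frame and the error position (0 means no error).
    """
    p0, p1, m3, p2, m2, m1, m0 = frame
    c0 = p0 ^ m3 ^ m2 ^ m0
    c1 = p1 ^ m3 ^ m1 ^ m0
    c2 = p2 ^ m2 ^ m1 ^ m0
    position = 4 * c2 + 2 * c1 + c0  # read (C2, C1, C0) as a binary number
    fixed = list(frame)
    if position:
        fixed[position - 1] ^= 1     # positions are numbered from 1
    return fixed, position


sent = encode(1, 1, 0, 0)            # [0, 1, 1, 1, 1, 0, 0], the TRANSMIT row
received = [0, 1, 1, 0, 1, 0, 0]     # the RECEIVED row, position 4 inverted
print(correct(received))             # ([0, 1, 1, 1, 1, 0, 0], 4)
```

Note that the sketch assumes at most one bit error per frame; two errors produce a nonzero position that points at the wrong bit.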

3.2.2 Error Detecting Codes - To detect d errors requires a distance of d+1, since then no combination of d errors can change a valid code into another valid code.

Parity

The ASCII code, taken here as 8 data bits with no parity bit, uses every possible 8-bit pattern as a valid code. The distance is one, since each valid code is 1 bit from another valid code. Hence a single error transforms any valid code into another valid code and cannot be detected.


It is generally cheaper to detect an error and retransmit data than to send error correcting codes.
Sending 1,000,000 data bits in frames of 1000 bits using error correcting Hamming codes requires 10 check bits per 1000-bit data frame, or 10,000 extra bits in total, to correct single bit errors: 1,010,000 bits transmitted (i.e. m+r+1 <= 2^r, or 1000+10+1 = 1011 <= 2¹⁰ = 1024).

Alternatively, 20 check bits could correct a 1 bit error for 1,000,000 data bits for a total of 1,000,020 bits (i.e. m+r+1 <= 2^r or 1,000,000+20+1<= 2²⁰=1,048,576). Why is this a bad idea?

A single parity bit can detect one error in a 1,000,000 bit message but the message would be retransmitted when an error was detected. Under what conditions is this a bad idea or a good idea?

Using error detection and retransmission on a detected error requires 1 parity bit per 1000 data bits, or 1000 check bits for the data, plus 1 additional check bit for the 1000 parity bits: a total of 1,001,001 bits when transmitted error free. With 1 error per million bits, error detection and retransmission requires 1,002,002 bits to be transmitted (i.e. an additional 1001 bits are retransmitted).
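
The bit counts in this comparison can be reproduced with a few lines of arithmetic. The sketch below simply restates the figures from the text; all variable names are made up for the illustration.

```python
DATA_BITS = 1_000_000
FRAME_DATA_BITS = 1_000
frames = DATA_BITS // FRAME_DATA_BITS           # 1000 frames

# Hamming correction: 10 check bits per 1000-bit data frame.
hamming_total = frames * (FRAME_DATA_BITS + 10)

# Parity detection: 1 parity bit per frame, plus 1 check bit over the parity bits.
parity_total = frames * (FRAME_DATA_BITS + 1) + 1

# One detected error: retransmit one frame of 1000 data bits plus its parity bit.
parity_with_retransmit = parity_total + (FRAME_DATA_BITS + 1)

print(hamming_total)           # 1010000
print(parity_total)            # 1001001
print(parity_with_retransmit)  # 1002002
```

Even after one retransmission, detection with parity sends fewer bits than per-frame Hamming correction in this scenario.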
One key problem is the lack of robustness of error detection using parity: it detects 100% of single bit errors but only about 1/2 of errors affecting more than one bit, because any even number of inverted bits leaves the parity unchanged. This can be improved by observing that most errors occur in bursts and reorganizing how blocks of data are sent.
Suppose that we send two 3 bit numbers, 101 and 001, with even parity: 1010 and 0011. Sending them as 1010 0011, a two bit error burst in the second and third bits might transform the message to 1100 0011, which is not detected as an error by the parity check bit. Instead of sending all of one message's data bits and parity bit at once, which can only detect a one bit error, a more robust approach sends the first bit of each message, then the second, etc. This detects a 2 bit burst, since only one bit in each column (message) would be changed, but not every 2 bit error: better than before but not good enough. The data and parity of both are sent as:

10 First bit

00 Second bit

11 Third bit

01 parity

1010 0011 would be transmitted as 10001101. A two bit error burst in the third and fourth bits would be received as 10111101.

Such a two bit error burst would be detected by the parity bits when the messages were reconstructed by the receiver (as 1110 and 0111, both with odd parity). In general, interleaving n frames, each with a parity bit, can detect a single error burst of up to n bits.
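
The interleaving scheme can be sketched as below, using the two messages from the example. The helper names (`add_even_parity`, `interleave`, `deinterleave`, `parity_ok`) are hypothetical and exist only for this illustration.

```python
def add_even_parity(data: str) -> str:
    """Append an even parity bit to a bit string."""
    return data + str(data.count("1") % 2)


def parity_ok(frame: str) -> bool:
    """True if the frame (data plus parity bit) has an even number of 1s."""
    return frame.count("1") % 2 == 0


def interleave(frames: list) -> str:
    """Send the first bit of each frame, then the second bit of each, etc."""
    return "".join("".join(column) for column in zip(*frames))


def deinterleave(stream: str, n: int) -> list:
    """Rebuild n frames from an interleaved bit stream."""
    return [stream[i::n] for i in range(n)]


frames = [add_even_parity("101"), add_even_parity("001")]  # ['1010', '0011']
print(interleave(frames))                                  # 10001101

received = "10111101"                  # burst error in the third and fourth bits
rebuilt = deinterleave(received, 2)    # ['1110', '0111']
print([parity_ok(f) for f in rebuilt]) # [False, False]: the burst is detected
```

Without interleaving, the same kind of burst turns 1010 into 1100, which still has even parity and passes the check.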

Polynomial codes - CRC (cyclic redundancy check) codes can be constructed that provide significantly better error detection than parity. The sender computes a checksum that is sent with the data. The receiver recomputes the checksum on the received data using the same method; if the received and computed checksums differ, an error has been detected and the data is retransmitted.
The method is roughly based on:
1. Divide the data by an agreed upon divisor; the remainder is the checksum.
2. Transmit the data and checksum remainder.
3. Divide the received data by the agreed upon divisor. The computed and received remainder should be equal.

The method is straightforward and is illustrated below by an example.

Convert data to binary: 'a'=61h=01100001
M(x)=0x⁷+1x⁶+1x⁵+0x⁴+0x³+0x²+0x¹+1x⁰ = 01100001
To compute the checksum, divide the data M(x) by a selected generator polynomial G(x). First append r 0 bits to M(x), where r is the degree of G(x).

G(x) = x⁴+x+1 = 10011, so r = 4
x^rM(x) = 01100001 0000 (M(x) followed by r = 4 zero bits)

Divide x^rM(x) by G(x) to get the checksum, the remainder R(x). Use exclusive OR rather than binary subtraction; the divisor is taken to divide the current partial dividend whenever the two have the same number of bits (i.e. the leading bit of the partial dividend is 1).

G(x) = 10011 is the divisor, Q(x) the quotient and R(x) the remainder:

             1101010    Q(x)
10011 / 011000010000    x^rM(x)
         10011          xor G(x)
          10110
          10011         xor G(x)
            10110
            10011       xor G(x)
              10100
              10011     xor G(x)
                1110    R(x) = 1110

The message to be transmitted, T(x), consists of the data and checksum:

**T(x) = x^rM(x) xor R(x)**
x^rM(x)	011000010000
xor R(x)	000000001110
T(x)	011000011110

Note that the exclusive OR operation is effectively subtraction, so T(x) is 011000010000 - 1110: the remainder has been removed, and what is left over is divisible by G(x). A decimal analogy: 123/10 has remainder 3, and (123-3)/10 has remainder 0.

The receiver recomputes the checksum of T(x). The remainder is 0 when no errors are detected, again because subtracting the remainder from the dividend to form T(x) makes T(x) divisible by G(x).

             1101010    Q(x)
10011 / 011000011110    T(x)
         10011          xor G(x)
          10110
          10011         xor G(x)
            10111
            10011       xor G(x)
              10011
              10011     xor G(x)
                0000    remainder 0 implies no error
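
The two divisions above can be reproduced with a short Python sketch of modulo-2 (xor) long division on bit strings. The function names `crc_remainder` and `crc_encode` are assumptions made for this example, and the sketch assumes the generator's leading bit is 1; production CRC code typically works on bytes with lookup tables rather than bit strings.

```python
def crc_remainder(bits: str, generator: str) -> str:
    """Remainder of dividing bits by generator using xor instead of subtraction."""
    r = len(generator) - 1                 # degree of G(x)
    work = [int(b) for b in bits]
    gen = [int(b) for b in generator]
    for i in range(len(work) - r):
        if work[i]:                        # divisor "goes into" this partial dividend
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-r:])


def crc_encode(data: str, generator: str) -> str:
    """Build T(x): the data followed by the checksum R(x)."""
    r = len(generator) - 1
    return data + crc_remainder(data + "0" * r, generator)


G = "10011"                                # G(x) = x^4 + x + 1
print(crc_remainder("011000010000", G))    # 1110, the checksum R(x)
t = crc_encode("01100001", G)
print(t)                                   # 011000011110, the transmitted T(x)
print(crc_remainder(t, G))                 # 0000, so no error is detected
```

Inverting any single bit of `t` before calling `crc_remainder` gives a nonzero remainder, which is how the receiver detects the error.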

CRC generator selection - Generators are selected for robustness of error detection. For example, a G(x) with x+1 as a factor detects all errors affecting an odd number of bits. Three polynomials are international standards; one is:
CRC-12 = x¹²+x¹¹+x³+x²+x+1 = 1100000001111

Sunday, February 2, 2014
