Sunday, February 2, 2014

[ COMPUTER NETWORK ] Error Detection and Correction

3.2 Error Detection and Correction

Error detection is generally cheaper (in terms of additional bits in overhead) to do than error correction. Neither are always needed, audio and video can often have some errors without noticeably affecting the perceived transmission quality. Error detection makes sense whenever the data must be absolutely reliable (an ATM cash machine transaction) or when the medium is very error prone (phone lines, wireless). Error correction is reasonable when retransmitting the data is not feasible (e.g. a probe designed to crash land on Saturn) or very expensive. Much of the current practice in error detection and correction is based on work by the mathematician Hamming. Applications include not only data transmission but data storage (e.g. use of a checksum to verify data integrity on a storage device).
3.2.1 Error Correcting Codes - Codes that allow the original data to be reconstructed in the face of incurring one or more errors. Generally the more errors that can be corrected, the larger the correcting code required (in bits).
  • Code word - A data frame generally consists of:
    • m data bits (message)
    • r code bits
    • m + r = n bit code word.
  • Hamming distance -  The number of bit positions two code words differ. 000 and 111 have a Hamming distance of 3, 101 and 000 have a distance of 2. The XOR (eXclusive OR) of two code word bits determines number of bits different. For example,
       100010
XOR 011010
       111000    Distance = 3   
A B | A xor B
0 0 |    0
0 1 |    1
1 0 |    1
1 1 |    0
    Significant in that for two codewords d distance apart, d single-bit errors can convert one to the other. For a distance of 1 a single error could convert one codeword into another, for example:
    000000 is a distance of 1 from 000001, a single error changes 000000 to 000001
What is the Hamming distance between 000000 and 111100?
    Parity Example
    No parity

      Even parity


    00   m=2, r=0, d=1
    000 valid   m=2, r=1, d=2
    01
    The change of any one
    001 invalid
    Adding parity doubles
    10
    bit results in a valid
    010 invalid
    the number of codewords, but 
    11
    codeword. No error 
    011 valid
    only half are valid. Any single bit


    can be detected.
    100 invalid
    error produces an invalid code.




    101 valid





    110 valid





    111 invalid

  • What is the odd parity for the ASCII data: 11111111 and 11111110?
  • Is data and parity bit 111100001 valid for even parity?
  • Suppose that one million bits were sent with a single parity bit for error detection. Would a 1-bit error be detected? Would all errors in two bits be detected?
  • Error correcting codes - To correct d errors requires a distance of 2d+1. d errors transform the codeword sent to one that is still one bit closer to the original than any other possible legal codes. The following codewords have a distance of 3, so a one bit error can be corrected. For example, if 000000 was sent and one error occurs, 100000 might be received. The closest codeword to 100000 is the original 000000 so could be corrected. Two errors might result in 110000 which would be closer to 111000, leading to an erroneous correction.
    Codewords for correcting a 1-bit error
    000000
    000111
    111000
    111111
  • What was sent if 000011 is received and we assume a 1-bit error occurred?
  • How many errors occurred at a minimum if 011001 is received? Can it be corrected reliably? Then what to do on receiving 000011?
  • Error correcting code construction - We want to construct an error correcting code with minimum check bits as overhead. For single bit error correction the limit for:
  • m data bits
  • r check bits
  • m+r+1 <= 2r
  • r=3 can correct one error in m=4 data bits, since m+3+1<=23 = 8, or m=4. 
  • r=4 can correct one error in m=11 data bits, since m+4+1<=24 = 16, or m=11. 
  • r=5 can correct one error in m=26 data bits, since m+5+1<=25 = 32, or m=26. 
The following is an example of a method by Hamming for constructing a minimal single bit error correcting code. The code has m=4 data bits, thus can encode 16 data values (00002-11112), and r=3 check bits. There are seven bits numbered from 1 to 7 with four data bits (m3, m2, m1, m0) and three check bits (p2, p1, p0). Note that check bits are placed at positions numbered as a power of 2 (e.g. check bit p2 is at position 4 = 22) between data bits. Data bits can be in any order but below are arranged in standard high bit at left order. The m data (m3m2m1m0) and r check bits (p2p1p0) are then organized into a vector as follows:
POSITION 1 2 3 4 5 6 7
BIT  p0  p1 m3 p2 m2 m1 m0
Data bits are checked by check bits whose position sum is equal to the position of the data bit. In this example:
m3 = p0 + p1            Position of m3 = Position of p0 + Position of p1 = 3
m2 = p0 + p2              Position of m2 = Position of p0 + Position of p2 = 5
m1 = p1 + p2              Position of m1 = Position of p1 + Position of p2  = 6
m0 = p2  + p1 + p0    Position of m0 = Position of p2 + Position of p1 + Position of p0 =7
    The p check bit values are computed from the data bits by forming the Exclusive-OR of all data bits checked by that bit as follows (note xor here is Exclusive OR):
    p2 = m2 xor m1 xor m0 
    p1 = m3 xor m1 xor m0
    p0 = m3 xor m2 xor m0
    Note that the sender computes p0, p1, p2.
    For example: p2 = m2 xor m1 xor m0 = 1 xor 0 xor 0 = 1
    The receiver can then perform the same calculation for the p check bits and if any differ a transmission  error occurred.
    Error position vector: The binary representation of the error position is given by the vector (C2, C1, C0), where:
    C0 = p0 xor m3 xor m2 xor m0

    C1 = p1 xor m3 xor m1 xor m0

    C2 = p2 xor m2 xor m1xor m0
    Note that the receiver computes C0, C1, C2.
    From the above computation of p2 = 1 and
    no errors in p2, m2, m1, m0
    C2 = p2 xor m2 xor m1xor m0 = 1 xor 1 xor 0 xor 0 = 0

    Example: The sender would compute the check bits and transmit both data and check bits as in the vector above. The receiver would compute the error position vector using the received data and check bit vector. For example, to send data 11002 the vector transmitted would be 01111002. Should a one bit error occur in position 4 the received vector would be 01101002.
POSITION 1 2 3 4 5 6 7
BIT  p0  p1 m3 p2 m2 m1 m0
TRANSMIT 0 1 1 1 1 0 0
RECEIVED 0 1 1 0 1 0 0
Computing the error vector yields (1, 0, 0) indicating that POSITION 4 (410 = 1002 of the received frame is in error and should be inverted to correct the error. 
    C0 = 0 xor 1 xor 1xor 0 = 0
    C1 = 1 xor 1 xor 0 xor 0 = 0
    C2 = 0 xor 1 xor 0 xor 0 = 1
3.2.2 Error Detecting Codes - To detect d errors requires a distance of d+1, no d number of  errors can change a valid code into another valid code.
    Parity
    The ASCII code uses 8 data bits, so that all possible valid 8-bit codes are used. The distance is one, since each valid code is 1 bit from another valid code. Hence one error transforms any valid code to another valid code.
    ASCII code with parity, 8 data bits and 1 bit parity for error checking has a distance of two, meaning each valid code + parity is at least 2 bits different from any other valid code. All valid codes are transformed by a 1 bit error into an invalid code. The invalid code is detected as an error.
    Using even parity (there is an even number of 1 bits in the data and parity bit) the letter A=00100001 0 (the last bit is parity calculated by the sender). A 1-bit error anywhere in data or parity will transform the codeword to an invalid code. Suppose the parity is changed from 0 to 1, then the received code is 00100001 1. The receiver calculates the parity and recognizes the codeword to be invalid so an error occurred somewhere in the data or parity.
    Note that more than one error has only a 50% chance of detection. For 11110000 0 sent, two errors could produce 11000000 0 which is still a valid code and would not be detected as an error by the receiver. Three errors producing 10000000 0 would be detected. Four errors producing 11000011 0 would not, etc. An odd number of bit errors is detected.
It is generally cheaper to detect an error and retransmit data than to send error correcting codes.
Sending 1,000,000 data bits in frames of 1000 bits using error correcting Hamming codes requires 10 check bits per 1000 data bit frames or 10,000 extra bit to correct single bit errors, a total of 1,010,000 bits transmitted (i.e. m+r+1 <= 2r or 1000+10+1=1011<= 210=1024).
  • Alternatively, 20 check bits could correct a 1 bit error for 1,000,000 data bits for a total of 1,000,020 bits (i.e. m+r+1 <= 2r or 1,000,000+20+1<= 220=1,048,576). Why is this a bad idea?
  • A single parity bit can detect one error in a 1,000,000 bit message but the message would be retransmitted when an error was detected. Under what conditions is this a bad idea or a good idea?
Using error detection and retransmit on a detected error requires 1 parity bit per 1000 data bits or 1000 check bits for the data plus 1 additional check bit for the 1000 parity bits, a total of 1,001,001 bits transmitted error free. For 1 error per million bits, error detection and retransmit requires 1,002,002 bits to be transmitted (i.e. an additional 1001 bits retransmitted).
One key problem is the lack of robustness to error detection using parity as it can detect 100% of single bit errors but only 1/2 of more than 1 bit errors. This can be improved by observing that most errors occur in bursts and reorganizing how blocks of data are sent.
Suppose that we send two 3 bit numbers 101 and 001 with even parity, 1010 and 0011. Sending as 1010 0011, a two bit error burst might transform the underlined bits to 1100 0011 which is not detected as an error by a parity check bit. Instead of sending all of one message data bits and parity bit at once which can only detect a one bit error, a more robust approach sends the first bit of each message, then the second, etc. This provides error detection of a 2 bit burst since only one bit in each column would be changed but not any 2 bit error, better than before but not good enough. The data and parity of both is sent as:
 
10  First bit
00  Second bit
11  Third bit
01  parity
Sending 1010 0011 would be transmitted as: 10001101. A two bit error burst in the underlined bits would be received as 10111101.
A two bit error burst, such as in the underlined bits, would be detected by the parity bits when the message was reconstructed by the receiver. In general, n frames with a parity bit can detect a single n bit error burst.
  •  Polynomial codes - CRC (cyclic redundancy check) codes can be constructed that provide significantly better error detection than parity. The sender computes a checksum sent with the data. The receiver recomputes the checksum on the received data using the same method, if the received and computed checksums differ, an error has been detected, retransmit the data. 
  • The method is roughly based on:
    1. Divide the data by an agreed upon divisor, the remainder is the checksum.
    2. Transmit the data and checksum remainder.
    3. Divide the received data by the agreed upon divisor. The computed and received remainder should be equal.
  • The method is straightforward and is illustrated below by an example.
    1. Convert data to binary: 'a'=61h=01100001
      M(x)=0x7+1x6+1x5+0x4+0x3+0x2+0x1+1x0 = 01100001
    2. To compute checksum, divide data M(x) by a selected generator polynomial G(x). Append 0 bits to M(x) for the degree of G(x).
      G(x)=x4+x+1 = 10011
      xrM(x) = 01100001  0000                M(x)          xr
    3. Divide xrM(x) by G(x) to get checksum, the remainder R(x). Use Exclusive OR rather than binary subtraction where a divisor divides the dividend if the same number of bits.
               Q(x) 
      G(x)/    T(x) 
               R(x)= 1110

                 1101010
      10011/011000010000
         xor 10011
              10110
          xor 10011
                10110
            xor 10011
                  10100
              xor 10011
                    1110 R(x)
      
    4. The message to be transmitted, T(x), consists of the data and checksum:  
          T(x) = xrM(x) xor R(x)
              xrM(x)          011000010000
          xor   R(x)      xor 000000001110
                T(x)          011000011110
      Note that the exclusive OR operation is effectively subtraction so the dividend T(x) is 011000010000 - 1110, what is left over is divisible by G(x). Example: 123/10 has remainder 3. (123-3)/10 has 0 remainder.
       
    5. The receiver recomputes the checksum of T(x), the remainder is 0 when no errors detected, again because after subtracting the remainder from the dividend to form T(x), T(x) is divisible by G(x).
                 1101010
      10011/011000011110
        xor  10011
              10110
          xor 10011
                10111
            xor 10011
                  10011
              xor 10011
                      00
                      00
                       0 remainder implies no error
  • CRC generator selection - Selected for robustness of error detection. For example, G(x) with x+1 as a prime factor detects all odd numbers of errors. Three polynomials are international standards, one is:
                CRC-12 = x12+x11+x3+x2+x+1 = 1100000001111

No comments:

Post a Comment