
Bachelor Degree Project

Reed-Solomon Codes: Error Correcting Codes

Author: Isabell Skoglund
Supervisor: Per-Anders Svensson
Examiner: Marcus Nilsson
Subject: Mathematics
Semester: VT2020


Abstract

In the following pages an introduction to the error correcting codes known as Reed-Solomon codes is presented, together with different approaches for decoding. This is supplemented by a Mathematica program and a description of this program, which gives an understanding of how the choice of decoding algorithm affects the time it takes to find errors in stored or transmitted information.

Contents

1 Introduction
2 Error Correcting Codes
 2.1 Bounds on Codes
 2.2 Linear Codes for Finite Fields
 2.3 Cyclic Codes
 2.4 BCH Codes
3 Reed-Solomon Codes
 3.1 Encoding
 3.2 Properties
4 Methods
 4.1 Decoding
  4.1.1 Direct Method
  4.1.2 Berlekamp–Massey algorithm
 4.2 My program
5 Result
6 Conclusion
A The Mathematica Code


1 Introduction

To be able to safely store or transmit information without losing important messages, one needs a way to detect and correct errors that can occur during the process. All available channels of communication have some degree of noise or interference, such as a scratch on a CD or a neighbouring channel in radio transmission, and this has to be considered when transmitting information.

One way to still be able to transmit messages safely over a noisy channel is to add some redundancy to the message. This gives the ability to reconstruct a corrupted message. It can be done by replacing the symbols in the original message by codewords that have some redundancy in them.

There are several different ways to do this. One of the simplest is to just repeat the message, which gives a repetition code. That is, the message that is sent is repeated some number of times, so even if some part is disrupted the message is most likely still readable. Here it is probably quite easy to see if there is an error, but one is required to send a lot of extra information.

Example 1. If the message that should be sent is hello, it is repeated some number of times, for example three times, so the transmitted code is hellohellohello. If some errors occur, say that the received message is weloohellohello, then one can still see what the original message is. If there are too many errors, one needs to ask for the message to be resent.

Another way is to append a parity check digit at the end of the message. That is, if the message is, for example, in binary, one can adopt the rule that a 1 or a 0 is appended so that the total number of 1's in the message becomes either odd or even.

Example 2. Suppose the message 1100001 should be sent; the number of ones is 3. If the rule is that the total number of ones should be even, one adds an extra 1 at the end when the count is odd, otherwise a 0; this is the parity check digit. The transmitted code is then 11000011. If an error occurs, for example if the received message is 11100011, the total number of ones is now 5. Since this is not even, some error has occurred and the message needs to be resent.

This method gives the ability to notice when there is only one error in the message. The error can't be corrected, it is only detected, and the receiver can then ask for the message to be resent [8]. If the message is disrupted it can't be read, but almost no extra information is sent. This illustrates the most important balance of error correcting codes: to send as little as possible and still be able to recover the information when it is disrupted.
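As a small illustration of Example 2, here is a minimal sketch in Python (the helper names are illustrative, not from the thesis) that appends an even-parity digit and detects a single flipped bit:

```python
# Sketch: even-parity encoding and single-error detection.
def add_parity(bits):
    """Append a digit so that the total number of 1's becomes even."""
    return bits + [sum(bits) % 2]

def has_error(received):
    """An odd number of 1's means at least one bit was flipped."""
    return sum(received) % 2 != 0

sent = add_parity([1, 1, 0, 0, 0, 0, 1])      # [1,1,0,0,0,0,1,1], as in Example 2
corrupted = sent.copy(); corrupted[2] ^= 1    # flip one bit: 11100011
print(has_error(sent), has_error(corrupted))  # False True
```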

2 Error Correcting Codes

Error correcting codes are used to ensure that potential errors in a message that is sent over a noisy communication channel, or is stored on sensitive devices, can be detected and corrected within specific limitations [9]. A message that is transmitted over a noisy channel needs to be encoded to obtain codewords. These codewords consist of symbols that come from some alphabet A. The codewords are the ones that are transmitted, and the receiver then decodes the received words, which might no longer be actual codewords. In the decoding process any possible errors are detected and corrected, to some extent. This is represented in Figure 1, found in Introduction to Cryptography with Coding Theory, Second Edition, page 399 [8].

The alphabets used in Example 1 and in Example 2 are the English alphabet and the binary numbers, respectively. If A is an alphabet and $A^n$ denotes the set of n-tuples of elements in A, then the elements in a subset of $A^n$ are the codewords of a block code with length n. A block code is a code where all codewords have the same length. Block codes with some additional conditions are mostly used in practice; one common condition is to require that A is a finite field, which gives that $A^n$ is a vector space. These types of codes are called linear codes [8].

Figure 1: Overview for message transmission over a noisy channel [8].


Example 3. Let $A = \{0, 1\}$ be the alphabet of a binary repetition code where each symbol is repeated four times. Then the code is the set $\{(0, 0, 0, 0), (1, 1, 1, 1)\}$, which is a subset of $A^4$.

To be able to decode over any alphabet, it is useful to have a measure of how close two words are to each other. This measure is called the Hamming distance, denoted $d(v_1, v_2)$ for words $v_1$ and $v_2$ from $A^n$. The Hamming distance is defined as the number of places where the two words differ, that is, the minimum number of errors that must occur for $v_1$ to be changed into $v_2$.

Example 4. If $A = \{0, 1\}$, the Hamming distance $d(v_1, v_2)$ between $v_1 = 1100$ and $v_2 = 0111$ in $A^4$ is equal to 3, since the two words differ in places 1, 3 and 4.

If one calculates the Hamming distance between all pairs of different codewords in a code C, there exists a minimum value, the minimum distance of the code C, denoted d(C):

$$d(C) = \min \{d(v_1, v_2) \mid v_1, v_2 \in C,\ v_1 \neq v_2\}.$$

The minimum distance of C is important since it gives the smallest number of errors that can change one codeword into another codeword. This is used when a received message has some error, so that it does not correspond to an existing codeword. These errors are then corrected by finding the codeword that has the smallest Hamming distance to the received message. This is called nearest neighbour decoding, that is, changing the received message into a codeword by changing as few symbols as possible.

Rules can be set up to guarantee that nearest neighbour decoding actually gives the correct answer when there are at most t errors. These are described in Theorem 1. Trouble can occur when there is more than one nearest neighbour to the received message. For example, using the same setup as in Example 4, the word 1000 has the same Hamming distance to all four of $\{0000, 1100, 1010, 1001\}$. Here one approach is to just guess one of them, which can seem risky, but if one symbol in a long message is guessed wrong the message will probably still be readable. Or, if it represents a pixel in a picture and the colour of this pixel is guessed wrong, one will still be able to see the picture. If it is more sensitive information, where the meaning of the message changes depending on one symbol, the safest way to go is to have the message resent.


Theorem 1. A code C can detect up to s errors if d(C) ≥ s + 1 and a code C can correct up to t errors if d(C) ≥ 2t + 1 [8].

According to Theorem 1, a code can detect up to s errors if one is able to change any codeword $v_1$ in s places without changing it into another existing codeword $v_2$. The code can also correct up to t errors if one is able to change any codeword $v_1$ in t places and still have the codeword $v_1$ closest according to the Hamming distance [8].
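The following sketch computes Hamming distances, the minimum distance d(C) and a nearest neighbour decoding for the repetition code of Example 3 (the function names are illustrative):

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions where the words u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def min_distance(code):
    """d(C): the smallest distance between two distinct codewords."""
    return min(hamming(u, v) for u, v in combinations(code, 2))

def nearest(code, word):
    """Nearest neighbour decoding; ties are broken arbitrarily by min()."""
    return min(code, key=lambda c: hamming(c, word))

C = [(0, 0, 0, 0), (1, 1, 1, 1)]       # the repetition code of Example 3
d = min_distance(C)                    # d(C) = 4
t = (d - 1) // 2                       # corrects up to t = 1 error, by Theorem 1
print(d, t, nearest(C, (1, 0, 1, 1)))  # 4 1 (1, 1, 1, 1)
```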

2.1 Bounds on Codes

As mentioned in the introduction, the balance between sending as little additional information as possible and still being able to recover a message after an error has occurred is really important. This is captured by the code rate, or information rate, R, of a code, which represents the ratio of input data symbols to transmitted code symbols. It can be calculated for a q-ary (n, M, d) code, where n is the length of the code, M is the number of codewords in the code and d is the minimum distance of the code, using

$$R = \frac{\log_q(M)}{n}.$$

This represents the part of the bandwidth that is being used to transmit actual data. When using a code to transmit messages one would like the relative minimum distance, d/n, to be as large as possible, to be able to correct a great number of errors. The relative minimum distance is a measure of the error correcting capability of the code relative to its length [6]. One would also like M to be as large as possible, so that the code rate R is close to 1, since this gives bandwidth efficiency when transmitting messages over noisy channels. The problem is that increasing d tends to decrease M, or increase n, which in turn lowers the code rate. This creates a dilemma where one wants both the code rate and the relative minimum distance to be as large as possible. The trade-off is described by the so-called Singleton bound, given by R. Singleton in 1964 and presented in Theorem 2.

Theorem 2. Let C be a q-ary (n, M, d) code. Then $M \leq q^{n-d+1}$ [8].


A code that satisfies the Singleton bound with equality is called a maximum distance separable, MDS, code. This is a code that has the largest possible value of M for a given n and d.

Proof. For a codeword $c = (a_1, \dots, a_n)$, let $c' = (a_d, \dots, a_n)$ be the word obtained by removing the first $d - 1$ entries. If two codewords $c_1$ and $c_2$ are different, then they differ in at least d places, so when $c_1'$ and $c_2'$ are obtained by removing the first $d - 1$ entries from $c_1$ and $c_2$, the words $c_1'$ and $c_2'$ must still differ in at least one place. The number M of codewords c is therefore equal to the number of vectors $c'$ obtained in this way. There are at most $q^{n-d+1}$ vectors $c'$, since there are $n - d + 1$ positions in these vectors. This implies that M is less than or equal to $q^{n-d+1}$, as desired [8].

One class of codes that fulfils the Singleton bound with equality, and hence is MDS, is the Reed-Solomon codes [8].

Example 5. Consider the code in Example 3, the binary repetition code of length 4. This is a (4, 2, 4) code, and the Singleton bound gives

$$2 = M \leq q^{4-4+1} = q^1 = 2^1 = 2,$$

where q is 2 since it is a binary code. Since there is equality in the Singleton bound, this code is an MDS code.

2.2 Linear Codes for Finite Fields

Being able to decode a code efficiently is really important. For the decoding process to be quick it is useful to impose some conditions on the code; this motivates linear codes. Here the alphabet A will be a finite field F, where F can still be a lot of different alphabets as long as they are finite, for example the binary numbers, which give the alphabet $F = \mathbb{Z}_2$, or the integers modulo a prime p, which give the alphabet $F = \mathbb{Z}_p$. The corresponding vector space over F is the set of n-tuples over F and is denoted $F^n$. A subspace of $F^n$ is a nonempty subset S that is closed under linear combinations, so for all $s_1, s_2 \in S$ and $a_1, a_2 \in F$ it holds that $a_1s_1 + a_2s_2 \in S$. For the finite fields $\mathbb{Z}_2$ and $\mathbb{Z}_p$ all calculations on elements are done modulo 2 or modulo p, respectively.

Definition 1. A linear code of dimension k and length n over a finite field F is a k-dimensional subspace of $F^n$. This type of code is called an [n, k] code. It can also be written as an [n, k, d] code when the minimum distance d of the code is known [8].

For example, the binary repetition code in Example 3 is a linear code: a one-dimensional subspace of $\mathbb{Z}_2^4$.

The binary parity check code in Example 2 is a linear code: a seven-dimensional subspace of $\mathbb{Z}_2^8$. This binary code of dimension 7 and length 8 consists of the binary vectors such that the sum of all entries is zero modulo 2. Then the vectors

(1, 0, 0, 0, 0, 0, 0, 1), (0, 1, 0, 0, 0, 0, 0, 1), ..., (0, 0, 0, 0, 0, 0, 1, 1)

form a basis of the subspace that contains the binary vectors where the sum of all entries is zero modulo 2.

The ISBN code, that is, the International Standard Book Number, is an error detecting code that is not linear. When a book is published it is assigned an ISBN, a 10-digit codeword. The first digit gives the language, the second and third digits represent the publisher, and the fourth to ninth digits form a book identity number that the publisher assigns to the book. The last digit is chosen to fulfil

$$\sum_{j=1}^{10} j\,a_j \equiv 0 \pmod{11},$$

where $a_1, \dots, a_{10}$ are the digits of the ISBN of a specific book. Since the calculation is made modulo 11, the tenth digit can be 10, which is then represented by X, while the first nine digits can only be chosen from $\{0, 1, \dots, 9\}$, and it is this that makes the code not linear. The code is not closed under linear combinations, due to the fact that one cannot choose 10 as one of the first nine entries.

When a linear code C of dimension k is defined over a finite field F, where F has q elements, the code C has $q^k$ elements. This is seen by taking a basis of C containing k elements, $v_1, \dots, v_k$. Then every element of the code C can be uniquely written in the form $a_1v_1 + \cdots + a_kv_k$, where $a_1, \dots, a_k \in F$. There are q choices for each $a_i$, since F contains q elements, and there are k coefficients $a_i$, since the dimension of the code is k. Hence there are $q^k$ different elements in C. The Singleton bound can then be rewritten for linear codes as $q^k \leq q^{n-d+1}$, where d is the minimum distance of the code and n is the length. This implies that $k + d \leq n + 1$.


As discussed before, the minimum distance of a code is the smallest number of symbols that have to be changed to transform one codeword into another codeword, and it is given by the Hamming distance for the code. Computing the minimum distance of an arbitrary code, which might not be linear, can be tiresome, since it could require computing $d(v_1, v_2)$ for every pair of different codewords belonging to the code C. When it is known that the code is linear, the minimum distance can instead be found using the Hamming weight. The Hamming weight is defined as $\mathrm{wt}(v_1) = d(v_1, 0)$, where $0 = (0, 0, \dots, 0)$; that is, the number of nonzero places of $v_1$. Then d(C) is the smallest Hamming weight of all the nonzero codewords,

$$d(C) = \min \{\mathrm{wt}(v_1) \mid 0 \neq v_1 \in C\}.$$

This gives an advantage, since one no longer needs to compare every pair of codewords; instead there is only one calculation per codeword, which goes much faster.

When constructing a linear [n, k] code, one needs a k-dimensional subspace of $F^n$. One way to do this is to choose k vectors that are linearly independent and take their span, that is, the set of all linear combinations of these linearly independent vectors [1]. To do this one can choose a $k \times n$ generating matrix G of rank k, with entries in F. The subspace is then given by the set of vectors of the form vG, where v runs through all row vectors in $F^k$. The rows of the generating matrix G thus form a basis for a k-dimensional subspace of vectors of length n over F. This subspace is the linear code C. This means that every codeword is uniquely expressible as a linear combination of the rows of G.

Definition 2. Let G be a generating $k \times n$ matrix for a linear [n, k] code C. An $(n - k) \times n$ matrix H such that

$$GH^T = 0$$

is called a parity check matrix for the code C with generating matrix G.

Theorem 3. If a linear code C has the generating matrix $G = [I_k, P]$, then $H = [-P^T, I_{n-k}]$ is a parity check matrix for C [8].

For a generating matrix $G = [I_k, P]$, where $I_k$ is the $k \times k$ identity matrix, the last $n - k$ columns give the redundancy that, together with the first k columns, which still carry the message, gives the full codeword. Such a code is called systematic; the first k symbols are the information symbols and the rest are the check symbols.

Example 6. For the [8, 7] code of Example 2 the generating matrix G looks like this:

$$G = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}.$$

So the codeword 11000011 is the sum of the first, second and seventh rows modulo 2. This codeword is obtained by multiplying (1, 1, 0, 0, 0, 0, 1) by the generating matrix.

To check whether any errors have occurred one can use the parity check matrix $H = [-P^T, I_{n-k}]$, where $P^T$ is the transpose of the P used to construct the generating matrix G. If the dot product between the received word and the matrix $H^T$ is not zero, there is some error.

The corresponding parity check matrix H for Example 6 is $(1, 1, 1, 1, 1, 1, 1, 1)$, since

$$-P^T = (-1, -1, -1, -1, -1, -1, -1) = (1, 1, 1, 1, 1, 1, 1)$$

modulo 2, and $I_{n-k} = I_{8-7} = I_1 = 1$. This gives that if $v = 11000011$ is a codeword, the dot product between v and $H^T$ should be zero; here $v \cdot H^T = 1 \cdot 1 + 1 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 + 0 \cdot 1 + 1 \cdot 1 + 1 \cdot 1 = 4 \equiv 0 \pmod{2}$.

Generally, $C = \{uG \mid u \in A^k\}$ is a subspace of $A^n$, where G is the $k \times n$ generating matrix. If $v_1 = uG$ is a codeword, then $v_1H^T$ should be equal to zero, as

$$v_1H^T = (uG)H^T = u(GH^T) = 0,$$

since $GH^T = 0$ for every generating matrix and its corresponding parity check matrix. If some error e is introduced, the received vector is $v_2 = uG + e$, and multiplying this with the parity check matrix yields

$$v_2H^T = (uG + e)H^T = uGH^T + eH^T = eH^T \neq 0,$$


and an error is detected, provided that e is not itself a codeword.

If a codeword is transmitted and the vector v is received, the receiver computes $vH^T$ to see whether any error has occurred. If this is not equal to zero, at least one error is detected. The value of $vH^T$ is called the syndrome of the vector v and is denoted S(v). When $vH^T$ is equal to zero one cannot say that there is no error, only that v is a codeword. Since it is more likely that no errors occurred when $vH^T = 0$ than that enough errors occurred to change one codeword into another codeword, one can assume that no errors have occurred. The parity check matrix can now be used to detect and correct errors in the process of decoding a received message. Two definitions about cosets will help in understanding the general decoding procedure using the parity check matrix.

Definition 3. Let C be a linear code and let u be an n-dimensional vector. The set u + C given by

$$u + C = \{u + c \mid c \in C\}$$

is called a coset of C [8].

Definition 4. A vector having minimum Hamming weight in a coset is called a coset leader [8].

Using syndrome decoding requires far fewer steps than simply searching for the codeword nearest to the received vector. It can be done by using a syndrome lookup table that consists of the coset leaders and their corresponding syndromes. Decoding is then done in three steps.

1. Calculate the syndrome of the received vector r, $S(r) = rH^T$.

2. Find the coset leader that has the same syndrome as S(r). Let it be $c_0$.

3. Decode the received vector r as $r - c_0$, using the coset leader $c_0$.

Example 7. Let C be the binary linear code with generating matrix

$$G = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$


Then the code C consists of the codewords

{(0, 0, 0, 0), (1, 0, 1, 0), (0, 1, 0, 1), (1, 1, 1, 1)} ,

These elements form the first row of a decoding table that will help in the decoding process. To create the next row, take a vector of smallest Hamming weight that does not already have a place in the table (there can be more than one choice); the next three elements of the row are then obtained by adding the first element of the row to the codeword at the top of each column (over $\mathbb{Z}_2$, addition and subtraction coincide). This is repeated for all rows until all possible vectors of length four are used. Together this creates the table

(0,0,0,0) (1,0,1,0) (0,1,0,1) (1,1,1,1)
(1,0,0,0) (0,0,1,0) (1,1,0,1) (0,1,1,1)
(0,1,0,0) (1,1,1,0) (0,0,0,1) (1,0,1,1)
(1,1,0,0) (0,1,1,0) (1,0,0,1) (0,0,1,1)

When a vector is received, look it up in the table and decode it to the vector at the top of the same column. If the received vector v is (0, 0, 0, 1), it is decoded to (0, 1, 0, 1).

Example 7 is quite small, and even though (0, 0, 0, 1) is decoded to one of its nearest neighbours, it is not the only codeword that is equally close: (0, 0, 0, 0) is also a nearest neighbour of (0, 0, 0, 1). This becomes a problem, since the minimum distance of this code is 2, which means that general error correction might not be possible. If the code had fulfilled the conditions described in Theorem 1, the same procedure would decode the vectors correctly.

This small code was used because writing out the table and searching it for the received vector can be difficult for large codes. Here the parity check matrix H can be used to make the process more manageable.

The vectors in the first column are the coset leaders l. If v is in the same row as l, then $v = l + c$ for some codeword c. This gives that

$$vH^T = lH^T + cH^T = lH^T,$$

since c is a codeword and therefore $cH^T = 0$. The syndromes are the vectors $S(v) = vH^T$; if two vectors have the same syndrome they belong to the same coset and have the same coset leader, so the table in Example 7 can be replaced by the smaller table


(0,0,0,0) (0,0)
(1,0,0,0) (1,0)
(0,1,0,0) (0,1)
(1,1,0,0) (1,1)

Example 8. Using the same code C as in Example 7 with the same generating matrix G, the received vector v = (0, 0, 0, 1) is now decoded by multiplying it by $H^T$, which gives

$$S(v) = vH^T = (0, 0, 0, 1) \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = (0, 1).$$

This is the syndrome of the third row in the smaller table; now subtract that row's coset leader from the vector v modulo 2, and the codeword (0, 1, 0, 1) is found, which is the same as in Example 7.
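A minimal sketch of the three decoding steps for the code of Examples 7 and 8, with the coset leaders taken from the small table above (illustrative Python, not the thesis's Mathematica program):

```python
import numpy as np

H = np.array([[1, 0, 1, 0],     # parity check matrix for the code of Example 7;
              [0, 1, 0, 1]])    # here H = G, since G = [I_2 | I_2]

# Syndrome lookup table: the coset leaders of the small table, keyed by syndrome.
leaders = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (1, 1, 0, 0)]
table = {tuple(np.array(l) @ H.T % 2): np.array(l) for l in leaders}

def decode(r):
    s = tuple(r @ H.T % 2)      # step 1: the syndrome S(r) = rH^T
    return (r - table[s]) % 2   # steps 2-3: subtract the matching coset leader

print(decode(np.array([0, 0, 0, 1])))   # [0 1 0 1], as found in Example 8
```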

For large codes this procedure is too inefficient to be practical [8]. For a general linear code the problem of finding the nearest neighbour is hard and is considered an NP-complete problem, where NP stands for "nondeterministic polynomial time" and is a classification of how hard the problem is to solve [4]. There are certain types of codes that have more efficient decoding procedures, for example the cyclic codes.

2.3 Cyclic Codes

A linear code C is called cyclic if a cyclic shift of any codeword in C yields another codeword in C. That is, if C is cyclic, then

$$(c_0, c_1, \dots, c_{n-1}) \in C \implies (c_{n-1}, c_0, c_1, \dots, c_{n-2}) \in C.$$

Continuing with cyclic shifts generates more codewords, so all cyclic permutations of a codeword are also codewords. The code used in Example 7 is therefore also a cyclic code: if any of its codewords is shifted cyclically, the result is still a codeword.

Let F be a finite field, as before consisting of the integers mod p, where p is a prime, and let F[x] denote the set of all polynomials in x with coefficients in F. For a positive integer n the code will live in

$$F[x]/(x^n - 1),$$

which denotes the elements of F[x] mod $(x^n - 1)$; these are the polynomials of degree less than n. If a polynomial of degree n or larger is encountered, it is divided by $x^n - 1$ and the remainder is taken as the new polynomial. A cyclic shift of a word corresponds to multiplying the corresponding polynomial in $F[x]/(x^n - 1)$ by x modulo $x^n - 1$. The general description of a cyclic code is given in Theorem 4.

Example 9. In the code of Example 7, the codeword (1, 0, 1, 0), which is the first row of the generating matrix G, can be represented as the polynomial $g(x) = 1 + x^2$. Then g(x)x gives the second row of G. Continuing with $g(x)x^2$, which represents two cyclic shifts,

$$g(x)x^2 = x^2 + x^4 \equiv 1 + x^2 \pmod{x^4 - 1},$$

since the degree reaches n = 4 and the computation is done modulo $x^4 - 1$; this gives the first row of G again.
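A small sketch of this correspondence for Example 9, storing a word $(c_0, \dots, c_{n-1})$ as the coefficient list of its polynomial:

```python
# Sketch: one cyclic shift = multiplication by x mod (x^n - 1).
def shift(c):
    """The coefficient of x^{n-1} wraps around to x^0, since x^n = 1."""
    return [c[-1]] + c[:-1]

g = [1, 0, 1, 0]          # g(x) = 1 + x^2, the first row of G in Example 7
print(shift(g))           # [0, 1, 0, 1] -> x + x^3 = g(x) x, the second row
print(shift(shift(g)))    # [1, 0, 1, 0] -> back to 1 + x^2, as in Example 9
```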

Theorem 4. Let C be a cyclic code of length n over a finite field F. To each codeword $(c_0, \dots, c_{n-1})$ in C, associate the polynomial $c_0 + c_1x + \cdots + c_{n-1}x^{n-1}$ in F[x]. Let g(x) be the polynomial of smallest degree among all the nonzero polynomials obtained from C in this way. Dividing g(x) by its leading coefficient, one may assume that g(x) is a monic polynomial, that is, a polynomial whose leading coefficient is one. This polynomial g(x) is called the generating polynomial for C, and

1. g(x) is uniquely determined by C.

2. g(x) is a divisor of $x^n - 1$, i.e. $g(x)h(x) = x^n - 1$ for some $h(x) \in F[x]$.

3. C is exactly the set of coefficient vectors of the polynomials of the form g(x)f(x), where $\deg(f) \leq n - 1 - \deg(g)$.

4. A polynomial $m(x) \in F[x]/(x^n - 1)$ corresponds to a codeword in C if and only if $h(x)m(x) \equiv 0 \pmod{x^n - 1}$, where h(x) is defined in part 2 [8].


If $g(x) = g_0 + g_1x + \cdots + g_{k-1}x^{k-1} + x^k$ is built as in Theorem 4, then by part 3 of the theorem every codeword of C corresponds to a polynomial of the form g(x)f(x), where $\deg(f) \leq n - 1 - \deg(g)$. Since f(x) is a linear combination of $1, x, x^2, \dots, x^{n-k-1}$, every codeword in C is a linear combination of the codewords corresponding to the polynomials

$$g(x),\ g(x)x,\ g(x)x^2,\ \dots,\ g(x)x^{n-k-1}.$$

These in turn correspond to the vectors

$$(g_0, \dots, g_k, 0, \dots, 0),\ (0, g_0, \dots, g_k, 0, \dots),\ \dots,\ (0, \dots, 0, g_0, \dots, g_k).$$

A generating matrix for C can then be built similarly to the one for linear codes,

$$G = \begin{pmatrix}
g_0 & g_1 & \dots & g_k & 0 & 0 & \dots & 0 \\
0 & g_0 & g_1 & \dots & g_k & 0 & \dots & 0 \\
\vdots & & \ddots & & & \ddots & & \vdots \\
0 & \dots & 0 & g_0 & g_1 & \dots & & g_k
\end{pmatrix}.$$

To construct the parity check matrix for C corresponding to this generating matrix, one uses part 4 of Theorem 4. With $h(x) = h_0 + h_1x + \cdots + h_mx^m$, where $m = n - k$,

$$H = \begin{pmatrix}
h_m & h_{m-1} & \dots & h_0 & 0 & 0 & \dots & 0 \\
0 & h_m & h_{m-1} & \dots & h_0 & 0 & \dots & 0 \\
\vdots & & \ddots & & & \ddots & & \vdots \\
0 & \dots & 0 & h_m & h_{m-1} & \dots & & h_0
\end{pmatrix}.$$

Here $g(x)h(x) = x^n - 1$, which is equivalent to $g(x)h(x) \equiv 0 \pmod{x^n - 1}$, and this in turn gives that $GH^T = 0$, as must hold for every generating matrix and its corresponding parity check matrix. As for linear codes, a parity check matrix H for a cyclic code C satisfies $cH^T = 0$ if and only if $c \in C$.

Example 10. A generating matrix G for a code of length 7 can be constructed by factorising the polynomial $x^7 - 1$, since the generating polynomial g(x) should divide

$$x^7 - 1 = (x - 1)(x^3 + x^2 + 1)(x^3 + x + 1).$$


The generating polynomial can be chosen as $g(x) = 1 + x + x^2 + x^4$, which generates the matrix

$$G = \begin{pmatrix}
1 & 1 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 1
\end{pmatrix}.$$

Here the cyclic shifts of the first row give all the nonzero codewords, so the codewords of C are

C = {(0, 0, 0, 0, 0, 0, 0), (1, 1, 1, 0, 1, 0, 0), (0, 1, 1, 1, 0, 1, 0), (0, 0, 1, 1, 1, 0, 1), (1, 0, 0, 1, 1, 1, 0), (0, 1, 0, 0, 1, 1, 1), (1, 0, 1, 0, 0, 1, 1), (1, 1, 0, 1, 0, 0, 1)}.

Note that this happens to be the case for this particular example; for cyclic codes in general there can be additional codewords, whose cyclic shifts are also codewords.

To check that these are all the codewords, one can take the linear combinations of the rows in every possible way and check that each one yields a codeword in this list. This code is cyclic, since a cyclic shift of one codeword generates another codeword. The parity check matrix H is constructed from the parity check polynomial h(x), which satisfies $g(x)h(x) = x^7 - 1$; hence $h(x) = x^3 + x + 1$, which gives the matrix

$$H = \begin{pmatrix}
1 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 1
\end{pmatrix}.$$

The parity check matrix gives a way to detect errors in a transmitted message for cyclic codes, in a similar way as for linear codes. It can still be hard to correct the occurring errors for a general cyclic code; to make it easier one can give even more structure to the code, as in a BCH code [8].

2.4 BCH Codes

BCH codes were discovered in the late 1950s by R. C. Bose and D. K. Ray-Chaudhuri, and independently by A. Hocquenghem, hence the name BCH. BCH codes are a class of cyclic codes that have a decoding algorithm able to correct multiple errors. These types of codes are used, for example, in satellites, and the special BCH codes called Reed-Solomon codes have a lot of different applications. To construct a BCH code one needs some background information regarding polynomials.

If a polynomial d(x) is a divisor of the polynomial f(x), then $f(x) = d(x)g(x)$ for some g(x). Here 1 and f(x) are called trivial divisors of f(x), since they are always divisors of f(x). All other divisors are called nontrivial, or proper, divisors of f(x). If a polynomial f(x) has no proper divisors over a finite field F, then f(x) is said to be irreducible over F.

Let the polynomial f(x) be of degree $n \geq 1$ and irreducible over $\mathbb{Z}_p$, and let

$$F = GF(p^n) = \mathbb{Z}_p[x]/(f(x)) = \{a_0 + a_1\alpha + \cdots + a_{n-1}\alpha^{n-1} \mid a_i \in \mathbb{Z}_p,\ f(\alpha) = 0\},$$

where $GF(p^n)$ denotes the Galois field with $p^n$ elements and p is a prime. The group $F^* = F \setminus \{0\}$ of nonzero elements is cyclic. If α is a generating element of this group, so that $\langle \alpha \rangle = F^*$, then f(x) is called a primitive polynomial [1].

Addition and multiplication of polynomials is done modulo some irreducible polynomial h(x) of degree n. Let $F_n[x]$ be the set of all polynomials in F[x] with degree less than n. Each codeword in $F^n$ corresponds to a polynomial in $F_n[x]$, so one can also use addition and multiplication of codewords in $F^n$; multiplication in $F^n$ is then defined modulo an irreducible polynomial of degree n [5].

When a primitive polynomial is used to construct $GF(2^r)$, the Galois field whose elements, in binary, are r-bit numbers, all computations in the field are easier than when a non-primitive irreducible polynomial is used.

Let β in $F^n$ represent the codeword corresponding to $x \bmod h(x)$, where h(x) is a primitive polynomial of degree n; then $\beta^i$ corresponds to $x^i \bmod h(x)$. Note that if $1 \equiv x^m \pmod{h(x)}$, then $0 \equiv 1 + x^m \pmod{h(x)}$ in binary, which gives that h(x) divides $1 + x^m$. Since h(x) is a primitive polynomial, h(x) does not divide $1 + x^m$ for m less than $2^n - 1$, and hence $\beta^m \neq 1$ for m less than $2^n - 1$. Moreover, if $\beta^j = \beta^i$ with $j > i$, then $\beta^i = \beta^{j-i}\beta^i$, which implies $\beta^{j-i} = 1$; so the powers $\beta^0, \dots, \beta^{2^n-2}$ are all distinct. From this one can conclude that

$$F^n \setminus \{0\} = \{\beta^i \mid i = 0, 1, \dots, 2^n - 2\}.$$

That is, every nonzero codeword in $F^n$ can be represented by some power of β. This property makes multiplication in this field easy. An example of this, using $GF(2^4)$ and $h(x) = 1 + x + x^4$, is shown in Table 1, found in Coding Theory and Cryptography: The Essentials, page 114 [5].

Example 11. Using Table 1, multiplication of codewords is done via powers of β. To compute (1100)(1010), transform the codewords to powers of β; then

$$(1100)(1010) = \beta^4\beta^8 = \beta^{12} = 1111.$$

This works since

$$(1 + x)(1 + x^2) \equiv 1 + x + x^2 + x^3 \pmod{h(x)}.$$

codeword | polynomial in x mod h(x) | power of β
0000 | 0 | –
1000 | 1 | β⁰ = 1
0100 | x | β
0010 | x² | β²
0001 | x³ | β³
1100 | 1 + x ≡ x⁴ | β⁴
0110 | x + x² ≡ x⁵ | β⁵
0011 | x² + x³ ≡ x⁶ | β⁶
1101 | 1 + x + x³ ≡ x⁷ | β⁷
1010 | 1 + x² ≡ x⁸ | β⁸
0101 | x + x³ ≡ x⁹ | β⁹
1110 | 1 + x + x² ≡ x¹⁰ | β¹⁰
0111 | x + x² + x³ ≡ x¹¹ | β¹¹
1111 | 1 + x + x² + x³ ≡ x¹² | β¹²
1011 | 1 + x² + x³ ≡ x¹³ | β¹³
1001 | 1 + x³ ≡ x¹⁴ | β¹⁴

Table 1: Construction of $GF(2^4)$ where $h(x) = 1 + x + x^4$ [5].
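The following sketch shows how Table 1 can be generated and used for the multiplication in Example 11. Field elements are stored as bit masks where bit i holds the coefficient of $x^i$; this low-to-high convention mirrors the order in which the table lists the codeword bits, but is otherwise an assumption of the sketch:

```python
# Sketch: GF(2^4) built from the primitive polynomial h(x) = 1 + x + x^4.
def times_x(a):
    """Multiply by x mod h(x): shift up, then reduce x^4 via x^4 = x + 1."""
    a <<= 1
    return a ^ 0b10011 if a & 0b10000 else a   # 0b10011 encodes x^4 + x + 1

power, beta = {}, 1
for i in range(15):
    power[i] = beta            # power[i] = beta^i as a bit mask
    beta = times_x(beta)
log = {v: i for i, v in power.items()}

a, b = 0b0011, 0b0101          # the codewords 1100 (= 1 + x) and 1010 (= 1 + x^2)
product = power[(log[a] + log[b]) % 15]
print(f"{product:04b}")        # 1111: beta^4 beta^8 = beta^12, as in Example 11
```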

An element α in $GF(2^r)$ is called a primitive element if $\alpha^m$ is not equal to 1 for any m with $1 \leq m < 2^r - 1$. That is, α is a primitive element if every nonzero codeword in $GF(2^r)$ can be expressed as a power of α. If a primitive polynomial h(x) is used to construct the finite field $GF(2^r)$, with β defined as above, then β is a primitive element.


The order of a nonzero element α in $GF(2^r)$ is the smallest positive integer m such that $\alpha^m = 1$. Any nonzero element α in $GF(2^r)$ has an order m that divides $2^r - 1$, and α is a primitive element exactly when its order is $2^r - 1$ [5].

This definition of a primitive element α will be useful when one wants to construct the class of BCH codes called Reed-Solomon codes, since it is used when constructing the generating polynomial for the code.

To start the construction of a BCH code of length n over a finite field F, one factorizes $x^n - 1$ in the same way as in the section on cyclic codes,

$$x^n - 1 = f_1(x)f_2(x)\dots f_r(x),$$

where each $f_i(x)$ is an irreducible polynomial over the field F. If α is a primitive n-th root of unity, then $\alpha^0, \alpha^1, \dots, \alpha^{n-1}$ are the roots of $x^n - 1$, so that

$$x^n - 1 = (x - 1)(x - \alpha)\dots(x - \alpha^{n-1}).$$

This means that each $f_i(x)$ is a product of some of the factors $x - \alpha^j$, so each $\alpha^j$ is a root of one of the polynomials $f_i(x)$. For each j, let $q_j(x)$ be the polynomial $f_i(x)$ that fulfils $f_i(\alpha^j) = 0$; this forms the polynomials $q_0(x), q_1(x), \dots, q_{n-1}(x)$. The polynomials $q_j(x)$ are not all distinct, since a polynomial $f_i(x)$ can have two different powers $\alpha^j$ and $\alpha^l$ as roots, in which case $f_i(x)$ serves as both $q_j(x)$ and $q_l(x)$. A BCH code of designed distance δ is then a code with the generating polynomial

$$g(x) = \mathrm{lcm}\{q_{k+1}(x), q_{k+2}(x), \dots, q_{k+\delta-1}(x)\},$$

where k is some chosen integer. A BCH code C with designed distance δ satisfies $d(C) \geq \delta$. This is the so-called BCH bound, which says that if, for a cyclic code C of length n over F, the roots of the generating polynomial include δ − 1 consecutive powers of α, then the minimum distance d is greater than or equal to δ [6]. Each $q_j(x)$ is the minimal polynomial of $\alpha^j$ over F, that is, the monic polynomial of minimal degree in F[x] such that $q_j(\alpha^j) = 0$.

Example 12. Using the same polynomial as in Example 10,

$$x^7 - 1 = (x - 1)(x^3 + x^2 + 1)(x^3 + x + 1),$$

now choose the other possible generating polynomial $g(x) = x^4 + x^3 + x^2 + 1 = (x - 1)(x^3 + x + 1)$. If α is a root of $x^3 + x + 1$, then α is a primitive n-th root of unity, where n is equal to 7. This gives that g(α) = 0, and also $(\alpha^2)^3 + \alpha^2 + 1 = 0$: the computations are done with binary numbers and $\alpha^3 = \alpha + 1$, so squaring gives $(\alpha^3)^2 = (\alpha + 1)^2 = \alpha^2 + 2\alpha + 1 = \alpha^2 + 1$, and hence $(\alpha^2)^3 + \alpha^2 + 1 = (\alpha^2 + 1) + \alpha^2 + 1 = 0$. Thus the square of a root of $x^3 + x + 1$ is again a root, and then $\alpha^4 = (\alpha^2)^2$ is also a root. The factor can therefore be rewritten as

$$x^3 + x + 1 = (x - \alpha)(x - \alpha^2)(x - \alpha^4).$$

All the remaining powers of α must be roots of $x - 1$ or $x^3 + x^2 + 1$, respectively. In summary, the different polynomials $q_j$ are

$$q_0(x) = x - 1,$$
$$q_1(x) = q_2(x) = q_4(x) = x^3 + x + 1,$$
$$q_3(x) = q_5(x) = q_6(x) = x^3 + x^2 + 1.$$

If the chosen integer k is equal to −1 and the designed distance is δ = 3, the generating polynomial is

$$g(x) = \mathrm{lcm}\{q_{k+1}(x), q_{k+2}(x), \dots, q_{k+\delta-1}(x)\} = \mathrm{lcm}\{q_0(x), q_1(x)\} = x^4 + x^3 + x^2 + 1.$$

This says that the minimum distance of the code is at least 3. If k = −1 and δ is instead chosen to be 4, then the generating polynomial is

$$g_1(x) = \mathrm{lcm}\{q_0(x), q_1(x), q_2(x)\} = g(x);$$

since $q_1(x) = q_2(x)$ the least common multiple does not change, and now the minimum distance of this code is at least 4. The actual minimum distance is equal to 4, which can be seen by calculating the minimum weight of the codewords.

3 Reed-Solomon Codes

The Reed-Solomon codes were introduced in 1960 by I. S. Reed and G. Solomon and are a type of BCH codes. Let F be a finite field with q elements, where $q = p^r$ for a prime p; for a binary code $q = 2^r$. If $n = q - 1$,


then F contains a primitive element α. The generating polynomial is constructed as

$$g(x) = (x - \alpha^b)(x - \alpha^{b+1})\cdots(x - \alpha^{b+d-2}), \tag{1}$$

where d is between 1 and n, so that g(x) has degree d − 1. Usually b is chosen to be 0 or 1; from here on it will be 1. This generator polynomial has coefficients in F and generates a BCH code C over F of length n, called a Reed-Solomon code.

Since $g(\alpha^i) = 0$ for the d − 1 consecutive powers $i = 1, \dots, d - 1$, the BCH bound gives that the minimum distance of C is at least d.

A Reed-Solomon code is a cyclic [n, n + 1 − d, d] code, where $n = 2^r - 1$ and the elements are from $GF(2^r)$. The rest follows from the fact that the generating polynomial has degree d − 1, so it has at most d nonzero coefficients. This gives that the codeword corresponding to the generating polynomial has weight at most d, so the minimum weight of C is exactly d, and the dimension of C is $n - \deg(g) = n + 1 - d$, where g is the generating polynomial; this gives the notation for the code. The notation [n, k] can also be used, where $d = n + 1 - k$.

The codewords in C are given by the polynomials

$$g(x)f(x),$$

where $\deg(f)$ is less than or equal to $n - d$. Since there are q different choices for each of the $n - d + 1$ coefficients of f(x), there are $q^{n-d+1}$ polynomials f(x). Hence there are $q^{n-d+1}$ different codewords in C. Thus the Reed-Solomon code fulfils the criterion for an MDS code, that is, there is equality in the Singleton bound [8].

3.1 Encoding

To start the encoding process one needs to define the message polynomial. For a Reed-Solomon [n, k] code, k information symbols form the message that is encoded as one block; they can be represented by the message polynomial m(x), of degree k − 1,

$$m(x) = m_{k-1}x^{k-1} + \cdots + m_1x + m_0,$$


where the coefficients $m_{k-1}, \dots, m_0$ are message symbols from an alphabet, usually the Galois field $GF(2^r)$.

Reed-Solomon encoding can be done both in cyclic and in systematic form; when it is done systematically there is still a cyclic structure in the background, for example in the construction of the generating polynomial. Both methods of encoding use the same generating polynomial g(x), shown in Equation (1). For the cyclic approach, the generating matrix is made in the same manner as in the section on cyclic codes; the coefficients of the message polynomial then form a message vector, which is multiplied by the generating matrix to construct a codeword. The generating polynomial is

$$g(x) = (x - \alpha)(x - \alpha^2)\cdots(x - \alpha^{d-1}) = g_0 + g_1x + \cdots + g_{d-2}x^{d-2} + x^{d-1},$$

and the corresponding generating matrix for g(x) is

$$G = \begin{pmatrix}
g_0 & g_1 & \dots & g_{d-1} & 0 & 0 & \dots & 0 \\
0 & g_0 & g_1 & \dots & g_{d-1} & 0 & \dots & 0 \\
\vdots & & \ddots & & & \ddots & & \vdots \\
0 & \dots & 0 & g_0 & g_1 & \dots & & g_{d-1}
\end{pmatrix},$$

where there are n columns and k rows. The codeword c is then

$$c = (m_{k-1}, \dots, m_1, m_0)G.$$
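A minimal sketch of this cyclic encoding, under assumed toy parameters over the prime field GF(7) rather than $GF(2^r)$ (all names and values are illustrative, not from the thesis program): q = 7, n = q − 1 = 6, d = 3 and α = 3, so that k = n + 1 − d = 4:

```python
import numpy as np

# Sketch: cyclic Reed-Solomon encoding over GF(7) with toy parameters.
p, n, d, alpha = 7, 6, 3, 3
k = n + 1 - d                                  # k = 4

# g(x) = (x - alpha)(x - alpha^2): d - 1 factors; coefficient lists low-first.
g = np.array([1])
for i in (1, 2):
    g = np.convolve(g, np.array([-pow(alpha, i, p), 1])) % p

# k x n generator matrix whose rows are shifts of g, as for cyclic codes.
G = np.array([np.concatenate([np.zeros(i, int), g, np.zeros(n - len(g) - i, int)])
              for i in range(k)])
message = np.array([2, 0, 5, 1])
print(g)                  # [6 2 1]  ->  g(x) = 6 + 2x + x^2
print(message @ G % p)    # the corresponding length-6 codeword
```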

3.2 Properties

For many types of applications the errors often occur in bursts and are not randomly distributed. A burst error is when errors occur in many adjacent bits, for example from a scratch on a CD. Consider Reed-Solomon codes where the message symbols come from the finite field $F = GF(2^r)$, the Galois field with $2^r$ elements. Using the elements of this Galois field as message symbols, each coefficient of the message polynomial is represented by an r-bit binary number. It is due to this that Reed-Solomon codes handle burst errors well: even if all bits in a symbol are in error, it only counts as one symbol error in terms of the correction capacity of the code [3].


4 Methods

When one starts to decode a received message there are some different approaches to consider. There is a direct method based on some trial and error, and there are different algorithms that speed up the direct method. Another type of decoding algorithm is the Berlekamp-Massey algorithm, which will be used later on.

4.1 Decoding

Let [n, k, d] be some Reed-Solomon code, where $n = 2^r - 1$, k is the dimension and d is the minimum distance. Since the elements of the code come from $GF(2^r)$, correcting a received vector means that one needs to find both the locations of the errors and their magnitudes. An error location is a position in the received vector that holds an error, and it is referred to by an error location number; if the j-th coordinate of the vector is an error location, the error sits in the coefficient of $x^j$. The error magnitude at an error location j is the size of the error in that location, that is, the error in the coefficient of $x^j$. To decode a Reed-Solomon code one needs to find both the error locations and the corresponding error magnitudes [5].

If the received vector is represented as a polynomial R(x), then it can be written as

$$R(x) = T(x) + E(x), \tag{2}$$

where T(x) is the transmitted codeword and E(x) is the error that has occurred. Here $E(x) = E_{n-1}x^{n-1} + \cdots + E_1x + E_0$, where each coefficient is an element of $GF(2^r)$, and the positions of the errors are determined by the degrees of x. As for the correction capacity of the code, d needs, as discussed before, to be greater than or equal to 2t + 1, where t is the number of errors that can be corrected. This gives that if more than $t = (d - 1)/2$ of the coefficients in E(x) are nonzero, the errors may not be correctable, like what happened in Example 7, where the algorithm still worked but the nearest codeword might be another one than the one that was transmitted.

To know whether any errors have occurred, one needs to calculate the syndromes of the received polynomial. This can be done in some different ways. One way is to divide the received polynomial by the generator polynomial; if the received polynomial is an actual codeword, this can be done without any


remainder. This property extends to the factors of the generator polynomial, which gives the ability to find each syndrome value $S_1, \dots, S_{d-1}$ by dividing the received polynomial by each of the factors $x - \alpha^i$ in Equation (1),

$$\frac{R(x)}{x - \alpha^i} = Q_i(x) + \frac{S_i}{x - \alpha^i}, \tag{3}$$

where $Q_i(x)$ is the quotient and i goes from 1 to d − 1. The remainders are the sought syndrome values $S_1, \dots, S_{d-1}$. Rearranging Equation (3) gives the equation for each syndrome value,

$$S_i = Q_i(x)(x - \alpha^i) + R(x)$$

(over $GF(2^r)$ addition and subtraction coincide). Hence, when $x = \alpha^i$, this reduces to

$$S_i = R(\alpha^i) = R_{n-1}(\alpha^i)^{n-1} + \cdots + R_1\alpha^i + R_0,$$

where the coefficients $R_{n-1}, \dots, R_0$ are the symbols of the received polynomial.

This gives an alternative way of finding the syndrome values, namely by substituting $x = \alpha^i$ into the received polynomial. This is possible since the syndrome values depend only on the error pattern, because

$$R(\alpha^i) = T(\alpha^i) + E(\alpha^i),$$

where $T(\alpha^i)$ is equal to zero, since $x - \alpha^i$ is a factor of the generating polynomial, which in turn is a factor of T(x). Hence this reduces to

$$R(\alpha^i) = E(\alpha^i) = S_i. \tag{4}$$

When all the syndrome values $S_1, \dots, S_{d-1}$ are zero, no error is detected [3].
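Continuing the toy GF(7) code sketched in the encoding section, here is a sketch of syndrome computation by substituting powers of α into the received polynomial, as in Equation (4):

```python
# Sketch: syndromes S_i = R(alpha^i) for the toy GF(7) Reed-Solomon code.
p, alpha = 7, 3

def syndromes(r, count):
    """r holds the coefficients R_0, ..., R_{n-1}; returns S_1, ..., S_count."""
    return [sum(c * pow(alpha, i * j, p) for j, c in enumerate(r)) % p
            for i in range(1, count + 1)]

c = [6, 2, 1, 0, 0, 0]  # the generator polynomial itself, hence a codeword
r = [6, 2, 1, 3, 0, 0]  # the same word with an error of magnitude 3 in position 3
print(syndromes(c, 2))  # [0, 0]: a codeword, no error detected
print(syndromes(r, 2))  # [4, 3]: nonzero syndromes, an error has occurred
```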

The relation between the syndromes and the error polynomial can be used to set up a system of simultaneous equations, from which the errors can be found. Here the error polynomial is rewritten to only include the error locations. Assume that v errors have occurred; v needs to be less than or equal to t, otherwise the errors cannot be corrected. The error polynomial is rewritten as

$$E(x) = Y_1x^{j_1} + Y_2x^{j_2} + \cdots + Y_vx^{j_v},$$


where $j_1, \dots, j_v$ are the error location numbers of the respective errors, and the error magnitudes at these locations are $Y_1, \dots, Y_v$. Substituting this back into the syndrome Equation (4) gives

$$S_i = E(\alpha^i) = Y_1\alpha^{ij_1} + Y_2\alpha^{ij_2} + \cdots + Y_v\alpha^{ij_v} = Y_1X_1^i + Y_2X_2^i + \cdots + Y_vX_v^i,$$

where $X_1 = \alpha^{j_1}, \dots, X_v = \alpha^{j_v}$ are the error locators, with error location numbers $j_1, \dots, j_v$. Now the 2t syndrome equations can be set up as a matrix equation:

$$\begin{pmatrix} S_1 \\ S_2 \\ \vdots \\ S_{2t} \end{pmatrix} =
\begin{pmatrix}
X_1^1 & X_2^1 & \dots & X_v^1 \\
X_1^2 & X_2^2 & \dots & X_v^2 \\
\vdots & \vdots & & \vdots \\
X_1^{2t} & X_2^{2t} & \dots & X_v^{2t}
\end{pmatrix}
\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_v \end{pmatrix}. \tag{5}$$

Note that the syndromes $S_1, \dots, S_{2t}$ correspond to the roots $\alpha, \dots, \alpha^{2t}$ of the generating polynomial chosen in Equation (1).

There are two different ways to construct the error locator polynomial. The first one, denoted σ(x), is constructed as

$$\sigma(x) = (x - X_1)(x - X_2)\dots(x - X_v),$$

where the error locators $X_1, \dots, X_v$ are the roots of the polynomial; this produces a polynomial of degree v with coefficients $\sigma_1, \dots, \sigma_v$.

The second error locator polynomial, denoted Λ(x), is constructed as

$$\Lambda(x) = (1 - X_1x)(1 - X_2x)\dots(1 - X_vx), \tag{6}$$

where the factors $(1 - X_jx)$ give that it is the inverses $X_1^{-1}, \dots, X_v^{-1}$ of the error locators that are the roots of the polynomial, with coefficients $\Lambda_1, \dots, \Lambda_v$. The coefficients of σ and Λ are the same, since σ(x) can be rewritten as $x^v\Lambda(1/x)$. The two error locator polynomials are used in different places to make the computations easier.

When the error locators $X_1, \dots, X_v$ have been found, they can be substituted back into the syndrome equation, which is then solved by direct calculation using matrix inversion of the matrix equation (5); this produces the error magnitudes $Y_1, \dots, Y_v$. If the matrix obtained is not invertible, an alternative method of calculating the error values $Y_j$ is the Forney algorithm, which will not be described here; see Algebraic Coding Theory by Berlekamp for further reading [2]. When the symbols containing errors have been identified by the $X_j$ and the magnitudes $Y_j$ of these errors are found, the errors are corrected by subtracting the error polynomial E(x) from the received vector R(x); by rewriting Equation (2) this gives the transmitted codeword T(x) [3].

4.1.1 Direct Method

The task of finding the coefficients of the error locator polynomial can be approached in some different ways; this section describes the direct method.

For each error there is a corresponding root $X_j^{-1}$ that makes Λ(x) equal to zero; this can be written as

$$1 + \Lambda_1X_j^{-1} + \cdots + \Lambda_{v-1}X_j^{-v+1} + \Lambda_vX_j^{-v} = 0.$$

This can be multiplied through by $Y_jX_j^{i+v}$ and rewritten as

$$Y_jX_j^{i+v} + \Lambda_1Y_jX_j^{i+v-1} + \cdots + \Lambda_vY_jX_j^i = 0,$$

and summing these equations over $j = 1, \dots, v$, the sums of the terms $Y_jX_j^{\,\cdot}$ can be rewritten as syndromes,

$$S_{i+v} + \Lambda_1S_{i+v-1} + \cdots + \Lambda_vS_i = 0, \quad \text{where } i = 1, \dots, 2t - v.$$

This produces a set of 2t − v simultaneous key equations, where $\Lambda_1, \dots, \Lambda_v$ are unknown. To solve these equations for $\Lambda_1, \dots, \Lambda_v$ one can use the first v equations (over $GF(2^r)$ addition and subtraction coincide, so no minus signs appear),

$$\begin{pmatrix} S_{v+1} \\ S_{v+2} \\ S_{v+3} \\ \vdots \\ S_{2v} \end{pmatrix} =
\begin{pmatrix}
S_v & S_{v-1} & \dots & S_1 \\
S_{v+1} & S_v & \dots & S_2 \\
S_{v+2} & S_{v+1} & \dots & S_3 \\
\vdots & \vdots & & \vdots \\
S_{2v-1} & S_{2v-2} & \dots & S_v
\end{pmatrix}
\begin{pmatrix} \Lambda_1 \\ \Lambda_2 \\ \Lambda_3 \\ \vdots \\ \Lambda_v \end{pmatrix}, \tag{7}$$

except that v is unknown here. To find v it is necessary to calculate the determinant of the matrix for each value of v. These calculations should start with v = t and continue downwards until a nonzero determinant is found [7].


A nonzero determinant means that the equations are independent and can be solved. The coefficients of the error locator polynomial are then found by inverting the matrix and solving the equations. When the coefficients of the error locator polynomial are found, they can be used to find the sought error location numbers that indicate the position of each error. When the error locator polynomial is written as

$$\Lambda(x) = X_1(x - X_1^{-1})\,X_2(x - X_2^{-1})\dots,$$

the function value is zero if $x = X_1^{-1}, X_2^{-1}, \dots$, and this is the case when $x = \alpha^{-j_1}, \alpha^{-j_2}, \dots$; hence the values of $X_1, \dots, X_v$ are found by trial and error. This can be done using the Chien search, that is, trying the powers $\alpha^j$ for j between 0 and n − 1, since this covers the whole field, to find the roots of Λ(x). The values $\alpha^j$ for j between 0 and n − 1 are substituted into Equation (6) and each result is evaluated. If the expression evaluates to zero, then that value of x is a root and identifies an error location; here j gives the error location number [3].

4.1.2 Berlekamp–Massey algorithm

This algorithm gives an alternative method for finding the error locator polynomial that is faster than the direct method. Here the error locator polynomial σ(x) is calculated from the syndromes $S_1, \dots, S_{2t}$. Let $\sigma^R(x) = 1 + \sigma_{t-1}x + \sigma_{t-2}x^2 + \cdots + \sigma_0x^t$, which can be seen as the reverse of the error locator polynomial σ(x), and let $S(x) = 1 + S_1x + S_2x^2 + \cdots + S_{2t}x^{2t}$ be the syndrome polynomial. By the division algorithm one can then write

$$\sigma^R(x)S(x) = q(x)x^{2t+1} + r(x),$$

where the degree of r(x) is less than or equal to 2t.

This version of the algorithm produces a polynomial $P_{2t}(x)$ satisfying $P_{2t}(x)S(x) = q_{2t}(x)x^{2t+1} + r_{2t}(x)$, where the degree of $P_{2t}(x)$ is less than or equal to t and the degree of $r_{2t}(x)$ is less than t; hence $P_{2t}(x)$ is equal to $\sigma^R(x)$. Now let

$$q_i(x) = q_{i,0} + q_{i,1}x + \cdots + q_{i,2t-1-i}x^{2t-1-i}$$

and also let

$$p_i(x) = x^{2t+1-i}P_i(x) = p_{i,0} + p_{i,1}x + \cdots + p_{i,l}x^l;$$


then at step i the algorithm calculates $q_i(x)$, $p_i(x)$ and the integers $D_i$ and $z_i$.

The following steps are then used to calculate the error locator polynomial with the Berlekamp-Massey algorithm. Let T(x) be the transmitted codeword, encoded using a generator polynomial g(x) constructed in the same manner as in Equation (1), and let the received vector with some error be R(x). The decoding process then continues as follows:

1. Calculate the syndromes of the received vector as $S_i = R(\alpha^i)$, for $i = 1, \dots, 2t$.

2. Now define

$$q_{-1} = 1 + S_1x + S_2x^2 + \cdots + S_{2t}x^{2t}, \qquad q_0 = S_1 + S_2x + \cdots + S_{2t}x^{2t-1},$$
$$p_{-1} = x^{2t+1} \qquad \text{and} \qquad p_0 = x^{2t},$$

as well as the initial conditions $D_{-1} = -1$, $D_0 = 0$ and $z_0 = 1$.

3. Then, for $i = 1, \dots, 2t$, the quantities $q_i(x)$, $p_i(x)$, $D_i$ and $z_i$ are defined recursively, with two different cases, as follows:

(a) If $q_{i-1,0} = 0$, then

$$q_i(x) = \frac{q_{i-1}(x)}{x}, \qquad p_i(x) = \frac{p_{i-1}(x)}{x},$$
$$D_i = 2 + D_{i-1} \quad \text{and} \quad z_i = z_{i-1}.$$

(b) If $q_{i-1,0} \neq 0$, then

$$q_i(x) = \frac{q_{i-1}(x) - \frac{q_{i-1,0}}{q_{z_{i-1},0}}\, q_{z_{i-1}}(x)}{x}, \qquad
p_i(x) = \frac{p_{i-1}(x) - \frac{q_{i-1,0}}{q_{z_{i-1},0}}\, p_{z_{i-1}}(x)}{x},$$
$$D_i = 2 + \min\{D_{i-1}, D_{z_{i-1}}\} \quad \text{and} \quad
z_i = \begin{cases} i - 1 & \text{if } D_{i-1} \geq D_{z_{i-1}}, \\ z_{i-1} & \text{otherwise.} \end{cases}$$


If e errors, with e less than or equal to t, have occurred during the transmission of the codeword, then $p_{2t}(x)$, which equals $\sigma^R(x)$, has degree e, and the error locator polynomial $\sigma = p_{2t,e} + p_{2t,e-1}x + \cdots + p_{2t,1}x^{e-1} + x^e$ has e distinct roots. These roots can be found in a similar way as in the direct method, and they give the error location numbers [5].
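For comparison, here is a sketch of the Berlekamp-Massey algorithm in its common textbook form over a prime field, computing Λ(x) from the syndromes. This is not the exact $p_i, q_i$ recursion above, only one standard formulation of the same algorithm, and it is illustrative code rather than the thesis program:

```python
# Sketch: textbook Berlekamp-Massey over GF(p); returns the coefficients of
# Lambda(x), constant term first, given the syndromes S_1, ..., S_2t.
def berlekamp_massey(S, p):
    C, B = [1], [1]        # current and previous connection polynomials
    L, m, b = 0, 1, 1      # L counts the errors found so far
    for i in range(len(S)):
        # discrepancy between the next syndrome and the current prediction
        delta = (S[i] + sum(C[j] * S[i - j] for j in range(1, L + 1))) % p
        if delta == 0:
            m += 1
            continue
        T = C[:]
        coef = delta * pow(b, p - 2, p) % p
        if len(B) + m > len(C):
            C = C + [0] * (len(B) + m - len(C))
        for j, bj in enumerate(B):
            C[j + m] = (C[j + m] - coef * bj) % p
        if 2 * L <= i:
            L, B, b, m = i + 1 - L, T, delta, 1
        else:
            m += 1
    return C

print(berlekamp_massey([4, 3], 7))   # [1, 1] -> Lambda(x) = 1 + x
```

For the toy GF(7) example this gives $\Lambda(x) = 1 + x \equiv 1 - 6x \pmod 7$, whose root $x = 6 = X_1^{-1}$ matches the error locator $X_1 = 6$ found with the direct method.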

4.2 My program

The Mathematica program that I have used can be found in Appendix A; in this section a specific example run of the program shows how it works. The first part of the program takes a text together with some n and k, representing the Reed-Solomon code [n, k], and encodes the text. The encoding process starts with each letter being represented by a field element, and k such elements then form one block of length k. This block is encoded by the $k \times n$ generating matrix G, constructed from the generating polynomial g(x). This polynomial consists of consecutive powers of the smallest primitive element α modulo n,

$$g(x) = (x - \alpha^1)(x - \alpha^2)\dots(x - \alpha^{n-k}).$$

For this example n = 257 and k = 249, the generating polynomial is $g(x) = 44 + 118x + 4x^2 + 174x^3 + 156x^4 + 42x^5 + 157x^6 + 183x^7 + x^8$, the computations are done mod n, and here α = 3. The generating matrix G is constructed in the same manner as for a cyclic code, and the blocks consisting of the message elements are multiplied by G to encode the message. The output of the program shows how many errors are introduced in each block (at most t, where $d(C) \geq 2t + 1$ according to Theorem 1), together with one calculation of the error positions by the direct method and one by the Berlekamp–Massey algorithm.

Example 13. If 2 errors are introduced in one block of the text found in Appendix A, there are errors in two different positions, represented by $x^j$. The number of errors, as well as their positions, are of course unknown when the decoding process begins. First the syndrome values of the corresponding vector are calculated; here they are

$$\{17, 31, 133, 61, 71, 20, 176, 155\},$$

and they are the same for both methods. From now on the methods differ; the first description will consider the direct method.
