
EFFICIENT DECODING ALGORITHMS FOR LOW DENSITY PARITY CHECK CODES

Master's thesis in electronics systems at Linköping Institute of Technology
by Anton Blad
LiTH-ISY-EX--05/3691--SE

Supervisor: Oscar Gustafsson
Examiner: Lars Wanhammar
Linköping 2005-02-28

Division, Department: Institutionen för systemteknik, 581 83 Linköping
Date: 2005-02-28
Language: English
Report category: Examensarbete (Master's thesis)
ISRN: LITH-ISY-EX--05/3691--SE
URL for electronic version: http://www.ep.liu.se/exjobb/isy/2005/3691/
Title: Effektiva avkodningsalgoritmer för low density parity check-koder (Efficient Decoding Algorithms for Low-Density Parity-Check Codes)
Author: Anton Blad
Keywords: LDPC, Low Density Parity Check, Tanner graph, decoding, sum-product, probability propagation, early decision, threshold


Abstract

Low-density parity-check codes have recently received much attention because of their excellent performance and the availability of a simple iterative decoder. The decoder, however, requires large amounts of memory, which causes problems with power consumption.

We investigate a new decoding scheme for low-density parity-check codes to address this problem. The basic idea is to define a reliability measure and a threshold, and stop updating the messages for a bit whenever its reliability is higher than the threshold. We also consider some modifications to this scheme, including a dynamic threshold more suitable for codes with cycles, and a scheme with soft thresholds which allows the possibility of removing a decision that has proved wrong.

By exploiting the bits' different rates of convergence we are able to achieve an efficiency of up to 50% at a bit error rate of less than 10^-5. The efficiency should roughly correspond to the power consumption of a hardware implementation of the algorithm.

Keywords: LDPC, Low Density Parity Check, Tanner graph, decoding, sum-product, probability propagation, early decision, threshold


Notation

d        Minimum distance of a code.
Eb       The energy used for a code word, averaged over the information bits.
Es       The energy used for transmitting each symbol.
G        A generator matrix.
H        A parity check matrix.
j        Column weight of the parity check matrix of a regular LDPC code.
K        Number of message bits in a code word.
k        Row weight of the parity check matrix of a regular LDPC code.
M        Number of parity bits in a code word.
M(n)     The neighbours of variable node n.
m, mi    Denotes a message, or bit i of the message.
N        Number of bits in a code word.
N0       The power spectral density of the (AWGN) channel noise.
N(m)     The neighbours of check node m.
Pb       The message bit error probability, after decoding.
Pe       The symbol error probability, before decoding.
Pw       The code word error probability, after decoding.
p, pi    Denotes the parity, or bit i of the parity.
pn^a     The prior likelihood of bit n assuming the value a.
qmn^a    The variable-to-check message from variable n to check m.
qn^a     The pseudo-posterior likelihood of bit n assuming the value a.
R        The rate of a code.
r, ri    Denotes a received signal, or sample i of the received signal.
rmn^a    The check-to-variable message from check m to variable n.
t        The threshold of an early-decision decoder.
x, xi    Denotes a code word, or bit i of the code word.
x̂        Denotes the decoder's guess of the sent code word.


Contents

1 Introduction
  1.1 Background
  1.2 Problem definition
  1.3 Outline and reading instructions

2 Error control systems
  2.1 Digital communication
    2.1.1 Basic communication elements
    2.1.2 Channels
    2.1.3 Coding
    2.1.4 Two simple codes
    2.1.5 Performance metrics
  2.2 Block codes
    2.2.1 Definition of block codes
    2.2.2 Systematic form
    2.2.3 Code rate
    2.2.4 Encoding
  2.3 Low-density parity-check codes
    2.3.1 Definition of LDPC codes
    2.3.2 Tanner graphs
    2.3.3 Girth
    2.3.4 Integer lattice codes
    2.3.5 LDPC performance

3 Decoding
  3.1 Iterative decoding
  3.2 Decoding on cycle-free graphs
    3.2.1 A decoding example
    3.2.2 The decoding algorithm
  3.3 Decoding on graphs with cycles
    3.3.1 Message domains

4 Decoder approximations
  4.1 Bit reliabilities
  4.2 Early decision decoder
  4.3 Thresholds in subsequent iterations
    4.3.1 Constant threshold
    4.3.2 Dynamic threshold
    4.3.3 Considering local checks
    4.3.4 Soft thresholds
  4.4 Further considerations and suggestions for future work
    4.4.1 Number of iterations
    4.4.2 Undetected errors
    4.4.3 Other codes and parameter sensitivity

5 Conclusion
  5.1 Implementability


Chapter 1

Introduction

1.1 Background

Low-density parity-check (LDPC) codes were first discovered by Robert Gallager [1] in the early 60’s. For some reason, though, they were forgotten and the field lay dormant until the mid-90’s when the codes were rediscovered by David MacKay and Radford Neal [2]. Since then, the class of codes has been shown to be remarkably powerful, comparable to the best known codes and performing very close to the theoretical limit of error-correcting codes.

The nature of the codes also suggests a natural decoding algorithm operating directly on the parity check matrix. This algorithm has relatively low complexity and allows a high degree of parallelization when implemented in hardware, allowing high decoding speeds. The performance comes at a price, however. The memory requirements are very large, and the random nature of the codes leads to high interconnect complexities and routing congestions.

1.2 Problem definition

The aims of this work were to build a software framework with which different modifications of the decoding algorithm could be tested and evaluated. The framework was to include the usual elements in a coding system: encoding of messages, transmission of code words, and decoding. While the focus lay on the algorithm's theoretical properties, the suitability for hardware implementation was not to be neglected.

Using this framework, ideas could be easily tested, and the hope was that at least some modification would lead to better decoding algorithm performance.

1.3 Outline and reading instructions

In chapter 2 we describe the basic elements of a digital communication system. We introduce the channel abstraction, and define measurements of performance. Then


we explain the basic ideas of codes, and declare the theoretical bound for coding known as the Shannon limit. We continue with a description of general block codes and their properties, followed by a definition of low-density parity-check codes. We also define the Tanner graph, which is a visualization of codes, especially suited for low-density parity-check codes. Finally, we compare the performances of different low-density parity-check codes with some conventional coding schemes. The reader already familiar with digital communication and coding could skip this chapter.

Chapter 3 contains the description of the sum-product decoding algorithm. A simple example showing the idea of the algorithm is included. Then the algorithm for the cycle-free case is described, followed by the modifications necessary to adapt the algorithm to codes with cycles. In the end, some alternative message domains are described, suitable for various implementations. The material in this chapter is quite theoretical and a complete understanding is not necessary for the reader to grasp the contents of chapter 4.

Chapter 4 contains the new ideas of this work. We define a measure of message reliabilities, and we use this measure to define a new decoding algorithm. Four different modifications are evaluated using a reference code, and the results are shown. Also included are some further considerations and discussions about the results.

Chapter 5, finally, contains the summarized conclusions of this work, and some words about the implementability of the algorithm.


Chapter 2

Error control systems

In this chapter the basic concepts of digital communication and error control coding are introduced. We consider the communication model, vector representation of signals, different channels, theoretical limits, and theoretical code performance measurements. There are numerous books introducing the subject, for example Wicker [3], concentrating on coding, and Anderson [4], treating more of the communication aspect.

2.1 Digital communication

2.1.1 Basic communication elements

A digital communication system is a system in which an endpoint A transmits information to an endpoint B. The system is digital, meaning that the information is represented by a sequence of symbols from a finite discrete alphabet. The sequence is mapped onto an analog signal, which is then transmitted through a cable or using an antenna. During transmission the signal is distorted by noise, so the received signal is not the same as the sent. The receiver selects the most likely sequence of symbols and delivers it to the receiving endpoint B.

The transmitter and receiver functions are usually performed by different elements. The basic ones are shown in figure 2.1. The modulator maps the source symbols onto the analog signal. The channel transmitter puts the signal on the physical medium. On the receiver side, the channel receiver reconstructs an analog signal, and the demodulator compares the signal to the modulation waveforms and selects the most likely corresponding symbols.

Usually, the modulation scheme is linear, implying that an orthogonal basis can be chosen for the waveforms. Then the signals can be represented as vectors over this basis, where the lengths of the vectors are the square roots of the waveform energies. It can be shown (see e.g.[4], p. 48) that with a matched receiver only the noise in the basis’ dimensions is detected.

Figure 2.1. Basic elements of a digital communication system.

Throughout this work, we will assume the source and destination alphabet to be binary, A = {0, 1}. Furthermore, we will work exclusively with the binary phase shift key (BPSK) modulation format. The BPSK format is one-dimensional, and we represent the waveforms with the scalars +1 and −1. Conventionally the symbol 0 maps to +1, and the symbol 1 maps to −1.

2.1.2 Channels

Definition 2.1 A channel is defined by an input alphabet AX, an output alphabet AY, and a transition function pY|X(y|X = x), where x ∈ AX and y ∈ AY. The transition function denotes the probability (when AY is discrete) or the probability density (when AY is continuous) of the event that symbol y is received, given that symbol x was sent. The channels we consider are memoryless, meaning that the output is independent of earlier uses of the channel.

The two most common types of channels are the binary symmetric channel (BSC) and the additive white Gaussian noise (AWGN) channel. The BSC is really a channel abstraction including a hard-decision receiver, and the output alphabet is the same as the input alphabet, AY = AX = {0, 1}. The channel is defined by a probability p of cross-over between the symbols, such that

pY|X(0|X = 0) = pY|X(1|X = 1) = 1 − p
pY|X(0|X = 1) = pY|X(1|X = 0) = p

For example, if a 0 was sent there is a probability of p that 1 is received, and a probability of 1 − p that 0 is received.

The AWGN channel models the noise as a white Gaussian stochastic process with spectral density N0/2. This noise has infinite power, and is therefore not

realizable. It is still a realistic model, though, for many practical channels over cables when the frequencies are not extremely high. With the vector model the noise becomes a Gaussian stochastic vector with mean 0 and standard deviation σ = √(N0/2) in every dimension. If we use BPSK modulation, the input alphabet is AX = {+√Es, −√Es}, and the received value is r = x + N, where N ∼ N(0, σ), so the output alphabet is the set of real values AY = R. The transition function can be given as pY|X(y|X = x) = fx,σ(y), where fx,σ is the probability density function for a Gaussian stochastic variable with mean x and standard deviation σ.

The simulations in this work have been done over the AWGN channel exclusively. Unlike the BSC, the AWGN channel conveys reliability information about the transmitted symbols. Much better performance can be achieved if the decoder is able to use this information. As we will see later, the LDPC decoder is perfectly suited for this task.
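To make the channel model concrete, the following Python sketch (our own illustration; the function name and the normalization Eb = 1 are assumptions, not code from the thesis) sends BPSK symbols over an AWGN channel and computes the normalized prior likelihoods (p^0, p^1) that the soft-decision decoders of chapter 3 start from.

    import numpy as np

    # Our own illustration (not code from the thesis): BPSK over AWGN with Es = R*Eb.
    def awgn_bpsk_priors(bits, ebn0_db, rate=1.0, seed=0):
        """Send bits with BPSK over an AWGN channel; return the received samples
        and the prior likelihoods (p0, p1) for each bit."""
        rng = np.random.default_rng(seed)
        ebn0 = 10.0 ** (ebn0_db / 10.0)
        es = rate * 1.0              # symbol energy, with Eb normalized to 1
        n0 = 1.0 / ebn0              # so that Eb/N0 matches the requested value
        sigma = np.sqrt(n0 / 2.0)    # noise standard deviation per dimension
        x = 1.0 - 2.0 * np.asarray(bits)                  # 0 -> +1, 1 -> -1
        r = np.sqrt(es) * x + sigma * rng.normal(size=x.shape)
        # Gaussian likelihoods of r under each hypothesis, normalized to sum to 1
        l0 = np.exp(-(r - np.sqrt(es)) ** 2 / n0)
        l1 = np.exp(-(r + np.sqrt(es)) ** 2 / n0)
        return r, np.stack([l0 / (l0 + l1), l1 / (l0 + l1)], axis=-1)

    r, priors = awgn_bpsk_priors([1, 0, 1, 1, 1], ebn0_db=2.0, rate=1.0)

The hard-decision (BSC) view of the same channel is obtained by simply thresholding r at zero, which discards exactly the reliability information discussed above.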

2.1.3 Coding

Recall that the purpose of the receiver system was to decide the most probable sequence of symbols. Obviously, with the transmission schemes discussed so far, there is nothing to gain by looking at sequences of symbols, as the transmitted symbols are independent. By inserting redundant symbols in the transmitted sequence, dependencies can be introduced, and the set of possible sequences can be restricted. This can lead to remarkably better performance, even if the signal energies have to be decreased to compensate for the higher transmission rate.

Figure 2.2 shows a digital communication system that employs channel coding. Two elements have been added, the channel encoder and the channel decoder. Their functions are quite obvious: the encoder adds redundancy to the data according to the code being used, and the decoder uses the code properties to correct transmission errors. As said earlier, there are decoders that are able to use the likelihood values from the demodulator. These are called soft-decision decoders, and the decoders with which we are to concern ourselves belong to this category. The other kind are hard-decision decoders, which have to choose a symbol for each received signal before it is decoded. Some information is lost this way; for example, if the value +2 is received over an AWGN channel it is more likely that the symbol 0 was sent than if the value +0.5 was received. The hard-decision decoder removes this information.

Before we continue on to the topic of practical code construction, we will delve a bit deeper into the properties of channels and codes. In 1948, Claude E. Shannon published an article [5] in the Bell System Technical Journal proving some quite intricate theorems about the transmission capabilities of channels. The most important is the noisy channel coding theorem.

Theorem 2.1 The noisy channel coding theorem states that for every channel a quantity called the channel capacity can be defined, such that for information rates below this limit arbitrarily small error probabilities can be achieved. Moreover, for information rates above the channel capacity the error probability must necessarily be bounded away from zero.

It should be emphasised that the formal channel models not only the interfering noise, but also the transmission scheme with waveforms and signal energies. Thus

Figure 2.2. Basic elements of a digital communication system employing channel coding.

an alternate interpretation of Shannon's theorem is that above a certain transmission power arbitrarily small error probabilities can be achieved, if the information rate is kept constant.

The information rate has not been formally defined here, and is a bit out of scope for this work, but for independent and identically distributed source symbols, it equals the ratio between useful data (information) and total data transmitted. This quantity will also be referred to as the code rate. The definition of the code rate for a block code can be found in section 2.2.3.

The noisy channel coding theorem is non-constructive in its nature, and serves merely as a theoretical limit to strive for. Moreover, the theorem says nothing about the practicality of the codes promised, and, not surprisingly, the more powerful the codes are, the more far-reaching the dependencies between the symbols need to be, and the more difficult the code will be to encode and decode.

2.1.4 Two simple codes

Here we will define two simple codes, which we will use as a base for defining performance metrics. The first code is the repetition code, which simply repeats each symbol a number of times. We will use 3 as the repetition factor, so each symbol is repeated 3 times. For example, the symbols 101 become 111000111 when encoded. The decoder can then just look at the received symbols in groups of three and select the symbol that occurred at least two times.

The second code we define is the (7,4,3) Hamming code, where the parameters are (N, K, d) = (7, 4, 3). This is a block code, that takes the source message in blocks of K bits and encodes them to blocks of N bits. The d parameter is the minimum distance of the code, and is the minimum number of symbols that two encoded words may differ in.

The (7,4,3) Hamming code can be defined by a number of parity check equations, where the computations are done modulo 2. We call the source message bits m0, m1, m2 and m3 and calculate the parity bits p4, p5 and p6 according to the equations

p4 = m0 + m2 + m3
p5 = m0 + m1 + m3
p6 = m0 + m1 + m2

Then the code word (m0, m1, m2, m3, p4, p5, p6) is sent over the channel, and as long as at most one error has occurred, the receiver can find the transmission error and correct it. This is possible because the minimum distance of the code is 3, so as long as the transmission error is in just one symbol, the transmitted word will always be the "closest".
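As a small illustration of these equations, the sketch below (our own; the helper names are hypothetical) encodes four message bits and corrects a single flipped bit by picking the closest of the 16 code words, a brute-force stand-in for proper syndrome decoding.

    # Our own illustration of the (7,4,3) Hamming code; helper names are hypothetical.
    def hamming_encode(m):
        """Encode message bits (m0, m1, m2, m3) with the parity equations above."""
        m0, m1, m2, m3 = m
        p4 = (m0 + m2 + m3) % 2
        p5 = (m0 + m1 + m3) % 2
        p6 = (m0 + m1 + m2) % 2
        return [m0, m1, m2, m3, p4, p5, p6]

    def hamming_correct(r):
        """Return the code word closest to r (corrects any single bit error)."""
        best = None
        for m in range(16):
            c = hamming_encode([(m >> i) & 1 for i in range(4)])
            dist = sum(ci != ri for ci, ri in zip(c, r))
            if best is None or dist < best[0]:
                best = (dist, c)
        return best[1]

    x = hamming_encode([1, 0, 1, 1])
    r = x[:]
    r[2] ^= 1                      # one transmission error
    assert hamming_correct(r) == x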

2.1.5 Performance metrics

Next we will look at performance measurements of transmission schemes, and the gain to be had with coding. First we will consider pure BPSK modulation over the AWGN channel. We can calculate the bit error probability (or bit error rate) as a function of the signal-to-noise ratio (SNR). The SNR is a measurement of the signal strength relative to noise, and is usually given in dB as

SNR = 10 log10(Es/N0)

where Es is the symbol energy and N0 is twice the power spectral density of the noise.

The symbol error probability, denoted Pe, is the probability that a transmitted symbol 0 is received as a 1 or vice versa. The transmission scheme is symmetric, so it is sufficient to look at one of the cases. Assume therefore that the symbol 0 was transmitted, corresponding to signal level +√Es. The channel adds zero-mean Gaussian noise with standard deviation σ = √(N0/2). The received signal r is then Gaussian with r ∼ N(√Es, σ), and the error probability Pe is the probability that r is less than zero:

Pe = P(r < 0) = Q(√Es/σ) = Q(√(2Es/N0))

This function is plotted in figure 2.3.

In the figure we have the quantity Eb/N0 on the x-axis, whereas the SNR was defined as Es/N0. Eb is a new concept that we will have to define when we employ coding. As different codes may have different code rates, it would not be fair to compare codes with the same signal energies. Therefore we define the quantity Eb to be the energy per information bit transmitted. The symbol energy Es will then be the energy of the information bits averaged over all the bits transmitted, so if the code rate is R the relation Es = R·Eb will hold. In the recent transmission

Figure 2.3. Bit error probabilities for some simple transmission schemes. Shown in the figure are plain binary phase shift key (BPSK) transmission, the 3-repetition coding scheme, and the (7,4,3) Hamming code. The Hamming code is able to correct one arbitrary error in each transmitted block. As can be seen, the Hamming code provides an insignificant coding gain at high enough signal strengths, while the repetition code performs worse than if no coding is used at all.

scheme discussed, where no coding was used, all the bits are information bits, so Es = Eb. When comparing codes, we will keep Eb constant, while the relation Es = R·Eb will allow us to calculate the symbol energy in order to determine the error probability of each transmitted symbol.

We will also need to define the bit error probability, or bit error rate, denoted Pb. This is the probability that a message bit, after decoding, will be in error, and is a bit harder to calculate in the case of coding. In the no-coding case, however, it equals the symbol error probability Pe.

The 3-repetition code discussed sends each information bit three times, so the code rate is here R = 1/3, and the symbol energy Es = (1/3)Eb. Therefore, the symbol error probability will be

Pe = Q(√Es/σ) = Q(√(2Eb/(3N0)))


The code is however able to correct single transmission errors, as each information bit is sent three times. So the cases where the information is incorrectly decoded are when all three bits are in error, or when just one bit is correctly received. We can write this as

Pb = Pe^3 + 3Pe^2(1 − Pe)

which is of course also a function of Eb/N0.

In figure 2.3 it can be seen that the code actually performs worse than if no coding is used at all. The code is simply too weak to compensate for the overhead of sending three times as many symbols. It is therefore unusable as an error correcting code, but serves as a simple example of performance calculation.

The Hamming code is slightly more complex. The code rate is R = 4/7, so the symbol error probability is

Pe = Q(√Es/σ) = Q(√(8Eb/(7N0)))

The code is able to correct a single error in an arbitrary position, so the probability of a correctly transmitted message is (1 − Pe)^7 + 7Pe(1 − Pe)^6. Thus, the probability of a code word error is 1 − (1 − Pe)^7 − 7Pe(1 − Pe)^6. When a word error occurs, essentially half the bits are garbled, so the bit error probability is about half the word error probability:

Pb ≈ (1/2)·(1 − (1 − Pe)^7 − 7Pe(1 − Pe)^6)

In the figure we see that the Hamming code surpasses the plain BPSK scheme when the signal is strong enough. Assume for example that the BPSK scheme at Eb/N0 = 11 dB is used. Then we can employ the (7,4,3) Hamming code and either reduce the transmitter power by 0.45 dB, or enjoy a 4.5-fold reduction of transmission errors. However, there are more advanced codes that offer benefits a lot better than this, see section 2.3.5 for a survey.
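The expressions above are easy to evaluate numerically. The short sketch below (our own, using Q(x) = erfc(x/√2)/2 from the standard library) computes the three bit error probability curves shown in figure 2.3 at a few Eb/N0 points.

    from math import erfc, sqrt

    def qfunc(x):
        # Q(x) = 0.5 * erfc(x / sqrt(2))
        return 0.5 * erfc(x / sqrt(2.0))

    def bpsk_pb(ebn0):
        return qfunc(sqrt(2.0 * ebn0))

    def repetition3_pb(ebn0):
        pe = qfunc(sqrt(2.0 * ebn0 / 3.0))            # Es = Eb/3
        return pe ** 3 + 3.0 * pe ** 2 * (1.0 - pe)   # all three wrong, or only one right

    def hamming743_pb(ebn0):
        pe = qfunc(sqrt(8.0 * ebn0 / 7.0))            # Es = 4Eb/7
        pw = 1.0 - (1.0 - pe) ** 7 - 7.0 * pe * (1.0 - pe) ** 6
        return 0.5 * pw                               # about half the bits of a bad word

    for db in (4, 6, 8, 10):
        ebn0 = 10.0 ** (db / 10.0)
        print(db, bpsk_pb(ebn0), repetition3_pb(ebn0), hamming743_pb(ebn0))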

2.2 Block codes

2.2.1 Definition of block codes

There are two main ways to define a linear block code, either through a generator matrix G or a parity check matrix H. The relation x = G^T m (mod 2) holds for a code defined by a generator matrix. Thus the rows of G (the columns of G^T) form a basis for the code, and the message m gives the coordinates for the code word x. In this work, however, we will define codes through parity check matrices. Then the set of code words is given by the relation Hx = 0 (mod 2). The rows of H thus define a set of checks on the code word x. The relation implies that the bits involved in each check must have an even number of ones for the word to be in the code. This definition of a code does not include a mapping between code words and


messages, but often a code is constructed such that the message bits are mapped to certain locations in the code word. These bits are then called message bits, and the other bits are called parity bits.

For example, the relations defining the (7,4,3) Hamming code in section 2.1.4 can be put in a matrix to form a parity check matrix H for the code:

H = [ 1 0 1 1 1 0 0 ]
    [ 1 1 0 1 0 1 0 ]
    [ 1 1 1 0 0 0 1 ]

If x = (m0 m1 m2 m3 p4 p5 p6)^T, where m0 ... m3 are the message bits and p4 ... p6 are the parity bits, the previously given equations result from the relation Hx = 0 (mod 2).

2.2.2 Systematic form

When put on the form H = [P I], where I is the identity matrix, H is said to be on systematic form. On this form the parity check matrix is particularly easy to convert to a generator matrix. By recognizing which parity bits are changed by changing one message bit while keeping the other message bits constant, we can determine the rows of the generator matrix. In the above case, changing bit m1 requires changing bits p5 and p6 for the parity check equations to be valid. This leads to a generator matrix G = [I P^T]. For example, the parity check matrix for the (7,4,3) Hamming code can be converted to the following generator matrix on systematic form:

G = [ 1 0 0 0 1 1 1 ]
    [ 0 1 0 0 0 1 1 ]
    [ 0 0 1 0 1 0 1 ]
    [ 0 0 0 1 1 1 0 ]

Observe that the leading identity matrix ensures that the message bits are copied into the first locations of the code word.
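A minimal sketch (our own) of the [P I] → [I P^T] conversion for the Hamming matrix above; the final assertion verifies that every row of G satisfies all parity checks, i.e. G H^T = 0 (mod 2).

    import numpy as np

    # Parity check matrix of the (7,4,3) Hamming code on systematic form, H = [P I]
    H = np.array([[1, 0, 1, 1, 1, 0, 0],
                  [1, 1, 0, 1, 0, 1, 0],
                  [1, 1, 1, 0, 0, 0, 1]])
    P = H[:, :4]

    # Generator matrix on systematic form, G = [I P^T]
    G = np.hstack([np.eye(4, dtype=int), P.T])

    # Every row of G is a code word, so G H^T = 0 (mod 2)
    assert not ((G @ H.T) % 2).any()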

2.2.3 Code rate

Usually the number of message bits in a code word is denoted K, and the number of parity bits is denoted M. Thus the relation K + M = N holds, where N is the total number of bits in a code word. Assuming that the rows of the parity check matrix are linearly independent (as is always the case with systematic parity check matrices), each row defines a parity bit. Therefore there are M rows and N columns in the parity check matrix. Similarly, the generator matrix has dimensions K × N. The rate of a code, denoted R, is the ratio between the number of message bits and the total number of bits: R = K/N.

Later we will also use the concept design rate for systematically constructed (i.e. non-random) codes. The design rate denotes the rate that the code was designed to have according to the parity check matrix. However, systematically constructed parity check matrices often have a small number of dependencies, so the actual code rate is a bit higher.

2.2.4 Encoding

Encoding is done by multiplication of the generator matrix with the message m. The operation yields the code word x:

x = G^T m (mod 2)

Alternatively, the parity bits can be computed as sums of message bits, using the parity check matrix on systematic form. This technique was demonstrated in section 2.1.4.

An arbitrary parity check matrix can be put in systematic form using Gaussian elimination. This may require that the code word bits be reordered though, so the message bits may become scattered. The procedure may be formalized in the following way, given without proof:

Theorem 2.2 Assume that H is a full-rank parity check matrix with dimensions M × N, i.e. it consists of M independent rows. Let x denote a code word. Then there exists a column permutation p such that H' = p(H) = (B A), where A is square and non-singular.

Applying this permutation to the code word x yields the equation

Hx = 0 (mod 2)  ⟺  p(H)p(x^T)^T = H'x' = 0 (mod 2)

If we define the last M bits of x' to be parity bits p and the rest to be message bits m, we can split the equation to obtain the relation

H'x' = (B A)(m; p) = 0 (mod 2)  ⟺  Ap = Bm (mod 2)  ⟺  p = A^(-1)Bm (mod 2)

with which the parity bits can be effectively computed.

This encoding technique has been employed in the experiments in this work. However, when the block length grows it becomes impractical to store the random and dense generator matrix for direct encoding (several hundred megabytes are required for the codes proposed for the next generation of satellite-TV), so other methods must be sought. There are many algorithms for efficient encoding of various codes in hardware, but we will not discuss the topic further. The reader is instead referred to [3] for the general encoding problem, or to [6] for LDPC-specific encoding algorithms.
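The procedure of theorem 2.2 can be prototyped with Gaussian elimination over GF(2). The sketch below is our own illustration; for simplicity it skips the column permutation and simply assumes that the last M columns of H already form an invertible A, which holds for the systematic Hamming matrix used as the test case.

    import numpy as np

    def gf2_solve(A, b):
        """Solve A z = b over GF(2) by Gaussian elimination (A assumed invertible)."""
        A = A.copy() % 2
        b = b.copy() % 2
        n = A.shape[0]
        for col in range(n):
            pivot = next(r for r in range(col, n) if A[r, col])
            A[[col, pivot]] = A[[pivot, col]]
            b[[col, pivot]] = b[[pivot, col]]
            for r in range(n):
                if r != col and A[r, col]:
                    A[r] ^= A[col]
                    b[r] ^= b[col]
        return b

    def encode(H, m):
        """Encode message m as x = (m, p) with H x = 0 (mod 2), i.e. p = A^-1 B m."""
        M, N = H.shape
        B, A = H[:, :N - M], H[:, N - M:]
        p = gf2_solve(A, (B @ m) % 2)
        return np.concatenate([m, p])

    H = np.array([[1, 0, 1, 1, 1, 0, 0],
                  [1, 1, 0, 1, 0, 1, 0],
                  [1, 1, 1, 0, 0, 0, 1]])
    x = encode(H, np.array([1, 0, 1, 1]))
    assert not (H @ x % 2).any()

For the long, dense generator matrices mentioned above this dense approach is exactly what becomes impractical, which is why structured encoders such as those in [6] are used in practice.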


2.3 Low-density parity-check codes

2.3.1 Definition of LDPC codes

LDPC codes are defined from parity-check matrices. Originally Gallager defined an LDPC matrix as a randomly created matrix with small constant column weights and row weights [1]. An (N, j, k) LDPC code denotes an LDPC code with code word length N , column weight j, and row weight k. These codes were later renamed regular LDPC codes, as opposed to irregular LDPC codes which are still low-density but lack the property of constant column and/or row weights. It was shown already by Gallager that regular LDPC codes are not able to approach channel capacity with vanishing error probability on binary symmetric channels. MacKay, however, later showed [2] that irregular LDPC codes may achieve any desired error probability arbitrarily close to the channel capacity for any channel with symmetric stationary ergodic noise (including the BSC and the AWGN channel defined in section 2.1.2).

In practice, though, these results say very little about the usability of a code. What is more important is the existence of a practical decoding algorithm, and herein lies the importance of LDPC codes. The structure of the codes naturally suggests a simple iterative decoding algorithm that is inherently scalable and lends itself well to hardware implementation. This topic is explored in chapter 3, but before we get there we will look a bit more into the theoretical properties of LDPC codes.

2.3.2 Tanner graphs

We will define a simple example code to simplify the description of the Tanner graph and girth concepts. Let H be the parity check matrix

      1 2 3 4 5 6 7 8 9
H = [ 1 0 0 1 0 0 1 0 0 ]  1
    [ 1 1 0 0 1 0 0 0 0 ]  2
    [ 0 0 1 1 0 0 0 1 0 ]  3
    [ 0 0 0 0 0 1 1 0 1 ]  4        (2.1)
    [ 0 1 0 0 0 1 0 1 0 ]  5
    [ 0 0 1 0 1 0 0 0 1 ]  6

The code has design rate 1/3, but it actually has one redundant check, so an arbitrary check can be removed for a slightly higher rate code. (This always happens when the column weight is even, as the sum of all rows is the all-zero vector.)

The Tanner graph is needed later in chapter 3 when we describe the decoding algorithm. The graph is a visualization of the codes’ structure, and the decoding algorithm operates directly on it. As the name implies, the graph representation was investigated by Tanner, but he used it primarily for designing codes using a kind of “divide-and-conquer” technique [7].


The nodes of the graph are the variables and the checks of the code, thus there is a node for every column and row in the parity check matrix. There is an edge between two nodes if and only if there is a one in the intersection between the corresponding column and row in the matrix. There are no intersections between two columns or two rows, so the graph is bipartite between the variable nodes and the check nodes. Figure 2.4 shows the Tanner graph corresponding to H.
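The neighbour sets N(m) and M(n) used in chapter 3 fall straight out of the parity check matrix. A small sketch (our own, 0-indexed whereas the figures number nodes from 1) building them for the matrix in equation 2.1:

    # The parity check matrix of equation 2.1 (0-indexed here; the figures count from 1)
    H = [[1, 0, 0, 1, 0, 0, 1, 0, 0],
         [1, 1, 0, 0, 1, 0, 0, 0, 0],
         [0, 0, 1, 1, 0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 1, 1, 0, 1],
         [0, 1, 0, 0, 0, 1, 0, 1, 0],
         [0, 0, 1, 0, 1, 0, 0, 0, 1]]

    # N(m): variable nodes taking part in check m;  M(n): checks involving variable n
    N = {m: [n for n, h in enumerate(row) if h] for m, row in enumerate(H)}
    M = {n: [m for m, row in enumerate(H) if row[n]] for n in range(len(H[0]))}

    assert all(len(ns) == 3 for ns in N.values())   # row weight k = 3
    assert all(len(ms) == 2 for ms in M.values())   # column weight j = 2

The two dictionaries are exactly the edge lists of the bipartite Tanner graph in figure 2.4.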

2.3.3 Girth

The girth of a graph is the length of the shortest cycle in it. In this work we will also talk about girths of matrices, and mean the lengths of the shortest cycles in their associated Tanner graphs. As the Tanner graphs are bipartite, every cycle will have even length, and thus the girth will also be even. Note that we cannot formally talk about the girth of a code, as there are several parity check matrices that define the same code and these need not have the same girth. Equation 2.2 and figure 2.5 show the relationship between parity check matrices and Tanner graphs, using the example code. The length of the cycle is 8, and this is also the girth of the code as there are no shorter cycles.

      1 2 3 4 5 6 7 8 9
H = [ 1 0 0 1 0 0 1 0 0 ]  1
    [ 1 1 0 0 1 0 0 0 0 ]  2
    [ 0 0 1 1 0 0 0 1 0 ]  3
    [ 0 0 0 0 0 1 1 0 1 ]  4        (2.2)
    [ 0 1 0 0 0 1 0 1 0 ]  5
    [ 0 0 1 0 1 0 0 0 1 ]  6

The existence of cycles in the parity check matrix impacts negatively upon the LDPC decoder's performance. Thus it is desirable to obtain matrices with high girth values. 8 is in fact a quite high girth value. LDPC codes are often randomly constructed, and these matrices almost always have girth 4, which is the lowest possible. Elements in the matrix can be moved around to eliminate the short cycles, but it is difficult to achieve girths higher than 6 this way. Many researchers try to find good algorithms for systematic construction of LDPC codes. The matrices can then be constructed with girths of 8 and higher, while they get the structure necessary for practical implementations of hardware encoders and decoders. It is difficult, however, to find suitable algorithms that fulfil these demands and still make the matrix random enough for the code to perform well.

It would be optimal for LDPC decoding purposes to use codes with parity check matrices without cycles, as in this case the decoding algorithm is optimal. Unfortunately such codes are useless. It was shown by Etzion et al. [8] that cycle-free codes have minimum distance at most two when the code rate R ≥ 1/2. Such codes are not able to correct a single error, and therefore we are forced to use codes with cycles. Very little research has been done about the influence of cycles upon the decoder's performance, but it is believed that increasing girth is a good way to

Figure 2.4. The figure shows the Tanner graph corresponding to the code given by the parity check matrix in equation 2.1. The nodes are numbered as their corresponding columns and rows in the matrix.

Figure 2.5. The figure shows a cycle of length 8 in the Tanner graph for the parity check matrix H given in equation 2.2. The elements corresponding to the cycle's edges are also shown in the matrix. There are no cycles of length less than 8, therefore the girth of H is 8.

increase both the codes’ and the decoder’s performances. Other results [9] suggest that high girth might not be that important, though.
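For small matrices the girth can be checked directly with a breadth-first search over the Tanner graph. The sketch below is our own brute-force illustration (only meant for short codes); it reports girth 8 for the matrix in equation 2.2, and would typically report 4 for an unconstrained random matrix.

    from collections import deque

    H = [[1, 0, 0, 1, 0, 0, 1, 0, 0],
         [1, 1, 0, 0, 1, 0, 0, 0, 0],
         [0, 0, 1, 1, 0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0, 1, 1, 0, 1],
         [0, 1, 0, 0, 0, 1, 0, 1, 0],
         [0, 0, 1, 0, 1, 0, 0, 0, 1]]

    def tanner_girth(H):
        """Girth of the Tanner graph of H (brute-force BFS from every node)."""
        m, n = len(H), len(H[0])
        adj = [[] for _ in range(m + n)]        # checks 0..m-1, variables m..m+n-1
        for i in range(m):
            for j in range(n):
                if H[i][j]:
                    adj[i].append(m + j)
                    adj[m + j].append(i)
        best = float("inf")
        for src in range(m + n):
            dist, parent = {src: 0}, {src: None}
            queue = deque([src])
            while queue:
                u = queue.popleft()
                for w in adj[u]:
                    if w not in dist:
                        dist[w], parent[w] = dist[u] + 1, u
                        queue.append(w)
                    elif w != parent[u]:
                        # non-tree edge: a closed walk through src, which contains a cycle
                        best = min(best, dist[u] + dist[w] + 1)
        return best                             # minimum over all sources is the girth

    print(tanner_girth(H))                      # prints 8 for the matrix above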

In the next section we describe a class of systematically constructed codes that we will use later for experimental performance measurements of the LDPC decoders.

2.3.4 Integer lattice codes

Integer lattice codes (proposed by Vasic et al. [10]) belong to a category of codes called quasi-cyclic (or quasi-circulant) codes, which is an extension of the cyclic codes. The cyclic property of codes implies that every cyclic shift of a code word is also a code word. It enables a field-algebraic description of codes, as well as ensures that encoding can be done efficiently using shift registers [3]. The Hamming codes are all cyclic, as are the BCH and Reed-Solomon codes with which the reader may be familiar.

A code that is quasi-cyclic can be divided into q equally-sized blocks, where a simultaneous cyclic shift of all blocks preserves code words. Quasi-cyclic codes can also be efficiently encoded using shift registers, although it is not necessarily the best encoding method for LDPC codes. A quasi-cyclic code with q = 1 is cyclic, of course.

Example 1

We consider as an example a simple code given by the parity check matrix H:

H = [ 1 0 0 0 0 1 0 1 0 ]
    [ 0 1 0 1 0 0 0 0 1 ]
    [ 0 0 1 0 1 0 1 0 0 ]

Assume that x = (x0 x1 x2 x3 x4 x5 x6 x7 x8)^T is a code word. Then x' = (x2 x0 x1 x5 x3 x4 x8 x6 x7)^T is a vector where each block of q = 3 has been circularly shifted. Then check 1 is satisfied for x if and only if check 2 is satisfied for x', and similarly for checks 2 and 3, so x' is a code word whenever x is.

The parity check matrices for lattice codes are constructed by concatenating cyclically shifted identity matrices. However, there are many other algorithms yielding the same structure, so it is not characteristic of lattice codes. As with most systematically constructed LDPC codes, the girth is ensured to be at least 6. Girth 8 is easily achieved too, with carefully chosen design parameters.

To define the lattice codes we first need some geometrical constructions.

Definition 2.2 A lattice is a rectangular grid of points, with height q and width j. j can be any positive integer, whereas q is required to be prime. The lattice is periodic in the vertical dimension.

Definition 2.3 A line with slope s starting at point (0, a) is the set of j points (x, (a + sx) mod q), for x = 0, 1, ..., j − 1.

Figure 2.6. The figure shows an integer lattice used for construction of LDPC codes. Some of the lines with slope 0 and 2 are shown.

Thus a line includes a point from every column in the lattice, and the vertical lines are not allowed. For every slope, we require that either none or all of the lines with that slope exist.

Example 2

We let q = 5 and j = 3, and define the lattice with 15 points shown in figure 2.6. Some of the lines with slope 0 and 2 are shown.

One intuitive way of constructing a code from this geometry is to let each point correspond to a code word bit, and let the lines determine which bits are included in each check. However, we will do it the other way around, and let the lines correspond to bits, and the points correspond to checks.

We let S = {s0, . . . , sk−1} denote the set of slopes we decide to use, and let k = |S| be the number of slopes. There will then be j points intersecting each line, and k lines intersecting each point, so j and k are the column and row weights of the matrix, respectively.

To construct the parity check matrix we will need to define labelings for the points and lines. For example we can use the labeling defined by lp(x, y) for the point at coordinate (x, y), and ll(si, a) for the line with slope si starting at point (0, a), given by:

lp(x, y) = qx + y + 1
ll(si, a) = qi + a + 1

The labeling of the points is illustrated in figure 2.6.

Definition 2.4 Let q and j be parameters of a lattice, and let S = {s0, . . . , sk−1} be the set of slopes used. The corresponding lattice code is then defined by the parity check matrix H with dimensions qj × qk, where the element at (lp(x, y), ll(si, a)) is 1 if and only if the line with slope si and starting point (0, a) intersects the point at coordinate (x, y).

Example 3

We extend the slope set of example 2 to obtain the set S = {0, 1, 2, 3}. Then q = 5, j = 3 and k = 4, and we obtain a code with design rate 1 − j/k = 1/4. The code is given by the parity check matrix H, where each column corresponds to a line over the lattice. The first five columns correspond to those with slope 0, the next five to those with slope 1, and so on. Similarly, the first five rows in the matrix correspond to the first column of points in the lattice, and the next five rows to the second column, and so on. For example, we can see that column 13 corresponds to the line through the points (0, 2), (1, 4) and (2, 1), with labels 3, 10 and 12.

H = [ 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 ]
    [ 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 ]
    [ 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 ]
    [ 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 ]
    [ 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 ]
    [ 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 ]
    [ 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 ]
    [ 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 ]
    [ 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 ]
    [ 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 ]
    [ 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 ]
    [ 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 ]
    [ 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 0 ]
    [ 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 0 ]
    [ 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 ]

Theorem 2.3 A lattice with j = 3 and the slope set S = {s0, . . . , sk−1}, where S does not contain any three-term arithmetic progression and every si ≤ q/2, defines an LDPC code with girth 8.

With a three-term arithmetic progression we mean three integers a, b and c, such that b − a = c − b. The integers need not be consecutive in S. For the proof of theorem 2.3, see [10].
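The construction of definitions 2.2-2.4 is mechanical enough to express in a few lines. The sketch below is our own illustration, assuming the line definition (x, (a + sx) mod q); it rebuilds the 15 × 20 matrix of example 3 and also contains the three-term arithmetic progression test of theorem 2.3 (note that S = {0, 1, 2, 3} does contain such progressions, e.g. 0, 1, 2, so the theorem makes no girth promise for example 3).

    from itertools import combinations

    def lattice_H(q, j, slopes):
        """Parity check matrix of the integer lattice code: rows are lattice points,
        columns are lines, and H[point][line] = 1 iff the line hits the point."""
        k = len(slopes)
        H = [[0] * (q * k) for _ in range(q * j)]
        for i, s in enumerate(slopes):              # column ll(s_i, a) = q*i + a + 1
            for a in range(q):
                for x in range(j):                  # the line hits (x, (a + s*x) mod q)
                    y = (a + s * x) % q
                    H[q * x + y][q * i + a] = 1     # row lp(x, y) = q*x + y + 1 (0-indexed)
        return H

    def has_three_term_ap(slopes):
        """True if some a, b, c in the slope set satisfy b - a = c - b."""
        return any(2 * b == a + c for a, b, c in combinations(sorted(slopes), 3))

    H = lattice_H(q=5, j=3, slopes=[0, 1, 2, 3])
    print(len(H), len(H[0]))                # 15 20, as in example 3
    print(has_three_term_ap([0, 1, 2, 3]))  # True (0, 1, 2), so theorem 2.3 does not apply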

2.3.5 LDPC performance

Randomly constructed LDPC codes are often used when evaluating performances of systematically constructed LDPC codes. Random LDPC codes are most likely

Figure 2.7. Performance comparisons of some short-length codes. The Reed-Solomon and Hamming codes' performance curves were calculated, while the LDPC codes' were determined by simulation. As can be seen, the random LDPC code outperforms the Reed-Solomon code by about 3 dB.

good, although they are not very suitable for practical usage. Systematically constructed codes that surpass random codes exist, but are relatively rare. This section provides a comparison of lattice codes to random LDPC codes, as well as to conventional coding schemes. We also provide an example with images to give the reader an intuitive sense of coding benefits.

Figure 2.7 compares short LDPC codes to simple coding schemes. The Hamming code can correct one error, and when the block length increases the probability of more than one error increases. Therefore we need codes able to correct multiple errors to increase performance. The Reed-Solomon codes are one type of widely used codes, and the figure shows one with minimum distance 7. The block length is 31, but the symbols consist of 5 bits and are not binary, so the length in bits is comparable to the LDPC codes. As can be seen, the random LDPC code provides about a 3 dB increase in performance at a bit error rate of 10^-6. The comparison is not altogether fair though, as Reed-Solomon codes have an inherent capability to correct burst errors, which are often present in reality but not modeled by the AWGN channel. Also shown is the capacity (Shannon limit) for rate 4/7 codes

over the AWGN channel.

Figure 2.8. Performance comparisons of some long codes. With increased lengths LDPC codes can achieve performances very close to the channel capacity. There is also a class of codes known as Turbo codes that achieve comparable performances.

Figure 2.8 shows the performance of two irregular LDPC codes [11] created with a technique called density evolution. A code with a length of ten million bits performs at just 0.0045 dB from the Shannon limit. This is about as good as coding can ever get, if we can only find practical and fast decoding algorithms. A code of length one million bits performs just below one dB worse. While a block length of one or ten million bits may seem too long to be practical, it is certainly the case that many real transmission channels use rates of 10 Mbit/s or beyond. One example is digital TV with 15 Mbit/s, where a latency of one second would be acceptable. Inter-computer connections are another example, where speeds are growing ever faster, and above 1 Gbit/s codes of comparable length might well be considered.

The main cause that makes the LDPC codes better than conventional codes is the existence of a practical soft-decision decoder. It is also the case that low-rate codes perform better than high-rate codes (in terms of Eb/N0), and while the LDPC decoding algorithm does not suffer significantly from reduced rate, the Reed-Solomon decoding algorithm becomes increasingly complex with increased

Figure 2.9. Original and encoded image of a decoding example.

amounts of parity bits. (It is still the case, though, that decreased code rate causes increased transmission rate, which increases demands on transmission filters and synchronization circuits.)

Figures 2.9 and 2.10 show an example of the decoding process. To the left in figure 2.9 is shown an image that we wish to transmit. The image is first encoded with a rate-1/2 code into the image to the right. After transmission the top-left image in figure 2.10 is received. Using a soft-decision sum-product decoder, the received signal is then decoded in 14 iterations into the error-free image shown in figure 2.10. The reader should note that the inherent structure of the image is not used by the decoder. In practice, the image would first be compressed, and the decoder would achieve equally good performance with the compressed data.

Figure 2.10. Decoding process of a received image (panels: received image, iterations 1-5, and iterations 13 and 14).

Chapter 3

Decoding

The decoding problem is to find the most likely transmitted message, given knowledge about the channel, the code used, and the a priori probabilities of transmitted messages. Most often the a priori probabilities are assumed to be uniformly distributed over the message set, i.e. the messages are assumed to be equally likely, and indeed this is often a valid assumption. Usually the AWGN channel is assumed, but when the signal is airborne some fading channel might be more appropriate. However, it is often difficult to adapt the decoder to these conditions, but one way is to use an interleaver which spreads the bits over several frames. This makes the errors more randomly located, so that the AWGN channel can be assumed.

Formally, the decoding problem over the BSC with equally likely messages can be stated as follows:

Theorem 3.1 Assume that a parity check matrix H of size M × N and a received vector r of length N are given. Find the minimum-weight vector n̄ such that Hn̄ = Hr. Then x̄ = r − n̄ is the most likely sent code word.

Proof. We know that r = x + n, where x is the transmitted code word. Thus Hr = H(x + n) = Hx + Hn = Hn. The result is then easily deduced.

It has been known [12] for over 25 years that this problem is NP-hard. It is also believed that maximum-likelihood decoding of any capacity-approaching code is NP-hard, although this has not been shown as far as the author knows.
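Theorem 3.1 can be turned into a reference decoder by trying error patterns in order of increasing weight. The sketch below is our own illustration of why this is intractable for long codes rather than anything used in the thesis; it is exponential in the block length, but exact for the small Hamming code.

    from itertools import combinations

    def ml_decode_bsc(H, r):
        """Find a minimum-weight n with H n = H r (mod 2) and return x = r - n (mod 2)."""
        M, N = len(H), len(H[0])
        syndrome = tuple(sum(H[i][j] * r[j] for j in range(N)) % 2 for i in range(M))
        for weight in range(N + 1):             # exponential in N in the worst case
            for flips in combinations(range(N), weight):
                n = [1 if j in flips else 0 for j in range(N)]
                s = tuple(sum(H[i][j] * n[j] for j in range(N)) % 2 for i in range(M))
                if s == syndrome:
                    return [(r[j] + n[j]) % 2 for j in range(N)]

    H = [[1, 0, 1, 1, 1, 0, 0],     # the (7,4,3) Hamming code from section 2.2.1
         [1, 1, 0, 1, 0, 1, 0],
         [1, 1, 1, 0, 0, 0, 1]]
    x = [1, 0, 1, 1, 1, 0, 0]       # a code word
    r = x[:]; r[5] ^= 1             # one transmission error
    assert ml_decode_bsc(H, r) == x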

3.1 Iterative decoding

The sum-product (SP) decoder (or min-sum decoder, or belief propagation decoder) is a type of iterative decoder, the latest hype in the coding theory world. The algorithm works by passing messages representing bit and check probabilities over the Tanner graph of the code. For each iteration, more of the received data is used for calculating the likelihoods of each sent bit, until the set of bits forms a valid code word or a maximum number of iterations is reached.


The sum-product algorithm is a general algorithm that may be used for a wide range of calculations, including turbo code decoding, Kalman filtering, certain fast Fourier transforms, as well as LDPC decoding [13].

The main strength of the SP decoder is its simplicity and inherent scalability. Every node in the graph can be considered a separate simple processing entity, receiving and sending messages along its edges. Thus, the calculations can be made either in parallel with a processing element for every node, in serial with a single processor or DSP doing all the calculations, or any combination thereof. This ability to parallelise computations makes it possible to reach very high throughput rates of 1 Gb/s or higher.

The weaknesses, on the other hand, are very high memory requirements for storing of interim messages, and high wire routing complexity caused by the random nature of the graph. The large amounts of memory, in turn, cause large power dissipation issues, and the routing complexities make it very difficult to make fully parallel implementations of codes longer than about 1000 bits. In short, there are many implementation difficulties regarding LDPC codes. Still, there are large amounts of structure that should be possible to exploit. In chapter 4 we analyze some ideas for reducing the decoding complexity.

Compared to turbo codes, the codes have almost orthogonal features. Their definition is very different, and the iterative turbo decoding algorithm is difficult to implement in parallel. On the other hand, it does not have the memory requirements of the LDPC decoder.

In the following sections, we will first consider the decoder on cycle-free graphs, where it is easier to understand the algorithm. We will then describe how the algorithm can be used on graphs with cycles.

3.2 Decoding on cycle-free graphs

The decoding algorithm works by passing messages between the nodes of the Tanner graph of the code. As described in section 2.3.2, the graph consists of variable nodes and check nodes. Each variable of the code word corresponds to a variable node, and each parity check corresponds to a check node. We will use the following notation when we describe the algorithm:

• x is the sent code word, and xi denotes bit i of the sent code word.
• r is the received vector of correlation values from the demodulator.
• x̂ is the decoder's guess of the sent code word.
• pn^a is the prior likelihood of bit xn being a. Prior here means before decoding, i.e. pn^a is determined solely from the value of rn. For example, p7^0 is the prior likelihood that bit 7 of the sent code word was a 0. Of course, for binary-input channels, a takes the value 0 or 1, and pn^0 + pn^1 = 1.
• qn^a is the posterior likelihood of bit xn being a. This value is based on all the received bits of the block, and can be used directly to calculate the decoded bit x̂n.
• M(n) is the set of neighbouring nodes to the variable node n. As the graph is bipartite, these are all check nodes. The set can be defined from the parity check matrix as M(n) = {m : Hm,n = 1}. Slightly abusing notation, we also denote a set with an element m excluded by M(n) \ m.
• N(m) is analogously the set of neighbouring (variable) nodes to the check node m, defined by N(m) = {n : Hm,n = 1}.

Figure 3.1. The figure shows how messages are passed between the nodes in the Tanner graph of a code. Every node sends a message along an edge when it has incoming messages on all its other edges. (Panels: (a) first iteration, message passing starts at the leaves; (b) second iteration; (c) third iteration, the center node got messages from all its neighbours and can send all messages at once; (d) final iteration, the message passing reaches the leaves again.)

The nodes are to send a message along an edge whenever they have received messages from all their neighbours except the one being sent to. Each node is to send a message along all its edges once. Thus, message passing will start at the leaves of the tree and work its way into the interior of the graph, to later spread out to the leaves again. Figure 3.1 shows how the messages are passed in a simple graph. Variable nodes are depicted as circles, while check nodes are depicted as squares.

As the graph is bipartite, there are two types of messages:

• the variable-to-check message qmn^a, denoting the likelihood that bit n of the sent code word is a, given that the neighbouring checks of n other than m have separable probabilities of being satisfied given by their messages {rm'n^a : m' ∈ M(n) \ m} received via their respective edges.
• the check-to-variable message rmn^a, denoting the probability that check m is satisfied given that variable n is locked at a, and the other neighbouring variables have separable distributions according to the messages {qmn'^a : n' ∈ N(m) \ n} received via their respective edges.

Note that we assume that all messages arriving at a node are independent. Thus we also assume that the sent bits are independent, which is clearly not the case. This does not concern us much, though, as we are not very interested in the exact probability of the sent code word, but rather in the most likely one, which the algorithm will still give us.

We will combine the two messages along the same edge into a two-tuple (p^0, p^1), where p^0 represents the message where a = 0, and p^1 represents the message where a = 1. For example, we denote the message from variable node 1 to check node 2 with q21 = (p^0, p^1), where p^0 = q21^0 and p^1 = q21^1.

The result of the message passing is that probabilities will propagate through the graph so that a message along an edge will be based on bit likelihoods gathered from every variable node in the branch starting at the sending node. In the end, when two messages have been passed along every edge (one in every direction), every variable node has incoming messages from all its neighbours, which can be used to calculate the posterior likelihoods qn^a based on every node in the graph.

3.2.1 A decoding example

Figure 3.2. The figure shows the code used in the decoding example. In italics are the node labels, in bold the sent code word bits xi, and in parenthesis the prior likelihoods (p^0, p^1), denoting the probability that the bit sent was 0 and 1 respectively.

Now, before we state the general updating rules for the messages, let us work through an example in the hope that the algorithm will seem intuitive to the reader. We will use the same code as in figure 3.1, shown again in figure 3.2 with the node labels and the sent code word. Thus we assume that the sent code word is x = (1, 0, 1, 1, 1). Furthermore, we assume that we can compute the prior probabilities p1 = (0.2, 0.8), p2 = (0.6, 0.4), p3 = (0.5, 0.5), p4 = (0.2, 0.8) and p5 = (0.6, 0.4) from the received signal. Thus, in short, we have one incorrect bit (bit 5, whose prior favours 0 although a 1 was sent) and one completely unreliable bit (bit 3).

Iteration 1

Figure 3.3. The messages passed in the first iteration.

In the first iteration, the leaf nodes send their messages. These are q11, q12, q24 and q35, shown in figure 3.3. The variable-to-check messages shall be the likelihood of the sent bit, considering incoming messages from other edges than the one which the message is passed along. However, there are no other branches from the edge nodes, and therefore the messages will be just the prior likelihoods.

q11 = p1 = (1/5, 4/5)
q12 = p2 = (3/5, 2/5)
q24 = p4 = (1/5, 4/5)
q35 = p5 = (3/5, 2/5)

Only variable nodes are leaves in the code, so only variable-to-check messages are sent. It is perfectly valid to have check nodes as leaves too, but it may not make much sense, as they will check only one bit. Those bits will then have to be zero.

Iteration 2

Figure 3.4. The messages passed in the second iteration.

In the next iteration all the check nodes are able to send messages to variable node 3. The messages are to represent the probability that the check is satisfied, given that the receiving node is locked at a certain value. So, messages r23 and r33 are easy. Assuming that x3 = 0, check 2 is satisfied when x4 = 0, which has probability q24^0 = 1/5. Similarly, if x3 = 1, check 2 is satisfied when x4 = 1, which has probability q24^1 = 4/5. Therefore r23 = (1/5, 4/5). Similarly, r33 = (3/5, 2/5).

r13 is a bit more difficult. Assuming that x3 = 0, there are two cases where check 1 is satisfied: x1 = x2 = 0 and x1 = x2 = 1. Assuming that

the variables are independent (which they are), we can calculate the probability as r13^0 = q11^0 q12^0 + q11^1 q12^1. r13^1 can be calculated similarly.

Thus, we have the following messages:

r13 = (q11^0 q12^0 + q11^1 q12^1, q11^0 q12^1 + q11^1 q12^0) = (11/25, 14/25)
r23 = q24 = (1/5, 4/5)
r33 = q35 = (3/5, 2/5)

Iteration 3

Figure 3.5. The messages passed in the third iteration.

In the third iteration variable node 3 has incoming messages from all its neighbours. Therefore it is able to send all its messages at once. The calculations are similar, and we will describe how the message q_{13} is determined.

The variable-to-check message q_{13} shall denote the likelihood of the variable x_3, given the a priori likelihood p_3 and the check probabilities r_{23} and r_{33}. We can calculate the likelihood that x_3 = 0 by simply multiplying the incoming beliefs, yielding Pr{x_3 = 0} = p_3^0 r_{23}^0 r_{33}^0. Similarly, Pr{x_3 = 1} = p_3^1 r_{23}^1 r_{33}^1. Note that we are really calculating the joint probabilities of the variable and the event that the checks are satisfied. As the algorithm does not assume that a code word was really sent, these likelihoods do not sum to 1. Most often, mainly for implementation reasons, the messages are normalized so that q_{13}^0 + q_{13}^1 = 1. Thus we get the message

q_{13} = (p_3^0 r_{23}^0 r_{33}^0 / (p_3^0 r_{23}^0 r_{33}^0 + p_3^1 r_{23}^1 r_{33}^1), p_3^1 r_{23}^1 r_{33}^1 / (p_3^0 r_{23}^0 r_{33}^0 + p_3^1 r_{23}^1 r_{33}^1)).

With similar calculations we get the messages q_{23} and q_{33}.

q_{13} = (p_3^0 r_{23}^0 r_{33}^0 / (p_3^0 r_{23}^0 r_{33}^0 + p_3^1 r_{23}^1 r_{33}^1), p_3^1 r_{23}^1 r_{33}^1 / (p_3^0 r_{23}^0 r_{33}^0 + p_3^1 r_{23}^1 r_{33}^1)) = (3/11, 8/11)
q_{23} = (p_3^0 r_{13}^0 r_{33}^0 / (p_3^0 r_{13}^0 r_{33}^0 + p_3^1 r_{13}^1 r_{33}^1), p_3^1 r_{13}^1 r_{33}^1 / (p_3^0 r_{13}^0 r_{33}^0 + p_3^1 r_{13}^1 r_{33}^1)) = (33/61, 28/61)
q_{33} = (p_3^0 r_{13}^0 r_{23}^0 / (p_3^0 r_{13}^0 r_{23}^0 + p_3^1 r_{13}^1 r_{23}^1), p_3^1 r_{13}^1 r_{23}^1 / (p_3^0 r_{13}^0 r_{23}^0 + p_3^1 r_{13}^1 r_{23}^1)) = (11/67, 56/67)
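In the same hypothetical Python sketch, this variable-to-check update is a product of the prior and the other incoming check messages, followed by the normalization described above:

def variable_to_check(n, m, priors, r):
    """q_mn: likelihood of X_n from the prior and all adjacent checks except check m."""
    q0, q1 = priors[n]
    for k in M[n]:
        if k != m:
            q0 *= r[(k, n)][0]
            q1 *= r[(k, n)][1]
    s = q0 + q1                        # normalize so that the two values sum to 1
    return (q0 / s, q1 / s)

for m in M[2]:
    q[(m, 2)] = variable_to_check(2, m, priors, r)
# q[(0, 2)] == (3/11, 8/11), q[(1, 2)] == (33/61, 28/61), q[(2, 2)] == (11/67, 56/67)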

Iteration 4

Figure 3.6. The messages passed in the fourth iteration.

The calculations in the fourth iteration are similar to those in the second. The messages q_{23} and q_{33} can simply be passed on as r_{24} and r_{35}. Also, q_{13} can be combined with q_{12} and q_{11} to calculate r_{11} and r_{12}, respectively.

Similarly to the calculation of r_{13}, r_{11}^0 is the probability that check 1 is satisfied if x_1 = 0, and x_2 and x_3 have distributions according to the messages q_{12} and q_{13}. Then check 1 is satisfied if x_2 = x_3 = 0 or x_2 = x_3 = 1. We still assume that the variables have separable likelihoods, so the event has probability r_{11}^0 = q_{12}^0 q_{13}^0 + q_{12}^1 q_{13}^1. Equivalently, assuming that x_1 = 1, check 1 is satisfied if x_2 = 0, x_3 = 1 or x_2 = 1, x_3 = 0, which has probability r_{11}^1 = q_{12}^0 q_{13}^1 + q_{12}^1 q_{13}^0. The message r_{12} can be calculated similarly.


r_{11} = (q_{12}^0 q_{13}^0 + q_{12}^1 q_{13}^1, q_{12}^0 q_{13}^1 + q_{12}^1 q_{13}^0) = (5/11, 6/11)
r_{12} = (q_{11}^0 q_{13}^0 + q_{11}^1 q_{13}^1, q_{11}^0 q_{13}^1 + q_{11}^1 q_{13}^0) = (7/11, 4/11)
r_{24} = q_{23} = (33/61, 28/61)
r_{35} = q_{33} = (11/67, 56/67)
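In the hypothetical Python sketch, these four messages fall out of the same check-to-variable update applied to the remaining edges:

# Compute the check-to-variable messages on the edges not yet covered (r11, r12, r24, r35).
for m in range(3):
    for n in N[m]:
        if (m, n) not in r:
            r[(m, n)] = check_to_variable(m, n, q)
# r[(0, 0)] == (5/11, 6/11), r[(0, 1)] == (7/11, 4/11),
# r[(1, 3)] == (33/61, 28/61), r[(2, 4)] == (11/67, 56/67)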

Bit decisions

Figure 3.7. The final bit decisions made from the prior likelihoods and the check-to-variable messages.

All the nodes have sent messages to all their neighbours, so now we can use the check-to-variable messages along with the prior variable likelihoods to calculate the posterior likelihoods q_n for every variable. For example, variable node 3 has incoming messages from all three check nodes, which we multiply together with the prior likelihood p_3 to determine the most likely value of x_3.

The calculations are similar to those of the variable-to-check messages, except that all incoming messages are used, as we want to base the belief on the complete graph. Thus we get the following posterior likelihoods:

q_1 = (p_1^0 r_{11}^0 / (p_1^0 r_{11}^0 + p_1^1 r_{11}^1), p_1^1 r_{11}^1 / (p_1^0 r_{11}^0 + p_1^1 r_{11}^1)) = (5/29, 24/29)
q_2 = (p_2^0 r_{12}^0 / (p_2^0 r_{12}^0 + p_2^1 r_{12}^1), p_2^1 r_{12}^1 / (p_2^0 r_{12}^0 + p_2^1 r_{12}^1)) = (21/29, 8/29)
q_3 = (p_3^0 r_{13}^0 r_{23}^0 r_{33}^0 / (p_3^0 r_{13}^0 r_{23}^0 r_{33}^0 + p_3^1 r_{13}^1 r_{23}^1 r_{33}^1), p_3^1 r_{13}^1 r_{23}^1 r_{33}^1 / (p_3^0 r_{13}^0 r_{23}^0 r_{33}^0 + p_3^1 r_{13}^1 r_{23}^1 r_{33}^1)) = (33/145, 112/145)
q_4 = (p_4^0 r_{24}^0 / (p_4^0 r_{24}^0 + p_4^1 r_{24}^1), p_4^1 r_{24}^1 / (p_4^0 r_{24}^0 + p_4^1 r_{24}^1)) = (33/145, 112/145)


q_5 = (p_5^0 r_{35}^0 / (p_5^0 r_{35}^0 + p_5^1 r_{35}^1), p_5^1 r_{35}^1 / (p_5^0 r_{35}^0 + p_5^1 r_{35}^1)) = (33/145, 112/145)

For every variable we now select the most likely bit, i.e. we set x̂_n = 1 if and only if q_n^1 > q_n^0. This gives us the decoded word x̂ = (1, 0, 1, 1, 1), which is exactly what was sent.
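Continuing the hypothetical Python sketch, the posterior computation and the hard decisions can be written as follows (it is assumed that the check-to-variable messages r have been computed for every edge, as above):

def posterior(n, priors, r):
    """Posterior likelihood of X_n from the prior and all incoming check-to-variable messages."""
    q0, q1 = priors[n]
    for m in M[n]:
        q0 *= r[(m, n)][0]
        q1 *= r[(m, n)][1]
    s = q0 + q1
    return (q0 / s, q1 / s)

x_hat = []
for n in range(5):
    p0, p1 = posterior(n, priors, r)
    x_hat.append(1 if p1 > p0 else 0)
# x_hat == [1, 0, 1, 1, 1], i.e. the sent code word is recovered.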

It is not obvious that the algorithm really results in a valid code word, nor that the result is the most likely one. No proof is given here; instead, the interested reader is referred to Wiberg's thesis [14].

3.2.2 The decoding algorithm

We will now state the formal updating rules for the sum-product algorithm. However, we will use a slightly different approach than what is common, and give a direct translation from the message descriptions to the updating rules. First we need to observe that the messages along a path form a Markov chain: if C precedes B, which precedes A, then Pr{A | B, C} = Pr{A | B}. This is no stranger than the statement that the value of a variable depends only on the arriving messages and not on any earlier messages; after all, the information in the earlier messages should be contained in the later messages along the same path.

As is usual in statistics, we will denote stochastic variables by upper-case letters. Thus the code word is the vector X = (X_1, ..., X_N). For all variables the prior likelihoods p_n^a = Pr{X_n = a}, a ∈ {0, 1}, are given. We also define the checks C = (C_1, ..., C_M), where each check C_m is the modulo-2 sum of all its adjacent variables:

C_m = \sum_{n \in N(C_m)} X_n
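As a quick sanity check in the hypothetical Python sketch, the sent code word of the example satisfies all three checks:

x = [1, 0, 1, 1, 1]                                  # the sent code word of the example
syndrome = [sum(x[n] for n in N[m]) % 2 for m in range(3)]
assert syndrome == [0, 0, 0]                         # every check C_m evaluates to zero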

We begin by calculating the check-to-variable messages r_{mn}^a. The involved quantities are shown in figure 3.8. As defined earlier, r_{mn}^a is the probability that check m is satisfied, given that X_n = a and the other neighbours have distributions according to their respective messages. We write this as

r_{mn}^a = Pr{C_m = 0 | X_n = a, N(C_m) \ X_n}

where we use the neighbour notation directly on the nodes to simplify the expressions somewhat. We give the neighbours the explicit names X_{n_1}, ..., X_{n_i} and recognize that, because of the Markov chain property, we can add the neighbours of these variables without altering the value of the expression. This gives us

r_{mn}^a = Pr{C_m = 0 | X_n = a, X_{n_1}, ..., X_{n_i}, M(X_{n_1}) \ C_m, ..., M(X_{n_i}) \ C_m}

Marginalizing this expression gives us the sum over all assignments of X_{n_1}, ..., X_{n_i} that satisfy the check together with X_n = a, where each assignment is weighted by the product of the corresponding incoming message values.
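As a hedged, brute-force illustration of this marginalization (continuing the hypothetical Python sketch from the example; the enumeration below is my own illustration, not a formula from the thesis), every assignment of the other neighbours that satisfies the check together with X_n = a contributes the product of the corresponding incoming message values:

from itertools import product

def check_to_variable_marginal(m, n, q):
    """Brute-force marginalization of r_mn over the other neighbours of check m."""
    others = [k for k in N[m] if k != n]
    r_mn = [F(0), F(0)]
    for a in (0, 1):                                  # hypothesized value of X_n
        for bits in product((0, 1), repeat=len(others)):
            if (a + sum(bits)) % 2 == 0:              # check m is satisfied
                term = F(1)
                for k, b in zip(others, bits):
                    term *= q[(m, k)][b]              # weight by the incoming messages
                r_mn[a] += term
    return (r_mn[0], r_mn[1])

# check_to_variable_marginal(0, 2, q) == (11/25, 14/25), matching r13 in the example.

This enumeration is exponential in the check degree; the pairwise xor_combine rule used in the earlier sketch computes the same quantity with a number of operations linear in the degree.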
