Approximate Inference Low-Complexity MIMO Detection


Degree Project in Electrical Engineering, Second Cycle, 30 credits
Stockholm, Sweden 2019

JOÃO MIGUEL FERNANDES TORRES

KTH ROYAL INSTITUTE OF TECHNOLOGY


Supervisors:

Dong Liu, Ph.D. Student

Division of Information Science and Engineering, Kungliga Tekniska Högskolan


Abstract

In the telecommunications field, the correct detection of a certain signal is of utmost interest, as it improves the robustness of the system. With multiple-input multiple-output (MIMO) antenna systems a new problem arises: the complexity of the detection algorithms increases with the number of antennas.

This thesis consists of an overview of the classical detection algorithms and of the most recent ones derived from statistical learning. Starting with the optimal approach, the Maximum Likelihood detector, whose solution minimizes the error rate, is formally presented along with its scalability problem. To solve this, and to enable detection in systems of higher order, other sub-optimal receivers have been developed over the years. The Zero-Forcing and MMSE detectors are two examples of these and, even though they present an attractive simplicity, they carry a poor performance.


Sammanfattning

In telecommunications, correct detection of signals is important because it results in a more robust system. In communication with multiple antennas, so-called multiple-input multiple-output (MIMO) systems, a new problem arises, as the complexity of the detection algorithms grows with the number of antennas.

This thesis consists of an overview of classical and recently developed detection algorithms based on statistical learning. We start from the optimal approach, the so-called maximum-likelihood detector, which minimizes the error probability. This algorithm is presented together with the problems that arise as the number of antennas in the system grows. To address this problem, several sub-optimal detectors have been developed over time. Two examples are the zero-forcing and MMSE detectors which, despite their advantageously simple design, lead to relatively poor performance.


Acknowledgements

First and foremost I would like to thank my supervisor Dong Liu for all his help, for his promptness in clarifying any doubts and for guiding me throughout this work.

As this work was done with Huawei Technologies, I am grateful for the opportunity to join a team of talented people and be part of a company of renowned prestige in the telecommunications field. Of all the people I met during this project, I would like to give a special thank you to Nima Moghadam, who acted as a second supervisor to my work and whose kindness, availability, patience and knowledge were of extreme importance to the development of this work; I hope I get the chance to work with him in the future.

Furthermore, I wish to thank my colleague Tomás Costa, who offered to act as an opponent to this thesis, and my colleagues who shared the second-floor laboratory with me, because they made the start of something new, and sometimes scary, feel very natural and simple with their friendliness and coffee. Finally, I thank my family, friends and Júlia for all their support, because sometimes all you need is a confidence boost to believe in yourself.


Contents

List of Figures

Mathematical Notations

Acronyms

1 Introduction
  1.1 Historical Overview
  1.2 Outline
    1.2.1 Chapter 2
    1.2.2 Chapter 3
    1.2.3 Chapter 4
    1.2.4 Chapter 5

2 MIMO Detection
  2.1 System Model
    2.1.1 QAM Energy
    2.1.2 Real-Valued System Model
  2.2 Maximum Likelihood Detection
  2.3 Sphere Decoding
  2.4 Simulation Specifications
  2.5 Performance Results

3 Linear Low-Complexity Receivers and Successive Interference Cancellation
  3.1 Zero-Forcing
  3.2 Minimum Mean Square Error
  3.3 V-BLAST
  3.4 Performance Results

4 Expectation Propagation Detector
  4.1 Expectation Propagation
    4.1.1 EP Detector
  4.2 Alpha-Divergence
    4.2.1 α-EP
    4.2.2 Power EP Detector
  4.3 Performance Results

5 Inference over Graphical Models
  5.1 Factor Graphs
  5.2 Sum-Product Algorithm
  5.3 Gaussian Tree Approximation
    5.3.1 Gaussian Tree Construction
    5.3.2 BP over Tree Graphs
    5.3.3 GTA Performance Results
  5.4 Alpha-BP
    5.4.1 Prior Information
    5.4.2 α-BP detector
    5.4.3 Alpha-BP Performance Results

6 Conclusions

A Prim's Algorithm


List of Figures

1.1 MIMO system.

2.1 MIMO channel with n inputs and m outputs.
2.2 Squared QAM constellations.
2.3 MLD tree representation.
2.4 Generic SD block diagram.
2.5 Tree representation of sphere decoder with K = 2 and R = 4.
2.6 Maximum Likelihood Detector performance for scenario 1.
2.7 K-best SE Sphere Decoder performance for scenario 2.
2.8 K-best SE Sphere Decoder performance for scenario 3.
2.9 K-best SE Sphere Decoder performance for scenario 4.

3.1 ZF, MMSE and MMSE-SIC detector performance for scenario 1.
3.2 ZF, MMSE and MMSE-SIC detector performance for scenario 2.
3.3 ZF, MMSE and MMSE-SIC detector performance for scenario 3.
3.4 ZF, MMSE and MMSE-SIC detector performance for scenario 4.

4.1 Iterative algorithm illustration.
4.2 EPD and PEPD (α = 0.4) performance for scenario 1.
4.3 EPD and PEPD with combinatorial α performance for scenario 1.
4.4 EPD performance for scenario 2.
4.5 EPD performance for scenario 3.
4.6 EPD performance for scenario 4.

5.1 Top: factor graph of the function in equation (5.3). Bottom: factor graph of the function in equation (5.4).
5.2 Messages in a cycle-free graph.
5.3 GTA detector performance for scenario 1.
5.4 GTA detector performance for scenario 2.
5.5 GTA detector performance for scenario 3.
5.6 GTA detector performance for scenario 4.
5.7 Example of a Factor Graph for MIMO systems.
5.8 Example of a Factor Graph for MIMO systems with prior information factors.
5.9 α-BP detector performance for scenario 1.
5.10 α-BP detector performance for scenario 2.

A.1 Example of a graph. Left set: nodes belonging to the tree. Right set: nodes outside the tree.
A.2 Pick the root node and include it in the left set. Example: root node equal to u1.
A.3 Pick the edge linking to node 1 that has the highest weight. Include u3 in the left set.
A.4 Repeat, this time for all edges linking to nodes 1 and 3. Include u4.
A.5 Repeat and include u2.


Mathematical Notations

In order to achieve uniformity throughout this thesis and to avoid any possible confusion, the mathematical notation is now presented. Scalars are represented by plain characters, e.g. a. Vectors are represented by a lower-case bold character, e.g. x, and matrices by an upper-case character in bold, e.g. X. Calligraphic uppercase letters denote sets, e.g. A.

x^T, X^T    Transpose of a vector, x, or a matrix, X.
x^H, X^H    Hermitian transpose of a vector, x, or a matrix, X.
X†          Moore-Penrose inverse.
||x||       Euclidean norm of a vector x.
diag(X)     Column vector with the diagonal elements of X.
diag(x)     Diagonal matrix with elements given by the elements of vector x.
A^m         m-th Cartesian product, A × A × ... × A.
|A|         The cardinality of a finite set A.
Re(a)       Real part of a number a.
Im(a)       Imaginary part of a number a.
R, C        Sets of real and complex numbers.
≜           Equal by definition.
≈           Approximately equal.
∝           Proportional to.


Acronyms

BP Belief Propagation.

DF Decision Feedback.

EP Expectation Propagation.

FG Factor Graph.

GTA Gaussian Tree Approximation.

KL Kullback-Leibler.

LDPC Low-density parity-check.

LS Least-Squares.

MIMO Multiple-Input Multiple-Output.

ML Maximum Likelihood.

MMSE Minimum Mean Square Error.

MRF Markov Random Field.

MST Maximum Spanning Tree.

NP-hard Non-deterministic Polynomial hard.

PAM Pulse Amplitude Modulation.

PEP Power Expectation Propagation.

QAM Quadrature Amplitude Modulation.

SD Sphere Decoder.


SER Symbol Error Rate.

SIC Successive Interference Cancellation.

SNR Signal-To-Noise Ratio.

V-BLAST Vertical Bell Laboratories Space-Time.


Chapter 1

Introduction

Fig. 1.1: MIMO system

In modern communication systems, one problem under intensive study is that of receiver design. The received signal is characterized as the sum of the original transmitted signal and noise arising from different components. Because the received signal is altered, errors arise at detection, and the design of a receiver that minimizes this number of errors is desired.

The design of such an optimal receiver attains high computational complexity, and for this reason there is a need for finding simpler, although sub-optimal, receivers. The optimal receiver is the Maximum Likelihood (ML) detector, whose detection is done by exhaustive search among all possible solutions. For the studied problem of joint detection of symbols transmitted over multiple antennas, it is known that, in most cases, this leads to solving a Non-deterministic Polynomial hard (NP-hard) problem [1]: a problem for which no known algorithm with polynomial complexity can find the solution. In fact, the complexity of solving this problem increases exponentially with the number of symbols to be jointly detected, and it is not computationally feasible to find the solution. To demonstrate how prohibitive this can be, consider the example of a massive Multiple-Input Multiple-Output (MIMO) scenario where 64 antennas are each transmitting one signal and each signal can take 32 different values; this would lead to a search among 32^64 ≈ 10^96 possible values. The number of hydrogen atoms in the observable universe is approximately 10^80, and it makes finding a needle in a haystack seem easy.

1.1 Historical Overview

The impossibility of solving this problem in finite time with the resources available today led to the appearance of a number of sub-optimal, less complex algorithms that can be found in the literature. A simple approach to this problem is the Zero-Forcing (ZF) algorithm [2], which is based on a linear decision that ignores the constraint of the signal belonging to a finite set. Then, ignoring the correlation between symbols, it determines the closest point in the set for each symbol. This detector's performance is poor due to noise amplification; however, the noise can be reduced and the performance improved by means of the Minimum Mean Square Error (MMSE) Bayesian estimator [3]. This detector partially incorporates the knowledge that the solution belongs to a finite set by introducing a Gaussian prior distribution. Similarly to the ZF detector, it determines the finite-set solution by picking the closest point to each component. The aforementioned receivers are known as linear receivers, as they apply linear transformations to the received signals in order to perform symbol detection.

Another, more sophisticated class of detectors, which vastly improves the performance, is that of detection incorporating Decision Feedback (DF) or Successive Interference Cancellation (SIC). These detectors work in a sequential fashion: they first detect one symbol and then subtract its contribution from the remaining symbols before proceeding to detect another symbol. This is the case for ZF-SIC and MMSE-SIC. However, the order in which each symbol is detected affects the performance of these receivers. One example of these algorithms is known as MMSE-SIC with optimal ordering [4], and even though the performance is improved, there is still a big gap to the performance of the ML detector.

Several attempts (detection algorithms) at reaching the performance offered by the ML detector have been developed throughout the years. This is the case for the Sphere Decoder (SD) algorithm applied to the MIMO detection problem, which gained special attention in the early 2000s; different works can be found in [5–7]. This detector finds the ML solution by searching for the nearest lattice point, and although it offers lower computational complexity than that of the ML decoder, it is not a feasible solution for high-order cases such as the ones previously mentioned.

These detectors constitute an important part of this work; hence they are described and further explored in the next chapters.

1.2 Outline

This section outlines the various chapters composing this thesis. In this work, reproductions of results along with modifications to existing detectors are presented. It should be noted that the work is solely focused on evaluating performance in terms of correctly detected symbols (SER), aiming at increased performance for these low-complexity detectors.

1.2.1 Chapter 2

This chapter serves the purpose of introducing the reader to the underlying principles necessary for full comprehension of the work presented in the next chapters, namely the linear MIMO channel model and some considerations regarding QAM signal energy. The MIMO detection problem is formalized, and the ML detector and the SD are introduced along with their performance results for the relevant scenarios. These constitute the lower bound on system performance for the rest of this work.

1.2.2 Chapter 3

In this chapter the linear low-complexity decoders found in the literature are further detailed. Namely, these linear receivers are Zero-Forcing (ZF), Minimum Mean Square Error (MMSE) and the latter's improvement with the Vertical Bell Laboratories Space-Time (V-BLAST) architecture. Performance results for these different detectors are presented, and the MMSE results constitute an upper bound on the error.

1.2.3 Chapter 4

In this chapter the EP algorithm is explained and its application to MIMO systems is studied. Further, the concepts of α-divergence and Power Expectation Propagation (PEP) are presented along with the necessary changes to the EP detector in order to obtain the PEP detector. The results obtained with these detectors are also presented.

1.2.4 Chapter 5


Chapter 2

MIMO Detection

This chapter serves the purpose of introducing the reader to the underlying elements necessary for the understanding of this work. Namely, the system model for the linear MIMO channel used throughout the work will be presented.

Furthermore, the MIMO detection problem and the ML detector, along with the SD, will be formalized. The results for these two detectors, for the cases of interest, are presented at the end of the chapter, and, they constitute the baseline for the following chapters.

2.1 System Model

The system model described in this section is a general abstract model that is used in a variety of different scenarios, as long as the assumptions supporting these different cases hold. The linear MIMO channel represented in Figure 2.1 has at its core the fundamental idea that n symbols, u_1, ..., u_n, are transmitted. These transmitted signals belong to a finite complex alphabet, A, and at the receiver m signals, y_1, ..., y_m, are received as a linear combination of the n input signals plus noise.

Various assumptions are to be considered and will be presented later; however, one that quickly needs to be assured is that the number of transmitted signals is less than or equal to the number of received signals, n ≤ m. This assumption is pivotal in the development of MIMO receivers, so that the set of equations modeling the channel is not underdetermined. The linear MIMO channel is modeled as

y = Hu + w,  (2.1)

where H ∈ C^{m×n} is called the channel matrix and w ∈ C^m is the additive noise. The symbols are transmitted over this channel, which varies over time and according to the scenario. However, the channel is assumed to be deterministic, and perfect knowledge of the channel at the receiver is assumed; hence, the channel estimation problem is left out of this work. With this stated, the channel coefficients, h_ij, are considered to be well determined values taken from a complex zero-mean unit-variance Gaussian distribution, N(0, 1). This is known as the flat-fading complex MIMO channel.

Fig. 2.1: MIMO channel with n inputs and m outputs.

The noise is modeled as a zero-mean, circularly symmetric complex Gaussian with variance σ_w^2. This assumption has the premise that the noise comes from many different sources and, according to the central limit theorem, its distribution tends to a Gaussian as the number of components increases. The vector of transmitted symbols is an n × 1 i.i.d. (independent and identically distributed) vector, u = [u_1, u_2, ..., u_n]^T = a + jb ∈ A^n, with each component u_i = a_i + jb_i ∈ A. Similarly, the received signal and noise vectors are denoted y = [y_1, y_2, ..., y_m]^T and w = [w_1, w_2, ..., w_m]^T.
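As a concrete illustration, the following is a minimal NumPy sketch of one realization of this model; the 4-QAM alphabet, dimensions and variable names are illustrative assumptions, not taken from the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m = 4, 4                                   # transmit / receive antennas
    # unit-energy 4-QAM alphabet (illustrative)
    A = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    u = rng.choice(A, size=n)                     # transmitted symbol vector
    # flat-fading channel: complex zero-mean unit-variance Gaussian entries
    H = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    sigma2_w = 0.1                                # noise variance
    w = np.sqrt(sigma2_w / 2) * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    y = H @ u + w                                 # received signal, eq. (2.1)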

The detector has the function of determining an estimate of the transmitted message, û, given the channel matrix and the received signal, H and y, respectively. The correct detection of the received symbols varies with the so-called Signal-To-Noise Ratio (SNR), measured in dB, which is defined as

SNR = 10 log_10( n E_s / σ_w^2 ),  (2.2)

where E_s is the signal energy.


2.1.1 QAM Energy

QAM is a two-dimensional generalization of Pulse Amplitude Modulation (PAM). An M-th order PAM constellation corresponds to a message symbol being modulated onto a set of signals with M different amplitude values in the constellation. The common basis function used in PAM is

φ_1(t) = (1/√T) sinc(t/T),  (2.3)

a unit-energy pulse with time parameter T. Note that any other function with unit energy could be used. The amplitudes are then ±d/2, ±3d/2, ..., ±(M−1)d/2, where d refers to the minimum distance between points in the constellation. The average energy of a PAM constellation is

E_PAM = E_{s,PAM} = (d^2/12)(M^2 − 1).  (2.4)

Taking the same unit-energy function, the QAM basis is

φ_1(t) = √(2/T) sinc(t/T) cos(ω_c t),   φ_2(t) = −√(2/T) sinc(t/T) sin(ω_c t).  (2.5)

Squared M-QAM constellations, such as the ones illustrated in Figure 2.2, are the result of the Cartesian product of two √M-PAM constellations. The points in these constellations are placed as before, but this time in both dimensions. With this idea in mind, the average energy of the M-QAM constellation is

E_{M-QAM} = 2 E_{√M-PAM} = (d^2/6)(M − 1),  (2.6)

and thus the energy per dimension is

E_{s,MQAM} = E_{M-QAM}/2 = (d^2/12)(M − 1).  (2.7)

As can be seen in Figure 2.2, for squared QAM constellations the minimum distance between symbols is equal to 2, and the expression above turns into

E_{s,MQAM} = (1/3)(M − 1).  (2.8)
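As a quick numerical check of equation (2.8), the following sketch builds the √M-PAM amplitudes with d = 2 and compares the per-dimension energy with (M − 1)/3 (the value M = 16 is illustrative):

    import numpy as np

    M = 16                                             # constellation order
    pam = np.arange(-(np.sqrt(M) - 1), np.sqrt(M), 2)  # sqrt(M)-PAM amplitudes, d = 2
    Es = np.mean(pam**2)                               # energy per dimension
    print(Es, (M - 1) / 3)                             # both print 5.0, eq. (2.8)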


Fig. 2.2: Squared QAM constellations.

2.1.2 Real-Valued System Model

To reduce the complexity of the formulation of the following algorithms, the complex-valued system model presented before is converted into a real-valued one. This new model is a double-sized version where the real, Re(·), and imaginary, Im(·), parts are considered separately. According to this model, we define the new transmitted symbol vector as

ũ = [a b]^T.

Following the same structure, the received symbol and noise vectors are defined as ỹ = [Re(y) Im(y)]^T and w̃ = [Re(w) Im(w)]^T. The channel matrix is then

H̃ = [ Re(H)  −Im(H)
       Im(H)   Re(H) ].  (2.9)

The system model of equation (2.1) is then written as

ỹ = H̃ũ + w̃.  (2.10)

The new alphabet is represented by Ã, which constitutes the alphabet for the real and imaginary components of the M-QAM signal, with energy Ẽ_s = E_s/2.

In addition, two more remarks are to be considered. The first is that the constellation order M is considered to be a square number, meaning a squared constellation, as this eases the software implementation of the different detection algorithms: no dependency needs to be considered when treating the symbols' real and imaginary parts. The second remark is that, in order to achieve uniformity throughout the work, the system model presented in this chapter is considered for the whole work, and, to ease the reading, the ˜· notation will be dropped throughout this report.
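A minimal sketch of the conversion of equations (2.9)–(2.10) follows; the function name is illustrative.

    import numpy as np

    def to_real_model(H, y):
        # stack real and imaginary parts so that y_r = H_r u_r + w_r, eq. (2.10)
        H_r = np.block([[H.real, -H.imag],
                        [H.imag,  H.real]])
        y_r = np.concatenate([y.real, y.imag])
        return H_r, y_r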

2.2 Maximum Likelihood Detection

As previously mentioned, the receiver's purpose is to determine an estimate of the transmitted message, û, given the channel matrix and the received signal, H and y. Under the assumptions previously referred to, the optimal detector minimizing the error probability,

P_e ≜ P(û ≠ u),  (2.11)

is the Maximum Likelihood Detector (MLD). Under the Gaussian noise condition, minimizing the error probability in the maximum likelihood sense is equivalent to

min_{ũ ∈ A^n} ||y − Hũ||^2.  (2.12)

The minimization problem can be solved by searching over all possible transmitted vectors, making it necessary, in this case, to compute the functional for all |A|^n different values of u. As shown in [13], the problem of equation (2.12) can be transformed into an equivalent one by resorting to the QL decomposition of the channel matrix. The new minimization problem becomes

min_{ũ ∈ A^n} ||ỹ − Lũ||^2,  (2.13)

where ỹ ≜ Q^T y. With this substitution, finding the solution can be viewed as a tree search, as represented in Figure 2.3.


Fig. 2.3: MLD tree representation.

The solution to this constrained problem is trivial for small MIMO configurations; however, the cost of finding it scales exponentially with the number of transmitting antennas, n, which is a problem in massive MIMO scenarios.
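A brute-force sketch of the search in equation (2.12), assuming the real-valued model of Section 2.1.2 (function and variable names are illustrative; only feasible for small n and |A|):

    import numpy as np
    from itertools import product

    def ml_detect(H, y, alphabet):
        best_u, best_metric = None, np.inf
        for cand in product(alphabet, repeat=H.shape[1]):  # all |A|^n hypotheses
            u = np.array(cand)
            metric = np.sum((y - H @ u) ** 2)              # eq. (2.12)
            if metric < best_metric:
                best_u, best_metric = u, metric
        return best_u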

2.3 Sphere Decoding

The design of ML decoders using number-theoretic tools for searching for the closest lattice point has seen increasing development due to its MIMO applications. These decoders are known as Sphere Decoders (SD), and the attention they have received revolves around their significant performance gain over other sub-optimal decoders and their average polynomial complexity for a wide range of system parameters: the SNR, n, m and M. The purpose of assessing the performance of this detection algorithm in this work is to provide a lower-bound estimate on the error probability for the system configurations where the complexity of the optimal ML decoder is too high, restricting its use.


Fig. 2.4: Generic SD block diagram.

The implemented algorithm is the K-best SD using the Schnorr-Euchner (SE) searching strategy shown in [6]. As seen in the literature, SD-type algorithms comprise three different structures, as shown in Figure 2.4.

• Pre-processing unit: the channel matrix H is inverted, H^{-1}; the QR decomposition of H is performed, H = QR, and L = R^{-1}, a lower-triangular matrix, is calculated.

• Pre-decoding unit: an initial estimate of the transmitted message is obtained with the ZF decoder, z = H^{-1}y.

• Decoding unit: corresponds to the K-best SE sphere decoder.

K-Best SE Sphere Decoder Algorithm

The maximum number of paths to be searched, K, and the radius of the sphere, C, are first chosen by the user.

Initialization:

e_1 = z  (2.14)
bestdist = 0  (2.15)
k = n  (2.16)

1. For i = 1, ..., length(bestdist), compute, for every a ∈ A,

   ũ_{t,k} = a,  (2.17)
   ỹ_t = (e_{i,k} − ũ_{t,k}) l_{k,k},  (2.18)
   newdist_t = bestdist_i + ỹ_t^2,  (2.19)

   with t = 1, 2, ..., T, where T = length(bestdist) · length(A).

2. Sort the variable newdist in ascending order. Choose the K best paths whose values newdist_t are lower than C. Discard the remaining paths and adjust Ũ and ỹ accordingly.

3. For i = 1, 2, ..., length(bestdist), calculate

   e_{i,j} = e_{i,j} − ỹ_i · l_{k,j},   j = 1, 2, ..., k − 1.  (2.20)

4. If k ≠ 1, set k := k − 1 and return to step 1. Otherwise, stop and return the first row of the Ũ matrix corresponding to the path with the shortest metric newdist, which corresponds to the estimate û.
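For illustration, the following is a generic breadth-first K-best sketch of the same tree-search idea over the QR-rotated problem min ||ỹ − Rũ||^2; it is not the exact SE bookkeeping of steps 1–4 above, and the radius test on C is omitted (all names are illustrative).

    import numpy as np

    def kbest_detect(H, y, alphabet, K):
        n = H.shape[1]
        Q, R = np.linalg.qr(H)                 # H = QR, R upper triangular
        y_t = Q.T @ y                          # rotated observation
        paths = [(0.0, [])]                    # (partial metric, symbols of layers k..n-1)
        for k in range(n - 1, -1, -1):         # detect the last symbol first
            children = []
            for metric, syms in paths:
                for a in alphabet:
                    new = [a] + syms           # prepend the symbol of layer k
                    r = y_t[k] - R[k, k:] @ np.array(new)
                    children.append((metric + r**2, new))
            children.sort(key=lambda c: c[0])
            paths = children[:K]               # keep only the K best partial paths
        return np.array(min(paths, key=lambda p: p[0])[1])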

Fig. 2.5: Tree representation of sphere decoder with K=2 and R=4.


2.4 Simulation Specifications

All simulations in this work are performed under the assumptions presented in the previous sections. The number of channel realizations is equal to 5000, and this number is kept whenever further simulations occur, to achieve uniformity and coherence in the results. The studied scenarios are uniform for the entire work, and they are:

• Scenario 1: Number of transmitters (n) = 4, number of receivers (m) = 4 and complex constellation size (M) = 4.

• Scenario 2: Number of transmitters (n) = 4, number of receivers (m) = 4 and complex constellation size (M) = 16.

• Scenario 3: Number of transmitters (n) = 12, number of receivers (m) = 12 and complex constellation size (M) = 16.

• Scenario 4: Number of transmitters (n) = 12, number of receivers (m) = 12 and complex constellation size (M) = 64.
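For convenience, the same four scenarios can be written as a small configuration table (a sketch; the structure and names are illustrative):

    SCENARIOS = [
        {"n": 4,  "m": 4,  "M": 4},    # scenario 1
        {"n": 4,  "m": 4,  "M": 16},   # scenario 2
        {"n": 12, "m": 12, "M": 16},   # scenario 3
        {"n": 12, "m": 12, "M": 64},   # scenario 4
    ]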

2.5 Performance Results


Fig. 2.7: K-best SE Sphere Decoder performance for scenario 2.


Fig. 2.9: K-best SE Sphere Decoder performance for scenario 4.

Two conclusions are drawn from these results. The first is that, for the same constellation size, increasing the number of transmitting antennas reduces the Symbol Error Rate; this can be observed in Figures 2.6 and 2.7 by noticing that SER = 10^-3 is reached at ≈ 23 dB and ≈ 19.5 dB, respectively. The other conclusion is that, for the same number of transmitting antennas, increasing the constellation order increases the SER, as can be verified by observing the results in Figures 2.8 and 2.9.


Chapter 3

Linear Low-Complexity Receivers and Successive Interference Cancellation

Finding the optimal solution requires an exhaustive search over all possible solutions, which is limiting in several MIMO detection scenarios. For instance, when n = 12, m = 12 and M = 64, the number of different transmitted vectors is 64^12 ≈ 4.7 · 10^21, prohibiting its use. The emergence of receivers with lower complexity, capable of solving this problem in a smaller computational time, resulted in a trade-off between accuracy and complexity. In this chapter the simplest linear receivers, along with their non-linear Successive Interference Cancellation (SIC) versions, will be described and analysed.

3.1 Zero-Forcing

The simplest detector is the Zero-Forcing (ZF) detector. The name derives from its relaxation of the constrained problem in (2.12), solving the simpler Least-Squares (LS) problem

min_{ũ ∈ R^n} ||y − Hũ||^2,  (3.1)

and then mapping the unconstrained LS solution to the closest point in A^n. The solution to (3.1) is thus

G = H†,  (3.2)
u_ZF = Gy,  (3.3)

where G is known as the equalization matrix and H† denotes the Moore-Penrose pseudoinverse of the channel matrix H. In cases where the channel matrix is square and invertible, the pseudoinverse becomes H† = H^{-1}. For the remaining cases, where n < m and the number of linearly independent columns in H is equal to n,

H† = (H^H H)^{-1} H^H.

The solution u_ZF does not necessarily belong to the constellation space; thus, the estimated symbol vector û is determined component-wise, and

û = Q(u_ZF)_A,  (3.4)

where Q(·)_A represents the quantization function finding the closest symbol in the symbol constellation A. This solution, even though simple, leads to poor performance due to noise amplification when the matrix H is ill-conditioned.

3.2 Minimum Mean Square Error

In order to lower the noise enhancement of ZF detection and reduce the receiver's sensitivity to noise, the Minimum Mean Square Error (MMSE) detector is introduced. The idea is to minimize the mean square error E[ ||Gy − u||^2 ]. The solution to this minimization problem is

G = ( H^H H + (σ_w^2/E_s) I )^{-1} H^H,  (3.5)
u_MMSE = Gy.  (3.6)

Similarly to the ZF solution, the MMSE solution u_MMSE might not belong to the constellation. Equation (3.7) is the result of mapping the MMSE solution to the constellation A^n:

û = Q(u_MMSE)_A.  (3.7)
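A minimal sketch of both linear detectors in the real-valued model (illustrative names; pass noise_var for MMSE, leave it as None for ZF):

    import numpy as np

    def linear_detect(H, y, alphabet, noise_var=None, Es=1.0):
        n = H.shape[1]
        if noise_var is None:
            G = np.linalg.pinv(H)                          # ZF, eqs. (3.2)-(3.3)
        else:                                              # MMSE, eq. (3.5)
            G = np.linalg.solve(H.T @ H + (noise_var / Es) * np.eye(n), H.T)
        z = G @ y                                          # unconstrained estimate
        # component-wise quantization to the closest point, eq. (3.4)/(3.7)
        return alphabet[np.argmin(np.abs(z[:, None] - alphabet[None, :]), axis=1)]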

3.3 V-BLAST

Vertical Bell Laboratories Space-Time (V-BLAST) architecture detectors, first described in [14], make use of linear nulling techniques (ZF or MMSE) and of a non-linear technique, symbol cancellation. This iterative algorithm can be summarized in the four steps below.

1. Ordering: choose the best channel.
2. Nulling: apply a linear transformation (ZF or MMSE).
3. Detection: make a symbol decision.
4. Cancellation: subtract the effects of the detected symbol.

Secondly, the symbol cancellation comprises the subtraction of the effect of the detected symbol from the received signal vector. This results in fewer interferers when detecting the next symbol.

The performance of the algorithm varies with the order in which the symbol effects are removed. The reasoning behind this is that, when detecting a certain symbol u_i, the subset to which it belongs differs, and consequently so do the rows of the nulling matrix to which the detection is constrained. As an example, with the number of transmitters equal to 4, determining u_1 first, subject to u_2, u_3 and u_4, is different from determining u_3 first, subject to u_1, u_2 and u_4. The best symbol to detect is determined according to its certainty level, meaning the symbol with the highest value in the channel matrix diagonal.

V-BLAST Algorithm

1. Determine the nulling matrix G_1 according to equation (3.2) or (3.5).

Recursion:

2. Choose the optimal channel to remove by calculating

   k_l = argmin_j ||(G_l)_j||^2,  (3.8)
   w_{k_l} = (G_l)_{k_l},  (3.9)

   where (·)_{k_l} denotes the k_l-th row of a matrix.

3. Perform symbol detection:

   u_{SIC,k_l} = w_{k_l}^T y_l,  (3.10)
   û_{k_l} = Q(u_{SIC,k_l})_A.  (3.11)

4. Remove the effect of the already detected symbol and update y and H:

   y_{l+1} = y_l − û_{k_l} (H_l)_{k_l},  (3.12)
   H_{l+1} = H_l \ k_l,  (3.13)

   where H_l \ k_l denotes nulling column k_l of the H_l matrix.
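A compact sketch of this recursion with ZF nulling follows (illustrative names; the MMSE variant would swap the pseudoinverse for the matrix of equation (3.5)):

    import numpy as np

    def vblast_zf_sic(H, y, alphabet):
        H, y = H.astype(float).copy(), y.astype(float).copy()
        remaining = list(range(H.shape[1]))     # indices of undetected symbols
        u_hat = np.zeros(len(remaining))
        while remaining:
            G = np.linalg.pinv(H)                          # nulling matrix
            k = int(np.argmin(np.sum(G**2, axis=1)))       # ordering, eq. (3.8)
            z = G[k] @ y                                   # detection, eq. (3.10)
            s = alphabet[np.argmin(np.abs(alphabet - z))]  # quantize, eq. (3.11)
            u_hat[remaining[k]] = s
            y = y - s * H[:, k]                            # cancellation, eq. (3.12)
            H = np.delete(H, k, axis=1)                    # eq. (3.13)
            del remaining[k]
        return u_hat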


3.4 Performance Results

In this section, the performance results of the ZF, MMSE and MMSE-SIC detectors are presented.

Fig. 3.1: ZF, MMSE and MMSE-SIC detectors performance for scenario 1.


Fig. 3.3: ZF, MMSE and MMSE-SIC detectors performance for scenario 3.


Chapter 4

Expectation Propagation Detector

Finding the true posterior of a certain distribution attains high complexity and, in certain cases such as MIMO detection, is intractable. Building tractable approximations to these distributions is a problem tackled in Bayesian approximate inference. Throughout this chapter the Expectation Propagation (EP) algorithm, along with its extension with the alpha-divergence, is introduced. The performance results for both detectors are presented at the end of the chapter.

Consider a vector of random variables x with its distribution p(x) given by the product of several factors,

p(x) ∝ f(x) ∏_i t_i(x),  (4.1)

where f(x) belongs to an exponential family with sufficient statistics φ(x) = [φ_1(x), ..., φ_m(x)] and the t_i(x), ∀i ∈ I, are non-negative factors. Expectation Propagation, first described by Thomas Minka in his doctoral thesis [17], has as its goal approximating the true posterior distribution p(x) with another distribution q(x) belonging to a family of tractable distributions, the exponential family.

Exponential Families

Let x ∈ X^m be a random variable and φ_α some functions, denoted sufficient statistics, mapping from the variable space to the real space. The exponential family associated with the sufficient statistics φ and parameterized by θ consists of

p_θ(x) = exp( Σ_α θ_α φ_α(x) − A(θ) ),  (4.2)

where A(·) is known as the log-partition function and ensures that the distribution is normalized. For a set of fixed sufficient statistics φ, the canonical or exponential parameters θ index a specific distribution corresponding to a weighting of the sufficient statistics.

One important property of exponential families is that if one multiplies or divides two distributions belonging to the same exponential family, the resulting distribution still belongs to that family. Multiplication or division of two distributions means adding or subtracting the coefficients, changing the weights but not the set of sufficient statistics.

Multivariate Gaussian Distribution

p(x) = (1/√|2πΣ|) exp( −(1/2)(x − µ)^T Σ^{-1} (x − µ) )
     = exp( −(1/2) [ x^T Σ^{-1} x − 2µ^T Σ^{-1} x + µ^T Σ^{-1} µ + log|2πΣ| ] ),  (4.3)

where the first two terms inside the brackets form θ^T φ(x). By examining this transformation to the exponential form, it is possible to note that

φ(x) = [vec(xx^T), x]^T,  (4.4)
θ = [vec(Σ^{-1}), −2Σ^{-1}µ]^T.  (4.5)

4.1 Expectation Propagation

The idea behind EP is that, while performing inference over p(x) is intractable, it might not be intractable to perform inference over the distribution p̂_i displayed below,

p̂_i(x) ∝ f(x) t_i(x),  (4.6)

where only one of the factors t_i(x) of equation (4.1) is present.

With this idea in mind, the approximate distribution q(x) is first defined,

q(x) ∝ f(x) ∏_{i=1}^{I} t̃_i(x),  (4.7)

where t̃_i(x) ∈ F, ∀i ∈ I. The distribution above is obtained by replacing every factor t_i(x) in equation (4.1) with an approximation t̃_i(x). At every iteration of the algorithm every factor is refined, which is done by sequentially substituting one approximating factor t̃_i(x) of (4.7) with the true factor t_i(x) to obtain a new, still tractable distribution p̂(x) = q^{\i}(x) t_i(x). With this new distribution, a new approximating distribution q_new(x), which includes information from this true factor, is determined as the result of minimizing the Kullback-Leibler (KL) divergence between this still tractable distribution and the approximating distribution:

q_new(x) = argmin_{q'(x) ∈ F} D_KL( q^{\i}(x) t_i(x) || q'(x) ).  (4.8)


Fig. 4.1: Iterative algorithm illustration.

Afterwards, the new refined term t̃_i^new(x) is determined, which is done by dividing the new distribution q_new(x) by the cavity distribution q^{\i}(x). As illustrated in Figure 4.1, the EP algorithm is an iterative algorithm whose goal is to reach a distribution whose "distance" to p(x) is minimal. In the image, the green circles represent the approximating distribution q(x) belonging to a certain exponential family, whereas the yellow and red colours represent p̂(x) and the intractable distribution to approximate, respectively. The process of factor refinement is summarized below in the EP algorithm.

EP Algorithm

For all i:

1. Compute the cavity distribution

   q^{\i}(x) = q(x) / t̃_i(x).  (4.9)

2. Find the new distribution obtained by projecting the still tractable distribution onto the exponential family space:

   q_new(x) = proj( q^{\i}(x) t_i(x) )  (4.10)
            = argmin_{q'(x) ∈ F} D_KL( q^{\i}(x) t_i(x) || q'(x) ).  (4.11)

3. Obtain the refined factor t̃_i^new(x):

   t̃_i^new(x) ∝ q_new(x) / q^{\i}(x).  (4.12)

Minimizing the KL divergence between two distributions p and q is achieved by

E_{q(x)}[φ_i(x)] = E_{p(x)}[φ_i(x)]  ∀i,  (4.13)

known as the moment matching technique. Given this, the second step of the EP algorithm is equivalent to

E_{t̃_i^new(x) q^{\i}(x)}[φ_i(x)] = E_{q^{\i}(x) t_i(x)}[φ_i(x)]  ∀i.  (4.14)

4.1.1 EP Detector

The application of the EP algorithm to MIMO detection was introduced by Javier Céspedes in [9], and in this section it is described in detail. Given the system model, the posterior distribution of the transmitted symbol vector u is

p(u|y) ∝ N(y : Hu, σ_w^2 I) ∏_{i=1}^{n} 1_{u_i ∈ A},  (4.15)

where 1 is the indicator function. The approximating distribution q(u) is obtained by replacing each indicator factor in the true posterior with an unnormalized Gaussian:

q(u) ∝ N(y : Hu, σ_w^2 I) ∏_{i=1}^{n} exp( γ_i u_i − (1/2) Λ_i u_i^2 ),  (4.16)

where γ_i and Λ_i > 0 are real values ∀i. The approximating distribution, being a product of Gaussian distributions, is itself a Gaussian distribution. Its mean µ and covariance matrix Σ are given by equations (4.17) and (4.18):

Σ = ( σ_w^{-2} H^T H + diag(Λ) )^{-1},  (4.17)
µ = Σ ( σ_w^{-2} H^T y + γ ).  (4.18)

If a certain factor t_i depends solely on a subset x_i of x, the approximating factor t̃_i(x_i) is defined over the same domain, and thus the update can be performed over the marginal distribution q(x_i); this is the case for the factors in equation (4.16). Consequently, instead of sequentially refining each factor at every iteration, the factors are updated in parallel. This process is the so-called parallel EP detector (EPD).

Parallel EP Detector Algorithm

Initialization: γ_i = 0 and Λ_i = E_s, ∀i.

Until convergence or the maximum number of iterations, at iteration (l) of the algorithm:

1. Compute the cavity marginal

   q^{(l)\i}(u_i) = q(u_i) / exp( γ_i u_i − (1/2) Λ_i u_i^2 ) ≈ N( u_i : t_i^{(l)}, h_i^{2(l)} ),  (4.19)

   where

   h_i^{2(l)} = σ_i^{2(l)} / ( 1 − σ_i^{2(l)} Λ_i^{(l)} ),  (4.20)
   t_i^{(l)} = h_i^{2(l)} ( µ_i^{(l)} / σ_i^{2(l)} − γ_i^{(l)} ).  (4.21)

2. Compute the mean µ_{p̂_i}^{(l)} and variance σ_{p̂_i}^{2(l)} of

   p̂^{(l)}(u_i) ∝ q^{(l)\i}(u_i) 1_{u_i ∈ A}.  (4.22)

3. Obtain the refined factor described by ( γ_i^{(l+1)}, Λ_i^{(l+1)} ):

   γ_i^{(l+1)} = β ( µ_{p̂_i}^{(l)} / σ_{p̂_i}^{2(l)} − t_i^{(l)} / h_i^{2(l)} ) + (1 − β) γ_i^{(l)},  (4.23)
   Λ_i^{(l+1)} = β ( 1 / σ_{p̂_i}^{2(l)} − 1 / h_i^{2(l)} ) + (1 − β) Λ_i^{(l)}.  (4.24)

   If Λ_i^{(l+1)} < 0, then Λ_i^{(l+1)} = Λ_i^{(l)} and γ_i^{(l+1)} = γ_i^{(l)}.

In step 3, a smoothing parameter β was introduced to improve the robustness of the algorithm; this value impacts the settling point as well as the speed of convergence. The value of β can be viewed as the step length along the gradient direction in a gradient descent algorithm.
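The following is a sketch of the whole loop (illustrative names; Λ_i is initialized to E_s as stated above, although some formulations use 1/E_s):

    import numpy as np

    def ep_detect(H, y, alphabet, noise_var, n_iter=10, beta=0.2):
        n = H.shape[1]
        Es = np.mean(alphabet**2)
        gamma, lam = np.zeros(n), np.full(n, Es)   # initialization as above
        HtH, Hty = H.T @ H, H.T @ y
        mu = np.zeros(n)
        for _ in range(n_iter):
            Sigma = np.linalg.inv(HtH / noise_var + np.diag(lam))   # eq. (4.17)
            mu = Sigma @ (Hty / noise_var + gamma)                  # eq. (4.18)
            s2 = np.diag(Sigma)
            h2 = s2 / (1.0 - s2 * lam)                              # eq. (4.20)
            t = h2 * (mu / s2 - gamma)                              # eq. (4.21)
            # step 2: moments of the cavity restricted to the alphabet
            w = np.exp(-0.5 * (alphabet[None, :] - t[:, None])**2 / h2[:, None])
            w /= w.sum(axis=1, keepdims=True)
            mp = w @ alphabet
            vp = np.maximum((w * (alphabet[None, :] - mp[:, None])**2).sum(axis=1), 1e-12)
            # step 3: smoothed refinement, eqs. (4.23)-(4.24)
            lam_new = beta * (1.0 / vp - 1.0 / h2) + (1 - beta) * lam
            gam_new = beta * (mp / vp - t / h2) + (1 - beta) * gamma
            keep = lam_new > 0                                      # guard Lambda > 0
            lam = np.where(keep, lam_new, lam)
            gamma = np.where(keep, gam_new, gamma)
        # hard decision on the final posterior mean
        return alphabet[np.argmin(np.abs(mu[:, None] - alphabet[None, :]), axis=1)]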

4.2 Alpha-Divergence

To measure the difference between two probability distributions p and q defined over the same variable x, the Kullback-Leibler (KL) divergence is used. It is a measure of how much information is lost when using q to approximate p. Let p(x) and q(x) be two probability distributions of a discrete random variable x. The KL divergence, D_KL, between the two distributions is defined as

D_KL(p||q) = Σ_x p(x) log( p(x)/q(x) ).  (4.25)

For continuous distributions, the sum is replaced by an integral:

D_KL(p||q) = ∫_x p(x) log( p(x)/q(x) ) dx.  (4.26)

The KL divergence has some properties that make it useful in different applications.

1. D_KL is convex with respect to both p and q.
2. D_KL ≥ 0.
3. D_KL = 0 if and only if p = q.

Although it is commonly presented as the "distance" between two distributions, the KL divergence is not a distance measure, as it is not a symmetric function: D_KL(p||q) is not the same as D_KL(q||p).

In order to obtain more freedom in choosing the metric with which a distribution is approximated, one can use the α-divergence, which is defined as

D_α(p||q) = (4/(1 − α^2)) ( 1 − ∫_x p(x)^{(1+α)/2} q(x)^{(1−α)/2} dx ),   α ∈ [−∞, +∞].  (4.27)

The α-divergence shares the same properties as the KL divergence, and finding the best approximation q(x) to p(x) reduces to minimizing the α-divergence between the two distributions.
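A small numerical sketch for discrete distributions follows (the integral of equation (4.27) replaced by a sum; the limits α → ±1 recover the two KL divergences):

    import numpy as np

    def alpha_divergence(p, q, alpha):
        if np.isclose(alpha, 1.0):
            return np.sum(p * np.log(p / q))       # KL(p||q)
        if np.isclose(alpha, -1.0):
            return np.sum(q * np.log(q / p))       # KL(q||p)
        s = np.sum(p**((1 + alpha) / 2) * q**((1 - alpha) / 2))
        return 4.0 / (1.0 - alpha**2) * (1.0 - s)  # eq. (4.27)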

4.2.1 α-EP

The α-EP algorithm, introduced by Minka in [18], appears as an extension of the original EP algorithm presented before. In this new algorithm the α-divergence is used instead of the KL divergence to determine the projection of the "not so bad" distribution into the exponential family space. Minimizing the α-divergence is equivalent to minimizing the KL divergence with the exact distribution raised to a power n_α. Inference over a new class of distributions becomes possible, as this can simplify computations: for instance, if a certain distribution is raised to a certain power b, one can pick the value of n_α equal to −b, such that the power factor is cancelled. The α-EP algorithm has the exact same structure as the EP algorithm presented before, the only difference being found in equation (4.11), where the KL divergence is swapped for the α-divergence:

q_new(x) = argmin_{q'(x) ∈ F} D_KL( q^{\i}(x) t_i(x) || q'(x) )   →   q_new(x) = argmin_{q'(x) ∈ F} D_α( q^{\i}(x) t_i(x) || q'(x) ).  (4.28)

By setting α = (2/n_α) − 1, the minimization in equation (4.28) becomes


4.2.2 Power EP Detector

Applying α-EP to the MIMO detection problem is not the classical example of how this algorithm should be used, as it was meant to make computations easier through exponent cancellation. In MIMO, the distribution to approximate is a Gaussian, and performing inference over it is already computationally simple. However, new divergence metrics can be used with this new algorithm, and with some simple changes to the original EPD its performance can be assessed. A particular way of implementing α-EP, denominated Power EP (PEP), is presented next.

Parallel Power EP Detector Algorithm

Initialization: γ_i = 0 and Λ_i = E_s, ∀i. For a certain value of α, determine n_α.

Until convergence or the maximum number of iterations, at iteration (l) of the algorithm:

1. Compute the cavity marginal

   q^{(l)\i}(u_i) = q(u_i) / exp( γ_i u_i − (1/2) Λ_i u_i^2 ) ≈ N( u_i : t_i^{(l)}, h_i^{2(l)} ),  (4.31)

   where

   h_i^{2(l)} = σ_i^{2(l)} / ( 1 − σ_i^{2(l)} Λ_i^{(l)} / n_α ),  (4.32)
   t_i^{(l)} = h_i^{2(l)} ( µ_i^{(l)} / σ_i^{2(l)} − γ_i^{(l)} / n_α ).  (4.33)

2. Compute the mean µ_{p̂_i}^{(l)} and variance σ_{p̂_i}^{2(l)} of

   p̂^{(l)}(u_i) ∝ q^{(l)\i}(u_i) 1_{u_i ∈ A}.  (4.34)


4.3 Performance Results

The performance of the EPD, along with that of the PEPD for certain values of α whenever it is better than that of the EPD, is presented in this section. For both detectors the value of β is equal to 0.2. The PEPD was tested for α ranging from 0.1 to 0.9 with a step of 0.1. In every scenario the number of iterations of the EP and PEP algorithms is equal to 10, as suggested in [9]; this is the minimum number of iterations for which convergence of the algorithm is assured in all considered scenarios.

For the first scenario, represented in Figure 4.2, a lower limit of SER = 10^-3 is maintained in order to achieve uniformity with previous results. In this scenario, for all values of α considered, the EPD outperforms the PEPD except for the case α = 0.4. In that case, the performance of the PEPD is worse until SNR ≈ 15 dB, where it evens out; after that, the performance of the PEPD is slightly better, with a marginal gain of approximately 0.1 dB at SER = 10^-3.

Extending the lower limit to SER = 10^-4, the PEPD outperforms the EPD for a wider range of values of α. Of all the different values, α = 0.1 and α = 0.4 were considered, as they showed the highest gain over the EPD in different SNR ranges. At SER = 10^-4, for α = 0.1 the gain was roughly 2 dB over the EPD; however, numerical instability surges in the algorithm for SNR > 26 dB. In order to overcome this problem, different values of α were considered for different SNR ranges, as a combination of the three curves. The result is seen in Figure 4.3: the two pink dashed curves show the performance of the PEPD for α = 0.1 and α = 0.4, the blue curve shows the performance of the EPD (PEPD with α = 1.0), and the pink line is the result of picking the best of all three curves, referred to as PEPD with combinatorial α.


Fig. 4.2: EPD and PEPD (α = 0.4) performance for scenario 1.


Fig. 4.4: EPD performance for scenario 2.


Chapter 5

Inference over Graphical Models

The underlying concepts of factor graphs and the sum-product algorithm appeared first in the context of error-correcting coding, with the Low-density parity-check (LDPC) codes introduced by Robert G. Gallager in [19]. In his research, bipartite graphs are used to describe different codes, and the sum-product algorithm is described in this application.

Graphical models provide a unifying framework for modeling systems in fields such as telecommunications and computer vision. Factor graphs are used to express the factorization of the joint probability of a set of variables. Algorithms operating over these graphs are presented in this chapter, with specific application to MIMO detection.

5.1 Factor Graphs

Let x_1, x_2, ..., x_n be a collection of variables, where each x_i takes values in some domain A_i, and denote by g(x_1, x_2, ..., x_n) a function of these variables with domain S and codomain R. The domain

S = A_1 × A_2 × ... × A_n

of g is denominated the configuration space for the given collection of variables, and each element of S corresponds to an assignment of a value to each variable. The purpose of such a graphical representation of joint distributions and algorithms is to enable the computation of the marginal distributions of the different variables. Assuming summation is possible in the function domain, associated with every function g(x_1, ..., x_n) there are n marginal functions g_i(x_i). For each a ∈ A_i, the value of g_i(a) is obtained by summing the value of g(x_1, ..., x_n) over all configurations of the variables in which x_i = a.


Fig. 5.1: Top: Factor graph of function in equation (5.3). Bottom: Factor graph of function in equation (5.4).

The i-th marginal function of g(x_1, ..., x_n) is thus defined as

g_i(x_i) = Σ_{∼x_i} g(x_1, ..., x_n) = Σ_{x_1 ∈ A_1} ··· Σ_{x_{i−1} ∈ A_{i−1}} Σ_{x_{i+1} ∈ A_{i+1}} ··· Σ_{x_n ∈ A_n} g(x_1, ..., x_n),  (5.1)

where Σ_{∼x_i} denotes the sum over all variables except x_i.

In order to formally introduce the factor graph representation, let us assume that the global function g(x_1, ..., x_n) factors as a product of local functions f_i, each of which has a different sub-domain of S:

g(x_1, ..., x_n) = ∏_{i ∈ I} f_i(X_i),  (5.2)

where I denotes the set of different factors and X_i is the subset of S containing the inputs of the i-th local function.

A Factor Graph (FG) is defined as a bipartite graph expressing the way in which a function factors, as in equation (5.2). There exist variable nodes representing every variable x_i and factor nodes for every local function f_i. An edge connects a certain variable node x_i and a factor node f_j if the variable is an argument of the local function represented by that factor node. As an example, let us consider two functions g_1 and g_2, both functions of the 4 variables x_1, x_2, x_3 and x_4, factorizing as

g_1(x_1, x_2, x_3, x_4) = f_1(x_1, x_3) f_2(x_2, x_3, x_4) f_3(x_3) f_4(x_4),  (5.3)

g_2(x_1, x_2, x_3, x_4) = f_1(x_1) f_2(x_1, x_2) f_3(x_1, x_2, x_3) f_4(x_1, x_2, x_3, x_4).  (5.4)

For both g_1 and g_2 the set of factors is I = {1, 2, 3, 4}. For g_1, X_1 = {x_1, x_3}, X_2 = {x_2, x_3, x_4}, X_3 = {x_3} and X_4 = {x_4}; for g_2, X_1 = {x_1}, X_2 = {x_1, x_2}, X_3 = {x_1, x_2, x_3} and X_4 = {x_1, x_2, x_3, x_4}. The graphical models for the functions in equations (5.3) and (5.4) are shown in Figure 5.1. One remark is that the graph on top has no loops and is denominated loop-free or cycle-free.

5.2 Sum-Product Algorithm

The sum-product algorithm, often referred to as Belief Propagation (BP), can be viewed as a way of computing the variable marginals g_i(x_i) through local processing units. The messages m_{x→f}(x) are the messages sent from a variable node x to a local function f, and m_{f→x}(x) are the messages sent from a function node to a variable node. These messages are calculated as follows:

m_{x→f}(x) = ∏_{h ∈ n(x)\f} m_{h→x}(x),  (5.5)

m_{f→x}(x) = Σ_{∼{x}} ( f(X) ∏_{y ∈ n(f)\x} m_{y→f}(y) ),  (5.6)

where n(·) denotes the neighbors of a certain node. The sum-product algorithm operating over a factor graph propagates the information through messages sent to adjacent nodes.

These messages are calculated according to the following rule from [20]: the message sent from a node v on an edge e is the product of the local or unit function at v with all messages received at v on edges other than e, summarized for the variable associated with e. An example of this message passing is shown in Figure 5.2.

For loop-free graphs, which can be represented by trees, the sum-product algorithm determines the marginals of the x_i with one message sent in each direction over every edge. In the presence of loops the algorithm is often referred to as loopy BP, and in this case the sum-product algorithm yields no perfect summary for the variable x_i. There is then the need to pass the messages multiple times over each edge, making it an iterative algorithm, in contrast to its application to loop-free graphs. At termination, the marginal or belief of a certain variable x_i is proportional to the product of the local node function with all messages arriving at that same node:

g_i(x_i) ∝ ∏_{j ∈ n(x_i)} m_{f_j→x_i}(x_i).  (5.7)


Fig. 5.2: Messages in a cycle-free graph.

In the example pictured in the figure above,

g_1(x_1) = m_{f_1→x_1},
g_2(x_2) = m_{f_2→x_2},
g_3(x_3) = m_{f_1→x_3} m_{f_2→x_3} m_{f_3→x_3}.

5.3 Gaussian Tree Approximation

The Gaussian Tree Approximation (GTA) detector is an example of the application of the sum-product algorithm to MIMO detection, where the exact posterior probability,

p(x|y) ∝ exp( −(1/(2σ_w^2)) ||y − Hx||^2 ),  x ∈ A^n,  (5.8)

is approximated by a tree-structured distribution.


5.3.1 Gaussian Tree Construction

Using the same notation as the paper describing GTA by Goldberger and Leshem [8], a tree with n nodes is described by its parent relations, p(i) for every node i, where p(i) is the parent node of i. A certain distribution g(x_1, ..., x_n) is then described by a first-order dependence tree if

g(x_1, ..., x_n) = ∏_{i=1}^{n} g(x_i | x_{p(i)}).

This representation is an approximation of the true distribution using only a set of n − 1 first-order dependencies between the variables, and the first task in this algorithm is to determine which of these best represent the true distribution. Derivations found in the original paper show that if a distribution f(x_1, ..., x_n) is to be represented by a tree, g(x_1, ..., x_n) = ∏_{i=1}^{n} g(x_i | x_{p(i)}), the best tree approximation, the one that minimizes the KL divergence, is represented by the conditionals of the distribution to approximate, g(x_1, ..., x_n) = ∏_{i=1}^{n} f(x_i | x_{p(i)}).

By applying the formula in equation (4.26) to determine the divergence between f(x) and its tree representation ∏_{i=1}^{n} f(x_i | x_{p(i)}), one gets, after some manipulation,

D_KL( f || ∏_{i=1}^{n} f(x_i | x_{p(i)}) ) = Σ_{i=1}^{n} h(x_i) − h(x) − Σ_{i=1}^{n} I(x_i; x_{p(i)}),  (5.9)

where h(·) denotes the differential entropy of a random variable and I(·;·) the mutual information between two random variables. By noticing that the h(x_i) and h(x) do not depend on the tree structure, the tree-dependent part of the result reduces to

D_KL( f || ∏_{i=1}^{n} f(x_i | x_{p(i)}) ) = − Σ_{i=1}^{n} I(x_i; x_{p(i)}) + const.  (5.10)

This means the tree that best approximates f(x) is the one that maximizes the sum in equation (5.10), and determining the optimal tree for the fully connected graph representing f(x) reduces to the well-studied problem of finding the Maximum Spanning Tree (MST) of a fully connected n-node graph in which the weight of the edge between two nodes i and j is given by the mutual information between the two variables x_i and x_j.

The mutual information is a measure of how much information one can obtain about a certain variable x_j by observing another variable x_i. In the case of these two variables being jointly Gaussian,

I(x_i; x_j) = −(1/2) log( 1 − ρ_{ij}^2 ),  (5.11)

where ρ_{ij} is the correlation coefficient between x_i and x_j. As this expression is monotonically increasing in ρ_{ij}^2, the optimal tree maximizes the sum of the squared correlations between nodes. The algorithm used to determine the MST is Prim's algorithm, explained in detail in Appendix A.

5.3.2 BP over Tree Graphs

Given the results of the previous section, the optimal Gaussian tree approximation, f̂(x), to the true posterior in equation (5.8) is

p̂(x|y) ∝ f̂(x) = f(x_1; µ, Σ) ∏_{i=2}^{n} f(x_i | x_{p(i)}; µ, Σ),  x ∈ A^n.  (5.12)

This is a tree-structured distribution and, as mentioned before, applying the BP algorithm to it yields the exact marginals of p̂(x|y) with only one message passed over every edge in each direction. There are two different steps in running BP over a tree graph: the first is to pass the messages from the leaves to the root node (the "downward" messages) and, after this is complete, to send the messages back from the root node to the leaves (the "upward" messages). A quick remark is that this is a modified version of the BP algorithm introduced before, as it works over a different graphical representation. The downward message to be sent down the tree from a certain node i to its parent node p(i) is

m_{i→p(i)}(x_{p(i)}) = Σ_{x_i ∈ A} f(x_i | x_{p(i)}; µ, Σ) ∏_{j: p(j)=i} m_{j→i}(x_i).  (5.13)

These messages are computed starting from the different leaf nodes and are passed through the tree until reaching the root node. The upward message to be sent from a certain node p(i) to its child i is

m_{p(i)→i}(x_i) = Σ_{x_{p(i)} ∈ A} f(x_i | x_{p(i)}; µ, Σ) m_{p(p(i))→p(i)}(x_{p(i)}) ∏_{j: j≠i, p(j)=p(i)} m_{j→p(i)}(x_{p(i)}),  (5.14)

where the messages are sent up the tree until they reach every leaf. These messages are computed based on the messages received from the siblings of a certain node i and on the message arriving from the parent of its parent node. In the case of the root node, the messages from its parent are considered to be equal to 1, and the corresponding term disappears. When all messages have been sent, the belief at each variable is computed as the product of all messages arriving at it:

belief_i(x_i) = m_{p(i)→i}(x_i) ∏_{j: p(j)=i} m_{j→i}(x_i),  x_i ∈ A.  (5.15)

When computing the belief at the root node, no parent message exists, and

belief_i(x_i) = f(x_i; µ, Σ) ∏_{j: p(j)=i} m_{j→i}(x_i).  (5.16)

To determine the estimated transmitted symbol vector û, one needs, for each symbol, to select the value that maximizes the belief:

x̂_i = argmax_{x̃ ∈ A} belief_i(x̃).  (5.17)

GTA Algorithm

1. Compute µ = ( H^T H + (σ^2/E_s) I )^{-1} H^T y and Σ = σ^2 ( H^T H + (σ^2/E_s) I )^{-1}.

2. Denote

   f(u_i; µ, Σ) = exp( −(1/2) (u_i − µ_i)^2 / Σ_{ii} ),  (5.18)
   f(u_i | u_j; µ, Σ) = exp( −(1/2) ( (u_i − µ_i) − (Σ_{ij}/Σ_{jj})(u_j − µ_j) )^2 / ( Σ_{ii} − Σ_{ij}^2/Σ_{jj} ) ).  (5.19)

3. Compute the MST with the edge weight between nodes i and j equal to the squared correlation coefficient, ρ_{ij}^2 = Σ_{ij}^2/(Σ_{ii} Σ_{jj}).

4. Run BP over the tree-structured distribution and determine the estimate of the transmitted symbols.
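A sketch of steps 1 and 3 follows (illustrative names; the MST itself can be built with Prim's algorithm, of which a compact sketch is given in Appendix A, and BP over the resulting tree follows equations (5.13)–(5.16)):

    import numpy as np

    def gta_statistics(H, y, noise_var, Es):
        n = H.shape[1]
        B = H.T @ H + (noise_var / Es) * np.eye(n)
        mu = np.linalg.solve(B, H.T @ y)         # step 1: Gaussian mean
        Sigma = noise_var * np.linalg.inv(B)     # step 1: Gaussian covariance
        d = np.sqrt(np.diag(Sigma))
        rho2 = (Sigma / np.outer(d, d))**2       # step 3: edge weights rho_ij^2
        np.fill_diagonal(rho2, 0.0)              # no self-edges
        return mu, Sigma, rho2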

5.3.3 GTA Performance Results


Fig. 5.4: GTA detector performance for scenario 2.


Fig. 5.6: GTA detector performance for scenario 4.

5.4 Alpha-BP

Message passing can also be applied when the graph modeling the system has loops. The MIMO problem can be modeled as an MRF, and running a modified version of the BP algorithm over it is one example of BP applied to graphs with loops, known in this case as loopy BP. The posterior can be factored as

p(x) ∝ ∏_{i=1}^{n} f_i(x_i) ∏_{k ∈ K} t_k(x_i, x_j),  (5.20)

where f_i is a singleton factor (a factor of only one variable), t_k is a factor depending on a pair of variables, and K denotes the set of all variable pairs. Following this representation, the factor graph for a scenario with 3 transmitted signals is represented in Figure 5.7. The true posterior in the previous equation can be approximated by


Fig. 5.7: Example of a Factor Graph for MIMO systems.

The marginal is consequently calculated as

q_i(x_i) ∝ f̃_i(x_i) ∏_{k ∈ p(i)} m_{k→i}(x_i).  (5.23)

Making use of the α-divergence introduced before to calculate the mismatch between q and p, the new update rule introduced in [22] is derived as a generalization of equations (5.5) and (5.6). The message from function node k to variable node i and the message from variable node j to function node k are now given as

m_{k→i}(x_i) ∝ ( Σ_{x_j} t_k(x_i, x_j)^α m_{k→j}(x_j)^{1−α} m_{j→k}(x_j) ) · m_{k→i}(x_i)^{1−α},  (5.24)

m_{j→k}(x_j) = f̃_j(x_j) ∏_{n ∈ p(j)\k} m_{n→j}(x_j).  (5.25)

Following the same procedure used to determine the new message update rules, the singleton factors are refined as

f̃_i^new(x_i) ∝ f_i(x_i)^α f̃_i(x_i)^{1−α}.  (5.26)

5.4.1 Prior Information


Fig. 5.8: Example of a Factor Graph for MIMO systems with prior information factors.

The introduction of this prior factor yields the following update rule as a change to that of equation (5.25),

m_{j→k}(x_j) = p̂_j(x_j) f̃_j(x_j) ∏_{n ∈ p(j)\k} m_{n→j}(x_j),  (5.27)

and equation (5.23) changes to

q_i(x_i) ∝ p̂_i(x_i) f̃_i(x_i) ∏_{k ∈ p(i)} m_{k→i}(x_i).  (5.28)

5.4.2 α-BP detector

For the MIMO scenario the posterior distribution function is given by equation (5.8) and the α-BP detector is described next.

α-BP Detector

Initialization:

1. Determine the factor graph and, if prior information is available, add it to the graph.

2. Define the singleton factors, equation (5.29), and the pairwise factors,

   t_k(u_i, u_j) = exp( − u_i S_{ij} u_j / σ_w^2 ).  (5.30)

Until convergence or the maximum number of iterations:

3. For every edge in the graph, compute the messages according to equations (5.24), (5.25) and (5.26).

4. Determine the estimate û by selecting, for each variable node, the symbol that maximizes the belief according to equation (5.28).
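For illustration, a rough sketch of the message updates (5.24)–(5.25) on a fully connected pairwise graph follows; the message storage convention, the fixed singleton factors and all names are illustrative assumptions, not the thesis implementation.

    import numpy as np

    def alpha_bp(f, t, alpha, n_iter=20):
        # f[i]   : length-|A| array of singleton factor values, eq. (5.29)
        # t[i][j]: |A| x |A| table of pairwise factor values, eq. (5.30)
        n, A = len(f), len(f[0])
        m = np.ones((n, n, A)) / A        # m[i, j]: message from factor (i,j) to node j
        for _ in range(n_iter):
            m_new = m.copy()
            for i in range(n):
                for j in range(n):
                    if i == j:
                        continue
                    # variable-to-factor message m_{j->k}, eq. (5.25), k = (i, j)
                    v = f[j].copy()
                    for l in range(n):
                        if l != i and l != j:
                            v = v * m[l, j]
                    # factor-to-variable message to node i, eq. (5.24)
                    s = (t[i][j] ** alpha) @ (m[i, j] ** (1 - alpha) * v)
                    s = s * m[j, i] ** (1 - alpha)
                    m_new[j, i] = s / s.sum()   # normalize for stability
            m = m_new
        # beliefs, eq. (5.23): product of all incoming factor messages
        q = np.array([f[i] * np.prod(m[[l for l in range(n) if l != i], i], axis=0)
                      for i in range(n)])
        return q / q.sum(axis=1, keepdims=True)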

5.4.3 Alpha-BP Performance Results

The performance results for the α-BP detector are presented next. For scenarios 1 and 2, the performance of the α-BP detector was assessed for α ranging from 0.1 to 1.0 with a step of 0.1. Results were also drawn for α-BP with MMSE and EP priors.


Fig. 5.10: α-BP detector performance for scenario 2.

The first image shows the performance results for α = 0.6, which corresponds to the best result, and for α = 1.0, which corresponds to the regular BP algorithm. As can be seen, the performance greatly increases for α = 0.6 when compared to that of regular BP. The introduction of the MMSE prior yields even better results, and its performance is better than that of the MMSE detector for SNR ranging from 0 to 20 dB. The EP prior produces good results, but they are in fact very similar to those of EP, making it of little use, as one needs to run EP in order to obtain the prior information.


Chapter 6

Conclusions

This thesis aimed at studying some of the newest MIMO receivers found today that make use of statistical learning, including the assessment of their performance in a variety of scenarios. The study baseline included detectors based on the Expectation Propagation, Gaussian Tree Approximation and Belief Propagation algorithms. The introduction of the α-divergence enabled a new class of algorithms, such as α-EP and α-BP, whose performance was also a subject of assessment in this thesis.

The EP detector shows performance results that in certain scenarios are close to those of the optimal detector, and because of this some alterations to the original detector were tested in order to reduce the existing gap. The only alteration showing some gain over the original detector was the introduction of new divergence measures with Power EP, even though this improvement was only observed in the first scenario.

Another approach to improving the EP algorithm was the use of the QR decomposition of the channel matrix, but it showed no performance improvement, in contrast to what is observed for BP in [10]. The QR decomposition factorizes a matrix as the product of two matrices, a unitary matrix (Q) and an upper triangular matrix (R); applying it to the channel matrix, with certain transformations to the system, reduces dependencies in the system and the number of loops in the factor graph. The reason why this approach does not work lies in the fact that the EP algorithm works by refining a set of factors, and this refinement has no relation to the channel matrix.

The application of the BP algorithm to MIMO detection was also tested in the context of loop-free graphs with the GTA detector, and although its performance is significantly better than that of loopy BP, it is worse when compared to the EP detector. In order to improve its performance, the combination of the estimates provided by different trees was tested, as

û = argmin_{ũ ∈ {û_1, ..., û_k}} ||Hũ − y||,  (6.1)

where û_1, ..., û_k are the different estimates given by the GTA detector starting at different root nodes.

The α-divergence was also used in the refinement of factors for BP; again, its performance is significantly better than that of loopy BP, but worse when compared to that of the EP and GTA detectors, and in certain cases even worse than MMSE. The performance could not be assessed for the third and fourth scenarios, as the complexity of computing the messages sent from function to variable nodes increases exponentially with the number of transmitted symbols. This fact limits the assessment of its performance, as happens with the MLD.


Appendix A

Prim's Algorithm

Prim's algorithm, like other algorithms such as Kruskal's, is a greedy approach for finding the Minimum Spanning Tree, but with a small adjustment it can be used to determine the Maximum Spanning Tree. The idea behind the algorithm is simple, and it uses two sets of vertices: one set contains the vertices already in the MST, the other the remaining vertices. At the beginning the first set contains only a chosen root node. At each iteration of the algorithm, every edge connecting the nodes in the two sets is considered, the one with the maximum weight is picked, and its vertex is added to the MST set.

Prim's Maximum Spanning Tree Algorithm

1. Create a set to keep track of the nodes included in the MST.

2. Pick a node as the root node.

3. While there are vertices outside the MST set:

   • Pick the node i whose edge into the tree has the biggest weight.
   • Include the node i in the MST set.
   • Update the edge weights with the new node in the set.

A compact sketch of these steps is given below.
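This sketch assumes a symmetric weight matrix W (e.g. the squared correlations used by the GTA detector); the function name and conventions are illustrative.

    import numpy as np

    def prim_max_spanning_tree(W, root=0):
        n = W.shape[0]
        parent = np.full(n, -1)             # parent[i] = p(i); the root keeps -1
        in_tree = np.zeros(n, dtype=bool)
        in_tree[root] = True                # step 2: pick the root node
        best = W[root].copy()               # heaviest known edge into each outside node
        link = np.full(n, root)             # tree endpoint of that edge
        for _ in range(n - 1):              # step 3: grow until nothing is left outside
            cross = np.where(in_tree, -np.inf, best)
            i = int(np.argmax(cross))       # heaviest edge leaving the tree
            in_tree[i] = True
            parent[i] = link[i]
            better = ~in_tree & (W[i] > best)
            best[better] = W[i][better]     # update weights with the new node
            link[better] = i
        return parent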

Next, an example of Prim's algorithm is illustrated. In this example, the oval box on the left represents the MST set (nodes included in the MST), while the box on the right represents the nodes outside the MST.


Fig. A.1: Example of a graph. Left set: nodes belonging to the tree. Right set: nodes outside the tree.

Fig. A.2: Pick the root node and include it in the left set. Example: root node equal to u1.


Fig. A.4: Repeat, this time for all edges linking to nodes 1 and 3. Include u4.

Fig. A.5: Repeat and include u2.


References
