
UPTEC F10 059

Degree project (Examensarbete), 20 credits, November 2010

Polynomial Matrix Decompositions

Evaluation of Algorithms with an Application to Wideband MIMO Communications

Rasmus Brandt


Faculty of Science and Technology, UTH Unit

Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0

Postal address: Box 536, 751 21 Uppsala

Telephone: 018 – 471 30 03

Fax: 018 – 471 30 00

Website: http://www.teknat.uu.se/student

Abstract

Polynomial Matrix Decompositions: Evaluation of Algorithms with an Application to Wideband MIMO Communications

Rasmus Brandt

The interest in wireless communications among consumers has exploded since the introduction of the "3G" cell phone standards. One reason for their success is the ever higher data rates achievable through the networks. A further increase in data rates is possible through the use of multiple antennas at either or both sides of the wireless links.

Precoding and receive filtering using matrices obtained from a singular value decomposition (SVD) of the channel matrix is a transmission strategy for achieving the channel capacity of a deterministic narrowband multiple-input multiple-output (MIMO) communications channel. When signalling over wideband channels using orthogonal frequency-division multiplexing (OFDM), an SVD must be performed for every sub-carrier. As the number of sub-carriers of this traditional approach grows large, so does the computational load. It is therefore interesting to study alternative means for obtaining the decomposition.

A wideband MIMO channel can be modeled as a matrix filter with a finite impulse response, represented by a polynomial matrix. This thesis is concerned with investigating algorithms which decompose the polynomial channel matrix directly. The resulting decomposition factors can then be used to obtain the sub-carrier based precoding and receive filtering matrices. Existing approximative polynomial matrix QR and singular value decomposition algorithms were modified, and studied in terms of decomposition quality and computational complexity. The decomposition algorithms were shown to give decompositions of good quality, but if the goal is to obtain precoding and receive filtering matrices, the computational load is prohibitive for channels with long impulse responses.

Two algorithms for performing exact rational decompositions (QRD/SVD) of polynomial matrices were proposed and analyzed. Although for simple cases they resulted in excellent decompositions, issues with the numerical stability of a spectral factorization step render the algorithms unusable in their current form.

For a MIMO channel with exponentially decaying power-delay profile, the sum rates achieved by employing the filters given from the approximative polynomial SVD algorithm were compared to the channel capacity. It was shown that if the symbol streams were decoded independently, as done in the traditional approach, the sum rates were sensitive to errors in the decomposition. A receiver with a spatially joint detector achieved sum rates close to the channel capacity, but with such a receiver the low complexity detector set-up of the traditional approach is lost.

Summarizing, this thesis has shown that a wideband MIMO channel can be diagonalized in space and frequency using OFDM in conjunction with an approximative polynomial SVD algorithm. However, in order to reach sum rates close to the capacity of a simple channel, the computational load becomes prohibitive compared to the traditional approach for channels with long impulse responses.

Examiner: Tomas Nyberg. Subject reviewer (ämnesgranskare): Mikael Sternad. Supervisor: Mats Bengtsson.


Popular Science Summary

Wireless communication is a field whose popularity has grown in recent years. One reason for the success of "3G" mobile Internet is the high data rates it makes possible. The data rate of a wireless link depends on the signal bandwidth and the transmit power, and increasing either yields higher data rates. Both bandwidth and transmit power are expensive resources, however, since their use is often regulated by national and international authorities.

Another way to increase the data rate of a wireless link is to add more antennas at the transmitter and receiver sides, a so-called MIMO system. Such a system can be viewed as a set of mutually interacting single-antenna links and can be described by a matrix. The data rate of the multi-antenna link can be maximized by sending several parallel data streams over the MIMO channel. Since the different transmitted signals share the radio channel, they will be mixed. Each receive antenna will therefore pick up a combination of the signals transmitted from the different transmit antennas.

To avoid the signals mixing, they must be coded. It turns out that by coding the transmitted signals with a particular matrix, and decoding the received signals with another matrix, the channel is transformed into a set of parallel virtual channels. Independent data streams can then be sent over these virtual channels. The coding matrices are given by a so-called singular value decomposition of the original channel matrix.

For a single-antenna system with high bandwidth, the radio channel will affect the different frequency components of the signal differently. If the system does not account for this effect, its performance will suffer. One way to avoid this frequency selectivity is to signal over the channel using so-called OFDM. The OFDM system splits the original signal into several signals with low bandwidth. By sending these narrowband signals on different parts of the frequency band, they do not affect each other. The frequency-selective channel has thus been divided into a number of non-frequency-selective parallel subchannels.

By sending a wideband signal over an OFDM-based MIMO system, even higher data rates can be achieved. However, the coding matrices must then be computed for every parallel subchannel in the frequency band, which requires many computational operations. This thesis has investigated a new set of algorithms for obtaining approximations of the required coding matrices. The quality of the approximate coding matrices was compared to that of the exact matrices, and the number of required computational operations was measured.

It turned out that the new algorithms can produce coding matrices of good quality, but at the cost of more computational operations than the traditional way of obtaining the coding matrices.

The coding matrices from the new algorithms were also simulated in a communications system. With the new matrices, data rates close to the theoretical maximum capacity of a simple radio channel can be achieved, provided that an advanced decoder is used at the receiver side. If a set of simple decoders is used instead, as in the traditional system, system performance suffers.

In summary, this thesis has shown that the coding matrices obtained from the new algorithms can be used in a wideband MIMO system to maximize the data rate. However, they require more computational operations, and a more advanced decoder, than the traditional system. The new algorithms are therefore not competitive with the traditional system.


Acknowledgements

This diploma work was performed at the Signal Processing Laboratory at the School of Electrical Engineering at Kungliga Tekniska Högskolan in Stockholm, and will lead to a degree of Master of Science in Engineering Physics from Uppsala University.

First and foremost, I would like to thank my supervisor Mats Bengtsson for proposing the thesis topic and taking me on as an MSc thesis worker. His advice and guidance have helped me considerably during the course of this work. My subject reviewer (ämnesgranskare) Mikael Sternad at the Division for Signals and Systems at Uppsala University also deserves my gratitude; his comments have been very valuable to the final version of this thesis.

My family has always supported my endeavours, and for that I am endlessly grateful.

Finally, thank you Melissa for being so lovely and cheerful, and for moving to Sweden to be with me.


Contents

1 Introduction
  1.1 Wireless Communications
  1.2 Multiple Antennas and Wideband Channels
  1.3 Problem Formulation and Contributions
  1.4 Thesis Outline

2 Preliminaries
  2.1 Complex Polynomials
    2.1.1 Addition and Subtraction
    2.1.2 Multiplication
  2.2 Polynomial Matrices
    2.2.1 Givens Rotations
    2.2.2 Decompositions
    2.2.3 Coefficient Truncation
  2.3 Computational Complexity

3 MIMO Channels and Multipath Propagation
  3.1 Propagation and Modeling
    3.1.1 Propagation
    3.1.2 Channel Modeling
    3.1.3 MIMO Channels
  3.2 Channel Capacity and Achievable Rate
  3.3 Equalization Techniques
  3.4 Summary

4 Polynomial Decomposition Algorithms: Coefficient Nulling
  4.1 Performance Criteria
  4.2 PQRD-BC: Polynomial QR Decomposition
    4.2.1 Convergence and Complexity
    4.2.2 Discussion
  4.3 MPQRD-BC: Modified PQRD-BC
    4.3.1 Convergence and Complexity
    4.3.2 Simulations
    4.3.3 Discussion
  4.4 PSVD by PQRD-BC: Polynomial Singular Value Decomposition
    4.4.1 Convergence and Complexity
    4.4.2 Discussion
  4.5 MPSVD by MPQRD-BC: Modified PSVD
    4.5.1 Convergence and Complexity
    4.5.2 Simulations
    4.5.3 Discussion
  4.6 Sampled PSVD vs. SVD in DFT Domain
    4.6.1 Frequency Domain Comparison
    4.6.2 Computational Load Comparison, Set-Up Phase
    4.6.3 Computational Load, Online Phase
    4.6.4 Discussion
  4.7 Summary

5 Rational Decomposition Algorithms: Polynomial Nulling
  5.1 Rational Givens Rotation
  5.2 PQRD-R: Rational QR Decomposition
    5.2.1 Simulations
    5.2.2 Discussion
  5.3 PSVD-R by PQRD-R: Rational Singular Value Decomposition
    5.3.1 Simulations
    5.3.2 Discussion
  5.4 Summary

6 Polynomial SVD for Wideband Spatial Multiplexing
  6.1 Generic System Model
    6.1.1 Narrowband Scenario
    6.1.2 Wideband Scenario
  6.2 SM by MIMO-OFDM: SVD in the DFT Domain
    6.2.1 Specific System Model
    6.2.2 Capacity
  6.3 SM by MIMO-OFDM: SVD in the z-Domain
    6.3.1 Specific System Model
    6.3.2 Achievable Rate
  6.4 Simulations
    6.4.1 Method
    6.4.2 Results
  6.5 Summary

7 Summary
  7.1 Conclusions
  7.2 Future Work

A Acronyms and Notation

B Some Complexity Derivations
  B.1 Matrix-Matrix Multiplication

Bibliography


Chapter 1

Introduction

In the current day and age, access to mobile broadband through the cellular networks is ubiquitous. The demands for higher data rates are ever increasing, as people get used to having constant access to the Internet. The latest acronym in the flora of terms relating to cellular networks is LTE, standing for Long Term Evolution. This new standard promises even higher data rates than the previous ”3G” standards, by employing efficient modulation schemes as well as terminals with multiple antennas [1].

With these increasing data rates, one could ponder how they are achieved. It all boils down to the efficient use of resources, in this case power and bandwidth. As the use of the resources is optimized, higher data rates can be provided to the cellular customers. When the power and bandwidth allocation gets close to the optimal point, however, how would one go about increasing the data rate even further?

An exciting field in wireless communications is that of multi-antenna systems, so called MIMO (multiple-input multiple-output) communications. Having access to multiple antennas at either or both sides of a wireless link opens up new transmission strategies. The MIMO channel can be used for increasing the rate even further, improving the signal quality, or both at the same time. The reason that the data rate can be increased, for the same amount of available power and bandwidth, is that the MIMO channel under certain conditions provides multiple parallel spatial channels, which can be used independently. This thesis will study how one can get access to the spatial channels, and compare two different approaches for doing this.

1.1 Wireless Communications

Wireless communications is the field of communication strategies that employ radio waves for information transfer. In particular, this thesis will focus on digital wireless communications, meaning that the information being transmitted is in digital form. The system is agnostic of the meaning of the data, but is rather focused on reliably transferring the data across the channel. The typical framework for digital communications can be seen in Figure 1.1.

The first operation in the system is that of source coding. This is the act of taking the information, in whatever form, and transforming it into a form suitable for transmission in a digital communications system. It may include sampling, quantization and lossy or lossless compression of the data [2, p. 2]. The output from the source coder is a sequence of bits, or binary digits. At the receiver side, the last operation performed is undoing the source coding.


Figure 1.1: Block Diagram of a Typical Digital Communications System (source coding, channel coding, modulation, channel, demodulation, channel decoding, source decoding).

If the system has been designed properly, the output of the source decoder will resemble the input to the source coder.

If part of the source coder’s task is to remove redundancy in the data, then the opposite holds for the channel coder. The blocks in Figure 1.1 in between the channel coder and channel decoder will almost certainly introduce some errors in the stream of transmitted symbols. This could be because the system temporarily suffers from a high noise level, or that the radio channel is bad. It is the task of the channel coder/decoder pair to recover any information lost, or at least recognize that some information was lost. This is done by inserting redundancy into the stream of symbols to be sent. Knowing the structure of the redundancy at the receiver side, errors can be corrected, or at least detected.

The last block on the sender side is the modulator. The modulator takes the output of the channel encoder, and transforms it into a form suitable for launch onto the physical channel. This includes mapping bits to symbols, applying pulse-shaping to the symbols so a continuous waveform is obtained and finally upconverting the signal to the carrier frequency.

The waveform is sent to the RF chain of the transmitter, and then converted to RF energy in the antenna. The modulation and demodulation part of the system can be seen in close-up in Figure 1.2.

Figure 1.2: Modulation-Channel-Demodulation Sub-system (pulse shaping, upconversion, channel, downconversion, matched filtering, sampling, demodulation, detection).

At the receiver, the effect of the carrier frequency is removed in the downconverting step. The signal is then matched filtered to the pulse-shaping waveform, and sampled. Based on the samples, demodulation and detection are performed, resulting in estimates of the transmitted symbols.

A digital communications system does not necessarily have clear cut lines between the subsystems of Figure 1.1. For instance, joint channel coding and modulation may give boosts in performance, at the price of higher transceiver complexity.


1.2 Multiple Antennas and Wideband Channels

There are several ways to increase the maximum reliable data rate of a wireless link. By transmitting more symbols per unit time, that is increasing the bandwidth of the signal, the data rate is increased. The downside is that bandwidth in the radio spectrum is an expensive resource due to national and international regulations. Another strategy is to increase the signal-to-noise ratio (SNR) of the link. A higher SNR means a better quality of the received signal, and therefore more data can be transferred per symbol, leading to increased spectral efficiency. For this case, more power is needed at the transmitter side, which may pose a problem if the transmitter is battery powered or if there are regulations regarding the amount of acceptable transmitted power at that particular frequency.

A third way of increasing the data rate is by introducing multiple antennas at either or both sides of the link. If the channels between the various antenna elements are uncorrelated, which under certain circumstances they are, multiple parallel spatial channels arise that can be used for parallel signalling. By accessing these spatial channels, the data rate can be increased without consuming more bandwidth or power resources. Information theory shows that for high SNRs, the theoretically highest data rate of a MIMO channel with uncorrelated spatial sub-channels grows linearly with the minimum of the number of antennas at the receiver and transmitter sides [3].

For radio channels where there are multiple paths from the transmitter to the receiver, several versions of the transmitted signal will be received at different points in time. This multipath propagation leads to problems for signals with a high symbol rate, i.e. wide bandwidth. In effect, different frequencies will be attenuated differently by the radio channel; the channel is said to be frequency-selective. This wideband behaviour of the channel must be mitigated for reliable communication to take place.

1.3 Problem Formulation and Contributions

This thesis will investigate whether a transmission scheme based on a polynomial singular value decomposition of a wideband MIMO channel matrix can achieve the same data rate as a similar system where the singular value decomposition is performed in the frequency domain, at comparable or lower transmission set-up complexity. In order to do this, some algorithms for polynomial matrix decomposition will be investigated in terms of complexity and error performance, and the achievable data rate is simulated for communication system frameworks including these algorithms.

The contributions given by this thesis are:

• Modified versions of the decomposition algorithms of Foster et al. [4] are proposed and analyzed in terms of complexity and decomposition quality.

• Two rational decomposition algorithms for polynomial matrices are proposed, and their shortcomings discussed.

• A transmission scheme utilizing the modified polynomial singular value decomposition algorithm is shown to achieve good performance, in terms of achievable rate, but at a restraining computational load compared to the reference SM by MIMO-OFDM transmission scheme.


1.4 Thesis Outline

The next chapter will lay the mathematical groundwork for the investigations to come. Polynomials and matrices are introduced, as well as the combination of the two concepts: polynomial matrices. Finally, an analysis of algorithm run time performance through benchmarking and theoretical analysis is presented.

The following part of the thesis, Chapter 3, will discuss the MIMO channel more deeply. The idea of frequency-selective channels is presented, and the orthogonal frequency-division multiplexing technique for mitigating inter-symbol interference resulting from high data rates is introduced.

In Chapter 4, a set of polynomial matrix decomposition algorithms is presented. They are based on the idea of single coefficient nulling through polynomial Givens rotations. The chapter states the algorithms, and some analytical and numerical results are presented in order to examine the properties of the algorithms.

Chapter 5 similarly analyzes some polynomial matrix decomposition algorithms, but these are instead based on the idea of polynomial nulling through polynomial IIR Givens rotations.

With the algorithms clearly defined, Chapter 6 will employ them in a communications set-up. The channel and system models are defined, and the system capacities derived. Using these expressions, simulation results are presented showing the capacity of the given systems.

Finally, conclusions will be drawn in Chapter 7. The thesis will be summarized, and key points will be presented.


Chapter 2

Preliminaries

This chapter is concerned with some fundamental facts and definitions that will be used a great deal in the rest of the thesis. Definitions of complex polynomials and constant matrices will be given, and then the two concepts will be joined into complex polynomial matrices.

The definitions regarding polynomials, matrices and polynomial matrices are mostly taken from [5, 6], where more details are given.

Additionally, a brief discussion about algorithmic complexity will be undertaken. After this chapter, the reader will be aware of all mathematics needed to understand the algorithms to be presented.

2.1 Complex Polynomials

A Laurent polynomial is an expression of the form

p(z) = C_{-V_1} z^{V_1} + \ldots + C_{-1} z^{1} + C_0 + C_1 z^{-1} + \ldots + C_{V_2} z^{-V_2}    (2.1)

or more compactly

p(z) = \sum_{v=-V_1}^{V_2} C_v z^{-v}, \qquad V_1 > 0,\; V_2 > 0    (2.2)

where z is some indeterminate symbol and the C_v coefficients are taken from some field F. For our purposes, we will assume that the coefficients are complex numbers, that is F = C. Hence, (2.2) is a complex Laurent polynomial. In the following, we will just call it a polynomial. For a more in-depth discussion of fields and their properties, see [7, Ch. 7].

For F = C it holds that every polynomial uniquely determines a function [7, p. 297]. The function is evaluated at a point z0 simply by replacing the indeterminate symbol z in (2.2) with z0, and performing the summation. It should be noted though that a polynomial, and the function defined by the polynomial, are two distinctly different entities.

In order to conveniently classify polynomials, some notation will now be introduced. The relation ∼, used as

p(z) ∼ (V_1, V_2, C)    (2.3)

signifies that p(z) has V_1 positive exponent terms, V_2 negative exponent terms and that the coefficients are in C. The space of all Laurent polynomials with complex coefficients will be denoted C. The space of all polynomials p(z) ∼ (V_1, V_2, C) will be C^{1×1×(V_1+V_2+1)}. Furthermore, the maximum degree of p(z) is V_1 + V_2.


It is clear that for V_1' > V_1, V_2' > V_2,

p(z) ∼ (V_1, V_2, C)  ⟹  p(z) ∼ (V_1', V_2', C).    (2.4)

This can be shown to hold by setting the added outer coefficients to zero.

The reason for our interest in polynomials lies in their relation to Linear Time-Invariant (LTI) systems. Chapter 3 will explore this relation further.

2.1.1 Addition and Subtraction

Given two polynomials

a(z) ∼ (V_1, V_2, C),  b(z) ∼ (U_1, U_2, C)    (2.5)

the variables

M_1 = \max(V_1, U_1),  M_2 = \max(V_2, U_2)    (2.6)

can be defined. Assuming that a(z), b(z) have coefficient sequences

\{C_{a,i}\}_{i=-V_1}^{V_2},  \{C_{b,i}\}_{i=-U_1}^{U_2}    (2.7)

the sum of the polynomials is defined as

(a + b)(z) = a(z) + b(z) = \sum_{v=-M_1}^{M_2} (C_{a,v} + C_{b,v}) z^{-v}    (2.8)

with

C_{a,v} = 0 \;\forall v \notin [-V_1, V_2], \qquad C_{b,v} = 0 \;\forall v \notin [-U_1, U_2].    (2.9)

Subtraction is similarly defined, but replacing C_{b,v} → (−C_{b,v}) ∀v.

2.1.2 Multiplication

Multiplication of two polynomials is the convolution of the two coefficient sequences. Given the polynomials in (2.5), the product is written as

c(z) = a(z)b(z) = \left( \sum_{v=-V_1}^{V_2} C_{a,v} z^{-v} \right) \left( \sum_{u=-U_1}^{U_2} C_{b,u} z^{-u} \right) = \sum_{v=-V_1}^{V_2} \sum_{u=-U_1}^{U_2} C_{a,v} C_{b,u} z^{-(v+u)}.    (2.10)

In particular, the coefficient associated with z^{-r} in the product will be given by

C_{c,r} = \sum_{\substack{u,v \\ u+v=r}} C_{a,v} C_{b,u} = \sum_{v=-V_1}^{V_2} C_{a,v} C_{b,r-v}.    (2.11)

Defining

C_{a,v} = 0 \;\forall v \notin [-V_1, V_2], \qquad C_{b,r-v} = 0 \;\forall (r - v) \notin [-U_1, U_2]    (2.12)

the sum in (2.11) can be written as an infinite sum

C_{c,r} = \sum_{v=-\infty}^{\infty} C_{a,v} C_{b,r-v}    (2.13)

which can be identified as the convolution sum.

Let d_1, d_2 be the maximum degrees of the polynomials a(z), b(z). Then, by zero-padding the two coefficient vectors to length d_1 + d_2 + 1, the convolution can efficiently be evaluated using the convolution theorem [8, p. 191]. That is,

c(z) = \mathcal{F}_d^{-1}\big( \mathcal{F}_d(a(z)) \, \mathcal{F}_d(b(z)) \big)    (2.14)

where the transforms are understood to be working on the coefficient vectors of the polynomials.
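To make the coefficient-convolution view of (2.11) and the FFT evaluation of (2.14) concrete, here is a minimal NumPy sketch. The storage convention (a plain coefficient vector running from the z^{V_1} term down to the z^{-V_2} term, passed together with V_1) is an assumption made for this illustration only.

```python
import numpy as np

def poly_mul(a, V1a, b, V1b):
    """Multiply two Laurent polynomials given as coefficient vectors.

    a[i] is the coefficient of z^{-(i - V1a)}, i.e. the vector runs from the
    z^{V1a} term down to the z^{-V2a} term; likewise for b and V1b.
    Returns (c, V1c) for the product c(z) = a(z) b(z).
    """
    c = np.convolve(a, b)          # coefficient convolution, cf. eq. (2.11)
    return c, V1a + V1b            # the leading positive exponents add up

def poly_mul_fft(a, V1a, b, V1b):
    """The same product evaluated via the convolution theorem, cf. eq. (2.14)."""
    n = len(a) + len(b) - 1        # length of the linear convolution
    C = np.fft.fft(a, n) * np.fft.fft(b, n)
    return np.fft.ifft(C), V1a + V1b

# Example: a(z) = 2z + 1 + 3z^{-1} and b(z) = 1 - z^{-1}
a, b = np.array([2.0, 1.0, 3.0]), np.array([1.0, -1.0])
c, V1c = poly_mul(a, 1, b, 0)
c_fft, _ = poly_mul_fft(a, 1, b, 0)
assert np.allclose(c, c_fft)
print(c, V1c)    # [ 2. -1.  2. -3.], V1 = 1, i.e. 2z - 1 + 2z^{-1} - 3z^{-2}
```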

2.2 Polynomial Matrices

A polynomial matrix A(z) is a matrix whose elements are polynomials, or equivalently, a polynomial whose coefficients are matrix-valued [6, p. 24]. An arbitrary polynomial matrix

A(z) = \sum_{v=-V_1}^{V_2} A_v z^{-v}    (2.15)

belongs to the space C^{p×q} if A_v ∈ C^{p×q} ∀v. For the given V_1, V_2, we can also write \{A_v\} ∈ C^{p×q×(V_1+V_2+1)}.

The transpose A^T(z), conjugate A^*(z) and Hermitian conjugate A^H(z) = (A^*(z))^T of a polynomial matrix are obtained by applying the respective operation on each of the coefficient matrices. In addition, A^H(z^{-*}) will be termed the para-Hermitian conjugate of A(z). A polynomial matrix which satisfies A^H(z^{-*})A(z) = I is called a paraunitary matrix [4]. This type of matrix will play an important part in the algorithms to be developed in Chapter 4, as its columns are mutually orthogonal over all frequencies. Due to this orthogonality, the multiplication of an arbitrary matrix with a paraunitary matrix preserves the Frobenius norm of the original matrix.

The Frobenius norm of the polynomial matrix in (2.15) is defined as

\|A(z)\|_F = \sqrt{ \sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{v=-V_1}^{V_2} \big| [A_v]_{ij} \big|^2 }    (2.16)

where [\,\cdot\,]_{ij} denotes the (i, j) component of the matrix.

In the following, the terms matrix and polynomial matrix will be used interchangeably, with the understanding that an ordinary matrix is just a polynomial matrix of maximum degree 0 with a single coefficient matrix.
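The polynomial matrix definitions above translate directly into a few lines of NumPy if a polynomial matrix is stored as a 3-D array of coefficient matrices. This storage choice, and the example at the end, are assumptions made here for illustration; the thesis does not prescribe an implementation.

```python
import numpy as np

def frobenius_norm(A):
    """Frobenius norm of a polynomial matrix, eq. (2.16).

    A is stored as a 3-D array with A[v] the coefficient matrix of the lag
    running from -V1 up to V2 along the first axis.
    """
    return np.sqrt(np.sum(np.abs(A) ** 2))

def para_hermitian(A):
    """Para-Hermitian conjugate A^H(z^{-*}): conjugate-transpose every
    coefficient matrix and reverse the lag axis (z^{-v} becomes z^{v})."""
    return np.conj(A.transpose(0, 2, 1))[::-1]

def poly_mat_mul(A, B):
    """Product of two polynomial matrices: matrix products of the
    coefficient matrices, convolved along the lag axis."""
    C = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1], B.shape[2]),
                 dtype=complex)
    for a in range(A.shape[0]):
        for b in range(B.shape[0]):
            C[a + b] += A[a] @ B[b]
    return C

# Example: a (trivially) paraunitary matrix built from a constant unitary
# matrix and a delay; multiplying by it preserves the Frobenius norm.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))
U = np.zeros((2, 2, 2), dtype=complex)
U[0, 0, :] = Q[0, :]          # first row of Q at lag 0
U[1, 1, :] = Q[1, :]          # second row of Q delayed by one lag
A = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
print(frobenius_norm(poly_mat_mul(U, A)), frobenius_norm(A))   # equal
```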

2.2.1 Givens Rotations

A constant Givens rotation is a unitary transformation which zeroes a specific component of a vector. For the 2×2 case, the constant complex Givens rotation is defined as

G = \begin{bmatrix} c\,e^{j\alpha} & s\,e^{j\phi} \\ -s\,e^{-j\phi} & c\,e^{-j\alpha} \end{bmatrix}.    (2.17)

Applying G to a vector x ∈ C^{2×1} one obtains

G x = G \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c\,e^{j\alpha} x_1 + s\,e^{j\phi} x_2 \\ -s\,e^{-j\phi} x_1 + c\,e^{-j\alpha} x_2 \end{bmatrix}

and by selecting

\theta = \begin{cases} \tan^{-1}\!\left( \frac{|x_2|}{|x_1|} \right) & \text{if } |x_1| \neq 0 \\ \frac{\pi}{2} & \text{if } |x_1| = 0 \end{cases}, \qquad c = \cos\theta, \quad s = \sin\theta, \quad \alpha = -\arg(x_1), \quad \phi = -\arg(x_2)

it can be shown that

[G x]_2 = 0.

Additionally, since G is unitary, the magnitude squared of the other component will be

|[G x]_1|^2 = |x_1|^2 + |x_2|^2

which will be referred to as the energy moving property of the Givens rotation. Intuitively, the application of the Givens rotation moves the energy of component 2 to component 1, so that component 2 becomes zero.

The Givens rotation in (2.17) can be extended so that it zeroes a specific component of a p × 1 vector or p × q matrix. For the matrix case, if element (i, j) is to be zeroed, and element (i, i) is to receive the energy, the Givens rotation takes the form of a p × p identity matrix with the elements at the intersections of rows i, j and columns i, j replaced by the elements of (2.17).
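A direct transcription of (2.17) and the parameter choices above into NumPy (illustrative only):

```python
import numpy as np

def givens(x1, x2):
    """Complex Givens rotation G of eq. (2.17), chosen so that [G x]_2 = 0."""
    theta = np.arctan2(abs(x2), abs(x1))        # equals pi/2 when x1 == 0
    c, s = np.cos(theta), np.sin(theta)
    alpha, phi = -np.angle(x1), -np.angle(x2)
    return np.array([[c * np.exp(1j * alpha),  s * np.exp(1j * phi)],
                     [-s * np.exp(-1j * phi),  c * np.exp(-1j * alpha)]])

x = np.array([1 + 2j, 3 - 1j])
G = givens(x[0], x[1])
y = G @ x
print(np.round(y, 10))                              # second component is ~0
print(abs(y[0])**2, abs(x[0])**2 + abs(x[1])**2)    # energy moving property
assert np.allclose(G.conj().T @ G, np.eye(2))       # G is unitary
```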

For the polynomial matrix case, a polynomial Givens rotation (PGR) can be defined. What will be referred to as a PGR in this thesis is the elementary polynomial Givens rotation of [4], with an elementary delay matrix prepended. The polynomial analogue of (2.17) is therefore

G(z) = \begin{bmatrix} 1 & 0 \\ 0 & z^{-t} \end{bmatrix} \begin{bmatrix} c\,e^{j\alpha} & s\,e^{j\phi} \\ -s\,e^{-j\phi} & c\,e^{-j\alpha} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & z^{t} \end{bmatrix} = \begin{bmatrix} c\,e^{j\alpha} & s\,e^{j\phi} z^{t} \\ -s\,e^{-j\phi} z^{-t} & c\,e^{-j\alpha} \end{bmatrix}    (2.18)

which when applied to x(z) = [x_1(z)\; x_2(z)]^T will zero the coefficient associated with z^{-t} in x_2(z) and move the energy to the constant coefficient of x_1(z), if the parameters are chosen such that

\theta = \begin{cases} \tan^{-1}\!\left( \frac{|x_2(t)|}{|x_1(0)|} \right) & \text{if } |x_1(0)| \neq 0 \\ \frac{\pi}{2} & \text{if } |x_1(0)| = 0 \end{cases}, \qquad c = \cos\theta, \quad s = \sin\theta, \quad \alpha = -\arg(x_1(0)), \quad \phi = -\arg(x_2(t))    (2.19)

where x_i(j) denotes the coefficient associated with z^{-j} for element x_i. In the process, all other coefficients of x_1(z) and x_2(z) will be affected. It can be shown that G^H(z^{-*})G(z) = I, and (2.18) is therefore a paraunitary operation. The paraunitarity implies that the operation is norm preserving, and in particular that the energy moving property still holds.

The polynomial Givens rotation can easily be extended to the p × q case, in the same manner as for the constant Givens rotation case. By construction, the p × p polynomial Givens rotation will also be paraunitary. For every application of a polynomial Givens rotation, the degree of the polynomial matrix it is applied to will grow by 2|t|. This inherent property of the PGR will lead to a need for a truncation step in the algorithms to be formed, as further explained in Chapter 4.

The application of a PGR only affects two rows of the matrix it is applied to. The complexity of the application to a p × q matrix with r lags is therefore

C_{PGR} = 2q(r + 2|t|).    (2.20)
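The 2-element case of the PGR can be sketched as follows, storing each polynomial as a dictionary from lag v to the coefficient of z^{-v}; the storage format is an assumption for illustration, not the thesis implementation.

```python
import numpy as np

def pgr_apply(x1, x2, t):
    """Apply the polynomial Givens rotation of eqs. (2.18)-(2.19) to
    x(z) = [x1(z) x2(z)]^T.

    Polynomials are stored as dicts {v: coefficient of z^{-v}}. The rotation
    zeroes the coefficient of z^{-t} in x2(z) and moves its energy to the
    lag-0 coefficient of x1(z).
    """
    a, b = x1.get(0, 0.0), x2.get(t, 0.0)
    theta = np.arctan2(abs(b), abs(a))
    c, s = np.cos(theta), np.sin(theta)
    ea, eb = np.exp(-1j * np.angle(a)), np.exp(-1j * np.angle(b))

    y1, y2 = {}, {}
    # y1(z) = c e^{j alpha} x1(z) + s e^{j phi} z^{t} x2(z)
    for v, cv in x1.items():
        y1[v] = y1.get(v, 0.0) + c * ea * cv
    for v, cv in x2.items():
        y1[v - t] = y1.get(v - t, 0.0) + s * eb * cv
    # y2(z) = -s e^{-j phi} z^{-t} x1(z) + c e^{-j alpha} x2(z)
    for v, cv in x1.items():
        y2[v + t] = y2.get(v + t, 0.0) - s * np.conj(eb) * cv
    for v, cv in x2.items():
        y2[v] = y2.get(v, 0.0) + c * np.conj(ea) * cv
    return y1, y2

# Example: zero the z^{-1} coefficient of x2(z)
x1 = {0: 1 + 1j, 1: 0.5}            # x1(z) = (1+j) + 0.5 z^{-1}
x2 = {0: 0.2, 1: 2 - 1j}            # x2(z) = 0.2 + (2-j) z^{-1}
y1, y2 = pgr_apply(x1, x2, t=1)
print(abs(y2.get(1, 0.0)))          # ~0: the targeted coefficient is nulled
print(y1[0])                        # ~sqrt(|x1(0)|^2 + |x2(1)|^2), real-valued
```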

2.2.2 Decompositions

The (full) QR decomposition of a constant matrix A ∈ C^{p×q} is

A = QR    (2.21)

where Q ∈ C^{p×p} is a unitary constant matrix, and R ∈ C^{p×q} is an upper triangular constant matrix [5, p. 112]. The constant QR decomposition can be calculated in a variety of ways: through Givens rotations, Householder rotations or via the Gram-Schmidt orthogonalization procedure [5, pp. 114-117]. An approximate polynomial QR decomposition of a polynomial matrix A(z) ∈ C^{p×q} is

A(z) = Q(z)R(z)    (2.22)

where Q(z) ∈ C^{p×p} is an approximately paraunitary polynomial matrix and R(z) ∈ C^{p×q} is an approximately upper triangular polynomial matrix [4]. Intuitively, this can be seen as a constant QRD taken over all frequencies.

For an arbitrary constant matrix A ∈ C^{p×q}, the singular value decomposition (SVD) is

A = U D V^H    (2.23)

where U ∈ C^{p×p}, V ∈ C^{q×q} are unitary matrices and D ∈ R^{p×q} is a diagonal matrix [5, p. 414]. The diagonal entries of D are called the singular values of the matrix A. The columns of U and V are called the left and right singular vectors of A, respectively. An efficient implementation for calculating the SVD of a constant matrix can be found in [9, p. 448].

Similarly, an approximate singular value decomposition can be obtained for polynomial matrices. Given A(z) ∈ C^{p×q}, a PSVD of A(z) is

A(z) = U(z) D(z) V^H(z^{-*})    (2.24)

where U(z) ∈ C^{p×p}, V(z) ∈ C^{q×q} are approximately paraunitary matrices and D(z) ∈ C^{p×q} is an approximately diagonal matrix [4] whose diagonal elements are called the singular values of A(z). As in the QRD case, the intuition is that the polynomial matrix A(z) is decomposed into its SVD over all frequencies. Note that in this definition there is no assumption of ordering of the singular values, as compared to the ordinary SVD [5, p. 414].
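The "decomposition over all frequencies" intuition can be made concrete by evaluating A(z) on a grid of points on the unit circle (a DFT of the coefficient matrices) and computing an ordinary SVD in every bin, which is essentially the DFT-domain reference approach compared against in Section 4.6 and Chapter 6. A minimal sketch, assuming a causal polynomial matrix stored as a 3-D coefficient array:

```python
import numpy as np

def svd_per_frequency(A, n_fft=64):
    """Evaluate A(z) on n_fft points of the unit circle and SVD each bin.

    A[v, :, :] holds the coefficient matrix of z^{-v}, v = 0 .. L-1 (a causal
    matrix is assumed for simplicity). Returns per-bin factors U, D, Vh.
    """
    # A(e^{jw_k}) for all bins at once: a DFT along the lag axis
    A_f = np.fft.fft(A, n=n_fft, axis=0)
    U = np.zeros((n_fft, A.shape[1], A.shape[1]), dtype=complex)
    Vh = np.zeros((n_fft, A.shape[2], A.shape[2]), dtype=complex)
    D = np.zeros((n_fft, min(A.shape[1], A.shape[2])))
    for k in range(n_fft):
        U[k], D[k], Vh[k] = np.linalg.svd(A_f[k])
    return U, D, Vh

# Example: a random causal 2x3 polynomial matrix with 4 lags
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 3)) + 1j * rng.standard_normal((4, 2, 3))
U, D, Vh = svd_per_frequency(A)
print(D[0])     # singular values of A(z) evaluated at z = 1
```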

2.2.3 Coefficient Truncation

During certain steps of the algorithms to be investigated in Chapter 4, the maximum degrees of the polynomials involved grow fast. As will be seen, often most of the energy of the filter coefficients will be concentrated around the constant coefficient. In order to keep the maximum degrees of the polynomials involved down, a truncation step is utilized. Thereby, the storage requirements for the polynomial matrices are reduced, as well as the computational load involved when applying the decomposition factors as filters. The truncation step is defined in [4], but is restated in Algorithm 1 for clarity.

Algorithm 1 Polynomial Matrix Truncation

1: Input polynomial matrix A(z) ∼ (V_1, V_2, C^{p×q}) and truncation parameter µ.
2: Find the maximum value for T_1 so that \frac{1}{\|A(z)\|_F^2} \sum_{v=-V_1}^{T_1} \sum_{l=1}^{p} \sum_{m=1}^{q} |a_{lm}(v)|^2 \leq \frac{\mu}{2} holds.
3: Find the minimum value for T_2 so that \frac{1}{\|A(z)\|_F^2} \sum_{v=T_2}^{V_2} \sum_{l=1}^{p} \sum_{m=1}^{q} |a_{lm}(v)|^2 \leq \frac{\mu}{2} holds.
4: Return A_{trunc}(z) = \sum_{v=T_1}^{T_2} A_v z^{-v}.

The parameter µ defines the proportion of the total energy of the filter to be truncated from the matrix. It can be shown that the complexity of the naive implementation of this algorithm is O(pqr^2). The complexity can be decreased by a binary search algorithm.
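A sketch of the truncation step, under the storage assumption used earlier (3-D coefficient array plus a lag vector) and with the per-end energy budget of µ/2 described above; the details are illustrative, not the thesis implementation.

```python
import numpy as np

def truncate(A, lags, mu):
    """Trim outer lags of a polynomial matrix, keeping the removed energy
    below a proportion mu of the total (at most mu/2 from each end).

    A[v]  : coefficient matrix for lag lags[v]
    lags  : array of lags, e.g. np.arange(-V1, V2 + 1)
    Returns the trimmed coefficient array and its lag vector.
    """
    energy_per_lag = np.sum(np.abs(A) ** 2, axis=(1, 2))
    budget = 0.5 * mu * energy_per_lag.sum()

    lo, cum = 0, 0.0                       # trim from the front (lag -V1 side)
    while lo < len(lags) and cum + energy_per_lag[lo] <= budget:
        cum += energy_per_lag[lo]
        lo += 1
    hi, cum = len(lags), 0.0               # trim from the back (lag V2 side)
    while hi > lo and cum + energy_per_lag[hi - 1] <= budget:
        cum += energy_per_lag[hi - 1]
        hi -= 1
    return A[lo:hi], lags[lo:hi]

# Example: most of the energy sits at lag 0, so the weak outer lags are cut
lags = np.arange(-2, 4)                    # V1 = 2, V2 = 3
A = np.zeros((len(lags), 2, 2))
A[2] = np.eye(2)                           # dominant lag-0 coefficient
A[0] = 1e-4 * np.ones((2, 2))              # weak outer coefficients
A[-1] = 1e-4 * np.ones((2, 2))
A_t, lags_t = truncate(A, lags, mu=1e-3)
print(lags_t)                              # only lag 0 remains
```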

2.3 Computational Complexity

When studying algorithms from a performance perspective, it is interesting to analyze their running times. This may either be done through benchmarking or theoretical analysis [10, p. 91]. In this thesis, both approaches will be used. Benchmarking is straight-forward, as it only involves running the algorithm and measuring some quantity related to the performance.

By running the algorithm for different sets of input data, experimental relations are obtained relating the performance to the properties of the input. The downside of benchmarking is that the algorithm must be implemented first, and that one can not be sure that the results generalize.

Theoretical analysis, on the other hand, typically only gives upper bounds on the performance. This is particularly the case for algorithms containing loops, where the number of iterations of the loops is not obviously defined in terms of the input data size. In order to deal with bounds in a convenient manner, Ordo or Big-Oh notation is introduced. Assume that T(n) describes the run time of an algorithm with input data size n. Then we say that T(n) = O(f(n)) if it holds that

T(n) ≤ c f(n)

for some constant c > 0 and all integers n > n_0. Ordo notation is a very convenient tool for describing complexity expressions, because of the two following rules as given by [10, p. 98]:

1. Constant factors don’t matter: For any constant d, O(df(n)) = O(f(n)) since the constant can be accumulated in the hidden constant c of the Ordo notation.

2. Low-order terms don’t matter: For the polynomial case, only the term with the largest degree needs to be kept, since it will dominate over any other terms for large n.


Chapter 3

MIMO Channels and Multipath Propagation

3.1 Propagation and Modeling

Wireless communication uses electromagnetic waves in the radio spectrum for information transfer. Electromagnetic waves are completely characterized by Maxwell's equations, a set of coupled partial differential equations. The complicated structure of the equations makes them hard to solve as well as hard to analyze analytically. Radio engineers therefore tend to use simpler models, where certain aspects of the radio wave propagation can more easily be analyzed.

3.1.1 Propagation

In general, the received radio signal is modeled as an attenuated and modified version of the transmitted signal, with some noise added. The attenuation can be classified into three time-scale dependent behaviours: path loss, shadowing and fading [3]. Path loss occurs only because of the distance between the transmitter and the receiver, and can be modeled as a factor 1/r^d, where d usually is in the range [2, 4] [3]. At best, which is the case of an isotropic transmitter antenna with fixed transmit power in empty space, the exponent will take the value 2. This is because for this scenario, the generated radio waves will be spherically propagating from the antenna, and the amplitude at distance r will decrease as 1/r. The power therefore decreases as 1/r^2. Due to ground reflections and other phenomena, the exponent can be larger though. The path loss is the most slowly varying of the three attenuation factors mentioned.

Shadowing takes place on a shorter time scale, and is typically incurred by objects that are blocking the radio waves. In an urban environment, this could be due to cars moving around and temporarily changing the propagation paths. Shadowing is typically hard to model, but one common model is the log-normal distribution [3].

The most quickly evolving attenuation phenomenon is fading. This occurs due to radio waves from different paths adding up constructively or destructively at the receiver. This interference effect changes on the order of half a wavelength, and is therefore sensitive to small changes in the propagation environment. Assuming that there is no Line-of-Sight (LOS) component, the fading is called Rayleigh fading. The name arises from the fact that the channel coefficient can be modeled, through the central limit theorem, as a complex Gaussian stochastic variable, whose magnitude is described by the Rayleigh distribution. If there is a LOS component, the channel gains are instead modeled by a Rice distribution, and the phenomenon is called Rician fading.

A subject that we have touched upon, but not discussed upfront, is that of multipath propagation. Depending on the environment, a transmitted radio wave may take several different paths to the receiver. An example can be seen in Figure 3.1. Depending on the path lengths, and the associated reflections, several delayed versions of the same signal will reach the receiver. The delay spread of a channel is a measure of the spread between the first and the last multipath component to arrive at the receiver. For short delay spreads, compared to the transmit symbol period, the channel is said to have narrowband fading, for which the Rayleigh and Rice models work well. On the other hand, for channels with large delay spreads, other models must be used. The coherence bandwidth B_c of a channel is the width in the frequency domain over which the channel can be assumed constant, and it is approximately inversely proportional to the delay spread.

Figure 3.1: Example of Multipath Propagation

For channels where the delay spread is larger than the symbol period, an effect called inter-symbol interference (ISI) takes place. Since the symbols are transmitted at a high rate, the channel impulse response will not have enough time to "die off". Instead, subsequent symbols will be overlaid at the receiver, and therefore interfere with each other. These channels are also called frequency-selective channels, since different frequency components of the transmitted signal will be attenuated by different factors.


3.1.2 Channel Modeling

In this section, the channel is assumed to be described well by a single-input single-output (SISO) linear system. A linear system is characterized by its impulse response at time t, h(t; τ). For a transmitted signal s(t), the received signal r(t) is modeled as a filtered version of s(t) with some noise n(t) added, such that

r(t) = \int_{-\infty}^{\infty} h(t; \tau)\, s(t - \tau)\, d\tau + n(t).

In the following, we will assume that the channel is time-invariant over the transmission of a block of data, and the channel impulse response can be replaced by h(τ), such that

r(t) = \int_{-\infty}^{\infty} h(\tau)\, s(t - \tau)\, d\tau + n(t).    (3.1)

The output of the channel can be thought of as a weighted sum of the input signal s(t) with weighting factor h(τ). A graphical representation of a SISO channel can be seen in Figure 3.2.

Figure 3.2: Single-input single-output Channel

Radio transmissions always take place at some carrier frequency, which is modulated for data transfer. The carrier frequency itself does not carry any information, and it is therefore convenient to remove the effect of the carrier in any analysis of the signal. For the example above, all signals involved are assumed to be real, since they relate to physical processes. Let S(f) be the Fourier transform of s(t), band-limited to [f_c − W/2, f_c + W/2] where f_c is the carrier frequency and W is the signal bandwidth. For consistency, W < 2f_c. The complex baseband equivalent version of the signal is then defined through its Fourier transform

S_b(f) = \begin{cases} \sqrt{2}\, S(f + f_c) & f + f_c > 0 \\ 0 & f + f_c \leq 0. \end{cases}    (3.2)

Since s(t) is real, its Fourier transform is conjugate symmetric around the origin. The transformation (3.2) therefore effectively moves the positive spectrum down from the carrier frequency to baseband, and scales it so that the sum power remains constant. The baseband equivalent spectrum will not be symmetric around the origin anymore, and the signal s_b(t) defined by S_b(f) is therefore complex.

Note that the original signal s(t) can easily be recovered from s_b(t) since

S(f) = \frac{1}{\sqrt{2}} \big( S_b(f - f_c) + S_b^*(-f - f_c) \big)

and by taking inverse Fourier transforms

s(t) = \frac{1}{\sqrt{2}} \big( s_b(t) e^{j 2\pi f_c t} + s_b^*(t) e^{-j 2\pi f_c t} \big) = \sqrt{2}\, \mathrm{Re}\big( s_b(t) e^{j 2\pi f_c t} \big).    (3.3)


The complex baseband equivalent signal is more convenient to work with than the passband signal, and since the transformation is invertible no information is lost in the conversion process.

Finally, in order to work with the signals in a computer, a discrete-time model is needed.

Sampling (3.1) faster than the Nyquist rate, it can be shown that a discrete-time model takes the form

r[m] = \sum_{l=-\infty}^{\infty} h[l]\, s[m - l] + n[m]    (3.4)

where h[l] is determined from the transmit, channel, and receive filters in place. For the full derivations, see e.g. [11, p. 49].

3.1.3 MIMO Channels

A multiple-input multiple-output (MIMO) channel is a channel with several transmit and/or receive antennas. It is described by a matrix, since every transmit-receive antenna pair has a channel associated with it. These single-input single-output (SISO) sub-channels may be correlated. A graphical representation of a MIMO channel can be seen in Figure 3.3. The number of antennas used in MIMO communications systems is typically on the order of 2-4 on either or both sides. For example, the IEEE 802.11n WiFi standard allows for up to 4 antennas on both sides [12].

Figure 3.3: Multiple-input multiple-output Channel

Following the same derivations as in the previous section, it can be shown that the discrete-time complex baseband equivalent system for the MIMO channel with M_r receive antennas and M_t transmit antennas takes the form

r[m] = \sum_{l=-\infty}^{\infty} H[l]\, s[m - l] + n[m]    (3.5)

where r[m], n[m] ∈ C^{M_r×1}, H[l] ∈ C^{M_r×M_t} and s[m] ∈ C^{M_t×1}. For the narrowband case, (3.5) simplifies to

r[m] = H s[m] + n[m].

The matrix channel H[m] describes how the transmitted signal is mixed in space and time, when sampled at the receiver. Row i of H[m] determines the weighting factors for the received signal at receiver antenna i at time lag m. Similarly, column j of H[m] describes the spatial signature of the signal from transmitter antenna j at time lag m.

Taking the z-transform of both sides of (3.5), and using the convolution theorem for the z-transform [8, p. 191], the relation is described by the polynomial matrix equation

r(z) = H(z)s(z) + n(z), (3.6)

a fact we will rely heavily on in the following.
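As an illustration of (3.5) and (3.6), the sketch below applies a causal FIR MIMO channel, stored lag by lag as a 3-D array, to a block of transmit vectors by convolution. The storage layout and parameter values are assumptions made for the example.

```python
import numpy as np

def mimo_channel(H, s, noise_std=0.0, rng=None):
    """Apply a causal FIR MIMO channel, cf. eqs. (3.5)/(3.6).

    H : (L, Mr, Mt) array, H[l] is the coefficient of z^{-l} in H(z)
    s : (Mt, N) array of transmit vectors s[0], ..., s[N-1]
    Returns r : (Mr, N + L - 1) array of received vectors.
    """
    L, Mr, Mt = H.shape
    N = s.shape[1]
    r = np.zeros((Mr, N + L - 1), dtype=complex)
    for l in range(L):
        r[:, l:l + N] += H[l] @ s          # r[m] = sum_l H[l] s[m - l]
    if noise_std > 0:
        rng = rng or np.random.default_rng()
        r += noise_std / np.sqrt(2) * (rng.standard_normal(r.shape)
                                       + 1j * rng.standard_normal(r.shape))
    return r

# Example: 2x2 channel with 3 taps, 5 transmitted vectors
rng = np.random.default_rng(1)
H = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
s = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
r = mimo_channel(H, s, noise_std=0.1, rng=rng)
print(r.shape)    # (2, 7)
```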

3.2 Channel Capacity and Achievable Rate

In his seminal paper "A Mathematical Theory of Communication" [13], Claude E. Shannon introduced the concept of a channel capacity, and derived it for the additive white Gaussian noise (AWGN) SISO channel with an average power constraint. The channel capacity, if known for the given channel model, is the highest possible rate that can be used for communication over the channel with vanishing error probability. That is, for any rate below the channel capacity, there exists a code, with long enough codewords, that achieves that rate with arbitrarily small error probability.

Shannon's results provide a benchmark against which any practical transmission scheme can be compared. The proof of the channel coding theorem is not constructive, however, and it is only recently, with the introduction of turbo codes and low-density parity-check (LDPC) codes, that practical systems have come close to the channel capacity [2, p. 252].

In [13], Shannon derived the channel capacity for the deterministic AWGN channel with an average power constraint. With system bandwidth W, noise variance σ_n^2 and average power less than P, the well-known formula for the capacity is

S_{AWGN} = W \log \left( 1 + \frac{P}{\sigma_n^2} \right).

A similar result, for the channel capacity of the deterministic narrowband AWGN MIMO channel with an average sum power constraint, was derived in [14]. The capacity of the channel, with spatially and temporally white Gaussian noise with variance σ_n^2, is given by

S_{MIMO\text{-}AWGN} = \max_{P} \log \left| I + \frac{1}{\sigma_n^2} H P H^H \right|

where P = E\{s s^H\} is the covariance of the transmit symbol s. The rate R < S_{MIMO-AWGN},

R = \log \left| I + \frac{1}{\sigma_n^2} H P H^H \right|,    (3.7)

is called an achievable rate, since for a sub-optimal choice of P it is less than the channel capacity. The noisy channel coding theorem of [13] therefore gives that such a rate R is achievable, since R < S and a code with rate S exists. For our purposes, (3.7) will simply serve as a mapping from SNR to data rate, which will be useful for the comparison of transmission schemes in Chapter 6.
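A small numerical illustration of (3.7); the base-2 logarithm (rate in bits per channel use), the channel realization and the equal-power covariance below are assumptions made for the example.

```python
import numpy as np

def achievable_rate(H, P, noise_var):
    """Achievable rate of eq. (3.7), here in bits per channel use."""
    Mr = H.shape[0]
    M = np.eye(Mr) + (H @ P @ H.conj().T) / noise_var
    sign, logdet = np.linalg.slogdet(M)   # log|M|, computed robustly
    return logdet / np.log(2)

# Example: 2x2 channel, equal power allocation over the transmit antennas
rng = np.random.default_rng(2)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
P_total, noise_var = 1.0, 0.1
P = (P_total / 2) * np.eye(2)             # suboptimal (not waterfilled) covariance
print(achievable_rate(H, P, noise_var))
```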


3.3 Equalization Techniques

For a wideband channel, multipath propagation is inherent. For high symbol rates, this results in ISI. What is meant by a high symbol rate is not clear-cut, but the rule of thumb is that ISI must be mitigated when W ≫ B_c.

In order to do symbol-by-symbol detection, the receiver needs to remove the ISI. The operation is called channel equalization, and there are many different strategies for how to do it. The equalizer needs information about how the channel behaves, typically the channel impulse response. This is obtained by a training phase, where the receiver can identify the channel from a sequence of known symbols that are sent. The optimal equalization, in the sense of minimizing the probability that a symbol in the sequence is detected incorrectly, is called Maximum Likelihood Sequence Estimation (MLSE), and is implemented in practice using the Viterbi algorithm [15, p. 88]. A downside of MLSE is that the algorithm complexity grows exponentially in the number of channel taps [15, p. 90]. At the other end of the complexity scale is the family of low-complexity linear equalizers. A common linear equalizer, the zero-forcing equalizer, removes the effect of ISI completely but suffers from noise amplification. Another equalizer, the MMSE receiver, makes an optimal trade-off between removing ISI and amplifying the noise, within the class of linear receivers.

Orthogonal Frequency-Division Multiplexing

For channels with high spectral dynamics, that is a small coherence bandwidth B_c, there is another viable alternative. Orthogonal frequency-division multiplexing (OFDM) leverages the Fast Fourier Transform (FFT) and can handle quasi-static wideband channels well [15, p. 99]. In fact, OFDM can achieve the channel capacity as the number of sub-carriers grows large.

Through OFDM, signaling is performed in the frequency domain. In order to launch the signal onto the channel, it is transformed to the time domain using an Inverse FFT (IFFT). At the receiver, the received signal is transformed to the frequency domain, where detection takes place. Effectively, OFDM transforms the wideband channel into a set of parallel independent channels in the frequency domain.

For the SISO case, assume that a sequence of N bits b(n) ∈ {0, 1} is to be transmitted.

Through some mapping from b(n), a frequency vector s' ∈ C^{N×1} is created. The frequency vector is transformed to the time domain through the application of an IDFT matrix F ∈ C^{N×N} defined by

F_{ij} = \frac{1}{\sqrt{N}} e^{j 2\pi (i-1)(j-1)/N}.    (3.8)

The time domain representation then takes the form

s'' = F^H s'.    (3.9)

In order to make the linear convolution of the channel act as the cyclic convolution implied by the DFT, a cyclic prefix is prepended, so that the signal

\tilde{s} = \big[ s''[N - (L-1)] \; \ldots \; s''[N-1] \;\; s''[0] \; s''[1] \; \ldots \; s''[N-1] \big]^T    (3.10)

is transmitted on the channel. Because of the cyclic prefix, the output of the channel will be a cyclic convolution of the input signal, plus noise,

\tilde{r}[m] = \sum_{l=0}^{L-1} h_l\, s''[(m - L - l) \bmod N] + w[m].    (3.11)

At the receiver, stripping off the cyclic prefix, the vector

r'' = \big[ \tilde{r}[0] \; \tilde{r}[1] \; \ldots \; \tilde{r}[N-1] \big]^T    (3.12)

is formed. Transforming into the frequency domain, the received frequency vector

r' = F r''    (3.13)

is obtained, which is then detected for every vector component. It can be shown that, thanks to the cyclic prefix, the channel is rendered circulant, and the IFFT/FFT pair diagonalizes circulant matrices. The model, in the frequency domain, for the communications process is therefore

r' = Ω s' + w    (3.14)

where Ω is a diagonal matrix with the channel gains of the different frequency bins on the diagonal. The noise keeps its characteristics due to the unitarity of the FFT transform.
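The cyclic-prefix mechanism behind (3.10)-(3.14) can be verified numerically. The sketch below uses the common convention with the IDFT at the transmitter and the DFT at the receiver, so the subcarrier indexing may differ from the matrix convention above; all parameter values are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 64, 4                                    # subcarriers, channel taps
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)

# Frequency-domain QPSK symbols, IDFT to time domain, prepend cyclic prefix
s_freq = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
s_time = np.fft.ifft(s_freq) * np.sqrt(N)       # unitary scaling
s_cp = np.concatenate([s_time[-(L - 1):], s_time])

# Linear convolution with the channel (noise omitted for clarity)
r_cp = np.convolve(s_cp, h)

# Strip the cyclic prefix and the convolution tail, DFT back to frequency
r_time = r_cp[L - 1:L - 1 + N]
r_freq = np.fft.fft(r_time) / np.sqrt(N)

# The channel is diagonalized: each subcarrier sees a scalar gain H_k
H_k = np.fft.fft(h, N)
assert np.allclose(r_freq, H_k * s_freq)
print("per-subcarrier model r'_k = H_k s'_k verified")
```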

With the channel diagonalized over the frequency band, the transmitter is free to select varying transmit powers for the different frequency bins. The optimal way, in the sense of achievable rate, of doing this is by the waterfilling technique as further described by [11, p. 68]. The waterfilling strategy is also applied in Chapter 6, where a wideband MIMO channel is diagonalized in frequency through OFDM.
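A sketch of the waterfilling allocation over the parallel subchannels, solving for the water level by bisection; the gains and power budget are illustrative, and this is not the implementation used in the thesis.

```python
import numpy as np

def waterfill(gains, P_total, noise_var=1.0):
    """Waterfilling over parallel subchannels with power gains |H_k|^2.

    Allocates p_k = max(0, mu - noise_var / gains[k]) with sum(p_k) = P_total,
    finding the water level mu by bisection.
    """
    inv_snr = noise_var / gains                  # "floor" of each subchannel
    lo, hi = inv_snr.min(), inv_snr.max() + P_total
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - inv_snr).sum() > P_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, 0.5 * (lo + hi) - inv_snr)

# Example: allocate unit total power over 16 subcarriers of a 4-tap channel;
# weak subcarriers may receive zero power
gains = np.abs(np.fft.fft([0.8, 0.4, 0.2, 0.1], 16)) ** 2
p = waterfill(gains, P_total=1.0)
rate = np.sum(np.log2(1 + gains * p / 1.0))
print(p.round(3), rate)
```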

3.4 Summary

This chapter provided an introduction to channel modeling in wireless communications, as well as the concepts of channel capacity and achievable rate. Furthermore, multipath propagation and the sometimes ensuing inter-symbol interference were introduced. Some strategies for combating ISI were discussed, and the OFDM technique was described in detail.

In Chapter 6, most of the ideas of this chapter will be referred to. In particular, OFDM will be used to mitigate ISI for the MIMO wideband scenario present there.


Chapter 4

Polynomial Decomposition Algorithms: Coefficient Nulling

This chapter will investigate a couple of polynomial decomposition algorithms which are based on the idea of iterative nulling of single coefficients. Polynomial Givens Rotations (PGRs) are employed for the coefficient nulling, as defined in Section 2.2.1. Through the application of consecutive PGRs on a matrix, decompositions such as PQRD and PSVD can be found.

Four algorithms will be studied, of which two were proposed by Foster et al. in [4]. The remaining two are slightly modified versions of the original algorithms.

The decompositions generated by the algorithms will be approximations, because as shown by [16], an exact FIR decomposition of a FIR matrix is impossible to achieve. In this thesis, approximate polynomial decomposition algorithms will be used for the channel diagonalization problem of spatial multiplexing in wireless communications. This topic is further studied in Chapter 6.

In [17], McWhirter proposes a different procedure for obtaining a polynomial singular value decomposition. There, a sequential best rotation algorithm is introduced using generalized Kogbetliantz transformations. The algorithm, which is not studied in this thesis, is shown to perform better than previous sequential best rotation procedures.

The first section of this chapter will describe the performance measures employed in the study of the algorithms. As these are defined, the following sections will investigate one algorithm at a time, with respect to function, convergence and complexity.

4.1 Performance Criteria

In order to measure performance of the algorithms, a number of performance criteria will be defined. For the run time measurements, or algorithm complexity, the number of iterations of the innermost loop (coefficient steps, see Section 4.2) needed until convergence will be taken as the performance measure. The complexity in terms of floating point operations is then simply obtained by plugging in the complexity of a single coefficient step in terms of floating point operations, as given by equation (2.20).

Before introducing the other performance criteria, the following optimization problems are posed:

minimize  \|A(z) - Q(z)R(z)\|_F
subject to  Q^H(z^{-*})Q(z) = I,  \|R_{lower}(z)\|_F = 0

minimize  \|A(z) - U(z)D(z)V^H(z^{-*})\|_F
subject to  U^H(z^{-*})U(z) = I,  V^H(z^{-*})V(z) = I,  \|D_{non-diag}(z)\|_F = 0.

Casually speaking, the PQRD and PSVD algorithms under study, respectively, can be thought to approximately solve these optimization problems. The resulting matrices Q(z), R(z) or U(z), D(z), V(z) would however neither be feasible nor minimize the cost function. Rather, the algorithms will output matrices that are "close" to fulfilling the constraints, while having a "small" associated cost. With this loose argument in mind, error criteria will be defined that determine how "close" a set of matrices is to fulfilling the constraints, and how "small" the associated cost is.

How well the product of the decomposition matrices describes the decomposed matrix is measured by the decomposition error

E_d^{QRD} = \frac{\|A(z) - Q(z)R(z)\|_F}{\|A(z)\|_F}    (4.1)

for a PQRD A(z) = Q(z)R(z). For the same PQRD, the triangularity error

E_t = \frac{\|R_{lower}(z)\|_F^2}{\|A(z)\|_F^2}    (4.2)

shows the ratio of the amount of energy in the lower triangular section of R(z) to the total amount of energy. Similarly, for a PSVD A(z) = U(z)D(z)V^H(z^{-*}) the decomposition error is

E_d^{SVD} = \frac{\|A(z) - U(z)D(z)V^H(z^{-*})\|_F}{\|A(z)\|_F}.    (4.3)

This decomposition error definition makes sense from the algorithm evaluation point of view, as it represents how well the decomposition factors describe the original matrix. From an application point of view, it might be interesting to study the normalized version of \|D(z) - U^H(z^{-*})A(z)V(z)\|_F, as that definition instead would describe the error in the calculated singular values of the original matrix.

The relative amount of energy in the non-diagonal part of D(z) is described by the diagonality error

E_{diag} = \frac{\|D_{non-diag}(z)\|_F^2}{\|A(z)\|_F^2}.    (4.4)

Finally, the unitarity error of any matrix A(z) is defined as

E_u = \frac{\|I - A^H(z^{-*})A(z)\|_F}{\|I\|_F}    (4.5)

and indicates how close A(z) is to being paraunitary.
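The error criteria translate directly into code once a polynomial matrix representation is fixed. The sketch below repeats the small 3-D array helpers assumed earlier so that it is self-contained, and computes E_u of (4.5) and E_d of (4.1) for causal factors; it is an illustration, not the evaluation code used in the thesis.

```python
import numpy as np

def poly_mat_mul(A, B):
    """Polynomial matrix product: coefficient matrices convolved along lags."""
    C = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1], B.shape[2]),
                 dtype=complex)
    for a in range(A.shape[0]):
        for b in range(B.shape[0]):
            C[a + b] += A[a] @ B[b]
    return C

def para_hermitian(A):
    """A^H(z^{-*}): conjugate-transpose each coefficient and reverse the lags."""
    return np.conj(A.transpose(0, 2, 1))[::-1]

def fnorm(A):
    return np.sqrt(np.sum(np.abs(A) ** 2))

def unitarity_error(Q):
    """E_u of eq. (4.5): how far Q^H(z^{-*})Q(z) is from the identity."""
    P = poly_mat_mul(para_hermitian(Q), Q)
    I_poly = np.zeros_like(P)
    I_poly[P.shape[0] // 2] = np.eye(P.shape[1])   # lag 0 is the middle index
    return fnorm(P - I_poly) / np.sqrt(P.shape[1])

def qr_decomposition_error(A, Q, R):
    """E_d of eq. (4.1) for causal A(z), Q(z), R(z) (lags 0 .. L-1)."""
    QR = poly_mat_mul(Q, R)
    A_pad = np.zeros_like(QR)
    A_pad[:A.shape[0]] = A
    return fnorm(A_pad - QR) / fnorm(A)
```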


4.2 PQRD-BC: Polynomial QR Decomposition

The first algorithm to be studied is PQRD By Columns (PQRD-BC), which was first introduced in [4]. The algorithm generates an approximate PQRD

A(z) = Q(z)R(z)

of an arbitrary matrix A(z), where Q(z) is approximately paraunitary and R(z) is approximately upper triangular. This is done through the iterative application of PGRs, until A(z) has been transformed into a sufficiently upper triangular matrix. As will be shown in Section 4.4, PQRD-BC will be a necessary component of the PSVD by PQRD-BC algorithm.

As the next algorithm to be studied, Modified PQRD-BC, is a derivative of PQRD-BC, we will refrain from presenting simulation results for PQRD-BC. Rather, the results for Modified PQRD-BC given in Section 4.3 also represent the typical behaviour of PQRD-BC.

Though the original definition of PQRD-BC can be found in [4], we restate the algorithm here for completeness. A pseudocode representation of the algorithm can be seen in Algorithm 2.

Algorithm 2 PQRD By Columns (PQRD-BC)

1: Input polynomial matrix A(z) ∼ (V_1, V_2, C^{p×q}), convergence parameter ε, truncation parameter µ and absolute stopping criteria MaxIter and MaxSweeps.
2: Let Q(z) = I_p, g_1 = 1 + ε and n = 0.
3: while n ≤ MaxSweeps and g_1 > ε do
4:   Let n = n + 1.
5:   for k = 1 . . . min(p − 1, q) do
6:     Let iter = 0 and g_2 = 1 + ε.
7:     while iter ≤ MaxIter and g_2 > ε do
8:       Find j and t such that |a_{jk}(t)| ≥ |a_{mk}(t)| holds for m = k + 1 . . . p and ∀t ∈ Z.
9:       Let g_2 = |a_{jk}(t)|.
10:      if g_2 > ε then
11:        Let iter = iter + 1.
12:        Obtain PGR G(z) as a function of (j, k, t, |a_{jk}(t)|, |a_{kk}(0)|).
13:        Let A(z) = G(z)A(z).
14:        Let Q(z) = G(z)Q(z).
15:        Truncate A(z), Q(z) given µ.
16:      end if
17:    end while
18:  end for
19:  Find j, k and t such that |a_{jk}(t)| ≥ |a_{mn}(t)| holds for n = 1 . . . q, m = n + 1 . . . p and ∀t ∈ Z.
20:  Let g_1 = |a_{jk}(t)|.
21: end while
22: Let R(z) = A(z).

For future reference, the block of rows 12-15 will be called a coefficient step and the operations in rows 6-17 a column step. The algorithm operates over all columns of A(z) from left to right. For every column, a certain number of coefficient steps are performed, until the coefficients in the given column are sufficiently small. To determine which coefficient to null next, the coefficient with the greatest magnitude under the main diagonal in the given column is found. This coefficient is subsequently nulled through the application of a polynomial Givens rotation with appropriate parameters. This behaviour is repeated until all coefficients under the main diagonal in the given column have a magnitude less than the convergence parameter ε.

As one column step is finished, the algorithm moves to the next column, until all columns have been traversed. As a safe-guard, the algorithm can restart from column 1 if necessary, making another sweep over the columns.

Every coefficient step includes a matrix truncation, implemented as described by Algorithm 1. Without this truncation step, the maximum degrees of the matrix polynomials would grow very fast, and with that the memory requirements. As is generally the case, most of the energy of the coefficients is centered around the zero-lag coefficient, and the decomposition is therefore not ruined. This can specifically be seen in the example in Section 4.3.2.
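For illustration, here is a sketch of a single coefficient step (the search of row 8 followed by rows 12-14 of Algorithm 2), using the 3-D coefficient-array storage assumed earlier with an explicit index for the lag-0 matrix; the truncation of row 15 (Algorithm 1) would follow each such step. This is a sketch of the mechanics, not the thesis implementation.

```python
import numpy as np

def coefficient_step(A, zero_lag, k):
    """One coefficient step of PQRD-BC (rows 8 and 12-14 of Algorithm 2).

    A        : (L, p, q) array; A[v] is the coefficient of z^{-(v - zero_lag)}
    zero_lag : index of the lag-0 coefficient matrix
    k        : current column (0-based), assumed to satisfy k < p - 1
    Returns (A, zero_lag) with the dominant below-diagonal coefficient of
    column k nulled by a polynomial Givens rotation.
    """
    # Row 8: dominant coefficient below the main diagonal of column k
    col = np.abs(A[:, k + 1:, k])
    v, j_rel = np.unravel_index(np.argmax(col), col.shape)
    j, t = k + 1 + j_rel, v - zero_lag

    # Rotation parameters, eq. (2.19)
    a, b = A[zero_lag, k, k], A[v, j, k]
    theta = np.arctan2(abs(b), abs(a))
    c, s = np.cos(theta), np.sin(theta)
    ea, eb = np.exp(-1j * np.angle(a)), np.exp(-1j * np.angle(b))

    # The degree grows by 2|t|: pad |t| lags at both ends so shifts fit
    pad = np.zeros((abs(t), A.shape[1], A.shape[2]), dtype=complex)
    A = np.concatenate([pad, A.astype(complex), pad])
    zero_lag += abs(t)

    # Rows 12-14: apply G(z) of eq. (2.18) to rows k and j only
    row_k, row_j = A[:, k, :].copy(), A[:, j, :].copy()
    A[:, k, :] = c * ea * row_k + s * eb * np.roll(row_j, -t, axis=0)
    A[:, j, :] = -s * np.conj(eb) * np.roll(row_k, t, axis=0) \
                 + c * np.conj(ea) * row_j
    return A, zero_lag

# Example: one step on a random 3x3 causal polynomial matrix, column k = 0
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3, 3)) + 1j * rng.standard_normal((3, 3, 3))
v0, j0 = np.unravel_index(np.argmax(np.abs(A[:, 1:, 0])), A[:, 1:, 0].shape)
A2, zl = coefficient_step(A, zero_lag=0, k=0)
print(abs(A2[zl + v0, 1 + j0, 0]))    # the targeted coefficient is now ~0
```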

4.2.1 Convergence and Complexity

The polynomial Givens rotation is a paraunitary transformation, and therefore energy preserving. By selecting parameters according to equation (2.19), the rotation has the effect of moving energy from the coefficient being nulled to the zero-lag coefficient of the diagonal element of the current column. After a column step is finished, most of the energy of the coefficients below the main diagonal will have been transferred to the diagonal element. Because the columns are visited in order from left to right, any subsequent coefficient step will mainly affect columns including and to the right of the current column. This is because the coefficients in the previous columns will be close to being zero. The left-to-right ordering of column steps, together with the fact that the algorithm is allowed to restart for more sweeps, guarantees the convergence of the algorithm. The full convergence proof can be found in [4].

In order to derive the theoretical complexity, assume that PQRD-BC is applied to A(z) ∼ (V_1, V_2, C^{p×q}). Additionally, it will be assumed that the time lag dimension of any matrix involved will be bounded by some r ∝ (V_1 + V_2 + 1), because of the truncation step at row 15.

The complexity will be derived in terms of number of coefficient steps needed for convergence.

By definition, the block of rows 12-15 is one coefficient step. Let the complexity of the coefficient step be O(C_c), where C_c is some function of p, q, r. Since row 11 is O(1), it is insignificant compared to the complexity of the coefficient step. Rows 8 and 19 involve searches for the coefficient with the largest magnitude under the main diagonal in the current column, and under the main diagonal in any column, respectively. Denote the complexities of these searches as O(C_{jt}) and O(C_{jkt}).

The number of iterations of the while loop starting at row 7 is bounded by the constant MaxIter, and the complexity of the loop is therefore O(MaxIter · C_c) + O(MaxIter · C_{jt}) = O(C_c) + O(C_{jt}). This is a bound with a very large hidden constant in the Ordo expression, and may therefore not be representative of the typical behaviour of the loop. In practice, the number of iterations of the loop would be proportional to the number of coefficients below the main diagonal of that column. The modified PQRD-BC given in Algorithm 3 does not suffer from this theoretical inconvenience, as the convergence criterion is defined differently.

Furthermore, the for loop at row 5 will iterate min(p − 1, q) times, and the complexity of the block of rows 6-17 is therefore O(min(p − 1, q) · C_c) + O(min(p − 1, q) · C_{jt}). The number of iterations (sweeps) of the outermost loop of the algorithm is bounded by a constant MaxSweeps.

The complexity of the iterative part of the algorithm is therefore O(MaxSweeps min(p −
