
UPTEC F10 059

Degree project (Examensarbete), 20 credits, November 2010

Polynomial Matrix Decompositions

Evaluation of Algorithms with an Application to Wideband MIMO Communications

Rasmus Brandt


Faculty of Science and Technology, UTH Unit

Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, Hus 4, Plan 0

Postal address: Box 536, 751 21 Uppsala

Telephone: 018 – 471 30 03

Fax: 018 – 471 30 00

Website: http://www.teknat.uu.se/student

Abstract

Polynomial Matrix Decompositions: Evaluation of Algorithms with an Application to Wideband MIMO Communications

Rasmus Brandt

The interest in wireless communications among consumers has exploded since the introduction of the "3G" cell phone standards. One reason for their success is the ever higher data rates achievable through the networks. A further increase in data rates is possible through the use of multiple antennas at either or both sides of the wireless links.

Precoding and receive filtering using matrices obtained from a singular value decomposition (SVD) of the channel matrix is a transmission strategy for achieving the channel capacity of a deterministic narrowband multiple-input multiple-output (MIMO) communications channel. When signalling over wideband channels using orthogonal frequency-division multiplexing (OFDM), an SVD must be performed for every sub-carrier. As the number of sub-carriers of this traditional approach grows large, so does the computational load. It is therefore interesting to study alternative means for obtaining the decomposition.

A wideband MIMO channel can be modeled as a matrix filter with a finite impulse response, represented by a polynomial matrix. This thesis is concerned with investigating algorithms which decompose the polynomial channel matrix directly. The resulting decomposition factors can then be used to obtain the sub-carrier based precoding and receive filtering matrices. Existing approximative polynomial matrix QR and singular value decomposition algorithms were modified, and studied in terms of decomposition quality and computational complexity. The decomposition algorithms were shown to give decompositions of good quality, but if the goal is to obtain precoding and receive filtering matrices, the computational load is prohibitive for channels with long impulse responses.

Two algorithms for performing exact rational decompositions (QRD/SVD) of polynomial matrices were proposed and analyzed. Although for simple cases they resulted in excellent decompositions, issues with the numerical stability of a spectral factorization step render the algorithms unusable in their current form.

For a MIMO channel with exponentially decaying power-delay profile, the sum rates achieved by employing the filters given from the approximative polynomial SVD algorithm were compared to the channel capacity. It was shown that if the symbol streams were decoded independently, as done in the traditional approach, the sum rates were sensitive to errors in the decomposition. A receiver with a spatially joint detector achieved sum rates close to the channel capacity, but with such a receiver the low complexity detector set-up of the traditional approach is lost.

Summarizing, this thesis has shown that a wideband MIMO channel can be diagonalized in space and frequency using OFDM in conjunction with an approximative polynomial SVD algorithm. However, in order to reach sum rates close to the capacity of a simple channel, the computational load becomes prohibitive compared to the traditional approach for channels with long impulse responses.

Examiner: Tomas Nyberg. Subject reviewer (ämnesgranskare): Mikael Sternad. Supervisor: Mats Bengtsson.


Popular Science Summary

Wireless communication is a field whose popularity has grown in recent years. One reason for the success of "3G" mobile Internet is the high data rates it makes possible. The data rate of a wireless link depends on the signal bandwidth and the transmit power, and increasing either yields higher data rates. Both bandwidth and transmit power are expensive resources, however, since their use is often regulated by national and international authorities.

Another way to increase the data rate of a wireless link is to add more antennas at the transmitter and receiver sides, a so-called MIMO system. Such a system can be viewed as a set of mutually interacting single-antenna links and can be described by a matrix. The data rate of the multi-antenna link can be maximized by sending several parallel data streams over the MIMO channel. Since the different transmitted signals share the radio channel, they will be mixed. Each receive antenna will therefore pick up a combination of the signals transmitted from the different transmit antennas.

To avoid the signals mixing, they must be coded. It turns out that by coding the transmitted signals with a particular matrix, and decoding the received signals with another matrix, the channel is transformed into a set of parallel virtual channels. Independent data streams can then be sent over these virtual channels. The coding matrices are given by a so-called singular value decomposition of the original channel matrix.

For a single-antenna system with high bandwidth, the radio channel will affect the different frequency components of the signal differently. If the system does not account for this effect, its performance will suffer. One way to avoid this frequency selectivity is to signal over the channel using so-called OFDM. The OFDM system splits the original signal into several signals with low bandwidth. By sending these narrowband signals on different parts of the frequency band, they do not affect each other. The frequency-selective channel has thus been divided into a number of non-frequency-selective parallel subchannels.

By sending a wideband signal over an OFDM-based MIMO system, even higher data rates can be achieved. However, the coding matrices must then be computed for every parallel subchannel in the frequency band, which requires many computational operations. This thesis has investigated a new set of algorithms for obtaining approximations of the required coding matrices. The quality of the approximate coding matrices was compared to that of the exact matrices, and the number of required computational operations was measured.

It turned out that the new algorithms can produce coding matrices of good quality, but at the cost of more computational operations than the traditional way of obtaining the coding matrices.

The coding matrices from the new algorithms were also simulated in a communications system. With the new matrices, data rates close to the theoretical maximum capacity of a simple radio channel can be achieved, provided that an advanced decoder is used at the receiver side. If a set of simple decoders is used instead, as in the traditional system, system performance suffers.

In summary, this thesis has shown that the coding matrices obtained from the new algorithms can be used in a wideband MIMO system to maximize the data rate. However, they require more computational operations, and a more advanced decoder, than the traditional system. The new algorithms are therefore not competitive with the traditional system.


Acknowledgements

This diploma work was performed at the Signal Processing Laboratory at the School of Electrical Engineering at Kungliga Tekniska Högskolan in Stockholm, and will lead to a degree of Master of Science in Engineering Physics from Uppsala University.

First and foremost, I would like to thank my supervisor Mats Bengtsson for proposing the thesis topic and taking me on as an MSc thesis worker. His advice and guidance have helped me considerably during the course of this work. My subject reviewer (ämnesgranskare) Mikael Sternad at the Division for Signals and Systems at Uppsala University also deserves my gratitude; his comments have been very valuable to the final version of this thesis.

My family has always supported my endeavours, and for that I am endlessly grateful.

Finally, thank you Melissa for being so lovely and cheerful, and for moving to Sweden to be with me.


Contents

1 Introduction
  1.1 Wireless Communications
  1.2 Multiple Antennas and Wideband Channels
  1.3 Problem Formulation and Contributions
  1.4 Thesis Outline

2 Preliminaries
  2.1 Complex Polynomials
    2.1.1 Addition and Subtraction
    2.1.2 Multiplication
  2.2 Polynomial Matrices
    2.2.1 Givens Rotations
    2.2.2 Decompositions
    2.2.3 Coefficient Truncation
  2.3 Computational Complexity

3 MIMO Channels and Multipath Propagation
  3.1 Propagation and Modeling
    3.1.1 Propagation
    3.1.2 Channel Modeling
    3.1.3 MIMO Channels
  3.2 Channel Capacity and Achievable Rate
  3.3 Equalization Techniques
  3.4 Summary

4 Polynomial Decomposition Algorithms: Coefficient Nulling
  4.1 Performance Criteria
  4.2 PQRD-BC: Polynomial QR Decomposition
    4.2.1 Convergence and Complexity
    4.2.2 Discussion
  4.3 MPQRD-BC: Modified PQRD-BC
    4.3.1 Convergence and Complexity
    4.3.2 Simulations
    4.3.3 Discussion
  4.4 PSVD by PQRD-BC: Polynomial Singular Value Decomposition
    4.4.1 Convergence and Complexity
    4.4.2 Discussion
  4.5 MPSVD by MPQRD-BC: Modified PSVD
    4.5.1 Convergence and Complexity
    4.5.2 Simulations
    4.5.3 Discussion
  4.6 Sampled PSVD vs. SVD in DFT Domain
    4.6.1 Frequency Domain Comparison
    4.6.2 Computational Load Comparison, Set-Up Phase
    4.6.3 Computational Load, Online Phase
    4.6.4 Discussion
  4.7 Summary

5 Rational Decomposition Algorithms: Polynomial Nulling
  5.1 Rational Givens Rotation
  5.2 PQRD-R: Rational QR Decomposition
    5.2.1 Simulations
    5.2.2 Discussion
  5.3 PSVD-R by PQRD-R: Rational Singular Value Decomposition
    5.3.1 Simulations
    5.3.2 Discussion
  5.4 Summary

6 Polynomial SVD for Wideband Spatial Multiplexing
  6.1 Generic System Model
    6.1.1 Narrowband Scenario
    6.1.2 Wideband Scenario
  6.2 SM by MIMO-OFDM: SVD in the DFT Domain
    6.2.1 Specific System Model
    6.2.2 Capacity
  6.3 SM by MIMO-OFDM: SVD in the z-Domain
    6.3.1 Specific System Model
    6.3.2 Achievable Rate
  6.4 Simulations
    6.4.1 Method
    6.4.2 Results
  6.5 Summary

7 Summary
  7.1 Conclusions
  7.2 Future Work

A Acronyms and Notation

B Some Complexity Derivations
  B.1 Matrix-Matrix Multiplication

Bibliography


Chapter 1

Introduction

In the current day and age, access to mobile broadband through the cellular networks is ubiquitous. The demands for higher data rates are ever increasing, as people get used to having constant access to the Internet. The latest acronym in the flora of terms relating to cellular networks is LTE, standing for Long Term Evolution. This new standard promises even higher data rates than the previous ”3G” standards, by employing efficient modulation schemes as well as terminals with multiple antennas [1].

With these increasing data rates, one could ponder how they are achieved. It all boils down to the efficient use of resources, in this case power and bandwidth. As the use of the resources is optimized, higher data rates can be provided to the cellular customers. When the power and bandwidth allocation gets close to the optimal point, however, how would one go about increasing the data rate even further?

An exciting field in wireless communications is that of multi-antenna systems, so called MIMO (multiple-input multiple-output) communications. Having access to multiple antennas at either or both sides of a wireless link opens up new transmission strategies. The MIMO channel can be used for increasing the rate even further, improving the signal quality, or both at the same time. The reason that the data rate can be increased, for the same amount of available power and bandwidth, is that the MIMO channel under certain conditions provides multiple parallel spatial channels, which can be used independently. This thesis will study how one can get access to the spatial channels, and compare two different approaches for doing this.

1.1 Wireless Communications

Wireless communications is the field of communication strategies that employ radio waves for information transfer. In particular, this thesis will focus on digital wireless communications, meaning that the information being transmitted is in digital form. The system is agnostic of the meaning of the data, but is rather focused on reliably transferring the data across the channel. The typical framework for digital communications can be seen in Figure 1.1.

The first operation in the system is that of source coding. This is the act of taking the information, in whatever form, and transforming it into a form suitable for transmission in a digital communications system. It may include sampling, quantization and lossy or lossless compression of the data [2, p. 2]. The output from the source coder is a sequence of bits, or binary digits. At the receiver side, the last operation performed is undoing the source coding.


Figure 1.1: Block Diagram of a Typical Digital Communications System (source coding, channel coding, modulation, channel, demodulation, channel decoding, source decoding).

If the system has been designed properly, the output of the source decoder will resemble the input to the source coder.

If part of the source coder’s task is to remove redundancy in the data, then the opposite holds for the channel coder. The blocks in Figure 1.1 in between the channel coder and channel decoder will almost certainly introduce some errors in the stream of transmitted symbols. This could be because the system temporarily suffers from a high noise level, or that the radio channel is bad. It is the task of the channel coder/decoder pair to recover any information lost, or at least recognize that some information was lost. This is done by inserting redundancy into the stream of symbols to be sent. Knowing the structure of the redundancy at the receiver side, errors can be corrected, or at least detected.

The last block on the sender side is the modulator. The modulator takes the output of the channel encoder, and transforms it into a form suitable for launch onto the physical channel. This includes mapping bits to symbols, applying pulse-shaping to the symbols so a continuous waveform is obtained and finally upconverting the signal to the carrier frequency.

The waveform is sent to the RF chain of the transmitter, and then converted to RF energy in the antenna. The modulation and demodulation part of the system can be seen in close-up in Figure 1.2.

Figure 1.2: Modulation-Channel-Demodulation Sub-system (pulse shaping, upconversion, channel, downconversion, matched filtering, sampling, demodulation, detection).

At the receiver, the effect of the carrier frequency is removed in the downconverting step. The signal is then matched filtered to the pulse-shaping waveform, and sampled. Based on the samples, demodulation and detection are performed, resulting in estimates of the transmitted symbols.

A digital communications system does not necessarily have clear cut lines between the subsystems of Figure 1.1. For instance, joint channel coding and modulation may give boosts in performance, at the price of higher transceiver complexity.


1.2 Multiple Antennas and Wideband Channels

There are several ways to increase the maximum reliable data rate of a wireless link. By transmitting more symbols per unit time, that is increasing the bandwidth of the signal, the data rate is increased. The downside is that bandwidth in the radio spectrum is an expensive resource due to national and international regulations. Another strategy is to increase the signal-to-noise ratio (SNR) of the link. A higher SNR means a better quality of the received signal, and therefore more data can be transferred per symbol, leading to increased spectral efficiency. For this case, more power is needed at the transmitter side, which may pose a problem if the transmitter is battery powered or if there are regulations regarding the amount of acceptable transmitted power at that particular frequency.

A third way of increasing the data rate is by introducing multiple antennas at either or both sides of the link. If the channels between the various antenna elements are uncorrelated, which under certain circumstances they are, multiple parallel spatial channels arise that can be used for parallel signalling. By accessing these spatial channels, the data rate can be increased without consuming more bandwidth or power resources. Information theory shows that for high SNRs, the theoretically highest data rate of a MIMO channel with uncorrelated spatial sub-channels grows linearly with the minimum of the number of antennas at the receiver and transmitter sides [3].

For radio channels where there are multiple paths from the transmitter to the receiver, several versions of the transmitted signal will be received at different points in time. This multipath propagation leads to problems for signals with a high symbol rate, i.e. wide bandwidth. In effect, different frequencies will be attenuated differently by the radio channel; the channel is said to be frequency-selective. This wideband behaviour of the channel must be mitigated for reliable communication to take place.

1.3 Problem Formulation and Contributions

This thesis will investigate whether a transmission scheme based on a polynomial singular value decomposition of a wideband MIMO channel matrix can achieve the same data rate as a similar system where the singular value decomposition is performed in the frequency domain, at comparable or lower transmission set-up complexity. In order to do this, some algorithms for polynomial matrix decomposition will be investigated in terms of complexity and error performance, and the achievable data rate is simulated for communication system frameworks including these algorithms.

The contributions given by this thesis are:

• Modified versions of the decomposition algorithms of Foster et al. [4] are proposed and analyzed in terms of complexity and decomposition quality.

• Two rational decomposition algorithms for polynomial matrices are proposed, and their shortcomings discussed.

• A transmission scheme utilizing the modified polynomial singular value decomposition algorithm is shown to achieve good performance, in terms of achievable rate, but at a restraining computational load compared to the reference SM by MIMO-OFDM transmission scheme.


1.4 Thesis Outline

The next chapter will lay the mathematical groundwork for the investigations to come. Polynomials and matrices are introduced, as well as the combination of the two concepts: polynomial matrices. Finally, an analysis of algorithm run time performance through benchmarking and theoretical analysis is presented.

The following part of the thesis, Chapter 3, will discuss the MIMO channel more deeply. The idea of frequency-selective channels is presented, and the orthogonal frequency-division multiplexing technique for mitigating inter-symbol interference resulting from high data rates is introduced.

In Chapter 4, a set of polynomial matrix decomposition algorithms is presented. They are based on the idea of single coefficient nulling through polynomial Givens rotations. The chapter states the algorithms, and some analytical and numerical results are presented in order to examine the properties of the algorithms.

Chapter 5 similarly analyzes some polynomial matrix decomposition algorithms, but these are instead based on the idea of polynomial nulling through polynomial IIR Givens rotations.

With the algorithms clearly defined, Chapter 6 will employ them in a communications set-up. The channel and system models are defined, and the system capacities derived. Using these expressions, simulation results are presented showing the capacity of the given systems.

Finally, conclusions will be drawn in Chapter 7. The thesis will be summarized, and key points will be presented.


Chapter 2

Preliminaries

This chapter is concerned with some fundamental facts and definitions that will be used a great deal in the rest of the thesis. Definitions of complex polynomials and constant matrices will be given, and then the two concepts will be joined into complex polynomial matrices.

The definitions regarding polynomials, matrices and polynomial matrices are mostly taken from [5, 6], where more details are given.

Additionally, a brief discussion about algorithmic complexity will be undertaken. After this chapter, the reader will be aware of all mathematics needed to understand the algorithms to be presented.

2.1 Complex Polynomials

A Laurent polynomial is an expression of the form

p(z) = C_{-V_1} z^{V_1} + \ldots + C_{-1} z^{1} + C_0 + C_1 z^{-1} + \ldots + C_{V_2} z^{-V_2}    (2.1)

or more compactly

p(z) = \sum_{v=-V_1}^{V_2} C_v z^{-v}, \qquad V_1 > 0,\; V_2 > 0    (2.2)

where z is some indeterminate symbol and the C_v coefficients are taken from some field F. For our purposes, we will assume that the coefficients are complex numbers, that is F = C. Hence, (2.2) is a complex Laurent polynomial. In the following, we will just call it a polynomial. For a more in-depth discussion of fields and their properties, see [7, Ch. 7].

For F = C it holds that every polynomial uniquely determines a function [7, p. 297]. The function is evaluated at a point z0 simply by replacing the indeterminate symbol z in (2.2) with z0, and performing the summation. It should be noted though that a polynomial, and the function defined by the polynomial, are two distinctly different entities.

In order to conveniently classify polynomials, some notation will now be introduced. The relation ∼, used as

p(z) ∼ (V_1, V_2, C)    (2.3)

signifies that p(z) has V_1 positive exponent terms, V_2 negative exponent terms and that the coefficients are in C. The space of all Laurent polynomials with complex coefficients will be denoted C. The space of all polynomials p(z) ∼ (V_1, V_2, C) will be C^{1×1×(V_1+V_2+1)}. Furthermore, the maximum degree of p(z) is V_1 + V_2.


It is clear that for V_1' > V_1, V_2' > V_2,

p(z) ∼ (V_1, V_2, C)  ⟹  p(z) ∼ (V_1', V_2', C).    (2.4)

This can be shown to hold by setting the added outer coefficients to zero.

The reason for our interest in polynomials lies in their relation to Linear Time-Invariant (LTI) systems. Chapter 3 will explore this relation further.

2.1.1 Addition and Subtraction

Given two polynomials

a(z) ∼ (V_1, V_2, C),  b(z) ∼ (U_1, U_2, C)    (2.5)

the variables

M_1 = \max(V_1, U_1),  M_2 = \max(V_2, U_2)    (2.6)

can be defined. Assuming that a(z), b(z) have coefficient sequences

\{C_{a,i}\}_{i=-V_1}^{V_2},  \{C_{b,i}\}_{i=-U_1}^{U_2}    (2.7)

the sum of the polynomials is defined as

(a + b)(z) = a(z) + b(z) = \sum_{v=-M_1}^{M_2} (C_{a,v} + C_{b,v}) z^{-v}    (2.8)

with

C_{a,v} = 0 \;\forall v \notin [-V_1, V_2], \qquad C_{b,v} = 0 \;\forall v \notin [-U_1, U_2].    (2.9)

Subtraction is similarly defined, but replacing C_{b,v} → (−C_{b,v}) ∀v.

2.1.2 Multiplication

Multiplication of two polynomials is the convolution of the two coefficient sequences. Given the polynomials in (2.5), the product is written as

c(z) = a(z)b(z) = \left( \sum_{v=-V_1}^{V_2} C_{a,v} z^{-v} \right) \left( \sum_{u=-U_1}^{U_2} C_{b,u} z^{-u} \right) = \sum_{v=-V_1}^{V_2} \sum_{u=-U_1}^{U_2} C_{a,v} C_{b,u} z^{-(v+u)}.    (2.10)

In particular, the coefficient associated with z^{-r} in the product will be given by

C_{c,r} = \sum_{\substack{u,v \\ u+v=r}} C_{a,v} C_{b,u} = \sum_{v=-V_1}^{V_2} C_{a,v} C_{b,r-v}.    (2.11)

Defining

C_{a,v} = 0 \;\forall v \notin [-V_1, V_2], \qquad C_{b,r-v} = 0 \;\forall (r - v) \notin [-U_1, U_2]    (2.12)

the sum in (2.11) can be written as an infinite sum

C_{c,r} = \sum_{v=-\infty}^{\infty} C_{a,v} C_{b,r-v}    (2.13)

which can be identified as the convolution sum.

Let d_1, d_2 be the maximum degrees of the polynomials a(z), b(z). Then, by zero-padding the two coefficient vectors to length d_1 + d_2 + 1, the convolution can efficiently be evaluated using the convolution theorem [8, p. 191]. That is,

c(z) = \mathcal{F}_d^{-1}\big( \mathcal{F}_d(a(z)) \, \mathcal{F}_d(b(z)) \big)    (2.14)

where the transforms are understood to be working on the coefficient vectors of the polynomials.
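To make the coefficient-convolution view of (2.11) and the FFT evaluation of (2.14) concrete, here is a minimal NumPy sketch. The storage convention (a plain coefficient vector running from the z^{V_1} term down to the z^{-V_2} term, passed together with V_1) is an assumption made for this illustration only.

```python
import numpy as np

def poly_mul(a, V1a, b, V1b):
    """Multiply two Laurent polynomials given as coefficient vectors.

    a[i] is the coefficient of z^{-(i - V1a)}, i.e. the vector runs from the
    z^{V1a} term down to the z^{-V2a} term; likewise for b and V1b.
    Returns (c, V1c) for the product c(z) = a(z) b(z).
    """
    c = np.convolve(a, b)          # coefficient convolution, cf. eq. (2.11)
    return c, V1a + V1b            # the leading positive exponents add up

def poly_mul_fft(a, V1a, b, V1b):
    """The same product evaluated via the convolution theorem, cf. eq. (2.14)."""
    n = len(a) + len(b) - 1        # length of the linear convolution
    C = np.fft.fft(a, n) * np.fft.fft(b, n)
    return np.fft.ifft(C), V1a + V1b

# Example: a(z) = 2z + 1 + 3z^{-1} and b(z) = 1 - z^{-1}
a, b = np.array([2.0, 1.0, 3.0]), np.array([1.0, -1.0])
c, V1c = poly_mul(a, 1, b, 0)
c_fft, _ = poly_mul_fft(a, 1, b, 0)
assert np.allclose(c, c_fft)
print(c, V1c)    # [ 2. -1.  2. -3.], V1 = 1, i.e. 2z - 1 + 2z^{-1} - 3z^{-2}
```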

2.2 Polynomial Matrices

A polynomial matrix A(z) is a matrix whose elements are polynomials, or equivalently, a polynomial whose coefficients are matrix-valued [6, p. 24]. An arbitrary polynomial matrix

A(z) = \sum_{v=-V_1}^{V_2} A_v z^{-v}    (2.15)

belongs to the space C^{p×q} if A_v ∈ C^{p×q} ∀v. For the given V_1, V_2, we can also write \{A_v\} ∈ C^{p×q×(V_1+V_2+1)}.

The transpose A^T(z), conjugate A^*(z) and Hermitian conjugate A^H(z) = (A^*(z))^T of a polynomial matrix are obtained by applying the respective operation on each of the coefficient matrices. In addition, A^H(z^{-*}) will be termed the para-Hermitian conjugate of A(z). A polynomial matrix which satisfies A^H(z^{-*})A(z) = I is called a paraunitary matrix [4]. This type of matrix will play an important part in the algorithms to be developed in Chapter 4, as its columns are mutually orthogonal over all frequencies. Due to this orthogonality, the multiplication of an arbitrary matrix with a paraunitary matrix preserves the Frobenius norm of the original matrix.

The Frobenius norm of the polynomial matrix in (2.15) is defined as

\|A(z)\|_F = \sqrt{ \sum_{i=1}^{p} \sum_{j=1}^{q} \sum_{v=-V_1}^{V_2} \big| [A_v]_{ij} \big|^2 }    (2.16)

where [\,\cdot\,]_{ij} denotes the (i, j) component of the matrix.

In the following, the terms matrix and polynomial matrix will be used interchangeably, with the understanding that an ordinary matrix is just a polynomial matrix of maximum degree 0 with a single coefficient matrix.
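The polynomial matrix definitions above translate directly into a few lines of NumPy if a polynomial matrix is stored as a 3-D array of coefficient matrices. This storage choice, and the example at the end, are assumptions made here for illustration; the thesis does not prescribe an implementation.

```python
import numpy as np

def frobenius_norm(A):
    """Frobenius norm of a polynomial matrix, eq. (2.16).

    A is stored as a 3-D array with A[v] the coefficient matrix of the lag
    running from -V1 up to V2 along the first axis.
    """
    return np.sqrt(np.sum(np.abs(A) ** 2))

def para_hermitian(A):
    """Para-Hermitian conjugate A^H(z^{-*}): conjugate-transpose every
    coefficient matrix and reverse the lag axis (z^{-v} becomes z^{v})."""
    return np.conj(A.transpose(0, 2, 1))[::-1]

def poly_mat_mul(A, B):
    """Product of two polynomial matrices: matrix products of the
    coefficient matrices, convolved along the lag axis."""
    C = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1], B.shape[2]),
                 dtype=complex)
    for a in range(A.shape[0]):
        for b in range(B.shape[0]):
            C[a + b] += A[a] @ B[b]
    return C

# Example: a (trivially) paraunitary matrix built from a constant unitary
# matrix and a delay; multiplying by it preserves the Frobenius norm.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)))
U = np.zeros((2, 2, 2), dtype=complex)
U[0, 0, :] = Q[0, :]          # first row of Q at lag 0
U[1, 1, :] = Q[1, :]          # second row of Q delayed by one lag
A = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
print(frobenius_norm(poly_mat_mul(U, A)), frobenius_norm(A))   # equal
```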

2.2.1 Givens Rotations

A constant Givens rotation is a unitary transformation which zeroes a specific component of a vector. For the 2×2 case, the constant complex Givens rotation is defined as

G = \begin{bmatrix} c\,e^{j\alpha} & s\,e^{j\phi} \\ -s\,e^{-j\phi} & c\,e^{-j\alpha} \end{bmatrix}.    (2.17)

Applying G to a vector x ∈ C^{2×1} one obtains

G x = G \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} c\,e^{j\alpha} x_1 + s\,e^{j\phi} x_2 \\ -s\,e^{-j\phi} x_1 + c\,e^{-j\alpha} x_2 \end{bmatrix}

and by selecting

\theta = \begin{cases} \tan^{-1}\!\left( \frac{|x_2|}{|x_1|} \right) & \text{if } |x_1| \neq 0 \\ \frac{\pi}{2} & \text{if } |x_1| = 0 \end{cases}, \qquad c = \cos\theta, \quad s = \sin\theta, \quad \alpha = -\arg(x_1), \quad \phi = -\arg(x_2)

it can be shown that

[G x]_2 = 0.

Additionally, since G is unitary, the magnitude squared of the other component will be

|[G x]_1|^2 = |x_1|^2 + |x_2|^2

which will be referred to as the energy moving property of the Givens rotation. Intuitively, the application of the Givens rotation moves the energy of component 2 to component 1, so that component 2 becomes zero.

The Givens rotation in (2.17) can be extended so that it zeroes a specific component of a p × 1 vector or p × q matrix. For the matrix case, if element (i, j) is to be zeroed, and element (i, i) is to receive the energy, the Givens rotation takes the form of a p × p identity matrix with the elements at the intersections of rows i, j and columns i, j replaced by the elements of (2.17).
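A direct transcription of (2.17) and the parameter choices above into NumPy (illustrative only):

```python
import numpy as np

def givens(x1, x2):
    """Complex Givens rotation G of eq. (2.17), chosen so that [G x]_2 = 0."""
    theta = np.arctan2(abs(x2), abs(x1))        # equals pi/2 when x1 == 0
    c, s = np.cos(theta), np.sin(theta)
    alpha, phi = -np.angle(x1), -np.angle(x2)
    return np.array([[c * np.exp(1j * alpha),  s * np.exp(1j * phi)],
                     [-s * np.exp(-1j * phi),  c * np.exp(-1j * alpha)]])

x = np.array([1 + 2j, 3 - 1j])
G = givens(x[0], x[1])
y = G @ x
print(np.round(y, 10))                              # second component is ~0
print(abs(y[0])**2, abs(x[0])**2 + abs(x[1])**2)    # energy moving property
assert np.allclose(G.conj().T @ G, np.eye(2))       # G is unitary
```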

For the polynomial matrix case, a polynomial Givens rotation (PGR) can be defined. What will be referred to as a PGR in this thesis is the elementary polynomial Givens rotation of [4], with an elementary delay matrix prepended. The polynomial analogue of (2.17) is therefore

G(z) = \begin{bmatrix} 1 & 0 \\ 0 & z^{-t} \end{bmatrix} \begin{bmatrix} c\,e^{j\alpha} & s\,e^{j\phi} \\ -s\,e^{-j\phi} & c\,e^{-j\alpha} \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & z^{t} \end{bmatrix} = \begin{bmatrix} c\,e^{j\alpha} & s\,e^{j\phi} z^{t} \\ -s\,e^{-j\phi} z^{-t} & c\,e^{-j\alpha} \end{bmatrix}    (2.18)

which when applied to x(z) = [x_1(z)\; x_2(z)]^T will zero the coefficient associated with z^{-t} in x_2(z) and move the energy to the constant coefficient of x_1(z), if the parameters are chosen such that

\theta = \begin{cases} \tan^{-1}\!\left( \frac{|x_2(t)|}{|x_1(0)|} \right) & \text{if } |x_1(0)| \neq 0 \\ \frac{\pi}{2} & \text{if } |x_1(0)| = 0 \end{cases}, \qquad c = \cos\theta, \quad s = \sin\theta, \quad \alpha = -\arg(x_1(0)), \quad \phi = -\arg(x_2(t))    (2.19)

where x_i(j) denotes the coefficient associated with z^{-j} for element x_i. In the process, all other coefficients of x_1(z) and x_2(z) will be affected. It can be shown that G^H(z^{-*})G(z) = I, and (2.18) is therefore a paraunitary operation. The paraunitarity implies that the operation is norm preserving, and in particular that the energy moving property still holds.

The polynomial Givens rotation can easily be extended to the p × q case, in the same manner as for the constant Givens rotation case. By construction, the p × p polynomial Givens rotation will also be paraunitary. For every application of a polynomial Givens rotation, the degree of the polynomial matrix it is applied to will grow by 2|t|. This inherent property of the PGR will lead to a need for a truncation step in the algorithms to be formed, as further explained in Chapter 4.

The application of a PGR only affects two rows of the matrix it is applied to. The complexity of the application to a p × q matrix with r lags is therefore

C_{PGR} = 2q(r + 2|t|).    (2.20)
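The 2-element case of the PGR can be sketched as follows, storing each polynomial as a dictionary from lag v to the coefficient of z^{-v}; the storage format is an assumption for illustration, not the thesis implementation.

```python
import numpy as np

def pgr_apply(x1, x2, t):
    """Apply the polynomial Givens rotation of eqs. (2.18)-(2.19) to
    x(z) = [x1(z) x2(z)]^T.

    Polynomials are stored as dicts {v: coefficient of z^{-v}}. The rotation
    zeroes the coefficient of z^{-t} in x2(z) and moves its energy to the
    lag-0 coefficient of x1(z).
    """
    a, b = x1.get(0, 0.0), x2.get(t, 0.0)
    theta = np.arctan2(abs(b), abs(a))
    c, s = np.cos(theta), np.sin(theta)
    ea, eb = np.exp(-1j * np.angle(a)), np.exp(-1j * np.angle(b))

    y1, y2 = {}, {}
    # y1(z) = c e^{j alpha} x1(z) + s e^{j phi} z^{t} x2(z)
    for v, cv in x1.items():
        y1[v] = y1.get(v, 0.0) + c * ea * cv
    for v, cv in x2.items():
        y1[v - t] = y1.get(v - t, 0.0) + s * eb * cv
    # y2(z) = -s e^{-j phi} z^{-t} x1(z) + c e^{-j alpha} x2(z)
    for v, cv in x1.items():
        y2[v + t] = y2.get(v + t, 0.0) - s * np.conj(eb) * cv
    for v, cv in x2.items():
        y2[v] = y2.get(v, 0.0) + c * np.conj(ea) * cv
    return y1, y2

# Example: zero the z^{-1} coefficient of x2(z)
x1 = {0: 1 + 1j, 1: 0.5}            # x1(z) = (1+j) + 0.5 z^{-1}
x2 = {0: 0.2, 1: 2 - 1j}            # x2(z) = 0.2 + (2-j) z^{-1}
y1, y2 = pgr_apply(x1, x2, t=1)
print(abs(y2.get(1, 0.0)))          # ~0: the targeted coefficient is nulled
print(y1[0])                        # ~sqrt(|x1(0)|^2 + |x2(1)|^2), real-valued
```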

2.2.2 Decompositions

The (full) QR decomposition of a constant matrix A ∈ C^{p×q} is

A = QR    (2.21)

where Q ∈ C^{p×p} is a unitary constant matrix, and R ∈ C^{p×q} is an upper triangular constant matrix [5, p. 112]. The constant QR decomposition can be calculated in a variety of ways: through Givens rotations, Householder rotations or via the Gram-Schmidt orthogonalization procedure [5, pp. 114-117]. An approximate polynomial QR decomposition of a polynomial matrix A(z) ∈ C^{p×q} is

A(z) = Q(z)R(z)    (2.22)

where Q(z) ∈ C^{p×p} is an approximately paraunitary polynomial matrix and R(z) ∈ C^{p×q} is an approximately upper triangular polynomial matrix [4]. Intuitively, this can be seen as a constant QRD taken over all frequencies.

For an arbitrary constant matrix A ∈ C^{p×q}, the singular value decomposition (SVD) is

A = U D V^H    (2.23)

where U ∈ C^{p×p}, V ∈ C^{q×q} are unitary matrices and D ∈ R^{p×q} is a diagonal matrix [5, p. 414]. The diagonal entries of D are called the singular values of the matrix A. The columns of U and V are called the left and right singular vectors of A, respectively. An efficient implementation for calculating the SVD of a constant matrix can be found in [9, p. 448].

Similarly, an approximate singular value decomposition can be obtained for polynomial matrices. Given A(z) ∈ C^{p×q}, a PSVD of A(z) is

A(z) = U(z) D(z) V^H(z^{-*})    (2.24)

where U(z) ∈ C^{p×p}, V(z) ∈ C^{q×q} are approximately paraunitary matrices and D(z) ∈ C^{p×q} is an approximately diagonal matrix [4] whose diagonal elements are called the singular values of A(z). As in the QRD case, the intuition is that the polynomial matrix A(z) is decomposed into its SVD over all frequencies. Note that in this definition there is no assumption of ordering of the singular values, as compared to the ordinary SVD [5, p. 414].
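The "decomposition over all frequencies" intuition can be made concrete by evaluating A(z) on a grid of points on the unit circle (a DFT of the coefficient matrices) and computing an ordinary SVD in every bin, which is essentially the DFT-domain reference approach compared against in Section 4.6 and Chapter 6. A minimal sketch, assuming a causal polynomial matrix stored as a 3-D coefficient array:

```python
import numpy as np

def svd_per_frequency(A, n_fft=64):
    """Evaluate A(z) on n_fft points of the unit circle and SVD each bin.

    A[v, :, :] holds the coefficient matrix of z^{-v}, v = 0 .. L-1 (a causal
    matrix is assumed for simplicity). Returns per-bin factors U, D, Vh.
    """
    # A(e^{jw_k}) for all bins at once: a DFT along the lag axis
    A_f = np.fft.fft(A, n=n_fft, axis=0)
    U = np.zeros((n_fft, A.shape[1], A.shape[1]), dtype=complex)
    Vh = np.zeros((n_fft, A.shape[2], A.shape[2]), dtype=complex)
    D = np.zeros((n_fft, min(A.shape[1], A.shape[2])))
    for k in range(n_fft):
        U[k], D[k], Vh[k] = np.linalg.svd(A_f[k])
    return U, D, Vh

# Example: a random causal 2x3 polynomial matrix with 4 lags
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2, 3)) + 1j * rng.standard_normal((4, 2, 3))
U, D, Vh = svd_per_frequency(A)
print(D[0])     # singular values of A(z) evaluated at z = 1
```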

2.2.3 Coefficient Truncation

During certain steps of the algorithms to be investigated in Chapter 4, the maximum degrees of the polynomials involved grow fast. As will be seen, often most of the energy of the filter coefficients will be concentrated around the constant coefficient. In order to keep the maximum degrees of the polynomials involved down, a truncation step is utilized. Thereby, the storage requirements for the polynomial matrices are reduced, as well as the computational load involved when applying the decomposition factors as filters. The truncation step is defined in [4], but is restated in Algorithm 1 for clarity.

Algorithm 1 Polynomial Matrix Truncation

1: Input polynomial matrix A(z) ∼ (V_1, V_2, C^{p×q}) and truncation parameter µ.
2: Find the maximum value for T_1 so that \frac{1}{\|A(z)\|_F^2} \sum_{v=-V_1}^{T_1} \sum_{l=1}^{p} \sum_{m=1}^{q} |a_{lm}(v)|^2 \leq \frac{\mu}{2} holds.
3: Find the minimum value for T_2 so that \frac{1}{\|A(z)\|_F^2} \sum_{v=T_2}^{V_2} \sum_{l=1}^{p} \sum_{m=1}^{q} |a_{lm}(v)|^2 \leq \frac{\mu}{2} holds.
4: Return A_{trunc}(z) = \sum_{v=T_1}^{T_2} A_v z^{-v}.

The parameter µ defines the proportion of the total energy of the filter to be truncated from the matrix. It can be shown that the complexity of the naive implementation of this algorithm is O(pqr^2). The complexity can be decreased by a binary search algorithm.
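A sketch of the truncation step, under the storage assumption used earlier (3-D coefficient array plus a lag vector) and with the per-end energy budget of µ/2 described above; the details are illustrative, not the thesis implementation.

```python
import numpy as np

def truncate(A, lags, mu):
    """Trim outer lags of a polynomial matrix, keeping the removed energy
    below a proportion mu of the total (at most mu/2 from each end).

    A[v]  : coefficient matrix for lag lags[v]
    lags  : array of lags, e.g. np.arange(-V1, V2 + 1)
    Returns the trimmed coefficient array and its lag vector.
    """
    energy_per_lag = np.sum(np.abs(A) ** 2, axis=(1, 2))
    budget = 0.5 * mu * energy_per_lag.sum()

    lo, cum = 0, 0.0                       # trim from the front (lag -V1 side)
    while lo < len(lags) and cum + energy_per_lag[lo] <= budget:
        cum += energy_per_lag[lo]
        lo += 1
    hi, cum = len(lags), 0.0               # trim from the back (lag V2 side)
    while hi > lo and cum + energy_per_lag[hi - 1] <= budget:
        cum += energy_per_lag[hi - 1]
        hi -= 1
    return A[lo:hi], lags[lo:hi]

# Example: most of the energy sits at lag 0, so the weak outer lags are cut
lags = np.arange(-2, 4)                    # V1 = 2, V2 = 3
A = np.zeros((len(lags), 2, 2))
A[2] = np.eye(2)                           # dominant lag-0 coefficient
A[0] = 1e-4 * np.ones((2, 2))              # weak outer coefficients
A[-1] = 1e-4 * np.ones((2, 2))
A_t, lags_t = truncate(A, lags, mu=1e-3)
print(lags_t)                              # only lag 0 remains
```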

2.3 Computational Complexity

When studying algorithms from a performance perspective, it is interesting to analyze their running times. This may either be done through benchmarking or theoretical analysis [10, p. 91]. In this thesis, both approaches will be used. Benchmarking is straight-forward, as it only involves running the algorithm and measuring some quantity related to the performance.

By running the algorithm for different sets of input data, experimental relations are obtained relating the performance to the properties of the input. The downside of benchmarking is that the algorithm must be implemented first, and that one can not be sure that the results generalize.

Theoretical analysis, on the other hand, typically only gives upper bounds on the performance. This is particularly the case for algorithms containing loops, where the number of iterations of the loops is not obviously defined in terms of the input data size. In order to deal with bounds in a convenient manner, Ordo or Big-Oh notation is introduced. Assume that T(n) describes the run time of an algorithm with input data size n. Then we say that T(n) = O(f(n)) if it holds that

T(n) ≤ c f(n)

for some constant c > 0 and all integers n > n_0. Ordo notation is a very convenient tool for describing complexity expressions, because of the two following rules as given by [10, p. 98]:

1. Constant factors don’t matter: For any constant d, O(df(n)) = O(f(n)) since the constant can be accumulated in the hidden constant c of the Ordo notation.

2. Low-order terms don’t matter: For the polynomial case, only the term with the largest degree needs to be kept, since it will dominate over any other terms for large n.


Chapter 3

MIMO Channels and Multipath Propagation

3.1 Propagation and Modeling

Wireless communication uses electromagnetic waves in the radio spectrum for information transfer. Electromagnetic waves are completely characterized by Maxwell's equations, a set of coupled partial differential equations. The complicated structure of the equations makes them hard to solve as well as hard to analyze analytically. Radio engineers therefore tend to use simpler models, where certain aspects of the radio wave propagation can more easily be analyzed.

3.1.1 Propagation

In general, the received radio signal is modeled as an attenuated and modified version of the transmitted signal, with some noise added. The attenuation can be classified into three time-scale dependent behaviours: path loss, shadowing and fading [3]. Path loss occurs only because of the distance between the transmitter and the receiver, and can be modeled as a factor 1/r^d, where d usually is in the range [2, 4] [3]. At best, which is the case of an isotropic transmitter antenna with fixed transmit power in empty space, the exponent will take the value 2. This is because for this scenario, the generated radio waves will be spherically propagating from the antenna, and the amplitude at distance r will decrease as 1/r. The power therefore decreases as 1/r^2. Due to ground reflections and other phenomena, the exponent can be larger though. The path loss is the most slowly varying of the three attenuation factors mentioned.

Shadowing takes place on a shorter time scale, and is typically incurred by objects that are blocking the radio waves. In an urban environment, this could be due to cars moving around and temporarily changing the propagation paths. Shadowing is typically hard to model, but one common model is the log-normal distribution [3].

The most quickly evolving attenuation phenomenon is fading. This occurs due to radio waves from different paths adding up constructively or destructively at the receiver. This interference effect changes on the order of half a wavelength, and is therefore sensitive to small changes in the propagation environment. Assuming that there is no Line-of-Sight (LOS) component, the fading is called Rayleigh fading. The name arises from the fact that the channel coefficient can be modeled, through the central limit theorem, as a complex Gaussian stochastic variable, whose magnitude is described by the Rayleigh distribution. If there is a LOS component, the channel gains are instead modeled by a Rice distribution, and the phenomenon is called Rician fading.

A subject that we have touched upon, but not discussed upfront, is that of multipath propagation. Depending on the environment, a transmitted radio wave may take several different paths to the receiver. An example can be seen in Figure 3.1. Depending on the path lengths, and the associated reflections, several delayed versions of the same signal will reach the receiver. The delay spread of a channel is a measure of the spread between the first and the last multipath component to arrive at the receiver. For short delay spreads, compared to the transmit symbol period, the channel is said to have narrowband fading, for which the Rayleigh and Rice models work well. On the other hand, for channels with large delay spreads, other models must be used. The coherence bandwidth B_c of a channel is the width in the frequency domain over which the channel can be assumed constant, and it is approximately inversely proportional to the delay spread.

Figure 3.1: Example of Multipath Propagation

For channels where the delay spread is larger than the symbol period, an effect called inter-symbol interference (ISI) takes place. Since the symbols are transmitted at a high rate, the channel impulse response will not have enough time to "die off". Instead, subsequent symbols will be overlaid at the receiver, and therefore interfere with each other. These channels are also called frequency-selective channels, since different frequency components of the transmitted signal will be attenuated by different factors.


3.1.2 Channel Modeling

In this section, the channel is assumed to be described well by a single-input single-output (SISO) linear system. A linear system is characterized by its impulse response at time t, h(t; τ). For a transmitted signal s(t), the received signal r(t) is modeled as a filtered version of s(t) with some noise n(t) added, such that

r(t) = \int_{-\infty}^{\infty} h(t; \tau)\, s(t - \tau)\, d\tau + n(t).

In the following, we will assume that the channel is time-invariant over the transmission of a block of data, and the channel impulse response can be replaced by h(τ), such that

r(t) = \int_{-\infty}^{\infty} h(\tau)\, s(t - \tau)\, d\tau + n(t).    (3.1)

The output of the channel can be thought of as a weighted sum of the input signal s(t) with weighting factor h(τ). A graphical representation of a SISO channel can be seen in Figure 3.2.

Figure 3.2: Single-input single-output Channel

Radio transmissions always take place at some carrier frequency, which is modulated for data transfer. The carrier frequency itself does not carry any information, and it is therefore convenient to remove the effect of the carrier in any analysis of the signal. For the example above, all signals involved are assumed to be real, since they relate to physical processes. Let S(f) be the Fourier transform of s(t), band-limited to [f_c − W/2, f_c + W/2] where f_c is the carrier frequency and W is the signal bandwidth. For consistency, W < 2f_c. The complex baseband equivalent version of the signal is then defined through its Fourier transform

S_b(f) = \begin{cases} \sqrt{2}\, S(f + f_c) & f + f_c > 0 \\ 0 & f + f_c \leq 0. \end{cases}    (3.2)

Since s(t) is real, its Fourier transform is conjugate symmetric around the origin. The transformation (3.2) therefore effectively moves the positive spectrum down from the carrier frequency to baseband, and scales it so that the sum power remains constant. The baseband equivalent spectrum will not be symmetric around the origin anymore, and the signal s_b(t) defined by S_b(f) is therefore complex.

Note that the original signal s(t) can easily be recovered from s_b(t) since

S(f) = \frac{1}{\sqrt{2}} \big( S_b(f - f_c) + S_b^*(-f - f_c) \big)

and by taking inverse Fourier transforms

s(t) = \frac{1}{\sqrt{2}} \big( s_b(t) e^{j 2\pi f_c t} + s_b^*(t) e^{-j 2\pi f_c t} \big) = \sqrt{2}\, \mathrm{Re}\big( s_b(t) e^{j 2\pi f_c t} \big).    (3.3)


The complex baseband equivalent signal is more convenient to work with than the passband signal, and since the transformation is invertible no information is lost in the conversion process.

Finally, in order to work with the signals in a computer, a discrete-time model is needed.

Sampling (3.1) faster than the Nyquist rate, it can be shown that a discrete-time model takes the form

r[m] = \sum_{l=-\infty}^{\infty} h[l]\, s[m - l] + n[m]    (3.4)

where h[l] is determined from the transmit, channel, and receive filters in place. For the full derivations, see e.g. [11, p. 49].

3.1.3 MIMO Channels

A multiple-input multiple-output (MIMO) channel is a channel with several transmit and/or receive antennas. It is described by a matrix, since every transmit-receive antenna pair has a channel associated with it. These single-input single-output (SISO) sub-channels may be correlated. A graphical representation of a MIMO channel can be seen in Figure 3.3. The number of antennas used in MIMO communications systems is typically on the order of 2-4 on either or both sides. For example, the IEEE 802.11n WiFi standard allows for up to 4 antennas on both sides [12].

Figure 3.3: Multiple-input multiple-output Channel

Following the same derivations as in the previous section, it can be shown that the discrete-time complex baseband equivalent system for the MIMO channel with M_r receive antennas and M_t transmit antennas takes the form

r[m] = \sum_{l=-\infty}^{\infty} H[l]\, s[m - l] + n[m]    (3.5)

where r[m], n[m] ∈ C^{M_r×1}, H[l] ∈ C^{M_r×M_t} and s[m] ∈ C^{M_t×1}. For the narrowband case, (3.5) simplifies to

r[m] = H s[m] + n[m].

The matrix channel H[m] describes how the transmitted signal is mixed in space and time, when sampled at the receiver. Row i of H[m] determines the weighting factors for the received signal at receiver antenna i at time lag m. Similarly, column j of H[m] describes the spatial signature of the signal from transmitter antenna j at time lag m.

Taking the z-transform of both sides of (3.5), and using the convolution theorem for the z-transform [8, p. 191], the relation is described by the polynomial matrix equation

r(z) = H(z)s(z) + n(z), (3.6)

a fact we will rely heavily on in the following.
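As an illustration of (3.5) and (3.6), the sketch below applies a causal FIR MIMO channel, stored lag by lag as a 3-D array, to a block of transmit vectors by convolution. The storage layout and parameter values are assumptions made for the example.

```python
import numpy as np

def mimo_channel(H, s, noise_std=0.0, rng=None):
    """Apply a causal FIR MIMO channel, cf. eqs. (3.5)/(3.6).

    H : (L, Mr, Mt) array, H[l] is the coefficient of z^{-l} in H(z)
    s : (Mt, N) array of transmit vectors s[0], ..., s[N-1]
    Returns r : (Mr, N + L - 1) array of received vectors.
    """
    L, Mr, Mt = H.shape
    N = s.shape[1]
    r = np.zeros((Mr, N + L - 1), dtype=complex)
    for l in range(L):
        r[:, l:l + N] += H[l] @ s          # r[m] = sum_l H[l] s[m - l]
    if noise_std > 0:
        rng = rng or np.random.default_rng()
        r += noise_std / np.sqrt(2) * (rng.standard_normal(r.shape)
                                       + 1j * rng.standard_normal(r.shape))
    return r

# Example: 2x2 channel with 3 taps, 5 transmitted vectors
rng = np.random.default_rng(1)
H = rng.standard_normal((3, 2, 2)) + 1j * rng.standard_normal((3, 2, 2))
s = rng.standard_normal((2, 5)) + 1j * rng.standard_normal((2, 5))
r = mimo_channel(H, s, noise_std=0.1, rng=rng)
print(r.shape)    # (2, 7)
```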

3.2 Channel Capacity and Achievable Rate

In his seminal paper "A Mathematical Theory of Communication" [13], Claude E. Shannon introduced the concept of a channel capacity, and derived it for the additive white Gaussian noise (AWGN) SISO channel with an average power constraint. The channel capacity, if known for the given channel model, is the highest possible rate that can be used for communication over the channel with vanishing error probability. That is, for any rate below the channel capacity, there exists a code, with long enough codewords, that achieves that rate with arbitrarily small error probability.

Shannon's results provide a benchmark against which any practical transmission scheme can be compared. The proof of the channel coding theorem is not constructive, however, and it is only recently, with the introduction of turbo codes and low-density parity-check (LDPC) codes, that practical systems have come close to the channel capacity [2, p. 252].

In [13], Shannon derived the channel capacity for the deterministic AWGN channel with an average power constraint. With system bandwidth W, noise variance σ_n^2 and average power less than P, the well-known formula for the capacity is

S_{AWGN} = W \log \left( 1 + \frac{P}{\sigma_n^2} \right).

A similar result, for the channel capacity of the deterministic narrowband AWGN MIMO channel with an average sum power constraint, was derived in [14]. The capacity of the channel, with spatially and temporally white Gaussian noise with variance σ_n^2, is given by

S_{MIMO\text{-}AWGN} = \max_{P} \log \left| I + \frac{1}{\sigma_n^2} H P H^H \right|

where P = E\{s s^H\} is the covariance of the transmit symbol s. The rate R < S_{MIMO-AWGN},

R = \log \left| I + \frac{1}{\sigma_n^2} H P H^H \right|,    (3.7)

is called an achievable rate, since for a sub-optimal choice of P it is less than the channel capacity. The noisy channel coding theorem of [13] therefore gives that such a rate R is achievable, since R < S and a code with rate S exists. For our purposes, (3.7) will simply serve as a mapping from SNR to data rate, which will be useful for the comparison of transmission schemes in Chapter 6.
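A small numerical illustration of (3.7); the base-2 logarithm (rate in bits per channel use), the channel realization and the equal-power covariance below are assumptions made for the example.

```python
import numpy as np

def achievable_rate(H, P, noise_var):
    """Achievable rate of eq. (3.7), here in bits per channel use."""
    Mr = H.shape[0]
    M = np.eye(Mr) + (H @ P @ H.conj().T) / noise_var
    sign, logdet = np.linalg.slogdet(M)   # log|M|, computed robustly
    return logdet / np.log(2)

# Example: 2x2 channel, equal power allocation over the transmit antennas
rng = np.random.default_rng(2)
H = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)
P_total, noise_var = 1.0, 0.1
P = (P_total / 2) * np.eye(2)             # suboptimal (not waterfilled) covariance
print(achievable_rate(H, P, noise_var))
```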


3.3 Equalization Techniques

For a wideband channel, multipath propagation is inherent. For high symbol rates, this results in ISI. What is meant by a high symbol rate is not clear-cut, but the rule of thumb is that ISI must be mitigated when W ≫ B_c.

In order to do symbol-by-symbol detection, the receiver needs to remove the ISI. The operation is called channel equalization, and there are many different strategies for how to do it. The equalizer needs information about how the channel behaves, typically the channel impulse response. This is obtained by a training phase, where the receiver can identify the channel from a sequence of known symbols that are sent. The optimal equalization, in the sense of minimizing the probability that a symbol in the sequence is detected incorrectly, is called Maximum Likelihood Sequence Estimation (MLSE), and is implemented in practice using the Viterbi algorithm [15, p. 88]. A downside of MLSE is that the algorithm complexity grows exponentially in the number of channel taps [15, p. 90]. At the other end of the complexity scale is the family of low-complexity linear equalizers. A common linear equalizer, the zero-forcing equalizer, removes the effect of ISI completely but suffers from noise amplification. Another equalizer, the MMSE receiver, makes an optimal trade-off between removing ISI and amplifying the noise, within the class of linear receivers.

Orthogonal Frequency-Division Multiplexing

For channels with high spectral dynamics, that is a small coherence bandwidth B_c, there is another viable alternative. Orthogonal frequency-division multiplexing (OFDM) leverages the Fast Fourier Transform (FFT) and can handle quasi-static wideband channels well [15, p. 99]. In fact, OFDM can achieve the channel capacity as the number of sub-carriers grows large.

Through OFDM, signaling is performed in the frequency domain. In order to launch the signal onto the channel, it is transformed to the time domain using an Inverse FFT (IFFT). At the receiver, the received signal is transformed to the frequency domain, where detection takes place. Effectively, OFDM transforms the wideband channel into a set of parallel independent channels in the frequency domain.

For the SISO case, assume that a sequence of N bits b(n) ∈ {0, 1} is to be transmitted.

Through some mapping from b(n), a frequency vector s' ∈ C^{N×1} is created. The frequency vector is transformed to the time domain through the application of an IDFT matrix F ∈ C^{N×N} defined by

F_{ij} = \frac{1}{\sqrt{N}} e^{j 2\pi (i-1)(j-1)/N}.    (3.8)

The time domain representation then takes the form

s'' = F^H s'.    (3.9)

In order to make the linear convolution of the channel act as the cyclic convolution implied by the DFT, a cyclic prefix is prepended, so that the signal

\tilde{s} = \big[ s''[N - (L-1)] \; \ldots \; s''[N-1] \;\; s''[0] \; s''[1] \; \ldots \; s''[N-1] \big]^T    (3.10)

is transmitted on the channel. Because of the cyclic prefix, the output of the channel will be a cyclic convolution of the input signal, plus noise,

\tilde{r}[m] = \sum_{l=0}^{L-1} h_l\, s''[(m - L - l) \bmod N] + w[m].    (3.11)

At the receiver, stripping off the cyclic prefix, the vector

r'' = \big[ \tilde{r}[0] \; \tilde{r}[1] \; \ldots \; \tilde{r}[N-1] \big]^T    (3.12)

is formed. Transforming into the frequency domain, the received frequency vector

r' = F r''    (3.13)

is obtained, which is then detected for every vector component. It can be shown that, thanks to the cyclic prefix, the channel is rendered circulant, and the IFFT/FFT pair diagonalizes circulant matrices. The model, in the frequency domain, for the communications process is therefore

r' = Ω s' + w    (3.14)

where Ω is a diagonal matrix with the channel gains of the different frequency bins on the diagonal. The noise keeps its characteristics due to the unitarity of the FFT transform.
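The cyclic-prefix mechanism behind (3.10)-(3.14) can be verified numerically. The sketch below uses the common convention with the IDFT at the transmitter and the DFT at the receiver, so the subcarrier indexing may differ from the matrix convention above; all parameter values are chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 64, 4                                    # subcarriers, channel taps
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)

# Frequency-domain QPSK symbols, IDFT to time domain, prepend cyclic prefix
s_freq = (rng.choice([-1, 1], N) + 1j * rng.choice([-1, 1], N)) / np.sqrt(2)
s_time = np.fft.ifft(s_freq) * np.sqrt(N)       # unitary scaling
s_cp = np.concatenate([s_time[-(L - 1):], s_time])

# Linear convolution with the channel (noise omitted for clarity)
r_cp = np.convolve(s_cp, h)

# Strip the cyclic prefix and the convolution tail, DFT back to frequency
r_time = r_cp[L - 1:L - 1 + N]
r_freq = np.fft.fft(r_time) / np.sqrt(N)

# The channel is diagonalized: each subcarrier sees a scalar gain H_k
H_k = np.fft.fft(h, N)
assert np.allclose(r_freq, H_k * s_freq)
print("per-subcarrier model r'_k = H_k s'_k verified")
```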

With the channel diagonalized over the frequency band, the transmitter is free to select varying transmit powers for the different frequency bins. The optimal way, in the sense of achievable rate, of doing this is by the waterfilling technique as further described by [11, p. 68]. The waterfilling strategy is also applied in Chapter 6, where a wideband MIMO channel is diagonalized in frequency through OFDM.
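A sketch of the waterfilling allocation over the parallel subchannels, solving for the water level by bisection; the gains and power budget are illustrative, and this is not the implementation used in the thesis.

```python
import numpy as np

def waterfill(gains, P_total, noise_var=1.0):
    """Waterfilling over parallel subchannels with power gains |H_k|^2.

    Allocates p_k = max(0, mu - noise_var / gains[k]) with sum(p_k) = P_total,
    finding the water level mu by bisection.
    """
    inv_snr = noise_var / gains                  # "floor" of each subchannel
    lo, hi = inv_snr.min(), inv_snr.max() + P_total
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if np.maximum(0.0, mu - inv_snr).sum() > P_total:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.0, 0.5 * (lo + hi) - inv_snr)

# Example: allocate unit total power over 16 subcarriers of a 4-tap channel;
# weak subcarriers may receive zero power
gains = np.abs(np.fft.fft([0.8, 0.4, 0.2, 0.1], 16)) ** 2
p = waterfill(gains, P_total=1.0)
rate = np.sum(np.log2(1 + gains * p / 1.0))
print(p.round(3), rate)
```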

3.4 Summary

This chapter provided an introduction to channel modeling in wireless communications, as well as the concepts of channel capacity and achievable rate. Furthermore, multipath propagation and the sometimes ensuing inter-symbol interference were introduced. Some strategies for combating ISI were discussed, and the OFDM technique was described in detail.

In Chapter 6, most of the ideas of this chapter will be referred to. In particular, OFDM will be used to mitigate ISI for the MIMO wideband scenario present there.


Chapter 4

Polynomial Decomposition Algorithms: Coefficient Nulling

This chapter will investigate a couple of polynomial decomposition algorithms which are based on the idea of iterative nulling of single coefficients. Polynomial Givens Rotations (PGRs) are employed for the coefficient nulling, as defined in Section 2.2.1. Through the application of consecutive PGRs on a matrix, decompositions such as PQRD and PSVD can be found.

Four algorithms will be studied, of which two were proposed by Foster et al. in [4]. The remaining two are slightly modified versions of the original algorithms.

The decompositions generated by the algorithms will be approximations, because as shown by [16], an exact FIR decomposition of a FIR matrix is impossible to achieve. In this thesis, approximate polynomial decomposition algorithms will be used for the channel diagonalization problem of spatial multiplexing in wireless communications. This topic is further studied in Chapter 6.

In [17], McWhirter proposes a different procedure for obtaining a polynomial singular value decomposition. There, a sequential best rotation algorithm is introduced using generalized Kogbetliantz transformations. The algorithm, which is not studied in this thesis, is shown to perform better than previous sequential best rotation procedures.

The first section of this chapter will describe the performance measures employed in the study of the algorithms. As these are defined, the following sections will investigate one algorithm at a time, with respect to function, convergence and complexity.

4.1 Performance Criteria

In order to measure performance of the algorithms, a number of performance criteria will be defined. For the run time measurements, or algorithm complexity, the number of iterations of the innermost loop (coefficient steps, see Section 4.2) needed until convergence will be taken as the performance measure. The complexity in terms of floating point operations is then simply obtained by plugging in the complexity of a single coefficient step in terms of floating point operations, as given by equation (2.20).

Before introducing the other performance criteria, the following optimization problems are posed:

minimize  \|A(z) - Q(z)R(z)\|_F
subject to  Q^H(z^{-*})Q(z) = I,  \|R_{lower}(z)\|_F = 0

minimize  \|A(z) - U(z)D(z)V^H(z^{-*})\|_F
subject to  U^H(z^{-*})U(z) = I,  V^H(z^{-*})V(z) = I,  \|D_{non-diag}(z)\|_F = 0.

Casually speaking, the PQRD and PSVD algorithms under study, respectively, can be thought to approximately solve these optimization problems. The resulting matrices Q(z), R(z) or U(z), D(z), V(z) would however neither be feasible nor minimize the cost function. Rather, the algorithms will output matrices that are "close" to fulfilling the constraints, while having a "small" associated cost. With this loose argument in mind, error criteria will be defined that determine how "close" a set of matrices is to fulfilling the constraints, and how "small" the associated cost is.

How well the product of the decomposition matrices describes the decomposed matrix is measured by the decomposition error

E_d^{QRD} = \frac{\|A(z) - Q(z)R(z)\|_F}{\|A(z)\|_F}    (4.1)

for a PQRD A(z) = Q(z)R(z). For the same PQRD, the triangularity error

E_t = \frac{\|R_{lower}(z)\|_F^2}{\|A(z)\|_F^2}    (4.2)

shows the ratio of the amount of energy in the lower triangular section of R(z) to the total amount of energy. Similarly, for a PSVD A(z) = U(z)D(z)V^H(z^{-*}) the decomposition error is

E_d^{SVD} = \frac{\|A(z) - U(z)D(z)V^H(z^{-*})\|_F}{\|A(z)\|_F}.    (4.3)

This decomposition error definition makes sense from the algorithm evaluation point of view, as it represents how well the decomposition factors describe the original matrix. From an application point of view, it might be interesting to study the normalized version of \|D(z) - U^H(z^{-*})A(z)V(z)\|_F, as that definition instead would describe the error in the calculated singular values of the original matrix.

The relative amount of energy in the non-diagonal part of D(z) is described by the diagonality error

E_{diag} = \frac{\|D_{non-diag}(z)\|_F^2}{\|A(z)\|_F^2}.    (4.4)

Finally, the unitarity error of any matrix A(z) is defined as

E_u = \frac{\|I - A^H(z^{-*})A(z)\|_F}{\|I\|_F}    (4.5)

and indicates how close A(z) is to being paraunitary.
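The error criteria translate directly into code once a polynomial matrix representation is fixed. The sketch below repeats the small 3-D array helpers assumed earlier so that it is self-contained, and computes E_u of (4.5) and E_d of (4.1) for causal factors; it is an illustration, not the evaluation code used in the thesis.

```python
import numpy as np

def poly_mat_mul(A, B):
    """Polynomial matrix product: coefficient matrices convolved along lags."""
    C = np.zeros((A.shape[0] + B.shape[0] - 1, A.shape[1], B.shape[2]),
                 dtype=complex)
    for a in range(A.shape[0]):
        for b in range(B.shape[0]):
            C[a + b] += A[a] @ B[b]
    return C

def para_hermitian(A):
    """A^H(z^{-*}): conjugate-transpose each coefficient and reverse the lags."""
    return np.conj(A.transpose(0, 2, 1))[::-1]

def fnorm(A):
    return np.sqrt(np.sum(np.abs(A) ** 2))

def unitarity_error(Q):
    """E_u of eq. (4.5): how far Q^H(z^{-*})Q(z) is from the identity."""
    P = poly_mat_mul(para_hermitian(Q), Q)
    I_poly = np.zeros_like(P)
    I_poly[P.shape[0] // 2] = np.eye(P.shape[1])   # lag 0 is the middle index
    return fnorm(P - I_poly) / np.sqrt(P.shape[1])

def qr_decomposition_error(A, Q, R):
    """E_d of eq. (4.1) for causal A(z), Q(z), R(z) (lags 0 .. L-1)."""
    QR = poly_mat_mul(Q, R)
    A_pad = np.zeros_like(QR)
    A_pad[:A.shape[0]] = A
    return fnorm(A_pad - QR) / fnorm(A)
```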


4.2 PQRD-BC: Polynomial QR Decomposition

The first algorithm to be studied is PQRD By Columns (PQRD-BC), which was first introduced in [4]. The algorithm generates an approximate PQRD

A(z) = Q(z)R(z)

of an arbitrary matrix A(z), where Q(z) is approximately paraunitary and R(z) is approximately upper triangular. This is done through the iterative application of PGRs, until A(z) has been transformed into a sufficiently upper triangular matrix. As will be shown in Section 4.4, PQRD-BC will be a necessary component of the PSVD by PQRD-BC algorithm.

As the next algorithm to be studied, Modified PQRD-BC, is a derivative of PQRD-BC, we will refrain from presenting simulation results for PQRD-BC. Rather, the results for Modified PQRD-BC given in Section 4.3 also represent the typical behaviour of PQRD-BC.

Though the original definition of PQRD-BC can be found in [4], we restate the algorithm here for completeness. A pseudocode representation of the algorithm can be seen in Algorithm 2.

Algorithm 2 PQRD By Columns (PQRD-BC)

1: Input polynomial matrix A(z) ∼ (V_1, V_2, C^{p×q}), convergence parameter ε, truncation parameter µ and absolute stopping criteria MaxIter and MaxSweeps.
2: Let Q(z) = I_p, g_1 = 1 + ε and n = 0.
3: while n ≤ MaxSweeps and g_1 > ε do
4:   Let n = n + 1.
5:   for k = 1 . . . min(p − 1, q) do
6:     Let iter = 0 and g_2 = 1 + ε.
7:     while iter ≤ MaxIter and g_2 > ε do
8:       Find j and t such that |a_{jk}(t)| ≥ |a_{mk}(t)| holds for m = k + 1 . . . p and ∀t ∈ Z.
9:       Let g_2 = |a_{jk}(t)|.
10:      if g_2 > ε then
11:        Let iter = iter + 1.
12:        Obtain PGR G(z) as a function of (j, k, t, |a_{jk}(t)|, |a_{kk}(0)|).
13:        Let A(z) = G(z)A(z).
14:        Let Q(z) = G(z)Q(z).
15:        Truncate A(z), Q(z) given µ.
16:      end if
17:    end while
18:  end for
19:  Find j, k and t such that |a_{jk}(t)| ≥ |a_{mn}(t)| holds for n = 1 . . . q, m = n + 1 . . . p and ∀t ∈ Z.
20:  Let g_1 = |a_{jk}(t)|.
21: end while
22: Let R(z) = A(z).

For future reference, the block of rows 12-15 will be called a coefficient step and the operations in rows 6-17 a column step. The algorithm operates over all columns of A(z) from left to right. For every column, a certain number of coefficient steps are performed, until the coefficients in the given column are sufficiently small. To determine which coefficient to null next, the coefficient with the greatest magnitude under the main diagonal in the given column is found. This coefficient is subsequently nulled through the application of a polynomial Givens rotation with appropriate parameters. This behaviour is repeated until all coefficients under the main diagonal in the given column have a magnitude less than the convergence parameter ε.

As one column step is finished, the algorithm moves to the next column, until all columns have been traversed. As a safe-guard, the algorithm can restart from column 1 if necessary, making another sweep over the columns.

Every coefficient step includes a matrix truncation, implemented as described by Algorithm 1. Without this truncation step, the maximum degrees of the matrix polynomials would grow very fast, and with that the memory requirements. As is generally the case, most of the energy of the coefficients is centered around the zero-lag coefficient, and the decomposition is therefore not ruined. This can specifically be seen in the example in Section 4.3.2.
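For illustration, here is a sketch of a single coefficient step (the search of row 8 followed by rows 12-14 of Algorithm 2), using the 3-D coefficient-array storage assumed earlier with an explicit index for the lag-0 matrix; the truncation of row 15 (Algorithm 1) would follow each such step. This is a sketch of the mechanics, not the thesis implementation.

```python
import numpy as np

def coefficient_step(A, zero_lag, k):
    """One coefficient step of PQRD-BC (rows 8 and 12-14 of Algorithm 2).

    A        : (L, p, q) array; A[v] is the coefficient of z^{-(v - zero_lag)}
    zero_lag : index of the lag-0 coefficient matrix
    k        : current column (0-based), assumed to satisfy k < p - 1
    Returns (A, zero_lag) with the dominant below-diagonal coefficient of
    column k nulled by a polynomial Givens rotation.
    """
    # Row 8: dominant coefficient below the main diagonal of column k
    col = np.abs(A[:, k + 1:, k])
    v, j_rel = np.unravel_index(np.argmax(col), col.shape)
    j, t = k + 1 + j_rel, v - zero_lag

    # Rotation parameters, eq. (2.19)
    a, b = A[zero_lag, k, k], A[v, j, k]
    theta = np.arctan2(abs(b), abs(a))
    c, s = np.cos(theta), np.sin(theta)
    ea, eb = np.exp(-1j * np.angle(a)), np.exp(-1j * np.angle(b))

    # The degree grows by 2|t|: pad |t| lags at both ends so shifts fit
    pad = np.zeros((abs(t), A.shape[1], A.shape[2]), dtype=complex)
    A = np.concatenate([pad, A.astype(complex), pad])
    zero_lag += abs(t)

    # Rows 12-14: apply G(z) of eq. (2.18) to rows k and j only
    row_k, row_j = A[:, k, :].copy(), A[:, j, :].copy()
    A[:, k, :] = c * ea * row_k + s * eb * np.roll(row_j, -t, axis=0)
    A[:, j, :] = -s * np.conj(eb) * np.roll(row_k, t, axis=0) \
                 + c * np.conj(ea) * row_j
    return A, zero_lag

# Example: one step on a random 3x3 causal polynomial matrix, column k = 0
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3, 3)) + 1j * rng.standard_normal((3, 3, 3))
v0, j0 = np.unravel_index(np.argmax(np.abs(A[:, 1:, 0])), A[:, 1:, 0].shape)
A2, zl = coefficient_step(A, zero_lag=0, k=0)
print(abs(A2[zl + v0, 1 + j0, 0]))    # the targeted coefficient is now ~0
```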

4.2.1 Convergence and Complexity

The polynomial Givens rotation is a paraunitary transformation, and therefore energy preserving. By selecting parameters according to equation (2.19), the rotation has the effect of moving energy from the coefficient being nulled to the zero-lag coefficient of the diagonal element of the current column. After a column step is finished, most of the energy of the coefficients below the main diagonal will have been transferred to the diagonal element. Because the columns are visited in order from left to right, any subsequent coefficient step will mainly affect columns including and to the right of the current column. This is because the coefficients in the previous columns will be close to being zero. The left-to-right ordering of column steps, together with the fact that the algorithm is allowed to restart for more sweeps, guarantees the convergence of the algorithm. The full convergence proof can be found in [4].

In order to derive the theoretical complexity, assume that PQRD-BC is applied to A(z) ∼ (V_1, V_2, C^{p×q}). Additionally, it will be assumed that the time lag dimension of any matrix involved will be bounded by some r ∝ (V_1 + V_2 + 1), because of the truncation step at row 15.

The complexity will be derived in terms of number of coefficient steps needed for convergence.

By definition, the block of rows 12-15 is one coefficient step. Let the complexity of the coefficient step be O(C_c), where C_c is some function of p, q, r. Since row 11 is O(1), it is insignificant compared to the complexity of the coefficient step. Rows 8 and 19 involve searches for the coefficient with the largest magnitude under the main diagonal in the current column, and under the main diagonal in any column, respectively. Denote the complexities of these searches as O(C_{jt}) and O(C_{jkt}).

The number of iterations of the while loop starting at row 7 is bounded by the constant MaxIter, and the complexity of the loop is therefore O(MaxIter · C_c) + O(MaxIter · C_{jt}) = O(C_c) + O(C_{jt}). This is a bound with a very large hidden constant in the Ordo expression, and may therefore not be representative of the typical behaviour of the loop. In practice, the number of iterations of the loop would be proportional to the number of coefficients below the main diagonal of that column. The modified PQRD-BC given in Algorithm 3 does not suffer from this theoretical inconvenience, as the convergence criterion is defined differently.

Furthermore, the for loop at row 5 will iterate min(p − 1, q) times, and the complexity of the block of rows 6-17 is therefore O(min(p − 1, q) · C_c) + O(min(p − 1, q) · C_{jt}). The number of iterations (sweeps) of the outermost loop of the algorithm is bounded by a constant MaxSweeps.

The complexity of the iterative part of the algorithm is therefore O(MaxSweeps min(p −
