

Linköping Studies in Science and Technology

Licentiate Thesis No. 1597

On the asymptotic spectral

distribution of random matrices

Closed form solutions using free independence

Jolanta Pielaszkiewicz

Department of Mathematics

Linköping University, SE–581 83 Linköping, Sweden

Linköping 2013


Linköping Studies in Science and Technology Licentiate Thesis No. 1597

On the asymptotic spectral distribution of random matrices Closed form solutions using free independence

Jolanta Pielaszkiewicz jolanta.pielaszkiewicz@liu.se www.mai.liu.se Mathematical Statistics Department of Mathematics Linköping University SE–581 83 Linköping Sweden LIU-TEK-LIC-2013:31 ISBN 978-91-7519-596-4 ISSN 0280-7971

Copyright c 2013 Jolanta Pielaszkiewicz


Abstract

The spectral distribution function of random matrices is an information-carrying object widely studied within Random matrix theory. In this thesis we combine results from that theory with the idea of free independence introduced by Voiculescu (1985).

An important theoretical part of the thesis consists of an introduction to Free probability theory, which justifies the use of asymptotic freeness with respect to particular matrices as well as the use of the Stieltjes and R-transforms. Both transforms are presented together with their properties.

The aim of the thesis is to point out characterizations of those classes of matrices which have closed form expressions for the asymptotic spectral distribution function. We consider all matrices which can be decomposed into a sum of asymptotically free independent summands.

In particular, explicit calculations are performed in order to illustrate the use of asymptotic free independence to obtain the asymptotic spectral distribution for a matrix Q, generalizing the Marčenko and Pastur (1967) theorem. The matrix Q is defined as

Q = (1/n) X1X1′ + · · · + (1/n) XkXk′,

where Xi is a p × n matrix following a matrix normal distribution, Xi ∼ Np,n(0, σ²I, I).

Finally, theorems pointing out classes of matrices Q which lead to a closed formula for the asymptotic spectral distribution are presented. In particular, results are given for matrices whose inverse Stieltjes transform, with respect to composition, is a ratio of polynomials of first and second degree.


Popular science summary (Populärvetenskaplig sammanfattning)

Random matrices, that is, matrices whose elements follow some stochastic distribution, appear in many applications. It is often of interest to know how the eigenvalues of these random matrices behave, that is, to compute the distribution of the eigenvalues, the so-called spectral distribution. The eigenvalues are information-carrying objects: through the smallest eigenvalue, for example, they provide information about the stability and the inverse of the random matrix.

Telecommunications and theoretical physics are two areas where it is of interest to study random matrices, their eigenvalues and the distribution of these eigenvalues. This holds in particular for large random matrices, where one is interested in the asymptotic spectral distribution. One example is a channel matrix X for a multi-dimensional communication system, where the distribution of the eigenvalues of the matrix XX∗ determines the channel capacity and the achievable transmission rate.

The spectral distribution has been studied extensively within the theory of random matrices. In this thesis we combine results from the theory of random matrices with the idea of free independence, first discussed by Voiculescu (1985). The important theoretical framework of this thesis consists of an introduction to free probability theory, which motivates the use of asymptotic freeness with respect to certain matrices, as well as the use of the Stieltjes and R-transforms. Both transforms are discussed together with their properties.

The aim of the thesis is to characterize classes of matrices which have a closed form expression for the asymptotic spectral distribution. In this way one avoids numerical approximations of the spectral distribution when drawing conclusions about the eigenvalues.


Acknowledgments

I would like to express my gratitude to my supervisor Dietrich von Rosen for all the advice, support, encouragement and guidance during my work. Thank you for all the discussions and for commenting on various drafts of the thesis. Similarly, I would like to acknowledge all the substantive and administrative help I received from Martin Singull, my co-supervisor. Special thanks for believing in my Swedish language skills.

I am thankful to all the members of the Department of Mathematics at LiU for creating a friendly working atmosphere, especially to the people I have had the opportunity to cooperate or study with. Here, I cannot go without mentioning my colleagues, the PhD students, who have been a great source of friendship.

Finally, I want to thank my family for their love and faith in me, and my friends from all around the world.

Linköping, May 13, 2013 Jolanta Pielaszkiewicz


Contents

1 Introduction
1.1 Problem formulation
1.2 Background and some previous results
1.3 Outline

2 Free probability theory - background information
2.1 Non-commutative space and Freeness
2.1.1 Space (RMp(C), τ)
2.1.2 Asymptotic Freeness
2.2 Combinatorial interpretation of freeness. Cumulants
2.2.1 Proof of asymptotic freeness between diagonal and Gaussian matrices

3 Stieltjes and R-transform
3.1 Stieltjes transform
3.2 R-transform

4 Marčenko-Pastur, Girko-von Rosen and Silverstein-Bai theorems
4.1 Statements of theorems
4.2 Comparison of results

5 Analytical form of the asymptotic spectral distribution function
5.1 Asymptotic spectral distribution through the R-transform
5.1.1 Asymptotic spectral distribution of the matrix Qn when Σ = I and Ψ = I
5.2 Classes of matrices with closed formula for asymptotic spectral distribution function

6 Future Research

Notation

Bibliography


1

Introduction

The focus of the thesis is on the study of the asymptotic behaviour of large random matrices, say Q ∈ Q, where Q is the set of all p × p positive definite, Hermitian matrices which can be written as a sum of asymptotically free independent matrices with known distributions. The aim is to obtain a closed form expression for the asymptotic spectral distribution function of the random matrix Q.

The concept of free independence, mentioned in the previous paragraph, was introduced within an operator-valued version of Free probability theory by Voiculescu (1985) and allows us to think about sums of asymptotically free independent matrices in a way similar to sums of independent random variables.

The motivation for considering problems regarding the behaviour of the eigenvalues of large dimensional random matrices arises within, e.g., theoretical physics and wireless communication, where methods of Random matrix theory are commonly used; see the publications by Tulino and Verdú (2004) and Couillet and Debbah (2011). In particular, Random matrix theory started to play an important role in the analysis of communication systems as multiple antennas became commonly used, which implies an increase in the number of nodes. In applied studies, due to the lack of closed form solutions, numerical methods are often applied to obtain asymptotic eigenvalue distributions; see, e.g., Chen et al. (2012). The computation of the asymptotic spectral distribution for Q ∈ Q, given by Q = Pp×n P∗n×p, demands solving a non-linear system of p + n coupled functional equations, whose solution has the same asymptotic behaviour as the investigated spectral distribution function; see Girko (1990) and Hachem et al. (2007).

In general, there is a strong interest in finding information about the spectral behaviour of the eigenvalues of random matrices. In particular, the smallest eigenvalue provides knowledge about both the stability and the invertibility of a random positive definite matrix. For example, let X be a channel matrix for a multi-dimensional communication system; then the eigenvalue distribution function of XX∗ determines the channel capacity and the achievable transmission rate.


1.1

Problem formulation

We are interested in the asymptotic spectral distribution of a positive definite and Hermitian matrix Q ∈ Q, which can be written as a sum of asymptotically free independent matrices with known distributions. The assumption that the matrix is Hermitian is sufficient for the eigenvalues to be real. The Hermitian property together with positive definiteness ensures that all the eigenvalues are both real and positive.

In the thesis the concept of an "infinite matrix" is always realized by referring to a sequence of random matrices of increasing size.

Definition 1.1 (The normalized spectral distribution function). Let λi, i = 1, 2, . . . , p, be the eigenvalues of a p × p matrix W with complex entries. The normalized spectral distribution function of the matrix W is defined by

F_p^W(x) = (1/p) Σ_{k=1}^p 1_{λk ≤ x},   x ≥ 0,

where 1_{λk ≤ x} stands for the indicator function, i.e.

1_{λk ≤ x} = 1 if λk ≤ x, and 0 otherwise.
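As a numerical illustration (our own addition, not part of the thesis text), Definition 1.1 can be evaluated directly with NumPy; the helper name spectral_cdf and the test matrix are arbitrary choices.

```python
import numpy as np

def spectral_cdf(W, x):
    """Normalized spectral distribution F_p^W(x): the fraction of
    eigenvalues of W that are <= x (W assumed Hermitian)."""
    eigenvalues = np.linalg.eigvalsh(W)  # real spectrum for Hermitian W
    return np.mean(eigenvalues <= x)

rng = np.random.default_rng(0)
p = 4
A = rng.standard_normal((p, p))
W = A @ A.T                       # positive semi-definite, eigenvalues >= 0

assert spectral_cdf(W, -1.0) == 0.0   # no eigenvalue below zero
assert spectral_cdf(W, 1e9) == 1.0    # all eigenvalues captured
```

For a Hermitian matrix the eigenvalues are real, so np.linalg.eigvalsh applies and the indicator average reduces to a single vectorized comparison.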

Previous results obtained in the literature will be compared using distinguished elements of the class of matrices Q, denoted by Qn, which can be written as a sum of p × p matrices of the form (1/n) XiXi′, where Xi is a p × n matrix, such that Xi and Xj are independent for all i ≠ j. We assume that the Kolmogorov condition p(n)/n → c ∈ (0, ∞) as n → ∞ holds. More precisely, let

Qn = AnAn′ + (1/n) X1X1′ + · · · + (1/n) XkXk′,   (1.1)

where Xi has a matrix normal distribution, Xi ∼ Np,n(0, Σi, Ψi), for all i = 1, . . . , k, Xi and Xj are independent for all i ≠ j, and An is a non-random p × n matrix. The mean of the matrix normal distribution is a p × n zero matrix and the dispersion matrix of Xi has the Kronecker product structure, i.e., D[Xi] = D[vec Xi] = Ψi ⊗ Σi, where both Σi and Ψi are positive definite matrices. Here vec X denotes the vectorization of the matrix X. If some elements are standardized, one can interpret Σi and Ψi as the covariance matrices of the rows and the columns of Xi, respectively.

Qn can also be rewritten in the more compact form

Qn = PnPn′,

where Pn = Bn + Yn, Bn = (An, 0, . . . , 0) and Yn = (1/√n)(0, X1, . . . , Xk). This form shows immediately that Qn in our problem formulation is a special case of the matrix ΩΩ∗, whose spectral behaviour has been discussed by Hachem et al. (2007).

The use of the Kolmogorov condition is motivated by the observation that the limiting spectral distribution function of the matrix Qn is not affected by the increase of the number of rows and columns as long as the speed of increase is the same, i.e., p = O(n).
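A minimal simulation sketch of (1.1) in the special case Σi = σ²I and Ψi = I (the helper name sample_Q and all parameter values below are our assumptions, chosen only to illustrate the Kolmogorov regime p/n → c):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_Q(p, n, k, sigma2=1.0, A=None):
    """One draw of Qn = A A' + (1/n) sum_i Xi Xi' from (1.1), in the special
    case Sigma_i = sigma2 * I, Psi_i = I (Xi has i.i.d. N(0, sigma2) entries)."""
    Q = np.zeros((p, p)) if A is None else A @ A.T
    for _ in range(k):
        X = np.sqrt(sigma2) * rng.standard_normal((p, n))
        Q += (X @ X.T) / n
    return Q

# Kolmogorov regime: p/n -> c; here p = 50, n = 100, i.e. c = 0.5
Q = sample_Q(p=50, n=100, k=3)
assert np.allclose(Q, Q.T)                 # Q is symmetric (Hermitian)
assert np.linalg.eigvalsh(Q).min() > 0     # and positive definite (a.s.)
```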


1.2

Background and some previous results

Random matrix theory is the main field whose research interest lies in the properties of matrices, with a strong accent on the eigenvalue distribution. The matrices considered, called random matrices, have entries following some probability distribution. An ensemble of random matrices is a family of random matrices with a density function that expresses the probability density p of any member of the family being observed. Let Hn → UHnU−1 be a transformation which leaves p(Hn) invariant. Within Random matrix theory the most studied, classical cases are those where U is an orthogonal or a unitary matrix. Then the matrices Hn are real symmetric matrices, which gives the Gaussian Orthogonal Ensemble (GOE), or complex Hermitian matrices, which corresponds to the Gaussian Unitary Ensemble (GUE). All the mentioned matrices have independent diagonal and off-diagonal entries, normally distributed with mean zero and variance 1/2.

Ensemble   Hij       U
GOE        real      orthogonal
GUE        complex   unitary

Table 1.1: Classification of Gaussian ensembles, the Hermitian matrix Hn = (Hij) and its matrix of eigenvectors U.

One of the important elements of the GOE is the Wigner matrix. Let X be a p × p matrix with independent and identically distributed (i.i.d.) Gaussian entries; then the symmetric matrix Q defined as

Q = (1/2)(X + X∗)

is called a Wigner matrix.
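A short numeric sanity check (our own illustration, not from the thesis): the rescaled eigenvalues of a Wigner matrix built this way approximately follow Wigner's semicircle law; since the off-diagonal entries of Q have variance 1/2, the limiting second moment is 1/2 and the support is roughly [−√2, √2].

```python
import numpy as np

rng = np.random.default_rng(2)
p = 400
X = rng.standard_normal((p, p))
W = (X + X.T) / 2                          # Wigner matrix: symmetric, Gaussian

eig = np.linalg.eigvalsh(W) / np.sqrt(p)   # rescale so the spectrum stays bounded

# Semicircle prediction: second moment 1/2, support ~ [-sqrt(2), sqrt(2)]
m2 = np.mean(eig ** 2)
assert abs(m2 - 0.5) < 0.05
assert eig.max() < np.sqrt(2) + 0.2
```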

In this section we briefly discuss results obtained within Random matrix theory regarding the spectral distribution of the matrix Qn = PnPn′ defined in (1.1). Let us categorize the literature investigating the asymptotic behaviour of eigenvalues into three categories with respect to the assumptions used.

1) Independent and identically distributed entries of Pn with zero mean

The simplified version of the matrix Qn, when k = 1 and An = 0, with the assumption of independent and identically distributed entries in the p × n random matrix X1, was considered first by Marčenko and Pastur (1967). For some details see Chapter 4 of this thesis, or consult the works by Yin (1986), Silverstein and Bai (1995) and Silverstein (1995).
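For orientation, the k = 1, An = 0 case is easy to simulate; the following sketch (our illustration, with arbitrarily chosen p and n) checks the eigenvalues of (1/n) X1X1′ against the Marčenko-Pastur support edges (1 ± √c)² and second moment 1 + c, where c = p/n.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n = 200, 400                       # Kolmogorov ratio c = p/n = 0.5
X = rng.standard_normal((p, n))
eig = np.linalg.eigvalsh(X @ X.T / n)

c = p / n
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2   # Marcenko-Pastur edges
assert eig.min() > 0 and eig.max() < b + 0.3
assert abs(np.mean(eig ** 2) - (1 + c)) < 0.1          # MP second moment is 1 + c
```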

2) Non-i.i.d. entries of Pn with zero mean

Studies of the asymptotic spectral distribution without the i.i.d. assumption on the entries have been performed by Girko (1990), Girko and von Rosen (1994) and Khorunzhy et al. (1996). In Section 4.2 a generalized version of the result presented by Girko and von Rosen (1994) is discussed in relation to the result of Silverstein and Bai (1995).


3) Independent and identically distributed entries of Pn with non-zero mean

Under restrictive assumptions on An in (1.1), the asymptotic spectral behaviour has been explained by the solution of a certain nonlinear system of n + p coupled functional equations, see Girko (2001). The product Qn = PnPn′ = (Rn + Yn)(Rn + Yn)′, where Rn and Yn are independent random matrices, Yn has i.i.d. entries and the empirical distribution of RnRn′ converges to some non-random distribution, has been discussed by Dozier and Silverstein (2007). In that work it has been shown that the distribution of the eigenvalues of Qn converges almost surely to the same deterministic distribution. The Stieltjes transform of that distribution, which by the Stieltjes inversion formula determines the spectral distribution function, is uniquely given by certain functional equations. For more details about the Stieltjes transform, see Section 3.1.

When Rn = (Rn_ij) denotes a deterministic and pseudo-diagonal matrix (i.e., Rn_ij = 0 for all i ≠ j, but Rn does not necessarily have to be square) and Yn = (Yn_ij) has independent but not i.i.d. entries (Yn_ij = (σij(n)/√n) Xn_ij, where the Xn_ij are i.i.d. and {σij(n)} is a bounded, real sequence of numbers), the empirical spectral distribution of Qn converges almost surely to a non-random probability measure; this has been studied and proven by Hachem et al. (2006). In the theorem presented there, under a set of suitable assumptions, a system of equations describing the spectral distribution through its Stieltjes transform is given. A weaker version of these assumptions (given below as A1, A2, A3) for An in (1.1) is considered in the later paper by Hachem et al. (2007).

A1 The Xn_ij are real and i.i.d. with E(Xn_ij)² = 1, and there exists ε > 0 such that E|Xn_ij|^{4+ε} < ∞.

A2 There exists a σmax such that the family (σij(n)), 1 ≤ i ≤ p, 1 ≤ j ≤ n, called the variance profile, satisfies

sup_{n≥1} max_{i,j} |σij(n)| < σmax.

A3 The supremum over all n ≥ 1 of the maximum of the Euclidean norms of all columns and rows of the matrix An is finite.

Under this set of assumptions it is proved by Hachem et al. (2007) that there exists a deterministic equivalent of the empirical Stieltjes transform of the distribution of the eigenvalues of Qn. More precisely, there exists a deterministic p × p matrix-valued function Tn(z), analytic in C \ R+, such that

lim_{n→∞, p/n→c} ( (1/p) Tr(Qn − zIp)^{−1} − (1/p) Tr(Tn(z)) ) = 0   a.s., for all c > 0.

Note that by n → ∞, p/n → c we indicate that the so-called Kolmogorov condition holds, and (1/p) Tr(Qn − zIp)^{−1} is a version of the Stieltjes transform (see Section 3.1) of the spectral distribution of the matrix Qn. Tr A denotes the trace of a square matrix, i.e., Tr A = Σi Aii.
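The empirical quantity (1/p) Tr(Qn − zIp)^{−1} is straightforward to compute. The sketch below (our illustration; the helper name empirical_stieltjes is assumed) also checks the basic property that the Stieltjes transform of a probability measure maps the upper half-plane into itself.

```python
import numpy as np

def empirical_stieltjes(Q, z):
    """(1/p) Tr (Q - z I)^(-1): the empirical Stieltjes transform of the
    spectral distribution of Q, evaluated at a point z off the real axis."""
    p = Q.shape[0]
    return np.trace(np.linalg.inv(Q - z * np.eye(p))) / p

# For the identity matrix the transform is exactly 1 / (1 - z)
z = 2j
assert abs(empirical_stieltjes(np.eye(4), z) - 1 / (1 - z)) < 1e-12

# For Im z > 0 the transform of any empirical spectral measure has Im > 0
rng = np.random.default_rng(3)
X = rng.standard_normal((100, 200))
s = empirical_stieltjes(X @ X.T / 200, 1.0 + 1.0j)
assert s.imag > 0
```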

Moreover, Example 1.1, given by Hachem et al. (2007), shows that in general the convergence of the empirical spectral density of Qn can fail, despite the existence of a variance profile in some limit and despite the convergence of the spectral distribution of the non-random matrix AnAn′.


Example 1.1: Hachem et al. (2007)

This example motivates the additional assumptions A1, A2, A3 used by Hachem et al. (2007) to avoid lack of convergence for the spectral measure of Qn.

Consider the 2n × 2n matrix Yn of the form

Yn = ( Wn 0 ; 0 0 ),

where Wn = (Wn_ij) is a square matrix of size n such that Wn_ij = Xij/√n, where the Xij are i.i.d. with mean 0 and variance 1. It is easy to see that Yn is a matrix with a variance profile.

Next, two 2n × 2n deterministic matrices

Bn = ( In 0 ; 0 0 )   and   B̄n = ( 0 0 ; 0 In )

are considered. They are chosen so that both spectral distribution functions F_{2n}^{BnBn′} and F_{2n}^{B̄nB̄n′} converge to (1/2)δ0 + (1/2)δ1 as n → ∞. It is shown that F_{2n}^{(Yn+An)(Yn+An)′}, where An is alternately equal to Bn and to B̄n, does not admit a limiting distribution, since

F^{(Yn+An)(Yn+An)′} = (1/2)Pcub + (1/2)δ0 if n is even, and (1/2)PMP + (1/2)δ1 if n is odd.

Here PMP and Pcub denote the distributions of WnWn′ (the Marčenko-Pastur distribution) and of (Wn + In)(Wn + In)′, respectively.

The mentioned example points out that, to ensure convergence of the spectral measure, we must consider assumptions concerning the boundedness of the norms of the rows and columns of the matrix An, and the existence of at least four moments of the Xn_ij for each Yi given by a variance profile, for all i = 1, . . . , k. The question which arises is whether we are able to distinguish classes of matrices Yi such that the obtained asymptotic spectral distribution function is given by a closed form expression.

1.3

Outline

The thesis starts with a short Introduction comprising the problem formulation and some basic literature. It is followed by the four main chapters of the work. All symbols and operators used are listed at the end of the work in the Notation section, after which the Bibliography and Appendix are placed. A more detailed outline of the main parts is presented below.

In Chapter 2, one can find an introduction to the most basic ideas and results of Free probability theory, where the main focus is put on the concept of free independence. The chapter includes a proof of asymptotic freeness between particular classes of matrices and introduces the free cumulants used later to define the R-transform. Then, in Chapter 3, the R- and Stieltjes transforms are discussed. Here, the properties of the R-transform and its relation to the Stieltjes transform play the key role.


After this introduction to the theoretical tools used in the thesis, Chapter 4 presents the Marčenko-Pastur theorem and compares two theorems given by Girko and von Rosen (1994) with one formulated according to Silverstein and Bai (1995).

In Chapter 5, one is introduced to the ideas and results related to the research question concerning the finding of closed formulas for the asymptotic spectral distribution of Q ∈ Q. An illustrative example in Section 5.1.1 allows us to state a theorem which generalizes an earlier result by Marčenko and Pastur (1967). In Section 5.2 the results point out classes of matrices Q ∈ Q which lead to a closed formula for the asymptotic spectral distribution of large random matrices. The results are given by stating the asymptotic spectral distribution for all matrices with a particular form of the inverse Stieltjes transform, with respect to composition. The thesis finishes with Chapter 6, where we pose some future research questions.


2

Free probability theory - background information

Free probability theory was established by Voiculescu in the middle of the 1980s (Voiculescu, 1985) and, together with the result published in Voiculescu (1991) regarding asymptotic freeness of random matrices, it has established a new branch of theories and tools in Random matrix theory, such as the R-transform. Freeness can also be studied through equivalent combinatorial definitions based on the idea of non-crossing partitions (see Section 2.2). The following chapter refers both to the Random matrix theory and to the combinatorial point of view, while introducing basic definitions and concepts of the theory. These are introduced in the general set-up of a non-commutative probability space and specified in Section 2.1.1 to the algebra of random matrices. The combinatorial approach is introduced mainly in order to present a proof of asymptotic freeness between Gaussian and diagonal matrices.

2.1

Non-commutative space and Freeness

In this section the goal is to present the concept of freeness in a non-commutative space, introduced following, among others, the books by Nica and Speicher (2006) and Voiculescu et al. (1992). Some properties of elements of a non-commutative space are also presented. For further reading, see Hiai and Petz (2000).

Definition 2.1 (Non-commutative probability space). A non-commutative probability space is a pair (A, τ), where A is a unital algebra over the field of complex numbers C with identity element 1A and τ is a unital functional such that:

• τ : A → C is linear,
• τ(1A) = 1.


Definition 2.2. The functional τ is called a trace if τ(ab) = τ(ba) for all a, b ∈ A.

Note that the word trace is used in the thesis with two meanings: trace as the name of a functional fulfilling τ(ab) = τ(ba) for all a, b ∈ A, and the trace of a square matrix A = (Aij), i.e., Tr A = Σi Aii.

Definition 2.3. Let A in Definition 2.1 have a ∗-operation such that ∗ : A → A, (a∗)∗ = a and (ab)∗ = b∗a∗ for all a, b ∈ A, and let the functional τ satisfy

τ(a∗a) ≥ 0 for all a ∈ A.

Then we call τ positive and (A, τ) a ∗-probability space.

Remark 2.1. If (A, τ) is a ∗-probability space, then τ(a∗) equals the complex conjugate of τ(a) for all a ∈ A.

Proof of Remark 2.1: Take a ∈ A; then, as A is a ∗-algebra over C, we can write uniquely a = x + iy, where x = x∗ and y = y∗.

Let us first show that τ(x) ∈ R for all x such that x = x∗. As x = x∗, we can rewrite

x = ((1/2)(x + 1A))∗ ((1/2)(x + 1A)) − ((1/2)(x − 1A))∗ ((1/2)(x − 1A)) = w∗w − q∗q.

Of course w, q ∈ A. Then, by linearity and positivity of the functional τ, we get that

τ(x) = τ(w∗w − q∗q) ∈ R.

Using that τ(x) ∈ R to obtain the equality (∗), we immediately prove the result:

τ(a∗) = τ(x − iy) = τ(x) − iτ(y) =(∗)= the complex conjugate of τ(x) + iτ(y) = the complex conjugate of τ(x + iy) = the complex conjugate of τ(a).

Subsection 2.1.1 gives an example of a ∗-probability space with a positive tracial functional, namely the space (RMp(C), τ) of p × p random matrices with entries being complex random variables with all moments finite, equipped with a functional such that τ(X) := E(Trp(X)) for all X ∈ RMp(C), where Trp = (1/p) Tr is the normalized (weighted) trace. Keeping for now the general set-up, we define random variables, moments and distributions for elements of a non-commutative space.

Definition 2.4 (Freeness of algebras). The subalgebras A1, . . . , Am ⊂ A, where (A, τ) is a non-commutative probability space, are free if and only if for any (a1, . . . , an) with aj ∈ Aij the implication

τ(ai) = 0 for all i ∈ {1, 2, . . . , n} and ij ≠ ij+1 for all j ∈ {1, 2, . . . , n − 1}  ⇒  τ(a1 · · · an) = 0

holds. Note that ij ≠ ij+1 for all j ∈ {1, 2, . . . , n − 1} means that any two elements aj and aj+1 with neighbouring indices belong to different subalgebras.

We denote the free algebra with generators a1, . . . , am by C⟨a1, . . . , am⟩, i.e., all polynomials in m non-commutative indeterminates.


Definition 2.5.

a) An element a ∈ A is called a non-commutative random variable and τ(a^j) is its j-th moment, for all j ∈ N. The power a^j is well defined due to the fact that the algebra is closed under multiplication of elements.

b) Let a ∈ A, where (A, τ) denotes a ∗-probability space. The linear functional µ on the unital algebra C⟨X, X∗⟩, which is freely generated by the two non-commutative indeterminates X and X∗, defined by

µ : C⟨X, X∗⟩ → C,  µ(X^{ψ1} X^{ψ2} · · · X^{ψk}) = τ(a^{ψ1} a^{ψ2} · · · a^{ψk})

for all k ∈ N and all ψ1, ψ2, . . . , ψk ∈ {1, ∗}, is called the ∗-distribution of a.

c) Let a ∈ A be normal, i.e., aa∗ = a∗a, where (A, τ) denotes a ∗-probability space. If there exists a compactly supported probability measure µ on C such that

∫_C z^n z̄^k dµ(z) = τ(a^n (a∗)^k)

for all n, k ∈ N, then µ is called the ∗-distribution of a and is uniquely defined.

Assume that the support of the ∗-distribution of a is real and compact. Then the real probability measure µ given by Definition 2.5 is related to the moments by

τ(a^k) = ∫_R x^k dµ(x)

and is called the distribution of a. The distribution of a ∈ A with compact support is characterized by its moments τ(a), τ(a²), . . .

Definition 2.6 (Freeness). The variables (a1, a2, . . . , am) and (b1, . . . , bn) are said to be free if and only if for any (Pi, Qi)_{1≤i≤p} ∈ (C⟨a1, . . . , am⟩ × C⟨b1, . . . , bn⟩)^p such that

τ(Pi(a1, . . . , am)) = 0 and τ(Qi(b1, . . . , bn)) = 0, for i = 1, . . . , p,

the following equation holds:

τ( ∏_{1≤i≤p} Pi(a1, . . . , am) Qi(b1, . . . , bn) ) = 0.

To be able to show in Lemma 2.2, given below, that freeness does not go together with classical independence of commuting variables, we first state and prove Lemma 2.1.

Lemma 2.1
Let a and b be free elements of a non-commutative probability space (A, τ). Then we have:

τ(ab) = τ(a)τ(b),   (2.1)
τ(aba) = τ(a²)τ(b),
τ(abab) = τ(a²)τ(b)² + τ(a)²τ(b²) − τ(a)²τ(b)².   (2.2)


Proof: For free a and b we have

τ((a − τ(a)1A)(b − τ(b)1A)) = 0,
τ(ab − aτ(b) − τ(a)b + τ(a)τ(b)1A) = 0,
τ(ab) = τ(a)τ(b).

Then also

τ((a − τ(a)1A)(b − τ(b)1A)(a − τ(a)1A)) = 0,
τ((ab − aτ(b) − τ(a)b + τ(a)τ(b)1A)(a − τ(a)1A)) = 0,
τ(aba − aτ(b)a − τ(a)ba + τ(a)τ(b)a − abτ(a) + aτ(b)τ(a) + τ(a)bτ(a) − τ(a)τ(b)τ(a)1A) = 0,
τ(aba) − τ(a²)τ(b) − τ(a)τ(ba) + τ(a)τ(b)τ(a) − τ(ab)τ(a) + τ(a)τ(b)τ(a) + τ(a)τ(b)τ(a) − τ(a)τ(b)τ(a) = 0,
τ(aba) = τ(a²)τ(b).

Similar calculations show that

τ(abab) = τ(a²)τ(b)² + τ(a)²τ(b²) − τ(a)²τ(b)².
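Formula (2.2) can be checked numerically with matrices that are only asymptotically free, for example a Wigner matrix and a deterministic diagonal matrix (their asymptotic freeness is the subject of Section 2.2.1). In this sketch (our own illustration) the identity holds only approximately at finite p, and τ is estimated by a single normalized trace rather than by its expectation.

```python
import numpy as np

rng = np.random.default_rng(5)
p = 1000
X = rng.standard_normal((p, p))
A = (X + X.T) / (2 * np.sqrt(p))              # normalized Wigner matrix
d = np.where(np.arange(p) % 2 == 0, 1.0, 2.0)
D = np.diag(d)                                # deterministic diagonal matrix

def tau(M):
    return np.trace(M).real / len(M)          # tau ~ normalized trace (one sample)

lhs = tau(A @ D @ A @ D)
rhs = (tau(A @ A) * tau(D) ** 2
       + tau(A) ** 2 * tau(D @ D)
       - tau(A) ** 2 * tau(D) ** 2)           # right-hand side of formula (2.2)
assert abs(lhs - rhs) < 0.1                   # approximate equality at finite p
```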

One can prove that freeness and commutativity cannot hold simultaneously, as stated in the next lemma.

Lemma 2.2
Let a and b be non-trivial elements of a ∗-algebra A, equipped with a functional τ, such that a and b commute, i.e., ab = ba. Then a and b are not free.

Proof by contradiction: Take two non-trivial elements a and b of the ∗-algebra A such that they are both free and commute. Then, using ab = ba and (2.1),

τ(abab) = τ(a²b²) = τ(a²)τ(b²),

and, by (2.2),

τ(abab) = τ(a²)τ(b)² + τ(a)²τ(b²) − τ(a)²τ(b)².

These two equalities give

τ(a²)τ(b)² + τ(a)²τ(b²) − τ(a)²τ(b)² = τ(a²)τ(b²).   (2.3)


Then, as a and b are free,

τ((a − τ(a)1A)²) τ((b − τ(b)1A)²)
= τ(a² − 2aτ(a) + τ(a)²1A) τ(b² − 2bτ(b) + τ(b)²1A)
= (τ(a²) − 2τ(a)² + τ(a)²)(τ(b²) − 2τ(b)² + τ(b)²)
= (τ(a²) − τ(a)²)(τ(b²) − τ(b)²)
= τ(a²)τ(b²) − τ(a²)τ(b)² − τ(a)²τ(b²) + τ(a)²τ(b)²
= 0,

where the last equality follows from (2.3). Then either τ((a − τ(a)1A)²) = 0 or τ((b − τ(b)1A)²) = 0. As long as the functional τ is faithful, i.e., τ(a∗a) = 0 ⇒ a = 0, the obtained equality implies that a = τ(a)1A or b = τ(b)1A. Thus at least one of the elements a or b is trivial, which contradicts the assumption that a and b are non-trivial and proves the statement.

2.1.1

Space (RMp(C), τ)

In this subsection we consider a particular example of a non-commutative space, (RMp(C), τ). Let (Ω, F, P) be a probability space; then RMp(C) denotes the set of all p × p random matrices with entries belonging to ∩_{p=1,2,...} L^p(Ω, P), i.e., the entries are complex random variables with finite moments of any order. Defined in this way, RMp(C) is a ∗-algebra with the classical matrix product as multiplication and the conjugate transpose as ∗-operation. The ∗-algebra is equipped with the tracial functional τ defined as the expectation of the normalized trace Trp in the following way:

τ(X) := E(Trp(X)) = E( (1/p) Tr(X) ) = (1/p) E( Σ_{i=1}^p Xii ) = (1/p) Σ_{i=1}^p E Xii,   (2.4)

where X = (Xij), i, j = 1, . . . , p, belongs to RMp(C).

The form of the chosen functional τ is determined by the fact that the distribution of the eigenvalues is of special interest to us. Notice that for any normal matrix X ∈ (RMp(C), τ) the eigenvalue distribution µX is the ∗-distribution, with respect to the given functional τ, defined in Definition 2.5.

First, consider a matrix X with eigenvalues denoted by λ1, . . . , λp. Then

Trp(X^k (X∗)^n) = (1/p) Σ_{i=1}^p λi^k (λ̄i)^n = ∫_C z^k z̄^n dµX(z)

for all k, n ∈ N, where µX is the spectral probability measure corresponding to the normalized spectral distribution function defined in Definition 1.1. For

µX(x) = (1/p) ∫_Ω Σ_{k=1}^p δ_{λk(ω)} dP(ω),

where δ_{λk(ω)} stands for the Dirac delta, we obtain a generalization of the above statement for X being a normal random matrix, so that

τ(X^k (X∗)^n) = (1/p) ∫_Ω Σ_{i=1}^p λi^k(ω) (λ̄i(ω))^n dP(ω) = ∫_C z^k z̄^n dµX(z).


Hence, the trace τ defined in (2.4) yields a ∗-distribution in the sense of Definition 2.5c.

Remark 2.2. In general, for an arbitrary random matrix X the measure µX does not have compact support.
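The identity between normalized trace moments and moments of the empirical spectral measure is easy to verify for one Hermitian sample (our own illustration; the equality here is exact linear algebra, since Tr X^k is the sum of the k-th powers of the eigenvalues):

```python
import numpy as np

rng = np.random.default_rng(6)
p = 50
G = rng.standard_normal((p, p))
X = (G + G.T) / 2                  # Hermitian, hence normal

lam = np.linalg.eigvalsh(X)        # real eigenvalues

# Tr_p(X^k) equals the k-th moment of the empirical spectral measure mu_X
for k in range(1, 5):
    trace_moment = np.trace(np.linalg.matrix_power(X, k)) / p
    spectral_moment = np.mean(lam ** k)
    assert np.isclose(trace_moment, spectral_moment)
```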

2.1.2

Asymptotic Freeness

The concept of asymptotic freeness was established by Voiculescu (1991), where Gaussian random matrices together with constant unitary matrices were discussed.

Theorem 2.1 (Voiculescu's Asymptotic Freeness)
Let Xp,1, Xp,2, . . . be independent (in the classical sense) p × p GUE matrices. Then there exists a functional φ on the non-commutative polynomial algebra C⟨X1, X2, . . .⟩ such that

• (Xp,1, Xp,2, . . .) has the limit distribution φ as p → ∞, i.e.,

φ(X_{i1} X_{i2} · · · X_{ik}) = lim_{p→∞} τp(X_{p,i1} X_{p,i2} · · · X_{p,ik})

for all ij ∈ N, j ∈ N, where τp(X) = E(Trp X);

• X1, X2, . . . are freely independent with respect to φ, see Definition 2.6.

The mentioned work was followed by Dykema (1993), who replaced the Gaussian entries of the matrices with more general non-Gaussian random variables. Furthermore, the constant diagonal matrices were generalized to constant block diagonal matrices such that the block size remains constant. In general, random matrices of size p × p with independent entries tend to be asymptotically free as p → ∞, under certain conditions.

To give some additional examples: two unitary p × p matrices are asymptotically free, and two i.i.d. p × p Gaussian distributed random matrices are asymptotically free as p → ∞. For future use in Chapter 5 we mention here the asymptotic freeness between i.i.d. Wigner matrices; this fact has been proven by Dykema (1993). Asymptotic free independence also holds for Gaussian and Wishart random matrices, and for Wigner and Wishart matrices, see Capitaine and Donati-Martin (2007).

Following Müller (2002), we want to point out that there exist matrices which are dependent, in the classical sense, yet asymptotically free, as well as matrices with entries that are independent, in the classical sense, which are not asymptotically free.

Remark 2.3. Let D1 and D2 be independent diagonal random matrices, and let the matrices H1 and H2 be Haar distributed, independent of each other and of the diagonal matrices D1, D2. Then,

• D1 and D2 are not asymptotically free;
• but H1D1H1∗ and H2D2H2∗ are asymptotically free.


2.2

Combinatorial interpretation of freeness. Cumulants

The combinatorial interpretation of freeness, described using free cumulants (see Definition 2.9 below), was established by Speicher (1994) and developed by Nica and Speicher (2006). The two main purposes of this section are to introduce the idea of free cumulants and to present the steps of the proof of asymptotic free independence between some particular classes of matrices. Free cumulants play an important role in Chapter 3, where the R-transform is defined.

Definition 2.7 (Non-crossing partition). Let V = {V1, . . . , Vp} be a partition of the set {1, . . . , r}, i.e. for all i = 1, . . . , p the Vi are ordered and disjoint sets and V1 ∪ · · · ∪ Vp = {1, . . . , r}. The partition V is called non-crossing if for all i, j = 1, . . . , p with Vi = (v1, . . . , vn) (such that v1 < . . . < vn) and Vj = (w1, . . . , wm) (such that w1 < . . . < wm) we have

wk < v1 < wk+1 ⇔ wk < vn < wk+1 (k = 1, . . . , m − 1).

The definition of a non-crossing partition presented here, following Speicher, can be given in an equivalent recursive form.

Definition 2.8 (Non-crossing partition, recursive definition). The partition V = {V1, . . . , Vp} is non-crossing if at least one of the Vi is a segment of (1, . . . , r), i.e. it has the form Vi = (k, k + 1, . . . , k + m), and {V1, . . . , Vi−1, Vi+1, . . . , Vp} is a non-crossing partition of {1, . . . , r} \ Vi.

Let the set of all non-crossing partitions over{1, . . . , r} be denoted by NC(r).
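As an illustration (not from the thesis; helper names are ours), the recursive definition can be turned into a brute-force executable check for small r: enumerating all set partitions of {1, . . . , r} and discarding the crossing ones recovers the well-known fact that |NC(r)| is the r-th Catalan number.

```python
def partitions(elems):
    # All set partitions of the list `elems`, generated recursively
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for smaller in partitions(rest):
        # put `first` into an existing block ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a block of its own
        yield [[first]] + smaller

def crosses(V, W):
    # Blocks V and W cross iff there are a < b < c < d with a, c in V and b, d in W
    return any(a < b < c < d for a in V for c in V for b in W for d in W)

def noncrossing_partitions(r):
    return [p for p in partitions(list(range(1, r + 1)))
            if not any(crosses(V, W) for V in p for W in p if V is not W)]

print([len(noncrossing_partitions(r)) for r in range(1, 6)])  # -> [1, 2, 5, 14, 42]
```

This brute force is exponential in r and only meant to make the definition concrete.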

Definition 2.9 (Cumulant). Let (A, τ) be a non-commutative probability space. Then we define the cumulant functionals kk : A^k → C, for all k ∈ N, by the moment-cumulant relation

k1(a) = τ(a),    τ(a1 · · · ak) = Σ_{π ∈ NC(k)} kπ[a1, . . . , ak],

where the sum is taken over all non-crossing partitions of the set {a1, a2, . . . , ak} and

kπ[a1, . . . , ak] = Π_{i=1}^{r} kV(i)[a1, . . . , ak],    π = {V(1), . . . , V(r)},
kV[a1, . . . , ak] = ks(av(1), . . . , av(s)),    V = (v(1), . . . , v(s)).

For an element X of a non-commutative algebra (A, τ) we define the cumulants of X as kn^X := kn(X, . . . , X).

Note that square brackets are used to denote the cumulants with respect to partitions, while parentheses are used for the cumulants of a set of variables. To illustrate the difference, consider a two-element set {a1, a2}, such that a1, a2 belong to a non-commutative probability space equipped with a tracial functional τ. Then k1(ai) = τ(ai) for all i = 1, 2. The only non-crossing partitions of a two-element set are the segment {a1, a2} and {a1}, {a2}, so τ(a1a2) = Σ_{π ∈ NC(2)} kπ[a1, a2] = k2(a1, a2) + k1(a1)k1(a2) = k2(a1, a2) + τ(a1)τ(a2). Hence k2(a1, a2) = τ(a1a2) − τ(a1)τ(a2) is the cumulant of the two-element set {a1, a2}, while kπ[a1, a2] denotes the cumulant of the partition π.

Lemma 2.3
The cumulants given by Definition 2.9 are well defined.

Proof: Following the definition of a cumulant,

τ(a1 · · · an) = Σ_{π ∈ NC(n)} kπ[a1, . . . , an] = kn(a1, . . . , an) + Σ_{π ∈ NC(n), π ≠ 1n} kπ[a1, . . . , an],

where π ≠ 1n means that we consider partitions different from the n-element segment, i.e. π ≠ {1, 2, . . . , n}. All terms kπ with π ≠ 1n involve only cumulants of order lower than n, so kn is uniquely determined by the moments, and the lemma follows by induction.
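The induction amounts to solving the moment-cumulant relation for kn order by order. A convenient equivalent form of the single-variable relation (it follows from the generating-function identity M(z) = C(zM(z)) of Lemma 3.3) is mn = Σ_{s=1}^{n} ks Σ_{i1+···+is = n−s} mi1 · · · mis. The Python sketch below (illustrative only; function names are ours) inverts this recursion and recovers the free cumulants of the semicircle law, whose moments are 1, 0, 1, 0, 2, 0, 5, . . . and whose only nonzero free cumulant is k2 = 1:

```python
from math import prod

def compositions(total, parts):
    # All tuples of `parts` nonnegative integers summing to `total`
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def free_cumulants(m):
    # m = [m0, m1, ..., mN] with m0 = 1; returns [k1, ..., kN] solving
    # m_n = sum_{s=1}^{n} k_s * sum_{i_1+...+i_s = n-s} m_{i_1} ... m_{i_s}
    N = len(m) - 1
    k = [0] * (N + 1)
    for n in range(1, N + 1):
        lower = sum(k[s] * prod(m[i] for i in comp)
                    for s in range(1, n)
                    for comp in compositions(n - s, s))
        k[n] = m[n] - lower   # the s = n term of the recursion is k_n itself
    return k[1:]

# Semicircle moments 1, 0, 1, 0, 2, 0, 5: only the second free cumulant survives
print(free_cumulants([1, 0, 1, 0, 2, 0, 5]))  # -> [0, 1, 0, 0, 0, 0]
```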

To show the additivity of cumulants for the sum of free random variables, we need to state a theorem about vanishing mixed cumulants.

Theorem 2.2
Let a1, a2, . . . , an ∈ A. Then the elements a1, a2, . . . , an are freely independent if and only if all mixed cumulants vanish, i.e. for k ≥ 2 and any choice of i1, . . . , ik ∈ {1, . . . , n} such that there exist j, l with ij ≠ il, we have

kk(ai1, . . . , aik) = 0.

Proof: The proof can be found in Nica and Speicher (2006).

Theorem 2.3

Let a, b ∈ A be free. Then

kn^{a+b} = kn^a + kn^b

for n ≥ 1.

Proof: The proof of the theorem follows from the fact that mixed cumulants of free random variables equal zero, see Theorem 2.2. Expanding by multilinearity,

kn^{a+b} := kn(a + b, a + b, . . . , a + b) = kn(a, a, . . . , a) + kn(b, b, . . . , b) = kn^a + kn^b.

Definition 2.10 (Free additive convolution). Let a and b be elements of the non-commutative probability space (A, τ) with laws µa, µb, respectively. Then, if a and b are free and µa and µb have compact support, the distribution of a + b is denoted µa ⊞ µb, where ⊞ is called the free additive convolution.

The measure µa ⊞ µb is determined by the tracial functional φ ⋆ ψ, called the free product, on the free algebra generated by a and b, i.e. C⟨a, b⟩ = C⟨a⟩ ⋆ C⟨b⟩, by

φ ⋆ ψ((a + b)^k) = ∫ x^k d(µa ⊞ µb)(x).

The distribution µa ⊞ µb given in the definition does not depend on the elements a and b themselves, but only on their distributions µa and µb. Moreover, Definition 2.10 can be extended to arbitrary probability measures on R, see Nica and Speicher (2006).


2.2.1 Proof of asymptotic freeness between diagonal and Gaussian matrices

Section 2.1.2 indicates the asymptotic freeness between various classes of random matrices. Here, the goal is to actually prove the asymptotic freeness in one such case. Moreover, the proof illustrates the use of the combinatorial approach introduced in Section 2.2.

Theorem 2.4 (Speicher, 1993)
Let A = (An)n∈N and B = (Bn)n∈N be two sequences of self-adjoint (i.e. An = An∗) n × n matrices An and Bn such that

µAn → µA and µBn → µB weakly

for some spectral probability measures µA and µB on R. If µA and µB have compact support, then

µAn+UnBnUn∗ → µA ⊞ µB weakly

for almost all random sequences of unitary matrices U = (Un)n∈N.

Steps of the proof: The theorem has been proved by Speicher in the following steps.

1) Firstly, the asymptotic freeness between the n × n Gaussian random matrices X (whose entries are independent complex-valued random variables with mean zero and variance 1/n) and non-random diagonal matrices is shown. The free independence between Gaussian and non-random diagonal matrices is stated as Lemma 2.5 and proved below.

2) Then, the first step of the proof implies asymptotic freeness between polynomials inX and diagonal matrices.

3) Let U := X(X∗X)^{−1/2}. Notice that

UU∗ = X(X∗X)^{−1/2}(X(X∗X)^{−1/2})∗ = X(X∗X)^{−1/2}(X∗X)^{−1/2}X∗ = X(X∗X)^{−1}X∗ = I

and similarly U∗U = I. Hence, U is a unitary matrix and X → X(X∗X)^{−1/2} determines a measurable mapping from the space of Gaussian matrices into the space of unitary matrices, defined almost everywhere. The image measure of the measure on the space of Gaussian matrices under this mapping is then a measure on the space of unitary matrices. This implies that it is enough to prove the statement for the matrices given by X(X∗X)^{−1/2}, which can be approximated by the polynomials Xg(X∗X), where g is a polynomial with real coefficients that is a good approximation of x^{−1/2}. Together with 2), this implies asymptotic freeness between unitary and diagonal matrices.

4) Finally, notice that both matrices A and B can be decomposed as A = W A^D W∗ and B = V B^D V∗, respectively, where A^D, B^D are diagonal matrices. Hence

µAn+UnBnUn∗ = µ(Wn An^D Wn∗ + Un Vn Bn^D Vn∗ Un∗) = µ(An^D + Wn∗ Un Vn Bn^D Vn∗ Un∗ Wn),

so we obtain equivalence of measures and it suffices to prove the statement of the theorem for the diagonal matrices A^D and B^D, as the mapping Un → Wn∗ Un Vn preserves the (Haar) measure. Then, by the third step, one obtains the asymptotic freeness between {A^D, B^D} and {U, U∗}. Hence, also A^D and U B^D U∗ are asymptotically free, which finishes the proof.

Lemma 2.4 (Speicher, 1993)
For almost all sequences of Gaussian square matrices X = (Xn)n∈N the following holds. Let P be a polynomial in two non-commuting indeterminates. Then

lim_{n→∞} Trn(P(Xn, Xn∗)) = φ(P(X, X∗)),

where φ is a positive linear functional such that φ(1) = 1, φ(XX∗) = φ(X∗X) = 1 and φ(XX) = φ(X∗X∗) = 0.

Proof: The statement of the lemma follows from Lemma 2.5, which is proven below.
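The values of φ above can be observed in simulation; the following sketch (illustrative only, NumPy assumed) draws one large matrix with i.i.d. complex Gaussian entries of mean zero and variance 1/n and evaluates the corresponding normalized traces:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# One matrix with i.i.d. complex Gaussian entries of mean 0 and variance 1/n
X = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2 * n)

trn = lambda M: np.trace(M) / n    # normalized trace Tr_n
print(abs(trn(X @ X.conj().T)))    # close to phi(XX*) = 1
print(abs(trn(X @ X)))             # close to phi(XX) = 0
```

The fluctuations of both traces are of order 1/n, so already at n = 1000 the limits are visible to a few decimal places.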

To show the use of the combinatorial interpretation of asymptotic freeness within Free probability theory, the proof of Lemma 2.5 is given below. It is, simultaneously, the proof of the first step of Theorem 2.4.

Lemma 2.5 (Speicher, 1993)
For almost all sequences of Gaussian square matrices X = (Xn)n∈N the following holds. Let D1 = (Dn^1)n∈N, . . . , Ds = (Dn^s)n∈N be s sequences of diagonal n × n matrices Dn^j such that

lim_{n→∞} Trn(P(Dn^1, . . . , Dn^s)) = ρ(P(D1, . . . , Ds))

for all polynomials P in s indeterminates, where ρ is a state on C⟨D1, . . . , Ds⟩, i.e. ρ is a positive linear functional and ρ(1) = 1. Then

lim_{n→∞} Trn(P(Xn, Dn^1, . . . , Dn^s, Xn∗)) = ρ ⋆ φ(P(X, D1, . . . , Ds, X∗))

for all polynomials P in (s + 2) non-commutative indeterminates, where ρ ⋆ φ is called the free product of ρ and φ and is a functional on the space C⟨X, D1, . . . , Ds, X∗⟩. The functional φ is defined in the previous lemma.

Proof: As the proof is rather complex, we present it stepwise.

1. Notation. We consider monomials Xn^{i(1)} · · · Xn^{i(r)}, defined via

Xn^0 := Xn,    Xn^j := Dn^j (1 ≤ j ≤ s),    Xn^{s+1} := Xn∗

for all choices of r ∈ N and i(k) ∈ {0, 1, . . . , s + 1}. Define

Sn := Trn(Xn^{i(1)} · · · Xn^{i(r)}) = (1/n) Σ_{k1=1}^{n} · · · Σ_{kr=1}^{n} Xn^{i(1)}(k1, k2) Xn^{i(2)}(k2, k3) · · · Xn^{i(r)}(kr, k1).


2. Show the convergence of E[Sn] to Σ_{V ∈ Pv(1, . . . , r)} ρ ⋆ φ(V).

2a) Define the valid partitions and denote the set of all valid partitions of {1, . . . , r} by Pv(1, . . . , r).

The non-crossing partition V = (V1, . . . , Vt) of {1, . . . , r} is called valid if one of the following holds:

- t = 1 and i(v) ∈ {1, . . . , s} for all v = 1, 2, . . . , r. Then ρ ⋆ φ(V) = ρ(D^{i(1)} · · · D^{i(r)}).

- V contains a segment Vk = (m, m + 1, . . . , m + l) such that i(m), i(m + l) ∈ {0, s + 1}, i(m) ≠ i(m + l) and i(m + 1), . . . , i(m + l − 1) ∈ {1, . . . , s}, and V \ {Vk} is a valid partition of {1, 2, . . . , r} \ Vk. In other words, the segment Vk has to start and finish with X and X∗, or X∗ and X. The inner elements of a segment should consist only of diagonal matrices, and the partition without this segment should remain valid.

We define

ρ ⋆ φ(Vk) = φ(X^{i(m)} X^{i(m+l)}) ρ(D^{i(m+1)} · · · D^{i(m+l−1)})

and

ρ ⋆ φ(V) = ρ ⋆ φ(Vk) ρ ⋆ φ(V \ {Vk}).

2b) Consider the expectation

E[Sn] = (1/n) Σ_{k1=1}^{n} · · · Σ_{kr=1}^{n} E[Xn^{i(1)}(k1, k2) Xn^{i(2)}(k2, k3) · · · Xn^{i(r)}(kr, k1)].

The entries of the matrices Xn are independent with mean zero and Xn∗(l, k) = X̄n(k, l). Hence, E[Xn^{i(1)}(k1, k2) Xn^{i(2)}(k2, k3) · · · Xn^{i(r)}(kr, k1)] is different from zero only if each matrix element of Xn occurs at least twice in the product.

2c) Now, denote the set of all positions of Xn and Xn∗ by

I := {k : i(k) ∈ {0, s + 1}}.

Then a pair (kj, kj+1) is called a free step if i(j) ∈ {0, s + 1} and kj+1 has not appeared before, and a repetitive step if i(j) ∈ {0, s + 1} and kj+1 has appeared before.

We are interested only in tuples (k1, . . . , kr), kr+1 = k1, with both the number of free steps and the number of repetitive steps equal to #I/2. We call such a tuple valid. It has been shown by Wigner (1955) that the contribution of the non-valid tuples is at most of order o(1/n), so it vanishes in the limit.

2d) We have a one-to-one correspondence between valid partitions and valid tuples (constructed in a recursive way). Hence,

lim_{n→∞} E[Sn] = Σ_{V ∈ Pv(1, . . . , r)} ρ ⋆ φ(V).


3. Almost sure convergence ofSn.

3a) We want to show that Var[Sn] ≤ c/n² for some constant c.

Similarly to the case of the expected value, we rewrite Var[Sn] as a double sum over all tuples (k1, . . . , kr) and (l1, . . . , lr). Only those pairs of tuples contribute to the sum for which at least one step of the k-tuple agrees with one step of the l-tuple (in the opposite case we have independence and the term vanishes in the sum). Then one considers such tuples that i(m) = i(j), hence (km, km+1) = (lj, lj+1), or i(m) ≠ i(j), hence (km, km+1) = (lj+1, lj). In both cases we obtain invalid tuples. Hence the contribution of those cases is of order 1/n², as for E[Sn] we had the factor 1/n.

3b) We show the sufficiency of point 3a) for almost sure convergence:

E[Σ_{n=1}^{∞} (Sn − E[Sn])²] = Σ_{n=1}^{∞} Var[Sn] < ∞  ⇒  Σ_{n=1}^{∞} (Sn − E[Sn])² < ∞ a.s.

Hence lim_{n→∞} (Sn − E[Sn]) = 0 a.s.

The combinatorial approach is often used in Free probability theory in order to prove asymptotic freeness between classes of matrices. Here, the freeness for p → ∞ between p × p diagonal and Gaussian matrices has been proven. The concept of free cumulants, introduced in connection with the idea of non-crossing partitions, will allow us to define one of the main tools for applying free additive convolution, namely the R-transform.


3

Stieltjes and R-transform

The Stieltjes transform is commonly used in research regarding the spectral measure of random matrices. It appears, among others, in formulations and proofs of a number of results published within Random matrix theory, e.g. Marčenko and Pastur (1967), Girko and von Rosen (1994), Silverstein and Bai (1995), Hachem et al. (2007). Thanks to its good algebraic properties, it simplifies the calculations needed to obtain the limit of spectral distributions for large dimensional random matrices.

The second section of this chapter presents the R-transform, which was introduced within Free probability theory and is strongly related to the Stieltjes transform. This transform provides a way to obtain an analytical form of the asymptotic distribution of eigenvalues for sums of certain random matrices.

Both the Stieltjes and R-transform and their properties are discussed to different extents by Nica and Speicher (2006), Couillet and Debbah (2011) and Speicher (2009), as well as in the lectures by Hiai and Petz (2000) and Krishnapur (2011). A version of the Stieltjes transform, the Cauchy transform, is also widely described in Cima et al. (2006).

3.1 Stieltjes transform

The literature shows that the Stieltjes transform defined in this section, or a version of it, is often described using the terms Cauchy transform (e.g. Cima et al., 2006, Hiai and Petz, 2000, Nica and Speicher, 2006) or Stieltjes-Cauchy transform (e.g. Hasebe, 2012, Bożejko and Demni, 2009). In using the Stieltjes transform terminology, this thesis follows the work by Couillet and Debbah (2011).

Definition 3.1 (Stieltjes transform). Let µ be a non-negative, finite Borel measure on R. Then we define the Stieltjes transform of µ by

G(z) = ∫_R 1/(z − x) dµ(x)


for all z ∈ {z ∈ C : ℑ(z) > 0}, where ℑ(z) denotes the imaginary part of the complex number z.

Remark 3.1. Note that for z ∈ {z ∈ C : ℑ(z) > 0} the Stieltjes transform is well defined and G(z) is analytical for all z ∈ {z ∈ C : ℑ(z) > 0}.

Proof of Remark 3.1: The fact that the Stieltjes transform is well defined follows from the fact that on this domain the integrand 1/(z − x) is bounded.

One can show that G(z) is analytical for all z ∈ {z ∈ C : ℑ(z) > 0} using Morera's theorem (see the work by Greene and Krantz, 2006). It is enough to show that the contour integral ∮_Γ G(z) dz = 0 for all closed contours Γ in {z ∈ C : ℑ(z) > 0}. We are allowed to interchange the integrals and obtain

∫_R ∮_Γ 1/(z − x) dz dµ(x) = ∫_R 0 dµ(x) = 0,

where the inner integral vanishes by Cauchy's integral theorem for any closed contour Γ, as 1/(z − x) is analytic.

Definition 3.1 above can be extended to all z ∈ C \ support(µ). Nevertheless, as we require G(z) to be analytical, our considerations are going to be restricted to the upper half plane of C as the domain.

Now we introduce the Stieltjes inversion formula, which allows us to use the knowledge about the form of the transform G to derive the measure µ.

Theorem 3.1 (Stieltjes inversion formula)
For any open interval I = (a, b), such that neither a nor b are atoms of the probability measure µ, the inversion formula

µ(I) = −(1/π) lim_{y→0} ∫_I ℑG(x + iy) dx

holds. Here convergence is with respect to the weak topology on the space of all real probability measures.

Proof: We have

−(1/π) lim_{y→0} ∫_I ℑG(x + iy) dx = −(1/π) lim_{y→0} ∫_I ∫_R ℑ(1/(x + iy − t)) dµ(t) dx
(∗) = (1/π) lim_{y→0} ∫_R ∫_a^b y/((t − x)² + y²) dx dµ(t)
= (1/π) lim_{y→0} ∫_R [arctan((b − t)/y) − arctan((a − t)/y)] dµ(t)
(◦) = (1/π) ∫_R lim_{y→0} [arctan((b − t)/y) − arctan((a − t)/y)] dµ(t).


The order of integration can be interchanged in (∗) due to the continuity of the function y/((t − x)² + y²). Interchanging the order of integration and taking the limit in (◦) follows from the Bounded convergence theorem, as µ(R) ≤ 1 < ∞ and there exists M such that

|arctan((b − t)/y) − arctan((a − t)/y)| < M for all y and t,

so the integrand is a uniformly bounded real-valued measurable function for all y.

Then, using that lim_{y→0} arctan(T/y) = (π/2) sgn(T) for T ∈ R, we get, as y → 0,

arctan((b − t)/y) − arctan((a − t)/y) → 0 if t < a or t > b, and → π/2 + π/2 = π if t ∈ (a, b),

which by the Dominated convergence theorem completes the proof.
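As a numerical illustration (not part of the thesis text), the inversion formula can be checked on the semicircle law, whose Stieltjes transform G(z) = (z − √(z² − 4))/2 is known in closed form; evaluating −(1/π)ℑG(x + iy) for a small y > 0 recovers the density √(4 − x²)/(2π):

```python
import numpy as np

def G(z):
    # Stieltjes transform of the semicircle law on [-2, 2]
    # (principal branch; correct for z in the upper half plane near the bulk)
    return (z - np.sqrt(z * z - 4)) / 2

x, y = 1.0, 1e-6
approx = -np.imag(G(x + 1j * y)) / np.pi   # -(1/pi) Im G(x + iy)
exact = np.sqrt(4 - x * x) / (2 * np.pi)   # semicircle density at x
print(approx, exact)
```

The error of the approximation is of order y, so it can be made as small as desired.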

Remark 3.2. More generally, for any probability measure µ on R and any a < b,

µ((a, b)) + (1/2)µ({a}) + (1/2)µ({b}) = −(1/π) lim_{y→0} ∫_a^b ℑG(x + iy) dx.

For further reading, see Krishnapur (2011).

Theorem 3.2
Let µn be a sequence of probability measures on R and let Gµn denote the Stieltjes transform of µn. Then:

a) if µn → µ weakly, where µ is a measure on R, then Gµn(z) → Gµ(z) pointwise for any z ∈ {z ∈ C : ℑ(z) > 0};

b) if Gµn(z) → G(z) pointwise for all z ∈ {z ∈ C : ℑ(z) > 0}, then there exists a unique non-negative and finite measure µ such that G = Gµ and µn → µ weakly.

Proof:

a) We know that µn → µ weakly; then for all bounded and continuous functions f the following holds:

∫ f dµn → ∫ f dµ.

As f(x) = 1/(z − x) is both bounded and continuous on R for every fixed z ∈ {z ∈ C : ℑ(z) > 0}, we conclude that

Gµn(z) = ∫_R 1/(z − x) dµn(x) → ∫_R 1/(z − x) dµ(x) = Gµ(z) pointwise.


b) Now, assume that Gµn(z) → G(z) pointwise. As each µn is a probability measure (so it is a bounded, positive measure with supn µn(R) < ∞), by Helly's selection principle µn has a weakly convergent subsequence. Denote that subsequence by µnk and its limit by µ.

As f(x) = 1/(z − x) is bounded and continuous on R with f(x) → 0 as x → ±∞, by part a) Gµnk(z) → Gµ(z) pointwise for all z ∈ {z ∈ C : ℑ(z) > 0}. Then Gµ = G, which, by the fact that the inverse Stieltjes transform is unique, means that all converging subsequences µnk have the same limit µ. Hence, µn → µ.

Last in this section we state a lemma, not restricted to the space of matrices, which relates the Stieltjes transform with the moment generating function. It will be used later for proving a relation between the Stieltjes and R-transform.

Lemma 3.1
Let µ be a probability measure on R with compact support and {mk}k=1,... its sequence of moments (i.e. mk(µ) = ∫_R t^k dµ(t)). Then the moment generating function Mµ(z) = Σ_{k=0}^{∞} mk z^k converges to an analytic function in some neighborhood of 0, and for sufficiently large |z|

G(z) = (1/z) Mµ(1/z) (3.1)

holds.

Consider now an element of the non-commutative space of Hermitian random matrices over the complex plane, X ∈ RMp(C), and note that analyzing the Stieltjes transform actually reduces to a consideration of the diagonal elements of the matrix (zIp − X)^{−1}, as

Gµ(z) = ∫_R 1/(z − x) dµ(x) = (1/p) Tr(zIp − Λ)^{−1} = (1/p) Tr(zIp − X)^{−1},

where µ and Λ denote the empirical spectral distribution and the diagonal matrix of eigenvalues of the matrix X, respectively.

Lemma 3.2
For X ∈ Mp×n(C) and z ∈ {z ∈ C : ℑ(z) > 0},

(n/p) GµX∗X(z) = GµXX∗(z) − (p − n)/(pz).

Proof: As X ∈ Mp×n(C), the matrix X∗X is of size n × n and the matrix XX∗ is of size p × p. Assume that p > n. Then X∗X has n eigenvalues, while the set of eigenvalues of the matrix XX∗ consists of the same n eigenvalues and an additional p − n zeros. Then we have p − n times the term 1/(z − 0) = 1/z and

GµXX∗(z) = (n/p) GµX∗X(z) + ((p − n)/p)(1/z).

If p < n, we have

GµX∗X(z) = (p/n) GµXX∗(z) + ((n − p)/n)(1/z),    i.e.    (n/p) GµX∗X(z) = GµXX∗(z) + ((n − p)/p)(1/z),

and the proof is complete.
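Since the identity in Lemma 3.2 is exact for every finite p and n, it can be verified directly on a small random matrix; a Python sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 5, 3
X = rng.standard_normal((p, n)) + 1j * rng.standard_normal((p, n))
z = 0.7 + 1.3j

def stieltjes(M, z):
    # (1/dim) Tr (zI - M)^{-1}, computed from the eigenvalues of the Hermitian M
    return np.mean(1.0 / (z - np.linalg.eigvalsh(M)))

lhs = (n / p) * stieltjes(X.conj().T @ X, z)
rhs = stieltjes(X @ X.conj().T, z) - (p - n) / (p * z)
print(abs(lhs - rhs))  # zero up to rounding
```

The check works for any p, n and any z off the real axis, since XX∗ and X∗X share their nonzero eigenvalues.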

3.2 R-transform

The R-transform plays the same role in Free probability theory as the Fourier transform in classical probability theory, and it is defined in the following way.

Definition 3.2 (R-transform). Let µ be a probability measure with compact support and with {ki}i=1,... as its sequence of cumulants, see Chapter 2, Definition 2.9. Then the R-transform is given by

Rµ(z) = Σ_{i=0}^{∞} ki+1 z^i.

Note that, defined in this way, the R-transform and the cumulants {ki} give us essentially the same information. Moreover, when it does not introduce confusion, the upper index is skipped and the R-transform is simply denoted R(z).

There is a relation between the R-transform and the Stieltjes transform G, or more precisely G^{−1}, the inverse of G with respect to composition, which is often considered as an equivalent definition to Definition 3.2.

Theorem 3.3
Let µ be a probability measure with compact support, G(z) its Stieltjes transform and R(z) its R-transform. Then

R(z) = G^{−1}(z) − 1/z.

The relation between the moment and cumulant generating functions is given by Lemma 3.3. This tool is stated here due to its use in the proof of Theorem 3.3.

Lemma 3.3
Let {mi}i≥1 and {ki}i≥1 be sequences of complex numbers, with corresponding formal power series

M(z) = 1 + Σ_{i=1}^{∞} mi z^i,    C(z) = 1 + Σ_{i=1}^{∞} ki z^i

as generating functions, such that mi = Σ_{π ∈ NC(i)} kπ. Then

M(z) = C(zM(z)).

The proof of the lemma has been presented by Nica and Speicher (2006), who used combinatorial tools.

Proof of Theorem 3.3: The statement given in the theorem is equivalent to

R(G(z)) + 1/G(z) = z.

Let {mk}k=0,... and {ki}i=0,... denote the sequence of moments and the sequence of cumulants, respectively, and let M(z) = Σ_{k=0}^{∞} mk z^k and C(z) = Σ_{i=0}^{∞} ki z^i, with m0 = k0 = 1. Then

R(z) = Σ_{i=0}^{∞} ki+1 z^i = (1/z) Σ_{i=0}^{∞} ki+1 z^{i+1} = (1/z) Σ_{i=1}^{∞} ki z^i = (1/z)(Σ_{i=0}^{∞} ki z^i − 1) = (1/z)(C(z) − 1) (3.2)

holds. By Lemma 3.3 we get the relation between the moment and cumulant generating functions,

M(z) = C(zM(z)). (3.3)

Then

R(G(z)) + 1/G(z) = (1/G(z))(C(G(z)) − 1) + 1/G(z)    [by (3.2)]
= (1/G(z)) C(G(z))
= (1/G(z)) C((1/z) M(1/z))    [by (3.1)]
= (1/G(z)) M(1/z)    [by (3.3)]
= z    [by (3.1)]

and the theorem is proved.
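The theorem can be illustrated on the semicircle law: its only nonzero free cumulant is k2 = 1, so R(z) = z, and the relation R(G(z)) + 1/G(z) = z reduces to the algebraic identity G(z) + 1/G(z) = z. A short numerical check (illustrative only):

```python
import numpy as np

def G(z):
    # Stieltjes transform of the semicircle law on [-2, 2] (upper half plane)
    return (z - np.sqrt(z * z - 4)) / 2

z = 0.5 + 2.0j
w = G(z)
# R(G(z)) + 1/G(z) = z, and R(w) = w for the semicircle, so w + 1/w = z
print(abs(w + 1 / w - z))  # zero up to rounding
```

The identity is exact because G(z) is a root of the quadratic w² − zw + 1 = 0.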

The R-transform will play an important role in the following chapter, so we prove here some of its properties. Especially the first two properties of Theorem 3.4 will be used frequently.

Theorem 3.4
Let (A, τ) be a non-commutative probability space such that the distributions of X, Y, Xn ∈ A, for all n ∈ N, have compact support. The R-transform has the following properties:

a) Non-linearity: RαX(z) = αRX(αz) for every X ∈ A and α ∈ C;

b) for any two freely independent non-commutative random variables X, Y ∈ A,

RX+Y(z) = RX(z) + RY(z)

as formal power series;

c) let X, Xn ∈ A for n ∈ N; if lim_{n→∞} τ(Xn^k) = τ(X^k) for k = 1, 2, . . ., then lim_{n→∞} RXn(y) = RX(y).


Proof:

a) Let us prove the non-linearity of the R-transform. We notice first that

GαX(z) = ∫_R 1/(z − αx) dµX(x) = (1/α) ∫_R 1/(z/α − x) dµX(x) = (1/α) GX(z/α)

and then, as G^{−1}αX(GαX(z)) = z, we have

z = GαX(G^{−1}αX(z)) = (1/α) GX((1/α) G^{−1}αX(z)),
αz = GX((1/α) G^{−1}αX(z)),
G^{−1}X(αz) = (1/α) G^{−1}αX(z).

Hence G^{−1}αX(z) = αG^{−1}X(αz). Then

RαX(z) = G^{−1}αX(z) − 1/z = αG^{−1}X(αz) − 1/z = α(G^{−1}X(αz) − 1/(αz)) = αRX(αz).

b) By the freeness of X and Y we have that ki^{X+Y} = ki^X + ki^Y for i = 1, 2, . . ., see Theorem 2.3. Then

RX+Y(z) = Σ_{i=0}^{∞} k^{X+Y}_{i+1} z^i = Σ_{i=0}^{∞} (k^X_{i+1} + k^Y_{i+1}) z^i = Σ_{i=0}^{∞} k^X_{i+1} z^i + Σ_{i=0}^{∞} k^Y_{i+1} z^i = RX(z) + RY(z).

c) The last property follows directly from the definition of the R-transform: as the free cumulants converge, the R-transform also converges in each of its coefficients.

The linearization of free convolution presented in Theorem 3.4 b),

RµX+Y(z) = RµX⊞µY(z) = RµX(z) + RµY(z),

where µX (µY) stands for the distribution of X (respectively Y), is for simplicity denoted by

RX+Y(z) = RX(z) + RY(z).

Besides the asymptotic freeness of matrices, the results regarding the R-transform, part b) of Theorem 3.4 and Theorem 3.3, are considered to be the two main achievements presented by Voiculescu in his early papers, see Voiculescu (1991).
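Property b) can also be seen on random matrices: two independent Wigner matrices are asymptotically free (Section 2.1.2), each has R-transform R(z) = z, so their sum has R-transform 2z, i.e. the semicircle law of variance 2, with moments m2 = 2 and m4 = 8. A NumPy sketch (illustrative; at finite p the values are only approximate):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 1500

def wigner(p):
    # Real symmetric Wigner matrix normalized so that its spectrum tends to
    # the semicircle law of variance 1 (free cumulants: k2 = 1, all others 0)
    A = rng.standard_normal((p, p)) / np.sqrt(p)
    return (A + A.T) / np.sqrt(2)

eigs = np.linalg.eigvalsh(wigner(p) + wigner(p))   # asymptotically free summands
m2, m4 = np.mean(eigs**2), np.mean(eigs**4)
print(m2, m4)   # close to 2 and 8, the moments of a semicircle of variance 2
```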


4

Marčenko-Pastur, Girko-von Rosen and Silverstein-Bai theorems

The results discussed in this chapter are an illustration of the use of the Stieltjes transform in Random matrix theory. The theorems show various methods of calculating the asymptotic spectral distribution. Two of these theorems are compared in the case of the Wishart matrix (1/n)XX′, where X ∼ Np,n(0, σ²I, I).

4.1 Statements of theorems

In this section we recall some of the early results obtained by Marčenko and Pastur (1967), Girko and von Rosen (1994) and Silverstein and Bai (1995).

Theorem 4.1 (Marčenko-Pastur law)
Consider the matrix Qn defined by (1.1), with k = 1, An = 0 and X ∼ Np,n(0, σ²I, I). Then the asymptotic spectral distribution is given as follows. If p/n → c ∈ (0, 1], the asymptotic spectral density function is

µ′(x) = √([σ²(1 + √c)² − x][x − σ²(1 − √c)²]) / (2πcσ²x) · 1((1−√c)²σ², (1+√c)²σ²)(x).

If p/n → c ≥ 1, the asymptotic spectral distribution is

(1 − 1/c) δ0 + µ,

where µ has the asymptotic spectral density function µ′(x) given above.


In the special case when c = 1 and σ² = 1 we obtain the spectral density

µ′(x) = (1/(2πx)) √(4x − x²),

which is a scaled β-distribution with α = 1/2 and β = 3/2.

Recently it has been proven that for a class of random matrices with dependent entries the limiting empirical distribution of the eigenvalues is also given by the Marčenko-Pastur law.
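The Marčenko-Pastur law is easy to observe in simulation: the eigenvalues of a single large Wishart matrix (1/n)XX′ already track it closely. A NumPy sketch (illustrative only) with c = 1/2 and σ = 1 checks the first two moments of the limit, m1 = σ² and m2 = σ⁴(1 + c), and the edges of the support:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, sigma = 800, 1600, 1.0
c = p / n                                  # c = 1/2

X = sigma * rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(X @ X.T / n)     # Wishart matrix (1/n) X X'

# Marcenko-Pastur limit: m1 = sigma^2, m2 = sigma^4 (1 + c),
# support ((1 - sqrt(c))^2 sigma^2, (1 + sqrt(c))^2 sigma^2)
print(eigs.mean(), (eigs**2).mean(), eigs.min(), eigs.max())
```

The extreme eigenvalues fluctuate around the support edges at the scale n^(−2/3), so the match improves quickly with the dimension.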

Theorem 4.2 (Girko and von Rosen, 1994)
Let X ∼ Nn,p(0, Σ, Ψ), where the eigenvalues of Σ and Ψ are bounded by some constant. Suppose that the Kolmogorov condition 0 < c = lim_{n→∞} p/n < ∞ holds and let µn,p(x) be defined by Definition 1.1. Then for every x ≥ 0

µn,p(x) − Fn(x) →p 0, n → ∞,

where →p denotes convergence in probability and where, for large n, {Fn(x)} are distribution functions satisfying

∫_0^∞ dFn(x)/(1 + tx) = (1/p) Tr(I + tAnA′n + tΣa(t))^{−1},

where for all t > 0, a(t) is a unique nonnegative analytical function which exists and which satisfies the nonlinear equation

a(t) = (1/n) Tr(Ψ(I + (t/n)Ψ Tr(Σ(I + tAnA′n + tΣa(t))^{−1}))^{−1}).

Note that the Stieltjes transform G(z), defined according to Definition 3.1, is then given by G(z) = (1/z) g(−1/z), where g(t) = ∫_0^∞ dFn(x)/(1 + tx) as in Theorem 4.2.

Theorem 4.3 (Girko and von Rosen, 1994)

Consider the modification of the matrix Qn defined by (1.1), with k = 2 and An = 0, to the form Qn = (1/n1)X1X1′ + (1/n2)X2X2′, where the matrices X1 and X2 are independent. Let

a(t) = (1/n2) Tr Ψ2(I + (t/n2)Ψ2 b(t))^{−1},
b(t) = (1/n2) Tr Σ2(I + tΣ2a(t) + tΣ1c(t))^{−1},
c(t) = (1/n1) Tr Ψ1(I + (t/n1)Ψ1 Tr(Σ1(I + tΣ2a(t) + tΣ1b(t))^{−1}))^{−1},
d(t) = (1/n1) Tr Ψ1(I + (t/n1)Ψ1 Tr(Σ1(I + tΣ2a(t) + tΣ1d(t))^{−1}))^{−1}.

Put g(t) = (1/p) Tr((I + (t/n1)X1X1′ + (t/n2)X2X2′)^{−1}). If 0 < lim_{n1→∞} p/n1 < ∞ and 0 < lim_{n2→∞} p/n2 < ∞, it follows that

g(t) → (1/p) Tr((I + tΣ1d(t) + tΣ2a(t))^{−1}), n → ∞.


Theorem 4.4 (Silverstein and Bai, 1995)
Assume that

• for p = 1, 2, . . ., Zp = ((1/√p) Zij^p)p×n, where the Zij^p ∈ C are identically distributed for all p, i, j, independent across i, j for each p, and E|Z11^1 − EZ11^1|² = 1;

• n(p)/p → d > 0 as p → ∞;

• Tp = diag(τ1^p, τ2^p, . . . , τn^p), where τi^p ∈ R, and the empirical distribution function of the eigenvalues of the matrix Tp, i.e. of {τ1^p, τ2^p, . . . , τn^p}, converges almost surely in distribution to a probability distribution function H as p → ∞;

• Bp = Ap + ZpTpZp∗, where ∗ stands for the conjugate transpose of a matrix and Ap = (aij^p)i,j=1,2,...,p is a Hermitian p × p matrix for which FAp converges vaguely to A almost surely, A being a possibly defective (i.e. with discontinuities) non-random distribution function;

• Zp, Tp and Ap are independent.

Then, almost surely, FBp, the empirical distribution function of the eigenvalues of Bp, converges vaguely as p → ∞ to a (non-random) distribution function F whose Stieltjes transform m(z), z ∈ C+, satisfies the canonical equation

m(z) = mA(z − d ∫ τ dH(τ)/(1 + τm(z))). (4.1)

Note that Silverstein and Bai defined the Stieltjes transform as −G(z), where G(z) is given by Definition 3.1. Hence the inversion formula also holds with the opposite sign, i.e.

µ(I) = (1/π) lim_{y→0} ∫_I ℑm(x + iy) dx,

where I = (a, b) is such that a and b are not atoms of the measure µ.

4.2 Comparison of results

Theorem 4.2 can be extended to hold for matrices over C. Then, together with Theorem 4.4, it provides us with two computationally different ways to obtain the asymptotic spectral distribution. In this section the aim is to illustrate those differences with a simple example: the matrix Qn given by equation (1.1) with k = 1, An = 0 and X1 ∼ Np,n(0, σ²I, I). The following calculations show that the mentioned theorems give Stieltjes transforms which differ only by a vanishing term and therefore lead to the same asymptotic distribution function. Note that here the notation is standardized to that presented in Chapters 2 and 3.


Consider Theorem 4.2 with Σ = σ²I and Ψ = I. Then

∫_0^∞ dFn(x)/(1 + tx) = (1/p) Tr(I + tAnA′n + tΣa(t))^{−1} = 1/(1 + tσ²a(t)),

where a(t) is the unique nonnegative analytical function which exists and which satisfies the nonlinear equation

a(t) = (1/n) Tr(Ψ(I + (t/n)Ψ Tr(Σ(I + tAnA′n + tΣa(t))^{−1}))^{−1}) = (1 + t(p/n)σ²/(1 + tσ²a(t)))^{−1}.

Hence

a(t)^{−1} = 1 + tcσ²/(1 + tσ²a(t)),
1 + tσ²a(t) = a(t)(1 + tσ²a(t)) + tcσ²a(t),
0 = a²(t)tσ² + a(t)(tσ²(c − 1) + 1) − 1,
a(t) = [−1 + tσ²(1 − c) ± √(4σ²t + (1 + tσ²(c − 1))²)] / (2σ²t).

Choosing the root with the plus sign, so that a(t) ≥ 0, gives

b(t) := lim_{n→∞} ∫_0^∞ dFn(x)/(1 + tx) = 1/(1 + tσ²a(t)) = 2/(1 + tσ²(1 − c) + √(4σ²t + (1 + tσ²(c − 1))²)).

Rationalizing, and using that (1 + tσ²(1 − c))² − (4σ²t + (1 + tσ²(c − 1))²) = −4ctσ², we obtain

b(t) = [−1 − tσ²(1 − c) + √(4σ²t + (1 + tσ²(c − 1))²)] / (2ctσ²) > 0.

Denote f(u) = (1/(2uσ²)) √(−4σ²/z + (1 − σ²(u − 1)/z)²). Then the Stieltjes transform equals

ĥ(z) = (1/z) b(−1/z) = [1 − σ²(1 − c)/z − √(−4σ²/z + (1 − σ²(c − 1)/z)²)] / (2cσ²) = (1 − σ²(1 − c)/z)/(2cσ²) − f(c).
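The closed form ĥ(z) derived above can be compared with simulation: for large p the empirical Stieltjes transform (1/p) Tr(zI − Qn)^{−1} of the Wishart matrix should be close to ĥ(z) at any fixed z in the upper half plane. A NumPy sketch (illustrative; the agreement is only up to finite-p fluctuations):

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, sigma = 1000, 2000, 1.0
c = p / n

X = sigma * rng.standard_normal((p, n))
eigs = np.linalg.eigvalsh(X @ X.T / n)

def g_emp(z):
    # empirical Stieltjes transform (1/p) Tr (zI - Q)^{-1}
    return np.mean(1.0 / (z - eigs))

def h_hat(z):
    # the closed form derived above from the Girko-von Rosen equations
    s2 = sigma**2
    return (1 - s2 * (1 - c) / z
            - np.sqrt(-4 * s2 / z + (1 - s2 * (c - 1) / z) ** 2)) / (2 * c * s2)

z = 1.0 + 1.0j
print(abs(g_emp(z) - h_hat(z)))   # small for large p
```

The principal branch of the square root gives ℑĥ(z) < 0 for this z, as required of a Stieltjes transform in the upper half plane.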
