
Upper and Lower Bounds for Suprema of Chaos Processes

TIM FUCHS

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES



Degree Projects in Mathematics (30 ECTS credits)
Degree programme in Engineering Physics (300 credits)
KTH Royal Institute of Technology, year 2018

Supervisor at TU München: Felix Krahmer
Supervisor at KTH: Maurice Duits

Examiner at KTH: Maurice Duits


TRITA-SCI-GRU 2018:384 MAT-E 2018:82

KTH Royal Institute of Technology
School of Engineering Sciences (KTH SCI)
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


Abstract

There are different techniques for analyzing a stochastic process (X_t)_{t∈T}. In the following, Talagrand's generic chaining will be introduced and bounds for the expected supremum E[sup_{t∈T} X_t] of different stochastic processes will be proved.

The main focus lies on a result of Krahmer, Mendelson and Rauhut who were able to prove an alternative inequality for a specific kind of chaos process. As this bound leads to significantly better results in some applications, the sharpness of this inequality and a possible improvement will be investigated.

Sammanfattning

There are different techniques for analyzing a stochastic process (X_t)_{t∈T}. In the following, Talagrand's generic chaining will be introduced and bounds for the expected supremum E[sup_{t∈T} X_t] of different stochastic processes will be proved.

The main focus lies on a result of Krahmer, Mendelson and Rauhut, who were able to prove an alternative inequality for a certain type of chaos process. As that bound leads to significantly better results in some applications, the sharpness of this inequality and a possible improvement will be investigated.


Contents

1 Introduction
2 Generic Chaining
   2.1 Talagrand's proof
   2.2 The γα(T, d) functional
3 Gaussian Chaos
   3.1 Order 2 Gaussian chaos
   3.2 Talagrand's bound
4 Krahmer-Mendelson-Rauhut bound
   4.1 Proof of the bound
   4.2 Orthogonal matrices
      4.2.1 Expectation
      4.2.2 Computation of γ2(T_t, d)
      4.2.3 Lower bound for γ2(O(n), d)
   4.3 Improving the bound
5 Conclusion


1 Introduction

Stochastic processes are an important tool in many different fields. After Kolmogorov's famous publication of Foundations of the Theory of Probability in 1933 [7], stochastic processes have taken on an important role in modeling different events in economics and science. Commonly known applications are, for example, the evaluation of financial portfolios [9] or risk estimation of insurance claims [15]. Besides that, more recent fields like machine learning profit from these results in mathematical research as well [14].

Often, the focus is on measuring the variance or trend of a specific process. This thesis provides insight into a group of processes which are comparably difficult to control.

Given a stochastic process (X_t)_{t∈T}, the quantity

    E[sup_{t∈T} X_t]    (1.1)

is of interest.

Computing this expectation directly is often not possible, which is why the focus of this thesis will lie on finding efficient bounds.

Michel Talagrand is known for his research in this field. Therefore, this thesis is mostly based on his most recent publication Upper and Lower Bounds for Stochastic Processes [12], which is a collection of different results in this field.

Returning to the problem of controlling (1.1), the definition of this quantity has to be specified. To avoid issues with uncountable sets T, Talagrand generally sets

    E[sup_{t∈T} X_t] = sup{ E[sup_{t∈F} X_t] : F ⊆ T, F finite }

It should be noted for further proofs that assuming T to be finite does not lead to a loss of generality.

Obviously, an efficient bound should depend on the set T. An important quantity is the so-called entropy number e_n(T). Let F ⊆ T with |F| ≤ N_n, where N_n := 2^{2^n} for n ≥ 1 and N_0 := 1. For every t ∈ T, the distance of t to the set F is defined as the distance of t to the closest point in F:

    d(t, F) = inf_{s∈F} d(t, s)

Now consider the greatest distance between an element t ∈ T and F, namely sup_{t∈T} d(t, F). Minimizing this distance over all subsets F with |F| ≤ N_n leads to the definition of the entropy number:

    e_n(T) = inf_{F⊆T, |F|≤N_n} sup_{t∈T} d(t, F)
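For a small finite metric space, e_n(T) can be computed by brute force directly from this definition. The following sketch is not part of the thesis; the example set, the metric and all function names are illustrative assumptions, and the search over subsets is exponential in |T|, so it is only feasible for tiny sets:

```python
# Brute-force computation of the entropy numbers e_n(T) for a small finite
# metric space, directly from the definition
#   e_n(T) = inf_{F subset of T, |F| <= N_n} sup_{t in T} d(t, F),
# with N_0 = 1 and N_n = 2^(2^n) for n >= 1.
from itertools import combinations

def entropy_number(points, n, d):
    size = 1 if n == 0 else 2 ** (2 ** n)
    size = min(size, len(points))
    best = float("inf")
    for k in range(1, size + 1):
        for F in combinations(points, k):
            # sup over t of the distance to the closest point of F
            radius = max(min(d(t, s) for s in F) for t in points)
            best = min(best, radius)
    return best

d = lambda s, t: abs(s - t)
T = [0.0, 1.0, 2.0, 3.0]

e0 = entropy_number(T, 0, d)  # one point allowed: best center 1 or 2 -> 2.0
e1 = entropy_number(T, 1, d)  # up to 4 points: T covers itself -> 0.0
print(e0, e1)
```

Note how quickly the constraint |F| ≤ N_n stops binding: already N_1 = 4 covers this four-point set exactly, so e_1(T) = 0.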


Richard M. Dudley published a paper in 1967 [3] in which he connected entropy and Gaussian processes. Those results can be formulated as the so-called Dudley entropy bound for Gaussian processes [12], using the metric d(s, t) := (E[(X_s − X_t)²])^{1/2}:

    E[sup_{t∈T} X_t] ≤ L ∑_{n≥0} 2^{n/2} e_n(T)    (1.2)

Throughout the whole thesis, L > 0 denotes a universal constant, not necessarily equal at each appearance.

In the following, one of Talagrand's most important results, the generic chaining, will be introduced. This technique can be used not only for improving (1.2), but also for bounding more complex processes, namely Gaussian chaos processes.

An important application of chaos processes is in the context of compressive sensing. As this topic is not part of this thesis, only a short introduction based on [4] will be given:

Let r ∈ R^{m×n}, y ∈ C^m and x ∈ C^n. Assume further that x is s-sparse, which means |{i ∈ [n] : x_i ≠ 0}| ≤ s. Compressive sensing aims at recovering the vector x from the observed vector y fulfilling the following equation:

    y = r x

The restricted isometry constant δ_s, defined as the smallest value fulfilling

    (1 − δ_s) ‖x‖_2² ≤ ‖r x‖_2² ≤ (1 + δ_s) ‖x‖_2²   for all s-sparse x,

is an indicator of whether the recovery of an s-sparse vector is reliably possible. Research focuses on controlling δ_s depending on s and the dimensions of r.
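The definition of δ_s can be evaluated exactly for small matrices. The following brute-force sketch is an illustration only (matrix sizes, seeds and names are assumptions, and the search over supports is exponential in n): it uses the fact that the s-sparse condition is equivalent to the eigenvalues of every s-column Gram matrix lying in [1 − δ_s, 1 + δ_s].

```python
# Brute-force computation of the restricted isometry constant delta_s of a
# small matrix r: delta_s = max over supports S with |S| <= s of the
# largest deviation of the eigenvalues of r_S^T r_S from 1.
from itertools import combinations
import numpy as np

def restricted_isometry_constant(r, s):
    n = r.shape[1]
    delta = 0.0
    for k in range(1, s + 1):
        for S in combinations(range(n), k):
            gram = r[:, S].T @ r[:, S]
            eigs = np.linalg.eigvalsh(gram)  # sorted ascending
            delta = max(delta, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))
    return delta

rng = np.random.default_rng(0)
m, n, s = 40, 10, 2
r = rng.standard_normal((m, n)) / np.sqrt(m)  # column-normalized Gaussian
print(restricted_isometry_constant(r, s))
```

An orthonormal matrix such as the identity gives δ_s = 0, matching the intuition that it acts as a perfect isometry on every support.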

For some types of random matrices r generated by a standard Gaussian vector g, r x can be rewritten as t_x g, where t_x is a matrix depending on x.

It can be shown that the restricted isometry constant is of the form

    δ_s = sup_{t∈T} | ‖t g‖_2² − E[‖t g‖_2²] |

In their publication Suprema of Chaos Processes and the Restricted Isometry Property [6], Krahmer, Mendelson and Rauhut investigated the expectation of this process. By proving an alternative bound to Talagrand's result, they were able to estimate the restricted isometry constant significantly better for some specific types of random matrices.

After acquiring the necessary knowledge about chaining, a simplified version of this result will be presented. Furthermore, its sharpness will be investigated and a possible improvement will be provided.


2 Generic Chaining

Talagrand refined Kolmogorov’s idea of chaining and not only improved Dudley’s bound for Gaussian processes but also extended it to a wider range of stochastic processes. The main idea has been presented in [10] and further refined in [11].

To ensure consistency, the following introduction to generic chaining will be based on Talagrand’s newest publication [12] which includes minor changes.

2.1 Talagrand’s proof

Let (T, d) be a metric space and (X_t)_{t∈T} a stochastic process. It will be shown that a bound on E[sup_{t∈T} X_t] can be found by applying generic chaining if the following so-called increment condition is fulfilled:

    P(|X_s − X_t| ≥ u) ≤ 2 exp(−u² / (2 d(s, t)²))   ∀u > 0

Furthermore, (X_t)_{t∈T} is assumed to be a centered process, which means

    E[X_t] = 0   ∀t ∈ T

Therefore, E[sup_{t∈T} X_t] = E[sup_{t∈T} X_t] − E[X_{t_0}] = E[sup_{t∈T}(X_t − X_{t_0})] for t_0 ∈ T. As X_{t_0} − X_{t_0} = 0, sup_{t∈T}(X_t − X_{t_0}) is a non-negative random variable.

It is known that the expectation of a non-negative random variable Y can be calculated by E[Y] = ∫₀^∞ P(Y ≥ u) du:

    E[sup_{t∈T} X_t] = E[sup_{t∈T}(X_t − X_{t_0})] = ∫₀^∞ P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) du    (2.1)

As mentioned before, for the purpose of bounding the process, T can be assumed to be finite without a loss of generality.

Specifically, one could investigate P(sup_{t∈F}(X_t − X_{t_0}) ≥ u) for any finite F ⊆ T and extend the resulting inequality to T.

Using the properties of a probability measure, the probability of a union of events can be bounded by the sum of the probabilities of the single events:

    P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) = P(⋃_{t∈T} {(X_t − X_{t_0}) ≥ u}) ≤ ∑_{t∈T} P((X_t − X_{t_0}) ≥ u)    (2.2)

It seems worth investigating the sharpness of this inequality a bit further. Obviously, the inequality would be sharp in case the random variables (Xt− Xt0)t∈T are independent.

On the other hand, if many of the Xt are close to being identical, the bound could show a significant lack of fit.

Consider for example the case T := {t_i : i ∈ [2k]}, X_{t_{2i}} = X, X_{t_{2i−1}} = −X ∀i ∈ [k], and ∃j ∈ [2k] : t_j = t_0. Then obviously P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) = P(2X ≥ u), but ∑_{t∈T} P((X_t − X_{t_0}) ≥ u) = k P(2X ≥ u) for u > 0.
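The slack in this example can be made explicit numerically. The following sketch (illustration only; the choice X ~ N(0, 1) and all names are assumptions) evaluates both sides with the exact Gaussian tail:

```python
# Slack of the union bound (2.2) in the example above: with
# X_{t_{2i}} = X, X_{t_{2i-1}} = -X and X ~ N(0,1), the exact probability
# of the supremum event is P(2X >= u), while the union bound returns
# k * P(2X >= u), overshooting by exactly the factor k.
from math import erfc, sqrt

def p_2x_geq(u):
    # P(2X >= u) = P(X >= u/2) for a standard normal X
    return 0.5 * erfc((u / 2) / sqrt(2))

k, u = 50, 1.0
lhs = p_2x_geq(u)            # exact probability of the supremum event
union_bound = k * p_2x_geq(u)
print(lhs, union_bound)
```

This is precisely the lack of fit that the regrouping via π_1(t) in the next paragraphs is designed to remove.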


In order to find a better bound, those Xt which are nearly identical will be regrouped.

Later on, the focus will be on partitioning T in a specific way. For now, consider a subset T_1 ⊆ T and, for each t ∈ T, a point π_1(t) ∈ T_1, which can be seen as an approximation of t.

The purpose of this approximation is the following reformulation:

    X_t − X_{t_0} = (X_t − X_{π_1(t)}) + (X_{π_1(t)} − X_{t_0})

X_{π_1(t)} − X_{t_0} can potentially be bounded better by (2.2): for every subset T_1 with |T_1| < |T| there are fewer distinct π_1(t) ∈ T_1 than t ∈ T, so fewer terms are summed up in (2.2). In addition, π_1(t) can be chosen such that different t ∈ T with equal or similar X_t are mapped to the same π_1(t). The problem mentioned above of having too many identical or similar random variables X_t − X_{t_0} can be avoided this way.

For the example above, one could choose T_1 = {t_1, t_2} and π_1(t_{2i−1}) = t_1, π_1(t_{2i}) = t_2 ∀i ∈ [k]. This results in the inequality

    P(sup_{t∈T}(X_{π_1(t)} − X_{t_0}) ≥ u) = P(sup_{π_1(t)∈T_1}(X_{π_1(t)} − X_{t_0}) ≥ u)
      ≤ ∑_{π_1(t)∈T_1} P((X_{π_1(t)} − X_{t_0}) ≥ u) = P(2X ≥ u)

which is even sharp in this case.

It remains to inspect the random variable X_t − X_{π_1(t)}. As mentioned before, π_1(t) is an approximation of t. Therefore, sup_{t∈T}(X_t − X_{π_1(t)}) should be easier to bound than sup_{t∈T}(X_t − X_{t_0}).

This procedure is now iterated for subsets T_n. Consistently, the approximation of t is denoted by π_n(t) ∈ T_n. T_0 = {t_0} is set arbitrarily but fixed; consequently, π_0(t) = t_0. For n large enough, the approximation can be assumed to be perfect, which means ∃m > 0 with π_n(t) = t ∀n ≥ m. Combining this, X_t − X_{t_0} can be represented in the following way:

    X_t − X_{t_0} = ∑_{n≥1} (X_{π_n(t)} − X_{π_{n−1}(t)})    (2.3)

As mentioned before, the summands are equal to zero for n large enough. Therefore, the underlying series actually is a finite sum.

After all necessary definitions have been introduced, the actual chaining can be defined.

As the assumptions before already suggested, Tn ⊆ T should be chosen as increasing sets so that the chain (πn(t))n≥0 approximates t more precisely with increasing n.

To control those subsets, |T_n| ≤ N_n has to be fulfilled ∀n ∈ N, where N_n := 2^{2^n} for n ≥ 1; T_0 = {t_0} and therefore N_0 = 1.

Note that N_n² = (2^{2^n})² = 2^{2·2^n} = 2^{2^{n+1}} = N_{n+1} ∀n ≥ 1 and N_0² = 1 ≤ 4 = N_1. Therefore:

    N_n N_{n−1} ≤ N_{n+1} = 2^{2^{n+1}}   ∀n ∈ N    (2.4)
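As a quick sanity check, the growth property (2.4) can be verified numerically (illustration only; the function name is an assumption):

```python
# Sanity check of (2.4): with N_0 = 1 and N_n = 2^(2^n) for n >= 1,
# the product N_n * N_{n-1} never exceeds N_{n+1}.
def N(n):
    return 1 if n == 0 else 2 ** (2 ** n)

assert [N(n) for n in range(4)] == [1, 4, 16, 256]
for n in range(1, 12):
    assert N(n) * N(n - 1) <= N(n + 1)
```

The doubly exponential growth is exactly what later makes the factor |T_n||T_{n−1}| ≤ N_{n+1} absorbable into the chaining sums.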


Assume now π_n(t) is chosen in order to minimize d(t, π_n(t)). This means

    d(t, π_n(t)) = d(t, T_n) = inf_{s∈T_n} d(t, s)

Using the increment condition gives some way of controlling (X_{π_n(t)} − X_{π_{n−1}(t)}). The idea is to substitute u in such a way that the bound is no longer a function of the distance d(s, t). In addition, the bound should decrease with increasing n. The reasons for that will be demonstrated later on.

    P(|X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ 2 exp(−(u 2^{n/2} d(π_n(t), π_{n−1}(t)))² / (2 d(π_n(t), π_{n−1}(t))²)) = 2 exp(−u² 2^{n−1})    (2.5)

The proof continues by investigating the events Ω_u on which |X_{π_n(t)} − X_{π_{n−1}(t)}| ≤ u 2^{n/2} d(π_n(t), π_{n−1}(t)) ∀t ∈ T, ∀n ≥ 1. This means that for every ω ∈ Ω_u, the process |X_{π_n(t)}(ω) − X_{π_{n−1}(t)}(ω)| can be bounded along the chain π_n(t) for any choice of t ∈ T. The probability of the complementary set Ω_u^c can be controlled in the following way:

    P(Ω_u^c) = P(∃t ∈ T, ∃n ≥ 1 : |X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−u² 2^{n−1})    (2.6)

Proof of (2.6). This will be shown step by step.

First, rewrite P(Ω_u^c) as the probability of the union over all n ≥ 1 and t ∈ T:

    P(Ω_u^c) = P(∃t ∈ T, ∃n ≥ 1 : |X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      = P(⋃_{n≥1} ⋃_{t∈T} {|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t))})

Obviously, π_n(t) ∈ T_n in general takes the same value for several different t ∈ T. Therefore, the probability can be controlled more efficiently by substituting ⋃_{t∈T} by ⋃_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}}. In the next step, the standard bound for the probability of a union of events is used:

    P(Ω_u^c) = P(⋃_{n≥1} ⋃_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} {|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t))})
      ≤ ∑_{n≥1} ∑_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))    (2.7)


For the next step, fix n ≥ 1 and focus on ∑_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t))). There are only |T_n| possible values for π_n(t). Therefore, the sum has at most |T_n||T_{n−1}| summands. Bounding this finite sum by the largest summand multiplied by the number of summands, i.e. ∑_{i∈[k]} a_i ≤ k max_{i∈[k]} a_i, results in:

    P(Ω_u^c) ≤ ∑_{n≥1} ∑_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ ∑_{n≥1} |T_n||T_{n−1}| max_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ ∑_{n≥1} 2^{2^{n+1}} max_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))

The last inequality follows from the assumption |T_n| ≤ N_n and (2.4). Applying (2.5) to the maximum provides the result:

    P(Ω_u^c) ≤ ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−u² 2^{n−1}) =: p_u

□

These results can now be used for bounding E[sup_{t∈T} X_t]. For ω ∈ Ω_u,

    |X_t − X_{t_0}| = |∑_{n≥1} (X_{π_n(t)} − X_{π_{n−1}(t)})|    (2.8)
      ≤ ∑_{n≥1} |X_{π_n(t)} − X_{π_{n−1}(t)}| ≤ u ∑_{n≥1} 2^{n/2} d(π_n(t), π_{n−1}(t))    (2.9)

holds, where (2.8) uses (2.3) and the last inequality holds since ω ∈ Ω_u. Taking the supremum of both sides yields:

    sup_{t∈T} |X_t − X_{t_0}| ≤ u sup_{t∈T} ∑_{n≥1} 2^{n/2} d(π_n(t), π_{n−1}(t)) =: uS

Summarized, this inequality is fulfilled at least for all ω ∈ Ω_u. Define now Ω̃_u ⊇ Ω_u as the event that sup_{t∈T} |X_t − X_{t_0}| ≤ uS. The reversed inequality sup_{t∈T} |X_t − X_{t_0}| > uS can then only be fulfilled for ω ∈ Ω̃_u^c ⊆ Ω_u^c:

    P(sup_{t∈T} |X_t − X_{t_0}| > uS) = 1 − P(sup_{t∈T} |X_t − X_{t_0}| ≤ uS) = 1 − P(Ω̃_u) = P(Ω̃_u^c) ≤ P(Ω_u^c) ≤ p_u


Based on (2.1), E[sup_{t∈T} X_t] can be bounded. Substituting u by uS in the integral and plugging in the last result gives

    E[sup_{t∈T} X_t] ≤ S ∫₀^∞ p_u du

While the computation of p_u is easier than calculating P(sup_{t∈T} |X_t − X_{t_0}| > uS) directly, p_u is bounded further to facilitate the calculation of the integral. For this purpose, exp(−u² 2^{n−1}) has to be bounded by a product of two functions f(u) and g(n). Simplified, p_u = ∑_{n≥1} h(u, n) ≤ f(u) ∑_{n≥1} 2^{2^{n+1}} · 2 g(n). If f(u) is integrable and g(n) is chosen so that ∑_{n≥1} 2^{2^{n+1}} · 2 g(n) converges, then

    E[sup_{t∈T} X_t] ≤ S (∑_{n≥1} 2^{2^{n+1}} · 2 g(n)) ∫₀^∞ f(u) du ≤ LS ∫₀^∞ f(u) du

Therefore, splitting u² 2^{n−1} into a suitable sum is sufficient. For n ≥ 1 and u ≥ 3, it can be seen that

    u² 2^{n−1} ≥ u² (1/2 + 2^{n−2}) ≥ u²/2 + 9 · 2^{n−2} ≥ u²/2 + 2^{n+1},

using 2^{n−1} ≥ 1/2 + 2^{n−2} for n ≥ 1 and u² ≥ 9. Hence, for u ≥ 3,

    p_u = ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−u² 2^{n−1}) ≤ exp(−u²/2) ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−2^{n+1})
        = 2 exp(−u²/2) ∑_{n≥1} (2/e)^{2^{n+1}} = L exp(−u²/2)    (2.10)

The series converges as 2/e < 1. It remains to consider u ∈ (0, 3). As a probability is naturally bounded by 1, the inequality can be extended:

    P(sup_{t∈T} |X_t − X_{t_0}| > uS) ≤ min(p_u, 1) ≤ L exp(−u²/2)

The inequality has been shown for u ≥ 3 already. For u ∈ (0, 3), it follows from 1 ≤ exp(3²/2) exp(−u²/2) = L exp(−u²/2).

Finally, the expectation can be bounded:

    E[sup_{t∈T} X_t] = ∫₀^∞ P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) du ≤ L ∫₀^∞ exp(−u²/(2S²)) du

For calculating the integral, consider the density of a normal random variable with variance S², f(u) = (1/(√(2π) S)) exp(−u²/(2S²)). As this is a symmetric distribution, (1/(√(2π) S)) ∫₀^∞ exp(−u²/(2S²)) du = 1/2 holds. Therefore

    E[sup_{t∈T} X_t] ≤ L ∫₀^∞ exp(−u²/(2S²)) du = L S √(2π)/2 = LS
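The value of this Gaussian integral can be confirmed numerically. The following sketch (illustration only; the quadrature helper and its parameters are assumptions) checks ∫₀^∞ exp(−u²/(2S²)) du = S √(2π)/2 by the trapezoidal rule on a truncated range, since the tail beyond 10S is negligible:

```python
# Trapezoidal-rule check of the half-Gaussian integral used above:
#   integral_0^inf exp(-u^2/(2 S^2)) du = S * sqrt(2*pi) / 2
from math import exp, pi, sqrt

def half_gaussian_integral(S, steps=200000, upper=10.0):
    h = (upper * S) / steps
    # trapezoid endpoints: f(0) = 1 and the (tiny) value at upper*S
    total = 0.5 * (1.0 + exp(-(upper * S) ** 2 / (2 * S * S)))
    for i in range(1, steps):
        u = i * h
        total += exp(-u * u / (2 * S * S))
    return total * h

S = 2.0
approx = half_gaussian_integral(S)
exact = S * sqrt(2 * pi) / 2
assert abs(approx - exact) < 1e-6
```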


To avoid the need of determining an optimal chain (π_n(t))_{n≥0}, the distance d(π_n(t), π_{n−1}(t)) will be bounded using the triangle inequality:

    d(π_n(t), π_{n−1}(t)) ≤ d(t, π_n(t)) + d(t, π_{n−1}(t)) = d(t, T_n) + d(t, T_{n−1})    (2.11)

where the equality uses that π_n(t) was chosen to minimize d(t, π_n(t)). As π_n(t) ∈ T_n can be seen as an approximation of t, (2.11) shows that the distance between two successive approximations of t is not greater than the accumulated distance between t and the closest elements in T_{n−1} and T_n.

Therefore, using (2.11), S can be controlled without any dependency on the actual chain (π_n(t))_{n≥0}:

    S = sup_{t∈T} ∑_{n≥1} 2^{n/2} d(π_n(t), π_{n−1}(t)) ≤ sup_{t∈T} ∑_{n≥1} 2^{n/2} (d(t, T_n) + d(t, T_{n−1}))
      = sup_{t∈T} [ ∑_{n≥1} 2^{n/2} d(t, T_n) + ∑_{n≥1} 2^{n/2} d(t, T_{n−1}) ]
      = sup_{t∈T} [ ∑_{n≥1} 2^{n/2} d(t, T_n) + √2 ∑_{n≥0} 2^{n/2} d(t, T_n) ]
      ≤ L sup_{t∈T} ∑_{n≥0} 2^{n/2} d(t, T_n)

As this is fulfilled for all (T_n)_{n≥0} with T_n ⊆ T and |T_n| ≤ N_n, a bound can be obtained by taking the infimum over those (T_n)_{n≥0}:

    E[sup_{t∈T} X_t] ≤ L inf sup_{t∈T} ∑_{n≥0} 2^{n/2} d(t, T_n)    (2.12)

The property

    inf sup_{t∈T} ∑_{n≥0} 2^{n/2} d(t, T_n) ≤ inf ∑_{n≥0} 2^{n/2} sup_{t∈T} d(t, T_n) = ∑_{n≥0} 2^{n/2} e_n(T)

shows that Dudley's entropy bound (1.2) can not only be proved but also improved by using Talagrand's generic chaining. It should be mentioned as well that (2.12), and therefore (1.2), holds for any centered stochastic process fulfilling the increment condition.
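The improvement over Dudley's bound can be seen on a toy example. The following brute-force sketch is an illustration only (the point set and all names are assumptions): for five points on the real line, only T_0 and T_1 matter, since N_2 = 16 ≥ |T| allows T_n = T for n ≥ 2, which makes all terms with n ≥ 2 vanish.

```python
# Compare Dudley's sum e_0 + sqrt(2)*e_1 with the generic chaining
# quantity inf over (T_0, T_1) of sup_t [d(t,T_0) + sqrt(2) d(t,T_1)]
# from (2.12), for T = {0, 1, 2, 3, 10} on the real line.
from itertools import combinations
from math import sqrt

T = [0.0, 1.0, 2.0, 3.0, 10.0]
d = lambda s, t: abs(s - t)
dist = lambda t, F: min(d(t, s) for s in F)

def radius(F):                       # sup_t d(t, F)
    return max(dist(t, F) for t in T)

e0 = min(radius([s]) for s in T)     # |F| <= N_0 = 1
e1 = min(radius(list(F)) for k in (1, 2, 3, 4)
         for F in combinations(T, k))
dudley = e0 + sqrt(2) * e1

chaining = min(
    max(dist(t, [s0]) + sqrt(2) * dist(t, list(F1)) for t in T)
    for s0 in T
    for k in (1, 2, 3, 4) for F1 in combinations(T, k)
)
assert chaining <= dudley            # chaining is never worse
print(e0, e1, dudley, chaining)
```

Here the entropy sum pays the √2·e_1 term for every point, while the chaining quantity lets the outlier at 10 be covered cheaply by T_1, so the inequality is strict.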


Before continuing with formulating a more common form of (2.12), it remains to show that the bound actually holds for centered Gaussian processes. Therefore, it will be proved that they fulfill the increment condition.

For any centered Gaussian process (X_t)_{t∈T} and s, t ∈ T, X_t and X_s are jointly normal and ∃σ > 0 : X_t − X_s ∼ N(0, σ²). Obviously, d(s, t) = (E[(X_s − X_t)²])^{1/2} = σ.

Let G = X_s − X_t; then the moment generating function is known to be E[e^{rG}] = e^{σ²r²/2}. By applying the Markov inequality, it can be shown that the increment condition is fulfilled: for r > 0,

    P(G ≥ u) = P(e^{rG} ≥ e^{ru}) ≤ E[e^{rG}] / e^{ru} = e^{σ²r²/2 − ru},

and choosing r = u/σ² gives P(G ≥ u) ≤ e^{−u²/(2σ²)}.    (2.13)

The increment condition follows by using the symmetry of G and plugging in σ = d(s, t):

    P(|X_s − X_t| ≥ u) = P(|G| ≥ u) = 2 P(G ≥ u) ≤ 2 exp(−u² / (2 d(s, t)²))   ∀u > 0
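The Chernoff-type bound above can be compared against the exact Gaussian tail. The following sketch (illustration only; the function names and the sample values are assumptions) verifies that the exact two-sided tail never exceeds the bound:

```python
# Check of the increment condition for G = X_s - X_t ~ N(0, sigma^2):
# the Markov/Chernoff argument gives P(|G| >= u) <= 2 exp(-u^2/(2 sigma^2)),
# while the exact tail is P(|G| >= u) = erfc(u / (sigma * sqrt(2))).
from math import erfc, exp, sqrt

def exact_two_sided_tail(u, sigma):
    return erfc(u / (sigma * sqrt(2)))  # = 2 P(G >= u)

def chernoff_bound(u, sigma):
    return 2 * exp(-u ** 2 / (2 * sigma ** 2))

sigma = 1.5
for u in [0.1, 0.5, 1.0, 2.0, 5.0]:
    assert exact_two_sided_tail(u, sigma) <= chernoff_bound(u, sigma)
```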


2.2 The γα(T, d) functional

For simplicity, denote the bound of (2.12), generalized to an arbitrary exponent α, by

    γ̃α(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} d(t, T_n)

(the case α = 2 is exactly the quantity in (2.12)).

As before, the infimum is taken over all subsets T_n ⊆ T with |T_n| ≤ N_n. As choosing those sets T_n bears some difficulties, Talagrand introduced the gamma functional, which is defined as follows:

    γα(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} ∆(A_n(t), d)

Here, not sets T_n but partitions A_n are used. The term ∆(A_n(t), d) is the diameter of the unique set A_n(t) of the partition A_n which contains the element t ∈ T. The infimum is taken over the set of all admissible sequences (A_n)_{n≥0}, which is defined as

    {(A_n)_{n≥0} : A_n is a partition of T ∧ (∀A_{n+1} ∈ A_{n+1} ∃A_n ∈ A_n : A_{n+1} ⊆ A_n) ∧ |A_0| = 1 ∧ |A_n| ≤ 2^{2^n} ∀n ∈ N}

To understand this definition, the purpose of the gamma functional γα(T, d) will be summarized once more. The bound in inequality (2.12) should be substituted by a term which is easier to control but equivalent up to a constant factor. γα(T, d) fulfills these requirements if the following can be shown for the quantity γ̃α(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} d(t, T_n) generalizing (2.12):

    γ̃α(T, d) ≤ γα(T, d) ≤ K(α) γ̃α(T, d)

The first inequality ensures that γα(T, d) actually yields a valid bound for E[sup_{t∈T} X_t].

For each partition A_n of an admissible sequence (A_n)_{n≥0}, T_n can be constructed by taking exactly one element of each set of A_n. Then |T_n| ≤ N_n, as the partition is allowed to consist of at most N_n sets. Furthermore,

    ∆(A_n(t), d) = sup_{s,u∈A_n(t)} d(s, u) ≥ sup_{s∈A_n(t)} d(s, t) ≥ inf_{s∈T_n} d(s, t) = d(t, T_n)    (2.14)

The last inequality follows as there is exactly one s̃ ∈ A_n(t) ∩ T_n and sup_{s∈A_n(t)} d(s, t) ≥ d(s̃, t). Therefore,

    γ̃α(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} d(t, T_n) ≤ inf sup_{t∈T} ∑_{n≥0} 2^{n/α} ∆(A_n(t), d) = γα(T, d)

and the generic chaining bound is obtained:

    E[sup_{t∈T} X_t] ≤ L γ2(T, d)    (2.15)

for every centered stochastic process which is fulfilling the increment condition.


Taking a look at (2.14), one could assume the generic chaining bound is significantly less accurate than (2.12). By proving the second inequality ([12], Theorem 2.3.1), i.e. that the gamma functional exceeds the bound of (2.12) by at most a factor K(α), Talagrand showed that the gamma functional is of the same order as the bound in (2.12).

Yet, it remains to investigate the sharpness of the bound in general. For a centered Gaussian process (X_t)_{t∈T} and d defined as the canonical distance d(s, t) = (E[(X_s − X_t)²])^{1/2}, the majorizing measure theorem ([12], Theorem 2.4.1), published earlier in [10], proves that γ2(T, d) is of the same order as E[sup_{t∈T} X_t]:

    (1/L) γ2(T, d) ≤ E[sup_{t∈T} X_t] ≤ L γ2(T, d)    (2.16)

The proofs of the last two results use techniques not further required for the purpose of this thesis. Therefore, they are omitted, and only the corresponding theorems in [12] are referenced.

After demonstrating the efficiency of generic chaining for centered Gaussian processes, it will be shown how this technique can be used for a specific type of processes, which only fulfills a weaker tail condition.

Afterwards, a new bound for a specific chaos process will be presented and evaluated.


3 Gaussian Chaos

It has been shown that generic chaining leads to good bounds for centered processes fulfilling the increment condition. Focus now on processes which have a more complex distribution and only fulfill weaker tail conditions:

Let g_i ∼ N(0, 1) be i.i.d. and t_{i_1,…,i_d} ∈ R; then

    ∑_{i_1<i_2<⋯<i_d} t_{i_1,…,i_d} g_{i_1} ⋯ g_{i_d}

is a Gaussian chaos of order d. The process

    ∑_{i_1,…,i_d} t_{i_1,…,i_d} g^{(1)}_{i_1} ⋯ g^{(d)}_{i_d}

with g^{(j)}_i ∼ N(0, 1) i.i.d. is called a decoupled Gaussian chaos of order d. Besides the definitions above, [8] provides an introduction to estimating the moments of such a process.
with g(j)i i.i.d.∼ N (0, 1) is called a decoupled Gaussian chaos of order d. Besides the defi- nitions above, [8] provides an introduction in estimating the moments of such a process.

Instead of moments, the focus will be on bounding the expected supremum once more.

In the following, the case d = 2 will be investigated further.

3.1 Order 2 Gaussian chaos

Under the assumption that there is an n ∈ N with t_{i,j} = 0 for i > n or j > n, the decoupled Gaussian chaos of order 2 can be seen as a multiplication of two independent standard normal random vectors g, g′ of length n with a deterministic n × n matrix t. For the purpose of this thesis, assume t ∈ R^{n×n}:

    X_t = ∑_{i,j} t_{i,j} g_i g′_j = ∑_{i,j=1}^n t_{i,j} g_i g′_j = g^T t g′

Therefore, it seems natural to investigate the non-decoupled process g^T t g. Centering the process leads to a more general definition of a second order Gaussian chaos including the diagonal terms:

    g^T t g − E[g^T t g] = ∑_{i,j} t_{i,j} g_i g_j − E[∑_{i,j} t_{i,j} g_i g_j]
      = ∑_{i,j} t_{i,j} g_i g_j − ∑_i t_{i,i} E[g_i²]   (since E[g_i g_j] = 0 for i ≠ j)
      = ∑_{i≠j} t_{i,j} g_i g_j + ∑_i (g_i² − 1) t_{i,i}   (since E[g_i²] = 1)
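The algebraic identity behind this centering step can be checked numerically. This sketch (illustration only; dimensions, seed and names are assumptions) evaluates both sides for a fixed matrix t and vector g, using E[g^T t g] = ∑_i t_{i,i} = trace(t):

```python
# Deterministic check of the centering identity
#   g^T t g - trace(t) = sum_{i != j} t_ij g_i g_j + sum_i (g_i^2 - 1) t_ii,
# which holds for every fixed t and g, not just in expectation.
import numpy as np

rng = np.random.default_rng(1)
n = 5
t = rng.standard_normal((n, n))
g = rng.standard_normal(n)

lhs = g @ t @ g - np.trace(t)
off = sum(t[i, j] * g[i] * g[j] for i in range(n) for j in range(n) if i != j)
diag = sum((g[i] ** 2 - 1) * t[i, i] for i in range(n))
assert abs(lhs - (off + diag)) < 1e-10
```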


As one can see in the cited literature, mathematical research focuses mainly on bounding the decoupled case. A reason for this has been shown in On decoupling, series expansions, and tail behavior of chaos processes [1], where Arcones proved the following inequality:

    E[sup_{t∈T} |∑_{j≠k} t_{j,k} g_j g_k + ∑_{j≥1} t_{j,j}(g_j² − 1)|] ≤ L E[sup_{t∈T} |∑_{j≥1} ∑_{k≥1} t_{j,k} g_j g′_k|]    (3.1)

Therefore, bounding the decoupled process is sufficient for bounding a non-decoupled chaos.

In general, a decoupled chaos process does not fulfill the increment condition. Therefore, the generic chaining bound (2.15) cannot be applied. Instead, the following weaker tail condition can be shown [12]:

    P(|X_s − X_t| ≥ v) ≤ L exp(−min{ v² / d2(s, t)², v / d(s, t) })    (3.2)

Here d2 and d are the metrics induced by the Hilbert-Schmidt norm and the spectral norm:

    d2(s, t) = ‖t − s‖_HS = (∑_{i,j} |t_{i,j} − s_{i,j}|²)^{1/2}

    d(s, t) = ‖t − s‖ = sup_{‖x‖_2=1} ‖(t − s)x‖_2 = (λ_max((t − s)^T (t − s)))^{1/2}
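Both metrics are easy to evaluate numerically. The following sketch (illustration only; variable names, sizes and the seed are assumptions) computes them with numpy and checks the elementary relation d(s, t) ≤ d2(s, t), which holds because the spectral norm is the largest singular value while the Hilbert-Schmidt norm sums all squared singular values:

```python
# The two metrics from (3.2): d2 induced by the Hilbert-Schmidt (Frobenius)
# norm, d_spec by the spectral norm of the difference t - s.
import numpy as np

def d2(s, t):
    return np.linalg.norm(t - s, "fro")

def d_spec(s, t):
    return np.linalg.norm(t - s, 2)  # sqrt(lambda_max((t-s)^T (t-s)))

rng = np.random.default_rng(2)
s, t = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert d_spec(s, t) <= d2(s, t) + 1e-12
```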

By using (3.2) and applying a chaining approach, it can be shown that γ1(T, d) + γ2(T, d2) is a valid bound for the expected supremum of a decoupled order 2 Gaussian chaos [12].


3.2 Talagrand’s bound

Before proving this bound, the main idea and the difference to the proof of the generic chaining bound should be pointed out. In case min{v²/d2(s, t)², v/d(s, t)} = v²/d2(s, t)² ∀v ≥ 0, the tail condition would be equivalent to the increment condition with d = d2, and the process could be bounded by γ2(T, d2).

As the case min{v²/d2(s, t)², v/d(s, t)} = v/d(s, t) has to be considered as well, a valid bound will be obtained by adding γ1(T, d), as will be shown in the following.

For using the tail condition efficiently, consider two admissible sequences (Bn)n≥0 and (Cn)n≥0 which will be chosen according to two metrics d2 and d.

One constructs now each element A_{n+1} of (A_n)_{n≥0} by intersecting all sets of the partition B_n with the sets of the partition C_n. If ∆(B_n, d) is comparably small ∀B_n ∈ B_n and ∆(C_n, d2) is comparably small ∀C_n ∈ C_n, and A_{n+1} = B_n ∩ C_n, then ∆(A_{n+1}, d) and ∆(A_{n+1}, d2) can be assumed to be comparably small ∀A_{n+1} ∈ A_{n+1} as well.

Let (B_n)_{n≥0} and (C_n)_{n≥0} be admissible sequences such that:

    sup_{t∈T} ∑_{n≥0} 2^n ∆(B_n(t), d) ≤ 2 inf sup_{t∈T} ∑_{n≥0} 2^n ∆(B̃_n(t), d) = 2 γ1(T, d)

    sup_{t∈T} ∑_{n≥0} 2^{n/2} ∆(C_n(t), d2) ≤ 2 inf sup_{t∈T} ∑_{n≥0} 2^{n/2} ∆(C̃_n(t), d2) = 2 γ2(T, d2)    (3.3)

The existence of those admissible sequences can be shown easily. As |T| = 1 is trivial, assume w.l.o.g. |T| > 1.

Claim: Let f : E → R₊ with inf_{e∈E} f(e) = u > 0. Then ∃ẽ ∈ E : f(ẽ) ≤ 2 inf_{e∈E} f(e).

Proof: Assume f(e) > 2u ∀e ∈ E. Then inf_{e∈E} f(e) = sup{x ∈ R : f(e) ≥ x ∀e ∈ E} ≥ 2u, which contradicts the assumption inf_{e∈E} f(e) = u > 0.

Choose now f((B_n)_{n≥0}) = sup_{t∈T} ∑_{n≥0} 2^n ∆(B_n(t), d) and note that inf f((B_n)_{n≥0}) = γ1(T, d) ≥ ∆(B_0(t), d) = ∆(T, d) = sup_{s,t∈T} d(s, t) > 0 for |T| > 1; as usual, the infimum is taken over all admissible sequences. Therefore, the existence of a suitable admissible sequence (B_n)_{n≥0} follows directly from the claim above. Proving the existence of (C_n)_{n≥0} works the same way.

□

As explained before, construct (A_n)_{n≥0} in the following way:

    A_n = {A_n ⊆ T : A_n = B_{n−1} ∩ C_{n−1}, B_{n−1} ∈ B_{n−1}, C_{n−1} ∈ C_{n−1}},  n ≥ 1;   A_0 = {T}


Proving that (A_n)_{n≥0} is an admissible sequence works in two steps. First,

    |A_n| ≤ |B_{n−1}| · |C_{n−1}| ≤ N_{n−1} · N_{n−1} ≤ N_n,

as there are at most |B_{n−1}| · |C_{n−1}| possible intersections between the elements B_{n−1} ∈ B_{n−1} and C_{n−1} ∈ C_{n−1}.

The second condition follows from the fact that (B_n)_{n≥0} and (C_n)_{n≥0} are admissible sequences. Let A_{n+1} ∈ A_{n+1}; then there exist B_n ∈ B_n and C_n ∈ C_n with A_{n+1} = B_n ∩ C_n. As (B_n)_{n≥0} and (C_n)_{n≥0} are admissible, there exist B_{n−1} ∈ B_{n−1} and C_{n−1} ∈ C_{n−1} with B_n ⊆ B_{n−1} and C_n ⊆ C_{n−1}, and there exists A_n ∈ A_n with A_n = B_{n−1} ∩ C_{n−1}. Then A_{n+1} = B_n ∩ C_n ⊆ B_{n−1} ∩ C_{n−1} = A_n. Therefore,

    ∀n ≥ 0, ∀A_{n+1} ∈ A_{n+1} ∃A_n ∈ A_n : A_{n+1} ⊆ A_n,

and it follows that (A_n)_{n≥0} is admissible.

As before, define a chaining by choosing subsets Tn ⊆ T so that Tn contains exactly one element of each set A ∈ An and denote by πn(t) the unique element contained in An(t) ∩ Tn.

Use the tail condition (3.2) and plug in v = u d(s, t) + √u d2(s, t) with u ≥ 0:

    P(|X_s − X_t| ≥ u d(s, t) + √u d2(s, t))
      ≤ L exp(−min{ (u d(s, t) + √u d2(s, t))² / d2(s, t)², (u d(s, t) + √u d2(s, t)) / d(s, t) })
      = L exp(−min{ u + (u² d(s, t)² + 2u^{3/2} d(s, t) d2(s, t)) / d2(s, t)², u + √u d2(s, t) / d(s, t) })
      = L exp(−u − min{ (u² d(s, t)² + 2u^{3/2} d(s, t) d2(s, t)) / d2(s, t)², √u d2(s, t) / d(s, t) })
      ≤ L exp(−u),

since the remaining minimum is non-negative.

The proof continues similarly to the proof of the generic chaining bound. Therefore, plug in u = 2^n w for w ≥ 1 and investigate |X_{π_n(t)} − X_{π_{n−1}(t)}|:

    P(|X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ w (2^n d(π_n(t), π_{n−1}(t)) + 2^{n/2} d2(π_n(t), π_{n−1}(t))))
      ≤ P(|X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ w 2^n d(π_n(t), π_{n−1}(t)) + √w 2^{n/2} d2(π_n(t), π_{n−1}(t)))   (since w ≥ √w)
      ≤ L exp(−2^n w)


As before, there are at most 2^{2^{n+1}} pairs (π_n(t), π_{n−1}(t)). Therefore, for w ≥ 9:

    P(Ω^c) = P(∃n ≥ 1, t ∈ T : |X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ w (2^n d(π_n(t), π_{n−1}(t)) + 2^{n/2} d2(π_n(t), π_{n−1}(t))))
      ≤ L ∑_{n≥1} 2^{2^{n+1}} exp(−2^n w) ≤ L̃ exp(−w)

The last inequality follows from (2.10) by choosing u = √(2w): with u² 2^{n−1} = w 2^n and u²/2 = w, the bound can be obtained.

Arguing as in (2.9), sup_{t∈T} |X_t − X_{t_0}| can be bounded for w ≥ 9:

    P(sup_{t∈T} |X_t − X_{t_0}| ≤ w sup_{t∈T} ∑_{n≥1} (2^n d(π_n(t), π_{n−1}(t)) + 2^{n/2} d2(π_n(t), π_{n−1}(t)))) ≥ 1 − L exp(−w)

For n ≥ 2, π_n(t), π_{n−1}(t) ∈ A_{n−1}(t) ⊆ B_{n−2}(t). As A_{n−1}(t) ⊆ C_{n−2}(t) as well, the following holds:

    d(π_n(t), π_{n−1}(t)) ≤ ∆(B_{n−2}(t), d),   d2(π_n(t), π_{n−1}(t)) ≤ ∆(C_{n−2}(t), d2)

As d(π_1(t), π_0(t)) ≤ sup_{s,t̃∈T} d(s, t̃) = ∆(T, d), and D_0 = {T} for every admissible sequence (D_n)_{n≥0}, one can substitute the sums of distances by gamma functionals:

    ∑_{n≥1} 2^n d(π_n(t), π_{n−1}(t)) ≤ L ∑_{n≥0} 2^n ∆(B_n(t), d) ≤ 2L γ1(T, d) = L̃ γ1(T, d)

    ∑_{n≥1} 2^{n/2} d2(π_n(t), π_{n−1}(t)) ≤ L ∑_{n≥0} 2^{n/2} ∆(C_n(t), d2) ≤ 2L γ2(T, d2) = L̃ γ2(T, d2)

Here, one can see why it is necessary that (Bn)n≥0 and (Cn)n≥0 fulfill (3.3). Combining this, one obtains

    P(sup_{t∈T} |X_t − X_{t_0}| ≤ L w (γ1(T, d) + γ2(T, d2))) ≥ 1 − L exp(−w)

This has only been shown for w ≥ 9. As before, the bound can be extended to all w ≥ 0 by choosing L large enough. Integrating yields the desired result:

    E[sup_{t∈T} X_t] = E[sup_{t∈T}(X_t − X_{t_0})] ≤ ∫₀^∞ P(sup_{t∈T} |X_t − X_{t_0}| ≥ v) dv
      = ∫₀^∞ P(sup_{t∈T} |X_t − X_{t_0}| ≥ L w (γ1(T, d) + γ2(T, d2))) · L (γ1(T, d) + γ2(T, d2)) dw
      ≤ L (γ1(T, d) + γ2(T, d2)) ∫₀^∞ exp(−w) dw = L (γ1(T, d) + γ2(T, d2)),

using the substitution v = L w (γ1(T, d) + γ2(T, d2)).


Therefore, a stochastic process fulfilling the tail condition (3.2) can be bounded by the sum of the two gamma functionals γ1(T, d) and γ2(T, d2):

    E[sup_{t∈T} X_t] ≤ L (γ1(T, d) + γ2(T, d2))    (3.4)


4 Krahmer-Mendelson-Rauhut bound

In their publication Suprema of Chaos Processes and the Restricted Isometry Property [6], Krahmer, Mendelson and Rauhut investigated a specific chaos process:

    Y_t = ‖t g‖_2² − E[‖t g‖_2²]

In the context of their research, a bound including the γ1-functional led to suboptimal results. They were able to bound the pth moment of Y_t for every p ≥ 1 without using the γ1-functional, by transforming the underlying process and applying a chaining approach:

    E[sup_{t∈T} |Y_t|^p]^{1/p} ≤ L γ2(T, d) (γ2(T, d) + sup_{t∈T} ‖t‖_HS)
        + L √p sup_{t∈T} ‖t‖ (γ2(T, d) + sup_{t∈T} ‖t‖_HS) + L p sup_{t∈T} ‖t‖²
In case of p = 1 and 0 ∈ T, the second summand can be dropped [12]:

    E[sup_{t∈T} |Y_t|] ≤ L γ2(T, d) (γ2(T, d) + sup_{t∈T} ‖t‖_HS)    (4.1)

This bound will be proved and its sharpness investigated.

4.1 Proof of the bound

Recall the definition of the process Y_t = ‖t g‖_2² − E[‖t g‖_2²] = ⟨t g, t g⟩ − E[⟨t g, t g⟩]. Define now a second process Z_t := ⟨t g, t g′⟩ with g, g′ i.i.d.

Besides those two processes, the notations U² := E[sup_{t∈T} ‖t g‖_2²] and V² := sup_{t∈T} ‖t‖_HS² will be used.

Recall the decoupling inequality (3.1):

    E[sup_{t∈T} |∑_{j≠k} t_{j,k} g_j g_k + ∑_{j≥1} t_{j,j}(g_j² − 1)|] ≤ L E[sup_{t∈T} |∑_{j≥1} ∑_{k≥1} t_{j,k} g_j g′_k|]
