
Upper and Lower Bounds for Suprema of Chaos Processes

TIM FUCHS

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES



Degree Projects in Mathematics (30 ECTS credits)
Degree programme in Engineering Physics (300 credits)
KTH Royal Institute of Technology, year 2018

Supervisor at TU München: Felix Krahmer
Supervisor at KTH: Maurice Duits

Examiner at KTH: Maurice Duits


TRITA-SCI-GRU 2018:384 MAT-E 2018:82

KTH Royal Institute of Technology
School of Engineering Sciences (KTH SCI)
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


Abstract

There are different techniques for analyzing a stochastic process (X_t)_{t∈T}. In the following, Talagrand's generic chaining will be introduced and bounds for the expected supremum E[sup_{t∈T} X_t] of different stochastic processes will be proved.

The main focus lies on a result of Krahmer, Mendelson and Rauhut who were able to prove an alternative inequality for a specific kind of chaos process. As this bound leads to significantly better results in some applications, the sharpness of this inequality and a possible improvement will be investigated.

Sammanfattning

There are different techniques for analyzing a stochastic process (X_t)_{t∈T}. In the following, Talagrand's generic chaining will be introduced and bounds for the expected supremum E[sup_{t∈T} X_t] of different stochastic processes will be proved.

The main focus lies on a result of Krahmer, Mendelson and Rauhut, who were able to prove an alternative inequality for a certain type of chaos process. As that bound leads to significantly better results in some applications, the sharpness of this inequality and a possible improvement will be investigated.


Contents

1 Introduction
2 Generic Chaining
   2.1 Talagrand's proof
   2.2 The γα(T, d) functional
3 Gaussian Chaos
   3.1 Order 2 Gaussian chaos
   3.2 Talagrand's bound
4 Krahmer-Mendelson-Rauhut bound
   4.1 Proof of the bound
   4.2 Orthogonal matrices
      4.2.1 Expectation
      4.2.2 Computation of γ2(T_t, d)
      4.2.3 Lower bound for γ2(O(n), d)
   4.3 Improving the bound
5 Conclusion


1 Introduction

Stochastic processes are an important tool in many different fields. After Kolmogorov's famous publication of Foundations of the Theory of Probability in 1933 [7], stochastic processes have taken on an important role in modeling different events in economics and science. Commonly known applications are, for example, the evaluation of financial portfolios [9] or risk estimation of insurance claims [15]. Besides that, more recent fields like machine learning profit from these results in mathematical research as well [14].

Often, the focus is on measuring the variance or trend of a specific process. This thesis provides insight into a group of processes which are comparably difficult to control.

Given a stochastic process (X_t)_{t∈T}, the quantity

    E[sup_{t∈T} X_t]    (1.1)

is of interest.

Computing this expectation directly is often not possible, which is why the focus of this thesis will lie on finding efficient bounds.

Michel Talagrand is known for his research in this field. Therefore, this thesis is mostly based on his most recent publication Upper and Lower Bounds for Stochastic Processes [12], which is a collection of different results in this field.

Returning to the problem of controlling (1.1), the definition of this quantity has to be specified. To avoid issues with uncountable sets T, Talagrand generally sets

    E[sup_{t∈T} X_t] = sup{ E[sup_{t∈F} X_t] : F ⊆ T, F finite }

It should be noted for further proofs that assuming T to be finite does not lead to a loss of generality.

Obviously, an efficient bound should depend on the set T. An important quantity is the so-called entropy number e_n(T). Let F ⊆ T with |F| ≤ N_n, where N_n := 2^{2^n} for n ≥ 1 and N_0 := 1. For every t ∈ T, the distance of t to the set F is defined as the distance of t to the closest point in F:

    d(t, F) = inf_{s∈F} d(t, s)

Now consider the greatest distance between an element t ∈ T and F, namely sup_{t∈T} d(t, F). Minimizing this distance over all subsets F with |F| ≤ N_n leads to the definition of the entropy number:

    e_n(T) = inf_{F⊆T, |F|≤N_n} sup_{t∈T} d(t, F)
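For a small finite metric space, e_n(T) can be computed by brute force directly from this definition. The following sketch is not part of the thesis; the example set, the metric and all function names are illustrative assumptions, and the search over subsets is exponential in |T|, so it is only feasible for tiny sets:

```python
# Brute-force computation of the entropy numbers e_n(T) for a small finite
# metric space, directly from the definition
#   e_n(T) = inf_{F subset of T, |F| <= N_n} sup_{t in T} d(t, F),
# with N_0 = 1 and N_n = 2^(2^n) for n >= 1.
from itertools import combinations

def entropy_number(points, n, d):
    size = 1 if n == 0 else 2 ** (2 ** n)
    size = min(size, len(points))
    best = float("inf")
    for k in range(1, size + 1):
        for F in combinations(points, k):
            # sup over t of the distance to the closest point of F
            radius = max(min(d(t, s) for s in F) for t in points)
            best = min(best, radius)
    return best

d = lambda s, t: abs(s - t)
T = [0.0, 1.0, 2.0, 3.0]

e0 = entropy_number(T, 0, d)  # one point allowed: best center 1 or 2 -> 2.0
e1 = entropy_number(T, 1, d)  # up to 4 points: T covers itself -> 0.0
print(e0, e1)
```

Note how quickly the constraint |F| ≤ N_n stops binding: already N_1 = 4 covers this four-point set exactly, so e_1(T) = 0.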


Richard M. Dudley published a paper in 1967 [3] in which he connected entropy and Gaussian processes. Those results can be formulated as the so-called Dudley entropy bound for Gaussian processes [12], using the metric d(s, t) := (E[(X_s − X_t)²])^{1/2}:

    E[sup_{t∈T} X_t] ≤ L ∑_{n≥0} 2^{n/2} e_n(T)    (1.2)

Throughout the whole thesis, L > 0 denotes a universal constant, not necessarily equal at each appearance.

In the following, one of Talagrand's most important results, the generic chaining, will be introduced. This technique can be used not only for improving (1.2), but also for bounding more complex processes, namely Gaussian chaos processes.

An important application of chaos processes is in the context of compressive sensing. As this topic is not part of this thesis, only a short introduction based on [4] will be given:

Let r ∈ R^{m×n}, y ∈ C^m and x ∈ C^n. Assume further that x is s-sparse, which means |{i ∈ [n] : x_i ≠ 0}| ≤ s. Compressive sensing aims at recovering the vector x from the observed vector y fulfilling the following equation:

    y = r x

The restricted isometry constant δ_s, defined as the smallest value fulfilling

    (1 − δ_s) ‖x‖_2² ≤ ‖r x‖_2² ≤ (1 + δ_s) ‖x‖_2²   for all s-sparse x,

is an indicator of whether the recovery of an s-sparse vector is reliably possible. Research focuses on controlling δ_s depending on s and the dimensions of r.
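The definition of δ_s can be evaluated exactly for small matrices. The following brute-force sketch is an illustration only (matrix sizes, seeds and names are assumptions, and the search over supports is exponential in n): it uses the fact that the s-sparse condition is equivalent to the eigenvalues of every s-column Gram matrix lying in [1 − δ_s, 1 + δ_s].

```python
# Brute-force computation of the restricted isometry constant delta_s of a
# small matrix r: delta_s = max over supports S with |S| <= s of the
# largest deviation of the eigenvalues of r_S^T r_S from 1.
from itertools import combinations
import numpy as np

def restricted_isometry_constant(r, s):
    n = r.shape[1]
    delta = 0.0
    for k in range(1, s + 1):
        for S in combinations(range(n), k):
            gram = r[:, S].T @ r[:, S]
            eigs = np.linalg.eigvalsh(gram)  # sorted ascending
            delta = max(delta, abs(eigs[0] - 1.0), abs(eigs[-1] - 1.0))
    return delta

rng = np.random.default_rng(0)
m, n, s = 40, 10, 2
r = rng.standard_normal((m, n)) / np.sqrt(m)  # column-normalized Gaussian
print(restricted_isometry_constant(r, s))
```

An orthonormal matrix such as the identity gives δ_s = 0, matching the intuition that it acts as a perfect isometry on every support.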

For some types of random matrices r generated by a standard Gaussian vector g, r x can be rewritten as t_x g, where t_x is a matrix depending on x.

It can be shown that the restricted isometry constant is of the form

    δ_s = sup_{t∈T} | ‖t g‖_2² − E[‖t g‖_2²] |

In their publication Suprema of Chaos Processes and the Restricted Isometry Property [6], Krahmer, Mendelson and Rauhut investigated the expectation of this process. By proving an alternative bound to Talagrand's result, they were able to estimate the restricted isometry constant significantly better for some specific types of random matrices.

After acquiring the necessary knowledge about chaining, a simplified version of this result will be presented. Furthermore, its sharpness will be investigated and a possible improvement will be provided.


2 Generic Chaining

Talagrand refined Kolmogorov’s idea of chaining and not only improved Dudley’s bound for Gaussian processes but also extended it to a wider range of stochastic processes. The main idea has been presented in [10] and further refined in [11].

To ensure consistency, the following introduction to generic chaining will be based on Talagrand’s newest publication [12] which includes minor changes.

2.1 Talagrand’s proof

Let (T, d) be a metric space and (X_t)_{t∈T} a stochastic process. It will be shown that a bound on E[sup_{t∈T} X_t] can be found by applying generic chaining if the following so-called increment condition is fulfilled:

    P(|X_s − X_t| ≥ u) ≤ 2 exp(−u² / (2 d(s, t)²))   ∀u > 0

Furthermore, (X_t)_{t∈T} is assumed to be a centered process, which means

    E[X_t] = 0   ∀t ∈ T

Therefore, E[sup_{t∈T} X_t] = E[sup_{t∈T} X_t] − E[X_{t_0}] = E[sup_{t∈T}(X_t − X_{t_0})] for t_0 ∈ T. As X_{t_0} − X_{t_0} = 0, sup_{t∈T}(X_t − X_{t_0}) is a non-negative random variable.

It is known that the expectation of a non-negative random variable Y can be calculated by E[Y] = ∫₀^∞ P(Y ≥ u) du:

    E[sup_{t∈T} X_t] = E[sup_{t∈T}(X_t − X_{t_0})] = ∫₀^∞ P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) du    (2.1)

As mentioned before, for the purpose of bounding the process, T can be assumed to be finite without a loss of generality.

Specifically, one could investigate P(sup_{t∈F}(X_t − X_{t_0}) ≥ u) for any finite F ⊆ T and extend the resulting inequality to T.

Using the properties of a probability measure, the probability of a union of events can be bounded by the sum of the probabilities of the single events:

    P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) = P(⋃_{t∈T} {(X_t − X_{t_0}) ≥ u}) ≤ ∑_{t∈T} P((X_t − X_{t_0}) ≥ u)    (2.2)

It seems worth investigating the sharpness of this inequality a bit further. Obviously, the inequality would be sharp in case the random variables (Xt− Xt0)t∈T are independent.

On the other hand, if many of the Xt are close to being identical, the bound could show a significant lack of fit.

Consider for example the case T := {t_i : i ∈ [2k]}, X_{t_{2i}} = X, X_{t_{2i−1}} = −X ∀i ∈ [k], and ∃j ∈ [2k] : t_j = t_0. Then obviously P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) = P(2X ≥ u), but ∑_{t∈T} P((X_t − X_{t_0}) ≥ u) = k P(2X ≥ u) for u > 0.
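The slack in this example can be made explicit numerically. The following sketch (illustration only; the choice X ~ N(0, 1) and all names are assumptions) evaluates both sides with the exact Gaussian tail:

```python
# Slack of the union bound (2.2) in the example above: with
# X_{t_{2i}} = X, X_{t_{2i-1}} = -X and X ~ N(0,1), the exact probability
# of the supremum event is P(2X >= u), while the union bound returns
# k * P(2X >= u), overshooting by exactly the factor k.
from math import erfc, sqrt

def p_2x_geq(u):
    # P(2X >= u) = P(X >= u/2) for a standard normal X
    return 0.5 * erfc((u / 2) / sqrt(2))

k, u = 50, 1.0
lhs = p_2x_geq(u)            # exact probability of the supremum event
union_bound = k * p_2x_geq(u)
print(lhs, union_bound)
```

This is precisely the lack of fit that the regrouping via π_1(t) in the next paragraphs is designed to remove.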


In order to find a better bound, those Xt which are nearly identical will be regrouped.

Later on, the focus will be on partitioning T in a specific way. For now, consider a subset T_1 ⊆ T and, for each t ∈ T, a point π_1(t) ∈ T_1, which can be seen as an approximation of t.

The purpose of this approximation is the following reformulation:

    X_t − X_{t_0} = (X_t − X_{π_1(t)}) + (X_{π_1(t)} − X_{t_0})

X_{π_1(t)} − X_{t_0} can potentially be bounded better by (2.2): for every subset T_1 with |T_1| < |T| there are fewer distinct π_1(t) ∈ T_1 than t ∈ T, so fewer terms are summed up in (2.2). In addition, π_1(t) can be chosen such that different t ∈ T with equal or similar X_t are mapped to the same π_1(t). The problem mentioned above of having too many identical or similar random variables X_t − X_{t_0} can be avoided this way.

For the example above, one could choose T_1 = {t_1, t_2} and π_1(t_{2i−1}) = t_1, π_1(t_{2i}) = t_2 ∀i ∈ [k]. This results in the inequality

    P(sup_{t∈T}(X_{π_1(t)} − X_{t_0}) ≥ u) = P(sup_{π_1(t)∈T_1}(X_{π_1(t)} − X_{t_0}) ≥ u)
      ≤ ∑_{π_1(t)∈T_1} P((X_{π_1(t)} − X_{t_0}) ≥ u) = P(2X ≥ u)

which is even sharp in this case.

It remains to inspect the random variable X_t − X_{π_1(t)}. As mentioned before, π_1(t) is an approximation of t. Therefore, sup_{t∈T}(X_t − X_{π_1(t)}) should be easier to bound than sup_{t∈T}(X_t − X_{t_0}).

This procedure is now iterated for subsets T_n. Consistently, the approximation of t is denoted by π_n(t) ∈ T_n. T_0 = {t_0} is set arbitrarily but fixed; consequently, π_0(t) = t_0. For n large enough, the approximation can be assumed to be perfect, which means ∃m > 0 with π_n(t) = t ∀n ≥ m. Combining this, X_t − X_{t_0} can be represented in the following way:

    X_t − X_{t_0} = ∑_{n≥1} (X_{π_n(t)} − X_{π_{n−1}(t)})    (2.3)

As mentioned before, the summands are equal to zero for n large enough. Therefore, the underlying series actually is a finite sum.

After all necessary definitions have been introduced, the actual chaining can be defined.

As the assumptions before already suggested, Tn ⊆ T should be chosen as increasing sets so that the chain (πn(t))n≥0 approximates t more precisely with increasing n.

To control those subsets, |T_n| ≤ N_n has to be fulfilled ∀n ∈ N, where N_n := 2^{2^n} for n ≥ 1; T_0 = {t_0} and therefore N_0 = 1.

Note that N_n² = (2^{2^n})² = 2^{2·2^n} = 2^{2^{n+1}} = N_{n+1} ∀n ≥ 1 and N_0² = 1 ≤ 4 = N_1. Therefore:

    N_n N_{n−1} ≤ N_{n+1} = 2^{2^{n+1}}   ∀n ∈ N    (2.4)
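As a quick sanity check, the growth property (2.4) can be verified numerically (illustration only; the function name is an assumption):

```python
# Sanity check of (2.4): with N_0 = 1 and N_n = 2^(2^n) for n >= 1,
# the product N_n * N_{n-1} never exceeds N_{n+1}.
def N(n):
    return 1 if n == 0 else 2 ** (2 ** n)

assert [N(n) for n in range(4)] == [1, 4, 16, 256]
for n in range(1, 12):
    assert N(n) * N(n - 1) <= N(n + 1)
```

The doubly exponential growth is exactly what later makes the factor |T_n||T_{n−1}| ≤ N_{n+1} absorbable into the chaining sums.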


Assume now π_n(t) is chosen in order to minimize d(t, π_n(t)). This means

    d(t, π_n(t)) = d(t, T_n) = inf_{s∈T_n} d(t, s)

Using the increment condition gives some way of controlling (X_{π_n(t)} − X_{π_{n−1}(t)}). The idea is to substitute u in such a way that the bound is no longer a function of the distance d(s, t). In addition, the bound should decrease with increasing n. The reasons for that will be demonstrated later on.

    P(|X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ 2 exp(−(u 2^{n/2} d(π_n(t), π_{n−1}(t)))² / (2 d(π_n(t), π_{n−1}(t))²)) = 2 exp(−u² 2^{n−1})    (2.5)

The proof continues by investigating the events Ω_u on which |X_{π_n(t)} − X_{π_{n−1}(t)}| ≤ u 2^{n/2} d(π_n(t), π_{n−1}(t)) ∀t ∈ T, ∀n ≥ 1. This means that for every ω ∈ Ω_u, the process |X_{π_n(t)}(ω) − X_{π_{n−1}(t)}(ω)| can be bounded along the chain π_n(t) for any choice of t ∈ T. The probability of the complementary set Ω_u^c can be controlled in the following way:

    P(Ω_u^c) = P(∃t ∈ T, ∃n ≥ 1 : |X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−u² 2^{n−1})    (2.6)

Proof of (2.6). This will be shown step by step.

First, rewrite P(Ω_u^c) as the probability of the union over all n ≥ 1 and t ∈ T:

    P(Ω_u^c) = P(∃t ∈ T, ∃n ≥ 1 : |X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      = P(⋃_{n≥1} ⋃_{t∈T} {|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t))})

Obviously, π_n(t) ∈ T_n in general takes the same value for several different t ∈ T. Therefore, the probability can be controlled more efficiently by substituting ⋃_{t∈T} by ⋃_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}}. In the next step, the standard bound for the probability of a union of events is used:

    P(Ω_u^c) = P(⋃_{n≥1} ⋃_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} {|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t))})
      ≤ ∑_{n≥1} ∑_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))    (2.7)


For the next step, fix n ≥ 1 and focus on ∑_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t))). There are only |T_n| possible values for π_n(t). Therefore, the sum has at most |T_n||T_{n−1}| summands. Bounding this finite sum by the largest summand multiplied by the number of summands, i.e. ∑_{i∈[k]} a_i ≤ k max_{i∈[k]} a_i, results in:

    P(Ω_u^c) ≤ ∑_{n≥1} ∑_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ ∑_{n≥1} |T_n||T_{n−1}| max_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))
      ≤ ∑_{n≥1} 2^{2^{n+1}} max_{(π_n(t),π_{n−1}(t))∈T_n×T_{n−1}} P(|X_{π_n(t)} − X_{π_{n−1}(t)}| > u 2^{n/2} d(π_n(t), π_{n−1}(t)))

The last inequality follows from the assumption |T_n| ≤ N_n and (2.4). Applying (2.5) to the maximum provides the result:

    P(Ω_u^c) ≤ ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−u² 2^{n−1}) =: p_u

□

These results can now be used for bounding E[sup_{t∈T} X_t]. For ω ∈ Ω_u,

    |X_t − X_{t_0}| = |∑_{n≥1} (X_{π_n(t)} − X_{π_{n−1}(t)})|    (2.8)
      ≤ ∑_{n≥1} |X_{π_n(t)} − X_{π_{n−1}(t)}| ≤ u ∑_{n≥1} 2^{n/2} d(π_n(t), π_{n−1}(t))    (2.9)

holds, where (2.8) uses (2.3) and the last inequality holds since ω ∈ Ω_u. Taking the supremum of both sides yields:

    sup_{t∈T} |X_t − X_{t_0}| ≤ u sup_{t∈T} ∑_{n≥1} 2^{n/2} d(π_n(t), π_{n−1}(t)) =: uS

Summarized, this inequality is fulfilled at least for all ω ∈ Ω_u. Define now Ω̃_u ⊇ Ω_u as the event that sup_{t∈T} |X_t − X_{t_0}| ≤ uS. The reversed inequality sup_{t∈T} |X_t − X_{t_0}| > uS can then only be fulfilled for ω ∈ Ω̃_u^c ⊆ Ω_u^c:

    P(sup_{t∈T} |X_t − X_{t_0}| > uS) = 1 − P(sup_{t∈T} |X_t − X_{t_0}| ≤ uS) = 1 − P(Ω̃_u) = P(Ω̃_u^c) ≤ P(Ω_u^c) ≤ p_u


Based on (2.1), E[sup_{t∈T} X_t] can be bounded. Substituting u by uS in the integral and plugging in the last result gives

    E[sup_{t∈T} X_t] ≤ S ∫₀^∞ p_u du

While the computation of p_u is easier than calculating P(sup_{t∈T} |X_t − X_{t_0}| > uS) directly, p_u is bounded further to facilitate the calculation of the integral. For this purpose, exp(−u² 2^{n−1}) has to be bounded by a product of two functions f(u) and g(n). Simplified, p_u = ∑_{n≥1} h(u, n) ≤ f(u) ∑_{n≥1} 2^{2^{n+1}} · 2 g(n). If f(u) is integrable and g(n) is chosen so that ∑_{n≥1} 2^{2^{n+1}} · 2 g(n) converges, then

    E[sup_{t∈T} X_t] ≤ S (∑_{n≥1} 2^{2^{n+1}} · 2 g(n)) ∫₀^∞ f(u) du ≤ LS ∫₀^∞ f(u) du

Therefore, splitting u² 2^{n−1} into a suitable sum is sufficient. For n ≥ 1 and u ≥ 3, it can be seen that

    u² 2^{n−1} ≥ u² (1/2 + 2^{n−2}) ≥ u²/2 + 9 · 2^{n−2} ≥ u²/2 + 2^{n+1},

using 2^{n−1} ≥ 1/2 + 2^{n−2} for n ≥ 1 and u² ≥ 9. Hence, for u ≥ 3,

    p_u = ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−u² 2^{n−1}) ≤ exp(−u²/2) ∑_{n≥1} 2^{2^{n+1}} · 2 exp(−2^{n+1})
        = 2 exp(−u²/2) ∑_{n≥1} (2/e)^{2^{n+1}} = L exp(−u²/2)    (2.10)

The series converges as 2/e < 1. It remains to consider u ∈ (0, 3). As a probability is naturally bounded by 1, the inequality can be extended:

    P(sup_{t∈T} |X_t − X_{t_0}| > uS) ≤ min(p_u, 1) ≤ L exp(−u²/2)

The inequality has been shown for u ≥ 3 already. For u ∈ (0, 3), it follows from 1 ≤ exp(3²/2) exp(−u²/2) = L exp(−u²/2).

Finally, the expectation can be bounded:

    E[sup_{t∈T} X_t] = ∫₀^∞ P(sup_{t∈T}(X_t − X_{t_0}) ≥ u) du ≤ L ∫₀^∞ exp(−u²/(2S²)) du

For calculating the integral, consider the density of a normal random variable with variance S², f(u) = (1/(√(2π) S)) exp(−u²/(2S²)). As this is a symmetric distribution, (1/(√(2π) S)) ∫₀^∞ exp(−u²/(2S²)) du = 1/2 holds. Therefore

    E[sup_{t∈T} X_t] ≤ L ∫₀^∞ exp(−u²/(2S²)) du = L S √(2π)/2 = LS
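The value of this Gaussian integral can be confirmed numerically. The following sketch (illustration only; the quadrature helper and its parameters are assumptions) checks ∫₀^∞ exp(−u²/(2S²)) du = S √(2π)/2 by the trapezoidal rule on a truncated range, since the tail beyond 10S is negligible:

```python
# Trapezoidal-rule check of the half-Gaussian integral used above:
#   integral_0^inf exp(-u^2/(2 S^2)) du = S * sqrt(2*pi) / 2
from math import exp, pi, sqrt

def half_gaussian_integral(S, steps=200000, upper=10.0):
    h = (upper * S) / steps
    # trapezoid endpoints: f(0) = 1 and the (tiny) value at upper*S
    total = 0.5 * (1.0 + exp(-(upper * S) ** 2 / (2 * S * S)))
    for i in range(1, steps):
        u = i * h
        total += exp(-u * u / (2 * S * S))
    return total * h

S = 2.0
approx = half_gaussian_integral(S)
exact = S * sqrt(2 * pi) / 2
assert abs(approx - exact) < 1e-6
```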


To avoid the need of determining an optimal chain (π_n(t))_{n≥0}, the distance d(π_n(t), π_{n−1}(t)) will be bounded using the triangle inequality:

    d(π_n(t), π_{n−1}(t)) ≤ d(t, π_n(t)) + d(t, π_{n−1}(t)) = d(t, T_n) + d(t, T_{n−1})    (2.11)

where the equality uses that π_n(t) was chosen to minimize d(t, π_n(t)). As π_n(t) ∈ T_n can be seen as an approximation of t, (2.11) shows that the distance between two successive approximations of t is not greater than the accumulated distance between t and the closest elements in T_{n−1} and T_n.

Therefore, using (2.11), S can be controlled without any dependency on the actual chain (π_n(t))_{n≥0}:

    S = sup_{t∈T} ∑_{n≥1} 2^{n/2} d(π_n(t), π_{n−1}(t)) ≤ sup_{t∈T} ∑_{n≥1} 2^{n/2} (d(t, T_n) + d(t, T_{n−1}))
      = sup_{t∈T} [ ∑_{n≥1} 2^{n/2} d(t, T_n) + ∑_{n≥1} 2^{n/2} d(t, T_{n−1}) ]
      = sup_{t∈T} [ ∑_{n≥1} 2^{n/2} d(t, T_n) + √2 ∑_{n≥0} 2^{n/2} d(t, T_n) ]
      ≤ L sup_{t∈T} ∑_{n≥0} 2^{n/2} d(t, T_n)

As this is fulfilled for all (T_n)_{n≥0} with T_n ⊆ T and |T_n| ≤ N_n, a bound can be obtained by taking the infimum over those (T_n)_{n≥0}:

    E[sup_{t∈T} X_t] ≤ L inf sup_{t∈T} ∑_{n≥0} 2^{n/2} d(t, T_n)    (2.12)

The property

    inf sup_{t∈T} ∑_{n≥0} 2^{n/2} d(t, T_n) ≤ inf ∑_{n≥0} 2^{n/2} sup_{t∈T} d(t, T_n) = ∑_{n≥0} 2^{n/2} e_n(T)

shows that Dudley's entropy bound (1.2) can not only be proved but also improved by using Talagrand's generic chaining. It should be mentioned as well that (2.12), and therefore (1.2), holds for any centered stochastic process fulfilling the increment condition.
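The improvement over Dudley's bound can be seen on a toy example. The following brute-force sketch is an illustration only (the point set and all names are assumptions): for five points on the real line, only T_0 and T_1 matter, since N_2 = 16 ≥ |T| allows T_n = T for n ≥ 2, which makes all terms with n ≥ 2 vanish.

```python
# Compare Dudley's sum e_0 + sqrt(2)*e_1 with the generic chaining
# quantity inf over (T_0, T_1) of sup_t [d(t,T_0) + sqrt(2) d(t,T_1)]
# from (2.12), for T = {0, 1, 2, 3, 10} on the real line.
from itertools import combinations
from math import sqrt

T = [0.0, 1.0, 2.0, 3.0, 10.0]
d = lambda s, t: abs(s - t)
dist = lambda t, F: min(d(t, s) for s in F)

def radius(F):                       # sup_t d(t, F)
    return max(dist(t, F) for t in T)

e0 = min(radius([s]) for s in T)     # |F| <= N_0 = 1
e1 = min(radius(list(F)) for k in (1, 2, 3, 4)
         for F in combinations(T, k))
dudley = e0 + sqrt(2) * e1

chaining = min(
    max(dist(t, [s0]) + sqrt(2) * dist(t, list(F1)) for t in T)
    for s0 in T
    for k in (1, 2, 3, 4) for F1 in combinations(T, k)
)
assert chaining <= dudley            # chaining is never worse
print(e0, e1, dudley, chaining)
```

Here the entropy sum pays the √2·e_1 term for every point, while the chaining quantity lets the outlier at 10 be covered cheaply by T_1, so the inequality is strict.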


Before continuing with formulating a more common form of (2.12), it remains to show that the bound actually holds for centered Gaussian processes. Therefore, it will be proved that they fulfill the increment condition.

For any centered Gaussian process (X_t)_{t∈T} and s, t ∈ T, X_t and X_s are jointly normal and ∃σ > 0 : X_t − X_s ∼ N(0, σ²). Obviously, d(s, t) = (E[(X_s − X_t)²])^{1/2} = σ.

Let G = X_s − X_t; then the moment generating function is known to be E[e^{rG}] = e^{σ²r²/2}. By applying the Markov inequality, it can be shown that the increment condition is fulfilled: for r > 0,

    P(G ≥ u) = P(e^{rG} ≥ e^{ru}) ≤ E[e^{rG}] / e^{ru} = e^{σ²r²/2 − ru},

and choosing r = u/σ² gives P(G ≥ u) ≤ e^{−u²/(2σ²)}.    (2.13)

The increment condition follows by using the symmetry of G and plugging in σ = d(s, t):

    P(|X_s − X_t| ≥ u) = P(|G| ≥ u) = 2 P(G ≥ u) ≤ 2 exp(−u² / (2 d(s, t)²))   ∀u > 0
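The Chernoff-type bound above can be compared against the exact Gaussian tail. The following sketch (illustration only; the function names and the sample values are assumptions) verifies that the exact two-sided tail never exceeds the bound:

```python
# Check of the increment condition for G = X_s - X_t ~ N(0, sigma^2):
# the Markov/Chernoff argument gives P(|G| >= u) <= 2 exp(-u^2/(2 sigma^2)),
# while the exact tail is P(|G| >= u) = erfc(u / (sigma * sqrt(2))).
from math import erfc, exp, sqrt

def exact_two_sided_tail(u, sigma):
    return erfc(u / (sigma * sqrt(2)))  # = 2 P(G >= u)

def chernoff_bound(u, sigma):
    return 2 * exp(-u ** 2 / (2 * sigma ** 2))

sigma = 1.5
for u in [0.1, 0.5, 1.0, 2.0, 5.0]:
    assert exact_two_sided_tail(u, sigma) <= chernoff_bound(u, sigma)
```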


2.2 The γα(T, d) functional

For simplicity, denote the bound of (2.12), generalized to an arbitrary exponent α, by

    γ̃α(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} d(t, T_n)

(the case α = 2 is exactly the quantity in (2.12)).

As before, the infimum is taken over all subsets T_n ⊆ T with |T_n| ≤ N_n. As choosing those sets T_n bears some difficulties, Talagrand introduced the gamma functional, which is defined as follows:

    γα(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} ∆(A_n(t), d)

Here, not sets T_n but partitions A_n are used. The term ∆(A_n(t), d) is the diameter of the unique set A_n(t) of the partition A_n which contains the element t ∈ T. The infimum is taken over the set of all admissible sequences (A_n)_{n≥0}, which is defined as

    {(A_n)_{n≥0} : A_n is a partition of T ∧ (∀A_{n+1} ∈ A_{n+1} ∃A_n ∈ A_n : A_{n+1} ⊆ A_n) ∧ |A_0| = 1 ∧ |A_n| ≤ 2^{2^n} ∀n ∈ N}

To understand this definition, the purpose of the gamma functional γα(T, d) will be summarized once more. The bound in inequality (2.12) should be substituted by a term which is easier to control but equivalent up to a constant factor. γα(T, d) fulfills these requirements if the following can be shown for the quantity γ̃α(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} d(t, T_n) generalizing (2.12):

    γ̃α(T, d) ≤ γα(T, d) ≤ K(α) γ̃α(T, d)

The first inequality ensures that γα(T, d) actually yields a valid bound for E[sup_{t∈T} X_t].

For each partition A_n of an admissible sequence (A_n)_{n≥0}, T_n can be constructed by taking exactly one element of each set of A_n. Then |T_n| ≤ N_n, as the partition is allowed to consist of at most N_n sets. Furthermore,

    ∆(A_n(t), d) = sup_{s,u∈A_n(t)} d(s, u) ≥ sup_{s∈A_n(t)} d(s, t) ≥ inf_{s∈T_n} d(s, t) = d(t, T_n)    (2.14)

The last inequality follows as there is exactly one s̃ ∈ A_n(t) ∩ T_n and sup_{s∈A_n(t)} d(s, t) ≥ d(s̃, t). Therefore,

    γ̃α(T, d) = inf sup_{t∈T} ∑_{n≥0} 2^{n/α} d(t, T_n) ≤ inf sup_{t∈T} ∑_{n≥0} 2^{n/α} ∆(A_n(t), d) = γα(T, d)

and the generic chaining bound is obtained:

    E[sup_{t∈T} X_t] ≤ L γ2(T, d)    (2.15)

for every centered stochastic process which is fulfilling the increment condition.


Taking a look at (2.14), one could assume the generic chaining bound is significantly less accurate than (2.12). By proving the second inequality ([12], Theorem 2.3.1), i.e. that the gamma functional exceeds the bound of (2.12) by at most a factor K(α), Talagrand showed that the gamma functional is of the same order as the bound in (2.12).

Yet, it remains to investigate the sharpness of the bound in general. For a centered Gaussian process (X_t)_{t∈T} and d defined as the canonical distance d(s, t) = (E[(X_s − X_t)²])^{1/2}, the majorizing measure theorem ([12], Theorem 2.4.1), published earlier in [10], proves that γ2(T, d) is of the same order as E[sup_{t∈T} X_t]:

    (1/L) γ2(T, d) ≤ E[sup_{t∈T} X_t] ≤ L γ2(T, d)    (2.16)

The proofs of the last two results use techniques not further required for the purpose of this thesis. Therefore, they are omitted, and only the corresponding theorems in [12] are referenced.

After demonstrating the efficiency of generic chaining for centered Gaussian processes, it will be shown how this technique can be used for a specific type of processes, which only fulfills a weaker tail condition.

Afterwards, a new bound for a specific chaos process will be presented and evaluated.


3 Gaussian Chaos

It has been shown that generic chaining leads to good bounds for centered processes fulfilling the increment condition. Focus now on processes which have a more complex distribution and only fulfill weaker tail conditions:

Let g_i ∼ N(0, 1) be i.i.d. and t_{i_1,…,i_d} ∈ R; then

    ∑_{i_1<i_2<⋯<i_d} t_{i_1,…,i_d} g_{i_1} ⋯ g_{i_d}

is a Gaussian chaos of order d. The process

    ∑_{i_1,…,i_d} t_{i_1,…,i_d} g^{(1)}_{i_1} ⋯ g^{(d)}_{i_d}

with g^{(j)}_i ∼ N(0, 1) i.i.d. is called a decoupled Gaussian chaos of order d. Besides the definitions above, [8] provides an introduction to estimating the moments of such a process.
with g(j)i i.i.d.∼ N (0, 1) is called a decoupled Gaussian chaos of order d. Besides the defi- nitions above, [8] provides an introduction in estimating the moments of such a process.

Instead of moments, the focus will be on bounding the expected supremum once more.

In the following, the case d = 2 will be investigated further.

3.1 Order 2 Gaussian chaos

Under the assumption that there is an n ∈ N with t_{i,j} = 0 for i > n or j > n, the decoupled Gaussian chaos of order 2 can be seen as a multiplication of two independent standard normal random vectors g, g′ of length n with a deterministic n × n matrix t. For the purpose of this thesis, assume t ∈ R^{n×n}:

    X_t = ∑_{i,j} t_{i,j} g_i g′_j = ∑_{i,j=1}^n t_{i,j} g_i g′_j = g^T t g′

Therefore, it seems natural to investigate the non-decoupled process g^T t g. Centering the process leads to a more general definition of a second order Gaussian chaos including the diagonal terms:

    g^T t g − E[g^T t g] = ∑_{i,j} t_{i,j} g_i g_j − E[∑_{i,j} t_{i,j} g_i g_j]
      = ∑_{i,j} t_{i,j} g_i g_j − ∑_i t_{i,i} E[g_i²]   (since E[g_i g_j] = 0 for i ≠ j)
      = ∑_{i≠j} t_{i,j} g_i g_j + ∑_i (g_i² − 1) t_{i,i}   (since E[g_i²] = 1)
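The algebraic identity behind this centering step can be checked numerically. This sketch (illustration only; dimensions, seed and names are assumptions) evaluates both sides for a fixed matrix t and vector g, using E[g^T t g] = ∑_i t_{i,i} = trace(t):

```python
# Deterministic check of the centering identity
#   g^T t g - trace(t) = sum_{i != j} t_ij g_i g_j + sum_i (g_i^2 - 1) t_ii,
# which holds for every fixed t and g, not just in expectation.
import numpy as np

rng = np.random.default_rng(1)
n = 5
t = rng.standard_normal((n, n))
g = rng.standard_normal(n)

lhs = g @ t @ g - np.trace(t)
off = sum(t[i, j] * g[i] * g[j] for i in range(n) for j in range(n) if i != j)
diag = sum((g[i] ** 2 - 1) * t[i, i] for i in range(n))
assert abs(lhs - (off + diag)) < 1e-10
```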


As one can see in the cited literature, mathematical research focuses mainly on bounding the decoupled case. A reason for this has been shown in On decoupling, series expansions, and tail behavior of chaos processes [1], where Arcones proved the following inequality:

    E[sup_{t∈T} |∑_{j≠k} t_{j,k} g_j g_k + ∑_{j≥1} t_{j,j}(g_j² − 1)|] ≤ L E[sup_{t∈T} |∑_{j≥1} ∑_{k≥1} t_{j,k} g_j g′_k|]    (3.1)

Therefore, bounding the decoupled process is sufficient for bounding a non-decoupled chaos.

In general, a decoupled chaos process does not fulfill the increment condition. Therefore, the generic chaining bound (2.15) cannot be applied. Instead, the following weaker tail condition can be shown [12]:

    P(|X_s − X_t| ≥ v) ≤ L exp(−min{ v² / d2(s, t)², v / d(s, t) })    (3.2)

Here d2 and d are the metrics induced by the Hilbert-Schmidt norm and the spectral norm:

    d2(s, t) = ‖t − s‖_HS = (∑_{i,j} |t_{i,j} − s_{i,j}|²)^{1/2}

    d(s, t) = ‖t − s‖ = sup_{‖x‖_2=1} ‖(t − s)x‖_2 = (λ_max((t − s)^T (t − s)))^{1/2}
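Both metrics are easy to evaluate numerically. The following sketch (illustration only; variable names, sizes and the seed are assumptions) computes them with numpy and checks the elementary relation d(s, t) ≤ d2(s, t), which holds because the spectral norm is the largest singular value while the Hilbert-Schmidt norm sums all squared singular values:

```python
# The two metrics from (3.2): d2 induced by the Hilbert-Schmidt (Frobenius)
# norm, d_spec by the spectral norm of the difference t - s.
import numpy as np

def d2(s, t):
    return np.linalg.norm(t - s, "fro")

def d_spec(s, t):
    return np.linalg.norm(t - s, 2)  # sqrt(lambda_max((t-s)^T (t-s)))

rng = np.random.default_rng(2)
s, t = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
assert d_spec(s, t) <= d2(s, t) + 1e-12
```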

By using (3.2) and applying a chaining approach, it can be shown that γ1(T, d) + γ2(T, d2) is a valid bound for the expected supremum of a decoupled order 2 Gaussian chaos [12].


3.2 Talagrand’s bound

Before proving this bound, the main idea and the difference to the proof of the generic chaining bound should be pointed out. In case min{v²/d2(s, t)², v/d(s, t)} = v²/d2(s, t)² ∀v ≥ 0, the tail condition would be equivalent to the increment condition with d = d2, and the process could be bounded by γ2(T, d2).

As the case min{v²/d2(s, t)², v/d(s, t)} = v/d(s, t) has to be considered as well, a valid bound will be obtained by adding γ1(T, d), as will be shown in the following.

For using the tail condition efficiently, consider two admissible sequences (Bn)n≥0 and (Cn)n≥0 which will be chosen according to two metrics d2 and d.

One constructs now each element A_{n+1} of (A_n)_{n≥0} by intersecting all sets of the partition B_n with the sets of the partition C_n. If ∆(B_n, d) is comparably small ∀B_n ∈ B_n and ∆(C_n, d2) is comparably small ∀C_n ∈ C_n, and A_{n+1} = B_n ∩ C_n, then ∆(A_{n+1}, d) and ∆(A_{n+1}, d2) can be assumed to be comparably small ∀A_{n+1} ∈ A_{n+1} as well.

Let (B_n)_{n≥0} and (C_n)_{n≥0} be admissible sequences such that:

    sup_{t∈T} ∑_{n≥0} 2^n ∆(B_n(t), d) ≤ 2 inf sup_{t∈T} ∑_{n≥0} 2^n ∆(B̃_n(t), d) = 2 γ1(T, d)

    sup_{t∈T} ∑_{n≥0} 2^{n/2} ∆(C_n(t), d2) ≤ 2 inf sup_{t∈T} ∑_{n≥0} 2^{n/2} ∆(C̃_n(t), d2) = 2 γ2(T, d2)    (3.3)

The existence of those admissible sequences can be shown easily. As |T| = 1 is trivial, assume w.l.o.g. |T| > 1.

Claim: Let f : E → R₊ with inf_{e∈E} f(e) = u > 0. Then ∃ẽ ∈ E : f(ẽ) ≤ 2 inf_{e∈E} f(e).

Proof: Assume f(e) > 2u ∀e ∈ E. Then inf_{e∈E} f(e) = sup{x ∈ R : f(e) ≥ x ∀e ∈ E} ≥ 2u, which contradicts the assumption inf_{e∈E} f(e) = u > 0.

Choose now f((B_n)_{n≥0}) = sup_{t∈T} ∑_{n≥0} 2^n ∆(B_n(t), d) and note that inf f((B_n)_{n≥0}) = γ1(T, d) ≥ ∆(B_0(t), d) = ∆(T, d) = sup_{s,t∈T} d(s, t) > 0 for |T| > 1; as usual, the infimum is taken over all admissible sequences. Therefore, the existence of a suitable admissible sequence (B_n)_{n≥0} follows directly from the claim above. Proving the existence of (C_n)_{n≥0} works the same way.

□

As explained before, construct (A_n)_{n≥0} in the following way:

    A_n = {A_n ⊆ T : A_n = B_{n−1} ∩ C_{n−1}, B_{n−1} ∈ B_{n−1}, C_{n−1} ∈ C_{n−1}},  n ≥ 1;   A_0 = {T}


Proving that (A_n)_{n≥0} is an admissible sequence works in two steps. First,

    |A_n| ≤ |B_{n−1}| · |C_{n−1}| ≤ N_{n−1} · N_{n−1} ≤ N_n,

as there are at most |B_{n−1}| · |C_{n−1}| possible intersections between the elements B_{n−1} ∈ B_{n−1} and C_{n−1} ∈ C_{n−1}.

The second condition follows from the fact that (B_n)_{n≥0} and (C_n)_{n≥0} are admissible sequences. Let A_{n+1} ∈ A_{n+1}; then there exist B_n ∈ B_n and C_n ∈ C_n with A_{n+1} = B_n ∩ C_n. As (B_n)_{n≥0} and (C_n)_{n≥0} are admissible, there exist B_{n−1} ∈ B_{n−1} and C_{n−1} ∈ C_{n−1} with B_n ⊆ B_{n−1} and C_n ⊆ C_{n−1}, and there exists A_n ∈ A_n with A_n = B_{n−1} ∩ C_{n−1}. Then A_{n+1} = B_n ∩ C_n ⊆ B_{n−1} ∩ C_{n−1} = A_n. Therefore,

    ∀n ≥ 0, ∀A_{n+1} ∈ A_{n+1} ∃A_n ∈ A_n : A_{n+1} ⊆ A_n,

and it follows that (A_n)_{n≥0} is admissible.

As before, define a chaining by choosing subsets Tn ⊆ T so that Tn contains exactly one element of each set A ∈ An and denote by πn(t) the unique element contained in An(t) ∩ Tn.

Use the tail condition (3.2) and plug in v = u d(s, t) + √u d2(s, t) with u ≥ 0:

    P(|X_s − X_t| ≥ u d(s, t) + √u d2(s, t))
      ≤ L exp(−min{ (u d(s, t) + √u d2(s, t))² / d2(s, t)², (u d(s, t) + √u d2(s, t)) / d(s, t) })
      = L exp(−min{ u + (u² d(s, t)² + 2u^{3/2} d(s, t) d2(s, t)) / d2(s, t)², u + √u d2(s, t) / d(s, t) })
      = L exp(−u − min{ (u² d(s, t)² + 2u^{3/2} d(s, t) d2(s, t)) / d2(s, t)², √u d2(s, t) / d(s, t) })
      ≤ L exp(−u),

since the remaining minimum is non-negative.

The proof continues similarly to the proof of the generic chaining bound. Therefore, plug in u = 2^n w for w ≥ 1 and investigate |X_{π_n(t)} − X_{π_{n−1}(t)}|:

    P(|X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ w (2^n d(π_n(t), π_{n−1}(t)) + 2^{n/2} d2(π_n(t), π_{n−1}(t))))
      ≤ P(|X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ w 2^n d(π_n(t), π_{n−1}(t)) + √w 2^{n/2} d2(π_n(t), π_{n−1}(t)))   (since w ≥ √w)
      ≤ L exp(−2^n w)


As before, there are at most 2^{2^{n+1}} pairs (π_n(t), π_{n−1}(t)). Therefore, for w ≥ 9:

    P(Ω^c) = P(∃n ≥ 1, t ∈ T : |X_{π_n(t)} − X_{π_{n−1}(t)}| ≥ w (2^n d(π_n(t), π_{n−1}(t)) + 2^{n/2} d2(π_n(t), π_{n−1}(t))))
      ≤ L ∑_{n≥1} 2^{2^{n+1}} exp(−2^n w) ≤ L̃ exp(−w)

The last inequality follows from (2.10) by choosing u = √(2w): with u² 2^{n−1} = w 2^n and u²/2 = w, the bound can be obtained.

Arguing as in (2.9), sup_{t∈T} |X_t − X_{t_0}| can be bounded for w ≥ 9:

    P(sup_{t∈T} |X_t − X_{t_0}| ≤ w sup_{t∈T} ∑_{n≥1} (2^n d(π_n(t), π_{n−1}(t)) + 2^{n/2} d2(π_n(t), π_{n−1}(t)))) ≥ 1 − L exp(−w)

For n ≥ 2, π_n(t), π_{n−1}(t) ∈ A_{n−1}(t) ⊆ B_{n−2}(t). As A_{n−1}(t) ⊆ C_{n−2}(t) as well, the following holds:

    d(π_n(t), π_{n−1}(t)) ≤ ∆(B_{n−2}(t), d),   d2(π_n(t), π_{n−1}(t)) ≤ ∆(C_{n−2}(t), d2)

As d(π_1(t), π_0(t)) ≤ sup_{s,t̃∈T} d(s, t̃) = ∆(T, d), and D_0 = {T} for every admissible sequence (D_n)_{n≥0}, one can substitute the sums of distances by gamma functionals:

    ∑_{n≥1} 2^n d(π_n(t), π_{n−1}(t)) ≤ L ∑_{n≥0} 2^n ∆(B_n(t), d) ≤ 2L γ1(T, d) = L̃ γ1(T, d)

    ∑_{n≥1} 2^{n/2} d2(π_n(t), π_{n−1}(t)) ≤ L ∑_{n≥0} 2^{n/2} ∆(C_n(t), d2) ≤ 2L γ2(T, d2) = L̃ γ2(T, d2)

Here, one can see why it is necessary that (Bn)n≥0 and (Cn)n≥0 fulfill (3.3). Combining this, one obtains

    P(sup_{t∈T} |X_t − X_{t_0}| ≤ L w (γ1(T, d) + γ2(T, d2))) ≥ 1 − L exp(−w)

This has only been shown for w ≥ 9. As before, the bound can be extended to all w ≥ 0 by choosing L large enough. Integrating yields the desired result:

    E[sup_{t∈T} X_t] = E[sup_{t∈T}(X_t − X_{t_0})] ≤ ∫₀^∞ P(sup_{t∈T} |X_t − X_{t_0}| ≥ v) dv
      = ∫₀^∞ P(sup_{t∈T} |X_t − X_{t_0}| ≥ L w (γ1(T, d) + γ2(T, d2))) · L (γ1(T, d) + γ2(T, d2)) dw
      ≤ L (γ1(T, d) + γ2(T, d2)) ∫₀^∞ exp(−w) dw = L (γ1(T, d) + γ2(T, d2)),

using the substitution v = L w (γ1(T, d) + γ2(T, d2)).


Therefore, a stochastic process fulfilling the tail condition (3.2) can be bounded by the sum of the two gamma functionals γ1(T, d) and γ2(T, d2):

    E[sup_{t∈T} X_t] ≤ L (γ1(T, d) + γ2(T, d2))    (3.4)


4 Krahmer-Mendelson-Rauhut bound

In their publication Suprema of Chaos Processes and the Restricted Isometry Property [6], Krahmer, Mendelson and Rauhut investigated a specific chaos process:

    Y_t = ‖t g‖_2² − E[‖t g‖_2²]

In the context of their research, a bound including the γ1-functional led to suboptimal results. They were able to bound the pth moment of Y_t for every p ≥ 1 without using the γ1-functional, by transforming the underlying process and applying a chaining approach:

    E[sup_{t∈T} |Y_t|^p]^{1/p} ≤ L γ2(T, d) (γ2(T, d) + sup_{t∈T} ‖t‖_HS)
        + L √p sup_{t∈T} ‖t‖ (γ2(T, d) + sup_{t∈T} ‖t‖_HS) + L p sup_{t∈T} ‖t‖²
In case of p = 1 and 0 ∈ T, the second summand can be dropped [12]:

    E[sup_{t∈T} |Y_t|] ≤ L γ2(T, d) (γ2(T, d) + sup_{t∈T} ‖t‖_HS)    (4.1)

This bound will be proved and its sharpness investigated.

4.1 Proof of the bound

Recall the definition of the process Y_t = ‖t g‖_2² − E[‖t g‖_2²] = ⟨t g, t g⟩ − E[⟨t g, t g⟩]. Define now a second process Z_t := ⟨t g, t g′⟩ with g, g′ i.i.d.

Besides those two processes, the notations U² := E[sup_{t∈T} ‖t g‖_2²] and V² := sup_{t∈T} ‖t‖_HS² will be used.

Recall the decoupling inequality (3.1):

    E[sup_{t∈T} |∑_{j≠k} t_{j,k} g_j g_k + ∑_{j≥1} t_{j,j}(g_j² − 1)|] ≤ L E[sup_{t∈T} |∑_{j≥1} ∑_{k≥1} t_{j,k} g_j g′_k|]
