
Particle-based Parameter Inference in Stochastic Volatility Models: Batch vs. Online

Academic year: 2021



ALBIN TOFT


Degree Projects in Mathematical Statistics (30 ECTS credits)

Master's Programme in Applied and Computational Mathematics (120 credits)
KTH Royal Institute of Technology, 2019


TRITA-SCI-GRU 2019:072 MAT-E 2019:29

Royal Institute of Technology

School of Engineering Sciences

KTH SCI


Abstract


Sammanfattning (Swedish abstract)

Particle-based parameter estimation in stochastic volatility models: batch vs. online


Acknowledgements


Contents

1 Introduction
1.1 Stochastic Volatility
1.2 Online Estimation
1.3 Parameter Estimation in Stochastic Volatility Models
1.4 Hidden Markov Models
1.5 Thesis Objective
1.6 Outline
1.7 Research Question

2 Background
2.1 Hidden Markov Models and Parameter Estimation
2.1.1 Hidden Markov models
2.1.2 Tangent filters
2.1.3 Parameter estimation in fully dominated HMMs
2.2 Stochastic Volatility Models
2.3 Sequential Monte Carlo Methods
2.3.1 The bootstrap particle filter
2.3.2 Coefficient of variation and efficient sample size

3 Methods
3.1 Stochastic Volatility Models as Hidden Markov Models
3.2 The PaRIS-algorithm
3.2.1 Tangent filter estimation
3.3 PaRIS-Based Online Parameter Estimation
3.4 Batch EM Parameter Estimation

4 Case Study
4.1 Algorithm Validation
4.1.1 Validation of online algorithm on simulated data
4.1.2 Validation of offline algorithm
4.2 Implementation on Actual Stock Prices
4.2.1 Online implementation
4.2.2 Offline implementation
4.3 Experiments Regarding Uncertainty of Estimates
4.3.1 Experiment 1
4.3.2 Experiment 2

5 Comparison of Online and Offline Estimator
5.1 Comparison on Simulated Data
5.2 Comparison on AAPL Data
5.3 Comparison of Results from Experiment 1 and Experiment 2

6 Discussion
6.1 Case Study
6.2 Stochastic Volatility Model Selection
6.3 Online vs Offline Algorithm

7 Conclusions and Future Work
7.1 Conclusions
7.2 Future Work

Bibliography

A Derivations of Quantities in Online Algorithm
A.1 Derivation of h̃_{s,θ}(x_s, x_{s+1})
A.2 Derivation of ∇_θ g_{s,θ}(x_s)

B Derivation of Quantities in Offline Algorithm

C Maximization of Intermediate Quantity


1 Introduction

1.1 Stochastic Volatility

When working with financial mathematics, one often uses the formula proposed by Black and Scholes for the pricing of European call options [1]. The pricing formula proposed by Black and Scholes is widely used in banking and is perhaps the most well-known formula for pricing options. The formula is based on several assumptions regarding the dynamics of the price of the underlying asset, which is assumed to follow a log-normal distribution with constant drift and volatility over time. When examining stock price data, however, it has become obvious that these assumptions do not coincide with the dynamics of actual stock prices. The assumption of constant volatility in particular has proved to be unjustifiable. This has given rise to new and alternative models for stock prices and their volatility.

By removing the assumption of constant volatility, and viewing the volatility as a random process varying over time, several new models have been proposed. By introducing randomness to the volatility it is possible to formulate models which can capture the effects of a varying volatility. These models are known as stochastic volatility models, with some concrete examples being the Bates model [2], the Heston model [3] and the Hull and White model [4].

1.2 Online Estimation

When working with stochastic processes, one is often interested in the task of parameter estimation. This is due to the fact that stochastic processes are commonly described by models that include parameters determining the behaviour of the process. Given a set of observations, one then tries to find a model for the stochastic process which best resembles the observations. A crucial step in determining such a model is estimating the parameters used for modeling the process. Depending on the observations, and on the application of the model, there are several methods of estimating these parameters. If one is, for instance, interested in estimating the parameters using a fixed number of observations, the parameters could be estimated using all available observations at once. This is what is often called a batch estimate. However, in many applications one faces the scenario where new data become available as time passes, or where the amount of data is extremely large. One might then be interested in repeatedly estimating the parameters as new observations become available. With online estimation it is possible to use previously estimated parameters and new observations to update the previous parameter estimates, without revisiting previous observations.
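The batch/online distinction can be illustrated with the simplest possible "parameter", a sample mean. The toy sketch below is not from the thesis; it only shows that the online update needs the previous estimate and the new observation, never the old data:

```python
def batch_mean(observations):
    """Batch estimate: all observations are used at once."""
    return sum(observations) / len(observations)

def online_mean_update(prev_estimate, new_obs, n):
    """Online update: only the previous estimate, the new observation and the
    observation count n (including new_obs) are needed; old data can be discarded."""
    return prev_estimate + (new_obs - prev_estimate) / n

data = [1.0, 4.0, 2.0, 7.0, 6.0]
est = 0.0
for n, y in enumerate(data, start=1):
    est = online_mean_update(est, y, n)  # est now equals the mean of data[:n]
```

After the loop, the online estimate coincides exactly with the batch estimate; for the particle-based estimators in this thesis the two approaches generally give different (approximate) answers.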

1.3 Parameter Estimation in Stochastic Volatility Models

As introduced previously, stochastic volatility models are new and powerful models for modeling asset prices, exchange rates and other financial data. However, due to the complex nature of the models, it can be problematic to estimate the parameters specifying them. This problem arises from the fact that stochastic volatility models include so-called missing data, meaning that only a fraction of the time series is observable. The problem of parameter estimation for missing-data models can, however, be solved through the use of particle filters and the EM-algorithm, which is done by Kim and Stoffer in [5].


1.4 Hidden Markov Models

Hidden Markov models are bivariate stochastic processes characterized by an underlying unobservable Markov process {X_t}_{t∈N} and an observable process {Y_t}_{t∈N}. The process {X_t}_{t∈N} is often referred to as the state, or hidden, process, and {Y_t}_{t∈N} as the observation process. The state process is only partially observable through the observation process, and the observations are conditionally independent given the corresponding states. One could think of a hidden Markov model as a house with lights on. For an observer standing outside, it is not possible to directly observe how many residents are inside. However, by observing the number of windows with lights on, the observer can make a qualified guess of the number of residents. In this example the hidden state process is the number of residents inside, and the observation process is the number of windows with lights on. A general graphical illustration of a hidden Markov model can be seen in Figure 1.1.

Figure 1.1: Graphical illustration of a hidden Markov model.

As Figure 1.1 illustrates, the observation at a certain time depends only on the hidden state at the same time instance. This is illustrated by the arrows in the figure. Further, it is also shown that the state process evolves independently of the observation process.

Hidden Markov models are often used when one is interested in making inference regarding some Markov process by observing an associated process [6]. In recent years hidden Markov models have seen several applications, such as speech recognition, computational biology and signal processing. This can be seen from the multiple references to hidden Markov models in [7]. Hidden Markov models are also used in finance, in particular when working with stochastic volatility models [8]. One of the major strengths of applying hidden Markov models to stochastic volatility models is that it allows for volatility clustering, which will be discussed further in this thesis.

What makes hidden Markov models such a powerful tool in statistics is the combination of the general model structure and the fact that the hidden process possesses the Markov property. This makes the models applicable to multiple problems, while allowing for efficient algorithms for parameter estimation, simulation and prediction.

The fundamental problem when working with hidden Markov models is that, given a specified model and a set of observations Y_1, …, Y_n, one wants to make inference regarding the corresponding hidden states X_1, …, X_n. Evaluating the conditional distribution of the hidden state X_k at time k given Y_{0:n} is what is referred to as smoothing in [6], and there are several ways of performing smoothing. Most of the early references on smoothing are based on a rather specific kind of model known as Gaussian linear state-space models. The use of these models partially arose from the pioneering work by Kalman and Bucy in 1961, resulting in the Kalman filter [9]. However, the lack of computational tools for handling nonlinear/non-Gaussian components prevented hidden Markov models from being used to their full potential for several years [8]. Today there exist several methods of handling smoothing for such models, for instance sequential Monte Carlo methods. These algorithms generate sequences of weighted particles in order to approximate the smoothing distributions of the targets, and have in recent years proved to be very useful. Known as particle filters, these methods sequentially approximate the smoothing distributions by means of repeated importance sampling.

1.5 Thesis Objective

The main objective of this thesis is to formulate and implement an online estimator for stochastic volatility. This will be done by modeling the stochastic volatility as a hidden Markov model and implementing a particle-based online estimator for the model. The PaRIS-algorithm, formulated by Olsson and Westerborn in [10], will be implemented as the online estimator. The online estimator will be compared to an offline, EM-based implementation of the PaRIS-algorithm. A major difference between the two estimators is that the online estimator approximates the "true" parameter, while the offline estimator approximates the maximum likelihood estimate of the "true" parameter. The main research question of the thesis is whether the online algorithm could be considered an alternative, or even superior, method for parameter estimation of stocks modeled using a stochastic volatility model, compared to the offline EM-implementation. This will be evaluated by comparing accuracy, uncertainty and computational time for two algorithms based on the different approaches.

1.6 Outline

The outline of the thesis is as follows:

Chapter 2 provides the reader with the necessary background required for the remaining part of the thesis. It covers the concept of hidden Markov models, and explains terms such as smoothing, backward kernels, parameter estimation and tangent filters. Further, Chapter 2 also covers stochastic volatility models and sequential Monte Carlo methods.

In the following Chapter 3, the methods for deriving the particle-based online estimator of the parameters of the stochastic volatility model are presented. The chapter starts by rewriting the stochastic volatility model used in the thesis as a hidden Markov model. The PaRIS-algorithm is explained, and its application as an online parameter estimator is presented. Finally, the chapter explains the offline EM-implementation of the PaRIS-algorithm, to which the online implementation is to be compared.


Following the results in Chapter 4 comes Chapter 5, where the results from the previous chapter are used to compare the two algorithms. The two algorithms are compared in terms of accuracy, precision, uncertainty and computational time. The comparisons made in this chapter are further discussed in Chapter 6. Finally, in Chapter 7, conclusions regarding the results and future work that might be of interest are stated.

1.7 Research Question


2 Background

The purpose of this chapter is to provide the necessary background regarding the main areas covered in the thesis. First, some definitions and explanations will be given regarding hidden Markov models; parameter estimation for hidden Markov models will also be covered in this section. Secondly, some background regarding stochastic volatility models will be given, and the model covered in this thesis will be defined and explained. Thereafter, sequential Monte Carlo methods will be introduced, and their application to hidden Markov models will be explained.

2.1 Hidden Markov Models and Parameter Estimation

2.1.1 Hidden Markov models

As previously mentioned in Section 1.4, a hidden Markov model (abbreviated HMM) consists of a bivariate stochastic process {(X_t, Y_t)}_{t∈N}. The two processes {X_t} and {Y_t} take values in two sets X and Y. Accompanying each set is a σ-algebra, 𝒳 and 𝒴 respectively, which implies that (X, 𝒳) and (Y, 𝒴) are both measurable spaces. We will now define an HMM by first providing a definition of a transition kernel, and thereafter using that definition to define an HMM. The following definitions are made by Cappé and Moulines in [6].

Definition 2.1.1 (Transition kernel). Let (X, 𝒳) and (Y, 𝒴) be two measurable spaces. An unnormalized transition kernel from (X, 𝒳) to (Y, 𝒴) is a function Q : X × 𝒴 → [0, ∞] that satisfies

(i) for all x ∈ X, Q(x, ·) is a positive measure on (Y, 𝒴);

(ii) for all A ∈ 𝒴, the function x ↦ Q(x, A) is measurable.

If Q(x, Y) = 1 for all x ∈ X, then Q is called a transition kernel, or simply a kernel. If X = Y and Q(x, X) = 1 for all x ∈ X, then Q will also be referred to as a Markov transition kernel on (X, 𝒳).

Once the concept of a transition kernel has been defined, a formal definition of a hidden Markov model can be made. This is done in [6] by defining its transition kernel; the definition is stated below.

Definition 2.1.2 (Hidden Markov Model). Let (X, 𝒳) and (Y, 𝒴) be two measurable spaces, and let Q and G denote, respectively, a Markov transition kernel on (X, 𝒳) and a transition kernel from (X, 𝒳) to (Y, 𝒴). Consider the Markov transition kernel defined on the product space (X × Y, 𝒳 ⊗ 𝒴) by

\[
T[(x, y), C] = \iint_C Q(x, \mathrm{d}x')\, G(x', \mathrm{d}y'), \quad (x, y) \in X \times Y,\ C \in \mathcal{X} \otimes \mathcal{Y}. \tag{2.1}
\]

The Markov chain {X_t, Y_t}_{t≥0} with Markov transition kernel T and initial distribution ν ⊗ G, where ν is a probability measure on (X, 𝒳), is called a hidden Markov model.

Further, if one considers the case where there exist probability measures µ and λ such that G and Q have transition densities g and q, the model is said to be fully dominated. This implies that for a fully dominated model we have

\[
G(x, A) = \int_A g(x, y)\, \mu(\mathrm{d}y), \quad A \in \mathcal{Y}, \qquad
Q(x, B) = \int_B q(x, x')\, \lambda(\mathrm{d}x'), \quad B \in \mathcal{X},
\]
\[
t[(x, y), (x', y')] \triangleq q(x, x')\, g(x', y'), \tag{2.2}
\]

where t[(x, y), (x', y')] is the transition density function of T. For the parameterized case, the transition densities will be indexed by a subscript θ, and thus written g_θ, q_θ.

As mentioned briefly in Section 1.4, the fundamental problem in hidden Markov modeling is making inference about a hidden sequence X_0, …, X_n, given a fully specified model and the corresponding observations Y_0, …, Y_n. In particular, one is often interested in the expectation of f(X_{k:k'}) given a sequence of observations Y_{0:n} = y_{0:n}. First define the conditional distribution of X_{k:k'} given Y_{0:n} as φ_{k:k'|n;θ}, and consider the expectation E_θ[f(X_{k:k'}) | Y_{0:n}]. The expectation can then be written as

\[
\mathbb{E}_\theta[f(X_{k:k'}) \mid Y_{0:n}] = \int \cdots \int f(x_{k:k'})\, \phi_{k:k'|n;\theta}(Y_{0:n}, \mathrm{d}x_{k:k'}), \tag{2.3}
\]

which makes it possible to express the expectation as

\[
\phi_{k:k'|n;\theta} f \triangleq \mathbb{E}_\theta[f(X_{k:k'}) \mid Y_{0:n}]. \tag{2.4}
\]

If the HMM is assumed to be fully dominated, the conditional distribution φ_{k:k'|n;θ} can be expressed using the joint distribution of X_{0:t} and Y_{0:t}, which can be written as

\[
p_\theta(x_{0:t}, y_{0:t}) = \nu(x_0)\, g_\theta(x_0, y_0) \prod_{k=1}^{t} q_\theta(x_{k-1}, x_k)\, g_\theta(x_k, y_k). \tag{2.5}
\]

An expression for the conditional distribution can thereafter be obtained by

\[
\phi_{0:t|t;\theta} = p_\theta(x_{0:t} \mid y_{0:t}) = \frac{p_\theta(x_{0:t}, y_{0:t})}{p_\theta(y_{0:t})} = \frac{p_\theta(x_{0:t}, y_{0:t})}{\int_{\mathsf{X}^t} p_\theta(x_{0:t}, y_{0:t})\, \mathrm{d}x_{0:t}}. \tag{2.6}
\]

One is often particularly interested in the smoothing distribution, defined as φ_{0:t|t;θ}. Using Equations (2.5) and (2.6) it is possible to express the joint-smoothing distribution as

\[
\phi_{0:t|t;\theta} = \frac{\nu(x_0)\, g_\theta(x_0, y_0) \prod_{k=1}^{t} q_\theta(x_{k-1}, x_k)\, g_\theta(x_k, y_k)}{\int_{\mathsf{X}^t} \nu(x_0)\, g_\theta(x_0, y_0) \prod_{k=1}^{t} q_\theta(x_{k-1}, x_k)\, g_\theta(x_k, y_k)\, \mathrm{d}x_{0:t}}. \tag{2.7}
\]

This distribution will prove to be important when making inference for HMMs.
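As a small sketch (not from the thesis), the joint density (2.5) is easy to evaluate in log-space once ν, q_θ and g_θ are available as functions; the Gaussian choices below are purely illustrative:

```python
import math

def log_joint_density(x, y, log_nu, log_q, log_g):
    """Log of p_theta(x_{0:t}, y_{0:t}) in (2.5):
    nu(x_0) g(x_0, y_0) * prod_k q(x_{k-1}, x_k) g(x_k, y_k)."""
    total = log_nu(x[0]) + log_g(x[0], y[0])
    for k in range(1, len(x)):
        total += log_q(x[k - 1], x[k]) + log_g(x[k], y[k])
    return total

def log_norm(z, mean=0.0, var=1.0):
    """Log-density of a normal distribution (helper for the toy example)."""
    return -0.5 * math.log(2 * math.pi * var) - (z - mean) ** 2 / (2 * var)

# Illustrative densities: nu = N(0,1), q(x', x) = N(0.5 x', 1), g(x, y) = N(x, 1).
log_nu = lambda x0: log_norm(x0)
log_q = lambda xp, x: log_norm(x, mean=0.5 * xp)
log_g = lambda x, y: log_norm(y, mean=x)
```

Working in log-space avoids the numerical underflow that the raw product in (2.5) would cause for long observation sequences.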


More generally, the smoothed expectation in Equation (2.4) can be formulated as

\[
\phi_{k:k'|n;\theta} f = \frac{\int \cdots \int f(x_{k:k'}) \Bigl( \prod_{m=0}^{n} g_{m;\theta}(x_m) \Bigr)\, \nu(\mathrm{d}x_0) \prod_{l=0}^{\max(n,k')-1} Q_\theta(x_l, \mathrm{d}x_{l+1})}{L_\theta(y_{0:n})}, \tag{2.8}
\]

with

\[
L_\theta(y_{0:n}) = \int \cdots \int g_{0;\theta}(x_0)\, \nu(\mathrm{d}x_0) \prod_{l=0}^{n-1} g_{l+1;\theta}(x_{l+1})\, Q_\theta(x_l, \mathrm{d}x_{l+1}) \tag{2.9}
\]

being the observed-data likelihood. Closely related to the joint-smoothing distribution are the filter and predictor distributions. Define the filter distribution as φ_{t;θ} ≜ φ_{t:t|t;θ}, and the predictor distribution as π_{t+1;θ} ≜ φ_{t+1:t+1|t;θ}. The filter recursion in [10] provides that for all t ∈ N and f ∈ F(X),

\[
\phi_{t;\theta} f = \frac{\pi_{t;\theta}(g_{t;\theta} f)}{\pi_{t;\theta}\, g_{t;\theta}}, \tag{2.10}
\]
\[
\pi_{t+1;\theta} f = \phi_{t;\theta} Q_\theta f, \tag{2.11}
\]

where by convention π_{0;θ} ≜ ν. These quantities will prove to be important in the following chapters, and in particular in Section 2.3.

One may also consider the time-reversed state process, which in [6] is proven to have the Markov property. If one considers a fully dominated model, the distribution of X_s given X_{s+1} and Y_{0:t} = y_{0:t} is given by a backward kernel. For x_{s+1} ∈ X and f ∈ F(X), this backward kernel can be written as

\[
\overleftarrow{Q}_{\phi_{s;\theta}} f(x_{s+1}) \triangleq \frac{\int f(x_s)\, q_\theta(x_s, x_{s+1})\, \phi_{s;\theta}(\mathrm{d}x_s)}{\int q_\theta(x'_s, x_{s+1})\, \phi_{s;\theta}(\mathrm{d}x'_s)}. \tag{2.12}
\]

Using the backward kernel, one can also express the joint-smoothing distribution φ_{0:t|t;θ} as

\[
\phi_{0:t|t;\theta} = \phi_{t;\theta} T_{t;\theta}. \tag{2.13}
\]

Here the kernel T_{t;θ} describes the inhomogeneous Markov chain initialized at x ∈ X, evolving backwards in time according to the backward kernels. Often one is interested in computing smoothed expectations of some additive objective function h_t(x_{0:t}), which can be defined as

\[
h_t(x_{0:t}) = \sum_{s=0}^{t-1} \tilde{h}_s(x_{s:s+1}). \tag{2.14}
\]

This setup results in the possibility of recursive computation of {T_{t;θ} h_t}_{t∈N} by

\[
T_{t+1;\theta} h_{t+1}(x_{t+1}) = \int \left\{ T_{t;\theta} h_t(x_t) + \tilde{h}_t(x_{t:t+1}) \right\} \overleftarrow{Q}_{\phi_{t;\theta}}(x_{t+1}, \mathrm{d}x_t). \tag{2.15}
\]

Being able to recursively compute {T_{t;θ} h_t}_{t∈N} will prove to be essential in the following chapters.
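The recursion (2.15) admits a direct particle implementation: replacing φ_{t;θ} by a weighted particle sample turns the backward kernel (2.12) into a self-normalized weighted sum. The sketch below is not the thesis's PaRIS algorithm (which samples a few backward indices instead of summing over all N particles); it assumes an illustrative Gaussian transition density and a constant increment h̃ ≡ 1, for which every statistic must equal exactly t after t updates:

```python
import numpy as np

rng = np.random.default_rng(0)

def q(xp, x):
    """Illustrative Gaussian transition density q(x_prev, x)."""
    return np.exp(-0.5 * (x - 0.9 * xp) ** 2) / np.sqrt(2 * np.pi)

N, T = 200, 10
xi = rng.normal(size=N)          # particles approximating the filter phi_t
w = np.ones(N) / N               # uniform weights for simplicity
tau = np.zeros(N)                # tau[i] approximates T_t h_t(xi[i])

h_tilde = lambda xp, x: 1.0      # additive increment; constant for the check

for t in range(T):
    xi_new = 0.9 * xi + rng.normal(size=N)          # propagate particles
    tau_new = np.empty(N)
    for i in range(N):
        # Backward weights (2.12): proportional to w_j * q(xi_j, xi_new_i)
        bw = w * q(xi, xi_new[i])
        bw /= bw.sum()
        # Recursion (2.15): expectation under the backward kernel
        tau_new[i] = np.sum(bw * (tau + h_tilde(xi, xi_new[i])))
    xi, tau = xi_new, tau_new
```

This exhaustive-sum version costs O(N²) per time step; the point of PaRIS, discussed in Chapter 3, is to reduce this cost by Monte Carlo sampling of the backward indices.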

2.1.2 Tangent filters

Tangent filters are defined as the gradient of the prediction filter π_{t;θ} f_t. That is, the tangent filter η_{t;θ} is defined by

\[
\eta_{t;\theta} f_t \triangleq \nabla_\theta\, \pi_{t;\theta} f_t, \quad f_t \in \mathsf{F}(\mathcal{X}). \tag{2.16}
\]

In order to obtain an expression for the gradient of the prediction distribution, one might first consider the gradient of φ_{0:t|t−1;θ}, and thereafter obtain the marginal. This is done by Olsson and Westerborn in [10], where, by first determining ∇_θ φ_{0:t|t−1;θ} f_{0:t} and thereafter marginalizing while using the backward decomposition in Equation (2.13), the following expression for the gradient of the prediction filter could be determined:

\[
\nabla_\theta\, \pi_{t;\theta} f_t = \pi_{t;\theta}\{ (T_{t;\theta} h_{t;\theta} - \pi_{t;\theta} T_{t;\theta} h_{t;\theta})\, f_t \}. \tag{2.17}
\]

By expressing the tangent filters as in Equation (2.17) it is possible to recursively update the filter derivatives. This is due to the fact that both the predictors and the backward statistics {T_{t;θ} h_t}_{t∈N} can be updated recursively, through the filtering recursion in Equation (2.10) and the recursion in Equation (2.15). However, in most cases it is not possible to obtain closed-form expressions for the predictors or the backward statistics. This problem can be overcome by the use of sequential Monte Carlo methods, which will be discussed further in Section 2.3.

2.1.3 Parameter estimation in fully dominated HMMs

Parameter estimation in fully dominated HMMs can be viewed as maximum likelihood estimation from incomplete data, as proposed by Dempster et al. [11]. The term incomplete data refers to the fact that the observed data does not include observations of the hidden states.

Given a σ-finite measure λ on (X, 𝒳) and a family of densities {f(·; θ)}_{θ∈Θ}, we consider the task of maximizing the likelihood L(θ) defined as

\[
L(\theta) \triangleq \int f(x; \theta)\, \lambda(\mathrm{d}x) \tag{2.18}
\]

with respect to θ. In most applications, f(x; θ) is a simple function of θ. However, L(θ) is usually given by some high-dimensional integral, which makes classical maximization methods difficult. In the HMM case, f is the joint probability density function of the pair of random variables X and Y, with X being hidden and Y observed. Here f will be referred to as the complete-data likelihood, and L as the density of Y, which is the available likelihood for estimating θ. Note that since L(θ) is assumed to be positive, maximizing L(θ) is equivalent to maximizing ℓ(θ) ≜ log L(θ). A common and regularly implemented method for solving the optimization problem presented above is the expectation-maximization algorithm

(abbreviated EM). The EM algorithm is described by Dempster et al. [11], and is based on the concept of an intermediate quantity Q, defined as

\[
\mathcal{Q}(\theta, \theta') \triangleq \int \log f(x; \theta)\, p(x; \theta')\, \lambda(\mathrm{d}x), \tag{2.19}
\]

where p(x; θ) ≜ f(x; θ)/L(θ) is the conditional density of X given Y. The intermediate quantity is fully defined, and may be used as a surrogate for ℓ(θ).

One can interpret Q(θ; θ') as E_{θ'}[log f(X; θ)], given that X is distributed according to the probability density p(·; θ'). Instead of maximizing ℓ(θ), which may be impossible, one maximizes Q with respect to θ, which yields a maximum likelihood estimate θ* of θ. The EM algorithm can be summarized in two steps that are repeated until the generated sequence of estimates {θ_i} converges. The algorithm is initialized by some initial guess θ_0 ∈ Θ, after which the two steps follow:

• E-step: Determine Q(θ; θ_i).

• M-step: Determine θ_{i+1} ∈ Θ which maximizes Q(θ; θ_i).

These steps are repeated until convergence is achieved.
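The two steps above can be made concrete on a model much simpler than the thesis's HMM: a two-component Gaussian mixture with known unit variances, where the component label is the missing data and both the E-step and the M-step have closed form. This is an illustrative sketch, not the thesis's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic incomplete data: we observe y from a two-component mixture;
# the component label x is the hidden ("missing") variable.
y = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 300)])

def em_step(y, pi, mu1, mu2):
    # E-step: responsibilities r_k = p(x_k = 1 | y_k) under current parameters
    d1 = pi * np.exp(-0.5 * (y - mu1) ** 2)
    d2 = (1 - pi) * np.exp(-0.5 * (y - mu2) ** 2)
    r = d1 / (d1 + d2)
    # M-step: closed-form maximizers of the intermediate quantity Q
    return r.mean(), (r * y).sum() / r.sum(), ((1 - r) * y).sum() / (1 - r).sum()

def log_lik(y, pi, mu1, mu2):
    c = 1 / np.sqrt(2 * np.pi)
    return np.log(pi * c * np.exp(-0.5 * (y - mu1) ** 2)
                  + (1 - pi) * c * np.exp(-0.5 * (y - mu2) ** 2)).sum()

pi, mu1, mu2 = 0.5, -1.0, 1.0
lls = []
for _ in range(50):
    pi, mu1, mu2 = em_step(y, pi, mu1, mu2)
    lls.append(log_lik(y, pi, mu1, mu2))
```

The recorded log-likelihoods are non-decreasing, which is EM's defining guarantee; the same monotonicity holds for the (far less tractable) HMM case treated later.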

There are, however, alternative methods to the EM-algorithm, such as gradient-based stochastic approximation methods. In models where the EM-algorithm may be applied, it is also possible to evaluate the derivatives of the objective function ℓ(θ) with respect to θ. If one considers the Robbins-Monro scheme, at iteration n let

\[
\theta_n = \theta_{n-1} + \gamma_n Z_n, \tag{2.20}
\]

where Z_n is a noisy measurement of ∇_θ ℓ_θ(y_{0:t})|_{θ=θ_{n−1}}, and {γ_n} is a sequence satisfying the stochastic approximation requirements Σ_{n=1}^∞ γ_n = ∞ and Σ_{n=1}^∞ γ_n² < ∞. If there is a large number of observations, approximating Z_n at each iteration will be computationally costly, since every observation is required in the approximation. Further, for every new observation that becomes available, Z_n needs to be recalculated, which makes this an offline algorithm. The algorithm can, however, be transformed into an online algorithm, as described in [10]; this will be further explained in Section 3.3.
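A toy instance of the scheme (2.20), not tied to any HMM: maximize ℓ(θ) = −(θ − 2)², observing its gradient only up to additive noise, with the step sizes γ_n = n^(−0.7) satisfying both stochastic approximation conditions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy measurement Z_n of the gradient of l(theta) = -(theta - 2)^2,
# whose maximizer is theta* = 2 (toy objective, not from the thesis).
def noisy_grad(theta):
    return 2.0 * (2.0 - theta) + rng.normal(scale=0.5)

theta = 0.0
for n in range(1, 10001):
    gamma = n ** -0.7        # sum(gamma_n) = inf, sum(gamma_n^2) < inf
    theta += gamma * noisy_grad(theta)
```

Despite never seeing an exact gradient, the iterates settle near θ* = 2; the decaying step sizes average the gradient noise away while still allowing the scheme to travel arbitrarily far from its starting point.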

2.2 Stochastic Volatility Models

As introduced in Section 1.1, stochastic volatility models are new and powerful models for describing the dynamics of stock prices. As mentioned previously, they are able to capture the stochastic behaviour of the volatility of stock prices, which results in models that more closely resemble actual stock prices than models where the volatility is assumed to be constant. A general setup for a discretely observed stochastic volatility model is provided by Genon-Catalot et al. in [12]:

\[
\mathrm{d}Y_t = \varphi(V_t)\,\mathrm{d}t + V_t^{1/2}\,\mathrm{d}B_t, \quad Y_0 = 0,
\]
\[
\mathrm{d}V_t = b(V_t)\,\mathrm{d}t + a(V_t)\,\mathrm{d}W_t, \quad V_0 = \eta, \tag{2.21}
\]

where {B_t, W_t}_{t≥0} is a two-dimensional Brownian motion, {V_t} is a positive diffusion process, and η is a random variable independent of the Brownian motions. The process {Y_t} is discretely observed at times t = t_1, …, t_n, and is defined as Y_t ≜ log(S_t), with S_t being the asset price at time t. By studying the empirical distribution of the observed increments of {Y_t}, Genon-Catalot et al. provide a proof of convergence and a central limit theorem in [12].

By defining φ(V_t), b(V_t) and a(V_t) in Equation (2.21) it is possible to identify known stochastic volatility models. Since most stochastic volatility models are applied to stock prices, φ(V_t) is commonly defined as φ(V_t) ≜ µ − ½V_t. This is obtained by applying Itô's lemma to the stochastic differential equation

\[
\mathrm{d}S_t = \mu S_t\,\mathrm{d}t + V_t^{1/2} S_t\,\mathrm{d}B_t, \quad S_0 = s_0, \tag{2.22}
\]

with the substitution Y_t ≜ log(S_t). Consider the stochastic volatility model proposed by Hull and White in [4]:

\[
\mathrm{d}S_t = \mu S_t\,\mathrm{d}t + V_t^{1/2} S_t\,\mathrm{d}B_t, \qquad
\mathrm{d}V_t = \phi V_t\,\mathrm{d}t + \xi\,\mathrm{d}W_t, \tag{2.23}
\]

which, by introducing the substitution Y_t ≜ log(S_t) and applying Itô's lemma, can be rewritten as

\[
\mathrm{d}Y_t = \left(\mu - \tfrac{1}{2} V_t\right)\mathrm{d}t + V_t^{1/2}\,\mathrm{d}B_t, \qquad
\mathrm{d}V_t = \phi V_t\,\mathrm{d}t + \xi\,\mathrm{d}W_t. \tag{2.24}
\]
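The step from (2.23) to (2.24) can be checked directly. Applying Itô's lemma to f(S) = log S, with the quadratic variation d⟨S⟩_t = V_t S_t² dt implied by (2.23):

```latex
\mathrm{d}\log S_t
= \frac{1}{S_t}\,\mathrm{d}S_t - \frac{1}{2 S_t^2}\,\mathrm{d}\langle S \rangle_t
= \left(\mu\,\mathrm{d}t + V_t^{1/2}\,\mathrm{d}B_t\right) - \frac{1}{2} V_t\,\mathrm{d}t
= \left(\mu - \tfrac{1}{2} V_t\right)\mathrm{d}t + V_t^{1/2}\,\mathrm{d}B_t.
```

The volatility equation is untouched by the substitution, which is why it appears unchanged in (2.24).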

2.3 Sequential Monte Carlo Methods

As mentioned in Section 2.1.2, one is required to have access to the filter and prediction distributions, in addition to the backward statistics, if filter derivative estimation is to be performed. This requirement is somewhat troublesome since, unless the state-space model is linear Gaussian or the state space X is finite, these quantities are not available in closed form. Thus, one has to rely on approximations of these quantities in order to perform filter derivative estimation. Approximation can for instance be done by sequential Monte Carlo methods, such as the sequential importance sampling proposed by Handschin in [13].

2.3.1 The bootstrap particle filter

In the following section it is assumed that all random variables are defined on a common probability space (Ω, F, P). Given are a sequence of observations {y_t}_{t∈N} and a particle sample {ξ_t^i}_{i=1}^N. The predictor π_{t;θ} f can then be approximated by π^N_{t;θ} f as

\[
\pi_{t;\theta} f \approx \frac{1}{N} \sum_{i=1}^{N} f(\xi_t^i) \triangleq \pi_{t;\theta}^N f. \tag{2.25}
\]

Inserting this approximation into the updating step of the filtering recursion in Equation (2.10), the filtering distribution φ_{t;θ} f can then be approximated by φ^N_{t;θ} f as

\[
\phi_{t;\theta} f \approx \sum_{i=1}^{N} \frac{w_t^i}{\Omega_t} f(\xi_t^i) \triangleq \phi_{t;\theta}^N f, \tag{2.26}
\]

where N is the number of particles generated and Ω_t ≜ Σ_{i=1}^N w_t^i. The weights {w_t^i} are generated by w_t^i ≜ g_θ(ξ_t^i, y_t), and can be interpreted as the belief that the generated particle ξ_t^i produced the observation y_t. If one were to take the approach of the naïve sequential importance sampler, the next step would be to generate new particles {ξ_{t+1}^i} by propagating the current sample {ξ_t^i}, a step referred to as mutation. However, this has a major flaw, which is weight degeneration. Weight degeneration means that as the algorithm is repeated over time, a substantial number of the weights will decrease in magnitude to become close to infinitesimal. This is due to the fact that the algorithm mutates particles with small weights. These particles will most likely mutate into particles with even smaller weights, which leads to the phenomenon of weight degeneration. The issue with degenerating weights is that, as the weights degenerate, only a few of the generated particles will affect the approximations in Equations (2.25) and (2.26). Thus, the algorithm will become numerically unstable as time passes, and will yield inaccurate approximations.

The bootstrap particle filter, proposed by Gordon et al. in [14], overcomes this flaw by including a selection step prior to the mutation. The selection step consists of replacing particles with small weights by particles with high weights. Mathematically, this is done by drawing, for each particle, an index

\[
I_t^i \sim \mathrm{Pr}(\{w_t^i\}_{i=1}^N), \quad i = 1, \ldots, N. \tag{2.27}
\]

In words, this means that for each particle an index j is drawn, where the probability of drawing index j is P(I_t^i = j) = w_t^j / Ω_t. Note that Pr({w_t^i}_{i=1}^N) is known as the categorical distribution on {1, …, N} and is defined as just described. This generates a uniform particle sample {(ξ_t^{I_t^i}, 1)}_{i=1}^N that targets φ_{t;θ}. Performing the selection step results in a numerically stable algorithm with non-degenerating weights.

Following the selection step comes, as previously mentioned, the mutation step. As originally proposed by Gordon et al. in [14], this is done by propagating the particles according to the state process. This implies that a new particle sample is drawn as

\[
\xi_{t+1}^i \sim Q_\theta(\xi_t^{I_t^i}, \cdot), \quad i = 1, \ldots, N, \tag{2.28}
\]

with associated weights {w_{t+1}^i}_{i=1}^N computed by w_{t+1}^i = g_θ(ξ_{t+1}^i, y_{t+1}), i = 1, …, N. The algorithm is initialized by drawing {ξ_0^i}_{i=1}^N ∼ ν^{⊗N}.
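The full selection-mutation loop fits in a few lines. The sketch below runs it on a toy linear-Gaussian state-space model; the model, densities, particle count and seed are illustrative choices, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy state-space model: X_{t+1} = 0.9 X_t + U_t, Y_t = X_t + 0.5 V_t.
T, N = 100, 500
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.9 * x[t - 1] + rng.normal()
y = x + 0.5 * rng.normal(size=T)

def g(xi, yt):
    """Observation density g(x, y): N(y; x, 0.25)."""
    return np.exp(-0.5 * (yt - xi) ** 2 / 0.25) / np.sqrt(2 * np.pi * 0.25)

# Bootstrap particle filter: initialization, then selection + mutation.
xi = rng.normal(size=N)                          # draw initial particles from nu
w = g(xi, y[0])
means = [np.sum(w * xi) / np.sum(w)]             # filter-mean estimate (2.26)
for t in range(1, T):
    idx = rng.choice(N, size=N, p=w / w.sum())   # selection step (2.27)
    xi = 0.9 * xi[idx] + rng.normal(size=N)      # mutation via the state kernel (2.28)
    w = g(xi, y[t])                              # new importance weights
    means.append(np.sum(w * xi) / np.sum(w))
means = np.array(means)
```

The estimated filter means track the latent states closely; for this linear-Gaussian toy model the exact answer would be given by the Kalman filter, making it a convenient sanity check.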

The bootstrap particle filter can also be used to obtain an approximation of the smoothed statistic φ_{0:t;θ} f by storing the entire particle trajectory at each time step s + 1. This is done by setting ξ_{0:s+1}^i = (ξ_{0:s}^{I_s^i}, ξ_{s+1}^i), i = 1, …, N. As a result, the smoothing statistic can be approximated as

\[
\phi_{0:t;\theta} f \approx \sum_{i=1}^{N} \frac{w_t^i}{\Omega_t} f(\xi_{0:t}^i) \triangleq \phi_{0:t;\theta}^N f. \tag{2.29}
\]

However, this approximation suffers from a serious drawback called the particle path degeneracy phenomenon. This problem arises from the multiple resampling steps in the bootstrap particle filter. The repeated resampling operations result in collapsing particle trajectories, meaning that prior to some random time T < t all trajectories are the same. This means that at a certain time t > T, all particles {ξ_t^i}_{i=1}^N share a common ancestor at time T. This results in inaccurate approximations of the joint filtering distribution, and thus an alternative approach without degenerating particle paths is desirable. One such solution, proposed by Olsson and Westerborn in [10], is to use the backward decomposition in Equation (2.13); this will be explained further in coming chapters.

2.3.2 Coefficient of variation and efficient sample size

Since an SMC estimator can suffer from weight degeneration, it is of interest to obtain some measure of the performance of the algorithm. This is often done by determining the coefficient of variation CV_N and the efficient sample size N_eff. The coefficient of variation is defined as

\[
\mathrm{CV}_N \triangleq \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( \frac{N w^i}{\sum_{l=1}^{N} w^l} - 1 \right)^2 }, \tag{2.30}
\]

which is minimal when all weights are equal, and maximal when all weights except one are equal to zero. The efficient sample size is defined as

\[
N_{\mathrm{eff}} \triangleq \frac{N}{1 + \mathrm{CV}_N^2}, \tag{2.31}
\]

and can be interpreted as the number of effective particles used by the algorithm in the estimation. Since it is desirable to have an algorithm that uses as many particles as possible, a low value of the coefficient of variation, and an efficient sample size close to the total number of particles N, imply that the algorithm performs well.
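Both diagnostics are one-liners on a weight vector; this small helper (the name cv_and_ess is illustrative) implements (2.30) and (2.31) directly:

```python
import numpy as np

def cv_and_ess(w):
    """Coefficient of variation (2.30) and efficient sample size (2.31)
    for a vector of unnormalized particle weights w."""
    w = np.asarray(w, dtype=float)
    N = w.size
    cv = np.sqrt(np.mean((N * w / w.sum() - 1.0) ** 2))
    return cv, N / (1.0 + cv ** 2)
```

The two extreme cases stated in the text are easy to check: uniform weights give CV_N = 0 and N_eff = N, while a single nonzero weight gives CV_N = sqrt(N − 1) and N_eff = 1.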


3 Methods

In this chapter the methods for deriving the particle-based online estimator of the stochastic volatility will be presented. It provides a thorough explanation of how a general stochastic volatility model can be modeled as an HMM. Thereafter, the HMM formulation of the stochastic volatility model is examined further, with important distributions, parameters and quantities defined. Once the setup for the stochastic volatility HMM is specified, a thorough description of the PaRIS-algorithm will be given. Thereafter, online parameter estimation using the PaRIS-algorithm will be described. The last section of the chapter describes a conventional offline parameter estimator using the PaRIS-algorithm.

3.1 Stochastic Volatility Models as Hidden Markov Models

Since online estimation is to be performed using SMC on HMMs, a crucial first step is rewriting a discretely observed stochastic volatility model, with sampling interval ∆t, as the state-space model of an HMM. Consider the stochastic volatility model defined in Section 2.2 in Equation (2.23). By solving the stochastic differential equations in Equation (2.23) using Itô's lemma we obtain

\[
S_{t_i} = S_{t_i - \Delta t} \exp\left( \left( \mu - \tfrac{1}{2} V_{t_i} \right) \Delta t + U_{t_i}^1 \sqrt{V_{t_i} \Delta t} \right),
\]
\[
V_{t_i} = V_{t_i - \Delta t} \exp\left( \left( \phi - \tfrac{1}{2} \xi^2 \right) \Delta t + U_{t_i}^2\, \xi \sqrt{\Delta t} \right), \tag{3.1}
\]

where {U_{t_i}^1, U_{t_i}^2} are two sequences of independent standard normal random variables. By defining Y_{t_i} ≜ log(S_{t_i} / S_{t_i − ∆t}) and X_{t_i} ≜ log(V_{t_i}), Equation (3.1) can be rewritten as

\[
Y_{t_i} = \mu \Delta t - \tfrac{1}{2} \exp(X_{t_i})\, \Delta t + \sqrt{\Delta t}\, \exp(X_{t_i}/2)\, U_{t_i}^1,
\]
\[
X_{t_i} = X_{t_i - \Delta t} + \left( \phi - \tfrac{1}{2} \xi^2 \right) \Delta t + \xi \sqrt{\Delta t}\, U_{t_i}^2. \tag{3.2}
\]

This can be viewed as the state-space equations of a HMM, with $\{X_t\}$ being the hidden process and $\{Y_t\}$ being the observed one. However, a simplified version of these state-space equations will be considered in this thesis. The simplified model is similar to the stochastic volatility model defined by Cappé and Moulines in [6], Example 80:

$$X_k = \phi X_{k-1} + \sigma U^1_k,$$
$$Y_k = \beta\exp(X_k/2)U^2_k, \tag{3.3}$$

where $(X_k, Y_k) \triangleq (X_{t_k}, Y_{t_k})$, $k = 1, \dots, n$. One can generalize the model specified in Equation (3.3) by including a constant drift in the log-returns, resulting in

$$X_k = \phi X_{k-1} + \sigma U^1_k,$$
$$Y_k = \mu + \beta\exp(X_k/2)U^2_k. \tag{3.4}$$

One can argue that the simplified model specified above captures the same behaviour as the more advanced model in Equation (3.2). For instance, the drift term in the log-volatility in Equation (3.2) should be zero, since otherwise the volatility would either become infinite or vanish over time. The parameters that fully specify the model can be summarized as $\theta = (\mu, \phi, \sigma^2, \beta^2)$. Some parameters worth extra consideration are $\sigma^2$, which is the volatility of the stochastic log-volatility and is often called the vol-vol; $\mu$, which is the drift of the log-returns; and $\phi$, which governs the behaviour of the volatility process. Some remarks regarding $\phi$: if $\phi \in (0, 1)$ the log-volatility is a weakly stationary process, and if $\phi \geq 0$ the model exhibits volatility clustering. Volatility clustering is a phenomenon observed by Mandelbrot in [15]: large changes tend to be followed by large changes, and small changes by small changes. Since this phenomenon occurs frequently in financial data, it is desirable that the model is able to exhibit it.
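As an illustration, the generalized model in Equation (3.4) can be simulated in a few lines. The sketch below is Python (the thesis implementations were carried out in R); the initial distribution $X_0 \sim N(0, \sigma^2)$ and the parameter values are those later used in the validation study of Section 4.1.1.

```python
import numpy as np

def simulate_sv(theta, n, seed=None):
    """Simulate n steps of Equation (3.4):
    X_k = phi*X_{k-1} + sigma*U1_k,  Y_k = mu + beta*exp(X_k/2)*U2_k."""
    mu, phi, sigma2, beta2 = theta
    sigma, beta = np.sqrt(sigma2), np.sqrt(beta2)
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    x[0] = rng.normal(0.0, sigma)               # X_0 ~ nu = N(0, sigma^2)
    for k in range(1, n):
        x[k] = phi * x[k - 1] + sigma * rng.normal()
    y = mu + beta * np.exp(x / 2.0) * rng.normal(size=n)
    return x, y

x, y = simulate_sv((0.75, 0.9, 0.2, 1.0), n=500, seed=1)
```

With $\phi \in (0, 1)$ the simulated log-volatility is weakly stationary, and bursts of large $|Y_k|$ appear whenever $X_k$ is large — the volatility clustering discussed above.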

Considering the model in Equation (3.4), it is possible to derive expressions for the transition and emission densities; one can identify

$$q_\theta(x_{k-1}, x_k) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x_k - \phi x_{k-1})^2}{2\sigma^2}\right),$$
$$g_\theta(x_k, y_k) = \frac{1}{\sqrt{2\pi\beta^2}}\exp\left(-\frac{(y_k - \mu)^2}{2\beta^2}\exp(-x_k) - \frac{1}{2}x_k\right). \tag{3.5}$$

Deriving explicit expressions for the transition densities is of utmost interest, since they will be part of several vital quantities used in the algorithms. The parameters collected in $\theta$ are unknown, and given a set of observations $Y_{0:t} = y_{0:t}$ the model can be calibrated by estimating the parameters. This will be covered later in this chapter.
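The densities in Equation (3.5) translate directly into code; working on the log scale avoids underflow when a particle filter multiplies many small numbers. A minimal Python sketch with illustrative function names (the thesis implementation is in R):

```python
import numpy as np

def log_q(theta, x_prev, x):
    """log q_theta(x_{k-1}, x_k) from Equation (3.5): N(phi*x_{k-1}, sigma^2)."""
    mu, phi, sigma2, beta2 = theta
    return -0.5 * np.log(2 * np.pi * sigma2) - (x - phi * x_prev) ** 2 / (2 * sigma2)

def log_g(theta, x, y):
    """log g_theta(x_k, y_k) from Equation (3.5): N(mu, beta^2 * exp(x_k))."""
    mu, phi, sigma2, beta2 = theta
    return (-0.5 * np.log(2 * np.pi * beta2)
            - (y - mu) ** 2 * np.exp(-x) / (2 * beta2)
            - 0.5 * x)
```

Here q is Gaussian in $x_k$ with mean $\phi x_{k-1}$ and variance $\sigma^2$, and g is Gaussian in $y_k$ with mean $\mu$ and variance $\beta^2\exp(x_k)$.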

3.2 The PaRIS-algorithm

In Section 2.3 the bootstrap particle filter was introduced. Building on that theory, the PaRIS-algorithm formulated in [10] will be presented in this section. As mentioned in Section 2.3, a problem with the bootstrap particle filter is the issue of degenerating particle trajectories. In [10] a solution to this problem is proposed, based on the backward decomposition. In order to use the backward decomposition defined in Equation (2.13), one is required to approximate each backward kernel $\overleftarrow{Q}_{\phi_{s;\theta}}$, $s \in \mathbb{N}$. This can be done through Monte Carlo estimation as

$$\overleftarrow{Q}^N_{\phi_{s;\theta}}f(x) = \sum_{i=1}^N \frac{w^i_s q_\theta(\xi^i_s, x)}{\sum_{l=1}^N w^l_s q_\theta(\xi^l_s, x)}\, f(\xi^i_s), \quad (x, f) \in \mathsf{X} \times \mathsf{F}(\mathcal{X}). \tag{3.6}$$

By replacing $\overleftarrow{Q}_{\phi_{s;\theta}}$ with $\overleftarrow{Q}^N_{\phi_{s;\theta}}$ in Equation (2.15), and assuming estimates $\{\tilde{\tau}^i_t\}_{i=1}^N$ of $\{T_t h_t(\xi^i_t)\}_{i=1}^N$ are available, it is possible to recursively update the estimates as

$$\tilde{\tau}^i_{t+1} = \sum_{j=1}^N \frac{w^j_t q_\theta(\xi^j_t, \xi^i_{t+1})}{\sum_{l=1}^N w^l_t q_\theta(\xi^l_t, \xi^i_{t+1})}\left(\tilde{\tau}^j_t + \tilde{h}_t(\xi^j_t, \xi^i_{t+1})\right). \tag{3.7}$$

In practice, the recursion is initialized by letting $\tilde{\tau}^i_0 = 0$, $i = 1, \dots, N$. Using these estimates, the expectation $\phi_{0:t+1|t;\theta}h_{t+1}$ can be approximated as $N^{-1}\sum_{i=1}^N \tilde{\tau}^i_{t+1}$.

What makes this method promising is that it both allows for online estimation of $\phi_{0:t+1|t;\theta}h_{t+1}$ and requires few variables to be stored in memory. However, it suffers from high computational complexity, since at every time step $N$ terms need to be summed and computed for each particle. This results in a scenario where one is required to keep the number of particles low in order to avoid a slow algorithm, which results in low precision. The PaRIS-algorithm works around this problem by replacing Equation (3.7) with the Monte Carlo approximation

$$\tau^i_{t+1} = \frac{1}{\tilde{N}}\sum_{j=1}^{\tilde{N}}\left(\tau^{J^{(i,j)}_{t+1}}_t + \tilde{h}_t\left(\xi^{J^{(i,j)}_{t+1}}_t, \xi^i_{t+1}\right)\right), \tag{3.9}$$

with $\tilde{N} \in \mathbb{N}^*$ being the sample size of the Monte Carlo estimate. Further, $\{J^{(i,j)}_{t+1}\}_{j=1}^{\tilde{N}}$ are i.i.d. samples from $\mathrm{Pr}(\{w^l_t q_\theta(\xi^l_t, \xi^i_{t+1})\}_{l=1}^N)$. Drawing these samples can be done through rejection sampling [16]: first draw $J^* \sim \mathrm{Pr}(\{w^l_t\}_{l=1}^N)$ and thereafter accept the draw with probability $q_\theta(\xi^{J^*}_t, \xi^i_{t+1})/\bar{\varepsilon}$, where $\bar{\varepsilon}$ is a constant satisfying $\bar{\varepsilon} \geq q_\theta$. Using the Monte Carlo estimate, it is possible to obtain an estimate of $\phi_{0:t+1|t;\theta}h_{t+1}$ as

$$\phi^N_{0:t+1|t;\theta}h_{t+1} \triangleq \frac{1}{N}\sum_{i=1}^N \tau^i_{t+1}. \tag{3.10}$$

3.2.1 Tangent filter estimation

The PaRIS-algorithm can effectively be implemented to estimate the tangent filter specified in Equation (2.17). The algorithm allows for online estimation of the flow $\{\eta_{t;\theta}\}_{t\in\mathbb{N}}$ as new observations become available. By targeting the predictor distributions through the use of a particle filter, while recursively updating estimates $\{\tau^i_t\}_{i=1}^N \approx \{T_{t;\theta}h_{t;\theta}(\xi^i_t)\}_{i=1}^N$ as outlined in Equation (3.9), it is possible to derive estimates of the tangent filters. Note that $h_{t;\theta}$ is the complete data score function defined as

$$h_{t;\theta}(x_{0:t}) = \sum_{s=0}^{t-1}\tilde{h}_{s;\theta}(x_{s:s+1}), \quad \text{with} \quad \tilde{h}_{s;\theta}(x_{s:s+1}) = \nabla_\theta\log g_{s;\theta}(x_s) + \nabla_\theta\log q_\theta(x_s, x_{s+1}). \tag{3.11}$$

For any $f_t \in \mathsf{F}(\mathcal{X})$, the PaRIS-algorithm returns, at each time step t, an estimate

$$\eta^N_{t;\theta}f_t = \frac{1}{N}\sum_{i=1}^N\left(\tau^i_t - \bar{\tau}_t\right)f_t(\xi^i_t), \tag{3.12}$$

which targets $\eta_{t;\theta}f_t$ (cf. line 13 of Algorithm 1). In [10] the algorithm is summarized in pseudo-code, and the formulation is presented below. Note that every line is repeated for $i \in (1, \dots, N)$.

Algorithm 1 PaRIS
1:  Input: (y_{0:t_end}, θ)
2:  draw ξ^i_0 ∼ ν
3:  set τ^i_0 ← 0
4:  for t ← 0, 1, ..., t_end − 1 do
5:      set w^i_t ← g_{t;θ}(ξ^i_t)
6:      draw I^i_t ∼ Pr({w^l_t}^N_{l=1})
7:      draw ξ^i_{t+1} ∼ q_θ(ξ^{I^i_t}_t, ·)
8:      for j ← 1, 2, ..., Ñ do
9:          draw J^{(i,j)}_{t+1} ∼ Pr({w^l_t q_θ(ξ^l_t, ξ^i_{t+1})}^N_{l=1})
10:     end for
11:     set τ^i_{t+1} ← Ñ^{−1} Σ^{Ñ}_{j=1} (τ^{J^{(i,j)}_{t+1}}_t + h̃_{t;θ}(ξ^{J^{(i,j)}_{t+1}}_t, ξ^i_{t+1}))
12:     set τ̄_{t+1} ← N^{−1} Σ^N_{l=1} τ^l_{t+1}
13:     set η^N_{t+1;θ} ← N^{−1} Σ^N_{l=1} (τ^l_{t+1} − τ̄_{t+1}) δ_{ξ^l_{t+1}}
14: end for

Note that line 9 in the algorithm above is performed using the rejection sampling described previously. The rejection sampling can be summarized in the pseudo-code below.

Algorithm 2 draw J^{(i,j)}_{t+1} using rejection sampling
1:  Input: ({w^l_t}^N_{l=1}, {ξ^l_t}^N_{l=1}, ξ^i_{t+1})
2:  set accepted ← FALSE
3:  while accepted = FALSE do
4:      draw J* ∼ Pr({w^l_t}^N_{l=1})
5:      draw U ∼ Unif(0, 1)
6:      if U ≤ q_θ(ξ^{J*}_t, ξ^i_{t+1})/ε̄ then
7:          set J^{(i,j)}_{t+1} ← J*
8:          set accepted ← TRUE
9:      end if
10: end while
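A minimal Python sketch of Algorithm 2, together with the τ update it feeds (line 11 of Algorithm 1), might look as follows. This is illustrative rather than the thesis's R implementation: `q_bound` plays the role of the dominating constant ε̄, scalar τ's are assumed for brevity (in the parameter estimation they are vectors), and the helper names are hypothetical.

```python
import numpy as np

def draw_backward_index(rng, w, xi_prev, xi_next, q, q_bound):
    """Algorithm 2: draw J ~ Pr({w_l * q(xi_prev[l], xi_next)}) by
    rejection sampling; q_bound must dominate q (the constant eps_bar)."""
    probs = np.asarray(w, dtype=float)
    probs = probs / probs.sum()
    while True:
        j = rng.choice(len(probs), p=probs)            # J* ~ Pr({w_l})
        if rng.uniform() <= q(xi_prev[j], xi_next) / q_bound:
            return j                                   # accept J*

def update_tau(rng, w, xi_prev, xi_next, tau_prev, h_tilde, q, q_bound, n_tilde=2):
    """Line 11 of Algorithm 1: tau_{t+1}^i averaged over n_tilde backward draws."""
    tau_new = np.empty(len(xi_next))
    for i, x_new in enumerate(xi_next):
        js = [draw_backward_index(rng, w, xi_prev, x_new, q, q_bound)
              for _ in range(n_tilde)]
        tau_new[i] = np.mean([tau_prev[j] + h_tilde(xi_prev[j], x_new) for j in js])
    return tau_new
```

The acceptance probability q/ε̄ means a tighter bound ε̄ gives fewer rejected proposals, which is why the case study later adapts ε̄ to the current estimate of σ².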

3.3 PaRIS-Based Online Parameter Estimation

The algorithm outlined in Section 3.2 estimates the flow of filter derivatives for a fully specified model with known parameter θ. However, the state-space model for stochastic volatility introduced in Section 3.1 is not fully specified, since the parameters collected in θ are not initially known. As mentioned in Section 2.1.3, likelihood optimization of incomplete data models can be applied in order to obtain an estimate of θ, given a state-space model and observations $Y_{0:t} = y_{0:t}$. As described in Section 2.1.3, likelihood optimization of incomplete data models consists of maximizing the likelihood defined in Equation (2.18) with respect to θ. Since the optimization problem is complex, due to the fact that the log-likelihood $l_\theta(y_{0:t})$ cannot be expressed in closed form, an approximation of the optimization will be formed. As described previously, the two most common approaches to this approximation are the EM-algorithm and gradient-based methods. In [10] a gradient method using the PaRIS-algorithm is presented, which uses the approximated filter derivatives from PaRIS in order to approximate the gradient of the log-likelihood. The PaRIS-based parameter estimator proposed by Olsson and Westerborn in [10] is an online estimator, which is relevant to this thesis since an online estimator is to be compared to an offline one. The optimization problem to be solved is

$$\theta^* = \operatorname*{argmax}_{\theta\in\Theta} L_\theta(y_{0:t}), \tag{3.13}$$

which is equivalent to solving

$$\theta^* = \operatorname*{argmax}_{\theta\in\Theta} l_\theta(y_{0:t}), \tag{3.14}$$

where $l_\theta(y_{0:t}) \triangleq \log L_\theta(y_{0:t})$. Finding the optimal θ* can therefore be done by determining the root of $\nabla_\theta l_\theta(y_{0:t})$, which can be done through stochastic approximation via the Robbins-Monro scheme. Consider the setup

$$\theta_{n+1} = \theta_n + \gamma_{n+1}\nabla_\theta l_\theta(y_{0:t})\big|_{\theta=\theta_n}. \tag{3.15}$$

As mentioned in Section 2.1.3, this setup results in an offline estimator, since the gradients of the objective function need to be recalculated as new observations become available. The estimator proposed by Olsson and Westerborn instead relies on the decomposition $l_\theta(y_{0:t}) = \sum_{s=0}^{t} l_\theta(y_s \mid y_{0:s-1})$, with $l_\theta(y_0 \mid y_{0:-1}) = l_\theta(y_0)$. By using the decomposition it is possible to

derive $\nabla_\theta l_\theta(y_{0:t}) = \sum_{s=0}^t\nabla_\theta l_\theta(y_s \mid y_{0:s-1})$. Since the objective function $l_\theta(y_{0:t})$ can be expressed as a sum of the one-step predictor log-likelihoods, the optimization problem can be cast into the framework of stochastic optimization through the use of stochastic sub-gradients. Instead of focusing on the gradient of $l_\theta(y_{0:t})$, one can focus on the gradient of each one-step predictor log-likelihood. Rewriting the updating scheme of $\theta_n$ using the stochastic sub-gradient method consists of replacing $\nabla_\theta l_\theta(y_{0:t})|_{\theta=\theta_{n-1}}$ with its stochastic sub-gradient $\nabla_\theta l_\theta(y_n \mid y_{0:n-1})|_{\theta=\theta_{n-1}}$. This results in the following updating scheme:

$$\theta_n = \theta_{n-1} + \gamma_n\nabla_\theta l_\theta(y_n \mid y_{0:n-1})\big|_{\theta=\theta_{n-1}}. \tag{3.16}$$
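The scheme in Equation (3.16) is an ordinary Robbins-Monro recursion, and its behaviour can be seen on a toy problem with a known maximizer. The quadratic objective below is purely illustrative and unrelated to the stochastic volatility likelihood; the step sizes $\gamma_n = n^{-0.6}$ are those later used in the case study.

```python
import numpy as np

# Toy Robbins-Monro run: maximize f(theta) = -(theta - 2)^2 / 2 using
# noisy gradient observations g_n = (2 - theta_n) + noise.
rng = np.random.default_rng(1)
theta = 0.0                                  # arbitrary initial value
for n in range(1, 20001):
    gamma = n ** (-0.6)                      # gamma_n = n^(-0.6)
    noisy_grad = (2.0 - theta) + rng.normal()
    theta += gamma * noisy_grad              # Equation (3.16)
```

The conditions $\sum_n\gamma_n = \infty$ and $\sum_n\gamma_n^2 < \infty$ hold for this choice of step sizes, which is what allows convergence despite the gradient noise.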

One now needs to compute the sub-gradient $\nabla_\theta l_\theta(y_n \mid y_{0:n-1})$. This can be done by first deriving an expression for $L_\theta(y_n \mid y_{0:n-1})$, proceeding by using the definition $l_\theta(y_n \mid y_{0:n-1}) \triangleq \log L_\theta(y_n \mid y_{0:n-1})$, and finally differentiating the expression. Starting with $L_\theta(y_n \mid y_{0:n-1})$, one can rewrite the one-step predictor likelihood as

$$L_\theta(y_n \mid y_{0:n-1}) = \frac{L_\theta(y_{0:n})}{L_\theta(y_{0:n-1})}. \tag{3.17}$$

Using Equation (2.8) one can then express $L_\theta(y_{0:n})$ and $L_\theta(y_{0:n-1})$ as

$$L_\theta(y_{0:n}) = \frac{\int\cdots\int f(x_n)\left(\prod_{m=0}^n g_{m;\theta}(x_m)\right)\nu(dx_0)\prod_{l=0}^{n-1}Q_\theta(x_l, dx_{l+1})}{\phi_{n:n|n;\theta}f},$$
$$L_\theta(y_{0:n-1}) = \frac{\int\cdots\int f(x_n)\left(\prod_{m=0}^{n-1} g_{m;\theta}(x_m)\right)\nu(dx_0)\prod_{l=0}^{n-1}Q_\theta(x_l, dx_{l+1})}{\phi_{n:n|n-1;\theta}f}. \tag{3.18}$$

Inserting the expressions from Equation (3.18) in Equation (3.17), one then obtains

$$L_\theta(y_n \mid y_{0:n-1}) = \frac{\phi_{n:n|n-1;\theta}(f g_{n;\theta})}{\phi_{n:n|n;\theta}f}. \tag{3.19}$$

Using the filter recursion from Equation (2.10), the expression can be rewritten as $L_\theta(y_n \mid y_{0:n-1}) = \pi_{n;\theta}g_{n;\theta}$.

By taking the logarithm of the expression, it is possible to express the one-step predictor log-likelihood as $l_\theta(y_n \mid y_{0:n-1}) = \log(\pi_{n;\theta}g_{n;\theta})$. Finally, differentiating the expression results in the sub-gradient, which can be written as

$$\nabla_\theta l_\theta(y_n \mid y_{0:n-1}) = \frac{\pi_{n;\theta}(\nabla_\theta g_{n;\theta}) + \eta_{n;\theta}g_{n;\theta}}{\pi_{n;\theta}g_{n;\theta}}, \tag{3.21}$$

with $\eta_{n;\theta}$ being the tangent filter defined in Equation (2.16).

Since the quantities $\pi_{n;\theta}(\nabla_\theta g_{n;\theta})$, $\eta_{n;\theta}g_{n;\theta}$ and $\pi_{n;\theta}g_{n;\theta}$ in general may not be expressed in closed form, the estimator proposed by Olsson and Westerborn in [10] proceeds by approximating these quantities using the PaRIS-algorithm. The updating scheme for the parameter θ proposed in [10] at observation n + 1 can be summarized as

$$\theta_{n+1} = \theta_n + \gamma_{n+1}\hat{\zeta}_{n+1}, \tag{3.22}$$

where $\hat{\zeta}_{n+1}$ is a particle estimate, obtained using the PaRIS-algorithm described in Section 3.2, of the gradient of the one-step predictor log-likelihood in Equation (3.21). Given a set of observations $Y_{0:t} = y_{0:t}$ and a state-space model with unknown parameter θ, the proposed estimator finds an estimate θ* of the true parameter θ by "sweeping" through the observations. As new observations $Y_{t+1:t'} = y_{t+1:t'}$ become available, the algorithm is able to update the parameter estimate by continuing from where it stopped. Hence, it is an online algorithm. Olsson and Westerborn describe the estimator in [10]; it is summarized in pseudo-code below.

Algorithm 3 PaRIS-based online parameter estimation
1:  Input: (y_{0:t_end})
2:  set θ_0 arbitrarily
3:  draw ξ^i_0 ∼ ν
4:  set τ^i_0 ← 0
5:  for t ← 0, 1, ..., t_end − 1 do
6:      set w^i_t ← g_{θ_t}(ξ^i_t, y_t)
7:      draw I^i ∼ Pr({w^l_t}^N_{l=1})
8:      draw ξ^i_{t+1} ∼ q_{θ_t}(ξ^{I^i}_t, ·)
9:      for j ← 1, 2, ..., Ñ do
10:         draw J^{(i,j)}_{t+1} ∼ Pr({w^l_t q_{θ_t}(ξ^l_t, ξ^i_{t+1})}^N_{l=1})
11:     end for
12:     set τ^i_{t+1} ← Ñ^{−1} Σ^{Ñ}_{j=1} (τ^{J^{(i,j)}_{t+1}}_t + h̃_{t;θ_t}(ξ^{J^{(i,j)}_{t+1}}_t, ξ^i_{t+1}))
13:     set τ̄_{t+1} ← N^{−1} Σ^N_{l=1} τ^l_{t+1}
14:     set ζ̂^1_{t+1} ← N^{−1} Σ^N_{l=1} ∇_θ g_{θ_t}(ξ^l_{t+1}, y_{t+1})
15:     set ζ̂^2_{t+1} ← N^{−1} Σ^N_{l=1} (τ^l_{t+1} − τ̄_{t+1}) g_{θ_t}(ξ^l_{t+1}, y_{t+1})
16:     set ζ̂^3_{t+1} ← N^{−1} Σ^N_{l=1} g_{θ_t}(ξ^l_{t+1}, y_{t+1})
17:     set θ_{t+1} ← θ_t + γ_{t+1} (ζ̂^1_{t+1} + ζ̂^2_{t+1}) / ζ̂^3_{t+1}
18: end for

As explained previously, rejection sampling is implemented to draw $J^{(i,j)}_{t+1}$ according to Algorithm 2.
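Lines 14-17 of Algorithm 3 amount to three particle averages followed by one stochastic-approximation step. A minimal Python sketch, with illustrative names and the convention that `grad_g_vals` and `tau` hold one row per particle (the thesis implementation is in R):

```python
import numpy as np

def score_estimate(g_vals, grad_g_vals, tau):
    """Particle estimate of the one-step predictor score (Equation (3.21)),
    formed from zeta1, zeta2, zeta3 as in lines 14-16 of Algorithm 3.
    g_vals: (N,) emission densities; grad_g_vals, tau: (N, d)."""
    tau_bar = tau.mean(axis=0)
    zeta1 = grad_g_vals.mean(axis=0)                      # line 14
    zeta2 = ((tau - tau_bar) * g_vals[:, None]).mean(axis=0)  # line 15
    zeta3 = g_vals.mean()                                 # line 16
    return (zeta1 + zeta2) / zeta3

def sa_update(theta, gamma, zeta_hat):
    """Line 17 of Algorithm 3: theta <- theta + gamma * zeta_hat."""
    return theta + gamma * zeta_hat
```
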

3.4 Batch EM Parameter Estimation

As mentioned previously, there are mainly two approaches to the task of parameter estimation. The estimator formulated in Section 3.3 is a so-called online estimator, based on the gradient of the objective function. However, parameter estimation can also be performed using an offline algorithm, based on the EM-algorithm described in Section 2.1.3. One is then required to first derive an expression for the intermediate quantity Q(θ; θ_i), and secondly maximize this quantity with respect to θ.

In the particular case of the stochastic volatility model, using the definitions of the transition and emission densities in Equation (3.5), Equation (3.23) can be written as

$$Q(\theta; \theta_i) \propto (n+1)\log\beta^2 + \frac{1}{\beta^2}\left(S_1 - 2\mu S_2 + \mu^2 S_3\right) + n\log\sigma^2 + \frac{1}{\sigma^2}\left(Z_1 - 2\phi Z_2 + \phi^2 Z_3\right), \tag{3.24}$$

which is derived in Appendix B. The statistics $S_1, S_2, S_3, Z_1, Z_2, Z_3$ are defined as

$$S_1 = \mathbb{E}_{\theta_i}\left[\sum_{k=0}^n Y_k^2 e^{-X_k}\,\Big|\,Y_{0:n}\right], \quad S_2 = \mathbb{E}_{\theta_i}\left[\sum_{k=0}^n Y_k e^{-X_k}\,\Big|\,Y_{0:n}\right], \quad S_3 = \mathbb{E}_{\theta_i}\left[\sum_{k=0}^n e^{-X_k}\,\Big|\,Y_{0:n}\right],$$
$$Z_1 = \mathbb{E}_{\theta_i}\left[\sum_{k=0}^{n-1} X_{k+1}^2\,\Big|\,Y_{0:n}\right], \quad Z_2 = \mathbb{E}_{\theta_i}\left[\sum_{k=0}^{n-1} X_{k+1}X_k\,\Big|\,Y_{0:n}\right], \quad Z_3 = \mathbb{E}_{\theta_i}\left[\sum_{k=0}^{n-1} X_k^2\,\Big|\,Y_{0:n}\right]. \tag{3.25}$$

Following the E-step comes the M-step, where one in this case maximizes the intermediate quantity by searching for the roots of the gradient of Q(θ; θ_i). Let $\theta^* = (\mu^*, \phi^*, (\sigma^2)^*, (\beta^2)^*)$ denote the maximizing argument. Using simple rules of derivation, which are presented in Appendix C, one obtains

$$\mu^* = S_2/S_3, \qquad \phi^* = Z_2/Z_3,$$
$$(\sigma^2)^* = \frac{Z_1 - 2\phi^* Z_2 + (\phi^*)^2 Z_3}{n}, \qquad (\beta^2)^* = \frac{S_1 - 2\mu^* S_2 + (\mu^*)^2 S_3}{n+1}. \tag{3.26}$$
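The M-step updates of Equation (3.26) are closed-form once the six statistics are available. A minimal Python sketch (the thesis implementation is in R):

```python
def m_step(S1, S2, S3, Z1, Z2, Z3, n):
    """Closed-form M-step of Equation (3.26); returns the updated
    parameter vector theta = (mu, phi, sigma2, beta2)."""
    mu = S2 / S3
    phi = Z2 / Z3
    sigma2 = (Z1 - 2.0 * phi * Z2 + phi ** 2 * Z3) / n
    beta2 = (S1 - 2.0 * mu * S2 + mu ** 2 * S3) / (n + 1)
    return mu, phi, sigma2, beta2
```

In the full offline estimator these statistics are themselves particle estimates, obtained from the weighted τ's produced by the PaRIS-algorithm.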

At each iteration of the EM-algorithm the parameters are now updated according to the optimal values defined in Equation (3.26). However, in order to update the parameters one is required to estimate the statistics. Estimating the statistics can effectively be done by implementing the PaRIS-algorithm in Algorithm 1. By letting

$$\tilde{h}_{t;\theta_i}(x_t, x_{t+1}) \triangleq \left(y_t^2 e^{-x_t},\; y_t e^{-x_t},\; e^{-x_t},\; x_{t+1}^2,\; x_{t+1}x_t,\; x_t^2\right)^\mathsf{T},$$

it is possible to obtain estimates of the statistics of interest. This can be done by using the output of the PaRIS-algorithm, giving an estimate of the statistics as $\sum_{i=1}^N \frac{w^i_n}{\Omega_n}\tau^i_n$, where $\Omega_n = \sum_{l=1}^N w^l_n$. Using the estimates of the statistics, one can now compute the updated parameter values. The complete offline estimator is summarized in pseudo-code below.

Algorithm 4 PaRIS-based offline parameter estimation
1:  Input: (y_{0:t_end})
2:  set θ_0 arbitrarily
3:  for k ← 0, 1, ..., #EMs do
4:      draw ξ^i_0 ∼ ν
5:      set τ^i_0 ← 0
6:      for t ← 0, 1, ..., t_end − 1 do
7:          set w^i_t ← g_{θ_k}(ξ^i_t, y_t)
8:          draw I^i ∼ Pr({w^l_t}^N_{l=1})
9:          draw ξ^i_{t+1} ∼ q_{θ_k}(ξ^{I^i}_t, ·)
10:         for j ← 1, 2, ..., Ñ do
11:             draw J^{(i,j)}_{t+1} ∼ Pr({w^l_t q_{θ_k}(ξ^l_t, ξ^i_{t+1})}^N_{l=1})
12:         end for
13:         set τ^i_{t+1} ← Ñ^{−1} Σ^{Ñ}_{j=1} (τ^{J^{(i,j)}_{t+1}}_t + h̃_{t;θ_k}(ξ^{J^{(i,j)}_{t+1}}_t, ξ^i_{t+1}))
14:     end for
15:     set w^i_{t_end} ← g_{θ_k}(ξ^i_{t_end}, y_{t_end})
16:     set Ω_{t_end} ← Σ^N_{l=1} w^l_{t_end}
17:     set S_1 ← Σ^N_{l=1} (w^l_{t_end}/Ω_{t_end}) τ^{(l,1)}_{t_end}
18:     set S_2 ← Σ^N_{l=1} (w^l_{t_end}/Ω_{t_end}) τ^{(l,2)}_{t_end}
19:     set S_3 ← Σ^N_{l=1} (w^l_{t_end}/Ω_{t_end}) τ^{(l,3)}_{t_end}
20:     set Z_1 ← Σ^N_{l=1} (w^l_{t_end}/Ω_{t_end}) τ^{(l,4)}_{t_end}
21:     set Z_2 ← Σ^N_{l=1} (w^l_{t_end}/Ω_{t_end}) τ^{(l,5)}_{t_end}
22:     set Z_3 ← Σ^N_{l=1} (w^l_{t_end}/Ω_{t_end}) τ^{(l,6)}_{t_end}
23:     set θ^{(1)}_{k+1} ← S_2/S_3
24:     set θ^{(2)}_{k+1} ← Z_2/Z_3
25:     set θ^{(3)}_{k+1} ← (Z_1 − 2θ^{(2)}_{k+1}Z_2 + (θ^{(2)}_{k+1})^2 Z_3)/t_end
26:     set θ^{(4)}_{k+1} ← (S_1 − 2θ^{(1)}_{k+1}S_2 + (θ^{(1)}_{k+1})^2 S_3)/(t_end + 1)
27: end for

This estimator will be used in Section 4.2.2 in order to validate the online estimates obtained for actual stock prices.

4 Case Study

The purpose of this chapter is to apply the algorithms described in Section 3.3 and Section 3.4 to the hidden Markov volatility model formulated in Section 3.1. Prior to using data of actual stock prices, data will be simulated using the HMM stochastic volatility model with known parameters. By comparing the parameter estimates obtained by applying the PaRIS-based estimators to the simulated data with the parameters used to generate the data, one can determine whether the estimators find the true parameters. Once it has been confirmed that the algorithms find the true parameters, one can proceed to using actual stock prices. In order to validate the parameters estimated for the actual stock prices, an offline estimation of the parameters will also be performed, using batch EM estimation. The implementations of the algorithms on both the simulated data and the actual stock data were carried out in R. Following the implementations on simulated and stock data come two experiments where the uncertainties of the two algorithms are compared.

4.1 Algorithm Validation

4.1.1 Validation of online algorithm on simulated data

In order to validate the online algorithm, parameter estimation was first performed on data generated using a model with specified parameter values. Consider the model specified in Equation (3.4), and let θ = (µ, φ, σ², β²) = (0.75, 0.9, 0.2, 1). By assuming an initial distribution ν = N(0, σ²) of X_0, and drawing x_0 from ν, it is possible to generate both unobserved data X_{0:t_end} = x_{0:t_end} and observed data Y_{0:t_end} = y_{0:t_end} for some time period [0, t_end]. Using the generated data, it is now possible to run the parameter estimation described in Section 3.3 and obtain estimates of the parameters. The algorithm requires the quantities h̃_{s;θ}(x_s, x_{s+1}) and ∇_θ g_{s;θ}(x_s), and the derivations of these quantities can be found in Appendix A.

The validation was performed using a simulated data set of 50,000 observations, with a fraction of the data presented in the figure below.


Figure 4.1: First 500 observations of the simulated log-returns, using the specified stochastic volatility model with parameters θ = (µ, φ, σ², β²) = (0.75, 0.9, 0.2, 1).

Using the simulated data, the PaRIS-based online estimator could now be implemented in order to obtain online estimates of the parameters. In order to verify the convergence of the estimator, the algorithm was implemented five times on the data set, each time with randomized initial values for θ_0, resulting in five separate learning trajectories for each of the parameters. This verifies that the convergence does not rely on specific values of the initial parameters in θ_0. The parameter estimation was performed using N = 1400 particles, step sizes $\{\gamma_t\}_{t=1}^{50{,}000} = \{t^{-0.6}\}_{t=1}^{50{,}000}$ and Ñ = 2 as precision parameter. Further, the value of ε̄ in the rejection sampling was set to $\bar{\varepsilon} = 1/\sqrt{2\pi\sigma_t^2}$, with σ²_t being the estimated value of σ² at iteration t. This was done to satisfy the requirement $\bar{\varepsilon} \geq q_\theta(\xi_t, \xi_{t+1})$. The results from the parameter estimation on the simulated data are presented below in graphs and tables.


Figure 4.2: Trajectories of the PaRIS-based online parameter estimator for 5 implementations with randomized initial values θ_0. Red lines represent the true parameter values. Each implementation uses N = 1400 particles and Ñ = 2 as precision parameter.

Table 4.1: Estimated parameters for each of the 5 estimations with corresponding computation times.

Run µ φ σ2 β2 Run time[h]

1 0.7793 0.8853 0.2184 0.9981 15.31

2 0.7772 0.8818 0.2221 1.0074 9.81

3 0.7783 0.8803 0.2231 1.0117 9.15

4 0.7783 0.8824 0.2121 1.0054 19.51


From Figure 4.2 one can clearly observe that the learning trajectories from the PaRIS-based online parameter estimator converge to the true parameter values. In the learning trajectories for both σ² and β² one can identify jumps in the beginning of the algorithm. These jumps might occur as a result of a negative updated value of the parameter, which might be the result of the algorithm updating the parameters with too large steps for small values of t. Since the variance of a random variable is non-negative, this is not allowed, and the algorithm therefore updates the parameter using the absolute value. However, as the algorithm proceeds through the data set, both σ² and β² converge towards the true parameter values. Further, Table 4.1 confirms the convergence of each parameter by providing numerical values of the estimates.

Further, as described in Section 2.3.2, it is possible to identify potential issues with degenerating weights by calculating CV_N and N_eff, defined in Equation (2.30) and Equation (2.31). This was done for the last estimation run, and the results are presented below.


Figure 4.3: Coefficient of variation and efficient sample size for the last estimation run of the online implementation of the PaRIS-algorithm on the simulated data. An overall high efficient sample size N_eff indicates a stable algorithm without degenerating weights. Final values after the last observation were CV_N = 0.4011 and N_eff = 1205.9888, with an average efficient sample size N̄_eff = 1210, which can be considered close to the total number of particles, which was 1400.

Further, a histogram of the efficient sample size can be used to visualize the distribution of the efficient number of particles in the algorithm more clearly.


Figure 4.4: Histogram of efficient sample size for the last estimation run of the online implementation of the PaRIS-algorithm on the simulated data. A high frequency of efficient sample sizes close to the total number of particles, 1400, is desirable.

As can be seen in Figure 4.3, the algorithm in general has a low coefficient of variation, resulting in an efficient sample size close to the total number of particles. One can however identify some spikes in CV_N, which could mean that some steps in the updating of θ are inaccurate. However, combining the results from Figure 4.2 and Table 4.1 with Figure 4.3 and Figure 4.4 gives the overall impression that the algorithm is stable and provides accurate estimates.

In addition to the implementation above, the same validation was performed using a reduced particle sample with N = 600 and t_end = 30,000. This was done in order to examine the performance of the algorithm with a smaller particle cloud, and it also resulted in estimates converging to the true parameters. The results from this validation can be found in Appendix D.

4.1.2 Validation of offline algorithm

In addition to validating the online implementation of the algorithm on simulated data, the offline EM-estimator was also validated using a simulated data set. For the validation of the offline estimator, a smaller data set consisting of 1000 observations was generated, using both the same model and parameters as for the online implementation. By applying the EM-implementation of the PaRIS-algorithm described in Section 3.4 to the simulated data set, maximum likelihood estimates of the true parameters were obtained. The estimates were generated by running the EM-algorithm 600 times, using N = 300 particles and Ñ = 2 as precision parameter. By arbitrarily setting θ_0 and updating θ_i, i = 1, ..., 600 according to the algorithm, it was possible to obtain learning trajectories for the parameter estimates.


Figure 4.5: Learning trajectories of EM-estimations of the model parameters θ = (µ, φ, σ², β²). True parameter values θ = (0.75, 0.9, 0.2, 1) are indicated by red horizontal lines in each graph.


Figure 4.6: Coefficient of variation and efficient sample size for the offline EM-implementation of the PaRIS-algorithm on the simulated data. An overall high efficient sample size N_eff indicates a stable algorithm without degenerating weights. The average efficient sample size for the last iteration of the EM-implementation was N̄_eff = 263, which can be considered close to the total number of particles, which was 300.



Figure 4.7: Histogram of efficient sample size for the last estimation run of the offline EM-implementation of the PaRIS-algorithm on the simulated data. A high frequency of efficient sample sizes close to the total number of particles, 300, is desirable.

As can be seen in Figure 4.5, the EM-implementation of the PaRIS-algorithm is able to recover the true parameters from a simulated data set. In the following section, the offline EM-implementation will be compared to the online implementation of the PaRIS-algorithm on actual stock data.

4.2 Implementation on Actual Stock Prices

The stock prices were required to be transformed into log-returns, which was done by Y_t = log(S_t) − log(S_{t−1}). The resulting log-return data set is presented below.

Figure 4.8: Monthly log-returns from the AAPL stock from the time period (01-01-1980) to the latest registered price.
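The transformation from prices to log-returns is a one-liner. A Python sketch with hypothetical prices (the actual AAPL series is not reproduced here; the thesis implementation is in R):

```python
import numpy as np

# Hypothetical closing prices, for illustration only.
prices = np.array([100.0, 105.0, 103.0, 110.0])
log_returns = np.diff(np.log(prices))   # Y_t = log(S_t) - log(S_{t-1})
```

A convenient property of log-returns is that they add up over time: the sum of the log-returns equals the log of the total price ratio.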

4.2.1 Online implementation

Once the stock prices had been transformed into log-returns, parameter estimation could be performed using the online algorithm as in Section 4.1.1, according to Algorithm 3. Some minor alterations to the algorithm were made in order to improve the performance. First of all, the number of particles was decreased from N = 1400 to N = 600 in order to decrease the computational time of the algorithm. Reducing the size of the particle cloud can be motivated by the results from Appendix D, which imply that accurate estimates are attainable also with a smaller particle cloud. Secondly, the data set was enlarged to a size comparable to the simulated data set in Section 4.1.1. By repeating the data set, which consisted of approximately 400 observations, 75 times, it was possible to create a larger set of approximately 35,000 observations. Letting the algorithm process this enlarged data set yielded converging parameter estimates, as can be seen in Figure 4.9. Further, it showed that the algorithm suffered from some instability when estimating values of σ² and β² close to 0. Therefore, an alteration to the updating step of the estimation was made: by decreasing the step size of the online estimator for the parameters σ² and β² whenever the current estimate of the parameter falls below 0.1, stable results were attainable. As for the implementation on simulated data, the algorithm was run 5 times on the data. The results from the online implementation are presented graphically in Figure 4.9 and Figure 4.10, and numerically in Table 4.2.

Figure 4.9: Learning trajectories of 5 online estimations of the parameters. Using 5 randomized initial values for θ_0, each individual implementation of the online algorithm seems to converge towards the same value.

Table 4.2: Estimated parameters for each of the 5 online estimations, with corresponding computation times.

Run µ φ σ2 β2 Run time[h]

1 0.0156 0.9367 0.0775 0.0159 1.38

2 0.0165 0.9372 0.0825 0.0160 1.19

3 0.0156 0.9434 0.0711 0.0157 1.25

4 0.0155 0.9347 0.0764 0.0153 1.30

5 0.0151 0.9393 0.0754 0.0159 1.20

From Figure 4.9 one can observe that all 5 implementations of the algorithm converge towards similar parameter values. Further, the learning trajectories illustrate that the algorithm requires each implementation to be repeated over the data several times. This is due to the fact that the data set only consists of approximately 400 observations, and the learning trajectories do not seem to converge until after at least 15,000 observations. Further, the numerical results presented in Table 4.2 provide additional information regarding the estimates.


Figure 4.10: Coefficient of variation and efficient sample size for the online implementation of the PaRIS-algorithm on the AAPL data. An overall high efficient sample size N_eff indicates a stable algorithm without degenerating weights. The average efficient sample size for the last iteration of the online implementation was N̄_eff = 553, which can be considered close to the total number of particles, which was 600.



Figure 4.11: Histogram of efficient sample size for the last estimation run of the online implementation of the PaRIS-algorithm on the AAPL data. A high frequency of efficient sample sizes close to the total number of particles, 600, is desirable.

As can be seen in Figure 4.10, the overall efficient sample size is close to the total number of particles, 600. This implies that the algorithm should not suffer from degenerating weights, which is the same conclusion that was drawn for the implementation on simulated data.

4.2.2 Offline implementation

Using the same data set as in Section 4.2.1, and the offline algorithm validated in Section 4.1.2, it was possible to obtain parameter estimates from 5 implementations of the offline algorithm. The results are presented below in tables and graphs.


Figure 4.12: Learning trajectories of 5 offline estimations of the parameters. Using 5 randomized initial values for θ_0, each individual implementation of the offline algorithm seems to converge towards the same value.

Table 4.3: Estimated parameters for each of the 5 offline estimations, with corresponding computation times.

Run µ φ σ2 β2 Run time[h]

1 0.0249 0.9608 0.0443 0.0142 2.04

2 0.0241 0.9598 0.0468 0.0147 2.03

3 0.0245 0.9596 0.0452 0.0136 2.40

4 0.0244 0.9585 0.0486 0.0145 2.06



Figure 4.13: Coefficient of variation and efficient sample size for the offline EM-implementation of the PaRIS-algorithm on the AAPL data. An overall high efficient sample size N_eff indicates a stable algorithm without degenerating weights. The average efficient sample size for the last iteration of the EM-implementation was N̄_eff = 232, which can be considered close to the total number of particles, which was 250.



Figure 4.14: Histogram of efficient sample size for the last estimation run of the offline EM-implementation of the PaRIS-algorithm on the AAPL data. A high frequency of efficient sample sizes close to the total number of particles, 250, is desirable.

4.3 Experiments Regarding Uncertainty of Estimates

In this section two experiments will be performed. The main purpose of the two experiments is to investigate which of the two algorithms provides the estimate with the least uncertainty. Experiment 1 focuses on the uncertainty with respect to the true parameter, while Experiment 2 focuses on the uncertainty with respect to the maximum likelihood estimate. During both of the experiments, the total number of particles was set to N = 300 and Ñ = 2 for both the online and offline estimator. The true parameters were set as in
