
DEGREE PROJECT IN APPLIED AND COMPUTATIONAL MATHEMATICS, SECOND CYCLE, 120 CREDITS

STOCKHOLM, SWEDEN 2015

Sequential parameter and state learning in continuous time stochastic volatility models using the SMC² algorithm

VICTOR TINGSTRÖM


Sequential parameter and state learning in continuous time stochastic volatility models using the SMC² algorithm

VICTOR TINGSTRÖM

Master's Thesis in Mathematical Statistics (30 ECTS credits)
Master Programme in Applied and Computational Mathematics (120 credits)
Royal Institute of Technology, 2015

Supervisor at KTH: Jimmy Olsson
Examiner: Jimmy Olsson

TRITA-MAT-E 2015:80 ISRN-KTH/MAT/E--15/80-SE

Royal Institute of Technology

SCI School of Engineering Sciences

KTH SCI


Sequential parameter and state learning in continuous time stochastic volatility models using the SMC² algorithm

Victor Tingström

Abstract

In this Master's thesis, joint sequential inference of both parameters and states of stochastic volatility models is carried out using the SMC² algorithm of [1]. The models under study are the continuous time s.v. models (i) Heston, (ii) Bates, and (iii) SVCJ, where inference is based on option prices. It is found that SMC² performs well for the simpler models (i) and (ii), whereas filtering in (iii) performs worse. Furthermore, it is found that the FFT option price evaluation is the most computationally demanding step, and it is suggested to explore other avenues of computation, such as GPGPU-based computing.


Sequential estimation of parameters and states in continuous time stochastic volatility models using the SMC² algorithm

Victor Tingström

Sammanfattning (Swedish abstract)


Acknowledgements

First and foremost I would like to thank my supervisor Associate Professor Jimmy Olsson for introducing me to the interesting field of computational statistics, as well as his invaluable guidance and support throughout this thesis. I would also like to extend my thanks to Professor Nicolas Chopin who gave me guidance with regards to the SMC2 algorithm. Furthermore I would also like to thank my high school mathematics teacher Vanja Jarl-Liljegren whose commitment to teaching and vast knowledge enabled me to pursue this academic degree. Lastly, I would like to thank my fiancée Hanna for her support and encouragement during my five years at KTH.


Contents

1 Introduction
2 Stochastic volatility
  2.1 Hidden Markov models
    2.1.1 Example
  2.2 Financial models
  2.3 Contingent claims and option pricing
  2.4 Discretization schemes
  2.5 Problem description
    2.5.1 Modeling
      2.5.1.1 Emission density
      2.5.1.2 Transition density
      2.5.1.3 Priors of parameters in s.v. models
3 SMC & SMC²
  3.1 Monte Carlo toolbox
    3.1.1 Standard Monte Carlo
    3.1.2 Importance sampling
    3.1.3 Sequential Monte Carlo
      3.1.3.1 Sequential importance sampling
      3.1.3.2 SIS with resampling
      3.1.3.3 Drawbacks
      3.1.3.4 Example
  3.2 SMC²
    3.2.1 Drawbacks
4 Results
  4.1 Benchmarks
    4.1.1 Linear Gaussian model
    4.1.2 Non-linear HMM
  4.2 Stochastic volatility models
    4.2.1 Taylor
      4.2.2.1 Heston model
      4.2.2.2 Bates model
      4.2.2.3 SVCJ model

Nomenclature

Symbol: Description
$\mathcal{N}(\mu, \sigma^2)$: Normal distribution with mean $\mu$ and variance $\sigma^2$.
$\mathcal{U}(a, b)$: Uniform distribution on the interval $[a, b]$.
$\mathcal{E}(\mu)$: Exponential distribution with rate parameter $\mu$.
$\mathcal{IG}(\alpha, \beta)$: Inverse gamma distribution with shape $\alpha$ and scale $\beta$.
$\mathcal{G}(\alpha, \beta)$: Gamma distribution with shape $\alpha$ and scale $\beta$.
$\mathcal{FN}(\mu, \sigma^2)$: Folded normal distribution with mean $\mu$ and variance $\sigma^2$.
$\xrightarrow{\text{a.s.}}$: Almost sure convergence.
$\xrightarrow{d}$: Convergence in distribution.
$\delta_X(x)$: Dirac measure evaluated at $x$ with mass at $X$.
$\mathrm{supp}(f)$: The support of $f : \mathsf{X} \to \mathsf{Y}$, i.e. $\mathrm{supp}(f) = \{x \in \mathsf{X} \mid f(x) \neq 0\}$.
$\mathbb{N}_+$: The natural numbers excluding 0, i.e. $\mathbb{N}_+ \triangleq \mathbb{N}\setminus\{0\}$.


Chapter 1

Introduction

The concept of banking might seem a newfangled idea, but is in fact millennia old – beginning with the Romans who formalized the banking system and the Belgians who in 1531 laid the foundations for the modern stock exchange. During the culmination of imperialism in 17th century Europe, the concept of stock companies was born, as East India trading companies began paying out dividends to the shareholders for all coming voyages, as opposed to single voyages as had been the case earlier. Without the digital tools that are commonplace now, all shares in companies were issued on paper, meaning that shareholders could trade them – however, as there was no formal place to conduct trades, trades were difficult to carry out. This was remedied in 1773 when the London Stock Exchange was created, followed by the New York Stock Exchange in 1792. Hence, banking is in fact a rather old concept, tightly connected to civilization.

As a stakeholder one wishes to see the investment turn into profit. The shareholders of the first stock companies might have based their investments on the notion that the company had a good business model or a great new idea. However, it would be desirable to quantify and predict the results of the investment, e.g. how one should optimally allocate funds in a portfolio of stocks or what the estimated return is – which is by no means an easy task. But as time has progressed, so has mathematics, and it has proven to be an invaluable tool for many applications. In 1900, Louis Bachelier brought mathematics into the world of finance with his pioneering PhD thesis Théorie de la spéculation, in which he (among other things) invented the backbone of advanced financial mathematics used for modeling asset price dynamics and pricing contingent claims – namely the Brownian motion and martingales.

Further groundbreaking advancements were made in 1973, as Black and Scholes introduced the Black-Scholes model, which provided a unified framework for pricing European-style derivatives. The asset price dynamics in this model could be argued to be too simple, as the volatility is assumed to be constant, which is not congruent with real world data. Thus, in 1993 Steven Heston introduced the Heston model, which assumes that the asset's log-price dynamics constitutes a continuous time hidden Markov model. In this case, the observable process corresponds to the asset's log-price and the volatility to the hidden process. However, [2, 3, 4, 5] suggest that this model is misspecified, as it does not exhibit the kurtosis found in empirical data. This means that the tails are not "fat" enough to generate extreme values. One suggestion to improve the shortcomings of this model was to introduce the Constant Elasticity of Volatility model; see [6] for a specification. However, this model generates too much kurtosis and thus too many extreme values. The remedy for this problem is to add a Poisson driven jump process to the asset price dynamics. This idea corresponds to the Bates model, which corrects the kurtosis issue.

But as empirical data suggest, a sudden shock to the asset price is usually followed by an increase in volatility [5]; see Figure 1.1. In [7, 2, 3] it is suggested to address this issue by adding a Poisson driven jump process to the volatility process as well, resulting in the Stochastic Volatility with Contemporaneous Jumps (SVCJ) model. Furthermore, [4] finds that "The models that allow for volatility jumps do a very decent job of explaining the time variations in volatility, as well as the dynamics of volatility itself".

Figure 1.1: The prices of the S&P500 index together with the volatility index VXO, during the crash of 1987.

Specifying the model constitutes only a small part of the job, since these models are often heavily parameterized. For example, the dynamics of the SVCJ model is governed by ten different parameters which are non-trivial to estimate. A natural source of information concerning parameters and latent states is the observed returns. However, as the different models have a rather complex observable process, it has been found that inference based solely on returns is rather uninformative and thus requires quite long time series to give precise estimates [8, 9]. To supplement the returns one can use option prices, as these are semi-closed form expressions of the parameters and states of the hidden volatility process under the different models [10, 4, 5, 11, 9]. In [9] it is found that "[...] the use of parameter estimates based on joint options and returns data provide economically significant performance enhancements."

There are several papers on how to estimate parameters in stochastic volatility models, but the first to use both options and returns were [12, 2]. As far as known to the author, the first paper that turns to MCMC for inference about parameters and volatility is [13], which is followed by [9] developing a particle filtering scheme targeting both the parameters and the volatility. In [14], the iterated filtering algorithm found in [15] is used on the Heston and Bates models to infer both states and parameters. In [16], calibration of the Heston, Bates, and SVCJ models is carried out by minimizing the squared error between observed and model specific option prices using different minimization schemes, including genetic algorithms and adaptive simulated annealing.

The efficient method of moments (EMM) algorithm is used in [12] to infer the parameters of the Heston model. In [2], the implied-state generalized method of moments is used to infer the parameters and then "back out" the option implied volatility; the true volatility is then found using the volatility-risk premium parameter. Both of these algorithms rely on a closed form expression of the moment-generating function. While the MGF exists for the s.v. models under study in this thesis, it is not guaranteed to exist for other, more complicated models. Furthermore, if several models are under study, it would be tedious to have to derive the MGF for each of them if the dynamics of the models are easy to sample from.

The MCMC approach taken in [13] allows for estimating both the parameters and states using a simulation-based methodology. This methodology relies on Gibbs sampling, which in turn requires the full conditional distributions to be known up to a normalizing constant. Given the complexity of the models, this methodology becomes cumbersome and needs to be tailored to the problem at hand. The iterated filtering algorithm is used to great effect in [14]. One of its great advantages over the previously mentioned algorithms is that inference concerning the latent states is based on particle filters, which have proven highly successful in non-linear filtering. However, the author of the thesis finds that the algorithm loses its stated plug-and-play ability when choosing not to use a bootstrap filter. Furthermore, iterated filtering requires defining algorithmic parameters for the considered problem.

The author has also considered the adaptive simulated annealing (ASA) approach of [16]. However, since the optimization space is huge, the convergence is very slow, even when only considering a small number of options. Furthermore, using previous estimates as an initial guess rarely yields any notable performance increase. Hence, ASA is not recommended for this problem.

The purpose of this thesis is to use the SMC2 algorithm introduced in [1] for sequential estimation of parameters and states of models of the type previously described. A side goal is to provide an understandable and easy-to-read exposition of SMC as well as an overview of advanced financial asset price models, and how European vanilla options are priced under these.

What all the previously mentioned algorithms have in common is that the dataset used for inference must remain of fixed size during runtime. However, there are applications, e.g. in finance, where data becomes available continuously. For these applications, it would be preferable to use an algorithm that allows for appending this new data. Sequential algorithms, such as SMC², allow for this. Furthermore, the sequential structure of the algorithm has the advantage of quickly discarding parts of the sampling space that are uninteresting [1]. Additionally, particle filters are used for the estimation of latent states. While the algorithm is sequential, it is not online, i.e. the complexity increases with the time index; see [1] for details.

Comparing the algorithm with EMM or GMM, SMC² has the advantage of not requiring the model's MGF to be known in closed form; we only require that it is possible to numerically evaluate the emission and transition density. Another advantage, as compared with iterated filtering, is that SMC² does not require any user-defined algorithmic parameters. Other interesting characteristics of SMC² are that it may automatically adjust the size of the particle filters, as well as automatically calibrate the proposal density used in the particle MCMC rejuvenation step. The latter is a nice feature since defining the proposal in MCMC is not always easy, as it needs to be constructed such that it covers the sampling space and preferably has most of its mass in the interesting part.

It is found that SMC² is a good addition to the Monte Carlo toolbox when considering models that do not require evaluating option prices using the FFT. It was also found that parallelizing the option pricing step offers a considerable performance gain. Furthermore, the author suggests exploring the possibility of using GPGPU computing, as modern graphics cards offer a way of further parallelizing complex numerical tasks.

All code has been written in Python, making use of its classes and the extensive community-developed libraries, such as PyFFTW, a Python wrapper around the high-performance C library FFTW. The source code can be provided on request by e-mailing victor.tingstrom@gmail.com.


Chapter 2

Stochastic volatility

The algorithm we are using for inference, SMC², operates under the assumption that the considered models are hidden Markov models. This chapter therefore begins with an introduction to this concept. We then present the stochastic volatility models under study, and how vanilla European call options are priced under these. Discretization schemes for SDEs are then presented. At the end of this chapter, we formally present the problem of this thesis, including the derivation of the densities used for inference.

2.1 Hidden Markov models

A hidden Markov model is a stochastic process $\{(X_k, Y_k) \mid k \in \mathbb{N}_+\}$, where $X_k$ takes values in some space $\mathsf{X} \subseteq \mathbb{R}^{d_x}$ and $Y_k$ in $\mathsf{Y} \subseteq \mathbb{R}^{d_y}$. The process $(X_k)_{k \geq 1}$ is hidden from the observer but is partially observable via the observable process $(Y_k)_{k \geq 1}$. Furthermore, the process $(X_k)$ is Markovian, and $(Y_k)$ are assumed to be conditionally independent random variables given the states $(X_k)$. A common graphical illustration of an HMM is found in Figure 2.1.

Figure 2.1: Illustration of an HMM. $X_k$ corresponds to the hidden process and $Y_k$ corresponds to the observable process.


The dynamics of an HMM are governed by the following densities
$$Y_k \mid X_k = x \sim p_\theta(\cdot \mid x), \quad (2.1)$$
$$X_k \mid X_{k-1} = x \sim q_\theta(\cdot \mid x), \quad (2.2)$$
$$X_1 \sim \chi_\theta(\cdot), \quad (2.3)$$
where $p$ is often referred to as the emission density, $q$ the transition density, and $\chi$ the initial density. The subscripted $\theta$ implies a possible dependence on parameters defined on some parameter space $\Theta$. For brevity and readability we will use the following notation extensively:
$$p_\theta(x) = p(x \mid \theta).$$
Another way of representing the dynamics of an HMM is via the following system of equations
$$Y_k = h(X_k, V_k),$$
$$X_k = g(X_{k-1}, W_k), \quad (2.4)$$
where $(V_k)_{k \geq 1}$, $(W_k)_{k \geq 2}$ are two independent noise sequences defined on $\mathsf{V}$, $\mathsf{W}$ respectively, and $h : \mathsf{X} \times \mathsf{V} \to \mathsf{Y}$, $g : \mathsf{X} \times \mathsf{W} \to \mathsf{X}$ are both assumed to be Borel measurable.

Since the noise sequences are independent, the joint density of an HMM can be factorized as
$$p_\theta(x_{1:n}, y_{1:n}) = \chi_\theta(x_1)\, p_\theta(y_1 \mid x_1) \prod_{k=1}^{n-1} q_\theta(x_{k+1} \mid x_k)\, p_\theta(y_{k+1} \mid x_{k+1}). \quad (2.5)$$
In the HMM context, one is most often interested in the smoothing density, which by Bayes' theorem is found to be [17]
$$f_\theta(x_{1:n} \mid y_{1:n}) = \frac{p_\theta(x_{1:n}, y_{1:n})}{p_\theta(y_{1:n})}, \quad (2.6)$$
where $p_\theta(y_{1:n})$ is given by
$$p_\theta(y_{1:n}) = \int_{\mathsf{X}^n} p_\theta(x_{1:n}, y_{1:n})\, \mathrm{d}x_{1:n}, \quad (2.7)$$
and is usually called the likelihood function.

2.1.1 Example

An example relating to finance, and stochastic volatility in particular, is the s.v. model introduced by Taylor in 1982 with the following dynamics
$$X_k = \alpha X_{k-1} + \sigma W_k,$$
$$Y_k = \beta e^{X_k/2}\, V_k,$$
where $\alpha \in (0, 1)$ and $\beta, \sigma \in \mathbb{R}_+$. Furthermore, $(V_k)_{k \geq 1}$ and $(W_k)_{k \geq 2}$ are Gaussian noise processes with variance 1. The strength of this model is that it allows for "volatility clustering" [17], i.e. the volatility gets higher in market stress. In this example, it is seen that the parameter set $\theta$ is given by $\theta = \{\alpha, \beta, \sigma\}$. We also find from the above model dynamics that the transition density is given by
$$X_k \mid X_{k-1} = x \sim \mathcal{N}(\alpha x, \sigma^2), \quad (2.8)$$
and that the emission density is given by
$$Y_k \mid X_k = x \sim \mathcal{N}(0, \beta^2 e^x). \quad (2.9)$$
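As a concrete illustration of the densities above, the following minimal Python sketch simulates a trajectory from the Taylor model. The function name, the illustrative parameter values, and the stationary AR(1) initial draw are assumptions of this sketch and are not taken from the thesis.

```python
import numpy as np

def simulate_taylor_sv(n, alpha, beta, sigma, rng):
    """Simulate the Taylor s.v. model: X_k | X_{k-1} ~ N(alpha*X_{k-1}, sigma^2),
    Y_k | X_k ~ N(0, beta^2 * exp(X_k)), cf. (2.8)-(2.9)."""
    x, y = np.empty(n), np.empty(n)
    # assumed initial draw: stationary distribution of the AR(1) state process
    x[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - alpha**2))
    for k in range(n):
        if k > 0:
            x[k] = alpha * x[k - 1] + sigma * rng.normal()
        y[k] = beta * np.exp(x[k] / 2.0) * rng.normal()
    return x, y

x, y = simulate_taylor_sv(1000, alpha=0.97, beta=0.18, sigma=0.6,
                          rng=np.random.default_rng(42))
```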

2.2 Financial models

This section introduces the different asset price models that will be studied, which are:

(i) Heston model, (ii) Bates model, (iii) SVCJ model.

We will let $S$ represent the observable asset price process, and $V$ the hidden volatility process. Furthermore, we let the processes be defined on the probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, \mathbb{P})$, and for simplicity assume that $S$ and $V$ are defined on $\mathbb{R}_+$.

• Heston model

The Heston model, developed by Steven Heston [18], has the following asset price dynamics

$$dS_t = \mu S_t\, dt + \sqrt{V_t}\, S_t\, dW_t^s, \quad (2.10)$$
$$dV_t = \kappa(\theta - V_t)\, dt + \sigma_v \sqrt{V_t}\, dW_t^v, \quad (2.11)$$
where $\mu \in \mathbb{R}$, and $\kappa, \theta, \sigma_v \in \mathbb{R}_+$. The processes $W_t^s$ and $W_t^v$ are two Brownian motions such that $d\langle W^s, W^v \rangle_t = \rho\, dt$.

• Bates model

The Bates model adds a Poisson driven jump process to the Heston asset price dynamics,
$$dS_t = \mu S_t\, dt + \sqrt{V_t}\, S_t\, dW_t^s + J S_t\, dN_t, \quad (2.12)$$
$$dV_t = \kappa(\theta - V_t)\, dt + \sigma_v \sqrt{V_t}\, dW_t^v, \quad (2.13)$$
where the model parameters are the same as for the Heston model. Furthermore, the model is extended to include the Poisson process $N_t$ with intensity $\lambda_N$, such that $\mathbb{P}(dN_t = 1) = \lambda_N\, dt$. The percentage jump size $J$ has the distribution
$$\log(1 + J) \sim \mathcal{N}(\mu_s, \sigma_s^2),$$
where $\mu_s \in \mathbb{R}$ and $\sigma_s \in \mathbb{R}_+$.

• SVCJ model

The SVCJ model introduced in [7] has the following dynamics
$$dS_t = \mu S_t\, dt + \sqrt{V_t}\, S_t\, dW_t^s + J S_t\, dN_t, \quad (2.14)$$
$$dV_t = \kappa(\theta - V_t)\, dt + \sigma_v \sqrt{V_t}\, dW_t^v + Z\, dN_t, \quad (2.15)$$
where the parameters are again the same as for the previous models. Compared to the Bates model, the SVCJ model also includes the jump size in volatility $Z \sim \mathcal{E}(\mu_v)$, which by definition is always positive. Furthermore, the jump size in the asset price is somewhat modified to include the size of the volatility jump, such that
$$\log(1 + J) \mid Z = z \sim \mathcal{N}(\mu_s + \rho_J z, \sigma_s^2).$$
The parameter $\rho_J$ is often difficult to estimate [4], and will thus be set to 0 throughout this thesis.

It is now very important to note that none of the above models is of the form of an HMM, as each $S_t$ depends not only on $V_t$ but also on previous values of the process $S$. However, by applying the Itô formula to the stochastic process $Y_t = \log S_t$ it is possible to transform these models such that the observed state process only occurs on the l.h.s., which translates to using log-returns as the observable process instead.


2.3 Contingent claims and option pricing

Let $S$ denote an asset price process, $t$ the current time, and $T$ some future time s.t. $t < T$. If $X$ is a random variable such that $X \in \mathcal{F}_T^S$, then $X$ is said to be a contingent claim [19]. We will refer to the time $T$ as the date of maturity. If $X$ can be written on the form
$$X = \Phi(S_T),$$
then $X$ is said to be a simple claim. The perhaps most famous form of $\Phi$ is
$$\Phi(x) = \max\{x - K, 0\} = (x - K)^+.$$

The above contract is called a European call option, and it gives the holder of the contract the right to, at time $T$, buy one share of the asset $S$ for the price $K$, thus making a profit of $(S_T - K)^+$. From [19] it is known that the price of a contingent claim at time $t$ is given by
$$\Pi(t; \Phi) = \mathbb{E}^{\mathbb{Q}}\!\left[ e^{-\int_t^T r(s)\, ds} \cdot \Phi(S_T)\, \middle|\, \mathcal{F}_t^S \right], \quad (2.16)$$
where $r(t)$ is the discounting rate, and $\mathcal{F}_t^S$ the $\sigma$-algebra generated by the asset price process. Note that the expectation in (2.16) should be taken under the Q-measure, which in finance is often referred to as the equivalent martingale measure. However, the models in Section 2.2 are defined under the objective measure P. To get the Q-dynamics, we must use the Girsanov theorem and change the measure. The Q-dynamics of the asset price process is obtained by remembering that the discounted price process is a martingale under Q [19]. Finding the dynamics of the volatility process under Q is not as straightforward, since this process cannot be priced. However, when working with the previously introduced models, the Girsanov kernel can be modeled as [20]
$$\varphi(t) = \lambda \sqrt{V_t}, \quad (2.17)$$
where $\lambda$ is the volatility risk premium; a compensation for carrying the risk of stochastic volatility [3].

• Heston model

By applying the Girsanov theorem to the Heston model with respective kernels for each process, the Q-dynamics become

$$dS_t = r S_t\, dt + \sqrt{V_t}\, S_t\, d\tilde{W}_t^s, \qquad dV_t = \tilde{\kappa}(\tilde{\theta} - V_t)\, dt + \sigma_v \sqrt{V_t}\, d\tilde{W}_t^v,$$


where the correlation is unchanged. Note that the parameters $\tilde{\kappa}$ and $\tilde{\theta}$ are not the same as $\kappa$ and $\theta$, since the measure has been changed. Using (2.17), it is seen that
$$\tilde{\kappa} = \kappa + \lambda, \qquad \tilde{\theta} = \frac{\kappa\theta}{\kappa + \lambda}.$$
We know from the Girsanov theorem that the diffusion remains the same under a change of measure. Furthermore, it is in [11] said that the products $\kappa\theta$ and $\tilde{\kappa}\tilde{\theta}$ should take the same values under P and Q. This practically means that one can use either returns or option prices for estimating the parameters, but that the estimates and products should be the same under both measures. One way to implement this condition is to simplify the problem and assume that the parameters are the same under both measures, greatly reducing the complexity of inferring the parameters as $\lambda \equiv 0$. In [11] this is referred to as time-series consistency. Although setting $\lambda \equiv 0$ is in general not recommended [9], it is a necessity since the author's computer is of limited computational power.

• Bates model

Since there is also a jump process in the asset dynamics, the resulting Q-dynamics will look slightly different from the Heston model, such that
$$dS_t = (r - \tilde{\lambda}_N \tilde{\mu}_J) S_t\, dt + \sqrt{V_t}\, S_t\, d\tilde{W}_t^s + J S_t\, d\tilde{N}_t, \qquad dV_t = \tilde{\kappa}(\tilde{\theta} - V_t)\, dt + \sigma_v \sqrt{V_t}\, d\tilde{W}_t^v.$$
It is now important to bear in mind that all of the jump parameters have changed as well. However, it is found in [3] that the parameters governing the jump size distribution are not notably affected by the transformation, and thus remain almost the same. It shall also be assumed that the intensities under P and Q are equivalent, using the same time-series consistency argument as for the Heston model.

• SVCJ model

The dynamics of the SVCJ model is almost identical to that of the Bates model, and given by
$$dS_t = (r - \tilde{\lambda}_N \mu_J) S_t\, dt + \sqrt{V_t}\, S_t\, d\tilde{W}_t^s + J S_t\, d\tilde{N}_t, \qquad dV_t = \tilde{\kappa}(\tilde{\theta} - V_t)\, dt + \sigma_v \sqrt{V_t}\, d\tilde{W}_t^v + Z\, d\tilde{N}_t.$$


Given the risk-neutral dynamics, it remains to compute the expectation (2.16) under the different models. However, given the inherent complexity of the models, a closed form solution of (2.16) is not available, but it can be represented as an infinite integral [3, 7]. Lacking an analytical solution, we need to resort to numerical methods. The method used in this thesis is the fast Fourier transform, or FFT for short, mainly for its speed and relative simplicity of implementation [10].

To utilize FFT option pricing, we must know the characteristic function of the asset's log-price dynamics under the Q-measure. Therefore, denote the characteristic function as, using the notation of [10],
$$\phi_{Y_T}(u) = \mathbb{E}^{\mathbb{Q}}\!\left[ e^{iuY_T} \mid \mathcal{F}_t^{S,V} \right], \quad (2.18)$$
where $Y_T$ is the solution of the log-price process at time $T$, and $\mathcal{F}_t^{S,V}$ the $\sigma$-algebra generated by both $S$ and $V$. Having this characteristic function at hand, [10] shows that the price of a European call option at time $t$, maturing at $T$ with log-strike $k = \log K$, is given by
$$C_T(k) = \frac{e^{-\alpha k}}{\pi} \int_0^\infty e^{-iuk}\, \frac{e^{-r(T-t)}\, \phi_{Y_T}(u - (\alpha + 1)i)}{(\alpha + iu)(\alpha + 1 + iu)}\, du, \quad (2.19)$$
where $r$ is the discounting rate, assumed to be constant, and $\alpha$ a dampening factor making the integrand square integrable; a requirement for Fourier inversion. The characteristic functions for each of the models are shown below.

• Heston model

By first transforming the asset price process to the log-price using the Itô theorem, one gets that, setting $Y_t = \log S_t$,
$$dY_t = \left(r - \frac{V_t}{2}\right) dt + \sqrt{V_t}\, d\tilde{W}_t^s.$$
Next, it can be shown [18] that the characteristic function of the above process is given by, using the slightly more readable form of [16],
$$\phi_{Y_T}(u) = \exp\!\left\{ C(u; T-t)\,\tilde{\theta} + D(u; T-t)\, V_t + A(u; T-t) \right\},$$


with the constants given by [16]
$$\zeta = \frac{\sigma_v^2}{2}, \quad \beta = \tilde{\kappa} - \rho\sigma_v iu, \quad \alpha = -\frac{u^2 + iu}{2}, \quad d = \sqrt{\beta^2 - 4\alpha\zeta}, \quad r_\pm = \frac{\beta \pm d}{2\zeta}, \quad g = \frac{r_-}{r_+}.$$

• Bates model

Proceeding as for the Heston model yields that the log-price process is given by
$$dY_t = \left(r - \tilde{\lambda}_N \mu_J - \frac{V_t}{2}\right) dt + \sqrt{V_t}\, d\tilde{W}_t^s + \log(1 + J)\, d\tilde{N}_t.$$
In [3] it is shown that the characteristic function is given by, again using the slightly more readable form of [16],
$$\phi_{Y_T}(u) = \exp\!\left\{ C(u; T-t)\,\tilde{\theta} + D(u; T-t)\, V_t + P(u)\,\tilde{\lambda}_N (T-t) + A(u; T-t) \right\},$$
where $C$ and $D$ are defined as before, and $P$ is given by
$$P(u) = -\mu_J iu + (1 + \mu_J)^{iu} \exp\!\left\{ \frac{\sigma_s^2\, iu(iu - 1)}{2} \right\} - 1.$$

• SVCJ model

The log-price process for the SVCJ model is the same as for the Bates model. The only difference is the functional form of P which gets a bit more complicated,


and
$$\bar{\varphi} = \log\!\left\{ 1 - \frac{(d - \beta)c + 2\mu_v \alpha}{2dc}\left(1 - e^{-d\tau}\right) \right\}, \qquad \text{where } c = 1 - iu\rho_J\mu_v.$$

The integral (2.19) can then be evaluated fast and efficiently using FFT by discretizing and rewriting it such that [10]

$$\hat{C}_T(k_u) = \frac{e^{-\alpha k_u}}{\pi} \sum_{j=1}^{N} e^{-i\frac{2\pi}{N}(j-1)(u-1)}\, e^{ibv_j}\, \psi(v_j)\, \frac{\eta}{3}\left[3 + (-1)^j - \delta_{j-1}\right],$$
where $\delta$ is the Kronecker delta, and $\psi$ is the integrand in (2.19), given by
$$\psi(u) = \frac{e^{-r(T-t)}\, \phi_{Y_T}(u - (\alpha + 1)i)}{(\alpha + iu)(\alpha + 1 + iu)}.$$

In order to use the FFT, we must define some algorithmic parameters, where the notation in [21] will be used. As the integral (2.19) is infinite, we need to put a finite upper bound on the integral, defined as $c$. The number of samples used in the FFT is set to $N$. If the upper bound on the integral is $c$ and the number of samples is $N$, the difference between two log-strikes will be $\eta = c/N$. The grid used for evaluating the integrand is given by $v_j = \eta(j-1)$, for $j = 1, 2, \ldots, N$. The FFT then calculates the prices of options on the grid of strike prices
$$k_u = -b + \eta(u - 1), \quad u = 1, 2, \ldots, N, \quad \text{where } b = \pi/\eta.$$

In practice, this means that one needs to interpolate between the log-strikes in order to get the price of interest. This is both an advantage and a drawback. The advantage is that the method outputs the prices for many different strike levels with the same maturity and underlying in one iteration. The drawback is that the interpolation can result in inaccuracy. The remedy for this is a tighter grid, i.e. a greater value of $N$. However, this increases the computational load dramatically, since $N$ can only be increased in multiples of 2. Therefore a balance between efficiency and numerical accuracy must be struck. However, as seen in Figure 2.2, the error decreases rather quickly as the number of samples increases, whereas the computing time doubles for every increase in samples.
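To make the pricing recipe concrete, the sketch below evaluates (2.19) with the FFT and Simpson weights, using a Heston characteristic function written in the common "little trap" parameterization. The function names, the spot normalization $S_0 = 1$, and the grid parameters (alpha, N, c) are illustrative assumptions of this sketch and not necessarily the forms or values used in the thesis or in [16].

```python
import numpy as np

def heston_cf(u, v0, kappa, theta, sigma_v, rho, r, tau):
    """Characteristic function of Y_T = log S_T under Q ("little trap" form),
    with the spot normalized to S_0 = 1 (an assumption of this sketch)."""
    beta = kappa - rho * sigma_v * 1j * u
    d = np.sqrt(beta**2 + sigma_v**2 * u * (u + 1j))
    g = (beta - d) / (beta + d)
    e = np.exp(-d * tau)
    C = kappa / sigma_v**2 * ((beta - d) * tau - 2.0 * np.log((1 - g * e) / (1 - g)))
    D = (beta - d) / sigma_v**2 * (1 - e) / (1 - g * e)
    return np.exp(C * theta + D * v0 + 1j * u * r * tau)

def carr_madan_fft(cf, r, tau, alpha=1.5, N=4096, c=600.0):
    """Evaluate (2.19) on a grid of log-strikes via the FFT with Simpson weights."""
    eta = c / N                       # spacing of the integration grid
    lam = 2 * np.pi / (N * eta)       # spacing of the log-strike grid
    b = N * lam / 2                   # log-strikes span [-b, b)
    j = np.arange(N)
    v = eta * j                       # integration grid v_j
    psi = np.exp(-r * tau) * cf(v - (alpha + 1) * 1j) \
          / ((alpha + 1j * v) * (alpha + 1 + 1j * v))   # integrand of (2.19)
    w = eta / 3 * (3 + (-1.0)**(j + 1))                 # Simpson weights ...
    w[0] -= eta / 3                                     # ... minus the Kronecker delta term
    k = -b + lam * j
    prices = np.exp(-alpha * k) / np.pi * np.fft.fft(np.exp(1j * b * v) * psi * w).real
    return k, prices

# illustrative usage: call prices on the log-strike grid for assumed Heston parameters
k, prices = carr_madan_fft(lambda u: heston_cf(u, 0.04, 2.0, 0.04, 0.3, -0.7, 0.01, 0.5),
                           r=0.01, tau=0.5)
```

A price at a specific strike is then obtained by interpolating in the returned grid, which is exactly the interpolation step discussed above.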


Figure 2.2: Computational time vs. error as a function of the number of samples, $N$, in the FFT. The error is defined as $\varepsilon = \| \hat{C}_T(k_u)|_N - \hat{C}_T(k_u)|_{N=49{,}152} \|$.

2.4 Discretization schemes

It may sometimes be advantageous to model a physical process as an SDE, such as the evolution of an asset price in finance, or the growth rate of bacteria in biology. However, any observations will naturally take place in discrete time. To accommodate models where the SDEs do not have a closed form solution, we will have to discretize the processes. The perhaps most famous discretization is the Euler-Maruyama scheme. Thus, assume that we are given an SDE on $[0, T]$ with the following dynamics
$$dX_t = \mu(t, X_t)\, dt + \sigma(t, X_t)\, dW_t, \quad (2.20)$$
$$X_0 = x. \quad (2.21)$$
The solution on the interval $[t, t + \Delta t]$ is given by
$$X_{t+\Delta t} = X_t + \int_t^{t+\Delta t} \mu(u, X_u)\, du + \int_t^{t+\Delta t} \sigma(u, X_u)\, dW_u.$$
The Euler-Maruyama scheme approximates the solution on the interval by treating the drift and diffusion coefficients as constant over it. Thus, we get that the discretization of (2.20) is given by
$$X_{t_{k+1}} = X_{t_k} + \mu(t_k, X_{t_k})\, \Delta t_k + \sigma(t_k, X_{t_k})\, \Delta W_k,$$
where $\Delta t_k$ and $\Delta W_k$ are to be understood as $t_{k+1} - t_k$ and $W(t_{k+1}) - W(t_k)$ respectively.

It can be shown that the weak and strong order errors for the Euler-Maruyama scheme are as follows [22]
$$\text{Weak order} = \left|\mathbb{E}\left[X(T) - X(t_N)\right]\right| = O\!\left(\max_k \Delta t_k\right),$$
$$\text{Strong order} = \mathbb{E}\left[\left|X(T) - X(t_N)\right|\right] = O\!\left(\max_k \sqrt{\Delta t_k}\right).$$
A high strong order is usually preferred when working with filtering problems [22]. The Milstein scheme is a discretization method with a strong order error of $O(\Delta t)$, considerably better than Euler-Maruyama. It is derived by applying the Itô formula to the stochastic processes $\mu(t, X_t)$ and $\sigma(t, X_t)$; for a derivation, see [23]. The Milstein scheme then discretizes (2.20) as
$$\Delta X_{t_k} = \mu(t_k, X_{t_k})\, \Delta t_k + \sigma(t_k, X_{t_k})\, \Delta W_k + \frac{\sigma(t_k, X_{t_k})}{2}\, \frac{\partial \sigma}{\partial x}(t_k, X_{t_k})\left[(\Delta W_k)^2 - \Delta t_k\right].$$
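The two schemes translate directly into code. The sketch below shows one step of each for a generic scalar SDE; the function names and the geometric Brownian motion example are illustrative assumptions, not the processes used in the thesis.

```python
import numpy as np

def euler_maruyama_step(x, t, mu, sigma, dt, dw):
    # X_{k+1} = X_k + mu(t_k, X_k)*dt + sigma(t_k, X_k)*dW_k
    return x + mu(t, x) * dt + sigma(t, x) * dw

def milstein_step(x, t, mu, sigma, dsigma_dx, dt, dw):
    # adds the correction (sigma * dsigma/dx / 2) * ((dW_k)^2 - dt)
    return (x + mu(t, x) * dt + sigma(t, x) * dw
            + 0.5 * sigma(t, x) * dsigma_dx(t, x) * (dw**2 - dt))

# Example: geometric Brownian motion dX = 0.05*X dt + 0.2*X dW over one year
rng = np.random.default_rng(1)
dt, x_em, x_mil = 1 / 250, 100.0, 100.0
mu_f, sig_f, dsig_f = (lambda t, x: 0.05 * x), (lambda t, x: 0.2 * x), (lambda t, x: 0.2)
for step in range(250):
    dw = rng.normal(0.0, np.sqrt(dt))
    x_em = euler_maruyama_step(x_em, step * dt, mu_f, sig_f, dt, dw)
    x_mil = milstein_step(x_mil, step * dt, mu_f, sig_f, dsig_f, dt, dw)
```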

2.5 Problem description

In light of the previous sections, we are ready to formulate the problem of this thesis in detail. The main goal of this thesis is to, given observations, estimate the parameters of the s.v. models introduced in Section 2.2. There are two main approaches to such inference: (i) Bayesian and (ii) frequentist statistics. As SMC² is based on Bayesian statistics, we will focus solely on this approach. Therefore, assume that we are given a set of observations, denoted $y_{1:n}$. A Bayesian statistician models the unknown parameters as random variables, and estimates them via the posterior density
$$p(\theta \mid y_{1:n}),$$
which we from Bayes' theorem know is proportional to
$$p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta).$$
The second factor on the r.h.s. is called the prior, which summarizes the information already known concerning the parameters. As an example, consider the model of Section 2.1.1, where we had that $\alpha \in (0, 1)$, such that a natural prior is $\mathcal{U}(0, 1)$.

To circumvent this problem, we shall use data augmentation. The idea is to simulate the hidden process and use the simulated data when estimating $\theta$. A formal representation of the problem is thus that we are interested in the joint density
$$p(\theta, x_{1:n} \mid y_{1:n}) \propto p_\theta(x_{1:n}, y_{1:n})\, p(\theta).$$
The estimates of the states can thus be regarded as a by-product. We will not show how this density is estimated until Section 3.2, as we first need to specify in detail the HMMs under study as well as introduce the foundations of SMC².

2.5.1 Modeling

This section is divided into three parts: in the first we derive the emission density, in the second we derive the transition density, and in the third we present the priors of the parameters.

2.5.1.1 Emission density

From Section 2.3 we have that the option pricing formulas are deterministic functions of $S_t$, $V_t$, and $\theta$. This poses a problem in two ways: (i) the model specification is not that of an HMM, and (ii) we could get a singularity as there are multiple observations per $V_t$ [11]. The remedy is rather simple: we model the observable option prices as the deterministic functions together with a Gaussian noise sequence. Four different models are presented in [11], and we shall use ES2. Under this specification, the observed price of option $i$ at time $t_k$ is modeled as
$$H_{t_k}^i = \tilde{h}_\theta^i(S_{t_k}, V_{t_k}) + U_{t_k}^i, \quad (2.22)$$
where $\tilde{h}_\theta^i$ is the model specific option pricing formula for option $i$, and $(U_{t_k}^i)_{k \geq 1}$ a Gaussian noise process with a fixed standard deviation. Under this specification, the deviation, measured in monetary units, causes the so-called bid-ask spread, which is the interval
[highest price a buyer is willing to pay, lowest price a seller is willing to sell for].
It is easy to incorporate the deviation in the inference problem; however, we shall assume that it is both fixed and known.


2.5.1.2 Transition density

Discretizing the volatility process (2.15) of the SVCJ model using the Milstein scheme yields
$$\Delta V_t = \kappa(\theta - V_t)\, \Delta t + \sigma_v \sqrt{V_t}\, \Delta W_t^v + \frac{\sigma_v^2}{4}\left[(\Delta W_t^v)^2 - \Delta t\right] + Z\, \Delta N_t. \quad (2.24)$$
The dynamics of the Heston and Bates models are obtained by simply removing the jump process. While the conditional density is rather complicated to derive, it is easy to sample from, as samples are obtained by simply storing the values generated by simulating the process (2.24) from an initial value.

An obvious drawback of using the square-root process for modeling is that it is not allowed to assume negative values. While the parameters of the continuous time version can be chosen such that they satisfy the Feller condition, there is no discrete equivalent. This means that we need to resort to numerical tricks to ensure positivity. Four of the most common schemes are found in [24], together with a new one. We shall use the simple trick of taking the absolute value, such that
$$V_{t_{k+1}} = \left| V_{t_k} + \cdots \right|.$$

The next problem we are facing is choosing the initial density. A possible choice is the stationary distribution of (2.15). However, as we have not derived the density explicitly, we must look for other alternatives. One such is the stationary distribution of an Ornstein-Uhlenbeck process, motivated by the similarity in dynamics to the square-root process. The initial density would thus be given by
$$V_{t_1} \sim \mathcal{FN}\!\left(\theta, \frac{\sigma_v^2}{2\kappa}\right), \quad (2.25)$$
where $\mathcal{FN}$ is the folded normal distribution. That is, $X$ is said to be $\mathcal{FN}$ if
$$X = |Y|, \quad Y \sim \mathcal{N}(\mu, \sigma^2).$$
It has become known to the author, after already having obtained results, that the stationary distribution of (2.11) is known in closed form and given by
$$V_{t_1} \sim \mathcal{G}\!\left(\frac{2\kappa\theta}{\sigma_v^2}, \frac{\sigma_v^2}{2\kappa}\right). \quad (2.26)$$
The latter formulation is to be considered better, as it is possible to sample directly from the stationary distribution of interest.
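Sampling from the transition mechanism described above is straightforward in code. The sketch below draws the initial variance from the stationary Gamma distribution (2.26) and then iterates the Milstein step (2.24) with the absolute-value trick. The function names and parameter values are assumptions of this sketch, and $\mu_v$ is treated as the mean jump size.

```python
import numpy as np

def sample_v1(kappa, theta, sigma_v, rng):
    """Draw V_{t_1} from the Gamma stationary distribution (2.26)."""
    return rng.gamma(2 * kappa * theta / sigma_v**2, sigma_v**2 / (2 * kappa))

def svcj_variance_step(v, dt, kappa, theta, sigma_v, lam_n, mu_v, rng):
    """One Milstein step of (2.24) with |.| applied to keep the variance positive.
    The Heston and Bates transitions are recovered by setting lam_n = 0."""
    dw = rng.normal(0.0, np.sqrt(dt))
    dn = rng.poisson(lam_n * dt)                 # number of jumps in [t, t+dt]
    z = rng.exponential(mu_v, size=dn).sum()     # total exponential jump size (mean mu_v assumed)
    dv = (kappa * (theta - v) * dt + sigma_v * np.sqrt(v) * dw
          + 0.25 * sigma_v**2 * (dw**2 - dt) + z)
    return abs(v + dv)

rng = np.random.default_rng(0)
v = sample_v1(kappa=2.0, theta=0.04, sigma_v=0.3, rng=rng)
for _ in range(250):
    v = svcj_variance_step(v, 1 / 250, kappa=2.0, theta=0.04, sigma_v=0.3,
                           lam_n=2.0, mu_v=0.025, rng=rng)
```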


2.5.1.3 Priors of parameters in s.v. models

We will restrict ourselves when inferring the parameters, as we shall only consider
$$\theta = \{\kappa, \theta, \sigma_v, \rho, \lambda_N\},$$
out of the original
$$\theta = \{\kappa, \theta, \sigma_v, \rho, \lambda, \lambda_N, \mu_s, \sigma_s, \mu_v\}.$$
We do this mainly because of the limited computational power available to the author. It is easy to extend the problem to include the full collection of parameters when using SMC². However, the author found that the computational complexity became too high, as the algorithm had difficulties achieving "good" acceptance rates in the PMCMC step. More on this in Chapter 5, as we need to understand the algorithm in detail before presenting any detailed issues.

The last three parameters, together with $\lambda$, will thus be fixed during inference, and set as
$$\lambda = 0, \quad \mu_s = 0, \quad \sigma_s = 0.1, \quad \mu_v = 0.025.$$
The priors of the parameters in the s.v. models of interest are set to
$$\kappa \sim \mathcal{U}(0, 3), \quad \theta \sim \mathcal{U}(0, 0.6), \quad \sigma_v \sim \mathcal{U}(0, 0.6), \quad \rho \sim \mathcal{U}(-1, 1), \quad \lambda_N \sim \mathcal{U}(1.5, 3).$$


Chapter 3

SMC & SMC²

This chapter is divided into two sections; the first introduces the Monte Carlo toolbox and some of its contents, and in the second we present the SMC² algorithm found in [1]. We shall assume that the reader is somewhat familiar with MCMC methods, especially Metropolis-Hastings.

3.1 Monte Carlo toolbox

We begin by introducing perfect Monte Carlo, then show the principle of importance sampling, and finally present the foundation of SMC²: sequential Monte Carlo.

3.1.1 Standard Monte Carlo

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and $X$ a real-valued random variable s.t. $X : \Omega \to \mathsf{X} \subseteq \mathbb{R}^d$. Furthermore, assume that $X \in L^1(\Omega, \mathcal{F}, \mathbb{P})$. The goal is now to calculate some expectation $\tau = \mathbb{E}[\varphi(X)]$ for a given bounded Borel function $\varphi : \mathbb{R}^d \to \mathbb{R}^n$. From standard probability theory it is known that
$$\mathbb{E}[\varphi(X)] = \int_\Omega \varphi(X(\omega))\, d\mathbb{P}(\omega).$$
By defining the cumulative distribution function as $F(x) = \mathbb{P}(X \leq x)$, the above expectation can be rewritten as
$$\mathbb{E}[\varphi(X)] = \int_{\mathsf{X}} \varphi(x)\, dF(x) = \int_{\mathsf{X}} \varphi(x) f(x)\, dx, \quad (3.1)$$
where $f(x)$ is the density function.

One way of evaluating (3.1) numerically is to approximate it using Riemann sums. However, the error in deterministic numerical integration schemes is usually of order $O(N^{-c/d})$, where $c$ is a scheme-specific constant, $d$ the dimension of $\mathsf{X}$, and $N$ the number of function evaluations [17]. That is, for an error of $\epsilon$ it is found that [17]
$$CN^{-c/d} \leq \epsilon \implies N \geq \left(\frac{C}{\epsilon}\right)^{d/c}.$$
This means that the computational complexity increases exponentially with the dimension of $\mathsf{X}$ for a fixed error. This is sometimes referred to as the "curse of dimensionality".

Still, there is hope in the form of a scheme called Monte Carlo approximation. The idea is to utilize the (strong) Law of large numbers by drawing independent samples from the distribution of $X$, and letting these approximate the distribution of interest. That is, if $N$ samples of $X$, denoted $X^{1:N}$, have been generated, the distribution function is estimated by the empirical distribution
$$dF_N(x) \triangleq \frac{1}{N}\sum_{i=1}^{N} \delta_{X^i}(x),$$
such that by the strong LLN we have that
$$\int_{\mathsf{X}} \varphi(x)\, dF_N(x) = \frac{1}{N}\sum_{i=1}^{N} \varphi(X^i) \xrightarrow{\text{a.s.}} \int_{\mathsf{X}} \varphi(x)\, dF(x) \quad \text{as } N \to \infty.$$
We will often denote by $\tau$ and $\tau_N$ the actual and estimated mean respectively, i.e.
$$\tau = \int_{\mathsf{X}} \varphi(x)\, dF(x), \qquad \tau_N = \int_{\mathsf{X}} \varphi(x)\, dF_N(x).$$

While the error is random for Monte Carlo approximation, the Central limit theorem implies that [17]
$$\sqrt{N}\,(\tau_N - \tau) \xrightarrow{d} \mathcal{N}\!\left(0, \sigma^2(\varphi)\right), \quad (3.2)$$
if $\sigma^2(\varphi) < \infty$, where $\sigma^2(\varphi) = \mathbb{V}(\varphi(X))$. Furthermore, we find that
$$\mathbb{V}(\tau_N - \tau) = \mathbb{V}(\tau_N) = \mathbb{V}\!\left(\frac{1}{N}\sum_{i=1}^{N} \varphi(X^i)\right),$$
and as the samples $X^{1:N}$ are independent and of the same distribution as $X$, we obtain (moving out $N^{-1}$)
$$\mathbb{V}(\tau_N) = \frac{1}{N}\,\sigma^2(\varphi).$$
Using the relation between variance and standard deviation thus yields that
$$\mathbb{D}(\tau_N - \tau) = \frac{1}{\sqrt{N}}\,\sigma(\varphi).$$
The practical implication of the above result is that the error is of order $O(N^{-1/2})$, i.e. independent of the dimension of $\mathsf{X}$. This result, together with its simplicity of implementation, has made Monte Carlo the standard tool for calculating complicated expectations.
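A few lines of Python illustrate both the estimator and the $O(N^{-1/2})$ error behaviour; the particular test function and distribution below are purely illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Estimate tau = E[phi(X)] with X ~ N(0, 1) and phi(x) = x^2 (true value 1).
for n in (10**2, 10**4, 10**6):
    x = rng.normal(size=n)
    tau_n = np.mean(x**2)
    std_err = np.std(x**2, ddof=1) / np.sqrt(n)   # shrinks as 1/sqrt(n), regardless of dimension
    print(n, tau_n, std_err)
```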

3.1.2 Importance sampling

In some cases, the density in (3.1) is only known up to a normalizing constant
$$f(x) = \frac{z(x)}{c},$$
such that it is possible to evaluate $z(x)$, but not the quotient. As the normalizing constant is unknown, direct sampling becomes impossible. However, it is possible to circumvent this problem by using importance sampling. Thus, begin by defining an instrumental, or proposal, density $g(x)$ satisfying

(i) it is possible to sample from $g$,
(ii) $\mathrm{supp}(f) \subseteq \mathrm{supp}(g)$.

We will refer to any density as the target density if we are using importance sampling to obtain samples from it. In our case, the target density is $f$. Rewrite (3.1) as
$$\int_{\mathsf{X}} \varphi(x) f(x)\, dx = \frac{\int_{f>0} \varphi(x)\, c f(x)\, dx}{\int_{f>0} c f(x)\, dx} = \frac{\int_{z>0} \varphi(x)\, z(x)\, dx}{\int_{z>0} z(x)\, dx}.$$
By using the instrumental density, the above expression can further be rewritten as
$$\frac{\int_{z>0} \varphi(x)\, z(x)\, dx}{\int_{z>0} z(x)\, dx} = \frac{\int_{g>0} \varphi(x)\, \omega(x)\, g(x)\, dx}{\int_{g>0} \omega(x)\, g(x)\, dx}, \quad (3.3)$$
where $\omega$ is referred to as the importance weight, defined as
$$\omega(x) = \frac{z(x)}{g(x)}. \quad (3.4)$$

Monte Carlo approximation of (3.3) then yields the estimate
$$\sum_{i=1}^{N} \frac{\omega^i}{\sum_{j=1}^{N} \omega^j}\, \varphi(X^i),$$
where the samples $X^{1:N}$ are from the instrumental density, and $\omega^i \triangleq \omega(X^i)$. The empirical distribution function is thus seen to be given by
$$d\hat{F}_N(x) \triangleq \sum_{i=1}^{N} \frac{\omega^i}{\sum_{j=1}^{N} \omega^j}\, \delta_{X^i}(x). \quad (3.5)$$
While perfect Monte Carlo approximation is unbiased, IS is not, because of the quotient in (3.5) [25]. However, by assuming that the weights are bounded, the strong LLN is also valid for IS, i.e.
$$\sum_{i=1}^{N} \frac{\omega^i}{\sum_{j=1}^{N} \omega^j}\, \varphi(X^i) \xrightarrow{\text{a.s.}} \int_{\mathsf{X}} \varphi(x) f(x)\, dx \quad \text{as } N \to \infty.$$
Furthermore, it can also be proven, again under weak assumptions, that the error is of $O(N^{-1/2})$; the same as for perfect Monte Carlo. Importance sampling thus amounts to approximating the density of interest using weighted samples from an instrumental distribution. Pseudo-code is found in Algorithm 1.

Algorithm 1 IS
1: procedure IS
2:   for i = 1 to N do
3:     Draw X^i ∼ g(x)
4:     Calculate ω^i according to (3.4)
5:     Store (X^i, ω^i)
6:   end for
7: end procedure
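The sketch below applies Algorithm 1 with self-normalized weights to a target known only up to a constant; the particular target and instrumental density are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
# target z(x) known up to a constant; instrumental density g = N(0, 2^2)
x = rng.normal(0.0, 2.0, size=N)
z = np.exp(-x**2 / 2) * (1 + np.sin(3 * x)**2)
g = np.exp(-x**2 / (2 * 2.0**2)) / (2.0 * np.sqrt(2 * np.pi))
w = z / g                              # importance weights, eq. (3.4)
w_norm = w / w.sum()                   # normalization as in (3.5)
estimate = np.sum(w_norm * x**2)       # self-normalized estimate of E_f[X^2]
```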

3.1.3 Sequential Monte Carlo

In this subsection, we shall introduce the SMC methodology. We begin by presenting sequential IS, and then SIS with resampling.

3.1.3.1 Sequential importance sampling

Let the goal be to estimate the sequence of expectations
$$\tau_n = \int_{\mathsf{X}^n} \varphi(x_{1:n})\, f_n(x_{1:n})\, dx_{1:n}, \quad n = 1, 2, \ldots \quad (3.6)$$
We let the setting be the same as in the previous section, i.e. that the density $f_n$ is only known up to a normalizing constant
$$f_n(x_{1:n}) = \frac{z_n(x_{1:n})}{c_n}. \quad (3.7)$$


Assume that we have already drawn the particles $\{\omega_n^i, X_{1:n}^i\}_{i=1}^N$ using IS, where we used the shorthand notation $\omega_n^i \triangleq \omega(X_{1:n}^i)$. We now wish to draw the next sample $X_{n+1}^i$ and calculate the next weight $\omega_{n+1}^i$ for $i = 1, \ldots, N$. An apparent issue with IS is that we would have to redraw the entire path just to get the next sample; a clear waste of computational power. We would therefore like to construct an algorithm such that [17]

• the next samples $X_{1:n+1}^i$ and corresponding weights $\omega_{n+1}^i$ are sampled and calculated without any increase in complexity (w.r.t. $n$),
• the approximations remain numerically stable in the long run.

We begin by finding a way to sequentially draw the next sample. Therefore, define the proposal density as $g_n(x_{1:n})$. The trick is now to use Bayes' theorem and define $g_n$ as a sequence of conditional densities [17], such that
$$g_n(x_{1:n}) = g_1(x_1) \prod_{k=1}^{n-1} g_k(x_{k+1} \mid x_{1:k}).$$
From the above result we find that
$$g_{n+1}(x_{1:n+1}) = g_1(x_1) \prod_{k=1}^{n} g_k(x_{k+1} \mid x_{1:k}) = g_n(x_{n+1} \mid x_{1:n})\, g_n(x_{1:n}).$$
The practical implication of this formulation is that if we have already drawn $X_{1:n}^i$ from $g_n$, then $X_{n+1}^i$ can be sampled from the conditional density $g_n(x_{n+1} \mid x_{1:n})$, such that
$$X_{1:n+1}^i = \left(X_{1:n}^i, X_{n+1}^i\right)$$
forms a draw from $g_{n+1}$ [17]. It then remains to find a way to recursively update the weights. By using (3.4), we find that [17]
$$\omega_{n+1}(x_{1:n+1}) = \frac{z_{n+1}(x_{1:n+1})}{g_{n+1}(x_{1:n+1})}\, \frac{z_n(x_{1:n})}{z_n(x_{1:n})} = \frac{z_{n+1}(x_{1:n+1})}{g_n(x_{n+1} \mid x_{1:n})\, z_n(x_{1:n})}\, \frac{z_n(x_{1:n})}{g_n(x_{1:n})}.$$
We identify the last quotient on the r.h.s. to be $\omega_n(x_{1:n})$, such that
$$\omega_{n+1}(x_{1:n+1}) = \frac{z_{n+1}(x_{1:n+1})}{g_n(x_{n+1} \mid x_{1:n})\, z_n(x_{1:n})}\, \omega_n(x_{1:n}). \quad (3.8)$$
We have thus found a way to recursively update both samples and weights. A nice feature of SIS is that the normalizing constant in (3.7) is estimated by [17]
$$c_n \approx \frac{1}{N} \sum_{i=1}^{N} \omega_n\!\left(X_{1:n}^i\right).$$

This algorithm, often referred to as sequential importance sampling, seems to satisfy the criteria we formulated earlier.


Algorithm 2 SIS
1: procedure SIS
2:   Draw X_1^i ∼ g_1(x_1), i = 1, 2, ..., N
3:   Calculate ω_1^i using (3.4)
4:   for k = 2 to T do
5:     Draw X_k^i ∼ g_{k-1}(x_k | X_{1:k-1}^i)
6:     Set X_{1:k}^i ← (X_{1:k-1}^i, X_k^i)
7:     Calculate ω_k^i recursively using (3.8)
8:   end for
9: end procedure

3.1.3.2 SIS with resampling

While SIS in theory satisfies our criteria, any practical implementation will soon demonstrate that as the time index increases, the importance weights will tend to zero because of the recursive multiplication in (3.8). In fact, after some time there will be only one particle whose normalized importance weight is non-zero, such that the considered collection of particles fails to represent the target density [25, 26]. This will be referred to as degeneration. A solution to this degeneration is proposed in [27]: resample among the particles before propagating them from $n$ to $n+1$. The added resampling step eliminates particles with small importance weights that are deemed to be far from the target density, and duplicates those with larger weights; see Figure 3.1 for a graphical illustration.


Figure 3.1: Given the original blue particles and their corresponding green weights, these particles are resampled, yielding the orange particles used for propagation. Note that bigger circles imply a larger weight.

By denoting the surviving particles as $\tilde{X}_{1:n}^i$, the target density is now approximated by the unweighted measure
$$dF_N^n(x) \triangleq \frac{1}{N}\sum_{i=1}^{N} \tilde{N}^i\, \delta_{\tilde{X}_{1:n}^i}(x), \quad (3.9)$$
where $\tilde{N}^i$ is the number of copies of particle $i$, s.t. $\sum_i \tilde{N}^i = N$ [25]. In practice, this amounts to setting the weights to 1 after the resampling step. We thus get that $\omega_{n+1}(x_{1:n+1})$ is given by

$$\omega_{n+1}(x_{1:n+1}) = \frac{z_{n+1}(x_{1:n+1})}{g_n(x_{n+1} \mid x_{1:n})\, z_n(x_{1:n})}. \quad (3.10)$$
Since the empirical distribution (3.9) is now unweighted, the estimate
$$\mathbb{E}\left[\varphi(X_{1:n})\right] \approx \frac{1}{N}\sum_{i=1}^{N} \varphi\!\left(\tilde{X}_{1:n}^i\right)$$
is unbiased. Furthermore, the normalizing constant is now estimated by [17]
$$c_n \approx \frac{1}{N^n} \prod_{k=1}^{n}\left\{\sum_{i=1}^{N} \omega_k\!\left(\tilde{X}_{1:k}^i\right)\right\}. \quad (3.11)$$
Although the estimation is unbiased, there will be correlation between the resampled particles. This makes analysis of the algorithm rather difficult, and the variance of the estimation will be somewhat higher than that of standard IS [28]. Although difficult to analyze, it can be proven, under weak assumptions, that [17]
$$\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N} \varphi\!\left(\tilde{X}_{1:n}^i\right) - \tau_n\right) \xrightarrow{d} \mathcal{N}\!\left(0, \sigma^2(\varphi)\right), \quad \text{as } N \to \infty,$$
such that the error is $O(N^{-1/2})$. Furthermore, from Theorem 4.3.1 of [25] it is found that for all $t$ there exists a constant $c_t$, independent of $N$, such that
$$\mathbb{E}\left[\left(\frac{1}{N}\sum_{i=1}^{N} \varphi\!\left(\tilde{X}_{1:n}^i\right) - \int_{\mathsf{X}^n} \varphi(x_{1:n})\, f_n(x_{1:n})\, dx_{1:n}\right)^2\right] \leq \frac{c_t\, \|\varphi_n\|_\infty^2}{N}, \quad \text{where } \|\varphi_n\|_\infty \triangleq \sup_{x_{1:n}} |\varphi(x_{1:n})|,$$
which means that the variance in the CLT has an upper bound.

There are many different resampling schemes available, and a good exposition of the most common ones is found in [29]. We will collectively refer to any SMC implementation targeting filtering of HMMs as a particle filter. Pseudo-code for SISR is found in Algorithm 3.


Algorithm 3 SISR
1: procedure SISR
2:   Draw X_1^i ∼ g_θ(x_1), i = 1, 2, ..., N
3:   Calculate ω_1^i according to (3.10)
4:   Set X̃_1^i ← X_1^i
5:   for k = 2 to n do
6:     Draw a^i ∝ ω_{k-1}^i
7:     Set X̃_{1:k-1}^i ← (X̃_{1:k-2}^i, X_{k-1}^{a^i})
8:     Draw X_k^i ∼ g_θ(x | X̃_{1:k-1}^i)
9:     Set X_{1:k}^i ← (X̃_{1:k-1}^i, X_k^i)
10:    Calculate ω_k^i using (3.10)
11:  end for
12: end procedure

3.1.3.3 Drawbacks

The drawback of using SISR is that the resampling step depletes the number of distinct particles as the time index increases [17]. This in turn has a negative effect on the estimation, as the variance also increases [17]. One should therefore avoid resampling, and only do it when necessary. However, this is not implemented in the particle filters of this thesis.

We will use the effective sample size, or ESS for short, to measure the number of distinct particles. In [1], it is measured as

$$\mathrm{ESS} = \frac{\left(\sum_{i=1}^{N} \omega_n^i\right)^2}{\sum_{i=1}^{N} (\omega_n^i)^2}, \quad (3.12)$$
such that when all weights are equal, $\omega_n^i = \omega_n^j$ for $j \neq i$, we get the maximum $\mathrm{ESS} = N$. ESS will be the quantity we use for measuring the number of distinct particles.
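In code, (3.12) is a one-liner; the function name below is an assumption of this sketch.

```python
import numpy as np

def effective_sample_size(weights):
    """ESS of eq. (3.12): (sum w)^2 / sum(w^2), equal to N for uniform weights."""
    w = np.asarray(weights, dtype=float)
    return w.sum()**2 / np.sum(w**2)
```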

Although there are issues pertaining to SISR, it performs well when considering [17]

• filtering of HMMs, i.e. when $\varphi$ of (3.6) is defined as $\varphi(x_{1:n}) = x_n$,
• estimating the normalizing constant of (3.7).

3.1.3.4 Example

Let us consider using SISR to filter the states of the HMM in Section 2.1.1. We assume that $\theta$ is known, and let $y_{1:n}$ denote observations. When considering filtering, the sequence of expectations that we are interested in is given by
$$\tau_n = \mathbb{E}_\theta\left[X_n \mid y_{1:n}\right] = \int_{\mathsf{X}^n} x_n\, f_\theta(x_{1:n} \mid y_{1:n})\, dx_{1:n}.$$
We thus see that the target density is (2.6), from which we see that $z_n(x_{1:n})$ of (3.7) is given by
$$z_n(x_{1:n}) = p_\theta(x_{1:n}, y_{1:n}).$$
Furthermore, we also see that the normalizing constant $c_n$ corresponds to (2.7), which by (3.11) is estimated by
$$p_\theta(y_{1:n}) \approx \hat{p}_\theta(y_{1:n}) = \frac{1}{N^n} \prod_{k=1}^{n}\left\{\sum_{i=1}^{N} \omega_k\!\left(X_{1:k}^i\right)\right\}. \quad (3.13)$$
The problem then lies in defining the proposal density. In [27], the proposal is set as the density of the hidden process, such that the weight (3.10) is given by
$$\omega_n(x_{1:n}) = \omega_n(x_n) = p_\theta(y_n \mid x_n).$$

This filter is in [27] referred to as the bootstrap particle filter. There is also the fully adapted particle filter (faPF) of [30]. The idea behind the faPF is to construct the proposal as $g_n(x_{1:n}) = g_\theta(x_n \mid x_{n-1}, y_n)$, such that the weight is given by
$$\omega_n(x_{1:n}) = \omega_n(x_{n-1}, x_n) = \frac{p_\theta(y_n \mid x_n)\, q_\theta(x_n \mid x_{n-1})}{g_\theta(x_n \mid x_{n-1}, y_n)}.$$
It is possible to prove that the faPF is the optimal choice in terms of minimizing the variance of the importance weights [26].
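To tie the example together, the sketch below implements a bootstrap particle filter for the Taylor model, returning filtered state estimates and the log of the likelihood estimate (3.13). Multinomial resampling at every step, the stationary initial draw, and the log-sum-exp weight handling are implementation choices of this sketch, not necessarily those of the thesis.

```python
import numpy as np

def bootstrap_filter(y, alpha, beta, sigma, N, rng):
    """Bootstrap PF for the Taylor model of Section 2.1.1 (eqs. (2.8)-(2.9))."""
    T = len(y)
    x = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2), size=N)   # assumed initial density
    filt_mean = np.empty(T)
    loglik = 0.0
    for k in range(T):
        if k > 0:
            idx = rng.choice(N, size=N, p=w)                  # multinomial resampling
            x = alpha * x[idx] + sigma * rng.normal(size=N)   # propagate through q_theta
        var = beta**2 * np.exp(x)                             # bootstrap weight: p_theta(y_k | x_k)
        logw = -0.5 * (np.log(2 * np.pi * var) + y[k]**2 / var)
        m = logw.max()
        w_un = np.exp(logw - m)
        loglik += m + np.log(w_un.mean())                     # log of the increment in (3.13)/(3.18)
        w = w_un / w_un.sum()
        filt_mean[k] = np.sum(w * x)
    return filt_mean, loglik
```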

3.2 SMC²

SMC² is a "black-box" algorithm developed for joint sequential estimation of parameters and states of an arbitrary HMM [1]. More formally, SMC² is used to estimate the sequence of densities

$$p(\theta, x_{1:n} \mid y_{1:n}), \quad n = 1, 2, \ldots \quad (3.14)$$
The two main ingredients of SMC² are the SISR of Section 3.1.3.2 and IBIS. IBIS is an SIS implementation for finding the sequence of parameter posteriors
$$p(\theta \mid y_{1:n}) \propto p_\theta(y_{1:n})\, p(\theta), \quad n = 1, 2, \ldots, \quad (3.15)$$
with an occasional MCMC rejuvenation step to keep the number of distinct particles from decreasing as the time index increases [31].

To derive a sequential estimation of (3.15), [31] uses Bayes' theorem to find that (with the convention $p_\theta(y_1 \mid y_{1:0}) = p_\theta(y_1)$)
$$p_\theta(y_{1:n}) = \prod_{k=1}^{n} p_\theta(y_k \mid y_{1:k-1}),$$

such that at time step $k$, the importance weight is given by
$$p_\theta(y_k \mid y_{1:k-1}). \quad (3.16)$$
It is thus possible to use IBIS for sequential inference in any statistical model whenever the above density is known in closed form. For an HMM, we see that the above density is given by
$$p_\theta(y_k \mid y_{1:k-1}) = \frac{\int_{\mathsf{X}^k} p_\theta(x_{1:k}, y_{1:k})\, dx_{1:k}}{p_\theta(y_{1:k-1})}.$$
Refactoring the joint density and moving the normalizing constant inside the integral yields
$$p_\theta(y_k \mid y_{1:k-1}) = \int_{\mathsf{X}^k} p_\theta(y_k \mid x_k)\, q_\theta(x_k \mid x_{k-1})\, \frac{p_\theta(x_{1:k-1}, y_{1:k-1})}{p_\theta(y_{1:k-1})}\, dx_{1:k}.$$
By noting that the quotient is in fact (2.6), we get
$$p_\theta(y_k \mid y_{1:k-1}) = \int_{\mathsf{X}^k} p_\theta(y_k \mid x_k)\, q_\theta(x_k \mid x_{k-1})\, p_\theta(x_{1:k-1} \mid y_{1:k-1})\, dx_{1:k}, \quad (3.17)$$
which, unfortunately, is rarely known in closed form. However, by using a proposal density, this integral may be rewritten as [32]
$$p_\theta(y_k \mid y_{1:k-1}) = \int_{\mathsf{X}^k} \omega_k(x_{1:k})\, p_\theta(x_{1:k-1} \mid y_{1:k-1})\, g_k(x_{1:k})\, dx_{1:k}.$$
From the previous section we know that particle filters can be used to approximate integrals such as the above, so that by using a SISR particle filter of size $N$, (3.16) is estimated by
$$\hat{p}_\theta(y_k \mid y_{1:k-1}) = \frac{1}{N} \sum_{i=1}^{N} \omega_k\!\left(X_{1:k}^i\right). \quad (3.18)$$
Thus, by coupling each parameter particle in IBIS with its own particle filter of size $N$, we can estimate (3.16) using (3.18). The name "SMC²" thus comes from the fact that it comprises two nested SMC-based methods [1]; for a visualization, see Figure 3.2. From now on, we shall denote the number of parameter particles by $M$, and the number of particles in each particle filter associated with $\theta^m$ by $N$. By the asymptotic behaviour of particle filters, we recover IBIS by letting $N \to \infty$, which in practice is of course impossible to achieve.


Algorithm 4 SMC²
1: procedure SMC²
2:   Draw θ^m ∼ p(θ), set ω^m ← 1
3:   for k = 1 to n do
4:     if k = 1 then
5:       Draw X_1^{i,m} ∼ χ_{θ^m}(x)
6:       Calculate p̂_{θ^m}(y_1) using (3.18)
7:       Calculate ω_1^{i,m} using (3.10)
8:     else
9:       Sample a^{i,m} ∝ ω_{k-1}^{i,m}
10:      if k = 2 then
11:        Set X̃_1^{i,m} ← X_1^{a^{i,m}}
12:      else
13:        Set X̃_{1:k-1}^{i,m} ← (X_{1:k-2}^{i,m}, X_{k-1}^{a^{i,m}})
14:      end if
15:      Draw X_k^{i,m} ∼ g_θ(x | X̃_{1:k-1}^{i,m})
16:      Set X_{1:k}^{i,m} ← (X̃_{1:k-1}^{i,m}, X_k^{i,m})
17:      Calculate p̂_{θ^m}(y_k | y_{1:k-1}) using (3.18)
18:      Calculate ω_k^{i,m} using (3.10)
19:    end if
20:    Update ω^m ← ω^m · p̂_{θ^m}(y_k | y_{1:k-1})
21:    if ESS < εM then
22:      Perform PMCMC rejuvenation
23:    end if
24:  end for
25: end procedure

The MCMC step in IBIS is replaced by a particle MCMC step in SMC². Particle MCMC is a method of inference, similar to ordinary MCMC, that targets (3.14), but uses a particle filter as proposal for updating the $x_{1:n}$-component [32]. Three different schemes are proposed in [32], but the one of interest in this thesis is the particle marginal Metropolis-Hastings (PMMH). The advantage of using PMMH is that it updates both the $\theta$- and $x_{1:n}$-components simultaneously, without having to know the full conditional density of the parameters. To achieve this update, [32] constructs the proposal as
$$r\!\left(\{\bar{\theta}, \bar{x}_{1:n}\} \mid \{\theta, x_{1:n}\}\right) = r(\bar{\theta} \mid \theta)\, f_{\bar{\theta}}(\bar{x}_{1:n} \mid y_{1:n}),$$
where $r$ is a proposal kernel, and $f_{\bar{\theta}}$ the smoothing density (2.6), "adapted" to the new $\bar{\theta}$. The acceptance probability $\alpha$ for PMMH is in [32] found to be given by
$$\alpha\!\left(\{\bar{\theta}, \bar{x}_{1:n}\}, \{\theta, x_{1:n}\}\right) = 1 \wedge \frac{p_{\bar{\theta}}(y_{1:n})\, p(\bar{\theta})\, r(\theta \mid \bar{\theta})}{p_\theta(y_{1:n})\, p(\theta)\, r(\bar{\theta} \mid \theta)}.$$
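In an implementation the ratio above is best evaluated in log-space, with the exact likelihoods replaced by their particle filter estimates, as in SMC². A minimal sketch of such an accept/reject step follows; the function and argument names are assumptions of this sketch.

```python
import numpy as np

def pmmh_accept(log_zhat_new, log_prior_new, log_r_old_given_new,
                log_zhat_old, log_prior_old, log_r_new_given_old, rng):
    """Accept/reject of the PMMH ratio, with hat{p}_theta(y_{1:n}) in log-space."""
    log_alpha = (log_zhat_new + log_prior_new + log_r_old_given_new
                 - log_zhat_old - log_prior_old - log_r_new_given_old)
    return np.log(rng.uniform()) < min(0.0, log_alpha)
```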


Figure 3.2: Graphical representation of all particles at t = 2 for M = 2 and N = 2. In this fictive iteration, particle $X_1^{2,1}$ did not survive the first resampling step.

We thus see that the name "particle marginal MH" comes from the fact that only the marginal likelihood function $p_\theta(y_{1:n})$ is considered in the acceptance probability.

We shall now describe the rejuvenation step of SMC² in detail. Thus, assume that we are at time index $n_0$, and have so far generated the particles $\theta^m$, $X_{1:n_0}^{1:N,m}$, and weights $\omega^m$ for $m = 1, 2, \ldots, M$. If $\mathrm{ESS} < \epsilon M$ for some $\epsilon \in (0, 1)$, we rejuvenate the particles. The first step is to resample among the parameter particles, and denote the surviving ones as $\tilde{\theta}^m$. Next, we mutate $\tilde{\theta}^m \to \bar{\theta}^m$ using the proposal
$$\bar{\theta}^m \mid \tilde{\theta}^m \sim \mathcal{N}\!\left(\hat{\mu}, \hat{\Sigma}\right) =: r(\bar{\theta} \mid \tilde{\theta}), \quad (3.20)$$
where $\hat{\mu}$ and $\hat{\Sigma}$ are the estimated mean and covariance respectively, given by
$$\hat{\mu} = \frac{\sum_{m=1}^{M} \omega^m \theta^m}{\sum_{m=1}^{M} \omega^m}, \qquad \hat{\Sigma} = \frac{1}{\sum_{m=1}^{M} \omega^m} \sum_{m=1}^{M} \omega^m (\theta^m - \hat{\mu})(\theta^m - \hat{\mu})^T.$$

It is also possible to drop the weights and use the resampled particles instead, although the above formulation results in somewhat lower variance. The advantage of (3.20) is twofold: (i) the proposal is automatically calibrated, and (ii) new particles are drawn in close proximity to where most of the probability mass is located [31]. In the MCMC context, the above proposal corresponds to an independent one. For each proposed parameter particle $\bar{\theta}^m$, we run a new particle filter of size $N$ up to and including time $n_0$, yielding the samples $\bar{X}_{1:n_0}^{1:N,m}$. Next, we accept $\bar{\theta}^m$ with probability

$$1 \wedge \frac{\hat{Z}\!\left(\bar{\theta}^m, \bar{X}_{1:n_0}^{1:N,m}\right) p(\bar{\theta}^m)\, r(\tilde{\theta}^m \mid \bar{\theta}^m)}{\hat{Z}\!\left(\tilde{\theta}^m, X_{1:n_0}^{1:N,m}\right) p(\tilde{\theta}^m)\, r(\bar{\theta}^m \mid \tilde{\theta}^m)},$$
where we for readability have defined $\hat{Z}$ as
$$\hat{p}_{\theta^m}(y_{1:n_0}) = \hat{Z}\!\left(\theta^m, X_{1:n_0}^{1:N,m}\right) = \frac{1}{N^{n_0}} \prod_{k=1}^{n_0}\left\{\sum_{i=1}^{N} \omega_k\!\left(X_{1:k}^{i,m}\right)\right\}.$$

We then return the particles to the main algorithm and set the weights to 1, i.e.
$$\left(\left\{\theta^m, X_{1:n_0}^{1:N,m}\right\}, \omega^m\right) \leftarrow \left(\left\{\bar{\theta}^m, \bar{X}_{1:n_0}^{1:N,m}\right\}, 1\right). \quad (3.21)$$
In [1], it is shown that for any measurable function $\varphi$, $\mathbb{E}[\varphi(\theta, X_{1:k})]$ is estimated by
$$\mathbb{E}\left[\varphi(\theta, X_{1:k})\right] \approx \frac{1}{\sum_{m=1}^{M} \omega^m} \sum_{m=1}^{M} \omega^m \left\{\sum_{i=1}^{N} W_k^{i,m}\, \varphi\!\left(\theta^m, X_{1:k}^{i,m}\right)\right\},$$
where $W_k^{i,m}$ denotes the normalized importance weight $\omega_k^{i,m}$. It is possible to use the resampled particles as well; however, it is found in [1] that the above estimation is of slightly lower variance. To use the resampled particles, simply remove the normalized weights $W_k^{i,m}$.
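The proposal calibration in (3.20) amounts to a weighted mean and covariance over the parameter particles. A minimal sketch follows; the function names are assumptions of this sketch.

```python
import numpy as np

def proposal_moments(thetas, weights):
    """Weighted mean mu_hat and covariance Sigma_hat used in (3.20).
    thetas: (M, d) array of parameter particles, weights: (M,) array."""
    thetas = np.asarray(thetas, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = w @ thetas
    centered = thetas - mu
    cov = (centered * w[:, None]).T @ centered
    return mu, cov

def propose_theta(mu, cov, rng):
    """Draw a mutated parameter particle from the independent Gaussian proposal."""
    return rng.multivariate_normal(mu, cov)
```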

The performance of SMC² is dictated by the acceptance rate in the PMCMC rejuvenation step; if it is low, the algorithm will suffer from poor mixing, as new particles are seldom accepted. Additionally, finding a fixed value for $N$ which results in both reasonable performance, in terms of acceptance rate and speed, and a small Monte Carlo error is often tedious [1]. Furthermore, [32] finds that to achieve good acceptance rates in PMMH, the number of particles in the particle filter should be $N = O(t)$, i.e. dependent on the number of datapoints currently considered. It would therefore be efficient to let the algorithm "decide" the size of $N$, which is a feature that can be added to SMC² [1].

Two different strategies for achieving this are proposed in [1], namely the exchange importance sampling step and the conditional SMC step. The main practical difference between the two is that the first is less memory intensive, as the second requires storing the full history of the particles in the particle filters. Because of the limited computational power available, we choose the former method. If the acceptance rate in the PMCMC rejuvenation step falls below some pre-specified threshold, e.g. 20%, we increase the number of particles in the particle filters. Let us consider an example. Assume that the acceptance rate fell below 20% while rejuvenating the particles at time step $n_0$. We then generate $\check{N}$ new particles $\check{X}_{1:n_0}^{1:\check{N},m}$ for each parameter using our particle filter. The particles are then exchanged via the importance sampling step of [33, 1], and the weight of each parameter particle is updated with the factor
$$u_m^{\mathrm{exch}} = \frac{\hat{Z}\!\left(\bar{\theta}^m, \check{X}_{1:n_0}^{1:\check{N},m}\right)}{\hat{Z}\!\left(\bar{\theta}^m, \bar{X}_{1:n_0}^{1:N,m}\right)}.$$
That is, instead of returning (3.21), we return
$$\left(\left\{\theta^m, X_{1:n_0}^{1:N,m}\right\}, \omega^m\right) \leftarrow \left(\left\{\bar{\theta}^m, \check{X}_{1:n_0}^{1:\check{N},m}\right\}, u_m^{\mathrm{exch}}\right).$$
The new number of particles, $\check{N}$, can be chosen arbitrarily, but it is in [1] suggested to set $\check{N} = 2N$.

3.2.1 Drawbacks

While SMC² is a sequential algorithm, it is not online because of the PMCMC rejuvenation step [1]; it is found that the complexity increases quadratically with the time index, which in turn means that it is not suited for real-time applications. Before implementing SMC², one should be aware that it is rather memory intensive if one is interested in performing smoothing, as this requires, for each time step, storing all currently considered particles in the particle filters. However, the memory cost is greatly reduced if only filtering is considered, as we need only save the particles in the particle filters for the length of two time indices. That is, the particles at time $k$ need only be stored until $k + 1$. If smoothing is needed, the author suggests using HDF5. Furthermore, the speed of the algorithm can be greatly increased by parallelizing the PMCMC rejuvenation step, as we do not need to "join" the particles at each time step and calculate the ESS. This is however not implemented in this thesis, as parallelization is used for the option pricing step.


Chapter 4

Results

In this chapter, we present the results of using SMC² for inferring parameters and states of the stochastic volatility models in Sections 2.2 and 2.1.1. However, we shall first consider two simpler examples to verify that the algorithm works as intended.

SMC² will be configured as follows:

• we heed the recommendations of [29] and use systematic resampling for all resampling procedures,

• the rejuvenation step is triggered if $\mathrm{ESS} < 0.2M$,

• if the acceptance rate in the PMCMC rejuvenation falls below 20% we increase the number of particles in the particle filter,

• we use the exchange importance sampling step whenever the number of particles in the particle filters needs to be increased, and this number is always doubled.
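For reference, the first two configuration points correspond to routines of roughly the following form. This is a generic NumPy sketch of systematic resampling and of the effective sample size, not the exact code used in the implementation.

import numpy as np

def systematic_resampling(weights, rng=np.random.default_rng()):
    """Systematic resampling: a single uniform draw defines M evenly
    spaced points on [0, 1), which are inverted through the CDF of
    the normalized weights."""
    M = len(weights)
    positions = (rng.uniform() + np.arange(M)) / M
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                       # guard against round-off error
    return np.searchsorted(cdf, positions)

def ess(weights):
    """Effective sample size of normalized weights; rejuvenation is
    triggered when this drops below 0.2 * M."""
    return 1.0 / np.sum(weights ** 2)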

4.1 Benchmarks

We benchmark the algorithm with the linear Gaussian model, and a non-linear, non-Gaussian model often used to benchmark SMC algorithms [32].

4.1.1 Linear Gaussian model

The linear Gaussian model is an HMM with the following dynamics:
$$X_t = X_{t-1} + \sigma_x \epsilon_{t-1}, \quad \epsilon_{t-1} \sim \mathcal{N}(0, 1),$$
$$Y_t = X_t + \sigma_y \eta_t, \quad \eta_t \sim \mathcal{N}(0, 1), \quad X_1 \sim \mathcal{N}(0, 1).$$


Figure 4.1: Realizations of the simulated processes.

We easily see that the emission density is given by $Y_t \mid X_t = x \sim \mathcal{N}(x, \sigma_y^2)$.

By using the bootstrap filter, the importance weight is simply given by $\omega_t(x_{1:t}) = p_\theta(y_t \mid x_t)$,

and the transition density given by

$X_t \mid X_{t-1} = x \sim \mathcal{N}(x, \sigma_x^2)$. We use similar prior distributions as in [1], that is

$\sigma_x \sim \mathcal{IG}(0, 2), \quad \sigma_y \sim \mathcal{IG}(0, 2)$.
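Combining the transition, the emission weight, and the resampling scheme, one step of the bootstrap filter for this model can be sketched as follows; this is only an illustration with names of my own choosing, and the local resampler repeats the systematic scheme sketched before Section 4.1.

import numpy as np

def _systematic(w, rng):
    """Systematic resampling (same scheme as the earlier sketch)."""
    cdf = np.cumsum(w)
    cdf[-1] = 1.0
    positions = (rng.uniform() + np.arange(len(w))) / len(w)
    return np.searchsorted(cdf, positions)

def bootstrap_step(x_prev, w_prev, y_t, sigma_x, sigma_y,
                   rng=np.random.default_rng()):
    """One bootstrap-filter step for the linear Gaussian model:
    resample, propagate X_t | X_{t-1} ~ N(X_{t-1}, sigma_x^2),
    and reweight with p_theta(y_t | x_t)."""
    N = len(x_prev)
    idx = _systematic(w_prev, rng)                          # ancestor indices
    x_new = x_prev[idx] + sigma_x * rng.standard_normal(N)  # propagate
    logw = (-0.5 * ((y_t - x_new) / sigma_y) ** 2
            - 0.5 * np.log(2.0 * np.pi * sigma_y ** 2))     # log N(y_t; x_t, sigma_y^2)
    w = np.exp(logw - logw.max())                           # numerical stabilization
    return x_new, w / w.sum()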

The algorithm is then initialized with M = 3,000 parameter particles and N = 50 particles in the particle filters. Figure 4.2 shows the filtered estimates of the hidden process up to time step 100.

As seen in Figure 4.2, the estimates of the SMC2 and the Kalman filter are congruent. Figure 4.3 shows the posterior distributions of the parameters together with the true values, represented as a red line.
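For completeness, the exact Kalman recursion for this model (scalar state with unit transition and observation coefficients) is sketched below; the variable names are my own, and the prior $X_1 \sim \mathcal{N}(0, 1)$ enters as the initial mean and variance.

import numpy as np

def kalman_filter(y, sigma_x, sigma_y, m0=0.0, P0=1.0):
    """Exact filtering means and variances for
    X_t = X_{t-1} + sigma_x * eps_t, Y_t = X_t + sigma_y * eta_t,
    with X_1 ~ N(m0, P0)."""
    means, variances = [], []
    m, P = m0, P0
    for t, y_t in enumerate(y):
        if t > 0:
            P = P + sigma_x ** 2            # predict: add transition noise
        K = P / (P + sigma_y ** 2)          # Kalman gain
        m = m + K * (y_t - m)               # update with observation y_t
        P = (1.0 - K) * P
        means.append(m)
        variances.append(P)
    return np.array(means), np.array(variances)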


Figure 4.2: Filtered estimates of the hidden process together with the exact Kalman filter.

Figure 4.3: The parameter posteriors.

4.1.2 Non-linear HMM

To verify that the algorithm performs well on more complex HMMs, we will consider the following non-linear model [25, 32],

$$X_t = \frac{X_{t-1}}{2} + 25\,\frac{X_{t-1}}{1 + X_{t-1}^2} + 8\cos(1.2t) + V_t, \quad V_t \sim \mathcal{N}(0, \sigma_V^2),$$
$$Y_t = \frac{X_t^2}{20} + W_t, \quad W_t \sim \mathcal{N}(0, \sigma_W^2), \quad X_1 \sim \mathcal{N}(0, 5).$$
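As an illustration, a synthetic dataset from this model can be simulated as in the sketch below; the noise standard deviations are passed explicitly (the parameterization of θ below is not restated here), and $\mathcal{N}(0, 5)$ is read as a variance of 5.

import numpy as np

def simulate_nonlinear_hmm(T, sigma_V, sigma_W, rng=np.random.default_rng()):
    """Simulate T steps of the non-linear benchmark HMM of [25, 32]."""
    x = np.empty(T)
    x[0] = rng.normal(0.0, np.sqrt(5.0))             # X_1 ~ N(0, 5)
    for t in range(1, T):
        x[t] = (x[t - 1] / 2.0
                + 25.0 * x[t - 1] / (1.0 + x[t - 1] ** 2)
                + 8.0 * np.cos(1.2 * (t + 1))        # 0-based index t is time t + 1
                + sigma_V * rng.standard_normal())
    y = x ** 2 / 20.0 + sigma_W * rng.standard_normal(T)
    return x, y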


Inference is based on a synthetic dataset comprising 1,000 observations generated with θ = {4, 5}; Figure 4.4 shows the realizations.

Figure 4.4: The realizations of the processes.

We easily see from the dynamics of the model that the emission density is given by

$Y_t \mid X_t = x \sim \mathcal{N}\!\left(x^2/20, \sigma_W^2\right)$.

The proposal density will again be defined by the underlying Markov chain, which is seen to be $X_t \mid X_{t-1} = x \sim \mathcal{N}\!\left(\frac{x}{2} + 25\,\frac{x}{1 + x^2} + 8\cos(1.2t),\; \sigma_V^2\right)$. We use the following priors on the parameters:

$\sigma_V \sim \mathcal{IG}(0, 1), \quad \sigma_W \sim \mathcal{IG}(0, 1)$.

Running the algorithm with M = 3,000 parameter particles, and N = 50 particles in the particle filter produces the filtering results of Figure 4.5.


Figure 4.5: Filtered estimates of the hidden process.

Figure 4.6: The parameter posteriors of the non-linear model.

4.2 Stochastic volatility models

In this section, we apply SMC2 for inference in the s.v. models of interest, as well as the model in Section 2.1.1, with which we begin.

4.2.1 Taylor

Inference is based on a synthetic dataset comprising 1,000 observations generated using the same parameter values as in [30], i.e. θ = {0.9702, 0.178, 0.5992}; the realizations can be seen in Figure 4.7.

We use the bootstrap filter again, and therefore the transition and emission densities are given by (2.8) and (2.9), respectively. As initial density, we set


Figure 4.7: The realizations of the processes.
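Since (2.8) and (2.9) are not restated in this chapter, the sketch below of the propagation and weighting in the bootstrap filter assumes the standard Taylor log-volatility form, $X_t = \phi X_{t-1} + \sigma\eta_t$ and $Y_t = \beta\exp(X_t/2)\epsilon_t$ with standard Gaussian noise; this parameterization is an assumption and may differ from the exact form used in the thesis, and resampling is omitted for brevity.

import numpy as np

def taylor_propagate_and_weight(x_prev, y_t, phi, sigma, beta,
                                rng=np.random.default_rng()):
    """Propagate X_t = phi * X_{t-1} + sigma * eta_t and weight by the
    assumed emission Y_t | X_t ~ N(0, beta^2 * exp(X_t))."""
    x_new = phi * x_prev + sigma * rng.standard_normal(len(x_prev))
    var = beta ** 2 * np.exp(x_new)                    # emission variance per particle
    logw = -0.5 * (np.log(2.0 * np.pi * var) + y_t ** 2 / var)
    w = np.exp(logw - logw.max())                      # numerical stabilization
    return x_new, w / w.sum()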

We assume U(0, 1) priors for the parameters. We use 3,000 parameter particles and 50 initial particles in the particle filters. The filtering results for the first 100 points are found in Figure 4.8.

Figure 4.8: Filtered estimates of the hidden process.

The parameter posteriors are found in Figure 4.9. We see from Figure 4.8 that the filtered estimates follow the true trajectory rather well.


Figure 4.9: The parameter posteriors of the Taylor model.

4.2.2 Heston, Bates, and SVCJ

We will use the same methodology for the three models, that is:

(i) inference is based on 100 daily observations of synthetic datasets,

(ii) we will consider six options with the same time to maturity, T = 2 years,

(iii) all options considered are vanilla European calls, with strike prices at time t evenly spaced on $[0.85\,S_t,\ 1.15\,S_t]$,

(iv) each dataset has been generated with the parameters θ = {2, 0.15, 0.15, 0.50, 2}, and with µ in (2.14) set to 0.05,

(v) we will use the bootstrap particle filter, i.e. the proposal is defined as the hidden Markov chain,

(vi) we use 3,000 parameter particles, and 50 initial particles in the particle filters,

(vii) the parameter in (2.22) is set to 0.5.


The option pricing step is very computationally intensive, as it incurs $M^2 \cdot N \cdot N_{\mathrm{FFT}}$ calculations. Fortunately, these calculations are easily parallelizable: we can simply divide the parameter block into n equally sized partitions and let each of the n available CPU cores handle one partition. In our case, the available CPU is an i7 3820, so we divide the parameter block into 4 partitions and let each physical core, together with its attached thread, handle one partition.
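The partitioning described above can be sketched as follows. Here price_block is a hypothetical stand-in for the FFT option pricing routine applied to one partition of the parameter particles, the partition count defaults to the four physical cores mentioned, and the usual "if __name__ == '__main__'" guard is needed on some platforms.

import numpy as np
from multiprocessing import Pool

def price_block(args):
    """Hypothetical worker: price the six options for every (theta, x)
    pair in one partition of the parameter block. The body is a
    placeholder; the real work would call the FFT option pricer."""
    theta_block, x_block, strikes, maturity = args
    return np.zeros((len(theta_block), x_block.shape[1], len(strikes)))

def price_in_parallel(thetas, x_particles, strikes, maturity, n_cores=4):
    """Split the M parameter particles (and their state particles) into
    n_cores partitions and price each partition in a separate process."""
    theta_blocks = np.array_split(thetas, n_cores)
    x_blocks = np.array_split(x_particles, n_cores)
    jobs = [(tb, xb, strikes, maturity) for tb, xb in zip(theta_blocks, x_blocks)]
    with Pool(processes=n_cores) as pool:
        results = pool.map(price_block, jobs)
    return np.concatenate(results, axis=0)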

4.2.2.1 Heston model

The simulation results of the asset price and volatility can be seen in Figure 4.10.

Figure 4.10: Realizations of the Heston model; inference is based only on the first 100 observations.

The filtered estimates are found in Figure 4.11, and the parameter posteriors are found in Figure 4.12.

Figure 4.11: Filtered estimates of the hidden process.


Figure 4.12: The parameter posteriors of the Heston model.

4.2.2.2 Bates model

The simulation results of the asset price and volatility can be seen in Figure 4.13. While barely discernible, the jump in the asset price process occurs at time index 51.

Figure 4.13: Realizations of the Bates model.


Figure 4.14: Filtered estimates of the hidden process.


4.2.2.3 SVCJ model

The simulation results of the asset price and volatility can be seen in Figure 4.16. While difficult to see, the jumps occur at time indices 16, 19, and 100.

Figure 4.16: Realizations of the SVCJ model.

The filtered estimates and the parameter posteriors can be seen in Figures 4.17 and 4.18.


Chapter 5

Discussion

We have in this Master’s thesis used SMC2 to infer both parameters and latent states in the stochastic volatility models of Sections 2.2 and 2.1.1. The algorithm is well suited for inference in general HMMs thanks to its black-box design, i.e. the user does not have to tailor the algorithm to the model under study beyond supplying the densities of the model dynamics. The algorithm was implemented in Python to take full advantage of the algorithm’s dynamic structure. Other advantages of using Python are an out-of-the-box, high-performance parallelization interface and the many community-developed libraries.

We studied two benchmarks before tackling the complex s.v. models to verify that the algorithm works as intended. The first of the two was the linear Gaussian model of Section 4.1.1, for which there is an exact filter, the Kalman filter. From Figure 4.2 we see that the filtered estimates of SMC2 are congruent with those of the exact filter. Furthermore, we find from Figure 4.3 that the parameter posteriors cover the true values and seem to be approximately normally distributed around them. The second benchmark was the non-linear, non-Gaussian model of Section 4.1.2. As the model lacks an exact filter, a visual inspection will have to suffice. The filtered estimates of Figure 4.5 seem to follow the true path quite well, and are at times very close to the true values. From Figure 4.6 we see that the parameter posteriors cover the true values and seem to be normally distributed, with the distribution of $\sigma_V$ centered around the true value. From these results, we may assume that our implementation of SMC2 works as intended.



We see that the filtered estimates of the Heston model in Figure 4.11 follow the true trajectory quite well. Furthermore, we see that all posteriors in Figure 4.12 cover the true values. From the first posterior density in Figure 4.12, we see that the variance is rather high, which implies a flat likelihood; the option pricing function is therefore not sensitive to changes in this parameter. Similar results are found in [14], where the estimates did not converge to the true value. The variance of the second posterior is on the other hand very low, and its mass is almost symmetrically distributed around the true value, implying that this parameter has a strong influence on the option pricing function. The third posterior has a higher variance in comparison but still covers the true value, although it is not symmetrically distributed around it. The posterior of ρ is similar to the second one, i.e. low variance and an almost symmetric distribution around the true value, also implying a strong influence in the pricing equations.

The results of the Bates model are of mixed quality. The filtered estimates of Figure 4.14 are seen to be almost identical to the true values, which is rather impressive since we only considered 100 observations and six options. From the posteriors of Figure 4.15, we see that these are similar to those of the Heston model, although they suffer from a slightly larger variance. However, the posterior of N is seen to perform rather well, as the variance is low and the density is centered around the true value.

We see that the filtered estimates in Figure 4.17 follow the true trajectory well, but have been shifted vertically downward. Furthermore, we see that the parameter posteriors are rather congruent with those of the Bates model, although they seem to have slightly lower variance in all cases. This is most likely because of the jumps in volatility, as the ESS quickly drops to just a handful of particles whenever these occur, thus eliminating most of the particles.

References
