
Modeling the Relation Between Implied and Realized Volatility

TOBIAS BRODD

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES



Degree Projects in Financial Mathematics (30 ECTS credits)
Master's Programme in Applied and Computational Mathematics
KTH Royal Institute of Technology, year 2020

Supervisors at DNB ASA: Thor Gunnar Olsen, Arne Sæle
Supervisor at KTH: Sigrid Källblad Nordin

Examiner at KTH: Sigrid Källblad Nordin


TRITA-SCI-GRU 2020:080
MAT-E 2020:043

Royal Institute of Technology
School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


Abstract

Options are an important part of today's financial market. It is therefore of high importance to be able to understand when options are overvalued and undervalued in order to get a lead on the market. To determine this, the relation between the volatility of the underlying asset, called realized volatility, and the market's expected volatility, called implied volatility, can be analyzed. In this thesis five models were investigated for modeling the relation between implied and realized volatility. The five models consisted of one Ornstein–Uhlenbeck model, two autoregressive models and two artificial neural networks. To analyze the performance of the models, different accuracy measures were calculated for out-of-sample forecasts. Signals from the models were also calculated and used in a simulated options trading environment to get a better understanding of how well they perform in trading applications. The results suggest that artificial neural networks are able to model the relation more accurately than more traditional time series models. It was also shown that a trading strategy based on forecasting the relation was able to generate significant profits. Furthermore, it was shown that profits could be increased by combining a forecasting model with a signal classification model.


Sammanfattning (Swedish Abstract)

Options are an important part of today's financial market. It is therefore important to be able to understand when options are over- and undervalued in order to stay at the forefront of the market. To determine this, the relation between the volatility of the underlying asset, called realized volatility, and the market's expected volatility, called implied volatility, can be analyzed. In this thesis, five models were investigated for modeling the relation between implied and realized volatility. The five models were an Ornstein–Uhlenbeck model, two autoregressive models and two artificial neural networks. To analyze the performance of the models, different accuracy measures were examined for forecasts from the models. Signals from the models were also computed and used in a simulated options trading environment to get a better understanding of how well they perform in a trading application. The results suggest that artificial neural networks can model the relation better than the more traditional time series models. It was also shown that a trading strategy based on forecasts of the relation could generate a significant profit. It was furthermore shown that profits could be increased by combining a forecasting model with a model that classifies signals.


Acknowledgements

I would first like to express my gratitude to my examiner and supervisor Sigrid Källblad Nordin at the Department of Mathematical Statistics at KTH for her support and feedback regarding my thesis. Without her help, the quality of this thesis would have been significantly lower. I would also like to thank Thor Gunnar Olsen and the rest of the derivatives team at DNB Markets in Oslo for introducing me to volatility modeling as well as for helping me throughout this thesis. I am especially grateful for Arne Sæle's valuable comments on this thesis. Also, a special thanks to Christer Magnergård at DNB Markets in Stockholm for helping me get in touch with the derivatives team in Oslo.

Lastly, I would like to thank my family and friends for their support and encouragement throughout the years of my studies and through the process of writing this thesis. This accomplishment would not have been possible without them.


Contents

1 Introduction
  1.1 Modeling Volatility
  1.2 Scope of the Study
  1.3 Research Questions
  1.4 Thesis Outline
2 Background
  2.1 Black-Scholes
  2.2 Implied Volatility
  2.3 Realized Volatility
  2.4 Derivatives
    2.4.1 Call and Put Options
    2.4.2 Straddles
    2.4.3 Variance Swaps
  2.5 Volatility Trading Signals
  2.6 Time Series Modeling
    2.6.1 Autocorrelation
    2.6.2 Seasonal Decomposition
    2.6.3 Ornstein–Uhlenbeck Process
    2.6.4 Poisson Jump Process
    2.6.5 Ornstein–Uhlenbeck Process with Jumps
    2.6.6 Autoregressive Models
    2.6.7 Heterogeneous Autoregressive Models
  2.7 Artificial Neural Networks
    2.7.1 Multilayer Perceptron
    2.7.2 Recurrent Neural Networks
    2.7.3 Bias-Variance Dilemma
  2.8 Metrics
    2.8.1 Mean Squared Error
    2.8.2 Mean Absolute Error
    2.8.3 Precision and Recall
    2.8.4 Correlation
3 Method
  3.1 Data Selection
  3.2 Model Selection
  3.3 Performance Metrics
  3.4 Methodology
    3.4.1 Data Pre-Processing
    3.4.2 Modeling
    3.4.3 Analysis
    3.4.4 Trading
    3.4.5 Significance Testing
  3.5 Hyperparameter Optimization
    3.5.1 Volatility Forecasting
    3.5.2 Signal Classification
4 Results
  4.1 Volatility Case Study
  4.2 Forecasts
  4.3 Trading
  4.4 Distributions
    4.4.1 MSE Distributions
    4.4.2 Profit Distributions
  4.5 Significance
  4.6 Seasonal Decomposition
  4.7 VIX Futures Trading
5 Discussion
  5.1 Volatility Case Study
  5.2 Forecast Comparison
  5.3 Trading Comparison
  5.4 Significance Analysis
  5.5 Seasonal Decomposition Analysis
  5.6 VIX Trading
6 Conclusions
  6.1 Future Work
Bibliography
A Assets
B Additional Forecast Plots
C Data Plots
D Network Optimization Results
  D.1 Forecasting
    D.1.1 MLP
    D.1.2 LSTM
  D.2 Classification
    D.2.1 MLP
    D.2.2 LSTM


Introduction

Options are an important part of today's financial market. This is especially true for DNB and other banks since they trade options. It is therefore of high importance to be able to value options on the market correctly in order to generate a profit.

There are several different models for pricing options. The difference between the models often comes down to how they model volatility. Perhaps the most common model used today in option pricing is the Black-Scholes model [1].

This model assumes constant variance for the underlying asset, which is argued by many to be wrong since the volatility surface of an option with different strikes and maturities is not flat. There are also pricing models based on stochastic volatility, which do not assume the variance of the asset to be constant but instead model the variance as a stochastic process. An example of this is the Heston model [2]. While stochastic volatility models may model volatility more accurately, the Black-Scholes model is easier to use and perhaps more commonly used in the financial market. The Black-Scholes model has only one parameter that cannot be directly observed: the average future volatility of the underlying asset. This parameter is extremely difficult to estimate since one may never know what the true volatility of the asset is. However, there have been several studies throughout the years that attempt to find a model that estimates this parameter correctly. The most commonly used method is to use historical asset returns and a model from the GARCH family to model the volatility [3]. Using this method one can forecast historical/realized volatility and use the forecast as an estimate of future volatility. Another method is to use the market's expected volatility, called implied volatility. This volatility is, as the name suggests, the volatility implied by options on the market. It is often calculated by solving for the volatility in the Black-Scholes equation. Shu and Zhang [4] and Christensen and Prabhala [5] showed that although it is a biased estimator, implied volatility generally subsumes the information of historical volatility. This was further confirmed by Poon and Granger [6], who analyzed several different models for forecasting volatility and found that implied volatility is generally the better estimator. However, since implied volatility is a biased estimator and is often used when pricing options with the Black-Scholes model, banks such as DNB can make a lot of money by determining whether implied volatility is too high or too low relative to realized volatility. For example, assume that a trader can buy a financial contract that gets more expensive as the volatility of an asset increases.

Also assume that the contract is priced using the market's expected volatility, i.e. the implied volatility. In this setting, the trader should buy such a contract when implied volatility is lower than future realized volatility, since that would imply that the contract's price is lower than it should be. On the other hand, when implied volatility is higher than future realized volatility, the opposite is true and the trader should instead sell a contract. By forecasting the relation, i.e. the difference, between implied and realized volatility, one can construct signals that can be used for trading. A more accurate model should in theory result in more accurate trading signals, and this is why being able to model the relation accurately is so important.

1.1 Modeling Volatility

There are mainly two approaches to forecasting the relation between realized and implied volatility. The most intuitive way is to calculate the difference between the two volatilities and model the difference. Such models can directly forecast the difference without any intermediate steps. The second approach is to make use of the fact that implied volatility is known for future dates. By modeling only realized volatility, the future difference (relation) can be calculated using the known implied volatility. This approach is however more limited in terms of how far ahead forecasts can be made compared to the first approach. This is because implied volatility needs to be calculated for options with a certain maturity date. Implied volatility can therefore only be calculated up to that maturity date. This means that models using realized volatility can only forecast the relation, i.e. the difference, up to the selected maturity date.


1.2 Scope of the Study

Since this thesis is made in collaboration with the Norwegian bank DNB, the main objective is to model the relation between implied and realized volatility in the Norwegian market. The two previously discussed approaches, difference modeling and realized volatility modeling, will both be investigated and compared. Model performances will be investigated in terms of both error measures and trading performance. To make this more manageable, European style options will be simulated and the interest rate will be set to 0%. Options are however not the only contract that can be investigated in order to measure the trading performance of volatility trading signals. Variance swaps are another type of contract available on large indices such as the S&P 500. The relation between the S&P 500's volatility index VIX and realized volatility will therefore also be modeled and used to trade variance swaps, also called VIX Futures, in a simulated environment.

There are many models to use when modeling time series and to limit the number of models that need to be tested, only three model classes and in total five different models will be investigated:

• Ornstein–Uhlenbeck based models (OUJ)

• Autoregressive models (AR, HAR)

• Artificial Neural Networks (MLP, LSTM)

Ornstein–Uhlenbeck processes are mean reverting, which means that a process moves towards its mean level over time. This should work well in our case since volatility also seems to be mean reverting. The HAR model has been proven to work well when modeling volatility and the AR model seems like a good comparison to the HAR model [7]. While Vortelinos [8] showed that neural network models didn't perform better than the HAR model, there have been studies by for example Lu, Que, and Cao [9] and Kristjanpoller, Fadic, and Minutolo [10] that show promising results when using neural networks in combination with other models. Neural network models such as the MLP and LSTM are therefore interesting models to investigate.

For this thesis it was also decided to investigate seasonal decomposition. This was decided because initial testing showed that the analyzed data may contain seasonal components. Modeling the seasonality of the data separately could therefore improve model accuracy.


1.3 Research Questions

The purpose of this thesis is to answer the following questions:

• Can we get a lead on the market by modeling the relation between im- plied and realized volatility?

• Can the performance of such models be improved by the use of artificial neural networks?

1.4 Thesis Outline

In the background chapter (2), financial and mathematical concepts behind volatility modeling and forecasting will be presented. This includes the Black-Scholes model, financial contracts such as options and variance swaps, and the various forecasting models used in this thesis. In the method chapter (3), data selection, model selection and the general methodology of the thesis will then be discussed. The results chapter (4) will present forecast accuracy results from the model comparison as well as trading results. Results from a volatility case study of the DNB stock will also be presented in this chapter. In the discussion chapter (5), forecasts made by the models will be discussed. It will also be discussed whether the models can be used for trading purposes. Results from the volatility case study will also be analyzed and discussed in this chapter. In the last chapter, conclusions (6), the main findings and conclusions of the thesis will be presented and future research will be discussed.


Background

2.1 Black-Scholes

The Black-Scholes equation is a partial differential equation (PDE) which describes the option price $V(S, t)$ at time $t$ with an underlying asset price $S$ and annualized interest rate $r$ [1]:

$$\frac{\partial V}{\partial t} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 V}{\partial S^2} + rS\frac{\partial V}{\partial S} - rV = 0. \qquad (2.1)$$

The Black-Scholes model makes certain assumptions on both the asset and the market.

Asset assumptions:

• The rate of return on the risk-free asset is constant (risk-free interest rate).

• The stock price follows a geometric Brownian motion (GBM) where drift and volatility are constant.

• The stock does not pay dividend.

Market assumptions:

• There is no arbitrage opportunity.

• It is possible to borrow and lend any amount of cash at the risk-free rate.

• It is possible to buy and sell any amount of stock.


• The market is frictionless (transactions do not incur any fees or costs).

It is a well-known fact that these assumptions do not always hold in the real world, especially the assumption that the volatility of the underlying is constant. Extensions to the Black-Scholes equation have therefore been developed to better model the volatility of the underlying asset. One such model is the Heston model [2].

2.2 Implied Volatility

Implied volatility (IV), which can also be called market volatility, is the volatility one gets by solving for $\sigma$ in the Black-Scholes formula 2.5 or 2.6. Since one of the assumptions of the Black-Scholes model is that volatility is constant, one should get a flat surface when plotting option volatility against moneyness.

However, in practice one often notices a so-called volatility smile (or volatility skew) where volatility is higher for options deep out-of-the-money (OTM) and deep in-the-money (ITM) compared to options at-the-money (ATM). This is due to the fact that the market believes that the underlying's distribution has fatter tails than the normal distribution assumed by the Black-Scholes model.

Implied volatility is often also a biased estimator, as shown by for example Shu and Zhang [4] and Christensen and Prabhala [5]. This is due to the fact that there is a risk premium when buying options, since for example call options have a theoretically unlimited payoff which the seller has to take into account when selling the option. It is therefore, on average, better to sell options than to buy options if the risk of unlimited loss is not considered. However, it is important to note that the risk premium varies by underlying and may in some cases even be negative.

2.3 Realized Volatility

Realized volatility (RV) is the measure of price variation over a period of time.

In this thesis the volatility of log-returns is studied. Using asset prices $S$, log-returns can be calculated as $r_{t+1} = \log(S_{t+1}/S_t)$. Using the returns, realized volatility over a period $p$ can be estimated as the annualized square root of the sum of squared daily log-returns $r$:

$$RV = \sqrt{\frac{252}{p}\sum_{i=1}^{n} r_i^2}. \qquad (2.2)$$


This differs slightly from the normal definition of volatility (standard deviation) in that the mean value of the log-returns is not subtracted from each sample before squaring the value. However, the difference should be small since mean log-returns are usually centered around 0.
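As a concrete illustration, the estimator in equation 2.2 can be written in a few lines of Python; the function name and the simulated price path below are illustrative and not taken from the thesis code:

```python
import numpy as np

def realized_volatility(prices, period=22):
    """Annualized realized volatility over the trailing `period` trading days,
    following equation (2.2): sqrt(252/p * sum of squared daily log-returns)."""
    prices = np.asarray(prices, dtype=float)
    log_returns = np.diff(np.log(prices))     # r_{t+1} = log(S_{t+1} / S_t)
    window = log_returns[-period:]            # last p daily log-returns
    return np.sqrt(252.0 / period * np.sum(window ** 2))

# Example: RV of a simulated price path
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=100)))
print(realized_volatility(prices, period=22))
```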

2.4 Derivatives

2.4.1 Call and Put Options

Call and put options are financial contracts that give the buyer the right, but not the obligation, to buy or sell an underlying asset at a specified price (strike price) and date (maturity date). There are also different option styles, such as European and American style options. European options can only be exercised at maturity while American options can be exercised at any time until maturity. In this thesis European style options are used and the theory below mainly applies to European options, but can be modified to work for American options.

A call option is a financial contract that gives the owner the right, but not the obligation, to buy an asset at a specified price and date. The payoff function of a call option is defined as:

$$C(S, T) = \max(S_T - K, 0), \qquad (2.3)$$

where S is the price of the underlying asset, K is the strike price and T is the maturity.

There also exists a financial contract that gives the owner the right, but not the obligation, to sell an asset at a specified price and date. Such a contract is called a put option and its payoff function is defined as:

$$P(S, T) = \max(K - S_T, 0), \qquad (2.4)$$

with the same notation as in equation 2.3.

Using 2.1 one can derive the Black-Scholes formula for pricing a call option:

$$C(S, t) = N(d_1)S_t - N(d_2)Ke^{-r(T-t)},$$

$$d_1 = \frac{1}{\sigma\sqrt{T-t}}\left[\ln\!\left(\frac{S_t}{K}\right) + \left(r + \frac{\sigma^2}{2}\right)(T-t)\right],$$

$$d_2 = d_1 - \sigma\sqrt{T-t}, \qquad (2.5)$$


where $N(\cdot)$ is the cumulative distribution function (CDF) of the standard normal distribution, $T - t$ is the time to maturity (in years), $S_t$ is the spot price of the underlying asset at time $t$, $K$ is the option's strike price, $r$ is the annualized risk-free rate and $\sigma$ is the volatility (standard deviation) of log-returns of the underlying asset.

Using the relationship between put and call options, called the put-call parity, one can also derive the price of a put option:

$$P(S, t) = Ke^{-r(T-t)} - S_t + C(S_t, t). \qquad (2.6)$$
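For reference, a minimal Python sketch of equations 2.5 and 2.6, together with implied volatility backed out numerically as described in section 2.2; the helper names and the root-finding bracket are assumptions of this sketch:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call, equation (2.5)."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return norm.cdf(d1) * S - norm.cdf(d2) * K * np.exp(-r * T)

def bs_put(S, K, T, r, sigma):
    """European put via put-call parity, equation (2.6)."""
    return K * np.exp(-r * T) - S + bs_call(S, K, T, r, sigma)

def implied_vol(price, S, K, T, r):
    """Solve the Black-Scholes formula for sigma given an observed call price."""
    return brentq(lambda sigma: bs_call(S, K, T, r, sigma) - price, 1e-6, 5.0)

# Example: recover the volatility used to price an ATM call
price = bs_call(S=100, K=100, T=22 / 252, r=0.0, sigma=0.25)
print(implied_vol(price, S=100, K=100, T=22 / 252, r=0.0))  # ~0.25
```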

2.4.2 Straddles

A straddle is a combination of two options, namely a long call option and a long put option. This ensures that the buyer of the straddle will make a profit no matter which direction the underlying moves in, as long as it deviates enough from the strike price. The payoff of a straddle is defined as:

$$\mathrm{Straddle}(S, T) = C(S, T) + P(S, T), \qquad (2.7)$$

with underlying asset price $S$ and maturity $T$. $C$ and $P$ are the payoffs of a call and a put respectively, with the same strike price $K$ set to the spot price of the underlying asset $S$ at time $t$.

Since selling a straddle only results in a profit if the underlying does not deviate enough from the strike price, the payoff curve for a straddle is shaped like a V as seen in figure 2.1.

Similar to equation 2.7, the price of a straddle is calculated by combining a call and a put option:

$$\mathrm{Straddle}(S, t) = C(S, t) + P(S, t),$$

with the strike price $K$ of both options set to the spot price of the underlying asset $S$ at time $t$.


Figure 2.1: Payoff curves of a long call option and long put option which results in the payoff curve of a straddle.

2.4.3 Variance Swaps

A variance swap is a forward contract on realized variance that provides pure exposure to variance. Its payoff can therefore simply be described as:

$$(\sigma_R^2 - K_{var}) \cdot N,$$

where $\sigma_R^2$ is the realized variance of the underlying at expiration, $K_{var}$ is the delivery price, i.e. the variance level that is traded against, and $N$ is the notional amount per variance point. It therefore pays a positive amount if realized variance is higher than implied variance and a negative amount otherwise.

Variance swaps can be replicated by a static portfolio of calls and puts since Kvar can be written as:

Kvar =2 T

✓ rT

✓S0

SerT 1

logS S0 + erT

Z S 0

1

K2P (K)dK + erT

Z 1

S

1

K2C(K)dK

◆ ,

(2.8)


as shown by Sachs et al. [11]. Here the notation $C(K)$ is used for the price of a call option with strike $K$ and $P(K)$ for the price of a put option with strike $K$. $S_*$ is a parameter that defines the boundary between calls and puts and can therefore be seen as a constant.

Equation 2.8 can be approximated using a discrete number of options, so that variance swaps can be used on any underlying for which call and put options exist. There also exist contracts that enable variance swap trading on large indices. An example of such contracts is VIX Futures, which can be traded on the S&P 500's volatility index VIX.
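A sketch of such a discrete approximation of equation 2.8 is given below; the function, its strike grid and the dummy option prices are hypothetical and only meant to show how the integrals are replaced by sums over available strikes:

```python
import numpy as np

def kvar_discrete(strikes, put_prices, call_prices, S0, S_star, r, T):
    """Discretized version of equation (2.8): the integrals over put and call
    prices weighted by 1/K^2 are replaced by sums over a finite strike grid."""
    erT = np.exp(r * T)
    puts = strikes <= S_star                     # puts below the boundary S_*
    calls = ~puts                                # calls above the boundary
    dK = np.gradient(strikes)                    # strike spacing
    put_part = np.sum(dK[puts] / strikes[puts] ** 2 * put_prices[puts])
    call_part = np.sum(dK[calls] / strikes[calls] ** 2 * call_prices[calls])
    return 2.0 / T * (r * T
                      - (S0 / S_star * erT - 1.0)
                      - np.log(S_star / S0)
                      + erT * put_part
                      + erT * call_part)

# Dummy strike grid and option prices, purely for illustration
strikes = np.linspace(50.0, 150.0, 101)
puts = np.maximum(100 - strikes, 0.0) + 1.0
calls = np.maximum(strikes - 100, 0.0) + 1.0
print(kvar_discrete(strikes, puts, calls, S0=100, S_star=100, r=0.0, T=22 / 252))
```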

2.5 Volatility Trading Signals

Since implied volatility and realized volatility are estimates of the same volatility, the relation between them can be used to find trading signals for statistical arbitrage opportunities. In theory, implied volatility should be equal to realized volatility since both are measures of the same volatility. The problem is that realized volatility cannot be observed for the same period as implied volatility, since the latter refers to the future, over which the former has not been observed yet. What the market believes to be the future volatility (implied volatility) will therefore often differ from the later observed realized volatility.

Since forecasts of realized volatility can be made using several different models, implied volatility also differs from such forecasts. So what measure should be used to get a good estimate of future volatility? There is unfortunately no answer to this since one will never know what the future volatility is. Maybe the market knows something about the future that cannot be implied from past observations, or maybe the market will give inaccurate volatility estimates. It simply isn't known for sure. However, what is known is the relation between past observations of implied and realized volatility. In this case it can be assumed that realized volatility is the most accurate measure since it's just a measure of the volatility of the underlying. By forecasting the relation between these measures we can attempt to get a better understanding of future volatility. Trading signals based on the relation can also be computed to build a trading strategy based on the assumption that implied volatility can be over/undervalued as discussed in section 1. The signals can be computed as follows:

• RV > IV: Realized volatility is higher than what the market believes. Options can be considered cheap; this is a buy signal.

• RV < IV: Realized volatility is lower than what the market believes. Options can be considered expensive; this is a sell signal.

• RV = IV: The market seems to have priced the options correctly, and we will therefore do nothing.

In this thesis, signals for a month into the future are generated at each date in a test data set by forecasting the difference between realized and implied volatility. The difference is in this thesis defined as:

$$DIFF = RV - IV,$$

since the sign of the difference in this case corresponds to a trading signal ($+$ for a long signal and $-$ for a short signal).
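The signal rules above can be summarized in a short sketch; the function name and the example volatility levels are illustrative:

```python
import numpy as np

def trading_signals(rv_forecast, iv):
    """Map forecasted DIFF = RV - IV to trading signals:
    +1 (buy) when RV > IV, -1 (sell) when RV < IV, 0 otherwise."""
    diff = np.asarray(rv_forecast) - np.asarray(iv)
    return np.sign(diff).astype(int)

# Example with made-up annualized volatility levels
print(trading_signals([0.25, 0.18, 0.20], [0.20, 0.22, 0.20]))  # [ 1 -1  0]
```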

2.6 Time Series Modeling

Since time series such as realized volatility (RV) and volatility differences (DIFF) can be described as stochastic processes with a mean and a variance, the most common approach in time series forecasting is to model the variation in mean and variance using two separate models.

For variations in mean (level) there are three main classes of models: Autoregressive (AR) models, Integrated (I) models and Moving Average (MA) models. There also exist combinations of these, such as ARMA and ARIMA models, as well as extensions such as the Heterogeneous AR (HAR) model. In this thesis only the AR and HAR models were implemented (see sections 2.6.6 and 2.6.7) since they have been proven to work well when modeling volatility [7].

Modeling variations in variance is mainly done using autoregressive conditional heteroscedasticity (ARCH) models and generalized ARCH (GARCH) models. Modeling variance generally does not have a big impact on the actual forecasts, but it is of high importance when confidence/prediction intervals are needed. This is however not what this thesis investigates, which means that variance modeling will not be covered.

In addition to the models described above, there are models based on processes such as the Ornstein–Uhlenbeck model (see section 2.6.3) that can be used to forecast time series. One can also use regression models from the machine learning field to forecast time series. In this thesis, artificial neural networks, a class of machine learning models based on biological neural networks (see section 2.7), will be studied.


2.6.1 Autocorrelation

Autocorrelation is the correlation of a process to lagged values of itself. This means that a process which has some kind of pattern will have a significant autocorrelation. To reveal such correlations one can use the autocorrelation function (ACF). The function measures the correlation between lags of different lengths. If the autocorrelation doesn't return any significant correlation between lags, one can assume that the process consists of independent and identically distributed (i.i.d.) variables, i.e. their order doesn't matter. The independence property is of high importance when modeling time series using moving average models such as the MA and ARMA models and when calculating confidence intervals for forecasts. Such models are however not studied in this thesis, but the ACF is still useful since it's important to know if there is any correlation between past and future observations.

To determine useful lags for autoregressive models one can instead look at the partial autocorrelation function (PACF). It’s similar to the ACF, but the main difference is that it measures the partial correlation.
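In practice both functions are readily available; a minimal sketch using statsmodels on a simulated AR(1)-like series (standing in for the DIFF data) could look as follows:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

# Illustrative AR(1)-like series standing in for the DIFF data
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.8 * x[t - 1] + rng.normal(0, 0.01)

print(acf(x, nlags=10))   # significant correlation over several lags
print(pacf(x, nlags=10))  # mainly the first lag is significant for an AR(1)
```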

2.6.2 Seasonal Decomposition

Seasonal decomposition is the process of removing any seasonality that appears in a time series. Most time series can be described using four components in either an additive model:

$$y_t = T_t + C_t + S_t + I_t,$$

or a multiplicative model:

$$y_t = T_t \times C_t \times S_t \times I_t,$$

where T is the trend, C is the cyclical component, S is the seasonality and I is the irregular component. There are many different techniques to decompose a time series into its components, but one of the most versatile and robust methods is Seasonal and Trend decomposition using Loess (STL) [12].

Seasonal and Trend decomposition using Loess

Seasonal and Trend decomposition using Loess (STL) decomposes an additive time series into a trend component, a seasonal component and an irregular component, as shown in figure 2.2 [13]. The cyclical component is therefore ignored when using STL decomposition. Since the seasonality will not always be a constant series, it is often useful to model it using a time series model in order to be able to forecast future seasonality. After removing the seasonal component from the original time series, the series can be called seasonally adjusted. The adjusted series can then be modeled and forecast as per usual. It is however important to note that the seasonal forecast and the seasonally adjusted forecast need to be combined to get a complete forecast.

Figure 2.2: Decomposition using STL.
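A minimal sketch of an STL decomposition, assuming the statsmodels implementation and an illustrative monthly (22 trading day) period; the simulated series only stands in for the real volatility data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Illustrative series with a 22-day seasonal pattern plus noise
dates = pd.date_range("2018-01-01", periods=500, freq="B")
series = pd.Series(
    0.2 + 0.05 * np.sin(np.arange(500) * 2 * np.pi / 22)
    + np.random.default_rng(0).normal(0, 0.01, 500),
    index=dates,
)

res = STL(series, period=22).fit()
seasonally_adjusted = series - res.seasonal   # Y_t - S_t, modeled separately
print(res.trend.tail(), res.seasonal.tail(), res.resid.tail(), sep="\n")
```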

2.6.3 Ornstein–Uhlenbeck Process

An Ornstein–Uhlenbeck (OU) process is a continuous time stochastic process. It is defined as the solution to a type of stochastic differential equation (SDE) called the Langevin equation:

$$dY_t = -\theta Y_t\,dt + \sigma\,dW_t. \qquad (2.9)$$

A drift term, $\mu$, may also be added to equation 2.9, as seen in equation 2.10. An Ornstein–Uhlenbeck process with drift is often called the Vasicek model in financial mathematics:

$$dY_t = \theta(\mu - Y_t)\,dt + \sigma\,dW_t. \qquad (2.10)$$

Like the Black-Scholes model, the Ornstein–Uhlenbeck process is a random walk in continuous time. However, it is also a mean reverting process, which means that the process tends to move around its mean. The three parameters of the process can therefore be interpreted as:

• $\theta$: Speed of reversion

• $\mu$: Long term mean level

• $\sigma$: Instantaneous volatility

The first and second moments (mean and variance) of an Ornstein–Uhlenbeck process are:

$$E[Y_t] = Y_0 e^{-\theta t} + \mu(1 - e^{-\theta t}), \qquad \mathrm{Var}(Y_t) = \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta t}\right).$$

The conditional probability density function (PDF) for the delta of an Ornstein–Uhlenbeck process is defined as:

$$\phi\!\left(Y \mid \theta(\mu - Y_t)\,dt,\; \sigma^2\,dt\right),$$

where $\phi$ is the PDF of a normal distribution.
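Since the conditional mean above is what is later used as the forecasting function for the OU part (see section 3.4.2), a small sketch of it may be useful; the parameter values and the daily time step are illustrative:

```python
import numpy as np

def ou_forecast(y0, theta, mu, horizon_days, dt=1.0 / 252):
    """h-step ahead forecast of an OU process using its conditional mean,
    E[Y_{t+h} | Y_t] = Y_t * exp(-theta*h) + mu * (1 - exp(-theta*h))."""
    h = np.arange(1, horizon_days + 1) * dt
    return y0 * np.exp(-theta * h) + mu * (1.0 - np.exp(-theta * h))

# Example: a series starting above its long-term mean reverts towards mu
print(ou_forecast(y0=0.10, theta=5.0, mu=0.02, horizon_days=22))
```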

2.6.4 Poisson Jump Process

Many stochastic processes in financial mathematics tend to experience jumps. For volatility time series, this is often due to events that cause a temporary period of high volatility before reverting to its mean, like an Ornstein–Uhlenbeck process. To model these jumps in time one can use a compound Poisson process. This process is defined as:

$$Y_t = \sum_{i=1}^{N(t)} J_i, \qquad (2.11)$$

where $N(t),\, t \geq 0$ is a counting Poisson process with rate $\lambda$, and $J_i,\, i \geq 1$ are i.i.d. random variables. Although $J$ may have any distribution, a normal distribution with mean $\mu_J$ and variance $\sigma_J^2$ can often be assumed. One can also assume that $J_i$ and $N(t)$ are independent of each other. The interpretation of this process is that at time $t$, $N(t)$ jumps have happened with intensity $\lambda$. One can also define the delta of the process as:

$$dY_t = \sum_{i=1}^{dN(t)} J_i,$$

where $dN(t)$ is a counting Poisson process with rate $\lambda\,dt$.

The first moment (mean) of a compound Poisson process is calculated by:

$$E[Y_t] = E\!\left[\sum_{i=1}^{N(t)} J_i\right] = E[N(t)] \cdot E[J] = \lambda \cdot \mu_J.$$

The second moment (variance) is calculated by:

$$\mathrm{Var}[Y_t] = \mathrm{Var}\!\left[\sum_{i=1}^{N(t)} J_i\right] = E[N(t)] \cdot \mathrm{Var}[J] + E[J]^2 \cdot \mathrm{Var}[N(t)] = \lambda \cdot \sigma_J^2 + \mu_J^2 \cdot \lambda = (\sigma_J^2 + \mu_J^2) \cdot \lambda.$$

The probability mass function (PMF) for a delta compound Poisson process is defined as:

$$p_k(\lambda\,dt) = \frac{(\lambda\,dt)^k e^{-\lambda\,dt}}{k!}.$$

2.6.5 Ornstein-Uhlenbeck Process with Jumps

By combining an Ornstein-Uhlenbeck process (section 2.6.3) with Poisson jumps (section 2.6.4), an Ornstein-Uhlenbeck process with Jumps (OUJ) model can be defined in terms of deltas as:

$$dY_t = \theta(\mu - Y_t)\,dt + \sigma\,dW_t + \sum_{i=1}^{dN(t)} Q_i,$$

where $dW_t \sim N(0, dt)$, $dN(t) \sim \mathrm{Poisson}(\lambda\,dt)$ and $Q_i \sim N(0, 1)$.


The probability density function (PDF) for a delta Ornstein–Uhlenbeck process with jumps is:

$$f(dY_t) = \sum_{k=0}^{\infty} p_k(\lambda\,dt) \cdot n\!\left(Y \mid \theta(\mu - Y_t)\,dt + \mu_j k,\; \sigma^2\,dt + \sigma_j^2 k\right),$$

where $n(\cdot \mid m, s^2)$ denotes a normal density with mean $m$ and variance $s^2$.

Forecasts for an Ornstein–Uhlenbeck process with jumps can be made using the mean of an ordinary Ornstein–Uhlenbeck process. An example of forecasts made for a simulated Ornstein–Uhlenbeck process with compound Poisson jumps can be seen in figure 2.3.

Figure 2.3: Forecasts for an Ornstein-Uhlenbeck process with Poisson jumps.
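A simulation sketch of the OUJ dynamics above, using a simple Euler discretization with standard normal jump sizes as in the definition; all parameter values are illustrative:

```python
import numpy as np

def simulate_ouj(theta, mu, sigma, lam, y0, n_days, dt=1.0 / 252, seed=0):
    """Euler discretization of the OUJ dynamics: OU drift/diffusion plus a
    compound Poisson jump term with standard normal jump sizes Q_i ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n_days + 1)
    y[0] = y0
    for t in range(n_days):
        diffusion = sigma * np.sqrt(dt) * rng.standard_normal()
        n_jumps = rng.poisson(lam * dt)
        jumps = rng.standard_normal(n_jumps).sum() if n_jumps else 0.0
        y[t + 1] = y[t] + theta * (mu - y[t]) * dt + diffusion + jumps
    return y

path = simulate_ouj(theta=5.0, mu=0.0, sigma=0.3, lam=10.0, y0=0.0, n_days=252)
print(path[:5])
```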

2.6.6 Autoregressive Models

An autoregressive (AR) model describes a time-varying process where the output Y has a linear dependency on the process’ previous values [12]. The model can be formulated as:

$$Y_t = \alpha + \sum_{i=1}^{q} \beta_i Y_{t-i} + \epsilon_t,$$

where $\alpha$ is a drift term and the $\beta_i$'s are model parameters for lagged values of $Y$. The $\epsilon$'s can be seen as volatility innovations with a standard normal distribution.


2.6.7 Heterogeneous Autoregressive Models

The heterogeneous autoregressive model was developed by Corsi [7] as a way to model the long memory behavior of volatility. The following quote from his paper describes the model in quite simple terms:

The additive volatility cascade inspired by the Heterogeneous Mar- ket Hypothesis leads to a simple AR-type model in the realized volatility which has the feature of considering volatilities realized over different interval sizes. We term this model, Heterogeneous Autoregressive model of the Realized Volatility (HAR-RV).

The mathematical definition of the model is:

$$RV_{t+1d}^{(d)} = c + \beta^{(d)} RV_t^{(d)} + \beta^{(w)} RV_t^{(w)} + \beta^{(m)} RV_t^{(m)} + \omega_{t+1d},$$

where $RV_t$ is the realized volatility at time $t$, $c$ is a drift term, the $\beta$'s are model parameters for daily, weekly and monthly lags and $\omega$ is the volatility innovation.

More detailed definitions of all variables can be found in the original paper by Corsi [7].

2.7 Artificial Neural Networks

An artificial neural network (ANN) is a computational model inspired by biological neural networks such as the human brain. The purpose of the network is to learn to perform tasks by training on given examples. An example of such a task is signal classification. A neural network would in this case first be given labeled signals, i.e. signals and the correct labels for each of the signals.

After training on the training data it would in theory be able to predict labels for given signals. Although signal classification is something that is investigated in this thesis, the main objective is to forecast time series. Inputs to a forecasting network would instead be past observations of a time series and the output would be forecasted values.

There are many types of artificial neural networks. In this thesis two types of neural networks are studied, namely the Multilayer Perceptron (MLP) network and the Long-Short Term Memory (LSTM) network. Since artificial neural networks are just networks, they can be represented as graphs. An example can be seen in figure 2.4.


Figure 2.4: A fully connected neural network with three input nodes, two hidden layers and one output node.

2.7.1 Multilayer Perceptron

A multilayer perceptron (MLP) is a feedforward artificial neural network (ANN) composed of multiple layers of perceptrons/nodes fully connected by weights as shown in figure 2.5.

Figure 2.5: Multilayer Perceptron diagram.


Layers

A layer in a neural network is a collection of nodes that hold numbers. The first layer in a network is an input layer where each node is an input $x_1, x_2, \ldots, x_n$. After the first layer follow several layers often referred to as hidden layers $h^{(1)}, h^{(2)}, \ldots, h^{(n)}$. Each of these layers is composed of nodes $h^{(i)}_1, h^{(i)}_2, \ldots, h^{(i)}_n$ which are connected to all nodes in the previous layer and connect to all nodes in the next layer. Each of these connections has a weight, so all connections between two layers can be described by a weight matrix $W^{(i,j)}$ where $i$ is the index of the previous layer and $j$ is the index of the next layer. Since each node has several inputs, these need to be multiplied by the connection weights and then summed to a single number before being passed on to the next layer.

After one or more hidden layers comes the last layer, the output layer, where the nodes $y_1, y_2, \ldots, y_n$ represent the outputs of the neural network. For example, a single node can represent the output from a regression and multiple nodes can represent probabilities for different classes in a classification setting.

Perceptrons

A perceptron in a neural network is a neuron with an activation function. An activation function transforms a node's input to an activation/output. The most basic function is the linear function $f(x) = x$. There are however more complex activation functions such as the sigmoid function $s(x) = \frac{1}{1 + e^{-x}}$ and the hyperbolic tangent $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$. Which function to use depends on the data that the model is going to be fitted to and what the output of the model should be. There is therefore no right or wrong activation function, and testing is often needed to determine the optimal function for a given problem.

Backpropagation

Backpropagation is the method used to train neural networks. To perform backpropagation, a loss function for the network first needs to be defined. A loss function is any measure that defines the error between model predictions and the actual outcomes. Since any layer in a neural network can be represented as a function of the previous layers, the loss function can be defined as a function of the whole neural network:

$$L(x) = g(f(x)), \qquad f(x) = f_n(f_{n-1}(\ldots)) = f_n \circ f_{n-1} \circ \ldots \circ f_1(x), \qquad f_i(x) = \sigma\!\left(W^{(i,i-1)}x + b_i\right),$$

where $i = 1, \ldots, n$ are layer indices, $\sigma$ is an activation function, $W^{(i,j)}$ is the weight matrix between layer $i$ and layer $j$ and $b_i$ is a bias term for layer $i$. Backpropagation works by taking the gradient of the model's loss function with respect to the weight matrices, beginning from the last layer all the way back to the first layer. All weights, including the bias terms, are then updated using their respective gradients. In practice this comes down to calculating gradients using the chain rule. However, a deep network using the sigmoid function, for example, may experience issues during training since backpropagation can cause vanishing or exploding gradients. Recent research has therefore proposed different activation functions, such as the rectified linear unit, $\mathrm{ReLU}(x) = \max(0, x)$, in an attempt to solve this problem.

2.7.2 Recurrent Neural Networks

Recurrent neural networks (RNNs) are a class of artificial neural networks. The main difference to an MLP is that, in addition to the neural network itself, there is a temporal sequence of networks that connect through a set of weights. This often allows RNNs to handle temporal data better than MLPs. The network can be described by an input vector $x_t$ and the following vectors:

• Hidden layer: $h_t = \sigma_h(W_h x_t + U_h h_{t-1} + b_h)$

• Output: $y_t = \sigma_h(W_y h_t + b_y)$

where $W_h$ is the weight matrix between $x_t$ and $h_t$, $U_h$ is the weight matrix between hidden layer vectors and $W_y$ is the weight matrix between $h_t$ and $y_t$. $b_h$ and $b_y$ are bias vectors for the hidden layers and output. $\sigma_h$ is the activation function for the hidden layers. A diagram of a simple RNN is shown in figure 2.6.


Figure 2.6: Unfolded Recurrent Neural Network diagram.

Long-Short Term Memory

The problem with RNNs is that when using backpropagation through time during training, gradients can vanish or explode. A Long-Short Term Memory (LSTM) model is a modified RNN that partially solves this issue by introducing feedback loops in the form of memory cells in the network. A memory cell, or rather an LSTM cell, which takes $h_{t-1}$ and $x_t$ as inputs and outputs $h_t$, is defined by:

• Input gate: $i_t = \sigma_g(W_i x_t + U_i h_{t-1} + b_i)$

• Forget gate: $f_t = \sigma_g(W_f x_t + U_f h_{t-1} + b_f)$

• Output gate: $o_t = \sigma_g(W_o x_t + U_o h_{t-1} + b_o)$

• Cell state: $c_t = f_t \circ c_{t-1} + i_t \circ \tilde{c}_t$, with $\tilde{c}_t$ the candidate cell state

• Output: $h_t = o_t \circ \sigma_h(c_t)$

where the $W$'s and $U$'s are weight matrices and the $b$'s are bias vectors. $\sigma_g$ is called the recurrent activation function and $\sigma_h$ is another activation function. A diagram of the cell is shown in figure 2.7.


Figure 2.7: Long-Short Term Memory cell diagram.

2.7.3 Bias-Variance Dilemma

The Bias-Variance Dilemma is a trade-off problem in machine learning and statistics where the bias and the variance of a predictive model cannot both be optimized at the same time. Many models usually have parameters that have to be estimated, i.e. fitted, against data. How many parameters a model has and how the model is defined determines the model’s complexity. A simple linear model has for example a very low complexity while a polynomial of a high degree has a very high complexity. The linear model therefore has low variance since it can only predict things on a line, while the polynomial has high variance since it can better adapt to the data. The problem is that when looking at bias, the opposite is true. The linear model has a high bias since its predictions only fall on a line, while the polynomial has low bias since its predictions can vary more. A perfect model would have both low bias and low variance since those two traits make models generalize better. However, in reality this is not possible, hence the dilemma.

Overfitting

Artificial neural networks are very good at fitting data because of their many parameters. This however also means that network models tend to have high variance, which causes them to overfit data. Overfitting simply means that a model has high variance and is able to fit the data so perfectly that it becomes a problem when trying to make predictions for unobserved data. For example, consider a complex time series forecasting model with high variance. It might fit the training data perfectly and get very low in-sample errors, but its out-of-sample forecasts might have very high errors due to the model overfitting the training data. This is especially true if future data doesn't behave exactly like the historic data. The model is therefore not able to generalize properly and cannot make accurate forecasts. To avoid overfitting one should investigate regularization techniques.

Regularization

Regularization is a technique used in machine learning and statistics to minimize the risk of overfitting data. There are many different types of regularization because it's a very broad term. Some techniques like L1 (Lasso regression) and L2 (Ridge regression) regularization introduce a penalty term in the model's loss function that penalizes model complexity. Other techniques include early stopping and dividing the training data into a training set and a validation set. Early stopping simply means that the training process stops when a certain condition is met, for example when the loss function hasn't decreased after a certain number of iterations. This can be used together with a validation set. The purpose of a validation set is to avoid calculating the loss function on training data and instead perform the calculation on unobserved validation data to get a better estimate of the out-of-sample error. This means that while the training error always decreases, the validation error might start to increase after a while. With early stopping one can therefore use the validation error as a way to avoid overfitting the data.

2.8 Metrics

2.8.1 Mean Squared Error

Mean Squared Error (MSE) is a metric that measures the difference between a series forecast and the series itself by looking at the mean of the squared differences. It can be formulated as:

$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Y}_i - Y_i\right)^2,$$

where $\hat{Y}$ is the forecast series and $Y$ is the actual series. One may also take the root of the MSE to get the Root Mean Squared Error (RMSE).


2.8.2 Mean Absolute Error

Mean Absolute Error (MAE) is a metric that measures the difference between a series forecast and the series itself by looking at the mean of the absolute values of the differences. It can be formulated as:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{Y}_i - Y_i\right|,$$

where $\hat{Y}$ is the forecast series and $Y$ is the actual series.

2.8.3 Precision and Recall

Precision and recall are two measures of relevance for categorical data. Precision is the percentage of relevant items among the retrieved items and recall is the percentage of relevant items that were actually retrieved. Precision can therefore be described by "How many selected items are relevant?" and recall by "How many relevant items are selected?". Their mathematical definitions are:

$$\mathrm{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}, \qquad \mathrm{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}.$$

2.8.4 Correlation

A correlation coefficient is a measure that describes how similar or dissimilar two data sets are. There are several different sample correlation coefficients, but Kendall's $\tau$ was chosen in this thesis since it allows for the relation between two data sets to be non-linear. The mathematical definition is:

$$\tau = \frac{2}{n(n-1)}\sum_{i<j} \mathrm{sgn}(X_i - X_j)\,\mathrm{sgn}(Y_i - Y_j),$$

where $X$ and $Y$ are the two data sets for which the correlation should be measured.
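The metrics of this chapter are straightforward to compute; a minimal sketch (with SciPy's Kendall's tau and made-up data) is shown below:

```python
import numpy as np
from scipy.stats import kendalltau

def mse(forecast, actual):
    forecast, actual = np.asarray(forecast), np.asarray(actual)
    return np.mean((forecast - actual) ** 2)

def mae(forecast, actual):
    forecast, actual = np.asarray(forecast), np.asarray(actual)
    return np.mean(np.abs(forecast - actual))

def precision_recall(predicted, actual):
    """Precision and recall for binary labels (1 = positive, 0 = negative)."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    tp = np.sum((predicted == 1) & (actual == 1))
    fp = np.sum((predicted == 1) & (actual == 0))
    fn = np.sum((predicted == 0) & (actual == 1))
    return tp / (tp + fp), tp / (tp + fn)

y_hat, y = [0.1, 0.2, 0.15], [0.12, 0.18, 0.2]
print(mse(y_hat, y), mae(y_hat, y))
print(precision_recall([1, 0, 1, 1], [1, 0, 0, 1]))
tau, p_value = kendalltau([1, 2, 3, 4], [1, 3, 2, 4])  # Kendall's tau
print(tau)
```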


Method

3.1 Data Selection

The examined data in this thesis consisted of closing prices and 30-day implied volatility for 13 Norwegian stocks and one index, the Norwegian OBX index, for a total of 14 different assets. All assets are defined in appendix A.

Closing prices from the S&P 500 index along with its volatility index VIX, used for simulating VIX Futures trading, were also included. The start date was set to 2014-01-01 and the end date was set to 2019-12-31 to give us six years' worth of data. This allowed us to use more than a year's worth of data for testing since only 20% of the data was used for testing purposes, as shown in table 3.1.

For hyperparameter optimization (section 3.5), data between 2012-01-01 and 2017-12-31 was used. However, data from 2012 didn’t exist for all assets, so the actual time intervals varied from asset to asset. All time series were also split into a train data set (80%) and test data set (20%) before training. Table 3.1 shows time intervals used for different comparison sections in this thesis.

It also clearly shows that the test intervals are not overlapping.


Section | Train Interval          | Test Interval
D       | 2012-01-01 - 2016-10-24 | 2016-10-25 - 2017-12-31
4.2     | 2014-01-01 - 2018-10-24 | 2018-10-25 - 2019-12-31
4.3     | 2014-01-01 - 2018-10-24 | 2018-10-25 - 2019-12-31
4.6     | 2014-01-01 - 2018-10-24 | 2018-10-25 - 2019-12-31
4.7     | 2014-01-01 - 2018-10-24 | 2018-10-25 - 2019-12-31

Table 3.1: Time intervals used in comparison sections in this thesis.

Prices and implied volatilities for all assets were provided by DNB. The prices were daily closing prices and the implied volatilities were computed from ATM options using the Black-Scholes model. The S&P 500 index and its volatility index VIX were also provided by DNB, together with reporting dates between 2015-01-01 and 2019-12-31 for the Norwegian stocks.

3.2 Model Selection

In this thesis three univariate time series models and two machine learning models were used to model and forecast DIFF and RV in order to get future differences between RV and IV, as discussed in section 1.1. The three univariate models were a model based on an Ornstein–Uhlenbeck process with compound Poisson jumps, an AR model and an HAR model. Both the AR and HAR models were chosen because they have been proven to work quite well when modeling volatility, as seen in the HAR study by Corsi [7]. The Ornstein–Uhlenbeck process with jumps was selected since the difference between realized and implied volatility should be a mean reverting process, which an Ornstein–Uhlenbeck process should model quite well. The machine learning models, on the other hand, have not been studied in detail for volatility modeling, but recurrent neural networks should be able to handle temporal data quite well. The LSTM model, which at the time of writing is one of the most popular recurrent neural networks, was therefore chosen for the comparison. The MLP model was chosen because it is a more basic neural network compared to the LSTM and is therefore a good baseline neural network to use in the comparison.


3.3 Performance Metrics

To compare the models in this thesis, the measures from section 2.8 were used. In order to combine a measure for all tickers, the mean measure for each model using data from all assets was calculated. This made it possible to compare the performance of models using data from multiple assets. Scores for each model were also calculated by dividing each measure for the models by the maximum measure among all models in the comparison and multiplying it by 100. If a metric was decreasing, i.e. the best value for the metric is 0 (MSE, error rate etc.), the score was inverted by taking 100 - score. This resulted in a percentage-based [0, 100] score, where 100 is the best and 0 is the worst. To combine scores from different assets, the mean value was calculated. The score is therefore a normalized measure that makes it easier to measure the relative accuracy of the models. One important thing to note is that since the scores are relative to other models in the same comparison, scores from different comparisons cannot be compared (unless otherwise stated). The solution in that case is to instead look at the mean measure. Another important note is that scores were based on MSEs for the forecasting models and on error rates for the classification models.
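A sketch of the score computation described above; the function name and the example MSE values are illustrative:

```python
import numpy as np

def scores_from_measures(measures, decreasing=True):
    """Relative [0, 100] scores within one comparison: each measure is divided
    by the maximum measure and multiplied by 100; for decreasing metrics
    (MSE, error rate, ...) the score is inverted as 100 - score."""
    measures = np.asarray(measures, dtype=float)
    score = measures / measures.max() * 100.0
    return 100.0 - score if decreasing else score

# Example: MSEs of three models on one asset (lower is better)
print(scores_from_measures([0.02, 0.01, 0.04]))  # best model gets the highest score
```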

3.4 Methodology

In this thesis, forecasts of the relation between implied and realized volatility were studied. A backtesting suite was therefore built from scratch using Python to allow for a wide variety of models to be tested and evaluated. The program consists of four components:

• Data Pre-Processing: Pre-processing data and splitting data into train/test data sets.

• Modeling: Fitting and forecasting/classifying data using a specified model.

• Analysis: Comparing forecasts/classifications and creating trading sig- nals.

• Trading: Simulating trading using trading signals.

3.4.1 Data Pre-Processing

The main objective of this thesis was to study the relation between RV and IV, and since the only available data in this thesis consisted of prices and IV, RV needed to be calculated. Calculation of RV was performed using the formula described in section 2.3 with the period set to 22 trading days, i.e. one month. The relation between the two volatility measures was then calculated as described in section 2.5.

The next step was to calculate the difference (DIFF = RV - IV) and split the data into train and test data sets. The split was set to 80% train and 20% test data for all models in this thesis. RV, IV and DIFF were also plotted along with the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Since reporting dates were provided by DNB for each Norwegian stock, the series could also be plotted together with the reporting dates. Since presenting data results for all assets would not be manageable, it was decided to do a case study for only one of the assets, namely the DNB stock. Results from the data case study are presented in section 4.1.

3.4.2 Modeling

Three univariate time series models and two machine learning models were implemented and compared in this thesis. Each of the models forecasted both RV and DIFF. That meant that for each model type, two separate models were trained, one using RV and one using DIFF data. This also meant that when RV was forecasted, calculation of future difference between RV and IV was required to be able to compare it to the DIFF forecast and also to calculate trading signals as described in section 2.5.

The forecasts were made using 22-step ahead forecasting for each day in the test data set. This meant that for each day, a forecast for the following 22 days was performed. This resulted in a forecast matrix where rows describe dates in the test data set and columns describe at which date the 22-step ahead forecast was made. This means that the matrix is a banded lower triangular matrix where the diagonal is the one day ahead forecast and each next row is a forecast made one more day ahead. For example, if the test data set has n dates and 22-day ahead forecasts are made, the resulting matrix would be an n x n lower triangular matrix with bandwidth 22. 22-day ahead forecasts were chosen because the given data set consisted of 30-day (= 22 trading days) IV and 22-day RV. This meant that the 22-day ahead forecasts consisted of forecasted values for every day until maturity, and this was repeated for every date in the test data set.
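A sketch of how such a forecast matrix can be built is shown below; the `forecaster` callable is a placeholder for any of the five models, and the indexing convention (first subdiagonal = one-day-ahead forecast) is an assumption of this sketch:

```python
import numpy as np

def forecast_matrix(test_series, forecaster, horizon=22):
    """Build a banded lower-triangular forecast matrix: column j holds the
    `horizon`-step path forecast made at test date j, so entry (j + i, j)
    is the i-step-ahead forecast for date j + i."""
    n = len(test_series)
    matrix = np.full((n, n), np.nan)
    for j in range(n):
        path = forecaster(test_series[: j + 1], horizon)  # 22 forecasted values
        steps = min(horizon, n - 1 - j)
        matrix[j + 1 : j + 1 + steps, j] = path[:steps]
    return matrix

# Example with a naive "repeat the last value" forecaster
naive = lambda history, horizon: np.repeat(history[-1], horizon)
print(forecast_matrix(np.arange(6, dtype=float), naive, horizon=3))
```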


OUJ Model

To fit the OUJ model to the data, maximum likelihood estimation was used to estimate the parameters $\mu$, $\sigma$, $\mu_j$, $\sigma_j$ and $\lambda$. For this, the PDF described in section 2.6.5 was used together with the SciPy package in Python to minimize the negative log-likelihood of the data differences, i.e. the delta values. During testing, it was found that this optimization took a long time. A limit of 10 iterations was therefore set before terminating the optimization. Parameters were also only refitted once more than a year's worth of new data was available. Initial parameter estimates were also computed to improve the fit even further. After experimenting with parameter estimation for simulated OUJ processes, a set of estimators that seemed to work quite well as initial estimators was found.

A jump threshold was chosen as the 90th percentile of the data; this threshold could then be used to split the data into an OU part and a jump part. $\mu$ could then be estimated as the mean of the OU part of the data. To estimate $\sigma$, the standard deviation of the OU data differences was calculated and divided by $\sqrt{dt}$, since an OU process has $\sigma\sqrt{dt}$ as the standard deviation of its increments. A value to split the differences into an OU part and a jump part was therefore required. After looking at histograms of the differences it was noted that the data had a heavy right tail. This was due to the fact that the used OU process included jumps; otherwise the histogram would have been shaped as a normal distribution. The absolute value of the 5% quantile was therefore chosen as a splitting value. Using $\sigma$, $\theta$ could be calculated from the long term variance of an OU process ($\mathrm{Var}(OU) = \frac{\sigma^2}{2\theta}$). $\mu_j$ could then be calculated as the mean of the jump part of the differences and $\sigma_j$ as the standard deviation of the same jump data.

To be able to use the OUJ model for forecasting purposes, the expected value of an OU process was used as the forecasting function (see section 2.6.3). This means that jumps were ignored in the forecasts and only the underlying OU process was forecasted.
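A sketch of the likelihood minimization is shown below; the parameter ordering, the truncation of the jump count and the made-up data are assumptions of this sketch rather than the exact thesis implementation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, poisson

def ouj_neg_log_likelihood(params, y, dt=1.0 / 252, max_jumps=10):
    """Negative log-likelihood of observed differences dY under the OUJ delta
    density of section 2.6.5 (a Poisson mixture of normal densities)."""
    theta, mu, sigma, lam, mu_j, sigma_j = params
    dy = np.diff(y)
    mean_ou = theta * (mu - y[:-1]) * dt
    density = np.zeros_like(dy)
    for k in range(max_jumps + 1):
        weight = poisson.pmf(k, lam * dt)
        density += weight * norm.pdf(dy, loc=mean_ou + mu_j * k,
                                     scale=np.sqrt(sigma**2 * dt + sigma_j**2 * k))
    return -np.sum(np.log(density + 1e-300))

# Example fit on a made-up series; x0 stands in for the initial estimators above
y = np.cumsum(np.random.default_rng(0).normal(0, 0.01, 500))
x0 = [1.0, 0.0, 0.2, 5.0, 0.05, 0.05]
result = minimize(ouj_neg_log_likelihood, x0, args=(y,),
                  method="Nelder-Mead", options={"maxiter": 10})
print(result.x)
```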

AR & HAR Models

The chosen time series models were the AR model and the HAR model discussed in sections 2.6.6 and 2.6.7, both of which were implemented using the ARCH package in Python. Lags for both models were chosen to be 1, 5, 22.

These specific lags capture information for short- (daily, 1 day), medium- (weekly, 5 days) and long-term (monthly, 22 days) frequencies.
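Assuming the ARCH package's ARX/HARX interface, fitting and forecasting could look roughly as follows; the simulated realized volatility series is only a placeholder:

```python
import numpy as np
from arch.univariate import ARX, HARX

# Simulated realized volatility series standing in for the real data
rng = np.random.default_rng(0)
rv = 0.2 + 0.05 * np.abs(rng.standard_normal(1000))

ar = ARX(rv, lags=[1, 5, 22]).fit(disp="off")    # AR with daily/weekly/monthly lags
har = HARX(rv, lags=[1, 5, 22]).fit(disp="off")  # HAR with averaged lag windows

forecast = har.forecast(horizon=22)              # 22-step ahead forecast
print(forecast.mean.iloc[-1])
```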


MLP & LSTM Models

The chosen machine learning models were the MLP model and the LSTM model discussed in sections 2.7.1 and 2.7.2. These machine learning models were implemented using the Tensorflow package and the Keras framework. Both models were chosen to have n = 22 inputs, which correspond to the 22 previous days of the time series. This n was chosen to be the same as the number of days in the future that was forecasted. For example, if one were to forecast values 22 trading days into the future for an option with maturity 30 days, the input to the models would be the values for the previous 22 trading days. The optimal networks were chosen using the methodology described in section 3.5. For the MLP, a two layer network with 64 nodes in the first layer and 32 nodes in the second layer was chosen. For the LSTM, a single layer network with 16 nodes was chosen. Each of the Tensorflow models was trained using batches of the same size as the number of inputs to the model. Since forecasting was performed for each day in the test data set, that also meant that after each forecast had been made, a new data point was available for training. For this, batch training was used by stalling training until a year's worth of data was available. Different stalling time periods were tested before this study began, but a year seemed to generally work best.
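A sketch of the two forecasting networks in Keras is shown below; the activation functions and the single-output design are assumptions, since only the layer sizes and the 22-day input window are stated above:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_inputs = 22  # the previous 22 trading days are used as network input

# Forecasting MLP: two hidden layers with 64 and 32 nodes
mlp = keras.Sequential([
    keras.Input(shape=(n_inputs,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1),                    # forecasted value
])

# Forecasting LSTM: a single layer with 16 nodes
lstm = keras.Sequential([
    keras.Input(shape=(n_inputs, 1)),   # (time steps, features)
    layers.LSTM(16),
    layers.Dense(1),
])

mlp.compile(optimizer="adam", loss="mse")
lstm.compile(optimizer="adam", loss="mse")
mlp.summary()
```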

Hybrid Models

In an attempt to generate better trading signals than the ones defined in section 2.5, classification models based on either an MLP or LSTM were also created.

The inputs to such a model were the previous 22 days of DIFF data (the same as the number of days until maturity), and the output was a signal in the range [-1, 1]. To use the model in practice, a forecasting model first forecasted the volatility difference using either DIFF or RV as training data. The forecasted difference was then used as input to the trained neural network, which classified trading signals. We call these models (i.e. a forecasting model combined with a neural network for classification) hybrid models. For the signal classification MLP and LSTM, the optimal networks were chosen using the methodology described in section 3.5 and the results are shown in section 3.5.2. For the MLP, a five layer network with 32 nodes in the first and fifth layers, 64 nodes in the second and fourth layers and 128 nodes in the third (middle) layer was chosen. For the LSTM, a single layer network with 32 nodes was chosen.

Seasonal Decomposition

Seasonal decomposition was used in this thesis to see if the accuracy of the forecasts could be further improved by modeling the seasonal component as a separate process. The decomposition was performed using STL (see section 2.6.2) and the seasonal component was modeled and forecasted using the AR model described in section 3.4.2. The seasonally adjusted component (Y_t - S_t) was then modeled using the five studied models as usual. The results are shown in section 4.6.
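
With statsmodels, such a decomposition could be sketched as follows; diff_series is a placeholder for the modeled series and the period of 22 days is an illustrative choice.

from statsmodels.tsa.seasonal import STL
from arch.univariate import ARX

# Decompose into trend, seasonal and residual components
res = STL(diff_series, period=22).fit()

# Model and forecast the seasonal component with the AR model
seasonal_fit = ARX(res.seasonal, lags=[1, 5, 22]).fit(disp="off")

# The seasonally adjusted series (Y_t - S_t) is then modeled
# with the five studied models as usual
seasonally_adjusted = diff_series - res.seasonal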

Tensorflow Setup

All Tensorflow models (forecasting and classification) used the Adam optimizer and the MSE loss function. To reduce the risk of overfitting, Tensorflow's early stopping functionality was used to limit the number of iterations during training (see section 2.7.3). The minimum change in the monitored quantity was set to 0.001 and the patience was set to 10. Furthermore, the training data was split into a training set (80%) and a validation set (20%) so that the validation MSE could be monitored by the early stopping callback function in Tensorflow. This meant that 80% · 80% = 64% of the total data was used for training, 80% · 20% = 16% was used for validation/early stopping and the rest, 20%, was used for testing purposes.
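
The training setup described above corresponds roughly to the following Keras configuration; model, x_train and y_train are placeholders, and the epoch limit and restore_best_weights flag are assumptions not stated in the text.

from tensorflow import keras

model.compile(optimizer="adam", loss="mse")

early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",   # validation MSE
    min_delta=0.001,      # minimum change in the monitored quantity
    patience=10,
    restore_best_weights=True,  # assumption
)

model.fit(
    x_train, y_train,
    validation_split=0.2,  # 20% of the training data used for validation
    batch_size=22,         # batch size equal to the number of inputs
    epochs=500,            # upper bound; early stopping ends training sooner
    callbacks=[early_stopping],
    verbose=0,
)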

3.4.3 Analysis

An important aspect of this thesis was to compare the results of the models to the actual outcome. An analysis component was therefore written to handle exactly that. However, since each model outputs a forecast matrix, a diagonal first had to be chosen before comparing the forecast to the test data set.

Each shifted diagonal corresponded to an i-day ahead forecast, i ∈ {1, 2, . . . , n}, made at each date in the test data set as described in section 3.4.2. This meant that different i-step ahead forecasts could be analysed. However, the most interesting horizon is i = 22, which is the forecast for the expiration date of an option with maturity 22 trading days into the future. This meant that only the last forecasted value of each 22-day ahead forecast was used.
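
Under the assumption that the forecast matrix is indexed by (forecast date, target date), extracting the i-day ahead series amounts to taking the i-th shifted diagonal, as in the sketch below.

import numpy as np

def i_step_forecasts(forecast_matrix, i):
    # Assumed layout: forecast_matrix[t, s] is the forecast made at test
    # date t for test date s, so the i-day ahead forecasts lie on the
    # i-th shifted diagonal
    return np.diagonal(forecast_matrix, offset=i)

# Example: keep only the 22-day ahead forecasts
# forecasts_22 = i_step_forecasts(forecast_matrix, i=22)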

After a forecast series was extracted, different metrics were computed as well as the signals described in section 2.5. The metrics were the (root) mean squared error ((R)MSE) and mean absolute error (MAE) for the forecasts as well as the error rate, precision and recall for the forecasted signals. The correlation between trading signals and profits was also measured using Kendall's τ. The forecasts were plotted to give a visual representation of each model's performance. The results are shown in section 4.2. To analyze the distributions of the results for each model, box plots and bar plots were used, as shown in section 4.4.1.
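
These metrics can be computed with standard library functions; a sketch with illustrative variable names (actual, forecast, signals and profits as arrays, with signals taking values in {-1, 1}) is shown below.

import numpy as np
from scipy.stats import kendalltau
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             precision_score, recall_score)

# Forecast accuracy
mse = mean_squared_error(actual, forecast)
rmse = np.sqrt(mse)
mae = mean_absolute_error(actual, forecast)

# Signal accuracy
error_rate = np.mean(forecast_signals != actual_signals)
precision = precision_score(actual_signals, forecast_signals, pos_label=1)
recall = recall_score(actual_signals, forecast_signals, pos_label=1)

# Correlation between trading signals and profits (Kendall's tau)
tau, p_value = kendalltau(forecast_signals, profits)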

3.4.4 Trading

Options trading was performed by executing simulated trades based on trading signals. The signals were calculated from model forecasts based on the theory from section 2.5. A buy signal at time t therefore indicated that 1 contract should be bought at time t, and the opposite for a sell signal. Since buying cheap options and selling expensive options was of interest, any type of option where the price depends on the volatility could have been traded. In this thesis, however, straddles (section 2.4.2) were traded, since a straddle's profit does not depend on the underlying's direction, but only on the underlying's deviation from the strike price. Volatility swaps or variance swaps, which trade the volatility or variance directly, could also have been used. However, such swaps are generally only traded for large indices and this thesis required the ability to trade using stocks as well. Straddles are therefore a good alternative, even though their value does not depend solely on the underlying's volatility.
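
For a single straddle contract, the simulated profit then reduces to the payoff at maturity minus the premium paid, with the sign given by the trading signal; a sketch with illustrative names is shown below (zero interest rate, so premiums are not discounted).

def straddle_profit(S_T, K, call_price, put_price, signal):
    # Straddle payoff at maturity depends only on the deviation from K
    payoff = abs(S_T - K)
    premium = call_price + put_price
    # signal = +1: buy the straddle, signal = -1: sell the straddle
    return signal * (payoff - premium)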

However, VIX futures trading using the S&P 500's VIX index was also simulated. This was possible since VIX values are essentially delivery prices for variance swaps expressed as a volatility. Trading using VIX could therefore easily be approximated using the VIX volatility as the delivery price and the realized volatility of the S&P 500 as the price at expiration. For all trading calculations, the interest rate was set to 0%, i.e. a zero interest rate environment was assumed.
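
Under this approximation, a simulated VIX trade reduces to comparing the VIX level at the trade date with the subsequently realized volatility; a sketch with illustrative names is shown below, where the notional scaling is an assumption.

def vix_trade_profit(vix_at_trade, realized_vol, signal, notional=1.0):
    # VIX is treated as the delivery price of a variance swap quoted as a
    # volatility, and realized S&P 500 volatility as the price at expiration
    # signal = +1: long volatility, signal = -1: short volatility
    return signal * notional * (realized_vol - vix_at_trade)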

The simulated trading results were analyzed by calculating costs, payoffs and profits which were then summarized to get aggregated results. The results are presented in section 4.3.

For the model comparison, four simple trading strategies, namely OPT, BUY, SELL and RANDOM, were also implemented. OPT stands for optimal and is a strategy where the trader knows the future RV and can therefore calculate the exact DIFF series for the future. This is therefore not a strategy that is possible to use in reality, but it serves as an optimal reference for the models in this thesis. The three other strategies are almost self-explanatory. BUY always bought options and SELL always sold options. This resulted in the profit amounts being equal but having different signs. RANDOM traded using completely random trading signals 1000 times per asset. The mean profit was then calculated per asset to serve as the RANDOM benchmark.
