Heteroscedasticity Models and their Forecasting Performance


Degree project in mathematics, 15 credits

Supervisor and examiner: Rolf Larsson
September 2015

Department of Mathematics Uppsala University


Sebastian Sjöholm


Abstract

The aim of this paper is to fit six different heteroscedasticity models and compare their forecast accuracy in two different markets: equities and exchange rates. I compare ARCH (p), GARCH (1,1), GARCH (1,1) with Student-t distribution, the Integrated GARCH, the Exponential GARCH and the Threshold GARCH model. Forecasts are made for 100 days ahead and the models are evaluated by the mean squared error (MSE), where the lowest value indicates the best performance for the given stock or exchange rate. It should be mentioned that the results do not differ much between the models compared.


1 Introduction

1.1 Background

Market risks associated with high volatility can be catastrophic for investors. For an average investor, the performance of a portfolio usually depends on a few trading days during which extreme returns occur. During these days the bad outcomes can completely reverse the good outcomes accumulated during the previous period. Investors are therefore primarily concerned with the extreme risks that have the greatest impact on their portfolio's performance, and forecasting volatility plays one of the most important parts in portfolio allocation.

The three main purposes of forecasting volatility are risk management, portfolio allocation and taking bets on future volatility. Risk management involves measuring the potential future losses of a portfolio, which requires estimates of future volatilities and correlations. In asset allocation, minimizing the risk for a given level of expected return is a standard approach. The volatility of an asset, its risk, can also be seen as the size of the asset's price movements: the more the price moves, the larger the potential loss in value.

The first model for changing variance in time series was the Autoregressive Conditional Heteroscedasticity (ARCH) model by Engle (1982), in which the conditional variance is a function of past residuals; various extensions have followed. For this work Engle was awarded the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel in 2003.

Since the introduction of the ARCH model several more sophisticated models have appeared. One of them is the Generalized ARCH model by Bollerslev (1986), which is a generalization of the ARCH process and also allows for past conditional variances in the current conditional variance equation.

The models differ in which features of the data they capture, and this affects their forecast accuracy. Both the ARCH and GARCH models capture volatility clustering, which means that periods of high volatility tend to be followed by periods of high volatility, and leptokurtosis, which means that the return distribution has heavier tails than the normal distribution (positive excess kurtosis). But because they respond symmetrically to positive and negative shocks, they fail to model the leverage effect. To address this problem, many nonlinear GARCH models have been proposed, such as the Exponential GARCH (EGARCH) model by Nelson (1991), where volatility can react asymmetrically to good and bad news.

Empirical analyses also suggest that the models sometimes fit better when the Gaussian assumption is relaxed and the innovations are instead assumed to follow a heavier-tailed zero-mean distribution such as the Student's t-distribution.

So, the question is: do the more sophisticated models capture the volatility better? And which model has the best forecast accuracy?

Several studies have examined the forecast accuracy of different models on different securities.

1.2 Previous studies

Hansen and Lunde (2005) compared 330 different ARCH-type models in terms of their ability to describe the conditional variance. They evaluated the models out-of-sample using DM/USD exchange rate data and IBM return data, where the IBM evaluation was based on a new data set of realized variance. They found no evidence that a GARCH (1,1) model is outperformed by more sophisticated models in their analysis of exchange rates. But the GARCH (1,1) model was poor in comparison to models with a leverage effect in their analysis of IBM returns.

Marucci (2005) compared different standard GARCH models with more complex models like Markov Regime-Switching GARCH (MRS-GARCH) in their ability to forecast the U.S. stock market volatility at horizons that range from one day to one month. He found that MRS-GARCH was better than all standard GARCH models in forecasting volatility at horizons shorter than one week. However, at horizons longer than one week, standard asymmetric GARCH models tend to be better.

Alberg, Shalit and Yosef (2008) compared the forecast performance of several GARCH models with different distributions for two Tel Aviv stock index returns. Their results showed that, among the forecasts tested, the EGARCH skewed Student-t model outperformed the GARCH, TGARCH and APARCH models in forecasting the Tel Aviv stock index returns. The results also showed that asymmetric models, such as the EGARCH, improve the forecasting performance.

Zhang and Yu (2009) used Shanghai Stock Exchange Composite Index data to construct ARCH-type models under the assumptions of normal and non-normal residuals. They compared the volatility forecasts of the normal and non-normal ARCH models, as well as the performance of the VaR measure, to demonstrate the effect of the distributional assumption on the forecasting ability of GARCH models.

Their results showed that the volatility forecasts of GARCH models that assume a Student's t-distribution for the residual terms are superior to those that assume a normal distribution.


Laurent, Rombouts and Violante (2012) used 10 assets from the New York Stock Exchange and compared 125 different multivariate GARCH models on their 1-, 5- and 20-day-ahead forecast performance of the conditional variance over a period of 10 years. The results show that in unstable markets the multivariate GARCH models perform poorly.

Ramasamy and Munisamy (2012) compared simulated exchange rates of the Malaysian Ringgit with actual exchange rates using GARCH, TGARCH and EGARCH models. The forecasted rates are compared with the actual daily exchange rates for four different currencies, the Australian dollar, the Singapore dollar, the Thai baht and the Philippine peso, in 2011. The results show that these GARCH models perform better when predicting the more volatile exchange rates than when predicting the less volatile ones. The results also show that the leverage effect incorporated in the TGARCH and EGARCH models does not improve the results much.

1.3 Aim

The aim of this paper is to examine six different heteroscedasticity models: ARCH, GARCH, IGARCH, TGARCH, EGARCH and, as the sixth model, GARCH with a Student-t distribution instead of the normal distribution. The main goal is to examine whether one of these models significantly outperforms the others in forecasting the conditional volatility of different securities. I therefore focus not only on the stock market but also on exchange rates. I have chosen three stocks, Apple, Coca-Cola and Google, and three currencies, the United States dollar (USD), the British pound (GBP) and the Swedish krona (SEK). For all six data sets I excluded the last 100 observations when modeling the volatility, because I used them to compare the models' forecasts of the conditional variance with the real observations. The result was measured using the Mean Squared Error (MSE), which is the mean of the squared differences between the observed squared residuals and the forecasted conditional variances.

I chose to examine the ARCH model because it is the first heteroscedasticity model, by Engle (1982), and my purpose is to compare it with the more sophisticated models. The GARCH (1,1) model is often suggested to be sufficient to capture the volatility, and according to Hansen and Lunde (2005) there is no evidence that the more sophisticated models outperform the GARCH (1,1) model in modeling exchange rates.

The EGARCH and IGARCH models are more sophisticated than the others. For example, the EGARCH model was developed because empirical data from different markets show that negative price moves have a bigger impact on future volatility than positive price moves, which is called asymmetric volatility.


For the stock data I have collected 2025 observations for Apple and 2024 for Coca-Cola and Google. For the exchange rates I have 396 observations for GBP/SEK and 364 for SEK/USD and GBP/USD, so there is quite a big difference in sample size between the stocks and the exchange rates. This difference is interesting because the forecast accuracy of the models might depend on the sample size. For example, Ng and Lam (2006) investigated how the sample size affects the GARCH model and found that 1000 or more observations are recommended for the conventional GARCH model. We might therefore expect better forecasting accuracy from the original GARCH model for the stocks than for the exchange rates.

2 Heteroscedasticity models

2.1 Autoregressive Conditional Heteroscedasticity (ARCH) model

The Autoregressive Conditional Heteroscedasticity models are built to capture time-varying volatility in time series. Engle (1982) introduced the first such model and used it to estimate the means and variances of inflation in the U.K. (Engle, 1982).

The basic idea behind the ARCH model in modeling financial data is that (1) the shock of an asset return, $a_t$, is serially uncorrelated but dependent, because of the assumption that high volatility appears in clusters, and (2) the dependence of $a_t$ can be described by a quadratic function of its lagged values. The model is built on the information set at time $t-1$.

The conditional variance function depends on p lagged observations:

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \dots + \alpha_p a_{t-p}^2 \qquad (1)$$

Equation (1) shows that large past values of the squared shocks imply a large conditional variance. A shock that causes a big change in the asset return $a_t$ therefore makes the model forecast large values for the future volatility, which reproduces the clustering of volatility (Tsay 2005:102).

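Consider the asset return function

$$a_t = \sigma_t\varepsilon_t \qquad (2)$$

where $\{\varepsilon_t\}$ is a sequence of independent identically distributed (iid) random variables with mean zero and variance 1, and $\sigma_t$ is the positive square root of the ARCH (p) function described above,

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \dots + \alpha_p a_{t-p}^2 \qquad (3)$$

where $\alpha_0 > 0$ and $\alpha_j \geq 0$ for $j > 0$.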
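To make the recursion concrete, the following minimal sketch evaluates the ARCH (p) conditional variance for a short series. The function name, the coefficient values and the innovations are made up for illustration, and returning `None` for the first p time points (which lack a full history) is a simplifying assumption rather than what any particular package does.

```python
def arch_variance(returns, alpha0, alphas):
    """Conditional variances sigma_t^2 of an ARCH(p) model.

    returns : innovations a_1, ..., a_T
    alpha0  : intercept, must be > 0
    alphas  : [alpha_1, ..., alpha_p], each >= 0
    The first p entries have no complete history and are set to None.
    """
    if alpha0 <= 0 or any(a < 0 for a in alphas):
        raise ValueError("need alpha0 > 0 and alpha_j >= 0")
    p = len(alphas)
    sigma2 = [None] * len(returns)
    for t in range(p, len(returns)):
        # alphas[j] is alpha_{j+1} and multiplies a_{t-(j+1)}^2
        sigma2[t] = alpha0 + sum(
            alphas[j] * returns[t - 1 - j] ** 2 for j in range(p)
        )
    return sigma2


# Made-up innovations: a calm stretch, one large shock, then calm again.
a = [0.01, -0.01, 0.01, 0.08, 0.01, -0.01]
s2 = arch_variance(a, alpha0=1e-4, alphas=[0.3, 0.2])
```

In this example the conditional variance right after the large shock, `s2[4]`, is far above the value in the calm stretch, `s2[2]`, which is the clustering behavior described above.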

One weakness of the ARCH (p) model is that the model assumes that positive and negative shocks have the same effect on volatility because it depends on the square of the previous returns. In practice it is well known that the price of a financial asset responds differently to positive and negative shocks (Tsay 2005:106).

How to select the order p of an ARCH model is debatable, and the required order might be very large, which makes this model unattractive (Brooks 2008:391). In this paper we will use Akaike's Information Criterion (AIC), the Bayesian Information Criterion (BIC) and the Hannan-Quinn Information Criterion (HQIC) to decide which order best describes the conditional variance.

2.2 GARCH (p,q) model

Although the ARCH model is simple, it often requires many parameters to describe the conditional variance. The Generalized Autoregressive Conditional Heteroscedasticity model (GARCH (p,q)) introduced by Bollerslev (1986) is a generalization of the ARCH process.

To illustrate the GARCH (p,q) model, consider the log return of an asset, $r_t$, and let $a_t = r_t - \mu_t$ be the innovation at time t. Then we define the GARCH (p,q) model as:

$$a_t = \sigma_t\varepsilon_t \qquad (4)$$

where again $\{\varepsilon_t\}$ is a sequence of iid random variables with mean zero and variance 1, and $\sigma_t$ is the positive square root of:

$$\sigma_t^2 = \alpha_0 + \sum_{j=1}^{p}\alpha_j a_{t-j}^2 + \sum_{i=1}^{q}\beta_i\sigma_{t-i}^2 \qquad (5)$$

where $\alpha_0 > 0$, $\alpha_j \geq 0$ and $\beta_i \geq 0$ for $i, j > 0$, and $\sum_{i=1}^{\max(p,q)}(\alpha_i + \beta_i) < 1$. We also assume that $\alpha_j = 0$ for $j > p$ and $\beta_i = 0$ for $i > q$. The constraint on $\alpha_i + \beta_i$ implies that the unconditional variance is finite, while the conditional variance $\sigma_t^2$ varies over time. If $q = 0$ in equation (5), the parameters $\beta_i$ vanish and the GARCH (p,q) model reduces to an ARCH (p) model. The parameters p and q are referred to as the ARCH and GARCH orders respectively.

The GARCH (1,1) model can be written as

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1\sigma_{t-1}^2 \qquad (6)$$

where $\alpha_0 > 0$, $\alpha_1, \beta_1 \geq 0$ and $\alpha_1 + \beta_1 < 1$.

If we look at equation (5) above we notice that large values of the lagged variances and squared returns give a large conditional variance, which again is the well-known behavior of volatility clustering in financial time series (Tsay 2005:114).

Like the ARCH (p) model, the GARCH (p,q) model responds equally to positive and negative shocks (Tsay 2005:116).
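As an illustration of equation (6), the sketch below runs the GARCH (1,1) recursion on a short made-up series. The parameter values are hypothetical, and starting the recursion at the unconditional variance $\alpha_0/(1-\alpha_1-\beta_1)$ is one common convention that is assumed here, not something specified in the text.

```python
def garch11_variance(returns, alpha0, alpha1, beta1):
    """Conditional variances of a GARCH(1,1) model.

    The recursion is started at the unconditional variance
    alpha0 / (1 - alpha1 - beta1), which is an assumed initialization.
    """
    if not (alpha0 > 0 and alpha1 >= 0 and beta1 >= 0 and alpha1 + beta1 < 1):
        raise ValueError("parameter constraints of GARCH(1,1) violated")
    sigma2 = [alpha0 / (1.0 - alpha1 - beta1)]
    for t in range(1, len(returns)):
        sigma2.append(alpha0 + alpha1 * returns[t - 1] ** 2 + beta1 * sigma2[t - 1])
    return sigma2


# Made-up innovations: a single 5% shock surrounded by zeros.
a = [0.0, 0.05, 0.0, 0.0]
sigma2 = garch11_variance(a, alpha0=1e-5, alpha1=0.10, beta1=0.85)
```

After the shock the variance jumps and then decays only gradually, because the term $\beta_1\sigma_{t-1}^2$ carries the elevated level forward. The same shock with the opposite sign would give exactly the same path, which is the symmetry noted above.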

2.3 GARCH (p,q) with Student-t distribution

When we defined the GARCH (p,q) model we assumed that $\varepsilon_t$ was iid N(0,1). Here we instead assume that $\varepsilon_t$ follows a Student-t distribution with $v$ degrees of freedom, t(v). Analyses of empirical financial data have found that a Student's t-distribution sometimes describes the distribution of $a_t$ better (Brockwell and Davis 2002:352).
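A raw t(v) variable has variance $v/(v-2)$ for $v > 2$, so if the unit-variance convention of equation (4) is to be kept, the draws have to be rescaled. The sketch below shows one way to generate such standardized innovations with the Python standard library only; the function name and the choice $v = 8$ are illustrative assumptions.

```python
import math
import random


def standardized_t(v, rng):
    """One Student-t draw with v > 2 degrees of freedom, rescaled to variance 1."""
    if v <= 2:
        raise ValueError("the variance of t(v) is finite only for v > 2")
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(v / 2.0, 2.0)  # chi-square with v degrees of freedom
    t = z / math.sqrt(chi2 / v)            # t(v), variance v / (v - 2)
    return t * math.sqrt((v - 2.0) / v)    # rescale to unit variance


rng = random.Random(1)  # fixed seed so the example is reproducible
draws = [standardized_t(8, rng) for _ in range(100000)]
sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / len(draws)
```

The sample mean and variance should come out close to 0 and 1, while extreme draws occur more often than under the normal distribution.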

2.4 Integrated GARCH model (IGARCH)

In 1986 Engle and Bollerslev introduced the integrated GARCH model (IGARCH), which is persistent in variance because today's information remains important for forecasts at all horizons. First we consider the lag operator $L^i a_t = a_{t-i}$ and the lag polynomials $\alpha(L) = \alpha_1 L + \dots + \alpha_p L^p$ and $\beta(L) = \beta_1 L + \dots + \beta_q L^q$. Then the GARCH model can be written as:

$$\sigma_t^2 = \alpha_0 + \alpha(L)a_t^2 + \beta(L)\sigma_t^2 \qquad (7)$$

Writing $\nu_t = a_t^2 - \sigma_t^2$, we may represent the GARCH model as an ARMA (m,q) model for $a_t^2$, where $m = \max(p,q)$:

$$[1 - \alpha(L) - \beta(L)]\,a_t^2 = \alpha_0 + [1 - \beta(L)]\,\nu_t \qquad (8)$$

Consider the autoregressive polynomial $1 - \alpha(L) - \beta(L)$ in equation (8). Engle and Bollerslev said that when this polynomial has $d > 0$ unit roots and $m - d$ roots outside the unit circle, the GARCH model is integrated in variance of order $d$ (IGARCH (p,d,q)). In particular, a unit root is present when the coefficients sum to one, $\alpha(1) + \beta(1) = 1$.

With some modifications equation (6) may be written as:

$$\sigma_t^2 = \alpha_0 + \alpha(L)a_t^2 + \beta(L)\sigma_t^2 \qquad (9)$$

which is the Integrated GARCH model by Engle and Bollerslev (1986) (Hafner 1998:103). Notice that the Integrated GARCH model is the GARCH model in which the coefficients sum to one, $\alpha(1) + \beta(1) = 1$ (Reider 2009:11).

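The IGARCH (1,1) model may be written as:

$$a_t = \sigma_t\epsilon_t, \qquad \sigma_t^2 = \alpha_0 + (1 - \beta_1)a_{t-1}^2 + \beta_1\sigma_{t-1}^2 \qquad (10)$$

where the first relation is equation (2), $\epsilon_t$ is iid N(0,1) and we have replaced $\alpha_1$ by $1 - \beta_1$.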
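The persistence of the IGARCH model can be seen by iterating the variance forecast forward. For a (1,1) model the expected squared shock equals the variance forecast, so each further step multiplies the previous forecast by $\alpha_1 + \beta_1$ and adds $\alpha_0$. The sketch below is a minimal illustration with made-up parameter values and a hypothetical function name; it is not the forecasting routine of any package.

```python
def variance_forecasts(a_last, sigma2_last, alpha0, alpha1, beta1, horizon):
    """Variance forecasts for steps 1..horizon from a (1,1) model.

    Step 1 uses the last observed shock and variance; later steps replace
    the unknown squared shock by its expectation, the previous forecast.
    """
    forecasts = [alpha0 + alpha1 * a_last ** 2 + beta1 * sigma2_last]
    for _ in range(horizon - 1):
        forecasts.append(alpha0 + (alpha1 + beta1) * forecasts[-1])
    return forecasts


# Made-up starting point and parameters, 100 steps ahead.
garch_path = variance_forecasts(0.03, 4e-4, alpha0=1e-5, alpha1=0.10,
                                beta1=0.85, horizon=100)
igarch_path = variance_forecasts(0.03, 4e-4, alpha0=1e-5, alpha1=1 - 0.85,
                                 beta1=0.85, horizon=100)
```

With $\alpha_1 + \beta_1 = 0.95$ the GARCH forecasts decay toward the unconditional variance $\alpha_0/(1-\alpha_1-\beta_1) = 2 \cdot 10^{-4}$, whereas with $\alpha_1 + \beta_1 = 1$ the effect of the starting point never dies out and the forecasts grow by $\alpha_0$ per step.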

2.5 Asymmetry in volatility

The heteroscedasticity models presented so far assume symmetry in volatility: negative and positive innovations have the same impact on the forecast of the conditional variance. But in many financial markets this is not the case. In fact, in many markets negative shocks are found to have a bigger impact on volatility than positive shocks, which is known as the leverage effect. For example, Ferreira, Menezes and Mendes (2007) investigated the hypothesis that the conditional variance of stock returns is an asymmetric function of past information on seven different stock market indices. Their results show that the conditional variance is in fact asymmetric, and that negative shocks have a bigger impact on the variance than positive shocks.

2.6 Exponential GARCH model (EGARCH)

While the GARCH (p,q) model captures the short-run temporal dependencies in magnitude for a variety of speculative assets, the model does not capture the leverage effect in stock returns (Bollerslev and Mikkelsen 1996:159). In the GARCH models, the conditional variances are functions of the magnitude of the lagged residuals and do not depend on whether they are positive or negative. To capture this, the inclusion of different asymmetric terms in the conditional variance equation has been suggested. One of these models is the Exponential GARCH (EGARCH) model proposed by Nelson (1991). The EGARCH model may be written as follows:

$$\ln(\sigma_t^2) = \alpha_0 + \sum_{j=1}^{q} g_j(\varepsilon_{t-j}) + \sum_{i=1}^{p}\beta_i\ln(\sigma_{t-i}^2) \qquad (11)$$

The conditional variance is guaranteed to be positive because the logarithm of $\sigma_t^2$, rather than $\sigma_t^2$ itself, is modeled as a function of past $\varepsilon_t$'s.

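We define $g_j(\varepsilon_{t-j})$ as the weighted innovation:

$$g_j(\varepsilon_{t-j}) = \theta_j\varepsilon_{t-j} + \gamma_j\left(|\varepsilon_{t-j}| - E|\varepsilon_t|\right), \quad j = 1, \dots, q \qquad (12)$$

where $\theta_j$ and $\gamma_j$ are real constants. Both $\varepsilon_t$ and $|\varepsilon_t| - E(|\varepsilon_t|)$ are zero-mean iid sequences with continuous distributions. Therefore $E[g_j] = 0$ (Tsay 2005:124).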

In equation (12) the term $\gamma_j(|\varepsilon_{t-j}| - E|\varepsilon_t|)$ represents the magnitude effect, as in the GARCH (p,q) model. Suppose for example that $\theta = 0$ and $\gamma > 0$: then the innovation to $\ln(\sigma_{t+1}^2)$ is positive when the magnitude of $\varepsilon_t$ is larger than its expected value, and negative when it is smaller. The term $\theta_j\varepsilon_{t-j}$ represents the sign effect. If $\theta < 0$, a negative shock adds to the log variance while a positive shock of the same magnitude subtracts from it, so the conditional variance responds differently to positive and negative shocks of equal magnitude (Nelson 1991:351). Equations (11) and (12) together form Nelson's (1991) EGARCH model, and the most popular version is the EGARCH (1,1) model, with p and q equal to 1.

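The EGARCH (1,1) model may then be written as a combination of (11) and (12), where (12) becomes

$$g(\varepsilon_{t-1}) = \theta_1\varepsilon_{t-1} + \gamma_1\left[|\varepsilon_{t-1}| - E|\varepsilon_t|\right] \qquad (13)$$

which we insert into equation (11) for p = q = 1:

$$\ln(\sigma_t^2) = \alpha_0 + \theta_1\varepsilon_{t-1} + \gamma_1\left[|\varepsilon_{t-1}| - E|\varepsilon_t|\right] + \beta_1\ln(\sigma_{t-1}^2) \qquad (14)$$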

It should also be mentioned that when the EGARCH model assumes a Gaussian distribution, the expected value is $E|\varepsilon_t| = \sqrt{2/\pi}$, so we get the equation:

$$\ln(\sigma_t^2) = \alpha_0 + \theta_1\varepsilon_{t-1} + \gamma_1\left[|\varepsilon_{t-1}| - \sqrt{2/\pi}\right] + \beta_1\ln(\sigma_{t-1}^2) \qquad (15)$$

Because negative shocks tend to have a bigger impact on volatility, $\theta$ is often assumed to be negative (Tsay 2005:124).
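The asymmetry can be checked numerically with a single update step of the Gaussian EGARCH (1,1) recursion. The sketch below uses made-up parameter values with $\theta_1 < 0$ and compares the log variance after a negative and a positive standardized shock of the same size; the function name is hypothetical.

```python
import math

E_ABS_NORMAL = math.sqrt(2.0 / math.pi)  # E|eps| for a standard normal eps


def egarch11_log_variance(eps_prev, log_sigma2_prev, alpha0, theta, gamma, beta1):
    """One step of the Gaussian EGARCH(1,1) recursion; returns ln(sigma_t^2)."""
    g = theta * eps_prev + gamma * (abs(eps_prev) - E_ABS_NORMAL)
    return alpha0 + g + beta1 * log_sigma2_prev


# Made-up parameters; theta < 0 produces the leverage effect.
params = dict(alpha0=-0.4, theta=-0.1, gamma=0.2, beta1=0.95)
start = math.log(2e-4)
neg = egarch11_log_variance(-2.0, start, **params)
pos = egarch11_log_variance(2.0, start, **params)
```

The two results differ by $-2\theta_1 \cdot 2 = 0.4$ in log variance, so the negative shock leaves the variance about $e^{0.4} \approx 1.5$ times higher than the positive one. Because the model is written in logs, the variance `math.exp(neg)` is positive for any parameter values.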

2.7 TGARCH

Another model within the asymmetric GARCH family is the Threshold GARCH (TGARCH) model by Rabemananjara and Zakoian (1993), which is an extension of the Threshold ARCH model that includes the lagged conditional standard deviations as a regressor (Miron and Tudor 2010). The TGARCH model can be written as

$$\sigma_t^2 = \alpha_0 + \sum_{j=1}^{p}\alpha_j a_{t-j}^2 + \sum_{j=1}^{p}\gamma_j S_{t-j} a_{t-j}^2 + \sum_{i=1}^{q}\beta_i\sigma_{t-i}^2 \qquad (16)$$

where:

$$S_{t-j} = \begin{cases} 1 & \text{if } a_{t-j} < 0 \\ 0 & \text{if } a_{t-j} \geq 0 \end{cases} \qquad (17)$$

By examining equations (16) and (17) we see that a negative shock contributes $(\alpha_j + \gamma_j)a_{t-j}^2$ to the conditional variance, whereas a positive shock contributes only $\alpha_j a_{t-j}^2$. Negative changes therefore have a bigger effect than positive changes, which captures the leverage effect and is the purpose of the asymmetric GARCH models. Notice that $\gamma_j$ is expected to be positive; otherwise positive shocks would have a greater effect than negative shocks.
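A one-step sketch of equation (16) for orders p = q = 1 shows the threshold term at work. The parameter values and the function name are made up for illustration.

```python
def tgarch11_variance(a_prev, sigma2_prev, alpha0, alpha1, gamma1, beta1):
    """One step of the threshold recursion (16) with p = q = 1."""
    s_prev = 1.0 if a_prev < 0 else 0.0  # indicator S_{t-1} from equation (17)
    return alpha0 + (alpha1 + gamma1 * s_prev) * a_prev ** 2 + beta1 * sigma2_prev


# Made-up parameters with gamma1 > 0, and shocks of +/- 3%.
params = dict(alpha0=1e-5, alpha1=0.05, gamma1=0.10, beta1=0.85)
after_positive = tgarch11_variance(0.03, 2e-4, **params)
after_negative = tgarch11_variance(-0.03, 2e-4, **params)
```

The difference between the two results is exactly $\gamma_1 a_{t-1}^2$, the extra contribution that only negative shocks receive.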


3 Data

The data used in my study are the daily closing prices of three different stocks: Apple, Coca-Cola and Google. I have also compared the results of forecasting three different exchange rates between the British pound (GBP), the United States dollar (USD) and the Swedish krona (SEK). All the data were downloaded from Google Finance. The stock data are dated from 2007-01-03 to 2015-06-15 and contain 2025 observations for Apple and 2024 observations for Coca-Cola and Google. The last 100 observations were left out in order to evaluate the forecasting results. The exchange rates are dated from 2014-02-04 to 2015-03-10, with 396 observations each for GBP/SEK and SEK/USD and 398 observations for GBP/USD. Here too the last 100 observations were left out to evaluate the forecast accuracy. All calculations were done in R using the rugarch package.

Graph 1 (Left) Daily closing price (USD) for Coca-Cola (black) and Apple (red) from 2007-01-03. (Right) Daily closing price (USD) for Google from 2007-01-03 to 2015-01-21.


Graph 2 (Top) Daily closing price (SEK) for GBP/SEK from 2014-02-04 to 2015-03-10. (Middle) Daily closing price (USD) for SEK/USD from 2014-02-04 to 2015-03-10. (Bottom) Daily closing price (USD) for GBP/USD from 2014-02-04 to 2015-03-10.

To be able to model the volatility of the financial data, the trend must be removed from each data set. We do this by differencing each time series.

The stock data were differenced using the log returns $a_t = \ln(y_t) - \ln(y_{t-1})$, where $y_t$ is the observed closing price at time t. The exchange rates were differenced using the scaled log returns

$a_t = 100 \cdot (\ln(y_t) - \ln(y_{t-1}))$, where $y_t$ is the exchange rate at time t.

The sample autocorrelation function (sample ACF) and the sample partial autocorrelation function (PACF) were plotted for all stationary data sets. The sample ACF is a tool for estimating the dependence in the data. For example, if the sample ACF is close to zero at all nonzero lags, we might suggest that the series is iid noise (Brockwell and Davis 2002:18). The PACF can also be used to determine the order of the ARCH (p) model (Tsay 2005:119).
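The two preprocessing steps, log returns and the sample ACF, can be sketched in a few lines of plain Python. The prices are made up, the function names are hypothetical, and the band $\pm 1.96/\sqrt{n}$ is the usual approximate 95% bound for the sample autocorrelations of iid noise.

```python
import math


def log_returns(prices, scale=1.0):
    """Log returns a_t = scale * (ln y_t - ln y_{t-1}); scale=100 gives percent."""
    return [
        scale * (math.log(prices[t]) - math.log(prices[t - 1]))
        for t in range(1, len(prices))
    ]


def sample_acf(x, max_lag):
    """Sample autocorrelations rho(0), ..., rho(max_lag) of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n  # lag-0 sample autocovariance
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / n / c0
        for k in range(max_lag + 1)
    ]


# Made-up closing prices, for illustration only.
prices = [100.0, 101.0, 100.0, 103.0, 100.0, 101.0, 100.0, 102.0, 100.0]
returns = log_returns(prices)
acf_returns = sample_acf(returns, max_lag=3)
acf_squared = sample_acf([r * r for r in returns], max_lag=3)
band = 1.96 / math.sqrt(len(returns))  # approximate 95% bound under iid noise
```

Applying `sample_acf` to the squared returns, as in the last lines, is how dependence in magnitude is examined in the graphs that follow.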

The Lagrange multiplier (LM) test for the ARCH effect was carried out for all residuals up to lag 30 and is presented in Appendix 1-6. If the null hypothesis of the LM test is not rejected, i.e. the p-value exceeds 5%, we cannot reject that all coefficients in the ARCH model are zero, and we say there is no ARCH effect (Engle 1982).
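For intuition, the LM statistic can be computed by hand in the simplest case of one lag. Engle's test regresses $a_t^2$ on a constant and its lags and uses $T \cdot R^2$, which is asymptotically $\chi^2$ with as many degrees of freedom as lags; with a single regressor, $R^2$ is the squared correlation between $a_t^2$ and $a_{t-1}^2$. The sketch below covers only this lag-1 case, on a made-up series with a calm and a volatile regime, and is not the multi-lag procedure used for the appendices.

```python
def arch_lm_lag1(a):
    """Engle's LM statistic T * R^2 for one lag of the squared series."""
    sq = [v * v for v in a]
    y, x = sq[1:], sq[:-1]  # regress a_t^2 on a_{t-1}^2
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = sxy * sxy / (sxx * syy)
    return n * r_squared


CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of chi-square with 1 d.o.f.

# Made-up series: 20 calm observations followed by 20 volatile ones.
a = [0.01 * (-1) ** t for t in range(20)] + [0.05 * (-1) ** t for t in range(20)]
lm_stat = arch_lm_lag1(a)
reject_no_arch = lm_stat > CHI2_1_CRIT_5PCT
```

For this series the statistic is far above the critical value, so the null hypothesis of no ARCH effect is rejected, as expected when volatility clearly comes in clusters.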

Graph 3 Daily log returns (USD) from Apple (top), Coca-Cola (middle) and Google (bottom).

Graph 3 shows the log returns of the three equities from 2007-01-03 to 2015-01-21. The top graph represents Apple, the middle graph Coca-Cola and the bottom graph Google. As we can see in the graph, all series show a marked increase in volatility during the financial crisis of 2007 to 2009. It can also be seen that volatility seems to appear in clusters, which is a sign of an ARCH effect: there are periods of high volatility and periods of low volatility.

Graph 4 ACF function for Apple (top), Coca-Cola (middle) and Google (bottom).

Graph 4 shows the sample ACF for the three equities Apple (top), Coca-Cola (middle) and Google (bottom). Notice that the value at lag zero is always one. Examining the graphs, we see no significant autocorrelation at any lag, and the sample autocorrelations seem equally likely to be positive or negative from one lag to the next. But there might still be dependence in magnitude. If we square the residuals and plot the ACF again (Graph 5), we see that there is indeed correlation in magnitude from one lag to another. LM tests for the ARCH effect were carried out on the three stocks (see Appendix 1-3) and all p-values are less than 0.05, so we have an ARCH effect.

Graph 5 The ACF function of squared residuals for Apple (top), Coca-Cola (middle) and Google (bottom).

Graph 6 shows the partial ACF of the squared residuals for all three equities. For Apple, the PACF shows significant lags up to lag 10 and then cuts off. The PACF cuts off at lag 3 for Coca-Cola and at lag 12 for Google.

Graph 6 The PACF function of squared residuals for Apple (top), Coca-Cola (middle) and Google (bottom).

Graph 7 Log returns for GBP/USD (top), SEK/USD (middle) and GBP/SEK (bottom).

Graph 7 shows the differenced series for the exchange rates. Notice that here too the volatility appears in clusters, with periods of high volatility and periods of low volatility. If we look at the sample ACF for the exchange rates in Graph 8, we notice that there seems to be no correlation past lag 1 for any of the three exchange rates. If we look back at Graph 4 we notice that lag 1 is not significant for the stock data, but it is for all three exchange rates. This means that the degree of serial dependence is higher for the exchange rates than for the stocks.

As in the case of the three stocks, we also check the squared residuals to see whether there is correlation in the magnitude of change from one lag to another. Examining Graph 9, we notice that there might still be some correlation in magnitude.

Graph 8 ACF functions for GBP/USD (top), SEK/USD (middle) and GBP/SEK (bottom).

Graph 9 The ACF function of squared residuals for GBP/USD (top), SEK/USD (middle) and GBP/SEK (bottom).

In Graph 10 below, the PACF of the squared residuals cuts off at lag one for all three exchange rates. If we compare the ACF and PACF of the squared residuals for the stocks and the exchange rates, we see that the correlation is greater for the three stocks than for the exchange rates, and that the correlation for the three exchange rates seems weak according to the sample ACF and PACF.

Graph 10 The PACF function of squared residuals for GBP/SEK (top), SEK/USD (middle) and GBP/SEK (bottom).

The LM test on the residuals was done for the exchange rates, as in the case for the stocks, and it was found that for GBP/SEK the null hypothesis is rejected for all lags except at lag 27 and 30.

For the exchange rate SEK/USD there is an interval between lags 3 and 13 where the p-value of the LM test exceeds 0.05, and we do not reject the null hypothesis for these lags, although the remaining lags have p-values less than 0.05. For GBP/USD the p-values for lags 7, 9-13 and 17-19 exceed 0.05 and we do not include them in our modeling. The remaining lags for GBP/USD have p-values less than 0.05 and are therefore included. These results can be found in Appendix 1-6.

Stock/ER    Mean        Maximum     Minimum     Std. Dev.   Skewness    Kurtosis
Apple        0.001093   0.132172    -0.197280   0.021843    -0.467878    6.758486
Coca-Cola    0.000379   0.1822549   -0.123424   0.019373     0.437862   10.65368
Google       0.000395   0.1823549   -0.122429   0.019382     0.436909   10.62484
GBP/USD     -0.021365   0.82305     -0.99099    0.252954    -0.454664    2.212875
SEK/USD     -0.067910   0.97372     -1.25158    0.332382    -0.418516    1.145960
GBP/SEK      0.046572   1.30594     -1.00965    0.312067     0.421306    1.286480

Table 1 Statistics about the log return series.

Table 1 shows some statistical information about the logarithmic return series. The kurtosis of a normal distribution is 3 (Shumway and Stoffer 2006:282) and, as we can see in Table 1, the kurtosis of the three stocks exceeds 3, while the kurtosis of all the exchange rates is below 3. So none of the time series seems to be normally distributed.

4 Building an ARCH (p) model

In order to build an ARCH (p) model it is important to determine the correct number of lags. There are many ways of determining the order and below you will find three different information criteria that will be used in order to determine the best order for the ARCH (p) models.

4.1 Determining the order for an ARCH (p) model

The methods used to compare the considered models are Akaike's, the Bayesian and the Hannan-Quinn information criteria. Note that these criteria only compare the models considered and select the one that best fits the given data. Also, because this selection is based on information criteria, no p-values for the individual coefficients are needed; only coefficients estimated to be exactly zero are treated as insignificant.

4.1.1 Akaike's Information Criteria (AIC)

The first method used to determine which of the considered ARCH models best fits the given data is Akaike's Information Criterion (AIC) (Javed and Mantalos 2013). The idea is that you first fit an ARCH (p) model with p lags to the data. The parameters are estimated by maximizing the conditional likelihood function given the data. The likelihood function for an ARCH model is given by:

$$L(\alpha_0, \alpha_1, \ldots \mid a_1, \ldots) = \prod_{t=1}^{N} f_{\alpha_0, \alpha_1, \ldots}(a_t \mid a_{t-1}) \qquad (16)$$

where f is the density function of the distribution specified for the model, e.g. the normal or the Student-t distribution.

When we have maximized the likelihood function we use its value to compute the AIC, which is to be minimized and which in R is given by:

$$AIC = \frac{-2\ln(L)}{N} + \frac{2k}{N} \qquad (17)$$

where N is the number of observations in our data after differencing, k is the number of parameters in the ARCH and GARCH models, respectively (Ghalanos 2013:26), and L is the maximized value of the likelihood.

4.1.2 Bayesian Information Criteria (BIC)

The Bayesian Information Criterion (BIC) is another way to determine the number of lags to include for the best fit of ARCH/GARCH models given a number of candidate models (Javed and Mantalos 2013). Many studies suggest that when the sample size is large, the BIC performs better in selecting the correct number of lags (McQuarrie and Tsai, 1998).

The BIC function for an ARCH (p) model in R is given by:

$$BIC = \frac{-2\ln(L)}{N} + \frac{k\ln(N)}{N} \qquad (18)$$

Notice that, compared to the AIC, the BIC penalizes more complex models, i.e. models with more parameters, more heavily relative to models with fewer parameters. Here too, the model that minimizes the value of the function is selected.

4.1.3 Hannan-Quinn Information Criteria (HQIC)

Hannan and Quinn (1979) introduced another information criterion for model selection (Javed and Mantalos 2013). The HQIC function in R is calculated as:

$$HQIC = \frac{-2\ln(L)}{N} + \frac{2k\ln(\ln(N))}{N} \qquad (19)$$

where, as before, N is the number of observations, ln(L) is the logarithm of the maximized likelihood function and k is the number of parameters in the model (Ghalanos 2013:26).
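The three criteria differ only in their penalty terms, which the following sketch makes explicit. The log-likelihood values, parameter counts and sample size are made up for illustration and do not come from the appendices; the function name is hypothetical.

```python
import math


def information_criteria(log_lik, k, n):
    """Per-observation AIC, BIC and HQIC as in equations (17)-(19)."""
    fit = -2.0 * log_lik / n
    return {
        "AIC": fit + 2.0 * k / n,
        "BIC": fit + k * math.log(n) / n,
        "HQIC": fit + 2.0 * k * math.log(math.log(n)) / n,
    }


# Made-up example: a small and a large ARCH model fitted to n = 1924 returns.
n = 1924
crit_small = information_criteria(log_lik=4900.0, k=9, n=n)   # e.g. ARCH(8)
crit_large = information_criteria(log_lik=4910.0, k=17, n=n)  # e.g. ARCH(16)
```

In this made-up case the AIC prefers the larger model while the BIC and the HQIC prefer the smaller one, because their penalties per parameter, $\ln(N)/N$ and $2\ln(\ln(N))/N$, exceed the AIC penalty $2/N$ at this sample size.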

The results from the information criteria applied to the six data sets can be seen in Appendix 1-6 along with the Lagrange multiplier test results. Appendix 1 shows that for the Apple stock the criteria select ARCH (16), ARCH (8) and ARCH (11) for the AIC, BIC and HQIC respectively. For the Coca-Cola stock in Appendix 2 all three information criteria select ARCH (19). Although the AIC gives the same value for ARCH (19) and ARCH (23), we choose ARCH (19) because it is preferable to have fewer parameters in our models. In Appendix 3, for the Google stock, we also see that the three information criteria select the same number of lags, namely ARCH (19).

For the exchange rate GBP/USD the criteria disagree strongly on the preferred model. Both the AIC and the HQIC select ARCH (28), while the BIC selects ARCH (1). This may be because, as mentioned in section 4.1.2, the BIC penalizes models with more parameters more heavily. For the SEK/USD exchange rate in Appendix 5, both the HQIC and the BIC select ARCH (1) and the AIC selects ARCH (15). In Appendix 6 all three information criteria select ARCH (1) for the GBP/SEK exchange rate.

4.2 Mean Square Error (MSE)

When we have selected the number of lags for the best-fitted model under each of the three criteria, we use the MSE function to determine which model to choose. This result is then compared with the MSE results of all other models, and the model with the lowest MSE value is the best forecasting model for the given data. To evaluate the MSE function we need to forecast the future conditional variances using the selected models. These forecasts are then compared with the future values of the squared log returns for the stocks and exchange rates. The MSE function can be described as:

$$MSE = \frac{1}{100}\sum_{t=1}^{100}\left(a_t^2 - \hat{\sigma}_t^2\right)^2 \qquad (20)$$

where $\hat{\sigma}_t^2$ represents the forecasted conditional variance and $a_t^2$ is the squared log return (Hansen and Lunde 2005). Notice that this function evaluates the forecasts over the next 100 daily observations. This is why these observations were excluded when determining the best fit of the models, so that they could be used to measure forecast accuracy. Also remember that this MSE function is the function used to compare and evaluate the different heteroscedasticity models.
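Equation (20) translates directly into code. The sketch below is written for an arbitrary evaluation window rather than exactly 100 observations, and the returns and variance forecasts in the example are made up; the function name is hypothetical.

```python
def forecast_mse(returns, variance_forecasts):
    """Mean squared error between squared returns and variance forecasts."""
    if len(returns) != len(variance_forecasts):
        raise ValueError("need one variance forecast per held-out return")
    n = len(returns)
    return sum(
        (a * a - s2) ** 2 for a, s2 in zip(returns, variance_forecasts)
    ) / n


# Made-up held-out returns and a constant variance forecast.
held_out = [0.01, -0.02, 0.03]
mse = forecast_mse(held_out, [2e-4, 2e-4, 2e-4])
```

Because squared daily returns are themselves very small numbers, the resulting MSE values are tiny, so differences between models show up only in the leading digits.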

4.3 Why use GARCH (1,1)?

Hansen and Lunde published a study in 2005 in which they compared different GARCH models to see if anything beats the GARCH (1,1) model. Their aim was to determine whether the developments of the GARCH model since Bollerslev's introduction of the first GARCH model in 1986 have led to better forecasting ability. They found that the ARCH (1) model is outperformed by other models, because it does not capture the persistence in volatility as the other models do. When they compared different GARCH models to the GARCH (1,1) model, they found no evidence to reject the standard GARCH (1,1) model in favor of other GARCH models. They found this somewhat surprising because the GARCH (1,1) model does not capture the leverage effect, unlike more sophisticated models such as EGARCH and TGARCH.

To be able to compare the results of different GARCH models, I chose to evaluate all models using order (1,1).
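The persistence mentioned above can be illustrated with a small sketch of the standard multi-step variance forecast of a GARCH (1,1) model. The one-step forecast is omega + alpha * a_T^2 + beta * sigma_T^2, and every further step replaces the unknown squared return by its expected value, giving omega + (alpha + beta) * (previous forecast). The parameter names and values are assumptions for illustration, not estimates from this thesis, and the sketch is not the rugarch implementation.

```python
def garch11_forecast(omega, alpha, beta, last_sq_return, last_variance, horizon):
    """Forecast the conditional variance 1..horizon steps ahead."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # One-step-ahead forecast uses the last observed squared return.
    variance = omega + alpha * last_sq_return + beta * last_variance
    forecasts = [variance]
    # Further steps: the expected squared return equals the forecast variance.
    for _ in range(horizon - 1):
        variance = omega + (alpha + beta) * variance
        forecasts.append(variance)
    return forecasts


# Illustrative parameter values only.
path = garch11_forecast(
    omega=0.1, alpha=0.1, beta=0.8,
    last_sq_return=4.0, last_variance=2.0, horizon=100,
)
long_run = 0.1 / (1.0 - 0.1 - 0.8)  # unconditional variance when alpha + beta < 1
print(path[0], path[-1], long_run)
```

The forecasts decay toward the long-run variance at rate alpha + beta per step. In an ARCH (1) model the corresponding rate is alpha alone, so a shock dies out much faster when alpha is small.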

4.4 Why use Student-t distribution?

Bollerslev suggested that the GARCH model $a_t = \sigma_t \varepsilon_t$ with an assumed conditionally normal distribution might not sufficiently capture the leptokurtosis in financial time series. He suggested that the innovations $\varepsilon_t$ sometimes have thicker tails than the normal distribution and are better described by a Student-t distribution. He therefore introduced the GARCH-t model, which assumes a Student-t distribution instead of the normal distribution (Bollerslev 1987).
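The thicker tails can be made concrete with a standard result, stated here as a sketch. The symbol $\nu$ for the degrees of freedom is an assumption and may differ from the notation used elsewhere in the thesis. For a Student-t distributed innovation with $\nu > 4$ degrees of freedom, standardised to unit variance, the kurtosis is

```latex
\operatorname{E}\left[\varepsilon_t^4\right] = 3 + \frac{6}{\nu - 4} > 3 , \qquad \nu > 4 ,
```

which exceeds the value 3 of the normal distribution and approaches it as $\nu \to \infty$.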

5 Method

The steps used for the models ARCH, GARCH, GARCH with Student-t innovations, IGARCH, EGARCH and TGARCH to find the best forecast accuracy for the conditional variance are explained below. The models were estimated for the three stocks Apple, Coca-Cola and Google and the three exchange rates USD/SEK, GBP/USD and GBP/SEK. Remember that the last 100 observations were excluded in order to evaluate the forecasting accuracy of the models. All calculations were done using the rugarch package in R.

1) First the price series were transformed into logarithmic returns to obtain stationary time series.

2) The ACF/PACF functions and the Lagrange multiplier test were applied to the residuals to examine whether there were any ARCH effects.

3) Then the likelihood function was maximized for different numbers of lags of the ARCH (p) model, and the logarithm of the maximized value was calculated.

4) With the different log-likelihood values, the AIC, HQIC and BIC functions were evaluated to find the number of lags that minimizes each function. The selected value then represented the number of lags to choose for the ARCH (p) model.

5) Then the coefficients for all models were estimated by maximizing the likelihood function for the chosen number of lags.

6) Then I estimated the conditional variance.

7) Then I calculated the log returns for the remaining 100 observations for each data set.

8) Finally, I evaluated the forecasting accuracy using the MSE function. The MSE function, as described in section 4.2, is:

MSE = \frac{1}{100} \sum_{t=1}^{100} \left( a_t^2 - \sigma_t^2 \right)^2    (21)
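The data preparation in steps 1 and 7 can be sketched as follows. This is an illustration in plain Python rather than the rugarch code used for the thesis. The function names and the short price series are hypothetical, and a hold-out of 2 observations stands in for the 100 used here.

```python
import math


def log_returns(prices):
    """Step 1: turn a price series into logarithmic returns."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def split_holdout(returns, n_holdout):
    """Set aside the last n_holdout returns for forecast evaluation."""
    if not 0 < n_holdout < len(returns):
        raise ValueError("n_holdout must be between 1 and len(returns) - 1")
    return returns[:-n_holdout], returns[-n_holdout:]


# Hypothetical prices, for illustration only.
prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.5]
returns = log_returns(prices)

# The estimation sample is used for fitting, the hold-out for evaluation.
train, holdout = split_holdout(returns, n_holdout=2)

# Step 7: squared log returns of the hold-out, to compare with the forecasts.
holdout_squared = [r ** 2 for r in holdout]
print(len(train), holdout_squared)
```

The squared hold-out returns play the role of $a_t^2$ in the MSE function, while the variance forecasts from the fitted models play the role of $\sigma_t^2$.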

