DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2019

Volatility Evaluation Using Conditional Heteroscedasticity Models on Bitcoin, Ethereum and Ripple

DARKO BLAZEVIC

FREDRIK MARCUSSON


Volatility Evaluation Using Conditional Heteroscedasticity Models on Bitcoin, Ethereum and Ripple

DARKO BLAZEVIC
FREDRIK MARCUSSON

Degree Projects in Financial Mathematics (30 ECTS credits)

Master's Programme in Applied and Computational Mathematics (120 credits)
KTH Royal Institute of Technology, 2019


TRITA-SCI-GRU 2019:099
MAT-E 2019:55

Royal Institute of Technology
School of Engineering Sciences, KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci


Abstract

This study examines and compares the in sample fit and out of sample volatility forecasts of four different conditional heteroscedasticity models, namely ARCH, GARCH, EGARCH and GJR-GARCH, applied to Bitcoin, Ethereum and Ripple. The models are fitted over the period from 2016-01-01 to 2017-12-31 and then used to obtain one day rolling forecasts during the period from 2018-01-01 to 2018-12-31. The study investigates three different themes, consisting of the structure of the modelling framework, the complexity of the models, and the relation between a good in sample fit and a good out of sample forecast.

AIC and BIC are used to evaluate the in sample fit, while MSE, MAE and R2LOG are used as loss functions when evaluating the out of sample forecasts against the chosen Parkinson volatility proxy. The results show that a heavier tailed reference distribution than the normal distribution generally improves the in sample fit, while no such generality is found for the out of sample forecast. Furthermore, it is shown that GARCH type models clearly outperform ARCH models in both in sample fit and out of sample forecast.

For Ethereum, it is shown that the best fitted models also result in the best out of sample forecasts for all loss functions, while for Bitcoin none of the best fitted models result in the best out of sample forecast. Finally, for Ripple, no generality between in sample fit and out of sample forecast is found.


Utvärdering av volatilitet via betingade heteroskedastiska modeller på Bitcoin, Ethereum och Ripple

Sammanfattning

This report investigates whether better fitted volatility models lead to better volatility forecasts for different heteroscedastic models, in this case ARCH, GARCH, EGARCH and GJR-GARCH, with different innovation distributions. The models are fitted for Bitcoin, Ethereum and Ripple over the period 2016-01-01 to 2017-12-31, after which one day forecasts are made for the period 2018-01-01 to 2018-12-31. The study investigates three themes, consisting of model structure, model complexity and the relation between a good fit and a good forecast. AIC and BIC are used to evaluate the fit of the models, and the loss functions MSE, MAE and R2LOG are used to evaluate the forecasts against the chosen Parkinson volatility proxy. The results show that innovation distributions with heavier tails than the normal distribution generally lead to a better fit, while no such conclusion can be drawn for the forecasts. Furthermore, the GARCH models show better results for both fit and forecasts than the simpler ARCH models. For Ethereum the same model was best for all loss functions, whereas for Bitcoin different models were best for the respective loss functions. For Ripple, no generality between fit and forecasts could be demonstrated either.


Acknowledgements

We would like to thank our supervisor Boualem Djehiche from the Department of Mathematics at KTH for insightful discussions, valuable input and mathematical guidance throughout the process of this thesis. Lastly, we would like to thank our families and friends for their support.


Contents

1 Introduction
1.1 Blockchain
1.2 Literature Review

2 Methodology and Data
2.1 Methodology
2.2 Data

3 Mathematical Theory
3.1 Stationarity and Financial Data
3.2 Conditional mean
3.3 Conditional variance
3.3.1 Autoregressive Conditional Heteroscedasticity model (ARCH)
3.3.2 General Autoregressive Conditional Heteroscedasticity model (GARCH)
3.3.3 Exponential General Autoregressive Conditional Heteroscedasticity model (EGARCH)
3.3.4 GJR General Autoregressive Conditional Heteroscedasticity model (GJR-GARCH)
3.4 Error distributions
3.5 Fitting and evaluation of in sample models
3.6 Evaluation of out of sample models

4 Results
4.1 Optimal in sample fit
4.1.1 Bitcoin
4.1.2 Ethereum
4.1.3 Ripple
4.2 Out of sample forecast
4.2.1 Bitcoin
4.2.2 Ethereum
4.2.3 Ripple
4.2.4 Best performing forecasts for Bitcoin
4.2.5 Best performing forecasts for Ethereum
4.2.6 Best performing forecasts for Ripple

5 Analysis & Discussion

6 Conclusion
6.1 Suggestions for future research

Appendices
A In sample results, Bitcoin
A.0.1 ARCH(1)
A.0.2 ARCH(2)
A.0.3 ARCH(3)
A.0.4 GARCH(1,1)
A.0.5 EGARCH(1,1)
A.0.6 GJR-GARCH(1,1)
A.1 In sample results, Ethereum
A.1.1 ARCH(1)
A.1.2 ARCH(2)
A.1.3 ARCH(3)
A.1.4 GARCH(1,1)
A.1.5 EGARCH(1,1)
A.1.6 GJR-GARCH(1,1)
A.2 In sample results, Ripple
A.2.1 ARCH(1)
A.2.2 ARCH(2)
A.2.3 ARCH(3)
A.2.4 GARCH(1,1)
A.2.5 EGARCH(1,1)
A.2.6 GJR-GARCH(1,1)
B Hypothesis tests
B.0.1 Augmented Dickey-Fuller test
B.0.2 Characteristics for Ethereum
B.0.3 Characteristics for Ripple


List of Figures

1.1 Volatility curves
2.1 Log returns of Bitcoin
2.2 Squared log returns of Bitcoin
2.3 QQ plot for Bitcoin plotted against the optimal normal distribution
2.4 QQ plot for Bitcoin plotted against the optimal generalized hyperbolic distribution
3.1 Parkinson volatility estimator (red) vs squared returns (black)
4.1 Bitcoin forecast performances for used loss functions
4.2 Ethereum forecast performances for used loss functions
4.3 Ripple forecast performances for used loss functions
4.4 Optimal forecast with MSE as loss function plotted against the proxy
4.5 Optimal forecast with R2LOG as loss function plotted against the proxy
4.6 Optimal forecast with MAE as loss function plotted against the proxy
4.7 Optimal forecast for all loss functions plotted against the proxy
4.8 Optimal forecast with MSE as loss function plotted against the proxy
4.9 Optimal forecast with MAE and R2LOG as loss functions plotted against the proxy
B.1 Log returns of Ethereum
B.2 Squared log returns of Ethereum
B.3 QQ plot for Ethereum plotted against the optimal normal distribution
B.4 Log returns of Ripple
B.5 Squared log returns of Ripple
B.6 QQ plot for Ripple plotted against the optimal normal distribution


Chapter 1

Introduction

During 2017, the cryptocurrency market expanded as never before. With an exploding market cap, most cryptocurrencies reached new levels, with Bitcoin, the leading currency in the media, growing more than 1200% (Coinmarketcap, 2019). Following the peak in December 2017, most cryptocurrencies fell by more than 50% from their highs. Even though the bubble burst, the yearly lows of the total crypto market cap have kept increasing, suggesting a net capital inflow to the market (ibid).

While it has become clear that the underlying blockchain technology has the potential to reform the whole financial world, it is still hard to say what impact cryptocurrencies will have in the long run. An important factor when dealing with cryptocurrencies is to gain an understanding of the development of the market as a whole as well as of particular currencies, since 80% of the market cap consists of the ten biggest cryptocurrencies (ibid). The fact that more and more capital is invested in the crypto market each year underlines the value of potentially predicting future movements.

As part of being currencies, the initial purpose of cryptocurrencies was to work as a medium of exchange, enabling lower costs and anonymized transactions via decentralized systems. During the last couple of years, the usage of cryptocurrencies seems to have shifted towards purely speculative investment for most users, leading to an explosive expansion as traditional money has been invested. As of today, the total market cap circulates around $128 billion, compared to $12 billion at the beginning of 2016 (Coinmarketcap, 2019). As a result of the increasing interest, quantifying the variation of cryptocurrencies becomes relevant. As stated above, the price changes of cryptocurrencies during the last couple of years have resulted in high volatility compared to traditional currencies.

While various studies model volatility of traditional underlyings, such as exchange rates (Charef, 2017; Pilbeam & Langeland, 2014; Gao et al., 2012) and indices (Lin, 2018; Sharma, 2015), little work exists on cryptocurrency volatility modelling, regarding both fitting the in sample data and forecasting the out of sample data.

A crucial and fundamental aspect of the world of finance is the study of volatility, which not too long ago was assumed to be constant in various modelling theories, led by the most famous one, by Black and Scholes in 1973. Nowadays, the assumption of constant volatility has been replaced by the knowledge that volatility is time-varying and predictable (Andersen & Bollerslev, 1997). Regardless of whether the asset is an exchange rate, an index or a stock, modelling and forecasting the volatility of returns is crucial in multiple settings within the finance sector, such as option pricing, where issuers price derivatives mainly depending on the volatility of the underlying asset. Risk management and portfolio allocation are also exposed to future volatility when performing hedges and estimating potential risks and losses, activities that have become more and more regulated by law, led by the MiFID (Markets in Financial Instruments Directive) regulations.

How are volatility forecasts then usually obtained? One common way is to imply the volatility from option market prices. Theoretically, this method contains all relevant information and parameters for estimating the future volatility via the Black-Scholes model. However, the supporting evidence for this resulting in correct future volatilities can be questioned. Bollerslev and Zhou (2005) show that option prices in general contain a risk premium, because volatility cannot be perfectly hedged, which in general gives an overestimation of the volatility and thus higher option prices. Another phenomenon calling the option-implied volatility into question is the famous volatility smile obtained from the Black-Scholes model. Figure 1.1 theoretically shows different strike prices of options with the same arbitrary underlying and maturity date plotted against the volatility, showing that at the money options have the minimum implied volatility, while in the money calls and in the money puts in general have the highest implied volatility. Thus, the same market produces multiple implied volatilities for the same underlying in the same period. In addition, financial crises have resulted in a more skewed curve (Figure 1.1) in reality than the theoretical smile curve obtained via Black-Scholes, due to institutions starting to use out of the money puts in their hedging strategies, leading to an increase in value for these. Furthermore, the limited maturity dates existing in the options market limit the time horizon for volatility forecasts with this method, which causes problems for certain underlying assets, such as cryptocurrencies, due to the very limited existence of options.

Figure 1.1: Volatility curves.

Based on the above argumentation, this thesis will instead use the only remaining objective method of volatility forecasting available for all assets, namely time series modelling. Four different types of conditional heteroscedasticity models, among many others, will be adapted to the cryptocurrency data and evaluated from two perspectives: in sample data (consisting of 730 data points between January 2016 and January 2018) and out of sample data (consisting of 365 data points between January 2018 and January 2019). One of the first models enabling modelling of conditional heteroscedasticity in volatility, the Autoregressive Conditional Heteroscedasticity (ARCH) model provided by Engle (1982), will be used. Despite its simplicity, the model requires many parameters in order to describe the volatility process. As a consequence, Bollerslev (1986) developed the Generalized Autoregressive Conditional Heteroscedasticity (GARCH) model, which is based on the precursor ARCH model but needs far fewer parameters to model the volatility process. Studies (Ning et al., 2015; Olbrys, 2012) have shown that innovations in financial asset volatility have an asymmetric impact, which weakens both ARCH and GARCH models: they take volatility clustering into account but assume that positive and negative shocks have the same impact on volatility. This weakness was studied by Nelson (1991), who extended the existing GARCH model to the Exponential Generalized Autoregressive Conditional Heteroscedasticity (EGARCH) model, which takes the differences between positive and negative shocks into account. The fourth and final model used in this thesis is the Glosten-Jagannathan-Runkle Generalized Autoregressive Conditional Heteroscedasticity (GJR-GARCH) model, which also models asymmetry.

This thesis aims to examine the four models adapted and evaluated on the three largest cryptocurrencies by market capitalization, namely Bitcoin, Ethereum and Ripple. The data, consisting of daily exchange rates against the dollar for each cryptocurrency between January 2016 and January 2019, was obtained from Coinmarketcap. Furthermore, the thesis aims to examine the models in the cryptocurrency market in terms of in sample fit and out of sample forecast. The thesis will consist of three main themes. First, the structure of the modelling framework will be investigated, including different error distributions, leading to an understanding of how this factor affects the in sample fit and out of sample forecasts. Secondly, more and less sophisticated models will be compared in order to evaluate potential differences in in sample and out of sample performance. The aim of this theme is not to find the most accurate forecasts, but to evaluate whether more complex models outperform less complex ones. Finally, the relation between in sample fit and out of sample forecast will be examined in order to determine whether a better in sample fit also results in a better out of sample forecast.

This thesis contributes to the existing literature by investigating volatility models in a new cryptocurrency era, including both an extreme upswing and an extreme downswing, in contrast to previous research. Because the cryptocurrency phenomenon is relatively new, the extended time period of data in this thesis includes a great amount of unexplored data. Also, to the best of the authors' knowledge, no volatility modelling has so far been performed and evaluated for Ripple.

1.1 Blockchain

The underlying technology enabling cryptocurrencies, blockchain, can be described as a decentralized database which records transactions between two parties efficiently and in a verifiable and permanent way. The process can be described in a couple of steps, starting with a transaction request by the sending party. The request is then received by the network, consisting of different computers (nodes), which process (mine) and validate the transaction before adding it to an existing blockchain, or declining it. If added to a chain, the transaction is complete and permanent, and the delivery is registered to the receiving party. In 2008, Satoshi Nakamoto, whose identity still remains unknown, introduced the world's first blockchain as part of the first ever launched cryptocurrency, namely Bitcoin. The revolutionary aspect of not requiring verification by a traditional trusted third party raised the interest in cryptocurrencies. As of the end of 2016, the total market cap for cryptocurrencies exceeded $15 billion. The following year attracted not only regular payment users but also investors and speculators. As a result, the crypto market boomed to levels never seen before, reaching an all time market cap peak of $830 billion. By the end of 2018, the market cap had fallen back to around $125 billion. To date, over 2000 new currencies have been introduced to the crypto market. Despite the number of currencies introduced, only a handful represent the big majority of the market cap from year to year.

1.2 Literature Review

The following chapter briefly presents and explains previous literature relevant to the examination of the presented topic and research questions. The material used below is limited to peer reviewed articles.

In the study What good is a volatility model?, Engle and Patton (2001) discuss GARCH modelling and forecasting, highlighting stylized facts about volatility. Primarily, it was shown that volatility exhibits persistence, meaning that large changes in the price of an asset are often followed by other large changes, and small changes in the asset price by new small changes, as also shown by Baillie et al. (1996), Chou (1988) and Schwert (1989). The implication of all these studies regarding volatility clustering is that volatility shocks today will significantly influence volatility many periods into the future. Secondly, Engle and Patton show that volatility in the long run is mean reverting, meaning that there is a normal level of volatility to which volatility eventually will return. In other words, mean reversion in volatility implies that current information has no effect on the long term forecast. Thirdly, the asymmetry of volatility is highlighted, also referred to as the leverage effect, which shows that negative shocks tend to have a greater impact than positive shocks on the volatility of equities. This reflection is supported by other studies as well, such as Black (1976), Christie (1982), Nelson (1991) and Glosten et al. (1993), which present evidence of volatility being negatively related to equity returns. Regardless of this, such evidence has in general not been found for traditional exchange rates. Furthermore, the asymmetric structure of volatility generally generates a skewed distribution of forecast prices, as exemplified (Figure 1.1) for option pricing.

In the study Asymmetric Volatility in Cryptocurrencies, Baur and Dimpfl (2018) investigate 20 of the world's largest cryptocurrencies based on market capitalization. Using threshold GARCH (TGARCH), which takes asymmetry into account, they show very different asymmetry effects compared to the equity markets. According to the study, positive shocks increase volatility by more than negative shocks. A suggested reason for this is the so called "fear of missing out" effect, where investors tend to go long in the cryptocurrency market as a result of positive news. This suggested effect is shown to be stronger for Ripple than for Bitcoin and Ethereum. The time series data for Ripple and Ethereum ran from their creation until 8 August 2018. For Bitcoin, the time series data used ran from 28 April 2018 to the same end date as for Ripple and Ethereum.

Bouoiyour et al. (2015) investigate nine different GARCH models applied to the log returns of Bitcoin in two different time intervals. The optimal fit was evaluated using AIC, BIC and HQC. In contrast to Baur and Dimpfl, the authors suggest that Bitcoin is more likely to be driven by negative shocks than positive ones in both time intervals, with a strong degree of asymmetry in the period from December 2010 to December 2014. As a result, EGARCH, which accounts for asymmetry and leverage effects, gave the best in sample fit in that time interval. Furthermore, a shorter interval from 1 January 2015 to 20 July 2016 was examined, where the same asymmetry was spotted, even though it was weaker than in the longer, earlier period. Additionally, they show that the duration of volatility persistence decreases drastically compared to the longer period. Finally, the authors conclude that the volatility seems to have been in continued decline since January 2015.

Bouri et al. (2017) present properties of Bitcoin before and after the price crash of 2013. Prior to the crash, the results pointed towards a positive relation between shocks and volatility, where positive news had a greater impact on volatility than negative news. However, this property ceased in the post crash period. The authors suggest that the inverted asymmetry compared to the equity market could be explained by the so called "safe haven effect", introduced by Baur (2011). In his study, Baur demonstrates the inverted asymmetry in gold, which is described as giving rise to the safe haven property. According to this, investors interpret rising gold prices as a signal of uncertainty in other markets, such as increased risk or uncertainty in macroeconomic and financial conditions. This in turn introduces uncertainty in the gold market, which yields higher volatility. Hence, positive shocks increase the volatility by more than negative ones.

GARCH Modelling of Cryptocurrencies (Chu et al., 2017) presents GARCH type modelling of seven different cryptocurrencies, namely Bitcoin, Dash, Dogecoin, Litecoin, Maidsafecoin, Monero and Ripple. Twelve different GARCH models, namely SGARCH(1,1), EGARCH(1,1), GJRGARCH(1,1), APARCH(1,1), IGARCH(1,1), CSGARCH(1,1), GARCH(1,1), TGARCH(1,1), AVGARCH(1,1), NGARCH(1,1), NAGARCH(1,1) and ALLGARCH(1,1), were fitted to the log returns of the cryptocurrency exchange rates, using maximum likelihood for the fitting. Data between June 2014 and May 2017 was used. The goodness of the in sample fit was evaluated in terms of five different criteria: AIC (Akaike information criterion), BIC (Bayesian information criterion), CAIC (consistent Akaike information criterion), AICc (corrected Akaike information criterion) and HQC (Hannan-Quinn criterion), where smaller values indicate a better fit. Eight different distributions (normal, skew normal, student t, skew student t, skew generalized error, normal inverse Gaussian, generalized hyperbolic and Johnson's SU distribution) of the innovation process were examined, with the normal distribution giving the lowest value of all criteria for each GARCH type model and each currency. Two exceptions were found, the first using TGARCH(1,1) applied to Ripple and the second using AVGARCH(1,1), also applied to Ripple, where an innovation process following the skew normal distribution gave the best fit. Among the twelve best fitting models for each currency, the IGARCH(1,1) model with normally distributed innovations resulted in the lowest values of the criteria for five currencies, including Bitcoin. For Ripple, the GARCH(1,1) model with normally distributed innovations resulted in the lowest values of the criteria, and for Dogecoin, the GJRGARCH(1,1) model with normal innovations. In the study Volatility estimation of Bitcoin: A comparison of GARCH models, Katsiampa (2017) also investigates the in sample fit of six different GARCH models (GARCH, EGARCH, TGARCH, APGARCH, CGARCH and ACGARCH) on Bitcoin log returns. Optimal models were likewise chosen according to the three information criteria AIC, BIC and HQC. The results showed that the AR-CGARCH model gave the optimal in sample fit.


Chapter 2

Methodology and Data

This chapter covers the four volatility models introduced in the introduction. The models will be applied to three different time series, namely Bitcoin, Ethereum and Ripple. In order to achieve the presented aims, the so called in sample fit and out of sample forecast will be defined and motivated. Furthermore, some characteristics of the data sets will be illustrated.

2.1 Methodology

The set of data points will be divided into two subsets, referred to as the in sample subset and the out of sample subset. We define the complete set as consisting of S data points: $p_1, p_2, ..., p_S$. The in sample subset then consists of $p_1, p_2, ..., p_n$, and the out of sample subset of $p_{n+1}, p_{n+2}, ..., p_S$. The in sample subset is evaluated via AIC and BIC, where two models (the best model according to each criterion) from each model family (i.e. ARCH, GARCH, EGARCH and GJR-GARCH) are selected. For the ARCH family, models of order q = 1, ..., 3 are evaluated. For the GARCH models, the orders are fixed to p = 1 and q = 1, but the built-in ARMA model is tested for combinations of p = 0, ..., 3 and q = 0, ..., 3, which is also the case for the ARCH models. This procedure is repeated for four different error distributions, resulting in up to 72 models qualifying for the out of sample evaluation. For the out of sample forecast, the following iteration scheme is used:

1. Set the initial forecast origin to T = n. Fit each of the models to the in sample data $p_1, p_2, ..., p_n$. Select the best models in terms of AIC and BIC for the out of sample analysis.

2. Let h be the maximum forecast horizon, i.e. the last point of the initial out of sample subset. Calculate the one-step to h-step forecasts using the best AIC/BIC fitted models.

3. Compute all forecast errors for each model and each forecast step according to the definition below, where the actual volatility is determined by a volatility proxy.

4. When this is done, increase the origin by one step, T = T + 1, and repeat the process. The iteration continues until the forecast origin T equals the last point S of the original out of sample subset.

In order to evaluate the one-step to h-step forecasts, some kind of loss function must be introduced (see chapter 3). In the forecast scheme described above, the estimation sample grows as the forecast origin increases, which implies that all forecasts are always based on all available information. Even though the available information increases, the models are not re-fitted in this study, in order to limit the computational burden; they are fitted only once, to the original in sample subset. This study only includes the one day ahead forecast, since different forecasting horizons exceed the scope of this study. A rolling evaluation of this kind can be sketched as below.
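The following is a minimal sketch of the rolling one-day-ahead scheme for a GARCH(1,1), assuming the parameters have already been fitted once on the in sample subset; the parameter values and the simulated return series are placeholders, not estimates or data from the thesis.

```python
import numpy as np

def rolling_one_day_forecasts(returns, n, alpha0, alpha1, beta1):
    """One day ahead GARCH(1,1) variance forecasts for the out of sample days.

    The parameters are assumed fitted once on returns[:n]; the forecast
    origin then rolls forward one day at a time, and the variance
    recursion is updated with each newly observed return, without re-fitting."""
    z = returns - returns[:n].mean()      # demeaned returns as innovations
    h2 = np.empty(len(returns))
    h2[0] = z[:n].var()                   # start the recursion at the sample variance
    for t in range(1, len(returns)):
        h2[t] = alpha0 + alpha1 * z[t - 1]**2 + beta1 * h2[t - 1]
    return h2[n:]                         # one day ahead forecasts for days n, ..., S-1

# toy usage: simulated returns and placeholder parameter values
rng = np.random.default_rng(0)
r = rng.normal(0, 0.04, size=1095)        # roughly 3 years of daily log returns
print(rolling_one_day_forecasts(r, n=730, alpha0=1e-5, alpha1=0.1, beta1=0.85)[:5])
```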

2.2 Data

Data used in this study is obtained from Coinmarketcap; the in sample subset contains the daily opening prices from January 2016 until January 2018, and the out of sample subset those from January 2018 to January 2019, for the three biggest cryptocurrencies by market capitalization: Bitcoin, Ethereum and Ripple. According to Tsay (2008), a reasonable choice is to set the in sample subset to 2/3 of the total set and the out of sample subset to the remaining 1/3. When computing the volatility proxy, opening prices as well as daily high and low prices are used, see chapter 3. A sketch of such a split is given below.
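A sketch of the corresponding split, assuming a pandas DataFrame with `open`, `high` and `low` columns indexed by date; the file name and column names are hypothetical, not the thesis's actual data files.

```python
import numpy as np
import pandas as pd

# hypothetical CSV with daily open/high/low prices indexed by date
prices = pd.read_csv("btc_usd.csv", index_col="date", parse_dates=True)

in_sample = prices.loc["2016-01-01":"2017-12-31"]   # roughly 2/3 of the data
out_sample = prices.loc["2018-01-01":"2018-12-31"]  # remaining 1/3

# daily log returns of the opening price, used for model fitting
log_ret = np.log(in_sample["open"]).diff().dropna()
print(len(in_sample), len(out_sample))
```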

In order to illustrate some characteristics of the data, different plots are used. Figure 2.1 shows the daily log returns of Bitcoin, which visually appear stationary with a mean around zero, and also mean reverting, since the volatility moves within some range rather than diverging. Clustering of volatility is also spotted, with both relatively calm and turbulent periods, which is one of the key characteristics of asset return volatility.

Figure 2.1: Log returns of Bitcoin.

This is also seen when inspecting the daily squared log returns in Figure 2.2, where relatively turbulent periods are followed by calmer periods and vice versa.

Figure 2.2: Squared log returns of Bitcoin.


Furthermore, the empirical distribution of the daily log returns is investigated. Figure 2.3 shows a QQ plot for Bitcoin, with the empirical distribution of the daily log returns plotted against the best fitted normal distribution.

Figure 2.3: QQ plot for Bitcoin plotted against the optimal normal distribution.


Clearly, the empirical distribution of the daily log returns has heavier tails than the best fitted normal distribution, suggesting that another reference distribution might be more appropriate. Figure 2.4 illustrates the empirical distribution of the daily log returns plotted against the best fitted generalized hyperbolic distribution, showing a much better fit in the tails.

Figure 2.4: QQ plot for Bitcoin plotted against the optimal generalized hyperbolic distribution.


The characteristics of Ethereum and Ripple are summarized in Appendix B, which contains the corresponding plots as for Bitcoin. The characteristics of Ethereum and Ripple follow the above description of Bitcoin, with mean reversion, volatility clustering and heavy tails identified.


Chapter 3

Mathematical Theory

In this chapter we introduce some basic ideas of time series analysis and conditional variance models. As mentioned in the introduction, volatilities of asset returns have been shown to be predictable and time-varying. Engle and Patton (2001) mention three important characteristics that should be considered. Firstly, volatility of returns shows persistence, meaning that volatility shocks today will influence volatility expectations many periods into the future. Secondly, regardless of its persistence, volatility is still mean reverting in the long run, meaning that after longer periods of higher or lower volatility, a correction towards the mean will occur. Thirdly, volatility is asymmetric, meaning that negative shocks have greater impact than positive shocks. When modelling and forecasting volatility, theory suggests that the more of these characteristics are incorporated, the better the description of the conditional variance. Thus, the properties of each model presented below will be evaluated accordingly.

3.1 Stationarity and Financial Data

A key role in time series analysis is played by processes whose properties, or some of them, do not vary with time. If we wish to make predictions, then clearly we must assume that something does not vary with time. In extrapolating deterministic functions it is common practice to assume that either the function itself or one of its derivatives is constant. The assumption of a constant first derivative leads to linear extrapolation as a means of prediction. In time series analysis our goal is to predict a series that typically is not deterministic but contains a random component.

The closing price $P_t$ on a trading day t of a particular underlying asset generally appears to be non-stationary. On the other hand, the log price of the underlying, $Y_t := \log(P_t)$, has observed sample paths similar to those of a random walk with stationary and uncorrelated increments, leading to the log return of the underlying

$$R_t = Y_t - Y_{t-1} = \log(P_t) - \log(P_{t-1}) = \log\!\left(\frac{P_t}{P_{t-1}}\right). \qquad (3.1)$$

Furthermore, log returns have sample paths comparable to white noise. Despite this comparison, strong evidence suggests that the sequence $R_t$ is not independent (Brockwell & Davis, 2016).

A model that is not stationary will not be mean reverting, which means that some shocks can potentially make the model diverge. The data the models are built upon is also often transformed to be stationary. Several hypothesis tests have been developed to check whether a time series is stationary, for example the Dickey-Fuller test. In this thesis the log returns have been tested with the augmented Dickey-Fuller test (see Appendix B) and confirmed stationary. The detailed results are, however, omitted from the thesis. A minimal check of this kind is sketched below.
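As an illustration (the thesis does not state which software it used), the sketch below computes log returns as in equation (3.1) and runs the augmented Dickey-Fuller test from statsmodels on a simulated price path, not the thesis data.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(1)
prices = 400 * np.exp(np.cumsum(rng.normal(0, 0.03, size=730)))  # toy price path

log_returns = np.diff(np.log(prices))        # R_t = log(P_t) - log(P_{t-1})

stat, pvalue, *_ = adfuller(log_returns)     # H0: the series has a unit root
print(f"ADF statistic: {stat:.2f}, p-value: {pvalue:.4f}")
# a small p-value rejects the unit root, consistent with stationary log returns
```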

3.2 Conditional mean

Conditional mean models are the most fundamental models used in time series analysis. The ARMA(p,q) model uses past observations and innovations to form a simple relationship with today's data:

$$R_t = \mu + \phi_1 R_{t-1} + \phi_2 R_{t-2} + \dots + \phi_p R_{t-p} + Z_t + \theta_1 Z_{t-1} + \dots + \theta_q Z_{t-q}, \qquad (3.2)$$

where $Z_t$ is a $WN(0, \sigma^2)$ process; the conditional mean $E(R_t \mid \mathcal{F}_{t-1})$ is then the right hand side without the innovation term $Z_t$.

The main benefit of the ARMA(p,q) model is that good models can be created from the dataset alone, without the use of other explanatory variables. In addition, the models are more robust to changes in the dataset's behaviour compared to traditional regression models.

Financial time series often experience periods of low or extremely high volatility, which is not captured by the ARMA(p,q) model. Therefore, this report combines the ARMA(p,q) model with each of the heteroscedasticity models explained below, to see how different orders of p and q are able to capture the volatility of the data.

3.3 Conditional variance

The conditional variance $Var(R_t \mid \mathcal{F}_{t-1}) = E((R_t - \mu_t)^2 \mid \mathcal{F}_{t-1}) = \sigma_t^2$ is modelled in a similar way as the conditional expectation. When combining ARMA with conditional variance models, $\mu_t$ in the equation above is considered to follow an ARMA(p,q) model. There are several models that incorporate different features, such as giving larger volatility impact to negative or positive innovations.

3.3.1 Autoregressive Conditional Heteroscedasticity model (ARCH)

Heteroscedasticity means that the variance of the data changes over time. This effect is very prominent in financial data, as the economic climate can change very rapidly. This can lead to periods where prices rapidly grow or fall, resulting in very high volatility. Traditional ARMA(p,q) models assume that the variance is constant over time, which can lead to under- or overestimation when forecasting with the models. To combat this, the ARCH/GARCH models were developed, which model the heteroscedasticity and lead to iid squared residuals.

The ARCH model introduced by Engle (1982) revolutionized the modelling of conditional heteroscedasticity in volatility. The model itself is relatively simple compared to other models but requires many parameters to describe the volatility. The ARCH(q) process $Z_t$ is defined as a stationary solution of the equations

$$Z_t = h_t e_t, \qquad e_t \sim IID(0,1), \qquad (3.3)$$

where $h_t$ is the function of $Z_s$, $s < t$, defined by

$$h_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i Z_{t-i}^2, \qquad (3.4)$$

with $\alpha_0 > 0$ and $\alpha_j \geq 0$, $j = 1, ..., q$, implying a positive conditional variance, where q is the order of the model. For ARCH models in general, the order can be obtained from the sample partial autocorrelation function (sample PACF) of squared returns, where lags greater than q should be close to zero. This usually results in high orders, enabling accurate modelling of the conditional variance. Over time, it has been shown that higher order models barely outperform lower orders in terms of out of sample volatility forecasts (Bollerslev et al., 1992). As a consequence, the order is limited to three in this study, which is also favourable in terms of modelling complexity. Furthermore, the ARCH model manages to take volatility clustering into account, which can be identified in the definition of the conditional variance, where large $Z_{t-i}$ implies large $h_t^2$. As a consequence, large shocks will be followed by large shocks and small shocks by small shocks. Even though volatility clustering is included in the model, the previously mentioned asymmetry effects are not; this is easily seen since the model only takes squared shocks as input. Another drawback is that the model is likely to overpredict the volatility, due to the restrictive parameter intervals the model imposes in order to have finite fourth moments. A small simulation illustrating the clustering property is sketched below.
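A minimal sketch simulating an ARCH(1) process from equations (3.3)-(3.4), with arbitrary illustrative parameter values; large shocks visibly raise the next day's conditional variance, which is the clustering mechanism described above.

```python
import numpy as np

def simulate_arch1(alpha0, alpha1, n, seed=0):
    """Simulate Z_t = h_t e_t with h_t^2 = alpha0 + alpha1 * Z_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = np.zeros(n)
    h2 = np.zeros(n)
    h2[0] = alpha0 / (1 - alpha1)            # unconditional variance as start
    z[0] = np.sqrt(h2[0]) * rng.standard_normal()
    for t in range(1, n):
        h2[t] = alpha0 + alpha1 * z[t - 1]**2
        z[t] = np.sqrt(h2[t]) * rng.standard_normal()
    return z, h2

z, h2 = simulate_arch1(alpha0=1e-4, alpha1=0.7, n=1000)
# days following the largest shocks have clearly elevated conditional variance
big = np.argsort(np.abs(z))[-5:]
print(h2[np.minimum(big + 1, len(h2) - 1)] / h2.mean())
```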

Consider an ARCH(q) model with a j-step forecast of $h_{k+j}^2$, which yields

$$h_k^2(j) = \alpha_0 + \sum_{i=1}^{q} \alpha_i h_k^2(j-i), \qquad (3.5)$$

where, for $j - i \leq 0$, $h_k^2(j-i)$ denotes the already observed squared innovation $Z_{k+j-i}^2$. This study will use the one-step forecast for ARCH(q) models with a maximum order q of three, which yields the following one-step forecast:

$$h_k^2(1) = \alpha_0 + \alpha_1 h_k^2(0) + \alpha_2 h_k^2(-1) + \alpha_3 h_k^2(-2). \qquad (3.6)$$

3.3.2 General Autoregressive Conditional Heteroscedasticity model (GARCH)

The GARCH(p,q) process introduced by Bollerslev (1986) is a generalization of the ARCH process described above. With similar properties, the GARCH process contains a modified variance equation defined according to

$$h_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i Z_{t-i}^2 + \sum_{j=1}^{q} \beta_j h_{t-j}^2, \qquad (3.7)$$

with $\alpha_0 > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$ and $\sum_{i=1}^{\max(p,q)} (\alpha_i + \beta_i) < 1$. $Z_t$ is, as in the ARCH model, defined by

$$Z_t = h_t e_t, \qquad e_t \sim IID(0,1). \qquad (3.8)$$

Furthermore, the GARCH model requires fewer parameters for modelling the volatility. This is easily motivated by comparing equations 3.4 and 3.7, where the additional term containing lagged conditional variances $h_{t-j}^2$ reduces the number of lagged squared returns $Z_{t-i}^2$ needed compared to the ARCH model. Similarly to the ARCH model, the GARCH model does not take asymmetry in volatility clustering into account and also requires parameter restrictions in order to have a finite fourth moment. Consider a GARCH(1,1) model with a j-step forecast of $h_{k+j}^2$, which yields

$$h_k^2(j) = \alpha_0 + (\alpha_1 + \beta_1) h_k^2(j-1), \qquad (3.9)$$

where $j > 1$ and k is the forecast origin. A more explicit expression is obtained by substituting $h_k^2(j-1)$ iteratively, which yields

$$h_k^2(j) = \frac{\alpha_0 \left(1 - (\alpha_1 + \beta_1)^{j-1}\right)}{1 - \alpha_1 - \beta_1} + (\alpha_1 + \beta_1)^{j-1} h_k^2(1). \qquad (3.10)$$

A sketch of this recursion in code follows.
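A minimal sketch of the j-step forecast, implementing both the recursion (3.9) and its closed form (3.10) with placeholder parameter values; the two should agree, and the forecast mean reverts as j grows.

```python
def garch11_forecast_recursive(h2_1, alpha0, alpha1, beta1, j):
    """Iterate equation (3.9) starting from the one-step forecast h2_1."""
    h2 = h2_1
    for _ in range(j - 1):
        h2 = alpha0 + (alpha1 + beta1) * h2
    return h2

def garch11_forecast_closed(h2_1, alpha0, alpha1, beta1, j):
    """Closed form (3.10) of the same forecast."""
    phi = alpha1 + beta1
    return alpha0 * (1 - phi**(j - 1)) / (1 - phi) + phi**(j - 1) * h2_1

a0, a1, b1, h2_1 = 1e-5, 0.10, 0.85, 2e-4    # placeholder parameters
for j in (1, 5, 20):
    print(j, garch11_forecast_recursive(h2_1, a0, a1, b1, j),
             garch11_forecast_closed(h2_1, a0, a1, b1, j))
# as j grows, the forecast approaches alpha0 / (1 - alpha1 - beta1): mean reversion
```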

3.3.3 Exponential General Autoregressive Conditional Heteroscedasticity model (EGARCH)

The EGARCH model was introduced by Nelson (1991) as a development of the previous GARCH model. Unlike ARCH and GARCH, this model introduced the ability to distinguish the impact of positive and negative shocks in volatility clustering. As for ARCH and GARCH, $Z_t$ is defined according to

$$Z_t = h_t e_t, \qquad e_t \sim IID(0,1), \qquad (3.11)$$

but with conditional variance defined according to

$$\log(h_t^2) = \alpha_0 + \sum_{i=1}^{p} \left( \alpha_i Z_{t-i} + \gamma_i \left( |Z_{t-i}| - E|Z_{t-i}| \right) \right) + \sum_{j=1}^{q} \beta_j \log(h_{t-j}^2). \qquad (3.12)$$

In contrast to ARCH and GARCH, no parameter restrictions are needed to avoid negative conditional variance. In order to highlight the asymmetry properties, a function $f(Z_t)$ is introduced where the magnitude effect ($\gamma_1$) and the asymmetry effect ($\alpha_1$) of an EGARCH(1,1) model are presented according to

$$f(Z_t) = \alpha_1 Z_t + \gamma_1 \left( |Z_t| - E|Z_t| \right), \qquad (3.13)$$

where $f(Z_t)$ is uncorrelated and has zero mean due to the properties of $Z_t$. Thus, equation (3.13) can be rewritten as

$$f(Z_t) = (\alpha_1 + \gamma_1) Z_t \mathbb{1}(Z_t > 0) + (\alpha_1 - \gamma_1) Z_t \mathbb{1}(Z_t < 0) - \gamma_1 E|Z_t|. \qquad (3.14)$$

The impact of positive and negative asset return shocks is now easily seen: positive shocks have impact $\alpha_1 + \gamma_1$ and negative shocks $\alpha_1 - \gamma_1$. For the asymmetric effect, a negative $\alpha_1$ implies greater impact from negative than from positive shocks. Thus, EGARCH is able to model volatility persistence, mean reversion and, unlike ARCH and GARCH, asymmetric effects. The sketch below illustrates this news impact asymmetry numerically.
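The asymmetry in equation (3.14) can be made concrete by evaluating the impact of equally sized positive and negative shocks; the parameter values below are illustrative only, not estimates from the thesis.

```python
import math

def news_impact(z, alpha1, gamma1, e_abs_z):
    """Contribution f(Z_t) of a shock z to log(h_t^2), equation (3.14)."""
    if z > 0:
        return (alpha1 + gamma1) * z - gamma1 * e_abs_z
    return (alpha1 - gamma1) * z - gamma1 * e_abs_z

e_abs = math.sqrt(2 / math.pi)   # E|Z| for standard normal innovations
a1, g1 = -0.05, 0.20             # illustrative values with alpha1 < 0

pos = news_impact(+1.0, a1, g1, e_abs)   # slope alpha1 + gamma1 = 0.15
neg = news_impact(-1.0, a1, g1, e_abs)   # slope alpha1 - gamma1 = -0.25
print(pos, neg)
# the equally sized negative shock raises log(h^2) more than the positive one,
# which is exactly the asymmetry a negative alpha1 encodes
```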

3.3.4 GJR General Autoregressive Conditional Heteroscedasticity model (GJR-GARCH)

Glosten, Jagannathan and Runkle (1993) presented the GJR-GARCH model, with similar properties as the previously explained EGARCH model. Besides being able to model volatility persistence and mean reversion, GJR-GARCH also models the asymmetric effects. $Z_t$ is, as for the other models, defined according to equation (3.3). The conditional variance is now defined according to

$$h_t^2 = \alpha_0 + \sum_{i=1}^{p} \left( \alpha_i Z_{t-i}^2 \left(1 - \mathbb{1}(Z_{t-i} > 0)\right) + \gamma_i Z_{t-i}^2 \mathbb{1}(Z_{t-i} > 0) \right) + \sum_{j=1}^{q} \beta_j h_{t-j}^2, \qquad (3.15)$$

with parameters $\alpha_0 > 0$, $\alpha_i \geq 0$, $\beta_j \geq 0$ and $\gamma_i \geq 0$, which guarantee a non-negative conditional variance. In order to highlight the asymmetry properties, a function $f(Z_t)$ is introduced where the magnitude effect ($\gamma_1$) and the asymmetry effect ($\alpha_1$) of a GJR-GARCH(1,1) model are presented according to

$$f(Z_t) = \alpha_1 Z_t + \gamma_1 \left( |Z_t| - E|Z_t| \right). \qquad (3.16)$$

Using the same reasoning as for the EGARCH model, $f(Z_t)$ can be written as

$$f(Z_t) = (\alpha_1 + \gamma_1) Z_t \mathbb{1}(Z_t > 0) + (\alpha_1 - \gamma_1) Z_t \mathbb{1}(Z_t < 0) - \gamma_1 E|Z_t|. \qquad (3.17)$$

Thus, in equation (3.15), negative shocks have an impact $\alpha_1$ while positive shocks have an impact $\gamma_1$. Consider a GJR-GARCH(1,1) model with a j-step forecast of $h_{k+j}^2$, which yields

$$h_k^2(j) = \alpha_0 + \left( \alpha_1 + \frac{\gamma_1}{2} + \beta_1 \right) h_k^2(j-1), \qquad (3.18)$$

where k is the forecast origin. A more explicit expression is obtained by substituting $h_k^2(j-1)$ iteratively, which yields

$$h_k^2(j) = \alpha_0 \sum_{i=0}^{j-2} \left( \alpha_1 + \frac{\gamma_1}{2} + \beta_1 \right)^i + \left( \alpha_1 + \frac{\gamma_1}{2} + \beta_1 \right)^{j-1} h_k^2(1). \qquad (3.19)$$

3.4 Error distributions

In all models used in this thesis, an error distribution must be set in order to fully define each model. The error term $e_t$, regardless of which distribution it follows, should have certain properties, such as being independent and identically distributed with unit variance and zero mean. In this thesis, four different error distributions are specified and used in the modelling. As previously shown in the descriptive data and in Appendix B, all three time series display relatively heavy tails compared to the best fitted normal distribution, which motivates considering another parametric family, modelling heavier tails, as a reference distribution instead. As a consequence, three distributions beyond the normal distribution are used: the student t distribution, the generalized error distribution and the generalized hyperbolic distribution. The density functions of the distributions are defined below.

Normal distribution density function:

$$f(z) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(z-\mu)^2}{2\sigma^2}}, \qquad -\infty < z < \infty. \qquad (3.20)$$

Student t distribution density function:

$$f(z) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\Gamma\left(\frac{v}{2}\right)\sqrt{v\pi}} \left(1 + \frac{z^2}{v}\right)^{-\frac{v+1}{2}}, \qquad -\infty < z < \infty. \qquad (3.21)$$

Generalized error distribution density function:

$$f(z) = \frac{\lambda s}{2\Gamma(1/s)} e^{-\lambda^s |z-\mu|^s}, \qquad -\infty < z < \infty. \qquad (3.22)$$

Generalized hyperbolic distribution density function:

$$f(z) = \frac{(\gamma/\delta)^{\lambda}}{\sqrt{2\pi}\, K_{\lambda}(\delta\gamma)}\, e^{\beta(z-\mu)}\, \frac{K_{\lambda-1/2}\!\left(\alpha\sqrt{\delta^2 + (z-\mu)^2}\right)}{\left(\sqrt{\delta^2 + (z-\mu)^2}\,/\,\alpha\right)^{1/2-\lambda}}, \qquad -\infty < z < \infty, \qquad (3.23)$$

where $\gamma = \sqrt{\alpha^2 - \beta^2}$ and $K_{\lambda}$ is the modified Bessel function of the second kind.
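As an aside, all four densities are available in SciPy (`norm`, `t`, `gennorm` for the generalized error distribution, and `genhyperbolic` in recent SciPy versions), which makes tail comparisons straightforward; the shape parameter values below are arbitrary, and this is not the thesis's own implementation, whose software is not stated.

```python
import numpy as np
from scipy import stats

z = np.array([0.0, 2.0, 4.0, 6.0])            # points increasingly far in the tail

pdfs = {
    "normal":      stats.norm.pdf(z),
    "student t":   stats.t.pdf(z, df=4),
    "gen. error":  stats.gennorm.pdf(z, beta=1.2),
    "gen. hyper.": stats.genhyperbolic.pdf(z, p=1.0, a=1.5, b=0.0),
}
for name, vals in pdfs.items():
    print(f"{name:12s}", np.round(vals, 6))
# the heavy tailed alternatives put visibly more mass at |z| = 4 and 6
# than the normal density, matching the QQ plots in chapter 2
```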

3.5 Fitting and evaluation of in sample models

When fitting all mentioned models to the in sample data, all parameters are estimated for each model. The fitting is done using maximum likelihood estimation, which gives the parameter values for each model and each distribution. Once the models are obtained, each one is evaluated according to AIC and BIC.

Maximum likelihood estimation is a method of estimating the parameters of a statistical model so that the observed data is most probable. The parameters are obtained by maximizing the likelihood function $L(\theta \mid z_1, z_2, ..., z_n)$, where $\theta$ contains the set of parameters being estimated. The likelihood function can be described as the joint probability of the observed data $z_1, z_2, ..., z_n$, in this case the in sample subset, over the parameter space. The function is defined according to

$$L(\theta \mid \mathcal{F}_{n-1}) = \prod_{t=1}^{n} \phi(Z_t \mid \mathcal{F}_{t-1}), \qquad (3.24)$$

where $\mathcal{F}$ denotes the available information at a specific time and $\phi$ the density function of a certain distribution. In order to limit the computational burden, the logarithm of the likelihood function is used, since maximizing it is equivalent to maximizing the original likelihood function:

$$\log L(\theta \mid \mathcal{F}_{n-1}) = \sum_{t=1}^{n} \log \phi(Z_t \mid \mathcal{F}_{t-1}). \qquad (3.25)$$

As suggested in the literature review, two of the most used information criteria will be used: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). These criteria enable a comparison of the models' fit to the in sample data. Both combine the maximum likelihood with a penalty based on the complexity of the model:

$$AIC = -2\log L(\theta) + 2k, \qquad (3.26)$$

where k is the number of parameters and $\log L(\theta)$ the maximized log likelihood function. For BIC, an additional quantity N is introduced, equal to the number of data points in the in sample data:

$$BIC = -2\log L(\theta) + k\log(N). \qquad (3.27)$$

The smaller the value for a model, the better the fit to the in sample data, including the penalties for the number of parameters and the number of data points respectively.

To draw inference from the models, they should be stationary, as a non-stationary process can diverge and reach infinite variance. The models should also have iid squared residuals; therefore an augmented Dickey-Fuller test is performed on the residuals, as well as a weighted Ljung-Box test. A sketch of the likelihood-based fitting and criteria computation follows.
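A condensed sketch of this fitting step for a GARCH(1,1) with normal innovations only: the negative Gaussian log likelihood is built from the variance recursion and minimized numerically, after which AIC and BIC follow from equations (3.26)-(3.27). Starting values, bounds and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(params, z):
    """Negative Gaussian log likelihood of a GARCH(1,1) for demeaned returns z."""
    alpha0, alpha1, beta1 = params
    h2 = np.empty_like(z)
    h2[0] = z.var()                            # initialize the variance recursion
    for t in range(1, len(z)):
        h2[t] = alpha0 + alpha1 * z[t - 1]**2 + beta1 * h2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(h2) + z**2 / h2)

rng = np.random.default_rng(2)
z = rng.normal(0, 0.04, size=730)              # toy demeaned return series

res = minimize(garch11_neg_loglik, x0=[1e-5, 0.1, 0.8], args=(z,),
               bounds=[(1e-8, None), (0, 1), (0, 1)])
k, n = 3, len(z)
aic = 2 * res.fun + 2 * k                      # -2 log L + 2k, equation (3.26)
bic = 2 * res.fun + k * np.log(n)              # -2 log L + k log N, equation (3.27)
print(res.x, aic, bic)
```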

3.6 Evaluation of out of sample models

When all models have been determined, in terms of order and error distribution, and fitted to the in sample data, and the one day forecasts have been obtained for the whole out of sample period, the evaluation of these results remains. However, evaluating the performance of forecasted volatilities is far from trivial or standardized. The main reason is that conditional volatility itself is unobservable. As a consequence, a proxy for the actual volatility must be determined as a reference when comparing the forecasted volatilities. This section highlights the complexity of volatility and discusses different proxies considered, ending with a specification of the proxy used.

It can be shown that the squared return is an unbiased estimator of the volatility; however, this proxy can be very noisy, which for a long time led statisticians to believe that GARCH models gave poor volatility predictions (Andersen and Bollerslev, 1998). Andersen and Bollerslev later found that, depending on the proxy used, GARCH models actually forecast volatility well. Since then, several volatility proxies have been developed. The Parkinson estimator and the Garman-Klass estimator are two of the more famous proxies. The former assumes continuous trading and uses daily highs and daily lows to estimate the volatility. The Garman-Klass estimator also includes opening and closing prices to provide a more accurate measure (Santander, 2012). There are also other measures which account for opening gaps and drifts in the data. However, since the cryptocurrency markets are open around the clock, we believe that the Parkinson estimator is well suited for cryptocurrencies. In addition, even though the Parkinson estimator is not the best estimator in simulations, some studies have shown that it might be the best estimator on empirical data (Bennet & Gil, 2012). Figure 3.1 shows the daily squared returns and the Parkinson estimator applied to Ethereum. Most of the time, the Parkinson estimator is less noisy, as mentioned in several articles. The estimator is defined as

$$\sigma_{parkinson}^2 = \frac{\ln\!\left(\frac{high}{low}\right)^2}{4\ln(2)}. \qquad (3.28)$$
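Equation (3.28) translates directly into code; the sketch below assumes `high` and `low` are arrays of daily high and low prices, with placeholder values.

```python
import numpy as np

def parkinson_proxy(high, low):
    """Daily Parkinson variance proxy, equation (3.28)."""
    return np.log(np.asarray(high) / np.asarray(low))**2 / (4 * np.log(2))

# toy usage: a day ranging from 95 to 105 versus a quieter day
print(parkinson_proxy([105, 101], [95, 100]))
```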

Figure 3.1: Parkinson volatility estimator (red) vs squared returns (black).

When the proxy has been calculated, the models are evaluated using some of the loss functions discussed in Bollerslev, Engle and Nelson (1993). In this study, the following loss functions are used to evaluate the forecasts of the different models:

$$MSE = \frac{1}{n} \sum_{t=1}^{n} \left( \sigma_{parkinson,t}^2 - \sigma_{pred,t}^2 \right)^2, \qquad (3.29)$$

$$R2LOG = \frac{1}{n} \sum_{t=1}^{n} \left( \log\!\left( \frac{\sigma_{parkinson,t}^2}{\sigma_{pred,t}^2} \right) \right)^2, \qquad (3.30)$$

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| \sigma_{parkinson,t}^2 - \sigma_{pred,t}^2 \right|. \qquad (3.31)$$

When choosing a loss function, the optimal one is the one that penalizes results which counter the goal of the model. For example, as mentioned in Bollerslev, Engle and Nelson (1993), MSE might not be the best measure, since it is fully symmetric and does not penalize zero or negative variance. R2LOG exaggerates values close to zero and becomes large if forecasts close to zero are wrong. MAE is similar to MSE in many cases, but does not penalize large errors as much. West et al. (1983) used a utility based measure which highlights the goal of the model, in that case to maximize risk adjusted returns, which should be the main factor when choosing a loss function. Since our goal is to study the behaviour of the data, we chose the three loss functions above. They can be implemented in a few lines, as sketched below.
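The three loss functions (3.29)-(3.31) in code, evaluated against the Parkinson proxy; the arrays are placeholders for proxy and forecast series of equal length.

```python
import numpy as np

def mse(proxy, pred):
    return np.mean((proxy - pred)**2)                  # equation (3.29)

def r2log(proxy, pred):
    return np.mean(np.log(proxy / pred)**2)            # equation (3.30)

def mae(proxy, pred):
    return np.mean(np.abs(proxy - pred))               # equation (3.31)

proxy = np.array([2.1e-3, 1.4e-3, 3.0e-3])             # placeholder proxy values
pred  = np.array([1.8e-3, 1.5e-3, 2.2e-3])             # placeholder forecasts
print(mse(proxy, pred), r2log(proxy, pred), mae(proxy, pred))
```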


Chapter 4

Results

This chapter presents the results obtained for each model. First, the optimal in sample fits are presented, followed by the out of sample forecasts. In order to limit the reading burden, all AIC and BIC values are presented in Appendix A, while only the best fitting models are presented in section 4.1.

4.1 Optimal in sample fit

The optimal in sample fit is presented for each cryptocurrency, starting with Bitcoin, followed by Ethereum and Ripple. For each model, the orders of the built-in ARMA(p,q) are specified together with the corresponding AIC or BIC value.

4.1.1 Bitcoin

For the ARCH models, innovations with the student t distribution clearly yield the best fit. For the GARCH type models, the generalized error distribution yields the lowest values of the criteria, with the exception of the AIC for GARCH, where the generalized hyperbolic distribution is optimal. Furthermore, regardless of model, the normal distribution results in the poorest fit.


4.1.2 Ethereum

For all three ARCH models, innovations with the generalized hyperbolic distribution yield the best fit. For all GARCH type models, the generalized hyperbolic distribution yields the lowest AIC and the generalized error distribution the lowest BIC.


4.1.3 Ripple

As for Bitcoin, the best fit for the ARCH models is obtained using the student t distribution. In contrast to both Bitcoin and Ethereum, the student t distribution also yields the lowest BIC for the EGARCH model and the lowest AIC for the GJR-GARCH model. For the GARCH type models, the generalized hyperbolic distribution also yields a good fit.


4.2 Out of sample forecast

The best fitted models (highlighted in green in section 4.1) are used to forecast the volatility. In total, 24 models qualified. All of them are evaluated using three criteria, namely MSE, MAE and R2LOG, as described in section 3.6. The tables for each currency show the performance of each model for each loss function. The best model for each loss function and each currency is highlighted. Furthermore, each of the highlighted models is illustrated by plotting the resulting forecast (red curve) against the volatility proxy (black curve).


4.2.1 Bitcoin

Figure 4.1: Bitcoin forecast performances for used loss functions.


4.2.2 Ethereum

Figure 4.2: Ethereum forecast performances for used loss functions.


4.2.3 Ripple

Figure 4.3: Ripple forecast performances for used loss functions.


4.2.4 Best performing forecasts for Bitcoin

For Bitcoin, the GARCH(1,1)-ARMA(0,1) with generalized hyperbolic innovations resulted in the best forecast according to the MSE loss function. For R2LOG, the GJR-GARCH(1,1)-ARMA(0,1) with student t distributed innovations resulted in the best forecast, and for MAE, the GJR-GARCH(1,1)-ARMA(0,1) with generalized hyperbolic innovations. Below, the forecasts from all these models are plotted against the realized volatility proxy during the out of sample period.

Figure 4.4: Optimal forecast with MSE as loss function plotted against the proxy.


Figure 4.5: Optimal forecast with R2LOG as loss function plotted against the proxy.


Figure 4.6: Optimal forecast with MAE as loss function plotted against the proxy.


4.2.5 Best performing forecasts for Ethereum

For Ethereum, the GARCH(1,1)-ARMA(1,2) with generalized hyperbolic innovations resulted in the best performing forecast across all loss functions. Below, the obtained forecast is plotted against the realized volatility proxy.

Figure 4.7: Optimal forecast for all loss functions plotted against the proxy.


4.2.6 Best performing forecasts for Ripple

For Ripple, the GARCH(1,1)-ARMA(2,2) with generalized hyperbolic innovations resulted in the best forecast according to the MSE loss function, whereas the GARCH(1,1)-ARMA(1,0) with generalized error distributed innovations resulted in the best forecast according to the R2LOG and MAE loss functions. Below, the forecasts from both models are plotted against the realized volatility proxy during the out of sample period.

Figure 4.8: Optimal forecast with MSE as loss function plotted against the proxy.


Figure 4.9: Optimal forecast with MAE and R2LOG as loss functions plotted against the proxy.


Chapter 5

Analysis & Discussion

In this chapter, theory is used to analyze and discuss the obtained results for each cryptocurrency. The content follows the presented themes and research questions, laying the foundation for the upcoming conclusion.

The first theme of this thesis is to investigate the structure of the modelling framework, where four different error distributions are included. Observing the in sample results, it is quite clear that the normal distribution results in a relatively poor fit compared to the others. As shown in the introduction, the normal distribution can be questioned when heavier tails are present. When fitting the different ARCH models, it is evident that the student t distribution outperforms all other distributions for Bitcoin and Ripple, while the generalized hyperbolic distribution is optimal for Ethereum. For the GARCH type models, the two generalized distributions give rise to the best fitted models for Bitcoin and Ethereum. For Ripple, the optimal GARCH type models are obtained using the generalized error distribution or the student t distribution.

For the out of sample forecast, different models are preferred under different loss functions. This makes sense, as they punish different characteristics of the model. When evaluating with the MSE, for instance, we find that the generalized hyperbolic is the best distribution for all the cryptocurrencies. When R2LOG is used, mixed results are obtained: for Bitcoin the student t distribution is preferred, for Ripple the generalized error distribution, and for Ethereum the generalized hyperbolic distribution. When evaluating with the MAE, the generalized hyperbolic distribution is optimal for Bitcoin and Ethereum, and the generalized error distribution for Ripple. Both the student t and the generalized distributions are classified as heavy tailed distributions, as described by Hult et al. (2012). Overall, the normal distribution creates noisier models than the other distributions, both in and out of sample. The student t distribution fits reasonably well, and many such models perform well in and out of sample. The generalized distributions can take the form of a distribution that closely resembles the innovations, which in turn gives a model that better captures the data. Our results indicate that the normal distribution is not suitable for the data and leads to worse forecasts compared to more heavy tailed distributions. In contrast to the in sample results, the normal distribution was the best in terms of MSE among the ARCH models, while for the GARCH models it was consistently the worst for R2LOG and on par with the other distributions for the other loss functions.

The second theme of the thesis examines whether more advanced models perform better than less sophisticated ones. As described in the theory chapter, the differences between the models are clear. One dimension is the model family itself, where the ARCH model is the least sophisticated, while EGARCH and GJR-GARCH are the most sophisticated. The other dimension of complexity is the order of the ARMA process (the sum of p and q), where a higher sum indicates a more complex model. As can be seen in the out of sample results, a standard GARCH model is the best model for all loss functions for both Ethereum and Ripple, while GJR-GARCH is the best one for Bitcoin when evaluating with R2LOG or MAE. When evaluating Bitcoin with the MSE, the GARCH model is again the optimal choice. Looking at the ARCH models as a group, they perform worse no matter which loss function is used. As stated in the theory chapter, ARCH models sometimes require a high order to describe the data accurately. This is seen in the out of sample results, where the GARCH type models perform much better as a group. When comparing model types in sample, we find that the AIC/BIC of the ARCH models are worse than those of the GARCH/EGARCH/GJR-GARCH models for all cryptocurrencies. When comparing GARCH, EGARCH and GJR-GARCH, the AIC and BIC are very close; this, together with the out of sample results, shows that the GARCH type models are a better choice than the simpler ARCH models.

The particular GARCH type, however, does not make as much of an impact in or out of sample, but most of the time the simpler standard GARCH model performs slightly better than the more advanced GJR-GARCH and EGARCH. Furthermore, the ability to account for eventual asymmetries in volatility does not seem to give rise to better forecasts. As shown in the literature review, the volatility asymmetry of cryptocurrencies has been changing during the past years, in contrast to the stock market, where negative asymmetry is present. Since EGARCH and GJR-GARCH do not perform better than the standard GARCH, one might wonder whether asymmetry in volatility actually exists during the examined period.

Regarding the number of variables, we can see that when choosing the optimal model from the in sample data, AIC prefers more complex models while BIC chooses less complex ones. For the out of sample period, we find that a less complex model is often preferred: the highest order ARMA model used has a sum of four, while the overall median is one. This indicates that using a higher order ARMA process does not improve the model and may expose the researcher to overfitting the data.

The third and final theme of the thesis evaluates whether a better in sample fit also results in a better out of sample forecast. Comparing the in and out of sample results, it is evident that the overall best fitted model (standard GARCH) for Ethereum and Ripple also results in the best out of sample forecast when using MSE as loss function. For Ethereum, this also holds for R2LOG and MAE as loss functions, while for Ripple it does not. For Bitcoin, on the other hand, the optimal fitted models do not result in the best out of sample forecast for any of the loss functions, in contrast to Ethereum.

When evaluating the optimal in sample fit and out of sample forecast using MSE, it is observable that for specific model types separately (ARCH(1), ARCH(2), ARCH(3) etc.), the best in sample and out of sample results do not necessarily correlate. For Bitcoin and Ripple, the best in sample model generally does not perform best out of sample, whereas for Ethereum the opposite is the case: the best in sample model generally leads to the best out of sample model for at least one of AIC or BIC. When instead using R2LOG or MAE as loss function, the pattern for Bitcoin and Ripple is the same as for MSE, where the best in sample fit generally does not correspond to the best out of sample forecast. Using R2LOG and MAE as loss functions for Ethereum also yields a high correlation between good fit and good forecast.

What, then, could be the reason for an optimal in sample fit not resulting in an optimal out of sample forecast, as the results indicate for Bitcoin and Ripple? One possible explanation is that the dynamics of the volatility might have changed between the in and out of sample periods. During the in sample period, where the models are fitted, the cryptocurrency market showed extremely rising exchange rates in a typically bullish market. Even though the time period is relatively short, in total only three years, the huge drop of the whole cryptocurrency market immediately after the cut-off point between the in and out of sample periods gave rise to a bearish market during the whole out of sample period. With an environmental change of this size, it is quite likely that a change of market dynamics occurred. Thus, one could see this as a trade-off problem, where an extremely good fit might cause less flexibility towards environmental changes such as volatility shocks, leading to slow reactions.

Another thing to keep in mind is the volatility proxy. All results are computed against our proxy, the Parkinson volatility estimator, so another proxy, for example squared returns, could potentially result in a different ranking of the models. The choice of proxy is far from trivial; an alternative volatility proxy could have been the squared returns, which are unbiased. However, the squared returns proxy has also been shown to be very noisy and suboptimal in many cases. Another potential proxy could have been the more advanced Garman-Klass volatility estimator, but according to previous literature, no evidence could be found that this proxy practically outperforms the Parkinson volatility estimator.

References
