
Factors Affecting the Number of Trades in ETPs on Nordic Derivatives Exchange


Academic year: 2021


DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Factors Affecting the Number of Trades in ETPs on Nordic Derivatives Exchange

SIMON CARLSSON

ERIK ALLGÅRDH


Degree Projects in Applied Mathematics and Industrial Economics (15 hp) Degree Programme in Industrial Engineering and Management (300 hp) KTH Royal Institute of Technology year 2020

Supervisor at KTH: Mykola Shykula Examiner at KTH: Sigrid Källblad Nordin


TRITA-SCI-GRU 2020:120 MAT-K 2020:021

Royal Institute of Technology School of Engineering Sciences KTH SCI

SE-100 44 Stockholm, Sweden


Abstract

This thesis examines which factors affect the number of trades in exchange-traded products (ETPs) on Nordic Derivatives Exchange. Multiple linear regression is used to model the relationship between the number of trades and 65 initially chosen predictor variables. The predictor variables include various indices, commodities, stocks, and volatility measures.

Two models are presented, one of which includes a lagged dependent variable. These models explain 89% and 92% of the variance within the data. Foremost, the results confirm previous research indicating that volatility plays a significant role in the number of trades, and show that this also holds for ETPs. Currency exchange rates, equity indices and palladium are also shown to be statistically significant. In addition, interpretations of the results and suggestions for further research are given.

Keywords: regression analysis, trading volume, number of trades, applied mathematics, exchange-traded products, bachelor thesis, NDX


Faktorer som påverkar antalet avslut i ETP:er på Nordic Derivatives Exchange (Factors Affecting the Number of Trades in ETPs on Nordic Derivatives Exchange)

Summary (Sammanfattning)

This thesis examines which factors affect the number of trades in exchange-traded products (ETPs) on Nordic Derivatives Exchange. Multiple linear regression is used to examine the relationship between the number of trades and 65 predictor variables, chosen in advance, that we consider interesting to examine. These predictor variables include, among others, various indices, commodities, stocks, and volatility measures.

Two models are presented, one of which includes a lagged dependent variable. These two models explain 89% and 92% of the variation in the data, respectively. The results show that volatility has a significant effect on the number of trades, which confirms previous research but is now shown to hold for ETPs as well. Currency exchange rates, equity indices and palladium are shown to be significant. Furthermore, an interpretation of the results and suggestions for future research are given.

Keywords: regression analysis, trading volume, number of trades, applied mathematics, exchange-traded products, bachelor thesis, NDX


Acknowledgements

We would like to thank our supervisor in applied mathematics, Mykola Shykula, for giving us feedback and guidance. In addition, we would like to express our gratitude to Oscar Britse and Markus Ramström at NGM for providing us with data.


Contents

1 Introduction
1.1 Background
1.2 Previous Research
1.3 Purpose and Aim
1.4 Research Question

2 Financial Background
2.1 Exchange-Traded Products
2.1.1 Speculation
2.1.2 Hedging
2.2 Mechanics of Derivatives Trading
2.2.1 Derivatives Exchanges
2.2.2 Brokerage Firms
2.2.3 Issuer of Derivatives
2.2.4 Market Maker
2.3 Volatility
2.4 Number of Trades

3 Mathematical Theory
3.1 The Multiple Linear Regression Model
3.2 Ordinary Least Squares
3.2.1 Assumptions
3.2.2 Derivation
3.2.3 Properties of the OLS coefficients
3.3 Residual Analysis
3.3.1 Graphical Residual Analysis
3.4 Autocorrelation of Errors
3.4.1 Lag Plot
3.4.2 Durbin-Watson Test
3.4.3 Lagged Dependent Variable Models
3.5 Influential Observations
3.5.1 Deletion Diagnostics
3.6 Transformations
3.6.1 Box-Cox Method
3.7 Multicollinearity
3.7.1 Variance Inflation Factor
3.8 Variable Selection
3.8.1 Backward Elimination
3.8.2 Hypothesis Testing
3.9 Model adequacy

4 Methodology
4.1 Data
4.2 Software
4.3 Timeline
4.4 Delimitations
4.5 Variables
4.5.1 Initial Variable Treatments
4.5.2 Number of Trades
4.5.3 Equity Indices
4.5.4 Stocks
4.5.5 Commodities
4.5.6 Volatility
4.5.7 Currencies

5 Results
5.1 Initial Model
5.1.1 Altered Model
5.2 Residual Analysis
5.3 Autocorrelation
5.3.1 An Additional Model
5.4 Leverage and Influential Points
5.5 Transformations
5.6 Variable Selection
5.7 Multicollinearity
5.8 Final Models
5.8.1 Model A
5.8.2 Model B

6 Discussion
6.1 Model Adequacy
6.2 Financial Interpretations
6.2.1 Volatility
6.2.2 Palladium
6.2.3 Currencies
6.2.4 Equity Indices
6.2.5 Nokia
6.3 Further Research

7 Conclusion

A Appendix
A.1 An Introduction to Financial Derivatives
A.1.1 Options
A.1.2 Futures Contracts
A.2 Omitted Trading Days

References


1 Introduction

1.1 Background

The oldest recorded evidence of financial derivatives is found in the Code of Hammurabi, the code-of-law of Ancient Babylon [1, ch. 1]. It dates back to about 1754 BC and consists of 282 laws, one of which deals with farmers and their mortgages:

”48. If any one owe a debt for a loan, and a storm prostrates the grain, or the harvest fail, or the grain does not grow for lack of water; in that year he need not give his creditor any grain, he washes his debt-tablet in water and pays no rent for the year” [2].

It states that in the event of crop failure, an indebted farmer does not need to pay interest to the creditor. This decree functioned as an asset-or-nothing put option for the farmers [1, ch. 1].

A mere millennium and a half later, Aristotle tells the story about the philosopher Thales, who predicted an unusually plentiful olive harvest the coming fall. He purchased the right, but not the obligation, to hire all the olive presses in the region when fall came. Indeed, fall came and the olive harvest was abundant, resulting in a soaring demand for olive presses. Thales then leased the presses at a substantial premium and made a fortune [1, ch. 1].

Fast forward another two millennia to the year of 1848 when the world’s first futures and options exchange was founded, the Chicago Board of Trade [1, ch. 1]. This was the start of organized and centralized derivatives trading.

The move beyond the agricultural origins of derivatives markets was slow. The first trading in non-agricultural commodities on exchanges began in 1933 with futures contracts on silver. Years elapsed and the pace of innovation in derivatives gained traction. The internet, electronic trading and fiber optics have laid a proper foundation for today’s high-speed, high-volume and complex derivatives markets.

Financial derivatives have been exchange-traded in Sweden since 1985, when Optionsmarknaden began trading options. In the beginning, only call options on a mere six underlying stocks were available for trading [3]. In 2003, Nordic Derivatives Exchange (NDX) was founded by Nordic Growth Market AB (NGM) [4]. NDX is a Swedish regulated market for exchange-traded products (ETPs), structured products, bonds, and exchange-traded funds. As of 2020, around 16,000 ETPs are listed for trading on NDX [5], and the variety of derivative contracts is overwhelming.

In this era of proliferation of financial derivatives, it is interesting to understand which factors drive derivatives trading. Therefore, this thesis aims to investigate the relationship between the number of trades in ETPs and a set of 65 chosen variables using multiple linear regression.

1.2 Previous Research

Previous research has shown that factors such as earnings reporting, bid-ask spread, and calendar effects affect the trading volume in single stocks [6]. One study showed that the bid-ask spread has a negative relationship with trading volumes in futures markets [7]. Another study found that futures trading volumes are affected by the underlying market characteristics, such as the price volatility [8].

In 1994, Jones et al. showed that there exists a strong relation between the number of trades and volatility in stock markets [9], further strengthened by other studies [10]. Studies have shown that there exist positive interrelations between exchange rate volatility and currency option trading volumes [11, 12]. Another study suggests that the relation between trading volume and volatility is steeper for positive returns than for non-positive returns [13]. McInish and Wood [14] found that:

”return activity in a period is associated with the level of trading frequency in a subsequent period and also with the number of shares in a subsequent period. This is consistent with small traders reacting to returns while professional traders largely ignore previous returns in their trading.”

The findings of McInish and Wood are of great interest for this thesis, since the majority of the trades on NDX are executed by small traders¹ who, according to McInish and Wood, trade as a reaction to changes in returns.

An important contributor on the topic of trading volumes is behavioral economist Hersh Shefrin, who has included behavioral and psychological aspects as key factors affecting trading volumes. Investor overconfidence and heterogeneity in beliefs and expectations are some factors that affect trading volumes [15, p. 136-137,499-500]. However, as such factors are not easily quantifiable, they were completely omitted from this analysis.

1.3 Purpose and Aim

This thesis aims to examine and determine which factors affect the number of trades in exchange-traded products on Nordic Derivatives Exchange. This is of interest for three main reasons.

Firstly, it lies in every corporation’s best interest to understand its revenue streams. Many corporations operate on derivatives markets, and a solid understanding of which factors affect the number of trades is vital for understanding their own businesses and for developing suitable strategies to increase profitability. For such companies, insight into which factors drive trading is valuable.

Secondly, investors might try to capitalize on this insight and use it as an investment tool. There exist studies which suggest that increased trading volumes (relative to trend) are associated with negative skewness in stock returns over the subsequent six months [16]. Thus, understanding which factors drive the fluctuation of trading volumes could yield a better basis for analysis of price movements and result in better-informed decision making.

Thirdly, there is a clear lack of research on trading volumes and number of trades in ETPs. Previous research has mostly revolved around trading volumes in stocks and futures. It is not certain that the same findings apply to ETPs. This thesis aims to contribute to filling this gap in knowledge.

¹ Established in section 2.4.


1.4 Research Question

The research question is formulated as:

Which factors affect the number of trades in exchange-traded products on Nordic Derivatives Exchange?


2 Financial Background

This section covers the essential financial theory for this thesis. Firstly, the reader is acquainted with exchange-traded products. Secondly, the mechanics of derivatives trading are discussed, as well as the essential parties which enable derivatives trading. Thirdly, volatility is discussed, which previous studies have suggested greatly affects the number of trades. Lastly, this section ends with a brief discussion about the term number of trades. Before proceeding, we recommend the reader unacquainted with financial instruments, options, and futures contracts to read section A.1 in the appendix.

2.1 Exchange-Traded Products

Exchange-traded products, ETPs, are a group of financial derivatives which are traded on exchanges [17]. Different types of ETPs can differ greatly in the underlying mechanics. This section aims to give an introduction to ETPs, as well as to explain the characteristics and properties of the various types of ETPs which are listed on NDX. Common characteristics of ETPs include that they are cash-settled, carry a management fee, and are associated with a limited investment risk (that is, an investor cannot lose more than the initial investment). The last-named property is extremely attractive from an investor's perspective.

ETPs can be benchmarked to stocks, commodities, indices, currencies, or interest rates. Most ETPs have some leverage to amplify price movements in the underlying asset. The various types of ETPs on NDX are tracker certificates, constant leverage certificates, mini futures, plain vanilla warrants, turbo warrants, and unlimited turbos.

Tracker Certificates. A tracker certificate follows the underlying asset’s price movements 1:1 and is thus not leveraged. All tracker certificates on NDX carry a long position. Tracker certificates offer a simple and cost-effective means to invest in exotic and otherwise inaccessible underlying assets.²

Constant Leverage Certificates. Constant leverage certificates (commonly referred to as bull and bear certificates) are financial derivatives with a fixed daily leverage. Bull certificates carry a long position, and bear certificates carry a short position. The leverage and the certificate’s reference price are settled and recalculated on a daily basis to maintain the constant leverage. The leverage is created by the issuer of the certificate, who borrows capital and buys (in case of a bull certificate) the underlying asset in the market.
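To illustrate the effect of a fixed daily leverage, the following minimal sketch compounds a certificate's value day by day, assuming that each daily return equals the leverage times the underlying's daily return. Fees, interest and intraday effects are ignored, and the function name and the return series are hypothetical, chosen only for illustration.

```python
def leveraged_certificate_path(daily_returns, leverage, start_value=100.0):
    """Value path when each daily return is leverage * underlying daily return.

    Simplifying assumption: no fees, no financing cost, leverage reset daily.
    """
    values = [start_value]
    for r in daily_returns:
        values.append(values[-1] * (1.0 + leverage * r))
    return values


# Hypothetical underlying: up 10% on day one, down 10% on day two.
underlying_returns = [0.10, -0.10]

underlying = leveraged_certificate_path(underlying_returns, leverage=1.0)
bull_x2 = leveraged_certificate_path(underlying_returns, leverage=2.0)
bear_x2 = leveraged_certificate_path(underlying_returns, leverage=-2.0)

print("underlying:", underlying[-1])
print("bull x2:   ", bull_x2[-1])
print("bear x2:   ", bear_x2[-1])
```

In this example the underlying ends about 1% below its starting value, while both the bull and the bear certificate end about 4% lower. Under these assumptions, a daily reset makes the multi-day return depend on the path of the underlying and not only on its total change.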

Mini Futures. Mini futures are essentially leveraged, cash-settled futures contracts with no predetermined expiration date. However, mini futures have one particularly attractive property which regular futures contracts lack. Leveraged futures contracts are extremely risky since it is possible to lose more than one’s initial investment. Mini futures mitigate this risk through the use of a barrier, also called the stop loss level, similar to barrier options. In the case of a mini future long, the barrier is set equal to or greater than the leveraged component of the mini future.³ However, contrary to a barrier option, the barrier of a mini future is not fixed during the lifetime of the mini future. Instead, the barrier is reset on a regular basis by the issuer. Further, similarly to how a leveraged futures contract works, the investor has to pay interest on the leveraged component. For a mini future long, the leveraged component therefore increases with time to replicate interest paid by the investor. For a mini future short, the non-leveraged component increases with time to replicate interest paid to the investor [18, 19, 20]. If the stop loss level and the leveraged component differ, a salvage value is paid out to the investor after knock-out.

² For instance, on NDX one can invest in exotic indices such as the Vontobel Belt and Road Index, which tracks the price movements of companies that stand to profit from the realization of the Belt and Road Initiative, and the Solactive 5G Technology Performance Index, which tracks the performance of companies with significant business engagement in the areas of 5G technology.

Plain Vanilla Warrants. Plain vanilla warrants greatly resemble European options [21], and thus carry leverage naturally. To illustrate this, consider a long plain vanilla warrant (i.e. essentially a European call option), which is at-the-money⁴ with a strike price of 100 EUR. Suppose that given the maturity, volatility, and risk-free interest rate,⁵ the price of the warrant is 1 EUR. Further, suppose that the price of the underlying asset increases to 102 EUR at maturity, and the option is thus worth 2 EUR. A 2% increase in the underlying asset has thus doubled the value of the warrant.
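The arithmetic of this example can be retraced with a short sketch. It uses only the figures from the paragraph above (strike 100 EUR, warrant price 1 EUR, underlying at 102 EUR at maturity); the function and variable names are hypothetical.

```python
def call_payoff(spot_at_maturity, strike):
    """Payoff at maturity of a long European call (long plain vanilla warrant)."""
    return max(spot_at_maturity - strike, 0.0)


strike = 100.0            # EUR, the warrant is at-the-money today
spot_today = 100.0        # EUR
warrant_price = 1.0       # EUR, price paid for the warrant
spot_at_maturity = 102.0  # EUR

payoff = call_payoff(spot_at_maturity, strike)
underlying_return = spot_at_maturity / spot_today - 1.0
warrant_return = payoff / warrant_price - 1.0
realized_leverage = warrant_return / underlying_return

print(f"underlying return: {underlying_return:.2%}")
print(f"warrant return:    {warrant_return:.2%}")
print(f"realized leverage: {realized_leverage:.1f}")
```

A 2% move in the underlying corresponds to a 100% return on the warrant, a realized leverage of roughly 50 in this example. Had the underlying instead ended at or below 100 EUR, the payoff would be zero and the full 1 EUR would be lost.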

Turbo Warrants. A turbo warrant is a type of barrier option, and thus differs from a plain vanilla warrant. A call turbo warrant is similar to a down-and-out call option, and a put turbo warrant is similar to an up-and-out put option. However, when the barrier (stop loss level) is hit, the payout is not necessarily zero. This depends on whether the strike price K and the barrier L are equal or not. If they differ, a reference price is determined and a salvage value is calculated and paid out to the holder of the turbo warrant [22]. This differs from traditional down-and-out call and up-and-out put options in financial mathematics [23, p. 267-275].

Unlimited Turbos. Unlimited turbos are essentially mini futures, with one special property. The barrier is always set equal to the leveraged component. Thus, once the unlimited turbo is knocked out, the salvage value is always zero.

2.1.1 Speculation

In finance, speculation aims to make a profit from the price dynamics of some underlying security. Financial instruments allow investors a large speculative position with respect to relatively small initial deposits [24, p. 15]. In addition, investors can also take a market position without investing direct capital into the components by adopting a leveraged position. Leverage allows investors a greater market exposure than their capital would otherwise grant. Speculators play an important role in the market. Firstly, they are prepared to carry more risk relative to the average investor, meaning they are willing to invest in as yet unproven markets or at times when risk-averse investors hesitate [25]. Thus, without speculators, only large and well-established companies would be able to procure loans. Secondly, they tend to be more active traders and hence provide market liquidity, yielding a smaller bid-ask spread. There are two forms of speculation. A bullish speculator believes the price of the security will increase, while a bearish speculator seeks to profit from a price decrease of the security.

³ In case of a mini future short the barrier is set equal to or less than the leveraged component.

⁴ At-the-money refers to a situation where the strike price of an option equals the spot price of the underlying asset.

⁵ Maturity, volatility, and the risk-free interest rate are key components when pricing a derivative in financial mathematics.

2.1.2 Hedging

While speculating often increases risk, hedging intends to decrease the variance associated with the portfolio. The ultimate objective is to reduce the risk of adverse price movements with respect to some underlying asset. There are various types of hedges. Short hedges are used by investors for protection against potential price declines on specific assets in the future. They are primarily recommended for investors who already own an asset and plan to sell it in the future [26, p. 66]. Long hedges are used for protection against potential price increases. Thus, these hedges are mainly used by manufacturers that know they will have to buy a specific asset in the future, e.g. oil, but seek to lock in the price now [26, p. 66]. Another form of hedging is diversification. Contrary to arguments in line with Modigliani and Miller (who claim that hedging by corporations should be irrelevant to shareholders because they can do it themselves by adopting a more well-diversified portfolio), studies have shown a positive correlation between the use of foreign currency derivatives and stock prices [27].

2.2 Mechanics of Derivatives Trading

This part discusses the crucial parties which all enable derivatives trading. These are derivatives exchanges, brokerage firms, issuers, and market makers.

2.2.1 Derivatives Exchanges

Derivatives exchanges exist to facilitate the transfer of financial risk and provide investors the opportunity for price discovery [28]. They function as the connection between cash markets, hedgers and speculators [28, p. 3]. Most of the job of an exchange involves monitoring various flows, such as orders and trades. Exchanges primarily make their money by charging small commissions for each settled trade.

2.2.2 Brokerage Firms

Private investors cannot trade directly on derivatives exchanges, such as NDX or Chicago Mercantile Exchange. Instead, private investors must trade through a brokerage firm which is a market member of the exchange. NGM has some twenty market members [29]. These market members, or brokerage firms, act as middlemen and trade on behalf of their customers [30]. Avanza and Nordnet are two large Swedish online brokerage firms with a strong presence on NDX. Market members generally charge their customers a commission fee for each trade executed. These commission fees can either be fixed or value-based.


2.2.3 Issuer of Derivatives

The issuer is responsible for creating the derivatives and listing them on the exchanges. Thus, issuers create the leverage associated with the investment and are also responsible for following through with the financial transactions [31].

2.2.4 Market Maker

Market makers ensure liquidity and efficiency within the market by buying and selling substantial amounts of the asset [32]. Both individual investors and financial institutions can act as market makers [33]; however, on NDX it is the issuers of ETPs that act as market makers. They profit from the bid-ask spread.

2.3 Volatility

The volatility of a financial security is defined as the standard deviation of the return in a year when the return is expressed using continuous compounding [26, p. 319]. The volatility carries great importance in financial mathematics when pricing financial derivatives as it is one of five parameters in the celebrated Black-Scholes model [23, p. 108]. More relevant to this thesis is the fact that, as stated above, there exists a multitude of research which suggests that volatility has a great impact on trading volumes and number of trades in futures markets, stock markets, and currency option markets [14, 34, 35, 36]. One theory which explains the relation between volatility and trading volumes is the mixture of distributions hypothesis. By this hypothesis, both trading volumes and volatility are derived from the (unobservable) rate of information flow to the market [36].

Another hypothesis which explains the relationship between trading volume and volatility is the sequential arrival information hypothesis. It assumes that information spreads sequentially among investors and traders [8]. Further, it assumes that ”trading takes place after each trader receives information, but an uninformed trader will be unable to perfectly learn by observing the trading activities of informed traders” [8]. Thus, the hypothesis conjectures that there exists both a lagged, as well as a contemporaneous, relationship between trading volume and return volatility [8].

One problematic aspect of volatility is that it is not directly observable in the market. Two common approaches to estimating the volatility of an asset are to compute the historical volatility or the implied volatility. The historical volatility is computed using elementary statistical theory and historical security prices. However, as the name entails, this only approximates the historical volatility. Volatility changes over time, which is why historical data need not reveal much about the present or future volatility. An alternative approach is to estimate the market’s expectation of the volatility, which is priced into option prices. Using the market price of an option written on the very same security whose volatility one wants to estimate, one can invert the Black-Scholes formula to compute the implied volatility [23, p. 108-110]. This is approximately how volatility indices such as the VIX, VVIX and VSTOXX are computed, although the exact formulas differ somewhat [37].
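Both approaches can be sketched in a few lines. The following is a minimal illustration, not the procedure used for any particular index: historical volatility is taken as the annualized sample standard deviation of daily log returns, assuming 252 trading days per year, and implied volatility is found by bisection on the Black-Scholes price of a European call on a non-dividend-paying asset. All function names and numerical inputs are hypothetical.

```python
import math


def historical_volatility(prices, periods_per_year=252):
    """Annualized sample standard deviation of log returns.

    Assumption: equally spaced observations, 252 trading days per year.
    """
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(log_returns) / len(log_returns)
    variance = sum((r - mean) ** 2 for r in log_returns) / (len(log_returns) - 1)
    return math.sqrt(variance * periods_per_year)


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call, no dividends."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    d2 = d1 - sigma * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def implied_volatility(price, spot, strike, rate, maturity,
                       lo=1e-6, hi=5.0, tol=1e-10):
    """Invert the Black-Scholes call price by bisection.

    Works because the call price is increasing in sigma.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call_price(spot, strike, rate, mid, maturity) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Round trip with made-up inputs: price a call at 25% volatility, then recover it.
market_price = bs_call_price(spot=100.0, strike=100.0, rate=0.01,
                             sigma=0.25, maturity=0.5)
recovered = implied_volatility(market_price, spot=100.0, strike=100.0,
                               rate=0.01, maturity=0.5)
print(f"implied volatility: {recovered:.4f}")
```

Bisection is used here only because it is simple and robust; any one-dimensional root finder would serve the same purpose.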


2.4 Number of Trades

The term number of trades, also called transactions, simply refers to the number of deals or transactions in a day. A closely related term is trading volume, which refers to the number of contracts traded in a day [26, p. 52]. The trading volume is equal to the number of trades multiplied by the average trade size. During 2019 the average turnover per trade in ETPs on NDX was 3,435 EUR [38], suggesting that the majority of trades were executed by small traders. For such small transactions, NGM’s commission mainly comes from fixed transaction fees [39]. Therefore, the number of trades has a greater impact on NGM’s revenue streams than trading volume, and is thus of greater interest to investigate. For other actors, such as market makers which profit from the bid-ask spread, the trading volume is of greater interest since this factor is what drives the revenues.


3 Mathematical Theory

3.1 The Multiple Linear Regression Model

Regression analysis is a statistical technique used to analyze the relationship between a dependent variable and a set of predictor variables. The relationship is modeled and analyzed by fitting the dependent variable as a function of the predictor variables. The model then allows the practitioner to estimate conditional expectations of the response variable. The models can later be used for forecasting, but also for understanding relationships between the response and its predictors.

This thesis aims to model the relationship between a response variable and predictor variables using multiple linear regression. More specifically, the relationship is modeled according to the following equation

$$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \dots + \beta_k x_{i,k} + \varepsilon_i \tag{1}$$

where $y_i$ denotes the $i$th observed value of the dependent variable, $x_{i,j}$ denotes the $i$th observation of the $j$th predictor variable, $\beta_j$ denotes the linear coefficients for $j = 0, 1, 2, \dots, k$, and $\varepsilon_i$ the $i$th error term. $k$ denotes the number of predictor variables. Let $n$ denote the number of observations and $p = k + 1$. Then, equation 1 can be written in matrix form according to

$$y = X\beta + \varepsilon,$$

where $y$ is the $n \times 1$ vector of the observed values, $X$ is the $n \times p$ matrix of the observations of the independent variables, $\beta$ constitutes the $p \times 1$ vector of regression coefficients, and the $n \times 1$ vector $\varepsilon$ constitutes the random errors. In particular, $y$, $X$, $\beta$, and $\varepsilon$ are given by

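$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \dots & x_{1k} \\
1 & x_{21} & x_{22} & \dots & x_{2k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \dots & x_{nk}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}.
$$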

3.2 Ordinary Least Squares

Ordinary least squares (OLS) is the most common method for estimating the unknown regression parameters in linear regression models. The estimator $\hat{\beta}$ is chosen to minimize the sum of squared residuals, which yields the fitted model $\hat{y} = X\hat{\beta}$. Thus, $\hat{y}$ is the value predicted by the model.

3.2.1 Assumptions

The random errors, $\varepsilon_i$, are assumed to have mean zero conditional on contemporaneous, past, and future values of the predictor variables. This assumption is referred to as strict exogeneity. The errors are also assumed to have constant variance, $\sigma^2$, known as homoscedasticity. Further, another vital assumption is that the errors are uncorrelated with each other. In addition, the random error terms are assumed to be normally distributed. While strict exogeneity, homoscedasticity and uncorrelated errors are vital for the derivation and properties of the estimates of the linear coefficients $\beta_j$, normality is not. However, the normality assumption allows for hypothesis testing, confidence intervals and t-tests, which will be used throughout this thesis.

3.2.2 Derivation

The OLS estimator $\hat{\beta}$ is derived by minimizing the sum of squares

$$S(\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta).$$

$S(\beta)$ can be expressed as

$$S(\beta) = y'y - \beta'X'y - y'X\beta + \beta'X'X\beta = y'y - 2\beta'X'y + \beta'X'X\beta.$$

Since $S(\beta)$ is convex in $\beta$, the first- and second-order optimality conditions imply that the minimum is obtained by differentiating $S(\beta)$ with respect to $\beta$ and setting the derivative to zero:

$$\left.\frac{\partial S(\beta)}{\partial \beta}\right|_{\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0. \tag{2}$$

Equation 2 can be rewritten as the least-squares normal equations

$$X'X\hat{\beta} = X'y.$$

Thus, the OLS estimator $\hat{\beta}$ is given by

$$\hat{\beta} = (X'X)^{-1}X'y,$$

assuming the predictor variables are linearly independent. Hence, the fitted model is given by

$$\hat{y} = X\hat{\beta}. \tag{3}$$

Equation 3 can be written as

$$\hat{y} = X\hat{\beta} = X(X'X)^{-1}X'y = Hy,$$

where $H = X(X'X)^{-1}X'$ is known as the hat matrix.
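The derivation above can be checked numerically. The following minimal sketch assumes NumPy is available and uses simulated data with made-up coefficients; it solves the normal equations for the estimator and forms the hat matrix. The variable names are hypothetical and the data have nothing to do with the thesis data set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: n observations of k predictors (illustrative values only).
n, k = 200, 3
predictors = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), predictors])  # n x p design matrix, p = k + 1
beta_true = np.array([1.0, 2.0, -0.5, 0.3])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Normal equations X'X beta_hat = X'y, solved without forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^{-1} X' and fitted values y_hat = H y.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

print("beta_hat:", np.round(beta_hat, 3))
print("trace(H):", round(float(np.trace(H)), 6))  # equals p for full-rank X
```

Solving the linear system is generally preferred over computing the inverse of $X'X$ explicitly, although both correspond to the same formula. The trace of the hat matrix equals $p$ when the columns of $X$ are linearly independent.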

3.2.3 Properties of the OLS coefficients

Provided that the model is correct, $\hat{\beta}$ is an unbiased estimator of $\beta$, shown below:

$$E[\hat{\beta}] = E\left[(X'X)^{-1}X'y\right] = E\left[(X'X)^{-1}X'(X\beta + \varepsilon)\right] = E\left[(X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon\right] = \beta,$$

since $E(\varepsilon) = 0$ by assumption. The variance of $\hat{\beta}$ is derived through

$$\operatorname{Var}(\hat{\beta}) = \operatorname{Var}\left[(X'X)^{-1}X'y\right] = (X'X)^{-1}X'\operatorname{Var}(y)\left[(X'X)^{-1}X'\right]' = \sigma^2(X'X)^{-1}X'X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$

The Gauss-Markov theorem establishes that the least squares estimator of $\beta$ is the best linear unbiased estimator given that the errors have mean zero, constant variance and are uncorrelated [40, p. 80].
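In practice $\sigma^2$ is unknown. A common approach, assumed here and not stated in the passage above, is to replace it with the residual mean square, the residual sum of squares divided by $n - p$. The sketch below uses NumPy and simulated data with made-up coefficients to estimate the covariance matrix $\sigma^2(X'X)^{-1}$ and the standard errors of the coefficients; all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (illustrative values only).
n, k = 150, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
p = X.shape[1]
y = X @ np.array([0.5, 1.5, -2.0]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Residual mean square as the estimate of sigma^2 (assumption: divisor n - p).
residuals = y - X @ beta_hat
ms_res = residuals @ residuals / (n - p)

# Estimated Var(beta_hat) = sigma^2 (X'X)^{-1}; standard errors on the diagonal.
cov_beta_hat = ms_res * XtX_inv
std_errors = np.sqrt(np.diag(cov_beta_hat))

print("estimated sigma^2:", round(float(ms_res), 4))
print("standard errors:  ", np.round(std_errors, 4))
```

The standard errors are what the t-tests on individual coefficients are built from, which is where the normality assumption of section 3.2.1 comes in.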


3.3 Residual Analysis

Residual analysis is useful and efficient for detecting inadequacies in the model or the data. As discussed in section 3.2.1 above, there are a few crucial underlying assumptions about the error $\varepsilon$, summarized as $\varepsilon \sim N(0, \sigma^2 I)$. These assumptions can be checked using residuals.

The observed residuals $e$ for $n$ observations are defined as

$$e = y - \hat{y} = (I - H)y,$$

where $I$ is the $n \times n$ identity matrix and $H$ is the hat matrix, defined as previously. The variance of the residuals is

$$\operatorname{Var}(e) = \operatorname{Var}\left[(I - H)y\right] = \sigma^2(I - H),$$

due to the idempotency of $I - H$ and $H$ [40, p. 131]. Further, since the diagonal elements of $H$ are generally not identical, the residuals $e$ do not have constant variance. Therefore, in order to compare the residuals of different observations in any meaningful manner, one needs to scale the observed residuals. One popular scaling is the studentized residual, which has unit variance. The studentized residual is defined as

$$r_i = \frac{e_i}{\sqrt{MS_{Res}(1 - h_{ii})}},$$

where $e_i$ is the $i$th residual, $MS_{Res}$ is the residual mean square, and $h_{ii}$ is the $i$th diagonal element of the hat matrix. Observations with studentized residuals greater than 3 in absolute value are generally considered potential outliers [40, p. 131].
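As a hedged illustration of the definition above, the following sketch computes studentized residuals with NumPy on simulated data in which one observation has been deliberately shifted. The function name, the data and the planted outlier are hypothetical, and the residual mean square is assumed to use the divisor $n - p$.

```python
import numpy as np


def studentized_residuals(X, y):
    """Studentized residuals r_i = e_i / sqrt(MS_Res * (1 - h_ii))."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)  # hat matrix
    e = y - H @ y                          # observed residuals (I - H) y
    h = np.diag(H)                         # leverages h_ii
    ms_res = e @ e / (n - p)               # residual mean square
    return e / np.sqrt(ms_res * (1.0 - h))


rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
y[10] += 5.0  # plant one artificial outlier at index 10

r = studentized_residuals(X, y)
flagged = np.flatnonzero(np.abs(r) > 3)
print("potential outliers at indices:", flagged)
```

The threshold of 3 follows the rule of thumb quoted above; flagged observations are candidates for closer inspection, not automatic deletion.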

3.3.1 Graphical Residual Analysis

Graphical techniques are excellent for identifying abnormal values of residuals. Given that the assumptions in the previous part are correct, as well as that the model in general is correct, certain characteristics of the residuals are expected. To test these expected behaviors, one can plot the residuals in different ways. Any deviations from the majority of the residuals suggest that certain model inadequacies are present. There are different types of residual plots.

One such residual plot is the Tukey-Anscombe plot which is a plot of the residuals versus the fitted values. It provides an efficient way of verifying or rejecting several of the assumptions. The plot in figure 1a) is the expected pattern if the underlying assumptions are satisfied. Figure 1b) illustrates the problem of dispersion, a case of heteroscedasticity. Lastly, figure 1c) shows a pattern which often is attributed to the lack of some important independent variable [41, p. 346-348].

Another common residual plot is the Q-Q plot (short for quantile-quantile plot), also known as the normal probability plot, which is used to detect non-normality. It is a plot of the ordered residuals against the normal order statistics.

If the residuals are normally distributed, the points should form a straight line as in figure 1d). Any deviation from this suggests issues with normality.
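The coordinates of a Q-Q plot can be sketched as follows in Python (not the R code used in this thesis). The plotting positions $(i - 0.5)/n$ are one common convention and are an assumption here; the residuals are made up.

```python
from statistics import NormalDist

def qq_points(residuals):
    """Pairs (theoretical normal quantile, ordered residual) for a Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    # Assumed plotting positions (i - 0.5) / n for i = 1, ..., n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up residuals for illustration.
points = qq_points([0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.2])
```

If the points lie close to a straight line when plotted, the normality assumption is plausible.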


Figure 1: a)-c) show Tukey-Anscombe plots: (a) expected pattern, (b) dispersion, (c) asymmetry. d) shows a Q-Q plot.

3.4 Autocorrelation of Errors

One of the fundamental assumptions about the true error $\varepsilon$ is that the errors are uncorrelated. Any violation of this assumption may seriously harm the model. Autocorrelation refers to the situation where errors are correlated with each other. The presence of autocorrelation implies that OLS estimates are no longer the minimum variance estimates (however, they are still unbiased) and may cause the error variance $\sigma^2$ to be seriously underestimated. As a result, confidence intervals, prediction intervals and hypothesis tests become less reliable [40, p. 475].

Often, time series data exhibits some autocorrelation. One approach to handle the issue of autocorrelation is to add a lagged dependent variable [40, p. 494-495].

3.4.1 Lag Plot

A lag plot can be used to detect autocorrelation. It plots the residuals $e_t$ and $e_{t-1}$ against each other. If a linear shape appears, autocorrelation is present. If the linear shape has a positive slope, the autocorrelation is positive, and if the linear shape has a negative slope, the autocorrelation is negative. If no pattern can be identified, it is plausible that there is no autocorrelation. See figure 2 [42].


Figure 2: Lag plots: (a) positive autocorrelation, (b) no autocorrelation.

3.4.2 Durbin-Watson Test

The Durbin-Watson test is commonly used to detect autocorrelation. The Durbin-Watson test statistic d is defined as:

$$d = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2} \approx 2(1 - \rho),$$

where $\rho$ is the simple correlation between $e_t$ and $e_{t-1}$ [41, p. 355]. The value of $d$ lies between 0 and 4. A value less than 2 indicates positive autocorrelation, and a value greater than 2 indicates negative autocorrelation. If $d = 2$, there is no autocorrelation. In the Durbin-Watson test one tests the null hypothesis

$$H_0: \rho = 0$$

against the alternative hypothesis

$$H_1: \rho \neq 0.$$

The test can be one-sided or two-sided. For both cases, there exist lengthy tables with numerical values as for when to reject or not reject the null hypothesis, see [43] for such tables. However, a rule of thumb is that values between 1.5 and 2.5 are normal, and in general not cause for alarm [44].
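The statistic itself is simple to compute from a residual series. Below is a minimal Python sketch (not the R code used in this thesis) with made-up residuals. The estimator of $\rho$ used here, $\sum_{t=2}^{T} e_t e_{t-1} / \sum_{t=1}^{T} e_t^2$, is one common choice and is an assumption; the function names are hypothetical.

```python
def durbin_watson(e):
    """Durbin-Watson statistic d for a sequence of residuals e_1, ..., e_T."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(et ** 2 for et in e)
    return numerator / denominator

def lag1_autocorrelation(e):
    """Assumed estimator of rho: sum of e_t * e_{t-1} divided by sum of e_t^2."""
    return sum(e[t] * e[t - 1] for t in range(1, len(e))) / sum(et ** 2 for et in e)

def within_rule_of_thumb(d, low=1.5, high=2.5):
    """Rule of thumb from the text: values between 1.5 and 2.5 are considered normal."""
    return low <= d <= high

residuals = [0.5, 0.8, 0.3, -0.2, -0.6, -0.1, 0.4]   # made-up residuals
d = durbin_watson(residuals)
rho = lag1_autocorrelation(residuals)
approx = 2 * (1 - rho)                                # approximation d ~ 2(1 - rho)
```

With this estimator, `d` and `approx` differ only by the end-point term $(e_1^2 + e_T^2) / \sum_t e_t^2$, which becomes negligible for long series.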

3.4.3 Lagged Dependent Variable Models

Introducing a lagged dependent variable (LDV) in an OLS regression is a popular method to mitigate autocorrelation issues:

$$y_t = \phi y_{t-1} + x_t \beta + \varepsilon_t.$$

Including an LDV does introduce some bias in the model. This bias can range from tiny to severe. However, in many cases an LDV is called for (e.g. due to autocorrelated errors), and excluding an LDV from the model can incur dramatic bias [45].

Some researchers have argued that the Durbin-Watson test is biased toward 2 when LDVs are included in the OLS estimates. However, other studies have shown that this is not the case, and that the Durbin-Watson tests are completely legitimate even in this case [46].
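In practice, including an LDV amounts to shifting the response series by one period and dropping the first observation, for which no lagged value exists. A minimal Python sketch (not the R code used in this thesis), with a hypothetical helper name and made-up numbers:

```python
def add_lagged_response(y, rows):
    """Pair each y_t (t >= 2) with the regressor row [y_{t-1}] + x_t.

    The first observation is dropped because y_0 is not observed.
    """
    if len(y) != len(rows):
        raise ValueError("y and rows must have the same length")
    response = y[1:]
    design = [[y[t - 1]] + list(rows[t]) for t in range(1, len(y))]
    return response, design

y = [10.0, 12.0, 11.0, 15.0]            # made-up response series
rows = [[0.5], [0.7], [0.2], [0.9]]     # made-up regressor rows x_t
response, design = add_lagged_response(y, rows)
```

The resulting `design` rows can then be passed to any OLS routine in place of the original regressor rows.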


3.5 Influential Observations

Influential observations are points with an undesirably large effect on the model fit. These points are usually characterized by a large residual and/or high leverage. High leverage observations are distant from the centroid of the data in X-space, which gives them an inordinate impact on the estimated regression coefficients [40, p. 212]. Observations with large residuals tend to differ substantially from the rest of the data and pull the fitted regression towards themselves. The combination of large residuals and high leverage therefore increases the importance of proper diagnostic treatment.

3.5.1 Deletion Diagnostics

Below, three diagnostics are discussed, all of which measure the effect of deleting the $i$th observation. They are therefore referred to as deletion diagnostics.

Cook's D. The influence measure Cook's D is defined as

$$D_i = \frac{(\hat{\beta}_{(i)} - \hat{\beta})' (X'X) (\hat{\beta}_{(i)} - \hat{\beta})}{p \, MS_{Res}},$$

where $MS_{Res} = SS_{Res}/(n - p)$ and $\hat{\beta}_{(i)}$ is the OLS estimate of $\beta$ when the $i$th observation is deleted. Cook's D measures the shift in $\hat{\beta}$ when a certain observation $i$ is deleted. An observation $i$ may be influential if $D_i > F_{(0.5, p, n-p)}$ [41, p. 367].

DFFITS. The influence measure DFFITS is defined as

$$DFFITS_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{\sqrt{S_{(i)}^2 h_{ii}}},$$

where $\hat{y}_{i(i)} = x_i' \hat{\beta}_{(i)}$ is the estimated mean of the $i$th observation and $S_{(i)}^2$ is the estimated variance of the error, both computed without the $i$th observation. It measures the shift in $\hat{y}_i$ when the $i$th observation is deleted. An observation $i$ may be influential if

$$|DFFITS_i| > 2\sqrt{p/n}$$

[41, p. 367].

COVRATIO. The influence measure COVRATIO is defined as

$$COVRATIO_i = \frac{(S_{(i)}^2)^p}{MS_{Res}^p} \left( \frac{1}{1 - h_{ii}} \right).$$

It measures the impact of the $i$th observation on the precision of the estimates of the regression coefficients [41, p. 365]. An observation $i$ may be influential if

$$|COVRATIO_i - 1| > 3p/n$$

[41, p. 367].
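The three diagnostics can be computed literally by refitting the model with each observation left out. The following is a minimal Python sketch (not the R code used in this thesis) for a single regressor, so that $p = 2$ and $X'X$ has the entries $n$, $\sum x_i$ and $\sum x_i^2$. The data and function names are made up. The Cook's D cutoff is omitted because it requires an F quantile, which is not available in the Python standard library.

```python
import math

def fit_line(x, y):
    """OLS fit of y = b0 + b1*x; returns (b0, b1, residual sum of squares)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return b0, b1, ss_res

def deletion_diagnostics(x, y):
    """Cook's D, DFFITS and COVRATIO for each observation, by explicit refitting."""
    n, p = len(x), 2
    b0, b1, ss_res = fit_line(x, y)
    ms_res = ss_res / (n - p)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sum_x, sum_x2 = sum(x), sum(xi ** 2 for xi in x)      # entries of X'X
    results = []
    for i in range(n):
        # Refit without observation i.
        c0, c1, ss_del = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        s2_del = ss_del / (n - 1 - p)                     # S_(i)^2
        h_ii = 1 / n + (x[i] - x_bar) ** 2 / sxx          # leverage
        d0, d1 = c0 - b0, c1 - b1                         # shift in coefficients
        quad = n * d0 ** 2 + 2 * sum_x * d0 * d1 + sum_x2 * d1 ** 2
        dffits = ((b0 + b1 * x[i]) - (c0 + c1 * x[i])) / math.sqrt(s2_del * h_ii)
        covratio = (s2_del / ms_res) ** p / (1 - h_ii)
        results.append({
            "h": h_ii,
            "cooks_d": quad / (p * ms_res),
            "dffits": dffits,
            "covratio": covratio,
            "dffits_flag": abs(dffits) > 2 * math.sqrt(p / n),
            "covratio_flag": abs(covratio - 1) > 3 * p / n,
        })
    return results

# Made-up data; the last observation deviates from the linear trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.8, 12.0]
diagnostics = deletion_diagnostics(x, y)
```

Statistical software usually obtains the same quantities from closed-form expressions in $e_i$ and $h_{ii}$ instead of refitting $n$ times; the explicit refit is used here only because it mirrors the definitions.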

3.6 Transformations

Transformations are useful when any of the fundamental assumptions about normality, homoscedasticity and/or linearity are not satisfied. There are two approaches to transforming a linear model fit: transforming the dependent variable $y$ or transforming the regressor variables $x_i$ [40, p. 182].


3.6.1 Box-Cox Method

The Box-Cox method is an objective technique to help specify the most appropriate transformation of the dependent variable. The method combines the objectives of inducing homogeneous variance, obtaining a simple relationship and improving normality in a linear model.

The method uses a family of power transformations $y^{(\lambda)}$, which are defined as:

$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(y), & \lambda = 0. \end{cases}$$

The appropriate $\lambda$ is selected by some objective criterion, e.g. maximum likelihood, the Shapiro-Wilk test, or the probability plot correlation coefficient. The inventors of the method, Box and Cox, proposed that $\lambda$ be chosen as the maximum likelihood estimator [47]. A popular alternative criterion is to maximize the probability plot correlation coefficient (PPCC), the default selection criterion for the function boxcox in the R package EnvStats. Some research has shown that the PPCC is superior to other criteria, such as the maximum likelihood [48]. The technical details of PPCC are omitted from this thesis.
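For a given $\lambda$, the transformation itself is straightforward. A minimal Python sketch (not the R code used in this thesis, and without the selection of $\lambda$); the function name `box_cox` is hypothetical:

```python
import math

def box_cox(y, lam):
    """Box-Cox power transformation of strictly positive values y for a given lambda."""
    if any(v <= 0 for v in y):
        raise ValueError("the Box-Cox transformation requires strictly positive y")
    if lam == 0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1) / lam for v in y]

log_case = box_cox([1.0, 10.0, 100.0], 0)       # lambda = 0: natural logarithm
sqrt_case = box_cox([1.0, 4.0, 9.0], 0.5)       # lambda = 0.5: 2 * (sqrt(y) - 1)
near_zero = box_cox([10.0], 1e-8)               # approaches ln(10) as lambda -> 0
```

The case $\lambda = 0$ is the limit of the power expression as $\lambda \to 0$, which is why the family is continuous in $\lambda$.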

3.7 Multicollinearity

The columns of $X$ are linearly dependent if and only if

$$\sum_{j=1}^{p} t_j X_j = 0 \qquad (4)$$

for some set of constants $t_1, t_2, \dots, t_p$, not all zero [40, p. 286].

In particular, multicollinearity refers to near-linear dependency among the regressor variables [40, p. 117]. In terms of equation 4, this means that the equation holds approximately for some subset of the columns of $X$. Almost all data sets suffer from multicollinearity; the question is rather to what degree [40, p. 286]. Severe multicollinearity increases the variance of the estimated linear coefficients, $\hat{\beta}$, and can sometimes lead to estimates of $\beta$ that are too large in magnitude. Therefore, the severity requires scrutiny.

3.7.1 Variance Inflation Factor

The variance inflation factor, VIF, is defined as:

$$VIF_j = \frac{1}{1 - R_j^2}$$

for the $j$th regression coefficient [41, p. 372]. $VIF_j$ depends on $R_j^2$, the coefficient of determination obtained when $x_j$ is regressed on the other $k - 1$ regressor variables. Values of $R_j^2$ approaching 1 indicate near singularity, implying that $VIF_j$ will be large. VIF values > 10 indicate severe multicollinearity [41, p. 377].
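In the special case of only two regressors, $R_j^2$ equals the squared sample correlation between them, so both regressors share the same VIF. A minimal Python sketch of this special case (not the R code used in this thesis), with made-up data and a hypothetical function name:

```python
def vif_two_regressors(x1, x2):
    """VIF shared by both regressors when the model has exactly two regressors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    r_squared = s12 ** 2 / (s11 * s22)   # R_j^2 = squared correlation
    return 1 / (1 - r_squared)

vif_orthogonal = vif_two_regressors([-1.0, 0.0, 1.0], [1.0, -2.0, 1.0])
vif_collinear = vif_two_regressors([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
```

Uncorrelated regressors give a VIF of 1, while the nearly collinear pair exceeds the threshold of 10. With more than two regressors, each $R_j^2$ requires its own auxiliary regression.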

3.8 Variable Selection

There are two predominant methods for variable selection in linear regression, namely best subsets regression and stepwise regression methods.


Best subsets regression compares all models which can be constructed and then chooses the best model based on some selection criterion. If there are $k$ regressor variables, a total of $2^k$ models can be constructed. As $k$ increases, this number quickly becomes huge. For instance, performing best subsets regression with 65 regressor variables requires constructing and comparing $3.7 \cdot 10^{19}$ models.

Best subsets regression becomes infeasible in practice for k > 40 [49, p. 58]. Due to the very large number of variables included in this analysis, any best subsets regression is omitted.
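The number of candidate models quoted above for 65 regressors can be verified directly, since each regressor is either included or excluded. A short Python check:

```python
k = 65                      # number of regressor variables in this analysis
n_models = 2 ** k           # every subset of the k regressors defines one model
rounded = f"{n_models:.1e}" # scientific notation with one decimal
```

Here `n_models` equals 36,893,488,147,419,103,232, i.e. roughly $3.7 \cdot 10^{19}$.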

3.8.1 Backward Elimination

Stepwise regression methods require significantly less computing than best subsets regression. They work by sequentially adding or deleting regressor variables one at a time. Stepwise regression methods can be divided into three broad categories: forward selection, backward elimination, and stepwise regression.

Backward elimination starts with a full model with $k$ regressors and then eliminates one regressor variable at each step. At every step, some test statistic is computed for each variable, e.g. the p-value, and the regressor with the worst statistic is omitted from the model. This procedure is repeated until either all regressor variables satisfy some preselected cutoff value or a desired number of regressor variables remain in the model [41, p. 213]. Backward elimination works very well as a variable selection method and is heavily favored in the research community [40, p. 347].
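The control flow of backward elimination can be sketched as follows in Python (not the R code used in this thesis). The callback `p_values_for` is hypothetical: it stands in for refitting the model on the remaining regressors and returning their p-values. The variable names and p-values in the toy example are made up and, unlike in a real refit, do not change between steps.

```python
def backward_elimination(variables, p_values_for, cutoff=0.05):
    """Drop the regressor with the largest p-value until all are at or below cutoff.

    p_values_for(names) is a hypothetical callback that refits the model on
    `names` and returns a dict {name: p-value}.
    """
    remaining = list(variables)
    while remaining:
        p_values = p_values_for(remaining)
        worst = max(remaining, key=lambda name: p_values[name])
        if p_values[worst] <= cutoff:
            break                      # every remaining regressor passes the cutoff
        remaining.remove(worst)        # eliminate one regressor per step
    return remaining

# Toy stand-in for a real refit: fixed, made-up p-values.
toy_p = {"vix_price": 0.001, "gold_pos": 0.40, "spx_neg": 0.03, "btc_pos": 0.75}
selected = backward_elimination(toy_p, lambda names: {n: toy_p[n] for n in names})
```

In this toy run the two regressors with p-values above 0.05 are removed one at a time, starting with the largest.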

3.8.2 Hypothesis Testing

In order to determine the significance of the linear relationship in the model, a global F-test can be used [50]. The corresponding null hypothesis

$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

is formulated, which is equivalent to testing whether the variation in the response variable is due to any regressor variable. The alternative hypothesis is

$$H_1: \beta_j \neq 0 \text{ for at least one } j,$$

or equivalently in words, at least one of the regressor variables is significant to the model. The test statistic is derived through

$$SS_T = SS_R + SS_{Res} \qquad (5)$$

where $SS_T$ denotes the total sum of squares, $SS_{Res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, and $SS_R = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$. Hence, equation 5 can be interpreted as a decomposition of the total sum of squares into the part explained by the regression, $SS_R$, and the residual sum of squares, $SS_{Res}$ [40, p. 84]. Assuming $H_0$ is true, it can be shown that

$$\frac{SS_R}{\sigma^2} \sim \chi^2_k, \qquad \frac{SS_{Res}}{\sigma^2} \sim \chi^2_{n-k-1},$$


and that $SS_{Res}$ and $SS_R$ are independent. Moreover, the test statistic $F_0$ is constructed such that

$$F_0 = \frac{SS_R / k}{SS_{Res} / (n-k-1)} = \frac{MS_R}{MS_{Res}}.$$

Therefore, under $H_0$, $F_0$ follows an $F_{k,n-k-1}$ distribution, and the null hypothesis $H_0$ can be rejected at confidence level $100(1 - \alpha)$ if $F_0 > F_{\alpha,k,n-k-1}$.
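As a small numerical illustration of the decomposition and the test statistic, the following Python sketch (not the R code used in this thesis) uses four made-up observations together with their OLS fitted values from a model with $k = 1$ regressor. The critical value $F_{\alpha,k,n-k-1}$ is not computed, since it requires an F quantile from tables or statistical software.

```python
def global_f_statistic(y, y_hat, k):
    """Return (F0, SS_R, SS_Res, SS_T) for observed y and OLS fitted values y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
    ss_r = sum((fi - y_bar) ** 2 for fi in y_hat)              # regression sum of squares
    ss_t = sum((yi - y_bar) ** 2 for yi in y)                  # total sum of squares
    f0 = (ss_r / k) / (ss_res / (n - k - 1))
    return f0, ss_r, ss_res, ss_t

# Made-up data: y observed at x = 0, 1, 2, 3, with OLS fit y_hat = 1.3 + 0.8 x.
y = [1.0, 3.0, 2.0, 4.0]
y_hat = [1.3, 2.1, 2.9, 3.7]
f0, ss_r, ss_res, ss_t = global_f_statistic(y, y_hat, k=1)
```

For these numbers $SS_R = 3.2$, $SS_{Res} = 1.8$ and $SS_T = 5.0$, so the decomposition holds and $F_0 = 3.2 / 0.9 \approx 3.56$.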

3.9 Model adequacy

The coefficient of determination, $R^2$, is a measure used for examining the model adequacy. $R^2$ is defined as

$$R^2 = \frac{\text{Variance explained by the model}}{\text{Total variance}} = 1 - \frac{SS_{Res}}{SS_T}.$$

The measure examines the distance between the observed values and the fitted values. More specifically, it measures the proportion of the variance in the response variable that is explained by the set of independent variables. However, $R^2$ never decreases when another predictor variable is added to the model, regardless of whether that variable is useful. An alternative measure is given by the adjusted $R^2$, defined as

$$R^2_{adj} = 1 - \frac{SS_{Res}/(n-p)}{SS_T/(n-1)},$$

which is similar to $R^2$, but increases only if the added independent variable reduces the residual mean square [40, p. 88].
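The difference between the two measures is easy to see numerically. A minimal Python sketch (not the R code used in this thesis) with made-up sums of squares; the function names are hypothetical:

```python
def r_squared(ss_res, ss_t):
    """Coefficient of determination R^2 = 1 - SS_Res / SS_T."""
    return 1 - ss_res / ss_t

def adjusted_r_squared(ss_res, ss_t, n, p):
    """Adjusted R^2, where p is the number of estimated coefficients."""
    return 1 - (ss_res / (n - p)) / (ss_t / (n - 1))

# Made-up values: n = 4 observations, p = 2 coefficients.
r2 = r_squared(1.8, 5.0)
r2_adj = adjusted_r_squared(1.8, 5.0, n=4, p=2)
```

Here $R^2 = 0.64$ while $R^2_{adj} = 0.46$; the penalty is large because the made-up sample is tiny relative to the number of coefficients.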

The p-value of each linear coefficient examines the null hypothesis that the coefficient βj is equal to zero. A low p-value (typically < 0.05, depending on the level of significance chosen by the analyst) indicates that the null hypothesis can be rejected. A failure to reject the null hypothesis due to a high p-value suggests that the predictor variable might be insignificant.


4 Methodology

4.1 Data

The data was in part obtained from NGM and in part from Yahoo Finance.

Yahoo Finance’s data provider is ICE Data Services [51]. The data consisted of 1,269 observations with 66 variables (one dependent variable and 65 regressor variables), and was modified (according to section 4.5.1) in Microsoft Excel to fit the purposes of this analysis. A complete list of all regressor variables is found in table 3.

4.2 Software

The regression analysis was performed in the integrated development environment RStudio using the programming language R.

4.3 Timeline

The analysis was performed with data over a five-year period; from 2015-01-01 to 2019-12-31.

4.4 Delimitations

Delimitations were made to only examine trading days for which exchanges in all markets with indices included in our analysis were open. Thus, trading days on NDX with public holidays in any of the markets, e.g. Thanksgiving in the United States and Midsummer in Sweden, were omitted from the analysis. In total, 107 trading days were omitted. A complete list of these can be found in table 8 in section A.1. The reason for this omission of certain trading days was to construct a model as accurate as possible.

4.5 Variables

On NDX, ETPs are benchmarked to either stocks, indices, commodities, currencies or interest rates. Due to the limited number of ETPs benchmarked to interest rates,6 no regressor variables were related to interest rates. Instead, focus was devoted to equity indices, volatility indices, stocks, commodities, and currencies as regressor variables. The following section describes which regressor variables were included and why. However, the reader will first be acquainted with the methods used to alter the data so that it serves as appropriate regressor variables.

4.5.1 Initial Variable Treatments

To fully exploit the information about e.g. equity indices and their relation to the number of trades in ETPs on NDX, one needs to transform the insignificant asset prices/values. It is very improbable that an equity index’s value will have any effect on the number of trades. However, daily returns of indices could possibly help explain the number of trades in ETPs. As previously noted,

6 Only approximately 70 ETPs are benchmarked to interest rates out of a total of 16,000 ETPs on NDX (as of April 2020).


studies have suggested that price changes and daily returns of securities affect the trading volume and number of trades. Therefore, the daily return of most assets was chosen rather than the assets’ prices as regressor variables. Note that for some assets the actual price (e.g. VIX) is used.

The daily return of an asset on trading day $t$, expressed in percent and denoted by $r_t$, is computed as follows:

$$r_t = 100 \times \left( \frac{p_t}{p_{t-1}} - 1 \right),$$

where $p_t$ denotes the security's closing price on trading day $t$.

We identified a severe weakness in a previous bachelor thesis on the topic of trading volume, "Trading volume at Avanza" [52]. The authors of the thesis attempted, as the title suggests, to determine the factors which affect the trading volume at Avanza. They did so using multiple linear regression, and obtained an adjusted $R^2$ of 1.926% in their final model. We believe that this poor model accuracy was due to the fact that they used the daily returns of securities (e.g. indices) as regressor variables, without any sort of variable treatment or modification. This is problematic because, in a linear fit, a daily return of one sign can then only increase the trading volume, whilst a return of the opposite sign must decrease it. This sort of variable treatment (or rather, mistreatment) neglects and contradicts previous research which advocates that it is the volatility and price movements that affect trading volumes [34].

Therefore, this issue has been bypassed by using the absolute value of the daily return, and in some cases the regressor variables are separated into two variables, one of which contains the positive returns and one of which contains the negative returns. This is clarified mathematically below.

Consider the daily return of a security, $r_t$. Let $r_t^+$ and $r_t^-$ denote the positive return and negative return, respectively. These are defined as

$$r_t^+ = r_t \cdot \mathbf{1}\{r_t > 0\},$$

$$r_t^- = r_t \cdot \mathbf{1}\{r_t < 0\},$$

where $\mathbf{1}$ denotes the indicator function.
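The variable treatment can be sketched in a few lines of Python (not the Microsoft Excel and R workflow used in this thesis). The prices are made up and the function names are hypothetical.

```python
def daily_returns(prices):
    """Daily returns in percent: r_t = 100 * (p_t / p_{t-1} - 1)."""
    return [100 * (prices[t] / prices[t - 1] - 1) for t in range(1, len(prices))]

def split_returns(returns):
    """Split r_t into r_t^+ = r_t * 1{r_t > 0} and r_t^- = r_t * 1{r_t < 0}."""
    positive = [r if r > 0 else 0.0 for r in returns]
    negative = [r if r < 0 else 0.0 for r in returns]
    return positive, negative

prices = [100.0, 102.0, 99.96, 99.96]      # made-up closing prices
returns = daily_returns(prices)            # approximately [2.0, -2.0, 0.0]
pos, neg = split_returns(returns)
absolute = [p - n for p, n in zip(pos, neg)]   # |r_t| = r_t^+ - r_t^-
```

Note that the negative-return variable keeps its sign, so $r_t = r_t^+ + r_t^-$ and $|r_t| = r_t^+ - r_t^-$ hold for every trading day.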

The idea behind this separation (compared to simply using the absolute value) is that an increase in a security's price might not trigger the same ETP trading pattern as a decrease in the security's price, thus resulting in differing regression coefficients. Performing this separation of variables could therefore possibly increase the accuracy of the model. For instance, in a bearish market an investor might choose to hedge against further falls, or to go long with leverage to capitalize on a potential price rebound (using ETPs). On the contrary, in a bullish market there might not exist the same incentives to use ETPs, and the investor might choose a more long-term strategy and invest in single stocks, an investment option which also often incurs lower trading costs. We motivate this method of separation based on previous research which has suggested that positive and negative returns have different effects on the trading volume, e.g. [13].

Regressor variables which comprise the positive return of some asset have been named on the form asset_pos. Analogously, the negative returns have


been named asset_neg. Regressor variables which comprise the absolute value of the daily return have been named asset_change. For the cases where the regressor variables equal some asset's price or value, the variables have been named asset_price.

4.5.2 Number of Trades

The number of trades in ETPs is the aggregated number of trades in all ETPs on NDX during one trading day. The number of trades is the dependent variable in our analysis. Henceforth, y and trades will be used interchangeably to denote the number of trades.

4.5.3 Equity Indices

A substantial portion of the trading at NDX is in instruments with equity indices as underlying assets [5]. These include (but are not limited to) CAC 40, DAX 30, DJIA, EURO STOXX 50, FTSE 100, NASDAQ-100, OMXS30, and S&P 500. All these indices are included in the analysis, except for OMXS30 and FTSE 100 which were replaced by OMXSPI and Cboe UK 100 due to erroneous data. Evidently, some of the world's largest stock exchanges are absent and not represented by these indices, including the exchanges in Shanghai, Shenzhen, Hong Kong, London, Toronto, Mumbai, Sydney and Seoul. To somewhat cover these too, the equity index MSCI World Index was included in the analysis.

Further, the equity indices OMXH25, OBX, and OMXC20 were included despite the lack of ETPs with these as the underlying asset. This was due to the substantial trading in Finnish, Norwegian and Danish single stocks on NDX.

For all the aforementioned indices, a separation between the positive and negative daily return has been done according to section 4.5.1. That is, each equity index is represented by two variables in the regression analysis; for instance, DAX 30 is represented by dax30_pos and dax30_neg.

Equity indices are most often price-weighted or capitalization-weighted (also cap-weighted). In a price-weighted index, each constituent is weighted in proportion to its share price. In a cap-weighted index, each constituent is weighted in proportion to its market capitalization.

See table 1 for a full list of included equity indices and descriptions, and table 3 for the corresponding regressor variables.

4.5.4 Stocks

Substantial trading takes place in ETPs with single stocks as the underlying asset. However, considerable price movements in single stocks, e.g. Tesla, are not fully captured in equity indices such as the S&P 500. To account for some of the most frequently traded stocks on NDX in our analysis, the following stocks are included: Tesla, Inc., Apple, Inc., Amazon, Inc., H & M Hennes & Mauritz AB, Aktiebolaget Volvo, Telefonaktiebolaget LM Ericsson, Lundin Petroleum AB,7 Danske Bank A/S, Nokia Abp, and DNO ASA.

For the American single stocks (i.e. Tesla, Apple and Amazon), the positive and negative daily returns are separated according to section 4.5.1 as different

7 On April 6, 2020, Lundin Petroleum AB changed its name to Lundin Energy AB with the new ticker symbol LUNE.


Equity index        Description
CAC 40              Cap-weighted index of Euronext Paris
Cboe UK 100         Cap-weighted index of London Stock Exchange
DAX 30              Cap-weighted index of Frankfurt Stock Exchange
DJIA                Price-weighted index of NYSE and Nasdaq
Euro STOXX 50       Cap-weighted index of 50 Eurozone stocks
MSCI World Index    Cap-weighted index of 1,643 global stocks
NASDAQ-100          Cap-weighted index of Nasdaq
OBX                 Cap-weighted index of Oslo Børs
OMXC20              Cap-weighted index of Nasdaq Copenhagen
OMXH25              Cap-weighted index of Nasdaq Helsinki
OMXSPI              Cap-weighted index of Nasdaq Stockholm
S&P 500             Cap-weighted index of NYSE and Nasdaq

Table 1: Equity indices

Corporation                        Stock Ticker
Tesla, Inc.                        TSLA
Apple, Inc.                        AAPL
Amazon, Inc.                       AMZN
H & M Hennes & Mauritz AB          HM B
Aktiebolaget Volvo                 VOLV B
Telefonaktiebolaget LM Ericsson    ERIC B
Lundin Petroleum AB                LUPE
Danske Bank A/S                    DANSKE
Nokia Abp                          NOKIA
DNO ASA                            DNO

Table 2: Stocks

variables. For the Nordic single stocks (i.e. the remaining stocks), the absolute value of the relative price change is used as the variable. This was done purely to simplify the analysis. See table 2 for a complete list of the included stocks and table 3 for the corresponding regressor variables.

4.5.5 Commodities

Commodities are frequently traded on NDX. The most actively traded commodities on NDX are energy commodities and precious metals. The commodities included in the analysis are crude oil futures prices (WTI and Brent), gold futures prices, silver futures prices and palladium futures prices. There are many more tradable commodities on NDX; however, as the aforementioned account for most of the trading volume, the delimitation to these commodities is deemed appropriate. Ideally, natural gas prices would have been included in the analysis, but they were omitted due to erroneous and unreliable data. As for the equity indices, a separation between the positive and negative returns was performed. See table 3 for the regressor variables.


4.5.6 Volatility

As mentioned previously, several studies suggest that volatility has a great impact on the number of trades and trading volumes. Therefore, the volatility indices VIX, VVIX and VSTOXX are included. The Cboe Volatility Index, commonly known by its ticker symbol VIX, is a measure of the implied volatility which is being priced into S&P 500 options [53, ch. 2]. More specifically, it is the 30-day implied volatility that is being priced into S&P 500 index options [54]. It is computed using the prices of S&P 500 index options with a maturity of between 23 and 37 days [54]. Five regressor variables were included which relate to the VIX. Firstly, the price of the VIX. Secondly, the positive and negative (relative) returns of the VIX (according to section 4.5.1). Lastly, the positive and negative (absolute) returns of the VIX. Let $\tilde{r}$ denote the absolute return of an asset. Then, using the notation of section 4.5.1, the positive and negative absolute returns are given by:

$$\tilde{r}_t^+ = (p_t - p_{t-1}) \cdot \mathbf{1}\{p_t - p_{t-1} > 0\},$$

$$\tilde{r}_t^- = (p_t - p_{t-1}) \cdot \mathbf{1}\{p_t - p_{t-1} < 0\}.$$

The reason for this is that the absolute return of the VIX could potentially yield additional useful information and help explain the number of trades.

A corresponding European volatility index is the Euro STOXX 50 Volatility (VSTOXX). The VSTOXX is computed in a very similar fashion to the VIX, with the implied volatility derived from option prices on the Euro STOXX 50 index [55, ch. 8]. Two regressor variables related to the VSTOXX were included: the price of the VSTOXX and the change of the VSTOXX.

Further, the volatility of volatility measure VVIX was included in the analysis. It measures the 30-day implied volatility which is being priced into options on the VIX. Three variables related to the VVIX were included in the analysis: the positive and negative returns of the VVIX, as well as the price of the VVIX.

See table 3 for the corresponding regressor variables.

4.5.7 Currencies

On NDX, there are around 750 listed ETPs with various currencies as underlying assets, including foreign exchange rates and cryptocurrencies. Substantial trading takes place in ETPs with these underlying assets, which is why the exchange rates EUR/SEK and USD/SEK, as well as BTC/USD, were included in the analysis.8 For the currency exchange rates EUR/SEK and USD/SEK, the price of the asset as well as the positive and negative returns were included as regressor variables. For BTC/USD, only positive and negative returns were included. See table 3 for a list of the corresponding regressor variables.

8 BTC is the most frequently used currency code for Bitcoin. However, XBT is the ISO 4217 currency code for Bitcoin.


Table 3: A complete collection of all 65 regressor variables

Variable                            Regression variable
Positive return CAC 40              cac40_pos
Negative return CAC 40              cac40_neg
Positive return Cboe UK 100         uk100_pos
Negative return Cboe UK 100         uk100_neg
Positive return DAX 30              dax30_pos
Negative return DAX 30              dax30_neg
Positive return DJIA                djia_pos
Negative return DJIA                djia_neg
Positive return Euro STOXX 50       eurostoxx50_pos
Negative return Euro STOXX 50       eurostoxx50_neg
Positive return MSCI World Index    msci_pos
Negative return MSCI World Index    msci_neg
Positive return NASDAQ-100          nasdaq100_pos
Negative return NASDAQ-100          nasdaq100_neg
Positive return OBX                 obx_pos
Negative return OBX                 obx_neg
Positive return OMXC20              omxc20_pos
Negative return OMXC20              omxc20_neg
Positive return OMXH25              omxh25_pos
Negative return OMXH25              omxh25_neg
Positive return OMXSPI              omxspi_pos
Negative return OMXSPI              omxspi_neg
Positive return S&P 500             spx_pos
Negative return S&P 500             spx_neg
Positive return TSLA                tsla_pos
Negative return TSLA                tsla_neg
Positive return AAPL                aapl_pos
Negative return AAPL                aapl_neg
Positive return AMZN                amzn_pos
Negative return AMZN                amzn_neg
Change HM                           hm_change
Change VOLV                         volv_change
Change ERIC                         eric_change
Change LUPE                         lupe_change
Change DANSKE                       danske_change
Change NOKIA                        nokia_change
Change DNO                          dno_change
Positive return WTI                 wti_pos
Negative return WTI                 wti_neg
Positive return Brent               brent_pos
Negative return Brent               brent_neg
Positive return Gold                gold_pos
Negative return Gold                gold_neg
Positive return Silver              silver_pos
Negative return Silver              silver_neg
Positive return Palladium           palladium_pos
Negative return Palladium           palladium_neg
Price VIX                           vix_price
Positive return VIX (%)             vix_pos_pct
Negative return VIX (%)             vix_neg_pct
Positive return VIX (absolute)      vix_pos_abs
Negative return VIX (absolute)      vix_neg_abs
Price VVIX                          vvix_price
Positive return VVIX                vvix_pos
Negative return VVIX                vvix_neg
Price VSTOXX                        vstoxx_price
Change VSTOXX                       vstoxx_change
Positive return USD/SEK             usdsek_pos
Negative return USD/SEK             usdsek_neg
Price EUR/SEK                       eursek_price
Positive return EUR/SEK             eursek_pos
Negative return EUR/SEK             eursek_neg
Positive return BTC/USD             btc_pos
Negative return BTC/USD             btc_neg


5 Results

5.1 Initial Model

The initially fitted model with all 65 regressor variables and 1,162 observations produced an adjusted $R^2$ of 0.8103. In order to verify the fundamental OLS assumptions, residual analysis was performed. The Tukey-Anscombe plot in figure 3 shows a variance that increases with larger fitted values $\hat{y}$. Thus heteroscedasticity is clearly present. Further, the asymmetry of the residuals around zero suggests problems with the model. It could potentially be due to the lack of some important regressor variables or the lack of a quadratic term of a currently included regressor variable [41, p. 346-348]. The Q-Q plot in figure 3 suggests that the residuals derive from a long-tailed distribution rather than a normal distribution [41, p. 358]. These are serious violations of the model assumptions that need to be dealt with before proceeding with the analysis and model building.

5.1.1 Altered Model

James et al. propose using concave functions to transform the response variable to solve the problem of heteroscedasticity [56, p. 95]. Such concave functions include $\sqrt{y}$ and $\log(y)$. Both transformations were assessed, and the latter proved superior to the former with regard to heteroscedasticity and normality, see figures 4 and 5. Therefore, we chose to proceed with the model

$$\log(y) = X\beta + \varepsilon.$$

Henceforth, this model will be referred to as model A.

5.2 Residual Analysis

The Tukey-Anscombe plot in figure 5 suggests no violation of the assumption $E[\varepsilon] = 0$ as the residuals are scattered rather symmetrically around zero. Indeed, by looking at the red trend line, one can tell it is not exactly zero, but rather close. Further, the plot demonstrates no particular pattern or shape of the residuals. Hence, the plot justifies the assumption of homoscedasticity.

According to the Q-Q plot of model A in figure 5, the residuals are fairly normally distributed. The distribution of the tails is not perfect and systematic deviations are evident. However, if deviations in the tails are systematic, such deviations are less worrying [50, p. 97]. Thus, it remains plausible that errors are indeed normally distributed.

5.3 Autocorrelation

As an initial attempt to visualize any potential autocorrelation in model A, the residuals were plotted in a lag plot, see figure 6a). The lag plot indicates quite severe autocorrelation as the scatter of points forms a straight line (more or less). The positive slope of the line indicates a strong positive serial correlation.

To investigate this further, a Durbin-Watson test was performed. With a test statistic $d = 0.8642218$ and a p-value of less than 2.2e-16, the null hypothesis


Figure 3: Tukey-Anscombe and Q-Q plots for initial fit.

Figure 4: Residual plots for $\sqrt{y} = X\beta + \varepsilon$.

Figure 5: Residual plots for $\log(y) = X\beta + \varepsilon$.


Figure 6: Plots of $e_i$ versus $e_{i-1}$ for (a) model A and (b) model B.

of no autocorrelation between the errors was rejected (see table 4). Thus it is clear that there exists a positive serial correlation.

5.3.1 An Additional Model

To mitigate the problem of autocorrelation, an additional model with a lagged dependent variable was created (henceforth referred to as model B):

$$y_t = \phi y_{t-1} + x_t \beta + \varepsilon_t.$$

Note that the lagged dependent variable $y_{t-1}$ will throughout this thesis be referred to as trade_lag. A Durbin-Watson test was performed on model B. The test statistic $d = 1.524416$ lies inside the interval deemed normal by the rule of thumb in section 3.4.2. The p-value of 2.2e-16 suggests that some positive autocorrelation is still present, although it is not particularly severe. Thus, the lagged dependent variable seems to have largely mitigated the issues with autocorrelation. Further, the lag plot in figure 6b) shows a scatter of points which form a cluster with no clear shape or pattern.

Model    Lag    Autocorrelation    D-W Statistic    p-value
A        1      0.5667505          0.8642218        2.2e-16
B        1      0.235428           1.524416         2.2e-16

Table 4: D-W test statistics

When analyzing the residual plots for model B, similarities to the initial model were detected. The dependent variable was therefore transformed accordingly:

$$\log(y_t) = \phi y_{t-1} + x_t \beta + \varepsilon_t.$$

Henceforth, this analysis will proceed with two models.

5.4 Leverage and Influential Points

Cook's D, DFFITS and COVRATIO were used to detect outliers and influential points. No influential points were detected using Cook's D. However, Cook's D has been criticized for not always succeeding in capturing influential points [57]. In addition, the cutoff values of DFFITS provide guidelines rather than strict rules [40, p. 218]. Hence, the
