
Royal Institute of Technology

Department of Mathematical Statistics

Bachelor Thesis

Modelling the executed volume of the OMXS30™ index futures morning call auction

A collaboration with NASDAQ OMX Market Research

Author:

Jakob Hallmer hallmer@kth.se

Supervisor:

Gunnar Englund gunnare@kth.se

May 21, 2013

Abstract

The thesis aims to study and model the executed volume of the OMXS30™ futures morning call auction. A better understanding of the underlying mechanics of this auction could possibly be used by traders to gain an advantage over the competition. It is also of purely academic interest to ask whether the outcome of the auction can be predicted in a meaningful way.

Short interviews were conducted to determine some of the initial covariates to be included in the model. The data was collected and then used in a multiple linear regression model. Various tests were run to determine which covariates were statistically significant, and those that were not were excluded. This resulted in a final model with an acceptable $R^2$ where all covariates were highly significant. The result was then bootstrapped and cross-validated using standard techniques.

The author draws the conclusion that considering the limitations of the linear model and the semi-random behavior of the auction, the model satisfies the primary purpose of the study. It is also noted that further research, specifically in time series analysis, can probably further elucidate the issue.

Sammanfattning (Summary)

The purpose of the thesis was to study and model the executed volume of OMXS30™ futures in the morning auction. A better understanding of the underlying mechanisms of this auction could possibly be used by traders to gain an advantage over their competitors. It is also of purely academic interest to ask whether the outcome of the auction can be predicted in a meaningful way.

Short interviews were conducted to determine some of the covariates included in the model. The data was collected and then used in a multiple linear regression model. Several tests were carried out to determine which covariates were statistically significant, and those that were not significant were consequently excluded. This resulted in a final model with an acceptable $R^2$ where all covariates were highly significant. The result was then bootstrapped and cross-validated using standard methods.

The author concludes that, given the limitations of the linear model and the somewhat random behavior of the auction, the model fulfils the primary purpose of the study. It was also noted that further research, particularly in time series analysis, can probably clarify the question further.

Acknowledgements

I would like to express my sincere gratitude to Petter Dahlström and Patrik Bråkenhielm. This thesis would not have been possible without their invaluable help and guidance.

I would also like to thank Sina Kazemi Vala for all support and proofreading.

Contents

List of Tables
List of Figures
1 Introduction
2 Financial theory
2.1 Index
2.1.1 OMXS30™
2.2 Derivatives
2.2.1 Futures contracts in detail
2.2.2 Expiration
2.2.3 Closing out positions
2.2.4 Margin and the Clearing House
2.3 The OMXS30™ Futures
2.3.1 Call Auction
3 Mathematical theory
3.1 Multiple regression analysis
3.1.1 Linear regression model
3.1.2 Ordinary Least Squares (OLS)
3.1.3 Assumptions
3.1.4 Covariates
3.1.5 Dummy variables
3.2 Multicollinearity
3.2.1 Variance Inflation Factor (VIF)
3.2.2 Correlation Matrix
3.3 Model Selection and Validation
3.3.1 R Squared and Adjusted R Squared
3.3.2 Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC)
3.3.3 K-Fold Cross-validation
3.3.4 Leave-one-out Cross-validation (LOOCV)
3.3.5 Bootstrap
3.4 Significance test and Hypothesis testing
3.4.1 Hypothesis testing
3.4.2 The t statistic
3.4.3 The F statistic
3.4.4 The p-value
4 Model and methodology
4.1 Confidence level
4.2 Preliminary interviews
4.2.1 Covariates from interviews
4.3 Data collection
4.3.1 Altered data points
4.3.2 Excluded data points
4.4 The Null Model Hypothesis
4.5 Log model
4.6 Covariates
4.6.1 Dummy variables
4.6.2 Transformed covariates
4.7 Model validation
5 Results and analysis
5.1 Intermediary models
5.2 Creating the Final Model
5.3 Final Model
5.4 Analysis of the covariates in the Final Model
5.4.1 Multicollinearity
5.5 Linear Regression Assumptions
5.5.1 Normality
5.5.2 Linearity
5.5.3 Independence
5.5.4 Homoscedasticity
5.5.5 Multicollinearity
5.5.6 Assumptions summary
5.6 Final Model compared to the Intermediary Models
5.7 Final Model compared to the Null Model
5.8 Bootstrapped Final Model
6 Discussion
6.1 Covariates and their Beta-values
6.1.1 In the Final Model
6.1.2 Excluded
6.2 Further research
6.2.1 Additional covariates
6.2.2 Generalized Linear Models
6.2.3 Time series analysis
7 Conclusion
8 Appendices
9 References

List of Tables

1 Suggested and resulting covariates from the interviews
2 OMXS30 Futures Low Price altered data points
3 The stepwise algorithms used to reduce the full model
4 How many times a covariate was in a model
5 Statistics for the Intermediary Models
6 The covariates from the Final Model
7 Bootstrapped β values from the Final Model
8 Variance Inflation Factor Test
9 Results from the Durbin-Watson test
10 Final and Intermediary models comparison
11 Final and Null model comparison
12 ANOVA table comparing the Null Model and the Final Model
13 Correlation Matrix for the Final Model. $x_i$ is the corresponding left side covariate.
14 All covariates in the initial regression model

List of Figures

1 Relative Importance of Covariates - Index 1 is Call Volume Evening Next Futures in Table 6 et cetera.
2 Histogram with Normal Curve
3 Normal Q-Q Plot over standardized residuals
4 Kernel Density Plot
5 Residual vs. Fitted Plot
6 Scale-Location Plot

1 Introduction

The executed volume of the OMXS30™ opening call auction varies greatly from day to day. If this variation can be predicted with a high level of statistical significance, knowledge of the underlying reasons may reveal known and unknown market mechanisms.

The main purpose of the thesis was to model that volume and see if it was possible to predict it with high statistical significance. Secondary purposes included testing which covariates correlated, or did not correlate, with the volume.

To model the volume a regression model based on multiple regression analysis with ordinary least squares (OLS) was used. The model was later bootstrapped and cross validated.

2 Financial theory

In this section some basic financial theory regarding futures and the financial market in general will be explained.

2.1 Index

An index is a statistical measure of a group of data points derived from an arbitrary number of sources. Examples include the S&P 500® which tracks the US Equity Market, the NASDAQ OMX Valueguard-KTH Housing Index (HOX™) which tracks the Swedish house market and the SCB Consumer Price Index (KPI) which tracks consumer prices in Sweden[4, p. 8, 1, 11].

2.1.1 OMXS30™

The OMXS30™ Index according to NASDAQ OMX: ”OMX Stockholm 30 is the Stockholm Stock Exchange’s leading share index. The index consists of the 30 most actively traded stocks on the Stockholm Stock Exchange. The limited number of constituents guarantees that all the underlying shares of the index have excellent liquidity, which results in an index that is highly suitable as underlying for derivatives products. The composition of the OMXS30™ index is revised twice a year. The OMXS30™ Index is a market weighted price index[15].”

The formula for calculating the OMXS30™ Index in SEK is defined as

\[
I_t = \frac{\sum_{i=1}^{n} q_{i,t} \, p_{i,t} \, r_{i,t}}{\sum_{i=1}^{n} q_{i,t} \, (p_{i,t-1} - d_{i,t}) \, r_{i,t-1} \, j_{i,t}} \cdot I_{t-1} \tag{1}
\]

where the index i is the company and t is time. The variable q is number of shares, p is price in the quoted currency, d is dividend, r is exchange rate of quoted currency and j is an adjustment factor for corporate actions[19, p. 6][18, p. 4].

2.2 Derivatives

A derivative is a financial instrument that depends on an underlying instrument[24, p. 16]. Common examples are Options, Futures, Forwards and Swaps[13, p. 3]. They are generally used to redistribute risk, either increasing or decreasing it. For instance, buying a call option will lower your risk while selling one would increase the risk[24, p. 16]. Buying a contract is referred to as taking a long position, while selling is referred to as taking a short position[9, p. 4].

A derivative always has a set contract size which is defined as the deliverable quantity of underlying assets. For instance a stock option can have the common contract size 100 which would mean that if exercised, 100 stocks will have to be delivered. This creates leverage as a small initial investment gives a very large exposure in the underlying.

Not all trading is done on the electronic marketplace. Some is instead traded over-the-counter (OTC) which means a deal is agreed upon between two parties by themselves or via a broker and then reported to the market for clearing.[9, pp. 2-3]

2.2.1 Futures contracts in detail

A futures contract is a standardized derivative contract where both parties are obligated to follow through with the deal. Since both parties are on equal terms (as opposed to with an option) no premium is exchanged[24, p. 46]. Futures are Marked-to-Market (MtM) which means they employ periodic, usually daily, cash settlement against the fixing value. The final fixing is usually against the underlying and not the futures[18, p. 9, 9, p. 27].

2.2.2 Expiration

Derivatives can either employ physical delivery or be cash settled upon expiry.

If an option stipulates physical delivery the underlying will be exchanged if the option is exercised. If the option instead uses cash settlement the equivalent amount of cash will be paid.[9, pp. 33-34]

2.2.3 Closing out positions

The vast majority of contracts with physical delivery never lead to an actual delivery. This is because most traders close out their positions prior to the expiry date. Closing out simply means that the trader acquires the other side of his current position. ”For example, the New York investor who bought a July corn futures contract on March 5 can close out the position by selling (i.e. shorting) one July corn futures contract on, say, April 20.” The investor’s gain or loss is determined by the futures’ price change between the date of purchase and the day he closed it out[9, p. 23].

2.2.4 Margin and the Clearing House

Trading with derivatives is associated with risk. For instance, the short side of a physical delivery style call option deal might not be able to deliver the underlying if he is assigned. For this reason a clearing house will act as an intermediary in all transactions. The clearing house guarantees that all contracts are honored.

This is partly accomplished by enforcing margin requirements.

A margin is collateral deposited into the clearing house’s account. It is usually paid in cash, but some clearing houses accept instruments such as securities and treasury bills, often at a reduced rate. Margin levels are determined by the volatility of the underlying asset.

When a futures position is opened the trader has to provide initial margin. To allow some market movement the clearing house declares a minimum level of margin that has to be kept on the account, known as the maintenance margin. The futures contract is marked-to-market and the daily settlement is paid from (or added to) the trader’s margin capital. If the market moves unfavorably to the trader’s position and the posted margin dips below the maintenance margin, the clearing house will issue a margin call. If the trader does not top up his margin account to the initial margin level, the clearing house will close his position[9, pp. 26-29].

2.3 The OMXS30™ Futures

The OMXS30™ Futures is a Marked-to-Market futures with the OMXS30™ share index as underlying. ”The final settlement is the difference between the previous day’s futures closing price and a volume weighted average price of the OMXS30™ index on the expiration day[16].” The contract size is 100. It expires on the third Friday of the expiration month of the expiration year [18, p. 40].

The naming of the OMXS30™ Futures follows the NASDAQ OMX Nordic Derivatives standardized naming convention

(Underlying short code)(Year)(Month)(Strike) (2)

where year is the last digit of the expiration year and month is a letter A-L for Call Options and Futures and M-X for Put Options and Forwards. The strike price is omitted for non-options.

For instance the OMXS30™ Futures that expires in March 2013 would be called OMXS303C[18, pp. 12-13].
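The naming convention can be sketched in a few lines of Python. This is only an illustration of the rule as described above, restricted to futures and forwards, where the strike is omitted. The function name `series_name` and its parameters are hypothetical and not part of any NASDAQ OMX interface.

```python
import string


def series_name(underlying: str, year: int, month: int, kind: str = "future") -> str:
    """Build (underlying short code)(last digit of year)(month letter).

    Hypothetical helper: only the strike-less cases are handled.
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    if kind == "future":
        offset = 0    # letters A-L
    elif kind == "forward":
        offset = 12   # letters M-X
    else:
        raise ValueError("only 'future' and 'forward' are handled in this sketch")
    letter = string.ascii_uppercase[offset + month - 1]
    return f"{underlying}{year % 10}{letter}"


print(series_name("OMXS30", 2013, 3))  # OMXS303C
```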

2.3.1 Call Auction

The trading day for OMXS30™ Futures starts with a 5 minute long Opening Call Auction at 08:55 CET to determine the equilibrium price prior to ordinary continuous trading. The equilibrium price is calculated using all existing prices between the highest and lowest price where Limit Orders exist, extended one tick up from the highest price and one tick down from the lowest price.

Equilibrium price is the price which achieves the highest volume to be allocated.

If the highest bid is lower than the lowest ask price or if there are multiple highest volumes for different prices the calculation will fall back to other rules.

Finally the trading day ends with a closing call with minor differences from the opening call[17, p. 2].
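The volume-maximizing rule can be illustrated with a minimal Python sketch. It assumes that limit prices are given as integer tick counts and that orders are simple (price, quantity) pairs. The function name `equilibrium_prices` is hypothetical, and the fall-back rules for ties or a non-crossing book are not modelled, so the sketch simply returns every price that achieves the highest volume.

```python
def equilibrium_prices(bids, asks):
    """Return (max executable volume, list of prices achieving it).

    bids and asks are lists of (limit_price_in_ticks, quantity) tuples.
    Assumption: prices are integers counted in ticks.
    """
    limit_prices = [price for price, _ in bids + asks]
    if not limit_prices:
        return 0, []
    best_volume, best_prices = 0, []
    # Candidate prices run from the lowest to the highest limit price,
    # extended one tick in each direction.
    for price in range(min(limit_prices) - 1, max(limit_prices) + 2):
        demand = sum(qty for limit, qty in bids if limit >= price)
        supply = sum(qty for limit, qty in asks if limit <= price)
        volume = min(demand, supply)  # volume that could be matched at this price
        if volume > best_volume:
            best_volume, best_prices = volume, [price]
        elif volume == best_volume and volume > 0:
            best_prices.append(price)
    return best_volume, best_prices


bids = [(4102, 10), (4100, 5)]
asks = [(4099, 8), (4102, 4)]
print(equilibrium_prices(bids, asks))  # (10, [4102])
```

When more than one price is returned, or the volume is zero, the exchange's other rules would decide the outcome.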

3 Mathematical theory

3.1 Multiple regression analysis

Multiple regression analysis is a statistical technique for estimating the linear relationship between a dependent variable and multiple covariates.

The most common example is the Ordinary Least Squares (OLS) regression but there are many more generalized linear regressions. Examples include the Logit where the dependent variable is binary or a Poisson generalized model where a Poisson distribution is assumed for the dependent variable.

3.1.1 Linear regression model

The general linear regression model is defined as

\[
y_i = \sum_{k=0}^{n} \beta_k x_{i,k} + \epsilon_i, \quad i = 1, \dots, j \tag{3}
\]

where $y_i$ is an observation of the dependent stochastic variable y, $\beta_k$ are the regression coefficients, $x_{i,k}$ are the regressors or covariates and $\epsilon_i$ is the associated error term. Here $x_{i,0}$ is identically 1 if we define $\beta_0$ as the commonly included intercept[28, p. 71].

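Equation 3 can also be written in matrix form

\[
Y = X\beta + \epsilon \tag{4}
\]

where

\[
Y = \begin{pmatrix} y_1 \\ \vdots \\ y_j \end{pmatrix}, \quad
X = \begin{pmatrix} x_1^T \\ \vdots \\ x_j^T \end{pmatrix}
  = \begin{pmatrix} x_{1,0} & \cdots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{j,0} & \cdots & x_{j,n} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_n \end{pmatrix}, \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_j \end{pmatrix} \tag{5}
\]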

3.1.2 Ordinary Least Squares (OLS)

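The Ordinary Least Squares (OLS) estimate of $\beta$ is the value $\hat{\beta}$ that minimizes the sum of squares $\hat{\epsilon}^T \hat{\epsilon} = |\hat{\epsilon}|^2$ of the residuals $\hat{\epsilon} = Y - X\hat{\beta}$. One way to achieve this is by solving the normal equations[8, pp. 20,21]

\[
X^T \hat{\epsilon} = 0 \tag{6}
\]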
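As a short worked step, substituting the residual definition into the normal equations gives the familiar closed form for the estimate. This assumes that $X^T X$ is invertible, which is the case when there is no perfect multicollinearity.

```latex
\begin{align*}
X^T (Y - X\hat{\beta}) &= 0 \\
X^T X \hat{\beta} &= X^T Y \\
\hat{\beta} &= (X^T X)^{-1} X^T Y
\end{align*}
```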

3.1.3 Assumptions

Linear regression with Ordinary Least Squares makes a number of assumptions that if violated can result in many different problems resulting in at best an inefficient model and at worst a seriously biased or misleading model. The most principal assumptions are

Linearity: The relationship between the dependent and independent variables is linear.

Homoscedasticity: The error terms $\epsilon_i$ are homoscedastic, which implies constant variance

\[
\operatorname{var}\{\epsilon_i \mid X_{1,i}, \dots, X_{n,i}\} = \sigma^2 \tag{7}
\]

Normality: The error terms $\epsilon_i$ are normally distributed with expected value zero.

\[
\epsilon_i \sim N(0, \sigma^2) \tag{8}
\]

Independence: The error terms $\epsilon_i$ are uncorrelated with each other and with the independent variables. This includes no autocorrelation.

Multicollinearity: No perfect multicollinearity.

These assumptions need to be carefully checked and evaluated when forming a linear regression model[14].

3.1.4 Covariates

Covariates are the explanatory x variables in the regression model. Possible examples are the outside temperature in degrees if the dependent variable is humidity, or a person’s age in years if the dependent variable is the risk of a certain type of cancer[28, p. 23].

3.1.5 Dummy variables

A dummy variable is a special covariate that only takes binary states, e.g. 0 or 1.

They are usually used to control for specific variables that cannot be continuous and are mutually exclusive such as if the person is a man or a woman. To control for this a dummy variable for woman could be used that is 1 if the person is a woman and 0 if the person is a man. In that case man would be the benchmark or the ’default state’. Adding another dummy for man would create singularities when inverting the matrix for the linear model[28, p. 218].

3.2 Multicollinearity

Multicollinearity is a phenomenon when two or more covariates in a regression model are highly correlated. This has the effect that coefficient estimates may change erratically with small changes in the model which in turn means that the results from two or more highly correlated covariates cannot be trusted.

It also has the unfortunate side effect that the numerical matrix inversion used when computing an OLS estimate may fail, because the matrix becomes numerically singular.

You can practically never eliminate multicollinearity between covariates so what one generally means when saying there is none is that the multicollinearity is less than the accepted level[28, pp. 96,101].

3.2.1 Variance Inflation Factor (VIF)

One way to quantify the multicollinearity is to employ a VIF test. The VIF is defined as

\[
\mathrm{VIF}(\beta_i) := \frac{1}{1 - R_i^2} \tag{9}
\]

where $R_i^2$ is the $R^2$ obtained when running an OLS regression with the covariate $X_i$ as the dependent variable against all other covariates.

The square root of the VIF can be interpreted as how much larger the standard error is compared with what it would be if that covariate were uncorrelated with the other covariates.

A common suggested threshold for ’too high’ is a VIF greater than 5 or 10[10, pp. 199,200].
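As a worked example with an illustrative value (not taken from the thesis data), suppose regressing one covariate on all the others gives $R_i^2 = 0.8$. The VIF then sits exactly at the lower of the two thresholds, and the standard error of that coefficient is a little more than twice what it would be without the correlation.

```latex
\[
\mathrm{VIF} = \frac{1}{1 - 0.8} = 5, \qquad \sqrt{\mathrm{VIF}} = \sqrt{5} \approx 2.24
\]
```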

3.2.2 Correlation Matrix

A Correlation Matrix is a matrix of correlation elements $\rho_{i,j} \in [-1, 1]$. It is used to describe the linear correlation between the two random variables $X_i$ and $Y_j$.

\[
\rho_{X,Y} = \operatorname{Corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \tag{10}
\]

A value of 1 implies a perfect positive linear relationship while a value of -1 implies a perfect negative linear relationship. A value of 0 indicates no linear relationship[10, pp. 283,284]. For a series of observations x, y the sample correlation coefficient $r_{x,y}$ corresponding to $\rho_{X,Y}$ is then

\[
r_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) s_x s_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{11}
\]
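The sample correlation coefficient is straightforward to compute directly from its definition. The following Python sketch uses only the standard library; the function name `sample_correlation` and the toy data are illustrative assumptions.

```python
import math


def sample_correlation(x, y):
    """Sample correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy data: y is an exact increasing linear function of x,
# so the coefficient is 1 up to floating point rounding.
print(sample_correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```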

3.3 Model Selection and Validation

When running a regression it is of importance to be able to compare two or more models to decide which is the better one. There is no test to determine the true ’best model’ and therefore multiple tests will be used.

3.3.1 R Squared and Adjusted R Squared

The R Squared or $R^2$ is a statistic used in linear regression for measuring the goodness of fit of a model. It can hold values from 0 to 1 where 0 is essentially no fit and 1 is a perfect fit. $R^2$ is defined as

\[
R^2 := 1 - \frac{SS_{err}}{SS_{tot}} = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2} \tag{12}
\]

where $y_i$ are the observed values, $\bar{y}$ is the mean of the observed values, $f_i$ are the fitted values, $SS_{tot}$ is the total sum of squares and $SS_{err}$ is the sum of squared residuals.

$R^2$ is subject to overfitting, i.e. it will increase as we add more covariates. To combat this the Adjusted R Squared or $R^2_{adj}$ can be used, which adds a penalty for every covariate. $R^2_{adj}$ is defined as

\[
R^2_{adj} := 1 - (1 - R^2) \frac{n - 1}{n - p - 1} = R^2 - (1 - R^2) \frac{p}{n - p - 1} \tag{13}
\]

where p is the total number of explanatory variables excluding the intercept and n is the sample size[40,41 28, pp. 192,202, 28].
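A minimal Python sketch of the two statistics follows, with $SS_{err}$ as the sum of squared residuals and $SS_{tot}$ as the total sum of squares. The function names and the toy numbers are illustrative assumptions, not thesis data.

```python
def r_squared(observed, fitted):
    """R^2 = 1 - SS_err / SS_tot."""
    mean_y = sum(observed) / len(observed)
    ss_err = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_err / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R^2 for p explanatory variables (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Toy data: four observations and the fitted values of a one-covariate model.
y = [1.0, 2.0, 3.0, 4.0]
f = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, f)
print(round(r2, 4), round(adjusted_r_squared(r2, n=4, p=1), 4))  # 0.98 0.97
```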

3.3.2 Bayesian Information Criterion (BIC) and Akaike Information Criterion (AIC)

Two tests for model validation are the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) which are defined as

\[
\mathrm{BIC} := -2 \ln L + p \ln n \tag{14}
\]

\[
\mathrm{AIC} := -2 \ln L + 2p \tag{15}
\]

where n is the number of observations, p is the number of explanatory variables and $\ln L$ is the log-likelihood, i.e. the logarithm of the likelihood L, which is the probability of the sample given the parameters of the model.

They differ only in the second term where AIC has the coefficient 2 and BIC has the natural logarithm of n. As such BIC penalizes larger models more heavily than AIC for n > 7 which is the case for all but the smallest samples.

The goal with both AIC and BIC testing is to minimize their value. When comparing two models the ’best’ one is therefore the one with the smallest BIC or AIC. Since the two differ in how they compute their value it is possible for one model to have the smallest AIC and at the same time the largest BIC[10, pp. 208,209, 22].
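The possibility of the two criteria disagreeing can be illustrated with a small Python sketch. The log-likelihood values and covariate counts below are made up for the illustration; only the sample size of 507 matches the number of observations reported later in the thesis.

```python
import math


def aic(log_likelihood, p):
    return -2 * log_likelihood + 2 * p


def bic(log_likelihood, p, n):
    return -2 * log_likelihood + p * math.log(n)


n = 507
small = {"log_likelihood": -100.0, "p": 3}  # hypothetical smaller model
large = {"log_likelihood": -96.0, "p": 6}   # hypothetical larger model

# The larger model has the lower AIC, but the smaller model has the lower BIC,
# because ln(507) is roughly 6.2 per covariate compared with 2 for AIC.
print(aic(**small), aic(**large))
print(bic(**small, n=n), bic(**large, n=n))
```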

3.3.3 K-Fold Cross-validation

K-Fold Cross-validation is a technique used to assess how a model will generalize to an independent data set. The general idea is to randomly partition the original sample into K equal sized ’folds’. Each fold is then removed, in turn, while the remaining folds are used to re-fit the model and to predict against the removed fold[27].

3.3.4 Leave-one-out Cross-validation (LOOCV)

Leave-one-out Cross-validation is a special case of K-Fold Cross-validation where K equals the number of observations in the sample[27].
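The partitioning step shared by both techniques can be sketched as follows. This is a generic illustration using only the Python standard library; the function name `k_fold_splits` and the fixed seed are assumptions made for the example. Folds differ in size by at most one observation when n is not divisible by K.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, held_out_indices) for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)       # random partition of the sample
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        yield train, held_out


# 5-fold cross-validation on 10 observations: each fold holds out 2 of them.
for train, held_out in k_fold_splits(10, 5):
    print(len(train), sorted(held_out))

# LOOCV is the special case k = n.
print(sum(1 for _ in k_fold_splits(7, 7)))  # 7
```

In each iteration the model would be re-fitted on `train` and its predictions compared with the observations in `held_out`.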

3.3.5 Bootstrap

Bootstrap is a technique where sample data is resampled. For regression problems this means case resampling, where all rows are resampled as a unit. The simplest bootstrap method takes the original data set of size n and samples from it, with replacement, to form a new sample of the same size n. The new sample is then used to compute whatever statistic we are interested in. This is repeated until we have reached our set replication limit.

To get a good bootstrap it is important to do ’enough’ replications, where ’enough’ is usually a number above 10000 but recommended to be ’as high as possible’ considering time and CPU cycle limitations[26].

Confidence intervals were created using the bias-corrected and accelerated (BCa) bootstrap. It adjusts for both bias and skewness in the bootstrap distribution.
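Case resampling can be sketched in a few lines of Python. Note that this sketch uses a plain percentile interval, which is simpler than the BCa interval named above and does not correct for bias or skewness. The function names, the toy rows and the seed are illustrative assumptions.

```python
import random


def bootstrap_statistic(rows, statistic, replications=10_000, seed=0):
    """Resample whole rows with replacement and recompute the statistic."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(replications):
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    return estimates


def percentile_interval(estimates, level=0.90):
    """Simple percentile interval (not BCa)."""
    ordered = sorted(estimates)
    alpha = 1 - level
    lower = ordered[int(alpha / 2 * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


# Toy rows of (covariate, response); the statistic is the mean response.
rows = [(1, 4.2), (2, 5.1), (3, 4.8), (4, 6.0), (5, 5.5), (6, 6.3)]
mean_response = lambda sample: sum(y for _, y in sample) / len(sample)
estimates = bootstrap_statistic(rows, mean_response, replications=2000)
print(percentile_interval(estimates))
```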

3.4 Significance test and Hypothesis testing

3.4.1 Hypothesis testing

To test a hypothesis, an assumed true null hypothesis $H_0$ and an alternative hypothesis $H_1$ are set up. The null hypothesis is usually the status quo or the state where there is no change, e.g. the medicine does not help. The alternative hypothesis in this case is then that the medicine does help. This is then tested against a significance level α, which is how often we accept rejecting $H_0$ when it is true.

Common levels are 10%, 5% and 1% but the level can be chosen arbitrarily. If the p-value of the test is less than our set risk level we reject $H_0$ in favor of the alternative hypothesis, and similarly if it is greater than our risk level we fail to reject $H_0$[28, pp. 748,768].

3.4.2 The t statistic

In linear regression hypothesis testing with one variable the t statistic is often used. It is usually written

\[
t_{stat} = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard error}} \tag{16}
\]

where the hypothesized value usually is 0. This t statistic follows the t distribution with $\nu = n - k - 1$ degrees of freedom, where n is the number of observations and k is the number of covariates. The t distribution’s probability density function is defined as

\[
t(t_{stat}; \nu = n - k - 1) := \frac{\Gamma\left(\frac{\nu + 1}{2}\right)}{\sqrt{\nu\pi} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t_{stat}^2}{\nu}\right)^{-\frac{\nu + 1}{2}} \tag{17}
\]

where ν is the degrees of freedom and Γ is the Gamma function[28, pp. 119,135].
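The t statistic and the density of the t distribution can be evaluated with the Python standard library. The sketch below uses `math.lgamma` rather than `math.gamma` so that the density can be evaluated for large degrees of freedom without overflow. The function names and the example numbers are illustrative assumptions.

```python
import math


def t_statistic(estimate, standard_error, hypothesized=0.0):
    return (estimate - hypothesized) / standard_error


def t_pdf(t, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_const = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    const = math.exp(log_const) / math.sqrt(nu * math.pi)
    return const * (1 + t * t / nu) ** (-(nu + 1) / 2)


# With nu = 1 the t distribution is the Cauchy distribution, whose density
# at zero is 1/pi. For large nu the density approaches the standard normal.
print(t_pdf(0.0, 1), 1 / math.pi)
print(t_pdf(0.0, 500))
```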

3.4.3 The F statistic

Another similar statistic is the F statistic, which is more often used to test multiple hypotheses about the underlying β values. An example is when comparing a restricted model to an unrestricted model. F is defined as

\[
F_{stat} := \frac{SSR_r - SSR_{ur}}{q} \cdot \frac{n - k - 1}{SSR_{ur}} \tag{18}
\]

where $SSR_r$ and $SSR_{ur}$ are the sums of squared residuals from the restricted and unrestricted models, q is the number of exclusion restrictions, n is the number of observations and k is the number of covariates. The F distribution’s probability density function is defined as

\[
F(F_{stat}; d_1 = q, d_2 = n - k - 1) := \frac{\sqrt{\dfrac{(d_1 F_{stat})^{d_1} \, d_2^{d_2}}{(d_1 F_{stat} + d_2)^{d_1 + d_2}}}}{F_{stat} \, B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \tag{19}
\]

where B is the beta function.

For a two-sided test where only one covariate is dropped, $t^2_{n-k-1}$ can be shown to be equal to $F_{1,n-k-1}$[28, pp. 142-153].
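The F statistic itself is a one-line computation once both models have been fitted. The sketch below uses made-up sums of squared residuals purely to show the arithmetic; the function name `f_statistic` is hypothetical.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for q exclusion restrictions, with k covariates in the unrestricted model."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


# Hypothetical example: dropping q = 2 covariates raises the SSR from 100 to 120,
# with n = 103 observations and k = 2 covariates in the unrestricted model.
print(f_statistic(120.0, 100.0, q=2, n=103, k=2))  # 10.0
```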

3.4.4 The p-value

Given the observed value of the t or F statistic, the p-value is defined as the smallest significance level at which the null hypothesis would be rejected. To compute the p-value, the area under the F or t distribution outside of our F or t statistic is measured. The p-value is in practice used when forming a hypothesis test, e.g. rejecting $H_0$ if the p-value is less than α[28, pp. 131-139].

4 Model and methodology

4.1 Confidence level

The confidence level for all hypothesis testing was set at 90%, which gives a significance level α of 0.10. This essentially means accepting that a true null hypothesis is rejected once every ten times. To increase the accuracy of the hypothesis testing we will bootstrap the β-values for all covariates.

For the covariates the goal was a p-value lower than 0.05, but up to 0.10 (p ≤ α) as denoted by our α was considered good enough. To complement this, BIC and AIC were used. A somewhat higher p-value was deemed acceptable if the BIC and AIC indicated that it was still a good idea to include the covariate and it was reasonable to do so.

4.2 Preliminary interviews

To get an initial set of covariates and a general idea of the feasibility of the study a few interviews were carried out. Interviewees were traders and similar at NASDAQ OMX with a good knowledge of the call auction as well as the financial market in general.

Most interviewees thought that it was indeed possible to predict the volume to some degree and most gave similar suggestions for possible covariates. Only a minority believed it to be a futile cause.

4.2.1 Covariates from interviews

The following table was the result of the interviews.

Table 1: Suggested and resulting covariates from the interviews

Suggested variable                                      Resulting covariate
Closeness to expiry                                     Dummy variables 0-4 days from expiry
Volatility                                              Volatility in current and next futures
Opening price in current futures                        Opening price divided by closing price in current futures
Opening price in a market that opens before Stockholm   Opening price divided by closing price in Nikkei 225 Index (N225)

4.3 Data collection

All data used in this thesis was collected exclusively from two sources:

• NASDAQ OMX Internal Database

• Yahoo Finance

This data was later used to calculate additional covariates. For instance the covariate N225 Volatility was calculated using N225 High/N225 Low. All trade data is single counted, which means only one side of the transaction is considered. See Table 14 for more details.

In total 507 observations were collected and used after all exclusions. This is essentially all data from the start of the morning futures auction on 2011-04-04 to the April expiry on 2013-04-19.

4.3.1 Altered data points

A few data points were modified due to missing or inconsistent data.

Two values for OMXS30 Future Low Price were changed due to an extremely low value. This was probably due to incorrect trades removed T+1 by the market surveillance. The database used for this data does not pick up those changes. The correct low was found by checking the lowest of all still active contracts for the day in question.

Table 2: OMXS30 Futures Low Price altered data points

Contract    Date          Incorrect low   Correct low
OMXS302K    2012-10-24    10.50           1024.50
OMXS302C    2012-03-07    10.69           1055.00

For the Nikkei 225 Index (N225) values for 27 days were missing due to different bank holidays. Here an average of the past two business days was used.
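The imputation rule for missing N225 values can be sketched as below. The thesis does not say how consecutive missing days were handled, so this sketch assumes that an already imputed value may be used when averaging. The function name `fill_missing` and the toy series are hypothetical.

```python
def fill_missing(values):
    """Replace None with the average of the two preceding (possibly imputed) values."""
    filled = []
    for value in values:
        if value is None:
            if len(filled) < 2:
                raise ValueError("need two earlier values to impute")
            # Assumption: earlier imputed values count as 'past business days'.
            value = (filled[-1] + filled[-2]) / 2
        filled.append(value)
    return filled


# Toy series with one missing day (not actual N225 data).
print(fill_missing([100.0, 102.0, None, 104.0]))  # [100.0, 102.0, 101.0, 104.0]
```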

OMXS30 Next Future had zero volume on 28 occasions, which meant no Volatility, Open/Close and Close/Open could be calculated. To solve this the value 1 was used for those covariates.

Due to the dependent variable being log transformed (more on this in section 4.5) it could not hold the value 0 since the logarithm of zero is undefined. It was changed to 1 on 16 occasions.

4.3.2 Excluded data points

Three data points were removed due to them being extreme outliers. This was done visually by looking at different plots and by a Bonferroni Outlier Test for Studentized residuals where the p-values were much greater than our accepted 0.10[6].

4.4 The Null Model Hypothesis

To set up a null hypothesis a null model was created. It is simply an OLS regression without any regressors, resulting in a calculated intercept of 4.9695.

\[
\log Y_i = 4.9695 + \epsilon_i \tag{20}
\]

This model was later used to test if our model was significantly different from ’no model at all’.

4.5 Log model

In some cases it can be beneficial to transform the dependent variable. One of these transformations is the log transformation:

\[
\log Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \tag{21}
\]

This changes the interpretation of the coefficient so that a one unit change of X results in approximately a $100\beta_1\%$ change of Y for small β-values.

Log transforming the dependent variable also has the added benefit that it often reduces the variance of the covariates’ coefficients. Therefore this is a good way to reduce heteroscedasticity.

A log transformed model with base e was used for all models.
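The percentage interpretation of a coefficient in the log model is an approximation. As a worked check, a one unit increase in X multiplies Y by $e^{\beta_1}$ when everything else is held fixed. The β values below are illustrative and not taken from the thesis results.

```latex
\begin{align*}
\frac{Y(X + 1)}{Y(X)} &= e^{\beta_1}, \qquad
\text{percentage change} = 100\,(e^{\beta_1} - 1)\,\% \\
\beta_1 = 0.05 &: \quad 100\,(e^{0.05} - 1) \approx 5.13\,\% \quad (\text{approximation: } 5\,\%) \\
\beta_1 = 0.5 &: \quad 100\,(e^{0.5} - 1) \approx 64.9\,\% \quad (\text{approximation: } 50\,\%)
\end{align*}
```

The approximation is therefore close for small coefficients but degrades quickly as $|\beta_1|$ grows.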

4.6 Covariates

Covariates included were a mix of suggested covariates from the interviews and some other sensible additions including time lagged versions.

The final list of all 43 covariates used in the initial regression model can be found in Table 14 in the Appendix.

4.6.1 Dummy variables

Due to the nature of futures contracts and the effect of certain events a few dummy variables were created.

Half day Half days are special events in the financial market. Therefore a dummy to control for half days was added to the model. In total it had a non-zero value on nine occasions.

Short week A short week usually indicates that there is a bank holiday sometime during the week. In total it had a non-zero value on 63 occasions.

Week days To check if there was any difference between the weekdays, dummies were added for Monday, Tuesday, Thursday and Friday with Wednesday as benchmark.

Closeness to expiry The closer the expiry date, the fewer days are left to trade and benefit from the contract. Therefore dummies were added for 0, 1, 2, 3 and 4 days until expiry. Day 0 means it is the expiration day.

Expiration Week Since the day 4 days until expiry is not always in the expiration week (e.g. due to holidays), a dummy was added for the expiration week. This dummy was not expected to give very good results since it would be heavily correlated with the ‘closeness to expiry’ dummies.
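The dummy coding described above can be sketched as follows. This is only an illustration: the function `dummy_row` and its arguments are hypothetical, and the number of trading days until expiry is assumed to be computed elsewhere from a trading calendar.

```python
from datetime import date

# Weekday dummies; Wednesday (weekday number 2) is the benchmark and gets no column.
WEEKDAYS = {0: "Monday", 1: "Tuesday", 3: "Thursday", 4: "Friday"}

def dummy_row(day: date, days_to_expiry: int, half_day: bool, short_week: bool) -> dict:
    """Return the 0/1 dummy variables for one trading day.

    days_to_expiry is assumed to be counted in trading days by the caller.
    """
    row = {"Half day": int(half_day), "Short week": int(short_week)}
    for number, name in WEEKDAYS.items():
        row[name] = int(day.weekday() == number)
    for k in range(5):  # 0, 1, 2, 3 and 4 days until expiry
        row[f"{k} days to exp"] = int(days_to_expiry == k)
    return row

# Hypothetical example: a Friday treated as an expiration day.
example = dummy_row(date(2013, 3, 15), days_to_expiry=0, half_day=False, short_week=False)
print(example)
```

Leaving out the Wednesday column avoids perfect collinearity with the intercept, so each weekday coefficient is read as a difference from Wednesday.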


4.6.2 Transformed covariates

After the initial tests a reduced model was found. To improve the model even further, squared terms were added for all remaining count covariates (e.g. volume and volume²). The reasoning is that diminishing returns on covariates are very common. For example, an argument can be made that studying for 5 years is almost linearly beneficial to your income but studying for 20 years is not. Adding a squared term would in that case model the behavior better and would result in a positive β for the linear term and a negative β for the squared term.

4.7 Model validation

To reduce the initial model, four iterative stepwise elimination algorithms were used.

Table 3: The stepwise algorithms used to reduce the full model

          Information Criterion   Direction
Model 1   Akaike (AIC)            Forward
Model 2   Akaike (AIC)            Backward
Model 3   Bayesian (BIC)          Forward
Model 4   Bayesian (BIC)          Backward

Backward elimination means that the full model was used as the starting point. Covariates were then removed one at a time and the AIC or BIC was calculated for each removal. The covariate whose removal improved the criterion the most was removed from the model and then the procedure was started over again. This continued until there was no more improvement.

Forward selection is essentially the same as Backward elimination, but the starting point is a fully reduced model (intercept only) and covariates are added instead of removed.
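The backward procedure can be sketched in a few lines of plain Python. This is a minimal illustration on made-up data, not the software used for the thesis. All function names are hypothetical, and the AIC is computed only up to an additive constant, which does not affect the comparison between models fitted to the same data.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    n, p = len(X), len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

def aic(X, y, cols):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    n = len(y)
    design = [[1.0] + [row[c] for c in cols] for row in X]  # intercept + chosen columns
    return n * math.log(ols_rss(design, y) / n) + 2 * (len(cols) + 1)

def backward_eliminate(X, y):
    """Start from the full model and drop one covariate at a time while AIC improves."""
    cols = list(range(len(X[0])))
    best = aic(X, y, cols)
    while cols:
        # AIC after removing each remaining covariate in turn.
        trials = [(aic(X, y, [c for c in cols if c != drop]), drop) for drop in cols]
        score, drop = min(trials)
        if score >= best:  # no removal improves the criterion: stop
            break
        best = score
        cols.remove(drop)
    return cols, best

# Made-up data: only columns 0 and 2 actually influence y.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
y = [2.0 + 1.5 * row[0] - 1.0 * row[2] + rng.gauss(0, 0.5) for row in X]
kept, score = backward_eliminate(X, y)
print("kept columns:", kept, "AIC:", round(score, 1))
```

Replacing the penalty `2` with `math.log(n)` gives the BIC variant, which penalizes extra covariates harder and therefore tends to keep fewer of them.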

5 Results and analysis

In this section intermediary models will be analyzed and improved into a Final Model. The β values of all resulting covariates in the Final Model will be discussed. Their respective p-values will also be analyzed.

5.1 Intermediary models

The stepwise model validation produced four different models. Each of the following models came from the associated index in Table 3.

Out of 23 remaining covariates 13 were present in more than one model.


Table 4: How many times a covariate was in a model

Covariate                                        Count
Evening Current Futures Previous Business Day    4
Evening Next Futures                             4
0 days to exp                                    4
1 days to exp                                    4
Morning Current Futures Previous Business Day    4
Short week                                       3
Evening Next Futures Previous Business Day       2
Expiration week                                  2
Friday                                           2
Total Volume Current Futures                     2
2 days to exp                                    2
3 days to exp                                    2
4 days to exp                                    2

Statistics were calculated for all models to be used as a way of choosing a candidate for the Final Model.

Table 5: Statistics for the Intermediary Models

          AIC    BIC    R² Adj   LOOCV Mean Square
Model 1   1345   1400   0.562    0.838
Model 2   1338   1422   0.573    0.827
Model 3   1358   1391   0.546    0.859
Model 4   1344   1394   0.562    0.835

All four models had several problems, such as multicollinearity, low significance of some covariate β values and low significance of the intercept.

From this data Model 2 was chosen to be used for constructing the Final Model. It was chosen because it had the best AIC, adjusted R² and LOOCV Mean Square. It also had good p-values for most βs and contained most of the covariates that appeared in more than one model.

5.2 Creating the Final Model

To increase the effectiveness of the model, squared terms were added for all non-dummies.

This resulted in some added multicollinearity which was checked by the VIF test.

To combat this, and to negate the problem that the averages of most covariates differed by many orders of magnitude, covariate standardization was employed on all non-dummy covariates. Standardization is when the covariate cov_i is transformed as

$$\widehat{cov}_i = \frac{cov_i - \mathrm{average}(data_i)}{\mathrm{std}(data_i)} \tag{22}$$

The squared term is then $\widehat{cov}_i^2$.

Essentially this means that all standardized covariates now have mean 0 and standard deviation 1. A standardized variable behaves differently than before. For example, if an observation now has the value 0.5 it is half a standard deviation above the mean. A value of -4 means it is four standard deviations below the mean.
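A minimal sketch of the standardization, using made-up volumes. It assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the function name `standardize` is illustrative.

```python
import statistics

def standardize(values):
    """Subtract the sample mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: n - 1 denominator
    return [(v - mean) / std for v in values]

volumes = [120.0, 340.0, 95.0, 410.0, 260.0]  # made-up example data
z = standardize(volumes)
squared = [v ** 2 for v in z]  # squared term built from the standardized value

print([round(v, 2) for v in z])
```

Squaring the standardized value, rather than standardizing the squared raw value, matches the order of operations described above.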

Covariates still showing a high amount of multicollinearity and a p > 0.10 were removed.

Appropriate interaction terms were added but all had a p > 0.10 and were summarily removed from the model.

Thereafter all covariates were removed one by one and the model was checked with AIC, BIC, adjusted R² and LOOCV Mean Square to see if it could be reduced even further.

When AIC and BIC disagreed, AIC was given precedence if the difference was small or zero.

5.3 Final Model

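The Final Model by OLS is as follows

$$
\begin{aligned}
\ln\langle\text{Call Volume Mor Curr Fut}\rangle = {} & 5.6064 \\
& + 0.2373 \cdot (\langle\text{Call Volume Eve Next Fut}\rangle - 508)/1231 \\
& + 0.3123 \cdot \langle\text{Short week}\rangle \\
& - 4.7143 \cdot \langle\text{0 days to exp}\rangle \\
& - 2.6695 \cdot \langle\text{1 days to exp}\rangle \\
& - 2.1457 \cdot \langle\text{2 days to exp}\rangle \\
& - 2.0062 \cdot \langle\text{3 days to exp}\rangle \\
& - 0.7257 \cdot \langle\text{4 days to exp}\rangle \\
& + 0.2574 \cdot (\langle\text{Call Volume Mor Curr Fut Prev BDay}\rangle - 261)/236 \\
& - 0.0333 \cdot ((\langle\text{Call Volume Mor Curr Fut Prev BDay}\rangle - 261)/236)^2 \\
& + 0.1157 \cdot (\langle\text{Call Volume Eve Curr Fut Prev BDay}\rangle - 3691)/1960 \\
& + 0.1877 \cdot (\langle\text{Call Volume Eve Next Fut Prev BDay}\rangle - 318)/891 \\
& + 0.4452 \cdot (\langle\text{Total Volume Curr Fut}\rangle - 100363)/69450 \\
& - 0.1183 \cdot (\langle\text{Volatility Curr Fut}\rangle - 1.02)/0.0111 \\
& - 0.2117 \cdot \langle\text{Friday}\rangle
\end{aligned}
\tag{23}
$$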
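To make the equation concrete, the sketch below evaluates the Final Model for a given day. The coefficients, means and standard deviations are copied from Equation 23. The dictionary layout and the function name `predict_log_volume` are illustrative choices, not part of the thesis.

```python
import math

INTERCEPT = 5.6064

# (coefficient, mean, standard deviation) of the standardized covariates
STANDARDIZED = {
    "Call Volume Eve Next Fut": (0.2373, 508, 1231),
    "Call Volume Mor Curr Fut Prev BDay": (0.2574, 261, 236),
    "Call Volume Eve Curr Fut Prev BDay": (0.1157, 3691, 1960),
    "Call Volume Eve Next Fut Prev BDay": (0.1877, 318, 891),
    "Total Volume Curr Fut": (0.4452, 100363, 69450),
    "Volatility Curr Fut": (-0.1183, 1.02, 0.0111),
}
SQUARED_COEF = -0.0333  # squared term of Call Volume Mor Curr Fut Prev BDay

DUMMIES = {
    "Short week": 0.3123,
    "0 days to exp": -4.7143,
    "1 days to exp": -2.6695,
    "2 days to exp": -2.1457,
    "3 days to exp": -2.0062,
    "4 days to exp": -0.7257,
    "Friday": -0.2117,
}

def predict_log_volume(covariates: dict, dummies: dict) -> float:
    """Predicted ln(morning call volume) for raw covariate values and 0/1 dummies."""
    total = INTERCEPT
    for name, (coef, mean, std) in STANDARDIZED.items():
        z = (covariates[name] - mean) / std
        total += coef * z
        if name == "Call Volume Mor Curr Fut Prev BDay":
            total += SQUARED_COEF * z ** 2
    for name, coef in DUMMIES.items():
        total += coef * dummies.get(name, 0)
    return total

# A day on which every non-dummy covariate equals its mean and no dummy is set.
average_day = {name: mean for name, (_, mean, _) in STANDARDIZED.items()}
log_average = predict_log_volume(average_day, {})
log_expiry = predict_log_volume(average_day, {"0 days to exp": 1})
print(round(math.exp(log_average)), round(math.exp(log_expiry), 1))
```

On such an average day the prediction equals the intercept, which corresponds to roughly 272 contracts after exponentiation. Note that exponentiating a predicted logarithm gives a typical (median-like) volume rather than the expected volume.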


5.4 Analysis of the covariates in the Final Model

Table 6: The covariates from the Final Model.

Covariate                                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                5.6064      0.0806    69.60  < 2e-16  ***
Call Volume Evening Next Futures_S         0.2373      0.0835     2.84  0.00469  **
Short week                                 0.3123      0.1225     2.55  0.00469  *
0 days to exp                             -4.7143      0.4444   -10.61  < 2e-16  ***
1 days to exp                             -2.6695      0.3401    -7.85  2.6e-14  ***
2 days to exp                             -2.1457      0.4095    -5.24  2.4e-07  ***
3 days to exp                             -2.0062      0.3963    -5.06  5.9e-07  ***
4 days to exp                             -0.7257      0.2137    -3.40  0.00074  ***
Call Volume Mor Curr Fut Prev BDay_S       0.2574      0.0586     4.40  1.4e-05  ***
Call Volume Mor Curr Fut Prev BDay²_S     -0.0333      0.0181    -1.84  0.06693  .
Call Volume Eve Curr Fut Prev BDay_S       0.1157      0.0442     2.62  0.00908  **
Call Volume Eve Next Fut Prev BDay_S       0.1877      0.0778     2.41  0.01614  *
Total Volume Curr Fut_S                    0.4452      0.1208     3.69  0.00025  ***
Volatility Curr Fut_S                     -0.1183      0.0631    -1.88  0.06129  .
Friday                                    -0.2117      0.1138    -1.86  0.06346  .

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1
Index S is for a Standardized covariate

All covariates and the intercept satisfy our required level of p < 0.10, and most of them have very small p-values.

To confirm that our conclusions regarding the coefficients are correct we need to check that the confidence interval of each coefficient lies strictly on one side of zero. For instance, if the β value for ”Short week” could be both positive and negative we cannot say with any confidence that a short week has a positive impact on the volume. If the confidence interval contains 0 we cannot say that the covariate is of any importance to the model. To check this a bootstrap was run with 100000 replications.

Table 7: Bootstrapped β values from the Final Model.

Covariate                                Estimate  EstimateBootMed  EstimateBootCI
(Intercept)                                5.6064           5.6056  (5.4578, 5.75933)
Call Volume Evening Next Futures_S         0.2373           0.2363  (0.0305, 0.42423)
Short week                                 0.3123           0.3124  (0.1168, 0.49993)
0 days to exp                             -4.7143          -4.7021  (-5.5609, -3.84459)
1 days to exp                             -2.6695          -2.6597  (-3.5413, -1.89111)
2 days to exp                             -2.1457          -2.1413  (-2.9231, -1.40874)
3 days to exp                             -2.0062          -2.0002  (-2.6783, -1.39065)
4 days to exp                             -0.7257          -0.7279  (-1.1219, -0.33089)
Call Volume Mor Curr Fut Prev BDay_S       0.2574           0.2569  (0.1654, 0.35352)
Call Volume Mor Curr Fut Prev BDay²_S     -0.0333          -0.0334  (-0.0635, -0.00179)
Call Volume Eve Curr Fut Prev BDay_S       0.1157           0.1153  (0.0580, 0.17139)
Call Volume Eve Next Fut Prev BDay_S       0.1877           0.1881  (0.0310, 0.34614)
Total Volume Curr Fut_S                    0.4452           0.4516  (0.2432, 0.63048)
Volatility Curr Fut_S                     -0.1183          -0.1199  (-0.2213, -0.01041)
Friday                                    -0.2117          -0.2132  (-0.3938, -0.03257)

Index S is for a Standardized covariate

A hypothesis test was set up to cover both situations. For positive β values:

$$H_0: \beta_i \le 0 \qquad H_1: \beta_i > 0$$

and for negative β values:

$$H_0: \beta_i \ge 0 \qquad H_1: \beta_i < 0$$

Both the lower and the upper bound of every confidence interval have the same sign as the corresponding βi. We therefore reject the null hypothesis for all i at the 90% level.
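The sign check can be illustrated with a small bootstrap on made-up data. This is a simplified sketch and differs from the thesis in several labeled ways: it uses a single covariate, a plain percentile interval instead of the BCa interval reported in Table 7, resampling of whole observations (an assumption), and far fewer replications. All names are illustrative.

```python
import random

def slope(xs, ys):
    """OLS slope of y on x; returns None if all x values are identical."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx

def bootstrap_ci(xs, ys, replications=2000, level=0.90, seed=0):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < replications:
        idx = [rng.randrange(n) for _ in range(n)]  # resample observations with replacement
        b = slope([xs[i] for i in idx], [ys[i] for i in idx])
        if b is not None:
            estimates.append(b)
    estimates.sort()
    lo = estimates[int((1 - level) / 2 * replications)]
    hi = estimates[int((1 + level) / 2 * replications) - 1]
    return lo, hi

# Made-up data with a true slope of 0.5.
rng = random.Random(42)
xs = [float(i) for i in range(30)]
ys = [1.0 + 0.5 * x + rng.uniform(-2, 2) for x in xs]
lo, hi = bootstrap_ci(xs, ys)
same_sign = (lo > 0) == (hi > 0)  # True when the interval excludes zero
print(round(lo, 3), round(hi, 3), same_sign)
```

When both bounds share a sign, zero lies outside the interval, which is the criterion applied to every row of Table 7.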

To visualize how important each covariate is to the model we can look at a bar diagram of percentage of R2.

[Figure: bar chart titled “Relative Importance of Covariates”; x-axis: Predictor Variables (ticks 1 to 13), y-axis: % of R-Square (0 to 30); R-Square = 0.584]

Figure 1: Relative Importance of Covariates - Index 1 is Call Volume Evening Next Futures in Table 6 et cetera.

The dummy 0 days to exp is by far the largest contributor to R2 while 4 days to exp contributes the least.

5.4.1 Multicollinearity

Multicollinearity can be a huge problem in linear regression as it can inflate the variance of the estimated β values. To test for this a Variance Inflation Factor (VIF) test was employed [10, pp. 199-200].


Table 8: Variance Inflation Factor Test

Covariate                                VIF
Call Volume Evening Next Futures_S       4.47
Short week                               1.05
0 days to exp                            5.95
1 days to exp                            3.48
2 days to exp                            4.86
3 days to exp                            4.73
4 days to exp                            1.38
Call Volume Mor Curr Fut Prev BDay_S     2.20
Call Volume Mor Curr Fut Prev BDay²_S    2.01
Call Volume Eve Curr Fut Prev BDay_S     1.25
Call Volume Eve Next Fut Prev BDay_S     3.88
Total Volume Curr Fut_S                  9.35
Volatility Curr Fut_S                    2.55
Friday                                   1.33

Index S is for a Standardized covariate

As the VIF is not greater than the suggested limit of 10 for any of the covariates, we conclude that we do have some multicollinearity but nothing too severe [23, p. 3].

The VIF of Total Volume Curr Fut_S is close to 10, so it is the covariate with the most variance inflation and could be of some concern.

To further examine this the correlation matrix (Table 13) was checked. It also confirms that we have some multicollinearity but nothing too severe.
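For orientation, the VIF of a covariate is 1/(1 − R²), where R² comes from regressing that covariate on all other covariates. The short sketch below (function names are illustrative) inverts this relation to show what the largest value in Table 8 means.

```python
def vif_from_r2(r2: float) -> float:
    """VIF of a covariate, given the R-squared from regressing it on the other covariates."""
    return 1.0 / (1.0 - r2)

def r2_from_vif(vif: float) -> float:
    """Inverse relation: share of the covariate's variance explained by the other covariates."""
    return 1.0 - 1.0 / vif

total_volume_vif = 9.35  # Total Volume Curr Fut, Table 8
print(round(r2_from_vif(total_volume_vif), 3))  # share explained by the other covariates
print(round(total_volume_vif ** 0.5, 2))        # inflation factor of the standard error
```

A VIF of 9.35 thus means that about 89% of the variation in Total Volume Curr Fut is explained by the other covariates, and that the standard error of its coefficient is roughly three times larger than it would be without that overlap.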

5.5 Linear Regression Assumptions

To verify that the linear regression assumptions in subsubsection 3.1.3 were not violated, a range of tests were run. The results are shown below.

5.5.1 Normality

To confirm this assumption there are at least three plots to look at. The Histogram with Normal Curve, with a superimposed normal distribution, is a fair way to check this. One problem with the Histogram with Normal Curve however is that the bars are half a unit wide so it can be hard to gauge the true curvature.

To make it easier we have the Kernel Density Plot where two curves are calculated and then compared to a normal distribution. They are both supposed to have the shape of a normal distribution [10, pp. 189-194] with mean 0.


[Figure: Histogram with Normal Curve; x-axis: Residuals (-3 to 3), y-axis: Frequency (0 to 120)]

Figure 2: Histogram with Normal Curve

While not perfectly normal it is close. A slight bump just right of 0 could indicate trouble but it could also be because of the previously mentioned wide bars. There are also some minor problems in the tails as we will also see on the Normal Q-Q plot.


[Figure: Kernel Density Plot; x-axis: Residuals (-3 to 3), y-axis: Density (0.0 to 0.5)]

Figure 3: Kernel Density Plot

The Kernel Density Plot calculates two curves for the residuals and superimposes them over a histogram. The full line is without smoothing and the dotted line has a smoothing factor of 2. It is easy to see that they very much have the form of a normal distribution.

The Normal Q-Q plot is a probability plot of standardized residuals against expected values under normality. For normality all points should fall on a straight line at a 45° angle [10, pp. 189-194].

[Figure: Normal Q-Q; x-axis: Theoretical Quantiles (-3 to 3), y-axis: Standardized residuals]

Figure 4: Normal Q-Q Plot over standardized residuals

There are some outliers in the higher and especially lower end of the line but all in all it is very close to normal.

While none of these plots displayed a perfect normal distribution it is definitely close enough. The analysis of these three plots suggests the normality assumption is not violated.

5.5.2 Linearity

If the dependent variable is linearly related to the explanatory variables there should be no systematic relationship between the residuals and the predicted values. In essence this means we should only have random noise in the Residuals vs. Fitted Plot [10, pp. 189-194].


[Figure: Residuals vs Fitted; x-axis: Fitted values (0 to 6), y-axis: Residuals (-3 to 2)]

Figure 5: Residual vs. Fitted Plot

If the dots in the plot are closer together on one side than on the other side of the x-axis there could be a problem. Possible problematic dots are around y=-3 and y=2.5 since there are no corresponding dots on the opposite side. However, these are minor violations and very few considering the large sample.

The Residuals vs. Fitted Plot suggests the linearity assumption is not violated.

5.5.3 Independence

Time series data, which we are working with, often display some kind of autocorrelation, which means that observations closer in time correlate more with each other than observations distant in time. Common remedies for this include adding lagged versions of the dependent and some independent variables to the model.

To test for this a bootstrapped Durbin-Watson test with 100000 replications was run [10, p. 196]. A hypothesis test for the autocorrelation ρ was set up as

$$H_0: \rho = 0 \qquad H_1: \rho \neq 0$$


Table 9: Results from the Durbin-Watson test.

lag   Autocorrelation   D-W Statistic   p-value
1     0.0187            1.96            0.595

As the p-value is greater than our chosen α of 0.10 we cannot reject H0. This result and the fact that we already include lagged parts in our model suggests the independence assumption is not violated.
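For reference, the Durbin-Watson statistic itself is simple to compute from the residuals. The sketch below is illustrative (the function name is hypothetical and the bootstrapped p-value is not reproduced). It also uses the standard approximation d ≈ 2(1 − ρ) to relate the two numbers reported in Table 9.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Perfectly alternating residuals (strong negative autocorrelation) give a high value,
# identical residuals (strong positive autocorrelation) give 0.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))

# Approximation d = 2 * (1 - rho) with the lag 1 autocorrelation from Table 9.
approx_d = 2 * (1 - 0.0187)
print(round(approx_d, 2))
```

A value near 2 corresponds to no first-order autocorrelation, which is consistent with the reported statistic of 1.96.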

5.5.4 Homoscedasticity

To confirm homoscedasticity, or constant variance, one can look at a Scale-Location Plot which shows the square root of the absolute standardized residuals against the fitted values [10, pp. 189-194].

[Figure: Scale-Location; x-axis: Fitted values (0 to 6), y-axis: Standardized residuals (0.0 to 2.0)]

Figure 6: Scale-Location Plot

For perfect homoscedasticity the plot should display no patterns. The dots look fairly random with about equal distance and numbers on both sides of the y=1 line. There is a slight pattern around x=1 but apart from this the plot looks good.

The Scale-Location Plot suggests the homoscedasticity assumption is not violated.


5.5.5 Multicollinearity

From the VIF Test in Table 8 and the Correlation Matrix in Table 13 we conclude there is no perfect multicollinearity.

5.5.6 Assumptions summary

The conditions are not ideal but the previous tests and plots suggest that no linear regression assumption is severely violated.

5.6 Final Model compared to the Intermediary Models

To visualize how much we have improved our model from the Intermediary Models we compare the Final Model to each best Intermediary Model value from Table 5. From the same table we calculate the mean of all statistics.

Table 10: Final and Intermediary models comparison

                    Final Model   Best IM value   Mean IM
LOOCV Mean Square   0.822         0.827           0.840
R² Adj              0.572         0.573           0.561
AIC                 1336          1338            1346
BIC                 1403          1391            1402

IM stands for Intermediary Model

The Final Model's test values are better than the mean of the Intermediary Models in every category except BIC, where it is marginally worse (1403 against 1402). Compared with the best Intermediary Model values, the Final Model is best in two of the tests (LOOCV Mean Square and AIC) and very close in the Adjusted R² test.

Considering the multicollinearity, the high p-values for coefficients and possible other problems in the Intermediary Models, it is safe to say that the Final Model is an improvement.

5.7 Final Model compared to the Null Model

To get confidence intervals to verify the model and check it against the null model a Bootstrap with 100000 replications was run. As decided in the methodology section the confidence level was set to 90%.


Table 11: Final and Null model comparison

                                  Final Model   Null Model
Degrees of freedom                492           506
LOOCV Mean Square                 0.822         1.85
R²                                0.584         N/A
R² BootMed                        0.596         N/A
R² BootCI                         0.491-0.646   N/A
R² Adj                            0.572         N/A
R² Adj BootMed                    0.584         N/A
R² Adj BootCI                     0.475-0.636   N/A
Residual standard error           0.888         1.36
Residual standard error BootMed   0.873         1.36
Residual standard error BootCI    0.853-0.960   1.25-1.49
AIC                               1336          1752
AIC BootMed                       1318          1750
AIC BootCI                        1294-1413     1664-1844
BIC                               1403          1761
BIC BootMed                       1385          1759
BIC BootCI                        1361-1480     1673-1853

BootCI is short for Bootstrapped BCa Confidence Interval.
BootMed is the reported median from the bootstrap.

If the two Residual standard error (RSE) intervals do not overlap we can reject the null hypothesis that they are equal

$$H_0: (\text{Final Model})_{RSE} = (\text{Null Model})_{RSE}$$

in favor of

$$H_1: (\text{Final Model})_{RSE} \neq (\text{Null Model})_{RSE}$$

The intervals in Table 11 (0.853-0.960 and 1.25-1.49) do not overlap, so H0 is rejected. The same argument can be applied to BIC and AIC with the same result.

Furthermore, it was also noted that the Final Model is drastically better than the Null Model according to LOOCV Mean Square, Residual standard error, AIC and BIC.

To further confirm the difference an ANOVA table was calculated

Table 12: ANOVA table comparing the Null Model and the Final Model

              Res.Df   RSS      Df   Sum of Sq   F       Pr(>F)
Null Model    506      933.30
Final Model   492      388.36   14   544.94      49.31   <2e-16

As the p-value is below our required 0.10 and indeed numerically almost zero, it confirms that the two models are very different.
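The key statistics in Tables 11 and 12 can be recomputed from the two residual sums of squares alone, which is a useful consistency check. The sketch below assumes that the Null Model is the intercept-only model, so that its RSS equals the total sum of squares, and that AIC and BIC follow the usual Gaussian log-likelihood convention in which the error variance is counted as a parameter. Both are assumptions made here, not statements from the thesis.

```python
import math

n = 507  # observations: residual degrees of freedom of the Null Model (506) plus 1
rss_null, df_null = 933.30, 506
rss_final, df_final = 388.36, 492

# F statistic for the 14 added covariates (Table 12).
f_stat = ((rss_null - rss_final) / (df_null - df_final)) / (rss_final / df_final)

# R-squared, adjusted R-squared and residual standard error (Table 11).
r2 = 1 - rss_final / rss_null
r2_adj = 1 - (rss_final / df_final) / (rss_null / df_null)
rse = math.sqrt(rss_final / df_final)

def gaussian_ic(rss, n_params, penalty):
    """-2 * Gaussian log-likelihood plus penalty * number of parameters.
    n_params counts the regression coefficients plus the error variance."""
    return n * (math.log(2 * math.pi) + math.log(rss / n) + 1) + penalty * n_params

aic_final = gaussian_ic(rss_final, 16, 2)           # 15 coefficients + error variance
bic_final = gaussian_ic(rss_final, 16, math.log(n))
aic_null = gaussian_ic(rss_null, 2, 2)              # intercept + error variance

print(round(f_stat, 2), round(r2, 3), round(r2_adj, 3), round(rse, 3))
print(round(aic_final), round(bic_final), round(aic_null))
```

Under these assumptions the recomputed values agree with the reported F statistic, R², adjusted R², residual standard error, AIC and BIC.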


5.8 Bootstrapped Final Model

From the bootstrap data in Table 7 an alternative model can be formulated as the Bootstrapped Final Model with some minor differences from the Final Model. Here the β values are replaced with their respective bootstrapped median.

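$$
\begin{aligned}
\ln\langle\text{Call Volume Mor Curr Fut}\rangle = {} & 5.6056 \\
& + 0.2363 \cdot (\langle\text{Call Volume Eve Next Fut}\rangle - 508)/1231 \\
& + 0.3124 \cdot \langle\text{Short week}\rangle \\
& - 4.7021 \cdot \langle\text{0 days to exp}\rangle \\
& - 2.6597 \cdot \langle\text{1 days to exp}\rangle \\
& - 2.1413 \cdot \langle\text{2 days to exp}\rangle \\
& - 2.0002 \cdot \langle\text{3 days to exp}\rangle \\
& - 0.7279 \cdot \langle\text{4 days to exp}\rangle \\
& + 0.2569 \cdot (\langle\text{Call Volume Mor Curr Fut Prev BDay}\rangle - 261)/236 \\
& - 0.0334 \cdot ((\langle\text{Call Volume Mor Curr Fut Prev BDay}\rangle - 261)/236)^2 \\
& + 0.1153 \cdot (\langle\text{Call Volume Eve Curr Fut Prev BDay}\rangle - 3691)/1960 \\
& + 0.1881 \cdot (\langle\text{Call Volume Eve Next Fut Prev BDay}\rangle - 318)/891 \\
& + 0.4516 \cdot (\langle\text{Total Volume Curr Fut}\rangle - 100363)/69450 \\
& - 0.1199 \cdot (\langle\text{Volatility Curr Fut}\rangle - 1.02)/0.0111 \\
& - 0.2132 \cdot \langle\text{Friday}\rangle
\end{aligned}
\tag{24}
$$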

6 Discussion

6.1 Covariates and their Beta-values

6.1.1 In the Final Model

Call Volume Evening Next Futures A moderately large positive β-value of 0.2373 indicates that the volume in the morning auction follows the same pattern as the evening auction. That is, the volume in the evening auction correlates positively with the volume in the morning auction.

Short week A moderately large positive β-value of 0.3123 indicates that traders are more eager to trade during a short week.

X days to exp They all have negative β-values, which become more negative the closer the day is to expiry. By looking at the underlying data the decrease can be explained by traders starting to trade in the next month’s futures contract approximately 4 days before expiry. The big negative β-value on the day of expiry is due to there being almost zero trading in the current futures on that date.


Call Volume Morning Current Futures Previous Business Day A moderately large positive β-value of 0.2574 indicates that the morning auction is somewhat dependent on the previous day’s morning volume for the same instrument.

Call Volume Morning Current Futures Previous Business Day² A small negative β-value of -0.0333 in the squared term will offset the effects of an unusually large volume.

Call Volume Evening Current Futures Previous Business Day A small positive β-value of 0.1157 indicates that the morning auction is somewhat dependent on the previous day’s evening volume for the same instrument. It should be noted that the β-value is roughly half of the morning equivalent.

Call Volume Evening Next Futures Previous Business Day A small positive β-value of 0.1877 could be a result of the next futures contract almost exclusively being traded in the expiration week. Therefore this could help predict the current morning futures volume with regards to the fixed negative β-values of the X days to expiry dummies.

Total Volume Current Futures A positive β-value of 0.4452 indicates that the volume in the morning auction follows the same characteristics as the continuous trading for the rest of the day. That is, if traders trade more during the day they also trade more in the auction.

Volatility Current Futures A small negative β-value of -0.1183 indicates that the higher the volatility, the more conservative the traders are. It should be noted that according to Figure 1 this covariate is relatively unimportant to the model.

Friday A small negative β-value of -0.2117 indicates less interest in trading on Fridays.
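The combined effect of the linear and squared terms for Call Volume Morning Current Futures Previous Business Day can be made concrete with a short calculation. With a positive linear coefficient and a negative squared coefficient the contribution peaks at z = -β_linear / (2·β_squared) on the standardized scale. The sketch uses the coefficients, mean (261) and standard deviation (236) from Equation 23, and the function name is illustrative.

```python
def turning_point(b_linear: float, b_squared: float) -> float:
    """Value of z at which b_linear*z + b_squared*z**2 peaks (for b_squared < 0)."""
    return -b_linear / (2.0 * b_squared)

z_star = turning_point(0.2574, -0.0333)  # standardized scale
raw_star = 261 + 236 * z_star            # converted back to the original scale

print(round(z_star, 2), round(raw_star))
```

The peak lies almost four standard deviations above the mean, so within the normal range of the data the squared term only dampens the positive effect and does not reverse it.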

6.1.2 Excluded

Current Futures Open/Close and Nikkei 225 Open/Close These covariates were suggested by most interviewees but seem to be uncorrelated with the volume. The initial suggestion stemmed from the idea that a higher opening price would correlate with an eagerness to buy and therefore a higher volume.

Since the p-values for the above covariates are very high it is not possible to go further into the issue.


6.2 Further research

6.2.1 Additional covariates

Additional covariates and general areas of further research:

• Orderbook activity

• Late openings due to technical problems

• Time of year

• Presence of algorithmic traders

6.2.2 Generalized Linear Models

Due to the modelled value being of the count type (always a non-negative integer) some more generalized linear models specifically tailored for these circumstances could be of use. Two examples are quasi-logarithmic and negative binomial.

6.2.3 Time series analysis

While looking at the dependent variable over time a few trends were noticed. It seemed to spike somewhere in the middle of two expiry dates. There also seemed to be trends regarding previous executed volumes which is indicated by the many previous business day terms in our model. An ARIMA model could probably be a good model to start with. Due to the lack of time and experience in the field this was never fully investigated.

7 Conclusion

With the Final Model in hand we can conclude that it is possible to model and predict the executed volume of the morning call auction to some extent. Notably a majority of the predictions from the preliminary interviews were found to be correct. With an R² of 0.584 and a somewhat high residual standard error, some explanatory part of the model is probably missing, or the remaining variation is the result of unpredictable random noise. Further studies can possibly either find these parts or confirm the existence of random noise. In conclusion, the thesis successfully fulfils its intention to model the morning auction within the parameters of the paper.


8 Appendices

Table 13: Correlation Matrix for the Final Model. xi is the corresponding left side covariate.

                                        (Intercept)     x1     x2     x3     x4     x5     x6     x7     x8     x9    x10    x11    x12
Evening Next_S                                 0.4
Short week                                   -0.24  -0.06
0 days to exp                                 -0.4  -0.63   0.08
1 days to exp                                -0.66  -0.58   0.07   0.65
2 days to exp                                -0.65  -0.24   0.01   0.19   0.53
3 days to exp                                -0.57  -0.11  -0.01   0.01   0.39   0.78
4 days to exp                                -0.42  -0.08   0.06   0.01   0.23   0.45   0.46
Call Volume Mor Curr Fut Prev BDay_S          0.08      0  -0.08   0.07   0.07   0.11   0.11   0.13
Call Volume Mor Curr Fut Prev BDay²_S        -0.18   0.01      0  -0.06  -0.04  -0.05  -0.06  -0.06   -0.7
Call Volume Eve Curr Fut Prev BDay_S         -0.17   0.04   0.02    0.1   0.16   0.22   0.19   0.04   -0.1   0.05
Call Volume Eve Next Fut Prev BDay_S          0.38   0.02  -0.01   -0.6  -0.54  -0.26  -0.13  -0.03   0.02  -0.01  -0.07
Total Volume Curr Fut_S                       0.44  -0.09   0.06   0.19  -0.24   -0.8  -0.86  -0.47  -0.09   0.04  -0.21   0.04
Volatility Curr Fut_S                        -0.38   0.04  -0.03  -0.09   0.21   0.63   0.67   0.41   0.07      0   0.07  -0.04  -0.76
Friday                                       -0.28      0   0.03  -0.21   0.06   0.04   0.05   0.08  -0.06   0.03  -0.02      0   0.01

References
