Predicting share price by using Multiple Linear Regression.


A Bachelor Thesis in Mathematical Statistics

Gustaf Forslund & David Åkesson Vehicle engineering KTH

May 21st, 2013


Abstract

The aim of the project was to design a multiple linear regression model and use it to predict the closing share price of 44 companies listed on the OMX Stockholm stock exchange’s Large Cap list. The model is intended to be used as a day trading guideline, i.e. today’s information is used to predict tomorrow’s closing price. The regression was done in Microsoft Excel 2010^[18] using its built-in function LINEST. The LINEST function uses the dependent variable y and all the covariates x to calculate the β-value belonging to each covariate. Several multiple linear regression models were created and tested, but only seven models were better than chance, i.e. predicted the correct direction more than 50 % of the time. To determine the most suitable of these seven models, Akaike’s Information Criterion (AIC) was applied. The covariates used in the final model were: Dow Jones closing price, Shanghai opening price, conjuncture, oil price, share’s opening price, share’s highest price, share’s lowest price, lending rate, reports, positive/negative insider trading, payday, positive/negative price target, number of completed transactions during one day, OMX Stockholm closing price, TCW index, increasing closing price three days in a row and decreasing closing price three days in a row.

The average maximum deviation between the predicted and the real closing price over all 44 shares was 6,60 %. The correct direction (increase or decrease) was predicted in 61,72 % of cases on average over the 44 shares during the period 2012-02-22 to 2013-02-20. If 50.000 SEK had been invested in each company, i.e. a total investment of 2,2 million SEK, the total yield from using the regression model during the year 2012-02-22 to 2013-02-20 would have been 259.639 SEK (11,80 %), compared to 184.171 SEK (8,37 %) if the shares had been held untraded over the same period. Of the 44 companies analysed, 31 (70,45 %) were profitable when using the regression model, compared to 30 (68,18 %) when the shares were simply held. The yield with the model was thus 40,98 % higher than the yield from holding the shares for the year.


Chapter 1 Introduction

1.1 Introduction

Most people around the world dream of having more money in their pockets. The trick, however, is not to work harder, but to work smarter. One way to make more money is to invest on the stock exchange, but because of its seemingly random and unpredictable nature, people are reluctant to do so. At first glance the stock exchange may seem random and unpredictable, but that is not the entire truth. If the stock exchange is carefully analysed a pattern slowly emerges, and it becomes evident that a number of variables contribute to a company’s share price. Such variables include positive and negative news, price targets, the conjuncture, the oil price and the stock exchanges of other economically influential countries, to name just a few.

One of the most common share analysis tools used today is the so-called regression channel. These regression models are often based solely on closing price vs. time and are more reminiscent of technical analysis than of a prediction of the share’s closing price. This project aims to take it a step further by predicting a closing price for each day. The approach is to determine which variables influence a company’s share price, design a multiple linear regression model and use Microsoft Excel 2010’s^[18] built-in function LINEST to predict the closing price of 44 companies listed on the OMX Stockholm stock exchange’s Large Cap list. The Large Cap list was at the time made up of 62 companies, but sufficient information was only found for 44 of them. Unlike regression channels, which can be used to forecast the direction of shares several days or even weeks ahead, this model is used to analyse share prices on a daily basis, for what resembles day trading. The goal of the final model is to maximize the profit and minimize the losses based on a daily analysis during the period 2012-02-22 to 2013-02-20.

Since several multiple linear regression models containing different sets of covariates were to be designed, the Akaike Information Criterion (AIC) was used to determine the most suitable model. One of the criteria for the model, set by us, was that it should be better than chance in predicting whether the share would increase or decrease in value, i.e. have more than 50 % of the predicted values in the correct direction. The other criterion was that the predicted closing price should not deviate more than 10 % from the real closing price.


Chapter 2 Theory

2.1 Econometrics

The term “econometrics” is believed to have been coined by the Norwegian economist Ragnar Frisch (1895-1973). He was one of the three principal founders of the Econometric Society, the first editor of the journal Econometrica and co-winner of the first Nobel Memorial Prize in Economic Sciences in 1969^[15].

Econometrics is used when it comes to applying statistical methods to problems when the data available is observational rather than experimental, meaning the data obtained does not come from controlled and planned experiments. Common fields where econometrics is applied are economics, biology, medicine, social science and astronomy^[16]. The latter is a perfect example of a natural science where data are typically observational and not experimental.

2.1.1 The Multiple Linear Regression Model theory

The basic model for econometric work and modelling for experimental design is the multiple linear regression model^[16]. The specification is

$$y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \quad i = 1,\dots,n \qquad (2.1)$$

where $y_i$ is the observation of the dependent random variable $y$ whose expected value depends on the covariates $x_{Cj}$, where $C$ is a constant that denotes that $i$ does not change. $e_i$ represents the error terms, which are assumed to be independent between observations and such that

$$E(e_i \mid \{x_{jk}\}) = 0 \quad \text{and} \quad E(e_i^2 \mid \{x_{jk}\}) = \sigma^2 \qquad (2.2)\ (2.3)$$

where $\sigma$ is unknown. Usually the covariate $x_{C0}$ is a constant 1 and $\beta_0$ is the intercept.

If written

$$x_i = (x_{i0} \ \dots \ x_{ik}), \quad i = 1,\dots,n \quad \text{and} \quad \beta = (\beta_0 \ \dots \ \beta_k)^T$$

then the specified model may be written as

$$y_i = x_i\beta + e_i \qquad (2.4)$$

The covariates may be deterministic (predetermined) values or outcomes of random variables^[16].

Sometimes it is convenient to use the matrix notation

$$Y = X\beta + e \quad \text{where} \quad E(e \mid X) = 0 \quad \text{and} \quad E(ee^T \mid X) = I\sigma^2 \qquad (2.5)\ (2.6)$$

where $Y$ is an $n \times 1$ matrix of random variables, $X$ is an $n \times (k+1)$ matrix and $e$ is an $n \times 1$ matrix of random variables.

In the regression model above the parameters $\beta_j$ and the variance $\sigma^2$ are unknown, and it is these parameters that are to be estimated from the obtained data. The model can be used either for prediction or to give a structural interpretation, which allows for hypothesis testing. Since the project aims at predicting shares’ closing prices, only prediction was of interest.
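As a rough illustration of the matrix form of the model, the sketch below estimates β by ordinary least squares with NumPy, which is the same kind of fit that a spreadsheet least-squares routine performs. The data are synthetic and the function name `fit_ols` is hypothetical; nothing here is taken from the project's spreadsheet.

```python
import numpy as np

def fit_ols(x, y):
    """Least-squares estimate of beta for Y = X beta + e.

    x: (n, k) array of covariates. A column of ones is prepended so
    that beta[0] plays the role of the intercept beta_0.
    """
    X = np.column_stack([np.ones(len(x)), x])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Synthetic, noise-free example (assumed data): y = 2 + 3*x1 - 1*x2.
rng = np.random.default_rng(0)
x = rng.normal(size=(50, 2))
y = 2.0 + 3.0 * x[:, 0] - 1.0 * x[:, 1]
beta_hat = fit_ols(x, y)
```

With noise-free data the estimate recovers the coefficients used to generate y; with real share data the residuals would of course be non-zero.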

2.2 Prediction

When performing a prediction the linear model is often used^[16]. The covariates $x_0$ make up a row matrix, and with known covariates the predicted value of the corresponding $y$, $y_p$, is

$$y_p = x_0\hat{\beta} \qquad (2.7)$$

The prediction contains two unknown components: the estimated value $\hat{\beta}$ of $\beta$ is used instead of the real $\beta$, and the error term is set to zero in the prediction equation. However, the error term is never zero in reality, so to calculate the error in the prediction the following equation is used

$$e_p = e_0 - x_0(\hat{\beta} - \beta) \qquad (2.8)$$

whose total variance is

$$\operatorname{Var}(e_p) = \left(1 + x_0 (X^T X)^{-1} x_0^T\right)\sigma^2 \qquad (2.9)$$

which is estimated by

$$\widehat{\operatorname{Var}}(e_p) = \left(1 + x_0 (X^T X)^{-1} x_0^T\right)s^2 \qquad (2.10)$$

where $s^2$ is an unbiased estimate of $\sigma^2$

$$s^2 = \frac{1}{n-k-1}\,|\hat{e}|^2 \qquad (2.11)$$

where $n$ is the number of observations, $k$ is the number of covariates in the prediction model and $|\hat{e}|^2 = \hat{e}^T\hat{e}$, $\hat{e} = Y - X\hat{\beta}$.
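The prediction and its estimated error variance can be sketched numerically as follows. This is a minimal NumPy illustration of (2.7), (2.10) and (2.11) under the stated model; the function name `predict_with_variance` and the five data points are assumptions made for the example.

```python
import numpy as np

def predict_with_variance(X, y, x0):
    """Return (y_p, estimated Var(e_p)) for a new covariate row x0.

    X: (n, k+1) design matrix that already contains the column of ones.
    x0: length k+1 row of covariates for the day to be predicted.
    """
    n, cols = X.shape                        # cols = k + 1
    beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat                 # e_hat = Y - X beta_hat
    s2 = resid @ resid / (n - cols)          # (2.11): |e_hat|^2 / (n - k - 1)
    xtx_inv = np.linalg.inv(X.T @ X)
    var_ep = (1.0 + x0 @ xtx_inv @ x0) * s2  # (2.10)
    y_p = x0 @ beta_hat                      # (2.7)
    return y_p, var_ep

# Hypothetical data: one covariate t = 0..4 plus an intercept column.
t = np.arange(5.0)
X = np.column_stack([np.ones(5), t])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])
y_p, var_ep = predict_with_variance(X, y, np.array([1.0, 5.0]))
```

For these numbers the fitted line is 1.06 + 1.97 t, so the prediction at t = 5 is 10.91, and the estimated variance is (1 + 1.1) · 0.091/3 = 0.0637.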

2.3 Regression channels

On today’s stock exchange one of the most common analysis tools is the regression channel. It uses historic values to forecast the future. The regression channel is based on a form of chaos theory, i.e. trying to predict something that springs from total chaos. A metaphoric example can be made to illustrate how the regression channel works. Imagine a cigarette standing straight up in a room where the air is perfectly still. Chaos theory says that there is no way of predicting the smoke’s trails and loops as it leaves the cigarette. However, if 10.000 cigarettes were observed in a row it would be noticed that the smoke trail of one cigarette never behaves the same way as another cigarette’s, but it would also be noticed that the smoke trails never move outside a conical boundary on their way up in the air. In chaos theory this boundary is known as a chaotic attractor.

Regression channels are based on the same principle, but instead of the smoke trail they use a share’s closing price and the channel’s boundary is the chaotic attractor, which the share price is not allowed to cross for a longer period of time. If the share moves outside the regression channel it indicates that an unforeseen event has occurred, such as positive or negative news or a new price target has been released and it is time to sell or buy the share.

One of the most common regression channels in use today is the Raff Regression Channel. It uses time and closing price to draw up the channel. A regression line is created by analysing a share’s closing price between certain days, say for example 100 days. Once the regression line is drawn two more parallel lines are drawn, one above the regression line and one below it, at equal distance from the regression line, see Figure 2.1. The distance is determined by the highest or lowest share closing price from the regression line during the 100 days analysed^[12]. The top line is seen as resistance and the bottom line is seen as support. The share may cross these two lines for a short moment but if it stays outside for a longer period of time it indicates that a new trend is coming.
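The construction described above can be sketched in a few lines. This is an illustrative reading of the description, not the exact procedure behind Figure 2.1: the function name `raff_channel` and the five closing prices are hypothetical, and the channel half-width is taken as the largest absolute distance between a closing price and the regression line.

```python
def raff_channel(closes):
    """Return (slope, intercept, width) for a Raff-style channel.

    The regression line is close ~ intercept + slope * day, day = 0, 1, ...
    width is the largest absolute distance between a close and the line;
    resistance and support are the regression line plus/minus width.
    """
    n = len(closes)
    days = range(n)
    mean_t = sum(days) / n
    mean_c = sum(closes) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in zip(days, closes))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_t
    width = max(abs(c - (intercept + slope * t)) for t, c in zip(days, closes))
    return slope, intercept, width

# Hypothetical closing prices for a short window (not thesis data).
slope, intercept, width = raff_channel([10.0, 11.0, 13.0, 13.0, 14.0])
```

Here the regression line is 10.2 + 1.0 · day, and the third close (13.0 against a fitted 12.2) sets the channel width to 0.8.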


Figure 2.1. Raff Regression Channel for the Volvo B share from 2008-01-02 to 2013-02-21, with 100 days analysed.

As seen in Figure 2.1 the 100 days analysed is between day 400 and day 500, marked with two red stars. In this case it was the maximum value at day 473 that set the distance between the resistance line and the regression line. The red circle at day 579 indicates the point in time when something unexpected happened and it was time to sell the shares and collect the profit or to buy more shares and make a new regression channel. Even though the share returns to the regression channel it has been more than 10 days since it crossed the support line and a new regression channel has to be made.

Since the project aims to take it a bit further the multiple linear regression model was intended to predict a closing price for each day instead of looking at the interval the share may stay within. The difference between the regression channel and the multiple linear regression model is that the regression channel can be used to see for how long the shares are worth keeping, while the multiple linear regression model was intended to be used as a guideline for day trading i.e. keep the shares if the model predicts a higher closing price tomorrow compared to today otherwise sell them.

[Plot for Figure 2.1: VOLVO AB, B share. Horizontal axis: Days (0 to 1400); vertical axis: Closing price [SEK] (30 to 130).]


Chapter 3 Data

3.1 Covariates

How the stock exchange moves during a day depends on several different factors. For instance, if the lending rate is raised the shares should go down, but on the same day a company could announce a quarterly report, which should make its shares increase in value. All the covariates that were used in the different models are listed here, together with an explanation of the impact they have on the shares.

3.1.1 Stock exchanges in the world

Dow Jones and Shanghai are two of the largest stock exchanges in the world. Because of this, and because the USA and China are two of the most economically influential countries in the world, Sweden, and the OMX Stockholm stock exchange in particular, is very dependent on how the economies of these two countries develop^[13]. Because of the time difference, the closing price for Dow Jones and the opening price for Shanghai were used. The index prices for Dow Jones and Shanghai are given in USD and CNY respectively, so they had to be converted into SEK using exchange rates from Oanda^[11] for the period analysed.

3.1.2 Conjuncture

The conjuncture for Sweden is given by the so-called Barometerindikatorn, which is published at the end of every month by Konjunkturinstitutet. It is based on surveys in which both companies and households assess the present state and future prospects of the economy. It has an average of 100, meaning that a value below 100 indicates an economic recession and a value above 100 an economic boom in Sweden. The conjuncture gives a good indication of people’s willingness to buy or sell shares at the moment. Since the value is only given at the end of every month, linear interpolation was performed to obtain values for every day. The last month was extrapolated, since its value had not yet been published.
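The interpolation step can be illustrated with a short sketch. The function name `interpolate_daily`, the day numbering and the two readings are hypothetical; the snippet only shows how two consecutive month-end values could be spread linearly over the days between them.

```python
def interpolate_daily(day_a, value_a, day_b, value_b):
    """Linearly interpolate between two month-end readings.

    Returns a list of (day, value) pairs for every integer day
    from day_a to day_b inclusive.
    """
    span = day_b - day_a
    return [
        (d, value_a + (value_b - value_a) * (d - day_a) / span)
        for d in range(day_a, day_b + 1)
    ]

# Hypothetical readings: 98.0 at day 0 and 101.0 at day 30.
daily = interpolate_daily(0, 98.0, 30, 101.0)
```

Extrapolating the last month, as mentioned in the text, would amount to extending the final line segment beyond its second reading.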

3.1.3 TCW index

The TCW index stands for Total Competitiveness Weights and is an index calculated by the International Monetary Fund (IMF). It is a weighted average of several currencies, which measures the Swedish currency against other currencies. A currency’s weight is based on Sweden’s trade with a particular country relative to all the other countries. If the TCW index increases, the Swedish currency (SEK) has decreased in value, and if the index decreases, the SEK has increased in value. The TCW has a large impact on companies, which is why it was considered as a variable in the model.

3.1.4 Lending rate

The lending rate is the rate that banks have to pay Riksbanken when borrowing money. If the lending rate is low, the banks can lower their interest rates, which means that

1) companies will invest more, and

2) consumers will have more money to consume and invest.

The lending rate is set by Riksbanken and retains its value until a new one is announced.


3.1.5 Pay day

A dummy variable where a one indicated that salaries were paid on that day, normally the 25th of every month. The hypothesis was that when people get their salary they might have some money left over after paying their bills, which could be invested in shares, leading to an increase in value.

3.1.6 Opening price

The opening price is the value that each share has when the OMX Stockholm stock exchange opens for trading. The opening price gives a good indication of where the stock will move during the day. Since the stock exchange can be likened to an auction market, i.e. buyers and sellers meet and deals are made with the highest bidder, the opening price does not have to be the same as the previous day’s closing price.

3.1.7 Highest/lowest price of the day

The highest and the lowest price of the day are taken from the day before and give an indication of how much the shares usually move during a day and how this in the end will affect the closing price. They also show the general cyclical movement of each share.

3.1.8 Positive/Negative insider trading

Insider trading refers to trading done by people with good insight in how the company is doing e.g. what kind of result they may present in the near future. People with good insight could be the CEO, CEO of subsidiary, members of the board, people that own a lot of shares in the company etc. This covariate was made up of two dummy variables; one for positive insider trading where a one indicated that people with good insight in the company bought shares, and one for negative insider trading where a one indicated that people with good insight sold their shares. If people with good insight in the company sell their shares it might indicate that the company is not doing as well as they thought and vice versa.

3.1.9 Quarterly and annual reports

The quarterly and annual reports present each company’s results for a certain period. Usually when a company makes its reports public the share will increase in value. This is often explained by the expectations that the investors have of the companies. For this covariate only the release date was of interest, so a dummy variable was created where a one indicated that a report had been released.

3.1.10 Positive/negative price target

The price target is set by different analysts. Usually investors that do not have the time to analyse the shares for themselves trust the analysts meaning that it gives a good indication of when the shares will go up or down. The price target regards the share with the largest volume i.e. the share that most people trade with, normally B shares. This covariate was made up of two dummy variables; one for positive price target where a one indicated that the analysts gave a price target larger than the actual price for the share and one for negative price target where a one indicated that analysts gave a price target lower than the actual price for the share.


3.1.11 Oil price

The oil price gives a good indication of how strong the USD is, and the condition of the USD has a big impact on the worldwide stock market. Since the oil price is given in USD/barrel it was converted into SEK/barrel to match the model. Another effect of the oil price is that if it rises, a company’s share value normally decreases, since nearly every company depends on oil for production^[14]. When the oil price rises, the company’s expenses increase and its profit decreases.

3.1.12 Three positive/three negative days in a row

The hypothesis was that when a share moved in one direction for several days it would have a big impact on people’s feelings toward the company. This covariate was made up of two dummy variables: one for three positive days in a row, where a one indicated that the share’s closing price had increased on each of the last three days, and one for three negative days in a row, where a one indicated that it had decreased on each of the last three days.
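One possible way to build these two dummy variables from a series of closing prices is sketched below. The function name `streak_dummies` and the price series are hypothetical, and the alignment (the flag sits on the third day of the run) is an assumption; the text does not say on which day the flag was placed.

```python
def streak_dummies(closes, length=3):
    """Return (up, down) dummy lists aligned with closes.

    up[i] is 1 when the close rose on each of the `length` most recent
    days ending at day i; down[i] is the mirror image for falls.
    """
    up, down = [], []
    for i in range(len(closes)):
        if i < length:
            # Not enough history yet to have `length` daily moves.
            up.append(0)
            down.append(0)
            continue
        window = closes[i - length : i + 1]
        rises = all(b > a for a, b in zip(window, window[1:]))
        falls = all(b < a for a, b in zip(window, window[1:]))
        up.append(int(rises))
        down.append(int(falls))
    return up, down

# Hypothetical closes: three rises, then three falls, then a flat day.
up, down = streak_dummies([10, 11, 12, 13, 12, 11, 10, 10])
```

An unchanged close breaks a streak in this sketch, since neither a strict rise nor a strict fall has occurred.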

3.1.13 P/E ratio

This is a ratio calculated by

$$P/E = \frac{\text{Current share price}}{\text{Earnings/share}} \qquad (3.1)$$

and is a measure of how long it will take to get the investment back, provided that the earnings remain unchanged. The P/E ratio is given in years. In the data source the P/E ratio was given once a year for each company, so the values were linearly interpolated to obtain a value for every day.

3.1.14 Positive/Negative press releases

Whenever a company makes a press release it will have an impact on their shares. These releases can be about letting people go or that the company has just made a huge profitable deal. This covariate was made up of two dummy variables; one for positive press releases where a one indicated that a positive release had been made, and one for negative press releases where a one indicated that a negative release had been made.

3.1.15 Number of completed transactions

This is the number of completed trading transactions that have been made in a company’s share during one day. This number usually increases when a report is about to be released. It gives a good indication of how popular a company’s shares are at the moment.

3.1.16 OMX Stockholm closing price

This is the value of how the total OMX Stockholm exchange has moved during the day.

3.1.17 Split and reversed split

A split is when a share is divided into several shares, and a reversed split is when several shares are merged into one. This covariate was made up of two dummy variables: one for split, where a one indicated that a split had been made, and one for reversed split, where a one indicated that a reversed split had been made.

3.2 Collecting data

The historical data for all of the shares and the press releases were collected from Nasdaqomxnordic^{[1] [2]}.

The data for the conjuncture and TCW index were collected from Ekonomifakta^{[4] [6]}.

The index for Dow Jones was collected from Stloisfed^[8] and for Shanghai from Yahoo finance^[9].

The insider trading data were collected from Finansinstitutionen^[5].

Target prices and P/E ratios were both found at Dagens Industri^[3].

The oil price per barrel was collected from Eia^[10].

The lending rate was collected from Riksbanken^[7].

All of the exchange rates for converting foreign currencies into SEK were collected from Oanda^[11].

The data were then processed in Microsoft Excel 2010^[18] using its built-in function LINEST to perform the regression. MATLAB^[17] and Minitab 16^[19] were used to make the plots.

Chapter 4 Modelling

4.1 Modelling

Before designing the multiple linear regression models a number of covariates had to be evaluated to see what kind of impact they had on the share prices. The modelling process consisted of testing different covariate combinations until a satisfactory result had been achieved. What was learned during the modelling process was that theory put into practice does not always yield a perfect result. Covariate combinations that should have yielded a better result sometimes turned out to be worse. When seven satisfactory multiple linear regression models had been designed, Akaike’s Information Criterion, AIC, was applied to determine which model was most suitable.

4.2 AIC test

Since multiple linear regression modelling often results in several different model designs, it is important to choose the most suitable one. One way of settling on a model is to use Akaike’s Information Criterion, AIC. The AIC provides an index that can be used to determine which of several competing models to go with. The multiple linear regression model with the lowest AIC index should be the model of choice.

The general formula for AIC is

$$AIC(\mathcal{M}_i) = -2\log\left(L(\mathcal{M}_i)\right) + 2\,p(\mathcal{M}_i) \qquad (4.1)$$

where $L(\mathcal{M}_i)$ is the likelihood function of the parameters in model $\mathcal{M}_i$ evaluated at the maximum likelihood estimators and $p$ is the number of covariates in the model. According to stat.stanford^[20] the general formula can be rewritten as

$$AIC(\mathcal{M}) = n\left(\log(2\pi) + \log(SSE(\mathcal{M})) - \log(n)\right) + 2(n + p + 1) \qquad (4.2)$$

where $\mathcal{M}$ stands for the model that is tested, $n$ is the number of observations, $p$ the number of covariates and $SSE$ is the sum of squared errors. The models that were tested and their AIC indexes were the following.
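Formula (4.2) is easy to evaluate directly, as in the sketch below (the function name `aic_from_sse` is hypothetical). Plugging in the first model's n, p and SSE reproduces the reported index 167993,61 when base-10 logarithms and π ≈ 3,14 are used; that is an inference from the reported numbers, not something the text states. The natural-log evaluation, which is what a likelihood-based derivation would normally imply, is computed alongside for comparison.

```python
import math

def aic_from_sse(n, p, sse, log=math.log, pi=math.pi):
    """Evaluate (4.2): n*(log(2*pi) + log(SSE) - log(n)) + 2*(n + p + 1)."""
    return n * (log(2 * pi) + log(sse) - log(n)) + 2 * (n + p + 1)

# First model as reported: n = 45892, p = 17, SSE = 333904.69.
aic_natural = aic_from_sse(45892, 17, 333904.69)
# Assumed settings that match the reported index: log10 and pi = 3.14.
aic_base10 = aic_from_sse(45892, 17, 333904.69, log=math.log10, pi=3.14)
```

The choice of logarithm base rescales the fit term relative to the 2(n + p + 1) penalty, so it can in principle affect how far apart the competing models appear.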

4.2.1 First model

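$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \beta_1(\text{Dow Jones closing price}) + \beta_2(\text{conjuncture}) + \beta_3(\text{TCW}) \\
& + \beta_4(\text{lending rate}) + \beta_5(\text{payday}) + \beta_6(\text{opening price}) + \beta_7(\text{highest price}) \\
& + \beta_8(\text{lowest price}) + \beta_9(\text{positive insider}) + \beta_{10}(\text{negative insider}) + \beta_{11}(\text{reports}) \\
& + \beta_{12}(\text{positive price target}) + \beta_{13}(\text{negative price target}) \\
& + \beta_{14}(\text{Shanghai opening price}) + \beta_{15}(\text{oil price}) \\
& + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row})
\end{aligned}
$$

$$n = 45892, \quad p = 17, \quad SSE(\mathcal{M}_1) = 333904{,}69, \quad AIC(\mathcal{M}_1) = 167993{,}61$$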

4.2.2 Second model

$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \dots + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row}) \\
& + \beta_{18}(\text{positive press}) + \beta_{19}(\text{negative press})
\end{aligned}
$$

where the terms $\beta_0$ to $\beta_{15}$ are the same as in the first model.

$$n = 45892, \quad p = 19, \quad SSE(\mathcal{M}_2) = 333880{,}19, \quad AIC(\mathcal{M}_2) = 167996{,}15$$

4.2.3 Third model

$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \dots + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row}) \\
& + \beta_{18}(\text{positive press}) + \beta_{19}(\text{negative press}) + \beta_{20}(\text{positive report}) \\
& + \beta_{21}(\text{negative report}) + \beta_{22}(\text{dividend}) + \beta_{23}(\text{split}) + \beta_{24}(\text{reversed split})
\end{aligned}
$$

where the terms $\beta_0$ to $\beta_{15}$ are the same as in the first model.

$$n = 45892, \quad p = 24, \quad SSE(\mathcal{M}_3) = 333793{,}87, \quad AIC(\mathcal{M}_3) = 168001{,}00$$

4.2.4 Fourth model

$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \dots + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row}) \\
& + \beta_{18}(\text{number of completed transactions})
\end{aligned}
$$

where the terms $\beta_0$ to $\beta_{15}$ are the same as in the first model.

$$n = 45892, \quad p = 18, \quad SSE(\mathcal{M}_4) = 333818{,}93, \quad AIC(\mathcal{M}_4) = 167990{,}50$$

4.2.5 Fifth model

$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \dots + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row}) \\
& + \beta_{18}(\text{number of completed transactions}) + \beta_{19}(\text{positive press}) + \beta_{20}(\text{negative press})
\end{aligned}
$$

where the terms $\beta_0$ to $\beta_{15}$ are the same as in the first model.

$$n = 45892, \quad p = 20, \quad SSE(\mathcal{M}_5) = 333796{,}81, \quad AIC(\mathcal{M}_5) = 167993{,}17$$

4.2.6 Sixth model

$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \dots + \beta_{12}(\text{positive price target}) + \beta_{13}(\text{negative price target}) \\
& + \beta_{14}(\text{Shanghai opening}) + \beta_{15}(\text{oil price}) + \beta_{16}(\text{three positive days in a row}) \\
& + \beta_{17}(\text{three negative days in a row}) + \beta_{18}(\text{P/E ratio})
\end{aligned}
$$

where the terms $\beta_0$ to $\beta_{11}$ are the same as in the first model.

$$n = 45892, \quad p = 18, \quad SSE(\mathcal{M}_6) = 333903{,}67, \quad AIC(\mathcal{M}_6) = 167995{,}55$$

4.2.7 Seventh model

$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \dots + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row}) \\
& + \beta_{18}(\text{number of completed transactions}) + \beta_{19}(\text{OMX Stockholm opening price})
\end{aligned}
$$

where the terms $\beta_0$ to $\beta_{15}$ are the same as in the first model.

$$n = 45892, \quad p = 19, \quad SSE(\mathcal{M}_7) = 332771{,}86, \quad AIC(\mathcal{M}_7) = 167929{,}88$$

The AIC indexes are summarized below.

Model      AIC index
First      167993,61
Second     167996,15
Third      168001,00
Fourth     167990,50
Fifth      167993,17
Sixth      167995,55
Seventh    167929,88

As mentioned above, the model with the lowest AIC index should be the model of choice. Here the multiple linear regression model with the lowest AIC index is number seven, which is hence the model to go with.
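The selection step itself is a simple minimum over the seven reported indexes, as the short sketch below shows. The values are copied from the summary; the variable names are illustrative only.

```python
# AIC indexes as reported in the summary, keyed by model number.
aic_indexes = {
    1: 167993.61, 2: 167996.15, 3: 168001.00, 4: 167990.50,
    5: 167993.17, 6: 167995.55, 7: 167929.88,
}

# The model of choice is the one with the lowest AIC index.
best_model = min(aic_indexes, key=aic_indexes.get)

# Differences to the best model make the ranking easier to read.
delta = {m: round(a - aic_indexes[best_model], 2) for m, a in aic_indexes.items()}
```

The seventh model is ahead of the runner-up (the fourth) by about 60,6 index points, while the other six models lie within roughly 10,5 points of each other.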

4.3 The final model

The aim was to create a general multiple linear regression model that could be applied to all 44 shares analysed. To achieve this, four years of data (2008-01-02 to 2012-02-21) from every share were collected and inserted into a single Microsoft Excel 2010^[18] sheet. The data made up 45892 observations. Microsoft Excel 2010’s built-in function LINEST was used to perform the regression. The LINEST function returns the value of each $\beta_i$ and the corresponding error term for every covariate used. The final model had the following design.

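$$
\begin{aligned}
\text{Closing price} = {} & \beta_0 + \beta_1(\text{Dow Jones closing price}) + \beta_2(\text{conjuncture}) + \beta_3(\text{TCW}) \\
& + \beta_4(\text{interest rate}) + \beta_5(\text{payday}) + \beta_6(\text{opening price}) + \beta_7(\text{highest value}) \\
& + \beta_8(\text{lowest value}) + \beta_9(\text{positive insider}) + \beta_{10}(\text{negative insider}) + \beta_{11}(\text{reports}) \\
& + \beta_{12}(\text{positive price target}) + \beta_{13}(\text{negative price target}) \\
& + \beta_{14}(\text{Shanghai opening price}) + \beta_{15}(\text{oil price}) \\
& + \beta_{16}(\text{three positive days in a row}) + \beta_{17}(\text{three negative days in a row}) \\
& + \beta_{18}(\text{number of completed transactions}) + \beta_{19}(\text{OMX Stockholm opening price})
\end{aligned}
$$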

β_i-values          Corresponding residuals, e_i
β0   7,33539         e0   1,00779
β1   1,4295·10^-5    e1   3,62151·10^-6
β2   0,01035         e2   0,00240
β3   -0,04664        e3   0,005639
β4   -0,24109        e4   0,01902
β5   0,25977         e5   0,06065
β6   0,96488         e6   0,00398
β7   -0,04321        e7   0,00652
β8   0,07893         e8   0,00659
β9   0,04293         e9   0,06880
β10  0,06856         e10  0,11282
β11  0,26392         e11  0,09203
β12  -0,09910        e12  0,05644
β13  -0,16703        e13  0,09102
β14  0,00024         e14  3,48251·10^-5
β15  0,00076         e15  0,00020
β16  0,02782         e16  0,04033
β17  0,08593         e17  0,03795
β18  1,3626·10^-5    e18  5,17749·10^-6
β19  -0,00433        e19  0,99883

Table 4.1. β-values and corresponding residuals from the regression of the final model (2008-01-02 to 2012-02-21).

The $\beta_i$:s were then used to calculate the estimated $\hat{y}$ (closing price) by the formula

$$\hat{y}_j = \beta_0 + \beta_1 x_{1,j} + \beta_2 x_{2,j} + \dots + \beta_{19} x_{19,j} \qquad (4.3)$$

where $j$ denotes the $j$th row, i.e. the $j$th day predicted, in Microsoft Excel 2010^[18].
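Formula (4.3) is a plain weighted sum, which the following sketch evaluates for one day. The function name `predict_close` is hypothetical and the numbers are toy values chosen for easy checking, not the coefficients of Table 4.1.

```python
def predict_close(betas, covariates):
    """Evaluate (4.3): beta_0 + sum_i beta_i * x_i for one day.

    betas: [beta_0, beta_1, ..., beta_k]; covariates: [x_1, ..., x_k].
    """
    if len(betas) != len(covariates) + 1:
        raise ValueError("expected one more coefficient than covariates")
    return betas[0] + sum(b * x for b, x in zip(betas[1:], covariates))

# Toy numbers: 1 + 0.5 * 10 + 2 * 3 = 12.
y_hat = predict_close([1.0, 0.5, 2.0], [10.0, 3.0])
```

For the final model the coefficient list would hold the twenty β-values and the covariate list the nineteen covariate values for the day in question.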


Chapter 5 Result

5.1 Plots of the residual and R² value

Figure 5.1. Normal probability plot from the regression (2008-01-02 to 2012-02-21).

Figure 5.2. Plot of the residual vs. fits from the regression (2008-01-02 to 2012-02-21).

According to Microsoft Excel 2010^[18] the model had an R² value of 99,88 %.


5.2 Correct predicted directions

The project covered 44 companies on the OMX Stockholm stock exchange Large Cap list. The aim was that the model should predict the direction of the shares correctly more than 50 % of the time, i.e. better than chance. The predicted closing price was compared with the previous day’s closing price to determine whether the prediction was an increase or a decrease. If the real closing price moved in the same direction as the predicted one, it was considered a correct prediction. Table 5.1 displays the result for each share.
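The direction check can be sketched as follows. The function name `direction_hit_rate` and the prices are hypothetical, and the treatment of an unchanged price (counted as "not an increase") is an assumption made for the example.

```python
def direction_hit_rate(predicted, actual):
    """Percentage of days on which predicted and real moves agree.

    predicted[i] and actual[i] are closing prices for day i; each is
    compared with the real close of day i - 1. A move counts as an
    increase only if it is strictly above yesterday's real close.
    """
    hits = 0
    days = 0
    for i in range(1, len(actual)):
        pred_up = predicted[i] > actual[i - 1]
        real_up = actual[i] > actual[i - 1]
        hits += pred_up == real_up
        days += 1
    return 100.0 * hits / days

# Hypothetical prices: days 1 and 2 are called correctly, day 3 is not.
# predicted[0] is never used, since day 0 has no previous close.
rate = direction_hit_rate([0, 101, 99, 103], [100, 102, 100, 99])
```

Two of the three comparable days agree here, giving a hit rate of about 66,7 %.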

Share Correct predicted directions [%]

Alfa Laval 62,8

Alliance Oil 60,0

Assa Abloy 62,0

Atlas Copco 62,4

Axfood 58,0

Axis 55,2

Billerudkorsnäs 56,8

Boliden 68,8

Castellum 58,8

Electrolux 66,0

Elekta 64,4

Ericsson 66,0

Fabege 63,6

Getinge 60,8

H&M 63,6

Hakon invest 59,2

Hexagon 65,2

Holmen 62,4

Hufvudstaden 54,4

Husqvarna 60,8

Industrivärden 62,4

Investor 61,2

Kinnevik 62,0

Latour 57,6

Lundbergföretagen 66,0

Meda 59,6

MTG 58,8

Peab 56,4

Ratos 57,2

SAAB 56,0

Sandvik 70,4

SCA 64,0

Scania 62,0

SEB 65,2

Securitas 56,4

Skanska 68,0

SKF 66,8


SSAB 62,4

Swedbank 62,4

Swedish Match 58,4

TeliaSonera 58,8

Trelleborg 66,4

Volvo 70,8

Wallenstam 55,2

Table 5.1. Correct predicted directions in percentage for each share (2012-02-22 to 2013-02-20).

Each share was over 50 %, which was the project aim, meaning that the model is profitable on a day-by-day basis in the long term. Since the model is intended to be applicable to all shares, an average correct predicted direction was calculated:

$$\text{Average correct predicted direction} = \frac{62{,}8 + 60{,}0 + \dots + 55{,}2}{44} \approx 61{,}72\ \%$$

5.3 Maximum and average deviation

Since the average correct predicted direction alone gives insufficient information on the model’s ability, the maximum and average deviation in percent from the real closing price were calculated for each share. The calculation was done in Microsoft Excel 2010^[18] and the result is displayed in Table 5.2.

Share Maximum deviation [%] Average deviation [%]

Alfa Laval 6,80 1,07

Alliance Oil 10,24 1,52

Assa Abloy 4,17 1,04

Atlas Copco 4,44 1,19

Axfood 4,76 0,79

Axis 7,51 1,39

Billerudkorsnäs 9,76 1,48

Boliden 5,24 1,24

Castellum 4,66 0,82

Electrolux 6,14 1,25

Elekta 11,73 1,15

Ericsson 5,85 1,04

Fabege 4,50 1,05

Getinge 5,60 0,91

H&M 4,53 0,76

Hakon invest 9,79 0,98

Hexagon 6,15 1,38

Holmen 3,73 0,83

Hufvudstaden 4,21 0,80

Husqvarna 11,63 1,30

Industrivärden 5,28 1,11


Investor 4,02 0,83

Kinnevik 6,82 0,83

Latour 5,42 1,19

Lundbergföretagen 4,66 0,82

Meda 4,81 0,85

MTG 29,19 1,47

Peab 5,59 1,36

Ratos 8,81 1,38

SAAB 4,52 1,09

Sandvik 6,27 1,21

SCA 8,10 0,88

Scania 5,11 1,20

SEB 4,68 1,09

Securitas 4,99 1,04

Skanska 3,47 0,82

SKF 8,48 1,10

SSAB 5,57 1,56

Swedbank 5,31 1,01

Swedish Match 6,72 0,91

TeliaSonera 3,78 0,76

Trelleborg 7,02 1,27

Volvo 5,80 1,20

Wallenstam 4,33 1,11

Table 5.2. Maximum and average deviation in percentage for each share (2012-02-22 to 2013-02-20).

The average maximum deviation was 6,60 % and the average deviation was 1,03 %.
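A per-share deviation calculation of this kind can be sketched as below. It is assumed here that the deviation is the absolute difference between predicted and real closing price, expressed as a percentage of the real closing price; the function name `deviation_stats` and the prices are hypothetical.

```python
def deviation_stats(predicted, actual):
    """Return (maximum, average) absolute deviation in percent of the real close."""
    devs = [abs(p - a) / a * 100.0 for p, a in zip(predicted, actual)]
    return max(devs), sum(devs) / len(devs)

# Hypothetical prices giving deviations of 1 %, 2 % and 0 %.
max_dev, avg_dev = deviation_stats([101.0, 196.0, 50.0], [100.0, 200.0, 50.0])
```

Averaging the per-share maxima and the per-share averages over all shares then gives summary figures of the kind quoted above.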

5.4 Investing

To evaluate the model further an investment test was conducted. If 50.000 SEK had been invested in each share, a total investment of 2,2 million SEK, and the model had been applied over the year 2012-02-22 to 2013-02-20, the total yield would have been 259.639 SEK, corresponding to 11,80 %. If the same investment had been made and the shares had not been traded during the same period, i.e. bought on 2012-02-22 and sold on 2013-02-20, the total yield would have been 184.171 SEK, corresponding to 8,37 %. This means that the model would have generated a 40,98 % greater yield that year. Table 5.3 presents each share’s result after the year, both with and without the model.
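The percentages quoted in this paragraph follow directly from the SEK amounts, as the short check below shows. Only the figures stated in the text are used; the variable names are illustrative.

```python
invested = 44 * 50_000      # 2 200 000 SEK in total
yield_model = 259_639       # SEK, trading according to the model
yield_hold = 184_171        # SEK, buying once and holding for the year

pct_model = 100 * yield_model / invested
pct_hold = 100 * yield_hold / invested
# Relative advantage of the model over holding, in percent of the holding yield.
relative_gain = 100 * (yield_model - yield_hold) / yield_hold
```

Note that the 40,98 % figure is relative to the buy-and-hold yield; in terms of the invested capital the difference is 11,80 − 8,37 = 3,43 percentage points.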

Microsoft Excel 2010^[18] was used to calculate each share’s yield. First a prediction column was created, where a one indicated that the predicted closing price was higher than yesterday’s real closing price and a zero indicated that it was lower than or equal to yesterday’s real closing price. The four red columns to the right of the prediction column indicate whether the prediction column goes from zero to one, one to one, one to zero or zero to zero. Each day there was a one in the zero to one column, shares were bought at the real opening price. Each day there was a one in the one to one column, the shares bought earlier were kept and the profit/loss was added to yesterday’s daily value. Each day there was a one in the one to zero column, the shares were sold at the real opening price. Each day there was a one in the zero to zero column, nothing happened since the share was not of interest. For an overview of how the calculation was made, see Figure 5.3.

Figure 5.3. Extract of how the calculation of Sandvik’s yield was made in Microsoft Excel 2010^[18] using the model (2012-02-22 to 2013-02-20).
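The trading rules described above can be sketched in code as follows. This is an illustration under stated assumptions rather than a reproduction of the Excel sheet: the whole capital is invested at each purchase, fractional shares are allowed, transaction costs are not modelled, and only the final value is tracked instead of a daily value. The function name and all prices are hypothetical.

```python
def simulate(pred_close, real_open, real_close, capital=50_000.0):
    """Follow the buy/keep/sell rules for one share and return the final value.

    Index 0 is the day before the test period; it only supplies
    "yesterday's" real closing price for the first prediction.
    """
    cash, shares = capital, 0.0
    prev_signal = 0
    for day in range(1, len(real_close)):
        # One if the predicted close is above yesterday's real close, else zero.
        signal = 1 if pred_close[day] > real_close[day - 1] else 0
        if prev_signal == 0 and signal == 1:
            # Zero to one: buy at the real opening price (assumed: all cash).
            shares, cash = cash / real_open[day], 0.0
        elif prev_signal == 1 and signal == 0:
            # One to zero: sell at the real opening price.
            cash, shares = shares * real_open[day], 0.0
        # One to one: keep the shares. Zero to zero: nothing happens.
        prev_signal = signal
    # Any shares still held are valued at the last real closing price.
    return cash + shares * real_close[-1]


# Hypothetical prices in SEK; the entries at index 0 of pred_close and
# real_open are placeholders that are never read.
real_close = [100.0, 102.0, 101.0, 104.0]
real_open = [0.0, 100.0, 103.0, 101.0]
pred_close = [0.0, 101.0, 103.0, 100.0]

final_value = simulate(pred_close, real_open, real_close)
print(final_value)  # prints 50500.0
```

In this example the share is bought at 100 SEK on the first day, kept on the second and sold at 101 SEK on the third, so 50.000 SEK grows to 50.500 SEK.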

Share; Value of the 50.000 SEK investment after the year using the model [SEK]; Value of the 50.000 SEK investment after the year not using the model [SEK]

Alfa Laval 59970,21 55875,00

Alliance Oil 44225,26 33261,30

Assa Abloy 58206,55 63075,00

Atlas Copco 59086,73 53941,30

Axfood 45873,00 55540,80

Axis 42776,07 44362,50

BillerudKorsnäs 46192,24 52641,75

Boliden 67020,51 47523,00

Castellum 52277,52 55265,85

Electrolux 68260,31 57024,00

Elekta 64516,28 64085,40

Ericsson 57895,17 59969,40

Fabege 59531,77 60455,25

Getinge 48817,93 52796,00

H&M 55803,67 48214,40

Hakon Invest 60942,62 79165,00

Hexagon 75416,53 67369,20

Holmen 43163,71 48588,80

Hufvudstaden 44869,93 60253,00

Husqvarna 68619,69 50751,80

Industrivärden 64668,24 57450,00

Investor 62327,47 63790,50

Kinnevik 53983,50 50832,60

Latour 43729,87 53246,70

Lundbergföretagen 59566,70 58426,00

Meda 50952,34 55803,10

MTG 43150,95 41164,20

Peab 51134,95 48963,84

Ratos 43780,77 37388,25

SAAB 39568,02 52749,60

Sandvik 81024,52 51931,80

SCA 56428,23 68853,60

Scania 55887,09 52899,00

SEB 68538,51 67743,35

Securitas 43642,22 46154,85

Skanska 59587,12 47444,40

SKF 59139,09 47288,50

SSAB 59390,02 35585,55

Swedbank 55984,63 69615,00

Swedish Match 51485,73 42061,60

TeliaSonera 48099,77 45426,15

Trelleborg 72091,30 64696,25

Volvo 57681,16 50453,20

Wallenstam 54331,33 64044,10

Total yield 259.639,23 184.170,89

Table 5.3. Each share’s yield in SEK both using the model and without (2012-02-22 to 2013-02-20).

Figure 5.4 displays the total yield in SEK for all 44 shares using the model.

Figure 5.4. Total yield [SEK] for all 44 shares using the model (2012-02-22 to 2013-02-20).

Chapter 6 Discussion

6.1 Discussion

Since the model was designed in a limited time, some covariates or combinations of covariates may have been overlooked. Despite this, the model gave a yield 40,98 % higher than that obtained without the model during the year 2012-02-22 to 2013-02-20. Because the comparison covers only this single period, it is not statistically established that the model yields a 40,98 % better result every year.

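According to Microsoft Excel 2010^[18] the model has an R² value of 99,88 %. The R² value indicates the proportion of the variation in the closing price that the model explains in the data used for the regression, so a value of 99,88 % indicates a very close fit. It does not by itself measure how well the model predicts responses for new observations.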
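For reference, R² is usually computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses this standard definition, which is assumed here to agree with the value reported by Excel for a model with an intercept; the function name and the prices are hypothetical.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    # Squared differences between observed and fitted values.
    ss_res = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    # Squared differences between observed values and their mean.
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical closing prices in SEK, for illustration only.
observed_close = [100.0, 102.0, 101.0, 105.0]
fitted_close = [100.5, 101.5, 101.5, 104.5]

r2 = r_squared(observed_close, fitted_close)
print(f"R^2 = {r2:.4f}")
```

Here the residual sum of squares is 1 and the total sum of squares is 14, so R² = 1 - 1/14.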

The model was selected using Akaike’s Information Criterion, so among all the models tested the best one according to that criterion has been used. All calculations have been made in computer-based mathematical software, which minimizes the risk of calculation errors. When using extrapolation there is a small chance of obtaining an incorrect result. However, the difference between the extrapolated value and the real value is small and affects the predicted value by only approximately 0,1-0,2 SEK. The conjuncture was linearly interpolated because its values were only given once a month, so the values between two monthly observations were assumed to lie on a straight line.
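The linear interpolation of the monthly conjuncture values can be illustrated as follows. This is a minimal sketch, assuming equally spaced trading days between two monthly observations; the function name and the values are hypothetical.

```python
def interpolate_daily(value_start, value_end, n_days):
    """Linearly interpolate from value_start (day 0) to value_end (day n_days).

    Returns n_days + 1 values, including both monthly end points.
    """
    if n_days < 1:
        raise ValueError("n_days must be at least 1")
    step = (value_end - value_start) / n_days
    return [value_start + step * day for day in range(n_days + 1)]


# Hypothetical monthly readings, only 4 trading days apart for brevity.
daily = interpolate_daily(100.0, 102.0, 4)
print(daily)  # prints [100.0, 100.5, 101.0, 101.5, 102.0]
```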

A general model was designed because it is intended to be used as a day trading guideline in the mornings, when the opening price is known. It is important that the calculations can be completed quickly so that decisions about buying or selling shares can be made fast. This is where a general model has an advantage over several tailor-made models. There is also safety in numbers when it comes to share trading.

Four years (2008-01-02 to 2012-02-21) of data for every share were collected and inserted into a single Microsoft Excel 2010^[18] sheet. In total the data comprised 45892 observations.

Microsoft Excel 2010’s^[18] built-in function LINEST was used to perform the regression. The LINEST function gives the value of each β_i and the corresponding standard error for every covariate used. Data was collected for four years because more observations make the model more reliable. Another reason was to include the worldwide stock market crash of 2008. The β-values for the TCW index, conjuncture, oil price, Dow Jones index, Shanghai index and OMX Stockholm index should then be more accurate, which makes the model more reliable.
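For readers without Excel, the least-squares fit behind such a regression can be sketched in plain Python by solving the normal equations. This is not the thesis’s actual computation: the function name is hypothetical, the data is made up, and standard errors are not computed here.

```python
def fit_linear(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: one list of covariate values per observation; an intercept
    column of ones is prepended. Returns [b0, b1, ..., bk].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up data generated from y = 2 + 3*x1 - 1*x2, so the fit should
# recover these coefficients up to rounding error.
covariates = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
response = [3, 7, 7, 11, 12]

betas = fit_linear(covariates, response)
print([round(b, 6) for b in betas])
```

With many covariates and tens of thousands of observations a numerical library would normally be preferred; the normal equations are used here only to keep the example self-contained.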

Another approach to predicting a share’s closing price is to use Markov processes. Unlike multiple linear regression, which analyses historical data, a Markov process uses only the present state to predict the future. If share prices are memoryless, i.e. not affected by yesterday’s cyclic movement, it is possible that Markov processes would have yielded a better result. The reason why multiple linear regression nevertheless yields such a good result could be that four years of historical data from 44 shares capture people’s irrational behaviour, which is the main reason why shares are considered random and unpredictable.

Chapter 7 Appendix

7.1 Predicted closing price compared to real closing price

Figure 7.1. Predicted vs. real closing price for Alfa Laval (2012-02-22 to 2013-02-20).

Figure 7.2. Predicted vs. real closing price for Alliance Oil (2012-02-22 to 2013-02-20).

Figure 7.3. Predicted vs. real closing price for Assa Abloy (2012-02-22 to 2013-02-20).

Figure 7.4. Predicted vs. real closing price for Atlas Copco (2012-02-22 to 2013-02-20).

Figure 7.5. Predicted vs. real closing price for Axfood (2012-02-22 to 2013-02-20).

Figure 7.6. Predicted vs. real closing price for Axis (2012-02-22 to 2013-02-20).

Figure 7.7. Predicted vs. real closing price for BillerudKorsnäs (2012-02-22 to 2013-02-20).

Figure 7.8. Predicted vs. real closing price for Boliden (2012-02-22 to 2013-02-20).

Figure 7.9. Predicted vs. real closing price for Castellum (2012-02-22 to 2013-02-20).

Figure 7.10. Predicted vs. real closing price for Electrolux (2012-02-22 to 2013-02-20).

Figure 7.11. Predicted vs. real closing price for Elekta (2012-02-22 to 2013-02-20).

Figure 7.12. Predicted vs. real closing price for Ericsson (2012-02-22 to 2013-02-20).

[Plots not reproduced. In each figure the x-axis is Day (0 to 250) and the y-axis is Closing price [SEK]; the two series are the predicted value and the real value.]
