Swedish Stock market: Explaining trade volumes in single stocks

Bachelor's Thesis, Mathematical Statistics, KTH Royal Institute of Technology

Author: Jesper Sevelin

Supervisor: Thomas Önskog


Abstract

Title: Swedish Stock market: Explaining trade volumes in single stocks

Author: Jesper Sevelin

Supervisor: Thomas Önskog

Department: Department of Mathematics

The Swedish stock market consists of roughly 750 companies listed on five different markets. Out of all those companies a significant portion are rarely traded. Stocks where the trading activity is low not only present a liquidity problem to shareholders and potential investors but also affect the reputation of the traded company. A company whose shares are not actively traded does not have a market that actively puts a value on the company.

This study aims to interpret how daily trade volumes can be explained by both categorical and numerical variables associated with the companies listed in Sweden.

This study, contrary to popular belief, shows that the market of the listed stock is to a large degree irrelevant when explaining daily trade volumes of the stocks listed in Sweden. The study instead reveals the importance of factors such as shareholder structure, free float and number of outstanding shares in a company.

2017-06-09


Summary (Sammanfattning)

Title: Swedish Stock market: Explaining trade volumes in single stocks

Author: Jesper Sevelin

Supervisor: Thomas Önskog

Department: Department of Mathematics

The Swedish stock market consists of about 750 companies, listed on five different marketplaces. A large share of these companies find that their stock is rarely traded. Stocks with low trading activity not only present a liquidity problem for shareholders and potential investors but can also affect the reputation of the underlying company. A company whose shares are rarely traded does not have an active market that puts a value on the company.

The study focuses on understanding how daily trade volumes can be explained in terms of both categorical and numerical variables associated with the companies listed on an exchange list or equivalent in Sweden.

The study, contrary to what many believe, shows that the marketplace of a company's stock is to a large degree irrelevant for explaining the daily trade volumes of the stocks that are listed in Sweden. The study instead shows the importance of shareholder structure, free float and the number of outstanding shares in the company.

2017-06-09


Contents

1 Introduction
1.1 The Swedish Stock Market
1.2 Becoming a listed company
1.3 Daily trade volumes
1.4 Research Question
1.5 The purpose
1.6 Scope of research
2 Theory
2.0 Regression analysis
2.1 The regression model
2.2 Ordinary least squares
2.3 The assumptions of OLS
2.3.0 Strict exogeneity
2.3.1 Homoscedasticity
2.3.2 No autocorrelation
2.3.3 Normality, The Q-Q plot
2.3.5 No multicollinearity
2.4 Selecting and validating models
2.4.1 Hypothesis testing, p-values and the t-statistic
2.4.2 F-Test
2.4.3 R-Squared
2.4.4 Variance Inflation Factors
2.4.5 Akaike Information Criterion
2.5 Data selection
2.5.1 Variables
2.6 Transformation of variables
3 Methodology
3.1 Software
3.2 Variables
3.3 Base models
3.4 Data collection
3.5 Building the models, transformations
3.5.1 The dependent variables
3.5.2 The independent variables
3.6 Variable Selection
3.6.1 Akaike Information Criterion
3.6.2 Performing the regressions
3.6.3 Stepwise AIC
3.7 Validating the reduced models by computing R2 and adjusted R2
3.8 The final models
3.8.1 Final model 1
3.8.2 Final model 2
3.8.3 Final model 3
3.9 Reviewing residual and Q-Q plots of the final models
3.10 Checking for Multicollinearity
3.10.1 Computing the Variance Inflation Factors
3.11 F-statistic of the final models
4 Result
4.1 Market
4.2 If the company pays dividends
4.3 Number of outstanding shares
4.4 Number of shareholders
4.5 Market capitalization
4.6 Number of forum posts
4.7 Free-float
4.8 Conclusions on hypothesis testing
4.8.1 Hypothesis testing, Trades
4.8.2 Hypothesis testing, Turnover
4.8.3 Hypothesis testing, Total volume
5 Discussion
5.1 Commenting the results
5.2 Model accuracy
5.3 Relevance to companies that are listed or planning on becoming listed
5.4 Further research
7 References


1 Introduction

1.1 The Swedish Stock Market

The Swedish stock market consists of approximately 750 companies with their common stock listed on one of five different markets. One of the largest markets, NASDAQ OMX, is popularly referred to as “The Main market” [1]. The other markets are First North, Aktietorget, NGM Equity and NGM MTF. The Main market and NGM Equity are known as regulated markets. The other markets come with less strict regulations, suitable for less mature companies that want to operate in a listed environment, and are commonly referred to as growth markets.

1.2 Becoming a listed company

There are various reasons for why a company decides to become publicly traded. One of several reasons for a company to seek a public listing is to have a greater access to the capital market.

By performing an initial public offering a company can raise capital while gaining new shareholders. An initial public offering is most often referred to as an IPO and involves on one side a company that offers its shares to the public and on the other side a market where the shares will be listed. When the IPO has been successfully executed the company has raised some capital and the new shareholders can offer to buy or sell shares through the market.

As a market listing often comes with rigorous regulations and transparency requirements the listing itself is often seen as a certificate of quality.

1.3 Daily trade volumes

There are different ways to measure the traded volumes of listed shares. Three common practices are to look at the number of trades, the cash turnover or the total number of shares traded. Earlier studies on the subject show how specific stock characteristics can be relevant to the trading activity. A paper published by two MIT students concludes that risk, size, price, trading costs and whether the stock was part of the S&P 500 index were relevant factors affecting turnover in single stocks. [2]

Other studies have been performed, showing that for companies listed on the Tehran Stock Exchange the free-float percentage has a linear relationship with the stock turnover ratio. [3]

While these studies provide valuable information regarding the behaviour of trade volumes they are often limited to analysis of purely numerical predictor variables that are characteristics of the stock, not the company.

Daily trade volumes are a common reference when discussing the interest for a company and low volumes are a cause of concern. It is not uncommon for companies with low trade volumes to debate the cost of maintaining an active listing. A company that is rarely traded will find it to be more difficult raising capital from the market and does not have a market that actively puts a value on the company. For some companies this is often used as a cause, among others, to delist their shares. Other companies maintain a high ambition of remaining as a listed company but are also struggling with the same issues related to low trade volumes.

1.4 Research Question

By performing multiple regression analysis this study seeks to interpret how variables associated with the listed companies in Sweden affect the daily trade volumes of their stock, and by doing so to answer the following question:

What variables explain trade volumes and what is their effect?


1.5 The purpose

The results derived from this study will aid companies that seek to improve the trade volumes of their listed stocks. It will in addition to this provide valuable information to companies prior to initial public offerings to analyze whether the company is sufficiently mature to become a listed company.

1.6 Scope of research

This study encompasses all companies that are listed on a stock market in Sweden.

2 Theory

2.0 Regression analysis

Regression analysis is a process within statistics where one seeks to estimate the relation between a variable 𝑦, a response variable, and the values of other variables, called covariates.

The covariates can also be referred to as “explanatory variables” as they are used to explain the outcome of 𝑦. A residual is also present, it consists of the unexplained part; that is, the difference between the true outcome of 𝑦 and the estimated outcome derived from the covariates. [4]

Regression analysis can be a powerful tool when one seeks to investigate how two or more variables are related to each other. It is employed in numerous fields, one of importance being finance.

2.1 The regression model

The linear regression model can be defined as:

$$y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \quad i = 1, \dots, n \tag{1}$$

The subscript 𝑖 refers to an observation while the subscript 𝑗 refers to the independent variable (covariate) of number 𝑗. Consequently 𝑥𝑖𝑗 is observation 𝑖 on covariate 𝑗. The betas 𝛽𝑗 are the unknown parameters that are to be estimated. In the equation 𝛽0 is the intercept, i.e. where the regression line intersects the 𝑦-axis.

The assumption is that the explanatory variables (the covariates) influence the dependent variable, and not necessarily the other way around. This is the basis of a structural interpretation. [4]

The regression model contains multiple explanatory variables and is therefore a multiple linear regression model.

2.2 Ordinary least squares

One procedure, among many, to estimate the coefficients 𝛽𝑗 in equation (1) is called ordinary least squares (“OLS”). This study only employs least squares for the estimation of coefficients.

To illustrate the OLS estimates, equation (1) can be written in matrix notation:

$$y = X\beta + e \tag{2}$$

where 𝑋 is the matrix containing the independent variables, 𝑦 and 𝑒 are two vectors of equal size. The OLS estimate seeks to minimize the sum of squares:

$$\hat{e}^{T}\hat{e} = |\hat{e}|^{2} \tag{3}$$

With reference to (2) and (3), and in accordance with the normal equations, the OLS estimate is:

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y \tag{4}$$

where the ˆ notation refers to estimated values. [4]
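The estimate in equation (4) can be illustrated with a minimal sketch in plain Python. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the toy data are assumptions made for illustration and are not part of the study, which ran its regressions in R. The sketch solves the normal equations $(X^{T}X)\hat{\beta} = X^{T}y$ by Gaussian elimination instead of forming the inverse explicitly.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def ols(X, y):
    """Return beta_hat from the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data (made up): the column of ones corresponds to the intercept beta_0.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]  # generated exactly as y = 1 + 2x
beta_hat = ols(X, y)
```

Because the toy response is an exact linear function of the covariate, `beta_hat` recovers the intercept and slope used to generate it, and the residuals are zero.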

2.3 The assumptions of OLS

In order for OLS to be an efficient estimator several assumptions are usually made. Some of these assumptions are described below:

2.3.0 Strict exogeneity

Strict exogeneity refers to the assumption:

$$E(e_i \mid X) = 0 \tag{5}$$

That is, the conditional mean of the regression errors, given the covariates, is zero.

A violation of this is called “endogeneity” and introduces a problem where the OLS estimates can become invalid. Endogeneity arises when the residuals are correlated with one or several of the covariates. OLS assumes the opposite: that no correlation between residuals and covariates exists.

This is a common concern within the field of econometrics where supply and demand models are often used. As an example, the independent variable “price” changes as the dependent variable “demand” changes, and “demand” also changes as a result of different prices.

The residual instead contains covariates that explain the simultaneous relationship, an example being “Advertising”. The solution is often to introduce a variable, for this purpose called an “instrumental variable”, that is uncorrelated with the residual but correlated with the problematic covariate. [4]

2.3.1 Homoscedasticity

Homoscedasticity is defined as the presence of the same variance, as the name suggests. With reference to Equation (1), it means that the error terms 𝑒𝑖 have the same standard deviation. This is also assumed if OLS is to be the best estimator of 𝛽 in Equation (2). As homoscedasticity is assumed, a violation of it can be of concern. When homoscedasticity cannot be observed it is instead called heteroscedasticity. [4]

Heteroscedasticity can be discovered by for example plotting the residuals against the fitted values of a model. This is illustrated below. For homoscedasticity the residuals should be randomly distributed along the red fitted line. The model to the left can be suspected of being heteroscedastic while the model to the right is close to homoscedastic.

Figure 1. For a set of data the residuals are plotted against the fitted values. To the left heteroscedasticity can be observed. To the right the data is closer to homoscedastic.


2.3.2 No autocorrelation

Autocorrelation refers to the situation where the error terms between observations are correlated. For OLS it is assumed that these error terms are uncorrelated. Autocorrelation is common when dealing with time-series data, or overall when dealing with measurements over time.

2.3.3 Normality, The Q-Q plot

Q-Q plots help us assess whether data points are derived from a given theoretical distribution, for example whether they are normally distributed or not. To test this the quantiles from a set of data are scatter-plotted against the quantiles of a normal distribution. If both sets of quantiles come from the same distribution, in this case the normal distribution, the points should lie on a line, specifically 𝑦 = 𝑥. If the data points depart significantly from the line it indicates that the data set tested is not normally distributed. If this departure is seen when using the Q-Q plot to assess the residuals it can be problematic, as normality of the residuals is often used as an assumption for OLS. [6]
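The construction of the points in a normal Q-Q plot can be sketched as follows. The function name `qq_points`, the plotting positions (i - 0.5)/n and the residual values are assumptions chosen for illustration. The sample is standardized first so that the comparison line is 𝑦 = 𝑥.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each ordered, standardized observation with a standard-normal quantile."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    standardized = sorted((v - m) / s for v in sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n avoid the infinite quantiles at 0 and 1.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


# Made-up residuals, used only to demonstrate the construction.
residuals = [-1.2, -0.7, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 1.1]
points = qq_points(residuals)
```

Each pair in `points` is one dot in the plot: the first element is the theoretical quantile and the second the observed one. Points far from the line 𝑦 = 𝑥, typically in the tails, suggest a departure from normality.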

2.3.5 No multicollinearity

Multicollinearity is the presence of correlation between two or more regression variables.

When a strong multicollinearity is present the standard errors of the estimated coefficients become large and therefore the estimate becomes inaccurate. [4]

There are ways to test for multicollinearity and one option is to compute the variance inflation factors, which will be described in section 2.4.4.

2.4 Selecting and validating models

2.4.1 Hypothesis testing, p-values and the t-statistic

Hypothesis testing is a common phrase in statistics and refers to the process of testing a hypothesis that is formulated for the relationship between data and then compared to another hypothesis that proposes that the relationship is non-existent, also called the null hypothesis.

The null hypothesis can be denoted as:

$$H_0: \beta = a \tag{6}$$

Where 𝑎 can take on any value; however, for this study where one seeks to prove that the regression coefficients 𝛽 are non-zero the value of 𝑎 would be set to 0.

The statistician who seeks to prove that a relationship exists assumes the following hypothesis:

$$H_1: \beta \neq a \tag{7}$$

If a type 1 error, that is, rejecting the null hypothesis despite it being true, is accepted to occur with a certain probability α, then any test of the hypothesis 𝛽 = 𝑎 whose p-value is at most α leads to the rejection of the null hypothesis in favour of the alternative hypothesis 𝛽 ≠ 𝑎. [4]

A test statistic that is routinely reported by statistics software is the t-statistic. The t-statistic is a ratio that, in its classical form, indicates whether the means of two groups are equal or not. If the two means are equal the null hypothesis holds. Given the above definition of the null hypothesis with 𝑎 = 0, the t-statistic can be defined as:

$$t = \frac{\hat{\beta}}{SE(\hat{\beta})} \tag{8}$$

where 𝑆𝐸(𝛽̂) refers to the estimated standard error of the coefficient estimate 𝛽̂. [7]

A relevant question to ask for each coefficient 𝛽 is: under the assumption that the null hypothesis is true, what is the probability of obtaining an estimate that is equal to or more extreme than the one actually observed? This value is called the “p-value” and becomes a measure of significance. The p-value can be obtained from the t-distribution, given the degrees of freedom and the obtained t-value. The threshold where the p-value is said to indicate non-significance for a variable is pre-defined and depends on what level of significance is chosen.

Usually a 5 % level is used and this is the level that was used throughout the study. [8]
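As a rough check of equation (8), the sketch below recomputes two t-values from the estimates and standard errors reported for log-model 1 in section 3.6.2.1. The function names are hypothetical. The p-value uses a standard normal approximation to the t-distribution, which is an assumption that is only reasonable when the degrees of freedom are large, so it will not match the reported p-values exactly.

```python
from statistics import NormalDist


def t_statistic(estimate, std_error, a=0.0):
    """t = (beta_hat - a) / SE(beta_hat); a = 0 gives equation (8)."""
    return (estimate - a) / std_error


def two_sided_p_normal(t):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Estimate and standard error as reported for log-model 1.
t_marketcap = t_statistic(0.269349, 0.028180)  # Log(marketcap)
t_inlforum = t_statistic(0.007051, 0.003367)   # inlforum
p_inlforum = two_sided_p_normal(t_inlforum)
```

The recomputed t-values agree with the reported 9.558 and 2.094 to three decimals, and the approximate p-value for “inlforum” falls below the 5 % level, in line with the reported result.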

2.4.2 F-Test

While the t-statistic, as described in 2.4.1, is employed to test individual covariates, the F-test refers to a similar procedure where several covariates can be assessed at the same time. If desired the whole model can be tested at once. In terms of hypothesis testing the F-test can be formulated as a method of evaluating whether the hypothesis that several (𝑟) coefficients, or all coefficients, are equal to zero holds, that is, whether the null hypothesis is true.

It is defined as:

$$F = \frac{1}{r}\hat{\beta}^{T}\hat{V}^{-1}\hat{\beta} \tag{9}$$

Where 𝑉̂ is the estimated covariance matrix and 𝑟 the number of parameters in 𝛽̂.

As with the t-test, the p-value for F can be calculated from the F-distribution and compared with the significance level of choice. [4]

The F-test where all coefficients are tested is sometimes referred to as a test of “overall significance”. The implication of testing all coefficients is that one investigates a null hypothesis stating that the fit of the model only containing the intercept is equal to the full model.

2.4.3 R-Squared

R² is commonly referred to as the “coefficient of determination” and provides a value for the goodness of fit of the regression model. It is defined as:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}} \tag{10}$$

where 𝑦̅ is the sample mean of the observed values 𝑦𝑖 and 𝑦̂𝑖 are the fitted values from the regression.

The value of R² can be seen as the proportion of the variance in the dependent variable that is explained by the independent variables. [9]

An adjusted R² can be computed, which penalizes the addition of independent variables to the model.
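A small sketch of equation (10), together with the usual adjusted variant $1 - (1 - R^{2})\frac{n - 1}{n - k - 1}$, where 𝑘 is the number of covariates excluding the intercept. The function names and the numbers are made up for illustration, and the fitted values are not the output of an actual regression.

```python
def r_squared(y, y_hat):
    """Coefficient of determination as in equation (10)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of covariates k (intercept excluded)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Made-up observations and fitted values.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n=len(y), k=1)
```

With these numbers the residual sum of squares is 0.10 and the total sum of squares is 10, so R² is 0.99, and the adjusted value is slightly lower because of the penalty term.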

2.4.4 Variance Inflation Factors

One way to compare multicollinearity between models is to compute the variance inflation factors (VIF) of each model. The VIF provides a measure of how much the variance of an estimated coefficient is inflated because of collinearity with other variables [10]. The principle behind calculating the VIF values is to perform multiple regressions, using each independent variable as a dependent variable and running the regression with the remaining independent variables. If one of the regressions yields a high R² value it indicates that much of the variance of that variable can be expected to originate from the other independent variables, and consequently multicollinearity is present. The variance inflation factor can be calculated as:

$$VIF_j = \frac{1}{1 - R_j^{2}} \tag{11}$$

The 𝑅𝑗² values originate from the regression of 𝑋𝑗 as dependent variable with all other variables as explanatory, with R² as defined in section 2.4.3. The resulting number can be used as a measure of multicollinearity, with a high value indicating high collinearity. There are several rules of thumb as to at what value the VIF is a cause of concern. One rule of thumb is a VIF value exceeding 10.

However, VIF values should not alone be used to determine the design of the model. High VIF values far exceeding 10 do not necessarily warrant the removal of one or more correlated variables. Instead, other factors that affect the variance of the coefficients need to be considered. [11]
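To make the rule of thumb concrete, equation (11) can be evaluated for a few values of $R_j^{2}$. The values are chosen only for illustration.

```latex
\begin{aligned}
R_j^{2} = 0.50 &\;\Rightarrow\; VIF_j = \frac{1}{1 - 0.50} = 2 \\
R_j^{2} = 0.90 &\;\Rightarrow\; VIF_j = \frac{1}{1 - 0.90} = 10 \\
R_j^{2} = 0.99 &\;\Rightarrow\; VIF_j = \frac{1}{1 - 0.99} = 100
\end{aligned}
```

A threshold of 10 therefore corresponds to a covariate for which 90 % of the variance is explained by the other covariates.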

2.4.5 Akaike Information Criterion

Comparing and selecting models can be done with guidance from the Akaike Information Criterion test where the best model is considered the one that minimizes:

$$AIC = n \log(|\hat{e}|^{2}) + 2k \tag{12}$$

Where n is the number of observations and k is the number of coefficients of the variables. The symbol ê refers to the residuals and can be defined by rewriting Equation (2):

$$\hat{e} = y - X\hat{\beta} \tag{13}$$

where the hat notation refers to the estimated values from the regression. [4]
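Equation (12) can be sketched directly from a vector of residuals. The function name `aic` and the residual values are assumptions for illustration. The comparison mirrors how a full and a reduced model are compared: a negative change in AIC favours the reduced model.

```python
import math


def aic(residuals, k):
    """AIC = n * log(|e_hat|^2) + 2k, following equation (12)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)
    return n * math.log(rss) + 2 * k


# Made-up residuals: the reduced model fits slightly worse but has one coefficient less.
aic_full = aic([0.5, -0.4, 0.3, -0.6, 0.2], k=3)
aic_reduced = aic([0.5, -0.5, 0.3, -0.6, 0.3], k=2)
delta = aic_reduced - aic_full
```

Here `delta` is negative, so dropping the coefficient lowers the AIC even though the residual sum of squares increases slightly.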

2.5 Data selection

Data that are believed to have an effect on the dependent variable need to be selected. For this study, variables that are intuitively relevant have been selected together with unconventional variables that are also believed to have an effect on the daily trade volumes of a stock.

2.5.1 Variables

In regression analysis one often distinguishes between two types of variables, continuous and categorical. Categorical variables are often treated as binary variables in regression analysis and are therefore assigned the value 1 or 0, indicating whether the observation belongs to the category. They are often referred to as dummy variables. Continuous variables can take on any value.
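The coding of a categorical variable into dummy variables can be sketched as below, using the market names from the variable list in section 3.2 with NASDAQ as the benchmark category. The function name `market_dummies` is hypothetical. The benchmark gets no dummy of its own, since a full set of dummies would be perfectly collinear with the intercept.

```python
# Market dummies used in the study; NASDAQ is the benchmark and has no column.
MARKETS = ["firstnorth", "aktietorget", "ngm", "nordicmtf"]


def market_dummies(market):
    """Return 0/1 indicators; a company on the benchmark market gets all zeros."""
    return {name: int(market == name) for name in MARKETS}


row_first_north = market_dummies("firstnorth")
row_nasdaq = market_dummies("NASDAQ")
```

The coefficient of each dummy is then read as the difference relative to a company listed on the benchmark market, all else equal.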

2.6 Transformation of variables

The idea behind transforming variables can be to get a better fit and sometimes for the purposes of interpretation.

An example where a variable is logarithmically transformed for the purpose of interpretation is when it is suitable to describe the impact of an explanatory variable as a percentage change rather than a numeric amount [4].
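The percentage interpretation can be written out for the two cases that arise when the dependent variable is logarithmized. This is a standard derivation added for illustration, with 𝑥 denoting a generic covariate.

```latex
\begin{aligned}
\log y = \beta_0 + \beta_1 \log x + e
  &\;\Rightarrow\; \frac{\partial \log y}{\partial \log x} = \beta_1
  &&\text{(a 1 \% increase in } x \text{ changes } y \text{ by about } \beta_1\ \%\text{)} \\
\log y = \beta_0 + \beta_1 x + e
  &\;\Rightarrow\; \frac{y(x + 1)}{y(x)} = e^{\beta_1}
  &&\text{(a one-unit increase in } x \text{ changes } y \text{ by } 100\,(e^{\beta_1} - 1)\ \%\text{)}
\end{aligned}
```

The first case applies to covariates that are themselves log transformed, the second to untransformed covariates such as dummy variables.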

A logarithmic transformation of variables can also be used to make highly skewed data come closer to a normal distribution. Skewed data can be an indication that a nonlinear relationship exists between the dependent and independent variable. In order to preserve the linear model the skewed variable can be replaced with the logarithm of that variable. [5] Highly skewed data might also have multiple outliers among the data points and these points can have a severe leverage effect on the slope of linear trend lines. After logarithmic transformation the data, including the outliers, are moved closer to each other.
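The effect of the transformation on skewness can be illustrated with a short sketch. The function name `skewness` (the moment-based sample skewness) and the data are assumptions for illustration and are not values from the study.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2^(3/2)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, heavily right-skewed data with two large outliers.
raw = [3, 5, 8, 12, 20, 35, 60, 150, 900, 4000]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
```

The skewness of the logged values is much closer to zero than that of the raw values, because the logarithm pulls the outliers towards the bulk of the data.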

3 Methodology

3.1 Software

The collected data was compiled and processed in Microsoft Excel. The processed data was saved as comma separated files (.csv). The programming language for statistical computing R was used with R-Studio as the integrated development environment. Consequently R-Studio was used to extract data from the comma separated data files and run the regressions.

3.2 Variables

The variables used in this study are shown below:

Variable                    Type                   Description
Trades                      Dependent, continuous  The average amount of daily trades
Turnover                    Dependent, continuous  The average daily cash volume traded
Totalvolume                 Dependent, continuous  The average daily amount of shares traded
dividend                    Categorical            If the company is paying dividends
NASDAQ (used as benchmark)  Categorical            If the company is listed on the NASDAQ main market
firstnorth                  Categorical            If the company is listed on First North
aktietorget                 Categorical            If the company is listed on Aktietorget
ngm                         Categorical            If the company is listed on NGM Equity
nordicmtf                   Categorical            If the company is listed on NGM MTF
inlforum                    Continuous             Amount of forum posts
shareholders                Continuous             Amount of shareholders in the company
marketcap                   Continuous             The market capitalization of the company
freefloat                   Continuous             The free-float of the company
NumberShares                Continuous             The number of outstanding shares

Table 1. The variables that are used and referred to throughout this paper

3.3 Base models

Trade volumes have been measured by three response variables: turnover, total volume and number of trades (for definitions see Table 1). These three measured variables are the most quoted statistics when gathering information from a stock market on the trading activity of a stock. They are related but tell slightly different things and will therefore be treated as three separate response variables. Three base models, one for each dependent variable, have been computed. The three models are run with the same covariates, as specified in Table 1.

One of the models with response variable “Trades” is shown below:

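$$\begin{aligned}
\text{Trades} ={} & B_1(\text{aktietorget}) + B_2(\text{dividend}) + B_3(\text{Firstnorth}) + B_4(\text{ngm}) \\
& + B_5(\text{nordicmtf}) + B_6(\text{inlforum}) + B_7(\text{shareholders}) + B_8(\text{marketcap}) \\
& + B_9(\text{freefloat}) + B_{10}(\text{NumberShares}) + B_{11}(\text{Largecap}) + B_{12}(\text{Midcap}) + B_{13}(\text{Smallcap})
\end{aligned}$$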

3.4 Data collection

Data for the dependent variables turnover, total volume and trades has been retrieved from the stock markets NASDAQ OMX Nordic, First North, Aktietorget and Nordic Growth Market. The data refers to daily records of each company stock from 1 January 2016 to 6 March 2017, which corresponds to approximately 300 days of trading. For each stock the average daily turnover, volume and trades were extracted.

Data for the independent variables was to a large degree collected through the Avanza Stock Filter, which is available online. Free-float was calculated based on the shareholder structure which each listed company updates quarterly. Information on the number of shareholders was retrieved by visiting the central securities depository in Stockholm.

3.5 Building the models, transformations

As described in section 2.6 the transformation of either the dependent variable or one or several of the independent variables can prove to be useful. This section takes a look at the variables described in Table 1.

3.5.1 The dependent variables

Trades, turnover and volume are the dependent variables of the three regression models used for this study. As the dependent variables can only take on positive values they should intuitively be logarithmized [4]. Another argument for using the log of a variable is that its distribution is skewed. A way to illustrate the distribution of data is to plot a histogram of the variable of choice. A histogram of the dependent variable “Trades” is shown below:

Figure 2. The effect of logarithmic transformation of the dependent variable "Trades". As observed a heavily positively skewed distribution of data can come close to a normal distribution after transformation

As visible in Figure 2 the data is heavily skewed and can be difficult to fit with the linear regression models. This is mitigated by using the logarithm of the affected variable. By using the logarithm on skewed data a more even distribution can be expected. As visualized in Figure 2 the logarithm of “Trades” comes close to a normal distribution.

The same transformation was considered and performed for “Turnover” and “Totalvolume”.


3.5.2 The independent variables

The quantitative variables “free float”, “number of shareholders”, “marketcap” and “number of shares” are reviewed and some are transformed, as shown in Figure 3. A heavy positive skew can be observed in all variables except “free float”, which is left untouched. It is also easily observed that the three remaining covariates take on a wide range of values from low to very high, which in itself warrants the use of logarithmic transforms.

Figure 3. The independent variables before (to the left) and after (to the right) logarithmic transformation

These variables, with the exception of free-float, will be considered appropriate for log transformations.

3.6 Variable Selection

3.6.1 Akaike Information Criterion

Based on the observations in Figures 2 and 3 the dependent variables and three of the independent variables are log transformed.

A test using the Akaike information criterion is performed to explore whether a reduced model is preferred. The results are shown below, with the following reference number for each variable:

1: log(marketcap). 2: log(NumberShares). 3: log(shareholders). 4: freefloat. 5: dividend. 6: inlforum. 7: ngm. 8: nordicmtf. 9: firstnorth. 10: aktietorget.

As an example, in the third column the value “-1” represents the removal of “log(marketcap)”. The values under “-1” refer to the change in AIC compared to the AIC value of the full model.

Model        Full model (AIC)  -1     -2     -3     -4     -5     -6    -7    -8     -9     -10
Log-model 1  1567.6            84.7   22.7   152.4  62.7   10.7   2.44  12.9  -1.8   -1.9   -2
Log-model 2  1828.4            223.6  7.3    97.3   66.6   0.949  2.98  9.88  -1.96  -1.92  -2
Log-model 3  1676.2            41.4   516.6  77.8   101.4  7.4    1.1   3.4   -1.1   -1.8   -1.5

Table 2. AIC test performed in order to test whether a reduced model is preferred.

The values in Table 2 represent the action of reducing the model, one variable at a time, all others included. The values are the change in AIC upon the removal of that variable. A positive value means a higher AIC sum for the model, indicating that the variable should be kept in the model.

As visible it is warranted to keep the log-transformed independent variables. Some benefits are indicated as to the removal of “nordicmtf”, “firstnorth” and “aktietorget”. However, the removal of these variables only reduces the Akaike values by a relatively small amount. Individual p-values will be reviewed when the regressions are performed and will decide on the elimination of the aforementioned variables from the models. Upon the potential removal of these variables the reduced models will also be reviewed by computing the AIC value stepwise.

3.6.2 Performing the regressions

With the new log-models the regressions are run and the estimates and individual p-values for the coefficients are presented. In this section the coefficients will be reviewed in terms of p-values and a stepwise AIC test will be performed.

3.6.2.1 Regression from log-model 1

Log-model 1 Coefficients:

                   Estimate   Std. Error  t value  p-value
(Intercept)        -5.883915  0.455837    -12.908  < 2e-16
Log(marketcap)      0.269349  0.028180      9.558  < 2e-16
Log(NumberShares)   0.156832  0.031542      4.972  8.58e-07
Log(shareholders)   0.625322  0.047688     13.113  < 2e-16
freefloat           1.643060  0.200366      8.200  1.39e-15
dividend           -0.341982  0.096409     -3.547  0.000419
inlforum            0.007051  0.003367      2.094  0.036640
ngm                -1.126831  0.292306     -3.855  0.000128
nordicmtf           0.087610  0.197313      0.444  0.657188
firstnorth         -0.029121  0.110082     -0.265  0.791451
aktietorget         0.009318  0.131938      0.071  0.943719

Table 3. At the 5 % significance level the variables “nordicmtf”,”firstnorth” and “aktietorget” are deemed insignificant and should be removed from the model.

For log-model 1 the regression shows insignificance at the 5 % level for multiple variables. The model will now be reduced one variable at a time, starting with the least significant variable:

Coefficients, p-values after each removal:

                   aktietorget removed  firstnorth removed  nordicmtf removed
(Intercept)        < 2e-16              < 2e-16             < 2e-16
Log(marketcap)     < 2e-16              < 2e-16             < 2e-16
Log(NumberShares)  8.40e-07             6.85e-07            6.66e-05
Log(shareholders)  < 2e-16              < 2e-16             < 2e-16
freefloat          < 2e-16              < 2e-16             < 2e-16
dividend           0.000216             0.000232            0.000207
inlforum           0.036554             0.038149            0.038760
ngm                7.06e-05             7.63e-05            7.06e-05
nordicmtf          0.642421             0.570206
firstnorth         0.679422

Table 4. P-values are reviewed by stepwise elimination of insignificant variables

Table 4 shows the p-values of the coefficients as the least significant variable is removed. Only three variables were removed as the other coefficients were sufficiently significant during the stepwise removal (p-value below 5 %). The stepwise removal of variables is performed since the removal of one variable might affect the significance of other variables. By removing one variable at a time the reduced model can be reviewed and evaluated for further reduction.
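The stepwise procedure can be sketched as a small loop. The names `backward_eliminate` and `REPORTED_P` are hypothetical. In practice the lookup would be replaced by refitting the regression without the removed variables; here it simply returns the p-values reported above for log-model 1, restricted to the market dummies and “inlforum”, since all other coefficients stay far below 5 % throughout.

```python
ALPHA = 0.05

# Reported p-values for log-model 1, keyed by the set of variables already removed.
REPORTED_P = {
    frozenset(): {
        "inlforum": 0.036640, "nordicmtf": 0.657188,
        "firstnorth": 0.791451, "aktietorget": 0.943719,
    },
    frozenset({"aktietorget"}): {
        "inlforum": 0.036554, "nordicmtf": 0.642421, "firstnorth": 0.679422,
    },
    frozenset({"aktietorget", "firstnorth"}): {
        "inlforum": 0.038149, "nordicmtf": 0.570206,
    },
    frozenset({"aktietorget", "firstnorth", "nordicmtf"}): {
        "inlforum": 0.038760,
    },
}


def backward_eliminate(p_values_for, alpha=ALPHA):
    """Remove the least significant variable until all p-values are below alpha."""
    removed = []
    while True:
        p_values = p_values_for(frozenset(removed))
        worst = max(p_values, key=p_values.get)
        if p_values[worst] < alpha:
            return removed
        removed.append(worst)


removal_order = backward_eliminate(lambda removed: REPORTED_P[removed])
```

With the reported p-values the loop removes “aktietorget”, “firstnorth” and “nordicmtf”, in that order, and then stops because “inlforum” is significant at the 5 % level.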

3.6.2.2 Regression from log-model 2

Log-model 2 Coefficients:

                   Estimate   Std. Error  t value  p-value
(Intercept)         2.194271  0.560664      3.914  0.000101
Log(marketcap)      0.565925  0.034661     16.328  < 2e-16
Log(NumberShares)   0.117754  0.038795      3.035  0.002504
Log(shareholders)   0.603162  0.058655     10.283  < 2e-16
freefloat           2.080638  0.246443      8.443  < 2e-16
dividend           -0.202101  0.118580     -1.704  0.088819
inlforum            0.009185  0.004141      2.218  0.026924
ngm                -1.234486  0.359526     -3.434  0.000635
nordicmtf          -0.046746  0.242688     -0.193  0.847323
firstnorth          0.036450  0.135397      0.269  0.787857
aktietorget         0.001564  0.162280      0.010  0.992314

Table 2. At the 5 % significance level the variables “dividend”, “nordicmtf”, “firstnorth” and “aktietorget” are deemed insignificant and should be removed from the model.

For log-model 2 the regression also shows insignificance for some variables. As with the regression for log-model 1, a stepwise reduction of the model was performed. The result is shown below:

Coefficients:        p-value

(Intercept)          3.09e-05
Log(marketcap)       < 2e-16
Log(NumberShares)    0.000822
Log(shareholders)    < 2e-16
freefloat            < 2e-16
inlforum             0.011847
ngm                  0.000420

Table 3. P-values are reviewed by stepwise elimination of insignificant variables

3.6.2.3 Regression from log-model 3

Log-model 3 Coefficients:

                     Estimate     Std. Error   t value   p-value
(Intercept)          -9.349573    0.496885     -18.816   < 2e-16
Log(marketcap)       -0.204314    0.030718     -6.651    6.39e-11
Log(NumberShares)    0.966926     0.034382     28.123    < 2e-16
Log(shareholders)    0.475279     0.051982     9.143     < 2e-16
freefloat            2.295404     0.218408     10.510    < 2e-16
dividend             -0.321666    0.105091     -3.061    0.0023
inlforum             0.006375     0.003670     1.737     0.0829
ngm                  -0.741257    0.318627     -2.326    0.0203
nordicmtf            0.200674     0.215081     0.933     0.3512
aktietorget          -0.094800    0.143819     -0.659    0.5100

Table 4. At the 5 % significance level the variables “inlforum”, “nordicmtf”, “firstnorth” and “aktietorget” are deemed insignificant and should be removed from the model.

The regression for log-model 3 yielded several insignificant variables. As with the other models these variables are stepwise removed and the p-values of the remaining coefficients are shown below:

Coefficients:        p-value

(Intercept)          < 2e-16
Log(marketcap)       9.92e-12
Log(NumberShares)    < 2e-16
Log(shareholders)    < 2e-16
freefloat            < 2e-16
dividend             0.000845
ngm                  0.018782

Table 5. P-values are reviewed by stepwise elimination of insignificant variables

3.6.3 Stepwise AIC

Stepwise AIC, here run backwards, starts from the full model and tentatively removes one covariate at a time; the covariate whose removal causes the largest reduction of the AIC is excluded from the model. The procedure is then repeated until the AIC value cannot be reduced further. The stepwise iteration is performed in RStudio using the built-in “step” function.

The result from the stepwise AIC is visualized below, showing what covariate is removed and in what order.

               Step 1          Step 2          Step 3
log-model 1    -aktietorget    -firstnorth     -nordicmtf
log-model 2    -aktietorget    -nordicmtf      -firstnorth
log-model 3    -firstnorth     -aktietorget    -nordicmtf

Table 6. Stepwise AIC, showing what covariate is removed for every step
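The backward procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R code used in the thesis: the data are synthetic, the covariate names ("size", "float", "noise") and the helper functions are hypothetical, and the AIC is computed as n·log(RSS/n) + 2p, which equals the Gaussian AIC up to an additive constant.

```python
import math
import random

def ols_rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    n, p = len(X), len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))

def aic(data, y, names):
    """n*log(RSS/n) + 2*(number of coefficients, intercept included)."""
    n = len(y)
    X = [[1.0] + [data[name][i] for name in names] for i in range(n)]
    return n * math.log(ols_rss(X, y) / n) + 2 * (len(names) + 1)

def backward_aic(data, y):
    """Repeatedly drop the covariate whose removal lowers the AIC the most."""
    kept, removed = list(data), []
    current = aic(data, y, kept)
    while kept:
        trials = [(aic(data, y, [k for k in kept if k != name]), name)
                  for name in kept]
        best_aic, candidate = min(trials)
        if best_aic >= current:  # no removal improves the AIC: stop
            break
        kept.remove(candidate)
        removed.append(candidate)
        current = best_aic
    return kept, removed

# Synthetic example: two informative covariates and one pure-noise covariate.
random.seed(1)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)]
        for name in ("size", "float", "noise")}
y = [1.0 + 0.6 * data["size"][i] + 2.0 * data["float"][i]
     + random.gauss(0, 0.5) for i in range(n)]

kept, removed = backward_aic(data, y)
print("kept:", kept, "removed in order:", removed)
```

The informative covariates survive because removing either of them increases the residual sum of squares far more than the penalty term saves; a noise covariate is usually, though not always, dropped.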

The stepwise AIC test does not conflict with the significance tests in which the p-values were reviewed. However, the AIC values do not indicate that removing “dividend” from log-model 2 or “inlforum” from log-model 3 is warranted. The p-values suggest otherwise, and these two variables will be removed from their respective models. In addition, “aktietorget”, “firstnorth” and “nordicmtf” will be removed from all models.

3.7 Validating the reduced models by computing R2 and adjusted R2

The next step is to validate the reduced models by computing a measure of goodness of fit, as discussed in section 2.4.3. For this purpose R2 is computed for the three models, first for the base log-models and then for the reduced models that have been constructed as a result of the previous tests.

MODEL          R2        Adjusted R2    REDUCED MODEL    R2        Adjusted R2
Log-Model 1    0.7959    0.7926         Log-Model 1      0.7957    0.7934
Log-Model 2    0.8293    0.8266         Log-Model 2      0.8284    0.8267
Log-Model 3    0.8188    0.8158         Log-Model 3      0.8171    0.8154

Table 7. The log-models are compared with the reduced log-models.

The values of R2 show that the goodness of fit becomes only slightly worse as the models are reduced; rounded to two decimals it is unchanged. The reduced models can therefore be kept and the final models can be constructed.
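For reference, the usual definition of adjusted R2 is reproduced here. The symbols n (number of observations) and k (number of covariates, excluding the intercept) are notation assumed for this sketch and may differ from the notation of section 2.4.3:

$$\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - k - 1}$$

Because the penalty factor (n - 1)/(n - k - 1) shrinks when k is reduced, removing covariates that contribute almost nothing can leave the adjusted R2 unchanged or even slightly higher while the plain R2 falls, which is the pattern seen for Log-Model 1 and Log-Model 2 in the table above.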

3.8 The final models

Throughout section 3 rigorous testing and diagnostics have been performed on the base models.

The tests have prompted changes and the final models are presented below:

3.8.1 Final model 1

$$\log(\text{Trades}) = B_1 \log(\text{marketcap}) + B_2 \log(\text{NumberShares}) + B_3 \log(\text{shareholders}) + B_4\,\text{freefloat} + B_5\,\text{dividend} + B_6\,\text{inlforum} + B_7\,\text{ngm}$$

3.8.2 Final model 2

$$\log(\text{Turnover}) = B_1 \log(\text{marketcap}) + B_2 \log(\text{NumberShares}) + B_3 \log(\text{shareholders}) + B_4\,\text{freefloat} + B_5\,\text{inlforum} + B_6\,\text{ngm}$$

3.8.3 Final model 3

$$\log(\text{Totalvolume}) = B_1 \log(\text{marketcap}) + B_2 \log(\text{NumberShares}) + B_3 \log(\text{shareholders}) + B_4\,\text{freefloat} + B_5\,\text{dividend} + B_6\,\text{ngm}$$

3.9 Reviewing residual and Q-Q plots of the final models

In section 2.3 residual and Q-Q plots are discussed. In this section the final models are reviewed by plotting their respective residual and Q-Q plots:


Table 8. Residual and Q-Q plots for Log(Model 1). The residuals are to a large degree randomly spread and the model can be assumed to be close to homoscedastic. The Q-Q plot indicates that the normality assumption of the residual holds.

Table 9. Residual and Q-Q plots for Log(Model 2). The residuals are to a large degree randomly spread and the model can be assumed to be close to homoscedastic. The Q-Q plot indicates that the normality assumption of the residual holds. One outlier can be observed.

Table 10. Residual and Q-Q plots for Log(Model 3). The residuals are to a large degree randomly spread and the model can be assumed to be close to homoscedastic. The Q-Q plot indicates that the normality assumption of the residual holds, however some outliers can be observed at both ends of the plot.

The residual and Q-Q plots show that for all three models the assumptions (as described in Section 2.3) of homoscedasticity and normally distributed residuals hold. One last diagnostic will be run on the final models: they will be reviewed in terms of multicollinearity among the variables.


3.10 Checking for Multicollinearity

As discussed in Section 2.3, the presence of multicollinearity is a violation of one of the assumptions of OLS. Strong collinearity among covariates can cause individual coefficients to be unreliably estimated.

3.10.1 Computing the Variance Inflation Factors

To review whether multicollinearity is present among the covariates, the variance inflation factors (“VIF”) are computed; the theory behind this is described in section 2.4.4.

The VIF values for Final Model 1-3 are printed out below:

Independent variable, Final Model 1 Vif

Log(marketcap) 4.026025

Log(NumberShares) 2.084777

Log(shareholders) 3.933942

freefloat 1.341512

dividend 1.914119

inlforum 1.100350

ngm 1.016544

Table 11. Variance inflation factors for the independent variables in Final Model 1. The VIF values are relatively low and not a cause for concern

Independent variable, Final Model 2 Vif

Log(marketcap) 2.801390

Log(NumberShares) 2.026184

Log(shareholders) 3.911890

freefloat 1.336846

inlforum 1.078921

ngm 1.015808

Table 12. Variance inflation factors for the independent variables in Final Model 2. For this model the VIF values are also relatively low and not a cause of concern

Independent variable, Final Model 3 Vif

Log(marketcap) 4.013539

Log(NumberShares) 2.084102

Log(shareholders) 3.928862

freefloat 1.294825

dividend 1.876842

ngm 1.016195

Table 13. Variance inflation factors for the independent variables in Final Model 3. The VIF values for the model are low and not a cause of concern

The results above show that collinearity between the covariates is low for all final models. The assumption of no multicollinearity, as described in section 2.3 is therefore satisfied. No changes to the final models are required based on the VIF-values.
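To make the VIF values more tangible, the following sketch converts them back into the R2 that would be obtained by regressing each covariate on the remaining covariates, using the standard relation VIF = 1 / (1 - R2). The function name `implied_r2` is hypothetical; the VIF values are those reported for Final Model 1 above.

```python
def implied_r2(vif):
    """R2 of regressing one covariate on the others, from VIF = 1 / (1 - R2)."""
    return 1.0 - 1.0 / vif

# VIF values reported for Final Model 1.
vif_model1 = {
    "Log(marketcap)": 4.026025,
    "Log(NumberShares)": 2.084777,
    "Log(shareholders)": 3.933942,
    "freefloat": 1.341512,
    "dividend": 1.914119,
    "inlforum": 1.100350,
    "ngm": 1.016544,
}

for name, vif in vif_model1.items():
    print(f"{name:18s} VIF = {vif:.3f}   implied R2 = {implied_r2(vif):.3f}")
```

Read this way, the largest value, 4.03 for Log(marketcap), means that roughly 75 % of the variation in that covariate is explained by the other covariates, while the value for ngm corresponds to under 2 %.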

3.11 F-statistic of the final models

The F-statistics for the final models are shown below together with their p-values:

                 F-stat    p-value
Final Model 1    346.1     < 2.2e-16
Final Model 2    501.1     < 2.2e-16
Final Model 3    464.0     < 2.2e-16

Table 14. F-Statistic for the final models.
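As a reminder of what this statistic measures, the overall F-statistic of a linear regression with an intercept can be written in terms of R2. The symbols n (number of observations) and k (number of covariates, excluding the intercept) are notation assumed for this sketch; the sample size is not restated in this section:

$$F = \frac{R^2 / k}{\left(1 - R^2\right) / \left(n - k - 1\right)}$$

Under the null hypothesis that all slope coefficients are zero, F follows an F-distribution with k and n - k - 1 degrees of freedom, so a large F with a small p-value indicates that the covariates jointly explain the dependent variable.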

4 Result

4.1 Market

The study overall shows that the market on which a company is listed is not relevant to explain the daily trade volumes. The exception is the market NGM which shows a negative effect on trade volumes.

Market         Trades           Turnover         Totalvolume
               β                β                β
NGM Equity     -1.127335        -1.226173        -0.72323
First North    Insignificant    Insignificant    Insignificant
Aktietorget    Insignificant    Insignificant    Insignificant
Nordic MTF     Insignificant    Insignificant    Insignificant

Table 15. Regression results for the categorical variables that describe the market that a company belongs to

The results show that, all other variables held constant, the number of trades in a company's stock is 67 % lower for companies listed on NGM Equity than for companies listed on NASDAQ. The turnover, i.e. the cash turnover, is 70 % lower, and the total volume (number of shares traded) is 51 % lower. For the other markets there is no significant difference relative to NASDAQ, indicating that companies can enjoy the same trade volumes on those markets.
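The percentages for NGM Equity follow from the coefficients in Table 15: with a log-transformed dependent variable, a dummy variable with coefficient β changes the dependent variable by 100·(exp(β) − 1) percent. A minimal sketch of that conversion, where the function name `dummy_effect_pct` is hypothetical:

```python
import math

def dummy_effect_pct(beta):
    """Percentage change in y when a dummy switches from 0 to 1 in a log(y) model."""
    return 100.0 * (math.exp(beta) - 1.0)

# Coefficients for NGM Equity as reported in Table 15.
ngm = {"Trades": -1.127335, "Turnover": -1.226173, "Totalvolume": -0.72323}
effects = {name: dummy_effect_pct(beta) for name, beta in ngm.items()}

for name, pct in effects.items():
    print(f"{name:12s} {pct:6.1f} %")
```

The same conversion applies to the other dummy variable in the models, “dividend”.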

4.2 If the company pays dividends

Trades Turnover Totalvolume

β β β

Dividend -0.343015 Insignificant -0.33380

Table 16. Regression results for if a company pays dividends or not

The results show that, all other variables held constant, a company that pays dividends will have its number of trades decreased by 29 %. Paying dividends is insignificant for the cash turnover. The total number of shares traded is decreased by 28 %.


4.3 Number of outstanding shares

                                Trades      Turnover    Totalvolume
                                β           β           β
Number of outstanding shares    0.158270    0.128181    0.96920

Table 17. Regression results for the number of outstanding shares

The results show, all other variables held constant, that when the number of outstanding shares changes by 1 % the number of trades changes by 0.16 %. The cash turnover similarly changes by 0.13 % and the total number of shares traded changes by 1 %.
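The reading of a coefficient as a percentage-to-percentage effect follows from the log-log form. As a short worked sketch, with x denoting the covariate and y the dependent variable (notation assumed here), holding all other covariates fixed gives

$$\log y = \dots + \beta \log x \quad\Longrightarrow\quad \frac{y_{\text{new}}}{y_{\text{old}}} = \left(\frac{x_{\text{new}}}{x_{\text{old}}}\right)^{\beta}$$

so a 1 % increase in the number of outstanding shares gives, for the number of trades,

$$1.01^{0.158270} - 1 \approx 0.0016 = 0.16\,\%$$

and, for the total number of shares traded, $1.01^{0.96920} - 1 \approx 0.0097$, i.e. close to 1 %. For small changes the effect is approximately β percent per 1 % change in x, which is how the coefficients for the other log-transformed covariates can be read as well.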

4.4 Number of shareholders

                          Trades      Turnover    Totalvolume
                          β           β           β
Number of shareholders    0.624703    0.610832    0.48761

Table 18. Regression results for the number of shareholders

The results show, all other variables held constant, that when the number of shareholders changes by 1 % the number of trades changes by 0.62 %. The cash turnover changes by 0.61 % and the total number of shares traded by 0.48 %.

4.5 Market capitalization

                         Trades      Turnover    Totalvolume
                         β           β           β
Market capitalization    0.269125    0.532276    -0.20179

Table 19. Regression results for the market capitalization

The results show, all other variables held constant, that when the market capitalization changes by 1 % the number of trades changes by 0.26 %, the cash turnover by 0.53 % and the total number of shares traded by -0.20 %.

4.6 Number of forum posts

Trades Turnover Totalvolume

β β β

Number of forum posts 0.006938 0.010315 Insignificant

Table 20. Regression result for the monthly number of forum posts in the Avanza stock forum "Placera"

The results show, all other variables held constant, that a one-unit change in the number of monthly forum posts in the Avanza stock forum “Placera” would change the number of trades by 0.7 % and the cash turnover by 1 %. For the total number of shares traded, the number of forum posts is insignificant.


4.7 Free-float

Trades Turnover Totalvolume

β β β

Free-float 1.659238 2.092709 2.32018

Table 21. Regression results for the free-float of a company

The interpretation of the results is similar to the interpretation of the other variables that are not log-transformed. As free-float can take on any value between 0 and 1, it is instead suitable to describe the changes in the dependent variable for every 0.1 unit change in free-float. The results show, all other variables held constant, that a 0.1 unit change in free-float results in a 16.7 % change in trades. The cash turnover changes by 20.1 % and the total volume by 23.2 %.
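For a covariate that is not log-transformed, a change of Δ units changes the dependent variable by exactly 100·(exp(β·Δ) − 1) percent, which for small β·Δ is approximately 100·β·Δ percent. The sketch below computes both for Δ = 0.1 using the free-float coefficients in Table 21; the variable names are hypothetical. The percentages quoted above appear to follow the linear approximation, and the exact values are somewhat larger.

```python
import math

# Free-float coefficients as reported in Table 21.
betas = {"Trades": 1.659238, "Turnover": 2.092709, "Totalvolume": 2.32018}
delta = 0.1  # change in free-float, which is measured on a 0-1 scale

# Linear approximation: 100 * beta * delta.
linear = {name: 100.0 * b * delta for name, b in betas.items()}
# Exact effect in a log(y) model: 100 * (exp(beta * delta) - 1).
exact = {name: 100.0 * (math.exp(b * delta) - 1.0) for name, b in betas.items()}

for name in betas:
    print(f"{name:12s} linear {linear[name]:5.1f} %   exact {exact[name]:5.1f} %")
```

The gap between the two columns grows with β·Δ, so the distinction matters more for free-float than for a small coefficient such as the one for forum posts.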

4.8 Conclusions on hypothesis testing

4.8.1 Hypothesis testing, Trades

The regression model shows that all variables, except whether the company is listed on Nordic MTF, First North or Aktietorget, explain some of the daily trades of the company stock. Supported by the F-statistic and its p-value in section 3.11, the null hypothesis for the model as a whole can therefore be rejected. The null hypothesis for the insignificant variables is not rejected.

4.8.2 Hypothesis testing, Turnover

This regression model shows that all variables, except whether the company is listed on Nordic MTF, First North or Aktietorget and whether the company pays dividends, explain some of the cash turnover of the company stock. The null hypothesis for the model as a whole can be rejected, as is visible in section 3.11, where the p-value for the F-statistic shows significance. As with the other model, the null hypothesis for the insignificant variables is not rejected.

4.8.3 Hypothesis testing, Total volume

This regression model instead shows that all variables, except whether the company is listed on Nordic MTF, First North or Aktietorget and the number of forum posts (Avanza stock forum), explain some of the number of shares traded of the company stock. The null hypothesis for the model as a whole can be rejected for this model as well, again with support from the results in section 3.11. As with the other two models, the null hypothesis for the insignificant variables is not rejected.


5 Discussion

5.1 Commenting the results

The three final models yielded R2 values around 0.8, with adjusted R2 values only marginally lower than R2, which is satisfactory considering the presence of human behaviour in stock trading.

A surprising result was that, relative to a listing on the main market NASDAQ, a listing on another market (NGM Equity being the exception) did not matter for explaining trade volumes. This is contrary to the popular belief that the NASDAQ main market is the best market in many respects.

Another surprise was the effect of whether a company pays dividends or not. Intuitively one might think that paying dividends should have a positive effect on trade volumes. There is a risk that this variable is correlated with a variable not used in the regressions, namely business sector. The companies that pay continuous dividends are to a large degree investment companies, real estate companies and, in general, other mature companies. A large portion of the companies listed in Sweden are actually on the growth markets Nordic Growth Market, Aktietorget or First North. As these are companies in development they rarely pay dividends.

In summary the results obtained from this variable should be met with a sceptical eye.

5.2 Model accuracy

All models employed were log-models in which the dependent variable and several of the independent variables were log-transformed. The use of log-transforms was well justified, but it still affects interpretability. Results obtained for single variables should be reviewed with a cautious eye, as the other variables need to be taken into consideration as well. The models should also be used as a basis for interpretation rather than relying on the coefficient values to precisely explain the changes in trade volumes: the results are best read as showing the direction and significance of the change when a variable is altered.

It ought to be mentioned that for the tests performed in this paper the assumptions of OLS hold.

5.3 Relevance to companies that are listed or planning on becoming listed

A vital question throughout this study has been whether listed companies in Sweden can, based on the results of this paper, increase the trade volumes of their stocks. As mentioned above, the number of shareholders, the free float and the number of shares in the company can all have a potentially material effect on the trade volumes of the company. Companies can to some degree influence these three factors.

Drastic measures to increase the number of shareholders can be to offer shares to the market, also called a share issuance, or to merge with another company that has a strong shareholder base.

Issues with a low free-float can be dealt with in a similar fashion; they can also be avoided by raising capital from the open market instead of from institutional investors.

The number of shares in a company is something that a company can alter by performing share splits, for example by having 1 share become 10 shares, each valued at 1/10 of the original share.

Companies listed on the NGM Equity market can consider applying for a listing elsewhere, as the results indicate that a listing on this market is associated with lower trade volumes.

Companies that are about to perform an IPO sometimes only offer a small portion of company shares to the open market. This study can help such companies to foresee if they will be at risk of having their shares rarely traded and perhaps change their IPO strategy.

5.4 Further research

Further research on the topic should include more variables. For example, the addition of the variable “Business Sector” could explain some of the trade volumes and perhaps explain the results that this study obtained for companies that pay dividends. Further research could also be made on interaction among variables. It is also advisable to expand the scope of research to include markets outside of Sweden.


