Swedish Stock market: Explaining trade volumes in single stocks
Bachelor's Thesis, Mathematical Statistics, KTH Royal Institute of Technology
Author: Jesper Sevelin
Supervisor: Thomas Önskog
Abstract
Title: Swedish Stock market: Explaining trade volumes in single stocks
Author: Jesper Sevelin
Supervisor: Thomas Önskog
Department: Department of Mathematics
The Swedish stock market consists of roughly 750 companies listed on five different markets. A significant portion of these companies are rarely traded. Stocks with low trading activity not only present a liquidity problem to shareholders and potential investors but also affect the reputation of the traded company. A company whose shares are not actively traded does not have a market that actively puts a value on the company.
This study aims to interpret how daily trade volumes can be explained by both categorical and numerical variables associated with the companies listed in Sweden.
This study, contrary to popular belief, shows that the market of the listed stock is to a large degree irrelevant when explaining daily trade volumes of the stocks listed in Sweden. The study instead reveals the importance of factors such as shareholder structure, free float and number of outstanding shares in a company.
2017-06-09
Sammanfattning (Summary)
Title: Swedish Stock market: Explaining trade volumes in single stocks
Author: Jesper Sevelin
Supervisor: Thomas Önskog
Department: Department of Mathematics
The Swedish stock market consists of about 750 companies, listed on five different marketplaces. A large share of these companies find that their stock is rarely traded. Stocks with low trading activity not only present a liquidity problem for shareholders and potential investors but can also affect the reputation of the underlying company. A company whose shares are rarely traded does not have an active market that puts a value on the company.
The study focuses on understanding how daily trade volumes can be explained in terms of both categorical and numerical variables associated with the companies listed on a stock exchange list or equivalent in Sweden.
The study shows, contrary to what many believe, that the marketplace of a company's stock is to a large degree irrelevant for explaining the daily trade volumes of the stocks listed in Sweden. The study instead shows the importance of shareholder structure, free float and the number of outstanding shares in the company.
2017-06-09
Contents
1 Introduction
1.1 The Swedish Stock Market
1.2 Becoming a listed company
1.3 Daily trade volumes
1.4 Research Question
1.5 The purpose
1.6 Scope of research
2 Theory
2.0 Regression analysis
2.1 The regression model
2.2 Ordinary least squares
2.3 The assumptions of OLS
2.3.0 Strict exogeneity
2.3.1 Homoscedasticity
2.3.2 No autocorrelation
2.3.3 Normality, The Q-Q plot
2.3.5 No multicollinearity
2.4 Selecting and validating models
2.4.1 Hypothesis testing, p-values and the t-statistic
2.4.2 F-Test
2.4.3 R-Squared
2.4.4 Variance Inflation Factors
2.4.5 Akaike Information Criterion
2.5 Data selection
2.5.1 Variables
2.6 Transformation of variables
3 Methodology
3.1 Software
3.2 Variables
3.3 Base models
3.4 Data collection
3.5 Building the models, transformations
3.5.1 The dependent variables
3.5.2 The independent variables
3.6 Variable Selection
3.6.1 Akaike Information Criterion
3.6.2 Performing the regressions
3.6.3 Stepwise AIC
3.7 Validating the reduced models by computing R² and adjusted R²
3.8 The final models
3.8.1 Final model 1
3.8.2 Final model 2
3.8.3 Final model 3
3.9 Reviewing residual and Q-Q plots of the final models
3.10 Checking for Multicollinearity
3.10.1 Computing the Variance Inflation Factors
3.11 F-statistic of the final models
4 Result
4.1 Market
4.2 If the company pays dividends
4.3 Number of outstanding shares
4.4 Number of shareholders
4.5 Market capitalization
4.6 Number of forum posts
4.7 Free-float
4.8 Conclusions on hypothesis testing
4.8.1 Hypothesis testing, Trades
4.8.2 Hypothesis testing, Turnover
4.8.3 Hypothesis testing, Total volume
5 Discussion
5.1 Commenting the results
5.2 Model accuracy
5.3 Relevance to companies that are listed or planning on becoming listed
5.4 Further research
7 References
1 Introduction
1.1 The Swedish Stock Market
The Swedish stock market consists of approximately 750 companies whose common stock is listed on one of five different markets. One of the largest markets, NASDAQ OMX, is popularly referred to as “The Main market” [1]. The other markets are First North, Aktietorget, NGM Equity and NGM MTF. The Main market and NGM Equity are known as regulated markets. The other markets come with less strict regulations, suitable for less mature companies that want to operate in a listed environment, and are commonly referred to as growth markets.
1.2 Becoming a listed company
There are various reasons why a company decides to become publicly traded. One of them is to gain greater access to the capital market.
By performing an initial public offering, a company can raise capital while gaining new shareholders. An initial public offering is most often referred to as an IPO and involves, on one side, a company that offers its shares to the public and, on the other side, a market where the shares will be listed. When the IPO has been successfully executed, the company has raised capital and shareholders can offer to buy or sell shares through the market.
As a market listing often comes with rigorous regulations and transparency requirements, the listing itself is often seen as a certificate of quality.
1.3 Daily trade volumes
There are different ways to measure the traded volumes of listed shares. Three common practices are to look at the number of trades, the cash turnover or the total number of shares traded. Earlier studies on the subject show how specific stock characteristics can be relevant to the trading activity. A paper published by two MIT students concludes that risk, size, price, trading costs and whether the stock was part of the S&P 500 index were relevant factors affecting turnover in single stocks. [2]
Other studies have been performed, showing that for companies listed on the Tehran Stock Exchange the free-float percentage has a linear relationship with the stock turnover ratio. [3]
While these studies provide valuable information regarding the behaviour of trade volumes they are often limited to analysis of purely numerical predictor variables that are characteristics of the stock, not the company.
Daily trade volumes are a common reference when discussing the interest in a company, and low volumes are a cause of concern. It is not uncommon for companies with low trade volumes to debate the cost of maintaining an active listing. A company that is rarely traded will find it more difficult to raise capital from the market and does not have a market that actively puts a value on the company. Some companies cite this as one reason, among others, to delist their shares. Other companies maintain a high ambition of remaining listed but struggle with the same issues related to low trade volumes.
1.4 Research Question
By performing multiple regression analysis, this study seeks to interpret how variables associated with the listed companies in Sweden affect the daily trade volumes of their stock and, in doing so, to answer the following question:
What variables explain trade volumes and what is their effect?
1.5 The purpose
The results derived from this study will aid companies that seek to improve the trade volumes of their listed stocks. It will in addition to this provide valuable information to companies prior to initial public offerings to analyze whether the company is sufficiently mature to become a listed company.
1.6 Scope of research
This study encompasses all companies that are listed on a stock market in Sweden.
2 Theory
2.0 Regression analysis
Regression analysis is a process within statistics where one seeks to estimate the relation between a variable 𝑦, the response variable, and the values of other variables, called covariates.
The covariates can also be referred to as “explanatory variables” as they are used to explain the outcome of 𝑦. A residual is also present. It consists of the unexplained part, that is, the difference between the true outcome of 𝑦 and the estimated outcome derived from the covariates. [4]
Regression analysis can be a powerful tool when one seeks to investigate how two or more variables are related to each other. It is employed in numerous fields, one of importance being finance.
2.1 The regression model
The linear regression model can be defined as:
$$y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \qquad i = 1, \dots, n \tag{1}$$
The subscript $i$ refers to an observation, while the subscript $j$ refers to independent variable (covariate) number $j$. Consequently, $x_{ij}$ is observation $i$ on covariate $j$. The betas $\beta_j$ are the unknown parameters that are to be estimated. In the equation, $\beta_0$ is the intercept (with $x_{i0} = 1$ for all $i$), i.e. where the regression line intersects the $y$-axis.
The assumption is that the explanatory variables (the covariates) influence the dependent variable, and not necessarily the other way around. This is the basis of a structural interpretation. [4]
The regression model contains multiple explanatory variables and is therefore a multiple linear regression model.
2.2 Ordinary least squares
One procedure, among many, to estimate the coefficients 𝛽𝑗 in equation (1) is called ordinary least squares (“OLS”). This study only employs least squares for the estimation of coefficients.
To illustrate the OLS estimates, equation (1) can be written in matrix notation:
$$y = X\beta + e \tag{2}$$
where $X$ is the matrix containing the independent variables, and $y$ and $e$ are two vectors of equal size. The OLS estimate seeks to minimize the sum of squared residuals:
$$\hat{e}^{T}\hat{e} = |\hat{e}|^{2} \tag{3}$$
Referring to (2) and (3), and in accordance with the normal equations, the estimate obtained by the OLS procedure is:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y \tag{4}$$
where the hat notation refers to estimated values. [4]
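As an illustration of equation (4), the following is a minimal Python sketch that solves the normal equations $(X^{T}X)\hat{\beta} = X^{T}y$ directly. It is not the code used in the study, which relied on R. The helper names `solve_linear_system` and `ols` and the toy data are assumptions made for this example only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Return the OLS estimate by solving (X^T X) beta = X^T y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data: y = 1 + 2x exactly.
# The first column of X is the constant 1 that carries the intercept.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [3.0, 5.0, 7.0, 9.0]
beta_hat = ols(X, y)
print(beta_hat)  # intercept close to 1, slope close to 2
```

Solving the linear system avoids forming the inverse $(X^{T}X)^{-1}$ explicitly, which gives the same estimate when $X^{T}X$ is invertible.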
2.3 The assumptions of OLS
For OLS to be an efficient estimator, several assumptions are commonly made. Some of these assumptions are described below:
2.3.0 Strict exogeneity
Strict exogeneity refers to the assumption:
$$E(e_i \mid X) = 0 \tag{5}$$
That is, the conditional mean of the regression errors, given the covariates, is zero.
A violation of this is called “endogeneity” and introduces a problem where the OLS estimates can become invalid. Endogeneity arises when the error term is correlated with one or several of the covariates. OLS assumes the opposite: that no correlation exists between the error term and the covariates.
This is a common concern within the field of econometrics, where supply and demand models are often used. As an example, when “demand” is regressed on “price”, the price changes as demand changes, and demand also changes as a result of different prices.
The error term then contains omitted variables that explain the simultaneous relationship, an example being “advertising”. The solution is often to introduce an additional variable, called an “instrumental variable”, that is uncorrelated with the error term but correlated with the problematic covariate. [4]
2.3.1 Homoscedasticity
Homoscedasticity means, as the name suggests, the presence of the same variance. With reference to Equation (1), it means that the error terms $e_i$ all have the same standard deviation. This is also assumed if OLS is to be the best estimator of $\beta$ in Equation (2). Since homoscedasticity is assumed, a violation of it can be of concern. When the error terms do not have the same variance, the data is said to be heteroscedastic. [4]
Heteroscedasticity can be discovered by for example plotting the residuals against the fitted values of a model. This is illustrated below. For homoscedasticity the residuals should be randomly distributed along the red fitted line. The model to the left can be suspected of being heteroscedastic while the model to the right is close to homoscedastic.
Figure 1. For a set of data the residuals are plotted against the fitted values. To the left heteroscedasticity can be observed. To the right the data is closer to homoscedastic.
2.3.2 No autocorrelation
Autocorrelation refers to the situation where the error terms between observations are correlated. For OLS it is assumed that these error terms are uncorrelated. Autocorrelation is common when dealing with time-series data, or overall when dealing with measurements over time.
2.3.3 Normality, The Q-Q plot
Q-Q plots help us assess whether a set of data points follows a given theoretical distribution, for example whether it is normally distributed. To test this, the quantiles of the data set are scatter-plotted against the quantiles of a normal distribution. If the data come from the normal distribution, the points should lie on a line, specifically 𝑦 = 𝑥 when the data are standardized. If the data points depart significantly from the line, it indicates that the data set tested is not normally distributed. If this departure is seen when using the Q-Q plot to assess the residuals, it can be problematic, as normality of the residuals is often used as an assumption for OLS. [6]
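The construction of the points in a normal Q-Q plot can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used in the study: the function name `qq_points`, the plotting positions $(i + 0.5)/n$ and the residual values are assumptions chosen for the example.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair standard normal quantiles with the sorted, standardized residuals."""
    n = len(residuals)
    mu, sigma = mean(residuals), stdev(residuals)
    # Standardize so that normally distributed data fall close to the line y = x.
    sample = sorted((r - mu) / sigma for r in residuals)
    # Theoretical quantiles at the plotting positions (i + 0.5) / n (an assumption;
    # other conventions for the plotting positions exist).
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, sample))


residuals = [-1.2, -0.4, 0.0, 0.5, 1.1]  # hypothetical residuals
points = qq_points(residuals)
for theoretical_q, sample_q in points:
    print(f"{theoretical_q:7.3f} {sample_q:7.3f}")
```

Plotting the second coordinate against the first gives the Q-Q plot; large systematic deviations from the line suggest non-normal residuals.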
2.3.5 No multicollinearity
Multicollinearity is the presence of correlation between two or more regression variables.
When strong multicollinearity is present, the standard errors of the estimated coefficients become large and the estimates therefore become inaccurate. [4]
There are ways to test for multicollinearity and one option is to compute the variance inflation factors, which will be described in section 2.4.4.
2.4 Selecting and validating models
2.4.1 Hypothesis testing, p-values and the t-statistic
Hypothesis testing refers to the process of testing a hypothesis formulated about a relationship in the data against another hypothesis, called the null hypothesis, which proposes that the relationship does not exist.
The null hypothesis can be denoted as:
$$H_0: \beta = a \tag{6}$$
where $a$ can take on any value; however, for this study, where one seeks to show that the regression coefficients $\beta$ are non-zero, the value of $a$ is set to 0.
The statistician who seeks to show that a relationship exists formulates the following alternative hypothesis:
$$H_1: \beta \neq a \tag{7}$$
Suppose that a type 1 error, that is, rejecting the null hypothesis despite it being true, is accepted with a certain probability α. Then, for any test of the hypothesis $\beta = a$ where the p-value is at most α, the null hypothesis is rejected in favour of the alternative hypothesis $\beta \neq a$. [4]
A test statistic that is routinely reported by statistics software is the t-statistic. In its general form the t-statistic is a ratio used to assess whether the means of two groups are equal; if the two means are equal, the null hypothesis holds. Given the above definition of the null hypothesis with $a = 0$, the t-statistic can be defined as:
$$t = \frac{\hat{\beta}}{SE(\hat{\beta})} \tag{8}$$
where $SE(\hat{\beta})$ refers to the estimated standard error of the coefficient estimate $\hat{\beta}$. [7]
A relevant question to ask for each coefficient $\beta$ is: under the assumption that the null hypothesis is true, what is the probability of obtaining an estimate equal to or more extreme than the one actually observed? This probability is called the “p-value” and serves as a measure of significance. The p-value can be obtained from the t-distribution, given the degrees of freedom and the obtained t-value. The threshold above which the p-value is said to indicate non-significance for a variable is pre-defined and depends on what level of significance is chosen.
Usually a 5 % level is used, and this is the level that was used throughout the study. [8]
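Equation (8) can be checked against the regression output reported in section 3.6.2.1. The Python sketch below recomputes two t-values from the estimates and standard errors given there for log-model 1. The p-value helper uses a large-sample normal approximation instead of the t-distribution, which is an assumption made here to keep the example free of external libraries, so its result only approximates the reported p-value. The function names are hypothetical.

```python
import math


def t_statistic(estimate, std_error, null_value=0.0):
    """t = (estimate - a) / SE(estimate); a = 0 gives equation (8)."""
    return (estimate - null_value) / std_error


def two_sided_p_normal_approx(t):
    """P(|Z| >= |t|) for a standard normal Z (approximation to the t-distribution)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Estimates and standard errors as reported for log-model 1.
t_marketcap = t_statistic(0.269349, 0.028180)  # Log(marketcap)
t_inlforum = t_statistic(0.007051, 0.003367)   # inlforum
p_inlforum = two_sided_p_normal_approx(t_inlforum)

print(round(t_marketcap, 3), round(t_inlforum, 3))
print(p_inlforum < 0.05)  # significant at the 5 % level under this approximation
```

With several hundred observations the t-distribution is close to the normal distribution, which is why the approximate p-value for `inlforum` lands near the value reported in the regression table.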
2.4.2 F-Test
While the t-statistic, as described in 2.4.1, is employed to test individual covariates, the F-test refers to a similar procedure where several covariates can be assessed at the same time. If desired, the whole model can be tested at once. In terms of hypothesis testing, the F-test can be formulated as a method of evaluating whether the null hypothesis that several coefficients ($r$ of them), or all coefficients, are equal to zero holds.
It is defined as:
$$F = \frac{1}{r}\,\hat{\beta}^{T}\hat{V}^{-1}\hat{\beta} \tag{9}$$
where $\hat{V}$ is the estimated covariance matrix and $r$ the number of parameters in $\hat{\beta}$.
As with the t-test, the p-value for F can be calculated from the F-distribution and compared with the significance level of choice. [4]
The F-test where all coefficients are tested is sometimes referred to as a test of “overall significance”. The implication of testing all coefficients is that one investigates a null hypothesis stating that the model containing only the intercept fits the data as well as the full model.
2.4.3 R-Squared
R² is commonly referred to as the “coefficient of determination” and provides a value for the goodness of fit of the regression model. It is defined as:
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n}(y_i - \bar{y})^{2}} \tag{10}$$
where $\hat{y}_i$ are the fitted values from the regression and $\bar{y}$ is the mean of the observed values $y_i$.
The value of R² can be seen as the proportion of the variance in the dependent variable that is explained by the independent variables. [9]
An adjusted R² can also be computed, which penalizes the addition of independent variables to the model.
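A minimal Python sketch of equation (10) follows, together with the commonly used adjusted R², $1 - (1 - R^{2})\frac{n - 1}{n - k - 1}$, where $k$ is the number of covariates excluding the intercept. The adjusted formula is the standard textbook definition and is stated here as an assumption, since the text does not spell it out. The data are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination as in equation (10)."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R² for n observations and k covariates (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


y = [1.0, 2.0, 3.0, 4.0]        # hypothetical observations
y_hat = [1.1, 1.9, 3.2, 3.8]    # hypothetical fitted values
r2 = r_squared(y, y_hat)
r2_adj = adjusted_r_squared(r2, n=len(y), k=1)
print(round(r2, 4), round(r2_adj, 4))
```

Because the penalty factor $(n - 1)/(n - k - 1)$ grows with $k$, adding a covariate raises the adjusted value only if it improves the fit by more than the penalty.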
2.4.4 Variance Inflation Factors
One way to assess multicollinearity in a model is to compute the variance inflation factors (VIF) of its covariates. The VIF provides a measure of how much the variance of an estimated coefficient is inflated because of collinearity with other variables [10]. The principle behind calculating the VIF values is to perform multiple regressions, using each independent variable in turn as the dependent variable and running the regression with the remaining independent variables as covariates. If one of the regressions yields a high R² value, it indicates that much of the variance of that variable can be explained by the other independent variables, and consequently multicollinearity is present. The variance inflation factor can be calculated as:
$$VIF_j = \frac{1}{1 - R_j^{2}} \tag{11}$$
The $R_j^{2}$ values originate from the regression of $X_j$ as dependent variable on all other variables as explanatory, with R² as defined in section 2.4.3. The resulting number can be used as a measure of multicollinearity, with a high value indicating high collinearity. There are several rules of thumb as to the value at which the VIF becomes a cause of concern. One rule of thumb is a VIF value exceeding 10.
However, VIF values alone should not be used to determine the design of the model. VIF values far exceeding 10 do not necessarily warrant the removal of one or more correlated variables. Instead, other factors that affect the variance of the coefficients need to be considered. [11]
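Equation (11) can be illustrated with a small Python sketch for the special case of two covariates. In that case, regressing one covariate on the other (with an intercept) gives an $R_j^{2}$ equal to the squared Pearson correlation between them, so no full regression is needed. The function names and the data are hypothetical and serve only as an example.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def vif_from_r_squared(r_squared_j):
    """Variance inflation factor as in equation (11)."""
    return 1.0 / (1.0 - r_squared_j)


# Two hypothetical covariates; with only two covariates, R_j^2 is the
# squared correlation between them.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 5.0, 4.0, 5.0]
r_squared_j = pearson_r(x1, x2) ** 2
vif = vif_from_r_squared(r_squared_j)
print(round(r_squared_j, 3), round(vif, 3))
```

With more than two covariates, $R_j^{2}$ has to come from a full regression of covariate $j$ on all the others, as described in the text.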
2.4.5 Akaike Information Criterion
Comparing and selecting models can be done with guidance from the Akaike Information Criterion, where the best model is considered the one that minimizes:
$$AIC = n\log(|\hat{e}|^{2}) + 2k \tag{12}$$
where $n$ is the number of observations and $k$ is the number of coefficients of the variables. The symbol $\hat{e}$ refers to the residuals and can be defined by rewriting Equation (2):
$$\hat{e} = y - X\hat{\beta} \tag{13}$$
where the hat notation refers to estimated values from the regression. [4]
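The trade-off expressed by equation (12) can be sketched in Python as follows. The function implements the formula exactly as written in the text, with $|\hat{e}|^{2}$ taken as the residual sum of squares and the natural logarithm assumed. The two sets of residuals are hypothetical and are only meant to show how a better fit can outweigh the penalty for an extra coefficient.

```python
import math


def aic(residuals, k):
    """AIC = n * log(|e|^2) + 2k, following equation (12)."""
    n = len(residuals)
    rss = sum(e * e for e in residuals)  # |e|^2, the residual sum of squares
    return n * math.log(rss) + 2 * k


# Hypothetical residuals from two competing models fitted to the same data.
residuals_larger_model = [0.5, -0.5, 0.5, -0.5]   # k = 3 coefficients
residuals_smaller_model = [1.0, -1.0, 1.0, -1.0]  # k = 2 coefficients

aic_larger = aic(residuals_larger_model, k=3)
aic_smaller = aic(residuals_smaller_model, k=2)
preferred = "larger" if aic_larger < aic_smaller else "smaller"
print(round(aic_larger, 3), round(aic_smaller, 3), preferred)
```

Only differences in AIC between models fitted to the same observations are meaningful, which is how the criterion is used in the variable selection of section 3.6.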
2.5 Data selection
Data that are believed to have an effect on the dependent variable need to be selected.
For this study, variables that are intuitively relevant have been selected together with unconventional variables that are also believed to have an effect on the daily trade volumes of a stock.
2.5.1 Variables
Regression analysis commonly distinguishes between two types of variables, continuous and categorical.
A categorical variable is often treated as a binary variable in regression analysis and is therefore assigned the value 1 or 0, indicating whether the observation belongs to the category.
Such variables are often referred to as dummy variables. Continuous variables can take on any value.
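The dummy coding described above can be sketched in Python for the market variable used in this study, where the NASDAQ main market serves as the benchmark category and therefore gets no dummy of its own. The dummy names follow Table 1; the function name `encode_market` and the lowercase label `"nasdaq"` are assumptions made for this example.

```python
# Dummy variables for the four non-benchmark markets, named as in Table 1.
MARKET_DUMMIES = ("firstnorth", "aktietorget", "ngm", "nordicmtf")
BENCHMARK = "nasdaq"  # hypothetical label for the benchmark category


def encode_market(market):
    """Return the 0/1 dummy variables for one company's market."""
    if market != BENCHMARK and market not in MARKET_DUMMIES:
        raise ValueError(f"unknown market: {market}")
    # A company on the benchmark market gets 0 for every dummy.
    return {name: int(market == name) for name in MARKET_DUMMIES}


print(encode_market("ngm"))
print(encode_market("nasdaq"))
```

Leaving out a dummy for the benchmark avoids perfect collinearity with the intercept, and each remaining coefficient is then interpreted relative to the benchmark market.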
2.6 Transformation of variables
Variables can be transformed to obtain a better fit and sometimes for the purposes of interpretation.
An example where a variable is logarithmically transformed for the purpose of interpretation is when it is suitable to describe the impact of an explanatory variable as a percentage change rather than a numeric amount [4].
A logarithmic transformation of variables can also be used to make highly skewed data come closer to a normal distribution. Skewed data can be an indication that a nonlinear relationship exists between the dependent and independent variable. In order to preserve the linear model, the skewed variable can be replaced with the logarithm of that variable. [5] Highly skewed data might also have multiple outliers among the data points, and these points can have a severe leverage effect on the slope of linear trend lines. After logarithmic transformation the data points, including the outliers, are moved closer to each other.
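The effect of a logarithmic transformation on skewed data can be sketched in Python with the moment-based sample skewness $m_3 / m_2^{3/2}$. The choice of this particular skewness measure and the data values are assumptions made for illustration; the values are picked so that they are evenly spaced on the log scale.

```python
import math


def skewness(values):
    """Moment-based sample skewness m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, heavily right-skewed data spanning several orders of magnitude.
raw = [1.0, 10.0, 100.0, 1000.0, 10000.0]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 3))     # clearly positive: long right tail
print(round(skewness(logged), 3))  # close to zero: symmetric after the log
```

The largest raw value dominates the mean and the slope of any fitted line, whereas after the transformation all points are equally spaced and none has outsized leverage.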
3 Methodology
3.1 Software
The collected data was compiled and processed in Microsoft Excel. The processed data was saved as comma-separated files (.csv). R, a programming language for statistical computing, was used with RStudio as the integrated development environment. Consequently, RStudio was used to read the data from the comma-separated files and run the regressions.
3.2 Variables
The variables used in this study are shown below:
Variable                    Type                   Description
Trades                      Dependent, continuous  The average amount of daily trades
Turnover                    Dependent, continuous  The average daily cash volume traded
Totalvolume                 Dependent, continuous  The average daily amount of shares traded
dividend                    Categorical            If the company is paying dividends
NASDAQ (used as benchmark)  Categorical            If the company is listed on the NASDAQ main market
firstnorth                  Categorical            If the company is listed on First North
aktietorget                 Categorical            If the company is listed on Aktietorget
ngm                         Categorical            If the company is listed on NGM Equity
nordicmtf                   Categorical            If the company is listed on NGM MTF
inlforum                    Continuous             Amount of forum posts
shareholders                Continuous             Amount of shareholders in the company
marketcap                   Continuous             The market capitalization of the company
freefloat                   Continuous             The free-float of the company
NumberShares                Continuous             The number of outstanding shares
Table 1. The variables that are used and referred to throughout this paper
3.3 Base models
Trade volumes have been measured by three response variables: turnover, total volume and number of trades (for definitions see Table 1). These three variables are the most quoted statistics when gathering information from a stock market on the trading activity of a stock. They are related but tell slightly different things and are therefore used as three separate response variables. Three base models, one for each of the dependent variables, have been computed. The three models are run with the same covariates, as specified in Table 1.
The model with response variable “Trades” is shown below:
$$\begin{aligned}
\text{Trades} ={}& B_1(\text{aktietorget}) + B_2(\text{dividend}) + B_3(\text{Firstnorth}) + B_4(\text{ngm}) \\
&+ B_5(\text{nordicmtf}) + B_6(\text{inlforum}) + B_7(\text{shareholders}) + B_8(\text{marketcap}) \\
&+ B_9(\text{freefloat}) + B_{10}(\text{NumberShares}) + B_{11}(\text{Largecap}) + B_{12}(\text{Midcap}) + B_{13}(\text{Smallcap})
\end{aligned}$$
3.4 Data collection
Data for the dependent variables turnover, total volume and trades has been retrieved from the stock markets NASDAQ OMX Nordic, First North, Aktietorget and Nordic Growth Market. The data refers to daily records of each company stock between 1 January 2016 and 6 March 2017, which corresponds to approximately 300 days of trading. For each stock the average daily turnover, volume and number of trades were extracted.
Data for the independent variables was to a large degree collected through the Avanza Stock Filter, which is available online. Free-float was calculated based on the shareholder structure, which each listed company updates quarterly. Information on the number of shareholders was retrieved by visiting the central securities depository in Stockholm.
3.5 Building the models, transformations
As described in section 2.6, the transformation of either the dependent variable or one or several of the independent variables can prove to be useful. This section takes a look at the variables described in Table 1.
3.5.1 The dependent variables
Trades, turnover and volume are the dependent variables in the three regression models used for this study. As the dependent variables can only take on positive values, it is natural to log-transform them [4]. Another argument for using the log of a variable is that its distribution is skewed. A way to illustrate the distribution of data is to plot a histogram of the variable of choice. A histogram of the dependent variable “Trades” is shown below:
Figure 2. The effect of logarithmic transformation of the dependent variable "Trades". As observed, a heavily positively skewed distribution of data can come close to a normal distribution after transformation.
As visible in Figure 2, the data is heavily skewed and can be difficult to fit with a linear regression model. This is mitigated by using the logarithm of the affected variable. By using the logarithm on skewed data, a more even distribution can be expected. As visualized in Figure 2, the logarithm of “Trades” proves useful to achieve a distribution close to normal.
The same transformation was considered and performed for “Turnover” and “Totalvolume”.
3.5.2 The independent variables
The quantitative variables “free float”, “number of shareholders”, “marketcap” and “number of shares” are reviewed and some are transformed, as shown in Figure 3. A heavy positive skew can be observed in all variables except “free float”, which is left untouched. It is also easily observed that the three other covariates take on a wide range of values, from low to very high, which in itself warrants the use of logarithmic transforms.
Figure 3. The independent variables before (to the left) and after (to the right) logarithmic transformation
These variables, with the exception of free-float, are considered appropriate for log transformation.
3.6 Variable Selection
3.6.1 Akaike Information Criterion
Based on the observations in Figures 2 and 3, the dependent variables and three of the independent variables are log transformed.
A test using the Akaike information criterion is performed to explore whether a reduced model is preferred. The results are shown in Table 2, with the following reference number for each variable:
1: log(marketcap). 2: log(NumberShares). 3: log(shareholders). 4: freefloat. 5: dividend. 6: inlforum. 7: ngm. 8: nordicmtf. 9: firstnorth. 10: aktietorget.
As an example, in the third column the value “-1” represents the removal of “log(marketcap)”. The values under “-1” refer to the change in AIC compared to the AIC value of the full model.
Model        Full model (AIC)  -1     -2     -3     -4     -5     -6    -7    -8     -9     -10
Log-Model 1  1567.6            84.7   22.7   152.4  62.7   10.7   2.44  12.9  -1.8   -1.9   -2
Log-Model 2  1828.4            223.6  7.3    97.3   66.6   0.949  2.98  9.88  -1.96  -1.92  -2
Log-Model 3  1676.2            41.4   516.6  77.8   101.4  7.4    1.1   3.4   -1.1   -1.8   -1.5
Table 2. AIC test performed in order to test whether a reduced model is preferred.
The values in Table 2 represent the action of reducing the model by one variable at a time, with all others included. Each value is the change in AIC upon the removal of that variable. A positive value means a higher AIC for the model, indicating that the variable should be kept in the model.
As visible, it is warranted to keep the log-transformed independent variables. Some benefit is indicated from the removal of “nordicmtf”, “firstnorth” and “aktietorget”. However, the removal of these variables only reduces the AIC values by a relatively small amount. Individual p-values will be reviewed when the regressions are performed and will decide on the elimination of the aforementioned variables from the models. Upon the potential removal of these variables, the reduced models will also be reviewed by computing the AIC value stepwise.
3.6.2 Performing the regressions
With the new log-models the regressions are run and the estimates and individual p-values for the coefficients are presented. In this section the coefficients will be reviewed in terms of p-values and a stepwise AIC test will be performed.
3.6.2.1 Regression from log-model 1
Log-model 1 Coefficients:
Variable           Estimate   Std. Error  t value  p-value
(Intercept)        -5.883915  0.455837    -12.908  < 2e-16
Log(marketcap)     0.269349   0.028180    9.558    < 2e-16
Log(NumberShares)  0.156832   0.031542    4.972    8.58e-07
Log(shareholders)  0.625322   0.047688    13.113   < 2e-16
freefloat          1.643060   0.200366    8.200    1.39e-15
dividend           -0.341982  0.096409    -3.547   0.000419
inlforum           0.007051   0.003367    2.094    0.036640
ngm                -1.126831  0.292306    -3.855   0.000128
nordicmtf          0.087610   0.197313    0.444    0.657188
firstnorth         -0.029121  0.110082    -0.265   0.791451
aktietorget        0.009318   0.131938    0.071    0.943719
Table 3. At the 5 % significance level the variables “nordicmtf”, “firstnorth” and “aktietorget” are deemed insignificant and should be removed from the model.
For log-model 1 the regression shows insignificance at the 5 % level for multiple variables. The model is now reduced one variable at a time, starting with the least significant variable:
Variable           p-value, reduction 1  p-value, reduction 2  p-value, reduction 3
(Intercept)        < 2e-16               < 2e-16               < 2e-16
Log(marketcap)     < 2e-16               < 2e-16               < 2e-16
Log(NumberShares)  8.40e-07              6.85e-07              6.66e-05
Log(shareholders)  < 2e-16               < 2e-16               < 2e-16
freefloat          < 2e-16               < 2e-16               < 2e-16
dividend           0.000216              0.000232              0.000207
inlforum           0.036554              0.038149              0.038760
ngm                7.06e-05              7.63e-05              7.06e-05
nordicmtf          0.642421              0.570206
firstnorth         0.679422
Table 4. P-values are reviewed by stepwise elimination of insignificant variables
Table 4 shows the p-values of the coefficients as the least significant variable is removed. Only three variables were removed, as the other coefficients remained sufficiently significant during the stepwise removal (p-value below 5 %). The stepwise removal of variables is performed since the removal of one variable might affect the significance of other variables. By removing one variable at a time, the reduced model can be reviewed and evaluated for further reduction.
3.6.2.2 Regression from log-model 2
Log-model 2 Coefficients:
Variable           Estimate   Std. Error  t value  p-value
(Intercept)        2.194271   0.560664    3.914    0.000101
Log(marketcap)     0.565925   0.034661    16.328   < 2e-16
Log(NumberShares)  0.117754   0.038795    3.035    0.002504
Log(shareholders)  0.603162   0.058655    10.283   < 2e-16
freefloat          2.080638   0.246443    8.443    < 2e-16
dividend           -0.202101  0.118580    -1.704   0.088819
inlforum           0.009185   0.004141    2.218    0.026924
ngm                -1.234486  0.359526    -3.434   0.000635
nordicmtf          -0.046746  0.242688    -0.193   0.847323
firstnorth         0.036450   0.135397    0.269    0.787857
aktietorget        0.001564   0.162280    0.010    0.992314
Table 5. At the 5 % significance level the variables “dividend”, “nordicmtf”, “firstnorth” and “aktietorget” are deemed insignificant and should be removed from the model.
For log-model 2 the regression also shows insignificance for some variables. As with the regression for log-model 1, a stepwise reduction of the model was performed. The result is shown below:
Variable           p-value
(Intercept)        3.09e-05
Log(marketcap)     < 2e-16
Log(NumberShares)  0.000822
Log(shareholders)  < 2e-16
freefloat          < 2e-16
Inlforum           0.011847
ngm                0.000420
Table 6. P-values are reviewed by stepwise elimination of insignificant variables
3.6.2.3 Regression from log-model 3
Log-model 3 Coefficients:
                    Estimate    Std. Error  t value  p-value
(Intercept)         -9.349573   0.496885   -18.816   < 2e-16
Log(marketcap)      -0.204314   0.030718    -6.651   6.39e-11
Log(NumberShares)    0.966926   0.034382    28.123   < 2e-16
Log(shareholders)    0.475279   0.051982     9.143   < 2e-16
freefloat            2.295404   0.218408    10.510   < 2e-16
dividend            -0.321666   0.105091    -3.061   0.0023
inlforum             0.006375   0.003670     1.737   0.0829
ngm                 -0.741257   0.318627    -2.326   0.0203
nordicmtf            0.200674   0.215081     0.933   0.3512
aktietorget         -0.094800   0.143819    -0.659   0.5100
Table 4. At the 5 % significance level the variables “inlforum”, “nordicmtf”, “firstnorth” and “aktietorget” are deemed insignificant and should be removed from the model.
The regression for log-model 3 yielded several insignificant variables. As with the other models these variables are stepwise removed and the p-values of the remaining coefficients are shown below:
Coefficients:
                    p-value
(Intercept)         < 2e-16
Log(marketcap)      9.92e-12
Log(NumberShares)   < 2e-16
Log(shareholders)   < 2e-16
freefloat           < 2e-16
dividend            0.000845
ngm                 0.018782
Table 5. P-values are reviewed by stepwise elimination of insignificant variables
3.6.3 Stepwise AIC
Stepwise AIC, here run backwards, starts from the full model and tentatively removes one covariate at a time; the covariate whose removal causes the largest reduction of AIC is excluded from the model. The test is then repeated until the AIC value cannot be reduced further. The stepwise iteration is performed in RStudio using the built-in “step” function in R.
The result from the stepwise AIC is visualized below, showing what covariate is removed and in what order.
              Step 1         Step 2         Step 3
log-model 1   -aktietorget   -firstnorth    -nordicmtf
log-model 2   -aktietorget   -nordicmtf     -firstnorth
log-model 3   -firstnorth    -aktietorget   -nordicmtf
Table 6. Stepwise AIC, showing what covariate is removed for every step
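The backward procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (not the data of this study), the helper names `ols_rss`, `aic` and `backward_aic` are hypothetical, and the AIC is taken in the common Gaussian form n·log(RSS/n) + 2k, where k counts the estimated coefficients including the intercept.

```python
import math
import random


def ols_rss(X, y):
    """Residual sum of squares of an OLS fit of y on the columns of X plus an intercept."""
    n = len(y)
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented normal equations (A'A | A'y).
    M = [[sum(rows[i][a] * rows[i][b] for i in range(n)) for b in range(p)]
         + [sum(rows[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    beta = [M[c][p] / M[c][c] for c in range(p)]
    return sum((y[i] - sum(b * v for b, v in zip(beta, rows[i]))) ** 2 for i in range(n))


def aic(data, y, names):
    """Gaussian AIC (up to an additive constant) of the model using the covariates in names."""
    n = len(y)
    X = [[data[k][i] for k in names] for i in range(n)]
    return n * math.log(ols_rss(X, y) / n) + 2 * (len(names) + 1)


def backward_aic(data, y):
    """Drop, one at a time, the covariate whose removal lowers AIC the most."""
    kept = list(data)
    current = aic(data, y, kept)
    removed = []
    while kept:
        best, drop = min((aic(data, y, [k for k in kept if k != d]), d) for d in kept)
        if best >= current:  # no removal improves AIC any further
            break
        kept.remove(drop)
        removed.append(drop)
        current = best
    return kept, removed


# Synthetic example: y depends on x1 and x2, while "noise" is unrelated to y.
random.seed(1)
n_obs = 200
data = {name: [random.gauss(0, 1) for _ in range(n_obs)] for name in ("x1", "x2", "noise")}
y = [1.0 + 2.0 * a + 1.0 * b + random.gauss(0, 0.5) for a, b in zip(data["x1"], data["x2"])]
kept, removed = backward_aic(data, y)
print("kept:", kept, "removed in order:", removed)
```

The strongly related covariates stay in the model because dropping them raises the AIC; whether the unrelated covariate is dropped depends on the sample, which mirrors why AIC and p-value reviews can disagree on borderline variables.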
The stepwise AIC test does not conflict with the significance tests in which p-values were reviewed. However, the AIC values do not indicate that a removal of “dividend” for log-model 2 and “inlforum” for log-model 3 is warranted. The p-values suggest otherwise, and these two variables will be removed from the respective models. In addition, “aktietorget”, “firstnorth” and “nordicmtf” will be removed from all models.
3.7 Validating the reduced models by computing R2 and adjusted R2
The next step is to validate the reduced models by computing a measure of goodness of fit, as discussed in section 2.4.3. For this purpose R2 is computed for the three models, first for the base log-models and then for the reduced models that were constructed as a result of the previous tests.
              Base model              Reduced model
MODEL         R2       Adjusted R2    R2       Adjusted R2
Log-Model 1   0.7959   0.7926         0.7957   0.7934
Log-Model 2   0.8293   0.8266         0.8284   0.8267
Log-Model 3   0.8188   0.8158         0.8171   0.8154
Table 7. The log-models are compared with the reduced log-models.
The values of R2 show that the goodness of fit becomes only slightly worse as the models are reduced; rounded to two decimals it is unchanged. The reduced models can therefore be kept and the final models can be constructed.
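For reference, the usual relation between R2 and adjusted R2 is restated below. The symbols n (number of observations) and k (number of covariates, excluding the intercept) are notation assumed here and may differ from the notation in section 2.4.3.

```latex
\[
  \bar{R}^2 \;=\; 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1}
\]
```

Because the penalty factor (n-1)/(n-k-1) shrinks when k is reduced, adjusted R2 can rise even though R2 falls slightly, as for Log-Model 1 in the table above (0.7926 for the base model against 0.7934 for the reduced model).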
3.8 The final models
Throughout section 3 rigorous testing and diagnostics have been performed on the base models.
The tests have prompted changes and the final models are presented below:
3.8.1 Final model 1
$$\log(\text{Trades}) = B_1 \log(\text{marketcap}) + B_2 \log(\text{NumberShares}) + B_3 \log(\text{shareholders}) + B_4\,\text{freefloat} + B_5\,\text{dividend} + B_6\,\text{inlforum} + B_7\,\text{ngm}$$
3.8.2 Final model 2
$$\log(\text{Turnover}) = B_1 \log(\text{marketcap}) + B_2 \log(\text{NumberShares}) + B_3 \log(\text{shareholders}) + B_4\,\text{freefloat} + B_5\,\text{inlforum} + B_6\,\text{ngm}$$
3.8.3 Final model 3
$$\log(\text{Totalvolume}) = B_1 \log(\text{marketcap}) + B_2 \log(\text{NumberShares}) + B_3 \log(\text{shareholders}) + B_4\,\text{freefloat} + B_5\,\text{dividend} + B_6\,\text{ngm}$$
3.9 Reviewing residual and Q-Q plots of the final models
In section 2.3 residual and Q-Q plots are discussed. In this section the final models are reviewed by plotting their respective residual and Q-Q plots:
Table 8. Residual and Q-Q plots for Log(Model 1). The residuals are to a large degree randomly spread and the model can be assumed to be close to homoscedastic. The Q-Q plot indicates that the normality assumption of the residual holds.
Table 9. Residual and Q-Q plots for Log(Model 2). The residuals are to a large degree randomly spread and the model can be assumed to be close to homoscedastic. The Q-Q plot indicates that the normality assumption of the residual holds. One outlier can be observed.
Table 10. Residual and Q-Q plots for Log(Model 3). The residuals are to a large degree randomly spread and the model can be assumed to be close to homoscedastic. The Q-Q plot indicates that the normality assumption of the residual holds, however some outliers can be observed at both ends of the plot.
The residual and Q-Q plots show that, for all three models, the assumptions of homoscedasticity and normally distributed residuals (as described in Section 2.3) hold. One last diagnostic will be run on the final models: they will be reviewed in terms of multicollinearity among the variables.
3.10 Checking for Multicollinearity
As discussed in Section 2.3, the presence of multicollinearity is a violation of one of the assumptions of OLS. Strong collinearity among covariates inflates the variance of the coefficient estimates, so that individual coefficients can be estimated imprecisely.
3.10.1 Computing the Variance Inflation Factors
To review whether multicollinearity is present among the covariates, the variance inflation factors (“VIF”) are computed; the theory behind them is described in section 2.4.4.
The VIF values for Final Model 1-3 are printed out below:
Independent variable, Final Model 1 VIF
Log(marketcap) 4.026025
Log(NumberShares) 2.084777
Log(shareholders) 3.933942
freefloat 1.341512
dividend 1.914119
inlforum 1.100350
ngm 1.016544
Table 11. Variance inflation factors for the independent variables in Final Model 1. The VIF values are relatively low and not a cause for concern
Independent variable, Final Model 2 VIF
Log(marketcap) 2.801390
Log(NumberShares) 2.026184
Log(shareholders) 3.911890
freefloat 1.336846
inlforum 1.078921
ngm 1.015808
Table 12. Variance inflation factors for the independent variables in Final Model 2. For this model the VIF values are also relatively low and not a cause of concern
Independent variable, Final Model 3 VIF
Log(marketcap) 4.013539
Log(NumberShares) 2.084102
Log(shareholders) 3.928862
freefloat 1.294825
dividend 1.876842
ngm 1.016195
Table 13. Variance inflation factors for the independent variables in Final Model 3. The VIF values for the model are low and not a cause of concern
The results above show that collinearity between the covariates is low for all final models. The assumption of no multicollinearity, as described in section 2.3, is therefore satisfied. No changes to the final models are required based on the VIF values.
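To give the VIF values a more tangible reading, the sketch below uses the standard identity VIF_j = 1 / (1 - R_j²), where R_j² is the R2 obtained when covariate j is regressed on the remaining covariates. The function names are illustrative assumptions; the two VIF values are taken from the table for Final Model 1.

```python
def vif_from_r2(r2):
    """Variance inflation factor implied by the auxiliary-regression R2 of a covariate."""
    return 1.0 / (1.0 - r2)


def r2_from_vif(vif):
    """Auxiliary-regression R2 implied by a reported variance inflation factor."""
    return 1.0 - 1.0 / vif


# Highest and lowest VIF reported for Final Model 1.
r2_marketcap = r2_from_vif(4.026025)  # Log(marketcap)
r2_ngm = r2_from_vif(1.016544)        # ngm
print(f"Log(marketcap): R2 = {r2_marketcap:.3f}, ngm: R2 = {r2_ngm:.3f}")
```

Read this way, roughly three quarters of the variation in Log(marketcap) is explained by the other covariates, while ngm is almost unrelated to them.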
3.11 F-statistic of the final models
The F-statistics for the final models are shown below together with their p-values:
                F-stat   p-value
Final Model 1   346.1    < 2.2e-16
Final Model 2   501.1    < 2.2e-16
Final Model 3   464.0    < 2.2e-16
Table 14. F-Statistic for the final models.
4 Result
4.1 Market
The study overall shows that the market on which a company is listed is not relevant to explain the daily trade volumes. The exception is the market NGM which shows a negative effect on trade volumes.
Market        Trades          Turnover        Totalvolume
              β               β               β
NGM Equity    -1.127335       -1.226173       -0.72323
First North   Insignificant   Insignificant   Insignificant
Aktietorget   Insignificant   Insignificant   Insignificant
Nordic MTF    Insignificant   Insignificant   Insignificant
Table 15. Regression results for the categorical variables that describe the market that a company belongs to
The results show that, all other variables held constant, a company listed on NGM Equity has about 68 % fewer trades than a company listed on NASDAQ (computed as 100·(e^β − 1) from the coefficient above). The turnover, i.e. the cash turnover, is about 71 % lower and the total volume (number of shares traded) about 51 % lower. For the other markets there is no significant difference relative to NASDAQ, indicating that companies can enjoy the same trade volumes on those markets.
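Since the dependent variables are log-transformed, the coefficient of a 0/1 variable translates into a percentage effect through 100·(exp(β) − 1). The short sketch below applies this to the NGM Equity coefficients from the table above; the function name is an illustrative assumption.

```python
import math


def dummy_effect_percent(beta):
    """Percent change in the unlogged response when a 0/1 variable switches from 0 to 1."""
    return 100.0 * (math.exp(beta) - 1.0)


# NGM Equity coefficients from the regression results for the three final models.
ngm_coefficients = {"Trades": -1.127335, "Turnover": -1.226173, "Totalvolume": -0.72323}
ngm_effects = {name: dummy_effect_percent(b) for name, b in ngm_coefficients.items()}
for name, pct in ngm_effects.items():
    print(f"{name}: {pct:.1f} %")
```

The same conversion applies to the dividend coefficients in the next subsection.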
4.2 If the company pays dividends
Trades Turnover Totalvolume
β β β
Dividend -0.343015 Insignificant -0.33380
Table 16. Regression results for if a company pays dividends or not
The results show that, all other variables held constant, a company that pays dividends has about 29 % fewer trades. Paying dividends is insignificant for the cash turnover. The total number of shares traded is about 28 % lower.
4.3 Number of outstanding shares
                               Trades     Turnover   Totalvolume
                               β          β          β
Number of outstanding shares   0.158270   0.128181   0.96920
Table 17. Regression results for the number of outstanding shares
The results show, all other variables held constant, that when the number of outstanding shares changes by 1 % the number of trades changes by 0.16 %. The cash turnover similarly changes by 0.13 % and the total number of shares traded by 0.97 %.
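The reading of a log-log coefficient as a percentage change per 1 % change can be made explicit as follows. Here y stands for the dependent variable and x for the log-transformed covariate; both symbols are notation assumed for this illustration, and the numerical value uses the Trades coefficient for the number of outstanding shares.

```latex
\begin{align*}
  \log y = \dots + \beta \log x
    \;&\Longrightarrow\;
    \frac{y_{\text{new}}}{y_{\text{old}}}
      = \left(\frac{x_{\text{new}}}{x_{\text{old}}}\right)^{\beta}, \\
  \frac{x_{\text{new}}}{x_{\text{old}}} = 1.01
    \;&\Longrightarrow\;
    \frac{y_{\text{new}}}{y_{\text{old}}} - 1
      = 1.01^{\beta} - 1 \approx \frac{\beta}{100}, \\
  \beta = 0.158270
    \;&\Longrightarrow\;
    1.01^{0.158270} - 1 \approx 0.0016 \quad (\text{about } 0.16\,\%).
\end{align*}
```

For a 1 % change the approximation β/100 is very close to the exact value, which is why the coefficients of the log-transformed covariates can be read directly as percentages.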
4.4 Number of shareholders
                         Trades     Turnover   Totalvolume
                         β          β          β
Number of shareholders   0.624703   0.610832   0.48761
Table 18. Regression results for the number of shareholders
The results show, all other variables held constant, that when the number of shareholders changes by 1 % the number of trades changes by 0.62 %. The cash turnover changes by 0.61 % and the total number of shares traded by 0.49 %.
4.5 Market capitalization
                        Trades     Turnover   Totalvolume
                        β          β          β
Market capitalization   0.269125   0.532276   -0.20179
Table 19. Regression results for the market capitalization
The results show, all other variables held constant, that when the market capitalization changes by 1 % the number of trades changes by 0.27 %, the cash turnover by 0.53 % and the total number of shares traded by -0.20 %.
4.6 Number of forum posts
Trades Turnover Totalvolume
β β β
Number of forum posts 0.006938 0.010315 Insignificant
Table 20. Regression result for the monthly number of forum posts in the Avanza stock forum "Placera"
The results show, all other variables held constant, that one additional monthly forum post in the Avanza stock forum “Placera” changes the number of trades by 0.7 % and the cash turnover by 1 %. For the total number of shares traded, the number of forum posts is insignificant.
4.7 Free-float
Trades Turnover Totalvolume
β β β
Free-float 1.659238 2.092709 2.32018
Table 21. Regression results for the free-float of a company
The interpretation of the results is similar to that of the other variables that are not log-transformed. As free-float takes values between 0 and 1, it is more suitable to describe the change in the dependent variable for every 0.1 unit change in free-float. Using the first-order approximation 100·0.1·β, the results show, all other variables held constant, that a 0.1 unit increase in free-float results in a change of about 16.6 % in trades, 20.9 % in cash turnover and 23.2 % in total volume. The exact changes, 100·(e^(0.1β) − 1), are somewhat larger: about 18.0 %, 23.3 % and 26.1 % respectively.
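As a check on the arithmetic, the sketch below computes the effect of a 0.1 unit increase in free-float from the reported coefficients, both with the first-order approximation 100·β·Δx and with the exact expression 100·(exp(β·Δx) − 1). The function name is an illustrative assumption and not part of the code used in this study.

```python
import math


def semilog_percent_change(beta, delta):
    """Return (first-order, exact) percent change in y for a change delta in x,
    when the model is log(y) = ... + beta * x."""
    first_order = 100.0 * beta * delta
    exact = 100.0 * (math.exp(beta * delta) - 1.0)
    return first_order, exact


# Free-float coefficients from the regression results for the three final models.
freefloat_beta = {"Trades": 1.659238, "Turnover": 2.092709, "Totalvolume": 2.32018}
freefloat_effects = {name: semilog_percent_change(b, 0.1) for name, b in freefloat_beta.items()}
for name, (approx, exact) in freefloat_effects.items():
    print(f"{name}: first-order {approx:.1f} %, exact {exact:.1f} %")
```

The gap between the two columns grows with β·Δx, so for coefficients of this size the exact expression is the safer one to quote.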
4.8 Conclusions on hypothesis testing
4.8.1 Hypothesis testing, Trades
The regression model shows that all variables, except whether the company is listed on Nordic MTF, First North or Aktietorget, explain some of the daily trades in the company stock. With the support of the F-statistic and its p-value in section 3.11, the null hypothesis for the model as a whole can therefore be rejected. The null hypothesis for the insignificant variables is not rejected.
4.8.2 Hypothesis testing, Turnover
This regression model shows that all variables, except whether the company is listed on Nordic MTF, First North or Aktietorget and whether the company pays dividends, explain some of the cash turnover of the company stock. The null hypothesis for the model as a whole can be rejected, as is visible in section 3.11, where the p-value of the F-statistic shows significance. As with the other model, the null hypothesis for the insignificant variables is not rejected.
4.8.3 Hypothesis testing, Total volume
This regression model instead shows that all variables, except whether the company is listed on Nordic MTF, First North or Aktietorget and the number of forum posts (Avanza stock forum), explain some of the number of shares traded in the company stock. The null hypothesis for the model as a whole can be rejected for this model as well, again with support from the results in section 3.11. As with the other two models, the null hypothesis for the insignificant variables is not rejected.
5. Discussion
5.1 Commenting the results
The three final models yielded R2 values around 0.8 and adjusted R2 just marginally reduced compared to the non-reduced R2, which is satisfactory considering the presence of human behaviour in stock trading.
A surprising result was that, relative to the main market NASDAQ, being listed on another market (NGM Equity being the exception) did not matter in terms of explaining trade volumes. This is contrary to the popular belief that the NASDAQ main market is the best market on many factors.
Another surprise was the effect of whether a company pays dividends. Intuitively one might think that it should have a positive effect on trade volumes. There is a risk that this variable is correlated with a variable not used in the regressions, namely business sector. The companies that pay continuous dividends are to a large degree investment companies, real estate companies and, in general, other mature companies. A large portion of the companies listed in Sweden are actually on the growth markets: Nordic Growth Market, Aktietorget or First North. As these are companies in development, they rarely pay dividends.
In summary, the results obtained for this variable should be viewed with scepticism.
5.2 Model accuracy
All models employed were log-models, with the dependent variable and several of the independent variables log-transformed. The use of log-transforms was well justified, but it still affects interpretability. Results obtained for single variables should be reviewed with caution, as the other variables need to be taken into consideration as well. The models should also be used for interpretation rather than relying on the coefficient values to precisely explain the changes in trade volumes; the results are best read as showing the direction and significance of the change when a variable is altered.
It ought to be mentioned that for the tests performed in this paper the assumptions of OLS hold.
5.3 Relevance to companies that are listed or planning on becoming listed
A vital question throughout this study has been whether listed companies in Sweden can, based on the results of this paper, increase the trade volumes of their stocks. As mentioned above, the number of shareholders, the free float and the number of shares in the company can all have a potentially material effect on the trade volumes of the company. Companies can to some degree influence these three factors.
Drastic measures to increase the number of shareholders are to offer shares to the market, also called a share issuance, or to merge with another company that has a strong shareholder base.
Issues with low free-float can be dealt with in a similar fashion; they can also be avoided by raising capital from the open market instead of from institutional investors.
The number of shares in a company is something that a company can alter by performing share splits, for example by having 1 share become 10 shares, each valued at 1/10 of the original share.
Companies listed on the NGM Equity market can consider applying for a listing elsewhere, as a listing on this market was associated with lower trade volumes.
Companies that are about to perform an IPO sometimes only offer a small portion of company shares to the open market. This study can help such companies to foresee if they will be at risk of having their shares rarely traded and perhaps change their IPO strategy.
5.4 Further research
Further research on the topic should include more variables. For example, the addition of the variable “Business Sector” could explain some of the trade volumes and perhaps explain the results that this study obtained for companies that pay dividends. Further research could also be made on interactions among variables. It would also be worthwhile to expand the scope of research to include markets outside of Sweden.