
Macroeconomic factors' impact on the number of bankruptcies among small and medium-sized companies



DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

Macroeconomic factors' impact on the number of bankruptcies among small and medium-sized companies

AGNES HANSSON

AGNES LINDVALL


Macroeconomic factors' impact on the number of bankruptcies among small and medium-sized companies

Agnes Hansson Agnes Lindvall


Degree Projects in Applied Mathematics and Industrial Economics (15 hp)
Degree Programme in Industrial Engineering and Management (300 hp)
KTH Royal Institute of Technology year 2020

Supervisor at KTH: Ximei Wang


TRITA-SCI-GRU 2020:104 MAT-K 2020:005

Royal Institute of Technology
School of Engineering Sciences
KTH SCI

SE-100 44 Stockholm, Sweden


Abstract

Small and medium-sized companies constitute a large part of the Swedish economy and are to a great extent exposed to developments in the macroeconomy. There is a general consensus that a relationship exists between these two components, but to what extent is it true? The aim of this thesis is to evaluate whether the number of bankruptcies among small and medium-sized companies can be explained by the situation in the macroeconomy. In order to do so, data have been collected and a multiple linear regression analysis has been performed.

The result of the analysis suggests that a model of the six macroeconomic factors months, CPI, retail sales, OMX30, total enterprises and liquidated enterprises can explain 64.49% of the variation in the number of bankruptcies. When comparing the adequacy of other models used to estimate the risk of bankruptcy, it is concluded that those models are more accurate. Furthermore, we have concluded that the model is useful for bringing insight, but only when considered in combination with other models and tools.

Keywords

Bachelor thesis, bankruptcy, macroeconomic factors, multiple regression analysis, statistics.


Sammanfattning

Små- och medelstora bolag utgör en stor del av den svenska ekonomin och är i mycket stor utsträckning exponerade mot utvecklingen i makroekonomin. Den generella bilden är att det existerar en relation mellan dessa två komponenter, men till vilken utsträckning är det sant? Syftet med den här uppsatsen är att utvärdera om antalet konkurser bland små- och medelstora bolag kan förklaras av vilket stadium makroekonomin befinner sig i. För att kunna genomföra undersökningen har en multipel linjär regressionsanalys genomförts.

Resultatet av analysen visar att en modell bestående av sex oberoende variabler, månader, KPI, detaljhandeln, OMX30, totalt antal bolag och antal likviderade bolag, kan till 64.49% förklara utfallet av antalet konkurser bland små- och medelstora bolag. Jämfört med andra modeller och teorier som används för att beräkna risken av konkurs hos ett bolag uppfyller modellen inte lika hög tillförlitlighet. Slutligen föreslås att modellen kan användas i kombination med andra modeller och verktyg för att bidra med insikt och slutsatser.

Nyckelord

Kandidatuppsats, konkurs, makroekonomiska faktorer, multipel linjär regression, statistik.


Acknowledgements

We want to thank Mykola Shykula, at the Department of Mathematics, and Julia Liljegren, at the Department of Industrial Engineering and Management, for showing interest and being great supervisors during our project, and for giving valuable advice along the way. Thank you!


Contents

1 Introduction
  1.1 Background
  1.2 Research question
  1.3 Earlier research
2 Impact
  2.1 Scope
  2.2 Delimitation
3 Economic theory
  3.1 Response variable
  3.2 Regressor variables
    3.2.1 CPI
    3.2.2 Retail sales
    3.2.3 Unemployment
    3.2.4 Import and Export price index
    3.2.5 Producer price index
    3.2.6 OMX30
    3.2.7 Number of enterprises
4 Mathematical theory
  4.1 Definition and terminology
    4.1.1 Basic assumptions
    4.1.2 Ordinary least squares estimation
    4.1.3 Multicollinearity
    4.1.4 Standardized regression coefficients
    4.1.5 Categorical variables
    4.1.6 Strategy for model building
  4.2 Model evaluation and diagnostics
    4.2.1 Residual analysis
    4.2.2 Examination of the correlation matrix
    4.2.3 Variance inflation factors
    4.2.4 Methods for handling of multicollinearity
    4.2.5 R2
    4.2.6 AIC and BIC
    4.2.7 Mallow's Cp
    4.2.8 Cook's distance
    4.2.9 Covariance Ratio
    4.2.10 The Box-Cox method
    4.2.11 Cross validation
  4.3 Statistical inference
    4.3.1 Confidence interval
    4.3.2 Test for significance of regression
    4.3.3 F-test statistic
    4.3.4 t-test statistic
5 Methodology
  5.1 Code environment
  5.2 Data
    5.2.1 Data variables
    5.2.2 Modification of data
  5.3 Model building
  5.4 Model evaluation
    5.4.1 Residual analysis
    5.4.2 Diagnostics
    5.4.3 Top three models
6 Result
  6.1 Residual analysis
  6.2 Diagnostics
  6.3 Final model
    6.3.1 Final model without outliers
7 Discussion
  7.1 Variable selection
    7.1.1 CPI
    7.1.2 Retail Sales
    7.1.3 OMX30
    7.1.4 Total enterprises
    7.1.5 Liquidated enterprises
  7.2 Shifting in data
  7.3 Outliers
  7.4 Bootstrap for Final model
8 Conclusion


1 Introduction

Surviving in today's competitive markets can be challenging for many companies. Some companies face decreasing revenues and profits and eventually fail to meet their obligations. A bankruptcy does not only affect the owners but also other stakeholders such as employees, customers and creditors. The creditors are the most significantly affected, since they risk losing the outstanding debt when loans are not repaid. With that said, there are several parties interested in knowing whether a company faces a higher or lower risk of bankruptcy.

The aim of this analysis is to evaluate whether there exists a correlation between the number of bankruptcies and macroeconomic factors and the performance of the Swedish equity market index OMX30. Furthermore, it will be evaluated whether there are time indicators suggesting when the number of bankruptcies is increasing or decreasing. Our hypothesis is that the number of bankruptcies will correlate with macroeconomic factors and the equity market's performance.

We also expect to see annual trends in the number of bankruptcies.

1.1 Background

When a corporation files for bankruptcy it has been facing financial distress for a certain amount of time. The corporation has been unable to repay its outstanding debt and will consequently be subject to a legal proceeding in which the assets of the firm are measured, evaluated and finally distributed to creditors.[2] The bankruptcy will impact the corporation's shareholders, but it will indirectly also affect the macroeconomy in the long run, e.g. through rising unemployment. Bankruptcy can in many cases be explained by microeconomic factors, such as internal ratios and figures from the financial statements. However, it can be assumed that the corporation most likely also is affected by the financial and economic environment in which it is operating.[3]

In Sweden today there are approximately 1.2 million corporations, and of these over 96% are small businesses with 0-10 employees. Small and medium-sized companies make up almost 40% of the total revenues of Swedish enterprises and employ close to 45% of the total number of employees.[4] According to statistics from Statistics Sweden (SCB), 42% of Swedish women and 52% of Swedish men aged 18-70 considered founding their own business, measured during the period 2015-2017.[5] The statistics also show that 22.3% of the population applied for business loans.[5]

The greatest risk with lending money is that the debtor will not be able to pay back the loan, i.e. amortisation and interest, within the agreed time frame. When a company faces default over a longer time period, this will potentially lead to bankruptcy and have an impact on the creditors, since the outstanding debt may not be repaid. When the number of bankruptcies increases it normally leads to higher loan losses for the creditors.[6]

1.2 Research question

The framework of multiple regression analysis will be used to answer the research question: what impact do macroeconomic factors, the performance of Swedish equities and time indicators have on the number of bankruptcies among Swedish small and medium-sized companies?

1.3 Earlier research

Studies of the number of bankruptcies and of the factors influencing their development are commonly based on internal numbers from the corporations' financial statements.[8] The age of a firm is also a commonly used factor when trying to predict the probability of bankruptcy. However, there have only been a few analyses made with macroeconomic factors.

In 2005, Norges Bank published an empirical study of factors influencing the number of bankruptcies. The number of bankruptcies had risen sharply during 2002 and 2003 but then fell back to levels seen in the mid 1990s. The underlying factors were analysed and the conclusion was that changes in profit margins, competitiveness and real interest rates had been the main drivers of the change in the number of bankruptcies. Competitiveness was evaluated with regard to the exchange rate of the Norwegian currency. The analysis also showed that high wage growth had contributed to the number of bankruptcies. It was also discussed to what extent macroeconomic factors, such as the combination of labour, material and real capital, had an impact on the number. Firstly, if unemployment rose from 4% to 5% of the total labour force, the number of bankruptcies would increase by approximately 8.25%, with visible effects after two quarters and full effect after a year. Secondly, the analysis showed that the number increased by 1.75% in the first quarter, followed by 2.75% in the longer term, when real unit labour costs increased by 1%.[6]

In 2009, Philippe du Jardin, professor at EDHEC Business School, wrote a report on how relevant it is to choose the right variables for predicting bankruptcies. The report raised the question of not only determining the correct model but also evaluating the chosen independent variables to assess the joint precision of the model. In the report, 190 papers were analysed and the takeaway was that five issues concerning the modelling got major focus, while the variable selection was somewhat neglected. The conclusion of the research suggests that equal effort should be put into modelling and variable selection, although the same methods and techniques should not be used to evaluate the two aspects. Lastly, du Jardin stated that exploration of the variables would allow a better understanding of their financial impact and of how well a factor reflects failure. Thereby it could bring insights, trends and patterns to the table and furthermore help explain the dynamics of financial bankruptcies.[9]

A credit strength test that is commonly used to determine the risk of bankruptcy is the Altman Z-Score. It is used to evaluate a publicly traded manufacturing company's likelihood of bankruptcy and uses data from the company's annual 10-K report[10], known as a financial performance report.[11] The Ohlson Score is a multi-factor financial formula that uses key figures from the financial statement to predict the probability of bankruptcy. The score includes nine parameters measuring the firm's default risk and is an accounting-based model, just like Altman's.[12] The Zmijewski Score predicts the probability of bankruptcy within two years and also includes figures from the financial statements.[13] Shumway introduced a prediction model including market data in 2001, since he suggested that about half the variables included in Altman's and Ohlson's models were not statistically significant. The model determined the likelihood of bankruptcy by including all available information about the firm at a specific point in time and adding three market-driven variables to the analysis.[14] In March 2004, Hillegeist et al. introduced a fifth model for predicting the probability of bankruptcy, called BSM-Prob, which also included the Black-Scholes-Merton option pricing model in the analysis.[15]

There has been a lot of analysis and research on different models, and another study was published in April 2010 at the University of Queensland, Australia. In this analysis the authors compared five different models for predicting bankruptcies, originating from Altman, Ohlson, Zmijewski, Shumway and Hillegeist et al. The conclusion of the analysis was that firm-specific factors such as earnings before tax, working capital and total liabilities had a large impact on the probability of bankruptcy. It also showed that the firm-specific market variables stock return and volatility affected the number. Lastly, the research pointed out that the Altman model performed poorly compared to the other models, which include not only firm-specific ratios but also market data. The final conclusion suggested that models which include key figures from financial statements, market data and characteristics of the firm best capture corporate financial distress.[16]


2 Impact

If one could identify trends and correlations between the number of bankruptcies and macroeconomic factors, this could potentially give insights into which economic environments small and medium-sized companies are more likely to go bankrupt in. Furthermore, if the credit risk could be more precisely estimated it could potentially increase the amount of capital creditors are willing to lend to small and medium-sized companies. In the long term this would support employment growth and thereby also economic growth.

The aim of this thesis is to determine whether the chosen independent variables can explain the number of bankruptcies. With the findings we hope to bring insightful and explanatory ideas of when and to what extent it is profitable for creditors to lend.

2.1 Scope

In this thesis we determine what impact a number of factors have on the risk of bankruptcy among small and medium-sized unlisted companies. We have defined small and medium-sized as companies with 0-50 employees. The aim is to identify the structural interpretations of bankruptcy. The dependent (response) variable is the number of bankruptcies. The independent (explanatory) variables are based on macroeconomic data, data from the Swedish equity market index OMX30 and also time indicators such as annual trends.

In this analysis we focus on economic bankruptcy, which is defined as follows: "a company is classified as bankrupt when it is definitely unable to face its long term obligations".[8]

2.2 Delimitation

In the analysis we have chosen not to include ratios and figures from the companies' financial statements. Furthermore, microeconomic factors will not be included in the analysis.

Regarding the data, we have limited the analysis to outcomes of the independent variables between January 2009 and December 2019. The data included in this analysis are mostly changes given in percentage.

The analysis has also been restricted to cover the Swedish market which implies that we have only included data covering Swedish companies.


3 Economic theory

In this section the economic theory of the thesis is presented. Here the focus is on the macroeconomic factors, stock exchange and the structure of enterprises. The categorical variable is presented in the section 5 Methodology. This theory will be the foundation for further evaluation of the models acquired.

3.1 Response variable

In this thesis the response variable is the number of bankruptcies among Swedish small and medium-sized companies. Economic bankruptcy occurs when a company is unable to meet its long-term financial obligations.[8] In an article by Stiglitz it is defined in the following way:

” In a two-period model in which the firm invests in the first period and makes its return in the second period (and thereupon dissolves), bankruptcy is easy to define: the income of the firm is less than the fixed obligations to bondholder.”[21, p.459]

Clearly, if the income of the firm is less than the fixed obligations, the firm can borrow more to escape that problem, but only up to a certain level. For a firm there will always exist a risk of bankruptcy at a certain debt-equity ratio. Because a firm's value is affected by the debt-equity ratio, the goal will not always be to keep it as low as possible, but instead to find an optimal level of the debt-equity ratio.[21, p.458]

In an article by Norges Bank, some macroeconomic variables were evaluated to see if they are correlated with the number of bankruptcies. Unemployment is considered a good measure of developments in domestic demand and, according to their model, the number of bankruptcies will increase if unemployment rises.[6] Inflation could also influence the number of bankruptcies: if an enterprise has a loan with a variable interest rate and its debt is not price-indexed, it could experience lower earnings when inflation rises and the increase in interest expenses is greater than the increase in earnings. In conclusion, higher inflation could increase the probability of bankruptcy if the debt of the enterprise is not price-indexed and there is no longer an option for the company to borrow after credit rationing.[22] Regarding the number of active enterprises, which is also a variable in this analysis, the article by Norges Bank states that an increase in the number of enterprises entails an increase in the number of bankruptcies. It also states that new enterprises normally have a greater probability of bankruptcy than older enterprises.[6]


3.2 Regressor variables

The macroeconomic regressors are CPI, retail sales, unemployment, import price index, export price index and producer price index. The stock exchange regressor variable is OMX30. Section 5 Methodology presents the sources from which these variables and their data were obtained and how they were modified to fit the dataset.

3.2.1 CPI

The Consumer Price Index (CPI) is a measure that examines the weighted average of prices of a basket of consumer goods and services compared to the same month the year before. Generally it indicates the average price of household consumption. The consumer price index is the standard measure for inflation calculations in Sweden and an economic indicator.[23] The percentage change of the CPI is the inflation. The Riksbank, the central bank of Sweden, ensures that money retains its value over time and that payments can be made safely and efficiently. Its goal is that inflation in Sweden is kept at a low and stable rate, amounting to 2 percent per year.[24] The central bank changes the interest rate in response to the output in the economy to keep inflation stable; this is known as the Taylor rule.[25, p.192]

3.2.2 Retail sales

Retail sales are the purchases of finished goods and services by consumers and businesses at the end of the supply chain over a certain time period. Retail sales can be an adequate indicator of the current state of the economy and its projected path towards expansion or contraction. It is therefore also a good macroeconomic indicator, as it accounts for two-thirds of GDP.[26]

3.2.3 Unemployment

The unemployed are the part of the working population that is unable to find work. Unemployment is often used as a measure of the health of the economy. According to Okun's law, a higher rate of unemployment leads to a loss in output.[25, p.145] The Beveridge curve shows the correlation between vacancies and unemployment: with higher unemployment there are fewer vacancies and it is more difficult to find a job. This has multiple effects on the economy; with fewer people in work, GDP will decrease.[25, p.155] The Phillips curve describes the relationship between unemployment and inflation: with a low rate of unemployment, inflation is high, and when the rate of unemployment is high, inflation is low.[25, p.121]


3.2.4 Import and Export price index

The export price index and the import price index are two price indexes that measure the prices of goods and services in international trade, where the export price index covers goods and services that leave Sweden and the import price index covers goods and services that enter Sweden.[27]

3.2.5 Producer price index

The producer price index (PPI) is a price index that measures the average change in the prices that domestic producers receive for their output.[28] Like the CPI, the PPI is a measure of the cost of a given basket of goods; the difference is that the PPI also includes, for example, raw materials and semifinished goods. Another difference is that the producer price index measures prices at an earlier stage of the distribution system, in contrast to the consumer price index, which measures prices at the end of the distribution system, i.e. what households actually spend at the retail level. Therefore the PPI can signal changes in the general price level before the CPI.[25, p.44]

3.2.6 OMX30

OMX Stockholm 30 (OMX30) is a stock market index for the Stockholm Stock Exchange. It is an index of the thirty most frequently traded stocks on the Stockholm Stock Exchange. OMX30 measures the stock price performance with an initial index value of 125 in 1986. The composition of companies in the index is revised every six months.[29]

3.2.7 Number of enterprises

Total number of enterprises, number of enterprises started and number of enterprises liquidated are three variables within this subject. These three variables and their statistics measure the number of companies active on the Swedish market, covering all events for companies except for the ones facing bankruptcy. The company types included in this thesis are the following: Limited company (Aktiebolag), Trading partnership (Handelsbolag), Sole trader (Enskild näringsidkare) and Limited partnership (Kommanditbolag).[30]


4 Mathematical theory

The main framework used in this thesis is the theoretical framework of multiple regression analysis. The following sections cover the major parts of the modelling. The source for the mathematical theory is Introduction to Linear Regression Analysis[1], unless otherwise clearly stated.

4.1 Definition and terminology

A multiple regression analysis includes one dependent variable and multiple independent variables. A model that describes the relationship between the dependent variable and the independent variables x_j, j = 1, 2, \dots, k, is

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon

where k is the number of independent variables. y is a linear function of the unknown parameters \beta_0, \beta_1, \dots, \beta_k. The \beta_j represent the expected change in the response y per unit change in x_j while all other independent variables, i.e. x_i with i \neq j, are held constant. The random variable \varepsilon is the error term. The model describes a hyperplane in the k-dimensional space. In general, any regression that is linear in the parameters, i.e. the \beta_j, is a linear model regardless of the shape of the surface that it generates in the k-dimensional space. In most real-world analyses, the values of the regression coefficients and the error variance are not known and must be estimated from the collected sample data. It is assumed that the error terms are uncorrelated and that E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2. It is also assumed in this thesis that all independent variables are fixed, i.e. they are non-random variables which have been measured without error. Regarding the model of y, it is assumed that the conditional distribution is normal with \mu(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k and Var(y) = \sigma^2. p is the number of parameters in the model, i.e. the k regression coefficients plus the intercept, so p = k + 1.

To the model above the corresponding sample regression model is

y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \quad i = 1, 2, \dots, n

Commonly the Ordinary Least Squares Estimation (OLS) is used to estimate the coeffi- cients. The least squares function is

S(\beta_0, \beta_1, \dots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2


To minimize the function S, the coefficient estimates \hat{\beta}_j must satisfy the following equations obtained from the derivatives:

\frac{\partial S}{\partial \beta_0} = -2 \sum_{i=1}^{n} \Big( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \Big) = 0

\frac{\partial S}{\partial \beta_j} = -2 \sum_{i=1}^{n} \Big( y_i - \hat{\beta}_0 - \sum_{j=1}^{k} \hat{\beta}_j x_{ij} \Big) x_{ij} = 0, \quad j = 1, 2, \dots, k

By simplifying the equations above, the least-squares normal equations are obtained, whose solution gives the least-squares estimators. To simplify the treatment of multiple regression models it is convenient to express the equations in matrix notation:

y = X\beta + \varepsilon

Since we want to find the coefficients that minimize the sum of squares between the dependent values y_i and the fitted values \hat{y}_i, the least-squares function can be written in matrix notation as

S(\beta) = (y - X\beta)'(y - X\beta)

Minimizing the function via its derivative, also known as the least-squares normal equation, gives in matrix notation

\hat{\beta} = (X'X)^{-1}X'y

The above expression is often used to determine the least-squares estimator since it is the best linear unbiased estimator for the given problem, also known as BLUE. By best is meant the estimator that has the smallest variance among all estimators that are linear combinations of the data. This is established by the Gauss-Markov theorem for the OLS estimator.

In a regression analysis, the difference between the observed value and the fitted value is called the residual, e_i = y_i - \hat{y}_i.
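To make the estimation concrete, the sketch below computes the least-squares estimate with the matrix formula above and compares it with R's built-in lm(); the data are simulated here and are not the thesis data set.

```r
# Minimal sketch: OLS via the matrix formula beta_hat = (X'X)^{-1} X'y,
# compared with R's built-in lm(). Simulated data, not the thesis data set.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.8 * x2 + rnorm(n, sd = 0.5)

X <- cbind(1, x1, x2)                       # design matrix with intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

fit <- lm(y ~ x1 + x2)                      # same estimates via lm()
cbind(beta_hat, coef(fit))

e <- y - X %*% beta_hat                     # residuals e_i = y_i - y_hat_i
all.equal(as.numeric(e), unname(resid(fit)))
```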

4.1.1 Basic assumptions

There are a number of assumptions that are made when studying a regression analysis:

• The relationship between the dependent variable y and the independent variables is linear, at least approximately

• The error term has zero mean

• The error term has constant variance


• The error terms are uncorrelated

• The error terms are normally distributed

If the assumptions are not met it could potentially cause serious problems for the modelling, and it is therefore of major importance to consider the assumptions and determine the adequacy of a model. We cannot use standard summary statistics, such as t-tests and R-squared, to detect underlying departures. To check the model adequacy and diagnose the basic regression assumptions, we use methods primarily based on studying the model's residuals.

4.1.2 Ordinary least squares estimation

The ordinary least squares estimator is unbiased, E[\hat{\beta}] = \beta, with Var(\hat{\beta}) = \sigma^2 (X'X)^{-1}. An unbiased estimator of the error variance is \hat{\sigma}^2 = MS_{RES} = SS_{RES}/(n - p), where SS_{RES} = y'y - \hat{\beta}'X'y.

4.1.3 Multicollinearity

When using the least squares estimator to determine the coefficients of the regression model, it is of major importance that there are no near-linear dependencies among the variables. If there is no linear relationship between the variables they are said to be orthogonal; however, that is not the case in most applications of regression analysis. Multicollinearity is said to exist if there are near-linear dependencies among the variables. Most data sets suffer from multicollinearity to some extent and it has a number of potential effects on the least-squares estimation of the coefficients. Strong multicollinearity among variables increases the variances and covariances of the least-squares estimators, which diminishes the usefulness of the model's estimates. Before pursuing a multiple regression analysis one therefore needs to examine the presence of multicollinearity. Several methods for this examination are presented in section 4.2 Model evaluation and diagnostics.

There are primarily four different sources of multicollinearity:

• The data collection method employed

• Constraints on the model or in the population

• Model specification

• An over-defined model

Three main recommendations to dealing with multicollinearity:

• Redefine the model in terms of a smaller set of independent variables


• Perform preliminary studies using only subsets of the original set of independent variables

• Use principal-component-type regression methods to decide which variables to be removed

4.1.4 Standardized regression coefficients

In a multiple regression model it is common that the independent variables are in different units, and it is therefore difficult to directly compare the estimated coefficients with each other. Generally, the coefficient \beta_j is in units of y per unit of x_j, and to make the coefficients easier to compare it is sometimes helpful to convert the variables to scaled variables. The estimated coefficients will then be dimensionless regression coefficients. There are mainly two different methods to produce scaled variables: unit normal scaling and unit length scaling. Both methods produce the same dimensionless regression coefficients \hat{b}_j, whose relationship with the original estimates \hat{\beta}_j is

\hat{\beta}_j = \hat{b}_j \left( \frac{SS_T}{S_{jj}} \right)^{1/2}, \quad j = 1, 2, \dots, k

\hat{\beta}_0 = \bar{y} - \sum_{j=1}^{k} \hat{\beta}_j \bar{x}_j

4.1.5 Categorical variables

In a regression analysis the most commonly used variables are quantitative variables, variables that can be measured on a well-defined scale. However, sometimes it can be necessary to use qualitative variables in a regression. A qualitative variable, also called a categorical variable, cannot be measured on such a scale; it can be seen as assigning the observations to different categories.

Categorical variables cannot enter the regression analysis as they appear in the observed data set, but must be modified. They have to be transformed into a series of indicator variables to be able to enter the regression model.
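As an illustration of how a categorical variable enters a regression, the sketch below shows how R expands a hypothetical month factor into indicator variables; the response is simulated and is not the thesis data.

```r
# Minimal sketch: a hypothetical 'month' factor expanded into indicator
# variables; lm() does the expansion internally. Simulated response.
month <- factor(rep(1:12, length.out = 36), levels = 1:12)
y     <- rnorm(36)

head(model.matrix(~ month))        # level 1 (January) is the reference category
fit <- lm(y ~ month)
coef(fit)                          # month2..month12 are compared to month 1
```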

4.1.6 Strategy for model building

In order to be able to answer the research question we will need a strategy of how to choose variables and build the model. This is an iterative process described as follows:

• Fit the full model

• Perform residual analysis


• Determine if transformation is needed

• Perform variable selection

• Select a number of models that are suggested to be best

• Check adequacy for these models

• Make conclusions and recommendations

4.2 Model evaluation and diagnostics

4.2.1 Residual analysis

The definition of residual is

e_i = y_i - \hat{y}_i, \quad i = 1, 2, \dots, n

where y_i is the observed value of the dependent variable and \hat{y}_i is the fitted value. The residual is therefore the deviation between the observation and the fitted value, but it is also a measure of the variability in the dependent variable not explained by the regression model. One could also think of the residuals as the model's errors. Using the residuals, and plots of the residuals, is an effective way of analysing the adequacy of the fit of the regression model and of checking the underlying assumptions.

The residuals have zero mean, and their approximate average variance is estimated by

\frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n - p} = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_{RES}}{n - p} = MS_{RES}

Residuals are not independent, but this has minor effect on the model adequacy as long as n is not small relative to the number of parameters k. The degree of freedom associated with the n residuals is n − p.

It is helpful to use the residuals to find outliers and extreme values among the observations, separated from the rest of the data points. Standardized residuals are a logical scaling of the residuals by the average variance MS_{RES}; a large standardized residual potentially indicates an outlier:

d_i = \frac{e_i}{\sqrt{MS_{RES}}}, \quad i = 1, 2, \dots, n

By plotting the standardized residuals, with zero mean and approximately unit variance, against the fitted values, it is desired that the spread of the residuals is equal along the line. This implies that the model fulfils the basic assumption of constant variance. At some points it can also be desirable to plot the residuals against the independent variables separately. These plots often exhibit patterns that indicate how the independent variable behaves. Furthermore, if the time sequence of the data collection is known, plotting the residuals against the time order can give insights into how the variance changes over time. Errors at one time period can also be correlated with another time period in the model; this is called autocorrelation and is potentially a serious violation of the basic regression assumptions described in section 4.1.1 Basic assumptions.
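A minimal sketch of these residual checks in R is shown below; the model and data are illustrative placeholders (R's built-in cars data), not the thesis data set.

```r
# Minimal sketch of the residual checks above, using R's built-in cars data
# as a stand-in for the thesis data set.
fit <- lm(dist ~ speed, data = cars)

e <- resid(fit)                    # raw residuals e_i = y_i - y_hat_i
d <- rstandard(fit)                # standardized residuals
r <- rstudent(fit)                 # studentized residuals

plot(fitted(fit), d,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)             # want an even spread around this line
which(abs(d) > 2)                  # candidate outliers / extreme values
```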

4.2.2 Examination of the correlation matrix

As mentioned above, it is of major importance to examine the presence of multicollinearity before conducting a multiple regression analysis. A very simple measure is inspection of the off-diagonal elements r_{ij} of the X'X matrix in correlation form; two variables are nearly linearly dependent if the absolute value |r_{ij}| is near unity. Standardized values are used for the examination: each variable has been centred by subtracting its mean and divided by the square root of its corrected sum of squares. Generally, using this method to detect multicollinearity is sufficient only for pairwise near-linear dependence; more complex, higher-order multicollinearity requires other methods.
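The sketch below illustrates this pairwise check with cor() and the corrplot package listed in section 5.1; the regressors used here are illustrative and are not the thesis variables.

```r
# Minimal sketch of a pairwise correlation check with corrplot (section 5.1).
# The regressors here come from R's built-in mtcars data, not the thesis data.
library(corrplot)

regressors <- mtcars[, c("mpg", "disp", "hp", "wt")]
M <- cor(regressors)               # off-diagonal r_ij with |r_ij| near 1
round(M, 2)                        #   indicate near-linear dependence
corrplot(M, method = "circle")     # blue = positive, red = negative correlation
```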

4.2.3 Variance inflation factors

Another method to examine the presence of multicollinearity is to consider the variance inflation factors. The VIF is determined from the unit-scaled matrix C = (X'X)^{-1} and is defined as

VIF_j = C_{jj} = (1 - R_j^2)^{-1}

The j:th diagonal element of C can be written as C_{jj} = (1 - R_j^2)^{-1}, where R_j^2 is the coefficient of determination obtained when x_j is regressed on the remaining regressors. x_j is nearly orthogonal to the other regressors when R_j^2 is small and C_{jj} is close to unity, and nearly linearly dependent on some subset of the remaining variables when R_j^2 is near unity and C_{jj} is large. The variance of the j:th coefficient is C_{jj}\sigma^2, and therefore we can view C_{jj} as the factor by which the variance is increased due to near-linear dependence among the variables.

The VIF measures, for each term in the model, the combined effect of the dependencies among the variables on the variance of that term. Large values indicate that multicollinearity exists in the model; in practice, values exceeding 5 or 10 indicate that the associated regression coefficients are poorly estimated due to multicollinearity.
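A minimal sketch of a VIF check with car::vif(), one of the packages listed in section 5.1, is shown below; the model and data are illustrative only.

```r
# Minimal sketch of a VIF check with car::vif() (section 5.1); model and data
# are illustrative only.
library(car)

fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
vif(fit)                           # values above roughly 5-10 suggest that the
                                   # coefficient is poorly estimated
```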

4.2.4 Methods for handling of multicollinearity

The most commonly used approach when it comes to multicollinearity is to reconsider the selection of independent variables. Not all independent variables are appropriate for the model, but eliminating variables will not guarantee elimination of multicollinearity. Variable selection methods help to justify the presence of highly related independent variables in the final model and will also contribute to more precise confidence intervals. The aim is to generate a subset of independent variables from which to compute a final model.

One of the methods is called all possible regressions and evaluates all regression combinations that can be generated from the set of independent variables. It starts with one independent variable, continues with two and so on until it has generated 2^k equations, where k is the number of independent variables in the set. It is assumed that \beta_0 is included in all equations. Thereafter, the equations are evaluated according to some suitable criterion, e.g. MSE, BIC or adjusted R^2, and the "best" regression model is chosen to continue with.
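The sketch below illustrates the all-possible-regressions idea with leaps::regsubsets(), one of the packages listed in section 5.1; the formula and data are illustrative placeholders.

```r
# Minimal sketch of best-subset selection with leaps::regsubsets()
# (section 5.1); formula and data are illustrative placeholders.
library(leaps)

all_fits <- regsubsets(mpg ~ disp + hp + drat + wt + qsec,
                       data = mtcars, nvmax = 5)
summ <- summary(all_fits)
summ$adjr2                         # adjusted R^2 for the best model of each size
summ$cp                            # Mallow's Cp for the best model of each size
summ$bic                           # BIC for the best model of each size
which.max(summ$adjr2)              # candidate "best" subset size by adjusted R^2
```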

4.2.5 R2

Two well-known measures commonly used to assess the overall adequacy of a model are R^2 and adjusted R^2. R^2 is defined as

R^2 = 1 - \frac{SS_{RES}}{SS_T}

while the adjusted R^2 is defined as

R^2_{adj} = 1 - \frac{SS_{RES}/(n - p)}{SS_T/(n - 1)}

and is by some model builders preferred to the regular one. This is because SS_T/(n - 1) is constant regardless of how many variables are included in the model, so the adjusted R^2 only increases if the residual mean square decreases when a variable is added.

Generally, R^2 never decreases when another variable is added to the multiple regression analysis, regardless of the contribution of the added variable. Because of this, it is difficult to judge whether an increase in R^2 reveals anything important about the analysis.

4.2.6 AIC and BIC

In modelling, information will always be lost. With the help of these criteria, it is examined to what extent the model loses information due to the statistical modelling. Akaike proposed an information criterion, AIC, in order to maximize the expected entropy, i.e. the measure of the expected information, of a model. It is a penalized log-likelihood measure defined as

AIC = −2ln(L) + 2p


and the lowest value of AIC is desired.

The BIC is an extension of AIC and places a greater penalty on adding another variable as the sample size increases. The criterion is as follows

BIC = −2ln(L) + pln(n)

Both AIC and BIC are gaining popularity and are commonly used in more complicated and complex modelling than ordinary least squares.
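As an illustration, the sketch below compares two candidate models with these criteria in R; the models and data are placeholders, not the thesis models.

```r
# Minimal sketch comparing two candidate models with R^2, adjusted R^2,
# AIC and BIC (illustrative data, not the thesis data set).
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

c(summary(fit_small)$r.squared,     summary(fit_large)$r.squared)
c(summary(fit_small)$adj.r.squared, summary(fit_large)$adj.r.squared)
AIC(fit_small, fit_large)          # lower AIC is preferred
BIC(fit_small, fit_large)          # lower BIC is preferred
```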

4.2.7 Mallow’s Cp

The Mallow’s Cp is a criterion related to the mean square error of the fitted values, esti- mated by the following equation

Cp = SSRE S(p) ˆ

σ2 − n + 2p

If a model has no bias, the Cpis equal to p. If the model has low error, the Cp is represented by small values. Generally, small values of the Cpare desirable however it may be preferable to accept some bias in model as it will reduce the average error of a prediction.

4.2.8 Cook’s distance

Cook’s distance measures the effect of deleting a given observation where it takes both the x space and the response variable into account. The model calculate the square distance between the sum of the least-square estimate B and the same estimate with the last point (Bi) removed. Cook’s distance is denoted (Di) and calculates as following:

D(i)= Pn

j=1( ˆyj− ˆyj (i))2 ps2

We can rewrite the expression as following to express it in terms of leverage:

D(i)= ei2

ps2 hii (1 − hii)2

When we rewrite the expression of Cook’s distance we can see that the diagonal element in the hat matrix is a component in the measurement. The hat matrix (or the projection matrix) is calculated as following:

H = X(XTX)−1XT

The i:th diagonal element in the hat matrix given by hii is the leverage of the i:th obser- vation. Where leverage is the standardized measure of the distance of the i:th observation from the center of the x-space.


4.2.9 Covariance Ratio

We can also measure the influence of the observations with the covariance ratio, also labelled CovRatio. With the covariance ratio we measure the impact of each observation on the variances and standard errors of the regression coefficients, and on their covariances. Overall, the covariance ratio measures the effect of the i:th observation on the precision of the estimated model. The covariance ratio is calculated as follows:

COVRATIO_{(i)} = \left( \frac{S_{(i)}^2}{MS_{RES}} \right)^{p} \cdot \frac{1}{1 - h_{ii}}
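A minimal sketch of these influence diagnostics in R is shown below, using the base functions cooks.distance(), hatvalues() and covratio(); the model and data are illustrative only.

```r
# Minimal sketch of the influence diagnostics above: Cook's distance, leverage
# (hat values) and covariance ratios. Illustrative model and data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

D  <- cooks.distance(fit)          # flag points with D_i near or above 1
h  <- hatvalues(fit)               # leverages h_ii, the hat matrix diagonal
cr <- covratio(fit)                # effect of each point on estimation precision

p <- length(coef(fit))
n <- nrow(mtcars)
which(D > 1)
which(cr < 1 - 3 * p / n | cr > 1 + 3 * p / n)   # cut-offs used in section 5.4.2
```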

4.2.10 The Box-Cox method

The Box-Cox method can be used to evaluate whether y should be transformed to correct non-normality and/or non-constant variance. The transformation used is the power transformation of y to a certain power \lambda. The Box-Cox method uses maximum likelihood, maximizing the following function:

L(\lambda) = -\frac{n}{2} \ln \big( SS_{RES}(\lambda) \big)

We construct a 100(1 - \alpha) percent confidence interval for \lambda; this is the set of \lambda values that satisfy

L(\hat{\lambda}) - L(\lambda) \leq \frac{1}{2} \chi^2_{\alpha,1}

In the plot we construct the confidence interval together with a horizontal line given by

y = L(\hat{\lambda}) - \frac{1}{2} \chi^2_{\alpha,1}

4.2.11 Cross validation

Cross validation is a statistical method used to estimate the precision of a model. Usually a single parameter k is included, where k refers to the number of folds. The method is generally used on a known set of samples in order to examine how well the model would perform on new, unseen data. The sample set is split randomly into k groups, which take turns as training and test sets: each group is used as the test set once and as part of the training set k - 1 times.

The test set is set aside while the remaining sample, namely the training set, is fitted with the least squares method. The MSE is obtained for each of the iterations, i.e. MSE_1, MSE_2, ..., MSE_k. The cross validation estimate is then

CV_{(k)} = \frac{1}{k} \sum_{i=1}^{k} MSE_i

The number of folds k must be chosen carefully. One option is to choose k so that the sample groups are large enough to fulfil the statistical requirements. A second tactic is to use a fixed value such as k = 10, which has been shown to give modest variance and low bias. A third alternative is to set k = n, where n is the size of the sample set; this approach is called leave-one-out cross validation.

Usually one chooses k = 5 or k = 10, but there is no formal rule. As the number of folds increases, the difference in size between the training and test sets gets smaller, which implies that the bias of the method decreases. The trade-off when choosing k is thus between bias and variance: the larger k, the less bias and the more variance.[17]
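A minimal sketch of k-fold cross validation with the caret package listed in section 5.1 is shown below; the model formula and data are illustrative placeholders.

```r
# Minimal sketch of 10-fold cross validation with caret (section 5.1);
# the formula and data are placeholders, not the thesis data set.
library(caret)

set.seed(1)
ctrl   <- trainControl(method = "cv", number = 10)   # k = 10 folds
cv_fit <- train(mpg ~ wt + hp + qsec, data = mtcars,
                method = "lm", trControl = ctrl)
cv_fit$results                     # RMSE etc. averaged over the k folds
```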

4.3 Statistical inference

4.3.1 Confidence interval

Considering the confidence intervals one at a time for the multiple regression model is important and tells us whether, and to what extent (decided by \alpha), repeated samples will contain the true values of our estimators. For the analysis we still assume that the error terms are normally and independently distributed, i.e. \varepsilon \sim NID(0, \sigma^2). Consequently, the observations y_i are also normally and independently distributed, i.e. y_i \sim NID(\beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}, \sigma^2). From the least-squares estimation we can consider the estimators normally distributed, i.e. \hat{\beta}_j \sim N(\beta_j, \sigma^2 C_{jj}). Each of the statistics

\frac{\hat{\beta}_j - \beta_j}{\sqrt{\hat{\sigma}^2 C_{jj}}}, \quad j = 0, 1, \dots, k

is distributed as t with n - p degrees of freedom. With this, a 100(1 - \alpha) percent confidence interval can be defined as

\hat{\beta}_j - t_{\alpha/2,\, n-p} \sqrt{\hat{\sigma}^2 C_{jj}} \leq \beta_j \leq \hat{\beta}_j + t_{\alpha/2,\, n-p} \sqrt{\hat{\sigma}^2 C_{jj}}

where the quantity se(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 C_{jj}} is usually called the standard error.

4.3.2 Test for significance of regression

This test is an analysis of variance applied to the regression model to determine whether there is a linear relationship between the dependent and independent variables; it is more generally thought of as an overall test of model adequacy. For the test, the hypotheses are

H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0
H_1: \beta_j \neq 0 \text{ for at least one } j

and if the null hypothesis is rejected it implies that at least one of the independent variables contributes significantly to the model. The total sum of squares is divided into two separate parts, the sum of squares due to regression and the residual sum of squares: SS_T = SS_{REG} + SS_{RES}.

4.3.3 F-test statistic

The F-test statistic is defined as

F_0 = \frac{SS_{REG}/k}{SS_{RES}/(n - k - 1)} = \frac{MS_{REG}}{MS_{RES}}

and the null hypothesis is rejected if

F_0 > F_{\alpha,\, k,\, n-k-1}

4.3.4 t-test statistic

The t-test statistic is defined as

t_0 = \frac{\hat{\beta}_j}{\sqrt{\hat{\sigma}^2 C_{jj}}}

and the null hypothesis is rejected if

|t_0| > t_{\alpha/2,\, n-k-1}
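As an illustration, the sketch below shows where these inference quantities appear in R output for a fitted model; the model and data are placeholders, not the thesis model.

```r
# Minimal sketch of the inference quantities above for a fitted model:
# coefficient t-statistics, the overall F-test and confidence intervals.
# Illustrative model and data, not the thesis model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

summary(fit)                       # t values, Pr(>|t|) and the F-statistic
confint(fit, level = 0.95)         # beta_hat_j +/- t_{alpha/2, n-p} * se(beta_hat_j)
```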


5 Methodology

In this thesis we are examining the research question

• what impact do macroeconomic factors, the performance of Swedish equities and time indicators have on the number of bankruptcies among Swedish small and medium-sized companies?

The main methodology that will be used is multiple regression analysis. Its framework was comprehensively described in section 4 Mathematical theory. To be able to answer the question we will consider different models of the independent variables in order to find the model that gives the most precise result. BLUE, i.e. the best linear unbiased estimates, are obtained when using the ordinary least squares method to estimate the coefficients of the linear regression model. By best is meant the most precise estimates, measured by the smallest variance. The significance of each model and its contributing variables will be evaluated throughout the analysis.

5.1 Code environment

To be able to execute all the methods described in section 4 Mathematical theory we have used the language R. The code has been written in RStudio, an integrated development environment for statistical computing and graphics. We have created functions for the different methods, compiled distinct plots and figures, made significance tests and much more.

In R, add-on packages enable the user to import ready-made functions that can be used in the analysis. For this thesis the following packages were installed (a minimal setup sketch is shown after the list):

• MASS

• car

• moments

• leaps

• readxl

• caret

• corrplot

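A minimal setup sketch, assuming the data are stored in an Excel file, is shown below; the file name and sheet are hypothetical placeholders, not the actual thesis files.

```r
# Minimal setup sketch: loading the listed packages and importing an Excel
# file with readxl. The file name and sheet are hypothetical placeholders.
library(MASS)
library(car)
library(moments)
library(leaps)
library(readxl)
library(caret)
library(corrplot)

raw_data <- read_excel("bankruptcy_data.xlsx", sheet = 1)   # hypothetical file
str(raw_data)                      # check that the import is correct and complete
```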


5.2 Data

5.2.1 Data variables

In the model we have 11 regressors, the independent variables, and one dependent variable. One of the regressors, months, is a categorical variable with 12 categories, the months of the year. The other independent variables are CPI, retail sales, unemployment, OMX30, export price index, import price index, producer price index, number of enterprises, number of enterprises started and number of enterprises liquidated. The dependent variable that we fit these regressors to is the number of bankruptcies. The data cover every month from January 2009 to December 2019, which makes 132 observations.[31]

The first independent variable is time, measured in months. This is a categorical variable because we want to see trends between months over the years. The next independent variable is the Consumer Price Index (CPI). This data is downloaded from the Swedish database for official statistics, Statistiska centralbyrån (SCB).[32] SCB is a state agency that coordinates all official statistics in Sweden, with the purpose of providing users and customers with statistics for decision making, debate and research.[33] Another independent variable in the regression is retail sales. This data is also collected from SCB, where the observation for each month is the percentage change in retail sales compared to the same month the year before.[34]

The dataset on unemployment is collected from SCB, where each observation is the share of the working population that is unemployed, compared to the same month the year before.[35] The dataset for OMX30 is downloaded from the Nasdaq Stock Market[36], which is the world's largest stock exchange company.[37] Each observation is the average change over the month compared to the previous period. Since the dataset originally contains daily observations, it has been transformed; this is also done to reduce the risk that extreme events have an excessive impact on the data. The export price index, import price index and producer price index are all collected from the same SCB dataset.[38] Each observation for these variables is, as for many of the other variables, the percentage change from the corresponding month the previous year.

Lastly, there are three variables that describe how many companies are active on the Swedish market: number of enterprises, number of enterprises started and number of enterprises liquidated. This data is collected from Bolagsverket,[39] a Swedish state authority whose main purpose is to register companies and make company information available.[40] Each observation for number of enterprises is the number of companies on the market that month. For number of enterprises started and liquidated, it is the number of enterprises started or liquidated that month that is observed in the dataset.


In the original model there are 132 observations in the dataset. When observations identified as outliers are removed from the dataset to improve the model, 129 observations remain. The dataset will also be shifted in time, and there will consequently be fewer observations when proceeding with different time intervals; the datasets will then consist of 129, 126 and 120 observations respectively, depending on how large a time shift is tested.

5.2.2 Modification of data

Before executing the regression analysis on the data, one has to make sure that the correct and complete data has been imported.

As mentioned in section 4.1.5 Categorical variables, most variables used in a regression analysis are quantitative variables. This means that they are well defined on a scale and can thereby be measured. For this analysis, we have chosen to also look at a categorical variable, namely annual trends, for which we use the categorical variable months, i.e. the year's twelve months.

The categorical variable was initially imported as a factor from the data file with twelve different levels, 1 to 12, where 1 represents January and 12 December. Instead of transforming the factor to a numerical variable, it was kept as a factor for the whole analysis. This was in order to follow the monthly movements and furthermore try to identify whether there were months in which the number of bankruptcies strictly (or slightly) deviated.

Regarding the quantitative data, it is worth looking into the modifications made to the numbers collected for the performance of the Swedish equity market index OMX30. When collected from Nasdaq, the data measured the daily performance, expressed as the closing price CP_i. However, since we are interested in monthly performance in terms of changes, it had to be transformed. The daily numbers were aggregated to monthly numbers by calculating the average closing price during a month,

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} CP_i

where n is the number of trading days in the month. When the monthly average had been estimated, the change from month t to t + 1 was calculated as

\Delta(x) = \frac{x_{t+1}}{x_t} - 1

The aggregated numbers represent the changes from the month before. The main reason why we decided to calculate the average of the monthly closing prices, instead of calculating the daily changes, before estimating the monthly changes was to decrease the risk of including the effects of trading days with large market movements. For example, if there has been a trading day on which one of the largest companies has presented a disappointing quarterly report and the stock has lost ground, this would otherwise have affected the monthly change to a greater extent.
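A minimal sketch of this aggregation in R is shown below; the daily price series is simulated here, and the column names are hypothetical, not the actual Nasdaq export.

```r
# Minimal sketch of the OMX30 transformation described above: average the
# daily closing prices per month, then compute month-over-month changes.
# `omx_daily` is a simulated stand-in with hypothetical column names.
omx_daily <- data.frame(
  date  = seq(as.Date("2019-01-01"), as.Date("2019-06-30"), by = "day"),
  close = 1500 + cumsum(rnorm(181, mean = 0.2))
)

month_key    <- format(omx_daily$date, "%Y-%m")
monthly_mean <- tapply(omx_daily$close, month_key, mean)    # x_bar per month

monthly_change <- monthly_mean[-1] / monthly_mean[-length(monthly_mean)] - 1
round(monthly_change, 4)           # Delta(x) = x_{t+1} / x_t - 1
```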

5.3 Model building

This analysis starts with an evaluation of the full model, i.e. all 11 independent variables presented in Table 1, to be able to fully analyse whether there are significant variables to be used in different combinations to further evaluate the precision of the regression analysis. Firstly, all basic assumptions need to be checked.

Months
CPI
Retail sales
Unemployment
OMX30
Export price index
Import price index
Producer price index
Number of enterprises
Number of enterprises started
Number of enterprises liquidated

Table 1: Independent variables in the full model.

In our data set we have collected monthly numbers from January 2009 until December 2019, which altogether makes 132 observations. This can be considered a large sample set where each observation is independent and collected in the same manner. One can also conclude that the observations are drawn from the same underlying population. Collectively this is known as the observations being independent and identically distributed, and the error terms are assumed to be NID(0, \sigma^2). With that said, it is assumed that both the sample and the population from which it was drawn are normally distributed due to the Central Limit Theorem[18].

A linear relation between the dependent and independent variables can be assumed based on the plot of residuals versus fitted values. The plot shows that the fitted line is horizontal and surrounded by a range of residual data points. This implies that the basic assumption of a linear relationship between the response and the regressors is fulfilled.


Figure 1: Residual of the initial model vs the fitted values obtained with ordinary least squares estimation.

We can also conclude that the initial model fulfils the basic assumptions of normally and independently distributed error terms. The normal quantile and scale-location plots show 1) that the error terms are normally distributed and 2) an almost horizontal line with equally spread residuals along it, which implies that the residuals have constant variance.

Figure 2: Plots for the initial model.

Since all basic assumptions have been shown to be fulfilled, we can now continue with the analysis, evaluate the model and thereafter improve the model's precision.

5.4 Model evaluation

In this part we will go through the different steps of the analysis described in section 4 Mathematical theory, evaluate the full model and decide which models should be evaluated further. Firstly, the model was fitted with the ordinary least squares estimation; below is a summary table of the results:


Estimate Std. Error t value Pr(>|t|)

Intercept 2.847e+02 1.736e+02 1.640 0.10383

cpi.m -7.997e+00 6.817e+00 -1.173 0.24328

retail.sales.m -7.495e+00 2.882e+00 -2.601 0.01057 *

M.m2 1.395e+01 2.465e+01 0.566 0.57260

M.m3 7.454e+01 2.650e+01 2.813 0.00582 **

M.m4 7.905e+01 2.886e+01 2.739 0.00719 **

M.m5 1.303e+02 2.724e+01 4.781 5.43e-06 ***

M.m6 9.234e+01 3.018e+01 3.059 0.00279 **

M.m7 -5.049e+01 3.834e+01 -1.317 0.19060

M.m8 -9.604e+01 3.289e+01 -2.920 0.00425 **

M.m9 -2.777e+01 2.995e+01 -0.927 0.35585

M.m10 9.160e+01 2.866e+01 3.195 0.00182 **

M.m11 1.172e+02 2.725e+01 4.299 3.72e-05 ***

M.m12 7.859e+01 2.869e+01 2.740 0.00718 **

unemployment.m 1.562e+01 1.264e+01 1.236 0.21924

OMX30.m 2.446e+02 1.339e+02 1.827 0.07048 .

export.m 1.393e-01 4.327e+00 0.032 0.97438

import.m 1.632e+00 3.274e+00 0.498 0.61916

ppi.m -7.339e-01 6.828e+00 -0.107 0.91460

total.enterprises.m 3.057e-04 1.355e-04 2.256 0.02605 *

startup.enterprises.m 4.031e-04 1.006e-02 0.040 0.96812

liquidated.enterprises.m -3.441e-02 1.230e-02 -2.798 0.00607 **

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' '

Residual standard error: 49.59 on 110 degrees of freedom
Multiple R-squared: 0.6957, Adjusted R-squared: 0.6376
F-statistic: 11.98 on 21 and 110 DF, p-value: < 2.2e-16

Table 2: Summary of the OLS estimations on the initial model.

What is also worth paying attention to is the ANOVA table, presenting important results about the variance of the variables:


Df Sum sq Mean sq F value Pr(> F )

cpi.m 1 3595 3595 1.4620 0.229204

retail.sales.m 1 8998 8998 3.6592 0.058359 .

M.m 11 554060 50369 20.4847 < 2.2e-16 ***

unemployment.m 1 1 1 0.0002 0.988526

OMX30.m 1 10871 10871 4.4210 0.037780 *

export.m 1 6276 6276 2.5524 0.112994

import.m 1 3000 3000 1.2202 0.271729

ppi.m 1 37 37 0.0149 0.903002

total.enterprises.m 1 11759 11759 4.7823 0.030871 *

startup.enterprises.m 1 564 564 0.2294 0.632899

liquidated.enterprises.m 1 19251 19251 7.8293 0.006071 **

Residuals 110 270475 2459

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' '

Table 3: ANOVA table of the OLS estimations on the initial model.

From these two methods applied on the full model one can draw the following conclu- sions:

• The criterion adjusted R^2 suggests that 63.8% of the variation in the response variable is explained by the regressors

• There are a number of variables that fulfill the required significance level, i.e. 95% significance

• The different months are compared to the numbers of month 1, i.e. January. From the summary one can tell that there are seven months that distinguish themselves from the rest and are statistically significant (95% significance level)

• A number of variables’ null hypothesis should be rejected 5.4.1 Residual analysis

The residual analysis is important since it represents a major part of the process of finding extreme values and outliers that could potentially harm the model. What is worth mentioning, however, is that before removing any observations one needs to evaluate their influence on and contribution to the model. In the plots below we have chosen to consider the raw residuals, the standardized residuals and the studentized residuals. From the plots one can tell that there are three distinct potential outliers that should be investigated further, i.e. observations 50, 91 and 130.

(36)

Figure 3: Residuals for the initial model.
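As an illustration, the three residual types can be extracted from the fitted model in R as follows; this is a sketch reusing the hypothetical full.model object, not the authors' exact script.

# Ordinary, standardized and studentized residuals of the fitted model.
res.raw  <- residuals(full.model)
res.std  <- rstandard(full.model)
res.stud <- rstudent(full.model)

# Flag observations whose standardized or studentized residual exceeds the
# common rule-of-thumb cut-off of 3 in absolute value.
which(abs(res.std) > 3 | abs(res.stud) > 3)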

Another aspect of residual plots is that they are an advantageous method for checking the adequacy of the model. By plotting the residuals against each independent variable separately, in so-called added-variable plots, one can exhibit patterns of how the independent variable behaves and how it contributes to the model's precision. If the partial relationship falls along a straight line with a non-zero slope, one can assume that the independent variable has a linear relationship with the dependent variable and contributes additional information to the model; a line with approximately zero slope instead indicates that the variable adds little information beyond the other regressors.

From the plots below, one can tell that the variables contribute with varying amounts of information to the model. If one excludes the month variables from the analysis, on the grounds that these are indicator variables, there are four variables with essentially zero-slope lines, namely import, export, PPI and startup enterprises.

Figure 4: Added-variable plots of the initial model.
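Added-variable plots of this kind are available in the car package; a minimal sketch, again assuming the hypothetical full.model object:

library(car)  # provides added-variable (partial regression) plots

# One panel per regressor; an approximately horizontal point cloud suggests
# that the variable adds little information beyond the other regressors.
avPlots(full.model)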

5.4.2 Diagnostics

The variance inflation factor suggests that there exists multicollinearity among the independent variables. The results are presented below and show that CPI, retail sales, unemployment, OMX30, total enterprises, startup enterprises and liquidated enterprises have VIF values not exceeding the cut-off value of 10.

Variable                     VIF
cpi.m                        3.186
retail.sales.m               1.379
unemployment.m               7.965
M.m                          29.11
OMX30.m                      1.239
export.m                     20.54
import.m                     11.87
ppi.m                        27.02
total.enterprises.m          4.512
startup.enterprises.m        4.752
liquidated.enterprises.m     3.097

Table 4: VIF values for the initial model.
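The values in Table 4 can be computed with the vif() function from the car package; a sketch under the same assumptions as before. When a factor such as M.m is included, car::vif() reports generalized VIFs for that term.

library(car)

# Variance inflation factors for the full model; values above 10 are commonly
# taken as a sign of problematic multicollinearity.
vif(full.model)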

Below, the correlation matrix is plotted. Positive correlations are displayed in blue and negative correlations in red; the intensity of the color and the size of the circles are proportional to the correlation coefficient. For the initial model, the correlation matrix suggests that import, export and PPI are strongly positively correlated with each other, while total enterprises and unemployment are strongly negatively correlated. Other than that, the matrix does not reveal any major indications. The strong correlation between import, export and PPI is also reflected in the VIF values presented above, where these three variables had larger values than all other independent variables, with the exception of the month indicators.

Figure 5: Correlation matrix for the initial model.
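A correlation plot in this style can be produced with the corrplot package; a minimal sketch, where macro.data is an assumed data frame containing only the numeric regressors (i.e. excluding the month factor):

library(corrplot)

# Correlation matrix of the numeric regressors (assumed data frame name);
# blue circles indicate positive and red circles negative correlations, with
# size and intensity proportional to the correlation coefficient.
corr.mat <- cor(macro.data, use = "complete.obs")
corrplot(corr.mat, method = "circle")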

Regarding Cook's distance, one requires the observations to have a value below one, which is the generally suggested limit before a point is regarded as influential. The graph shows that all points are below the limit; however, observation 130 has a remarkably higher value than the others. We have already seen in the residual analysis that observation 130 is a potential outlier, and it should therefore be investigated further.

For the covariance ratio we want the observations to lie within the limits [1 − 3p/n, 1 + 3p/n], which are the limits generally used in covariance ratio analysis. The graph shows that almost all observations are within these limits; only observations 31, 50, 85 and 130 fall outside them. These observations have shown signs of being potential outliers earlier in the analysis and should therefore be investigated further.

Figure 6: Cook's distance and covariance ratio plots for the initial model.
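Both diagnostics are available in base R; the sketch below, again with the hypothetical full.model object, flags observations outside the stated limits.

# Influence diagnostics for the fitted model.
cd  <- cooks.distance(full.model)
cvr <- covratio(full.model)

p <- length(coef(full.model))  # number of estimated parameters
n <- nobs(full.model)          # number of observations

which(cd > 1)                                      # Cook's distance above 1
which(cvr < 1 - 3 * p / n | cvr > 1 + 3 * p / n)   # covratio outside the limits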

In the Box-Cox method we want the value λ = 1 to lie within the 95 percent confidence interval that has been plotted. As shown in the graph, this is the case for this model; the transformation then becomes y^λ = y^1 = y, so no transformation is made since it would not improve the model.

Figure 7: Box-Cox plot for the initial model.
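The Box-Cox profile can be produced with the MASS package; a minimal sketch under the same assumptions:

library(MASS)

# Profile log-likelihood of the Box-Cox transformation parameter lambda; if
# the 95% confidence interval covers lambda = 1, no transformation of the
# response is needed.
boxcox(full.model, lambda = seq(-2, 2, by = 0.1))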

Regarding the variable selection, the all-possible-regressions method is used, and the resulting models are evaluated according to the following three criteria: BIC, Mallows' Cp and adjusted R².

Figure 8: All-possible-regressions results for the initial model according to the BIC, Mallows' Cp and adjusted R² criteria.
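Such an exhaustive search can be carried out with the leaps package; the sketch below reuses the assumed data frame and response names from earlier and is not the authors' exact script.

library(leaps)

# Exhaustive search over all subsets of regressors, scored by BIC, Mallows' Cp
# and adjusted R-squared (the month factor is expanded into its dummies).
all.regs <- regsubsets(bankruptcies.m ~ cpi.m + retail.sales.m + M.m +
                         unemployment.m + OMX30.m + export.m + import.m +
                         ppi.m + total.enterprises.m + startup.enterprises.m +
                         liquidated.enterprises.m,
                       data = bankruptcy.data, nvmax = 15, method = "exhaustive")

reg.summary <- summary(all.regs)
reg.summary$bic    # BIC for the best model of each size
reg.summary$cp     # Mallows' Cp for the best model of each size
reg.summary$adjr2  # adjusted R-squared for the best model of each size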

The cross-validation method was applied to the initial model. It suggests that the most precise model consists of six variables; however, the model with five regressors also has a low cross-validation value.

Figure 9: Cross-validation of the initial model.
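The exact cross-validation scheme is not reported above; one common choice is k-fold cross-validation via boot::cv.glm, sketched below under the same naming assumptions.

library(boot)

# Refit the full model as a Gaussian GLM so that cv.glm can be used; with
# K = 10, delta[1] is the 10-fold cross-validation estimate of the prediction
# error (mean squared error).
full.glm <- glm(bankruptcies.m ~ cpi.m + retail.sales.m + M.m + unemployment.m +
                  OMX30.m + export.m + import.m + ppi.m + total.enterprises.m +
                  startup.enterprises.m + liquidated.enterprises.m,
                data = bankruptcy.data)

set.seed(1)
cv.glm(bankruptcy.data, full.glm, K = 10)$delta[1]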


5.4.3 Top three models

As a result of the analysis made on the initial model, i.e. the full model, it is suggested that we investigate the following three models further:

Model 1: M.m, unemployment.m, total.enterprises.m
Model 2: M.m, cpi.m, retail.sales.m, OMX30.m, total.enterprises.m, liquidated.enterprises.m
Model 3: M.m, retail.sales.m, unemployment.m, export.m, total.enterprises.m

Table 5: The three models for further analysis.

These three models have been suggested throughout the analysis and are believed to have the potential of being more precise than the full model. On the back of this, our belief is that one of these models will be better able to answer our research question and, furthermore, bring insights into the financial phenomenon of bankruptcy. The analysis of the models is presented in section 6 Result.
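For completeness, the candidate models can be fitted in the same way as the full model; the sketch below shows Model 2 under the same assumed data frame and response names.

# Model 2: months, CPI, retail sales, OMX30, total enterprises and liquidated
# enterprises (data frame and response names are assumed).
model2 <- lm(bankruptcies.m ~ M.m + cpi.m + retail.sales.m + OMX30.m +
               total.enterprises.m + liquidated.enterprises.m,
             data = bankruptcy.data)

summary(model2)$adj.r.squared  # compare against the full model's 0.6376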


6 Result

In this section, the analysis and results for the three models previously introduced are presented. We have chosen to describe the results for the model with the highest precision more comprehensively, while the other two models are presented less thoroughly. The same methodology has been applied to these three models as to the initial model, i.e. residual analysis followed by diagnostics. However, we will not consider any variable selection methods, since these three models are the set of models suggested by the analysis of the full model.

6.1 Residual analysis

Regarding the residual analysis, we could see that new outliers have occurred for Model 1; observations 3 and 85 are newly detected outliers. For Model 2 the same outliers as before are detected. In Model 3, observation 3 is a newly found outlier. However, Cook's distance tells us that these are at most leverage points, since their values are far below 1, and they will therefore not be influential on the model. Consequently, neither these outliers nor the outliers detected in the initial model will be rejected. More about the outliers and our reasoning about them can be found in section 7.3 Outliers. Regarding the other forms of residuals, they do not show any significant differences from what we have seen in the standardized residuals.

Figure 10: Standardized residuals for the three models.

Next, we would like to present the results of the added-variable plots for the second model. These are useful to study when one wants to examine the linear relationship between the response variable and each independent variable separately. If we exclude the month indicators, the plots show that all other included variables have a non-zero slope. The variables retail sales and liquidated enterprises have the steepest slopes.
