Overcoming a financial crisis: A study of which factors predict the impact of a rapid economic change

Arvid Artman

Department of Statistics

Uppsala University

Supervisor: Lars Forsberg

VT 2020


Abstract

This paper investigates which factors best predict the economic state of a Swedish municipality after the 2008 crisis by constructing a linear model that regresses the change in the unemployment rate on a set of variables. The variables were drawn from a dataset compiled from a government data service and were selected for the model using the Bayesian information criterion (BIC). This procedure yielded a model with six independent variables. The model’s statistics were examined, and the model was subsequently tested against the five assumptions of multiple linear regression. The model did not fulfil the assumption of homoscedasticity, so the dependent variable was transformed into its logarithm, yielding a log-lin model. This model fulfilled every assumption and had higher explanatory power than the previous model. It is concluded that the number of newly registered businesses per 1000 residents, the share of residents with a high education, the fraction of net-commuters, the number of refugees received with a residence permit per 1000 residents, total net investments per person, the share of long-term unemployed residents and the population size are all significant when included together in a log-lin model of the change in the unemployment rate.

Acknowledgements

I would like to direct a special thank you to my supervisor Lars Forsberg, who guided me through difficult decisions and problems throughout the writing process, and to my friend Victor Molnö, who assisted me when I encountered problems with the coding in RStudio.

Key Words

Financial crisis, Economic well-being, Multiple linear regression, Bayesian information criterion (BIC), Log-lin model


Table of contents

1 Introduction
1.1 Background
1.2 Research question
2 Data
2.1 Source
2.2 Dependent variable
2.3 Independent variables
2.4 Limitations
3 Method
3.1 Multiple linear regression
3.2 Selecting independent variables
3.3 Testing the model
4 Result
4.1 The model
4.2 Evaluation
4.3 Assumptions
4.4 Remedy
5 Conclusion
5.1 The final model
5.2 Alternative approaches
5.3 Interpretation


1 Introduction

1.1 Background

The 2007 financial crisis is generally regarded as the most severe global economic crisis since the 1930s. It originated in rising losses from subprime mortgages in the United States, which led to a national banking crisis and severe consequences for the community.¹

With the financial markets of most developed countries being closely intertwined, the crisis had a large impact on the economic state of many countries around the world. Countries with a close economic relationship with the United States saw an almost unanimous drop in GDP, and economically vulnerable countries were in some cases close to bankruptcy.² Despite the negative impact of the crisis, not all countries were affected equally badly. China and Mexico are good examples of opposite ends of the spectrum: Mexico, being a relatively poor country with close economic ties to its northern neighbour, was hit hard by the crisis, while China merely saw a slowdown in its fast-growing GDP.³ ⁴ As a middle example, Sweden was hit hard by the crisis but managed to delay and dampen the worst impact to a certain degree. This is largely due to the Swedish banks’ caution in lending money in the wake of the national crisis in 1990.⁵ Almost every country was affected to varying degrees, and each had unique circumstances that contributed to its handling of the crisis.

Countries are often the main objects of investigation for the effects of economic crises, and for economic issues overall. However, a simple analysis of a country might not be enough to capture the entire effect of the crisis. Less often is the question asked of what happens within a country. Is the national impact a good estimator of the local impact within a country? Will a country be affected in the same way across all regions? The difference between the economies of local municipalities may sometimes be comparable to that between countries, although on a smaller scale. This would mean that, just as the financial situation in Sweden could serve as an indicator of how the country would handle the crisis, the same should be possible for local regions.

1.2 Research question

This study aims to build a model that predicts the change in economic well-being in Swedish municipalities during the 2008 financial crisis based on the state of the municipality before the crisis, in 2008. The purpose of the study is to investigate which variables correlate with mild or severe consequences of a crisis, and whether the depth of the impact can be accurately predicted by observing the financial state of a municipality.

Although it would be ideal, this study does not intend to generalise the model to all countries or time periods. It is acknowledged that the financial effect of an economic crisis may vary over time and that various indicators may not be equally useful across all municipalities.

¹ Duca, J., Muellbauer, J. and Murphy, A., p. 12
² Ozturk, S. and Sozdemir, A., p. 574
³ Angeles Villarreal, M., p. 19
⁴ Morrison, W., p. 3


2 Data

2.1 Source

For this study, a dataset was compiled in Excel with data from the website kolada.se. The website is a service provided by Rådet för främjande av kommunala analyser (RKA), an organisation consisting of the Swedish government and Sveriges Kommuner och Regioner (SKR). It gathers data on Sweden’s administrative divisions, mainly from Statistiska Centralbyrån (SCB), and presents them in an accessible format.

2.2 Dependent variable

When constructing a model that predicts economic well-being after a financial crisis, one needs to select a variable that acts as a good indicator of economic well-being. This is the variable that is expected to change as the crisis continues and that the final model will predict. Ideally, the gross regional product (GRP) per capita would have been used as the indicator. It measures the total economic activity within a region (in this case, a Swedish municipality) in a given time period divided by the number of people in the region, and it is the most widely used measure of how well a region is doing. Unfortunately, data on GRP for Swedish municipalities only exist from 2012 onwards.

Instead, another variable that could be expected to correlate with economic well-being, and that saw a major change from 2008 to 2009, had to be chosen. For this, the unemployment rate was chosen. Like a high GRP per capita, a low unemployment rate indicates economic well-being (and vice versa), and the variable increased dramatically after 2008, after having previously decreased.

Since the intention is to model the change in economic well-being, the dependent variable will be the change in the unemployment rate, measured in percentage points. Another possible variable would have been the percentage change in unemployment. This was not used because it turned out not to capture the change in well-being accurately, due to inconsistencies when it came to proportions.
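The difference between the two candidate measures can be illustrated with a minimal sketch. The two municipalities and their unemployment rates below are purely hypothetical and are not taken from the dataset, and the function names are introduced only for this illustration. The sketch shows one way in which the two measures can rank municipalities differently, which may be the kind of inconsistency referred to above.

```python
def change_in_points(rate_before, rate_after):
    """Change in the unemployment rate, in percentage points."""
    return rate_after - rate_before


def relative_change(rate_before, rate_after):
    """Change in the unemployment rate relative to its starting level, in percent."""
    return 100.0 * (rate_after - rate_before) / rate_before


# Hypothetical unemployment rates (percent) in 2008 and 2010.
municipalities = {"A": (2.0, 4.0), "B": (6.0, 9.0)}

for name, (before, after) in municipalities.items():
    print(name, change_in_points(before, after), relative_change(before, after))
```

In this made-up case, municipality B has the larger increase in percentage points (3 against 2) but the smaller relative increase (50% against 100%), so the two measures would order the municipalities differently.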

The unemployment level in Sweden peaked in 2010 before it started to decrease again, so the change in unemployment between 2008 and 2010 is what the model will predict, in order to capture the total effect of the crisis. Between these years, unemployment almost doubled nationwide. In all municipalities, unemployment rose to some degree; the average municipality more than doubled its unemployment, and the average change was 4.2 percentage points.

A problem with this measurement is the possible change in the number of residents in the municipality between the years. If, for example, the number of employed people in an area were to decrease (all else held constant), the unemployment rate would naturally increase, since a bigger share of the population would be unemployed than before. A way to counteract this problem would be to recode the variable so that the change in unemployment is measured relative to the 2008 population size.


This, however, only seems to be a necessary recoding if the people moving were disproportionately either employed or unemployed. Since every municipality saw a rise in unemployment and nothing is known about the people who changed municipality, this cannot be assumed. Compared with the ordinary difference in the unemployment rate, the impact of the population change on the change in the unemployment rate was negligible, and thus the recoding seems redundant. Instead, the dependent variable was obtained simply as follows:

$$\text{Unemployment\_Change} = \text{Unemployment rate}_{2010} - \text{Unemployment rate}_{2008}$$

where the unemployment rate is that of March in the given year.

2.3 Independent variables

To be able to predict how much this variable will change after the crisis, one also needs variables that measure the state of the municipality before the crisis. These variables will be evaluated in the model against the dependent variable. From kolada.se, different performance indicators for Swedish municipalities were selected for the dataset on the basis that they could logically be correlated with a rising unemployment rate. The final dataset consisted of 31 variables (see Appendix 1).

2.4 Limitations

As with GRP per capita, there were other variables that would have been interesting to test in the model but for which 2008 data did not exist. Among those are the Gini coefficient, as an indicator of equality in the municipality, and the percentage of people working in different sectors and in different areas. Similarly, for a few concepts worth testing, the data did not contain a sufficiently informative indicator variable. For the difference in median income between genders, as an indicator of gender equality, the numbers were too similar across municipalities to be worth testing.


3 Method

3.1 Multiple linear regression

This study will make use of multiple linear regression to model the dependent variable. This approach is one of the most commonly used statistical tools for modelling the relationship between variables. The method aims to estimate the average change in the dependent variable in response to a one-unit change in an independent variable, with all other variables held constant.

The linear regression model will take on the following form:

$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_j X_{ij} + \dots + \beta_k X_{ik} + \varepsilon_i$$

where $Y_i$ is the dependent variable for observation $i$, $X_{ij}$ is the value of the $j$th independent variable for observation $i$, $\beta_0$ is the intercept, $\beta_j$ is the slope coefficient associated with the $j$th independent variable and $\varepsilon_i$ is an error term (the unexplained variation) for observation $i$.

The parameters of the model are estimated using ordinary least squares (OLS), which minimises the sum of the squared residuals; with normally distributed errors, this coincides with the maximum likelihood estimate. The following assumptions are needed:

1. Linearity: The relationship between the dependent variable and each independent variable is linear.

2. Normality: The residuals are normally distributed.

3. Homoscedasticity: The residual variance is constant (the same for each value of $X_j$).

4. Independence: The observations are independent of each other.

5. No (perfect) multicollinearity: The independent variables are not perfectly (or very strongly) linearly related to each other.

3.2 Selecting independent variables

When selecting which independent variables to include in the final model, a few things are worth noting. The number of variables in the dataset (31) is rather large and thus yields many possible combinations when fitting the model, which makes it time-consuming to evaluate each combination individually. Also, the method of creating a good model is exploratory in nature; we do not have a specific theory for which variables may explain the dependent variable best, but instead wish to investigate this connection. It is important not only to select the variables with the best combined explanatory power, but also to select a suitable number of variables. An extra variable may, depending on the model, add explanatory power or serve as an unnecessary addition that increases the risk of overfitting. Also noteworthy is that the combination of variables matters for the explanatory power: two variables that are not significant on their own might be significant when combined, and a variable that is significant on its own may not end up in the best model. Therefore, it might not be enough to simply add variables until the model no longer benefits from one more.

To evaluate each model, there needs to be a measure of accuracy that serves as the criterion for model selection. While $R^2$ is a common measure of the share of variance in the dependent variable explained by the independent variables, a problem is that $R^2$ never decreases as new variables are added. Thus, the way to get the highest $R^2$ will always be to include every available variable. The adjusted $R^2$ ($R^2_{adj}$) controls for the number of variables in the model, lowering the value if variables without explanatory power are added. However, $R^2_{adj}$ is more useful when there is a theory about which variables best explain the dependent variable, and it is not designed to deal with overfitting.⁶

For this model, the Bayesian information criterion (BIC) will be used. The advantage of using BIC when selecting a model is that it imposes a large penalty for the number of parameters included, which is useful for preventing overfitting.⁷ A low BIC is preferred, and the decision rule is that the model with the lowest BIC will be used. BIC is defined as follows:

$$\mathrm{BIC} = k \ln n - 2 \ln \hat{L}$$

where:

$k$ = the number of parameters to estimate in the model. These are the intercept, each slope coefficient for the independent variables and the variance of the error term (thus, an empty model has two parameters).

$n$ = the sample size. In the case of this model, $N$ (= 290) will be used instead, since we are dealing with an entire population (all Swedish municipalities) and not a sample.

$\hat{L} = L(\hat{\theta} \mid x)$, the maximised likelihood function for the model.

$\hat{\theta}$ = the parameter values that maximise the likelihood function.

$x$ = the observed data.
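For reference, the BIC above can be written in terms of the residual sum of squares if one additionally assumes normally distributed errors with constant variance. That assumption is made here for illustration only, and the symbols $\mathrm{RSS}$ and $\hat{\sigma}^2$ are introduced for this sketch and are not part of the original notation.

```latex
\begin{align*}
\ln \hat{L} &= -\frac{n}{2}\left[\ln\!\left(2\pi\hat{\sigma}^2\right) + 1\right],
\qquad \hat{\sigma}^2 = \frac{\mathrm{RSS}}{n}, \\
\mathrm{BIC} &= k \ln n - 2 \ln \hat{L}
 = k \ln n + n \ln\!\left(\frac{2\pi\,\mathrm{RSS}}{n}\right) + n .
\end{align*}
```

Since $n$ is the same for every candidate model, comparing BIC values in this form amounts to trading off the fit term $n \ln \mathrm{RSS}$ against the penalty $k \ln n$.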

To find the best model, the regression subset selection procedure in the R packages “leaps” and “car” (run in RStudio) will be used. The procedure generates the best model (assessed by the chosen statistic) for each subset size (subset = a set of independent variables). When running this function, the exhaustive method is chosen, so that every possible subset is tested when regressed on the rise in unemployment. The program plots a graph that includes the subset with the lowest BIC for each subset size. For simplicity’s sake, a cap of 10 variables is set to begin with.
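The idea behind an exhaustive search can be sketched in a few lines of standard-library Python. This is not the implementation used by “leaps”; it is a minimal illustration under stated assumptions: the toy data (`x1`, `x2`, `x3`, `noise`) are invented, all helper names are hypothetical, and the BIC is computed from the Gaussian likelihood with the intercept and the error variance counted as parameters, as in the definition above.

```python
import itertools
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(columns, y):
    """Residual sum of squares from an OLS fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def bic(rss, n, n_slopes):
    """BIC = k ln n - 2 ln L for a Gaussian linear model; k counts intercept and variance."""
    k = n_slopes + 2
    return k * math.log(n) + n * (math.log(2 * math.pi * rss / n) + 1)


def best_subsets(data, y, max_size):
    """Return {size: (bic, variable names)} with the lowest-BIC subset for each size."""
    best = {}
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(sorted(data), size):
            score = bic(ols_rss([data[name] for name in subset], y), len(y), size)
            if size not in best or score < best[size][0]:
                best[size] = (score, subset)
    return best


# Invented toy data: y depends on x1 and x2 only; x3 is irrelevant.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 7]
x3 = [4, 4, 1, 1, 6, 6, 3, 3]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.15, -0.05]
y = [2 * a - b + e for a, b, e in zip(x1, x2, noise)]

best = best_subsets({"x1": x1, "x2": x2, "x3": x3}, y, max_size=3)
for size, (score, subset) in best.items():
    print(size, subset, round(score, 2))
```

With 31 candidate variables and a cap of 10, the number of subsets is very large, which is why a dedicated routine is preferable to a naive loop such as this one in practice.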

It should also be said that the selection of variables is made without any prior knowledge about their correlation with the dependent variable. The variables have not been examined before the selection, and thus possible additions that could prove significant, such as interaction terms, logarithmic transformations or polynomial terms, will not be considered from the start.

3.3 Testing the model

The BIC is useful for selecting a model, but it is not the only criterion for a good model. It only gives an estimate of how well the model explains the data; it does not take the other criteria for linear regression into account. Therefore, the model that the program selects will be tested to see if it fulfils the aforementioned assumptions for multiple linear regression. If it does not, the model will be adjusted and, if necessary, the model with the second lowest BIC will be used and evaluated in the same way.

⁶ Stauffer, p. 269
⁷ Ibid., p. 271

Of the assumptions, normality and independence are taken to be fulfilled by the nature of the dataset. Every municipality is its own administrative unit, and the municipalities are not dependent on each other to any significant degree. Due to the central limit theorem (CLT), the high number of observations allows for the assumption that the coefficient estimates are approximately normally distributed.

The assumptions of linearity and homoscedasticity can both be examined by reviewing a residual plot. This is a plot with the fitted values from the model on the x-axis and the residuals on the y-axis. Onto this, a trend line is added that represents the residual mean for the corresponding fitted value. When the linearity assumption is fulfilled, the trend line will be approximately straight with a constant residual mean of 0. The model is homoscedastic if the residual variance is constant for each fitted value. When these assumptions are to be examined further, perhaps to single out a variable that causes a problem, the scatterplots of the dependent variable against each independent variable can be viewed.

Apart from reviewing the residual plot, the model will also be tested for homoscedasticity using the Breusch-Pagan test in RStudio, which tests whether the variance of the residuals depends on the independent variables. The test is performed by regressing the squared residuals from the model on the independent variables in an auxiliary regression:

$$\hat{u}_i^2 = \gamma_0 + \gamma_1 X_{i1} + \gamma_2 X_{i2} + \dots + \gamma_j X_{ij} + \dots + \gamma_k X_{ik} + v_i$$

The null hypothesis is that the model is homoscedastic, and it is rejected if the independent variables explain the squared residuals to a large degree. Under the null hypothesis, the test statistic from the auxiliary regression is asymptotically chi-squared distributed with $k$ degrees of freedom. The test will be performed at a 5% significance level, and the null hypothesis will be rejected if the p-value is lower than the significance level. If so, heteroscedasticity will be assumed.
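One common way to turn the auxiliary regression into a test statistic is the Lagrange multiplier form below, usually associated with Koenker's studentised version of the test. Whether this is the exact variant computed by the software used in this study is an assumption here, and $R^2_{\hat{u}^2}$ (the coefficient of determination of the auxiliary regression) is notation introduced for this sketch.

```latex
\begin{align*}
LM &= n \, R^2_{\hat{u}^2}, \\
LM &\overset{a}{\sim} \chi^2_k \quad \text{under } H_0 .
\end{align*}
```

A large value of $LM$, and thus a small p-value, indicates that the independent variables explain a noticeable share of the variation in the squared residuals.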

Multicollinearity, and additionally the correlation, will be detected by examining the variance inflation factor (VIF) and the Pearson correlation coefficients (𝜌), respectively. The VIF measures the multivariate collinearity within the model, whereas the Pearson correlation coefficient measures the bivariate correlation between the independent variables.

The VIF for variable $j$ is calculated using the $R^2$ from regressing variable $j$ on the other independent variables in the model. The VIF for variable $j$ is defined as:

$$VIF_j = \frac{1}{1 - R_j^2}$$

where:

$$R_j^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n} (X_{ij} - \hat{X}_{ij})^2}{\sum_{i=1}^{n} (X_{ij} - \bar{X}_j)^2}$$

$\hat{X}_{ij}$ = the value of $X_j$ for observation $i$ estimated by the model

$\bar{X}_j$ = the mean of $X_j$

The Pearson correlation between variables $X$ and $Y$ is defined as:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$$

where:

$\mathrm{cov}(X, Y)$ = the covariance between variables $X$ and $Y$

$\sigma_X$ and $\sigma_Y$ = the standard deviations of variables $X$ and $Y$

The VIF can take on any positive number, while the Pearson correlation coefficient has a value between +1 and -1, where +1 denotes perfect positive correlation, -1 denotes perfect negative correlation and 0 denotes no correlation at all. Generally, a VIF above 10 and/or a correlation above 0.7 or below -0.7 is regarded as evidence for strong multicollinearity or high correlation between the variables within the model.
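The two diagnostics can be sketched as follows. The data are hypothetical values for five municipalities and the function names are introduced only for this illustration. The sketch uses the fact that with just two independent variables, $R_j^2$ equals their squared correlation; with more variables, $R_j^2$ would have to come from a full regression of variable $j$ on the others.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors of the covariance and the variances cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_from_r_squared(r_squared):
    """VIF implied by the R^2 from regressing one predictor on the others."""
    return 1.0 / (1.0 - r_squared)


# Hypothetical values of two predictors for five municipalities.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson(x1, x2)
vif = vif_from_r_squared(r ** 2)
print(r, vif)

# Rules of thumb from the text: |correlation| above 0.7 or VIF above 10.
print(abs(r) > 0.7, vif > 10)
```

In this made-up case, the correlation of 0.8 exceeds the 0.7 threshold while the implied VIF of about 2.8 stays well below 10, which shows that the two rules of thumb do not necessarily flag the same situations.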


4 Result

4.1 The model

According to Figure 1, the model with the lowest BIC, and thus the best model, includes six variables. These variables are:

N_B = New_Businesses: The number of newly registered businesses (limited companies, individual traders, trading companies and limited liability companies) in the municipality per 1000 residents as of 31 December.⁸

H = High_Education: Residents aged 25-64 with a post-secondary education of at least three years divided by all residents aged 25-64.⁹

N_C = Net_Commuting: The number of commuters who work in the municipality and live in another minus the number of commuters who live in the municipality and work in another, divided by the number of workers who live in the municipality (presented in percentages).¹⁰

Rf = Refugees: The number of refugees received (as their first municipality of residence) with a residence permit per 1000 residents as of 31 December.¹¹

⁸ Kolada; Fri Sökning; Nyckeltal: ”Nyregistrerade företag, antal/1000 invånare”
⁹ Ibid., Nyckeltal: ”Högutbildade invånare 25-64 år, andel (%)”
¹⁰ Ibid., Nyckeltal: ”Nettopendling, andel (%)”
¹¹ Ibid., Nyckeltal: ”Kommunmottagna i flyktingmottagandet med uppehållstillstånd, antal/1000 inv”

Figure 1: The graph shows the subset size on the x-axis and the BIC on the y-axis. The models include the independent variables in abbreviated form. One model is included for each subset size and the figure does not include models that are both larger and has a higher BIC than the


N_I = Net_Investments: Total net investments divided by the population as of 31 December.¹²

L = Long_Term_Unemployment: The number of residents aged 25-64 who in March were openly unemployed or in a programme with activity support for at least six months, divided by the number of residents aged 25-64 as of 31 December.¹³

The following is the estimation of the model:

Dependent variable: Unemployment_Change

| Variable | Coefficient | Std. error |
|---|---|---|
| New_Businesses | -0.264*** | (0.050) |
| High_Education | -0.083*** | (0.013) |
| Net_Commuting | 0.012*** | (0.003) |
| Refugees | 0.112*** | (0.031) |
| Net_Investments | 0.0001*** | (0.00003) |
| Long_Term_Unemployment | 0.234*** | (0.087) |
| Constant | 5.958*** | (0.298) |
| Observations | 290 | |
| R2 | 0.508 | |
| Adjusted R2 | 0.498 | |
| Residual Std. Error | 1.004 (df = 283) | |
| F Statistic | 48.719*** (df = 6; 283) | |

Note: *p<0.1; **p<0.05; ***p<0.01

Table 1: The variables in the model with their respective slope coefficients, the standard errors in parentheses and asterisks indicating the level at which each variable is significant.

4.2 Evaluation

From Table 1, we can see that every variable in the model is significant at the 1% level. The $R^2$ is 0.508 and the $R^2_{adj}$ is 0.498, which means that the model explains approximately half of the variation in the unemployment change, a moderate level of explanatory power. The residual standard error (1.004) tells us that the model’s predictions are typically off by about one percentage point.

¹² Ibid., Nyckeltal: ”Nettoinvesteringar, totalt kommun, kr/inv”
¹³ Ibid., Nyckeltal: ”Långtidsarbetslöshet 25-64 år, andel (%) av bef.”

4.3 Assumptions

As previously mentioned, the assumption of independence is fulfilled. Next, the residual plot will be examined to see if the model is linear and/or homoscedastic.

In Figure 2, the trend line almost coincides with the line where the residual mean is equal to 0. The exceptions are the edges, where the line curves slightly upwards, indicating a residual mean greater than 0 for these fitted values. However, since the observations at the edges are few, they can be viewed as outliers, and thus the assumption of linearity can be seen as fulfilled.

Regarding the variance, Figure 2 shows clear signs of heteroscedasticity in the model, since the residuals on average increase in size as the fitted values increase. To test this, the Breusch-Pagan test will be used to test the following hypotheses (recall the specification of the auxiliary regression):

$$H_0: \gamma_1 = \gamma_2 = \dots = \gamma_j = \dots = \gamma_k = 0$$

$$H_1: \text{at least one } \gamma_j \neq 0$$

Since the p-value, as seen in Table 2, is lower than the significance level (0.05), $H_0$ is rejected and the model is concluded, at the 5% significance level, to suffer from heteroscedasticity.

| BP | Df | P-value |
|---|---|---|
| 13.501 | 6 | 0.036 |

Table 2: Breusch-Pagan test
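As a check on the reported values, the p-value can be recomputed from the test statistic and the degrees of freedom. The sketch below uses a closed form for the upper tail of the chi-squared distribution that only holds for an even number of degrees of freedom; the function name is hypothetical and the calculation is independent of whichever routine produced the original output.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only holds for even, positive df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Statistic and degrees of freedom as reported for the first model.
p_value = chi2_sf_even_df(13.501, 6)
print(round(p_value, 3))  # 0.036, in line with the reported p-value
```

The same shortcut cannot be applied to a test with an odd number of degrees of freedom, for which a general chi-squared routine would be needed.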

Figure 2: Residual plot of the model with the red line representing the residual mean


To see if multicollinearity and correlation are present in the model, the correlation matrix and the VIF values from RStudio will be examined according to the rules of thumb mentioned in the method section. In the tables, the variables are abbreviated.

As seen in Table 3, no VIF exceeds 10, and as seen in Table 4, no correlation between two variables is above 0.7 or below -0.7. Therefore, it is concluded that the model does not suffer from multicollinearity or high correlation between independent variables.

4.4 Remedy

The model can be said to fulfil every assumption for multiple linear regression except for homoscedasticity. This creates a problem because the estimated standard errors will be incorrect, as the size of the errors depends on the size of the fitted values.

Some common causes of heteroscedasticity are a high number of outliers, omitted variables and an inappropriate transformation of the data. There are several ways to deal with this problem, but one of the simpler ones is to transform the model into a log-lin model; that is, a model that is logarithmic in the dependent variable and linear in the independent variables. Thus, the change in unemployment will be recoded as follows:

$$\ln(\text{Unemployment\_Change})$$

With this transformation, the interpretation of the coefficients changes: each $\beta_j$ now denotes the approximate relative change in the unemployment change associated with a one-unit increase in $X_j$ (roughly $100 \cdot \beta_j$ percent when $\beta_j$ is small). While this is not the most digestible interpretation, the outcome of the model can be used to calculate the estimated unemployment change ($\hat{Y}$) in the following way:

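$$\hat{Y}_i = e^{\ln \hat{Y}_i} = e^{\hat{\beta}_0 + \hat{\beta}_1 X_{i1} + \hat{\beta}_2 X_{i2} + \dots + \hat{\beta}_j X_{ij} + \dots + \hat{\beta}_k X_{ik}}$$

| Variable | VIF |
|---|---|
| N_B | 1.862 |
| H | 1.844 |
| N_C | 1.101 |
| Rf | 1.119 |
| N_I | 1.039 |
| L | 1.211 |

Table 3: VIF values for each independent variable

| | N_B | H | N_C | Rf | N_I | L |
|---|---|---|---|---|---|---|
| N_B | 1 | 0.650 | -0.036 | -0.161 | 0.123 | -0.267 |
| H | 0.650 | 1 | 0.058 | -0.207 | 0.185 | -0.104 |
| N_C | -0.036 | 0.058 | 1 | 0.153 | -0.026 | 0.265 |
| Rf | -0.161 | -0.207 | 0.153 | 1 | -0.063 | 0.244 |
| N_I | 0.123 | 0.185 | -0.026 | -0.063 | 1 | -0.002 |
| L | -0.267 | -0.104 | 0.265 | 0.244 | -0.002 | 1 |

Table 4: Correlation matrix for the independent variables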


Re-estimating the model with this variable as the dependent variable will, however, not be enough. Since the transformed variable is an entirely new variable from a data point of view, it is appropriate to redo the regression subset selection to see if any other variables turn out to be significant with the logarithmic dependent variable.

Figure 4 shows that the new model specification almost yielded the same best model as before, with the difference being that the population size now is included:

Pp = Population: Total number of residents as of 31 December.¹⁴

¹⁴ Kolada; Fri Sökning; Nyckeltal: ”Invånare totalt, antal”

Figure 4: The graph shows the subset size on the x-axis and the BIC on the y-axis. The models include the independent variables in abbreviated form. One model is included for each subset size and the figure does not include models that are both larger and has a higher BIC than the


The new model gives the following estimation:

Dependent variable: Unemployment_Change_log

| Variable | Coefficient | Std. error |
|---|---|---|
| New_Businesses | -0.082*** | (0.012) |
| High_Education | -0.027*** | (0.003) |
| Net_Commuting | 0.003*** | (0.001) |
| Refugees | 0.023*** | (0.007) |
| Net_Investments | 0.00002** | (0.00001) |
| Long_Term_Unemployment | 0.056*** | (0.021) |
| Population | 0.000001*** | (0.00000) |
| Constant | 1.987*** | (0.074) |
| Observations | 290 | |
| R2 | 0.605 | |
| Adjusted R2 | 0.595 | |
| Residual Std. Error | 0.231 (df = 282) | |
| F Statistic | 61.697*** (df = 7; 282) | |

Note: *p<0.1; **p<0.05; ***p<0.01

Table 5: The variables in the model with their respective slope coefficients, the standard errors in parentheses and asterisks indicating the level at which each variable is significant.

With the new model, we can see in Figure 4 that the BIC decreased considerably for the best model. Similarly, Table 5 shows that both the $R^2$ and the $R^2_{adj}$ increased to around 0.6, meaning the model explains approximately 60% of the variance in the dependent variable. The residual standard error has also shrunk considerably, although this is partly because the logarithms are smaller than the original numbers and thus also have a smaller variance. The variable Net_Investments, however, is now significant only at the 5% level rather than the 1% level.
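The interpretation of the log-lin coefficients can be made concrete with the rounded estimates reported for the new model. The sketch below converts each slope into the exact percentage change in the unemployment change for a one-unit increase in the variable, $100 \cdot (e^{\beta_j} - 1)$. The function name is hypothetical, and Net_Investments and Population are left out because their reported coefficients are rounded to too few significant digits for this kind of calculation to be meaningful.

```python
import math

# Rounded slope estimates of the log-lin model, as reported in the estimation above.
coefficients = {
    "New_Businesses": -0.082,
    "High_Education": -0.027,
    "Net_Commuting": 0.003,
    "Refugees": 0.023,
    "Long_Term_Unemployment": 0.056,
}


def percent_effect(beta):
    """Exact percentage change in Y for a one-unit increase in X in a log-lin model."""
    return 100.0 * (math.exp(beta) - 1.0)


for name, beta in coefficients.items():
    print(f"{name}: {percent_effect(beta):+.2f}% per unit")
```

For small coefficients, the exact value is close to $100 \cdot \beta_j$: one additional newly registered business per 1000 residents, for example, is associated with a rise in unemployment that is about 7.9% smaller, compared with the 8.2% suggested by the approximation.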


From Figure 5, we can see that the red line now follows a residual mean of 0 even more closely, which supports the assumption of linearity. Regarding heteroscedasticity, the model now looks much more homoscedastic than the previous one. The Breusch-Pagan test will once again be performed:

$$H_0: \gamma_1 = \gamma_2 = \dots = \gamma_j = \dots = \gamma_k = 0$$

$$H_1: \text{at least one } \gamma_j \neq 0$$

The results of the test were as follows:

| BP | Df | P-value |
|---|---|---|
| 4.246 | 7 | 0.751 |

Table 6: Breusch-Pagan test

Since the p-value, as seen in Table 6, is higher than the significance level (0.05), $H_0$ cannot be rejected, and the model is assumed to be homoscedastic.

Figure 5: Residual plot of the model with the red line representing the residual mean

| Variable | VIF |
|---|---|
| N_B | 2.09 |
| H | 1.948 |
| N_C | 1.196 |
| Rf | 1.119 |
| N_I | 1.039 |
| L | 1.273 |
| Pp | 1.647 |

Table 7: VIF values for each independent variable


Tables 7 and 8 once again do not show sufficient evidence for the presence of multicollinearity.

All assumptions of multiple linear regression are now fulfilled.

| | N_B | H | N_C | Rf | N_I | L | Pp |
|---|---|---|---|---|---|---|---|
| N_B | 1 | 0.650 | -0.036 | -0.161 | 0.123 | -0.267 | 0.470 |
| H | 0.650 | 1 | 0.058 | -0.207 | 0.185 | -0.104 | 0.480 |
| N_C | -0.036 | 0.058 | 1 | 0.153 | -0.026 | 0.265 | 0.297 |
| Rf | -0.161 | -0.207 | 0.153 | 1 | -0.063 | 0.244 | -0.004 |
| N_I | 0.123 | 0.185 | -0.026 | -0.063 | 1 | -0.002 | 0.076 |
| L | -0.267 | -0.104 | 0.265 | 0.244 | -0.002 | 1 | 0.138 |
| Pp | 0.470 | 0.480 | 0.297 | -0.004 | 0.076 | 0.138 | 1 |

Table 8: Correlation matrix for the independent variables


5 Conclusion

5.1 The final model

From this study, it can be concluded that the logarithm of the rise in the unemployment rate in a Swedish municipality during an economic crisis can be explained using the number of newly registered businesses per 1000 residents, the share of residents with a high education, the fraction of net-commuters, the number of refugees received with a residence permit per 1000 residents, total net investments per person, the share of long-term unemployed residents and the population size. Two of these variables, New_Businesses and High_Education, have a negative effect on the dependent variable, while the others have a positive effect.

It is not surprising that New_Businesses and High_Education have a negative effect on the rise in unemployment. A large number of newly started businesses will most likely provide jobs for the residents, and highly educated people tend to have an easier time finding work.

The variable Net_Commuting is a bit harder to interpret. If it takes on a positive value, it means that more people commute into the municipality than out of it. This might be associated with a rise in unemployment since it means that more of the available jobs are performed by people who are not residents of the municipality; the unemployment rate would be lower if those jobs were performed by residents.

Refugees are generally unemployed to a larger degree than the population as a whole, and accepting them into a municipality will almost certainly increase unemployment, which is why Refugees has a positive effect on rising unemployment. Since the variable only measures the refugees accepted in the same year, it does not take into account any contribution to society that a refugee might make after further integration. Upon arrival, refugees are likely a net cost in most cases until they become integrated into society. The time this takes might depend on different factors, such as their age, educational level and which municipality they end up in.

Net_Investments has a positive effect on the rise in unemployment. This is surprising, since one would expect investments to create jobs in one way or another and thus have a negative effect. It might be because investments depreciate to some degree during the crisis, which hurts the municipality financially.

Being another measure of unemployment, Long_Term_Unemployment obviously shows some similarities with Unemployment_Change. It has a positive effect on it, likely because high long-term unemployment among residents in a municipality might indicate other underlying economic problems that become more apparent in the crisis and cause the unemployment rate to rise.

Perhaps the most surprising variable to find in the model was Population. The variable has a positive and significant effect in the model but a negative (and very weak) bivariate correlation with Unemployment_Change. There are also a few outliers with a considerably larger population than the others, namely the big-city municipalities (Stockholm, Gothenburg and Malmö). According to the model, bigger municipalities appear, when correcting for other factors, to have a higher rise in unemployment during a crisis than smaller ones. This indicates that Population is related to the other variables in a way that reverses its sign once they are controlled for.
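A sign reversal of this kind can be reproduced with simulated data. The sketch below is only an illustration under assumed values: the variables `other` and `size` are hypothetical stand-ins, and none of the numbers come from the thesis dataset. It shows how a variable with a positive partial effect can still have a negative bivariate correlation with the response when it is correlated with another predictor that pulls in the opposite direction.

```r
# Simulated illustration only; none of these numbers come from the thesis data.
set.seed(1)
n <- 290                                  # same count as the observations in the models

other <- rnorm(n)                         # hypothetical stand-in for the other predictors
size  <- other + rnorm(n, sd = 0.5)       # hypothetical Population-like variable, correlated with `other`
y     <- 1 * size - 2 * other + rnorm(n)  # the true partial effect of `size` is +1

# Bivariate view: `size` and `y` move in opposite directions on average.
bivariate_cor <- cor(size, y)

# Multiple regression view: holding `other` fixed, the coefficient of `size` is positive.
partial_coef <- coef(lm(y ~ size + other))[["size"]]

print(c(bivariate_cor = bivariate_cor, partial_coef = partial_coef))
```

In this construction the reversal arises purely from the correlation between the two predictors, which may be similar to what happens with Population, although the thesis data were not examined for this.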

It also became apparent how much the combination of variables in a model matters. The BIC plots show that the best models with small subsets include very different variables from the larger models with even lower BIC.
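Since the final model is log-lin, the signs discussed in this section can also be read as approximate percentage changes in Unemployment_Change for a one-unit increase in each variable, using the standard conversion 100 · (exp(b) − 1). The sketch below applies that conversion to the estimates reported for the second model in Appendix 2. It assumes that Unemployment_Change_log is the natural logarithm of Unemployment_Change, which is not stated explicitly.

```r
# Estimates copied from the second model's output in Appendix 2.
# Assumption: Unemployment_Change_log is the natural logarithm of Unemployment_Change.
b <- c(
  New_Businesses         = -8.227e-02,
  High_Education         = -2.656e-02,
  Net_Commuting          =  2.783e-03,
  Refugees               =  2.272e-02,
  Net_Investments        =  1.707e-05,
  Long_Term_Unemployment =  5.572e-02,
  Population             =  7.944e-07
)

# Percentage change in the untransformed response per one-unit increase,
# holding the other variables fixed.
pct_change <- 100 * (exp(b) - 1)

print(signif(pct_change, 3))
```

The units differ greatly between variables (Population is counted in residents, New_Businesses per 1000 residents), so the magnitudes are not directly comparable across rows.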

5.2 Alternative approaches

A few things about the model selection were surprising. Although BIC tends to select a model that is too small rather than one that is too big, the two models examined had six and seven independent variables respectively, which is relatively many. It was also clear that the marginal reduction in BIC decreased as the subset size increased. Although BIC imposes a strict penalty on subset size, a better approach might have been to choose the last model that substantially lowered the BIC rather than simply the model with the lowest BIC overall. The question is whether BIC should be strictly relied on to choose the best model, or whether one should choose a smaller model anyway to reduce the risk of overfitting (a problem that BIC should already counteract).
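One way to make such a rule concrete is to pick the smallest subset whose BIC lies within a fixed tolerance of the minimum. The sketch below is a variant of the idea described above, not the procedure used in this paper. The function name `pick_parsimonious`, the tolerance of 2 and the example BIC values are all assumptions made for illustration.

```r
# Return the smallest subset size whose BIC lies within `tol` of the minimum.
# `bic` is a numeric vector ordered by subset size (1, 2, 3, ...).
# `tol = 2` is an assumed rule-of-thumb threshold, not a value used in the thesis.
pick_parsimonious <- function(bic, tol = 2) {
  stopifnot(is.numeric(bic), length(bic) > 0, tol >= 0)
  which(bic <= min(bic) + tol)[1]
}

# Hypothetical BIC values for subset sizes 1 to 6, for illustration only.
example_bic <- c(-150, -170, -180, -185, -186, -186.5)
chosen_size <- pick_parsimonious(example_bic)

print(chosen_size)
```

With the object created in Appendix 2, the input vector would presumably be `summary(leaps)$bic`. A larger tolerance favours smaller models, so the choice of `tol` carries the same trade-off between fit and overfitting that is discussed above.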

There is also an open question about whether heteroscedasticity is a problem in this model. As previously mentioned, heteroscedasticity makes the standard errors incorrect. The standard errors are a measure of how accurate the estimates from the model are, but since the model is not tested on additional data, and arguably cannot be, one could argue that this is not a problem in this specific case.
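An alternative that was not pursued in this paper would be to keep the untransformed model and report heteroscedasticity-consistent (White) standard errors instead. The sketch below computes the HC1 variant in base R on simulated data. The helper `robust_se` and the simulated variables are hypothetical and only illustrate the calculation.

```r
# Heteroscedasticity-consistent (HC1) standard errors for a fitted lm object.
# `robust_se` is a hypothetical helper written for this illustration.
robust_se <- function(fit) {
  X <- model.matrix(fit)                  # n x k design matrix
  e <- residuals(fit)                     # OLS residuals
  n <- nrow(X)
  k <- ncol(X)
  bread <- solve(crossprod(X))            # (X'X)^-1
  meat  <- crossprod(X * e)               # sum over i of e_i^2 * x_i x_i'
  vc    <- bread %*% meat %*% bread * n / (n - k)  # HC1 small-sample scaling
  sqrt(diag(vc))
}

# Simulated data whose error spread grows with x (illustration only).
set.seed(1)
x <- runif(290, 1, 10)
y <- 2 + 0.5 * x + rnorm(290, sd = x)
fit <- lm(y ~ x)

se_table <- cbind(classical = sqrt(diag(vcov(fit))), robust = robust_se(fit))
print(se_table)
```

The coefficient estimates are unchanged by this correction; only the standard errors, and therefore the t-tests, differ.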

5.3 Interpretation

It should be noted that this model should not be used as a guideline for how to dampen the economic effect of a financial crisis. If a certain variable is negatively correlated with the rise in the unemployment rate during the crisis, this does not mean that the relationship is causal. There might be a third variable that causes both, in which case the variable would have no direct effect on the rise in unemployment. Since the severity of crises varies greatly, the estimates may not be applicable to other crises that have, for example, a generally larger or smaller effect on the unemployment rate. Hopefully, this model will at least serve as an indication of which variables together may predict certain economic effects of a crisis on a municipality.

References

Angeles Villarreal, M. 2010. The Mexican Economy After the Global Financial Crisis. Congressional Research Service.

Duca, John V., Muellbauer, John and Murphy, Anthony. 2010. Housing markets and the financial crisis of 2007–2009: Lessons for the future. Journal of Financial Stability. Vol. 6(4). Pages 203-217. Elsevier.

Kolada. RKA (Rådet för främjande av kommunala analyser). Available at: https://www.kolada.se/ (accessed 12 May 2020)

Morrison, Wayne M. 2009. China and the Global Financial Crisis: Implications for the United States. Congressional Research Service.

Ozturk, Serdar and Sozdemir, Ali. 2015. Effects of Global Financial Crisis on Greece Economy. Procedia Economics and Finance. Vol. 23. Pages 568-575. Elsevier.

Stauffer, H. 2008. Contemporary Bayesian and Frequentist Statistical Research Methods for Natural Resource Scientists. Hoboken, New Jersey, U.S.: John Wiley & Sons, Inc.

Sveriges Riksbank. 2018. The financial crisis 2007-2009. Available at:

Appendix 1

Complete list of variables in the dataset

Assistance_Cost: Gross cost minus internal revenue and sales to other municipalities and regions for financial assistance (including investigation costs), divided by the population on 31 December.

Assistance_Receivers: The number of residents (including children) who at some point during the year have received financial assistance, divided by the population on 31 December (presented as a percentage).

Average_Age: Average age on 31 December.

Business_Revenue: External revenue for business, excluding revenue from sales to other municipalities and regions, divided by the population on 31 December.

Child_Poverty_Index: Percentage of children aged 0-17 with Swedish or foreign (at least one foreign-born parent) background who live in financially disadvantaged households (with low income or maintenance support).

Debt: Total debt in the municipality, divided by the population on 31 December.

Demographic_Provision_Ratio: The sum of the number of residents aged 0-19 and the number of residents aged 65 or older, divided by the number of residents aged 20-64.

Equity_Ratio: Personal capital divided by the sum of assets in the balance sheet, excluding pension commitments earned before 1998.

Female_Wage_Divided_Male_Wage: Monthly median salary for women aged 18-66 employed by the municipality divided by the monthly median salary for men aged 18-66 employed by the municipality.

Foreign_Born: Percentage of foreign-born residents aged 18-64 in the population.

High_Education: Residents aged 25-64 with a post-secondary education of at least three years divided by all residents aged 25-64.

Investment_Infrastructure: Investment expenditure on infrastructure, protection etc., divided by the population on 31 December.

Investment_Spending: Total net investments in the municipality divided by the population on 31 December.

Long_Term_Unemployment: The number of residents aged 25-64 who in March were openly unemployed or in a program with activity support for at least six months, divided by the number of residents aged 25-64 on 31 December.

Median_Net_Income: Median disposable net income (the sum of all taxable and tax-free income, such as earned income, capital income and transfers, minus tax and other negative transfers) for residents aged 20 or older.

Net_Commuting: The number of commuters who work in the municipality and live in another minus the number of commuters who live in the municipality and work in another, divided by the number of workers who live in the municipality (presented in percentages).

Net_Investments: Total net investments divided by the population on 31 December.

Net_Profit: Net finances (financial revenue minus financial cost) divided by the population on 31 December.

New_Businesses: The number of newly registered businesses (limited companies, individual traders, trading companies and limited liability companies) in the municipality per 1000 residents on 31 December.

Parental_Allowance_Men: Percentage of parental allowance (in terms of the number of net days) that is used by men.

Pay_Gap: Monthly median income for women aged 18-66 minus monthly median income for men aged 18-66 employed by the municipality.

Population: Total number of residents on 31 December.

Post_Secondary_Education: Percentage of residents aged 25-64 with a post-secondary education (including postgraduate education).

Refugees: The number of refugees received (as their first municipality of residence) with a residence permit per 1000 residents on 31 December.

Revenue: Total external revenue, excluding revenue from sales to other municipalities and regions, divided by the population on 31 December.

Tax_Revenue: Total tax revenue in the municipality, divided by the population on 31 December.

Tourism_Revenue: External revenue for tourism, excluding revenue from sales to other municipalities and regions, divided by the population on 31 December.

Unemployment: Percentage of residents aged 18-64 who in March are openly unemployed or in a program with activity support.

Unemployment_Change: Unemployment 2010 minus unemployment 2008.

Urban_Degree: Percentage of residents who live in an urban area (a collection of buildings with at least 200 residents, where the distance between buildings does not exceed 200 metres).

Appendix 2

Code for the regression subset selection:

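library(leaps)  # provides regsubsets()
library(car)    # assumed source of subsets(), which plots a statistic against subset size

leaps <- regsubsets(Unemployment_Change ~ ., data = Municipalities, nbest = 1, method = "exhaustive", nvmax = 10)

subsets(leaps, statistic = "bic")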

RStudio output for the first model:

Call:
lm(formula = Unemployment_Change ~ New_Businesses + High_Education + Net_Commuting + Refugees + Net_Investments + Long_Term_Unemployment)

Residuals:
    Min      1Q  Median      3Q     Max
-3.0814 -0.6035 -0.0894  0.6085  4.7961

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)             5.958e+00  2.976e-01  20.017  < 2e-16 ***
New_Businesses         -2.640e-01  5.045e-02  -5.234 3.24e-07 ***
High_Education         -8.348e-02  1.296e-02  -6.439 5.15e-10 ***
Net_Commuting           1.169e-02  3.151e-03   3.710 0.000249 ***
Refugees                1.124e-01  3.127e-02   3.595 0.000383 ***
Net_Investments         8.722e-05  2.872e-05   3.036 0.002617 **
Long_Term_Unemployment  2.343e-01  8.710e-02   2.690 0.007567 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.004 on 283 degrees of freedom
Multiple R-squared: 0.5081, Adjusted R-squared: 0.4977
F-statistic: 48.72 on 6 and 283 DF, p-value: < 2.2e-16

RStudio output for the second model:

Call:
lm(formula = Unemployment_Change_log ~ New_Businesses + High_Education + Net_Commuting + Refugees + Net_Investments + Long_Term_Unemployment + Population)

Residuals:
     Min       1Q   Median       3Q      Max
-0.98522 -0.13681  0.02074  0.15289  0.80149

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)             1.987e+00  7.353e-02  27.019  < 2e-16 ***
New_Businesses         -8.227e-02  1.227e-02  -6.704 1.10e-10 ***
High_Education         -2.656e-02  3.058e-03  -8.683 3.15e-16 ***
Net_Commuting           2.783e-03  7.539e-04   3.692 0.000267 ***
Refugees                2.272e-02  7.181e-03   3.164 0.001725 **
Net_Investments         1.707e-05  6.594e-06   2.589 0.010128 *
Long_Term_Unemployment  5.572e-02  2.050e-02   2.718 0.006977 **
Population              7.944e-07  2.786e-07   2.852 0.004667 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2305 on 282 degrees of freedom
Multiple R-squared: 0.605, Adjusted R-squared: 0.5952