
Analysis of the IFO-model in the cost equalization for municipalities


Academic year: 2021


Author: Elise Wester (811020)

Semester: Ht -20 (autumn term 2020)

Course name, level and credits: Independent Project I, Second Cycle, 15.0 credits, ST433A

Subject: Statistics
Supervisor: Thomas Laitila
Examiner: Olha Bodnar


Abstract

The Swedish cost equalization consists of nine sub-models and aims to create equal economic conditions for all municipalities, independent of structural conditions that the municipalities cannot influence. The IFO-model, which covers all activities in individual and family social care, has over the years proven difficult to construct because of the heterogeneity of the people in need of IFO-activities. The aim of this thesis is to improve the performance of the IFO-model. This is done by suggesting new explanatory variables and testing them in linear regression models estimated with the ordinary least squares (OLS) estimator. Panel data and the first difference estimator were also tested, to make the most of having data from two years. Using OLS, it was found that the new variables proportion of newly registered with the employment service and proportion of young adults improved the IFO-model. Introducing them made one of the variables used today insignificant, and that variable was removed from the model. The results from the panel data were very different from what was expected and raised the question of whether the method used today only captures differences between municipalities, without really reflecting the effect of a change within an individual municipality. The outcome of the cost equalization, using both today's model and the model suggested in this thesis, was calculated for the individual municipalities and for municipality groups. The changes range from 0.0177 SEK per resident in Vänersborg to 750.5 SEK per resident in Åsele. Finally, the municipalities were divided into groups to see whether the model performs differently in different categories of municipalities, and a difference was found.


Contents

1 Introduction

1.1 The financial equalization system

1.1.1 Aim of thesis

1.2 History

2 Data

2.0.1 Data processing

3 Method

3.1 Linear regression model

3.1.1 Linearity

3.1.2 Exogeneity

3.1.3 Normality

3.1.4 No multicollinearity

3.1.5 Homoscedasticity and Non-autocorrelation

3.2 Standard cost

3.3 New variables

3.4 Models

3.5 Panel data

4 Result and analysis

4.1 Models

4.1.1 Assumptions of multiple linear regression

4.2 Model selection

4.2.1 Selected models

4.3 First difference model

4.4 Outcomes

4.4.1 Net cost deviations

4.4.2 Fee or grants

4.4.3 Municipality groups

4.4.4 New model on divided data

5 Discussion

5.0.1 Future research


Chapter 1

Introduction

1.1 The financial equalization system

The municipal financial equalization consists of five parts: income equalization, cost equalization, structural grants, introduction grants and regulation grants/fees. The cost equalization is a solidarity-based system within the municipal sector, and the state has no share in its financing. The purpose of the cost equalization is to create equal economic conditions for all of Sweden's municipalities to provide services to their residents, independent of structural conditions that the municipalities cannot influence. The system is not meant to even out differences in the level of ambition or effectiveness, but aims to equalize for structural needs and cost differences, e.g. differences in the proportion of children or the proportion of older people. The cost equalization for municipalities consists of nine sub-models:

• Preschool, leisure care center and other educational activities
• Preschool class and primary school
• High school
• Municipal adult education
• Individual and family social care
• Elderly care
• Infrastructure
• Business-wide costs
• Public transport (common to municipalities and regions)

For all sub-models, a standard cost is calculated for each municipality; it is the cost that the municipality is expected to have based on its structural conditions. Grants or fees are calculated as the difference between the standard cost in each municipality and the average standard cost in the country. Municipalities with a higher standard cost than the average receive a grant, and those with a lower standard cost pay a fee. The sum of all fees is always equal to the sum of all grants. In practice, grants and fees are not calculated per sub-model. Instead, the standard costs from all models are summed to a so-called structural cost per municipality, and the difference between the structural cost and the country's average decides whether the municipality receives a grant or pays a fee.
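The zero-sum logic described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official calculation: the national average is assumed to be the population-weighted mean structural cost per resident, and the three municipalities, their populations and their costs are hypothetical.

```python
# Minimal sketch of the zero-sum grant/fee logic of the cost equalization.
# Assumption: the national average is the population-weighted mean of the
# structural cost per resident. All figures used below are hypothetical.


def grants_and_fees(cost_per_resident, population):
    """Return (national average, grant(+)/fee(-) per resident, total per municipality)."""
    total_population = sum(population.values())
    average = sum(cost_per_resident[m] * population[m] for m in population) / total_population
    per_resident = {m: cost_per_resident[m] - average for m in population}
    totals = {m: per_resident[m] * population[m] for m in population}
    return average, per_resident, totals


if __name__ == "__main__":
    cost = {"A": 52000.0, "B": 48000.0, "C": 50500.0}  # SEK per resident (hypothetical)
    pop = {"A": 10000, "B": 30000, "C": 20000}
    average, per_resident, totals = grants_and_fees(cost, pop)
    print(f"national average: {average:.0f} SEK per resident")
    for m, amount in per_resident.items():
        kind = "grant" if amount > 0 else "fee"
        print(f"{m}: {kind} of {abs(amount):.0f} SEK per resident")
    # Grants and fees cancel out, so the net transfer is zero.
    print(f"net transfer: {sum(totals.values()):.2f} SEK")
```

With these numbers the average is 49,500 SEK per resident; A and C receive grants of 2,500 and 1,000 SEK per resident, B pays a fee of 1,500 SEK per resident, and the totals (+25, +20 and -45 million SEK) sum to zero.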

The IFO-model

The IFO-model is used to calculate the standard cost for all activities in individual and family social care, such as financial assistance, child and youth care, substance abuse care for adults, family law and other care for adults. The IFO-model is the model that has been most difficult to construct, due to the heterogeneity of the people in need of IFO-activities.

1.1.1 Aim of thesis

The aim of this thesis is to investigate the possibility of improving the IFO-model by finding a model that better explains the differences in costs for IFO-activities based on structural differences between the municipalities. This is done by suggesting new variables that are thought to have an impact on the municipalities' costs for IFO-activities and adding them to the model. Different models are tested in order to suggest changes that enhance the performance of the IFO-model.

1.2 History

Since the early 20th century, there have been different types of systems for equalizing the economic conditions between the municipalities in Sweden. There have been many state investigations of the equalization systems over the years, and the equalization system has been adapted over time to changes in the division of responsibilities between the state and the municipal sector and to other changed conditions.

1993

In 1990, the government appointed a parliamentary committee, the Municipal Economic Committee (Kommunalekonomiska kommittén, KEK). The committee was assigned the task of designing a new system for the state's contribution to municipalities and regions, and of reviewing other sources of funding for the municipal sector. The cost equalization introduced in 1993 was based on KEK's proposal [1] for equalization of structural costs between municipalities and included only four factors for measuring structural conditions. The factors were based on regression analyses and included climate (a warming index), age structure, degree of sparsely populated areas (population density) and social structure. Social structure was described with the variables Proportion of early retirees and Proportion of children with single parents in the municipalities. The method for cost equalization was called the Total Cost Method (Totalkostnadsmetoden). The committee concluded that further development work was necessary, so the 1993 design of cost equalization became a temporary solution, and in June 1992 the government appointed a new investigation, the Structural Costs Investigation [2] (Strukturkostnadsutredningen). The task of the investigation was to develop principles and methods for leveling out structural cost differences between the municipalities.

1996

The changes in the cost equalization introduced in 1996 were based on the Structural Costs Investigation, which stated that the cost equalization should be based on measurable factors, beyond the municipalities' influence, that measure structural cost differences. The investigation also proposed that cost equalization should be designed as a zero-sum game, so that municipalities that are structurally favored contribute to those that are less favored. In the method introduced, called the Standard Cost Method (Standardkostnadsmetoden), the structural cost differences are calculated using a number of sub-models for different activities. In each sub-model, a standard cost is calculated using factors that reflect the structural needs and cost differences between municipalities. The standard cost method is still used in the current cost equalization. In 1995, the government appointed a new parliamentary committee with the task of developing and following up the 1996 equalization system. The committee took the name Municipal Equalization Investigation [3] (Kommunala utjämningsutredningen) and submitted its final report in December 1998 with proposals for changes in cost equalization from the year 2000.

2000

In 2000, based on the Municipal Equalization Investigation's proposal, major changes were made to the model for individual and family care. The investigation found that the costs for IFO-activities had increased considerably during the 1990s and that the costs were distributed unevenly between the municipalities. It was also noted that the differences between the actual costs for IFO and the costs estimated by the model had increased significantly. The explanatory variables included in the new IFO-model were single women with children up to the age of 15, foreign-born refugees and close relatives and other foreign-born from countries outside the Nordic region and the EU, unemployed without compensation, proportion of men with low income, and a density measure (the square root of the urban population). The explanatory power of the new IFO-model was 68 percent, which was 4 percentage points higher than that of an updated version of the IFO-model it replaced.

In September 2001, the government again appointed a parliamentary committee, the Equalization Committee (Utjämningskommittén), with the task of reviewing the equalization system. The committee proposed changes in the cost equalization in its report submitted in 2003 [4], and the report was the basis for the changes introduced in the equalization system in 2005.

2005

Again it was decided to make changes in the cost equalization to increase the precision of the equalization, and certain cost bases also needed to be updated. This time the cost equalization for individual and family care was divided into two parts, which were then summed to a standard cost for individual and family care. One part included structural differences calculated for financial assistance, substance abuse care and other individual and family care. The other part included structural differences in child and youth care, with the factors children of single parents, children with a foreign background, prosecuted youth and a population of up to 75,000 inhabitants.

In September 2008, a parliamentary committee that took the name Equalization Committee.08 (Utjämningskommittén.08) was tasked with evaluating and investigating the system of municipal financial equalization. The committee submitted its report [5] in April 2011.

2014

The changes implemented in 2014 were suggested by the Equalization Committee.08 and from within the Government Offices. The existing sub-models were retained, but some updates and a number of major adjustments were made, as many of the models had not been updated for a long time. The sub-model for individual and family care was again replaced with a new model, since its degree of explanation had fallen: by 2010, the explanatory power had fallen to 46 percent, from 66 percent in 2005 when the model was introduced. The differences between reported net costs for IFO and the estimated standard costs also varied greatly between municipality groups. The entire average net cost of individual and family care was again estimated with a single model, using the variables proportion of unemployed without compensation, proportion of low-educated people born in Sweden aged 20–40, square root of the urban population in the municipality in 2005, proportion living in apartment buildings built 1965–1975, and proportion of the population receiving financial assistance for longer than 6 months.

In November 2016, the government appointed a special investigation, the Cost Equalization Investigation (Kostnadsutjämningsutredningen), to conduct a new review of the municipal cost equalization. The investigation was tasked with considering whether major societal changes that affect costs in municipalities are sufficiently captured by the current system. The investigator should also analyze whether it is possible to simplify the equalization. The investigation submitted its report [6] in 2018.

2017

To counteract errors due to the large influx of refugees, Statistics Sweden (SCB) made some changes in variable definitions. In the individual and family care model, people with establishment allowance were excluded from the variable proportion of unemployed without compensation.

2020

The Cost Equalization Investigation's analyses show that the degree of explanation of the current model, i.e. how well the included variables explain differences in net costs between municipalities, has fallen since it was introduced. The investigation also points to problems with municipalities being able to influence, and with skewed incentives regarding, the variable proportion of the population who have received financial assistance for a period longer than six months, and to instability in the variable for unemployment. Based on the report from the Cost Equalization Investigation, the calculation of the standard cost for individual and family care is based on the following variables: the proportion of children aged 0–19 living in households with a low income standard, the proportion of low-educated people born in Sweden aged 20–40, the square root of the urban population in the municipality, the proportion of the population living in apartment buildings built 1965–1975, the number of days with sickness benefits for residents aged 16–64, and a variable adjusting for the income effect of border commuting to Norway and Denmark.


Chapter 2

Data

The values of the variables used in the regression models were calculated as the average of the years 2018 and 2019. The decision to use averages was based on the use of averages, from the years 2014–2016, in the current model. The variable descriptions and abbreviations are displayed in table 2.1, where the dependent variable, NET, and the first six explanatory variables are the ones used today, and the last four are the new variables suggested in this thesis. The variables CBIS, LEL, SqUP, ApB, SDB and BC were obtained from Statistics Sweden (Statistiska centralbyrån, SCB). The variable NRES was obtained from the Swedish Public Employment Service (Arbetsförmedlingen). The variables NET, SPH, YA and OR were accessed from Kolada, a database run by the Council for the Promotion of Municipal Analyses (Rådet för främjande av kommunala analyser, RKA), a non-profit association whose members are the state and Sweden's Municipalities and Regions. The dependent variable, NET, is shown

Abbreviation   Variable description
NET            The average net cost of the municipality's IFO-activities
CBIS           Proportion of children, aged 0–19, who live below the income standard
LEL            Proportion of residents, aged 20–40 and born in Sweden, with a low educational level
SqUP           Square root of the urban population
ApB            Proportion of the residents who live in apartment buildings built 1965–1975
SDB            Number of days with sickness benefits for residents aged 16–64
BC             Adjustment for border commuters, calculated by multiplying the proportion of border commuters by CBIS
NRES           Proportion of residents, aged 16–64, who are newly registered with the employment service
SPH            Proportion of single parent households
YA             Proportion of adult residents who are between 18 and 24 years old
OR             Proportion of residents over 65 years

Table 2.1: Abbreviations of the variables used.


Variable   Obs.   Min      Mean     Median   Max
NET        290    1320     3971     3912     8942
CBIS       290    0.0287   0.1008   0.0947   0.2641
LEL        290    0.0067   0.0232   0.0233   0.0467
SqUP       290    31.62    140.4    110.4    960.9
ApB        290    0.0011   0.0788   0.0687   0.3911
SDB        290    10.90    27.55    28.04    42.43
BC         290    0.0033   0.1000   0.0363   2.932
NRES       290    0.0314   0.0623   0.0611   0.1044
SPH        290    0.0148   0.0267   0.0267   0.0354
YA         290    0.0989   0.1296   0.1293   0.2101
OR         290    0.1256   0.2370   0.2429   0.3509

Table 2.2: Descriptive statistics.

for each of the variables are seen. The municipalities' average net costs vary from 1320 SEK in Vellinge to 8942 SEK in Filipstad. The square root of the urban population varies between 31.62 in Bjurholm and 960.9 in Stockholm, and the plot in figure 2.1 shows that the three big cities, Stockholm, Göteborg and Malmö, stand out from the rest of the municipalities. For most variables the mean and the median lie close to each other. The one that stands out is BC, where a small number of municipalities with a very large proportion of border commuters push the mean up. This can also be seen in figure 2.1.

2.0.1 Data processing

The data from the different sources were loaded into the statistical software R, where they were combined into data frames. For all calculations and data processing, the packages readxl, MASS, sandwich, lmtest, mvShapiroTest, olsrr, tm and car were used. The data from all sources for 2018 were merged into a data frame by municipality, and the same was done with the data for 2019. Then, using the 2018 and 2019 data frames, two new data frames were created: one containing the average of the two years and one containing the difference between the years.
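The merge-then-derive step above can be sketched as follows. The thesis did this in R with data frames; this Python version uses plain dictionaries and is only an illustration. It assumes that each year's data is a dictionary mapping municipality to a dictionary of variable values, and that municipalities missing in either year are dropped (an inner join), which the text does not state explicitly.

```python
# Sketch of the data processing step, in Python rather than R.
# Assumption: each year's data is a dict mapping municipality -> {variable: value},
# and only municipalities (and variables) present in both years are kept.


def combine_years(data_2018, data_2019):
    """Return (average, difference) data sets for municipalities present in both years."""
    average, difference = {}, {}
    for m in sorted(data_2018.keys() & data_2019.keys()):
        shared = sorted(data_2018[m].keys() & data_2019[m].keys())
        average[m] = {v: (data_2018[m][v] + data_2019[m][v]) / 2 for v in shared}
        # 2019 minus 2018: the input to a first difference model.
        difference[m] = {v: data_2019[m][v] - data_2018[m][v] for v in shared}
    return average, difference


if __name__ == "__main__":
    # Hypothetical values for two municipalities.
    d2018 = {"A": {"NET": 4000.0, "NRES": 0.06}, "B": {"NET": 3000.0, "NRES": 0.05}}
    d2019 = {"A": {"NET": 4200.0, "NRES": 0.07}, "B": {"NET": 3100.0, "NRES": 0.05}}
    avg, diff = combine_years(d2018, d2019)
    print(avg["A"])   # averaged values used in the cross-sectional models
    print(diff["A"])  # year-to-year changes used in the first difference model
```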


Chapter 3

Method

To investigate the possibility of better explaining the differences in IFO-costs between the municipalities, multiple linear regression models and the OLS estimator were used. This is the method used today and for all previous versions of the IFO-model. Panel data and the first difference estimator were also tested, to make the most of having access to data from multiple time periods. Weighted least squares was also considered, but it was decided to use robust standard errors for the OLS estimates, for comparability with the model used today.

3.1 Linear regression model

The multiple linear regression model assumes a linear relationship between a dependent variable $y_i$ and a set of explanatory variables $x_i^T = (x_{i0}, x_{i1}, \dots, x_{iK})$, where $x_{i0} = 1$ for the intercept. Every single observation $i$ follows $y_i = x_i^T\beta + u_i$, where $\beta$ is a $(K+1)$-dimensional column vector of parameters, $x_i^T$ is a $(K+1)$-dimensional row vector and $u_i$ is a random error term with mean zero. The whole sample of $N$ observations can be expressed in matrix notation as $y = X\beta + u$, where $y$ is an $N$-dimensional column vector, $X$ is an $N \times (K+1)$ matrix and $u$ is an $N$-dimensional column vector of error terms, i.e.

$$
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}
=
\begin{pmatrix}
1 & x_{11} & \cdots & x_{1K} \\
1 & x_{21} & \cdots & x_{2K} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{N1} & \cdots & x_{NK}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_K \end{pmatrix}
+
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_N \end{pmatrix}. \qquad (3.1)
$$

Ordinary least squares (OLS) minimizes the sum of squared distances between the observed and the predicted values of the dependent variable $y_i$:

$$
S(\beta) = \sum_{i=1}^{N} (y_i - x_i^T\beta)^2 = (y - X\beta)^T(y - X\beta) \longrightarrow \min_{\beta}, \qquad (3.2)
$$

which gives the OLS estimator

$$
\hat\beta = (X^TX)^{-1}X^Ty. \qquad (3.3)
$$
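Equation (3.3) can be illustrated with a small, dependency-free sketch. The thesis used R; this Python version is only meant to show the computation. It solves the normal equations $(X^TX)\beta = X^Ty$ by Gauss-Jordan elimination rather than inverting $X^TX$ explicitly, and the singularity check uses a simple absolute tolerance (an assumption; statistical software uses scale-aware rank checks).

```python
# Minimal sketch of the OLS estimator beta_hat = (X^T X)^{-1} X^T y (equation 3.3)
# using plain Python lists. Instead of inverting X^T X explicitly, it solves the
# normal equations (X^T X) beta = X^T y by Gauss-Jordan elimination.


def solve(A, b):
    """Solve the square system A x = b with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (perfect multicollinearity)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def ols(X, y):
    """Return beta_hat; each row of X starts with 1 for the intercept."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


if __name__ == "__main__":
    # Hypothetical data generated exactly by y = 1 + 2*x1 - 0.5*x2.
    X = [[1, 0, 1], [1, 1, 0], [1, 2, 3], [1, 3, 1], [1, 4, 2]]
    y = [1 + 2 * row[1] - 0.5 * row[2] for row in X]
    print(ols(X, y))  # close to [1.0, 2.0, -0.5]
```

Because the example data contain no noise, the estimates recover the generating coefficients up to floating-point rounding; the `ValueError` branch corresponds to the perfect multicollinearity case, in which $X^TX$ cannot be inverted.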


The multiple linear regression model rests on the following assumptions:

1. Linearity
2. Exogeneity
3. Normality
4. No multicollinearity
5. Homoscedasticity and Non-autocorrelation

If assumptions 1, 2, 4 and 5 are met, the OLS estimator is unbiased and consistent, and if assumption 3 is also met, tests like the t-test and F-test are valid.

3.1.1 Linearity

Linearity means that the functional relationship between the dependent and explanatory variables is linear in the parameters, that the error term enters additively and that the parameters are constant across observations $i$. This assumption can be checked by plotting $y$ against each of the explanatory variables $x_j$ and visually inspecting the scatter plots for signs of non-linearity.

3.1.2 Exogeneity

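Exogeneity means that the error term has conditional mean zero given the explanatory variables, $E[u_i \mid x_i] = 0$, which implies that the error term and the explanatory variables are uncorrelated, $\mathrm{Cov}[u_i, x_i] = 0$. Violations of the exogeneity assumption lead to biased estimators.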

This assumption can be checked by plotting the residuals against the explanatory variables and looking for any kind of pattern, which would indicate that the functional form is misspecified. Another reason for violations of this assumption is that variables correlated with the explanatory variables have been omitted, which is difficult to check for.

3.1.3 Normality

Normality means that the error terms of the regression are normally distributed; in practice this is checked on the residuals. If this assumption is met, and the matrix of explanatory variables is deterministic, tests like the t-test and F-test are valid. This assumption may be checked visually with a histogram or a QQ-plot. Normality can also be checked with formal tests, e.g. the Shapiro-Wilk test.

Shapiro-Wilk test

The Shapiro-Wilk test [10] tests the null hypothesis that a sample $w_1, w_2, \dots, w_n$ comes from a normally distributed population. The test statistic is

$$
W = \frac{\left(\sum_{i=1}^{n} a_i w_{(i)}\right)^2}{\sum_{i=1}^{n} (w_i - \bar{w})^2}, \qquad (3.4)
$$

where $w_{(i)}$ is the $i$:th smallest value of the ordered sample and $\bar{w} = \sum_{i=1}^{n} w_i / n$ is the sample mean. The coefficients $a_i$ are given by

$$
(a_1, a_2, \dots, a_n) = \frac{m^T V^{-1}}{C},
$$

where $V$ and $m = (m_1, m_2, \dots, m_n)^T$ are the covariance matrix and the vector of expected values, respectively, of the order statistics of independent and identically distributed random variables from a standard normal distribution, and $C = \lVert V^{-1}m \rVert = (m^T V^{-1} V^{-1} m)^{1/2}$. If the p-value is less than the chosen $\alpha$-level, the null hypothesis is rejected.

3.1.4 No multicollinearity

Perfect multicollinearity occurs when an explanatory variable is a linear combination of one or more of the other explanatory variables. In the case of perfect multicollinearity, $X^TX$ is singular and the OLS estimator cannot be computed. Multicollinearity that is not perfect may still lead to large standard errors, or make small changes in the input data lead to large changes in the model. To check for bivariate correlation, the correlation matrix can be calculated, where any two variables with a correlation higher than 0.7 in absolute value indicate multicollinearity. However, it is possible that while no two variables are highly correlated, three or more together are multicollinear. To check for this, the variance inflation factor (VIF) can be calculated. The VIF is a direct measure of how much the variance of a coefficient estimate is inflated due to multicollinearity. For each of the explanatory variables $x_j$, the variance inflation factor may be calculated as

$$
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad (3.5)
$$

where $R_j^2$ is the proportion of the variance in $x_j$ that is predictable from the other explanatory variables when $x_j$ is regressed on them. A rule of thumb is that the VIF should be less than 5 in order to be accepted as not causing multicollinearity.
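Equation (3.5) can be made concrete with a short sketch: each explanatory variable is regressed on the others plus an intercept, and the resulting $R_j^2$ is turned into a VIF. This is plain Python for illustration only (the thesis used R, where the car package provides a `vif` function); the two example columns are hypothetical, and perfectly collinear columns are not handled and would raise a division error.

```python
# Sketch: VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the
# other explanatory variables plus an intercept (plain Python, no libraries).


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(X, y):
    """R^2 of the OLS regression of y on the columns of X (X includes a column of ones)."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = _solve(XtX, Xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Return one VIF per explanatory variable; columns excludes the intercept."""
    n = len(columns[0])
    result = []
    for j, xj in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        X = [[1.0] + [c[r] for c in others] for r in range(n)]
        result.append(1.0 / (1.0 - r_squared(X, xj)))
    return result


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 5.0]  # correlation 0.8 with x1
    print(vif([x1, x2]))  # both about 2.78 = 1 / (1 - 0.8**2)
```

With only two explanatory variables, $R_j^2$ is the squared correlation between them, so a correlation of 0.8 gives a VIF of about 2.78, still below the rule-of-thumb limit of 5 even though it is above the 0.7 correlation threshold mentioned earlier.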

3.1.5 Homoscedasticity and Non-autocorrelation

Homoscedasticity means that the variance of the error terms is equal across the values of the explanatory variables. This assumption can be checked by plotting the standardized residuals against the predicted values. The points should be randomly scattered around a horizontal line. In contrast, any systematic pattern or clustering of points indicates that the assumption is not met and that there is heteroscedasticity instead. The homoscedasticity assumption can also be checked using formal tests, e.g. the Breusch-Pagan test. There should also not be any correlation between the error terms. Autocorrelation normally occurs in time-series data, but it may also occur in cross-sectional data if observations are dependent in aspects other than time. For example, measurements made at nearby locations may be closer in value than measurements made at locations farther apart. Whenever some ordering of the sampling units is present, autocorrelation may arise. This phenomenon is called spatial, or serial, autocorrelation. Visual checks of the residuals can detect autocorrelation, or tests like the Durbin-Watson test can be used on ordered data such as time series or ordered spatial series.

Breusch-Pagan test

It is assumed that the heteroscedasticity has the form $\sigma_i^2 = h(z_i^T\gamma)$ for the variances of the observations, where $z_i = (1, z_{1i}, z_{2i}, \dots, z_{qi})$ explains the differences in the variances. The null hypothesis of homoscedasticity is $\gamma_1 = \gamma_2 = \dots = \gamma_q = 0$. Fit the original model and compute the residuals $\hat u_i = y_i - x_i^T\hat\beta$. Regress $\hat u_i^2$ on $z_i$ using ordinary least squares and compute $R^2$. Under the null hypothesis, the test statistic $nR^2$ is asymptotically $\chi^2_q$-distributed. This is the studentized Breusch-Pagan test suggested by Koenker, which is the version used in R [12].
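The steps above translate directly into code. The sketch below computes the studentized (Koenker) statistic $nR^2$ in plain Python; it is an illustration only and does not compute a p-value. In R, the lmtest function `bptest` computes this version by default (`studentize = TRUE`). The inputs `X` and `Z` are design matrices whose rows start with 1; which variables to put in `Z` is up to the analyst.

```python
# Sketch of the studentized (Koenker) Breusch-Pagan statistic n * R^2 in plain
# Python. X holds the regressors of the original model and Z the variables z_i
# assumed to drive the variance; rows of both start with 1 for the intercept.


def _solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def _ols(X, y):
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return _solve(XtX, Xty)


def _fitted(X, beta):
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def _r_squared(X, y):
    fitted = _fitted(X, _ols(X, y))
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0.0:
        return 0.0  # constant y: nothing to explain
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def breusch_pagan_koenker(X, y, Z):
    """Return (n * R^2, q) from regressing squared OLS residuals on Z."""
    residuals = [yi - fi for yi, fi in zip(y, _fitted(X, _ols(X, y)))]
    squared = [u * u for u in residuals]
    return len(y) * _r_squared(Z, squared), len(Z[0]) - 1
```

The statistic is compared with the $\chi^2_q$ distribution; for $q = 1$ the 5 % critical value is about 3.84, so larger values would lead to rejecting homoscedasticity at that level.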

Durbin-Watson test

The Durbin-Watson test [13] tests the null hypothesis that there is no autocorrelation. The test statistic is

$$
DW = \frac{\sum_{i=2}^{N} (\hat u_i - \hat u_{i-1})^2}{\sum_{i=1}^{N} \hat u_i^2} \approx 2(1 - \hat\rho), \qquad (3.6)
$$

where $\hat u_i$ are the residuals of the regression. Since DW is approximately equal to $2(1 - \hat\rho)$, where $\hat\rho$ is the sample first-order autocorrelation of the residuals, a DW value around 2 indicates $\hat\rho \approx 0$, i.e. no autocorrelation.
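Equation (3.6) is simple enough to compute by hand, and a short sketch makes the link between the value of DW and the sign of the autocorrelation visible. The residual vectors below are hypothetical.

```python
def durbin_watson(residuals):
    """DW = sum_{i>=2} (u_i - u_{i-1})^2 / sum_i u_i^2, for residuals in their given order."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(u * u for u in residuals)
    return numerator / denominator


if __name__ == "__main__":
    print(durbin_watson([1.0, 2.0, -1.0, 0.0, -2.0]))  # 1.5
    print(durbin_watson([1.0, -1.0, 1.0, -1.0]))       # 3.0: alternating signs, negative autocorrelation
    print(durbin_watson([1.0, 1.0, 1.0, 1.0]))         # 0.0: extreme positive autocorrelation
```

Because the statistic depends on the order of the residuals, for cross-sectional municipality data it is only informative when the rows are sorted in a meaningful way, for example geographically, in line with the spatial autocorrelation discussed above.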

3.2 Standard cost

Today the municipalities' standard costs are calculated as in equation (3.7), where the coefficients $\beta$ are multiplied by the values of the explanatory variables for each municipality. The coefficients were estimated using a regression model of the form $y = X\beta + u$, where the average net cost for individual and family care in 2014–2016 was explained by the explanatory variables: proportion of children aged 0–19 who live below the income standard, proportion of residents aged 20–40, born in Sweden, with a low education level, the square root of the urban population (updated every 5 years), the proportion of the residents living in apartment buildings built 1965–1975, the number of days with sickness benefits for residents aged 16–64, and the proportion of border commuters.

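$$
\begin{aligned}
\text{Standard cost}_t = \Big( &\big(\beta_0 + \beta_1 \cdot \mathrm{CBIS}_{t-2} + \beta_2 \cdot \mathrm{LEL}_{t-2} + \beta_3 \cdot \mathrm{SqUP}_{2015} \\
&+ \beta_4 \cdot \mathrm{ApB}_{t-2} + \beta_5 \cdot \mathrm{SDB}_{t-2} + \beta_6 \cdot \mathrm{BC}_{t-2}\big) \cdot \text{adjustment factor} \\
&+ \text{supplements/deductions for wage costs} \Big) \cdot \mathrm{KPIF}_{t-1} \cdot \mathrm{KPIF}_t
\end{aligned} \qquad (3.7)
$$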

The adjustment factor adjusts for the difference between the calculated average cost and the average cost reported by the municipalities in The Annual Municipal Accounts (Räkenskapssammandraget, RS). Since most of the basis for this equalization is from the year 2019, the costs must be projected forward to the equalization year 2021. To do this, a forecast of the consumer price index with fixed interest rate (KPIF), made by the Ministry of Finance (Finansdepartementet) for the years 2020 and 2021, is used.
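The structure of equation (3.7) can be shown with a small function: a linear predictor, scaled by the adjustment factor, shifted by the wage-cost supplement or deduction, and then projected forward by two KPIF factors. This is a sketch, not the official SCB calculation: all coefficient and input values are hypothetical, the adjustment and wage terms are passed in as plain numbers, and the KPIF factors are assumed to be growth factors (for example 1.02 for a 2 % increase).

```python
# Sketch of the structure of equation (3.7). All coefficient and input values
# are hypothetical; KPIF factors are assumed to be growth factors (1.02 = +2 %).


def standard_cost(beta, x, adjustment_factor, wage_adjustment, kpif_prev, kpif_curr):
    """beta: {'const': b0, variable: coefficient}; x: {variable: value} for one municipality."""
    linear = beta["const"] + sum(beta[name] * value for name, value in x.items())
    return (linear * adjustment_factor + wage_adjustment) * kpif_prev * kpif_curr


if __name__ == "__main__":
    beta = {"const": 100.0, "CBIS": 1000.0, "SqUP": 2.0}  # hypothetical coefficients
    x = {"CBIS": 0.1, "SqUP": 50.0}                       # hypothetical t-2 / 2015 values
    print(standard_cost(beta, x, 1.5, 10.0, 1.02, 1.03))
```

Here the linear predictor is 300, the adjusted value including the wage term is 460, and the two KPIF factors bring it to about 483.3.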


3.3 New variables

The cost equalization should be based on structural differences in costs. Structural costs are costs that cannot be affected by the municipalities, such as those arising from the distribution of age and gender, socioeconomics, descent and geographical conditions [6]. The following variables are not included in the regression model today but can be expected to have an impact on the municipalities' costs for IFO:

Newly registered with the employment service

If the number of unemployed residents increases, an increase in costs for financial support can be expected. There are also connections between unemployment and a decrease in mental health [7] and an increased risk of substance abuse [8], which, besides an increased need for substance abuse care, could lead to an increased need for family support, such as counseling, foster care or respite care.

Single parent household

Being a single parent can be challenging, not only financially but also socially. An increase in the number of single parent households could therefore be expected to result in an increased need for family counseling or respite care.

Proportion of young adults

Calculations using data from the National Board of Health and Welfare (Socialstyrelsen) [9] on the number of people receiving financial support in different age groups show that young adults are by far the age group with the highest proportion receiving financial support: 7.17% of the population between 18 and 24 years receive financial support, in contrast to only 3.36% of the population between 25 and 65. It was therefore considered reasonable to expect that municipalities with a large proportion of young adults will have higher costs for financial support.

Proportion of older residents

The opposite reasoning can be applied to municipalities with a large proportion of elderly residents: their costs for financial support can be expected to be lower. The need for child and youth care and family support should also be lower in municipalities where the elderly population is large.

3.4 Models

Since data for 2014–2016 were not available, both the original model and the model including all suggested new variables were estimated using the average of the years 2018–2019.


Original model

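$$
\begin{aligned}
\mathrm{NET} = \beta_0 &+ \beta_1 \cdot \mathrm{CBIS} + \beta_2 \cdot \mathrm{LEL} + \beta_3 \cdot \mathrm{SqUP} \\
&+ \beta_4 \cdot \mathrm{ApB} + \beta_5 \cdot \mathrm{SDB} + \beta_6 \cdot \mathrm{BC} + u.
\end{aligned} \qquad (3.8)
$$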

Full model

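$$
\begin{aligned}
\mathrm{NET} = \beta_0 &+ \beta_1 \cdot \mathrm{CBIS} + \beta_2 \cdot \mathrm{LEL} + \beta_3 \cdot \mathrm{SqUP} \\
&+ \beta_4 \cdot \mathrm{ApB} + \beta_5 \cdot \mathrm{SDB} + \beta_6 \cdot \mathrm{BC} + \beta_7 \cdot \mathrm{NRES} \\
&+ \beta_8 \cdot \mathrm{SPH} + \beta_9 \cdot \mathrm{YA} + \beta_{10} \cdot \mathrm{OR} + u.
\end{aligned} \qquad (3.9)
$$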

To see whether one or more of the explanatory variables contribute significantly to explaining the variation in $y$, a Wald test [11] can be performed.
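For the comparison at hand, whether the four new variables add explanatory power to the original model, the Wald statistic for zero restrictions can equivalently be computed from the residual sums of squares (SSR) of the two fitted models. The sketch below assumes classical homoscedastic OLS, where this SSR form coincides with the F statistic in equation (3.10); since the thesis uses robust standard errors, a heteroscedasticity-consistent Wald test (for example `lmtest::waldtest` with a covariance matrix from the sandwich package) would be the closer match in practice. The SSR values are hypothetical.

```python
# Sketch: F statistic for q linear restrictions computed from the residual sums
# of squares (SSR) of the restricted and unrestricted OLS fits. Under homoscedastic
# OLS this equals the Wald F statistic of equation (3.10).


def wald_f(ssr_restricted, ssr_unrestricted, q, n, k):
    """n observations, k explanatory variables (excluding intercept) in the unrestricted model."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k - 1)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical SSR values: original model (restricted, K = 6) versus full
    # model (unrestricted, K = 10), i.e. q = 4 zero restrictions on 290 municipalities.
    f = wald_f(ssr_restricted=120.0, ssr_unrestricted=100.0, q=4, n=290, k=10)
    print(f"F = {f:.2f} on (4, {290 - 10 - 1}) degrees of freedom")
```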

Wald test

Let $\hat\beta$ be the estimator of the $K+1$ parameters, with $\hat\beta \sim N(\beta, \sigma^2(X^TX)^{-1})$. A test of $q$ hypotheses on the $K+1$ parameters is expressed with a $q \times (K+1)$ matrix $R$. The null hypothesis is $R\beta = r$, where $r$ is the $q \times 1$ vector of restrictions. For unknown $\sigma^2$, with estimator $\hat\sigma^2 = \hat u^T\hat u / (N - K - 1)$, the test statistic is given by

$$
F = \frac{N - K - 1}{q} \cdot \frac{(R\hat\beta - r)^T \left[R(X^TX)^{-1}R^T\right]^{-1} (R\hat\beta - r)}{\hat u^T \hat u} \sim F_{q,\,N-K-1}. \qquad (3.10)
$$

The adjusted R-squared

$R^2$, the coefficient of determination, measures how much of the variation in $y$ is explained by the explanatory variables. Generally, a higher $R^2$ indicates a better fitted model. However, adding more variables to a regression never decreases $R^2$, so to avoid overfitting the model, the adjusted $\bar R^2$ can be used instead. $\bar R^2$ is adjusted to compensate for the increase in $R^2$ that occurs when more explanatory variables are added.
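The adjustment can be written as $\bar R^2 = 1 - (1 - R^2)\frac{N - 1}{N - K - 1}$, where $K$ is the number of explanatory variables excluding the intercept. A short sketch with a hypothetical $R^2$ shows how the penalty grows when the model goes from 6 to 10 explanatory variables with 290 municipalities.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k explanatory variables (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


if __name__ == "__main__":
    # Same hypothetical R^2 = 0.46 with 290 municipalities:
    print(round(adjusted_r2(0.46, 290, 6), 4))   # original model size
    print(round(adjusted_r2(0.46, 290, 10), 4))  # full model size: larger penalty
```

With an unchanged $R^2$, the penalty factor rises from 289/283 to 289/279, so the four extra variables must raise $R^2$ enough to offset it before $\bar R^2$ increases.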

AIC for model selection

To find the best fitted model, the Akaike information criterion (AIC) can be used [14]. For a set of candidate models, the AIC value is calculated as in equation (3.11), and the preferred model is the one with the minimum AIC value. AIC rewards goodness of fit but also includes a penalty for the number of estimated parameters, which discourages overfitting.

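$$
\mathrm{AIC} = 2k - 2\log(\hat L), \qquad (3.11)
$$

where $k$ is the number of estimated parameters and $\hat L$ is the maximized value of the likelihood function of the model.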
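For a linear regression with normal errors, the maximized log-likelihood depends on the data only through the residual sum of squares (SSR), so equation (3.11) can be evaluated directly from an OLS fit. The sketch below assumes this Gaussian model and counts the error variance as an estimated parameter, as R's `AIC()` does for linear models. `MASS::stepAIC` ranks models with `extractAIC`, which drops additive constants, so its values differ from this function by an amount that is the same for all models fitted to the same data.

```python
import math


def gaussian_aic(ssr, n, n_coef):
    """AIC = 2k - 2 log(L_hat) for a normal linear model fitted by OLS.

    n_coef counts the regression coefficients including the intercept; the
    error variance (estimated as ssr / n by maximum likelihood) adds one more parameter.
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(ssr / n) + 1.0)
    k = n_coef + 1
    return 2 * k - 2 * log_lik


if __name__ == "__main__":
    # Hypothetical fits on the same 290 observations: the larger model must
    # reduce the SSR enough to pay for its extra parameters.
    print(gaussian_aic(ssr=1.00e9, n=290, n_coef=7))
    print(gaussian_aic(ssr=0.97e9, n=290, n_coef=11))
```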


The stepAIC function in R

The stepAIC procedure (from the MASS package) starts with the full model and, in the first step, calculates the AIC for the model, then removes one variable at a time, calculates the AIC for each of those models and deletes the variable whose removal lowers the AIC the most. The second step starts with the model selected in the previous step and again removes one variable at a time and calculates the AIC, but now also tests re-including the eliminated variables before deciding which variable to eliminate. Step two is repeated until the AIC value no longer decreases.
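The greedy search just described can be sketched independently of any model fitting by passing the AIC in as a function of the set of included variables. This is a simplified illustration of the logic, not the MASS implementation, which offers further options (scope, direction, penalty). The variable names `x1` to `x5` and the toy AIC function are hypothetical.

```python
# Sketch of the greedy "both directions" search described above, starting from
# the full model. This is not the MASS implementation; the AIC is supplied as a
# function of the set of included variables.


def stepwise_aic(variables, aic):
    """Return (selected variables, AIC) found by greedy drop/re-add steps."""
    full = frozenset(variables)
    current, current_aic = full, aic(full)
    while True:
        # Drop one included variable, or re-add one previously eliminated variable.
        candidates = [current - {v} for v in current]
        candidates += [current | {v} for v in full - current]
        if not candidates:
            return current, current_aic
        best = min(candidates, key=aic)
        best_aic = aic(best)
        if best_aic >= current_aic:  # stop when the AIC no longer decreases
            return current, current_aic
        current, current_aic = best, best_aic


if __name__ == "__main__":
    # Hypothetical AIC: each variable costs 2 but improves fit by gain[v].
    gain = {"x1": 5.0, "x2": 10.0, "x3": 1.0, "x4": 8.0, "x5": 0.0}

    def toy_aic(included):
        return 100.0 + 2.0 * len(included) - sum(gain[v] for v in included)

    selected, value = stepwise_aic(gain, toy_aic)
    print(sorted(selected), value)
```

In the first step only removals are possible, because nothing has been eliminated yet; from the second step on, previously dropped variables are also candidates for re-inclusion, matching the description of stepAIC above. With the toy AIC, `x5` and then `x3` are dropped, since each costs more in penalty than it gains in fit.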

3.5 Panel data

Panel data consist of observations on the same $N$ units at two or more time periods and can be useful when it is suspected that the outcome variable depends on explanatory variables that are not observable but are correlated with the observed explanatory variables. If such omitted variables are constant over time, the effect of the observed explanatory variables can be consistently estimated using panel data estimators.

The multiple linear regression model for unit $i = 1, \dots, N$ observed at several time periods $t = 1, \dots, T$ is written $y_{it} = \alpha + x_{it}^T\beta + v_i^T\lambda + c_i + u_{it}$, where $y_{it}$ is the dependent variable, $x_{it}^T$ is a $K$-dimensional row vector of time-varying explanatory variables, $v_i^T$ is an $M$-dimensional row vector of time-invariant explanatory variables excluding the constant, $\alpha$ is the intercept, $\beta$ and $\lambda$ are $K$- and $M$-dimensional column vectors of parameters respectively, $c_i$ is an individual-specific effect and $u_{it}$ is an error term.

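First difference estimator

For unit i in periods t and t − 1 we have the equations

$$y_{it} = \alpha + x_{it}^{T}\beta + v_{i}^{T}\lambda + c_i + u_{it} \tag{3.12}$$

and

$$y_{i,t-1} = \alpha + x_{i,t-1}^{T}\beta + v_{i}^{T}\lambda + c_i + u_{i,t-1}. \tag{3.13}$$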

Subtracting (3.13) from (3.12) eliminates the intercept, the individual-specific effect and the time-invariant explanatory variables, and the first difference model is given by

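$$\tilde{y}_{it} = \tilde{x}_{it}^{T}\beta + \tilde{u}_{it}, \tag{3.14}$$

where $\tilde{y}_{it} = y_{it} - y_{i,t-1}$, $\tilde{x}_{it} = x_{it} - x_{i,t-1}$ and $\tilde{u}_{it} = u_{it} - u_{i,t-1}$. Using OLS to estimate $\beta$ in (3.14) yields the first difference estimator

$$\hat{\beta}_{FD} = \left(\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde{x}_{it}\tilde{x}_{it}^{T}\right)^{-1}\sum_{i=1}^{N}\sum_{t=2}^{T}\tilde{x}_{it}\tilde{y}_{it}.$$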
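To illustrate why differencing helps, the sketch below builds a small, noise-free panel from hypothetical numbers. It has five units, T = 2, a single regressor, and a unit effect $c_i$ that is correlated with $x_{it}$. With one regressor, the first difference estimator reduces to a slope through the origin, $\sum \tilde{x}\tilde{y} / \sum \tilde{x}^2$. Pooled OLS on the levels ignores $c_i$ and is biased, while the first difference estimate recovers the true $\beta = 2$.

```python
def slope_through_origin(x, y):
    """OLS slope without intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def pooled_slope(x, y):
    """OLS slope with intercept, treating all observations as one cross-section."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical panel: five units observed in periods t-1 and t.
x_prev = [0.0, 1.0, 2.0, 3.0, 4.0]
x_curr = [1.0, 1.5, 4.0, 4.5, 7.0]
c = [3.0 * i for i in range(5)]  # unit effect, correlated with x
alpha, beta = 1.0, 2.0

y_prev = [alpha + beta * x + ci for x, ci in zip(x_prev, c)]
y_curr = [alpha + beta * x + ci for x, ci in zip(x_curr, c)]

# First differences: alpha and c_i cancel, as in equation (3.14).
dx = [b - a for a, b in zip(x_prev, x_curr)]
dy = [b - a for a, b in zip(y_prev, y_curr)]

beta_fd = slope_through_origin(dx, dy)
beta_pooled = pooled_slope(x_prev + x_curr, y_prev + y_curr)
print(beta_fd)      # 2.0
print(beta_pooled)  # about 3.87, pulled away from 2 by c_i
```

With K regressors, the two sums become the K × K matrix and K-vector in the estimator above.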

Chapter 4

Results and analysis

To decide which variables to include in the model, the original model and the full model, which includes all new variables, were first estimated using OLS. The OLS assumptions were tested, and $\bar{R}^2$ and the p-values were used to decide which variables to eliminate and which to keep. Cross-sectional data do not always capture the effect of a change within an individual municipality. The data from the two available years were therefore used to create a panel data set. The first difference estimator was then used to study how year-to-year changes in the explanatory variables affect the change in the dependent variable.

4.1 Models

Regressing net cost on the explanatory variables in the original model, using data from the 290 Swedish municipalities for 2018-2019, gave the estimates in the first column of table 4.1. The $\bar{R}^2$ for this model is 0.4356, and all included variables are significant at least at level α = 0.1. Adding the four new explanatory variables gave the estimates in the second column of table 4.1. $\bar{R}^2$ increases to 0.4485, and the new variable NRES is significant at level α = 0.05.

4.1.1 Assumptions of multiple linear regression

The assumptions of multiple linear regression were checked for the full model.

Linearity

The dependent variable was plotted against each of the explanatory variables, and no non-linear patterns could be detected. The plots are shown in figure 2.1.

Normality

The residuals were checked visually for normality using a QQ-plot. The plot shows the quantiles of the ordered residuals from the full model against the expected values of the order statistics of a standard normal distribution. The result can be seen in figure 4.1. The Shapiro-Wilk test gives the statistic W = 0.99296 with p-value p = 0.1892, so the null hypothesis of normality cannot be rejected.

| Variable | Original | Full | Selected 1 | Selected 2 |
|---|---|---|---|---|
| Intercept | 415.2 (338.7) | -1101 (900.3) | -635.8 (607.6) | -757.7 (597.8) |
| CBIS | 12121 *** (1903) | 9702 *** (2078) | 9817 *** (2098) | 9967 *** (2094) |
| LEL | 28567 * (11596) | 22688 (14213) | 29095 * (11532) | 33277 ** (10754) |
| SqUP | 2.918 * (1.129) | 2.594 * (1.150) | 2.720 * (1.085) | 3.162 ** (0.9881) |
| ApB | 2612 . (1551) | 1397 (1436) | 1664 (1415) | - |
| SDB | 40.36 *** (11.11) | 33.92 * (13.35) | 30.37 ** (11.38) | 26.41 * (11.56) |
| BC | -522.4 ** (186.9) | -407.5 * (188.5) | -409.2 * (192.4) | -428.2 * (192.3) |
| NRES | - | 11487 * (5242) | 13006 ** (5280) | 13849 ** (4913) |
| SPH | - | 19398 (20073) | - | - |
| YA | - | 7335 (4767) | 6387 (4602) | 7448 (4840) |
| OR | - | 87.02 (1851) | - | - |
| $R^2$ | 0.4473 | 0.4676 | 0.4656 | 0.4628 |
| $\bar{R}^2$ | 0.4356 | 0.4485 | 0.4504 | 0.4495 |

Table 4.1: Estimates with standard errors (in parentheses) from the four models. Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

Figure 4.1: QQ-plot of the residuals from the full model
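The coordinates of a normal QQ-plot can be computed with the Python standard library alone, as in the minimal sketch below. It assumes Blom's plotting positions, (i − 3/8)/(n + 1/4), as an approximation to the expected standard normal order statistics; the residuals are hypothetical. The Shapiro-Wilk statistic itself is more involved and is better taken from an existing implementation, such as R's `shapiro.test` or `scipy.stats.shapiro`.

```python
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical quantile, ordered residual) pairs for a normal QQ-plot.

    The expected values of the standard normal order statistics are
    approximated with Blom's plotting positions (i - 3/8) / (n + 1/4).
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    theoretical = [std_normal.inv_cdf((i - 0.375) / (n + 0.25))
                   for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical residuals; points close to a straight line suggest normality.
for q, r in qq_points([0.3, -1.2, 0.0, 2.1, -0.4]):
    print(f"{q:+.3f}  {r:+.2f}")
```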

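| | CBIS | LEL | SqUP | ApB | SDB | BC | NRES | SPH | YA | OR |
|---|---|---|---|---|---|---|---|---|---|---|
| VIF | 2.215 | 2.0628 | 2.200 | 2.195 | 2.531 | 1.271 | 2.092 | 2.114 | 1.340 | 2.792 |

Table 4.2: VIF values for the explanatory variables in the full model.

Exogeneity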

The residuals were plotted against the explanatory variables (figure 4.2), and no obvious patterns could be detected, which suggests that the model has the correct functional form. However, omitted variables may still be a problem.

No multicollinearity

The variance inflation factor (VIF) was calculated for each explanatory variable in the full model; the values are shown in table 4.2. All values are below 3, well under the commonly used rule-of-thumb thresholds of 5 or 10, so none of them indicate the presence of multicollinearity.

Homoscedasticity and non-autocorrelation

The Breusch-Pagan test was performed on the full model, giving the statistic χ² = 24.50 with p < 0.001. The null hypothesis of constant error variance was therefore rejected, and Eicker-Huber-White (heteroscedasticity-robust) standard errors were used for the full model to account for the heteroscedasticity.
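Both ideas can be sketched for a model with a single explanatory variable. The sketch makes two assumptions:

- The Breusch-Pagan statistic takes the studentized (Koenker) form, n·R² from regressing the squared residuals on the regressor.
- The Eicker-Huber-White standard error is the HC0 variant.

The thesis used the full model with ten explanatory variables, and the R implementation referenced in [12] may use a different variant of the test. All function names are our own, and the data are hypothetical.

```python
def ols_line(x, y):
    """Intercept, slope and residuals of the simple regression y = b0 + b1*x + e."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [b - b0 - b1 * a for a, b in zip(x, y)]
    return b0, b1, resid


def breusch_pagan_lm(x, resid):
    """Studentized Breusch-Pagan statistic: n * R^2 from regressing the squared
    residuals on x. Under homoscedasticity it is approximately chi-square(1)."""
    sq = [e * e for e in resid]
    _, _, aux_resid = ols_line(x, sq)
    mean_sq = sum(sq) / len(sq)
    tss = sum((s - mean_sq) ** 2 for s in sq)
    if tss == 0:
        return 0.0  # identical squared residuals: no sign of heteroscedasticity
    r2 = 1 - sum(u * u for u in aux_resid) / tss
    return len(x) * r2


def hc0_slope_se(x, resid):
    """Eicker-Huber-White (HC0) standard error of the slope in y = b0 + b1*x + e."""
    mx = sum(x) / len(x)
    sxx = sum((a - mx) ** 2 for a in x)
    meat = sum((a - mx) ** 2 * e * e for a, e in zip(x, resid))
    return (meat / sxx ** 2) ** 0.5


# Hypothetical data whose spread grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.8, 3.5, 3.0, 6.5, 4.0]
_, slope, e = ols_line(x, y)
print(breusch_pagan_lm(x, e), hc0_slope_se(x, e))
```

The HC0 standard error weights each squared residual by its own leverage term instead of assuming a common error variance. This is why it remains valid when the Breusch-Pagan test rejects homoscedasticity.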

Since time-series data were not used, autocorrelation over time was not a concern. However, spatial autocorrelation was thought to be a potential problem. In an attempt to test for this type of autocorrelation, the observations were ordered by municipality group, as defined in table 4.3. A Durbin-Watson test was then used, which gave the test statistic DW = 1.844 with p = 0.0737. The null hypothesis of zero spatial autocorrelation could not be rejected at level α = 0.05. At level α = 0.1, however, it would be rejected, indicating spatial dependence between municipalities that are similar to each other.
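The Durbin-Watson statistic is $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. It depends on the order of the observations, which is why the municipalities had to be ordered before the test could say anything about spatial dependence. Since $DW \approx 2(1 - \hat{\rho})$, where $\hat{\rho}$ is the first-order autocorrelation of the residuals, the observed DW = 1.844 corresponds to $\hat{\rho} \approx 0.078$. The sketch below uses hypothetical residuals to show how reordering alone changes the statistic.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in the order given."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


# The same four residuals in two different orders (hypothetical values).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0: neighbours have opposite signs
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0: neighbours tend to agree
```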

| Name | Group | Group description |
|---|---|---|
| A1 | Large cities | Population of at least 200 000 in municipality's largest urban area |
| A2 | Commute municipality close to large city | At least 40% commute to a large city |
| B3 | Medium cities | Population between 40 000 and 200 000 in municipality's largest urban area |
| B4 | Commute municipality close to medium city | At least 40% commute to a medium city |
| B5 | Low commute municipality close to medium city | Less than 40% commute to a medium city |
| C6 | Small cities | Population between 15 000 and 40 000 in municipality's largest urban area |
| C7 | Commute municipality close to small city | At least 30% commute to or from small cities |
| C8 | Rural areas | Population less than 15 000 in municipality's largest city, low commute patterns |
| C9 | Rural areas with tourist industries | Rural area that meets the criteria for tourist industries, i.e. number of guest nights and turnover in retail / hotel / restaurant in relation to the size of the population |

Table 4.3: Municipality group descriptions.

4.2 Model selection

To test whether the new variables in the full model are jointly statistically significant, a Wald test was used, with robust standard errors to account for the heteroscedasticity. The null hypothesis is that the coefficients of all four new variables are equal to zero. The resulting statistic is F = 2.647 with p = 0.0338, so the null hypothesis can be rejected at level α = 0.05. However, not all of the new variables are individually significant. Their p-values were used to choose which variables to keep and which to eliminate from the model.

4.2.1 Selected models

In the first selected model, all of the original variables were kept, and the variables OR and SPH were eliminated. The estimates and their robust standard errors are shown in table 4.1. $\bar{R}^2$ increases to 0.4504.

In the second selected model, the variable ApB from the original model was also eliminated; the results of this regression are shown in table 4.1. A Wald test comparing the two selected models was performed to see whether the removed variable ApB is statistically significant. The resulting statistic was F = 1.384 with p = 0.2405, so the null hypothesis that the coefficient of ApB is equal to zero could not be rejected. On this basis, the second selected model is the one suggested to replace the model used today.
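The F-form of the Wald statistic for the hypothesis that q coefficients are all zero is $F = \hat{\beta}_q^{T}\hat{V}_q^{-1}\hat{\beta}_q / q$. Here $\hat{\beta}_q$ holds the q tested estimates and $\hat{V}_q$ their robust covariance matrix.

With a single restriction, F is simply the squared robust t-ratio. Using the ApB entry for Selected 1 in table 4.1, 1664 with standard error 1415, gives (1664/1415)² ≈ 1.383. This agrees with the reported F = 1.384 up to rounding of the table entries.

The joint test of the four new variables would also need the covariances between the estimates, which are not reported. The sketch below is a plain-Python illustration with hypothetical function names; a p-value would additionally need the F distribution, e.g. `scipy.stats.f.sf`.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def wald_f(beta, cov):
    """F-form Wald statistic for H0: all coefficients in beta are zero.

    beta holds the q tested estimates and cov their (robust) covariance matrix;
    the statistic is beta' cov^{-1} beta / q.
    """
    z = solve(cov, beta)
    return sum(b * zi for b, zi in zip(beta, z)) / len(beta)


# One restriction: ApB in "Selected 1" of table 4.1, estimate 1664, robust s.e. 1415.
print(round(wald_f([1664.0], [[1415.0 ** 2]]), 3))  # 1.383
```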

AIC

To check that the best model had been chosen, the stepAIC function in R was used to search, starting from the full model, for the model with the lowest AIC value. Table 4.4 shows the results: the model with the lowest AIC value is the same as the model suggested in the previous section.

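| Variable | Start | First step | Second step | Third step |
|---|---|---|---|---|
| CBIS | + | + | + | + |
| LEL | + | + | + | + |
| SqUP | + | + | + | + |
| ApB | + | + | - | - |
| SDC | + | + | + | + |
| BC | + | + | + | + |
| NRES | + | + | + | + |
| SPH | + | + | + | - |
| YA | + | + | + | + |
| OR | + | - | - | - |
| AIC | 3940.82 | 3938.82 | 3937.82 | 3937.40 |

Table 4.4: AIC values from the stepAIC function in R. A + indicates that the variable is included in the model and a - that it has been removed.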

4.3 First difference model

A panel was created using data for the years 2018 and 2019, and the first difference estimator was used to obtain the estimates in the first column of table 4.5. The variable SqUP drops out because it is only calculated every five years and therefore did not change between 2018 and 2019. Other, unobserved variables also drop out if they are constant over time, which eliminates some of the omitted variable bias that may be present.

One outlier was found in the data: the observation for Färgelanda, where the net cost had more than doubled, from 4606 SEK in 2018 to 9267 SEK in 2019. This could be due to a change in the municipality's way of reporting its costs, and the observation was removed from the data.

$\bar{R}^2$ was only 0.0363, and many of the variables in this model were insignificant at all conventional significance levels. However, the variable OR, which was not significant in the models in the previous section, is now significant. All insignificant variables were removed from the model. The variable BC was also removed, even though it is significant at level α = 0.1, because the variable CBIS, which BC is meant to adjust, was insignificant.

Running the regression again using the reduced model gave the estimates in the second column of table 4.5, with $\bar{R}^2$ = 0.0349. The signs of two of the remaining variables have changed compared to the estimates in table 4.1. The difference in results between the two methods indicates that the model used today may explain some of the differences between municipalities. It does not, however, really capture the effect of a change in the explanatory variables within an individual municipality.

4.4 Outcomes

The outcomes of the cost equalization were calculated for the original model and the new model and then compared, to see how a change of model would affect the municipalities.

4.4.1 Net cost deviations

To see how much the estimated standard costs from the different models differ from the municipalities' actual costs for the IFO-activities, the net cost deviations were calculated as

$$\text{Net cost deviation} = \frac{\text{Net cost} - \text{Reference cost}}{\text{Reference cost}} \tag{4.1}$$

where the reference cost is the standard cost before it is adjusted to the 2021 cost level. To obtain a total deviation, the absolute values of the individual net cost deviations were summed; the results are presented in table 4.6. The first row contains the statistics for the net cost deviations calculated with the model in use today, whose coefficients were estimated on data from 2014-2016. All three new models gave smaller total deviations than the original model, even when the original model was estimated on the 2018-2019 data.

A net cost deviation of zero would not be desirable. It would mean that the model captures not only the structural costs but also the municipalities' different ambition levels and efficiency. Still, smaller deviations can be seen as a sign of a better-fitting model.
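Equation 4.1 and the total deviation in table 4.6 amount to a relative difference per municipality followed by a sum of absolute values. The minimal sketch below uses three hypothetical municipalities with made-up costs; the function names are illustrative.

```python
def net_cost_deviations(net_costs, reference_costs):
    """Relative deviations (net cost - reference cost) / reference cost, equation (4.1)."""
    return [(net - ref) / ref for net, ref in zip(net_costs, reference_costs)]


def total_deviation(deviations):
    """Sum of the absolute deviations, the 'Total deviation' column of table 4.6."""
    return sum(abs(d) for d in deviations)


# Hypothetical net and reference costs for three municipalities.
devs = net_cost_deviations([5000.0, 3000.0, 4500.0], [4000.0, 4000.0, 4500.0])
print(devs)                   # [0.25, -0.25, 0.0]
print(total_deviation(devs))  # 0.5
```

Taking absolute values before summing stops positive and negative deviations from cancelling. Without them, the first two municipalities above would sum to zero despite both being 25% off.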

4.4.2 Fees or grants

To see how the municipalities would be affected in the cost equalization by changing from the model used today to the one suggested here, the differences between the outcomes were calculated in SEK per resident. The smallest change was for Vänersborg (0.0177) and the largest for Åsele (750.5). A positive change means that the municipality would either receive a larger grant or pay a smaller fee. Table 4.7 shows how many municipalities would experience a large, medium or small change, positive or negative.

The numbers of municipalities that receive grants and that pay fees would remain stable. The receivers would increase by one, from 68 to 69 municipalities, and the payers would decrease by one, from 222 to 221. Nine municipalities would go from receiving grants to paying fees, and ten municipalities would receive grants instead of paying fees.

| Variable | FD full | FD reduced |
|---|---|---|
| Intercept | 111.7 . (64.69) | 84.25 . (44.14) |
| CBIS | 4020 (2658) | - |
| LEL | -75231 * (32768) | -68755 * (32333) |
| ApB | 8889 (8826) | - |
| SDB | -9.48 (31.21) | - |
| BC | -2369 . (1274) | - |
| NRES | 8652 * (4063) | 9315 * (4038) |
| SPH | -22269 (31416) | - |
| YA | 7644 (8839) | - |
| OR | 27695 * (11388) | 26802 * (10842) |
| Multiple $R^2$ | 0.0664 | 0.0450 |
| Adjusted $R^2$ | 0.0363 | 0.0349 |

Table 4.5: Estimates with standard errors (in parentheses) from the first difference models. Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’

| Model | Min | Mean | Median | Max | Total deviation |
|---|---|---|---|---|---|
| Old | -0.5552 | 0.0683 | 0.0566 | 1.2086 | 61.101 |
| Original | -0.5584 | 0.0324 | 0.0075 | 1.0977 | 56.668 |
| Full | -0.5379 | 0.0264 | 0.0054 | 1.1203 | 55.746 |
| Selected 1 | -0.5372 | 0.0279 | 0.0236 | 1.1268 | 56.006 |
| Selected 2 | -0.5333 | 0.0329 | 0.0243 | 1.1023 | 56.266 |

Table 4.6: Statistics of the net cost deviation.

| Change | Number |
|---|---|
| Larger than 500 (+) | 2 |
| Between 100 and 500 (+) | 80 |
| Smaller than 100 (+) | 72 |
| Smaller than 100 (-) | 52 |
| Between 100 and 500 (-) | 82 |
| Larger than 500 (-) | 2 |

Table 4.7: Number of municipalities in each size class of change, in SEK/resident.
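Counting municipalities per size class, as in tables 4.7 and 4.8, only requires a function that maps each change to a class label. The sketch below makes two assumptions not stated in the text: exactly 100 and 500 SEK belong to the middle class, and a change of exactly zero counts as positive. The first two values are Åsele and Vänersborg from the text; the other two are hypothetical.

```python
from collections import Counter


def change_class(change: float) -> str:
    """Size class of a change in SEK per resident, labelled as in tables 4.7 and 4.8.

    Assumptions: 100 and 500 themselves belong to the middle class, and a
    change of exactly zero is counted as positive.
    """
    sign = "(+)" if change >= 0 else "(-)"
    size = abs(change)
    if size > 500:
        label = "Larger than 500"
    elif size >= 100:
        label = "Between 100 and 500"
    else:
        label = "Smaller than 100"
    return f"{label} {sign}"


# Åsele and Vänersborg from the text, plus two hypothetical changes.
changes = [750.5, 0.0177, -250.0, -42.0]
print(Counter(change_class(c) for c in changes))
```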

4.4.3 Municipality groups

To see whether the impact of changing the model would differ between municipality groups, the differences between the outcomes were calculated for each of the groups defined in table 4.3. The results are shown in table 4.8. The negative impact is largest for the two A-groups, where about three quarters of the municipalities would experience a negative change. For the municipalities in the B-groups and C-groups, the positive and negative changes are more evenly distributed.

The two municipalities that would experience a negative change of more than 500 SEK per resident are both in municipality group A2, where at least 40% of the residents commute to a large city. In contrast, the two municipalities that would experience a positive change of more than 500 SEK are both in group C8, rural areas with low commute patterns. This difference could be attributed to two changes in the new model: the addition of the variable NRES, which could be expected to take higher values in rural areas, and the removal of the variable ApB, which took higher values in bigger cities.

The numbers of municipalities that receive grants or pay fees are shown in table 4.9; they have not changed in four of the groups. The largest changes are in two groups:

- B3, where two municipalities would receive grants instead of paying fees under the new model;
- C7, where two municipalities would pay fees instead of receiving grants.

Table 4.10 shows how much, on average, the municipalities in each group would receive or pay. Group A2, which contains the two municipalities with the largest negative changes, also has the largest negative change on average. Group C8 had the two largest positive changes, but its average change, while still positive, is only the third largest. This is because groups B3 and C9 both contain relatively many municipalities with positive changes in the range 100 to 500 SEK.

| Change | A1 | A2 | B3 | B4 | B5 | C6 | C7 | C8 | C9 |
|---|---|---|---|---|---|---|---|---|---|
| Larger than 500 (+) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
| Between 100 and 500 (+) | 0 | 3 | 15 | 13 | 9 | 11 | 9 | 14 | 6 |
| Smaller than 100 (+) | 1 | 7 | 3 | 11 | 7 | 7 | 6 | 8 | 2 |
| Smaller than 100 (-) | 0 | 13 | 2 | 14 | 9 | 6 | 15 | 9 | 4 |
| Between 100 and 500 (-) | 2 | 18 | 1 | 14 | 10 | 5 | 22 | 7 | 3 |
| Larger than 500 (-) | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 3 | 43 | 21 | 52 | 35 | 29 | 52 | 40 | 15 |

Table 4.8: Number of municipalities in each municipality group and size class of change, in SEK/resident.

| Group | Fees, original model | Fees, new model | Grants, original model | Grants, new model |
|---|---|---|---|---|
| A1 | 0 | 0 | 3 | 3 |
| A2 | 39 | 39 | 4 | 4 |
| B3 | 11 | 9 | 10 | 12 |
| B4 | 44 | 44 | 8 | 8 |
| B5 | 24 | 25 | 11 | 10 |
| C6 | 25 | 24 | 4 | 5 |
| C7 | 33 | 35 | 19 | 17 |
| C8 | 31 | 31 | 9 | 9 |
| C9 | 15 | 14 | 0 | 1 |
| Total | 222 | 221 | 68 | 69 |

Table 4.9: Number of receivers of grants and payers of fees, by municipality group and model.

4.4.4 New model on divided data

Since the impact of changing the model differed between municipality groups, the data were divided to see how the model performs for different categories of municipalities. The municipalities were divided into three data sets:

- set A, groups A1 and A2;
- set B, groups B3-B5;
- set C, groups C6-C9.

According to table 4.8, these sets contain 46, 108 and 136 municipalities, respectively. The regression was then run using the new model on each of the three data sets. Table 4.11 shows that $\bar{R}^2$ differs considerably between the categories. The model explains around 70% of the variation in net costs for the municipalities in category A, but only just above 27% of the variation for the municipalities in category C.

| Group | Original model | New model | Diff |
|---|---|---|---|
| A1 | 1947 | 1861 | -86.22 |
| A2 | -1066 | -1186 | -119.7 |
| B3 | 82.08 | 224.5 | 142.4 |
| B4 | -734.6 | -735.9 | -1.264 |
| B5 | -355.3 | -374.3 | -18.98 |
| C6 | -569.9 | -534.9 | 34.92 |
| C7 | -293.8 | -361.1 | -67.25 |
| C8 | -476.3 | -384.6 | 91.71 |
| C9 | -1132 | -1018 | 113.6 |

Table 4.10: Average fee (-) / grant (+) by municipality group, original and new model.

| Category | A | B | C |
|---|---|---|---|
| $\bar{R}^2$ | 0.7098 | 0.4529 | 0.2723 |

Table 4.11: $\bar{R}^2$ for the new model, estimated on the divided data.

Municipal group model

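To test whether the municipality categories have a significant impact, two dummy variables were created such that

$$B = \begin{cases} 1 & \text{if in category B,} \\ 0 & \text{otherwise,} \end{cases} \qquad C = \begin{cases} 1 & \text{if in category C,} \\ 0 & \text{otherwise.} \end{cases} \tag{4.2}$$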

Each dummy variable was multiplied by each of the explanatory variables, and the resulting interaction terms were added to the model as in equation 4.3, so that category A serves as the reference category. The regression using the dummy variable model gave $\bar{R}^2$ = 0.4686, compared to $\bar{R}^2$ = 0.4495 for the model without dummies.

A Wald test of the model with dummy variables against the model without was then performed. The null hypothesis is that all fourteen interaction coefficients are zero. The resulting test statistic was F = 2.095 with p = 0.0089, so the null hypothesis can be rejected at level α = 0.01, and the division into categories has a significant effect.

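$$\begin{aligned}
\mathrm{NET} = {} & \beta_0 + \beta_1 \cdot \mathrm{CBIS} + \beta_2 \cdot \mathrm{LEL} + \beta_3 \cdot \mathrm{SqUP} + \beta_4 \cdot \mathrm{SDC} \\
& + \beta_5 \cdot \mathrm{BC} + \beta_6 \cdot \mathrm{NRES} + \beta_7 \cdot \mathrm{YA} \\
& + \beta_8 \cdot (B \cdot \mathrm{CBIS}) + \beta_9 \cdot (B \cdot \mathrm{LEL}) + \beta_{10} \cdot (B \cdot \mathrm{SqUP}) + \beta_{11} \cdot (B \cdot \mathrm{SDC}) \\
& + \beta_{12} \cdot (B \cdot \mathrm{BC}) + \beta_{13} \cdot (B \cdot \mathrm{NRES}) + \beta_{14} \cdot (B \cdot \mathrm{YA}) \\
& + \beta_{15} \cdot (C \cdot \mathrm{CBIS}) + \beta_{16} \cdot (C \cdot \mathrm{LEL}) + \beta_{17} \cdot (C \cdot \mathrm{SqUP}) \\
& + \beta_{18} \cdot (C \cdot \mathrm{SDC}) + \beta_{19} \cdot (C \cdot \mathrm{BC}) + \beta_{20} \cdot (C \cdot \mathrm{NRES}) \\
& + \beta_{21} \cdot (C \cdot \mathrm{YA}) + u.
\end{aligned} \tag{4.3}$$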
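Equation 4.3 has 22 coefficients, β0 to β21, so each municipality contributes a row of 22 regressors. The minimal sketch below builds such a row. It assumes that the category letter is the first character of the group code in table 4.3, and the regressor values are hypothetical. For a municipality in category A, all 14 interaction columns are zero, which is what makes A the reference category.

```python
BASE_VARS = ["CBIS", "LEL", "SqUP", "SDC", "BC", "NRES", "YA"]


def category_of(group: str) -> str:
    """Map a municipality group code such as 'B4' to its category letter 'B'."""
    return group[0]


def design_row(values: dict, group: str) -> list:
    """Regressor row for equation (4.3): intercept, the seven base variables,
    then the B-interactions and the C-interactions (category A is the reference)."""
    cat = category_of(group)
    b = 1.0 if cat == "B" else 0.0
    c = 1.0 if cat == "C" else 0.0
    base = [float(values[name]) for name in BASE_VARS]
    return [1.0] + base + [b * v for v in base] + [c * v for v in base]


# Hypothetical regressor values for one municipality in group C8.
example = {"CBIS": 0.1, "LEL": 0.2, "SqUP": 30.0, "SDC": 5.0,
           "BC": 1.0, "NRES": 0.05, "YA": 0.12}
row = design_row(example, "C8")
print(len(row))  # 22 columns, one per coefficient beta_0 ... beta_21
```

Stacking such rows for all municipalities gives the design matrix of the dummy variable model. The Wald test described above then tests whether the coefficients of the last 14 columns are jointly zero.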


Chapter 5

Discussion

The aim of this thesis was to find a model that better explains the differences in the municipalities' costs for IFO-activities. A new model was suggested, in which one variable from the original model was eliminated and two new variables were introduced. The new variables were NRES, the proportion of newly registered with the employment service, and YA, the proportion of young adults. The variable removed was ApB, the proportion of residents living in apartment buildings built in 1965-1975.

This change increased $\bar{R}^2$ from 0.4356 to 0.4495 and also increased the significance of some of the individual variables. This supports the hypothesis that these variables help explain the municipalities' costs. The total net cost deviation was also smaller with the new model, which may reflect a better-fitting model.

The results from the first difference model may seem surprising, since the signs of two of the variables differ from what was expected. The results suggest two relationships that go against the theory:

- an increase in the proportion of the population with a low educational level would decrease the costs for the IFO-activities;
- an increase in the proportion of residents over the age of 65 would increase the costs for the IFO.

That the results differ so much depending on the method used may indicate a deficiency in the method used today. Today's method may capture the differences between municipalities without reflecting the effect of a change within an individual municipality. It is also possible that the changes between the two years used for the first difference model were too small to show any significant effect. If data were available for more years, the effects might be seen more clearly.

Spatial dependence between municipalities could be an issue. After the municipalities were ordered by group, the null hypothesis of zero spatial autocorrelation was rejected at level α = 0.1. Ordering by group is not an optimal way to detect spatial autocorrelation, especially since the municipalities were not ordered within the groups, but it does indicate that this might be an issue.

Dividing the data and running the regression again shows that the model performs very differently for different categories of municipalities. This raises the question of whether a joint model is the best approach. The difference may indicate that some of the variables have different effects in different groups, which is also supported by the Wald test of the model with dummy variables against the model without.


5.0.1 Future research

It would be interesting to investigate further the possibility of using different models for different categories of municipalities. In future work, creating a distance matrix for the municipalities would be a better way to detect spatial dependence. It would also be very interesting to use panel data covering more years, to see whether that could shed light on the differences between the OLS and first difference estimates.


Bibliography

[1] Kommunalekonomiska kommittén. Kommunal ekonomi i samhällsekonomisk balans – statsbidrag för ökat handlingsutrymme och nya samarbetsformer (SOU 1991:98). Stockholm: Finansdepartementet.

[2] Strukturkostnadsutredningen. Kostnadsutjämning mellan kommunerna (SOU 1993:53). Stockholm: Finansdepartementet.

[3] Kommunala utjämningsutredningen. Kostnadsutjämning för kommuner och landsting (SOU 1998:151). Stockholm: Inrikesdepartementet.

[4] Utjämningskommittén. Gemensamt finansierad utjämning i kommunsektorn (SOU 2003:88). Stockholm: Finansdepartementet.

[5] Utjämningskommittén.08. Likvärdiga förutsättningar – Översyn av den kommunala utjämningen (SOU 2011:39). Stockholm: Finansdepartementet.

[6] Kostnadsutjämningsutredningen. Lite mer lika. Översyn av kostnadsutjämningen för kommuner och landsting (SOU 2018:74). Stockholm: Finansdepartementet.

[7] Sociala rådet. Massuppsägningar, arbetslöshet och sjuklighet (SOU 2010:102). Stockholm: Socialdepartementet.

[8] Underlagsrapport 2. Arbete, arbetslöshet och jämlik hälsa – en kunskapsöversikt (S 2015:02). Stockholm: Kommissionen för jämlik hälsa.

[9] Socialstyrelsen. https://www.socialstyrelsen.se/statistik-och-data/statistik/statistikamnen/ekonomiskt-bistand/ (accessed 2020-12-08).

[10] S. S. Shapiro and M. B. Wilk. An Analysis of Variance Test for Normality (Complete Samples). Biometrika, vol. 52, no. 3/4, 1965, pp. 591-611.

[11] Abraham Wald. On the Efficient Design of Statistical Investigations. The Annals of Mathematical Statistics, vol. 14, no. 2, 1943, pp. 134-140.

[12] RDocumentation. https://www.rdocumentation.org/packages/olsrr/versions/0.5.3/topics/ols_test_breusch_pagan (accessed 2020-12-27).

[13] J. Durbin and G. S. Watson. Testing for Serial Correlation in Least Squares Regression: I. Biometrika, vol. 37, no. 3/4, 1950, pp. 409-428.

[14] Hirotugu Akaike. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, vol. 19, no. 6, 1974, pp. 716-723.
