Uppsala University
Department of Statistics
Bachelor Thesis
Immigration and competition:
Are low- and medium-skilled native Swedes more likely to support the Sweden Democrats when there is an influx
of immigrants, compared to high-skilled native Swedes?
Authors:
Mattias Tajik & Claes Kock
Supervisor:
Dr. Mattias Nordin
Spring 2020
Abstract
The Sweden Democrats (SD) have gained considerable political success in recent years, as have many other right-wing populist parties in the West. We theorize that the economically weakest and least educated parts of native society are the ones who experience the most pressure and competition from immigration, whether real or imagined.
Skill level is divided into three categories based on education: "low-skill", "medium-skill" and "high-skill". We expect immigration to make low-skilled and medium-skilled natives more prone to vote for the SD than high-skilled natives. To test this hypothesis, we use survey data from the SOM institute of Gothenburg University together with municipal data from Statistics Sweden. Our results suggest that, when immigration increases, support for the SD rises among low-skilled natives relative to high-skilled natives. No such effect is observed for medium-skilled natives.
Keywords: Immigration, populism, skill-level, natives, Sweden Democrats
Contents
List of Tables
List of Figures
1 Introduction
  1.1 Background and previous research
  1.2 Objectives
2 Empirical strategy
  2.1 Data
    2.1.1 Voting for the SD
    2.1.2 Skill-level
    2.1.3 Immigration
    2.1.4 Controls
  2.2 The model
    2.2.1 Interaction effect as difference in APE:s
  2.3 Cluster-robust standard errors
  2.4 Alternative methods
3 Results
  3.1 Linear probability model
4 Discussion
References
A Additional information for some variables
  A.1 Distribution of ImmCh
  A.2 Questions used from SOM institute (2017)
  A.3 Formulas for municipality level variables
B Auxiliary output
  B.1 Plots of differences
  B.2 Interaction effects at different levels of ImmCh
  B.3 Plots from the linear probability model
List of Tables
1 Variables and descriptive statistics
2 Classification of skill-levels and their sizes
3 Average partial effects of ImmCh at three skill-levels
4 Estimated coefficients and differences in APE:s for four logistic regression models
5 Estimated coefficients from linear probability model
List of Figures
1 Left: Party support in 2017 SOM survey, Right: Election results for 2018. Source: SOM-2017, SCB (2020d)
2 Predicted probability of voting for the SD at different skill-levels and different levels of ImmCh, without controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
3 Predicted probability of voting for the SD at different skill-levels and different levels of ImmCh, with controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
4 Distribution of ImmCh.
5 Differing predicted probability between low-skilled and high-skilled, without controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
6 Differing predicted probability between low-skilled and high-skilled, with controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
7 Differing predicted probability between medium-skilled and high-skilled, without controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
8 Differing predicted probability between medium-skilled and high-skilled, with controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
9 Estimated interaction effect, for different levels of ImmCh and lowskill, without controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
10 Estimated interaction effect, for different levels of ImmCh and lowskill, with controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
11 Estimated interaction effect, for different levels of ImmCh and medskill, with controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
12 Estimated interaction effect, for different levels of ImmCh and medskill, without controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
13 Estimated effect of ImmCh for three skill-levels, without controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
14 Estimated effect of ImmCh for three skill-levels, with controls. The confidence level is 95 percent, and the confidence intervals are calculated with the delta method.
1 Introduction
The last decade has seen growing success for right-wing populist parties across the West, most of which were formerly seen as fringe parties. Parties such as France's National Front and Italy's Lega have achieved increasing electoral success at the expense of more established parties (l’intérieur 2017; Affari Interni e Territoriali 2018). Brexit and the election of Donald Trump can arguably be seen as part of the same phenomenon.
The Sweden Democrats (SD) is a Swedish far-right party that has enjoyed growing electoral success in recent years. The party is built on a platform of right-wing populism, with opposition to immigration as one of its main issues (Hinnfors and Sundström 2015).
The party is fairly new to Swedish politics, first entering parliament in 2010 with 5.7 percent of the vote.
The rapid rise of right-wing populism has led many social scientists to study the reasons behind its success. Building on previous research, we formulate a hypothesis about the relationship between native skill level and the influx of immigrants to an area. We propose that when immigration to an area increases, lower-skilled natives become more likely to vote for the SD than higher-skilled natives. We divide skill level into three levels based on education. Immigration is measured at the municipal level.
The study uses survey data together with municipal data on immigration. To test the hypothesis, a logistic regression model is estimated using both individual-level and municipal-level variables.
1.1 Background and previous research
Broadly speaking, two strands of literature have emerged regarding the factors explaining the growing support for right-wing populist parties in the Western world (Håfström Dehdari 2018). The first strand focuses on the visibility of immigrants and its effect on support for far-right parties. The main idea of this strand is that the mere presence of immigrants generates anti-immigrant sentiment, with different scholars suggesting different underlying reasons.
For example, Tabellini (2020) discusses how opposition to immigration in the US between the late 19th and the early 20th century was motivated by perceived cultural differences. He argues that even when immigration provided a net economic benefit, there was still widespread political backlash due to ethnic, religious, linguistic and cultural differences between migrants and natives, which resulted in increased support for anti-immigration legislation. This can be considered a more "cultural" perspective, in which animosity towards immigrants is motivated by differences between natives and immigrants.
The second strand of research suggests that natives are more prone to support anti-immigrant parties when subject to economic distress. This strand proposes two mechanisms. The first suggests that low-skilled workers vote for right-wing parties because they oppose free international trade. Free trade gives companies the ability to move their production overseas, which forces workers in the industrialized West to compete with low-wage workers in developing countries. In other words, the fear of competition drives workers to support the far right, which generally opposes free trade agreements. The second mechanism proposes that voters support far-right parties because an influx of immigrants would increase competition for societal resources, such as schooling and healthcare. In this view, a fear of intensified competition with immigrants in times of economic distress generates support for anti-immigrant parties (Håfström Dehdari 2018).
Håfström Dehdari (2018) finds that low-skilled workers in Sweden are more prone to support the SD because most immigrants moving to Sweden are low-skilled, at least when the natives are subject to economic distress. The findings are explained by an (at least perceived) intensification of economic competition between lower-skilled natives and immigrants. He thus combines the cultural and economic perspectives: immigration may increase support for the SD, but mainly through a perceived cost of immigration among low-skilled natives, who bear the heaviest share of that cost. It should be noted that there is no solid consensus on the actual effect of immigration on the labour market or on the economy as a whole. Some studies, such as Borjas (2003), find that immigration increases the supply of labour, which in turn lowers wages. Others, such as Card (2009), find evidence suggesting that immigration has negligible effects on wages.
Although the economic effects of immigration are contested, studies similar to Håfström Dehdari (2018) find that lower-skilled natives tend to be more supportive of anti-immigrant parties. For example, Arzheimer and Carter (2006) find that a voter's socio-demographic background strongly influences whether they vote for a right-wing populist party. According to their study, younger males working in manual labour professions have a significantly higher probability of voting for a right-wing populist party, whereas being a professional middle-aged woman significantly decreases that probability. This might be explained by lower-skilled natives perceiving that they bear the largest costs of immigration.
The real or perceived cost of immigration to Sweden can be illustrated with studies from Statistics Sweden (SCB). According to SCB, most immigrants are low- or medium-skilled, and a vast majority do not hold a university degree (SCB 2019a). We theorize that this makes immigrants compete disproportionately with low-skilled native workers, making those natives feel economically threatened and in turn increasing support for the SD. Unemployment is also higher among immigrants and foreign-born citizens than among natives (SCB 2016). This might place a heavy burden on the welfare system, which might be perceived as an immigration-related expense.
In Sweden, there is also evidence that highly educated immigrants have difficulty finding employment that matches their level of education. According to SCB (2017), immigrants with a high level of education have more difficulty than natives in finding work that matches their education. Taking engineers as an example, foreign-born engineers have a lower "match" than native engineers, where "match" means having a job that matches one's education. Highly educated immigrants thus seem to have a harder time putting their education to use. Overall, a third of foreign-born workers feel that they are overqualified for their current job, compared to 17 percent of natives (SCB 2016).
1.2 Objectives
Previous research suggests that one reason behind support for right-wing populist parties is perceived economic competition between lower-skilled natives and immigrants. Following Håfström Dehdari (2018), we theorize that there is a perceived cost related to immigration, and that lower-skilled natives perceive themselves as the ones most affected by this cost. In line with previous theory, we expect lower-skilled natives to feel more economically threatened by immigration than higher-skilled natives.
We expect the interaction between immigration and a low skill level to be positive, meaning that lower-skilled natives should become more prone to support the SD, compared to higher-skilled natives, when there is an influx of immigrants. Note that this allows for an effect on the higher-skilled as well: whatever that effect is, it should be larger for the lower-skilled. While Håfström Dehdari (2018) shows that the perceived cost of immigration increases when economic distress increases, we aim to show that support for the SD increases among low-skilled natives when immigration to an area increases.
2 Empirical strategy
Given the preceding theoretical discussion of the real or perceived economic threat of immigration, we expect lower-skilled natives to react differently to an increase in immigration than higher-skilled natives. A large influx of immigrants should widen the gap in SD support between lower-skilled and higher-skilled natives. This is a hypothesis of an interaction effect between skill level and the influx of immigrants. The anticipated interaction effect is positive: as immigration increases, lower-skilled natives should become more likely to support the SD relative to higher-skilled natives.
The following sections introduce and discuss the data used to test this hypothesis.
We partition skill level into three groups (low-skilled, medium-skilled and high-skilled) according to each respondent's reported level of education. We discuss both the individual-level and the municipality-level variables, followed by the included controls. The model is then presented, followed by a short discussion of alternative methods.
We expect a larger influx of immigrants to make the two lower skill groups more prone to vote for the SD than the high-skilled natives. Immigration is measured as the change in each municipality's immigrant population between 2007 and 2017, and we study how it relates to SD support among individuals with different skill levels.
2.1 Data
Previous research on how economic distress or an influx of immigrants affects native voters has generally relied only on data at the municipality or precinct level (see for example Håfström Dehdari (2018)). This is partly because it is generally not possible to link an individual, and his or her characteristics, to his or her actual vote. Since the aim of this study is to show that an influx of immigrants affects native individuals, we prefer to use survey data, which makes it possible to link each respondent's skill level to his or her party support.
If we instead studied, for example, only precinct-level data, it would be possible to show that precincts with a high share of low-skilled individuals tend to be more supportive of the SD than precincts with a high share of high-skilled individuals when the influx of immigrants is larger. However, this is different from showing that an influx of immigrants affects individuals with different skill levels differently, since an association between precincts need not hold for the individuals within them. Data on immigration are measured at the municipality level.
The data used in this thesis come from two primary sources. The first is the Institute of Society, Opinion and Media (SOM)¹, which is part of Gothenburg University. The SOM institute collects data on Swedish opinions through its yearly surveys. This thesis uses data from the 2017 survey, which was the latest publicly available survey when work on this thesis started. In total, slightly more than 10 000 respondents participated. The sample is drawn by simple random sampling: every Swedish resident between the ages of 16 and 85 has the same chance of being asked to participate (Jansson, Tipple, and Weissenbilder 2018, p. 3).
The survey includes a wide range of individual-level data. Since the theoretical discussion in previous sections implies that natives' opinions will be affected by an increase in immigration, only native-born respondents are kept for the analysis. We define "native-born" simply as being born in Sweden, without reference to any particular ethnicity. Respondents who did not report their level of education are excluded from the data set. In total, 8 418 observations are kept for the analysis. The survey also records the municipality in which each respondent resides, which allows every respondent to be matched with municipality-level data from SCB.
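The sample restrictions and the municipality match described above can be sketched in pandas. This is a minimal illustration under assumptions. The column names (`born_in_sweden`, `education`, `municipality_code`, `preferred_party`) and all toy rows are hypothetical and do not reflect the actual SOM or SCB file layouts. The SD dummy follows the coding described in Section 2.1.1.

```python
import pandas as pd

# Hypothetical survey extract: one row per respondent (toy values, not SOM data).
survey = pd.DataFrame({
    "respondent_id": [1, 2, 3, 4, 5],
    "born_in_sweden": [True, True, False, True, True],
    "education": ["tertiary", None, "secondary_completed", "compulsory_9", "tertiary"],
    "municipality_code": ["0180", "1480", "0180", "2480", "1280"],
    "preferred_party": ["S", "SD", "M", "SD", "V"],
})

# Hypothetical municipality-level table (toy values, not SCB data).
municipalities = pd.DataFrame({
    "municipality_code": ["0180", "1480", "2480", "1280"],
    "ImmCh": [8.1, 7.4, 5.2, 9.0],
})

# Keep native-born respondents who reported their education.
sample = survey[survey["born_in_sweden"] & survey["education"].notna()]

# Attach municipality-level variables to each respondent.
# validate="many_to_one" raises an error if a municipality appears twice in the SCB table.
sample = sample.merge(municipalities, on="municipality_code", how="left",
                      validate="many_to_one")

# Dependent variable: 1 if the SD is the preferred party, otherwise 0.
sample["SD"] = (sample["preferred_party"] == "SD").astype(int)
print(sample[["respondent_id", "municipality_code", "ImmCh", "SD"]])
```

A left merge keeps every retained respondent even if a municipality code were missing from the SCB table, which would then show up as a missing ImmCh value rather than a silently dropped row.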
SCB is the Swedish government agency responsible for official statistics. From SCB we obtain immigration numbers and employment data, as well as other economic and demographic variables. These data are available at the municipal level, which is useful because we study individuals within municipalities and want to control for local differences.
All variables, together with descriptive statistics, are presented in table 1. Additional information on the variables is given in Appendices A.2 and A.3.
¹ Unofficial translation of "Samhälle-, opinion och media".
Table 1: Variables and descriptive statistics

Variable                 Mean          Std. dev.     Min.       Max.
Dependent variable
  SD                     0.141         0.349         0          1
Explanatory variables
  ImmCh                  6.929         2.992         0.559      19.549
  lowskill               0.148         0.355         0          1
  medskill               0.297         0.457         0          1
Controls
  Man                    0.494         0.500         0          1
  Age                    51.839        18.272        16         85
  Population 2007        147808.600    223753.300    2549       795163
  ShareImm               0.129         0.062         0.034      0.392
  Unemployment 2007      0.218         0.039         0.127      0.359
  Taxable Income 2007    152834.400    19341.540     120280     268924
  Tax rate 2007          31.604        1.007         28.890     34.240
  Large city             0.185         0.389         0          1
  Over 65                0.176         0.032         0.104      0.305
  20 to 64               0.535         0.039         0.332      0.607

Notes: The descriptive statistics are calculated for the sample consisting only of respondents born in Sweden. All numbers are rounded to three decimals. Sources: SCB (2020a), SCB (2020b), SCB (2020c), SCB (2019c), SCB (2019b), and SOM institute (2017).
2.1.1 Voting for the SD
Any respondent who states that the SD is their preferred party is counted as someone who would vote for the SD in a hypothetical election. Figure 1 shows the distribution of party support among the respondents. The SD is supported by around 13 percent of the respondents.
One benefit of using the SOM survey is that the SOM institute publishes its own analysis of the accuracy of the results. Concerns about possible bias can therefore partly be discussed on the basis of that analysis. As noted in Jansson, Tipple, and Weissenbilder (2018, pp. 22–26), female respondents are slightly more prevalent in the sample than male respondents, and young respondents are slightly less prevalent in the sample than in the population. Both facts might affect the results somewhat. Males tend to be more supportive of the SD: in the SOM survey, 877 of the 1363 SD supporters are male, leaving only 486 females.
The left panel of figure 1 shows the percentage of respondents supporting each party in the survey, and the right panel shows each party's share of the vote in the national election of 2018. The proportion of SD voters is about 4 percentage points higher in the election. Since almost a year separates the SOM survey and the election, some of the difference may be due to voters changing party. However, the slight over-representation of females in the survey, combined with the gap between the SOM survey and the 2018 election, gives reason to suspect that some SD voters chose not to participate in the survey. It is plausible that supporters of controversial parties are less willing to take part in surveys, in which case SD supporters would be more common in the population than in our sample.
Figure 1: Left: Party support in 2017 SOM survey. Right: Election results for 2018.
Source: SOM-2017, SCB (2020d)
2.1.2 Skill-level
Skill level is partitioned into three categories: low-skilled, medium-skilled and high-skilled. (These are denoted "lowskill", "medskill" and "highskill" when they appear in an equation or in a table of results, or when they are referred to as variables.)
Table 2: Classification of skill-levels and their sizes

Skill-level   Classification                                         Obs.
Low           Studied compulsory education, less than 9 years        1246
              Completed compulsory education, 9 years
Medium        Studied at secondary education, 2 years or less        2496
              Completed secondary education, 3 years
High          Tertiary education                                     4676

Notes: This classification is based on a classification from SOM institute (2017).
Table 2 shows the rules used to classify individuals as low-, medium- or high-skilled.
Low-skilled individuals are those who have studied only within compulsory education. When the survey was conducted, the first 9 years of education were compulsory in Sweden (SFS 2010:800; SFS 2017:1115, chap. 7 §11). Previously, however, compulsory schooling was shorter (Proposition 1962:54), so some respondents will have studied for fewer than 9 years. Both of these groups are categorized here as low-skilled.
After the 9 years of compulsory schooling, it is possible to study an additional 3 years of secondary education. Previously, secondary education could also be shorter, and some respondents may have dropped out before finishing it. Any respondent who has studied at or completed secondary education, but not studied further, is categorized as medium-skilled. Individuals who have studied at a community college have also been categorized as medium-skilled². The high-skilled category covers anyone who has studied at the tertiary level.
This includes respondents who have studied at a university or equivalent institution, with or without a degree. Degrees above the bachelor's level are also included in this category.
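The classification rules in table 2 amount to a lookup from education category to skill level, followed by two dummies with high-skilled as the reference group. The sketch below assumes hypothetical education codes (`compulsory_lt9`, `folkhogskola`, and so on); the actual SOM response categories and labels may differ.

```python
import pandas as pd

# Hypothetical education codes loosely following table 2; the real SOM
# categories may be labelled differently.
SKILL_BY_EDUCATION = {
    "compulsory_lt9": "low",          # studied compulsory school, fewer than 9 years
    "compulsory_9": "low",            # completed 9-year compulsory school
    "secondary_le2": "medium",        # secondary education, 2 years or less
    "secondary_completed": "medium",  # completed 3-year secondary education
    "folkhogskola": "medium",         # community college (folkhögskola)
    "tertiary": "high",               # university or equivalent, with or without degree
}

def add_skill_dummies(df: pd.DataFrame) -> pd.DataFrame:
    """Map education to a skill level and create the lowskill/medskill dummies.

    High-skilled respondents form the reference category, so no highskill
    dummy is created for the regression.
    """
    out = df.copy()
    out["skill"] = out["education"].map(SKILL_BY_EDUCATION)
    if out["skill"].isna().any():
        raise ValueError("unmapped education category")
    out["lowskill"] = (out["skill"] == "low").astype(int)
    out["medskill"] = (out["skill"] == "medium").astype(int)
    return out

example = pd.DataFrame({"education": ["compulsory_9", "folkhogskola", "tertiary"]})
result = add_skill_dummies(example)
print(result)
```

Raising on unmapped categories, rather than silently producing missing values, makes it harder for a new or misspelled education code to end up in the wrong group.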
² Community college is a translation of "folkhögskola".
2.1.3 Immigration
Data on immigration are collected from SCB, which has extensive data on the number of immigrants living in each of Sweden's 290 municipalities. To capture sufficient variation in immigration, we use data from 2007 and 2017. The data count the number of foreign-born residents in each municipality.
To measure the influx of immigrants, we use data on the number of foreign-born individuals in each municipality in 2007 and in 2017. The variable is calculated as:
$$\text{ImmCh} = \frac{\Delta \text{Imm}}{\text{Pop}_{2007}} = \frac{\text{Immigrants}_{2017} - \text{Immigrants}_{2007}}{\text{Population}_{2007}}$$
The distribution of ImmCh over all municipalities is shown in Appendix A.1.
An alternative idea would be to use the difference in the share of immigrants between 2007 and 2017. That is,
$$\Delta\text{Share} = \frac{\text{Immigrants}_{2017}}{\text{Pop}_{2017}} - \frac{\text{Immigrants}_{2007}}{\text{Pop}_{2007}}.$$
However, the validity of this variable is questionable, since it suffers from a flaw that might produce faulty results. The share of immigrants might change not because of an influx of immigrants but for other reasons, for example because more natives than immigrants move away from a region. What is of interest in this thesis is an increase in the number of immigrants: the variable of importance is not the share of immigrants but the size of the influx. Of course, the size of the influx must be standardized by the population of the municipality.
ImmCh does this without falling into the potential trap of registering municipalities that suffer from depopulation as having a large influx of immigrants. A variable such as ∆Share would be problematic, since large parts of northern Sweden suffer from depopulation. With ∆Share, many northern municipalities could appear to have a large influx of immigrants when in reality native Swedes have been moving out at a higher rate than immigrants.
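A small worked example shows why the two measures diverge in a depopulating municipality. The figures are invented for illustration. Both measures are scaled by 100 here. This is an assumption based on the magnitudes in table 1, where ImmCh has a mean of 6.929, whereas the formula as written gives a fraction.

```python
def imm_ch(imm_2007, imm_2017, pop_2007):
    """ImmCh: change in foreign-born residents relative to the 2007 population.

    Multiplied by 100 (percent); the scaling is an assumption, not from the source formula.
    """
    return 100 * (imm_2017 - imm_2007) / pop_2007

def delta_share(imm_2007, imm_2017, pop_2007, pop_2017):
    """Change in the foreign-born share of the population, in percentage points."""
    return 100 * (imm_2017 / pop_2017 - imm_2007 / pop_2007)

# Hypothetical depopulating municipality: the number of foreign-born residents
# is unchanged, but 2 000 natives move away between 2007 and 2017.
print(imm_ch(1_000, 1_000, 20_000))               # 0.0
print(delta_share(1_000, 1_000, 20_000, 18_000))  # roughly 0.56

# Hypothetical growing municipality: 1 000 more foreign-born residents.
print(imm_ch(1_000, 2_000, 20_000))               # 5.0
```

In the first municipality, no immigrants arrive, yet ∆Share reports a rise of roughly half a percentage point, because native out-migration alone raises the immigrant share. ImmCh correctly reports zero.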
2.1.4 Controls
The controls included in the model are both controls on the individual level and on the municipal level. All controls are listed in table 1, alongside the dependent and explanatory variables.
For the regression estimates to be unbiased, all important controls must be included in the model (Angrist and Pischke 2007, pp. 59–64). To obtain unbiased estimates, it is important to control for all relevant confounders, which might otherwise generate spurious correlations. Leaving out confounders, meaning variables that covary with both ImmCh and support for the SD, will produce biased results. This is called omitted variable bias. Since the quantity of interest is an interaction, confounders that affect the three skill levels differently are especially problematic.
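Omitted variable bias is easy to see in a simulation. The sketch below uses a linear model and invented parameters, not the logit model or data of this thesis. A hypothetical confounder `z` drives both the regressor `x` and the outcome `y`. Leaving it out roughly doubles the estimated effect of `x`.

```python
import numpy as np

rng = np.random.default_rng(2020)
n = 100_000

# Hypothetical data-generating process: z (e.g. local economic conditions)
# affects both x and y. The true effect of x on y is 1.0.
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = 1.0 * x + 2.0 * z + rng.normal(size=n)

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols(y, x)    # z omitted
long = ols(y, x, z)  # z controlled for
print(f"x coefficient, z omitted:    {short[1]:.3f}")  # close to 2.0
print(f"x coefficient, z controlled: {long[1]:.3f}")   # close to 1.0
```

The short regression converges to 1 + 2·Cov(x, z)/Var(x) = 1 + 2·(1/2) = 2, which is the textbook omitted variable bias formula applied to this setup.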
Some potential controls must be left out because of an even more serious issue. As noted by Angrist and Pischke (ibid., pp. 64–68), when testing a hypothesis about a variable, it is unwise to use controls that are themselves affected by the variable of interest. Such "bad controls" introduce bias³. For example, since immigration to a municipality very likely affects the local economy in some way (although, as noted earlier, exactly how is contested), variables that measure changes in the economy between 2007 and 2017 must be left out. One way to substitute for these variables is to use measures of the economy from before the influx of immigrants occurred. In our case, this means that we use economic variables from 2007.
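One way a control affected by the variable of interest distorts the estimate is shown in the sketch below. It uses a linear model and invented parameters, not this thesis' data. Immigration `x` changes a hypothetical economic variable `m`, and both affect `y`. Controlling for `m` absorbs the part of the effect that runs through the economy.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Hypothetical process: x (immigration) changes m (local economic change),
# and both affect the outcome y.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)  # m is caused by x
y = 1.0 * x + 1.0 * m + rng.normal(size=n)
# Total effect of x on y: 1.0 (direct) + 0.5 * 1.0 (through m) = 1.5

def ols(y, *regressors):
    """OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

without_m = ols(y, x)[1]
with_m = ols(y, x, m)[1]
print(f"without m: {without_m:.3f}")  # close to 1.5, the total effect
print(f"with m:    {with_m:.3f}")     # close to 1.0, only the direct part
```

In this setup, controlling for the post-treatment variable does not remove bias. It changes the estimand to the direct effect only, which understates the total effect of immigration.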
This solution is not completely satisfactory, since changes in the economic conditions of the municipalities are left out entirely. The main issue is the financial crisis of 2008, which occurs during our chosen time period and whose effects are not captured by the controls. If the financial crisis affected immigration to certain municipalities and also affected individuals' probability of voting for the SD, it will bias the results. There are fairly good reasons to believe that the financial crisis affected people's attitudes towards the SD (Håfström Dehdari 2018).
Economic productivity in 2007 might also affect the result. A highly productive municipality will likely attract many immigrants. It is also likely to be better at providing employment opportunities and welfare services for its whole population, immigrants and natives alike. If so, immigrants might not be perceived as a threat on the labour market. Moreover, even if immigrants were a net cost to the municipality, the burden would be heavier for economically weaker municipalities.
³ Angrist and Pischke (2007, p. 65) portray the resulting estimate as: causal effect + selection bias.
The economic situation in each municipality is controlled for with three variables.
Taxable Income 2007 is only used to control for the productivity of a municipality.
Taxable Income 2007 is the total taxable income in 2007 divided by the total population of the municipality. It is therefore not affected by the tax rate of the individual municipality. More productive municipalities tend to have a higher taxable income than less productive ones, so the variable is used as an inexact measure of productivity⁴. Where economic resources are scarce, conflict is likely to develop and economic competition to intensify.
Another variable included to control for the economic situation in the municipality is the level of unemployment in 2007, which we call Unemployment 2007. Unemployment is related to productivity, though not perfectly. We assume that municipalities with high levels of unemployment will generally not attract immigrants. Furthermore, a high rate of unemployment likely increases the risk that immigrants cannot provide for themselves and intensifies (real or perceived) competition for jobs.
The last economic variable included in the model is the tax rate, called Tax rate 2007. A low tax rate may function as a pull factor, drawing immigrants to the municipality. An influx of immigrants might strain public resources, especially in the short term (see for example Forslund and Åslund (2016, p. 9) for a discussion of immigrants in Sweden and their difficulties on the labour market). Higher public costs are likely to require higher tax rates, and this burden is heavier in municipalities where the tax rate is already high. If a municipality has to increase public expenditure because of immigration, and thus raises taxes, it seems reasonable that natives in that municipality will be more prone to support a party whose goal is to lower immigration, assuming that a higher tax rate affects them negatively.
Some demographic changes between 2007 and 2017 should ideally be controlled for. Depopulation could, in theory, create a need for workers, which might act as a pull factor for immigrants. If immigrants are needed economically, they may be more appreciated by the native population, which in turn might lower the probability of voting for the SD in that municipality.
As with the economic variables, we cannot include changes in the economic demand for immigrants, since these would depend on ImmCh. It is plausible that a large influx of immigrants causes native Swedes either to move away from a municipality or to avoid moving there. A large literature on "white flight" and "white avoidance" suggests this (see for example Bråmå (2006)). Since immigration might affect natives' willingness to move to or from a municipality, controls for such variables would be endogenous. Hence, they are left out of the model.
⁴ From 2010 onward, SCB has collected data on gross regional product, which would have been the preferred measure of productivity had it been available for 2007.
A variable measuring the population size of the municipality, Population 2007, is included. Smaller communities might be affected differently than larger ones, since they have different economic situations, smaller labour markets and more competition for healthcare. The ability to "absorb" immigrants could therefore vary significantly with the size of the municipality, which makes it important to control for population size. In addition, a dummy called "Large city" records whether a respondent lives in one of the three largest cities: Stockholm, Gothenburg or Malmö. These cities have taken in a disproportionate share of immigrants and may also be particularly well placed to integrate them, since large businesses are located there and the labour market is generally more dynamic than in the rest of the country.
The number of immigrants living in a municipality before the change might also affect the results. The previous level of immigration might correlate with the immigration that occurred between 2007 and 2017, and it might also affect natives' probability of voting for the SD. Since the quantity of interest is the effect of the influx during the ten-year period, we control for the immigrant population in 2007 through the variable ShareImm.
Two individual-level controls, Age and Gender, are also included in the model. These are standard controls that are unlikely to be affected by the treatment of interest, and they are included to reduce the error variance (Woolridge 2013, p. 206).
2.2 The model
To test the hypothesis, a logistic regression model is used. The following model is estimated:
\[
P(SD = 1) = G\bigl(\beta_0 + \beta_1\,\text{lowskill}_i + \beta_2\,\text{ImmCh}_i + \beta_3\,\text{lowskill}_i \times \text{ImmCh}_i + \beta_4\,\text{medskill}_i + \beta_5\,\text{medskill}_i \times \text{ImmCh}_i + \gamma_1 V_i + \gamma_2\,\text{lowskill}_i \times V_i + \gamma_3\,\text{medskill}_i \times V_i\bigr) = G(\cdot) \qquad (1)
\]
Note that only two skill levels, low-skilled and medium-skilled, enter the model as dummies; high-skilled natives form the reference category. The skill-level dummies are interacted with ImmCh, which measures the size of the influx of immigrants. V is a vector of all the control variables, which are also allowed to interact with the skill levels.
A logistic regression model assumes no perfect multicollinearity and that G(·) is the logistic function, G(z) = exp(z)/(1 + exp(z)). This implies that the log-odds, ln(P/(1 − P)), are linear in the parameters.
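As a minimal illustration of what linearity in the logit means, the sketch below defines the logistic function G and checks that its inverse, the log-odds, returns the linear index unchanged. The function names G and log_odds are our own and purely illustrative.

```python
import math


def G(z: float) -> float:
    """Logistic CDF: maps the linear index z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(p: float) -> float:
    """Inverse of G: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


# Linearity in the logit: the log-odds of P(SD = 1) equal the linear index itself,
# so a one-unit change in a regressor shifts the log-odds by its coefficient,
# while the change in the probability itself depends on where on the curve we are.
for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
    assert abs(log_odds(G(z)) - z) < 1e-9
    print(f"z = {z:5.2f}  P = {G(z):.4f}  log-odds = {log_odds(G(z)):.4f}")
```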
2.2.1 Interaction effect as difference in APE:s
A test of the coefficients of the interaction terms tells us whether their inclusion improves the fit of the model. However, it tells us nothing about the general effect that immigration has on natives of different skill levels (Karaca-Mandic, Norton, and Dowd 2012). The coefficient of an interaction term may even have the opposite sign of the actual interaction effect (Ai and Norton 2003). Furthermore, the interaction effect may vary with the predicted probability, being positive at some levels and negative at others. Therefore, the coefficients of the interaction terms in equation 1 are not of primary interest in this study. They are reported in the results, but little weight should be given to them.
The parameters of logistic regression models are difficult to interpret even when the model contains no interaction terms. In limited dependent variable models, the effect of a variable depends on the values of the included control variables (Ai and Norton 2003). That is, the same change in ImmCh has a different effect on the probability for different values of the controls. A common way to measure the size of an effect is the average partial effect (APE) of the variable (Woolridge 2013, p. 592). To test our hypothesis about the interaction effect, we use the APE: if the APE of ImmCh differs between low-skilled and high-skilled (or between medium-skilled and high-skilled) natives, then there is an interaction effect.
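To make the dependence on the controls concrete, one can differentiate equation 1 with respect to ImmCh. The step below is our own worked derivation under the logistic specification, with z_i denoting the full linear index inside G(·) in equation 1 and g = G′ the logistic density:

```latex
\frac{\partial P(SD = 1 \mid \text{lowskill}_i, \text{medskill}_i, \text{ImmCh}_i, V_i)}{\partial\,\text{ImmCh}_i}
  = g(z_i)\,\bigl(\beta_2 + \beta_3\,\text{lowskill}_i + \beta_5\,\text{medskill}_i\bigr),
\qquad g(z) = G(z)\bigl(1 - G(z)\bigr)
```

The bracketed term is constant within a skill group (for high-skilled natives it reduces to β_2), but g(z_i) depends on V_i through z_i. Two respondents with the same skill level and the same ImmCh_i therefore generally have different partial effects, which is why the effect is summarized by averaging.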
The APE of ImmCh for low-skilled natives is defined as:
\[
N_l^{-1} \sum_{i=1}^{N_l} \frac{\partial P(SD = 1 \mid \text{lowskill}_i, \text{ImmCh}_i, V_i)}{\partial\,\text{ImmCh}_i}
\]
Here N_l is the number of low-skilled natives in the sample, and the sum runs over them. Analogous formulas give the APE of ImmCh for the medium- and high-skilled natives; they are obtained by replacing lowskill with medskill or highskill (and N_l with N_m or N_h).
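The sketch below computes the APE of ImmCh separately for each skill group on a handful of made-up respondents. It assumes a simplified version of equation 1 with a single control x in place of the vector V; all coefficient values and data are hypothetical, chosen only to show the mechanics. The partial effect uses the analytical derivative g(z)(β2 + β3·lowskill + β5·medskill).

```python
import math
from statistics import mean


def G(z: float) -> float:
    """Logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))


# HYPOTHETICAL coefficients for a simplified equation 1 with one control x.
COEF = {"b0": -2.0, "b1": 0.3, "b2": 0.05, "b3": 0.04, "b4": 0.2,
        "b5": 0.01, "g1": 0.5, "g2": -0.1, "g3": 0.05}


def index(low, med, imm, x, c=COEF):
    """Linear index z_i inside G(.) in the simplified equation 1."""
    return (c["b0"] + c["b1"] * low + c["b2"] * imm + c["b3"] * low * imm
            + c["b4"] * med + c["b5"] * med * imm
            + c["g1"] * x + c["g2"] * low * x + c["g3"] * med * x)


def partial_effect(low, med, imm, x, c=COEF):
    """dP(SD = 1)/dImmCh for one respondent: g(z) * (b2 + b3*low + b5*med)."""
    p = G(index(low, med, imm, x, c))
    return p * (1.0 - p) * (c["b2"] + c["b3"] * low + c["b5"] * med)


def ape(rows, c=COEF):
    """Average partial effect of ImmCh over the respondents in one skill group."""
    return mean(partial_effect(*row, c=c) for row in rows)


# Toy respondents: (lowskill, medskill, ImmCh, x); high-skilled have low = med = 0.
data = [(1, 0, 2.0, 0.1), (1, 0, 5.0, -0.3), (0, 1, 3.0, 0.4),
        (0, 1, 1.0, 0.0), (0, 0, 4.0, 0.2), (0, 0, 2.5, -0.1)]
groups = {
    "low": [r for r in data if r[0] == 1],
    "med": [r for r in data if r[1] == 1],
    "high": [r for r in data if r[0] == 0 and r[1] == 0],
}
apes = {name: ape(rows) for name, rows in groups.items()}
print(apes)
```

Subtracting apes["high"] from apes["low"] (or from apes["med"]) gives the sample analogue of the difference in APE:s on which the test is built.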
To study whether the effect of ImmCh differs across skill levels, we compare the APE:s of ImmCh for the different skill levels. The difference between the APE:s of ImmCh for lowskill and highskill is given by⁵:
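\[
N_l^{-1} \sum_{i=1}^{N_l} \frac{\partial P(SD = 1 \mid \text{lowskill}_i, \text{ImmCh}_i, V_i)}{\partial\,\text{ImmCh}_i}
- N_h^{-1} \sum_{i=1}^{N_h} \frac{\partial P(SD = 1 \mid \text{highskill}_i, \text{ImmCh}_i, V_i)}{\partial\,\text{ImmCh}_i} \qquad (2)
\]
The difference between the APE:s of ImmCh for medskill and highskill is given by: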
\[
N_m^{-1} \sum_{i=1}^{N_m} \frac{\partial P(SD = 1 \mid \text{medskill}_i, \text{ImmCh}_i, V_i)}{\partial\,\text{ImmCh}_i}
- N_h^{-1} \sum_{i=1}^{N_h} \frac{\partial P(SD = 1 \mid \text{highskill}_i, \text{ImmCh}_i, V_i)}{\partial\,\text{ImmCh}_i} \qquad (3)
\]
With these concepts and variables defined, the hypotheses can be restated in more formal terms. The test concerns the difference in the APE:s of ImmCh between low-skilled and high-skilled natives, and between medium-skilled and high-skilled natives. That is, the quantities in equations 2 and 3 are expected to differ from zero; more specifically, the theoretical reasoning above implies that both should be larger than zero. To perform the test, we need the standard errors of the differences in APE:s. These are obtained as the asymptotic standard errors derived by the delta method⁶.
5 The difference in APE:s is what Ai and Norton (2003) refer to as the mean interaction effect, and what Karaca-Mandic, Norton, and Dowd (2012) call the average of the cross partial derivatives, when one part of the interaction is a dummy.
6 With one parameter, the delta method approximates the variance of a function h(α̂) of an estimated parameter α̂ as $\widehat{\mathrm{Var}}(h(\hat{\alpha})) \approx \widehat{\mathrm{Var}}(\hat{\alpha})\,[h'(\hat{\alpha})]^2$. See for example Woolridge (2010) for a description of how the delta method is used to calculate the asymptotic standard errors for the APE:s.
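The one-parameter formula generalizes to a vector of estimates θ̂ with covariance matrix V as Var(h(θ̂)) ≈ ∇h(θ̂)′ V ∇h(θ̂). In the setting above, h would be a difference in APE:s viewed as a function of all the logit coefficients, and V their (cluster-robust) covariance matrix. The sketch below is our own illustration using a finite-difference gradient, not the routine used by any particular statistical package; the parameter values are hypothetical.

```python
import math


def numerical_gradient(h, theta, eps=1e-6):
    """Central-difference gradient of a scalar function h at the vector theta."""
    grad = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += eps
        down[j] -= eps
        grad.append((h(up) - h(down)) / (2.0 * eps))
    return grad


def delta_method_var(h, theta_hat, cov):
    """Approximate Var(h(theta_hat)) by grad' * cov * grad."""
    g = numerical_gradient(h, theta_hat)
    k = len(g)
    return sum(g[r] * cov[r][c] * g[c] for r in range(k) for c in range(k))


# One parameter, as in the footnote, with the hypothetical choice h(alpha) = exp(alpha).
alpha_hat, var_alpha = 0.4, 0.01
approx = delta_method_var(lambda t: math.exp(t[0]), [alpha_hat], [[var_alpha]])
footnote_formula = var_alpha * math.exp(alpha_hat) ** 2  # Var(alpha_hat) * [h'(alpha_hat)]^2
print(approx, footnote_formula)

# Two parameters: a contrast theta1 - theta2, the same kind of quantity as a difference in APE:s.
cov = [[0.04, 0.01], [0.01, 0.09]]
var_diff = delta_method_var(lambda t: t[0] - t[1], [0.3, 0.1], cov)
# Var1 + Var2 - 2*Cov = 0.04 + 0.09 - 0.02 = 0.11 (up to floating-point error)
print(var_diff, math.sqrt(var_diff))
```

Taking the square root of the approximated variance gives the standard error used to test whether the difference is zero.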
2.3 Cluster-robust standard errors
The clustered structure of the data must be taken into account in some way. However, cluster-robust standard errors are sometimes used for the wrong reasons. According to Abadie et al. (2017), two reasons can motivate their use. First, if the sample is stratified rather than purely random, clustered standard errors can account for dependence within the strata. Second, cluster-robust standard errors are warranted if the treatment is assigned at the cluster level.
The first motivation does not apply here, since the sample is a simple random sample (Jansson, Tipple, and Weissenbilder 2018). The second one does, however, since the ”treatment”, immigration, was assigned to clusters of natives living in the same municipality. Following the advice of Abadie et al. (2017), we therefore use cluster-robust standard errors, clustered at the municipality level.
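The sandwich logic behind cluster-robust standard errors is easiest to see in the linear case. The sketch below is a simplified illustration, not the logit estimator used in the thesis: it regresses a binary outcome on a municipality-level regressor and sums the score contributions within each municipality before squaring, which allows the residuals of respondents in the same municipality to be correlated. It applies no finite-sample correction (the CR0 variant); statistical packages usually add one. The data are made up.

```python
from collections import defaultdict


def ols_slope_with_cluster_se(x, y, cluster):
    """OLS of y on x with an intercept; returns the slope and its CR0 cluster-robust SE."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xt = [xi - xbar for xi in x]  # demeaned regressor (Frisch-Waugh)
    sxx = sum(v * v for v in xt)
    slope = sum(a * (yi - ybar) for a, yi in zip(xt, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # "Meat" of the sandwich: sum the scores within each cluster, then square.
    score = defaultdict(float)
    for a, u, g in zip(xt, resid, cluster):
        score[g] += a * u
    var = sum(s * s for s in score.values()) / sxx ** 2
    return slope, var ** 0.5


# Hypothetical respondents: ImmCh is constant within each municipality (the treatment is clustered).
imm_ch = [2.0, 2.0, 2.0, 5.0, 5.0, 5.0, 8.0, 8.0, 8.0]
sd_vote = [0, 0, 1, 0, 1, 1, 1, 1, 0]
municipality = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
print(ols_slope_with_cluster_se(imm_ch, sd_vote, municipality))
```

When every respondent forms a cluster of their own, the estimator collapses to the usual heteroscedasticity-robust (HC0) standard error.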
2.4 Alternative methods
Generally, when studying causal relationships with a binary outcome, two major approaches are available: probit and logistic regression. This study uses a logistic regression model, but the results would be very similar if probit were used instead.
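One way to see why the two approaches give similar results is that the standard normal CDF is very close to a rescaled logistic CDF. The sketch below checks the well-known approximation Φ(z) ≈ Λ(1.702 z) on a grid; the scaling constant is a textbook approximation, not something estimated in this study.

```python
import math


def logistic_cdf(z: float) -> float:
    """Logistic CDF, the link used in logistic regression."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the link used in probit."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Largest absolute gap between Phi(z) and the logistic CDF at 1.702 * z on [-6, 6].
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(1.702 * z)) for z in grid)
print(f"largest gap: {max_gap:.4f}")
```

Because the two links differ mainly in scale, logit coefficients are roughly 1.6 to 1.8 times the corresponding probit coefficients, while predicted probabilities and partial effects are nearly identical.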
Another approach is the linear probability model (LPM), which estimates effects very similar to the difference in APE:s. Although the LPM may predict probabilities below 0 or above 1, it is useful in some instances. As a pedagogical tool, as well as a robustness check, output from a linear probability model is provided in the results section. The estimated model is:
\[
P(SD = 1) = \beta_0 + \beta_1\,\text{lowskill}_i + \beta_2\,\text{ImmCh}_i + \beta_3\,\text{lowskill}_i \times \text{ImmCh}_i + \beta_4\,\text{medskill}_i + \beta_5\,\text{medskill}_i \times \text{ImmCh}_i + \gamma_1 V_i + \gamma_2\,\text{lowskill}_i \times V_i + \gamma_3\,\text{medskill}_i \times V_i + u_i \qquad (4)
\]
The model is estimated with OLS. The LPM relies on the same assumptions as linear regression, but it necessarily violates some of them, such as homoscedastic error terms⁷. The LPM assumes linearity and therefore implies constant partial effects. Consequently, its marginal effects may be close to those estimated by a logistic regression at average predicted probabilities, but farther from them at small and large predicted probabilities.
7 The linear probability model suffers from inherent heteroscedasticity and non-normally distributed error terms. See for example Woolridge (2010, pp. 454–457) for a more thorough description of both the issues related to the linear probability model and its usefulness.
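The contrast between constant and varying partial effects can be made concrete with a small calculation. In a logit, the partial effect of a regressor with coefficient β is β·p(1 − p) at predicted probability p, whereas the LPM uses a single slope everywhere. The same factor p(1 − p) is the conditional variance of a binary outcome, Var(u | x) = p(x)(1 − p(x)), which is why the LPM error is heteroscedastic by construction. The coefficient below is hypothetical.

```python
def logit_partial_effect(beta: float, p: float) -> float:
    """Partial effect of a regressor with logit coefficient beta at predicted probability p."""
    return beta * p * (1.0 - p)


def lpm_error_variance(p: float) -> float:
    """Conditional variance of the LPM error when P(y = 1 | x) = p."""
    return p * (1.0 - p)


beta_logit = 0.8  # HYPOTHETICAL logit coefficient
for p in (0.02, 0.10, 0.50, 0.90, 0.98):
    print(f"p = {p:.2f}: logit partial effect = {logit_partial_effect(beta_logit, p):.4f}, "
          f"Var(u | x) in an LPM = {lpm_error_variance(p):.4f}")
```

The logit partial effect peaks at β/4 when p = 0.5 and shrinks towards zero near 0 and 1, so a constant LPM slope can match it reasonably well at intermediate probabilities but not in the tails.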
Since the aim of this study is to explore an interaction between a municipality-level and an individual-level variable, it would also be possible to use a multi-level (hierarchical) model, such as a multi-level logistic or probit regression. Such an approach could capture a cross-level interaction, for example through a so-called ”varying-intercept, varying-slope model” (Gelman and Hill 2007, p. 314)⁸. Such a model might capture more of the heterogeneity and may be preferable if the main aim is to construct a strong predictive model. However, the number of observations differs considerably between municipalities, which generally yields inaccurate estimates in multi-level models, especially when the dependent variable is binary (Hox 2010, pp. 235–237)⁹. Since the aim of this thesis is to test a particular hypothesis about an interaction rather than to build a strong predictive model, an ordinary logistic regression model is used instead of a multi-level model.
8 A possible alternative would be a multi-level logistic regression model, for example:
ln( 1−P (y P (y
i=1)
i