
To vote, or not to vote? Understanding voter turnout patterns

Academic year: 2021


To vote, or not to vote? Understanding voter turnout patterns

Constructing, interpreting and comparing logistic regression models measuring voter turnout in German federal elections

By Filip Hellberg and Elliott Syrén

Department of Statistics

Uppsala University

Supervisor: Martin Solberger


Abstract

The core of any democracy rests on the principle of entrusting citizens to elect their leaders. Yet no country has ever achieved total voter turnout. This paper aims to better understand the differences between people who vote and those who choose not to. We look closer at the case of Germany, a country that on paper seems to have the right conditions for high voter turnout yet has achieved only mediocre turnout in recent elections. To do so, we make use of cross-sectional data collected by the World Values Survey in 2013. We construct, interpret and compare 10 different models based both on previous research on voter turnout and on our own intuition. Using logistic regression, we aim to further the understanding of how a range of different variables affect the likelihood of voting in German federal elections. Furthermore, some of the presented models are used to predict whether or not individuals voted, both as a way of comparing the models and as a means of evaluating their applicability. The paper presents a benchmark model consisting of two variables that also serve as control variables in some of the other models: age and education. By extensively analysing these two variables, we find that they seem to improve predictive power by increasing a model's ability to detect nonvoters. Most models had great difficulty finding nonvoters within the dataset, reinforcing our perception that voter turnout is a complex phenomenon to understand at the micro level. Despite this, we were able to distinguish which variables and models appear more appropriate, convenient and suitable for understanding voter turnout in German federal elections based on 2013 data, hopefully by extension unveiling implications for the general case of voter turnout in industrialized, well-developed western democracies.

Keywords

Voter turnout, Logistic regression, Confusion matrix, Receiver operating characteristic (ROC), World Values Survey (WVS), R applications

Acknowledgements


Table of Contents

1 Introduction
1.1 Purpose of this study
1.2 Research question
2 Logistic Regression
2.1 Goodness of fit statistics
2.2 A test for validity of a Likert scale
3 Data
3.1 The response variable
3.2 Predictor variables: Wage and employment
3.3 Predictor variables: Satisfaction with democracy
3.4 Predictor variable: Political interest
3.5 Some other common predictors – control variables
4 Results
4.1 A benchmark model
4.2 Models based on wage and employment
4.3 Models based on satisfaction with democracy
4.4 Models incorporating political interest
5 Comparing and evaluating models using predictions
5.1 Predictions: Benchmark model
5.2 Predictions: Wage and employment model
5.3 Predictions: Satisfaction with democracy model
5.4 Predictions: Models incorporating political interest
6 Conclusions


1 Introduction

Voter turnout is essential for the legitimacy of any functioning democracy and can be seen as one of the conceptual foundations of representative democracy itself. This paper aims to investigate how different variables from the World Values Survey (WVS) Wave 6, collected in 2013, affect the likelihood of voting in federal elections in Germany, using logistic regression. Additionally, and at least as importantly, we aim to construct, interpret and compare different models to see which model(s) best explain voter turnout. Based on previous research from countries all over the world, variables said to correlate positively or negatively with voter turnout will be applied to the case of Germany, both to see whether the results can be reproduced and to compare the effectiveness of different models. Of course, countries differ from one another, and cultural differences are at times hard to measure or account for, making voting patterns look very different from case to case. Hence, some theories and patterns observed in other countries could differ when applied to the case of Germany, making it interesting to see whether there are factors that are closer to universal in their effects on voter turnout.

We have chosen to look closer at Germany because voter turnout in the 2009 election was at an all-time low, even though most people today would consider the country more democratic than it was 30 years ago. Germany is an industrialized country with, at least in modern history, a relatively well-educated population and a strong economy. To many political scientists, this would reasonably imply high voter turnout. However, turnout was 70.9 percent in 2009 and 71.5 percent in the 2013 general election, the same year our data was collected. Since then, another general election has been held with a higher turnout of 76.2 percent, which is still not remarkable compared to other western democracies, especially considering that Germany is one of the world's largest and most developed economies and a key actor within the European Union.

1.1 Purpose of this study

The purpose of this paper is to better understand which variables affect the likelihood that a German citizen votes, both by interpreting and analysing parameters and by comparing different models using a variety of measures. The most detailed approach we have chosen to work with is evaluating our models by predicting whether or not individuals within the dataset voted. In doing so, we can compare which model most accurately classifies whether one votes based on the parameters we have investigated.


Ultimately, we aim to conclude which model best explains voter turnout in Germany and which model most accurately predicts whether or not an individual voted. It is important to note that the models based on previous research may differ in exact form and included variables, since we work with a different dataset than those used to construct the models in the papers cited. This might to some extent be a contributing factor if our results differ from previously published ones.

1.2 Research question

While preparing how best to understand voter turnout using a quantitative statistical methodology, we concluded that there are two overarching goals we would like to achieve with this paper: understanding which variables affect voter turnout, and evaluating which model most effectively predicts whether a person voted or not. The research questions we believe best capture the aim of this paper are formulated as follows:

What variables are significant and accurately explain whether or not someone votes in German federal elections?

And,

What model most effectively predicts voter turnout in German federal elections?


2 Logistic Regression

This paper will make use of logistic regression as a tool to understand the relationship between voter turnout and a range of different variables. Just as with linear regression, logistic regression models the mean of the dependent variable given a number of independent variables. However, the dependent variable in logistic regression is binary – "success" or "failure" – rather than continuous in nature. Logistic regression is useful when the measured variable is, for example, "happy or unhappy", "Democrat or Republican", or, as in our case, whether one votes or not in German federal elections. It is important to note that in the general case, "success" is not necessarily associated with something inherently good; it could for instance indicate the presence of a disease or the occurrence of a crime.

Let $Y$ denote the response variable, and let the two possible outcomes of $Y$ be 0 ("failure") and 1 ("success"). Then the dependent variable follows a Bernoulli distribution with probability of "success" $P(Y = 1) = \pi$, say, and probability of "failure" $P(Y = 0) = 1 - \pi$ (Fitzmaurice and Laird, 2001, p. 10225). Note that $\pi$ is the mean of $Y$. Suppose we have a set of $p$ predictors, $x_1, x_2, \ldots, x_p$. Logistic regression aims to describe the impact on $\pi$ of changes in the predictor variables based on the regression

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p, \qquad (1)$$

where $\log(\cdot)$ is the natural logarithm. The left-hand side of Equation (1) is often referred to as the logit function, or the log-odds. Log-odds are essentially the logarithm of the odds of an event occurring, which is neither very intuitive nor easy to interpret. Instead, we can exponentiate to obtain the odds, $\frac{\pi}{1-\pi}$; that is, the odds can be understood as the probability of an event occurring divided by the probability of it not occurring. If the probability of an event occurring is 0.8, the odds are $\frac{0.8}{1-0.8} = 4$ (Fitzmaurice and Laird, 2001, p. 10222). The logistic model can also be presented as the probability of success:

$$P(Y = 1) = \pi = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p)}.$$
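To make the logit, odds and probability conversions concrete, here is a minimal Python sketch (the paper's computations are done in R; the coefficient values below are invented purely for illustration):

```python
import math

def probability(log_odds):
    """Invert the logit: pi = exp(eta) / (1 + exp(eta))."""
    return math.exp(log_odds) / (1 + math.exp(log_odds))

def odds(pi):
    """Odds: probability of the event divided by that of the non-event."""
    return pi / (1 - pi)

print(odds(0.8))                    # 0.8 / 0.2 = 4, as in the text

# Hypothetical logit b0 + b1*age with b0 = -1.5, b1 = 0.05, at age 40:
eta = -1.5 + 0.05 * 40
print(round(probability(eta), 3))   # 0.622
```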

The independent variables in a logistic regression model can be either metric or nonmetric, and their coefficients represent the effect on the dependent variable per unit increase or decrease in the respective variable. Nonmetric refers to a variable type that is not numerically measurable, such as gender or employment status, as opposed to metric variables that are numerically measurable, such as age measured in years (Hair et al., 2014, pp. 313-314). For instance, smoking could affect the likelihood of an individual being classified into the group of people within a dataset with a heart disease. The method can be used to predict a multitude of things with a binary outcome.

The logistic regression parameters $\beta_j$ ($j = 1, 2, \ldots, p$) have the interpretation as the change in log-odds of success for a unit change in $x_j$, given that the remaining predictors are held fixed. Equivalently, a unit increase in $x_j$ changes the odds of success by a factor of $\exp(\beta_j)$, often referred to as an "odds ratio" (Fitzmaurice and Laird, 2001, p. 10224). The parameters can be estimated by maximum likelihood, which chooses the values of the logistic regression coefficients that are most likely to have generated the observed data. This is done by maximizing the likelihood function for the data; since $Y$ is binary, the relevant probability distribution is the Bernoulli distribution. Suppose we have $N$ independent observations on the response variable, $(y_1, y_2, \ldots, y_N)$, and the predictor variables $(x_{i1}, x_{i2}, \ldots, x_{ip};\ i = 1, 2, \ldots, N)$. The likelihood function to be maximized can then be expressed as

$$\prod_{i=1}^{N} \pi_i^{y_i}(1-\pi_i)^{1-y_i}, \qquad (2)$$

where, from the probability of success,

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip})}{1 + \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip})}.$$

The coefficients that maximise the value of the likelihood function (2) are the maximum likelihood estimates. This maximisation requires an iterative procedure. In this paper, we use R to maximize the likelihood function.1
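The iterative procedure can be sketched outside R as well. Below is a minimal Newton-Raphson maximisation of likelihood (2) for a one-predictor model, fitted to data we simulate ourselves; all names and values are our own illustration, not the paper's:

```python
import math, random

def fit_logistic(xs, ys, iters=25):
    """Newton-Raphson maximisation of the Bernoulli likelihood (2)
    for a model with an intercept b0 and a single slope b1."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            z = max(min(b0 + b1 * x, 35.0), -35.0)  # guard against overflow
            pi = 1.0 / (1.0 + math.exp(-z))
            w = pi * (1.0 - pi)
            g0 += y - pi                 # score for b0
            g1 += (y - pi) * x           # score for b1
            h00 += w                     # observed information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: beta += I^{-1} * score
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Simulated data: older "respondents" vote more often (true b0 = -2, b1 = 0.08).
random.seed(1)
ages = [random.randint(18, 90) for _ in range(500)]
votes = [1 if random.random() < 1 / (1 + math.exp(2 - 0.08 * a)) else 0 for a in ages]
b0, b1 = fit_logistic(ages, votes)
print(round(b0, 2), round(b1, 3))  # estimates should land near -2 and 0.08
```

With real data, the same estimates would come from R's `glm()` with a binomial family.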

It is perhaps tempting to analyse the relationship between the mean of $Y$ and the predictors by a linear model of the form

$$E(Y \mid x_1, x_2, \ldots, x_p) = \pi = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \qquad (3)$$

However, probabilities are restricted to lie between 0 and 1, and expressing the probability of "success" as a linear function of the predictors violates this restriction. Additionally, we would expect a nonlinear relationship between $\pi$ and the predictors, so that a given change in a predictor corresponds to a smaller change in $\pi$ when $\pi$ is very low or very high. Below, two visual examples show how a logit model better captures the relationship between independent variables and a binary dependent variable. The observations are not taken from an actual dataset but were simulated by us for the sole purpose of illustrating why a logit model is preferable for the purposes of our paper.


Figure 2.1: An OLS regression model describing the relationship between voter turnout and age (simulated data).

In this example, simulated age is represented on the horizontal axis while simulated voter turnout is represented on the vertical axis. First, we present an ordinary least squares (OLS) regression based on (3). A graph of the fitted line is shown in Figure 2.1.

Here, we can clearly see that the relationship between the two variables is misrepresented as a result of voter turnout being a binary variable. As there are only two possible outcomes for the dependent variable, all observations are distributed between these two points (1 and 0) and an arbitrary line is produced in between. Furthermore, as mentioned by Hair et al. (2014, pp. 317-318), the error term of a model with a binary dependent variable such as the one plotted above follows a binomial distribution (of which the Bernoulli distribution is a special case) rather than a normal distribution, thus violating the normality assumption needed to perform OLS regression. Additionally, due to the binary nature of the dependent variable, the variance is not constant, resulting in concerning levels of heteroscedasticity that make any results from this model unreliable to interpret.


Figure 2.2: A logit model describing the relationship between voter turnout and age (simulated data).

In this graph we see that at very low values of the predictor variable (here representing age) the curve approaches but never reaches 0, while at very high values it approaches but never reaches 1. The model's predictions range between 0 and 1, where, as mentioned (and seen in the graph), 1 represents the event occurring and 0 the non-event; in our example, whether or not a person votes. Thus, the model in essence predicts the probability that an event occurs based on different characteristics (independent variables) of a person or observation. Here, the nonlinearity of the binary response variable is accounted for, and we are no longer using a methodology that requires a normality assumption to explain the relationship between voter turnout and age, as was the case in the OLS example.


As a first check for multicollinearity, we study the pairwise correlations between our predictor variables. According to Hair et al. (2014, pp. 196-197), correlations below 0.9 are not an issue of concern. When a model contains more than two variables, we also study the variance inflation factor (VIF). The VIF is calculated as

$$VIF = \frac{1}{1 - R^2},$$

where $R^2$ is the regular R-square from a regression of one predictor on the other predictors within a linear model. In essence, the VIF is calculated from the $R^2$ of a regression that omits the response variable and instead regresses one predictor of interest on the others. We will use R to calculate VIF values in order to pinpoint whether our predictors show signs of multicollinearity and should therefore be omitted.2 The VIF indicates to what extent the variance of a coefficient estimate is inflated by multicollinearity (Midi et al., 2010, p. 259). For this paper, a value of 10 or above is considered too high to proceed with further examination or regression (Hair et al., 2014, pp. 200-201). However, as noted by Midi et al. (2010, p. 259), values above 2.5 may be troublesome for "weaker" models, which according to them is often the case for logistic regression models; we will therefore closely examine models including predictors with VIF values above 2.5 and consider their exclusion.
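With only two predictors, the auxiliary $R^2$ is simply the squared correlation between them, so the VIF reduces to $1/(1 - r^2)$. A small Python sketch on data we simulate ourselves (the variable names are our own, hypothetical illustration):

```python
import math, random

def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R^2); with one other predictor, R^2 = r^2 (squared correlation)."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    r = cov / math.sqrt(var1 * var2)
    return 1 / (1 - r ** 2)

random.seed(7)
age = [random.uniform(18, 90) for _ in range(300)]
# A hypothetical predictor strongly entangled with age -> inflated VIF:
seniority = [0.9 * a + random.gauss(0, 5) for a in age]
print(round(vif_two_predictors(age, seniority), 1))  # far above the 2.5 caution level
```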

Like other regression methods, logistic regression is sensitive to extreme values, which can sway the results so that the relationship between unusually high or low values and the dependent variable is represented in a disproportionate and inaccurate way.

As logistic regression makes use of maximum likelihood estimation, it does not necessarily require large sample sizes. However, Hosmer et al. (2013, p. 167) recommend sample sizes greater than 400 in order to ensure that goodness of fit measures are effective. As we will elaborate on later in the paper, they also recommend splitting the dataset, using one part for analysis and the other as a holdout to control and analyse the results, thus doubling the total sample size needed. Another recommendation is that the sample should contain at least 10 observations per category of each estimated parameter (Hair et al., 2014, pp. 318-319). Here, a category within an estimated parameter refers to, for example, "unemployed" within a variable describing employment status.


2.1 Goodness of fit statistics

To evaluate our models, we use different goodness of fit statistics. Among these is McFadden's pseudo-$R^2$,

$$R^2_{McF} = 1 - \frac{\log(L_M)}{\log(L_0)},$$

where $L_M$ is the likelihood from the model we estimate, and $L_0$ is the likelihood from the same model with only an intercept and no other predictors. $L_0$ is comparable to the residual sum of squares in linear regression (McFadden, 1973, p. 121). If all the slope parameters are equal to 0, then $R^2_{McF}$ will also be equal to 0. However, $R^2_{McF}$ will never be equal to 1 (Hu et al., 2006, pp. 847-848). This formula corresponds to a proportional reduction in "error variance", and a value of 0.2 to 0.4 indicates excellent fit (McFadden, 1977, p. 35).
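Computing $R^2_{McF}$ requires only the two maximised log-likelihoods. A small sketch (the log-likelihood values below are invented for illustration):

```python
def mcfadden_r2(loglik_model, loglik_null):
    """R2_McF = 1 - log(L_M) / log(L_0), from the two maximised log-likelihoods."""
    return 1 - loglik_model / loglik_null

# Hypothetical fit: the intercept-only model has the lower log-likelihood.
print(round(mcfadden_r2(-480.2, -610.5), 3))  # 0.213 -> within the 0.2-0.4 range
```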

We will formally test the goodness of fit with the Hosmer-Lemeshow test. This test divides the data into groups, where 10 groups is a commonly used division and the number of groups we have chosen to use when performing this test throughout the paper (see Hosmer et al., 2013, chapter 5). The first group contains the 10% with the lowest probability of voting and the last group contains the top 10% most likely to vote. The test then compares the expected outcome of the grouping with the actual outcome. If the expected and actual outcomes are similar, the model fits the data and the null hypothesis of "good fit" should not be rejected (Hosmer et al., 2013, pp. 158-160).

The null and alternative hypotheses are

$H_0$: The logistic regression model fits the data well

versus

$H_A$: The logistic regression model does not fit the data well

Let $g$ be the number of groups and let $n_k$ be the number of subjects in the $k$th group ($k = 1, 2, \ldots, g$). Additionally, let $\boldsymbol{x} = (x_1, x_2, \ldots, x_p)$ be a vector of the predictors. The test statistic (Hosmer et al., 2013, Section 5.2) is constructed as

$$\hat{C} = \sum_{k=1}^{g} \frac{(o_{1k} - n_k \bar{\pi}_k)^2}{n_k \bar{\pi}_k (1 - \bar{\pi}_k)},$$

where $o_{1k} = \sum_{j=1}^{c_k} y_j$ is the sum of subjects for which $y = 1$, $c_k$ is the number of covariate patterns in the $k$th group,3 and $\bar{\pi}_k$ is the average estimated probability in the $k$th group,

3 As an example, if a model contains sex and race, each coded at two levels, there are four possible covariate patterns.


$$\bar{\pi}_k = \frac{1}{n_k} \sum_{j=1}^{c_k} m_j \hat{\pi}_j,$$

where $\hat{\pi}_j$ is the estimated probability of success from the logit of the $j$th covariate pattern, and $m_j$ is the number of vectors with $\boldsymbol{x} = \boldsymbol{x}_j$, for $j = 1, 2, \ldots, J$, where $J$ is the number of distinct values of $\boldsymbol{x}$ observed. So, for example, if one or more observations within the dataset share the same age and education level (in a dataset containing only these two variables), then these observations result in a single distinct vector. Suppose this vector is $\boldsymbol{x}_1$. Then $m_1$ is the number of subjects that share this observed vector. Similarly, $m_j$ denotes the number of subjects that share vector $\boldsymbol{x}_j$. Note that $o_{1k}$ is the observed number of outcomes in decile $k$, while $n_k \bar{\pi}_k$ is the predicted, or expected, number of outcomes in decile $k$. We will not calculate the test statistic for our models manually, but will instead calculate it in R.4

Under the null hypothesis, the distribution of the test statistic is approximately Chi-squared with degrees of freedom equal to the number of groups minus 2 (Hosmer et al., 2013, p.158). We will use the 5% significance level to reject the null hypothesis.
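To make the grouping concrete, here is a small Python sketch of the statistic (the paper computes it in R; the data below are simulated, and for simplicity this sketch groups individual observations by sorted predicted probability rather than by covariate patterns):

```python
def hosmer_lemeshow_stat(probs, outcomes, g=10):
    """C-hat: sort by predicted probability, split into g groups, and sum
    (observed - expected)^2 / (n_k * pibar_k * (1 - pibar_k)) over groups."""
    pairs = sorted(zip(probs, outcomes))
    n = len(pairs)
    stat = 0.0
    for k in range(g):
        group = pairs[k * n // g:(k + 1) * n // g]
        nk = len(group)
        pibar = sum(p for p, _ in group) / nk   # average estimated probability
        o1k = sum(y for _, y in group)          # observed successes in group k
        stat += (o1k - nk * pibar) ** 2 / (nk * pibar * (1 - pibar))
    return stat

# Simulated, well-calibrated predictions; outcomes drawn from the stated probabilities.
import random
random.seed(3)
probs = [random.uniform(0.05, 0.95) for _ in range(200)]
outcomes = [1 if random.random() < p else 0 for p in probs]
stat = hosmer_lemeshow_stat(probs, outcomes)
print(round(stat, 2))  # compared against chi-square with g - 2 = 8 df (5% cutoff 15.51)
```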

There are some problems associated with the use of the Hosmer-Lemeshow test. One considerable issue is how best to choose the number of groups used to perform the test. Several studies that we have come across show that this choice has great influence on whether the model is classified as a good fit or not (Hosmer et al., 1997, p. 967). While these concerns are important to note, we have decided to use the test with a healthy amount of caution, as it is a frequently used and recognised test that we believe is useful for evaluating our models.

When comparing models with different numbers of parameters, we can use the Akaike Information Criterion (AIC). The following formula is used to calculate the AIC:

$$AIC = -2L + 2(p + 1),$$

where $L$ is the log-likelihood function and $p$ is the number of regression coefficients. Generally, lower AIC scores are preferred when comparing two models. According to Hosmer et al. (2013, pp. 120-121), there is no statistical test for comparing two AIC values.
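As a small illustration of the comparison (the log-likelihood values are made up):

```python
def aic(loglik, p):
    """AIC = -2L + 2(p + 1), with L the maximised log-likelihood and
    p the number of regression coefficients."""
    return -2 * loglik + 2 * (p + 1)

# Hypothetical comparison: a 2-predictor model versus a 4-predictor model.
print(round(aic(-480.2, 2), 1))  # 966.4
print(round(aic(-478.9, 4), 1))  # 967.8 -> the smaller model wins on AIC here
```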

2.2 A test for validity of a Likert scale

Several of the variables that we have chosen to use are ordinal, meaning that the variable has natural, ordered categories while the distances between categories are unknown. An example could be how interested one is in politics on a scale from 1 to 4, where 1 is very interested and 4 is not at all interested. Here, we know that there is a natural order from 1 to 4, but we cannot presume a set distance between two or more categories. When creating a model including an ordinal-scale variable, we could make dummy variables for all categories. However, that would consume degrees of freedom, require larger sample sizes and increase the overall number of variables in a model, often causing parameters to become statistically insignificant. Therefore, we will perform likelihood ratio tests to see whether we can treat variables like these as Likert-scale variables and assume equidistance (Christensen, 2015, p. 12). In other words, in the example above we can test whether the distance between 1 and 2 is equal to the distance between 3 and 4. If that is the case, we can technically use the variable much as if it were continuous, thus reducing the overall size of our models.

The test pits the following hypotheses against each other:

$H_0$: The Likert scale is valid, i.e., equidistance is valid

versus

$H_A$: The Likert scale is not valid, i.e., equidistance is not valid

To perform the test, we estimate two models: an unrestricted logistic regression model where the variable of interest enters through dummies for its categories, and a restricted model where the variable of interest is not categorised. The likelihood ratio statistic is constructed as

$$LR_{obs} = 2[\log(L_{UR}) - \log(L_R)],$$

where $L_{UR}$ is the likelihood from the unrestricted model and $L_R$ is the likelihood from the restricted model. Under the null hypothesis, the statistic is Chi-square distributed, $LR_{obs} \sim \chi^2_r$, where $r$, the degrees of freedom, is equal to the number of restrictions (Christensen, 2015, p. 12). Again, we use the 5% significance level. Thus, we estimate the two models to obtain $LR_{obs}$ and compare it with the corresponding Chi-square critical value.
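The mechanics can be sketched in a few lines (the log-likelihood values are hypothetical; the 5% critical value for 2 degrees of freedom is 5.991):

```python
def lr_statistic(loglik_unrestricted, loglik_restricted):
    """LR = 2[log(L_UR) - log(L_R)], from the two maximised log-likelihoods."""
    return 2 * (loglik_unrestricted - loglik_restricted)

# Hypothetical values for a 4-category variable: the unrestricted model uses
# 3 dummies, the restricted model a single equidistant score -> r = 2 restrictions.
lr = lr_statistic(-479.1, -481.6)
critical_5pct = 5.991                    # chi-square 5% critical value, 2 df
print(round(lr, 2), lr > critical_5pct)  # 5.0 False -> equidistance not rejected
```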


3 Data

The data used for analysis in this paper comes from the WVS. There were 2,046 respondents in the original dataset. The number of missing values varies depending on which variables one looks at, as some variables contain more missing values than others. All in all, with the variables we chose to look closer at in this paper, the total number of observations after omitting all missing values is 1,844. The WVS has strict requirements for sampling. Samples must represent all people aged 18 and above. People are included if they are resident in the country, regardless of their nationality, citizenship or language. Countries that participate in the WVS can either draw a simple random sample of participants or combine it with a stratified selection. The data is mostly collected by face-to-face interview at the respondent's home or place of residence. Other methods used are paper questionnaires and computer-assisted personal interviews. Before the sampling is done, an explanation of the procedures must be approved by the WVS. The information about sampling presented above is taken both from the WVS website and from email correspondence with the WVS itself.

A possible hazard to be aware of is that all answers within the dataset are self-reported, meaning that a respondent could, for whatever reason, provide false information. While this is a concern worth pointing out, we believe that the WVS is a reliable source and we trust their sampling methods to produce results accurate enough for the purposes of this paper.

In order to evaluate our models, we have chosen to split our data into a training dataset and a test dataset, with 50% of the overall data in each. This is done randomly by our R software.5 We use the training dataset to construct our models and interpret their parameters. Later, we use the remaining 50% of the data, located in the test dataset, to see how well our models classify whether or not the individuals in the dataset voted. With the data split into two datasets, each set contains 922 observations, meaning that both datasets exceed our sample size requirements (see Section 2). Theoretically, as both datasets exceed the sample size requirement, there should not be any consequential differences between the two, as they should approximate the same population. Furthermore, since the split was done randomly and the sample size requirement is met in both instances, there should not be any critical differences between constructing and estimating models on 922 observations rather than all 1,844. Some differences are sure to present themselves, but we expect any estimates presented to be reasonably representative of a more general trend.
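A Python analogue of this seeded 50/50 split (the paper does this in R; the function name here is our own):

```python
import random

def split_half(observations, seed=455):
    """Randomly split the observations into equal-sized training and test sets.
    (The paper does this in R via set.seed(455); this is a Python analogue.)"""
    rng = random.Random(seed)
    shuffled = list(observations)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

data = list(range(1844))        # stand-ins for the 1,844 respondents
train, test = split_half(data)
print(len(train), len(test))    # 922 922
```

Because the generator is seeded, rerunning the split with the same seed reproduces the same partition.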

Before splitting the data, we performed an initial screening to ensure that there were no incorrectly coded values (e.g. a value of 11 on a scale stated to run from 1 to 10). We did not come across any instances where we determined there is a risk of this being the case. Furthermore, all values coded as "not applicable", "no answer", "not asked", etc. were deemed missing and were therefore omitted. These questionnaire options are coded as negative values by the WVS in the dataset, hence all data points containing a value lower than 0 were omitted.

5 The function 'set.seed(455)' is used in R to randomly distribute the observations into two datasets; reusing the same seed reproduces the split.

3.1 The response variable

The response variable used for all models presented in this paper is voter turnout, which is measured using what in the WVS is referred to as V227, where respondents are asked to answer the following question:

When elections take place, do you vote always, usually or never?

We have chosen to recode the answers "always" and "usually" to 1 and "never" to 0, since logistic regression requires a binary dependent variable. We merged "always" and "usually" both because they are intuitively closer to one another and because we are especially interested in singling out people in the "never" category in order to further analyse their traits and patterns.
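The recoding can be sketched as follows (a Python illustration; the paper's processing is done in R):

```python
def recode_turnout(answer):
    """Recode WVS V227: 'always'/'usually' -> 1, 'never' -> 0.
    Anything else ('no answer', 'not asked', ...) is treated as missing (None)."""
    mapping = {"always": 1, "usually": 1, "never": 0}
    return mapping.get(answer.strip().lower())

answers = ["Always", "usually", "never", "no answer"]
recoded = [recode_turnout(a) for a in answers]
print(recoded)  # [1, 1, 0, None]
```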

3.2 Predictor variables: Wage and employment

Charles and Stephens (2013) present a conclusion that might strike some as counterintuitive, namely that higher wages and employment rates result in lower voter turnout in US elections for governor, senator, Congress and state House of Representatives. They do not find any evidence that this affects turnout in the presidential election. While this result is based on data from the US, we nonetheless believe the relationship between voter turnout and wages and employment is of interest to look closer at.

Within the WVS dataset for Germany, we found the following variables that could in some way be related to either wage or employment for an individual:

V229: Whether the respondent is currently employed or not as well as what type of employment/reason for not being employed. The variable has a total of 11 categories but 3 are considered to be missing values: “don’t know”, “not applicable” or “no answer”. These and similar answers in other variables for other models have consistently been omitted from the dataset. After observations containing these 3 answers to this question were omitted, we were left with 8 categories: “full-time employed”, “part-time employed”, “self-employed”, “retired”, “housewife not otherwise employed”, “student”, “unemployed” and “other” (where other is listed as a ‘no’ answer to the question whether or not the person is currently unemployed). Hereinafter this variable will be referred to as “EmpStat” in tables etc.

V239: Income, measured on a scale from 1-10 on which the respondent states which income level they believe they fit into best. Hereinafter this variable will be referred to as


3.3 Predictor variables: Satisfaction with democracy

There are many studies that have come to the conclusion that if the citizens of a country are satisfied with the state of democracy within their country, then they are more likely to vote. However, a study by Ezrow and Xezonaki (2016) instead concludes that the relationship is in fact the opposite. Studying 12 countries, including Germany, they found that between 1976 and 2011, overall increases in citizens' satisfaction with democracy were associated with decreases in voter turnout in national elections. It is important to note that their study is based on over-time increases and might thus not be accurately captured by our cross-sectional data. However, we believe it is not unreasonable to assume that high levels of satisfaction can act as a sort of proxy for this relationship, considering that Germany's history implies that at least the majority of people should consider democracy more functional today than 30 years ago. Below we will try to see whether satisfaction with democracy has a positive effect on voting. However, the WVS questionnaire contains no question that measures this exactly. We have therefore chosen to look at the following indicators of democracy satisfaction and potentially use one or both of them as proxy variables:

V130: In this question the respondent was asked whether they consider having a democratic political system a "very good", "fairly good", "fairly bad" or "very bad" way of governing their country. Hereinafter this variable will be referred to as "HavDem" in tables etc.

V140: In this question the respondent was asked the question “How important is it for you to live in a country that is governed democratically?” on a scale from 1-10. Hereinafter this variable will be referred to as “ImpDem” in tables etc.

We reason that if a person believes that having a democratic system is less preferable than other systems of government, that person is probably not very satisfied with the current state of democracy. Similarly, if living in a democracy is not very important to a citizen, that should reasonably indicate that they are not very satisfied with the current state of democracy either.

3.4 Predictor variable: Political interest

In addition to the variables already presented and used, we have chosen to consider a variable which intuitively complements some of the others in capturing and explaining voter behaviour: political interest, which will be referred to as ”PolInterest” and within the WVS questionnaire is labeled as follows:


3.5 Some other common predictors – control variables

There are of course many variables that affect whether a person votes or not. However, it is neither desirable nor practically possible to include every single one, and a model with too many variables would not fit well either. That being said, we do not want to present a range of underspecified models, as this would result in both inaccurate interpretations and capricious predictions. Therefore, we control for a few variables that are usually checked for in other studies of voter turnout. In a study by Smets and van Ham (2013), the authors found that age and education were the most commonly included variables when studying voter turnout, appearing in about 70% of the studies they examined. Furthermore, gender was found in more than 25% of the studies and is a variable that is easy to work with and that makes sense to include from a demographic point of view. Therefore, we looked closer at these three variables, defined as follows in our WVS dataset:

V240: Sex, where males are coded as 0 and females are coded as 1. Hereinafter this variable will be referred to as “Sex” in tables etc.

V242: Age, simply the age of an individual. Hereinafter this variable will be referred to as

“Age” in tables etc.

V248: Education, a person's level of education categorised into 9 different categories:

1. No formal education
2. Incomplete primary school
3. Complete primary school
4. Incomplete secondary school: technical/vocational type
5. Complete secondary school: technical/vocational type
6. Incomplete secondary: university-preparatory type
7. Complete secondary: university-preparatory type
8. Some university-level education, without degree
9. University-level education, with degree


4 Results

Before presenting the results from our estimated models, we present a table of correlations between our predictor variables. Since each pairwise correlation is calculated independently of the other variables, it makes no difference whether the variables used in this paper are presented separately or together. Therefore, in order to reduce the number of tables in the paper, we have chosen to present a single correlation matrix containing all variables used within the entire paper in Table 4.1. Each time correlations are investigated throughout the paper, we will refer to this table.

Table 4.1: Correlations between all predictor variables within our dataset.

             PolInterest  HavDem  ImpDem  EmpStat  Income     Sex     Age  Education
PolInterest        1.000
HavDem             0.201   1.000
ImpDem            -0.255  -0.442   1.000
EmpStat            0.039  -0.023   0.033    1.000
Income            -0.199  -0.030   0.068   -0.174   1.000
Sex                0.188  -0.023  -0.010    0.117  -0.022   1.000
Age               -0.198  -0.099   0.150    0.315   0.067  -0.028   1.000
Education         -0.225  -0.057   0.058   -0.280   0.326   0.002  -0.262      1.000

4.1 A benchmark model

First, we estimate a benchmark model based on the common predictors Sex, Age and Education. The model is

log[π/(1 − π)] = β₀ + β₁Education + β₂Age + β₃Sex.


Table 4.2: VIF values for predictors within Model 1.

Variable VIF

Sex 1.002

Age 1.055

Education 1.054

In our dataset, Education had nine levels which could be used as dummies, coded as the highest degree of education an individual had attained at the time of data collection. However, nine dummies are not very manageable (especially if this variable is to be used in models containing other variables). Looking closer at the variable we saw that there were, for example, the categories “Complete primary school” and “Incomplete secondary school” (see Section 3.5 for what the categories look like). We therefore combined each incomplete level of education with the completed category directly below it, for example merging the two aforementioned categories. This also makes sense from the perspective of achieving as equal a distribution between groups as practically possible: some of the original groups were very small while others were considerably large, with the largest group (in the overall dataset, before splitting it up) containing 611 observations while the smallest contained a mere 14. Such an imbalance could be of concern even if the variable is used under a presumption of equidistance. After merging we were left with 5 groups, where the largest group contains 650 observations and the smallest 140. At this point, we could have run the model using 5 dummies. However, we chose to perform a likelihood ratio test of the validity of a potential Likert scale, as this would further reduce the number of parameters, and were able to confirm that the variable could be used with restrictions allowing us to include it with only one coefficient (see Section 2.2 for a technical description of the test performed). We received a test value of 0.053, far below the critical value of 11.071 at the five degrees of freedom we have for this variable as a result of restricting 5 categories. We thus cannot reject the restrictions, indicating that the restricted model does not fit significantly worse than the unrestricted one. Therefore, we can technically use Education as a metric variable when constructing the model.
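As a complement to the description above, the comparison logic of the likelihood ratio test can be sketched as follows. The log-likelihood values are hypothetical placeholders, not our actual fitted values; only the mechanics mirror the test we performed.

```python
# Likelihood ratio test sketch: compare an unrestricted model (separate dummy
# coefficients per education level) with a restricted one (a single slope).
def likelihood_ratio_stat(loglik_unrestricted, loglik_restricted):
    """LR = 2 * (logL_unrestricted - logL_restricted), asymptotically chi-square."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)

# Hypothetical log-likelihoods for illustration only.
ll_unrestricted = -295.40
ll_restricted = -295.43

lr = likelihood_ratio_stat(ll_unrestricted, ll_restricted)

# 5% chi-square critical value at 5 degrees of freedom, as used in the paper.
critical_value = 11.071
reject_restrictions = lr > critical_value
print(f"LR statistic: {lr:.3f}, reject restrictions: {reject_restrictions}")
```

A small LR statistic, as here, means the restrictions cannot be rejected and the single-coefficient (Likert) specification is retained.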


more accurate predictions. In doing so, we wish to contribute to what Smets and van Ham (2013) refer to as a kind of ‘core model’ for explaining voter turnout. Additionally, we construct a model containing solely the two control variables to use as a benchmark against which the effectiveness of the other models in predicting voter turnout can be evaluated. Hence, this section partially aims to construct and interpret a benchmark model that we later use when classifying whether or not the observations in the test dataset voted. Especially interesting is to see whether the models that include the two control variables manage to beat the benchmark model, which offers insight into whether the variables apart from the control variables add any predictive power.

After converting the estimated coefficients by exp(β̂) we can interpret the estimates in a more intuitive and accessible way, as the change in the odds of success. When interpreting the estimates, we presume that all other variables are held constant, also known as ceteris paribus within the field of economics. According to ”Model 2” in Table 4.3, the odds ratio for the Education variable is about 2.280 whereas the odds ratio for the Age variable is around 1.041. In practice, this means that an individual has more than twice the odds of voting compared to an individual in the previous group within the Education variable. Similarly, our model suggests that for each year older an individual is, that person has around a 4% increase in the odds of voting. The implications of this model are quite straightforward; it suggests that older and more well-educated people to a higher extent cast their vote in German general federal elections. Both models in Table 4.3 showed a p-value below 0.05 for the Hosmer-Lemeshow test, indicating that they both have a bad fit. The McFadden’s pseudo R² is the same for the two models (0.118).
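The conversion from log-odds coefficients to odds ratios can be illustrated with a short sketch, using the Model 2 estimates from Table 4.3:

```python
# Convert logit coefficients to odds ratios via exp(beta).
# Estimates taken from Table 4.3 ("Model 2").
import math

coefficients = {"Education": 0.824, "Age": 0.040}

odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
for name, oratio in odds_ratios.items():
    # Education -> about 2.280, Age -> about 1.041
    print(f"{name}: odds ratio = {oratio:.3f}, "
          f"change in odds = {(oratio - 1) * 100:+.1f}%")
```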


Table 4.3: Logistic regression parameter estimates and standard errors under models based on sex, education and age for the probability of voter turnout.

Model     Parameter   Estimate    Std.Error
Model 1   Intercept   -2.123***   0.505
          Education    0.823***   0.121
          Age          0.040***   0.007
          Sex          0.033      0.217
Model 2   Intercept   -2.110****  0.498
          Education    0.824****  0.121
          Age          0.040****  0.007

Goodness of fit statistic          Value

Model 1   McFadden's R²            0.118
          AIC                      591.9
          H-L test, p-value        0.011
Model 2   McFadden's R²            0.118
          AIC                      589.9
          H-L test, p-value        0.008

Note: * denotes significance at 0.1 level; ** denotes significance at 0.05 level; *** denotes significance at 0.01 level; **** denotes significance at 0.001 level.

4.2 Models based on wage and employment

We now estimate some models using the two predictors related to wage and employment:

EmpStat and Income. Looking at the correlation matrix in Table 4.1, we see no indication that these two variables have any concerning level of correlation with one another. According to our calculations they have a correlation of -0.174. As this is not higher than 0.9 in absolute value, we proceed as usual with further analysis.

Before fitting a model based on these variables, we performed a likelihood ratio test of the validity of a potential Likert scale with the presumption of equidistance on Income, using the methodology presented in Section 2.2. The test result of 4.317 indicates that we can assume equidistance, as it is lower than the χ² critical value of 18.307 at 10 degrees of freedom using a 5% significance level. Therefore, we now assume equidistance between the 10 income categories, allowing us to use a single joint coefficient for all 10 categories and essentially letting the Income variable act as a metric variable rather than a dummy variable with 10 separate outcomes and 10 different coefficients.


unemployed, other) were reduced into ”employed”, ”outside the workforce” and ”unemployed”. The reason three categories were created is that there is a difference between being unemployed and being a student or retired, as the latter two groups stand outside of the ordinary workforce and are thus not expected to work (from a societal point of view). Employed here is someone who works full time, works part time or is self-employed. Outside of the workforce is someone who is a student or retired. Unemployed covers the remaining alternatives one could choose as a “no” answer to the question of whether or not one is employed. This division captures differences in voting patterns between people who work and those who do not, while accounting for the fact that students and retired people are not included in the ordinary workforce. We have defined three new variables: EmpStat1, which is the reference category, represents those who are currently employed. EmpStat2 represents those outside of the workforce, i.e., students and retired people. EmpStat3 represents those who are currently unemployed. Based on these variables, the following model was created:

log[π/(1 − π)] = β₀ + β₁Income + β₂EmpStat2 + β₃EmpStat3.
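The recoding into dummy variables described above can be sketched as follows; the raw answer labels are illustrative stand-ins rather than the exact WVS response options:

```python
# Collapse raw employment answers into three categories, then expand them into
# the two indicator variables EmpStat2 and EmpStat3 (EmpStat1, "employed",
# serves as the reference category and needs no indicator of its own).
GROUPS = {
    "full time": "employed", "part time": "employed", "self employed": "employed",
    "student": "outside workforce", "retired": "outside workforce",
    "unemployed": "unemployed", "other": "unemployed",
}

def empstat_dummies(raw_answer):
    """Return (EmpStat2, EmpStat3) indicators for one respondent."""
    group = GROUPS[raw_answer]
    return (1 if group == "outside workforce" else 0,
            1 if group == "unemployed" else 0)

print(empstat_dummies("retired"))    # outside the workforce -> (1, 0)
print(empstat_dummies("full time"))  # reference category    -> (0, 0)
```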

The estimated coefficients and their standard errors for this model are presented under ”Model 3” in Table 4.5. All included parameters are highly significant, apart from the intercept. According to this model, the odds of voting increase by a factor of 1.368 per one unit change in Income, meaning that a person has about 37% higher odds of voting compared to a person with a one unit lower income. Thus our findings suggest a positive relationship between income and voter turnout, the opposite of the conclusion that Charles and Stephens (2013) reached. For EmpStat, this model suggests that the odds ratio between a person who is currently unemployed (EmpStat3) and someone who is currently employed (the reference category) is about 0.578 (the exponential of -0.548), i.e., the unemployed have roughly 42% lower odds of voting.


After constructing this model to better understand the relationship between voter turnout and income and employment, we construct a second version containing the same variables but also including our control variables, Age and Education (see Section 3.5 for a more detailed description of these variables). This model is referred to as ”Model 4”. The approach of including the control variables will be repeated for each presented model. Looking at the correlation matrix in Table 4.1, we again do not find any concerning levels of correlation, with the highest correlation, 0.326 between Income and Education, not exceeding our rule of thumb of 0.9. While this is a good indication that no concerning level of multicollinearity is present within the model, we also look closer at the VIF values for our predictor variables; see Table 4.4. As can be seen in Table 4.4, none of the variables indicate VIF values close to what we would consider concerning, and therefore no variables are excluded from further analysis on the basis of multicollinearity.

Table 4.4: VIF values for predictors within Model 4.

Variable VIF

EmpStat 1.132

Income 1.184

Age 1.125

Education 1.237
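As an illustration of how a VIF is obtained, the sketch below uses the simplified two-predictor case, where the auxiliary R² reduces to the squared correlation; it plugs in the Income-Education correlation of 0.326 from Table 4.1, so the resulting value is only indicative and differs from the full model-based VIFs in Table 4.4:

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on
# all remaining predictors. With only two predictors, R_j^2 is the squared
# pairwise correlation.
def vif_from_r_squared(r_squared):
    return 1.0 / (1.0 - r_squared)

r = 0.326                          # Income-Education correlation (Table 4.1)
vif = vif_from_r_squared(r ** 2)
print(f"VIF = {vif:.3f}")          # well below common concern thresholds
```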

When running a logistic regression including the four aforementioned variables, we soon realised that due to the presence of our control variables in the model, EmpStat3 is no longer significant, with a p-value of 0.422 (see “Model 4” in Table 4.5). EmpStat2 dropped considerably in significance, from being significant at the 0.1 percent level to being significant only at the 5 percent level. Because one level of our dummy variable EmpStat can no longer be considered statistically significant, we cannot include the variable in further analysis, as there is no evidence suggesting it is better than chance at explaining variation in the dependent variable. There are several possible reasons why this turned out to be the case. For instance, while there are no signs of concerning correlations between the independent variables or multicollinearity within the model, there could still be some pattern within the dataset whereby observations with, for example, higher education more frequently belong in the employed category, older individuals more frequently belong in the second EmpStat category, and so forth; in essence, the three variables may to some extent explain the same variance. Whatever the reason, after looking closer at possible ways to salvage the variable, we have concluded to exclude it from further analysis. Thus, one could claim that the inclusion of the control variables served the purpose of ensuring a model with better fit, as a result of the exclusion of the EmpStat variable.


We continue by estimating a model where EmpStat has been dropped, so that it contains only Income, Age and Education; see ”Model 5” in Table 4.5. The new model suggests that a person with a one scale unit higher income has 1.223 times the odds of voting (after transforming the log odds) compared to a person at a one scale unit lower income level, ceteris paribus. In other words, there is a 22.3% increase in the odds for every scale unit increase in Income. Furthermore, after transforming from log odds we see that for each year increase in Age, the odds of voting increase by 3.98%, suggesting a slight increase in the expected likelihood of voting the older an individual is. Last but not least, Education seems to have a positive effect on the likelihood that an individual votes according to this model. The Education variable contains 5 levels that have been tested and shown to be equidistant (see Section 4.1 for further elaboration), and according to the results for Model 5 (after transforming) the odds of voting increase by 109.17% per scale unit increase in Education, i.e., the odds more than double per scale unit.

Model 5 is, however, the only one of the models in Table 4.5 where the Hosmer-Lemeshow test indicates a bad fit, as we reject the null hypothesis of good fit with an observed p-value of 0.015, below the required 0.05. The McFadden’s R² is 0.133, which does not indicate excellent fit. Despite EmpStat3 being insignificant, the Hosmer-Lemeshow test indicates that the fit is good for Model 4, with a p-value of 0.205. Its McFadden’s R² is also slightly higher than that of the previous model, at 0.144. Models 3 and 4 thus have good fits according to the Hosmer-Lemeshow test. The McFadden R²s are quite similar for the three models, but Model 3 stands out with a lower value (0.080).


Table 4.5: Logistic regression parameter estimates and standard errors under models based on employment and income for the probability of voter turnout.

Model     Parameter   Estimate    Std.Error
Model 3   Intercept    0.487      0.312
          EmpStat2     0.954****  0.285
          EmpStat3    -0.548****  0.276
          Income       0.313****  0.0629
Model 4   Intercept   -2.332****  0.000
          EmpStat2     0.720**    0.029
          EmpStat3    -0.235      0.422
          Income       0.208***   0.002
          Age          0.029****  0.000
          Education    0.714****  0.000
Model 5   Intercept   -2.681****  0.542
          Income       0.201****  0.064
          Age          0.039****  0.007
          Education    0.738****  0.125

Goodness of fit statistic          Value

Model 3   McFadden's R²            0.080
          AIC                      616.8
          H-L test, p-value        0.890
Model 4   McFadden's R²            0.144
          AIC                      578.5
          H-L test, p-value        0.205
Model 5   McFadden's R²            0.133
          AIC                      581.8
          H-L test, p-value        0.015

Note: * denotes significance at 0.1 level; ** denotes significance at 0.05 level; *** denotes significance at 0.01 level; **** denotes significance at 0.001 level.

4.3 Models based on satisfaction with democracy

We now estimate some models using the two predictors based on satisfaction with democracy: HavDem and ImpDem. Before looking closer at their relationship, we did worry that HavDem and ImpDem could be too strongly correlated with one another; their correlation of -0.442 (see Table 4.1) is, however, well below our rule-of-thumb threshold of 0.9.


In order to reduce the number of parameters within the models, we performed likelihood ratio tests of the validity of a Likert scale on both HavDem and ImpDem. We can conclude that both variables can be used as Likert scale variables: HavDem has a chi-square value of 2.71, below our 5% rejection point of 9.488 (4 degrees of freedom), while ImpDem has a value of 13.88, below our 5% rejection point of 18.307 (10 degrees of freedom). This implies theoretical equidistance when applied within this model, which simplifies the analysis as fewer parameters need to be included and estimated (see Section 2.2 for a technical description of how a likelihood ratio test of the validity of a Likert scale is conducted). We therefore start by fitting the following model:

log[π/(1 − π)] = β₀ + β₁HavDem + β₂ImpDem.

The estimation results for this model are shown under ”Model 6” in Table 4.7 at the end of this section. For HavDem, Model 6 estimates that a person has 28.32% lower odds of voting per one scale unit increase compared to the previous category (since the exponential of -0.333 is 0.717). Here, it is important to note that 1 refers to “very good” while 4 refers to “very bad”, which explains why the observed relationship is negative rather than positive. For ImpDem, Model 6 estimates that a one scale unit increase results in a 23.61% increase in the odds of voting (since the exponential of 0.212 is 1.236). The p-value from the Hosmer-Lemeshow test is less than 0.001, meaning that we reject the null hypothesis, implying that the model has bad fit. The McFadden’s R² was also very low, 0.041, far from the excellent-fit level of 0.2. This leads us to believe that Model 6 neither fits well nor is suitable for practically or conceptually understanding voter turnout.

Now, a second model is constructed with the control variables added; see “Model 7” in Table 4.7. Again, no concerning correlations were detected within the model, with the highest correlation still being that between HavDem and ImpDem (see Table 4.1). An extra check using VIF values also indicates no trouble with multicollinearity, as seen in Table 4.6.

Table 4.6: VIF values for predictors within Model 7.

Variable VIF

ImpDem 1.261

HavDem 1.254

Age 1.089


When tested together with the control variables, HavDem proved not to be significant, meaning that we cannot show that it actually has an effect on voter behaviour (see Table 4.7). Model 7 also has bad fit according to the Hosmer-Lemeshow test (p-value 0.036), while its McFadden’s R² is 0.139. We now proceed by fitting a model without HavDem. The estimation results for this model are shown under ”Model 8” in Table 4.7.

In Model 8, the odds ratio of ImpDem is about 1.226, meaning that a person who thinks it is one Likert scale unit more important to live in a democracy has 22.6% higher odds of voting than a person one unit below. According to Model 8, the relationship between ImpDem and voter turnout is thus positive, in contrast to the findings of Ezrow and Xezonakis (2016), and the same is indicated by both predictors estimated in Model 6. As mentioned, we could not measure exactly the relationship in primary focus in their study. Furthermore, their study introduces time series elements to analyse the phenomenon from a different perspective, a dimension not covered by our paper. That being said, this model type is still of interest to evaluate, so we will later use it to predict voter turnout.

According to the Hosmer-Lemeshow test, all the models introduced in this section seem to have bad fit. This is an important finding, and upon reflection it is not that strange: using the rather vague concept of subjective perception of democracy as the sole factor to explain whether someone voted does seem far-fetched in the case of Model 6. The models do improve, however, when Age and Education are included. The McFadden’s R² is highest for Model 7 at 0.139, but only slightly higher than for Model 8. Model 6 has a very low McFadden’s R² of 0.041. Model 6 furthermore has the highest AIC of the three models (640.84).


Table 4.7: Logistic regression parameter estimates and standard errors under models based on satisfaction with democracy variables for the probability of voter turnout.

Model     Parameter   Estimate    Std.Error
Model 6   Intercept    0.676      0.671
          HavDem      -0.333**    0.163
          ImpDem       0.212****  0.060
Model 7   Intercept   -3.004****  0.848
          HavDem      -0.215      0.169
          ImpDem       0.170***   0.063
          Age          0.036****  0.007
          Education    0.799****  0.123
Model 8   Intercept   -3.673****  0.667
          ImpDem       0.204****  0.057
          Age          0.037****  0.007
          Education    0.805****  0.122

Goodness of fit statistic          Value

Model 6   McFadden's R²            0.041
          AIC                      640.84
          H-L test, p-value        0.000
Model 7   McFadden's R²            0.139
          AIC                      579.83
          H-L test, p-value        0.036
Model 8   McFadden's R²            0.137
          AIC                      579.43
          H-L test, p-value        0.025

Note: * denotes significance at 0.1 level; ** denotes significance at 0.05 level; *** denotes significance at 0.01 level; **** denotes significance at 0.001 level.

4.4 Models incorporating political interest

In a final setup we include the variable political interest (PolInterest) in combination with variables that have proven effective within the other presented models. We start off by considering the following model:

log[π/(1 − π)] = β₀ + β₁PolInterest + β₂ImpDem + β₃Income.


correspond with voter turnout but that intuitively should not correlate with each other. The aim is to cover different areas of an individual’s values and structural preconditions (taking the later addition of the control variables into account), rather than focusing on a single area such as plain economic factors, in order to account for potential deviances outside that area which would otherwise not be captured by the model, considering the complexity of voter turnout. As with the previous models, we start by looking at the correlations between our predictors in Table 4.1. The highest correlation among the predictors included in Model 9 is that between PolInterest and ImpDem, with a value of -0.255, which is not of concern. In Table 4.8, the VIF values within this model are presented to ensure that we do not suffer from multicollinearity even though the correlations are not considered concerning. As can be seen from the table, none of the VIF values are close to being of concern. Hence, we proceed with further analysis without worrying about multicollinearity.

Table 4.8: VIF values for predictors within Model 9.

Variable VIF

PolInterest 1.131

ImpDem 1.064

Income 1.099

PolInterest is coded in the dataset with four options: very interested, somewhat interested, not very interested, and not at all interested. We performed a likelihood ratio test of the validity of a Likert scale, which assured us that this variable is suitable to use under the presumption of equidistance between the four options. We received a test value of 2.926, well below 9.488, the threshold at four degrees of freedom using a 5% significance level. Thus we do not code it as separate dummy variables but rather use it as one metric variable. At this point, a logistic regression was performed in R using the variables above.

Transforming the coefficients, we conclude that Model 9 suggests an odds ratio of 0.484 for PolInterest (the exponential of -0.725). Since the variable is coded so that higher values denote less interest, this estimate suggests that a one scale unit decrease in political interest is associated with roughly 52% lower odds of voting.


the higher income one has declared, the more likely that person is to have voted in general federal elections.

The Hosmer-Lemeshow test resulted in a p-value of 0.181, indicating that Model 9 has good fit. The McFadden’s R² for this model is 0.131, which is comparable to the other models we have created; it is quite far below the excellent-fit level of 0.2.
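For reference, McFadden's pseudo-R² is one minus the ratio of the model log-likelihood to the null (intercept-only) log-likelihood. The sketch below uses hypothetical log-likelihoods chosen to reproduce a value near Model 9's 0.131; they are not our actual fitted values:

```python
# McFadden's pseudo R-squared: 1 - (logL_model / logL_null).
def mcfadden_r2(loglik_model, loglik_null):
    return 1.0 - (loglik_model / loglik_null)

# Hypothetical log-likelihoods for illustration only.
print(round(mcfadden_r2(-286.5, -329.7), 3))
```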

As with the previous models, we now adjust the modelling to include our control variables,

Age and Education. The results for this model are presented under ”Model 10” in Table 4.10.

First, we check the correlations. No concerningly large correlations were found between the independent variables in Model 10, with the highest correlation again (as with Model 5) being that between Income and Education, with a value of 0.326, which is far from concerning. We also produce VIF values, shown in Table 4.9. Not surprisingly, the highest VIF value is that of Education. However, it is not high enough to be of any concern for further analysis.

Table 4.9: VIF values for predictors within Model 10.

Variable      VIF
PolInterest   1.214
ImpDem        1.075
Income        1.353
Age           1.158
Education     1.431


positive relationship with voter turnout, whereby a one scale unit increase in one’s level of education is estimated to result in about an 85% increase in the odds of voting.

When comparing the two models in Table 4.10, the Hosmer-Lemeshow test implies that Model 9 has a good fit (p-value 0.181) while Model 10 does not (p-value 0.001). At the same time, McFadden’s R² is noticeably higher for Model 10 (0.178 versus 0.131). Furthermore, the AIC is lower for Model 10, which in general suggests that this model is to be preferred over Model 9.

Table 4.10: Logistic regression parameter estimates and standard errors under models incorporating political interest for the probability of voter turnout.

Model      Parameter    Estimate    Std.Error
Model 9    Intercept     0.973      0.704
           PolInterest  -0.725****  0.124
           ImpDem        0.195****  0.057
           Income        0.262****  0.063
Model 10   Intercept    -1.874**    0.881
           PolInterest  -0.544****  0.129
           ImpDem        0.167***   0.058
           Income        0.173***   0.066
           Age           0.029****  0.007
           Education     0.617****  0.133

Goodness of fit statistic          Value

Model 9    McFadden's R²           0.131
           AIC                     583.09
           H-L test, p-value       0.181
Model 10   McFadden's R²           0.178
           AIC                     555.91
           H-L test, p-value       0.001


5 Comparing and evaluating models using predictions

One of the many analytic options that logistic regression offers is the opportunity to use a model to predict whether observations are to be classified in one group or the other; in our case, whether or not an individual votes in German general federal elections. In this section, we aim to evaluate the applied effectiveness of our models by investigating which of the models presented in this paper best predicts who votes and who does not within the remainder of our dataset, the so-called ‘holdout’ or ‘test’ dataset.

A central method often used to test and visualise a model’s predictive power is a so-called classification matrix, also known as a confusion matrix. Within the matrix there are four possible outcomes: (1) predicting the occurrence of an event when it did in fact occur; (2) predicting the occurrence of an event when it did in fact not occur; (3) predicting that an event did not occur when it did in fact occur; and (4) predicting that an event did not occur when it did in fact not occur. Here, outcomes 1 and 4 are considered correct predictions, whereas outcomes 2 and 3 mean that the model failed to correctly predict whether or not one voted (Hair et al., 2014, p. 232).

For the general case, the number of correctly predicted non-events divided by the total number of actual non-events is called specificity. In our case, where we look at voter participation, the specificity of a model is the number of people correctly predicted to be nonvoters divided by the total number of people within the test dataset who did not vote. Conversely, the number of correctly predicted events divided by the total number of actual events is called sensitivity. When the total number of correct predictions is divided by the total number of observations, we get the so-called accuracy of a model. In this paper we have used R to calculate the cut-off point that maximises accuracy for each model individually. A cut-off point is the number used as a benchmark when letting a model predict (Hosmer et al., 2013, p. 170): when an observation's estimated probability exceeds that number, the model classifies the observation as an occurred event. So, for example, say we have two observations, one who is 45 years old and belongs to group 5 of the Education variable, and another who is 22 years old and belongs to group 2. Say that, using one of our models, the first person has an estimated probability of voting of (these are arbitrary numbers for the sake of exemplification) 0.8 whereas the second person has an estimated probability of 0.3. Predicting with a cut-off point of 0.6, the model will predict that the first person voted and that the second did not. The actual turnout can then be checked, as the dataset records who voted. If the model is correct in both instances, our classification matrix will record one event correctly predicted to occur and one non-event correctly predicted not to occur. If instead both individuals did in fact vote, the classification matrix will record one correctly predicted event, and one event that was predicted not to occur although it did.
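The worked example above can be sketched in code; the two estimated probabilities (0.8 and 0.3) and the cut-off of 0.6 are the arbitrary illustration numbers from the text:

```python
# Classification-matrix logic for a logistic model. Labels: 1 = voted, 0 = not.
def classify(prob, cutoff):
    return 1 if prob > cutoff else 0

def confusion_counts(probs, actuals, cutoff):
    """Return (true_pos, false_pos, false_neg, true_neg)."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, actuals):
        pred = classify(p, cutoff)
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Both respondents actually voted: the first is correctly predicted, the
# second is predicted as a nonvoter although they voted (a false negative).
tp, fp, fn, tn = confusion_counts([0.8, 0.3], [1, 1], 0.6)
print(tp, fp, fn, tn)  # prints: 1 0 1 0
```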


Due to the fact that understanding the behaviour and characteristics of nonvoters compared to voters is arguably more interesting and important in terms of policy implications, we are especially interested in the specificity of our models and will keep a close eye on it when evaluating and comparing the models. However, there are substantially fewer nonvoters in our dataset. There are a few techniques that can be used to deal with this ”class imbalance”, for example up-sampling, down-sampling and the Synthetic Minority Over-sampling Technique (SMOTE) (Dech et al., 2014, pp. 322-324). However, applying such methods would forfeit the possibility of interpreting and analysing the models’ estimates and other statistical measures, limiting the models to being effective exclusively at predicting. As our report aims to understand voter turnout on a broader and more general spectrum, rather than to present the best model for forecasting German voter participation, we have chosen not to apply any of these techniques. However, it could be interesting to see how the results would differ if the class imbalance within the dataset were accounted for using a suitable technique. We encourage anyone who wishes to work more or less exclusively with predicting and forecasting voter turnout, with this or similar datasets, to use an appropriate way of managing the class imbalance.

As mentioned above, we have used R to calculate the cut-off point that maximises accuracy. Maximising specificity might at first seem the more rational choice, given that specificity is the measure we are especially interested in when comparing the models. The reason we maximise accuracy instead is quite simply that any model can trivially maximise specificity: choosing a high enough cut-off point means no observations are classified as a "success" or event occurrence, so that all non-events are technically classified correctly. The same can be done with sensitivity by choosing a low enough cut-off point, so that all observations exceed the cut-off value and are classified as "successes", technically classifying all events correctly. Hence, we believe the fairest approach to choosing a cut-off point, in order to justly compare the models' specificity (and other measures, for that matter), is to use an accuracy-maximising cut-off point.
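The accuracy-maximising search was done in R; as an illustrative Python sketch only (the function name and example numbers are ours, not the paper's), one can treat each fitted probability as a candidate cut-off and keep the one yielding the highest accuracy:

```python
def best_cutoff(probs, actual):
    """Scan candidate cut-off points and return (accuracy, cutoff)
    for the accuracy-maximising choice."""
    best = (0.0, 0.5)  # (accuracy, cutoff)
    # Each distinct fitted value is a candidate; 0.0 covers the
    # "classify everyone as a voter" extreme.
    for c in sorted(set(probs)) + [0.0]:
        preds = [1 if p > c else 0 for p in probs]
        acc = sum(p == y for p, y in zip(preds, actual)) / len(actual)
        if acc > best[0]:
            best = (acc, c)
    return best

# Four observations with fitted probabilities and actual turnout:
acc, cut = best_cutoff([0.9, 0.8, 0.4, 0.2], [1, 1, 0, 1])
```

In this toy example the cut-off 0.4 classifies three of the four observations correctly, which no other candidate improves upon; it also shows why extreme cut-offs are unattractive, since they buy perfect specificity or sensitivity at the cost of accuracy.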

In this paper, we make use of so-called ROC-curves, short for Receiver Operating Characteristic-curves. Among other things, a ROC-curve visualises where the optimal cut-off point can be found. The curve plots the probability of detecting true signal (sensitivity) against the probability of false signal (1 − specificity) across all possible cut-off points, and the point where sensitivity and specificity are jointly maximised can be calculated, which then gives the best attainable trade-off for that model. Furthermore, a ROC-curve can be used to see how well a model differentiates between two groups, in our case between voters and nonvoters. The area under the curve (AUC) of a ROC-curve ranges from 0 to 1, where 0.5 equals the predictive performance of chance (Hosmer et al., 2013, p. 174). Hosmer et al. (2013) give guidelines for what the AUC should be. They suggest that values between 0.5 and 0.7 are to be considered poor discrimination, meaning that such models are not much better than chance alone, whereas an area from 0.7 to 0.8 is referred to as acceptable discrimination.
