Roots of Conflict:


Classification and Regression Trees and the Complexity of Organized Violence

Jonah Simonds

Master's Thesis

Spring 2021

Department of Peace and Conflict Research, Uppsala University

Supervisor: Corinne Bara

Word count: 22,272


Abstract

Conflict researchers have validated many different theories on the causes of organized violence, but there are significant gaps in knowledge concerning how these theories interact with one another. In this thesis, I identify a body of the most prominent theories of organized violence and model them in an environment suitable for capturing these complex interactions.

I formulate six causal categories to which these theories belong: Geography; Economy; Conflict History & Insecurity; Liberty & Inclusion; Natural Resources; and Structures of Governance. I then construct a cross-national, time-series sample of country-year observations and create a general model of organized violence using a machine learning technique called Classification and Regression Trees (CART). The results from this first model indicate a substantial negative effect owing to Peace Years, a count of the number of years since the country last experienced an internal conflict. Subsequently, I construct three more models, each investigating different subsets of country-year observations based on their Peace Years value. My models indicate that the country-years most likely to experience a high number of deaths from organized violence are those where conflict occurred in the previous year, the population size is high, and the net rate of male secondary school enrollment is low. The models also reveal several novel results under the presence of certain conditions, including: nonlinear relationships between deaths from organized violence and both oil exports and mass education; and a negative relationship between economic inequality and deaths from organized violence, wherein higher inequality results in fewer deaths. These findings highlight the importance of complexity-based modeling for both future conflict research and policymaking oriented towards violence reduction.


1. Introduction
2. Literature and theory on organized violence
2.1 “God gave physics the easy problems”
2.2 Defining organized violence and its causes
2.3 Causal category: Geography
2.4 Causal category: Economy
2.5 Causal category: Conflict History & Insecurity
2.6 Causal category: Liberty & Inclusion
2.7 Causal category: Natural Resources
2.8 Causal category: Structures of Governance
2.9 Accounting for complexity
3. Research design
3.1 CART: a statistical methodology for modelling complexity
3.2 Target attribute: Log-Transformed Deaths from Organized Violence
3.3 Input attribute operationalizations: Geography
3.4 Input attribute operationalizations: Economy
3.5 Input attribute operationalizations: Conflict History & Insecurity
3.6 Input attribute operationalizations: Liberty & Inclusion
3.7 Input attribute operationalizations: Natural Resources
3.8 Input attribute operationalizations: Structures of Governance
3.10 Tree specifications
4. Results and Analysis
4.1 General CART model for organized violence
4.2 Modeling country-years with one or more Peace Years
4.3 Modeling country-years with zero Peace Years
4.4 Modeling country-years where conflict onset is occurring
5. Implications
5.1 Research implications
5.2 Policy implications
6. Limitations
6.1 Theoretical limitations
6.2 Research design limitations
7. Conclusion
8. Acknowledgements
9. References
10. Annexes
10.1 Annex 1: Replication
10.2 Annex 2: Abbreviations
10.3 Annex 3: Logarithmic scale conversion


1. Introduction

The question of why certain countries go through periods of intense violence remains one of the most pressing inquiries in social science research. Consider the range of outcomes experienced by the countries that saw widespread social unrest in late 2010 and early 2011 during the “Arab Spring” movement. In Syria, the protests devolved into a bloody civil war which has become the deadliest conflict event of the 21st century. Between 2010 and 2019, the most recent year for which data are available, the Uppsala Conflict Data Program (UCDP) estimates there were 362,257 battle-related deaths in Syria (UCDP 2020a). In contrast, Tunisia, the country from which the protest movement originated, experienced very little organized violence – only 159 battle-related deaths over the same time period (UCDP 2020b). Conflict researchers have offered up numerous theories seeking to explain the long duration and high intensity of violence in Syria, such as an authoritarian government that sought to maintain power by any means necessary, ethnic and religious divisions that fragmented the opposition, structural factors that incentivized indiscriminate violence against civilians, and the involvement of foreign powers who propped up weak actors (Lynch 2013).

While these theories have strong theoretical and evidential foundations within existing conflict literature, it is apparent that no individual factor, on its own, is sufficient to explain what occurred in Syria. For instance, autocratic governance was a feature of nearly every country which experienced social unrest during the Arab Spring. No others, however, subsequently experienced violence on a level comparable to Syria.

What causes such extreme variance in organized violence? The discipline of conflict research seeks to provide empirical answers to this and related questions. As with the Syrian case, conflict researchers have developed a broad catalogue of theories on the causes of organized violence. However, the literature has considerably less to say about how these theories interact with one another. I believe this is due, in part, to methodological preferences that have led researchers to test and validate individual theories rather than explore how violence occurs as the result of multiple interdependent conditions. Traditional quantitative conflict research has relied almost exclusively on linear regression analysis for null-hypothesis significance testing, but this family of techniques is generally poor at handling problems involving nonlinearity or conditional relationships between causal factors.

To avoid these shortcomings, in this thesis I will perform my analysis using Classification and Regression Trees (CART), a basic machine learning technique. CART uses the logic of decision trees to recursively partition data into cohorts based on the inputs that best explain variation in the target attribute – in this case, deaths from organized violence. CART is also well-suited for handling complex problems, yet nevertheless remains simple to interpret, an advantage it holds over other, more advanced machine learning techniques. My goal is to produce a general model of organized violence that paints a clear yet precise picture of the countries where organized violence is expected to be most severe based on existing theory: for example, oil-producing countries with populations over 10 million people and a high degree of exclusion for marginalized ethnic groups. This will help answer the research question: what are the combinations of factors that best explain variation in organized violence?
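
The recursive partitioning described above can be illustrated with a minimal sketch of a single regression-tree split. The sketch is written in Python using only the standard library and is not the implementation used in this thesis. The function names, the attribute names (`peace_years`, `population_m`, `log_deaths`), and the toy values are all hypothetical. It assumes the common regression-tree criterion of choosing the split that minimizes the summed squared error within the two resulting cohorts.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, features):
    """Return (feature, threshold, total_sse) for the best single binary split."""
    best = None
    for feat in features:
        levels = sorted({r[feat] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [r[target] for r in rows if r[feat] <= thr]
            right = [r[target] for r in rows if r[feat] > thr]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (feat, thr, total)
    return best


# Hypothetical toy country-years; the values are invented for illustration only.
rows = [
    {"peace_years": 0, "population_m": 40, "log_deaths": 7.0},
    {"peace_years": 0, "population_m": 5, "log_deaths": 5.0},
    {"peace_years": 1, "population_m": 60, "log_deaths": 1.0},
    {"peace_years": 10, "population_m": 8, "log_deaths": 0.5},
    {"peace_years": 25, "population_m": 30, "log_deaths": 0.0},
    {"peace_years": 40, "population_m": 3, "log_deaths": 0.0},
]

feature, threshold, error = best_split(
    rows, "log_deaths", ["peace_years", "population_m"]
)
print(feature, threshold, error)
```

On this toy data the search selects `peace_years` as the first splitting attribute. A full tree would repeat the same search within each resulting cohort until a stopping rule is met.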

Answering this question has value to both researchers and policymakers. For conflict researchers, models that account for causal complexity can provide a clearer picture of the situations and contexts in which different theories might be more or less valid for explaining organized violence. This has important implications for theory testing. Using the example cohort discussed above, a researcher interested in investigating the effect of ethnic exclusion on organized violence might augment their theory or research design depending on the population size and oil production status of the countries they plan to study. For consumers of research such as policymakers, better intelligence regarding the scope conditions for different theories of organized violence might help them understand where and when to apply different policy correctives. A policymaker seeking to reduce the likelihood of violence in an oil-producing country with a population above 10 million should understand the importance of pursuing policies to address ethnic exclusion, while in smaller, non-oil-producing countries, other policies may have a higher impact. Compared to general statements about the causes of violence across all countries, giving policymakers specific information tailored to the circumstances of their country can save lives by reducing the likelihood of organized violence.

My thesis proceeds as follows. To construct my general model of organized violence, I identify and categorize existing theorized causes of organized violence that have a strong preexisting body of evidential support. I then describe my procedure for constructing a training dataset from a cross-national sample of country-year observations, as well as the CART methodology I use to analyze this training data. My analysis consists of four separate models, the first of which is a general model of organized violence. This first model reveals that the number of years since a country previously experienced conflict explains much of the variance in organized violence, so for my next two models, I split my original training data into two subsets: country-years that did not experience conflict in the previous year; and those that did. I also construct a fourth model investigating the intensity of organized violence in a subset of country-years where conflict onset is occurring. These models all highlight how organized violence can be explained through a complex series of interactions between input attributes. Country-years where conflict has recently occurred, the population size is large, and rates of educational enrollment are low are associated with the highest number of deaths from organized violence. For certain cohorts of country-year observations, the models also reveal relationships not reported by previous conflict research: nonlinear relationships between deaths from organized violence and both oil exports and mass education; and a negative relationship between economic inequality and deaths from organized violence, wherein higher inequality results in fewer deaths. Following the presentation and analysis of my model results, I explore their possible implications for both research and policy. Finally, I conclude with a discussion of the limitations of my research.

2. Literature and theory on organized violence

In the following section, I discuss the issue of complexity in conflict research and how traditional quantitative research has struggled to identify how circumstances combine and interact to explain variation in organized violence. I identify several existing theories of organized violence and separate them into six distinct causal categories which form the foundation of my general model. I also introduce CART and discuss the implications of using algorithmic machine learning techniques for the purposes of developing or testing theory. In place of hypotheses, I discuss my expectations for my general model of organized violence.

2.1 “God gave physics the easy problems”

The study of organized violence as a modern field of research emerged in the first half of the 20th century in response to the trauma of two world wars and fears that humanity could not survive a third as it entered the nuclear age (Wallensteen 2011). Following trends in other disciplines of social science, the leading voices in the nascent field of peace and conflict research sought to establish a discipline of research governed by empirical and positivist principles. This empirical emphasis, accompanied by advances in statistics and data science, led to the first generation of modern quantitative research on organized violence. Chief among these was the Correlates of War project, which sought to establish connections between armed conflict and demographic, geographic, socioeconomic, environmental, and political factors (Singer and Small 1972). Borrowing liberally from the field of economics, quantitative conflict researchers began to rely heavily upon linear regression analysis as a means of null-hypothesis significance testing (Bernstein et al. 2000). This approach still dominates the state of quantitative conflict research to this day.

While the body of knowledge on the causes of organized violence has advanced considerably over the past 70 years, major gaps persist. This is exemplified by the absence of a generalized, or unified, theory of organized violence. Instead, there is a patchwork of theories that seem to apply to certain cases but not to others. Why is this the case? One reason is that organized violence is a very complex phenomenon – in the words of Bernstein et al. (2000), “God gave physics the easy problems.” Complexity in the social sciences is itself a multifaceted concept. Ragin (1987, p. 20) refers to situations in which “an outcome results from several different combinations of conditions,” while Jervis (1997, p. 35) offers, “the effect of one variable or characteristic can depend on which others are present.” Complexity can also manifest in data structures, such as “high dimensionality; a mixture of data types; nonstandard data structure; and perhaps most challenging, non-homogeneity: that is, different relationships hold between variables in different parts of the measurement space” (Breiman et al. 1984, p. 7). As such, accounting for complexity is perhaps the defining challenge for all systematic social science research, but especially so for multidimensional social phenomena such as organized violence. The linkages between determinants of organized violence, as well as challenges operationalizing and measuring abstract social concepts, make it very difficult to prove that any individual factor is sufficient to cause organized violence.

The challenge of accounting for complexity in quantitative research on organized violence has been exacerbated by a theoretical adherence to null-hypothesis tests of significance. Continuing in the tradition of the first generation of quantitative conflict research, quantitative traditionalists have, for more than 50 years, been primarily focused on developing and testing theories concerning the influence of so-called “independent” variables on different dimensions of organized violence, such as onset, duration, and intensity. This has resulted in many siloed theories of organized violence, but not a lot of consensus relating to the validity of these theories in relation to one another. When the field was less mature, relying upon bivariate hypothesis testing was justifiable owing to the lack of knowledge on the relationships between organized violence and its individual covariates. Similarly, this mode of research remains the best option for testing emerging or underexplored theories related to organized violence or armed conflict. That said, scholars studying organized violence should not be satisfied if the state of research consists of a series of segregated theories. To make continued improvements in our understanding of organized violence, as well as the policy choices that may impact its likelihood of occurrence, it is essential that these theories be tested in environments where their relative validity, potential interlinkages, and scope conditions can be better understood. This is the goal of this research project.

Qualitative research can fill some of these gaps, but is ill-suited for validating theory owing to the small number of cases studied, which limits external validity (Gerring 2007, p. 43).

What needs to happen to address these challenges? I believe quantitative conflict researchers should move away from linear regression analysis as the default technique for testing theory and seek to employ alternate methods. Regression analysis such as ordinary least squares (OLS) is best suited for documenting simple linear relationships, and even then involves a number of explicit assumptions that limit its practical applicability. These include assumptions of no autocorrelation and no multicollinearity (Kellstedt and Whitten 2009, pp. 189-194), both of which are inherent to complex social phenomena. Consider the common language for interpreting multivariate regression coefficients: “A change in x is associated with a change in y holding all other variables constant.” It is contradictory to assume that x can vary without causing variation in other independent variables while simultaneously acknowledging the causal complexity of the phenomenon being studied. Again, linear regression analysis has its place in social science research because it can be a valuable tool for testing the statistical significance of individual inputs. However, quantitative researchers should seek other techniques for the purpose of developing or testing theories which account for causal complexity.

Linear regression analysis has been the dominant quantitative technique for so long that I believe it has narrowed the scope of the questions researchers are willing to ask about organized violence. For example, in one influential article on inequality and conflict, Østby (2008, p. 144) writes: “There are, in theory, five possible relationships between economic inequality and political conflict: positive, negative, convex (inverted U-shaped), concave (U-shaped) or null.” It might be more accurate to write that these are the five types of relationships that are verifiable with linear regression analysis; but it stands to reason that nonlinear or conditional relationships are also theoretically possible. For example, if economic inequality has a strong, negative relationship with political conflict in cases where two other conditions are present, but an equally strong, positive relationship with political conflict in the cases where those conditions are absent, regression analysis would inaccurately yield a null result. Østby establishes that conflicting evidence is problematic but neglects to discuss that multiple relationship types may be observable between inequality and conflict depending on different scope conditions – the crux of a complexity-based argument. Rather than consider the full range of theoretical possibilities, the relationship between inequality and conflict has been oversimplified in this example because of the methodological limitations of linear regression analysis.
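
The null-result scenario described above can be checked with a small worked example. The following is a minimal sketch in Python using only the standard library. The data are hypothetical and noise-free, constructed so that inequality has a slope of -1 in one cohort and +1 in the other, and the helper `ols_slope` is an illustrative name, not part of any analysis in this thesis. It assumes a pooled bivariate regression with no interaction term for the conditioning factors.

```python
def ols_slope(xs, ys):
    """Slope of the least-squares line of y on x (bivariate OLS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical inequality scores, identical in both cohorts.
inequality = list(range(10))
# Cohort A (conditions present): conflict falls as inequality rises.
conflict_a = [10 - x for x in inequality]
# Cohort B (conditions absent): conflict rises as inequality rises.
conflict_b = [1 + x for x in inequality]

slope_a = ols_slope(inequality, conflict_a)
slope_b = ols_slope(inequality, conflict_b)
# Pooling both cohorts and ignoring the conditions hides both relationships.
slope_pooled = ols_slope(inequality * 2, conflict_a + conflict_b)

print(slope_a, slope_b, slope_pooled)
```

Within each cohort the relationship is perfectly linear, yet the pooled slope is zero because the two opposing relationships cancel. Recovering them requires knowing in advance which conditions to split or interact on, which is the kind of structure a tree-based method searches for directly.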

Owing to these concerns, many other conflict researchers have questioned the continued dominance of linear regression analysis and null-hypothesis significance testing, and there is a growing acceptance of alternative quantitative techniques (Ward, Greenhill, and Bakke 2010; Bara 2020). The most prominent practitioners of nontraditional methodologies have been conflict forecasters, who frequently employ algorithmic approaches such as machine learning to predict when and where violence is likely to occur (Bara 2020). There is an important distinction between conflict research that seeks to explain the causes of past events and research that seeks to predict future violence. This distinction often manifests in very different theoretical approaches regarding how research is conducted (Ward, Greenhill, and Bakke 2010). Explanatory research is “process-oriented” in that the philosophical business of constructing and testing theory is prioritized over the validation of theory.

Predictive research such as conflict forecasting, on the other hand, is “results-oriented” in that the accuracy of the statistical predictions takes precedence over building theory. Machine learning, a branch of artificial intelligence based on algorithmic decision making and pattern identification, is well-suited for this latter approach, in large part because it has demonstrated superior predictive performance compared to statistical alternatives (Muchlinski et al. 2016; Jones and Lupu 2018).

However, there is a trade-off associated with advanced machine learning techniques between predictive power and explanatory clarity. For example, random forests, a popular machine learning technique, achieve strong out-of-sample predictive performance through resampling and random feature selection (Kelleher, Mac Namee, and D’Arcy 2015). In the process, however, the models become black boxes that are too convoluted for humans to interpret intuitively. Discussing the interpretability of various machine learning techniques, Molnar (2020) writes: “If you focus only on performance, you will automatically get more and more opaque models.” This can be problematic for high-stakes predictive modeling, such as conflict forecasting, not least because algorithms are never flawless (Rudin 2019). Their predictive accuracy, while improved over many other statistical techniques, is still far from perfect. Other issues, such as measurement error in the training data, can also cause algorithms to produce inaccurate predictions. When machine learning models are uninterpretable, it becomes impossible to distinguish between results that represent genuine trends versus results that reflect idiosyncrasies in either the observable data or the algorithm specifications. This is a major downside for conflict researchers who seek to use advanced machine learning techniques to explain the causal complexity of violence, rather than just predict its prevalence.

In summary, there is a tension in quantitative conflict research between methodologies that can validate narrow theoretical claims while not accounting for complexity, and methodologies that demonstrate strong predictive performance by accounting for complexity while neglecting to provide intelligible causal explanations. Is there a way to bridge this gap? In this thesis, I explore CART with the aim of testing several existing theories of organized violence to help build consensus towards a general model while accounting for causal complexity. This methodological choice is further substantiated in Section 3.

2.2 Defining organized violence and its causes

Any general model of organized violence needs to be built in careful consideration of the commonly understood causes of such violence. In the following subsections, I provide some definitions of key concepts and then describe how I’ve grouped the causal factors of organized violence into six broad categories. The discussion on each category includes a brief explanation of how each factor is theorized to contribute to organized violence.

Before I can categorize causal factors, however, I need to establish exactly what I mean by organized violence. The Uppsala Conflict Data Program (UCDP) measures and separates organized violence into three categories: armed conflicts, non-state conflicts, and one-sided violence (Themnér and Wallensteen 2012). Armed conflict is defined by UCDP as “a contested incompatibility between two parties – at least one of which is the government of a state – that concerns government or territory or both, where the use of armed force by the parties results in at least 25 battle-related deaths¹ in a calendar year” (Ibid., p. 66). Non-state conflict is defined as “the use of armed force between two organized groups – neither of which is the government of a state – that results in at least 25 battle-related deaths in a year” (Ibid., p. 71). Finally, one-sided violence is “the use of armed force by the government of a state or by a formally organized group against unorganized civilians that results in at least 25 deaths” (Ibid., p. 77). In the aggregate, and setting aside the 25-death threshold, these three categories make up my conceptualization of organized violence. Of these categories, armed conflicts have accounted for the majority of deaths over the past 30 years, and within this category, intra-state wars, more commonly referred to as civil wars, are by far the most common (Ibid.). Consequently, most conflict scholarship is now oriented towards the study of civil wars. However, in line with the theme of complexity, events of organized violence rarely fit neatly into one category or the other. For example, the ongoing Syrian civil war has featured extensive non-state and one-sided violence in addition to violence more classically categorized as armed conflict (Lynch 2013). The Syrian civil war has also had complex transnational dimensions, an example of how the line between inter- and intra-state armed conflict is growing increasingly fuzzy (Forsberg 2016).

¹ The threshold of 25 battle-related deaths is arbitrary. Gleditsch et al. (2002) justify its selection as a large enough number to represent a politically significant event, but small enough to give the data a high degree of specificity (previous conflict data initiatives, such as Correlates of War, had used a threshold of 1000 deaths).

While armed conflict accounts for the greatest share of overall deaths from organized violence, non-state and one-sided violence are also important phenomena to study. The highest single country-year total of deaths from any category of organized violence, by far, occurred in 1994 in Rwanda, mostly as the result of horrific one-sided violence against civilians. Meanwhile, the highest single country-year total of deaths from non-state conflict occurred in Mexico in 2019, the most recent year for which UCDP data are available, owing primarily to ongoing violence between drug cartels. As the importance of non-state actors continues to grow, there is an urgent need to better understand the causes of non-state violence. It is also increasingly the case that non-state violence and violence against civilians occur within the context of purportedly state-based armed conflicts, such as in Syria. Consequently, it is apparent to me that any general model of organized violence cannot be limited only to armed conflict but must instead account for deaths from all three forms of organized violence.

Literature reviews of existing research on organized violence have identified some reliable predictors (Hegre and Sambanis 2006; Dixon 2009; Fjelde and Nilsson 2012; Valentino 2014). As Dixon (2009, p. 707) notes, many quantitative studies of violence use the same basic set of “control variables” as a point of departure and then add one or two extra “study variables” which represent causal factors warranting further research. So, to construct a general model, it is logical to begin from this consensus and then augment as new evidence becomes available. I have identified six thematic groupings, or “causal categories,” for these commonly theorized drivers of organized violence: Geography; Economy; Conflict History & Insecurity; Liberty & Inclusion; Natural Resources; and Structures of Governance (see Table 1). In the following subsections, I will clarify the scope of each category and briefly relate the mechanisms by which these different causal factors are theorized to cause organized violence. I will then summarize, in italics, the main argument of each theory.

2.3 Causal category: Geography

Geography includes the physical and human geography of a country, as well as measures such as population size, area, and natural features such as mountainous terrain.

Specific measures that fall within this causal category are often included in quantitative conflict studies as control variables without much substantive discussion of their theoretical implications. In general, large countries (both in land area and population) are considered more difficult to govern and police and, thus, more prone to experiencing internal conflict owing to increased opportunities for rebellion (Fearon and Laitin 2003). Large countries often encompass diverse subnational regions and social identities, which creates complicated political arrangements and impedes the formulation of political consensus (Ibid.). It follows logically that countries with large populations should, statistically, also experience more deaths from organized violence. Rough terrain such as mountains is also theorized to represent an opportunity for insurgency by inhibiting government mobility and creating more spaces for rebels to hide (Ibid.). By the same token, however, rough terrain may also reduce the overall intensity of violence by creating conditions where a protracted low-level insurgency is viable (Mukherjee 2014). So, rough terrain may increase the likelihood of conflict while paradoxically lowering the intensity of violence.

Theory 1: Countries with larger populations experience more organized violence.

Theory 2: Countries with larger land areas experience more organized violence.

Theory 3: Countries with rough terrain such as mountains experience more conflicts, but may experience less organized violence as expressed by battle-related deaths.

Table 1: Causal categories

Causal Categories:
Geography
Economy
Conflict History & Insecurity
Liberty & Inclusion
Natural Resources
Structures of Governance

2.4 Causal category: Economy

The next causal category, Economy, refers broadly to the notion of general prosperity, but also encompasses conceptions of economic structure and growth, development, and education. The negative relationship between economic prosperity and organized violence is one of the most thoroughly substantiated theories in the field of peace and conflict research. It is theorized that poverty and lower levels of economic development are drivers of internal conflict, owing to a lower cost of violence and a reduction in the foregone cost of rebel recruitment (Collier and Hoeffler 2004). Low economic growth is thought to contribute to violence for many of the same reasons (Ibid.). Extreme state poverty has been conceptualized as a proxy for state weakness, creating environments where violence is more likely to occur because the government is unable to adequately sanction those who participate in rebellion (Fearon and Laitin 2003). Other research has suggested an inverted U-shaped relationship between economic development and the incidence of violence, particularly domestic terrorism, as the least developed countries have fewer valuable targets for terrorism (Ghatak and Gold 2017).

It is also worth considering how the distribution of resources within a country’s economy can create conditions for violence. Although individual- or household-level economic inequality has an inconsistent statistical record as a predictor of violence, it is theorized to cause violence through psychological and social mechanisms. Psychologically, high levels of inequality cause individuals to perceive themselves in a state of relative deprivation compared to those in society who are better-off (Gurr 1970). Depending on the scale of the deprivation, this can compel deprived individuals towards organized violence stemming either from their feelings of frustration or from hope that acts of violence will improve their material condition (Ibid.). The social mechanism describes how violence can occur in societies with high inequality where individuals develop a broader sense of identity on the basis of socio-economic class, which is then used as a tool for mobilization to overcome collective action problems (Bartusevičius 2014).

Although education could be its own causal category, I have also chosen to include it in Economy because the mechanisms by which education may influence violence resemble other economic factors. Education and prosperity often go hand-in-hand, as educational attainment is highly correlated with economic prospects on both an individual and a societal level. Educational investment disincentivizes conflict by creating alternative avenues by which people can improve their lives (Thyne 2006). At the same time, higher rates of educational enrollment, particularly of young males, decrease the pool of available recruits for armed groups and drive the cost of recruitment upwards, as educated individuals have more options to earn a living outside of being a soldier (Collier and Hoeffler 2004).

Theory 4: Less prosperous countries experience more organized violence.

Theory 5: Countries with lower economic growth experience more organized violence.

Theory 6: Countries with more individual- or household-level economic inequality experience more organized violence.

Theory 7: Countries with lower rates of education experience more organized violence.

2.5 Causal category: Conflict History & Insecurity

Conflict History & Insecurity, as a causal category, comprises the time since previous conflict, as well as ongoing factors related to real or perceived insecurity. There is a consensus in the conflict literature that a prior history of civil conflict is a dangerous condition that significantly increases the likelihood of future conflict (Dixon 2009). A count of years since the last conflict, or “peace years,” has been shown to have a strong, negative correlation with the incidence of new violence (Ibid.). New states are also thought to be more unstable, and are thus more prone to internal violence or a tempting target for opportunistic foreign adversaries (Fearon and Laitin 2003). Finally, there is strong evidence that a country that shares a land border with another country experiencing armed conflict is itself more prone to violence (Buhaug and Gleditsch 2008). The mechanisms for this “contagion” theory of violence include increased availability of arms which lowers the cost of fighting, refugee surges that increase competition for scarce local resources, and other factors such as decreased trade that can cause economic uncertainty and instability (Forsberg 2016).
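To make the “peace years” count concrete, the following is a minimal sketch of how such a counter could be derived from a yearly conflict indicator. The function name, the input format, and the convention that a conflict year is coded as 0 are illustrative assumptions; the counting rule in any particular data source may differ.

```python
def peace_years(conflict_by_year):
    """Count years since the last conflict year.

    conflict_by_year: list of booleans ordered by year, beginning with the
    first year a country appears in the data (hypothetical input format).
    Assumed convention: the counter is 0 in a conflict year and in the
    first observed year, and grows by 1 for each later year of peace.
    """
    counts = []
    years_since = 0
    for in_conflict in conflict_by_year:
        if in_conflict:
            years_since = 0  # conflict resets the counter
        counts.append(years_since)
        years_since += 1
    return counts


# Made-up sequence: two peaceful years, two conflict years, two peaceful years.
example = peace_years([False, False, True, True, False, False])
print(example)
```

Under this convention, low values of the counter identify country-years at or shortly after a conflict, which is the condition the literature associates with elevated risk.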

Theory 8: Countries with recent conflict histories experience more organized violence.

Theory 9: New states experience more organized violence.

Theory 10: Countries experience more organized violence if they share a border with another country experiencing armed conflict.

2.6 Causal category: Liberty & Inclusion

Liberty & Inclusion is the most broadly conceived of the causal categories, encompassing ideas such as equality of opportunity, freedom from oppression, and discrimination based on social group identity. Many of these factors have been considered controversial topics within the field of quantitative conflict literature owing to the difficulty of measuring and collecting data on such abstract concepts (Isaacs 2016). For instance, over the past two decades there has been considerable disagreement within the literature on the statistical relevance of group-level discrimination (Fearon and Laitin 2003; Collier and Hoeffler 2004). Group-level discrimination is experienced on the basis of belonging to a particular social group, such as an ethnic minority, and can manifest in political, economic, or social exclusion. After robust debate, however, there is now near-universal acknowledgement in recent conflict scholarship that group-level discrimination, at least under certain conditions, can be a statistically significant predictor of organized violence (Østby 2008; Cederman, Gleditsch, and Buhaug 2013; Hillesund et al. 2018). Much like economic inequality, group-level discrimination is theorized to lead to violence by fomenting long-standing grievances among individuals who are then easily able to mobilize on the basis of a shared identity (Gurr 1993). There is considerable statistical evidence of a positive relationship between domestic terrorism, a form of organized violence, and both minority economic discrimination and ethnic political exclusion (Piazza 2011; Hansen, Nemeth, and Mauslein 2020). The same is true of religious oppression, particularly when religious identities overlap with other social identities (Basedau, Pfeiffer, and Vüllers 2016).

I have also chosen to group measures of gender equality or inequality in the Liberty & Inclusion causal category. Although the quantitative record on gender-based discrimination is inconclusive, organized violence is a highly gendered social phenomenon. Men make up the overwhelming majority of perpetrators of organized violence, and the effects of organized violence can also be highly gendered depending on the circumstances and the nature of the violence being committed (Bjarnegård et al. 2015). In October 2000, the United Nations Security Council adopted Resolution 1325, which henceforth became known as the Women, Peace and Security (WPS) Agenda (UN Security Council 2000). Acknowledging the gendered nature of conflict, the WPS Agenda specifically calls upon UN Member States to integrate women at every level in the prevention and resolution of conflicts (Ibid.). With this consideration, it makes sense to me to include measures of gender equality in this causal category.

Theory 11: Countries where there is more group-level economic discrimination experience more organized violence.

Theory 12: Countries where many people are politically excluded on the basis of ethnicity or race experience more organized violence.

Theory 13: Countries with less religious freedom experience more organized violence.

Theory 14: Countries with less gender equality experience more organized violence.

2.7 Causal category: Natural Resources

Natural Resources, as a causal category, could plausibly be considered a subset of the Geography category, but there is a distinct stream of conflict literature on the importance of certain natural resources such that it seemed worthy of its own designation. Particular attention has been given to oil as a natural resource that can contribute to organized violence (Ross 2004; Lujala 2009). Exploitable oil reserves are theorized to drive civil conflict through strategic incentives as well as the economic and institutional consequences of natural resource rents (Sachs and Warner 1995; Fearon and Laitin 2003; Collier and Hoeffler 2004).

Diamond production has also been shown to have a robust effect on the incidence of violence, at least under specific conditions – such as the relationship between diamond production and civil conflict in societies divided along ethnic lines (Lujala, Gleditsch, and Gilmore 2005).

Theory 15: Countries with large exploitable oil reserves experience more organized violence.

Theory 16: Countries that produce diamonds experience more organized violence.

2.8 Causal category: Structures of Governance

The last causal category is Structures of Governance, which covers the democracy-autocracy spectrum of political regimes as well as other measures relating to political institutions. One of the most well-known theories of interstate armed conflict is the “democratic peace theory,” which posits that democracies are less likely to start wars amongst one another (Owen 1994; de Mesquita et al. 1999). While this theory has been robust to repeated statistical tests, there is scant evidence that democracies are less violent overall when controlling for pre-existing socioeconomic conditions (Gat 2005; Hegre 2014). Rather, there is a strong body of evidence that, along the democracy-autocracy scale, it is better to inhabit one of the poles (either strong democracy or strong autocracy) than to fall somewhere in the middle as an “anocracy” or incoherent political regime (Hegre et al. 2001; Slinko et al. 2017; Jones and Lupu 2018). Strong institutions, a proxy for state capacity that is a feature of both stable democracies and autocracies, have been proposed as an alternative mechanism which discourages violence (de Mesquita et al. 1999; Vreeland 2008; Hendrix 2010). Consequently, measures of institutional strength, as well as periods of political instability that weaken state institutions, should be considered in this causal category.

Theory 17: Democracies do not go to war with each other, but democratic countries do not inherently experience less organized violence overall.

Theory 18: Anocracies experience more organized violence.

Theory 19: Countries where there is political instability experience more organized violence.

2.9 Accounting for complexity

Taken together, these six causal categories and nineteen theories represent a theoretical core of the drivers of organized violence, as confirmed by integrating previous quantitative research findings. However, it is important to note here that, as with the different categories of organized violence, each of these causal categories represents its own substantial field of research with its own complexities and unanswered questions. I also wish to acknowledge the integrity of other streams of peace and conflict research which have not been included in this analysis. Ultimately, a substantial amount of simplification was necessary to accomplish my research goal within the parameters of this thesis. It is also evident that none of these causal categories are necessary or sufficient conditions for the occurrence of violence. For example, while there is a strong theoretical and evidential link between exploitable oil reserves and civil conflict, Norway has substantial oil reserves and is nevertheless one of the most peaceful countries in the world. While these theories are all probabilistic, it seems likely that there are other intervening factors which moderate the relationship between oil reserves and the likelihood of civil conflict such that Norway is unaffected. This is an example of the interdependence between these causal categories and the causal complexity of violence.

Researchers have barely scratched the surface on how these causal categories and their corresponding indicators interact with one another, but the existing research is tantalizing. In response to seemingly contradictory assessments on whether ethnic fractionalization contributes to civil war onset, Blimes (2006) identified that ethnic fractionalization may have an indirect effect on civil war onset. Ethnic fractionalization alone is not a robust predictor of civil war onset, but ethnic identities may serve as the fault lines in countries where other factors would prompt civil conflict, because ethnic groups share a social cohesion that allows them to overcome collective action problems associated with armed insurrection (Ibid.). In another example, Hansen, Nemeth, and Mauslein (2020) proposed that the effect of ethnic exclusion on terrorism could be accentuated by poverty, population density, and regime type. This parallels research done by Asal et al. (2016), who found that exploitable oil reserves interact with the political exclusion of ethnic groups to create conditions where violence is more likely to occur. In a study with methodological relevance for my own research, Jones and Lupu (2018) constructed ensemble models using multivariate regression and classification trees to assess the conditions under which anocracies are more likely to experience violence. These are just a few examples of quantitative conflict researchers seeking to account for causal complexity by pursuing novel theoretical and methodological approaches to modeling armed conflict and organized violence.

The above review of the quantitative conflict literature has relied on studies that primarily used linear regression analysis to perform null-hypothesis significance testing. In this study, in place of regression analysis, I will instead make use of classification and regression trees, a machine learning approach that performs well while modelling nonlinear and interdependent causal factors. This methodological choice also has important theoretical implications regarding hypothesis testing – specifically, CART is ill-suited for hypothesis testing because, unlike regression analysis, the trees do not provide p-values and slope coefficients for each regressor in the model. Rather, the CART algorithm identifies the most relevant input attribute at each split in the tree, meaning only the most statistically relevant attributes will show up in the model. We can still draw some conclusions concerning the importance of specific attributes based on measures of variable importance, but a null hypothesis cannot be accepted or rejected if an attribute is not even represented in the model output. It would also be scientifically disingenuous for me to generate my CART models and then, based on their results, go back to write a series of hypotheses I can pretend to “test.”

With 19 different theories operationalized across the six causal categories, there are 171 possible bivariate interactions to test – and this doesn’t even take into consideration interactions between three or more attributes. Proposing a series of hypotheses based on my tree output alone would be both impractical and incongruous with the purpose of this research.
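The figure of 171 follows from counting the unordered pairs that can be formed from 19 attributes. The short sketch below reproduces that arithmetic and, as an illustration of how quickly the search space grows, also counts three-way combinations; the variable names are illustrative only.

```python
from math import comb

n_attributes = 19  # one input attribute per theory

# Unordered pairs of distinct attributes: C(19, 2)
pairwise = comb(n_attributes, 2)

# Unordered triples of distinct attributes: C(19, 3)
three_way = comb(n_attributes, 3)

print(pairwise, three_way)
```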

To the extent that this thesis can be considered theory-driven, it is owing to the extensive existing theoretical and evidential substantiation of the six causal categories as drivers of violence. The theories that make up these causal categories were selected for inclusion in this study specifically because they have been tested and shown, repeatedly, to be statistically significant predictors of violence. Given that, it would be contradictory for me to speculate on which of these causal factors should show up in my models while others should not. For conflict scholars who fashion social science research as theory-driven, declining to propose my own theories or to test hypotheses might seem like bad science at best and heresy at worst. In practice though, social science research has always involved an ongoing dialogue between theory and empirics – a dialogue that is usually hidden from view as researchers tinker with their models before publishing their findings. With algorithmic approaches such as machine learning, which involve the construction of a model with minimal human intervention, this dialogue should become more transparent. Data mining with machine learning can reveal relationships that may not have been apparent using other techniques, and these findings can be used to construct new evidence-based theories.

Despite this, I do have two expectations for what these trees might reveal. First, I expect that the trees will present a complex picture of the causes of organized violence. By this I mean that several input attributes from different causal categories will be represented in the trees, and the analysis will present several nonintuitive and nonlinear interactions between the causal categories. Second, I expect that the “economy” causal category will register more saliently in the tree output than the other causal categories. This is owing to the consistent statistical robustness of prosperity, as measured by GDP per capita, and other economic variables in virtually every existing quantitative analysis of organized violence.

3. Research design

With context on the state of research on organized violence and the goals of this project, I will now explain my research design choices for this study. First, I discuss the benefits of CART and how it can model complexity while producing intuitive and easy-to-understand results. I then discuss my operationalization of both my target attribute, Log-Transformed Deaths from Organized Violence, and my input attributes which seek to explain variation in the target attribute. Finally, I lay out the technical specifications for the regression trees I will use to construct my models.

3.1 CART: a statistical methodology for modelling complexity

Given the drawbacks of linear regression analysis for analyzing complex social phenomena, in this study I will make use of a basic machine learning technique known as CART, which is modeled on the logic of decision trees. The use of trees for solving classification or regression problems was first proposed by Morgan and Sonquist (1963), but the CART methodology as it is known today was brought to the forefront of statistical research by Breiman et al. (1984). As they noted, “trees…were originated by social scientists motivated by the need to cope with actual problems and data” (Ibid., p. 2). Specifically, these social scientists were dealing with complex phenomena and attributes that interact with one another, as well as nonparametric or nonlinear data structures. CART handles these problems through an algorithmic approach that learns from data without relying on rules-based programming, which negates the need to follow the explicit assumptions of linear or logistic regression (Molnar 2020). Furthermore, CART performs automatic feature selection, meaning it identifies the most important inputs and subordinates the rest. In contrast to linear regression models, where human intervention is necessary for finding the model of best fit, CART cuts through high dimensional data to produce models that are both parsimonious and best-fitting. While trees should not be used to the exclusion of all other statistical methods, Breiman et al. (1984) found that CART generally performed better than linear regression for solving nonlinear problems. This capability is particularly notable for researchers studying organized violence, a field in which complex data and nonlinear relationships are the norm.
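The automatic feature selection described above comes from the way a regression tree chooses each split: it searches every input attribute and every candidate threshold, and keeps the split that leaves the smallest squared error in the two resulting groups. The following is a simplified sketch of that single step, not the implementation used in this thesis; the function names and the toy data are hypothetical, and a full CART procedure would apply the search recursively and then prune the tree.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, attributes):
    """Return (attribute, threshold, sse_after) for the best binary split.

    rows: list of dicts; target: name of the numeric outcome;
    attributes: names of the numeric input attributes to search.
    """
    best = None
    for attr in attributes:
        values = sorted({row[attr] for row in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [r[target] for r in rows if r[attr] <= threshold]
            right = [r[target] for r in rows if r[attr] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (attr, threshold, total)
    return best


# Made-up toy observations, not drawn from the thesis sample.
toy = [
    {"peace_years": 0, "population": 50, "log_deaths": 6.0},
    {"peace_years": 1, "population": 5, "log_deaths": 5.0},
    {"peace_years": 20, "population": 60, "log_deaths": 0.5},
    {"peace_years": 30, "population": 4, "log_deaths": 0.0},
]
attr, threshold, remaining = best_split(
    toy, "log_deaths", ["peace_years", "population"]
)
print(attr, threshold, remaining)
```

In this toy example the search selects the peace-years attribute because splitting on it separates high and low outcomes far more cleanly than any population threshold; an attribute that never wins such a comparison simply does not appear in the tree.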

The inspiration to use CART to model complexity in organized violence came from a project being done by the United Nations Economic and Social Commission for Asia and the Pacific (ESCAP) using CART to model inequality in the Asia-Pacific region (Savic and Wang 2019). Nicknamed “LNOB” for “Leaving No One Behind,” the project uses CART to reveal how multiple layers of circumstances can combine to produce inequality between groups of people in access to various indicators of sustainable development.² Much like organized violence, inequality is a multidimensional social concept with interlinking underlying causes and complex data structures. The LNOB trees account for this and provide a high degree of specificity when identifying which social group is the furthest behind in access to basic opportunities: an example cohort might be women aged 25-34, living in rural areas, with low educational attainment. This type of intelligent data disaggregation is invaluable for policymakers who wish to identify the cohort most in need of assistance with policy interventions. Likewise, I anticipate that CART can provide a high degree of specificity in describing the combination of circumstances under which organized violence is most likely to occur, such as the example cohort discussed in Section 1: oil-producing countries with populations over 10 million people and a high degree of exclusion for marginalized ethnic groups.

2 UNESCAP’s LNOB project can be viewed at: https://lnob.unescap.org

Another advantage of CART is the output, which can be interpreted and understood even by those without a background in statistics. The logic of CART closely imitates how most people intuitively make decisions about which traffic route to take home or which produce to buy at the grocery store. Following the produce example: most of us have several heuristics we use when selecting the ideal bunch of bananas. We generally prioritize bananas that are ripe, or nearly ripe, and with as few blemishes as possible. We might have other preferences – such as the size of the individual fruit or the number of bananas in the bunch – but we’re unlikely to put a bunch of bananas in our shopping cart if the fruit are heavily bruised, regardless of their other characteristics. So, if we construct a decision tree of our banana selection process, the number of blemishes would probably be the first split in our tree, with other characteristics being ranked lower but still ultimately influencing our final selection. The simplicity of CART output facilitates disseminating the results of research to people who do not have any formal statistical training, which is the case for many policymakers.

Simplicity is the main reason why I have chosen CART rather than a different, more advanced machine learning technique such as random forests or gradient boosting. Random forests, for example, make use of bootstrapping and random feature selection to better account for the overfitting issues inherent to CART (Molnar 2020). This is beneficial for predictive performance, but the associated drawback is that it is more challenging to measure the impact of a specific explanatory factor or a specific case, and the resultant model becomes a black box (Ibid.). This might be an acceptable trade-off when the research goal is predictive accuracy, but not for this study where interpretability and intelligibility are essential. CART fits the bill for a methodology that can account for causal complexity while still providing useful insights into how different input attributes interact with one another to explain violence. In my review of conflict literature, I found very few examples of CART being utilized as a standalone methodology. This may be indicative of the difficulties in publishing quantitative conflict research that does not observe the convention of null-hypothesis significance testing, but it also suggests that this may be an underexplored space where valuable insights can be gained regarding the causal complexity of violence.

3.2 Target attribute: Log-Transformed Deaths from Organized Violence

With the goal of making a general model of organized violence, the target attribute I am seeking to measure will be Deaths from Organized Violence.³ The definition and operationalization of this target attribute are taken from the UCDP Georeferenced Event Dataset (GED), which tracks incidences of fatal organized violence that result in at least one death and provides an estimate of the number of deaths, as well as the geographic location of the event and its duration (Sundberg and Melander 2013). The GED is limited to only tracking events which belong to conflict dyads that have, at some point, crossed the threshold of 25 battle-related deaths in a single calendar year. UCDP’s methodology relies upon newswires and other news reports, and they provide high, low, and best estimates for the number of deaths associated with each event. I have aggregated the UCDP GED “best” count of deaths from individual events by country and by year, which allowed me to employ the country-year as my unit of analysis for this study.
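The aggregation step can be sketched as a simple group-and-sum over event records. The example below is illustrative only: the tuple layout, the country labels, and the death counts are hypothetical and do not reproduce the actual GED file structure or field names.

```python
from collections import defaultdict

# Hypothetical event records: (country, year, best estimate of deaths).
events = [
    ("Country A", 1994, 120),
    ("Country A", 1994, 30),
    ("Country A", 1995, 5),
    ("Country B", 1994, 2),
]


def aggregate_country_year(event_rows):
    """Sum the best death estimate of each event by (country, year)."""
    totals = defaultdict(int)
    for country, year, best in event_rows:
        totals[(country, year)] += best
    return dict(totals)


deaths = aggregate_country_year(events)
print(deaths)
```

Country-years with no recorded events do not appear in the output of this sketch; when building a full country-year panel they would need to be added with a value of zero.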

Many quantitative conflict studies choose to treat conflict as a binary event/non-event, using either of the UCDP’s definitions of minor conflict (25 deaths per dyadic year) or major conflict (1000 deaths per dyadic year). While this has its advantages, treating conflict as an event with only two possible outcomes represents a significant loss of meaningful variation and, consequently, a reduction in complexity of the concept being studied. As Valentino (2014) points out, the Falklands conflict between Argentina and the United Kingdom is usually coded as equivalent to the Soviet occupation of Afghanistan, as both surpass the 1000-death threshold to qualify as “major conflict” events. Needless to say, these two events were vastly different in terms of length and intensity of violence, and coding them as equivalents is a gross oversimplification of the concept of a major armed conflict. Studying the constituent term of armed conflicts – deaths from organized violence – provides for greater specificity as well as a wider dispersion of outcomes.

3 “Attribute” is terminology preferred by machine learning handbooks, but can be used interchangeably with variable, feature, or field. A “target attribute” is analogous to a dependent variable in statistical research, while an “input attribute” is analogous to an independent variable.

When aggregated at the country-year level, Deaths from Organized Violence has a highly skewed, nonparametric distribution that resembles a Poisson distribution with λ < 1. In nonstatistical terms, this means there are a high number of country-years with 0 deaths from organized violence – more than two-thirds of the entire sample. The count of country-years where a specific number of deaths was observed decreases exponentially as the number of deaths goes up. However, there are some significant outliers, including 21 country-years with an observed number of deaths from organized violence above 10,000. The unfortunate distinction of country-year with the greatest number of deaths from organized violence is Rwanda 1994, with 524,468 per my aggregation of UCDP GED – nearly seven times higher than the next highest country-year death total. This figure is so exceptionally large that it skews computations across the dataset. For example, excluding Rwanda 1994 changes the mean country-year death total across the entire sample from 531.11 to 388.21. Likewise, within the trees, Rwanda 1994 and other country-years with abnormally high death totals represent such extreme outcomes compared to the median case that the CART algorithm ends up lumping all of the other, less extreme cases into one large cohort. This results in trees that provide interesting information about the outlier cases but do so to the detriment of the non-outliers, which make up the vast majority of the sample. The intense skew of the distribution of deaths across country-year observations is evident in Table 2, which shows the 3rd quartile value of Deaths from Organized Violence to be substantially smaller than the mean value.

However, I also do not want to exclude the high-end outliers from the sample. After all, the country-years with high death counts are some of the most significant cases of organized violence over the past thirty years. So, to reduce the computational gravity of the outliers, I have taken the natural logarithm of the Deaths from Organized Violence attribute, resulting in a new attribute: Log-Transformed Deaths from Organized Violence (a logarithmic scale conversion graph is available in Annex 3). Transforming the target attribute in this manner reduces the overall skew of the distribution, as is shown in Table 2. While this transformation is not strictly necessary owing to CART’s capability to model nonparametric attributes, reducing the computational importance of the outliers results in more informative models overall. It also reduces the overall variance of the trees, a topic which will be discussed in more detail in Section 6.
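The effect of the transformation on the values reported in Table 2 can be illustrated with a short sketch. Because the natural logarithm is undefined at zero and the transformed attribute has a minimum of 0, the sketch assumes a ln(1 + deaths) transformation; this offset is an assumption inferred from Table 2, where ln(1 + 18) ≈ 2.94 matches the reported 3rd quartile, and is not stated explicitly in the text. The function name is illustrative.

```python
import math


def log_transform(deaths):
    """Natural log of (1 + deaths); the +1 offset is an assumption."""
    return math.log1p(deaths)


# Raw minimum, 3rd quartile, and maximum from Table 2.
for raw in (0, 18, 524468):
    print(raw, round(log_transform(raw), 2))
```

On this scale the largest observation is roughly 4.5 times the 3rd quartile rather than tens of thousands of times larger, which is what reduces its weight when the tree evaluates candidate splits.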

Table 2: Summary statistics for standard and log-transformed target attribute

| Target attribute | Min | 1st Q | Median | Mean | 3rd Q | Max | Coefficient of variation |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deaths from Organized Violence | 0 | 0 | 0 | 531.1 | 18 | 524468 | 17.06 |
| Log-Transformed Deaths from Organized Violence | 0 | 0 | 0 | 1.58 | 2.94 | 13.17 | 1.66 |

As my goal is to make a general model of organized violence, I naturally intend to include as many countries as possible in my sample. The study population is based on Gleditsch and Ward’s (1999) list of independent states, which is guided by the criteria of de facto sovereignty, international recognition by other states, and a minimum population of at least 250,000 people. Ultimately, owing to data availability restrictions, the final sample consists of 134 countries. The time period of the study is also dictated by data availability.

UCDP’s latest GED release (20.1) includes event data from 1989-2019 (Pettersson and Öberg 2020). This makes 1989 a natural starting point for my sample of country-years, one that also corresponds roughly with the dissolution of the Soviet Union and the beginning of a new paradigm in global politics (Themnér and Wallensteen 2012). However, for several of my input attributes, data are only available up to 2016, so I have selected this year as a recency cut-off point to avoid biases related to imputing systematically missing data. The period of study, then, is 1989 to 2016 inclusive, resulting in a sample of 3,696 unique country-year observations.

3.3 Input attribute operationalizations: Geography

As described in Section 2, I have identified six causal categories which represent a theoretical core of the drivers of organized violence. Input attributes operationalizing these causal categories were selected based on two criteria: conceptual fit with the causal mechanisms identified by previous research; and data availability/completeness. In total, 19 input attributes were selected for inclusion in the initial model, and these attributes are displayed in Table 3. Summary statistics for all attributes included in my models are displayed in Table 4.

The input attributes belonging the Geography causal category are Population Size, Land Area, and Mountainous Terrain. Population Size is a straightforward operationalization based on World Bank data measuring the number of people, in millions, residing in a country (The World Bank 2020). Land Area is operationalized by the country’s total land area in thousands of square kilometers, using data from the Varieties of Democracy or V-Dem

(26)

26 Table 3: Descriptive summaries of input attributes

Causal category Input attribute Operationalization Data source

Geography Population Size Total population World Bank^I

Land Area Country size V-Dem^II

Mountainous Terrain Mean elevation above sea level

ETH GROWup^III

Economy Prosperity GDP per capita V-Dem

Economic Growth GDP per capita growth rate

V-Dem Economic Inequality Est. of Gini index of

inequality in equivalized household disposable income

SWIID^IV

Mass Education Lower secondary enrollment rate, male, net (percentage of relevant age group)

World Bank

Conflict History &

Insecurity

Peace Years Years since country first appears in dataset or since the end of the last ongoing internal or internationalized internal conflict episode

ETH GROWup

New State States within first 2 full years of independence

Primary research Neighbor in Conflict Shared land border with

a country experiencing a major conflict event (>= 1000 fatalities in single country-year)

UCDP GED^V &

GeoDataSource^VI

Liberty & Inclusion Minority Economic Discrimination

Access to state jobs by social group

V-Dem Ethnic Exclusion Sum population of MEG

as a fraction of total population

ETH GROWup

Religious Freedom Freedom of religion index

V-Dem Gender Equality Women's political

empowerment index

V-Dem

Natural Resources Oil Fuel exports as a

percentage of merchandise exports

World Bank

Diamonds Primary diamond

producer

British Geological Survey^VII

Structures of Governance Democracy Institutionalized democracy index

Polity5^VIII Anocracy Revised combined polity

score (Polity2) between -2 and 2

Polity5

Political Instability Period of political interruption or interregnum

Polity5

(27)

27

I: World Bank Open Data: available at https://data.worldbank.org/

II: Varieties of Democracy (V-Dem) Dataset v10.1: available at https://www.v-dem.net/en/

III: ETH Zürich Geographical Research on War, Unified Platform (GROWup): available at https://growup.ethz.ch/

IV: The Standardized World Income Inequality Database (SWIID): available at https://fsolt.org/swiid/

V: Uppsala Conflict Data Program (UCDP) Georeferenced Event Dataset (GED) Global v20.1: available at https://ucdp.uu.se/downloads/index.html#ged_global

VI: GeoDataSource Country Borders: available at https://www.geodatasource.com/addon/country-borders VII: British Geological Survey World Mineral Statistics Data: available at

https://www2.bgs.ac.uk/mineralsuk/statistics/wms.cfc?method=searchWMS

VIII: Center for Systemic Peace Polity5 Project: available at https://www.systemicpeace.org/polityproject.html

Table 4: Summary statistics of all attributes included in the general model

Attribute name | Attribute scale | Min | 1st Q | Mean | 3rd Q | Max
Log-Transformed Deaths from Organized Violence | Ratio | 0 | 0 | 1.58 | 2.94 | 13.17
Population Size | Ratio (millions) | 0.33 | 4.10 | 33.72 | 27.90 | 1324.51
Land Area | Ratio (1000 km²) | 0.56 | 57.49 | 844.89 | 670.37 | 22,008.91
Mountainous Terrain | Ratio (meters) | 1.21 | 257.06 | 589.33 | 740.43 | 2,971.544
Prosperity | Ratio (USD) | 443 | 2,446 | 14,019 | 19,802 | 156,144
Economic Growth | Ratio | -0.89 | 0 | 0.03 | 0.06 | 5.86
Economic Inequality | Ratio | 19.50 | 32.67 | 39.20 | 45.00 | 67.20
Mass Education | Ratio (0-100) | 3.28 | 39.02 | 62.09 | 86.57 | 100
Peace Years | Ratio | 0 | 2 | 23.18 | 43 | 70
New State | Dummy | 0 | 0 | 0.02 | 0 | 1
Neighbor in Conflict | Dummy | 0 | 0 | 0.25 | 1 | 1
Minority Economic Discrimination | Interval | -2.87 | -0.15 | 0.83 | 1.97 | 3.14
Ethnic Exclusion | Ratio (0-1) | 0 | 0 | 0.15 | 0.20 | 0.92
Religious Freedom | Interval | -3.97 | 0.06 | 0.92 | 1.87 | 2.82
Gender Equality | Interval (0-1) | 0.04 | 0.57 | 0.71 | 0.88 | 0.98
Oil | Ratio (0-100) | 0 | 0.81 | 16.77 | 16.20 | 99.99
Diamonds | Dummy | 0 | 0 | 0.17 | 0 | 1
Democracy | Ordinal (0-10) | 0 | 1 | 5.59 | 9 | 10
Anocracy | Dummy | 0 | 0 | 0.10 | 0 | 1
Political Instability | Dummy | 0 | 0 | 0.03 | 0 | 1


dataset (Coppedge et al. 2020). Mountainous Terrain, on the other hand, is surprisingly difficult to operationalize. In their well-known analysis of civil war, Collier and Hoeffler (2004) went so far as to commission a physical geographer to create a new index of mountainous terrain, but the index data is unfortunately not publicly available. In lieu of a more sophisticated measure, I settle for a country’s average elevation above sea level (in meters) to operationalize the concept of rough terrain that might favor insurgency, with data from ETH Zurich’s Geographical Research On War, Unified Platform, henceforth referred to as GROWup (Girardin et al. 2015).

3.4 Input attribute operationalizations: Economy

I operationalize four concepts relating to the Economy causal category: Prosperity, Economic Growth, Economic Inequality, and Mass Education. To operationalize the concept of general economic prosperity, I use data from V-Dem on gross domestic product (GDP) per capita (Coppedge et al. 2020). This measurement, which shows economic output per person, is the standard operationalization for economic prosperity in political science research.

Economic Growth is operationalized by the GDP per capita growth rate, which also comes from V-Dem and measures the percentage change in GDP per capita from the previous year (Ibid.).
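
The growth rate described above can be illustrated with a short worked example. The Python sketch below is hypothetical: the function name and the GDP figures are invented for illustration and do not come from V-Dem. It returns the rate as a proportion, which is an assumption based on the Economic Growth values reported in Table 4 (for example, a mean of 0.03).

```python
def gdp_per_capita_growth(current, previous):
    """Change in GDP per capita relative to the previous year, as a proportion."""
    if previous <= 0:
        raise ValueError("previous-year GDP per capita must be positive")
    return (current - previous) / previous


# Made-up figures: GDP per capita rises from 2,000 to 2,060 USD
growth = gdp_per_capita_growth(current=2060.0, previous=2000.0)
print(f"{growth:.2f} as a proportion, {growth * 100:.0f} percent")
```

Multiplying the result by 100 gives the same change expressed in percent.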

To operationalize Economic Inequality, I turn to the Standardized World Income Inequality Database (SWIID) (Solt 2020), one of the primary data sources used by Bartusevičius in his influential analysis of the relationship between conflict and economic inequality (Bartusevičius 2014). I use SWIID’s estimates of the Gini index of inequality in equivalized household disposable income to operationalize my Economic Inequality input attribute. This attribute has a theoretical measurement scale from 0 to 100, with 0 indicating total economic equality and 100 indicating total inequality.
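
To make the 0 to 100 scale concrete, the Python sketch below computes a Gini index from a small list of incomes using the standard rank-weighted formula. It only illustrates the scale. SWIID's country-level estimates come from a far more involved standardization procedure that this sketch does not attempt to reproduce, and the function name and toy incomes are hypothetical.

```python
def gini_index(incomes):
    """Gini index on a 0-100 scale for a list of non-negative incomes."""
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive income")
    # Rank-weighted sum over incomes sorted in ascending order (ranks start at 1)
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 100 * (2 * weighted / (n * total) - (n + 1) / n)


equal = gini_index([10, 10, 10, 10])    # everyone holds the same income
concentrated = gini_index([0, 0, 0, 40])  # one household holds everything
print(equal, concentrated)
```

With only four households, the fully concentrated case yields 75 rather than 100, because the upper bound of this formula is 100 * (n - 1) / n and approaches 100 only as the number of households grows.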

Mass Education is operationalized through the net percentage of males enrolled in secondary school. Per World Bank indicator metadata, “Net enrollment rate is the ratio of children of official school age who are enrolled in school to the population of the corresponding official school age” (The World Bank 2020). This operationalization is similar to the education indicator used by Collier and Hoeffler (2004), who focused specifically on the education of young men as the group of people most likely to be recruited into organized armed groups.
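
The net enrollment definition quoted above reduces to a simple ratio. The Python sketch below expresses it as a percentage, restricted to males of official secondary school age as in the Mass Education attribute. The function name, parameter names, and counts are hypothetical and serve only to show the arithmetic.

```python
def net_enrollment_rate(enrolled_official_age, population_official_age):
    """Net enrollment as a percentage of the official school-age population."""
    if population_official_age <= 0:
        raise ValueError("school-age population must be positive")
    return 100 * enrolled_official_age / population_official_age


# Made-up counts: 62,000 of 100,000 males of official secondary school age are enrolled
rate = net_enrollment_rate(enrolled_official_age=62_000,
                           population_official_age=100_000)
print(f"{rate:.1f}")
```

Because the numerator counts only enrolled children of official school age, the net rate cannot exceed 100, which matches the 0 to 100 scale reported for Mass Education in Table 4.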
