C Thesis Analysis Code
Lina Sigurdh January 11, 2021
1: Replication Reiter & Stam
#All right, I will begin by simply trying to replicate the results from Reiter and Stam's ana lysis. 'doublereduce' is the data they provided.
load("reiterstam.Rdata")
#Creating the logit model.
replication <- glm(sideaa ~ pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mdemtar + persona l + military + single + democ + contig + majpow + ally + loglsrat + advanced + dispyrs + dspl ine1 + dspline2 + dspline3, data = doublereduce, family = "binomial")
replication_f = glm(sideaa ~ pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mdemtar + person al + military + single + democ + contig + majpow + ally + loglsrat + advanced + dispyrs + dsp line1 + dspline2 + dspline3, data = doublereduce, family = binomial)
coeftest(replication_f, vcov. = vcovCL(replication_f, cluster = doublereduce$idyr, type = "HC 0"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.80545323 0.09889942 -48.5893 < 2.2e-16 ***
## pdemdtar 1.04494020 0.14021596 7.4524 9.168e-14 ***
## pdemdin 0.10529635 0.18999241 0.5542 0.579433
## sdemin -0.08143726 0.14207032 -0.5732 0.566497
## sdemtar 0.17889896 0.11394694 1.5700 0.116410
## mdemin -0.45500447 0.27555257 -1.6512 0.098689 .
## mdemtar 0.64215122 0.24440637 2.6274 0.008604 **
## personal 0.29558888 0.26506248 1.1152 0.264779
## military -0.30662639 0.57399481 -0.5342 0.593205
## single -0.65063790 0.15066330 -4.3185 1.571e-05 ***
## democ -1.04653039 0.21039016 -4.9742 6.551e-07 ***
## contig 2.91383090 0.09028561 32.2735 < 2.2e-16 ***
## majpow 2.16351725 0.10355229 20.8930 < 2.2e-16 ***
## ally 0.08479717 0.08700605 0.9746 0.329753
## loglsrat -0.31570044 0.02667998 -11.8329 < 2.2e-16 ***
## advanced -0.18713522 0.14153245 -1.3222 0.186099
## dispyrs -0.38071836 0.02265343 -16.8062 < 2.2e-16 ***
## dspline1 -0.00351297 0.00042412 -8.2829 < 2.2e-16 ***
## dspline2 0.00228290 0.00035586 6.4152 1.407e-10 ***
## dspline3 -0.00064137 0.00014120 -4.5423 5.564e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
file:///C:/Users/smulb/OneDrive/FREDS C/METHODS C MODULE 3/WIP/Lina Sigurdh_Source code bachelor thesis.html 2/8
#That worked! All right, on to working with the new IV.
2: Subsetting and reshaping data
#When working with SIPRI and the VDem data, I first need to do some clean-up, before I create my dummies and merge with the other data.
load("fullpopulation.Rdata")
apop <- population[which(population$`Additive polyarchy index` < 0.35),]
aypoplow <- apop[which(apop$Year > 1948),]
population2 <- aypoplow[which(aypoplow$Year < 1995),]
#What I have done here is sort the population of data. First, I removed all values equal to a nd above 0.35 on the API. Then, I removed all values equal to or lower than 1948. Last, I rem oved all values equal to or higher than 1995. Thus, my population should be countries with a score lower than 0.35 on the API and within the time period 1949-1994.
#I then did the same with the SIPRI data, and excluded all subjects except for those within t he period 1949-1994:
load("SIPRIuntreated.Rdata")
sipri1 <- sipri %>% select(-("1995":"2019"))
#I also need to transform the SIPRI data from wide to long.
load("milex.Rdata")
milex2 = reshape(milex, direction = "long", varying = list(names(milex)[2:47]), v.names = "Mi litary expenditure", idvar = c("Observation"), timevar = "year", times = 1949:1994)
#There we have it :)
3: Merging with V-Dem and SIPRI
load("milexp.Rdata")
load("vdempopulation.Rdata")
colnames(population) <- c("Country","country_id","country_text_id","year","api","COWcode","md i,")
colnames(population) <- c("Country","country_id","country_text_id","year","api","COWcode","md i")
merged <- merge(population, milexp)
#Now the VDem and the SIPRI data are combined in one dataset. On to removing data I do not ne ed and renaming columns.
colnames(merged)[colnames(merged) == "Military expenditure"] <- "milexp"
t_merged <- merged %>% select(-Observation,-country_text_id,-Country,-country_id)
#And that concludes the merging part.
4: Model 2: Creating the IV dummy
#My IV will be a dummy variable created from the MDI and milex.
#Model 1:
t_merged$dummy <- ifelse(t_merged$mdi >=0.5,1,0) & ifelse(t_merged$milex > 0.05,1,0)
#That worked!
dummyd <- t_merged %>% select(-mdi,-milexp) summary(dummyd$dummy)
## Mode FALSE TRUE NA's
## logical 3636 111 223
#All right, that wraps up making the dummy for model 2. Next, I will experiment some with hig her/lower cutoffs in order to make model 3 and 4.
#Model 3, lower cutoff for MDI:
t_merged$dummy <- ifelse(t_merged$mdi >=0.3,1,0) & ifelse(t_merged$milex > 0.05,1,0) d_md3 <- t_merged %>% select(-mdi,-milexp)
#Model 4, higher cutoff for MDI:
t_merged$dummy <- ifelse(t_merged$mdi >=0.7,1,0) & ifelse(t_merged$milex > 0.05,1,0) d_md4 <- t_merged %>% select(-mdi,-milexp)
5: Merging my IV with the data from Reiter &
Stam
file:///C:/Users/smulb/OneDrive/FREDS C/METHODS C MODULE 3/WIP/Lina Sigurdh_Source code bachelor thesis.html 4/8
load("dummyd.Rdata") load("reiterstam.Rdata")
#On to the crucial part, merging the data I have compiled with the data from Reiter & Stam. I intend to merge on the variables Year, and merge Side A with COWn.
#It is important to note that while the variables are called statea and stateb, the descripti on says the opposite (that statea is state B and vice versa). However, looking at the data, I conclude that the statea variable is State A.
colnames(dummyd)[colnames(dummyd) == "COWcode"] <- "statea"
colnames(dummyd)[colnames(dummyd) == "dummy"] <- "d_autocracy_mil"
d_model2 <- merge(dummyd,doublereduce)
#This was model 2. Now I do the same for model 3:
colnames(d_md3)[colnames(d_md3) == "COWcode"] <- "statea"
colnames(d_md3)[colnames(d_md3) == "dummy"] <- "d_autocracy_mil"
d_model3 <- merge(d_md3,doublereduce)
#...and model 4!
colnames(d_md4)[colnames(d_md4) == "COWcode"] <- "statea"
colnames(d_md4)[colnames(d_md4) == "dummy"] <- "d_autocracy_mil"
d_model4 <- merge(d_md4,doublereduce)
6: Regression
load("data_cthesis.Rdata")
#Model 2
model2 <- glm(sideaa ~ d_autocracy_mil+pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mdemta r + personal + military + single + democ + contig + majpow + ally + loglsrat + advanced + dis pyrs + dspline1 + dspline2 + dspline3, data = d_model2, family = "binomial")
model2_f = glm(sideaa ~ d_autocracy_mil + pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mde mtar + personal + military + single + democ + contig + majpow + ally + loglsrat + advanced + dispyrs + dspline1 + dspline2 + dspline3, data = d_model2, family = binomial)
coeftest(model2_f, vcov. = vcovCL(model2_f, cluster = model2_f$idyr, type = "HC0"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.72977909 0.10830238 -43.6720 < 2.2e-16 ***
## d_autocracy_milTRUE 0.72681302 0.13558314 5.3606 8.293e-08 ***
## pdemdtar 1.12845407 0.12809179 8.8097 < 2.2e-16 ***
## pdemdin -0.08967405 0.51023790 -0.1757 0.86049
## sdemin 0.44636389 0.19023903 2.3463 0.01896 *
## sdemtar -0.09855915 0.11294964 -0.8726 0.38288
## mdemin -1.19386573 1.01410765 -1.1773 0.23909
## mdemtar 0.29091842 0.21855423 1.3311 0.18315
## personal 0.26365341 0.25352657 1.0399 0.29837
## military -0.47791702 0.60208342 -0.7938 0.42733
## single -0.63386167 0.14890912 -4.2567 2.075e-05 ***
## democ 0.43937742 0.20130493 2.1826 0.02906 *
## contig 3.06969976 0.09036322 33.9707 < 2.2e-16 ***
## majpow 2.02282858 0.11227636 18.0165 < 2.2e-16 ***
## ally 0.05805295 0.09042070 0.6420 0.52085
## loglsrat -0.30833982 0.02979351 -10.3492 < 2.2e-16 ***
## advanced -0.07951295 0.13412956 -0.5928 0.55331
## dispyrs -0.41851692 0.02641836 -15.8419 < 2.2e-16 ***
## dspline1 -0.00419284 0.00050540 -8.2962 < 2.2e-16 ***
## dspline2 0.00282954 0.00042268 6.6943 2.167e-11 ***
## dspline3 -0.00084947 0.00016664 -5.0976 3.439e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##Model 3: lower cutoff value of the MDI
model3 <- glm(sideaa ~ d_autocracy_mil+pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mdemta r + personal + military + single + democ + contig + majpow + ally + loglsrat + advanced + dis pyrs + dspline1 + dspline2 + dspline3, data = d_model3, family = "binomial")
model3_f = glm(sideaa ~ d_autocracy_mil + pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mde mtar + personal + military + single + democ + contig + majpow + ally + loglsrat + advanced + dispyrs + dspline1 + dspline2 + dspline3, data = d_model3, family = binomial)
coeftest(model3_f, vcov. = vcovCL(model3_f, cluster = model3_f$idyr, type = "HC0"))
file:///C:/Users/smulb/OneDrive/FREDS C/METHODS C MODULE 3/WIP/Lina Sigurdh_Source code bachelor thesis.html 6/8
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.75404440 0.11327483 -41.9691 < 2.2e-16 ***
## d_autocracy_milTRUE 0.73613682 0.10879379 6.7663 1.321e-11 ***
## pdemdtar 1.05319018 0.13717269 7.6778 1.618e-14 ***
## pdemdin -0.06657868 0.51056949 -0.1304 0.8962493
## sdemin 0.47319713 0.19178835 2.4673 0.0136141 *
## sdemtar -0.04678661 0.11518372 -0.4062 0.6846021
## mdemin -1.18623975 1.01431448 -1.1695 0.2422026
## mdemtar 0.17954943 0.24195435 0.7421 0.4580390
## personal 0.29404877 0.25395238 1.1579 0.2469092
## military -0.42170126 0.60265511 -0.6997 0.4840903
## single -0.58100608 0.15210310 -3.8198 0.0001336 ***
## democ 0.44138548 0.20081565 2.1980 0.0279517 *
## contig 3.05465110 0.09387929 32.5381 < 2.2e-16 ***
## majpow 1.97575356 0.11430509 17.2849 < 2.2e-16 ***
## ally 0.04656265 0.09245623 0.5036 0.6145296
## loglsrat -0.29347525 0.03006291 -9.7620 < 2.2e-16 ***
## advanced -0.08031336 0.13405570 -0.5991 0.5491032
## dispyrs -0.41606505 0.02721889 -15.2859 < 2.2e-16 ***
## dspline1 -0.00415855 0.00051760 -8.0342 9.418e-16 ***
## dspline2 0.00280317 0.00043061 6.5098 7.524e-11 ***
## dspline3 -0.00083920 0.00016858 -4.9782 6.418e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##Model 4: increase cutoff value of the MDI
model4 <- glm(sideaa ~ d_autocracy_mil+pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mdemta r + personal + military + single + democ + contig + majpow + ally + loglsrat + advanced + dis pyrs + dspline1 + dspline2 + dspline3, data = d_model4, family = "binomial")
model4_f = glm(sideaa ~ d_autocracy_mil + pdemdtar + pdemdin+ sdemin + sdemtar + mdemin + mde mtar + personal + military + single + democ + contig + majpow + ally + loglsrat + advanced + dispyrs + dspline1 + dspline2 + dspline3, data = d_model4, family = binomial)
coeftest(model4_f, vcov. = vcovCL(model4_f, cluster = model4_f$idyr, type = "HC0"))
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.62020965 0.10343778 -44.6666 < 2.2e-16 ***
## d_autocracy_milTRUE 0.74909631 0.16159820 4.6355 3.560e-06 ***
## pdemdtar 1.10025027 0.12287284 8.9544 < 2.2e-16 ***
## pdemdin -0.11711293 0.50888427 -0.2301 0.8180
## sdemin 0.40885684 0.19032804 2.1482 0.0317 *
## sdemtar -0.01419075 0.10916970 -0.1300 0.8966
## mdemin -1.23833364 1.01265429 -1.2229 0.2214
## mdemtar 0.29730896 0.21306272 1.3954 0.1629
## personal 0.32696386 0.23457073 1.3939 0.1634
## military -0.48748382 0.60352287 -0.8077 0.4192
## single -0.68790505 0.14579332 -4.7184 2.378e-06 ***
## democ 0.39741243 0.20125414 1.9747 0.0483 *
## contig 2.96293216 0.08830950 33.5517 < 2.2e-16 ***
## majpow 2.16303527 0.10760508 20.1016 < 2.2e-16 ***
## ally 0.10222080 0.08805797 1.1608 0.2457
## loglsrat -0.33168357 0.02902249 -11.4285 < 2.2e-16 ***
## advanced -0.11492316 0.13536620 -0.8490 0.3959
## dispyrs -0.41554971 0.02569266 -16.1739 < 2.2e-16 ***
## dspline1 -0.00413727 0.00049279 -8.3956 < 2.2e-16 ***
## dspline2 0.00280936 0.00041416 6.7832 1.175e-11 ***
## dspline3 -0.00085503 0.00016445 -5.1994 2.000e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#And there we have it!
#Converting to odds-ratios for interpretation, model 2:
m2_oddsr <- exp(cbind(OR = coef(model2), confint(model2)))
7: Creating tables to present results
file:///C:/Users/smulb/OneDrive/FREDS C/METHODS C MODULE 3/WIP/Lina Sigurdh_Source code bachelor thesis.html 8/8
#To present my results in tables, I use the stargazer package (Hlavac, Marek (2018). stargaze r: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.2. https://
CRAN.R-project.org/package=stargazer).
#Table 1: Summary statistics
stargazer(replication,model2,model3,model4,type="html",out="thesis_table1.html",title="Table 1: Summary statistics",font.size = "small",covariate.labels = c("Military dictatorship","Per s/democ","Democ/pers","Democ/single","Single/democ","Democ/mil","Mil/democ","Personal","Milit ary","Single","Democ","Contiguous","Major power","Ally","Power ratio","Economically advanced"
,"Time since last dispute","Cubic spline 1","Cubic spline 2","Cubic spline 3"))
#Note that in table 1 in the thesis, the dependent variable is called "Directed Dyads" as opp osed to "sideaa" which it is in this output. I changed that in word.
#Table 2: Odds ratio of Table 2
rownames(m2_oddsr) <- c("Intercept","Military dictatorship","Pers/democ","Democ/pers","Democ/
single","Single/democ","Democ/mil","Mil/democ","Personal","Military","Single","Democ","Contig uous","Major power","Ally","Power ratio","Economically advanced","Time since last dispute","C ubic spline 1","Cubic spline 2","Cubic spline 3")
stargazer(m2_oddsr, type="html", title = "Table 2: Odds ratios of Model 2", out = "oddsratio.
html")