
ANALYSIS OF BINARY DEPENDENT VARIABLES USING LINEAR PROBABILITY MODEL AND LOGISTIC REGRESSION : A REPLICATION STUDY


Academic year: 2021



Submitted by

Lutendo Vele

A thesis submitted to the Department of Statistics in partial fulfilment of the requirements for the Master’s degree in Statistics in the Faculty of Social Sciences

Supervisor

Harry J. Khamis


ABSTRACT

The Linear Probability Model (LPM) is commonly used because it is easier to compute and interpret than logit and probit models, even though its estimated probabilities may fall outside the [0, 1] interval and linearity makes little sense when dealing with probabilities. This paper extends the results of Luca, Owens, and Sharma (2015) by reviewing the use of the LPM to examine whether alcohol prohibition reduces domestic violence. The regular LPM gave inconclusive estimates because prohibition was omitted due to collinearity as controls were added. Luca et al. (2015) nevertheless reported results, and closer inspection of their regression commands showed that they ran a linear regression, obtained the residuals in a post-estimation step, and then used those residuals as a dependent variable; hence their results differ from those of the regular LPM. Their method still produced unbounded predicted probabilities and heteroscedastic residuals, showing that OLS was inefficient and that a non-linear binary choice model such as logistic regression would be a better option. Logistic regression predicts the probability of an outcome that can take only two values and was therefore used in this paper. Unlike the LPM, logistic regression uses a non-linear function, which yields a sigmoid curve bounding the predicted outcome between 0 and 1. Logistic regression showed none of these complications; thus logistic regression (or another non-linear model for dichotomous dependent variables) should have been used for the final analysis, with the LPM used at a preliminary stage to obtain quick results.


Contents

1 Introduction
2 Data
2.1 Descriptive Statistics
2.2 Treatment of Missing Data
3 Methodology
3.1 Linear Probability Model
3.1.1 Assumptions of Linear Probability Model
3.1.2 Criticisms of the Linear Probability Model
3.2 Logistic Regression
3.2.1 Assumptions of Logistic Regression
3.2.2 Criticisms of Logistic Regression
4 Results
4.1 Linear Probability Model Results
4.2 Logistic Regression Results
4.2.1 Sample Size Analysis
4.2.2 Examining the odds that the husband drinks
4.2.3 Examining the odds that the husband beats his wife
5 Conclusion

1 Introduction

Data analysis, which is grounded in the long history of statistics, plays an important role in many domains through a process that runs from data collection to analysis in order to answer the research question(s). To answer a research question, one needs to study different factors, namely the dependent variable and the independent variables. Regression analysis is a mathematical process that helps answer questions such as which variables are most significant and how those variables interact with each other. Variables come in two main groups, each with further classifications: categorical (also known as qualitative or discrete) variables, which are further classified as nominal, dichotomous or ordinal, and continuous (also known as quantitative) variables, which are classified as interval or ratio. Different analysis or modelling methods are therefore needed for different types of dependent variable.

This paper focuses on regression models for dichotomous dependent variables, that is, binary choice models. A dichotomous variable takes the value 1, which may represent success, or 0, representing failure. When modelling such a variable, one quickly thinks in terms of probabilities. For example, what is the probability that a married man who lives in a state with alcohol prohibition, is religious, has a certain level of education and works a white-collar job drinks alcohol?

The Linear Probability Model (LPM) and logistic regression are two of the models estimated when the regression model has a dichotomous dependent variable. Despite the criticisms discussed by Maddala (1983), the LPM is one of the most widely applied statistical models in the social sciences because of its easy interpretability and computation speed. Logistic regression, introduced in the late 1960s and early 1970s, addresses the criticisms discussed by Maddala (1983), but because its model is fit by the iterative process of maximum likelihood, it was expensive to estimate. As a result, the LPM remained favourable, especially in the early years; with the improvement of computer technology, however, the linear probability model lost favour. Since the LPM violates probability boundaries and thus produces somewhat meaningless predictions, it is best used only as a first step in the analysis of a dichotomous dependent variable. Amemiya (1981, p. 1486–1487), in his survey of qualitative response models, stated concerning the LPM: “it has frequently been used in econometric applications, especially in the early years, because of its computational simplicity. Though I do not recommend its use in the final stage of a study, it may be used for the purpose of obtaining quick estimates in a preliminary stage”.

ideas surrounding LPM and logistic regression, their assumptions and criticisms, including how they differ. This is followed by a summary of the results in Section 4 and, finally, concluding remarks in Section 5.

2 Data

This thesis investigates whether the LPM used in the article by Luca et al. (2015) was the best binary choice regression model for answering the research questions, given that the LPM has weaknesses which may result in meaningless predictions. The authors made their datasets available when they published their paper, and the same datasets were therefore used in this thesis. Several datasets were used. First, a panel dataset was compiled containing the evolution of alcohol prohibition, that is, the precise laws pertaining to the prohibition of alcohol sales and/or consumption, and their changes, for 17 major states in India from 1980 to 2010. Rich microdata were also collected from the 1998-1999 and 2005-2006 Indian National Family Health Survey (NFHS) to investigate the impact of alcohol prohibition on individual behaviour.

Finally, data from the Indian National Crime Records Bureau (NCRB) for the years 1980-2010 were used to complement the individual-level analysis with state-level administrative crime data, with a focus on crimes targeted towards women. The 17 major states investigated in the study were not listed, and the information given about them is not sufficient to identify them, as India has 29 states. The NFHS has approximately 3% missing observations, while the NCRB has approximately 31% missing observations, which exceeds the 25% rule of thumb proposed by Dermitas et al. (Enders, 2010, p. 260). The NCRB data were used to investigate the effect of prohibition on other types of crime targeted towards women at the state level; these outcomes are continuous variables and are therefore excluded from this thesis.

2.1 Descriptive Statistics

Table 1: Indian National Family Health Survey Variable Description

Variable       Description
year           Year of Interview
rep age        Respondent’s Current Age
rep educ       Education in Single Years
hhsize         Number of Household Members
children       Number of Living Children
stwt           State Individual Weight
husb beat      Husband has Beaten Respondent
rep wc         Respondent Works a White Collar Job
husb wc        Husband Works a White Collar Job
urban          Household in Urban Area
religion       Religious Affiliation
State          State of Residence
husb age       Husband’s Current Age
husb educ      Husband’s Years of Schooling
husb drink     Husband Drinks Alcohol
r educ         Respondent’s Education / Husband’s Education
r age          Respondent’s Age / Husband’s Age
agegap cat     Linear Difference in Age
educgap cat    Linear Difference in Education
ownmoney       Respondent has Money she alone can decide how to use
b unfaithful   Husband Justified in Beating Wife if he suspects her of being Unfaithful
husb age cat   Husband’s Current Age Category
rep age cat    Respondent’s Current Age Category
prohib         State Prohibition
literacy       Literacy Rate
purban         Percent Urban
pcgdp          Per Capita GDP
unemp          Unemployment Rate
health         % of Expenditure spent on Health Welfare
educ           % of Expenditure spent on Education
pcpolice       Total Police officers per 1000 State population
pmale          Percent Male
pcpolice exp   Total Police Expenditure per 1000 State population


Table 2: Indian National Family Health Survey Descriptive Statistics

Statistic      N         Mean         St. Dev.    Min         Max          #NAs
year           109,936   -            -           1980        2012         0
rep age        108,930   33.819       8.023       15.000      49.000       1006
rep educ       108,905   4.110        4.860       0.000       99.000       1031
hhsize         108,930   5.351        2.196       1.000       35.000       1006
children       108,930   2.919        1.695       0.000       15.000       1006
stwt           108,930   -            -           0.000       6,320,908    1006
husb beat      98,876    -            -           0.000       1.000        11060
rep wc         108,930   -            -           0.000       1.000        1006
husb wc        108,930   -            -           0.000       1.000        1006
urban          108,930   -            -           0.000       1.000        1006
religion       108,889   -            -           1.000       5.000        1047
State          109,936   -            -           1.000       35.000       0
husb age       108,930   39.824       8.806       15.000      60.000       1006
husb educ      108,839   6.691        7.357       0.000       99.000       1097
husb drink     98,871    -            -           0.000       1.000        11065
r educ         108,814   0.932        1.201       0.010       18.000       1122
r age          108,930   0.852        0.102       0.300       2.467        1006
agegap cat     108,930   -            -           1.000       9.000        1006
educgap cat    107,991   -            -           1.000       7.000        1945
ownmoney       108,930   -            -           0.000       1.000        1006
b unfaithful   108,930   -            -           0.000       1.000        1006
husb age cat   108,930   -            -           1.000       8.000        1006
rep age cat    108,930   -            -           1.000       6.000        1006
prohib         91,630    -            -           0.000       1.000        18306
literacy       107,828   65.898       8.847       23.946      90.920       2108
purban         107,828   30.625       17.544      5.936       97.504       2108
pcgdp          109,695   14,049.360   6,917.924   1,800.000   51,000.360   241
unemp          107,828   3.355        2.130       −0.330      24.917       2108
health         100,671   4.316        1.045       1.500       13.870       9265
educ           100,641   16.953       4.596       6.400       34.080       9295
pcpolice       104,470   1.898        1.601       0.000       16.412       5466
pmale          104,214   0.503        0.032       0.395       0.593        5722
pcpolice exp   104,138   25.689       26.496      4.105       224.022      5798

2.2 Treatment of Missing Data

the rule of thumb, the dataset was used as is and missing observations were not imputed.

Figure 1: Missing Pattern of NFHS data

3 Methodology

There are several binary choice models, but only the LPM (Linear Probability Model) and logistic regression are discussed in this thesis. A brief overview of the LPM and logistic regression is presented, and both are applied to our data to review their performance. Five models will be estimated for both the LPM and logistic regression.

3.1 Linear Probability Model

The LPM is a linear regression model applied to a dichotomous dependent variable. Ordinary Least Squares (OLS) is used to estimate the parameters of the LPM, which is a linear function of the independent variables. Because the LPM is linear, questions arise about its ability to bound the estimated probabilities within [0, 1] for meaningful estimates. Nevertheless, the LPM is commonly used due to its easy interpretation and computation.

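following linear regression model

$$
y = \beta_1 + \beta_2 x_2 + \dots + \beta_K x_K + \mu = \mathbf{x}\boldsymbol{\beta} + \mu \tag{1}
$$

where $\boldsymbol{\beta}$ is a $K \times 1$ vector of parameters, $\mathbf{x}$ is an $N \times K$ matrix of explanatory variables, and $\mu$ is a residual assumed to have zero mean and constant variance. As it is a probability model, in order to interpret the results in terms of probability we take expectations on both sides of equation (1) to get

$$
E(y \mid \mathbf{x}; \boldsymbol{\beta}) = \mathbf{x}\boldsymbol{\beta} \tag{2}
$$

Because $y$ takes only the values 0 and 1, this conditional expectation equals $\Pr(y = 1 \mid \mathbf{x})$, so the fitted values are read as probabilities.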
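To illustrate why a fitted line can leave the unit interval, the following minimal sketch fits equation (1) by OLS with a single regressor. The toy data and the helper name `ols_fit` are hypothetical and are not taken from the NFHS data or from the commands of Luca et al. (2015).

```python
def ols_fit(x, y):
    """Closed-form OLS for y = b1 + b2*x + error with a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx              # slope
    b1 = y_bar - b2 * x_bar     # intercept
    return b1, b2


# Hypothetical toy data: a binary outcome that switches from 0 to 1 as x grows.
x = list(range(10))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

b1, b2 = ols_fit(x, y)

# Fitted values are interpreted as Pr(y = 1 | x) under the LPM.
fitted = [b1 + b2 * xi for xi in x]

# Collect the observations whose "probability" falls outside [0, 1].
out_of_bounds = [(xi, p) for xi, p in zip(x, fitted) if p < 0 or p > 1]

for xi, p in out_of_bounds:
    print(f"x = {xi}: fitted probability = {p:.3f}")
```

In this toy example the smallest and largest values of x receive fitted values below 0 and above 1, which is the boundary problem discussed for the LPM.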

3.1.1 Assumptions of Linear Probability Model

1. A linear relationship between the dependent and independent variables
2. Homoscedasticity
3. Multivariate normality
4. The data has little or no multicollinearity
5. No autocorrelation
6. No outliers or influential cases

The assumptions above will not be formally tested, but results from the application of the LPM to our data will highlight violations of these assumptions.

3.1.2 Criticisms of the Linear Probability Model

The criticisms of the linear probability model discussed by Maddala (1983) are that the disturbances in the LPM are heteroscedastic, so least squares is not efficient; that the error term is not normally distributed, so there exist non-linear procedures more efficient than least squares; and that predicted probabilities from the LPM can lie outside the 0-1 interval. Angrist and Pischke (2009, p. 103), regarding linear regression, said: “... may generate fitted values outside the limited dependent variable boundaries. This fact bothers some researchers and has generated a lot of bad press for the linear probability model.” These criticisms are what this paper examines, as unbounded predictions may be meaningless when the model is used to model probabilities.
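The heteroscedasticity criticism follows directly from the binary nature of $y$. A short derivation is sketched below in the notation of equations (1) and (2); the shorthand $p = \mathbf{x}\boldsymbol{\beta}$ is introduced here only for brevity.

```latex
\begin{aligned}
\mu &=
  \begin{cases}
    1 - p & \text{with probability } p \quad (y = 1), \\
    -p    & \text{with probability } 1 - p \quad (y = 0),
  \end{cases} \\[4pt]
E(\mu \mid \mathbf{x}) &= p(1 - p) + (1 - p)(-p) = 0, \\[4pt]
\operatorname{Var}(\mu \mid \mathbf{x}) &= p(1 - p)^2 + (1 - p)p^2
  = p(1 - p)
  = \mathbf{x}\boldsymbol{\beta}\,(1 - \mathbf{x}\boldsymbol{\beta}).
\end{aligned}
```

The error variance therefore changes with $\mathbf{x}$, so the constant-variance assumption cannot hold unless all slope coefficients are zero.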

Two LPMs will be used to estimate the effect of alcohol prohibition on the husband’s drinking behaviour and on domestic violence, as described by Luca et al. (2015).

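$$
\text{HusbandDrinks}_{hsy} = \gamma_y + \text{Prohibition}_{sy}\,\beta_{FS} + \mathbf{X}_{sy}\boldsymbol{\delta} + \mathbf{H}_{hsy}\boldsymbol{\theta} + \mathbf{W}_{hsy}\boldsymbol{\tau} + \mu_{hsy} \tag{3}
$$

$$
\text{DomesticViolence}_{hsy} = \gamma_y + \text{Prohibition}_{sy}\,\beta_{RF} + \mathbf{X}_{sy}\boldsymbol{\delta}' + \mathbf{H}_{hsy}\boldsymbol{\theta}' + \mathbf{W}_{hsy}\boldsymbol{\tau}' + \omega_{hsy} \tag{4}
$$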

Equation (3) examines the effect of alcohol prohibition on the husband’s drinking behaviour and equation (4) examines the impact of alcohol prohibition on domestic violence, where $\gamma_y$ are survey-year fixed effects, $\text{Prohibition}_{sy}$ is a binary variable equal to 1 if state $s$ has alcohol prohibition in survey year $y$, and $\mathbf{H}_{hsy}$ and $\mathbf{W}_{hsy}$ include a host of

their age, education, religion, and whether he or she works in a white-collar occupation. In some specifications, variables that help capture the wife’s bargaining power within the household were included, such as whether she has money of her own that she can control and whether she believes that her spouse is justified in beating her if he suspects her of being unfaithful. In the same vein, an attempt was made to proxy for the wife’s relative wage (since actual wage data are not available) by including the spousal age and education gaps, both as ratios and as fixed effects. Because none of the states changed its prohibition status across the two sample waves, a matrix of state-level controls, $\mathbf{X}_{sy}$, was included to capture systematic differences between states that could be correlated with both drinking and violent behaviours, including the state literacy rate, urbanisation, per capita GDP, the unemployment rate, police and police expenditure per capita, the percentage of adults who are male, and the state health and education expenditure per capita.

3.2 Logistic Regression

Logistic regression predicts the probability of an outcome that can take only two values. Unlike the LPM, logistic regression uses a non-linear function, which results in a curve bounding the predicted outcome between 0 and 1. Consider a dichotomous response model,

$$
\Pr(y = 1 \mid \mathbf{x}) = G(\mathbf{x}\boldsymbol{\beta}) \tag{5}
$$

where $G$ is a function that takes only values between 0 and 1. The logistic distribution is the commonly used non-linear function $G$, resulting in the logit model,

$$
G(\mathbf{x}\boldsymbol{\beta}) = \frac{\exp(\mathbf{x}\boldsymbol{\beta})}{1 + \exp(\mathbf{x}\boldsymbol{\beta})} \tag{6}
$$

Maximum Likelihood (ML) is used to estimate logistic regression. For a random sample of size $N$, the ML estimate of $\boldsymbol{\beta}$ is the vector $\hat{\boldsymbol{\beta}}_{ML}$ that maximises the likelihood of observing the sample $\{y_1, y_2, \dots, y_N\}$, conditional on the explanatory variables $\mathbf{x}$.

Assume the probability of success, $y_i = 1$, is $G(\mathbf{x}_i\boldsymbol{\beta})$ and the probability of failure, $y_i = 0$, is $1 - G(\mathbf{x}_i\boldsymbol{\beta})$. Then the likelihood is

$$
L(y \mid \mathbf{x}; \boldsymbol{\beta}) = \prod_{i:\, y_i = 1} G(\mathbf{x}_i\boldsymbol{\beta}) \prod_{i:\, y_i = 0} \bigl[1 - G(\mathbf{x}_i\boldsymbol{\beta})\bigr] = \prod_{i=1}^{N} G(\mathbf{x}_i\boldsymbol{\beta})^{y_i} \bigl[1 - G(\mathbf{x}_i\boldsymbol{\beta})\bigr]^{1 - y_i} \tag{7}
$$

and the log likelihood is

$$
\ln L(y \mid \mathbf{x}; \boldsymbol{\beta}) = \sum_{i=1}^{N} \Bigl\{ y_i \ln G(\mathbf{x}_i\boldsymbol{\beta}) + (1 - y_i) \ln\bigl[1 - G(\mathbf{x}_i\boldsymbol{\beta})\bigr] \Bigr\} \tag{8}
$$
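As an illustration of how the log likelihood in equation (8) can be maximised, the sketch below applies Newton-Raphson iterations to a logit model with an intercept and one regressor. The toy data, the function names and the fixed number of iterations are assumptions made for this example; statistical packages use more general and more careful routines.

```python
import math


def sigmoid(z):
    """Logistic function G(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, x, y):
    """Log likelihood of equation (8) for an intercept and one regressor."""
    total = 0.0
    for xi, yi in zip(x, y):
        g = sigmoid(b0 + b1 * xi)
        total += yi * math.log(g) + (1 - yi) * math.log(1.0 - g)
    return total


def fit_logit(x, y, iterations=25):
    """Newton-Raphson maximisation of the log likelihood, starting from zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the log likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + (information matrix)^(-1) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data with overlapping outcomes (not perfectly separable).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0_hat, b1_hat = fit_logit(x, y)
probs = [sigmoid(b0_hat + b1_hat * xi) for xi in x]
print("coefficients:", b0_hat, b1_hat)
print("predicted probabilities:", [round(p, 3) for p in probs])
```

Unlike the fitted values of the LPM, every predicted probability here lies strictly between 0 and 1 because it is passed through the logistic function of equation (6).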


3.2.1 Assumptions of Logistic Regression

1. The continuous explanatory variables have a linear relationship with the logit of the outcome variable
2. The data has little or no multicollinearity
3. No autocorrelation

The assumptions above will not be formally tested.

3.2.2 Criticisms of Logistic Regression

Logistic regression is estimated by ML, whose iterative fitting process makes it slower than the LPM. Heteroscedasticity makes the MLE of the parameter vector biased and inconsistent (Greene, 2012, p. 733) unless the likelihood function is modified to correctly take into account the precise form of the heteroscedasticity. In terms of interpretability, odds ratios, log odds and coefficients are hard to understand and interpret. Logistic regression estimates will also be severely biased in a panel model with fixed effects and a short time dimension.

The same two models as those used for the LPM will also be estimated in the logistic regression analysis, with the left-hand sides being logistic(Husb-drinks) and logistic(Husb-beats), respectively.

4 Results

In this section, results from both the LPM and logistic regression are presented and discussed. Husband controls are the husband’s demographic characteristics: age, education, household size, whether he works a white-collar job, and whether the household is located in an urban area. Wife controls are likewise demographic characteristics: age, education, and whether she works a white-collar job. Bargaining controls are the number of children, whether she thinks the husband is justified in beating her if he suspects her of cheating, and whether she has her own money. Finally, there are the husband’s and wife’s age categories and the age and education gap controls.

4.1 Linear Probability Model Results

this procedure generates biased coefficients and standard errors that can lead to incorrect inferences, with both Type I and Type II errors. Forbidden regressions produce consistent estimates only under very restrictive assumptions which rarely hold in practice (see Wooldridge (2010)). In general, forbidden regressions will not consistently estimate the relationship of interest.

The results in Table 3, Panel B are from estimating the likelihood that the wife reports domestic violence, given whether the state has an alcohol prohibition policy. The same specifications were followed as in Panel A, resulting in the same problems: predicted probabilities outside the [0, 1] range and the exclusion of prohibition because of collinearity as controls were added in models (2) to (5). Model (1) in Panel B shows that alcohol prohibition reduces the likelihood of the husband beating his wife by 8.4%, compared with the sample mean of 17%.

Following these findings, a homoscedasticity test was performed and the predicted probabilities were reviewed for each model used by the authors. The residuals were heteroscedastic and the predicted probabilities were unbounded, with some probabilities below zero. These results indicate that a non-linear binary choice regression model would have been a better choice.

4.2 Logistic Regression Results

Logistic regression results are presented and discussed in this section. Since the LPM, a linear model, resulted in unbounded predicted probabilities, the omission of prohibition from the model due to collinearity, and heteroscedastic residuals, OLS is inefficient for these models. Hence logistic regression, which is non-linear, is used: its sigmoid function bounds the predicted probabilities, and a heteroscedasticity test is too sensitive given that logistic regression has no error term. Odds ratios, robust standard errors and 95% confidence intervals are presented in the tables, with ***, ** and * denoting significance at the 1, 5 and 10 per cent levels, respectively. Margin plots are also used to elaborate on the results.

4.2.1 Sample Size Analysis

Sample size analysis is performed to ensure the reliability of the logistic regression analysis. Peduzzi, Concato, Kemper, Holford, and Feinstein (1996), in their simulation study of the number of events per variable in logistic regression analysis, found that logistic models with few events per variable lead to major problems, including biased regression estimates. As a sample size guideline, the work of Peduzzi et al. (1996) is used to obtain the minimum sample size for each model,

$$
n = \frac{10k}{p} \tag{9}
$$

where $k$ is the number of covariates and $p$ is the proportion of events in the population.

4.2.1.1 Sample size analysis for the odds that the husband drinks.

Model 1 has 12 covariates. The minimum sample size should therefore be 10 × 12/0.16 = 750. Models 2 through 5 have 18, 24, 26 and 26 covariates, respectively, and the minimum sample sizes are 1125, 1500, 1625 and 1625, respectively. Since the population has 15,819 events of drinking husbands, the sample size is large enough for all models.

4.2.1.2 Sample size analysis for the odds that the husband beats his wife.

Model 1 has 12 covariates. The minimum sample size should therefore be 10 × 12/0.33 = 364. Models 2 through 5 have the same number of covariates as mentioned in section 4.2.1.1, and the minimum sample sizes are 546, 728, 788 and 788, respectively. Since the population has 36,629 events of husbands who beat their wives, the sample size is large enough for all models.
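The minimum sample sizes quoted in sections 4.2.1.1 and 4.2.1.2 can be reproduced with a few lines of code. The sketch below assumes that fractional results are rounded up to the next whole observation, which matches the figures reported; the function name `min_sample_size` is hypothetical. The proportions are passed as strings so that the arithmetic is exact.

```python
import math
from fractions import Fraction


def min_sample_size(k, p):
    """Minimum sample size n = 10k / p, rounded up to a whole observation.

    k is the number of covariates; p is the proportion of events, given as a
    decimal string (e.g. "0.16") so that Fraction keeps the arithmetic exact.
    """
    return math.ceil(Fraction(10 * k) / Fraction(p))


# Number of covariates in models (1) to (5), as stated in the text.
covariates = [12, 18, 24, 26, 26]

# Proportions of events used in the text: 0.16 (drinking) and 0.33 (beating).
drink_sizes = [min_sample_size(k, "0.16") for k in covariates]
beat_sizes = [min_sample_size(k, "0.33") for k in covariates]

print("husband drinks:", drink_sizes)
print("husband beats: ", beat_sizes)
```

Exact fractions are used instead of floating-point division so that a value such as 10 × 12/0.16 cannot be pushed above 750 by rounding error before the ceiling is taken.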

4.2.2 Examining the odds that the husband drinks

Table 4 below shows the first model, with year as a fixed effect.

The first model shows that the odds of the husband drinking in a state that has the alcohol prohibition policy are 58% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.2088, 0.8622).

Table 4: The impact of alcohol prohibition on alcohol consumption, model (1)

Model (1) Year as FE

Variable    OR          Robust SE   95% Conf. Interval
Constant    0.3670***   0.0658      0.2583   0.5214
Prohib      0.4243**    0.1535      0.2088   0.8622
Year
  1999      1.0777      0.1552      0.8126   1.4292
  2005      1.2425      0.1788      0.9270   1.6474
  2006      0.3670***   0.1932      1.1821   1.9475
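The link between an odds ratio, its percentage interpretation and its confidence interval can be sketched as follows. The example assumes, as is common in statistical software, that the reported standard error of the odds ratio comes from the delta method and that the interval is built on the log-odds scale and then exponentiated; the thesis does not state this explicitly, and the function name `odds_ratio_summary` is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_summary(odds_ratio, se_or, level=0.95):
    """Return (percentage change in odds, CI lower bound, CI upper bound).

    Assumes se_or is the delta-method standard error of the odds ratio, so the
    standard error of the log odds ratio is se_or / odds_ratio.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    log_or = math.log(odds_ratio)
    se_log = se_or / odds_ratio
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    pct_change = (odds_ratio - 1.0) * 100.0
    return pct_change, lower, upper


# Prohib row of Table 4: odds ratio 0.4243 with robust standard error 0.1535.
pct, lo, hi = odds_ratio_summary(0.4243, 0.1535)
print(f"change in odds: {pct:.1f}%")
print(f"95% CI: ({lo:.4f}, {hi:.4f})")
```

Under these assumptions the Prohib row gives a change in odds of about −58% and an interval of roughly (0.209, 0.862), in line with the bounds listed in Table 4.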

Table 5 below shows the two models subsequent to the first model, where the first model is the effect of alcohol prohibition on the husband’s drinking behaviour with the year of the interview as a fixed effect. The second model adds husband controls and religion as a fixed effect to the first model, and adding wife and bargaining controls to the second model yields the third model.

Figure 2 below shows the evolution of the husband’s drinking probability with 95% CIs. Figure 2(a) shows that the husband’s probability of drinking in a state with the alcohol prohibition policy increased from 0.23 in 1998 to 0.34 in 2006. This result is interesting because, with time, one would expect people to get used to the policy and stop drinking. There are several possible reasons for it: respondents may have become more confident and completed the survey truthfully, there may be no severe consequences for violating the policy so that people go back to drinking, or people may have begun to brew their own alcohol at home. The probability of drinking also varies across religious affiliations. Figure 2(b) shows that Muslim husbands are the least likely to drink alcohol, followed by Hindus, while Sikhs and Christians have the highest probability of drinking if the state has the alcohol prohibition policy.

(a) P(Husb-drink) over the interview years. (b) P(Husb-drink) in different religious affiliations.

Figure 2: Husband’s probability to drink given religion and year of interview with 95% CIs


Table 6: The impact of alcohol prohibition on alcohol consumption, model (4)

Model (4) Husband & Wife age group (FE)

Variable OR Robust SE 95% Conf. Interval


(a) P(Husb-drink) given husband’s age category (b) P(Husb-drink) given wife’s age category

Figure 3: Husband’s probability to drink given his and the wife’s age category with 95% CIs

Figure 3(a) shows that husbands aged 30-34 have the highest drinking probability, followed by the 25-29 and 35-39 age groups, then the 40-49 and 20-24 age groups, with the 15-19 and 50-60 age groups having the lowest drinking probability. The likelihood of the husband drinking given the wife’s age follows the same order as for the husband’s age group. Husbands with wives aged 30-34 and 25-29 have the highest probability of drinking, followed by those with wives aged 35-39 and 20-24, while those with wives aged 40-49 and 15-19 have the lowest drinking probability.

(a) P(Husb-drink) given his age and religious affiliation.

(b) P(Husb-drink) given the wife’s age and religious affiliation.

Figure 4: Husband’s probability to drink given his and the wife’s age and religious affiliation with 95% CIs

Figure 4 above shows trends similar to those in Figure 2, with the same age groups and religious affiliations having the same probabilities. Muslim husbands seem to show little variation between age groups compared with other religions.

Table 7: The impact of alcohol prohibition on alcohol consumption, model (5)

Model (5) Age and Education gap

Variable                               OR          Robust SE   95% Conf. Interval
Constant                               0.9667      0.6181      0.2760   3.3852
Prohib                                 0.3231***   0.9667      0.2349   0.4445
Husb-age                               0.9989      0.0104      0.9787   1.0195
Husb-educ                              0.9426***   0.0115      0.9203   0.9655
Husb-wc                                0.7644***   0.0318      0.7045   0.8264
Urban                                  1.3044***   0.0625      1.1874   1.4329
Hhsize                                 0.9529***   0.0070      0.9393   0.9667
Children                               1.0546***   0.0144      1.0267   1.0832
Rep-educ                               0.9713***   0.0110      0.9499   0.9931
Rep-age                                0.9917      0.0101      0.9721   1.0116
Rep-wc                                 1.0923      0.0643      0.9732   1.2259
B.unfaithful                           1.0726      0.0567      0.9669   1.1897
Ownmoney                               1.1704**    0.0592      1.0560   1.1224
Year
  1999                                 1.1641      0.1463      0.9100   1.4892
  2005                                 1.1526***   0.1952      1.1875   1.9609
  2006                                 1.8065***   0.2235      1.4174   2.3022
Age gap-cat
  Wife 5-10 years older                0.9432      0.4236      0.3911   2.2747
  Wife < 5 years older                 0.9113      0.4214      0.3682   2.2556
  Husband < 5 years older              0.8786      0.4412      0.3283   2.3511
  Husband 5-10 years older             0.8726      0.4374      0.3267   2.3305
  Husband 10-15 years older            0.8831      0.4414      0.3316   2.3520
  Husband 15-20 years older            0.8093      0.4291      0.2863   2.2876
  Husband 20-25 years older            0.8284      0.5042      0.2513   2.7311
  Husband > 25 years older             0.8119      0.4577      0.2689   2.4512
Education gap-cat
  Wife 5-10 years more schooling       0.8915      0.3208      0.4404   1.8045
  Wife < 5 years more schooling        0.9573      0.3752      0.4441   2.0639
  Husband < 5 years more schooling     0.8983      0.3932      0.3809   2.1183
  Husband 5-10 years more schooling    0.8775      0.4061      0.3543   2.1734
  Husband 10-15 years more schooling   0.8434      0.4209      0.3171   2.2432
  Husband ≥16 years more schooling     0.4539      0.2542      0.1514   1.3605
Religion
  Muslim                               0.1916***   0.0394      0.1280   0.2868
  Christian                            2.1238***   0.3546      1.5311   2.9461
  Sikh                                 2.0945***   0.2125      1.7169   2.2552


(a) P(Husb-drink) given the age gap. (b) P(Husb-drink) given the education gap.

Figure 5: Husband’s probability to drink given his and the wife’s age and education gap with 95% CIs

Figure 5 above shows the husband’s drinking probabilities given the age and education gaps. Figure 5(a) shows a high drinking probability for husbands whose wives are 10 years older than them, followed by those whose wives are 5-10 years older, with husbands whose wives are less than 5 years older having the lowest drinking probability among these groups. The probability of a husband drinking decreases the older the husband is relative to his wife. In general, husbands with older wives are more likely to drink, even though drinking is legally prohibited, than husbands who are older than their wives.

Figure 5(b) shows the husband’s drinking behaviour given the education gap. Husbands whose wives are more educated than they are have the highest drinking probability, while husbands who are more educated than their wives have the lowest.

Figure 6 below shows the drinking probabilities of husbands given the age and education gaps together with their religious affiliation, which follow the same trend as in Figure 5. The variation between religious affiliations remains unchanged, with Christians and Sikhs having the highest drinking probability and Muslims the lowest.

(a) P(Husb-Drink) given the age gap in different religious affiliation.

(b) P(Husb-Drink) given the education gap in different religious affiliation.


4.2.3 Examining the odds that the husband beats his wife

Table 8 below shows the first model, which examines the effect of alcohol prohibition on domestic violence with the year of the interview as a fixed effect. This first model shows that the odds of the husband beating his wife in a state that has the alcohol prohibition policy are 52% lower than in states without alcohol prohibition, and the result is statistically significant (95% CI: 0.3895, 0.5931).

Table 8: The impact of alcohol prohibition on domestic violence, model (1)

Model (1) Year as FE

Variable    OR          Robust SE   95% Conf. Interval
Constant    0.2593***   0.0439      0.1861   0.3612
Prohib      0.4807***   0.0516      0.3895   0.5931
Year
  1999      0.9749      0.1563      0.7120   1.3349
  2005      0.7363*     0.1563      0.5143   1.0542
  2006      0.6334***   0.1083      0.4530   0.8856

Table 9 below shows the second model, which adds husband controls and religion as fixed effects to the first model; adding wife and bargaining controls to the second model yields the third model. When husband controls are added to the first model, the odds that the husband beats his wife in a state with blanket alcohol prohibition, with husband controls held at their means, are 50% lower (95% CI: 0.4080, 0.6097) than in states without blanket alcohol prohibition.

(a) P(Husb-Beat) his wife over the years. (b) P(Husb-Beat) his wife across religious affiliations.

Figure 7: Husband’s probability to beat wife over the years of interview and in different religious affiliations with 95% CIs

Figure 7 above is used to examine the evolution of domestic violence under blanket alcohol prohibition; it shows domestic violence across religious affiliations and over the interview years. Figure 7(a) shows that domestic violence decreased throughout the interview years: the probability of a wife reporting that her husband beat her decreased from 0.2 in 1998 to 0.135 in 2006. Figure 7(b) shows that Sikh husbands are the least likely to beat their wives, followed by other religions, while Muslims, Hindus and Christians have the highest probability of beating their wives if the state has the alcohol prohibition policy.

Table 10: The impact of alcohol prohibition on domestic violence, model (4)

Model (4) Husband & Wife age group (FE)

Variable OR Robust SE 95% Conf. Interval


Figure 8 below shows the husband’s domestic violence behaviour given the age groups of husbands and wives. Figure 8(a) shows that husbands aged 20-49 have the highest probability of being reported for domestic violence, followed by those aged 50-60; young husbands aged 15-19 have the lowest probability of being reported. The likelihood of the wife reporting that her husband beats her, given the wife’s age, follows the same order as for the husband’s age group. Husbands with wives aged 20-34 have the highest probability of being reported for domestic violence, followed by those with wives aged 35-39 and 15-19, while those with wives aged 40-49 have the lowest probability.

(a) P(Husb-Beat) his wife given his age cate-gory.

(b) P(Husb-Beat) his wife given the wife’s age category.

Figure 8: Husband’s probability to beat wife given his and the wife’s age category with 95% CIs

(a) P(Husb-Beat) his wife given his age and re-ligious affiliation.

(b) P(Husb-Beat) his wife given the wife’s age and religious affiliation.

Figure 9: Husband’s probability to beat wife given his and the wife’s age and religious affiliation with 95% CIs

Finally, to examine the effect of the age gap and the education gap between husband and wife, we add both as fixed effects to model (4). Table 11 below shows the results for this fifth model: the odds of a husband beating his wife in a state with the alcohol prohibition policy are 51% lower than in states without it, and the result is statistically significant (OR = 0.4887; 95% CI: 0.3975, 0.6008).
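The arithmetic behind the "51% lower" reading can be checked with a short sketch. It takes the Prohib row of Table 11 (OR = 0.4887, robust SE = 0.0515) and assumes that the reported standard error is the delta-method standard error of the odds ratio, so that the interval is built on the log-odds scale and then exponentiated. The function names are hypothetical and chosen only for illustration.

```python
import math


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_ci(odds_ratio: float, se_or: float, z: float = 1.96):
    """Wald interval computed on the log-odds scale.

    Assumption: se_or is the delta-method standard error of the odds
    ratio, so the standard error of log(OR) is se_or / odds_ratio.
    """
    se_log = se_or / odds_ratio
    log_or = math.log(odds_ratio)
    return math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Prohib row of Table 11: OR = 0.4887, robust SE = 0.0515
prohib_or, prohib_se = 0.4887, 0.0515
change = percent_change_in_odds(prohib_or)
lower, upper = odds_ratio_ci(prohib_or, prohib_se)
print(f"change in odds: {change:.1f}%")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Under that assumption the interval agrees with the one reported in Table 11 to four decimal places. Because the whole interval lies below 1, the reduction in odds is significant at the 5% level.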

(a) P(Husb-Beat) his wife given the age gap.

(b) P(Husb-Beat) his wife given the education gap.

Figure 10: Husband’s probability to beat his wife given his and the wife’s age and education gap with 95% CIs

Table 11: The impact of alcohol prohibition on domestic violence, model (5)

Model (5) Age and Education gap

Variable OR Robust SE 95% Conf. Interval

Constant 1.0108*** 0.9772 0.1519 6.7240

Prohib 0.4887*** 0.0515 0.3975 0.6008

Husb-age 1.0210* 0.0126 0.9966 1.0460

Husb-educ 0.9873 0.0084 0.9710 1.0039

Husb-wc 0.9050* 0.0523 0.8081 1.0135

Urban 1.0257 0.0687 0.8995 1.1695

Hhsize 0.9565*** 0.0111 0.9350 0.9785

Children 1.0925*** 0.0122 1.0688 1.1167

Rep-educ 0.8859*** 0.0080 0.8705 0.9016

Rep-age 0.9659*** 0.0122 0.9423 0.9901

Rep-wc 1.1384*** 0.0527 1.0397 1.2466

B.unfaithful 1.2885*** 0.0759 1.1481 1.4462

Ownmoney 1.0769* 0.0409 0.997 1.1602

Year

1999 1.1173 0.1760 0.8206 1.5213

2005 0.8702 0.1426 0.6313 1.1998

2006 0.8140 0.1283 0.5976 1.1088

Age gap-cat

Wife 5-10 years older 0.6133 0.2940 0.2397 1.5619

Wife < 5 years older 0.5879 0.2463 0.2586 1.3362

Husband < 5 years older 0.4939 0.2394 0.1910 1.2771

Husband 5-10 years older 0.4675 0.2448 0.1675 1.3047

Husband 10-15 years older 0.4272 0.2372 0.1439 1.2685

Husband 15-20 years older 0.4039 0.2438 0.1237 1.3185

Husband 20-25 years older 0.3733 0.2349 0.1087 1.2814

Husband > 25 years older 0.3773 0.2487 0.1037 1.3732

Education gap-cat

Wife 5-10 years more schooling 1.5490 0.8309 0.5413 4.4326

Wife < 5 years more schooling 1.3224 0.7327 0.4464 3.9171

Husband < 5 years more schooling 0.8893 0.5096 0.2892 2.7344

Husband 5-10 years more schooling 0.7813 0.4471 0.2546 2.3982

Husband 10-15 years more schooling 0.5877 0.3597 0.1771 1.9507

Husband ≥16 years more schooling 0.3399 0.2317 0.0894 1.2928

Religion

Muslim 0.9126 0.0806 0.7675 1.0850

Christian 1.1855 0.0876 1.0257 1.3703

Sikh 0.8151 0.1237 0.6054 1.0974

5 Conclusion

Luca et al. (2015) used the LPM to examine both the effect of prohibition on the drinking behaviour of husbands and the impact of prohibition on domestic violence. However, they did not mention the two main problems with the LPM: unbounded probability predictions are possible, and linearity does not make much sense conceptually when modelling probabilities. This raised enough curiosity to replicate their study. The objectives of this thesis were therefore to give a brief overview of the linear probability model and logistic regression, and to review both through an application in order to decide whether logistic regression would be preferable to the LPM.
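The first of these problems is easy to reproduce on simulated data. The sketch below is only an illustration and does not use the survey data: the sample size, the single regressor and its coefficient are all assumptions. It fits an LPM by ordinary least squares and a logistic regression by Newton-Raphson, then compares the fitted probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption, not the survey data):
# one regressor and a binary outcome drawn from a logistic model.
n = 5000
x = rng.normal(0.0, 2.0, size=n)
p_true = 1.0 / (1.0 + np.exp(-x))
y = rng.binomial(1, p_true)
X = np.column_stack([np.ones(n), x])

# LPM: ordinary least squares of the 0/1 outcome on X.
beta_lpm, *_ = np.linalg.lstsq(X, y, rcond=None)
p_lpm = X @ beta_lpm

# Logistic regression: Newton-Raphson steps on the log-likelihood.
beta_logit = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))
    gradient = X.T @ (y - p)
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])
    beta_logit = beta_logit + np.linalg.solve(hessian, gradient)
p_logit = 1.0 / (1.0 + np.exp(-(X @ beta_logit)))

share_outside = np.mean((p_lpm < 0.0) | (p_lpm > 1.0))
print(f"LPM fitted values outside [0, 1]: {share_outside:.1%}")
print(f"logit fitted range: {p_logit.min():.4f} to {p_logit.max():.4f}")
```

In this setup some LPM fitted values fall below 0 or above 1 for observations far from the mean of the regressor, whereas the logistic fitted values stay inside the unit interval because the sigmoid maps any linear index into (0, 1).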

Both LPM and logistic regressions were estimated. The LPM estimates resulted in unbounded probabilities, and in the second through fifth models collinearity caused the variable prohibition to be omitted. To overcome these problems, Luca et al. (2015) performed a linear regression in which they regressed the dependent variables (husband drinks and husband beats wife) on all independent variables except the variable of interest, prohibition (alcohol prohibition); they then obtained the residuals in a post-estimation step and regressed these residuals, as the dependent variable, on prohibition. Although this procedure yielded results, it did not resolve the collinearity or the unbounded predicted probabilities, and it produces biased parameter estimates.
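Why the residual-based procedure is biased can be seen in a small simulation. This is a hedged sketch rather than a replication: the data are simulated, the outcome is continuous for simplicity, and a single variable `z` stands in for all the controls. When the variable of interest is correlated with the controls, regressing the first-stage residuals on it shrinks the coefficient towards zero, by a factor equal to one minus the R-squared from regressing the variable of interest on the controls.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data (an assumption for illustration, not the survey data).
n = 20000
z = rng.normal(size=n)                      # stands in for the controls
x = 0.8 * z + rng.normal(size=n)            # variable of interest, correlated with z
y = 1.0 * x + 1.0 * z + rng.normal(size=n)  # true coefficient on x is 1.0


def ols(design, response):
    """Least-squares coefficients of response regressed on design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


ones = np.ones(n)

# Multiple regression: y on a constant, x and z together.
beta_full = ols(np.column_stack([ones, x, z]), y)[1]

# Two-step procedure: y on the controls only, then the residuals on x.
controls = np.column_stack([ones, z])
residuals = y - controls @ ols(controls, y)
beta_two_step = ols(np.column_stack([ones, x]), residuals)[1]

print(f"multiple regression coefficient on x: {beta_full:.3f}")
print(f"two-step residual coefficient on x:   {beta_two_step:.3f}")
```

With these simulated parameters the multiple regression recovers a coefficient close to the true value of 1.0, while the two-step estimate settles near 1/1.64, roughly 0.61, which is the attenuation factor implied by the correlation between `x` and `z`.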

Acknowledgements

References

Amemiya, T. (1981). Qualitative response models: A survey. Journal of Economic Literature, 19(4), 1483–1536. Retrieved from http://www.jstor.org/stable/2724565

Angrist, J., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist’s companion (1st ed.). Princeton University Press. Retrieved from https://EconPapers.repec.org/RePEc:pup:pbooks:8769

Chen, W., Hribar, P., & Melessa, S. (2018). Incorrect inferences when using residuals as dependent variables. Journal of Accounting Research, 56(3), 751–796. Retrieved from https://doi.org/10.1111/1475-679X.12195 doi: 10.1111/1475-679X.12195

Donald, S. G., & Lang, K. (2007). Inference with difference-in-differences and other panel data. The Review of Economics and Statistics, 89(2), 221–233. Retrieved from https://doi.org/10.1162/rest.89.2.221 doi: 10.1162/rest.89.2.221

Enders, C. (2010). Applied missing data analysis. New York, NY: The Guilford Press.

Freckleton, R. P. (2002). On the misuse of residuals in ecology: Regression of residuals vs. multiple regression. Journal of Animal Ecology, 71(3), 542–545. Retrieved from http://www.jstor.org/stable/2693531

Greene, W. H. (2012). Econometric analysis (7th ed.). Prentice Hall, Upper Saddle River, NJ.

Luca, D. L., Owens, E., & Sharma, G. (2015). Can alcohol prohibition reduce violence against women? The American Economic Review, 105(5), 625–629. Retrieved from http://www.jstor.org/stable/43821957

Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge: Cambridge U.P.

Peduzzi, P., Concato, J., Kemper, E., Holford, T. R., & Feinstein, A. R. (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology, 49(12), 1373–1379.
