Estimating Multinomial Logit Models with Samples of Alternatives

Benjamin Jarvis

The self-archived postprint version of this journal article is available at Linköping University Institutional Repository (DiVA):

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156125

N.B.: When citing this work, cite the original publication.

Jarvis, B., (2018), Estimating Multinomial Logit Models with Samples of Alternatives, Sociological methodology. https://doi.org/10.1177/0081175018793460

Original publication available at:

https://doi.org/10.1177/0081175018793460

Copyright: American Sociological Association


ESTIMATING MULTINOMIAL LOGIT MODELS WITH SAMPLES OF ALTERNATIVES*

Benjamin F. Jarvis

Institute for Analytical Sociology, Linköping University

April 3, 2019

Keywords: discrete choice; choice sets; sampling; residential mobility

2,055 words (including text, abstract, footnotes, references); 1 figure

*Published in Sociological Methodology, DOI: 10.1177/0081175018793460. This is a pre-print version, prior to the publisher’s copyedits. This comment was prepared with the input of Robert Mare and Elizabeth Bruch. Financial support was provided by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no. 324233, Riksbankens Jubileumsfond (DNR M12-0301:1), and the Swedish Research Council (DNR 445-2013-7681, DNR 340-2013-5460, and 2016-01987). Please send correspondence to: Benjamin F. Jarvis, Linköping University, Institute for Analytical Sociology, S-601 74 Norrköping, Sweden.


Abstract

This research note reconsiders advice offered by Bruch and Mare (2012) about sampling choice sets in conditional logistic regression models of residential mobility. Contradicting Bruch and Mare’s advice, past econometric research shows that no statistical correction is needed when using simple random sampling of unchosen alternatives to pare down respondents’ choice sets. Using data on stated residential preferences contained in the Los Angeles portion of the Multi-City Study of Urban Inequality, it is shown that following Bruch and Mare’s advice (to implement a statistical correction for simple random choice set sampling) leads to biased coefficient estimates. This bias is all but eliminated if the sampling correction is omitted.


Bruch and Mare’s 2012 article, “Methodological Issues in the Analysis of Residential Preferences, Residential Mobility, and Neighborhood Change,” re-introduces a flexible class of discrete choice models into the mobility researcher’s toolkit. These models are especially relevant to research on residential mobility and migration because they allow for the simultaneous consideration of push and pull factors in the migration process. Bruch and Mare’s contribution is already paying dividends in research on racial segregation (Quillian 2015; Spring et al. 2017), and I hope more segregation and mobility researchers incorporate these methods into their research.

Much of Bruch and Mare’s discussion focuses on what sociologists and epidemiologists often call conditional logistic regression, but which economists and choice modelers call multinomial logistic regression (MNL). While the advice Bruch and Mare offer about the MNL model is generally sound, they make an erroneous suggestion regarding sampling that stems from a misreading of methodological developments in econometrics. Following Bruch and Mare’s advice could lead to biased parameter estimates. Given the potential for bias, the aim of the present note is to correct the sociological record and provide updated guidance for sampling in MNL models. I use an empirical case to demonstrate that Bruch and Mare’s advice may lead researchers to overestimate coefficient magnitudes, especially when using very small sampling fractions, a likely occurrence in residential mobility research conducted in large urban areas.

In research on residential mobility, MNL models can be computationally cumbersome because they require an analyst to treat every neighborhood in a city as a distinct categorical outcome, or alternative in the parlance of choice modelers. These neighborhood alternatives form a choice set from which each respondent chooses a neighborhood in which to live. To estimate MNL models, the analyst constructs a dataset by cross joining data describing the attributes of N individuals with data describing the characteristics of J neighborhoods in the choice set. Many reasonable unit definitions lead to very large choice sets for major metropolitan areas, requiring the production of datasets with enormous (N × J) numbers of person-alternative observations and inducing long model estimation times.
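The cross join described above can be sketched in a few lines of Python. This is only an illustration under assumed toy data: the person and neighborhood identifiers, the attribute names `x` and `z`, and the helper `cross_join` are hypothetical and do not come from the original analysis (which was carried out in Stata).

```python
from itertools import product


def cross_join(persons, neighborhoods):
    """Return one row per (person, neighborhood) pair in long format."""
    rows = []
    for (pid, x), (nid, z) in product(persons.items(), neighborhoods.items()):
        rows.append({"person": pid, "alt": nid, "x": x, "z": z})
    return rows


# Hypothetical toy data: N = 3 persons and J = 4 neighborhoods.
persons = {"p1": 0.2, "p2": 0.5, "p3": 0.9}
neighborhoods = {"n1": 0.10, "n2": 0.35, "n3": 0.60, "n4": 0.80}

long_data = cross_join(persons, neighborhoods)
print(len(long_data))  # N * J = 3 * 4 = 12 person-alternative rows
```

With thousands of respondents and thousands of neighborhoods, the same construction yields millions of rows, which is the computational burden the text refers to.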


Thankfully, the econometric literature provides a convenient workaround. Bruch and Mare highlight research by McFadden (1978) showing that consistent parameter estimates can be obtained using a sampling approach that dramatically reduces the computational burden. The approach involves randomly assigning a small sample of unchosen neighborhoods to each respondent’s choice set, in addition to each respondent’s observed destination. This results in dramatically reduced dataset sizes and enables speedier model fitting.
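A minimal sketch of this sampling step, assuming simple random sampling without replacement, might look as follows. The function name `sample_choice_set`, the integer labels for alternatives, and the seed are illustrative assumptions rather than part of the original analysis.

```python
import random


def sample_choice_set(chosen, alternatives, size, rng):
    """Return the chosen alternative plus (size - 1) unchosen alternatives
    drawn by simple random sampling without replacement."""
    unchosen = [a for a in alternatives if a != chosen]
    return [chosen] + rng.sample(unchosen, size - 1)


rng = random.Random(12345)        # arbitrary seed, for reproducibility only
alternatives = list(range(1000))  # hypothetical labels for 1,000 neighborhoods
reduced = sample_choice_set(chosen=42, alternatives=alternatives, size=5, rng=rng)
print(len(reduced))  # 5: the observed choice plus 4 sampled unchosen alternatives
```

Each respondent receives an independent draw, so the long-format dataset shrinks from N × J rows to N times the reduced choice set size.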

In some cases, estimating an MNL model with a sample of unchosen alternatives requires implementing a statistical correction. Uniquely for the MNL model, applying a correction is as simple as estimating a standard MNL model with the reduced choice set, but with a user-calculated sampling correction included as a covariate with a constrained coefficient. The challenge for the researcher is to come up with the appropriate sampling correction.

Bruch and Mare (pp. 121–122) present an expression for neighborhood choice probabilities in the case of choice set sampling. The probability that person i chooses neighborhood j when unchosen alternatives are sampled into a reduced choice set, C(i), is:

$$p_{ij}(Z_j, X_i, C(i)) = \frac{\exp\left[\beta Z_j + \gamma Z_j X_i - \ln q_{ij}\right]}{\sum_{k \in C(i)} \exp\left[\beta Z_k + \gamma Z_k X_i - \ln q_{ik}\right]}, \quad j \in C(i). \tag{1}$$

Here, $Z_j$ are characteristics of neighborhoods, $X_i$ are characteristics of individuals or households, and $\ln(q_{ij})$ is a correction factor to be included in the model with its coefficient constrained to −1. $q_{ij}$ is a user-defined sampling probability indicating the probability of including alternative j in respondent i’s reduced choice set.

As presented, this expression is misleading. While Bruch and Mare imply that (1) is general, it actually corresponds to the special case of importance sampling (Ben-Akiva and Lerman 1985, p. 265). In fact, the general expression is:

$$p_{ij}(Z_j, X_i, C(i)) = \frac{\exp\left[\beta Z_j + \gamma Z_j X_i + \ln \pi_i(C(i) \mid j)\right]}{\sum_{k \in C(i)} \exp\left[\beta Z_k + \gamma Z_k X_i + \ln \pi_i(C(i) \mid k)\right]}, \quad j \in C(i). \tag{2}$$

The general sampling correction for alternative j is given by $\ln \pi_i(C(i) \mid j)$, where $\pi_i(C(i) \mid j)$ are sampling probabilities for the reduced choice set, considered as a whole, given the hypothetical choice of $j \in C(i)$. In other words, the sampling corrections in the general case are calculated based on the sampling probabilities for all alternatives in the reduced choice set except for the focal alternative for which the sampling correction is being calculated. At the moment of calculation, the focal alternative is treated as if it were the chosen alternative. It just so happens that for the MNL model, maximizing a conditional likelihood constructed with the probabilities in (2) produces estimates consistent with those obtained by maximizing the unconditional likelihood based on the full choice set (Manski and McFadden 1981).

By proceeding from the more general expression in (2), it can be seen that no sampling correction is needed when drawing simple random samples of unchosen alternatives into reduced choice sets. This follows from the uniform conditioning property of simple random sampling, whereby $\pi_i(C(i) \mid j) = \pi_i(C(i) \mid k) \;\; \forall\, j, k \in C(i)$ (McFadden 1978, p. 545). Thus the sampling correction terms in the numerator and denominator of (2), $\ln \pi_i(C(i) \mid j)$ and $\ln \pi_i(C(i) \mid k)$, cancel out and are ignored when estimating the model. Key texts in choice modelling are unequivocal on this point: Ben-Akiva and Lerman’s (1985) text states that when using simple random sampling, “a standard logit model with a choice set given by C(i) yields consistent estimates” (p. 264). Train’s more recent text (2009) states, “If all [unchosen] alternatives have the same chance of being selected into the subset, then estimation proceeds on the subset of alternatives as if it were the full set.” (p. 64). In contrast, Bruch and Mare advise estimating a model of the form (1) and setting $q_{ij}$ as follows: “1. If the alternative is chosen, sample with $q_{ij} = 1.0$” and “2. If the alternative is not chosen, sample with $q_{ij} < 1.0$” (p. 122). This implies applying a sampling correction when drawing a simple random sample of unchosen alternatives, contradicting the previous econometric advice.

I empirically confirm the econometric advice for simple random sampling by estimating discrete choice models of neighborhood choice using data from the Los Angeles portion of the Multi-City Study of Urban Inequality (LA-MCSUI).¹ The LA-MCSUI survey contained an

[...] racial composition. Respondents were presented with a card depicting a stylized neighborhood containing three rows of five houses and were asked to imagine living in the central house. Respondents were then asked to depict their ideal neighborhood by filling in each of the remaining 14 houses with White, Black, Hispanic, or Asian occupants. Ignoring the different spatial arrangements of neighbors, respondents could generate $\binom{14+4-1}{4-1} = 680$ distinct racial compositions. Thus, the ideal neighborhood task can be conceptualized as a discrete choice problem in which respondents choose one preferred neighborhood racial composition out of 680 possible racial compositions. I focus on the data for White respondents, and estimate discrete choice models where the probability of choosing neighborhood j takes the following form:

$$p_{ij}(Z_j, X_i, C(i)) = \frac{\exp\left[\beta_A \mathit{PA}_j + \beta_B \mathit{PB}_j + \beta_H \mathit{PH}_j + \alpha \ln q_{ij}\right]}{\sum_{k \in C(i)} \exp\left[\beta_A \mathit{PA}_k + \beta_B \mathit{PB}_k + \beta_H \mathit{PH}_k + \alpha \ln q_{ik}\right]}, \quad j \in C(i). \tag{3}$$

where $\mathit{PA}_j$, $\mathit{PB}_j$, and $\mathit{PH}_j$ are the proportions Asian, Black, and Hispanic among the possible neighborhoods, the β’s are the coefficients indicating the preference for these different groups among White respondents, $q_{ij}$ is the sampling probability calculated according to Bruch and Mare’s advice (i.e., with $q_{ij} = 1$ for the chosen alternative), and α is a parameter to be constrained either to 0, following the econometric advice, or −1, producing estimates based on Bruch and Mare’s advice.
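The figure of 680 possible racial compositions can be checked by brute-force enumeration. The sketch below treats each composition as an unordered assignment of 14 houses to 4 groups and compares the enumerated count with the binomial coefficient; the variable names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

groups = ("White", "Black", "Hispanic", "Asian")
houses = 14

# Each composition is a multiset of 14 occupants drawn from 4 groups,
# i.e. spatial arrangement is ignored.
n_enumerated = sum(1 for _ in combinations_with_replacement(groups, houses))

# Stars-and-bars count: C(14 + 4 - 1, 4 - 1) = C(17, 3).
n_formula = comb(houses + len(groups) - 1, len(groups) - 1)

print(n_enumerated, n_formula)  # 680 680
```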

I compare coefficient estimates using the full choice set to coefficient estimates obtained when sampling unchosen alternatives into reduced choice sets. Producing estimates for the full choice set is not too onerous, as the number of respondents is reasonable and the full choice set is not too large. As for the reduced choice sets, I produce 100 data sets for each of several reduced choice set sizes J = {5, 10, 20, 50, 100}, by randomly sampling J − 1 unchosen alternatives into each respondent’s choice set. The chosen neighborhood is always included, so a reduced choice set of 5 neighborhoods contains a sample of 4 unchosen neighborhoods, a reduced choice set of 10 neighborhoods contains a sample of 9 unchosen neighborhoods, and so on. For each dataset, I estimate the model without using a sampling correction (α = 0) and then using Bruch and Mare’s correction (α = −1). Thus, I estimate the model in (3) a total of 2 × 100 × 5 + 1 = 1,001 times. These estimates are summarized in Figure 1, which presents the median, interquartile range, and 95% intervals for sets of 100 point estimates.
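The count of 1,001 model fits follows from crossing the three design factors and adding the single full-choice-set fit. A short sketch of that design grid, with hypothetical variable names, is:

```python
from itertools import product

sizes = [5, 10, 20, 50, 100]  # reduced choice set sizes
replicates = range(100)       # datasets drawn per size
alphas = [0, -1]              # no correction vs. Bruch-Mare correction

# One model fit per (size, replicate, alpha) combination.
runs = list(product(sizes, replicates, alphas))

# Plus one fit on the full choice set of 680 alternatives.
total_fits = len(runs) + 1
print(total_fits)  # 1001
```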

Coefficient estimates presented in Figure 1 reveal that the original econometric advice is sound, while Bruch and Mare’s correction biases estimates upwards in magnitude. The smaller the number of sampled alternatives, the greater the bias. Estimates are also highly variable when using small sampling fractions with Bruch and Mare’s sampling correction. In contrast, eliminating the sampling correction leads to estimates that are in line with estimates obtained using the full choice set. Note that even when eliminating the sampling correction, smaller sampling fractions lead to a greater loss of alternative-specific information, rendering the estimates more variable.
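The cancellation behind the simple random sampling result can also be checked numerically. The sketch below is an illustration, not the original analysis: the utilities are made-up values, `choice_probs` is a hypothetical helper, and the unchosen-alternative sampling probability (K − 1)/(J − 1) is an assumed reading of how $q_{ij}$ would be set under Bruch and Mare’s advice. It shows that a correction term common to every alternative in the reduced choice set leaves the choice probabilities unchanged, whereas a correction that treats the chosen alternative differently does not.

```python
import math


def choice_probs(utilities, log_corrections):
    """MNL probabilities over a reduced choice set, with an additive
    correction term for each alternative."""
    scores = [v + c for v, c in zip(utilities, log_corrections)]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


J, K = 680, 5  # full and reduced choice set sizes
utilities = [0.4, -0.1, 0.3, -0.6, 0.0]  # hypothetical; index 0 is the chosen alternative

# No correction at all.
no_correction = choice_probs(utilities, [0.0] * K)

# Simple random sampling: every set of K - 1 unchosen alternatives is equally
# likely, so ln pi(C | j) is the same constant for every j in C.
log_pi = -math.log(math.comb(J - 1, K - 1))
srs = choice_probs(utilities, [log_pi] * K)

# Bruch-Mare style correction (-ln q): q = 1 for the chosen alternative,
# q = (K - 1) / (J - 1) for the unchosen ones (assumed sampling fraction).
q_unchosen = (K - 1) / (J - 1)
bruch_mare = choice_probs(utilities, [0.0] + [-math.log(q_unchosen)] * (K - 1))

print(no_correction[0], srs[0], bruch_mare[0])
```

The first two probabilities agree up to floating-point error, while the third is much smaller: the asymmetric correction inflates the weight of every unchosen alternative relative to the observed choice, and the fitted coefficients have to compensate for that distortion.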

There are, of course, other schemes besides simple random sampling, and these require sampling corrections. Different sampling schemes are usually used to increase the precision of estimates by ensuring that alternatives deemed most relevant to the choice process are included in each respondent’s choice set, or by obtaining sufficient variance in alternative-level covariates. The importance sampling approach represented by (1) is one such scheme. For example, in residential mobility research it is possible to use MNL to estimate a single model for both movers and non-movers that accounts for push and pull factors simultaneously. This involves treating the household’s current neighborhood or housing unit as an alternative in the broader choice set, albeit an alternative that is more likely to be chosen than all others. Because immobility is common, it can be useful to always include the mover’s origin in the reduced choice set (i.e., setting $q_{ij} = 1$ for that neighborhood or housing unit), while sampling the remaining alternatives with $q_{ij} < 1$.

Even in the importance sampling case, Bruch and Mare’s advice misses the mark. They advise always setting $q_{ij} = 1$ for the chosen alternative, but the observed choice should [...] included in the reduced choice set, $q_{ij}$ should be set for all alternatives according to a priori considerations (Ben-Akiva and Lerman 1985, p. 265). The logic of the sampling correction in (2) hinges on calculating the conditional probability of sampling the remainder of the reduced choice set conditional on the hypothetical (not observed) choice of the alternative for which the sampling correction is being calculated. It just happens to be the case that (2) simplifies mathematically to (1) for importance sampling. So, $q_{ij}$ should not be set to 1 for the chosen alternative unless the analyst’s a priori sampling scheme dictates it; for example, if the chosen alternative is also the origin alternative in a combined, mover-stayer mobility model.
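The simplification of (2) to (1) can be sketched as follows. The derivation assumes one particular importance sampling design, which is an assumption made here for illustration: each alternative k in the full choice set is drawn into C(i) independently with a priori probability $q_{ik} > 0$, and the hypothetically chosen alternative j is then included with certainty. The symbol $K_i(C(i))$ is introduced only for this sketch.

```latex
\begin{align*}
\pi_i(C(i) \mid j)
  &= \prod_{k \in C(i),\, k \neq j} q_{ik} \prod_{k \notin C(i)} (1 - q_{ik}) \\
  &= \frac{1}{q_{ij}} \underbrace{\prod_{k \in C(i)} q_{ik} \prod_{k \notin C(i)} (1 - q_{ik})}_{K_i(C(i))} \\
\ln \pi_i(C(i) \mid j)
  &= -\ln q_{ij} + \ln K_i(C(i)).
\end{align*}
```

Because $\ln K_i(C(i))$ does not depend on j, it appears in every term of the numerator and denominator of (2) and cancels, leaving the correction $-\ln q_{ij}$ of expression (1). Note that every $q_{ij}$ in this derivation is the a priori sampling probability, including the one for the observed choice.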

In conclusion, no sampling correction should be used when reducing choice sets through simple random sampling of unchosen alternatives in MNL models. Using an inappropriate sampling correction leads to biased coefficients. Instead, researchers should simply estimate their models as normal, treating the reduced choice set as if it were the full choice set. If an analyst wishes to use a more complex sampling design, the observed choice should not receive a special sampling correction just because it is always included in the choice set. Its sampling correction, if needed, should be calculated according to the same a priori sampling probabilities applied to the non-chosen alternatives. I urge researchers who want to explore more complex sampling designs to consult Ben-Akiva and Lerman (1985) for additional insights.


ENDNOTES

1. The repository https://bitbucket.org/igjarjuk/bmcom contains all of the Stata code and data needed to reproduce this analysis.


BIOGRAPHY

Benjamin Jarvis is senior lecturer at the Institute for Analytical Sociology, Linköping University. He uses discrete choice models in conjunction with agent-based and micro-simulations to study residential segregation dynamics and social mobility. His current research uses Swedish registers to study how segregation affects and is affected by assortative mating and kinship ties.


REFERENCES

Ben-Akiva, Moshe E. and Steven R. Lerman. 1985. Discrete Choice Analysis: Theory and Application to Travel Demand. Cambridge, Massachusetts: MIT Press.

Bruch, Elizabeth E. and Robert D. Mare. 2012. “Methodological Issues in the Analysis of Residential Preferences, Residential Mobility, and Neighborhood Change.” Sociological Methodology 42(1):103–154.

Manski, Charles F. and Daniel McFadden. 1981. “Alternative Estimators and Sample Designs for Discrete Choice Analysis.” In Structural Analysis of Discrete Data with Econometric Applications, edited by Charles F. Manski and Daniel McFadden, pp. 2–50. Cambridge, Massachusetts: MIT Press.

McFadden, Daniel. 1978. “Modeling the Choice of Residential Location.” In Spatial Interaction Theory and Planning Models, edited by Anders Karlqvist, Lars Lundqvist, Folke Snickars, and Jörgen W. Weibull, volume 3 of Studies in Regional Science and Urban Economics, pp. 75–96. North-Holland Publishing Company.

Quillian, Lincoln. 2015. “A Comparison of Traditional and Discrete-Choice Approaches to the Analysis of Residential Mobility and Locational Attainment.” The ANNALS of the American Academy of Political and Social Science 660(1):240–260.

Spring, Amy, Elizabeth Ackert, Kyle Crowder, and Scott J. South. 2017. “Influence of Proximity to Kin on Residential Mobility and Destination Choice: Examining Local Movers in Metropolitan Areas.” Demography 54(4):1277–1304.

Train, Kenneth. 2009. Discrete Choice Methods with Simulation, 2nd ed. Cambridge: Cambridge University Press.


[Figure 1 plot. Panels: % Asian coef., % Black coef., % Hispanic coef. Vertical axis: coefficient estimate (−0.7 to −0.2). Horizontal axis: choice set size (5, 10, 20, 50, 100). Legend: no sampling correction; Bruch-Mare sampling correction.]

Figure 1: Neighborhood racial composition coefficient estimates from MNL models of ideal neighborhood choice, White LA-MCSUI respondents (N=791). Dark horizontal lines show the point estimates obtained for the full choice set of 680 possible neighborhood racial compositions. Each box summarizes the interquartile range, and the whiskers show 95 percent intervals, for 100 point estimates obtained from models estimated with the specified sampling correction on 100 datasets constructed with the specified choice set sizes. The choice sets in each dataset were constructed by randomly drawing, without replacement, a fixed number (4, 9, 19, 49, or 99) of unchosen alternatives and adding each respondent’s observed choice.
