## Discrete Choice Modeling based on Utility Theory to Explain Response Propensity in Sampling Surveys

**Author: Mattias Holm**

**(930705)**

*Fall 2019*

Independent Project I, Master Thesis, 15 credits Statistics

Örebro University, School of Business

Supervisor: Thomas Laitila

## Abstract

A discrete choice model based on utility theory is constructed to describe behaviors affecting an individual's response propensity in sampling surveys. With goods and leisure as constraints, and with variables affecting both the response rate and the survey variable entering the utility function, a threshold for the discrete decision to participate or not participate in a survey is derived. The approach is tested in practice using household expenditure survey data from Statistics Sweden, and the results show a highly significant correspondence between the theoretical model and the empirical findings. The conclusions are that the discrete choice model seems to be an appropriate approach for modeling behaviors affecting a single individual's response propensity, and that it can be a useful foundation for other and more complex modeling and adjustment of nonresponse.

*Keywords: Response propensity, response propensity factor, nonresponse, utility theory, discrete choice model*

Acknowledgement: Many thanks to Professor and supervisor Thomas Laitila for contributions to both the theoretical and empirical elements, as well as for guidance and support during the entire thesis process.

**Table of Contents**

**1. INTRODUCTION**

1.1. PREVIOUS RESEARCH

**2. THEORETICAL FRAMEWORK**

2.1. DISCRETE CHOICE MODEL

2.2. RESPONSE PROPENSITY FACTOR

**3. DATA**

3.1. HOUSEHOLD EXPENDITURE SURVEY (HUT)

3.2. VARIABLES

**4. MODELING**

**5. RESULTS**

**6. DISCUSSION**

**7. CONCLUSIONS**

**REFERENCES**

**APPENDIX**

## 1. Introduction

Reasons for nonresponse and methods to reduce it have been an increasing concern for several decades (Brick, 2013; Tourangeau & Plewes, 2013). Still, there is considerable uncertainty about how nonresponse affects the final estimates. Brick (2013) identifies three major themes in nonresponse research: mechanisms that cause nonresponse, data collection methods to reduce nonresponse, and statistical methods adjusting for nonresponse. The main aim of the latter is to minimize the bias caused by nonresponse, but it cannot be treated in isolation from the other two themes (Groves, 2006; Groves et al., 2000). Aspects such as personal incentives, how the interview or the survey is presented, and the appeal of different survey topics can affect an individual's response propensity differently within and between surveys, while at the same time being correlated with attributes affecting the survey variable (Groves et al., 2000; Fjelkegård & Persson, 2012). This calls for auxiliary information that helps reduce the covariance between the response propensity and the survey variable (Groves, 2006). However, applying this information is challenging and differs between surveys.

In a meta-analysis by Peytcheva and Groves (2009), demographic variables used as auxiliary information in the examined studies display large differences in mean values between respondents and nonrespondents (although these differences were not statistically tested). Brick (2013), however, referring to Peytcheva and Groves (2009), states that demographic variables might not always be effective in reducing nonresponse bias. The statements are not contradictory, but statistical adjustment needs to be considered if there are discrepancies in the distribution (or at least the mean value) of auxiliary variables between respondents and nonrespondents. These somewhat inconsistent findings can be interpreted as a need for new approaches when modeling for nonresponse adjustment.

The aim of this study is to describe the behavior of response propensity by theoretically concretizing these mechanisms with a discrete choice model based on utility theory. The discrete choice is a binary decision to participate or not to participate in a survey, where the choice is assumed to fall on the outcome yielding the highest perceived utility. The theoretical application incorporates a response propensity factor, interpreted as a function of both measurable and latent variables affecting the response rate, which should be able to effectively reduce the covariance between the response propensity and the survey variable (Groves, 2006). Utility theory from both a qualitative statistical perspective (Groves et al., 2000; AAPOR, 2014; Fjelkegård & Persson, 2012) and a microeconometric perspective (Train & McFadden, 1978; Jara-Diaz, 1998; Gärling et al., 1998), as well as nonresponse bias research (e.g. Peytcheva & Groves, 2009; Groves, 2006; Brick, 2013), form the basis for the approach in this study.

Using data from Statistics Sweden's Household Expenditure Survey (HUT), the empirical models display significant results and correspond well to the theoretical model. The model specification including age and income as variables of the response propensity factor seems to be the most appropriate of the models tested, and the Hosmer and Lemeshow test does not indicate any lack of fit to the data. The main conclusion is that illustrating an individual's perceived utility with a discrete choice model seems to be a suitable approach when modeling behaviors affecting response propensity. Even though the application is limited to a single individual's choice to participate in a survey or not, the methodology can serve as a foundation for other and more complex modeling of response propensity in sampling surveys.

Previous research within the field of nonresponse and the applied utility theory is reviewed in the second part of this first section. Section two describes the theoretical nonresponse modeling process. Section three presents the data and explains the data collection process in HUT. This is followed by the modeling of the binary choice model in section four. The models are then tested empirically in section five and discussed in section six with respect to model fit and areas for developing the theoretical approach. Final conclusions are given in the seventh section.

1.1. Previous research

In research papers that treat general adjustment methods, such as Lundström and Särndal (1999), correct auxiliary information is often stated to be required to reduce nonresponse bias, and auxiliary information can do more harm than good if it is defined and used incorrectly. The need for modeling approaches to nonresponse is thus evident, and one such approach is presented by Groves et al. (2000). The authors present the leverage-saliency theory of survey response propensity, with an illustrative concept: an individual has attributes affecting the decision to participate in a survey, where the leverage of each attribute varies between individuals. These attributes can be given different degrees of salience in the survey request, for instance through a strong social purpose described in the survey instructions, or through an interviewer actively advertising a specific reward. Accordingly, a salient attribute can play a significant role in a particular survey and, conversely, have no effect if it is omitted. This is undoubtedly one explanation of why research on survey participation has been hard to replicate, as Groves et al. (2000) note, resulting in few trusted and consistent ways of measuring the effects of different attributes. This argues for individualized (or at least subgroup-specific) survey formatting, in terms of both appearance and message, to increase the response propensity, which, on the other hand, may complicate bias identification and reduction in the estimation process.

Tolonen et al. (2006) review the response rate in the Finnish adult health behavior survey from 1978 to 2002, where gender, age, marital status, and education level are all shown to affect the response rate. Further, the differences between the categories of each variable change little and are moderately consistent over time, which supports the findings, even though the overall response rate declines over time. It is, however, important to highlight that the survey design affects individuals differently, and consequently also differently between surveys (Groves et al., 2006). Even though the Finnish adult health behavior survey cannot represent all cases and survey designs, the results demonstrate that demographic variables are valuable tools when handling nonresponse bias adjustment.

Many studies have examined the relation between nonresponse bias and nonresponse rate from several angles. Peytcheva and Groves (2009) conducted a meta-analysis of 23 studies to examine whether nonresponse bias in demographic variables is related to the nonresponse rate; no strong evidence was found, indicating that nonresponse rates have a small effect on the corresponding biases. One cause of these results can be connected to the conclusion in Groves et al. (2000) that comparing the explanatory power of demographics between studies might not be advisable. It is nevertheless crucial to consider these variables as adjustment tools, as noted above.

Treating time as a factor of response propensity seems reasonable, as answering a survey can stretch over a long period of time. The actual effect on the response rate is, however, hard to assess, mainly because the perceived time to complete a survey does not increase linearly with the actual time (Vercruyssen, Putte, & Stoop, 2011). Consequently, the effect of the time to complete a survey can be assumed to interact with other factors affecting the decision to participate, caused both by the interviewer and the survey design and by individual incentives and demographics. Further, in a single survey the time is (approximately) constant across individuals, but one can still expect the time effort to be perceived differently. However, if the auxiliary information is specified correctly, the included variables can be assumed to capture each individual's perceived time effort, and the effect of the time to complete the survey on the response propensity can then be treated as constant within a single survey.

The theory of social exchange was one of the first recognized approaches adapted to qualitative nonresponse theory and is explained by Don Dillman in his book from 1978 (Fjelkegård & Persson, 2012). The theory assumes an exchange situation where an individual (the respondent) strives to maximize profit and minimize loss. The trade-off can be applied to exchanges other than money for goods, such as the choice to participate or not participate in a survey, where the outcome consequently is a perceived utility rather than a quantitative utility. Social exchange theory can be adapted to a discrete choice model, where the outcome depends on the discrete choice as a function of utility. The foundation of the model was constructed by Train and McFadden (1978) and can be applied to most kinds of decisions between qualitatively different commodities (Jara-Diaz, 1998). The utility is maximized with respect to working time, which is a component of both the goods and the leisure constraint. Depending on the discrete choice, different costs and time requirements affect the utility. The choice in this study is either to respond or not to respond, and the theoretical application is explained in detail in the next section.

## 2. Theoretical framework

In this section, the discrete choice model under the utility function constraints is derived, followed by an interpretation of a threshold equation for the choice to participate in a survey or not.

2.1. Discrete choice model

A discrete choice model based on classic utility theory is applied to quantitatively map the characteristics of response behavior. The discrete choice for each individual is either to respond or not to respond. If an individual decides to respond, the individual obtains a utility of $U_1$, and if the individual decides not to respond, a utility of $U_0$; the alternative yielding the highest utility is selected. The utility functions are quantified by goods consumption $G$, leisure time $L$ (Train & McFadden, 1978; Jara-Diaz, 1998; Gärling et al., 1998), and an added response propensity factor $X$, which can be interpreted as a function of both measurable and latent variables. The approach is summarized as follows:

$$\max_{W}\; U(G, L, X) \tag{1}$$

subject to the constraints

$$G = wW + E + c \tag{2}$$

$$L = \tau - W - t \tag{3}$$

where $w$ is the wage and $W$ is working time, $E$ is income other than wage, and $c$ is economic incentives to respond (e.g. money and lottery tickets). Leisure is a function of the total time available $\tau$, working time, and the time to complete the response $t$. Working time affects goods consumption and leisure time in opposite directions: a higher $W$ implies increased $G$ and decreased $L$. Thus, the trade-off between goods consumption and leisure time depends on the optimal value of $W$. To find the optimal working time, $U$ is maximized with respect to $W$. Let $U$ take the Cobb-Douglas form $U = K G^{1-\beta} L^{\beta} X^{R}$, where $K$ is a constant, $\beta \in [0, 1]$ is a parameter that weights an individual's preferences between $G$ and $L$, and $R$ is equal to 1 if the individual responds and 0 if not. Setting the derivative of $\ln U$ with respect to $W$ to zero gives the first-order condition $(1-\beta)w/G = \beta/L$, and solving for $W$ yields the optimal working time $W^*$¹:

$$W^{*} = (\tau - t)(1 - \beta) - \beta\,\frac{E + c}{w} \tag{4}$$

As seen in equation (4), $W^*$ is a function of $\tau - t$, $E + c$, $w$, and $\beta$. Substituting $G(W^*) = (1-\beta)\left[w(\tau - t) + E + c\right]$ and $L(W^*) = \beta\left[w(\tau - t) + E + c\right]/w$ for $G$ and $L$ in the utility function gives the maximized utility $U^*$, given $W^*$ and the constraints on $G$ and $L$:

$$U^{*} = K(1-\beta)^{1-\beta}\beta^{\beta}w^{-\beta}\left[w(\tau - t) + (E + c)\right]X^{R} \tag{5}$$
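As a numerical sanity check of equations (4) and (5), the sketch below maximizes the Cobb-Douglas utility over $W$ by golden-section search and compares the result with the closed form. All parameter values ($\beta = 0.6$, $w = 100$, $E = 1000$, $c = 0$, $\tau = 336$, $t = 2$, $K = X = 1$) are hypothetical and chosen only so that the optimum is interior; nothing here comes from HUT, and the helper names are illustrative.

```python
import math

# Hypothetical parameter values, chosen only for illustration (not HUT data).
K, beta, X, R = 1.0, 0.6, 1.0, 1
w, E, c = 100.0, 1000.0, 0.0   # wage, other income, incentive to respond
tau, t = 336.0, 2.0            # total time available, time to complete the response


def utility(W):
    """Cobb-Douglas utility U = K G^(1-beta) L^beta X^R under constraints (2) and (3)."""
    G = w * W + E + c
    L = tau - W - t
    return K * G ** (1 - beta) * L ** beta * X ** R


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - ratio * (b - a)
        x2 = a + ratio * (b - a)
        if f(x1) < f(x2):
            a = x1  # the maximum lies in [x1, b]
        else:
            b = x2  # the maximum lies in [a, x2]
    return (a + b) / 2


# Closed-form optimum from equation (4) and maximized utility from equation (5).
W_star = (1 - beta) * (tau - t) - beta * (E + c) / w
U_star = (K * (1 - beta) ** (1 - beta) * beta ** beta * w ** (-beta)
          * (w * (tau - t) + E + c) * X ** R)

W_numeric = golden_section_max(utility, 0.0, tau - t - 1e-9)
print(f"W* (closed form) = {W_star:.4f}, W (numerical) = {W_numeric:.4f}")
print(f"U* (eq. 5) = {U_star:.6f}, U(W*) = {utility(W_star):.6f}")
```

With these values, the closed form gives $W^* = 0.4 \cdot 334 - 0.6 \cdot 10 = 127.6$, and the numerical optimum should agree up to the search tolerance, since $\ln U$ is concave in $W$.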

Further, the individual's choice to respond or not to respond yields two different utility functions, $U_1^*$ and $U_0^*$:

$$U_1^{*} = K(1-\beta)^{1-\beta}\beta^{\beta}w^{-\beta}\left[w(\tau - t) + (E + c)\right]X, \quad R = 1 \tag{6}$$

$$U_0^{*} = K(1-\beta)^{1-\beta}\beta^{\beta}w^{-\beta}\left[w\tau + E\right], \quad R = 0 \tag{7}$$

The distinction between the functions clarifies two things: first, the time to complete the response, $t$, and the incentive to respond, $c$, make no contribution to $U_0^*$ (i.e. $t = c = 0$), and second, the response propensity factor has no direct effect on $U_0^*$, since $X^{R} = X^{0} = 1$ when $R = 0$. It is, however, important to stress that even when an individual chooses not to respond, the response propensity is affected by $X$, as equation (9) below shows. As an example connected to the leverage-saliency theory, social interest in the survey or in the survey results probably has a positive effect on the propensity to respond (Groves, Singer & Corning, 2000; Fjelkegård & Persson, 2012). In this regard, it is also important to note that the utility is an individual's perceived utility of responding or not responding to a survey. Further, an individual will choose to participate in the survey if $U_1^*$ is greater than $U_0^*$:

$$U_1^{*} > U_0^{*} \tag{8}$$

Dividing both sides of equation (8) by the common positive factor $K(1-\beta)^{1-\beta}\beta^{\beta}w^{-\beta}$, rearranging, and setting the threshold to zero makes the interpretation of the factors affecting the response propensity in a practical situation fairly convenient:

$$w\left[X(\tau - t) - \tau\right] + \left[X(E + c) - E\right] > 0 \tag{9}$$
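The step from (8) to (9) only divides both sides by a positive common factor, so the two conditions should classify every individual in the same way. The following sketch checks this on random draws; the parameter ranges, the seed, and $\beta = 0.5$ are arbitrary assumptions used only for the check, and the function names are illustrative.

```python
import random


def u1_star(w, E, c, tau, t, X, K=1.0, beta=0.5):
    """Equation (6): maximized utility when responding (R = 1)."""
    return K * (1 - beta) ** (1 - beta) * beta ** beta * w ** (-beta) * (w * (tau - t) + E + c) * X


def u0_star(w, E, tau, K=1.0, beta=0.5):
    """Equation (7): maximized utility when not responding (t = c = 0 and X^0 = 1)."""
    return K * (1 - beta) ** (1 - beta) * beta ** beta * w ** (-beta) * (w * tau + E)


def lhs_eq9(w, E, c, tau, t, X):
    """Left-hand side of threshold equation (9); the individual responds if it is positive."""
    return w * (X * (tau - t) - tau) + (X * (E + c) - E)


rng = random.Random(2019)  # fixed seed so the check is reproducible
n_draws, n_agree = 10_000, 0
for _ in range(n_draws):
    # Hypothetical parameter ranges with w > 0, as assumed for equations (6)-(9).
    w, E = rng.uniform(1, 500), rng.uniform(0, 10_000)
    c, t, X = rng.choice([0.0, 100.0]), rng.uniform(0, 10), rng.uniform(0.5, 1.5)
    tau = 336.0
    by_eq8 = u1_star(w, E, c, tau, t, X) > u0_star(w, E, tau)
    by_eq9 = lhs_eq9(w, E, c, tau, t, X) > 0
    n_agree += by_eq8 == by_eq9
print(f"(8) and (9) agree in {n_agree} of {n_draws} draws")
```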
Assuming that the response propensity factor is neutral ($X = 1$), and that $\tau - t$ and $c$ are constant across all individuals, condition (9) reduces to $c - wt > 0$, so wage has a negative effect on survey participation. This implies that $c$ needs to be high. Since economic incentives are frequently zero, other income (e.g. interest income or welfare) would be needed to compensate for $w$, which according to (9) requires $X > 1$, because the terms in $E$ cancel when $X = 1$; a low $t$ requires less compensation. In some cases, both $E$ and $c$ will be equal to zero and have no effect on the utility of responding, which moreover implies that wage does not influence the sign of the outcome, whereas the magnitude of $X$ determines whether the outcome is positive or negative. As this shows, the outcome depends on the response propensity factor $X$ and its contribution to the relation between wage and other income, where $X$ contributes positively if it is positive². Further, equation (9) displays main effects of $w$ and $E$ as linear functions, whereas $X$ can be interpreted as an interaction with time, wage, economic incentives, and other income. Hence, an optimistic attitude towards survey participation will decrease the perceived time effort, indicated by an increased response propensity, affecting both terms on the left-hand side positively.

Wage affects the response propensity in a continuous manner: since $w$ cannot be less than zero, the wage term in (9) moves further from zero as $w$ increases. Artsev et al. (2008) conclude empirically, using data from the Israeli household expenditure survey (HET), that nonresponse in the survey of family expenditure is a decreasing convex function of income (i.e., response is an increasing concave function of income). This suggests including a quadratic term in wage in the estimation, in line with the structure of the threshold equation. If wage, however, is equal to zero, which is not too rare, working time is equal to zero as well³. An individual's utility then no longer depends on the trade-off between the two constraints $G$ and $L$, since $w$ and $W$ are omitted⁴, which implies that the maximization problem does not need to be solved.

² $X$ can, by definition, be negative, but a negative $X$ always implies a negative outcome and is therefore not discussed in this theoretical application.

³ Volunteer work and other kinds of unpaid work are assumed to be counted as leisure time.

⁴ Thus, when wage is equal to zero, $G = E + c$ and $L = \tau - t$.

Due to these different preconditions, the utilities of responding and not responding when wage is equal to zero need to be added to equation (9). Thus, by directly substituting $G$ and $L$ into the utility function, taking the same Cobb-Douglas form as above, the utilities of responding and not responding when wage is equal to zero take the following forms:

$$U_{1,w=0} = K(E + c)^{1-\beta}(\tau - t)^{\beta}X, \quad R = 1 \tag{10}$$

$$U_{0,w=0} = K E^{1-\beta}\tau^{\beta}, \quad R = 0 \tag{11}$$

In the same manner as in equation (8), an individual will choose to participate in the survey if $U_{1,w=0}$ is greater than $U_{0,w=0}$. Dividing both sides of this condition by $K E^{1-\beta}\tau^{\beta}$ (assuming $E > 0$) and rearranging so that the right-hand side equals zero gives:

$$X - \frac{1}{\left(1 + \frac{c}{E}\right)^{1-\beta}\left(1 - \frac{t}{\tau}\right)^{\beta}} > 0 \tag{12}$$
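To see how small the second term of (12) is in practice, the sketch below evaluates it for a few values of $\beta$. The inputs (no incentive and a two-hour response time out of a 336-hour period) are hypothetical illustrations, not figures from HUT, and `m_term` is an illustrative name.

```python
def m_term(beta, c_over_E, t_over_tau):
    """Second term of equation (12): 1 / ((1 + c/E)^(1 - beta) * (1 - t/tau)^beta)."""
    return 1.0 / ((1.0 + c_over_E) ** (1.0 - beta) * (1.0 - t_over_tau) ** beta)


# Hypothetical case: no incentive (c = 0) and a two-hour response time
# out of a 336-hour (two-week) period.
for beta in (0.2, 0.5, 0.8):
    print(f"beta = {beta}: M = {m_term(beta, 0.0, 2 / 336):.5f}")
```

Under these assumptions the term stays just above one and increases with $\beta$, consistent with individuals with a higher $\beta$ needing a larger $X$ to respond, while the differences are small enough for the term to be treated as a constant $M$. A positive incentive ($c > 0$) lowers the term.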

As seen in equation (12), the choice depends on the individual's fixed value of $\beta$. An individual with $\beta$ higher than 0.5, i.e. one who values leisure more than goods, will choose not to participate in a survey more often than an individual with $\beta$ lower than 0.5, ceteris paribus. However, the value of $\beta$ becomes less important when the ratios $c/E$ and $t/\tau$ are close to zero⁵. In the frequent case when $c$ is zero, the first factor in the denominator is equal to one, whereas the time to complete a survey relative to the total time available can be assumed to be small. This makes the second term on the left-hand side close to one across all individuals, so it can be treated as a constant. Denoting this constant $M$ and integrating the case when wage is equal to zero into equation (9), the following threshold equation for an individual to respond is obtained:

$$w\big(X(\tau - t) - \tau\big) + \mathbf{1}[w > 0]\big(X(E + c) - E\big) + \mathbf{1}[w = 0](X - M) > 0 \tag{13}$$
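Both the case when wage is equal to zero and the case when it is greater than zero are included in equation (13), and by denoting

$$d = \begin{cases} 1, & w = 0 \\ 0, & w > 0 \end{cases} \qquad \text{and} \qquad \bar{d} = \begin{cases} 1, & w > 0 \\ 0, & w = 0 \end{cases}$$

(13) can be written as: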

$$X\left[w(\tau - t) + \bar{d}(E + c) + d\right] - \left[w\tau + \bar{d}E\right] - dM > 0 \tag{14}$$

⁵ See Figure A1 in the Appendix for an illustration of the right term on the left-hand side (denoted M) with different

The interpretation of (14) is the same as that of (9), with the added element that the response propensity factor needs to be larger than $M$ when wage is equal to zero for an individual to respond. Equation (14) will be the foundation for modeling the binary outcome of responding or not responding to a survey in section four.
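Threshold equation (14) can be written as a small decision rule. The function below is a sketch of that rule, not code from the thesis; the function name and the argument values in the example, including $M = 1.003$, are hypothetical.

```python
def responds(w, E, c, tau, t, X, M):
    """Threshold equation (14): True if the individual is predicted to respond.

    Assumes w >= 0. The indicator d equals 1 when w == 0, and d_bar = 1 - d.
    """
    d = 1.0 if w == 0 else 0.0
    d_bar = 1.0 - d
    lhs = X * (w * (tau - t) + d_bar * (E + c) + d) - (w * tau + d_bar * E) - d * M
    return lhs > 0


# Hypothetical individuals: tau = 336 hours, t = 2 hours, no incentive, M = 1.003.
print(responds(w=100.0, E=1000.0, c=0.0, tau=336.0, t=2.0, X=1.0, M=1.003))   # False: lhs = -w*t
print(responds(w=100.0, E=1000.0, c=0.0, tau=336.0, t=2.0, X=1.1, M=1.003))   # True
print(responds(w=0.0, E=1000.0, c=0.0, tau=336.0, t=2.0, X=1.01, M=1.003))    # True: X > M
```

For $w > 0$ the rule reduces to (9), and for $w = 0$ it reduces to $X > M$, so $E$ plays no role when wage is zero.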

2.2. Response propensity factor

The variables in the response propensity factor should be valid as interactions with the variables in the utility model and suitable for reducing the covariance between the response propensity and the survey variable (Groves, 2006). $X$ must, therefore, affect both the response propensity and the survey variable. Variables assumed to have these properties, and usable in the empirical application to HUT, are briefly discussed in this paragraph. Differences between men and women, as well as between individuals with and without children, are to be expected in terms of overall time pressure (Mattingly & Sayer, 2006), which in turn might negatively affect the perceived utility of answering a survey, due to less leisure time. Further, gender, age, and education level all seem to affect response propensity and therefore need to be controlled for, as concluded by, for example, Tolonen et al. (2006), and they might additionally capture underlying attributes affecting the response propensity. As previously mentioned, Artsev et al. (2008) concluded that the response rate is a function of income, so income could also be included when modeling response propensity in the HUT survey. The variables mentioned above are candidates for inclusion in a final empirical model to reduce the nonresponse bias. The model of the response propensity factor is presented together with the model of the binary outcome, into which it is integrated, as equation (14) emphasizes.

## 3. Data

The empirical study is based on Statistics Sweden's data from HUT 2007. In this section, the collection method and the specification of the HUT data are explained. The variables included in the models presented later are also described.

3.1. Household Expenditure Survey (HUT)

Statistics Sweden's survey HUT serves to shed light on expenditure on goods and services for different reporting groups (different combinations of individuals living in the household) (Fridlund Karlsson, 2008). HUT is a sample survey with data collected in a continuous twelve-month process, where each randomly selected household participates over a four-week period (with two weeks of data collection). The survey follows a collection protocol where each participant is both interviewed about the household's larger expenditures and purchases, and capital goods over the last twelve months, and asked to keep a cash book⁶.

⁶ Each participant should keep track of all household purchases in a cash book, or save receipts, over a 14-day period. A household's larger expenditures and purchases include, for example, bought and sold furniture, telephone costs, and insurance costs. The exact information collected by Statistics Sweden can be found in Fridlund Karlsson (2008).

The sample consists of 4000 households, distributed uniformly by week over the twelve-month process, yielding 52 equally sized subsamples (Fridlund Karlsson, 2008). The sampling frame is the Swedish Total Population Register (RTB), and at least one individual resident in each household needs to be aged 0-79 years for the household to have a nonzero probability of being selected. There are ten different household groups in the survey, of which Single with children and Single without children are used in this study, where a child is an individual aged 0-18 years living in the household. Only these two household groups are used because of the inherent assumption of individual utility in the theoretical approach. Household groups that include cohabiting adults (more than one individual over 18 years old) can be assumed to have a combined utility of answering a survey, made up of two or more individual utilities, and are thereby more challenging to model and interpret. Consequently, these groups are not supported by the theoretical approach in this study. This should not be seen as a drawback, but rather as a scaled-down problem with a more approachable starting point for this new approach to modeling and examining response propensity. Households in which an individual aged 19-32 cohabits with an individual younger than 19 are omitted when the age difference is less than 13 years, i.e., an individual needs to be 13 years or older when having a child.

3.2. Variables

The Swedish household expenditure data from 2007 contain 1 181 individuals living as singles, of whom 490 participated in the survey. Table 1 below presents descriptive statistics of the included variables for all 1 181 individuals in this study. Auxiliary information on all individuals is available from Statistics Sweden's registers.
*Table 1. Descriptive statistics of data from HUT 2007⁷.*

| Variable (N = 1 181) | Median | Mean | SD | Min. | Max. |
|---|---|---|---|---|---|
| R | 0 | 0.41 | 0.49 | 0 | 1 |
| gender | 1 | 0.53 | 0.50 | 0 | 1 |
| age | 43 | 45.81 | 17.15 | 19 | 79 |
| nchildren | 0 | 0.36 | 0.78 | 0 | 5 |
| inc | 6 477 | 7 276 | 5 081 | -2 806 | 74 533 |
| w | 62.05 | 85.5 | 92.65 | 0 | 586.59 |
| E | 2 573 | 4 206 | 6 378 | 0 | 84 495 |
| d | 0 | 0.35 | 0.48 | 0 | 1 |
| H | 25 126 | 32 933 | 29 935 | 0 | 210 321 |
| Hc | 25 016 | 32 762 | 29 754 | 0 | 209 572 |
| gender(Hc+d) | 4 129 | 16 163 | 24 571 | 0 | 168 969 |
| age(Hc+d) | 962 919 | 1 384 248 | 1 339 360 | 22 | 10 404 516 |
| nchildren(Hc+d) | 0 | 14 394 | 44 247 | 0 | 838 287 |
| inc(Hc+d) | 1.64E+08 | 3.42E+08 | 6.59E+08 | -228 100 | 1.56E+10 |

In this study, age, gender, nchildren, and inc are used as variables in the response propensity factor. As seen in the table, 35 percent of the individuals have zero wage, and a few individuals (eight of them) have a negative income. Negative values of other income, $E$, are truncated at zero, whereas the two-week disposable income, inc, can be negative. The variables H and Hc are generated to simplify the empirical model presented in section four and correspond to terms in (14), where Hc is $w(\tau - t) + \bar{d}(E + c)$ and H is $w\tau + \bar{d}E$. Since the economic incentive $c$ is zero in HUT, the time to complete the response, $t$, is what makes Hc slightly lower than H. Further, each of the four variables included in the response propensity factor is multiplied by Hc + d, following the specification of equation (14).
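The construction of H, Hc, d, and the interaction terms can be sketched as below. The `Person` record, the function name `design_row`, and the values $\tau = 336$ hours and $t = 2$ hours are hypothetical choices for illustration, since this section does not state how $\tau$ and $t$ were measured in HUT.

```python
from dataclasses import dataclass

# Hypothetical constants: the values used in the thesis are not stated here.
TAU = 336.0  # hours in the two-week data collection period (assumption)
T = 2.0      # hours needed to complete the survey (assumption)
C = 0.0      # economic incentive; zero in HUT


@dataclass
class Person:
    age: int
    gender: int
    nchildren: int
    inc: float  # two-week disposable income, may be negative
    w: float    # hourly wage
    E: float    # other income


def design_row(p: Person) -> dict:
    """Build H, Hc, d and the (Hc + d) interactions used in equation (14)."""
    E = max(p.E, 0.0)  # negative other income is truncated at zero
    d = 1.0 if p.w == 0 else 0.0
    d_bar = 1.0 - d
    Hc = p.w * (TAU - T) + d_bar * (E + C)
    H = p.w * TAU + d_bar * E
    row = {"Hc": Hc, "H": H, "d": d}
    for name in ("age", "gender", "nchildren", "inc"):
        row[f"{name}(Hc+d)"] = getattr(p, name) * (Hc + d)
    return row


# Two hypothetical individuals: a wage earner and a person with zero wage.
print(design_row(Person(age=45, gender=1, nchildren=0, inc=7276.0, w=100.0, E=1000.0)))
print(design_row(Person(age=30, gender=0, nchildren=2, inc=5000.0, w=0.0, E=2500.0)))
```

For the zero-wage individual, Hc = H = 0 and Hc + d = 1, so each interaction term equals the raw variable, which matches the minimum of 22 for age(Hc+d) being an ordinary age in Table 1.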

## 4. Modeling

Both the binary outcome model and the model of the response propensity factor are presented in this section. In the empirical process, a probit model is used to estimate the response propensity, i.e. the probability that an individual responds.

The response propensity factor is modeled as a multiple linear regression on a set of variables that explain both the response propensity and the survey variable, as follows:

$$X = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k \tag{15}$$
In equation (15), there are $k$ explanatory variables with $k$ corresponding parameters $\beta$ and an intercept $\beta_0$. By substituting $X$ into a probit model taking the shape of equation (14), the probability to respond, $R = 1$, can be modeled as:

$$P(R = 1) = \Phi\left(\alpha + \beta_0 H_c + \beta_0^{*} d + \beta_1 X_1[H_c + d] + \cdots + \beta_k X_k[H_c + d] + \beta_{k+1} H\right) \tag{16}$$

The above equation is a general probit model of the theoretical equation (14)⁸. From model (16), three models are constructed for empirical testing. Model (1) includes age, gender, number of children, and disposable income as variables of the response propensity factor. In addition, all four of these variables are included as main effects in model (1), since this model serves as a comparison to models (2) and (3), which follow the theoretical formulation. In these two models, all main effects from the response propensity factor included in model (1) are omitted, and nchildren(Hc+d) and gender(Hc+d) are further excluded from model (3).

To evaluate the goodness of fit, the models are compared with the likelihood ratio test (LR test). Models (2) and (3) are both nested in model (1), so two LR tests are performed. The Hosmer and Lemeshow test is also conducted to evaluate how well the predicted probabilities correspond to the observed values of R.

⁸ The derivation of equation (14) into the general probit model expression in (16) is shown in the Appendix.
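The Hosmer and Lemeshow test compares observed and expected numbers of respondents within groups formed by the predicted probabilities. The sketch below implements one common variant (observations sorted by predicted probability and split into groups of nearly equal size, with $g - 2$ degrees of freedom); the thesis does not state which grouping or software was used, and the toy data and helper names in the example are hypothetical. The chi-square tail probability is computed from the series for the regularized lower incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variable, via the regularized gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:  # series for the lower incomplete gamma function
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2).

    Observations are sorted by predicted probability and split into groups of
    nearly equal size; each group is assumed to have 0 < expected < group size.
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        stat += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return stat, chi2_sf(stat, groups - 2)


# Hypothetical toy data: observed responses y and fitted probabilities p.
y = [0, 0, 0, 1, 1, 1, 1, 1]
p = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
stat, pval = hosmer_lemeshow(y, p, groups=4)
print(f"HL = {stat:.4f}, df = 2, p = {pval:.4f}")
```

A large p-value, as for model (3) in the results, means the observed and expected counts do not differ more than chance would suggest.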

## 5. Results

The estimated parameters for all three models are presented in this section, together with the LR tests and the Hosmer and Lemeshow test.

*Table 2. Results of three probit model estimations, LR tests, and Hosmer and Lemeshow test.*

| | (1) | (2) | (3) |
|---|---|---|---|
| (Intercept) | -0.6376\*\*\* | -0.4370\*\*\* | -0.4090\*\*\* |
| Hc | 0.01002\* | 0.00539\*\*\* | 0.00520\*\*\* |
| d | -0.3658\*\*\* | -0.2612\*\* | -0.2791\*\* |
| age(Hc+d) | 2.765E-07\*\* | 3.573E-07\*\*\* | 3.636E-07\*\*\* |
| gender(Hc+d) | -2.273E-06 | 1.252E-06 | |
| nchildren(Hc+d) | 1.687E-06 | 1.113E-06 | |
| inc(Hc+d) | -3.662E-10\*\* | -3.379E-10\*\* | -2.811E-10\*\* |
| H | -0.00996\* | -0.00536\*\*\* | -0.00518\*\*\* |
| age | 0.0036 | | |
| gender | 0.2314\* | | |
| nchildren | -0.0350 | | |
| inc | -4.599E-05 | | |
| N | 1181 | 1181 | 1181 |
| Log-likelihood (parameters) | -766.8022 (12) | -770.3618 (8) | -771.3858 (6) |
| P-value LR test | - | 0.1297 | 0.1644 |
| P-value Hosmer and Lemeshow test | - | - | 0.3775 |

*Note: The significance levels are denoted as: \*\*\* = 0.01, \*\* = 0.05, \* = 0.1. P-values for the LR test are for models (2) and (3), each compared with model (1).*

Model (1) displays several insignificant parameters, three of which are the added main effects age, nchildren, and inc from the response propensity factor. Comparing the signs of the estimated parameters with equation (14), Hc (positive), H (negative), and d (negative) all agree with the theory in all three models. In model (2), where the main effects of X are removed, the estimates are overall more significant than in model (1), except for d. The estimates for models (2) and (3) are fairly similar, and excluding the insignificant parameters gender(Hc+d) and nchildren(Hc+d) does not seem to affect the outcome, suggesting that these variables have no effect on the response propensity. Thus, despite the significant estimated intercept, model (3) matches equation (14) well. The scatter plot of fitted values from model (3) against disposable income (Figure A3 in the Appendix) shows an increasing concave pattern, consistent with the empirical results of Artsev et al. (2008).
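Given estimates, equation (16) turns into a fitted response probability. The sketch below plugs the rounded model (3) estimates from Table 2 into the probit index for a hypothetical individual; the helper names are illustrative. Because Hc and H enter with coefficients of nearly equal size and opposite sign, fitted values for individuals with positive wage are sensitive to the rounding of the published estimates, so the example uses a zero-wage individual.

```python
import math

# Point estimates for model (3), as reported (rounded) in Table 2.
COEF = {
    "intercept": -0.4090,
    "Hc": 0.00520,
    "d": -0.2791,
    "age(Hc+d)": 3.636e-07,
    "inc(Hc+d)": -2.811e-10,
    "H": -0.00518,
}


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def fitted_probability(Hc, H, d, age, inc):
    """P(R = 1) = Phi(linear index) for model (3), following equation (16)."""
    index = (COEF["intercept"]
             + COEF["Hc"] * Hc
             + COEF["d"] * d
             + COEF["age(Hc+d)"] * age * (Hc + d)
             + COEF["inc(Hc+d)"] * inc * (Hc + d)
             + COEF["H"] * H)
    return std_normal_cdf(index)


# Hypothetical individual with zero wage (Hc = H = 0, d = 1), aged 45,
# with a two-week disposable income of 7 276 (the sample mean in Table 1).
p_zero_wage = fitted_probability(Hc=0.0, H=0.0, d=1.0, age=45, inc=7276.0)
print(f"Fitted response probability: {p_zero_wage:.4f}")
```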

The results of the LR test show that model (1) does not fit the data significantly better than either model (2) or model (3), i.e., removing the main effects age, nchildren, gender, and inc does not significantly worsen the fit. By specifying X “more correctly” (as the results indicate), entering only age(Hc+d) and inc(Hc+d), the LR test p-value for model (3) (0.1644) is even higher than that for model (2) (0.1297). The p-value of the Hosmer and Lemeshow test (0.3775) suggests that model (3) fits the data adequately: the null hypothesis that the observed response rates agree with the fitted values cannot be rejected.
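The LR test p-values in Table 2 can be reproduced from the reported log-likelihoods and parameter counts: the statistic is twice the difference in log-likelihoods, compared with a chi-square distribution whose degrees of freedom equal the difference in the number of parameters (4 and 6 here). Since both are even, the sketch below uses the closed-form chi-square tail for even degrees of freedom instead of a statistics library; the helper names are illustrative.

```python
import math


def chi2_sf_even(x, df):
    """Chi-square upper tail for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Log-likelihoods and numbers of parameters, as reported in Table 2.
LOGLIK = {1: (-766.8022, 12), 2: (-770.3618, 8), 3: (-771.3858, 6)}


def lr_test(full, restricted):
    """LR statistic 2 * (ll_full - ll_restricted) and its chi-square p-value."""
    ll_full, k_full = LOGLIK[full]
    ll_restr, k_restr = LOGLIK[restricted]
    stat = 2.0 * (ll_full - ll_restr)
    return stat, chi2_sf_even(stat, k_full - k_restr)


for model in (2, 3):
    stat, p_value = lr_test(1, model)
    print(f"model ({model}) vs model (1): LR = {stat:.4f}, p = {p_value:.4f}")
```

Running it gives LR statistics of 7.1192 and 9.1672 with p-values of about 0.1297 and 0.1644, matching Table 2.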

## 6. Discussion

The similarities between the empirical models, especially model (3), and the theoretical equation (14) suggest that using a utility function with discrete choice modeling might be a suitable approach for adjusting for nonresponse. From a modeling perspective, the main drawback is the significant intercept, which is not a component of (14). This could be a shortcoming of the model, but it could also stem from an error in variable interpretation or from attributes missing from the response propensity factor. An example of this is education, which could be one explanation, as the response rate can differ between levels of education (Tolonen et al., 2006), and education can affect the survey variable, household expenditure, through its correlation with income.

How the variables are measured is also a subject of discussion, for example the truncation of other income ($E$) and whether it should be allowed to be negative. Other income, such as capital interest, can be negative, as can liabilities, but it is here treated as at least zero, while disposable income is allowed to be negative. The negative values of $E$ are, however, both few and small and will not affect the final estimates to a great extent in this empirical study, but whether truncation at zero is appropriate should be investigated further. Another consideration is the wage measure, since some individuals in the population, and thereby in the sample, work part-time, which is not taken into account in this study due to missing information. If, for example, wage affects the response propensity positively, which is possibly the case under the survey circumstances, an overstated hourly wage can lead to an underestimation of the positive effect on the participation probability, ceteris paribus.

In the HUT survey, the time required to respond can be set relatively easily, since each participant has to keep a cash book and be interviewed by Statistics Sweden during a two-week period, and the time to complete the survey is small relative to this period, as displayed in Figure A1 in the Appendix. The relatively short accumulated time to complete the survey within the two-week participation period is nevertheless controlled for, which explains the small difference between the estimated parameters of H and Hc in all models. In other sampling surveys, the time to complete the survey relative to the total time might be harder to model and evaluate, and the assumption that M is constant across all individuals might not hold. Further, Vercruyssen et al. (2011), with support from Mattingly and Sayer (2006) (who argue that free time has the potential to reduce time pressure), conclude that less free time, measured with objective indicators of free time on weekdays and weekends, increases nonresponse for the reasons "too busy" and "have no time". Thus, the actual time to complete the survey may be neither quantifiable nor a good indicator and could be treated as constant across individuals, whereas the perceived time effort, quantified by, for example, free time, could serve as an indirect measure. The variables included in the model could, however, be assumed to capture this effect, but more research is needed on this subject.

A factor that is ignored in the model is that some of the children of the individuals in the sample also have an income, which is not included in disposable income in the model. One could argue that it should be included, since this factor affects the survey variable. However, whether and how children's income should be incorporated needs further evaluation, since the utility of answering the survey is assumed to concern the single individual's choice to respond or not, which might be independent of their children's income. A further consideration is whether the age of the children should be included as a factor, since different age groups might demand more work and time from the parent, so that both actual and perceived time could be affected differently.

As the theoretical process of the discrete choice model has been the main focus of this study, the response propensity factor is simply modeled as a linear function of the explanatory variables. Even though the linear specification with the included variables fits the data well, the response propensity factor could also be modeled in other ways. This is one of the next steps in improving the model fit, both for general and for survey-specific nonresponse adjustment, together with reasonable auxiliary variables that fulfill the assumption of correlation with both the attributes affecting the response rate and the survey variable, in order to reduce estimation bias.
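To make the linear specification concrete, the sketch below evaluates a probit response probability in which the linear response propensity factor X = β₀ + β₁X₁ + ⋯ + βₖXₖ enters multiplied by (Hc + d), alongside separate terms for H and d, that is P(R = 1) = Φ(α + X[Hc + d] + βₖ₊₁H + βₖ₊₂d). The function and argument names are illustrative assumptions. In practice the coefficients would come from fitting a probit model (for example with statsmodels' `Probit`) to the regressors Hc + d, Xⱼ(Hc + d), H and d; no estimates from the thesis are used here.

```python
import math
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def response_probability(alpha: float, b0: float, b: Sequence[float], b_h: float, b_d: float,
                         x: Sequence[float], hc: float, h: float, d: float) -> float:
    """P(R = 1) = Phi(alpha + X * (Hc + d) + b_h * H + b_d * d),
    with a linear response propensity factor X = b0 + sum_j b[j] * x[j]."""
    if len(b) != len(x):
        raise ValueError("need exactly one coefficient per explanatory variable")
    propensity_factor = b0 + sum(bj * xj for bj, xj in zip(b, x))  # linear X
    index = alpha + propensity_factor * (hc + d) + b_h * h + b_d * d
    return std_normal_cdf(index)
```

Replacing the single `propensity_factor` line with another functional form is one simple way to try the alternative specifications mentioned above while keeping the rest of the model unchanged.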

## 7. Conclusions

The discrete choice model includes factors that, as is often the case, have no given correct specification; in addition, X and t are both challenging and crucial to determine. Nevertheless, this new theoretical approach and framework seems suitable for concretizing and assessing the factors affecting individual response propensity in sampling surveys. The significant results and the strong similarities between the empirical results and the theoretical model demonstrate a practical fit and encourage further application in nonresponse adjustment. The model is restricted to a single individual's utility but can be used as a foundation for other and more sophisticated modeling of response propensity.

## References

AAPOR. 2014. Current Knowledge and Considerations Regarding Survey Refusals. AAPOR Task Force on Survey Refusals.

Artsev, Y., Yitzhaki, S., Schechtman, E. 2008. Who Does Not Respond in the Household Expenditure Survey. Journal of Business & Economic Statistics. 26 (3), p. 329–344.

Brick, J. M. 2013. Unit Nonresponse and Weighting Adjustments: A Critical Review. Journal of Official Statistics. 29 (3), p. 329–353.

Dillman, D. 1978. Mail and Telephone Surveys: The Total Design Method. New Jersey: John Wiley & Sons, Inc.

Fjelkegård, L., Persson, A. 2012. Varför bortfall? ur urvalspersonernas perspektiv (främst) [Why nonresponse? Mainly from the sampled persons' perspective]. Internal report from Statistics Sweden. Unpublished.

Fridlund Karlsson, Å. 2008. Hushållens Utgifter (HUT) 2007 [Household Expenditures (HUT) 2007]. Official statistical report produced by Statistics Sweden 2008-06-04. https://www.scb.se/he0201 [collected 2019-11-29].

Groves, R.M., Singer, E., Corning, A. 2000. Leverage-Saliency Theory of Survey Participation: Description and an Illustration. Public Opinion Quarterly. 64 (3), p. 299–308.

Groves, R.M. 2006. Nonresponse Rates and Nonresponse Bias in Household Surveys. Public Opinion Quarterly. 70 (5), p. 646–675.

Gärling, T., Laitila, T., Westin, K. 1998. Theoretical Foundations of Choice Modeling. Oxford: Elsevier Science Ltd.

Jara-Diaz, S.R. 1998. Time and Income in Travel Demand: Towards a Microeconomic Activity Framework. Universidad de Chile.

Lundström, S., Särndal, C-E. 1999. Calibration as a Standard Method for Treatment of Nonresponse. Journal of Official Statistics. 15 (2), p. 305–327.

Mattingly, M.J., Sayer, L.C. 2006. Under Pressure: Gender Differences in the Relationship Between Free Time and Feeling Rushed. Journal of Marriage and Family. 68 (1), p. 205–221.

McFadden, D., Train, K. 1978. The Goods/Leisure Tradeoff and Disaggregate Work Trip Mode Choice Models. Transportation Research. 12 (5), p. 349–353.

Peytcheva, E., Groves, R.M. 2009. Using Variation in Response Rates of Demographic Subgroups as Evidence of Nonresponse Bias in Survey Estimates. Journal of Official Statistics. 25, p. 193–201.

Tolonen, H., Helakorpi, S., Talala, K., Helasoja, V., Martelin, T., Prättälä, R. 2006. 25-Year Trends and Socio-Demographic Differences in Response Rates: Finnish Adult Health Behaviour Survey. European Journal of Epidemiology. 21, p. 409–415.

Tourangeau, R., Plewes, T. J. 2013. Nonresponse in Social Science Surveys: A Research Agenda. Washington, D.C.: The National Academies Press. ISBN: 978-0-309-27247-6.

Vercruyssen, A., Van de Putte, B., Stoop, I.A.L. 2011. Are They Really Too Busy for Survey Participation? The Evolution of Busyness and Busyness Claims in Flanders. Journal of Official Statistics. 27 (4), p. 619–632.

## Appendix

*Figure A1. Scatter plot of M and 𝛽 with eleven different values of 𝛽 between 0 and 1.*

*Note:* $M = \dfrac{1}{(1 + c/E)^{1-\beta}(1 - t/\tau)^{\beta}}$, where t = 4 (estimated total time to respond, in hours), 𝜏 = 336 (two weeks in hours), and c = 0.
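The values behind Figure A1 can be reproduced with a few lines. The sketch assumes $M = 1/[(1 + c/E)^{1-\beta}(1 - t/\tau)^{\beta}]$, which is our reading of the note and agrees with comparing the utility of responding and not responding for an individual without wage income; the function name and the placeholder value of E are hypothetical. With c = 0 the income factor equals one, so M depends only on 𝛽, t and 𝜏.

```python
def threshold_m(beta: float, t: float = 4.0, tau: float = 336.0,
                c: float = 0.0, e: float = 1.0) -> float:
    """M = 1 / ((1 + c/E)^(1 - beta) * (1 - t/tau)^beta).

    Defaults follow the Figure A1 note: t = 4 hours, tau = 336 hours, c = 0.
    e (other income E) is a placeholder: it has no effect when c = 0 and must be > 0 otherwise.
    """
    income_factor = (1.0 + c / e) ** (1.0 - beta)
    time_factor = (1.0 - t / tau) ** beta
    return 1.0 / (income_factor * time_factor)


betas = [i / 10 for i in range(11)]  # the eleven values 0.0, 0.1, ..., 1.0
for beta in betas:
    print(f"beta = {beta:.1f}   M = {threshold_m(beta):.4f}")
```

Under these settings M increases from 1 at 𝛽 = 0 to 336/332 ≈ 1.012 at 𝛽 = 1, so it stays close to one because the time to respond is small relative to the two-week period.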

*Table A2. Description of the variables in Table 1.*

| Variable | Description |
|---|---|
| R | Response indicator: 1 if respond, 0 otherwise |
| gender | 1 if female, 0 if male |
| age | Age |
| nchildren | Number of children under 19 years old in the household |
| inc | Disposable income during the time span to respond (2 weeks in HUT) |
| w | Wage per hour (yearly wage income divided by 1700) |
| E | Income other than wage income during the time span to respond (2 weeks in HUT), calculated as disposable income minus wage income minus tax; minimum value set to zero |
| d | 1 if wage is equal to zero, 0 otherwise |
| H | wτ + 𝑑̅E |
| Hc | w(τ − t) + 𝑑̅(E + c) |
| gender(Hc+d) | gender multiplied by Hc + d |
| age(Hc+d) | age multiplied by Hc + d |
| nchildren(Hc+d) | nchildren multiplied by Hc + d |
| inc(Hc+d) | inc multiplied by Hc + d |
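The construction of the model variables in Table A2 can be sketched as follows. This is an illustration under stated assumptions, not the code used with the HUT data: the field names are hypothetical, 𝜏 = 336, t = 4 and c = 0 are taken from the Figure A1 note, E follows the table's description (truncated at zero), and 𝑑̅ is read as 1 − d, which makes H and Hc equal to zero for individuals without wage income.

```python
from dataclasses import dataclass

TAU = 336.0            # tau: two weeks, in hours (Figure A1 note)
T_RESPOND = 4.0        # t: estimated total time to respond, in hours (Figure A1 note)
C_RESPOND = 0.0        # c: set to zero as in the Figure A1 note
ANNUAL_HOURS = 1700.0  # divisor turning yearly wage income into an hourly wage


@dataclass
class Person:
    yearly_wage: float    # yearly wage income
    disposable_2w: float  # inc: disposable income over the two-week span
    wage_2w: float        # wage income over the two-week span (hypothetical field)
    tax_2w: float         # tax over the two-week span (hypothetical field)
    age: float


def derived_variables(p: Person) -> dict[str, float]:
    w = p.yearly_wage / ANNUAL_HOURS
    e = max(0.0, p.disposable_2w - p.wage_2w - p.tax_2w)  # E, truncated at zero
    d = 1.0 if p.yearly_wage == 0 else 0.0                # d = 1 if wage is zero
    d_bar = 1.0 - d                                       # assumption: d-bar = 1 - d
    h = w * TAU + d_bar * e                               # H = w*tau + d_bar*E
    hc = w * (TAU - T_RESPOND) + d_bar * (e + C_RESPOND)  # Hc = w(tau - t) + d_bar(E + c)
    return {"w": w, "E": e, "d": d, "H": h, "Hc": hc,
            "age(Hc+d)": p.age * (hc + d), "inc(Hc+d)": p.disposable_2w * (hc + d)}
```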

[Figure A1 plot: M (axis from 0.9 to 1.1) against 𝛽 (axis from 0 to 1).]

*Figure A3. Scatter plot of fitted values of model (3) and disposable income.*

*Note: Response probability is the fitted value of model (3), and Income is two-week disposable income. One extremely high value of income is removed for illustrative purposes.*

*A4. Derivations from the Cobb-Douglas utility function up to equation (9).*

*Each expression denoted by (A.#) corresponds to the expression with the same number in the text, e.g. (A.1) is equal to (1).*

$$\max U(G, L, X) \qquad \text{(A.1)}$$

*Insert G and L into the Cobb-Douglas utility function:*

$$U = K(wW + E + c)^{1-\beta}(\tau - W - t)^{\beta}X^{R}$$

$$\ln(U) = \ln(K) + (1 - \beta)\ln(wW + E + c) + \beta\ln(\tau - W - t) + R\ln(X)$$
[Figure A3 plot: Response Probability (vertical axis, 0.3 to 0.9) against Income (horizontal axis, 0 to 40000).]

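$$\frac{\partial \ln(U)}{\partial W} = \frac{w(1 - \beta)}{wW + E + c} - \frac{\beta}{\tau - W - t}$$

*Set equal to zero and solve for W to maximize the function:*

$$
\begin{aligned}
\frac{w(1 - \beta)}{wW + E + c} &= \frac{\beta}{\tau - W - t}\\
w(\tau - W - t)(1 - \beta) &= (wW + E + c)\beta\\
w(\tau - t)(1 - \beta) - wW(1 - \beta) &= wW\beta + (E + c)\beta\\
wW &= w(\tau - t)(1 - \beta) - (E + c)\beta\\
W^{*} &= (\tau - t)(1 - \beta) - \frac{E + c}{w}\beta \qquad \text{(A.4)}
\end{aligned}
$$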

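Substituting $W^{*}$ into G and L:

$$G(W^{*}) = w\left[(\tau - t)(1 - \beta) - \frac{E + c}{w}\beta\right] + E + c = w(\tau - t)(1 - \beta) + (E + c)(1 - \beta)$$

$$L(W^{*}) = \tau - \left[(\tau - t)(1 - \beta) - \frac{E + c}{w}\beta\right] - t = (\tau - t)\beta + \frac{E + c}{w}\beta$$

and further into the Cobb-Douglas utility function:

$$
\begin{aligned}
U^{*} &= K\left[w(\tau - t)(1 - \beta) + (E + c)(1 - \beta)\right]^{1-\beta}\left[(\tau - t)\beta + \frac{E + c}{w}\beta\right]^{\beta}X^{R}\\
&= K(1 - \beta)^{1-\beta}\beta^{\beta}\left[w(\tau - t) + (E + c)\right]^{1-\beta}\left[(\tau - t) + \frac{E + c}{w}\right]^{\beta}X^{R}\\
&= K(1 - \beta)^{1-\beta}\beta^{\beta}\left[w(\tau - t) + (E + c)\right]^{1-\beta}\left(w^{-1}\left[w(\tau - t) + (E + c)\right]\right)^{\beta}X^{R}\\
&= K(1 - \beta)^{1-\beta}\beta^{\beta}w^{-\beta}\left[w(\tau - t) + (E + c)\right]X^{R} \qquad \text{(A.5)}
\end{aligned}
$$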

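Substituting (6) and (7) into (8) gives:

$$K(1 - \beta)^{1-\beta}\beta^{\beta}w^{-\beta}\left[w(\tau - t) + (E + c)\right]X > K(1 - \beta)^{1-\beta}\beta^{\beta}w^{-\beta}\left[w\tau + E\right]$$

$$\left[w(\tau - t) + (E + c)\right]X - w\tau - E > 0$$

$$w\left[X(\tau - t) - \tau\right] + \left[X(E + c) - E\right] > 0 \qquad \text{(A.9)}$$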
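The derivation can be checked numerically. The sketch uses hypothetical parameter values (not HUT estimates, apart from 𝜏 = 336 and t = 4 from the Figure A1 note) to confirm that the closed-form W* in (A.4) maximizes ln(U), that ln(U) at W* equals the logarithm of (A.5), and that comparing (A.5) for responding (R = 1, with t and c) and not responding (R = 0, t = c = 0) gives the same decision as (A.9). The function names are illustrative, and an interior solution 0 < W* < 𝜏 − t is assumed.

```python
import math


def log_utility(work, w, e, c, t, tau, beta, x, r, k=1.0):
    """ln U = ln K + (1 - beta) ln(wW + E + c) + beta ln(tau - W - t) + R ln X."""
    return (math.log(k) + (1 - beta) * math.log(w * work + e + c)
            + beta * math.log(tau - work - t) + r * math.log(x))


def optimal_work(w, e, c, t, tau, beta):
    """(A.4): W* = (tau - t)(1 - beta) - (E + c) beta / w."""
    return (tau - t) * (1 - beta) - (e + c) * beta / w


def log_indirect_utility(w, e, c, t, tau, beta, x, r, k=1.0):
    """ln of (A.5): K (1 - beta)^(1 - beta) beta^beta w^(-beta) [w(tau - t) + E + c] X^R."""
    return (math.log(k) + (1 - beta) * math.log(1 - beta) + beta * math.log(beta)
            - beta * math.log(w) + math.log(w * (tau - t) + e + c) + r * math.log(x))


def responds(w, e, c, t, tau, x):
    """Decision rule (A.9): w[X(tau - t) - tau] + [X(E + c) - E] > 0."""
    return w * (x * (tau - t) - tau) + (x * (e + c) - e) > 0


# Hypothetical illustration values; tau and t follow the Figure A1 note
W_RATE, E_OTHER, C, T, TAU, BETA = 200.0, 1000.0, 0.0, 4.0, 336.0, 0.5

w_star = optimal_work(W_RATE, E_OTHER, C, T, TAU, BETA)


def objective(work):
    """ln U for a respondent (R = 1) with X = 1, as a function of hours worked."""
    return log_utility(work, W_RATE, E_OTHER, C, T, TAU, BETA, x=1.0, r=1)


# Brute-force maximization on a 0.01-hour grid strictly inside (0, tau - t)
grid = [i * 0.01 for i in range(1, round((TAU - T) / 0.01))]
w_grid = max(grid, key=objective)
print(f"W* closed form = {w_star:.2f}, grid maximum = {w_grid:.2f}")

for x in (0.9, 1.0, 1.01, 1.02, 1.1):
    u_respond = log_indirect_utility(W_RATE, E_OTHER, C, T, TAU, BETA, x, r=1)
    u_decline = log_indirect_utility(W_RATE, E_OTHER, 0.0, 0.0, TAU, BETA, x, r=0)
    print(x, u_respond > u_decline, responds(W_RATE, E_OTHER, C, T, TAU, x))
```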

*A5. Derivations from equation (14) into the general expression of the probit model (16).*

By denoting:

$$Hc = w(\tau - t) + \bar{d}(E + c), \qquad H = w\tau + \bar{d}E,$$

equation (14) can be written as:

$$X[Hc + d] - H - Md > 0$$
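A quick check of this rewriting: with 𝑑̅ read as 1 − d (an assumption on our part), the left-hand side of (14) equals the left-hand side of (A.9) for an individual with wage income (d = 0), and reduces to X − M for an individual without wage income (d = 1, w = 0). The function names and numbers below are hypothetical.

```python
def lhs_eq14(x, w, e, c, t, tau, d, m):
    """Left-hand side of (14): X[Hc + d] - H - M d, reading d_bar as 1 - d (assumption)."""
    d_bar = 1 - d
    hc = w * (tau - t) + d_bar * (e + c)
    h = w * tau + d_bar * e
    return x * (hc + d) - h - m * d


def lhs_a9(x, w, e, c, t, tau):
    """Left-hand side of (A.9): w[X(tau - t) - tau] + [X(E + c) - E]."""
    return w * (x * (tau - t) - tau) + (x * (e + c) - e)


# Wage earner (d = 0): both expressions agree; M drops out because it is multiplied by d
print(lhs_eq14(1.05, 200.0, 1000.0, 0.0, 4.0, 336.0, d=0, m=1.01),
      lhs_a9(1.05, 200.0, 1000.0, 0.0, 4.0, 336.0))
# No wage income (d = 1, w = 0): the condition reduces to X - M > 0
print(lhs_eq14(1.05, 0.0, 1000.0, 0.0, 4.0, 336.0, d=1, m=1.01))
```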

*If X is written as the general multiple linear regression as in (15), a general probit model can be expressed as:*

$$
\begin{aligned}
P(R = 1) &= \Phi\left(\alpha + (\beta_0 + \beta_1X_1 + \cdots + \beta_kX_k)[Hc + d] + \beta_{k+1}H + \beta_{k+2}d\right)\\
P(R = 1) &= \Phi\left(\alpha + \beta_0[Hc + d] + \beta_1X_1[Hc + d] + \cdots + \beta_kX_k[Hc + d] + \beta_{k+1}H + \beta_{k+2}d\right)\\
P(R = 1) &= \Phi\left(\alpha + \beta_0Hc + \beta_0d + \beta_1X_1[Hc + d] + \cdots + \beta_kX_k[Hc + d] + \beta_{k+1}H + \beta_{k+2}d\right)
\end{aligned}
$$

By denoting:

$$\beta_0d + \beta_{k+2}d = (\beta_0 + \beta_{k+2})d = \beta_0^{*}d$$

The model looks as follows:

𝑃(𝑅 = 1) = Φ(𝛼 + 𝛽₀𝐻𝑐 + 𝛽₀∗𝑑 + 𝛽₁𝑋₁[𝐻𝑐 + 𝑑] + ⋯ + 𝛽ₖ𝑋ₖ[𝐻𝑐 + 𝑑]