
Title: Treatment of missing observations in multilevel data

Author: Walaa Qabaha

(Date of Birth: 90-06-01)

Spring 2020

Course: Independent Project II, 15 credits
Subject: Statistics

Örebro University School of Business
Supervisor: Nicklas Pettersson
Examiner: Olha Bondar


Contents

1. Introduction
1.1 Research problem
1.2 Outline
2. Missing data
3. Multilevel data
3.1 Multilevel modeling
3.2 Missing observations in multilevel data
4. Methods for handling missing observations in multilevel data
4.1 Complete case analysis
4.2 Imputation
4.2.1 Likelihood approach
4.2.2 Fully conditional specification method
4.3 Multiple imputation
5. Application
5.1 Data
5.2 Simulation study
6. Discussion
7. Conclusion
References
Appendix


Abstract

In this thesis, missing data, the cases of missing values in multi-level observations, the missing data mechanisms, and multi-level modeling are described. Different methods that can be used to treat missing data in multi-level observations are discussed, including complete case analysis, imputation methods, other modern techniques for dealing with missing observations in data with a hierarchical structure, and the multiple imputation method. The properties of the multiple imputation method and of the complete case analysis approach for treating missing observations in multi-level data are investigated through a simulation study based on data from Joop (2010). The evaluation of these methods is done by comparing the estimators of the model parameters, together with their uncertainties, obtained when the full data are used, when the multiple imputation procedure is employed, and when the complete case analysis approach is used. The results show that both procedures for dealing with missing observations provide results that are in line with those obtained when the full data are used. Moreover, the multiple imputation produces both estimated model parameters and uncertainties that are closer to the values obtained from the full data than the ones computed in the complete case analysis.

The conclusion is that the multiple imputation method is an appropriate procedure, preferable under the assumption of missingness at random, and that this method gives researchers flexibility when dealing with multi-level observations. Moreover, this study points to important areas for new research on this topic, which focus on modern methods for handling missing observations in data with a hierarchical structure.


Acknowledgment

Special thanks to my husband and my kids, who encouraged me to finish my thesis and gave me their infinite love and support. I also want to thank my supervisor, Nicklas Pettersson, for guidance and support during the thesis process. Many thanks to the examiner, Olha Bondar, for her helpful comments and suggestions.


1. Introduction

Multi-level data have a hierarchical structure and are defined as data where observations are gathered in units. Examples include observations of pupils clustered within classes, or observations of teachers clustered within schools. There exist four cases of missing values in multilevel observations: values missing at level 1 (the individual level), at level 2 (the class level), in the outcome variable, and in the class variable (the class identifier) (van Buuren, 2018). The problem of missing values in multilevel data is important and interesting, since missing data can appear at all levels, or at several levels simultaneously, which makes it more complicated than the nonhierarchical missing data problem. As a consequence, it is less obvious what the best solutions are in a specific situation. Many methods have been developed to deal with the various types of situations, but the area is still under development and will probably continue to be so for the foreseeable future (for recent reviews, see Enders, Keller, & Levy, 2018; Hox, van Buuren, & Jolani, 2015; Grund, Lüdtke, & Robitzsch, 2016).

In multi-level observations, data can be missing at different levels. A classical example of using a multi-level model is estimating and predicting students' results:

• Data may be missing in the outcome (dependent) variable at level 1 (pupil level), at level 2 (class level), or at both levels.

• Data may be missing in a level-1 independent variable; if a student is absent because of sickness or for any other reason, the student's result is missing.

• Data may be missing in a level-2 independent variable; this occurs, for example, if the experience of the teacher who teaches a specific class is missing.

• A fourth case is missing values in the class variable, when we cannot find out which class a student belongs to (the class identifier is missing).

To analyze missing data, it is useful to discover the reason for the missingness, such as "don't know," "refused," or "unintelligible" (Schafer and Graham, 2002, p. 148), because knowing why the data are missing provides information that can be valuable for the researcher in deciding how to treat the missingness. Before choosing an appropriate procedure to handle missing data, the missing data mechanisms should be considered when analyzing missing data in multi-level observations. The mechanism of missingness is defined as the data-generating process for the missing data and explains the relationship between the variables. The missing data mechanism is an important aspect, since the choice of an appropriate method to handle missing data depends very strongly on it (Little and Rubin, 2002). A more detailed description of the missing data mechanisms is given in the next chapter.

One of the most common approaches for dealing with missing data is called complete case analysis (CCA). The CCA approach restricts the statistical analysis to cases that are completely observed on the variables used in the analysis, and is therefore closely related to variable selection. Sometimes there are missing data only in variables that are not used in the analysis; then we keep these observations and still have complete data on the variables we are interested in. The advantages of using this approach are "simplicity", because the analysis can be carried out without any adjustment, and "comparability", since different researchers can use the same standard methods on the same data and replicate each other's findings. The disadvantages of using this approach are that it can reduce the sample size and can introduce bias when the missing data are not missing completely at random (MCAR) (Little and Rubin, 2002, p. 41). Other, more modern techniques for dealing with missing data with a hierarchical structure include the likelihood approach, the fully conditional specification method, and the multiple imputation method.

The thesis aims to assist in choosing the most appropriate method for handling the problem of missing values in multi-level observations, based on an understanding of the missing values at different levels in data that have a hierarchical structure. Knowing the type of missing data influences the choice of the appropriate method. A good approach should be flexible and should be able to take all available information into account in a suitable way.

1.1 Research problem

In this paper, I try to find out what the challenges of missing data are in multi-level analysis and which method is the most appropriate for handling missing values in multi-level observations. How much does the type of missing values depend on the model, and what is the impact of missing values on the estimators of the model parameters?

1.2 Outline

This thesis consists of seven chapters: chapter 2 describes missing data in general. Chapter 3 provides a description of the multi-level model, and the cases of missing values in multi-level observations are presented. In chapter 4, the methods that can be used to treat missing data in multi-level observations are discussed. In chapter 5, we describe the data, the methods used to treat the missing data, and the linear mixed model applied to the data. Chapter 6 presents the analysis of the methods with some discussion. The last chapter is a concluding chapter with suggestions for further research on this topic.

2. Missing data

Classification of the mechanisms leading to missing values was done by Rubin (1976), who categorized missing data mechanisms into three cases: missing at random (MAR), missing completely at random (MCAR), and missing not at random (MNAR). MAR means that the missingness is related to other observed variables in the data. If, for example, the response rate is higher among women than among men, and gender is observed, then the missingness is random conditional on gender, that is, MAR. MCAR means that the probability of missing values is unrelated to the data. The third possible mechanism, MNAR, is defined as the probability of missingness being related to the missing observations themselves. It can be, for example, that missingness on income depends on the level of income itself: low earners might not want to answer. But it could also be that low earners have lower education, and that answers are more likely to be missing from respondents with low education. If the missingness depended only on education, which we had observed, we would have MAR, so that the missingness would be random conditional on education (van Buuren, 2012).

To express the mathematical definition of the missing data mechanisms, we introduce the following notation. Let γ be the vector of model parameters used to describe the distribution of a univariate dependent variable Y. Let R be the missing data indicator vector that applies to Y, where R = 0 if y is observed (y_obs) and R = 1 if y is missing (y_mis). It is also possible to generalize so that R and Y are matrices.

If the missing values are MCAR, then the probability distribution for MCAR can be written as (Enders, 2010):

$$P(R = 1 \mid y_{obs}, y_{mis}, \gamma) = P(R = 1 \mid \gamma). \qquad (1)$$

So, the probability that a value is missing does not depend on the dependent variable Y in the data, and the probability of missingness is the same for all units.

If the missing values are MAR, then the probability distribution for MAR can be written as (Enders, 2010):

$$P(R = 1 \mid y_{obs}, y_{mis}, \gamma) = P(R = 1 \mid y_{obs}, \gamma). \qquad (2)$$

So, the probability of missingness depends only on the observed information y_obs.

Finally, the probability distribution for MNAR is defined as (Enders, 2010):

$$P(R = 1 \mid y_{obs}, y_{mis}, \gamma). \qquad (3)$$

So here the probability that a value is missing also depends on unobserved information, including the missing values y_mis themselves.
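To make the distinction concrete, the following small R sketch (not part of the thesis code; all names and numbers are illustrative) generates MCAR and MAR missingness in a toy variable y using a fully observed covariate x:

set.seed(1)
n <- 200
x <- rnorm(n)                      # fully observed covariate
y <- 2 + 0.5 * x + rnorm(n)        # dependent variable

# MCAR: the probability of missingness is constant and unrelated to the data
r_mcar <- runif(n) < 0.2
y_mcar <- y
y_mcar[r_mcar] <- NA

# MAR: the probability of missingness depends only on the observed covariate x
p_mar <- plogis(-1.5 + 1.5 * x)
r_mar <- runif(n) < p_mar
y_mar <- y
y_mar[r_mar] <- NA

mean(r_mcar); mean(r_mar)          # realized proportions of missing values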

In the 1990s, developments in the methods for handling missing data addressed MAR, and thus also MCAR, which is a special case of MAR, while newer methods focus on how to handle missing observations in data that have a hierarchical structure and in which the missing data are MAR or MNAR (van Buuren, 2012). In this thesis, we focus only on the MAR mechanism.

3. Multi-level data

In section 3.1 a description of multi-level data and of the multi-level model used to analyze missing data in multi-level observations is provided. In section 3.2 a characterization of the cases of missing values in multi-level observations is presented.

3.1 Multi-level modeling

Multi-level data have a hierarchical structure and are defined as data where observations are gathered in units, such as observations of pupils clustered within schools, or observations of individuals clustered within countries, neighborhoods, or states (van Buuren, 2018). The important thing with hierarchical data is that the observations share common aspects within the hierarchy. Depending on the analysis, this dependence may be important. This is the reason why we need to investigate the structure of the data generation, and potentially use a model that reasonably accounts for the hierarchical structure. Modeling multi-level data depends on what we are analyzing, and on to what extent and in what way we need to consider the structural part (Black et al., 2011).

The problem of missing values in multi-level data is very common. The complexity lies in the presence of four cases of missing values in multi-level observations:

- values missing in the outcome (dependent) variable,
- in the level-1 predictors (independent variables of the model),
- in the level-2 predictors (independent variables of the model),
- in the class variable.

There are different ways to intervene and to consider these four cases of missing data. The use of a suitable method depends on the aim of the inference and on which type of model and method we use (van Buuren, 2018, p. 200). There exist many similar concepts for multi-level data, and many different notations.

Next, we introduce the description of the multi-level model used in the empirical part of the thesis:

• Level 1: individual (student) level. Variables at level 1 are:

- Teacher popularity (popteach) is the outcome variable at the first level and the dependent variable of the model. Each teacher evaluates the popularity of each student on a scale from 1 to 10. Popteach has two indices, i and j: the variable popteach_ij denotes the popularity of the i-th student in class j as evaluated by the teacher.

- Extraversion (extrav) denotes a psychological characteristic of the student. Extrav is an independent variable at the individual level and can take values from 1 to 10. The variable extrav_ij is the psychological characteristic of the i-th student in the j-th class.

- The first-level model equation is

$$\text{popteach}_{ij} = \beta_{0j} + \beta_{1j}\,\text{extrav}_{ij} + \varepsilon_{ij}, \qquad (4)$$

where the slope β_1j is assumed to be fixed across classes. The within-class random residuals at the student level are assumed to be independent and normally distributed, ε_ij ~ N(0, σ²_ε).

• Level 2: class level

We denote by β_0j the random intercept in the level-1 equation, which varies by class. The model equations at level 2 are given by

$$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{texp}_{j} + u_{0j}, \qquad (5)$$
$$\beta_{1j} = \gamma_{10}. \qquad (6)$$

- The first level-2 equation describes the variation in the mean of teacher popularity across classes; it is a function of the overall mean γ_00 and the class-specific part γ_01 texp_j, where texp_j denotes the experience of the teacher who teaches class j. Teacher experience is an independent variable at the class level, measured in years of working experience; in our data it takes values from 2 to 25. The residual u_0j is the class-level random residual, which is assumed to be normally distributed, u_0j ~ N(0, σ²_u0).

- The second level-2 equation specifies that β_1j is a fixed effect equal to γ_10, which is why this equation has no random residual.

The unknown parameters are the fixed parameters γ_00, γ_01, and γ_10, and the variances σ²_ε and σ²_u0. We are interested in their estimation. The parameter β_0j is a random variable which can be predicted.

We may plug the level-2 equations into the level-1 equation, and then the model can be rewritten as a single equation:

$$\text{popteach}_{ij} = \gamma_{00} + \gamma_{10}\,\text{extrav}_{ij} + \gamma_{01}\,\text{texp}_{j} + u_{0j} + \varepsilon_{ij}. \qquad (7)$$

Since it is difficult to deal with the sum of random residuals u_0j + ε_ij, it is more convenient to rewrite the model in matrix notation, as shown below:

$$\begin{pmatrix} \text{popteach}_{1j} \\ \vdots \\ \text{popteach}_{n_j j} \end{pmatrix}
= \begin{pmatrix} 1 & \text{texp}_{j} & \text{extrav}_{1j} \\ \vdots & \vdots & \vdots \\ 1 & \text{texp}_{j} & \text{extrav}_{n_j j} \end{pmatrix}
\begin{pmatrix} \gamma_{00} \\ \gamma_{01} \\ \gamma_{10} \end{pmatrix}
+ \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} u_{0j}
+ \begin{pmatrix} \varepsilon_{1j} \\ \vdots \\ \varepsilon_{n_j j} \end{pmatrix},$$

which is the model for class j, where n_j is the number of students in class j.
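As an illustration of how this model is fitted in practice, the following hedged R sketch uses lme4, as in the empirical part of the thesis; the data frame dat with columns popteach, extrav, texp, and class is an assumed name:

library(lme4)
# random intercept per class, maximum likelihood estimation
fit <- lmer(popteach ~ 1 + extrav + texp + (1 | class), data = dat, REML = FALSE)
summary(fit)  # fixed effects gamma_00, gamma_10, gamma_01 and variances sigma_u0^2, sigma_eps^2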

3.2 Missing observations in multi-level data

Many reasons for missing observations in multi-level data must be considered when dealing with multi-level observations, such as data loss, item nonresponse, errors in data entry, or errors in the research design (Black et al., 2011). In multi-level observations, data can be missing at different levels. For example, data may be missing as item nonresponse or as unit nonresponse. Item nonresponse occurs when the respondent forgets to answer one or more items in the survey, while unit nonresponse refers to respondents who refused to participate, which means that all data are missing for that respondent. There are various methods to handle the problem of missing data in unit and item nonresponse: editing and imputation methods have been used to handle item nonresponse, and weighting techniques address unit nonresponse (see van Buuren, 2018).

Missing observations in the level-1 or level-2 predictors can be handled with a listwise deletion approach (complete case approach). The multi-level model can, in general, handle missing observations in the outcome variable; more specifically, it can handle models with varying time points (van Buuren, 2018). A linear mixed-effects model combined with the expectation-maximization algorithm can be employed for modeling multi-level data, and this model is used to handle missing data in the outcome. If the missing data are MNAR, however, they can be very difficult to handle (Black et al., 2011). According to van Buuren (2018), missing data in the predictors are complicated and difficult to handle, because derived variables, such as the product of two level-1 predictors or the cluster means of level-1 predictors, cannot be calculated from the data when observations are missing. First, we need to handle missing values in the predictors; then we can handle missing data in the outcome variable. A more detailed description is given in the next chapter.

4. Methods for handling missing observations in multilevel data

In section 4.1 a traditional technique that can be used to handle missing data in multi-level observations, complete case analysis, is described. In section 4.2 we present imputation methods and other modern techniques for dealing with missing observations, while multiple imputation is described in section 4.3.

4.1 Complete case analysis

Complete case analysis (CCA) is a simple method for handling missing data. It is also known as the listwise deletion approach and is used to treat both item and unit nonresponse. Generally, this method works well only when the missing observations are MCAR; if the data are not MCAR, the method can suffer from bias in the estimates and from a loss of precision. The approach can be combined with a weighting adjustment, in which the complete cases are reweighted using auxiliary variables; in general, such a weighting procedure helps to avoid bias in the estimates. In the empirical part of the thesis, we do not use a weighting procedure; the R function simply deletes the incomplete cases.

The CCA approach works by excluding all units that have missing observations and using a smaller dataset with completely observed units (Little and Rubin, 2002). However, this method becomes more complicated in multi-level observations when values are missing in many variables, so that there may be very few complete cases. The use of a small dataset then leads to a loss of accuracy and introduces bias if the missing data are not MCAR, or not MAR given the relevant variables controlled for (Gelman and Hill, 2007).

Missing data in multi-level observations cause many problems for researchers, because data are often missing in many variables, and this may cause large fractions of the sample to be discarded. Deleting large fractions of the sample means that the complete cases may no longer represent the full population, which can affect the estimation. The parameters estimated from the complete cases are unbiased and efficient if the missing data are MCAR, since the incomplete cases then do not provide any additional information for the analysis, while if the missing observations are not MCAR, the results may be biased. Note also that, in relation to what we are estimating, the same data may be MCAR for one problem and MAR for another.


When analyzing missing observations in multi-level data, missing values in the level-1 or level-2 predictors can be handled with a listwise deletion approach (Schafer and Graham, 2002). This approach omits any case that has missing observations at level 1 (individual level), at level 2 (class level), or in the outcome variable, and fits the model to data without any missing values. That means that this method requires complete data on all variables in the analysis, so that any case with missing observations on one or more variables is excluded from the analysis. In the dataset used in the analysis, there are missing values in the level-1 variable pupil extraversion (extrav), the level-2 variable teacher experience (texp), and the outcome variable popularity evaluated by the teacher (popteach) that need to be treated. In order to examine whether the listwise deletion approach is reasonable, it is important to check the assumptions of the linear mixed model.
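A minimal R sketch of the listwise deletion step, assuming the same hypothetical data frame dat as before:

# keep only rows where popteach, extrav and texp are all observed
cc <- complete.cases(dat[, c("popteach", "extrav", "texp")])
dat_cc <- dat[cc, ]

# fit the two-level model to the complete cases only
fit_cc <- lme4::lmer(popteach ~ 1 + extrav + texp + (1 | class), data = dat_cc, REML = FALSE)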

4.2 Imputation

There are many methods for single imputation, including mean imputation, regression imputation, and cold- or hot-deck imputation. These methods impute only a point prediction, which underestimates the uncertainty, especially when the missing data are not MCAR (see van Buuren, 2018, for more details). Next, two imputation methods, one based on the likelihood function and one constructed from the Bayesian point of view, are introduced for the multilevel model discussed in section 3.1.

4.2.1 Likelihood approach

The maximum likelihood estimates from an incomplete data set are unbiased when the data are MAR or MCAR and when the model is specified correctly. Moreover, this method is more appealing than the complete case approach. The mixed-effects model, in conjunction with maximum likelihood (ML), is suitable for handling missing observations in the outcome (dependent) variable. Software programs such as Mplus can be used to treat the problem of missing observations in a specified model that includes auxiliary variables; the Mplus software is explained in more detail in Muthén and Asparouhov (2016) (see also Black et al., 2011).

Let y_ij = popteach_ij, x_{1;ij} = extrav_ij, and x_{2;j} = texp_j. Using the linear mixed model representation (7) of the two-level model (4)-(6), the log-likelihood function is given by

$$l(\gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon};\, y_{ij}, x_{1;ij}, x_{2;j})
= \sum_{j}\sum_{i} \log f(y_{ij} \mid x_{1;ij}, x_{2;j};\, \gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon}), \qquad (8)$$

with

$$f(y_{ij} \mid x_{1;ij}, x_{2;j};\, \gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon})
= \frac{1}{\sqrt{2\pi(\sigma^2_{\varepsilon} + \sigma^2_{u0})}}
\exp\!\left(-\frac{(y_{ij} - \gamma_{00} - \gamma_{10}\,x_{1;ij} - \gamma_{01}\,x_{2;j})^2}{2(\sigma^2_{\varepsilon} + \sigma^2_{u0})}\right). \qquad (9)$$


If some values of y_ij, x_{1;ij}, or x_{2;j} are missing, then the log-likelihood function cannot be computed and, consequently, it cannot be maximized. To deal with this problem, Dempster et al. (1977) developed the so-called EM (Expectation-Maximization) algorithm. This is a two-step approach: in the first step the expectation of the log-likelihood is computed conditionally on the observed data, and in the second step this conditional expectation is maximized. See also Little and Rubin (2002, chapter 6), where likelihood methods for addressing missing data in multi-level observations are explained in detail.

In cases of incomplete data, Black et al. (2011) state that ML estimation techniques under the assumption of ignorable missingness make use of all observed scores. However, the parameters estimated with ML may not be reliable when ML is used with a small sample size.
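To illustrate the alternating E- and M-steps of the EM algorithm, the following toy R sketch estimates the mean and variance of a single normally distributed variable with ignorable missingness; this is a deliberate simplification for illustration only, not the multi-level model of section 3.1:

em_normal <- function(y, n_iter = 100) {
  obs <- y[!is.na(y)]
  n <- length(y)
  n_mis <- sum(is.na(y))
  mu <- mean(obs)                    # starting values from the observed data
  s2 <- var(obs)
  for (k in seq_len(n_iter)) {
    # E-step: expected sufficient statistics, using E[y_mis] = mu and E[y_mis^2] = mu^2 + s2
    t1 <- sum(obs) + n_mis * mu
    t2 <- sum(obs^2) + n_mis * (mu^2 + s2)
    # M-step: update the parameters from the completed sufficient statistics
    mu <- t1 / n
    s2 <- t2 / n - mu^2
  }
  c(mean = mu, variance = s2)
}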

4.2.2 Fully conditional specification method

Fully conditional specification (FCS) is another strategy used to handle data with missing observations. The likelihood approach presented in the previous section, together with the EM algorithm, provides a way to estimate the model parameters when missing values are present in the data. Although the initial aim of that method is not to impute the missing values themselves, but rather to provide an estimation procedure for the model parameters, the imputation of missing values actually takes place when the expectation step of the EM algorithm is performed.

The fully conditional specification method is designed differently. The idea behind the approach is to impute the missing values with predictive values computed under the assumed model specification. In practice, the computation of the predictive values employs methods of Bayesian statistics together with the Markov Chain Monte Carlo approach. For this reason, the imputation procedure is also known as a Bayesian strategy in the literature.

The fully conditional specification approach uses the conditional distribution of one variable at a time, conditioned on all other variables included in the model (van Buuren and Groothuis-Oudshoorn, 2011). Based on the derived conditional distributions, the missing values of each variable in the model are imputed. The procedure is repeated for all variables by using their conditional distributions. The FCS method is flexible and can be used for different model specifications where each variable can have a different distribution. It is easy to understand and, in many cases, provides imputed values close to the observed values in the data. On the other hand, FCS is more explorative and depends on empirical support. Finally, FCS is implemented in different statistical software, for example in the MICE (multivariate imputation by chained equations) package in R.

For the two-level model (4)-(6), written in the linear mixed model parameterization (7), the fully conditional specification approach starts with the definition of a prior assigned to the model parameters γ_00, γ_10, γ_01, σ²_u0, and σ²_ε. Several approaches exist for specifying a prior for the model parameters. One can use a non-informative prior, or alternatively opt for an informative prior, such as a conjugate prior. A commonly used prior is a mixture of both approaches (see Audigier et al., 2018). It employs constant priors for the location parameters γ_00, γ_10, and γ_01 and independent inverse-gamma priors for the scale parameters σ²_u0 and σ²_ε. This prior is given by

$$p(\gamma_{00}) \propto 1, \qquad (10)$$
$$p(\gamma_{10}) \propto 1, \qquad (11)$$
$$p(\gamma_{01}) \propto 1, \qquad (12)$$
$$p(\sigma^2_{u0}) \propto (1/\sigma^2_{u0})^{\alpha_{u0}+1} \exp(-\beta_{u0}/\sigma^2_{u0}), \qquad (13)$$
$$p(\sigma^2_{\varepsilon}) \propto (1/\sigma^2_{\varepsilon})^{\alpha_{\varepsilon}+1} \exp(-\beta_{\varepsilon}/\sigma^2_{\varepsilon}). \qquad (14)$$

The hyperparameters in the inverse-gamma priors, α_u0, β_u0, α_ε, and β_ε, are usually chosen to make the priors as vague as possible (see Audigier et al., 2018). In order to describe the fully conditional specification approach, we consider for simplicity a special case of missing data, where missing observations are present only in the dependent variable of the model, namely popteach_ij. Missing observations in the independent variables can be treated in a similar way.

Let y_obs = popteach_obs denote those values of popteach_ij for which the observation is present. Similarly, by y_mis = popteach_mis we denote those popteach_ij for which the observation is missing. The aim of the fully conditional specification approach is to impute the values in y_mis by drawing them from the posterior predictive distribution given by

$$p(y_{mis} \mid y_{obs})
= \int f(y_{mis} \mid x_{1;ij}, x_{2;j};\, \gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon})\,
p(\gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon} \mid y_{obs})\,
d\gamma_{00}\, d\gamma_{10}\, d\gamma_{01}\, d\sigma^2_{u0}\, d\sigma^2_{\varepsilon}, \qquad (15)$$

where

$$p(\gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon} \mid y_{obs})
\propto L(\gamma_{00}, \gamma_{10}, \gamma_{01}, \sigma^2_{u0}, \sigma^2_{\varepsilon};\, y_{obs}, x_{1;ij}, x_{2;j})\,
p(\gamma_{00})\,p(\gamma_{10})\,p(\gamma_{01})\,p(\sigma^2_{u0})\,p(\sigma^2_{\varepsilon})$$

is the posterior of γ_00, γ_10, γ_01, σ²_u0, and σ²_ε given the observed values y_obs = popteach_obs of the dependent variable popteach_ij, and L(γ_00, γ_10, γ_01, σ²_u0, σ²_ε; y_obs, x_{1;ij}, x_{2;j}) denotes the likelihood function computed for the observed values of popteach_ij.

The integral in (15) can be evaluated analytically for some priors assigned to γ_00, γ_10, γ_01, σ²_u0, and σ²_ε, while for other priors one has to use numerical methods to obtain the posterior predictive distribution (15). In the latter case, one can apply the Markov Chain Monte Carlo (MCMC) approach with a Gibbs sampler (cf. van Buuren, 2018) to obtain realizations of popteach_mis, which impute the missing values in the variable popteach_ij of the multi-level model (4)-(6).
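As a hedged illustration, multilevel FCS imputation of this kind can be requested in the mice package by choosing a two-level imputation method such as "2l.pan" (a linear two-level model via the pan package) and marking the class variable as the cluster indicator (coded -2) in the predictor matrix; the data frame d holding class, popteach, extrav, and texp is an assumed name:

library(mice)
d$class <- as.integer(d$class)                        # 2l methods expect an integer cluster variable
meth <- make.method(d)
meth[c("popteach", "extrav", "texp")] <- "2l.pan"     # linear two-level imputation via the pan package
pred <- make.predictorMatrix(d)
pred[c("popteach", "extrav", "texp"), "class"] <- -2  # -2 marks class as the cluster indicator
imp <- mice(d, method = meth, predictorMatrix = pred, m = 5, print = FALSE)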


4.3 Multiple imputation

Inference based on single imputation methods does not take the uncertainty about the imputed values into account, which usually leads to underestimating the variance of the estimators of the model parameters. Rubin (1987) suggested a new approach to deal with this problem. In contrast to single imputation, the new approach captures the uncertainty in the estimated model parameters that is related to the uncertainty of the imputed values. The idea of Rubin (1987) was to impute the missing values several times and then to pool the estimators of the model parameters. Since not only a single imputed value is used in the procedure, the uncertainty related to the imputation is taken into account when the variance of the estimated model parameters is computed. Figure 1 provides a graphical presentation of the procedure suggested by Rubin (1987), which is known in the literature as multiple imputation (MI). In general, multiple imputation fills in the missing data with several reasonable values, which creates multiple completed datasets.

Figure 1: The sketch of multiple imputation where the sign “?” corresponds to missing values.

Multiple imputation is one of the most widely used methods for dealing with missing observations in multi-level data. According to Black et al. (2011), the multiple imputation method with the normality assumption is a frequently used procedure to handle the problem of missing observations in multi-level data. The MI procedure is relatively fast and straightforward to apply. MI helps to better represent the uncertainty and to obtain better precision in the uncertainty of the estimates. In general, there may be a better method than MI for a specific problem. Still, this method is very general and gives the researcher flexibility when dealing with missing observations in a hierarchical data set. Moreover, the method can provide accurate and efficient estimates and can be used under the less restrictive assumption of missing at random (MAR). Therefore, MI is often preferred (van Buuren, 2018).
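The pooling step mentioned above follows Rubin's (1987) rules. With m imputed data sets, and with Q̂_l denoting the estimate and U_l its estimated variance from the l-th completed data set, the pooled estimate and its total variance are

$$\bar{Q} = \frac{1}{m}\sum_{l=1}^{m}\hat{Q}_l, \qquad
T = \bar{U} + \Big(1+\frac{1}{m}\Big)B, \qquad \text{where} \quad
\bar{U} = \frac{1}{m}\sum_{l=1}^{m}U_l, \quad
B = \frac{1}{m-1}\sum_{l=1}^{m}\big(\hat{Q}_l-\bar{Q}\big)^2,$$

with Ū the average within-imputation variance and B the between-imputation variance.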

Many possible approaches can be used for multiple imputation in multi-level analysis. A Bayesian approach, as described in section 4.2, is one of them and has been shown to be effective for imputing missing values in the multi-level model. The mice package in R replaces multivariate missing data multiple times, where each missing value can be imputed by a separate method. The advantage of using mice is that it can impute two-level data and maintain consistency (van Buuren, 2018). The multiple imputation procedure in mice works by filling in the missing data with multiple values, running the analysis on the completed dataset in each of those cases, and pooling the final results to obtain better estimates. Also, van Buuren (2018) found that multiple imputation with FCS is convenient for treating missing values at level 2 with random slopes.
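A compact sketch of this workflow with mice (the data frame d and model variables as above are assumed names; broom.mixed is needed so that the pooled summaries can be computed for mixed-model fits):

library(mice)
library(lme4)
library(broom.mixed)                      # needed so that pool() can handle lmer fits

imp  <- mice(d, m = 20, print = FALSE)    # impute the missing values 20 times
fits <- with(imp, lmer(popteach ~ 1 + extrav + texp + (1 | class), REML = FALSE))
summary(pool(fits))                       # pooled estimates and standard errors (Rubin's rules)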

5. Application

In section 5.1 a description of the data is provided, and in section 5.2 the simulation study is described in more detail.

5.1 Data

The data used in this study come from a popular multi-level data set from Joop (2010) with 2000 observations. The data set consists of 7 variables, described as follows:

• Pupil: pupil number within class
• Class: class number
• Extrav: pupil extraversion (level 1)
• Sex: pupil gender (level 1)
• Texp: teacher experience (level 2)
• Popular: pupil popularity (level 1)
• Popteach: pupil popularity as evaluated by the teacher (outcome variable)

The data set contains many missing observations. For the simulation study, a complete data set with 548 observations is extracted and used as the actual population for comparison.

5.2 Simulation study

In this section, we compare the ability of the multiple imputation method and of the complete case analysis approach to handle missing observations in multi-level data. The comparison is based on a simulation study in which the full data set consists of 548 observations on the four variables described in section 3, where the two-level model equations are introduced. The aim of the simulation study is to compare the estimators of the model parameters, together with their uncertainties, obtained when the full data are used, when the multiple imputation procedure is employed, and when the complete case analysis approach is used.

Before the multiple imputation method and the complete case analysis approach are used to estimate the parameters of the two-level model (4)-(6), we create missing values in all three variables of the model, namely popteach, extrav, and texp, in the full data set. In order to make the setup of the simulation study as realistic as possible, the mechanism of missing data is estimated for each variable in the model using the extended data set of 2000 observations with missing values, from which the 548 complete observations were obtained. The extended data set is taken from Joop (2010). For each variable in the full data set, we estimate a MAR missing data mechanism in the extended data set by applying a logistic regression; the logistic regression is expected to provide a good fit since the dependent variable (missing or not) is dichotomous. The idea is to estimate the missing data mechanism based on how the mechanism looks in the extended data.

In total, three logistic regressions are fitted to the extended data. In order to check which variables are significant and related to the dependent variable in each of the three models, transformations of the variables were also used in the model formulations. This approach is helpful for modeling the type of missingness, that is, whether the missing values are missing completely at random, missing at random, or missing not at random. If the parameter estimates of the explanatory variables are statistically significant, then the missingness is MAR or MNAR.

After fitting the logistic regression models to the extended data set, the mechanism of missingness in each variable of the data set is estimated. These mechanisms are later used to generate missing observations in each of the three variables in the full data set consisting of 548 observations. Next, we describe the whole procedure in detail for the case of popteach as the dependent variable of the logistic regression model. Similar approaches were used for the models with extrav and texp as dependent variables. For the logistic regression with dependent variable popteach, the following probability is modeled as a function of extrav and texp:

$$P(R = 1 \mid \beta) = \frac{\exp(\beta_0 + \beta_1\,\text{extrav} + \beta_2\,\text{texp})}{1 + \exp(\beta_0 + \beta_1\,\text{extrav} + \beta_2\,\text{texp})},$$

where extrav and texp are the explanatory variables of the model, β collects the coefficients of the model equation, and R is the missingness indicator, defined as

$$R = \begin{cases} 1 & \text{if popteach is missing}, \\ 0 & \text{if popteach is not missing}. \end{cases}$$

After fitting the logistic regression model to the extended data set consisting of 2000 observations, we apply it to the full data set and compute the probability of a missing observation for each of the 548 observed values of popteach. These estimated probabilities are then used to generate missing values in the variable popteach; that is, each of the 548 values of popteach is replaced by NA with its corresponding estimated probability. It is important to note that, by construction, the considered mechanism of missingness is MAR.
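A short sketch of this generation step in R (p_mis, the vector of estimated missingness probabilities, and the data frame full_data are illustrative names; the full script with the actual variable names is given in the Appendix):

# replace each popteach value by NA with its estimated probability of being missing
make_na <- runif(length(p_mis)) < p_mis
full_data$popteach[make_na] <- NA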

The pattern of the generated missing values in all three variables is shown in Figure 2. We observe that 141 missing observations were generated in the variable extrav, 152 in the variable popteach, and 240 in the variable texp. Moreover, only 155 out of the 548 rows in the data have no missing values in any of the three variables, while the most common pattern is missingness in the variable texp only (131 out of 548 cases).

Figure 2: Pattern of generated missing values.

In the new data, there exist missing values at level 1, at level 2, and in the outcome variable that need to be considered before the analysis. By construction, the missing-values mechanism for all three variables is MAR. This type of missingness can be treated using complete case analysis and the multiple imputation method from mice, as described in section 4.

Next, we compare the ability of the two approaches described in section 4 to fit the model to data with missing values. The first method, complete case analysis, also known as the listwise deletion approach, uses only those observation vectors where all variables are present; following Figure 2, there are only 155 such cases out of 548 in the created data set. The second approach employs multiple imputation using the package mice in R and is based on 20 imputed data sets.

In Figures 3-5, we plot non-parametric density estimators of the three variables popteach, extrav, and texp included in the model. In each figure, the blue line shows the corresponding density estimator obtained from the data set without missing values, while the red lines are the densities constructed from each of the imputed data sets. In most of the considered cases, for all three variables, we observe that the non-parametric density estimators constructed from the imputed data have shapes similar to those constructed from the data without missing values. Some deviations are present when the missing values in the variable extrav are imputed, where several density curves obtained for the imputed data are smoother than the corresponding blue line in the figure. The reason for this could be that the possible values of the variable extrav are integers, while the fitted model assumes a normal distribution, which is continuous.

Figure 3: Non-parametric estimator of the density of the popteach variable. The blue line corresponds to the full data, the red lines to the 20 imputed data sets.

Figure 4: Non-parametric estimator of the density of the extrav variable. The blue line corresponds to the full data, the red lines to the 20 imputed data sets.


Figure 5: Non-parametric estimator of the density of the texp variable. The blue line corresponds to the full data, the red lines to the 20 imputed data sets.

The linear mixed model is fitted to the original data set consisting of 548 observations, to the data set obtained by deleting the missing values (corresponding to the complete case analysis), and to each of the imputed data sets in the multiple imputation. Further, the parameter estimators obtained for each of the 20 imputed data sets are pooled to deliver the resulting estimators of the model parameters together with their uncertainties.

In Figures 6-8 we plot the estimated parameters obtained for each of the fitted models together with their double uncertainties (plus or minus two standard errors). The results for the full data are shown in blue, the results corresponding to the multiple imputation in green, and those for the complete case analysis in orange. Figure 6 presents the estimators of the intercept obtained for each fitted model, while the estimated slope parameters are shown in Figure 7 for extrav and in Figure 8 for texp.

In the figures, we observe that the multiple imputation produces both estimated model parameters and uncertainties that are closer to the values obtained from the full data than the ones computed in the complete case analysis. The largest difference is present in the estimators of the intercept, where the complete case analysis overestimates the intercept and provides an uncertainty that is considerably larger than the one obtained from the full data. Although the uncertainty obtained from the multiple imputation method is also larger than the one computed in the case of the full data, it is smaller than the one from the complete case analysis. Also, the intercept estimated by the multiple imputation method is closer to the one obtained from the full data. Similar results are present for the estimated slope parameters: the values obtained by applying the multiple imputation method are closer to the ones obtained from the full data. Finally, we note that the slope parameter corresponding to the variable texp seems to be underestimated by both methods dealing with missing observations in the data, while the multiple imputation method produces a larger value for the slope coefficient related to the variable extrav than the one calculated from the full data, whereas the application of the complete case analysis results in a smaller value.

Figure 6: Estimated intercept together with double uncertainty.

Figure 7: Estimated slope parameter related to the extrav variable together with double uncertainty.


Figure 8: Estimated slope parameter related to the texp variable together with double uncertainty.

6. Discussion

In this study, the imputation method from mice and the listwise deletion approach were used to treat missing data in multi-level observations. The ways in which these methods work were also investigated. Many possible approaches can be used for multiple imputation in multi-level analysis. The focus was on the fully conditional specification approach described in section 4.2, which proves to be very effective for imputing missing values in the multi-level model. The MI method with FCS is useful and gives the researcher flexibility when dealing with missing data in data sets with hierarchical structures. The advantage of using the MI method is that it imputes the missing data multiple times, analyzes each imputed data set separately, and then combines the results to obtain better estimates. This MI procedure helps to better represent the uncertainty and to maintain consistency.

Moreover, MI is a relatively fast and straightforward method to apply. It is, however, not always easy to create a good multiple imputation, since it requires a very good imputation method. In general, there is no miracle cure, and in principle it is always possible to find a better alternative than imputation. Still, the method is fast and very general (van Buuren, 2018). It also depends on the purpose; sometimes even exclusion can be a good option: if there is a lot of data and the missingness is MCAR, a complete data set can be obtained quickly.

However, according to van Buuren (2018), multiple imputation with FCS is an appropriate procedure for treating missing data at level 2 with random slopes. Moreover, according to Gelman and Hill (2007), MI methods are preferable in much research analyzing multilevel data, because if the model is specified correctly and there are enough imputed data sets, the results become accurate and efficient compared to complete case analysis, which can become complicated for missing data in multilevel observations when values are missing in many variables and there are very few complete cases. According to our results, as shown in Figure 2, only 155 out of the 548 cases in the created data set are used to fit the model in the complete case approach. Dropping many cases that have missing values affects the precision of the estimates. The use of complete case analysis in this case therefore leads to estimated parameters that are further away from those obtained from the full data set, with larger uncertainty.

In the simulation study, we compared the ability of the multiple imputation method and of the complete case analysis approach to treat missing data in multi-level observations. The evaluation of these two procedures was done by comparing the estimators of the model parameters and their uncertainties. From the plots of the non-parametric density estimators of the three variables popteach, extrav, and texp included in the model, we observe, in most of the considered cases and for all three variables, that the density estimators generated from the imputed data have shapes similar to those generated from the full data. Furthermore, the empirical results show that the multiple imputation produces estimated model parameters and uncertainties that are closer to the values obtained from the full data than the ones computed in the complete case analysis.

7. Conclusion

The treatment of missing data in multilevel observations poses many challenges for the researcher, since missing data can appear on all levels or on several levels simultaneously, which makes it more complicated than a nonhierarchical missing data problem. However, the use of a recent procedure such as MI is straightforward. It gives the researcher flexibility when dealing with missing data in a hierarchical data set. Moreover, MI gives accurate and efficient results compared to the results obtained from the complete case approach.

As a conclusion from the results of the thesis, it seems that multiple imputation is a good procedure for treating missing data in multilevel observations, for which the estimated parameters are closer to those obtained from the full data. MI replaces the multivariate missing data multiple times, where each missing value can be imputed by a separate method, which helps to better represent the uncertainty and to obtain good estimates. Moreover, the Bayesian approach described in section 4.2 is an effective method for imputing missing values in the multilevel model. It is easy to understand and, in many cases, provides imputed values close to the observed values in the data.

This thesis focuses on modern procedures for handling and analyzing missing data in multilevel observations. The topic points to many interesting future research questions and encourages the use of recent methods for handling missing observations in data with a hierarchical structure.


References

Audigier, V., White, I. R., Jolani, S., Debray, T. P., Quartagno, M., Carpenter, J., van Buuren, S., & Resche-Rigon, M. (2018). Multiple imputation for multilevel data with continuous and binary variables. Statistical Science, 33(2), 160-183.

Black, A. C., Harel, O., & McCoach, D. B. (2011). Missing data techniques for multilevel data: Implications of model misspecification. Journal of Applied Statistics.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1), 1-37.

Enders, C. K. (2010). Applied Missing Data Analysis. New York: Guilford Press.

Enders, C. K., Keller, B. T., & Levy, R. (2018). A fully conditional specification approach to multilevel imputation of categorical and continuous variables. Psychological Methods, 23, 298-317.

Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.

Grund, S., Robitzsch, A., & Lüdtke, O. (2016). mitml: Tools for multiple imputation in multilevel modeling (Version 0.3-2) [Computer software].

Grund, S., Robitzsch, A., & Lüdtke, O. (2016). Multiple imputation of multilevel missing data: An introduction to the R package pan.

Hox, J. J., van Buuren, S., & Jolani, S. (2015). Incomplete multilevel data: Problems and solutions. In Harring, J. R., Stapleton, L. M., Beretvas, S. N., & Hancock, G. R. (Eds.), Advances in multilevel modeling for educational research: Addressing practical issues found in real-world applications (pp. 37-58). Charlotte, NC: Information Age Publishing, Inc.

Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data. New York: Wiley.

Muthén, B., & Asparouhov, T. (2016). Multidimensional, multilevel, and multi-timepoint item response modeling. In van der Linden, W. J. (Ed.), Handbook of Item Response Theory.

Rubin, D. B. (1976). Inference and missing data. Biometrika, 63, 581-592.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.

Schafer, J. L., & Yucel, R. M. (2002). Computational strategies for multilevel linear mixed effect models with missing values. Journal of Computational and Graphical Statistics, 11, 437-457.

van Buuren, S. (2012). Flexible imputation of missing data. Chapman & Hall/CRC.

van Buuren, S. (2018). Flexible imputation of missing data (2nd ed.). Chapman & Hall/CRC.

van Buuren, S., & Groothuis-Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1-67.


Appendix

# load the packages

# install.packages("Matrix")      # packages for the linear mixed model
# install.packages("pan")
# install.packages("mice")
# install.packages("broom.mixed")

# open the packages
library(lme4)          # linear mixed model
require(mice)          # data imputation
require(lattice)
require(pan)
library(tidyverse)     # data manipulation
library(broom.mixed)

# random number generator, set the seed to 2020
set.seed(2020)

# load the data
con <- url("https://www.gerkovink.com/mimp/popular.RData")
load(con)
ls()

# Description of the data: number of observations and number of variables
summary(popNCR)
head(popNCR)
dim(popNCR)
str(popNCR)
tail(popNCR)

# to access the variables by name
attach(popNCR)

# For each variable in the full data set, we estimate a MAR missing data mechanism
# in the extended data set of 2000 observations (from which the full data set with
# 548 observations is obtained) by applying a logistic regression, to make the
# setup of the simulation study as realistic as possible.

# dependent variable: 1 if extrav is observed, 0 if missing
mdm_extrav <- glm(I(!is.na(extrav)) ~ texp + popteach, family = binomial)

# dependent variable: 1 if texp is observed, 0 if missing
# (I(popteach^2) gives the squared term; (popteach)^2 inside a formula would not)
mdm_texp <- glm(I(!is.na(texp)) ~ extrav + I(popteach^2), family = binomial)

# dependent variable: 1 if popteach is observed, 0 if missing
mdm_popteach <- glm(I(!is.na(popteach)) ~ texp + extrav, family = binomial)

summary(mdm_extrav)
summary(mdm_texp)
summary(mdm_popteach)

# The complete data set is obtained by complete case analysis on extrav, texp and popteach
# table(rowSums(is.na(popNCR[, c(3, 5, 7)])))
# comp_case was not defined in the original script; a natural definition (assumption):
comp_case <- as.integer(rowSums(is.na(popNCR[, c(3, 5, 7)])) == 0)


kompletta_data <- popNCR[comp_case == 1, c(1:3, 5, 7)]
any(is.na(kompletta_data$popteach))
any(is.na(kompletta_data$texp))
any(is.na(kompletta_data$extrav))

# compute the predicted probability that popteach, texp and extrav are observed
# for each of the 548 complete cases
pr_real_extrav   <- predict(mdm_extrav,   data.frame(kompletta_data), type = "response")
pr_real_texp     <- predict(mdm_texp,     data.frame(kompletta_data), type = "response")
pr_real_popteach <- predict(mdm_popteach, data.frame(kompletta_data), type = "response")

# The estimated probabilities are used to generate missing values in the complete
# data set with 548 observations, in the variables popteach, texp and extrav
# (TRUE = value stays observed, FALSE = value will be set to missing)
resp_real_popteach <- runif(length(pr_real_popteach)) < pr_real_popteach
summary(resp_real_popteach)
resp_real_texp <- runif(length(pr_real_texp)) < pr_real_texp
summary(resp_real_texp)
resp_real_extrav <- runif(length(pr_real_extrav)) < pr_real_extrav
summary(resp_real_extrav)

# Generate the data set with missing values: the selected values in popteach, texp
# and extrav are replaced by NA
popteach_real_na <- kompletta_data[, c(5)]
popteach_real_na[resp_real_popteach == FALSE] <- NA
texp_real_na <- kompletta_data[, c(4)]
texp_real_na[resp_real_texp == FALSE] <- NA
extrav_real_na <- kompletta_data[, c(3)]
extrav_real_na[resp_real_extrav == FALSE] <- NA
data_na <- data.frame(popteach_real_na, texp_real_na, extrav_real_na, kompletta_data[, c(1, 2)])
is.na(data_na$texp_real_na)
is.na(data_na$popteach_real_na)
is.na(data_na$extrav_real_na)
summary(data_na)

# The pattern of generated missing values in all three variables
data_pattern <- data_na[, c("class", "popteach_real_na", "extrav_real_na", "texp_real_na")]
names(data_pattern)[names(data_pattern) == "popteach_real_na"] <- "popteach"
names(data_pattern)[names(data_pattern) == "extrav_real_na"] <- "extrav"
names(data_pattern)[names(data_pattern) == "texp_real_na"] <- "texp"
md.pattern(data_pattern)


library(mice)

# multiple imputation with mice: imputation methods and predictor matrix
d <- data_na[, c("class", "popteach_real_na", "extrav_real_na", "texp_real_na")]
meth <- make.method(d)
meth[c("popteach_real_na", "extrav_real_na", "texp_real_na")] <- c("pmm", "pmm", "pmm")
pred <- make.predictorMatrix(d)
pred["popteach_real_na", ] <- c(-2, 0, 1, 1)
pred["extrav_real_na", ] <- c(-2, 1, 0, 1)
pred["texp_real_na", ] <- c(-2, 1, 1, 0)
# pred["class", ] <- c(-2, 1, 1, 1)
d[, "class"] <- as.integer(d[, "class"])

imp <- mice(d, pred = pred, meth = meth, m = 20, print = FALSE)
densityplot(imp, ~popteach_real_na, ylab = "Density", xlab = "popteach")
densityplot(imp, ~extrav_real_na, ylab = "Density", xlab = "extrav")
densityplot(imp, ~texp_real_na, ylab = "Density", xlab = "texp")

library(lme4)

# fit the linear mixed model to the multiply imputed data and pool the results
fit <- with(imp, lmer(popteach_real_na ~ 1 + extrav_real_na + texp_real_na + (1 | class),
                      REML = FALSE))
summary(pool(fit))
fit_sum <- summary(pool(fit))
fit_est <- fit_sum$estimate
fit_sd <- fit_sum$std.error

# fit the linear mixed model to the full data with 548 observations
fit2 <- lmer(popteach ~ 1 + extrav + texp + (1 | class), data = kompletta_data, REML = FALSE)
summary(fit2)
fit2_sum <- summary(fit2)
fit2_est <- fit2_sum$coefficients[, 1]
fit2_sd <- fit2_sum$coefficients[, 2]

# fit the linear mixed model under complete case analysis
# (lmer drops rows with missing values, so only the complete cases are used)
fit3 <- lmer(popteach_real_na ~ 1 + extrav_real_na + texp_real_na + (1 | class),
             data = data_na, REML = FALSE)
summary(fit3)
fit3_sum <- summary(fit3)
fit3_est <- fit3_sum$coefficients[, 1]
fit3_sd <- fit3_sum$coefficients[, 2]

# plot the estimated intercept for each fitted model together with double uncertainty
plot(c(1, 2, 3), c(fit2_est[1], fit_est[1], fit3_est[1]), type = "n", main = "", xlab = "",
     ylab = "Intercept", ylim = c(0, 3), xlim = c(0.8, 3.5), axes = FALSE)
segments(1, fit2_est[1] - 2 * fit2_sd[1], 1, fit2_est[1] + 2 * fit2_sd[1], col = "Blue", lwd = 3)
segments(2, fit_est[1] - 2 * fit_sd[1], 2, fit_est[1] + 2 * fit_sd[1], col = "Green", lwd = 3)
segments(3, fit3_est[1] - 2 * fit3_sd[1], 3, fit3_est[1] + 2 * fit3_sd[1], col = "Orange", lwd = 3)
points(1:3, c(fit2_est[1], fit_est[1], fit3_est[1]), pch = 18, cex = 1, col = "Red")
axis(side = 1, at = c(1, 2, 3), tick = FALSE,
     labels = c("full data", "multiple imputation", "complete case analysis"))
axis(side = 2, at = c(0, 0.5, 1, 1.5, 2, 2.5, 3), tick = TRUE, labels = c(0, 0.5, 1, 1.5, 2, 2.5, 3))

# plot the slope parameter related to the extrav variable for each fitted model
# together with double uncertainty
plot(c(1, 2, 3), c(fit2_est[2], fit_est[2], fit3_est[2]), type = "n", main = "", xlab = "",
     ylab = "extrav", ylim = c(0.3, 0.7), xlim = c(0.8, 3.5), axes = FALSE)
segments(1, fit2_est[2] - 2 * fit2_sd[2], 1, fit2_est[2] + 2 * fit2_sd[2], col = "Blue", lwd = 3)
segments(2, fit_est[2] - 2 * fit_sd[2], 2, fit_est[2] + 2 * fit_sd[2], col = "Green", lwd = 3)
segments(3, fit3_est[2] - 2 * fit3_sd[2], 3, fit3_est[2] + 2 * fit3_sd[2], col = "Orange", lwd = 3)
points(1:3, c(fit2_est[2], fit_est[2], fit3_est[2]), pch = 18, cex = 1, col = "Red")
axis(side = 1, at = c(1, 2, 3), tick = FALSE,
     labels = c("full data", "multiple imputation", "complete case analysis"))
axis(side = 2, at = c(0.3, 0.4, 0.5, 0.6, 0.7), tick = TRUE, labels = c(0.3, 0.4, 0.5, 0.6, 0.7))

# plot the slope parameter related to the texp variable for each fitted model
# together with double uncertainty
plot(c(1, 2, 3), c(fit2_est[3], fit_est[3], fit3_est[3]), type = "n", main = "", xlab = "",
     ylab = "texp", ylim = c(0.03, 0.12), xlim = c(0.8, 3.5), axes = FALSE)
segments(1, fit2_est[3] - 2 * fit2_sd[3], 1, fit2_est[3] + 2 * fit2_sd[3], col = "Blue", lwd = 3)
segments(2, fit_est[3] - 2 * fit_sd[3], 2, fit_est[3] + 2 * fit_sd[3], col = "Green", lwd = 3)
segments(3, fit3_est[3] - 2 * fit3_sd[3], 3, fit3_est[3] + 2 * fit3_sd[3], col = "Orange", lwd = 3)
points(1:3, c(fit2_est[3], fit_est[3], fit3_est[3]), pch = 18, cex = 1, col = "Red")
axis(side = 1, at = c(1, 2, 3), tick = FALSE,
     labels = c("full data", "multiple imputation", "complete case analysis"))
