
MULTI-POPULATION MORTALITY MODELS IN THE LEE-CARTER FRAMEWORK - AN EMPIRICAL EVALUATION ON SWEDEN'S 21 COUNTIES

Submitted by Christoffer Eriksson

A thesis submitted to the Department of Statistics in partial fulfillment of the requirements for a two-year Master of Arts degree

in Statistics in the Faculty of Social Sciences

Supervisor Mattias Nordin

Spring, 2020


ABSTRACT

Mortality forecasts have a wide range of applications, for example in policy planning, demographic research and actuarial science. This thesis compares the forecast accuracy of an independent model and several joint models, all of which are intuitive extensions of the widely used Lee-Carter model in a multi-population mortality framework. The joint models in my application differ in model complexity, and the potential trade-off between model complexity and forecast performance is therefore important to investigate. The aim was to see whether all joint models are better than the independent model, or whether some are better than others. The evaluation is done on Swedish county data: the models are estimated on 1969-1999 and evaluated on data from 2000-2019. The results showed that the joint models had better forecast accuracy than the independent model, and that the difference increased as the estimation period became shorter.

There was also a clear difference in forecast accuracy between the joint models for shorter estimation periods.

Keywords: Coherence, Lee-Carter, Multi-population mortality modeling, Mortality forecasting


Contents

1 Introduction
  1.1 Background
  1.2 Aim and research questions
  1.3 Outline of the paper

2 Methodology
  2.1 Notation
  2.2 Mortality models in the Lee-Carter framework
    2.2.1 The Lee-Carter model
    2.2.2 Mortality in Poisson setting
    2.2.3 Multi-population models
  2.3 Forecasting
    2.3.1 The main mortality indices $\kappa_t$ or $\kappa^{(1)}_t$
    2.3.2 Additional time-dependent components
  2.4 Model evaluation

3 Empirical application
  3.1 Data
  3.2 Method

4 Results
  4.1 In sample evaluation
  4.2 Out of sample evaluation
    4.2.1 Forecast horizon 2000-2009
    4.2.2 Forecast horizon 2010-2019
    4.2.3 Forecast horizon 2000-2019

5 Discussion

Bibliography


1 Introduction

1.1 Background

Forecasts of mortality have a wide range of applications, especially for old-age mortality. They are important for governments dealing with pension- and social-security-related policy planning and for knowing what healthcare needs there will be in the future, and mortality forecasts also play an important role in actuarial science and demography.

Lee and Carter (1992a) introduced a ground-breaking method to analyse and forecast mortality. The method has since been one of the most used methods in practical applications; it is often used as a benchmark for other mortality models, and it is a building block upon which multiple extensions have been built (Janssen 2018). The basic structure of the Lee-Carter model has gained its popularity because of its simplicity and intuitive interpretation, its forecast performance, its analytical description of the general mortality development over time, and for being a more objective method that is less dependent on subjective judgements.

Li and Lee (2005) extended the Lee-Carter method to jointly model the mortality of multiple populations, which has received a lot of attention in this research field. The main idea behind modeling populations jointly is two-fold. Firstly, joint modeling makes it possible to impose parameter structures so that forecasts of the mortality development for different populations do not diverge indefinitely over time. Secondly, if populations are closely related in different aspects (e.g. spatially, economically and in terms of common policies) they should have similarities and shared experience in their mortality development. Joint modeling could therefore be more efficient than modeling populations separately, in terms of forecast performance and parameter uncertainty, since it increases the amount of relevant information. In small populations the mortality rates are often more variable than in larger populations, and a common way to avoid the potential problems of estimating a model on small populations is to aggregate them across years, ages or with closely related populations. Jointly estimating such populations with larger ones is one example of where joint modeling could decrease the parameter uncertainty and improve the forecast stability (Alexander et al. 2017).

The main focus in the literature on joint mortality modeling between populations has been to ensure that forecasts of different populations' age-specific mortality rates do not diverge or cross over in the long run, a property called coherent forecasts. The development of methods that ensure coherent forecasts is the reason why multi-population mortality modeling has received so much attention in recent years (Janssen 2018). Coherent forecasts between closely related populations are a desirable property in many cases, but this should depend on the purpose of the mortality forecast, for example on the forecast horizon or the research question. The comparisons in the literature on multi-population mortality models have mainly been between the forecast accuracy of independent modeling and coherent modeling, where coherent mortality models have been shown to be superior in terms of forecast accuracy.

In practice, one can model mortality between populations jointly in different ways. The popular model specifications proposed by Li and Lee (2005) make strong assumptions about how similar the populations are. More flexible models have been overlooked in the literature because they do not ensure coherent forecasts. For increasing the forecast efficiency and robustness for a single population of interest, these variations can have a model structure that is equally if not more intuitive than imposing a structure that ensures coherence. A strictly coherent model that assumes the populations to be more similar than they actually are could lead to biased estimates, less flexibility or even worse forecast accuracy.

An example of populations that could be considered closely related in many aspects are the Swedish counties (län). They have many common aspects that affect mortality, e.g. public policies, demographic similarities and healthcare. But Swedish counties also differ in some of these aspects: the counties have their own political leadership and are therefore largely responsible for the healthcare in their region, and they vary considerably in population size. These examples suggest that jointly modeling the mortality in Sweden's counties could be more effective than estimating them separately, but this depends on how different they are.


1.2 Aim and research questions

Mortality models in the Lee-Carter framework are still widely used in practice, but the aspect of modeling multiple populations' mortality in different ways, as a bias-variance trade-off, is missing in the literature. Because of this, I want to see how different multi-population mortality models in this framework affect forecast performance for individual populations while still preserving a simple model structure.

The data used are mortality data for ages from 60 up to 90+ in Sweden's 21 counties. The goal of this thesis is to see how modeling the Swedish counties jointly in the Lee-Carter framework could improve the forecasts over a forecast horizon of 20 years, compared with modeling the counties independently.

The main questions I want to answer based on these data:

• Does jointly modeling the Swedish counties lead to better forecast accuracy than independent modeling?

• Are some ways of joint modeling better than others in this respect?

• How much does the length of the estimation period affect the forecast accuracy of the models?

1.3 Outline of the paper

Section 2 covers previous research and the methodology. The first part of that section, section 2.1, introduces the notation that will be used throughout the paper. The rest of section 2 gives an overview of relevant previous research, from the original Lee-Carter method to its extensions to the multi-population setting. Section 3 describes the data in detail and the application of the methods. Section 4 presents the results, and section 5 ends with my conclusions and a discussion.


2 Methodology

2.1 Notation

The measure of main interest in this thesis is the mortality rate $m_{x,t,i}$ for age $x$, year $t$ and population $i$. It measures how many died at a specific age during a year in relation to how many, on average, lived at that age during the same year in each population. It is estimated as follows:

$$m_{x,t,i} = \frac{D_{x,t,i}}{E_{x,t,i}}$$

where $t$ = year, $x$ = age or age class and $i$ = population. $D_{x,t,i}$ is the number of deaths at age (or in age class) $x$, in year $t$ and in population $i$. $E_{x,t,i}$ is the exposure-to-risk, which is the average number of persons that lived between the end of year $t-1$ and the end of year $t$, at the same age (or in the same age class) and in the same population.
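As a minimal illustration of this definition, the mortality rate is simply the element-wise ratio of a deaths array and an exposure array. The numbers below are made up for the example and are not the thesis data:

```python
import numpy as np

# Hypothetical toy data: 2 age classes x 3 years for one population.
deaths = np.array([[120.0, 115.0, 118.0],       # D_{x,t}: observed death counts
                   [310.0, 305.0, 298.0]])
exposure = np.array([[10000.0, 9800.0, 9900.0], # E_{x,t}: person-years at risk
                     [8000.0, 7900.0, 7850.0]])

# Age-specific mortality rates m_{x,t} = D_{x,t} / E_{x,t}
m = deaths / exposure
print(m[0, 0])  # 0.012
```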

2.2 Mortality models in the Lee-Carter framework

In the Lee-Carter modeling framework there are typically two steps in producing mortality forecasts: first the parameters of the mortality model are estimated, and then the time-series components of the model are modeled and projected. These models are estimated on historical mortality data only and therefore do not include assumptions about factors other than historical mortality and its development over time; for example, advances in medical science or other factors that can influence mortality are not explicitly incorporated in the models (Brouhns et al. 2002). All of the model structures presented in the upcoming sections will be estimated on a subset of the data in a Poisson setting, as described in sections 2.2.2 and 2.2.3. The estimated models will then be used for forecasting and evaluated on the rest of the data by looking at their forecast accuracy.

2.2.1 The Lee-Carter model

Lee and Carter (1992a) introduced a simple model for describing the mortality development at different ages. Their approach, now called the Lee-Carter method, was designed to model a single population and modeled the age-specific mortality rate $m_{x,t}$ as:

$$m_{x,t} = \exp(\alpha_x + \beta_x \kappa_t + \epsilon_{x,t}) \quad \text{or} \quad \log(m_{x,t}) = \alpha_x + \beta_x \kappa_t + \epsilon_{x,t}$$

where $\epsilon_{x,t} \sim N(0, \sigma^2)$. The expected value of the mortality rate is called the force of mortality and is denoted $\mu_{x,t}$:

$$E(m_{x,t}) = \mu_{x,t} = \exp(\alpha_x + \beta_x \kappa_t)$$

This model describes the force of mortality as a function of an age-specific constant ($\alpha_x$) and the product of a time-varying mortality index ($\kappa_t$), which reflects how the general mortality level changes over time, with an age-specific constant ($\beta_x$) that reflects how the mortality rates at different ages are affected by changes in the mortality index. The observed mortality rate is assumed to be the force of mortality with the error term $\epsilon_{x,t}$. The constants $\alpha_x$ and $\beta_x$ are assumed to be fixed over time (i.e. time-invariant). The parametrisation, regardless of how it is estimated, is not unique and therefore not directly identified. This can be illustrated by the following equation:

$$\log(\mu_{x,t}) = \alpha_x + (\beta_x c) \cdot \frac{\kappa_t}{c}$$

There is an infinite number of possible solutions to this equation, based on different values of $c$. To have consistently comparable estimates one needs to impose identification constraints, and the standard choice for the Lee-Carter model is:

$$\sum_x \beta_x = 1 \quad \text{and} \quad \sum_t \kappa_t = 0$$

Because of the absence of regressors, the model cannot be fitted with ordinary regression methods such as ordinary least squares (OLS). Lee and Carter used singular value decomposition (SVD), a dimensionality-reduction technique commonly used in many fields. To estimate the model they first used the above constraints to find the least squares solution for $\alpha_x$, which becomes:

$$\hat{\alpha}_x = \frac{1}{T} \sum_t \log(m_{x,t})$$

They then decomposed the centered log-mortality matrix $\{\log(m_{x,t}) - \hat{\alpha}_x\} = A_{(X \times T)}$, with ages in rows and years in columns:

$$A = U \Sigma V' = \sum_{i=1}^{r} \sigma_i u_i v_i'$$

where $U_{(X \times X)}$ and $V_{(T \times T)}$ are orthogonal matrices, $\Sigma_{(X \times T)}$ is a rectangular diagonal matrix whose diagonal contains the singular values of $A$ ($\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r \geq 0$), and $r$ is the rank of $A$. To obtain estimates of $\beta_x$ and $\kappa_t$ for the model, only the first rank-one term is used:

$$A \approx \hat{\sigma}_1 u_1 v_1' = \hat{\beta}_{(X \times 1)}\,\hat{\kappa}'_{(1 \times T)}$$
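The estimation steps above (row means for $\hat{\alpha}_x$, then a rank-one SVD of the centered matrix) can be sketched in a few lines of NumPy. This is an illustrative sketch on simulated data, not the thesis code; all parameter values are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: simulate a small age-by-year matrix of log mortality rates.
X, T = 7, 31                                  # number of age classes and years
alpha_true = np.linspace(-5.0, -2.0, X)       # log-level by age
beta_true = np.full(X, 1.0 / X)               # age sensitivities, sum to 1
kappa_true = np.linspace(5.0, -5.0, T)        # declining mortality index, sums to 0
log_m = alpha_true[:, None] + beta_true[:, None] * kappa_true[None, :]
log_m += rng.normal(scale=0.01, size=(X, T))  # small observation noise

# Step 1: alpha_x is the row mean of log(m_{x,t}) over the fitting years.
alpha_hat = log_m.mean(axis=1)

# Step 2: SVD of the centered matrix A (ages in rows, years in columns).
A = log_m - alpha_hat[:, None]
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Step 3: rank-one term, rescaled so that sum(beta) = 1.
# Because the rows of A are centered, sum(kappa) = 0 holds automatically.
beta_hat = U[:, 0] / U[:, 0].sum()
kappa_hat = s[0] * Vt[0, :] * U[:, 0].sum()
```

The rescaling in the last step absorbs the arbitrary sign of the singular vectors, so the identification constraints hold regardless of the sign convention of the SVD routine.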

With this estimation approach, an underlying assumption is that the errors of the logarithm of the mortality rate are normally distributed with constant variance (homoscedastic). Brouhns et al. (2002) argued that the logarithm of the mortality rate is more variable at older ages, where the exposure-to-risk is smaller than at younger ages, which makes the assumption of homoscedastic errors invalid.

2.2.2 Mortality in Poisson setting

An important contribution to handle the problem of assumed homoscedastic errors was made by Brouhns et al. (2002), who modeled the number of deaths as a Poisson variable. The parameter of the Poisson distribution is the exposure-to-risk times the force of mortality:

$$D_{x,t} \sim \text{Poisson}(E_{x,t}\,\mu_{x,t}) \quad (1)$$

Brouhns et al. modeled the force of mortality as Lee and Carter did, but as part of the Poisson distribution in equation (1):

$$\mu_{x,t} = \exp(\alpha_x + \beta_x \kappa_t) \quad \text{or} \quad \log(\mu_{x,t}) = \alpha_x + \beta_x \kappa_t \quad (2)$$

They argued that modeling the number of deaths as a Poisson variable is intuitively more acceptable because the number of deaths is a positive discrete value, the Poisson distribution is well suited for mortality analyses, and it overcomes the problem of assuming homoscedastic errors.

Estimation is done with maximum likelihood estimation (MLE) by maximising the log-likelihood function:

$$L(\theta) = \sum_{x,t} \Big[ D_{x,t} \log\big(E_{x,t}\,\mu_{x,t}(\theta)\big) - E_{x,t}\,\mu_{x,t}(\theta) - \log(D_{x,t}!) \Big]$$

They did this iteratively using the Newton-Raphson algorithm, updating first $\alpha$, then $\beta$ and lastly $\kappa$:

$$\hat{\theta}^{(v+1)} = \hat{\theta}^{(v)} - \frac{\partial L(\hat{\theta}^{(v)})/\partial \theta}{\partial^2 L(\hat{\theta}^{(v)})/\partial \theta^2}$$
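To make the updating scheme concrete, here is a minimal sketch of the elementwise Newton-Raphson updates on simulated data. It is my own illustration under the assumptions stated in the comments (starting values and the number of sweeps are arbitrary choices), not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_poisson_lee_carter(D, E, n_iter=200):
    """Elementwise Newton-Raphson fitting of the Poisson Lee-Carter model,
    updating alpha, then beta, then kappa in each sweep, and re-imposing
    the constraints sum(beta) = 1 and sum(kappa) = 0 after each sweep."""
    X, T = D.shape
    alpha = np.log(D.sum(axis=1) / E.sum(axis=1))  # crude starting values
    beta = np.full(X, 1.0 / X)
    kappa = np.linspace(1.0, -1.0, T)              # non-degenerate start
    for _ in range(n_iter):
        # Each update is score / (-Hessian) for one parameter block.
        Dhat = E * np.exp(alpha[:, None] + np.outer(beta, kappa))
        alpha += (D - Dhat).sum(axis=1) / Dhat.sum(axis=1)
        Dhat = E * np.exp(alpha[:, None] + np.outer(beta, kappa))
        beta += ((D - Dhat) * kappa).sum(axis=1) / (Dhat * kappa**2).sum(axis=1)
        Dhat = E * np.exp(alpha[:, None] + np.outer(beta, kappa))
        kappa += ((D - Dhat) * beta[:, None]).sum(axis=0) / \
                 (Dhat * beta[:, None] ** 2).sum(axis=0)
        kappa -= kappa.mean()                      # sum(kappa) = 0
        s = beta.sum()                             # sum(beta) = 1
        beta, kappa = beta / s, kappa * s
    return alpha, beta, kappa

# Simulate deaths from known parameters, then recover them.
X, T = 7, 31
alpha_true = np.linspace(-5.0, -2.5, X)
beta_true = np.full(X, 1.0 / X)
kappa_true = np.linspace(4.0, -4.0, T)
E = np.full((X, T), 1e5)
D = rng.poisson(E * np.exp(alpha_true[:, None] + np.outer(beta_true, kappa_true)))

alpha_hat, beta_hat, kappa_hat = fit_poisson_lee_carter(D.astype(float), E)
```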


Assuming that the number of deaths follows a Poisson distribution, as in equation (1), implies that the expected value and the variance are equal:

$$E(D_{x,t}) = E_{x,t}\,\mu_{x,t} \quad \text{and} \quad \text{Var}(D_{x,t}) = E_{x,t}\,\mu_{x,t}$$

In practical applications, for example the analysis made by Cairns et al. (2009), significant signs of over-dispersion have been noticed (i.e. the variance is greater than the expected value). In these cases the variance is more properly described as:

$$\text{Var}(D_{x,t}) = \phi\,E_{x,t}\,\mu_{x,t} \quad \text{where } \phi > 1$$

To handle the problem of over-dispersion one can either use other distributional assumptions, such as the negative binomial distribution, or keep the Poisson distribution and approximate the over-dispersion from the deviance function (Wong et al. 2020). Because the Poisson distribution is an intuitive choice for modeling the number of deaths, it is still the standard distributional choice in the literature, especially for multi-population models.

2.2.3 Multi-population models

There are many reasons to believe that closely related populations (in terms of spatial, economic and social distance) have a lot in common over time when it comes to mortality and its development. Modeling such populations jointly could increase the amount of relevant information, decrease the parameter uncertainty, and thus be a more efficient way of estimating than modeling them separately.

An important property in the research on multi-population mortality forecasts is called coherence. It means that the differences between populations' age-specific mortality do not increase indefinitely over time in the forecasts. This property is reasonable from both a theoretical and a practical perspective in many cases. For example, there has been a convergence between men's and women's mortality levels in most countries for some time. Independently forecasting men's and women's mortality development could lead to an indefinitely increasing or decreasing ratio between the sexes' age-specific mortality, which is highly unlikely. It is more probable that the ratio of the age-specific mortality levels between the sexes approaches a constant in the long run:

$$\frac{m_{x,t,\text{Man}}}{m_{x,t,\text{Woman}}} \rightarrow R_x \quad \text{as } t \rightarrow \infty$$


The idea is the same in cases with more than two populations. Divergent differences in mortality levels, which lead to an increasing or decreasing ratio between different populations over a long horizon, are possible but in many cases not plausible (Li et al. 2016).

The models I will present have model structures similar to the original Lee-Carter model. Their core assumption is that the logarithm of the age-specific force of mortality can be described by an age-specific intercept plus the product of an age-specific sensitivity parameter and a mortality index. The difference between them lies in which of these parameters they assume to be common to all populations. Some of the models also include additional factors to allow for more flexibility. Because of these assumptions, similar to the original Lee-Carter model, they are referred to as models in the Lee-Carter framework.

Lee and Carter (1992b) estimated a model using SVD that has been called the Joint-$\kappa$ model, where they modeled the force of mortality for US men and women as:

$$\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x,i}\,\kappa_t \quad (3)$$

In this model the populations (sexes) are assumed to share a common trend $\kappa_t$, but are at the same time allowed to differ through the population-specific terms $\alpha_{x,i}$ and $\beta_{x,i}$. This can be a reasonable assumption, but because of the different age-specific sensitivities ($\beta_{x,i}$) between the populations, forecasts based on this structure are not coherent. Lee and Carter argued that there were still benefits of modeling a common trend in this way: it can be more effective to work with a single $\kappa$, and it is a more parsimonious way to deal with the assumed connection between the populations than using more complicated time-series models to capture a potential cointegrated relationship.

Li and Lee (2005) introduced an approach to jointly model mortality between populations that ensures coherent forecasts. Their approach has been described as a trend-setter in this research field (Janssen 2018). They extended the original Lee-Carter method by introducing a model, estimated with SVD, that they called the Common Factor Model (CFM), which models the force of mortality as:

$$\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta^{(1)}_{x}\,\kappa^{(1)}_{t} \quad (4)$$


The use of the common factor $\beta^{(1)}_{x}\kappa^{(1)}_{t}$ ensures coherent forecasts, and the population-specific term $\alpha_{x,i}$ allows for a difference in the age-specific level of the force of mortality between the populations. To allow for short-run deviations from the common trend they introduced the Augmented Common Factor Model (ACFM):

$$\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta^{(1)}_{x}\,\kappa^{(1)}_{t} + \beta^{(2)}_{x,i}\,\kappa^{(2)}_{t,i} \quad (5)$$

$\beta^{(1)}_{x}\kappa^{(1)}_{t}$ is the common long-term trend in the force of mortality, while $\beta^{(2)}_{x,i}\kappa^{(2)}_{t,i}$ is a population-specific factor that allows the populations to differ from the common main trend. As long as the population-specific time components $\kappa^{(2)}_{t,i}$ are modeled so that they approach a constant, the forecasts will be coherent in the long run, while the populations' age-specific force of mortality can differ from the common trend, and between populations, in the short run. Li (2013) applied this modeling strategy in the Poisson framework and estimated it in a hierarchical way by conditional maximum likelihood¹.

Kleinow (2015) argued for the usefulness of a model he called the Common Age Effect (CAE) model, estimated with SVD, where the age effect ($\beta_x$) is assumed to be common to all populations:

$$\log(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}\,\kappa_{t,i} \quad (6)$$

He did not use it for forecasting, but to see how important population-specific sensitivity parameters ($\beta_{x,i}$) were compared to assuming that they are the same ($\beta_x$) for the populations he looked at. The model does not necessarily ensure coherent forecasts, but if the sensitivity parameters are close between the populations it could be a more consistent and efficient way of forecasting compared to individual models, since the estimation of population-specific sensitivity parameters is avoided.

Li et al. (2016) suggested that one could use the CAE structure as an additional factor in the Poisson-CFM:

$$\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta^{(1)}_{x}\,\kappa^{(1)}_{t} + \beta^{(2)}_{x}\,\kappa^{(2)}_{t,i} \quad \text{where } D_{x,i,t} \sim \text{Poisson}(E_{x,i,t}\,\mu_{x,i,t}) \quad (7)$$

This decreases the number of estimated parameters compared to the Poisson-ACFM, and if the $\beta^{(2)}_{x,i}$ are similar between the populations, this is a more efficient modeling approach.

¹ They first estimated the CFM in the Poisson setting, treated the estimated parameters as fixed quantities, and then estimated the additional parameters given the previously estimated parameters.


All of the models I have discussed so far assume a time-invariant age pattern through the time-independent parameters $\alpha$ and $\beta$. This assumption has been shown not to hold over long periods, because of larger structural changes in mortality that affect ages differently (Li and Li 2017). Some researchers have noticed a very subtle but constant change in the age pattern, and not taking this into account in long forecasts can make the forecasts questionable. Over a shorter fitting period and forecast horizon, the estimated age pattern can be approximately valid, as it captures the latest systematic changes, but a too short fitting period can give a less robust and therefore unreliable age pattern because of higher parameter uncertainty (Li et al. 2013).

2.3 Forecasting

To make forecasts, one models and projects the time-dependent components and then incorporates the predicted values into the estimated mortality model, holding the time-invariant parameters fixed. To illustrate, the model in equation (2) would be forecasted as:

$$\hat{m}_{x,t+1} = \hat{\mu}_{x,t+1} = \exp(\hat{\alpha}_x + \hat{\beta}_x\,\hat{\kappa}_{t+1})$$

2.3.1 The main mortality indices $\kappa_t$ or $\kappa^{(1)}_t$

In practical applications, the main mortality indices for single populations have, since Lee and Carter (1992a), often been modeled as a random walk with drift (RWD):

$$\kappa_t = c + \kappa_{t-1} + \epsilon_t$$

The reason for this is that it is easily interpreted and captures the most important feature of the mortality index, since the estimated $\kappa$'s are often linear over time. Lee and Carter motivated the choice by inspection of the autocorrelation function (ACF) and the partial autocorrelation function (PACF). Different choices of autoregressive integrated moving average (ARIMA) models with drift have therefore been suggested as an option, and have been used where they seem to describe the main mortality indices better.
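Fitting and forecasting the RWD reduces to estimating the drift as the mean first difference of the index. A minimal sketch (the index values below are made up for the example, not estimated from the thesis data):

```python
import numpy as np

def rwd_forecast(kappa, horizon):
    """Point forecasts of a random walk with drift, kappa_t = c + kappa_{t-1} + eps_t.
    The drift c is estimated as the mean of the first differences."""
    kappa = np.asarray(kappa, dtype=float)
    c = np.diff(kappa).mean()             # drift estimate
    steps = np.arange(1, horizon + 1)
    return kappa[-1] + c * steps          # last observed value plus h * drift

# Toy example with a hypothetical, roughly linear mortality index.
kappa = np.array([10.0, 8.5, 7.2, 5.9, 4.1, 2.8, 1.5])
print(rwd_forecast(kappa, 3))
```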

Modeling the main mortality indices in this way carries an underlying assumption of linearity in the mortality index. If the main mortality indices are not linear during the fitting period, forecasts based on an RWD or some other linear time-series model tend to be biased and inaccurate (Li et al. 2011). This fact, together with inspection of mortality data over longer periods, has led many researchers to question these commonly assumed data-generating processes for the mortality indices (Börger and Schupp 2018).

Li et al. (2011) suggested that it is more appropriate to assume that the mortality index follows a piecewise linear trend process with random changes in time, which can be exemplified as follows:

$$\kappa_t = \alpha_1 + \beta_1 t + (\alpha_2 - \alpha_1)\,\Gamma_t(T) + (\beta_2 - \beta_1)\,\Psi_t(T) + e_t$$

$$\Gamma_t(T) = \begin{cases} 1 & \text{if } t > T \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad \Psi_t(T) = \begin{cases} t & \text{if } t > T \\ 0 & \text{otherwise} \end{cases}$$

These random changes in the slope and intercept of the mortality index in the Lee-Carter model can be seen as structural breaks, for example sudden advances in medical science, epidemics or catastrophes. Such changes can affect both the mortality indices and the shape of the age pattern (Li et al. 2011).

To increase the validity of the forecasts, the recommendation is to estimate and forecast the model on the most recent and longest time period where the assumptions of linearity and time-invariant parameters seem to hold. If one has no theoretical reason to believe in structural changes during the forecast horizon of interest, extrapolating the most recent linear trend is probably the best approximation of the future trend (Börger and Schupp 2018).

Different tests and algorithms have been proposed for finding the optimal fitting period based on the assumption of linearity and the validity of time-invariant parameters (see Booth et al. (2002) or Zivot and Andrews (2002)).

In practice, for example in the analyses made by Li and Lee (2005), Li (2013) and Li et al. (2016), it is very common that the only assumption checked is the linearity assumption, either by visual inspection or with statistical tests (Li and Li 2017).

In a Swedish context, Lundström and Qvist (2004) examined both of these assumptions for men and women at the national aggregated level during the 20th century, based on the Lee-Carter model. They suggested that an estimation period from 1980 onwards would be valid with respect to the assumptions of linearity and time-invariance for both men and women.


Jointly modeling more than two population-specific main mortality indices is uncommon in the literature; the state of the art is to use model structures like the CFM or the ACFM in these cases. In the CAE model, one approach could be to model the population-specific mortality indices as a multivariate random walk with drift:

$$\boldsymbol{\kappa}_t = \mathbf{c} + \boldsymbol{\kappa}_{t-1} + \boldsymbol{\epsilon}_t \quad \text{where } \boldsymbol{\epsilon}_t \sim N(\mathbf{0}, \Omega)$$

This is a multivariate extension of the standard assumptions made when modeling the main mortality indices for single populations. In practice it would preserve the simplicity, ease the interpretation of the models and capture potential correlations in the errors. As in the univariate case, this could be extended to the vector version of ARIMA, VARIMA.

An underlying assumption of the coherent mortality models I have presented is that the populations share a common trend. A second, more theoretically reasonable approach would therefore be to model the multiple mortality indices as cointegrated time series, which was suggested but not explored by Lee and Carter (1992b). This would assume that each mortality index individually is a non-stationary process with a unit root, but that one or more linear combinations of them ($A'\boldsymbol{\kappa}_t$) are stationary because of shared stochastic trends. Cointegrated time-series models could be a theoretically valid way of handling the multivariate mortality indices, but could make the interpretation and the model specification overly complex in this application.

I will not explore this possibility for the following reasons:

• There is no standard way of modeling multiple ($i > 2$) mortality indices in the literature.

• To keep the models easy to interpret.

• The number of mortality indices is large in relation to their length.

• To be consistent with how the mortality indices are modeled in the other models.

• In the CAE structure, the main use of that model in my application is as a way of decreasing the uncertainty around the sensitivity parameters $\beta_x$.


For a discussion of the possibilities and pitfalls of using cointegrated time-series models in the Lee-Carter framework, see Jarner and Jallbjørn (2020).

2.3.2 Additional time-dependent components

When additional time components are added to the model, as in the ACFM, the standard procedure is to model them as independent autoregressive processes of order 1 (AR(1)):

$$\kappa^{(2)}_{t,i} = \mu_i + \phi_{1,i}\,\kappa^{(2)}_{t-1,i} + e_{i,t} \quad \text{where } e_{i,t} \sim N(0, \sigma^2_i)$$

As long as $|\phi_{1,i}| < 1$ the process is stationary, and projected values of $\kappa^{(2)}_{t,i}$ will move towards a constant over time, which ensures coherent forecasts in the long run. If the process is not stationary it is often estimated as a random walk:

$$\kappa^{(2)}_{t,i} = \kappa^{(2)}_{t-1,i} + e_{i,t} \quad \text{where } e_{i,t} \sim N(0, \sigma^2_i)$$

This also ensures coherence, since the expected value of $\kappa^{(2)}_{t+1,i}$ is $\kappa^{(2)}_{t,i}$.

Higher-order AR($p$) processes are also used where they seem to describe these time series better. If the goal is coherent forecasts, one could also use other mean-reverting time-series models.
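The mean-reverting behaviour of the AR(1) projection can be sketched as follows. This is an illustration with made-up numbers; the OLS fit via `np.polyfit` is a simple stand-in for however the thesis estimates the AR(1):

```python
import numpy as np

def ar1_fit_project(k2, horizon):
    """Fit kappa2_t = mu + phi * kappa2_{t-1} + e_t by OLS and project it
    forward. With |phi| < 1 the projected path decays towards the
    long-run mean mu / (1 - phi), which is what gives coherence."""
    k2 = np.asarray(k2, dtype=float)
    phi, mu = np.polyfit(k2[:-1], k2[1:], 1)   # slope = phi, intercept = mu
    out, last = [], k2[-1]
    for _ in range(horizon):
        last = mu + phi * last                 # conditional expectation
        out.append(last)
    return np.array(out)

# Toy example: a hypothetical mean-reverting population-specific deviation.
k2 = np.array([1.8, 1.2, 1.1, 0.6, 0.7, 0.3, 0.35, 0.1])
proj = ar1_fit_project(k2, 5)
```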

2.4 Model evaluation

The goal in these contexts is often a model that generalises well and is parsimonious as well as interpretable. One tries to avoid over-parametrisation as long as the final model provides an adequate description of the main features of the data (Li 2013).

Increasing the number of parameters in a model will always make the model fit the data it is estimated on better, but not necessarily generalise better to new data, i.e. over-fitting. To find this balance between in-sample fit and model complexity, and to select models, information criteria are often used in the literature. The most commonly used measures are the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC):

$$BIC = -2\,LL + n_p \ln(n_d) \qquad AIC = -2\,LL + 2\,n_p$$

$LL$ is the estimated log-likelihood, $n_p$ is the effective number of parameters (the number of parameters minus the number of identification constraints), and $n_d$ is the number of observations.

Lower values of these measures are preferred: the increase in likelihood is balanced against a penalty for including more parameters. These criteria are well suited when using maximum likelihood and make it possible to compare models that are not necessarily nested (Cairns et al. 2009). BIC penalises the inclusion of more parameters harder than AIC, so the two measures can give different results; both measures suggesting the same model is a stronger indication for model selection.
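The two criteria are one-liners given a fitted log-likelihood. The numbers in the example are hypothetical (chosen so that 4410 matches 21 counties times 7 age classes times 30 years), not results from the thesis:

```python
import math

def aic_bic(log_lik, n_params, n_obs):
    """Information criteria as defined above; n_params should be the
    effective number of parameters (after identification constraints)."""
    aic = -2.0 * log_lik + 2.0 * n_params
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, bic

# Hypothetical comparison: a joint model with fewer parameters but a
# slightly lower log-likelihood than an independent model.
print(aic_bic(-5000.0, 120, 4410))   # independent-style model
print(aic_bic(-5020.0, 60, 4410))    # joint-style model
```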

A standard procedure is also to look at plots of the residuals over the fitting period against year and age for each population, which can reveal systematic errors in the models. A common choice is standardized deviance residuals, whose definition can be found in the Appendix.

The most commonly used measure for evaluating out-of-sample prediction accuracy in mortality modeling is the mean absolute percentage error (MAPE) between the forecasted mortality rate and the actual mortality rate. It is defined as:

$$MAPE_i = \frac{1}{N_i} \sum_{x,t} \left| \frac{\hat{m}_{x,i,t} - m_{x,i,t}}{m_{x,i,t}} \right|$$

where $N_i$ is the number of observations in each population, i.e. the number of ages or age classes times the number of years. In my application I will evaluate the models by looking at the information criteria to see if there is any clear indication of a best model in terms of in-sample fit versus model complexity. I will also briefly look at the standardized deviance residuals as described above. The models presented share the same core assumption about how mortality can be described and are therefore expected to have similar residual plots; since the main purpose of this thesis is forecast accuracy, I will only use the residual analysis to see whether the models differ from one another to any larger extent in this respect. The most important part of this study, given my research questions, is the out-of-sample forecast evaluation. When evaluating the forecast accuracy I will therefore plot the MAPE value for each county and each model, to illustrate the potential gains in efficiency between the models more clearly. The mean of the MAPE will be presented for each model, as is common practice.
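The MAPE above, averaged over all ages and years of one population, can be sketched directly from the definition. The rates below are made up for the example:

```python
import numpy as np

def mape(m_hat, m_obs):
    """Mean absolute percentage error over all ages and years for one
    population, following the definition above (returned as a fraction)."""
    m_hat, m_obs = np.asarray(m_hat, float), np.asarray(m_obs, float)
    return np.abs((m_hat - m_obs) / m_obs).mean()

# Toy 2x2 (age x year) example with hypothetical observed and forecasted rates.
m_obs = np.array([[0.010, 0.011], [0.040, 0.042]])
m_hat = np.array([[0.009, 0.012], [0.042, 0.040]])
print(mape(m_hat, m_obs))
```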

There is potential skewness in the MAPE values, due to the variability in mortality that comes from the differences in population size and because MAPE ranges from zero to infinity. Mean values are affected by skewness and can be distorted by extreme values. I will therefore also present the median as an average measure which is less dependent on extreme values and therefore shows another relevant aspect of the average forecasts. To illustrate the spread of the MAPE values, the minimum and maximum MAPE value for each model will be presented.

The forecast accuracy will be evaluated on the full evaluation period (2000-2019), on an earlier period (2000-2009) and a later period (2010-2019), to see if the accuracy of the models depends on the forecast horizon.


3 Empirical application

3.1 Data

The data for this application is the number of deaths and the exposure to risk for Sweden's 21 counties. The data comes from Statistics Sweden's statistical database and covers the years 1969-2019 (SCB 2020). I look at the ages from 60 to 90+ in five-year age classes. Old-age mortality is often more interesting and has more policy relevance (e.g. in pension policy making) than mortality at younger ages. The choice of five-year age classes is to decrease the difference in the number of parameters between the models to make them more comparable, to avoid unnecessary noise from estimating in the presence of small populations, and because it is probably more interesting for policy-making to look at broader age classes instead of specific ages.

To be able to generalize the results for practical use I will estimate the models under two periods:

• During the full period (1969-1999).

• During 1980-1999. This is the estimation period suggested by Lundström and Qvist's (2004) analysis, based on the assumptions of linearity and time-invariant parameters. Their analysis was made on the aggregated level, but the assumptions should be approximately valid at the county level.

The reason for using two estimation periods is to see how the average accuracy and the distribution of the MAPE values depend on the length of the fitting period, since there is a bias-variance trade-off between shorter and longer estimation periods. A longer fitting period decreases the parameter uncertainty but could miss changes in the age patterns or structural breaks in the mortality indices, and therefore give more biased forecasts. Restricting the fitting period to where linearity and a time-invariant age pattern are valid could decrease the bias in the models, but makes the fitting period shorter, which increases the parameter uncertainty.

The linearity assumption, based on the independent model for each county, will be checked visually and discussed. The aim is not to produce the most valid forecasts but to see how well the models perform with respect to the length of the fitting period. It is nevertheless important to be aware of the validity of the main assumptions when the goal is to have reliable forecasts. If the results based on these two fitting periods lead to significantly different conclusions, this could be useful for practical applications.

The forecast accuracy will be evaluated between the years 2000-2009, 2010-2019 and 2000-2019.

3.2 Method

All models will be estimated by assuming that the number of deaths follows a Poisson distribution:

$$ D_{x,i,t} \sim Poisson(E_{x,i,t}\,\mu_{x,i,t}) $$

Since I am interested in the point forecasts I will not construct prediction intervals. The models are estimated with maximum likelihood using the iterative Newton-Raphson method, as described in section 2.2.2, until the change in the log-likelihood is smaller than $10^{-6}$, which is a common choice. The formula for the log-likelihood can be seen in the Appendix. Models that have additional factors will be estimated with conditional maximum likelihood, as described in the same section. Initial parameter values were randomly generated from a standard normal distribution. After estimating the parameters, the identification constraints that can be seen in the Appendix were imposed.

There were no convergence problems during the estimations.

All calculations were done in R (R Core Team 2019).

Table 1: Evaluated models

| Model (Structure) | Structure | Ensured coherent | Jointly |
|---|---|---|---|
| M1 (Individually) | $\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x,i}\kappa_{t,i}$ | No | No |
| M2 (Joint-κ) | $\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x,i}\kappa_{t}$ | No | Yes |
| M3 (CFM) | $\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}^{(1)}\kappa_{t}^{(1)}$ | Yes | Yes |
| M4 (ACFM) | $\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}^{(1)}\kappa_{t}^{(1)} + \beta_{x,i}^{(2)}\kappa_{t,i}^{(2)}$ | Yes¹ | Yes |
| M5 (CFM+CAE) | $\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}^{(1)}\kappa_{t}^{(1)} + \beta_{x}^{(2)}\kappa_{t,i}^{(2)}$ | Yes¹ | Yes |
| M6 (CAE) | $\ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}\kappa_{t,i}$ | No | Yes |

¹ Coherence is ensured because the additional time-dependent components are estimated as independent AR(1)-processes.


The models that I will evaluate can be seen in Table 1. These models have the same parameter structures as the models presented in section 2. They seem to be natural ways of incorporating information from closely related populations while still preserving simple model structures. Even though the joint-κ structure and the CAE structure do not necessarily ensure coherent forecasts, they can still potentially be used in practice as a way of increasing the relevant information for single-population mortality forecasting. In this way they can be seen as a compromise between estimating different sub-populations independently and models that impose a coherent model structure.

A short description of the models and what they assume about the mortality development in each county:

• M1 is independently modeled counties with a parameter structure as in the original Lee-Carter model (equation 2):

$$ \ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x,i}\,\kappa_{t,i} $$

This assumes that the age-specific intercepts, the age-specific sensitivity parameters and the main mortality indices are unique for each county.

• M2 has the structure of the joint-κ model (equation 3):

$$ \ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x,i}\,\kappa_{t} $$

This assumes that the age-specific intercepts and the age-specific sensitivity parameters are unique for each county while the counties share a main mortality index.

• M3 has the structure of the CFM (equation 4):

$$ \ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}^{(1)}\,\kappa_{t}^{(1)} $$

This assumes that the age-specific intercepts are unique for each county while the counties share the age-specific sensitivity parameters and a main mortality index.

• M4 has the structure of the ACFM (equation 5):

$$ \ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}^{(1)}\,\kappa_{t}^{(1)} + \beta_{x,i}^{(2)}\,\kappa_{t,i}^{(2)} $$

This assumes the same as M3, but also that each county has additional unique age-specific sensitivity parameters and a potential short-run deviation from the trend.

• M5 has the structure of the ACFM but where the additional factors have the form of the CAE model (equation 7):

$$ \ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}^{(1)}\,\kappa_{t}^{(1)} + \beta_{x}^{(2)}\,\kappa_{t,i}^{(2)} $$

This assumes the same as M3, but also that each county has an additional unique potential short-run deviation from the trend, with age-specific sensitivity parameters to these time-dependent factors that are common across counties.


• M6 has the structure of the CAE model (equation 6):

$$ \ln(\mu_{x,i,t}) = \alpha_{x,i} + \beta_{x}\,\kappa_{t,i} $$

This assumes that the age-specific intercepts and the main mortality indices are unique for each county while the counties share the age-specific sensitivity parameters.

After estimating the models, the BIC and AIC values are calculated and ranked. In line with the purpose of this thesis, M1 is independently modeled for each county and will therefore not be evaluated based on these measures. A brief visual inspection of the scaled deviance residuals is then done. Because of the large number of residual plots, I will only show such plots if the models differ to any larger extent from one another in this aspect. Thereafter I will estimate the main mortality indices as univariate RWDs, even for models where there exist multiple mortality indices (M1 and M6). The reason is that I am only interested in the point forecasts, and the estimated drift terms will be the same as if I had modeled them as a multivariate random walk with drift. RWD is chosen to be consistent with the literature, as it is the standard assumption in practical applications. The additional time-dependent parameters $\kappa^{(2)}$ (in M4 and M5) will be estimated as independent AR(1)-processes, which is also consistent with the literature and practical standards. Choosing different time-series processes for different models could make the comparability questionable.

The time series are forecast to 2019 and the MAPE is then calculated for 2000-2009, 2010-2019 and the full period (2000-2019).
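As a sketch of the forecasting step (again in Python for illustration; the thesis uses R), the RWD point forecast only needs the estimated drift, whose maximum-likelihood estimate is the mean of the first differences, and the AR(1) point forecast is obtained recursively. The toy index below is an assumption for illustration:

```python
def rwd_forecast(kappa, h):
    """Point forecasts of a random walk with drift, h steps ahead.

    The ML estimate of the drift is the mean of the first differences,
    i.e. (kappa_T - kappa_1) / (T - 1)."""
    drift = (kappa[-1] - kappa[0]) / (len(kappa) - 1)
    return [kappa[-1] + drift * step for step in range(1, h + 1)]

def ar1_forecast(y, h):
    """Point forecasts of an AR(1) with intercept, fitted by OLS of
    y_t on y_{t-1}, iterated h steps ahead."""
    x, z = y[:-1], y[1:]
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    phi = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
           / sum((a - mx) ** 2 for a in x))
    c = mz - phi * mx
    out, last = [], y[-1]
    for _ in range(h):
        last = c + phi * last
        out.append(last)
    return out

kappa = [10.0, 8.5, 7.0, 5.5, 4.0]  # a linearly declining toy index
print(rwd_forecast(kappa, 3))  # -> [2.5, 1.0, -0.5]
```

Because only the drift enters the RWD point forecast, univariate and multivariate RWD give the same point forecasts, which is the reason stated above for not modeling the indices jointly.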


4 Results

4.1 In sample evaluation

Table 2: BIC & AIC ranking for the joint models (left: fitting period 1980-1999, right: 1969-1999)

| Model (Structure) | BIC (rank) | AIC (rank) | $n_p$ | BIC (rank) | AIC (rank) | $n_p$ |
|---|---|---|---|---|---|---|
| M2 (Joint-κ) | 28467.85 (2) | 26600.17 (2) | 312 | 43129.76 (2) | 41054.67 (1) | 323 |
| M3 (CFM) | 27655.17 (1) | 26625.55 (3) | 172 | 42400.36 (1) | 41224.69 (3) | 183 |
| M4 (ACFM) | 30762.59 (5) | 26590.23 (1) | 697 | 47148.88 (5) | 41116.35 (2) | 939 |
| M5 (CFM+CAE) | 30175.60 (4) | 26721.59 (4) | 577 | 46547.25 (4) | 41285.65 (4) | 819 |
| M6 (CAE) | 30065.97 (3) | 26761.61 (5) | 552 | 46318.65 (3) | 41288.33 (5) | 783 |

M1 is not included since it is estimated independently on each county.

There was no obvious difference between the models' residual plots, so they will not be shown. From the ranking based on the information criteria (Table 2) we can see that the BIC rankings follow the number of effective parameters in the models (from smallest to largest) and therefore suggest M3 (CFM) for model selection. The AIC rankings differ between the number of effective parameters and the length of the fitting period. One model that ranks well on both measures and both fitting periods is M2 (joint-κ), which has the best rank based on the AIC for the longer fitting period and the second-best rank for the other measures under both fitting periods. M5 and M6 have quite bad rankings in general, which could potentially mean that they also have bad, or at least not significantly better, out-of-sample performance than the models with better rankings.


4.2 Out of sample evaluation

4.2.1 Forecast horizon 2000-2009

During the shortest forecast horizon, all the joint models (M2-M6) have lower minimum, maximum and mean MAPE values than the individual model (M1). The coherent models (M3-M5) have the lowest minimum MAPE values but the highest median MAPE values of all models, while M6 has the lowest median MAPE value.

Figure 1: MAPE under forecast horizon 2000-2009 for each county

Table 3: Distributional measures of MAPE×100 under forecast horizon 2000-2009 (left: fitting period 1980-1999, right: 1969-1999)

| Model (Structure) | Mean | Median | Min | Max | Mean | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| M1 (Individually) | 6.95 | 6.24 | 3.34 | 14.82 | 7.54 | 7.12 | 4.29 | 13.53 |
| M2 (Joint-κ) | 6.63 | 6.24 | 3.03 | 14.00 | 7.37 | 7.03 | 3.94 | 12.75 |
| M3 (CFM) | 6.45 | 6.69 | 2.68 | 11.38 | 7.29 | 7.56 | 3.09 | 12.47 |
| M4 (ACFM) | 6.44 | 6.67 | 2.74 | 11.46 | 7.20 | 7.56 | 3.19 | 12.46 |
| M5 (CFM+CAE) | 6.46 | 6.72 | 2.68 | 11.46 | 7.30 | 7.53 | 3.01 | 12.48 |
| M6 (CAE) | 6.42 | 6.13 | 3.33 | 11.78 | 7.26 | 6.42 | 4.13 | 11.73 |


4.2.2 Forecast horizon 2010-2019

During the later forecast horizon, all joint models (M2-M6) have lower minimum, maximum, mean and median MAPE values than the individual model (M1). The coherent models (M3-M5) have the lowest minimum and maximum MAPE values, and the coherent models with additional time components (M4-M5) have slightly better values than the coherent model without such parameters (M3).

Figure 2: MAPE under forecast horizon 2010-2019 for each county

Table 4: Distributional measures of MAPE×100 under forecast horizon 2010-2019 (left: fitting period 1980-1999, right: 1969-1999)

| Model (Structure) | Mean | Median | Min | Max | Mean | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| M1 (Individually) | 9.11 | 8.48 | 5.24 | 19.75 | 10.09 | 9.19 | 6.29 | 20.65 |
| M2 (Joint-κ) | 8.74 | 8.05 | 4.70 | 19.70 | 9.71 | 8.68 | 5.81 | 17.30 |
| M3 (CFM) | 7.64 | 7.57 | 3.17 | 14.40 | 9.39 | 9.11 | 4.10 | 16.35 |
| M4 (ACFM) | 7.57 | 7.51 | 3.11 | 14.00 | 9.24 | 8.83 | 4.03 | 15.66 |
| M5 (CFM+CAE) | 7.58 | 7.50 | 3.13 | 14.00 | 9.30 | 8.84 | 4.02 | 15.70 |
| M6 (CAE) | 7.51 | 6.91 | 4.83 | 14.63 | 9.37 | 8.92 | 4.56 | 18.08 |


4.2.3 Forecast horizon 2000-2019

During the full forecast horizon, the joint models (M2-M6) have lower minimum, maximum, mean and median MAPE values than the individual model (M1). Here it is also clearer that the joint models have better forecast accuracy than the individual model, especially under the shorter fitting period. The coherent models (M3-M5) have the lowest minimum and maximum MAPE values during this forecast horizon.

Figure 3: MAPE under forecast horizon 2000-2019 for each county

Table 5: Distributional measures of MAPE×100 under forecast horizon 2000-2019 (left: fitting period 1980-1999, right: 1969-1999)

| Model (Structure) | Mean | Median | Min | Max | Mean | Median | Min | Max |
|---|---|---|---|---|---|---|---|---|
| M1 (Individually) | 8.03 | 7.32 | 4.29 | 17.29 | 8.81 | 8.27 | 5.68 | 16.69 |
| M2 (Joint-κ) | 7.69 | 7.01 | 3.87 | 16.85 | 8.54 | 7.75 | 5.05 | 15.02 |
| M3 (CFM) | 7.01 | 6.47 | 2.97 | 12.69 | 8.27 | 8.23 | 3.55 | 14.07 |
| M4 (ACFM) | 7.01 | 6.47 | 2.99 | 12.73 | 8.22 | 8.17 | 3.61 | 14.06 |
| M5 (CFM+CAE) | 7.02 | 6.46 | 2.97 | 12.73 | 8.30 | 8.25 | 3.52 | 14.09 |
| M6 (CAE) | 6.97 | 6.49 | 4.14 | 13.21 | 8.32 | 7.71 | 5.16 | 14.63 |


All models perform worse in all aspects when fitted during the longer estimation period (1969-1999) than when fitted during the shorter estimation period (1980-1999).

In Figures 7-8 in the Appendix one can see the mortality indices estimated independently for each county from 1969-1999 based on M1. These show that linearity seems to be approximately valid for most of the counties during this period, but there are some counties (e.g. 21, 24 and 25) where there seems to be a structural break in the linearity around 1980 or somewhat later. This potential non-linearity could also be the result of autoregressive behaviour in these time series. Since this is not taken into account, it could be the reason that the models perform worse during the longer fitting period. Another reason could be structural changes in the age pattern, as discussed earlier. An inspection of the difference between the estimated age effects ($\beta_x$) for the two fitting periods (Figures 9-10 in the Appendix) for all counties based on M1 shows a tendency of increasing sensitivity for the younger age classes relative to the older age classes during the shorter fitting period.

In general, regardless of the length of the fitting period, the results from the out-of-sample evaluation in sections 4.2.1-4.2.3 show that joint mortality models have better forecast accuracy than individual models: in all aspects during the full and later forecast horizons, and in most aspects during the shorter forecast horizon.

The results also indicate that the length of the fitting period affects the difference between the joint models in general, and between the joint models and the independent model in particular.

To more clearly see if and how the average accuracy (in terms of mean and median) of the models depends on the length of the fitting period for these forecast horizons, I shortened the fitting period one year at a time, from 1969-1999 to 1990-1999, and estimated the mean and median MAPE values for all models. Ten years as a fitting period for a forecast horizon of 20 years is not recommended in practice, but it is illustrative for this purpose. The results can be seen in Figures 4-6.


Figure 4: Average forecast accuracy under forecast horizon 2000-2009 for different starting year of the fitting period (Mean and Median MAPE)

Figure 5: Average forecast accuracy under forecast horizon 2010-2019 for different starting year of the fitting period (Mean and Median MAPE)


Figure 6: Average forecast accuracy under forecast horizon 2000-2019 for different starting year of the fitting period (Mean and Median MAPE)

This gives a much clearer picture. Based on this result we can see that the joint models (M2-M6) have better accuracy (in terms of mean MAPE) than the individual model regardless of the length of the fitting period, and the difference increases as the fitting period becomes shorter. The difference in mean MAPE between all of the joint models is not clear until the start of the fitting period is around 1982. After that point, M1 (the individual model) has the highest mean MAPE, followed by M2, M6 and lastly M3-M5. There is no clear difference between the coherent models (M3-M5), which indicates that the inclusion of additional factors is not necessary for increasing forecast accuracy.

In terms of median MAPE, not all of the joint models have better forecast accuracy than the individual model for all of these fitting periods. It takes until the starting year is around 1985 before all joint models show better results than the individual model for the forecast horizon 2000-2009, around 1979 for the forecast horizon 2010-2019, and around 1977 for the full forecast horizon 2000-2019.

After that, M1 has the highest median MAPE, followed by M2 and then M3-M6.

A comparison between the in-sample evaluation for the joint models and their out-of-sample accuracy shows that AIC seems to penalize model complexity too little when looking at M3 and the models that are estimated conditional on it (M4 and M5). The BIC seems to be more in line with the out-of-sample results for these models: since their out-of-sample performance does not vary to any larger extent from one another, we would want the ranking to follow the level of complexity, as the BIC ranking does for these models. That is, we would want M3 to rank better than M4 and M5, because there is no significant difference in forecast accuracy and M3 has far fewer parameters. For the other joint models, the AIC and BIC rankings do not correspond to the results in the out-of-sample evaluation, since M6 has better forecast accuracy in general than M2 but worse AIC and BIC rankings. Better values for these information criteria do not necessarily lead to better out-of-sample performance.

Another reason could be that the information criteria do not take into account that we use the parameters differently when forecasting, holding some of them fixed but not others. For a fairer comparison, the inclusion of time-dependent parameters should be penalized differently in the information criteria than the inclusion of fixed age-dependent parameters. M2 has more fixed age-dependent parameters than M6, which has more time-dependent parameters; if this were taken into account, the AIC and BIC ranks might have been more in line with the forecast accuracy.


5 Discussion

In this thesis I have compared the forecast accuracy of different parsimonious multi-population mortality models in the Lee-Carter framework, estimated in a Poisson setting, against an individual model in the same setting. The basis for my evaluation was data on the old-age mortality development in Sweden's 21 counties between 1969-2019. The results show that the joint models (M2-M6) had better forecast accuracy in terms of mean MAPE for all forecast horizons and fitting periods. The results also show that the efficiency gains of using joint models are clearly dependent on the length of the fitting period, and that one should carefully examine the validity of the main assumptions of these models to avoid biased forecasts.

That the coherent models (M3-M5) are better than the individual model (M1) is the conclusion of many other comparisons. This is in line with my results, but the difference becomes smaller as the fitting period becomes longer, and the inclusion of additional terms (M4-M5) does not have any obvious impact on the forecast accuracy for these populations. On data like this, the simplest of the evaluated coherent models has some benefits over the other models in practical applications, which my analysis also suggests: it seems to be more efficient due to the homogeneity between these populations, it has a relatively simple and easily interpreted model structure, and it ensures coherent forecasts, which is a useful property if one is interested in more than one of the populations or in comparing forecast mortality between populations.

I was surprised to see that there was no clear difference in the average forecast accuracy between the joint models on the longer fitting period. I think this is an interesting result, which indicates that the non-coherent joint models can be used efficiently for closely related but less homogeneous populations than in my evaluation, where one has reason to believe that only some of the parameters of a population of interest can be considered common with other populations. In such cases one can increase the information about certain parameters, which is more efficient than estimating them independently and potentially less biased than imposing a strictly coherent structure.

There are many other multi-population mortality models that are more complex, with potentially better forecast performance than the models I have evaluated. The Lee-Carter structure is still widely used in practice, which made me interested in investigating the efficiency gains of intuitive versions and extensions of it in a multi-population framework, and I hope that more people will continue to do so. I therefore have some suggestions for further research:

It would be interesting to see how these coherent models compare to models in which multiple main mortality indices are modeled and forecast using cointegration techniques, since there are many theoretical reasons to believe that closely related populations share a common trend to some extent. Even if standard cointegration methods do not necessarily lead to coherent forecasts, it could still be interesting to model the dynamics between the mortality indices, and this could potentially increase the forecast performance even more.

It would also be interesting to see this comparison made on populations that are less homogeneous than Sweden's counties, to see under what conditions non-coherent models are more useful and whether the results differ.

Another interesting extension would be to use clustering techniques to group different sub-populations based on similarities in their mortality developments. This could be complicated, because one would have to take into account similarities both in the age pattern and in the general mortality development over time. Clustering the mortality development between populations could be interesting on its own, but could also be used in models like these to give some populations, instead of all, common parameters.


Acknowledgements

I would like to thank my supervisor Mattias Nordin for giving me valuable input and opinions. I am especially thankful for his support during this process. I would also like to extend my thanks to my girlfriend Lina Malmgren for all her continuous love and support throughout my academic studies, and I would also like to thank my family for their support.


Bibliography

Alexander, M., Zagheni, E., and Barbieri, M. (2017). A Flexible Bayesian Model for Estimating Subnational Mortality. Demography; Silver Spring, 54(6):2025–2041.

Booth, H., Maindonald, J., and Smith, L. (2002). Applying Lee-Carter under Conditions of Variable Mortality Decline. Population Studies, 56(3):325–336.

Börger, M. and Schupp, J. (2018). Modeling trend processes in parametric mortality models. Insurance: Mathematics and Economics, 78:369–380.

Brouhns, N., Denuit, M., and Vermunt, J. K. (2002). A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Mathematics and Economics, 31(3):373–393.

Cairns, A. J. G., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A., and Balevich, I. (2009). A Quantitative Comparison of Stochastic Mortality Models Using Data From England and Wales and the United States. North American Actuarial Journal, 13(1):1–35.

Janssen, F. (2018). Advances in mortality forecasting: introduction. Genus, 74(1):1–12.

Jarner, S. F. and Jallbjørn, S. (2020). Pitfalls and merits of cointegration-based mortality models. Insurance: Mathematics and Economics, 90:80–93.

Li, J. S.-H., Chan, W.-S., and Cheung, S.-H. (2011). Structural Changes in the Lee-Carter Mortality Indexes. North American Actuarial Journal, 15(1):13–31.

Kleinow, T. (2015). A common age effect model for the mortality of multiple populations. Insurance: Mathematics and Economics, 63:147–152.

Lee, R. D. and Carter, L. R. (1992a). Modeling and Forecasting U.S. Mortality. Journal of the American Statistical Association, 87(419):659–671.

Lee, R. D. and Carter, L. R. (1992b). Modeling and forecasting US sex differentials in mortality. International Journal of Forecasting, 8(3):393–411.

Li, H. and Li, J. S.-h. (2017). Optimizing the Lee-Carter Approach in the Presence of Structural Changes in Time and Age Patterns of Mortality Improvements. Demography; Silver Spring, 54(3):1073–1095.


Li, J. (2013). A Poisson common factor model for projecting mortality and life expectancy jointly for females and males. Population Studies, 67(1):111–126.

Li, J., Tickle, L., and Parr, N. (2016). A multi-population evaluation of the Poisson common factor model for projecting mortality jointly for both sexes. Journal of Population Research; Dordrecht, 33(4):333–360.

Li, N. and Lee, R. (2005). Coherent Mortality Forecasts for a Group of Populations: An Extension of the Lee-Carter Method. Demography, 42(3):575–594.

Li, N., Lee, R., and Gerland, P. (2013). Extending the Lee-Carter Method to Model the Rotation of Age Patterns of Mortality Decline for Long-Term Projections. Demography; Silver Spring, 50(6):2037–51.

Lundström, H. and Qvist, J. (2004). Mortality Forecasting and Trend Shifts: An Application of the Lee-Carter Model to Swedish Mortality Data. International Statistical Review / Revue Internationale de Statistique, 72(1):37–50.

R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

SCB (2020). Statistics Sweden's statistical database. Stockholm, Sweden.

Wong, K., Li, J., and Tang, S. (2020). A modified common factor model for modelling mortality jointly for both sexes. Journal of Population Research, 37(2):181–212.

Zivot, E. and Andrews, D. W. K. (2002). Further Evidence on the Great Crash, the Oil-Price Shock, and the Unit-Root Hypothesis. Journal of Business & Economic Statistics, 20(1):25–44.


Appendix

Identifiability constraints:

M1: for all $i$: $\sum_x \beta_{x,i} = 1$ and $\sum_t \kappa_{t,i} = 0$

M2: $\sum_{x,i} \beta_{x,i} = 1$ and $\sum_t \kappa_{t} = 0$

M3: $\sum_x \beta_{x}^{(1)} = 1$ and $\sum_t \kappa_{t}^{(1)} = 0$

M4: $\beta_{x}^{(1)}$ and $\kappa_{t}^{(1)}$ are the same as in M3, and for all $i$: $\sum_x \beta_{x,i}^{(2)} = 1$ and $\sum_t \kappa_{t,i}^{(2)} = 0$

M5: $\beta_{x}^{(1)}$ and $\kappa_{t}^{(1)}$ are the same as in M3, and $\sum_x \beta_{x}^{(2)} = 1$ and $\sum_{t,i} \kappa_{t,i}^{(2)} = 0$

M6: $\sum_x \beta_{x} = 1$ and $\sum_{t,i} \kappa_{t,i} = 0$
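The constraints above can be imposed after estimation by a rescaling that leaves every fitted log-rate unchanged. A sketch for the M1/M3-type constraints (in Python for illustration; the parameter values are made up):

```python
def impose_constraints(alpha, beta, kappa):
    """Rescale (alpha, beta, kappa) so that sum_x beta_x = 1 and
    sum_t kappa_t = 0, leaving every fitted log-rate
    alpha_x + beta_x * kappa_t unchanged."""
    kbar = sum(kappa) / len(kappa)
    s = sum(beta)
    alpha2 = [a + b * kbar for a, b in zip(alpha, beta)]
    beta2 = [b / s for b in beta]
    kappa2 = [(k - kbar) * s for k in kappa]
    return alpha2, beta2, kappa2

# Made-up unconstrained estimates: 2 age classes, 3 years
a, b, k = [-4.0, -3.5], [0.6, 1.0], [2.0, 1.0, 0.0]
a2, b2, k2 = impose_constraints(a, b, k)
```

Because $\alpha_x + \beta_x\bar\kappa + (\beta_x/s)(\kappa_t - \bar\kappa)s = \alpha_x + \beta_x\kappa_t$, the transformation only fixes the identification, not the fit.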

Standardised deviance residuals:

$$ r_{x,i,t} = \mathrm{sign}(D_{x,i,t} - \hat{D}_{x,i,t}) \sqrt{ \frac{2\left( D_{x,i,t}\ln\!\left(D_{x,i,t}/\hat{D}_{x,i,t}\right) - D_{x,i,t} + \hat{D}_{x,i,t} \right)}{\hat{\phi}} } $$

where

$$ \hat{D}_{x,i,t} = E_{x,i,t}\,\mu_{x,i,t}(\hat{\theta}), \qquad \hat{\phi} = \frac{\mathrm{deviance}}{n_d - n_p}, \qquad \mathrm{deviance} = \sum_{x,i,t} 2\left[ D_{x,i,t}\ln\!\left(\frac{D_{x,i,t}}{\hat{D}_{x,i,t}}\right) - D_{x,i,t} + \hat{D}_{x,i,t} \right] $$
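A sketch of the residual computation (in Python for illustration; the function and argument names are my own):

```python
import math

def deviance_residuals(D, Dhat, n_params):
    """Standardised deviance residuals for Poisson death counts.

    D, Dhat: flat lists of observed and fitted deaths; n_params: the
    number of effective parameters. Uses the convention
    D * ln(D / Dhat) = 0 when D = 0.
    """
    def unit_dev(d, dh):
        term = d * math.log(d / dh) if d > 0 else 0.0
        return 2.0 * (term - d + dh)

    deviance = sum(unit_dev(d, dh) for d, dh in zip(D, Dhat))
    phi_hat = deviance / (len(D) - n_params)  # estimated dispersion
    # max(..., 0.0) guards against tiny negative values from rounding
    return [math.copysign(math.sqrt(max(unit_dev(d, dh), 0.0) / phi_hat),
                          d - dh)
            for d, dh in zip(D, Dhat)]
```

By construction the squared residuals sum to $n_d - n_p$, which gives a quick sanity check on the implementation.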

The log-likelihood function:

$$ L(\theta) = \sum_{x,t,i} \left\{ D_{x,t,i} \log\!\left(E_{x,t,i}\,\mu_{x,t,i}(\theta)\right) - E_{x,t,i}\,\mu_{x,t,i}(\theta) - \log\!\left(D_{x,t,i}!\right) \right\} $$


Figure 7: Estimated κ based on individual models (M1) under the fitting period 1969-1999 for different counties (official county-code)


Figure 8: Estimated κ based on individual models (M1) under the fitting period 1969-1999 for different counties (official county-code)


Figure 9: Estimated $\beta_x$ based on individual models (M1) under the fitting periods 1969-1999 and 1980-1999 for different counties (official county-code)


Figure 10: Estimated $\beta_x$ based on individual models (M1) under the fitting periods 1969-1999 and 1980-1999 for different counties (official county-code)
