The COVID-19 Pandemic and its Effects on Swedish Mortality

Academic year: 2021

Özlem Tepe & Siri Voghera

Bachelor’s thesis in Statistics

Advisor

Patrik Andersson

Abstract

This thesis analyses the COVID-19 pandemic’s effects on Swedish mortality during 2020 by investigating whether it has resulted in excess mortality. This is done using a stochastic mortality projection model from the Lee-Carter framework and by assuming that the number of deaths follows a Poisson distribution. Because there are few confirmed COVID-19 deaths at younger ages, only 50-to-100-year-olds are included in the analysis. Models in the Lee-Carter framework are fitted on historical data from 1993–2019 collected from the Human Mortality Database and Statistiska Centralbyrån. After evaluating the models, inter alia using residual analysis and backtesting, we ascertain that the classical Lee-Carter model achieves the desired level of fit and forecast accuracy. During the mortality projection with the Lee-Carter model, three different sources of uncertainty are accounted for by constructing prediction intervals using bootstrap. The results show that the large age group of 67–94-year-olds suffered statistically significant excess mortality during 2020. The level of excess mortality differs between ages, with 70–90-year-olds having the highest number of excess deaths. Comparing the number of confirmed COVID-19 deaths to our forecasted number of excess deaths indicates that the COVID-19 virus likely caused the surge in deaths.

Contents

1. Introduction
1.1 Background
1.1.1 Excess mortality
1.1.2 Demographic factors
1.1.3 Previous studies
1.2 Research question and scope
1.2.1 Outline of the thesis
2. Methodology
2.1 Central death rates and central exposures
2.2 Models in the Lee-Carter framework
2.3 Modelling in a Poisson setting
2.4 The Lee-Carter model and its extensions
2.5 Model evaluation
2.6 Forecasting
2.7 Forecast accuracy evaluation
2.8 Prediction intervals
3. Data and model selection
3.1 Data
4. Results and analysis
5. Conclusion
Appendices
Appendix A: Equations
Appendix B: Residual plots

1. Introduction

1.1 Background

The COVID-19 pandemic is a global health crisis that has affected billions of people worldwide (WHO, n.d.). Data from Folkhälsomyndigheten (FHM, 2021) shows that Sweden as of spring 2021 has endured three waves of high spread, with spikes in deaths in April and December 2020. The Swedish Government has classified COVID-19 as a disease that constitutes a danger to society and preventive measures have been implemented to limit the spread of the virus (FHM, 2020a). However, in contrast to other European countries, Sweden has taken a more liberal strategy, avoiding enforced lockdowns (Grasso et al., 2021, p. S12). Apart from the direct effects of the virus on morbidity and mortality, the pandemic and the implemented preventive measures may also have indirect effects on public health (FHM, 2020b). Delayed or avoided medical care, temporary or permanent layoff, and isolation are some examples of consequences that may have indirect negative effects on public health and may cause premature deaths. Simultaneously, preventive measures may also have positive effects on public health, for example by reducing the spread of other diseases. Hence, it is important to investigate how the COVID-19 pandemic affects public health overall. One way to do so is by analysing excess mortality.

1.1.1 Excess mortality

In contrast to the number of confirmed COVID-19 deaths, excess mortality also captures COVID-19 deaths that have been missed due to incorrect diagnosis or under-reporting, as well as deaths from other causes that are indirectly linked to the pandemic. Thus, excess mortality can be seen as a comprehensive measure of the pandemic’s total effect on mortality (Giattino et al., 2021). While it is particularly important to investigate excess mortality in countries that do not have a high-quality reporting system for causes of death, it is still interesting to investigate excess mortality in Sweden due to the different effects that the pandemic may have on mortality.

1.1.2 Demographic factors

1.1.3 Previous studies

There are already some studies on excess mortality in Sweden due to COVID-19. FHM analyses on an ongoing basis the weekly fluctuations in mortality by using a method called the EuroMOMO algorithm (FHM, 2021). The expected number of deaths is modelled by a generalised linear model of the Poisson family, fitted on mortality data for the last five years (EuroMomo, n.d.).

Moreover, at the request of FHM, Kolk et al. (2021) analysed whether there has been excess mortality in Sweden during 2020. In their report, the observed number of deaths during 2020 is compared to Statistiska Centralbyrån’s (SCB) forecast as well as the mean number of deaths for 2017 to 2019. SCB’s forecast of mortality for the age group 50–100-year-olds is calculated using the Lee-Carter model (SCB, 2018, pp. 150–151). Because the parameter estimates of the Lee-Carter model are sensitive to the choice of fitting period, the model is fitted on several different periods. A mean of the different forecasts is then used as the main forecast. According to Kolk et al. (2021, p. 19), the COVID-19 pandemic caused 7752 excess deaths during 2020 and a break in the previous trend of increasing life expectancy. They also found that the increased mortality is concentrated in the higher ages, that men have been more affected than women, and that there are regional and seasonal differences (ibid., p. 6).

1.2 Research question and scope

The aim of the thesis is to investigate how the COVID-19 pandemic has affected mortality in Sweden and thus contribute to the research on the pandemic’s impact on public health. This will be done by analysing whether the pandemic has caused excess mortality in Sweden during 2020. The main research question is therefore: How has the COVID-19 pandemic affected the mortality in Sweden during 2020? Is there a statistically significant excess mortality and if so, how large?

Furthermore, the study aims to answer the following sub-questions:

i. How does the level of excess mortality differ between ages?

ii. How does the number of excess deaths differ from the number of confirmed COVID-19 deaths?

Many demographic factors of interest can be analysed. This study will focus on the older population, between the ages 50–100, as data shows that the elderly are most affected by COVID-19 in terms of mortality. In contrast to FHM’s weekly reports and Modig et al. (2021), the objective of this thesis is to investigate how the pandemic overall has affected mortality during 2020, since both periods of excess mortality and periods of mortality deficit have been reported for some ages and regions. Thus, this study shares many similarities with the report by Kolk et al. (2021). However, whilst Kolk et al. (2021) do not use prediction intervals to account for uncertainties, this study aims to account for several sources of uncertainty to determine whether the observed excess mortality is statistically significant.

1.2.1 Outline of the thesis

2. Methodology

In this section, the public health metric called central death rate is introduced. Next, the stochastic mortality projection models of the Lee-Carter framework are presented, along with a discussion of how they can be fitted and forecasted. This is followed by a review of how backtesting is used to evaluate a model’s forecasting ability. Lastly, we discuss how to construct prediction intervals by using bootstrap. All analysis is done using the StMoMo package in R (Villegas et al., 2018).

2.1 Central death rates and central exposures

The central death rate, $m_{x,t}$, is a measure of how many died at age x in year t in relation to how many on average lived at age x in year t. If $D_{x,t}$ denotes the number of deaths and $E_{x,t}$ denotes the central exposed to risk, the central death rate can be defined as:

$$m_{x,t} = \frac{D_{x,t}}{E_{x,t}}.$$

The death rates and central exposures are collected from the Human Mortality Database (HMD) and are thus calculated according to their methods protocol (Wilmoth et al., 2005). However, at the time of writing, the demographic data for 2020 are not yet available at HMD. The death rates and central exposures for 2020 are therefore calculated using data on deaths and population size from SCB’s database. Because the method used by HMD is rather advanced, the exposures for 2020 are calculated using the following simplified version:

$$E_{x,t} = \frac{P_{x,t} + P_{x,t+1}}{2},$$

where $P_{x,t}$ and $P_{x,t+1}$ refer to empirical estimates of population size at the beginning
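As a rough illustration of the two formulas in this section, the sketch below computes a simplified central exposure and the resulting central death rate. It is written in Python purely for illustration (the analysis in this thesis is done in R), the function names are hypothetical, and the population and death counts are made-up numbers, not SCB or HMD figures. It assumes the two population figures are estimates for consecutive years.

```python
def central_exposure(pop_year_t: float, pop_year_t_plus_1: float) -> float:
    """Simplified central exposure: the mean of two consecutive population estimates."""
    return (pop_year_t + pop_year_t_plus_1) / 2.0


def central_death_rate(deaths: float, exposure: float) -> float:
    """Central death rate: deaths divided by the central exposure to risk."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return deaths / exposure


# Hypothetical figures for a single age and year (illustration only).
population_t = 100_000
population_t_plus_1 = 98_000
deaths = 1_980

exposure = central_exposure(population_t, population_t_plus_1)
rate = central_death_rate(deaths, exposure)
print(exposure, rate)
```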

2.2 Models in the Lee-Carter framework

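Table 1. Evaluated models from the Lee-Carter family.

Model                       Structure
LC                          $\ln(\mu_{x,t}) = \alpha_x + \beta_x^{(1)}\kappa_t^{(1)}$
RH ($\beta_x^{(0)} = 1$)    $\ln(\mu_{x,t}) = \alpha_x + \beta_x^{(1)}\kappa_t^{(1)} + \gamma_{t-x}$
APC                         $\ln(\mu_{x,t}) = \alpha_x + \kappa_t^{(1)} + \gamma_{t-x}$
CBD                         $\ln(\mu_{x,t}) = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)}$
M6                          $\ln(\mu_{x,t}) = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + \gamma_{t-x}$
M7                          $\ln(\mu_{x,t}) = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + ((x - \bar{x})^2 - \hat{\sigma}_x^2)\kappa_t^{(3)} + \gamma_{t-x}$
M8                          $\ln(\mu_{x,t}) = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + (x_c - x)\gamma_{t-x}$
PLAT                        $\ln(\mu_{x,t}) = \alpha_x + \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + \gamma_{t-x}$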

There are many modifications of the original Lee-Carter model (Lee and Carter, 1992), for example Renshaw and Haberman's cohort extensions (2003, 2006, 2011), the original CBD model as well as the extended CBD models by Cairns et al. (2009), and the model by Plat (2009). In this thesis, the models in the Lee-Carter family listed in Table 1 are evaluated.

Akin to a generalised linear model (GLM), the evaluated models consist of four components: the random component, the systematic component, the link function, and a set of parameter constraints (Villegas et al., 2018). Here, the random component, i.e., $D_{x,t}$, is assumed to follow a Poisson distribution with the exposure to risk $E_{x,t}$ times the expected death rate $\mu_{x,t}$ as mean:

$$D_{x,t} \sim \mathrm{Poisson}(E_{x,t}\,\mu_{x,t}) \quad \text{with} \quad \mu_{x,t} = \mathrm{E}\left(\frac{D_{x,t}}{E_{x,t}}\right).$$

The systematic component $\eta_{x,t}$, i.e., the predictor, captures the effect of age x, calendar year t, and year of birth (cohort) $c = t - x$ on the death rate $m_{x,t}$. Each model has its own predictor, but a generalised form of the predictors can be written as:

$$\eta_{x,t} = \alpha_x + \sum_{i=1}^{N} \beta_x^{(i)}\kappa_t^{(i)} + \beta_x^{(0)}\gamma_{t-x}.$$


The general shape of mortality by age is captured by the time-invariant age-specific parameters $\alpha_x$. The long-term mortality trend is captured by the time-varying mortality indices $\kappa_t^{(i)}$, $i = 1, \dots, N$, while the time-invariant age-specific parameters $\beta_x^{(i)}$ capture the fact that mortality declines at different rates at different ages. The cohort index $\gamma_{t-x}$ captures the cohort effect, with the time-invariant age-specific parameters $\beta_x^{(0)}$ modulating its effect across ages.

The $\beta_x^{(i)}$ can either be pre-specified functions of age, as in the Cairns-Blake-Dowd models, or non-parametric terms which must be estimated, as in the Lee-Carter model. The mortality indices and the cohort index are assumed to be stochastic processes.

The link function $g$ links the random component and the predictor. Since the number of deaths, $D_{x,t}$, is assumed to follow a Poisson distribution, the log-link will be used:

$$g(\mu_{x,t}) = \ln(\mu_{x,t}) = \eta_{x,t}.$$

The logarithm of the central death rate at age x and in year t is thus:

$$\ln(m_{x,t}) = \eta_{x,t} + \varepsilon_{x,t},$$

where the error term $\varepsilon_{x,t}$ represents particular age-specific historical influences not captured by the model (Lee and Carter, 1992, p. 660). The mortality rate $m_{x,t}$ and the number of deaths $D_{x,t}$ can be obtained by using the inverse of the link function:

$$m_{x,t} = \exp(\eta_{x,t} + \varepsilon_{x,t}),$$

$$D_{x,t} = E_{x,t}\exp(\eta_{x,t} + \varepsilon_{x,t}).$$

To ensure model identification, most of the models need a set of parameter

2.3 Modelling in a Poisson setting

There are different ways of estimating the parameters in the evaluated models. A common approach is the one suggested by Lee and Carter (1992), who used ordinary least squares (OLS) via singular value decomposition (SVD). When using OLS via SVD, the errors are assumed to be normally distributed with mean zero and a constant variance over all ages (i.e., homoscedastic). However, as pointed out by Brouhns et al. (2002), this assumption is quite unrealistic: the logarithm of the central death rate is much more variable at older ages than at younger ages. To overcome the problem of heteroscedasticity, the number of deaths will be assumed to follow a Poisson distribution (ibid.), which means that the parameters can be determined by maximizing the log-likelihood function, i.e., by maximum likelihood estimation (Appendix A). The Poisson distribution has proven to be well suited to mortality analyses (Brillinger, 1986).
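To make the Poisson setting more concrete, the following minimal sketch evaluates the Poisson log-likelihood of a table of death counts under the Lee-Carter predictor. It is an illustration in Python under stated assumptions, not the fitting procedure used in this thesis: the function names and all parameter values are hypothetical, the data are toy numbers, and no maximization is carried out. It only shows that the parameters used to generate the toy deaths score a higher log-likelihood than a deliberately shifted set.

```python
import math


def lc_predictor(alpha_x, beta_x, kappa_t):
    """Lee-Carter predictor: eta = alpha_x + beta_x * kappa_t."""
    return alpha_x + beta_x * kappa_t


def poisson_log_likelihood(deaths, exposures, alpha, beta, kappa):
    """Poisson log-likelihood of death counts under the Lee-Carter predictor.

    deaths[i][j] and exposures[i][j] refer to age index i and year index j.
    """
    total = 0.0
    for i in range(len(alpha)):
        for j in range(len(kappa)):
            mean = exposures[i][j] * math.exp(lc_predictor(alpha[i], beta[i], kappa[j]))
            d = deaths[i][j]
            # Log of the Poisson probability mass: d*ln(mean) - mean - ln(d!)
            total += d * math.log(mean) - mean - math.lgamma(d + 1)
    return total


# Hypothetical toy data: two ages, three years (illustration only).
alpha = [-4.0, -3.0]
beta = [0.6, 0.4]
kappa = [1.0, 0.0, -1.0]
exposures = [[10_000.0] * len(kappa) for _ in alpha]
deaths = [
    [round(exposures[i][j] * math.exp(lc_predictor(alpha[i], beta[i], kappa[j])))
     for j in range(len(kappa))]
    for i in range(len(alpha))
]

ll_generating = poisson_log_likelihood(deaths, exposures, alpha, beta, kappa)
ll_shifted = poisson_log_likelihood(deaths, exposures, [a + 0.5 for a in alpha], beta, kappa)
print(ll_generating > ll_shifted)
```

A real fit would search over the parameters, subject to identifiability constraints, for the values that maximize this function.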

As discussed by Currie (2016) and Hunt and Villegas (2015), many stochastic mortality models, including the ones in the Lee-Carter family, are examples of generalised linear or non-linear models. Therefore, the maximization of the log-likelihood function is done using the iterative fitting procedure by Turner and Firth (2015), with the set of parameter constraints presented in Appendix A.¹

2.4 The Lee-Carter model and its extensions

The Lee-Carter model. The Lee-Carter model (Lee and Carter, 1992) was originally intended as a method to make long-run forecasts in the United States, but due to its simplicity and forecast performance, it has become a leading statistical method of mortality forecasting and is often used as a benchmark for other mortality models. The predictor of the original Lee-Carter model consists of a static age term $\alpha_x$, a mortality index $\kappa_t^{(1)}$, and a non-parametric age-modulating term $\beta_x^{(1)}$,

$$\eta_{x,t} = \alpha_x + \beta_x^{(1)}\kappa_t^{(1)}.$$

¹ However, due to robustness and convergence issues, fitting cohort extensions of the Lee-Carter model is

Renshaw and Haberman’s cohort extensions. Renshaw and Haberman (2006) extended the Lee-Carter model by adding a cohort effect,

$$\eta_{x,t} = \alpha_x + \beta_x^{(1)}\kappa_t^{(1)} + \beta_x^{(0)}\gamma_{t-x}.$$

Moreover, several substructures can be obtained by setting one or both of the age-modulating terms to a constant. Particularly popular is the following model, obtained by setting $\beta_x^{(0)} = 1$,

$$\eta_{x,t} = \alpha_x + \beta_x^{(1)}\kappa_t^{(1)} + \gamma_{t-x}.$$

It was suggested by Haberman and Renshaw (2011) as a simpler version that solves some stability issues of the original model. In line with their conclusion, the original model will not be used. Thus, the RH model with $\beta_x^{(0)} = 1$ will be referred to simply as the RH model.

Another popular substructure is the so-called age-period-cohort (APC) model, which is obtained by setting $\beta_x^{(1)} = \beta_x^{(0)} = 1$,

$$\eta_{x,t} = \alpha_x + \kappa_t^{(1)} + \gamma_{t-x}.$$

The Cairns-Blake-Dowd models. The two-factor Cairns-Blake-Dowd (CBD) model, developed by Cairns et al. (2006), is a well-known variant of the Lee-Carter model. The predictor of the original CBD model has two mortality indices, $\kappa_t^{(1)}$ and $\kappa_t^{(2)}$, and two pre-specified age-modulating parameters, $\beta_x^{(1)} = 1$ and $\beta_x^{(2)} = x - \bar{x}$, where $\bar{x}$ is the average age in the data,

$$\eta_{x,t} = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)}.$$


The two-factor CBD model does not have a problem with identifiability. Thus, the model does not need any parameter constraints.

Moreover, Cairns et al. (2009) proposed three extensions of the original CBD model, known as M6, M7 and M8 respectively,

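$$\eta_{x,t} = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + \gamma_{t-x},$$

$$\eta_{x,t} = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + \left((x - \bar{x})^2 - \hat{\sigma}_x^2\right)\kappa_t^{(3)} + \gamma_{t-x},$$

$$\eta_{x,t} = \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + (x_c - x)\gamma_{t-x},$$

where $\hat{\sigma}_x^2$ is the average value of $(x - \bar{x})^2$.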
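The pre-specified age terms of the CBD family can be illustrated with a short sketch. The Python code below computes the average age and the average squared deviation for the age range 50–100 used in this thesis, and evaluates the M7 predictor, from which the M6 and original CBD predictors follow by setting terms to zero. The function name and the index values passed to it are hypothetical and chosen only for illustration.

```python
ages = list(range(50, 101))          # ages 50 to 100, i.e. 51 age classes
x_bar = sum(ages) / len(ages)        # average age in the data
sigma2_hat = sum((x - x_bar) ** 2 for x in ages) / len(ages)  # average of (x - x_bar)^2


def m7_predictor(x, kappa1, kappa2, kappa3=0.0, gamma=0.0):
    """M7 predictor; kappa3 = 0 gives M6, and kappa3 = gamma = 0 gives the CBD model."""
    centred = x - x_bar
    return kappa1 + centred * kappa2 + (centred ** 2 - sigma2_hat) * kappa3 + gamma


# Hypothetical index values for a single year (illustration only).
eta_cbd_age_90 = m7_predictor(90, kappa1=-3.0, kappa2=0.1)
eta_m7_age_90 = m7_predictor(90, kappa1=-3.0, kappa2=0.1, kappa3=0.001, gamma=0.05)
print(x_bar, sigma2_hat, eta_cbd_age_90, eta_m7_age_90)
```

Because the age terms are fixed functions of age, only the period and cohort indices need to be estimated in these models.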

The Plat model. Plat (2009) introduced a model that incorporates features of both the CBD model and the Lee-Carter model, and also captures the cohort effect,

$$\eta_{x,t} = \alpha_x + \kappa_t^{(1)} + (x - \bar{x})\kappa_t^{(2)} + \gamma_{t-x}.$$

2.5 Model evaluation

The objective of this thesis is to predict the mortality one year forward, for 2020, by using data up until 2019. Therefore, it is relevant to evaluate a model’s fit, to find which model to the greatest extent captures the unique patterns and structural changes in Swedish mortality data. A model that fits the data well will capture all the relevant structures in the data. One way to assess the goodness-of-fit is by analysing the residual plots. Strong clustering of residuals indicates that the model fails at capturing the important structures of the data.

could result in overfitting. The model's in-sample accuracy will be exceedingly high, but its out-of-sample capacity will be poor because the model’s parameters are so highly adapted to the fitted data. The aim is to find a model with a good fit, a good predictive ability, and that is interpretable. AIC and BIC penalise models with high numbers of parameters, allowing for a relative comparison of a model’s goodness-of-fit (Villegas et al., 2018, p. 19).

2.6 Forecasting

Once the models are estimated, the mortality indices, $\kappa_t^{(i)}$, $i = 1, \dots, N$, and the cohort index, $\gamma_{t-x}$, are modelled as stochastic time series in order to make forecasts. To illustrate this, the death rate is forecasted as follows when using the LC model:

$$\hat{m}_{x,t+1} = \exp\left(\hat{\alpha}_x + \hat{\beta}_x\hat{\kappa}_{t+1}^{(1)}\right),$$

where $\hat{\kappa}_{t+1}^{(1)}$ is the point forecast. This means that appropriate approximations of the underlying data generating processes must be identified. A standard approach is to assume that the mortality indices follow a multivariate random walk with drift. Another option is to forecast the mortality indices using $N$ independent ARIMA($p_i$, $d_i$, $q_i$) models. The cohort index is also modelled as an ARIMA($p$, $d$, $q$). There are different ways to choose the appropriate ARIMA model. For example, one can use the Box-Jenkins method as in Lee and Carter (1992).
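A one-step forecast under a univariate random walk with drift can be sketched as follows. This is a simplified Python illustration, not the StMoMo implementation: the drift is estimated as the mean of the first differences, the innovation variance as their sample variance, and the function names and all numerical values are hypothetical.

```python
import math


def fit_random_walk_with_drift(kappa):
    """Estimate the drift and innovation variance from first differences."""
    diffs = [later - earlier for earlier, later in zip(kappa, kappa[1:])]
    drift = sum(diffs) / len(diffs)
    variance = sum((d - drift) ** 2 for d in diffs) / (len(diffs) - 1)
    return drift, variance


def forecast_lc_death_rate(alpha_x, beta_x, kappa_last, drift):
    """One-step point forecast of the death rate under the Lee-Carter model."""
    kappa_next = kappa_last + drift  # point forecast of the mortality index
    return math.exp(alpha_x + beta_x * kappa_next)


# Hypothetical fitted mortality index and age parameters (illustration only).
kappa_hat = [10.0, 8.0, 6.5, 4.0, 2.0]
drift, variance = fit_random_walk_with_drift(kappa_hat)
rate_next = forecast_lc_death_rate(alpha_x=-4.0, beta_x=0.02,
                                   kappa_last=kappa_hat[-1], drift=drift)
print(drift, variance, rate_next)
```

In this toy series the index falls by 2 per year on average, so the forecast index for the next year is 0 and the forecast death rate reduces to exp(-4).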

2.7 Forecast accuracy evaluation

When forecasting, it is imperative to evaluate how accurate models are at forecasting – the skill of the models. To evaluate the forecast accuracy, the forecast error of the models can be measured. By comparing the difference between the forecasted values of a period and the observed values of the same period it is possible to judge how good the models’ forecast abilities are.

The purpose of our time series forecasting is to extrapolate. Since time series data are time-dependent – the observations are not independent of each other – using a validation method such as cross-validation is not possible. In this case, it is pertinent to use a method that considers the temporal aspect of the data. Backtesting is one such validation method that adheres to the temporal sequence of the historical data during the validation process. There are many ways of performing a backtest. One-step-ahead forecasting is an example of a backtest procedure and will be used in this study for model validation.

During the one-step-ahead forecast, the models are fitted on a base period of chosen length and then a forecast is made with a forecast horizon of one step. The forecast horizon is then moved forward in one-step increments. The base period length is fixed and moves one step forward along with the forecast horizon; this kind of fixed base period is called a sliding window. This is performed as an out-of-sample validation where some of the known data are used for estimation of the models and some of the most recent known data points are withheld for the models to be evaluated on afterwards. For each step forward in the process, the parameters of the models are re-estimated. This process is repeated for a chosen number of steps. The forecasted value for each step is then compared to the corresponding observed value of that step. The difference between the point forecast and the observed value is the forecast error. A model with good forecasting accuracy will have low forecast errors. This study will use the mean absolute percentage error (MAPE) to measure the forecast errors. MAPE is defined as:

$$\mathrm{MAPE}_i = \frac{1}{N}\sum_{x}\left|\frac{\hat{m}_{x,t} - m_{x,t}}{m_{x,t}}\right|,$$

where $N$ is the number of age classes, which is 51 in our case, and the steps are $i = 1, 2, \dots, n$. For each step, $\hat{m}_{x,t}$ is the forecast of the death rate, and $m_{x,t}$ is the observed death rate. Since each step in the process will generate a MAPE value, the mean MAPE value of each model will be presented along with the median, minimum, and maximum values. For mortality projection evaluation it is important to consider the higher mortality rates in the older ages. Using metrics such as the root mean squared error can result in the forecast errors for the older ages being overrepresented. By using a relative measure such as MAPE, which normalises the forecast errors, a more accurate picture of the models’ forecast performance is obtained.
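The MAPE of a single backtest step can be computed as in the sketch below. It is a minimal Python illustration with a hypothetical function name and made-up death rates for three age classes; in the thesis the sum runs over 51 age classes.

```python
def mape(forecast_rates, observed_rates):
    """Mean absolute percentage error over age classes, returned as a fraction."""
    if len(forecast_rates) != len(observed_rates):
        raise ValueError("forecast and observed rates must have the same length")
    relative_errors = [
        abs(forecast - observed) / observed
        for forecast, observed in zip(forecast_rates, observed_rates)
    ]
    return sum(relative_errors) / len(relative_errors)


# Hypothetical death rates for three age classes in one step (illustration only).
observed = [0.010, 0.020, 0.050]
forecast = [0.011, 0.019, 0.050]
step_mape = mape(forecast, observed)
print(round(100 * step_mape, 3))  # expressed in percent
```

Because every error is divided by the observed rate, an error of 0.001 counts as 10 percent at the youngest age class here but only 5 percent at the middle one, which is the normalisation discussed above.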

Furthermore, it is important to investigate the models’ ability to produce plausible forecasts (Cairns et al., 2011). This can be done by analysing simulated trajectories. As Cairns et al. (2011, p. 356) put it, the essential question is “what mixture of biological factors, medical advances and environmental changes would have to happen to cause this particular set of forecasts?”. They also argue that it is relevant to consider if the forecast levels of uncertainty at different ages are consistent with the greater observed volatility in higher ages’ mortality rates.

2.8 Prediction intervals

A prediction interval is an indicator of the likely uncertainty in a point forecast. There are several ways of constructing prediction intervals, which account for different causes of uncertainty. This study will consider three sources of uncertainty: (1) the forecast error in the mortality and cohort indices, (2) the error in the model fitting, and (3) the uncertainty arising from the natural variability of the number of deaths, $D_{x,t}$. One way of constructing prediction intervals is by simulating trajectories from the fitted mortality model.² However, such prediction intervals only account for the uncertainty arising from the error in the forecast of the mortality and cohort indices.

² When assuming the mortality indices, $\kappa_t^{(i)}$

By using bootstrap, prediction intervals can be obtained that also take into account the uncertainty arising from the estimation of the parameters of the mortality projection model, since the true parameter values are unknown. Two common procedures are the semiparametric bootstrap approach proposed by Brouhns et al. (2005) and the residual bootstrap approach suggested by Koissi et al. (2006). In this thesis, the semiparametric bootstrap approach by Brouhns et al. (2005) will be implemented. First, B samples of the number of deaths, $d_{x,t}^{b}$, $b = 1, \dots, B$, are generated by sampling from the Poisson distribution with the observed number of deaths, $d_{x,t}$, as mean.³ The mortality model is then re-estimated on each bootstrapped sample, $d_{x,t}^{b}$, in order to obtain B bootstrapped parameter estimates: $\hat{\alpha}_x^{b}$; $\hat{\beta}_x^{(1),b}, \dots, \hat{\beta}_x^{(N),b}$; $\hat{\kappa}_t^{(1),b}, \dots, \hat{\kappa}_t^{(N),b}$; $\hat{\beta}_x^{(0),b}$; $\gamma_{t-x}^{b}$. Afterwards, the mortality and cohort indices of each bootstrapped sample are simulated forward. In this study, there will only be one simulated path per bootstrapped sample, which means that the total number of paths will equal B. Confidence and prediction intervals are then obtained by using the quantiles. For example, a 95 percent interval is obtained by using the 2.5th and 97.5th percentiles.

To construct a prediction interval that takes into account all three sources of uncertainty, the following procedure will be used. Once the model has been bootstrapped using the approach by Brouhns et al. (2005) and simulated forward, a new number of deaths for 2020 will be generated by drawing from a Poisson distribution with the bootstrapped simulated number of deaths for 2020, $\hat{d}_{x,t+1}^{b} = \hat{m}_{x,t+1}^{b} \times E_{x,t+1}$, as mean, where $E_{x,t+1}$ is the observed exposure to risk and $\hat{m}_{x,t+1}^{b}$ is the bootstrapped simulated death rate.⁴ Thus, $B$ new simulated numbers of deaths for 2020 will be generated. Afterwards, prediction intervals that account for all three sources of uncertainty can be obtained by using the quantiles of $\hat{d}_{x,t+1}^{b}$.
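The last step of this procedure, drawing a Poisson number of deaths for each bootstrapped simulated death rate and reading off the percentiles, can be sketched as below. This is a simplified Python illustration under stated assumptions and not the code used in the thesis: the bootstrapped simulated death rates are replaced by hypothetical random numbers (no mortality model is re-estimated here), the exposure is made up, the Poisson sampler is a basic textbook method applied in chunks, and all function names are hypothetical.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw one Poisson variate by the multiplication method, applied in chunks
    of at most 30 so that exp(-chunk) does not underflow for large means."""
    total = 0
    remaining = float(mean)
    while remaining > 0:
        chunk = min(remaining, 30.0)
        limit = math.exp(-chunk)
        product = rng.random()
        count = 0
        while product > limit:
            count += 1
            product *= rng.random()
        total += count
        remaining -= chunk
    return total


def nearest_rank_quantile(sorted_values, q):
    """Nearest-rank empirical quantile of an already sorted list."""
    index = max(0, min(len(sorted_values) - 1, math.ceil(q * len(sorted_values)) - 1))
    return sorted_values[index]


def death_prediction_interval(simulated_rates, exposure, rng, level=0.95):
    """Draw one Poisson death count per simulated rate and return the interval bounds."""
    draws = sorted(sample_poisson(rate * exposure, rng) for rate in simulated_rates)
    tail = (1.0 - level) / 2.0
    return nearest_rank_quantile(draws, tail), nearest_rank_quantile(draws, 1.0 - tail)


rng = random.Random(2020)
# Hypothetical stand-in for B = 1000 bootstrapped simulated death rates at one age.
simulated_rates = [0.02 * math.exp(rng.gauss(0.0, 0.03)) for _ in range(1000)]
exposure = 50_000.0  # hypothetical observed exposure to risk
lower, upper = death_prediction_interval(simulated_rates, exposure, rng)
print(lower, upper)
```

The resulting interval is wider than one built from the simulated rates alone, because the Poisson draw adds the natural variability of the death count on top of the forecast and parameter uncertainty.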

³ Another option is to use the fitted number of deaths instead of the observed number of deaths, as suggested by Renshaw and Haberman (2008).

⁴ When using the LC model, the simulated bootstrapped death rate is $\hat{m}_{x,t+1}^{b} = \exp\left(\hat{\alpha}_x^{b} + \hat{\beta}_x^{b}\hat{\kappa}_{t+1}^{(1),b}\right)$.


3. Data and model selection

In this section, we first present the data. Then the choice of fitting period and age group will be discussed. Lastly, the model selection process is performed, and the results evaluated.

3.1 Data

The main data source is the Human Mortality Database, which provides empirical estimates of central death rates and central exposures for the years 1751 to 2019 (HMD, n.d.). The data material HMD uses comes almost exclusively from SCB.

The empirical estimates for 2020 are not yet available at HMD and are therefore calculated using data from SCB’s database on death counts for 2020 and population estimates for 2019 and 2020 (SCB, 2021a-b).

Figure 2. Death rate versus time at different ages from age 0 to age 90.

3.2 Choosing fitting period and age group

Two important aspects of the chosen data need to be considered when modelling mortality. The first is which time period the chosen models are to be fitted on, the so-called fitting period. The second is what age range should be included in the analysis. The choice of these two factors is a significant determinant of the forecasting accuracy of a model (Li and Li, 2017).

Figure 3. The number of COVID-19 deaths, and the proportion of COVID-19 deaths of total deaths for all ages in ten-year age groups during 2020, using data from Socialstyrelsen (2020, 2021).

To ensure that the model produces forecasts that are as accurate as possible, it is important to choose an appropriate fitting period. The length of the fitting period should be determined by the aim of the forecast. For short-term forecasting, a short fitting period is appropriate (Li and Li, 2017). The structures in mortality in the most recent years are more relevant: they have a greater and more immediate effect on short-term mortality. Using too long a fitting period for short-term forecasting can result in the model not capturing the smaller and more recent mortality trends, and thus making short-term forecasts based on long-term trends. Conversely, using too short a fitting period for long-term forecasting can lead to forecasts which do not account for the long-term mortality trend or for the fact that trends can break (O’Hare and Li, 2014).

When the fitting period 1920–2019 is chosen (thus excluding the effect of the Spanish Flu in 1918), it is clear that none of the models can properly capture the systematic structures of mortality in Swedish data. The residual plots for all models show systematic errors (Appendix B). This indicates that the fitting period 1920–2019 is too long for the models to capture important underlying structures in the data.

1992), these studies have tried to find a fitting period where these assumptions hold. One study by Li and Li (2017, p. 1085) found that the estimated optimal fitting period for the Swedish unisex population aged 0–89 starts in 1976. However, they found that for the shorter age range of 50–89-year-olds a base period starting from 1993 is better (ibid., p. 1092). Lundström and Qvist (2004) saw that the linearity of the mortality index holds for a base period of the past 50 years. According to them, the last 50 years of Swedish mortality data are sufficient as they capture the most important changes in mortality during the 20th century, such as drops in infectious and chronic diseases (ibid., pp. 49–50). However, they argue that limiting the fitting period to only the last 25 years could be necessary “in order to deal with the different phases of falling mortality for males and females” (ibid., p. 50). Both studies place the start of the appropriate fitting period for forecasting Swedish mortality rates around the early to mid-1990s. As our objective is to make a short-term forecast of just one year, for 2020, a base period starting from 1993, as proposed by Li and Li (2017), will be used.

3.3 Model selection

Table 2. The AIC and BIC values for the LC, RH, and PLAT models; $n_p$ is the number of effective parameters in the model.

Models (1993–2019)          AIC (rank)    BIC (rank)    $n_p$
LC                          14244 (3)     14907 (2)     127
RH ($\beta_x^{(0)} = 1$)    13991 (2)     15047 (3)     202
PLAT                        13967 (1)     14893 (1)     177
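The relation between the two criteria in Table 2 can be checked with a short calculation. The Python sketch below uses the standard definitions AIC = 2k − 2 ln L and BIC = k ln(n) − 2 ln L, recovers −2 ln L from each reported AIC, and compares the implied BIC with the reported one. It assumes that the number of observations n equals 51 ages times 27 years, which is an assumption made here for illustration and is not stated in the table; the function names are hypothetical.

```python
import math


def neg2_log_likelihood_from_aic(aic, n_params):
    """Recover -2 ln L from AIC = 2k - 2 ln L."""
    return aic - 2 * n_params


def bic_from_neg2_log_likelihood(neg2_log_likelihood, n_params, n_obs):
    """BIC = k ln(n) - 2 ln L."""
    return neg2_log_likelihood + n_params * math.log(n_obs)


# Assumption: one observation per age-year cell, ages 50-100 and years 1993-2019.
n_obs = 51 * 27

# (AIC, effective parameters, reported BIC) as given in Table 2.
table_2 = {
    "LC": (14244, 127, 14907),
    "RH": (13991, 202, 15047),
    "PLAT": (13967, 177, 14893),
}

implied_bic = {}
for model, (aic, n_params, reported_bic) in table_2.items():
    neg2_ll = neg2_log_likelihood_from_aic(aic, n_params)
    implied_bic[model] = bic_from_neg2_log_likelihood(neg2_ll, n_params, n_obs)
    print(model, round(implied_bic[model], 1), reported_bic)
```

Under this assumption the implied values agree with the reported BIC to within about one unit, which is what rounding of the reported AIC would allow. The calculation also shows why the RH model ranks second on AIC but last on BIC: its 202 parameters are penalised by ln(n), roughly 7.2 per parameter, instead of 2.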

The next phase is to identify the appropriate ARIMA models for the mortality and cohort indices. In line with previous research (Lee and Carter, 1992), the mortality indices of the LC model and the RH model are forecasted using a random walk with drift (Figure C1 and C3 in Appendix C):

$$\kappa_t^{(1)} = c + \kappa_{t-1}^{(1)} + e_t, \quad \text{where } e_t \sim N(0, \sigma^2).$$

This can be motivated by inspecting the autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the first difference (Figure C2 and C4 in Appendix C). If the mortality index follows a random walk, then the first difference should be white noise. The cohort index of the RH model is assumed to follow an ARIMA (0, 2, 2) process (Figure C5 in Appendix C):

$$(1 - B)^2\gamma_{t-x} = (1 + \theta_1 B + \theta_2 B^2)e_t, \quad \text{where } e_t \sim N(0, \sigma^2).$$

This decision was made after inspecting the correlogram of the cohort index as well as the AIC and BIC values. To affirm that an ARIMA (0, 2, 2) is suitable, the correlogram of the residuals was also inspected (Figure C6 in Appendix C). If it is a good fit, then the residuals should have no significant autocorrelations.

evaluated using 15 steps. The first year forecasted is 2005 and the last is 2019. The sliding window will have a width of 26 years and will move fifteen steps along with the forecast horizon. The first step will use the period 1978–2004 for parameter estimation, and the death rates for 2005 will be forecasted. Then, taking one step forward and moving the estimation period to 1979–2005, the death rates for 2006 are forecasted, and so forth until all fifteen steps up until 2019 have been taken.
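The sequence of estimation periods and forecast years described above can be enumerated with a small sketch. The Python code below only generates the windows; the model fitting and forecasting at each step are omitted, and the function name is hypothetical.

```python
def sliding_windows(first_fit_start, first_fit_end, n_steps):
    """Return (fit_start, fit_end, forecast_year) for each one-step-ahead backtest step."""
    return [
        (first_fit_start + step, first_fit_end + step, first_fit_end + step + 1)
        for step in range(n_steps)
    ]


windows = sliding_windows(first_fit_start=1978, first_fit_end=2004, n_steps=15)
for fit_start, fit_end, forecast_year in windows:
    print(f"fit {fit_start}-{fit_end}, forecast {forecast_year}")
```

Each window keeps the same length and moves forward by one year, so the fifteenth step fits on 1992–2018 and forecasts 2019.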

The mean MAPE values for both models indicate low percentage forecast errors (Table 3). The mean MAPE for the LC model is approximately 3.11 percent. For the RH model, the corresponding mean MAPE value over the same fifteen steps is around 3.32 percent. Both models demonstrate good forecast accuracy. From the overall assessment of the goodness-of-fit and forecast ability, it is clear that the LC model can achieve the desired level of forecasting accuracy with a simpler model structure. Therefore, the LC model is chosen for the mortality forecasting in Section 4.

Table 3. Mean, median, minimum, and maximum MAPE values from the one-step-ahead forecast for LC and RH models.

Models    MAPE (%)
          Mean     Median    Min      Max
LC        3.114    2.888     2.472    5.057


4. Results and analysis

As seen in Figure 4, $\alpha_x$ is increasing with age, indicating that there is a strong positive relationship between mortality and age: older ages have higher mortality rates. The mortality index $\kappa_t^{(1)}$ shows a linearly decreasing trend: the mortality rate improves over time. The assumption of the mortality index being approximately linear in the long term seems to hold. The grey area represents a 95 percent prediction interval of the mortality index when modelled as a random walk with drift. The estimate of the parameter $\beta_x^{(1)}$ is decreasing with age. Since $\beta_x^{(1)}$ shows how changes in time affect the different age groups’ mortality rates, a decreasing line indicates a higher mortality improvement for the younger ages in the selected age range than for the older ages. However, it plateaus between the ages 60–80, indicating that mortality improves at the same rate for these ages.

Figure 4. Estimated parameters of the LC model, with a forecast horizon of 10 years and a 95


Figure 5. Actual and fitted death rates in log scale, 1993–2019, for the selected ages of x = 60, x = 70, x = 80, and x = 90, using the LC model.

In Figure 5, we see the fitted death rates compared to the observed death rates for four different ages. The fit is good for all selected ages, as the actual death rates are close to the fitted line.

Figure 6 illustrates how the uncertainty arising from the forecast error in the mortality index and from the error in the model fitting affects the projection of age-specific death rates. As seen, parameter uncertainty does not appear to have a large impact on the width of the prediction interval. The prediction interval which accounts for parameter uncertainty (dot-dashed red lines) and the prediction interval which only accounts for the forecast error in the mortality index (black dotted lines) coincide closely for all the ages.


Figure 6. Observed, fitted, and forecasted death rates at the selected ages x = 60, x = 70, x = 80 and x = 90, with 95 percent confidence and prediction intervals. Dots correspond to observed death rates, solid black lines to fitted rates and dashed lines show point forecasts. The black dotted lines represent 95 percent prediction intervals excluding parameter uncertainty (500 simulations) while the dot-dashed red lines represent 95 percent confidence and prediction intervals including parameter uncertainty.

Table 4 shows the observed number of deaths, the forecasted number of deaths (i.e., point forecasts), and the estimated number of excess deaths (i.e., the difference between the observed number of deaths and the point forecast) at different ages and in total. The 95 percent prediction intervals account for all three sources of uncertainty, i.e., the forecast error in the mortality index, parameter uncertainty, and the uncertainty which arises from the natural variability of the number of deaths.
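How the three sources of uncertainty combine in a simulated prediction interval can be sketched as follows. Everything here is an assumption made for illustration: the parameter values, the exposure, the tiny list `param_draws` standing in for bootstrap refits, and the helper names are invented and are not taken from the thesis.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw from Poisson(lam) by counting unit-rate arrivals in [0, lam]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed <= lam:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def simulate_deaths(rng, n_sims, exposure, param_draws, kappa_mean, kappa_sd,
                    include_poisson=True):
    """Simulate death counts at one age for one forecast year."""
    sims = []
    for _ in range(n_sims):
        alpha, beta = rng.choice(param_draws)       # parameter uncertainty
        kappa = rng.gauss(kappa_mean, kappa_sd)     # forecast error in the index
        expected = exposure * math.exp(alpha + beta * kappa)
        # Natural variability of the death count around its expected value.
        sims.append(poisson_draw(rng, expected) if include_poisson else expected)
    return sorted(sims)

def interval(sorted_sims, level=0.95):
    """Empirical central interval from sorted simulations."""
    n = len(sorted_sims)
    return (sorted_sims[int((1 - level) / 2 * n)],
            sorted_sims[int((1 + level) / 2 * n) - 1])

# Invented stand-ins for bootstrap parameter estimates (alpha_x, beta_x).
param_draws = [(-4.00, 0.020), (-4.02, 0.021), (-3.98, 0.019)]

lo_full, hi_full = interval(
    simulate_deaths(random.Random(1), 2000, 10000, param_draws, -10.0, 1.5))
lo_np, hi_np = interval(
    simulate_deaths(random.Random(1), 2000, 10000, param_draws, -10.0, 1.5,
                    include_poisson=False))
print("all three sources:", lo_full, hi_full)
print("without Poisson variability:", round(lo_np, 1), round(hi_np, 1))
```

With these invented values the interval that includes the Poisson variability is noticeably wider than the one that leaves it out, which is why all three sources are accounted for when intervals for death counts are constructed.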


Table 4. Observed and forecasted (excess) deaths for 2020 using the LC model, with 95 percent prediction intervals accounting for all three sources of uncertainty.

      Mortality                                         Excess mortality
Age   Observed   Point forecast   LB of PI   UB of PI   Estimated   LB of PI   UB of PI   Sig. high


Figure 7. Excess death for 2020, for 50–100-year-olds. Difference between the observed number of deaths and the forecasted number of deaths, with 95 percent prediction interval for each difference.

The prediction intervals for the number of excess deaths can be used to determine whether the estimated excess mortality is statistically significant, by checking whether they cover zero (Figure 7). The prediction intervals at the ages 50–57, 60–63, 66, 95, and 97–99 cover zero, which implies that the estimated excess mortality at these ages is not statistically significant. However, the prediction intervals for the oldest ages should be interpreted with caution. Since these populations are small, each death has a large impact on the variability, resulting in higher volatility. With a small population the absolute number of deaths is smaller, which in turn makes it harder to detect excess mortality. This is also why some of these ages show non-significant excess mortality (ages 95 and 97–99) and others significant excess mortality (ages 96 and 100). Nevertheless, the large age group 67–94-year-olds has statistically significant excess mortality.
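The point about small populations can be made concrete under the Poisson assumption used for the number of deaths. The following is a back-of-the-envelope illustration with invented values of the expected number of deaths λ, ignoring the other two sources of uncertainty:

```latex
D \sim \mathrm{Poisson}(\lambda)
\;\Longrightarrow\;
\operatorname{sd}(D) = \sqrt{\lambda},
\qquad
\frac{\operatorname{sd}(D)}{\lambda} = \frac{1}{\sqrt{\lambda}}

\lambda = 100:\ \frac{1}{\sqrt{100}} = 10\,\%,
\qquad
\lambda = 2500:\ \frac{1}{\sqrt{2500}} = 2\,\%
```

The relative variability is thus several times larger when few deaths are expected, so a given relative increase in mortality is harder to separate from random variation at the oldest ages.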

Table 5. Observed and forecasted total number of deaths and excess deaths for 2020 with 95 percent prediction intervals.

               Point forecast   Lower bound of PI   Upper bound of PI
Total deaths   85558            82726               88259

Total excess


Apart from analysing the differences in excess mortality between ages, it is also interesting to see whether the estimated total number of excess deaths, aggregated over all ages, is statistically significantly different from zero. By aggregating the data we get an overall understanding of the pandemic’s effect on mortality. If the pandemic had not occurred, the expected total number of deaths for 2020 would, according to our forecast, be 85558 for the age group 50–100-year-olds. The observed number for 2020 is 94226 deaths. Taking the difference gives the estimated total number of excess deaths, which is 8668. As seen in Table 5, the lower bound of the prediction interval for the total number of excess deaths is 5967, which is above zero, showing that the total number of excess deaths is statistically significant.

Another equivalent approach is to use the prediction intervals for the number of deaths. The mortality is significantly high if the observed number of deaths is over the upper bound of the prediction interval. For example, as we can see, the observed total number of deaths at 94226 is higher than the upper bound of the prediction interval for the total number of deaths at 88259 (Table 5).
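The equivalence of the two approaches can be checked with the totals reported in the text and in Table 5. The sketch below assumes that the interval for excess deaths is obtained by subtracting the bounds of the interval for deaths from the observed count; under that assumption the derived lower bound reproduces the reported 5967, while the derived upper bound is a value computed here and not one quoted from the thesis.

```python
observed_total = 94226                 # observed deaths, ages 50-100, 2020
point_forecast = 85558                 # forecasted deaths (Table 5)
pi_lower, pi_upper = 82726, 88259      # 95 percent PI for total deaths (Table 5)

excess = observed_total - point_forecast

# Assumption: the excess-death interval equals the observed count minus the
# bounds of the death-count interval, so the bounds swap places.
excess_lower = observed_total - pi_upper
excess_upper = observed_total - pi_lower

# Approach 1: the observed count lies above the upper bound for deaths.
significant_via_deaths = observed_total > pi_upper
# Approach 2: the interval for excess deaths does not cover zero.
significant_via_excess = excess_lower > 0

print(excess, excess_lower, excess_upper)
print(significant_via_deaths, significant_via_excess)
```

Both checks necessarily give the same answer, because the observed count exceeding the upper bound for deaths is the same inequality as the lower bound for excess deaths being above zero.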

To evaluate whether the forecasted excess deaths could be a result of the COVID-19 pandemic, insight can be gained by comparing the forecast with the number of registered COVID-19 deaths for the age group 50–100-year-olds in 2020. As can be seen, the forecasted number of excess deaths and the registered COVID-19 deaths are close in size (Table 6). According to FHM, 9617 people died of COVID-19, while Socialstyrelsen reported 8969. Both numbers fall within the bounds of the prediction interval for the total number of excess deaths. This could be an indication that these excess deaths were caused by the COVID-19 virus. It also suggests that the estimated number of excess deaths is reasonable, as the data from FHM and Socialstyrelsen are comprehensive.

Table 6. Estimated total number of excess deaths for ages 50–100 and registered COVID-19 deaths in 2020.

                   Forecasted excess deaths   Reported COVID-19 deaths*   Reported COVID-19 deaths**
Number of deaths   8668                       8969                        9617


5. Conclusion

The aim of this thesis was to investigate how the COVID-19 pandemic affected mortality in Sweden by analysing whether it caused excess mortality during 2020. Evaluation of the different models within the Lee-Carter framework showed that, for modelling Swedish mortality data, the LC model achieved the desired level of fit and forecast ability. According to the results of this study, the pandemic caused excess mortality in Sweden during 2020. The observed number of deaths for the age group 50–100-year-olds exceeds the expected non-crisis level of mortality. The aggregated excess mortality of 8668 deaths was statistically significantly different from zero, which suggests that these excess deaths were brought on by the COVID-19 pandemic.

When comparing different age groups, there is no significant excess mortality in the younger age groups 50–57 and 60–63, while the large age group 67–94-year-olds suffered a statistically significant excess mortality. This is in line with both data from FHM, which show that the age group 70 and over is more severely affected by COVID-19, and previous studies. For the oldest ages 95–100 the results differ, as these age groups are small and therefore have large variability in mortality. Hence, our results for these oldest ages do not show conclusive evidence of excess mortality.


References

Brillinger, D.R. 1986, "A Biometrics Invited Paper with Discussion: The Natural Variability of Vital Rates and Associated Statistics", Biometrics, vol. 42, no. 4, pp. 693–734.

Brouhns, N., Denuit, M. & Van Keilegom, I. 2005, "Bootstrapping the Poisson log-bilinear model for mortality forecasting", Scandinavian actuarial journal, vol. 2005, no. 3, pp. 212–224.

Brouhns, N., Denuit, M. & Vermunt, J.K. 2002, "A Poisson log-bilinear regression approach to the construction of projected lifetables", Insurance, mathematics & economics, vol. 31, no. 3, pp. 373–393.

Cairns, A.J.G., Blake, D. & Dowd, K. 2006, "A Two-Factor Model for Stochastic Mortality with Parameter Uncertainty: Theory and Calibration", The Journal of risk and insurance, vol. 73, no. 4, pp. 687–718.

Cairns, A.J.G., Blake, D., Dowd, K., Coughlan, G.D., Epstein, D. & Khalaf-Allah, M. 2011, "Mortality density forecasts: An analysis of six stochastic mortality models", Insurance, mathematics & economics, vol. 48, no. 3, pp. 355–367.

Cairns, A.J.G., Blake, D., Dowd, K., Coughlan, G.D., Epstein, D., Ong, A. & Balevich, I. 2009, "A Quantitative Comparison of Stochastic Mortality Models Using Data From England and Wales and the United States", North American actuarial journal, vol. 13, no. 1, pp. 1–35.

Checchi, F., & Roberts, L. 2005, “Interpreting and using mortality data in humanitarian emergencies: a primer for non-epidemiologists”, Humanitarian Practice Network Paper, no. 52, pp. 1–38.

Currie, I.D. 2016;2014, “On fitting generalized linear and non-linear models of mortality”, Scandinavian actuarial journal, vol. 2016, no. 4, pp. 356–383.

EuroMomo. No date, “Work Package 7 Report – A European algorithm for a common monitoring of mortality across Europe”, https://www.euromomo.eu/uploads/pdf/wp7_report.pdf (Accessed: 12 April 2021).

Folkhälsomyndigheten. 2020a, Smittskydd och övervakning, https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/om-sjukdomen-och-smittspridning/smittspridning/smittskydd-och-overvakning (Accessed: 26 May 2021).

Folkhälsomyndigheten. 2020b, Covid-19-pandemins tänkbara konsekvenser på folkhälsan, Solna: Folkhälsomyndigheten, https://www.folkhalsomyndigheten.se/pub-


Folkhälsomyndigheten. 2021, Bekräftade fall i Sverige – daglig uppdatering, https://www.folkhalsomyndigheten.se/smittskydd-beredskap/utbrott/aktuella-utbrott/covid-19/statistik-och-analyser/bekraftade-fall-i-sverige/ (Accessed: 11 April 2021).

Garcia, I. 2020, Områdena som drabbats hårdast av Corona i Stockholm. Sveriges Radio. 7 April. https://sverigesradio.se/artikel/7447621 (Accessed: 11 April 2021).

Giattino, C., Ritchie, H., Roser, M., Ortiz-Ospina, E., & Hasell, J. 2021, Excess mortality during the Coronavirus pandemic (COVID-19), https://ourworldindata.org/excess-mortality-covid (Accessed: 12 April 2021).

Grasso, M., Klicperová-Baker, M., Koos, S., Kosyakova, Y., Petrillo, A. & Vlase, I. 2021, "The impact of the coronavirus crisis on European societies. What have we learnt and where do we go from here? - Introduction to the COVID volume", European societies, vol. 23, no. S1, pp. S2–S32.

Gutterman, S. & Vanderhoof, I.T. 1998, "Forecasting Changes in Mortality: A Search for a Law of Causes and Effects", North American actuarial journal, vol. 2, no. 4, pp. 135–138.

Haberman, S. & Renshaw, A. 2009, "On age-period-cohort parametric mortality rate projections", Insurance, mathematics & economics, vol. 45, no. 2, pp. 255–270.

Haberman, S. & Renshaw, A. 2011, "A comparative study of parametric mortality projection models", Insurance, mathematics & economics, vol. 48, no. 1, pp. 35–55.

Holm, G. 2021, Nya mönstret i pandemin – fler dör utanför storstäderna. Expressen. 9 January. https://www.expressen.se/nyheter/nya-monstret-ipandemin-fler-dor-utan-for-storstaderna/ (Accessed: 11 April 2021).

Human Mortality Database. No date, University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). [Dataset]. Available at www.mortality.org or www.humanmortality.de (Accessed: 23 April 2021).

Hunt, A. & Villegas, A.M. 2015, "Robustness and convergence in the Lee–Carter model with cohort effects", Insurance, mathematics & economics, vol. 64, pp. 186–202.

Jämställdhetsmyndigheten. 2021, Fler män dör men fler kvinnor blir sjuka i covid-19, https://www.jamstalldhetsmyndigheten.se/nyhet/coronapandemin-och-halsa-ur-ett-jamstalldhetsperspektiv (Accessed: 11 April 2021).

Koissi, M., Shapiro, A.F. & Högnäs, G. 2006, "Evaluating and extending the Lee–Carter model for mortality forecasting: Bootstrap confidence interval", Insurance, mathematics & economics, vol. 38, no. 1, pp. 1–20.


Lee, R.D. & Carter, L.R. 1992, "Modeling and Forecasting U.S. Mortality", Journal of the American Statistical Association, vol. 87, no. 419, pp. 659–671.

Li, H. & Li, J.S. 2017, "Optimizing the Lee-Carter Approach in the Presence of Structural Changes in Time and Age Patterns of Mortality Improvements", Demography, vol. 54, no. 3, pp. 1073–1095.

Lundström, H. & Qvist, J. 2004, "Mortality Forecasting and Trend Shifts: an Application of the Lee-Carter Model to Swedish Mortality Data", International statistical review, vol. 72, no. 1, pp. 37–50.

Modig, K., Ahlbom, A. & Ebeling, M. 2021, "Excess mortality from COVID-19: weekly excess death rates by age and sex for Sweden and its most affected region", European journal of public health, vol. 31, no. 1, pp. 17–22.

O'Hare, C. & Li, Y. 2014, “Identifying Structural Breaks in Stochastic Mortality Models”, Journal of Risk and Uncertainty in Engineering part B. Available at http://dx.doi.org/10.2139/ssrn.2192208.

Plat, R. 2009, "On stochastic mortality modeling", Insurance, mathematics & economics, vol. 45, no. 3, pp. 393–404.

Renshaw, A.E. & Haberman, S. 2003, "Lee–Carter mortality forecasting with age-specific enhancement", Insurance, mathematics & economics, vol. 33, no. 2, pp. 255–272.

Renshaw, A.E. & Haberman, S. 2006, "A cohort-based extension to the Lee–Carter model for mortality reduction factors", Insurance, mathematics & economics, vol. 38, no. 3, pp. 556–570.

Renshaw, A.E. & Haberman, S. 2008, "On simulation-based approaches to risk measurement in mortality with specific reference to Poisson Lee–Carter modelling", Insurance, mathematics & economics, vol. 42, no. 2, pp. 797–816.

Socialstyrelsen. 2020, ”Dödsorsaker första halvåret 2020”. Published 2020-11-17. Art. nr: 2020-11-7034.

Socialstyrelsen. 2021, ”Statistik om dödsorsaker andra halvåret 2020”. Published 2021-03-18. Art. nr: 2020-11-7034.

Statistiska Centralbyrån. 2018, “The future population of Sweden 2018–2070”. Demographic reports 2018:1. https://www.scb.se/contentassets/b3973c6465b446a690aec868d8b67473/be0401_2018i70_br_be51br1801.pdf (Accessed: 18 April 2021).

Statistiska Centralbyrån. 2021a, Life table by sex and age. Year 1960 – 2020. [Dataset]. Available at https://www.statistikdatabasen.scb.se/ (Accessed: 25 April 2021).

Statistiska Centralbyrån. 2021b, Population by age and sex. Year 1860 – 2020.


Turner, H. & Firth, D. 2015, “Generalized Nonlinear Models in R: An Overview of the gnm Package”, R package version 1.0-8, http://CRAN.R-project.org/package=gnm.

Villegas, A.M., Kaishev, V.K. & Millossovich, P. 2018, "StMoMo: An R Package for Stochastic Mortality Modeling", Journal of statistical software, vol. 84, no. 3, pp. 1–38.

Wilmoth, J. R., Andreev, K., Jdanov, D., Glei, D.A. & Riffe, T. with the assistance of Boe, C., Bubenheim, M., Philipov, D., Shkolnikov, V., Vachon, P., Winant, C. & Barbieri, M. 2005, “Methods protocol for the human mortality database”, http://www.mortality.org/Public/Docs/MethodsProtocol.pdf (Accessed: 8 April 2021).

Wilmoth, J.R. 2000, "Demography of longevity: past, present, and future trends", Experimental gerontology, vol. 35, no. 9, pp. 1111–1129.

World Health Organization. No date, Coronavirus,


Appendices

Appendix A: Equations

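Identification constraints

The LC model:

\[
\sum_{x} \beta_x = 1 \quad \text{and} \quad \sum_{t} \kappa_t = 0.
\]

The RH model:

\[
\sum_{x} \beta_x^{(1)} = 1, \quad \sum_{t} \kappa_t^{(1)} = 0 \quad \text{and} \quad \sum_{c=t_1-x_k}^{t_n-x_1} \gamma_c = 0.
\]

The APC model:

\[
\sum_{t} \kappa_t^{(1)} = 0, \quad \sum_{c=t_1-x_k}^{t_n-x_1} \gamma_c = 0 \quad \text{and} \quad \sum_{c=t_1-x_k}^{t_n-x_1} c\,\gamma_c = 0.
\]

The M6, M7 and M8 models:

\[
\sum_{c=t_1-x_k}^{t_n-x_1} \gamma_c = 0, \quad \sum_{c=t_1-x_k}^{t_n-x_1} c\,\gamma_c = 0 \quad \text{and} \quad \sum_{c=t_1-x_k}^{t_n-x_1} c^2\gamma_c = 0.
\]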

The PLAT model:

Maximum Likelihood Estimation

Given that the number of deaths, $D_{x,t}$, at age $x$ in year $t$ follows a Poisson distribution, the log-likelihood function is:

\[
\ln L_{x,t}(\theta; D_{x,t}) = \ln\left(\frac{\lambda_{x,t}^{D_{x,t}}\, e^{-\lambda_{x,t}}}{D_{x,t}!}\right) = \ln\left(\lambda_{x,t}^{D_{x,t}}\, e^{-\lambda_{x,t}}\right) - \ln(D_{x,t}!) = D_{x,t}\ln(\lambda_{x,t}) - \lambda_{x,t} - \ln(D_{x,t}!),
\]

where $\theta$ is the set of parameters in the mortality projection model and $\lambda_{x,t}$ is the mean of the Poisson distribution, which is equal to $E_{x,t}\mu_{x,t}$. Given that the $D_{x,t}$ are independent, the joint log-likelihood can be written as:
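Under independence, the joint log-likelihood is the sum of the per-cell terms given above. A minimal Python sketch of that computation follows; the function names are hypothetical, and the inputs are plain lists with one entry per (age, year) cell rather than the data structures of any particular mortality package.

```python
import math

def poisson_loglik(deaths, lam):
    """ln L = D ln(lambda) - lambda - ln(D!) for one (age, year) cell."""
    # math.lgamma(D + 1) equals ln(D!) and avoids overflow for large counts.
    return deaths * math.log(lam) - lam - math.lgamma(deaths + 1)

def joint_loglik(deaths, exposures, rates):
    """Sum the per-cell terms over all cells, with lambda = E * mu."""
    return sum(poisson_loglik(d, e * m)
               for d, e, m in zip(deaths, exposures, rates))

# Invented numbers for two cells, for illustration only.
example = joint_loglik(deaths=[12, 30],
                       exposures=[1000.0, 1500.0],
                       rates=[0.010, 0.022])
print(example)
```

Maximising this sum over the model parameters, which enter through the death rates, gives the maximum likelihood estimates.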


Appendix B: Residual plots

Figure B1. Scaled deviance residuals of the candidate models for fitting periods


Figure B2. Scaled deviance residuals of the candidate models for fitting periods


Figure B3. Scaled deviance residuals of the candidate models for fitting periods


Figure B4. Scaled deviance residuals of the candidate models for fitting periods


Appendix C: Identification of ARIMA model

Figure C1. Mortality index of the LC model forecasted as a random walk with drift.


Figure C3. Mortality index of the RH model forecasted as a random walk with drift.


Figure C5. Cohort index of the RH model forecasted as an ARIMA (0, 2, 2).
