
School of Education, Culture and Communication

Division of Mathematics and Physics

MASTER’S DEGREE PROJECT IN MATHEMATICS

Forecasting Mortality Rates using the Weighted Hyndman-Ullah Method

by

Anthony Kojo Ramos

MAA515 — Examensarbete i matematik för masterexamen

DIVISION OF MATHEMATICS AND PHYSICS

MÄLARDALEN UNIVERSITY


School of Education, Culture and Communication

Division of Mathematics and Physics

MAA515 — Master’s Degree Project in Mathematics

Date of presentation: 4th June 2021

Project name:

Forecasting Mortality Rates using the Weighted Hyndman-Ullah Method

Author:

Anthony Kojo Ramos

Version: 12th June 2021
Supervisor: Milica Rančić
Reviewer: Ying Ni
Examiner: Anatoliy Malyarenko
Comprising: 30 ECTS credits
Course name: Degree Project in Mathematics

Course code: MAA515

Abstract

The performance of three methods of mortality modelling and forecasting is compared: the basic Lee–Carter and two functional demographic models, the basic Hyndman–Ullah and the weighted Hyndman–Ullah. Using age-specific data from the Human Mortality Database [6] for two developed countries, France and the UK (England&Wales), these methods are compared through within-sample forecasting for the years 1999-2018. The weighted Hyndman–Ullah method is judged superior among the three methods through a comparison of mean forecast errors and qualitative inspection of the results for the selected countries. The weighted HU method is then used to conduct a 32-year-ahead forecast to the year 2050.

Keywords: Mortality modelling, mortality forecasting, Lee–Carter method, singular value decomposition, functional demographic models, functional principal components analysis, mean forecast errors, ARIMA.


Acknowledgements

First and foremost, I am grateful to God for the good health and well–being I had in this period of writing the thesis. I would like to thank my supervisor, Milica Rančić for the efforts, patience, motivation and steering she provided during the writing process. I would also like to thank Dr. Ying Ni, my reviewer for her meticulous support. I would like to thank the examiner, Professor Anatoliy Malyarenko for his noticeable additions throughout the process and my studies in general. Finally, to Abena Asantewaa and Nana Esi, thank you for your prayers, belief and encouragement.

Contents

Acknowledgements
List of Figures
List of Tables
1 Introduction
  1.1 Mortality Modelling
  1.2 Mortality Forecasting
  1.3 Project Description
2 Methodology
  2.1 Actuarial Concepts
  2.2 Mathematical Concepts
    2.2.1 Singular Value Decomposition
    2.2.2 Linear Time Series
    2.2.3 Functional Principal Component Analysis (fPCA)
  2.3 The Lee-Carter (LC) Model
    2.3.1 The Model
    2.3.2 Estimation
    2.3.3 Forecasting
  2.4 The Hyndman-Ullah (HU) Models
    2.4.1 The Model
    2.4.2 Estimation of Parameters
    2.4.3 Weighted HU Model
    2.4.4 Forecasting
3 Numerical Experiments
  3.1 The Data
  3.2 Performance Evaluation
  3.3 Results and Discussions
    3.3.1 Decreasing Mortality Rates
    3.3.2 Fitting the Historical Data
    3.3.3 Within-Sample Forecast
    3.3.4 Out-of-Sample Forecast
4 Conclusions and Future Work
  4.1 Conclusion
  4.2 Future Work
Bibliography
A R CODE
B ARIMA Models
  B.1 Forecast coefficients - France
  B.2 Forecast coefficients - UK (England&Wales)
C Thesis Objectives

List of Figures

3.1 Historical total mortality rates for France and U.K. (England & Wales) 1900-1950
3.2 Decreasing mortality rates for France and UK (England&Wales)
3.3 Basis function and associated coefficient for the UK total mortality data - LC method
3.4 Basis function and associated coefficient for the French total mortality data - LC method
3.5 Basis functions and associated coefficients for the UK total mortality data - basic HU method
3.6 Basis functions and associated coefficients for France total mortality data - basic HU method
3.7 Basis functions and associated coefficients for the UK total mortality data - weighted HU method
3.8 Basis functions and associated coefficients for France total mortality data - weighted HU method
3.9 France actual and within-sample forecasted mortality rates for selected years
3.10 UK actual and within-sample forecasted mortality rates for selected years
3.11 Actual (1987-2018) and Forecast (2019-2052) mortality rates using weighted HU method for the France total population
3.12 Actual (1987-2018) and Forecast (2019-2052) mortality rates using weighted HU method for UK's (England&Wales) total population
3.13 Weighted HU model and forecast, France total mortality
3.14 Weighted HU model and forecast, UK total mortality

List of Tables

3.1 Percentage of variation explained in the France total population
3.2 Percentage of variation explained in the UK total population
3.3 Mean forecast error measures in the France total population
3.4 Mean forecast error measures in the UK total population


Chapter 1

Introduction

For actuaries, insurance companies, pension fund managers, demographers and government policy makers, mortality data serve as an important basis for decision making. Actuaries use them to conduct actuarial valuations and to evaluate the projected financial health of the companies they work for. Managers in the insurance sector use these data to set the premium levels that customers have to pay. Pension fund managers require mortality data to know the trends in the population and how long retirees are likely to live after their productive years. The data also help managers to account for the appropriate financing of future liabilities, however uncertain they may be. The same applies to demographers, who use mortality forecasting for population projection, and to policy makers, who make economic decisions in relation to the structure of the population. With the inevitability of pandemics like the Spanish flu and COVID-19, and of wars, governments require mortality data and forecasts. They also use them to plan social security and health care programs, since an additional year of life expectancy causes a considerable increase in expenditure in these areas of the economy. It is for these reasons that the academic field of mortality prediction and forecasting has garnered massive interest.

With the above stated, and with the improvements in technology, health care systems and lifestyle patterns in the latter years of the past century, mortality modelling and forecasting has become a major focus among researchers.

1.1 Mortality Modelling

John Graunt in 1662, at the onset of the bubonic plague, used the available "Bills of Mortality" to establish birth and mortality rates as well as the magnitude of the population of England [11]. The bills of mortality contained information relating to the births, deaths and their causes among the population under study, England. From these observations, Graunt was able to create a life table and indicate the age of death among deceased members of the population. Thus Graunt conjured the concept of the life table based on mortality investigations. His work however had setbacks, according to [13], in that the relationship between the life table, the age distribution and the size of the population was not fully acknowledged.

Another early milestone was the mortality modelling of de Moivre in 1725 [14]. De Moivre suggested a survival model which was a straightforward law of mortality based on a linear survival function. He indicated that a survival function gives the probability that an individual aged $x$ will survive beyond a designated time. Per his model, such an individual has a future lifetime assumed to follow a uniform distribution on the interval $(0, \omega - x)$, where $\omega$ is the ultimate age, which de Moivre chose to be 86.

De Moivre's survival function, explained in Definition 5, is given as
$$S(x) = 1 - \frac{x}{\omega}, \qquad 0 \le x \le \omega.$$

According to [13], subsequent work was done by several researchers to improve upon the work of de Moivre in the field of mortality modelling. None was distinct until the work of Benjamin Gompertz in 1825, which is believed to be the "beginning of a new era". Gompertz, in [12], put forward the "law" of mortality and established the idea that $l_x$ of the life table is a continuous mathematical function connected to the underlying force of mortality. Until then, the table had only held records of survivors from an initial cohort.

The model of the Gompertz law/force of mortality is given as

$$\mu_x = B c^x, \qquad x > 0, \qquad (1.1)$$

where $B$ and $c$ are constants such that $0 < B < 1$ and $c > 1$. The full definition of the force of mortality is given in Definition 6.

His proposal for this model, according to [15], was a result of the assumption that death arises from two co-existing causes: firstly, by chance, without previous disposition to death; secondly, by deterioration and the inability of an individual to withstand destruction.

If only the latter of the two causes were operational, equation (1.1) would describe the force/intensity of mortality. If, however, only the first cause were operational, the number of survivors from a previously selected cohort would decrease geometrically and the intensity of mortality would be constant, i.e. $\mu_x = A$. The Gompertz law, though simple, was not able to portray mortality over the life span of all ages. The mortality tables he created had age ranges of (10-50) or (15-55).

Considering the above-mentioned two causes of death, William Makeham in [26] rendered modifications to the Gompertz law. Makeham, in linking the two causes of death concurrently to mathematical expressions, asserted that some deaths will increase with age while others are roughly independent of age.

An extension of the Gompertz law to Makeham's law gives a model for the force of mortality of the form

$$\mu_x = A + B c^x. \qquad (1.2)$$

The extra fixed term, $A$, which is independent of the age of an individual, helps to improve the modelling of death at younger ages [8]. This model proposed by Makeham has been in use since its inception, as recently as the CSO 1941 (US) table, where a life table was modelled for ages between 15 and 95.


Subsequently, there have been extensions to the Gompertz and Makeham models, such as the Double Geometric [10], and new models have also been proposed. According to [10], Opperman proposed a 3-parameter model and Thiele a 7-parameter model to take into consideration the three stages of human life: infancy, the "accidental hump" years and adulthood. Later models include the Weibull, the Modified Perks [10] and the Lundengård et al. model [25], which is based on a power-exponential function.

1.2 Mortality Forecasting

With this level of research and recent developments, mortality modelling alone was no longer enough. This brought the forecasting of mortality rates to the fore over the last 40 years. A proper mortality forecast has economic advantages for policy makers as well as insurance and pension fund managers. With the accelerated ageing of the populations of many countries, particularly in the West, mortality forecasts have become extremely important in that actuaries are able to develop models for greater competitiveness on the market. They also allow users of these forecast methods to meet the needs of the population as a whole.

In [4], the authors indicate the existence of three approaches to forecasting of mortality. It is of importance to note however that these approaches are no longer used in isolation: a combination often yields a better forecast. These three are briefly explained as follows:

Expectation: Experts’ opinions form the basis of the expectation methods. They could be information extracted from surveys (e.g. the number of children women in a population wish to have) or demographic developments based on expert opinions [1]. An advantage of expert opinion is the incorporation of demographic, medical and other important information. It is however condemned for its subjectivity. Experts in most cases tend to be conservative in their mortality forecasts and have a bias to forecasting smaller declines. When these forecasts are compared to the other methods (extrapolation) and the actual reality, they always tend to fall short.

Explanation: The explanation approach to forecasting centers on structural epidemiological models [4]. It incorporates medical and demographic data, though embedding these factors into the forecasting approach is not easily achieved. This method of forecasting is often regarded as ideal because essential feedback and risk factors, which are ever present in the life of individuals in a population, are included. The models then have the ability to describe the population structure but cannot effectively produce long-term forecasts. Generally, the explanatory models are regression-based and only differ from the regression-based extrapolation models because of the inclusion of risk factors or explanatory variables.

Extrapolation: This forms the basis of, and is the most frequent approach to, mortality forecasting. The basic assumption proponents of this approach make is that the future will be a continuation of the past [1], and as such no information relating to recent developments in medicine, lifestyle and so on is included in the formulation of these types of models.

According to [4], the simplest of the extrapolation methods is the linear continuation of the trend in life expectancy. This method shows accuracy, though it only satisfies short-term forecasts.

A mathematical concept frequently employed in extrapolative forecasting is time series methods. ARIMA models, whether univariate or multivariate, are utilized depending on the number of parameters to be used. The underlying models range from zero-factor models to models with several factors.

For the zero-factor models, the authors of [7] employed this approach in 2005 to model aggregate mortality. They used first differences of age-specific log death rates and achieved an estimation using iteration. These models are good and compare well with the benchmark Lee-Carter model. The Lee-Carter model is a typical example of a two-factor model [23]. Its authors used principal components analysis to identify mortality components; the two factors here are age and time. Since [23], several modifications and extensions have been made in this regard, all having the Lee-Carter model as the benchmark. Some extensions are the models proposed by Li and Lee in [24], Renshaw and Haberman in [30], Hyndman and Ullah in [19] and Hyndman and Shang in [18]. All the above listed methods utilize the consistency which occurs in both time trends and age patterns.

A drawback of the extrapolation approach to forecasting is that historical patterns are not always the best guide to the future, especially in the case of mortality. An important consideration for improving the quality of prediction is the length of the historical period (data) being used.

1.3 Project Description

In the academic literature, research comparing the basic Lee–Carter, basic Hyndman–Ullah and weighted Hyndman–Ullah methods for the total populations of the UK and France is non-existent.

In [18], the authors, in addition to proposing the method of the weighted Hyndman–Ullah, used age specific French female mortality rates from 1816 to 2006. In terms of comparison of methods in the existing literature, the authors in [9] carried out an investigation between the basic Hyndman–Ullah and the benchmark Lee–Carter for selected Asian countries. Due to the limitation of available data in some of the selected countries, a comprehensive performance evaluation could not be achieved.

In [28], the author conducts a comparison of only the first- and second-order Lee–Carter methods against the robust Hyndman–Ullah method, using the mortality rates for Sweden. In [32], where the authors compare point and interval forecast accuracy using the data of 14 developed countries, the range of the dataset for some of the selected countries included data from the 18th century. There was therefore no uniformity in the starting dates of the data used. Also, there has since been a marked improvement in technology and healthcare, to the extent that events which caused massive deaths during that period have become almost nonexistent.

In this thesis, however, the weighted Hyndman-Ullah method of forecasting is explained in detail and applied to the datasets of two countries, the UK (England&Wales) and France total populations. The outcomes are compared to the basic Hyndman–Ullah as well as the benchmark Lee–Carter approach. Finally, the age range considered here is 0-100, unlike in the case of [32]. This is done to depict the robustness of the methods under study and to measure their performance in the face of the fluctuations which occur at higher ages up to 100. The thesis is organized into four chapters.

Chapter 1 gives an introduction to mortality modelling and forecasting and delves into a review of the methods for forecasting mortality rates.

Chapter 2 centers on methodology where a detailed description will be made of the basic Lee–Carter, basic Hyndman–Ullah and the weighted Hyndman–Ullah method. This will entail the assumptions and formulations used for these methods.

In Chapter 3, a description of the database used is given. The analysis carried out in the thesis, together with the accompanying results, is also presented in this chapter. Finally, there is a discussion of the results from the three methods used in forecasting the selected datasets.

The thesis ends in Chapter 4 with the conclusions arrived at during its development and possible recommendations for future work.


Chapter 2

Methodology

2.1 Actuarial Concepts

Mortality forecasting is closely linked to some basic actuarial concepts. Some of these fundamental concepts, which will be used throughout this thesis, are detailed below, including their definitions, theorems and properties. A more in-depth treatment of the theory can be found in the book [8].

To describe the unpredictability of age-at-death in probabilistic terms, let $(x)$ denote a life aged $x$, where $x \ge 0$.

Definition 1. Given a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, let $X$ be a mapping from $\Omega$ to an interval $I$ of the real line $\mathbb{R}$. The mapping $X$ is said to be a random variable if, for any $a < b$, $\{\omega : a < X(\omega) \le b\} \in \mathcal{F}$. In this case, $X$ is said to be $\mathcal{F}$-measurable [20]. $\Omega$ is referred to as the sample space, $\omega$ as an elementary event which belongs to the sample space, and $\mathbb{P}$ is a probability measure.

Thus a random variable 𝑋 helps map a sample space into the real numbers, with the property that for every outcome, there is an associated probability 𝑃𝑟 [𝑋 ≤ 𝑥] which exists for all real values of 𝑥.

Definition 2. The random variables $X_1, X_2, \dots, X_n$ are said to be independent and identically distributed, hereafter iid, if they have the same probability distribution and are independent of each other.

Thus each random variable has the same probability as the others and are all mutually independent.

Definition 3. $T_x$ is a continuous random variable which represents the future lifetime of $(x)$. That is to say, the death of $(x)$ can only occur at an age greater than $x$. The quantity $x + T_x$ then denotes the age-at-death random variable for $(x)$. A special case is $T_0$, which indicates the continuous future lifetime of a newborn.

Definition 4. The lifetime distribution is the distribution function of the future lifetime of $(x)$, $T_x$. Denoting the distribution function by $F_x$, it is given in probabilistic terms as
$$F_x(t) = \Pr[T_x \le t].$$
$F_x(t)$ therefore indicates the probability that $(x)$ dies between age $x$ and $x + t$.

This is of more interest to statisticians and mathematicians. For actuaries and demographers however, the probability of survival is rather of more interest. This led to the survival function. Let 𝑆𝑥 denote the survival function.

Definition 5. For any positive $t$, the survival function $S_x(t)$ is the probability that $(x)$ will attain age $x + t$. Mathematically, it is given as
$$S_x(t) = 1 - F_x(t) = 1 - \Pr[T_x \le t] = \Pr[T_x > t].$$
Generally, $S_x(t)$ gives a number, in probability terms, to the possibility that an individual aged $x$ will live beyond $x + t$ years. A special case is $S_0(t)$, which represents the survival function of a newborn.

The probability of $(x)$ dying between ages $x + t$ and $x + z$ can hence be derived using either the distribution function $F_x(t)$ or the survival function $S_x(t)$ [5]:
$$\Pr[t < T_x \le z] = F_x(z) - F_x(t) = S_x(t) - S_x(z).$$

The standard survival function fulfills four necessary and sufficient conditions according to [8], and they are listed below:

• $S_x(0) = 1$: the probability that a life aged $x$ survives 0 years is 1,
• $\lim_{t\to\infty} S_x(t) = 0$: the probability that $(x)$ survives indefinitely is 0,
• the survival function is a non-increasing function of $t$,
• the survival function is right continuous.

In actuarial science, probability statements about 𝑇𝑥 are made with reference to standard actuarial notation. Therefore, the above defined survival function can be restated in actuarial terms as follows:

$${}_t p_x = S_x(t) = \Pr[T_x > t],$$

i.e. the probability that $(x)$ survives beyond age $x + t$. When $t = 1$, the subscript is dropped and the symbol becomes simply $p_x$: $\Pr[(x)\text{ will attain age } x + 1]$.

The lifetime distribution is described in actuarial terms as
$${}_t q_x = F_x(t) = \Pr[T_x \le t].$$

The symbol ${}_t q_x$ can be interpreted as the probability that $(x)$ will die within $t$ years.

Similarly, when $t = 1$, the 1 is dropped and the symbol becomes $q_x$: $\Pr[(x)\text{ will die within 1 year}]$. Per Definition 5, the relation ${}_t p_x + {}_t q_x = 1$ can be established.
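As a small illustration of this notation, the following is a minimal R sketch (R being the implementation language of this thesis) evaluating ${}_t p_x$ and ${}_t q_x$ under de Moivre's law from Section 1.1 with $\omega = 86$. It assumes the standard identity $S_x(t) = S_0(x+t)/S_0(x)$; the chosen age and horizon are arbitrary.

    # de Moivre's law S(x) = 1 - x/omega with omega = 86 (Section 1.1),
    # used to illustrate the notation tpx and tqx defined above.
    omega <- 86
    S0 <- function(x) pmax(1 - x / omega, 0)   # survival function of a newborn, S_0(x)

    # Assumed identity: S_x(t) = S_0(x + t) / S_0(x)
    Sx <- function(x, t) S0(x + t) / S0(x)

    tpx <- Sx(30, 10)        # probability that (30) survives 10 more years
    tqx <- 1 - tpx           # probability that (30) dies within 10 years
    c(tpx = tpx, tqx = tqx)  # the two sum to 1, as established above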


Definition 6. Let $\mu_x$ denote the force of mortality for an individual aged $x$. This is defined mathematically as
$$\mu_x = \lim_{dx \to 0^+} \frac{1}{dx} \Pr[T_0 \le x + dx \mid T_0 > x].$$
It describes the instantaneous rate of death in the time period $dx$.

In terms of the survival function $S_x$ of Definition 5, the force of mortality can be expressed as
$$\mu_x = \lim_{dx \to 0^+} \frac{1}{dx}\bigl(1 - S_x(dx)\bigr).$$
Let $m(x, t)$ be the central death rate for age $x$ in year $t$ [23].

Definition 7. The central death rate $m(x,t)$ is the ratio of the average number of deaths in a given period to the average number alive within that same period. It can be defined mathematically as
$$m(x, t) = \frac{d_x}{L_x},$$
where $d_x$ is the number of deaths of individuals aged $x$ within year $t$ and $L_x$ is the average number of the population alive at age $x$ during the same year $t$.

2.2 Mathematical Concepts

The modelling and forecasting methods of the Lee-Carter (LC), the Hyndman-Ullah (HU) and the weighted Hyndman-Ullah (wHU) require specific mathematical knowledge and theory, which is treated below.

2.2.1 Singular Value Decomposition

The Singular Value Decomposition, hereafter called SVD, is one of the fundamental and principal tools of modern numerical analysis, especially in numerical linear algebra. It is a dimensional reduction tool and a data pre-processing technique which is essential in the Lee-Carter model of forecasting. The SVD theorem according to [21], is stated as:

Theorem 1. Every matrix $\mathbf{A} \in M_{m \times n}$ can be factorized as
$$\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T,$$
where

• $\mathbf{A}$ is any $m \times n$ matrix,
• $\mathbf{U}$ is an $m \times m$ unitary matrix with columns $\theta_1, \dots, \theta_m$,
• $\mathbf{V}$ is an $n \times n$ unitary matrix with columns $\omega_1, \dots, \omega_n$,
• $\boldsymbol{\Sigma} = \begin{pmatrix} S_r & 0 \\ 0 & 0 \end{pmatrix}$, with $S_r$ a diagonal matrix and $r = \mathrm{rank}(\mathbf{A})$.

The principal diagonal elements of $S_r$, $(\rho_1, \dots, \rho_r)$, are referred to as singular values and are uniquely determined by the matrix $\mathbf{A}$.


Remarks

• The columns of $\mathbf{U}$ are the orthonormal eigenvectors of $\mathbf{A}\mathbf{A}^T$ (the left singular vectors of $\mathbf{A}$).
• The columns of $\mathbf{V}$ are the orthonormal eigenvectors of $\mathbf{A}^T\mathbf{A}$ (the right singular vectors of $\mathbf{A}$).
• The singular values on the diagonal of $\boldsymbol{\Sigma}$ are the square roots of the non-zero eigenvalues of both $\mathbf{A}^T\mathbf{A}$ and $\mathbf{A}\mathbf{A}^T$.

The SVD method is used in the LC model to determine the least-squares solution of a couple of parameters in the model when it is applied to the matrix of logarithms of the mortality rates after the averages over time have been subtracted [23].
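For illustration, Theorem 1 can be checked numerically with base R's svd() function; the small matrix below is arbitrary example data, not mortality data.

    # Singular value decomposition A = U Sigma V^T with base R's svd()
    A <- matrix(c(3, 1, 0,
                  2, 4, 1), nrow = 3, ncol = 2)

    dec <- svd(A)      # dec$d: singular values; dec$u, dec$v: left/right singular vectors
    dec$d              # square roots of the non-zero eigenvalues of t(A) %*% A

    # Reconstruct A from its factors to verify the factorization
    A_rec <- dec$u %*% diag(dec$d) %*% t(dec$v)
    all.equal(A, A_rec)   # TRUE up to numerical tolerance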

2.2.2 Linear Time Series

After using the SVD to obtain the parameters of the LC mortality model, the next step taken by the authors of [23] is to forecast mortality rates. They use linear time series models, specifically a random walk with drift, to perform the forecast. Some characteristics and types of linear time series models are described below.

Definition 8. A time series $\varepsilon_t$ is referred to as a white noise if $\{\varepsilon_t\}$ is a sequence of independent and identically distributed (iid) random variables with finite mean and variance. A special case, known as Gaussian white noise, is when $\varepsilon_t$ is normally distributed with mean zero and variance $\sigma^2$.

Definition 9. The time series $k_t$ is linear if it can be expressed as
$$k_t = \mu + \sum_{i=0}^{\infty} \psi_i \varepsilon_{t-i},$$
where $\mu$ is the mean of the series, $\psi_i$ are the weights of $k_t$ and $\varepsilon_t$ is a sequence of iid random variables with mean zero (i.e. a white noise) [35]. In this case, $\psi_0 = 1$ and $\varepsilon_t$ indicates the new information at time $t$ of the time series.

Definition 10. A simple model possessing predictive power, of the form
$$k_t = \phi_0 + \phi_1 k_{t-1} + \varepsilon_t,$$
is referred to as an autoregressive (AR) model of order 1, or an AR(1) model, with $\varepsilon_t$ assumed to be a white noise, $\phi_0$ a constant and $\phi_1$ a parameter of the AR model [35]. This model bears resemblance to the simple linear regression model, with $k_t$ representing the dependent variable and $k_{t-1}$ the explanatory variable. In situations where $k_{t-1}$ alone cannot serve as a basis for modelling $k_t$, a more flexible model, the AR(p), which is a generalization of the AR(1) model, is used.

The AR(p) model is given as
$$k_t = \phi_0 + \phi_1 k_{t-1} + \dots + \phi_p k_{t-p} + \varepsilon_t,$$
where $p$ is a non-negative integer and $\varepsilon_t$ is a white noise [35]. This implies that the past $p$ values $k_{t-i}$ ($i = 1, \dots, p$) serve as the explanatory variables in determining the current value of $k_t$. Thus the general AR(p) process is modelled as a linear combination of past values of the variable itself.

Definition 11. The general form of a moving average model of order 1, or MA(1) model, is given as
$$k_t = c_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1},$$
where $c_0$ is a constant, $\theta_1$ is the parameter of the model and $\varepsilon_t$ is a white noise series [35]. The MA model is at times regarded as an extension of the white noise series, and the constant $c_0$ is the mean of the series.

The MA model generalized to order $q > 0$ is the MA($q$) model, given as
$$k_t = c_0 + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q}.$$

Definition 12. The author in [35] describes a time series $k_t$ as a random walk if
$$k_t = k_{t-1} + \varepsilon_t,$$
where $k_0$ is a real number indicating the starting point of the series and $\varepsilon_t$ is a white noise series. The random walk is a typical example of a non-stationary series, like the price series of an asset, a foreign exchange rate, and others. With $\varepsilon_t$ being a white noise and iid, the value of $k_t$ may go up or down in a random manner.

Definition 13. A time series $k_t$ is a random walk with drift if
$$k_t = \theta + k_{t-1} + \varepsilon_t, \qquad (2.1)$$
where $\theta = E[k_t - k_{t-1}]$ is a constant and $\varepsilon_t$ is a zero-mean white noise series [35].

For a random walk with drift, the constant term becomes the time slope of the series, unlike in the case of an MA($q$) model where the constant is the mean of the series.

Differencing is a process of stabilizing the mean of a time series by removing changes in the level of a time series, thereby eliminating trend and seasonality.

Definition 14. A differenced time series is the change between consecutive observations in the original series and is written as [16]:
$$k'_t = k_t - k_{t-1}.$$
The differenced series will only have $T - 1$ values, since the first value $k'_1$ is impossible to calculate.

A combination of the autoregressive and moving average process yields an ARMA( 𝑝, 𝑞) model. However, with a non-stationary time series, differencing is required to make the series stationary. If there are autoregressive and moving average components to the differenced series, it may be modelled as an ARIMA(𝑝, 𝑑, 𝑞) as defined below.

Definition 15. The linear time series $\{k_t\}$ is said to be an ARIMA($p, d, q$) if it can be written as [16]:
$$k'_t = c + \sum_{i=1}^{p} \phi_i k'_{t-i} + \varepsilon_t + \sum_{i=1}^{q} \theta_i \varepsilon_{t-i}, \qquad (2.2)$$
where $k'_t$ is the differenced series. As before, $p$ is the order of the autoregressive process, $d$ is the order of differencing required to make the series stationary and $q$ is the order of the moving average process.

In effect, the AR and MA models are special cases of the ARIMA model. It includes both lagged values of $k_t$ and lagged errors.

For an ARIMA(1,0,0) or AR(1) model:

• when $\phi_1 = 0$, $k_t$ is equivalent to white noise,
• when $\phi_1 = 1$ and $c = 0$, $k_t$ is equivalent to a random walk,
• when $\phi_1 = 1$ and $c \ne 0$, $k_t$ is equivalent to a random walk with drift,
• when $\phi_1 < 0$, $k_t$ tends to oscillate around the mean.

The ARIMA model is utilized in both the LC and the HU models for making forecasts using historical data. In the LC model, the random walk with drift, as defined above, was found to best suit the US population data. However, the order of the particular ARIMA model to select differs from country to country, though this can easily be determined using the demography package in R [17].
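As a hedged sketch of how such an order can be chosen in practice, the code below uses auto.arima() from the forecast package for R on a simulated series that mimics a declining mortality index; the simulated series and the chosen drift are illustrative assumptions, not results from the thesis data.

    library(forecast)   # provides auto.arima() and forecast()

    set.seed(1)
    # Simulated index: a random walk with negative drift, mimicking a declining k_t
    k_t <- ts(cumsum(rnorm(100, mean = -0.5, sd = 1)), start = 1919)

    fit <- auto.arima(k_t)   # selects the ARIMA(p, d, q) order by an information criterion
    summary(fit)

    fc <- forecast(fit, h = 20)   # 20-year-ahead forecast with prediction intervals
    plot(fc)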

2.2.3 Functional Principal Component Analysis (fPCA)

Principal Component Analysis (PCA) is a systematic procedure to the exploration of variability in multivariate data [33]. Eigenvalue decomposition of the variance matrix is used in PCA to find directions in the data where highest variability occurs. Each principal component produces a loading/weight vector which indicates the direction of variability corresponding to that component.

In the HU model, however, fPCA is used because the smoothed function $f_t(x)$ is continuous, and ordinary PCA methods, designed for discretely observed functions, become inapplicable.

Functional PCA, as proposed by J.O. Ramsay and C.J. Dalzell in [29], is constructed on the smoothed function by maximizing the variance within predictors to achieve minimal loss of information. In practice, the smooth function $f_t(x)$ is discretized on a dense grid of $q$ equally spaced points $\{x_1, x_2, \dots, x_q\}$ that covers the interval $[x_1, x_p]$ [18]. Here, $f_t(x)$ represents the underlying continuous and smooth function for age $x$ in year $t$. With the discretized data, SVD methods, as treated in Subsection 2.2.1, can then be applied to obtain the principal components (orthogonal basis functions) $\phi_j(x)$ together with their corresponding scores.

The derivation of these values is described below.

Let $\mathbf{G}^*$, an $n \times q$ matrix, denote the discretized smooth functions $f_t(x)$, and let $\mathbf{G} = \mathbf{W}\mathbf{G}^*$, where $\mathbf{W} = \mathrm{diag}(w_1, \dots, w_n)$. Applying SVD to $\mathbf{G}$ gives $\mathbf{G} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T$, where $\phi_k(x_j)$ is the $(j,k)$th element of $\mathbf{U}$. If $\mathbf{B} = \mathbf{G}\mathbf{U}$, then $\beta_{t,j}$ is the $(t,j)$th element of $\mathbf{B}$. Subsequent values of $\phi_j(x)$ are derived using linear interpolation.

Using a large enough 𝑞 in the discretization of the 𝑓𝑡(𝑥) function minimizes the need for interpolation, though this will increase the computational time of the SVD considerably.
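A minimal sketch of this discretize-then-SVD recipe is given below, with synthetic curves standing in for the smoothed log mortality functions $f_t(x)$. Note that with base R's svd() applied to an $n \times q$ matrix whose rows are years, the age-wise basis functions are taken from the right singular vectors, so the labelling may differ from the $\mathbf{U}$/$\mathbf{V}$ convention above.

    # Functional PCA via SVD on discretized curves (rows = years, columns = age grid).
    # The curves below are synthetic placeholders for the smoothed functions f_t(x).
    set.seed(2)
    n <- 50; q <- 101
    ages   <- seq(0, 100, length.out = q)
    curves <- outer(1:n, ages, function(t, x) -4 + 0.05 * x - 0.01 * t) +
              matrix(rnorm(n * q, sd = 0.05), n, q)

    mu_hat  <- colMeans(curves)              # mean function across years
    centred <- sweep(curves, 2, mu_hat)      # mean-adjusted curves
    dec     <- svd(centred)

    J    <- 2
    phi  <- dec$v[, 1:J]            # basis functions over age (right singular vectors here)
    beta <- centred %*% phi         # principal component scores beta_{t,j}

    cumsum(dec$d^2)[J] / sum(dec$d^2)   # proportion of variation explained by J components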

2.3 The Lee-Carter (LC) Model

In [23], the authors proposed a statistical method to model and extrapolate long-term trends in mortality rates. They used this method to make probabilistic forecast about the US mortality rates to the year 2065.

The Lee-Carter model is a two-factor model which is estimated by using selected principal components (age & time) through matrix decomposition. The use of the principal components and its surrounding approaches help to estimate the age patterns contained in the mortality data.

In [4], matrix decomposition is described as the identifier of age patterns and their importance over time. The time-related parameters, on the other hand, are used for forecasting through linear time series methods.

2.3.1 The Model

Definition 16. Given $m(x,t)$ in Definition 7 as the central death rate, the two-factor Lee-Carter model is given by [23]:
$$\ln[m(x,t)] = a_x + b_x k_t + \varepsilon_{x,t}, \qquad (2.3)$$
where

• $a_x$ is the age-specific average log mortality,
• $b_x$ is the rate of change at age $x$ in response to the level of mortality over time,
• $k_t$ is the level of mortality in year $t$, and
• $\varepsilon_{x,t}$ is the error term.

2.3.2 Estimation

In estimating the parameters of equation (2.3), the least squares method is used. To ensure the uniqueness of the solution that will be derived, the authors in [23] introduced constraints on the parameters 𝑏𝑥 and 𝑘𝑡 as follows:

$$\sum_{x=1}^{X} b_x = 1 \qquad \text{and} \qquad \sum_{t=1}^{T} k_t = 0,$$

where 𝑋 is the number of age groups and 𝑇 is the total number of years considered.

This reflects a normalization of the $b_x$ and $k_t$ parameters, so that $a_x$ is simply the average over time of $\ln[m(x,t)]$. Thus
$$\hat{a}_x = \frac{1}{T} \sum_{t=1}^{T} \ln[m(x,t)], \qquad (2.4)$$
where $T$ is as explained above. Ordinary regression methods, though least squares methods, are not used since there are no given regressors on the right-hand side of equation (2.3). The method preferred by the authors in [23] is the SVD. The SVD is used to find the least-squares solution when applied to the matrix of the logarithms of the rates after the parameter $\hat{a}_x$, already estimated using equation (2.4), is subtracted.

Letting M denote the resulting matrix after the subtraction, it is described mathematically as:

$$\mathbf{M} = \ln(m_{x,t}) - \hat{a}_x.$$

Carrying out SVD, as described in Subsection (2.2.1), on the new matrix M will yield a unique solution provided by the first right and left vectors as well as the leading value of the SVD.

Practically,
$$\mathrm{SVD}(\mathbf{M}) = \mathrm{SVD}\bigl(\ln(m_{x,t}) - \hat{a}_x\bigr) = \theta_1 \rho_1 \omega_1^T + \theta_2 \rho_2 \omega_2^T + \dots + \theta_n \rho_n \omega_n^T. \qquad (2.5)$$
Per equation (2.5), the estimator of the parameter $b_x$ is
$$\hat{b}_x = \theta_1,$$
and the estimator of the parameter $k_t$ is given as
$$\hat{k}_t = \rho_1 \omega_1^T.$$
These estimated parameters $\hat{a}_x$, $\hat{b}_x$ and $\hat{k}_t$ form a new matrix $\hat{\mathbf{M}}_{x,t}$ of logarithmic historical mortality rates for each age group and time period, given as in [28]:
$$\hat{\mathbf{M}}_{x,t} = \hat{a}_x + \hat{b}_x \hat{k}_t = \hat{a}_x + \theta_1 \rho_1 \omega_1^T,$$
where $\hat{a}_x$ is given by equation (2.4).

Remark: The realized estimated values of $k_t$, which minimize the errors in the log death rates, then have to be re-estimated so that the model reproduces the actual numbers of deaths.
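A minimal R sketch of this estimation procedure is given below; the matrix of log rates is a synthetic placeholder (ages in rows, years in columns), and the rescaling of the first singular vectors simply enforces the normalization constraints stated above.

    # Lee-Carter estimation (Section 2.3.2) on a synthetic matrix of log central death rates.
    set.seed(3)
    ages  <- 0:100; years <- 1900:1998
    logm  <- outer(ages, years, function(x, t) -8 + 0.07 * x - 0.01 * (t - 1900)) +
             matrix(rnorm(length(ages) * length(years), sd = 0.1),
                    length(ages), length(years))

    a_x <- rowMeans(logm)              # equation (2.4): age-specific average log mortality
    M   <- sweep(logm, 1, a_x)         # M = ln m(x,t) - a_x

    dec <- svd(M)
    b_x <- dec$u[, 1] / sum(dec$u[, 1])              # normalised so that sum(b_x) = 1
    k_t <- dec$d[1] * dec$v[, 1] * sum(dec$u[, 1])   # rescaled so that b_x * k_t is unchanged

    fitted_logm <- a_x + outer(b_x, k_t)   # rank-one fit: a_x + b_x k_t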


2.3.3 Forecasting

After fitting the historical data to the model by the estimation of the parameters above, the authors in [23] proceeded to make future predictions. The parameters ˆ𝑎𝑥 and ˆ𝑏𝑥 are constant per the model through the years while the mortality index ˆ𝑘𝑡, on the other hand varies as the years evolve.

Forecasting for future years is therefore centered on $\hat{k}_t$. Linear time series models are proposed for the forecasting, and for the U.S. mortality data the authors prescribed a random walk with drift. The random walk with drift is treated in Subsection 2.2.2. Reproducing equation (2.1),
$$\hat{k}_t = \hat{\theta} + \hat{k}_{t-1} + \hat{\varepsilon}_t.$$
The estimator $\hat{\theta}$ is the expected value of the first differences of the series, i.e. $\hat{\theta} = E[\hat{k}_t - \hat{k}_{t-1}]$. The error term $\varepsilon_t$ is a normally distributed white noise with zero mean and variance $\sigma^2$, i.e. $\varepsilon_t \sim \mathcal{N}(0, \sigma^2)$.

Due to the decreasing mortality rates over time, the constant drift $\hat{\theta}$ is expected to be negative. Because the drift does not depend on time, the expected value of the series decreases linearly as $t$ increases, while the variance of the forecast increases over time because of its dependence on the horizon.
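A minimal sketch of this forecast is shown below; the index series is simulated (it could equally be the $\hat{k}_t$ estimated in the previous sketch), the drift is estimated as the mean of the first differences, and the widening interval reflects the growing variance described above.

    # Random walk with drift forecast of the mortality index, as in equation (2.1).
    set.seed(4)
    k_hat <- cumsum(rnorm(99, mean = -0.4, sd = 0.8))   # stand-in estimated index

    n     <- length(k_hat)
    theta <- mean(diff(k_hat))      # estimated drift, E[k_t - k_{t-1}]
    sigma <- sd(diff(k_hat))        # standard deviation of the innovations

    h    <- 1:20
    k_fc <- k_hat[n] + h * theta    # point forecasts decline linearly with the negative drift
    k_se <- sigma * sqrt(h)         # forecast standard error grows with the horizon
    cbind(forecast = k_fc, lower95 = k_fc - 1.96 * k_se, upper95 = k_fc + 1.96 * k_se)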

The LC method has an advantage of simplicity. Also, the model shows that the parameters chosen explain a large proportion of the variability in mortality rates of developed countries [4].

A disadvantage is the assumption that $b_x$, the age-specific rate of mortality change, remains unchanged over time, even though it differs across ages.

2.4 The Hyndman-Ullah (HU) Models

Hyndman and Ullah in 2007 introduced a non-parametric method of mortality modelling and forecasting which is based on the functional data paradigm [19]. The mortality data are smoothed using weighted penalized regression splines across ages before modelling. In [32], the authors found that non-parametric methods are more robust, especially in the presence of outliers, and provide accurate forecasts.

The HU model, fundamentally, is an extension of the LC model in these ways [2]:

• the log mortality rates are smoothed before modelling,
• functional principal component analysis methods are used,
• more than one principal component is used, and
• the principal component scores are forecasted with methods more complex than the simple random walk with drift.

Basically, the HU method is a functional data model that uses higher-order principal components to take into account additional variation in mortality rates. The method is applicable to both fertility and mortality data. In the case of mortality data, penalized regression splines with a monotonicity constraint are first applied to smooth the log mortality rates.


2.4.1 The Model

As will be explained subsequently, the log mortality rates are smoothed in the HU model and hence are transformed into continuous variables, 𝑚𝑡(𝑥).

Definition 17. Let $y_t(x)$ be the log-transformed mortality rate for age $x$ in year $t$. Then, according to [19]:
$$y_t(x) = \log[m_t(x)] = f_t(x) + \sigma_t(x)\varepsilon_t, \qquad (2.6)$$
where
$$f_t(x) = \mu(x) + \sum_{j=1}^{J} \beta_{t,j}\,\phi_j(x) + e_t(x), \qquad (2.7)$$
for $t = 1, \dots, T$, where

• $t$ indexes the individual years,
• $T$ is the number of time periods fitted,
• $x$ is the age,
• $f_t(x)$ is the smoothed function of $x$,
• $\sigma_t(x)$ is the noise component, which varies with $x$,
• $\varepsilon_t$ is an iid standard normal random variable,
• $\mu(x)$ is the mean function across ages, estimated by $\hat{\mu}(x) = \frac{1}{n}\sum_{t=1}^{n} f_t(x)$,
• $\{\phi_1(x), \dots, \phi_J(x)\}$ is the set of the first $J$ principal components,
• $\{\beta_{t,1}, \dots, \beta_{t,J}\}$ is the set of uncorrelated principal component scores,
• $e_t(x)$ is the residual function with mean zero.

In [3], the need to incorporate multiple principal components is explained extensively. The authors then conclude that the additional components encapsulate non-random patterns that are not captured usually by the first principal component. Hyndman and Ullah in [19] found the optimal number of principal components to choose as 𝐽 = 6.

2.4.2 Estimation of Parameters

According to Hyndman and Ullah, the development of their model of mortality prediction and forecasting can be summarized in six steps [19]:

1. Using a non-parametric smoothing method, smooth the data for each $t$ to estimate $f_t(x)$ for $x \in [x_1, x_p]$.
2. Decompose the fitted curves using fPCA to obtain orthonormal basis functions, as in equation (2.7).
3. For each of the principal component scores $\{\beta_{t,j}\}$, fit a univariate time series model.
4. Using the fitted time series models, forecast the coefficients $\{\beta_{t,j}\}$ for $t = n+1, \dots, n+h$.
5. Use the forecasted coefficients to obtain forecasts of $f_t(x)$, $t = n+1, \dots, n+h$, through equation (2.7); from equation (2.6), these also serve as forecasts of $y_t(x)$.
6. The estimated variances of the error terms in equations (2.6) and (2.7) are used to compute prediction intervals for the forecasts.

To achieve step 1 of the HU model, weighted penalized regression splines with monotonic constraints are used to estimate the smooth function, 𝑓𝑡(𝑥). This non-parametric smoothing controls heterogeneity and reduces some of the randomness caused by 𝜎𝑡(𝑥).

Weighted Penalized Regression Splines

In [19], the authors proposed a method for smoothing mortality rates across ages in each year. The smoothing is obtained through the use of constrained weighted penalized regression splines, which pre-process the mortality data to reduce the noise therein before fitting the model. It is important to note that, after smoothing, the age $x$ is considered a continuous variable, and as such the continuous mortality rates are denoted $m_t(x)$.

Given that $m_t(x)$ represents the continuous observed mortality rate for age $x$ in year $t$, let $N_t(x)$ denote the total years of life lived between ages $x$ and $x+1$ in calendar year $t$, approximately the mid-year (30 June) population at age $x$ in year $t$. Then $m_t(x)$ is assumed to be binomially distributed, and the variance of $y_t(x) = \log[m_t(x)]$, through a Taylor approximation, is given as:

$$\hat{\sigma}_t^2(x) \approx \frac{1 - m_t(x)}{N_t(x)\, m_t(x)}.$$

Hyndman and Ullah then defined the weights to be used in the weighted penalised regression splines as the inverse variances:
$$w_t(x) = \frac{N_t(x)\, m_t(x)}{1 - m_t(x)}.$$

The method of weighted penalized regression splines is adopted for estimating the $f_t(x)$ curve because it can be computed quickly while allowing monotonicity constraints to be imposed quite simply. Furthermore, a qualitative constraint is applied with the aim of attaining better estimates of $f_t(x)$ when $\sigma_t(x)$ is large. The function $f_t(x)$ is assumed to be monotonically increasing above a certain age, i.e. for $x > c$ with $c = 50$ years. The reasoning for this constraint stems from the fact that the older an individual gets, the more likely the occurrence of death. The constraint therefore allows for the reduction of noise in the estimated curves at high ages.

This smoothing method is one-dimensional and allows for the application of univariate time series in the forecasting procedure [18]. It is important to note that an implementation of this particular smoothing method is available in the demography package for R [17]. The same R package [17] is used generally in the implementation part of this thesis.
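A hedged sketch of this workflow with the demography package [17] is given below. Function and argument names follow the package documentation to the best of recollection and should be verified against it (the authoritative code used in this thesis is in Appendix A); "username"/"password" are placeholder Human Mortality Database credentials, and the weight/beta arguments for the weighted variant are an assumption, as that variant may instead be provided by the related ftsa package.

    # Sketch of the workflow with the demography package [17]; verify names against
    # the package documentation. Appendix A contains the code actually used.
    library(demography)

    # "username" and "password" are placeholder HMD credentials.
    uk <- hmd.mx("GBRTENW", "username", "password", label = "England & Wales")
    uk <- extract.years(uk, years = 1900:2018)

    uk_smooth <- smooth.demogdata(uk)        # weighted penalised regression spline smoothing

    fit_lc <- lca(uk, series = "total")                     # basic Lee-Carter
    fit_hu <- fdm(uk_smooth, series = "total", order = 5)   # basic Hyndman-Ullah
    # Assumed arguments for the weighted variant; if unavailable in fdm(), the
    # weighted method may require the related ftsa package instead.
    fit_whu <- fdm(uk_smooth, series = "total", order = 5, weight = TRUE, beta = 0.05)

    fc_whu <- forecast(fit_whu, h = 32)      # 32-year-ahead forecast with prediction intervals
    plot(fc_whu)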

For the second step of the HU model, functional principal component analysis is used to decompose the smoothed function $f_t(x)$. Since the ages, though measured in discrete values, have been transformed into continuous variables, fPCA is used for the decomposition.

Weighted Functional Principal Components Analysis wfPCA

In this version of PCA, the mean function $\mu(x)$, as described in Definition 17, is estimated through a weighted average:
$$\hat{\mu}(x) = \sum_{t=1}^{n} w_t f_t(x), \qquad (2.8)$$
where $f_t(x)$ is a smoothed curve derived from $y_t(x)$ and $w_t = \kappa(1-\kappa)^{n-t}$ is a geometrically decaying weight with $0 < \kappa < 1$. The mean-adjusted functional data are then denoted as [18]:
$$\hat{f}_t^{\,*}(x) = \hat{f}_t(x) - \hat{\mu}(x).$$
Then, by applying fPCA to $\hat{f}_t^{\,*}(x)$, as proposed in [29] and treated in Subsection 2.2.3, the principal components $\{\phi_j(x)\}$ and their corresponding scores $\{\beta_{t,j}\}$ are obtained.

2.4.3 Weighted HU Model

With the inevitable occurrence of wars and pandemics, outliers are ever present in mortality data. A robust method was therefore proposed in [19] to forecast even in the presence of these outliers. The robust form of the HU method introduced the concept of weights: weights are assigned to outlying years to eliminate their possible influence on the modelling and forecast.

However, the weighted version of the HU method is another method, also proposed in [18]. In this version, later years are assigned higher weights during the model fitting than years from the distant past. The reasoning behind the assignment of weights stems from the fact that recent developments in mortality trends have greater importance and relevance for future predictions than distant developments and experiences. It can be expressed symbolically as in [2] and [19]:
$$f_t(x) = \hat{\mu}(x) + \sum_{j=1}^{J} \phi_j(x)\,\beta_{t,j} + e_t(x),$$
where $\hat{\mu}$ is as described in equation (2.8) and $\sum_{t=1}^{n} w_t = 1$.

The new weights, 𝑤𝑡, are determined through the optimal value of 𝜅 which is attained from the minimizing of the overall forecast error measure. These weights are used in the estimation of the functional principal components, thereby allowing the components to be based on current trends.
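The sketch below computes such geometrically decaying weights, $w_t = \kappa(1-\kappa)^{n-t}$, for an illustrative $\kappa$; in practice $\kappa$ is chosen by minimizing an overall forecast error measure, as stated above.

    # Geometrically decaying weights w_t = kappa * (1 - kappa)^(n - t); kappa = 0.05 is
    # only an illustrative value, in practice it is tuned on forecast errors.
    n     <- 99                 # number of fitting years, e.g. 1900-1998
    kappa <- 0.05
    t_idx <- 1:n
    w_t   <- kappa * (1 - kappa)^(n - t_idx)

    sum(w_t)            # close to 1 (exactly 1 after normalising by sum(w_t))
    tail(w_t)           # the most recent years receive the largest weights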


2.4.4 Forecasting

In contrast to the LC model, where forecasting is based on a single principal component, the HU model conducts forecasting on higher-order principal components, in this case of order 6. Time series models are employed to forecast the principal component scores. This is achieved by selecting the optimal time series model, using standard model-selection procedures (e.g. the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC)), for each of the components. The AIC and BIC are model identification tools where models with small values are considered good models.

A combination of equations (2.6) and (2.7) yield a single compact equation of the form:

𝑦𝑡(𝑥𝑖) = 𝜇(𝑥𝑖) + 6 Õ

𝑗=1

𝛽𝑡 , 𝑗𝜙𝑗(𝑥) + 𝑒𝑡(𝑥𝑖) + 𝜎𝑡(𝑥𝑖)𝜀𝑡 ,𝑖.

By conditioning on the observed data $\mathcal{I} = \{y_t(x_i);\; t = 1, \dots, n;\; i = 1, \dots, p\}$ and the set of basis functions $\boldsymbol{\Phi}$, the $h$-step-ahead forecast of $y_{n+h}(x)$ is given as [19]:
$$\hat{y}_{n,h}(x) = E[y_{n+h}(x) \mid \mathcal{I}, \boldsymbol{\Phi}] = \hat{\mu}(x) + \sum_{j=1}^{6} \hat{\beta}_{n,j,h}\,\hat{\phi}_j(x),$$
where $\hat{\beta}_{n,j,h}$ represents the $h$-step-ahead forecast of $\beta_{n+h,j}$ using the univariate time series model fitted to the $j$th score series.
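A minimal sketch of this final step is given below: each score series is forecast with an automatically selected ARIMA model and recombined with the basis functions. The mean function, basis functions and historical scores are synthetic placeholders for the quantities estimated in Sections 2.4.2-2.4.3.

    # h-step-ahead point forecasts: y_hat_{n,h}(x) = mu_hat(x) + sum_j beta_hat_{n,j,h} * phi_j(x)
    set.seed(5)
    ages   <- 0:100
    J      <- 6
    mu_hat <- -8 + 0.07 * ages                            # placeholder mean function
    phi    <- matrix(rnorm(length(ages) * J), ncol = J)   # placeholder basis functions
    beta   <- matrix(rnorm(50 * J, sd = 0.3), ncol = J)   # placeholder historical scores

    library(forecast)
    h <- 20
    # Forecast each score series with an automatically selected univariate ARIMA model
    beta_fc <- sapply(1:J, function(j) forecast(auto.arima(beta[, j]), h = h)$mean)

    # Combine: rows are horizons 1..h, columns are ages
    y_fc <- matrix(rep(mu_hat, each = h), nrow = h) + beta_fc %*% t(phi)
    dim(y_fc)   # h x 101: forecast log mortality curves for years n+1, ..., n+h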


Chapter 3

Numerical Experiments

3.1 The Data

The dataset adopted in this thesis is the single-year-of-age historical data taken from the Human Mortality Database [6]. At [6], mortality rates are defined as the ratio of death counts per calendar year to population exposure in the relevant intervals of age and time. Two developed countries with reliable data series are selected: France and the U.K. (England & Wales), noting that both countries have consistent data collection methods. These two are adopted because of their active participation in both World Wars and because, being located on the European continent, their populations (and mortality rates) were greatly impacted by the 1918 Spanish Flu.

The end year of 2018 is used since the data available for these two countries terminate there. Thought is also given to the flu pandemic of 1918, and as such the beginning of the data set commences in 1900. This allows the large jumps in mortality, which occurred in the 20th century, to be captured and form a proper base for model fitting. In [23], the authors indicate their preference to ignore this particular year, terming said year as an outlier. Others also propose that the two world wars are also outliers and hence commence their forecasting methods from the year 1950 [22],[3].

In this thesis, the outliers are however left in the datasets because they furnish the analysis with events which, though rare, are equally valuable.

Finally, the ages considered in the analysis are in single years within the range 0-100. In [32], the authors propose the use of age restrictions to avoid fluctuations in older ages of mortality data.

An extract of the total populations for both countries is plotted in Fig. 3.1. From the figure, it can be noticed that the log mortality rates for the years 1914 to 1918 and 1940 to 1945 are unusually high, especially for ages between 18 and 50, the age range considered for military service in these countries. This unusual increase, as mentioned earlier, is attributable to the two world wars and the Spanish Flu of 1918.


Figure 3.1: Historical total mortality rates for France and U.K.(England & Wales) 1900-1950

3.2 Performance Evaluation

A comparative analysis is carried out on the three different methods of mortality forecasting. The basic LC method serves as the benchmark and is compared against the basic HU and the weighted HU methods. The analysis performed is divided into the following segments:

• Decreasing mortality rates: Over the past century, it has become an empirical fact that mortality rates worldwide have been decreasing consistently leading to individuals living longer years of life [27]. Mortality rates for four separate years within the time span under consideration, are graphically depicted for the two countries.

• Fitting the historical data: The three methods for mortality modelling and forecasting are used in fitting the existing data. With each of the methods, the proportion of variance explained in the models is given and discussed.

• Within-Sample Forecast: For the two countries, the dataset is divided into two parts: from 1900 to 1998 and from 1999 to 2018. The first serves as the "training set" while the second serves as the "testing set". This is done to ascertain the accuracy of the fitted models in forecasting. The three methods are fitted on the training set and a 20-year forecast is made. This within-sample forecast is compared to the testing set, which contains the actual data, thereby making it possible to assess the forecasting capabilities of the methods.

• Out-of-Sample Forecast: With the available mortality data terminating in 2018 for both countries, an out-of-sample forecast is carried out from 2019 to 2050 (a 32-year-ahead forecast). The best performing method, identified through quantitative analysis of the within-sample forecasts, is used to make the forecasts into the future, and the results are shown thereafter. To provide a level of certainty, a 95% prediction interval is provided for the forecasts.

3.3 Results and Discussions

The results from the analysis described in Section 3.2 are presented here along with discussions of the findings.

3.3.1 Decreasing Mortality Rates

Figure 3.2 illustrates the decreasing levels of mortality rates for the years 1900, 1950, 2000 and 2018. It can be noted that the mortality rates for 2000 and 2018 are close for both countries, probably because the two years are relatively close together.

Figure 3.2: Decreasing mortality rates for France and UK(England&Wales)

For the England&Wales population, the total death rates do not depict a pronounced mortality hump in the year 1900. A slight hump occurs in 1950, and by 2000 and 2018 it becomes well defined. The same can also be said for the French population.

However, at the highest ages, the mortality rates all tend to converge. This validates the claim that the older an individual is, the greater the likelihood of death [34].

In line with the conclusion of [27], it is noticed from Fig. 3.2 that death rates decrease as the years advance, with the mortality curves for later years lying lower in the figure than those for earlier years.


3.3.2 Fitting the Historical Data

A summary analysis of the LC method fitted to the France population produced a basis function which explained 95% (0.95046) of the variation occurring in the data for the ages in the fit (0-100). In the UK data, 95.2% (0.95216) of the variation is explained by the basis function in the LC method. Figures 3.3 and 3.4 show the fitted basis and the corresponding coefficient. In Fig. 3.4, the impact of both World Wars is clearly depicted in the $k_t$ graph. In Fig. 3.3, the years 1914-1919, which cover the Spanish flu and World War I, show a noticeable spike in mortality rates, and for the years spanning World War II the mortality rate appears highly volatile.

Figure 3.3: Basis function and associated coefficient for the UK total mortality data - LC method

The fitted basis and associated coefficients for the basic HU method for the UK are shown in Fig. 3.5. In Figure 3.5, the coefficient of the first basis function shows a pronounced effect of World War II on the mortality rates of the UK. This particular effect is not clearly visible in the component plot of the LC method for the same country in Fig. 3.3.

Regarding the percentage of variance explained, the basic HU method, of order 5, has 99.7% (0.99722) and 99.8% (0.99826) of variation explained due to the basis functions for France and UK total populations respectively.

For the weighted HU method, the total percentage variation due to the basis functions amounted to 99.9% (0.99890) and 99.9% (0.99929) respectively for the France and UK total population. Figure 3.8 depicts the basis functions together with their corresponding coefficients for France. The first basis function for France explains 97.7% and that of the UK explains 97.9%.


Figure 3.4: Basis function and associated coefficient for the French total mortality data - LC method

Figure 3.5: Basis functions and associated coefficients for the UK total mortality data - basic HU method


Figure 3.6: Basis functions and associated coefficients for France total mortality data - basic HU method

It must be noted that while the sole basis function in the LC method explains only 95.04% of the variation in the France data, the first basis function for the basic HU method explains 95.4% while that of the weighted HU explains 97.7%.

Tables 3.1 and 3.2 show the total percentage of variation attributed to the basis functions and the number of basis functions present in each of the three methods under study in this thesis.

Table 3.1: Percentage of variation explained in the France total population

France   No. of Basis Functions   % of 1st Basis Function   Total % of variation
LC       1                        95.04%                    95.04%
bHU      5                        95.40%                    99.70%
wHU      5                        97.70%                    99.90%

From Tables 3.1 and 3.2, it can be noted that the weighted HU method produces basis functions which explain more of the variation present in the data for both countries. The basic LC method is the least informative in terms of variation explained.


Figure 3.7: Basis functions and associated coefficients for the UK total mortality data -weighted HU method

Table 3.2: Percentage of variation explained in the UK total population

UK       No. of Basis Functions   % of 1st Basis Function   Total % of variation
LC       1                        95.20%                    95.20%
bHU      5                        95.50%                    99.80%
wHU      5                        97.90%                    99.90%

3.3.3 Within-Sample Forecast

Within-sample forecasted mortality rates for the France and U.K. total national populations for the years 2003, 2008, 2013 and 2018 are presented in Figs. 3.9 and 3.10. For each of the years, the actual mortality rates are plotted together with the forecasts from the three methods. The time span of the dataset used in the within-sample forecasting ranges from 1900 to 1998.


Figure 3.8: Basis functions and associated coefficients for France total mortality data - weighted HU method

Figure 3.9: France actual and within-sample forecasted mortality rates for selected years

Figure 3.10: UK actual and within-sample forecasted mortality rates for selected years

Discussions

Graphically, the 2003 within-sample forecast for France shown in Fig. 3.9 (top left) has the basic LC method underestimating the actual log mortality data from age 0 to about age 60. It thereafter slightly overestimates mortality rates, though the gap in the overestimation is not as wide as the underestimation present between ages 15 and 40.

For the same year, the basic HU method presents a fairly good forecast estimate. Between ages 2 and 18, the basic HU method slightly overestimates. For the mortality hump, which occurs from age 18 to about 30, there is a slight underestimation of the actual data. After age 50, there is again a slight overestimation of the actual data.

The weighted HU method, however, provides an almost exact forecast estimate, with the only minimal overestimation occurring between ages 20 and 40. With the exception of this age range, the weighted HU forecast estimate for the year 2003 is almost identical to the actual data for that year.

The same attributes can be found graphically for the 2008 within-sample forecast in Fig. 3.9 for the basic LC and HU methods. With the weighted HU method, however, there is a slight overestimation between ages 10 and 50 and after age 60.

In the 2013 within-sample forecast (bottom left), however, the overestimation at the later ages occurs after age 40, unlike in 2003, where it occurs after age 50. For the weighted HU method in 2013, overestimation occurs from about age 10 to 50, which differs from the 2003 plot.

Visually inspecting Fig. 3.10, similar observations to those made for the France within-sample forecasts are also applicable in this case. The basic LC within-sample forecast underestimates at the early ages and, after the middle ages, slightly overestimates mortality rates. The basic HU within-sample forecast slightly underestimates the mortality hump, but otherwise provides a good forecast estimate. In 2013 and 2018, however, in addition to the underestimation, there is a marked overestimation between ages 50 and 90. With respect to the weighted HU within-sample forecast, an almost perfect forecast estimate is generated, especially for the years 2003 and 2008.

Quantitatively, the forecast methods are compared in Tables 3.3 and 3.4. Within-sample measures of fit such as the mean absolute error (MAE), mean square error (MSE) and mean absolute percentage error (MAPE) are used. The MAE is the average of the absolute forecast errors of log mortality across the different age groups and is a measure of forecast precision: how close the forecasts were to the out-of-sample values, irrespective of the sign of the distance (negative or positive). Since the MAE measures the average, over the test sample, of the absolute differences between forecasted and actual mortality rates, a smaller MAE is preferred. Similarly, the MAPE measures the relative overall fit in percentage terms, providing an easier platform for interpretation: the closer the MAPE is to 0%, the better the performance. In [31] the theory of these mean measures is treated exhaustively.
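The three measures can be computed directly, as in the minimal sketch below; 'actual' and 'predicted' are placeholder matrices of log mortality rates over ages and test years (not the thesis data), and the MAPE is returned as a proportion (multiply by 100 for percentages).

    # MAE, MSE and MAPE over all ages and test years (Tables 3.3 and 3.4 report such values).
    set.seed(6)
    actual    <- matrix(rnorm(101 * 20, mean = -4), nrow = 101)   # ages x test years, placeholder
    predicted <- actual + rnorm(101 * 20, sd = 0.1)               # placeholder forecasts

    err  <- predicted - actual
    MAE  <- mean(abs(err))
    MSE  <- mean(err^2)
    MAPE <- mean(abs(err / actual))    # proportion; multiply by 100 for a percentage

    c(MAE = MAE, MSE = MSE, MAPE = MAPE)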

In all the mean forecast error measures computed for the mortality rates of the two countries in Tables 3.3 and 3.4, the basic HU method performed better than the basic LC method. The weighted HU method, however, produced the smallest forecast errors and thus performed best among the three methods.

Table 3.3: Mean forecast error measures in the France total population

France   MAE       MSE       MAPE
LC       0.305673  0.160859  0.248007
bHU      0.170030  0.041455  0.187286
wHU      0.111311  0.022365  0.119780

Table 3.4: Mean forecast error measures in the UK total population

UK       MAE       MSE       MAPE
LC       0.483988  0.395692  0.355441
bHU      0.210427  0.065188  0.227599
wHU      0.137607  0.034626  0.152182

3.3.4 Out-of-Sample Forecast

Figures 3.11 and 3.12 show the forecasted mortality rates for the period 2019-2050, together with the actual mortality rates for the 32 years prior to the forecast. The underlying years used in fitting the weighted HU model for these forecasts span the period 1900 to 2018.

Figure 3.11: Actual (1987-2018) and forecast (2019-2050) mortality rates using the weighted HU method for the France total population

Figure 3.12: Actual (1987-2018) and forecast (2019-2050) mortality rates using the weighted HU method for the UK's (England&Wales) total population

The basis functions and associated coefficients for the two countries are presented in Figs. 3.13 and 3.14. With the model fitting covering the years 1900 to 2018, the first basis function for the France total population explains 98.6% of the variation in the data, while that of the UK explains 97.8%. The corresponding coefficients depict the outliers occurring during the years of World Wars I and II. The five coefficients obtained from the weighted HU method are all shown with a 95% prediction interval (in yellow).
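In the discretized setting, the share of variation captured by each basis function can be approximated from the singular values of the centred log-mortality matrix, as in the following base-R sketch; 'logmx' is a hypothetical age-by-year matrix of smoothed log mortality rates.

# Approximate share of variation explained by each basis function (base-R sketch).
# 'logmx' is a hypothetical age-by-year matrix of smoothed log mortality rates, 1900-2018.
centred <- sweep(logmx, 1, rowMeans(logmx))   # remove the mean function mu(x)
d <- svd(centred)$d                           # singular values
round(d^2 / sum(d^2), 4)                      # proportion of variation per component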

In Subsection 2.2.2, it was established that the weighted HU method uses linear time series models of higher orders to forecast mortality rates, unlike the LC method, which predominantly employs a random walk with drift. Specifically, univariate time series (ARIMA) models are used to forecast the coefficients of the basis functions. For the France total population, the first coefficient is modelled as an ARIMA(0,1,5) with drift. For the UK total population, the first coefficient yielded an ARIMA(0,1,0) with drift, i.e. a random walk with drift. The full results of the linear time series models for each of the coefficients, together with the values of their selection criteria in the weighted HU model, are presented in Appendix B.
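As an illustration, one such coefficient series can be modelled and forecast with the forecast package as sketched below; 'beta1' is a hypothetical ts object holding the first coefficient series extracted from the fitted model.

# Illustrative ARIMA modelling of the first basis-function coefficient (forecast package).
# 'beta1' is a hypothetical ts object with the first coefficient series, 1900-2018.
library(forecast)

fit.auto   <- auto.arima(beta1)                                        # automatic order selection
fit.manual <- Arima(beta1, order = c(0, 1, 5), include.drift = TRUE)   # e.g. ARIMA(0,1,5) with drift

fc <- forecast(fit.auto, h = 32, level = 95)   # 32-year-ahead forecast with 95% interval
plot(fc)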

Figure 3.13: Weighted HU model and forecast, France total mortality

Figure 3.14: Weighted HU model and forecast, UK total mortality

Chapter 4

Conclusions and Future Work

4.1 Conclusion

In recent years, an immense amount of research has been carried out in the area of mortality modelling and forecasting. With the availability of high-quality data, specialized mortality forecasting packages, using R as a common programming language, have been developed.

This thesis used the UK (England&Wales) and France total populations as the dataset for the study. These countries were selected because of the availability of high-quality mortality data and their location on the European continent, which exposed their populations to the effects of both World Wars and the Spanish flu of 1918.

The empirical fact of decreasing mortality over the years, especially child mortality, was depicted graphically after conducting the numerical experiments. In this presentation, the years 1914-1919 and 1940-1945 were identified as having high mortality rates. The latter period can be attributed solely to World War II, whereas the former outlier was attributed to a combination of World War I and the Spanish flu of 1918.

The basic LC, basic HU and weighted HU models were fitted to the France and UK datasets. The data were divided into training and testing sets; a within-sample forecast was produced from models fitted to the training set and the results were compared with the testing set. In the model fitting, the basis functions of the weighted HU method were found to explain 99.9% of the variation present in the dataset for both countries, whereas the basic LC method, with about 95%, performed the worst in explaining the variation.

Through graphical inspection and a comparison of the error measures in each of the three methods, the weighted HU was adjudged to perform the best. The three forecast error measures aggregated over ages were the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Mean Absolute Percentage Error (MAPE).

Since the weighted HU method possessed the best forecasting ability of the three methods under study, it was used to produce a 32-year ahead forecast, that is, a forecast of mortality rates to the year 2050. The results are presented graphically together with the actual rates for the 32 years prior to the end of the available dataset.


4.2 Future Work

This thesis considered three methods of mortality modelling and forecasting, all within the extrapolative approach to forecasting. Using this thesis as a basis, further work can be done in the following areas.

• The available data at [6] for the two selected countries terminate in the year 2018. Therefore, the impact of COVID-19 on their mortality rates has not been incorporated in the modelling. With increased deaths reported the world over, the period from late 2019 until the end of the pandemic will clearly be an outlier, as was the case during the years of the World Wars and the Spanish flu. It is therefore suggested that, once full mortality data covering the extent of the pandemic become available, the three methods in this study be applied to the data to evaluate their performance and their ability to handle outliers.

• The basic LC method uses only one basis function. Other versions of the LC method that utilize more than one basis function, such as the methods already developed in [22] and [3], could serve as a basis of comparison with the functional demographic models. This would increase the percentage of variation explained and hopefully improve forecasting accuracy.

• With the ages in focus ranging from 0 to 100, fluctuations were still evident at the higher ages. Another suggestion would be to conduct similar work with a shorter age range, say 0-70. The within-sample forecast horizon could also be changed from 20 years to a shorter or longer span, since the dataset is large enough to accommodate this. Such a change would also allow mean forecast errors to be obtained over several horizons.

• Finally, the three versions of the HU method could be evaluated. The basic, robust and weighted HU methods could be applied to a pair of countries: one heavily affected by the World Wars and the Spanish flu, and another, such as Canada, whose mortality was not strongly affected by such events. This would help ascertain which of the three functional demographic models is the best method for mortality modelling and forecasting.

