
UPTEC F 17024

Degree project, 30 credits, June 2017

Bayesian Parameterization in the spread of Diseases

Robin Eriksson


Abstract

Bayesian Parameterization in the spread of Diseases

Robin Eriksson

Mathematical and computational epidemiological models are important tools in efforts to combat the spread of infectious diseases. The models can be used to predict the further progression of an epidemic and to assess potential countermeasures for controlling disease spread. To propose such models when data is available, one needs parameter estimation methods. This thesis concerns likelihood-free Bayesian inference methods. The data and the model originate from the spread of a verotoxigenic Escherichia coli in the Swedish cattle population. Using the SISe3 model, an extension of the susceptible-infected-susceptible model with added environmental pressure and three age categories, two different methods were employed to give an estimated posterior: Approximate Bayesian Computations and Synthetic Likelihood Markov chain Monte Carlo. The mean values of the resulting posteriors were close to the previously performed point estimates, which gives the conclusion that Bayesian inference on a nationally scaled SIS-like network is conceivable.

ISSN: 1401-5757, UPTEC F 17024. Examiner: Tomas Nyberg. Subject reader: Per Lötstedt. Supervisor: Stefan Engblom.


Popular science summary

Researchers and industry today use computer models to a great extent, for reasons that are both economic and practical. It is more cost-effective to perform experiments in a computerized environment than in reality, and in the virtual environment it is also easier to carry out tests. Depending on the model, the experimenter gains control over small details, such as the speed of the wind in an airplane simulation or the number of liters of water flowing through a treatment plant. These models contain parameters, and it is the parameters that fine-tune such quantities.

The opposite of simulating models is to try to fit, from data, the model that best describes the data. To fit data to models, one needs to find the parameters that make the error between the simulation, via the model, and reality as small as possible. This is called parameterization.

In studies of how diseases spread, stochastic, random-number-dependent models are often used today. A common base-case model is the SIS model: individuals move between two states, S (susceptible) and I (infected). The model contains two parameters, the transmission rate and the recovery rate. This thesis treats Bayesian parameterization methods, in which parameters are treated as probability distributions rather than as point values.

An obstacle for a popular Bayesian parameterization method, Metropolis-Hastings Markov chain Monte Carlo (MCMC), is that one needs to know beforehand what is called the likelihood, a measure of the probability that the data was generated with the proposed parameters. When this distribution is unknown, one needs to use other methods that in various ways try to circumvent the requirement.

This thesis studies an SIS-like model, a case where the likelihood is unknown. Instead of MCMC, Approximate Bayesian Computations and Synthetic Likelihood Markov chain Monte Carlo are used, both simulation-driven methods, which makes them very computationally heavy. The resulting parameter distributions from these methods were expected to lie around some predetermined point values, which they also did.

The conclusion is that the obtained results show that the methods are usable and that their continued development is recommended.


Contents

1 Introduction
2 Disease modeling
  2.1 Markov Chain
  2.2 SIS-model
  2.3 SISe3-model
  2.4 Local and global dynamics
  2.5 Simulation software
3 Available data
  3.1 Data Collection
  3.2 Clustering of the data
  3.3 Point estimates
  3.4 Prior observation
4 Bayesian Methods
  4.1 Summary statistics
  4.2 Approximate Bayesian computations
  4.3 Synthetic Likelihood
  4.4 SLMCMC
5 Results
  5.1 ABC
  5.2 SLMCMC
6 Discussion
7 Conclusion
Bibliography


1 Introduction

With modern mathematics and computers, modeling the spread of infectious disease is made possible [2]. Examples of infectious disease problems are virus outbreaks, like influenza, or bacterial spread, such as antimicrobial resistance (AMR). AMR is a global public health threat [12]. Antibiotics are crucial for the treatment of many bacterial infections as well as for the success of major surgery and cancer chemotherapy. However, through evolution and misuse, bacteria develop a natural resistance to these medications. Research is thus under time and societal pressure to find new, not yet repelled, antimicrobials. A contributor to AMR is the food-producing animal industry [14]. The animals are subjected to antibiotics for three reasons: growth promotion (banned in the EU), and group and individual treatment of infections. In the case of AMR spread, the two former are of major concern [10, 14]. The resistance gene spreads from the farms to bacteria in contact with the human population, and without anti-infective drugs, the number of infections without effective treatment will increase. Research in epidemiology will aid in finding how to anticipate and stop the spread of diseases such as AMR.

Three processes can be planned for the supervision of diseases: measurements, forecasts, and countermeasures. Measurements can be samples from livestock, food, or hospitals; their goal is to monitor and better understand the spread. Forecasts are needed to tell how the disease will spread. The evaluation of possible countermeasures determines which method hinders the disease most efficiently. To perform the latter two processes, one needs mathematical/computational models, which are the target of this thesis.

The mathematical modeling of infectious diseases is an area that, in recent years, in the age of information technology and computers, has seen an increase in academic papers. In 1957, Bailey released a book covering the subject, at a time when the number of articles on it was in the hundreds [2]. Computer modeling was then in its infancy compared to today, and Bailey relied on rigorous mathematics and transformation techniques. Today much work is carried out with simulations and numerical methods. State-of-the-art tools, such as SimInf [18], allow one to compute years of simulated time in a couple of seconds on a personal desktop computer.


The specific modeling in this thesis is done using a state-space modeling technique called the continuous-time discrete-state Markov chain (CTMC). An entity, say a cow, can be in one state at a time and shifts to another possible state at a specified rate. The transitions between states are executed in continuous time, hence the name CTMC. When modeling a disease spread, a simple, comprehensible candidate model is the SIS-process [2]. The Markov chain has two states, Susceptible (S) and Infected (I), and the rates are the infection rate, how quickly the infection spreads to susceptible entities, and the recovery rate, how fast an infected entity recovers and re-enters the susceptible state. When one assumes the CTMC model, stochasticity in the model is also assumed: at each time step, the system evaluates the possible state transitions, where the outcome depends on a random variable.

To adjust models, one introduces parameters, which allow for fine-tuning the characteristics of the output. When data becomes available, one can extract enough information to find the parameters that fit the model output to the received data. The procedure of finding a good enough parameter fit is called estimation. In a deterministic setting, the estimation of parameters often involves finding parameters under a present noise. In the stochastic case, one needs to rely on previously developed methods and theorems. Today there are two schools of statistics: frequentist and Bayesian. Briefly described, frequentists hold that a variable (parameter) has one true value and that one only observes this value with a specified accuracy (P-values). Bayesians say that the variable (parameter) has a probability distribution and that the observed values are drawn from this distribution. In this thesis, we assume the Bayesian point of view. A popular method for parameter estimation today is Markov chain Monte Carlo (MCMC). The MCMC method incrementally takes steps in the parameter space and finds the posterior of the parameter.

One limitation of the method is that it requires a defined likelihood function, which for complex probability models might be infeasible to obtain. This thesis addresses that limitation and implements newer methods that do not need the likelihood for parameter estimation.

Data is increasingly available today. A requirement for new solutions is that they need to be able to react as fast as possible to new data. The methods used in this thesis utilize the data gathered to estimate parameters. In using MCMC with a defined likelihood function, one computes the probability of the sampled dataset being from the same distribution as the examined data.

Another class of methods is built around the received data, e.g. Kalman filters [9], newer filtering techniques such as the particle filter and ensemble Kalman filters [16], or the use of pathwise information theory [15]. The key difference between the filtering techniques and the posterior inference methods can be explained by analogy. An artist is completing a painting, and the task is to name the painter from just observing the painting. The dynamical, data-centric methods (filters) start with equal weights, the certainty of the answer, as the canvas is empty; the methods then recompute the weights with every nth stroke of the brush and converge to an answer. The posterior inference methods observe the finished painting and give a solution based on prior knowledge of a pool of painters. This thesis concerns posterior inference methods, but as an outlook, one could explore the filtering methods.

When constructing methods and solutions for complex problems, one stands at a crossroads: either one goes straight ahead and tries to solve the problem in its full complexity, or one starts with a simple toy model and later increases the complexity to full rank. The latter methodology has the advantage that if one incrementally raises the complexity of the problem and the method fails at a stage, then one knows that the method will not solve the full problem, as it cannot solve the simpler alternative. One way of constructing a simplified problem is by the inverse crime, which inherits its name from the theory of inverse problems.

An example where using the inverse crime would be beneficial is financial stock price modeling with parameter estimation. As a first step, consider a candidate model and generate a time series with a known parameter set using the model. If and only if the estimation method successfully infers the correct parameters from the generated data, continue to the next step. The following levels should consider perturbed data with added noise. Again, if and only if the method determines the parameters from the information given, continue to add complexity. The increments carry on until one can confidently feed the real data to the method. Finalizing each step often requires more computing time because of the growing intricacy of the problem; thus, being able to discard the method at the early steps is more time efficient than starting with full complexity. This thesis is the third phase in the inverse crime for solving the problem of parameter estimation on a national-scale disease spread network. A first study explored and proved the feasibility of parameter estimation on synthetic data [3]; a second achieved point estimates for actual data [19].

The aim of this thesis is to determine a Bayesian posterior for said parameters using a synthetic dataset closely resembling real data.


2 Disease modeling

Data is becoming increasingly available, and the area of computational epidemiology is now of interest, as the required data and computational power are finally here.

This chapter defines Markov chains, SIS-processes, and network models.

2.1 Markov Chain

A Markov process is a stochastic process that satisfies the Markov property: one can make predictions about its future solely from its present state. Consider a stochastic process Y_n, n = 0, 1, 2, 3, . . ., that takes on a finite number of values. The process is said to be in state i at time n if Y_n = i. Suppose that when in state i there is a fixed probability P_ij that the next state the process will be in is j. Figure 2.1 gives a visual interpretation of this example, where the process can be in either state i or j with the transition probabilities P_ii, P_ij, P_jj, and P_ji. The example process is a Markov process. With respect to time dependence, Markov processes are defined as either discrete or continuous in time.

In this thesis, we will refer to Markov chains that are continuous in time, CTMC for short.

2.2 SIS-model

To simulate disease spread, one needs to decide on a mathematical model. A basic example is the SIS-model. The population can be in one of two states: susceptible, S, or infected, I. One writes a representation of the model as

\[
\left.
\begin{aligned}
S + I &\xrightarrow{\;\upsilon\;} 2I \\
I &\xrightarrow{\;\gamma\;} S
\end{aligned}
\right\} \tag{2.1}
\]

where υ is the transmission rate and γ the recovery rate [2].
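A trajectory of (2.1) can be simulated with Gillespie's direct method, using the intensity vector R(x) = [υ x_1 x_2, γ x_2] of (2.9). Below is a minimal Python sketch; the thesis itself uses the R package SimInf, and the 100-individual population here is an illustrative assumption (υ and γ are taken from the caption of Figure 2.2).

```python
import numpy as np

def gillespie_sis(S, I, upsilon, gamma, t_end, rng):
    """Simulate the SIS-model (2.1) with Gillespie's direct method,
    using the intensity vector R(x) = [upsilon*S*I, gamma*I] of (2.9)."""
    t = 0.0
    times, infected = [t], [I]
    while t < t_end and I > 0:
        rates = (upsilon * S * I, gamma * I)     # infection, recovery
        total = rates[0] + rates[1]
        t += rng.exponential(1.0 / total)        # waiting time to next event
        if rng.random() * total < rates[0]:      # which transition fires?
            S, I = S - 1, I + 1                  # S + I -> 2I
        else:
            S, I = S + 1, I - 1                  # I -> S
        times.append(t)
        infected.append(I)
    return np.array(times), np.array(infected)

rng = np.random.default_rng(1)
times, infected = gillespie_sis(S=99, I=1, upsilon=0.017, gamma=0.1,
                                t_end=1000.0, rng=rng)
print(f"last event at day {times[-1]:.0f}, infected: {infected[-1]}")
```

Rerunning with a different seed gives a different trajectory, which is exactly the stochastic variability discussed next.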

Figure 2.2 shows a simulation of a SIS system over 1000 days. One sees how after ≈ 200 days the outbreak takes off and the proportion of infected individuals rises to ≈ 60%, reaches its peak, and then dies off as individuals return to the susceptible state; the number of infected individuals vanishes after 400 days. Figure 2.2 is just one simulation with one set of parameters. Another run, with a different random number seed, might give a different pattern: the percentage of infected individuals might never surpass 10% and then go straight to zero, or the system might stay in the oscillating state for multiple periods before returning to zero. Figure 2.3 depicts a simulation with the same parameters as in Figure 2.2 but with different random numbers, showcasing the latter phenomenon.

Figure 2.1: A Markov chain with two defined states, i and j, and the transition probabilities P_ii, P_ij, P_jj, and P_ji.

2.3 SISe3-model

The SIS-model can be adjusted and extended in a number of ways. When modeling a bacterial infection, one extension is adding an environmental infectious pressure φ(t), as the infected individuals shed the disease/bacteria into the environment [3]. The continuous variable φ(t) is modeled as

\[
\frac{d\varphi}{dt} = \alpha \frac{I}{N} - \beta(t)\varphi, \tag{2.2}
\]

where α is the average shedding rate, N is the sum of susceptible and infected individuals, and β(t) is the decay of bacteria in the environment at time t. β(t) is a seasonally dependent constant: four seasons translate into a set of four constants, of which only one is observable and active at a time. When including φ(t) in (2.1) one gets

\[
\left.
\begin{aligned}
S + I &\xrightarrow{\;\upsilon\varphi\;} 2I \\
I &\xrightarrow{\;\gamma\;} S
\end{aligned}
\right\} \tag{2.3}
\]
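To make the role of (2.2) concrete, the following is a minimal forward-Euler sketch of the environmental-pressure update with a seasonally switching β(t), using the β-values from Figure 2.4; the quarterly season boundaries, the step size, and the node state are illustrative assumptions.

```python
import numpy as np

def beta_of(day, betas=(0.19, 0.085, 0.075, 0.185)):
    """Seasonal decay beta(t): one of four constants active at a time.
    The quarterly season boundaries are an assumed illustration."""
    return betas[(int(day) % 365) // 92]

def step_phi(phi, I, N, day, dt=1.0, alpha=1.0):
    """One forward-Euler step of d(phi)/dt = alpha*I/N - beta(t)*phi, (2.2)."""
    return phi + dt * (alpha * I / N - beta_of(day) * phi)

phi, I, N = 0.0, 10, 100          # assumed node state: 10 of 100 infected
for day in range(365):
    phi = step_phi(phi, I, N, day)
print(f"environmental pressure after one year: {phi:.3f}")
```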


Figure 2.2: Evolution of a SIS system over 1000 days. The red curve represents the proportion of individuals that are in the susceptible state, S, and the blue are in the infected state, I. The y-axis gives the proportion and the x-axis the elapsed time in days. After 400 days the proportion of infected individuals goes to zero. The used parameter values were υ = 0.017 and γ = 0.1.

The transition between states is modeled as a CTMC over i = 1, 2, . . . , N spatial nodes and j = 1, 2, . . . , M age groups, further extending (2.3) into

\[
\left.
\begin{aligned}
S_{i,j} + I_{i,j} &\xrightarrow{\;\upsilon_j \varphi_i\;} 2I_{i,j} \\
I_{i,j} &\xrightarrow{\;\gamma_j\;} S_{i,j}
\end{aligned}
\right\} \tag{2.4}
\]

with υ_j being the age-group-dependent transmission rate and φ_i the environmental infectious pressure at node i, defined as

\[
\frac{d\varphi_i}{dt} = \alpha \frac{\sum_j I_{i,j}}{N_i} - \beta(t)\varphi_i, \tag{2.5}
\]

where N_i is the sum over all age groups of individuals in the node, N_i = Σ_j (I_{i,j} + S_{i,j}). As an extension to (2.5), adding a rate of proximal spread between nearby nodes captures unregistered connections in the network and increases the model-to-data explanation [19]. Let D denote the rate of the proximal spread, and let d_ik denote the distance between the two nodes i and k. Defining a new φ_i as

\[
\frac{d\varphi_i}{dt} = \alpha \frac{\sum_j I_{i,j}}{N_i}
+ \sum_{k} \frac{\varphi_k(t) N_k(t) - \varphi_i(t) N_i(t)}{N_i(t)} \cdot \frac{D}{d_{ik}}
- \beta(t)\varphi_i. \tag{2.6}
\]

The sum over k is evaluated for all nodes except the trivial case k = i.

Figure 2.4 illustrates how a SISe3 system can behave over 1000 days with 100 connected nodes. Seasonality is observable in the figure, and the infection decays to zero over time, being almost non-existent after the 1000 days.

Figure 2.4: Evolution of a SISe3 system over 1000 days and 100 network-connected nodes. The three age categories are represented with different dashing of the curves. The red, yellow, and green curves give the proportion of susceptible individuals, and the blue, purple, and magenta curves give the proportion of infected individuals. Seasonality is observable in the periodicity of multiple local minima in the dashed green line, which represents S_3. The y-axis gives the proportion and the x-axis the elapsed time in days. The parameter values used were υ_1 = 0.0357, υ_2 = 0.0357, υ_3 = 0.00935, γ_1 = 0.1, γ_2 = 0.1, γ_3 = 0.1, α = 1, β_t1 = 0.19, β_t2 = 0.085, β_t3 = 0.075, β_t4 = 0.185, and D = 0.02.

2.4 Local and global dynamics

The system at hand connects multiple nodes, forming a network, and the dynamics need to be defined. A stochastic differential equation (SDE) with jumps describes the local dynamics as

\[
dX_t = S\mu(dt), \tag{2.7}
\]

where μ(dt) = [μ_1(dt), . . . , μ_{N_comp}(dt)]^T and N_comp is the number of compartments or age groups. An example of the transitions between compartments is the SIS-model (2.1). The state vector consists of two compartments, X = [S, I], with the transition matrix

\[
S = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}, \tag{2.8}
\]

and intensity vector

\[
R(x) = [\upsilon x_1 x_2, \; \gamma x_2]^T. \tag{2.9}
\]

In the case of a transition, the state vector is modified by the transition matrix as

\[
X_t = X_{t-} + S_k \tag{2.10}
\]

when transition k occurs at time t.

The local dynamics extend to a network model of N_nodes nodes through a state matrix X ∈ Z_+^{N_comp × N_nodes}, and (2.7) extends to

\[
dX^{(i)}_t = S\mu^{(i)}(dt), \tag{2.11}
\]

with node index i ∈ {1, . . . , N_nodes}. Then consider an undirected graph G with the N_nodes nodes as its vertices and interactions defined by the counting measures ν_{i,j} and ν_{j,i}, where ν_{i,j} represents the change in state due to an influx of individuals from node i to node j, and the reverse for ν_{j,i}. Node j (i) is assumed to be in the connected component C(i) (C(j)) of node i (j). These connections are stored in a matrix C. The network dynamics are then

\[
dX^{(i)}_t = -\sum_{j \in C(i)} \nu_{i,j}(dt) + \sum_{j;\, i \in C(j)} \nu_{j,i}(dt), \tag{2.12}
\]

which in combination with (2.11) leads to the overall dynamics

\[
dX^{(i)}_t = S\mu^{(i)}(dt) - \sum_{j \in C(i)} \nu_{i,j}(dt) + \sum_{j;\, i \in C(j)} \nu_{j,i}(dt). \tag{2.13}
\]

2.5 Simulation software

The CTMC modeling in this thesis is performed using SimInf [18], a general software framework for data-driven modeling and simulation of stochastic disease spread in a complex network of connected nodes. SimInf is an extension to R and utilizes compiled C code with R matrices in an object-oriented programming framework, defining objects in logical layers connected by well-defined interfaces. The computations are done in parallel using OpenMP, allowing efficient use of systems with multiple computational threads.


3 Available data

The thesis uses data from an observational study tracing the verotoxigenic Escherichia coli O157 (VTEC O157) status in Swedish cattle herds from 2009 to 2013 [20]. The data includes births, deaths, and movements on the animal level from 37 221 farms, referred to as nodes. However, the study conducted measurements for the bacteria on only 126 of the nodes.

Illustrated in Figure 3.1 is the spatial system that connects the Swedish cattle farms (or nodes) through some of the animal transfers.

3.1 Data Collection

The data collection process was such that over a period of four years, every other month, a measurement was taken at the specified nodes. The testers wore overshoe socks and collected samples by walking around the animals with the socks on. The socks were then sent to a laboratory, where bacterial cultures were grown from each sock. If the verotoxigenic bacterium was found in one of the samples from a node, that node was marked as infected. The procedure resulted in a binary time series of ones and zeros.

Working under the inverse crime principle, it is not desirable to start with the real data directly. Instead, we use a generated dataset that mimics the observed data. The new dataset includes the same number of data points, the same nodes, and the same collection dates as the real data, to reproduce the same situation as with the actual data but with controlled complexity. The dataset can also be altered with different parameters, enabling the use of data-driven estimation methods.

As the actual data from the study is an observation of the underlying state, to complete the constructed dataset one also needs to model the detection of the bacteria. The detection procedure is therefore modeled to emulate the laboratory analysis, creating simulated observed data similar to the real dataset. For the simulation of the sock-to-binary result, results from a prior study [1], which examined the infection detection probability, are used. In that study, tests were taken at a node with a high probability of finding the infection, in animal pools of three individuals each. From the resulting samples, a sensitivity function was determined to be [0, 0.28, 0.86, 0.99] for the number of infected individuals in the range 0-3 in each pool.

In summary, there are thus two variance-building steps: first, the overarching simulation of the underlying state, and second, the swabbing of the nodes, uncovering the observed state.

3.2 Clustering of the data

Spatial aggregation of the data is needed for the analysis. A possible aggregation is to sum over all nodes and view the data as one node, as

\[
N_j = \sum_{i \in [1, N_{nodes}]} N_{i,j}, \tag{3.1}
\]

where i is the node number from 1 to N_nodes, and j the time of observation.

Another example is to group into counties, with the argument that Sweden is a latitudinally elongated country where the southern climate differs from the northern one. To group nodes into counties, one needs to combine the location of the node with the boundaries of the counties. Another, more robust way is to group the nodes with a clustering algorithm such as K-means or hierarchical clustering, which only requires either the geographical point of the node or the distance between the nodes; a sketch of this grouping step is given below. Figure 3.2 illustrates the dendrogram obtained from hierarchically clustering the dataset. The height defines the distance between clusters, and each bifurcation represents the division of the data into another cluster. The number of clusters that the data is divided into is therefore directly related to the level of bifurcation that one chooses: level one has one cluster, level two has two clusters, level three has three clusters, and so on. To evaluate what level to choose in a hierarchical cluster, one can create a heat map, as in Figure 3.3. In the heat map, the lighter the color (white to yellow to red), the further the nodes are apart, and there are five distinct darker (red) squares. The number of darker squares gives the number of possible clusters. In Figure 3.3, one can argue that there should be three, four, five, or six groupings. However, one should aim for a good balance between precision and generality; therefore, five clusters were chosen.
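As an illustration, a minimal SciPy sketch of hierarchical clustering cut at five clusters; the node coordinates here are hypothetical stand-ins (the thesis clusters the real inter-node distance matrix), and the choice of Ward linkage is an assumption, since only a distance matrix is required.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(126, 2))   # hypothetical node positions

dists = pdist(coords)                  # pairwise distances between nodes
tree = linkage(dists, method="ward")   # hierarchical clustering (dendrogram)
labels = fcluster(tree, t=5, criterion="maxclust")   # cut into five clusters

for k in range(1, 6):
    print(f"cluster {k}: {np.sum(labels == k)} nodes")
```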

Using hierarchical clusters, the spatial aggregation of the data becomes

\[
N_{k,j} = \sum_{i \in [k_1, k_M]} N_{i,j}, \tag{3.2}
\]

where k is the index of the cluster, which holds aggregated information about nodes i = k_1 to k_M, defined in this thesis from the dendrogram in Figure 3.2.

In the dataset, one can see that sampling on the same date was not always possible. By aggregating the data in time, one accounts for the mismatch in dates. The decision was made to aggregate into yearly quarters: Jan-Mar, Apr-Jun, Jul-Sep, and Oct-Dec.

The values in Table 3.1 represent the data received after one simulated and detected trajectory. Each row is a quarter during which the study ran, and the columns show the number of infected holdings in each cluster in that quarter. This quantification of the infection spread is later named the prevalence.

Table 3.1: Aggregated sampled data from a simulated SISe3 system over five clusters, P.1 to P.5, showing the number of infected nodes (prevalence) in each cluster per quarter, from 2009 Q4 to 2012 Q4. The table is from an example run and is what is used for the analysis.

            P.1   P.2   P.3   P.4   P.5
2009 Q4      1     0     2     6     4
2010 Q1      0     0     4     3     3
2010 Q2      2     0     5     4     3
2010 Q3      3     1     1     1     2
2010 Q4      4     1     6     9    10
2011 Q1      1     1     1     4     6
2011 Q2      3     2     2     6     8
2011 Q3      3     1     3     4     5
2011 Q4      7     2     5     3     9
2012 Q2      2     5     4     1     7
2012 Q3      4     3     2     2     2
2012 Q4      4     5     5     7     5

3.3 Point estimates

In a study of point estimates for the parameters of the SISe3 model (2.4), θ = (υ_1, υ_2, υ_3, γ_1, γ_2, γ_3, α, β_1, β_2, β_3, β_4, D), an optimization problem defined as

\[
\operatorname*{argmin}_{\theta} \; G(\theta) \tag{3.3}
\]

tries to solve for the best fit of single values of the parameters θ. Here G is a distance measure determined by G_1, over the number of infected holdings, and G_2, over the incidence cases of infected holdings, as G(θ) = G_1(θ) + G_2(θ). In turn, G_1 and G_2 are defined as

\[
G_1(\theta) = \sum_{t_n \in y,q,c} \left( Y_{in}(\theta) - Y_{in} \right)^2, \tag{3.4}
\]

\[
G_2(\theta) = \sum_{t_n \in y,q,c} \left( X_{in}(\theta) - X_{in} \right)^2, \tag{3.5}
\]

where data is aggregated over time slots of quarters, q, in years, y, for clusters, c. Y_in denotes the nth observed status, 1 for positive and 0 for negative, in node i at time t_n; similarly, Y_in(θ) denotes the simulated status corresponding to Y_in. X_in denotes the first occurrence of a positive status in node i at time t_n, such that X_in = 1 at the first occasion where Y_in = 1 and X_in = 0 on all other occasions.

The optimization problem (3.3) was then solved using the Nelder-Mead method [19], and the resulting parameter values are given in Table 3.2.

Table 3.2: The resulting parameter values from the point estimations conducted in [19]. Values accompanied by an "=" sign were fixed and not estimated. υ_1 and υ_2 were fixed to the same value, as it was determined that the two younger categories were similar compared to the adults, group 3.

Parameter   Value
υ_1         0.0184
υ_2         0.0184
υ_3         0.00980
γ_1         = 0.100
γ_2         = 0.100
γ_3         = 0.100
α           = 1.00
β_1         0.109
β_2         0.103
β_3         0.114
β_4         0.125
D           0.100
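For orientation, here is a minimal sketch of solving a problem like (3.3) with the Nelder-Mead method via SciPy. The objective below is a quadratic stand-in: the thesis's actual G(θ) of (3.4)-(3.5) requires running the SISe3 simulator for every proposal.

```python
import numpy as np
from scipy.optimize import minimize

theta_true = np.array([0.0184, 0.0098])   # (upsilon_12, upsilon_3), Table 3.2

def G(theta):
    """Stand-in distance measure; the thesis's G(theta) of (3.4)-(3.5) sums
    squared differences between simulated and observed statuses instead."""
    return float(np.sum((theta - theta_true) ** 2))

res = minimize(G, x0=np.array([0.05, 0.05]), method="Nelder-Mead")
print("point estimate:", res.x)   # converges toward theta_true
```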

3.4 Prior observation

Cray et al. [5] conducted a study on the difference in the transmission rate υ between the different age groups. The study determined that there is a significant difference between the young (age groups 1 and 2) and the adult population (age group 3) in cattle. From the produced point estimates [19], one observes that υ_3 is approximately half the size of υ_1. One can employ this knowledge of the relationship between the parameters in Bayesian parameter inference through a prior. This thesis implements a fixed scalar prior where υ_3 is half the value of υ_1; υ_2 is set to the same value as υ_1 as it is also seen as young cattle.

Figure 3.1: Visualization of cattle movements in the data. The connections shown are a random subset of the complete dataset of ∼10^8 recorded events. The source of the data is the national cattle register at the Swedish Board of Agriculture. [Reproduced from [3].]

Figure 3.2: Cluster dendrogram of the hierarchical cluster constructed from the distance matrix of the nodes in the data. The height represents the distance between two nodes. At the bottom of the tree, at zero height, the identification numbers of the nodes in the dataset are given. Each bifurcation represents a level at which the data is divided into one more cluster. At the top level all the nodes are in one cluster, and at the bottom the number of clusters is the same as the number of nodes.


Figure 3.3: Visualization of the heat map resulting from the hierarchical cluster constructed from the distance matrix of the nodes in the data. A more distinct red color represents a closer distance between nodes. The right and bottom sides list the identification numbers of the nodes. The top and left sides show the dendrogram of Figure 3.2, which is the underlying basis of the heat map.


4 Bayesian Methods

Estimating parameters of stochastic models using available data can be done in multiple ways [8]. In Bayesian inference, the parameters are described by their posterior distribution. As a general setting, imagine data D generated from a model M with parameters θ and prior density P(θ). The posterior distribution of the parameters is then given by

\[
P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)}, \tag{4.1}
\]

referred to as Bayes' theorem. The critical point about Bayesian inference is that it provides a means of combining new evidence, the data, with prior beliefs, the prior. A simplistic approach to computing the posterior P(θ|D) for the parameter θ is a plain rejection method:

1. Propose a θ from the prior P(θ).
2. Accept the proposal with probability h = P(D|θ); return to 1.

In the above method, one assumes that the likelihood P(D|θ) is known for the model M, which is not true for all models. More involved Bayesian inference methods exist, such as MCMC [6], given in Algorithm 1. To use MCMC, one needs a problem with an analytically tractable likelihood, L(θ). As one steps away from this space, one needs to use other methods. These methods will inherently be worse off, as the likelihood defines the probability of the values. This chapter covers two methods that try to mimic the result of MCMC without using a defined likelihood: ABC and SLMCMC.

Algorithm 1 MCMC Metropolis [6]
1:  Initial guess θ
2:  Compute the likelihood L(θ)
3:  while n < N do
4:      Propose θ′ from a proposal distribution g(θ′|θ)
5:      Compute the likelihood L(θ′)
6:      Draw a uniformly distributed number r ∼ U[0, 1]
7:      if r < min(1, [L(θ′) g(θ|θ′)] / [L(θ) g(θ′|θ)]) then
8:          Set θ = θ′
9:          Set L(θ) = L(θ′)
10:         Set n = n + 1
11:     end if
12: end while

The likelihood function L(θ) defines the probability of the outcome data given the parameter value θ and is used in the algorithm to determine whether to accept or reject a proposed parameter. The proposal distribution g(θ′|θ) returns a value θ′ dependent on the argument θ, interpreted in the algorithm as a new parameter set θ′ proposed from the old set θ. The proposal function is reversible, in the sense that if one steps to θ′, one can use g to return to θ. For a symmetric proposal function, the fraction in Step 7 of the algorithm reduces to L(θ′)/L(θ). A possible choice is to let g be a Gaussian distribution centered around θ; points close to θ are thus more common than distant ones. In this thesis a log-normal distribution is chosen, as none of the parameters are allowed to be negative.
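A minimal Python sketch of Algorithm 1 with the log-normal proposal discussed above. The target log-likelihood is a placeholder (in the thesis it would be one of the synthetic likelihoods of Sections 4.3-4.4), the proposal scale is an assumed tuning constant, and, as in Algorithm 1, the chain is grown until a given number of values have been accepted. Note the Hastings correction required by the asymmetric proposal.

```python
import numpy as np

def metropolis(loglik, theta0, n_accept, step=0.1, seed=1):
    """Metropolis-Hastings with a log-normal random-walk proposal, which
    keeps all parameters positive (cf. Algorithm 1). The chain is grown
    until n_accept values have been accepted."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    ll = loglik(theta)
    chain = []
    while len(chain) < n_accept:
        # theta' ~ LogNormal(log theta, step); the proposal is asymmetric,
        # so the acceptance carries the Hastings correction q(t|t')/q(t'|t),
        # which for this proposal reduces to prod(theta'/theta).
        prop = theta * np.exp(step * rng.standard_normal(theta.shape))
        ll_prop = loglik(prop)
        log_alpha = ll_prop - ll + np.sum(np.log(prop) - np.log(theta))
        if np.log(rng.random()) < log_alpha:
            theta, ll = prop, ll_prop
            chain.append(theta)
    return np.array(chain)

# Placeholder target peaked near 0.0184; a synthetic likelihood in practice.
loglik = lambda th: float(-np.sum(np.log(th / 0.0184) ** 2) / (2 * 0.25))
samples = metropolis(loglik, theta0=[0.05], n_accept=100)
print(f"posterior mean: {samples.mean():.4f}")
```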

4.1 Summary statistics

A data set of length n is said to have n dimensions. Comparing two data sets of length n means comparing two sets of n dimensions, which is ineffective for large n. Therefore, a more efficient approach is to summarize the data into a lower dimension m, where m ≪ n. The comparison is then between two data sets of dimension m instead of n. The m summary statistics (SS) provide the new dimension m.

A SS can often be categorized into one of four types: location (e.g. mean or median), spread (e.g. standard deviation or range), shape (e.g. skewness or kurtosis), and dependence (e.g. Pearson product-moment correlation co- efficient). Summarizing the data with one SS from each of the mentioned categories could give a good set of summarizing data points for the entire data set.
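A minimal sketch computing one SS from each of the four categories with NumPy/SciPy; the data here is an arbitrary synthetic series, purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=1000)   # arbitrary data set
y = x + rng.normal(0.0, 0.5, size=1000)          # a second, related series

summary = {
    "location (mean)":      float(np.mean(x)),
    "spread (std)":         float(np.std(x, ddof=1)),
    "shape (skewness)":     float(stats.skew(x)),
    "dependence (Pearson)": float(stats.pearsonr(x, y)[0]),
}
print(summary)   # n = 1000 observations reduced to m = 4 numbers
```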

If the SS summarize the data perfectly, they are called sufficient statistics and are a mapping from n to m dimensions without loss of any information [4]. The goal when choosing SS is to mimic sufficient statistics, but this is often not attainable.

Finding which SS to adopt is mostly based on heuristics for each applied problem [11, 21]. However, Nunes et al. [13] study a quantification of the problem in search of a non-heuristic solution for finding SS.

The SS chosen in this thesis is the prevalence, the number of infected holdings, as defined in equation (3.4) but without the squared difference. The prevalence is defined as

\[
P = \sum_{t_n \in y,q,k} Y_{in}(\theta), \tag{4.2}
\]

which aggregates data over time slots of quarters, q, in years, y, for clusters, k. Y_in denotes the nth observed status, 1 for positive and 0 for negative, in node i at time t_n.

4.2 Approximate Bayesian computations

In recent years, with the increased accessibility of computational power, ABC has become a viable option for Bayesian inference. The advantage of using ABC over MCMC is the possibility of using SS when a likelihood is not available [17]. ABC is computationally heavy compared to MCMC because simulated trajectories are evaluated through SS, which hold less information than a likelihood, and one proposes new parameters without knowledge of the previously simulated trajectory; ABC trades memory for generality. The pseudo-algorithm for ABC is given in Algorithm 2.

Algorithm 2 ABC algorithm [17]
1:  Compute the summary statistics S* of the observed data D
2:  Draw θ_i, i = 1, 2, . . . , N, from a prior distribution
3:  Simulate one set of data D_i for each parameter θ_i
4:  Compute the summary statistics S_i of D_i
5:  Calculate the distance between the summary statistics, ρ_i = ||S_i − S*||
6:  Option 1: Accept the θ_i for which ρ_i < ε
7:  Option 2: Sort ρ_i in ascending order and keep the highest-ranking θ_i

For a sufficiently small tolerance ε, the posterior P(θ|D) is approximated [17].
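A minimal sketch of Algorithm 2 using option 2 (keep the best-ranking draws). The `simulate` function is a hypothetical placeholder for a full model run reduced to its summary statistics; in the thesis it would be a SISe3 simulation reduced to cluster prevalences, and the prior range mirrors the υ_12 grid used later.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta):
    """Hypothetical stand-in for a model run reduced to summary statistics;
    in the thesis this would be a SISe3 trajectory reduced to prevalences."""
    return theta * 100.0 + rng.normal(0.0, 1.0, size=12)

s_star = simulate(0.0184)                 # SS of the "observed" data (step 1)

N, keep = 5000, 100
thetas = rng.uniform(0.0, 0.04, size=N)   # step 2: draws from a uniform prior
rhos = np.array([np.linalg.norm(simulate(t) - s_star)    # steps 3-5
                 for t in thetas])
posterior = thetas[np.argsort(rhos)[:keep]]               # option 2
print(f"posterior mean {posterior.mean():.4f}, std {posterior.std():.4f}")
```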

4.3 Synthetic Likelihood

Wood [21] proposed the idea of approximating the summary statistics of the data as

\[
s \sim \mathcal{N}(\mu_\theta, \Sigma_\theta), \tag{4.3}
\]

where μ_θ is the multivariate mean and Σ_θ the covariance of a multivariate normal variable. The validity of the approximation made in (4.3) can be checked per SS using e.g. the Jarque-Bera test [7, 21]. One writes the synthetic log-likelihood as

\[
l_s(\theta) = -\tfrac{1}{2}\,(s - \hat{\mu}_\theta)^T \hat{\Sigma}_\theta^{-1} (s - \hat{\mu}_\theta) - \tfrac{1}{2}\log\lvert\hat{\Sigma}_\theta\rvert, \tag{4.4}
\]

with μ̂_θ and Σ̂_θ being estimates of μ_θ and Σ_θ over N_r simulated observations with the proposed parameter θ. These estimates are defined in (4.10) and (4.13). As N_r increases, l_s starts to behave like a conventional log-likelihood.
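A minimal sketch of evaluating (4.4), with μ̂_θ and Σ̂_θ estimated from N_r replicate simulations as in (4.10)-(4.13); `replicates` is an N_m × N_r matrix of prevalence vectors, and the toy numbers are assumptions.

```python
import numpy as np

def synthetic_loglik(s_obs, replicates):
    """Synthetic log-likelihood (4.4): fit a multivariate normal to N_r
    replicate SS vectors, (4.10)-(4.13), and evaluate the observed SS."""
    mu_hat = replicates.mean(axis=1)                 # (4.10)
    S = replicates - mu_hat[:, None]                 # (4.12)
    cov_hat = S @ S.T / (replicates.shape[1] - 1)    # (4.13)
    diff = s_obs - mu_hat
    _, logdet = np.linalg.slogdet(cov_hat)
    return -0.5 * diff @ np.linalg.solve(cov_hat, diff) - 0.5 * logdet

rng = np.random.default_rng(3)
N_m, N_r = 5, 20            # N_r >= N_m so that cov_hat has full rank
replicates = rng.poisson(4.0, size=(N_m, N_r)).astype(float)
s_obs = rng.poisson(4.0, size=N_m).astype(float)
print(f"l_s(theta) = {synthetic_loglik(s_obs, replicates):.2f}")
```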

4.4 SLMCMC

As in ABC, instead of using an analytically tractable likelihood, SLMCMC aims to use SS to measure a distance between the data generated by the proposed parameters and the reference data. The difference between ABC and SLMCMC is that SLMCMC tries to replicate the MCMC method by using an SS-based synthetic likelihood.

This thesis tests two applications of the synthetic likelihood: a heuristic urn model and the synthetic likelihood defined in (4.4).

A binomial approximation

When replicating the synthetic likelihood in the event of a binomially distributed SS, some simplifications are possible. However, to correctly use the binomial approximation the measurements should be independent of one another. The following arguments assume this property.

Assume that each prevalence measurement P^(i) is drawn from the ith urn, where i = 1, 2, . . . , N_m and N_m is the number of prevalence measurements. The heuristic urn model is described by a binomial distribution,

\[
P^{(i)} \sim \mathrm{Bin}(N^{(i)}, p^{(i)}), \tag{4.5}
\]

where N^(i) is the number of observations and p^(i) the binomial probability for urn i. By estimating p^(i), one can define the likelihood of two measurements being drawn from the same urn.

From the data D_θ* generated by the sought-after parameter set θ*, one can estimate p̂^(i) by the frequentist approach,

\[
\hat{p}^{(i)} = \frac{E[P^{(i)}]}{N^{(i)}}, \tag{4.6}
\]

where E[P^(i)] is the expected value of P^(i). For a proposed parameter set θ̃, the model M(θ̃) is simulated and P̃ = (P̃^(1), P̃^(2), . . . , P̃^(N_m)) is extracted from the generated data D_θ̃. The probability mass function of a binomially distributed variable defines the probability that P̃^(i) is from the same distribution as P^(i), where the probability mass function is

\[
\mathbb{P}\bigl(\tilde{P}^{(i)} = P^{(i)}\bigr) = \binom{N^{(i)}}{P^{(i)}} \, \hat{p}^{\,P^{(i)}} (1 - \hat{p})^{N^{(i)} - P^{(i)}}. \tag{4.7}
\]

The synthetic likelihood, L(θ̃|D_θ*), is (4.7) or, if multiple P's are extracted from the data, e.g. a time series, the product of the probabilities,

\[
L(\tilde{\theta} \mid D_{\theta^*}) = \prod_i \mathbb{P}\bigl(\tilde{P}^{(i)} = P^{(i)}\bigr). \tag{4.8}
\]

By the approximations in (4.5), (4.7), and (4.8), one can construct a search algorithm similar to MCMC and determine an approximated posterior for θ̃.

A comment about the synthetic likelihood in (4.8): because of the assumed binomial distribution of P, each observation is taken to be discrete. This is a correct assumption given the binary result of the actual measurements; a farm cannot be half infected. But as we always compare these discrete observations, we do not put any more effort into arguing for continuity when comparing with other approximations of the likelihood.
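A minimal sketch of the urn likelihood (4.5)-(4.8) using SciPy's binomial pmf; the reference trajectories, node count, and infection probability are toy assumptions standing in for simulations of the sought-after parameter set.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(4)
N_obs = 126                    # assumed observable nodes per occasion
ref = rng.binomial(N_obs, 0.05, size=(10, 12))   # 10 reference trajectories
p_hat = ref.mean(axis=0) / N_obs                 # (4.6): one urn per occasion

def binom_loglik(prev_sim, p_hat, N_obs):
    """Log of (4.8): sum over occasions of the log binomial pmf (4.7) that
    the simulated prevalence was drawn from the same urn as the reference."""
    return float(np.sum(binom.logpmf(prev_sim, N_obs, p_hat)))

prev_sim = rng.binomial(N_obs, 0.05, size=12)    # one proposed-theta run
print(f"log L = {binom_loglik(prev_sim, p_hat, N_obs):.2f}")
```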

A normality approximation

For multiple measurements (simulations), the prevalence P can be approximated as normally distributed, as Wood proposed for general SS [21]. We therefore assume

\[
P \sim \mathcal{N}(\mu_\theta, \Sigma_\theta) \tag{4.9}
\]

with mean vector μ_θ and covariance matrix Σ_θ. One estimates μ_θ and Σ_θ in the following manner:

\[
\hat{\mu}^{(i)}_\theta = \sum_j s^{(i)}_j / N_r, \tag{4.10}
\]

where s is a vector consisting of the SS from multiple simulations with the same parameters θ, in this case the prevalence P, as

\[
s = \bigl( s^{(1)}, s^{(2)}, \ldots, s^{(N_m)} \bigr)
= \Bigl( \bigl( P^{(1)}_1, P^{(1)}_2, \ldots, P^{(1)}_{N_r} \bigr),
\bigl( P^{(2)}_1, P^{(2)}_2, \ldots, P^{(2)}_{N_r} \bigr),
\ldots,
\bigl( P^{(N_m)}_1, P^{(N_m)}_2, \ldots, P^{(N_m)}_{N_r} \bigr) \Bigr), \tag{4.11}
\]

where i = 1, 2, . . . , N_m indexes the prevalence measures and j = 1, 2, . . . , N_r the simulations with the parameter θ. To estimate Σ_θ, the correlation between the SS, one first computes the difference between each simulation vector s and the vector of mean values μ̂_θ, as

\[
S = \bigl( s^{(1)} - \hat{\mu}^{(1)}_\theta, \; s^{(2)} - \hat{\mu}^{(2)}_\theta, \; \ldots, \; s^{(N_m)} - \hat{\mu}^{(N_m)}_\theta \bigr), \tag{4.12}
\]

and then Σ_θ by the sample covariance matrix,

\[
\hat{\Sigma}_\theta = S S^T / (N_r - 1). \tag{4.13}
\]

Equation (4.4) then gives the synthetic log-likelihood.

Note that the number of simulations, N_r, used for the estimates μ̂_θ and Σ̂_θ needs to be larger than or equal to N_m in order to correctly determine the covariance Σ_θ. Otherwise, Σ̂_θ will have zero eigenvalues, and the determinant becomes zero. A computational trick that can be used to circumvent the problem is to nudge the diagonal of the matrix with a small value δ; however, this gives a slightly biased result.
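A small sketch of the failure mode and the δ-nudge: with fewer replicates than summary-statistic dimensions (N_r < N_m), the sample covariance (4.13) is rank-deficient, and adding δ to the diagonal restores invertibility at the cost of a small bias. The sizes and δ below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
N_m, N_r, delta = 12, 10, 1e-3        # fewer replicates than SS dimensions
reps = rng.normal(size=(N_m, N_r))
S = reps - reps.mean(axis=1, keepdims=True)
cov_hat = S @ S.T / (N_r - 1)         # sample covariance, (4.13)

print("rank without nudge:", np.linalg.matrix_rank(cov_hat))   # < N_m
cov_hat += delta * np.eye(N_m)        # nudge the diagonal by delta
print("log-determinant after nudge:", np.linalg.slogdet(cov_hat)[1])
```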


5 Results

The parameter estimation setting first simulates a dataset with the parameters given in Table 3.2, and then uses the two inference methods, ABC and SLMCMC, to see whether a posterior can be achieved. The aim is to estimate θ = (υ_12, υ_3), where υ_1 = υ_2 = υ_12 and the remaining parameters are viewed as fixed values, previously observed and determined. The true parameter set θ* that the methods try to estimate is as in Table 3.2: (υ_12, υ_3) = (0.0184, 0.00980).

5.1 ABC

The results cover two cases of ABC. The most general one uses a uniform prior, while the second imposes a scalar dependence between the two parameters, effectively creating one parameter, υ_12 = 2·υ_3.

For 5000 runs, with two separate υ for the age groups (υ_1 = υ_2 ≠ υ_3), the resulting posterior is given in Figure 5.1. In the figure, four different symbols represent different top percentages of the data: 10%, 5%, 2.5%, and 1.24%. Each higher level also includes the lower-level points, implying that all points presented are inside the 10% region, and so on.

To further investigate the posterior, the mean and standard deviation are extracted; Figure 5.2 and Figure 5.3 give the histograms of the top 100 values for the same ABC run. Presented in both figures are solid lines for the mean values and dashed lines for the true values, both added for illustrative purposes. The moments for υ_12 are mean 0.019025 and standard deviation 0.010493, as seen in Figure 5.2. The mean is close to the true value, but the total range is large, (0, 0.04). For υ_3, the mean is 0.006740 and the standard deviation 0.005190. Compared to υ_12 the range is smaller, (0, 0.02); the true value is still inside one standard deviation, but the histogram looks heavy on the left side.

The histogram in Figure 5.4 gives the resulting posterior with a scalar prior, υ_3 fixed at half of υ_12. Here υ_12 has mean 0.016977 and standard deviation 0.001491. Compared to the case without the prior knowledge applied, the mean is not closer to the true value. However, the standard deviation is one order of magnitude smaller for the same number of accepted values: one needs fewer simulations with the prior applied for the same amount of accuracy as in the case without it.

Figure 5.1: Posterior for θ = (υ_12, υ_3) using ABC with a total of N = 5000 trajectories and a decreasing acceptance from 10% to 1.24%. The different accuracies are represented by different colors and symbols; the finer accuracies also lie inside the coarser ones, so the 10% set also includes the 1.24% region.

5.2 SLMCMC

In testing the two SLMCMC methods, first a synthetic likelihood map is computed; then a prior condition is applied, which results in a one-dimensional parameter space for which the likelihood is also presented. If this map and space are well defined without singularities, and locating a maximum is feasible, then we apply the search algorithm SLMCMC.

Figure 5.2: Histogram over υ_12 using ABC for the top 100 values of a total of 5000 trajectories. In addition to the bars, the solid red line illustrates the mean value and the dashed blue line marks the true value of the parameter.

The SLMCMC algorithm, with both synthetic likelihood approximations, was used to extract posteriors. The algorithms ran until 100 values were accepted. From these values, every nth value was kept in order to remove as much autocorrelation as possible, where n was chosen as the first lag for which the autocorrelation was lower than 20%, referred to as the burn-in period.
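A minimal sketch of this thinning rule: pick n as the first lag at which the sample autocorrelation of the accepted chain drops below 0.2 and keep every nth value; the AR(1) chain here is an arbitrary stand-in for an SLMCMC chain.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a chain at a given lag."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

rng = np.random.default_rng(6)
chain = np.zeros(100)                       # stand-in for 100 accepted values
for i in range(1, 100):                     # AR(1) mimicking chain memory
    chain[i] = 0.8 * chain[i - 1] + rng.normal()

n = next(lag for lag in range(1, len(chain)) if autocorr(chain, lag) < 0.2)
thinned = chain[::n]                        # keep every nth accepted value
print(f"thinning lag n = {n}, kept {len(thinned)} of {len(chain)} values")
```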

The full-space coordinate grid is constructed from υ_3 ∈ [0, 0.02] and υ_12 ∈ [0, 0.04], and the prior later applied is υ_12 = 2υ_3.

Figure 5.3: Histogram over υ_3 using ABC for the top 100 values of a total of 5000 trajectories. In addition to the bars, the solid red line illustrates the mean value and the dashed blue line marks the true value of the parameter.

Binomially distributed prevalence

When estimating p̂, the reference dataset consisted of 10 simulated trajectories. Then, for each proposed parameter, only one trajectory was used for evaluation.

Figure 5.5 shows the log-likelihood map over the mentioned grid. One sees a similar pattern as in Figure 5.1: the lighter shade of blue represents a higher likelihood of the prevalence generated from the proposed θ. Contour lines are added to the figure to emphasize the change in likelihood as one travels from the darker regions in the upper right toward the lower left of the grid.

After applying the prior condition to θ, the log-likelihood can be presented in a one-dimensional parameter space, as shown in Figure 5.6. The blue triangles are computed values of the log-likelihood, and the red circle indicates the maximum log-likelihood at υ_12 = 0.018, a close estimate of the genuine value; this is also observable in Figure 5.7, which shows the likelihood rather than the log-likelihood.

Figure 5.4: Histogram over υ_12 using ABC for the top 100 values of a total of 5000 trajectories when applying the prior υ_12 = 2υ_3. In addition to the bars, the solid red line illustrates the mean value and the dashed blue line marks the true value of the parameter.

Figure 5.8 illustrates the SLMCMC-generated posterior for υ_12 with the prior υ_12 = 2υ_3 applied. The resulting mean is υ_12 = 0.018694 and the standard deviation 1.805509 · 10^−4. The burn-in was reached after 4 values.

Figure 5.5: The log-likelihood computed over θ-pairs using the binomial approximation of the prevalence. The lighter the color, the higher the likelihood. The white lines contour the level changes in the log-likelihood.

Normally distributed prevalence

In the computation of the synthetic likelihood, 10 simulations were made for each proposed parameter, requiring the diagonal of the estimated covariance matrix to be perturbed by δ = 10^−3 in order to have non-zero eigenvalues.

In Figure 5.9, one sees the log-likelihood map from the normally distributed approximation of the prevalence. The transitions are not as smooth as in Figure 5.5, but the major contour lines are there. The maps also differ in magnitude, as seen in the level scale on the right-hand side of the figures.

Applying the same prior to θ in the log-likelihood map as in Figure 5.6, one gets the log-likelihood versus υ_12 in Figure 5.10, and in Figure 5.11 the likelihood without the logarithm. The likelihood in Figure 5.11 was scaled by a factor 1/100 to emphasize the difference in likelihood between points. The red circle marks the maximum of the log-likelihood, υ_12 = 0.018. The right-hand side of the maximum is not as smooth as in Figure 5.6, which is caused by the non-smoothness present in Figure 5.9.

Figure 5.6: A splice of the log-likelihood over θ-pairs using the binomial approximation under the prior υ_12 = 2υ_3. The maximum likelihood is marked with a red circle at υ_12 = 0.018.

As for the binomial approximation, an SLMCMC algorithm is deployed, and the resulting posterior is visualized in Figure 5.12. The mean value is 0.015765, the standard deviation is 0.000541, and the burn-in was reached after 4 values. The result is similar to that of the binomial one, but with the mean outside the desired area.

Figure 5.7: The splice presented in Figure 5.6 but without the logarithm.

Figure 5.8: The posterior from using SLMCMC with the binomially approximated prevalence subjected to the prior υ_12 = 2υ_3, with an acceptance rate of 0.9599%. In addition to the bars, the solid red line illustrates the mean value and the dashed blue line marks the true value of the parameter.

Figure 5.9: The log-likelihood computed over θ-pairs using the normal approximation of the prevalence. The lighter the color, the higher the likelihood. The white lines are the contours of the levels in the log-likelihood.

Figure 5.10: A splice of the log-likelihood over θ-pairs using the normal approximation under the prior υ_12 = 2υ_3. The maximum likelihood is marked with a black circle.
