• No results found

Applications of susceptible infectious recovered models for measles risk in Colorado schools and childcare centers

N/A
N/A
Protected

Academic year: 2021

Share "Applications of susceptible infectious recovered models for measles risk in Colorado schools and childcare centers"

Copied!
44
0
0

Loading.... (view fulltext now)

Full text

(1)

APPLICATIONS OF SUSCEPTIBLE INFECTIOUS RECOVERED MODELS FOR MEASLES RISK IN COLORADO SCHOOLS AND CHILDCARE CENTERS

by

EMMA S. JONES B.A., Simpson College, 2015

A thesis submitted to the Faculty of the Graduate School of the University of Colorado in partial fulfillment

of the requirements for the degree of Master of Science

Biostatistics Program 2019

(2)

This thesis for the Master of Science degree by Emma S. Jones

has been approved for the Biostatistics Program

by

Kathryn Colborn, Advisor Debashis Ghosh

Edwin Asturias

(3)

Jones, Emma S. (MS, Biostatistics)

Applications of Susceptible Infectious Recovered Models for Measles Risk in Colorado Schools and Childcare Centers

Thesis directed by Assistant Professor Kathryn Colborn

ABSTRACT

Measles was considered eliminated in the U.S. in 2000, but in the first 7 months of 2019 there have been over 1,100 reported cases. Many states, including Colorado, allow for non-medical exemptions (NMEs) to vaccinations for school-aged children, and NMEs are thought to be a major driving factor in the recent US measles outbreak. Colorado has the lowest reported MMR vaccination rate for incoming kindergartners in the country, with only 88.7% of children covered. Schools and child care facilities provide a unique setting in which to evaluate risk of measles outbreaks. We used compartmental sus-ceptible, infectious, recovered (SIR) models and enrollment data from Colorado schools and child care centers to simulate measles outbreaks. Both deterministic and stochastic SIR models were used. An-alyzing outcomes of these models showed limited influence from enrollment size or facility designation (urban, rural, or frontier facilities), but major influence from up-to-date rates. Ordinary Kriging was used to generate risk maps for measles outbreaks across the state.

The form and content of this abstract are approved. I recommend its publication. Approved: Kathryn Colborn

(4)

ACKNOWLEDGEMENTS

Thank you to Dr. Katie Colborn for her guidance and for advising this thesis, and to my committee members for providing invaluable feedback throughout this entire process. Thank you to Marlee Barton and Elizabeth Abbott for allowing me to use the data they meticulously collected and cleaned from the Colorado Department of Health and Environment database. Finally, thank you to the Department of Biostatistics for funding my graduate studies and providing me with the tools to complete both my Master’s degree and this thesis.

(5)

TABLE OF CONTENTS CHAPTER

I. INTRODUCTION . . . 1

I.0.1 Measles Virus and Vaccination . . . 1

I.0.2 Community Immunity . . . 2

I.0.3 SIR Models . . . 3

II. AIM 1: DETERMINISTIC SIR MODELS APPLIED TO COLORADO SCHOOL AND CHILD CARE CENTER DATA . . . 6

III. AIM 2: STOCHASTIC SIR MODELS . . . 12

III.1 Introduction to Stochastic SIR Models . . . 12

III.1.1 Impact of Enrollment . . . 15

III.1.2 Impact of UTD Rates . . . 20

IV. MAPPING OUTCOMES . . . 26

IV.1 Scatter Plot Maps . . . 26

IV.2 Ordinary Kriging . . . 26

V. CONCLUSIONS . . . 33

V.1 Summary of Thesis . . . 33

REFERENCES . . . 35

APPENDIX A. ABBREVIATIONS . . . 37

(6)

LIST OF TABLES TABLE

II.1 Number of facilities, facility size, and facility UTD rates by facility type and designation. . . 7 III.3 Results from stochastic and deterministic (det) models with varying enrollment and up to

date rate held constant at 65%. . . 17 III.4 Stochastic and deterministic results from 15 Colorado facilities . . . 19 III.5 Results from stochastic and deterministic (det) models with varying UTD rates and

(7)

LIST OF FIGURES FIGURE

II.1 Infected curves from outbreak simulations for each facility using deterministic models. . . 8 II.2 Outbreak trends and total percent infected across quantiles of sizes of public schools,

private schools, and child care centers. . . 9 II.3 Outbreak trends and total percent infected across facility designations. . . 10 III.1 Simulation of an outbreak at a school with 1000 students and an up to date rate of 65%. . . 14 III.2 Simulation of an outbreak at a school with 1000 students and an up to date rate of 90%. . . 14 III.3 Stochastic and deterministic results for increasing enrollment . . . 16 III.4 Probability of outbreak and median days to DFE as enrollment changes . . . 17 III.5 Relationships between UTD rates and proportion infected, median days to DFE, and

probability of outbreak for each facility size group. . . 19 III.6 Relationships between enrollment and proportion infected, median days to DFE, and

probability of outbreak for each UTD rate group. . . 20 III.7 Stochastic and deterministic results for increasing UTD rates . . . 22 III.8 Median days to DFE, approximate R0, probability of outbreak, and median proportion

infected as UTD rates increase from 65% to 95% while enrollment is held constant at 1000. . 23 III.9 Proportion infected after 120 days as UTD rates increase . . . 23 IV.1 Results from stochastic model for each school or child care facility in Colorado. Results are

in terms of median proportion infected over 120 days of an outbreak. . . 27 IV.2 Results from deterministic model for each school or child care facility in Colorado. Results

are in terms of proportion infected over 120 days of an outbreak. . . 27 IV.3 Example semivariogram displaying components of a semivariogram. . . 30 IV.4 Semivariogram using Colorado school data, which was used to create a map with hotspots

generated from Kriging. . . 31 IV.5 Kriging maps of 120-day deterministic simulation results . . . 31 IV.6 Kriging maps of 120-day stochastic simulation results . . . 31

(8)

CHAPTER I INTRODUCTION

I.0.1 Measles Virus and Vaccination

Measles is a highly contagious viral infection that causes systemic illness, and is commonly known for affecting the skin, throat and causing high fevers. The virus’s high virulence comes from being well-adapted to the human body, requiring only a low dose of virus to cause an infection. It also can live for up to two hours in airspace and on surfaces. It is estimated that one infected person can infect 90% of susceptible people with whom they come into contact (Centers for Disease Control and Prevention (2019d)). An infected person is thought to be infectious for around four days prior to showing symptoms (Centers for Disease Control and Prevention(2019d)).

Measles is most dangerous for children under 5, adults over 20, and those that are immunocompro-mised (Centers for Disease Control and Prevention(2019c)). Symptoms include cough, coryza, conjunc-tivitis followed by high fevers and rash, and prostration; complications include pneumonia, encephalitis, and secondary bacterial infections (Centers for Disease Control and Prevention(2019c)).

Measles became a nationally notifiable disease in 1912 in the US when there were roughly 6,000 documented cases per year. Around 1963, the United States saw 3-4 million infections per year, 400-500 deaths, and 48,000 hospitalizations. That same year, a vaccination became available for measles, and it was declared eliminated from the U.S. in 2000, though this status was relatively short lived. As of April 19, 2019, there have been 1164 individual cases across 30 states in the United States. The rate of infections in 2019 quickly surpassed 2014 numbers, which previously saw the most annual cases since 2000 (Centers for Disease Control and Prevention (2019a)). The number of infections is now the highest number seen since 1992. As the country experiences significantly large outbreaks of measles among pockets of unvaccinated individuals in 2019, it is important to evaluate the risk of outbreaks at a community level to identify these clusters of susceptible individuals. Schools and child care facilities provide a unique opportunity to evaluate risk of local outbreaks, as they represent communities with relatively constant mixing of individuals in a confined location. One way of expressing this risk is through the use of susceptible, infectious, recovered (SIR) models of disease transmission potential, given a population size and susceptible proportion. SIR models are compartmental models that simulate epidemic curves for a disease outbreak. Prior research has demonstrated their usefulness for modeling disease transmission and estimating the potential magnitude of an outbreak (Allotey(2017),Zhouet al. (2019),Cauchemezet al.(2008)).

(9)

All 50 states have laws requiring vaccination for school attendance, unless a parent or guardian provides an acceptable reason meeting state vaccine exemption policies. Increasingly, schools in states such as Colorado report rates of immunization for various diseases, including measles. Per Board of Health rules, the Colorado Department of Public Health and Environment (CDPHE) began in 2017 to publish comprehensive immunization coverage information online for all licensed child care facilities and K-12 schools (Pabonet al.(2017)). Facilities with an enrollment size 10 or greater are required to report their data on an annual basis. Data represent almost 1 million children and have shown widespread variation in children fully immunized for MMR, as well as exemption rates across and within counties, school districts, and across school types. Historically, measles cases in Colorado have only accounted for zero to eight percent of US cases (Centers for Disease Control and Prevention (2019a)), despite state-level MMR vaccination coverage remaining well below national coverage state-level targets needed to ensure protection from measles. Colorado has received attention for ranking last among states for kindergarten MMR vaccination coverage (88.7% for 2017-18 school year) and is one of the states with the highest rates of nonmedical exemptions (NMEs) from school-required vaccines (4.5%). This is at least partially attributed to its lenient exemption policies (Whittingtonet al. (2017)). The demonstrated correlation between high rates of NMEs, lower MMR vaccination coverage linked to anti-vaccination sentiments, and higher cases of vaccine preventable diseases (VPDs) is concerning for Colorado as the national case count of measles continues to climb.

In this thesis, we sought to estimate the risk of measles outbreaks in Colorado schools and child care facilities using both deterministic and stochastic SIR models. We also utilize ordinary Kriging to generate smooth maps of model risk estimates. The goal was to inform policy makers and parents about measles risk and to develop a modelling framework that can be applied to school data in other states or for other infectious diseases in Colorado.

I.0.2 Community Immunity

Community immunity, also known as herd immunity, is defined as the level of vaccine coverage against a particular infectious disease that provides protection to those who are unvaccinated in a population (Vaccines.gov (2019)). To estimate this for a particular infectious disease, you first need an estimate of basic reproduction number, often referred to as R0, or the expected number of secondary infections generated by a single, typical infection in a completely susceptible population. Mathematically, R0 is

βN

(10)

where β is the contact rate and γ is the removal rate. Therefore, if β=0.9, then an infectious person will infect 9/10 susceptible individuals with whom they come into contact, and these individuals will be infectious for a period of time equivalent to 1/γ. Given R0, we can estimate p, the proportion of the population needed to be vaccinated to achieve community protection, by the following

p = 1 − 1 R0

. (I.2)

It is clear from the equation above that as R0increases, so does the threshold needed to achieve commu-nity protection. Anderson and May reported an R0=18.8 and p=0.95 for measles (Anderson and May (1992)). Other estimates of R0for measles have been reported in the range of 12-18 (Centers for Disease Control and Prevention(2019d),Guerraet al.(2017)).

I.0.3 SIR Models

SIR models use a system of ordinary differential equations (ODEs), a differential equation that involves only ordinary derivatives with respect to only one dependent variable (Nagle et al. (2012)), to model a compartmentalized population. The classic model has Susceptible, Infected, and Recov-ered/Removed compartments. Because they are built using ODEs, these models give deterministic results and are most useful for simulating simple population dynamics. The system of ODEs in a three compartment model is written out in equationI.3.

Every person in the population must be assigned a compartment. Those that have been vaccinated start in the Recovered/Removed compartment, and the rest of the population starts in the Susceptible population. A single person moves from Susceptible to the Infected compartment, which starts the outbreak. Due to the highly contagious nature of the measles virus, it takes only a single person to start a simulated outbreak. This should only turn into an outbreak if the population does not have herd immunity, a form of indirect protection when a large enough proportion of a population is vaccinated against a disease (Vaccines.gov(2019)). If enough people are vaccinated, then even those who have not received the vaccination are considered protected from the disease.

dS dt = −βS(t)I(t) dI dt = βS(t)I(t) − γI(t) dR dt = γI(t) (I.3)

(11)

Multiple assumptions must be made when using simple ODEs. First, a closed population is assumed if population dynamics such as immigration, births and deaths do no warrant consideration. Second, homogeneous mixing is assumed; every person in the population is equally likely to meet anyone else in the population (Britton(2010)). Third, for the purpose of the models used in this thesis, movement through compartments is unidirectional, and a person can move at most 2 times.

The model is defined by its starting configurations and parameters that describe movement between compartments. Starting configurations are typically S(0) = 1 − ǫ, I(0) = ǫ, and R(0) = 0, where ǫ is the small proportion of the population that is initially infected. This term must be positive; the derivatives would be constant and equal to 0 otherwise (Britton(2010)). The assumption of uniform mixing implies that infections will occur at a rate that is proportional to S(t)I(t), the size of the Susceptible and Infected compartments at time t (Britton(2010)). S(t) is monotonically decreasing t S(∞) while R(t) is monotonically increasing to R(∞). I(t) can be rewritten as

dI

dt = I(t)(βS(t) − γ). (I.4)

From I.4, we can see that I(t) will increase when βS(0) > γ. Eventually, S(t) will have enough people move out of the compartment so that βS(0) < γ, causing I(t) to decrease (Britton(2010)). The difference in whether I(t) will increase or decrease depends on R0 := β/γ, as described in the previous section. However, this is a generalization that should be made for large populations only. Since the Colorado schools data have small populations and we are assuming we know the expected (mean) infection time, an approximation outlined by van den Driessche (2017) based on the final size equation of an epidemic should be used instead:

log(S(∞)) = −R0(1 − S(∞))

R0= −(log(S(∞))/(1 − S(∞))).

(I.5)

This equation yields the an approximation of the R0of the epidemic (van den Driessche(2017)). Other approximations can be made, such as using

R0= βS(0)

γ , (I.6)

also outlined in van den Driessche (2017), but this is the expected value of R0at the start of the outbreak. The two approximations will be compared in later sections.

(12)

SIR models allow the exploration of what outbreaks may look like over time for different popula-tions, how many people were infected over the entire outbreak, and the influence of model parameters representing infection rates and time to recovery.

The aims of this thesis are the following:

1. Examine outbreak risk across Colorado schools and child care centers using deterministic SIR models

2. Explore the influences of outbreak risk using stochastic SIR models, and to compare results to those from deterministic models

(13)

CHAPTER II

AIM 1: DETERMINISTIC SIR MODELS APPLIED TO COLORADO SCHOOL AND CHILD CARE CENTER DATA

With new cases occurring across the country every day, all with the potential of starting outbreaks, it is important to evaluate risk at the community level. Schools and child care centers are convenient environments to model risk due to relatively constant mixing of children in a consistent location and reliable data on up to date (UTD) rates. It is also worth evaluating risk at these locations due to the fact that the majority of susceptible and potentially infectious individuals are children that would be attending school or frequently visiting a child care center.

Colorado requires every school and every child care center with 10 children or more to report the number of vaccinations and vaccination boosters per child at their facility (Pabonet al. (2017)). Using these numbers, UTD rates were estimated at each school and child care center. It has been shown that a single dose of MMR vaccine gives 93% protection and two doses give 97% protection (Centers for Disease Control and Prevention(2019e)). The first vaccine is given between the age of one to two, and the second dose is given between the age of four and five. Because of this, child care centers have less coverage against the measles due to having many children under the age of five. Schools have higher coverage simply because the majority of their population is old enough to have received both vaccinations.

As described in the introduction, the dynamics of an outbreak from an SIR model are characterized by two parameters: beta and gamma. Beta determines the rate of a susceptible person becoming infected after encountering an infected person (Nagleet al.(2012)). Gamma determines the rate that an infected person recovers and is then removed from the outbreak, and is equal to 1 over the time to recovery (Nagle et al. (2012)). Both parameters are set to a fixed value in basic SIR models. Based on MMR vaccine data and measles natural history, gamma was set to 0.90 and beta was set to 0.10. TableII.1shows the number of facilities, facility size, and the median UTD rates by facility type and facility designation.

With these data, 60-day deterministic SIR simulations were run for each facility using reported proportion susceptible and proportion removed (removed meaning UTD on vaccinations with the ap-propriate protection). A single infected person was then introduced to this model, where the proportion infected equaled one divided by that facility’s enrollment.

We were also interested in describing outbreak potential for different types of schools and child care centers. Each facility had indicator variables regarding whether it was a public school, private school, or child care facility and whether a it was located in an urban, rural, or frontier area of Colorado. Furthermore, we categorized facilities into their enrollment size using quantiles.

(14)

Table II.1: Number of facilities, facility size, and facility UTD rates by facility type and designation. Facility Type n Median Size (IQR) Median UTD (IQR)

Public 1706 410 (263, 568) 92.8 (90.1, 94.4)

Private 156 150 (45, 247) 89.2 (79.3, 95.0)

Child Care 1728 48 (29, 78) 89.5 (85.5, 92.5) Facility Designation n Median Size (IQR) Median UTD (IQR)

Urban 2930 140 (49, 436) 91.4 (87.8, 93.0)

Rural 476 92 (34, 298) 90.5 (85.5, 93.0)

Frontier 183 52 (24, 159) 90.1 (87.3, 93.0)

The 60-day time series simulations were compared visually by fitting cubic smoothing splines to facility demographics described above. Careful consideration was needed when determining which visu-alizations best showed how outbreak size (percent infected) and length were influenced by varying UTD rates, facility type, facility location, and facility size.

Figure II.1, where percent infected was plotted against days of outbreak, shows that there is a relationship between size of outbreaks (in percent infected) and facility type. The smoothed curve fit to the public schools simulations never exceeds 1% infected, while the curves for both child care centers and private schools exceed 3%. FigureII.1also shows that child care centers had some individual simulations exceed 50% infected at the peak of the outbreak curves. Only 1 simulation of from the private school data exceeded 50% at the peak, while all other curves peaked below 30%. There were multiple curves that peaked above 30% for public schools, but the majority of the curves stayed close to 0% infected, resulting in a much lower smoothed curve. Private schools had the most variability in outcomes, shown by the confidence band around the green smoothed private school curve. The public school and child care center curves had confidence bands tight enough that they are not visible on this scale, indicating low variability in simulation results.

Figure II.2 show trends in percent infected over an outbreak and the total percent infected versus facility UTD rates for each facility type. Quantiles of size were calculated separately for each facility type to create categorical variables for enrollment. If we first look at the legends of these graphs, we can see that the size distribution across facility types varies greatly. A large private school has roughly 250 students enrolled, while a large public school has 600 or more. As would be expected, child care centers are much smaller than schools, where large child care centers have around 75 or more children. Despite differences in distribution in size, we can see similar trends across the three graphs in how outcomes look for the categorical sizes. Small facilities consistently have the worst outcomes, experiencing the largest percentages of children infected out of the size categories. Smaller populations peaked earlier than larger populations, as well as having much higher peaks in the outbreak time series graphs. Across all facility

(15)

Figure II.1: Infected curves from outbreak simulations for each facility. The top left shows smoothing splines that were fit to assess the average trends. Graphs for each facility type show the average curve in color with each facilities’ infected curve.

types, we see that as size increases, the outbreak curves dampen. If we look at the graphs with UTD percent on the x axis and percent infected on the y, we can see here as well that increasing enrollment decreases the impact of an outbreak. For public schools, all but the smallest size group did not experience an outbreak if UTD rates were around 95% or higher. It appears that large public schools with 95% UTD rates and above have full protective immunity and are not experiencing outbreaks. Private schools and child care centers did not have curves that reached 0, though it is obvious they are approaching 0.

FigureII.3shows the percent infected over an outbreak and UTD rates versus total percent infected across facility designations. Colorado schools can be categorized as either urban, rural, or frontier. Frontier schools experienced the worst outbreak outcomes, with peak infected curves reaching 4%. Rural schools peaked just under 3%, while urban schools remained below 2%. Frontier and rural designations had much wider confidence bands, likely because facilities were most frequently designated as urban. The curves are all approaching 0 as time moves towards 60 days, but none of the curves have resolved to 0 cases. If we look at the right panel of Figure II.3 where UTD rates are graphed against total percent infected, trends for urban and rural schools look identical and almost perfectly overlap. The

(16)

Figure II.2: Outbreak trends and total percent infected across quantiles of sizes of public schools, private schools, and child care centers.

(17)

curve for frontier schools doesn’t match the others until UTD rates reach 85%, and even at high UTD rates the curve is a bit sporadic. The frontier subset of the data has fewer data points than the other designations, resulting in an over-flexible spline fit at higher UTD rates and influenced by a few points at low UTD rates. If there were more data for this subgroup, perhaps the curve would match the other two designations. However, this graph implies that designation does not influence total percent infected over 60 days.

Figure II.3: Outbreak trends and total percent infected across facility designations.

These basic deterministic SIR models show that communities with more dense populations tend to have UTD rates that are high enough for community immunity to take effect. Communities with less dense populations also tended to be child care facilities and private schools. Small public schools, private schools, and child care centers had the worst outbreak outcomes, while larger facilities had on average less drastic outcomes than their less populous counterparts. However, in every subgroup, total percent infected decreased as UTD rates approached 1. The major limitation to this type of model, however, is that constant mixing is assumed, while this is obviously not the case in the real world. Colorado has likely not experienced an outbreak of measles due to no infected individual being introduced to a vulnerable area of the state. More densely populated areas are more likely to have more migration and international travelers than the less dense communities, which is known to be associated with measles outbreaks. The second major limitation to this simulation study is that the models used were deterministic and did not allow for any variability in the simulation outcomes. It is fair to assume that there should be some uncertainty and randomness in the final number infected when simulating outbreaks in small populations, such as schools or child care centers. It is also reasonable to assume that in large populations, if only

(18)

one initial infected person is introduced, an outbreak may not take off depending on the probability of meeting a susceptible person. No chance was incorporated in the deterministic model, which is why stochastic compartmental models are a useful alternative.

(19)

CHAPTER III

AIM 2: STOCHASTIC SIR MODELS

III.1 Introduction to Stochastic SIR Models

Stochastic SIR models are compartmental models that incorporate chance to allow for varying results. These models are dynamical simulations where there is a dependence on information from the previous time step. Continuous variables are substituted for discrete while process rates are substituted for process probabilities (Regoes (2018)). A stochastic model can be set up so that the population space is discrete and not allow for partial individuals; previously, the deterministic models used proportion of a population, which can give partial individuals if results are converted from proportions. Stochastic models do not allow the identification of individuals within the population; these are still population-based models. Stochastic models are an improvement over deterministic models because they incorporate randomness to the model processes. Simulations with the same inputs for identical populations can have varying results. Because of this, standard errors and confidence intervals can be calculated for simulation outcomes, allowing for a measure of uncertainty.

The stochastic models have the same starting configurations as the deterministic models, except that starting compartment sizes are in terms of number of people, rather than proportion of the population. Infected people come into contact with others in the population randomly in time at a constant rate β; this is according to a Poisson process with intensity β (Britton(2010)). Each contact is with an individual that had been selected uniformly at random from the entire population. All contacts are assumed to be mutually independent. "Contact" assumes that if an infected person meets one susceptible, they will become infected. The infected people stay infectious for a random time I. Infectious periods are assumed to be independently and identically distributed (IID), as well as independent of the contact process. Infectious periods are distributed FI, with mean E(I) = 1γ; this allows the stochastic model to mirror the deterministic. We will be using an FI that is exponentially distributed, and the process is Markovian with jump intensities (Britton(2010)).

There are multiple algorithms to determine how many individuals move between compartments at each time step. The most commonly referenced algorithm is the tau-leaping method based on the Gillespie Method that can easily be configured to fit the above specifications (Regoes (2018)). The first step of the algorithm is to determine when an event will happen. If present time is t, then t + τ represents the time when an event happens. This is based on a random number that is distributed typically exponentially or Poisson. We will be using Poisson, and that is scaled by the sum of the process rates (Regoes (2018)). Contact between infected people and susceptibles are considered an

(20)

event, as well as an infected person recovering and moving to the Recovered compartment. This is distributed Poisson since these these events are considered rare. The second step decides what type of event happens. This is determined by “drawing a process randomly from all possible processes according to their respective probabilities” (Regoes (2018)). Once the algorithm has picked which process and when it will happen, the compartments can be updated to reflect that event, and this process is iterated, typically at least 1,000 times.

Local dynamics can then be described by a ‘pure jump’ stochastic differential equation with discrete states

dX(t) = Sµ(dt), (III.1)

where µ(dt) is a vector of scalar counting measures µ(dt) = [µ1(dt), . . . , µN trans(dt)]T. For any time t, if transition k occurs, then the state vector S is updated by

X(t) = X(t−) + Sk, (III.2)

where Sk is the kth column in S. In equation III.1, the N trans number of possible state transitions are competing independent Poisson processes. The process that wins determines which transition will happen next and changes the state according to equation III.2. The stochastic simulation will then continue under the Markov assumption “where previous events are remembered via the state variable X only" (Widgrenet al.(2018)). Due to the nature of the stochastic model, with contacts being distributed Poisson(β) and infectious periods distributed Exponential (γ), results look bimodal. Because of this, medians and the IQR should be used as summary statistics rather than the mean and standard deviations or standard errors.

The stochastic model was first run with 10,000 identical nodes with 449 people susceptible, 1 person infected, and 650 people vaccinated or recovered. This was a 120-day simulation and represents a population with 65% up to date rate for the MMR vaccination. The results can be seen in figureIII.1. We can see in the left pane of FigureIII.1that on average, the size of the removed compartment (purple) approaches 1000, while the susceptible compartment (red) approaches 0. The outbreak peaks just under 30 days. From the shading showing the IQR, we can also see that some nodes experience no outbreak and the compartment sizes did not vary much from the starting values. This is especially apparent in the right pane of FigureIII.1, where 10 random nodes out of the 10,000 that were run are shown. There was at least 1 of these 10 that did not experience an outbreak and has flat lines for all compartments over the 120 days. Some nodes experienced outbreak peaks around 20 days, while others around 40

(21)

Figure III.1: Simulation of an outbreak at a school with 1000 students and an up to date rate of 65%. The median result and corresponding IQR are shown for all compartments: Susceptible (S), Infected (I), Cumulative Infected (C), and Removed/Recovered (R).

(a) SIR Curves with shading showing the IQR. (b) 10 nodes with varying outcomes

Figure III.2: Simulation of an outbreak at a school with 1000 students and an up to date rate of 90%. The median result and corresponding IQR are shown for all compartments: Susceptible (S), Infected (I), Cumulative Infected (C), and Removed/Recovered (R).

(a) SIR Curves with shading showing the IQR. (b) 10 nodes with varying outcomes

days. The magnitude of the outbreak also varies between nodes. The most extreme have peaks around 150 infected, while some simulations still experienced outbreaks but only 100 people were infected at the peak.

Next, a model simulating a population with a 90% up to date rate was created with 100 people initially susceptible, 1 infected, and 899 vaccinated or recovered. The results can be seen in FigureIII.2. We can see that on average no outbreak has occurred over the 10,000 nodes. The shading indicating the interquartile range in FigureIII.2also show that when the initial infected individual managed to infect any susceptible people, no major outbreak occurs within the 60 days. If we look at 10 random nodes that were run, we see slight changes in the compartment curves for a few of the nodes, but no outbreak is able to take off.

(22)

In order to explore differences between the outcomes of the deterministic and stochastic models, the proportion infected at each time point in the outbreak was compared for the two types of models, as well as the total proportion infected after 120 days. Results from the stochastic and deterministic models were overlaid for ease of comparison. It was also of interest to explore how outcomes changed as enrollment and up to date rates increase and compare the two types of models over these scenarios.

III.1.1 Impact of Enrollment

To assess the impact of population size on outbreak outcomes, enrollment size was varied from 50 to 3050 students and up to date rate was held constant at 65%, an UTD rate where outbreaks are guaranteed. Medians and the respective interquartile ranges were calculated for the stochastic outcomes. Deterministic models were run with the same starting configurations and values for β and γ. The models were run over 120 days. The median stochastic outcome with IQR and deterministic outcome trends are shown in FigureIII.3for each enrollment size.

The deterministic model always has more extreme outcomes at the peak of the outbreak than the stochastic model, though the deterministic results generally fall within the 75th quartile of the stochastic results for each simulation. As enrollment increased, the peak of the infected curves stayed at approx-imately 0.09 for the median of the stochastic model, 0.11 for the outcome of the deterministic model, and 0.105 for the 75th quartile of the stochastic model. The exception to this was the simulation with enrollment set to 50. The median of the peak was lower than the other simulations, around 0.075, but the 75th quartile and deterministic models peaked much higher, exceeding 0.125. This illustrates how the IQR of the median proportion infected at the peak of an outbreak is significantly wider than that of larger populations, and that outcomes seem more volatile for smaller populations than larger. An increase or decrease of only 1 infection equates to a large proportion of the entire population for small en-rollment sizes. The jagged appearance of the median curves for the small populations also demonstrates the limited numbers of transitions that are possible at each given time point, and how each transition leads to a large change in proportion infected.

If the proportion infected at the end of the outbreak (cumulative infected) is analyzed instead of the size of the peak of the outbreak, it is possible to see that this summary statistic is robust against changes in population size. The median proportion infected (MPI) ranged from 0.300 to 0.330 as enrollment increased, shown inIII.3. The IQR was consistent at approximately (0, 0.334) for all simulations with more than 1000 enrolled. The IQRs were slightly higher for smaller populations; the widest IQR was seen, (0, 0.340) when enrollment was set to 50. In every stochastic simulation, the median percentage of susceptible people becoming infected was roughly 86%.

(23)

a) Size = 50 b) Size = 550

c) Size = 1050 d) Size = 1550

e) Size = 2050 f) Size = 2550

g) Size = 3050

Figure III.3: Infected curves in terms of proportion infected for stochastic (black) and deterministic (red) simulations. The up to date rate was held constant at 65% while enrollment varied from 50 to 3050, increasing by increments of 500. 75th quartile confidence bands are shown for the median stochastic result; the 25th quartile was consistently 0.

(24)

Table III.3: Results from stochastic and deterministic (det) models with varying enrollment and up to date rate held constant at 65%. The models were run for 120 days. The deterministic result for proportion infected is shown. This is followed by the stochastic results: the median proportion infected (MPI) with the 25th and 75th quantiles about the median, as well as the median number of days until disease-free equilibrium (DFE). All stochastic models reached DFE within 120 days, though simulations with enrollment above 1000 took over 90 days to reach DFE.

Enrollment Prop. Inf. MPI (IQR) DFE P[Outbreak] E[R0] Approx. R0

50 0.321 0.300 (0.020, 0.340) 37 0.755 3.15 1.62 550 0.331 0.325 (0.002, 0.335) 81 0.760 3.15 1.67 1050 0.332 0.329 (0.001, 0.334) 92 0.719 3.15 1.67 1550 0.332 0.329 (0.001, 0.334) 98 0.717 3.15 1.67 2050 0.332 0.330 (0, 0.334) 103 0.701 3.15 1.68 2550 0.332 0.330 (0, 0.334) 107 0.702 3.15 1.68 3050 0.332 0.330 (0, 0.333) 110 0.699 3.15 1.68

Figure III.4: The left panel shows how the probability of an outbreak (more than 1 person infected) decreases gradually as enrollment increase. The right graph shows for the median days to disease-free equilibrium increases steadily as enrollment increases.

It is frequently of interest to describe the states of equilibria that a compartmental model should reach. If the disease is able to persist in a population, endemic equilibrium is a descriptive characteristic of that model. In the case of measles, however, the virus is too virulent and infectious to remain present in any but very large populations. Because of this, endemic equilibrium is not useful and disease-free equilibrium (DFE) should be used instead. Analyzing days to DFE gives insight to how population size affects the length of an outbreak. Median days to DFE shows what the median length of time that the 10,000 simulations took to have 0 individuals in the Infected compartment. Larger populations are capable of sustaining measles infections for longer, while measles burns through smaller populations in only a fraction of the time. There was a sharp increase in DFE when enrollment went from 50 to 550: 37 to 81, an increase of 212%. Incrementally increasing enrollment between 550 and 3050 led to more consistent increases in DFE, which can be seen in III.4. It is unclear whether the trend seen in the simulations with at least 550 people continues as enrollment approaches infinity. It is unlikely to find

(25)

schools much bigger than 3000, however, so exploring populations much larger to this was not relevant to this simulation study. Based on what is known about measles, disease free equilibrium may not be met with large populations, and the system would reach endemic equilibrium instead, assuming vaccination rates are low enough that there is not herd immunity.

The probability of outbreak can be estimated by calculating the fraction of the 10,000 simulations that saw more than 1 infected individual at any point over the course of 120 days. Simulations were considered to have no outbreak if the initial infected individual never passed the virus to another person. III.4shows the trend in probability of outbreak as enrollment is increased. The downward trend when enrollment increases from 550 to 1050 was expected, since smaller populations allow for more volatile outcomes, though the trends seems to be leveling fairly quickly. All enrollment sizes had probabilities of outbreak exceeding 2/3, which is very high, though this was by design due to having an up to date rate of only 65%.

Finally, the expected R0 was calculated from the starting configurations and approximate R0 was calculated from the final compartment sizes. Since every simulation had the same UTD rate, the expected R0 was consistent for all simulations and was expected to be 3.15. This was based on the calculation described previously in (1.6) in Chapter 1. The R0 of the outcome was estimated using (1.5), which uses data from the results of the simulation and not only the starting configuration. R0 increases as enrollment exceeds 100, but otherwise is consistently 1.67 or 1.68. This was expected due to how the median proportion infected was robust to changes in enrollment size, which is used to calculate the approximate R0.

A similar set of simulations was run but where the starting configurations were taken from the Colorado schools and child care center data. Fifteen facilities with varying enrollment and UTD rates were chosen that fell into one of the following combinations of enrollment and UTD rates: approximately 100, 500, 1000, or 3000 children enrolled, and UTD rates less than 70%, between 70-80%, between 80-90%, and greater than 90%. There were no extra large facilities with an UTD rate between 70-80%.

Figure ?? further illustrates the trends described above between enrollment, median days to DFE, and the probability of outbreak. The MPIs for different UTD groups saw flat trends as enrollment increased. The MPI trends for facilities with at least 80% UTD rates were all consistently close to 0 as enrollment increased. As would be expected, facilities with less than 70% UTD rates saw the highest MPIs, ranging from 0.3 to 0.4. Facilities with UTD rates between 70-80% saw MPIs ranging from approximately 0.1 to 0.15. Trends in median days to DFE were similar to those for MPI. Facilities with UTD rates above 80% saw flat trends in median days to DFE; facilities with UTD rates above 90% had the lowest median days to DFE for every enrollment size. Facilities with UTD rates below 80% saw median days to DFE

(26)

Table III.4: Results from stochastic and deterministic (det) models from child care centers and schools in Colorado. No school had enrollment near 3000 and an UTD rate in the 70-70% range. The models were run for 120 days. The deterministic result for proportion infected is shown. This is followed by the stochastic results: the median proportion infected (MPI) with the 25th and 75th quantiles about the median, as well as the median number of days until disease-free equilibrium (DFE). All stochastic models reached DFE within 120 days.

Enrollment UTD Prop. Inf. MPI (IQR) DFE P[Outbreak] E[R0] Approx. R0

1066 0.58 0.409 0.408 (0.002, 0.412) 88 0.754 3.78 2.13 92 0.63 0.352 0.348 (0.011, 0.359) 50 0.774 3.33 1.76 461 0.67 0.309 0.302 (0.002, 0.312) 79 0.755 2.97 1.58 3765 0.68 0.298 0.295 (0, 0.299) 117 0.671 2.88 1.55 89 0.74 0.228 0.146 (0.000, 0.236) 34 0.699 2.34 1.28 420 0.77 0.188 0.094 (0, 0.190) 59 0.675 2.07 1.21 943 0.77 0.187 0.131 (0, 0.187) 80 0.679 2.07 1.24 3828 0.86 0.187 0 (0, 0.004) 12 0.360 1.26 1.08 94 0.85 0.086 0.011 (0, 0.064) 12 0.573 1.35 1.09 492 0.85 0.066 0.002 (0, 0.033) 13 0.578 1.35 1.08 107 0.94 0.009 0 (0, 0.009) 9 0.334 0.54 1.03 946 0.84 0.008 0.001 (0, 0.051) 13 0.592 1.44 1.09 509 0.94 0.002 0 (0, 0.002) 9 0.351 0.54 1.03 1144 0.94 0 0 (0, 0.001) 9 0.200 0.54 1.03 2946 0.99 0 0 (0, 0) 8 0.003 0.09 1.01

Figure III.5: Relationships between UTD rates and proportion infected, median days to DFE, and probability of outbreak for each facility size group.

(27)

Figure III.6: Relationships between enrollment and proportion infected, median days to DFE, and probability of outbreak for each UTD rate group.

increase significantly as enrollment increases, with no indication of leveling off. One facility with almost 4000 children and a very low UTD rate took 117 days to reach DFE, the most drastic result seen in this simulation study. Probability of outbreak is fairly consistent as enrollment increases for all UTD rate groups, with the exception of facilities with more than 90% up to date. This is likely due to the school with around 3000 students that has an UTD rate of 99%, rather than the 94% that all other high UTD schools have. The Approximate R0 is always greater than the expected R0. All facilities in the highest UTD group have R0 close to 1, while facilities in the lowest UTD group have R0 above 1.5. Trends in approximate R0are flat for every UTD group as enrollment increases.

This has shown that while the deterministic model consistently has more extreme results at the peak of the outbreak than the median of the stochastic model results, the deterministic results fall within the 75th quartile of the stochastic results the majority of the time. This is true across a wide range of enrollment sizes. It is possible that this is an artifact of modeling an outbreak in populations that have a low up to date rate. This could also be true about trends seen in DFE and probabilities of outbreak. Further, enrollment seems to have little impact on any of the summary statistics or model characteristics. Results were fairly robust as enrollment was increased. Because of this, varying the up to date rate while holding enrollment constant also needs to be explored.

(28)

Table III.5: Results from stochastic and deterministic (det) models with varying up to date rates and enrollment held constant at 1,000. The models were run for 120 days. The deterministic result of proportion infected is shown. This is followed by the stochastic results: median proportion infected (MPI) with the 25th and 75th quantiles about the median, as well as the median number of days until disease-free equilibrium (DFE). All stochastic models reached DFE within 120 days. Simulations with up to date rates below 80% took over 90 days to reach DFE. E[R0] was calculated for each simulation based on starting configurations, while the approximate R0is estimated from the final state equation.

UTD Rate Prop. Inf. (Det.) MPI (IQR) DFE P[Outbreak] E[R0] Approx. R0

65.0 0.332 0.329 (0.001, 0.335) 91 0.724 3.15 1.67 70.0 0.274 0.268 (0, 0.277) 94 0.678 2.70 1.48 75.0 0.213 0.196 (0, 0.215) 94 0.628 2.25 1.32 76.0 0.200 0.174 (0, 0.201) 90 0.613 2.16 1.29 77.0 0.187 0.152 (0, 0.188) 87 0.605 2.07 1.26 78.0 0.173 0.009 (0, 0.173) 36 0.588 1.98 1.14 79.0 0.159 0.003 (0, 0.158) 26 0.582 1.89 1.13 80.0 0.144 0.003 (0, 0.141) 20 0.565 1.80 1.12 85.0 0.057 0.001 (0, 0.024) 13 0.471 1.35 1.08 90.0 0.005 0 (0, 0.003) 10 0.341 0.90 1.05 95.0 0.001 0 (0, 0.001) 9 0.160 0.45 1.03

III.1.2 Impact of UTD Rates

To examine the effect of increasing up to date rates on the proportion of infected students in a given population, enrollment was held at 1000 while up to date rates varied from 65% to 95% with incremental increases of 5%. The models were run over 120 days. Again, the IQRs were calculated about the median of the stochastic outcomes. Deterministic and stochastic results are overlaid in FigureIII.7.

(29)

a) UTD = 65% b) UTD = 70%

c) UTD = 75% d) UTD = 80%

e) UTD = 85% f) UTD = 90%

g) UTD = 95%

Figure III.7: Infected curves in terms of proportion infected for stochastic (black) and deterministic (red) simulations. The up to date rate was held constant at 65% while enrollment varied from 50 to 3050, increasing by increments of 500. 75th quartile confidence bands are shown for the median stochastic result; the 25th quartile was consistently 0. These results are scaled to the results seen with the lowest up to date rate of 65%.

(30)

Figure III.8: Median days to DFE, approximate R0, probability of outbreak, and median proportion infected as UTD rates increase from 65% to 95% while enrollment is held constant at 1000.

Figure III.9: Proportion infected after 120 days. The deterministic result (blue) and 75th quartile of the stochastic result (red) have very similar trends as UTD rates increase, while the median stochastic result (green) shows a sharp decrease when the UTD rate increases from 77% to 78%.

Unlike when enrollment was changed and up to date rates were held constant, the deterministic results more frequently fall outside of the 75th quartile as up to date rates increase from 65% to 95%.

(31)

When up to date rates are low (below 75%), the relationship between stochastic and deterministic results look similar to the enrollment study, and only the peak of the deterministic results fall outside of the 75th quartile. As up to date rates exceed 70%, the median of the stochastic model and the result of the deterministic model start to disagree more drastically. While both curves are flattening out as up to date rates increase, the peak of the deterministic model is increasingly lagging behind the peak of the stochastic model and greatly exceeding the 75th quartile.

Figure III.7makes it clear that UTD rates have a direct impact on simulation outcomes. The peak of the outbreaks are cut by more than half when the UTD rate is increased from 65% to 75%. The peaks drop from 0.025 to close to 0 when the UTD rate is increased from 75% to 80%. The decrease is slightly less drastic for the deterministic results but we see the same trend. The interquartile range from the stochastic model show that there still is the possibility of an outbreak when the up to date rate is 80%, with a peak occurring around day 45 for the stochastic model, and 55 for the deterministic model. This is only in the top 25th quartile of cases, however. The median falls to near 0 as the UTD rate reaches 80%.

The drastic difference between the deterministic results and the mean stochastic results seen in the graphs with 80 and 85% up to date rates displays a strength of the stochastic models. If the susceptible population is large enough and the conditions supplied by the parameters in the model, β and γ in these models, the deterministic model will always predict an outbreak. The stochastic model shows that both smaller and potentially larger outbreaks than what is seen in the deterministic result are possible; the 90th or 95th percentiles would need to be examined to confirm this. This also shows that it is possible that no outbreak occurs in the stochastic model when the deterministic model predicts that enough people will be infected for it to be considered an outbreak. Even if the conditions are able to produce an outbreak, one is not guaranteed to happen.

Low UTD rates cause the measles virus to burn through the susceptible population very quickly, as was discussed in the previous section. Populations with UTD rates between 70-80% are able to support the virus much longer. This is evident in TableIII.5, where simulations with UTD rates below 80% take more than 80 days to reach DFE. If UTD rates are 85% or higher, there is a better chance that major outbreaks will not happen, while UTD rates of 90% or higher completely prevent a major outbreak. Since the infectious period of measles is assumed to be 10 days, we would expect the median days to DFE to be around 10 if no outbreak occurs; we see this in the results in TableIII.5. FigureIII.7shows a graph of the DFE outcomes in TableIII.5. A major drop in DFE was seen between 75 and 85% UTD, so more simulations were run within this range. UTD rates were increased by increments of 1% in order to get a better idea of what is happening when the rate is increased from 75 to 80%. Despite adding

(32)

more data points in this range, we still see a major drop in DFE as UTD rates are increased from 77 to 78%. This is perhaps related to a topic touched on in van den Driessche (2017), where they mention how DFE is locally asymptotically stable (LAS) when R0< 1 but unstable when R0> 1. However, the change R0is seen at this point, both the expected R0 and the approximated R0, does not drop to near 1, so it is still unclear what is causing this sudden change.

FigureIII.9shows a graph of the proportion infected from the deterministic results, and the median proportion infected (MPI) and 75th quartile of the MPI in the same diagram. The green line of the MPI shows the same drop that was seen in the DFE graph. The deterministic result and 75th quartile have essentially the same outcomes up to UTD rates of 0.85. All results drop below 0.1 when the UTD rate increases to 0.85, but the deterministic result is higher than the 75th quartile of the median stochastic result. All results approach 0 as the UTD rates reach 90%. This shows that the deterministic result is more extreme than the median stochastic result, though still plausible under that modelling framework. If we refer back to FigureIII.6, we can examine the impact of UTD rates on trends from simulations using the Colorado data for starting configurations. All enrollment groups see a large drop in median days to DFE when UTD rates are around 85%. An increase in UTD rates always decreases the median days to DFE for all enrollment groups. We see the same pattern in R0, both the expected and approximate, as in the DFE graph. R0 is near 1 for all sizes of facilities with high UTD rates. Facilities with UTD rates below 70% had R0 greater than 1.5. We can see a direct correlation between R0 and UTD rates, which is intuitive due to how R0 was calculated using the final size equation (Equation 5). The MPI decreases by more than half each time the UTD rate is increased by around 10%. This is the trend for both stochastic and deterministic results. There are less drastic drops in the 75th quartile than the medians, however (III.4). The probability of outbreak drops below 0.50 for all facilities with high UTD rates. Smaller facilities did not approach 0 as UTD rates approached 1, which shows how small facilities are more certain to have outbreaks than larger facilities with similar UTD rates.

Through these simulations, it is apparent that UTD rates directly impact all characteristics of the SIR models, both stochastic and deterministic. The same trends are seen across all enrollment sizes as UTD rates are increased. These trends are all intuitive, however the drastic decreases seen between UTD rates of 77% and 78% remain unexplained.

(33)

CHAPTER IV MAPPING OUTCOMES

The locations of all Colorado schools and child care centers with at least 10 children in attendance were included in the Colorado data used in previous chapters. This allowed for limited spatial analysis and mapping of the data. Mapping points and adding a third dimension of color allows for viewing severity of outbreaks at each location across the state. Kriging can be used to generate heat maps to see where the hot-spots of severe outbreak risk are across the state. Kriging also allows for de-identification of schools, which may be advantageous depending on the audience.

IV.1 Scatter Plot Maps

Figures IV.1 and IV.2 show all of the data points mapped across Colorado. The color gradient shows the median proportion of infected individuals over 10,000 simulated 120-day outbreaks from the stochastic model in FigureIV.1. The proportion infected over 120 days from the deterministic model can be seen in FigureIV.2. Green points represent locations that had simulations where the proportion infected was below 0.25. Any color darker than that saw at least a quarter of their population be infected by measles at some point within the 120 day simulation. Similar trends can be seen in both maps. It is possible to identify areas that saw the worst simulation outcomes. There is a cloud of slightly darker than green points around the Denver metro area in North-Central Colorado, with very dark points above Denver in the Boulder and Fort Collins areas. There is a cloud of dark points in the Colorado Springs area in central Colorado. All of these locations are the most dense in the state, so a wide variety of outcomes, which has been shown in previous sections to be directly tied to up to date rates, is to be more expected in these areas. The rural location with the highest risk is near Durango, in the Southwest corner of the state. Multiple facilities saw at least 50% of their populations infected within the 120-day deterministic simulation.

IV.2 Ordinary Kriging

Kriging gives a more general overview of hot-spots across the state and de-identifies individual facil-ities. Kriging allows for predicting results in areas without data by looking at the nearest data points available. This analysis is based on the assumption that locations closer to each other are more likely to be similar than locations further from each other. The general formula for a kriging model is

ˆ Z(s0) = N X i=1 λiZ(si), (IV.1)

(34)

Figure IV.1: Results from stochastic model for each school or child care facility in Colorado. Results are in terms of median proportion infected over 120 days of an outbreak.

Figure IV.2: Results from deterministic model for each school or child care facility in Colorado. Results are in terms of proportion infected over 120 days of an outbreak.

(35)

where Z(si) is an observed value at the i’th location, λi is an unknown weight for the observed value at the i’th location, s0 is the location where a measurement will be predicted, and N is the number of observations (ArcGIS(2019a)). In ordinary kriging, the weight parameter, λiis dependent on a Gaussian process model, a type of stochastic process, that has been fitted to the observed data points, distance between observed data and locations where predictions will be made, and the spatial relationship between observed data near the prediction location.

A stochastic process is based on the function

S : Rd→ R (IV.2)

such that S(x) is a random variable (Diggle and Ribeiro (2007)). Proportion infected after 120 days will be the random variable used. A Gaussian process can then be described as having a random vector (S(x1), . . . , S(xn)), where the joint distribution of all x1, . . . , xn is multivariate Gaussian (Diggle and Ribeiro (2007)). This process is further defined by its mean, µ, and its variance, σ2 > 0 (Diggle and Ribeiro(2007)). A kriging model is an unknown function that is realized by a Gaussian process. First, a Gaussian vector is constructed with all observed data that has a mean function of 0, and a known covariance function (Diggle and Ribeiro(2007)).

Returning to equation IV.1, all of the Z(s1) to Z(si) make up a random field Z on Rd. We are assuming this field has a linear trend such that

Z(x) = f′

(x)β + ǫ(x); x ∈ Rd, (IV.3)

where f′

(x) = [f1(x), . . . , fP(x)] are P known functions and β = [β1, . . . , βP]′are coefficients for percent infected at each P location (Abrahamsen and Benth(2001)). The error term ǫ(x) is a Gaussian random field with mean of 0; this field has a known covariance function. A spherical covariance function was used for this model (Diggle and Ribeiro (2007)):

ρ(x) = 1 − 3 2  x φ  +1 2  x φ 3 0 < x ≤ φ (IV.4) ρ(x) = 0 x > φ

(36)

In addition to this, we assume that the coefficients P have a prior multinormal distribution:

β ∼ NP(µ0, Σ0) (IV.5)

(Abrahamsen and Benth(2001)). In order to make predictions for each location, we must calculate the posterior distribution of Z, the random field of these locations, given the exact data, Ze= ze, and the restriction that Zi∈ Bi, where Bi is any Borel set in Rd (Abrahamsen and Benth(2001)). A joint pdf can be used to estimate both the coefficients, β, and the random effects, Z.

f (z, β|ze, Bi ) = Z RP Z RK K(z, β; ˜z, ˜β)f (˜z, ˜β|ze, Bi )d˜zd ˜β (IV.6)

(Abrahamsen and Benth(2001)). This gives the best linear unbiased predictor at each desired location within the field of interest.

There is a two-step process for creating predictions from a kriging model. First, semivariograms and a covariance function are needed to estimate spatial autocorrelation. Second, predictions are made for locations with no observed data (ArcGIS(2019a)).

Semivariograms show spatial autocorrelation in data. Pairs of data points are analyzed to determine the average correlation between points that are approximately the same distance apart (ArcGIS(2019c)). N (N − 1)/2 pairs are made from the N data points, and Euclidean distances are measured between each pair that are separated by distance h (Mert and Dag(2017)):

h = |ui− uj| = q

(xi− xj)2+ (yi− yj)2+ (zi− zj)2; (IV.7) ui = (xi, yi, zi), uj = (xj, yj, zj).

The semivariances are calculated between observation pairs by

γ(h) = 1 2N (h)

X (i,j)∈N (h)

(Z(xi) − Z(xj))2 (IV.8)

(Mert and Dag(2017), ArcGIS (2019b)). Semivariances are calculated for each distance and are then plotted with distance on the x axis and semivariance on the y axis; the semivariogram is the plot of these points (Mert and Dag(2017)). Semivariograms are expected to level out as you move to the right of the x axis, which is measured in distance. Because of this, the semivariogram, or covariance, function is typically spherical (Mert and Dag(2017)), such as the function outlined inIV.4. This function fit to the semivariogram can be seen in FigureIV.4.

(37)

Figure IV.3: Example semivariogram displaying components of a semivariogram.

There are four key components to a semivariogram: the range, sill, partial sill, and nugget (ArcGIS (2019c)). Figure IV.3, taken fromArcGIS(2019c), shows how these components relate to the semivar-iogram. The range describes where the semivariogram will level out and has a slope approaching 0. If the distance between data points is less than the range, they are assumed to be spatially correlated. If they are equal to or further than the range, then there is minimal or no spatial correlation. The value on the y axis, measuring correlation, where the range is reached is called the sill. The distance between 0 on the y axis and the minimum non-0 y-axis value is the nugget. This estimates measurement error or variability within small distances (Mert and Dag(2017)). The partial sill can be calculated by subtracting the nugget from the sill. All of these values are informed by examining the scatter plot of distances against semivariances before the covariance function can be fit. Figure IV.4shows that there is not much spatial correlation between any of the data. This implies that any given region of the state is not more likely to have similar values across it; some schools near each other may have drastically different simulation outcomes. A semivariogram with stronger spatial correlation would have points that fit the curved line much better than how the points in FigureIV.4do. "Fitted-by-eye" parameter values and the sample semivariogram have been used to fit the curve inIV.4(Diggle and Ribeiro (2007)).

Once a covariance function has been fit to the semivariogram, predictions can be made for locations that are missing data. The kriging model calculates weights for areas without data based on surrounding observations (ArcGIS(2019a)). Data closest to the location of prediction have higher weights. Prediction locations are determined by creating a grid of the surface and cell size is manually chosen. Any cell

(38)

Figure IV.4: Semivariogram using Colorado school data, which was used to create a map with hotspots generated from Kriging.

Figure IV.5: Deterministic results showing the proportion infected for each Colorado school (left panel) and all facilities (right panel). Enrollment and up to date rates for each school or child care facility in Colorado was used into inform the parameters of the model. Kriging was used to map results from 120-day simulations.

Figure IV.6: Stochastic results showing median proportion infected for each Colorado school (left panel) and all facilities (right panel). Enrollment and up to date rates for each school or child care facility in Colorado was used into inform the parameters of the model. Kriging was used to map results from 120-day simulations.

(39)

without data will be locations for predictions. Ordinary kriging was used to make predictions, which assumes that there is a constant, unknown mean across the entire grid (ArcGIS(2019a)).

There are slight differences in the maps in FigureIV.5, which show results from the 120-day deter-ministic simulations. The plot on the left shows results from all Colorado schools, while the plot on the right includes both schools and child care centers. There are slightly more pink areas across if the map if it is closely examined, but there is not a stark difference between the two. Both maps have similar hot spots. FigureIV.6shows the results from the stochastic models. The map on the left again shows results from only simulations of Colorado schools, and this map is quite dark compared to those in FigureIV.5 and the second map in FigureIV.6. According to the stochastic results, most Colorado schools have a median proportion infected between 0.06 and 0.10. Only one major hot-spot appears down near Durango. There is one school in that are with quite large enrollment and very low UTD rates. Based on what was seen in the previous section, low rates in a large population guarantee a high median proportion infected. This one school had a result drastic enough to influence that whole zone on the map. No other schools had enough influence to create hot-spots on the map of the stochastic results, even if there were results with relatively high stochastic results. The map with stochastic results that include child care centers, however, is quite bright with most areas showing median proportion infected between 0.08 and 0.15. Again, we saw in the previous section that having very low enrollment allows for more volatile results, especially when using proportions. Since all child care centers fell in the lowest quartile of enrollment of the Colorado data, we see more drastic stochastic results. There are also child care centers in areas where there are not schools, so this allows for areas of the map to be informed by data, rather than defaulting to the intercept value of the Kriging model used to generate these maps.

(40)

CHAPTER V CONCLUSIONS

V.1 Summary of Thesis

The simulation studies in this thesis have demonstrated the utility of two types of compartmental SIR models: deterministic, which are purely mathematical and give only one solution, and stochastic, which introduce random chance into two parts of the model and allow for a measure of uncertainty. Deterministic models are a quick tool for generating some estimate of outbreak risk, though due to the assumptions required for simple SIR models, they may not be suitable for small populations.“Small" populations refer to schools and child care centers, which is why stochastic models can be more useful. Data on enrollment and up to date rates for the MMR vaccine is publicly available for all schools and child care centers in Colorado with 10 or more children enrolled. These data were used as starting configurations for both types of models. These results were analyzed as well as mapped using scatterplots and Kriging models.

Section 1 explored deterministic models, and showed that up to date rates were the main influence on outbreak results. Smaller schools experienced larger percentages of the population being infected than larger schools, even though all quartiles of enrollment had percent infected approach 0 as up to date rates approached 1. There did not appear to be a strong difference between urban, rural, or frontier schools, except that frontier schools are typically smaller and therefore saw slightly more extreme outbreak simulations. Outcomes for private schools and child care centers looked similar due to the correlation between small enrollment and higher percent infected.

Section 2 examined the influence that enrollment and up to date rates have on median proportion infected, median days to disease-free equilibrium, the probability of an outbreak, and the approximate R0 of the simulation. Section 3.1 specifically analyzed the effect of enrollment and it was apparent that simulation results were very robust to changes to enrollment. Enrollment only impacted median days to disease-free equilibrium. Larger populations are able to sustain the measles virus for longer, therefore extending the number of days until there is no longer a single infected person in the population. This did not fully level out over the enrollment sizes used, so it is unclear at which population size disease-free equilibrium will become robust top population increases. These trends were also seen when applied to 15 schools and child care centers from the Colorado data set. The slopes of trends were consistent across up to date groups as enrollment increased, again with the exception of median days to disease-free equilib-rium. Section 3.2 specifically analyzed the effect of up to date rates and this was clearly a major influence

(41)

all model characteristics: median days to disease-free equilibrium, the probability of outbreak, median proportion infected, and the approximate R0of the simulation. This is all intuitive based on knowledge of how herd or community immunity works to prevent outbreaks. The one unintuitive result was the major drop seen in proportion infected, days to disease-free equilibrium, and the approximate R0 when UTD rates were increased from 77% to 78%. It may be due to disease-free equilibrium not being locally asymptotically stable when R0 is above 1, as proposed by van den Driessche, however, the R0 at 78% is still not close to 1. The 75th quartile of the proportion infected does not see this drop, though, so this implies that at this point suddenly a larger proportion of simulations experience 0 or near-0 infections. Again, it is unclear exactly why this has happened at this point.

Finally, Section 4 explored two different visualizations of the data mapped with Colorado county lines. There was not a noticeable difference in scatter plots of the deterministic and stochastic simulation results. Darker points indicating high proportions infected appeared in the same locations across the state. Kriging was used to infer outcomes across the state where no data were available. The two maps from deterministic results show no major difference between mapping only schools and including both schools and child care centers in the results. There was a significant difference between the stochastic results, however. When child care centers were excluded from the Kriging model, the majority of the state looks like the proportion infected was between only 0.05 and 0.10, with a major hot-spot around Durango, Colorado. When child care centers were included, more data points from small-population simulations, which were previously shown to have more volatile results than larger populations, causes many areas of the map to have predicted proportions infected between 0.10 and 0.18.

Up to date rates are the main factor influencing outbreak outcomes, which further enforces the need to increase vaccination rates in Colorado. With one of, if not the lowest, MMR vaccination rates in the country for incoming Kindergartners, the state has been extremely lucky that an infected person has not mixed with any people in the susceptible pockets across the state. Parents of today’s elementary-aged children were likely vaccinated themselves and therefore have no first-hand experience with measles, mumps, or rubella, and have been lax in vaccinating their children. Hopefully news and first hand experiences from the over 1,100 cases this year makes parents realize that the vaccine is necessary for protecting their children.

References

Related documents

Industrial Emissions Directive, supplemented by horizontal legislation (e.g., Framework Directives on Waste and Water, Emissions Trading System, etc) and guidance on operating

The EU exports of waste abroad have negative environmental and public health consequences in the countries of destination, while resources for the circular economy.. domestically

46 Konkreta exempel skulle kunna vara främjandeinsatser för affärsänglar/affärsängelnätverk, skapa arenor där aktörer från utbuds- och efterfrågesidan kan mötas eller

Uppgifter för detta centrum bör vara att (i) sprida kunskap om hur utvinning av metaller och mineral påverkar hållbarhetsmål, (ii) att engagera sig i internationella initiativ som

Regioner med en omfattande varuproduktion hade också en tydlig tendens att ha den starkaste nedgången i bruttoregionproduktionen (BRP) under krisåret 2009. De

a) Inom den regionala utvecklingen betonas allt oftare betydelsen av de kvalitativa faktorerna och kunnandet. En kvalitativ faktor är samarbetet mellan de olika

Närmare 90 procent av de statliga medlen (intäkter och utgifter) för näringslivets klimatomställning går till generella styrmedel, det vill säga styrmedel som påverkar

• Utbildningsnivåerna i Sveriges FA-regioner varierar kraftigt. I Stockholm har 46 procent av de sysselsatta eftergymnasial utbildning, medan samma andel i Dorotea endast