Modeling telemedicine customer arrivals

(1)

Modeling telemedicine

customer arrivals

ARON STRANDBERG

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

(2)

(3)

customer arrivals

ARON STRANDBERG

Master in Computer Science Date: April 17, 2021

Supervisor: Viktoria Fodor Examiner: György Dán

School of Electrical Engineering and Computer Science Host company: KRY International AB

(4)

Abstract

This project investigates whether customer arrivals to digital healthcare providers can be appropriately modeled by non-homogeneous Poisson processes (NHPPs).

Data from a large Swedish telemedicine provider was examined, with the goal of confirming that the data exhibited the two required properties of Pois- son processes: customer interarrival times being (i) exponentially distributed, and (ii) independently and identically distributed. The arrival process was modeled as a piecewise constant arrival function, with several interval lengths considered. Two statistical tests were performed, the Kolmogorov-Smirnov and Brock-Dechert-Scheinkman tests, to confirm whether the two properties held.

The experiments showed that a majority of intervals exhibited both properties, suggesting that NHPPs are a useful approach for modeling telemedicine customer arrival data.

(5)

Sammanfattning

Detta arbete undersöker kundankomster till digitala vårdgivare, och huruvida dessa kan modelleras med en inhomogen Poissonprocess (NHPP).

Data från en stor svensk digital vårdgivare har undersökts, med målet att bekräfta om datan uppvisar de två egenskaperna som krävs av en Poisson- process: att tiden mellan ankomster är (i) exponentialfördelade, och (ii) obe- roende och likafördelade. Markovprocessen som beskriver ankomsterna har beskrivits som en styckvis konstant funktion. Två statistiska test har genom- förts, Kolmogorov-Smirnov och Brock-Dechert-Scheinkman. Enligt resulta- ten uppvisade en majoritet av datan de två egenskaperna, vilket antyder att inhomogena Poissonprocesser är en användbar modell för ankomster till digitala vårdgivare.

(6)

1 Introduction 1

1.1 Telemedicine . . . 1

1.2 Modeling Telemedicine Systems . . . 2

1.3 Problem statement . . . 3

1.4 Research method . . . 3

1.5 Kry . . . 4

1.6 Ethical considerations . . . 4

2 Background 5 2.1 Queueing Theory . . . 5

2.2 Queue Model . . . 5

2.2.1 Model Parameters . . . 6

2.2.2 Kendall’s Notation . . . 7

2.2.3 Common Distributions . . . 7

2.2.4 Service Disciplines . . . 7

2.2.5 Customer behaviour . . . 8

2.3 Arrival Processes . . . 8

2.3.1 Stochastic Processes . . . 9

2.3.2 Counting Processes . . . 9

2.3.3 Time-dependent Arrival Processes . . . 9

2.3.4 Poisson Processes . . . 10

2.3.5 Non-homogeneous Poisson Processes . . . 11

2.4 Arrival Process Modeling . . . 11

2.5 Rate Functions . . . 11

2.5.1 Parametric models . . . 12

2.5.2 Nonparametric models . . . 13

2.5.3 Semiparametric models . . . 15

2.6 Parameter Estimation . . . 16

2.7 Poisson Properties . . . 16

v

(7)

2.7.1 Poisson distribution . . . 16

2.7.2 Independent arrivals . . . 17

3 Related Work 19 3.1 Telemedicine . . . 19

3.2 Call Centers . . . 19

3.3 Hospitals . . . 21

4 Arrival Process Modeling in Telemedicine 22 4.1 Telemedicine Arrival Data . . . 22

4.1.1 Unrounding . . . 23

4.2 Notation . . . 23

4.3 Arrival Process . . . 24

4.4 The Piecewise Constant Model . . . 25

4.4.1 Motivation . . . 25

4.4.2 Interval Selection . . . 26

4.5 Verifying the Poisson assumption . . . 26

4.5.1 Exponentiality of interarrival times . . . 27

4.5.2 Independence of interarrival times . . . 27

4.6 Parameter Estimation . . . 28

4.6.1 Finding the maximum likelihood estimator . . . 28

4.6.2 Simulation . . . 29

4.6.3 Technical Dependencies . . . 30

4.7 Results . . . 30

4.7.1 Interval size . . . 30

4.7.2 Poisson properties . . . 33

4.7.3 Parameters and simulation . . . 35

5 Discussion 39 5.1 Conclusions . . . 39

5.2 Future work . . . 40

5.2.1 Additional data analysis . . . 40

5.2.2 Other models . . . 40

5.2.3 Simulation tools . . . 40

Bibliography 41

(8)

Introduction

Arrival processes in health care have been studied extensively. However, the introduction of telemedicine afforded by technological advancements in mobile devices means that health care is now available at all hours, at any physical location, making new research needed.

1.1 Telemedicine

The notion of telemedicine first began with the invention of the telephone in the late 19th century [1], but did not reach mainstream adoption until the late 20th century. Rapid advances in technology and telecommunications, perhaps most prominently the introduction and widespread adoption of the internet, has radically transformed society. Almost every aspect of everyday life has been impacted, including health care.

The word telemedicine stems from the Greek word tele, for “distance”, and the Latin word medic¯ına, meaning “remedy” or the “practice of healing” [1].

The World Health Organization in 1998 [2] defined telemedicine as:

“The delivery of healthcare services, where distance is a critical factor, by all healthcare professionals using information and com- munication technologies for the exchange of valid information for diagnosis, treatment and prevention of disease and injuries, research and evaluation, and for the continuing education of healthcare providers, all in the interest of advancing the health of indi- viduals and their communities.”

The matter of administering health services using telecommunication takes a variety of shapes, ranging from the mundane in electronic medical records

1

(9)

to the unconventional in performing surgery remotely using robotics. Other examples include pre-written articles on websites such as WebMD [3] and speaking to a nurse over the phone, like in the case of for example the Swedish service 1177 [4].

The introduction of smartphones and high-speed mobile internet has made real-time video transfer a staple in everyday life. This led to the possibility of providing telemedicine in the form of remote doctor consultations over a video call, opening a new segment in the field. Examining the patients over video gives physicians more input to aid in the diagnosis, such as the ability to examine skin rashes and wounds. The market for providing telemedecine consultations is increasingly crowded, and is growing rapidly in terms of number of competitors, users, and consultations [5].

The most apparent improvement offered by telemedicine is improved medical efficiency. Telemedicine is especially helpful in remote areas, where the nearest medical clinic may be hours away. When seeking physical care, patients routinely have to travel long distances to receive basic medical advice, and spend significant periods in the waiting room, whereas in a digital consul- tation, the patient can be essentially anywhere, and perform other tasks while waiting for their turn, without having to physically be in the queue. Conversely, there are many who are unable to visit hospitals, most prominently among the elderly population. These patients have to be visited by traveling physicians who spend a large part of the working day just driving between patients.

Eliminating these periods of travel time results in improvements in terms of efficiency, meaning a larger number of patients can be helped in same time frame.

1.2 Modeling Telemedicine Systems

A central problem in telemedicine is to ensure there is an appropriate number of physicians working at any given time — if there are too few, waiting times may exceed desirable levels; if there are too many, needless resources will be spent that could have been used elsewhere.

Since staffing personnel must generally be done in advance, there is a need to forecast the number of physicians required for some time into the future.

This in turn requires predicting several different factors, such as the number of patients that will require care, the time required to serve each patient, and which types of expertise the physicians will require to handle the various ail- ments patients might seek care for.

We begin by considering the arrivals in terms of the number of patients

(10)

arriving, and what times they arrive. This is done by modeling the incoming patients as a Markovian arrival process.

1.3 Problem statement

The central problem this thesis considers is modeling arrivals in a setting in which the arrival pattern varies strongly over time. Several other areas feature similar time-varying arrival patterns, the most apparent being arrivals in other health care settings, such as patients arriving to a hospital emergency room.

Other examples include weather phenomena and customer support call centers, which both show time-varying features at differing time scales — snow storms are significantly more common in the winter, and call centers receive far fewer calls during the night.

Previous research has shown success in using non-homogeneous Poisson process to model these time-varying arrival patterns in other fields (see Section 3 for a summary). This thesis investigates whether previous results will also apply to telemedicine, with the specific research question:

Are non-homogeneous Poisson processes well-suited to modeling time-varying arrival processes in telemedicine?

Based on the similarities between telemedicine and call centers providing customer support, the hypothesis is that previous results in using non-homogeneous Poisson processes will be applicable.

1.4 Research method

The research begins by constructing a data set based on arrival data collected by a telemedicine provider, grouped in equally sized intervals, and structured to provide both arrival counts and arrival times. Three interval lengths are investigated. A model based on a non-homogeneous Poisson process is described, and two statistical tests are performed to investigate whether a non- homogeneous Poisson process is applicable. Finally, the parameters of the model are estimated. Specifically, the research will consist of the following steps:

• Researching different modeling techniques

• Selecting a suitable model

(11)

• Collecting a data set of arrivals

• Researching statistical tests to examine the Poisson properties

• Performing the statistical tests

• Constructing and estimating the parameters of the model

1.5 Kry

Kry is a Swedish health care company providing telemedicine services in Swe- den and selected European countries. The data set considered in this thesis has been provided by Kry, and the company has also provided a supervisor to ad- vise the student throughout the project.

1.6 Ethical considerations

Whenever working with personal data, especially in the context of health care, certain considerations have to be made. While the data used in this thesis has been generated by patient meetings, it contains no personal information and the person is not identifiable.

(12)

Background

This chapter contains the theoretical background behind the concepts discussed in the thesis.

2.1 Queueing Theory

Queueing theory concerns the statistical modeling of queueing systems and their behavior. While often considered a subset of operations research, stem- ming from its primary use case of increasing efficiency of business operations, the area itself is based on probability theory and stochastic processes.

Users of queueing theory aim to get insights which will enable them to lower customers’ waiting times or reduce the number of servers required to service a queue. For customers, reducing the time spent queueing frees up time for other uses. For businesses, finding the optimal number of servers required to service a queue while maintaining the same level of waiting times can help them reduce unnecessary spending.

Queueing theory was invented in the early 20th century by Danish engineer Agner Krarup Erlang, who formulated his earliest work on the subject in [6].

It was originally developed to study telephone exchanges, but it has since been used to model queues across a wide range of industries, such as hospitals and road traffic engineering.

2.2 Queue Model

The two fundamental units constituting a queue are customers, waiting to be helped, and servers, aiding the customers [7]. Although these terms can be

5

(13)

used in a general manner, we will adopt a few context-specific names within the thesis.

Patient Customer Physician Server

Meeting The service of a patient by a physician

2.2.1 Model Parameters

When constructing a model of the queue, there are a number of standard parameters [7, 8]:

Arrival Process The arrival process, denoted by A(·) models the pattern of customers arriving to the queue in terms of the number of customers arriving and at what times they arrive. This is formulated as a probability distribution of the time between subsequent arrivals, known is interarrival times.

The arrival process is most frequently modeled as a Poisson process, in which the time between arrivals will be exponentially distributed. Other distributions include D, in which the time between arrivals is deterministic, and Ek, the Erlang distribution.

Service Distribution The service time distribution, denoted by S(·), is a description of the duration of the time taken to service each customer. Like the arrival process, this is described using a probability distribution. The service time distribution is most commonly assumed to be exponential, but can also modeled as having deterministic and Erlang distributions.

Number of servers The number of servers servicing the queue, denoted by c. This is usually described as a fixed number, possibly varying over time.

Queue Size The number of places in the queue, also known as the buffer.

Denoted K, it is described by a fixed number or infinity. A queue with a buffer of size 0 is known as a loss system.

Population The source of possible customers, N. May be a finite set in some contexts, but often the population is simply considered infinite. In a school cafeteria, the population might be the students and teachers at the school, whereas the possible visitors to a web page is everyone with access to the

(14)

internet — while finite, the number is so large it is safe to assume it infinite, which will simplify the mathematics. The population size is primarily of interest when it is small, at which point the arrival rate might vary based on the number of customers currently in the queue (since there might not be enough customers in the population to enter the queue, which will then lower the rate).

Queue Discipline The discipline, D, describes how to determine the next customer to receive service. Common examples are described in Section 2.2.4.

2.2.2 Kendall’s Notation

Queue models are often described using these parameters, in a notation described by Kendall in [9], in the form A/S/c/K/N/D. A more common form is A/S/c, where the last three parameters are omitted. In this case, the sizes of the buffer and the population are assumed to be infinite, and customers are served by the order of arrival. An example is the M/D/5 queue, which features random arrivals but deterministic service times, with 5 servers.

2.2.3 Common Distributions

Examples of probability distributions commonly used for the arrival process and service distributions are:

M Exponential distribution

D Degenerate distribution (interarrival or service times are deterministic) E_k Erlang-k distribution (the sum of k exponential variables)

G General distribution. This is not a distribution itself, but an umbrella term used to study general behaviors which do not depend on a specific distribution.

2.2.4 Service Disciplines

When studying queueing systems with human customers, service is predomi- nantly given in the order in which customers arrived. There are however several different disciplines which become more common when studying queue models which deal with non-human customers, such as processor scheduling policies.

(15)

First come, first served (FCFS) The queue is served sequentially in order of arrival time.

Priority service (PQ) Customers are divided into groups of importance, and the most important one is served first. The customers within a priority level will then be served according to some other discipline, usually FCFS.

There is also another secondary strategy on how to deal with higher-priority service arrivals while serving another customer. In some cases, the service is interrupted and the lower-priority customers is returned to the queue, or the server may finish helping that customer and the newly arrived customer will be served by the next available server. One common strategy seen in e.g.

airline baggage handling desks is to have a separate queue which only serves high-priority customers.

Random service (RS) Customers are chosen randomly. This strategy for example occurs in operating systems, to decide process execution order.

2.2.5 Customer behaviour

When constructing queue models that feature real customers, extra care needs to be taken to model human behavior, specifically concerning large waiting times. These behaviors are characterized by these terms:

Balking Deciding to not join the queue

Reneging Leaving the queue before receiving service Jockeying Switching to another queue

Arriving customers who balk, renege, or who are unable to join a queue with limited queue size are denoted as dropouts, and the dropout rate may have a significant effect on the queue behavior.

2.3 Arrival Processes

The arrival process is one of the fundamental elements of a queueing model, describing the flow of customers arriving to the queue. To account for the uncertainty in both the number of arrivals and arrival times, this is modeled as a stochastic process [10]. The arrival process measures at what times customers arrive to the queue, denoted by {t^k | k 2 Z⁺}. The arrival times are strictly ordered, that is t1 < t2 < . . . < tn. The arrival times depend on one parameter, the average rate of arrivals, denoted by .

(16)

The interarrival times are defined by the time elapsed between two subsequent arrivals, denoted by the random variables {⌧k = tk tk 1 | k 2 Z+} [8]. The arrival process is usually expressed as the probability distribution of the interarrival time, A(t) = P (⌧  t) [8].

The most commonly used distribution used for interarrival times is the exponential distribution, which is expressed as A(t) = 1 e ^t, > 0. An arrival process with this distribution constitutes a Poisson process [10], de- scribed further in Section 2.3.4.

2.3.1 Stochastic Processes

A stochastic process is a mathematical object constructed by a family of random variables, indexed by some set T , written as {X(t) : t 2 T}. While the index set T can be any set, in many cases it is viewed as time. The variable X(t)will then represent a value at time t [11].

2.3.2 Counting Processes

With the index set T representing time, and the variable X(t) representing the cumulative number of arrivals at time t, the stochastic process {X(t) : t 2 T}

can be defined as a counting process. The counting process is denoted as {N(t), t 0}, where N(t) is the number of arrivals in the interval (0, t].

Allen [11] gives the following definition of a counting process. {N(t), t 0}

is a counting process if:

1. N(0) = 0

2. N(t) assumes only nonnegative integer values 3. s < t =) N(s)  N(t)

4. N(t) N(s) is the number of arrivals in the interval (s, t]

2.3.3 Time-dependent Arrival Processes

In general, arrival processes assume the rate of arrivals is constant over time, an assumption that does not hold in many applications [12]. As an example, a restaurant may see a spike in customer arrivals during lunch time each day, but the rate of arrivals may also increase over time as the restaurant becomes more well-known. These varying behaviors of the arrival rate can be categorized into these two types:

(17)

Cyclic The rate varies over time according to a repeating pattern, such as over the time of day, known as periodicities

Trend The rate exhibits a long-term trend, such as slowly increasing as the size of a company grows

2.3.4 Poisson Processes

Arrival processes are often modeled with exponential interarrival times, which results in several desirable features [10]. When arrival times are exponentially distributed, the number of arrivals in the interval (0, t] will follow a Poisson distribution, after which the Poisson process is named. The Poisson process is a special case of a birth-death process, where the number of deaths is always 0.

Birth-death processes are themselves special cases of continous-time Markov processes.

The central parameter is the arrival rate of the process, denoted as . The exponential distribution of the interarrival times has parameter , whereas the Poisson distribution of the number of arrivals has parameter t, where t as above is the length of the interval. The probability of there being n arrivals in the interval (0, t] is thus (1) [13].

P{N(t) = n} = e ^t( t)ⁿ

n! (1)

Another property of the exponential distribution is the memoryless property [10], referring to the probability of a certain outcome not depending on the current state of the process. Formally, this is defined as the equality of the two probabilities: P (X > s + t | X > s) = P(X > t). Consider an example described in terms of waiting time. The memoryless property states:

After already waiting 30 seconds, the probability of having to wait another 10 seconds is equal to the probability of having to wait 10 seconds in the first place.

The proof of this is shown in (2), where T is a random variable indicating waiting time [10, 13].

P (T > s + t| T > s) = P (T > s + t\ T > s) P (T > s)

= P (T > s + t)

P (T > s) = e ^(s+t)

e ^s = e ^t= P (T > t)

(2)

(18)

2.3.5 Non-homogeneous Poisson Processes

To model time-dependent arrival processes, the concept of non-homogeneous Poisson processes (NHPPs) was introduced. NHPPs are a generalization of the Poisson processes, where the arrival rate varies over time. By modifying the constant rate parameter into a (non-negative and integrable) function of time known as the arrival rate function, (t), this behavior can be expressed.

The probability of n arrivals in the interval (a, b] is then (3), where ⇤(a, b) is the cumulative arrival rate function, describing the number of arrivals in the interval (a, b] with arrival function (t), defined as (4).

P{N(a, b] = n} = e ^⇤(a,b)(⇤(a, b))ⁿ

n! (3)

⇤(a, b)⌘ E{N(a, b]} = Z b

a

(t) dt (4)

The first equality expresses the relation between the cumulative arrival function and the expected value of the counting process. The counting process properties of the NHPP {N(t), t 0} is changed from what what was described in Section 2.3.2 to the following [12, 14]:

1. N(0) = 0

2. {N(t + s) N(t), s 0} is independent of {N(u), 0  u < t}

3. P {N(t + h) N(t) = 1} = (t)h + o(h) 4. P {N(t + h) N(t) 2} = o(h),

where o(h) is the little-o notation as h ! 0.

2.4 Arrival Process Modeling

2.5 Rate Functions

One of the central problems in queueing theory is modeling the arrival rate function. An overview of different methods used to model rate functions for NHPPs is provided in [12], divided into three categories.

(19)

2.5.1 Parametric models

Parametric methods attempt to approximate the arrival rate (t) with a function on a specific form, modeled to capture various behaviors observed in arrival patterns. Commonly proposed rate functions include a mixture of exponential, polynomial, and trigonometric components. Polynomial components are well-suited to model long-term trends, the exponential ensures the arrival rate is positive at all times [15], while the addition of trigonometric features is intended to capture periodic behaviors such as daily or weekly cycles. The parameters of these function are are then estimated from observed arrival data.

Exponential-polynomial Early parametric models used exponential-polynomial functions to approximate rate functions. The rate function (5) introduced in [16] is the simplest of these exponential-polynomial rate functions. The family is also known as log-linear rate functions, since taking the logarithm of the function yields a linear combination of the parameters ↵i. Rate functions on this form have been covered extensively in the literature, see e.g. [17] [15].

(t) = exp(↵0 + ↵1t) (5)

[18] considers the exponential-polynomial function (6) to approximate (t), and notes that “any continuous rate function can be approximated arbitrarily closely by an exponential polynomial”, referencing the Weierstrass approximation theorem:

Weierstrass Approximation Theorem. Suppose f is a continuous real-valued function defined on the real interval [a, b]. For every ✏ > 0, there exists a poly- nomial p such that for all x in [a, b], we have |f(x) p(x)| < ✏, or equivalently, the supremum norm kf pk < ✏.

(t) = exp Xr m=0

↵mt^m (6)

Exponential-polynomial-trigonometric An extension to the exponential poly- nomial models is to add a trigonometric component to represent the cyclic behavior often present in time-dependent arrival processes, for example the weekly cycle. [17] introduces another rate function on the form of (7) to represent these cyclic behaviors with frequency !.

(t) = exp⇣X²

i=0

↵itⁱ+ sin(!t + )⌘

(7)

(20)

Lee et al. [19] model the arrival of storms at an off-shore drilling plat- form using an NHPP with the similarly constructed exponential-polynomial- trigonometric rate function (8). They present a method for simulating the process using thinning, based on a theorem described in [20].

(t) = exp⇣X^m

i=0

↵itⁱ+ sin(!t + )⌘

(8) Thinning is a method of generating a new realization of a stochastic process based on a previously obtained realization. First, the rate function (t) is approximated by a majorizing function ˜(t), meaning ˜(t) (t)for all t.

The authors recommend a piecewise linear structure for the majorizing function.

Second, the majorizing function is used to generate a new set of arrival epochs. This set of arrival times, { ˜An}, is thinned by randomly removing a subset of arrivals. Each arrival time is removed with the probability 1

( ˜An)/˜( ˜An). The remaining set of arrival times will then be a new realization of the same stochastic process, with the same original rate function (t), and over the same interval.

EPTMP [21] continues on their work in [22], proposing an arrival process with an exponential-polynomial-trigonometric rate function with multiple periodicities (EPTMP). The form of the EMTMP-type rate function is shown in equation (9).

(t) = exp⇣X^m

i=0

↵itⁱ+ Xp k=1

ksin(!kt + k)⌘

(9)

2.5.2 Nonparametric models

Nonparametric methods do not make any particular assumptions about the form of the arrival rate function (t), instead fitting a function to the observed arrival data.

Piecewise constant models A simple way approximating (t) with a piece- wise constant function by dividing the period into subintervals (e.g. by hour), and assuming the rate is constant for these subintervals. See e.g. [23] [24] [25]

for usages of piecewise constant models. This method relies on choosing an appropriate size for the size of the intervals, and may not provide satisfactory

(21)

results if the rate varies strongly within the chosen subintervals. A way to reduce the somewhat arbitrary choice of subintervals is to use a change-detection algorithm to detect rapid changes in rate and/or variance, and choose subintervals dynamically, as demonstrated in e.g. [26]. Whitt [27] describes a piece- wise constant model, dubbing it average stationary approximation (ASA), and suggests setting the interval length to the mean service time.

An example of a piecewise constant model estimate, from Chen and Schmeiser [24]: divide the sample into k intervals, where the interval limits are ti, i = 1, 2, . . . , k. For a sample with ni arrivals in the interval i over m realizations, the piecewise rate function is then (10), where iis (11).

(t) = i for ti 1< t tⁱ (10)

i = ni

m(ti 1 ti) (11)

Piecewise linear models Assuming the arrival rate to be constant within the intervals has several advantages, but may result in a poor-fitting function in arrival processes with rapidly changing arrival rates. Another solution is to instead fit a linear function to each subinterval, which provides a better estimation of rate functions which continuously increase or decrease during the intervals. This approach has been described in e.g. [28] [29]. A linear function in each interval also allows the rate function to be continuous (although not differentiable) everywhere.

Leemis[30] instead proposes a piecewise linear estimator for the cumu- lative arrival rate function. This results in a piecewise constant arrival rate function, with the estimate given in (12) over the interval (0, S]. k is the number of realizations, ni, i = 1, 2, . . . , kis the number of arrivals in realization i, n = P^k_i=1ni is the total number of arrivals in an interval across all realizations, and ti, t2, . . . , tnare the order statistics of the superposition of the k realizations, with t0 = 0and tn+1 = S.

⇤(t) =ˆ in (n + 1)k+

✓ n(t ti) (n + 1)k(ti+1 ti)

◆

, ti < t ti+1; i = 0, 1, . . . , n (12) Piecewise polynomial models To counter to computational difficulty en- countered with exponential polynomial models, [31] proposes a piecewise polynomial estimation of the rate function.

(22)

2.5.3 Semiparametric models

Kuhl et al. [32] describe a nonparametric approach that involves fitting functions to the data at p different resolutions, where a resolution describes either a long-term trend or a cyclic behavior. The procedure begins at resolution R0, by estimating the long-term behaviour of the arrival process. The method then examines the arrival process at resolution Ri, i = 1, . . . , p 1, examining each cyclic component in turn, from the largest period to the smallest. For every resolution, a monotonically increasing function ˆR_i(t), for all t 2 (0, bi)is fitted, describing the ratio of arrivals in a cycle of length biexpected to have occured at time t. The functions are then combined to create the mean-value function estimate ˆµ(t), using (13). Let N(S) denote the number of observed arrivals at time S. The functions { ˆQi(t), i = 0, . . . , p} are defined iteratively using (14) and (15). The first iteration Qi=pis (14), and Qi is (15) for i = p 1, . . . , 0, where ji,t is the unique integer such that (j 1)bi  t < jbⁱ.

⇤(t) = N (S) ˆˆ Q0(t) for t 2 (0, S] (13)

Qˆp(t) = ˆRp(t (jp,t 1)bp) (14)

Qˆi(t) = ˆRi[(ji+1,t 1)bi+1 (ji,t 1)bi] +{ ˆR_i[j_i+1,tb_i+1 (j_i,t 1)b_i] Rˆi[(ji+1,t 1)bi+1 (ji,t 1)]} ˆQi+1(t)

(15)

Kuhl et al. implements this approach in [33], providing a ready-made pro- gram for users to fit a mean-value function to arrival data. The paper also includes results from a series of experiments with regards to the performance of this approach, concluding that it provides good results and showing that simulating realizations from a model fitted with this approach using the inver- sion method is significantly faster than using the method outlined in [21].

Kuhl et al. further develops on their work in [34] to be able to model arrival process that do not exhibit any cyclic behaviours. They propose the general form of polynomials of a sufficiently high degree r, where the estimation requires estimation of the parameters k. The cumulative arrival function takes the form of ⇤(t) = ⇤(S)R(t) 8t 2 (0, S], where R(t) is the proportion of

(23)

cumulative arrivals up to point t, taking the form of (16).

R(t) = 8>

><

>>

:

t/S if r = 1

Xr 1 k=1

k(t/S)^k+

✓ 1

Xr 1 k=1

k

◆

(t/S)^r if r > 1 (16)

2.6 Parameter Estimation

One part of modeling the arrival process is estimating the model’s parameters based on observed data. There are several methods for this task, the most prevalent of which is maximum likelihood estimation. The method is based on constructing a likelihood function L(✓). Intuitively, given the parameters

✓ = (✓₁, ✓₂, . . . , ✓_n)and the observed sample y = (y1, y₂, . . . , y_n), the function L(✓; y) gives the likelihood of the parameters ✓ yielding the sample y.

By maximizing the likelihood function, ˆ✓ = arg max L(✓; y), we find an estimate ˆ✓ of the parameters which best fit our sample. A more comprehensive description of the maximum likelihood method can be found in e.g. [35].

The usage of maximum likelihood estimation to estimate the parameters of an NHPP model has been considered extensively in the literature, over a range of problem domains, see e.g. [36, 21, 19, 37, 18, 29, 31, 38, 39].

2.7 Poisson Properties

To confirm whether the arrival process can be modeled as a Poisson process, two properties must be validated.

2.7.1 Poisson distribution

To form a Poisson process, the number of arrivals in any finite interval must follow a Poisson distribution. By definition, this also means that the interarrival times follow an exponential distribution. One of the most widely used tests to confirm that the observed sample is drawn from an exponential distribution is the Kolmogorov-Smirnov (KS) test [40, 41], used in e.g. [18, 31], and its modified variant known as the Lilliefors test. Due to its prevalence in related literature, this is the test used in this thesis.

Kim and Whitt present a method to divide arrival data into intervals, and combine data from several intervals using the conditional-uniform (CU) property, before applying the Kolmogorov-Smirnov test. They compare the CU

(24)

test to another method, where the CU transformation is followed by sorting the interarrival times [42]. [25] uses a Pearson’s ²test to investigate the same question.

Other similar tests such as the Shapiro-Wilk [43] and the Anderson-Darling [44] tests have been shown to have greater statistical power.

Kolomorov-Smirnov test

The KS test measures the maximum difference between the empirical distribution function of the sample and the cumulative distribution function for the distribution being tested. If the test statistic is larger than the corresponding critical value in the statistics table, the hypothesis that the sample comes from the distribution is rejected.

The test statistic is given in (17), where Fn(x)is the empirical distribution function for the samples Xi. Fn(x)is defined in (18). I[ 1,x](Xi) is the in- dicator function, returning 1 if Xi  x, and 0 otherwise. Intuitively, Fⁿ(x) measures the ratio of elements in the sample X less than x. F (x) is the cumulative distribution function. The parameters for F (x) are specified in a regular KS test, and estimated from the sample when using the Lilliefors version.

When parameters for the distribution must be estimated from the sample, the critical points in standard statistics tables used in the KS test are no longer valid — the probability of incorrectly rejecting the null hypothesis (type I er- ror) will be smaller than given in the tables. Lilliefors [45] provides a new statistics table based on simulations, although the same test statistic is used.

The Lilliefors test has been showed to outperform the KS test in terms of statistical power [46, 47].

Dn= sup

x |Fⁿ(x) F (x)| (17)

Fn(x) = 1 n

Xn i=1

I[ 1,x](Xi) (18)

2.7.2 Independent arrivals

If the arrival process is a Poisson process, the arrivals in disjoint intervals are also independent variables. This property is often overlooked in studies [26], but can be verified using the Brock-Dechert-Scheinkman (BDS) test [48].

The BDS test examines whether the process that has generated a certain time series sample is i.i.d. by measuring the distance between pairs of points.

(25)

In this context, a point is an interarrival time.

✏denotes a distance below which a set of points are considered to be close to each other. It can be specified in a number of ways, for example as a real number or in terms of the standard deviation of the sample. If the sample is i.i.d., the probability of the distance being less than or equal to ✏ is constant.

C1,n(✏)is the correlation integral in embedding dimension 1, meaning the probability of each pair of points in the set for a sample of n observations satisfying the ✏ condition.

Cm,n(✏)is the corresponding correlation integral for embedding dimension m. The embedding dimension refers to the number of consecutive points used to construct the set of pairs. The definition for the correlation integral Cm,n(✏) is omitted here, but can be found in [48]. If the sample is indeed i.i.d., the following condition will hold [48]:

C_m(✏) = C₁(✏)^m (19)

The test statistic is given in (20) and (21), where Vm,n(✏)is the variance of the sample. If the sample is i.i.d., as shown in [48], the test statistic will be close to zero and follow a normal distribution: Wm,n(✏)! N(0, 1).

Wm,n(✏) = p

nTm,n(✏)

V_m,n(✏) (20)

T_m,n(✏) = C_m,n(✏) C_1,n(✏)^m (21)

(26)

Related Work

This section examines a selection of previous work. First, the research into telemedicine is examined. Then, two examples are examined, telephone call centers and hospitals, where non-homogeneous Poisson processes have been used to model queueing systems with varying arrival rates.

There is significant work showing various approaches to model the arrival rate function in a non-homogenerous Poisson process. These are explored in greater detail in Chapter 2.4.

3.1 Telemedicine

Van Zyl and Van Dyk [49] provides an overview of the research in the intersection of operations research in the intersection of operations research and telemedicine. Their research showed that the number of scholarly articles on the use of operations research in telemedicine was astoundingly low. However, their study was conducted in 2011, and the findings may be outdated.

Their study considers operations research as a whole, and only a small section is dedicated to the field of queueing theory, considered in this thesis.

They examine the work of Tarakci et al. [50], in which they study a queueing system providing an optimal strategy for deciding which patients to treat via telemedicine and which to refer to physical care.

3.2 Call Centers

Modeling arrivals in call centres has been studied extensively. Here two papers are examined, one which itself reviews the literature and refers to several interesting papers, and one which performs a study of call center data.

19

(27)

Gans et al. [51] conducts a review of the literature concerning telephone call center operation, including queueing models, customer behavior, and statistical analysis, but also human resources problems like hiring and training workers. The paper begins with a description of how a typical call center op- erates, and calculations for the Erlang C queue model (M/M/c in Kendall’s notation).

They discuss the concept of skills-based routing. Skills-based routing concerns the matching problem between callers and servers with regards to what skills are required by the server to assist a specific customer. A simple example is a call center providing support in multiple languages. There must then be a strategy for how to hire, staff, and organize servers in order to be able to handle requests in all languages.

They then consider the problems of determining the parameters for the Erlang C model, namely the arrival rate and service duration. They review several modeling and prediction techniques for these parameters, such as the well-known ARIMA (autoregressive integrated moving average) models in a paper by Andrews and Cunningham [52], a nonparametric piecewise-linear model in a paper by Massey et al. [53], and the work of Brown et al. in predicting confidence intervals for the arrival rate and service times, together forming a confidence interval for the offered load [54].

Kim and Whitt [42] considers arrival data from from the telephone call center of a medium-sized American bank, receiving approximately 300,000 daily calls. The arrival data shows strongly time-varying behaviour, and the paper examines whether it is appropriate to use a non-homogeneous Poisson process model. The authors describe three common difficulties with modeling data with varying arrival rates:

“(i) data rounding, e.g., to seconds; (ii) choosing subintervals over which the rate varies too much; and (iii) overdispersion caused by combining data from fixed hours on a fixed day of the week over multiple weeks that do not have the same arrival rate.”

They show methods to resolve each of these problems. Using a piecewise- constant NHPP model, they perform a Kolmogorov-Smirnov statistical test, utilizing the conditional-uniform property to combine arrival data from several intervals, to confirm that the data adheres to the Poisson properties. The results show that a majority of days examined exhibit the Poisson properties, but that this depends on avoiding the three problems they have described.

(28)

3.3 Hospitals

Like call centers, hospitals exhibit significant variance in the rate of arrivals.

There has been extensive research into applying queueing models to hospitals and other physical healthcare systems (see [55] and [56] for a review of the literature).

Kim and Whitt [42], using the same methods described previously, also studied arrivals to the emergency department at a hospital in South Korea.

The department receives on average 138.5 arrivals per day, with the daily arrival rate varying noticably less than in the earlier call center example. The NHPP hypothesis was nearly never rejected, and the authors concluded that the hospital arrivals were also well modeled by an NHPP.

Pritsker et al. [22] details the construction of a model for determining poli- cies for deciding recipients of organ donations. The model is called ULAM (UNOS Liver Allocation Model), with UNOS in turn being the abbreviation for the United Network for Organ Sharing.

ULAM models the arrival processes of organ donors and recipients using an NHPP with a rate function that is exponential-polynomial-trigonometric with multipe periodicities (described further in Section 2.5.1).

(29)

Arrival Process Modeling

in Telemedicine

This chapter describes the experiments that has been performed in this thesis, the method that has been used, and the results of the experiments.

4.1 Telemedicine Arrival Data

This study uses telemedicine arrival data from the Swedish digital healthcare provider Kry. The data considers arrivals for patients in Sweden during the year of 2019, where the patient is an adult (at least 18 years of age), and the appointment was conducted by a physician (as opposed to a nurse or a psy- chologist). The data has been extracted from the company’s database in the form of a single file in CSV format, where each line contains a timestamp in the format 2019-01-01 12:00:00. Some minimal preprocessing has been performed by sorting the timestamps, and removing blank lines. The precision of the timestamp measures is only specified to the second. Since duplicates are allowed, the data is effectively structured in 1-second bins.

The dataset contains arrival data consisting of 365 realizations, each realization spanning 24 hours. The total number of arrivals across all realizations was 489245.

The figures 4.1 and 4.2 show two visualizations of the arrival rates in the data. Figure 4.1 shows the arrival rate for five Mondays in July 2019. The rate is calculated as the number of arrivals per hour, averaged over each minute.

Figure 4.2 shows the average arrival rate for all days in the data set, with the shaded area showing the standard deviation. The number of arrivals is averaged over 15-minute intervals.

22

(30)

Figure 4.1: Arrival rates for five days

4.1.1 Unrounding

Since the arrival times are only measured to the second, several arrivals are considered to happen simultaneously. However, this is expressly disallowed by the Poisson assumptions described in 2.3.4. To account for this, the arrival times are “unrounded”, as shown in [54, 42]. The unrounding is performed by adding a uniformly distributed random variable taking values in the interval [0, 1000), with the value representing a number of milliseconds. Note that this only applies when considering the arrival times themselves — the aggregated counts are not affected.

4.2 Notation

We use a notation inspired by [25], in turn inspired by [57]. We use L to denote interval length. We use d 2 D to index the days, where D = {1, . . . , D}, and D is the number of days in the data set (in our case, D = 365). For the intraday intervals, we analogously let i 2 I, where I = {1, . . . , I}, and I = ²⁴_L, the number of intervals in 24 hours. For weekdays, we let w 2 W, where W = {1, . . . , W}, and W = 7, the number of days in one week. Let A be the set of all arrival times. Let Ai be the subset of arrival times in interval

(31)

Figure 4.2: Average arrival rate per minute across the whole dataset (the shaded area shows the standard deviation)

i. Let Ad,i be the set of arrival times in interval i on day d. We then define Nd,i 2 N as |A^d,i|, the number of arrivals on day d in interval i. Let w(d) be a function which gives the numerical index w 2 W of the weekday that the date dfalls on. Let Cw,i be the set of arrival counts in interval i for all instances of weekday w, defined as: Cw,i ={N^d,i | w(d) = w}.

4.3 Arrival Process

Patients enter the queue by requesting an appointment via a mobile application.

Users may choose between the regular queue, known as drop-in bookings, or scheduling a meeting for a specific time. Only drop-in arrivals are considered in this thesis. Since scheduled meetings are only booked at “even” times, these meetings are unlikely to fulfil the Poisson properties. These meetings will require an alternative approach to modeling, and is considered out of scope for this thesis.

The service is open around the clock, meaning patients may arrive into the queue at any time. Arrivals are either placed into the queue, or start receiving service if the queue is empty. Customers reneging, in this context patients

(32)

who cancel a meeting before they have been served, are not considered in the model.

4.4 The Piecewise Constant Model

The model considered approximates the varying arrival rate function (t) with a piecewise constant function. The set of arrival data is collected into equal- sized, non-overlapping intervals of length L. On weekdays w, in the interval i, the arrival rate is assumed to be constant with parameter w,i during the entire interval. In each interval, the arrival process can then be considered a homogeneous Poisson process. The parameter w,i is estimated from the set Cw,i, that is, the set of arrival counts across all instances of the same interval on the same weekday. The estimation is described further in Section 4.6.

Verifying whether this approximation fulfils the Poisson conditions described in Section 2.3.4 and estimating these parameters is the main scope of this thesis. While the data exhibits a clear daily periodic cycle, the model makes no assumptions about this periodic behaviour in the arrival process.

The estimated arrival rate for the last interval in one day is not necessarily the same as for the first interval in the next day.

4.4.1 Motivation

Advantages The main advantage of the piecewise constant model is that is simple to work with. The approximation is intuitively easy to understand, and since no previous results have been found in the field of telemedicine, it is reasonable to begin the research using the simplest model.

Two of the main tasks, estimating the parameters (described in Section 4.6), and generating random realizations of the process, are both relatively straightforward under the piecewise constant assumption [24].

Drawbacks Assuming a piecewise constant arrival rate may not be reason- able for data with rapidly changing arrival rates. This is the case especially when the number of arrivals increases or decreases significantly inside the scope of a single interval. In these instances, the choice of interval length must be considered even more carefully. One approach to this problem is to employ a smoothing algorithm, as described by Chen and Schmeiser [24].

Given an estimated piecewise constant rate function, in order to decrease the difference between the estimated arrival rates in adjacent intervals where the real arrival rate is increasing sharply, their approach halves each interval,

(33)

providing a smoother, but still piecewise constant, curve. The method can then be used several times, providing an increasingly smoother and nearly continuous approximation of the arrival rate function.

Other models considered

A simple analysis of the arrival data shows rapidly changing arrival rates which would motivate a linear approximation. A piecewise linear model was considered, specifically the one described in [29]. However, the complexity involved made the implementation unfeasible within the time frame for this project.

4.4.2 Interval Selection

One of the main difficulties with this type of model is the strategy for choosing intervals. Possible strategies vary in the length, number, and positioning of the resulting intervals.

Whitt [27] suggests setting the interval length to the mean service time in the system. While the service time distribution is outside the scope of this thesis, the mean service time is approximately 15 minutes.

Kim and Whitt [42] first fit a piecewise linear function to the arrival data, using the distinct linear components as the intervals. They then divide each interval into a number of equal-length subintervals, assuming the rate is constant within each subinterval. As they note, there is a trade-off in the choice of interval size:

“Using shorter intervals makes the [piecewise constant] approximation more likely to be valid, but interarrival times are necessarily truncated at boundary points and any dependence in the process from one interval to the next is lost when combining data from subintervals, so we would prefer longer subintervals unless the PC approximation ceases to be appropriate.”

Another approach is to select subintervals dynamically, based on the data. This is shown in [26], where changes in the arrival rate are detected by observing large jumps in the mean and/or variance in the number of arrivals. These points are then considered the interval limits.

4.5 Verifying the Poisson assumption

The assumption that the Poisson conditions described in Section 2.3.4 — independence of arrivals and the Poisson distribution of the number of arrivals

(34)

— holds within each interval must be examined. In order to verify whether this is true, two statistical tests are performed.

In this section, we examine the sets Ad,i; that is, the set of arrival times grouped by interval. We also consider the corresponding set of interarrival times; that is, the pairwise subtraction of arrival times within the intervals.

Recall that, as described in Section 4.1, the arrival times are recorded only to the second, and that this is offset by adding a uniformly distributed random variable to emulate millisecond-precision unique arrival times. Intervals which contain less than 10 arrivals are considered too small for performing valid statistical tests, and are excluded.

4.5.1 Exponentiality of interarrival times

Kolmogorov-Smirnov test First, the distribution of interarrival times is in- vestigated, to confirm whether they are exponentially distributed. This is verified using the Kolmogorov-Smirnov (KS) test, as suggested by e.g. [54]. The KS test is chosen primarily for its widespread prevalence in similar studies and ease of implementation. The mechanics of the test are described briefly in Section 2.7.1.

Since the rate parameter must be estimated from the sample, we use a KS test with a modified statistics table known as the Lilliefors test [45]. The significance level is set to 5%.

Technical Implementation We have used the implementation of the Lil- liefors test provided by the Python package Statsmodels [58]. The statistics table of critical values for the Lilliefors distribution used in the test has been calculated based on 10,000,000 simulations.

4.5.2 Independence of interarrival times

BDS test To verify that the sequence of interarrival times are independent and identically distributed (i.i.d.), a Brock-Dechert-Scheinkman (BDS) test is used [48].

A short description of the BDS test is given in Section 2.7.2. For further details, the reader is referred to [48]. For the BDS test, the default parameters for the implementation are used — two embedding dimensions are considered, and the distance threshold ✏ is calculated by multiplying the standard deviation of the sample by 1.5. Again, a significance level of 5% is used.

(35)

Technical Implementation The BDS test implementation used is the one provided by Statsmodels [58].

4.6 Parameter Estimation

To fully describe the model, there is still a need to determine the parameters

w,i. The parameters are estimated based on arrival data, and the model can then be used to generate realizations of the non-homogeneous Poisson process. The parameters are estimated using the Maximum Likelihood Estima- tion (MLE) method.

The likelihood function L( ) describes the probability of seeing the arrival times present in the data, from an arrival process with the rate . In order to find the set of parameters w,i most likely to have given rise to the observed arrival data, the objective is to maximize the likelihood function.

When estimating the parameters w,i, we consider the the arrival data grouped based on weekday and intraday interval; that is, the sets Cw,i. Ev- ery interval, each of length L, then contains the following data points:

1. i, the intraday interval index 2. d, the date this period occurred on 3. w, the weekday this date occurred on 4. Nd,i, the number of arrivals in this period Technical Implementation

For calculating the estimate, the GenericLikelihoodModel method from the Statsmodels Python package [58] is used.

4.6.1 Finding the maximum likelihood estimator

Taboga [59] provides an explanation for how the maximum likelihood estimator ˆ is derived. The probability mass function of the Poisson distribution is given in equation (22).

PX(xi) = exp( 0) 1 xi!

xi

0 (22)

(36)

The likelihood function is expressed as the joint probability mass function:

L( ; x¹, . . . , xn) = Yn i=1

exp( ) 1 xi!

xj (23)

The logarithm of the likelihood function is generally simpler to work with.

Thus the log-likelihood function `( ) = log L( ) is used by applying the nat- ural logarithm to the likelihood function:

`( ) = Xn

i=1

xilog( ) log(xi!) (24)

The objective is to maximize `( ), that is, finding ˆ:

ˆ = arg max `( ; x1, . . . , xn) (25) The derivative of ` with respect to is:

d

d `( ) = n + Xn

i=1

xi (26)

By setting the value of the derivate to zero, _d^d`( ) = 0, the maximum likelihood estimator ˆ is the sample mean of the n observations.

ˆ = 1 n

Xn i=1

xi (27)

This maximum likelihood estimator is calculated for each weekday and interval. Using the notation described in Section 4.2, the estimations are the set of parameters { w,i | w 2 W, i 2 I}, where each parameter is the sample mean of all arrival counts in the set Cw,i.

w,i = 1

|Cw,i|

XCw,i (28)

4.6.2 Simulation

The estimated model parameters w,i for each weekday and interval are then used as the rate parameters in a simulation for each weekday. Each interval is simulated independently.

Recall that Poisson processes will have exponentially distributed interar- rrival times. Using a simple approch based on inverse transform sampling, a

(37)

new exponential variate t is generated by drawing a random number u from a uniform distribution in the interval (0, 1), and then calculating t as in (29).

The next arrival time is then calculated by adding t to the previous arrrival time.

Arrival times are drawn from this random procedure until the arrival time texceeds the end of the interval limit. The overflowing value of t is discarded.

The set of arrival times generated during all intervals during that weekday w constitute one realization of the NHPP.

t = log u

w,i (29)

4.6.3 Technical Dependencies

The project was implemented using the programming language Python, as well as the libraries SciPy and Statsmodels [58]. The calculations therefore depend on the accuracy and correctness of these tools. While they are reasonably well-established within the scientific community, no attempts have been made to verify that the functions within these libraries are correctly implemented.

They are as such a possible, if yet unlikely, source of implementation errors.

The Statsmodels project has verified most of their results using other statistical software [60].

4.7 Results

4.7.1 Interval size

In this work, equally sized fixed intervals has been used for convenience. The intervals are considered within the period of one day, and the interval length Lis given in minutes. For the intervals to be equally sized, the interval size must then divide the number of minutes in one day, 1440. Allowed sizes are as such:

1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 30, 32, 36, 40, 45, 48, 60, 72, 80, 90, 96, 120, 144, 160, 180, 240, 288, 360, 480, 720, 1440 Several interval sizes have been considered. For brevity, results are presented for three values of L: 30, 60, 90. In order to select one size, both statistical tests described in Section 4.5 were performed for each value of L. While all

(38)

three interval sizes showed largely similar results, the model performs slightly better when L = 60. This is the value considered in the results presented in Section 4.7.

Figure 4.3: Ratio of intervals passing both tests as a function of the interval time

Figure 4.3 shows the percentage of intervals passing both the KS test and the BDS test, as described in Section 4.5, as a function of the interval time, with a polynomial trendline of the second degree. There is a clear pattern for all three interval sizes, where the ratio of intervals passing decreases in the early morning and evenings. There is a relative lack of data points during the night hours — this corresponds to these intervals not being considered, due to containing too few arrivals. The model performs very similarly when L = 30 and L = 60, but performs slightly more consistently (better during the night,

(39)

but worse during the daytime) when L = 90. 365 days have been considered.

Figure 4.4: Ratio of intervals passing both tests as a function of the arrival rate Figure 4.4 shows the percentage of intervals passing both the KS test and the BDS test, as described in Section 4.5, as a function of the average arrival rate, with a polynomial trendline of the second degree. Again, the model be- haves similarly with all three interval sizes. The model performs best when the arrival rate is roughly equal to 1 arrival per minute, with the ratio of passing intervals dropping off sharply with higher arrival rates. 365 days have been considered.

(40)

4.7.2 Poisson properties

The two properties of the Poisson process, exponential interarrival times and independent arrivals, have been examined by performing two statistical tests, the Kolmogorov-Smirnov (with Lilliefors statistics table) (KS) test and Brock- Dechert-Scheinkman (BDS) test, respectively. The tests have been performed independently for each interval. Both tests have been performed at the 5%

significance level. The data set covers 489,245 arrivals.

Table 4.1: Tests of NHPP of telemedicine arrival data Interval category Intervals Fraction

n 6518 100.0%

Poisson 4289 65.8%

Exponential 809 12.4%

Independent 1285 19.7%

Neither 135 2.1%

Table 4.1 shows the results of both tests, using interval size L = 60. Inter- vals are presented in four categories: intervals are considered Poisson when passing both tests, independent but not exponential when passing only the BDS test, exponential but not independent when passing only the KS test, and neither when passing neither test.

The percentage of intervals showing both independent arrivals and exponential interarrival times was ⇠66%.

One hypothesis is that nightly intervals passed the tests less often. This is supported by Figure 4.3. However, when running the same experiment, including only arrivals between the hours 06–22, the percentage increased only marginally. These results are shown in Table 4.2. This experiment fea- tured n = 5806 intervals, a reduction of ~11%. The number of arrivals was 459, 548, roughly 30, 000 less. This is explained by the fact that most intervals outside these hours have already been eliminated due to a low number of arrivals.

Another hypothesis is that the optimal interval size varies throughout the year. The daily average arrival rate, shown in Figure 4.5, increases throughout the data. As shown in Figure 4.4, the ratio of intervals passing both tests decreases sharply with increasing arrival rate. When running the experiment using only arrivals in a single month, the results improve. For the best-performing month, April, ⇠74% of intervals pass both tests, with n = 546, and 36, 602

(41)

Table 4.2: Tests of NHPP of telemedicine arrival data, between hours 06–22 Interval category Intervals Fraction

n 5806 100.0%

Poisson 3859 66.5%

Independent 1238 21.3%

Neither 126 2.2%

arrivals. This suggests that the interval size L needs to be selected more gran- ularly, for example on a monthly basis. The full results are shown in Table 4.3.

Figure 4.5: Daily number of arrivals during 2019

This is slightly lower than similar experiments in e.g. [26], in which arrivals to wireless networks were modeled as a non-homogeneous Poisson process. Their results in two trials showed 86.5% and 81.2% of dynamically chosen intervals passed both tests.

(42)

Table 4.3: Tests of NHPP of telemedicine arrival data, in April Interval category Intervals Fraction

n 546 100.0%

Poisson 404 74.0%

Independent 52 9.5%

Neither 7 1.3%

4.7.3 Parameters and simulation

The arrival data has been used to estimate arrival rate parameters for each weekday and interval of length L. An example of the estimated arrival process for Mondays, with L = 60, is shown in figure 4.6. The arrivals exhibit a familiar pattern, similar to the results shown in e.g. [61]. The corresponding cumulative arrival function is shown in Figure 4.7.

Figure 4.6: Example of the piecewise constant arrival pattern

(43)

Figure 4.7: Example of the cumulative arrival pattern

The parameter estimates have then been used as input to perform simulations of the arrival process across one realization (24 hours). Several runs of the simulation is performed in order to improve accuracy. Figure 4.8 shows the empiric cumulative arrival functions of 10 simulation runs, using arrival data from Wednesdays only.

Figure 4.9 shows the mean of the empiric cumulative arrival functions of 1, 000 simulation runs, also using data from Wednesdays. The shaded area indicates the 95% confidence interval.

(44)

Figure 4.8: The empiric cumulative arrival functions of 10 simulation runs of Wednesdays

(45)

Figure 4.9: The empiric cumulative arrival function of the mean of 1,000 simulation runs of Wednesdays. The shaded area shows the 95% confidence interval.

(46)

Discussion

5.1 Conclusions

This study has examined arrival data from a telemedicine provider, covering the year of 2019. Two statistical tests were performed to examine whether the arrival data exhibits the two vital properties of a Poisson process: (i) independent interarrival times, and (ii) exponentially distributed interarrival times, in order to ascertain that a non-homogeneous Poisson process is a suitable model for arrival patterns in the context of telemedicine.

The data was examined in fixed intervals, and results are presented with the interval size L = 60 minutes. When examining arrival data for the entire year, the results showed that ⇠66% of intervals exhibited both properties. When examining arrival data only for the month of April, the best-performing month, the corresponding percentage was ⇠74%.

These results indicate that using NHPPs to model customer arrival processes in telemedicine is merited, but that the interval size must be finely tuned when using a piecewise constant model. The piecewise constant model considered in this work does not incorporate any long-term trend predictions. In the setting of a rapidly growing industry, this limits the model’s usefulness in predicting future arrival patterns, and more advanced modeling techniques may be required to reach satisfactory predictive power.

No previous results have been found to support whether NHPPs are useful models in the field of telemedicine, although success have been found in several similar areas, such as physical hospitals and telephone call centers (see Section 3 for a summary). The similarities between these industries support the conclusion in this study that telemedicine arrivals are also well-modeled by non-homogeneous Poisson processes.

39