Customer Support Process Analysis: Using statistics and modeling to analyze a global customer support process


Tobias Björch Fredrik Strålberg

June 14, 2016


Copyright © 2016 Tobias Björch and Fredrik Strålberg. All rights reserved.


Submitted in partial fulfillment of the requirements for the degree Master of Science in Industrial Engineering and Management

Department of Mathematics and Mathematical Statistics, Umeå University

SE-901 87 Umeå, Sweden

Supervisor: Konrad Abramowicz

Examiner: Leif Nilsson


Abstract

A key challenge for a company with global support is to provide high-quality service to its customers. Management of global support centers has to consider customer requirements, service agreements, budget, resources and more. Management therefore has limited room for testing new approaches, especially in global operations. This thesis aims to use statistics, modeling and discrete event simulation to analyze a global support process. The analysis shall provide approximate results to support decision making. The model representation of the global support process uses the non-parametric bootstrap to replicate the variability observed in the real-world system. Variability in the arrival process is captured by bootstrap block resampling. To describe the observed global support process, data has been collected from the case company. The simulation results are validated by comparison with the observed data. Because this comparison supports the model representation, potential process enhancements are then tested. Finally, the discussion considers the results from the tests of process enhancements and the validity of the simulation model.

Summary (translated from the Swedish Sammanfattning)

A challenge for a company with global support is to offer high-quality service to its customers. The management of global support centers must take into account customer requests, service agreements, budget, resources and more. Management therefore has limited room to test new approaches, especially in global operations. The goal of this work is to analyze a global support process using statistics, modeling and discrete event simulation. The analysis shall contribute approximate results that can be used as a basis for decisions. The model representation of the global support process applies non-parametric resampling to replicate the variability in the observed system. To replicate the variability in the arrival process, resampling of blocks (bootstrap block resampling) is used. To describe the observed global support process, data obtained from the company where the work was carried out is used. Results from simulation are validated through comparisons with observed data. The simulation results validate the model representation, and potential process improvements have therefore been tested. Furthermore, a discussion is presented on the validation of the simulation model and on the results from tests of potential process improvements.

Swedish title: Analys av en supportprocess


To our families


Acknowledgements

We wish to thank the people who have contributed to this thesis. Firstly, we would like to thank our supervisors at the case company. You helped us create the idea behind this thesis and guided us during our work. It has been a pleasure working with both of you.

Special thanks go to our supervisor Konrad Abramowicz for his encouragement, time, patience and his enthusiasm. You have inspired us to work hard even at times when the goal of this thesis felt distant. We have really appreciated all your support and it has been a pleasure getting to know you better.

Then we wish to thank our families and friends for their understanding and patience, during evenings and weekends, when we have been working on this thesis.

Finally, we would also like to show our gratitude to the employees at the case company who have been helpful and willing to answer all our questions.


Contents

1 Introduction
1.1 Background
1.1.1 Case company
1.1.2 Problem description
1.1.3 Possible ways to analyse a global support process
1.1.4 Process description
1.1.5 Service requirements and work schedule
1.2 Purpose
1.2.1 Potential process enhancement
1.3 Observed Data
1.4 Delimitations
1.5 Approach and Outline

2 Theory
2.1 Probability Theory
2.1.1 Sample space, events and probability
2.1.2 Axioms of probability
2.1.3 Random variable
2.1.4 Random variable characteristics
2.1.5 Expected value and variance
2.1.6 Distributions used in this thesis
2.2 Statistical Inference
2.2.1 Sample mean and variance
2.2.2 Hypothesis testing
2.2.3 Methods of tests
2.2.4 Inference about differences in means of populations
2.3 Stochastic Simulation
2.3.1 Pseudorandom numbers
2.3.2 Bootstrap
2.3.3 Bootstrap block resampling
2.4 Discrete Event Simulation
2.4.1 Warm-up period
2.5 Software
2.5.1 Software usage
2.5.2 SimEvents library

3 Data
3.1 Observed Data
3.1.1 Performance table
3.1.2 Time reporting table
3.1.3 Change history table
3.2 Data Processing
3.2.1 Arrival process
3.2.2 Model parameters

4 Method
4.1 Implementation
4.2 Model Representation
4.2.1 Attributes of customer support requests
4.2.2 Generate customer support request
4.2.3 Customer unit
4.2.4 Global support center
4.2.5 Product line maintenance
4.3 Simulation settings
4.3.1 Time
4.3.2 Work schedule
4.3.3 Input data
4.4 Representation of Potential Process Enhancement
4.4.1 Early routing
4.4.2 No individual assignment of customer support request in the global support centers
4.4.3 Number of engineers in product line maintenance

5 Results
5.1 Validation Of Model
5.1.1 Mean duration using original arrival process
5.1.2 Mean duration using bootstrap arrival process
5.1.3 Comparison of individual customer support request duration
5.2 Potential Process Enhancement
5.2.1 Early routing
5.2.2 No individual assignment of customer support request in the global support centers
5.2.3 Number of engineers in product line maintenance

6 Discussion and conclusion
6.1 Review
6.2 Validation of model
6.2.1 Evaluation of mean duration
6.2.2 Comparison of individual customer request support duration
6.3 Potential Process Enhancement
6.3.1 Early routing
6.3.2 No individual assignment of customer support request in the global support centers
6.3.3 Number of engineers in product line maintenance
6.4 Conclusions and Recommendations

References

A Appendix A - Individual times divided by different priorities using bootstrap arrival process
B Appendix B - Individual times divided by different customer regions using bootstrap arrival process


Abbreviations

SAP Systems, Applications and Products

CSR Customer Support Request

CU Customer Unit

GSC Global Support Center

PLM Product Line Maintenance

DES Discrete Event Simulation

OAP Original Arrival Process

BAP Bootstrap Arrival Process

ER Early Routing

NIAC No Individual Assignment of CSR

NEP Number of Engineers in PLM


1 Introduction

In this thesis we study a key challenge for a company with after-sales services, namely the management of service support centers. Management has to consider several aspects, such as customer requirements, budget restrictions, service agreements and available resources. One of the key challenges for a company is therefore to offer short service times to customers while organizing its resources. To simplify the task of organizing resources to meet future demand, it would be of interest to have a mathematical method for evaluating the organisation and its service. We aim to use statistics, modeling and simulation to analyze a global support process. We focus on finding an effective way to represent a real-world system with simulation. The thesis shall give decision makers a way to model a global support process, so that approximate results can indicate how the process responds to changes.

1.1 Background

1.1.1 Case company

This thesis is carried out at a global telecommunication company that is part of changing the environment of communication technology. The case company provides equipment, software and service to enable transformation through mobility. Their leadership in technology and service has played an important role for expansion and improvement of connectivity worldwide. The company structure is divided into several business units, which are supported by group functions such as sales, finance, human resources, etc.

This thesis analyzes a global support process for customer service. The customers served by this support organisation are large corporations. The global support is a service offered to customers for a fee: customers pay an annual service fee to get help with resolving problems regarding a product.

The support organization is divided into three levels: customer unit (CU), global support center (GSC) and product line maintenance (PLM). There are approximately 150 CUs around the world, 3 GSCs and 1 PLM unit. An illustration of the global support organization is shown in Figure 1.1. The geographical positions in the figure do not represent the actual positions of the units in the organization.

Figure 1.1: An illustration of the global support organization. The geographical positions do not represent the actual positions of the units.


1.1.2 Problem description

It is a challenge to service advanced products such as telecommunication equipment, and it can take several days to solve a problem. It is also a challenge to meet customer demands on short response time and overall service time. Short response times require local customer units, and the units are therefore strategically located all over the world.

Customer units (CUs) are local offices and the first point of contact with the global support. CUs aim to give short response times and to communicate in the customer's native language. The other challenge is offering a short overall service time. A company has to weigh having specialists spread across the local offices against having centralized specialist units that assist the local offices. Centralized specialist units are efficient because all local offices can request assistance from them, i.e. they share these resources.

Another challenge is to offer 24/7 support to customers from different locations. The purpose of having units spread across the globe is to gain efficiency by using global volume and global competence. It also enables shared work over different time zones.

The global support handles customer support requests (CSR). When a customer contacts the global support, a CSR is created. During the following support process the CSR is tracked and data is collected.

The support organisation has three support levels: CU, GSC and PLM. Each of them has its own sub process to support the customer. Together they constitute the global support process, which is described in more detail in Section 1.1.4.

It is difficult for a large-scale organisation to analyze the effects of changes. A method that indicates the effects of changes is therefore of interest to study.

1.1.3 Possible ways to analyse a global support process

1.1.3.1 Value stream mapping

Lean is a systematic method for eliminating waste and creating value. Methods such as lean offer a way to analyse a process by observation. Value stream mapping is a method used to map the activities in a process. First, determine the calendar time for each activity (lead time). Then measure the time a resource spends on each activity (processing time). Identify the time between activities and any loop backs. These measures can be used to calculate the flow efficiency and identify improvement areas (Bicheno et al. 2011).
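As a rough illustration of the calculation, the sketch below computes flow efficiency as total processing time divided by total lead time, which is a common definition. The activity names and hours are made-up numbers, not data from the case company.

```python
# Hypothetical activities: (name, lead time in hours, processing time in hours).
activities = [
    ("register request", 2.0, 0.5),
    ("analyse problem", 30.0, 6.0),
    ("present solution", 8.0, 1.5),
]


def flow_efficiency(steps):
    """Share of the total lead time during which a resource actively works."""
    total_lead = sum(lead for _, lead, _ in steps)
    total_processing = sum(proc for _, _, proc in steps)
    return total_processing / total_lead


efficiency = flow_efficiency(activities)
print(f"Flow efficiency: {efficiency:.0%}")
```

A low value indicates that most of the calendar time is spent waiting between or within activities, which is where value stream mapping looks for improvement areas.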

1.1.3.2 Performance measures

By observing the process and using average values of performance measures such as lead time, processing time and time between activities, it is possible to calculate average service times, capacity and other values of interest. These measures are examples of parameters that are helpful to managers. However, they do not consider the variation that may exist in the process.

1.1.3.3 Testing new support process

A third approach to analysing the global support process is to test new set-ups or a new process, i.e. a pilot study. A limited group tries a new process, and the performance measures of this new way of working are captured during the trial. This makes it possible to compare the performance measures with those of the original support process. This approach gives reliable results but requires extensive testing and involves several people, which makes it very time consuming and costly.


1.1.3.4 Simulation

Simulation is another approach, which aims to represent the real-world process by using historical data. With simulation it is possible to test different set-ups without making any changes to the current organisation. One has to assume that the process can be replicated well in a simulation model.

Simulation models are used to solve problems and provide results to support decision makers. One concern is to make sure that the results are accurate. There are methods to check whether the simulation is a good representation of the real-world process. Model validation and verification address the appropriateness of the representation. A probabilistic model that makes inference about random variables uses properties such as mean and variance to determine validity. There are many different concepts within validation. Conceptual validation is performed to test that theories and assumptions are reasonable for the intended purpose of the study. Face validation is a subjective assessment made by individuals knowledgeable about the system, who are asked to determine whether the model behaves reasonably with regard to the real-world system (Sargent 2011).

The global support process has several similarities with an incoming call center. According to Kim (2005, p. 390), call centers are commonly used by corporations for a wide range of activities. They can be set up to give service and support and to handle other types of customer inquiries. These types of call centers are commonly referred to as service centers. Calls occupy resources in the service center, and the management of a service center aims to achieve a high service level for customers while using resources efficiently. A service center handles entities with its resources.

In a study by Bouzada (2009), the author uses an empirical case from a call center to compare an experimental method (simulation) with an analytical method (queueing theory). The aim of the study is to compare the methods for dimensioning the handling capacity. Bouzada (2009) found that simulation showed advantages over the analytical method for dimensioning the handling capacity, mainly in complex organisations.

We conclude that these studies show that simulation is a suitable method to analyse a call center. Hence, we can consider simulation as a suitable approach to analyse the global support process.

1.1.4 Process description

As mentioned in Section 1.1.2, the global support handles CSRs. When a customer problem is reported to the global support, a CSR is created with a date and time stamp. During the following support process, variables regarding the CSR are collected in a database. This section describes the global support process, which is presented in Figure 1.2. We describe the sub processes in more detail in the following sections.


Figure 1.2: Global support process.


1.1.4.1 Customer support request duration

CSR duration is defined as the time elapsed from the creation of a CSR until it is closed.

Engineers within each of the three support levels can present a solution to a customer. If the solution resolves the problem this leads to a closing of the CSR. A CSR can be closed in any of the support levels, which means that the support process can look different for each CSR.

If a CSR is handled by all three support levels, the duration of the CSR is the time elapsed from start until the CSR is closed in PLM, as seen in Figure 1.2. If a CSR is handled by the first two levels, the duration is the time elapsed from start until the CSR is closed in GSC. If a CSR is handled by the first support level only, the duration is the time elapsed from start until the CSR is closed in CU.

1.1.4.2 Creation of a customer support request

Figure 1.3: CSR creation in the global support process

A CSR is created when a customer contacts someone within a CU and requires support. CSRs can be created for all sorts of reasons and by any of the customers in the world. Each time a CSR is created, it gets a date and time stamp and is registered with a unique CSR ID. The creation of a CSR can be seen in Figure 1.3.

1.1.4.3 Assign engineer

After a CSR has been created, it needs to be assigned to a support engineer. If a support engineer is available, the CSR is assigned to that engineer. If all engineers are occupied, the CSR is put in a queue. The CSR queue holds CSRs that have not yet been assigned to an engineer. It also holds CSRs that have been analysed by an engineer but are awaiting additional information.

Assign engineer is a sub process that occurs on each support level; see Figure 1.4.
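The assignment rule can be sketched in a few lines of Python. This is a simplified illustration and not the model used in the thesis: it assumes that each engineer works on one CSR at a time and that the queue is served first in, first out, and the class and method names are hypothetical.

```python
from collections import deque


class SupportLevel:
    """One support level with a fixed pool of engineers and a CSR queue."""

    def __init__(self, engineers):
        self.free_engineers = deque(engineers)
        self.queue = deque()      # CSRs waiting for an engineer
        self.assigned = {}        # CSR id -> engineer

    def arrive(self, csr_id):
        """Assign the CSR if an engineer is free, otherwise queue it."""
        if self.free_engineers:
            self.assigned[csr_id] = self.free_engineers.popleft()
        else:
            self.queue.append(csr_id)

    def release(self, csr_id):
        """Free the engineer of a finished CSR and serve the queue, if any."""
        engineer = self.assigned.pop(csr_id)
        if self.queue:
            self.assigned[self.queue.popleft()] = engineer
        else:
            self.free_engineers.append(engineer)


level = SupportLevel(["engineer A", "engineer B"])
for csr in ("CSR-1", "CSR-2", "CSR-3"):
    level.arrive(csr)
print(level.assigned, list(level.queue))   # CSR-3 waits in the queue
level.release("CSR-1")
print(level.assigned, list(level.queue))   # CSR-3 is picked up by engineer A
```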

Figure 1.4: Assign Engineer in the global support process.


1.1.4.4 Pre-Analysis

Figure 1.5: Pre-Analysis in the global support process

Pre-Analysis occurs immediately after an engineer has been assigned to a CSR. Pre-Analysis is the first analysis of the customer problem, performed by an engineer to narrow the search for a solution. In order to allocate resources to the "right" CSRs, all CSRs are assigned a priority according to the technical and/or commercial impact of the problem.

A CSR can have one of five different priorities. A customer with a high priority CSR requires short service time and can expect service at any time of day until the problem is resolved. A customer with a low priority CSR, meanwhile, can expect a longer service time, because service is only given during daytime when the support is open. Customers with low priority CSRs can also expect to be put in a queue while CSRs with higher priority are being served.

The CSR information established during the Pre-Analysis is stored in the business system Systems, Applications and Products (SAP). Pre-Analysis is performed on all support levels, although the process is more extensive for a CU engineer, since the CU is the first point of contact for a customer. Pre-Analysis is shown in Figure 1.5.
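The effect of priorities on the queue can be illustrated with a small priority queue. The sketch assumes that priority 5 is the highest, which is inferred from Section 1.1.5 where priority 4 and 5 receive service around the clock, and that CSRs of equal priority are served in order of arrival. Both are assumptions for illustration, and the class name is hypothetical.

```python
import heapq
import itertools


class PriorityCsrQueue:
    """Queue that releases the highest-priority CSR first (FIFO within a priority)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def push(self, csr_id, priority):
        # heapq is a min-heap, so the priority is negated to pop 5 before 1.
        heapq.heappush(self._heap, (-priority, next(self._counter), csr_id))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = PriorityCsrQueue()
for csr_id, priority in [("CSR-1", 2), ("CSR-2", 5), ("CSR-3", 2), ("CSR-4", 4)]:
    queue.push(csr_id, priority)

order = [queue.pop() for _ in range(len(queue))]
print(order)
```

The two priority 2 requests leave the queue last, in the order in which they arrived.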


1.1.4.5 Analysis

Figure 1.6: Analysis in the global support process

Analysis is the biggest sub process in the global support. Prior to the analysis, the support engineer has secured measurement data and/or a remote connection, which makes it possible to troubleshoot and try to resolve the problem together with the customer. More information may also be required during the analysis. The engineer then requests information either from the customer or from other support personnel previously assigned to the CSR, puts the CSR on hold and starts working on another CSR. We refer to this scenario of a CSR on hold as a break. The elapsed time from the information request until the information is received is referred to as break time. The expressions break and break time are used throughout the thesis. Requesting information creates a loop which can occur several times, depending on the difficulty of isolating and resolving the problem. For example, a support engineer requires measurement data from the last week for a faulty product. The customer responds that it takes a day to get the information. The engineer pauses the analysis until the information is received and can meanwhile continue to service other CSRs. The Analysis sub process is shown in Figure 1.6.

1.1.4.6 Escalation

If a support level determines that a CSR cannot be solved within that level, a decision can be made to escalate the CSR to the next level. This can happen in both CU and GSC. An escalation is a decision by the current organisational level to assign the CSR to the next level. The escalation sub process is visualized in Figure 1.7.

Figure 1.7: Escalation in the global support process.


1.1.4.7 Find and present solution

The support engineer isolates the problem and finds a recovery procedure. The recovery procedure may differ from case to case: it can be a software update, a hardware change, a product restart etc.

The support engineer presents the solution to the customer, who in turn tries to recover the faulty product. If the recovery procedure does not resolve the problem, it is rejected by the customer and the support engineer returns to Analysis. If the recovery procedure is accepted by the customer, the CSR is closed and all information regarding the CSR is updated in SAP, e.g. duration, support time and number of support activities.

Find and present solution is shown in Figure 1.8.

Figure 1.8: Find and present solution in the global support process.

1.1.5 Service requirements and work schedule

In Section 1.1.4 we describe the global support process. The organisation working according to this process is large and active all over the world, as seen in Figure 1.1. Having an organisation that offers service to global customers leads to a challenge in giving service during any time of day.

CSRs with priority 4 and 5 are given service 24/7, which requires all support levels to have personnel active at all times. CSRs with priority 1, 2 and 3 are given support during common office hours. We now explain how the organisation is scheduled to meet these service requirements. Figure 1.9 describes the path along which a CSR is escalated, depending on the time of day.


Figure 1.9: Path of a CSR during common office hours.

Figure 1.9 shows the path of a CSR of any priority (1, 2, 3, 4 or 5). All CSRs are offered service during common office hours. CSRs with priority 1, 2 or 3 are served only during the units' common office hours.
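The service rule can be written as a small predicate. The office hours used here (Monday to Friday, 08:00 to 17:00) are an assumption for illustration only, since the text does not state the actual opening hours, and the function names are hypothetical.

```python
from datetime import datetime

OFFICE_START, OFFICE_END = 8, 17   # assumed office hours, local time


def is_office_hours(moment):
    """True on Monday to Friday between the assumed office hours."""
    return moment.weekday() < 5 and OFFICE_START <= moment.hour < OFFICE_END


def is_served(priority, moment):
    """Priority 4 and 5 are served 24/7, priority 1 to 3 only in office hours."""
    if priority not in (1, 2, 3, 4, 5):
        raise ValueError("priority must be 1, 2, 3, 4 or 5")
    return priority >= 4 or is_office_hours(moment)


saturday_night = datetime(2015, 6, 13, 23, 0)
tuesday_morning = datetime(2015, 6, 16, 10, 0)
print(is_served(2, saturday_night))    # low priority outside office hours
print(is_served(5, saturday_night))    # high priority is always served
print(is_served(2, tuesday_morning))   # low priority during office hours
```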

Figure 1.10: Path of a CSR with priority 4 or 5, during evenings and weekends, in the Global support.


In the support handling of CSRs with priority 4 or 5 there is a handover procedure, which occurs when a GSC closes; see Figure 1.10. The handover procedure passes the service of a CSR from the closing GSC to the newly opened GSC. A reduced number of engineers work to service CSRs with these priorities. In the global support there are three support levels, CU, GSC and PLM, which we describe in the following list.

Customer unit

There are approximately 150 CUs in the global support, each of which serves one or more customers in its vicinity. Due to the scale of this support level we regard it as open at all times and able to service all customers at any time of day. This is a simplified representation of the CUs. We elaborate on this topic in Section 4.2, where we describe our model representation.

Global support center

There are 3 GSCs in the global support. The geographic locations of the GSCs make it possible to offer service at any time of day by sharing work over different time zones. Due to confidentiality agreements, the number of engineers in each of the GSCs is referred to as $R_{GSC1}$, $R_{GSC2}$ and $R_{GSC3}$. These are reference values for the number of resources in each GSC during common office hours.

During weekends the global support only needs to serve CSRs with priority 4 or 5, so there is less need for service. Because of this, the number of support engineers in each of the GSCs is reduced during weekends.

Product line maintenance

There is 1 PLM unit in the global support. Due to confidentiality agreements we refer to the number of engineers in PLM as $R_{PLM}$. This is a reference value for the number of resources in PLM during common office hours.

Similar to GSC, there is less need for service of CSRs during evenings and weekends. Because of this, the number of support engineers is reduced during evenings and weekends.

1.2 Purpose

The purpose of this study is to use simulation to represent a real-world process, namely the global support process described in Section 1.1.4. The simulation model shall be able to provide approximate results which can indicate effects of organizational changes. The resulting model shall be able to test hypothetical set-ups. This thesis shall increase the case company’s knowledge of the process of support. We answer the following questions in this thesis:

• How can the global support be effectively simulated?

• How can simulation provide approximate results that can support the decision makers in managing the global support?

1.2.1 Potential process enhancement

The potential enhancements correspond to testing alternative set-ups of the support organisation or altering the global support process. We want to test the following potential process enhancements.


1.2.1.1 Early Routing

As described in Section 1.1.4, a CSR can be escalated to higher support levels. In the current organisation a CSR is created in one of the CUs and is escalated in turn to GSC and then PLM. It is therefore of interest to study the potential enhancement of early routing. We assign a person with knowledge about the process and common customer problems the role of routing each CSR to the support level that is most suitable according to the CSR characteristics.

We assume that the people working with routing have the knowledge required to efficiently send each CSR to the "right" support level. Using the observed data, we introduce an enhancement with routing functionality that uses probabilities based on historical data.
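One simple way to realise routing with historical probabilities is to estimate, from past CSRs, the share that was closed at each support level and to draw the destination of a new CSR from these shares. The sketch below illustrates that idea only. It is not necessarily the routing rule implemented in the thesis, the history is made up, and the function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical history: the support level that closed each past CSR.
closing_levels = ["CU"] * 60 + ["GSC"] * 30 + ["PLM"] * 10


def routing_probabilities(history):
    """Empirical probability that a CSR is resolved at each support level."""
    counts = Counter(history)
    total = sum(counts.values())
    return {level: count / total for level, count in counts.items()}


def route(probabilities, rng):
    """Draw the support level a new CSR is sent to directly."""
    levels = list(probabilities)
    weights = [probabilities[level] for level in levels]
    return rng.choices(levels, weights=weights, k=1)[0]


probabilities = routing_probabilities(closing_levels)
rng = random.Random(2015)   # fixed seed so the sketch is reproducible
routed = Counter(route(probabilities, rng) for _ in range(10_000))
print(probabilities)
print(routed)
```

In a refined version the probabilities could be estimated separately for groups of CSRs with similar characteristics, for example per priority.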

1.2.1.2 No individual assignment of customer support request in the global support centers

For readability we will refer to this process enhancement as NIAC in GSC. In Figure 1.2 we can see that there is a CSR queue on all support levels. From Section 1.1.4.3 we know that this queue holds CSRs that have not been assigned to an engineer.

Consider a CSR which has been assigned to an engineer. If the engineer requests information from the customer, support is paused until the information is received, after which the same engineer continues to support the CSR. We introduce a potential enhancement where a CSR is placed back in the CSR queue when it is paused for an information request. With this potential enhancement we allow any engineer to continue where the last engineer stopped, and we assume that this happens without any loss of efficiency.

1.2.1.3 Number of engineers in product line maintenance

The third potential enhancement concerns the number of engineers in the PLM support level. We want to test different numbers of support engineers in the highest support level and examine whether this affects the overall service time in the support process.

1.3 Observed Data

As part of this study, relevant data is collected to build a simulation model representing the global support process. The case company records data for each CSR. In total, 180 characteristics and 145 key figures are recorded. The data provided for this study is historical data recorded for CSRs from the year 2015. The data contains information such as:

• Date and time.

• Priority level.

• Origin such as customer, country and region.

• Support handling time carried out by each support level (CU, GSC, PLM).

• Number of support activities.

• Duration of CSR, i.e. time from creation until CSR is closed.
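For illustration, one CSR with the fields listed above could be held in a record like the following. All field names and values are hypothetical; the actual data set uses the case company's own characteristics and key figures, which are not reproduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CsrRecord:
    """One historical CSR; all field names are illustrative."""
    csr_id: str
    created: datetime
    closed: datetime
    priority: int                 # 1 to 5
    customer: str
    country: str
    region: str
    handling_hours: dict          # support level -> handling time in hours
    support_activities: int

    @property
    def duration(self) -> timedelta:
        """Time from creation until the CSR is closed."""
        return self.closed - self.created


record = CsrRecord(
    csr_id="CSR-0001",
    created=datetime(2015, 1, 12, 8, 15),
    closed=datetime(2015, 1, 14, 16, 45),
    priority=3,
    customer="customer X",
    country="Sweden",
    region="Europe",
    handling_hours={"CU": 1.5, "GSC": 4.0, "PLM": 0.0},
    support_activities=6,
)
print(record.duration)
```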

1.4 Delimitations

In order to make this thesis viable, a few limitations are introduced. The data considered in the thesis (see Section 1.3) comes from CSRs regarding one specific product of the global support service. Firstly, the reason for this is the amount and size of the data. This specific product might have complex correlations with other products, which are not considered in this thesis. Secondly, our thesis aims to effectively simulate the global support. One would expect it to be possible to use a similar model for other product types, if that is of interest.

Data and information about the customer units (see Figure 1.1) are not attainable, due to the geographic locations and the number of CUs. Therefore, a simplified representation of the CUs is used.

In the global support process there might be two or more engineers working at the same time, with the same CSR. This event is difficult to track in the observed data. Therefore, this event is not considered in our simulation model.

1.5 Approach and Outline

In this thesis we use simulation to analyse and evaluate a global support process. More specifically, we study the flow of CSRs through a global support process. Selected parts of the CSR data mentioned in Section 1.3 are used to represent the real-world system in the simulation model.

In Chapter 2 the underlying theory we use in our thesis is presented, including probability theory, statistical inference and simulation theory. In Chapter 3 we present the considered data and the data processing. In Chapter 4 we present how the theory is applied and the choice of simulation approach, and describe the model representation. The results and the validation of the model are presented in Chapter 5, followed by discussion and conclusions in Chapter 6.


2 Theory

In this chapter we start by defining fundamental concepts in probability theory, followed by statistical inference. Then we present the theory of stochastic simulation and discrete event simulation. Finally, we describe the software used in this thesis.

2.1 Probability Theory

2.1.1 Sample space, events and probability

Let $S$ denote the sample space of an experiment, i.e. the set of all possible outcomes. Any subset $A$ of the sample space is known as an event. For each event $A$ of an experiment having sample space $S$ there is a number $P(A)$, called the probability of event $A$.

2.1.2 Axioms of probability

$P$ is a probability measure for a sample space $S$ if it satisfies the following axioms:

Axiom 1: $0 \le P(A) \le 1$ for any event $A$.

Axiom 2: $P(S) = 1$.

Axiom 3: For any sequence of mutually exclusive events $A_1, A_2, \ldots$

$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i), \quad n = 1, 2, \ldots$$
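The axioms can be checked by brute force on a small example. The sketch below uses one roll of a fair die, which is an illustrative choice and not an experiment from the thesis, and verifies the third axiom for pairs of mutually exclusive events only.

```python
from fractions import Fraction
from itertools import combinations

sample_space = frozenset(range(1, 7))          # outcomes of one fair die roll


def prob(event):
    """Probability of an event (a subset of the sample space), equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


# All events, i.e. all subsets of the sample space.
events = [frozenset(c) for r in range(7) for c in combinations(sample_space, r)]

assert all(0 <= prob(a) <= 1 for a in events)               # Axiom 1
assert prob(sample_space) == 1                              # Axiom 2
assert all(prob(a | b) == prob(a) + prob(b)                 # Axiom 3, two events
           for a in events for b in events if not a & b)

print(prob(frozenset({2, 4, 6})))   # probability of an even number
```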

2.1.3 Random variable

Experiments are carried out to find a numerical quantity of interest. The resulting numerical quantity is an observation of a random variable. A set of observations from a random variable is called a sample.

There are two major types of random variables: discrete random variables and continuous random variables. Discrete random variables can take on a limited, or at most a countable number of values. Continuous random variables can take on an uncountable number of values.

2.1.4 Random variable characteristics

A random variable can be described by characteristics that describe its possible outcomes. For a discrete random variable $X$, the probability of taking on a specific value is given by the probability mass function $p(x)$, which is defined by:

$$p(x) = P(X = x).$$

For a continuous random variable the probability of taking on any given value is 0, because there are uncountably many values. Therefore it is suitable to talk about the probability of ending up in a given set. A continuous random variable $X$ has a probability density function (pdf) $f_X$, defined for all real numbers $x$ and having the property that for any set $A$ of real numbers:

$$P(X \in A) = \int_A f_X(x)\,dx.$$


Both types of random variables share a common characteristic, the cumulative distribution function (cdf) F, which is defined as

F ( x ) = P ( X ≤ x ) .

For further reading on the theory of random variables, we refer to Ross (2012).

2.1.5 Expected value and variance

The expected value of a random variable $X$ is the weighted average of the possible values of $X$, where each value is weighted by the probability that $X$ takes that value. The expected value of a random variable $X$ is denoted by $\mu = E[X]$, and defined as:

$$\mu = E[X] = \begin{cases} \int_{-\infty}^{\infty} x f(x)\,dx, & \text{if } X \text{ is a continuous random variable,} \\ \sum_i x_i P(X = x_i), & \text{if } X \text{ is a discrete random variable.} \end{cases}$$

Variance is a measure of the variation in the possible values of the random variable $X$. If $X$ is a random variable with mean $\mu$, then the variance of $X$, denoted by $\sigma^2$, is defined as:

$$\sigma^2 = \mathrm{Var}(X) = E[(X - \mu)^2].$$

Another measure of variation is the standard deviation, $\sigma$, which is defined as the square root of the variance.
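As a small illustration of the two definitions above, the following Python sketch computes the expected value and the variance of a discrete random variable directly from its probability mass function. The fair six-sided die and the function names are illustrative assumptions only.

```python
from fractions import Fraction


def expected_value(pmf):
    """E[X] = sum_i x_i * P(X = x_i) for a discrete random variable."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - mu)^2], computed from the probability mass function."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = expected_value(die)
sigma2 = variance(die)
print(mu, sigma2)
```

Exact fractions are used here so that the weighted sums are not affected by floating-point rounding.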

2.1.6 Distributions used in this thesis

In this section we define the distributions used in this thesis.

2.1.6.1 Normal distribution

We say that a random variable $X$ is normally distributed with parameters $\mu \in \mathbb{R}$ and $\sigma > 0$, and denote it by

$$X \sim N(\mu, \sigma^2),$$

if $X$ has the density

$$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad x \in \mathbb{R}.$$

For a variable defined in this way, we have $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.

2.1.6.2 t-distribution

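We say that a random variable $X$ is t-distributed with $\nu \in \mathbb{N}^+$ degrees of freedom, and denote it by

$$X \sim t_\nu,$$

if $X$ has the density

$$f(x \mid \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad x \in \mathbb{R},$$

where the gamma function $\Gamma$ is defined by

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\,dx.$$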

2.1.6.3 Chi-squared distribution

We say that a random variable $X$ is chi-squared distributed with $k \in \mathbb{N}^+$ degrees of freedom, and denote it by

$$X \sim \chi^2(k),$$

if $X$ has the density

$$f(x \mid k) = \frac{1}{2^{k/2}\,\Gamma\left(\frac{k}{2}\right)}\, x^{\frac{k}{2}-1} e^{-\frac{x}{2}}, \quad x > 0.$$

2.1.6.4 F-distribution

We say that a random variable $X$ is F-distributed with parameters $d_1 \in \mathbb{N}^+$ and $d_2 \in \mathbb{N}^+$, and denote it by

$$X \sim F(d_1, d_2),$$

if $X$ has the density

$$f(x \mid d_1, d_2) = \frac{1}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right)} \left(\frac{d_1}{d_2}\right)^{\frac{d_1}{2}} x^{\frac{d_1}{2}-1} \left(1 + \frac{d_1}{d_2}x\right)^{-\frac{d_1+d_2}{2}}, \quad x > 0,$$

where the function $B(x, y)$, $x > 0$ and $y > 0$, is defined by

$$B(x, y) = \int_0^1 t^{x-1}(1-t)^{y-1}\,dt.$$

2.2 Statistical Inference

Statistical inference is the part of statistics that aims to derive properties of an underlying distribution, e.g. an unknown parameter θ, by analysing a set of data. Inferential statistical analysis aims to draw conclusions about a population using hypothesis testing and by deriving estimates. The theory presented in this section can be found in Alm and Britton (2008).

2.2.1 Sample mean and variance

Let us start by defining a sample $x = (x_1, x_2, \ldots, x_n)$ from the random variable $X$ with the cdf $F_X(x; \theta)$. We introduce the point estimates of the expected value and variance, namely the sample mean and sample variance. The sample mean is defined by

$$\bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \ldots + x_n}{n}.$$

The sample variance is defined by

$$s^2 := \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

and the sample standard deviation is denoted by $s$.
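The point estimates above can be sketched in a few lines of Python. The data values and function names are hypothetical and serve only to illustrate the formulas; note the divisor n − 1 in the sample variance.

```python
import math


def sample_mean(xs):
    """Sample mean: (x_1 + ... + x_n) / n."""
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("at least two observations are required")
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


# Hypothetical sample of eight observations.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
xbar = sample_mean(sample)
s2 = sample_variance(sample)
s = math.sqrt(s2)  # sample standard deviation
print(xbar, s2, s)
```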

2.2.2 Hypothesis testing

In the hypothesis testing framework we want to test a null hypothesis about the unknown parameter θ:

$$H_0 : \theta = \theta_0,$$

against an alternative hypothesis $H_1$. The alternative hypothesis can be of different types:

• Simple alternative hypothesis: $H_1 : \theta = \theta_1$

• Composite alternative hypothesis:

– one-sided: $H_1 : \theta < \theta_0$ or $H_1 : \theta > \theta_0$

– two-sided: $H_1 : \theta \neq \theta_0$

When a test is performed we write the null hypothesis together with the alternative hypothesis, for example

$$H_0 : \theta = \theta_0$$

$$H_1 : \theta \neq \theta_0.$$

When testing the null hypothesis there are two types of errors that can occur, Type 1 error and Type 2 error. These errors and how they emerge are summarized in Table 2.1.

Table 2.1: How Type 1 and Type 2 errors emerge

Decision \ Reality    H0 false        H0 true
Reject H0             Correct         Type 1 error
Do not reject H0      Type 2 error    Correct

The probability of making a Type 1 error is denoted by α, i.e.

$$\alpha = P(\text{Rejecting } H_0 \text{ when } H_0 \text{ is true}),$$

and the probability of making a Type 2 error is denoted by β, i.e.

$$\beta = P(\text{Not rejecting } H_0 \text{ when } H_0 \text{ is false}).$$

The quantity α is called the significance level of the test, and in most cases α is controlled when a test is constructed.

2.2.3 Methods of tests

There are three different methods to perform a test: the test variable method, the direct method and the confidence interval method. In this thesis we present the direct method.

If we want to test $H_0 : \theta = \theta_0$ at significance level $\alpha$, we start by finding a reference variable $R_\theta$ for the parameter $\theta$, whose distribution by definition does not depend on $\theta$, regardless of the value $\theta$ takes. Then we choose the test variable

$$T(\mathbf{X}) = R_{\theta_0}(\mathbf{X}).$$

If the null hypothesis is true then $T(\mathbf{X})$ has a fully known distribution. In the direct method we find a test variable $T(\mathbf{X})$ and then, from the result of the experiment $\mathbf{x}$, calculate

$$p\text{-value} := P_{H_0}\left(\text{to get at least as extreme a value of } T(\mathbf{X}) \text{ as we have observed, i.e., } T(\mathbf{x})\right),$$

where $P_{H_0}$ stands for the probability calculated under the assumption that the null hypothesis is true. Then, for a significance level $\alpha$, we reject $H_0$ if $p\text{-value} \le \alpha$ and we do not reject $H_0$ if $p\text{-value} > \alpha$.

In general, when we want to test $H_0 : \theta = \theta_0$ versus:

• $H_1 : \theta > \theta_0$, then more extreme means greater than $T(\mathbf{x})$. Hence, $p\text{-value} = P_{H_0}(T(\mathbf{X}) \ge T(\mathbf{x}))$

• $H_1 : \theta < \theta_0$, then more extreme means smaller than $T(\mathbf{x})$. Hence, $p\text{-value} = P_{H_0}(T(\mathbf{X}) \le T(\mathbf{x}))$

• $H_1 : \theta \neq \theta_0$, then more extreme means that the absolute value of $T(\mathbf{X})$ is greater than the absolute value of $T(\mathbf{x})$. Hence, $p\text{-value} = P_{H_0}(|T(\mathbf{X})| \ge |T(\mathbf{x})|)$
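The three cases above can be sketched as a small Python function. This is a minimal sketch under two assumptions that are not part of the description above: the null distribution of the test variable is continuous, and in the two-sided case it is symmetric about zero. The function name `p_value` and the use of a standard normal null distribution in the example are hypothetical choices for illustration.

```python
from statistics import NormalDist


def p_value(t_obs, null_cdf, alternative="two-sided"):
    """p-value of the direct method for an observed test variable t_obs.

    null_cdf is the cdf of the test variable under H0 (assumed continuous).
    """
    if alternative == "greater":      # H1: theta > theta_0
        return 1.0 - null_cdf(t_obs)
    if alternative == "less":         # H1: theta < theta_0
        return null_cdf(t_obs)
    if alternative == "two-sided":    # H1: theta != theta_0, symmetric null assumed
        return 2.0 * (1.0 - null_cdf(abs(t_obs)))
    raise ValueError("unknown alternative: " + alternative)


# Hypothetical example: standard normal null distribution, observed value 1.96.
alpha = 0.05
p = p_value(1.96, NormalDist().cdf)
reject = p <= alpha
print(p, reject)
```

Any other null distribution, such as a t- or F-distribution, can be used by passing its cdf instead.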

2.2.4 Inference about differences in means of populations

We now introduce procedures to test equality of means. Equality of two population means is tested with a two-sample t-test, while equality of several means is tested with Analysis of Variance (ANOVA).

2.2.4.1 Two-sample t-test

Assume that we have observed two independent samples $\mathbf{x} = (x_1, x_2, \ldots, x_{n_1})$ from $X \sim N(\mu_1, \sigma_1^2)$ and $\mathbf{y} = (y_1, y_2, \ldots, y_{n_2})$ from $Y \sim N(\mu_2, \sigma_2^2)$, where $\mu_1, \mu_2$ and $\sigma_1^2, \sigma_2^2$ are the means and variances of the random variables $X$ and $Y$ respectively. Let $\mathbf{X}$ and $\mathbf{Y}$ be the vectors of random variables corresponding to the observed samples, and let $s_x^2$ and $s_y^2$ denote the sample variances of the two samples. In what follows, no assumption is made regarding the sizes of the two samples, and the variances are unknown. The reference variable for $\theta = \mu_1 - \mu_2$ is given by

$$R_{\mu_1-\mu_2}(\mathbf{X}, \mathbf{Y}) = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}}},$$

which is approximately t-distributed with $f$ degrees of freedom, where

$$\frac{1}{f} = \frac{1}{n_1 - 1}\,\frac{(n_2 s_x^2)^2}{(n_2 s_x^2 + n_1 s_y^2)^2} + \frac{1}{n_2 - 1}\,\frac{(n_1 s_y^2)^2}{(n_2 s_x^2 + n_1 s_y^2)^2}.$$


Then, to test $H_0 : \mu_1 - \mu_2 = 0$, we use the introduced reference variable and obtain the test variable

$$T(\mathbf{X}, \mathbf{Y}) = R_{\mu_1-\mu_2}(\mathbf{X}, \mathbf{Y}) = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}}},$$

which under $H_0$ is approximately t-distributed with $f$ degrees of freedom. Now, for example, if $H_1 : \mu_1 - \mu_2 \neq 0$, we reject the null hypothesis if $P(|T(\mathbf{X}, \mathbf{Y})| \ge |T(\mathbf{x}, \mathbf{y})|) = p\text{-value} \le \alpha$.
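The observed test variable and the approximate degrees of freedom f can be computed as in the following Python sketch. The function name and the two small samples are hypothetical. The p-value itself is not computed here, since that requires the cdf of the t-distribution, which is not available in the Python standard library.

```python
import math
from statistics import mean, variance


def welch_statistic(x, y):
    """Return (t, f): the observed test variable and its degrees of freedom."""
    n1, n2 = len(x), len(y)
    sx2, sy2 = variance(x), variance(y)  # sample variances, divisor n - 1
    t = (mean(x) - mean(y)) / math.sqrt(sx2 / n1 + sy2 / n2)
    # 1/f as defined in the text for the unequal-variance case.
    denom = (n2 * sx2 + n1 * sy2) ** 2
    inv_f = ((n2 * sx2) ** 2 / (n1 - 1) + (n1 * sy2) ** 2 / (n2 - 1)) / denom
    return t, 1.0 / inv_f


# Hypothetical samples.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
t_obs, f = welch_statistic(x, y)
print(t_obs, f)
```

In general f is not an integer; it lies between the smaller of n₁ − 1 and n₂ − 1 and n₁ + n₂ − 2.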

2.2.4.2 Analysis of variance

We now consider the procedure for testing differences in means of several populations, which is the analysis of variance, ANOVA. Consider the representation of observations in Table 2.2.

Table 2.2: Representation of observations

Population (factor A)    Observations                            Statistics                       Distribution
1                        $x_{11}, x_{12}, \ldots, x_{1n_1}$      $\bar{x}_{1\bullet}, s_1^2$      $X_1 \sim N(\mu_1, \sigma^2)$
2                        $x_{21}, x_{22}, \ldots, x_{2n_2}$      $\bar{x}_{2\bullet}, s_2^2$      $X_2 \sim N(\mu_2, \sigma^2)$
...                      ...
p                        $x_{p1}, x_{p2}, \ldots, x_{pn_p}$      $\bar{x}_{p\bullet}, s_p^2$      $X_p \sim N(\mu_p, \sigma^2)$

The statistics in Table 2.2 are defined as

$$\bar{x}_{i\bullet} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij}, \qquad s_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i\bullet})^2.$$

In total we have $N = \sum_{i=1}^{p} n_i$ observations and we define $\bar{x}_{\bullet\bullet}$ as the total average of all observations. To test the null hypothesis

$$H_0 : \mu_1 = \mu_2 = \ldots = \mu_p$$

$$H_1 : \text{at least one pair differs}$$

we can make use of the assumption of equal variances. The main idea is to construct two estimators of $\sigma^2$ to make inference. First construct an estimator of $\sigma^2$ which is unbiased regardless of the null hypothesis. Then construct another estimator of $\sigma^2$ which is unbiased only under $H_0$. If the observed ratio of the two estimates deviates significantly from 1 we reject $H_0$. Let $X_{ij}$ be the random variable which corresponds to $x_{ij}$, $j = 1, 2, \ldots, n_i$ and $i = 1, 2, \ldots, p$. Further, let $\mathbf{X}$ be the vector containing all the variables $X_{ij}$, $j = 1, 2, \ldots, n_i$ and $i = 1, 2, \ldots, p$.

Each of $s_1^2(\mathbf{X}), \ldots, s_p^2(\mathbf{X})$ is an unbiased estimator of $\sigma^2$, regardless of whether the null hypothesis is true or not. We can pool all the estimators to obtain one

$$s_e^2(\mathbf{X}) = \frac{\sum_{i=1}^{p} (n_i - 1)\, s_i^2(\mathbf{X})}{\sum_{i=1}^{p} (n_i - 1)} = \frac{\sum_{i=1}^{p} (n_i - 1) \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\bullet})^2}{N - p} = \frac{\sum_{i=1}^{p} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i\bullet})^2}{N - p} = \frac{SSE}{N - p} = MSE.$$

The abbreviation SSE stands for sum of squared errors, and MSE for mean squared error. One can also show that $SSE/\sigma^2 \sim \chi^2(N - p)$.

The second estimator which is unbiased only under null hypothesis is

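$$S_A^2(\mathbf{X}) = \frac{\sum_{i=1}^{p} \sum_{j=1}^{n_i} (\bar{X}_{i\bullet} - \bar{X}_{\bullet\bullet})^2}{p - 1} = \frac{\sum_{i=1}^{p} n_i (\bar{X}_{i\bullet} - \bar{X}_{\bullet\bullet})^2}{p - 1} = \frac{SSA}{p - 1} = MSA.$$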

The abbreviation SSA stands for factor A sum of squares, and MSA for factor A mean square. Moreover, one can prove that, under $H_0$, $SSA/\sigma^2 \sim \chi^2(p - 1)$, and that SSA is independent of SSE. Further, it is possible to show that if the null hypothesis is violated, the bias of MSA is positive, hence it overestimates the true value $\sigma^2$.

Now we can build the ratio

$$T(\mathbf{X}) = \frac{MSA}{MSE} \overset{\text{under } H_0}{\sim} F(p - 1, N - p).$$

Using these distribution results, we reject the null hypothesis if $P(T(\mathbf{X}) \ge T(\mathbf{x})) = p\text{-value} \le \alpha$.
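The construction of the two estimators and their ratio can be sketched in Python as follows. The function name and the three small groups are hypothetical, and the sketch stops at the observed ratio and its degrees of freedom. Computing the p-value would additionally require the cdf of the F-distribution.

```python
def one_way_anova(groups):
    """Return (F, df1, df2) where F = MSA / MSE for a list of samples."""
    p = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # SSE: variation within the groups around their own means.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # SSA: variation of the group means around the total average.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    mse = sse / (n_total - p)
    msa = ssa / (p - 1)
    return msa / mse, p - 1, n_total - p


# Hypothetical observations from three populations.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_obs, df1, df2 = one_way_anova(groups)
print(f_obs, df1, df2)
```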

2.2.4.3 Normality assumption violation

In both introduced test procedures, we assume that the underlying distributions are normal, and we use normality to construct the reference variable and the test variable. If we cannot assure that a data set comes from a normal distribution, then we cannot guarantee the correctness of the introduced methods.

By considering our test variables from the previous sections:

$$T(\mathbf{X}, \mathbf{Y}) = R_{\mu_1-\mu_2}(\mathbf{X}, \mathbf{Y}) = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{s_x^2}{n_1} + \frac{s_y^2}{n_2}}},$$

$$T(\mathbf{X}) = \frac{MSA}{MSE},$$

we see that both test variables depend on the data through sums or averages of random variables.

The Central Limit Theorem implies that, as n → ∞, the distribution of suitably standardized sums and averages approaches a normal distribution regardless of the distribution of the individual random variables, provided that these are independent and identically distributed with finite variance. Hence, with a large number of observations, we can perform the inference in an approximate way.


2.3 Stochastic Simulation

Now we continue by describing how probability and statistics are used in stochastic simulation. A probabilistic model has some stochastic properties. One aim of a simulation is to build a model which captures these stochastic properties and to study the flow of the model over time. There are two ways of simulating a stochastic system, continuous and discrete.

Both ways offer the ability to know the history of the system at each time step. A continuous system considers small time steps, where variables evolve in continuous time. This way of simulating regards time as a continuous variable, so a variable can change within an infinitesimally small time step. An example of a continuous system is a car moving on a road, where the variable velocity changes continuously (Ross 2012).

In the discrete event approach we have two key elements, variables and events. This way of simulating regards time as a discrete variable. Events occur at separated points in time; whenever an event occurs the values of the variables are updated, and the values remain unchanged between events. An example of a discrete system is a grocery shop, since variables such as the number of customers in the shop change only when customers arrive or depart (Law and Kelton 2000, p. 3-6).

2.3.1 Pseudorandom numbers

One important basis for a simulation study is the ability to generate random numbers with a specified distribution. The modern approach to simulating random numbers is to use a computer, which successively generates pseudorandom numbers. Pseudorandom numbers have the appearance of being observations from independent random variables, even though they are deterministically generated (Ross 2012, p. 39). Generating pseudorandom numbers starts with an initial value $x_0$, called a seed, and then recursively computes successive values $x_n$, $n \ge 1$:

$$x_n = g(x_{n-1}, x_{n-2}, \ldots) \qquad (2.1)$$

where $g(\cdot)$ is a general function. The resulting quantity is called a pseudorandom number, which is taken as an approximation of a value from a uniformly distributed random variable on the interval (0,1). The approach in Equation 2.1 is a general generator. A battery of methods can be used to transform the uniform random numbers to any desired distribution. One of the distributions used in this thesis is the Bernoulli distribution, which takes the value 1 with probability p, and 0 otherwise. To simulate from such a distribution, we first simulate a uniform number. If the simulated number is smaller than p, we set the value of the Bernoulli variable to 1, otherwise we set it to 0. For further reading on additional methods we refer to the literature on simulation, such as Ross (2012) or Sokolowski and Banks (2010).
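As an illustration of Equation 2.1 and of the Bernoulli procedure described above, the following Python sketch uses one specific and well-known choice of g, a multiplicative congruential generator with multiplier 16807 and modulus 2³¹ − 1. This particular generator is an assumption made for the example and is not necessarily the one used by the software in this thesis; the function names are hypothetical.

```python
def lcg(seed, a=16807, m=2**31 - 1):
    """Yield pseudorandom numbers in (0, 1) from x_n = a * x_(n-1) mod m.

    The seed must be an integer between 1 and m - 1.
    """
    x = seed
    while True:
        x = (a * x) % m
        yield x / m


def bernoulli(uniforms, p):
    """Return 1 if the next uniform number is smaller than p, otherwise 0."""
    return 1 if next(uniforms) < p else 0


gen = lcg(seed=1)
first = next(gen)  # first pseudorandom number after the seed
draws = [bernoulli(gen, 0.3) for _ in range(10000)]
print(first, sum(draws) / len(draws))
```

Re-running the sketch with the same seed reproduces exactly the same sequence, which is what makes the numbers pseudorandom rather than random.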

2.3.2 Bootstrap

Creating a simulation model often starts with an analysis of an observed sample from a random variable. Then one has to choose one of two approaches. The first approach is nonparametric and uses the sample data directly. The second approach is parametric and aims to determine a probability distribution of the underlying random variable (Sokolowski and Banks 2010, p. 25-48).

To make use of data in a simulation setting, we need knowledge about the correct distribution and the underlying random model. The true random model is not attainable in reality, but we can recreate the random model in an approximate way by using the bootstrap.

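Formally, following Davison and Hinkley (1997, p. 11), let $y_1, \ldots, y_n$ denote a single, homogeneous sample of data. The sample values can be seen as the outcomes of independent and identically distributed (iid) random variables $Y_1, \ldots, Y_n$, whose probability density function and cumulative distribution function we denote by $f$ and $F$, respectively.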

There are two main types of bootstrap, nonparametric and parametric. In the parametric bootstrap we have a particular mathematical model, with adjustable constants or parameters ψ that fully determine f. Statistical methods based on this model are called parametric bootstrap methods. When no such mathematical model is used, the statistical analysis is nonparametric. In the nonparametric method we do not have any prior information about the underlying distribution; we only assume that the random variables $Y_i$ are iid.

The empirical cumulative distribution function (ecdf) plays an important role in the nonparametric bootstrap. The ecdf puts equal probability $n^{-1}$ on each sample value $y_j$. The corresponding estimate of $F$ is the ecdf $\hat{F}$, which is defined as the sample proportion

$$\hat{F}(y) = \frac{\#\{y_j \le y\}}{n},$$

where $\#\{A\}$ is the number of times that the event $A$ occurs. More formally we can define the ecdf by

$$\hat{F}(y) = \frac{1}{n}\sum_{j=1}^{n} H(y - y_j),$$

where $H(u)$ is the unit step function which jumps from 0 to 1 at $u = 0$.

Using the inverse transform method (Ross 2012, p. 25-48), one can show that sampling from the ecdf is equivalent to sampling with replacement from the original data set.
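The ecdf and the equivalent resampling procedure can be sketched in Python as follows. The function names and the small data set are hypothetical; the random number generator is passed in explicitly so that a run can be reproduced from a seed.

```python
import random


def ecdf(sample):
    """Return the empirical cdf of the sample as a function of y."""
    n = len(sample)

    def f_hat(y):
        # Proportion of sample values y_j with y_j <= y.
        return sum(1 for yj in sample if yj <= y) / n

    return f_hat


def bootstrap_sample(sample, rng):
    """Draw n values with replacement, each with probability 1/n."""
    return [rng.choice(sample) for _ in sample]


# Hypothetical observed sample.
data = [3.0, 1.0, 2.0, 2.0]
f_hat = ecdf(data)
resampled = bootstrap_sample(data, random.Random(0))
print(f_hat(2.0), resampled)
```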

2.3.3 Bootstrap block resampling

In general, bootstrapping techniques require independent data. But sometimes we want to recreate observed values which are time dependent, e.g., values that come from a specific time series. In this case, the simulation mechanism needs to take the time domain into account. An example of such a setting is resampling of a time-dependent arrival process.

The simplest version of bootstrap block resampling, which can be applied, e.g., to stationary or periodic time series, is to divide the data into $b$ non-overlapping blocks of length $l$, where we suppose that $n = bl$. We set $z_1 = (y_1, \ldots, y_l)$, $z_2 = (y_{l+1}, \ldots, y_{2l})$, and so forth.

This gives the blocks $z_1, \ldots, z_b$. The procedure is then to take a bootstrap sample with equal probabilities $b^{-1}$ from the $z_j$, and to paste these end-to-end to form a new series (Davison and Hinkley 1997).
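The block resampling procedure described above can be sketched in Python as follows. The function name and the example series are hypothetical, and the sketch assumes, as in the text, that the series length is an exact multiple of the block length.

```python
import random


def block_bootstrap(series, block_length, rng):
    """Resample non-overlapping blocks with replacement and paste them end-to-end."""
    n = len(series)
    if n % block_length != 0:
        raise ValueError("series length must be a multiple of the block length")
    # z_1 = (y_1, ..., y_l), z_2 = (y_(l+1), ..., y_(2l)), and so forth.
    blocks = [series[i:i + block_length] for i in range(0, n, block_length)]
    new_series = []
    for _ in range(len(blocks)):
        new_series.extend(rng.choice(blocks))  # each block has probability 1/b
    return new_series


# Hypothetical series of n = 12 values split into b = 4 blocks of length l = 3.
series = list(range(12))
new_series = block_bootstrap(series, 3, random.Random(1))
print(new_series)
```

The order of the observations within each block is preserved, which is how the short-range time dependence is carried over to the new series.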

2.4 Discrete Event Simulation

Discrete Event Simulation (DES) is a simulation technique based on the idea of discrete events. DES uses variables and events to follow a model over time and determine a numerical quantity of interest. Following Ross (2012), we introduce the basic model, which uses three kinds of variables: a time variable, counter variables and system state variables. The time variable, t, refers to the amount of (simulated) time that has elapsed. Counter variables keep a count of the number of times that certain events have occurred by time t. System state variables describe the “state of the system” at time t. Together these variables hold the information needed to simulate in discrete time steps.

An event list contains the upcoming events and the times at which they occur. The values of the time, counter and system state variables are updated when an event occurs, and the output of interest can then be collected. This makes it possible to track the evolution of the system without looking at all time points during the simulation process (Ross 2012).
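To make the roles of the time variable, the counter variables, the system state variable and the event list concrete, the following Python sketch simulates a single-server queue with first-in-first-out service. It is a minimal sketch under our own assumptions: arrival and service times are given as input lists, and the function and variable names are hypothetical. It is not the SimEvents model used in this thesis.

```python
import heapq
from collections import deque

ARRIVAL, DEPARTURE = 0, 1  # event kinds; on ties, arrivals are processed first


def simulate_single_server(arrival_times, service_times):
    """Simulate a FIFO single-server queue; return departure times and counters."""
    # Event list: (event time, event kind, entity id), ordered by event time.
    event_list = [(t, ARRIVAL, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(event_list)

    t = 0.0            # time variable
    n_arrivals = 0     # counter variable
    n_departures = 0   # counter variable
    in_system = 0      # system state variable: number of entities in the system
    waiting = deque()  # entities waiting for the server
    departures = {}

    while event_list:
        # Jump directly to the next event; nothing changes in between.
        t, kind, entity = heapq.heappop(event_list)
        if kind == ARRIVAL:
            n_arrivals += 1
            in_system += 1
            if in_system == 1:  # server idle: start service immediately
                heapq.heappush(event_list, (t + service_times[entity], DEPARTURE, entity))
            else:
                waiting.append(entity)
        else:
            n_departures += 1
            in_system -= 1
            departures[entity] = t
            if waiting:  # start service for the next entity in the queue
                nxt = waiting.popleft()
                heapq.heappush(event_list, (t + service_times[nxt], DEPARTURE, nxt))
    return departures, n_arrivals, n_departures


departures, n_arr, n_dep = simulate_single_server([0.0, 1.0, 2.0], [2.0, 2.0, 2.0])
print(departures, n_arr, n_dep)
```

The simulated clock only takes the values at which an event occurs, which is the property described in the paragraph above.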


2.4.1 Warm-up period

In many situations when we observe a real-world system, we consider an ongoing process in a steady state, where entities are always present. Sometimes, however, a simulation model starts with an empty system and one needs to consider a warm-up period, during which the entities arrive at an empty model. Therefore, it is suitable to identify a point in the simulation time at which one can show that the system has reached a steady state. Sokolowski and Banks (2010, p. 48-55) suggest that the time from simulation start until a steady state is achieved has to be deleted to accurately measure performance. The simulation should still start at time zero, but performance data are only collected after the warm-up period ends.
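Deleting the warm-up period from collected output can be sketched as follows. The function name, the observation format of (time, value) pairs and the example values are hypothetical, and the end of the warm-up period is assumed to have been identified beforehand.

```python
def mean_after_warmup(observations, warmup_end):
    """Average the observed values collected at or after the end of the warm-up period.

    observations is a list of (simulation time, value) pairs.
    """
    kept = [value for time, value in observations if time >= warmup_end]
    if not kept:
        raise ValueError("no observations after the warm-up period")
    return sum(kept) / len(kept)


# Hypothetical output: the first two observations belong to the warm-up period.
observations = [(0.0, 10.0), (1.0, 8.0), (2.0, 4.0), (3.0, 6.0)]
steady_mean = mean_after_warmup(observations, warmup_end=2.0)
print(steady_mean)
```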

2.5 Software

2.5.1 Software usage

Several software programs are used in this thesis. SAP is used for extracting data. We mainly use MATLAB R2015b and the Simulink toolbox, more specifically the SimEvents library. This software has statistical features and is used to create discrete event simulation models. For further reading, we refer to Mathworks (2015a).

2.5.2 SimEvents library

SimEvents is a MATLAB Product within the Simulink Toolbox. SimEvents is a simulation tool for creating discrete event simulation of random systems. In this thesis, SimEvents is used to create a simulation model of the global support process.

As stated in Section 2.4, a discrete event simulation aims to model the behaviour of entities as they pass through a system. SimEvents uses blocks with functions which allow the user to build a system. SimEvents also allows users to collect signals describing the system status, which allows continuous analysis of the model. Moreover, SimEvents is connected to MATLAB, which gives access to MATLAB functionality and the possibility to write your own functions.

Here we give a short introduction to the SimEvents library and the most common blocks we use in our simulation.

2.5.2.1 SimEvents blocks

We start by presenting a simple model to describe the SimEvents modelling approach. In Figure 2.1 we can see three common blocks: entity generator, entity sink and signal scope. The entity generator creates the entities whose path is modelled, the sink declares the end of an entity path, and the signal scope is used to track signal values during the simulation.


Notice that entity paths are described by arrows as shown in Figure 2.2, while signal paths are described by arrows as shown in Figure 2.3.

Figure 2.2: Arrow describing entity path in SimEvents.

Figure 2.3: Arrow describing signal path in SimEvents.

We continue by describing the common SimEvents blocks we use in this thesis.

Generators

Figure 2.4 shows the SimEvents block that defines a time-based entity generator, which creates entities with a specified inter-arrival time.

Figure 2.4: SimEvents block Time-Based Entity Generator.

Sink

A sink determines the end of an entity path. The block is called Entity Sink, see Figure 2.5.

Figure 2.5: SimEvents block Entity Sink.

Servers

A server block represents an operation that needs to be performed on the entities during a given time. In our thesis we use three types of server blocks: Single Server, N-Server and Infinite Server, which are seen in Figure 2.6.

Figure 2.6: SimEvents blocks: Single-, N- and Infinite Server.

Queues

A server can cause the entity path to be blocked, leading to a need for a place to wait. Queue blocks are used for entities that are blocked. In our thesis we use two types of queue blocks: FIFO (first in, first out) Queue and Priority Queue, which can be seen in Figure 2.7.

Figure 2.7: SimEvents blocks: FIFO- and Priority Queue.

Output switch and path combiner

Output Switch and Path Combiner allow users to direct and combine several paths in the process. The blocks can be seen in Figure 2.8.

Figure 2.8: SimEvents blocks: Output Switch and Path Combiner.


Attributes

Attributes are properties of entities, which describe some characteristic of the entity. Attributes are useful when defining different progression in the system, e.g. priority in queues. SimEvents allows setting, reading and modifying attributes. The attribute blocks are seen in Figure 2.9.

Figure 2.9: SimEvents blocks: Set Attribute, Attribute Function and Get Attribute.

Event-Based Sequence

The Event-Based Sequence block generates a sequence of numbers from a specified column vector. This vector can either be described in the block dialog box or be predefined in the MATLAB workspace and used as an input parameter to the block, seen in Figure 2.10.

Figure 2.10: SimEvents blocks: Event-Based Sequence.

Timer and schedule

Start Timer and Read Timer allow the user to measure the time it takes for an entity to travel a specified path. Schedule Timeout is a time stamp tagged to an entity. When the timeout occurs the entity leaves the queue or server. Cancel Timeout removes the timeout tag from an entity. The graphical representation of the blocks is presented in Figure 2.11.

Figure 2.11: SimEvents blocks: Start Timer, Read Timer, Schedule Timeout and Cancel Timeout.

Resources

Resource blocks allow the user to define a resource pool. Resources can be specific to certain entities. The resource pool can be shared among several blocks. After a resource has been in use, it can either be returned (reusable) or destroyed (disposable). The resource blocks are Resource Pool, Resource Acquire and Resource Release, as seen in Figure 2.12.

Figure 2.12: SimEvents blocks: Resource Pool, Resource Acquire and Resource Release.

Gate

A gate is a block that conditionally opens or closes the path it is placed on. An external signal determines whether the gate is open or closed. The block Enable Gate is presented in Figure 2.13.

Figure 2.13: SimEvents block: Enable Gate.

Clock

We can access the Simulink clock, which enables us to keep track of the time of day. This is useful in many situations. As an example, consider a subway station where the subway arrives every hour and picks up passengers. We can construct this model by using an Enable Gate and a Clock. The SimEvents block called Clock is seen in Figure 2.14.

Figure 2.14: SimEvents block: Clock.


MATLAB Function

A MATLAB Function block opens a MATLAB function and allows the user to decide the input and output signals. This block also enables the user to perform MATLAB actions on the signal and adjust it according to a specification. The SimEvents block called MATLAB Function is seen in Figure 2.15.

Figure 2.15: SimEvents block: MATLAB Function.

To Workspace

The To Workspace block writes event-based signals to the MATLAB workspace. This block is used to collect simulation results. The block writes the times and values of the signals connected to the block, which allows the results to be exported to the MATLAB workspace for further analysis. The To Workspace block is seen in Figure 2.16.

Figure 2.16: SimEvents block: To Workspace.

By combining SimEvents with the MATLAB environment it is possible to create very sophisticated simulation models. We have provided common blocks and a simple model from SimEvents to improve readability and understanding of this thesis. For further reading about SimEvents, we refer to its documentation, Mathworks (2015b).


3 Data

We want to create a simulation model of a global support process. In order to build a simulation model we need to collect data from a real-world process. This chapter describes the observed data and processing of this data.

3.1 Observed Data

The case company has provided data on the CSR flow in the support process. In Section 1.3 we present examples of variables that are observed and collected for each CSR. The data collection for each CSR is a mixture of automatically collected data and manually reported data. The data regarding a CSR is stored in a database. The stored data is homogeneous, which has facilitated the data management.

In total there are 180 characteristics and 145 key figures recorded. A characteristic is data that gives information about a CSR and is either descriptive, e.g. the name of the customer, or binary, e.g. whether a CSR has passed a support level or not. A key figure is numerical information, e.g. the amount of time spent on support handling.

Information about the CSR is stored in different tables. The tables give different perspectives on a CSR's progress through the global support. In this thesis we have considered three tables to replicate the support process. The three data tables are performance, time reporting and change history. These tables are described in detail below, and an example is given to clarify the structure of the data in each table.

3.1.1 Performance table

The CSR performance table is structured with data describing each CSR. The performance table holds information that summarizes the CSR flow, from creation until closing. This table stores CSR data such as priority, customer country, product domain and more. It also stores numerical data such as the number of days until predefined subtasks are performed and the total time spent on support handling by local and global units. It also contains the duration of a CSR, i.e. the time elapsed between creation and closing of a CSR.

We consider the following variables from performance table:

• CSR ID - for identification of a CSR.

• Date and time for creation and closing of the CSR.

• Priority - CSRs have different priority which specifies the magnitude of the customer problem. Priority defines the service time which the customer can expect.

• Customer Region - Defines the customer region that the CSR originates from. During the situation analysis we discovered that different customer regions experience different types of problems.

• Sum of CU support time - Total time logged to a CSR, by CU support level. This is sometimes referred to as sum of processing time.

• Sum of global support time - Total time logged to a CSR, by global support level. Global support level is a combination of GSC and PLM support level.

• Duration in days - Duration in days from CSR registration until it is closed. This is

