
Automatic vs. Manual Data Labeling
A System Dynamics Modeling Approach

CLAS BLANK

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT
STOCKHOLM, SWEDEN 2020


Automatic vs. Manual Data Labeling

A System Dynamics Modeling Approach

by

Clas Blank

Master of Science Thesis TRITA-ITM-EX 2020:382
KTH Industrial Engineering and Management
Industrial Management
SE-100 44 STOCKHOLM


Automatisk kontra manuell dataannotering med systemdynamiksmodellering
(Automatic vs. Manual Data Annotation with System Dynamics Modeling)

by

Clas Blank

Master of Science Thesis (Examensarbete) TRITA-ITM-EX 2020:382
KTH Industrial Engineering and Management
Industrial Economics and Management
SE-100 44 STOCKHOLM


Automatic vs. Manual Data Labeling
A System Dynamics Modelling Approach

Clas Blank

Approved: 2020-06-30
Examiner: Gustav Martinsson
Supervisor: Bo Karlsson
Commissioner: Klarna AB
Contact person: Stefan Magureanu

Abstract

Labeled data, a collection of data samples that have been tagged with one or more labels, plays an important role in many software organizations in today's market. It can help in solving automation problems, training and validating machine learning models, or analysing data. Many organizations therefore set up their own labeled data gathering system which supplies them with the data they require. Labeling data can be done either by humans or via some automated process. However, labeling datasets comes with costs to these organizations. This study examines what this labeled data gathering system could look like and determines which components play a crucial role in how costly an automatic approach is compared to a manual approach, using the company Klarna's label acquisition system as a case study. Two models are presented: one describes a system that solely uses humans for data annotation, while the other describes a system where labeling is done via an automatic process. These models are used to compare the costs to an organization of taking each approach. Important findings include the identification of the components that affect which approach would be more economically efficient to an organization under certain circumstances. Some of these important components are the label decay rate, the automatic and manual expected accuracy, and the number of data points that require labeling.

Keywords

System Dynamics, Modeling, Data Annotation, Data Labeling, Cost Comparison


Automatisk kontra manuell dataannotering med systemdynamiksmodellering
(Automatic vs. Manual Data Annotation with System Dynamics Modeling)

Clas Blank

Approved: 2020-06-30
Examiner: Gustav Martinsson
Supervisor: Bo Karlsson
Commissioner: Klarna AB
Contact person: Stefan Magureanu

Sammanfattning (Abstract in Swedish)

Annotated data, a collection of data points that have been tagged with one or more labels, plays an important role for many software companies in today's market. It can help in solving automation problems, training and validating machine learning models, or analyzing data. Many organizations therefore set up their own data annotation systems that can deliver the annotated data needed within the organization. Annotation can be done by humans, but can also be done via an automated process. However, annotating data comes with costs for the organization. This study examines what such a data annotation system can look like and analyzes which components play a significant role when the costs of an automated system and a manual system are to be compared. Klarna's data annotation system is used as a case study. Two models are presented, one of which describes a system where only manual annotation work is performed, while the other describes a system where data annotation is carried out via an automatic process. Important results of this study include the identification of influential model parameters when comparing the economic efficiency of the two annotation strategies. Examples of these components are the label decay rate, the expected manual/automatic accuracy, and the amount of data that needs to be annotated.

Keywords

System Dynamics, Modeling, Data Annotation, Cost Comparison


Contents

1 Introduction
  1.1 General Introduction
  1.2 Research Questions
2 Background
  2.1 The Company Klarna
  2.2 Manual Labeling
  2.3 Automatic Labeling
  2.4 Web Automation
3 System Dynamics Modeling Theory
  3.1 System Dynamics
  3.2 Building System Dynamics Models
  3.3 System Dynamics Model Testing
    3.3.1 Tests of Model Structure
    3.3.2 Structure-Verification Test
    3.3.3 Parameter-Verification Test
    3.3.4 Extreme-Conditions Test
    3.3.5 Boundary-Adequacy (Structural) Test
  3.4 Tests of Model Behavior
    3.4.1 Behavior-Reproduction Tests
    3.4.2 Behavior-Prediction Tests
    3.4.3 Behavior-Anomaly Test
    3.4.4 Surprise-Behavior Test
    3.4.5 Extreme-Policy Test
    3.4.6 Boundary-Adequacy (Behavior) Test
    3.4.7 Behavior-Sensitivity Test
    3.4.8 Tests of Policy Implications
    3.4.9 System-Improvement Test
    3.4.10 Changed-Behavior-Prediction Test
    3.4.11 Boundary-Adequacy (Policy) Test
    3.4.12 Policy-Sensitivity Test
  3.5 Qualitative Model Evaluation
4 Research Methodology
  4.1 Model Development
  4.2 Model Testing
  4.3 Previous Studies
  4.4 Software For System Dynamics Modelling
5 Model Description
  5.1 Model Purpose
  5.2 Target Audience
    5.2.1 Example Case: Cart Scraping
  5.3 Manual Data Annotation System Description
  5.4 Automatic Data Annotation System Description
  5.5 Model Output
  5.6 Component of the System Dynamics Model
  5.7 Variables
  5.8 Combined Models
  5.9 Model Architecture
    5.9.1 Time Components
    5.9.2 Analyst Related Components
    5.9.3 Label Components
    5.9.4 Tool Related Components
    5.9.5 Analyst Efficiency and Available Time Components
    5.9.6 Total Costs Components
    5.9.7 Automatic Costs Components
6 Test Results
  6.1 Extreme Conditions Test
    6.1.1 Label Requests
    6.1.2 Analyst Loss Rate
    6.1.3 Effective Time Per Analyst
    6.1.4 Label Decay Rate
    6.1.5 Expected Manual Accuracy
  6.2 Sensitivity Tests
    6.2.1 Expected Manual Accuracy
    6.2.2 Label Decay Rate
  6.3 Structure Verification Test
    6.3.1 Re-Labeling and Validation Steps
    6.3.2 Independent Labels
7 Annotation Approach Cost Comparison
  7.1 Case Scenarios
  7.2 Component Value Variation
8 Discussion
  8.1 Model Development and Testing
  8.2 Model Output and Usage
    8.2.1 Model Output
    8.2.2 Data
    8.2.3 Importance of Labels
    8.2.4 Using Averages
    8.2.5 Organizational Decision Making
  8.3 Side Effects
  8.4 Sustainability Aspect
9 Conclusions
Bibliography
A Stock and Flow Diagrams

List of Figures

3.1 Simple system that shows the basic components of a system dynamics model.
3.2 Graph that shows how rabbit population changes over time.
5.1 Example shopping cart from IKEA.com/se.
5.2 Flow chart of the states in the labeling process at Klarna.
5.3 Comparison of the degeneration of accuracy with varying parameter values.
5.4 Shows model components that are related to time requirements for labeling and validation.
5.5 Shows model components that are related to analysts in the system.
5.6 Shows model components that are related to the states of labels.
5.7 Shows model components that are related to the annotation tool.
5.8 Shows model components that are related to analyst time efficiency.
5.9 Shows model components that are related to the total costs for the organization.
5.10 Shows model components that are related to costs of the automatic process.
5.11 Shows model components that are related to accuracy of the automatic labeling process.
6.1 The effect on the number of analysts in the system when the number of incoming label requests is set to an extreme value. (y-axis: Number of Analysts)
6.2 Comparison of the Analyst Set-Up Cost with different Loss Rates. (y-axis: €)
6.3 The effect on labeling tasks in the system when effective time is set to 0. (y-axis: Number of tasks in each state)
6.4 Comparison of the Number of Analysts with different Decay Rates. (y-axis: Number of analysts)
6.5 Annotation task flow when Expected Manual Accuracy is set to 0%. (y-axis: Number of labels)
6.6 Value of the Labeled Bank when the Expected Manual Accuracy is varied.
6.7 Value of the Number of Analysts when the Expected Manual Accuracy is varied.
6.8 Value of the Number of Analysts when the Label Decay Rate is varied.
6.9 Value of the Labeled Bank when the Label Decay Rate is varied.
7.1 Two scenarios with different values of certain model components. (Manual Approach green, Combined Approach blue, y-axis: €)
7.2 Cost comparison when the number of incoming label requests is varied. (Manual Approach green, Combined Approach blue, y-axis: €)
7.3 Cost comparison when the label decay rate is varied. (Manual Approach green, Combined Approach blue, y-axis: €)
7.4 Cost comparison when the analyst hourly rate is varied. (Manual Approach green, Combined Approach blue, y-axis: €)
7.5 Cost comparison when the time required per labeling is varied. (Manual Approach green, Combined Approach blue, y-axis: €)
A.1 Stock and Flow Diagram of the Manual Labeling Process.
A.2 Simple Stock and Flow Diagram of the Initially Conceptualized Automatic Labeling Process.
A.3 Stock and Flow Diagram of the Combined Automatic/Manual Labeling Process With Re-Labeling.
A.4 Stock and Flow Diagram of the Combined Automatic/Manual Labeling Process Without Re-Labeling.

List of Tables

5.1 Table that divides model components into stocks, flows, and variables.
5.2 Table that divides model components into endogenous and exogenous.
7.1 Model Component Values

1 Introduction

1.1 General Introduction

Many modern software companies use 3rd party or in-house manual labeling facilities for gathering labeled data. These facilities provide important data that is used for automation as well as for creating datasets for statistical analysis, training of machine learning models, etc. Another way to label these datasets is to use automatic processes, for example heuristic methods and/or machine learning methods. Letting computers do the labeling work is generally more efficient than having humans do the work. However, for many labeling problems the automated methods might not be sufficiently accurate or feasible using state-of-the-art techniques, and consequently organizations might have to spend years of development time developing and improving these techniques.

Organizations have to decide whether to spend resources and time trying to acquire the appropriate competencies and then develop potentially novel methods to solve the problem at hand, or whether they should use 3rd party human labelers and allocate development resources to other activities.

In this study, the economic efficiency and viability of 3rd party or in-house manual labeling services and automated methods will be discussed and compared according to a defined set of parameters, e.g. the number of engineers working on the systems, the type of labels being generated (lengths of workflows, time required to produce a label), the required dataset size, the rate of dataset viability decay, development set-up costs, etc. The aim is to deliver a framework that will enable decision makers in organizations interested in collecting labeled web data to make swift decisions regarding whether or not to automate a labeling process, as well as identifying the cost units for manual and automatic labeling respectively. For this study, the company Klarna's label acquisition infrastructure will be used as a case study. To compare a manual label gathering approach to an automatic approach, system dynamics modelling will be used. System dynamics modeling was chosen because of its ability to model and understand complex systems with multiple components and connections between the components. Moreover, the label gathering system has clear outputs and inputs which can be represented using system dynamics modeling. Additionally, there are many software tools available that facilitate using and conducting tests on system dynamics models. Lastly, the author of this thesis has previous experience in using system dynamics modeling and hypothesised that it can be a useful method to analyse the labeled data gathering system.

The results of this study are interesting for any organization or individual that requires labeled datasets for their activities, and are especially useful for organizations that require web-page labeling. The decision to use a 3rd party, or an in-house, label gathering pipeline can have important implications for organizations as it could possibly affect how they are structured.


Having this type of model comes with many benefits. For example, the model would assist organizations interested in acquiring labeled web page datasets in choosing which direction to take. It would also help organizations that already have dataset labeling acquisition systems in place to analyze the behavior of the system, and parts of the system, over time.

1.2 Research Questions

The goal of this study is to analyze and compare two ways of gathering labeled web page datasets. The two approaches are to let human analysts do the labeling, or to develop automatic processes that are capable of labeling the web pages with an acceptable efficacy. This study will use system dynamics modelling to compare the two approaches, which will provide a way for decision makers to analyze behaviors of the system.

The research questions that will be considered in this thesis are as follows:

1. Which components should be incorporated in system dynamics models that compare costs of manual and automatic labeling?

2. How could the components in a system dynamics model that aims to compare cost of manual and automatic labeling be connected and how do they affect each other?

3. What data needs to be gathered to reasonably assess the variable values of the components in the models?

4. How can a manual and an automatic approach to data labeling be compared with the help of system dynamics models?


2 Background

Section 2.1 will provide some background about the host company. Sections 2.2-2.3 will describe the distinction between automatic and manual labeling, and provide some background on the two approaches. Section 2.4 will provide background on where and how web automation has been used in the past.

2.1 The Company Klarna

Klarna is a company founded in 2005 in Stockholm, Sweden. The main purpose of the company is to improve the experience for online shoppers and to simplify how online shoppers interact with online merchants. Today, Klarna is one of Europe's largest banks and provides payment solutions for consumers in multiple countries and continents.

Currently, Klarna uses a 3rd party manual data labeling system to acquire annotated data. One of the main use cases within Klarna for annotated data is automating user interactions with merchant web pages, e.g. filling in shipping information and gathering product-related information from shopping carts to be used in the integrated Klarna shopping application. The annotated data is used to build an automated system for collecting this information, which will be further elaborated on in chapter 5.

2.2 Manual Labeling

There are organizations today which focus solely on providing dataset labeling services. This is a labor intensive and time consuming process, and is thus often placed in low-cost countries like India, China, Nepal, and the Philippines. Estimates say that the global market value for data labeling was $500 million in 2018 and will pass $1.2 billion in 2023 [1]. Some large software corporations currently have services that supply data labeling through their online platforms. For example, the company Amazon offers the Ground Truth [2] service, which uses a combination of manual labeling and automatic labeling to be able to label large quantities of data for their customers. Ground Truth uses manually labeled data to train machine learning models so that it will be able to handle data labeling automatically. Ground Truth sends individual data points that the models find hard to identify to human labelers. Furthermore, Microsoft offers some services to facilitate labeling projects through their machine learning platform Azure. For example, they offer software applications to structure labeling projects and also have applications for "ML (Machine Learning) assisted labeling" which uses transfer learning from already trained models [3].


Another technique of manual labeling is crowdsourced labeling. In crowdsourcing, a group of people are asked to contribute to performing a task that cannot be done individually with the same ease [4]. In crowdsourced labeling, the labelers may or may not be compensated for the work they do. Moreover, the participants are not experts, which means that the problems are supposed to be simple and well-formed enough so that common sense is sufficient to identify the correct categories for the data points. Additionally, as a consequence of human error and bias it is essential to verify the reliability of the labels. Furthermore, since there is limited control over who participates in the crowd labeling project, there may be low-quality annotators, "spammers", that assign labels randomly, which affects the quality of the labels [5]. One way to account for the existence of spammers is to assign multiple labelers to every data point and to aggregate the result. This, however, assumes that a majority of the crowd are not spammers. [5] propose a method to rank participants according to a "spammer score" and evaluate individual labels according to this score.
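To make the aggregation idea concrete, the sketch below (not from the thesis, and using plain majority voting rather than the spammer-score method of [5]; the function name and example labels are invented for illustration) combines several crowd labels per data point and flags ties for further review.

```python
from collections import Counter

def aggregate_labels(crowd_labels):
    """Combine crowdsourced labels per data point by majority vote.

    crowd_labels maps a data point id to the list of labels assigned by
    different annotators. Ties are returned as None so they can be routed
    to an expert or to an additional labeling round.
    """
    aggregated = {}
    for point_id, labels in crowd_labels.items():
        counts = Counter(labels).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            aggregated[point_id] = None  # no clear majority
        else:
            aggregated[point_id] = counts[0][0]
    return aggregated

votes = {
    "page_1": ["price", "price", "quantity"],  # clear majority -> "price"
    "page_2": ["price", "quantity"],           # tie -> needs review
}
print(aggregate_labels(votes))  # {'page_1': 'price', 'page_2': None}
```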

2.3 Automatic Labeling

Automatic labeling refers to any data point labeling that is not conducted by humans. This could mean labeling by machine learning models, by heuristic approaches, or a combination of the two.

A heuristic approach refers to passing single data points through a predefined set of rules that determine the label. These rules are often set up by human experts that can recognize the underlying factors that determine the label of the data point. Heuristic approaches have the advantage of being cost efficient, since for each type of data point the rules can be set up by only a single or a few human experts. The labeling itself is also relatively efficient since each data point will be passed through a limited number of rules. However, if the structure of the data of interest is changing over time, these rules may become irrelevant or even faulty, which will decrease the accuracy of the labels or even render the algorithm unusable until the changes are accounted for. Furthermore, the data may be of such a nature that it is difficult to express these rules, or the experts do not know the individual algorithmic steps they themselves take to evaluate a data point. For example, humans have an easy time recognizing the difference between a dog and a cat in an image, but do not know exactly what steps the brain takes to make this distinction.
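As an illustration of the rule-based idea (a sketch only, not Klarna's actual process; the rules, field names, and element representation are hypothetical), a data point can be labeled by passing it through an ordered list of expert-written rules and returning the first label whose rule matches:

```python
# Each rule is a (label, predicate) pair written by a domain expert.
# A data point is represented here as a small dict describing an HTML element.
RULES = [
    ("price", lambda el: "price" in el.get("css_class", "") or el.get("text", "").endswith("kr")),
    ("quantity", lambda el: el.get("tag") == "input" and el.get("type") == "number"),
    ("product_name", lambda el: el.get("tag") in ("h1", "h2")),
]

def heuristic_label(element, rules=RULES, default="unknown"):
    """Return the first label whose rule matches the element, or a default label."""
    for label, predicate in rules:
        if predicate(element):
            return label
    return default

print(heuristic_label({"tag": "span", "css_class": "cart-price", "text": "249 kr"}))  # price
print(heuristic_label({"tag": "input", "type": "number"}))                            # quantity
```

If merchant pages change so that prices no longer carry a recognizable class or suffix, the first rule silently stops matching, which is exactly the kind of degradation described above.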

Machine learning models can be trained to recognize which label should be put on individual data points, which is often referred to as a categorization problem in machine learning. The general idea is that a model is given a single data point in the form of a set of input parameters, and outputs one of the predefined labels which it finds most likely. The difference between the machine learning approach and the heuristic approach is that instead of humans determining which rules should be used to determine the label, the model is supposed to learn these rules. However, for a model to be able to learn these heuristics, it needs a sample of pre-labeled data to be able to identify the differences between the categories. In some cases the model might need to see thousands of examples before it becomes acceptably accurate. This initial data sample will need to be labeled using other techniques. Furthermore, the data given to the model needs to represent the general data it later should make inferences from. If a machine learning model is able to be trained, it has the advantage of being somewhat adaptable to changes in the structure of the data, as long as the set of input parameters remains the same.
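A minimal sketch of that workflow is shown below, assuming scikit-learn is available; the seed snippets and labels are invented for illustration. A small manually labeled seed set is used to train a classifier, which can then assign labels to new data points automatically:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical seed data: text snippets from web elements and their manual labels.
seed_texts = ["Total price 249 kr", "Qty: 2", "Unit price 99 kr",
              "Quantity 1", "Price 10 kr", "Qty 3"]
seed_labels = ["price", "quantity", "price", "quantity", "price", "quantity"]

# The model learns the labeling rules from the pre-labeled sample ...
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(seed_texts, seed_labels)

# ... and can then label unseen data points automatically.
print(model.predict(["Price: 499 kr", "Qty: 5"]))
```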


2.4 Web Automation

The internet is an important part of many people's daily lives. Tasks that were traditionally carried out mainly in the physical world, or by a physical institution, have moved towards being done on the world wide web. Everything from managing finances and reading the news to shopping has online-based interfaces. With a lot of activity, there also comes opportunity for automating tasks that would simplify and speed up the everyday use of the services available online. Together with the fact that the web is entirely machine readable and manipulable, this makes web automation an increasingly interesting and relevant area of research.

Web automation can include automation of web interfaces, such as clicking links, filling forms, and extracting data. Some of these actions can be tedious if required to be done often in a repetitive manner, e.g. filling in shipping information when shopping online. Automating these types of processes could save users time in the long run - and in the example of shipping information - allow online merchants to provide a more pleasant experience to their online customers.

Automating processes on the web is a relatively new research area which potentially has large implications for how we use the internet. Nevertheless, there have been some attempts at automating complex tasks performed in the web browser. One particular problem with automating web-based tasks is that the action space - the possible actions that can be taken on a specific web page - is oftentimes large, which increases the complexity of finding the "correct" action. Another hurdle is that tasks that are to be automated often consist of a series of actions that need to be taken to reach the goal, and if an automation agent fails to complete only one of these actions, the task will not be successfully completed. One of the leading methods to create agents that are able to complete tasks on the web is reinforcement learning. Some previous research has used reinforcement learning with workflow-guided exploration on various web-based actions like clicking checkboxes, buttons, and links [6], [7].

Tightly related to automating processes on the web is extracting information from web pages. Attempts at information extraction from web pages have been made by utilizing the information in the hierarchical HTML structure represented by the document object model (DOM) tree. [8] presents an LSTM neural network architecture called RiSER for email classification that utilizes textual content, manual annotation features, and the DOM-tree structure of HTML-enriched emails. They show that the model built from the proposed architecture is capable of learning from the structural information contained in the DOM tree, which opens up the possibility of using the architecture in a wider set of application areas on problems where structural information is considered to be important for classification.


3 System Dynamics Modeling Theory

This chapter is aimed at providing some theory on system dynamics modeling: what it is, where it has been used, and important things to consider when developing a system dynamics model. It will also provide information on how a model builder could go about verifying the usefulness of, and building confidence in, such a model.

3.1 System Dynamics

System Dynamics is an area that aims to describe the behavior of and connections between components in a complex system. Oftentimes, projects in the context of organizations exceed the budget that was set in terms of costs or time limit. Managers might claim that these divergences are caused by factors that are outside of their control (external involvement, unforeseeable events, etc.) and refuse to take responsibility. However, if they had had more insight into the system that they are a part of as a whole, they possibly could have been able to foresee disrupting events in the project timeline and act accordingly to mitigate these events and lessen their effect on the project. System dynamics can mitigate this potential lack of information and understanding of the system by formalizing system structure, behavior, and subparts. It also provides a way to visualize systems through stock and flow diagrams and causal loop diagrams. These visualization standards allow for easier system communication and explanation. Further, system dynamics modeling can help to analyse why certain behaviors in systems occur and can potentially show how less desirable behaviors could be avoided.

A system in the system dynamics perspective is often described using stocks, flows, feedback loops, and links. Stocks are accumulations of certain kinds of information or material that flows through them. A flow is the input/output of a stock that describes how the quantities in the stocks change over time. The process of flows accumulating/decumulating in stocks is called integration [9].

Feedback is the transmission and return of information about the amount currently in the stocks back to the system's flows. This information then tells the flows in the system how the amount currently accumulated in the stocks affects their flow rate. Links are connections between the stocks, flows, and variables in the system that describe dependencies between them.

Stocks and flows are often conceptualized as having constraints associated with them, which means that they cannot fall below a certain minimum value or exceed a certain maximum value. For example, a rabbit population model that allows the number of rabbits to be negative or infinite could be seen as fundamentally flawed if the model tries to simulate a realistic scenario. Therefore, [9] suggests that model developers search for constraining factors on what the model's flows can process and/or what the model's stocks can accumulate.

Figure 3.1: Simple system that shows the basic components of a system dynamics model.

Figure 3.1 shows a simple system that simulates a rabbit population. The number of rabbits in the population is held in the stock in the middle of the figure, while the blue arrows in and out of the stock are the flows that describe the number of rabbits added to and removed from the population at a given point in time. The ovals are variables that determine the rates at which rabbits are added and removed. Lastly, the gray striped lines are links that represent a one-way dependency from the starting point of the link to its ending point. Figure 3.2 shows how the number of rabbits in the population changes over a given period of time with the birth rate variable set to 0.2, the death rate variable set to 0.1, and the initial number of rabbits in the population set to 100.

Figure 3.2: Graph that shows how rabbit population changes over time.
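A minimal numerical sketch of this model (not taken from the thesis; it simply re-implements the structure of Figures 3.1 and 3.2 with the stated parameter values, using Euler integration) shows how the stock integrates its flows over time and how the non-negativity constraint discussed above can be enforced:

```python
def simulate_rabbits(initial=100, birth_rate=0.2, death_rate=0.1, steps=20, dt=1.0):
    """Euler integration of one stock (rabbits) with two flows (births, deaths)."""
    rabbits = initial                  # the stock
    history = [rabbits]
    for _ in range(steps):
        births = birth_rate * rabbits  # inflow; depends on the stock (feedback)
        deaths = death_rate * rabbits  # outflow; depends on the stock (feedback)
        rabbits = max(rabbits + (births - deaths) * dt, 0.0)  # stock constrained to be non-negative
        history.append(rabbits)
    return history

# With birth rate 0.2 and death rate 0.1 the population grows by roughly 10% per step.
print([round(x, 1) for x in simulate_rabbits()[:4]])  # [100, 110.0, 121.0, 133.1]
```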

System dynamics modeling has been used in a wide range of different research areas to gather thorough understanding of complex systems. A complex system is a system composed of many components which may interact with each other. Complex systems are characteristically difficult to model due to dependencies, relationships, or other interactions between parts of the system or its environment. System dynamics simulation has been implemented in operations in multipurpose reservoir systems [10], where it was shown through validation and sensitivity analyses that the system dynamics simulation model was capable of simulating interactions among the reservoir's various functions, and that the insights gained from model simulations can assist in environmentally friendly operations of the reservoir. [11] used the system dynamics approach to model the carbon emission trading market conditions in China, where appropriate investment strategies for Chinese power enterprises were explored.

[9] discusses the use of system dynamics in economic modeling. They describe three principal ways that system dynamics is used, where the first involves translating an existing economic model into a system dynamics format, while the second involves creating an economic model from scratch by following the rules and guidelines of the system dynamics paradigm. The former approach enables well-known economic models to be represented in a common format and the second one has the advantage of yielding models that are more realistic and can produce results that are unexpected. The third is a hybrid of the first two approaches in which a well-known economic model is translated into a system dynamics format and then altered so that it conforms to the format of system dynamics. This approach tries to get the benefits of the two former approaches.

3.2 Building System Dynamics Models

[12] propose that the first step in the model building stage is conceptualization of the model. Specifically, they argue that there are four essential stages that a model builder goes through when creating a system dynamics model. They are conceptualization, formulation, testing, and implementation. Although there are clear distinctions between the four stages, they are recursive. After each stage is completed, the model builder might have to return to the previous stages to incorporate information gained from the latter ones.

The first step of the conceptualization phase is to define the model purpose. By deciding on and defining the model purpose, the model builder makes the later choices of both components and structure feasible. The goal of this stage is to arrive at a rough conceptual model capable of addressing the relevant problem in a system. It can also be argued that model conceptualization leads to more knowledge about how the system works and operates. After the focus area of the model has been selected, a model builder must collect necessary data that would help further define the focus of the model. Data could consist of measured statistical data and of operating knowledge from people familiar with the real system (it can be argued that no such thing as a system actually exists in the real world, and that systems are merely conceptualizations of the human mind; however, the term is useful for describing the distinction between the modeled system and what is observed in real life, which is why this wording is used). Furthermore, a model builder must consider the main audience of the model. If the model's structure and behavior cannot be understood by its audience, or if it does not address the questions that the target audience is interested in, the model loses its usefulness to the audience [12].

The next step is to define the model boundary. The model boundary contains all components presented in the final model. The modelers could do this by brainstorming all components they see as necessary, even those about which they are not certain. When selecting components the model builder must consider that: (1) components are necessary - the modeler sets the boundary so that nothing necessary is excluded from the model, and nothing included should be unnecessary; (2) components should be aggregated - similar concepts should be aggregated if doing so does not change the nature of the problem being modeled or the model purpose, since simple models with fewer components avoid unnecessary complications; (3) components must be directional - all important components must have a directional name that reflects how the components' values can grow larger or smaller [12].

The model builder should also consider which components are endogenous, and which are exogenous. Endogenous variables are variables that are involved in the feedback loops of the system, while exogenous variables are those whose values are not directly affected by the system. The list of components conceptualized in this stage is a guideline for the model builder going forward, not a strict frame for the model.

The next step in the conceptualization process is to identify reference modes and often to create behavior charts that describe these modes. The reference mode graph has time on the horizontal axis and units of the variables on the vertical axis. Creating a reference mode graph comes with the benefit of visualizing mental models and historical data that can then be a valuable resource when formulating the stock-and-flow structure of the model. [12] argue that verbal descriptions or a set of statistics about system behavior can serve the same purpose, but that they are at risk of being lengthy and confusing without carrying the same visual impressions that a reference mode graph does. When no historical information is available, a model builder can create hypothesized reference modes that would capture key features of the behavior pattern of important system concepts.

The final step in conceptualization is deciding on the basic mechanisms of the system. The basic mechanism is the smallest set of components and relations whose cause-and-effect relationships are capable of generating the reference mode. Deciding on the basic mechanism also demands creating a dynamic hypothesis, which is an explanation of the reference mode behavior. The dynamic hypothesis should be consistent with the model purpose. The model builder can choose to map the basic mechanisms in the form of either causal-loop diagrams or stock-and-flow diagrams.

3.3 System Dynamics Model Testing

When it comes to building confidence in system dynamics models, conducting tests of the model is a good way to go about it. [13] describes validation as the process of establishing confidence in the soundness and usefulness of a model. According to them, there is no method for proving a model to be correct, since scientific models generally stand because they have not been disproven and because there is shared confidence in their usefulness. Testing a system dynamics model is thereby conducted by testing it against a diverse set of empirical evidence and by seeking disproofs, which develops confidence in the model as it withstands tests.

The model builder accumulates confidence that a model behaves plausibly and generates problem symptoms or modes of behavior seen in the real system. They also stress that the validation process includes the communication process in which the model builder must communicate the bases for confidence in a model to an audience. Further, a model may be considered useful for scientists if it generates insight into the structure of real systems, makes correct predictions, and stimulates meaningful questions for future research. For political leaders and the public, the model should explain causes of important problems and provide a basis for designing policies that can improve behavior and outcomes in the future.

[13] discuss three different categories of system dynamics model tests:

• Tests of Model Structure - Assesses structure and parameters directly, without examining relationships between structure and behavior.

• Tests of Model Behavior - Evaluates adequacy of model structure through analysis of behavior generated by the structure.

• Tests of Policy Implications - Explicitly focuses on comparing policy changes in a model and in the corresponding reality.

Specific tests in the three different categories will be discussed in the following sections.


3.3.1 Tests of Model Structure

Tests of model structure aim at testing the core structure and the parameters of the model.

3.3.2 Structure-Verification Test

Verifying structure refers to comparing the structure of a model with the structure of the real system that the model represents. Structure verification may include review of the model assumptions by persons highly knowledgeable about corresponding parts of the real system. It may also involve comparing model assumptions to descriptions of decision-making and organizational relationships found in the relevant literature.

Oftentimes, a structure verification test is first conducted by the model builder using their personal knowledge as a base. It is then extended to include criticism by other people who are experts in parts of, or the whole of, the real system that is being modeled.

3.3.3 Parameter-Verification Test

Refers to testing the comparability of the model parameters against observations of the real system, both conceptually and numerically. Conceptual correspondence means that parameters are represented in elements of the real system structure. Numerical verification might refer to determining if output values of the model are realistic when compared to data gathered from the real system.


3.3.4 Extreme-Conditions Test

Refers to setting extreme values for parameters in the model and testing whether the model produces expected and realistic results, even in these extreme conditions. Well behaved, realistic models should reflect extreme conditions satisfactorily. If not, one can question the reliability of the model under different circumstances. [13] argue that the extreme-conditions test is effective in the way that it could help identify flaws in the model and that it can help to reveal omitted variables. They also claim that this test can enhance the usefulness of a model for analyzing policies that may force a model to operate outside of its initial context.
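As an illustration of what such a test could look like in code (a sketch only; the one-stock model below is a generic stand-in, not the thesis's labeling model), extreme parameter values are injected and basic sanity properties of the resulting behavior are asserted:

```python
def simulate_stock(initial, inflow_rate, outflow_rate, steps=100):
    """Minimal one-stock model: the stock grows with its inflow and shrinks with its outflow."""
    stock, history = initial, [initial]
    for _ in range(steps):
        stock = max(stock + inflow_rate * stock - outflow_rate * stock, 0.0)
        history.append(stock)
    return history

def extreme_conditions_test():
    # Extreme case 1: no inflow at all -> the stock may only shrink and must never go negative.
    run = simulate_stock(initial=100, inflow_rate=0.0, outflow_rate=0.5)
    assert all(value >= 0 for value in run), "stock must never be negative"
    assert run[-1] <= run[0], "with no inflow the stock should not grow"

    # Extreme case 2: the stock starts empty -> it should stay empty.
    assert all(value == 0 for value in simulate_stock(0, 0.2, 0.1)), "an empty stock should stay empty"
    print("extreme-conditions checks passed")

extreme_conditions_test()
```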

3.3.5 Boundary-Adequacy (Structural) Test

Tests structural relationships necessary to satisfy the purpose of a model and asks whether the model aggregation is appropriate and if the model includes all relevant structure. The test involves relating the structure of the model to the issue that is supposed to be addressed. If a plausible hypothesis for needing additional structure is developed, the boundary-adequacy test is not passed.

[13] argue that the model builder must differentiate between criticism of the model's boundary adequacy and criticism of the model's purpose. If the builder fails to do so, the boundary of the model can be extended indefinitely without having an effect on the usefulness or correctness of the model.


3.4 Tests of Model Behavior

Tests of model behavior evaluate the adequacy of the model structure through analysis of behavior generated by the system.

3.4.1 Behavior-Reproduction Tests

Aims to test whether or not model-generated behavior matches observed behavior of the real system. Behavior reproduction tests include symptom generation, frequency generation, relative phasing, multiple mode, and behavior characteristic. The symptom-generation test examines if a model recreates the symptoms of difficulty that motivated construction of the model. If the model does not show how internal policies and/or structure cause the problem, actions cannot be taken to alleviate those problems. The frequency-generation and relative-phasing tests focus on periodicities of fluctuation and phase relationships between variables. The multiple-mode test considers whether or not a model is able to generate more than one mode of observed behavior. A model able to generate two distinct periodicities of fluctuation observed in a real system provides the possibility for studying possible interaction of the modes and how policies differentially affect each mode. Multiple-mode tests might also be applied to a model that explains why one mode of historical behavior gives way to another.


3.4.2 Behavior-Prediction Tests

Behavior-prediction tests are similar to behavior-reproduction tests, with the difference that behavior-prediction tests focus on future behavior rather than on replicating past phenomena. The two main behavior prediction tests are the pattern-prediction and the event-prediction test. The pattern-prediction test examines whether or not a model generates qualitatively correct patterns of future behavior. The event-prediction test focuses on a particular change in circumstances, such as a sharp drop in market share, which may be found likely on the basis of analysis of model behavior.

3.4.3 Behavior-Anomaly Test

The model builder oftentimes expects the model to behave the same way as the real system does. However, when anomalous features of model behavior occur, one can often find flaws in the model assumptions. The behavior-anomaly test is used extensively in model development, but can also play a part in model validation. [13] argue that model builders can defend certain model assumptions by showing that implausible behavior arises if the assumption is altered.

3.4.4 Surprise-Behavior Test

A more comprehensive model is more likely to show behavior that is present in the real system but which might have gone unrecognized. Often such behavior comes as a surprise to the model builder, hence the name. When such behavior is discovered, it contributes to the confidence in the model.

3.4.5 Extreme-Policy Test

The extreme-policy test involves altering a policy statement of the model in an extreme way and determining the consequences of doing so. It should then be checked whether the model behaves as one might expect under these extreme circumstances. The test shows the resilience of a model to major policy changes.

3.4.6 Boundary-Adequacy (Behavior) Test

Similar to the boundary-adequacy (structural) test but extended to include analysis of model behavior. The boundary-adequacy test involves conceptualizing additional structure that might influence behavior of the model, and analyzing the behavior of the model with, and without, the additional structure.

3.4.7 Behavior-Sensitivity Test

The behavior-sensitivity test focuses on the sensitivity of model behavior to changes in parameter values. It involves changing parameter values of the model and checking whether these shifts in values drive the model behavior to fail other tests. If no such parameter values are found, it further increases the confidence in the model.
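A sketch of how such a parameter sweep could be automated is shown below (illustrative only; Insightmaker ships its own sensitivity-testing tools, and the one-stock model again stands in for the full labeling model). One parameter is varied over a plausible range while the rest stay fixed, and the resulting end state is recorded so that values that push the model into implausible behavior can be spotted:

```python
def simulate_stock(initial, inflow_rate, outflow_rate, steps=50):
    """Minimal one-stock stand-in for a full system dynamics model."""
    stock = initial
    for _ in range(steps):
        stock = max(stock + (inflow_rate - outflow_rate) * stock, 0.0)
    return stock

def sensitivity_sweep(outflow_rates):
    """Vary one parameter, keep the others fixed, and record the behavior it produces."""
    return {rate: simulate_stock(initial=100, inflow_rate=0.1, outflow_rate=rate)
            for rate in outflow_rates}

for rate, final in sensitivity_sweep([0.05, 0.10, 0.15, 0.20]).items():
    print(f"outflow rate {rate:.2f} -> final stock {final:.1f}")
```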


3.4.8 Tests of Policy Implications

Tests of policy implications focus on comparing policy changes in a model and in the corresponding reality. Policy implication tests attempt to verify that the response of the real system to a change would correspond to the response predicted by the model, and to examine how robust policy implications are when changes are made in boundaries or parameters.

3.4.9 System-Improvement Test

The system-improvement test aims to test whether policies that are found beneficial in the model also correspond to improved real-system behavior. This test is the main real-system test, where improvements suggested by the model are examined out in the real world. There are some difficulties in that improvements suggested by the model will not be tried until enough confidence has been gathered in it. Furthermore, in some model contexts, it may take a long time before results due to policy changes are observed.

3.4.10 Changed-Behavior-Prediction Test

This test checks whether the model correctly predicts in which manner system behavior changes due to policy changes. The test can be conducted by changing policies in the model to check if the changes produce plausible resulting behavioral changes. One can also test if policy changes that have been made in real life have the same effect as the same changes made in the model.


3.4.11 Boundary-Adequacy (Policy) Test

As in the behavioral and structural boundary-adequacy tests, the policy boundary-adequacy test examines, by conceptualizing additional structure, how modifying the model boundary would affect the model. In this case, it examines how modifying the model boundary would alter the model's policy recommendations.

3.4.12 Policy-Sensitivity Test

Parameter sensitivity can indicate the degree to which policy recommendations might be influenced by uncertainty in parameter values. Conducting this test can help to reveal risks and opportunities of adopting a model for policy making. If the same policies are recommended by the model given variations of parameter values within a plausible range, the potential risks of using the model-recommended policies in the real system would be lower.

3.5 Qualitative Model Evaluation

Interviews have also been suggested to play an important role in the process of model evaluation and validation. They are especially important for system dynamics projects where the system builder and the client do not have the data series available for a thorough quantitative analysis, thus having to rely on a qualitative approach to model building. Furthermore, interviews could also work as a tool to promote social conversation regarding the adequacy of a model for a given purpose [14]. Structured interviews, semi-structured interviews, and surveys could all be considered useful in terms of gathering information that could be used in system dynamics modeling. Which approach to consider will depend on the specific situation where the model is to be built and applied.


4 Research Methodology

4.1 Model Development

In order to develop a model that is able to describe the behavior of a system set up by organizations to acquire labeled data, it is clear that knowledge of how the real system is structured and behaves needs to be gathered by the model builder. Initially, two interviews were conducted within Klarna with individuals that are highly involved in the manually annotated data gathering pipeline. These were unstructured interviews where the system was discussed on a general level. The two interviewees have different roles within the company. The first interviewee is responsible for the teams that are some of the main users of annotated data within the organization, as well as the team that develops the tool used by analysts when manually annotating data. The second interviewee is responsible for relations with the analysts. The knowledge gathered from these interviews was used as a base for conceptualizing the models that are created in this study. Furthermore, additional meetings were set up with the interviewees continuously during model development to enable continuous feedback to the model builder. Additionally, the model builder was given an introduction to the software tool that is used for annotation work. At the end of model development, a structural verification test was conducted with the first interviewee, which acts as a final review of the model. The results of this test are further elaborated in section 6.1.3.

Note that while interviews were held during this study, this is not an interview study. The results of the interviews themselves are not the main part of the knowledge collection, but rather act as an introduction to the system for the model builder. In this case, the model builder has been actively participating in the day-to-day work of the host company and has therefore gained knowledge of the system through continuous interaction with it and several other employees. Model development was an iterative process that continued during the time of this project. Therefore, it is not deemed important that the interview process is formalized and explained. When it comes to reliability, the model components in this case are relatively generalizable, meaning that for other similar systems, the model developed in this study can most likely be applied in some way. However, it should be stressed that there could be differences in system structure that would demand alterations to the model for it to be applicable in other cases. Researchers in other cases might conceptualize model components that would be necessary in those cases but that are not included in this model, thus leading to a different model. The validity of this research is essentially equivalent to the validity of the model. The model validity can be assessed through testing and comparison of the model with empirical data from the real system. In this study, however, some of the necessary empirical data to correctly support the model validity is not available to the model builder. Thus, it is recommended for the potential users of the model to collect this data and compare it to model output.

Moreover, some data that was used to support the model parameter values was collected through a tracking system implemented in the data annotation tool. Other model parameter values were estimated with the assistance of employees who are involved in the data gathering system. A major advantage of using this type of model is, however, to vary the parameter values and to observe how that affects the model behavior. Note that, to preserve the integrity of the host company, none of the data collected will be presented in this study.

During model development, it became apparent that even if an automatic data labeling process was developed, there could still be a need for manual annotation in some steps of the labeling process. Therefore, variations of a combined automatic/manual model were developed.

4.2 Model Testing

For building confidence in the model, the set of tests described in sections 3.2-3.5 was used as a base. However, because of the nature of the problem, not all tests listed there could be carried out. Consequently, a subset of the tests are conducted in this study. Specifically, this study will be conducting three types of tests: 1) Extreme Conditions Test, 2) Variable Sensitivity Test, and 3) Structure Verification Test. These tests were chosen for their simplicity and because they are deemed to be able to produce interesting results that are of immediate use to the model builder when evaluating the model. Many of the tests that are described in sections 3.2-3.5 require deep quantitative analysis of the real system, which is not available to the model builder in this study. Further, some of the tests are aimed at implementing changes proposed by the model in the real system and comparing real-system results with model results. These tests are beneficial when building confidence in the model but are out of scope for this project, as it would require a longer time period to implement these changes and review which effects they have. Note that a system dynamics model cannot be proven correct, and tests done to the model have the purpose of validating its usefulness.

The extreme conditions test acts as a sanity check for the model structure which makes it useful when trying to find hidden assumptions that the model is built upon. It can also reveal faulty model policies. The variable sensitivity test is useful when assessing how well the dependent variables are modeled and can reveal how uncertainties in the model parameters affect the uncertainties of the dependent variables. It is consequently a good test to utilize when deciding how and where the model should be used. Lastly, the structure verification test is an instrumental test to conduct because it can reveal misconceptions that the model builder has of the real system. Moreover, it could help to identify additional structure that would have to be incorporated in the model that was not initially conceptualized by the model builder.

4.3 Previous Studies

When conducting the initial search for work that had been done in the area, it became apparent that the data labeling supply structures of organizations are a relatively unexplored research topic. There has been research conducted towards finding good practices for using and optimizing crowdsourcing [15, 16, 17]. Further, there is some research that has been aimed at predicting the costs of labeling individual images [18]. Additionally, there is research that investigates the development and utilization of tools used for manual, automatic, and semi-automatic data annotation [19, 20]. However, there seems to be no research that investigates data label gathering on the organizational level, at least as far as the search for previous studies that has been conducted can tell.

The following databases were used for searching:

• Google Scholar

• IEEE Xplore

• ScienceDirect

• arXiv

Key phrases that were used when searching for previous work related to how organizations collect annotated data are as follows: automatic dataset labeling, collecting annotated/labeled data for machine learning, collecting datasets for classification, automatic manual data annotation, labeled data supply structure.

4.4 Software For System Dynamics Modelling

There are several different good choices of software that a model builder could utilize for system dynamics model development. For this study, the online-based modeling software Insightmaker was used. Insightmaker has the advantage of being easy to use and comes with various useful features, including tools for standardised sensitivity testing, optimization, and built-in graph comparisons. It also provides a convenient way of visualizing simulation runs that is beneficial for model analysis and communication.

(50)

5 Model Description

5.1 Model Purpose

In this study, two models were initially conceptualized. The first one models the system that utilizes manual labor to label datasets, while the second one models the system in which actors develop and utilize automatic processes.

The purpose of these two models is to visualize and compare the two approaches in terms of the costs they are associated with for an organization.

During model development, it became apparent that an automatic and a manual approach might not be entirely separable, which created the need for a model that combines the two approaches. This combined model, together with the manual model, will be the ones that are tested and run in this study.


5.2 Target Audience

The target audience of this model is any organization that is interested in dataset labeling, and web page labeling in particular. This model is applied to the operations of Klarna AB as a case study and is tailored to the idiosyncrasies of this organization. However, it can be argued that the basic mechanisms explained in the model would generalize to other organizations performing similar operations, making it a useful guide for model builders who wish to develop their own model for a similar task.

5.2.1 Example Case: Cart Scraping

An example of where annotated data is being used in Klarna is when extracting information from Klarna customers' online shopping sessions. The goal is to save the information within the Klarna shopping application to enable customers to keep track of their online purchases in a user-friendly manner.

A concept that is often found among online merchants is shopping carts. A shopping cart is where customers store the products that they aim to purchase while browsing the merchant web page. An example of how one of these shopping carts might look is shown in figure 5.1. In this example, there are three products added to the cart. Each product is shown together with some important information that Klarna would like to store when a customer makes a purchase. This information includes the product image, product description, price, quantity, etc. The end goal is to extract this information automatically during online shopping sessions of Klarna users. Extracting this information, however, is not an easy task. An extraction process needs to know where in the underlying HTML structure these pieces of information are located to be able to store them. Another complicating factor is that an automatic process ideally should be able to extract information for a range of online merchants that all have different underlying HTML structures in their web sites. This is where annotated data comes in. Analysts receive examples of HTML snapshots of online merchant shopping carts, and the locations of the information that is to be extracted are annotated in the underlying HTML structure. These annotations are then used by Klarna engineers to create an automatic process that can extract this information with the help of the annotated data.

Figure 5.1: Example shopping cart from IKEA.com/se.
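A simplified sketch of how such annotations could drive automatic extraction is given below (this is not Klarna's actual pipeline; the selectors, the sample HTML, and the use of the BeautifulSoup library are illustrative assumptions). Each annotation records where in a merchant's cart HTML a field lives, and the extraction step replays those locations on a new snapshot:

```python
from bs4 import BeautifulSoup  # assumed available; any HTML/DOM library would do

# Hypothetical annotations for one merchant: field name -> CSS selector in the cart HTML.
CART_ANNOTATIONS = {
    "product_name": "li.cart-item span.name",
    "price": "li.cart-item span.price",
    "quantity": "li.cart-item input.qty",
}

def extract_cart(html, annotations):
    """Extract annotated fields from a cart snapshot using the recorded locations."""
    soup = BeautifulSoup(html, "html.parser")
    extracted = {}
    for field, selector in annotations.items():
        element = soup.select_one(selector)
        if element is None:
            extracted[field] = None  # structure changed: a sign of label decay
        elif element.name == "input":
            extracted[field] = element.get("value")
        else:
            extracted[field] = element.get_text(strip=True)
    return extracted

snapshot = """
<ul><li class="cart-item">
  <span class="name">Desk lamp</span>
  <span class="price">249 kr</span>
  <input class="qty" value="2">
</li></ul>
"""
print(extract_cart(snapshot, CART_ANNOTATIONS))
# {'product_name': 'Desk lamp', 'price': '249 kr', 'quantity': '2'}
```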


5.3 Manual Data Annotation System Description

The knowledge that was gained during the initial set of interviews was utilized for model variable conceptualization. This description will act as a baseline for the reader to understand how the model components described in the coming section were conceptualized and why they are important to the model that is developed in this study.

The data annotation work, in the case of Klarna, is being done with the assistance of a software specifically developed for the purpose of simplifying data annotation, sending annotation batch requests, reporting errors, validating data, etc. This tool is an interface which acts as a mediator between the analysts and the department that is making the labeling request and is an essential part of the system. Naturally, developing and maintaining this kind of tool for use in production will have some cost associated with it. To be able to understand why certain model parameters were included in the model, we must understand the workflow of annotating data. In figure 5.2, a visualization of the data annotation workflow at Klarna is shown. In a nutshell, a department at Klarna has a set of data points that needs to be manually annotated according to some predefined set of rules. For example, given a merchant web page the analysts will find a set of objects and annotate those objects accordingly.

Figure 5.2: Flow chart of the states in the labeling process at Klarna.

Annotating one web page will take a certain amount of time during which the analyst is being paid. After a given set of these web pages have been annotated, they are also manually validated. In this part of the process, each labeled web page is quality controlled. If it shows that there are inaccuracies in how a certain web page has been labeled, it will have to go through an additional labeling round. On the other hand, if the web page is deemed to have been correctly annotated, it is saved as a finished data point that can go on to be used in production. One important factor to point out is that each correct label of a web page is essentially a snapshot in time, and that snapshot is assumed to be correct for that specific merchant at that specific time. However, after some time has passed there might have been changes to the merchant web page structure that make this once correctly labeled data point unusable. This event is called label decay, and it means that the web page will have to go through the labeling process again before it can be used in production.
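The label decay mechanism can be sketched in stock-and-flow terms. The formulation below is a minimal illustration, assuming that decay is proportional to the size of the Labeled Bank with a constant decay rate d per unit of time; the symbol d and the exact functional form are assumptions made for illustration, not the model's final equations.

```latex
% Illustrative first-order decay sketch; d is an assumed constant decay rate.
\begin{align*}
  \text{Decayed Labels}(t) &= d \cdot \text{Labeled Bank}(t)\\[2pt]
  \Delta\,\text{Labeled Bank}(t) &= \text{Validation to Finished}(t)
      + \text{Re-Labeling to Finished}(t) - \text{Decayed Labels}(t)\\[2pt]
  \Delta\,\text{Labeling Tasks}(t) &= \text{Labeling Requests}(t)
      + \text{Decayed Labels}(t) - \text{Labeling to Validation}(t)
\end{align*}
```

The component names used here match the listing given later in section 5.6; the point of the sketch is only that decayed labels re-enter the labeling queue and therefore generate recurring annotation work.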

For each annotation of a data point, some amount of time will be spent by the analyst that is doing the labeling. The point estimate, or average amount of time taken for each data point, can be gathered via the annotation tool. The amount of time required will most likely differ from data point to data point, and from analyst to analyst. Tracking data gathered from the annotation tool shows that analysts that have recently started annotating data require more time on average than more experienced analysts. It is plausible that this is a consequence of a learning period where newly hired analysts are learning how to use the tool and understand how to annotate the type of data that is requested by Klarna. This learning period means that there is additional cost associated with hiring new analysts along with the standard hourly rate.

When it comes to how many analysts there are in the system, it depends on how much the organization is willing to spend on acquiring labeled data. A basic assumption that is made is that the number of analysts will reflect how much annotation work is required to handle all the incoming requests. Therefore, more label requests will mean more analysts in the system and consequently lead to higher costs. While an increase in label requests will increase the need for additional analysts, if the time required for analysts to label each data point is reduced (for example through improvements in the labeling tool), it will lead to a decrease in the number of analysts that are required. Moreover, in the case of Klarna, analysts are full time employees, which means that they are paid a monthly salary. It is however highly unlikely that an analyst spends 100% of their time actually doing labeling work. This means that there is a need for a component in the model that accounts for how much of the paid time, on average, the analysts are actually doing annotation work. This additional component is highly influential when it comes to model output. However, it is situation dependent and may be difficult to estimate.
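As a rough illustration of the balance implied here, the required number of analysts can be sketched as the total annotation and validation hours in a time unit divided by the effective hours each analyst contributes. This is a hedged sketch using the component names introduced in section 5.6, not necessarily the exact expression used in the model.

```latex
% Illustrative workload balance for the manual labeling system.
\begin{equation*}
  \text{Analysts Required}(t) \approx
  \frac{\bigl(\text{Labeling Tasks}(t) + \text{Re-Labeling Tasks}(t)\bigr)\cdot\text{Time Per Label}
        \;+\; \text{Validation Tasks}(t)\cdot\text{Time Per Validation}}
       {\text{Effective Time Available Per Analyst}}
\end{equation*}
```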


5.4 Automatic Data Annotation System Description

In the case of creating an automatic process for data labeling, there is no specific case at Klarna that can be studied and used as a base for model component conceptualization. Thus, a more general approach to component conceptualization is taken and the components that are conceptualized are aggregated, meaning that the components could potentially be further divided into parts for cases where it is deemed necessary. The model components have been discussed with experienced engineers who are well versed in projects where automatic processes are created.

Initializing automatic processes could generate some costs to the organization. The organization may have to acquire additional competence, or research new means or methods to be able to do the data labeling. For machine learning based methods, this often includes gathering an initial manually labeled dataset to use for model training. These factors can be aggregated into a one time cost to the organization when setting up creation of the automatic process. In the model, this is called the set-up cost. Other costs are related to developing and maintaining the quality of the automatic process. This cost will depend on the time required for development as well as how many hired personnel are working on the project.
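A hedged sketch of how these aggregated costs could be combined over T units of time is given below. The names Engineers, Engineer Hourly Rate, and Development Hours are illustrative placeholders (with Development Hours taken as hours spent per engineer in the period), not components defined in this section.

```latex
% Illustrative aggregation of automatic-approach costs over T units of time.
\begin{equation*}
  \text{Automatic Cost}(T) \approx \text{Set-Up Cost}
  \;+\; \sum_{t=1}^{T} \text{Engineers}(t)\cdot\text{Engineer Hourly Rate}\cdot\text{Development Hours}(t)
\end{equation*}
```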


5.5 Model Output

The goal of this study is ultimately to be able to compare the costs to an organization of an automatic data annotation approach and a manual approach. Thus, the main output of the models will be the total cost of acquiring annotated data for an organization. However, there are other possible outputs of the models that ultimately depend on how the model user decides to use the models. For example, since the number of labels that are supported at any given unit of time is shown, a possible output could be the cost per label, obtained by dividing the total cost by the number of labels that are supported. Further, the model user could decide to set a constant number of analysts that will do data annotation. In this case the model output would be the number of labels that can be supported by the system with a given number of analysts. By dividing the number of supported labels by the number of analysts present, the model user would get the number of labels per analyst that can be supported in the system. The fact that the models can have multiple different outputs is an advantage since it makes the model dynamic and somewhat adaptive under different circumstances.
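The two derived outputs mentioned above can be written compactly as below. Here, Labels Supported is taken to correspond to the Labeled Bank stock introduced in the component listing, which is an assumption about how a model user would read the output rather than a definition from the model.

```latex
% Derived outputs sketched from the description above.
\begin{align*}
  \text{Cost per Label}(t) &= \frac{\text{Total Cost Per Unit Of Time}(t)}{\text{Labels Supported}(t)}\\[2pt]
  \text{Labels per Analyst}(t) &= \frac{\text{Labels Supported}(t)}{\text{Number of Analysts}(t)}
\end{align*}
```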

5.6 Components of the System Dynamics Model

This section describes the components included in the models that were initially conceptualized based on the knowledge gained from the interviews. A brief description of each variable is provided. Recall from the theory section that the model boundary is all the components included in the model, represented as a listing of those components. Some components are deemed to require a further description and have their respective paragraphs in section 5.7.

Table 5.1 displays the division of variables into stocks, flows, and variables. Sub parts of the model and their connections are elaborated on in section 5.9. The full stock and flow diagrams are displayed in appendix A in Figure A.1. Table 5.2 displays the division of each variable into endogenous and exogenous variables. This division assists the model builder and users of the model in understanding which components are parts of the system's feedback loops and which components' values are not affected by other parts in the system.

Manual Labeling Model Variables (Variable Name - Description - Unit of Measurement):

• Time Per Label - The average time it takes for an analyst to label one data point - Hours

• Time Per Validation - The average time it takes for an analyst to validate one data point - Hours

• Labeling Requests - The number of new label requests that are put into the system at each unit of time - Quantity

• Labeling Tasks - The current number of tasks that need to be labeled - Quantity

• Re-Labeling Tasks - The current number of tasks that have been validated negatively and require re-labeling - Quantity

• Validation Tasks - The current number of labeled tasks that require validation - Quantity

• Labeling to Validation - The flow of labeling tasks to validation tasks - Quantity

• Validation to Re-Labeling - The flow of validation tasks to re-labeling tasks - Quantity

• Validation to Finished - The flow of validation tasks to finished tasks - Quantity

• Re-Labeling to Finished - The flow of re-labeling tasks to finished tasks - Quantity

• Labeled Bank - The current number of labeled data points that are validated positively - Quantity

• Expected Manual Accuracy - The fraction of labeling tasks that are expected to be labeled accurately - Unitless

• Decayed Labels - The flow of labels from the Labeled Bank that require new labels - Quantity

• Number of Analysts - The number of analysts working on labeling and validating tasks - Quantity

• Analysts Required - The number of analysts required to finish all tasks within the given unit of time - Quantity

• Analyst Hourly Rate - The hourly salary of one analyst - €

• Analyst Loss Rate - The fraction of analysts that are removed in a unit of time - Unitless

• Analyst Set-Up Cost - Cost of adding one analyst into the system - €

• Analyst Hired - The flow of new analysts entering the system - Quantity

• Analyst Loss - The flow of analysts exiting the system - Unitless

• Labeling Tool Server Cost - Costs associated with the upkeep of the tools used by analysts - €

• Labeling Tool Maintenance Cost - Costs associated with development and maintenance of the tool used by analysts - €

• Effective Time Available Per Analyst - Time spent by analysts doing tasks per unit of time - Hours

• Analyst Costs - Total costs of analysts in a given unit of time - €

• Total Cost Per Unit Of Time - Total costs of manual labeling in a given unit of time - €

Stocks: Labeling Tasks; Re-Labeling Tasks; Labeled Bank; Validation Tasks; Number of Analysts

Flows: Labeling Requests; Labeling to Validation; Validation to Re-Labeling; Validation to Finished; Re-Labeling to Finished; Decayed Labels; Analyst Hired; Analyst Loss

Variables: Time Per Label; Time Per Validation; Expected Manual Accuracy; Analyst Hourly Rate; Analyst Loss Rate; Effective Time Available Per Analyst; Analysts Required; Analyst Set-Up Cost; Labeling Tool Server Cost; Labeling Tool Maintenance Cost; Analyst Costs; Total Cost Per Unit Of Time

Table 5.1: Table that divides model components into stocks, flows, and variables.


Endogenous: Labeling Tasks; Re-Labeling Tasks; Labeling to Validation; Validation to Re-Labeling; Validation to Finished; Re-Labeling to Finished; Labeled Bank; Decayed Labels; Number of Analysts; Analysts Required; Analyst Hired; Analyst Loss; Analyst Costs; Total Cost Per Unit Of Time

Exogenous: Time Per Label; Time Per Validation; Labeling Requests; Expected Manual Accuracy; Analyst Hourly Rate; Analyst Loss Rate; Effective Time Available Per Analyst; Analyst Set-Up Cost; Labeling Tool Server Cost; Labeling Tool Maintenance Cost

Table 5.2: Table that divides model components into endogenous and exogenous.
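To illustrate how the stocks, flows, and variables listed above fit together, the following is a minimal discrete-time sketch of the manual labeling model in Python. It is not the Insightmaker model itself, and every parameter value (requests per month, times, rates, costs) is an illustrative assumption rather than an estimate from the Klarna case.

```python
# Minimal discrete-time sketch of the manual labeling stock-and-flow structure.
# All parameter values are illustrative assumptions.

MONTHS = 24
LABELING_REQUESTS = 1000          # new label requests per month (assumed)
TIME_PER_LABEL = 0.5              # hours per labeling task (assumed)
TIME_PER_VALIDATION = 0.1         # hours per validation task (assumed)
EXPECTED_MANUAL_ACCURACY = 0.9    # fraction validated positively (assumed)
LABEL_DECAY_RATE = 0.05           # fraction of Labeled Bank decaying per month (assumed)
EFFECTIVE_HOURS_PER_ANALYST = 80  # annotation hours per analyst per month (assumed)
ANALYST_MONTHLY_COST = 4000.0     # salary cost per analyst per month, EUR (assumed)
ANALYST_SET_UP_COST = 2000.0      # one-time cost per new hire, EUR (assumed)
TOOL_COST_PER_MONTH = 3000.0      # labeling tool server + maintenance, EUR (assumed)

# Stocks
labeling_tasks = re_labeling_tasks = validation_tasks = 0.0
labeled_bank = number_of_analysts = 0.0
total_cost = 0.0

for month in range(1, MONTHS + 1):
    # Label decay: finished labels flow back into the labeling queue.
    decayed_labels = LABEL_DECAY_RATE * labeled_bank
    labeled_bank -= decayed_labels
    labeling_tasks += LABELING_REQUESTS + decayed_labels

    # Hire enough analysts to clear this month's queue (no attrition in this sketch).
    required_hours = ((labeling_tasks + re_labeling_tasks) * TIME_PER_LABEL
                      + validation_tasks * TIME_PER_VALIDATION)
    analysts_required = required_hours / EFFECTIVE_HOURS_PER_ANALYST
    analysts_hired = max(0.0, analysts_required - number_of_analysts)
    number_of_analysts += analysts_hired

    # Flows through the labeling pipeline (all queued work processed each month).
    validation_to_finished = EXPECTED_MANUAL_ACCURACY * validation_tasks
    validation_to_re_labeling = validation_tasks - validation_to_finished
    re_labeling_to_finished = re_labeling_tasks
    labeling_to_validation = labeling_tasks

    labeled_bank += validation_to_finished + re_labeling_to_finished
    validation_tasks = labeling_to_validation
    re_labeling_tasks = validation_to_re_labeling
    labeling_tasks = 0.0

    # Costs this month: salaries, hiring set-up, and labeling tool upkeep.
    total_cost += (number_of_analysts * ANALYST_MONTHLY_COST
                   + analysts_hired * ANALYST_SET_UP_COST
                   + TOOL_COST_PER_MONTH)

print(f"Labeled Bank after {MONTHS} months: {labeled_bank:.0f}")
print(f"Total cost: {total_cost:.0f} EUR, "
      f"cost per supported label: {total_cost / labeled_bank:.2f} EUR")
```

The simplifications made here (clearing all queued work within each month and ignoring analyst attrition) keep the sketch short; they are not claims about how the actual models handle capacity or analyst turnover.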
