
THE COST OF ALGORITHMIC DECISIONS

A Systematic Literature Review

Annalena Erhard

Master Programme in Data Science 2021

Luleå University of Technology

Department of Computer Science, Electrical and Space Engineering


Annalena Erhard nanreh-0

Luleå Tekniska Universitet

Supervisor: Ahmed Elragal

The cost of algorithmic decisions

Thesis, Master Data Science

2021.06.10


Table of Contents

1 Introduction
2 Methodology
3 Decision-making Systems: An Overview
4 Cost drivers of solely algorithms generating decisions
4.1 Direct cost
4.1.1 Cost of implementing and designing Algorithms
4.1.2 Cost of executing Algorithms
4.1.3 Cost of maintaining Algorithms
4.2 Indirect costs
4.2.1 Data storage cost
4.3 Social and Environmental costs
4.3.1 Water consumption
4.3.2 Cost of fairness
4.3.3 Social cost
5 Cost drivers of hybrid decision making
5.1 Business Intelligence Systems
5.2 Theoretical costs
5.3 Cost Factors derived from Decision Support System Applications
6 Discussion
7 Contribution
8 Conclusion
9 Implications
References


ABSTRACT

Decisions have been automated since the early days. Ever since the rise of AI, ML, and data analytics, algorithmic decision-making has experienced a boom. Nowadays, using AI within a company is said to be critical to its success. Considering that it can be quite costly to develop AI/ML and integrate it into decision-making, it is striking how little research has been put into the identification and analysis of its cost drivers so far. This thesis is a contribution to raising awareness of possible cost drivers of algorithmic decisions. The topic was divided into two subgroups: decisions generated solely by algorithms and hybrid decision-making. A systematic literature review was conducted to create a theoretical base for further research. For algorithms that make decisions without human interaction, the identified cost drivers are data storage (including initial, floor rent, energy, service, disposal, and environmental costs), data processing, transferring, and migrating. Additionally, social costs and costs related to fairness, as well as costs related to the algorithms themselves (implementation and design, execution, and maintenance), could be found. Business intelligence used for decision-making raises costs in data quality, update delays of cloud systems, personnel and personnel training, hardware, software, maintenance, and data storage. Moreover, it is important to note that the recurrence of some costs was detected. Further research should go in the direction of the applicability of the theoretical costs in practice.


1 Introduction

Decision-making is a part of everyone's daily life, whether it involves medical, business, legal, or personal/private decisions. These decisions can be carried out by physical deciding units such as humans or robots, or by non-physical entities, meaning business intelligence applications such as decision support systems. [1] Decision-making potentially involves a long process, because all possible outcomes should be defined to correctly assess the situation along with its consequences. Decisions were automated as far back as 800 B.C., but in the last few years a lot of research has been done in the area of decision making and decision theory. [1] Another point that led to the rise of decision-making automation was the rapid development in the fields of Data Science, Machine Learning (ML), and Big Data. [2]

In the mid-twentieth century, the first records of machines and humans working together appeared, and it was said that together they could make quite powerful predictions. Since then, the combination of intuitive human decision-making and data analytics has stood for generating more rational and possibly more valuable decisions. [2]

The automation step nowadays includes algorithms; hence it is called ‘algorithmic decision-making’. But this is not the only generic term used for it in research. Some synonyms for algorithmic decisions are data-driven, automated, and evidence-based decision-making.

Also, the terms Artificial Intelligence (AI) and ML will be mentioned more frequently in the next few pages. This is because AI has become increasingly important in recent years. With this rise, the goal was to extend decision theory to intelligent systems. Algorithms should 'learn' or be 'taught' to make their own decisions. Mostly the methods of Data Analytics and ML are used for achieving information-based decisions. [2]

Only a few of the areas of application are listed and briefly described by Kochenderfer et alia [1]. These include, for example, automated driving, breast cancer screening, and distributed wildfire surveillance [1]. It quickly becomes clear that this is the near future. Nevertheless, the optimum is still far from being reached as the following example shows.

In April and May 2020, Appen conducted a study with 374 respondents from 19 different industries. Within this study about the state of the art in AI and ML, 49 percent of respondents answered that their company was behind when it comes to adopting AI, while at the same time 73 percent of them responded that AI is critical to the success of the company. Another interesting finding was that the budgets for AI initiatives of 500k–5M as well as greater than 5M doubled. [3]

Considering this fact, and also the point that it can be quite costly to develop AI/ML and integrate it into decision-making, it is striking how little research has been put into the identification and analysis of the cost drivers. In the paper “Artificial intelligence for decision making in the era of Big Data – evolution, challenges and research agenda” by Duan et alia [4], twelve propositions regarding further research in the area of AI and decision science were given. Proposition 2 described the difficulty of measuring the benefits and impact of AI. This is not only a challenge of measuring the costs and benefits short- or long-term, but also of deciding from which perspective. A political point of view might ignore the social or environmental impacts, and from a social perspective, business-related costs might be overlooked. For a business, it is essential to know how best to make strategic decisions. The question now is what consequences it has for decision-makers from which position the possible consequences are viewed, and what possibilities there are to list, summarize, and calculate the different costs of decision processes.

This applies especially when digging a bit deeper and combining the algorithms of AI and ML with data to support decision making. In traditional software project management, cost accounting or cost models create a basis for allocating resources, staying within budget, and finishing a project on time. [5]

Just as in that setting, it is essential for any project manager or decision-maker to be able to define the costs associated with the development, execution, and maintenance of any system within the decision-making process.

Project Outline

Duan et alia's [4] proposition 2 has already indicated one of the research questions for this thesis, which is to contribute to cost accounting within the decision-making process. This is:

- What are the costs or cost drivers of algorithmic decisions?

Usually, decisions are not only taken by humans. Often some degree of support is involved.

- Among these varying degrees of human involvement in the decision-making process, how are the costs distributed?

And, considering the point of view of the decision executive:

- Does a change of perspective change the costs associated with generating the decision?

In the course of this thesis, these are the main research questions aimed to be answered.

There are several ways to get answers to this. The first possibility (1) is to conduct a systematic literature review to find out what has been researched so far, to identify gaps, and to present the possible cost factors. It is not only important to find out the cost areas, but also how the costs are distributed across the levels of human-machine interaction. Possibility (2) would be to conduct studies in cooperation with companies in which decisions are taken over by intelligent applications or completely by machines. It would be necessary to record the costs incurred in the implementation of these systems, the areas in which they occur, and which of these costs occur repeatedly. An accurate listing of these costs from as many companies as possible, from different business areas, is necessary to create a model of how the costs can be approximated in the future. This would enable companies to calculate project costs for such initiatives.

(1) can be seen as a basis for possibility (2). First of all, the basic theoretical knowledge should be available about the areas in which costs can arise, the range within which they move, and how they are influenced. In addition, it should first be found out from the literature how the costs differ when only machines generate decisions, or when machines and humans do so in cooperation with each other (hybrid decision making).

Based on this knowledge, an approach can be developed as to how companies measure their costs and which areas they should focus on. Since (1) has not been dealt with in depth in the research so far, this is the first step and will be tackled in the course of this thesis. In the following chapters, first, the research method of systematic literature research and the exact approach are explained (2 Methodology). Further on, an overview of some different existing decision-making structures is given (3 Decision-making Systems: An Overview). The topic is then divided into two main parts: solely machines taking over this task (4 Cost drivers of solely algorithms generating decisions) or machines and humans achieving a decision together (5 Cost drivers of hybrid decision making). For each of these two structures, the driving cost factors are identified and summarized. Especially with algorithms generating decisions, the costs are divided into direct, indirect, and social and environmental ones. In chapter 6 (Discussion), the found aspects are summarized and possible overlaps are detected. Also, missing parts in the literature are pointed out. This, and the second step, which will not be elaborated on here, will be a task for future research.

A research agenda can be found at the end of this paper, in the Conclusion (chapter 8). The contribution of this thesis to the current research is stressed in chapter 7, and its implications are given in chapter 9.


2 Methodology

The above-described topic will be tackled during this thesis. Conducting a literature review is one of the ways to perform research.

It is important, on the one hand, to place the research topic in the field of previous knowledge and thus to ensure the value of new knowledge and, on the other hand, to generate new knowledge from several already published works [6]. The question arises of how to best conduct literature research. Among others, vom Brocke et alia answer this in their paper “Reconstructing the giant”, which they published in 2009. Guidelines for literature reviews were proposed there and are followed within this thesis. The framework consists of multiple stages, some of which are pointed out more closely in the following:

1. As stated, it is difficult to define an adequate scope when conducting a literature review. This challenge is tackled in the first stage of the “Framework for literature reviewing” [6]. The authors proposed using “Cooper's taxonomy” for this. It is a framework with six characteristics helping to define this scope. The first one to point out is the focus of the study. One can choose between emphasizing the work on ‘research outcomes’, ‘research methods’, ‘theories’, or ‘applications’. For this project, the main attention will be given to research outcomes and the summarization of recent application results. The second characteristic tackles the ‘goal’ of the research, be it integrating, criticizing, or summarizing the findings. Within this literature review, the found costs of algorithmic decisions will be integrated into the surroundings of current research. As a third step, the organization of the topic research, be it ‘historical’, ‘conceptual’, or ‘methodological’, is defined. Characteristics 4 and 5 focus on the perspective taken by the writer (4), which will be a neutral representation, and the type and level of knowledge of the audience (5). The audience addressed in this paper are general scholars, hence readers who are relatively familiar with this area of research. The last point defines the degree of coverage of the review. It can be ‘exhaustive’, ‘exhaustive and selective’, ‘representative’, or ‘central/pivotal’. [6]

Of course, the main goal is to present exhaustive coverage of the available literature. Unfortunately, there are limitations to this paper that need to be kept in mind: primarily, the restriction to certain search terms within the literature review process could lead to an exclusion of relevant contributions. Another point is the limitation to the databases used, which could also exclude relevant results. Therefore, the literature coverage of this thesis can be defined as exhaustive and selective.

Figure 2 shows an overview of the characteristics and categories introduced by Cooper in 1988 and proposed by vom Brocke et alia [6] in 2009. The taxonomies followed in the course of this research are colored in light grey in that graphic.

Characteristic | Categories
1 Focus | Research outcomes; Research methods; Theories; Applications
2 Goal | Integration; Criticism; Central issues
3 Organization | Historical; Conceptual; Methodological
4 Perspective | Neutral representation; Espousal of position
5 Audience | Specialized scholars; General scholars; Practitioners/politicians; General public
6 Coverage | Exhaustive; Exhaustive and selective; Representative; Central/pivotal

Figure 2: The “Taxonomy of Literature Reviews” introduced by Cooper in 1988 and proposed as a tool for defining the scope of a literature review by vom Brocke et alia. The light grey fields of the original graphic mark the categories chosen for each characteristic in the course of this thesis.


2. According to vom Brocke et alia [6], phase 2 of the literature search process should circle around reviewing already existing papers on the research topic and defining keywords and terms for the future literature search. For this process, a mind map was created to visualize these terms and give an overview of their relationship with the examined topic. This graphic can be found in Figure 1. The title of this research work, ‘The cost of algorithmic decisions’, can be found in the middle of the mind map and can be divided into two separate topics. Beginning with the terms on the right side, Nada et alia noted the interchangeability of the concept wordings “data-driven, evidence-driven, fact-driven, AI-based, algorithmic decision making, or even automated decision making” [2]. Hence, these terms were taken into account for the subsequent search for this part of the topic. The first part of the investigated topic leads to a somewhat tricky situation: searching through databases with, for example, the terms ‘data-driven decision’ and ‘cost’ leads to a large number of studies focusing on the cost-optimizing value of introducing data-driven decision making into a firm. This points out, even more, the need for research that does not only look at the cost-optimizing effect of such systems but also considers the costs associated with them. Search terms that led to more useful results were taken from the keywords of similar papers: maintenance cost, cost model, technical debt, runtime cost, cost analysis, full cost, and training cost.

Figure 1: Mind map of keywords describing the research topic, collected from various main sources.

3. In the third phase of the process, the main literature research is conducted. It consists of journal, database, and keyword searches using the keywords defined in phase 2, as well as backward and forward searches. A backward search is defined as tracking back the older sources of a paper, whereas a forward search is described as the examination of newer pieces that cited the starting piece. In the whole process, it is important to limit “the amount of literature […] to only those articles relevant to the topic at hand” [6] by reading the titles or abstracts and deciding whether or not to keep an article. [6]
For this research, literature from the databases Scopus, ScienceDirect, ACM, EBSCO, and IEEE was collected.

4. In phase 4, the selected literature pieces are summarized.

5. The last step consists of compiling the research questions not answered by the outcome of phase 4 into a research agenda. It is a valuable tool for the scientific community to pick up from there and conduct further research to create additional knowledge.

Figure 3: Literature research process of this thesis. The databases Scopus, IEEE, ACM, EBSCO, and ScienceDirect were used for keyword searches. In a second step, to reduce the number of total hits, a filter by year and language was applied, followed by a filter by keywords. Furthermore, abstracts and full papers were read to narrow the findings down even more. Through a final additional backward and forward search, a total of 23 papers were found and summarized for the purpose of this project.

The way the literature research was conducted is depicted in Figure 3. As already said, the databases Scopus, IEEE, ACM, EBSCO, and ScienceDirect were used for keyword searches. In a second step, to reduce the total number of 105517 hits, a filter by year and language was applied, resulting in 69933 papers, followed by a filter by keywords leading to 3572 papers. Furthermore, abstracts and full papers were read to narrow the findings down even more. Through a final additional backward and forward search, a total of 23 papers were found and summarized for the purpose of this project.


3 Decision-making Systems: An Overview

Data and its enormous collections are one of the main contributors to the fourth industrial revolution. [7] Whether one accepts it or not, data is part of almost everyone's daily life and influences decision making more than we think. Even though it is supposed to carry information and insights that raise the opportunity for a decision-maker to make better decisions, decisions are not only based on data. Figure 4 shows decision-making structures introduced and published by Eric Colson in Harvard Business Review [8] in 2019. Graphic a) shows a basic process of solely humans taking business decisions with no outside influence. A decision-making model without any human involvement is shown in graphic b): there, AI bases the business decision on the underlying data. The last two graphics, c) and d), give examples of how humans can be involved in decision-making. On the one hand, the machines only deal with the data and a human's job is to derive a business decision from seeing summarized data (c). On the other hand, the model shows machines dealing with the data as well as their interpretation; humans are presented a set of possible options to choose from, in combination with possibly non-digital additional information (d). [8]

Figure 4: Decision-making systems with varying degrees of human involvement: a) humans only, b) machines only, c) machines summarize the data for a human decision-maker, d) machines additionally interpret the data and propose options for a human to choose from. [8]

A similar approach of subdividing decision-making scenarios with varying degrees of human involvement was taken by Shrestha et alia in their article about “Organizational Decision-Making Structures in the Age of Artificial Intelligence” [9].

Solely machines making decisions is defined as humans delegating their decision-making tasks to AI. This is done mainly when decisions can and should be replicated afterward, when it is more important to have an accurate rather than an interpretable decision, when the information base is similarly restricted and specific every time the decision is made, and when it is very important to get to an outcome quickly. [9]

A second form of decision-making, which is not purely taken over by algorithms, is presented as well. It is named “hybrid sequential decision-making structures” [9]. The term hybrid stands for combining human and AI strengths; sequential means that the two parties execute their tasks one after the other. On the one hand, AI-based decision-making can be applied to a large set of alternatives, reducing the number of options in a first step, with humans choosing one option from the reduced set in a second step. This enables humans to effectively handle quite complex decisions with a lot of different options to choose from. On the other hand, human decision-making can be done first and taken as an input for the algorithms. Humans then have to reduce a large number of alternatives to a smaller set of options, while the algorithm's job is to find the best one to take. In case it is hard for humans to differentiate between possible outcomes, this sequence should be followed. [9]

If humans and AI are not deciding based on the other's preselection, Shrestha et alia propose a different structure: humans as well as algorithms make the same decision independently, considering the information they have. In the end, the decisions are aggregated, for example by majority voting. This enables both parties to play to their strengths, such as human social capabilities and algorithmic objectivity and productivity. [9]

But which of these systems are included in algorithmic decision-making? As already defined, algorithmic decision-making “refers to the automation of decisions” [10]. Decisions taken solely by humans are normally “based on experience, intuition, and context” [10], whereas more accurate, faster, and less subjective decisions can be made by algorithms “based on statistical models” [10]. The efficiency that automation provides is due to the simplification of the decision-making process, which reduces uncertainty by involving less of the context information. [10] However, a major focus of artificial intelligence is to make decisions under uncertainty. [1] Through this uncertainty, costs can arise for context units that were not considered.

As stated above, machines do not always take decisions on their own; often humans are involved as well. These “human-machine interfaces” [11] are finding increased use, resulting in many decisions being taken by humans interacting with machines. [11] Whether physical entities such as human beings or robots, or non-physical entities, make decisions alone or in collaboration with another entity, all of them incorporate observations of the surrounding conditions and properties. [1]

In the following chapters, the costs of algorithmic decisions are determined. This is divided into the costs that arise from solely algorithms making decisions, in chapter 4 (Cost drivers of solely algorithms generating decisions), and the cost drivers of hybrid decision generation, in chapter 5 (Cost drivers of hybrid decision making).


4 Cost drivers of solely algorithms generating decisions

As described in chapter 3 (Decision-making Systems: An Overview), different decision-making structures can be defined. These were introduced by Eric Colson [8] and Shrestha et alia [9]. Summarizing them into two major areas: on the one hand, there are solely machines generating decisions. The costs that arise from this are discussed first, within this chapter. The second area is machines and humans interacting with each other and taking decisions conjointly. This is called hybrid decision making within this thesis. The costs resulting from that are summarized in chapter 5.

Figure 5: This graphic depicts the cost drivers for solely machines (algorithms) making decisions, derived from the literature. It includes the costs for storing, processing, transferring, and migrating the data used by the algorithm, as well as the costs of handling the algorithms themselves (including staff and staff training costs) and the costs resulting from a taken decision.

The chapter about the costs of solely algorithms making decisions on their own is divided into three parts. As proposed by Dutta et alia [12], a full cost accounting framework is set up from the parts direct private, indirect private, and social and environmental costs.

A similar approach is followed for the analysis here. It starts with the direct costs of algorithmic decisions in chapter 4.1, which include the ones resulting from implementing and designing, maintaining, and running algorithms, including the costs that need to be invested in hiring educated, trained, and skilled staff. Secondly, the indirect costs are summarized in chapter 4.2, concluding with the social and environmental ones in chapter 4.3.

4.1 Direct cost

4.1.1 Cost of implementing and designing Algorithms

Algorithms are used in various applications as a deciding unit without human intervention. One area of application much discussed in the media is that of automated driving, but they are also being utilized in aviation systems, in finance, and in the judicial sector. [1, 13] When dealing with algorithms, one of the first steps is their design and implementation.


4.1.2 Cost of executing Algorithms

Running the algorithm is the next step that produces costs. The execution process is quite difficult to measure and generalize. This is partly because both execution time and energy costs depend on the system specifications. On the other hand, background processes like security applications and an active network connection can increase energy consumption and thus contaminate the cost values. [14] Nevertheless, there are some costs that come from executing (software or) algorithms, which can be approximated by various cost estimation systems. Some of these techniques have been summarized by M. W. Asres, L. Ardito et alia [14]. The cost can be estimated by a hardware-based approach, measuring the power of the computing system or using power sensors on motherboards; the second approach is modeling the resource usage, and the third is a software-based approach. [14]
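To make the idea of such an estimate more concrete, the following minimal sketch times a workload and converts the elapsed time into an energy and money figure. It is only an illustration of the general principle surveyed in [14]; the assumed average power draw and electricity price are placeholders, not values from the cited paper, and background processes would contaminate a real measurement just as the authors warn.

```python
# Minimal sketch of a software-based execution cost estimate (illustrative assumptions only).
import time

def estimate_execution_cost(func, avg_power_watts=95.0, price_per_kwh=0.25):
    """Run `func`, time it, and convert the elapsed wall-clock time into an energy/cost estimate."""
    start = time.perf_counter()
    func()
    runtime_s = time.perf_counter() - start
    energy_kwh = avg_power_watts * runtime_s / 3_600_000   # W * s -> kWh
    return runtime_s, energy_kwh, energy_kwh * price_per_kwh

runtime, energy, cost = estimate_execution_cost(lambda: sum(i * i for i in range(10_000_000)))
print(f"runtime: {runtime:.2f} s, energy: {energy:.6f} kWh, cost: {cost:.6f}")
```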

Another cost factor that should not be forgotten arises from the size or scope of the data used. This issue was exemplified in the paper “An efficient data preprocessing approach for large scale medical data mining” [15]. The authors, Hu et alia, point out that very large data volumes influence the computational costs of data mining processes, which fall into the area of algorithmic decision support. For the medical field, the authors suggest applying efficient data preprocessing (EDP) methods to large-scale projects as a solution to reduce the costs involved. Still, it needs to be stated that the processing time of three of the suggested EDP methods ranges between 12 and 18394 hours, which is an enormous difference (roughly half a day versus more than 760 days). [15]

As can be seen, there are quite a few factors involved in calculating the price of running an algorithm. This problem was also addressed by Kaplunovich and Yesha [16]. Often the question arises of where, or on what instance, to run a machine learning algorithm.

According to them, this question has become more and more important as Big Data got more popular. Using, for example, Amazon Web Services (AWS) is a solution for this problem. A recently conducted study by Appen [3] showed an increased usage of AWS from 13 percent in 2019 to 25 percent in 2020. Estimates of the running time of machine learning algorithms can be used there to allow the customer to estimate the arising expenses.
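As a minimal illustration of that idea, the sketch below turns a predicted runtime into an expected on-demand bill. The hourly rate and instance count are hypothetical placeholders, not quoted AWS prices, and the runtime prediction itself would have to come from an estimation model such as the one proposed by Kaplunovich and Yesha [16].

```python
# Minimal sketch: translating an estimated ML training time into an expected cloud bill.
def estimate_cloud_cost(predicted_runtime_hours, hourly_rate_usd, n_instances=1):
    """Expected on-demand cost = predicted runtime * hourly price * number of instances."""
    return predicted_runtime_hours * hourly_rate_usd * n_instances

# Example: a model predicted to train for 18 hours on two hypothetical GPU instances.
print(estimate_cloud_cost(predicted_runtime_hours=18, hourly_rate_usd=3.06, n_instances=2))
```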

4.1.3 Cost of maintaining Algorithms

A third point at which costs can develop in connection with algorithms is that of maintenance. In Figure 5, this is specified along with the previously discussed items of implementing and designing as well as executing algorithms. The maintenance of algorithms is an often forgotten factor, but there are “massive ongoing maintenance costs in real-world ML systems” [17]. The Data Scientist Report of 2018 shows that only approximately 38 percent of data scientists spent more than 50 percent of their time working with AI. [18] Is that enough to create code with high-quality standards, with which to work sustainably and without large-scale cost traps? In addition, code is only a part of the scope of work associated with machine learning systems, as the graphic below (Figure 6) shows. The boxes represent the proportion of work associated with different parts of “real-world Machine Learning Systems” [17]. Only a small fraction falls to the ML code itself (the small black box in Figure 6).

Figure 6: A graphic showing the aspects included in “real-world Machine Learning Systems” [17] . The black small box in the middle shows the proportion of work dependent on the code of these ML systems. Other bigger areas of work are for example Data Collection, Data Verification, and the Serving Infrastructure. [17]

Image Source: [17]

In the paper ‘Hidden Technical Debt in Machine Learning Systems’ by Sculley, Holt et alia [17] the concept of ‘technical debt’ in combination with Machine Learning is tackled.

Technical debt is depicted as a symbol for the “long-term cost incurred by moving quickly in software engineering” [17]. It is described as the cost arising from a ‘bad’ setup or badly structured code, and it can be paid back by “refactoring code, improving unit tests, deleting dead code, reducing dependencies, tightening APIs, and improving documentation” [17]. It is argued that machine learning systems are especially susceptible to technical debt, since the problems of maintaining traditional code are combined with a number of issues that are unique to ML code. [17]

Some areas of coding that Sculley et alia [17] put particular stress on deserve attention here. On the one hand, there is a big danger in using multiple coding languages within one product. It often increases the cost of effective testing and makes it harder to transfer “ownership to other individuals” [17].

Secondly, it should be considered that issues with data dependencies can be more costly than issues with code dependencies, because they are harder to detect.

Configuring the whole machine learning system is also one of the costly areas. Mistakes here can make the process a lot more time-consuming, waste computing resources, and lead to production issues. [17] On top of that, versioning a system raises all types of costs in multiple instances. [17] But one thing is sure: the full cost left over after most of the debt is paid off will only be visible and measurable over time. [17]

In my opinion, that is one of the reasons why a cost estimation system should be installed and used before and during projects, so as not to let the costs explode and receive a surprise bill in the end.

One cost factor that also arises during the implementation, execution, and maintenance of algorithms is that of personnel costs. AI is an attribute that is often praised in connection with the automation of business processes, but its full potential can only be exploited with the help of trained and skilled employees. An algorithm can perform worse if the preparation by the human was not done properly, or if it is not clear in the end how to deal with the output of the machine. For this, not only data scientists are needed, but also "top analysts, contract managers, salespeople, recruiters, and other specialists". [19]

4.2 Indirect costs

The cost factors that are directly related to algorithms are those that were examined more closely in the three previous sections. However, an algorithm cannot recommend a course of action or make a decision without one very specific, important component. This missing component is the data. In Figure 5, all the derived cost factors involved in data processes can be found on the left part of the graphic. Different illusions about storing data seem to exist: on the one hand, data storage seems to be infinitely available, and on the other hand, it is assumed to be quite cheap or even free of charge. [12] Even though storage costs are falling by about 50 percent every one and a half years, there are definitely costs involved in the process that are not to be ignored. [12] These costs are identified on a theoretical basis in the following sections, divided into the costs directly implied by storage systems and the less obvious ones such as environmental costs, water usage, and CO2 emissions, as well as social and fairness-related ones.

4.2.1 Data storage cost

One of the implicit costs of algorithmic decisions arises from storing data. In the century of big data, that means storing quite large amounts of it. These large amounts of data are, among others, stored in data centers. “Data centers are a fundamental part of the IT operations and provide computing facilities to large entities, such as online social networks, cloud computing services, online businesses, hospitals, and universities” [20].

The total costs arising from creating, running, operating, and maintaining data centers are important from a business and computational point of view. [12]

Two different papers are introduced here that deal with the identification and calculation of storage costs. One of them was written by Boukhelef et alia [21]. It is focused on outsourcing the process of data storage. It is said that one of the advantages of outsourcing is the reduction of costs. Companies like Amazon and Microsoft provide cloud services or Database as a Service (DBaaS) to their customers. [21] Between the customer and the provider, a service level agreement is created. It defines the scope of the resources and their quality that are available to the customer. If these agreements are violated on the part of the provider, penalties may be due, depending on the duration and severity of the service deviation. So, in addition to the basic setup, running, and maintenance costs, these penalties, or even just the cost of customer dissatisfaction, need to be taken into account for a total cost calculation. [21] In the paper “A Cost Model for DBaaS Storage”, the authors additionally identify some cost-driving areas of such a system. They divide them into storage system, migration, and workload costs (including energy, endurance, and penalty costs). Their proposed cost model, however, still lacks one property: the possible case of distributed storage was not taken into account.

Therefore, the costs arising from network-related processes are left out. [21]
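The structure of such a breakdown can be captured in a few lines. The sketch below mirrors the three cost areas named by Boukhelef et alia [21] (storage system, migration, and workload with its energy, endurance, and penalty components); the field names and the example figures are illustrative assumptions, not parameters of the cited model.

```python
# Minimal sketch of a DBaaS storage cost breakdown (illustrative structure and figures).
from dataclasses import dataclass

@dataclass
class DBaaSCosts:
    storage: float    # cost of the storage system itself
    migration: float  # cost of moving data between tiers or providers
    energy: float     # workload: energy consumed while serving requests
    endurance: float  # workload: device wear caused by the workload
    penalty: float    # workload: SLA penalties for service deviations

    def workload(self) -> float:
        return self.energy + self.endurance + self.penalty

    def total(self) -> float:
        return self.storage + self.migration + self.workload()

month = DBaaSCosts(storage=1200.0, migration=150.0, energy=310.0, endurance=90.0, penalty=0.0)
print(f"workload: {month.workload():.2f} USD, total: {month.total():.2f} USD")
```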


A second analysis was carried out by Dutta et alia. In “How much does storage really cost? Towards a full cost accounting model for data storage” [12], the authors take a broader view of the process of storing data and its cost drivers. Even though the paper is a little older than the one presented above, it gives a good indication of possible pricing pitfalls. It is said that not only the technical infrastructure costs but also human and environmental costs should be considered. The approach presented there follows the full cost accounting principle: “Full cost accounting is a systematic approach for identifying, summing up, and reporting the costs involved in the complete life cycle of a product or process” [12]. While usual accounting techniques focus only on the direct costs, the full cost accounting approach tries to also include hidden and indirect costs in the model. [12] As already stated before, it is extraordinarily important to include the hidden, not so obvious costs in the calculation. The cost drivers identified by the authors are the following:

• Initial

• Floor Rent

• Energy

• Service

• Disposal

• Environment

The initial costs are the first ones addressed here. They not only include the initial purchase of networking equipment like routers, switches, and wires, but also furniture and other miscellaneous accessories. [12]

Another cost factor is that of property rent. This depends heavily on where the data center is located. In rural areas, the price will be lower than in the city, although this definitely varies from country to country. [12]

Energy is the third main cost driver addressed by Dutta et alia [12]. In data centers, power is used, for example, for keeping servers, networks, and disks up and running, as well as for maintaining the overall infrastructure, for cooling, and for security. [12] It is worth mentioning that about 34 percent of the energy is used to air-condition the environment. With this knowledge, it is no longer difficult to imagine that the entire data center industry is responsible for 1.5 percent of the world's energy consumption and for about 2 percent of the United States' energy consumption. The U.S. is thereby a country that uses between 25 and 30 percent of the world's total data center energy. [20] Another way to cool a data center is with water. The resulting costs, both monetary and environmental, are described in more detail in the next chapter.

Service costs are another factor that must be added to the total costs. These arise, for example, in the areas of software development, infrastructure setup, and maintenance.

The costs for employees are also directly related to their work experience, which should be taken into account when calculating the service costs. [12]

Next on the list are disposal costs. After some time, disks might be replaced by newer ones. One possible and effective disposal solution is to physically destroy them. This process “requires powerful and expensive machines” [12], which is why the disposal process is often outsourced and taken over by other companies. [12] To pay for the environmental consequences of the physical destruction of the disks, companies pay contributions to environmental organizations for the ecological footprint caused. [12]

This links well to the next and last cost factor of data storage mentioned here, the cost for the environment. As described in detail above, a large amount of electricity/energy is consumed by data centers. Large diesel generators are installed to provide power for emergency situations such as environmental disasters or power outages. Although these are intended for exceptional situations and are switched off most of the time, they still emit emissions continuously. [12] One possibility for cooling data centers is evaporative cooling, which is often used. This cooling method collects both impurities and, for example, antibacterial agents that are supposed to keep the cooling systems clean. If the water left behind is incorrectly handled, it can severely pollute the environment. [12]

Let me give a small example from Dutta et alia, the authors of the theoretical cost factors described above. This may help to understand the proportions of the costs a little better. Using a data center from practice, the costs for initial, floor rent, energy, and service were added together; the environmental costs were not taken into account. The pie chart below (Figure 7) visualizes the percentage distribution of costs across the categories (with service costs split into personnel and software). With a share of 65% of the calculated total costs, floor rent is by far the largest item. Looking at the actual values of the costs incurred (between $22k and $117k per calculation area), these are definitely not to be swept under the table. The authors have then broken the costs down to a per-byte basis: the total costs, which originally amount to $823,809.99, correspond to 71.51×10³ picocents per byte. [12]

Figure 7: Pie chart representing the percentages of capital used for different categories. The numbers were taken from an example given in “How Much Does Storage Really Cost? Towards a Full Cost Accounting Model for Data Storage”. The main cost drivers identified are Floor rent with 65%, Initial with 14 %, Software and Personnel costs with 9 % each, and energy costs with 3 %.
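A short back-of-the-envelope calculation shows how the two published figures relate. The storage volume of the example data center is not restated in this thesis, so the sketch below back-calculates it from the total cost and the per-byte figure; the resulting value of roughly one petabyte is therefore an inference, not a number taken from [12].

```python
# Back-of-the-envelope check of the per-byte figure reported by Dutta et alia [12].
# The implied storage volume is inferred here and is not a number from the paper.
total_cost_usd = 823_809.99
picocents_per_usd = 100 * 1e12                  # 1 USD = 100 cents, 1 cent = 1e12 picocents
reported_picocents_per_byte = 71.51e3

implied_bytes = total_cost_usd * picocents_per_usd / reported_picocents_per_byte
print(f"implied storage volume: {implied_bytes:.2e} bytes (~{implied_bytes / 1e15:.2f} PB)")
# -> roughly 1.15e15 bytes, i.e. on the order of one petabyte
```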

Finally, one cost factor must be mentioned that has been omitted so far. From the authors Chiaraviglio et alia [22], it can be added that the processing, migrating, and transferring of data also causes costs related to the CPU's energy consumption. [22]

These processes are depicted together with the previously mentioned costs in Figure 5.


4.3 Social and Environmental costs

4.3.1 Water consumption

As already mentioned in the chapter about data storage costs (4.2.1), costs also arise from water consumption. This issue is not only important in terms of monetary costs, but also for the environment and humanity. The water required is used directly for cooling and indirectly for energy generation. As already mentioned, data centers such as those of Amazon, Google, or Microsoft require a large amount of energy to operate the technical units. “In 2014, a total of 626 billion liters of water use was attributable to US data centers” [23]. Another comparison can be drawn: even a medium-sized data center consumes as much water as three normal-sized hospitals combined. The problem is that the world's scarce drinking water is often used. Although recycled water is often utilized, some operators still draw more than half of their water from traditional resources. This causes great stress on the remaining drinking water reserves and needs to be considered as a cost for living beings and the environment. [23]

4.3.2 Cost of fairness

In the next two subchapters, the social costs and those relating to fairness that arise from algorithmic decisions are described. Figure 5 shows these costs on the right side of the graphic, a bit outside of the decision-making process. Past research in the area called “algorithmic fairness” [24] has shown that the meaning of fairness in the “context of decisions based on the predictions of statistical and machine learning models” [24] is not easy to pin down. The issue of fairness in decision-making does not only arise in algorithmic decision making, even though it might seem so since the topic was made popular in the fields of statistics and computer science. A decision made solely by humans also has to deal with the issue of fairness, so not involving machines in decision-making does not solve the problem. [24]

But how is algorithmic fairness defined overall? In general, an algorithm makes fair decisions if it is free of any discrimination between persons with the same prerequisites. [25] In order to designate an algorithm as fair, a criterion for fairness must first be defined. Then a decision rule is developed that satisfies the criterion, "either exactly or approximately" [13]. This approach is used, for example, in the legal system. There, one of the possible decisions is whether a prisoner can be released or not, or whether he or she is a danger to society or not. It was found that in the US state of Florida, black people were more often classified as high risk. Since this is anything but fair, it has been suggested from several sides that a fairness approach should be built into algorithmic decisions. Formal fairness must then be weighed against public safety and optimized. No matter where the threshold value of this optimization lies, there is always at least a minimal cost to society. [13]
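To illustrate what "defining a criterion and then checking a decision rule against it, exactly or approximately" can look like in practice, the sketch below evaluates one common criterion, demographic parity, for a simple threshold rule. The criterion was chosen purely for illustration and is not necessarily the one discussed in [13]; the scores and group labels are made-up example data.

```python
# Minimal sketch: checking a threshold decision rule against a fairness criterion
# (demographic parity, chosen here only as an illustration).
from statistics import mean

def decision_rule(score: float, threshold: float) -> int:
    return int(score >= threshold)

def parity_gap(scores, groups, threshold):
    """Difference in positive-decision rates between the two groups."""
    rate = {g: mean(decision_rule(s, threshold) for s, grp in zip(scores, groups) if grp == g)
            for g in set(groups)}
    return abs(rate["A"] - rate["B"])

scores = [0.2, 0.7, 0.9, 0.4, 0.8, 0.3, 0.6, 0.5]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
# The rule satisfies the criterion "approximately" if the gap stays below a chosen tolerance.
print(parity_gap(scores, groups, threshold=0.55))
```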

Last but not least, it can be stated that sometimes decisions are better made in favor of a group than in favor of an individual. For example, it is not necessarily a group of outstanding individual talents that works best together as a team, but possibly players who are individually less strong but better team players. [13]

4.3.3 Social cost

Not only can an algorithmic decision have costs for society, but also for an individual. Milli et alia focused on this in their paper about “The Social Cost of Strategic Classification” [26]. The authors define the costs to the individual as those that must be incurred in order to obtain a positive classification. The term used for this measure is social burden. Since decision optimization can no longer only maximize benefits while ignoring costs, the authors' proposed calculation includes the factor of the costs for the individual. Two different interactions of costs and benefits can be observed, whereby only monotonic cost functions are considered: "(1) Monotonically improving one's outcome requires monotonically increasing amounts of work, and (2) it is zero cost to worsen one's outcome" [26]. For example, it costs a person more effort to pay back a loan than to lose everything from one second to the next. [26] Therefore, the costs incurred by the individual should not be ignored.
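A minimal sketch of these two properties is given below: an individual's cost of reaching an acceptance threshold grows with the distance to be covered, while falling below it costs nothing. The linear cost form, the threshold, and the example scores are illustrative assumptions and not the exact formulation used by Milli et alia [26].

```python
# Minimal sketch of a monotone manipulation cost and an averaged individual cost
# (a simplified stand-in for the "social burden" idea; illustrative assumptions only).

def manipulation_cost(current_score: float, target_score: float) -> float:
    """Improving one's score costs effort proportional to the distance; worsening it is free."""
    return max(0.0, target_score - current_score)

def average_individual_cost(scores, acceptance_threshold: float) -> float:
    """Average cost the listed individuals would have to incur to reach a positive classification."""
    return sum(manipulation_cost(s, acceptance_threshold) for s in scores) / len(scores)

applicants = [0.35, 0.50, 0.62, 0.71, 0.90]
print(average_individual_cost(applicants, acceptance_threshold=0.65))
```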


5 Cost drivers of hybrid decision making

The previous chapter described the costs, identified from the literature review, that are related to algorithmic decisions. The focus was on those decisions that are made by software-implemented machines. Several major areas could be filtered out. On the one hand, costs (both monetary and non-monetary) already arise at the base, the data storage. Furthermore, it must be taken into account that the implementation, execution, and maintenance of algorithmic systems cause large costs. Nor should the costs incurred in the area of fairness and society be disregarded. As already mentioned in chapter 3 (Decision-making Systems: An Overview), the topic of algorithmic decisions is divided into two main areas. The decisions that involve a human decision-maker are considered in more detail in this chapter. The literature highlights some important points where costs can arise in connection with this. These are summarized in the sections below.

5.1 Business Intelligence Systems

Business intelligence is an area that has received many definitions in the past and still does. What belongs to it or not is a matter of definition. Until a few years ago, it was not clear what the connection was between decision support systems and business intelligence or data analytics, or where the similarities lay. In more recent articles, such as “Investigating Business Intelligence in the era of Big Data: concepts, benefits and challenges” by Bousty et alia [27] from 2018, business intelligence is referred to as the process starting with data collection and ending with decision-making. [27] It is there to support the actual decision-making by utilizing experiences from the past [27, 28, 29]. A major advantage is seen in the fact that business intelligence can use all possible data sources and build on them. [27]

5.2 Theoretical costs

But it does not only bring great advantages. As will be described in more detail later, it is precisely this area of massive data collection and storage that harbors an enormous cost factor for the use of business intelligence. Why it is important to decipher the direct and indirect cost drivers is obvious. On the one hand, it is important to pass on the list of costs to the customer in order to put the pricing into perspective and to generate explanation and transparency. [29] Furthermore, projects can be better planned from management levels and areas can be identified where there is still a need for optimization. [29] This is important in order to remain competitive in the ever-rising world market. [28] Companies are under great cost pressure to be efficient, productive, and cost-optimized. [28]

Therefore, total cost accounting is of enormous importance. However, it is surprising that the field of cost accounting is still new here and therefore little research has been done on it, although it is well known that intensive cost management is an important factor for success with business intelligence methods. [28]


Grytz and Krohn-Grimberghe [29], among other things, have listed three different ways in which a cost breakdown can be carried out in the area of business intelligence:

1. costs are set up using so-called "flat-rate distribution keys" [29],

2. costs are allocated to a specific production area, for example, CPU or memory usage

3. or the costs are summarized on a product level, which ends up in a very technical breakdown.

However, in the opinion of both authors, these theories of cost breakdown are presented quite abstractly, and their applicability in reality is questionable [29]; a small illustration of the usage-based variant (2.) follows below.
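The sketch below illustrates the second option listed above, namely allocating a BI cost block to consuming areas in proportion to a usage metric such as CPU hours. The department names, the CPU-hour figures, and the total cost block are made up for illustration.

```python
# Minimal sketch: splitting a BI cost block across consumers in proportion to a usage metric.
def allocate_by_usage(total_cost, usage_by_consumer):
    """Return each consumer's share of total_cost, proportional to its recorded usage."""
    total_usage = sum(usage_by_consumer.values())
    return {name: total_cost * usage / total_usage for name, usage in usage_by_consumer.items()}

cpu_hours = {"sales": 120.0, "logistics": 300.0, "controlling": 80.0}
print(allocate_by_usage(total_cost=10_000.0, usage_by_consumer=cpu_hours))
```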

Also in the course of this literature research, no full cost accounting model for business intelligence systems could be found that is complete and includes all direct as well as all indirect cost drivers. Individual cost drivers could, however, be identified.

With the wave of Big Data in recent decades, the problem of data storage has arisen. Big Data is the concept of storing, formatting, and analyzing large amounts of data. [27] This problem becomes bigger and bigger as more data is used for business. Often, data storage is outsourced to a cloud system to cope with these volumes. The areas in which costs are incurred when storing data in cloud environments, for example, have already been described in more detail in the previous chapter (4.2.1 Data storage cost). However, not only this area of Business Intelligence can be outsourced to cloud systems, but also the software. The change from having the infrastructure on-site to the cloud is a challenge for many companies. [27]

Business intelligence working on the cloud, or CloudBI (CBI) as it is called, has the advantages of "elasticity, scalability, mobility, immediate deployment as well as reduction of implementation costs" [27], among others. However, update delays can lead to data discrepancies, which can cause significant costs even on a small scale. [27] Furthermore, even small delays in BI applications can be costly to the business. [27]

Let's take another step back to the data issue. Data is not only available in large quantities, but is also often of poor quality. For example, data containing errors is said to cost US companies around 600 billion dollars a year. This means that the basis for good data analysis is high data quality. Software solutions can be used to establish a certain data quality. These are another cost factor. In the case of large amounts of data, these solutions require an enormous amount of time, which can lead to losses. [30]

In addition, the task of maintaining BI systems is often overlooked. Just like the maintenance of algorithms, this also involves costs. [28]

Grytz and Krohn-Grimberghe write in their article about the similarity between BI and IT. This similarity lies in the fixed costs for personnel as well as hardware and software. However, the costs for IT are easier to calculate because they are made up of several direct costs, whereas in BI applications there are often indirect costs that are more difficult to calculate. [29]


5.3 Cost Factors derived from Decision Support System Applications

In the section above, the costs that can arise with the use of BI systems were summarized. However, these are only the cost factors identified from the literature on a theoretical basis. In the course of this literature research, four articles were also found that deal with the implementation or setup of decision support systems. Each of the projects reported the costs that arose with implementation, setup, and maintenance. This is very interesting to compare, as it makes areas of cost visible that were not yet mentioned in the previous section. The table below (Table 1) gives an overview of the main identified cost drivers and their share of the total costs. Two of these decision support systems were developed for the health sector, one in Tanzania [31] and the other in Ghana [32], both of which are named Clinical Decision Support System (CDSS). The other two systems were initiated as a Maintenance Decision Support System (MDSS) [33] and a Spatial Decision Support System (SDSS) [34], respectively. Looking at Table 1 and comparing the numbers with each other, it can be seen that there is a big difference in the budgets of the four presented projects. At 23,316 US dollars, the decision support system for Ghana had the lowest budget. A large part of it, about 29%, was needed for staff alone. Overall, the range of the budget share required for personnel is quite large, between 3% and 29%. A similar range can be seen in the values for the percentage of cost for training, ranging between 9% and 33%, and the percentage of cost for software, ranging between 10% and 32%. It needs to be mentioned that the authors of the four papers used different categories to specify the costs. Therefore, some gaps exist in the table.

Article              | [31]       | [32]    | [33]              | [34]
Country              | Tanzania   | Ghana   | US - South Dakota | Solomon Islands
Type                 | CDSS       | CDSS    | MDSS              | SDSS
Total cost (USD)     | 185,927.80 | 23,316  | 332,879           | 96,046.49
Cost personnel %     | 10%        | 29%     | 3%                | -
Cost training %      | 33%        | 12%     | 9%                | -
Cost software %      | 32%        | 10%     | 14%               | -
Recurrent cost %     | 51%        | 66%     | -                 | -
Equipment cost %     | 34%        | 31%     | -                 | -
Cost maintenance %   | 17%        | 31%     | -                 | -

Table 1: Summary of the main costs that came up when implementing and maintaining decision support systems. The projects in Tanzania and Ghana worked on Clinical Decision Support Systems (CDSS), the one in South Dakota on a Maintenance Decision Support System (MDSS), and the one on the Solomon Islands on a Spatial Decision Support System (SDSS). Both the total costs and the percentage shares of the different cost categories vary considerably; cells marked with "-" are categories the respective authors did not report.

Since the numbers differ so much from each other, at first glance no general rule can be derived as to how large these shares typically are in such BI solutions. If we compare the cost-bearing areas found here with those previously derived from theory, a few categories can be added. It was already known that maintenance, software, and equipment cause costs, but personnel and the training of personnel do so as well. It is important to note that some of these costs are not one-off but recur annually (recurrent costs). The authors of the papers on the two CDSS [31, 32] have specified these; they amount to 51% and 66% of the total costs in the two projects.
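To make these shares more tangible, the short Python sketch below converts the shares that can be attributed unambiguously to a project into absolute amounts. The figures are taken directly from Table 1 and from [31, 32]; the code itself (names, rounding) is purely illustrative and not part of any of the reviewed systems.

```python
# Illustrative only: converts the percentage shares from Table 1 that can be
# attributed unambiguously to a project into absolute amounts in USD.
projects = {
    "Tanzania CDSS [31]": {"total_usd": 185_927.80, "recurrent_share": 0.51},
    "Ghana CDSS [32]": {"total_usd": 23_316.00, "recurrent_share": 0.66,
                        "personnel_share": 0.29},
}

for name, p in projects.items():
    recurrent = p["total_usd"] * p["recurrent_share"]
    line = f"{name}: recurring costs ~ {recurrent:,.0f} USD"
    if "personnel_share" in p:
        line += f", personnel ~ {p['total_usd'] * p['personnel_share']:,.0f} USD"
    print(line)
```

For the Ghana project, for example, the reported 29% personnel share corresponds to roughly 6,800 USD of the 23,316 USD total budget.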


6 Discussion

First of all, it could be established that decisions are enormously important and have to be made every day. For quite some time, this process has been automated. [1] In recent years, the topic has gained further importance because, with the constantly advancing research in the areas of AI, ML, and Big Data, decision-making processes could be optimized more and more. [2] In this process, humans make decisions not only on their own but often in interaction with systems that recommend actions or analyze and present data in a way that allows humans to make an informed and valuable decision. [2] Furthermore, there is also the possibility of machines making fully automated decisions without human intervention. Regardless of which of these approaches is pursued, costs are incurred.

The arising costs can be divided into different categories. In previous research, some of these individual cost areas were examined more closely, but a summary of all possible ones was lacking. This gap should be closed, as far as possible, by the literature analysis in this thesis. The research questions to be answered were the following: What are the costs or cost drivers of algorithmic decisions? And how are they distributed between the two decision approaches, machines deciding on their own and humans and machines deciding together? Both of these questions could be answered in chapter 4 (Cost drivers of solely algorithms generating decisions) and chapter 5 (Cost drivers of hybrid decision making).

Chapter 4 began with identifying the costs of decisions made exclusively by algorithms. A summary in the form of a table can be found in Table 2.

These costs were divided into direct, indirect, and social/environmental costs according to the full cost accounting model proposed by Dutta et al. [12]. Figure 5 summarizes all the costs found in this part, with the algorithm itself at its center. The costs that are causally related to the algorithm are those of implementation and design, execution, and maintenance [14, 17]; these include the hiring of skilled and trained experts. [19] The algorithms are based on one or many data sets from which they 'learn' and on which they are 'trained'. The storage of these data incurs further costs: the costs to build a data center (initial), the rent for the building, energy costs, service costs, and disposal costs [12], and, not to be forgotten, the costs for the environment, for example through energy generation or the cooling of server buildings. [12, 23] Another point where costs can increase is the processing, transferring, and migrating of data. [22] Last but not least, a cost driver could be found in the fairness and social area. [13, 24, 26]
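One way to work with this breakdown in practice would be to capture it as a small data structure that can later be filled with concrete estimates and totalled for budget planning. The sketch below is only an illustration of that idea under the assumption that such estimates exist: the category and item names mirror the wording of chapter 4 rather than any standardized accounting schema, and all amounts are placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class CostCategory:
    """One branch of the full-cost breakdown, e.g. direct or indirect costs."""
    items: dict = field(default_factory=dict)  # cost item -> estimated amount

    def total(self) -> float:
        return sum(self.items.values())

# Category and item names mirror the wording of chapter 4; amounts are placeholders.
algorithmic_decision_costs = {
    "direct": CostCategory({
        "implementation & design": 0.0,
        "execution": 0.0,
        "maintenance": 0.0,
    }),
    "indirect": CostCategory({
        "data storage (initial, floor rent, energy, service, disposal)": 0.0,
        "data processing / transferring / migrating": 0.0,
    }),
    "social & environmental": CostCategory({
        "environmental impact of data centers": 0.0,
        "fairness and social costs": 0.0,
    }),
}

grand_total = sum(category.total() for category in algorithmic_decision_costs.values())
print(f"Estimated full cost of the algorithmic decision process: {grand_total:,.2f} USD")
```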

Chapter 5 then focused on the interaction between humans and machines and, of course, the costs that can arise when decisions are made in that way. The generic term used for this structure is hybrid decision-making. This is defined as a process that begins with the collection of data and ends with a decision being made. [28] It is supposed to support the decision process in a positive sense by using experiences from the past as optimizing information. [28, 29, 30] Researchers have already tried to derive a full-cost accounting model for BI systems from the literature, but they have not succeeded. [29] In the course of this analysis, some individual cost drivers could nevertheless be identified. As with algorithms before, data storage was identified as a major cost driver. [27] In addition, data in its original state is often of poor quality. [30] Processing this data and raising it to a sufficient level of quality takes a lot of time and is also expensive for the business. [30] Another cost factor lies in the outsourcing of data storage and software applications to cloud systems: even minor delays in updates can cause data discrepancies that cost the company a lot. [27]

The acquisition of suitable hardware and software and the maintenance of BI systems are just as costly as suitable, trained personnel or the training of employees. [28, 29, 31-34]

It is also interesting to see that some of the costs are not incurred only once. These so-called recurrent costs occur annually. In two examples of the implementation of clinical decision support systems, they amounted to 51% and 66% of the total costs. [31, 32]
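To illustrate how quickly such recurrent costs can outweigh the one-off part of a budget, the sketch below projects the cumulative cost over a planning horizon. Only the 66% recurrent share reported for the Ghana CDSS [32] is taken from the literature; the five-year horizon and the assumption that the recurring part of the first-year budget is paid again in every following year are simplifications made purely for illustration.

```python
# Illustrative projection: splits a first-year budget into a one-off and a
# recurring part and accumulates the cost over a planning horizon.
# The 66% recurrent share is the value reported for the Ghana CDSS [32];
# the 5-year horizon is an arbitrary assumption for illustration only.
first_year_budget = 23_316.0   # USD, total first-year cost reported in [32]
recurrent_share = 0.66         # share of that budget that recurs annually
years = 5

one_off = first_year_budget * (1 - recurrent_share)
recurring_per_year = first_year_budget * recurrent_share

for year in range(1, years + 1):
    cumulative = one_off + recurring_per_year * year
    print(f"Year {year}: cumulative cost ~ {cumulative:,.0f} USD")
```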

A full summary of all cost drivers gathered in this thesis is shown in Table 2 below. It contrasts the two scenarios: on the left-hand side the cost factors arising when solely algorithms make decisions, and on the right-hand side those arising when humans decide with the support of Business Intelligence.

| Cost drivers of solely algorithms generating decisions | Cost drivers of hybrid decision making |
|---|---|
| – | Data Quality |
| Data Storage (Initial, Floor Rent, Energy, Service, Disposal, Environment) | Data Storage |
| Data Processing / Transferring / Migrating | Update Delays |
| Personnel and Personnel Training | Personnel and Personnel Training |
| Algorithms (Implementation & Design, Execution, Maintenance) | Hardware, Software, Maintenance |
| Social & Fairness | – |

Table 2: Summary and comparison of the cost drivers derived from the literature: decisions taken solely by algorithms (left column) and humans making decisions with the help of Business Intelligence (right column).


7 Contribution

In the course of this master thesis, the question of what the individual cost factors of algorithmic decisions are could be answered to a large extent. Since some machines generate decisions alone and others in a hybrid setting with humans, the cost factors differ slightly between the two structures, and it could be identified in which areas the costs theoretically differ. As mentioned in chapter 1 'Project Outline', answering these main research questions lays the foundation for future research. The possibilities resulting from hopefully continued research are immense. For example, all cost factors, including the previously hidden costs of algorithmic decision processes, can be included in budget planning. This increases the likelihood that implementing such systems will not bankrupt the company, will satisfy the clientele, and will keep projects on schedule. In addition, planning and recording costs as accurately as possible increases competitiveness.


8 Conclusion

In retrospect, it can be said that the cost factors of purely algorithmic decision processes filtered out from the literature overlap in some places with those of business intelligence systems. For example, data storage is a cost item for both, and in my opinion it differs only a little between the two. Data quality was mentioned as a cost factor only for business intelligence systems, but it could just as well stand for the problems with the data basis of algorithmic systems. Conversely, the costs of fairness and the social costs were included in the chapter on exclusively algorithmic decision processes, but in my opinion they can be applied just as well to business intelligence systems. For both, there are costs for equipment, which could be determined in a study as a next step. Likewise, personnel must be hired and trained for both approaches. The exact costs incurred for this, or a cost model that can approximate them, should also be created in the next step (2) as suggested in the Introduction.

In my opinion, the costs also change depending on the perspective from which they are viewed. Let us take, for example, the position of a small business that has outsourced both its data storage and its software solution to a cloud system and has a relatively limited budget available. A decision may have to be made that affects the next fiscal year, and some of the employees may need to be laid off. The BI application provides a recommended action for this decision. In two areas, some costs may not be directly visible to the viewer, in this case the small business, and may thus be ignored. One is the social and fairness cost of laying off an employee: this can mean financial ruin for the employee's family or, for example, costs for social benefits that then arise for the state. On the other hand, there are the costs for the environment incurred by cloud solutions, which cannot be disregarded.

These are, for example, the drinking water consumed for cooling and energy generation or the CO2 emissions of the diesel emergency generators of the booked data center. These are also costs that such a company does not see directly and therefore does not include in its calculation, in the choice of a different employee for whom a layoff would be less costly, or in the choice of a somewhat more sustainable data center.

However, if one were to take the position of an environmental organization, the costs incurred for the environment would naturally stand out more, while the costs incurred by the company, for example for training employees to use software solutions, would probably be given less weight.

These examples show how a different point of view can distort the cost calculation. This is why it is important to deepen the research in this area. For this purpose, I propose some points for a research agenda:

- A study should be conducted in which various companies list the costs incurred for the implementation and use of both algorithmic and BI systems for decision making. In doing so, as just mentioned, covering different points of view should be part of the selection criteria for the participating companies and organizations.

- Another possible approach for comparing the theoretical findings of this thesis with practice would be to conduct a study asking firms what they see as major cost drivers: which part of the process do they, in their opinion, spend the most money on?


- In addition, another cost point regarding data storage should be examined more closely in future research. It is said that by about 2025 it will be possible to store about 90 percent of the data collected, but only half of this would be secured. [28] What costs could arise in connection with this?

References
