
Applying Revenue Management to the Last Mile Delivery Industry


Academic year: 2022


DEGREE PROJECT IN THE FIELD OF TECHNOLOGY COMPUTER SCIENCE AND ENGINEERING AND THE MAIN FIELD OF STUDY INDUSTRIAL MANAGEMENT, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2018

Applying Revenue Management to the Last Mile Delivery Industry

Modeling Willingness-to-pay with Machine Learning

PETER FINNMAN

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT


Applying Revenue Management to the Last Mile Delivery Industry

Modeling Willingness-to-pay with Machine Learning

Author: Peter Finnman

Thesis Commissioner: Budbee

Examiner: Lars Uppvall

Thesis Supervisors: David Nilsson (Budbee), Thomas Westin (KTH)

Master of Science Thesis TRITA-ITM-EX 2018:642

KTH Industrial Engineering and Management, Industrial Management, SE-100 44 STOCKHOLM

December 19, 2018


Tillämpbarheten av Intäktsoptimering på Sista Milen Industrin (The Applicability of Revenue Optimization to the Last Mile Industry)

Modellering av Betalningsvilja med Maskininlärning (Modeling Willingness-to-pay with Machine Learning)

Author: Peter Finnman

Commissioner: Budbee

Examiner: Lars Uppvall

Supervisors: David Nilsson (Budbee), Thomas Westin (KTH)

Degree Project TRITA-ITM-EX 2018:642, KTH Industrial Engineering and Management

Industrial Economics and Management, SE-100 44 STOCKHOLM

December 19, 2018


Master of Science Thesis TRITA-ITM-EX 2018:642

Applying Revenue Management to the Last Mile Delivery Industry

Peter Finnman

Approved: 2018-11-06
Examiner: Lars Uppvall
Supervisor: Thomas Westin

Commissioner: Budbee
Contact Person: David Ödling

Abstract

Understanding what motivates a customer to pay more for a product or service has always been a fundamental question in business. Revenue management is a business practice that seeks to answer this question by using analytics to predict consumer behavior and willingness-to-pay. It has been a common practice within the commercial airline and hospitality industries for over 30 years, allowing adopters to reach their service capacity with increased profit margins.

In this thesis, we investigated the possibility of applying revenue management to the last mile delivery industry, an industry that provides the service of delivering goods from e-commerce companies to the consumer’s front door. To achieve this objective, a revenue management framework was conceived, detailing the interaction between the customer and a dynamic pricing model. The pricing model itself was produced by a machine learning model intended to segment the customers and predict the willingness-to-pay of each customer segment. The performance of this model was tested through a quantitative study on synthetic buyers, subject to parameters that influence their willingness-to-pay. It was observed that the model was able to distinguish between different types of customers, yielding a pricing policy that increased profits by 7.5% in comparison to fixed price policies.

It was concluded that several factors may impact the customer’s willingness-to-pay within the last mile delivery industry. Amongst these, the convenience that the service provides and the disparity between the price of the product and the price of the service were the most notable.

However, the magnitude of these factors' influence was not determined. Finally, employing dynamic pricing has the potential to increase the availability of the service, enabling a wider audience to afford it.

Keywords

Revenue Management, Last Mile Delivery, Machine Learning, Supervised Learning, Gaussian Process, Willingness-to-pay, Decision Tree, Dynamic Pricing, Operations Research, Revenue Management Information Flow


Degree Project TRITA-ITM-EX 2018:642

Tillämpbarheten av Intäktsoptimering på Sista Milen Industrin

Peter Finnman

Approved: 2018-11-06
Examiner: Lars Uppvall
Supervisor: Thomas Westin

Commissioner: Budbee
Contact Person: David Ödling

Summary (Sammanfattning)

What motivates a customer to pay more for a service or a product has long been a central concept in business. Revenue optimization is a business practice that seeks to answer this question by using analytical tools to measure and predict the customer's willingness-to-pay.

Revenue optimization has long been prominent in the airline and hotel industries, where companies that have adopted the strategy are able to increase their sales profits.

In this thesis, we investigate the possibility of applying revenue optimization to the last mile industry, an industry that delivers purchased products to the customer's home. To achieve this, we developed a framework for information flows within revenue optimization that describes how customers interact with a dynamic pricing model. This pricing model is produced through machine learning, with the intent of segmenting the customer base and then predicting the willingness-to-pay of each customer segment. The model's performance was measured through a quantitative study on synthetic customers described by parameters that affect willingness-to-pay. The study showed that the model could distinguish between the willingness-to-pay of different customers, resulting in an average profit increase of 7.5% compared with static pricing models.

Many different factors influence the customer's willingness-to-pay in the last mile industry. Convenience, and the difference between the price of the delivered product and the price of the delivery service, are two notable factors. How large an impact the factors described in this thesis have on willingness-to-pay remained unanswered. Finally, the possibility of using dynamic pricing to increase the availability of the service was highlighted, since more customers may be able to afford a price that takes their willingness-to-pay into account.

Keywords

Revenue Optimization, Last Mile Delivery, Information Flows in Revenue Management, Machine Learning, Supervised Learning, Gaussian Processes, Willingness-to-pay, Decision Trees, Dynamic Pricing, Operations Research


Contents

Nomenclature
Glossary
Notation
Intended Audience
1 Introduction
1.1 Problem Formulation
1.2 Purpose
1.3 Research Questions
1.4 Expected Contribution
1.5 Delimitations
1.6 Project Commissioner
1.7 Thesis Outline
2 Literary Review
2.1 Operations Research
2.2 Revenue Management
2.3 Approaches to Model Demand
2.4 Dynamic Pricing Models
2.5 Machine Learning
3 Theoretical Frame of Reference
3.1 Strategic Levers: Price and Time
3.2 Revenue Management Information Systems
4 Machine Learning
4.1 Origin and History
4.2 Key Concepts
4.3 Bayesian Machine Learning
4.4 Kernel Functions
4.5 Gaussian Processes
4.6 Decision Trees
5 Method
5.1 Project Timeline
5.2 Pre-study and Literature Review
5.3 Quantitative Study
5.4 Theoretical Analysis
5.5 Validity and Reliability
5.6 Generalizability
6 Empirical Study
6.1 Budbee’s Customer Data
6.2 Synthetic Training Data
6.3 Testing
7 Analysis
7.1 Component Analysis of Optimization Function
7.2 Model Feasibility on Synthetic Data
7.3 Revenue Management Information System Implementation
8 Discussion
8.1 Employing Revenue Management to the Last Mile Delivery Industry
8.2 Performance
8.3 Flexibility
8.4 Sustainability and Ethics Implications
9 Conclusion
9.1 Limitations and Future Research
References
A Values of Fixed Price Policy


Acknowledgements

First, I would like to express my sincerest gratitude to Budbee and its employees for welcoming me into the organization. You all helped turn months into hours, figuratively speaking of course. I am also grateful for your support of this project, despite the resources it demanded at times.

Furthermore, I would like to thank my supervisors who partook in the production of this study, David Nilsson and David Ödling from Budbee and Thomas Westin from KTH. You provided me with knowledge and guidance that was crucial to enriching the study. I would also like to thank my friends at KTH for the support you’ve shown me throughout the entire process. There are too many of you to list by name. Hopefully you know who you are.

Lastly, I would like to thank my brother, Simon. I can recall several occasions where you urged me on when I was faced with setbacks. In the end, it was you who helped me across the finish line.


Nomenclature

API Application Programming Interface
GP Gaussian Process
ML Machine Learning
OR Operations Research
PSD Price Sampling Distribution
RM Revenue Management
RMIF Revenue Management Information Flow
SQE Squared Exponential

Glossary

Below, a glossary is provided to facilitate the understanding of the thesis for readers who are not familiar with machine learning. Normally, the glossary would be ordered alphabetically. However, for the sake of comprehensibility, we start with simple terms within the particular domain, gradually moving towards more complex ones.

Entropy Commonly referred to as Shannon Entropy and relating to information theory, entropy can be defined as the degree of randomness in a dataset, with greater randomness implying a higher entropy.

Feature Synonymous with predictor variable or attribute, a feature is a variable with some degree of correlation to the phenomenon one is trying to predict or measure.

Information Gain The information gain is the decrease in entropy after a dataset is split on an attribute. It can be interpreted as the decrease in entropy as we learn the value of a predictive variable.

Model Models are the analytical representation of the phenomenon being examined. They are typically used for predicting future events related to the phenomenon.

Target The target represents the phenomenon that one desires to know more about. Given a set of features, the target represents the true outcome associated with them.


Training Set A set of data that contains previously observed mappings between the features and a target. This dataset is used to fit the parameters of the machine learning model.

Samples Samples are random elements of the training set.

Test Set As opposed to the training set, the test set contains data without a target label. The test set is generally regarded as the data on which we wish to perform inference.

Inference Inference is the process of letting a machine learning model predict the outcome of a feature set.

Classification Classification is a type of inference where the output is a categorical label belonging to a class.

Regression Regression is another type of inference where the output is a numerical value.

Kernel Function A kernel is a function that measures the similarity between two feature sets. The output is a numerical value, where larger values indicate a greater similarity between the two points.

Prior The prior is the initial intuition of the studied phenomenon encoded into a probability distribution.

Posterior During training, one gradually updates the prior probability distribution with information carried by the samples. The resulting probability distribution is called the posterior probability distribution.

Covariance Matrix The covariance matrix Σ is a matrix where position Σi,j denotes the covariance between the ith and jth elements of a random vector.
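The entropy and information gain entries above can be made concrete with a small computation. The following Python sketch assumes categorical features and measures entropy in bits (base-2 logarithm); the toy labels and the feature names `weekday` and `paid_before` are hypothetical and not taken from Budbee's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(labels, feature_values):
    """Decrease in entropy after splitting `labels` on a categorical feature."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    # Entropy after the split: each subset weighted by its relative size
    remaining = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remaining


# Hypothetical toy data: did each customer buy a paid delivery time slot?
bought = ["yes", "yes", "no", "no"]
weekday = ["mon", "mon", "sat", "sat"]   # separates the two classes perfectly
paid_before = ["y", "n", "y", "n"]       # carries no information about `bought`

print(entropy(bought))                        # 1.0 (maximal for two balanced classes)
print(information_gain(bought, weekday))      # 1.0
print(information_gain(bought, paid_before))  # 0.0
```

Splitting on `weekday` removes all randomness from the target, so the gain equals the original entropy, whereas splitting on `paid_before` leaves each subset as mixed as before and gains nothing.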


Notation

This paper makes extensive use of statistical models to formally introduce concepts and elaborate on complex expressions. The following notation is for the most part identical to that of Bishop (2016). Bold lower-case letters denote vectors. These vectors default to column vectors unless they are transposed (x⊤), in which case they are row vectors. Bold upper-case letters denote matrices, e.g. X, the design matrix. Non-bold upper-case letters denote random variables. [a, b] denotes a closed interval from a to b (where a and b are included). (a, b) denotes an open interval from a to b (where a and b are excluded).

Specific to probability, we use the notation p for the probability of an event and P for probability distributions. P(A|B) denotes the probability distribution of A given B. 𝒩 will be used frequently throughout the paper, denoting a Gaussian distribution with a mean (µ) of 0 and a standard deviation (σ) of 1, if nothing else is specified.

Specific to machine learning, we use D to denote the data set, from which we draw samples x_i. For linear regression, scalar weights are denoted β with associated indices in simpler examples. Note that β is an exception: when it denotes the full weight vector, it is not bolded even though it is a vector. θ denotes all model parameters, including the samples from our data set and any hyperparameters. k(x, x′) denotes the kernel function, defaulting to the squared exponential kernel unless otherwise specified. Finally, Σ denotes the covariance matrix.
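To connect the kernel k(x, x′) and the covariance matrix Σ, the sketch below builds Σ for a few inputs using the squared exponential kernel, written here in its common form k(x, x′) = σ² exp(−‖x − x′‖² / (2ℓ²)). The parameter names `length_scale` (ℓ) and `variance` (σ²) and their default values are assumptions for illustration; the hyperparameters actually used in the thesis are not shown here.

```python
import numpy as np


def squared_exponential(x, x_prime, length_scale=1.0, variance=1.0):
    """Squared exponential kernel: variance * exp(-||x - x'||^2 / (2 * length_scale^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return variance * np.exp(-np.dot(diff, diff) / (2.0 * length_scale ** 2))


def covariance_matrix(X, **kernel_params):
    """Covariance (Gram) matrix with Sigma[i, j] = k(x_i, x_j) for the rows of X."""
    n = X.shape[0]
    sigma = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            sigma[i, j] = squared_exponential(X[i], X[j], **kernel_params)
    return sigma


# Three hypothetical one-dimensional inputs: two close together, one far away
X = np.array([[0.0], [0.5], [3.0]])
Sigma = covariance_matrix(X)
print(np.round(Sigma, 3))
```

The diagonal equals the variance (1.0 here), the matrix is symmetric, and the entry for the nearby pair (0 and 0.5) is much larger than the entry for the distant pair (0 and 3), which is the sense in which larger kernel values indicate more similar inputs. A double loop is used for readability; a vectorized version would be preferable for larger data sets.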

Intended Audience

This thesis makes extensive use of mathematics, specifically probability theory and statistics. To grasp the content in its entirety, a fundamental understanding of both calculus and statistics is recommended. However, steps have been taken to facilitate understanding for readers who may not be comfortable within these domains. In the second chapter, we offer a simple summary of the procedures and goals of machine learning. Chapter 4 constitutes the bulk of the mathematical theory of the thesis and has been structured to introduce simple concepts first, gradually moving towards more complex ones. Subsequent chapters contain complex mathematical expressions to a lesser extent, and efforts have been made to properly explain obscure steps in these sections. Furthermore, the glossary contains the most crucial terminology related to machine learning. The intent is to enable readers to comprehend the objectives of the content in the thesis without necessarily understanding the techniques themselves.


1 Introduction

The introductory chapter of this paper briefly summarizes theory regarding revenue management (RM), a field in which scholars study the optimal conditions under which to sell a product or service to consumers. Furthermore, industries in which revenue management is prevalent are exemplified, and the reasoning behind their pricing strategy decisions is discussed. Subsequently, the problem formulation and purpose of this paper are presented. The research questions and delimitations of the study are outlined to elucidate the angle of the ensuing research. Finally, the commissioner of the project is presented along with a brief outline of the structure of the thesis.

The problem of selling a service or product during favorable market conditions has become an increasingly interesting topic of study within operations research, applied to a multitude of industries, e.g. commercial transport (Kimes, 1989; Strasser, 1996), media and telecom (Nair and Bapna, 1997; Cross, 1998) and hospitality (Rothstein, 1974; Bitran et al., 1995). This surge in popularity could be attributed to the increased availability of customer data and the establishment of revenue management (RM) as an independent academic and professional field (Talluri and Van Ryzin, 2006). RM was conceived during the deregulation of commercial air travel, when air travel retailers sought to increase prices during periods of increased demand and were consequently forced to reduce prices when demand faltered (Cross, 2011). The goal of RM is to analyze the market to determine when the seller can take advantage of increasing demand in certain market segments, or of its knowledge of the customer’s willingness-to-pay. The practice of RM is most prevalent in industries that deal with limited, temporary and allocated products, e.g. the airline or hotel industry, where airline tickets and hotel rooms are the respective products (McGill and Van Ryzin, 1999). Due to the limited nature of these products, the seller must employ pricing strategies that account for variable demand and willingness-to-pay in order to reach optimal profit levels.

In this thesis, we examine the application of revenue management within the last mile delivery industry, an industry that has received little attention compared to the airline and hotel industries. Last mile delivery can be defined as the transportation of goods from a retailer to the end customer, whilst last mile logistics refers to the processes and systems that enable the delivery. The delivery destination is oftentimes the customer’s residential address. The most prevalent interface for customer-provider communication within the industry is online web applications, where delivery requests are sent from the merchant that originally sold the product. In this thesis, merchants refer to companies that offer last mile delivery to their customers through a third-party service provider. One of the most important factors in this industry, contributing to the competitive advantage of companies, is delivering the goods to the end customer in an expeditious manner (Esper et al., 2003). Whilst the primary focus of this thesis is the last mile delivery industry, it is not regarded as the most conventional application area of RM. One example of an industry where rate fencing and other RM methods are more commonly applied is the hotel industry.

RM has been pivotal to the evolution of pricing models in the hotel industry (Arenoe et al., 2015), an industry subject to adverse pricing strategies at the expense of the end consumer, such as price collusion and deceptive price advertisement. These pricing strategies could be attributed to difficulties in predicting demand in various customer segments. This sparked interest in finding solutions that more accurately capture demand within the market and identify the factors underlying that demand. Dynamic pricing is a flexible approach to pricing a product or service, accounting for variable factors such as demand, supplier, cost and season or weekday (PK Kannan, 2001). In an extensive study of over 900 hotel bookings conducted by Abrate et al. (2012), the authors concluded that in excess of 90% of the studied hotels use dynamic pricing models to capture the demand of different market segments, in this case high-end business travelers and low-end leisure travelers, in order to increase revenue.

Moreover, the authors of the study argued that the increased use of the internet and the availability of price comparison services were considered to be momentous factors in the popularity of employing dynamic pricing strategies.

Whilst capturing the demand of different market segments and increasing profit margins could be seen as the original intent behind employing dynamic pricing strategies, Ulmer (2017) discusses an entirely different motivation. In the same-day delivery industry, it is of paramount importance that deliveries to customers are punctual and executed efficiently. This entails applying dynamic pricing to a different end: balancing demand so that operations are both cost efficient and logistically appropriate. In turn, this requires an intricate understanding of the factors that demand depends upon, a task of great complexity.
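To illustrate how price can be used to balance demand across delivery time slots, the sketch below uses a multinomial logit (MNL) choice model, a standard discrete choice model swapped in here purely for illustration; it is not presented as Ulmer's (2017) method or as the model used in this thesis. The slot utilities, prices and price sensitivity are hypothetical.

```python
import math


def mnl_choice_shares(base_utilities, prices, price_sensitivity):
    """Multinomial logit: share_i = exp(u_i - b*p_i) / sum_j exp(u_j - b*p_j)."""
    scores = [u - price_sensitivity * p for u, p in zip(base_utilities, prices)]
    m = max(scores)  # subtract the maximum score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical slots: morning, afternoon, and a more popular evening slot
utilities = [1.0, 1.0, 2.0]

flat = mnl_choice_shares(utilities, prices=[29, 29, 29], price_sensitivity=0.1)
surcharge = mnl_choice_shares(utilities, prices=[29, 29, 39], price_sensitivity=0.1)

print([round(s, 2) for s in flat])       # most customers pick the evening slot
print([round(s, 2) for s in surcharge])  # demand spreads across the slots
```

With equal prices, most customers choose the preferred evening slot; a surcharge on that slot shifts demand towards the other slots. In this contrived example the split becomes exactly even, because the surcharge multiplied by the price sensitivity equals the utility gap.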

In fact, this may be a task impossible for humans to solve comprehensively. Thus, the problem lends itself nicely to machine learning, a field of computational science that allows computers to learn from data and make data-driven decisions. Machine learning (more specifically supervised learning) assumes that the empirical phenomenon follows a function by which it can be modeled and, to a certain extent, predicted (Goodfellow et al., 2016). Since its inception in 1959 (the term was coined by Arthur Samuel), machine learning has been applied to model complex environments in a multitude of industries, including robotics (Kober and Peters, 2012), environment and healthcare (Kubat et al., 1998; Choi et al., 2016; Chen et al., 2017) and transportation (Hajibabai and Ouyang, 2013).
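As a minimal illustration of the supervised learning assumption that a phenomenon follows a learnable function, the sketch below fits a linear function with weights β by ordinary least squares. The linear form and the noise-free toy data are assumptions made only for illustration; the models used later in the thesis (Gaussian processes and decision trees) are more flexible.

```python
import numpy as np

# Hypothetical training set D: one feature x and a target y
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # generated by y = 1 + 2x, without noise

# Design matrix with a column of ones for the intercept beta_0
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of beta = (beta_0, beta_1)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # close to [1, 2], the function that generated the data

# Inference for a new feature value x = 5 (leading 1.0 is the intercept term)
x_new = np.array([1.0, 5.0])
print(x_new @ beta)  # close to 11
```

Once β has been learned from the training set, predicting the target for an unseen input is a single evaluation of the fitted function.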

1.1 Problem Formulation

Pricing strategies are among the most impactful business decisions with regard to survivability in competitive markets (Cross, 2011). Identifying and acknowledging the factors that together make up the customer’s willingness-to-pay is paramount to understanding which pricing is most appropriate for balancing demand and reaching optimal profit levels. This task is difficult, as it involves an intricate understanding of macro-level psychology and sociological trends pertaining to the origin of the demand for the given product or service. Furthermore, the complexity of the problem increases when one considers that willingness-to-pay varies across customer segments and merchants. The complexity of the problem as a whole, as well as that of the individual factors themselves, makes it nearly impossible to tackle through traditional methods of analysis.
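Why differences in willingness-to-pay (WTP) between segments matter for pricing can be illustrated with a simple expected-revenue calculation. If a customer buys whenever their WTP is at least the price p, the expected revenue per customer is p · P(WTP ≥ p). The sketch below assumes two hypothetical, equally large customer segments whose WTP is uniformly distributed on [0, 40] and [0, 80] in arbitrary price units, and compares the best single fixed price with segment-specific prices found by grid search. Capacity, delivery costs and demand balancing are ignored, and the numbers are not taken from the thesis's study.

```python
def prob_buy(price, max_wtp):
    """P(WTP >= price) when WTP is uniformly distributed on [0, max_wtp]."""
    return min(max(1.0 - price / max_wtp, 0.0), 1.0)


def expected_revenue(price, segments):
    """Expected revenue per arriving customer when every segment sees the same price."""
    return sum(share * price * prob_buy(price, max_wtp) for share, max_wtp in segments)


# Hypothetical segments: (share of customers, upper bound of uniform WTP)
segments = [(0.5, 40.0), (0.5, 80.0)]
grid = [p / 10 for p in range(0, 801)]  # candidate prices 0.0, 0.1, ..., 80.0

# Best single (fixed) price for all customers
fixed_price = max(grid, key=lambda p: expected_revenue(p, segments))
fixed_rev = expected_revenue(fixed_price, segments)

# Segment-specific prices: each segment is optimized on its own
segmented_rev = sum(
    max(share * p * prob_buy(p, max_wtp) for p in grid) for share, max_wtp in segments
)

print(fixed_price, round(fixed_rev, 2), round(segmented_rev, 2))
```

Under these assumptions, the best fixed price is about 26.7 with an expected revenue of about 13.33 per customer, whereas pricing the two segments separately (at 20 and 40) yields 15.0. The gain comes entirely from being able to tell the segments apart, which is why segmentation and WTP prediction are central to the problem.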


1.2 Purpose

The purpose of this paper is to investigate the potential of applying machine learning to address the complexity of revenue management in the last mile delivery industry. More specifically, we intend to explore approaches to modeling demand and willingness-to-pay in order to produce a pricing strategy that balances demand, thus increasing both profits and the accuracy of delivery times.

1.3 Research Questions

From the stated purpose, we derive the following main research question in an attempt to explore the problem formulation of the study. Furthermore, we divide the main research question into sub-questions that concretize the angle of subsequent research.

Main Research Question: How can revenue management be utilized to balance demand and increase profits in the last mile delivery industry?

Sub Question 1: Which factors are essential to consider in the endeavor of increasing profits through dynamic pricing models considering multiple vendors and temporal demand variations?

Sub Question 2: How could machine learning be utilized to model both demand and willingness-to-pay of customers?

1.4 Expected Contribution

This paper studies an approach to modeling customers’ willingness-to-pay for last mile delivery services. To achieve this, we make use of revenue management frameworks that facilitate the interaction between customers and a prediction interface. By applying this framework to the last mile delivery industry, we intend to make a conceptual contribution by identifying new theoretical constructs and feedback mechanisms. Furthermore, this thesis makes use of ensemble learning (a mixed-model machine learning approach) both to segment customers and to predict their willingness-to-pay as a function of multiple influential factors, an approach to dynamic pricing that has remained largely unexplored. Therefore, this thesis also aims to make an empirical contribution to the intersecting fields of machine learning and revenue management.

1.5 Delimitations

From a theoretical perspective, this paper is delimited to dynamic pricing strategies and the factors that are necessary to consider in determining them. This delimitation is extended to encompass the surrounding field of revenue management. Furthermore, we examine approaches to constructing dynamic pricing strategies through the use of machine learning, approximating the environment’s demand function and the customers’ willingness-to-pay.

We also delimit the study to focus solely on the last mile delivery industry, although literature from other industries is included to provide the research with context. As the operations of the project commissioner are currently limited to Sweden, the paper will focus on last mile delivery within Sweden. However, the generalizability of the results will be discussed in Chapter 5.

1.6 Project Commissioner

Budbee is a relatively small last mile delivery service provider for goods that have been ordered online (Budbee.com, 2017). The company was founded in 2012 with the purpose of making deliveries of goods more convenient for the customer. Their business is mainly B2B, whereby Budbee partners with merchants and becomes a part of their post-checkout offering. However, they also offer B2C services to the end customer through various additional services. In this thesis, a customer refers to the end consumer, whilst we use merchant or vendor in reference to Budbee’s B2B customers.

Budbee’s competitive advantage lies in their accuracy in delivering on time, promising to arrive within the customer’s requested time slot and maintaining a money-back guarantee in case they fail to deliver on this promise. In the first quarter of 2018, they retained a customer satisfaction rating of approximately 4.9/5. They continuously work to improve their service and extend their product to become more transparent and convenient. Any changes introduced to the product must align with the competitive advantage of the company.

This project was conceived as a part of Budbee’s effort to improve upon a recently released feature of their service, namely delivery time slot allocation. This feature aims to increase the convenience for the end users of the service, allowing them to plan their day-to-day activities around the awaited delivery. To date, the price of these time slots has been determined through the expertise of Budbee’s employees.

However, in order for the feature to be optimal in terms of profitability, the price must account for factors such as the willingness-to-pay for various customer segments as well as the overhead cost of the delivery destinations.

1.7 Thesis Outline

As stated previously, this thesis concerns the intersection of machine learning and revenue management. In Chapter 2, we present literature in the fields of operations research, revenue management, dynamic pricing and demand models. This chapter concludes with a brief description of machine learning, aimed at providing a fundamental understanding of its components, approaches and goals. In Chapter 3, some general frameworks that will be applied to the last-mile delivery industry are briefly described. Chapter 4 delves into the key machine learning concepts used in the study, aimed at familiarizing the reader with the technical field. It should be noted that this chapter contains some more advanced sections towards the end; however, the initial sections offer examples that aid in understanding the key concepts of machine learning. Chapter 5 describes the research method, and Chapters 6 and 7 detail the quantitative study that constitutes the empirical part of this thesis. The penultimate chapter discusses the application of revenue management to the last-mile delivery industry, as well as the performance and flexibility of the proposed dynamic pricing model. Finally, the last chapter concludes the thesis with some key findings.


Chapter Summary

In this chapter, we reviewed revenue management, a field within operations research in which the seller predicts consumer behavior to increase revenue. The need for revenue management became apparent in the hospitality industry, as the prevailing pricing strategies were unfavorable for the customer. In the last-mile delivery industry, the motivation is of another nature. We addressed the opportunity for dynamic pricing to be used as a means to balance demand, allowing intermediary service providers to be more accurate in the promises they make to customers. We then presented the purpose of this study: to model the demand function of time slots through the use of machine learning. From this purpose, we presented the research questions that the thesis intends to answer, followed by the paper’s contributions and delimitations. Finally, we introduced the project commissioner, Budbee, a delivery company within the last mile delivery industry.


2 Literary Review

This chapter introduces and details the management-related concepts of the thesis. It offers a brief history of operations research, a field providing analytical tools for decision support, and outlines the focus areas of the field. Subsequently, the chapter describes revenue management and the conditions under which the business practice can be successfully applied. One of the prerequisites for employing dynamic pricing is building an understanding of demand and its constituent factors. To this end, we review some approaches to modeling demand and describe the weaknesses of each approach in the thesis setting. In the penultimate section of this chapter, some dynamic pricing strategies are examined; most notably, this section offers an example of how Uber gathers and utilizes demand data. Finally, we summarize some machine learning applications within the fields of operations research and revenue management.

A more in-depth description of relevant machine learning techniques can be found in Chapter 4.

2.1 Operations Research

Operations management entails activities relating to the management of resources used in order to produce and deliver a particular service or product (Slack et al., 2010), encompassing the design and management of products, processes and supply chains. Whilst operations management generally examines the functions that ensure day-to-day customer fulfillment, operations research (OR) can be seen as the tools by which to achieve this. The objective of OR is to provide decision-makers with analytical support, through consideration of data generated by the examined processes.

To achieve this objective, operations researchers employ a wide variety of analytical techniques, including: problem structuring methods, queuing theory, decision analysis, forecasting techniques, game theory, and optimization (Collins and Currie, 2012).

In simple terms, operations research entails looking at an organization’s operations and subsequently using mathematical or computer scientific models to aid in decision making and to improve upon these operations.

Operations research has strong ties to mathematics and builds upon many of the fundamentals of statistics, such as the expected value of outcomes, dating back to the work of Blaise Pascal in 1654 (Amor et al., 2004). As a result, the genesis of the scientific field remains unresolved. Even academia seems divided as to the origins of OR. Some state that it was first used during World War II (Gass and Assad, 2005; Collins and Currie, 2012), whilst others believe that it was conceived far earlier than that (Eiselt and Sandblom, 2012). In World War II, OR was predominantly used to increase the efficacy of war operations. Oftentimes, the methodologies employed were simple, consisting of trial and error followed by subsequent measurements and corrections.

Examples range from determining the optimal size of convoys in order to minimize the loss of Allied cargo ships, to determining the undersurface color of British bombers most suitable to each operation. Despite the changes being relatively small, they had a major impact on the overall war effort (Kirby, 2003).

Akin to the scenarios in the war, applications of OR have since focused on small but impactful strategic problems. Vidale and Wolfe (1957) analyzed the efficacy of product advertisement as a means to increase profits, revealing dynamics of market perception of advertisement (such as the response constant, saturation level and sales decay constant) that could be used to produce optimal advertisement strategies. On the academic side, Wagner (1969) brought many of the predominant OR methods of the time (linear and dynamic programming) to prominence within the business administration community. His critically acclaimed book gave insight into how to tackle problems such as inventory control and equipment replacement with operations research. A few years later, Luss and Gupta (1975) examined the resource allocation problem through the lens of operations research. By measuring the return of an activity as a function of the effort put into it, it is possible to increase the value obtained from resources. This research emphasizes the widespread applicability of OR, in this case ranging from the allocation of advertisement resources among sales territories to portfolio selection and budgeting problems.

The acceleration of the global economy at the end of the twentieth century raised customers’ expectations in terms of both cost and service (Buffa, 1980; Swaminathan et al., 1998). In turn, this prompted a focus on supply chain management, specifically on the objectives of on-time delivery, quality assurance and cost minimization. Nahmias (1979) identified that inventory-related problems had, up until that point, received relatively little attention. This is reflected in the industrial setting as well. Maloni and Benton (1997) studied the state of competition for manufactured products between the Pacific region and its American and European counterparts.

They concluded that the West, previously the dominant player in these industries, must review its supply chains to stay competitive, which further pushed the OR agenda. It appears that these industries realized this problematic situation as well. Some four years later, Geoffrion and Krishnan (2001) studied the e-commerce industry and found widespread adoption of operations research within it.

In some ways, operations research, in combination with pressure from the global economy, has helped push the efficient frontier of availability, building the foundation for many large, global companies such as Amazon, Alibaba and eBay.

2.2 Revenue Management

As previously mentioned, this paper reviews the means by which profits can be increased through dynamic pricing. This entails applying operations research to revenue management. In this section, we review and contrast this field to provide a foundation for subsequent analysis.

Revenue management (RM), also known as yield management, has gained traction within the OR community. It is generally regarded as one of the most successful application areas of OR (Talluri and Van Ryzin, 2006). Put simply, RM is a business practice concerned with sales decisions: how to segment the customer base and take advantage of customers’ purchasing behaviors and willingness-to-pay. It also concerns seasonal variations in pricing, where the company experiences temporal variations in cost and demand. In an analysis of the application areas of revenue management, Weatherford and Bodily (1992) concluded that it is generally employed in situations with three characteristics. Firstly, there is a time interval during which the product or service is available, after which it either becomes unavailable or diminishes in value. Secondly, there is a fixed number of units, and adding units is either subject to significant lead times or imposes high costs. Finally, customers can be segmented based on their price sensitivity or willingness-to-pay.

There is not much to be said about revenue management up until the late 1980s, perhaps due to the limited number of products and services exhibiting Weatherford and Bodily’s three characteristics. The subject rose to prominence with the commercialization of airfare, and there is a wealth of literature covering RM within this industry (Sa, 1987; Belobaba, 1987; Kimes, 1989; Cross, 2011; Smith et al., 1992). Shortly after, the subject expanded to encompass other industries as well. Kimes et al. (1998) have been instrumental in this regard. They studied the application of RM in a wide array of industries and compared them with the more traditional RM industries (hotels, airlines, cruise lines). She describes the typology of revenue management as four quadrants, depending on the combination of variable pricing and duration control.

Variable pricing refers to whether the industry typically employs a fixed or dynamic pricing strategy for its products and services, i.e. whether the price varies with sales-related factors. The duration control parameter refers to whether the industry has any way to control or forecast how long its customers use perishable assets. Perishable assets diminish in value if they are not rented or used, e.g. rental cars or tables at a restaurant. Kimes et al. (1998) also noted that the most successful applications of RM occur within quadrant two (variable pricing and predictable duration). Companies looking to employ the business practice should, if possible, try to position themselves within this quadrant by manipulating duration and price.

More recent studies have examined the psychology of buyers towards revenue manage- ment practices. Bougie et al. (2003) examined the customer’s perception of fairness in connection to RM practices and the effect that this perception had on businesses.

Consequences of a negative fairness perception include a loss of repeat customers, negative word-of-mouth and, in some extreme cases, violent behavior. Heo and Lee (2011) contrasted two surveys conducted eight years apart (1994 and 2002), aiming to measure the perception of fairness in the hotel and airline industries. In the first study, when RM was new to the hotel industry, people felt they were treated unfairly when booking identical hotel rooms and paying different rates. In the airline industry, where RM had been present for some time, the perception was different.

Instead, people felt that they were being treated fairly. However, the second study revealed that this disparity had diminished significantly as people had grown accustomed to the practice. This indicates that transitions from quadrant one to quadrant two require some measure of tact to avoid the negative ramifications that come with them.


2.3 Approaches to Model Demand

The main research question of the thesis states that it intends to provide clarity in how to balance demand and increase profits in the last-mile delivery industry. In order to achieve this, one must look at the dynamics of demand itself and uncover approaches to manipulating them.

In the early days of revenue management, the models and motives of companies were fairly simple (Cross, 1998; McGill and Van Ryzin, 1999). In industries with limited, temporary and allocated products, the goal was to fill vacancies with customers who were willing to pay less than full price, in order to increase margins. According to the academic review by McGill and Van Ryzin (1999), airlines employed dynamic pricing by offering discounted tickets to “early birds”: passengers booking a predetermined number of days in advance. The obvious conundrum with this policy is that the number of full-fare passengers varies unpredictably between flights, limiting the efficacy of such approaches. It became clear that employing revenue management efficiently requires an understanding of demand and its constituent factors. Quoting Hicks (1939):

Discussion of equilibrium conditions is always a means to an end; we seek information about the conditions governing quantities bought at given prices in order that we may use them to discover how the quantities bought will be changed when prices change.

Understanding demand is largely a matter of understanding the product or service for sale. Friedman (2017) presents a rather simple model in which, for any given price of a commodity, one would expect a particular quantity per unit time to be sold, generally related in an exponential manner. In a more complex scenario, he suggests that a commodity or service may be subject to composite demand, meaning demand aggregated over all of its uses. However, this model seems rather one-dimensional, especially considering that some products or services are multi-dimensional in their competitive advantages. Furthermore, it is not a reasonable model to employ when dealing with limited products and services.

Whilst demand is in large part a matter of understanding the commodity, this does not seem to provide an all-encompassing explanation. Perhaps the answer lies partially within the characteristics of the commodity. For example, Bayer et al. (2011) researched modeling demand in the real estate market. They observed that models up to that point were static, in the sense that they did not consider customers’ perception of the future value of the property. Whilst such a parameter may be important when purchasing a house, this is not necessarily the case in other scenarios. Epple (1987) proposed a demand model based on multi-characteristic hedonic theory. Hedonic demand theory builds on the assumption that the market price of a product or service can be modeled as a function of a vector of desirable attributes. In an article by Rosen (1974), a framework is presented detailing how the market price emerges given enough interaction between the supply and demand sides of a commodity. However, he does not address how this transparent approach to selling a commodity affects the customer’s willingness-to-pay. Hedonic theory provides a more detailed model when it comes to evaluating the importance of various characteristics of the commodity. However, it does not cover factors outside the commodity that may affect willingness-to-pay.
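Hedonic models are often estimated with a linear specification. The following illustration assumes that linear form; it is not taken from Epple (1987) or Rosen (1974). Let a commodity be described by an attribute vector z = (z_1, …, z_K). Its price p(z) and the implicit (marginal) price of attribute k can then be written as:

```latex
p(\mathbf{z}) = \beta_0 + \sum_{k=1}^{K} \beta_k z_k,
\qquad
\frac{\partial p(\mathbf{z})}{\partial z_k} = \beta_k
```

Every term on the right-hand side describes the commodity itself. Nothing in the expression refers to the buyer, which is precisely the limitation noted above.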

In the last-mile delivery industry, everyone is essentially paying for the same service: getting their goods transported to their doorstep. It is a convenience service, in which people pay money to save time. The value of this trade-off varies between consumers.

So perhaps the demand model should focus less on the service and more on the consumer purchasing it. Clear (2018) pinpoints this perspective with an example in which he is interested in purchasing a bag for $19. Unfortunately, the shipping fee is $45, making him hesitant about the purchase. He contemplates whether $45 is worth the time it would take him to pick the bag up himself. The sentiment portrayed here is not that he is unwilling to pay $45 for shipping any product. Rather, $45 exceeds his value-time trade-off, and there is a significant disparity between the price of the product and that of the delivery service. Aarts and Bijleveld (2014) discuss the subjective value of money, observing that evaluations of monetary gains and losses are influenced by their relation to an absolute, yet individual, reference point. This thought is expounded through the relationship between money and utility. In 1738, Daniel Bernoulli suggested that money provides diminishing marginal utility: each additional monetary unit (e.g. $1) increases utility less than the previous unit did (Stearns, 2000). Conversely, the loss of one monetary unit has a smaller psychological effect the more money an individual has. Argyle and Furnham (2013) note that these effects may be reinforced by circumambient factors as well, such as the strength of the economy or demands from the household. Measuring demand as a function of willingness-to-pay should therefore also try to quantify public sentiment.
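Bernoulli proposed a logarithmic utility of wealth, which makes the diminishing-marginal-utility argument easy to check numerically. The sketch below assumes that log form. The wealth levels are hypothetical, and the $45 fee simply echoes Clear’s shipping example. It compares the utility gained from one extra dollar, and the utility lost to the fee, at different levels of wealth.

```python
import math


def log_utility(wealth):
    """Bernoulli-style utility: the logarithm of total wealth (assumed form)."""
    return math.log(wealth)


def utility_of_extra_dollar(wealth):
    """Marginal utility of one additional monetary unit at a given wealth."""
    return log_utility(wealth + 1) - log_utility(wealth)


def utility_lost_to_fee(wealth, fee):
    """Utility given up when paying `fee` out of `wealth`."""
    return log_utility(wealth) - log_utility(wealth - fee)


# Hypothetical wealth levels; the 45 fee mirrors the shipping example above.
for wealth in (100, 1_000, 10_000):
    print(wealth,
          round(utility_of_extra_dollar(wealth), 6),
          round(utility_lost_to_fee(wealth, 45), 4))
```

Under this assumption, the same $45 fee "hurts" far less at higher wealth. This is one way to express why willingness-to-pay for delivery may depend on the customer rather than on the service alone.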

2.4 Dynamic Pricing Models

If revenue management is viewed as the strategy of managing assets and resources so as to follow market conditions, dynamic pricing is the actual means by which this strategy is executed. Dynamic pricing has become increasingly popular in industries where companies have the ability to store inventory. Elmaghraby and Keskinocak (2003) attribute this increase in popularity to three factors: (1) increased availability of demand data, (2) ease of changing prices via new technologies, and (3) the availability of decision support tools to facilitate the analysis of demand data.

In industries that rely on crowdsourced human resources (e.g. Uber and Lyft), dynamic pricing is of paramount importance, as the supply of and demand for the service change drastically on a daily basis (Banerjee et al., 2015; Chen and Sheldon, 2016). These companies rely on a surcharge multiplier called “surge pricing”, a dynamic pricing strategy employed when the supply of drivers cannot meet the demand for requested rides. This has multiple important implications for the business. Firstly, it acts to balance demand: as prices go up, demand goes down. Secondly, it engenders a positive reaction from drivers, since the surcharge is paid in full to them, facilitating flexible working hours. This example clarifies the factors listed above. Uber gathers demand data simply by recording requested rides, each of which has a source and a destination. This allows it to measure demand across different areas. The surge pricing mechanism allows Uber to control demand. Through the app, drivers gain the decision support tools to analyze areas where they stand to make the most profit (Uber, 2018).
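Uber’s actual surge algorithm is not described in the sources cited here. The sketch below is a hypothetical rule, included only to show how a multiplier can react to supply and demand. It assumes the multiplier equals the ratio of open ride requests to available drivers, floored at 1 and capped at an arbitrary 3.0.

```python
def surge_multiplier(open_requests, available_drivers, cap=3.0):
    """Hypothetical surge rule: scale fares by the demand/supply ratio.

    No surcharge while supply covers demand; the multiplier never exceeds `cap`.
    """
    if available_drivers <= 0:
        return cap  # no supply at all: charge the maximum surcharge
    ratio = open_requests / available_drivers
    return min(max(ratio, 1.0), cap)


def quoted_fare(base_fare, open_requests, available_drivers):
    """Fare shown to the rider after applying the surge multiplier."""
    return round(base_fare * surge_multiplier(open_requests, available_drivers), 2)


print(quoted_fare(100.0, 80, 100))   # supply covers demand: base fare
print(quoted_fare(100.0, 150, 100))  # demand exceeds supply: surcharge applied
```

The floor at 1.0 means prices only rise when drivers are scarce. The cap keeps the price response bounded. Both are design choices assumed for illustration.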

There has also been a plethora of journal articles that attempt to unravel the intricacies of dynamic pricing through the use of sophisticated theoretical models (Gallego and Van Ryzin, 1997). Zhao and Zheng (2000) studied a model in which they assumed inventory to be limited and demand to be unpredictable and price-sensitive.

Any inventory not sold by a certain time horizon is sold at salvage value. The distribution of customers’ reservation prices (the maximum price a customer would accept) is known to the business. The model increases the business’s revenue compared with a fixed-price policy. Ulmer (2017) researched the viability of using Markov decision processes to model the dynamic pricing strategy of same-day delivery. In this setting, the customer is presented with multiple time windows, each with an associated price. The pricing strategy was explored through value function approximation, which determined the price for each offered time slot except the last, reserved for next-day delivery. The value function approximation approach delivered promising results. Compared with a fixed-price policy, it increased revenue by 13.2% and the number of serviced customers by 3.9%.
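The following is a minimal discrete-time sketch of the limited-inventory setting described above. It is not Zhao and Zheng’s own continuous-time model. It assumes the following:

* At most one customer arrives per period.
* Reservation prices are exponentially distributed.
* Prices are chosen from a fixed grid.
* Unsold units are salvaged at the horizon.

All parameter values are hypothetical. Backward induction gives the expected revenue of the best dynamic policy, which can be compared with the best single fixed price.

```python
import math


def purchase_probability(price, mean_reservation_price):
    """P(reservation price >= price), assuming exponential reservation prices."""
    return math.exp(-price / mean_reservation_price)


def expected_revenue(periods, capacity, arrival_prob, mean_reservation_price,
                     salvage, prices, fixed_price=None):
    """Backward induction over a finite horizon with limited inventory.

    Each period at most one customer arrives (probability `arrival_prob`) and
    buys if their reservation price is at least the posted price. Units left
    at the horizon are sold at `salvage`. With `fixed_price` set, that price is
    always posted; otherwise the best grid price is chosen per (period, stock).
    """
    candidates = [fixed_price] if fixed_price is not None else prices
    value = [salvage * units for units in range(capacity + 1)]  # end of horizon
    for _ in range(periods):
        new_value = [0.0] * (capacity + 1)  # no stock left: nothing to earn
        for units in range(1, capacity + 1):
            best = -math.inf
            for price in candidates:
                q = arrival_prob * purchase_probability(price, mean_reservation_price)
                sale = price + value[units - 1]   # sell one unit now
                no_sale = value[units]            # keep the unit for later
                best = max(best, q * sale + (1 - q) * no_sale)
            new_value[units] = best
        value = new_value
    return value[capacity]


# Hypothetical setting: 50 periods, 10 units, 30% arrival chance, mean WTP 60.
grid = [float(p) for p in range(5, 205, 5)]
dynamic = expected_revenue(50, 10, 0.3, 60.0, 5.0, grid)
best_fixed = max(expected_revenue(50, 10, 0.3, 60.0, 5.0, grid, fixed_price=p)
                 for p in grid)
print(f"dynamic policy: {dynamic:.2f}, best fixed price: {best_fixed:.2f}")
```

The dynamic policy maximizes over a set that contains every fixed-price policy, so its expected revenue can never be lower. How much higher it is depends entirely on the assumed parameters.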

Whilst Ulmer (2017) provides a solid foundation for dynamic pricing in same-day delivery, the model presented is highly theoretical, with willingness-to-pay following a stochastic distribution. It can be developed further by including factors that constitute the willingness-to-pay function, in turn facilitating the segmentation of customers.

2.5 Machine Learning

The final section of the chapter reviews some previous machine learning applications within operations research, eventually narrowing down to dynamic pricing. As stated previously, more advanced topics within machine learning (ML) will be covered in Chapter 4. To convey the basic mechanics of the field, we offer a more elementary, summarized explanation of supervised learning below. Supervised learning is a branch of machine learning in which both inputs and outputs are observed. The terms machine learning and supervised learning will henceforth be used interchangeably.

Put simply, ML refers to algorithms that become progressively better at performing a task through observing data, without being explicitly programmed. In this context, explicitly programmed refers to supplying the software with task-specific instructions.

The observed data consists of two pieces of information, namely inputs and outputs.

Outputs are the phenomenon we wish to gain a deeper understanding of, whilst inputs are the variables used to understand that phenomenon. Given zero noise in the measurements of the inputs and outputs, they are instead referred to as features and targets, representing a truthful mapping of correlation. The goal of this school of algorithms is to uncover insights hidden within the data, which are subsequently used to predict future events without human intervention or assistance. This process is typically called generalization, referring to generalizing from a small observed sample to a larger population. Some rudimentary examples of machine learning are presented in Chapter 4.
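The following is a minimal sketch of the input-output idea, assuming a straight-line relationship and a hypothetical five-point sample. An ordinary least-squares line is fitted to observed pairs and then used to predict the output for an input that was never observed. That prediction step is what generalization refers to.

```python
def fit_line(inputs, outputs):
    """Ordinary least-squares fit of outputs = slope * input + intercept."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    sxx = sum((x - mean_x) ** 2 for x in inputs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a (possibly unseen) input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observed sample: five (input, output) pairs.
observed_inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
observed_outputs = [3.0, 5.0, 7.0, 9.0, 11.0]

model = fit_line(observed_inputs, observed_outputs)
# Generalization: predict the output for an input that was never observed.
print(predict(model, 10.0))
```

No rule such as "output = 2 × input + 1" is programmed explicitly. The relationship is recovered from the examples alone.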


The application of artificial intelligence and machine learning techniques within operations research is not a novel concept. An academic review of operations research within banking, performed by Fethi and Pasiouras (2010), revealed that the use of such techniques was extensive. The techniques included support vector machines, neural networks, decision trees and nearest neighbor methods. One of the more interesting applications is the use of k-nearest neighbors to identify the weights of performance measures in balanced scorecards, supporting decision making (Sohn et al., 2003). Another example of particular interest is the use of SVMs (support vector machines, discussed in Chapter 4.4) and ANNs (artificial neural networks) for intrusion detection (Chen et al., 2005). The SVM performed surprisingly well, identifying 99.6% of intrusions with a false-positive rate of merely 4.17%.

Despite Talluri and Van Ryzin (2006) devoting an entire chapter to machine learning techniques in their book “The Theory and Practice of Revenue Management”, there seems to be limited literature covering the intersection between the two subjects.

Gosavii et al. (2002) used reinforcement learning (a school of machine learning that explores learning-by-doing approaches to value inference) to optimize the revenue of airfare ticket pricing, incorporating all parameters within the control of the airline company.

Furthermore, they observed that many scholars faced with similar problems are unable to accommodate all of the model parameters, as the resulting stochastic model would be too large or complex to be computationally tractable. To counteract many of the negative side-effects of make-to-order operations, Barut and Sridharan (2005) proposed using decision-tree based analysis to determine whether to accept an order. In scenarios where demand significantly exceeds capacity, the strategy produced a notable increase in profit margins compared with first-come, first-served policies. Tsai et al. (2009) used neural networks to forecast demand in the railway industry. Such forecasts are paramount to correct decisions on seat allocation, overbooking and pricing. Despite the wide variety of techniques used in revenue management, Gaussian processes and ensemble learning seem to be scarcely employed.

Machine learning could indeed perform well in modeling a wealth of different scenarios. Gosavii et al. (2002), who studied airfare ticket pricing with reinforcement learning, stated that other studies applying revenue management to the airfare industry had trouble accommodating all parameters of the problem. Many articles that apply revenue management use linear programming to optimize certain problems (Anderson et al., 2000; Bertsimas and De Boer, 2005; Zhang and Adelman, 2009). This limits the scope of these studies. Linear programming requires the objective to be expressed as a linear function of decision variables, subject to linear constraints on those variables.
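To make this requirement concrete, below is a textbook-style deterministic linear program for capacity allocation. It is of the kind described in Talluri and Van Ryzin (2006), but it is not necessarily the exact formulation used in the articles cited above. The symbols are assumed for illustration:

* y_j is the number of units of product j to sell.
* p_j is its price.
* D_j is its expected demand.
* c_i is the capacity of resource i.
* a_ij equals 1 if product j consumes resource i and 0 otherwise.

```latex
\max_{\mathbf{y}} \; \sum_{j=1}^{n} p_j y_j
\quad \text{s.t.} \quad
\sum_{j=1}^{n} a_{ij} y_j \le c_i \;\; (i = 1,\dots,m),
\qquad 0 \le y_j \le D_j \;\; (j = 1,\dots,n)
```

Everything the model knows about customers enters through the single number D_j. A willingness-to-pay that depends on many interacting factors has no natural place in this form.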

Many realistic scenarios have too many variables to enable a linear programming approach. This is one of the core strengths of machine learning: it is an example-based approach to learning relationships. With a thoughtful choice of machine learning models, one can study a problem without first formulating it as an explicit optimization program, given a sufficient amount of data. On the other hand, some methods within the realm of machine learning are computationally expensive. Talluri and Van Ryzin (2006) also noted the important use case of Bayesian learning, a school of machine learning that combines prior knowledge or intuition about a problem with observed data. This type of learning lends itself to new problems where observations have yet to be made, for example when launching a new product or service. In these cases, the rules of the industry still apply to some extent. The task is to determine how the components of the studied system interrelate in this new setting.
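A standard way to combine intuition with data is a conjugate Beta-Binomial update. The following is a minimal sketch for a hypothetical new delivery price. The prior encodes the belief that roughly 20% of customers will accept. Early observations then shift that belief. Both the prior and the observed counts are invented for illustration.

```python
def update_beta(alpha, beta, accepted, offered):
    """Conjugate Beta-Binomial update of an acceptance probability."""
    return alpha + accepted, beta + (offered - accepted)


def beta_mean(alpha, beta):
    """Expected acceptance probability under a Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


# Hypothetical prior: intuition says roughly 20% of customers accept a new price
# (Beta(2, 8) has mean 0.2 and is weak, worth about ten observations).
prior = (2.0, 8.0)

# Hypothetical first observations after launch: 12 acceptances out of 30 offers.
posterior = update_beta(*prior, accepted=12, offered=30)

print(f"prior mean {beta_mean(*prior):.2f} -> posterior mean {beta_mean(*posterior):.2f}")
```

A weak prior lets the data dominate quickly. A stronger prior, with larger alpha and beta, would encode more confidence in the industry’s existing rules.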

Chapter Summary

Operations research has traditionally been focused around small but impactful problems.

The typical procedure within OR involves measuring a phenomenon and subsequently making adjustments. One of the most successful applications of OR is revenue management, conceived by airlines to take advantage of variations in customers’ willingness-to-pay. The typology of RM can be divided into four quadrants based on a combination of variable pricing and duration control. The most successful applications of RM occur when prices are variable and the duration of perishable asset use is predictable.

Traditionally, industries that have moved toward this quadrant have been met with some backlash from consumers, as the practice can be perceived as unfair. When modeling demand, many models focus entirely on the characteristics of the commodity. Hedonic theory is one of these, regarding demand as a multivariate function of the commodity’s constituent, desirable characteristics. Whilst it gives a detailed description of the commodity itself, it does not incorporate customers’ willingness-to-pay. This limitation carries over to some theoretical models of dynamic pricing, which build on the assumption of a univariate willingness-to-pay function. This clarifies the positioning of the thesis: the study of willingness-to-pay as a multivariate function of circumambient factors.


3 Theoretical Frame of Reference

This chapter introduces concepts that serve as a foundation for theory-based decisions and discussion in the thesis. The strategic levers of price and time are clarified, and suggestions are offered as to how they could be incorporated into the pricing model. Subsequently, the chapter gives a brief outline of a revenue management information system, detailing the steps by which the customer interacts with the pricing model.

3.1 Strategic Levers: Price and Time

Kimes et al. (2015) note three strategic levers associated with revenue management strategies in the restaurant industry, namely time, price and space. Traditionally, the interplay between these levers is that a business sells space for a given amount of time, often accompanied by some service (as in the case of airfare, hotels and restaurants).

The strategic lever of space has limited applicability to the same-day delivery industry and is therefore discarded.

In this thesis, the strategic lever of price concerns the equilibrium between profit and growth. Under the assumption that demand increases as prices decrease, the business has the ability to control this lever through the pricing policy. In the same way that a dynamic pricing strategy exploits willingness-to-pay as a function of the consumer’s perception of value, the value of this strategic lever can be altered by manipulating the perceived price function. Denoting the model function as M(π) where π is the price, the strategic lever can be implemented as a constant c, following:

$$M(\pi) = f(\pi) + c \qquad (1)$$

The constant controls the model’s perception of the customers’ price sensitivity. A positive value will lead to a decrease in the average price of the pricing policy, whilst a negative value will have the opposite effect. The exact formulation of the model function will be expounded in later chapters.
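The thesis leaves f unspecified at this point. The following is a minimal sketch under two assumptions: f(π) = π, and a hypothetical logistic acceptance curve driven by the perceived price M(π). The midpoint of 50 and scale of 5 are invented. For each value of the constant c, the sketch finds the revenue-maximizing price on a grid, to show the direction of the effect.

```python
import math


def accept_prob(perceived_price, midpoint=50.0, scale=5.0):
    """Hypothetical logistic acceptance: half of customers accept at `midpoint`."""
    return 1.0 / (1.0 + math.exp((perceived_price - midpoint) / scale))


def model_function(price, c):
    """M(pi) = f(pi) + c, with f assumed to be the identity for illustration."""
    return price + c


def optimal_price(c, grid_step=0.01, max_price=100.0):
    """Grid search for the price maximizing price * acceptance probability."""
    best_price, best_revenue = 0.0, -1.0
    for i in range(int(max_price / grid_step) + 1):
        price = i * grid_step
        revenue = price * accept_prob(model_function(price, c))
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price


for c in (-5.0, 0.0, 5.0):
    print(f"c = {c:+.0f}: optimal price {optimal_price(c):.2f}")
```

Under these assumptions, a positive c makes every price look more expensive to the model, so the optimal price falls. A negative c raises it, matching the behavior described above.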

The strategic lever of time, in the same-day delivery industry, concerns the balance between service and the number of customers per vehicle. This lever relates to the service time of each customer: the time it takes to deliver the package once the delivery destination has been reached. A low service time allows each vehicle to serve more customers, but may decrease customer satisfaction. This lever can be manipulated by altering the cost function in the dynamic pricing model. However, for the scope of this thesis, the cost function will remain constant during testing.
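Since the thesis holds the cost function fixed, the following only illustrates the trade-off. It uses a hypothetical model and invented numbers:

* an eight-hour (480-minute) shift;
* six minutes of driving per stop;
* a vehicle cost of 1200 per shift, in unspecified currency units.

The sketch shows how service time per customer changes the number of stops per vehicle, and with it the cost a pricing model would need to recover per delivery.

```python
def deliveries_per_shift(shift_minutes, drive_minutes_per_stop, service_minutes):
    """Whole number of stops one vehicle can complete in a shift (hypothetical model)."""
    return int(shift_minutes // (drive_minutes_per_stop + service_minutes))


def cost_per_delivery(vehicle_cost_per_shift, shift_minutes,
                      drive_minutes_per_stop, service_minutes):
    """Vehicle cost spread evenly over the stops it can complete."""
    stops = deliveries_per_shift(shift_minutes, drive_minutes_per_stop, service_minutes)
    return vehicle_cost_per_shift / stops


# Hypothetical: 480-minute shift, 6 minutes driving per stop, cost 1200 per shift.
for service in (2, 4):
    print(service,
          deliveries_per_shift(480, 6, service),
          cost_per_delivery(1200, 480, 6, service))
```

Doubling the service time from two to four minutes cuts the stops per vehicle from 60 to 48 in this example. It raises the cost per delivery from 20 to 25, which is the kind of shift the cost function would pass on to prices.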

3.2 Revenue Management Information Systems

Talluri and Van Ryzin (2006) give a brief overview of the components of an RM system, including information flows, controls and designs. The typical process for the system includes the following steps:

1. Data Collection: Collect and store relevant historical data (prices, demand and ambient causal factors).


2. Estimation and Forecasting: Estimate the parameters of the demand model, subsequently using the demand model to forecast demand or willingness-to-pay.

3. Optimization: Measure the results of the model towards the environment it is attempting to imitate, comparing it to the results of previous parameter settings, adjusting if necessary. This step also includes the aforementioned strategic levers, where the pricing function is optimized to reflect the business strategy.

4. Control: Control the sales using the optimized model. In the case of same-day delivery, this could involve contracting more vehicles in the delivery fleet to meet demand and other business functions traditionally handled by the operations department.
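The four steps above can be sketched as a tiny pipeline. This is a hypothetical toy, not the system Talluri and Van Ryzin describe:

* each step is reduced to a single function;
* "estimation" is just an empirical acceptance rate per price;
* "control" only returns the chosen price;
* the sales history is invented.

```python
from collections import defaultdict


def collect(records):
    """1. Data collection: group hypothetical (price, accepted) records by price."""
    by_price = defaultdict(list)
    for price, accepted in records:
        by_price[price].append(accepted)
    return by_price


def estimate(by_price):
    """2. Estimation: empirical acceptance rate at each observed price."""
    return {price: sum(outcomes) / len(outcomes) for price, outcomes in by_price.items()}


def optimize(acceptance):
    """3. Optimization: price with the highest expected revenue per offer."""
    return max(acceptance, key=lambda price: price * acceptance[price])


def control(price):
    """4. Control: publish the chosen price to the sales channel (stubbed here)."""
    return {"offered_price": price}


# Hypothetical history: 10 offers at each of three delivery prices.
history = ([(29, True)] * 8 + [(29, False)] * 2
           + [(39, True)] * 6 + [(39, False)] * 4
           + [(49, True)] * 3 + [(49, False)] * 7)

print(control(optimize(estimate(collect(history)))))
```

In a real system, the customers’ accept or reject decisions at the offered price would be appended to `history`. This closes the feedback loop described for Figure 1.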

Figure 1: Adapted from Talluri and Van Ryzin (2006), illustrating the revenue management process flow. As this paper concerns the process up until the distribution layer, subsequent layers have been excluded.

Figure 1 illustrates the different layers and the components of the revenue management process. In the data layer, information regarding the customers, product and historical pricing is stored. These are communicated to a collection interface, where relevant data for the model is transformed to the correct input. In the RM Model layer, historical data is manipulated and visualized by analysts (operations researchers, statisticians, economists, marketing scientists, etc.). These analysts then perform estimation, forecasting and optimization of demand and pricing policies (Talluri and Van Ryzin, 2006). Subsequently, the customer is presented with price listings for the products (or services) and can choose to either accept or reject them. The result is communicated back to the data layer, providing the process with feedback.


Chapter Summary

In this chapter, we described two strategic levers and how they can be tied to the dynamic pricing model (and revenue management system). The strategic lever of price can be altered by manipulating the function that models the perceived price of a customer. Setting the perceived price high will cause the dynamic pricing policy to adjust to lower pricing, facilitating a growth strategy, and vice versa. The strategic lever of time can be altered through the cost function, which adjusts prices to reflect a change in the service time of each customer. Finally, a revenue management information system was reviewed, detailing the process by which data is collected and transformed to generate a dynamic pricing policy. This process includes the data collection, estimation and forecasting, optimization and control procedures that govern the conditions of the pricing policy.


4 Machine Learning

This chapter constitutes the bulk of the technical theory of the paper. The intention is to give an overview of machine learning (ML) concepts that will be used in subsequent chapters. First, a background of ML and some of its application areas are presented to give context to the reasoning behind employing ML. Second, explanations are offered on key concepts and how they impact learning and prediction of the studied phenomenon.

These explanations will be used in later chapters as support for decisions made in the research process. As stated in an earlier chapter, this chapter assumes the reader has some fundamental knowledge of statistics and probability theory.

4.1 Origin and History

Ever since the inception of the digital computer, researchers have been enthusiastic about the thought of computers learning as we do. In many ways, this was the original intent behind the field of applied statistics now called machine learning. Samuel (1959) started this quest with the hope of programming a computer to learn the game of checkers. In his paper, Samuel proposed two learning algorithms for the classic board game, one being the precursor to modern-day neural networks. The results were surprising: the agents devised strategies based on look-ahead (evaluating potential future board positions) that were able to outperform average checkers players.

The race was on. Researchers scrambled (in the most figurative sense) to unlock the promised potential of machine learning. However, this effort was somewhat misguided.

Michalski et al. (2013) describe this first era of ML as one guided by ambitions to create general-purpose learning systems. At this time, machine learning and AI were nearly synonymous in terms of goals and applications. The models used in this paradigm had little to no task-oriented knowledge or initial structure and largely comprised various forms of neural networks. These models learn by updating weights from which a boolean signal is determined, hence the paradigm name “connectionism”. However, with the publication of the book “Perceptrons” (Minsky and Papert, 1969), describing the limitations of connectionism, the research community grew cold towards these approaches. This sentiment was reinforced four years later by the publication of the Lighthill report (Lighthill, 1973), a government-sanctioned report on the state of AI research in the UK. In the report, Lighthill portrayed the prospects of the research field as abysmal, with limited application areas outside well-constrained toy examples. The report was followed by widespread cuts in funding for ML research across Europe. The progress of connectionist approaches came to an abrupt halt, and the first “AI winter” descended upon the field of ML. The term alludes to a nuclear winter, foreshadowing a steep decline in prosperity.
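The weight-update-and-threshold mechanism described above, and the kind of limitation highlighted in “Perceptrons”, can be shown with the classic perceptron learning rule. The following is a minimal sketch. Its two-input setting and epoch count are illustrative choices. It learns the linearly separable AND function but cannot fit XOR, because no single line separates XOR’s classes.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Classic perceptron rule: nudge weights by the error on each sample."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights[0] += learning_rate * error * x[0]
            weights[1] += learning_rate * error * x[1]
            bias += learning_rate * error
    return weights, bias


def accuracy(weights, bias, samples):
    """Share of samples the trained unit classifies correctly."""
    return sum(predict(weights, bias, x) == t for x, t in samples) / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

for name, data in (("AND", AND), ("XOR", XOR)):
    w, b = train_perceptron(data)
    print(name, accuracy(w, b, data))
```

However long it trains, a single-layer unit cannot exceed 75% accuracy on XOR. Multi-layer networks, which later revived connectionism, remove this restriction.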

This trend became an unfortunate reality of the ML field. Expectations intermittently rose with new breakthroughs and subsequently faltered because of disappointing practical applications (Crevier, 1993). Fortunately, academia soon came to embrace the notion that the distinguishing feature of ML is its ability to learn near-optimal solutions to a problem, given a set of well-defined constraints. Goldberg and Holland (1988) emphasized this notion, stating that there is no apparent reason why ML must borrow from nature, where learning happens seemingly absent of constraints.


Following this newfound direction for the academic field, older methods experienced a resurgence in popularity due to their well-constrained application areas. The development of new methods became more frequent as well. Q-learning was added to the repertoire of reinforcement learning (Watkins and Dayan, 1992), a learning-by-doing school of ML that strives to derive a value function for the actions available in any given state of a system. Breiman (2001) introduced the concept of Random Forests, an extension of the simple decision tree method. A decision tree is a machine learning algorithm that considers the Shannon Entropy of splitting data into groups. A random forest classifies data points based on a majority vote of its independent, constituent decision trees. Moreover, machine learning methodologies started to find successful practical applications in industrial settings, e.g. within pharmacy (Burbidge et al., 2001) and medical treatment (Shipp et al., 2002). One algorithm that found resurgence in this regard was deep learning, a successor to the learning algorithms of Samuel (1959). Deep learning is present in content filtering on social media, personalized marketing on web shops, face and speech recognition, and colorization of black-and-white images (LeCun et al., 2015).
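To make "a value function for the actions of any given state" concrete, the following is a minimal tabular Q-learning sketch on a hypothetical five-state chain. The agent starts at the left end and earns a reward of 1 on reaching the right end. The environment, learning rate, discount factor and episode count are all illustrative assumptions. Because Q-learning is off-policy, a purely random behavior policy is enough to learn the values of the greedy policy.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (-1, +1)    # index 0 moves left, index 1 moves right


def step(state, action):
    """Deterministic chain: walls at both ends, reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=1000, alpha=0.5, gamma=0.9, max_steps=200, seed=0):
    """Tabular Q-learning with a uniformly random (off-policy) behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            a = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, ACTIONS[a])
            # Bootstrapped target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


q = q_learning()
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

The learned values decay by the discount factor with distance from the goal, roughly 1, 0.9, 0.81 and so on. Acting greedily on them moves the agent right from every state.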

Perhaps more significant is machine learning’s reestablishment as a meaningful field of study, reflected in the increased availability of machine learning courses at universities and the increasing demand for data scientists on the labor market.

4.2 Key Concepts

Data, Features and Targets

This thesis focuses solely on a branch of machine learning called supervised learning.

At the center of supervised learning (a subset of machine learning revolving around input-output mapping) is the data itself. Data and information are two terms that are often used interchangeably, although data appears to be the preferred term within the machine learning community. The process of collecting data generally arises from curiosity about a certain phenomenon, which is then quantified. In the trivial case, one measures a single input mapped to a single output. However, it is sometimes necessary to quantify a phenomenon using several inputs. In this case, the input variables are referred to as features, collectively spanning a feature space (the domain of the input vector). Target is another word for observed output and is regarded as the "true" output for an input vector. Machine learning aims to produce a function that closely resembles the target function. In some cases, the researcher extends the machine learning procedure to predict future events, so-called predictive analysis.


Figure 2: Illustration of a toy machine learning example in which 5 samples have been collected. Under the assumption of a zero-noise environment, the output values of the samples are regarded as targets. Example credit to King (2018).

Figure 2 depicts a typical machine learning scenario: a disparity between the predictive and target functions. In this example, observing more samples would decrease this disparity, leading to more accurate predictions. Whilst this example is trivial, realistic problems are often more difficult because measurement inaccuracies introduce noise. In these cases, it may be necessary to take multiple measurements at a given point to gain insight into their variance.
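The claim that more samples shrink the gap between the predictive and target functions can be checked with a small noise-free experiment. As a hedged sketch, the snippet below assumes a target function sin(x) on [0, 2π] (not the function behind Figure 2) and uses piecewise-linear interpolation through the samples as the predictive function; both choices are illustrative assumptions.

```python
import numpy as np

def max_disparity(n_samples):
    """Largest gap between the target and an interpolating predictor built from n noise-free samples."""
    target = np.sin                                   # assumed target function
    x_train = np.linspace(0.0, 2 * np.pi, n_samples)  # zero noise: observed outputs are exact targets
    y_train = target(x_train)
    x_eval = np.linspace(0.0, 2 * np.pi, 1000)
    y_pred = np.interp(x_eval, x_train, y_train)      # predictive function
    return float(np.max(np.abs(y_pred - target(x_eval))))

for n in (5, 10, 50):
    print(n, round(max_disparity(n), 4))
```

With noisy `y_train`, the interpolant would pass through the noise as well, so more samples alone would no longer drive the disparity towards zero; this is where repeated measurements at the same point become useful.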

Feature Selection

Shannon entropy, mentioned in the previous section, is a measure related to the information embedded in the data. Hall (1999) explains that a variable Y has a degree of unpredictability, its entropy, given by:

$$H(Y) = -\sum_{y \in Y} p(y) \log p(y) \tag{2}$$
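Equation 2 translates directly into code. The sketch below estimates p(y) from the relative frequencies of the labels in a sample; the base-2 logarithm (entropy measured in bits) is an assumption, since Equation 2 leaves the base unspecified.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) = -sum_y p(y) log2 p(y), written as sum_y p(y) log2(1/p(y)); p(y) from label frequencies."""
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())

print(entropy(["banana", "orange"] * 500))  # two equally likely classes -> 1.0
print(entropy(["orange"] * 1000))           # a single class, no unpredictability -> 0.0
```

Any other logarithm base rescales H(Y) by a constant factor and therefore does not change which features are preferred.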

The value of including a feature X is given by the difference between the entropy of Y and the conditional entropy of Y given X. If the latter is smaller than the former, our understanding of the target increases by virtue of the feature's inclusion:

$$H(Y) - H(Y \mid X) > 0 \tag{3}$$

Consider the problem of classifying pictures as containing either bananas or oranges, given a set of 1000 training samples. An image is intuitively more likely to contain an orange if its orange pixels outnumber those that are green or yellow. The class in this example corresponds to Y in the equations above, whilst the pixel colors are a feature X considered for inclusion. Another valuable feature might be where the picture was taken, since the two fruits are grown under partly different climatic conditions: bananas mainly in the tropics, whereas oranges are also widely grown in subtropical and Mediterranean climates. However, there are two instances where including a feature might be undesirable. The first is when the feature gives no information about the phenomenon:

$$H(Y) = H(Y \mid X) \tag{4}$$

The second is when the feature is highly correlated with another feature. In this instance, the feature provides little additional information and may contribute to computational difficulties, depending on the classification algorithm.
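Equations 3 and 4, and the correlated-feature case, can be checked numerically by estimating the conditional entropy as a size-weighted average of the entropy within each group of equal feature values. In the hypothetical toy data below, `orange_majority` records whether orange pixels outnumber green and yellow ones (informative), `taken_on_weekend` is constructed to be independent of the label (uninformative), and `few_yellow_pixels` is an exact, perfectly correlated copy of `orange_majority`. The feature names, the data and the base-2 logarithm are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """H(Y) in bits, with p(y) estimated from label frequencies."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def conditional_entropy(labels, *features):
    """H(Y | X1, ..., Xk): entropy within each group of equal feature values, weighted by group size."""
    groups = defaultdict(list)
    for y, *xs in zip(labels, *features):
        groups[tuple(xs)].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups.values())

# Hypothetical toy data: 8 images and three candidate features
labels            = ["orange"] * 4 + ["banana"] * 4
orange_majority   = [True, True, True, False, False, False, False, True]  # informative
taken_on_weekend  = [True, False, True, False, True, False, True, False]  # independent of the label
few_yellow_pixels = list(orange_majority)                                 # perfectly correlated copy

gain_informative = entropy(labels) - conditional_entropy(labels, orange_majority)   # Eq. 3: > 0
gain_irrelevant  = entropy(labels) - conditional_entropy(labels, taken_on_weekend)  # Eq. 4: = 0
gain_redundant   = (conditional_entropy(labels, orange_majority)
                    - conditional_entropy(labels, orange_majority, few_yellow_pixels))  # adds nothing

print(round(gain_informative, 3), gain_irrelevant, gain_redundant)  # 0.189 0.0 0.0
```

The redundant feature would look informative if evaluated on its own; only by conditioning on the features already included does its lack of additional information show up.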

Feature Split

Feature splitting is a concept closely tied to feature selection. Another way to think of entropy is as a measure of how disorganized a dataset is (Harrington, 2012). A feature split organizes a disorganized dataset by splitting it into two or more less disorganized subsets. Consider again the pixel example for classifying bananas and oranges. If the dataset is split based on whether an image contains more orange or more yellow pixels, it is reasonable to expect the resulting subsets to be more organized; that is, each subset is more skewed towards one class than its parent dataset.
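A feature split on a numeric feature can be sketched as a search over candidate thresholds, keeping the one whose two subsets have the lowest size-weighted entropy, i.e. the most organized result. The feature (share of orange pixels per image) and the data are hypothetical, and the binary split is a simplification, since splits into more than two subsets are also possible.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) in bits, with p(y) estimated from label frequencies."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted child entropy) of the best split into value <= threshold and value > threshold."""
    best = (None, entropy(labels))       # not splitting leaves the parent's disorder unchanged
    for t in sorted(set(values))[:-1]:   # the largest value would leave the right subset empty
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if weighted < best[1]:
            best = (t, weighted)
    return best

share_orange = [0.05, 0.10, 0.20, 0.35, 0.55, 0.70, 0.80, 0.90]
labels = ["banana"] * 4 + ["orange"] * 4
print(entropy(labels), best_split(share_orange, labels))  # 1.0 (0.35, 0.0)
```

Because the best threshold here separates the classes perfectly, both subsets are pure and the weighted entropy drops from 1.0 bit in the parent to 0.0.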

Applications of Machine Learning

Goodfellow et al. (2016) give an outline of the general application areas of ML, the most common of which are:

• Classification: In classification tasks, the computer program is asked which of the categories $C = \{c_1, c_2, \ldots, c_k\}$ a datum belongs to. In these tasks, the product of the ML algorithm is generally a function of the form $f: \mathbb{R}^n \to C$. An example of a classification task is mapping images of fruits to their respective labels, in which the datum is a vector of grayscale values and the output a string, e.g. "apple", "orange" or "banana".

• Regression: Regression tasks, unlike their classification counterparts, ask for a prediction of a numerical value given an observed datum. Instead of producing a label belonging to a category, the function is of the form $f: \mathbb{R}^n \to \mathbb{R}$. An example of a regression task is stock forecasting where, given historical values of the stock and other relevant indicators (such as newspaper articles and industry trends), the algorithm predicts a numerical value corresponding to the future value of the stock.

• Transcription: In this type of task, the machine learning algorithm observes an unstructured representation of some kind of data and transcribes the information into discrete, textual form. Transcription is a common approach within speech recognition, where the intent is to transcribe human speech to text.

• Anomaly Detection: In anomaly detection, machine learning algorithms sift through a set of objects and determine which of them are unusual or atypical. Anomaly detection has frequently been used in credit card fraud detection, where ML can distinguish purchases made on a credit card that differ from the owner's typical purchases. This allows the bank to freeze the connected account, preventing further fraudulent purchases.
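The two function forms $f: \mathbb{R}^n \to C$ and $f: \mathbb{R}^n \to \mathbb{R}$ can be contrasted with two deliberately simple learners. This is a sketch, not the methods described by Goodfellow et al. (2016): a nearest-centroid classifier returns a category label, and a least-squares linear model returns a real number. The fruit features and the regression data are hypothetical.

```python
import numpy as np

def fit_nearest_centroid(X, labels):
    """Classification: returns f: R^n -> C, the label of the closest class mean."""
    labels = np.asarray(labels)
    centroids = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    return lambda x: str(min(centroids, key=lambda c: np.linalg.norm(x - centroids[c])))

def fit_least_squares(X, y):
    """Regression: returns f: R^n -> R, a linear model with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])  # append a bias column
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda x: float(np.append(x, 1.0) @ w)

# Hypothetical fruit features: [mean hue, elongation]
X_fruit = np.array([[0.15, 3.0], [0.17, 2.8], [0.08, 1.0], [0.07, 1.1]])
classify = fit_nearest_centroid(X_fruit, ["banana", "banana", "orange", "orange"])

# Hypothetical regression data following y = 2x + 1
X_num = np.array([[0.0], [1.0], [2.0], [3.0]])
regress = fit_least_squares(X_num, np.array([1.0, 3.0, 5.0, 7.0]))

print(classify(np.array([0.16, 2.9])), round(regress(np.array([4.0])), 6))
```

Both learners consume the same kind of input, a vector in $\mathbb{R}^n$; only the codomain of the returned function differs.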

4.3 Bayesian Machine Learning

There are two different approaches to probability, the frequentist and the Bayesian, each useful in different scenarios. They differ in their interpretation of probability, a distinction best illustrated through an example (Ipeirotis, 2018).

Example: A coin lands on heads with probability p and on tails with probability 1 − p. In a trial run of 14 flips, the coin lands on heads 10 times and on tails 4 times. The question is: "Will the next two tosses of the coin both land on heads?"

Frequentist Approach

The frequentist views this probability solely through the lens of the observed data. In this sense, he/she estimates p by its maximum likelihood estimate, $\hat{p} = 10/14 \approx 0.714$, and hence the probability of two consecutive heads as $\hat{p}^2 = (10/14)^2 \approx 0.714^2 \approx 0.51$. Thus, the frequentist would say that it is more likely than not that the next two tosses are both heads. However, this reasoning has an inherent flaw: concluding from only 14 trials that the coin is loaded towards heads may lead to erroneous estimates. If our intuition that coins are fair holds true, the probability would instead be $0.5^2 = 0.25$, meaning that the frequentist estimate is off by 0.26.
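The frequentist arithmetic above can be reproduced in a few lines: the maximum likelihood estimate of p is the observed share of heads, and, treating tosses as independent, the probability of two heads is its square. The fair-coin value 0.5 stands in for the intuition about coins mentioned above.

```python
heads, tails = 10, 4

p_mle = heads / (heads + tails)   # maximum likelihood estimate of p
p_two_heads_mle = p_mle ** 2      # tosses assumed independent
p_two_heads_fair = 0.5 ** 2       # under the intuition that the coin is fair

print(round(p_mle, 3), round(p_two_heads_mle, 2), round(p_two_heads_mle - p_two_heads_fair, 2))
# 0.714 0.51 0.26
```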

Bayesian Approach

Bayesian statistics incorporates intuition into the probability model. Furthermore, p is no longer considered a fixed value to be estimated; rather, p is a random variable associated with a probability distribution over its possible values. Accordingly, the Bayesian approach aims to determine the probability of p given the data D:

$$P(p \mid D) = \frac{P(D \mid p) \cdot P(p)}{P(D)} \tag{5}$$

which follows from Bayes' Theorem (Bayes and Price, 1763). Each of these terms has a specific name:

• $P(p \mid D)$ = Posterior

• $P(D \mid p)$ = Likelihood

• $P(p)$ = Prior

• $P(D)$ = Evidence

These terms will be used frequently throughout this thesis. In the coin example, the likelihood in Equation 5 takes the form of a binomial distribution:

$$P(D \mid p) = \binom{14}{10} \cdot p^{10} \cdot (1 - p)^{4} \tag{6}$$
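Equations 5 and 6 can be evaluated numerically on a grid of candidate values of p, which makes each named term explicit. The prior is an assumption here: a uniform prior P(p) = 1 on [0, 1] is used purely for illustration, and the evidence P(D) is obtained by numerically integrating likelihood times prior. Under these assumptions, the probability that the next two tosses are heads is the posterior average of $p^2$.

```python
import math
import numpy as np

heads, tails = 10, 4
p = np.linspace(0.0, 1.0, 10_001)  # grid of candidate values of p
dp = p[1] - p[0]

likelihood = math.comb(heads + tails, heads) * p**heads * (1 - p)**tails  # Eq. 6
prior = np.ones_like(p)                                                   # assumed uniform prior on [0, 1]
evidence = np.sum(likelihood * prior) * dp                                # P(D), Riemann-sum approximation
posterior = likelihood * prior / evidence                                 # Eq. 5, a density over p

p_two_heads = float(np.sum(p**2 * posterior) * dp)  # P(next two tosses are heads | D)
print(round(p_two_heads, 3))
```

Under the uniform prior this gives roughly 0.485, slightly below the frequentist 0.51; a prior concentrated around 0.5 would pull the estimate towards the fair-coin value of 0.25.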
