
MSc. in Applied Financial Mathematics and Finance

Cost optimization in the cloud
– An analysis on how to apply an optimization framework to the procurement of cloud contracts at Spotify

Harald Ekholm
Daniel Englund

Department of Management and Engineering, Division of Production Economics
Linköping University

Spring term, 2020

ISRN: LIU-IEI-TEK-A--20/03859--SE

Supervisor: Jörgen Blomvall
Examiner: Jonas Ekblom



Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years starting from the date of publication, barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law, the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its home page: http://www.ep.liu.se/.


Abstract

In the modern era of IT, cloud computing is becoming the new standard. Companies have gone from owning their own data centers to procuring virtualized computational resources as a service. This technology opens up for elasticity and cost savings. Computational resources have gone from being a capital expenditure to an operational expenditure.

Vendors such as Google, Amazon, and Microsoft offer these services globally with different provisioning alternatives. In this thesis, we focus on providing a cost optimization algorithm for Spotify on the Google Cloud Platform. To achieve this we construct an algorithm that breaks the problem into four parts. Firstly, we generate trajectories of monthly active users. Secondly, we split these trajectories up by region and redistribute monthly active users to better describe the actual Google Cloud Platform footprint. Thirdly, we calculate usage-per-monthly-active-user quotas from a representative week of usage and use these to translate the redistributed monthly active user trajectories into usage. Lastly, we apply an optimization algorithm to these trajectories and obtain an objective value. The results are then evaluated using statistical methods to determine their reliability.

The final model solves the problem to optimality and provides statistically reliable results. As a consequence, we can give recommendations to Spotify on how to minimize their cloud cost while considering the uncertainty in demand.


Acknowledgements

We would like to thank our university supervisor Jörgen Blomvall and our Spotify supervisor Scott Meyer for their thoughts, feedback, and continuous support. Without you this thesis never would have seen the light of day. In addition, we would like to thank our examiner Jonas Ekblom, our buddy at Spotify Sonja Ericsson, and our Spotify manager Anders Hagman for valuable input throughout this thesis.


Table of contents

1 Introduction
  1.1 Purpose of research
  1.2 Research delimitation
  1.3 Definitions
2 Background
  2.1 Google Compute Engine
    2.1.1 Machine types
    2.1.2 Regions and zones
    2.1.3 Pricing and billing
    2.1.4 Sourcing agreements and their impact on the problem
  2.2 The situation at Spotify
    2.2.1 Applications and platforms in the cloud
    2.2.2 The architecture
    2.2.3 A shift in machine types
  2.3 Use case of cloud consumption
3 Method Overview
  3.1 Data collection
  3.2 Evaluation of potential methods
  3.3 Model design and configuration
  3.4 Evaluation of model
  3.5 Model overview
4 Theory
  4.1 Stochastic Processes
    4.1.1 Wiener Process
    4.1.2 Generalized Wiener Process
  4.2 Ornstein–Uhlenbeck process
  4.3 Linear programming
  4.4 Stochastic Programming
    4.4.1 Multi-stage stochastic programming
  4.5 Scenario generation
    4.5.1 Sample-Average Approximation
  4.6 Variance reduction techniques
    4.6.1 Inverse transform sampling
    4.6.2 Antithetic sampling
  4.7 Pessimistic and optimistic bounds
5 Method
  5.1 Data collection
    5.1.1 Quantitative data
    5.1.2 Qualitative data
  5.2 Evaluation of potential methods
  5.3 Model design and configuration
    5.3.1 MAU - stochastic process
    5.3.2 Estimation of the volatility
    5.3.3 Redistribute MAU to reflect GCE usage
    5.3.4 Mapping MAU to GCE resource usage
    5.3.5 On-demand cost function
    5.3.6 On-demand cost function algorithm
    5.3.7 Linearize the on-demand cost function
    5.3.8 Overview of the optimization model
    5.3.9 The optimization framework
    5.3.10 Scenario generation and estimation of solutions
  5.4 Evaluation of result
    5.4.1 Optimality gap and confidence intervals
6 Result and analysis
  6.1 MAU simulation
  6.2 MAU to GCE
  6.3 On-demand cost function
  6.4 Optimization results
  6.5 Evaluation of results
7 Discussion
  7.1 MAU to GCE
  7.2 Representative week and its limitations
  7.3 Approximation of on-demand cost function
  7.4 Optimization
  7.5 Evaluation of result
  7.6 Business case
  7.7 Ethical Aspects
  7.8 Suggestions for future studies
  7.9 Conclusion
Appendices
  A Regional MAU split
  B Normalized usage per program and region
  C Average usage over the week
  D Sustained use discounts
  E Translation of MAU to GCE
    E.1 Normalized memory demand translated from MAU for all regions
    E.2 2-min demand translated from MAU
  F Approximation of on-demand
    F.1 vCPU
      F.1.1 us-central1
      F.1.2 europe-west1
      F.1.3 asia-east1
    F.2 memory
      F.2.1 us-central1
      F.2.2 europe-west1
      F.2.3 asia-east1
  G Results for memory


List of Figures

2.1 Simplified user journey in the Spotify application.
2.2 Simplified developer journey when deploying software.
3.1 Overview of the method.
4.1 Example of a stochastic process.
4.2 Example of a convex set (left) and a non-convex set (right) (Boyd and Vandenberghe 2004).
5.1 One month of 2-min usage of vCPU for the region europe-west1 (normalized).
6.1 MAU for the optimistic estimate with antithetic pairs.
6.2 MAU for the pessimistic estimate.
6.3 Comparison between the redistributed MAU split and the actual MAU split over regions.
6.4 Normalized demand translated from MAU for all regions.
6.5 2-min demand translated from MAU for all regions.
6.6 Approximation of f_{i,r} for the resource vCPU in region europe-west1. The axes have been normalized in some cases to hide sensitive information.
6.7 Visualization of the on-demand cost function algorithm for vCPU in europe-west1.
6.8 The result from one optimistic scenario for vCPU.
6.9 vCPU demand for one month and the level of commitment (Y).
6.10 The result from one optimistic scenario for vCPU, with Spotify's initial inventory.
7.1 Percentage change in cost from the recommended coverage ratios for vCPU in europe-west1. The graph is built on the average of 121 months for 100 simulations to illustrate the benefits of maintaining the correct coverage.
A.1 Split of MAU for Spotify (Spotify 2020b).
B.1 Normalized usage of cores per program and region.
B.2 Normalized usage of RAM per program and region.
C.1 Normalized average usage per hour over the week of memory and vCPU per region.
E.1 Normalized memory demand translated from MAU for all regions.
E.2 2-min demand translated from MAU for all regions.
F.1 Approximation of f_{i,r} for the resource vCPU in region us-central1. The axes have been normalized in some cases to hide sensitive information.
F.2 Visualization of the on-demand cost function algorithm for vCPU in us-central1.
F.3 Approximation of f_{i,r} for the resource vCPU in region europe-west1. The axes have been normalized in some cases to hide sensitive information.
F.4 Visualization of the on-demand cost function algorithm for vCPU in europe-west1.
F.5 Approximation of f_{i,r} for the resource vCPU in region asia-east1. The axes have been normalized in some cases to hide sensitive information.
F.6 Visualization of the on-demand cost function algorithm for vCPU in asia-east1.
F.7 Approximation of f_{i,r} for the resource memory in region us-central1. The axes have been normalized in some cases to hide sensitive information.
F.8 Visualization of the on-demand cost function algorithm for memory in us-central1.
F.9 Approximation of f_{i,r} for the resource memory in region europe-west1. The axes have been normalized in some cases to hide sensitive information.
F.10 Visualization of the on-demand cost function algorithm for memory in europe-west1.
F.11 Approximation of f_{i,r} for the resource memory in region asia-east1. The axes have been normalized in some cases to hide sensitive information.
F.12 Visualization of the on-demand cost function algorithm for memory in asia-east1.
G.1 The result from one optimistic scenario for memory.
G.2 Memory demand for one month and the level of commitment (Y).
H.1 Percentage change in cost from the recommended coverage ratios for vCPU in all regions. The graph is built on the average of 121 months for 100 simulations to illustrate the benefits of maintaining the correct coverage.
H.2 Percentage change in cost from the recommended coverage ratios for memory in all regions. The graph is built on the average of 121 months for 100 simulations to illustrate the benefits of maintaining the correct coverage.


List of Tables

1.1 Abbreviations and Explanation.
2.1 Machine types and configuration (Google 2020e).
2.2 On-demand price (USD) / region where Spotify is present (Google 2020c).
2.3 List price (USD) for machine type N1 & N2 in region us-central1 (Google 2020c).
2.4 SUD depending on utilization for a N1 machine (Google 2020g).
5.1 GCP and MAU distribution (fictional numbers).
5.2 Representation of A_norm.
6.1 Error between actual numbers of usage for vCPU and memory and the output from the algorithm.
6.2 Error metrics of the on-demand cost function approximation.
6.3 Coverage metrics and procurement in relation to mean demand for vCPU, start inventory set to n_{i,r,0}/36.
6.4 Coverage metrics and procurement in relation to mean demand for memory, start inventory set to n_{i,r,0}/36.
6.5 Coverage metrics and procurement in relation to mean demand for vCPU, with start inventory set to Spotify's inventory.
6.6 Coverage metrics and procurement in relation to mean demand for memory, with start inventory set to Spotify's inventory.
6.7 Expected value and standard deviation for 1000 simulations of optimistic and pessimistic estimations.
6.8 95% confidence intervals for 1000 simulations of optimistic and pessimistic estimations.
6.9 Optimality gap between pessimistic and optimistic estimation for 1000 simulations.
6.10 95% confidence intervals for the difference between 1000 objective values for the optimistic and pessimistic estimations.
F.1 Summary of the approximation of the cost function for vCPU in us-central1.
F.2 Summary of the approximation of the cost function for vCPU in europe-west1.
F.3 Summary of the approximation of the cost function for vCPU in asia-east1.
F.4 Summary of the approximation of the cost function for memory in us-central1.
F.5 Summary of the approximation of the cost function for memory in europe-west1.
F.6 Summary of the approximation of the cost function for memory in asia-east1.
G.1 Coverage metrics and procurement in relation to mean demand for memory.



1 - Introduction

Twelve years ago, Daniel Ek and Martin Lorentzon launched an application that came to revolutionize how people consume music. This application, known as Spotify, has grown exponentially over time and now boasts 286 million monthly active users (MAU) (Spotify 2020a). To sustain the rapid user growth and provide its services on a global scale, Spotify went from owning one server in 2008 to over 10,000 in 2015. Spotify soon realized that managing its own Information Technology (IT) infrastructure with sufficient quality, scale, and cost efficiency was a difficult problem to solve.

The solution for Spotify was to shift from an on-premise infrastructure, with its own data centers, to virtualized computational resources. This is known as Infrastructure-as-a-Service (IaaS), where IaaS providers use virtualization technology to make their physical data centers and hardware accessible via the Internet, often referred to as the cloud. It is one of the models available in the cloud computing universe, where a pool of IT resources, e.g., processing power, network bandwidth, storage, and software, is made available via the cloud. IaaS disrupted the IT industry with the possibility to procure computational resources as a service. This has opened up immense scaling opportunities and a certain elasticity, where companies only pay for the resources being used. Computational resources have gone from being a capital expenditure to an operational expenditure. (Michael et al. 2009)

Examples of IaaS vendors on the cloud computing market are Amazon, Google, and Microsoft. Spotify chose to partner with Google and uses its IaaS, called Google Compute Engine (GCE), a product on the Google Cloud Platform (GCP). GCE provides a variety of virtual machine (VM) types running on Google's global IT infrastructure, which can be customized to meet a client's requirements and workload. There are two provisioning alternatives offered by GCE, called on-demand and committed contracts. With on-demand contracts, VMs are only debited when the resources are utilized and can be activated and deactivated with full flexibility. On-demand contracts enable a company to procure resources dynamically to fit a fluctuating and unpredictable demand. Committed contracts, on the other hand, specify a certain usage over a period of time. On GCE these contracts are either one- or three-year commitments, and in return for the commitment the company earns a significant discount. (J.R. Storment 2019)

One of the challenges is to forecast future demand. The time series can be considered stochastic, with properties such as trends and seasonality. If the company can capture all the characteristics of the process and predict the amount of resources needed, a certain level of committed contracts can be preferable. Nevertheless, this invokes several risks, such as under- and overutilization, which can generate unnecessary expenses due to the uncertainty in demand. On the other hand, if the company only selects on-demand resources, it forfeits the opportunity of potential cost savings. In the end, the problem lies in predicting the future demand for computational resources in order to procure the most efficient level of commitments, hence minimizing the cost while considering risk.

In this thesis, we investigate forecasting models and the use of operational analysis tools to suggest an optimal procurement strategy algorithm for GCE contracts.

1.1 Purpose of research

The purpose of this thesis is to suggest an optimal procurement process algorithm for minimizing the costs of computational resources under uncertainty.

1.2 Research delimitation

This thesis focuses solely on consumers of IaaS in the cloud. In addition, it only considers products within the GCE environment. The only machine types considered are N1 and N2, and with regards to usage, only memory and vCPU are taken into account. The same reasoning applies to contract types, where only on-demand and committed contracts are considered. Furthermore, only the three main regions us-central1, europe-west1, and asia-east1 are taken into account.


1.3 Definitions

Table 1.1: Abbreviations and Explanation.

Abbreviation  Explanation
API           Application Programming Interface
CLT           Central Limit Theorem
CUD           Committed Use Discount
GCE           Google Compute Engine
GCP           Google Cloud Platform
GDPR          General Data Protection Regulation
GKE           Google Kubernetes Engine
IaaS          Infrastructure as a Service
i.i.d.        independent and identically distributed
IT            Information Technology
LLN           Law of Large Numbers
MAU           Monthly Active Users
OUP           Ornstein–Uhlenbeck process
PaaS          Platform as a Service
SAA           Sample-Average Approximation
SaaS          Software as a Service
SDE           Stochastic Differential Equation
SES           Simple Exponential Smoothing
SMA           Simple Moving Average
SUD           Sustained Use Discount
vCPU          Virtual Central Processing Unit
VM            Virtual Machine


2 - Background

In order to find a method that achieves the purpose of this thesis, stated in section 1.1, an understanding of the GCE environment and the mechanisms behind the demand for resources is needed. This chapter is divided into three parts. The first part describes the GCE environment. The second part gives a brief introduction to the different applications and platforms deployed in the cloud by Spotify. The third part presents use cases, from both a user and a developer perspective, to give the reader a visualization of the allocation process.

2.1 Google Compute Engine

To be able to formulate the problem it is important to understand Google's IaaS, known as GCE. GCE provides VMs running on Google's IT infrastructure and offers a variety of configurable machine types (see subsection 2.1.1) optimized for the client's workload (Google 2020b). These machine types can be deployed in different regions, thanks to Google's global network infrastructure, and be billed on-demand or as a committed use contract. In addition to this, the effects of sourcing agreements are introduced, because a sourcing agreement between two companies can effectively change the cost dynamics. In the end, all of these parameters affect the cost and need to be considered when formulating the problem.

2.1.1 Machine types

A machine type can be considered a set of virtualized hardware resources which can be constructed to manage different workloads. GCE has created a product range with predefined machine types categorised into general-purpose, memory-optimized, and compute-optimized families. If the predefined machine types do not meet the requirements of a specific workload, it is possible to create custom machine types. (Google 2020e)

Spotify mainly procures two machine types from the general-purpose family, named N1 and N2. These machines can be configured with a specific number of vCPUs and amount of memory, but with some restrictions. vCPUs need to be purchased in whole numbers, and depending on the number of acquired vCPUs a minimum amount of memory has to be purchased. The memory also needs to be a multiple of 256 megabytes (MB). The restrictions for these machine types can be seen in Table 2.1.

Table 2.1: Machine types and configuration (Google 2020e).

Machine type   vCPUs    memory / vCPU
N1             1 - 96   0.95 - 6.5 GB
N2             2 - 80   0.50 - 8.0 GB
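To make these configuration rules concrete, the following minimal sketch validates a requested (vCPU, memory) pair against the bounds in Table 2.1 and the 256 MB granularity rule. The function and dictionary names are our own illustration, not part of GCE.

# Illustrative sketch: validate an N1/N2 configuration against Table 2.1.
# The bounds and the 256 MB rule come from the text above; names and
# structure are our own.

MACHINE_LIMITS = {
    # type: (min vCPUs, max vCPUs, min GB per vCPU, max GB per vCPU)
    "N1": (1, 96, 0.95, 6.5),
    "N2": (2, 80, 0.50, 8.0),
}

def is_valid_config(machine_type: str, vcpus: int, memory_gb: float) -> bool:
    """Check vCPU count, 256 MB memory granularity and per-vCPU memory range."""
    lo_cpu, hi_cpu, lo_mem, hi_mem = MACHINE_LIMITS[machine_type]
    if not (isinstance(vcpus, int) and lo_cpu <= vcpus <= hi_cpu):
        return False                    # vCPUs must be whole numbers in range
    if (memory_gb * 1024) % 256 != 0:
        return False                    # memory must be a multiple of 256 MB
    return lo_mem <= memory_gb / vcpus <= hi_mem

print(is_valid_config("N1", 2, 7.5))   # True: 3.75 GB/vCPU, 7680 MB = 30 * 256
print(is_valid_config("N2", 1, 4.0))   # False: N2 requires at least 2 vCPUs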


2.1.2 Regions and zones

GCE machine types are available at several locations all over the globe. The GCE environment is divided into 31 regions, where each region is composed of one or more zones in which data centers are located (Google 2020f). Spotify deploys machines in a number of regions, with some examples presented in Table 2.2.

Table 2.2: On-demand price (USD) / region where Spotify is present (Google 2020c).

Region         price / vCPU hour   price / GB hour
us-central1    0.031611            0.004237
europe-west1   0.034773            0.004661
asia-east1     0.036602            0.004906

Google (2020d) and Hargrove (2020) discuss the importance of choosing the right regions when deploying services in the cloud. They both cite latency, data laws, pricing and disaster protection as important factors to consider before deciding.

Both authors mention the importance of hosting applications as close to the users as possible to minimize latency. However, Google (2020d) emphasizes the importance of the user experience and states that many companies forget about this. When considering which regions to deploy to, Google states that it is important to also consider how connected a region is to other regions, i.e., how the regions are linked in terms of infrastructure. For example, if a company with a global user base wants to deploy to only one region, the best choice would be the US, because the US region is the most connected to the other regions, which would minimize latency problems (Google 2020d). To summarize, they conclude that it is important to understand the dynamics of regions in order to make the best decision.

Laws can affect where a company would like to store and process data (Wall 2016). It is therefore important to consider regulations such as GDPR and other data protection laws, according to Hargrove (2020). The author explains that by choosing which region or zone a machine is located in, it is possible to control where data is stored and processed, thus staying compliant. GDPR is one of the reasons why Spotify has decided to handle Dataproc (see subsection 2.2.1) in Europe (Hagman 2020).

Hargrove (2020) discusses the importance of considering pricing when deciding in which region to deploy resources. The author mentions that pricing differs across regions for several reasons, such as real estate costs, energy costs, taxes, and availability. Considering the GCE prices (see Table 2.2), vCPU and memory prices are lower in the US compared to Europe and Asia: Europe is approximately 10% and Asia 16% more expensive than the US.

Network outages can have multiple negative consequences. Hargrove (2020) describes how these situations could potentially cost a business billions. In order to protect systems from outages and blackouts, the author emphasizes the importance of distributing resources across multiple regions. Google (2020d) designs zones and regions to be independent of each other. Hence, if services are hosted across multiple regions, a potential disturbance in one region can be mitigated by transferring traffic to another region.

2.1.3 Pricing and billing

GCE pricing differs across regions depending on the machine type and contract type, which are introduced below. Specific examples of pricing can be seen in Table 2.3. In addition to this, there are rules regarding minimum usage and billing intervals.

Table 2.3: List price (USD) for machine type N1 & N2 in region us-central1 (Google 2020c).

Machine type N1 & N2       vCPU (per hour)   Memory (per GB hour)   Discount (vs. on-demand)
On-demand price            0.031611          0.004237               0%
1 year commitment price    0.019915          0.002669               37.0%
3 year commitment price    0.014225          0.001907               55.0%

For GCE, Google charges a minimum of one minute of usage. After one minute the customer is charged per second. For example, if a customer deploys a VM for 45 seconds, the customer will be charged for one minute of usage. (Google 2020h).
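As a sketch of these rules, the snippet below computes the on-demand cost of a single VM run using per-second billing with the one-minute minimum, priced with the us-central1 list prices from Table 2.3. This is our own illustration of the stated rules, not Google's billing implementation.

# Minimal sketch of the billing rules above (not Google's billing code).
# Prices are the us-central1 on-demand list prices from Table 2.3.

VCPU_PRICE_PER_HOUR = 0.031611   # USD per vCPU hour
MEM_PRICE_PER_HOUR = 0.004237    # USD per GB hour

def on_demand_cost(vcpus: int, memory_gb: float, seconds: float) -> float:
    """Per-second billing with a one-minute minimum charge."""
    billed_hours = max(seconds, 60.0) / 3600.0
    return billed_hours * (vcpus * VCPU_PRICE_PER_HOUR
                           + memory_gb * MEM_PRICE_PER_HOUR)

print(round(on_demand_cost(2, 7.5, 45), 6))   # a 45 s run is billed as 60 s
# Sanity check against Table 2.3: the 1-year commitment implies a ~37% discount.
print(round(1 - 0.019915 / 0.031611, 2))      # 0.37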

On-demand contract

With an on-demand contract the user can activate and deactivate a machine with full flexibility. The user only pays for the exact usage, which enables a certain elasticity² (Google 2020h). When a user acquires on-demand contracts they can be eligible for a discount, referred to as a Sustained Use Discount (SUD), which depends on the monthly utilization of on-demand resources. For example, if a VM has a monthly utilization of more than 25%, the customer receives a 20% discount on the incremental usage (see Table 2.4). The total discount is capped at 30% for N1 machines and at 20% for N2. A more detailed overview of the SUD can be found in appendix D. (Google 2020g)

Table 2.4: SUD depending on utilization for a N1 machine (Google 2020g).

Usage level (% of month) % at which incremental is charged

0%–25% 100%

25%–50% 80%

50%–75% 60%

75%–100% 40%

² An example of elasticity: a user can activate many machines during peaks and turn them off when usage goes down. Hence, the user is not required to keep a large number of machines idle, waiting for an increase in resource usage.
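The SUD tiers in Table 2.4 can be read as a piecewise charging schedule over the month: at 100% utilization the average multiplier is (100% + 80% + 60% + 40%)/4 = 70%, i.e. the 30% cap mentioned above. The sketch below computes the effective fraction of the on-demand price paid for an N1 machine; it is our own illustration of the rule, not Google's billing logic.

# Illustrative sketch of the N1 sustained use discount in Table 2.4
# (not Google's billing logic). Each quarter of the month is charged
# at a decreasing incremental rate.

N1_TIERS = [(0.25, 1.00), (0.25, 0.80), (0.25, 0.60), (0.25, 0.40)]

def effective_price_fraction(utilization: float) -> float:
    """Fraction of the on-demand price paid at a given monthly utilization."""
    paid, remaining = 0.0, utilization
    for width, rate in N1_TIERS:
        step = min(remaining, width)
        paid += step * rate
        remaining -= step
        if remaining <= 0:
            break
    return paid / utilization if utilization > 0 else 0.0

print(effective_price_fraction(1.0))   # 0.70 -> the maximum 30% SUD for N1
print(effective_price_fraction(0.5))   # 0.90 -> a 10% discount at 50% usage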


Committed use contract

By procuring a committed use contract the user commits to a certain usage over a predetermined time horizon (one or three years with GCE). Procuring this type of contract requires more planning and invokes a certain risk. However, in return the user receives a significant discount, referred to as a Committed Use Discount (CUD). Depending on the machine type and the length of the chosen contract, the discount can vary between 15% and 60%. An example of the discount can be seen in Table 2.3. (Google 2020a)

2.1.4 Sourcing agreements and their impact on the problem

Sourcing practices tend to vary across businesses. It is therefore important to understand the dynamics of this process in order to fully understand the problem in this thesis. As mentioned, there are many ways to work with strategic sourcing of products, and therefore many different contract types exist, depending on the effort put into the sourcing process.

In a study, Belotserkovskiy et al. (2018) conclude that many contracts lack basic elements related to performance, benchmarking, pricing adjustments, and continuous improvements. The authors state that a direct result of this is an increase in the overall cost of the contract. However, Belotserkovskiy et al. also state that many organisations are competent with regards to sourcing. The more successful organisations often use more advanced processes and contracts that tend to be more extensive. These contracts often include conditions such as price reductions, incentive fees, extensions, and similar measures. (Belotserkovskiy et al. 2018)

Price reductions are put in place as the supplier is expected to increase efficiency over time, resulting in lower costs. Incentive fees can be introduced in order to motivate a supplier to strive towards success and secure delivery according to expectations. Extensions can be part of a contract in order to further motivate the supplier to exceed the terms of the contract. (Belotserkovskiy et al. 2018)

Spotify has an active partnership with Google, and when evaluating the cost dynamics in this thesis, the sourcing agreement must be taken into consideration.

2.2 The situation at Spotify

Even if the full dynamics of GCE and the sourcing agreement were captured in the model, this would still not be sufficient to reflect the full dynamics of the cloud spend at Spotify; some strategic factors also come into play.

2.2.1 Applications and platforms in the cloud

Spotify uses several applications and platforms which fall under the GCE label. Some of these are licensed platforms, while others are applications procured as Software as a Service (SaaS). Below is an explanation of some of the platforms and applications that are relevant for this thesis.

(22)

H. Ekholm & D. Englund CHAPTER 2. BACKGROUND

- Apache Cassandra can be considered a platform and is an open source NoSQL column store database running on physical or virtual computers referred to as clusters (ScyllaDB 2019a). Cassandra is often used for applications with very large data sets, sometimes operating in real time. Some features include fast caching, global distribution, and fast access for big data. (ScyllaDB 2019b)

- Google Kubernetes Engine (GKE) is a platform for handling containers. GKE offers an environment where it is easy to deploy applications while automating both scaling and monitoring. (Google 2019b)

- Helios is a platform created in-house by Spotify to simplify deployment and management of containers across many servers. It was created before the introduction of open source frameworks such as Kubernetes, hence it is not as developed as related platforms. Since the introduction of GKE, Spotify has stopped launching new features within the Helios engine. (Spotify 2019)

- Dataproc is a Spark- and Hadoop-powered cloud service supplied by Google that gives the user access to advanced open source data analytics tools for data processing across multiple categories. Dataproc administers the deployment and management of clusters, enabling the user to spend more time analyzing the data than dealing with data processing. (Google 2019a)

- Elastic search API is an application used to provide scalable search over vast amounts of data (Elastic 2019). At Spotify it is used internally, e.g., when an employee searches for a document, and externally, e.g., when a user searches for a specific song (Spotify 2016).

An important factor to consider is the elasticity of the different products within the cloud. As the day progresses, the demand for these products fluctuates; for example, the demand for the containers used to play music for users will be lower during the night. It is therefore important to understand how these products scale with the amount of usage. Apart from usage, factors such as product roadmaps, sourcing leverage, and internal capabilities are also drivers of elasticity.

For products like Cassandra, Helios, and the Elastic search API, scaling is handled manually. When a new product is deployed, the developer sets a manual utilization level which, if reached, triggers a notification. The developer can then increase the number of machines to avoid a shortage of resources. According to Koduah (2020), it is credible to believe that developers tend to set a higher utilization level in order to avoid these kinds of shortages. GKE and Dataproc, on the other hand, are examples of services that scale efficiently and automatically: the developer sets a utilization threshold and the scaling is then handled automatically. A visualization of the scalability of each application/platform is available in Figure B.1 and Figure B.2.


In addition to this, there are ongoing migrations between these applications and services. Spotify is migrating from Helios to GKE, and Cassandra is being decommissioned in favor of BigTable and Spanner (Meyer 2020). This will affect the demand for resources, as GKE scales automatically while BigTable and Spanner are not products within GCE. Both measures will undoubtedly lower usage: the automatic scaling will increase utilization, while all usage originating from Cassandra will disappear as the replacing services are not within GCE. As a consequence, it is important to consider factors such as usage, elasticity, and migrations in order to model this complex problem.

2.2.2 The architecture

To map the demand for computational resources to specific regions, the architecture of Spotify has to be understood. As described in subsection 2.2.1, the applications deployed in the cloud are maintained by developers. They decide the amount of resources allocated to run each service, under what policy the service should scale, and where it should be deployed. Even though the Spotify application is available around the globe, not all services and applications are deployed in all regions. For example, Dataproc is only hosted in europe-west1. Because of this, the computational resources in each region are not proportional to the MAU in that region: a user in asia-east1 might request a host running in europe-west1 and thereby affect the usage in that region as well as in asia-east1. This makes it difficult to use MAU directly as a proxy for the demand for resources. The regional MAU split can be observed in Figure A.1. (Greenely 2020)

2.2.3 A shift in machine types

Spotify procures a majority of N1 configurations; however, they are currently testing whether they can move workload from N1 to E2 machines instead. If a migration from N1 to E2 were to occur, it is important to consider the active committed use discount (CUD) contracts and how the usage of vCPU and memory would be affected. (Meyer 2020)

Another initiative to lower cost is so-called right-sizing of machines. Right-sizing is about increasing the utilization of current machines in order to be as efficient as possible and only pay for resources being used. This is done by looking at different machines and the applications that run on them to see if they are a good fit. For example, an application that requires a lot of memory but not much processing power would be ideal to run on a high-memory machine type. Right-sizing is important to keep in mind when considering committed contracts, as it would potentially lower the required amount of vCPUs and memory. (Meyer 2020)

2.3 Use case of cloud consumption

In order to understand the dynamics of cloud consumption with regards to both end-users and developers at Spotify, two use cases are presented in Figure 2.1 and Figure 2.2.


Figure 2.1: Simplified user journey in the Spotify application.

Figure 2.2: Simplified developer journey when deploying software. (The diagram shows four steps on the Google Cloud Platform: 1. A team has developed new software that they want to deploy. 2. The team investigates and tests the requirements of the program in order to choose the most suitable VM to host it. 3. After consideration, the software is pre-launched and tested in two Asian countries and is therefore deployed in the asia-east1 region. 4. The software is deployed as a new service, starting with two N1 VMs with 2 cores and 7.5 GB RAM each, as decided after testing.)


3 - Method Overview

To achieve the purpose of this thesis, stated in section 1.1, a method divided into four parts is suggested and visualized in Figure 3.1. The four parts are data collection, evaluation of potential methods, model design and configuration, and evaluation of the model. These parts are described in more detail below.

3.1 Data collection

In order to understand the mechanisms behind the demand for computational resources, data from GCP and Spotify has to be collected. Time series, e.g., vCPU usage, memory usage, host utilization, and MAU, are to be collected to grasp the dynamics behind resource allocation.

Interviews with professionals at Spotify and Google will also be conducted to understand the impact on overall resource demand from events such as program migrations, right-sizing, new machine type introductions, and similar efforts. These interviews will also be used, as a complement to GCP documentation, to get a more fundamental understanding of the billing dynamics of GCP contracts.

3.2 Evaluation of potential methods

The collected time series will be analyzed with regards to their characteristics. This is important, since the model has to capture the stochastic properties and time series characteristics in order to represent an accurate view of reality; if it does not, the optimal solution may not be applicable. The different time series will therefore be analyzed both separately and together in order to find relationships that could be of interest, for example vCPU per MAU. Other valuable insights, such as a deeper understanding of the different programs' scalability, are needed to capture the impact of, e.g., migrations between programs. This section aims to put forward methods that could be suitable for solving the problem.

3.3 Model design and configuration

Modelling will be conducted based on historical data, and different forecast scenarios will be generated using appropriate methods. These methods will be derived from a comprehensive literature study and discussions with the thesis supervisor at Linköping University. Based on these simulations, a theoretical framework such as deterministic or stochastic optimization will be applied in order to obtain a feasible solution. It is vital that the applied methods are suitable with regards to the problem description and the findings in the data analysis. As a consequence, a strong emphasis will be put on analyzing the data and the compatible methods.


3.4 Evaluation of model

To evaluate the optimization results, statistical methods such as confidence intervals and the optimality gap will be applied. The results of this part are important, as they determine whether the chosen approach is reliable and valid.


3.5 Model overview

1. Data collection
   • Collect data from GCP and Spotify
   • Perform interviews with Spotify and Google

2. Evaluation of potential methods
   • Visualize data with graphs
   • Analyze data with regards to trends and seasonality
   • Evaluate scalability of each program
   • Investigate migration dynamics
   • Decide methods based on collected data and properties

3. Model design and configuration
   • Conduct literature study
   • Apply appropriate theoretical framework
   • Simulate different forecast scenarios
   • Implement entire model

4. Evaluation of model
   • Confidence interval
   • Optimality gap


4 - Theory

In order to propose a suitable method that can fulfil the purpose of this thesis, a theoretical background relevant to the problem is presented in this section.

4.1 Stochastic Processes

A variable that changes its value randomly is often referred to as a stochastic variable

X : Ω → R  (4.1)

defined on a probability space (Ω, F, P), where Ω is the sample space, F is the σ-algebra, and P is the probability measure.

A family of stochastic variables on the same probability space, indexed by an index set, is defined as a stochastic process (Lamperti 2012). For example, if a random variable changes value over time it can be seen as a stochastic process and be represented as a random trajectory, e.g., X(t, ω) : T × Ω → R, where T is the index set. An example is illustrated in Figure 4.1.

Figure 4.1: Example of a stochastic process.

The classification of a stochastic process depends on its properties. Discrete-time processes can only change the value of the variable at specific time points, whereas continuous-time processes can change at any time. A continuous-variable process can take any value in a defined range, whereas a discrete-variable process can only take specific discrete values.

To illustrate a discrete-time stochastic process, let us consider a stochastic process over the time period [t, T] with N time steps ∆t = (T − t)/N and independent identically distributed (i.i.d.) random variables


x(t + ∆t) = x(t) + ξ(t),  ξ(t) ∼ i.i.d. N(0, ∆t)  (4.2)

This process, known as a random walk, changes value at each new time step. The change is determined by the previous value and a random variable with probability distribution N(0, ∆t). If N → ∞, then ∆t → 0 and the stochastic process can be considered a continuous-time process

x(t + dt) = x(t) + ξ(t),  ξ(t) ∼ i.i.d. N(0, dt)  (4.3)

The process now has an infinitesimal time step dt and can change its value at any time. To represent the increments of the continuous-time process, the notation dx(t) = x(t + dt) − x(t) is used.
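As a minimal illustration of (4.2), the sketch below simulates one trajectory of the discrete-time random walk with NumPy; the horizon and step count are arbitrary choices.

# Minimal sketch: simulate the discrete-time random walk in (4.2).
# The horizon T and the number of steps N are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(seed=0)

T, N = 1.0, 252
dt = T / N                                   # step size, as defined above
xi = rng.normal(0.0, np.sqrt(dt), size=N)    # i.i.d. N(0, dt) increments
x = np.concatenate(([0.0], np.cumsum(xi)))   # x(t + dt) = x(t) + xi(t)

print("terminal value x(T):", x[-1])         # x(T) ~ N(0, T)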

4.1.1 Wiener Process

The mathematician Norbert Wiener gave a rigorous mathematical formulation of the stochastic process known as Brownian motion. The process arose from representing the irregular motion of particles in fluids but is today used in many fields, such as quantitative finance, applied mathematics, and physics. The process can therefore be referred to as either a Brownian motion or a Wiener process. (Malliaris 1990)

Malliaris (1990) defines the Wiener process as a real-valued continuous-time stochastic process on the probability space (Ω, F, P) with index t ∈ [0, ∞):

Z(t, ω) : [0, ∞) × Ω → R  (4.4)

with the following properties:

1. Z(0, ω) = 0 ∀ω ∈ Ω; by convention the process is assumed to start at zero.

2. The increments of the process, Z(t) − Z(s), 0 ≤ s < t, are independent of each other.

3. For 0 ≤ s < t, the increment Z(t) − Z(s) has the probability density function

(1/√(2π(t − s))) e^(−x²/(2(t − s))),  (4.5)

which implies that the increments are normally distributed with µ = 0 and σ² = t − s. In other words, Z(t) − Z(s) ∼ N(0, t − s).

4. For each ω ∈ Ω, Z(t, ω) is continuous in t for t ≥ 0.

The Wiener process is a particular type of stochastic process where the only relevant information to predict the future is the current state of the variable. This is implied by the second property of the Wiener process with independent increments. This property is known as the Markov property (Hull 2018a). Hence, the Wiener process is a type of Markov process.


4.1.2 Generalized Wiener Process

The Wiener process has zero drift rate and a diffusion rate of 1.0. This means that the expected value of any future state of Z is equal to its current value, and the variance equals the length of the time interval. A generalized Wiener process has two constants that determine the drift and diffusion rate of the stochastic process. A generalized Wiener process for a variable x can be defined in terms of dZ(t) as

dx(t) = α dt + γ dZ(t),  x(0) = x_0,  (4.6)

where α, γ ∈ R and t ∈ [0, T). The equation above is a stochastic differential equation (SDE) with the solution

x(T) = x(0) + ∫_0^T α ds + ∫_0^T γ dZ(s)  (4.7)
     = x_0 + αT + γZ(T).  (4.8)

The expected value and variance of x(T) can be calculated as

E[x(T)] = E[x_0 + αT + γZ(T)] = x_0 + αT  (4.9)

Var[x(T)] = γ² Var[Z(T)] = γ²T.  (4.10)

As a consequence, x(T) ∼ N(x_0 + αT, γ²T), since the Wiener process has normally distributed increments.

4.2 Ornstein–Uhlenbeck process

The Ornstein–Uhlenbeck process is a mean-reverting process applied within areas such as financial mathematics and the physical sciences. The process is similar to the Wiener process and the random walk, but its properties are modified so that the process has a tendency to move towards a central mean. It can be expressed by the following stochastic differential equation (SDE)

dQ_t = κ(µ − Q_t) dt + σ dW_t  (4.11)

where W_t is a standard Brownian motion while κ and σ are positive constants. The constant κ is the mean rate of reversion and is subject to the condition κ < 1. The second constant, σ, is the instantaneous volatility and reflects the amplitude of the randomness entering the system. The constant µ is the level around which Q_t fluctuates, often referred to as the long-term mean. The coefficient of dt is called the drift and the coefficient of dW_t the volatility. An interesting observation is that the drift is positive for Q_t < µ and negative for Q_t > µ. Hence, the process reverts towards µ when it is far away from it.


Deng, Barros, and Grall (2014) describe that when studying (4.11), since µ is the long-term mean, it is possible to simplify the SDE through the change of variable

H_t = Q_t − µ  (4.12)

which subtracts off the mean. As a consequence, H_t satisfies the SDE

dH_t = dQ_t = −κH_t dt + σ dW_t.  (4.13)

This process has a drift towards zero at the exponential rate κ, which the authors state motivates the change of variables

H_t = e^{−κt} Z_t ⇔ Z_t = e^{κt} H_t.

This step effectively removes the drift, which can be shown using the product rule for Itô integrals described by Itô (1951):

dZ_t = κe^{κt} H_t dt + e^{κt} dH_t
     = κe^{κt} H_t dt + e^{κt}(−κH_t dt + σ dW_t)
     = σe^{κt} dW_t.

The solution for Z_t can be retrieved by Itô-integrating both sides from s to t:

Z_t = Z_s + σ ∫_s^t e^{κu} dW_u.  (4.14)

Going back to the previous variables, the above equation can be expressed as

H_t = e^{−κt} Z_t = e^{−κ(t−s)} H_s + σe^{−κt} ∫_s^t e^{κu} dW_u  (4.15)

and

Q_t = H_t + µ = µ + e^{−κ(t−s)}(Q_s − µ) + σ ∫_s^t e^{−κ(t−u)} dW_u.  (4.16)

This equation describes the analytical solution to the SDE for the Ornstein–Uhlenbeck process. (Deng, Barros, and Grall 2014)
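Equation (4.16) also gives an exact simulation recipe on a time grid: over a step of length ∆, Q_{t+∆} = µ + e^{−κ∆}(Q_t − µ) + ε, where ε is normal with variance σ²(1 − e^{−2κ∆})/(2κ) by the Itô isometry applied to the stochastic integral in (4.16). The sketch below implements this recursion; all parameter values are arbitrary illustrations.

# Sketch: exact simulation of the Ornstein-Uhlenbeck process via (4.16).
# All parameter values are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(seed=1)

kappa, mu, sigma = 0.5, 100.0, 2.0   # reversion rate, long-term mean, volatility
q0, N, dt = 90.0, 120, 1.0 / 12      # start value, number of steps, monthly grid

decay = np.exp(-kappa * dt)
# Standard deviation of the stochastic integral in (4.16) over one step:
step_std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * kappa))

q = np.empty(N + 1)
q[0] = q0
for t in range(N):
    q[t + 1] = mu + decay * (q[t] - mu) + step_std * rng.normal()

print(q[:5])   # the path drifts from q0 towards the long-term mean mu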

4.3 Linear programming

Ferguson (2015) describes how linear programming has the objective of minimizing, or maximizing, a linear function subject to one or multiple linear constraints. These constraints may take the form of equalities or inequalities. In addition to these constraints, Lewis (2008) states that there are several implicit assumptions in linear programming.

- Proportionality: The contribution of any variable to the objective function or its constraints is proportional to that variable.


- Additivity: Each variable's contribution to the objective function or its constraints is independent of the values of the remaining variables.

- Divisibility: Decision variables are allowed to be fractions. This can be circumvented by using so-called integer programming.

- Certainty: All parameters in the problem are known with certainty. This is also referred to as the deterministic assumption.

Having taken these prerequisites into consideration, Ferguson (2015) describes how many linear programming problems are not easy to solve, depending on the number of variables and constraints in the problem formulation. However, Ferguson mentions that linear programming often refers to two main classes of problems: the standard maximum problem and the standard minimum problem. For these two problems, all variables have to remain non-negative and all constraints are inequalities. (Ferguson 2015)

By introducing an m-vector b = (b_1, ..., b_m)^T, an n-vector c = (c_1, ..., c_n)^T and an m × n matrix

A = [ a_{1,1}  a_{1,2}  ···  a_{1,n} ]
    [ a_{2,1}  a_{2,2}  ···  a_{2,n} ]
    [   ...      ...    ...    ...   ]
    [ a_{m,1}  a_{m,2}  ···  a_{m,n} ]

where each element a_{i,j} ∈ R, i = 1, ..., m, j = 1, ..., n, it is possible to state the standard minimum problem. In this problem the objective is to find an m-vector y = (y_1, ..., y_m)^T that minimizes the objective function

y^T b = y_1 b_1 + ··· + y_m b_m  (4.17)

subject to the constraints

y_1 a_{1,1} + y_2 a_{2,1} + ··· + y_m a_{m,1} ≥ c_1
y_1 a_{1,2} + y_2 a_{2,2} + ··· + y_m a_{m,2} ≥ c_2
  ...
y_1 a_{1,n} + y_2 a_{2,n} + ··· + y_m a_{m,n} ≥ c_n

and y_1 ≥ 0, y_2 ≥ 0, ..., y_m ≥ 0.

These constraints can also be expressed in matrix form as y^T A ≥ c^T and y ≥ 0. The vector y is considered feasible if it satisfies all the corresponding constraints. This framework applies to many deterministic problems where the objective is to minimize the objective function. (Ferguson 2015)
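A standard minimum problem in this form can be handed directly to an off-the-shelf LP solver. The sketch below uses scipy.optimize.linprog on a small made-up instance; since linprog expects constraints of the form A_ub y ≤ b_ub, the constraints y^T A ≥ c^T are passed as −A^T y ≤ −c.

# Sketch: solving a small standard minimum problem with SciPy.
# The instance (A, b, c) is made up for illustration.
import numpy as np
from scipy.optimize import linprog

# min y^T b   subject to   y^T A >= c^T,  y >= 0
b = np.array([2.0, 3.0])            # m-vector of objective coefficients
c = np.array([1.0, 1.0, 2.0])       # n-vector of constraint right-hand sides
A = np.array([[1.0, 0.0, 1.0],      # m x n constraint matrix
              [0.0, 1.0, 1.0]])

res = linprog(c=b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * len(b))

print(res.x, res.fun)   # optimal y = [1, 1], objective value y^T b = 5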

However, before commencing to solve linear programming problems, Lewis (2008) states that it is important to consider the feasible region of the linear program. Hence, it is important to have a convex set of constraints. Lewis describes how a set X ⊆ R^n is convex if the line segment between any two points in X lies in X. This is illustrated in Figure 4.2 and can be expressed mathematically as

θx_1 + (1 − θ)x_2 ∈ X,  ∀x_1, x_2 ∈ X, ∀θ ∈ [0, 1].

Figure 4.2: Example of a convex set (left) and a non-convex set (right) (Boyd and Vandenberghe 2004).

4.4 Stochastic Programming

Shapiro and Philpott (2007) describe stochastic programming as an approach to model optimization problems that include uncertainty. According to Shapiro and Philpott, these models are often applied to optimization problems where the decision is made repeatedly under essentially the same circumstances, with the goal of making a decision that performs well on average. This is justified by the Law of Large Numbers (LLN), which states that if the process is repeated multiple times, the average of the results obtained converges to the expected value with probability one (Dekking et al. 2005). An example of a stochastic optimization problem is

min_{x ∈ X} f(x) := E[F(x, ξ(ω))]  (4.18)

where the goal is to minimize the objective function F(x, ξ(ω)), a real-valued function of two variables x ∈ R^n and ξ(ω) ∈ R^d, with X ⊂ R^n, where ξ : Ω → R^d is a random vector defined on the probability space (Ω, F, P).

4.4.1 Multi-stage stochastic programming

In stochastic programming, the most studied and applied model is the two-stage program. In these problems, the decision-maker takes a decision in the first stage, before knowing the outcome of a future random event. In the second stage, after a realization of the random variable, the decision-maker can take a recourse action to compensate for any unfavorable effect caused by the first decision. The basic idea is that decisions should be based solely on the information available at the time and not depend on future observations. In situations where the decision-maker is faced with, e.g., a planning horizon where decisions should be made at certain periods of time, the two-stage model can be extended to a multi-stage model. (Shapiro and Philpott 2007)

In multi-stage models, Shapiro and Philpott (2007) describe that the random vector ξ(ω) can be considered a stochastic process that is revealed gradually over time, with observations {ξ_1, ξ_2, ..., ξ_T}. They use the notation ξ_[t] to denote the past observations of ξ(ω) up to time t. As the decisions should be based on the available data ξ_[t], the sequence of decisions can be considered a stochastic process X_t adapted to F_ξ(t). The decision process can be viewed as

decision(x_1) → observation(ξ_2) → decision(x_2) → ... → observation(ξ_T) → decision(x_T)

Shapiro and Philpott (2007) describe the basic ideas of the multi-stage model with an inventory problem. In the description, a company is faced with a planning horizon of T periods. The future demand, which is uncertain, is seen as a random process D_t indexed by the time t = 1, ..., T. In the beginning, at t = 1, the inventory level y_1 is known. At each stage t = 1, ..., T the company first observes the inventory level y_t and decides to replenish the inventory level to x_t. This results in the order quantity x_t − y_t, which should be non-negative, i.e., x_t ≥ y_t. When the inventory has been replenished, the demand d_t is realized, and the inventory level for the next period becomes y_{t+1} = x_t − d_t. The cost of ordering the quantity in period t is denoted c_t, the backorder penalty cost is denoted b_t, and the holding cost is denoted h_t. The total cost in period t can then be formulated as

c_t(x_t − y_t) + b_t[d_t − x_t]_+ + h_t[x_t − d_t]_+.  (4.19)

The objective of the inventory problem is to minimize the expected total cost over the planning horizon. The optimization problem can therefore be written as:

min_{x_t ≥ y_t}  Σ_{t=1}^{T} E[ c_t(x_t − y_t) + b_t[D_t − x_t]_+ + h_t[x_t − D_t]_+ ]
s.t.  y_{t+1} = x_t − D_t,  t = 1, ..., T − 1  (4.20)
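To make the recursion concrete, the sketch below evaluates the cost (4.19) of a fixed order-up-to policy on one simulated demand path, using the inventory dynamics y_{t+1} = x_t − d_t. All numbers and the policy are made up for illustration; this is not the optimization model used in the thesis.

# Sketch: evaluating the inventory cost (4.19) for a fixed order-up-to
# policy on one simulated demand path. All values are made up.
import numpy as np

rng = np.random.default_rng(seed=2)

T = 12
c, b, h = 1.0, 4.0, 0.5                     # ordering, backorder, holding costs
order_up_to = 110.0                         # replenishment target x_t
demand = rng.normal(100.0, 15.0, size=T)    # realized demand d_1, ..., d_T

y = 50.0                                    # initial inventory level y_1
total_cost = 0.0
for t in range(T):
    x = max(order_up_to, y)                 # enforce the order x_t >= y_t
    total_cost += (c * (x - y)
                   + b * max(demand[t] - x, 0.0)    # backorder penalty
                   + h * max(x - demand[t], 0.0))   # holding cost
    y = x - demand[t]                       # y_{t+1} = x_t - d_t

print(round(total_cost, 2))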

4.5 Scenario generation

To be able to model the stochastic program as a deterministic optimization problem, scenarios from the random vector have to be generated (Shapiro and Philpott 2007). The problem with a random vector is that there is an infinite number of scenarios that can occur. Even if the random vector were discrete, the number of available scenarios can become significant. For example, under the assumption that each component of the random vector ξ(ω) ∈ R^d is independent of the others and can take two values, the number of available scenarios is 2^d. To cope with the exponential growth in the number of scenarios, Linderoth, Shapiro, and Wright (2006) study sampling techniques that can be used to obtain an approximate solution of the objective function. The goal is to reduce the number of scenarios and still obtain a solution close to that of the original problem.


4.5.1 Sample-Average Approximation

Linderoth, Shapiro, and Wright (2006) mention the "exterior" sampling approach known as Sample-Average Approximation (SAA). This approach uses a reduced number of scenarios to approximate the objective function by an average. The scenarios can be taken from historical data, or they can be generated via Monte Carlo simulation, which Hull (2018b) describes as a method for sampling random outcomes of a stochastic process. By generating a sample {ξ^1, ξ^2, ..., ξ^N} and assuming its elements are i.i.d., the expected value function in (4.18) can be approximated as

f(x) ≈ f̂_N(x) = (1/N) Σ_{k=1}^{N} F(x, ξ^k),  ∀x ∈ X  (4.21)

where the probability of each scenario is identical and expressed as p_i = 1/N, i = 1, ..., N. This approach can be justified, as above, with the LLN: the approximation f̂_N(x) converges to f(x) as N → ∞ with probability one. Since the sample of ξ(ω) is assumed to be i.i.d., the approximation f̂_N(x) is an unbiased estimator of the expected value function f(x):

E[f̂_N(x)] = (1/N) Σ_{k=1}^{N} E[F(x, ξ^k)] = E[F(x, ξ(ω))]  (4.22)

Because of the above-mentioned properties of the SAA problem, the SAA optimal solution converges to the optimal solution of the original problem as N → ∞. (Shapiro, Dentcheva, and Ruszczyński 2009)

A sample size approaching infinity is unrealistic and would be computationally heavy. Therefore, a balance between the accuracy of the approximation and the sample size has to be struck. The Monte Carlo simulation technique is known to have slow convergence and can require a large number of scenarios to give an accurate approximation. It can be shown that the accuracy of the approximation $\hat{f}_N(x)$ is proportional to the size of the sample:

\[
\operatorname{Var}\big[\hat{f}_N(x)\big] = \frac{1}{N^2} \sum_{k=1}^{N} \operatorname{Var}\big[F(x, \xi^k)\big] = \frac{\sigma^2}{N}. \tag{4.23}
\]

This means that the standard error of the approximation is $\sigma/\sqrt{N}$. Hence, to increase the accuracy of the approximation by one digit, the sample size has to be increased by a factor of 100. The convergence rate is also determined by the variance of $F(x, \xi(\omega))$ (Shapiro and Philpott 2007). To decrease the number of scenarios needed for an accurate approximation, variance reduction techniques can be used in the Monte Carlo simulation.
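A minimal sketch of the SAA estimate (4.21) and its standard error $\sigma/\sqrt{N}$, assuming an illustrative scenario cost $F(x, \xi) = (x - \xi)^2$ with standard normal $\xi$; none of these choices come from the thesis.

```python
import numpy as np

def F(x, xi):
    """Illustrative scenario cost F(x, xi)."""
    return (x - xi) ** 2

rng = np.random.default_rng(1)
x = 2.0
for N in (100, 10_000):
    xi = rng.normal(size=N)                      # i.i.d. sample of the random vector
    f_hat = F(x, xi).mean()                      # SAA estimate (4.21)
    std_err = F(x, xi).std(ddof=1) / np.sqrt(N)  # standard error sigma / sqrt(N)
    print(f"N = {N:6d}   f_hat = {f_hat:.3f}   std.err = {std_err:.3f}")
```

Growing the sample by a factor of 100 shrinks the printed standard error roughly tenfold, illustrating the slow $O(1/\sqrt{N})$ convergence that motivates the variance reduction techniques below.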

4.6 Variance reduction techniques

Owen (2013) describes how Monte Carlo simulations often demand a significant number of samples in order to find a viable approximation of the stochastic parameter. To lower the number of required samples, the author studies the use of variance reduction techniques. Commonly used techniques are presented in the subsections below.

4.6.1 Inverse transform sampling

To use variance reduction techniques, one must be able to generate random samples from arbitrary probability distributions. Inverse transform sampling is an example of such a method: by using the inverse cumulative distribution function $F^{-1}(x)$, it is possible to generate random numbers from any probability distribution. The technique uses a uniform $[0, 1]$ random variable to sample from the target distribution (Sigman 2010).

The cumulative distribution function of a uniform random variable $U$ on $[0, 1]$ can be described as

\[
F_U(x) = P(U \leq x) =
\begin{cases}
0, & \text{if } x < 0 \\
x, & \text{if } 0 \leq x \leq 1 \\
1, & \text{if } x > 1
\end{cases} \tag{4.24}
\]

It is possible to prove that inverse transform sampling is achievable in both the discrete and the continuous case (Scholtes 2002). Considering the discrete case, let $X$ be a discrete random variable with $p_i = P(X = x_i)$, $i = 1, \dots, n$. If $U$ is a uniformly distributed random variable and $0 \leq a \leq b \leq 1$, then

\[
P(a \leq U \leq b) = P(U \leq b) - P(U \leq a) = F_U(b) - F_U(a) = b - a. \tag{4.25}
\]

Thus, for every $n$ the following applies:

\[
P(p_1 + \dots + p_{n-1} \leq U \leq p_1 + \dots + p_n) = p_n. \tag{4.26}
\]

Scholtes then defines $Y$ as a function of the random variable $U$:

\[
Y = \Phi(U) =
\begin{cases}
x_1, & \text{if } U \leq p_1 \\
x_2, & \text{if } p_1 \leq U < p_1 + p_2 \\
\;\;\vdots \\
x_n, & \text{if } p_1 + \dots + p_{n-1} \leq U < p_1 + \dots + p_n
\end{cases} \tag{4.27}
\]

Accordingly, $Y$ and $X$ have identical distributions. By sampling $u_1, \dots, u_k$ from a uniform distribution, $\Phi(u_1), \dots, \Phi(u_k)$ becomes a sample from the discrete random variable $X$ (Scholtes 2002).
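A minimal sketch of both cases, assuming an illustrative three-point discrete distribution and, for the continuous case, the closed-form inverse CDF of the exponential distribution; none of these choices come from the thesis.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=100_000)              # uniform [0, 1) driver samples

# Discrete case, Phi(U) from (4.27): map u to the cumulative bin it falls in.
values = np.array([1, 2, 3])               # x_1, ..., x_n (illustrative)
probs = np.array([0.2, 0.5, 0.3])          # p_1, ..., p_n
samples = values[np.searchsorted(np.cumsum(probs), u)]
print(np.bincount(samples)[1:] / len(samples))   # close to [0.2, 0.5, 0.3]

# Continuous case: for Exp(lam), F^{-1}(u) = -ln(1 - u) / lam.
lam = 2.0
exp_samples = -np.log(1.0 - u) / lam
print(exp_samples.mean())                  # close to 1 / lam = 0.5
```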

4.6.2 Antithetic sampling

Owen (2013) explains that in antithetic sampling every random sample is mirrored by an antithetic sample. The author concludes that the technique may reduce the variance because of the negative correlation between the two values. In other words, the goal is to reduce the variance by letting the error in each sample be cancelled out by its opposite.

(37)

H. Ekholm & D. Englund CHAPTER 4. THEORY

Owen (2013) illustrates antithetic sampling by letting $\mu = E(X)$ for $X \sim b$, where $b$ is a symmetric density. The symmetry is with respect to reflection through a center point $c$: if the point $x$ is reflected through $c$, the point $\tilde{x}$ is obtained. Thus $\tilde{x} - c = -(x - c)$, which can be rewritten as $\tilde{x} = 2c - x$. Owen exemplifies this by stating that if $b \sim N(0, 1)$ then $\tilde{x} = -x$. To summarize, the antithetic sample is the exact reflection of the random sample through the center point, hence the perfect negative correlation.

When applying antithetic sampling to the function $f(x)$, the estimate of $\mu$ is given by

\[
\hat{\mu}_{\text{ant}} = \frac{1}{n} \sum_{i=1}^{n/2} \left( f(X_i) + f(\tilde{X}_i) \right) \tag{4.28}
\]

where $X_i \sim b$ and $n$ is an even number. As mentioned, antithetic sampling is able to reduce the variance because each sampled value $x$ is mirrored by $\tilde{x}$, which satisfies $\frac{x + \tilde{x}}{2} = c$. However, the size of the variance reduction depends on the function $f$. The variance for antithetic sampling is

\[
\begin{aligned}
\operatorname{Var}(\hat{\mu}_{\text{ant}}) &= \operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^{n/2} \left( f(X_i) + f(\tilde{X}_i) \right) \right) = \frac{n}{2n^2} \operatorname{Var}\left( f(X) + f(\tilde{X}) \right) \\
&= \frac{1}{2n} \left[ \operatorname{Var}(f(X)) + \operatorname{Var}(f(\tilde{X})) + 2 \operatorname{Cov}\big(f(X), f(\tilde{X})\big) \right] = \frac{\sigma^2}{n} (1 + \rho)
\end{aligned}
\]

where $\rho$ is the correlation between $f(X)$ and $f(\tilde{X})$. If $\operatorname{Cov}(f(X), f(\tilde{X})) < 0$, the antithetic estimate has a lower variance than an estimate based on the average of i.i.d. random samples. Hence, if $f$ is monotone for all values of $x$, then $\rho < 0$ and antithetic sampling is guaranteed to reduce the variance. In addition, Owen (2013) demonstrates that it is hard to measure the absolute effect of monotonicity in terms of variance reduction: $\rho < 0$ can hold without $f$ being monotone for all values of $x$, while $\rho$ can be barely negative even when $f$ is monotone. As a result, Owen argues that monotonicity is not a useful tool for measuring the absolute effect of variance reduction with regard to antithetic sampling.
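A minimal sketch of the estimator (4.28), assuming $X \sim N(0, 1)$ so that the antithetic pair is $\tilde{x} = -x$, with a monotone illustrative choice $f = \exp$; the comparison uses the same total sample budget $n$ for both estimators.

```python
import numpy as np

rng = np.random.default_rng(3)
f = np.exp                      # monotone f (illustrative); true E[f(X)] = e^0.5
n = 100_000                     # total sample budget, must be even

# Plain Monte Carlo with n independent samples.
x = rng.normal(size=n)
se_mc = f(x).std(ddof=1) / np.sqrt(n)

# Antithetic sampling (4.28): n/2 pairs (X_i, -X_i) use the same budget n.
x_half = rng.normal(size=n // 2)
pairs = 0.5 * (f(x_half) + f(-x_half))      # one value per antithetic pair
mu_ant = pairs.mean()                       # equals (1/n) * sum(f(X) + f(-X))
se_ant = pairs.std(ddof=1) / np.sqrt(n // 2)

print(f"MC std.err = {se_mc:.4f}, antithetic std.err = {se_ant:.4f}")
print(f"antithetic estimate = {mu_ant:.4f}")
```

Because $f = \exp$ is monotone, the correlation between $f(X)$ and $f(-X)$ is negative and the antithetic standard error comes out noticeably smaller for the same budget.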

4.7 Pessimistic and optimistic bounds

A general problem in stochastic programming lies in evaluating the quality of an obtained solution. To address this, Shapiro and Philpott (2007) propose a framework focused on the optimality gap, defined as the difference between the optimistic and pessimistic bounds. For any feasible point $\hat{x} \in X$ it holds that $f(\hat{x}) \geq v^*$, where $v^* = \min_{x \in X} f(x)$ is the optimal value. However, as the optimal value is unknown in the majority of cases, the gap cannot be computed directly. Nevertheless, in order to evaluate the candidate solution $\hat{x}$, Shapiro and Philpott study the optimality gap

\[
\operatorname{gap}(\hat{x}) := f(\hat{x}) - v^*. \tag{4.29}
\]

Shapiro and Philpott (2007) describe the use of statistical methods to estimate (4.29). They outline how to estimate $f(\hat{x})$ for a two-stage stochastic programming problem by using Monte Carlo sampling. Monte Carlo simulation is used to generate an i.i.d. random sample $\xi^j$, $j = 1, \dots, N_0$, of $\xi$, which is then used to estimate $f(\hat{x})$ via the corresponding sample average $\hat{f}_{N_0}(\hat{x}) = \frac{1}{N_0} \sum_{j=1}^{N_0} F(\hat{x}, \xi^j)$. The sample variance of $\hat{f}_{N_0}(\hat{x})$ is then computed as

\[
\hat{\sigma}^2_{N_0}(\hat{x}) = \frac{1}{N_0 (N_0 - 1)} \sum_{j=1}^{N_0} \left[ F(\hat{x}, \xi^j) - \hat{f}_{N_0}(\hat{x}) \right]^2. \tag{4.30}
\]

Shapiro and Philpott (2007) continue by stating that it is possible to use a relatively large sample size $N_0$, since the computation of $F(\hat{x}, \xi^j)$ only requires solving separate second-stage problems. It is then possible to compute

\[
U_{N_0}(\hat{x}) = \hat{f}_{N_0}(\hat{x}) + z_\alpha \hat{\sigma}_{N_0}(\hat{x}), \tag{4.31}
\]

which is the estimated upper bound with a confidence level of $100(1 - \alpha)\%$. The bound is calculated using the Central Limit Theorem (CLT), where $z_\alpha = \Phi^{-1}(1 - \alpha)$ and $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution (Shapiro and Philpott 2007).
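A minimal sketch of the pessimistic bound (4.30)–(4.31), assuming a generic callable `F` that returns the scenario cost of a fixed candidate $\hat{x}$; the toy cost and sample are illustrative assumptions, not the thesis's second-stage problem.

```python
import numpy as np
from scipy.stats import norm

def upper_bound(F, x_hat, xi_sample, alpha=0.05):
    """Pessimistic bound U_{N0}(x_hat) from (4.30)-(4.31)."""
    costs = np.array([F(x_hat, xi) for xi in xi_sample])
    n0 = len(costs)
    f_hat = costs.mean()                               # sample average of F
    sigma_hat = costs.std(ddof=1) / np.sqrt(n0)        # square root of (4.30)
    return f_hat + norm.ppf(1.0 - alpha) * sigma_hat   # (4.31)

# Illustrative use with the toy cost from the earlier sketches.
rng = np.random.default_rng(4)
F = lambda x, xi: (x - xi) ** 2
print(upper_bound(F, 2.0, rng.normal(size=5_000)))
```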

After having calculated the pessimistic estimate, Shapiro and Philpott (2007) outline how the optimistic estimate (lower bound) for $v^*$ can be computed. They start by denoting $\hat{v}_N$ as the optimal value of an SAA problem, mentioned in subsection 4.5.1, with sample size $N$. Then, in order to compute the lower bound for $v^*$, the authors stress that it is important to understand that $\mathbb{E}\big[\hat{f}_N(x)\big] = f(x)$. This means that the sample average $\hat{f}_N(x)$ can be considered an unbiased estimator of $f(x)$. At the same time, they note that for any $x \in X$ the condition $\hat{f}_N(x) \geq \inf_{\tilde{x} \in X} \hat{f}_N(\tilde{x})$ applies. Hence, for any $x \in X$,

\[
f(x) = \mathbb{E}\big[\hat{f}_N(x)\big] \geq \mathbb{E}\left[ \inf_{\tilde{x} \in X} \hat{f}_N(\tilde{x}) \right] = \mathbb{E}[\hat{v}_N].
\]

Shapiro and Philpott (2007) then take the minimum over $x \in X$ of the left-hand side of the above condition, which results in $v^* \geq \mathbb{E}[\hat{v}_N]$. It is then possible to estimate $\mathbb{E}[\hat{v}_N]$ by solving the previously mentioned SAA problems and taking the average of the optimal values. In other words, $M$ SAA problems, each constructed from an i.i.d. generated sample of size $N$, are solved to optimality, and $\hat{v}_N^1, \dots, \hat{v}_N^M$ denote their optimal values. Thus,

\[
\bar{v}_{N,M} := \frac{1}{M} \sum_{j=1}^{M} \hat{v}_N^j \tag{4.32}
\]


is an unbiased estimator of $\mathbb{E}[\hat{v}_N]$. Shapiro and Philpott (2007) conclude that, as the samples are independent, the optimal values $\hat{v}_N^1, \dots, \hat{v}_N^M$ can also be considered independent. Thereby, it is possible to estimate the variance of $\bar{v}_{N,M}$ as

\[
\hat{\sigma}^2_{N,M} := \frac{1}{M(M-1)} \sum_{j=1}^{M} \left( \hat{v}_N^j - \bar{v}_{N,M} \right)^2. \tag{4.33}
\]

The estimated lower bound for $\mathbb{E}[\hat{v}_N]$ is then

\[
L_{N,M} := \bar{v}_{N,M} - t_{\alpha,\nu} \hat{\sigma}_{N,M}, \tag{4.34}
\]

where $\nu = M - 1$ and $t_{\alpha,\nu}$ is the $\alpha$-quantile boundary of the t-distribution with $\nu$ degrees of freedom. Shapiro and Philpott (2007) explain that in practice it is common to use small values of $M$; the t-distribution accounts for this, since for low degrees of freedom $t_{\alpha,\nu}$ is slightly larger than the corresponding value $z_\alpha$ for the standard normal distribution, while $t_{\alpha,\nu}$ converges towards $z_\alpha$ as $\nu$ increases. Nevertheless, as $v^* \geq \mathbb{E}[\hat{v}_N]$, the authors conclude that $L_{N,M}$ is a valid lower bound for $v^*$. Hence, the gap can be estimated as the difference between (4.31) and (4.34),

\[
\widehat{\operatorname{gap}}(\hat{x}) := U_{N_0}(\hat{x}) - L_{N,M} \tag{4.35}
\]

with a minimum confidence level of $1 - 2\alpha$. Shapiro and Philpott (2007) emphasize that the estimate of the lower bound, $L_{N,M}$, is conservative. They describe how this is due to the bias $v^* - \mathbb{E}[\hat{v}_N]$ of the SAA estimator, which could potentially be quite large.
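A minimal sketch of the optimistic bound (4.32)–(4.34), assuming a generic `solve_saa` callable that returns the optimal value of one SAA instance; the toy problem, in which each SAA instance is solved in closed form, is an illustrative assumption.

```python
import numpy as np
from scipy.stats import t

def lower_bound(solve_saa, sample_xi, N, M, alpha=0.05):
    """Optimistic bound L_{N,M} from (4.32)-(4.34)."""
    v_hats = np.array([solve_saa(sample_xi(N)) for _ in range(M)])
    v_bar = v_hats.mean()                             # (4.32)
    sigma_hat = v_hats.std(ddof=1) / np.sqrt(M)       # square root of (4.33)
    return v_bar - t.ppf(1.0 - alpha, df=M - 1) * sigma_hat   # (4.34)

# Toy problem: min_x E[(x - xi)^2] with xi ~ N(0, 1), so v* = 1. Each SAA
# instance is minimized by the sample mean, giving a closed-form optimal value.
rng = np.random.default_rng(5)
sample_xi = lambda n: rng.normal(size=n)
solve_saa = lambda xi: np.mean((xi.mean() - xi) ** 2)

L = lower_bound(solve_saa, sample_xi, N=200, M=20)
print(L)   # below v* = 1, reflecting the downward bias of E[v_hat_N]
```

Combined with `upper_bound` from the previous sketch, the gap estimate (4.35) is simply the difference `upper_bound(...) - lower_bound(...)`.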


5 - Method

This chapter introduces the approach implemented to fulfil the purpose of research of this thesis, presented in section 1.1. The approach is split into four sections. The first section focuses on data collection through databases and in-person interviews. The second section is aimed at evaluating and outlining the available methods. The third section outlines the model design and configuration, specifying exactly how the purpose will be achieved. The last section focuses on evaluating the results and analyzing the different scenarios.

5.1 Data collection

In this thesis the collection of data is split into two parts. The first part is the collection of quantitative data to provide deeper insight into the time series. The second part focuses on the collection of qualitative data to gain an understanding of the underlying processes.

5.1.1 Quantitative data

Data containing the usage of vCPU and memory will be fetched from GCP. This data will be collected for a wide range of granularities and aggregations across platforms and regions. The reason for this is mainly availability, as many data sets were created when Spotify migrated to GCP during 2018. The different data sets will then be used to analyze seasonality, scalability and similar trends over time. For the simulation, a representative week of 2-minute granular data will be used, because the 2-minute data has to be fetched via a cross-reference between databases, which results in a very slow response time. As a consequence, it would take approximately two months of non-stop computing to fetch all available historic 2-minute data.

Data outlining the number of MAU will also be collected and, like the usage data, used for simulation.

5.1.2 Qualitative data

Qualitative data will be collected by interviewing employees at Spotify and Google. This is needed to form an understanding of the underlying mechanisms driving the usage of vCPU and memory. It is also required to understand the dynamics and effects of the ongoing right-sizings and migrations described in subsections 2.2.1, 2.2.2 and 2.2.3.

5.2 Evaluation of potential methods

To find an optimal procurement strategy for GCE resources, the future demand for vCPU and memory must be estimated. To predict the future use of GCE resources,
