
END-TO-END LATENCY AND COST IMPACT OF FUNCTION SEGREGATION AND CUSTOMIZED MEMORY ALLOCATION IN FAAS ENVIRONMENTS

Desireé Fredriksson

Bachelor Thesis, 15 credits
Bachelor of Science Programme in Computing Science
2021


Abstract

Function as a service (FaaS) is a type of serverless cloud computing intended to facilitate development by abstracting away infrastructural management, and to offer a more flexible pay-as-you-go billing model based on execution time and memory allocation. FaaS functions are deployed to the cloud provider either as single units, or chained to form a pipeline of multiple functions that call each other. As each step in the pipeline might have different requirements, it could be beneficial to split larger functions into smaller parts. This would enable customized provisioning according to each function's needs, and potentially result in a lower rate. However, decreased memory entails lower CPU performance, which directly correlates with computation time. A test application was created and executed on Google Cloud services to investigate what impact function segregation, and provisioning accommodated to each sub-function's requirements, have on end-to-end latency and total cost. In conclusion, no trivial relation between cost and performance was found. Segregating and adjusting provisioning to required memory was in this experiment cheaper in some cases, but not all; it was, however, always significantly slower. In addition to price and workload behavior being considered and balanced, it was found that aspects such as the level of control over infrastructural management and hardware configuration have to be weighed in when deciding if FaaS is a suitable alternative for a given situation.


Acknowledgements

Thank you for being you, Klas.


Contents

1 Introduction
1.1 Purpose
1.2 Research question
1.3 Method used
1.4 Delimitations
2 Related work
3 Theoretical background
3.1 Cloud computing
3.2 Function as a service
4 Method
4.1 Test application
4.2 Experiments
4.3 Statistics
5 Results
5.1 Increased memory provisioning
5.2 Segregation latency
5.3 Custom memory provisioning
6 Discussion
7 Future work
References


1 Introduction

Function as a Service (FaaS) is a serverless computing technology and an increasingly popular model for application development. Functions are stateless, cloud-native code deployed to a cloud provider with minimal environmental configuration. They are invoked in response to an event and executed on servers completely managed by the provider. Idle computation is not billed, as tenants are typically charged in relation to actual usage. Cost is based on execution time combined with the amount of memory allocated for the function. This model provides trivial upscaling and downscaling under fluctuating loads, and a more flexible billing model than traditional server rental, where the user is charged a fixed amount each day.

1.1 Purpose

FaaS functions can be deployed as single units, consolidated, or chained to form a pipeline consisting of multiple functions that call each other, segregated. Calling a function from another incurs a run-time penalty, latency: a delay between the call being made and its responding action. This latency is not billed as long as the calling function does not depend on the result from the other, but it does affect end-to-end time.

As each step in the pipeline might have different memory requirements, it could be beneficial to split a resource-intensive application into smaller functions. This would enable more accurate memory allocation according to each function's actual needs, and consequently result in a lower rate. However, a decrease in memory also lowers CPU performance, which directly correlates with computation time. As computation time is an additional factor in total cost, the negative effect a lower memory provisioning might have on end-to-end time could affect total price as well.

1.2 Research question

What impact does function segregation, combined with customized memory provisioning accommodated to each sub-function's actual requirements, have on end-to-end latency and total cost?

1.3 Method used

A test application was created and run on Google Cloud services as either consolidated or segregated, provisioned with different amounts of memory and CPU performance. Start time and end time for each function were measured for execution time and latency evaluation, and price was calculated by combining the execution time of each function with the prices listed on the Google Cloud Functions website.

1.4 Delimitations

FaaS is offered by several cloud providers, but only Google Cloud Functions is considered, due to the scope of this study. For the same reason, the experiments leave out data transfers, and the price calculations exclude the free tier and regions other than us-central1.


2 Related work

Several studies have investigated the impact of different factors on the performance of FaaS.

In their paper FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC [5], Spillner et al. investigate how FaaS performs within scientific and high-performance computing (HPC), compared to conventional HPC platforms. The highly controlled settings and demands for repeatability associated with these disciplines are uncommon use-cases for FaaS, seeing as its main purpose is to abstract away infrastructural configuration. Having resources available on-demand and charged according to use, however, is appealing, and motivated the authors' study of FaaS's potential applicability within this area. Spillner et al. conducted four experiments to study the performance and usefulness of FaaS when executing resource-demanding computing tasks: calculation of π, face detection, password cracking, and precipitation forecasting. The FaaS model successfully executed the experiments, but did not outperform the approximate performance of common fast processors. On the contrary, it is generally far slower, and the providers do not offer to specifically increase performance. When comparing vendors, however, a resemblance among them all was found: increased memory provisioning also increased performance. Further, their results showed a significant difference in resource requirement characteristics among the domains.

To address this, the authors suggest investigating "special-purpose FaaS instances". Additionally, the authors contribute some concepts and tools intended to improve the engineering process. One example is worm functions, a concept for managing execution time restrictions by dividing larger functions into smaller ones that call each other, each of which is short enough to adhere to the constraints.

Function partitioning relates to one of six performance challenges for serverless computing presented by van Eyk et al. in A SPEC RG Cloud Group's Vision on the Performance Challenges of FaaS Cloud Architectures [6]. It concerns dealing with overhead in several stages of FaaS, one of them being request overhead. Invoking a function introduces latency, and an increased number of functions entails more overhead due to invocations between these functions. This calls for measures to reduce the overhead to an acceptable level. Another challenge the authors introduce is understanding the trade-off between performance and cost.

Compared to on-demand virtual machines and containers, they describe that FaaS becomes less lucrative when requests exceed a certain number per second. Pricing models that depend on load variations and differ between vendors increase complexity, and make it harder to find the most advantageous alternative. Therefore, the authors suggest more research on how to simplify decision-making, or possibly find automated solutions that aid the user in minimizing operational costs.

RS Kannan et al. present a strategy to improve resource utilization and throughput in microservice architecture management while still satisfying Service Level Agreements (SLAs): GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [4]. As data-center accommodation adapts to new ways of providing resources, with multiple tenants or applications accessing shared microservices, there must be a procedure to ensure that each user acquires the agreed level of service. Previous approaches suggest co-locating latency-sensitive, high-priority applications with applications of lower priority, but the authors focus on a finer granularity. Microservices, the individual parts and foundation of the applications, are analyzed by establishing factors that influence execution time. The intention is to make improvements and simultaneously enable co-location of several applications requiring low latency, while satisfying SLAs. The study is limited to artificial intelligence and machine learning, and findings included that it is beneficial to execute requests from various sources as consolidated, since resource utilization improved.


3 Theoretical background

This section presents key concepts of this study and related work in more detail.

3.1 Cloud computing

Cloud computing is hardware, as well as software services, provided on-demand over the internet [2]. Users are able to acquire these virtual resources as needed, and are charged in relation to what is allocated. The intention is to achieve flexibility and economical advantages by sharing resources. The services can be either publicly accessible, or privately owned and limited to a single organization. Some major public cloud providers are Amazon Web Services^1, Google Cloud Platform^2, and Microsoft Azure^3. Examples of services they provide include scalable data storage, computational resources, and managed services such as database or web server hosting. Compared to single corporate data-centers, these organizations are commonly distributed world-wide to enable data back-up and reduced latency for geographically different locations. Tenants are offered regularly updated hardware and the option to customize settings and performance.

Maintaining cloud services at an affordable rate, while still profiting, has been made possible by more efficient use of resources compared to the average data center [2]. Traditionally, it has been common to accommodate available resources to manage peak load, with the purpose of minimizing the risk of capacity saturation, which presumably leads to an unsatisfactory user experience and thereby a loss of revenue. As a consequence, under-utilization is experienced when computations are idle. This expensive surplus was inevitable due to fluctuating service demands, difficulties in predicting future requirements, and the time required to obtain and configure hardware. With this new model for how computing power is distributed and managed, organizations and individual users are spared upfront investments in operating manpower and hardware, as well as maintenance costs, service agreements, and rent.

Therefore, it can be profitable to utilize cloud services, as total costs might be reduced by more accurately adjusting resources to current needs, even if the actual computation cost might be higher compared to running on one's own servers. Additionally, inexpensive areas have been selected to establish massive commodity-computer data centers that, due to their size, are often eligible for favorable prices on service agreements and hardware.

Cloud architecture can be implemented with different levels of abstraction concerning hardware and operations, to accommodate different needs of server management [7]. Users can acquire services ranging from complete software applications that are ready to use without any knowledge or management of their infrastructure, to pure compute, network, and storage resources that allow fine-tuned configuration. The various models mostly differ in the amount of work required to set up and continuously maintain a service, which is also the motivation for using either one of them. Centrally situated within this scope is an architecture known as serverless, where operational logic and configuration, such as automatic scaling and server management, is mostly abstracted away. Resources are dynamically allocated on-demand, and the user is charged according to usage. Included in this category is Function as a Service, a technology where stateless functions deployed to the cloud are triggered as a reaction to events.

^1 https://aws.amazon.com/
^2 https://cloud.google.com/
^3 https://azure.microsoft.com/


3.2 Function as a service

Function as a service (FaaS) is a type of serverless cloud computing that follows the increasing migration of applications towards a micro-service architecture [3]. The intentions are to facilitate development by abstracting away the infrastructure commonly associated with creating and deploying micro-services, and to present a more flexible billing model that applies pay-as-you-go with no upfront costs. Tenants allocate and receive resources as needed, and are charged in relation to actual usage. This assures enough resources under rising loads, and the ability to scale all the way down to zero when computations are idle, avoiding unnecessary expenses due to surplus resources.

Functions are cloud-native code, for instance a JavaScript or Python function, provided by the user and deployed to a cloud provider [1]. They are invoked by events, typically an HTTP request, and can be composed of a single, consolidated function, or segregated into a pipeline of several functions chained together by calling each other. In contrast to subscription-based rental of full-stack virtual machines, users are charged according to execution time combined with the computational resources required by the function, in addition to an invocation fee.

Necessary infrastructure commonly associated with creating and deploying micro-services, such as hardware management, scaling, and load-balancing, is operated by the cloud provider. This allows improved resource efficiency for providers, and lets users focus on application logic. Consequently, tenants have less control over hardware design and operational composition, which gives cloud providers increased oversight to govern the development stack and security, enhance resource utilization, and endorse further use of supplementary cloud services [3].

Functions generally have a limited execution time set by the cloud provider^4 [8]. Initiating a function is associated with the cold start problem, where start-up time is affected by the set-up of required dependencies and resources. To minimize this effect, the prepared resources are maintained and kept ready to use for some time after termination, allowing the next function to reuse them. Additionally, functions are stateless and therefore require supplementary, shared storage if state persistence is needed.

Examples suitable for a serverless environment often display functional characteristics such as being stateless and event-driven, designed to have a sole assignment with few requirements on latency [6]. Typical use cases are connecting different cloud services, handling the occasionally emitted data within the Internet of Things, and serving as a minor backend for web applications [5].

FaaS was first introduced by Amazon in 2014 with their platform AWS Lambda^5. Today, the service is offered by several cloud vendors, including Google Cloud Functions^6 by Google, IBM Cloud Functions^7, which uses Apache OpenWhisk, and Microsoft Azure Functions^8 by Microsoft.

^4 Execution time limits vary among providers, but as of May 2021 most are positioned around 10-15 minutes.
^5 https://aws.amazon.com/lambda/
^6 https://cloud.google.com/functions
^7 https://www.ibm.com/cloud/functions
^8 https://azure.microsoft.com/services/functions/


4 Method

To investigate what impact function segregation, combined with customized memory provisioning accommodated to each sub-function's actual requirements, has on end-to-end latency and total cost, three experiments were conducted on the Google Cloud platform. A test application was created using Google Cloud Functions to explore particular aspects of FaaS, and integrated with Google Cloud Firestore^9 for data aggregation and storage. RStudio^10 and the language R were used for data visualization and statistical analysis.

4.1 Test application

Each test case involves an application that is triggered by an HTTP request and run as either consolidated or segregated in Google Cloud Functions, provisioned with different amounts of memory and CPU. Google Cloud was chosen as the platform as it is one of the larger available providers, with sufficient functionality to perform the desired experiments. The test application was artificially constructed to control resource requirements and enable partitioning into equal sub-functions, in order to ensure the right conditions for the experiment. Requirements regarding computation and memory are constant for each application, except for the additional invocations between functions.

The test application is written in Python and executed in the Google Cloud Python 3.8 environment, performing a series of eight calculations of the 100 000th Fibonacci number, f(100 000), as described in Algorithm 1. The decision to use f(100 000) results from a few test-runs to find a calculation with a reasonable execution time considering the scope of this study. The number of consecutive calculations was chosen to enable segregation into enough functions of equal size to run sufficiently comprehensive experiments, while keeping the time required to build the applications at an acceptable level.

Algorithm 1: Fibonacci
Input: integer i
Result: f(i), the i-th Fibonacci number

f1 = 0
f2 = 1
for j = 0 to i-1 do
    x = f1 + f2
    f2 = f1
    f1 = x
end
return f1
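For concreteness, the following is a minimal Python sketch of Algorithm 1 together with the eight-calculation sequence, as it might appear in the test application. The thesis does not publish its source code, so the names and structure here are illustrative.

```python
def fibonacci(i: int) -> int:
    """Iteratively compute the i-th Fibonacci number, mirroring Algorithm 1."""
    f1, f2 = 0, 1
    for _ in range(i):
        # Simultaneous update: x = f1 + f2; f2 = f1; f1 = x
        f1, f2 = f1 + f2, f1
    return f1

# The test application performs a series of eight calculations of f(100 000).
for _ in range(8):
    fibonacci(100_000)
```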

To illustrate how the test application is segregated into sub-functions within the various test cases, the syntax shown in Figure 1 is used. The external box represents an application, and the internal boxes represent single functions, each with individual provisioning, executing a sequence of x calculations of the i-th Fibonacci number. Optional arrows illustrate internal function invocations using Cloud Functions Invoker. The variable y is a placeholder for the amount of memory allocated for each function, and z the approximate CPU performance. A single internal box indicates a consolidated application with one function and no internal function invocations. The term invocation index refers to the position of an invocation arrow, starting at one and counting from the left.

^9 https://cloud.google.com/firestore
^10 https://www.rstudio.com/


[Figure: two function boxes, each labeled "f(i) * x" and "y MB, z GHz", connected by an invocation arrow.]

Figure 1: Syntax used to illustrate how functions within the test application are configured. Individual provisioning is illustrated as y MB memory and z GHz of approximate CPU performance. The task to perform is described as an x-long sequence of calculations of the i-th Fibonacci number, and optional arrows represent internal function invocations.

This arrangement of various pipelines aims to show how different levels of segregation affect end-to-end time, by measuring start time and end time for each function and comparing any differences to the consolidated counterpart. A data collection utility backend was created using Google Cloud Firestore to record performance and latency metrics from the running applications, as it integrates well with Google Cloud Functions and offers sufficient functionality for the experiments in this study.
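As an illustration of such a utility, the sketch below logs one function's start and end times with the google-cloud-firestore client library. The collection and field names are hypothetical, not taken from the thesis.

```python
import time

from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()

def record_run(function_name: str, start: float, end: float) -> None:
    """Persist one function execution's timing for later latency analysis."""
    db.collection("measurements").add({   # "measurements" is a made-up name
        "function": function_name,
        "start": start,                   # epoch seconds at function entry
        "end": end,                       # epoch seconds just before returning
        "duration": end - start,
    })

start = time.time()
# ... perform the function's workload here ...
record_run("consolidated", start, time.time())
```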

Price calculation

Functions were provisioned with different amounts of memory between test cases to measure any impact this might have on total price. The amount of allocated memory has a direct correlation to cost, as each 100 milliseconds of execution time is charged according to the level of provisioning. However, the amount of allocated memory also has a direct correlation to CPU performance, which impacts another price factor, namely execution time. To examine this, price was manually calculated from the measured times and the prices listed on the Google Cloud Functions website^11, shown in Table 1. Only prices for the paid tier in the us-central1 region are included, as the experiments used that region without regard to the free tier or other regions.

Table 1: Google Cloud Functions prices in USD for region us-central1, excluding the initial free tier, as of May 2021.

Paid tier (USD)

Invocation                      4.0e-7   per unit
CPU Time                        1.0e-5   per second
Memory Time                     2.5e-6   per GiB-second

128 MB memory, 0.2 GHz CPU      2.31e-7  per 100 ms
256 MB memory, 0.4 GHz CPU      4.63e-7  per 100 ms
512 MB memory, 0.8 GHz CPU      9.25e-7  per 100 ms
1024 MB memory, 1.4 GHz CPU     1.65e-6  per 100 ms
2048 MB memory, 2.4 GHz CPU     2.90e-6  per 100 ms
4096 MB memory, 4.8 GHz CPU     5.80e-6  per 100 ms
8192 MB memory, 4.8 GHz CPU     6.80e-6  per 100 ms

^11 https://cloud.google.com/functions/pricing


The estimated execution cost was calculated using the current pricing table and the recorded execution times to get a stable estimate, as the billing model for Google Cloud Functions includes a certain number of free invocations. The duration of the free tier depends on usage; each month includes 2 000 000 invocations, 400 000 GB-seconds, 200 000 GHz-seconds of computation time, and 5 GB of network egress traffic. Excluded from the price calculation in this study is the function invocation cost of 0.0000004 USD (0.40 USD per 1 000 000 invocations), as it is not large enough to significantly impact the measured values. However, its existence should be noted and taken into account when calculating the total price of functions.

The cost of function invocations is a fixed price per unit, regardless of the invocation source, its outcome, or its duration. Compute time fees depend on provisioned memory and CPU, and are measured and rounded up to the nearest 100 ms, from when a request is received by a function to when it is terminated in any way. Network egress is charged in relation to usage (GB) at a flat rate, whereas inbound data, including outbound data to Google APIs within the same region or globally, is free. Data transfers are not included in this study. Hence, the price for a certain function was calculated by multiplying its measured computation time with the price listed for running a function with its current provisioning.
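Expressed as code, this price model reduces to a lookup in Table 1 plus rounding. The sketch below follows the rules stated above (100 ms rounding, per-tier rates, invocation fee ignored as in this study); the function names and example durations are illustrative.

```python
import math

# Per-100 ms compute rates in USD for each memory tier (Table 1, us-central1).
RATE_PER_100MS = {
    128: 2.31e-7, 256: 4.63e-7, 512: 9.25e-7, 1024: 1.65e-6,
    2048: 2.90e-6, 4096: 5.80e-6, 8192: 6.80e-6,
}

def function_cost(duration_s: float, memory_mb: int) -> float:
    """Cost of one execution: duration rounded up to the nearest 100 ms slot."""
    millis = round(duration_s * 1000)      # whole ms, avoids float artifacts
    slots = math.ceil(millis / 100)
    return slots * RATE_PER_100MS[memory_mb]

def pipeline_cost(runs):
    """Total cost of a pipeline given (duration_s, memory_mb) per sub-function."""
    return sum(function_cost(d, m) for d, m in runs)

# Illustrative example: four sub-functions at the four lowest tiers.
print(pipeline_cost([(2.2, 128), (2.2, 256), (2.2, 512), (2.2, 1024)]))
```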

4.2 Experiments

The data collection in this study was divided into three experiments, Increased memory provisioning, Segregation latency, and Custom memory provisioning, each with its own purpose towards answering the research question. Focusing each experiment on a single measure reduces complexity, and therefore increases the chance that what is intended is actually tested and that the result is not affected by circumstances belonging to other experiments.

Increased memory provisioning

The first experiment was designed to investigate how memory provisioning affects computation time. As seen in Table 1, Google Cloud Functions can be provisioned with seven different amounts of memory. CPU performance correlates with memory provisioning, so each increment also increases CPU; the two highest levels are the exception, as CPU remains the same between them. CPU performance is, however, described on the website as an approximation, which implies that the definite number of CPU clock cycles provisioned to a function is not certain and might differ from what was actually allocated.

[Figure: consolidated application "f(100 000) * 8", run once at each provisioning level: 128 MB/0.2 GHz, 256 MB/0.4 GHz, 512 MB/0.8 GHz, 1024 MB/1.4 GHz, 2048 MB/2.4 GHz, 4096 MB/4.8 GHz, and 8192 MB/4.8 GHz.]

Figure 2: Application configuration to investigate how differences in provisioning affect end-to-end time and computation time.

As CPU performance might have an impact on computation time, increased memory, and consequently increased CPU, could result in a faster run-time. The test application was therefore run as consolidated, once for each of the seven possible provisionings, as illustrated in Figure 2. Start time and end time were logged for each function to measure possible differences in run-time across the various provisionings.


Segregation latency

Experiment number two investigates the impact segregation has on end-to-end time and total computation time. As segregation introduces internal function invocations that cause a latency penalty, end-to-end time might increase. There is also some extra logic within the calling function, needed to realize the invocation, that might affect total computation time.

This was tested by segregating the application into two, four, and eight functions, see Figure 3, creating a pipeline of sub-functions invoking each other.

[Figure: three pipelines, each function provisioned 256 MB/0.4 GHz: two functions of "f(100 000) * 4", four functions of "f(100 000) * 2", and eight functions of "f(100 000) * 1".]

Figure 3: Application configuration to investigate the impact segregation has on total computation time and end-to-end time.

Each function had the standard setting of 256 MB memory and 0.4 GHz CPU, and each pipeline was repeated 100 times. Start time and end time were logged for each function in order to measure total computation time and possible latency penalties introduced by functions invoking each other. The measured times were compared to the consolidated application provisioned with 256 MB from the experiment Increased memory provisioning, as it is the non-segregated counterpart to the applications in this test case.
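As an illustration, the sketch below shows what one sub-function in such a pipeline could look like as an HTTP-triggered Python Cloud Function. The URL and names are hypothetical, the call is shown unauthenticated for brevity (a production call governed by Cloud Functions Invoker would attach an identity token), and the thesis does not publish its invocation code, so this blocking call is only one way the chaining could be realized.

```python
import time

import requests  # declared as a dependency in requirements.txt

# Hypothetical URL of the next sub-function in the pipeline.
NEXT_FUNCTION_URL = "https://us-central1-<project>.cloudfunctions.net/segment2"

def fibonacci(i):
    f1, f2 = 0, 1
    for _ in range(i):
        f1, f2 = f1 + f2, f1
    return f1

def segment1(request):
    """HTTP entry point: do this segment's share of work, then invoke the next."""
    start = time.time()
    for _ in range(4):                  # f(100 000) * 4, as in Figure 3
        fibonacci(100_000)
    # Invoke the next sub-function; the response body is not used, matching
    # the thesis's note that the caller does not depend on the result.
    requests.post(NEXT_FUNCTION_URL, json={"sent_at": time.time()})
    return f"done in {time.time() - start:.3f} s"
```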

Custom memory provisioning

The third experiment investigates the cost impact of segregating functions and adjusting memory provisioning according to each sub-function's requirements. Minimizing available memory according to actual needs might lower total cost, as price depends on provisioning. However, since the price model couples decreased memory provisioning with decreased CPU performance, total computation time might also increase, as it depends on CPU performance. Considering that computation time is an additional factor when calculating price, this experiment examines what effect customized memory provisioning truly has on total cost.

Since cost depends on what is allocated rather than on actual usage of allocated memory, all sub-functions were identical beyond provisioning. In other words, the same Fibonacci calculation was performed within each function, with differences only in provisioning.

8

(17)

The experiment compares the test application, segregated into four differently provisioned sub-functions, with its consolidated counterpart, provisioned like the highest provisioned sub-function. This aims to simulate two possible execution scenarios of an application: it can conform to the highest required provisioning level throughout, or be divided into sections depending on memory requirements, with provisioning adjusted accordingly.

The provisioning distribution for the test applications used in the experiment was represented in two different ways: the four lowest levels of CPU, as seen in Figure 4, and the four highest levels of CPU, as seen in Figure 5. The reason for this is to account for the fact that memory is doubled for each level, while this is not the case for CPU. Except for the highest provisioning (8192 MB), the ratio is however close enough to be considered the same for the scope of this experiment. The highest provisioning was therefore excluded, in order to see if there is a difference despite the proportion between prices and CPU being practically the same. The effort is not extensive enough to cover all aspects of the differences between prices and performance given various segregations and partitionings, but is simply a means to discover indications of what the contrasts might be.

[Figure: pipeline of four "f(100 000) * 2" functions provisioned 128 MB/0.2 GHz, 256 MB/0.4 GHz, 512 MB/0.8 GHz, and 1024 MB/1.4 GHz.]

Figure 4: Application configuration to test the cost impact of custom memory provisioning. Segregated into four sub-functions provisioned (128 MB, 256 MB, 512 MB, 1024 MB).

[Figure: pipeline of four "f(100 000) * 2" functions provisioned 512 MB/0.8 GHz, 1024 MB/1.4 GHz, 2048 MB/2.4 GHz, and 4096 MB/4.8 GHz.]

Figure 5: Application configuration to test the cost impact of custom memory provisioning. Segregated into four sub-functions provisioned (512 MB, 1024 MB, 2048 MB, 4096 MB).

Total cost was calculated for each application, and end-to-end time was compared with a consolidated application provisioned to accommodate the highest requirement within each segregated application. For the application visualized in Figure 4 this amounts to 1024 MB, and for the application visualized in Figure 5, 4096 MB.

4.3 Statistics

Statistical analysis and data visualization were conducted in RStudio using the language R. Box plots were used to characterize each group of measured values and present a graphical overview of the distribution, as they present adequate information in a small space and allow comparison between several groups. Hence, all groups within each test case were illustrated collectively to demonstrate the distributions in relation to each other. Corresponding tables enable more exact comparisons of the measured values.

Within each test case, one-sided, paired t-tests were conducted to examine the statistical support for any differences found in the data. Paired tests were chosen to examine impact under different scenarios, and they were one-sided as variations were expected to go in a certain direction. A significance level of 0.05 was chosen by convention. For each test group, 100 data points were collected to get an adequate sample size while maintaining a reasonable run-time in relation to the scope of this study.
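The analysis itself was done in R; purely to illustrate the test performed, the following is an equivalent one-sided paired t-test in Python with SciPy, where the two arrays are placeholders standing in for the 100 paired measurements of a group.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder data standing in for 100 paired end-to-end times (seconds).
consolidated = rng.normal(5.6, 0.5, 100)
segregated = rng.normal(7.0, 0.4, 100)

# One-sided alternative: consolidated runs are expected to be faster.
result = stats.ttest_rel(consolidated, segregated, alternative="less")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2g}")  # significant if p < 0.05
```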


5 Results

This section presents the results from the three experiments conducted in this study. Increased memory provisioning shows how variations in memory allocation affect computation time. Segregation latency shows how different degrees of segregation affect total computation time and end-to-end time. Finally, Custom memory provisioning shows how cost is affected by segregation and by adjusting each sub-function's provisioning according to required memory.

5.1 Increased memory provisioning

Figure 6 shows how the computation time of the consolidated application is affected by a difference in memory provisioning. In the experiment, each increment between 128 MB and 2048 MB induced a statistically significant decrease in run-time, but further provisioning did not.

[Figure: box plot, "Consolidated application with dynamic memory provisioning"; x-axis: memory provisioning (MB), from 128 to 8192; y-axis: end-to-end time (sec).]

Figure 6: End-to-end time summary in seconds of seven applications provisioned differently.

As seen in Table 2, each memory provisioning has at least one test-run with a run-time of less than two seconds, around the same level as the run-times of the higher provisionings. Apart from these outliers, the run-times within each provisioning were rather concentrated.

Table 2: End-to-end time summary in seconds of seven applications provisioned differently.

          Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
128 MB     1.507    13.000       13.108      12.821    13.299       13.874
256 MB     1.314     5.140        5.298       5.633     6.265        6.652
512 MB     1.623     2.919        2.975       2.979     3.059        3.208
1024 MB    1.754     1.805        1.826       1.841     1.881        2.117
2048 MB    1.354     1.376        1.400       1.420     1.440        1.838
4096 MB    1.353     1.375        1.384       1.404     1.415        1.601
8192 MB    1.337     1.358        1.373       1.393     1.413        1.759


5.2 Segregation latency

Figure 7 shows how end-to-end time for an application provisioned with 256 MB memory is affected by different degrees of segregation. In the experiment, each doubling of functions in the applications resulted in a statistically significant increase in end-to-end time.

[Figure: box plot, "Consolidated compared to segregated applications provisioned 256 MB memory"; x-axis: composition (consolidated, segregated in two, four, and eight); y-axis: end-to-end time (sec).]

Figure 7: End-to-end time summary in seconds for applications provisioned with 256 MB memory and run as consolidated, segregated in two, four, and eight sub-functions.

The measured run-times, seen in Table 3, show that the mean end-to-end time of the consolidated application is 5.63 seconds, and for the segregated applications 6.98 (2), 7.71 (4), and 10.84 (8) seconds. The first three applications are closer in time compared to the fourth. The same applies when considering application compositions, as the numbers of latency penalties introduced by the first three applications (0, 1, and 3) are closer than that of the fourth (7).

Table 3: End-to-end time summary in seconds for applications provisioned with 256 MB memory and run as consolidated and segregated in two, four, and eight sub-functions.

                 Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Consolidated      1.314     5.140        5.298       5.633     6.265        6.652
Segregated (2)    5.157     6.896        6.986       6.981     7.094        8.299
Segregated (4)    7.145     7.513        7.660       7.710     7.802        9.060
Segregated (8)   10.27     10.67        10.80       10.84     10.99        11.99

As Figure 8 and its corresponding Table 4 show, there is no significant difference between any of the latency penalties introduced in the variously segregated applications provisioned with 256 MB memory. In the experiment, each latency penalty caused by function-to-function invocations was measured to around 0.5 seconds, regardless of invocation index.


[Figure: box plot, "Individual function latency"; x-axis: function invocation index (2, 4.1-4.3, 8.1-8.7); y-axis: latency penalty time (sec).]

Figure 8: Latency penalties in seconds for each individual invocation index in applications provisioned with 256 MB memory.

Table 4: Latency penalty summaries in seconds for applications provisioned with 256 MB memory. Includes the average over invocation indices, followed by each individual invocation index.

            Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Total avg.   0.3444    0.5073       0.5262      0.5452    0.5865       2.5898
Index 4.1    0.4176    0.5111       0.5262      0.5426    0.5411       2.5898
Index 4.2    0.3824    0.5096       0.5580      0.5752    0.5728       1.8673
Index 8.1    0.3444    0.4549       0.5100      0.5200    0.5738       0.7124
Index 8.2    0.4212    0.5152       0.5344      0.5401    0.5531       0.8329
Index 8.3    0.4195    0.5135       0.5641      0.5615    0.5943       0.9004
Index 8.4    0.3469    0.5103       0.5387      0.5670    0.6046       0.9477
Index 8.5    0.4160    0.5095       0.5411      0.5554    0.6040       0.7201
Index 8.6    0.4215    0.5049       0.5160      0.5175    0.5280       0.8286
Index 8.7    0.3633    0.5101       0.5516      0.5330    0.5591       0.6556

Subtracting the total latency penalty from the end-to-end time of each application shows how total computation time is affected by segregation. By this calculation, the consolidated application has a mean computation time of 5.63 seconds, and the segregated applications mean computation times of 6.42 (2), 6.069 (4), and 7.046 (8) seconds. The structural difference is that invocation logic is introduced in all but the last of the segregated application's functions. Although this is simply a comparison of means, and no apparent pattern was found to explain these differences in computation time at the lower degrees of segregation, a higher degree of segregation gives some indication of increased computing time.
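In symbols (notation introduced here, not taken from the thesis): for an application segregated into n sub-functions, the mean total computation time follows from the mean end-to-end time by subtracting the mean latency penalty at each of the n - 1 invocation indices,

$$\bar{t}_{\mathrm{comp}} \,=\, \bar{t}_{\mathrm{e2e}} \,-\, \sum_{k=1}^{n-1} \bar{\ell}_k,$$

where \bar{\ell}_k denotes the mean latency penalty at invocation index k.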


5.3 Custom memory provisioning

When combining segregation with customized provisioning, the results show a different relationship between cost and performance depending on application composition.

Figure 9 shows that the consolidated application, provisioned with 1024 MB, has a significantly shorter end-to-end time and lower total execution cost than an application segregated into four sub-functions provisioned with 128 MB, 256 MB, 512 MB, and 1024 MB, respectively.

[Figure: two box plots, "Runtime, consolidated vs custom memory" (y-axis: runtime (sec)) and "Cost, consolidated vs custom memory" (y-axis: cost (USD)); x-axis: application composition (consolidated; segregated (4), custom memory).]

Figure 9: End-to-end time in seconds and cost in USD for an application segregated in four and provisioned with (128 MB, 256 MB, 512 MB, 1024 MB), and a consolidated application provisioned with 1024 MB.

Table 6 shows a 0.437e-07 USD difference in mean price, and Table 5 shows a 7.056-second difference in mean end-to-end time, between the consolidated and segregated applications.

Table 5: End-to-end time in seconds for an application segregated in four and provisioned with (128 MB, 256 MB, 512 MB, 1024 MB), and a consolidated application provisioned with 1024 MB.

               Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Consolidated    1.754     1.805        1.826       1.841     1.881        2.117
Segregated      7.356     8.547        8.768       8.897     9.094       11.881

Table 6: Cost in USD for an application segregated in four and provisioned with (128 MB, 256 MB, 512 MB, 1024 MB), and a consolidated application provisioned with 1024 MB.

               Min. (USD)  1st Qu. (USD)  Median (USD)  Mean (USD)  3rd Qu. (USD)  Max. (USD)
Consolidated   2.894e-07   2.979e-07      3.014e-07     3.038e-07   3.104e-07      3.493e-07
Segregated     3.102e-07   3.299e-07      3.361e-07     3.475e-07   3.475e-07      7.942e-07


Figure 10 shows that the consolidated application, provisioned with 4096 MB, has a significantly shorter end-to-end time but a higher total execution cost than an application segregated into four sub-functions provisioned with 512 MB, 1024 MB, 2048 MB, and 4096 MB, respectively.

[Figure: two box plots, "Runtime, consolidated vs custom memory" (y-axis: runtime (sec)) and "Cost, consolidated vs custom memory" (y-axis: cost (USD)); x-axis: application composition (consolidated; segregated (4), custom memory).]

Figure 10: End-to-end time in seconds and cost in USD for an application segregated in four and provisioned with (512 MB, 1024 MB, 2048 MB, 4096 MB), and a consolidated application provisioned with 4096 MB.

Table 8 shows a 2.256e-07 USD difference in mean price, and Table 7 shows a 1.872-second difference in mean end-to-end time, between the consolidated and segregated applications.

Table 7: End-to-end time in seconds for an application segregated in four and provisioned with (512 MB, 1024 MB, 2048 MB, 4096 MB), and a consolidated application provisioned with 4096 MB.

               Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Consolidated    1.353     1.375        1.384       1.404     1.415        1.601
Segregated      2.968     3.131        3.190       3.276     3.271        5.456

Table 8: Cost in USD for an application segregated in four and provisioned with (512 MB, 1024 MB, 2048 MB, 4096 MB), and a consolidated application provisioned with 4096 MB.

               Min. (USD)  1st Qu. (USD)  Median (USD)  Mean (USD)  3rd Qu. (USD)  Max. (USD)
Consolidated   7.847e-07   7.977e-07      8.026e-07     8.145e-07   8.204e-07      9.287e-07
Segregated     5.538e-07   5.688e-07      5.796e-07     5.889e-07   5.914e-07      7.967e-07

Thus, segregating an application to customize provisioning caused, in these two experiments, a lower price in one case and a higher price in the other. Regarding end-to-end time, the experiments show that the consolidated applications are significantly faster in both test cases.


6 Discussion

To gain an understanding of FaaS and explore how the price of functions is affected under the prevailing circumstances, experiments were conducted to identify factors that affect cost and how they can be influenced. Function segregation and provisioning according to each sub-function's requirement were ultimately chosen as the factors to focus on, as provisioning is one of few modifiable parameters in FaaS. Provisioning also has a direct relation to price, and an indirect relation to runtime, which in turn also affects cost, but in the opposite direction. Therefore, as both runtime and cost are important aspects to consider in development, it is interesting to investigate how these factors are actually related.

The results show that segregation increases end-to-end latency significantly. Each function invocation was measured to approximately 0.5 seconds, a consistent number regardless of the number of segregations and the invocation index. However, different provisionings were not tested to see if this made an impact in some way. The impact of this latency penalty can be considered to depend on the length of the functions, as it will be proportionally greater paired with shorter ones. In relation to the measured length of the latency penalties in this study, current execution time limits for functions are set rather short. This might make segregation less frequently used within these limits, other than as a means to extend execution time by building pipelines of functions that each run close to the time limit, enabling more time-consuming computations. Whether increased latency actually matters also depends on existing requirements, as it might be crucial in some situations to minimize run-time, while not as important in other cases.

Cost-wise, the results are not as straightforward. Starting with provisioning, although the ratio between price and CPU is practically the same throughout all levels, there is a different relation between price and performance for the segregated and consolidated applications when customizing memory with the lower and higher provisionings. The consolidated application, provisioned like the highest provisioned sub-function in its segregated counterpart, was in general cheaper when the segregated application was provisioned with the lower levels, but more expensive than its counterpart when provisioned with the higher levels.

Looking at the performance of each memory provisioning individually, there are large differences in end-to-end time among the four lowest levels. Between the top four provisionings, on the other hand, only small differences in end-to-end time are present, and almost no difference at all between the top three. When testing customized memory allocation, all four sub-functions in the higher provisioned application therefore perform close to the maximum measured in the experiment throughout its execution, whereas the lower provisioned application varies more, with significantly lower performance. Although the performance variation between provisioning levels differs throughout the scale, price increases rather consistently. Hence, as the top four provide similar performance, it is more beneficial to pay for the lower ones, as price decreases while performance remains, than to pay for the highest provisioning from start to finish. The same relation between cost and performance cannot be found with the lower provisionings, where a higher price is followed by a rise in performance. In the experiment, this could be why cost ended up being lower: provisioning higher throughout paid off enough in execution time for the lower provisioned application, but not for the higher provisioned application.

To test this further, the experiments could have included more computationally intensive calculations, possibly yielding more distinct variations among the higher provisioning levels. This way, the results might show the same relationship between cost and performance throughout the provisioning scale. It could also be interesting to divide the application in other formations, as the number of segregations and each unit's provisioning might have an impact on the result as well.


Nevertheless, this indicates that the FaaS price model might be more complex than simply reducing memory provisioning.

So, segregating and adjusting to required memory was in this experiment cheaper in some cases, but always significantly slower. If cost is of higher priority, or there is a necessity to divide into smaller functions due to exceeded time limits, the increased end-to-end time might be the only option, or not problematic. Circumstances requiring speed should therefore favor limiting the number of segregations, as function invocations cause latency penalties, and increasing provisioning, as the results show that it raises performance. There are also some indications in the results that segregation might increase computation time as well, possibly caused by the supplementary code needed to invoke the following function; this could also be a factor that causes total price to rise. In cases where execution time is of the essence, but there is not necessarily a need for more memory, it might be suitable to enable the purchase of CPU power alone, as discussed by Spillner et al. As a result, both CPU and memory could be adjusted individually and therefore more accurately. This could be beneficial, as it is not necessarily desirable to adjust them simultaneously, given the potentially negative consequences this might have on price or performance.

Segregation is also interesting when considering instance optimization, as discussed by Spillner et al., who suggest specializing functions for certain types of data processing. If each segregated sub-function is not only adapted to memory usage, but also run on a server specialized for its specific tasks, it could potentially increase performance and to some extent compensate for latency penalties and other FaaS overheads presented by van Eyk et al. As a means to facilitate efforts to meet SLAs, as discussed by RS Kannan et al., information like this regarding factors that influence execution time can be valuable for improving the outcome. Such information could include not only computation type, but also knowledge gained from the experiments, such as the effects of latency penalties due to segregation, and of alternative provisioning.

In conclusion, there is no simple template regarding appropriate use-cases for FaaS in relation to cost and performance, as well as level of overall control. Several aspects, such as the required level of control over management and hardware configuration, cost, and workload behavior, need to be considered and balanced when deciding if FaaS is a suitable alternative for a given situation.

7 Future work

Future work could investigate the test case Custom memory provisioning further, as the results were not sufficient to establish a fairly generalized picture of the FaaS billing model. To accomplish this, a more extensive study, exploring additional scenarios, is necessary.

Other aspects of cloud functions, such as the network, would also be interesting to investigate, as they were excluded from this study. Parameterized functions would enable examining data transfers and how they are affected under varied circumstances, such as memory allocation.

Future studies could also compare service supply and performance differences among various providers, as offers may vary and suit different requirements. Results could highlight similarities and differences, and perhaps reveal shortcomings to encourage improvements. As desired by van Eyk et al., this could aid users navigating the trade-off between cost and performance in making informed decisions and finding the most advantageous alternative when considering Function as a Service.


References

[1] Fritz Alder, N Asokan, Arseny Kurnikov, Andrew Paverd, and Michael Steiner. S-FaaS: Trustworthy and accountable function-as-a-service using Intel SGX. In Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop, pages 185-199, 2019.

[2] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50-58, 2010.

[3] Paul Castro, Vatche Ishakian, Vinod Muthusamy, and Aleksander Slominski. Serverless programming (function as a service). In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 2658-2659. IEEE, 2017.

[4] Ram Srivatsa Kannan, Lavanya Subramanian, Ashwin Raju, Jeongseob Ahn, Jason Mars, and Lingjia Tang. GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks. In Proceedings of the Fourteenth EuroSys Conference 2019, pages 1-16, 2019.

[5] Josef Spillner, Cristian Mateos, and David A Monge. FaaSter, better, cheaper: The prospect of serverless scientific computing and HPC. In Latin American High Performance Computing Conference, pages 154-168. Springer, 2017.

[6] Erwin van Eyk, Alexandru Iosup, Cristina L Abad, Johannes Grohmann, and Simon Eismann. A SPEC RG Cloud Group's vision on the performance challenges of FaaS cloud architectures. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pages 21-24, 2018.

[7] Erwin van Eyk, Alexandru Iosup, Simon Seif, and Markus Thömmes. The SPEC Cloud Group's research vision on FaaS and serverless architectures. In Proceedings of the 2nd International Workshop on Serverless Computing, pages 1-4, 2017.

[8] Vladimir Yussupov, Uwe Breitenbücher, Frank Leymann, and Michael Wurster. A systematic mapping study on engineering function-as-a-service platforms and tools. In Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, pages 229-240, 2019.

