
END-TO-END LATENCY AND COST IMPACT OF FUNCTION SEGREGATION AND CUSTOMIZED MEMORY ALLOCATION IN FAAS ENVIRONMENTS

Desireé Fredriksson

Bachelor Thesis, 15 credits
Bachelor of Science Programme in Computing Science
2021


Abstract

Function as a service (FaaS) is a type of serverless cloud computing intended to facilitate development by abstracting away infrastructural management, and to offer a more flexible pay-as-you-go billing model based on execution time and memory allocation. FaaS functions are deployed to the cloud provider either as single units, or chained to form a pipeline of multiple functions that call each other. As each step in the pipeline might have different requirements, it could be beneficial to split larger functions into smaller parts. This would enable customized provisioning according to each function's needs, and potentially result in a lower rate. However, decreased memory entails lower CPU performance, which directly correlates with computation time. A test application was created and executed on Google Cloud services to investigate what impact function segregation, and provisioning accommodated to each sub-function's requirements, have on end-to-end latency and total cost. In conclusion, no trivial relation between cost and performance was found. Segregating and adjusting provisioning to required memory was in this experiment cheaper in some cases, but not all; it was, however, always significantly slower. In addition to price and workload behavior being considered and balanced, it was found that aspects such as the level of control over infrastructural management and hardware configuration have to be weighed in when deciding if FaaS is a suitable alternative for a given situation.


Acknowledgements

Thank you for being you, Klas.


Contents

1 Introduction
1.1 Purpose
1.2 Research question
1.3 Method used
1.4 Delimitations
2 Related work
3 Theoretical background
3.1 Cloud computing
3.2 Function as a service
4 Method
4.1 Test application
4.2 Experiments
4.3 Statistics
5 Results
5.1 Increased memory provisioning
5.2 Segregation latency
5.3 Custom memory provisioning
6 Discussion
7 Future work
References


1 Introduction

Function as a Service (FaaS) is a serverless computing technology and an increasingly popular model for application development. Functions are stateless, cloud-native code deployed to a cloud provider with minimal environmental configuration. They are invoked in response to an event and executed on servers completely managed by the provider. Idle computation is not billed, as tenants are typically charged in relation to actual usage. Cost is based on execution time combined with the amount of memory allocated for the function. This model provides trivial upscaling and downscaling under fluctuating loads, and a more flexible billing model than traditional server rental, where the user is charged a fixed amount each day.

1.1 Purpose

FaaS functions can be deployed as single units, consolidated, or chained to form a pipeline consisting of multiple functions that call each other, segregated. Calling a function from another incurs a run-time penalty, latency: a delay between the call being made and its responding action. This latency is not billed as long as the calling function does not depend on the result from the other, but it does affect end-to-end time.

As each step in the pipeline might have different memory requirements, it could be beneficial to split a resource-intensive application into smaller functions. This would enable more accurate memory allocation according to each function's actual needs, and consequently result in a lower rate. However, a decrease in memory also lowers CPU performance, which directly correlates with computation time. As computation time is an additional factor in total cost, the negative effect a lower memory provisioning might have on end-to-end time could affect total price as well.

1.2 Research question

What impact does function segregation, combined with customized memory provisioning accommodated to each sub-function's actual requirements, have on end-to-end latency and total cost?

1.3 Method used

A test application was created and run on Google Cloud services as either consolidated or segregated, provisioned with different amounts of memory and CPU performance. Start time and end time for each function were measured for execution time and latency evaluation, and price was calculated by combining the execution time of each function with the prices listed on the Google Cloud Functions website.

1.4 Delimitations

FaaS is offered by several cloud providers, but only Google Cloud Functions is considered, due to the scope of this study. For the same reason, the experiments leave out data transfers, and the price calculations exclude the free tier and regions other than us-central1.


2 Related work

Several studies have investigated the impact of different factors on the performance of FaaS.

In their paper FaaSter, Better, Cheaper: The Prospect of Serverless Scientific Computing and HPC [5], Spillner et al. investigate how FaaS performs within scientific and high-performance computing (HPC), compared to conventional HPC platforms. The highly controlled settings and demands for repeatability associated with these disciplines are uncommon use-cases for FaaS, seeing as its main purpose is to abstract away infrastructural configuration. Having resources available on-demand and charged according to use, however, is appealing, and motivated the authors' study of FaaS's potential applicability within this area. Spillner et al. conducted four experiments to study the performance and usefulness of FaaS when executing resource-demanding computing tasks: calculation of π, face detection, password cracking, and precipitation forecasting. The FaaS model successfully executed the experiments, but did not outperform the approximate performance of common fast processors. On the contrary, it is generally far slower, and the providers do not offer to specifically increase performance. When comparing vendors, however, a resemblance among them all was found: increased memory provisioning also increased performance. Further, their results showed a significant difference in resource requirement characteristics among the domains.

To address this, the authors suggest investigating "special-purpose FaaS instances". Additionally, the authors contribute some concepts and tools intended to improve the engineering process. One example is worm functions, a concept for managing execution time restrictions by dividing larger functions into smaller ones that call each other, each of which is short enough to adhere to the constraints.

Function partitioning relates to one of six performance challenges for serverless computing presented by van Eyk et al. in A SPEC RG Cloud Group's Vision on the Performance Challenges of FaaS Cloud Architectures [6]. It concerns dealing with overhead in several stages of FaaS, one of them being request overhead. Invoking a function introduces latency, and an increased number of functions entails more overhead due to invocations between these functions. This calls for measures to reduce the overhead to an acceptable level. Another challenge the authors introduce is understanding the trade-off between performance and cost.

Compared to on-demand virtual machines and containers, they describe that FaaS becomes less lucrative when requests exceed a certain number per second. Pricing models that depend on load variations and differ between vendors increase complexity, and make it harder to find the most advantageous alternative. Therefore, the authors suggest more research on how to simplify decision-making, or possibly find automated solutions that aid the user in minimizing operational costs.

RS Kannan et al. present a strategy to improve resource utilization and throughput in microservice architecture management while still satisfying Service Level Agreements (SLAs): GrandSLAm: Guaranteeing SLAs for Jobs in Microservices Execution Frameworks [4]. As data-center accommodation adapts to new ways of providing resources, with multiple tenants or applications accessing shared microservices, there must be a procedure to ensure that each user acquires the agreed level of service. Previous approaches suggest co-locating latency-sensitive, high-priority applications with applications of lower priority, but the authors focus on a finer granularity. Microservices, the individual parts and foundation of the applications, are analyzed by establishing factors that influence execution time. The intention is to make improvements and simultaneously enable co-location of several applications requiring low latency, while satisfying SLAs. The study is limited to artificial intelligence and machine learning, and findings included that it is beneficial to execute requests from various sources as consolidated, since resource utilization improved.


3 Theoretical background

This section presents key concepts of this study and related work in more detail.

3.1 Cloud computing

Cloud computing is hardware, as well as software services, provided on-demand over the internet [2]. Users are able to acquire these virtual resources as needed, and are charged in relation to what is allocated. The intention is to achieve flexibility and economical advantages by sharing resources. The services can be either publicly accessible, or privately owned and limited to a single organization. Some major public cloud providers are Amazon Web Services^1, Google Cloud Platform^2, and Microsoft Azure^3. Examples of services they provide include scalable data storage, computational resources, and managed services such as database or web server hosting. Compared to single corporate data-centers, these organizations are commonly distributed world-wide to enable data back-up and reduced latency for geographically different locations. Tenants are offered regularly updated hardware and the option to customize settings and performance.

Maintaining cloud services at an affordable rate, while still profiting, has been made possible by more efficient use of resources compared to the average data center [2]. Traditionally, it has been common to accommodate available resources to manage peak load, with the purpose of minimizing the risk of capacity saturation, which presumably leads to an unsatisfactory user experience and thereby a loss of revenue. As a consequence, under-utilization is experienced when computations are idle. This expensive surplus was inevitable due to fluctuating service demands, difficulties in predicting future requirements, and the time required to obtain and configure hardware. With this new model for how computing power is distributed and managed, organizations and individual users are spared upfront investments in operating manpower and hardware, as well as maintenance costs, service agreements, and rent.

Therefore, it can be profitable to utilize cloud services, as total costs might be reduced by more accurately adjusting resources to current needs, even if the actual computation cost might be higher compared to running on one's own servers. Additionally, inexpensive areas have been selected to establish massive commodity-computer data centers that, due to their size, are often eligible for favorable prices on service agreements and hardware.

Cloud architecture can be implemented with different levels of abstraction concerning hardware and operations, to accommodate different needs of server management [7]. Users can acquire services ranging from complete software applications that are ready to use without any knowledge or management of their infrastructure, to pure compute, network, and storage resources that allow fine-tuned configuration. The various models mostly differ in the amount of work required to set up and continuously maintain a service, which is also the motivation for using either one of them. Centrally situated within this scope is an architecture known as serverless, where operational logic and configuration, such as automatic scaling and server management, is mostly abstracted away. Resources are dynamically allocated on-demand, and the user is charged according to usage. Included in this category is Function as a Service, a technology where stateless functions deployed to the cloud are triggered as a reaction to events.

^1 https://aws.amazon.com/
^2 https://cloud.google.com/
^3 https://azure.microsoft.com/


3.2 Function as a service

Function as a service (FaaS) is a type of serverless cloud computing that follows the increasing migration of applications towards a micro-service architecture [3]. The intentions are to facilitate development by abstracting away the infrastructure commonly associated with creating and deploying micro-services, and to present a more flexible billing model that applies pay-as-you-go with no upfront costs. Tenants allocate and receive resources as needed, and are charged in relation to actual usage. This assures enough resources under rising loads, and the ability to scale all the way down to zero when computations are idle, avoiding unnecessary expenses due to surplus resources.

Functions are cloud-native code, for instance a JavaScript or Python function, provided by the user and deployed to a cloud provider [1]. They are invoked by events, typically an HTTP request, and can be composed of a single, consolidated function, or segregated into a pipeline of several functions chained together by calling each other. In contrast to subscription-based rental of full-stack virtual machines, users are charged according to execution time combined with the computational resources required by the function, in addition to an invocation fee.

Necessary infrastructure commonly associated with creating and deploying micro-services, such as hardware management, scaling, and load-balancing, is operated by the cloud provider. This allows improved resource efficiency for providers, and lets users focus on application logic. Consequently, tenants have less control over hardware design and operational composition, which gives cloud providers increased oversight to govern the development stack and security, enhance resource utilization, and endorse further use of supplementary cloud services [3].

Functions generally have a limited execution time set by the cloud provider^4 [8]. Initiating a function is associated with the cold start problem, where start-up time is affected by the set-up of required dependencies and resources. To minimize this effect, the prepared resources are maintained and kept ready to use for some time after termination, allowing the next function to reuse them. Additionally, functions are stateless and therefore require supplementary, shared storage if state persistence is needed.

Examples suitable for a serverless environment often display functional characteristics such as being stateless and event-driven, designed to have a sole assignment with few requirements on latency [6]. Typical use cases are connecting different cloud services, handling the occasionally emitted data within the Internet of Things, and serving as a minor backend for web applications [5].

FaaS was first introduced by Amazon in 2014 with their platform AWS Lambda^5. Today, the service is offered by several cloud vendors, including Google Cloud Functions^6 by Google, IBM Cloud Functions^7, which uses Apache OpenWhisk, and Microsoft Azure Functions^8 by Microsoft.

^4 Execution time limits vary among providers, but as of May 2021 most are positioned around 10-15 minutes.
^5 https://aws.amazon.com/lambda/
^6 https://cloud.google.com/functions
^7 https://www.ibm.com/cloud/functions
^8 https://azure.microsoft.com/services/functions/


4 Method

To investigate what impact function segregation, combined with customized memory provisioning accommodated to each sub-function's actual requirements, has on end-to-end latency and total cost, three experiments were conducted on the Google Cloud platform. A test application was created using Google Cloud Functions to explore particular aspects of FaaS, and integrated with Google Cloud Firestore^9 for data aggregation and storage. RStudio^10 and the language R were used for data visualization and statistical analysis.

4.1 Test application

Each test case involves an application that is triggered by an HTTP request and run as either consolidated or segregated in Google Cloud Functions, provisioned with different amounts of memory and CPU. Google Cloud was chosen as the platform as it is one of the larger available providers, with sufficient functionality to perform the desired experiments. The test application was artificially constructed to control resource requirements and enable partitioning into equal sub-functions, in order to ensure the right conditions for the experiment. Requirements regarding computation and memory are constant for each application, except for the additional invocations between functions.

The test application is written in Python and executed in the Google Cloud Python 3.8 environment, performing a series of eight calculations of the 100 000th Fibonacci number, f(100 000), as described in Algorithm 1. The decision to use f(100 000) results from a few test-runs to find a calculation with a reasonable execution time considering the scope of this study. The number of consecutive calculations was chosen to enable segregation into enough functions of equal size to run sufficiently comprehensive experiments, while keeping the time required to build the applications at an acceptable level.

Algorithm 1: Fibonacci
Input: integer i
Result: f(i), the i-th Fibonacci number

f1 = 0
f2 = 1
for j = 0 to i-1 do
    x = f1 + f2
    f2 = f1
    f1 = x
end
return f1
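For concreteness, the following is a minimal Python sketch of Algorithm 1 together with the eight-calculation sequence, as it might appear in the test application. The thesis does not publish its source code, so the names and structure here are illustrative.

```python
def fibonacci(i: int) -> int:
    """Iteratively compute the i-th Fibonacci number, mirroring Algorithm 1."""
    f1, f2 = 0, 1
    for _ in range(i):
        # Simultaneous update: x = f1 + f2; f2 = f1; f1 = x
        f1, f2 = f1 + f2, f1
    return f1

# The test application performs a series of eight calculations of f(100 000).
for _ in range(8):
    fibonacci(100_000)
```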

To illustrate how the test application is segregated into sub-functions within the various test cases, the syntax shown in Figure 1 is used. The external box represents an application, and the internal boxes represent single functions, each with individual provisioning, executing a sequence of x calculations of the i-th Fibonacci number. Optional arrows illustrate internal function invocations using Cloud Functions Invoker. The variable y is a placeholder for the amount of memory allocated for each function, and z the approximate CPU performance. A single internal box indicates a consolidated application with one function and no internal function invocations. The term invocation index refers to the position of an invocation arrow, starting at one and counting from the left.

^9 https://cloud.google.com/firestore
^10 https://www.rstudio.com/


[Figure: two function boxes, each labeled "f(i) * x" and "y MB, z GHz", connected by an invocation arrow.]

Figure 1: Syntax used to illustrate how functions within the test application are configured. Individual provisioning is illustrated as y MB memory and z GHz of approximate CPU performance. The task to perform is described as an x-long sequence of calculations of the i-th Fibonacci number, and optional arrows represent internal function invocations.

This arrangement of various pipelines aims to show how different levels of segregation affect end-to-end time, by measuring start time and end time for each function and comparing any differences to the consolidated counterpart. A data collection utility backend was created using Google Cloud Firestore to record performance and latency metrics from the running applications, as it integrates well with Google Cloud Functions and offers sufficient functionality for the experiments in this study.
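As an illustration of such a utility, the sketch below logs one function's start and end times with the google-cloud-firestore client library. The collection and field names are hypothetical, not taken from the thesis.

```python
import time

from google.cloud import firestore  # pip install google-cloud-firestore

db = firestore.Client()

def record_run(function_name: str, start: float, end: float) -> None:
    """Persist one function execution's timing for later latency analysis."""
    db.collection("measurements").add({   # "measurements" is a made-up name
        "function": function_name,
        "start": start,                   # epoch seconds at function entry
        "end": end,                       # epoch seconds just before returning
        "duration": end - start,
    })

start = time.time()
# ... perform the function's workload here ...
record_run("consolidated", start, time.time())
```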

Price calculation

Functions were provisioned with different amounts of memory between test cases to measure any impact this might have on total price. The amount of allocated memory has a direct correlation to cost, as each 100 milliseconds of execution time is charged according to the level of provisioning. However, the amount of allocated memory also has a direct correlation to CPU performance, which impacts another price factor, namely execution time. To examine this, price was manually calculated from the measured times and the prices listed on the Google Cloud Functions website^11, shown in Table 1. Only prices for the paid tier in the us-central1 region are included, as the experiments used that region without regard to the free tier or other regions.

Table 1: Google Cloud Functions prices in USD for region us-central1, excluding the initial free tier, as of May 2021.

Paid tier (USD)

Invocation                      4.0e-7   per unit
CPU Time                        1.0e-5   per second
Memory Time                     2.5e-6   per GiB-second

128 MB memory, 0.2 GHz CPU      2.31e-7  per 100 ms
256 MB memory, 0.4 GHz CPU      4.63e-7  per 100 ms
512 MB memory, 0.8 GHz CPU      9.25e-7  per 100 ms
1024 MB memory, 1.4 GHz CPU     1.65e-6  per 100 ms
2048 MB memory, 2.4 GHz CPU     2.90e-6  per 100 ms
4096 MB memory, 4.8 GHz CPU     5.80e-6  per 100 ms
8192 MB memory, 4.8 GHz CPU     6.80e-6  per 100 ms

^11 https://cloud.google.com/functions/pricing


The estimated execution cost was calculated using the current pricing table and the recorded execution times to get a stable estimate, as the billing model for Google Cloud Functions includes a certain number of free invocations. The duration of the free tier depends on usage; each month includes 2 000 000 invocations, 400 000 GB-seconds, 200 000 GHz-seconds of computation time, and 5 GB of network egress traffic. Excluded from the price calculation in this study is the function invocation cost of 0.0000004 USD (0.40 USD per 1 000 000 invocations), as it is not large enough to significantly impact the measured values. However, its existence should be noted and taken into account when calculating the total price of functions.

The cost of function invocations is a fixed price per unit, regardless of the invocation source, its outcome, or its duration. Compute time fees depend on provisioned memory and CPU, and are measured and rounded up to the nearest 100 ms, from when a request is received by a function to when it is terminated in any way. Network egress is charged in relation to usage (GB) at a flat rate, whereas inbound data, including outbound data to Google APIs within the same region or globally, is free. Data transfers are not included in this study. Hence, the price for a certain function was calculated by multiplying its measured computation time with the price listed for running a function with its current provisioning.
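Expressed as code, this price model reduces to a lookup in Table 1 plus rounding. The sketch below follows the rules stated above (100 ms rounding, per-tier rates, invocation fee ignored as in this study); the function names and example durations are illustrative.

```python
import math

# Per-100 ms compute rates in USD for each memory tier (Table 1, us-central1).
RATE_PER_100MS = {
    128: 2.31e-7, 256: 4.63e-7, 512: 9.25e-7, 1024: 1.65e-6,
    2048: 2.90e-6, 4096: 5.80e-6, 8192: 6.80e-6,
}

def function_cost(duration_s: float, memory_mb: int) -> float:
    """Cost of one execution: duration rounded up to the nearest 100 ms slot."""
    millis = round(duration_s * 1000)      # whole ms, avoids float artifacts
    slots = math.ceil(millis / 100)
    return slots * RATE_PER_100MS[memory_mb]

def pipeline_cost(runs):
    """Total cost of a pipeline given (duration_s, memory_mb) per sub-function."""
    return sum(function_cost(d, m) for d, m in runs)

# Illustrative example: four sub-functions at the four lowest tiers.
print(pipeline_cost([(2.2, 128), (2.2, 256), (2.2, 512), (2.2, 1024)]))
```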

4.2 Experiments

The data collection in this study was divided into three experiments, Increased memory provisioning, Segregation latency, and Custom memory provisioning, each with its own purpose towards answering the research question. Focusing each experiment on a single measure reduces complexity, and therefore increases the chance that what is intended is actually tested and that the result is not affected by circumstances belonging to other experiments.

Increased memory provisioning

The first experiment was designed to investigate how memory provisioning affects computation time. As seen in Table 1, Google Cloud Functions can be provisioned with seven different amounts of memory. CPU performance correlates with memory provisioning, so each increment also increases CPU; the two highest levels are the exception, as CPU remains the same between them. CPU performance is, however, described on the website as an approximation, which implies that the definite number of CPU clock cycles provisioned to a function is not certain and might differ from what was actually allocated.

[Figure: consolidated application "f(100 000) * 8", run once at each provisioning level: 128 MB/0.2 GHz, 256 MB/0.4 GHz, 512 MB/0.8 GHz, 1024 MB/1.4 GHz, 2048 MB/2.4 GHz, 4096 MB/4.8 GHz, and 8192 MB/4.8 GHz.]

Figure 2: Application configuration to investigate how differences in provisioning affect end-to-end time and computation time.

As CPU performance might have an impact on computation time, increased memory, and consequently increased CPU, could result in a faster run-time. The test application was therefore run as consolidated, once for each of the seven possible provisionings, as illustrated in Figure 2. Start time and end time were logged for each function to measure possible differences in run-time across the various provisionings.


Segregation latency

Experiment number two investigates the impact segregation has on end-to-end time and total computation time. As segregation introduces internal function invocations that cause a latency penalty, end-to-end time might increase. There is also some extra logic within the calling function, needed to realize the invocation, that might affect total computation time.

This was tested by segregating the application into two, four, and eight functions, see Figure 3, creating a pipeline of sub-functions invoking each other.

[Figure: three pipelines, each function provisioned 256 MB/0.4 GHz: two functions of "f(100 000) * 4", four functions of "f(100 000) * 2", and eight functions of "f(100 000) * 1".]

Figure 3: Application configuration to investigate the impact segregation has on total computation time and end-to-end time.

Each function had the standard setting of 256 MB memory and 0.4 GHz CPU, and each pipeline was repeated 100 times. Start time and end time were logged for each function in order to measure total computation time and possible latency penalties introduced by functions invoking each other. The measured times were compared to the consolidated application provisioned with 256 MB from the experiment Increased memory provisioning, as it is the non-segregated counterpart to the applications in this test case.
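As an illustration, the sketch below shows what one sub-function in such a pipeline could look like as an HTTP-triggered Python Cloud Function. The URL and names are hypothetical, the call is shown unauthenticated for brevity (a production call governed by Cloud Functions Invoker would attach an identity token), and the thesis does not publish its invocation code, so this blocking call is only one way the chaining could be realized.

```python
import time

import requests  # declared as a dependency in requirements.txt

# Hypothetical URL of the next sub-function in the pipeline.
NEXT_FUNCTION_URL = "https://us-central1-<project>.cloudfunctions.net/segment2"

def fibonacci(i):
    f1, f2 = 0, 1
    for _ in range(i):
        f1, f2 = f1 + f2, f1
    return f1

def segment1(request):
    """HTTP entry point: do this segment's share of work, then invoke the next."""
    start = time.time()
    for _ in range(4):                  # f(100 000) * 4, as in Figure 3
        fibonacci(100_000)
    # Invoke the next sub-function; the response body is not used, matching
    # the thesis's note that the caller does not depend on the result.
    requests.post(NEXT_FUNCTION_URL, json={"sent_at": time.time()})
    return f"done in {time.time() - start:.3f} s"
```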

Custom memory provisioning

The third experiment investigates the cost impact of segregating functions and adjusting memory provisioning according to each sub-function's requirements. Minimizing available memory according to actual needs might lower total cost, as price depends on provisioning. However, since the price model couples decreased memory provisioning with decreased CPU performance, total computation time might also increase, as it depends on CPU performance. Considering that computation time is an additional factor when calculating price, this experiment examines what effect customized memory provisioning truly has on total cost.

Since cost depends on what is allocated rather than on actual usage of allocated memory, all sub-functions were identical beyond provisioning. In other words, the same Fibonacci calculation was performed within each function, with differences only in provisioning.

8

(17)

The experiment compares the test application, segregated into four differently provisioned sub-functions, with its consolidated counterpart, provisioned like the highest provisioned sub-function. This aims to simulate two possible execution scenarios of an application: it can conform to the highest required provisioning level throughout, or be divided into sections depending on memory requirements, with provisioning adjusted accordingly.

The provisioning distribution for the test applications used in the experiment was represented in two different ways: the four lowest levels of CPU, as seen in Figure 4, and the four highest levels of CPU, as seen in Figure 5. The reason for this is to account for the fact that memory is doubled for each level, while this is not the case for CPU. Except for the highest provisioning (8192 MB), the ratio is however close enough to be considered the same for the scope of this experiment. The highest provisioning was therefore excluded, in order to see if there is a difference despite the proportion between prices and CPU being practically the same. The effort is not extensive enough to cover all aspects of the differences between prices and performance given various segregations and partitionings, but is simply a means to discover indications of what the contrasts might be.

[Figure: pipeline of four "f(100 000) * 2" functions provisioned 128 MB/0.2 GHz, 256 MB/0.4 GHz, 512 MB/0.8 GHz, and 1024 MB/1.4 GHz.]

Figure 4: Application configuration to test the cost impact of custom memory provisioning. Segregated into four sub-functions provisioned (128 MB, 256 MB, 512 MB, 1024 MB).

[Figure: pipeline of four "f(100 000) * 2" functions provisioned 512 MB/0.8 GHz, 1024 MB/1.4 GHz, 2048 MB/2.4 GHz, and 4096 MB/4.8 GHz.]

Figure 5: Application configuration to test the cost impact of custom memory provisioning. Segregated into four sub-functions provisioned (512 MB, 1024 MB, 2048 MB, 4096 MB).

Total cost was calculated for each application, and end-to-end time was compared with a consolidated application provisioned to accommodate the highest requirement within each segregated application. For the application visualized in Figure 4 this amounts to 1024 MB, and for the application visualized in Figure 5, 4096 MB.

4.3 Statistics

Statistical analysis and data visualization were conducted in RStudio using the language R. Box plots were used to characterize each group of measured values and present a graphical overview of the distribution, as they present adequate information in a small space and allow comparison between several groups. Hence, all groups within each test case were illustrated collectively to demonstrate the distributions in relation to each other. Corresponding tables enable more exact comparisons of the measured values.

Within each test case, one-sided, paired t-tests were conducted to examine the statistical support for any differences found in the data. Paired tests were chosen to examine impact under different scenarios, and they were one-sided as variations were expected to go in a certain direction. A significance level of 0.05 was chosen by convention. For each test group, 100 data points were collected to get an adequate sample size while maintaining a reasonable run-time in relation to the scope of this study.
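The analysis itself was done in R; purely to illustrate the test performed, the following is an equivalent one-sided paired t-test in Python with SciPy, where the two arrays are placeholders standing in for the 100 paired measurements of a group.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Placeholder data standing in for 100 paired end-to-end times (seconds).
consolidated = rng.normal(5.6, 0.5, 100)
segregated = rng.normal(7.0, 0.4, 100)

# One-sided alternative: consolidated runs are expected to be faster.
result = stats.ttest_rel(consolidated, segregated, alternative="less")
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.2g}")  # significant if p < 0.05
```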


5 Results

This section presents the results from the three experiments conducted in this study. Increased memory provisioning shows how variations in memory allocation affect computation time. Segregation latency shows how different degrees of segregation affect total computation time and end-to-end time. Finally, Custom memory provisioning shows how cost is affected by segregation and by adjusting each sub-function's provisioning according to required memory.

5.1 Increased memory provisioning

Figure 6 shows how the computation time of the consolidated application is affected by a difference in memory provisioning. In the experiment, each increment between 128 MB and 2048 MB induced a statistically significant decrease in run-time, but further provisioning did not.

[Figure: box plot, "Consolidated application with dynamic memory provisioning"; x-axis: memory provisioning (MB), from 128 to 8192; y-axis: end-to-end time (sec).]

Figure 6: End-to-end time summary in seconds of seven applications provisioned differently.

As seen in Table 2, each memory provisioning has at least one test-run with a run-time of less than two seconds, around the same level as the run-times of the higher provisionings. Apart from these outliers, the run-times within each provisioning were rather concentrated.

Table 2: End-to-end time summary in seconds of seven applications provisioned differently.

          Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
128 MB     1.507    13.000       13.108      12.821    13.299       13.874
256 MB     1.314     5.140        5.298       5.633     6.265        6.652
512 MB     1.623     2.919        2.975       2.979     3.059        3.208
1024 MB    1.754     1.805        1.826       1.841     1.881        2.117
2048 MB    1.354     1.376        1.400       1.420     1.440        1.838
4096 MB    1.353     1.375        1.384       1.404     1.415        1.601
8192 MB    1.337     1.358        1.373       1.393     1.413        1.759


5.2 Segregation latency

Figure 7 shows how end-to-end time for an application provisioned with 256 MB memory is affected by different degrees of segregation. In the experiment, each doubling of functions in the applications resulted in a statistically significant increase in end-to-end time.

[Figure: box plot, "Consolidated compared to segregated applications provisioned 256 MB memory"; x-axis: composition (consolidated, segregated in two, four, and eight); y-axis: end-to-end time (sec).]

Figure 7: End-to-end time summary in seconds for applications provisioned with 256 MB memory and run as consolidated, segregated in two, four, and eight sub-functions.

The measured run-times, seen in Table 3, show that the mean end-to-end time of the consolidated application is 5.63 seconds, and for the segregated applications 6.98 (2), 7.71 (4), and 10.84 (8) seconds. The first three applications are closer in time compared to the fourth. The same applies when considering application compositions, as the numbers of latency penalties introduced by the first three applications (0, 1, and 3) are closer than that of the fourth (7).

Table 3: End-to-end time summary in seconds for applications provisioned with 256 MB memory and run as consolidated and segregated in two, four, and eight sub-functions.

                 Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Consolidated      1.314     5.140        5.298       5.633     6.265        6.652
Segregated (2)    5.157     6.896        6.986       6.981     7.094        8.299
Segregated (4)    7.145     7.513        7.660       7.710     7.802        9.060
Segregated (8)   10.27     10.67        10.80       10.84     10.99        11.99

As Figure 8 and its corresponding Table 4 show, there is no significant difference between any of the latency penalties introduced in the variously segregated applications provisioned with 256 MB memory. In the experiment, each latency penalty caused by function-to-function invocations was measured to around 0.5 seconds, regardless of invocation index.


[Figure: box plot, "Individual function latency"; x-axis: function invocation index (2, 4.1-4.3, 8.1-8.7); y-axis: latency penalty time (sec).]

Figure 8: Latency penalties in seconds for each individual invocation index in applications provisioned with 256 MB memory.

Table 4: Latency penalty summaries in seconds for applications provisioned with 256 MB memory. Includes the average over invocation indices, followed by each individual invocation index.

            Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Total avg.   0.3444    0.5073       0.5262      0.5452    0.5865       2.5898
Index 4.1    0.4176    0.5111       0.5262      0.5426    0.5411       2.5898
Index 4.2    0.3824    0.5096       0.5580      0.5752    0.5728       1.8673
Index 8.1    0.3444    0.4549       0.5100      0.5200    0.5738       0.7124
Index 8.2    0.4212    0.5152       0.5344      0.5401    0.5531       0.8329
Index 8.3    0.4195    0.5135       0.5641      0.5615    0.5943       0.9004
Index 8.4    0.3469    0.5103       0.5387      0.5670    0.6046       0.9477
Index 8.5    0.4160    0.5095       0.5411      0.5554    0.6040       0.7201
Index 8.6    0.4215    0.5049       0.5160      0.5175    0.5280       0.8286
Index 8.7    0.3633    0.5101       0.5516      0.5330    0.5591       0.6556

Subtracting the total latency penalty from the end-to-end time of each application shows how total computation time is affected by segregation. By this calculation, the consolidated application has a mean computation time of 5.63 seconds, and the segregated applications mean computation times of 6.42 (2), 6.069 (4), and 7.046 (8) seconds. The structural difference is that invocation logic is introduced in all but the last of the segregated application's functions. Although this is simply a comparison of means, and no apparent pattern was found to explain these differences in computation time at the lower degrees of segregation, a higher degree of segregation gives some indication of increased computing time.
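In symbols (notation introduced here, not taken from the thesis): for an application segregated into n sub-functions, the mean total computation time follows from the mean end-to-end time by subtracting the mean latency penalty at each of the n - 1 invocation indices,

$$\bar{t}_{\mathrm{comp}} \,=\, \bar{t}_{\mathrm{e2e}} \,-\, \sum_{k=1}^{n-1} \bar{\ell}_k,$$

where \bar{\ell}_k denotes the mean latency penalty at invocation index k.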


5.3 Custom memory provisioning

When combining segregation with customized provisioning, the results show a different relationship between cost and performance depending on application composition.

Figure 9 shows that the consolidated application, provisioned with 1024 MB, has a significantly shorter end-to-end time and lower total execution cost than an application segregated into four sub-functions provisioned with 128 MB, 256 MB, 512 MB, and 1024 MB, respectively.

[Figure: two box plots, "Runtime, consolidated vs custom memory" (y-axis: runtime (sec)) and "Cost, consolidated vs custom memory" (y-axis: cost (USD)); x-axis: application composition (consolidated; segregated (4), custom memory).]

Figure 9: End-to-end time in seconds and cost in USD for an application segregated in four and provisioned with (128 MB, 256 MB, 512 MB, 1024 MB), and a consolidated application provisioned with 1024 MB.

Table 6 shows a 0.437e-07 USD difference in mean price, and Table 5 shows a 7.056-second difference in mean end-to-end time, between the consolidated and segregated applications.

Table 5: End-to-end time in seconds for an application segregated in four and provisioned with (128 MB, 256 MB, 512 MB, 1024 MB), and a consolidated application provisioned with 1024 MB.

               Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Consolidated    1.754     1.805        1.826       1.841     1.881        2.117
Segregated      7.356     8.547        8.768       8.897     9.094       11.881

Table 6: Cost in USD for an application segregated in four and provisioned with (128 MB, 256 MB, 512 MB, 1024 MB), and a consolidated application provisioned with 1024 MB.

               Min. (USD)  1st Qu. (USD)  Median (USD)  Mean (USD)  3rd Qu. (USD)  Max. (USD)
Consolidated   2.894e-07   2.979e-07      3.014e-07     3.038e-07   3.104e-07      3.493e-07
Segregated     3.102e-07   3.299e-07      3.361e-07     3.475e-07   3.475e-07      7.942e-07


Figure 10 shows that the consolidated application, provisioned with 4096 MB, has a significantly shorter end-to-end time but a higher total execution cost than an application segregated into four sub-functions provisioned with 512 MB, 1024 MB, 2048 MB, and 4096 MB, respectively.

[Figure: two box plots, "Runtime, consolidated vs custom memory" (y-axis: runtime (sec)) and "Cost, consolidated vs custom memory" (y-axis: cost (USD)); x-axis: application composition (consolidated; segregated (4), custom memory).]

Figure 10: End-to-end time in seconds and cost in USD for an application segregated in four and provisioned with (512 MB, 1024 MB, 2048 MB, 4096 MB), and a consolidated application provisioned with 4096 MB.

Table 8 shows a 2.256e-07 USD difference in mean price, and Table 7 shows a 1.872-second difference in mean end-to-end time, between the consolidated and segregated applications.

Table 7: End-to-end time in seconds for an application segregated in four and provisioned with (512 MB, 1024 MB, 2048 MB, 4096 MB), and a consolidated application provisioned with 4096 MB.

               Min. (s)  1st Qu. (s)  Median (s)  Mean (s)  3rd Qu. (s)  Max. (s)
Consolidated    1.353     1.375        1.384       1.404     1.415        1.601
Segregated      2.968     3.131        3.190       3.276     3.271        5.456

Table 8: Cost in USD for an application segregated in four and provisioned with (512 MB, 1024 MB, 2048 MB, 4096 MB), and a consolidated application provisioned with 4096 MB.

               Min. (USD)  1st Qu. (USD)  Median (USD)  Mean (USD)  3rd Qu. (USD)  Max. (USD)
Consolidated   7.847e-07   7.977e-07      8.026e-07     8.145e-07   8.204e-07      9.287e-07
Segregated     5.538e-07   5.688e-07      5.796e-07     5.889e-07   5.914e-07      7.967e-07

Thus, segregating an application to customize provisioning caused, in these two experiments, a lower price in one case and a higher price in the other. Regarding end-to-end time, the experiments show that the consolidated applications are significantly faster in both test cases.


6 Discussion

To gain an understanding of FaaS and explore how the price of functions is affected under the prevailing circumstances, experiments were conducted to identify factors that affect cost and how they can be influenced. Function segregation and provisioning according to each sub-function's requirement were ultimately chosen as the factors to focus on, as provisioning is one of few modifiable parameters in FaaS. Provisioning also has a direct relation to price, and an indirect relation to runtime, which in turn also affects cost, but in the opposite direction. Therefore, as both runtime and cost are important aspects to consider in development, it is interesting to investigate how these factors are actually related.

The results show that segregation increases end-to-end latency significantly. Each function invocation was measured to approximately 0.5 seconds, a consistent number regardless of the number of segregations and the invocation index. However, different provisionings were not tested to see if this made an impact in some way. The impact of this latency penalty can be considered to depend on the length of the functions, as it will be proportionally greater paired with shorter ones. In relation to the measured length of the latency penalties in this study, current execution time limits for functions are set rather short. This might make segregation less frequently used within these limits, other than as a means to extend execution time by building pipelines of functions that each run close to the time limit, enabling more time-consuming computations. Whether increased latency actually matters also depends on existing requirements, as it might be crucial in some situations to minimize run-time, while not as important in other cases.

Cost-wise, the results are not as straightforward. Starting with provisioning, although the ratio between price and CPU is practically the same throughout all levels, there is a different relation between price and performance for the segregated and consolidated applications when customizing memory with the lower and higher provisionings. The consolidated application, provisioned like the highest provisioned sub-function in its segregated counterpart, was in general cheaper when the segregated application was provisioned with the lower levels, but more expensive than its counterpart when provisioned with the higher levels.

Looking at the performance of each memory provisioning individually, there are large differences in end-to-end time among the four lowest levels. Between the top four provisionings, on the other hand, only small differences in end-to-end time are present, and almost no difference at all between the top three. When testing customized memory allocation, all four sub-functions in the higher provisioned application therefore perform close to the maximum measured in the experiment throughout its execution, whereas the lower provisioned application varies more, with significantly lower performance. Although the performance variation between provisioning levels differs throughout the scale, price increases rather consistently. Hence, as the top four provide similar performance, it is more beneficial to pay for the lower ones, as price decreases while performance remains, than to pay for the highest provisioning from start to finish. The same relation between cost and performance cannot be found with the lower provisionings, where a higher price is followed by a rise in performance. In the experiment, this could be why cost ended up being lower: provisioning higher throughout paid off enough in execution time for the lower provisioned application, but not for the higher provisioned application.

To test this further, the experiments could have included more computationally intensive calculations, possibly yielding more distinct variations among the higher provisioning levels. This way, the results might show the same relationship between cost and performance throughout the provisioning scale. It could also be interesting to divide the application in other formations, as the number of segregations and each unit's provisioning might have an impact on the result as well.


Nevertheless, this indicates that the FaaS price model might be more complex than simply reducing memory provisioning.

So, segregating and adjusting to required memory was in this experiment cheaper in some cases, but always significantly slower. If cost is of higher priority, or there is a necessity to divide into smaller functions due to exceeded time limits, the increased end-to-end time might be the only option, or not problematic. Circumstances requiring speed should therefore favor limiting the number of segregations, as function invocations cause latency penalties, and increasing provisioning, as the results show that it raises performance. There are also some indications in the results that segregation might increase computation time as well, possibly caused by the supplementary code needed to invoke the following function; this could also be a factor that causes total price to rise. In cases where execution time is of the essence, but there is not necessarily a need for more memory, it might be suitable to enable the purchase of CPU power alone, as discussed by Spillner et al. As a result, both CPU and memory could be adjusted individually and therefore more accurately. This could be beneficial, as it is not necessarily desirable to adjust them simultaneously, given the potentially negative consequences this might have on price or performance.

Segregation is also interesting when considering instance optimization, as discussed by Spillner et al., who suggest specializing functions for certain types of data processing. If each segregated sub-function is not only adapted to memory usage, but also run on a server specialized for its specific tasks, it could potentially increase performance and to some extent compensate for latency penalties and other FaaS overheads presented by van Eyk et al. As a means to facilitate efforts to meet SLAs, as discussed by RS Kannan et al., information like this regarding factors that influence execution time can be valuable for improving the outcome. Such information could include not only computation type, but also knowledge gained from the experiments, such as the effects of latency penalties due to segregation, and of alternative provisioning.

In conclusion, there is no simple template regarding appropriate use-cases for FaaS in relation to cost and performance, as well as level of overall control. Several aspects, such as the required level of control over management and hardware configuration, cost, and workload behavior, need to be considered and balanced when deciding if FaaS is a suitable alternative for a given situation.

7 Future work

Future work could investigate the test case Custom memory provisioning further, as the results were not sufficient to establish a fairly generalized picture of the FaaS billing model. To accomplish this, a more extensive study, exploring additional scenarios, is necessary.

Other aspects of cloud functions, such as the network, would also be interesting to investigate, as they were excluded from this study. Parameterized functions would enable examining data transfers and how they are affected under varied circumstances, such as memory allocation.

Future studies could also compare service supply and performance differences among various providers, as offers may vary and suit different requirements. Results could highlight similarities and differences, and perhaps reveal shortcomings to encourage improvements. As desired by van Eyk et al., this could aid users navigating the trade-off between cost and performance in making informed decisions and finding the most advantageous alternative when considering Function as a Service.


References

[1] Fritz Alder, N Asokan, Arseny Kurnikov, Andrew Paverd, and Michael Steiner. S-FaaS: Trustworthy and accountable function-as-a-service using Intel SGX. In Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop, pages 185-199, 2019.

[2] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50-58, 2010.

[3] Paul Castro, Vatche Ishakian, Vinod Muthusamy, and Aleksander Slominski. Serverless programming (function as a service). In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 2658-2659. IEEE, 2017.

[4] Ram Srivatsa Kannan, Lavanya Subramanian, Ashwin Raju, Jeongseob Ahn, Jason Mars, and Lingjia Tang. GrandSLAm: Guaranteeing SLAs for jobs in microservices execution frameworks. In Proceedings of the Fourteenth EuroSys Conference 2019, pages 1-16, 2019.

[5] Josef Spillner, Cristian Mateos, and David A Monge. FaaSter, better, cheaper: The prospect of serverless scientific computing and HPC. In Latin American High Performance Computing Conference, pages 154-168. Springer, 2017.

[6] Erwin van Eyk, Alexandru Iosup, Cristina L Abad, Johannes Grohmann, and Simon Eismann. A SPEC RG Cloud Group's vision on the performance challenges of FaaS cloud architectures. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, pages 21-24, 2018.

[7] Erwin van Eyk, Alexandru Iosup, Simon Seif, and Markus Thömmes. The SPEC Cloud Group's research vision on FaaS and serverless architectures. In Proceedings of the 2nd International Workshop on Serverless Computing, pages 1-4, 2017.

[8] Vladimir Yussupov, Uwe Breitenbücher, Frank Leymann, and Michael Wurster. A systematic mapping study on engineering function-as-a-service platforms and tools. In Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing, pages 229-240, 2019.

