Scalability of push and pull based event notification: A comparison between webhooks and polling

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Scalability of push and pull based event notification

MARCUS NILSSON DANIEL DUNÉR

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Högskoleingenjör Datateknik

Date: June 9, 2020

Supervisor: Peter Sjödin

Examiner: Markus Hidell

School of Electrical Engineering and Computer Science

Swedish title: Skalbarhet hos push- och pullbaserad eventnotifikation

Swedish subtitle: En jämförelse mellan webhooks och polling

© 2020 Marcus Nilsson and Daniel Dunér

Abstract

Today’s web applications make extensive use of APIs between server and client, or between servers, in order to provide new information in the form of events. The question is whether the different methods of procuring events scale differently. This study aims to compare the performance of webhooks and polling, the two most commonly used push and pull based methods for event notification, when scaling up traffic. The purpose is to create a basis for developers when choosing a method for event notification. The comparison is based on measurements of typical indicators of good performance for web applications: CPU usage, memory usage, and response time. The tests indicated that webhooks perform better in most circumstances, but further testing in a more well-defined environment is needed to draw a confident conclusion.

Keywords

Event notification, Webhooks, Polling, Scalability, Performance test, Web application

Sammanfattning

Dagens webbapplikationer använder sig i stor utsträckning av API:er mellan server och klient, eller server till server för att inhämta ny information i form av events (händelser). Frågan är om de olika metoder som finns för att inhämta events skalar olika bra. Förevarande studie ämnar att jämföra prestanda mellan ”webhooks” och ”polling”, de två mest använda pull- och pushbaserade metoderna för eventnotifikation vid uppskalning av trafik. Syftet är att skapa ett underlag för utvecklare vid valet av metod för eventnotifikation.

Jämförelsen har tagits fram genom mätningar av typiska indikatorer för god prestanda hos en webbapplikation: CPU-användning, minnesanvändning och svarstid. Testerna gav indikationer om att webhooks är bättre men det krävs vidare testning i en mer väldefinierad miljö för att dra en säkrare slutsats.

Nyckelord

Eventnotifikation, Webhooks, Polling, Skalbarhet, Prestandatest, Webbapplikation

Acknowledgments

We want to thank our supervisor Peter Sjödin and our examiner Markus Hidell for their advice and contributions during this degree project.

Stockholm, June 2020

Marcus Nilsson and Daniel Dunér

Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Goal
1.4 Benefits, Ethics and Sustainability
1.5 Delimitations
1.6 Outline
2 Theoretical Background
2.1 Polling
2.2 Webhook
2.3 Performance criteria for web applications
2.4 Testing scalability
2.4.1 Performance testing
2.4.2 Core Performance Testing Activities
2.4.3 Test Environment
2.4.4 Identifying Acceptance Criteria
2.4.5 Planning and Designing Tests
2.4.6 Configuring the Test Environment
2.4.7 Implementing the Designed Test
2.4.8 Performing the Tests
2.4.9 Analyze Results, Report, and Retest
2.5 Related Work
3 Methodology and Methods
3.1 Research process
3.2 The model
3.3 Performance Testing
3.3.1 Identifying test environment
3.3.2 Identifying performance Acceptance Criteria
3.3.3 Planning and designing tests
4 Implementation
4.1 The server
4.2 The client
4.3 Webhook
4.4 Polling
4.5 Implemented test design
4.5.1 Load test
4.5.2 Stress test
4.5.3 Endurance test
4.6 Measurement Criteria
4.6.1 CPU and memory measurements
4.6.2 Response time measurements
4.7 Testbed
5 Results
5.1 Load test
5.1.1 CPU
5.1.2 Memory
5.2 Stress test
5.2.1 CPU
5.2.2 Memory
5.3 Endurance test
5.3.1 CPU
5.3.2 Memory
6 Discussion
6.1 Load test
6.2 Stress test
6.3 Endurance test
6.4 Evaluation against acceptance criteria
6.5 Conclusion
6.6 Further Research
6.7 Reflections
References
A
A.1 Load Test Plots
A.1.1 CPU
A.1.2 Memory

List of Figures

4.1 Webhook model.
4.2 Polling model.
5.1 Central Processing Unit (CPU) usage of one burst of 100 clients.
5.2 CPU usage of one burst of 1000 clients.
5.3 Memory usage of one burst of 1000 clients.
5.4 CPU usage of one burst of 10 000 clients.
5.5 CPU usage of one burst of 11 000 clients.
5.6 CPU usage of one burst of 12 000 clients.
5.7 CPU usage of one burst of 13 000 clients.
5.8 CPU usage of one burst of 14 000 clients.
5.9 Memory usage of one burst of 11 000 clients.
5.10 Memory usage of one burst of 12 000 clients.
5.11 Memory usage of one burst of 13 000 clients.
5.12 Memory usage of one burst of 14 000 clients.
5.13 CPU usage of 300 bursts of 500 clients each.
5.14 CPU usage of 900 bursts of 500 clients each.
5.15 CPU usage of 1800 bursts of 500 clients each.
5.16 CPU usage of 300 burst of 100 clients each.
5.17 CPU usage of 300 burst of 1000 clients each.
5.18 Memory usage of 300 burst of 100 clients each.
5.19 Memory usage of 300 burst of 1000 clients each.
5.20 Memory usage of 300 burst of 500 clients each, using polling.
5.21 Memory usage of 900 burst of 500 clients each.
5.22 Memory usage of 1800 burst of 500 clients each.
A.1 CPU usage of one burst of 1 client using polling.
A.2 CPU usage of one burst of 10 client using polling.
A.3 Memory usage of one burst of 1 client using polling.
A.4 Memory usage of one burst of 10 client using polling.
A.5 Memory usage of one burst of 100 client using polling.
A.6 Memory usage of one burst of 1000 client using polling.

List of Tables

4.1 Testbed computer.
5.1 Average response time measured during load testing.
5.2 Total number of requests.
5.3 Average response time for stress test.
5.4 Total number of requests sent for stress test.
5.5 Average response time.
5.6 Total number of requests sent.

List of acronyms and abbreviations

API Application Programming Interface
CPU Central Processing Unit
DDoS Distributed Denial of Service
HTTP Hypertext Transfer Protocol
HTTPS Hypertext Transfer Protocol Secure
NAT Network Address Translation
RAM Random Access Memory
REST Representational State Transfer
SSL Secure Sockets Layer
TCP Transmission Control Protocol
URL Uniform Resource Locator

Chapter 1 Introduction

The development of connected devices, mobile phones, and personal computers has been rapid. This has been accompanied by an increase in the number of people with access to the Internet. In 2019, more than half of the global population used the Internet, a rapid increase from 16% in 2002 [1]. As access to the Internet increases and more of our daily actions become digital, expectations on the speed at which new data can be delivered to applications (hereafter, apps) and devices are also increasing. Events, such as the arrival of email, news, and messages, are expected to be delivered instantly. As most apps and devices get their data from servers, the server faces the problem of how best to notify clients about an event. In a client-server model, the clients request information about these events in order to be notified. This implies that, to ensure that it is up to date, a client has to periodically ask the server for updates on the event. With an increasing number of clients, this approach puts a strain on the server. Even when no events have occurred, the overall response time of the server will be higher since it is always busy handling requests and potentially sending empty responses. This introduces a scalability problem where the server cannot respond in a timely manner, if at all.

There are several methods for addressing client event notification. Many were developed to improve performance in some manner, although they vary in their complexity of implementation and in the degree to which they adhere to the client-server model. Event notification is either push or pull based, depending upon whether the server or the client takes the initiative for updates. This thesis addresses the most common method from each of these two categories: polling and webhooks. The expected traffic that the server will be exposed to might call for a specific method; however, the difference in complexity of a given method might not be justifiable.

The decision of which method to use when developing a server is not always clear, as there frequently is a need for a balance between performance in terms of responsiveness for the user versus the resource costs of CPU and memory usage. Additionally, there are issues concerning the security and complexity of the implementation. This thesis aims to provide material to simplify an application designer’s decision of which method to use with regards only to server performance given an expected traffic load.

1.1 Background

Today, many modern web applications are built using a server that provides an Application Programming Interface (API) which is utilized by clients to implement the desired functionality of the application. A number of APIs support event notification. Event notification refers to the process where the server provides an update to the client in some way. For example, a company X offers electronic authentication through an API when a user signs in to a service of a third party. The third party application takes the role as the client and starts an authentication process at company X and asks the user to use that service to authenticate. The third party then waits for company X to notify it that the user has been authenticated using event notification. Once the event of a successful authentication has been received the user can be signed in.

Depending on the use case for the event notification method, a requirement might be that the client should be notified without unnecessary delay. The number of clients to notify can also be large, in which case all clients still expect to be notified within a bounded time. These scenarios call for the method to perform well with regard to different metrics. A low response time gets the event notification to the client faster, while handling a large number of clients demands greater CPU and memory usage to deliver all of the events. This thesis uses these metrics to define the performance of event notification. An optimal method would have low CPU usage, low memory usage, and a low response time; however, it is not realistic to expect any method to achieve all of these at the same time. In order to evaluate which method is best for event notification given these metrics, we benchmark two different methods by measuring response time, CPU usage, and memory usage.

The exchange of information between the clients and the server API commonly takes place using Hypertext Transfer Protocol (HTTP). APIs that follow the client-server model normally use polling to check for event notifications. However, polling (as a pull method) has some disadvantages and leads to an increased traffic load compared with a push method.

Webhooks are a push based event notification method. The technique has gained popularity and is implemented in many modern web applications and used by global services such as GitHub, Facebook, and Google. Webhooks require a modification of the client-server pattern; instead of receiving updates by repeatedly sending requests, the client subscribes to the server for a specific resource and expects the server to send back an answer when one is available.

In the normal client-server pattern, a server is not allowed to initiate contact with the client. As mentioned in [2], it is common for push based methods to break this pattern, and that is what the webhook server has to do.

When the client subscribes it attaches a Uniform Resource Locator (URL) to the request, specifying where it wants to receive the update of the resource.

The server then uses the URL when an update has occurred to notify the client.

This method requires a more complex implementation, on both server and client, and puts more work on the server as the server has to keep more state information (i.e., the pending subscriptions and URLs to use for them). The benefit is a reduced amount of empty responses from the server.

Comparing these two different types of methods introduces some challenges.

In a perfect world all applications share the same user patterns and system properties. When measuring a traditional algorithm emphasis is put on its performance in a concrete setting. The results are generally useful because the algorithm itself is unaffected to a large degree by the environment it is implemented in. However, unlike a traditional algorithm, event notification is dependent on the environment and the user’s context. The typical use case for event notification involves client traffic (in this case the actual distribution of the notifications to be sent to the client), a notoriously hard parameter to simulate realistically in a way that is useful to a wide range of applications. As a result, event notification cannot be isolated from the environment and user context. Fundamentally, there needs to be an event that occurs and someone needs to be notified. Since the environments where the event notification is implemented differ to a great extent, this indirectly affects the performance of the method. However, these methods have to be isolated from external factors to evaluate their general usefulness. The challenge is to create an environment where testing can occur, but also making the tests as realistic as possible so that a developer can make an informed decision as to which method to use.

Implementations of webhooks and polling vary between applications, making it hard to define a test application that represents a number of different real use cases. However, there are some common concepts shared between many implementations. Therefore, we use a generalized implementation built around these common concepts to keep the tests relevant for a wider range of applications.

To execute tests to gain useful results, a suitable test method is required. The tests should be able to produce results for both methods and be applied to both implementations of event notification in a similar manner such that a fair comparison can be made. We seek to draw conclusions as to when one solution consistently performs better than the other as measured using the metrics described previously. To do this we will perform load testing, i.e., subject each of the implementations to tests with increasing loads until we are able to draw our conclusion.

1.2 Problem

We will assume that a server provides an API for event notification and that there are a number of clients that want to be notified about events.

While polling works there are uncertainties about how well it scales with an increasing number of clients and notifications. In contrast, webhooks put more requirements on the server and the client, but the server should perform differently since it does not deal with the incoming polling requests (only infrequent subscription requests). The questions that emerge are how the two solutions compare, and under what conditions one is more suitable than the other. A good solution would offer lower response times, CPU usage and memory usage than the other method.

How does varying traffic affect the performance of polling compared to webhook based solutions for API event notification?

1.3 Goal

The goal of this project is to explore how a push based solution and a pull based solution compare with respect to performance under changing traffic.

This is done by conducting a controlled empirical study. The tests have to be repeatable for push and pull based solutions and with changing traffic load.

The results can then be used to choose which method to implement in a server.

In order to achieve the goal, the following will be carried out:

• Implementing the techniques for webhook and polling.

• Constructing an application that uses the two solutions and that mimics a changing number of concurrent web clients to simulate different traffic load.

• Gathering of data from different traffic scenarios.

• Analyzing how the data obtained from the different techniques compares.

The two techniques will be compared based on three factors affecting performance: response time, CPU usage, and memory usage. These factors are taken from similar research [2, 3]. Response time is the server processing time for a request including the communication time with the client. A low response time is sought, since it means that the client gets what it asked for faster. CPU and memory are both limited resources on a system and they represent cost, so CPU usage and memory usage should be kept as low as possible. The techniques may vary in comparison to each other, but they have to meet the following minimum performance criteria in order to be considered functioning:

• CPU usage should not exceed 80% to allow for temporary load increases.

• Memory usage is not allowed to exceed the amount of Random Access Memory (RAM) available, since this would impact the execution speed.
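The two minimum criteria above can be expressed as a simple check over collected measurements. The following is a minimal sketch, not the measurement code used in the thesis: the `Sample` structure and the function name are hypothetical, and the criteria are interpreted as per-sample limits, which is an assumption.

```python
from dataclasses import dataclass

CPU_LIMIT_PERCENT = 80.0  # threshold stated in the criteria above


@dataclass
class Sample:
    """One resource measurement (hypothetical structure, not from the thesis)."""
    cpu_percent: float  # CPU usage in percent, 0-100
    memory_bytes: int   # memory used by the server process


def meets_criteria(samples, ram_bytes):
    """Return True if no sample exceeds 80% CPU or the available RAM."""
    return all(
        s.cpu_percent <= CPU_LIMIT_PERCENT and s.memory_bytes <= ram_bytes
        for s in samples
    )
```

A run would then count as functioning only if `meets_criteria` holds for all samples gathered during the test.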

1.4 Benefits, Ethics and Sustainability

Efficient event notification is important as it can reduce the amount of unnecessary work done by both the client and the server. Clients benefit from this in their mobile apps. Mobile apps are a common area for event notification, as chat messaging, news, emails, and tweets all rely on some kind of event notification. Since CPU usage and communication are large consumers of energy in phones [4], reducing them also reduces energy usage. This extends battery life and the time between charges.

The server can also benefit, as many modern applications are hosted in cloud environments that charge for CPU usage (increasingly at per-minute granularity). Those same cloud environments are typically hosted in data centers that, according to Nicola Jones [5], use more energy than some countries (such as Iran). However, to put this energy usage in context, data centers consume about 1% of global electricity demand [5].

We do not see any direct ethical consequences of the performance of event notification or as a result of this thesis.

1.5 Delimitations

This thesis disregards server and environmental factors that are not directly related to the implementation and performance of the event notification, such as details of the server implementation, proxies, firewalls, or load balancing. Webhooks can be implemented in different ways. In this thesis a webhook is defined as by Matthias Biehl in [6]. Out of the several different methods for handling client event notification, this thesis focuses on the most common one from each category: webhooks and polling. The tests consider different traffic scenarios; however, the actual traffic may differ depending on the applications using the techniques in the real world.

1.6 Outline

The thesis is structured as follows. Chapter 2 presents the theoretical background for the area. It also presents related work, focusing on what has been done before and what has been left for further research. Chapter 3 presents the engineering-related content of the project and the research methodologies and methods chosen for the project, together with possible flaws and how these flaws are addressed. Chapter 4 presents the practical work that was done, specifically how the models were built, how measurements were made, and details of the testing environment. Chapter 5 presents the results from the measurements collected from testing. Chapter 6 presents conclusions drawn from the analysis of the results.

Chapter 2 Theoretical Background

According to [7, 8], event delivery methods have three main characteristics that can be used to compare them: client pull vs. server push, aperiodic vs. periodic, and unicast vs. 1-to-N. These define who takes the initiative in the event delivery, whether the delivery is triggered by an event or by a time schedule, and how many clients are notified. Polling and webhooks are pull and push based respectively. Both methods can be aperiodic or periodic, as well as unicast or 1-to-N. For the tests in this thesis, polling is periodic and webhooks are aperiodic. Both are unicast.

There are challenges when developing an event notification infrastructure. Pull based solutions put strain on servers yet still cannot guarantee instant notification. In contrast, push based solutions, where the server informs the client, violate the traditional client-server model and may break expectations of how an API should behave. According to Matthias Biehl [6], the problems come from the inversion of control: in order for the server to deliver an event, the client needs to wait for the server to send it. This inversion of control causes the client to effectively act as the server, hence inverting the two roles. It puts server-related expectations on the client that would not normally be there, as the client must accept requests and support Secure Sockets Layer (SSL) for Hypertext Transfer Protocol Secure (HTTPS) traffic. The client also needs to open ports to accept incoming requests. This might be unsuitable for many applications, as most devices are located behind firewalls or Network Address Translation (NAT). When choosing an event notification method for an application, it is reasonable to expect a compromise between the advantages and disadvantages of the alternative methods. Parameters to consider when selecting the method include scalability, implementation complexity, and requirements on the client and server.

2.1 Polling

In the context of event notification, polling refers to the act of requesting new information from the server to learn of events. Polling has two varieties: long and short. Unfortunately, this terminology is used inconsistently, but the general consensus appears to be that long polling refers to the technique where the server withholds sending a response until the event occurs, while in short polling the server returns a response (almost) immediately and expects the client to ask again. Short polling is commonly seen as the standard polling approach [6]: this thesis will refer to short polling as polling. Polling generally describes a set of behavioral patterns used by clients and, as there is no formal specification, this project will define polling as it is described in [6, 9, 10], resulting in the following definition:

A server has information about an event. A client asks the server about the event at regular intervals defined by a polling interval. If the event has occurred on the server, this event is sent as a response to the API call. If there is no status update, an empty response is returned. The main parameter in polling is the polling interval as this determines how often each client queries the server.
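The definition above can be sketched as a short client loop. This is a minimal illustration, not the implementation used in the thesis: `fetch_event`, `max_polls`, and the injectable `sleep` parameter are assumptions introduced here, with `fetch_event` standing in for the HTTP request to the API.

```python
import time


def poll(fetch_event, interval_s, max_polls, sleep=time.sleep):
    """Ask the server for an event at a fixed polling interval.

    fetch_event is a hypothetical callable standing in for the API call.
    It returns the event, or None for an empty response.
    Returns a tuple (event, number_of_requests_sent).
    """
    for attempt in range(1, max_polls + 1):
        event = fetch_event()
        if event is not None:
            return event, attempt
        if attempt < max_polls:
            sleep(interval_s)  # wait one polling interval before asking again
    return None, max_polls
```

Every request before the one that returns the event produces an empty response, which is the overhead that grows with the number of clients and with shorter polling intervals.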

It can be argued that polling is the only way for a client to guarantee that it gets notified since polling does not require waiting for someone else to act. Furthermore, according to Matthias Biehl [6] polling should not add any additional constraints to the system making it universally applicable within the client-server paradigm. Polling may also be the only viable solution when the client is protected by a firewall that blocks exposing any endpoints. Polling is in practice the default method to handle events when using a Representational State Transfer (REST) API, due to the REST architecture and, by extension, the client-server model. Additionally, polling could be seen as the “lazy” approach (since it is so easy to implement and works in all cases), hence it is widely used.

When the load on a server is low, polling gets the job done. However, as the load increases, one expects polling to eventually run into problems when a large number of users repeatedly request data during the polling interval. In the worst case, the users’ polling slows down the server or even stops it, similar to the effect of a Distributed Denial of Service (DDoS) attack [11].
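How the polling load grows can be made concrete with a little arithmetic. The sketch below assumes that every client polls exactly once per polling interval and that each client receives exactly one event; the function names and the example numbers are illustrative and are not taken from the thesis measurements.

```python
def polling_request_rate(clients, interval_s):
    """Average number of requests per second caused by periodic polling."""
    return clients / interval_s


def empty_response_fraction(polls_until_event):
    """Fraction of one client's requests that return an empty response
    when the event arrives on request number `polls_until_event`."""
    return (polls_until_event - 1) / polls_until_event
```

For example, 1000 clients with a 2 s polling interval produce on average 500 requests per second, and a client that polls ten times before its event arrives receives nine empty responses out of ten. The request rate grows linearly with the number of clients whether or not any events occur.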

2.2 Webhook

A webhook is not a technology that can be purchased or installed, nor is there a standard way of implementing it. The webhook technique is instead built upon a collection of concepts and best practices. This thesis takes its definition of webhooks from how they are described in [6, 9, 10], where the server acts as an API provider and the client is a subscriber. The client sends a request to subscribe to a service that the server provides. The client provides a webhook endpoint that handles notifications that are sent back from the server when an event occurs. The server must provide a number of API access points. One of these access points handles incoming subscriptions and another dispatches events, which are sent to the subscribed clients. According to Matthias Biehl [6], it is important to distinguish between a webhook and a webhook endpoint. A webhook endpoint is the implementation of the receiving endpoint on the client side. Webhook refers to the concept of sending events to a webhook endpoint hosted by the client. Webhooks solve the problem of delivery from the server to the client and are the most common solution for managing push based events between server and client in combination with RESTful APIs [6]. Taking the responsibility away from the client means that the client cannot choose what data it gets or when. As expressed in [7]:

The “in-your-face” nature of push technology is the root of both its potential benefits and disadvantages.
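The subscribe and dispatch roles described above can be sketched as a small in-memory registry. This is a hedged illustration under stated assumptions and not the server built in the thesis: the class name, the `send` callable (a stand-in for an HTTP POST to the client's webhook endpoint), and the choice to keep subscriptions after a dispatch are all introduced here for illustration.

```python
class WebhookServer:
    """In-memory sketch of the API provider side of a webhook."""

    def __init__(self, send):
        # send(url, event) is a hypothetical stand-in for an HTTP POST
        # to the webhook endpoint hosted by the client.
        self._send = send
        self._subscriptions = {}  # resource name -> list of callback URLs

    def subscribe(self, resource, callback_url):
        """Access point that handles an incoming subscription."""
        self._subscriptions.setdefault(resource, []).append(callback_url)

    def dispatch(self, resource, event):
        """Access point that pushes an event to every subscriber.

        Returns the number of notifications sent.
        """
        urls = self._subscriptions.get(resource, [])
        for url in urls:
            self._send(url, event)
        return len(urls)
```

The server only does work when a client subscribes or when an event occurs, at the cost of storing one callback URL per subscription.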

There are two concepts that are widely known and work similarly to webhooks: callback functions and event handlers. In [12], a callback function is described as a function that is passed by reference as an argument to another function. The receiving function performs some action and then calls the function it received as an argument. If someone wants to receive an event, they write a callback function and register it with the application that produces the event. Webhooks work in the same way, except that the callback is an HTTP endpoint on a remote machine rather than a function within a program.

The concept of event handlers is commonly used in the context of graphical user interfaces and interactive websites. According to [13], a typical event handler waits for a button to be clicked and then notifies some other component about the event; for example, a submit button on a form executes a POST request to a backend server. Such event handlers are widely used in the windowing systems of operating systems. Webhooks can be seen as cross-application event handlers.

2.3 Performance criteria for web applications

A reasonable response time for the application can be derived by looking at the effect of the delay on the end user. According to Jakob Nielsen [14], 0.1 s is the longest delay at which the user feels that the system reacts instantaneously, 1 s appears as a noticeable delay but the user does not get distracted, and 10 s is the limit for how long the application can keep the user’s attention. So, depending on the specifics of an application, there are different bounds on the acceptable delay for event notifications. For example, to respond to a keystroke the delay needs to be below 0.1 s, while 1 s might be acceptable for a chat application, and 10 s might be acceptable for a notification of the arrival of a new email message.
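The three limits attributed to Nielsen [14] can be written as a small classification helper. This is only a sketch: the category labels and the treatment of the boundaries as inclusive are assumptions made here.

```python
def perceived_delay(delay_s):
    """Classify a notification delay using the 0.1 s, 1 s, and 10 s limits."""
    if delay_s <= 0.1:
        return "instantaneous"
    if delay_s <= 1.0:
        return "noticeable"
    if delay_s <= 10.0:
        return "attention kept"
    return "attention lost"
```

A chat application might then require that measured delays never fall outside the first two categories, while an email client could tolerate the third.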

The recommendations on CPU usage in [15] say that a CPU usage of 100% at normal load is undesirable since it leaves no capacity for load peaks. At 100% CPU usage the throughput remains unchanged while the response times increase with increased load. This source recommends a CPU load of 70% to 80%, as that leaves headroom to increase throughput, rather than response time, when the load has a short peak.

Memory usage for an application may indirectly affect performance. If an application uses more memory than is available in RAM, the operating system is forced to employ its swapping algorithms to virtually increase the accessible memory at the cost of a decrease in speed.

2.4 Testing scalability

Scalability refers to how an application’s performance is affected by increasing workload [16]. In Scott Barber and Colin Mason’s book Web Load Testing For Dummies [17], performance testing is defined as tests designed to evaluate the speed, scalability, and stability of an application. This book also defines different performance test subgroups. Details of this performance testing and the subgroups of the tests will be given in the following section.

2.4.1 Performance testing

Performance testing is used to find performance problems, to evaluate performance with respect to different performance criteria, and/or to collect other performance-related data. Performance testing can be divided into the following three subgroups: load, stress, and endurance/duration tests.

Load testing

According to [17, 18], load tests provide the best means for evaluating whether an application can handle the expected load of a given production environment. Load testing should be designed to assess the performance of an application under the anticipated load. The goal is to evaluate the application’s performance under different production-like conditions to find problems that would otherwise appear following deployment – when it could negatively affect users.

Stress testing

The purpose of stress testing is to expose the system to a load that it is unable to handle, i.e., purposely causing it to fail. This level of load is a scenario that no developer expects, but might occur in reality. Stress tests are conducted to determine a program’s stability near the breaking point. This is especially interesting when testing how the software performs against DDoS attacks or when the system exceeds available memory or CPU capacity.
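One way to look for the breaking point is to step up the load until a run fails. The sketch below assumes a hypothetical `run_burst(clients)` callable that performs one test run and returns True while the server still copes; the step sizes in the usage note are illustrative and this is not the test harness used in the thesis.

```python
def find_breaking_point(run_burst, start, step, limit):
    """Increase the burst size until a run fails.

    run_burst(clients) is a hypothetical callable that returns True if the
    server handled a burst of that many clients.
    Returns the first failing burst size, or None if none fails up to limit.
    """
    clients = start
    while clients <= limit:
        if not run_burst(clients):
            return clients
        clients += step
    return None
```

Calling `find_breaking_point(run_burst, 10_000, 1_000, 14_000)` would try bursts of 10 000 up to 14 000 clients in steps of 1000.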

Endurance testing

Endurance/duration testing utilizes load tests that are run for a longer time. Their purpose is to ensure that the application operates normally and does not experience deterioration in performance or response time during extended operation. One example of why a program might degrade in performance over time is due to memory leaks.
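A simple way to check for deterioration over a long run is to compare the start of the run with its end. The following sketch is one possible heuristic and not a method from the thesis: the window-based comparison and the default tolerance of 1.2 are assumptions chosen for illustration.

```python
from statistics import mean


def degraded(response_times, window, tolerance=1.2):
    """Return True if the mean of the last `window` samples is more than
    `tolerance` times the mean of the first `window` samples."""
    if len(response_times) < 2 * window:
        raise ValueError("need at least two full windows of samples")
    first = mean(response_times[:window])
    last = mean(response_times[-window:])
    return last > tolerance * first
```

The same comparison can be applied to memory samples, where a steadily rising value over the run may point to a memory leak.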

2.4.2 Core Performance Testing Activities

One approach to performance testing is to follow the core performance testing activities described in J. D. Meier’s book Performance testing guidance for web applications: patterns & practices [16]:

1. Identify Test Environment,

2. Identify Performance Acceptance Criteria,

3. Plan and Design Tests,

4. Configure Test Environment,

5. Implement Test Design,

6. Execute Tests, and

7. Analyze, Report, and Retest.

The above activities represent the typical performance testing process that most projects go through. However, no testing process covers all cases. Therefore, each testing activity needs to be adapted to the context of the specific project.

2.4.3 Test Environment

The first thing that needs to be done before doing performance testing is to determine what the physical test environment is and what resources are needed to carry out the planned tests. Doing this early in the testing process helps to streamline the design and planning of the tests, as some obstacles to thorough testing can be identified at an early stage. However, it may be necessary to revise the testing plans along the way.

The physical test environment refers to hardware, software, and network configurations. Ideally, the test environment is an exact replica of the production environment, with the exception of the tools used to perform the tests. The degree of similarity between the production and testing environment is an important factor when choosing which tests should be used and the details of the loads used for testing. Understanding the similarities between these two environments and their differences allows the tester to define a suitable test environment.

2.4.4 Identifying Acceptance Criteria

In order to be able to carry out suitable tests, it is important to assess what should be considered acceptable performance. This will vary depending on the project. For example, in a Rubik’s cube algorithm benchmark in [19] a small number of moves on the cube was chosen as the desired criterion. In contrast, common desired criteria for web applications are low response time, a desired throughput, and minimal resource utilization. The response time is the combination of the server’s processing time for a request, the time required for communication with the client, and the time that requests are enqueued waiting to be served. An example of desired throughput is how many requests an application has to process during a certain time interval.

Resource utilization refers to the usage of system resources, such as CPU and memory.


2.4.5 Planning and Designing Tests

When designing a test that will be used to assess the performance of an application it is important to simulate the likely usage scenarios as accurately as possible. Some key usage scenarios include the most common usage, business-critical usage, and performance-intensive usage. Appropriate metrics need to be chosen to compare the application’s performance with the desired performance, while highlighting performance problems, bugs, and bottlenecks. These metrics should be chosen to directly or indirectly correspond to desired characteristics of the application.

2.4.6 Configuring the Test Environment

The test environment, tools, and resources should be prepared before testing. Unfortunately, tests are often more difficult to set up than initially assumed and it is easy to miss vital aspects that might invalidate the data.

There are several considerations when configuring the test environment, tools, and resources. It is common for a load generator to first encounter a memory bottleneck and then encounter a CPU bottleneck. Therefore, it is important to determine the amount of load the generator can generate before reaching either of these bottlenecks.

System clocks should be synchronized across the systems from which data is to be collected. The accuracy of the load test with respect to the underlying hardware should be validated. Moreover, the hardware configuration should emulate typical real-world configurations. It is good practice to ensure that the test environment configurations are easy to adjust, since the type of metrics that should be gathered often changes multiple times during a project’s life cycle.

2.4.7 Implementing the Designed Test

Some useful tools to generate HTTP traffic are JMeter [20], NetSim [21], iPerf3 [22], and OMNeT++ [23]. However, tools for load generation are usually inadequate for the latest technologies; this is especially true for link speeds greater than 10 Gbps. Moreover, these tools usually only support the most common types of communication. This makes simulating realistic users a challenge. Additionally, emulating very large numbers of users may require a hardware configuration. Alternatively, load generators might have to be built from scratch. Unfortunately, building high-performance load generators is difficult and time-consuming; hence, this should be taken into consideration when planning the tests. A typical test design scripts a user scenario and then scales up this scenario and combines it with other scenarios.

2.4.8 Performing the Tests

Execution of the tests is not limited to pressing a button and waiting for the test to finish. There are several preparatory steps that can increase the probability of a successful test run with valid results, such as:

• Reset the system;

• Validate the test environment configuration to match the current test;

• Perform a test run: test all operations under normal load, to ensure that the system works correctly;

• Ensure collection of the selected metrics is correctly configured.

When it comes to running the actual test, good practice is to monitor the test and the system for indications that the test is failing. Each test should be executed more than once, as the first execution can be affected by initial loading times for the server as well as by warming up the server’s caches. If the first test differs significantly it should not be discarded; rather, it can be used to understand how the application performs on a cold start (i.e., when the caches are not warmed up). During the test’s execution, the system should be isolated, as the execution of other activities can affect the test results. This may mean shutting down various background processes that normally run on the system.

Before the test results are accepted they should be reviewed to identify any flaws or anomalies. If the results are too hard to understand it can be helpful to fix specific variables during the test to make the results more clear. The results of the test should be archived along with all the information needed to repeat the test in the future. It is very important that both you and others should be able to reproduce the results of each of your tests.

2.4.9 Analyze Results, Report, and Retest

After the tests have been completed and data collected, the most important task is analysis of the collected data. The results should be compared with the acceptance criteria, and it should be checked whether the results are trending towards or away from the expected results. If the tests fail or if it is impossible to draw conclusions from the results that answer the questions that the tests were intended to answer, then the tests need to be redesigned, repeated, and the test results analyzed again.

2.5 Related Work

There has been previous work done in this area. A comparison between polling and webhooks was done by Kavats and Kostenko in [3] regarding the speed of application interaction and implementation complexity. Their conclusion was that the webhook implementation showed a response time three times lower than the polling implementation when the load reached 1000 concurrent requests.

Push and pull techniques for use in browser based applications were compared in [2, 24]. The techniques compared were a pure pull technique (polling) and a COMET push implementation [25] which allows a server to push data to a browser. The comparison tested the system for both techniques with different numbers of concurrent users (ranging from 100 to 1000). The results show that push was more responsive than pull by a large factor and that push should be chosen whenever data coherence between the server and client is prioritized. On the other hand, pull showed better response times even for small numbers of users. The reason for this is that the server needs to maintain the state of each client while also handling all of the connections.

Based upon these results it was recommended that load balancing solutions prior to the server(s) should be applied even for a few hundred users. The conclusion seems to imply that push performs much better with regards to data coherence between server and client but at a great cost in CPU usage.

L. Brenna and D. Johansen addressed scaling problems of polling by extending existing web applications with a push-based web service wrapper [26]. Their goal was to validate the data close to the source to reduce the amount of redundant data transferred over the wire and also to avoid unnecessary client-server interactions.

S. Acharya, M. Franklin, and S. Zdonik investigated techniques for enhancing system performance and scalability [27]. Their focus was on finding a suitable balance between push and pull. They concluded that for a lightly loaded server, pull seems to be the best choice, as queued requests are handled faster than the average latency of publishing. However, this study did not focus on HTTP servers.

Chapter 3

Methodology and Methods

This chapter introduces the methodologies and research processes used in this thesis.

3.1 Research process

To carry out this research, the methodology chosen was to construct a model of an environment featuring client-server interactions through an API. Webhooks and polling were implemented in this environment and used by the client. This environment was used to do performance testing of the server. Measurement results are compared to reveal differences and draw conclusions about under what circumstances each method is more suitable according to the metrics described in Section 1.2.

3.2 The model

The model aims to emulate an environment featuring client-server interactions through a server’s API. The server provides a resource for each client, with a changing status that is accessible through the API. The status update is made accessible by one of two event management tools (polling or webhook). Each client uses this API to acquire information about its resource. The clients are configured to use one of the event management tools, and the number of active clients is scaled over a range to increase and decrease the traffic load on the server. This traffic is configured to follow one of two patterns: a one-time burst or a number of bursts with a given interval.

3.3 Performance Testing

Testing the scalability of the two event notification methods is done by testing their performance. The performance is tested by following the activities presented in Section 2.4.2.

3.3.1 Identifying test environment

In order to produce tests that are as accurate as possible, the test environment should reflect the production environment as closely as possible. When there is no specification for the production environment, as these event delivery methods are usually implemented as part of a web application, a suitable test method is to use a web application. The parts of the web application that are tested should be unaffected by other parts of the application, such as the network interface and functionalities that are judged to be irrelevant.

3.3.2 Identifying performance Acceptance Criteria

Criteria that are indicative of good performance in a web application are response time, CPU usage, and memory usage. Response time measures the time from when a request is sent by the client until the desired response is returned to the client. A low response time means that the client gets a fast response from the application. The application will follow the recommendations mentioned in Section 2.4.4 regarding response time, CPU usage, and memory usage. To maintain a satisfactory user experience and to be considered acceptable performance in this thesis, the response time must be below 10 s, which is the threshold described in Section 2.3 for web interactions. If this criterion is met, then we will consider CPU and memory usage in order to limit the resource usage of the system. When presented with a “normal” traffic load the CPU usage should be below 80% to allow for temporary (short term) increases in load, i.e., spikes in load. This CPU load will define an upper bound for a given method. Similarly, memory usage should be less than the available RAM so as not to adversely impact execution speed. This limit on memory usage is not expected to be surpassed during the testing of an application, but the memory usage of the two methods will be compared. The application will use more system resources as the offered load increases. For the same amount of finished work an efficient method will use fewer resources; hence, CPU and memory usage are important criteria for these tests.
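The criteria above can be written down as a small check. The following is only a sketch under assumptions: the function and field names are hypothetical, and only the 10 s and 80% thresholds and the "less than available RAM" rule come from the text.

```javascript
// Thresholds taken from the text; all names below are hypothetical.
const MAX_RESPONSE_TIME_MS = 10000; // response time must be below 10 s
const MAX_CPU_PERCENT = 80;         // CPU usage should be below 80%

// Returns which criteria a measurement satisfies. Response time is checked
// first; CPU and memory are only considered when that criterion is met.
function checkAcceptance({ responseTimeMs, cpuPercent, memoryUsedMb, memoryAvailableMb }) {
  const responseOk = responseTimeMs < MAX_RESPONSE_TIME_MS;
  const cpuOk = responseOk && cpuPercent < MAX_CPU_PERCENT;
  const memoryOk = responseOk && memoryUsedMb < memoryAvailableMb;
  return { responseOk, cpuOk, memoryOk, acceptable: responseOk && cpuOk && memoryOk };
}

// Illustrative values only, not measurements from the thesis.
const verdict = checkAcceptance({
  responseTimeMs: 1916,
  cpuPercent: 35,
  memoryUsedMb: 120,
  memoryAvailableMb: 16000,
});
console.log(verdict.acceptable); // true
```

Checking response time first mirrors the order in which the criteria are applied in this section.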

3.3.3 Planning and designing tests

The designing of tests will be from the perspective of a web application and the usage scenarios that this implies. The main usage of such an application will be in a client-server scenario where the clients wish to receive updates about a relevant event (this thesis considers each client to be interested in updates for a single event only, rather than in multiple events). This implies that the application has to work with a variety of different traffic loads from the clients. To assess the performance of the application, tests will be made according to the subgroups of performance testing described in Section 2.4.1. The application only has one usage scenario, wherein a client requests an event notification from the server. The flow of event updates and the flow of either polling requests or subscriptions will provide the scalable load for the tests.

The application considered in this thesis is only used as a means to measure the performance of the two different event notification methods. Identifying the details of actual anticipated loads for such an application is therefore of little value. Instead, the performance characteristics of the methods will be collected at different artificially generated load levels depending on which test is being run. For load testing, different levels of manageable loads will be used to collect data that should correspond to “normal” operation, while the load for stress testing will be set to purposely overwhelm the application server. This means the load will be increased until the application crashes or takes too long to respond to any requests, i.e., longer than 10 s. The endurance tests will run the load tests for an extended period of time; for the purpose of this thesis, this period is limited to between 5 min and 30 min.

Chapter 4

Implementation

This chapter describes the work done and how the model and tests were implemented.

4.1 The server

To separate the code into manageable modules, the server was split into two separate programs: one for polling and one for webhooks. The two servers were implemented using node [28] to achieve a non-blocking behavior. As node is built around a single-threaded event loop, it has one executing thread but assigns blocking tasks to other threads. This model realizes a server that is able to process several requests concurrently and is primarily limited by memory and CPU capacity. The servers offer an HTTP API that provides a resource with a status. The resource’s status starts in a rejected state but is connected to a timer that changes the status to an accepted state after a random period of time between 0 s and 30 s for each resource. For the purposes of this thesis, the resource is considered to be a document waiting to be signed by a third party. This resource waiting time is sent to the client with the event notification to allow for time calculations. Each server provides event notifications about status updates to its clients via its specific event management method.
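The timer-driven resource described above could look roughly as follows. This is a minimal sketch, not the thesis code: the names createResource, schedule, and random are hypothetical, and the scheduler is made injectable only so that the example can be run without waiting for real timers.

```javascript
// Hypothetical sketch: a resource whose status flips from "rejected" to
// "accepted" after a random waiting time between 0 s and 30 s.
const MAX_WAIT_MS = 30000;

// `schedule` and `random` are injectable so the sketch can be exercised
// without real timers; they default to setTimeout and Math.random.
function createResource(onAccepted, schedule = setTimeout, random = Math.random) {
  const resource = {
    status: "rejected",
    waitingTimeMs: Math.floor(random() * MAX_WAIT_MS),
  };
  schedule(() => {
    resource.status = "accepted";
    // The waiting time travels with the notification so that the client
    // can subtract it when computing the response time.
    onAccepted({ status: resource.status, waitingTimeMs: resource.waitingTimeMs });
  }, resource.waitingTimeMs);
  return resource;
}

// Usage with a fake scheduler that records the callback instead of waiting.
const pending = [];
const notifications = [];
const doc = createResource(
  (event) => notifications.push(event),
  (fn, delayMs) => pending.push({ fn, delayMs }),
  () => 0.5
);
const statusBeforeTimer = doc.status; // "rejected"
pending[0].fn();                      // simulate the timer firing
console.log(statusBeforeTimer, doc.status, doc.waitingTimeMs); // rejected accepted 15000
```

With the default arguments the same function uses the real setTimeout, so the status change happens asynchronously on the event loop.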

4.2 The client

A traffic generator was built in node to simulate client traffic. It was also split in two parts for the same reason as for the server. The application creates clients for either the webhook server or the polling server. To achieve scalability in the number of clients, the traffic generator application creates one or more clients. Each client uses the server’s API to access the resource’s status. When a client is notified of an accepted resource status, the client ends its use of the API and terminates. The traffic generator makes heavy use of JavaScript promises [29] to allow the application to send numerous requests concurrently before waiting for a response from the server. When a response is received the appropriate action is taken, which depends on which event management method the client is implementing.
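The promise-based concurrency can be illustrated with a short sketch. It assumes a hypothetical sendRequest function that returns a promise; here it is replaced by a stub so that no server is needed, and it is not the generator used in the thesis.

```javascript
// Hypothetical sketch: start `clientCount` clients at once and wait for all.
// `sendRequest` stands in for the real HTTP call and must return a promise.
function runBurst(clientCount, sendRequest) {
  const inFlight = [];
  for (let clientId = 0; clientId < clientCount; clientId++) {
    // No await here: every request is started before any response is handled.
    inFlight.push(
      sendRequest(clientId).then((response) => ({ clientId, response }))
    );
  }
  return Promise.all(inFlight);
}

// Usage with a stub that resolves immediately instead of contacting a server.
const started = [];
const burst = runBurst(3, (clientId) => {
  started.push(clientId);
  return Promise.resolve("accepted");
});
console.log(started.length); // 3, all requests were started synchronously
burst.then((results) => console.log(results.length)); // 3
```

Because the loop never awaits, the number of requests in flight is bounded only by the burst size, which is what lets a single-threaded generator produce a concurrent burst.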

4.3 Webhook

Figure 4.1 shows an illustration of the webhook model. The implementation of webhook was realized to test the parts of the webhook that are relevant to this thesis.

The server side implementation of the webhook was realized by having two endpoints: one receives subscriptions in the form of HTTP POST requests from clients (i.e., the Event Subscription Endpoint) and the other sends notifications to the clients (i.e., the Event Sender Endpoint) using HTTP GET requests. The subscription endpoint receives a request from a client, with a callback URL and subscription ID. The callback URL is the client’s event receiver endpoint, to which the client wants notifications to be sent when an event occurs. When the callback URL is received the server stores the subscription ID and the callback URL as a key-value pair. The server also waits for a randomized time period (i.e., the resource waiting time), depending on the different tests, and then sends a notification to the URL. The wait emulates the service time of a service realized by the server.

Figure 4.1: Webhook model.
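The server-side bookkeeping of this model can be sketched as below. The names createWebhookServer, notify, and schedule are hypothetical; notify stands in for the HTTP request to the callback URL, and removing the subscription after one notification is an assumption based on each client being interested in a single event.

```javascript
// Hypothetical sketch of the server-side webhook bookkeeping.
// `notify` stands in for the HTTP request to the client's callback URL and
// `schedule` for the timer; both are assumptions made for illustration.
function createWebhookServer(notify, schedule = setTimeout) {
  const subscriptions = new Map(); // subscription ID -> callback URL

  // Event Subscription Endpoint: store the pair, then notify after the wait.
  function subscribe(subscriptionId, callbackUrl, waitingTimeMs) {
    subscriptions.set(subscriptionId, callbackUrl);
    schedule(() => sendEvent(subscriptionId, waitingTimeMs), waitingTimeMs);
  }

  // Event Sender Endpoint: look up the callback URL and push the event.
  function sendEvent(subscriptionId, waitingTimeMs) {
    const callbackUrl = subscriptions.get(subscriptionId);
    if (callbackUrl === undefined) return;
    // Assumption: one event per subscription, so the entry is removed.
    subscriptions.delete(subscriptionId);
    notify(callbackUrl, { subscriptionId, status: "accepted", waitingTimeMs });
  }

  return { subscribe, pendingCount: () => subscriptions.size };
}

// Usage with a fake timer and a fake notifier (no network involved).
const sent = [];
const timers = [];
const webhookServer = createWebhookServer(
  (url, event) => sent.push({ url, event }),
  (fn) => timers.push(fn)
);
webhookServer.subscribe("sub-1", "http://client.example/events", 1200);
const pendingBefore = webhookServer.pendingCount(); // 1
timers.forEach((fn) => fn());                       // simulate the wait elapsing
console.log(pendingBefore, webhookServer.pendingCount(), sent.length); // 1 0 1
```

The server sends exactly one outgoing request per subscription here, which is the property the request counts in the results chapter rely on.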


4.4 Polling

The polling API applied in this thesis was built with two endpoints on the server side (see Figure 4.2). A start endpoint was implemented to start a polling session (hereafter session). A client requests a session to start by sending an HTTP GET request, including a client ID, to the start endpoint.
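A minimal sketch of the two polling endpoints, written as plain functions over an in-memory map, is given here. All names (createPollingServer, start, poll, pollsUntilDone) and the "pending" status label are hypothetical. The helper pollsUntilDone additionally assumes that the first poll is sent one interval after the start request and that a poll arriving exactly at the waiting time already sees the finished session; the thesis does not state either detail.

```javascript
// Hypothetical sketch of the polling server's session bookkeeping.
function createPollingServer(schedule = setTimeout) {
  const sessions = new Map(); // client ID -> session object

  // Start endpoint: create a session and mark it done after the wait.
  function start(clientId, waitingTimeMs) {
    const session = { clientId, status: "pending", waitingTimeMs };
    sessions.set(clientId, session);
    schedule(() => { session.status = "done"; }, waitingTimeMs);
  }

  // Event update endpoint: return the session only once it is done.
  function poll(clientId) {
    const session = sessions.get(clientId);
    return session !== undefined && session.status === "done" ? session : null;
  }

  return { start, poll };
}

// Number of polls a client needs when polling every `intervalMs`
// (assumptions: first poll after one interval, ties count as done).
function pollsUntilDone(waitingTimeMs, intervalMs) {
  return Math.max(1, Math.ceil(waitingTimeMs / intervalMs));
}

// Usage with a fake timer so that nothing actually waits.
const pollTimers = [];
const pollingServer = createPollingServer((fn) => pollTimers.push(fn));
pollingServer.start("client-1", 5000);
const firstPoll = pollingServer.poll("client-1"); // null, event not yet occurred
pollTimers.forEach((fn) => fn());                 // simulate the wait elapsing
const laterPoll = pollingServer.poll("client-1"); // the finished session object
console.log(firstPoll, laterPoll.status, pollsUntilDone(5000, 2000)); // null done 3
```

Every poll before the event costs the server a request that carries no new information, which is why the number of polling requests grows with the waiting time.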

When the server receives a request at the start endpoint it stores the ID and maps it to an object that represents the specific session, along with a variable stating the status of the session. It then waits for a randomized time period (i.e., the resource waiting time); when this period has elapsed the event occurs and the session status is changed to indicate that it is done. The server’s event update endpoint receives an HTTP GET request with a client ID. The server then checks whether there is a session mapped to this ID and, if so, whether the session status is set to done; if it is, the server responds with the session object. After calling the start endpoint the client requests the status periodically with a set interval, which in this implementation is 2 seconds. Once the server responds with an updated status the client ceases sending requests.

Figure 4.2: Polling model.

4.5 Implemented test design

There are several commercial and open source tools for generating HTTP traffic, as mentioned in Section 2.4.7. However, in order to avoid the complex configurations needed to make such tools test both webhooks and polling in a similar fashion, a custom traffic generator was built. Furthermore, a study [30] that evaluated various load testing tools concluded with the recommendation not to use the open source tools when having little or no experience and instead to use commercial options. However, this project did not have sufficient resources to use commercial tools.

The test designs are based on the behavior of the custom built traffic generator, specifically how much traffic it generates and at what rate. These parameters define what test is being run. The measurements were taken during the execution of each test. The parameters used for each of the tests are described in the following subsections.

4.5.1 Load test

The load tests were executed with a load that increased by a factor of 10, starting at 1, until a breaking point was reached. For each instance of the test the traffic generator was configured to send the specified number of requests simultaneously. The numbers of requests were 1, 10, 100, 1000, and 10 000.

4.5.2 Stress test

Stress testing exposes the point where the application fails. Finding the breaking point for each of the methods shows whether their performance is inherently different. Such a difference should be visible as one method being able to handle a higher maximum load than the other.

To find the breaking point of the two methods, the load was increased until the test failed, where failure was either the server crashing or the response time being longer than 10 s. An upper limit occurred at 10 000 clients or higher. At these loads the application sometimes crashed. Moreover, at 15 000 clients the application crashed every time. The range of loads for the stress tests was therefore chosen to be between 10 000 and 14 000, as the application proved to be unstable at these loads.

4.5.3 Endurance test

An endurance test was performed to test the methods when exposed to a load over a longer period of time and was conducted by sending one burst every second. Each burst consisted of a number of concurrent requests to the server. The number of requests was set based on loads determined in the load test and the number of bursts was set to 300, 900, and 1800 to approximate lengths of 5 min, 15 min, and 30 min. The number of requests per burst was 100, 500, or 1000.
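The relation between burst count, test length, and total number of clients can be checked with a few lines. The helper name endurancePlan is hypothetical, and the one-second burst interval is the assumption under which 300, 900, and 1800 bursts correspond to 5, 15, and 30 min.

```javascript
// Hypothetical helper: derive duration and client count of an endurance run,
// assuming one burst is sent per second.
function endurancePlan(clientsPerBurst, burstCount, burstIntervalMs = 1000) {
  return {
    totalClients: clientsPerBurst * burstCount,
    durationMin: (burstCount * burstIntervalMs) / 60000,
  };
}

const plans = [[100, 300], [500, 900], [500, 1800]].map(
  ([clients, bursts]) => endurancePlan(clients, bursts)
);
for (const plan of plans) {
  console.log(plan.totalClients, plan.durationMin);
}
// 30000 5
// 450000 15
// 900000 30
```

Since each webhook client sends a single request, the total number of clients equals the number of webhook requests reported for the corresponding endurance runs in Table 5.6.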

4.6 Measurement Criteria

This section describes how the performance testing was performed and what factors were taken into consideration. The tests measured CPU and memory usage by the server process, and response time on the clients.

4.6.1 CPU and memory measurements

The CPU and memory usage were measured using the Linux top command. The values received from top are described in [31]. The CPU value shows the task’s share of the elapsed CPU time since the last update as a percentage of total CPU time. The memory value shows a task’s currently used share of available physical memory. The command was used in a bash script to query the CPU and memory usage for the server process every second during each test. This data was then used to plot a graph of these metrics for every second of the test.
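Extracting the two values from one line of top output could be done as in the following sketch. It assumes that top was run in batch mode (for example `top -b -n 1 -p <pid>`) with the default procps column layout; the parser name and the sample line are made up for illustration and are not taken from the thesis scripts.

```javascript
// Hypothetical parser for one process line of `top` batch output, assuming
// the default procps column order:
// PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
function parseTopLine(line) {
  const fields = line.trim().split(/\s+/);
  if (fields.length < 12) {
    throw new Error("unexpected top line: " + line);
  }
  return {
    pid: Number(fields[0]),
    cpuPercent: Number(fields[8]),
    memPercent: Number(fields[9]),
    command: fields.slice(11).join(" "),
  };
}

// Made-up sample line, not a measurement from the thesis.
const sampleLine = " 4242 user      20   0  612340  45120  28000 S  12.5   0.3   0:01.23 node";
const parsedSample = parseTopLine(sampleLine);
console.log(parsedSample.cpuPercent, parsedSample.memPercent);
```

Collecting one such record per second yields the per-second series that the CPU and memory plots are drawn from. Locales that print a decimal comma would need an extra conversion step.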

4.6.2 Response time measurements

The traffic generator records a timestamp when each client sends its request for a resource. When the client receives the event notification, the timestamp and the provided resource waiting time are subtracted from the current time to get the elapsed time. This time is stored in memory on the client and logged to a file on test completion.
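As a small worked example of this calculation, the sketch below subtracts the request timestamp and the resource waiting time from the time of notification. The function names and the timestamps are illustrative assumptions, not measured data.

```javascript
// Hypothetical sketch of the response time calculation.
// elapsed = time of notification - time of request - resource waiting time
function elapsedMs(sentAtMs, receivedAtMs, waitingTimeMs) {
  return receivedAtMs - sentAtMs - waitingTimeMs;
}

// Average over all clients of one test run.
function averageMs(samples) {
  if (samples.length === 0) return NaN;
  return samples.reduce((sum, value) => sum + value, 0) / samples.length;
}

// Illustrative timestamps only.
const elapsedSamples = [
  elapsedMs(1000, 13083, 12000), // 83
  elapsedMs(1000, 6048, 5000),   // 48
];
console.log(averageMs(elapsedSamples)); // 65.5
```

Subtracting the waiting time removes the artificial delay before the event, so that what remains is the overhead added by the notification method itself.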

4.7 Testbed

The testbed consisted of a computer as described in Table 4.1. The computer ran Ubuntu version 18.04.4 LTS operating system.

Table 4.1: Testbed computer

CPU                              Memory
Intel Core i5 3570K @ 4.0 GHz    Corsair DDR3 @ 1333 MHz, 16.0 GB

Chapter 5

Results

This chapter presents the results for the different categories of tests. Each category is presented in its own section (see Sections 5.1, 5.2, and 5.3). Each section shows the results with comments on the response time, CPU, and memory for the given type of test.

5.1 Load test

Table 5.1 shows the average response time for a webhook and a polling request when the server is exposed to different loads. The table also shows the difference factor between the two methods. Table 5.2 shows the total number of requests that were sent to the application during the test at each of the different load levels, for each method. As can be seen in Table 5.1, webhooks were significantly faster with regard to response time, although the gap between webhooks and polling seemed to decrease when the application was exposed to higher loads. As shown in Table 5.2, the number of requests was a factor of 6 to 8 lower for webhooks, and this factor grew with increasing load. The exception is the first test with only one client, where a single webhook request was sent against 10 polling requests. Note that in each case only one webhook request was sent per client.

Table 5.1: Average response time measured during load testing.

Load (number of clients)   Polling response time (ms)   Webhook response time (ms)   Difference factor
1                          1109                         83                           13.36
10                         604                          48                           12.58
100                        1115                         139                          8.03
1000                       1916                         922                          2.08

Table 5.2: Total number of requests

Load (number of clients)   Polling requests sent   Webhook requests sent   Difference factor
1                          10                      1                       10.00
10                         61                      10                      6.10
100                        777                     100                     7.77
1000                       8059                    1000                    8.06
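The difference factor in these tables is the polling value divided by the webhook value. The sketch below recomputes it for the request counts in Table 5.2; the function name is hypothetical. The response time factors in Table 5.1 cannot always be reproduced exactly in this way from the rounded averages shown.

```javascript
// Difference factor as used in the tables: polling value / webhook value,
// formatted with two decimals. The function name is hypothetical.
function differenceFactor(polling, webhook) {
  return (polling / webhook).toFixed(2);
}

// Request counts from Table 5.2: [clients, polling requests, webhook requests].
const table52 = [
  [1, 10, 1],
  [10, 61, 10],
  [100, 777, 100],
  [1000, 8059, 1000],
];
const factors = table52.map(([, polling, webhook]) => differenceFactor(polling, webhook));
console.log(factors.join(" ")); // 10.00 6.10 7.77 8.06
```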

5.1.1 CPU

Tests were executed for loads of 1 client and 10 clients. However, the resulting plots did not show any difference between the methods; hence these plots are not included here, see Appendix A.1. Figures 5.1 and 5.2 show the CPU usage of the application each second during the testing for the specified load.

Both event notification methods were tested for each load. The plots for 100 and 1000 clients show CPU usage from 0% to 50% on the y-axis for easy comparison.

The CPU usage was generally lower when using webhooks with 100 and 1000 clients. Each test shows a more even CPU usage for webhooks compared to polling; polling is more volatile. The results for polling all show high spikes throughout the test. For 100 and 1000 clients the peak CPU usage for polling comes after the initial burst. This is to be expected, since each client first sends a single request to start its session and only then begins the repeated polling, after which the number of active clients decreases as events occur. The webhook results show that the CPU usage peaks at the initial burst before becoming lower and more even. This is expected for a similar reason: all clients initially subscribe and then wait for the event, so more and more clients finish as time passes. The differences in the results are more apparent for 100 and 1000 clients than for 1 and 10 clients.

Figure 5.1: CPU usage of one burst of 100 clients.

Figure 5.2: CPU usage of one burst of 1000 clients.


5.1.2 Memory

The memory usage was tested for each load, but the results did not show any clear differences, although they differed slightly more as the load grew. Figure 5.3 shows the memory usage of the application each second when it was exposed to 1000 client requests for each method. The results show very similar memory usage between the two methods. The plots for 1, 10, and 100 clients showed no major differences and are thus not presented here; see Appendix A.3 to A.6.

Figure 5.3: Memory usage of one burst of 1000 clients.


5.2 Stress test

Similarly to the tables in Section 5.1, Tables 5.3 and 5.4 show the average response times and total number of requests sent for both methods. The response times for webhooks are lower than for polling, although by a smaller factor than in the load tests. The number of requests differs in that polling sends more requests than webhooks. The difference factor does not change as much as in the load tests in Table 5.2 but stays between seven and eight. Tests were also performed with a load of 15 000 clients but the application crashed for both webhook and polling.

Table 5.3: Average response time for stress test

Load (number of clients)   Polling response time (ms)   Webhook response time (ms)   Difference factor
10 000                     9212                         5750                         1.60
11 000                     9441                         7187                         1.31
12 000                     9099                         6730                         1.35
13 000                     12 783                       10 850                       1.18
14 000                     11 417                       10 520                       1.09

Table 5.4: Total number of requests sent for stress test

Load (number of clients)   Polling requests sent   Webhook requests sent   Difference factor
10 000                     77 264                  10 000                  7.73
11 000                     85 095                  11 000                  7.74
12 000                     86 262                  12 000                  7.19
13 000                     97 639                  13 000                  7.51
14 000                     98 448                  14 000                  7.03


5.2.1 CPU

The graphs in Figures 5.4 to 5.8 are similar for the most part. Webhooks seem to experience a higher CPU usage for the initial burst than polling. Polling seems to experience more spikes, making it more volatile. Both methods continue to execute after the duration of the test. The behaviour does not seem to change much as the load increases from 10 000 to 14 000.

Figure 5.4: CPU usage of one burst of 10 000 clients.


Figure 5.5: CPU usage of one burst of 11 000 clients.

Figure 5.6: CPU usage of one burst of 12 000 clients.


Figure 5.7: CPU usage of one burst of 13 000 clients.

Figure 5.8: CPU usage of one burst of 14 000 clients.


5.2.2 Memory

Memory usage for polling is slightly higher than for webhooks in each test case, as shown in Figures 5.9 to 5.12. Across the different loads, the memory usage is sometimes lower for a test with more clients. Overall, the memory usage does not seem to be strongly affected when the load is increased.

Figure 5.9: Memory usage of one burst of 11 000 clients.

Figure 5.10: Memory usage of one burst of 12 000 clients.


Figure 5.11: Memory usage of one burst of 13 000 clients.

Figure 5.12: Memory usage of one burst of 14 000 clients.


5.3 Endurance test

Table 5.5 shows the average response time for the two methods at the different loads. Table 5.6 shows the total number of requests sent during each test for both methods. The burst rate column specifies the client bursts in the format (number of clients / second) x number of bursts.

Table 5.5 shows a lower response time for webhooks for every burst rate. The response time was higher for both methods when the number of clients increased. Increasing the number of bursts did not seem to affect webhooks but showed lower response times for polling. The number of requests sent was higher for polling in every case. The difference factor for requests sent stayed between roughly five and eight, with the lowest value at 1000 clients per second.

Table 5.5: Average response time

Burst rate    Polling response time (ms)   Webhook response time (ms)   Difference factor
100 x 300     1060                         35                           30.59
1000 x 300    4565                         458                          9.97
500 x 300     2054                         220                          9.35
500 x 900     1801                         224                          8.05
500 x 1800    1259                         235                          5.35

Table 5.6: Total number of requests sent

Burst rate    Polling requests sent   Webhook requests sent   Difference factor
100 x 300     238 007                 30 000                  7.93
1000 x 300    1 455 244               300 000                 4.85
500 x 300     1 036 268               150 000                 6.91
500 x 900     3 227 703               450 000                 7.17
500 x 1800    6 875 825               900 000                 7.64


5.3.1 CPU

Figures 5.13 to 5.17 show the CPU usage of the application during an extended runtime and when exposed to a number of client requests each second. The combinations of runtime and clients per second were 5, 15, and 30 min with 500 clients each second and 5 min with 100 and 1000 clients each second. The results show a stable CPU usage for the two methods, where the application using webhooks uses slightly more CPU than the one using polling. When running the test consisting of 1000 clients per second during a 5 min runtime, the application using polling took close to twice the expected time to finish.

Figure 5.13: CPU usage of 300 bursts of 500 clients each.

Figure 5.14: CPU usage of 900 bursts of 500 clients each.

Figure 5.15: CPU usage of 1800 bursts of 500 clients each.

Figure 5.16: CPU usage of 300 bursts of 100 clients each.

Figure 5.17: CPU usage of 300 bursts of 1000 clients each.

5.3.2 Memory

Figures 5.18 to 5.22 show the memory usage of the application during the different bursts. When the application was exposed to 300 bursts of 100 clients per second the memory usage was very similar between webhook and polling. Except for that test case, the webhook seems to utilize less memory than polling. Neither of the methods seemed to be affected by the duration of the test.

Figure 5.18: Memory usage of 300 bursts of 100 clients each.
