
COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS

STOCKHOLM, SWEDEN 2017

Performance Testing and Response Time Validation of a Financial Real-Time Java Application

EMMY RYRSTEDT

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Performance Testing and Response Time Validation of a Financial Real-Time Java Application

EMMY RYRSTEDT

Master in Computer Science
Date: August 1, 2017
Supervisor: Karl Meinke
Examiner: Johan Håstad

Swedish title: Prestandatestning och svarstidsvalidering av ett finansiellt realtids Java-system

School of Computer Science and Communication


Abstract

System performance determines how fast a system can deliver its services when it is exposed to different loads. In real-time computing the system performance is a critical aspect, since the usefulness or correctness of a response from a real-time system depends not only on the content of the response, but also on when it is delivered. If the response is delivered too fast or too slow it is considered an error and the system might go into a bad state, even if the value of the response actually is correct. Even though timing is a crucial aspect of real-time computing, it is hard to find any established methods on how to measure and evaluate the performance of a real-time system in terms of timing.

This report strives to contribute to development in this research area by describing a project that investigates how to scientifically measure and report the timing performance of a financial real-time Java application. During the project a tool is implemented in a foreign exchange system that can perform time measurements of different components in the system at application level. Experiments with variations of input values are constructed and executed to validate the system performance under different loads, by analyzing the measurements.

The results from the experiments give a ranking of how much various factors impact the performance of the system, and show how it is possible to find threshold values and bottlenecks by studying the value distributions and maximum values.

The developed method can be used to compare the performance effects of different factors and to compare the system performance for different parameter values. The method proves to be a useful way to measure and validate the performance of a financial real-time Java application.


Sammanfattning

Systemprestandan bestämmer hur snabbt ett system kan leverera sina tjänster när det utsätts för olika belastningar. Vid realtidsberäkning är systemets prestanda en kritisk aspekt av funktionaliteten, eftersom nyttan av ett svar från ett realtidssystem inte bara beror på svarets innehåll utan även när det levereras. Trots att timing är en viktig aspekt i realtidssystem är det svårt att hitta några etablerade metoder för hur man mäter och utvärderar prestandan hos ett realtidssystem när det gäller timing.

Denna rapport strävar efter att bidra till utvecklingen inom detta forskningsområde genom att beskriva ett projekt som undersöker hur man på ett vetenskapligt sätt kan mäta och rapportera tidsprestandan för en finansiell realtids Java-applikation. Under projektet implementeras ett verktyg i ett valutahandelssystem som på applikationsnivå utför tidsmätningar av olika komponenter i systemet. Experiment med variationer av inmatningsvärden konstrueras och exekveras för att validera systemets prestanda under olika belastningar, genom att analysera resultaten från tidsmätningarna.

Resultaten från experimenten ger en rangordning av hur olika faktorer påverkar systemets prestanda, och visar hur man kan hitta gränsvärden och flaskhalsar i systemet, genom att studera hur värdena var distribuerade och dess maximumvärden.

Den utvecklade metoden kan användas för att jämföra prestandaeffekterna av olika faktorer och för att jämföra systemets prestanda med olika parametervärden. Metoden visar sig vara ett användbart sätt att mäta och validera prestandan hos en finansiell realtids Java-applikation.


Contents

1 Introduction
   1.1 Background
   1.2 Problem Definition
   1.3 Purpose and Objectives
   1.4 Sustainability and Ethical Aspects
   1.5 Method
   1.6 Scope and Limitations

2 Background
   2.1 Real-Time Systems
   2.2 Software Performance Testing
       2.2.1 Performance Metrics
       2.2.2 Evaluation Techniques
   2.3 Java Performance Fundamentals
       2.3.1 Adaptive Optimization and Just-in-time Compilation
       2.3.2 Java Garbage Collection
       2.3.3 High Dynamic Range Histograms
   2.4 The Foreign Exchange Market
   2.5 The Quasar eFX System
       2.5.1 Hardware and System Setup

3 Method
   3.1 Performance Evaluation Method; Evaluation Criteria
   3.2 Evaluation Criteria
       3.2.1 Study Goals and System Boundaries
       3.2.2 System Services and Possible Outcomes
       3.2.3 Performance Metrics
       3.2.4 System and Workload Parameters
       3.2.5 Selected Factors
   3.3 Performance Evaluation Method; Implementation

4 Implementation
   4.1 Evaluation Technique
       4.1.1 Measurement Tool
   4.2 Workload Simulation
       4.2.1 Simulation of Incoming Liquidity
       4.2.2 Simulation of Service Requests
   4.3 Experimental Design and Analysis of Test Results
       4.3.1 Test Phase 1; Identifying Influencing Factors
       4.3.2 Test Phase 2; Finding Threshold Values and Bottlenecks

5 Results
   5.1 Factor Effects
   5.2 Validated Response Times
   5.3 Significant Maximum Values
   5.4 Threshold Values and Bottlenecks
   5.5 Value Distributions

6 Discussion
   6.1 Discussion of First Phase Results
   6.2 Discussion of Second Phase Results
   6.3 Reliability and Sources of Errors

7 Conclusions
   7.1 Future Work

Bibliography


Chapter 1

Introduction

System performance is a major aspect of the quality of a software system, which makes performance testing and performance evaluation an important part of the development and maintenance of a software system.

Since it is crucial to ensure that a system can deliver the agreed functionality to its customers, the biggest effort in the testing process of a software system is often dedicated to functionality testing. It has however been found that the errors detected after a software program has been released often are related to performance issues, i.e. that the system fails to deliver when it is exposed to certain workloads [1].

Real-time systems are a special category of systems that have time constraints on how fast they need to deliver the requested functionality for a response to be valid. These systems respond to real-world events, and if the response is delivered too fast or too slow it is considered an error, even if the value of the response in fact is correct [2].

1.1 Background

The principal of this project is a company called Aphelion. Aphelion's business idea is to develop and market real-time systems that increase the efficiency of currency trading Over The Counter (OTC). The Quasar eFX system that is going to be used for the experiments during this project is currently marketed globally to banks and other financial institutions that operate in the foreign exchange market. The speed of information exchange is business-critical to Aphelion, and if the system delivers information at the wrong time it might cause financial damage to their customers. The system handles real-time prices for currencies, and it is important to always be in sync with the market by delivering correct prices at the right time. If prices are delivered too fast or too slow they can be rejected or misused.

1.2 Problem Definition

The performance of a real-time system depends on how well it succeeds in delivering its services within its time constraints. Even though timing is directly crucial for a real-time system, it is hard to find any established methods on how to measure and evaluate the performance of a real-time system in terms of timing.

The performance of a system depends on a wide range of factors: predefined values such as programming language and underlying hardware, but also event-based inputs like the number of simultaneous users and what kind of resources are requested. The large number of influencing factors makes measuring and evaluating the performance a complicated task. The system that is going to be used for the evaluation is written in Java and runs on a Java Virtual Machine (JVM). The JVM performs both code compilation and deallocation of unused memory during run time, which makes the execution time non-deterministic and adds complexity to the performance evaluation [3].

The problem that will be studied in this project is formulated as the following research question:

How to scientifically measure and report the timing performance of a financial real-time Java application?

1.3 Purpose and Objectives

One of the principal's objectives for this project is to assure that the system, within a margin, is capable of processing the information load put on the system under production-like conditions without compromising any relevant response times. It is also desired to reveal where in the system the biggest bottlenecks in the information flow are located.

The main objective of the project is to establish a scientifically based methodology for measuring and reporting the performance of a real-time Java application in terms of response time. Furthermore, one objective is to provide a report of the response times of the different parts of the examined system. A tool should be provided that can be used to produce similar reports of the performance of the system, to ensure that the desired performance is maintained during further development of the system.

The purpose is for the report to be of interest to people working or researching in the area of software performance testing. The report can be of special interest to people evaluating the performance of a real-time Java application, and to people looking for a method to analyze the internal latencies in a system.

1.4 Sustainability and Ethical Aspects

An electronic system that replaces or complements a manual system increases the accessibility and speed of the service it provides. An electronic system can open up a service to more users and handle bigger quantities of use. In the area of foreign exchange, electronic trading systems make it easier to trade currencies, which makes it easier to exchange goods and other services between countries.

An electronic system can be built with a rigorous safety system, and avoids errors that arise from the human factor. With a system that is safe and easy to track and monitor it is possible to contribute to sustainable development.

There is always a risk that users search for incorrect information provided by a system, which can be used in an unethical way for economic gain. Since the correctness of the information provided by a real-time system depends on when it is delivered, there is a risk that information that is delivered too early or too late is rejected or misused.

On the foreign exchange market there has previously been a problem with trading robots that scan the market for prices that lie off market, which can be exploited to buy and sell currency at a profit without being exposed to any risk.


1.5 Method

The project tested the hypothesis that by measuring and analyzing response times it is possible to find bottlenecks and parts critical to the performance of a system. The hypothesis was tested by implementing and using a tool that measures the response time of different parts of a specific system. The test results were analyzed statistically to see whether it was possible to distinguish any bottlenecks in the system and to find which parts are sensitive to changes in workload and input values.

The workflow of the project was as follows:

1. The project began with a study of previous work in the area of performance testing, to find out what statistical measures should be calculated and how to present test results when evaluating performance in terms of timing.

2. When it had been found how the performance could be measured and evaluated, a measurement tool was developed as a software component that could be integrated and used in a system at application level.

3. It was studied how experiments could be constructed to give a good insight into the system performance and the factors that affect it.

4. When a suitable design for the experiments had been found, the method was tested on the Quasar eFX system.

5. Finally, the test results were evaluated to see whether the performance could be validated and whether it was possible to find any bottlenecks or threshold values.

1.6 Scope and Limitations

The performance of a system can be measured and analyzed in many different ways. This project was limited to measuring the performance in terms of timing. The project focused on how to create valuable test cases for a specific system, which means that some of the findings might not apply to other types of software systems. As a practical limitation it was not possible to do the tests in a production environment. The tests were instead carried out on a test server with the same setup as the servers in production.

A problem that often arises when doing measurements in a computer system is that the system that is being measured is also the system that performs the measurements [4]. An evaluation of the performance impact of these processes is out of scope for this project.


Chapter 2

Background

In this chapter the theoretical background to the project and its research area is presented.

The first section, 2.1, gives an introduction to real-time systems and describes basic concepts that are commonly used in this area. Section 2.2 describes general concepts in performance testing, such as performance metrics and evaluation techniques. Section 2.3 presents Java fundamentals that impact the system performance and are specific to Java applications. Section 2.4 gives an introduction to the foreign exchange market and its basic concepts. The last section, Section 2.5, describes the Quasar eFX system that was used during the project.

2.1 Real-Time Systems

"The goal of a real-time system is to respond to real-world events before a measur- able deadline, or within a bounded time frame." Bruno and Bollella [2]

The goal of a real-time system is to have deterministic response times and to be able to guarantee that all deadlines are always met. A deadline can be either the processing time of a system request, a relative deadline [2], or the point in time at which the system responds, an absolute deadline [2], both of which are important in a real-time system. A response that is delivered too early or too late is considered an error, even if the value of the response in fact is correct [5]. Due to the time sensitivity in real-time systems, latency is often considered an important measurement [2].

Latency is defined as a measure of the time interval between when the system receives an event and when the system responds to that event [2].

When working with real-time requirements we often distinguish between soft real-time and hard real-time. If a soft deadline is missed, the results can still have some value, and a soft real-time system can tolerate missing a few deadlines without going into a bad state [6]. If a hard deadline is missed the result no longer has any value, and the system will directly go into an abnormal state [6]. If a system has at least one hard deadline it is called a hard real-time system [6].


2.2 Software Performance Testing

While functionality testing is used to validate that a system is functioning correctly, performance testing is used to validate that a system is able to deliver the expected functionality at a fast rate [7]. During performance testing the system performance is measured while the system is subject to different loads [4]. Due to the wide range of different types of software applications there is no standard measure of performance [7]. The choice of suitable evaluation techniques, performance metrics, and representative workloads is therefore part of the evaluation.

2.2.1 Performance Metrics

As a starting point for an evaluation, a measurable quantity of performance should be defined that can describe the characteristics of the performance of a system. Metrics that are commonly used when evaluating the performance of a system are response time, throughput, and resource utilization.

• Response time can be defined as the time it takes for a system to perform a particular request [4]. The response time can be measured from the time that a user initiates a request until the system responds to the request [7].

• The throughput of a system can be defined as the rate, or requests per unit of time, at which the system completes a particular request [4][7]. The throughput is often measured as activities per second, e.g. bits per second (bps) or Transactions Per Second (TPS) [7].

• Resource utilization can be defined as the proportion of time that a resource is busy processing a request [4][7]. The resource utilization can for example be used to evaluate the performance of a CPU or I/O device.

2.2.2 Evaluation Techniques

The three general techniques that are used to evaluate the performance of a system are simulation, analytical modeling and measurement [7]. In general it is easier to convince others of the results when they come from real measurements [7]. Analytical modeling and simulation evaluations are therefore often based on previous measurements. Analytical modeling and simulations can be used to validate measurements, or when measurements of a real system are not possible, for example when designing a new system.

To measure the duration of particular events within an application, a specific measurement program can be created and built into the application. If the measurement tool is created from scratch it is recommended to include the following measurements [4] (a minimal sketch of such a recorder follows the list):

• Start and end time of the measurement period in the same time zone as all other measurements

• Largest response time

• Smallest response time

• Number of response times observed

(12)

6 CHAPTER 2. BACKGROUND

• Rate of job completions = (number of completed response times)/(measurement end time – measurement start time)

• Average response time

• Standard deviation of the response time
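Below is a minimal sketch, not taken from the thesis, of a recorder that keeps the statistics listed above; the class and method names are hypothetical illustrations.

```java
/**
 * Minimal sketch of a measurement component that records the recommended
 * statistics: count, smallest/largest response time, mean, standard
 * deviation and rate of job completions.
 */
public class ResponseTimeRecorder {
    private final long measurementStartNanos = System.nanoTime();
    private long count = 0;
    private long minNanos = Long.MAX_VALUE;
    private long maxNanos = Long.MIN_VALUE;
    private double mean = 0.0;   // running mean (Welford's algorithm)
    private double m2 = 0.0;     // running sum of squared deviations

    /** Record one observed response time in nanoseconds. */
    public synchronized void record(long responseTimeNanos) {
        count++;
        minNanos = Math.min(minNanos, responseTimeNanos);
        maxNanos = Math.max(maxNanos, responseTimeNanos);
        double delta = responseTimeNanos - mean;
        mean += delta / count;
        m2 += delta * (responseTimeNanos - mean);
    }

    /** Rate of job completions = completed responses / elapsed measurement time. */
    public synchronized double completionsPerSecond() {
        double elapsedSeconds = (System.nanoTime() - measurementStartNanos) / 1e9;
        return elapsedSeconds > 0 ? count / elapsedSeconds : 0.0;
    }

    public synchronized long smallestNanos() { return minNanos; }
    public synchronized long largestNanos()  { return maxNanos; }
    public synchronized long observations()  { return count; }
    public synchronized double meanNanos()   { return mean; }

    public synchronized double stdDevNanos() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
    }
}
```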

2.3 Java Performance Fundamentals

A Java program is run by a Java Virtual Machine (JVM), which means that the performance of a Java program depends on how the JVM manages and optimizes the program execution and how it utilizes the underlying hardware and operating system. The Java Standard Edition platform includes the Java HotSpot VM [8], which uses techniques like adaptive optimization and just-in-time (JIT) compilation, described in Section 2.3.1, to maximize the operating speed of the program.

Java is an object-oriented programming language, where all objects are allocated from the heap and managed by the JVM's garbage collector, which is further described in Section 2.3.2.

In Section 2.3.3 the org.HdrHistogram package is introduced, which is created as an extension to the standard Java libraries, to be used for recording and analyzing sample data in latency- and performance-sensitive applications, such as real-time systems.

2.3.1 Adaptive Optimization and Just-in-time Compilation

An interpreter translates each line of a program into machine code as that line is executed, while a compiler translates the program into machine code before it is executed. A compiled program usually runs faster than an interpreted program, since the code can be optimized and customized to utilize the underlying hardware in an effective way [9].

Java is designed to take advantage of both the platform independence of an interpreted language and the performance advantages of a compiled language [9]. A Java class file is compiled into an intermediate language called Java bytecode that is executed by the JVM [9]. The JVM starts by interpreting the Java bytecode and then continuously analyzes the code during execution to detect critical parts of the program that it dynamically compiles during runtime (just-in-time compilation), to speed up the execution time.

Since it takes time to compile a section of code, the JVM must make a trade-off between the time it takes to compile the code and how much the section of code is used, hence how much time it will save by running it as compiled code instead of interpreting each line of code separately.

In general, it is just a small part of the program that is used almost all of the time [9]. Hence, programs spend most of their time executing small subsections of their code, called critical hotspots, and the performance of an application depends a lot on how fast those sections are executed [9]. Only the critical hotspots in a Java program will be compiled, and by recompiling the code during execution the code can be further optimized according to the program's current execution profile [8].

2.3.2 Java Garbage Collection

The de-allocation of unused memory is done during runtime, which means that the JVM needs execution time and may even have to stop all application threads from executing during some of the garbage collection processes. To avoid the long pause it would take to garbage collect all of the memory, the Java heap is divided into three areas: the young generation, the old generation and the permanent generation.

• The young generation is where all new objects are allocated. Most objects become unreachable quickly and can also be garbage collected quickly [10]. Garbage collection in the young generation is called a minor garbage collection.

• The old generation is where the objects that survive the minor garbage collections will be placed. Garbage collection in the old generation is called a major garbage collection, and is often much slower than the minor collection, since it involves all live objects [10].

• The permanent generation is where metadata and library classes and methods are located and is not used for object allocations [10]. The permanent generation is only garbage collected during a full garbage collection.

There are four garbage collectors available for the Java HotSpot JVM: the Serial GC, the Parallel GC, the Concurrent Mark Sweep (CMS) Collector and the G1 Garbage Collector, which all use different techniques to manage the heap areas.

• The Serial GC performs both the minor and the major garbage collections using a single CPU, and all application threads are stopped until these operations are completed [11]. The Serial GC also uses a mark-compact collection method, to move surviving objects to the beginning of the heap so that new objects can be allocated into a single continuous part of the memory, which allows faster memory allocations [10].

• The Parallel GC uses a multi-threaded young generation collector and can be set to use either a single-threaded or multi-threaded old generation collector. The Parallel GC also compacts the memory in the old generation [10].

• The Concurrent Mark Sweep (CMS) Collector uses the same algorithm as the Parallel GC for the young generation collection. During the old generation collection most of the work is done concurrently with the application threads. The CMS collector is seen as a low-pause collector, but can create problems with memory fragmentation, since it usually does not perform any compacting [10].

• The G1 Garbage Collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector designed to be the long-term replacement for the CMS collector [10]. The G1 collector uses a different Java heap layout from the other garbage collectors. The young generation and the old generation do not have physically separate memory spaces. Instead the heap is split into equal-sized regions, and each generation is a set of regions, which makes the young generation easy to resize [11].

The choice of an appropriate garbage collector and its configuration can have a big impact on the performance of an application [11].
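As a side note not drawn from the thesis, the collector is selected with JVM startup flags (for example -XX:+UseG1GC), and the java.lang.management API can be used to observe how much time the selected collector spends collecting during a test run. A minimal sketch:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints which garbage collectors the running JVM uses and how much time
// they have spent collecting so far. Useful when comparing test runs that
// were started with different collector flags.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms accumulated collection time%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```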


2.3.3 High Dynamic Range Histograms

The High Dynamic Range Histogram [12] (HdrHistogram) is a Java package originally created by Gil Tene for creating histograms in latency- and performance-sensitive applications, with support for recording and analyzing sample data with a configurable range and decimal precision [12]. The HdrHistogram operates at a fixed memory size, independent of the number of recorded values, and records sample values without the need for any allocation operations [12]. The HdrHistogram also executes at a constant speed by using indexed locations instead of iterating or searching through the set of recorded values [12].
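As an illustration of the package's basic API, here is a sketch that assumes the HdrHistogram library is on the classpath; the measured operation and the chosen bounds are made up for the example.

```java
import org.HdrHistogram.Histogram;

// Record latencies into an HdrHistogram and read out common statistics.
public class LatencyHistogramExample {
    public static void main(String[] args) throws InterruptedException {
        // Track values up to one hour in nanoseconds, with 3 significant digits.
        Histogram histogram = new Histogram(3_600_000_000_000L, 3);

        for (int i = 0; i < 1_000; i++) {
            long start = System.nanoTime();
            Thread.sleep(1);                       // placeholder for the measured operation
            histogram.recordValue(System.nanoTime() - start);
        }

        System.out.println("max (ns):    " + histogram.getMaxValue());
        System.out.println("mean (ns):   " + histogram.getMean());
        System.out.println("99.9th (ns): " + histogram.getValueAtPercentile(99.9));
        // Print the full percentile distribution, scaled to microseconds.
        histogram.outputPercentileDistribution(System.out, 1000.0);
    }
}
```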

2.4 The Foreign Exchange Market

The foreign exchange (FX) market is the market where participants are able to buy, sell and exchange currencies at current or determined prices. The most actively traded financial instruments on the FX market are spot, forward and swap [13].

• A Spot represents an "on the spot" trade, and is the exchange of two currencies at a rate agreed on the trade date for settlement in one or two business days, depending on which currencies are traded [14].

• A Forward is a contract for a currency trade that settles on a different date, most often further in the future, than a spot trade [14]. A forward is priced as the spot price plus or minus the interest rate differential between the traded currencies [14].

• A Swap is available in several different types. The general foreign exchange swap involves the exchange of two currencies on a specific date (the short leg), and a reverse exchange of the same two currencies at a date further in the future (the long leg), where both rates are agreed on at the time of the contract [13]. Another type of swap is a currency swap, where the parties exchange interest payments and/or principal amounts in different currencies [13].

There is no central marketplace for foreign exchange, which instead is traded over the counter (OTC). Since the market includes banks and other financial institutions that operate in different time zones, the market is open 24 hours a day, 5 days a week.

The FX market is divided into levels of access, depending on the amount of money that is traded. The top tier is the wholesale market, called the interbank market, where the largest FX dealers trade with each other. The second tier consists of retail dealers, usually banks and financial institutions, that use the larger dealers as liquidity providers to improve their own offered liquidity and prices, which they distribute to their customers over the phone or, more commonly, the web. The software that delivers the prices to the corporate customers by connecting them to the banks over the web is called single-bank and multi-bank portals. The electronic systems that banks use to trade with each other are called electronic communication networks (ECNs).

The FX market is considered the largest and most liquid financial market in the world, with an average turnover of 5.1 trillion (10¹²) US dollars per day in April 2016, according to the Bank for International Settlements (BIS) [13]. The USD, EUR and JPY are the world's most traded currencies, and were on one side of 88%, 31% and 21% of all trades in April 2016, respectively [13]. The most actively traded currency pair was USD/EUR, which represented 23% of the trades, followed by USD/JPY with 18% and USD/GBP with 9%.


2.5 The Quasar eFX System

The Quasar eFX system is an end-to-end suite for electronic foreign exchange trading for banks and other financial institutions that operate in the FX market. The system performs market data aggregation, price distribution and order transactions. Most of the customers are second-tier FX banks that use the system to distribute their liquidity to their customers. The system enables automated trading by aggregating the incoming market prices from liquidity providers (LPs) and electronic communication networks (ECNs), molding the liquidity into customized price streams and distributing the prices through distribution channels, such as single- and multi-bank portals. Figure 2.1 shows an overview of the whole Quasar eFX suite, with the connections to input channels (liquidity providers and electronic communication networks) on the left-hand side, the core system in the middle, and connections to distribution channels on the right.

Figure 2.1: The Quasar eFX Suite

The system is managed through a desktop client that can be found at the top of Figure 2.1 as Trader Client. The system also needs to be connected to a Credit System, to check the end customer's creditworthiness before a trade can be made, and a Trade Capture system, to book all the trades that are done within the system.

The system uses a modular service-oriented architecture, where the business processes are exposed as services, and the service components communicate with each other via a proprietary data distribution system.

2.5.1 Hardware and System Setup

All of Aphelion's customers have their own application server, in a server park in London, where their version of the system is deployed. The servers are provided by Aphelion and they all have the same hardware components, operating system and JVM installed. To test the system performance, a separate test server is prepared, at the same location and with the same hardware and JVM settings as the customer servers, and on which the latest version of the system is deployed.


Chapter 3

Method

In this chapter the general performance evaluation method that was used is presented.

The study followed the systematic approach to performance evaluation presented by Jain [7]. This approach includes 10 different steps, which we have decided to divide into three parts for this project: (i) definition of evaluation criteria, (ii) implementation, and (iii) presentation of results.

The first part of the method includes the first five steps of the systematic approach [7], and is presented in Section 3.1. This part describes how we should set the base of the study, by defining what constitutes the system, and by defining the study scope and limitations. The implementation of the first part of the method is described in Section 3.2.

The second part of the method includes steps 6-9 of the systematic approach [7], and is presented in Section 3.3. This part describes how the practical parts of the study should be carried out. The implementation of the second part is presented in Chapter 4, and is where the measurement tool, workload and experiments were created.

The last part of the method is also the 10th and last step of the systematic approach [7]. In the final step of the study the results are communicated to the principal, to decide if the results are sufficient or if another cycle is desired, to redefine the system boundaries or to include other factors and performance metrics that were not considered before. The final results from this study are presented in Chapter 5.

3.1 Performance Evaluation Method; Evaluation Criteria

This is the first part of the performance evaluation method, and includes the first five steps of the systematic approach presented by Jain [7], which are described below:

1. State the goals of the study and define the system boundaries. Given the same set of hardware and software, the definition of the system in the context of the evaluation might differ depending on the goals of the study. In this study the goal was to evaluate the internal latency at application level, and the system was delimited to the application server at software level.

2. List system services and possible outcomes. The next step in analyzing the system is to list which services it provides. The list should also include all the outcomes that are possible when these services are requested. The list is then used when selecting the performance metrics and the workloads. The services provided by the system that was studied in this project can be found in Section 3.2.


3. Select performance metrics. The criteria to compare the performance are called metrics. The metrics are usually related to the speed, accuracy or availability of the services. The criteria to evaluate the performance by are selected according to the defined system and the listed services, and should facilitate the study goals. In this study, the performance was measured by the time it took to perform a specific request, called response time or latency.

4. List system and workload parameters. The next step in performance projects is to list all the parameters that can affect the performance. The list is divided into system parameters and workload parameters, where system parameters include both hardware parameters and software parameters. Parameters may also be added later if it is noticed during the evaluation process that there are more parameters that can affect the performance.

5. Select factors and their values. The parameters that will be varied during the evaluation are called factors and should be decided during this step. The values these factors are going to be tested at are called levels, and are also selected during this step.

3.2 Evaluation Criteria

This section describes how the first part of the performance evaluation method presented above, in Section 3.1, was implemented.

3.2.1 Study Goals and System Boundaries

In this study the goal was to evaluate the response times and the internal latency of the core system at application level. The response time measurements start when a request is registered in the system and end when the system sends the response. The internal latency of a system process or component was measured either from when the process is requested until a response is sent, or from when a process is started until the process has finished executing. Network latency was not measured in this study, and the hardware latency was included in the application measurements and not evaluated separately.

3.2.2 System Services and Possible Outcomes

The system offers three base services, request for price (RFQ), request for stream (RFS), and market depth (MD), that can be requested through the distribution channels:

• RFQ is a request to get a price on a certain instrument, where base currency, quote currency, side, settlement date and amount must be provided, e.g. EUR/USD BUY SPOT 1000000. If the request is successful, the system will respond with a price on the requested instrument and continue to provide price updates until a deal is requested, the request is canceled, or until the time limit for the order has been exceeded. If the request for some reason is denied, the system will respond with an error message.


• RFS is a request to get price updates continuously streamed for a certain instrument and amount. If the request is successful, the system will send prices for the requested instrument until the stream is canceled.

• Market depth is a request to get price updates streamed for the whole market depth of an instrument. If the request succeeds, the system will send a continuous stream of price updates for all pre-defined amounts for that instrument.

When a customer has been provided a price from the system, by requesting one of the services above (RFQ, RFS or market depth), they can either request to trade on a price or request the system to stop sending prices by canceling the order.

• Trade request can be sent to request a deal on one of the provided prices.

• Cancel request can be sent to request the system to cancel an order, and stop providing prices for that instrument.

To be able to provide its services to the customers, the system will request price updates and trades from its input channels in a similar manner, as soon as they are needed in the system. Most of the price updates from input channels are provided as market depth prices, while RFQ and RFS requests are more common on the customer side.

3.2.3 Performance Metrics

Latency was used as the measure for comparison when analyzing the performance of the system in this study. Latency is defined as the time it takes to perform a specific request or task in the system. The latency is measured by comparing timestamps taken in the application by calling the java.lang.System.nanoTime() [15] method. This method returns the current value of the running Java Virtual Machine's high-resolution time source, in nanoseconds [15], and is used to measure elapsed time.
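A minimal sketch of this style of elapsed-time measurement (the simulateRequest() method is a hypothetical stand-in for a measured system request):

```java
// Measure the elapsed time of one operation with System.nanoTime().
public class NanoTimeExample {
    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        simulateRequest();
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("Latency: " + (elapsedNanos / 1_000) + " microseconds");
    }

    private static void simulateRequest() throws InterruptedException {
        Thread.sleep(5);  // placeholder work standing in for a real request
    }
}
```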

3.2.4 System and Workload Parameters

There are numerous parameters that can affect the performance of the system, like hardware components, operating system, JVM, garbage collector and its settings, etc. None of these were evaluated during this study. This study focused on the application settings and the workload parameters. Even after narrowing it down to these categories of parameters, not all parameters can be studied.

Some of the workload parameters at the input side are:

• Number of input channels. The number of LPs and ECNs that should be connected and provide prices.

• Channel settings. The settings for how each individual LP and ECN should provide their prices. This includes values for whether it should stream market depth prices or provide price streams on request (RFQ), which amounts should be provided, which price margin should be added for each level, etc.


• Price update frequency. How often each input channel updates the price of each provided currency pair.

Some of the workload parameters at the distribution side are:

• Number of distribution channels. The number of channels that are connected to the system and can request its services.

• Service requests. Both the type of service (RFQ, RFS or MD) and its values, like instrument type, currency pair, settlement date and amount, can be varied.

• Request frequency. At what rate the requests arrive.

• Concurrent open price streams. The number of open price streams at the same time.

• Number of users. The number of users that connect to the system via some distribution channel to send service requests.

Some of the other system parameters are:

• Number of client users. Number of internal users that have their client application open.

• Price construction settings. The settings for how the prices should be constructed.

• Customer rules. The trading rules that should be used and which margins to add etc.

• System warm-up. For how long the system has been in use and how it has been used before the measurements start.

3.2.5 Selected Factors

The parameters that were selected as factors to vary during the experiments are:

(A) Delay value in ms. The initial delay, i.e. time gap, between incoming price updates for different instruments.

(B) Refresh value in ms. How often the input channels should refresh their prices, i.e. how long to wait before sending out a new price update.

(C) Number of input channels. The number of LPs and ECNs to request prices from.

(D) Number of streams. The number of open price streams, either RFS or MD, at the same time.

3.3 Performance Evaluation Method; Implementation

This section presents the second part of the performance evaluation method, which includes steps 6-9 of the systematic approach presented by Jain [7], described below:


6. Select evaluation technique. The general performance evaluation techniques are simulation, analytical modeling and measurement. For this study it was chosen to perform application-level measurements. How these measurements were done is described in Chapter 4.

7. Select the workload. The workload consists of a list of service requests and input values to the system. The workload should represent the system usage in reality, and historical usage data should be analyzed to determine which values are representative of the normal and heavy workloads in the real system. The simulation of the workload is described in Chapter 4.

8. Design the experiments. Experiments should be constructed to offer maximum information about the selected factors with as little effort as possible. It is recommended to construct the experiments in two phases. In the first phase, the number of factors is large but the number of levels is small, to determine the relative effect of various factors. In the second phase, the number of factors is reduced to the ones found in the first phase to have the biggest impact on the performance, and the number of levels is increased, to study the important factors in more detail.

9. Analyze and interpret the data. A statistical technique to compare the results must be decided on to be able to draw any conclusions from the outcome of the experiments. In this study the results from the first test phase were modeled with a nonlinear regression model, to find the effects of and interactions between the factors. The results from the second phase were analyzed by comparing statistical values like mean, median and maximum values, but also by studying the value distributions and confidence intervals.


Chapter 4

Implementation

In this chapter the implementation of steps 6 to 9 of the performance evaluation method presented in Section 3.3 is described. Section 4.1 describes step 6, the selected evaluation technique, and how the measurement tool was created. Section 4.2 describes step 7, how the workload was simulated, and Section 4.3 describes steps 8 and 9, how the experiments were constructed and how the results were analyzed.

4.1 Evaluation Technique

Application-level measurements of the real system were chosen as the evaluation technique. A measurement tool was developed as a separate software component that was integrated in the system and could be reached from other components within the system, to create latency samples of different processes and methods in the system.

There already existed a component in the system that implements the HdrHistogram package [12] to save sample values, create histograms and calculate statistical data like mean value and standard deviation. The existing tool is used to create a histogram Java object by instantiating the Histogram class imported with the HdrHistogram package [12]. The histogram is then managed by the calling process itself, to calculate statistical data about the latency of processing a general event. Latency histograms and measurements of queue sizes are done in all of the major components in the system.

During this project the existing histogram service was extended with the objective of working as a measurement tool that can be used to measure and compare the latency of specific events and information flows in the system. In addition to the histograms that are already implemented in the system, the measurement tool should manage all measurements done in the system. The tool should provide methods that can be called from other parts of the system to create sample values. The measurement tool is then responsible for creating histograms and calculating statistical data for the monitored processes using the HdrHistogram package.

4.1.1 Measurement Tool

The tool is manually activated through a management console that is provided by the system to manage its services during runtime. On activation the service will initialize a list of histogram objects that are used to record sample values and calculate statistical data about the latency of the monitored processes.


The histogram service manages all the histograms, provides methods to sample latency values, and is called in different parts of the system when an event occurs, to populate the histograms. Methods to calculate statistical data from the recorded values are provided by the imported histogram package [12].
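The thesis does not publish the tool's source, but a hedged sketch of what such a histogram service could look like is shown below; all class and method names are hypothetical, and the HdrHistogram bounds are example values.

```java
import java.util.concurrent.ConcurrentHashMap;
import org.HdrHistogram.ConcurrentHistogram;

// Hypothetical histogram service: other components call recordLatency()
// when an event occurs, and one histogram is kept per monitored process.
public class HistogramService {
    private final ConcurrentHashMap<String, ConcurrentHistogram> histograms =
            new ConcurrentHashMap<>();

    /** Record one latency sample (in nanoseconds) for the named process. */
    public void recordLatency(String processName, long latencyNanos) {
        histograms
                .computeIfAbsent(processName,
                        name -> new ConcurrentHistogram(3_600_000_000_000L, 3))
                .recordValue(latencyNanos);
    }

    /** Median latency for the named process, in nanoseconds. */
    public long medianLatency(String processName) {
        ConcurrentHistogram h = histograms.get(processName);
        return h == null ? 0 : h.getValueAtPercentile(50.0);
    }
}
```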

The results can be reached from links on the system management console. The results can either be viewed one histogram at a time, presented as a percentile distribution as in Figure 4.1, or on a performance report page, where all of the histograms are printed as charts as in Figure 4.3.

Figure 4.1: The value distribution of histogram data by percentile

The table lists the values corresponding to the percentiles, and the count of how many values in total belong to each percentile, i.e. are smaller than or equal to the percentile value. The main statistical data, like minimum value, maximum value, mean value, standard deviation and median value, are also presented together with the distribution.

In addition to the percentile distributions, it is possible to show an overview of all the measurements and created histograms on a separate performance report management console. At the top of the management console there is a table of the total number of incoming, outgoing and conflated updates in the system during the observation period.

There is also a list of the median values of all of the histograms in increasing order, as an overview and comparison of which processes in general have the highest latency.

Figure 4.2 is an example of how the first part of the management console can look.


Figure 4.2: Example of the performance report management console

The histogram values are then plotted in charts like in Figure 4.3, with the latency in microseconds on the x-axis, and the total number of values on the y-axis. Both the x-axis and y-axis are in log scale to get a clearer picture of the distribution of the values.

Figure 4.3: Example of a chart on the performance report management console


To populate the chart, the sample values are divided into 100 buckets, and the total number of values in each bucket is plotted together with the value level of that bucket. The charts are also provided together with the overall statistical data for the distribution, seen at the bottom of Figure 4.3.

4.2 Workload Simulation

Since the test cannot be executed on one of the customer servers, the workload on the test server must be simulated during the measurement period.

The system can be loaded from two sides, both from the input channels and from the distribution channels. From the input side, the load comes in the form of price updates and trade responses, whereas on the distribution side it comes in the form of service requests, both of which need to be simulated.

4.2.1 Simulation of Incoming Liquidity

Real input channels are connected to the system through so-called adapters, which translate their messages and send them to the system component that handles all of the incoming liquidity, the Liquidity Service (LS).

To simulate input channels, the system has test adapters that create prices and send updates to the Liquidity Service according to values read from a static price file. The static price file includes a list of instruments and prices that should be provided. Each instrument has a delay value (how long the input channel should wait before sending the first price) and a refresh value (how long to wait before sending a new price), which are used to control the price update frequency.

All input channel adapters use the same price file, and will therefore update their prices with the same frequency. What amounts and price margins each input channel should provide, and in what manner (by RFQ or market data), is however defined per adapter.
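A hedged sketch of how a test adapter might apply the delay and refresh values described above; the Instrument record, the price values and the sendPriceUpdate() method are hypothetical illustrations, not the actual adapter or price file format.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical test adapter: wait 'delay' before the first update of each
// instrument, then send a new price every 'refresh' milliseconds.
public class TestAdapterSketch {
    record Instrument(String symbol, double price, long delayMillis, long refreshMillis) {}

    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(1);

    public void start(List<Instrument> instruments) {
        for (Instrument instrument : instruments) {
            scheduler.scheduleAtFixedRate(
                    () -> sendPriceUpdate(instrument),
                    instrument.delayMillis(),
                    instrument.refreshMillis(),
                    TimeUnit.MILLISECONDS);
        }
    }

    private void sendPriceUpdate(Instrument instrument) {
        // Stand-in for sending the price to the Liquidity Service.
        System.out.printf("update %s -> %.5f%n", instrument.symbol(), instrument.price());
    }
}
```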

4.2.2 Simulation of Service Requests

To request services from the system, the company has a test system that can be used to connect to the system with the same protocol as the real distribution channels use. This program runs on a PC and connects to the test server over the Internet. Through this program all of the system services (RFQ, RFS, market depth, trade and cancel order) can be requested.

4.3 Experimental Design and Analysis of Test Results

The test process was divided into two phases. The objective of the first phase was to look at several factors and to identify which of the factors influence the performance the most. The goal was then to determine the threshold values for these factors in the second phase of the testing process, to find out what the system's limits are. The following two sections describe how the experiments for the two phases were designed, and which evaluation techniques were used to analyze the results from the experiments.


4.3.1 Test Phase 1; Identifying Influencing Factors

The first phase of the testing process was designed to reduce the number of factors and to select the factors that have the most impact on the performance. By using a full factorial design we can find the effect of each factor and their interactions, by examining every possible combination of all factors at all levels.

Since the performance most often continuously decreases or continuously increases when a factor is increased from its minimum level to its maximum level [7], these two levels were used for each factor to rank the factors in order of impact, and to evaluate for which factors the effect is significant enough to be studied in more detail in the second phase. Table 4.1 shows an example of how the experiments would be designed if we have two factors, A and B. The minimum level is represented by -1 and the maximum level is represented by 1.

Table 4.1: 2² Factorial Design

Experiment   Factor A   Factor B   Result
1            -1         -1         y1
2             1         -1         y2
3            -1          1         y3
4             1          1         y4

To calculate the effects and the interaction of the factors, a nonlinear regression model is used. In the example where we have two factors, A and B, the performance y would be regressed using the following model:

y = q_0 + q_A A + q_B B + q_{AB} AB    (4.1)

where q_0 represents the mean performance, q_A represents the effect of factor A, q_B represents the effect of factor B, and q_{AB} represents the interaction between factors A and B. The effects and interactions in the example with factors A and B can be computed by preparing a 4 × 4 sign matrix as shown in Table 4.2.

Table 4.2: Sign Table for Calculation of Effects

I     A     B     AB     y
1    -1    -1     1      y1
1     1    -1    -1      y2
1    -1     1    -1      y3
1     1     1     1      y4
q_0   q_A   q_B   q_AB   Sum/4

The entries in column AB are the products of the entries in columns A and B, and the q values are calculated by multiplying the factor column with column y, summing the values and dividing by 4. The equations for the q values are listed below:

q_0 = \frac{1}{4} \sum_{i=1}^{4} I_i \times y_i    (4.2)

q_A = \frac{1}{4} \sum_{i=1}^{4} A_i \times y_i    (4.3)

q_B = \frac{1}{4} \sum_{i=1}^{4} B_i \times y_i    (4.4)

q_{AB} = \frac{1}{4} \sum_{i=1}^{4} AB_i \times y_i    (4.5)

As an example, if we had the results

y_1 = 15 µs, y_2 = 45 µs, y_3 = 25 µs, y_4 = 75 µs

the effects would be calculated as follows:

q_0 = (15 + 45 + 25 + 75)/4 = 40    (4.6)

q_A = (-15 + 45 - 25 + 75)/4 = 20    (4.7)

q_B = (-15 - 45 + 25 + 75)/4 = 10    (4.8)

q_{AB} = (15 - 45 - 25 + 75)/4 = 5    (4.9)

The result is interpreted as follows. The mean performance is 40 µs, the effect of factor A is 20 µs, the effect of factor B is 10 µs, and the interaction between factors A and B accounts for 5 µs.
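A small sketch, not part of the thesis, that reproduces this 2² effect calculation for the example results above:

```java
// Compute the mean performance, main effects and interaction from the
// example results y1..y4 using the sign columns of Table 4.2.
public class FactorialEffects {
    public static void main(String[] args) {
        double[] y = {15, 45, 25, 75};   // results in microseconds
        int[] a = {-1, 1, -1, 1};        // sign column for factor A
        int[] b = {-1, -1, 1, 1};        // sign column for factor B

        double q0 = 0, qA = 0, qB = 0, qAB = 0;
        for (int i = 0; i < 4; i++) {
            q0  += y[i];
            qA  += a[i] * y[i];
            qB  += b[i] * y[i];
            qAB += a[i] * b[i] * y[i];   // column AB = A * B
        }
        System.out.printf("q0=%.1f qA=%.1f qB=%.1f qAB=%.1f%n",
                q0 / 4, qA / 4, qB / 4, qAB / 4);
        // Prints q0=40.0 qA=20.0 qB=10.0 qAB=5.0, matching equations 4.6-4.9.
    }
}
```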

4.3.2 Test Phase 2; Finding Threshold Values and Bottlenecks

The second test phase was meant to investigate the system performance in more detail, by analyzing fewer factors but at more levels. The factors that did not affect the performance significantly in the first test phase were ignored, and the factors that affected the performance the most were analyzed at more levels.

The response times were validated by testing with a wide range of levels, and by looking at the mean and the percentile values we could validate them both in the average case and at different significance levels. The factors were tested at levels higher than the highest normal use case, which had been used as the maximum value in the first test phase, to find threshold values and to see at what factor level the system reaches its limits. When a limit was found, more measurements were added, to identify where possible bottlenecks were located.

The experiments in the second phase had a full factorial design to create a complete picture of the performance of the system with different input parameters.


Chapter 5

Results

In this chapter the results from the performance evaluation are presented. Section 5.1 presents the performance effects that were found for different factors. Section 5.2 describes how the response times were validated, and Section 5.3 the discovery of significant maximum values. Section 5.4 describes how threshold values and bottlenecks were investigated, and Section 5.5 presents the identified value distributions.

5.1 Factor Effects

In the first test phase we looked at 4 factors (described in Section 3.2):

(A) Delay value
(B) Refresh value
(C) Number of input channels
(D) Number of streams

16 experiments were carried out, to test every combination of the four factors at two levels, the minimum level and the maximum level. In every experiment, 100 000 latency values were sampled. The mean of the 100 000 sample values was used as the result of the experiment when the effects and interactions of the factors were calculated.

How much of the total variation of the performance each factor accounts for is presented in Figure 5.1.


Figure 5.1: The effects and interactions of the factors in test phase 1

Figure 5.1 shows that the number of open streams affects the performance the most, and the delay the least. The refresh value in combination with the number of open streams has the second highest impact, and the refresh value alone the third highest.

The factor effects in decreasing order are:

1. (D) Nr streams
2. (BD) Refresh + nr streams
3. (B) Refresh
4. (C) Nr input channels
5. (CD) Input channels + nr streams
6. (ABD) Delay + refresh + nr streams

5.2 Validated Response Times

The factors were varied from their minimum levels to their maximum levels and tested in all combinations in the first test phase. Table 5.1 lists which factors were at maximum level in the different experiments.


Table 5.1: Factor levels

Experiment number   Factors at maximum level
1                   None
2                   A
3                   B
4                   AB
5                   C
6                   AC
7                   BC
8                   ABC
9                   D
10                  AD
11                  BD
12                  ABD
13                  CD
14                  ACD
15                  BCD
16                  ABCD

As described in Section 2.1, the goal of a real-time system is to respond before a measurable deadline or within a bounded time frame. To validate the response times, the mean response time in each experiment was therefore compared to the pre-defined time frame for the process that was measured. Figure 5.2 presents what percentage of the time frame was used by the process in the different experiments.

Figure 5.2: The percentage of the time frame needed for the response time

Figure 5.2 shows that the mean response time in all of the experiments is within the specified time frame, since the mean latency value is less than 15% of the time frame in all 16 experiments.


5.3 Significant Maximum Values

We found at least one outlier that was measured outside of the bounded time frame. In fact, the maximum values, or outliers, are significantly higher than the mean value in some experiments. Therefore, it could not be verified that 100 percent of the sampled response times were within the specified time frame. In Figure 5.3 we describe the outliers in relation to the pre-defined time frame: the maximum value measured in each of the 16 experiments is shown in relation to the pre-defined time frame.

Figure 5.3: The maximum values in relation to the time frame

Figure 5.3 shows that the maximum values, or outliers, are significantly higher than the mean values presented in Figure 5.2 in some experiments. In four of the experiments we sampled one or more values that lay outside of the pre-defined time frame.

5.4 Threshold Values and Bottlenecks

In the second test phase the number of factors was decreased and the number of levels was increased, to validate the response time for different levels of the factors with the highest impact on the performance: the number of open price streams and the refresh value. In the first test phase the factors were varied from their minimum level to their maximum level for what was found to be normal system usage. Since all of the median response times were within the acceptable time frame, it was decided to create experiments with significantly higher levels for the factors in the second test phase, to examine at what levels the system reaches its limits. 16 experiments were constructed, to test all combinations of two factors, number of streams and refresh value, at four levels.

Since the latency values for the real system are confidential, it was decided to execute the tests on components that the principal is considering including in the system, to see how they perform and how they would impact system performance.

One of the components that was tested was an implementation of a scheduling algorithm that can be used to throttle incoming price updates to the system. If the system is overloaded with incoming price updates, a scheduling algorithm can be used to store the latest updates and to send them into the system at a rate that the system can tolerate.
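
A minimal sketch of such a throttling scheduler is given below. It is an assumption about the general design (keep only the latest update per price stream and forward it at a fixed rate), not the principal's actual component, and all class and parameter names are invented.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.function.BiConsumer;

    // Hypothetical throttling ("conflation") scheduler: incoming price updates
    // overwrite each other per stream, and only the latest value is forwarded
    // downstream once per interval.
    public final class PriceThrottler {
        private final ConcurrentHashMap<String, Double> latest = new ConcurrentHashMap<>();
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        public PriceThrottler(BiConsumer<String, Double> downstream, long intervalMillis) {
            // Drain the latest update of every stream at a fixed rate.
            timer.scheduleAtFixedRate(() -> {
                for (String streamId : latest.keySet()) {
                    Double price = latest.remove(streamId);
                    if (price != null) {
                        downstream.accept(streamId, price);
                    }
                }
            }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        }

        // Called for every incoming price update; older updates for the same
        // stream are silently replaced until the next drain.
        public void onPrice(String streamId, double price) {
            latest.put(streamId, price);
        }

        public void shutdown() {
            timer.shutdownNow();
        }
    }

With a design of this kind, an update that arrives just after a drain simply waits for the next tick, which puts an upper bound on the rate at which the rest of the system sees updates.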


When evaluating the experiment results for this component it was found that it did not show the continuous change in performance that we usually see when the factor levels are increased. Instead a threshold value was found where its latency would peak and then return to normal values again at higher factor levels. Figure 5.4 shows how the median latency for this component at first increased at a fast rate when the level of the refresh value was increased, and then dropped and continued at a low level.

Figure 5.4: The change in latency of the scheduling component

When the latency values for the scheduling algorithm were compared to the system's response times, it was found that the scheduling algorithm would account for up to 75% of the total response time if it reached its threshold value.

5.5 Value Distributions

The value distributions for the different processes were also investigated to look for possible improvements. For some processes it was found that the sample values had a similar range (minimum and maximum value) in all experiments, but that the distribution of the sample values differed between factor levels. In experiments with low factor levels a bimodal value distribution was found for many of the measured processes, while the values were more evenly distributed for the same processes in experiments with the highest factor levels.

The distribution of the latency values sampled at the lowest factor level is presented in Figure 5.5, as a histogram with the number of samples on the y-axis and the latency value on the x-axis. For confidentiality reasons we omit the exact numbers.


Figure 5.5: Statistical distribution of latency values at lowest factor level

The distribution of the latency values sampled for the same process, but at the highest factor level, is presented in Figure 5.6, also as a histogram with the number of samples on the y-axis and the latency value on the x-axis. The samples in Figure 5.6 use the same scale as the samples in Figure 5.5. For confidentiality reasons we omit the exact numbers.

Figure 5.6: Statistical distribution of latency values at highest factor level

By studying Figure 5.5 and Figure 5.6 we can see that the samples have similar minimum and maximum values, and that the values become more evenly distributed over the value range as the factor levels are increased.
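
A histogram like the ones above can be produced by bucketing the samples over their value range. A small sketch of that bucketing is shown below; the class name, the bin count and the synthetic data are assumptions, not the measurement tool's actual code.

    import java.util.Arrays;

    // Illustrative sketch: bucket sampled latencies into a fixed number of
    // bins over their value range, which makes shapes such as a bimodal
    // distribution visible.
    public final class LatencyHistogram {
        static long[] histogram(long[] latencies, int bins) {
            long min = Arrays.stream(latencies).min().orElse(0);
            long max = Arrays.stream(latencies).max().orElse(0);
            double binWidth = Math.max(1.0, (max - min) / (double) bins);
            long[] counts = new long[bins];
            for (long value : latencies) {
                int bin = (int) ((value - min) / binWidth);
                if (bin >= bins) {
                    bin = bins - 1; // the maximum value falls in the last bin
                }
                counts[bin]++;
            }
            return counts;
        }

        public static void main(String[] args) {
            long[] sample = {110, 112, 109, 300, 305, 111, 298, 301}; // synthetic bimodal data
            System.out.println(Arrays.toString(histogram(sample, 4)));
        }
    }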


Chapter 6

Discussion

In this chapter the results from the performance evaluation are discussed. Section 6.1 discusses the results from the first test phase, including the factor effects, the validation of the response times, and the significant maximum values. Section 6.2 discusses the results from the second test phase, including the threshold value that was found and the different value distributions. The last section, Section 6.3, discusses the reliability of the results and possible sources of errors.

6.1 Discussion of First Phase Results

The results about the factor effects were not exactly as expected. The factor expected to influence the performance the most was the refresh value, since it has the biggest difference between minimum and maximum value and therefore gives the biggest difference in the total incoming load on the system when it is varied from its minimum level to its maximum level. Instead it was found that the number of open streams impacted the general latency the most. The reason for this could be that the system uses resources and execution time to open and manage a price request, which might impact the latency of other processes as well.

The tests from the first test phase validate that the system is capable of processing the information load put on it under production-like conditions without compromising any relevant response times in the majority of cases. The mean and median values from the tests all lie within what the principal considers acceptable response times. Even though the system was expected to handle the normal load with good response times, a clearer difference was expected between the results at the lowest levels and the highest levels. Nevertheless, the latency was relatively low even in the experiments where all factors were at their high level, and no breakpoints or threshold values were found during the tests. This shows that the system is capable of handling heavier usage patterns than the current system users generate.

An unexpected result was the significantly higher maximum values. Since the tests were executed on a Java Virtual Machine (JVM), which performs both code compilation and deallocation of unused memory at run time and therefore has non-deterministic execution times, some outliers were expected in some of the experiments. However, it was found that the maximum values in all of the experiments were significantly higher than the mean and median values. Even though the system can miss a few deadlines without going into a bad state, as would be the case for a hard real-time system, the goal of any real-time system is to be able to guarantee that all deadlines are met.

The maximum values that exceed the specified time frame for a process should therefore be further investigated and possibly improved. A possible explanation is that some of the outliers are caused by garbage collection pauses, and they could possibly be reduced by tuning the settings for the garbage collector. For example, the system currently uses the parallel GC for the young generation (described in Section 2.3.2) and the CMS for the old generation. Switching to the G1 garbage collector could possibly yield smaller maximum values, since the G1 collector is designed for low pause times and is meant to be the long-term replacement for the CMS. Another explanation might be runtime compilation, which could possibly be avoided for the most frequent processes by "warming up" the JVM with some extra runs before the test is started.
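
As an example of the kind of change discussed here, the hypothetical launch flags below switch the JVM to G1 and write a GC log, so that pauses can be correlated with the latency outliers. The pause-time target and the jar name are placeholders only, not values used by the principal.

    # Hypothetical JDK 8 launch flags, shown only as an illustration:
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=10 \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
         -jar pricing-system.jar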

6.2 Discussion of Second Phase Results

When the number of factor levels was increased for the second test phase, it was expected that threshold values would be found at which the latency of different processes or components starts to increase dramatically faster, because the component has reached its maximum throughput and utilization and a queue starts to build up with requests that have to wait to be processed. This was the result in most of the cases, but for one component, the scheduling component, the latency suddenly dropped when the factor levels were increased even further, and the performance went back to normal levels again. This unexpected behavior could either be explained by the component not being optimal for our use case, or by an error in the implementation of the component.

The sampled values of the different processes were expected to be more or less normally distributed, with most samples around the mean value and a few outliers. Instead, a bimodal distribution was often seen. This can possibly be explained by different execution times for different kinds of requests, and indicates that some request values seem to have a bigger impact on the performance; this would be valuable to evaluate in a future experiment. The bimodal distributions could also be explained by garbage collection processes, database delays or operating system interrupts, which would pause the application execution and increase the latency of a response. The faster a process is, the more impact a short pause has on its total latency, and such pauses can be seen as a bottleneck for the performance of that process.

6.3 Reliability and Sources of Errors

When measurements are made on a real system there will always be possible sources of errors in the sampled values. To decide how long to run the experiments, and how many samples to collect to get reliable results, some initial tests of the experiments were made. In the initial tests the same experiment was executed several times and stopped after different amounts of time. The results from the experiments were then compared to see at what point no further information was gained by collecting more samples, so that the experiment could be stopped. It was found that the results did not change significantly after 1 000 samples. Since the execution times of the experiments were relatively short, it was possible to collect 100 000 values to increase the reliability of the results.
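
One way to perform such a convergence check is sketched below: the running mean is re-evaluated at fixed checkpoints, and the run is considered long enough when it no longer moves by more than a chosen tolerance. This is an assumption about the procedure, not the code that was actually used, and the checkpoint size and tolerance are placeholders.

    // Illustrative sketch of a stopping rule: collect samples until the
    // running mean changes by less than a relative tolerance between
    // checkpoints of 1 000 samples. Names and constants are assumptions.
    public final class SampleSizeCheck {
        static int samplesNeeded(long[] latencies, double relativeTolerance) {
            double runningMean = 0;
            double lastCheckpointMean = Double.NaN;
            for (int n = 1; n <= latencies.length; n++) {
                runningMean += (latencies[n - 1] - runningMean) / n;
                if (n % 1000 == 0) {
                    boolean converged = !Double.isNaN(lastCheckpointMean)
                            && Math.abs(runningMean - lastCheckpointMean)
                               <= relativeTolerance * Math.abs(lastCheckpointMean);
                    if (converged) {
                        return n;
                    }
                    lastCheckpointMean = runningMean;
                }
            }
            return latencies.length; // never converged within the collected samples
        }
    }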


In some of the experiments we had bimodal distributions and outliers that were significantly higher than the mean value. This affects the mean value and indicates that the mean value should not be used alone to validate the latency. For example, a significantly higher maximum value increases the mean value and indicates worse performance than we actually see in most cases. To get a more complete picture of the performance, the results should also include outliers and value distributions.

The performance of a system depends on a wide range of factors, and it can therefore not be validated for all parameters and all usage patterns. The results from the tests would look different if other factors were varied and if other values were chosen for the parameters that were not varied.


Chapter 7

Conclusions

What we have learned during this project is that there is no simple answer to describing system performance when measurements are taken at application level. Latency depends on several different parameters, and the test results will depend on which factors are tested and which parameter values are used. What we have developed in this project is a general method that can be used to compare the performance effects of different factors and to compare the system performance for different parameter values, in many types of systems.

The method gives a good insight into the performance of a client-server system. The statistical values calculated by the measurement tool, such as the maximum and median values, are useful for comparison and validation when evaluating the performance of a real-time system, where the goal is that all deadlines are met, not only the majority of them. By studying the percentile distributions it is possible to see the result for different confidence intervals and to determine if there is any significant difference between two alternatives.
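
For illustration, the fragment below computes such values (median, an arbitrary percentile, and the maximum) from a latency sample with the simple nearest-rank method; the class and method names and the synthetic values are assumptions, not the measurement tool's API.

    import java.util.Arrays;

    // Illustrative sketch: nearest-rank percentiles over a latency sample.
    public final class LatencyStats {
        static long percentile(long[] latencies, double p) {
            long[] sorted = latencies.clone();
            Arrays.sort(sorted);
            int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
            return sorted[Math.max(0, Math.min(sorted.length - 1, index))];
        }

        public static void main(String[] args) {
            long[] sample = {120, 95, 430, 101, 99, 2500, 110, 97}; // synthetic values
            System.out.println("median = " + percentile(sample, 50));
            System.out.println("p99    = " + percentile(sample, 99));
            System.out.println("max    = " + percentile(sample, 100));
        }
    }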

The full factorial design approach of Section 4.3 gives a good overview of the performance of the system for certain parameter values, and shows how the performance is affected by all of the tested factors in different combinations. It also gives an insight into which factors and parameter values should be further investigated.

7.1 Future Work

The performance evaluation of the studied system could be continued by:

• Tuning the JVM and garbage collector by choosing these parameters as factors when executing the performance tests.

• Continuing the work of finding bottlenecks in the system, by adding more measurements to investigate which processes account for the threshold values.

• Developing a test suite with static parameter values that can be used to compare the performance of new development, and to ensure that the performance remains high between new releases.

Furthermore, the measurement tool could be improved by:

• Automating the process of populating the experiment result matrices.


• Extending the performance report page to be able to compare the results from two or more experiments at a time.
