Evaluation of traffic generation tools and implementation of test system

Henric Englund

February 1, 2011

Master's Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Mikael Rännar

Examiner: Fredrik Georgsson

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN


Abstract

The purpose of this master's thesis is to examine the possibility of building an efficient and accurate distributed IP traffic generator based on an open source software traffic generator. The problem lies in the demand for high performance traffic generators during quality assurance of IP equipment and the high price of commercial hardware traffic generators. An evaluation is conducted to identify a suitable traffic generator, given a set of requirements that the traffic generator must fulfill to be usable in performance testing. A distributed system utilizing the most suitable traffic generator is then implemented. The result and conclusion of this report is that hardware traffic generators are superior to their software-based counterparts due to the several drawbacks that software traffic generators suffer from, such as disk I/O and CPU load.


Contents

1 Introduction
  1.1 Background
  1.2 Report outline

2 Problem Description
  2.1 Background
  2.2 Phase 1 – Evaluation
    2.2.1 Problem statement
    2.2.2 Goals
    2.2.3 Methods
  2.3 Phase 2 – Integration
    2.3.1 Problem statement
    2.3.2 Goals
    2.3.3 Methods

3 Securing virtual private networks
  3.1 Introduction
    3.1.1 Terminology
    3.1.2 Overview
  3.2 Security mechanisms
    3.2.1 Internet Protocol Security (IPsec)
    3.2.2 Secure Sockets Layer (SSL)
    3.2.3 Comparison of IPsec and SSL
    3.2.4 Conclusions

4 Accomplishment
  4.1 Preliminaries
  4.2 How the work was done

5 Results
  5.1 Phase 1 – Evaluation
    5.1.1 The criteria
    5.1.2 Selection
    5.1.3 Evaluation
    5.1.4 Conclusion
  5.2 Phase 2 – Integration
    5.2.1 Design
    5.2.2 Communication channel
    5.2.3 Flows
    5.2.4 Client implementation

6 Conclusions
  6.1 Limitations
  6.2 Future work

7 Acknowledgements

References

A Servers
  A.1 Server 1
  A.2 Server 2

B List of abbreviations and acronyms


List of Figures

3.1 Overview of a typical virtual private network
5.1 Server setup
5.2 Illustration of the servers and DUT setup
5.3 Ethernet Type II Frame
5.4 Overview of the system design with communication channel and database calls
5.5 Overview of the functionalities in the different parts of the system
5.6 Screenshot of the web client user interface


List of Tables

3.1 IPsec performance over a 1000 Mbps network
3.2 SSL performance over a 1000 Mbps network
4.1 Preliminary schedule
5.1 List of eliminated candidates due to missing functionality
5.2 Results from initial performance tests
5.3 Results from performance test utilizing 1 gigabit interface with unidirectional traffic
5.4 Results from performance test utilizing 2 gigabit interfaces with unidirectional traffic
5.5 Results from performance test utilizing 3 gigabit interfaces with unidirectional traffic
5.6 Results from performance test utilizing 4 gigabit interfaces with unidirectional traffic
5.7 Quality comparison of 100 Mbit and 200 Mbit unidirectional traffic flows from Hpcbench and Ixia traffic generators
5.8 Quality comparison of 100 Mbit and 200 Mbit unidirectional traffic flows from Iperf and Ixia traffic generators
5.9 Final score from the evaluation phase


Chapter 1

Introduction

As network connections become faster and more and more devices are connected to networks such as the Internet, the demand for properly quality-assured network equipment becomes increasingly critical.

1.1 Background

Clavister is a provider of high performance security solutions for a wide range of customers. As the customer base extends up to enterprise and telecom level, it is very important for Clavister to validate the performance and stability of the product portfolio. With the foreseen capacity increase during the coming two years, Clavister needs alternative ways to perform performance measurements, as existing tools such as Ixia, Spirent etc. are too expensive to be used for all measurements.

To improve Clavister's test environment at reasonable cost, Clavister has identified a number of server based performance test applications. These applications need to be evaluated with regard to performance, stability, functionality, ease of use and price.

As a second phase of this project, a testing environment, utilizing the best application, will be implemented.

1.2 Report outline

This thesis report describes the evaluation of open source software traffic generators, the identifying of the most suitable traffic generator and the implementation of the distributed testing environment.

– Chapter 1 Introduction:

This chapter gives an introduction to the thesis.

– Chapter 2 Problem description:

A detailed description of the thesis problem is given in this chapter.

– Chapter 3 Securing virtual private networks:

An in-depth study on some of the techniques used to secure virtual private networks.


– Chapter 4 Accomplishment:

This chapter describes the work process and points out some of the problems during the project.

– Chapter 5 Results:

The result of both the evaluation phase and the implementation phase is discussed in this chapter.

– Chapter 6 Conclusions:

The limitations of the implementation and of the project in general are discussed in this chapter, along with a discussion of what can be done to extend and improve the solution.

– Chapter 7 Acknowledgements:

This chapter acknowledges the people who helped and supported me during this project.


Chapter 2

Problem Description

The project is split into two phases. The first phase consists of an evaluation and analysis of existing software traffic generators, based upon a set of requirements. The second and last phase of the project is of a more practical nature: the task is to develop a scalable test environment using the best application from the evaluation phase.

2.1 Background

In a test laboratory for network equipment, there are various situations where synthetic traffic generation can come in handy. The following is a sample of situations where traffic generation serves a purpose:

– Throughput measurements

In order to find out how much traffic throughput (both bytes per second and packets per second) a product can handle, a traffic generator is most certainly needed.

– Stress and stability tests

To perform a proper stress and/or stability test, some kind of controlled traffic is probably needed to generate some load on the device.

– Background traffic for function tests

Background traffic can be very useful during function tests.

2.2 Phase 1 – Evaluation

This section explains the first phase of the project, the evaluation phase.

2.2.1 Problem statement

During the first phase, an in-depth evaluation of the existing software based traffic generators is done. Considering the time and money constraints of this project, only open source alternatives will be taken into consideration.


2.2.2 Goals

The goal of this phase is to identify the one application that best satisfies the criteria of the evaluation.

2.2.3 Methods

In order to perform a qualified evaluation of the existing software traffic generators, a couple of subtasks need to be executed. Those are: defining the criteria that the evaluation will be based upon, selecting the candidates that will be analyzed and evaluating the candidates with regard to the criteria.

Defining the criteria

The whole evaluation will be based upon a set of criteria, and it is highly important that these criteria are well defined and reflect the demands correctly. An outline of the criteria is:

– Performance
– Stability
– Quality
– Functionality
– Scalability

Selecting candidates

The possible candidates for this evaluation need to be identified. To accomplish this, the web will be searched manually. The starting point, however, is a list of applications assembled by Dipartimento di Informatica e Sistemistica, Università di Napoli “Federico II” (Italy)[4].

Evaluating the candidates

The final list of candidates will then be mapped to a list of criteria to rule out the ones that do not match up with the criteria.

Throughout this thesis, UDP will be used as the sole protocol for throughput tests, in accordance with appendix C of RFC 2544[26]. The frame sizes specified in RFC 2544 will be used for throughput tests and comparisons.
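Those frame sizes, together with Ethernet's fixed per-frame wire overhead, determine the theoretical maximum packet rate a generator must sustain. A minimal sketch (the frame sizes are the standard RFC 2544 set for Ethernet; the helper function itself is only illustrative):

```python
# RFC 2544 frame sizes for Ethernet (bytes, including Ethernet header and FCS).
FRAME_SIZES = [64, 128, 256, 512, 1024, 1280, 1518]

# Per-frame overhead on the wire: 7-byte preamble + 1-byte start-of-frame
# delimiter + 12-byte inter-frame gap = 20 bytes.
WIRE_OVERHEAD = 20

def max_pps(frame_size: int, link_bps: int = 1_000_000_000) -> int:
    """Theoretical maximum frames per second on a link of the given speed."""
    return int(link_bps / ((frame_size + WIRE_OVERHEAD) * 8))

for size in FRAME_SIZES:
    print(f"{size:5d} bytes: {max_pps(size):>9,} pps")
```

At 64-byte frames a gigabit link carries roughly 1.49 million frames per second, which is why the smallest packet sizes are the hardest test for a software generator.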

Initially, the performance criterion can be excluded to save some time; a more in-depth performance evaluation is instead performed on the candidates that fulfill all of the other requirements.

Conclusion

Some thoughts and conclusions over the evaluation and its result will be discussed.

2.3 Phase 2 – Integration

This section explains the second phase of the project, the integration phase.


2.3.1 Problem statement

A testing environment will be implemented using the best tool from the evaluation phase.

The environment should be scalable by stacking servers to reach the desired performance levels. The system needs functionality that makes it possible to execute automated and/or scripted tests. In addition, the system has to be stable, and the results it generates have to be comparable with results from an Ixia traffic generator [15].

2.3.2 Goals

The goal is to, as a user of the system, be able to perform various throughput performance tests. The user should be able to configure the test parameters such as:

– Protocol (TCP or UDP)
– Packet size (UDP only)
– Speed (UDP only)
– Duration
– DUT (device under test), for logging purposes
– Sending and receiving server

The user should also be able to monitor the progress of a running test and be able to retrieve reports from older tests.

2.3.3 Methods

In order to fulfill the goals, a web based user interface, an underlying platform for the traffic generation servers and a communication solution between the web server and the traffic generation servers will be implemented. The communication channel will be used to start the generation of traffic and to send reports back to the user via the web server.

The design of the implementation should make it possible to replace the traffic generator software with another application without too much struggle, given that the replacement has the same functionality. The web based user interface should not need to be aware of which software traffic generator is currently in use; neither should the database design, nor, to some extent, even the communication channel.
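As an illustration of that decoupling, a start-test command on the communication channel could be a simple serialized structure. The field names below are invented for this sketch and do not reflect the actual message format of the implemented system:

```python
import json

# Hypothetical start-test command; all field names are illustrative only,
# not the actual protocol of the implemented system.
command = {
    "action": "start",
    "protocol": "udp",       # TCP or UDP
    "packet_size": 512,      # bytes, UDP only
    "rate_mbps": 100,        # UDP only
    "duration_s": 60,
    "dut": "fw-lab-01",      # device under test, for logging
    "sender": "server1",
    "receiver": "server2",
}

# Serializing to a neutral format keeps the web client independent of
# which traffic generator backend currently executes the command.
message = json.dumps(command)
decoded = json.loads(message)
print(decoded["action"], decoded["duration_s"])
```

Because the message only carries test parameters, swapping the underlying generator means changing only the component that translates such commands into tool-specific invocations.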


Chapter 3

Securing virtual private networks

This section contains a study on virtual private networks and the different approaches used to secure such networks.

3.1 Introduction

A virtual private network (VPN) is a technology used to access private networks from a remote host that is not physically connected to the private network. VPNs can also be used to interconnect two or more private networks in order to build one larger virtual network.

Since VPNs are much cheaper than leased lines, they are commonly used to interconnect offices spread out all over the world with an intranet. VPNs can be used for both data and voice communications and are hence of interest to the telecom business[3].

VPNs do not, however, necessarily offer any security, which leaves the connection between networks and hosts vulnerable to malicious attacks such as packet sniffing and identity spoofing.

In order to fill these security gaps, various techniques and protocols are currently being used by different vendors. Ensuring security over public networks is a difficult task and the most common methods will be discussed later in this document.

3.1.1 Terminology

VPNs can be divided into three separate categories: trusted VPNs, secure VPNs and hybrid VPNs. Given the nature of this paper, only secure VPNs will be discussed in greater detail later on in this document.

Trusted VPNs

Before the glory days of the Internet, a VPN consisted of one or more circuits leased from a communication provider. These circuits acted as a single wire in a local physical network.

The only privacy provided by the communication provider was that no one else but the customer used the same circuit. The general approach was that the customer trusted that the provider was not compromised, thus they are called trusted VPNs[3].


Secure VPNs

The most common VPN scenario today involves secure VPNs. Secure VPNs make use of various encryption and integrity technologies to secure the data that is being transmitted over the VPN. The data is encrypted at the edge of the originating network, moved over a public network in encrypted form like any other data, and finally decrypted when it reaches the destination. Attackers can of course see the encrypted data travelling over the public network, but are unable to decrypt it or inject fabricated bogus content without the recipient noticing and rejecting the data[3].

Hybrid VPNs

Hybrid VPNs are a combination of secure and trusted VPNs. They combine the security mechanisms that secure VPNs provide with the path assurance that trusted VPNs provide[3].

3.1.2 Overview

Figure 3.1 illustrates a typical virtual private network where two local area networks and one laptop client are interconnected over a wide area network. The LANs are using a physical VPN gateway to establish the VPN connection whilst the laptop client uses a VPN software client.


Figure 3.1: Overview of a typical virtual private network

3.2 Security mechanisms

The major security protocols used in secure VPN scenarios are Internet Protocol Security (IPsec) and Secure Sockets Layer (SSL). IPsec and SSL are located at different layers of the Open Systems Interconnection (OSI) model and thus have different advantages and disadvantages[34].

3.2.1 Internet Protocol Security (IPsec)

IPsec is a protocol suite that provides methods for authenticating and encrypting each IP packet of a data stream. The IPsec protocol suite also includes protocols for two-way authentication and for negotiating the cryptographic keys to be used during the session. As the name implies, IPsec is designed to secure IPv4 and IPv6 traffic[27]. IPsec operates in the network layer of the TCP/IP protocol suite and is hence transparent to user applications: applications do not need to be modified in order to make use of IPsec functionality, nor do end users require special training in security mechanisms.

IPsec architecture

The IPsec protocol suite is a set of open protocols used to perform various functions.

– Security association (SA)

IPsec uses SAs as the basis for adding security functionality to the IP protocol. An SA is a collection of algorithms and parameters used to encrypt and authenticate a unidirectional data flow. To acquire secure bidirectional data flows, a pair of SAs is required. SAs are typically set up by Internet Key Exchange (IKE) or Kerberized Internet Negotiation of Keys (KINK)[28].

– Authentication header (AH)

AH provides connectionless integrity and data origin authentication of IP packets. AH can also be used to prevent replay attacks[28].

– Encapsulating security payload (ESP)

ESP provides origin authenticity, integrity and confidentiality protection of packets[28].

Transport and tunnel mode

IPsec can operate in two different modes: transport and tunnel mode. In transport mode, only the IP payload is protected and the IP header remains the same. In tunnel mode, the entire original IP datagram is encapsulated in a new IP packet[27].
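The difference between the two modes shows up as per-packet byte overhead. The sketch below gives rough figures, assuming ESP with 3DES-CBC and HMAC-SHA1-96 and ignoring cipher block padding; the exact numbers vary with the chosen algorithms:

```python
# Approximate per-packet overhead (bytes) for ESP, assuming 3DES-CBC and
# HMAC-SHA1-96; real numbers vary with cipher block size and padding.
ESP_HEADER = 8        # SPI (4) + sequence number (4)
ESP_IV = 8            # 3DES-CBC initialization vector
ESP_TRAILER = 2       # pad length (1) + next header (1)
ESP_ICV = 12          # HMAC-SHA1-96 integrity check value
OUTER_IP = 20         # new IPv4 header, added in tunnel mode only

transport_overhead = ESP_HEADER + ESP_IV + ESP_TRAILER + ESP_ICV
tunnel_overhead = transport_overhead + OUTER_IP

print(f"transport mode: ~{transport_overhead} bytes per packet")
print(f"tunnel mode:    ~{tunnel_overhead} bytes per packet")
```

Tunnel mode thus costs roughly an extra IPv4 header per packet compared with transport mode, which matters most for small packets where the overhead is a large fraction of the frame.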

Cryptographic algorithms

The following cryptographic algorithms are required in every implementation of IPsec[20]:

– HMAC-SHA1-96 (RFC 2404) for integrity protection and authenticity
– TripleDES-CBC (RFC 2451) for confidentiality
– AES-CBC with 128-bit keys (RFC 3602) for confidentiality

Internet Key Exchange

Internet Key Exchange (IKE) is a protocol that is often used to negotiate the IPsec security associations (SAs) in a protected manner. Version 2 of IKE[2] was defined in December 2005 to address some issues in the first version, such as DoS attack resilience, NAT traversal, mobility support and reliability.

3.2.2 Secure Sockets Layer (SSL)

SSL is an application layer protocol that is frequently used to protect HTTP transactions, but it can also be used for other purposes, such as tunneling an entire network stack to create a VPN.

SSL architecture

SSL comes with a set of open protocols that are used during different phases of an SSL session.

– Handshake protocol

The handshake protocol is used to perform authentication and key exchanges[6].

– Change Cipher Spec protocol

This protocol indicates that the chosen keys will now be used[6].

– Alert protocol

Error and warning signaling are handled by the alert protocol[6].

– Application Data protocol

The application data protocol is responsible for transmitting and receiving encrypted data[6].

SSL VPN types

There are various types of SSL VPNs, but two major ones: SSL portal VPNs and SSL tunnel VPNs.

– SSL portal VPN

This type of SSL VPN allows for a single SSL connection to a web site, called portal, that provides access to internal resources. The remote user accesses the portal using a modern web browser with SSL support, authenticates himself or herself and is then capable of accessing services from the internal network.[7]

– SSL tunnel VPN

SSL tunnel VPNs give the remote user access to multiple services on the internal network, including applications and services that are not web-based. This is made possible by using a web browser to tunnel the communications over an SSL connection.[7]

3.2.3 Comparison of IPsec and SSL

Authentication methods

IPsec supports the use of both certificates and pre-shared keys, while SSL only supports the use of certificates. IPsec provides methods for mutual authentication, whereas SSL provides methods for both mutual and unilateral authentication. Both IPsec and SSL require that the hash functions HMAC-SHA-1 and HMAC-MD5 are implemented, for authentication of the exchanged messages.

Connections

As mentioned earlier in section 3.2.1, IPsec can be configured in both tunnel mode and transport mode. SSL uses another approach, with one connection per session and independence between the sessions. However, performance may decrease as the number of connections increases.

Transportation

Portal styled SSL VPNs can only operate over TCP due to the nature of SSL. SSL tunnel VPNs are, just like IPsec, capable of carrying any transport protocol.


Interoperability and mobility

IPsec requires that client software is installed and configured on all devices that are to be used remotely from an untrusted network. This can be quite a troublesome task to manage in an organization with hundreds or thousands of mobile users. Since SSL is integrated in applications, no extra software or configuration is required on the devices used to access internal resources.

Algorithms

Both protocols provide secure key exchange, strong data encryption and authentication, and support leading, well-known encryption, integrity and authentication technologies such as 3DES, AES, MD5, RC4 and SHA1.

Performance

A performance comparison[1] between IPsec and SSL performed at the Tokyo University of Technology shows that SSL has slightly better throughput performance. This of course depends on the algorithms used and on the implementations of IPsec and SSL. The tests were performed by establishing tunnels between two identical computers and measuring the throughput by sending data through the tunnels. Super FreeS/WAN 1.99-8 was used to set up the IPsec tunnel and Stunnel 3.26 was used to set up the SSL tunnel. Table 3.1 shows the IPsec performance results and table 3.2 shows the SSL performance results.

Algorithms        Throughput (Mbps)
No Algorithm      427
SHA1 + DES        110
SHA1 + 3DES       69.5
SHA1 + AES-128    156
SHA1 + Blowfish   123.5
MD5 + DES         137
MD5 + 3DES        75.7
MD5 + AES-128     198
MD5 + Blowfish    148

Table 3.1: IPsec performance over a 1000 Mbps network

Algorithms          Throughput (Mbps)
No Algorithm        427
3DES-EDE-CBC-SHA    86
DES-CBC-SHA         152
RC4-128-SHA         219
RC4-128-MD5         246
EXP-RC2-CBC-MD5     216

Table 3.2: SSL performance over a 1000 Mbps network


One has to ignore that different algorithms are used in IPsec and SSL in order to draw any conclusions from the performance results. It should however be possible to roughly compare SHA1 + DES (IPsec) with DES-CBC-SHA (SSL), and SHA1 + 3DES (IPsec) with 3DES-EDE-CBC-SHA (SSL); tables 3.1 and 3.2 clearly show that the SSL implementation in this performance test is considerably faster than its IPsec counterpart.
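Using those two roughly comparable pairings, the difference can be quantified directly from the published figures in tables 3.1 and 3.2:

```python
# Throughput figures (Mbps) taken from tables 3.1 (IPsec) and 3.2 (SSL).
ipsec = {"SHA1 + DES": 110, "SHA1 + 3DES": 69.5}
ssl = {"DES-CBC-SHA": 152, "3DES-EDE-CBC-SHA": 86}

# Pair up the roughly comparable cipher suites and compute the SSL speedup.
pairs = [("SHA1 + DES", "DES-CBC-SHA"), ("SHA1 + 3DES", "3DES-EDE-CBC-SHA")]
for ipsec_suite, ssl_suite in pairs:
    speedup = ssl[ssl_suite] / ipsec[ipsec_suite]
    print(f"{ssl_suite} vs {ipsec_suite}: {speedup:.2f}x")
```

In these two pairings the SSL tunnel comes out roughly 24–38% ahead, which is the basis for the comparison above.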

3.2.4 Conclusions

Considering the fundamental differences between IPsec and SSL and that they both have their advantages and disadvantages, it would not be fair to say that one of the protocols is better than the other one.

– Clients

SSL does not require that special VPN clients are installed and configured on devices to be used remotely from an untrusted network. All that is needed is a modern web browser.

– Security flaws on the remote host

The portability in SSL VPN can be a security issue if the internal network is accessed from insecure and possibly infected public devices.

– Site-to-site solutions

IPsec is superior when it comes to site-to-site solutions on fixed connections.

– IPsec trust issues

IPsec can only create a secure tunnel between a client and a terminating IPsec VPN gateway at the edge of the internal network, which gives the user access to all the data on the internal network even if the user is only interested in checking his or her email account. SSL can create end-to-end connections that are better suited for such scenarios.


Chapter 4

Accomplishment

This section describes the work process with regard to the preliminary schedule.

4.1 Preliminaries

Table 4.1 shows a preliminary schedule for all the work involved in this master's thesis. Prior to the first item in the schedule, a project specification and some preparations were made.

All the work, including report writing, is planned to take place at the Clavister office in Örnsköldsvik. Even though the schedule specifies that the report writing will take place at the end of the project, some writing should be done throughout the entire project to ease the finalization of the report.

Weeks  Tasks
4      Theory and studies on traffic generators.
       Identification of possible candidates for the evaluation.
       Familiarization with the work environment.
1      Initial evaluation and selection of the primary software candidates.
4      In-depth quantitative evaluation of the most promising generators.
1      Analysis of evaluation results and selection of software to integrate.
5      Integration of the most fitting software into the test framework.
5      Report writing.

Table 4.1: Preliminary schedule

4.2 How the work was done

Studies on traffic generators and identification of candidates

I started off the project by reading up on all the possible traffic generators. Early on, Clavister test engineer Jens Ingvarsson held an introduction on how the Ixia traffic generator works during RFC 2544 tests, to give some insight into what characteristics to look for when identifying possible candidates for the evaluation. In parallel with the theoretical work, I also installed and familiarized myself with some servers that I was later going to test the traffic generators on. During this time I realized that there actually were plenty of open source traffic generators out there, of course with varying quality and features. I successfully compiled and tried most of them on a server running Ubuntu[33] Linux. I also produced an evaluation template with some criteria to be used later on in the evaluation.

The initial plan of 4 weeks for the introduction and for identifying candidates was a good approximation of how much time the job required.

Studies on virtual private networks

The plan was to do all the theory studies in the first four weeks, but due to several circumstances I ended up doing most of the VPN studies after the traffic generator studies and the implementation were complete. This did not cause any interference with the other parts of the project.

Initial evaluation and selection of the primary candidates

The scheduled 1 week for investigating the possible candidates was a bit optimistic; it turned out to require more time than expected due to some problems with one of the most promising candidates (Iperf). The build I was working with (the latest stable release at the time) consumed enormous amounts of CPU, and I spent some time investigating and patching without any greater success. The problem was finally solved by switching to the latest version from the Iperf SVN repository.

In-depth quantitative evaluation of the most promising generators

I ended up with only 2 candidates left to evaluate in depth. I executed my evaluation template on both candidates and ended up with a set of scores for the criteria. The most time consuming part was the extensive performance tests. The time frame of 4 weeks was a good approximation, but I used a couple of extra days.

Analysis of evaluation results and selection of software to integrate

The outcome of the evaluation was very clear and I did not have to spend much time selecting which software to use for the implementation. One week for this was more than enough.

Integration of the best fitting software into the test framework

I initially thought that my scheduled time frame of 5 weeks was a bit much, but after some time I realized that this phase was going to be more time consuming than I had originally imagined. I did not, however, have any major problems during this phase, except that I lost one of my servers and had to replace it with a new one, with all the extra work that comes with that. In the end, I had to accept that perfectionism is not possible within the time constraints of the project.

Report writing

Even though the plan was to write the report in parallel with the other work, that was not what I ended up doing. Instead I took notes during all the work and wrote the complete report at the end of the project. The planned 5 weeks was thus a relatively good approximation.


Chapter 5

Results

5.1 Phase 1 – Evaluation

This part of the project is focused on the evaluation of traffic generators.

5.1.1 The criteria

The criteria for this evaluation were established during several meetings with the Quality Assurance (QA) team at the Clavister office in Örnsköldsvik. During these meetings it was made clear that the desired application needed to have the following properties:

Functionality

The following functionality is required in order for a tool to be a good option for the implementation phase:

– Ability to run multiple instances on the same machine or at least be able to utilize multiple network interfaces.

– Need to be able to specify packet sizes in UDP transfers.

– Need to be able to specify packet send rate in UDP transfers or be able to specify bandwidth.

– Need to be able to specify the duration of the sending.

– It must be possible to configure and launch the tool from scripts.

– Structured, informative and parsable reports.

– The reports must include number of packets sent and number of packets received or it must be possible to derive that information easily.

– It is desired that the application can report statistics at specified intervals.

– Some kind of documentation should be available covering the usage of non-trivial use cases.

– If possible, it is desired that the application have some kind of community backing.
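To illustrate the parsability requirement, a report line in a simple delimited format (the layout below is invented for this example and is not the output of any particular tool) can be reduced to the packet counts needed for loss calculations:

```python
# Hypothetical report line: protocol;packet size;duration;sent;received.
# The field layout is invented for this example.
report_line = "udp;512;60;4500000;4498200"

proto, packet_size, duration, sent, received = report_line.split(";")
sent, received = int(sent), int(received)

# Packet loss in percent, derived from packets sent and packets received,
# as the reporting requirement above demands.
loss_pct = 100.0 * (sent - received) / sent
print(f"{proto} {packet_size}B for {duration}s: loss {loss_pct:.3f}%")
```

A report that exposes sent and received counts in a structured form like this is what makes automated comparisons against a reference generator possible in the next phase.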


Performance

Considering that the main purpose of the second phase of the project is to implement a testing environment, it is required that the underlying traffic generator is capable of performing as well as possible and can hopefully be fairly compared to professional testing equipment like Ixia[15].

Stability

For a candidate to be acceptable, it has to be able to run stably for a long time without any strange behavior. In order to prove that an application is stable enough, it will be set up to run at half of its maximum capacity for 100 hours.

Quality

In order for a candidate to be useful, its generated traffic must be validated against reference tests from an Ixia traffic generator.

Scalability

It is desired, but not required, that the candidates have functionality to scale up the system by adding more hardware in the form of servers. However, if such functionality is not available, it is desired that multiple instances can be run in parallel without interference. In either case, it is required that multiple network interfaces can be utilized in each server.

5.1.2 Selection

To perform a competent evaluation, a large list of possible candidates is required. That list is then narrowed down to a smaller but more competent list of candidates. The bigger list was assembled by browsing various websites listing open source projects and websites that focus on network benchmarking.

In order to filter out the most fitting tools, a technique called “Elimination By Aspects” (EBA)[19] has been applied to the list of candidates. The candidates that did not fulfill the minimum functionality requirements were eliminated from the evaluation, resulting in an even smaller and even more competent list. This technique was used to improve the efficiency of the evaluation process, since there is no real use in running time consuming performance and stability tests on candidates that do not provide the required functionality.

Additionally, some initial performance tests were performed in order to see if it would be possible to filter out some candidates due to obvious performance issues. The score from this test is analyzed, and the candidates whose score is not within 20% of the top score are eliminated from the evaluation. Considering the purpose of this initial performance test, nothing but pure packet generation performance is taken into account.

Functionality requirements

The large set of candidates is mapped against the list of functionality requirements defined in section 5.1.1. All candidates are analyzed in order to make sure that they have all the required functionality. In case a candidate is missing some functionality, this is noted and the candidate is eliminated from the evaluation.


The most common elimination reason was missing or insufficient reporting functionality. In order for a candidate to be useful in the next phase of this project, it is essential that the results from the application are both well detailed and possible to parse for use in other scenarios.

Candidate     Reason for elimination
Bit-twist     No report functionality.
Brute         No report functionality.
Curl-loader   No UDP.
Harpoon       No report functionality.
KUTE          No report functionality.
MGEN          Insufficient reports.
Netperf       Inability to specify bandwidth.
Mz            No report functionality.
Pktgen        No report functionality.
RUDE/CRUDE    No report functionality.

Table 5.1: List of candidates eliminated due to missing functionality

Initial performance test

In this initial performance test, two servers with multiple gigabit interfaces are used. From my own experience, it is clear that the best possible performance is achieved if each server only needs to focus on one task: receiving or sending. If one server acts both as a sender and a receiver, the high CPU utilization of packet generation makes the receiving part drop packets, which leads to very poor overall performance. The specification of the servers can be found in appendix A. To avoid any external disturbance in the network, the servers are directly connected to each other without any switch or similar device between them.

Figure 5.1 describes the basic setup.

Figure 5.1: Server setup (Server 1 directly connected to Server 2)

In order to evaluate packet generation performance, a set of tests with different packet sizes is performed and the maximum achieved performance is noted. Extra weight is put on the smaller packet sizes, since they are the most expensive to generate and in order to create a linearly increasing score weight relative to the packet size.


The columns with “Value” in the header show the maximum measured bandwidth during these tests. Formula 5.1 is used to calculate a score between 0 and 10 based on the achieved network throughput and the maximum theoretical throughput:

    Score = (Measured Mbps × 10) / (NIC Speed × Number of NICs)    (5.1)

Since this test utilizes 4 gigabit network interfaces and only unidirectional traffic flows, the equation for the score in this specific test is:

    Score = (Value × 10) / (1000 × 4)
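As a sketch, formula 5.1 translates directly into code (the function name and defaults are illustrative; the defaults match this test's 4 gigabit interfaces):

```python
# Formula 5.1 as code. The defaults reflect this specific test:
# 4 unidirectional gigabit (1000 Mbps) interfaces.
def score(measured_mbps, nic_speed_mbps=1000, num_nics=4):
    """Scale measured throughput against the theoretical maximum to 0-10."""
    return measured_mbps * 10 / (nic_speed_mbps * num_nics)

# Hpcbench's 160 Mbps with 64 byte UDP packets scores 0.4, as in table 5.2:
print(round(score(160), 2))  # -> 0.4
```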

                        Hpcbench                  Iperf
Criteria       Weight   Value  Score  W. Score   Value  Score  W. Score
UDP 64 byte    8        160    0.4    3.2        140    0.35   2.8
UDP 128 byte   4        315    0.79   3.16       262    0.66   2.64
UDP 256 byte   2        575    1.44   2.88       427    1.07   2.14
UDP 512 byte   1.5      1040   2.6    3.9        802    2.01   3.015
UDP 1024 byte  1.25     2120   5.3    6.625      1855   4.64   5.8
UDP 1280 byte  1        2640   6.6    6.6        2453   6.13   6.13
UDP 1518 byte  1        3000   7.5    7.5        2768   6.92   6.92
Total score             24.63         33.865     21.78         29.445

                        D-ITG
Criteria       Weight   Value  Score  W. Score
UDP 64 byte    8        95     0.24   1.92
UDP 128 byte   4        182    0.46   1.84
UDP 256 byte   2        332    0.83   1.66
UDP 512 byte   1.5      655    1.64   2.46
UDP 1024 byte  1.25     1257   3.14   3.925
UDP 1280 byte  1        1797   4.49   4.49
UDP 1518 byte  1        2042   5.11   5.11
Total score             15.91         21.405

Table 5.2: Results from initial performance tests

In table 5.2 we can see the results from the performance test. We are interested in the weighted total scores in order to determine whether any of the candidates fall outside the 20% range of the highest score. Formula 5.2 is used to calculate the percentage difference between the highest score and the others:

    1 − WS / WS_max    (5.2)

The results make it clear that D-ITG cannot compete with the other two candidates when it comes to pure UDP traffic generation performance, and it is therefore eliminated from the evaluation.
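As a sketch, the elimination rule can be applied to the weighted totals from table 5.2 (names and structure are illustrative):

```python
# Formula 5.2 applied to the weighted totals of table 5.2: candidates more
# than 20% behind the top score are eliminated.
totals = {"Hpcbench": 33.865, "Iperf": 29.445, "D-ITG": 21.405}
best = max(totals.values())

for name, ws in totals.items():
    diff = 1 - ws / best  # formula 5.2
    verdict = "eliminated" if diff > 0.20 else "kept"
    print(f"{name}: {diff:.0%} behind, {verdict}")
```

Iperf ends up roughly 13% behind the top score and is kept, while D-ITG is roughly 37% behind and is eliminated, matching the conclusion above.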

5.1.3 Evaluation

The evaluation step is done by evaluating each criterion in depth for all of the candidates that are left from the selection step. Each candidate gets a relative score for each criterion, and each criterion is weighted. At the end of the evaluation we will have a weighted evaluation matrix that represents the final result of the process.
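As an illustration of how such a matrix is computed, the following sketch reproduces the totals of the final evaluation matrix in table 5.9 (the data structures are illustrative, not taken from any actual implementation):

```python
# Each criterion score is multiplied by its weight; the weighted scores are
# summed into the candidate's total. Data: final criterion scores (table 5.9).
weights = {"Performance": 2, "Stability": 1, "Functionality": 2, "Scalability": 1}
scores = {
    "Hpcbench": {"Performance": 6, "Stability": 10, "Functionality": 6, "Scalability": 7},
    "Iperf":    {"Performance": 5, "Stability": 10, "Functionality": 4, "Scalability": 3},
}

for candidate, s in scores.items():
    weighted_total = sum(weights[c] * v for c, v in s.items())
    print(candidate, sum(s.values()), weighted_total)  # raw and weighted totals
```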


Introduction

Hpcbench is a Linux based open source network benchmark tool. The project is administrated by Ben Huang¹ and is available for download on SourceForge[12]. Hpcbench is capable of communicating over the MPI protocol in addition to TCP and UDP, a feature that is of no interest in this evaluation. According to the project's page, Ben Huang seems to have been the only active developer, and there have been no updates since 2006-08-10. There exists no review of Hpcbench on its SourceForge project page.

Iperf is an open source network benchmark tool that is available for most modern operating systems, but we will only concentrate on the Linux version in this evaluation. The project is hosted on SourceForge[14], and at the time of writing there are 6 members added to the project, with Jon Dugan² and “mitch”³ as administrators. The mailing list[29] is fairly active, with over 1500 messages between December 2002 and the end of 2009. 56 users have reviewed Iperf on SourceForge and 91% of these users are positive and recommend the application. Some say that Iperf is “almost a de-facto standard for testing network bandwidth”.

Functionality

In order to take part in the evaluation, both candidates have already shown that they have the required basic functionality. That does not, however, guarantee that the claimed functionality works well and as intended. In order to answer these questions, a test environment with both candidates has been set up and the candidates have been examined. All the required functionality has been tested and the experience that the candidates provide has been noted. Two test servers were utilized during this test; they were directly connected to each other in the same way as during the initial performance test (see figure 5.1).

Iperf can be configured to use threads or not. Iperf generates extremely high CPU load when using threads, which severely degrades performance and causes packets to be dropped. When running without threads, the CPU load is reasonable and no packets are dropped unnecessarily. The downside of running Iperf without threads is that each receiving instance is limited to only one sending instance, which means that the number of receiving instances has to be equal to the number of sending instances. My loose theory about this behavior is that Iperf confuses the packets' IDs when they belong to two different UDP streams.

For some reason it seems very hard for Iperf to avoid packet loss during the first seconds of a run.

Iperf's functionality for specifying UDP bandwidth is inconsistent, especially at higher bandwidths. When requesting a certain number of Mbps, the actual number that Iperf generates can be considerably more or less. For example, specifying 905 Mbps with 1518 byte packet size results in 906 Mbps, but specifying 910 Mbps with 1518 byte packet size results in 977 Mbps. This is a behavior that is far from acceptable in a scenario like this.

Even though Iperf has some bugs, Hpcbench is far from perfect either.

If a certain amount of bandwidth is not specified, Hpcbench automatically tries to generate as much traffic as possible. This is a good idea and might work well in some situations, but the calculation of the maximum possible bandwidth seems to fail when running multiple Hpcbench instances utilizing multiple network interfaces. Another scenario where the calculation fails is when using small packet sizes. The outcome of the failed calculations is heavy packet loss and something that Hpcbench calls “UDP Key packet loss”, which makes it impossible for Hpcbench to generate results. The only way to avoid this problem is to always specify a decent amount of bandwidth.

With Hpcbench it is possible to specify the number of seconds it will generate traffic. It is also possible to specify how many times Hpcbench will repeat itself. Hpcbench in UDP mode never outputs any reports until a session is over, meaning that in order to get any feedback during a long run, one has to specify an appropriate run time and repeat it X number of times. However, there are some problems with this design. First of all, the maximum run time is hardcoded to 60 seconds, and I had to increase that value in order to run for longer than 59 seconds. Secondly, after each test run Hpcbench sleeps for a hardcoded number of seconds (2 seconds by default) in order to collect test results. I have successfully decreased that value to 1 second, but it is impossible to go any lower without serious modifications of the source code. Combining the need for repeated runs and the unavoidable pause between runs makes it impossible to set up a continuous flow of UDP traffic. The user will have to be aware of the pause between runs and take that into consideration when evaluating the results.

¹ Ben Huang, hben@users.sf.net
² Jon Dugan, jdugan@users.sf.net
³ “mitch”, mitchkutzko@users.sf.net

Both candidates have some major problems, but in different areas. In order to grade the two candidates, one has to determine how critical the different problems are. The two major problems that Hpcbench suffers from are both serious, but they can be avoided to some extent and the application remains usable.

The problem with specifying bandwidth in Iperf is, however, very critical, and the uncertainty in the results makes it impossible to use Iperf under these circumstances.

The final score for functionality is hard to determine precisely and has to be estimated. The functionality score is calculated using formula 5.3, where P1 is the number of major problems and P2 is the number of critical problems. If the result is less than 0, the final score will be 0.

    Score = 10 − 2 × P1 − 4 × P2    (5.3)

Hpcbench suffers from two major problems, while Iperf suffers from one major and one critical problem.

Hpcbench score: 6
Iperf score: 4
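Formula 5.3, including the clamp at 0, can be sketched as (the function name is illustrative):

```python
# Formula 5.3: start at 10, subtract 2 per major problem (P1) and 4 per
# critical problem (P2), never going below 0.
def functionality_score(p1, p2):
    return max(0, 10 - 2 * p1 - 4 * p2)

print(functionality_score(2, 0))  # Hpcbench: two major problems -> 6
print(functionality_score(1, 1))  # Iperf: one major, one critical -> 4
```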

Performance

In addition to the initial performance test runs prior to the evaluation phase, described in section 5.1.2, some more accurate performance tests had to be done. The physical setup was identical to the initial performance test, except that these tests involve more network interfaces. In order to identify the optimal setup of the network interfaces, several tests had to be performed with different setups and traffic flows. The tests are graded individually with meta scores, calculated using formula 5.4, where I is the set of weighted scores from a single test and n is the length of that set. The final performance score is then calculated using formula 5.5, where WS is a set of weighted scores and n is the length of that set.

    Meta score = (Σ_{i=1}^{n} I_i) / n    (5.4)

    Final score = (Σ_{i=1}^{n} WS_i) / n    (5.5)
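Both formulas are plain averages. A sketch using the per-test meta scores reported below (rounding the fractional mean to the nearest integer grade is my assumption about how the final scores are produced):

```python
# Formulas 5.4 and 5.5 are both plain means; the final grade is rounded to
# an integer (an assumption about how fractional means become grades).
def mean(values):
    return sum(values) / len(values)

hpcbench_meta = [9, 6, 5, 5]  # meta scores of the 1-4 interface tests
iperf_meta = [6, 6, 4, 3]

print(round(mean(hpcbench_meta)))  # -> 6
print(round(mean(iperf_meta)))     # -> 5
```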

                        Hpcbench                  Iperf
Criteria       Weight   Value  Score  W. Score   Value  Score  W. Score
UDP 64 byte    4        139    1.39   5.56       112    1.12   4.48
UDP 128 byte   2        262    2.62   5.24       167    1.67   3.34
UDP 256 byte   1        492    4.92   4.92       290    2.9    2.9
UDP 512 byte   1        880    8.8    8.8        576    5.76   5.76
UDP 1024 byte  1        935    9.35   9.35       910    9.1    9.1
UDP 1280 byte  1        950    9.5    9.5        931    9.31   9.31
UDP 1518 byte  1        955    9.55   9.55       934    9.34   9.34
Total score             46.13         61.72      39.2          44.23

Table 5.3: Results from performance test utilizing 1 gigabit interface with unidirectional traffic

The data in table 5.3 represents the results from a performance test utilizing 1 unidirectional gigabit interface. The meta scores are calculated using formula 5.4.

Hpcbench score: 9
Iperf score: 6

                        Hpcbench                  Iperf
Criteria       Weight   Value  Score  W. Score   Value  Score  W. Score
UDP 64 byte    4        176    0.88   3.52       144    0.72   2.88
UDP 128 byte   2        344    1.72   3.44       204    1.02   2.04
UDP 256 byte   1        656    3.28   3.28       504    2.52   2.52
UDP 512 byte   1        1236   6.18   6.18       908    4.54   4.54
UDP 1024 byte  1        1870   9.35   9.35       1812   9.06   9.06
UDP 1280 byte  1        1900   9.5    9.5        1862   9.31   9.31
UDP 1518 byte  1        1910   9.55   9.55       1866   9.33   9.33
Total score             40.46         44.82      36.5          39.68

Table 5.4: Results from performance test utilizing 2 gigabit interfaces with unidirectional traffic

The data in table 5.4 represents the results from a performance test utilizing 2 unidirectional gigabit interfaces. The meta scores are calculated using formula 5.4.

Hpcbench score: 6
Iperf score: 6

The data in table 5.5 represents the results from a performance test utilizing 3 unidirectional gigabit interfaces. The meta scores are calculated using formula 5.4.

Hpcbench score: 5
Iperf score: 4

The data in table 5.6 represents the results from a performance test utilizing 4 unidirectional gigabit interfaces. The meta scores are calculated using formula 5.4.

Hpcbench score: 5
Iperf score: 3


                        Hpcbench                  Iperf
Criteria       Weight   Value  Score  W. Score   Value  Score  W. Score
UDP 64 byte    6        210    0.7    4.2        126    0.42   2.52
UDP 128 byte   3        300    1.0    3.0        228    0.76   2.28
UDP 256 byte   1.5      750    2.5    3.75       429    1.43   2.145
UDP 512 byte   1        1314   4.38   4.38       846    2.82   2.82
UDP 1024 byte  1        2247   7.49   7.49       1578   5.26   5.26
UDP 1280 byte  1        2775   9.25   9.25       2436   8.12   8.12
UDP 1518 byte  1        2865   9.55   9.55       2388   7.96   7.96
Total score             34.87         41.62      26.77         31.105

Table 5.5: Results from performance test utilizing 3 gigabit interfaces with unidirectional traffic

                        Hpcbench                  Iperf
Criteria       Weight   Value  Score  W. Score   Value  Score  W. Score
UDP 64 byte    6        208    0.52   3.12       112    0.28   1.68
UDP 128 byte   3        408    1.02   3.06       224    0.56   1.68
UDP 256 byte   1.5      808    2.02   3.03       428    1.07   1.605
UDP 512 byte   1        1412   3.53   3.53       812    2.03   2.03
UDP 1024 byte  1        1976   4.94   4.94       1568   3.92   3.92
UDP 1280 byte  1        2908   7.27   7.27       2208   5.52   5.52
UDP 1518 byte  1        3408   8.52   8.52       2500   6.25   6.25
Total score             27.82         33.47      19.63         22.685

Table 5.6: Results from performance test utilizing 4 gigabit interfaces with unidirectional traffic

The final performance scores, calculated with formula 5.5, are the following:

Hpcbench score: 6
Iperf score: 5

Stability

Both Iperf and Hpcbench have been set up to send and receive a constant flow of 500 Mbit/s UDP traffic for 100 hours using a 1024 byte frame size. The server setup used in this test is the same as in the initial performance test from the selection phase, see figure 5.1. Neither of the candidates dropped any packets during the tests, which makes it easy to give both candidates a full score.

Hpcbench score: 10
Iperf score: 10

Quality

Since neither of the tools is capable of generating traffic even close to gigabit line rate, there is no need to validate the quality of such tests. I have instead chosen to validate generated unidirectional 100 Mbit and 2x100 Mbit traffic. To give this test a touch of a real testing scenario, two servers were interconnected via a DUT, a scenario that has previously been tested using an Ixia traffic generator.

Figure 5.2 illustrates how the DUT and the servers were connected during this test. The candidates were configured to output a constant traffic flow for 10 seconds. In order to be able to compare results from the candidates to reports from the Ixia traffic generator, I had to make sure that the frame sizes the candidates created were identical to the ones from the Ixia generator. To achieve such behavior, the *nix application tcpdump[32] is used to capture the traffic and analyze the content of the ethernet frames. For the candidates to produce a 64 byte ethernet frame, they have to generate an 18 byte UDP datagram. Adding 28 bytes of IP[16] overhead along with a 14 byte MAC[17] header and a 4 byte checksum results in a 64 byte ethernet frame, see figure 5.3 for further explanation.

Figure 5.2: Illustration of the servers and DUT setup (Server 1 connected to Server 2 via the DUT)

Figure 5.3: Ethernet Type II Frame (64 - 1518 bytes): MAC Header (14 bytes), Data (46 - 1500 bytes), CRC (4 bytes)
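The byte bookkeeping above can be sketched as follows (constant names are illustrative):

```python
# UDP payload -> Ethernet frame size, as in the text: 20 byte IP header +
# 8 byte UDP header (28 bytes total), 14 byte MAC header and 4 byte CRC.
UDP_IP_OVERHEAD = 28
MAC_HEADER = 14
CRC = 4

def frame_size(udp_payload):
    return udp_payload + UDP_IP_OVERHEAD + MAC_HEADER + CRC

print(frame_size(18))    # -> 64, the minimum Ethernet frame
print(frame_size(1472))  # -> 1518, the maximum Ethernet frame
```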

The Ixia traffic generator uses the size of the ethernet frames in its reports and calculations, so in order to make a proper comparison, the ethernet frame size has to be used when calculating the speed of the candidates as well. Therefore the number of frames per second is noted and used to calculate the bandwidth using formula 5.6:

    Mbps = (frame size × fps × 8) / 10^6    (5.6)
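Formula 5.6 as a sketch, checked against the first Hpcbench row of table 5.7:

```python
# Formula 5.6: bandwidth in Mbps from Ethernet frame size and frames/second.
def mbps(frame_size, fps):
    return frame_size * fps * 8 / 10**6

# Hpcbench at nominally 100 Mbit with 64 byte frames (table 5.7):
print(round(mbps(64, 104167), 2))  # -> 53.33
```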

The Ixia traffic generator is an Ixia 1600T equipped with a LM1000STXS4-256 module.


Speed    Frame size  Hpcbench fps  Hpcbench Mbps  Ref. diff.  Ixia fps  Ixia Mbps
100Mbit  64          104167        53.33          -5%         109228    55.92
100Mbit  128         83841         85.85          0%          84201     86.22
100Mbit  256         44643         91.43          -1%         45218     92.61
100Mbit  512         23337         95.59          -1%         23476     96.16
100Mbit  1024        11887         97.38          -1%         11973     98.08
100Mbit  1280        9624          98.55          0%          9615      98.46
100Mbit  1518        8068          97.98          -1%         8127      98.69
200Mbit  64          166667        85.33          -1%         168973    86.51
200Mbit  128         138719        142.05         9%          126688    129.73
200Mbit  256         89286         182.86         -1%         90436     185.21
200Mbit  512         46675         191.18         -1%         46952     192.32
200Mbit  1024        23773         194.75         -1%         23946     196.17
200Mbit  1280        19246         197.08         0%          19230     196.92
200Mbit  1518        16136         195.96         -1%         16254     197.39

Table 5.7: Quality comparison of 100Mbit and 200Mbit unidirectional traffic flows from Hpcbench and Ixia traffic generators.

Speed    Frame size  Iperf fps  Iperf Mbps  Ref. diff.  Ixia fps  Ixia Mbps
100Mbit  64          99999      51.2        -8%         109228    55.92
100Mbit  128         83333      85.33       -1%         84201     86.22
100Mbit  256         43478      89.04       -4%         45218     92.61
100Mbit  512         23256      95.26       -1%         23476     96.16
100Mbit  1024        11905      97.53       -1%         11973     98.08
100Mbit  1280        9524       97.53       -1%         9615      98.46
100Mbit  1518        8130       98.73       0%          8127      98.69
200Mbit  64          153839     78.77       -9%         168973    86.51
200Mbit  128         117625     120.45      -7%         126688    129.73
200Mbit  256         86956      178.09      -4%         90436     185.21
200Mbit  512         46512      190.51      -1%         46952     192.32
200Mbit  1024        23810      195.05      -1%         23946     196.17
200Mbit  1280        19048      195.05      -1%         19230     196.92
200Mbit  1518        16261      197.47      0%          16254     197.39

Table 5.8: Quality comparison of 100Mbit and 200Mbit unidirectional traffic flows from Iperf and Ixia traffic generators.


Scalability

None of the remaining candidates has built-in support for scaling over multiple servers and/or network interfaces. In order to utilize multiple servers and network interfaces, multiple sending instances of the applications are required. Iperf running without threads (see the functionality evaluation on page 19 for an explanation) requires that each sending process has its own dedicated receiving instance, otherwise Iperf reports a high rate of dropped packets (a possible reason for this behavior is mentioned in section 5.1.3). Hpcbench, on the other hand, is fully capable of handling multiple sending threads in one receiving instance, which makes it much easier to manage and less resource demanding compared to Iperf without threads.

Hpcbench score: 7
Iperf score: 3

Results

Even though code quality is not part of the criteria, it is worth mentioning that both candidates have equally good code structure and good comments. Iperf has an advantage since it is still a fairly active project and has more developers involved, but this will only make a difference to the evaluation if it is a close call choosing between the candidates after the scores from the evaluation have been summarized.

The scores from the evaluation of the criteria can be found in table 5.9. It is clear that Hpcbench is superior to Iperf on every criterion, and it is safe to say that Hpcbench is the candidate that best fulfills the requirements set for this evaluation. It is hereby considered to be the only remaining candidate.

                        Hpcbench          Iperf
Criteria       Weight   Score  W. Score   Score  W. Score
Performance    2        6      12         5      10
Stability      1        10     10         10     10
Functionality  2        6      12         4      8
Scalability    1        7      7          3      3
Total score             29     41         22     31

Table 5.9: Final score from the evaluation phase

5.1.4 Conclusion

Considering the number of open source traffic generators found, I am a little disappointed that only a few of them have the functionality required for this project. The fact that most of the candidates were eliminated due to lacking report functionality unfortunately limits the list of candidates more than desired.

Since most of the candidates lack good and useful report functionality, my guess is that these tools are either incomplete or not designed to be used in scripted, automated scenarios.

It could be of interest to extend these candidates with proper report functionality. Doing so could possibly change the outcome of this entire evaluation, but that is unfortunately outside the scope of this project.


One thing that affects all three of the performance tested candidates is the poor performance they produce, especially when using small packet sizes. All the tests were performed on modern hardware with multiple network interface cards, but the high rate packet generation is far too CPU intensive to be comparable with the numbers that hardware traffic generators like Ixia are capable of. Since all of the tested tools run in user space, they are affected by scheduling, traversal of the network stack and other issues that pure hardware packet generators do not need to worry about. It might have been a fairer comparison if there existed a tool that generated its traffic in the network interface card instead of in the CPU.

One thing that turned out to be quite troublesome was the inconsistent use of the term “packet size” among the tested traffic generators. Some candidates define “packet size” as the size of the UDP/TCP data payload, others as the size of the IP data payload, others as the size of the Ethernet data payload, and the reference Ixia traffic generators define it as the size of the entire Ethernet frame. I had to rerun some tests when I realized that the produced packets were not of the size that I expected, and had to sniff the traffic in order to determine the true meaning of the “packet size” setting. One traffic generator even required some modification to support packet sizes of 64 byte Ethernet frames.

5.2 Phase 2 – Integration

This part of the project is focused on implementing a system that aggregates multiple Linux based servers into one traffic generating testing tool.

5.2.1 Design

Given a set of servers, one of them is chosen to act as a master and the others are considered to be slaves. Besides the web server, there is also a database that stores the results from previous tests and a queue of upcoming tests. The master acts as a command center, telling the slave nodes what to do.

See figure 5.4 for a visual presentation of the system design. The continuous lines symbolize function calls over the communication channel and the dotted lines symbolize database calls. The filled arrows symbolize the call while the empty arrows symbolize returned data.

Figure 5.5 shows the functionalities that the different parts of the design holds.

5.2.2 Communication channel

The communication between the web server and the traffic generation servers is implemented in PHP[11] over SSH[31]. All the return data is JSON[13] encoded to ensure portability. One might argue that it would be more efficient to communicate over HTTP or, even better, to implement a special protocol for this project only. The use of SSH for communication is motivated by easy setup, no need for extra services running, and the fact that most Linux systems come with an SSH daemon installed by default. Due to the time constraints of this project, simplicity is the way to go. Another and probably less resource-demanding approach would be to make use of SSHFS (Secure SHell Filesystem)[30] instead of constantly opening new SSH connections.
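A minimal sketch of this communication pattern, in Python rather than the project's PHP, assuming only that a node replies with a JSON document (function names and the canned reply are illustrative):

```python
import json

# node_command builds the argv for running a command on a slave node over
# SSH; decode_reply parses the node's JSON-encoded answer. The remote
# command string is whatever helper the node exposes (hypothetical here).
def node_command(host, remote_cmd):
    return ["ssh", host, remote_cmd]

def decode_reply(raw):
    return json.loads(raw)

# Demonstration with a canned reply instead of a live SSH round trip:
reply = decode_reply('{"finished": true, "lost_packets": 0, "mbps": 500.0}')
print(reply["finished"], reply["mbps"])  # -> True 500.0
```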


Figure 5.4: Overview of the system design with communication channel and database calls

5.2.3 Flows

This section explains some interesting flows and the logic behind some of the more common functionalities.

Execution of new test

When a user adds and saves a set of jobs in the web interface, the following scenario takes place. Note that the web server is referred to as “the server” and the web client as “the client”.

1. The client serializes an array of jobs into JSON format.

2. The client POSTs the JSON encoded array to the web server.

3. The server decodes the JSON string and inserts the new jobs into the database.

4. The server returns the ID of the new test to the client that starts polling the nodes for test logs.

5. The server queries the database to check whether there are any locks placed on the requested resources or whether they are available.

6. If not all resources are available at the time, a cronjob queries the database every X minutes and checks whether the locks on the resources have been dropped.

7. Once all the needed resources are available, the server contacts the hosts that will act as receivers and orders them to start a Hpcbench receiving instance if one is not already running.

8. The server places locks on the requested resources in the database.

9. The server contacts all the affected nodes and tells them to launch their clients.

10. Once the nodes are finished with their tasks, they parse the logs they created during the run and insert that information into the database.
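Steps 5-9 above revolve around resource locks in the database. A minimal sketch of that locking logic, using an in-memory dictionary in place of the database (all names are illustrative):

```python
# Resource locking as in steps 5-9: a test either locks all of its
# requested resources at once, or none of them.
locks = {}  # resource name -> test ID holding the lock

def try_acquire(resources, test_id):
    """Lock all requested resources atomically, or none of them."""
    if any(r in locks for r in resources):
        return False  # at least one resource is busy; retry later
    for r in resources:
        locks[r] = test_id
    return True

def release(test_id):
    """Drop every lock held by a finished test."""
    for r in [r for r, t in locks.items() if t == test_id]:
        del locks[r]

print(try_acquire(["node1:eth0", "node2:eth0"], 1))  # -> True
print(try_acquire(["node2:eth0"], 2))                # -> False, held by test 1
release(1)
print(try_acquire(["node2:eth0"], 2))                # -> True
```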


Figure 5.5: Overview of the functionalities in the different parts of the system.
Master, server side: check queue, get test list, get test details, get test log, launch test, post test.
Master, client side: get test list, get test details, get test logs, post test.
Node: launch udp test, launch udp server, launch tcp test, launch tcp server, parse logs.

Polling for test results

The following scenario takes place when the web client starts polling for test data.

1. The client sends a GET request to the web server with the ID of the test.

2. The server sends a query to the database to check if the test data for the ID is available in the database.

3. If test logs are available in the database, the server queries the database and builds a JSON encoded array with the test results.

4. If test logs are not available in the database, the server connects to the nodes over the communication channel and asks for their logs.

(a) Each node parses its log file for the requested test ID and returns a JSON encoded array with all the essential results.

5. The server returns the JSON encoded array back to the client.

6. The client checks if the “finished” flag in the result is set. If not, the polling restarts.
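The polling loop above can be sketched as follows, with a stub standing in for the GET request to the web server (names and the canned replies are illustrative):

```python
import json
import time

# make_fetcher returns a stub that plays back canned JSON replies, standing
# in for the client's GET requests; poll repeats until "finished" is set.
def make_fetcher(replies):
    it = iter(replies)
    return lambda test_id: json.loads(next(it))

def poll(test_id, fetch, interval=0.01):
    while True:
        result = fetch(test_id)
        if result.get("finished"):
            return result
        time.sleep(interval)

fetch = make_fetcher(['{"finished": false}',
                      '{"finished": true, "mbps": 498.7}'])
print(poll(42, fetch))  # -> {'finished': True, 'mbps': 498.7}
```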

5.2.4 Client implementation

The web client relies heavily on JavaScript and AJAX[10] techniques. The web server sends JSON encoded data to the web client, which decodes and presents the data in a proper way. Under the graphical user interface there is a core of server side functionality written in PHP. This includes functions for getting test logs and launching new tests.

Since the client is fed with JSON encoded data, the web browser viewing the web client needs to be able to understand and parse this format. Some modern web browsers have native support for this, but far from all of them. To support a wide range of web browsers, an open source JavaScript library called “json2.js”[18] is included to extend the web browser with the missing functionality for handling JSON.

Screenshot

Figure 5.6 is a screenshot of the web client's user interface, showing the configuration area for new jobs, a list of previously executed tests, test reports from a previous test and an empty log box.

Figure 5.6: Screenshot of the web client user interface


Chapter 6

Conclusions

It was a bit of a disappointment to realize halfway through the project that the main goal of the whole process would not be possible to fulfill, not even close. The initial plan was to achieve 40 Gbit/s of traffic utilizing a couple of servers, but that would require many high performing and expensive servers. Considering that Hpcbench almost requires unidirectional traffic to avoid packet loss, and the poor performance that each server pair actually puts out, one is probably better off investing in real but expensive hardware based traffic generators instead of relying on a dubious and still expensive open source solution, at least for serious testing scenarios. Considering the price of hardware based traffic generators and the research presumably put into making them as effective as possible, it is no big surprise that open source software running on regular x86 hardware is nowhere near as effective when it comes to pure packet generation. On the other hand, considering the price tag of a hardware traffic generator, it would be strange if a couple of servers running an open source software traffic generator could compete in terms of quality and functionality. Open source traffic generators, and the outcome of this project, are however still a valid alternative for minor performance tests where extreme accuracy and performance are not required, i.e. scenarios where a developer wants to make sure that a certain feature works as intended under a decent amount of traffic.

I realized after some time that this project was more time consuming than I first imagined, which led to some drawbacks and limitations in the final product. This topic is further discussed in the limitations section (section 6.1).

6.1 Limitations

One major limitation that I have not been able to solve is the delay between sessions in Hpcbench's UDP mode. According to the documentation in the source code, Hpcbench uses that delay to send results from the receiver to the transmitter. The purpose might be legitimate, but the outcome rules out the possibility of generating consistent UDP flows longer than 60 seconds without intermission.

The following bugs are currently known in the implementation:

1. If a user tries to add multiple traffic flows (jobs) with different duration, the report functionality will fail at calculating the final result and no warning will be given.

2. There is a 1 second delay between each “repeat” in UDP mode that is not taken into consideration when calculating the total run time of a certain test.
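Bug 2 above means the reported total run time undercounts by one second per repeat boundary. A corrected estimate can be sketched as (names are illustrative):

```python
# Wall-clock time of a repeated Hpcbench UDP session, counting the pause
# between runs that the current report calculation ignores.
def total_run_time(duration_per_run, repeats, pause=1):
    return duration_per_run * repeats + pause * (repeats - 1)

# Five 60-second runs take 304 s of wall-clock time, not 300 s:
print(total_run_time(60, 5))  # -> 304
```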


6.2 Future work

During my investigation in the first phase, I stumbled upon a solution from Nmon[23] that enables packet capture at wire rate using nothing but modern ordinary hardware, no special network interfaces. Their solution is based upon a new type of network socket called PF_RING[24] and a custom threaded NAPI[8] called TNAPI[25], designed for certain Intel 1 and 10 gigabit network cards. Their solution takes advantage of multicore CPUs and bypasses some of the network layers in the kernel. According to their web page, their solution has been used to capture traffic at a rate of 5 million packets per second using a 10 gigabit network interface. A library named nCap[5] from the same organization is claimed to have functionality for both transmitting and receiving data. The library makes use of PF_RING from Nmon and enables applications to create a straight path between user space and the adapter. Using this technique requires a license key for each network interface. If such a solution could be used for packet generation and transmission, it would probably outperform the traffic generators handled in this project by far. I have however not been able to investigate this any further due to the time and money constraints of this project.

Another alternative and relatively cheap solution would be to invest in special hardware and build a traffic generator based on that. The use of special hardware would overcome the drawbacks that ordinary software based traffic generators have with traversing the network stack, CPU load, disc I/O and interrupts; things that contribute to poor performance and inaccuracies. A small group at Stanford University has come up with a PCI card named NetFPGA[21]. NetFPGA consists of the following hardware:

– Xilinx Virtex-II Pro 50 for FPGA logic
– 4 gigabit ethernet ports
– 4.5 MB SRAM
– 64 MB DDR2 DRAM

Its entire datapath is implemented in hardware, which they claim makes it possible to achieve full line rate at all the network interfaces simultaneously. The purpose of the NetFPGA unit is to enable students and researchers to build prototypes of high-speed, hardware accelerated networking units at low cost. The card design is open source along with (from what I can tell looking at their web page) all the software. There exists a fairly large set of open source projects, mostly from universities, hosted at the NetFPGA web site. One of them is a simple packet generator by Glenn Gibb¹ and Adam Covington², both from Stanford University. According to the project's web page[22] and a published paper[9], the NetFPGA based traffic generator works with extreme accuracy and at full line rate at all times. I have not been able to try out NetFPGA on my own, so I can only speculate, but a traffic generation solution based on NetFPGA or similar is probably a competent replacement for or complement to expensive high-end tools like Ixia, considering only performance and accuracy and not the range of other functionalities. It is worth mentioning that Ixia uses FPGA logic in their load modules for packet generation and packet capturing[15].

¹ Glenn Gibb, grg@stanford.edu
² Adam Covington, gcoving@stanford.edu

References
