
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2020

Media Gateway Transfer Analysis

ARON HANSEN BERGGREN

KTH

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Abstract

There are many protocols available for transferring files, some faster than others. With ever-increasing network speeds and amounts of media needing to be transferred, the need to minimize the amount of data sent is growing. The number of available protocols that make this possible is also increasing, making it harder and harder for teams to decide which one to implement in their particular case.

This thesis aims to give an overview of the most suitable protocols for direct media transfer over short- to long-distance WAN connections. To reach any conclusions, a substantial amount of theoretical work on transfer solutions was done so that the comparison only covers usable solutions, with reasoning as to why they make the cut for this type of application. A handful of suitable solutions were then tested on geographically distributed virtual machines in order to give realistic network conditions. The result is that the choice depends on the needs of the organisation, as the commercially available solutions are in general superior to the free ones not only in speed, but also in support and documentation. However, the open source solutions perform very well for being free to use. In order to say which solution is the ultimate one, far more resources would be needed to complete WAN transfers in excess of 10 Gb/s over many more network conditions.

The results quickly showed that the underlying protocol might not be the determining factor for speed. Instead, the use of efficient multi-stream transfers is what actually makes a difference.


Sammanfattning

There are many different protocols available for transferring files, and some are faster than others. With continuously increasing network speeds and larger amounts of media needing to be transferred, the need to minimize the amount of data sent grows as well. The number of available transfer protocols that make this possible also keeps increasing, which makes it harder and harder for teams to decide which protocol to implement for their needs. This thesis aims to give an overview of suitable protocols for direct transfer of media over both long and short geographical distances over WAN connections. To reach conclusions in this report, a large amount of theoretical work on transfer protocols was carried out to reduce the number of candidates to those suitable for this type of transfer. The remaining protocols were then tested between geographically separated virtual machines to reflect a realistic picture of how they perform. The result is mixed; it depends a lot on what is required in each case.

Commercial solutions perform best, in performance as well as in documentation and support. That said, the open source solutions also perform very well, above all with the advantage of costing nothing to use. However, we do not arrive at any ultimate solution, since a much larger amount of resources would be needed to carry out longer experiments at even higher speeds than 10 Gb/s over even more network conditions. The conclusion of the experiments is that the underlying protocol does not matter much for the final speed. What matters instead is the choice of a protocol with multi-stream capabilities, which determines how fast you can go.


Authors

Aron Hansen Berggren
aronber@kth.se
Information and Communication Technology
KTH Royal Institute of Technology

Examiner

Peter Sjödin
Kista

KTH Royal Institute of Technology

Supervisor

Markus Hidell
Kista

KTH Royal Institute of Technology


Contents

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Commissioned Work
1.5 Target Audience
1.6 Ethics and Sustainability
1.7 Methods
1.7.1 Literature Studies
1.7.2 Testing and Benchmarking
1.8 Delimitations
1.9 Disposition

2 Background
2.1 Concepts
2.1.1 Protocol
2.1.2 Buckets
2.1.3 Hybrid Cloud Solution
2.1.4 Open Systems Interconnection (OSI) Model
2.1.5 REST and Application Interfaces
2.1.6 Network Congestion
2.1.7 User and Kernel space
2.1.8 TCP
2.1.9 UDP
2.1.10 Multistream
2.1.11 Firewall
2.1.12 Encryption
2.1.13 Metadata
2.2 Iconik Storage Gateways (ISG) current setup
2.3 Candidates
2.3.1 HTTPS
2.3.2 FTPS
2.3.3 SFTP
2.3.4 SCP
2.3.5 WDT
2.3.6 QUIC
2.3.7 PA-UDP
2.3.8 UDT
2.3.9 UFTP
2.3.10 FileCatalyst Direct
2.3.11 TIXstream
2.3.12 FASP

3 Approach
3.1 Literature Studies
3.2 Test Runs
3.3 Final Candidates
3.3.1 Experimental
3.3.2 Speed Matters
3.3.3 Ports Matters
3.3.4 Support Matters
3.3.5 Disqualified Protocols
3.3.6 Final Protocols Investigated

4 Test Environment
4.1 Test Suite
4.1.1 HTTPS
4.1.2 SCP
4.1.3 SFTP
4.1.4 UFTP
4.1.5 WDT
4.1.6 TIXStream MFT
4.1.7 Notes

5 Performance Evaluation
5.1 Single Stream Results
5.2 Multistream Results
5.3 SCP
5.4 SFTP
5.5 UFTP
5.6 WDT
5.7 TIXStream

6 Discussions
6.1 Experiment Discussion
6.2 Data Discussion

7 Conclusions and Further Work
7.1 Conclusion
7.2 Future Work

References


Chapter 1

Introduction

1.1 Background

There are many ways to transfer files, and some are faster than others. The Iconik Storage Gateway (ISG) allows users to have local access points to their media and to reach it from any location, even if the remote gateway is on another continent. The gateways can also transcode media to different qualities and resolutions, analyze a set of storages for media, index it based on its attributes and create cloud assets corresponding to the media. It takes these orders from the Iconik service running in the cloud [1].

If a user's available ISG does not possess the media the user wants to work on, the user can request that media. This causes the ISG which does have access to that media to upload it to the cloud, and then triggers the ISG available to the user to download it. This is very inefficient, as it introduces delay and unnecessary workloads on servers and networks. Both customers and product owners at Iconik.io are asking for a better solution, but the resources to perform a field study on the topic, to find which transfer solution(s) are suitable for such a task, have not been available. Having this solution in place would reduce the amount of traffic generated by the gateways over WAN, especially on local networks where the gateways currently still have to go through the cloud instead of staying on the local network.

The time difference between sending media directly and using an intermediate cloud is significant, as the entire project needs to be transferred to an intermediate cloud before the destination gateway can download it from there. The purpose of the ISGs is to index files, synchronize this index online and make the files available once requested by a remote site, to benefit from the hybrid cloud approach. The requirement of the intermediate cloud as a storage medium prevents the gateways from showing their full potential.

The current method for transferring to the cloud uses parallel HTTP/2 connections to cloud storages. This method works for all supported configurations, but is rather slow and inefficient.


Figure 1.1: A typical Iconik Gateway use-case topology, with file transfer streams marked out: local firewalls in Vegas and Europe with media production and ingest office floors behind them, and Iconik (with its management queues) synchronizing with S3 and Google Cloud buckets. The missing link is the dotted connection between the firewalls. What to use to create the dotted links between the firewalls is what is being investigated.

1.2 Problem

The current transfer solution for the Iconik Media Gateways is very inefficient, doing each task twice, and is in dire need of a better way of transferring files. The current design removes the need for users to handle firewalls, as all transfers are trusted by the public cloud, but the time has come where it can no longer be the only available option. It simply takes too long and uses too many resources to send the same file twice. It should optionally be possible to perform the transfer directly, as a single transfer using some transfer protocol. Instead of choosing a protocol at random and then testing to see if it happens to transfer media efficiently, this report aims to provide background information and benchmarks over a range of plausible protocols. The transfer protocol must be able to do its work without interrupting the other tasks of the system.

1.3 Purpose

The reader should be able to choose a protocol for direct media transfers based on the information and results of this report. There are a lot of methods out there for securely transferring media, and some are surely better than others. With heavy emphasis on security and resource utilization1, some protocols are sure to perform better than others, but by how much? What are the available options? What are their drawbacks?

1Both in terms of network link utilization and system resources


1.4 Commissioned Work

The problem this thesis aims to solve comes from the Iconik development team at iconik.io. The Iconik Storage Gateways do not utilize direct transfers, as background information on how to do the implementation was missing. This thesis aims to resolve that specific issue of missing background information for direct transfers for this product.

1.5 Target Audience

The target audience of this thesis is those looking to implement secure and fast direct transfers of media between locations.

1.6 Ethics and Sustainability

The perk of performing this study is that it will present which protocols are good choices for direct transfers based on the previously described problems. This should ensure the protocol does not need to be switched in the near future. It should also heavily reduce the amount of traffic generated by those dependent on Iconik to access their media from more locations. Hopefully this will aid in the endless battle to send fewer network packets over the internet and save on network capacity.

Saving on network capacity leads to savings in electricity as the equipment which runs the internet is quite power hungry. This is becoming more important as the internet uses more and more of our electricity [2].

Ethically, the tests were all done between large data centers so no person and/or company should have been negatively affected by the network usage of this study.

1.7 Methods

1.7.1 Literature Studies

There exists a plethora of studies on TCP and UDP, so data for these protocols is readily available. Some of these studies contain relevant results, such as maximum achieved speeds, limitations and how security affects performance. Research into which protocols are relevant for this application was conducted, as not all transfer protocols are stable while fulfilling the desired properties to solve the issues present today. There exist a lot of options out there for file transfers, and so a study of them must narrow them down to only look at a subset of the suitable options.

1.7.2 Testing and Benchmarking

In order to validate the efficiency of the protocols over different network conditions, the benchmarks will be run between three different virtual machines in three different countries to ensure diversity in the delays and network conditions:

• Finland

• Belgium

• Oregon

The main machine transferred to and from is the virtual machine located in Finland. This machine has a relatively short delay to Belgium, and a worst-case long delay to Oregon. Between these virtual machines, a high fidelity video totaling 20 GB was sent. In order to remove any potential bottlenecks not related to the network conditions, the machines were all equipped with 8 cores and 32 GB of RAM, of which 25 GB was dedicated to a RAM-disk2 large enough to fit the 20 GB test file. A RAM-disk is used as otherwise the solid state drives in the machines would have become a bottleneck. At this configuration, the maximum bandwidth the VPS3 provider offers its VM customers is 16 Gbps of network capacity (2 Gbps per allocated CPU core).

2Local storage which is completely removed when the system powers down, but is extremely fast

3Virtual Private Server
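As a rough illustration of how such a RAM-disk can be created on a Linux host (a sketch only; the mount point below reuses the /teststore/ramdisk path seen in Chapter 4 and is not a documented part of the test setup):

    # Create a 25 GB tmpfs-backed RAM-disk and place the test file on it
    sudo mkdir -p /teststore/ramdisk
    sudo mount -t tmpfs -o size=25G tmpfs /teststore/ramdisk
    cp /path/to/testfile.mp4 /teststore/ramdisk/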


In order to produce repeatable results, all transfer solutions were mostly left at default settings; see Section 4.1 for more details on individual protocol configurations.

1.8 Delimitations

Not every protocol can be tested, and so the scope of compared network protocols must have a limit. The reasonable delimitations here are that the protocol must have good implementations for the operating systems which the gateways support:

• Windows 10 Professional

• Windows server 2016

• MacOS (as a daemon)

• Ubuntu Linux 16.04 LTS 64Bit

• RedHat Linux 7 64Bit

• CentOS 7 64Bit

While not listed, CentOS 8 was released around the time this study was performed, so while not yet officially supported it should also be considered.

Regarding the configurations, each test could potentially be optimized by setting different settings for each transfer to see how the speed would be affected. However, it is impractical to use per connection transfer settings outside of a lab environment. Since it is unreasonable to configure each deployed client’s network parameters for each connection, this is also outside of the scope of this study. This includes extending the functionality of the protocols themselves for features such as splitting a file into multiple segments for higher parallelisation of the transfer. Only the major obvious optimizations are made, such as increasing the buffer for UDP which will increase performance independent of the host as long as the resources are available. Finally, the evaluated protocols should have stable releases with some level of support and not have reports of general slow speed.

The number of networks tested is another delimitation. It would be very interesting to see how the candidates perform under different conditions. However, a lot more resources would be necessary to configure and test more networks, and as such it is considered outside the scope of this report.

With regard to networking, only the host layers are relevant (explained shortly in Table 2.1) for the results to answer the problems listed.

1.9 Disposition

In this report, there are in total seven chapters:

1. Overview of how and why things are done the way they are, with background history.

2. Theory over the different aspects with relevant backgrounds of the different components used to come to a conclusion later.

3. Breakdown of the methods used and preliminary cutoff for protocols to continue investigating, in order to keep the scope relevant.

4. Focus on how the finalists were actually benchmarked in order to state reproducible results.

5. Results from running the tests as specified in the previous chapter are shown and explained.

6. In the discussion the results are compared to one another and the benefits/drawbacks for choosing one of them over the others are presented.

7. Recommendations are presented based on previous chapters as well as recommendations for further investigations.


Chapter 2

Background

This chapter contains all background information and concepts needed to understand the terminology and technologies used.

2.1 Concepts

2.1.1 Protocol

The Cambridge Dictionary defines an IT protocol as "a computer language allowing computers that are connected to each other to communicate" [3]. In short, it is a set of rules for how to process outgoing and incoming data.

2.1.2 Buckets

Cloud storage buckets, often referred to as just buckets, are (often massive) storage spaces available in the cloud. These are easy to use and well integrated into numerous services around the web.

They appear to have infinite capacity, usually charging customers for the amount of data that is stored. Buckets are seen as reliable and secure storage, as authentication is required to access them while the actual storage is done redundantly and transparently to the consumer.

2.1.3 Hybrid Cloud Solution

A hybrid cloud solution allows a user or customer to choose where to store their files, often simultaneously, by using multiple storage gateways. A common scenario for these is workflows containing large media assets, which may not be feasible to work with remotely, but need to be readily available and backed up. Here, a local storage gateway for current projects combined with a cloud storage for previous and/or upcoming projects makes the most sense, as it is often infeasible to store all media assets locally.

2.1.4 Open Systems Interconnection (OSI) Model

The internet is a very complex and large system, usually divided into 7 different layers which go under the name "The OSI model". This model aims to separate different functions of different parts, as to distribute who is responsible for what.


OSI Model

Host Layers
  7 Application: High-level APIs, including resource sharing and remote file access
  6 Presentation: How to interpret the data, such as encoding, compression and encryption
  5 Session: Manages the open communications and separates multiple concurrent transfers
  4 Transport: Transportation of bytes (TCP, UDP, SCTP, QUIC), how the data is structured in each message; segmentation and acknowledgment

Media Layers
  3 Network: Structuring and managing of the nodes in the network, how to route packets and to which addresses
  2 Data Link: Reliable transmission between network nodes over some direct layer, such as cable
  1 Physical: Reception and transmission of bits

Table 2.1: The layers of interest are the "host layers"

There are arguably more layers as well, but these are neither official nor included in this report; there are several articles on "layer 8" out there for further reading. What is interesting here are the host layers, as we are interested in the speed and feature set of encrypted transport connections.

2.1.5 REST and Application Interfaces

When discussing application interfaces (APIs), the most common form on the web is sending messages using what is known as a RESTful API, which is a standard for how to format requests. Each request has a type, an address and possibly a body. It is then up to the application to match requests arriving at different addresses and with different request types to some function, and to pass along the body of the request if applicable. The most common request is the GET request, letting the service know that some resource should be sent back. The reverse is a POST request, which lets the service know that the caller wants it to update some state based on the data sent in the request body. Each GET request is mapped to a single resource, so if multiple resources are needed, so are multiple GET requests.
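As a small illustration of the request types described above (the host and paths are placeholders, not any real API), a GET and a POST request could look as follows using curl:

    # GET: request the resource at /assets/42 to be sent back
    curl -s https://api.example.com/assets/42

    # POST: ask the service to update state based on the JSON body
    curl -s -X POST https://api.example.com/assets \
         -H "Content-Type: application/json" \
         -d '{"name": "testfile.mp4", "size": 21474836480}'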

2.1.6 Network Congestion

Network congestion occurs when there is more data (in the form of packets) trying to flow over a network link than the capacity of the link. This happens when a network node receives more packets than it can process, so its buffers start to overflow in which case packets are dropped and lost. If left unchecked, the affected network will grind to a halt and become unusable. For this reason, transfer protocols implement some sort of congestion control, where they try to share available bandwidth equally [4].

2.1.7 User and Kernel space

User and kernel space can be seen as the two different levels applications run on in a computer. The kernel space is what runs underneath what the user sees. It manages the network stack for all applications, the scheduling of processes, hardware input and output and so on. User space is where your applications live. Here we find web browsers, coding environments, games and desktop environments.

Kernel space code ALWAYS has precedence over user space applications, but the kernel is also very restrictive about what is allowed to run there. For example, if a user space application such as the Firefox web browser is using all available system resources to stream some media, the system will still handle open network connections first.


2.1.8 TCP

TCP is widely used in applications due to it being readily available and because it guarantees delivery of all packets. It has configurable options for guaranteeing in-order packet delivery and, since the option for selective acknowledgments became available, performs "OK" under most circumstances.

One of its many features is congestion avoidance, which is what keeps the internet online with billions of simultaneous hosts; its absence is what brought down the internet back in 1988 [4]. New methods for congestion control are suggested all the time, each claiming different benefits over the previous ones [5]–[7].

However, none has proven efficient enough to become the standard for all operating systems; no algorithm was used by more than 50% of the server side of the internet, with the (CU)BIC algorithms coming close at almost 47% in 2014 [8]. This is no surprise, as CUBIC performs very well compared to most other algorithms [9].
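For reference, on a Linux host the congestion control algorithm used for new TCP connections can be inspected and changed at runtime; this is a generic sketch and not part of the test setup described later:

    # Show the algorithm currently used for new TCP connections
    sysctl net.ipv4.tcp_congestion_control

    # List the algorithms available on this kernel
    sysctl net.ipv4.tcp_available_congestion_control

    # Switch to CUBIC (commonly the default on modern Linux kernels)
    sudo sysctl -w net.ipv4.tcp_congestion_control=cubic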

2.1.9 UDP

UDP is known for being very small in footprint compared to other transfer protocols in the network stack, as it is nothing more than connection ports, a message length and a checksum to ensure the same content is present if the packet reaches its destination. There is no congestion control or avoidance, no logic to ensure packets are delivered in order or at all and no security [10].

This has led to it being used as the foundation for other protocols, especially those running high-speed transfers, as there is nothing stopping the protocol from going at full speed [11]. However, it also has no guarantees of actually delivering any files or data, which is why something around it is needed to ensure data integrity and security. These other protocols sometimes use TCP in order to communicate back to the sending machine that some information is missing. As UDP is unreliable by default, it cannot be used standalone, but its workings are the basis for further protocols. Some examples are presented in Section 2.3.

2.1.10 Multistream

Usually when performing a request, a single data stream is allocated for sending the request and receiving the response. A multistream request/response will use multiple streams and/or connections to deliver the data.
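As a toy illustration of the idea (the URL and byte ranges are placeholders, and this is not how any of the tested tools implement it), a single file can be fetched over two parallel streams by requesting different byte ranges and joining the parts:

    # Fetch the first and second half of the file in parallel using HTTP range requests
    curl -s -r 0-10485759 -o part1 https://example.com/testfile.mp4 &
    curl -s -r 10485760-  -o part2 https://example.com/testfile.mp4 &
    wait

    # Reassemble the parts into the complete file
    cat part1 part2 > testfile.mp4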

2.1.11 Firewall

The internet is a large and scary place, with lots of different actors with different intents. In order to keep each network separate and only allow the traffic of your choice, the concept of firewalls was born. The goal of the firewall is to only let authorized users' network requests through while filtering out packets with malicious intent. It achieves this goal by analyzing all packets addressed to and from the network to see which ones are from trusted users or requested by them, while blocking any suspicious activity. These filters can then be changed to allow traffic for publicly facing services to be reached from the outside. This is the same as telling the firewall "I trust that the traffic you are passing to this other node on the network is handled correctly by that node". Whenever someone visits a website, someone has explicitly told a firewall that web traffic to that web host is allowed.

2.1.12 Encryption

As put forth in the firewall text above, not everyone has your best interest at heart out on the world wide web. Therefore almost all web traffic is protected by a layer of encryption, which obfuscates the data in such a way that only the sender and the recipient are able to decode and read the messages/packets sent. When it comes to sending large amounts of files, this layer of obfuscation can be critical, as people and businesses do not want the public to know what they are sending online. The most common way of doing this over the internet is with TLS1, which also provides means of identifying public services as authentic (compared to impostors who pose as legitimate services). Worth mentioning is the predecessor to TLS, SSL2, which is now known to be insecure.

1Transport Layer Security

Both TLS and SSL operate on signed certificates, which are cryptographically verifiable files. This way, a third party ensures that you truly are who you claim to be, and your browser or application can then trust the source. It is not hard to create a certificate claiming to be any source, but for your browser or application to trust it, a trusted third party must first have signed it, verifying you as you.
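As a sketch of how such an unsigned, self-signed certificate can be produced with OpenSSL (of the kind later used for the HTTPS tests in Chapter 4; the file names and common name are assumptions):

    # Generate a self-signed certificate and private key, valid for one year
    openssl req -x509 -newkey rsa:4096 -nodes \
        -keyout server.key -out server.crt -days 365 \
        -subj "/CN=finland-testserver"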

2.1.13 Metadata

Metadata is all data associated with some other data that is not part of the original object itself. When it comes to video, it is everything that describes the video. You cannot see how long a video is from the video itself, but the video player knows it before starting by looking at the metadata of the media file. Another case where metadata is relevant is when keeping track of some object. Usually, one does not keep track of the file itself, but of some document outlining where the file is, how to get to it, who has access, what version it is and more.

2.2 Iconik Storage Gateways (ISG) current setup

It is important to understand how the ISGs operate today in order to evaluate how they might benefit from changing. There are generally 2 different cases where the gateways are transferring media:

1. A user needs access to media only available on another gateway than the one the user is (possibly) connected to.

2. A job is started in Iconik, either manually or by being triggered, to process media at some gateway without local availability to the media. This might for example be archiving media for long term storage, rendering a new version with new parameters or synchronizing the media to another application, such as a Cantemo Portal system.

Some of these actions are run on a local site, as part of the hybrid cloud approach to media management. If the media then needs to be transferred, the gateways talk to Iconik in the cloud and either request the files or notice available jobs, such as uploading files they have access to, by polling a queue of tasks in the cloud. This approach has the additional benefit of not requiring any network configuration, as the gateways always talk to the queues in the cloud and are always responsible for initiating the connections. A gateway will then either upload its version of the file to a cloud storage and be done, or it will find that the media it is looking for is not yet available in the cloud, in which case Iconik schedules a job for the gateway that does have the media to upload it, while the requesting gateway's queue stays empty. This process is repeated every few seconds.

Figure 2.1: Distributed ISGs have media assets which are transferable between each other using intermediate cloud storage. Notice that there is no direct link in between the ISGs today.

2Secure Sockets Layer


While the APIs of popular buckets such as BackBlaze, Amazon S3 and Google Cloud Buckets allow breaking down uploads into smaller transfer streams (using multiple POST requests), they do not allow breaking down the GET request of a file into multiple download streams. As such, there is a limitation in the number of data streams available to the application when transferring, which will always be present with the current design, illustrated previously in Figure 1.1.
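As an example of the upload side (a sketch assuming the gsutil command line tool and a hypothetical bucket name; this is not part of the ISG implementation), Google Cloud Storage uploads can be split into parallel components above a size threshold:

    # Upload a large file as several parallel components (parallel composite upload)
    gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
        cp /teststore/ramdisk/testfile.mp4 gs://example-media-bucket/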

Iconik synchronizes the files available on the cloud storages with internal indexes so that it can efficiently find any file the user wants. Each indexed asset in Iconik is easy for it to access, as Iconik itself only stores metadata (2.1.13) about the media assets, such as where each version of a media asset is and which related assets are available.

It is clear from the diagram that the link between gateways is missing, and one part of implementing it is choosing a network protocol to transfer the data in between.

2.3 Candidates

There are a lot of different transfer solutions that could be examined, and the further you look and the deeper your pockets, the more candidates you will find. Here are the most noteworthy ones, each with a short summary. Some also have a short explanation of why they did not make the cut for the benchmarking phase of this investigation.

Application Solutions

Application layer solutions for transferring media are looked at from layer 7 of Table 2.1. These all run over some combination of the previously mentioned transport protocols (TCP and UDP). First, some well-known file transfer protocols mainly sending packets over TCP are presented. Then come more exotic protocols running over other protocols such as UDP, or combinations of UDP and TCP, and lastly proprietary protocols developed by dedicated companies with industry support.

2.3.1 HTTPS

Being the protocol currently used for media transfer by the ISGs, as described in 2.2, and the fallback for traversing firewalls, other transfer protocols have a lot to prove against this one. HTTPS file transfers will as such be the baseline. It is encrypted with TLS and runs over TCP. A lot of effort has gone into optimizing the implementations of TCP for the different operating systems, which makes this protocol fairly attractive along with its wide availability.

2.3.2 FTPS

FTPS, or FTP over SSL/TLS (2.1.12), is a secure version of the popular FTP, which has been around longer than the Internet. While old and reliable, it has a major downside: it uses multiple ports per file transfer [12].

This is not expected to be acceptable in user systems: in real-world deployments of ISGs, file transfers can at most use a handful of ports, and since each transfer requires its own port they would risk quickly running out, or the network would have to be opened to more ports than desired.

2.3.3 SFTP

SFTP is based on the Secure Shell (SSH) protocol, with SFTP expanding to SSH File Transfer Protocol. While it has a similar name to FTPS, the two protocols should not be confused, as they are completely different. It is also important to note that there is another protocol called SFTP, the Simple File Transfer Protocol. The simple version is not looked at in this part, or at all in this investigation. First off, FTPS uses TLS/SSL to remain secure, while SFTP uses private-public key pairs, the same method as SSH [12]. While it may be a problem for some types of implementations to manage the keys necessary for this type of transfer, this can be done securely against Iconik in the cloud over HTTPS.

Secondly, SFTP runs under the same conditions as SSH, which requires a single port to be available. It also runs over TCP (as does SSH), and as such it should show a lot of similarities to the characteristics described in 2.1.8. SFTP also allows a user to manage the file system, being able to list remote files and directories and remove them.

2.3.4 SCP

Similar to SFTP, this protocol also runs over SSH and shares many of SFTP's features, albeit with less access to the file system itself. SCP focuses on just the file transfer. As the files will be indexed in the cloud, this is not a problem and might even be preferable. Just as with SFTP, it comes with the OpenSSH package, which is readily available.

2.3.5 WDT

The Warp speed Data Transfer protocol aims to use multiple TCP connections until the only possible bottleneck is hardware, either disk or network link speed. Originally developed by Facebook in order to move their databases around in data centers directly from system memory, it is now open source and available on GitHub under the BSD license [13]. There it comes both as a portable library and as a command line tool for testing and direct usage. It is also available as a system package in some package managers, for example in the Arch Linux family of distributions [14].

Since WDT's user base is small, there is little further information available on how it works or performs.

2.3.6 QUIC

QUIC3 is sometimes referred to as running TCP + TLS over UDP, as it shares their characteristics. It is in development primarily by Google and Mozilla and is under standardization by the IETF4 [15]. It is planned to be the transport protocol for HTTP/3 after it has been standardized. It uses the same congestion control mechanisms as TCP, which ensures that it treats network traffic fairly even though it runs over UDP. Other benefits include that TLS encryption is obligatory in the protocol, which ensures that the packets are trusted and safe. While both TCP and UDP can only be used as underlying protocols (both are missing native encryption), QUIC might be able to be the single component for transferring media. It is reliable, secure, authenticated, has low response times, respects existing traffic and does not suffer from the typical problems of TCP to the same degree, such as low performance when there is a high bandwidth-delay product [16].

Leong, Wang, and Leong also discussed an even higher performing congestion control algorithm in the same paper as the performance evaluation of high bandwidth-delay products, which both TCP and QUIC might use later.

Currently, QUIC only sees use between a select few services, such as Google services and the synchronization application Syncthing. These use QUIC with the primary aim of reducing latency in internet applications, such as buffer times of newly opened YouTube videos, responsiveness of Google documents and secure file transfer [17].

2.3.7 PA-UDP

Performance Adaptive UDP, or just PA-UDP, is an application-level protocol implementation aiming to make UDP safe even at the highest of speeds. It uses a combination of variables to make sure not to overfill any system buffers:

1. Disk read and write. Both the maximum read speed of the sender and the maximum write speed of the receiver are used so as not to overload either operating system kernel with overfull UDP buffers. This speed limit is set at the beginning of the connection and updated throughout the transfer with TCP control packets. If, for some reason, the buffers of the receiver start to fill up, a request for fewer packets is sent back to the host, telling it to reduce speed.

2. Link speed. The available link speed is also accounted for by trying to keep track of how many packets are in the network at any given time, packet loss and delays. This is due to how easily UDP can overfill the buffers of a network, causing heavy packet loss.

3Quick UDP Internet Connections

4Internet Engineering Task Force


PA-UDP is clearly detailed in an excellent paper by Eckart, He, and Wu called Performance Adaptive UDP for High-Speed Bulk Data Transfer over Dedicated Links [18].

However, there is no future work on the protocol, and it lacks official implementations. As such, the protocol is still in its research phase, 10 years after its first appearance on the web.

2.3.8 UDT

UDT runs completely over UDP, implementing most of what TCP offers, such as congestion control and packet delivery guarantees. In that sense, it is fairly similar to QUIC. However, the protocol is slightly older and focuses on just massive data transfers. It also runs in user space instead of kernel space (2.1.7), unlike TCP and the underlying UDP. It also runs over a single port and should as such have no problems traversing firewalls [19]. The protocol has been developed since 2007 by its original author, Yunhong Gu, who still assists in the open source effort. It is listed as used in multiple high performance applications and as the winner of multiple high speed network transfer contests between 2007 and 2013, in both presentations and documents on the official git repository on dorkbox [20] and its other official channels [21].

2.3.9 UFTP

"UFTP is an encrypted multicast file transfer program, designed to securely, reliably, and efficiently transfer files to multiple receivers simultaneously."[22].

UFTP is based on MFTP, but runs over UDP instead of TCP, giving it clear advantages on high delay links, such as satellite connections, which is its main targeted transfer link. As it is more focused on reliability than anything else, it is unclear how it will compare to the others in a speed sensitive investigation. It supports very strong encryption for all packets, with variable encryption methods. Since it is a multicast protocol, one could also send the same file to multiple points at the same time, only encrypting the file once on the sending side. This only adds a slight amount of value, as it is not a common use case for media management to send the same file to multiple locations at once.

2.3.10 FileCatalyst Direct

FC Direct is a proprietary solution using multiple UDP ports on both the sending and receiving sides, with a single open TCP port. The solution shows very promising results in previous tests, and has some of the most consistent results in all tests, even those with relatively poor conditions. It comes with high levels of encryption and high link utilization, and has clients for all required operating systems.

FC Direct also suffers from some problems related to both porting and implementation details. First, it is written in Java, which might be one of the reasons it takes longer to start and finish than other commercial protocols. Secondly, it also uses a wide range of ports to achieve its speeds [23].

2.3.11 TIXstream

TIXstream is very similar to FileCatalyst from a marketing standpoint in its corresponding feature set. However, there are some key differentiating factors. TIXstream uses a single UDP port for data transfer over its optional RWTP UDP-based proprietary protocol, and as such does not pose a problem regarding firewall traversal like some other solutions do. It is also built on a faster underlying architecture, having about half the start and stop times of FC Direct. However, it does not handle packet loss and increased RTTs equally well. As packet loss approaches 1% and the RTT5 increases beyond 150 ms, performance starts to degrade [23].

It also cannot multistream the UDP-based RWTP protocol, which is locked to one stream. For multistream, TIXStream handles TCP connections over XP, another proprietary protocol using a set of TCP ports [24, Section 5.2.5]. XP is the default transfer method, to better allow administrators to control the flow of traffic and to multistream the transfer. Furthermore, it only encrypts media that is not already strongly encrypted, resulting in fewer resources being wasted when transferring already-encrypted media.

5Round trip time


2.3.12 FASP

The Fast And Secure Protocol is similarly based on UDP, with a number of different mechanisms in place to try to maximize its bandwidth. The most interesting mechanism of FASP is how it communicates with other clients in order to calculate how they can send data without interfering with each other [25].

FASP's shortcoming is the expense of calculating all this extra data for each datagram; studies have found that the protocol in general does not exceed 1.8 Gbit/s without jumbo frames, which cannot be assumed to be available [26].


Chapter 3

Approach

This chapter presents first how the literature study was done. It then explains more about the different delimitations applied to different protocols, narrowing down the candidates to a more reasonable list to test within the delimitations.

3.1 Literature Studies

The first step was to locate which transfer protocols might be suitable.

For this part of the study, searches for articles describing different protocols and their properties were carried out over different platforms serving articles, such as Google Scholar, in order to gather information on the protocols. This information was then compared in terms of requirements, performance and features. The hardest point here was disqualifying protocols before writing about them, which can make it seem as though they were not considered for the study. This was done in order to keep the report focused, as including every single protocol is the work of a long-running group rather than of a single thesis.

The majority of the work of this report was in the literature studies, as running your own benchmarks for all candidates is unreasonable: it requires multiple licenses as well as access to experimental protocols such as PA-UDP, which may not have official implementations. As such, there are three different research phases. The first phase is made up of two steps:

1. A quantitative research phase is performed to gather as many suitable candidates as possible.

2. A qualitative research phase is performed to further reduce the number of candidates to a top candidate list.

The second phase benchmarked the protocols against each other to give further information on their performance.

Lastly, they are compared and recommendations are made based on their performance and the specifications found in both the benchmarks and the literature study.

3.2 Test Runs

The tests were run three times and averaged on the machines specified in 1.7.2. The tests reflect the protocols that were chosen for further evaluation in 3.3.6. The tests are not exactly the same, as the different implementations of the protocols differ; for a fairer comparison, all of them should have been tested in the same application. However, no such tool exists for all these protocols, and building one is beyond the scope of this report. The specifics of each test are listed in Chapter 4, after the candidates have been chosen.

3.3 Final Candidates

Not all transfer protocols can be deeply analyzed; some delimitations have to be set up.

The following subchapters explain which candidates will be analyzed further, and which would require too extensive an effort.


The relevant delimitations here are:

• Protocol Stability: The currently available implementations of the protocol should be finalized. This rules out experimental and theoretical protocols.

• Support: The level of developer support, such as continuous development and documentation.

• Speed: The protocol should be able to reach 10 Gbps in theory.

• Ports Needed: There should be a limited amount of ports needed to perform a transfer.

3.3.1 Experimental

The protocol PA-UDP shows good initial results in the papers detailing the protocol. However, none of them supply an implementation of the protocol. As such, it is not feasible to include it in the rankings. It seems to do a lot right, as it tries to efficiently analyze what the highest possible speed of the path is. Its ideas have surely been borrowed by some of the proprietary options, but this cannot be confirmed, as they (FC Direct, FASP and TIXstream) do not disclose how they calculate available transfer capacity.

Another currently experimental protocol that could have been investigated is QUIC. As QUIC is still not finalized, its performance and specification might change during the measurements, although it is close to being finalized [15].

3.3.2 Speed Matters

Some other protocols are already known for being too slow. For example, FASP (2.3.12) does not reach above 1.8 Gbit/s without jumbo frames, which cannot be assumed to be available. Therefore, FASP is not going to be a candidate for this report.

Speed also gives a further reason not to compare QUIC to the others. QUIC is one of the largest topics in the network transfer area at this time, as it is about to be standardized in an RFC. However, it still does not send data significantly faster than TCP does [27]. This is another reason not to investigate QUIC streams further, although not enough to disqualify it by itself.

3.3.3 Ports Matters

Another common theme is protocols which use more than a few ports. This causes problems with firewalls, as it is not reasonable to expect customers to open their networks to a wide variety of ports for the ISG to transfer files. The main protocols which fall into this pitfall are FTPS and FileCatalyst Direct. While FTPS is easily discarded as it looks a lot like other encrypted TCP solutions, FC Direct had very promising features. One drawback of FC Direct is the need for trial keys; otherwise it could have been evaluated alongside TIXStream. It is the need for ports, as outlined in "Comparison of Contemporary Solutions for High Speed Data Transport on WAN 10 Gbit/s Connections", which pushes it over the edge into "not tested" [23].

3.3.4 Support Matters

The protocol does not need to be actively developed, however, it cannot be left in the dust. This is the case for UDT, which does not have any publicly actively maintained implementations. This limits the use of the transfer protocol, as it would be hard to maintain down the line unless a new group was formed to maintain the protocol solutions.

This is something that commercial transfer solutions excel in, as they have paid maintainers.

3.3.5 Disqualified Protocols

The protocols in this table will not be benchmarked, as they do not follow the delimitations as outlined above.


Experimental: PA-UDP, QUIC
Slow: FASP
Ports Usage: FTPS, FileCatalyst Direct
Unsupported: UDT

3.3.6 Final Protocols Investigated

The list of practical network protocols, out of those listed in 3.3, is then the following:

• HTTPS

• SFTP

• SCP

• UFTP

• TIXstream


Chapter 4

Test Environment

This chapter first goes through how the test environment looks before going through how each transfer protocol was set up and tested.

4.1 Test Suite

While the virtual machines' (1.7.2) geographical locations might make them seem like ideal testing endpoints, preliminary tests show that there are virtual limitations on the machines. The machines are set up with enough resources to be limited only by the transfer protocol and not by hardware. While they have seemingly stable latency between them, there are further limitations.

As it turns out, the VMs have virtual limitations set by the VPS provider. The bandwidth is capped per TCP connection. Running the network speed analysis tool iPerf3 from the VM in Finland, for example, shows a consistent 500 Mbps to Belgium, while connections only reach 100 Mbps to Oregon.
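For reference, the kind of per-connection measurement described above can be reproduced with iPerf3 roughly as follows; the host name follows the naming scheme described below, and the duration and stream count are arbitrary:

    # On the receiving VM (for example Belgium): start an iPerf3 server
    iperf3 -s

    # From the Finland VM: measure one TCP connection, then 15 parallel streams
    iperf3 -c belgium-testserver -t 30
    iperf3 -c belgium-testserver -t 30 -P 15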

Furthermore, as it is unreasonable to manually tweak each connection once installed to achieve the highest possible link utilization, only the changes necessary to run each solution are applied. This means each solution is either set to use unlimited bandwidth or the maximum the application will allow. The only exception is the size of the UDP buffers, which were often enlarged for overall improved performance.
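A hedged sketch of what such an enlargement typically looks like on Linux (the 100 MB value mirrors the UFTP buffer in 4.1.4; the exact kernel settings used in the tests are not documented):

    # Raise the maximum socket receive and send buffer sizes to roughly 100 MB
    sudo sysctl -w net.core.rmem_max=104857600
    sudo sysctl -w net.core.wmem_max=104857600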

In order to keep the IP addresses relevant, the hosts file of each machine was edited to map each peer to a name of the form "location-testserver". This way, each test used the same parameters across reboots, configuration files and the commands listed below.
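For illustration, the corresponding /etc/hosts entries on the Finland VM could look like the following; the addresses are placeholder documentation addresses, not the real ones:

    # /etc/hosts additions on the Finland VM
    203.0.113.10  belgium-testserver
    203.0.113.20  oregon-testserver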

4.1.1 HTTPS

Using plain HTTPS as a means of transfer has many perks, which is why it is the transmission protocol in use in ISGs today. Testing it requires a client, for which aria2 was chosen as it is made to test multiple different transfer solutions. Using --check-certificate=false, the client is able to complete the download without setting up a custom domain for the tests. In order to serve the test file, the Finland virtual machine was set up with an nginx server serving only HTTPS content using a self-signed certificate1. For multistream, aria2 uses the -s flag for splits and -x for streams (connections), so the arguments added for the multistream transfer were -x 15 -s 15.
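Put together, the single stream and multistream downloads described above correspond roughly to the following invocations; the URL is an assumption based on the setup above:

    # Single stream download over HTTPS, ignoring the self-signed certificate
    aria2c --check-certificate=false https://finland-testserver/testfile.mp4

    # Multistream download: 15 splits over 15 connections
    aria2c --check-certificate=false -x 15 -s 15 https://finland-testserver/testfile.mp4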

4.1.2 SCP

SCP is readily available and easy to use, as it is part of OpenSSH, which is installed by default in most Linux distributions, including the virtual test machines. The transfer was done using the scp command line utility as such:

scp testuser@othervm:/teststore/ramdisk/testfile.mp4 .

1Last half of 2.1.12


There are no options for splitting a transfer over multiple connections, and as such there is no parallel data point or command available.

4.1.3 SFTP

SFTP is very similar to SCP, but exposes the entire file system instead of just offering single files. Running SFTP is as easy as sftp testuser@othervm, then navigating to the RAM-disk and entering get testfile.mp4. Just as with SCP, there are no options for splitting a transfer over multiple connections, and as such there is no parallel data point or command available.

4.1.4 UFTP

UFTP is not designed for this type of workload, but is interesting nonetheless. It is readily available in most package managers and requires a single port to connect the sender to the receiver. The connection is secured by high grade encryption (aes256-cbc) using the configured authentication keys. Encryption was enforced with the -E flag, the largest available UDP buffer was selected (100 MB) and the key file for encryption was specified.

Sender:

    uftp -Y aes256-cbc -R -1 -B 104857600 -k /home/user/.ssh/id_rsa \
        -h sha1 -M finland-testserver

Receiving daemon:

    uftpd -D /teststore/ramdisk/ -E -B 104857600 -k /home/user/.ssh/id_rsa

UFTP does not have any multistream capabilities per host either.

4.1.5 WDT

First WDT had to be built from source. Then, it could be started from the destination machine with a single line command according to their documentation:

    wdt -hostname testserver-finland -num_ports 15 -start_port 22356 \
        -directory /teststore/ramdisk | \
        ssh testserver-oregon wdt -num_ports 15 \
        -start_port 22356 -directory /teststore/ramdisk/ -

for the 15 port parallel version, and with -num_ports 1 for the single port test.

4.1.6 TIXStream MFT

TIXStream was set up following its documentation; as it requires a lot more steps to install, they are omitted here. The nodes were then configured to run at the highest available speed. One error occurred where the service would not find its own transfer clients, but this was quickly solved with live support.

4.1.7 Notes

None of the protocols were optimized by continuously setting slightly higher limits than the achieved speed. This can improve performance, and did so in preliminary testing, especially for UDP based protocols which might not rate limit their own traffic. However, doing it for all the protocols is outside the scope of this report.


Chapter 5

Performance Evaluation

This chapter presents the results from the benchmarks, separated into single stream and multistream, in that order. After the general results have been presented, there is a short overview of each protocol's performance and the experience of using it.

5.1 Single Stream Results

The results for the single stream transfer solutions show a large spike in performance once moving to UDP as the transport layer (2.1) for the long distance transfers (UFTP (2.3.9) and TIXStream's RWTP (2.3.11)). They also show higher performance for UDP based solutions overall.

[Chart: single stream transfer speeds in Mbit per second for HTTPS, SCP, SFTP, WDT, UFTP and TIXStream, for Finland to Oregon and Finland to Belgium.]


5.2 Multistream Results

When analyzing the multistream results it is clear that the performance is above that of any single stream solution, with WDT (2.3.5) and TIXStream's XP (2.3.11) clearly doing the best at transferring at link capacity. In comparison to the single stream chart, there are no UDP based protocols in this chart at all.

[Chart: multistream transfer speeds in Mbit per second for HTTPS, SCP, SFTP, WDT, UFTP and TIXStream, for Finland to Oregon and Finland to Belgium.]

Figure 5.1: Note: SCP, SFTP and UFTP do not have any multistream capabilities. Also note that the scale is vastly different from the previous chart.

5.3 SCP

Being the standard for simple, secure terminal file transmission, ignoring rsync1, SCP is missing some features which others use to increase performance. Mainly, since it runs over SSH, it only has a single connection to run over per file that needs to be sent. It also uses more CPU than its comparative speed would suggest. This is most likely due to overhead related to the SSH protocol rather than the limit on the link. This is more of a baseline for simplicity than a reasonable choice for high speed transfers.

5.4 SFTP

Because SFTP runs over the same protocol as SCP, namely SSH, it is no surprise that we find similar (practically identical) results for our transfers here. Over multiple runs their performance was always within 1-2 Mbps of each other.

1Rsync is a very popular program for synchronizing remote directories and files


5.5 UFTP

UDP-based UFTP shows the drawback of multicast optimized algorithms when running. It quickly saturates a single core on the sending machine, and keeps it at 100% for the duration of the transfer. Looking at system resource usage reveals that it is using only a single core for both encryption and packet management. This is most likely the bottleneck in this transfer solution. However, being single cored on a single port, it still performs very well, especially when looking at single link transfers, where it does well even on the long distance test. The result looks worse when the buffer is smaller than the maximum size. However, it might be reasonable to allocate 100 MB buffers per transfer for this level of performance.

The issue is that it does not scale; large files would always be held back by a single core on the system. However, in the case of either many smaller files or files which should be sent to multiple receivers, UFTP shows more promise, especially with the multicast feature, as encryption would only occur once instead of once per connection.

5.6 WDT

Neither of the tests caused more than 0.5 load on either end, meaning the transfer barely affected the systems. It was one of the faster solutions tested, considering it was always limited by the TCP limitations of the network rather than by the systems running the tests. In fact, it is hard to place this test next to UFTP, as it is not clear how much the per-TCP-stream limit affects these results, which is visible in the single transfer graph (Figure 5.1) where it is comparatively similar to the other TCP based solutions.

A quick check on the limitations on WDT was done by running it over 50 ports instead of 1 or 15.

Still, there were no visible hardware bottlenecks in either system, with the reported speed above 725 Mbytes/sec. If run over Google's internal VM network instead of over the WAN, it reaches even higher than 1,100 Mbytes/sec on the Finland to Belgium test in some runs, with consistent minimums of 1,000 Mbytes/sec.

For both of these the bottleneck is the same, TCP, as the slow start of so many TCP connections never reaches higher speeds for the relatively small transfer of the test, but there seems to be more to gain from larger transfers.

5.7 TIXStream

TIXstream was fairly difficult to get going, even though extensive documentation and ready-to-use packages were available. In contrast, most other solutions could be set up by just installing the associated packages and running the command. This is once again where using a commercially available transfer solution shows its strengths: when a problem occurred, professional support was available from the developers, which was used to isolate an error in the test environment through remote access.

TIXStream proved very capable in both multistream TCP and single stream UDP. Worth noting is that two different protocols were used, as the solution supports two proprietary transfer protocols: for single stream, the UDP based RWTP metrics are shown, and for multistream, XP was used with the previously mentioned 15 allowed connections.

The fastest speed came from multistream TCP, where the results were identical to those of WDT when compared over multiple runs (three for each). Neither solution overloaded any single core, as the implementations are heavily multi-threaded, although spread across many threads the total system resource usage was still significant.

When looking at single stream transfers, RWTP shows clear dominance in Figure 5.1. The bottleneck for this connection is unclear. Analysing the network traffic remotely, by sending the data captured by tcpdump back over an SSH connection to Wireshark2, revealed a lot of packets merely signaling that packets had been fragmented. However, even with those extra packets, the net connection speed was faster than that of any other single-stream transfer solution tested.

2 Wireshark is a very popular tool for analyzing network traffic and data streams.
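As an illustration, this kind of remote capture can be set up by piping tcpdump's output over SSH into a local Wireshark instance. The sketch below shows one way to do it from Python; the host name, interface and port filter are placeholders rather than the actual test configuration.

import subprocess

# Start tcpdump on the remote machine, writing packet-buffered pcap data
# to stdout ("-U -w -"), and bring it back over the SSH connection.
remote_capture = subprocess.Popen(
    ["ssh", "test-vm",
     "sudo tcpdump -i eth0 -U -s 0 -w - 'udp port 5001'"],
    stdout=subprocess.PIPE,
)

# Feed the stream into a local Wireshark: "-k" starts capturing immediately,
# "-i -" reads the capture from standard input.
subprocess.run(["wireshark", "-k", "-i", "-"], stdin=remote_capture.stdout)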




Chapter 6

Discussions

This chapter first analyzes the experiments conducted and then discusses the interesting parts of the data gathered.

6.1 Experiment Discussion

Several aspects of true network performance are missing from this study. They are listed in the delimitations (1.8), which are stricter than desired. One such missing test is a "bare metal"1 test of how a completely unrestricted system would perform. The aim of the comparison was not to test how the protocols perform in virtual machines, and as such using them for speed testing was not ideal. As emphasized throughout this study, the per-TCP-stream maximum enforced by the VPS provider made it harder to judge how well or poorly the top and bottom performers truly perform.

Further limitations come in the form of restrictions on protocol choice. There are many more protocols out there to test, especially commercial ones, which could have clearly triumphed over the others; the largest miss, QUIC, is discussed shortly. The primary problem with the commercial solutions is the time and effort it takes to obtain trial licenses and set them up, which is why there was only time to investigate a single solution of this type.

The most promising technology on the horizon for web traffic is QUIC, which is stirring things up everywhere. At the time of writing there were no easy-to-use solutions available for testing over the internet. There are, however, several implementations available on GitHub, a very popular open source collaboration platform, at different levels of completeness and with instructions for single host testing. For examples, see quinn ([28]), quiche (by Cloudflare, [29]) or neqo (by Mozilla, makers of Firefox, [30]), which all provide different levels of help for setting up a local testing environment, with the limitation that they are hard to configure without authentic certificates (2.1.12), which are required for web transfer tests. Furthermore, the reasoning given in 3.3.1 on experimental protocols still holds, as the finalization of the protocol has yet to be announced. All of these different implementations help speed that process up: it is easier to get a feature accepted if you can show that it works, with the benefit of being first if your feature makes it into the standard.

One of the more glaring shortcomings is the lack of any analysis of how the number of streams affects the transfer speed. Knowing where you get diminishing returns, or even more overhead than benefit, is a very interesting metric for anyone actually using a multistream solution.

This data could provide in-depth insights into how the number of transfer streams affects transfer speeds. Interestingly, both WDT and TIXStream's RWTP seem to lack automatic management of the optimal number of streams. Such a feature might further increase bandwidth utilization if done correctly, but since both landed on the same speed it is uncertain whether there was more speed to gain from it. Without any data on stream multiplexing it is hard to tell how this factors into real world performance. A script2 could be used in a future study to launch WDT while iterating through different numbers of streams, removing the transfer file in between iterations and plotting the results, in order to obtain this data.

1 Running directly on hardware, not through a virtual layer as in a virtual machine.

2 Scripts are a popular way of automating recurring tasks.




This is however outside the scope of this report, as that information does not affect the purpose of this report. That this piece of the puzzle was missing was hard to spot before all the previous pieces had been put in place. It is also something that would likely differ from client to client and from connection to connection.
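Purely as an illustration of what such a script could look like, the sketch below iterates over a set of stream counts and times a WDT transfer for each. The host names, paths and the exact WDT invocation (the num_ports option and the connection URL printed by the receiver) are assumptions made for the example, not a tested setup from this study.

import subprocess
import time

REMOTE = "user@receiver-vm"          # placeholder hosts and paths
SRC_DIR = "/data/testfile_dir"
DST_DIR = "/tmp/wdt_dest"
STREAM_COUNTS = [1, 2, 4, 8, 15, 30, 50]

results = {}
for n in STREAM_COUNTS:
    # Start a WDT receiver on the remote machine; it is assumed here to
    # print its connection URL (wdt://...) as the first line of output.
    receiver = subprocess.Popen(
        ["ssh", REMOTE, f"wdt -directory {DST_DIR} -num_ports {n}"],
        stdout=subprocess.PIPE, text=True,
    )
    url = receiver.stdout.readline().strip()

    # Time the sending side with the same number of ports/streams.
    start = time.monotonic()
    subprocess.run(["wdt", "-directory", SRC_DIR,
                    "-num_ports", str(n), "-connection_url", url],
                   check=True)
    results[n] = time.monotonic() - start

    receiver.wait()
    # Remove the received data so the next iteration starts clean.
    subprocess.run(["ssh", REMOTE, f"rm -rf {DST_DIR}"], check=True)

for n, seconds in results.items():
    print(f"{n:>3} streams: {seconds:.1f} s")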

On the topic of multiplexing, another missing metric is how the results change if multiple files are transferred, or if a larger file is split into parts. This would open multiple transfer sessions and potentially increase the value of the single-stream solutions. It could have given the UDP based solutions a chance over the TCP based ones, or risked overloading the network, as they do not respect network congestion (2.1.6). However, this treads into the territory of designing a set of transfer solutions, which is not part of this analysis (1.8).

6.2 Data Discussion

The key takeaway is that solutions running over multiple parallel connections do give superior performance, by a lot in some cases. This appears to be due to the per-connection limitations seemingly imposed by the hosting provider. The multistream solutions show that the multiplexing of data matters more than any single stream's individual performance. However, in order to run these connections at higher speeds, bare metal machines with fully adjustable firewalls3 would be necessary. The tested multistream solutions all ran over TCP, resulting in no data loss on the link due to congestion, as the transfers were congestion controlled as outlined by the TCP standard. How this will be affected by future protocol standards such as QUIC, which runs over UDP, remains to be seen for high speed data transfer solutions. Once QUIC stabilizes, more effort will likely go into improving UDP packet handling on the internet, which would most likely change these results.
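A rough way to see why, using purely hypothetical numbers rather than figures from these tests: if the provider caps each TCP stream at around 500 Mbit/s while the link itself can carry 10 Gbit/s, then a single stream can never exceed 500 Mbit/s no matter how efficient the protocol is, whereas 15 parallel streams could in principle approach 15 x 500 Mbit/s = 7.5 Gbit/s before any other limit comes into play.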

Secondly, the current use of HTTPS does not look like such a bad solution, most likely because the internet is optimized for HTTPS (TCP) connections. When transferring over multiple streams it is also faster than any of the tested single stream UDP solutions, while using fewer resources, which ties back to the previous discussion.

Also, the dedicated high-speed optimized transfer solutions seem to perform significantly better than the free standard alternatives. This might be because the faster ones are branded as fast transfer solutions, while some of the standard ones aim to be easy to use and merely "good enough". In that segment, ease of use for the end user, not transfer time, is more likely the ultimate goal.

3 There are performance settings available in firewalls, see 2.1.11.


Chapter 7

Conclusions and Further Work

7.1 Conclusion

Let us start with a summary and break it down.

• HTTPS
  – The easiest
  – Common network traffic
  – Relatively fast

• WDT
  – Shared top speed
  – Free to use
  – More ports needed

• TIXStream
  – Shared top speed
  – Commercial support
  – Not free

• SFTP and SCP
  – Not particularly interesting

• UFTP
  – Fast for single streams
  – Heavy on resources
  – Multicast

For single stream transfers the choice comes down to TIXStream, as it performed admirably with room to spare. UFTP comes second, but for single file transfers it is limited by the speed of your fastest core.

Both of them were run without optimized parameters, which is how they would run in the wild, and as such the comparison is fair.

The result for the TCP based multistream solutions is harder to quantify, as they seemed to be limited by the VPS provider. What was common for all TCP solutions was that they were light on system resources, which was not true for the UDP based solutions.

Which multistream transfer solution to go for depends on the needs of the organisation. TIXStream offers reliable support, good documentation for all of its features and consulting from its experienced team on how to implement the solution in your products, without the need to compile anything yourself, at a cost. WDT offers limited documentation and only community support on its GitHub pages. For example, when this study was taking place, the




official master branch of WDT did not compile, and a fork from one of its developers was used instead, as its pull request had yet to be merged. For a commercial product this might not be acceptable. However, for an open source project which needs to implement a transfer back-end it makes a lot of sense, as WDT is under the BSD License and free to use.

HTTPS performed well considering it is not made for this type of application. It is also very common in modern network infrastructure and as such should be the easiest to maintain.

7.2 Future Work

QUIC shows promise compared to standard TCP in the referenced studies. There are transfer solutions built on TCP, such as WDT, which could use QUIC instead. It would be interesting to see how adopting QUIC in current TCP based solutions, for example WDT, would change their performance.

More network conditions should also have been evaluated, as the network used in these tests was seemingly limited per TCP connection. This factor might have ruled out protocols which are in reality quite good, and it is also most likely what led to the similar results between XP and WDT.

Testing them without limitations on the link would most likely show differences not visible under these conditions.

Another limitation was that default parameters were always used, apart from removing the speed limit. By tweaking the transfer parameters for the connections, higher speeds might have been achievable.

This applies especially to the UDP based solutions, where a lot of packets were dropped since there was no congestion control on the link, resulting in many packets being sent multiple times.


References

[1] Iconik Storage Gateway - iconik Help Documentation, https://app.iconik.io/help/pages/knowledgebase/iconik_storage_gateway.

[2] W. Van Heddeghem, S. Lambert, B. Lannoo, D. Colle, M. Pickavet, and P. Demeester, "Trends in worldwide ICT electricity consumption from 2007 to 2012," Computer Communications, Green Networking, vol. 50, pp. 64–76, Sep. 2014, http://www.sciencedirect.com/science/article/pii/S0140366414000619, issn: 0140-3664. doi: 10.1016/j.comcom.2014.02.008.

[3] PROTOCOL | definition in the Cambridge English Dictionary, https://dictionary.cambridge.org/us/dictionary/english/protocol.

[4] V. Jacobson, "Congestion Avoidance and Control," in Symposium Proceedings on Communications Architectures and Protocols, ser. SIGCOMM '88, http://doi.acm.org/10.1145/52324.52356, Stanford, California, USA: ACM, 1988, pp. 314–329, isbn: 978-0-89791-279-2. doi: 10.1145/52324.52356.

[5] A. Mudassar, N. Md Asri, A. Usman, K. Amjad, I. Ghafir, and M. Arioua, "A new Linux based TCP congestion control mechanism for long distance high bandwidth sustainable smart cities," Sustainable Cities and Society, vol. 37, pp. 164–177, Feb. 2018, http://www.sciencedirect.com/science/article/pii/S2210670717307126, issn: 2210-6707. doi: 10.1016/j.scs.2017.11.005.

[6] M. Mathis and J. Mahdavi, "Forward Acknowledgement: Refining TCP Congestion Control," in Conference Proceedings on Applications, Technologies, Architectures, and Protocols for Computer Communications, ser. SIGCOMM '96, http://doi.acm.org/10.1145/248156.248181, Palo Alto, California, USA: ACM, 1996, pp. 281–291, isbn: 978-0-89791-790-2. doi: 10.1145/248156.248181.

[7] L. Vicisano, J. Crowcroft, and L. Rizzo, "TCP-like congestion control for layered multicast data transfer," in Proceedings. IEEE INFOCOM '98, the Conference on Computer Communications. Seventeenth Annual Joint Conference of the IEEE Computer and Communications Societies. Gateway to the 21st Century, vol. 3, Mar. 1998, pp. 996–1003. doi: 10.1109/INFCOM.1998.662909.

[8] P. Yang, J. Shao, W. Luo, L. Xu, J. Deogun, and Y. Lu, "TCP Congestion Avoidance Algorithm Identification," IEEE/ACM Transactions on Networking, vol. 22, no. 4, pp. 1311–1324, Aug. 2014, https://ieeexplore-ieee-org.focus.lib.kth.se/abstract/document/6594906. doi: 10.1109/TNET.2013.2278271.

[9] B. Turkovic, F. A. Kuipers, and S. Uhlig, "Fifty Shades of Congestion Control: A Performance and Interactions Evaluation," arXiv:1903.03852 [cs], Mar. 2019, http://arxiv.org/abs/1903.03852.

[10] User Datagram Protocol, https://www.ietf.org/rfc/rfc768.txt, RFC.

[11] Zhaojuan Yue, Yongmao Ren, and Jun Li, "Performance evaluation of UDP-based high-speed transport protocols," in 2011 IEEE 2nd International Conference on Software Engineering and Service Science, Jul. 2011, pp. 69–73. doi: 10.1109/ICSESS.2011.5982257.

[12] SFTP vs. FTPS: The Key Differences, https://www.goanywhere.com/blog/2016/11/23/sftp-vs-ftps-the-key-differences, Nov. 2016.

[13] Facebook/wdt, Facebook, https://github.com/facebook/wdt, Oct. 2019.

[14] AUR (en) - wdt, https://aur.archlinux.org/packages/wdt/.
