
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer Science

Master thesis, 30 ECTS | Computer Networking

2017 | LIU-IDA/LITH-EX-A--17/001--SE

Exploring web protocols for use on cellular networks

QUIC on poor network links

Val av webbprotokoll för mobila uppkopplingar: QUIC på riktigt dåliga uppkopplingar

(Swedish title: Choice of web protocols for mobile connections: QUIC on really bad connections)

Hans-Filip Elo

Supervisor: Kevin O’Rourke (Opera Software)
Examiner: Niklas Carlsson



Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Hans-Filip Elo


Abstract

New developments in web transport, such as HTTP/2 and first and foremost QUIC, promise fewer connections to track as well as shorter connection setup times. These protocols have proven themselves on modern, reliable connections with a high bandwidth-delay product, but how do they perform over cellular connections in rural or crowded areas, where connections are much less reliable? Many new users of the web in today's mobile-first usage scenarios are on poor connections.

A testbench was designed that allows web browsing over limited network links in a controlled environment. We compared the network load times of page loads over the protocols QUIC, HTTP/2 and HTTP/1.1 using a variety of network conditions, and then used these measurements as a basis for suggesting which protocol to use under which conditions.

The results show that newer is not always better. QUIC generally works reasonably well under all conditions, while HTTP/1.1 and HTTP/2 trade blows depending on connection conditions, with HTTP/1.1 sometimes outperforming both of the newer protocols.


Acknowledgments

I would like to thank everyone who, during the course of this thesis, took time out of their day to proof-read it. This includes Karl-Johan Elo, Ulrika Ingelsson, Kevin O’Rourke (Opera Software AS), Jens Edhammer and last but definitely not least Pontus Persson. Your efforts have been inestimable and you all have my gratitude. Thanks to Roger Johannesson (TV Services Sweden AB) for his insights on browser automation and in-browser measurements, and to Haakon Riiser (Opera Software AS) for providing network traces for the experiments. I would also very much like to thank the fellow students I have studied with during my time at Linköping University. It has been fun and hard work, and without some of you this thesis would not be a reality. I cannot list all of you here, but I would like to send some extra thanks to Niklas Ericson, Jens Edhammer and Alexander Poole.


Contents

Abstract
Acknowledgments
Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Aim
  1.3 Contributions
  1.4 Delimitations
2 Related Work
  2.1 What is a bad cellular network?
  2.2 Using Linux to simulate networks
3 Method
  3.1 Test scenarios
  3.2 The server
  3.3 The client
  3.4 Web content
  3.5 Limiting the bottleneck link
  3.6 Collecting the results
  3.7 Parameter tuning
4 Results
  4.1 Baseline testing
  4.2 Bandwidth limited scenarios
  4.3 Latency scenarios
  4.4 Lossy scenarios
  4.5 Differences between persistent and closed connection
  4.6 Summary
5 Discussion
  5.1 This work
  5.2 Further work
  5.3 The work in a wider context
Bibliography
A Complete test results
B Box plots of bytes fetched


List of Figures

3.1 Testbench workflow
3.2 Testbench architecture
3.3 Proxy vs direct browsing
3.4 Example qdisc chain of a limited network interface
3.5 Gilbert-Elliot model of packet-loss
3.6 Two packets arriving with mean latency, showing distribution curve for latency
4.1 Persistent connection testing, unlimited scenario
4.2 Persistent connection testing, heavily limited scenario
4.3 Network load times for bandwidth limited scenarios
4.4 Network load times for latency added scenarios
4.6 Load time comparisons between open and closed connection
A.1 Persistent connection testing, unlimited scenario
A.2 New connection testing, unlimited scenario
A.3 Persistent connection testing, heavily limited scenario
A.4 New connection testing, heavily limited scenario
A.5 Persistent connection testing, extra high bandwidth
A.6 New connection testing, extra high bandwidth
A.7 Persistent connection testing, high bandwidth
A.8 New connection testing, high bandwidth
A.9 Persistent connection testing, medium bandwidth
A.10 New connection testing, medium bandwidth
A.11 Persistent connection testing, low bandwidth
A.12 New connection testing, low bandwidth
A.13 Persistent connection testing, high trace bandwidth
A.14 New connection testing, high trace bandwidth
A.15 Persistent connection testing, medium trace bandwidth
A.16 New connection testing, medium trace bandwidth
A.17 Persistent connection testing, low trace bandwidth
A.18 New connection testing, low trace bandwidth
A.19 Persistent connection testing, extra tiny latency
A.20 New connection testing, extra tiny latency
A.21 Persistent connection testing, tiny latency
A.22 New connection testing, tiny latency
A.23 Persistent connection testing, low latency
A.24 New connection testing, low latency
A.25 Persistent connection testing, medium latency
A.26 New connection testing, medium latency
A.27 Persistent connection testing, high latency
A.28 New connection testing, high latency
A.29 Persistent connection testing, varying latency
A.30 New connection testing, varying latency
A.31 Persistent connection testing, low loss
A.32 New connection testing, low loss
A.33 Persistent connection testing, medium loss
A.34 New connection testing, medium loss
A.35 Persistent connection testing, high loss
A.36 New connection testing, high loss
B.1 Bytes fetched, persistent connection, baseline scenarios
B.2 Bytes fetched, persistent connection, bandwidth limited scenarios
B.3 Bytes fetched, persistent connection, latency added scenarios


List of Tables

1.1 Old and new protocols
2.1 Average TCP properties on Chinese cellular networks [19]
3.1 Baseline scenarios
3.2 Bandwidth limiting scenarios
3.3 Latency scenarios, 10 Mbit/s symmetric bandwidth
3.4 Loss scenarios, 10 Mbit/s symmetric bandwidth
3.5 Initial connection properties
3.6 Average TCP properties on Chinese cellular networks [19]
3.7 Measured latencies for scenario 7


Nomenclature

Term       Meaning                         Explanation
TCP        Transmission Control Protocol   Current standard for connection-based transmissions
UDP        User Datagram Protocol          Current standard for stateless transmissions
HTTP       Hypertext Transfer Protocol     Current standard for resource fetching on the Web
TLS        Transport Layer Security        Current standard for encrypting traffic on the Web
QUIC       Quick UDP Internet Connections  New protocol which explores technologies for future improvements to TCP, TLS and HTTP
RTT        Round-trip time                 Time from sending a packet until the sender's acknowledgement has returned
BDP        Bandwidth-delay product         bandwidth × delay
wget       GNU wget                        Tool for fetching resources over HTTP
Wireshark  Wireshark                       Tool to create network traces and extract statistics from them
iperf      iperf                           Tool for measuring network throughput
nginx      nginx                           Web server
Caddy      Caddy                           Web server
Chromium   Chromium                        Open-source web browser project on which Opera and Google Chrome are based

1 Introduction

The World Wide Web (the Web) has grown continuously since its introduction in the early 1990s. There are more users of the Web and more devices connected to the Internet each day. All of these devices are served by the same type of back-end, which we call the Internet. The communication medium of the Internet is relatively homogeneous in that most infrastructural communications take place over either optical fibres or Ethernet. The clients, however, as leaf nodes of the Internet, use many different types of connections, including Ethernet, Wi-Fi (802.11) and mobile wide-area connections such as GSM/2G, UMTS/3G and LTE/4G.

The massive growth of the Web and web-based applications has been a factor in enabling a multitude of devices to co-exist and run the same type of applications. Applications based on Web technology often use the Hypertext Transfer Protocol (HTTP) to fetch content from remote locations. The time between making a request and receiving the resulting data is often the limiting factor for the usability and user experience of these web applications. New implementations of the underlying protocols of the Web, listed in Table 1.1, seek to improve the response times of content requests.

Table 1.1: Old and new protocols

Task                  Current protocol  New experimental protocol
Transport protocol    TCP               QUIC
Application protocol  HTTP (over TLS)   QUIC

1.1 Motivation

Quick UDP Internet Connections (QUIC) is an experimental protocol designed to explore methods for decreasing the latency of connections running over modern networks with high bandwidths. QUIC is also designed to minimize redundant transmissions compared to the Transmission Control Protocol (TCP), which in theory should work faster with wireless physical layers like 802.11 and mobile networks that already implement some retransmission of lost packets. QUIC, being an experimental protocol implemented in the application layer and not dependent on kernel support, can iterate fast and try different algorithms. These algorithms, when fully developed, can then be standardized and implemented within operating systems on many devices.


While many users and most servers today have connections with higher available bandwidth and often shorter response times than before [20], the heterogeneous network connections on the client side still give differing experiences of responsiveness within the same application. The latency of requests in the worst-case scenarios is often so high that a totally different approach than a regular web request is required to give a satisfactory user experience. One such approach is Opera Mini's "Extreme Mode", which converts an entire web page into Opera Binary Markup Language (OBML) and then sends the web page to the client within a single TCP connection, encoded in the binary format Opera Binary Socket Protocol (OBSP). This works well for slow and lossy connections, but the OBML format does not support many dynamic web pages [12].

Today most web pages are dynamic in some way. This means that users cannot use their favorite web applications in areas with bad cellular coverage. It would be great if these newer protocols allowed the web browsers of tomorrow to perform well even when accessing web pages over slow and/or lossy connections.

On top of web pages being mostly dynamic, many web pages are moving to encrypted transport in response to recent years' reports of systematic global surveillance, a move further accelerated by the Linux Foundation's Let's Encrypt project [40,32]. The fact that major browsers only support HTTP/2 over encrypted channels, and that QUIC is an encrypted-only protocol, further reinforces that the future of web transport is encrypted.

1.2 Aim

This report aims to compare how QUIC performs over bad connections in comparison to older protocol stacks such as TCP+HTTP (1.1 and 2). A bad connection is here taken to be a cellular connection in an area with bad coverage, a cellular connection while moving at high speed, or simply a poorly implemented cellular network.

1.3 Contributions

1. Design a testbench for a fair comparison of how modern protocols for retrieving the most common web resources perform over bad network connections compared to older protocols. The testbench is built using tools available in most Linux distributions, along with web servers for the different protocols. Unlike most currently available web page test frameworks, such as web-page-replay and Mahimahi [44,29], it does not record web requests into an archive and then replay the same requests in a browser; our testbench focuses on delivering static content. This approach is chosen for comparing network protocols against each other, since the testbench can fetch the same resources over different protocols without requiring the server hosting the original site to be able to serve resources over multiple protocols.

2. Present a performance evaluation of QUIC against TCP with HTTP/1.1 and HTTP/2 using the testbench.

3. By varying network conditions, we present the strengths and weaknesses of the different protocols in the respective scenarios. TCP and HTTP are already fairly well documented for use over cellular networks [19,46,20,25], but QUIC is not. When QUIC is compared to TCP and HTTP [27,11], it is usually over much better network conditions than are currently available in cellular networks in many countries [19]. Given the lack of data on how QUIC performs under these conditions, this work fills a gap in the existing literature.

1.4 Delimitations

• This thesis will not test dynamic page load elements, such as Facebook’s feed that loads more content on scroll events or video content.


• This thesis will look at network load times, not page load times.
• This thesis will only test the encrypted alternatives of the protocols.
• The QUIC connection migration feature will not be tested.

• QUIC forward error correction (FEC) will not be enabled in testing.

• The network properties of the testbench are modelled independently of each other.

Static content will be used since we explore how different protocols handle network bottlenecks, and server generation speed should not be a factor.

In order for the comparisons between protocols within the thesis to stay relevant in the future, all tests are done using encrypted protocols. This means that the different encryption handshakes could have an impact on connection startup latency, which is one of the areas explored in this thesis.

2 Related Work

There has been some work on comparing QUIC against TCP with HTTP/1.1 and HTTP/2 [8,11,27]. In these comparisons the authors mainly focus on connections with high bandwidth-delay products (BDP). Das [11] in particular does a great job of characterising QUIC performance compared to HTTP, considering link speeds all the way down to 0.2 Mbit/s, which is about as slow as some poor cellular networks in daily use [19,38,37]. Das does however limit the receive buffer of the simulation to such a degree that loss is introduced; it is unclear how much loss is introduced, and whether the losses are continuous over a transmission or happen in bursts. In general, loss in cellular networks is low, most often below 1%, due to packet retransmissions within the link layer [20,19]. When high rates of loss do occur in cellular networks, they generally cut off the transmission for longer periods of time, usually due to a handover between cellular base stations or temporarily lost cellular coverage [20,31]. Both handovers and temporary loss of coverage give packet loss in bursts rather than continuously over the course of the connection.

Earlier authors exploring QUIC performance compared to TCP have focused on high-BDP links with symmetric up- and down-links that have the same bandwidth, loss and packet-reordering rates [8,11,27]. Focusing on high-BDP networks is only natural, since many modern networks, especially better cellular networks, have a high BDP. No cellular network is however symmetric in up- versus down-link performance: the power requirements of transmitting a signal over the air limit mobile device up-link performance, and the higher amount of data on the down-link compared to the up-link also causes higher re-ordering and loss rates on the down-link. Looking at cellular networks in rural areas and in countries with a less developed cellular infrastructure, the bandwidth is also usually not as high as in these studies. This thesis seeks to compare QUIC to the older protocols over really bad connections.

Comparisons between HTTP/2 and HTTP/1.1 on limited networks are more common [39,14], due to the better availability of HTTP/2 in web servers, browsers and other software. These studies generally show that HTTP/2 is less resilient to loss than HTTP/1.1, performing worse on lossy connections. They also show that HTTP/2 in general performs much better than HTTP/1.1 in scenarios with 50-100 ms of latency, and that HTTP/2 is clearly faster than HTTP/1.1 when loading many small objects over stable connections. Varvello et al. [42] show, through recurring large-scale analysis of the 200 most visited sites, that HTTP/2 outperforms HTTP/1.1 on fiber connections as well as on European 3G networks.

Erman et al. [14] have done a great in-depth investigation of using SPDY, the predecessor of HTTP/2, and HTTP/1.1 as a proxy over cellular 3G connections. Erman highlights a problem that arises with TCP as the underlying transport-layer protocol for both: most TCP implementations reset the congestion window (cwnd) after a certain time of inactivity but do not reset the RTT estimations, leading to spurious retransmissions when trying to fetch data after idle periods.

Recently, Bocchi [4] looked at how users perceive browsing with HTTP/1.1 and HTTP/2 by letting 147 test persons browse the same pages using the two protocols. Even though HTTP/2 performed well in objective measurements, it showed no advantage, or even a small disadvantage, in giving a subjectively better user experience. It would be interesting to see a similar study that includes QUIC, in order to see whether QUIC's more drastic approach improves the user experience.

2.1 What is a bad cellular network?

In their report, Hu et al. [19] perform a large-scale analysis of packet traces from Chinese operators in different regions. They find that these Chinese 3G networks are one or two orders of magnitude slower than networks in the US and Europe, both in terms of round-trip time (RTT) and throughput. They also find that even though the networks are of low quality, retransmissions implemented within the cellular link layers keep losses at a minimum, usually below 1%, for all signal-to-noise ratios (SNR) above −14 dB. They also calculate the median loss, transmission rate and packet reordering of TCP connections. Xu et al. [45] confirm that packet loss is rather rare in cellular networks. They also show that TCP has problems scaling its window to the ever-changing available bandwidth of a cellular network. When running the testbench described in Chapter 3, it is important that we use parameters that give TCP connection characteristics similar to those described by Huang and Hu [20,19]: large variations in bandwidth and RTT, with loss arriving in bursts.

Some average properties of mobile TCP connections in China, as described by Hu et al. [19] in their large-scale analysis of Chinese mobile 3G networks, are shown in Table 2.1.

Property                    UMTS         HSDPA        HSPA
Down-link bandwidth         13.2 kbit/s  44.3 kbit/s  33.6 kbit/s
Up-link bandwidth           5.3 kbit/s   4.3 kbit/s   5.6 kbit/s
RTT                         528 ms       135 ms       85 ms
Loss rate                   3.0%         1.71%        1.95%
Re-ordering rate down-link  6-30%        6-30%        6-30%
Re-ordering rate up-link    0%           0%           0%

Table 2.1: Average TCP properties on Chinese cellular networks [19]

Xu et al. [45] explore mobile networks in Singapore using UDP tracing. They find that packets often arrive in bursts, around 50-60% of the time; the rest of the time packets arrive one at a time.

Riiser et al. [38] and Li et al. [25] record network traces during commute and high-speed commute, showing how networks behave when the user moves through different regions. Both reports contain TCP traces, and Riiser has also provided us with UDP traces as an aid for this report. Analyzing the UDP traces from cellular networks provided by Riiser shows large variations in bandwidth. These and similar bandwidth traces have been demonstrated useful in other performance studies (e.g., to evaluate the performance of proxies designed to take into account the unique properties of modern HTTP-based adaptive streaming techniques [23], and to emulate the impact of bandwidth caps [24]), further motivating the usage of such traces in this study.

In summary, a typical poor cellular connection has varying bandwidth, varying packet-burst sizes, packet losses that arrive in bursts, and varying latency.

2.2 Using Linux to simulate networks

Beshay et al. [2] as well as the Linux Foundation Wiki [30] do a great job of outlining the possibilities and quirks of network emulation in Linux. The paper is a recommended read for anyone interested in network emulation: not only are its findings of great use, but the report is also compact and to the point. Without the guidance of their report, we would have tried building a testbench that used plain virtual interfaces instead of virtual switches, which makes the Linux kernel report packet losses back to TCP and destroys the test. We would also not have forced the MTU size, which is necessary since a virtual interface can have a much larger MTU than is typical in a mobile network. Prototyping the testbench would have been much more extensive work without their report.

In his report, Haßlinger gives a clear description of the Gilbert-Elliot model of lossy channels, which has been of use when creating the loss model for our testbench [17].

3 Method

This chapter describes the methods used within the testbench built for protocol testing. The testbench is implemented as a client-server setup with a bottleneck link in between. The bottleneck link is given different parameter values in order to compare the three protocols HTTP/1.1, HTTP/2 and QUIC in web browsing. We compare the protocols on how fast they fetch web pages as well as on how much content they manage to fetch. A flowchart of the testbench process can be seen in Figure 3.1 and an overview of the testbench network architecture in Figure 3.2.

Figure 3.1: Testbench workflow

A major goal of the testbench is to simulate, in a controlled way, network conditions that are representative of poor networks. Another major aim is for the results to be reproducible. The testbench is designed as a single-machine network emulating a client-server scenario where a bottleneck link separates the server and the client. The network limitation is done using Open vSwitch (http://openvswitch.org/) and GNU/Linux packet-queueing disciplines (qdiscs), and the setup should work on almost any modern GNU/Linux distribution.

The GNU/Linux kernel's network stack is one of the most widely used networking stacks among nodes on the Internet, if not the most used. Using a real-world networking stack for testing gives a higher probability that the results remain relevant in the real world. The reason for emulating a network on GNU/Linux instead of using a physical network is to improve reproducibility. A physical network would also make future testing of the still rapidly evolving QUIC protocol time-consuming, and is thus not a great alternative to a virtual network. There is also no advantage to using a physical network, since the bottleneck link still needs to be modelled, neglecting the behaviour of the rest of the network. Beshay et al. outline the premises for network emulation in GNU/Linux very well in their 2015 report [2].


Figure 3.2: Testbench architecture

Figure 3.3: Proxy vs direct browsing

Their network architecture consists of a server, two switches with a bottleneck link between them, and a client. Our testbench is similar but uses a different set of qdiscs.

All TCP-based transmissions use the current default TCP variant of most Linux distributions, TCP Cubic [16]. Cubic is also the congestion control algorithm currently used within QUIC [35].

3.1 Test scenarios

Using this testbench, the different protocols of interest are run in a number of different scenarios. The scenarios fall into four categories: reference scenarios, bandwidth-limiting tests, loss tests and latency tests. The scenarios within each category are listed further down in this section. Each scenario is run both with a persistent connection to the web server and with a new connection set up for each page load. First, let us look at how we achieved a persistent connection as well as a new connection per page load using the same testbench.

Proxy assisted and direct browsing

Proxy-assisted browsing as well as direct browsing is illustrated in Figure 3.3. In order to test both persistent connections, as if using a proxy, and closed connections between page loads, as in regular browsing, two different client behaviours were developed.

• A test which keeps a persistent connection to the web server over the duration of the test. This gives a better comparison of the different protocols' usefulness in proxy-assisted services like Opera Mini's Turbo and Extreme modes.

• A test which closes the connection to the server between each page load. This gives a better comparison between QUIC and HTTP on regular non-proxy-assisted page-loads to new servers and helps explore connection setup and ramp-up behaviour.

Persistent connection test In Listing 3.1 we can see the workflow that the client uses during the persistent connection test.

open_browser()
for url in all_urls:
    get(base_url)
    get(url)
    save_HTTP_archive(url)

Listing 3.1: Persistent connection algorithm

Here base_url is the index of the domain served by the web server. All other sites are stored in subfolders of the base path. At the base URL resides the simple HTML page shown in Listing 3.2.

<HTML>
<HEAD>
<title>proto-tester neutral page</title>
</HEAD>
<BODY>
<h1>proto-tester neutral page</h1>
<p>This is the neutral page of proto-tester. In a test run which keeps an open
connection to the web server, this page is loaded between each site.</p>
</BODY>
</HTML>

Listing 3.2: Neutral HTML page loaded between other pages

Closed connection test In Listing 3.3 we can see the workflow that the client uses during the new connection test. The client follows this algorithm for the duration of the test:

open_browser()
for url in all_urls:
    get("about:blank")
    sleep(31)  # seconds, waiting out Chromium's connection time-out
    get(url)
    save_HTTP_archive(url)

Listing 3.3: New connection algorithm

Loading about:blank makes the browser leave the domain served by the web server, and waiting 31 seconds lets Chromium's time-out trigger and close the connection to the server's domain. This makes the next page load trigger a new connection setup to the web server.

Connection scenarios

The testbench runs tests over 16 different sets of connection parameters. Each parameter maps onto a specific function of the testbench. The parameters are described in Section 3.5, where we look at how the properties of the bottleneck link are modelled.

As mentioned earlier, the test scenarios are categorized into different types of tests: bandwidth-limiting tests, loss tests and tests with varying latency. Along with these, two reference scenarios are used: one unlimited network scenario, and one modelled to be similar to the poor conditions described by Hu et al. [19]. The poor conditions are subjectively unusable from a user experience perspective but are still included, since we want to see whether QUIC manages to break free from the shortcomings of HTTP/2 in these heavily limited scenarios [11,4,39]. The test scenarios used are listed in Tables 3.1, 3.2, 3.3 and 3.4.

Scenario         Trace  Trace multiplier (dl/ul)  Latency (dl/ul)  Lat. dev. (dl/ul)
Heavily limited  Yes    0.6/0.3                   300/300 ms       2.5/0.1 % overlap
Unlimited        -      -                         -                -

Table 3.1: Baseline scenarios

The unlimited test is a virtual network link without any specified limits, and it delivers packets as fast as the CPU of the host system (an Intel Core i7 4790K running at 4.8 GHz) can handle. For the system used for testing we measured a link speed of 4.99 Gbit/s and an average latency of 0.028 ms, using iperf (https://iperf.fr/) in UDP mode and ping respectively.


Our purely bandwidth-limited tests consist of a number of tests which use either a static bandwidth or the trace udp-traces/storo_4x_700kbps_120s_tv.log provided by Riiser [37]. Dividing this trace into one-second means, as we do and discuss further in Section 3.5, gives an average bandwidth of 1.013 Mbit/s, a maximum bandwidth of 23.552 Mbit/s and a minimum bandwidth of 4.31 kbit/s.

Scenario              Trace  Trace multiplier (dl/ul)  Bandwidth (dl/ul)
Extra high bandwidth  No     - / -                     10 / 5 Mbit/s
High bandwidth        No     - / -                     5 / 3 Mbit/s
Medium bandwidth      No     - / -                     2 / 1 Mbit/s
Low bandwidth         No     - / -                     0.5 / 0.25 Mbit/s
High trace            Yes    2.4 / 1.2                 -
Medium trace          Yes    1.2 / 0.6                 -
Low trace             Yes    0.6 / 0.3                 -

Table 3.2: Bandwidth limiting scenarios

Our latency tests all use a 10 Mbit/s link with either a static or a varying latency added. Adding varying latency often results in packet reordering.

Scenario          Latency (dl/ul)  Lat. dev. (dl/ul)
High delay        300 / 300 ms     - / -
Medium delay      150 / 150 ms     - / -
Low delay         75 / 75 ms       - / -
Tiny delay        50 / 50 ms       - / -
Extra tiny delay  5 / 5 ms         - / -
Varying delay     50 / 50 ms       2.5 / 0.1 ms

Table 3.3: Latency scenarios, 10 Mbit/s symmetric bandwidth

Our loss tests all use the Gilbert-Elliot model for modelling bursts of packet loss; more on the Gilbert-Elliot model in Section 3.5. All loss scenarios are non-deterministic in how much loss they show on the actual link, but the scenarios listed in Table 3.4 have been measured to lose around 8, 14 and 22% of the packets on a saturated 10 Mbit/s down-link.

Scenario     Move to gap prob. (dl)  Move to burst prob. (dl)  Loss rate in burst (dl)
High loss    3 %                     1 %                       70 %
Medium loss  3 %                     1 %                       35 %
Low loss     3 %                     1 %                       10 %

Table 3.4: Loss scenarios, 10 Mbit/s symmetric bandwidth

Each scenario is run at least five times for each protocol. Each test is also explored with both approaches for connection handling, proxy-assisted/keep-alive and direct connection/connection-close.

3.2 The server

In the HTTP cases, the testbench uses Caddy [6] as the web server. Caddy was initially picked for its QUIC support, so that tests could be run with the same web server in all cases. After initial testing of Caddy compared to proto-quic [33], it was decided that Caddy's QUIC implementation was still too slow in comparison. When running QUIC tests, the testbench instead uses the example server from goquic [15]. The server from goquic was included because it is based on proto-quic, the reference implementation of QUIC exported from the Chromium project. Since QUIC is such a young protocol, performance varies greatly between implementations, and therefore both available implementations were explored for inclusion in the testbench.

In both the HTTP and QUIC cases, the servers serve static content from the Alexa top 500; more on that in Section 3.4.

3.3 The client

In order to automate browser page loading, the testbench uses chrome-har-capturer (https://github.com/cyrus-and/chrome-har-capturer). The first iterations of the testbench did however use the Selenium browser automation tool (http://www.seleniumhq.org/) for client-side automation. Selenium has its quirks but is one of the most complete browser automation tools available and works across different browsers. This means that with Selenium, the testbench could easily be made to run with non-Chromium-based browsers once they implement QUIC.

When automating Chromium-based browsers with Selenium, one cannot use the browser's built-in developer tools, since only one connection is allowed to the remote dev-tools service and that connection is used up by Selenium [9]. It is through the developer tools that the HTTP Archive we seek can be exported, so this inability hampered the reporting of resource load times and sizes. A number of methods for extracting timing data from Chromium alongside Selenium were considered, but none other than the HAR archive exported the data needed for fair comparisons between the protocols. QUIC, originating from the Chromium project, is currently only supported in Chromium-based browsers, so the testbench is dependent on Chromium. These premises gave two achievable ways to automate the client and still receive satisfactory results.

• Use a remote debugging proxy like crmux (https://github.com/sidorares/crmux) in order to connect Selenium and use Chromium's remote dev tools at the same time.

• Run the automations without Selenium.

Instead of using Selenium, a solution that exports the HAR through the remote dev tools was chosen. The first idea was to implement a tool for loading web pages and saving a HAR; as it turns out, there was already such a tool, named chrome-har-capturer. This tool turned out to slot in almost perfectly for the protocol testing process of this testbench, and we want to give credit to the author(s) for its relatively stable page load status reporting.

3.4 Web content

In order to make page loads over the simulated network as representative as possible of real-world web pages, as well as to remove load times due to server content generation, the landing pages of the Alexa top 500 collection of web pages were scraped with GNU wget. We then limited ourselves to the top 150 pages in order to get shorter run-times for the testbench. These pages were then traversed visually in a browser to see which pages worked reasonably well as static content. Reasons for excluding web pages from the test suite were:

• The URL leads to something that is not a web page, but rather an API of some form.
• The server stopped us from crawling it with GNU wget.

• The entire web page, or most of it, is lazy-loaded with Javascript.
• The domain points to another page (often google.com).

• The page hangs the browser when some dynamic resource can not be fetched.


• The domain is a duplicate of an already used domain (google.se and google.com).

This left us with a collection of 40 web pages, which can be found in Appendix C.

3.5 Limiting the bottleneck link

When limiting the bottleneck link, it is important that the virtual link shows properties close to those of real-world network links. This is verified by looking at how TCP connections fare over the simulated link compared to real-world cases such as those presented by Hu and Xu, as well as through some UDP measurements. As it turns out, browsers fail to load many resources when the conditions are as poor as described by Xu and Hu [45,19]. Running many tests with conditions similar to those described did not give a different inter-protocol performance outcome than running just one test with properties close to these conditions. In order to get a picture of protocol performance during varying network conditions, we needed to create bigger differences between the test scenarios. In order to still measure performance on these really poor links, we added a reference test scenario with lower bandwidth but no packet loss included (scenario 7). You will read more on our verification of network behaviour in Section 3.7; now let us look at how we created the bottleneck link.

The bottleneck link is limited using the Traffic Control (tc) tool in GNU/Linux. Tc is used along with queuing disciplines (qdiscs) that operate on network interfaces and limit their outgoing packets. In order to limit both directions of a connection, one interface in each direction is limited. On these interfaces, qdiscs are chained together in order to obtain the multiple desired properties for the link.

Regarding how to create an environment for correctly emulating a bad link, the Linux Foundation wiki entry on network emulation states the following [30]:

“When loss is used locally (not on a bridge or router), the loss is reported to the upper level protocols. This may cause TCP to resend and behave as if there was no loss. When testing protocol response to loss it is best to use a netem on a bridge or router.”

This means that all limits on the bottleneck link are set on the interfaces of the virtual switches in the setup. If the limits were set on the local interfaces of the client or server, transmissions over TCP (HTTP) would have a direct feedback loop from the interface, both giving them an unfair advantage compared to QUIC and not being representative of a real-world scenario. This is the reason for using two switches within the setup: to separate the client and server from the interfaces limited by qdiscs.

The entire qdisc chain used for limiting the outgoing traffic of an interface consists of three qdiscs, illustrated in Figure 3.4.

Figure 3.4: Example qdisc chain of a limited network interface

The packets must traverse the entire chain of qdiscs before leaving the interface, thereby being affected by the network properties identified as common on a cellular network. Each qdisc in the chain has a large buffer of 10,000 packets so that packets are not dropped while waiting in the qdisc. Packet drops are already accounted for within the model, and uncontrolled packet loss due to full buffers would make the results unpredictable as well as unrepresentative of the real-world scenario the model tries to simulate. The code for limiting the network links can be found in the source code repository (https://github.com/hansfilipelo/master-thesis-netem/blob/master/net-setup/).


The following subsections go into further detail on the individual parts of the interface-limiting qdisc chain.
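To make the chaining concrete, here is a minimal sketch of how two qdiscs can be attached with tc from Python. The interface name and all parameter values are illustrative assumptions (the real chain uses three qdiscs, as in Figure 3.4); the authoritative configuration lives in the source code repository linked above.

import subprocess

def run(cmd):
    # Execute a tc command, raising on failure.
    subprocess.run(cmd.split(), check=True)

def limit_interface(dev="eth1"):
    # Root qdisc: netem adds a 50 ms mean delay with 5 ms deviation.
    # 'limit 10000' is the large 10,000-packet buffer described above,
    # so the qdisc itself does not drop waiting packets.
    run(f"tc qdisc add dev {dev} root handle 1: netem delay 50ms 5ms limit 10000")
    # Child qdisc: a token bucket filter enforcing the bandwidth limit.
    run(f"tc qdisc add dev {dev} parent 1: handle 2: tbf rate 2mbit burst 32kbit latency 400ms")

if __name__ == "__main__":
    limit_interface()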

Modelling bandwidth

Bandwidth is modelled both as a static bandwidth and using UDP traces provided by Haakon Riiser [37]. From the traces, a mean bandwidth over a time period of one second is taken and applied to the link. When the trace comes to an end, the simulation starts over from the beginning of the trace, since the traces are at most 120 s long while each test run is 6-10 hours depending on browser behaviour. In order for all tests to be similar, the same bandwidth trace is used for all runs but scaled using two factors, one for the up-link and one for the down-link.
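The playback loop could look like the sketch below. The trace file format assumed here (one "millisecond-timestamp bytes-received" pair per line) and the use of tc qdisc change on a tbf are illustrative assumptions; the actual parsing and qdisc layout are in the testbench repository.

import subprocess
import time

def one_second_means(path, multiplier=1.0):
    # Bucket the trace into one-second windows and return the mean
    # bandwidth of each window in bit/s, scaled by the up/down multiplier.
    buckets = {}
    with open(path) as trace:
        for line in trace:
            ts_ms, nbytes = line.split()
            second = int(ts_ms) // 1000
            buckets[second] = buckets.get(second, 0) + int(nbytes) * 8
    return [buckets[s] * multiplier for s in sorted(buckets)]

def play_trace(dev, rates_bps):
    # Re-apply the next one-second mean as the tbf rate once per second,
    # wrapping around when the (at most 120 s) trace ends.
    i = 0
    while True:
        rate = max(int(rates_bps[i % len(rates_bps)]), 1000)
        subprocess.run(["tc", "qdisc", "change", "dev", dev,
                        "parent", "1:", "handle", "2:", "tbf",
                        "rate", f"{rate}bit",
                        "burst", "32kbit", "latency", "400ms"],
                       check=True)
        time.sleep(1)
        i += 1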

Modelling packet loss

Often in mobile networks, packet loss comes in bursts due to some change in the environment. In order to simulate this, the channel is modelled by a two-state Markov model called the Gilbert-Elliot model of packet loss. The idea is that the channel has two states: a "good" state (gap mode) and a "bad" state (burst mode). In gap mode there is a certain probability k that a packet is correctly transmitted, and in burst mode the probability is h, where h ≪ k. In each state there is a certain probability, p and r respectively, of leaving the current state and entering the other, modelling the stability of the channel. Figure 3.5 shows a finite state machine (FSM) of the process [13,17].

Figure 3.5: Gilbert-Elliot model of packet-loss
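As a self-contained illustration of the model, the sketch below simulates a Gilbert-Elliot channel and reports the resulting loss rate; it is useful for sanity-checking the long-run loss a given (p, r, h, k) combination produces. The parameter values in the example approximate a scenario from Table 3.4 and are assumptions for illustration only.

import random

def gilbert_elliot(n_packets, p, r, k=1.0, h=0.3, seed=42):
    # p: probability of moving gap -> burst, r: burst -> gap.
    # k: probability of correct delivery in gap mode,
    # h: probability of correct delivery in burst mode (h << k).
    rng = random.Random(seed)
    in_burst = False
    lost = 0
    for _ in range(n_packets):
        # Deliver with the probability of the current state...
        if rng.random() >= (h if in_burst else k):
            lost += 1
        # ...then possibly transition to the other state.
        if in_burst:
            in_burst = rng.random() >= r
        else:
            in_burst = rng.random() < p
    return lost / n_packets

# Roughly the "high loss" scenario of Table 3.4: 1% to burst, 3% to gap,
# 70% loss inside bursts (h = 0.3), lossless gaps (k = 1.0).
# The stationary burst share is p/(p+r) = 0.25, so expect about 17.5% loss.
print(gilbert_elliot(1_000_000, p=0.01, r=0.03))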

Since it is non-deterministic when the bad and good states occur, it is of great importance to run the testbench many times with the same test parameters in order to get a representative sample of how each protocol performs under the simulated conditions.

A userland implementation of the Gilbert-Elliot model that randomized once every second was explored; the point was that the random number generator could start from the same seed for different tests. It was however decided that using the netem qdisc's built-in Gilbert-Elliot model would be preferable, since it can randomize on a per-kernel-tick basis [28]. This is far more often than our userland Python implementation could introduce random events, making variations in the random event stream matter less to the end results of the tests.
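For reference, netem's built-in model can be attached with a single tc invocation. The mapping of Table 3.4's columns onto the gemodel parameters (p = move-to-burst, r = move-to-gap, followed by the loss rates in burst and gap mode) is our reading of the tc-netem documentation and should be checked against it; the interface name and scenario values are illustrative.

import subprocess

# "Low loss" scenario from Table 3.4: 1% gap->burst, 3% burst->gap,
# 10% loss inside bursts, no loss in gap mode.
subprocess.run(["tc", "qdisc", "add", "dev", "eth1", "root", "netem",
                "loss", "gemodel", "1%", "3%", "10%", "0%"],
               check=True)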

A limitation of this loss model is that we see no losses from full buffers. Most routers today have large buffers and mostly drop packets as a tool to control TCP fairness, limiting fast and long-running transmissions [10].

Modelling latency and inter-packet arrival times

Latency variations of the simulated network are modelled as a normal distribution with a certain mean µ ms. There are four different ways of modelling latency in the different test scenarios:

1. No latency added
2. Static latency added
3. Latency with a fixed standard deviation
4. Latency with a standard deviation that varies to match the current bandwidth

Figure 3.6: Two packets arriving with mean latency, showing distribution curve for latency

The first three cases are quite static in how they model latency; they create controlled environments that are highly reproducible due to their less random behaviour. The fourth case is a little different and was created in order to achieve realistic inter-arrival times while still not receiving too many packets out of order. The standard deviation of the distribution is changed to match the current rate of the connection when using the bandwidth trace provided by Riiser [37].

At a given rate limit, packets arrive at the receiver with fixed inter-arrival times. We want these packets to have varying inter-arrival times without making too many of them arrive out of order. We achieve this by specifying a normally distributed delay with a variance that depends on the current rate of the link. We want to choose the standard deviation σ of the distribution so that a certain share of packets arrives in order while still seeing random inter-arrival behaviour.

The time between two packets arriving with a latency equal to µ is illustrated in Figure 3.6. Here L is the time in milliseconds of a certain percentage of the distribution we seek, and is calculated according to

L = 1 / (2 · Packets/s),    (3.1)

where Packets/s is calculated from the rate limit of the connection, given in Mbit/s, according to

Packets/s = (Mbit/s · 1 000 000) / (B · P).    (3.2)

Here B is the number of bits per byte, 8, and P is the number of bytes per packet, which is 1500 since the MTU of the link is set to 1500 bytes. The deviation of the latency distribution we seek is then calculated from L according to

σ = ((µ + L) − µ) / N_X% = L / N_X%,    (3.3)

where µ is the mean of the latency distribution set on the connection and N_X% is the X-percentile value of the standard normal distribution [3].
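A worked example of equations (3.1)-(3.3) for a 10 Mbit/s link, using as an assumed illustration a target of 95% of packets in order (N_95% ≈ 1.645):

from statistics import NormalDist

def sigma_for_rate(rate_mbit, percentile=0.95, bits_per_byte=8, mtu=1500):
    pkts_per_s = rate_mbit * 1_000_000 / (bits_per_byte * mtu)  # eq. (3.2)
    L_ms = 1000.0 / (2 * pkts_per_s)                            # eq. (3.1), in ms
    n_x = NormalDist().inv_cdf(percentile)                      # N_X%
    return L_ms / n_x                                           # eq. (3.3)

# 10 Mbit/s with MTU 1500 gives ~833 packets/s and L = 0.6 ms,
# so sigma = 0.6 / 1.645, about 0.36 ms.
print(sigma_for_rate(10))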

3.6 Collecting the results

As mentioned in Section 3.3, an HTTP archive is used to save information about each test. The HTTP archive is a JSON file with information on resource fetching, exported from the browser [36,1]. The reason for using an HTTP archive is that it gives a great overview of the final state of the web transfer. The reason for not combining the HTTP archive with packet traces is that QUIC is an entirely encrypted protocol. In the long run this is better for the protocol itself, but it creates some challenges for analysis:


“The transport information for QUIC (congestion related information) is encrypted mainly to guarantee the transport can always evolve. If the acks were in the clear, or even check-summed, the concern is that eventually middle boxes would start parsing the congestion information and would break with any forward changes. This is currently a problem for TCP; the wire format allows for negotiated options and flexible features which are practically unusable because of the expectations of current hardware on the internet.

The down side, of course, is that hiding these details from middle boxes also means QUIC flows are hard to analyze if you don’t control an endpoint. The tcpdump tool can let one visualize the rate of packets, and the gaps in packets, but it’s unclear which packets contain payload, which have congestion information, which have retransmits etc. Both client and server side code are architected to have hooks to easily dump this information during userspace processing. Such logs are more data-rich than tcpdumps and can be tied together with kernel level packet traces to get a better idea of latency across the whole system.”

The testbench controls all parts of the network chain; in particular it controls the client browser, which is needed to export the HTTP archive.

Post-processing HTTP archives

An HTTP archive contains information on each request as well as on the total page load. Combining the results of many HTTP archives gives a statistically more accurate view of how each protocol performs. The source code contains a script (https://github.com/hansfilipelo/master-thesis-netem/blob/master/) which collects the data from all runs with a given set of parameters into a bigger collection, also generated as a JSON file. The resulting JSON file has the following structure:

{
  "identifiers": {
    "web_protocol": "PROTOCOL",
    "loss_ul": "PERCENT_LOSS_ON_UPLINK",
    "delay_ul": "DELAY_ON_UPLINK",
    "deviation_ul": "DELAY_DEVIATION_UPLINK",
    "bandwidth_ul": "BANDWIDTH_UPLINK",
    "loss_dl": "LOSS_RATE_DOWNLINK",
    "delay_dl": "DELAY_ON_DOWNLINK",
    "deviation_dl": "DELAY_DEVIATION_ON_DOWNLINK",
    "bandwidth_dl": "BANDWIDTH_DOWNLINK"
  },
  "websites": {
    "example.com": [
      {
        "time": 9001,
        "status": true,
        "total_bytes_fetched": 9001,
        "resource_count": 3
      },
      {
        "time": 3000,
        "status": false,
        "error": "Some error."
      }
    ]
  }
}

Listing 3.4: pseudo-code/data-format.json


The data format saves each load of a web page into a list under websites/<url> in the main dictionary. Within each entry of the list, time is the maximum of the browser-reported onContentLoad and the longest-running resource fetch, resource_count is the number of successfully fetched resources and total_bytes_fetched is the total number of bytes fetched. This file structure enables easier analysis of all HTTP archives gathered with a given set of parameters at once. The reason for choosing the page load time as the maximum of the resource times and onContentLoad is that sometimes, when a long-running resource fetch fails, the last resource that succeeded is set as onContentLoad by Chromium. By taking the maximum over all resources and onContentLoad, the testbench ensures that this reporting error does not affect the statistics. The parameters of the test run are used as identifiers for the data and are later checked when doing comparisons, ensuring that comparisons between protocols use the same connection parameters.
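The per-page time rule described above can be sketched as follows for a single HAR file. Field names follow the HAR 1.2 specification; the error handling and aggregation of the real script are omitted, and the file name is hypothetical.

import json
from datetime import datetime

def iso(ts):
    # HAR timestamps are ISO 8601; fromisoformat needs +00:00, not Z.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def page_time_ms(har_path):
    # Network load time = max(onContentLoad, end of the longest-running
    # resource fetch), as described above.
    log = json.load(open(har_path))["log"]
    page = log["pages"][0]
    start = iso(page["startedDateTime"])
    on_content = max(page["pageTimings"].get("onContentLoad", -1), 0)
    longest = 0.0
    for entry in log["entries"]:
        offset_ms = (iso(entry["startedDateTime"]) - start).total_seconds() * 1000
        longest = max(longest, offset_ms + entry["time"])
    return max(on_content, longest)

print(page_time_ms("example.com.har"))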

3.7 Parameter tuning

Applying properties to an interface is one thing, but if we are to use these links for testing, we need to verify that the parameters we set give the properties we want: properties as close as possible to the conditions described by Hu et al. [19]. The reason for this is that we want to explore what QUIC could mean for users in parts of the world with cellular conditions similar to those described in Hu's report. In order to verify the link properties, a number of tools are used. Mtr (http://www.bitwizard.nl/mtr/) is used to verify latencies, GNU wget is used to look at long-running TCP connections, and the testbench browser is used to verify the connection properties of short-lived TCP connections. When comparing the testbench with the connection properties of Asian cellular networks found by Hu et al. [19], the last is perhaps the most interesting measurement, since most TCP connections are short-lived when browsing the web. Wireshark (https://www.wireshark.org/) is used for packet capture and statistics during all tests. For UDP behaviour the findings by Xu et al. [45] are used as comparison, while for TCP the findings by Hu et al. [19] are the reference. Achieving connection properties similar to what they report is a good starting point for our measurements.

Tuning parameters by checking TCP behaviour

In order to verify that TCP behaviour matches the findings of Hu et al. [19], we use GNU wget to fetch a big file as well as running the testbench browser on five web sites over HTTP/1.1. Fetching both a big file and browsing small files lets us look at both long-running and short-running connections, giving a picture of both steady-state and connection ramp-up behaviour. Testing different parameters lets us see how TCP performs under different conditions.

Browsing When setting the connection properties for the testbench according to Table 3.5 and letting the testbench load the first five pages of the dataset, we end up with TCP connection behaviour somewhat similar to that described by Hu et al. [19].


Property                                 Value
Loss, probability to move to burst mode  100%
Loss, probability to move to gap mode    0%
Loss rate in burst mode                  0%
Delay down/up link                       300 ms
Delay std deviation down link            2.5% overlap (1-5% packet reordering) (Section 3.5)
Delay std deviation up link              0.1% overlap (0-1% packet reordering) (Section 3.5)
Bandwidth trace                          storo_4x_700kbps_120s_tv.log
Bandwidth multiplier up link             0.3
Bandwidth multiplier down link           0.6

Table 3.5: Initial connection properties

The UDP bandwidth trace provided by Riiser has an average bandwidth of 1.013 Mbit/s, and the multipliers scale the bandwidth down a bit to better match the properties of poor cellular links. The measured TCP properties for a test run of these 5 web pages over HTTP/1.1 can be seen in Table 3.6. Page load times were perceived as slow, with the Facebook landing page loading in around 35 seconds over HTTP/1.1. The measured properties are similar to the typical TCP properties of Asian networks described by Hu [19]. This is highlighted in Table 3.6, which compares the properties of a typical Chinese data network with those of our simulated network.

Property                    UMTS         HSDPA        HSPA         Simulated
Down-link bandwidth         13.2 kbit/s  44.3 kbit/s  33.6 kbit/s  42 kbit/s
Up-link bandwidth           5.3 kbit/s   4.3 kbit/s   5.6 kbit/s   N/A
RTT                         528 ms       135 ms       85 ms        600 ms
Loss rate                   3.0%         1.71%        1.95%        0%
Re-ordering rate down-link  6-30%        6-30%        6-30%        1-5%
Re-ordering rate up-link    0%           0%           0%           0-1%

Table 3.6: Average TCP properties on Chinese cellular networks [19] compared with our simulated network

As can be seen, the properties of a TCP connection in our experiments are for the most part similar to those observed in the poor Chinese networks, although our simulated network provides somewhat better conditions on some of the metrics (e.g., loss rate and re-ordering rates).

Getting a large file We used GNU wget to fetch a large file, letting the test run for 5 minutes with the same connection parameters as for the browser test. During these 5 minutes, wget lost the connection on multiple occasions; the resulting download speeds of the individual connections were 352, 281, 274, 272, 265, 260 and 244 kbit/s respectively. We saw a 1.8% packet re-ordering rate and 1.9% packet loss.

Tuning latencies

With the latency model described in Section 3.5, it is found that an overlap of 2.5% of the latency distribution from each side gives a packet reorder rate for HTTP/1.1 TCP streams that varies between circa 1-5% when used with the bandwidth trace provided by Riiser [37] and the connection parameters for scenario 7 (Table 3.5).

Using the tool mtr to send 1000 ICMP ping packets, we saw a latency distribution with a mean value of 616 ms, worst and best latencies of 1314 and 577 ms, and a standard deviation of 89 ms, as listed in Table 3.7.


Average latency     617 ms
Max latency         1314 ms
Min latency         577 ms
Standard deviation  89 ms

Table 3.7: Measured latencies for scenario 7

This is close to the worst-case behaviours described by Hu et al. [19] in their measurements on Asian cellular networks. We will use this latency variation in one of our tests looking at really poor network conditions.

Conclusion on link-properties

In conclusion, all of these properties look a little better than the poor cellular networks described by Hu et al. [19]. Using conditions as poor as those listed in Table 3.5 makes web browsing not only extremely slow but also unreliable. Running many tests with parameters around these conditions gives the same result in all cases, namely that all protocols struggle with these connection parameters. The link properties in Table 3.5 were therefore chosen as one of the baseline scenarios instead of as a starting point from which all test parameters vary. The chosen suite of scenarios, see Tables 3.1, 3.2, 3.3 and 3.4, gives a variation of link properties in order to get a decent understanding of how QUIC reacts to different network conditions. Keeping the heavily limited scenario is however important for understanding how the protocols perform under the conditions described by Hu et al. [19].

4 Results

When comparing the protocols, we need to compare both how fast the different network protocols load pages and how much data each protocol manages to fetch over a limited link. Together, this gives a picture of how each protocol performs during each scenario. Our results are often shown as cumulative distribution function (CDF) graphs, which show the network load times and loaded bytes of all requests for a given network scenario. A CDF graph is used here to show the probability that a random page load (X) completes within a certain time (t), or the percentage of the total samples that load less than a certain number of bytes (b). A shorter network load time is better, while more bytes fetched is better. When our results are not shown as CDFs, they are instead shown as box plots.
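
For reference, a CDF like the ones in this chapter can be built by sorting the samples and plotting the fraction of samples at or below each value. The sketch below uses made-up load time values purely for illustration.

```python
# Minimal empirical-CDF sketch over a set of network load times.
import numpy as np
import matplotlib.pyplot as plt

load_times_ms = np.array([134, 250, 410, 941, 1105, 1500, 2200])  # made up
x = np.sort(load_times_ms)
y = np.arange(1, len(x) + 1) / len(x)  # fraction of samples <= x

plt.step(x, y, where="post")
plt.xlabel("Network load time (ms)")
plt.ylabel("Fraction of page loads")
plt.show()
```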

Since most of our network scenarios are non-deterministic, it is important to look at the results together as a group in order to see general trends; studying a single result in isolation tells us very little. In this chapter we look at the scenarios with the most interesting outcomes. The full suite of scenario results can be found in Appendix A.

We first look at how the protocols perform under a group of scenarios where the protocols are allowed to use a persistent connection to the web server, and then look at cases where there is a substantial difference when the connection is closed between each page load. For each group of scenarios we present the most interesting cases, where we see the most pronounced differences between the protocols. Cumulative distribution graphs (CDFs) for all scenarios are listed in Appendix A.

4.1 Baseline testing

Looking at the cumulative distribution graphs (CDFs) of network load times for the first of the unlimited tests, seen in Figure 4.1 (a), QUIC manages to load all of our webpages in under 1 second. This is faster than the other two protocols in this scenario, both of which load 40% of the pages in over 1 second. HTTP/2 is in general faster here than HTTP/1.1, but both protocols have a lot of outliers, especially HTTP/2. The load time medians for QUIC, HTTP/2 and HTTP/1.1 are 134, 941 and 1105 ms respectively. If we look at the number of bytes fetched, shown as a CDF in Figure 4.1 (b), we can see that the protocols fetch about the same amount of data, meaning that all protocols manage to fetch complete web pages. This is to be expected and verifies that our testbench works as intended. It is unclear why HTTP/1.1 and HTTP/2 show these load time outliers, especially since HTTP/1.1 does not show this many outliers in other tests.

Figure 4.1: Persistent connection testing, unlimited scenario. (a) Load times; (b) Bytes fetched.

Figure 4.2: Persistent connection testing, heavily limited scenario. (a) Load times; (b) Bytes fetched.

The same CDF graphs for the heavily limited test case can be found in Figure 4.2 (a) and (b). Here we can see that all protocols struggle to keep network load times acceptable, with load time medians of 16812 ms for QUIC, 17225 ms for HTTP/2 and 18504 ms for HTTP/1.1. The number of bytes fetched does however show that HTTP/2 does not fetch all the content that HTTP/1.1 and QUIC fetch. HTTP/2 performs really poorly under this severely limited scenario since it only loads web pages partially. The reason for this is that the Chromium-based browser times out waiting for data much more often on HTTP/2 connections. If we inspect the tests visually while they are running, it is clear that there is a difference between the sites as loaded by HTTP/2 and by the other two protocols.

The results obtained from the heavily limited test paint a clear picture: QUIC manages to perform much better than HTTP/2 during really unstable network conditions that include packet re-ordering as well as unstable bandwidth and high latency. A big part of this improvement can probably be explained by the multi-stream-aware congestion control that QUIC implements, instead of relying on TCP, which can only deliver data as a single in-order stream. The separation of the streams' congestion control seems to allow for a more efficient use of the network. HTTP/1.1 also fares well here; since it opens several parallel TCP connections, each stream effectively gets its own congestion control, again indicating that separating congestion control on a per-stream basis gives an advantage.

Figure 4.3: Network load times for bandwidth limited scenarios

4.2 Bandwidth limited scenarios

Figure 4.3 shows a box plot of network load times in all of our purely bandwidth limited scenarios: the extra high bandwidth (EH bw), high bandwidth (H bw), medium bandwidth (M bw), low bandwidth (L bw), high trace limit (H trace), medium trace limit (M trace) and low trace limit (L trace) scenarios. Each box shows the median network load time as the line within the box, with the 25/75% percentiles as the outlines of the box. The whiskers indicate the 5/95% percentiles, while outliers beyond them are shown as dots.
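
This box plot convention corresponds to, for example, matplotlib's boxplot with the whiskers placed at the 5/95% percentiles. The sketch below uses synthetic data and is only meant to show the convention, not to reproduce our figures.

```python
# Box plot with median, 25/75% box, 5/95% whiskers and outlier dots.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
samples = [rng.lognormal(mean=7, sigma=0.5, size=200) for _ in range(3)]  # fake data

plt.boxplot(samples, whis=(5, 95), labels=["QUIC", "HTTP/2", "HTTP/1.1"])
plt.ylabel("Network load time (ms)")
plt.show()
```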

QUIC has the highest median of the three protocols, indicating that it is the slowest protocol in all of these cases. These purely bandwidth limited scenarios are really the worst case for QUIC compared to the other two protocols. One can however see that HTTP/2 has more outliers in all of the scenarios, regardless of whether we are using a static bandwidth or a bandwidth that varies with the trace provided by Riiser. HTTP/1.1 is best in these tests, with a better median load time than the other two protocols as well as far fewer outliers.

In the purely bandwidth limited tests we can see that HTTP/1.1 performs exceptionally well compared to the other two protocols. QUIC, while having a slower median than HTTP/2, manages to have fewer outliers than HTTP/2, making the choice between the two a hard one. We currently have no good explanation for why the respective protocols show the behaviours seen in these tests; we can only make observations here. It should be mentioned that one rarely sees bandwidth limited links without any noticeable latency in the real world, but these tests do give an indication of how the different protocols work on heavily bandwidth limited networks where latency is lower than typical.

4.3 Latency scenarios

In Figure 4.4 we can see a box plot of network load times similar to the one shown in our bandwidth testing, again showing the median, the 25/75% and 5/95% limits, and outliers.

In these scenarios QUIC performs very well. Its median is lower than for both HTTP/1.1 and HTTP/2, with HTTP/1.1 coming in as the second best protocol. One can also see that HTTP/2 in general has a wider spread in its load times, indicating the inconsistencies in its performance that we have seen throughout all testing.

Figure 4.4: Network load times for latency added scenarios

Our static latency scenarios should in theory be a reasonably good case for both HTTP/2 and QUIC with their single connections to the remote server [11], but the further we increase the latency, the further HTTP/2 falls behind QUIC, even with a persistent connection. We can also see that HTTP/1.1 performs on a similar level as HTTP/2 with regards to load times here, while HTTP/2 yet again does not manage to fetch as much data as the other two protocols. Since our latency added scenarios use 10 Mbit/s links with only latency added, i.e. links with a somewhat high bandwidth-delay product, the expected behaviour would have been that HTTP/2 performs reasonably well in comparison. This is however not the case, and HTTP/2 performs as bad as or worse than HTTP/1.1 here. To make sure that nothing was wrong with the web server's HTTP/2 implementation in these tests, we tried replacing the web server Caddy with nginx, which yielded the same results. This could be a product of Chromium's HTTP/2 implementation, for example having shorter timeouts than when browsing over HTTP/1.1. This needs to be explored using another browser implementation, as well as by digging into the Chromium source code, for which there was not enough time during this thesis.
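
As a back-of-the-envelope illustration of the bandwidth-delay product involved, assuming for the sake of the example a 600 ms round-trip time on a 10 Mbit/s link:

$$\text{BDP} = 10\ \text{Mbit/s} \times 0.6\ \text{s} = 6\ \text{Mbit} \approx 750\ \text{kB}$$

A single TCP connection would need to grow its congestion window to roughly 750 kB to keep such a link full, which is why the single-connection protocols were expected to do comparatively well here.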

In summary, the static latency tests are quite representative of a stable mobile connection running over HSPA+ or LTE, which indicates that QUIC could perform well on such networks.

Figure 4.5: Network load times for lossy scenarios

4.4 Lossy scenarios

Figure 4.5 again shows a box plot of the network load times for our lossy test scenarios: high loss (H loss), medium loss (M loss) and low loss (L loss). Lossy connections are a case where we know from other reports [11, 26, 39] that HTTP/2 performs poorly; the question is whether QUIC, by taking a new approach in its transport layer, can improve on HTTP/2 here.

Looking at the graph, we can see that QUIC performs very well here, even outperforming HTTP/1.1.

Lossy conditions are less common in most networks, but can occur in cellular networks when the user moves between the coverage areas of different base stations [31]. A protocol that is resilient to loss can continue to function under bad network conditions, which is especially valuable when the client is moving, such as when the user is travelling.

Figure 4.6: Load time comparisons between open and closed connections. (a) Unlimited scenario; (b) High latency scenario.

4.5 Differences between persistent and closed connection

Looking at the test scenarios where the results differ between running with a new connection for each page load and running with a persistent connection, there is not much difference. Figure 4.6 (a) shows a CDF of network load times for the unlimited scenario, where the dotted lines show the runs using a new connection for each page load. Figure 4.6 (b) shows the same CDF for our high latency scenario. Though the differences are small, in all cases the new connection tends to load web pages faster than the persistent connection. One could easily assume that it should be the other way around, but by letting the connection idle between page loads we make the congestion algorithms reset some of their values while others are kept. When the testbench then loads the next page, the server again needs to scale up its congestion window while other congestion parameters are not reset. These phenomena are well outlined by Erman et al. [14]. A way to improve persistent connection performance is to keep sending data as long as the browser has the connection open, in order to prevent the congestion algorithm from resetting, though this is not very energy efficient for battery powered devices.
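
As a side note, on Linux the congestion-window reset after an idle period is governed by the tcp_slow_start_after_idle sysctl (congestion window validation in the spirit of RFC 2861). A minimal sketch for checking it, assuming a Linux host:

```python
# 1 (the Linux default) resets the congestion window after an idle period;
# 0 keeps it. The path only exists on Linux.
with open("/proc/sys/net/ipv4/tcp_slow_start_after_idle") as f:
    print("tcp_slow_start_after_idle =", f.read().strip())
```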

4.6 Summary

On really limited network links, QUIC manages to achieve more reliable file transfer than HTTP/2. HTTP/1.1 also manages more reliable transfers than HTTP/2, while being faster than QUIC in bandwidth limited cases. QUIC, on the other hand, is faster in high-latency and lossy scenarios.

In line with earlier findings [19, 4], when choosing a protocol for web transfer on poor cellular networks, HTTP/1.1 should be considered in preference to HTTP/2. This study indicates that QUIC is at least as good as HTTP/1.1 in these cases, and might be the best web protocol for use on unstable connections. If speed over poor links is a priority, HTTP/1.1 is still a very reasonable choice. For web browsing in general, QUIC could be considered the best option, as it more consistently delivers web pages quickly over reliable connections while still being considerably better than HTTP/2 over poor connections.

5 Discussion

5.1 This work

The methodology used to acquire the results in this thesis has both good and bad aspects. There are many parts to implementing solid web protocol testing, such as network emulation, statistics gathering and measurement, and browser automation. A lot of issues exist, many of them related to controlling the browser and measuring protocol performance.

Method

Investigating QUIC performance is not easy due to the lack of widespread QUIC support in both web servers and software designed to measure web performance. In order to achieve a testbed that produces reliable and comparable results between the protocols, the testbed inherits some disadvantages:

• The use of only static content makes the tests less representative of the real-world web pages that the test URLs are based on. The use of pre-downloaded static content also makes the most common measurement of web browsing performance, page load time, unavailable, because some lazy-loaded resources are missing from the downloaded content. When content is missing, pages often hang waiting for the timeout that signals the content is unavailable. Alternatives to serving static content include building a QUIC-to-HTTP proxy (extensive work) for more real-world-like page loads, or building a recording framework for QUIC (even more extensive work). Both of these solutions have their own disadvantages, but exploring these options could be interesting future work.

• For some of the popular domains, loading the landing page is not a typical page load; rather, users tend to load resources from that domain other than the landing page.

• QUIC is still a young protocol. Test results for QUIC as a protocol perhaps do not make sense until we see more mature integrations into mainstream web server software such as Apache or nginx, which are exposed to a larger user base and could possibly be more reliable and faster than the toy QUIC server provided by the Chromium project.

• While there are many tools for browser automation, the state of the integration between the browser and the tools is often less than desirable at this point in time. There are too many occasions on which the browser will simply refuse to load a page. This problem is often related to when the browser decides
