Estimating Internet-scale Quality of Service Parameters for VoIP

(1)

Institutionen för datavetenskap

Department of Computer and Information Science

Final thesis

Estimating Internet-scale Quality of Service

Parameters for VoIP

by

Markus Niemelä

LIU-IDA/LITH-EX-A--16/013—SE

2016-03-24

(2)

Linköpings universitet

Institutionen för datavetenskap

Final thesis

Estimating Internet-scale Quality of Service

Parameters for VoIP

by

Markus Niemelä

LIU-IDA/LITH-EX-A--16/013—SE

2016-03-24

Supervisor: Mikael Asplund

Examiner: Simin Nadjm-Tehrani

(3)

Presentation Date

2016-03-24

Publishing Date (Electronic version)

2016-04-22

Department and Division

Software and Systems

Department of Computer and Information Science

URL, Electronic Version

http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-127360

Publication Title

Estimating Internet-scale Quality of Service Parameters for VoIP

Author(s)

Markus Niemelä

Abstract

With the rising popularity of Voice over IP (VoIP) services, understanding the effects of a global network on Quality of Service is critical for the providers of VoIP applications. This thesis builds on a model that analyzes the round trip time, packet delay jitter, and packet loss between endpoints on an Autonomous System (AS) level, extending it by mapping AS pairs onto an Internet topology. This model is used to produce a mean opinion score estimate. The mapping is introduced to reduce the size of the problem in order to improve computation times and improve accuracy of estimates. The results of testing show that estimating mean opinion score from this model is not desirable. It also shows that the path mapping does not affect accuracy, but does improve computation times as the input data grows in volume.

Keywords

Voice over IP (VoIP), Quality of Service (QoS), cost estimation, QoS-driven routing

Language

X English

Other (specify below)

Number of Pages 50 Type of Publication Licentiate thesis Degree thesis Thesis C-level X Thesis D-level Report

Other (specify below)

ISBN (Licentiate thesis)

ISRN:LIU-IDA/LITH-EX-A--16/013--SE

Title of series (Licentiate thesis)

(4)

Abstract

With the rising popularity of Voice over IP (VoIP) services, understanding the effects of a global network on Quality of Service is critical for the providers of VoIP applications. This thesis builds on a model that analyzes the round trip time, packet delay jitter, and packet loss between endpoints on an Autonomous System (AS) level, extending it by mapping AS pairs onto an Internet topology. This model is used to produce a mean opinion score estimate. The mapping is introduced to reduce the size of the problem in order to improve computation times and improve accuracy of estimates. The results of testing show that estimating mean opinion score from this model is not desirable. It also shows that the path mapping does not affect accuracy, but does improve computation times as the input data grows in volume.

(5)

List of Abbreviations

AS Autonomous System.

BGP Border Gateway Protocol.

C2P customer-to-provider.

CAIDA The Cooperative Association for Internet Data Analysis.

CI confidence interval.

ICMP Internet Control Message Protocol.

IGP Interior Gateway Protocol.

IP Internet Protocol.

ISP Internet Service Provider.

ITU The International Telecommunication Union.

LLSE Linear Least Squares Estimation.

MOS Mean Opinion Score.

MSE Mean Square Error.

P2C provider-to-customer.

P2P peer-to-peer.

QoS Quality of Service.

RMSE Root Mean Square Error.

RTT Round Trip Time.

S2S sibling-to-sibling.

(8)

Chapter 1 Introduction

Real-time media quality between two or more endpoints is critical for the success of Voice over IP (VoIP) applications. As providers of these services are only in control over parts of the network on which they operate, they are today not controlling how media packets are being routed between endpoints in individual sessions. This is instead done by the routing policies of Internet entities outside of the product’s control.

These global packet routes have tremendous impact on real-time communi-cation quality, so in order to be more competitive and provide guaranteed real-time media communication quality, control must be established over real-real-time media packet routing. The first step towards controlling real-time media packet routing is understanding global network impact on real-time media quality.

1.1 Problem description

Today, a large amount of data is being logged for millions of VoIP calls every day to monitor and give insight into call quality amongst other things. With such a large amount of data, it is possible to infer quality properties of individual links, such as Mean Opinion Score (MOS), Round Trip Time (RTT), packet loss, packet delay jitter, and available bandwidth. However, even on the highest level where connections between Autonomous System (AS) pairs are considered, there are thousands of endpoints and millions of pairs of such endpoints, numbers that are steadily growing. Trying to infer parameters for all of these pairs and endpoints is a large problem, which takes a long time to calculate.

Additionally, many pairs may not have up-to-date data reflecting the true characteristics of the connection between the pairs’ endpoints. This is because the Internet is constantly changing, and the number of calls logged in a day is dwarfed by the number of possible AS pairs.

As a first step toward controlling real-time media packet routing, the goal is to be able to make an informed decision on whether to relay a call through company controlled ASs or not, weighing costs against quality of service benefits.

(9)

This means that identifying calls where the experience is expected to be poor and being able to offer alternative routes for these is of particular interest.

A

B

C D

Company Network

Internet

Figure 1.1: Two potential paths for a VoIP call to take between A and B. The choice is illustrated in Figure 1.1. A call between AS A and AS B normally takes a path through other ASs on the Internet over which the VoIP provider has no control, represented by the solid edge in the figure. The dotted edges represent the alternative path where the call is relayed through a network over which the provider does have control. While the provider still has no control over edges AC and DB, it can now influence edge CD. Data for the full dotted path is not available however, as traffic has not yet been routed through this path. Therefore it is desirable to infer this from the individual components of the path.

The problem’s complexity lies in the fact that there is a large number of combinations of endpoints, and it keeps growing. There are billions of potential pairings, and trying to solve for all of the Quality of Service (QoS) contributions of these is infeasible for two reasons: data will not be available for all these pairs, and the resources required to process the amounts of data required would be immense. How can these billions of decisions be handled efficiently then?

Internally at Skype, one provider of VoIP services, a model to try to keep track of all of these pairs for which there is data has been developed. However, this model is considered to be slow and not scalable by the company. To be able to put this into practical use, the complexity must be reduced somehow.

This thesis proposes and evaluates an improved model that aims to be faster at processing the large number of AS pairs required to make well-informed decisions. The goal is to reduce it to a more manageable state in order to be able to analyze the data quickly and reliably. The model would be applied to the large amounts of data prior to call set-up, and could then be used to provide real-time information about the expected quality of service of route alternatives during call set-up.

(10)

model can be of use in making relay decisions. This will be done by comparing it to the existing model in accuracy of estimates as well as computation times.

1.2 Approach

To evaluate QoS given the problem description, we approach the problem in two parts as described in this section.

1.2.1 Network modeling

The existing model describes all possible AS pairs and the QoS metrics associ-ated with them by keeping track of every individual pair. This model represents the network as a complete graph of AS vertices. That is, each of the vertices is directly connected to all other vertices, as in Figure 1.2. Recognizing that quality metrics can be affected by network connections within an AS as well as between a pair, inferring these costs from call data becomes an expensive

prob-lem. With n vertices, there are N = n2 _{cost variables to calculate. For C data}

points, where C > N , using a basic Linear Least Squares Estimation (LLSE)

algorithm to solve for the costs has a time complexity of O(N2_{C) = O(n}4_C).

Currently, there are over 50000 ASs present on the Internet [1], which gives the following:

n > 50000 =⇒ n4_{> 6.25}

∗ 1018

It is unlikely that all of these AS pairs will be directly involved with VoIP calls, but as the number of ASs is constantly growing, this may well be reason-able in the future if we assume that the proportion of ASs involved in VoIP calls does not drop. Adding that the number of data points C must be greater than

N = n2_{, this is a daunting size. This model will henceforth be referred to as}

the na¨ıve model, as the way it represents the Internet is simplified compared to the next model.

A B

C D

E F

(11)

Clearly the Internet is not a complete graph, and in practice most ASs are directly connected to very few other ASs, routing traffic towards each other using the Border Gateway Protocol (BGP). The na¨ıve model would consider a path between pairs to be a direct connection, independent of all other ASs pairs, but knowing that this is not the case, we can model it differently. A path between two ASs is in fact a combination of direct connections between ASs. An

example is shown in Figure 1.3. Here, a path between D and A is D→ B → A,

including two direct connections. Note that both of these connections are also used in paths connecting other AS pairs in the graph. Comparing to Figure 1.2 there are much fewer direct connections, but all ASs still have paths to each other. With n vertices, there are still n2_{pairs, but each pair maps to a path P}

that is a subset of the m direct connections between vertices e on the Internet. (ASi, ASj)7→ Pi,j⊂ {ek : k∈ [1, m]}∀i, j ∈ [1, n]

Necessarily, m_{≤ n}2_{, and looking at BGP routing tables reveals that m appears}

to have a growth that is closer to O(n) than it is to O(n2_{). With the numbers}

from earlier, this could mean a potential improvement of 9 orders of magnitude when scaled up fully. It should be noted that while the number of subsets possible is 2n_{, there is generally only one correct path for each pair.}

A

B C

D E F

Figure 1.3: A topological AS relationship graph.

With this approach, calculating the cost of connections between pairs turns into two problems. The problem of inferring costs for edges and vertices of the

graph remains, but with the new time complexity O(m2_{C). To achieve this, the}

problem of mapping AS pairs to paths is introduced. Consequently, finding a good method of solving this is necessary for this proposed model to be feasible to implement.

When evaluating the chosen model, we will measure the computation speed-up of the model, as well as look at whether or not the produced results are able to identify poor paths and good alternative paths which will help in the decision-making process.

(12)

1.2.2 Cost-based QoS evaluation

Given a graph that has been populated with costs, in order to be able to make a decision about whether to relay a call or not, we need to be able to quantify the QoS of a call between a pair of ASs somehow. It may not be immediately obvious what impact the QoS metrics in the model have on the actual perceived quality of a call; even if a relayed call might improve the metrics, a non-relayed call may still provide a sufficiently high QoS.

MOS is generally used to measure the subjective QoS, but it cannot be mod-elled using the above approach, as it is modmod-elled as a more complex function of other QoS parameters. There is however a relationship between the measur-able network metrics and MOS, meaning the decision-making could be reduced to comparing single numbers. This relationship however is not obvious, as is noted in Section 2.5, and other explanatory variables may be necessary to obtain satisfactory results.

In order to evaluate the implemented model’s usefulness for decision-making, MOS will be used. As such, models to map network metric measurements to MOS will be explored and their accuracy evaluated.

1.2.3 Evaluation of model

To evaluate the network model and MOS estimation model, the following ques-tions need to be answered:

• Is the proposed network model as reliable as the current na¨ıve model? • Does performance in terms of computation time improve in practice when

comparing the proposed model to the current na¨ıve model?

• Can the model directly predict user experience by translating network metrics into MOS?

1.3 Related work

The behavior and performance of traffic in networks has long been studied, and advances are made constantly.

The Cooperative Association for Internet Data Analysis (CAIDA) runs a project that studies the topology of the Internet as a whole, trying to identify the relationships between entities in the network [10, 11]. The ability to observe and infer these relationships is a challenging endeavour, often imprecise, but has served as an inspiration for the ideas in this thesis. The topology dataset [2] is primarily concerned with classifying ASs and the connections between them.

Several other papers have looked at Internet topology and AS relationships as well [13, 14] and the effects of the paths traffic takes as a result [14, 23, 24]. That the Internet is organized in a way that is suboptimal for the traffic types today is found to be quite clear, but no solutions are found that would be easy to implement on the scale of the Internet. As the premise of the model proposed

(13)

in this thesis assumes knowledge of Internet topology, these studies are highly interesting to investigate. They are discussed further in Sections 2.3 and 2.4.

Methods for real-time measurements of some QoS metrics have been studied as well. Using DNS servers, reasonably accurate estimates of RTT and packet loss have been able to be obtained [15, 25]. These measurements enable more research to be performed, though it is not appropriate for the purposes of per-forming real-time analysis at call set-up in VoIP applications, as it is much too time consuming.

The Online Network Traffic Characterization (ONTIC) project is an ongoing project which seeks to explore new ways to characterize network traffic online using large scale data analysis. In one of the project deliverables [4], they provide a collection of the current state of the art in offline algorithms used to this end. The document, based on 101 sources, gathers not only the algorithms, but also the frameworks primarily used to implement them. As it stands today, Apache Hadoop and Apache Spark are the most important frameworks used, taking advantage of distributed computing. Apache Hadoop in particular has been widely used for many years, and is based on the MapReduce paradigm, in which mappers in one step perform transformations of independent data, and reducers then aggregate the results.

The ONTIC project is very interesting for the field of study, however the algorithms described in the deliverable are ones for classification only. As the problem statement for this thesis involves the comparison of routes between two points, classification does not seem suitable, as there could be potential improvements even within the elements mapped to the same class.

Component-based QoS effects are studied in[26], where real-time application systems are considered. QoS effects are here attributed to components on a much lower level than in this thesis, but the original na¨ıve model described in this thesis is one that has been developed based on this work. This type of model is found to be useful in identifying long-term QoS issues, shifting focus away from short-term problems with components.

1.4 Limitations

• The parameters considered in the model are RTT, packet delay jitter, and packet loss. Bandwidth as a cost variable is omitted from the model due to the complexity in both measuring and modeling it. Unlike the other parameters, it is not additive and would require a different approach. (See Section 2.6)

• Given the time frame and amounts of data processed, it is not possible to request any new kind of data collection to be performed. The evaluation must therefore rely on data already available. For call data, this means that MOS entries are available for only a small fraction of the potential data. Additionally, all route data has been gathered externally (See Sec-tion 2.3), even though this could potentially be gathered via calls in the

(14)

future. As a result of these two data limitations, the actual number of AS pairs that would be involved in a large-scale implementation of this model will not be reached.

• The actual decision-making process is not in the scope of this thesis, as it is largely a business decision. That is, no attempt to compare the potential paths is made, as this requires information about what paths could be used for relaying the calls, as well as when it would be profitable to relay. The intent is solely to provide a basis to make such a decision.

(15)

Chapter 2 Background

This chapter will first introduce some concepts related to Internet routing to understand how the Internet can be modeled appropriately (Section 2.1-2.4). Then the QoS metrics that will be considered for the model and the methods with which to analyze and estimate them will be presented (Section 2.5-2.7).

The terms route and path are used interchangeably throughout the thesis.

2.1 Autonomous systems

The Internet is a large network of routers connecting hosts to each other. In order to manage the scale of the Internet and allow traffic to be routed correctly, the Internet has been divided up into autonomous systems, all with indepen-dently managed internal routing. These ASs connect to each other through Internet Exchange Points or direct links. Formally, an autonomous system is defined by RFC 1930 to be ”a connected group of one or more Internet Proto-col (IP) prefixes run by one or more network operators which has a single and clearly defined routing policy”[17], where prefixes are contiguous blocks of IP addresses.

At least 50,507 ASs were in use as of May 6th, 2015, a number that has seen a consistent growth of roughly 3,000 per year over the last decade [1].

2.2 Routing

When routing traffic within an AS, any Interior Gateway Protocol (IGP) may be used to determine the most suitable path. Some ASs may opt to consider router hop count, while others use more complex metrics when selecting paths [23].

When routing traffic between ASs, the protocol used is the path vector protocol BGP. Paths are found by advertising routes to neighboring ASs, and selecting the best route among the ones known. While ties between best routes are broken by AS path length, the protocol first considers a local preference

(16)

value for path selection. This value depends on a local policy, the specifics of which are decided upon at the discretion of the AS administrator. The policy can be affected by any number of things, not all of which are related to path performance optimality [24]. For example, network operators may have agreements with each other that have to be honored, or they might have to handle large amounts of traffic through load balancing [23].

Studies have shown that packets very frequently are not routed using the shortest path, neither on the router level [24] nor on the AS level [14]. Un-derstanding the routing decisions made is clearly quite a complex problem, especially as routing policies are not obvious to an observer.

2.3 Route Views

University of Oregon’s Route Views [3] is a project which collects information about the global routing system, and publishes it for public use. It has a number of participating routers around the world that contribute with their BGP routing tables every two hours, showing snapshots of their point of view of the Internet. Each entry in these tables contains the following information:

• Network - The IP network reachable through this route

• Next Hop - The IP address to which traffic will be forwarded on this route • Metric - A value used to discriminate between different points of access

to a neighboring AS

• LocPrf - The local preference value of the route, determined by BGP policy

• Weight - A local, Cisco specific preference value, not shared through BGP • Path - The AS path taken to reach the network through this path Of particular interest for this thesis is the Path field, which provides information on the connections between ASs. The Route Views data is not a complete set of paths, as it only contains a few viewpoints, but with this information it is possible to construct a graph representation of the Internet, which can be used to infer additional possible paths for traffic to flow. However, as the next section will elaborate on, due to routing policies such a graph is not sufficient to find the paths that traffic actually takes.

2.4 AS relationship inference

ASs are typically owned by a company or an Internet Service Provider (ISP), who pays a higher-tier ISP to transit its traffic to the rest of the Internet. Some ASs might be connected to multiple higher level ISPs, yet they do not advertise to these ISPs that these additional connections are possible routes [23]. That is,

(17)

the ISPs will only see the paths through the AS to lower-tier ISPs as available. It is clear that the existence of a connection does not mean that traffic will be routed through it for various reasons. To better understand how ASs interact, a classification scheme for AS relationships was proposed by Gao [13].

The proposed scheme divides direct connections between ASs into the follow-ing four categories: provider-to-customer (P2C), customer-to-provider (C2P), peer-to-peer (P2P), and sibling-to-sibling (S2S). X is considered a provider for Y, considered the customer, if X transits traffic for Y but not the other way around. If X and Y transit traffic for each other, they are considered siblings, and if neither transits traffic for the other, they are considered peers.

CAIDA, analyzes Route Views BGP tables in order to gain a better un-derstanding of how ASs are related to one another. They base this on the classifications of Gao, but apply more recent algorithms to infer the relation-ships. [10, 11] The results of their analysis is continually published in the form of relationship data, where they list all links between ASs, and whether they are provider-to-customer or peer-to-peer. Customer-to-provider relationships are implicit in the data sets. Sibling-to-sibling relationships are not reported in the data sets, although they state that they infer such relationships in their analysis as well. Since they are not present in the data sets however, this type of relationship will be omitted from further discussion.

A B

C D E

F G H

Customer-to-Provider Peer-to-Peer

Figure 2.1: An AS graph with relationships.

Within this framework, some conclusions can be drawn about valid paths in an AS graph. Valid paths must be valley-free, which is defined as: after passing a provider-to-customer or peer-to-peer edge, a path may not pass a peer-to-peer or customer-to-provider edge anymore. Violations of this principal would mean that an ISP would be transiting traffic for other ISPs without getting paid by anyone, which would not be in their best interests.

To illustrate, in Figure 2.1, an example of AS relationships is shown with ASs organized in vertical levels by tier. Consider the case where AS F wishes to send

traffic to AS H. There are two possible valid paths: F → C → A → D → H and

F → C → D → H. These paths first travel up toward providers, then across 0

(18)

There are also four invalid paths between the two ASs, ignoring looping

paths. For example, in the path F → C → A → D → B → E → H, the

section A→ D → B first travels across a provider-to-customer edge, and then

across a customer-to-provider, creating a valley. The other paths contain similar violations, causing D to not act as a provider for anyone, yet still transit traffic.

2.5 Quality of Service metrics

The following network metrics that are known to impact QoS [19] will be con-sidered in this thesis:

• Round-trip time (RTT) - The time it takes for a packet to travel from the sender to the receiver, and a response to return.

• Packet delay jitter - The variance of the time it takes for a packet to travel one way between sender and receiver. Also known as Packet Delay Variation, but will be referred to as simply jitter in this thesis.

• Packet loss - The percentage of packets that are lost in transit between sender and receiver.

These metrics have been collected alongside a subjective evaluation of call qual-ity made by the call participant after the call has concluded, which is known as Mean Opinion Score (MOS). The participant is asked to rate the call on a scale from 1 to 5, with 1 being ”bad”, 2 being ”poor”, 3 being ”fair”, 4 being ”good”, and 5 being ”excellent”. MOS is the arithmetic mean of these subjective ratings. When the setting for calls is controlled, the expected MOS can be calculated using a formula defined in ITU-T G.107 [18]. This formula depends on addi-tional metrics however, some of which, like codec-specific parameters, cannot be measured or calculated easily [5] and are not part of the data available.

2.6 Additivity of QoS parameters

An additive function is a function f that satisfies the following equality: f (x + y) = f (x) + f (y)

for any x, y in the domain of the function.

In order to apply linear regression to the parameters under observation, we must have them in an additive form (See Section 2.7.2).

RTT can be assumed additive as is: twice the geographical distance should result in twice the round trip time.

Jitter is a measurement of the variance of the one-way delay, and the perfor-mance of individual components in our network are independent of each other.

(19)

As a result, it too can be considered additive, as the following is true for all components X, Y :

V ar(X + Y ) = V ar(X) + V ar(Y ) + Cov(X, Y ) Cov(X, Y ) = 0

The total packet loss over a path is a product of individual packet loss factors, with a fraction of packets being dropped in individual components in a path. It can be calculated as such:

LP =

Y

i∈P

(1_{− L}i)

Where LP is the total packet loss over a path P , and Liis the packet loss over

component i. We can transform this into the following additive form: log(LP) =

X

i∈P

log(1_{− L}i)

As a note, the bandwidth of a path is determined by the lowest bandwidth of the components on the path:

BP = min

i∈P(Bi)

Where BP is the bandwidth available over a path P , and Li is the bandwidth

available over component i. There is no way to reasonably transform this in order to make it additive, which is why it is not considered in the proposed model.

2.7 Machine learning algorithms

Machine learning as a field is concerned with learning from data and being able to make predictions based on this, utilizing algorithms that can adapt to the data it is provided. The algorithms are often iterative, requiring more computations to reach a good result, but may require less assumptions be made about the data. Algorithms that have been considered suitable to try to estimate MOS are described in this section, as well as the LLSE algorithm used to compare the two network models.

2.7.1 Clustering

Clustering algorithms are algorithms that aim to group items into sets in such a way that the items in each set are more closely related to each other than to those of other sets. This definition is quite broad, and as such, there are many clustering algorithms to accommodate the many different kinds of clustering that may be desired [12]. Clustering algorithms can be beneficial when it is not obvious how the data points are related, and may as such be interesting for MOS calculations, where several parameters together map into [1, 5]. Two clustering algorithms that appeared to be good candidates for the kind of clustering desired are described here.

(20)

k-means clustering

k-means clustering is a centroid clustering algorithm [16]. Its goal is to partition all data points into sets, minimizing the squared euclidean distance from each member of the set to the mean of the set, the centroid. Initially, k centroids are randomly generated. The algorithm then alternates between two steps until convergence:

1. Assign each of the data points to the cluster to which the euclidean dis-tance to the centroid is the smallest.

2. Recalculate the position of the centroid for each cluster.

k-means clustering is considered a hard clustering algorithm, meaning each data point is assigned to a single cluster.

Gaussian mixture model clustering

Gaussian mixture model (GMM) clustering is a distribution-based clustering model which uses the expectation-maximization algorithm to train the model [16]. This algorithm will produce probabilities that a data point belongs to a certain distribution, having iteratively maximized the log-likelihood of these distribution predictions. Initially, k distributions are randomly generated. The algorithm then alternates between two steps until convergence:

1. Compute the probabilities of belonging to each cluster for every data point using current model parameters.

2. Based on the probability that a data point belongs to a cluster, recompute the mean and variances of the distributions.

GMM clustering is considered a soft clustering algorithm, meaning each data point may be assigned to several clusters. It can however be used as a hard clustering algorithm by assigning each data point to the cluster it has the highest probability of belonging to.

2.7.2 Supervised learning

Supervised learning algorithms are algorithms that seek to infer a function from a set of training data. The training data takes the form of input values X, and corresponding output values Y. The function produced should then be able to map unseen input values to the correct output value.

There are many supervised learning algorithms with different strengths and weaknesses. They can be divided into two different categories: regression, which deals with functions mapping to values, and classification, which deals with func-tions mapping to different categories. Since a comparison between the values of QoS parameters is sought, we want to map to values, so we need regression algorithms. Here we will look at two ensemble learning algorithms, as these

(21)

are well-established methods of for increasing accuracy of machine learning al-gorithms [9]. The Linear Least Squares Estimation algorithm is also described, but it is not considered appropriate for MOS estimation, as MOS is not a linear function of other QoS parameters [18].

Ensemble learning

In ensemble learning, an ensemble can be considered a ”committee” of learning algorithms. The individual algorithms in the ensemble weight their results and produce a common result that in many cases is better than that of the individual learning algorithms [16].

Bagging Bagging, or bootstrap aggregating, is an ensemble method that

aver-ages a number of bootstrapped models to reduce the variance in its predictions [16]. Bootstrapping refers to sampling from the data set with replacement, re-sulting in training sets for the model that have a number of duplicate elements. After training the individual models, each of the models’ outputs are aver-aged to obtain the bagged model’s final output.

Boosting Boosting is an ensemble method that trains itself iteratively on

the provided data set by focusing on data points that were poorly predicted [16]. Initially, all data points are weighted equally, and the underlying model is trained on the set. Each data point is then in each step weighted by the square error of the aggregated prediction of the models trained so far and the actual value. This causes the next iteration to focus more on data points that were poorly predicted previously.

Decision trees Decision trees are often used as the underlying model for

ensemble learning, as it is a fast algorithm [16]. The trees are built by analyzing all possible binary splits of the training data, and then selecting the one that gives the lowest mean square error. This is then done recursively for each node in the tree until a stopping criteria is reached, such as maximum tree depth, or the number of observations in the node being too small to continue. The model can then be queried for predictions easily by following the correct splits down the tree.

Linear least squares estimation (LLSE)

LLSE is a way to fit a linear model to data that isn’t fully explained by the model in question by minimizing the square of the residual errors.

Consider the case where we have a relationship of the form x1β1+ x2β2+ ... + xNβN = y

where xiand y can be measured, and we wish to determine βi, for i∈ [1, N]. To

(22)

using matrix form as

Xβ= y

We are then interested in finding the best set of coefficients ˆβthat minimizes

C X i=1 |yi− N X j=1 Xijβˆj|2

The solution to this minimization problem, given that the columns of X are linearly independent, is given by solving the equation

(XT_{X) ˆ}_β_{= X}T_y

The time complexity of solving this when utilizing parallel computing is given in [7] as O CN 2 P + N3 P + N 2_{log(P )} with P cores. As C > N , N3 P is dominated by CN2

P . In any practical

applica-tion, C

P > log(P ) as well, so N

2_{log(P ) is dominated by} CN2

P . Thus, the time

complexity of the algorithm can be considered

O CN

2

P

This is of interest as we seek to compare the models on their computation times in particular.

(23)

Chapter 3 System design

There are many ways the problem described in Section 1.1 could be approached, with their own advantages and disadvantages. This chapter will cover the design choices made in the implementation of the approach described in Section 1.2. A brief summary of the design chosen is included at the end of the chapter.

3.1 System overview

The two main components of the system are the network model, which is re-sponsible for estimating the network QoS metrics of the expected path for the traffic, and the MOS model, which is responsible for translating these metrics into a single measurement of QoS.

Network Model MOS Model

Path endpoints {start end [relay]}

Network metrics MOS estimate

Figure 3.1: Full system query.

Figure 3.1 illustrates the expected use case of the system. It will be queried with a list of at least 2 ASs: the start AS, the end AS, and an optional list of relay ASs. This set is evaluated by the network model, which calculate network metrics that are passed on to the MOS model for translation. The estimated MOS value is then returned as the response to our initial query.

If the optimal relay points for a call are known, then this system can provide enough information for a relay decision to be made with two queries: one with only the start and end ASs provided, and one with the relay ASs included.

If there are certain known thresholds on QoS metrics, it is plausible that a decision could be made without the presence of the MOS model; no such assumption is made here however. Should such information be available though, the two components are easily decoupled by design.

(24)

3.2 Network model

The network model is the component of the system that is of particular interest in this thesis. How can we attempt to accurately model the connections in such a way that useful information is obtained? What are the pros and cons of different approaches?

A straightforward way of designing the network model would be to directly assign the network metrics to the input set of ASs. This might be done through simply collecting data on previous calls for that connection and performing some analysis on it. Figure 3.2 illustrates how this would be designed.

Cost estimation

Call data collection

Call network metrics

Path endpoints

{start end}

Network metrics

Figure 3.2: A simple network model design alternative.

While such a direct approach would likely be quite successful given enough data, there are some concerns with it. For example, if little or no data exists for a particular path, estimating network metrics becomes hard or impossible. This model does not require any information about the network infrastructure, but consequently does not give any new information about it either.

Another approach is to utilize information of the network infrastructure and divide paths into network components, assigning network metric costs to each component. Figure 3.3 shows what a query would look like in that system.

Mapping Cost estimation

Call data collection

Path endpoints {start end [relay]}

Path components Network metrics

Call network metrics

Figure 3.3: A path mapping network model design alternative.

There is additional complexity in this design, as it requires the addition of the path mapping component, but in return it has the potential to return some information about the behavior of individual components of the infrastructure. It could also allow for the prediction of call metrics of previously unseen AS

(25)

pairs. To illustrate, consider the connections in Figure 3.4. Assume that we have previously seen a number of calls from AS A to AS C, and from AS B to AS D. As each node and edge in this model is considered a component with associated costs, we could now infer costs for a call between B and C by adding the component costs together.

A

B

E

C

D

Figure 3.4: Four ASs interconnected through a common AS.

While both of these approaches would be interesting to explore, the latter has been chosen for its greater perceived potential. The following subsections will go into further detail about the path mapping and cost estimation system components and their design.

3.2.1 Path mapping

With the chosen approach, mapping paths to the correct set of components is crucial. If this is not done correctly, large components might be erroneously assessed and applied to other paths, leading to unreliable model output. Some different options have been explored to approach the mapping problem, de-scribed below.

Least hops model

The first approach to modeling the Internet infrastructure is based on the CAIDA dataset of AS relationships [2]. It uses the known relationships be-tween ASs to infer paths bebe-tween any given pair. While it is known that the shortest path using direct connections between ASs is often not the path used in practice [14], with this model we seek to investigate whether the relationships presented in Section 2.4 can be applied to produce valid path results.

The first thing to do is construct a graph representation of the network. The CAIDA dataset provides two files, both of which we use to construct this graph.

(26)

The first contains a list of all ASs reachable downstream from every AS. More importantly, it lists every AS in the dataset, so we can construct the set of all nodes from this file. The other file contains a list of every relationship that has been inferred, and its type: peer-to-peer or provider-to-customer.

43 79

55 32 15

1 4 8

Customer-to-Provider Peer-to-Peer

Figure 3.5: An AS graph with relationships and AS numbers.

0

1

2

3

4

5

6

7

1

4

8

15

32

43

55

79 6,C2P

6,C2P

3,C2P 4,C2P

2,P2C 4,P2P 7,C2P

2,P2C 3,P2P 5,C2P 6,P2P 7,C2P

4,P2C 6,P2C

0,P2C 1,P2C 4,P2P 5,C2P

3,P2C 4,P2C

Figure 3.6: An AS number list and an adjacency list corresponding to the graph in Figure 3.5. Indices are on the left.

The number of ASs and relationships is small, so an adjacency list can be used to represent the graph in-memory. To illustrate, consider the graph in Figure 3.5. It is the same graph as in Figure 2.1, but with ASs numbered. The way this graph would be represented is shown in Figure 3.6. We keep a sorted list of the AS numbers present in the graph, and have the indices of that list correspond to the indices of the adjacency list. Each entry in the adjacency list is then a list of the relationships that AS has, in the form of the AS index and the relationship type.

(27)

Given this information, we can do a breadth first search on the graph to find viable paths, subject to the constraints described in Section 2.4. A Java implementation of this can be found in Appendix A.1. This will return two different sets of distances and parents for each of the nodes in the graph: one where future path options are constrained by the path taken thus far, and one where they are not. This is of interest as the shortest path to a particular node might not be part of the shortest path to its neighbor. Therefore, in order to properly reconstruct paths, we must keep track of the best paths for both these cases.

Having computed this from a source node, we can then recursively recon-struct the set of paths between the source and any other node in the graph as shown in Algorithm 1. Calling the function initially, the parameter d is set to the shortest of the two distances that have been found for the node. On the recursive calls however, it is simply decremented by 1 for each call, preventing an invalid path from being reconstructed by picking the locally optimal distance for the current node.

Algorithm 1Path Reconstruction

Require: An end node e. The distance d to e on this path. The distance Df

and the direct parents Af for each node on the path from a start node

with-out traversing a P2P or P2C link. The distance Dc and the direct parents

Ac for each node on the path from a start node allowing for traversing P2P

or P2C links. A function rel(u, v) returning the edge connecting nodes u and v.

Ensure: The set of shortest paths to e, P .

1: _{procedure ReconstructPaths(e, d, D}f, Af, Dc, Ac) 2: P ← {∅} 3: Let D, A be Df, Af or Dc, Ac so that D = d 4: forv∈ A do 5: P0← ReconstructPaths(v, d − 1, Df, Af, Dc, Ac) 6: forq_{∈ P}0 _do

7: q← q ∪ {e, rel(v, e)}

8: P ← P ∪ {q}

9: if P =_{{∅} then}

10: P ← {{e}}

11: returnP

The main advantage of this model is that the topology information is readily available and requires little memory to hold the graph. However, computation times can be large unless the input data is sorted appropriately in order to min-imize the number of times that the constrained breadth first search algorithm must run. For an initial run it is possible to sort large amounts of call data on source AS, but for random queries no such sorting can be relied on.

Unfortunately, the output of this model will in the general case find multiple paths of the same length despite the added constraints imposed. With the

(28)

topology data used, there is no way to identify all policies applied, so paths that other traffic takes can appear as a valid path for the pair we are interested in. It is also true that it may be the case that none of the paths returned is the correct one, as policies could further inflate path length [14]. With such unpredictable results, this model is clearly unsuitable.

Known path model

An alternative approach is to collect data on actual paths taken by traffic and use this to map. This is equivalent to keeping routing tables for all ASs, which can then be used to construct the paths. As stated in Section 1.4, complete traceroute data that could give a near complete view of the relevant Internet routes is not available to us. Thus, for this thesis, the Route Views data set described in Section 2.3 is used. This limits the amount of paths that are known to the ones visible through the Route Views routers, but should provide enough known paths to evaluate the model.

To construct the routing tables, the AS Path field of the Route Views BGP routing table entries is used. As it gives the complete path from the start AS to the end AS, the path can simply be stepped through to create the next hop entry for each AS in the path. Since BGP propagates paths, it is certain that a path that is a tail of this AS path will be a correct path as well. Had there been a different preferred path for such a tail, then that would also have been reflected in the full path. A map data structure is used to store the (start, end) AS pair and its next hop AS. Reconstructing the path is then a trivial task, as seen in Algorithm 2.

Algorithm 2Known Path Construction

Require: A start node s. An end node e. A function NextHop(s, e) that

returns the next hop from s on the path to e. A function rel(u, v) returning the edge connecting nodes u and v.

Ensure: The components on the path P from s to e.

1: _{procedure ConstructPath(s, e)} 2: P ← {∅} 3: while s6= e do 4: P ← P ∪ {s, rel(s, e)} 5: s← NextHop(s, e) 6: P ← P ∪ {e} 7: returnP

The most important advantage of this model is that the paths are accurate. While paths may change over time, and changes should be monitored, the model provides one path as a response. The case of load balancing has not been considered here, though allowing for the construction of multiple paths would require only small modifications to the model. More interesting is how the multiple paths would be handled by the cost inference, as well as how a decision

(29)

of path selection would be made if the paths differ in quality.

This model is also faster than the least hops model, as it does not need to search a graph every time it is queried, but at the cost of having all path information stored. This may require quite a bit of memory as the available routing information grows.

The big disadvantage of course is the data collection, requiring a reliable way of obtaining path information. Given that obtaining such data is realis-tic though, this model is the one that has been chosen as the path mapping component.

3.2.2 Path component cost estimation

Going back to Figure 3.3, we now turn to the second part of the network model. Given the path components from the mapping, network metrics are to be esti-mated.

Training the cost estimation component can be visualized as in Figure 3.7. The first step is again to collect call data to be used for inferring component costs. Before we can continue with that however, the AS pairs must be provided to the mapping system component in order for them to be translated into path components, which are returned to the cost inference step. Now the the call data can be analyzed and finally the inferred component costs can be provided for the cost estimation.

Mapping Cost estimation

Cost inference Call data collection

3

2 4

1

Figure 3.7: Training the cost estimation component.

Cost inference

LLSE has been chosen as the method to analyze the call data. This is due to the na¨ıve model having used this method as well, and we wish to be able to make a fair comparison between the models. We have assumed that our network metrics are additive, and that components are independent of one another, so this is a valid choice. We also know that the algorithm is scalable when our data set grows [7].

(30)

The problem to solve is, as stated in Section 2.7.2

Xβ= Y

where X is a matrix with rows corresponding to data points, and columns to path components. The matrix simply consists of 1’s and 0’s, signifying whether a component is part of a row’s path or not. Y is a matrix with rows corresponding to those of X, and columns to the network metrics, transformed according to Section 2.6.

Cost estimation

Having calculated β, cost estimation is trivial. With a query x, equivalent to a row in X previously, the estimate ˆyis obtained by

xβ= ˆy

Of course, having transformed the metrics to their additive representations, they must be transformed back before they are returned from this step.

3.3 Mean opinion score estimator

The second main component of the system is the MOS estimator, which helps with interpreting the values from the network model. As no models that describe MOS using only the parameters available in our data have been found, we will apply statistical analysis to attempt to produce a useful estimate.

Several methods are proposed and tested to identify the model with the best results. The tests and results are detailed in Chapter 4, while the resulting design decisions are outlined here.

Learner ensemble MOS estimation Call data collection

Network metric training set RTT, jitter,

packet loss

MOS estimate

Figure 3.8: A MOS estimation model with ensemble learning applied directly to call data.

The first method considered is to apply supervised learning algorithms di-rectly to the call data. In doing so, no assumptions about the relationship between network metrics and MOS are made.

The supervised learning algorithms considered are the ones from Section 2.7, the ensemble learning algorithms. In Figure 3.8, a boosting or bagging ensemble

(31)

is trained with RTT, jitter, and packet loss as input paramters, and reported MOS values as output values. The ensemble is then directly queried to obtain an estimated MOS value.

Clustering MOS estimation Call data collection

Network metric training set RTT, jitter,

packet loss

MOS estimate

Figure 3.9: A MOS estimation model with clustering applied to call data. The second method considered is to cluster data points by network metric values, effectively discretizing the space of MOS values. The motivation behind clustering is that the MOS values in the call data are observations of a random variable, the mean of which is what is of interest to us. The clustering algorithms described in Section 2.7 require the number of clusters to be specified. However, there is no intuitive way of knowing how many clusters could be appropriate for our purposes. A starting point for the amount of clusters is therefore chosen

through a rule of thumb [22] to be k =pn/2, where n is the data set size. The

rule of thumb was chosen over other ways of selecting k mainly for its simplicity, as computation times were already growing quite large.

Learner ensemble MOS estimation

Clustering MOS estimation Call data collection

Network metric training set RTT, jitter, packet loss Network metric training set with MOS estimate

MOS estimate

Figure 3.10: A MOS estimation model with supervised learning applied to call data with MOS estimates from clustering.

Figure 3.9 shows how this method functions similar to the previous one, though the training data is here instead clustered with either k-means or

(32)

Gaus-sian Mixture clustering. Queries are then assigned to the most appropriate cluster in accordance to the algorithm, and the mean MOS value of the clus-ter’s data points is returned as an estimate.

The final method considered is where clustering is first used, and supervised learning is then applied to the resulting data. Here we wish to see if the clustered results can be improved upon by increasing the set of possible results. Figure 3.10 illustrates how this would look.

3.3.1 Evaluation of alternatives

All of these approaches were tested using the full evaluation data set that will be described in Section 4.2. When reviewing the results, the first method of applying supervised learning to the call data directly is found to be the best performing. Of the two algorithms tried for this method, the boosting algorithm produced the best result while also being faster and requiring less resources. Consequently, the MOS estimator component of the system is chosen to be that of Figure 3.8 with a boosting ensemble. The mean square errors of all approaches are shown in Figure 3.11. The test results will be shown in more detail in Section 4.4. GMM GMM+BaggingGMM+Bo osting k-means k-means+Bagging_k-means+Bo osting Bagging_Boosting 1.5 1.7 1.9 2.1 2.3 2.5 Mean Square Err or

Figure 3.11: Mean square errors of all MOS estimation approaches.

3.4 Design summary

Figure 3.12 shows an overview of the system’s design after all choices have been considered. Three main components make up the system: path mapping, path component cost estimation, and MOS estimation. The network model, which is

(33)

AS Path

{start end [relay]}

Known path mapping

LLSE cost estimation

Boosting ensemble

MOS estimation

MOS estimate

Routing data

Call MOS data

Call network data

Path components

Network metric estimate

Figure 3.12: Final system design with three main components and their inputs and outputs.

of the greatest interest, consists of the first two components, which work together to produce the network metric estimate. The cost estimation component is tightly coupled with path mapping, providing estimates based on the particular mappings from the previous step. The MOS estimation component however could be removed from the system easily, should a decision be possible from network metrics alone.

To train the model, two sets of data are required: the AS-level paths taken by calls, and call data with network metrics and MOS.

(34)

Chapter 4 Evaluation

In Chapter 1, we presented a number of questions to be answered in this thesis. Is the proposed network model as reliable as the current na¨ıve model? Does performance in terms of computation time improve in practice when comparing the proposed model to the current na¨ıve model? Can the model directly predict user experience by translating network metrics into MOS? Having decided on a design for the system, this chapter will answer these questions.

The chapter will describe how the system and its components have been tested and evaluated, and present the outcome. First, the objectives of the tests and what the tests will cover is defined. Then, the choice of data sets is presented. Next, the two main components are tested separately. Finally, the system as a whole is tested.

4.1 Test objectives and parameters

The goal of testing the model is to ascertain how well the model cost predictions match real data. This will be done by measuring the Mean Square Error (MSE) and Root Mean Square Error (RMSE) of the model outputs compared to the true observed values in the test sets.

For the network model, what will be tested is the estimated network metric values for AS pairs, and the computation times required. In addition to the MSE, plotting the estimates against observations is of interest to see what the nature of the errors are, and how the estimator behaves. What is most interest-ing is how the path mapped model compares to the na¨ıve model, particularly in computation times, as this is where improvements can be expected.

For the MOS estimator, it is naturally the estimated MOS value that is compared to the observed MOS values for the test data, with network metrics as input values.

For the complete system it is the error when comparing the MOS estimate for an AS pair and the mean observed MOS for that pair that is to be measured. Similar to the network model, the spread of MOS values is also valuable to look

(35)

at, as it shows whether the user experience is somewhat consistent for a pair as well. The metrics used to evaluate the models have been chosen based on prior experience using them for statistical analysis.

4.2 Data set selection

The measured metrics are not the only factors that determine a user’s experience of a call, as noted in Section 2.5. There are factors on the user’s side that affect the perceived experience substantially [18].

In order to reduce the amount of factors that affect the data, the platforms were constrained. Only calls between two Windows desktop clients are used in the test data. Unfortunately, this does not rule out differences in client versions, which can affect quality. It also does not differentiate between wired and wireless Internet connections, which means that things like wireless access point location planning may affect MOS values.

The data is based on calls performed throughout February 2015, during which time a fraction of users are were randomly prompted to rate the call on a scale from 1 to 5. If a user opted to rate the call, then the call data was logged. To match the time period, the Route Views data used is from a single point in time at the middle of the time period. Some routes may have changed prior to or after this midpoint, introducing some possibility of error.

Source AS Remote AS Jitter(ms) RTT(ms) Send loss (%) Receive loss (%) MOS

13456 7442 20 152 0 10 4

Table 4.1: An example of the call data entries used.

Table 4.1 shows an example of the call data used as the basis for training and evaluating the model. ASs are identified by their AS number, and jitter and RTT are measured in milliseconds. Packet loss is measured separately for packets sent and received.

This data was further filtered based on AS pairs in order to be able to perform the tests. First, only AS pairs whose paths were present in the Route Views data could be used, as path mapping could not be performed otherwise. Second, only AS pairs that were present at least 100 times in the call data were included when testing the model errors. This is to have some confidence in the data for the pairs, as well as being able to cross validate the model without having AS pairs in the test set that were not present in the training set.

The final data set used for the evaluation of accuracy consisted of 2,549,900 calls from 2,178 unique AS pairs. For performance evaluation, the data set contained 2,934,919 calls from 65,579 unique AS pairs.

(36)

4.3 Network model

In testing the network model, 10-fold cross validation [16] has been performed with the data set, training both a na¨ıve model and a path mapped model. The evaluation was performed in MATLAB, and the implementation is available in Appendix A.2. Table 4.2 shows the resulting errors for both models. There is no notable impact on the average error from the path mapping, with the errors being very similar. At a glance, the errors appear quite large however.

Model Error Jitter (ms) RTT (ms) Send loss % Receive loss %

Path Mapped MSE 2665 2900 0.2 0.14

RMSE 163.3 170.3 4.5 3.8

Na¨ıve MSE 2666 2913 0.2 0.14

RMSE 163.3 170.7 4.5 3.8

Table 4.2: Mean square errors and root mean square errors in 10-fold cross validation of the network models.

Path Mapped Na¨ıve

∈CI ∈CI/ ∈CI ∈CI/

RTT 1750 428 1749 429

Jitter 432 1746 435 1743

Send loss 1096 1082 1105 1073

Receive loss 1380 798 1385 793

Table 4.3: The number of network parameter estimates that fall inside of and outside of the 95 % confidence intervals for the mean of the observed values for each AS pair.

Having done cross validation, the plots in Figures 4.1-4.8 show the estimates generated by the models plotted against the means of the data points for every AS pair in the data set.

Table 4.3 shows the number of estimates that fall within and outside of the 95 % confidence interval (CI) for the mean of the observed values, assuming that the observations are normally distributed around the true value. Quite a large number of estimates fall outside of the confidence interval, indicating that the confidence of the model’s estimates is not that great. The data shown is for one of the cross validation folds, though the others behave similarly.

Figures 4.1 and 4.2 show the results of the RTT estimation. The two models produce very similar results, and seem to overall follow the diagonal line that would mean a perfect prediction of the means. It would appear that the models underestimate large RTT values, though the number of data points above 600 ms is not large enough to be conclusive. In the more populated ranges, the spread spread is more uniform, though somewhat large, as seen in Table 4.2. Depending on the required accuracy, this may be acceptable, especially as the

(37)

0 300 600 900 1,200 1,500 1,800 0 300 600 900 1,200 1,500 1,800 Estimated mean (ms) Observ ed m ean (ms)

Figure 4.1: Path mapping

net-work model RTT estimates plotted against the means of test point val-ues. 0 300 600 900 1,200 1,500 1,800 0 300 600 900 1,200 1,500 1,800 Estimated mean (ms) Observ ed m ean (ms)

Figure 4.2: Na¨ıve network model RTT estimates plotted against the means of test point values.

most interesting data points are the ones with high RTT. 20 % of the estimates fall outside of the 95 % confidence interval, which is the best result of the estimated metrics, but still a large amount. The expected value from a perfect fit would be 95 % accuracy.

0 200 400 600 800 1,000 0 200 400 600 800 1,000 Estimated mean (ms) Observ ed mean (ms)

Figure 4.3: Path mapping

net-work model jitter estimates plotted against the means of test point val-ues. 0 200 400 600 800 1,000 0 200 400 600 800 1,000 Estimated mean (ms) Observ ed mean (ms)

Figure 4.4: Na¨ıve network model jitter estimates plotted against the means of test point values.

Figures 4.3 and 4.4 show the results of the jitter estimation. Again, the two models have generated very similar results, but the fit is much worse than that of RTT. The regression models have a tendency to overestimate this parameter, and has comparatively few error cases where it has underestimated it. Consid-ering that the majority of the observed means are below 200, the RMSE of 163

(38)

ms makes this parameter’s estimation very untrustworthy. In fact, 80 % of the estimated values fall outside of the confidence intervals for the observed means.

0 5 10 15 20 25 0 5 10 15 20 25 Estimated mean (%) Observ ed mean (%)

Figure 4.5: Path mapping network model sending packet loss estimates plotted against the means of test point values. 0 5 10 15 20 25 0 5 10 15 20 25 Estimated mean (%) Observ ed mean (%)

Figure 4.6: Na¨ıve network model sending packet loss estimates plot-ted against the means of test point values. 0 5 10 15 20 25 0 5 10 15 20 25 Estimated mean (%) Observ ed mean (%)

Figure 4.7: Path mapping network model receiving packet loss esti-mates plotted against the means of test point values.

0 5 10 15 20 25 0 5 10 15 20 25 Estimated mean (%) Observ ed mean (%)

Figure 4.8: Na¨ıve network model re-ceiving packet loss estimates plotted against the means of test point val-ues.

Figures 4.5 and 4.6 show the results of the send packet loss estimation. Yet again, the models generate very similar results, though quite concentrated close to zero. The estimator has a tendency to overestimate the packet loss, not completely unlike the jitter estimator. This is particularly noticable when the mean is zero, as there are plenty of estimates along the X-axis. Other than that slight bias though, the estimator behaves quite randomly, with values all over

(39)

the place. Half of the estimates fall within the 95 % confidence interval, which again is poor.

Figures 4.7 and 4.8 show the results of the receiving packet loss estimation. Unsurprisingly, the models match closely also for the last parameter. Like the other packet loss parameter, the values are concentrated close to zero. The estimator seems to work slightly better for this data, though the plot is still very scattered. The RMSE is down slightly as well, and 63 % of estimated values fall within the confidence intervals here. The performance must still be considered poor however.

Overall, it can be said that the path mapping step has not noticeably influ-enced the linear regression step of the network model. The regression results appear quite poor, however it can be noted that for the large values in all of the cases the models produce large as well. If these cases with large, i.e. poor, values are compared to significantly lower estimates, then there can be some value derived from these estimates. This is due to the fact that significance of the differences between estimates are likely outweighing the probability of error in the individual estimates.

0 2 4 6 ·104 0 2 4 6 ·104

Distinct pair count

T

otal

AS

coun

t

Figure 4.9: Total number of ASs present as a function of included distinct AS pairs.

Remaining then is to compare computation times between the two models, which was what we hoped to improve. Figure 4.9 shows how the total number of ASs grows as a function of the number of distinct AS pairs. This is of interest to us as it reflects difference in number of variables to solve for in the na¨ıve and the proposed model. As expected, the growth rate decreases fairly rapidly, having included 40,000 of the around 50,000 ASs already at slightly more than 60,000 distinct pairs. At the beginning though, the total number of ASs is naturally higher, as many paths consist of more than two ASs. At 5000 distinct pairs, the two are roughly equal.

Figure 4.10 shows the computation times of running the na¨ıve model and the path mapped model as a function of distinct AS pairs. It should be noted that the number of equations in the system was not held constant as the pair count

(40)

0 2 4 6 ·104 0 5 10 15

Distinct pair count

Computation

time

(s)

Naive Path mapping

Figure 4.10: Time to solve the LLSE system as a function of distinct AS pairs included.

increased, but it was the same between the models at each measured point. The na¨ıve model is in the beginning slightly faster, though by very small amounts. The path mapped model starts to perform better after a while though, at about the same point as AS count growth visibly starts to decline in Figure 4.9. The difference appears to be growing at a consistent rate, though the slope of the graphs could be affected by the varying number of equations.

This confirms that the path mapped model performs better than the na¨ıve model in the tests as the number of distinct AS pairs grows.

4.4 Mean opinion score estimator

To test the MOS estimator, we again use 10-fold cross validation. However, the data set was required to be reduced in order to allow for computations with limited memory space and time. The MOS data set samples 320,000 data points from the previously used data set to work with.

Beginning with the clustering algorithms, the optimal number of clusters was found through trial and error. Using the rule of thumb mentioned in Section 3.3,

we use k =p320, 000/2 = 400 as a starting point. Table 4.4 shows the results

of these initial tests. The results of clustering, as stated in Section 3.3, are the mean values of the resulting clusters’ training data points, and it is against this mean value that the tested values are compared.

The Gaussian mixture algorithm does not seem affected by varying the num-ber of clusters, whereas k-means loses accuracy as the numnum-ber of clusters in-creases. Since the clustering algorithms are computationally intense, greater granularity in the cluster numbers is not feasible to test. Therefore, the best result for each algorithm in this table is selected as the number of clusters to be used, which was 300 in both cases.

For these values, the results of the clustering are used to train the boosting and bagging learner ensembles, as depicted in Figure 3.10. We will see that the

Estimating Internet-scale Quality of Service Parameters for VoIP

Institutionen för datavetenskap

Department of Computer and Information Science

Final thesis

Estimating Internet-scale Quality of Service

Parameters for VoIP

by

Markus Niemelä

LIU-IDA/LITH-EX-A--16/013—SE

2016-03-24

Final thesis

Estimating Internet-scale Quality of Service

Parameters for VoIP

by

Markus Niemelä

LIU-IDA/LITH-EX-A--16/013—SE

2016-03-24

Supervisor: Mikael Asplund

Examiner: Simin Nadjm-Tehrani

Contents

List of Abbreviations

Chapter 1

Introduction

1.1

Problem description

1.2

Approach

1.2.1

Network modeling

1.2.2

Cost-based QoS evaluation

1.2.3

Evaluation of model

1.3

Related work

1.4

Limitations

Chapter 2

Background

2.1

Autonomous systems

2.2

Routing

2.3

Route Views

2.4

AS relationship inference

2.5

Quality of Service metrics

2.6

Additivity of QoS parameters

2.7

Machine learning algorithms

2.7.1

Clustering

2.7.2

Supervised learning

Chapter 3

System design

3.1

System overview

3.2

Network model

Cost estimation

Call data collection

Call network metrics

Path endpoints

{start end}

Network metrics

A

B

E

C

D

3.2.1

Path mapping

0

1

2

3