WebRTC Quality Control in Contextual Communication Systems

WEI WANG

KTH ROYAL INSTITUTE OF TECHNOLOGY

Communication Systems
Date: July 26, 2018
Examiner: Prof. Gerald Q. Maguire Jr.


Abstract

Audio and video communication is a universal task with a long history of technologies. Recent examples of these technologies include Skype video calling, Apple's FaceTime, and Google Hangouts. Today, these services offer everyday users the ability to hold an interactive conference with both audio and video streams. However, many of these solutions depend on extra plugins or applications installed on the user's personal computer or mobile device. Some are also subject to licensing, which introduces a significant barrier for developers and deters new companies from entering this area. The aim of Web Real-Time Communications (WebRTC) is to provide direct access to multimedia streams in the browser, thus making it possible to create rich media applications using web technology without the need for plugins or for developers to pay technology license fees.

Ericsson develops solutions for communication targeting professional and business users. With the increasing possibilities to gather data (via cloud-based applications) about the quality experienced by users in their video conferences, new demands are placed on the infrastructure to handle this data. Additionally, there is the question of how these statistics should be utilized to automatically control the quality of service (QoS) in WebRTC communication systems.

The thesis project deployed a WebRTC quality control service with methods of data processing and modeling to assess the perceived video quality of an ongoing session and, further, to produce appropriate actions to remedy poor quality. Lastly, after evaluation on the Ericsson contextual test platform, the project verified that two of the stats-parameters used for assessing QoS (network delay and packet loss percentage) negatively affect the perceived video quality, though to different degrees. Moreover, the available bandwidth turned out to be an important factor, which should be added as an additional stats-parameter to improve the performance of a WebRTC quality control service.

Keywords


Sammanfattning

(Swedish abstract, translated:) Audio and video communication is a universal task with a long history of technologies. Examples of these technologies include Skype video calling, Apple's FaceTime, and Google Hangouts. Today, these services offer everyday users the ability to hold an interactive conference with both audio and video streams. However, many of these solutions depend on extra plugins or applications installed on the user's personal computer or mobile device. Some are also subject to licensing, which introduces a significant barrier for developers and deters new companies from entering this area. The aim of Web Real-Time Communications (WebRTC) is to provide direct access to multimedia streams in the browser, making it possible to create rich media applications using web technology without plugins or developers needing to pay technology license fees.

Ericsson develops communication solutions targeting professional and business users. With the increasing possibilities to gather data (via cloud-based applications) about the quality users experience in their video conferences, new demands are placed on the infrastructure to handle this data. Additionally, there is the question of how these statistics should be used to automatically control the quality of service (QoS) in WebRTC communication systems.

The thesis project deployed a WebRTC quality control service with methods of data processing and modeling to assess the perceived video quality of an ongoing session and, further, to produce appropriate actions to remedy poor quality. After evaluation on the Ericsson contextual test platform, the project verified that two of the stats-parameters used for assessing QoS (network delay and packet loss percentage) negatively affect the perceived video quality, though to different degrees. Moreover, the available bandwidth turned out to be an important factor that should be added as an additional stats-parameter to improve the performance of a WebRTC quality control service.

Nyckelord


Acknowledgements

The degree project was performed at Ericsson in Kista, Sweden. I would like to thank my supervisor, Stefan Hellkvist, who showed me how to start the degree project and provided continual support. I also thank my colleague, Ken Dai, who offered me the use of the test platform as well as help in various other respects. Moreover, I appreciate my manager, Peter Hammarlund, who helped with the application for the thesis work and provided follow-on support.

Importantly, I am honored to have participated in the Ericsson EC3 team, and I am grateful for all of the help from the group members: Patrik Oldsberg, Per Boussard, Morteza Araby, etc. Finally, I would like to give particular thanks to my examiner, Prof. Gerald Q. Maguire Jr., who gave me suggestions and feedback on this thesis.


Contents

Abstract
Sammanfattning
Acknowledgements
Contents
List of Figures
List of Tables
List of abbreviations

1 Introduction
1.1 Background
1.2 Problem
1.3 Purpose
1.4 Goals
1.5 Research Methodology
1.6 Delimitations
1.7 Structure of the thesis


2 Background
2.1 WebRTC architecture
2.1.1 WebRTC Voice and Video Engines
2.1.2 Audio CODECs
2.1.3 Video CODEC
2.1.4 Image enhancements
2.2 Real-Time Transport for a Session
2.2.1 User Datagram Protocol (UDP)
2.2.2 Session Traversal Utilities for NAT (STUN)
2.2.3 Traversal Using Relays around NAT (TURN)
2.2.4 Interactive Connectivity Establishment (ICE)
2.2.5 Session Description Protocol (SDP)
2.2.6 Datagram Transport Layer Security (DTLS)
2.2.7 Stream Control Transmission Protocol (SCTP)
2.2.8 Secure Real-Time Transport Protocol (SRTP)
2.2.9 Real-time Transport Protocol (RTP)
2.2.10 Summary
2.3 WebRTC APIs
2.3.1 RTCPeerConnection
2.3.2 getUserMedia API
2.3.3 WebRTC's Statistics API
2.4 Ericsson's contextual WebRTC framework
2.5 Related Work
2.6 Summary


3 Implementation and development
3.1 Data Processing
3.1.1 Data Analysis
3.2 Principal Components Analysis
3.3 Modeling
3.3.1 Clustering
3.3.2 Sampling Methodologies
3.3.3 Classification using Random Forests Algorithm
3.4 Selection of Remedial Actions
3.5 Summary
4 Evaluation of WebRTC test platform
4.1 Framework of Evaluation
4.2 Interaction Theory
4.3 Software design
4.4 Network Simulation Environment
4.5 Results and Analysis
4.6 Summary


List of Figures

2.1 WebRTC overall architecture for the website (Adapted from figure in [6])
2.2 Voice and Video Engine
2.3 WebRTC network protocol stack
2.4 STUN and TURN
2.5 Calling Sequences: Set up a call
2.6 Calling Sequences: Receive a Call
2.7 WebRTC framework for Ericsson WebRTC services
3.1 Statistical RTCP sender report for a video stream from 9 March 2018
3.2 CDF of googRtt
3.3 CDF and histogram of googAdaptationChanges


3.11 Random Forest Simplified [59]
4.1 Test Platform Framework
4.2 Web interface layout
4.3 Net Simulator Page
4.4 A test showing the video quality QoS grade and recommended remedial action
4.5 The relationship of the video quality QoS grade and packet loss percentage setting
4.6 The relationship of the video quality QoS grade and delay setting
4.7 The relationship of the video quality QoS grade and bandwidth setting
A.1 Statistical RTCP report for received audio
A.2 Statistical SSRC report for sent audio
A.3 Statistical SSRC report for received video
A.4 Visualizing statistical RTCP report for received audio
A.5 Visualizing statistical RTCP report for sent audio


List of Tables

2.1 Constraints specification
3.1 Parameter specifications of RTCP video-sender report (the parameters prefixed by "goog" signify Google)
3.2 Proportion of the Variance Matrix
3.3 Rotation Matrix
3.4 Data quantity of clusters
3.5 Class division on each parameter
3.6 QoS grade assumption of each cluster
3.7 Remedies to apply in different situations


List of abbreviations

app  application
CODEC  Coder/Decoder
EM  Expectation Maximization
FIR  Full Intra Request
GMM  Gaussian Mixture Model
IETF  Internet Engineering Task Force
NACK  Negative ACKnowledgement
P2P  Peer-to-peer
PLI  Picture Loss Indication
POC  Proof-Of-Concept
QoE  Quality of Experience
QoS  Quality of Service
RTC  Real-Time Communication
RTMP  Real Time Messaging Protocol
SDP  Session Description Protocol
SMOTE  Synthetic Minority Over-sampling Technique
SSIM  Structural SIMilarity Index
VoIP  Voice over IP
W3C  World Wide Web Consortium
WebRTC  Web Real-time Communication


Introduction

This chapter gives a general introduction to the Web Real-Time Communications (WebRTC) area. Specifically, it describes the problems that this thesis addresses and their context, states the goals of the degree project, and outlines the structure of the thesis.

1.1 Background

WebRTC is a standardized technology that provides an easy way to access a computing device's (PCs, mobile platforms, IoT devices, etc.) media equipment, such as a web camera, computer screen, or microphone; it also provides ways to transfer the media streams acquired from these devices along with any other data (over dedicated data channels). To put this in perspective, WebRTC [1] is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. The WebRTC components have been optimized to best serve this purpose. Chapter 2 presents detailed information about WebRTC.

In times of ever-growing bandwidth needs of Internet users and applications, and increasingly tight requirements on the network resources provided by network infrastructure providers, the desire to optimize the quality of service (QoS) of applications is becoming more and more important. Similar to other multimedia applications or services, Ericsson WebRTC services focus on enhancing a user's quality of experience (QoE) for a service to the best extent possible. Additional background about contextual communication is given in Section 2.4.

1.2 Problem

During a call (or a conference) there is information available in the browser, called WebRTC stats, regarding what connections exist to other peers, what media coder/decoders (CODECs) are involved, what bandwidth is consumed, and various network characteristics such as round-trip times and packet loss (for examples of such data, see callstats.io [2], a service offering data collection of WebRTC stats). There are many interesting questions when it comes to what specific data to analyze and how to analyze this data. This degree project seeks to determine which data features have close relationships with the quality perceived by conference participants, which in turn leads to the question: what remedies can be taken to improve the current QoS?

1.3 Purpose

The purposes of this thesis project include researching current real-time video communication systems, deploying algorithms for assessing an ongoing RTC application, and applying automation to increase application-based communication quality. This degree project aims to develop solutions for communication targeting Ericsson's professional/business users. The solutions lie in realizing WebRTC quality control in contextual systems, with a focus on being aware of the current network environment and taking appropriate action to remedy deterioration in QoS.

1.4 Goals


• Understand what WebRTC stats-parameters are important to gather and how they affect the user’s perceived QoE in the conference.

• Implement a proof-of-concept (POC) that analyzes the data in question and tries to predict the perceived QoE – either as a continuous quality scale or simply as an "OK" or "not OK" output.

• Given the quality assessment, describe what remedies one would take to improve the situation for users that are currently in a call. For example, would it be wise to decrease the video frame rate, completely replace the video with still pictures, or drop the video completely if one would like to, at all costs, try to maintain acceptable audio quality?

Expected deliverables include:

• A report describing what WebRTC data is most important to gather as input to an analytic engine when it comes to understanding the perceived call/conference quality for individual users.

• A working POC of an analytic engine that processes the collected statistics and predicts what call quality the users are likely to experience during the duration of the call.

• A description of what actions one could take to improve the situation for the currently active user. These remedies would take one or more actions to improve the perceived quality and, if feasible, integrate this "action decision" into the POC. The data from the analytic engine could then be fed back to the clients in order for them to act.

1.5 Research Methodology

Data collection and analysis were the critical research methodologies in the early stages of this project. I chose Elasticsearch [4] as the basic tool. Elasticsearch is a highly scalable open-source full-text search and analytics engine, which powers applications that have complex search features and requirements. Compared with other data collection and analysis platforms, Elasticsearch allowed me to store, search, and analyze large volumes of data quickly and in near real time, since the data are distributed across clusters of multiple nodes and indexed. Using a RESTful API, I could filter out the stats-parameters that are important to gather by quickly composing and performing different searches (i.e., queries).

Data visualization is provided by Kibana [5], a GUI that shows the data search results. Kibana enables visual exploration and real-time analysis of the data in Elasticsearch.
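For a flavour of how such queries look, the sketch below builds an Elasticsearch query body that filters stored reports on one stats-parameter. The index layout and field names (e.g. ssrc_report.googRtt) are illustrative assumptions, not the actual schema used in this project:

```javascript
// Sketch: build an Elasticsearch Query DSL body selecting video-sender
// reports with a high round-trip time. The field names are illustrative
// assumptions, not this project's actual index mapping.
function buildRttQuery(minRttMs) {
  // In practice this body would be POSTed to an index's _search
  // endpoint via Elasticsearch's RESTful API.
  return {
    query: {
      bool: {
        filter: [
          { term: { "ssrc_report.mediaType": "video" } },         // video reports only
          { range: { "ssrc_report.googRtt": { gte: minRttMs } } } // RTT at or above threshold
        ]
      }
    }
  };
}
```

Assigning different queries of this kind makes it quick to isolate one stats-parameter at a time during exploration.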

1.6 Delimitations

The important aspect to keep in mind is that we do not envision that we control the network; thus, all actions taken to improve the situation need to be taken from an "over-the-top" perspective. This means that the application does not have control of the underlying network. After reading and studying previous related work, this degree project will not assess the perceived QoS by comparing the sent and received video/audio; instead, it will apply related algorithms to analyze and classify existing WebRTC statistical history data, in order to help invoke remedial actions so as to maintain a given QoS level.

1.7 Structure of the thesis

Chapter 2 presents basic background information regarding WebRTC. Chapter 3 describes the implementation and development done in this degree project and gives an introduction to the relevant engineering-related and scientific methods, such as analyzing, modeling, developing, and evaluating the related model. Chapter 4 introduces how an evaluation was carried out on the WebRTC test platform.


Background

This chapter provides basic background information regarding WebRTC. More specifically, it describes the WebRTC architecture, protocols, APIs, statistical parameters, etc. Additionally, it describes Ericsson's contextual WebRTC framework. Finally, some related work is discussed.

2.1 WebRTC architecture

A number of different standards underlie the WebRTC architecture, combining the browser and application APIs jointly promoted by the World Wide Web Consortium's (W3C's) and Internet Engineering Task Force's (IETF's) working groups. Nevertheless, WebRTC's fundamental purpose is to empower real-time communication between web browsers. To be a strong real-time communications (RTC) architecture, it must work across multiple browsers and platforms, thus offering developers the ability to write rich multimedia applications without requiring the user to install extra plugins.

The overall architecture for the WebRTC website is shown in Figure 2.1.


Figure 2.1: WebRTC overall architecture for the website (Adapted from figure in [6])

As Figure 2.1 shows, there are three distinct APIs, displayed in different colors (highlighted):

• Web API for third-party developers, providing all the APIs needed for web-based application development (further descriptions of these Web APIs can be found in Section 2.3)

• WebRTC Native C++ API for browser developers to implement their service.


2.1.1 WebRTC Voice and Video Engines

The WebRTC voice and video engines enable the browser to access audio and video streams from the system’s hardware, such as microphone and camera. The fully featured WebRTC voice and video engines are in charge of all the signal processing, as described in Figure 2.2. They provide frameworks for audio and video media streams, from the system hardware to the network. Moreover, they exist directly in the web browser so that a web application (app) can receive the final optimized media stream, which can then be transferred to its peer via the Web APIs.

Figure 2.2: Voice and Video Engine

2.1.2 Audio CODECs


2.1.2.1 iSAC

The internet Speech Audio CODEC (iSAC [7]) is one of the audio CODECs available in WebRTC. It is a wideband and super wideband audio CODEC suitable for voice over IP (VoIP)[8] and streaming audio. Since 2011, WebRTC’s code base has contained a royalty-free license implementation of iSAC. Some of the features of iSAC are:

Sampling frequency: 16 kHz (wide-band) or 32 kHz (super wide-band)
Adaptive and variable bit rate: 10 kbit/s to 32 kbit/s (wide-band) or 10 kbit/s to 52 kbit/s (super wide-band)

2.1.2.2 iLBC

The Internet Low Bit rate CODEC (iLBC) [9] is a narrow-band audio CODEC for VoIP and streaming audio. Since 2011, an implementation of iLBC has been available under a free license as part of the open-source WebRTC project. Some of the features of iLBC are [9, 10]:

Sampling frequency: 8 kHz/16 bit (160 samples for 20 ms frames, 240 samples for 30 ms frames)
Fixed bit rate: 15.2 kbit/s for 20 ms frames and 13.33 kbit/s for 30 ms frames

2.1.2.3 Opus

Opus is a highly versatile, royalty-free audio CODEC. It has the following properties [11]:

Sampling frequency: 8 kHz (narrow-band) to 48 kHz (full-band)
Constant and variable bit-rate


2.1.2.4 Jitter and packet loss concealment

An algorithm is used to hide network jitter and packet loss. The aim of this algorithm is to maintain voice and video quality as high as possible, while minimizing end-to-end network latency.

2.1.3 Video CODEC

VP8 [12] is a royalty-free video CODEC. VP8 is considered appropriate for interactive real-time communication as it is designed for low latency. Some of its properties are:

Required bandwidth: 100 to 2,000+ kbit/s
Variable bit-rate: depending on the desired quality of the streams

2.1.4 Image enhancements

WebRTC’s image enhancements are designed to erase video noise from the image captured by a camera.

2.2 Real-Time Transport for a Session


Figure 2.3: WebRTC network protocol stack

2.2.1 User Datagram Protocol (UDP)

UDP delivers each datagram as it arrives and does not provide reliable delivery of data. Due to the time-sensitive characteristics of real-time communication, WebRTC prioritizes timeliness over reliability; hence WebRTC uses UDP as its transport protocol for real-time data.

2.2.2 Session Traversal Utilities for NAT (STUN)

To provide NAT traversal, WebRTC depends upon a STUN [13] server at a globally routable IP address. In order to discover whether a peer is behind a NAT and to obtain the IP address and port mapping, STUN packets should be sent before the peer-to-peer (P2P) WebRTC session is initiated. Then the two parties can use the discovered public IP address and port to connect with each other.

2.2.3 Traversal Using Relays around NAT (TURN)


Figure 2.4: STUN and TURN

2.2.4 Interactive Connectivity Establishment (ICE)

ICE, as specified in RFC 5245 [15], is a standard method of NAT traversal used in WebRTC. ICE deals with NATs by performing connectivity checks. ICE collects all available candidates for NAT traversal, such as local IP addresses, reflexive STUN addresses, and TURN relay addresses, and then sends them to the remote peer via the Session Description Protocol (SDP). As soon as one client has all the collected ICE information about itself and its peer, it initiates connectivity checks, which test the ability to send media data via each candidate address. It explores alternatives until it succeeds or runs out of alternatives.
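In the browser, the STUN and TURN servers that feed ICE with candidates are supplied when the peer connection is created. A minimal configuration sketch, with placeholder server addresses (not real servers):

```javascript
// Sketch: an RTCPeerConnection configuration listing STUN and TURN
// servers for ICE candidate gathering. The URLs and credentials are
// placeholders, not real infrastructure.
const iceConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },   // discovers the public address
    {
      urls: "turn:turn.example.org:3478",      // relay of last resort
      username: "user",                        // TURN requires credentials
      credential: "secret"
    }
  ]
};
// In a browser: const pc = new RTCPeerConnection(iceConfig);
```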

2.2.5 Session Description Protocol (SDP)

SDP [16] is a data format used for negotiating parameters in a peer-to-peer (P2P) connection, including the network information collected by the NAT traversal mechanisms, the type of data to be transferred between peers, and the CODECs to be used.

2.2.6 Datagram Transport Layer Security (DTLS)


2.2.7 Stream Control Transmission Protocol (SCTP)

SCTP [18] is used in WebRTC to deliver data channels. Similar to TCP, SCTP is connection-oriented and provides a flow control mechanism to ensure the network does not become congested.

2.2.8 Secure Real-Time Transport Protocol (SRTP)

SRTP[19] and its associated control protocol (SRTCP) are two application protocols used to multiplex streams, provide congestion and flow control, and provide delivery of real-time media traffic and other additional services on top of UDP. SRTP is a profile for RTP (described in the next section).

2.2.9 Real-time Transport Protocol (RTP)

RTP[20] is implemented on top of UDP and is designed for sending or receiving media traffic. RTP packets include sequence number and timestamp fields so that the receiver can deal with out of order packets and jitter.

The Real-time Transport Control Protocol (RTCP) provides a lightweight control mechanism for RTP. RTCP can send statistical reports and flow control messages. RTCP enables the receiver to provide feedback to the sender so that the sender can perceive the network conditions (as seen by the receiver) and potentially adapt to the current network conditions on the path from the sender to the receiver.
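Because the counters carried in such reports are cumulative, momentary behaviour is obtained by differencing two consecutive reports. A small sketch (the report shape with timestamp and packetsLost fields mirrors the stats discussed in Chapter 3; the helper itself is illustrative, not this project's implementation):

```javascript
// Sketch: derive a per-second packet loss rate from two cumulative
// RTCP-style reports. Field names (timestamp in ms, packetsLost as a
// running total) are assumptions modelled on the stats in Chapter 3.
function lossRatePerSecond(prev, curr) {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  if (seconds <= 0) return 0; // guard against out-of-order reports
  return (curr.packetsLost - prev.packetsLost) / seconds;
}
```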

2.2.10 Summary


2.3 WebRTC APIs

W3C's WebRTC APIs [21] are designed to allow media data to be sent to and received from a peer browser by deploying the corresponding set of real-time protocols. Developers can access the WebRTC layer and then build peer connections with these functions and objects. The W3C WebRTC API specification covers:

• Connecting with remote peers using NAT traversal technologies, for instance STUN, TURN, and ICE, as described in the preceding section.

• Exchanging locally-produced track information with remote peers.

• Sending arbitrary data to peers.

2.3.1 RTCPeerConnection


Figure 2.6: Calling Sequences: Receive a Call


2.3.2 getUserMedia API

The getUserMedia API, also known as the MediaStream API, performs the following key functions:

• Generates a stream object that represents a real-time video or audio media stream

• Deals with selecting input devices among multiple cameras or microphones connected to the device

• Provides secure authentication according to a user's permissions or preferences, asking the user before the browser accesses and fetches a media stream

One of the most important options used in this deployment is the constraints option of the getUserMedia API. The full set of constraints is given in the expired Internet draft "Resolution Constraints in Web Real Time Communications" [23]. These options include the minimum required resolution, frame rate, video aspect ratio, and other optional parameters that can be passed in the configuration object to the getUserMedia API. One example from the W3C is [21]:

{
  mandatory: {
    width: { min: 640 },
    height: { min: 480 }
  },
  optional: [
    { width: 650 },
    { width: { min: 650 } },
    { frameRate: 60 },
    { width: { max: 800 } },
    { facingMode: "user" }
  ]
}


Table 2.1: Constraints specification

Object       Specification                                                 Value options
Height       Specifies the video source height                             Min and max as an integer
Width        Specifies the video source width                              Min and max as an integer
FrameRate    Specifies how many frames to send per second (usually 60      Min and max as an integer
             for High Definition (HD), 30 for Standard Definition (SD))
aspectRatio  Width divided by height – usually 4/3 or 16/9                 Min and max as a decimal
facingMode   Selects the front/user-facing camera or the                   Which camera to choose – currently
             rear/environment-facing camera, if available                  user, environment, left, or right

Generally, setting mandatory constraints is suggested in order to limit the bandwidth of the network connection or to save computational power on the devices. These constraints are incredibly useful since they give us the ability to adapt to specific network situations, in order to provide the best available QoS for users.
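As an illustration of such adaptation, the sketch below chooses a constraints object from a rough bandwidth estimate. The thresholds and the helper itself are invented for illustration and are not taken from this project:

```javascript
// Sketch: pick getUserMedia constraints from a rough downlink estimate
// in kbit/s. The threshold values are illustrative assumptions.
function constraintsForBandwidth(kbps) {
  if (kbps < 300) {
    // Very constrained link: small frame, low frame rate
    return { mandatory: { width: { max: 320 }, height: { max: 240 }, frameRate: { max: 15 } } };
  }
  if (kbps < 1000) {
    // Standard definition
    return { mandatory: { width: { max: 640 }, height: { max: 480 }, frameRate: { max: 30 } } };
  }
  // Plenty of bandwidth: allow HD capture
  return { mandatory: { width: { min: 1280 }, height: { min: 720 } } };
}
```

In a browser, the returned object would be passed to getUserMedia() when (re)acquiring the media stream.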

2.3.3 WebRTC’s Statistics API


browsers. For example, getStats() looks as follows.

In Firefox (promise-based):

    peerConnection.getStats(null).then(function (stats) { ... }); // returns a promise

In Chrome (callback-based):

    peerConnection.getStats(function (stats) { ... }); // pass a callback function

Therefore, the monitored data itself also looks slightly different in Chrome and Firefox, but this will be addressed (and the values normalized) in the implementation that is described later.
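One possible shape for such a compatibility shim is sketched below; it is illustrative, not the implementation used in this project:

```javascript
// Sketch: wrap both getStats() variants behind one promise-based call,
// papering over the callback (legacy Chrome) vs. promise (Firefox/spec)
// difference. Illustrative only, not this project's implementation.
function getStatsCompat(peerConnection) {
  try {
    const result = peerConnection.getStats(null);
    if (result && typeof result.then === "function") {
      return result; // promise-based browser
    }
  } catch (e) {
    // fall through to the callback form
  }
  return new Promise(function (resolve, reject) {
    peerConnection.getStats(resolve, reject); // callback-based browser
  });
}
```

Field-name differences between the two browsers would still need to be mapped onto a common vocabulary after the reports are retrieved.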

In this way, WebRTC applications can observe the media stats-parameters when performing session negotiation and during data transmission. These statistics play a key role when analyzing the perceived QoS at the client's side. Among them, the RTCP report, identified by a unique synchronization source (SSRC) identifier, is one of the most important and is described in detail in Section 3.1.1.

2.4 Ericsson’s contextual WebRTC framework


Figure 2.7: WebRTC framework for Ericsson WebRTC services

The statistics server receives reports from clients when they do an HTTP POST to it. It is possible to listen to the stats messages posted on this server over a web socket connection; the protocol on that web socket interface is NATS – Open Source Messaging System (https://nats.io). The server's HTTP endpoint receives the messages that clients POST to the statistics server and splits up all the messages in the POST (there are more types of messages than just those that WebRTC is posting). The server puts all of these messages on a message bus under different topics; this bus is a "NATS bus" in the current implementation. Applications can listen to messages on this bus and analyze them. There is a websocket endpoint that connects to and exposes the bus. Using this endpoint, it was possible to capture the messages posted to the statistics server in real time and import the stats from the statistics collector into the Elasticsearch database in order to carry out this thesis project.
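The splitting step described above can be sketched as a small pure function; the message shape ({ type, payload }) and the topic naming are assumptions made for illustration, and may differ from the real Ericsson service:

```javascript
// Sketch: group a batch of POSTed messages by topic for publishing on a
// message bus. The { type, payload } shape and the "prefix.type" topic
// convention are illustrative assumptions.
function splitByTopic(batch, prefix) {
  const topics = {};
  for (const msg of batch) {
    const topic = prefix + "." + msg.type; // e.g. "stats.webrtc"
    (topics[topic] = topics[topic] || []).push(msg.payload);
  }
  return topics;
}
```

Each key of the returned object would then correspond to one bus topic that downstream analysis applications can subscribe to.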

2.5 Related Work


a good fit for WebRTC. Facebook introduced a 360°/VR video quality metric using the Structural SIMilarity Index (SSIM) [27] technique for objective quality assessment.

Estimating quality perception for the end-user (subjective quality) is a highly complex problem because different individuals can have different personal feelings about the same conversation's quality. As a result, there is no universal standard defining which factors lead to what degree of user-perceived QoS for a real-time communication application. A few assessments of end-user-perceived video or audio quality have been done in the context of Internet-based applications; most of them did not test the delay factor that we must address in a live call. Cole and Rosenbluth [8] looked at this in the context of audio. Chen and Thropp [28] conducted a survey about the effects of frame rates (FRs) on human performance and concluded that a FR of around 15 Hz seems to be a minimum boundary for human satisfaction; however, this varies according to video content, the viewers, and applications. Based on this, Ou et al. [29] built models to reflect the trend observed from subjective testing, specifically how the perceived quality of video changes with different FRs. Balachandran et al. [30] developed a predictive model for Internet video QoE by using machine learning and metrics that capture delivery-related effects, such as bitrate delivered, bitrate switching, the rate of buffering, and join time [31, 32, 33, 34]. Some quality assessments have focused on real-time video, for example by understanding how video-based applications are influenced by network conditions. French et al. [35] proposed an architecture to estimate real-time video QoE based on analysis of Real Time Messaging Protocol (RTMP) streams. They carried out experiments demonstrating that they can predict video QoE based on stream state measurements (frame rate, bandwidth, and bitrate) and previous users' ratings with 70-80% accuracy. Hossfeld et al. [36] presented, through a series of experiments, how network delay (initial delay and interruption) affects human-perceived video quality.


presents how Google realizes WebRTC video quality measurement using peak signal-to-noise ratio (PSNR) and SSIM. However, this test only measures correctness by comparing input and output videos, without considering other elements that can affect video quality, such as frame rate or resolution.

This thesis builds upon the established state of the art in the field by combining some "objective" methods with adapted algorithms to predict a subjective quality score.

2.6 Summary

This chapter presented high-level introductions:

• What is WebRTC? Together with explanations on multiple aspects, such as architecture, protocols, and APIs.

• How does a WebRTC system work, specifically when establishing a peer connection?


Implementation and development

This chapter describes the implementation and development carried out in this degree project. The chapter begins with an introduction to the engineering-related and scientific methods that were applied, specifically analyzing, modeling, developing, and evaluating a model. The purpose of this chapter is to provide an overview of the research method used in this thesis. Section 3.1 describes the data processing, including data collection, data analysis, and data visualization. Section 3.2 presents the principal components analysis of the collected data. Section 3.3 focuses on selecting and building models to grade and predict the perceived video quality based on the data of Section 3.1, and also introduces two algorithms used for big-data processing to enhance the model. Section 3.4 explains the remedies one could apply to improve the current video quality in a session and the corresponding development techniques that were used. Finally, Section 3.5 concludes the chapter.

3.1 Data Processing

Data collection was a primary task at the start of this degree project. Section 2.4 explained how Ericsson WebRTC services capture statistical data from a WebRTC application. To learn which specific data should be analyzed and how to analyze it, more than 46,327,000 RTCP reports (one set of RTCP reports per second), corresponding to statistical data for 772,120 minutes of sessions, were loaded into Elasticsearch for subsequent data processing.

3.1.1 Data Analysis

As described above, one of the key tasks of this degree project was to determine which data features are correlated with the perceived quality as seen by the conference participants. The remainder of this thesis will focus on the statistical parameters correlated with video and how they are correlated with QoS.


Figure 3.1: statistical RTCP sender report for a video stream from 9 March 2018


Of all of these parameters, 8 are at the center of this thesis project: "packetsLost", "googAdaptionChanges", "googAvgEncodeMs", "googEncodeUsagePercent", "googFirsReceived", "googNacksReceived", "googPlisReceived", and "googRtt". Table 3.1 specifies each of them.

Table 3.1: Parameter specifications of the RTCP video-sender report (parameters prefixed by "goog" are Google-defined)

• googRtt: The round-trip time measured via RTCP (unit: ms).

• packetsLost: A cumulative number specifying the number of RTP packets lost for this SSRC. The number of packets lost per second can therefore be calculated by dividing the difference between the current and previous values of packetsLost by the time between the two reports.

• googNacksReceived: The number of Negative Acknowledgements (NACKs) received. NACKs indicate that RTP packets were lost.

• googPlisReceived: The number of times the receiver of the stream sent a Picture Loss Indication (PLI) packet to the sender, indicating that it had lost some encoded video data for one or more frames.

• googFirsReceived: A count of the total number of Full Intra Request (FIR) packets received by the sender. A FIR packet is sent by the receiving end of the stream when it falls behind or has lost packets and is unable to continue decoding the stream. The higher the value of this parameter, the more often a problem of this nature arose, which can be a sign of network congestion or an overburdened receiving device.

• googAdaptionChanges: Indicates whether the resolution has been changed because of CPU issues or insufficient bandwidth; it increases whenever one of these two conditions changes.

• googAvgEncodeMs: The average encode time of a video frame at the sender (unit: ms).

• googEncodeUsagePercent: Average encode time per frame divided by


The raw reports were cleaned through the following series of processing actions:

• Select: select the parameters with the column names "@timestamp", "ssrc", "googRtt", "packetsLost", "googNacks", "googPlis", "googEncodeUsagePercent", "googAvgEncodeMs", "googAdaptionChanges", and "googFirsReceived".

• Filter: filter out invalid data records, i.e., those with a "Null" value for googRtt.

• Order: order all data records by "@timestamp" for the next data processing action.

• Generate new columns: the values of "packetsLost", "googNacks", "googPlis", "googAdaptionChanges", and "googFirsReceived" are cumulative over time. After ordering all data with increasing values of "@timestamp" within groups of records with the same "ssrc", one can calculate the difference in values for each subinterval.

• Drop: drop duplicated data records with the same values for every statistical parameter except "ssrc" and "@timestamp".
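As an illustration, the five cleaning steps above can be sketched with pandas (the thesis itself used Elasticsearch and Spark; the column names follow the report fields listed earlier, and the `preprocess` helper is hypothetical):

```python
import pandas as pd

def preprocess(reports: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the five cleaning steps applied to raw RTCP records."""
    cols = ["@timestamp", "ssrc", "googRtt", "packetsLost", "googNacks",
            "googPlis", "googEncodeUsagePercent", "googAvgEncodeMs",
            "googAdaptionChanges", "googFirsReceived"]
    df = reports[cols].copy()                        # Select
    df = df[df["googRtt"].notna()]                   # Filter out null googRtt
    df = df.sort_values(["ssrc", "@timestamp"])      # Order by timestamp per stream
    # Generate new columns: turn cumulative counters into per-interval deltas
    cumulative = ["packetsLost", "googNacks", "googPlis",
                  "googAdaptionChanges", "googFirsReceived"]
    for c in cumulative:
        df[c] = df.groupby("ssrc")[c].diff().fillna(0)
    # Drop records that are identical in every statistical column
    stat_cols = [c for c in cols if c not in ("ssrc", "@timestamp")]
    df = df.drop_duplicates(subset=stat_cols)
    return df.reset_index(drop=True)
```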

After this series of processing actions, 422,534 RTCP records remain. In order to rank the parameters that most affect the video quality, this project applies an important data processing technique, Principal Component Analysis (PCA), described in Section 3.2.

3.2 Principal Component Analysis


In PCA, the eigenvalue reflects the amount of variation in the total sample accounted for by each factor, and the ratio of eigenvalues is the ratio of the explanatory importance of the factors with respect to the variables. In other words, if a factor has a low eigenvalue, then it contributes little to the explanation of variance in the variables and may be ignored as redundant in comparison with more important factors.

Specifically, in this project PCA is applied to analyze multi-dimensional data with 8 features of the RTCP reports: "googRtt", "packetsLost", "googNacks", "googPlis", "googEncodeUsagePercent", "googAvgEncodeMs", "googAdaptionChanges", and "googFirsReceived". This analysis was programmed with the high-level APIs [43] provided in Apache Spark [44]. The variance ratio for each principal component (PC) is displayed in Table 3.2.

Table 3.2: Proportion of the Variance Matrix

PC number   Proportion   Cumulative
PC1         22.3%        22.3%
PC2         20.9%        43.2%
PC3         14.3%        57.5%
PC4         14.1%        71.6%
PC5         13.6%        85.2%
PC6          8.2%        93.4%
PC7          6.4%        99.8%
PC8          0.0%        99.8%

As one can see from the above matrix, the first two components are much stronger than the next three, and these are stronger than the last two. Importantly, the first 4 explain 71.6% of the variance, which means the dimensionality of the stats parameters can be reduced to 4 by extracting only the 4 strongest components.
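A minimal sketch of how such a variance table can be produced, here with scikit-learn's PCA on synthetic stand-in data rather than the Spark ML pipeline the project used:

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in: one row per RTCP record, one column per stats parameter
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
X[:, 1] += 3 * X[:, 0]        # inject correlation so a few PCs dominate

pca = PCA(n_components=8).fit(X)
ratios = pca.explained_variance_ratio_     # analogous to Table 3.2
cumulative = np.cumsum(ratios)
# Smallest number of PCs explaining at least 70% of the variance
k = int(np.searchsorted(cumulative, 0.70) + 1)
```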


Table 3.3: Rotation Matrix (signs restored from the linear transformation functions below)

Parameter                 PC1      PC2      PC3      PC4
googRtt                  -0.030    0.114   -0.873    0.276
googAdaptationChanges    -0.133   -0.008    0.282    0.948
googAvgEncodeMs          -0.697    0.004    0.005   -0.122
googEncodeUsagePercent   -0.701   -0.003    0.010   -0.076
googFirsReceived          0.0      0.0      0.0      0.0
packetsLost              -0.015    0.686    0.065   -0.004
googNacks                 0.042    0.635    0.250   -0.053
googPlis                 -0.039    0.336   -0.303    0.036

PCA yields a transformation of the original inputs to a new set of outputs. According to this rotation matrix, the corresponding linear transformation functions are:

PC1 = -0.030*googRtt - 0.133*googAdaptationChanges - 0.697*googAvgEncodeMs - 0.701*googEncodeUsagePercent + 0.0*googFirsReceived - 0.015*packetsLost + 0.042*googNacks - 0.039*googPlis

PC2 = 0.114*googRtt - 0.008*googAdaptationChanges + 0.004*googAvgEncodeMs - 0.003*googEncodeUsagePercent + 0.0*googFirsReceived + 0.686*packetsLost + 0.635*googNacks + 0.336*googPlis

PC3 = -0.873*googRtt + 0.282*googAdaptationChanges + 0.005*googAvgEncodeMs + 0.010*googEncodeUsagePercent + 0.0*googFirsReceived + 0.065*packetsLost + 0.250*googNacks - 0.303*googPlis

PC4 = 0.276*googRtt + 0.948*googAdaptationChanges - 0.122*googAvgEncodeMs - 0.076*googEncodeUsagePercent + 0.0*googFirsReceived - 0.004*packetsLost - 0.053*googNacks + 0.036*googPlis

In summary: compared with the other 6 parameters, googAvgEncodeMs and googEncodeUsagePercent have a much stronger influence on PC1. Regarding the effect on PC2, packetsLost, googNacks, and googPlis are much stronger. For PC3, googRtt's influence is greater than that of the next three: googPlis, googAdaptationChanges, and googNacks. As for the last component, PC4, googAdaptationChanges is by far the strongest.


3.3 Modeling

Modelling [45] is a scientific activity, the aim of which is to make particular parameters or features of the world easier to understand, define, quantify, visualize, or simulate by referencing them to existing and usually commonly accepted knowledge. It requires selecting and identifying relevant aspects of a situation in the real world and then using different types of models to define a model that satisfies particular aims, such as using conceptual models to gain a better understanding, mathematical models to quantify phenomena, and graphical models to visualize something. Modelling is an essential and inseparable part of many scientific disciplines, each of which has its own ideas about specific types of modelling.

3.3.1 Clustering

In statistics, clustering is used to group data into categories based on some measure of inherent similarity or distance. Given a set of data items with the four principal components (the outcomes of Section 3.2), clustering algorithms can be applied to group them into different classes. In some multidimensional space, points within each cluster are similar to each other, while points from different clusters are dissimilar. Usually, points are in a high-dimensional space and similarity is defined using a distance measurement.

Mean Opinion Score (MOS) [46] gives a numerical indication of the perceived quality of media after being encoded and decoded using CODECs and after propagating over the path from sender to receiver. CODECs are generally assessed using an MOS score on a 5-level scale, hence this project also outputs 5 different QoS grades. This in turn led to clustering the data into 5 classes.

3.3.1.1 Introduction to the Gaussian Mixture Model (GMM)


A GMM attempts to describe the underlying generative process of the data set (assuming that the generative process is a set of Gaussian distributions and that only their parameters are unknown). In a GMM, each cluster can be seen as one distribution, such as a Gaussian distribution. This approach is based upon the following:

• Each data object Xi is assumed to be a sample from an independent and identically distributed mixture of k distributions Ci.

• Each cluster is a multivariate Gaussian distribution.

• A GMM uses the Expectation Maximization (EM) [48] algorithm to estimate the statistics of each cluster, including the calculation of a mean and a covariance matrix. The goal of EM is to find the maximum-likelihood estimate of a data distribution when the data is partially missing or hidden. The EM algorithm iteratively refines the GMM parameters to increase the likelihood of the estimated model.

• Specifically for this project, after multiple iterations, each data object Xi gets a set of 5 probabilities corresponding to the 5 clusters. The cluster with the maximum probability is chosen as the class that Xi belongs to.

3.3.1.2 Clustered Results

The project built a GMM model by using a clustering algorithm in Apache Spark (as Spark provides a fast and general-purpose cluster computing system).
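A sketch of the same clustering idea using scikit-learn's `GaussianMixture` on synthetic 4-dimensional stand-ins for the principal components (the project itself used Spark; the data here is artificial):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: 4 principal-component values per RTCP record,
# drawn from 5 well-separated groups
rng = np.random.default_rng(42)
pcs = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 4))
                 for c in range(5)])

gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
labels = gmm.fit_predict(pcs)        # EM fitting followed by hard assignment
probs = gmm.predict_proba(pcs)       # 5 membership probabilities per record
# Each record belongs to the cluster with the highest membership probability
best = probs.argmax(axis=1)
```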


Table 3.4: Data quantity of clusters

Label      0        1      2    3     4     Sum
Quantity   367939   46833  439  4052  3271  422534

Figures 3.2 to 3.9 display the Cumulative Distribution Function (CDF) of the 8 statistical parameters in the 5 labelled clusters (for some clusters the value of a specific parameter was a constant "0"; these are drawn as histograms instead). In each figure, there are 5 distribution functions marked with various colors, representing the data with the different labels.


Figure 3.3: CDF and histogram of googAdaptationChanges


Figure 3.5: CDF of googEncodeUsagePercent


Figure 3.7: CDF of packetsLost


Figure 3.9: CDF of googPlis

As one can see, there are clearly different characteristics among these 5 clusters. Table 3.5 shows the class divisions for each parameter, followed by an explanation.

Table 3.5: Class division on each parameter

Parameter                Class Division
googRtt                  C3 | C2, C1 | C4 | C0
googAdaptationChanges    C0, C1, C4 | C3 | C2
googAvgEncodeMs          C0, C4 | C1, C2 | C3
googEncodeUsagePercent   C0, C4 | C1, C2 | C3
googFirsReceived         C0, C1, C2, C3, C4
packetsLost              C0, C3 | C1 | C4 | C2
googNacks                C0, C3 | C1 | C4 | C2
googPlis                 C0 | C3, C1 | C4 | C2


• Cluster number: In Table 3.5, the cluster numbers are written as C0, C1, C2, C3, and C4, representing the 5 different data clusters.

• Class Division: Observing the above figures of the 8 statistical parameters, one can see obvious diversity between the clusters. For example, in Figure 3.3, around 80% of the data in C0 and C4 falls within the same value range of googAdaptationChanges (< 15). Similarly, around 80% of the data in C1 and C2 falls within the same value range of googAdaptationChanges (< 30). Considering the last data cluster, C3, where the value range of googAdaptationChanges is < 300, the original 5 clusters can be divided and reordered into 3 classes:

C0, C4 | C1, C2 | C3 (ordering the value of googAdaptationChanges from low to high)

Assigning a QoS grade to each cluster can be done by considering Figures 3.2 to 3.9 together with Table 3.1 and Table 3.5. As mentioned before, this project ordered the perceived QoS into 5 levels. Table 3.6 describes how the QoS grade was assigned to each cluster, followed by an explanation.

Table 3.6: QoS grade assumption of each cluster

QoS grade        4    3    2    1    0
Cluster number   C0   C3   C1   C4   C2

• C0 was tagged with the highest QoS grade, "4", representing the best video quality, because it has the lowest values for all parameters except googRtt. C0 has a wide distribution of googRtt, where larger values indicate a long distance or more network nodes between the WebRTC peers.


• C1 was tagged with a medium QoS grade, "2", because it has higher values of googRtt, packetsLost, and googNacks when compared with C3, while being on the same level as C3 for googFirsReceived and googPlis. Looking into googAvgEncodeMs and googEncodeUsagePercent, the values for C1 are lower than for C3, which means C1 has a lower average encoding time than C3. However, the amount by which C1's encoding time is lower than C3's is approximately equal to the difference in RTT between C3 and C1 (around a 250 ms bias for 80% of the data). In summary, C1 should be tagged with a lower QoS grade than C3.

• C4 was tagged with a bad QoS grade, "1". Compared with C1, C4 has higher values of googRtt, packetsLost, googNacks, and googPlis, as well as the same values of googFirsReceived and googAdaptationChanges. Although C4 has lower values of googAvgEncodeMs and googEncodeUsagePercent than C1, this time difference is far less than the difference in googRtt between them: the googRtt of C4 is around 1000 ms higher than that of C1 for 80% of the data, while the average encoding time of C4 is only around 15 ms lower than that of C1 for 80% of the data.

• C2 was tagged with the lowest QoS grade, "0", because it has the highest values of 4 parameters: googAdaptationChanges, packetsLost, googNacks, and googPlis. Compared with the other clusters, C2 shows a significant difference for these 4 parameters (as can be seen in their CDF figures).


3.3.2 Sampling Methodologies

In order to solve the imbalanced distribution problem, one can apply sampling [50] techniques, over-sampling or under-sampling, to each cluster. These two data analysis methodologies have received significant attention as a means to counter the effect of imbalanced data sets. In recent years, they have been commonly used to adjust the class distribution of a data set (i.e., the ratio between the different classes/categories represented). Technically, over-sampling and under-sampling are opposite but approximately equivalent methods. They both involve using a bias to select more samples from one class than from another.

3.3.2.1 Over-sampling with SMOTETomek

There is a wealth of methods available to over-sample a data set used in a typical classification problem (using a classification algorithm to classify a set of images, given a labelled training set of images). In this project, over-sampling was applied to generate more data samples with the labels "label 2", "label 3", and "label 4", based on the GMM clustering results of Section 3.3.1. The most common technique is known as SMOTE: Synthetic Minority Over-sampling Technique [51].

SMOTE is an advanced method of over-sampling developed by Chawla et al. [52]. It supplements the minority class by creating artificial examples in this class rather than by replicating existing examples. The algorithm works as follows [53]:

• Assume A is the minority class and B is the majority class. Then, for each observation "x" that belongs to class A, the k-nearest neighbors of "x" are identified.

• A few neighbors are randomly selected (the number of neighbors depends on the desired rate of over-sampling).

• Artificial observations are then generated and spread near the line joining “x” to its nearest neighbors.


However, some problems that exist in data sets with skewed class distributions remain unresolved. Usually, class clusters are not well defined, hence some majority class objects may invade the minority class space, which leads to interpolated "minority class" samples being introduced too deep into the majority class space. This situation can result in over-fitting by the classifier. In order to generate better-defined class clusters, Tomek links [54] can be applied to the over-sampled training set as a data cleaning method. In this case, instead of removing only the majority class examples that form Tomek links, examples from both classes are removed.

To implement SMOTE + Tomek algorithm, this project implemented a program based on C0 and C1 by calling methods provided by the imbalanced-learn library of Python’s scikit-learn package[55].

3.3.2.2 Under-sampling with Random-Sampler

Given an original data set S, under-sampling algorithms will create a new set S’ where |S’| < |S|. In other words, under-sampling techniques will reduce the number of samples in the targeted classes. Random-Sampler[56] is a fast and easy way to balance the data by randomly selecting a subset of data for the targeted classes. With the controlled algorithm, the number of samples to be selected can be specified, which makes Random-Sampler the most naive way of performing such selection.

To implement the Random-Sampler algorithm, this project implemented a program based on C2, C3, and C4 by calling methods provided from the imbalanced-learn library of Python’s Scikit-learn package.

3.3.2.3 Balanced Results


Figure 3.10: Balanced Results

The result is that the number of points in the clusters with QoS grades "4" and "2" is reduced. Conversely, the number of points in the clusters with QoS grades "3", "1", and "0" has increased. In other words, the imbalanced distribution among the 5 clusters was solved by using the SMOTETomek and Random-Sampler techniques together.

3.3.3 Classification using Random Forests Algorithm

In the terminology of machine learning and statistics, classification [57] consists of identifying to which of a set of categories (clusters) a new observation (WebRTC RTCP record) belongs, on the basis of a training set of data containing observations whose category membership is known (records already tagged with a cluster identifier). Examples of such tags are labeling a given email as belonging to the "spam" or "non-spam" class, or assigning a diagnosis to a patient based on observed characteristics of the patient (gender, blood pressure, etc.).

This section introduces the Random Forests framework, how Random Forests work, and how they were used in this degree project.


Training data, organized into a decision tree, can be used to formulate a set of rules. Subsequently, these rules can be used to make predictions.

Since Random Forests were first introduced by Breiman[58], the random forests algorithm has been extremely successful as a general purpose classification algorithm. There are two phases in the algorithm: (1) random forest creation and (2) making a prediction from the random forest classifier built in the first phase. It is easy to understand the whole process using Figure 3.11.

Figure 3.11: Random Forest Simplified [59]

The steps of the tree generation phase are [60]:

1. Randomly select "k" features from the total of "m" features, where k ≪ m. (In this project, m equals 4, representing PC1 to PC4.)

2. Among the "k" features, calculate the root node using the best split point.

3. Split the node into daughter nodes.

4. Repeat steps 1 to 3 until the desired number of nodes has been reached.

5. Build a forest by repeating the first 4 steps "n" times to create "n" trees.

The steps of the prediction phase are[60]:

• Take the test features and use the rules of each randomly created decision tree to predict the outcome, and store the predicted outcome. (In this project, the number of possible outcomes is the number of clusters.)

• Calculate the votes for each predicted outcome.

• Consider the highest-voted predicted target as the final prediction of the Random Forest algorithm.

To implement a Random Forest algorithm, this degree project first split the stats data set into a 70% training set and a 30% testing set. The training set was used to train a Random Forest model by calling the RDD-based APIs [61] in Spark from a program written in Scala. To evaluate the prediction accuracy of this Random Forest model, the testing set was fed into the model; this showed approximately 90% accuracy.
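The same train/test procedure can be sketched with scikit-learn's `RandomForestClassifier` (the project used Spark's RDD-based APIs in Scala; the data here is synthetic, so the accuracy is only illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 4 features (PC1..PC4), 5 classes (QoS grades 0..4)
X, y = make_classification(n_samples=3000, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=5, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)
# 70/30 train/test split as in the thesis
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
acc = accuracy_score(y_te, forest.predict(X_te))
```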

3.4 Selection of Remedial Actions


Table 3.7: Remedies to apply in different situations

QoS grade   Remedy
4           Nothing to do.
3           Recommend: reduce the resolution to keep the video quality stable.
2           Highly recommend: drop the frame rate of the video or reduce the resolution to improve the clarity and correctness of the video.
1           The current network situation is horrible; strongly recommend dropping the video completely to keep audio fluency.
0           If the situation persists for a while, we recommend terminating the conference.

Based on the explanation of Table 3.6, which describes the reasoning behind each QoS grade, the remedies to apply in the different situations are, in detail:

• The characteristics of sessions with QoS grade "3" include the lowest value of RTT and no picture or packet loss, but one or two more googAdaptationChanges occurrences and a longer than average encoding time per frame. In this situation, the video resolution has been changed because of CPU issues or insufficient bandwidth. Meanwhile, a higher value of googAvgEncodeMs together with a high googEncodeUsagePercent implies the available CPU or bandwidth is insufficient to sustain the ongoing resolution. Therefore, remedial action should be taken to reduce the resolution directly in order to keep the video quality stable.


• The QoS grade "1" represents bad video quality. The sessions in this cluster have the lowest values of googAvgEncodeMs and googEncodeUsagePercent, which implies a strong possibility of low resolution. However, these sessions had much higher values of googRtt, packetsLost, googNacks, and googPlis, which are associated with horrible perceived video quality: choppy animation, frame skipping, etc. That is why we recommend the conference peers drop the video completely in order to keep audio fluency.

• Sessions with QoS grade "0" indicate extremely bad network conditions. However, this situation seldom or never happened in the overall set of records; in fact, data labelled with QoS grade "0" occurred only 439 times. The recommendation for this cluster is to terminate the conference if the situation persists.

3.5 Summary


Evaluation of WebRTC test platform

This chapter describes an experimental evaluation of the classification model in a WebRTC application, together with verification of the validity of the remedial actions proposed in Section 3.4. Section 4.1 describes the WebRTC test platform. Section 4.2 focuses on the methods and algorithms used for interaction between the WebRTC application and the classifier model. Section 4.3 shows the web interface of the software that was used for this project. Section 4.4 explains the network simulation environment. Section 4.5 presents and analyses the outcomes of an evaluation experiment. Finally, Section 4.6 states a conclusion for this chapter.

4.1 Framework of Evaluation

The WebRTC test platform shown in Figure 4.1 is mainly composed of two parts provided by two Docker [62] containers:

• webRTC-E2E-Test

webRTC-E2E-Test is a Docker container that runs two instances of a WebRTC application communicating with a Coturn server served from turn-netsim. The webRTC-E2E-Test container plays a crucial part in this project by sending real-time statistical parameters to the back-end classifier as well as receiving the outcome, a QoS score, from the classifier. Section 4.3 provides more information.


• turn-netsim

turn-netsim is a Docker container which contains two services:

– A Coturn server [63] acting as a TURN server (the concept of a TURN server was introduced in Section 2.2.3), providing a media relay.

– A web application listening on port 3000 to tune a traffic control tool which simulates a slow and lossy network.

Figure 4.1: Test Platform Framework

4.2 Interaction Theory


The WebRTC application sends each stats message to the Random Forest classifier over a WebSocket; the classifier is responsible for sending a QoS score back to the front-end in real time.

The WebSocket Protocol is a generally supported open standard for developing real-time applications. It enables bi-directional communication between a client running code in a controlled environment and a remote host that has opted in to communications from that code. A WebSocket connection between a client and server enables both parties to send data at any time (hence it is full duplex).

The steps of interaction process are:

• When creating a WebSocket connection, the first step is to establish a TCP connection via which the client and server agree on using the WebSocket Protocol.

• After a TCP connection is established, the client initiates a WebSocket connection via a process known as the WebSocket handshake. This begins when the client sends an HTTP request to the server. An Upgrade header contained in this request message notifies the server that the client is trying to establish a WebSocket connection.

• If the server supports the WebSocket protocol, it agrees to the upgrade and communicates this by sending an Upgrade header in its response message.

• Now that the handshake is complete the initial HTTP connection will be replaced by this WebSocket connection that uses the same underlying TCP connection. At this point, both sides can start to send data.

As soon as a connection is established between the WebRTC application and the classifier, stats parameter data is collected into a stats matrix where data is saved for a period of 10 seconds. Every 10 s, the stats matrix is transmitted over the WebSocket. After the classifier makes a prediction based on each set of stats parameters, the classifier's result is mapped into a QoS grade, as shown in Table 3.6.
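The 10-second batching can be sketched as follows; the `StatsMatrix` class and its field names are hypothetical, standing in for whatever structure the application actually serializes before the WebSocket send:

```python
import json
from typing import Optional

class StatsMatrix:
    """Collects one stats record per second and flushes every 10 records."""

    def __init__(self, period: int = 10):
        self.period = period   # number of per-second records per batch
        self.rows = []

    def add(self, record: dict) -> Optional[str]:
        """Append a record; return a JSON payload once the matrix is full."""
        self.rows.append(record)
        if len(self.rows) < self.period:
            return None
        payload = json.dumps({"stats": self.rows})  # body for the WebSocket send
        self.rows = []
        return payload
```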


The front-end thus receives a set of predictions computed from the stats parameters every ten seconds. The voting algorithm in charge of determining the QoS grade selects the grade that appears most often among the outcomes computed from the set of 10 samples. Generally, the QoS control system takes less than 1 second from sending the stats record to the classifier until an answer is returned via the WebSocket.
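The voting step amounts to taking the mode of the ten per-sample predictions, for example:

```python
from collections import Counter

def vote(grades):
    """Return the QoS grade that appears most often among the samples."""
    return Counter(grades).most_common(1)[0][0]

# e.g. ten per-second predictions from the classifier
print(vote([4, 4, 3, 4, 4, 2, 4, 3, 4, 4]))   # -> 4
```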

4.3 Software design


Figure 4.2: Web interface layout


• 4 frames; from top to bottom and from left to right, these are the original video stream of peer A, the original video stream of peer B, the video stream from peer A after transmission to B, and the video stream from peer B after transmission to A.

• A division to display the real-time score for video quality.

• A division to show the proposed remedy corresponding to the assigned QoS grade.

• A division to present the current video constraints on resolution and frame rate. This enables the users to choose a lower resolution or frame rate than the current setting when they are recommended to "reduce resolution" or "drop framerate".

• Lastly, a division providing the video control options, as instructed by the "remedy description" division.

4.4 Network Simulation Environment

Setting up the environment to run the network simulator consists of:

• Build the turn-netsim container,

• Before running this container, map TCP ports 443 and 3000 to localhost,

• Access port 3001 from localhost's browser, i.e., http://localhost:3001/, and


Figure 4.3: Net Simulator Page

4.5 Results and Analysis


Figure 4.4: A test showing the video quality QoS grade and recommended remedial action

If the user applies the recommended remedy, they should perceive a higher quality video, thus in the next ten seconds they should see a higher QoS grade than before. Every 10s, the QoS grade is refreshed. This interface implements video quality control for the ongoing WebRTC sessions.

Generally, it takes the user 1 to 2 seconds from getting a recommendation to acting on it, including reading the remedial recommendation, moving the mouse, and selecting the remedial action. To some extent, this degrades the performance that the users experience and also means that the first one or two measurements in the current 10-second interval are not representative of the parameters at the end of the interval. However, if the adaptation were automatic (as could be done in the future), the user would consistently experience the best quality they could get with a 10-second averaging time.

As an evaluation, this project carried out an experiment to show the relationship between the parameters of the network simulator (bandwidth, delay, packet loss) and the QoS score. This experiment was done by evaluating the QoS grade for a continuous WebRTC conference under different conditions (i.e., with different parameters for the network simulator) and adopted the single variable principle. Figures 4.5, 4.6, and 4.7 display the outcome of this experiment.


The settings of bandwidth and packet loss percentage were held constant at 4 Mbit/s and 0% respectively for Figure 4.6, and the settings of delay and packet loss percentage were held constant at 60 ms and 0% respectively for Figure 4.7. For every setting, the experiment records ten scores (over 100 seconds) and reports the average as the QoS grade of that setting. These figures fit a linear relation between the settings of the network simulator and the QoS grade using a trendline [65] and an R² [66] value (a number from 0 to 1 that reveals how closely the estimated values of the trendline correspond to the actual data, with "1" representing a perfect fit between the data and the line drawn through them, and "0" representing no statistical correlation between the data and the line).
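A least-squares trendline and its R² value can be computed as in the following sketch (the `linear_fit_r2` helper and the sample numbers are illustrative, not the thesis's measured data):

```python
import numpy as np

def linear_fit_r2(x, y):
    """Fit y ~ a*x + b by least squares and return (a, b, R^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, deg=1)
    y_hat = a * x + b
    ss_res = float(np.sum((y - y_hat) ** 2))      # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical readings: packet-loss settings (%) vs. averaged QoS grades
slope, intercept, r2 = linear_fit_r2(
    [0, 2, 4, 6, 8, 10], [4.0, 3.6, 3.1, 2.5, 2.2, 1.7])
```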

Figure 4.5: The relationship of the video quality QoS grade and packet loss percentage setting


Figure 4.6: The relationship of the video quality QoS grade and delay setting


Figure 4.7: The relationship of the video quality QoS grade and bandwidth setting

Figure 4.7 shows that the amount of bandwidth generally has a positive effect on the QoS grade. This implies that considering bandwidth as an additional statistical parameter could be worthwhile future work for this project.

4.6 Summary


Conclusions and Future work

This chapter states some conclusions based on the entire project. Moreover, it also discusses some of the limitations and suggests some potential future work. The chapter ends with a few relevant reflections on this work.

5.1 Conclusions

The project consisted of data processing and modeling to understand the perceived quality of WebRTC conferences. After applying data processing and principal component analysis, Section 3.2 describes which WebRTC stats-parameters are important to gather and analyze in order to predict the perceived conference quality, which was the first expected deliverable. To produce the second expected deliverable, this project fed the outcomes of the PCA into a GMM clustering model. Based on the clustered outcomes, a QoS grade was assigned to each cluster. Subsequently, a Random Forest classifier was trained to predict a QoS grade from newly collected WebRTC stats data during a WebRTC conference. Section 3.4 presents what actions should be taken to improve the perceived video quality, and Section 4.3 describes stepwise how to integrate these action decisions into a working proof of concept, which produced the last expected deliverable. Finally, after implementing an experiment on the Ericsson contextual test platform, where the recommended remedies from the back-end model were shown to the user, who could act on them to improve the quality of the current session, the project verified that two of the stats-parameters used for assessing QoS (network delay and packet loss percentage) have a negative effect on the perceived video quality, but to different degrees. Additionally, the available bandwidth turned out to be an important factor. However, as WebRTC's Statistics APIs do not currently provide the available bandwidth, the project did not consider it as a stats-parameter for predicting the video quality; hence it should be added as an additional stats-parameter in the future.

5.2 Limitations

This section lists some limitations of this thesis project. These limitations were:

• Because all of the software was deployed on a single local host, the performance of this computer was a limitation. During the data processing phase, it took more than 28 hours to load all of the original historical data (4 GB) into Elasticsearch, and training the model took a few hours.

• For Section 3.2, the lack of data diversity is also a limitation. There may be more than 5 clusters, but there was not enough data to reveal them. Furthermore, there was very little data with bad QoS grades. Had there been sufficient data in the bad-quality clusters, there would have been no need to apply an over-sampling algorithm to generate new "bad" samples.

• Some elements of the software used in the thesis project are not open source, hence details of them are not presented in this thesis.
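The over-sampling mentioned above can be illustrated with its simplest variant, random over-sampling by duplication. The function and the toy data below are hypothetical; the thesis does not prescribe this exact algorithm.

```python
import random

def random_oversample(samples, labels, target_label, factor, seed=0):
    """Grow the minority class by `factor` by duplicating its samples at random."""
    rng = random.Random(seed)
    minority = [s for s, l in zip(samples, labels) if l == target_label]
    extra = [rng.choice(minority) for _ in range(len(minority) * (factor - 1))]
    return samples + extra, labels + [target_label] * len(extra)

# Two "good" sessions, one "bad" session: [delay (ms), packet loss (%)].
X = [[50, 0.5], [55, 0.4], [400, 8.0]]
y = ["good", "good", "bad"]
X2, y2 = random_oversample(X, y, "bad", factor=3)
print(y2.count("bad"))  # → 3
```

More elaborate schemes (e.g. synthesizing new samples by interpolation rather than duplication) reduce the risk of the classifier overfitting to a handful of repeated "bad" sessions.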

5.3 Future work


The project built the model with the parameters that WebRTC's Statistics APIs currently provide. Expanding to additional stats-parameters, such as available bandwidth, could be a future task. In addition, gathering additional stats data would help to explore the probability of corner cases. Last but not least, testing could be conducted on commercial WebRTC products facing the public; hence, a future evaluation could be made in the real world rather than in a simulation (emulation) environment.
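Until a dedicated available-bandwidth stat is consumed, the sending bitrate can be roughly approximated from successive cumulative byte counters, such as the `bytesSent` field of consecutive WebRTC getStats() reports. The function below is a hypothetical sketch of that computation, not part of the thesis implementation.

```python
def estimate_bitrate_kbps(prev_bytes, curr_bytes, interval_s):
    """Approximate the sending bitrate (kbit/s) from two cumulative byte
    counters sampled `interval_s` seconds apart, e.g. successive
    `bytesSent` values from WebRTC getStats() reports."""
    return (curr_bytes - prev_bytes) * 8 / 1000 / interval_s

# 250 kB sent over 2 s corresponds to 1000 kbit/s.
print(estimate_bitrate_kbps(1_000_000, 1_250_000, 2.0))  # → 1000.0
```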

5.4 Reflections


