Determining multimedia streaming content

Richard Tano

November 25, 2011

Master’s Thesis in Engineering Physics, 30 credits

Supervisor at Ericsson: David Lindegren

Examiner: Jerry Eriksson

Umeå University

Department of physics

SE-901 87 UMEÅ

Abstract

This Master's thesis report was written by Richard Tano, an Engineering Physics student at Umeå University, during his thesis work at Ericsson Luleå.

Monitoring network quality is of utmost importance to network providers. This can be done with models that evaluate QoS (Quality of Service) and conform to ITU-T Recommendations. When determining video stream quality it is more important to evaluate the QoE (Quality of Experience), in order to understand how the user perceives the quality. QoE is ranked in MOS (mean opinion score) values. An important aspect of determining the QoE is the video content type, which is correlated with the coding complexity and MOS values of the video. In this work the possibilities of improving quality estimation models complying with ITU-T Study Group 12 (Q.14) were investigated. Methods were evaluated and an algorithm was developed that applies time series analysis of packet statistics to determine the MOS scores of video streams. The methods used in the algorithm include a novel combination of frequent pattern analysis and regression analysis. A model that incorporates the algorithm for use from low to high bitrates was defined. The new model resulted in around 20% improved precision in MOS score estimation compared to the existing reference model. Furthermore, an algorithm using only regression statistics and modeling of related statistical parameters was developed. Its improvement in coding estimation was comparable to that of the earlier algorithm, but efficiency increased considerably.

Determining the content of multimedia streams

Summary

This thesis was written by Richard Tano, a student at Umeå University, for Ericsson Luleå.

Monitoring network performance is of utmost importance to network providers. This is done with models for evaluating QoS (Quality of Service) that conform to ITU-T Recommendations. When determining the quality of video streams it is more meaningful to evaluate QoE (Quality of Experience) in order to gain insight into how the user perceives the quality. This is graded in MOS (mean opinion score) values. An important aspect of determining QoE is the type of video content, which is correlated with the coding complexity and MOS values of the video. In this work the possibilities of improving the quality estimation models while complying with ITU-T Study Group 12 (Q.14) were investigated. Methods were investigated and an algorithm was developed that uses time series analysis of packet statistics to estimate the MOS values of video streams. The methods included in the algorithm are a newly developed frequent pattern method together with regression analysis. A model that uses the algorithm from low to high bitrates was defined. The new model gave around 20% improved precision in the estimation of MOS values compared to the existing reference model.

Acronyms

LR Linear regression
LOR Logistic regression
FPD Frequent pattern discovery
MOS Mean opinion scores
PEVQ Perceptual Evaluation of Video Quality
QoS Quality of Service

Contents

2.4.3 Tools utilized
2.5 Related Work
3 Background
3.1 ITU
3.1.1 ITU-T Recommendations
3.2 RTP
3.2.1 RTP header
3.3 Video streams
3.3.1 Compression
3.3.2 Image coding
3.3.3 Video coding
3.3.4 H.264
3.4 Quality assessment
3.5 PEVQ values
3.6 Timeseries analysis
3.6.1 General analysis
3.6.2 ARIMA
3.6.3 Timeseries matching
3.6.4 Frequent pattern discovery
3.7 Regression analysis
3.7.1 Multiple linear regression
3.7.2 Logistic regression
3.8 Cross-validation
4 Results
4.1 Algorithm description
4.1.1 Overview
4.1.2 Frequent pattern discovery (FPD)
4.1.3 Regression analysis
4.1.4 Combining the methods
4.1.5 Algorithm stepwise
4.1.6 Algorithm for all methods
4.1.7 Algorithm with only regressions
4.1.8 Algorithm with modeled regression
4.2 Algorithm numerical results
4.2.1 Mathematical approach
4.2.2 Results in numbers
5 Conclusions
5.1 Data analyzed
5.2 Methods analyzed
5.2.1 Trends and cyclical behaviour methods
5.2.2 Time series matching
5.2.3 FPD
5.2.4 Regression
5.2.5 Analysis of statistic parameters
5.3 Discussion of results
5.4 Restrictions and limitations
5.5 Future work
6 Acknowledgements
References

List of Figures

3.1 Video stream as frame sizes versus time (measured in frame number)
3.2 RTP header scheme [7]
3.3 Intra coding
3.4 Inter coding
3.5 Quality (PEVQ) vs bit rate for three different contents
3.6 Basic structure of PEVQ measure algorithm [14]
4.1 Compressed frame sequence
4.2 Flow of algorithm
4.3 Error reduction versus bitrates for selected algorithms
4.4 Residuals of old regression model
4.5 Residuals of linear regression model
4.6 Residuals of logistic regression model
4.7 Residuals of combined regression model
4.8 Residual comparison at 300 kb/s bitrate between existing model and new model using algorithm V2
4.9 RMSE reference vs RMSE modeled regression for various bitrates
A.1 LR regression parameter
A.2 Standard deviation of frame diff sequences
A.3 Median of frame sequence
A.4 Longest calm period of frame sequences
A.5 Longest small period of frame difference sequences
A.6 Longest large period of frame sequences
A.7 Number of passes through median of frame sequences
A.8 LOR category intercept1
A.12 Mean of frame difference sequences
A.14 Standard deviation of frame sequences
A.15 Median of frame sequences
A.16 Longest calm period of frame sequences
A.17 Longest small period of frame difference sequences
A.18 Longest small period of frame sequences
B.1 A number of frame sequences with PEVQ values of around 0-2.8
B.2 A number of frame sequences with PEVQ values of around 2.9-3.2

List of Tables

2.1 Tasks performed
4.1 LR regression params
4.2 LOR regression params
4.3 LR modeled
4.4 LOR modeled
4.5 Error reduction from methods versus bitrates
4.6 Error reduction with algorithms versus bitrates
4.7 RMSE for algorithms versus bitrate

Chapter 1

Introduction

Multimedia streaming has never been more widely used than now, with YouTube and mobile TV exploding in popularity. By monitoring these services, mobile operators and internet providers can see how well their networks are working and possibly how satisfied their customers are, without having to arrange polls and surveys. An important service is video streaming, which is one of the most bandwidth-demanding services. One way to monitor service quality is to use all user traffic in a live network, where hundreds of thousands of video clients report back to a measurement collection server. Many different contents will be used, and there is no way for the client to obtain the original, uncoded file. In this case a parametric video quality model is used, which takes only parameters from the client and sends a quality score back to the measurement server. Parametric models have so far had no good way to separate different contents, meaning that the same performance indicators (packet loss, bit rate etc.) will lead to the same score regardless of how easy the clip is to encode.

The task for the thesis is to use the information available in multimedia clients to predict the content type, codec and other parameters that could help to enhance parametric models. ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union) creates standards for infocommunications. Currently it is developing new standards for QoS (Quality of Service) and QoE (Quality of Experience). One of its study questions handles the development of parametric models for quality measurement purposes. Specifically, questions regarding models that do not use the DPI (deep packet inspection) method are being worked on. DPI models can be used to separate contents; however, because of limitations in areas such as security and technology the DPI method may not be applicable, and thus the need for alternative models is high in the industry and at Ericsson.

1.1 Ericsson

Ericsson is one of the largest companies in Sweden. The company was founded in 1876 by Lars Magnus Ericsson and is now headquartered in Kista, Stockholm. It is a provider of

Chapter 2

Problem Description

2.1 Problem Statement

The aim of the thesis work is to find a mathematical way to classify video parameters from multimedia streams (video streams). Specifically, the aim is to find a mathematical model that can rank a video stream by its level of coding complexity based on these parameters. The focus is on video streams coded in H.264 with a bit rate of 300 kb/s. The transport protocol used is RTP and the resolution is QVGA. The thesis work was later extended to include investigation of a wide range of bitrates.

2.2 Goals

The goal is to create and describe a mathematical model that complies with, and can be used in, the ITU-T Study Group 12 (Q.14) standard P.NAMS, and that accesses only basic protocol information in the data packets of video streams to classify video parameters.

2.3 Purposes

The purpose is to reduce the error in existing parametric QoS models that comes from uncertainty in the classification of the coding complexity of video streams.

2.4 Methods

The work starts with a literature study and a search for potential mathematical methods to use. Testing the mathematical model requires analysis of video streams, which are created using encoding software. MatLab is used to build the mathematical models, run simulations and analyze the results.

2.4.1 Planning

The work schedule was planned as follows:

1. Write specification: Specify the objective, limitations and time plan of the thesis work.

2. Literature study: Acquire knowledge about the background of the problem. Search for existing work in the area. Look for potential mathematical methods and possibilities to use in model building.

3. Model decision: Determine which mathematical methods to use.

4. Data creation: Find media stream content to use in model testing and format it as data (packet streams).

5. Model building: Create the different model algorithms.

6. Model simulation: Try the models on the data (packet streams and PEVQ scores).

7. Data analysis: Analyze the results of the model simulation.

8. Model analysis: Evaluate whether changes to the models can be made. If so, start over at step 5.

9. Report writing: If the results are sufficient, write the report.

In addition, two presentations were given to informed coworkers at Ericsson during the thesis work.

2.4.2 How the work was done

Table 2.1 shows the tasks planned during the work.

2.4.3 Tools utilized

Here follows a list of the tools that were used during the work.

– Model building and simulation: MatLab 7.10.0 (R2010a)
– Video streams: Contents on Ericsson server
– Simulation of packet streams: Oqtopus (Ericsson proprietary encoding script system)
  • x264 encoder version r1867

Activity          Area
Research          Gathering information
Documentation     Project plan
Reading           Transport protocol
Reading           Video coding
Reading           Mathematical methods
Administration    Installing software
Work              Defining model scenarios
Work              Build models
Work              Data gathering
Testing           Models
Analysis          Data
Documentation     Report
Presentation      Thesis

Table 2.1: Tasks performed

2.5 Related Work

Chapter 3

Background

New media services and networks experience time-varying performance. Monitoring systems are used to measure how users experience the quality of the services. For video streams the performance is often assessed in terms of quality of service (QoS), which for example includes information on lost, dropped and resent data packets. As a result of new video standards, greater quality degradations are accepted. To know whether the end user receives the promised quality, it is necessary to assess the quality as perceived by the user, referred to as quality of experience (QoE). The most accurate way to make this assessment is through perception tests with human subjects. These tests demand large resources and cannot be run continuously to monitor the quality of a running service. Because of this, other methods have been developed that use quality models which map QoS performance indicators to the user-perceived quality obtained from perceptual quality tests [2]. These techniques fall under the ITU-T Recommendations (standards) for objective video quality assessment. When building models that follow ITU-T Study Group 12 Question 14, there are restrictions on the use of data packet information (P.NAMS). When analyzing the video streams to determine video parameters, only packet statistics can be used as input to the models. Video streams can be viewed as time series of frame sizes (figure 3.1). Possible mathematical methods for analyzing the packet statistics include time series analysis and regression analysis. Time series analysis focuses on patterns in the data streams, while regression analysis uses statistical parameters of the data streams.

3.1 ITU

ITU (International Telecommunication Union) has been creating standards in infocommunications since 1865. It became a United Nations specialized agency in 1947 and its standards are used worldwide. The ITU Telecommunication Standardization Sector (ITU-T) produces the standards called ITU-T Recommendations. The Recommendations become mandatory only when adopted as part of a national law. These Recommendations define how telecommunication networks operate and interwork. ”Over 3000 Recommendations are in worldwide use for various topics ranging from network architecture and security to transmission systems and next-generation networks.” [3]

Figure 3.1: Video stream as frame sizes versus time (measured in frame number)

3.1.1 ITU-T Recommendations

ITU-T has a study group (Study Group 12) for developing new standards in QoS (Quality of Service) and QoE (Quality of Experience). This study group has been assigned the period 2009-2012 to determine new standards in this field. [4]

Question 14 (Q.14/12) handles the development of parametric models for media quality measurement purposes (Development of parametric models and tools for audiovisual and multimedia quality measurement purposes). These measurement models are used to estimate the user experience; as ITU-T writes: ”Measures that predict user-experience are useful in monitoring and managing time-varying performance and help to facilitate the rollout, efficient operation and effective service management of such networks.” [5]

3.2 RTP

When sending data over networks, specific transport protocols are used. One of the most common is the Real-time Transport Protocol (RTP). This protocol is used in applications that transmit data with real-time properties, such as audio, video or simulation data, over multicast or unicast networks. RTP does not by itself provide any quality of service guarantees; it uses the RTP Control Protocol (RTCP) to monitor quality of service. RTP is usually run on top of another network protocol (typically UDP). Both protocols contribute to the transport protocol functionality. [7]

Information is delivered over networks in data packets. Limitations in network performance affect how fast and reliably packets can travel from sender to receiver. Packets can be of various sizes, up to an upper limit. Besides the content itself, the packets also have headers that carry important parameters. These parameters are used for delivery and to specify the information in the packets. Parameters could for example be IP addresses, content type parameters, compression formats, encryption etc.

3.2.1 RTP header

When data packets are sent using a network protocol, they are encapsulated with additional header information. RTP uses a number of header fields [7]: version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp, SSRC and CSRC. The sequence numbers allow the receiver to reconstruct the sender's packet sequence and, in video decoding, to determine the proper location of a packet. This removes the need to decode packets in sequence order. Sequence numbers can also be used to detect packet loss. The marker can be used to signal significant events in the packet stream, for example frame boundaries. The timestamp reflects the sampling instant of the first octet in the RTP data packet. The clock increments monotonically and linearly. Its frequency depends on the format of the data carried in the payload and is specified statically in the profile or payload format. Figure 3.2 shows a schematic view of the RTP header.
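As an illustration of the header fields listed above, the following minimal sketch unpacks the fixed 12-byte RTP header and any CSRC identifiers from a raw packet. It is written in Python for brevity (the thesis work itself used MatLab) and is not taken from the thesis; the function name `parse_rtp_header` and the returned dictionary keys are hypothetical, and header extensions are deliberately ignored.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed RTP header fields and the CSRC list of one packet."""
    if len(packet) < 12:
        raise ValueError("an RTP packet has at least a 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet too short for its CSRC list")
    csrc = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": csrc,
        # Offset of the payload, assuming no header extension is present.
        "payload_offset": end,
    }


# Example: a packet with V=2, marker set, payload type 96.
example = struct.pack("!BBHII", 0x80, 0xE0, 1000, 90000, 0x12345678) + b"payload"
print(parse_rtp_header(example))
```

Only the sequence number, marker and timestamp are needed to count lost packets and to group packets into frames, which matches the restriction to basic protocol information described earlier.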

3.3 Video streams

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3.2: RTP header scheme [7]

AVC. In recent years the most common technique used in streaming high quality video is H.264.

3.3.1 Compression

Compression is used when sending or streaming data to reduce the data size (e.g. in the coding of images and video). There are two basic types of compression, lossless and lossy. [8]

1. Lossless compression: All information is kept and only knowledge about the source is needed. Size is reduced by increasing the information contained in every data bit sent. Various techniques exist, but the achievable compression is limited by the entropy of the source.

2. Lossy compression: Gives better compression, but information is lost and knowledge about both the receiver and the source is required. The size reduction comes mainly from removing and distorting details and from not requiring exact reconstruction at the receiver.

3.3.2 Image coding

Images are built up from pixels, each of which normally has three color values (RGB) that dictate which color it represents. For compression it is better to convert these values to another color space (YUV) and thus reduce the correlation between the components. In this color space each pixel is represented by a luminance component (brightness) and two color values.
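The color space conversion can be sketched as follows. The thesis does not state which conversion matrix is meant, so this example assumes the common BT.601 luminance weights and the classic analog YUV scaling factors; it is written in Python rather than MatLab, and the function name `rgb_to_yuv` is hypothetical.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to YUV (BT.601 weights, assumed here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (brightness)
    u = 0.492 * (b - y)                     # blue color difference
    v = 0.877 * (r - y)                     # red color difference
    return y, u, v


# A gray pixel carries all of its information in the luminance component.
print(rgb_to_yuv(128, 128, 128))
# A saturated red pixel has a negative U and a positive V component.
print(rgb_to_yuv(255, 0, 0))
```

For gray pixels, where R, G and B are equal and therefore fully correlated, the two color values become zero and only the luminance remains.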

3.3.3 Video coding

In video coding the pictures are commonly divided into macroblocks, usually 16x16 pixels in size. There are two basic ways of coding these macroblocks, intra coding and inter coding. [9]

1. Intra coding (I): The macroblock is coded like a still image block, with similar techniques such as DCT transformations. Figure 3.3 shows an intra coding scheme.

2. Inter coding (P): Similar macroblocks are searched for between the current and previous images. The macroblock is then coded by a motion vector and a difference block. This requires less data than intra coding. Figure 3.4 visualizes the motion vector.

Figure 3.3: Intra coding

The frames in video coding are arranged into different categories depending on how their macroblocks are coded.

1. I-pictures: All macroblocks are intra coded.

2. P-pictures: Macroblocks can be P-coded or skipped.

The first picture in every video stream has to be an I-picture. Different standards have various schemes for the use of I- and P-pictures.

The amount of motion dictates the degree of compression: less motion gives better compression and smaller data streams.

3.3.4 H.264

Figure 3.4: Inter coding

the Joint Video Team (JVT), which was a collaboration between the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It is used by many internet streaming services and software products, but also in cable/satellite television services, Blu-ray players, real-time videoconferencing and more.

H.264 is more complex (up to five times) than previous video standards but also gives a greater reduction in bitrate (20-50% compared to H.263) and works with both low and high bitrates. [9] The most important changes from previous standards are:

1. Usage of parameter sets instead of picture headers

2. Intra coding now results in 4x4 difference blocks with many different directional modes

3. Inter coding also results in 4x4 blocks with many previous pictures used as references with more motion vectors and lower resolution

4. Integer transform instead of DCT transform

5. Prediction is used in coefficient coding

6. Loopfilter used

3.4 Quality assessment

When doing quality assessments for video streams a measurement algorithm gives quality scores, MOS (Mean Opinion Scores), depending on the picture quality.

from 1 to 5, with 1 being bad (lowest perceived quality) and 5 being excellent (highest perceived quality). [10]

Subjective tests with human viewers are the most accurate and common way to measure MOS scores. Another way to make measurements is with full reference models, which use parametric models to compare the original video with the coded video. Yet another way is to use a parametric model that uses only the streamed (coded) video. Both of these model types need to learn from human users (subjective tests) which parameters affect the experienced quality. The MOS values can then be estimated from these parameters.

Video quality in video streams is highly correlated with the bit rate. A high bit rate makes it possible for the encoder to use more data for each frame and thus to compress the picture less. Figure 3.5 shows the relationship between bit rate and video quality, measured in PEVQ values (see 3.5), for three video clips with different contents.

Figure 3.5: Quality (PEVQ) vs bit rate for three different contents

In quality tests with humans it has been found that human memory affects the quality results, and thus content length has to be considered when conducting tests. It has been proposed that tests no longer than twenty seconds be used. [12] As a result, the preferred length of clips used in Ericsson models is ten seconds.

3.5 PEVQ values

ITU-T Recommendation J.247 (2008). The MOS-LQO scale goes from 1 (worst) to 4.5 (best). [11] The actual PEVQ value can however go below 1 and even reach negative values because of limitations in the algorithm used.

PEVQ gives MOS estimates of video quality degradation by comparing the undistorted reference video signal with the streamed video signal. It is a full reference, intrusive measurement algorithm. The approach of PEVQ is to model the human visual system and quantify the anomalies perceived in the video signal by a number of key performance indicators (KPIs). This estimation includes both packet level impairments (loss, jitter) and signal related impairments (blockiness, jerkiness, blur etc.) caused by coding of the video. [13] Figure 3.6 shows an overview of the structure of the PEVQ algorithm.

Figure 3.6: Basic structure of PEVQ measure algorithm [14]

3.6 Timeseries analysis

A time series is a sequence of values or measurements over time. Successive values are assumed to be taken at equally spaced time intervals.

Time series analysis builds on identifying the pattern in the observations and using it to describe the behavior of the sequence or to forecast future values.

3.6.1 General analysis

Most patterns in time series can be explained by two basic components, trend and seasonality.

Trend is a component that changes over time and can be either linear or nonlinear. It does not repeat itself over time.

Seasonality is a component that also changes over time, but unlike the trend it repeats itself at systematic intervals.

For trend analysis there are no automatic techniques proven to work. The first step in most techniques is to remove the error component by smoothing the series. One of the most common techniques is the moving average, which replaces each value with the average of the surrounding values. Either the median or the mean can be used as the average; the median is more robust to outliers. The next step in trend analysis is fitting a function to the series, commonly a linear function. For this to work the series may need to be transformed to remove nonlinearity, for example with a logarithmic or polynomial function.
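The smoothing step can be illustrated with a short sketch, written in Python rather than the MatLab used in the thesis. The function name `smooth`, the centered window and the truncated windows at the series edges are assumptions made for this example, not details taken from the thesis.

```python
from statistics import mean, median


def smooth(series, window=3, average=mean):
    """Centered moving average; `average` may be `mean` or `median`.

    Near the edges the window is truncated to the values that exist.
    """
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(average(series[lo:hi]))
    return smoothed


frame_sizes = [1, 2, 100, 4, 5]           # one outlier at index 2
print(smooth(frame_sizes, 3, mean))       # the outlier leaks into its neighbors
print(smooth(frame_sizes, 3, median))     # the outlier is suppressed
```

Comparing the two printed series shows why the median variant is described as more robust: a single extreme frame size barely affects it.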

Seasonality analysis aims to find the correlation between values in the series. The period between repetitions of a pattern in the series is called the lag. The correlation between two terms separated by a given lag can be measured by autocorrelation. [15]
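A minimal sketch of the sample autocorrelation at a given lag is shown below, in Python rather than MatLab. It uses the common estimator that normalizes by the total variance of the series; the thesis does not say which estimator was used, and the function name `autocorrelation` is hypothetical.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    m = sum(series) / n
    variance = sum((x - m) ** 2 for x in series)
    if variance == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    covariance = sum((series[i] - m) * (series[i + lag] - m)
                     for i in range(n - lag))
    return covariance / variance


# A series alternating between large and small values repeats with period 2.
alternating = [1, 0, 1, 0, 1, 0, 1, 0]
for lag in (1, 2):
    print(lag, autocorrelation(alternating, lag))
```

A strongly positive value at lag 2 and a strongly negative value at lag 1 is the signature of a pattern that repeats every second sample.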

3.6.2 ARIMA

ARIMA is used for generating forecasts in time series analysis. The basic methodology behind it is the estimation of sets of coefficients that describe consecutive elements of the time series based on earlier, time-lagged elements.

The method is complex and requires the time series to be stationary. To use the model one needs to make the series stationary and remove serial (seasonal) dependency. This means differencing the series until stationarity is achieved. For good results the user needs to examine plots and autocorrelograms to find a suitable level of differencing.
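Differencing itself is a simple operation, sketched here in Python rather than MatLab. The function name `difference` is hypothetical, and choosing the order still requires the inspection of plots and autocorrelograms described above.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: x[t] - x[t-1]."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


quadratic_trend = [1, 4, 9, 16, 25]
print(difference(quadratic_trend, 1))   # still increasing: a linear trend remains
print(difference(quadratic_trend, 2))   # constant: the trend has been removed
```

Each round of differencing shortens the series by one value, which is one reason not to difference more often than necessary.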

3.6.3 Timeseries matching

Piecewise constant and symbolic basis

A piecewise quantized basis starts with the series being divided into k segments. All points inside a segment are represented by their mean value.

This quantization can be done in MatLab through the functions reshape [21] and mean [22]. A code example is shown below.

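    % Each column of Ksegments holds one segment of SegmentLength frames
    Ksegments = reshape(DataFrames, SegmentLength, NumberCoefficients);
    % Column-wise mean: one mean value per segment
    Kmeans = mean(Ksegments);

DataFrames: Frame sequence to quantize; it must contain SegmentLength × NumberCoefficients values
SegmentLength: Length of segments
NumberCoefficients: Number of symbolic segments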

These mean values can also be quantized into levels and exchanged for the symbolic counterpart representing the level (symbolic quantization).
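The two quantization steps can be sketched together as follows, in Python rather than MatLab. The function names, the symbol alphabet and the level limits are assumptions chosen for this example; the actual symbol levels used in the thesis are not given in this section.

```python
def piecewise_means(frames, segment_length):
    """Mean of each consecutive block of `segment_length` values.

    A trailing block shorter than `segment_length` is dropped.
    """
    count = len(frames) // segment_length
    return [
        sum(frames[i * segment_length:(i + 1) * segment_length]) / segment_length
        for i in range(count)
    ]


def symbolize(values, upper_limits, symbols):
    """Replace each value by the symbol of the first level it fits into.

    `upper_limits` must be sorted ascending, and `symbols` needs one more
    entry than `upper_limits` (the last symbol is the open top level).
    """
    if len(symbols) != len(upper_limits) + 1:
        raise ValueError("need exactly one more symbol than upper limits")
    result = []
    for value in values:
        for limit, symbol in zip(upper_limits, symbols):
            if value <= limit:
                result.append(symbol)
                break
        else:
            result.append(symbols[-1])
    return "".join(result)


frames = [10, 20, 30, 50, 200, 220, 90, 110]
means = piecewise_means(frames, 2)
print(means)
print(symbolize(means, upper_limits=[25, 50, 150], symbols="ABCD"))
```

The resulting string is a compact representation of the frame sequence on which pattern searches can operate.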

Euclidean distance

This is the most widely used distance measurement method. The definition is expressed in equation (3.1).

$$L_2 = \sqrt{\sum_{i} (a_i - b_i)^2} \qquad (3.1)$$

3.6.4 Frequent pattern discovery

Specific patterns of variable length can have a higher chance of occurring in a series depending on its parameters. The occurrence of patterns can be measured against database patterns. Classification can thus be done through similarity comparisons.

3.7 Regression analysis

Regression is the statistical method of trying to find a relationship between a number of variables that predicts an outcome [17]. Regression can use one or many (multiple regression) independent variables (predictor variables) to predict a dependent variable (response variable), which is the outcome.

Linear regression (LR) describes the relationship by the straight line that best approximates the individual data points. Other forms of relationship between the predictor and response variables can also be used, e.g. quadratic or logarithmic.

$$Y = a + b \times X + u \qquad (3.2)$$

Where Y is the dependent variable, a is the intercept, b is the slope, X is the independent variable and u is the residual.

The slope b is the parameter that indicates how much a change in the predictor variable affects the response variable.

The accuracy of the prediction is a result of the model's fit. This fit depends on how well the linear relationship corresponds to the data points. To assess the fit, a number of values can be calculated.

The residuals are an important marker of how good the regression model is. A residual is the difference between the model prediction and the actual observation at that value of the independent variable. Residual plots can reveal patterns or misbehavior in models.

The R-value is another important value. It indicates how good the total fit is between the model and the data, summarizing the residuals in a single number.

Through a statistical F-test, the hypothesis that the model explains the relationship (effect different from zero) can be evaluated. The outcome of the test is an F-value that corresponds to a p-value. If the p-value is below the requested statistical significance level the hypothesis is kept; otherwise it is dropped.
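The quantities discussed above can be made concrete with a minimal ordinary least squares sketch for a single predictor, in Python rather than MatLab. The function name `fit_line` is hypothetical, the residual is taken as observation minus prediction, and the coefficient of determination R² is used as the summary value, which is an assumption about what the R-value refers to.

```python
def fit_line(x, y):
    """Least squares fit of y = a + b*x; returns (a, b, residuals, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept
    # Residual: observed value minus the value predicted by the line.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return a, b, residuals, r_squared


a, b, residuals, r_squared = fit_line([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
print(a, b, r_squared)
print(residuals)
```

Plotting the residuals against x is the residual plot mentioned above; a visible curve or funnel shape suggests that a straight line is not an adequate model.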

There are various forms of regression, two of them are multiple linear regression and logistic regression.

3.7.1 Multiple linear regression

As with ordinary LR, multiple linear regression (MLR) predicts a dependent variable from independent variables, but in this case there can be several independent variables. In MLR, relationships between the independent variables can also be used to create new predictor variables in the model.

Multiple linear regression takes a form similar to ordinary linear regression (3.3).

$$Y = a + b_1 \times X_1 + b_2 \times X_2 + b_3 \times X_3 + \dots + b_t \times X_t + u \qquad (3.3)$$

Where the subscripts indicate the number of the independent variable and its corresponding parameter.

The same statistical parameters can be found in multiple LR as in ordinary LR. In the same way, t-tests can be carried out on each of the b values. If the corresponding p-value is outside the desired significance level, that b value can be dropped.

Multiple linear regression can be done in MatLab through the regstats function. [19]

3.7.2 Logistic regression

The difference from other linear regressions is that the response variable can be discrete. It can thus only attain certain values, and the outcome of the logistic regression model is a prediction of which value is the best fit. The discrete values can also be viewed as categories. For logistic regression there is no equivalent to R-square. Models can instead be compared with another measure called the deviance of the fit, which is twice the difference between the maximum possible log-likelihood and the log-likelihood of the fitted model.

Logistic regression can be done in MatLab through the mnrfit function. [20]

3.8 Cross-validation

Chapter 4

Results

4.1 Algorithm description

4.1.1 Overview

Using the header timestamps of the packets in a media stream, the frames can be assembled into a sequence consisting of the frame sizes in time order. These frame sequences consist of two different interleaved series: the even and the odd frames each follow their own pattern, often with the odd frames being very small while the even frames are large, or the reverse. One way to handle this behavior is to combine each pair of frames into a new frame whose size is the sum of their sizes. These combined frames create a new frame sequence, which is subsequently analyzed with regard to the frame sizes (FS).

The analysis also uses the difference between two consecutive frames, which is put in a new sequence consisting of the frame size differences in time order (FDS).
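The construction of the two sequences can be sketched as follows, in Python rather than the MatLab used in the thesis. All function names are hypothetical. The sketch assumes that packets sharing an RTP timestamp belong to the same frame, ignores timestamp wrap-around, and computes the FDS from the paired sequence; the text leaves open whether the differences are taken before or after pairing.

```python
def frame_sizes(packets):
    """Sum payload sizes per timestamp; `packets` holds (timestamp, size) pairs."""
    sizes = {}
    for timestamp, size in packets:
        sizes[timestamp] = sizes.get(timestamp, 0) + size
    return [sizes[t] for t in sorted(sizes)]


def pair_frames(frames):
    """Add each pair of consecutive frames; an unpaired last frame is dropped."""
    return [frames[i] + frames[i + 1] for i in range(0, len(frames) - 1, 2)]


def frame_differences(frames):
    """Difference between each frame and its predecessor."""
    return [b - a for a, b in zip(frames, frames[1:])]


packets = [(0, 500), (0, 300), (3000, 40), (6000, 700), (9000, 60)]
frames = frame_sizes(packets)      # alternating large and small frames
fs = pair_frames(frames)           # FS sequence
fds = frame_differences(fs)        # FDS sequence (assumed to use paired frames)
print(frames, fs, fds)
```

Pairing removes the alternation between large and small frames, so that the remaining variation in FS reflects the content rather than the frame parity.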

The sequences are analyzed and converted into predicted MOS scores ($\widehat{MOS}$) by a combination of three methods: frequent pattern analysis and two types of regression analysis.

4.1.2 Frequent pattern discovery (FPD)

The frequent pattern discovery method estimates $\widehat{MOS}$ scores for sequences by assigning them to categories. This categorization is done by searching for frequently occurring patterns in each sequence and comparing them with the patterns belonging to each category.

Each category consists of an upper and a lower PEVQ limit, a PEVQ mean and a number of common patterns. These patterns are taken from frame sequences whose PEVQ scores lie inside the PEVQ interval of the category. The patterns consist of symbols, each indicating an interval of values.

When a frame sequence is categorized, the PEVQ mean of the category with the highest similarity between the patterns in the sequence and those in the category is assigned to the sequence as a prediction of its $\widehat{MOS}$ value.

For better prediction, patterns of several different lengths can be used. Both the FS and FDS sequences can also be used.

The method therefore consists of two parts: creation of the database patterns and analysis of a frame sequence.

Creation of database

Input to this method for creation of database patterns are the frame sequences (both ordinary frame size sequences and frame difference size sequences), the number of categories to be used, which symbol levels to use, how long the segments are going to be and the number of symbols in the patterns (length of patterns).

The first step of this method is to decide the category PEVQ intervals from the PEVQ scores of the frame sequences given as input to the method. This is done by evenly distributing the sequences over the number of categories to be used. The frame sequences are then put in their respective category according to their PEVQ value.

All frame sequences are reduced in dimensionality by breaking the sequence into segments, averaging the values over each segment and labeling each segment with a symbol. Labeling is done by comparing the value to a symbol database with upper and lower limits for each symbol. If the value is inside the symbol range, that symbol is used. The compressed sequences are then arranged in patterns, which can be of various lengths, for example ABA, AABC or ABCDE.
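A minimal sketch of this compression step is given below. The symbol table, the segment length and the function names are hypothetical values chosen for illustration. Patterns are extracted with a sliding window of one symbol, which is an assumption; the text does not say whether patterns overlap.

```python
from typing import Dict, List, Tuple

# Hypothetical symbol table: symbol -> (lower limit inclusive, upper limit exclusive).
SYMBOL_LEVELS: Dict[str, Tuple[float, float]] = {
    "A": (0, 500),
    "B": (500, 1000),
    "C": (1000, 2000),
    "D": (2000, float("inf")),
}


def symbolize(seq: List[float], segment_length: int,
              levels: Dict[str, Tuple[float, float]] = SYMBOL_LEVELS) -> str:
    """Average each full segment and label it with the matching symbol."""
    symbols = []
    # Assumption: an incomplete segment at the end of the sequence is ignored.
    for start in range(0, len(seq) - segment_length + 1, segment_length):
        segment = seq[start:start + segment_length]
        avg = sum(segment) / segment_length
        for symbol, (lower, upper) in levels.items():
            if lower <= avg < upper:
                symbols.append(symbol)
                break
    return "".join(symbols)


def extract_patterns(symbols: str, length: int) -> List[str]:
    """All windows of the given length, moved one symbol at a time."""
    return [symbols[i:i + length] for i in range(len(symbols) - length + 1)]


sequence = [400, 600, 1200, 1400, 300, 100, 2500, 2700]
compressed = symbolize(sequence, segment_length=2)
patterns = extract_patterns(compressed, length=3)
print(compressed, patterns)
```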

The next step is to compare and count patterns in the frame sequences belonging to each category. Only the most common patterns in each category are kept. A pattern can only be used in one category, so all patterns in a category must be unique. The final step is to give all categories equal length (an equal number of patterns).

Output of the method is the categories and their corresponding patterns. These patterns are for both FS and FDS sequences and can be of various lengths.
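The counting and pruning steps could look roughly like the sketch below. Two choices here are assumptions that the text leaves open: a pattern found in several categories is given to the category where it occurs most often, and categories are made equally long by truncating to the shortest one. The function name and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_pattern_database(patterns_by_category: Dict[int, List[List[str]]],
                           max_patterns: int) -> Dict[int, List[str]]:
    """Keep the most common patterns per category, each pattern in one category only."""
    # Count how often each pattern occurs in the sequences of every category.
    counts = {cat: Counter(p for seq in seqs for p in seq)
              for cat, seqs in patterns_by_category.items()}

    # Assumption: a shared pattern belongs to the category where it is most frequent.
    owner: Dict[str, int] = {}
    for cat, counter in counts.items():
        for pattern, n in counter.items():
            if pattern not in owner or n > counts[owner[pattern]][pattern]:
                owner[pattern] = cat

    database = {}
    for cat, counter in counts.items():
        own = [p for p, _ in counter.most_common() if owner[p] == cat]
        database[cat] = own[:max_patterns]

    # Assumption: equal length is reached by truncating to the shortest category.
    shortest = min(len(pats) for pats in database.values())
    return {cat: pats[:shortest] for cat, pats in database.items()}


training = {
    1: [["ABA", "BAB", "ABA"], ["ABA", "BAC"]],
    2: [["BAC", "ACC", "BAC"], ["ACC", "ABA"]],
}
database = build_pattern_database(training, max_patterns=2)
print(database)
```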

An example of a pattern output, for a category with FS patterns of 3 symbols:

’LHF’ ’HFF’ ’FFE’ ’FEE’ ’EEE’ ’EED’ ’EDB’ ...

Figure 4.1 shows how a frame sequence gets compressed and arranged in specific symbol levels.

FPD analysis of a frame sequence


Figure 4.1: Compressed frame sequence

The method counts, for each category, the number of patterns that the sequence has in common with it. This is done for all the pattern types and lengths in the database. The category with the most hits is chosen as the category estimate for the specific pattern type and length.

Output from the method is a category estimate for each of the different types of patterns and pattern lengths used.
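The hit counting for one pattern type and length can be sketched as below. The database content and the function name are hypothetical, and ties are resolved in favor of the first category, which is an assumption.

```python
from typing import Dict, List, Tuple


def categorize(sequence_patterns: List[str],
               database: Dict[int, List[str]]) -> Tuple[int, Dict[int, int]]:
    """Return the category with the most pattern hits, plus the hit counts."""
    hits = {}
    for category, patterns in database.items():
        known = set(patterns)
        hits[category] = sum(1 for p in sequence_patterns if p in known)
    # Assumption: on a tie, the first category in the database wins.
    best = max(hits, key=hits.get)
    return best, hits


database = {1: ["ABA", "BAB"], 2: ["BAC", "ACC"]}
best, hits = categorize(["BAC", "ACC", "CCA", "ABA"], database)
print(best, hits)
```

Running the same function once per pattern type (FS, FDS) and pattern length gives the set of category estimates that the method outputs.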

4.1.3 Regression analysis

Two regression methods are used: logistic regression and multiple linear regression. Both build upon creating statistical models that depend on a number of statistical parameters extracted from the frame sequences.

Both methods use statistics extracted from the frame sequences by the statistic extraction function.

Statistic extraction function (ST)

This function takes a frame sequence and extracts various statistic parameters from it. Input is the frame sequence to extract statistics from. This can either be an FS or FDS sequence.

Output is the statistics. These are as follows:

1. Mean of sequence (M)

2. Variance of sequence (V)

3. Standard deviation of sequence (S)

4. Modes of sequence (MO)

5. Median of sequence (ME)

6. Longest calm period of sequence (LC). Defined as the longest length of period where the difference of two subsequent frames is smaller than 0.1 times the mean of the sequence.

7. Longest active period of sequence (LA). Defined as the longest length of period where the difference of two subsequent frames is greater than 0.0025 times the mean of the sequence.

8. Longest small period of sequence (LS). Defined as the longest length of period where frames are smaller than 0.9 times the mean of the sequence.

9. Longest large period of sequence (LL). Defined as the longest length of period where frames are greater than the mean of the sequence.

10. Number of bursts in sequence (NB). Defined as the number of times where the difference of two subsequent frames is greater than 0.045 times the mean of the sequence.

11. Number of passes through median of sequence (NP). Defined as the number of times the sequence goes from greater to smaller than the mean of the sequence or the reverse.
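The statistics listed above can be sketched with the Python standard library as follows. This is an illustration under stated assumptions rather than the thesis code: differences are taken as absolute values, variance and standard deviation use the sample (n - 1) form, and NP follows the written definition, which uses the mean even though the parameter is named after the median.

```python
import statistics
from typing import Dict, Iterable, List


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest unbroken run of True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best


def extract_statistics(seq: List[float]) -> Dict[str, object]:
    """Compute the ST parameters for one FS or FDS sequence."""
    mean = statistics.fmean(seq)
    # Assumption: the difference of two subsequent frames is unsigned.
    diffs = [abs(b - a) for a, b in zip(seq, seq[1:])]
    return {
        "M": mean,
        "V": statistics.variance(seq),   # assumption: sample variance
        "S": statistics.stdev(seq),      # assumption: sample standard deviation
        "MO": statistics.multimode(seq),
        "ME": statistics.median(seq),
        "LC": longest_run(d < 0.1 * mean for d in diffs),
        "LA": longest_run(d > 0.0025 * mean for d in diffs),
        "LS": longest_run(x < 0.9 * mean for x in seq),
        "LL": longest_run(x > mean for x in seq),
        "NB": sum(d > 0.045 * mean for d in diffs),
        "NP": sum((a > mean) != (b > mean) for a, b in zip(seq, seq[1:])),
    }


stats = extract_statistics([100, 100, 100, 300, 300, 100])
print(stats)
```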

Linear regression method (LR)

This method uses multiple linear regression analysis to obtain a statistical model for $\widehat{MOS}$ estimation of a frame sequence. A dataset of frame sequences, their corresponding statistical parameters and their PEVQ scores is used to create a regression model where the statistical parameters are the predictor variables (independent variables) and the PEVQ score is the response variable (dependent variable).

Input to the method is the statistics extracted from the frame sequences in ST. An example of statistical parameters to use in a good LR model is shown in table 4.1, with the corresponding regression statistics. The adjusted R-squared value for this regression is 0.3114.

Output of the method is a statistical model, which takes the statistical parameters as input and gives a $\widehat{MOS}$ estimate as output.

Example of output from the LR regression is shown in equation (4.1).


LR regression

Parameter                                            T-statistic   P-value
Standard deviation of frame difference sequence      2.7514        0.0063
Median of frame sequence                             3.6444        0.0003
Longest calm period of frame sequences               -2.4016       0.0170
Longest active period of frame sequences             2.4515        0.0148
Longest small period of frame difference sequences   -3.1349       0.0019
Longest large period of frame sequences              4.3367        0.0000
Number of passes through median of frame sequences   3.5316        0.0005
Total model (F-statistic)                            18.477        2.872e-20

Table 4.1: LR regression params

Logistic regression (LOR)

The logistic regression method uses multinomial logistic regression to categorize frame sequences and then decides a $\widehat{MOS}$ estimate in a similar way to the FPD method. The method fits an ordinal model and uses no interactions between categories.

As in linear regression, a number of statistical parameters are used as input to the regression (independent variables). Also used as input are the same categories that were used in the creation of database patterns; these are the dependent variable in the regression.

An example of statistical parameters to use in a good LOR model is shown in table 4.2, with the corresponding regression statistics. The deviance for this LOR regression is 846.79.

LOR regression

Parameter                                            T-statistic   P-value
Mean of frame difference sequences                   -0.0347       0.9723
Variance of frame sequences                          -2.1505       0.0315
Standard deviation of frame sequences                -3.5486       0.0004
Median of frame sequences                            2.7104        0.0067
Longest calm period of frame sequences               -2.3708       0.0177
Longest active period of frame sequences             1.6882        0.0914
Longest large period of frame sequences              -4.4320       0.0000
Number of passes through median of frame sequences   -4.1254       0.0000

Table 4.2: LOR regression params

Output is a statistical model. The model takes statistical parameters as input and calculates a category intercept value (CIV), which is compared to the intercept values of the categories (also given by the model) to estimate a category for the frame sequence. The number of intercept values depends on the number of categories used. In the case of five categories, four intercept values are provided, each corresponding to the upper level of the real MOS category.

$$CIV = -0.0030\,M - 0.0001\,V + 0.0040\,S - 0.0010\,MO + 0.0166\,LC - 0.0119\,LA - 0.0422\,LL - 0.0350\,NP \qquad (4.2)$$

Intercept values = [1.7580, 2.9797, 4.0756, 5.3036]

Mean PEVQ of categories = [2.25431, 2.76333, 3.07321, 3.32914, 3.67939]
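The text does not spell out exactly how the CIV is compared to the intercepts. The sketch below shows one plausible reading, assuming the cumulative-logit convention of an ordinal model in which P(category <= j) = logistic(intercept_j + CIV), and picking the most probable category. Both the sign convention and the argmax rule are assumptions, and the function names are hypothetical; the numbers are those given above.

```python
import math
from typing import List, Tuple

INTERCEPTS = [1.7580, 2.9797, 4.0756, 5.3036]
CATEGORY_MEAN_PEVQ = [2.25431, 2.76333, 3.07321, 3.32914, 3.67939]


def category_probabilities(civ: float,
                           intercepts: List[float] = INTERCEPTS) -> List[float]:
    """Per-category probabilities from cumulative logits (assumed convention)."""
    # Cumulative probability P(category <= j); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(a + civ))) for a in intercepts] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]


def estimate_from_civ(civ: float) -> Tuple[int, float]:
    """Return (category number starting at 1, category PEVQ mean)."""
    probs = category_probabilities(civ)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best + 1, CATEGORY_MEAN_PEVQ[best]


for civ_value in (0.0, -3.5, -10.0):
    print(civ_value, estimate_from_civ(civ_value))
```

With this convention a more negative CIV moves the estimate towards the higher categories, which matches the negative coefficients on M, LL and NP in (4.2) only if larger values of those parameters indicate higher quality.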

4.1.4 Combining the methods

There are various ways of combining the methods. The combinations all build upon taking a mean value of their estimations (categories or PEVQ values).

One way is to combine the frequent pattern discovery and logistic regression into one category estimation. This category estimation then gives a $\widehat{MOS}$ value through the category PEVQ mean value. The mean of this $\widehat{MOS}$ estimate and the linear regression $\widehat{MOS}$ estimate is then used as a prediction of the $\widehat{MOS}$ score of the frame sequence in question.

4.1.5 Algorithm stepwise

Here follows a flow chart description of the steps in the algorithm.

Setup for the algorithm:

1. Build category database consisting of patterns from frame sequence content

2. Create both linear and logistic regression models from frame sequence content

Analyzing of an incoming frame sequence:

1. Extract patterns from the frame sequence

2. Extract statistical parameters from frame sequence

3. Run FPD on the patterns to get category estimations

4. Put statistical parameters in logistic regression to get a category estimation

5. Put statistical parameters in linear regression to get a $\widehat{MOS}$ estimation

6. Combine the category estimations from each method to get a more correct estimation

7. Use the $\widehat{MOS}$ mean from the category estimations as a $\widehat{MOS}$ estimate


Figure 4.2: Flow of algorithm

4.1.6 Algorithm for all methods

The methods can be assembled in many different ways. The following list gives some examples of configurations. All configurations use patterns of three different lengths (3, 4 and 5 symbols).

Example steps of merging (algorithm version 1):

1. Mean PEVQ of every pattern categorization using only FS series.

2. Mean of 1. and the LOR $\widehat{MOS}$ value.

3. Mean of 2. and the LR $\widehat{MOS}$ value.

Version 2 of the algorithm skips step 2 and takes the mean of frame and log directly. Version 3 adds the categorizations from FDS series in step 1.
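The three merging steps of version 1 can be sketched as nested means. The function name and the example numbers are hypothetical; only version 1 is shown since its steps are fully specified.

```python
from statistics import fmean
from typing import List


def combine_v1(fpd_category_means: List[float],
               lor_mos: float, lr_mos: float) -> float:
    """Version 1: nested means of FPD, LOR and LR estimates."""
    step1 = fmean(fpd_category_means)   # one PEVQ mean per pattern length (3, 4, 5)
    step2 = fmean([step1, lor_mos])     # merge with the LOR estimate
    return fmean([step2, lr_mos])       # merge with the LR estimate


# Made-up estimates for one frame sequence.
mos = combine_v1([3.0, 3.2, 3.4], lor_mos=2.8, lr_mos=3.5)
print(round(mos, 4))
```

Note that nesting the means gives the LR estimate a weight of 1/2, while the LOR estimate and the combined FPD estimate get 1/4 each.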

4.1.7 Algorithm with only regressions

Another way is to combine only the regression methods into an algorithm.

$$\widehat{MOS} = \mathrm{mean}\bigl(\widehat{MOS}_{LR},\ \widehat{MOS}_{LOR}\bigr)$$

4.1.8 Algorithm with modeled regression


considered or a huge database has to be built. Instead of doing this the statistical parameters used in the regression can be modeled.

Equations (4.3) to (4.8) are used to parameterize the regression parameters. Input to the equations is the bit rate in question, X.

$$C_1 X^{C_2} \qquad (4.3)$$

$$C_1 + C_2 X \qquad (4.4)$$

$$C_1 X^{C_2} + C_3 \qquad (4.5)$$

$$C_1 X^{C_2} + C_3 X \qquad (4.6)$$

$$C_1 X^{C_2} + C_3 X^{C_4} \qquad (4.7)$$

$$C_1 X^{C_2} + C_3 X^{C_4} + C_5 X \qquad (4.8)$$

Table 4.3 shows the equation type and the corresponding coefficients that were used to parameterize the LR regression parameters. Table 4.4 shows the same for the LOR regression using five categories.

LR modeled

Parameter                                            Equation Nr   Model coefficients
LR regression parameter                              (4.6)         -0.0044, 1.17, 0.018
Standard deviation of frame diff sequences           (4.6)         4.64, -1.63, 2.88e-10
Median of frame sequence                             (4.4)         2.31e-04, -1.78e-07
Longest calm period of frame sequences               (4.6)         -0.096, -0.64, 6.83e-07
Longest small period of frame difference sequences   (4.6)         -0.22, -0.60, 1.34e-07
Longest large period of frame sequences              (4.3)         1.71, -0.90
Number of passes through median of frame sequences   (4.3)         0.41, -0.68

Table 4.3: LR modeled

Observe that the parameter ”Longest active period of frame sequences” has been removed. The parameter behaved badly with increasing bit rate, which made it difficult to find a suitable model.

For figures showing the correct and modeled values of the regression parameters, see Appendix A.
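Evaluating a modeled regression parameter at a given bit rate amounts to picking the equation type and inserting the coefficients. The sketch below encodes equations (4.3) to (4.8) in a lookup table; the names are hypothetical, and the unit of the bit rate X is not stated in the tables, so the value used in the example is only a placeholder.

```python
from typing import Callable, Dict, Sequence

# Equation number -> function of bit rate x and coefficient list c (C1 is c[0]).
MODELS: Dict[str, Callable[[float, Sequence[float]], float]] = {
    "4.3": lambda x, c: c[0] * x ** c[1],
    "4.4": lambda x, c: c[0] + c[1] * x,
    "4.5": lambda x, c: c[0] * x ** c[1] + c[2],
    "4.6": lambda x, c: c[0] * x ** c[1] + c[2] * x,
    "4.7": lambda x, c: c[0] * x ** c[1] + c[2] * x ** c[3],
    "4.8": lambda x, c: c[0] * x ** c[1] + c[2] * x ** c[3] + c[4] * x,
}


def modeled_parameter(equation: str, bitrate: float,
                      coefficients: Sequence[float]) -> float:
    """Evaluate one modeled regression parameter at the given bit rate."""
    return MODELS[equation](bitrate, coefficients)


# "Longest large period of frame sequences" from table 4.3, equation (4.3).
# Assumption: the bit rate is inserted as a plain number; its unit is not given.
value = modeled_parameter("4.3", 300.0, [1.71, -0.90])
print(value)
```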

4.2 Algorithm numerical results

4.2.1 Mathematical approach

Error reduction


LOR modeled

Parameter                                            Equation Nr   Model coefficients
LOR category intercept1                              (4.8)         35.25, -3.295, -0.0063, 1.17, 0.021
LOR category intercept2                              (4.8)         -0.023, 1.10, 0.32, -8.64, 0.049
LOR category intercept3                              (4.8)         8.57e-11, 3.39, -0.91, 1.01, 0.98
LOR category intercept4                              (4.8)         3.06e-11, 3.51, 0.63, 0.98, -0.56
Mean of frame difference sequences                   (4.3)         -4.08e+02, -1.99
Variance of frame sequences                          (4.3)         -0.33, -2.26
Standard deviation of frame sequences                (4.5)         45.45, -1.69, 6.20e-04
Median of frame sequences                            (4.5)         0.0015, 0.16, -0.0047
Longest calm period of frame sequences               (4.7)         -11.92, 0.15, 11.93, 0.15
Longest small period of frame difference sequences   (4.4)         0.017, -5.39e-06
Longest large period of frame sequences              (4.5)         -1.18, -0.014, 1.053
Number of passes through median of frame sequences   (4.4)         -0.034, 1.0057e-05

Table 4.4: LOR modeled

prediction between the existing and new model were done.

The existing model used the mean of the PEVQ values of all the video streams at a particular bit rate as the $\widehat{MOS}$ estimate for a test sequence. The error for the model could thus be calculated as:

$$\text{Error existing model (EE)} = |\text{Mean PEVQ} - \text{PEVQ}|$$

Error of new model is calculated as:

$$\text{Error new model (EN)} = |\widehat{MOS} - \text{PEVQ}|$$

One way to show the error improvement is to calculate how big the error reduction was of the total error:

$$\text{Error percent reduction (EPR)} = \frac{EE - EN}{EE} \qquad (4.9)$$

(Negative EPR would instead mean an increase in error.)

With only around 300 video clips available for testing, a cross-validation technique was used to obtain more statistically significant results. This was done by randomly selecting ten percent of the clips as test objects while the rest were used as training material for the algorithm.
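The error reduction and the random hold-out split can be sketched as follows. The function names and example numbers are hypothetical, and averaging the absolute errors over the test clips before forming the EPR is an assumption; the text does not say how per-clip errors are aggregated.

```python
import random
from typing import List, Optional, Sequence, Tuple


def mean_absolute_error(estimates: Sequence[float], pevq: Sequence[float]) -> float:
    """Mean of |estimate - PEVQ| over the test clips (assumed aggregation)."""
    return sum(abs(e - p) for e, p in zip(estimates, pevq)) / len(pevq)


def error_percent_reduction(ee: float, en: float) -> float:
    """EPR = (EE - EN) / EE; negative values mean the error increased."""
    return (ee - en) / ee


def random_split(clips: List, test_fraction: float = 0.1,
                 seed: Optional[int] = None) -> Tuple[List, List]:
    """Randomly hold out a fraction of the clips as test objects."""
    rng = random.Random(seed)
    shuffled = clips[:]
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


train, test = random_split(list(range(300)), seed=1)

# Made-up scores: the existing model predicts the mean PEVQ (3.0) for every clip.
pevq_test = [2.0, 3.0, 4.0]
ee = mean_absolute_error([3.0, 3.0, 3.0], pevq_test)
en = mean_absolute_error([2.5, 3.0, 3.5], pevq_test)
print(len(train), len(test), error_percent_reduction(ee, en))
```

Repeating the split several times with different seeds and averaging the EPR gives the kind of repeated random sub-sampling the paragraph describes.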

RMS Error

The root mean square error (RMSE) of the differences is calculated as in (4.10).

$$RMSE = \sqrt{\frac{\sum_i d_i^2}{n}} \qquad (4.10)$$

Where $d_i$ are the differences and n is the number of differences.

Error residuals

Another way to show how well the algorithms predict the MOS score is to calculate the residuals between correct PEVQ scores and estimated $\widehat{MOS}$ scores.

$$\text{Error residual} = \text{PEVQ} - \widehat{MOS} \qquad (4.11)$$
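Both error measures are short to compute. The sketch below uses hypothetical function names and made-up scores, and takes the residuals as the differences that enter the RMSE.

```python
import math
from typing import List, Sequence


def residuals(pevq: Sequence[float], mos_estimates: Sequence[float]) -> List[float]:
    """Error residual per clip: PEVQ minus the estimated MOS."""
    return [p - m for p, m in zip(pevq, mos_estimates)]


def rmse(differences: Sequence[float]) -> float:
    """Root mean square of the differences."""
    return math.sqrt(sum(d * d for d in differences) / len(differences))


res = residuals([3.0, 2.0, 4.0], [2.0, 2.0, 6.0])
print(res, rmse(res))
```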

4.2.2 Results in numbers

Results in error reduction (using (4.9)) with implementation of various algorithms are shown in table 4.5 and figure 4.3.

Model/Bitrate            150kb/s   200kb/s   250kb/s   300kb/s   350kb/s   400kb/s
LR                       15.51%    16.70%    12.80%    17.56%    12.52%    11.83%
LOR                      14.94%    12.70%    14.60%    17.32%    15.54%    14.14%
FPD                      1.76%     3.11%     3.41%     9.35%     10.35%    5.92%
Combined methods V1      18.31%    16.84%    18.04%    20.02%    18.52%    12.92%
Combined methods V2      18.47%    16.18%    18.21%    20.15%    19.50%    13.41%
Combined methods V3      18.31%    16.35%    17.72%    19.92%    19.06%    13.40%
Regression combination   18.71%    17.61%    15.87%    19.02%    15.73%    14.50%

Table 4.5: Error reduction from methods versus bitrates

Error residuals (using (4.11)) for regression methods are shown in figures 4.4, 4.5, 4.6 and 4.7 (with simulations using 10% of the data as validation clips and 10 cross-sampling runs).

A comparison of the spread of the error residuals (difference between correct and estimated PEVQ values) is shown in figure 4.8 for both the existing model and the new model using algorithm V2.

Table 4.6 shows error reduction for algorithms with combined methods, modeled regressions and high bitrates.

Model/Bitrate         150kb/s   200kb/s   250kb/s   300kb/s   350kb/s   400kb/s   600kb/s   800kb/s   1500kb/s   Mean*
All methods V1        20.6%     17.9%     19.8%     20.5%     16.2%     15.0%     19.4%     18.4%     11.8%      18.5%
Reg combination       20.9%     17.5%     15.6%     15.1%     17.4%     13.4%     17.9%     14.5%     11.3%      16.8%
LR modeled            17.12%    14.48%    7.19%     5.53%     9.08%     10.36%    3.11%     -70.32%   -700.4%    9.6%
LOR modeled           13.5%     15.69%    13.72%    11.22%    12.6%     9.7%      14.8%     12.4%     -70.61%    13.0%
Regressions modeled   23.2%     20.1%     18.9%     16.9%     18.7%     15.8%     12.2%     -12.2%    -399.8%    18.0%

Figure 4.3: Error reduction versus bitrates for selected algorithms

Table 4.7 shows RMSE for algorithms with combined methods, modeled regressions and high bit rates.


Figure 4.4: Residuals of old regression model

Model/Bitrate         150kb/s   200kb/s   250kb/s   300kb/s   350kb/s   400kb/s   600kb/s   800kb/s   1500kb/s   Mean*
LR                    0.5938    0.5412    0.4734    0.4337    0.3985    0.3545    0.2858    0.2233    0.1137     0.4130
LOR                   0.6106    0.5302    0.4970    0.4403    0.4141    0.3729    0.2998    0.2299    0.1139     0.4243
Reg combined          0.5809    0.5185    0.4710    0.4242    0.3976    0.3566    0.2842    0.2197    0.1107     0.4066
All methods V1        0.5752    0.5116    0.4617    0.4030    0.3990    0.3504    0.2753    0.2158    0.1130     0.3990
LR modeled            0.6022    0.5362    0.5028    0.4681    0.4208    0.3627    0.3367    0.4062    0.8522     0.4545
LOR modeled           0.6693    0.5835    0.5125    0.4648    0.4243    0.3781    0.3004    0.2273    0.2124     0.4450
Regressions modeled   0.5856    0.5310    0.4774    0.4340    0.3912    0.3490    0.3043    0.2810    0.5205     0.4192
Reference model       0.6777    0.6169    0.5499    0.4966    0.4670    0.4086    0.3468    0.2620    0.1275     0.4782

* Disregarded extreme bitrates of 800kb/s and 1500kb/s

Table 4.7: RMSE for algorithms versus bitrate

Figure 4.6: Residuals of logistic regression model

Figure 4.8: Residual comparison at 300kb/s bitrate between existing model and new model using algorithm V2


Chapter 5

Conclusions

5.1 Data analyzed

When examining frame sequences of video streams of different coding complexities (see Appendix B), one can observe that in general more changes (jumps) in frame size are seen in frame sequences belonging to high quality scores. Smooth curves are more apparent in groups of lower quality scores. However, no definite conclusion can be drawn about a specific frame sequence, since all types of curves exist for all quality scores. There is no visual clue that gives a definite answer as to which quality category a frame sequence fits into, only various levels of probability. This could be a result of parameters that are not visible in the frame sequence statistics and only appear in the actual packet data.

5.2 Methods analyzed

A selection of different mathematical methods was looked into and tested in the search for a useful algorithm. The methods evaluated were:

– Field of time series analysis

• Trend and cyclical behavior methods

• Time series matching

• Frequent pattern discovery

– Field of regression

• Multiple linear regression

• Logistic regression


5.2.1 Trends and cyclical behaviour methods

In the time series area the trend and cyclical behavior methods were ruled out. Video streams belonging to different PEVQ groups showed little common cyclical behavior or trend, which is a result of the randomly picked time intervals in the video stream. Furthermore, it is problematic to develop an algorithm that works automatically, without user input, with these methods.

5.2.2 Time series matching

For time series matching, three different types were tried: Euclidean, DTW and lower bounding. One issue was the large dissimilarity between time series in the same PEVQ categories, but the biggest problem was time and resource consumption. To match time series against each other, a large database needs to be created. More problematic is that measuring the distance between time series takes a lot of computational power. Algorithms incorporating, for example, the DTW algorithm had running times of minutes even on a modern stationary computer (dual-core 2.4 GHz, 4 GB RAM). It would thus not be possible to incorporate these methods in the parametric models used in, for example, handheld devices doing media streaming. The hardware would not be adequate with today’s technology.

5.2.3 FPD

Frequent pattern discovery (FPD) works without user input and can select and match patterns regardless of where they occur, which removes some of the limitations of the other time series methods. Even though the method requires a pattern database for each bitrate and uses pattern comparisons in the estimation, it takes much less time than the time series matching methods. The pattern databases are efficient and small due to the dimensionality reduction, and since only symbols are compared the measurements are done much faster than distance measurements. Running times were in the range of ten seconds on the same setup as was used with the time series matching methods.

The FPD method gives clear improvements in estimating the coding complexity compared to the old model. Error and RMSE reductions of up to and above 10% could be achieved. The results were not as good as those of each individual regression method, but in conjunction with those methods it improves the results.

5.2.4 Regression

The regression methods achieved the highest error and RMSE reduction and are very efficient in time and data space consumption. Multiple linear regressions (LR) give a little bit better


regression method also reached very good results, around 90% of the error reduction achieved when using all methods (Combined methods V1).

5.2.5 Analysis of statistic parameters

Following is a discussion of the statistic parameters for each regression.

Linear (LR)

1. Median of frame sequence: Higher values indicate higher MOS scores. Seems to show better linearity than mean of frame sequence. The median doesn’t account for big changes (especially as could happen in the beginning or end of a sequence as a result of scene changes) as much as the average does. This makes the median a more stable parameter and thus more reliable.

2. Longest calm period of frame sequences: Generally, long calm regions result in lower MOS scores. This might be because long periods of non-changing frame sizes are already using the maximum available number of bits for that frame. This might indicate that the encoder needs more bits per frame to encode these frames with good quality.

3. Longest active period of frame sequence: Similar to the last one but reversed, now checking for active regions. High values indicate that the encoder gets to work a lot but can still use enough bits per frame while maintaining decent quality.

4. Longest small period of frame difference sequences: A small period in the frame difference sequence sizes indicates few changes in the frames, i.e. less activity between frames. A long period results in easier coding and lower MOS score. Measures the same thing as longest calm period, but improves results when used in the model.

5. Longest large period of frame sequences: A long period of large frame sizes indicates a higher MOS score. The relationship isn’t as obvious as with other parameters.

6. Number of passes through median of frame sequences: This measures whether the following frame is very similar to the preceding frame or not. This indicates when frames are very similar so that very few bits (for example just after an I-frame) are needed to encode the frame.

7. Standard deviation of frame diff sequences: Indicates how much each frame differs from the next. High standard deviation means both small and large differences, low std means that there are either small or large differences in general between the frame sizes.


model was well below 1%. The R-squared value of the model was only around 31%. This is a fairly low value and indicates that only a third of the variance is explained in the model. However this is nothing strange considering the properties of the data and the impossibility of reaching perfect estimations. (See discussion about frame sequences in 5.1).

Logistic (LOR)

1. Mean of frame difference sequences: Shows good correlation with MOS score; a greater mean value indicates a higher MOS score. This could be because a high difference indicates that the difference between I- and P-frames is large, which might ease the workload on the encoder.

2. Variance of frame sequences: Quadratic relationship; low and high MOS scores seem to have higher values of variance than medium MOS scores. Perhaps the coder is less efficient for certain events happening in the frame pictures at the extreme ends of the coding complexities.

3. Standard deviation of frame sequences: Same as for variance of frame sequences. This is actually the same parameter modeled twice. Nevertheless it improves the results.

4. Median of frame sequences: See linear.

5. Longest calm period of frame sequences: See linear.

6. Longest active period of frame sequences: See linear.

7. Longest large period of frame sequences: See linear.

8. Number of passes through median of frame sequences: See linear.

For LOR regression all but one statistic parameter were at 1% significance level. However, the parameter ”Mean of frame difference” had a really high p-value of above 90% (0.9723 in table 4.2). This would indicate that the parameter does not belong in the regression model. However, thorough testing showed that this parameter improved the results of the model, and it was thus kept. The deviance was low compared to other configurations of statistic parameters, which indicates that a relatively good model was found.

5.3 Discussion of results

The increased precision in estimating a MOS score of a video stream after incorporating the new algorithms in the old model can thus be estimated to around 10-23%, depending on bit rate and methods used.


happened for some clips even at bitrates of 300 kb/s. This problem was really apparent at bitrates around 150 kb/s, where even negative PEVQ values could occur. The problem was partially solved by adjusting averages of categories and estimations up to one and removing clips that had received negative PEVQ scores.

The algorithm that used all of the FPD, LOR and LR methods in combination resulted in the highest RMSE and error reduction and thus the best $\widehat{MOS}$ estimation.

The most cost-efficient algorithm is the one using only the regression methods. The increase in precision was almost as good as with the algorithm using all methods, but this algorithm used much less time and data space since the costly work of building and comparing patterns was removed.

A further optimization was to model the regression parameters directly. Since the parameters showed such clear behavior with the bit rate, the modeled parameters did not differ much from the regressed ones (unless really extreme values of bit rate were used, see Appendix A). However, even for extreme values the modeled parameters stayed within the same order of magnitude. Efficiency was improved since the need to do new regressions for each bit rate was avoided.

5.4 Restrictions and limitations

For really high bitrates (1000 kb/s and above) the algorithms seem to lose their effectiveness. There may be a limit to how high the bitrates can be for the algorithms to remain useful. This is probably a result of fewer characteristics in the packet statistics at increased bitrates, where the difference between packet sizes and video frames becomes less apparent. This is not a problem for the QVGA resolution used in this thesis, since bitrates usually don’t reach above 400 kb/s.

5.5 Future work


Chapter 6

Acknowledgements

I would like to in particular thank my supervisor David Lindegren for all of his help with this thesis work.


References

[1] Ericsson homepage, 2011-09-27, "http://www.ericsson.com/se/"

[2] IEEE signal processing journal, "IP-based mobile and fixed network audiovisual media services", "http://ieeexplore.ieee.org/servlet/opac?punumber=79"

[3] ITU background, 2011-09-27, "http://www.itu.int/en/ITU-T/publications/Pages/recs.aspx"

[4] ITU-T Recommendations, 2011-09-27, "http://www.itu.int/en/ITU-T/publications/Pages/recs.aspx"

[5] ITU-T studygroup 12, 2011-09-27, "http://www.itu.int/ITU-T/studygroups/com12/sg12-q14.html"

[6] SRTP description, 2011-09-27, "http://tools.ietf.org/html/rfc3711#page-6"

[7] RTP description, 2011-09-27, "http://tools.ietf.org/html/rfc3550"

[8] Introduction to Data Compression, Second edition, Khalid Sayood, 2000, ISBN 1-55860-558-4

[9] Multimedia för mobila system, Stockholm 5-6 november 2007, STF Ingenjörsutbildning AB

[10] ITU Recommendation P-800, 2011-11-03, "http://www.itu.int/rec/T-REC-P.800-199608-I/en"

[11] MOS-LQO, 2011-10-18, "http://www.scribd.com/doc/53306391/10/PESQ-Output-MOS-LQO"

[12] Study of video assessment, 2011-09-27, "http://onlinelibrary.wiley.com/doi/10.1002/acp.731/abstract"

[13] OPTICOM, 2011-11-03, "http://www.opticom.de/"

[14] Opticom PEVQ description, 2011-11-03, "http://www.pevq.org/video-quality-measurement.html"

[15] Timeseries analysis, 2011-11-03, "http://www.statsoft.com/textbook/time-series-analysis/"

[16] Timeseries matching, 2011-11-03, "http://alumni.cs.ucr.edu/~mvlachos/ICDM06/tutorial_icdm06.pdf"

[17] Regression analysis, 2011-11-03, "http://www.statsoft.com/textbook/general-linear-models"

[18] Cross-validation, 2011-11-03, "http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.48.529"

[19] Multiple linear regression function, 2011-11-16, "http://www.mathworks.se/help/toolbox/stats/regstats.html"

[20] Logistic regression function, 2011-11-16, "http://www.mathworks.se/help/toolbox/stats/mnrfit.html"

[21] Reshape function, 2011-11-17, "http://www.mathworks.se/help/techdoc/ref/reshape.html"

[22] Mean function, 2011-11-17, "http://www.mathworks.se/help/techdoc/ref/mean.html"


Appendix A

Modeled regression parameters

Figures of modeled and regressed values of regression parameters. The red line shows the regressed values and the black line shows the modeled values.

Figure A.1: LR regression parameter


Figure A.2: Standard deviation of frame diff sequences


Figure A.4: Longest calm period of frame sequences


Figure A.6: Longest large period of frame sequences


Figure A.8: LOR category intercept1


Figure A.10: LOR category intercept3


Figure A.12: Mean of frame difference sequences


Figure A.14: Standard deviation of frame sequences


Figure A.16: Longest calm period of frame sequences


Appendix B

Videostreams ordered in quality

Figure B.1: A number of frame sequences with PEVQ values of around 0-2.8


Figure B.2: A number of frame sequences with PEVQ values of around 2.9-3.2