
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Bachelor’s thesis, 16 ECTS | Information Technology

2019 | LIU-IDA/LITH-EX-G--19/035--SE

Video quality encoding characterization and comparison

Kvalificering och jämförelse av videokvaliteter

Julia Andersson

Andreas Hultqvist

Supervisor: Niklas Carlsson
Examiner: Marcus Bendtsen



Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/hers own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Students in the 5 year Information Technology program complete a semester-long software development project during their sixth semester (third year). The project is completed in mid-sized groups, and the students implement a mobile application intended to be used in a multi-actor setting, currently a search and rescue scenario. In parallel they study several topics relevant to the technical and ethical considerations in the project. The project culminates by demonstrating a working product and a written report documenting the results of the practical development process including requirements elicitation. During the final stage of the semester, students create small groups and specialise in one topic, resulting in a bachelor thesis. The current report represents the results obtained during this specialisation work. Hence, the thesis should be viewed as part of a larger body of work required to pass the semester, including the conditions and requirements for a bachelor thesis.


Abstract

Adaptive streaming is a popular technique that adapts the quality of a video to the current playback conditions. The purpose of this thesis is to investigate how chunks in video files downloaded from YouTube correlate to each other. We investigate how the chunk size characteristics depend on the category and encoding of the video. The main focus is to analyze the chunk sizes of the videos, focusing on the distinctions between 360° and 2D videos. This is performed using the YouTube API. The videos are downloaded and analysed using youtube-dl and mkvinfo. The results show that chunk sizes for adjacent qualities have higher correlation and that videos with similarity between scenes have higher correlation. In addition, 360° videos differ from regular 2D videos primarily by their much larger chunk sizes.


Acknowledgments

We would like to thank our supervisor, Niklas Carlsson, for supporting us and giving us feedback during the whole process. We would also like to thank Vengatanathan Krishnamoorthi for supplying us with tools making it possible to perform this study.


Contents

Abstract

Acknowledgments

Contents

List of Figures

1 Introduction
1.1 Motivation
1.2 Aim
1.3 Research questions
1.4 Contributions
1.5 Delimitations

2 Background and related work
2.1 Dynamic Adaptive Streaming over HTTP
2.2 Quality of experience
2.3 360° videos
2.4 Video encoding
2.5 Randomized videos
2.6 YouTube's algorithm: base 64

3 Method
3.1 YouTube API
3.2 Third-party tools
3.3 Categories and formats
3.4 Video search
3.5 Limitations

4 Results
4.1 Per-chunk-based correlation analysis
4.2 Temporal correlations within encodings
4.3 360° videos
4.4 Correlation between number of subscribers

5 Discussion
5.1 Results
5.2 Method
5.3 The work in a wider context

6 Conclusion


List of Figures

2.1 The structure between a DASH server and a DASH client.
2.2 Three different resolutions, as illustrated by De Cock et al. The blue marker represents the encoding point and the red curve represents the PSNR-bit rate curve.
2.3 The relationship between encoding quality and bit rate for videos encoded at different resolutions, as illustrated by De Cock et al.
4.1 Scatter plots and correlations between different example encodings for three example videos in the music category.
4.2 Box and Whisker plots for the category music video.
4.3 Scatter plots and correlations between different example encodings for three example videos in the lyrics category.
4.4 Box and Whisker plots for the category lyrics video.
4.5 Scatter plots and correlations between different example encodings for three example videos in the sports high movement category.
4.6 Box and Whisker plots for the category sports with high movement.
4.7 Scatter plots and correlations between different example encodings for three example videos in the sports minimal movement category.
4.8 Box and Whisker plots for the category sports with minimal movement.
4.9 Scatter plots and correlations between different example encodings for three example videos in the gaming category.
4.10 Box and Whisker plots for the category gaming.
4.11 Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the music category.)
4.12 Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the lyrics category.)
4.13 Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the sports high movement category.)
4.14 Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the sports minimal movement category.)
4.15 Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the gaming category.)
4.16 Scatter plots and correlations between different example encodings for three example videos in the 360° category.
4.17 Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the 360° category.)
4.18 Box and Whisker plots for the category 360°.
4.19 Box and Whisker plot comparing different numbers of subscribers for 480p and 720p.


1 Introduction

1.1 Motivation

YouTube is a global service for uploading and watching videos online. Over the years that YouTube has been active, the video delivery technique has evolved substantially. Today, almost all video delivery is done using adaptive streaming, meaning that the quality of the video stream adapts to the current playback conditions of the user device. This is particularly attractive since devices do not have the same playback conditions in all locations; e.g., in a big city compared to the countryside. This thesis investigates the chunk size correlation of videos streamed by YouTube, including the correlation differences observed for different video categories and between encodings. In addition, this thesis presents an investigation of the correlation of a video containing multiple dissimilar scenes and movements that demand buffering, e.g., a hockey match or music video, in comparison to a video with less movement, e.g., a golf video or lyrics video.

1.2 Aim

The purpose of this thesis is to investigate different chunk size characteristics in video files streamed by YouTube. The investigation includes breaking down correlation characteristics across video categories. The results are used to develop an understanding of how chunk sizes correlate to each other in different categories and encodings.


1.3 Research questions

To develop an understanding of how video chunks correlate to each other, the following research questions will be investigated in this thesis.

• How do chunk size correlations differ in behaviour between video categories?

• How do chunk sizes correlate to each other between encodings?

• How does a 360° video differ in chunk-size characteristics in comparison to a regular 2D video?

We answer these questions by collecting a data set and analyzing chunk size correlations within this data set.

1.4 Contributions

The importance of this thesis lies in improving everyday services that themselves contribute to the general comfort and convenience we expect today. The expectation is that this thesis will contribute to further work on the development of more accurate and better-performing adaptive streaming services. The correlations analysed in this thesis are important for better understanding how to design algorithms that more accurately choose the right encoding during adaptive streaming. The findings in this thesis can therefore help improve important adaptive solutions that are used in our everyday lives.

1.5 Delimitations

Due to time restrictions, we chose to focus only on a limited selection of video categories on YouTube. Furthermore, all tests were performed from a single location.

In retrospect, we would also have liked to analyze whether chunk sizes differ depending on, for example, the current popularity of the video at different geographical locations.

The usage of the implemented functions in the YouTube API comes with multiple restrictions. One example is the number of requests that we are permitted to make within a specific amount of time (e.g., one day). This limits the number of videos that we are able to download and analyse for this thesis. There is also more metadata that we would have liked to collect for this investigation, such as 'favorite count', which is no longer possible to collect since YouTube has disabled this field. The field 'channel ID' was in some cases blocked for external users.


2 Background and related work

2.1 Dynamic Adaptive Streaming over HTTP

Dynamic Adaptive Streaming over HTTP (DASH) [4] is a technology that provides clients with uninterrupted streaming of video regardless of the playback conditions of the user, including network conditions and device properties. Applications such as YouTube typically use DASH to adapt the video quality so as to maximize the expected playback experience of the client under the current conditions.

A DASH encoding is performed by splitting the video content into smaller pieces, called chunk fragments, which are referred to as chunks in this thesis. A video normally exhibits a large variation in chunk sizes, and the variation is generally higher for higher bit rates than for lower ones. Higher bit rates generate higher qualities. Chunk sizes depend mostly on the activity of the current scene, which is displayed the same regardless of the bit rate level [18].

The chunks are stored on the HTTP server [17]. Each chunk captures a specific time period, normally a few seconds, and is stored in multiple resolutions. The size of a chunk is proportional to the resolution quality, meaning that higher resolutions have bigger chunks. Information about each chunk, together with other information about the video, is saved in a Media Presentation Description (MPD) profile. This information contains the playback time, bit rate, number of chunks, and the duration and URL of each chunk. The URL of each chunk is saved so that each chunk can be related to the previous and next one [3].

As shown in Figure 2.1, the DASH client uses the MPD from the HTTP server to decide what encoding is going to be used. The DASH client requests a video from the server using a series of HTTP GET requests, each for one or more chunks. Chunks are downloaded from the server and stored in the playback buffer until each chunk is played back. Before beginning playback, the client typically builds up a minimum buffer, and the quality of the requested chunks is adapted to keep the buffer from running empty.
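The adaptation loop described above can be sketched as follows. The thresholds, chunk length, and step-up/step-down policy are illustrative assumptions for a simple buffer-based scheme, not YouTube's actual algorithm:

```python
# Minimal sketch of a buffer-based DASH quality-adaptation loop.
# All names and thresholds are illustrative assumptions, not YouTube's
# actual algorithm.

CHUNK_SECONDS = 4          # each chunk covers a few seconds of video
LOW_WATER, HIGH_WATER = 8, 20

def pick_quality(buffer_level, qualities, current):
    """Step the quality up or down based on the buffer level (seconds)."""
    idx = qualities.index(current)
    if buffer_level < LOW_WATER and idx > 0:
        return qualities[idx - 1]          # risk of stalling: step down
    if buffer_level > HIGH_WATER and idx < len(qualities) - 1:
        return qualities[idx + 1]          # plenty of buffer: step up
    return current

def simulate(download_times, qualities):
    """Play a video of len(download_times) chunks; return chosen qualities."""
    buffer_level, quality, chosen = 0.0, qualities[0], []
    for t in download_times:               # t = seconds to fetch one chunk
        quality = pick_quality(buffer_level, qualities, quality)
        chosen.append(quality)
        # downloading drains the buffer; a finished chunk adds playback time
        buffer_level = max(0.0, buffer_level - t) + CHUNK_SECONDS
    return chosen
```

With fast downloads the simulated client ramps the quality up step by step; with downloads slower than playback it stays at the lowest quality to avoid stalling.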


Figure 2.1: The structure between a DASH server and a DASH client.

YouTube’s implementation of DASH

All services supporting streaming use their own implementation of DASH. For example, services such as YouTube and Netflix have different ways of addressing chunks (e.g., per-chunk URLs vs. use of range requests), as well as different handling of codecs. A codec is a technology used for encoding and decoding digital data streams so that they can be stored. The codec used by YouTube is called VP9. The VP9 codec is developed by Google and used by YouTube. One characteristic of this codec is that it requires two types of frames. The first is the key-frame, which contains all information needed to reconstruct a frame on its own, and the other is the inter-frame, which contains incremental information relative to the last key-frame. The decoder can only start decoding at a key-frame [15]. In YouTube's algorithm for adaptive streaming, it has been shown that the length of the chunk encoded at a specific time is correlated with the quality [4].

2.2 Quality of experience

In order to maximize the user's streaming experience, adaptive streaming uses algorithms that decide the quality level based on the conditions of the user. The satisfaction of the user is called Quality of Experience (QoE). QoE is a subjective measure, but it has been shown to be affected by playback stalls, quality switches, startup delays, rebuffering times and the average playback quality. These aspects can all be measured using metrics such as the number of stalls, the stall duration, the startup time, the number of quality switches, and the average bit rate. An important aspect of QoE is the adaptation level for the client, e.g., the size of the screen and adapting the quality to the condition of the device. Without consideration of these parameters, bandwidth losses can occur [11].
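The metrics listed above can be computed directly from a playback trace. The per-chunk record format below is an assumed, simplified representation, not a format used by YouTube:

```python
# Illustrative computation of common QoE metrics from a playback trace.
# The trace format (one dict per played chunk, with 'bitrate' in kbit/s
# and 'stall' = seconds of rebuffering before the chunk played) is an
# assumption made for this sketch.

def qoe_metrics(trace):
    """Return stall count, total stall duration, average bit rate and
    the number of quality switches for a playback trace."""
    stalls = [c["stall"] for c in trace if c["stall"] > 0]
    bitrates = [c["bitrate"] for c in trace]
    switches = sum(1 for a, b in zip(bitrates, bitrates[1:]) if a != b)
    return {
        "num_stalls": len(stalls),
        "stall_duration": sum(stalls),
        "avg_bitrate": sum(bitrates) / len(bitrates),
        "quality_switches": switches,
    }
```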

Adaptive streaming encodes the video in multiple encodings and divides it into multiple chunks, which makes it possible to perform quality adaptation based on the client's network and device conditions [12]. The quality adaptation is an important aspect in reducing stalling and buffering. YouTube uses HTTP to send the video content between server and client.


2.3 360° videos

A 360° video is a kind of interactive video available for multiple devices. YouTube started supporting 360° videos in March 2015 [1]. The main property of 360° video is that it allows the user to choose between different viewpoints in the video. Compared to a regular 2D video, where the producer decides the content shown, the user decides what content should be displayed in a 360° video. This type of video is produced using a special camera, an omnidirectional camera, which is a device with multiple small cameras on it, or by stitching footage from multiple regular cameras together into one video. An omnidirectional camera is placed so that all possible angles are captured, and these are then stitched together into one complete video. Because of all the data captured by an omnidirectional camera, the files get distinctly larger than those produced by a regular camera. These large file sizes translate into a massive amount of required bandwidth. A 360° video requires this much bandwidth because of the number of viewing choices the user has at every angle [13]. However, the portion actually viewed by the user corresponds to only approximately 20% of the total data, so a lot of wasted data is transferred when viewing a 360° video. Techniques for minimizing this bandwidth are an active research subject today, focusing on predicting the next move of the user.

2.4 Video encoding

Encoding is used to convert a video file from one format to another. When recording a video, the device used to film gives the video different encodings, which differ over time. Different encodings are necessary to be able to play a video on different devices, such as computers and phones, which require different formats. It is often desirable for a video to be compatible with as many formats as possible.

Each format has a unique set of parameters such as container (e.g., webm, mp4), codec (e.g., H.264) and bit rate. An encoding type can be described as a recipe that balances these parameters to produce the best possible resolution. For example, given a specific bandwidth, the choice of encoding depends on what resolution we want and what device we are watching the video on. A balance between bit rate and encoding must be found. As technology evolves, these encodings are developed and changed over time. This means that YouTube and Netflix do not use the same encodings today as they did ten years ago, which may be visible when looking at the metadata of a file [8]. Google's recommendation for uploading videos to YouTube is MP4 as the container and H.264 as the codec [10].

H.264/AVC Encoding

H.264/AVC encoding was published in 2003. This type of encoding uses an algorithm to find the best match between bit rates and resolutions. The algorithm produces a bit rate ladder, which consists of bit rate-resolution pairs. A specific bit rate is connected to a specific resolution, meaning that the only thing deciding the resolution is the bit rate at the current time [9].

Per-title Encoding

Per-title encoding is one of the most common types of encoding. It uses the output of all encoded files to maximize the efficiency of quality and bandwidth.

The per-title algorithm builds a bit rate ladder: the total number of quality levels is selected, along with the bit rate-resolution pair for each level. In addition, the algorithm takes some constraints into consideration: each bit rate-resolution pair should be efficient, adjacent bit rates should be perceptibly different yet close enough to make switching between them as smooth as possible, and a minimal number of bit rates should be used.


Figure 2.2 (adapted from work by De Cock et al. [5]) shows three different resolutions as a function of bit rate and peak signal-to-noise ratio (PSNR). The PSNR is the ratio between the maximum signal and the corrupting noise. We note that as the bit rate increases, the encoding quality increases as well. From curves A and B, we can see that each curve starts flattening out at some point. The reason is that every resolution has an upper limit on the quality it can produce. However, C and D show that a higher resolution can produce lower quality than a lower-resolution encoding at a specific bit rate. If more pixels are encoded with less precision, the result can be worse than encoding fewer pixels with higher precision.

Figure 2.2: Three different resolutions, as illustrated by De Cock et al. The blue marker represents the encoding point and the red curve represents the PSNR-bit rate curve.

Figure 2.3 (adapted from work by De Cock et al. [5]) shows the relationship between encoding quality and bit rate for videos encoded at different resolutions. Looking at the graph, we notice that each of the resolutions (low, mid, high) has a region where the other two resolutions cannot compete. The envelope of these regions is called the convex hull, and this is the region where it is most preferable to operate to get the best combination of quality and bit rate. In reality, it is impossible to operate exactly on the convex hull because of limitations such as only being able to choose from a limited number of resolutions. The goal is to operate as close to the convex hull as possible. The per-title bit rate ladder is constructed by selecting points as close to the convex hull as possible.
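The idea of selecting ladder rungs near the convex hull can be sketched as follows. The quality model `psnr()` below is a made-up stand-in for real per-resolution measurements; it only serves to illustrate that the best resolution switches as the bit rate grows, which is exactly the crossover behaviour shown in Figures 2.2 and 2.3:

```python
# Illustrative sketch of per-title ladder selection: for each candidate
# bit rate, pick the resolution whose quality curve is highest there,
# approximating points on the convex hull. The psnr() model is an
# assumption, not measured data.

import math

def psnr(resolution_pixels, bitrate_kbps):
    """Toy quality model: PSNR grows with bits per pixel but saturates
    at a per-resolution ceiling (higher resolutions saturate higher)."""
    bits_per_pixel = bitrate_kbps * 1000 / resolution_pixels
    ceiling = 30 + 10 * math.log10(resolution_pixels / 1e4)
    return min(ceiling, 20 + 8 * math.log2(1 + bits_per_pixel))

def build_ladder(resolutions, candidate_bitrates):
    """Return (bitrate, best_resolution) pairs -- one rung per bit rate."""
    ladder = []
    for br in candidate_bitrates:
        best = max(resolutions, key=lambda r: psnr(r, br))
        ladder.append((br, best))
    return ladder
```

At low bit rates the low resolution wins (fewer pixels encoded with more precision); at high bit rates the high resolution wins, mirroring the convex-hull crossover.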


Figure 2.3: The relationship between encoding quality and bit rate for videos encoded at different resolutions as illustrated by deCock et al.

2.5 Randomized videos

To obtain a reliable result, it is important to get a fully randomized set of videos that have no connection to each other. This is because the result should be general and cannot rely on one specific type of video or one publisher only. An important aspect is to not get videos based on personal recommendations or the location of the publisher.

2.6 YouTube's algorithm: base 64

To generate a unique ID for each new video uploaded, YouTube uses base 64. The reason for using base 64 is the possibility of packing an enormous number into a minimal number of characters while still keeping the ID readable for humans. Using one character with base 64, 64 unique IDs can be generated. Using the same concept, 4,096 IDs can be generated using two characters and 262,144 using three. YouTube uses eleven characters, which represents about 7.38 · 10^19 possible IDs. Whenever a video is uploaded, YouTube generates a random number and checks if it is available.

The reason YouTube does not use a simple incrementing counter is the difficulty of synchronizing it across different servers uploading videos at the same time. Another possibility would be to assign each server a sequence of numbers to use, but that would generate distinctly more work. In addition, an incremental count would be a safety risk, as it would make it easy for anyone to track where a video comes from: knowing the URL of one video would give traces to related videos, for example ones uploaded from the same server [4].
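The ID scheme can be illustrated as follows. The URL-safe alphabet below matches the characters seen in YouTube video URLs, while the generation function itself is an assumption about how a server might draw candidate IDs before checking availability:

```python
# Sketch of YouTube-style base 64 video IDs, using the URL-safe base 64
# alphabet (A-Z, a-z, 0-9, '-' and '_'). The random-draw scheme is
# illustrative; uniqueness must be checked against existing IDs.

import secrets
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"

def random_video_id(length=11):
    """Draw a random 11-character candidate ID."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

# Possible IDs per length: 64**1 == 64, 64**2 == 4096,
# 64**3 == 262144, and 64**11 is roughly 7.38e19.
```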

YouTube Developer’s v3 API

The YouTube v3 API implements functions that make it possible to reach and collect data from the YouTube service. Access to the API functions is gained by obtaining a developer API key, which is issued by Google. In addition, YouTube has a quota, which is a limit on the number of requests that can be made to the server within a specific time, e.g., one day. The API specifies the quota cost for each specific call.

The search() function returns a list of search results matching the input parameters. The input parameters can be keywords, location, events, etc., to match video, playlist or channel resources. The execute() function is used to actually execute the search request [6].
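A search along these lines might look like the sketch below, using the official google-api-python-client. The parameter choices mirror the keyword-based approach used in this thesis, and the API key is a placeholder:

```python
# Sketch of a keyword search via the YouTube Data API v3 using the
# official google-api-python-client. The API key is a placeholder and
# the helper names are our own.

def build_search_params(keyword, max_results=50):
    """Pure helper assembling the search().list() parameters."""
    return {
        "part": "snippet",
        "q": keyword,
        "type": "video",
        "maxResults": max_results,
    }

def search_videos(api_key, keyword):
    """Run the search and return the matching video IDs (needs network
    access and a valid key; each call consumes API quota)."""
    from googleapiclient.discovery import build  # pip install google-api-python-client
    youtube = build("youtube", "v3", developerKey=api_key)
    request = youtube.search().list(**build_search_params(keyword))
    response = request.execute()
    return [item["id"]["videoId"] for item in response.get("items", [])]
```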


Video popularity

It has been shown that YouTube uses an algorithm that adapts the top content of the search function based on the platform used [16]. The rich-get-richer concept is a theory stating that the rate of new views for a video is proportional to the total number of views of the video. People tend to watch videos that have higher popularity than videos with lower popularity, depending on the reliability of the uploader and on whether the video is more easily accessed, e.g., because of visibility on top lists, recommendations, search algorithms, etc. The popularity of two videos containing similar content (by Borghol et al. [2] referred to as clones) is determined by content-agnostic factors such as the age of the video, number of views, etc. This is supported by the rich-get-richer concept: a video with a lot of views tends to gain new views faster than a video with fewer views.


3 Method

To investigate chunk behaviour in YouTube videos of specific categories, a script was created. The script makes use of the functions in the YouTube API, which give us access to multiple functions directly connected to the YouTube service. In this thesis, the YouTube API is a main part of accessing the desired videos in order to obtain a trustworthy result. Furthermore, we have downloaded and used the tools youtube-dl and mkvinfo to analyse the URLs fetched from the YouTube API. Youtube-dl is a tool used for downloading videos from the YouTube server to the computer. It takes a URL as input and produces a webm file as output. Mkvinfo is a tool that takes a video file as input and extracts specific video segments as output. The segments used in this thesis are chunk size and time. Using these tools, we have been able to extract the relevant video data for this thesis and assemble it into graphs and tables. We have made a selection of categories that is presented in this chapter.

3.1 YouTube API

To generate a set of random videos in specific categories, we used the YouTube API, which produces random videos using a specific prefix and, in some cases, also a postfix. The script randomizes 50 videos in each category using functions from the YouTube API. Furthermore, we used the 'category' field to ensure the category was the preferred one. The information from YouTube was accessed using a personal developer API key issued by Google [6]. If a video's category turned out not to match the preferred one, a new video was randomized. Each video was controlled manually. The reason for using a URL randomizer instead of only manually searching for videos on YouTube is that YouTube risks listing only videos recommended on the basis of previous personal searches, which would affect our result negatively. The use of different uploaders benefits our result by generating more variation, and thereby a more trustworthy result, because of the diversity [2].

3.2 Third-party tools

For this thesis, we have downloaded two free programs. As mentioned before, these are youtube-dl and mkvinfo, which are used together in order to produce our final result. In this section, we explain these programs in detail.


Youtube-dl

Youtube-dl is a command-line program used to download videos from servers providing video streaming. One of these services, and the one used in this thesis, is YouTube.com. The program requires a Python interpreter, supporting versions 2.6, 2.7, 3.2 or higher. As a user, you are able to make modifications that adapt the program to your personal usage [7]. Youtube-dl supports video files in multiple formats; examples are webm, mp4, flv and mkv. For this thesis, we download each video as a webm file, which is a format that supports extraction of chunks. A required input when downloading files with youtube-dl is the desired format, which decides what format the video is downloaded in. Additional information about a video file can be extracted from youtube-dl, e.g., what formats the specific video file can be downloaded in. This property was used before downloading the whole file, because every video file has a unique set of formats. Some of the formats are empty, and these were removed to keep the result as legible as possible.
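The format listing and webm download described above map to youtube-dl's `-F` (list formats) and `-f` (select format) options; the URL below is a placeholder:

```shell
# List the formats available for a video (format codes differ per video):
youtube-dl -F "https://www.youtube.com/watch?v=VIDEO_ID"

# Download the best available webm video stream:
youtube-dl -f "bestvideo[ext=webm]" "https://www.youtube.com/watch?v=VIDEO_ID"
```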

Mkvinfo

Mkvinfo is part of MKVToolNix. The tool lists all elements contained in a Matroska file. A Matroska file has the property of collecting multiple forms of multimedia information into one file with the mkv format [14]. In this thesis, we wrote the information extracted from the video file to a text file to enable its usage in the graphs and tables of the results chapter. With the mkvinfo tool, we extracted the segments containing chunk size and time.
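Extracting per-cluster (chunk) sizes and timestamps from mkvinfo's textual output might be sketched as below. The exact line format varies between MKVToolNix versions, so the regular expressions assume one plausible layout where a `|+ Cluster` line opens each cluster, a `Cluster timestamp:` line follows, and block lines report a `size N`:

```python
# Hedged sketch of parsing mkvinfo's textual output into per-cluster
# (timestamp, total size) pairs. The assumed line format is illustrative;
# adjust the patterns to the mkvinfo version actually used.

import re

CLUSTER_RE = re.compile(r"\+ Cluster\s*$")   # a line that opens a cluster
TIME_RE = re.compile(r"timestamp: ([\d:.]+)")
SIZE_RE = re.compile(r"size (\d+)")

def parse_clusters(mkvinfo_text):
    """Return a list of (timestamp, total_size_bytes) per cluster."""
    clusters = []
    current_time, current_size = None, 0
    for line in mkvinfo_text.splitlines():
        if CLUSTER_RE.search(line):          # new cluster: flush previous
            if current_time is not None:
                clusters.append((current_time, current_size))
            current_time, current_size = None, 0
        m = TIME_RE.search(line)
        if m and current_time is None:
            current_time = m.group(1)
        m = SIZE_RE.search(line)
        if m:
            current_size += int(m.group(1))
    if current_time is not None:             # flush the last cluster
        clusters.append((current_time, current_size))
    return clusters
```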

3.3 Categories and formats

The different categories we have chosen to analyze are:

• Music videos and lyrics. Defined by YouTube's implemented category 'Music'.

• Sports with high and minimal movement. Defined by YouTube's implemented category 'Sport'.

• 360° videos. Defined by YouTube's implemented category '360'.

• Gaming. Defined by YouTube's implemented category 'Gaming'.

The selection of categories was made with consideration to the predicted output. Our prediction was that the selected categories would give us the largest differences in results. We define the largest difference as categories that are opposite to each other in content, e.g., a lot of movement versus minimal movement. This choice was made in order to enable a thorough discussion and provide reasonable conclusions. Our initial thought was to also compare 360° and gaming videos to each other, but we found multiple differences that made it preferable to analyse these categories one by one.

The data set contains approximately 50 videos per category to ensure that a reliable result was produced. Only three videos are presented in each graph in the results chapter, but we see the same pattern in all videos within a category. The script makes use of the video file properties in order to decide which encoding is to be used. Because the available encodings differ between videos, the script collected the available formats for each specific video and saved the chunk size for each encoding. Only the most common encodings are used in the final analysis, to keep the result relevant to the research questions.

The encodings we chose to analyse for most categories were 144p vs 240p, 144p vs 480p and 480p vs 720p. These encodings were chosen to give a broad analysis; otherwise the analysis would contain too many comparisons. We perform an analysis of the correlation between two closely related low encodings, between two closely related high encodings


and between one low and one high encoding. Some exceptions were made for videos that did not support some of the qualities (e.g., lyrics), and in some cases higher qualities were included when available (e.g., gaming and 360°). 360° videos have much bigger files and differ a lot compared to the other categories; therefore, we analyze this category separately.

3.4 Video search

When making a YouTube search for the category music videos, we used the keyword "Music Video", and when searching for videos in the lyrics category we made a search with only the prefixes "Lyrics", "(Lyrics)", and "[Lyrics]".

To find videos in the 360° category, we used the keyword "360" and selected only videos tagged with the 360° category that is available on YouTube. For the gaming category, the keyword "gaming" was used and we selected videos with live gameplay (screen sharing).

For the search for sports with movement, we decided to focus on hockey games because the guarantee of movement is high. The keywords used were "hockey games" and "hockey highlights". For the sports videos with minimal movement, we chose golf videos, in contrast to the hockey games and highlights. The keyword used was "golf match".

While using recently uploaded videos is positive for the variation in uploaders, it gives us less opportunity to analyse how the encoding has changed over the years. Therefore, we chose to omit an analysis over the years.

Metadata for the videos was saved into a matrix during download. This included the columns "URL", "Upload Date", "Uploader", "Uploader ID", "Duration", "Category", "Views for the specific video", "Total amount of views for the uploader", "Likes", "Dislikes", "Fetching Date", "Number of comments", "Number of keywords", "Number of subscribers", "Number of videos for the uploader", and "Full Title".
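As a hedged illustration, one such metadata row could be assembled roughly as follows. `metadata_row` is a hypothetical helper, the info-dict keys are assumptions based on youtube-dl's documented output, and the sample values are placeholders rather than real measurements.

```python
# metadata_row is a hypothetical helper; the info-dict keys used below are
# assumptions based on youtube-dl's documented output, and the sample values
# are placeholders, not real measurements.
def metadata_row(info):
    return {
        "URL": info.get("webpage_url"),
        "Upload Date": info.get("upload_date"),
        "Uploader": info.get("uploader"),
        "Duration": info.get("duration"),
        "Views for the specific video": info.get("view_count"),
        "Likes": info.get("like_count"),
        "Full Title": info.get("title"),
    }

info = {"webpage_url": "https://www.youtube.com/watch?v=example",
        "upload_date": "20190401", "uploader": "example_channel",
        "duration": 212, "view_count": 10_000, "like_count": 250,
        "title": "Example video"}
print(metadata_row(info))
```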

3.5 Limitations

The main limitations are the size of the resulting data set and the access to the YouTube API. In most cases a bigger data set is preferable; this holds here as well, as it would have allowed for improved confidence in the results and reduced the impact of outliers.

The access to the YouTube API is limited. We use it in order to extract information about a user or a video. However, fields such as 'channel id' are blocked for many channels, and some other fields, such as the age of the uploader, are inaccessible for regular users such as us.


4 Results

A collection of 50 videos each was made for the following categories: music videos, lyrics videos, sports with high and minimal movement, 360° videos, and gaming videos.

4.1 Per-chunk-based correlation analysis

Music

Figure 4.1 shows scatter plots and correlations for three example videos belonging to the music category. The specification for this category is videos containing both music and movement, with a most common quality limit of 1080p, which is not displayed in the results.

Each graph is associated with a correlation value presented in the graph. The correlation value is a numeric representation of the correlation between two different qualities. A correlation of one represents full correlation between the chunk sizes of the two encoding qualities, and a high correlation can be interpreted as allowing smooth transitions between qualities. The figures are plotted using the Pearson correlation, meaning that the correlation is high if the relationship between the fragment sizes of the two qualities is linear.
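As a small illustration of the metric, the Pearson correlation between the chunk-size series of two qualities can be computed as below; the chunk sizes are made-up numbers, not data from the thesis data set.

```python
import numpy as np

# Illustrative chunk sizes (kB) for the same video at two qualities.
# These are made-up numbers, not measurements from the thesis data set.
chunks_144p = np.array([110.0, 95.0, 130.0, 120.0, 105.0])
chunks_480p = np.array([520.0, 470.0, 610.0, 580.0, 500.0])

# Pearson correlation coefficient: close to 1.0 when the chunk sizes
# of the two qualities scale linearly with each other.
r = np.corrcoef(chunks_144p, chunks_480p)[0, 1]
print(round(r, 3))
```

Because the two series here scale almost linearly, the coefficient lands near 1; spread-out scatter plots like those for the low qualities correspond to values well below that.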

In Figure 4.1, we observe the highest correlation between 480p and 720p, which are the highest qualities in the comparison. We can also observe that the chunk sizes for 480p and 720p in Figure 4.1(c) have a different shape compared to the results shown in Figures 4.1(a) and 4.1(b), meaning that the sizes of the chunks differ more. The lower correlation for the smaller encodings may be due to the start-up phase of the video, which occurs before the video has fully loaded and received all information about the user's capacity.


(a) 144p vs 240p (b) 144p vs 480p (c) 480p vs 720p

Figure 4.1: Scatter plots and correlations between different example encodings for three example videos in the music category.

A box and Whisker plot consists of a box and multiple quartiles. Quartile one (the lower boundary of the box) represents the limit for 25% of the correlation values, and quartile three (the upper boundary of the box) represents 75%. The red marker represents the median of the values. The circles (outliers) in the diagram represent extreme values. The lower and upper whiskers show the 10th and 90th percentiles, respectively. The diagram provides an overview of the results.
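The statistics behind each box can be sketched numerically; the per-video correlation values below are illustrative only, not values from the thesis.

```python
import numpy as np

# Illustrative per-video correlation values for one quality pair;
# the thesis plots these as a box and Whisker diagram.
corrs = np.array([0.35, 0.52, 0.61, 0.66, 0.70, 0.74, 0.78, 0.81, 0.86, 0.93])

q1, med, q3 = np.percentile(corrs, [25, 50, 75])     # the box and the median
lo_whisk, hi_whisk = np.percentile(corrs, [10, 90])  # the whiskers
print(q1, med, q3, lo_whisk, hi_whisk)
```

Any value falling outside the 10th-90th percentile whiskers would be drawn as an outlier circle.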

Figure 4.2 shows box and Whisker plots for 50 videos belonging to the category music video. We can see that higher qualities have higher correlation and that the lowest correlations involve 144p. This is supported by the observations from Figure 4.1. In addition, an interesting observation is that adjacent qualities have higher correlation.

(a) 144p (b) 240p (c) 360p (d) 480p

Figure 4.2: Box and Whisker plots for the category music video.

Figure 4.3 shows scatter plots and correlations for three example videos belonging to the category music lyrics. The specification for this category is videos containing music but with only lyrics text on the screen. Lyrics videos have less movement than the music videos presented in the previous category and have, in most cases, a quality limit of 480p.

As read from Figure 4.3, the correlation values are a bit lower compared to music videos. We can see that the correlation is distinctly higher for higher qualities, which is similar to the music video category. The most notable difference between music videos and lyrics is the placement of each video. In Figure 4.1, the chunk sizes are spread out, while we can see a distinct separation between each of the three example videos in Figure 4.3. This may be due to the transitions between scenes. Lyrics videos display similar content during all scenes,


meaning that all chunks contain similar content. Music videos have a lot of movement and are less predictable. The content of a music video varies a lot more, with the consequence that the chunk sizes vary more between scenes.

In addition, we note that music videos have chunks of relatively similar size, proportional to the quality: higher qualities have bigger chunks. For lyrics videos, the chunk sizes depend more on the individual video. This may also be explained by the similarity in video content between scenes.

(a) 144p vs 240p (b) 144p vs 480p (c) 360p vs 480p

Figure 4.3: Scatter plots and correlations between different example encodings for three example videos in the lyrics category.

Figure 4.4 shows box and Whisker plots for 50 videos belonging to the category lyrics video. The boxes are generally longer compared to the boxes for music videos. The reason is probably that lyrics videos generally have lower quality limits. Moreover, the lyrics videos do not support 720p, which had the highest correlation for music videos.

(a) 144p (b) 240p (c) 360p (d) 480p

Figure 4.4: Box and Whisker plots for the category lyrics video.

Comparing music videos and lyrics videos, the most observable difference is the chunk correlation within each video. Lyrics videos have high correlation within each video because of the similarity in content.


Sports

Figure 4.5 shows scatter plots and correlations for three example videos belonging to the category sports with high movement. The specification for this category is sports with a lot of action, e.g., hockey games and highlight videos. The sports (high movement) category has a most common quality limit of 720p. From Figure 4.5, we again observe the highest correlation between 480p and 720p. Here, we can also observe a distinct difference for the lower qualities, Figures 4.5(a) and 4.5(b). The lower qualities have varying correlations and spread-out chunk sizes, while the correlations for the highest qualities are high and the chunk sizes more proportionally spread. This may also be explained by the start-up phase, which is often played in lower qualities. The negative correlation values observed in the figure are only due to the axes and are handled without the minus sign.

(a) 144p vs 240p (b) 144p vs 480p (c) 480p vs 720p

Figure 4.5: Scatter plots and correlations between different example encodings for three example videos in the sports high movement category.

Figure 4.6 shows box and Whisker plots for 50 videos belonging to the category sports with high movement. Again, we can observe the same result as in Figure 4.5: the correlation is higher for higher qualities. We can also observe that the boxes are longer than in the previous box plots. The length of the boxes may depend on the negative correlation values, which make the range of values bigger. Therefore, the length of the boxes is not useful in this analysis.

(a) 144p (b) 240p (c) 480p (d) 720p

Figure 4.6: Box and Whisker plots for the category sports with high movement.


Figure 4.7 shows scatter plots and correlations for three example videos belonging to the category sports with minimal movement. The specification for this category is sports without as much movement as hockey games and highlight videos (e.g., golf). The sports (minimal movement) category has a most common quality limit of 1080p, which is not displayed in the results.

The correlations in Figure 4.7 are distinctly higher compared to sports with high movement. Similar to the music videos and lyrics, this can be explained by the transitions between scenes. Sports with minimal movement display similar content between scenes, which high movement videos do not. The content that changes in a golf game is mostly the arm movement of the player and the golf ball, while the players in a hockey game change position a lot.

Similar to the other compared categories, the highest correlation is related to the highest qualities, even though sports with minimal movement has high correlations for all qualities.

(a) 144p vs 240p (b) 144p vs 480p (c) 480p vs 720p

Figure 4.7: Scatter plots and correlations between different example encodings for three example videos in the sports minimal movement category.

Figure 4.8 shows box and Whisker plots for 50 videos belonging to the category sports with minimal movement. We can observe consistency with Figure 4.7: the correlation is high for all qualities except 144p, which is low in all categories. We can see that minimal movement has higher correlation compared to sports with high movement, but also more extreme values.

(a) 144p (b) 240p (c) 480p (d) 720p

Figure 4.8: Box and Whisker plots for the category sports with minimal movement.

The main difference between sports with minimal movement and sports with high movement is that minimal movement has a distinctly higher correlation, and each video has a high


correlation because of the similarity between scenes. Sports with minimal movement correlates in a similar way to lyrics videos.

Gaming

Figure 4.9 shows scatter plots and correlations for three example videos belonging to the category gaming. We have specified this category as gaming in an original 2D environment; it has a most common quality limit of 1080p.

The gaming category generally has large files, making it interesting to look at higher qualities as well. Similar to all other categories, we can observe that gaming has the highest correlation for 480p vs 720p and 360p vs 480p. The correlation is low for lower qualities, probably depending on the scene transitions. A gaming video also contains a lot of movement, making it hard to predict the next scene. Furthermore, something observable is that the correlation is lower for the highest qualities, 720p vs 1080p. We can also see that the three example videos are more sectioned in Figures 4.9(c) and 4.9(d). This may be due to the content of each video: a video with a specific type of content may have specific chunk sizes.

(a) 144p vs 240p (b) 144p vs 480p (c) 480p vs 720p

Figure 4.9: Scatter plots and correlations between different example encodings for three example videos in the gaming category.

(d) 360p vs 480p (e) 240p vs 1080p (f) 720p vs 1080p

Figure 4.9 (continued): Scatter plots and correlations between different example encodings for three example videos in the gaming category.

Figure 4.10 shows box and Whisker plots for 50 videos belonging to the category gaming. Again, the long, protruding boxes may be due to the negative correlation values. We can observe that adjacent qualities have higher correlation and that the correlation to 144p is the lowest for all qualities, probably depending on the difference in chunk sizes.

The box and Whisker plots make a comparison between qualities. All categories have comparisons up to 720p, except for lyrics, which does not support that quality.


(a) 144p (b) 240p (c) 480p (d) 720p

Figure 4.10: Box and Whisker plots for the category gaming.

The main similarity between all categories is that categories with similar scenes have a higher correlation within each video. Common to all categories is also that the highest correlation was found in the comparison between 480p and 720p. An exception is made for lyrics videos, because that category did not support the quality.

From the box and Whisker plots, it can be seen that adjacent qualities generate a higher correlation to each other, which also motivates an implementation that switches qualities one step at a time [5]. An example can be taken from Figure 4.2, where we see that (c) and (d) have less correlation to 144p compared to (a) and (b), which are closer qualities.


4.2 Temporal correlations within encodings

Music

Figure 4.11 shows the same video set as in Figure 4.1 as a representation of k chunks apart with different values of k at quality 480p. We can observe the best correlation for k = 1, even though the values are spread out for all k. Here, k = 1 represents the correlation to the next chunk, which is expected to be higher.
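This lag-k comparison can be sketched as follows; `lag_correlation` is a hypothetical helper, not the thesis script, and the chunk sizes below are illustrative numbers.

```python
import numpy as np

# lag_correlation is a hypothetical helper: Pearson correlation between
# the chunk-size series and the same series shifted k chunks forward.
def lag_correlation(sizes, k):
    s = np.asarray(sizes, dtype=float)
    return float(np.corrcoef(s[:-k], s[k:])[0, 1])

# A slowly varying, made-up series: neighbouring chunks resemble each
# other, so the correlation should be strongest for k = 1.
sizes = [500, 520, 540, 560, 550, 530, 510, 490, 470, 460, 480, 500]
for k in (1, 3, 5):
    print(k, round(lag_correlation(sizes, k), 3))
```

For smoothly varying content, the k = 1 correlation stays high while larger lags decay, which is the pattern the scatter plots in this section examine.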

(a) k=1 (b) k=3 (c) k=5

Figure 4.11: Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the music category.)

Figure 4.12 shows the same video set as in Figure 4.3 as a representation of k chunks apart with different values of k at quality 480p.

(a) k=1 (b) k=3 (c) k=5

Figure 4.12: Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the lyrics category.)

Comparing the k = 1 graphs for music videos and lyrics, we can see that the correlation for lyrics is distinctly higher. We can also see that each lyrics video is concentrated around itself for all values of k, in contrast to the music videos, which are spread out. This is again supported by the similarity between scenes.

For k ≥ 2, the correlation varies for both music videos and lyrics, meaning that chunks more than one chunk apart do not show any clear correlation.


Sports

Figure 4.13 shows the same video set as in Figure 4.5 as a representation of k chunks apart with different values of k at quality 480p. We can observe that the correlation is varying and that the chunk sizes are spread out for all values of k.

(a) k=1 (b) k=3 (c) k=5

Figure 4.13: Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the sports high movement category.)

Figure 4.14 shows the same video set as in Figure 4.7 as a representation of k chunks apart with different values of k at quality 480p. Similar to lyrics (Figure 4.12), the correlation is high for k = 1 and the three example videos are concentrated around themselves. This can be explained by the same reasoning as for music and lyrics: the similarity in content increases the correlation to the next chunk.

(a) k=1 (b) k=3 (c) k=5

Figure 4.14: Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the sports minimal movement category.)


Gaming

Figure 4.15 shows the same video set as in Figure 4.9 as a representation of k chunks apart with different values of k at quality 480p. The correlation is high for k = 1 and varies for the other values. However, the three example videos are not each concentrated around themselves, as for lyrics and sports with minimal movement; the correlation comes from the concentration of all three videos together.

(a) k=1 (b) k=3 (c) k=5

Figure 4.15: Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the gaming category.)

One general similarity between all the categories is the behaviour for k = 1: the correlation is relatively high for the categories lyrics, sports with minimal movement, and gaming. The high correlation for lyrics and gaming is due to the similarity between scenes.

4.3 360° videos

Figure 4.16 shows scatter plots and correlations for three example videos belonging to the category 360°. The specification for this category is videos created in a 3D/360° format, with a most common quality limit of 2880p. Observed from Figure 4.16 is that 480p and 720p have the highest correlation. Similar to the gaming videos, the correlation peaks for this comparison. Something observable is that the correlation for all qualities is relatively high, which is explained by the need for realistic content. A 360° video is supposed to look realistic, which it would not be with low correlations between scenes.

(a) 144p vs 240p (b) 144p vs 480p (c) 480p vs 720p

Figure 4.16: Scatter plots and correlations between different example encodings for three example videos in the 360° category.


(d) 1080p vs 1440p (e) 144p vs 2880p (f) 2160p vs 2880p

Figure 4.16 (continued): Scatter plots and correlations between different example encodings for three example videos in the 360° category.

Figure 4.17 shows the same video set as in Figure 4.16 as a representation of k chunks apart with different values of k at quality 480p. We can see that the correlation is relatively high for k = 1, which supports the theory of smooth transitions between scenes. A high correlation to the next chunk makes the transition to the next scene smoother.

(a) k=1 (b) k=3 (c) k=5

Figure 4.17: Scatter plots and correlations between chunk sizes of chunks separated by k chunks. (Three example videos from the 360° category.)

Figure 4.18 shows box and Whisker plots for 50 videos belonging to the category 360°. Observed from the figure is that the correlation is smallest for the lower qualities and higher for adjacent ones.


(a) 144p (b) 240p (c) 480p (d) 720p

Figure 4.18: Box and Whisker plots for the category 360°.

The main difference between 360° and all the other categories is the size of the files and the quality limit, which is higher for 360°. This is due to the need for a realistic approach and the amount of data required. The size of the video files is also connected to the high quality limit. While lyrics videos have a maximum quality of 480p, 360° videos support qualities up to 2880p. The observation can be explained by the fact that a 360° video requires more bandwidth because of its bigger files. A 360° video offers the user possible viewing directions, meaning that a lot more scene content must be produced compared to a 2D video where the user does not have any choices. This is the reason 360° videos have the largest number of qualities, and most of the data is in most cases not used by the user; only about 20% of the total data is used [1]. A lyrics video is displayed in 2D and contains nothing but rolling text, which is similar from one scene to another. The user does not have any choice of display angle, and thereby the videos do not require content produced for every possible choice. This applies to all the other video categories as well.

Generally, for most categories, as the quality increases, the chunk size also increases, which is supported by the comparison of 144p vs. 240p [3]. The 240p quality has approximately twice as large chunks. For 720p, the chunk size is approximately 1000 kB. This probably applies to the lyrics videos as well, even though they cannot be displayed in 720p, supported by the increase in chunk size from 360p to 480p. In line with 360° videos having larger chunks because of larger files, lyrics videos contain smaller files because of the limited data content produced from their scenes and movement.


4.4 Correlation between number of subscribers

Figure 4.19 shows a box and Whisker plot of the correlation between 480p and 720p for videos uploaded by uploaders with different numbers of subscribers. The plot includes 81 videos. Out of the 81 videos considered here, 31 videos are associated with uploaders with less than 1000 subscribers, 23 videos with an uploader with less than 100000 but more than 1000 subscribers, and 27 videos with an uploader with more than 100000 subscribers.

We can see from Figure 4.19 that as the number of subscribers of a channel increases, the correlation increases. We can also see that videos from uploaders with fewer subscribers have more varying correlation values. Perhaps this is due to more advanced equipment supporting higher qualities: an uploader with more subscribers tends to have a more serious interest in video making. In addition, there are also beginner uploaders, who have not yet been able to attract as many subscribers but who may use more advanced video production. That can be a reason for the accounts with fewer subscribers to have more varying correlation values. Another aspect can be the number of views. A video from an uploader with more subscribers generally has more views, which might be due to some properties of the content, which itself might affect the video's and the uploader's popularity.

Figure 4.19: Box and Whisker plot comparing different number of subscribers for 480p and 720p.
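The grouping used above can be sketched as a simple binning; `subscriber_bin` is a hypothetical helper reproducing the three bins, and the (subscriber count, correlation) pairs are made up.

```python
# subscriber_bin is a hypothetical helper reproducing the three bins of
# Figure 4.19; the (subscribers, correlation) pairs below are made up.
def subscriber_bin(subs):
    if subs < 1_000:
        return "<1k"
    if subs < 100_000:
        return "1k-100k"
    return ">=100k"

videos = [(350, 0.41), (12_000, 0.63), (2_500_000, 0.88), (900, 0.55)]
groups = {}
for subs, corr in videos:
    groups.setdefault(subscriber_bin(subs), []).append(corr)
print(groups)
```

Each bin's list of correlation values would then feed one box of the box and Whisker plot.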


5 Discussion

5.1 Results

The qualities we chose to analyse were the highest ones possible with consideration to all categories. We chose 720p as the highest quality because it was the highest quality supported by most videos. Exceptions were made for lyrics, 360° and gaming, because these differed in lower and higher qualities, respectively. Our initial thought was to use higher qualities for the comparison, but this was impractical because some categories did not support them.

Consideration must be given to the axis scaling, which was necessary because of the varying sizes. Most graphs are scaled approximately the same, with some exceptions. Furthermore, one observation we made is that negative values can affect the boxes, making them longer. It would thereby be desirable to only produce positive values if redoing this data set. This can be done by adapting the axes to the content.

From the results, we can conclude that videos containing similar content during the whole playback generally have higher correlation. This is due to the similarity between chunks: videos displaying similar content generate similar chunks. Videos containing many different scenes and a lot of movement generate less similar chunks. Every chunk covers the same time interval, making a chunk containing a calm scene differ a lot from a chunk covering a time interval with a lot of movement.

From the results, it can be seen that 360° videos offered distinctly higher qualities than most of the other categories. This is explained by the fact that a 360° video produces more content, even though most of it is not used [1]. Our initial thought was that 360° would have higher correlation because of the desired realistic experience when switching between scenes; a non-smooth transition between chunks would affect the video a lot more than in a 2D video. Finally, from the results, we can verify that 360° videos generally have a higher correlation for all qualities.

5.2 Method

The trustworthiness of the results depends on the size of the data set. With a bigger data set, we could have analysed more video categories and more videos per category. This minimizes the risk that the selected videos deviate from the regular chunk size characteristics of the specific category. In addition, a bigger data set would give us the opportunity to make


a deeper analysis of the encoding types within categories, and not only focus on the chunk size correlation between categories. For example, we could have made a more detailed analysis of the sports category and included all possible qualities for multiple types of sports.

Another restriction of the method was the access to the YouTube API. When extracting information about the user using the YouTube API, some of the fields were inaccessible for a regular user. This includes fields such as 'channel id', which is blocked for some channels. In addition, we wanted information about the user such as age, first published video (in years), etc.

Randomizing algorithm

The method used to choose a random URL is based on the functions of the YouTube API. For example, receiving a URL for a YouTube video with a specific prefix and postfix is done with YouTube's search() function. The complete implementation of this function is not public, and the result is thereby not guaranteed to be completely random. YouTube uses a random number in base 64 to give an uploaded video a unique ID, with most YouTube URLs containing eleven characters [6]. This means that most eleven-character URLs do not belong to a video, and thereby a script that completely randomizes an eleven-character URL to download could not be used. This might affect the data set, because we wanted to analyse 50 videos per category without any correlation to each other.

Our preference was to use a script generating a totally random YouTube ID. This was not possible because of YouTube's handling of video IDs. YouTube draws a random number in base 64 each time a video is uploaded. The possible number of URLs is 7.38 × 10^19, where most of the URLs do not contain an uploaded video [4]. A script searching for a random video in a specific category would take a massive amount of time. The solution we found was to use the YouTube API, which generated a random video based on prefix and/or postfix input [6]. This was performed using embedded functions. A disadvantage of this method is the weaker guarantee of randomness that we would have had with the first approach.
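The size of this ID space is easy to verify arithmetically:

```python
# Eleven ID characters drawn from a 64-symbol alphabet give 64^11
# possible video IDs, matching the 7.38 * 10^19 figure cited from [4].
total_ids = 64 ** 11
print(total_ids)  # 73786976294838206464
```

At that scale, even probing millions of random IDs per second would find valid videos far too rarely to build a data set.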

Because of design principles, we could not represent all 50 videos in readable figures, and thereby we had to select three of them, represented with different colours. Even though we got a good representation of each category, the result would be even more trustworthy if the whole data set were represented in the graphs.

5.3 The work in a wider context

The usage of adaptive streaming is an efficient technique for increasing the user experience. The need for automated video streaming in today's society is an important aspect of fulfilling human demands on time and comfort. Today, video streaming constitutes a majority of the network traffic, and a streaming service such as YouTube might lose customers because of buffering and stalling. To avoid this, DASH clients implement new, more personally adapted algorithms. Receiving information from conversations over HTTP is complex because of the security mechanisms it comes with, but there are developers trying to obtain this information in order to provide better algorithms that increase the QoE for the client [12]. One example is using buffer conditions from the client: a client's sensitivity to delays probably depends on the size of the local playback buffer, which makes it preferable to take that into account in the quality decision.

In addition, streaming techniques might be based on popularity, personal usage, personal data, age, location, etc., which can be a threat to integrity [2]. Age is used as a recommendation tool by YouTube to select relevant videos for the user. Adapting the streaming service to the category of the video based on personal usage exposes the user's personal choices. A user watching a video of a hockey game on YouTube will, with high probability, receive another related hockey video as a recommendation. Therefore, a user's recommended videos may reveal a lot about that user's past usage.


6 Conclusion

The purpose of this thesis was to find out how chunk sizes correlate depending on quality and category, focusing on 360° compared to regular 2D videos. The aim was to download videos from YouTube.com using tools, to gain an understanding of how chunk sizes correlate. The results show that qualities closely related to each other generate higher chunk correlation, and the highest correlation is between the qualities 480p and 720p. Furthermore, categories containing similar scenes, such as lyrics and sports with minimal movement, have higher correlation within each video compared to videos with lots of movement. In addition, the chunk sizes increase proportionally to the quality, and the maximum quality of a video category is dependent on the amount of bandwidth required. Furthermore, 360° videos generally have high correlation for all qualities, and have many qualities compared to a regular 2D video. For future work on this subject, an interesting aspect is to collect more data, including more categories, to broaden the analysis. Something that we did not have time for in this thesis, but that we think would be interesting to analyse, is how YouTube has changed its encoding standard throughout the years. An interesting aspect is to analyse videos from, e.g., ten years ago, compared to recently uploaded videos, to see if old videos are provided with new encodings. In addition, it is relevant to compare the behaviour of YouTube videos with other streaming services (e.g., Netflix).


Bibliography

[1] Mathias Almquist, Viktor Almquist, Vengatanathan Krishnamoorthi, Niklas Carlsson, and Derek Eager. “The Prefetch Aggressiveness Tradeoff in 360 Video Streaming”. In: Amsterdam, Netherlands, June 2018. Proc. ACM Multimedia Systems (ACM MMSys).

[2] Youmna Borghol, Sebastien Ardon, Niklas Carlsson, Derek Eager, and Anirban Mahanti. “The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity”. In: Beijing, China, Aug. 2012. Proc. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (ACM KDD), pp. 1186–1194.

[3] Shin-Hung Chang, Kuan-Jen Wang, and Jan-Ming Ho. “Optimal DASH Video Scheduling over Variable-Bit-Rate Networks”. In: Taipei, Taiwan, Dec. 2018. Proc. International Symposium on Parallel Architectures, Algorithms and Programming (PAAP).

[4] Xianhui Che, Barry Ip, and Ling Lin. “A Survey of Current YouTube Video Characteristics”. In: vol. 22. 2. IEEE MultiMedia, Apr. 2015, pp. 56–63.

[5] Jan De Cock, Zhi Li, Megha Manohara, and Anne Aaron. “Complexity-based consistent-quality encoding in the cloud”. In: Proc. IEEE International Conference On Image Processing (ICIP), Sept. 2017.

[6] Google Developers. YouTube API. Apr. 6, 2019. URL: http://code.google.com/apis/youtube/overview.html.

[7] youtube-dl developers. youtube-dl. May 3, 2019. URL: http://ytdl-org.github.io/youtube-dl/about.html.

[8] Dilip Kumar Krishnappa, Divyashri Bhat, and Michael Zink. “DASHing YouTube: An analysis of using DASH in YouTube video service”. In: Sydney, Australia, Oct. 2013. Proc. IEEE Local Computer Networks (IEEE LCN).

[9] Dag H. Finstad, Hakon K. Stensland, Havard Espeland, and Pal Halvorsen. “Improved Multi-Rate Video Encoding”. In: Dana Point, CA, USA, Dec. 2011. Proc. IEEE International Symposium on Multimedia (ISM).

[10] Google. Rekommenderade kodningsinställningar för uppladdning [Recommended upload encoding settings]. Mar. 22, 2019. URL: https://support.google.com/youtube/answer/1722171?hl=sv.

[11] Vengatanathan Krishnamoorthi, Niklas Carlsson, and Emir Halepovic. “Slow but Steady: Cap-based Client-Network Interaction for Improved Streaming Experience”. In: Banff, Canada, June. 2018. Proc. IEEE/ACM International Symposium on Quality of Service (IEEE/ACM IWQoS).


[12] Vengatanathan Krishnamoorthi, Niklas Carlsson, Emir Halepovic, and Eric Petajan. “BUFFEST: Predicting Buffer Conditions and Real-time Requirements of HTTP(S) Adaptive Streaming Clients”. In: Taipei, Taiwan, June. 2017. Proc. ACM Multimedia Systems (ACM MMSys), pp. 76–87.

[13] Kaixuan Long, Chencheng Ye, Ying Cui, and Zhi Liu. “Optimal Multi-Quality Multicast for 360 Virtual Reality Video”. In: Proc. IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, United Arab Emirates, Dec. 2018.

[14] mkvtoolnix. mkvinfo. Apr. 18, 2019. URL: https://mkvtoolnix.download/doc/mkvinfo.html.

[15] Juan J. Ramos-Munoz, Jonathan Prados-Garzon, Pablo Ameigeiras, Jorge Navarro-Ortiz, and Juan M. Lopez-Soler. “Characteristics of mobile YouTube traffic”. In: IEEE Wireless Communications, Feb. 2014, pp. 18–25.

[16] Bernhard Rieder, Ariadna Matamoros-Fernández, and Òscar Coromina. “From ranking algorithms to ‘ranking cultures’: Investigating the modulation of visibility in YouTube search results”. In: Convergence, vol. 24, no. 1, Feb. 2018, pp. 50–68.

[17] Di Shuang, Yongxiang Zhao, Li Chunxi, and Guo Yuchun. “An energy-aware chunk selection mechanism in HTTP video streaming”. In: Proc. Wireless Communications and Signal Processing (WCSP), Yangzhou, China, Oct. 2016.

[18] Ong Zhang, Fengyuan Ren, and Wenxue Cheng. “Modeling and analyzing the influence of chunk size variation on bitrate adaptation in DASH”. In: Proc. IEEE INFOCOM, Atlanta, GA, USA, May 2017.
