
School of Innovation, Design and Engineering
Mälardalen University Press Licentiate Theses

No. 112

TOWARDS OPTIMAL QUALITY OF EXPERIENCE

VIA SCALABLE VIDEO CODING

Pengpeng Ni

2009

Copyright © Pengpeng Ni, 2009
ISSN 1651-9256

ISBN 978-91-86135-46-1


Abstract

To provide a universal multimedia experience, multimedia streaming services need to transparently handle the variation and heterogeneity of operating environments. From the standpoint of a streaming application, video adaptation techniques cope with environmental variations by manipulating the video content itself. Scalable video coding (SVC) schemes, like the one specified by the H.264 standard and its SVC extension, are highly attractive for designing a self-adaptive video streaming system. When SVC is employed in a streaming system, the produced video stream can easily be truncated or tailored into several sub-streams that can be decoded separately to obtain a range of preferred picture sizes, qualities and frame rates. However, how to perform the adaptation using SVC, and how much adaptation SVC enables, remain open research issues. We still lack a thorough understanding of how to automate the scaling procedure in order to achieve an optimal video Quality of Experience (QoE) for end users.

Video QoE depends highly on human perception. In this thesis, we introduce several video QoE studies around the usability of H.264 SVC. Several factors that contribute significantly to the overall QoE have been identified and evaluated in these studies. As examples of application-usage-related factors, playback smoothness and application response time are critical performance measures that can benefit from temporal scalability. Targeting applications that require frequent interactivity, we propose a transcoding scheme that fully utilizes the benefits of the Switching P and Switching I frames specified in H.264 to enhance a video stream's temporal scalability. Focusing on visual-quality-related factors, a series of carefully designed subjective quality assessment tests have been performed on mobile devices to investigate the effects of multi-dimensional scalability on human quality perception. Our study reveals that QoE degrades non-monotonically with bitrate and that scaling order preferences are content-dependent. Another study finds that the flickering effect caused by frequent switching between layers in SVC-compliant bit-streams is highly related to the switching period. When the period is above a certain threshold, the flickering effect disappears and layer switching should not be considered harmful. We have also examined user-perceived video quality in 3D virtual worlds. Our results show that an avatar's distance to the virtual screen in a 3D world contributes significantly to the video QoE, i.e., for a wide range of distortions there always exists a feasible virtual distance from which the distortion is not detectable for most people, which makes video adaptation worthwhile.

The work presented in this thesis is intended to help improve the design of self-adaptive video streaming services that can deliver video content independently of network technology and end-device capability while seeking the best possible video experience.


Preface

There are many people I wish to thank for their support during the years I have spent working for this licentiate thesis.

The work presented in this thesis would not have been possible without the encouragement and guidance of my supervisors. I would like first to thank my main supervisor Dr. Damir Isovic for believing in me and giving me constant and patient support throughout this work. Second, many thanks go to my external supervisors Prof. Carsten Griwodz and Prof. Pål Halvorsen, for guiding me to continue my research. I also want to thank my former supervisor Prof. Gerhard Fohler and my industrial mentor Dr. Isak Jonsson for introducing me to the journey of research. I would then like to express my great gratitude to Dr. Alexander Eichhorn for useful discussions and advice in the later period of my research work. You have been a great co-author and provided very useful input to the thesis.

Finally, I am grateful to my parents and my beloved husband Hongzhou for always encouraging me to pursue my ideals.

The research presented in this thesis has been supported both by the Swedish Knowledge Foundation's grant for industrial graduate students at small-scale companies and by the Norwegian Research Council through the Robust Multimedia Streaming Services (ROMUS) project.

Pengpeng Ni
Västerås, August 1, 2009


Contents

I Thesis

1 Introduction
1.1 Adaptive Video Streaming
1.1.1 Video compression
1.1.2 The H.264 Video Coding standards
1.1.3 Quality of Experience
1.1.4 Video QoE factors
1.2 This thesis
1.2.1 Application domain
1.2.2 Motivation and goal
1.3 Contributions
1.4 Overview of Papers
1.4.1 Paper A
1.4.2 Paper B
1.4.3 Paper C
1.4.4 Paper D
Bibliography

II Included Papers

2 Paper A: Support for Digital VCR functionality over Networks for H.264/AVC
2.1 Introduction
2.2 Related work, motivation and approach
2.3 Multi-layered Video Sequence
2.4 Adaptive Encoding Scheme
2.5 Frame-layer rate control for S-frames
2.5.1 Quadratic rate distortion model
2.5.2 Multi-layered rate control algorithm
2.6 Experiments
2.7 Conclusions
Bibliography

3 Paper B: Pick your Layers wisely - A Quality Assessment of H.264 Scalable Video Coding for Mobile Devices
3.1 Introduction
3.2 Related work
3.3 Field Study Design
3.3.1 Content Selection and Encoding
3.3.2 Scalable Operation Points
3.3.3 Subjective Assessment Procedures
3.3.4 Limitations
3.4 Bitstream analysis
3.4.1 Scaling Granularity and Diversity
3.4.2 Packet Statistics
3.5 SVC Quality Assessment
3.5.1 Noticeable QoE Degradations
3.5.2 Scaling Order Preferences
3.5.3 Objective Model Performance
3.6 Conclusions
Bibliography

4 Paper C: Fine-grained scalable streaming from coarse-grained videos
4.1 Introduction
4.2 Related Work
4.3 Experiment Design
4.3.1 Material
4.3.2 Scalable Operation Points and Switching Patterns
4.3.3 Procedure
4.3.4 Participants
4.4 Results
4.4.2 Quality Layer Switching
4.4.3 Multi-dimensional Layer Switching
4.5 Discussion
4.5.1 Preferences for Switching and Scaling
4.5.2 Fine-grained Bit-rate Adaptation
4.6 Conclusion
Bibliography

5 Paper D: Video Streaming into Virtual Worlds - the effects of virtual screen distance and angle on perceived quality
5.1 Introduction
5.2 Experiment Design
5.2.1 Implementation
5.2.2 Video Material
5.2.3 Procedure
5.3 Results
5.3.1 The Effects of Distance
5.3.2 The Effects of Angle
5.4 Discussion
5.5 Conclusion


List of Figures

1.1 Layered video structure of scalable video coding
2.1 Encoding to support DVCR
2.2 Remote backward playback
2.3 Simplified SP-frame Encoder
2.4 Bitrate fluctuations for different coding methods
2.5 Transmission delay comparison
2.6 PSNR comparison
3.1 Selected Video Sequences
3.2 Bitrate allocation for scalable OPs [Sm, Tn] in the Sports sequence, where Sm represents the m-th spatial resolution and Tn the n-th temporal resolution. Each bar column can be additionally truncated into 4 quality layers identified by Qi.
3.3 Cumulative distribution function (CDF) of NALU packet sizes for selected operation points of the Sports sequence
3.4 Subjective DMOS scores for selected sequences as means with 95% confidence intervals. QoE gradients for within-dimension scaling are shown as lines. Note that higher DMOS scores mean lower QoE and that the bitrate drops from 1.5 Mbit for OP1 to 128 Kbit for OP6.
4.1 Bitstream layout for downscaling and layer switching options used in the experiments. Q and T denote the quality and temporal dimension respectively.
4.2 Bit-rate of different operation points and switching patterns
5.1 Virtual world video screen
5.2 Sample screenshots from the prototype
5.3 Frame samples at different video quality levels (quantization level increase)
5.4 Different quality levels (resolution reduction)


List of Publications

1. Pengpeng Ni, Damir Isovic, and Gerhard Fohler. User-friendly H.264/AVC for remote browsing. In MULTIMEDIA '06: Proceedings of the 14th annual ACM international conference on Multimedia, pages 643–646, Santa Barbara, CA, USA, 2006. ACM.

2. Pengpeng Ni and Damir Isovic. Support for digital VCR functionality over network for H.264/AVC. In ICC '08: IEEE International Conference on Communications, pages 520–525, Beijing, China, May 2008.

3. Alexander Eichhorn and Pengpeng Ni. Pick your Layers wisely - A Quality Assessment of H.264 Scalable Video Coding for Mobile Devices. In ICC '09: IEEE International Conference on Communications, Dresden, Germany, 2009.

4. Pengpeng Ni, Alexander Eichhorn, Carsten Griwodz, and Pål Halvorsen. Fine-grained scalable streaming from coarse-grained videos. In NOSSDAV '09: Proceedings of the 18th international workshop on Network and operating systems support for digital audio and video, pages 103–108, Williamsburg, VA, USA, 2009. ACM.

5. Pengpeng Ni, Fredrik Gaarder, Carsten Griwodz, and Pål Halvorsen. Video streaming into virtual worlds: the effects of virtual screen distance and angle on perceived quality. In MULTIMEDIA '09: Proceedings of the 17th annual ACM international conference on Multimedia, Beijing, China, 2009. ACM.


I

Thesis


Chapter 1

Introduction

1.1 Adaptive Video Streaming

As network technologies become widely applied in our daily life, multimedia streaming services are gaining popularity. In a general streaming scenario, a video server sends stored or live media to clients. The clients consume the received media soon after an initial playback delay. Interruptions or severe quality degradation must be avoided during playback. Unlike downloading services, a multimedia streaming service must process a large amount of continuous media data under realtime constraints. When wireless networks are integrated with other forms of multimedia delivery, ubiquitous multimedia consumption becomes possible. For instance, video streaming services already exist both on the Internet and on mobile technologies. Users may interactively access multimedia content on different types of terminals and networks. With the growth of streaming services, a multimedia application will be judged not only by the function it performs, but also by its ease of use and the user's experience in using it. Expectations for the next generation of multimedia streaming services have been raised from simple content access to the delivery of "best experiences".

Development of self-adaptive approaches is essential for building services independent of network technology and terminal. Video adaptation is a field that offers a rich body of knowledge and techniques for streaming media in heterogeneous environments. A video adaptation tool typically takes a coded video as input and produces a new presentation in a video format that meets user needs in customized situations [1]. Examples of popular video adaptation technologies include frame dropping and coefficient dropping, see [2, 3]. The objective of video adaptation is to maximize the utility of the final presentation while satisfying various constraints, where utility in this context represents the quality of user experience [1]. However, the utility value of different adaptation options is not fixed and is heavily affected by user preferences. Due to the lack of effective methods for capturing and predicting users' satisfaction with the delivered service, an efficient search method for an optimal adaptation solution satisfying various constraints remains an open issue.
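The utility-maximization view of adaptation described above can be sketched in a few lines of Python. This is an illustrative toy, not a method from the thesis: the candidate options, their bitrates and their utility scores are invented, and a real system would have to estimate utility from user studies rather than hard-code it.

```python
# Toy utility-based adaptation: among candidate adapted versions of a
# stream, pick the one with the highest utility (a stand-in for expected
# QoE) that still fits the bandwidth budget. All numbers are made up.

def select_adaptation(options, bandwidth_kbps):
    """options: list of (name, bitrate_kbps, utility) tuples."""
    feasible = [o for o in options if o[1] <= bandwidth_kbps]
    if not feasible:
        return None  # nothing fits; the server must stall or re-encode
    return max(feasible, key=lambda o: o[2])

options = [
    ("full",           1500, 0.95),
    ("half-framerate",  900, 0.80),
    ("low-quality",     400, 0.60),
    ("thumbnail",       128, 0.30),
]

print(select_adaptation(options, 1000))  # picks ('half-framerate', 900, 0.8)
```

The hard research problem is precisely the part this sketch assumes away: the utility column is content- and user-dependent, which is why subjective assessment is needed before any such search can be trusted.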

The overall challenge resulting from the discussion above is how the provision of streaming services can be made self-adaptive in order to optimize the overall user-oriented performance in highly heterogeneous and possibly dynamically varying networking environments.

1.1.1 Video compression

Getting digital video from its source (a camera or a stored clip) to its destination (a display) involves a chain of processes. Key to this chain are the processes of compression (encoding) and decompression (decoding), in which bandwidth-intensive "raw" digital video is reduced to a manageable size for transmission or storage, and then reconstructed for display. Video compression is necessary for two reasons. First, current Internet throughput rates are insufficient to handle uncompressed video in real time. Second, video compression enables more efficient use of bandwidth: on a high-bitrate connection it is often more attractive for a video server to send multiple compressed video channels than a single uncompressed stream, for instance in digital video broadcasting scenarios.

Video compression algorithms operate by removing redundant information from video signals [4, 5]. For example, there are usually only small changes between sequential frames in a video clip, which is often captured at more than 25 frames per second (fps). Bits can be saved by sending the first frame in its entirety and sending only the differences for subsequent frames. At the decoder side, the first frame is decoded independently. It is then used as a reference picture to predict the second frame, which is reconstructed by adding the transmitted difference to the prediction formed by its reference. The second decoded picture can later be referenced by the third one, and so on. The process that takes advantage of the similarities between successive pictures is known as inter coding or temporal compression. Inter-frame coding allows efficient compression, but has the disadvantage that a predicted video frame exists only in terms of its differences from previously transmitted frame(s). If one previous frame is removed due to impairment or network congestion, the difference data will be insufficient to reconstruct the target picture correctly, and consequently larger impairments result.
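The send-first-frame-then-differences idea can be shown in a minimal sketch. This is a deliberately simplified toy: frames are short lists of pixel values and the prediction is plain per-pixel differencing, whereas real codecs use motion-compensated prediction.

```python
# Toy temporal (inter) compression: transmit the first frame whole, then
# only per-pixel differences; the decoder rebuilds each frame by adding
# the difference to its reconstructed predecessor.

def encode(frames):
    stream = [frames[0]]                                   # I-frame: sent in full
    for prev, cur in zip(frames, frames[1:]):
        stream.append([c - p for c, p in zip(cur, prev)])  # residuals
    return stream

def decode(stream):
    frames = [stream[0]]
    for residual in stream[1:]:
        frames.append([p + r for p, r in zip(frames[-1], residual)])
    return frames

frames = [[10, 10, 10], [10, 12, 10], [11, 12, 9]]  # tiny 3-pixel "frames"
assert decode(encode(frames)) == frames             # lossless round trip
```

Note how the residual lists are mostly zeros for slowly changing content, which is exactly where the bit savings come from; note also that dropping any element of `stream` after the first corrupts every later reconstructed frame, illustrating the fragility discussed above.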

Various lossless and lossy techniques operate on information contained within a single video frame. This process is known as intra coding or spatial compression. For example, pictures often contain sizable areas with the same or similar pixel values. By partitioning an image into small areas (blocks) and encoding each block sequentially, a video encoder can also use previously encoded and reconstructed blocks to predict later blocks belonging to the same video frame. Block-based prediction is performed not only in the spatial but also in the temporal dimension. The more accurate the matches found by the prediction, the fewer bits are required for encoding the image difference. However, the efficiency of the prediction process depends highly on the video's content characteristics: when complex textures and fast motion are present in the video frames, the obtained coding gain is not significant.

Higher compression can be achieved by further exploiting the similarity between pixels. Usually, there is a high correlation between pixels that are close to each other, which gives rise to low spatial frequencies. Typically, the average brightness of the picture results in a substantial zero-frequency component. The human eye and brain are fairly good at seeing small brightness variations at low spatial frequencies, but are less sensitive to noise at high spatial frequencies. This fact allows one to filter out some of the higher-frequency content while still maintaining a recognizable image. The low-pass filtering is done by transform coding and frequency quantization. Image transforms, such as the discrete cosine transform (DCT) used in MPEG and JPEG, allow two-dimensional pictures to be described in the frequency domain. After that, frequency quantization simply divides the DCT coefficients by constants (indexed by quantization parameters) and rounds to the nearest integers. This is not a reversible procedure; frequency quantization is therefore a lossy compression technique that compresses video at the expense of a certain amount of information loss (distortion). The selected Quantization Parameters (QPs) influence the final decoded picture quality the most. Hence, most encoders aim to achieve an optimum balance between compression and decoded quality by setting the QPs properly.
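The transform-then-quantize pipeline can be demonstrated on a single row of eight samples. The sketch below uses the textbook orthonormal DCT-II and a single uniform quantization step; real codecs use 2D block transforms and per-frequency quantization tables, so treat this only as an illustration of where the loss enters.

```python
import math

# Sketch of transform coding and quantization on one 8-sample row.
# The DCT concentrates energy in low frequencies; dividing coefficients
# by a quantization step and rounding discards precision irreversibly,
# which is exactly where the lossy distortion enters.

N = 8

def dct(x):                      # orthonormal DCT-II
    out = []
    for k in range(N):
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

def idct(X):                     # matching inverse (DCT-III)
    out = []
    for n in range(N):
        s = X[0] * math.sqrt(1 / N)
        s += sum(X[k] * math.sqrt(2 / N) * math.cos(math.pi * (n + 0.5) * k / N)
                 for k in range(1, N))
        out.append(s)
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]            # pixel samples
qstep = 10                                        # larger step -> more loss
quantized = [round(c / qstep) for c in dct(row)]  # lossy rounding step
rebuilt = idct([q * qstep for q in quantized])
error = max(abs(a - b) for a, b in zip(row, rebuilt))
```

Without the rounding step the transform is perfectly invertible; with it, `rebuilt` differs slightly from `row`, and increasing `qstep` (the analogue of a coarser QP) trades more bits for more distortion.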

Video compression combines all the above-mentioned techniques operating in the temporal, spatial and frequency domains. As a result, most video streams are sequences of compressed frames, in which some frames are encoded using pure intra-coding techniques and other frames are temporally predicted using inter-coding techniques. According to the selected compression technique and reference pictures, video frames are usually classified into three major types: intra frames (I frames), forward predicted frames (P frames) and bi-directionally predicted frames (B frames). Furthermore, each video sequence can be segmented into one or more groups of pictures (GOPs), each consisting of one I frame and several P and/or B frames located between I frames. A GOP forms a complete video segment that can be decoded independently of frames in other segments. Highly compressed digital video often adopts a long GOP size to include fewer I frames, because an I frame usually provides the least compression among all frame types, since it allows only intra coding. On the other hand, a long GOP size weakens the ability to control impairments because of the strong dependency between frames within a GOP. If one frame is impaired due to packet loss, the impairment propagates to all other frames depending on the impaired one, while retransmission irritatingly results in longer decoding delay. Hence, the structure and size of a GOP are highly related to important performance measures such as coding efficiency, robustness against error propagation and decoding delay. One mentionable aspect of video adaptation is the suitable adjustment of video coding structures and GOP sizes [2, 6].
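The error-propagation trade-off of GOP size can be made concrete with a toy dependency model. The sketch assumes a simple IPPP... chain (no B frames): each P frame references its immediate predecessor, so a single loss disables the rest of the GOP until the next I frame restarts the chain.

```python
# Toy illustration of error propagation inside a GOP: with an IPPP...
# chain, every frame after a lost frame depends (transitively) on it and
# becomes undecodable until the next I frame restarts the chain.

def decodable(frame_types, lost_index):
    """frame_types: e.g. "IPPPPIPPPP"; returns decodability per frame."""
    ok, broken = [], False
    for i, t in enumerate(frame_types):
        if t == "I":
            broken = False            # an I frame needs no reference
        if i == lost_index:
            broken = True             # the lost frame and its dependents fail
        ok.append(not broken)
    return ok

# Losing frame 2 in a 5-frame GOP wipes out the rest of that GOP only:
print(decodable("IPPPPIPPPP", 2))
# [True, True, False, False, False, True, True, True, True, True]
```

Doubling the GOP length in this model doubles the worst-case number of frames ruined by one loss, which is the robustness cost that offsets the bit savings of fewer I frames.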

1.1.2 The H.264 Video Coding standards

Multimedia compression algorithms have been continually developed over the last two decades. H.264/MPEG-4 Advanced Video Coding (H.264-AVC) [7] is a flexible video coding standard that represents the current state of the art in versatile, highly compressed representation of digital video. Scaling from mobile phone use to next-generation high-definition DVDs, H.264-AVC has been well received in a number of industries, such as telecommunications, content production and broadcasting. The dominant factor in the popularity of H.264-AVC is clearly its high rate-distortion performance - the ratio of subjective quality to bitrate [8] - achieved through advanced spatial and temporal compression techniques.

In 2007, the scalable extension of the H.264 standard (H.264-SVC) [2] was released to support video streaming in more heterogeneous scenarios. Multi-dimensional scalability is designed in an add-on fashion on top of H.264-AVC. The term "scalability" in this context refers to the removal of parts of the total coded bit stream in order to adapt it to the various needs or preferences of end users as well as to varying terminal capabilities or network conditions. Compared to the single-layered video coding structure of H.264-AVC, H.264-SVC encodes the video into a layered structure, see Figure 1.1. While the base layer forms an H.264-AVC-compliant sub-stream, a fully scalable global stream consists of several enhancement layers in addition to the base layer. These layers are subset bit streams that can be extracted by truncating the global stream, and they can themselves be decoded to obtain a range of preferred picture sizes, qualities and playback speeds (frame rates).

Figure 1.1: Layered video structure of scalable video coding
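The layered structure of Figure 1.1 suggests a simple mental model of sub-stream extraction. In the sketch below, each coded unit carries invented (spatial, temporal, quality) layer tags, and truncating to an "operation point" keeps only the units at or below the target in every dimension; the field names and the example stream are illustrative, not the SVC syntax itself.

```python
# Sketch of SVC-style sub-stream extraction: keep every coded unit whose
# (spatial, temporal, quality) layer ids fall within the target operation
# point. Tags and payloads below are invented for illustration.

def extract(stream, op):
    s_max, t_max, q_max = op
    return [u for u in stream
            if u["s"] <= s_max and u["t"] <= t_max and u["q"] <= q_max]

stream = [
    {"s": 0, "t": 0, "q": 0, "data": "base"},        # AVC-compliant base layer
    {"s": 0, "t": 1, "q": 0, "data": "more fps"},    # temporal enhancement
    {"s": 0, "t": 0, "q": 1, "data": "more snr"},    # quality enhancement
    {"s": 1, "t": 0, "q": 0, "data": "more pixels"}, # spatial enhancement
]

low_fps_hq = extract(stream, (0, 0, 1))  # base + quality layers only
assert [u["data"] for u in low_fps_hq] == ["base", "more snr"]
```

The appeal of this model for adaptive streaming is that extraction is a pure filter over an already-encoded stream: no re-encoding is needed to serve a weaker terminal or a congested link.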

1.1.3 Quality of Experience

To measure and predict the actual performance of a streaming service, system designers commonly define Quality of Service (QoS) metrics as performance parameters. Typical QoS metrics include network latency, jitter and packet loss, and they are usually used as measures of the performance of the network itself. However, the ultimate measure of the services offered by a multimedia streaming system is the satisfaction of the user experience. Good QoS at the network level cannot ensure a good user experience.

Quality of Experience (QoE) has recently been introduced as an important issue in the multimedia research community. The term QoE is related to but different from the technical concept of QoS. Traditionally, QoS is defined as a set of parameters that describe the network's capability to meet the requirements of an application. Correspondingly, QoE is a concept that connects the user with the application. In [9, 10], QoE is defined as the overall acceptability of an application or service, as perceived subjectively by the end user, including application responsiveness, functionality, usability and the service context that surrounds it. Contributing to the QoE are objective QoS measures together with human components that may include user sensation, cognition and motivation. Since QoE reflects how well a service meets users' expectations, it will be an important factor in the marketplace success of a streaming service.

1.1.4 Video QoE factors

The quality of user experience is a very complex concept influenced by numerous factors. Existing studies [11, 12] indicate that QoE factors can be classified into a small number of groups. It has also been shown that the most influential factors for video QoE fall into three categories: visual quality, content type, and application usage.

Visual quality factors are related to the digitalization and compression of video source materials, and their effects can be adjusted by various parameter settings. Better visual quality and greater reliability and/or flexibility can be achieved by "correct" compression and decompression processes. In [12], some of the most often mentioned visual quality factors have been intuitively described by users as a list of perceptual attributes, such as disturbing inferiority, blurriness and fogginess, text readability, motion jerkiness, color, picture size, and blockiness. Additionally, the flickering effect is identified in [13] as a significant factor for video with scalable quality. Most of these visual quality factors refer to particular video processing stages and can be selectively used as criteria in the subjective evaluation of different video processing techniques.

Content type and application usage related factors belong to the high-level design of a product and should be taken into consideration early in the design procedure. Examples of usage factors include purpose of use, equipment (end devices) and functionality. Video content can be classified into different genres, such as News, Sports, Cartoon and Commercials. Different video genres exhibit different characteristics in terms of motion speed, scene complexity and variation, which results in different dependencies on, and emphasis of, the visual quality factors. For example, motion jerkiness in a video of the Cartoon genre is not as noticeable as in a video of the Sports genre. We define the content type related factor as the correlation between content characteristics and visual quality factors.


1.2 This thesis

The work presented in this thesis started in the context of the User Friendly H.264 for Realtime Browsing project. The initial scope of this project was to exploit the capabilities of the H.264 video coding standard in order to produce H.264-compliant video that performs well in the professional video editing applications frequently used by the broadcast industry. In contrast to common video players, video editing applications have stringent requirements regarding the provision of video skimming functions. After the release of the scalable extension of H.264, our research was highly inspired by the many advances in the field of video coding and adaptation. Thus, we extended our application scenarios from video streaming in wired networks to wireless networks where mobile devices are used as playback terminals. The goal of our project, however, remains the same: to develop video adaptation approaches that provide consumers with the best possible experience of networked video applications under resource limitations, context changes, etc.

1.2.1 Application domain

Examples of applications whose development can benefit from this thesis work include Video on Demand (VoD) systems over the Internet, including both wired and wireless access technologies. We focus on off-line encoding environments, where complicated preprocessing is allowed. The streaming service should enable a user to click and immediately start watching pre-recorded video content, as well as navigate instantly to any point within the video.

The work presented in this thesis centers on the usability and performance of H.264 and its SVC extension. Knowledge gained from a detailed study of the standards will serve as the basis for the mechanisms and functions upon which service planning can be built.

1.2.2 Motivation and goal

The motivation for the work presented in this thesis is to develop knowledge on how to build adaptive multimedia streaming services that can deliver video content independently of network technology and end-device capability while seeking the best possible video experience.

Common large-scale networks are built on best-effort infrastructure, where the availability of resources (CPU, memory, network bandwidth) may change frequently. Further, the streamed content itself may change its characteristics over time, which often translates into varying resource requirements and thus a varying adaptation path. In order for multimedia streaming services to cope with frequent changes, we need to develop self-adaptive mechanisms along with QoS management to guarantee the QoE desired by users.

H.264 and its SVC extension are highly attractive for designing such a QoE-based video streaming system. For example, when SVC coding is employed in the system, the produced video stream can easily be truncated or tailored on the fly to adapt to changing resource availability. However, questions about how to perform the adaptation, and how much adaptation H.264 enables, remain open research issues. In addition, the added scalability comes at the price of decreased coding efficiency. For better rate-distortion performance, the encoding process should be optimized for dedicated application scenarios. Moreover, the coding optimization problem is highly dependent on content-related features, the variation of supported bit-rates, and user preferences for frame rate versus picture quality. This has motivated our investigation of a controlling mechanism that can jointly take those factors into account when performing encoding.

In many adaptive streaming solutions, video quality is provided as the only adaptation parameter. When adaptation is to be performed online, a streaming server needs to select the most appropriate quality automatically by tuning some objective configuration parameters. For example, it may select a lower frame rate to relieve network congestion [14]. However, video quality depends heavily on viewer perception. Making predictions about the effect of a given set of objective parameters on subjective QoE is challenging. Common objective metrics such as Peak Signal-to-Noise Ratio (PSNR) perform poorly at predicting user-perceived quality, especially for scalable content. Hence, reliable QoE values are usually captured by subjective tests that measure QoE directly from the end users. This has motivated the careful design of our video quality assessment tests. With a better understanding of the subjective nature of the user, one could deduce the required service-layer performance for a given target QoE.
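For reference, PSNR as commonly computed for 8-bit video is shown below; the frame values are invented. It is a pure pixel-fidelity measure, which is precisely why it can rank two scaled streams identically even when viewers perceive them very differently.

```python
import math

# PSNR for 8-bit samples: 10*log10(peak^2 / MSE). Frames here are flat
# lists of luma samples; identical frames yield infinite PSNR.

def psnr(ref, test, peak=255):
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")          # identical frames
    return 10 * math.log10(peak ** 2 / mse)

ref = [100, 120, 140, 160]
test = [101, 119, 142, 158]          # small errors spread over the frame
print(round(psnr(ref, test), 1))     # → 44.2
```

PSNR averages errors uniformly over pixels and frames, so it cannot express effects such as flicker from layer switching or the content dependence of jerkiness, which is why the thesis relies on subjective DMOS scores instead.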

1.3 Contributions

In this section, we identify the scientific contributions of the work presented in this thesis.

• ... service. Regarding the employment of scalable video coding techniques, several research topics involving human factors have been proposed and studied, resulting in an in-depth understanding of video QoE. The factors contributing to the overall QoE are manifold. While some factors are related to human sensation and cognition, such as playback smoothness, picture blurriness and the flickering effect, other factors are related to application performance, such as setup delay and random access delay. Due to limited time, we evaluate some of the factors contributing to the QoE and leave a complete description of video QoE to future work.

• User studies for QoE measurement. A number of user studies have been performed to investigate human perception of video quality. The main discoveries from these user studies are: a) users' sensitivity to the smoothness and blurriness of a movie is content-dependent; b) changes in spatial resolution are the most annoying; c) a flickering threshold exists and affects users' ability to detect quality distortion.

• Rate control framework for layered video structures. In practice, a video encoder needs to implement a rate control algorithm that dynamically adjusts encoder parameters to achieve a target bit-rate. Traditional rate control schemes are designed for non-scalable videos that are structured in a single layer. For scalable videos based on a layered video structure, we propose a rate control framework that handles multi-layer bit allocation.

1.4 Overview of Papers

1.4.1 Paper A

Pengpeng Ni and D. Isovic. Support for digital VCR functionality over network for H.264/AVC. In ICC '08: IEEE International Conference on Communications, pages 520–525, Beijing, China, May 2008.

In this paper, we propose a transcoding scheme that fully utilizes the benefits of the Switching P (SP) and Switching I (SI) frames specified in H.264-AVC [15] to facilitate the implementation of full digital video cassette recording (DVCR) like functionality.

DVCR functionality that enables interactive and quick video browsing is highly desired in video streaming applications, since it lets users easily skip uninteresting parts of a video, stop at any position at any time and continue later, without demanding complex user interface design. Hence it fits especially well for handheld devices with compact screen sizes and for video editing applications that require frequent interactivity.

However, a straightforward implementation of DVCR trick modes may lead to network congestion and also demands high performance from client devices. To resolve this problem, our coding scheme uses a layered video structure in which primary SP frames link the remaining P frames in the base layer and secondary SP frames constitute the enhancement layers. When a trick mode is requested, the video server can choose to send only a subset of the video frames, representing decodable video at different temporal resolutions. Such a substream requires fewer network and computational resources than video encoded in a single-layer structure. As a result, less jitter occurs when video is played back in VCR trick modes, and video QoE is improved through better usability. The flexible bitrate adaptation provided by SP frames comes at the price of decreased video compression efficiency. To limit this performance loss, our scheme does not preserve a fixed GOP size but extends the GOP to the length of a video shot. This significantly reduces the number of encoded I frames, which usually cost more bits than the other frame types.

Our contribution in this paper is a self-adaptive control mechanism for scalable video coding. We present a multi-layered rate control algorithm that jointly considers bit allocation across different frames and different layers. In each layer, we use the quadratic rate-distortion model [16] to estimate the relationship between the target bitrate of a single frame, the content variation, and the required quantization parameter. After a frame has been encoded, the target bitrate for the next frame is assigned according to the remaining bit budget and the temporal resolution. Between layers, a global bit allocation strategy is applied to minimize jitter while maintaining nearly constant quality across the frames of a video sequence. We demonstrate the effectiveness of our approach through a set of experiments.
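The quadratic rate-distortion model of [16] predicts the bits a frame consumes from its residual complexity (MAD) and quantization step Q. The following is a minimal sketch of how a per-frame rate controller can invert this model to pick a quantizer for a given bit budget; the function name and parameters are illustrative, and the full algorithm in the paper additionally handles inter-layer budgets and regression updates of the model parameters.

```python
import math

def quantizer_for_target_bits(target_bits, mad, x1, x2):
    """Solve the quadratic rate model R = x1*MAD/Q + x2*MAD/Q^2 for Q.

    target_bits: bit budget for the frame (header bits already subtracted)
    mad: mean absolute difference of the residual (content variation)
    x1, x2: first- and second-order model parameters, normally refitted
            after each encoded frame
    """
    if x2 == 0:
        # model degenerates to the linear first-order term
        return x1 * mad / target_bits
    # R*Q^2 - x1*MAD*Q - x2*MAD = 0  ->  take the positive root
    disc = (x1 * mad) ** 2 + 4.0 * target_bits * x2 * mad
    return (x1 * mad + math.sqrt(disc)) / (2.0 * target_bits)
```

A controller would map the returned quantization step to the nearest legal H.264 QP, encode the frame, and redistribute any budget surplus or deficit to the remaining frames and layers.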

1.4.2 Paper B

Alexander Eichhorn and Pengpeng Ni. Pick your Layers Wisely - A Quality Assessment of H.264 Scalable Video Coding for Mobile Devices. In ICC '09: IEEE International Conference on Communications, Dresden, Germany, 2009.

In this paper, we present and discuss the results of a subjective quality assessment test we performed on mobile devices to investigate the effects of multi-dimensional scalability on human quality perception.


Our subjective test is a field study based on the ITU-R recommendation for subjective quality assessment, BT.500-11 [17]. In our experiment, an impaired video and its reference were displayed twice in ABAB order, with the position of the reference randomized; whether A or B was the reference was not revealed to the assessors. The results reveal that the visual effects of QoE degradations differ between scaling dimensions and that scaling preferences are content dependent. Two technical content characteristics, scene complexity and motion speed, are measured as the MPEG-7 edge histogram and the MPEG-7 motion activity vector, respectively. The correlation between these content characteristics and user scaling preferences shows that for video containing slower motion, temporal downscaling may be preferred over quality downscaling. Spatial downscaling is regarded as the worst option for all content types. Several existing objective video quality metrics were also examined, but none of them proved accurate enough to reflect human perception.

To help the design of adaptive network services for scalable streaming applications, we abstract two noteworthy properties of a scalable video stream: scaling granularity and scaling diversity. Granularity is the difference between the bitrates of two adjacent scaling options. By scaling diversity, we mean the number of scaling options that give the highest possible quality in some dimension for a given available bandwidth. Higher diversity gives the user more options to choose from, making it easier to achieve high QoE; in this sense, it is more meaningful for scalable video streams than the traditional concept of rate-distortion optimization. A reasonable relation between scaling diversity and granularity should be considered during encoding. For optimal QoE and increased chances that adaptation tools can follow preferred scaling orders, video encoders should maximize the scaling diversity and granularity of bitstreams.
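These two stream properties can be made concrete with a small sketch. Assuming a scalable stream is described as a list of labelled operating points with their bitrates (the labels and numbers below are hypothetical), an adaptation service can inspect the bitrate gaps between neighbouring options (granularity) and the set of options feasible under the current bandwidth, from which a content-dependent preference order would then pick:

```python
def scaling_options_stats(options, bandwidth_kbps):
    """Compute a coarse view of scaling granularity and diversity.

    options: list of (label, bitrate_kbps) for all decodable substreams
    bandwidth_kbps: currently available bandwidth

    Returns (worst_gap, feasible): worst_gap is the largest bitrate
    difference between two adjacent scaling options (a coarse-grained
    stream has a large gap), and feasible is the list of options that
    fit the bandwidth, a simple proxy for the diversity on offer.
    """
    rates = sorted(rate for _, rate in options)
    gaps = [b - a for a, b in zip(rates, rates[1:])]
    worst_gap = max(gaps) if gaps else 0
    feasible = [(label, rate) for label, rate in options
                if rate <= bandwidth_kbps]
    return worst_gap, feasible
```

For example, with operating points at 100, 180, and 300 kbps and 200 kbps of available bandwidth, only the first two options are feasible and the stream's worst-case granularity is 120 kbps.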

1.4.3 Paper C

Pengpeng Ni, Alexander Eichhorn, Carsten Griwodz, and Pål Halvorsen. Fine-grained scalable streaming from coarse-grained videos. In NOSSDAV '09: Proceedings of the 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 103–108, Williamsburg, VA, USA, 2009. ACM.

In this paper, we propose and investigate the usefulness of frequent layer switching as a method for achieving fine-grained bitrate adaptation using coarse-grained scalable videos.

Fine-grained scalability can adapt most closely to the available bandwidth, but this comes at the cost of a high compression penalty. In the context of VoD streaming to mobile end systems, we have therefore explored whether a similar adaptation to the available bandwidth can be achieved by performing layer switching in coarse-grained scalable videos. In this approach, enhancement layers of a video stream are switched on and off to meet an arbitrary bitrate constraint over a short time window.

User studies have been conducted to evaluate the QoE impact of layer switching. We test different switching patterns that yield rates in between pre-encoded H.264 SVC layers. The results indicate that switching patterns with sufficient perceptual quality exist: if the switching period is long enough to avoid the "flickering effect", layer switching is preferred over downscaling to a lower static SVC layer. Achieving fine-grained scalable streaming from coarse-grained videos is thus possible, and layer switching complements static downscaling for dynamic bitrate adaptation.

1.4.4 Paper D

Pengpeng Ni, Fredrik Gaarder, Carsten Griwodz, and Pål Halvorsen. Video streaming into virtual worlds: the effects of virtual screen distance and angle on perceived quality. In MULTIMEDIA '09: Proceedings of the 17th Annual ACM International Conference on Multimedia, Beijing, China, 2009. ACM.

This paper presents a set of experimental studies on perceived video quality in virtual 3D environments. Streaming video into 3D virtual worlds is a relatively new topic with large potential market value, and there is an increasing trend to include streamed video content in online 3D environments. Such environments potentially allow several concurrently visible videos on a single display device. However, network and processing bottlenecks occur easily as a result of delivering more dynamic content to the end users.

In this paper, we look at how to adapt a video stream in a 3D world in order to reduce the required bandwidth. Compared to traditional streaming applications, users' perceived quality in virtual worlds is influenced by different factors. For example, the avatar's distance and angle to the virtual screens in the 3D environment can change rapidly and significantly, and may contribute most to the video QoE. Video adaptation can therefore be made according to the avatar's viewing conditions.

We conduct subjective assessment tests using a 3D application prototype to determine how the positioning of video in the 3D environment influences a human's ability to detect video quality impairments. The statistical analysis of the test results confirms our hypothesis that virtual viewing distance is the dominant factor influencing users' perception: the closer the virtual viewing distance, the higher the bitrate required to maintain user satisfaction. Meanwhile, for a wide range of distortions, there always exists a reasonably close distance from which the given amount of distortion is not detectable. In addition, we test and compare two video adaptation techniques, namely coefficient dropping versus spatial resolution downscaling. Our results indicate that the quality loss introduced by coefficient dropping is less obvious than that caused by reducing the spatial resolution.


Bibliography

[1] S.-F. Chang and A. Vetro. Video adaptation: Concepts, technologies, and open issues. Proceedings of the IEEE, 93(1):148–158, Jan. 2005.

[2] H. Schwarz, D. Marpe, and T. Wiegand. Overview of the Scalable Extension of the H.264/AVC Video Coding Standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(9):1103–1120, 2007.

[3] Damir Isovic. Flexible Scheduling for Media Processing in Resource Constrained Real-Time Systems. PhD thesis, Mälardalen University, Sweden, November 2004.

[4] Iain E. G. Richardson. H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Wiley, first edition, 2004.

[5] John Watkinson. The MPEG Handbook. Focal Press, 2004.

[6] Pengpeng Ni, Damir Isovic, and Gerhard Fohler. User-friendly H.264/AVC for remote browsing. In MULTIMEDIA '06: Proceedings of the 14th Annual ACM International Conference on Multimedia, pages 643–646, New York, NY, USA, 2006. ACM.

[7] ITU-T and ISO/IEC JTC 1. Advanced Video Coding for Generic Audiovisual Services, ITU-T Recommendation H.264, April 2004. ISO/IEC 14496-10 (AVC).

[8] G. J. Sullivan and T. Wiegand. Rate-distortion optimization for video compression. IEEE Signal Processing Magazine, 15(6):74–90, Nov. 1998.

[9] International Telecommunication Union - Telecommunication Standardization Sector. Definition of Quality of Experience (QoE), 2007.

[10] Multimedia Research Group, Inc. IPTV Video Quality: QoS & QoE, 2007.

[11] International Telecommunication Union - Telecommunication Standardization Sector. Quality of Experience Requirements for IPTV Services, 2007.

[12] Satu Jumisko-Pyykkö, Jukka Häkkinen, and Göte Nyman. Experienced quality factors: qualitative evaluation approach to audiovisual quality. In Multimedia on Mobile Devices 2007, February 2007.

[13] Pengpeng Ni, Alexander Eichhorn, Carsten Griwodz, and Pål Halvorsen. Fine-grained scalable streaming from coarse-grained videos. In NOSSDAV '09: Proceedings of the 18th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 103–108, New York, NY, USA, 2009. ACM.

[14] Damir Isovic. Flexible Scheduling for Media Processing in Resource Constrained Real-Time Systems. PhD thesis, Mälardalen University, Sweden, November 2004.

[15] ITU-T and ISO/IEC JTC 1. Advanced Video Coding for Generic Audiovisual Services, ITU-T Recommendation H.264, April 2004. ISO/IEC 14496-10 (AVC).

[16] Tihao Chiang and Ya-Qin Zhang. A new rate control scheme using quadratic rate distortion model. IEEE Transactions on Circuits and Systems for Video Technology, 7(1):246–250, Feb. 1997.

[17] International Telecommunication Union - Radiocommunication Sector. ITU-R BT.500-11: Methodology for the subjective assessment of the quality of television pictures, 2002.

Figure 1.1: Layered video structure of scalable video coding
