
Master of Science Thesis in Computer Science

Department of Electrical Engineering, Linköping University, 2018

Statistical multiplexing of

video for fixed bandwidth

distribution

A multi-codec implementation and

evaluation using a high-level media

processing library


Statistical multiplexing of video for fixed bandwidth distribution — A multi-codec implementation and evaluation using a high-level media processing library

Max Halldén

LiTH-ISY-EX--18/5142--SE

Supervisor: Harald Nautsch, ISY, Linköpings universitet
Patrik Lantto, WISI Norden AB

Examiner: Ingemar Ragnemalm, ISY, Linköpings universitet

Division of Information Coding
Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden

Copyright © 2018 Max Halldén


Abstract

When distributing multiple TV programs on a fixed bandwidth channel, the bit rate of each video stream is often constant. Since video sent at a constant quality typically has a wildly varying bit rate, this is a very suboptimal solution. By instead sharing the total bit rate among all programs, the video quality can be increased by allocating bit rate where it is needed. This thesis explores the statistical multiplexing problem for a specific hardware platform, with the limitations and advantages of that platform. A solution for statistical multiplexing is proposed and evaluated using the major codecs used for TV distribution today. The main advantages of the statistical multiplexer are a much more even quality and a higher minimum quality achieved across all streams. While a faster method for bit rate approximation is needed for the solution to be practical in terms of performance, the solution is shown to work as intended.


Acknowledgments

I would like to thank everyone at WISI, and especially Patrik Lantto, for giving me the opportunity to do this thesis and helping me along the way, and for being patient when some things were going a little slow. I would also like to thank Harald Nautsch for being my supervisor and Ingemar Ragnemalm for being my examiner. A special thanks to Ingemar for making sure this thesis was actually finished. I would also like to thank my opponent Niklas Norin. And above all a big thanks to my family and friends for all the support along the way, not only during the course of this thesis, but during my entire time studying at the university. Thank you all.

Linköping, June 2018
Max Halldén


Contents

1 Introduction 1
  1.1 Motivation 2
  1.2 Aim 2
  1.3 Research questions 2
  1.4 Delimitations 3
2 Background 5
  2.1 Video coding standards 6
  2.2 Statistical multiplexing in television broadcasts 6
  2.3 Intel Media SDK 7
3 Theory 9
  3.1 Video coding 10
    3.1.1 Independently and dependently coded frames 10
    3.1.2 Decoder models 10
    3.1.3 Data dwell time 11
  3.2 Rate control 11
    3.2.1 Quantization 12
    3.2.2 Rate-distortion theory 12
    3.2.3 Rate-quantization models 13
    3.2.4 Distortion-quantization models 14
  3.3 Video multiplexing using MPEG-2 Systems 14
    3.3.1 Elementary Streams and Packetized Elementary Streams 15
    3.3.2 Transport Streams 15
    3.3.3 System Target Decoder 15
    3.3.4 Statistical multiplexing methods 17
  3.4 Video quality metrics 19
    3.4.1 Peak Signal-to-Noise Ratio and Mean Squared Error 19
    3.4.2 Structural similarity index 20
    3.4.3 Other quality metrics 21
  3.5 Intel Media SDK 21
    3.5.1 Bit rate control 21
4 Method 23
  4.1 Rate control with Intel Media SDK 23
  4.2 Evaluating rate and distortion characteristics for Intel encoder 23
  4.3 Satisfying rate constraints 24
  4.4 Rate control model 25
  4.5 Distortion model 25
  4.6 Distortion and bit rate control 26
  4.7 Initial video stream offset 26
  4.8 Evaluation 27
5 Results 31
  5.1 Rate and distortion characteristics of Intel encoder 31
  5.2 Statistical multiplexing algorithm 32
    5.2.1 Rate-quantization and distortion-quantization models 32
    5.2.2 Maintaining bitstream conformance 33
    5.2.3 Distortion control 34
  5.3 Experimental results 35
    5.3.1 Off-line tests 36
    5.3.2 Live test 37
6 Discussion 39
  6.1 Conclusion 41
  6.2 Future work 41
A Results of SVT multi format tests 45
B Algorithms 53
C Results of off-line test using Blender's open movies 57


1 Introduction

Television content is often distributed over fixed-bandwidth channels. This fixed-bandwidth requirement is at odds with the characteristics of a single video bitstream. Typically, compressed video is encoded to achieve a constant video quality, which means that the bit rate of properly compressed video will vary with the varying complexity of the content. The video can still be encoded with a fixed bit rate, but the resulting uneven quality means that a higher bit rate is needed to achieve a minimum acceptable quality level. When transporting multiple video streams across a single channel with fixed bandwidth, the streams can instead be jointly encoded, each with a varying bit rate, so that their total bit rate approaches that of the channel while maintaining as close to constant quality as possible across all video streams.

Sharing a single communication channel among several streams, where only one stream is allowed to use the channel at a single point in time, with the purpose of increasing the total utilisation of the channel, is commonly known as statistical time division multiplexing. In this report we will simply refer to it as statistical multiplexing. The increased utilisation that can be gained from statistical multiplexing is known as the statistical multiplexing gain.

The capabilities of a statistical multiplexer are very much dependent on the encoder. The encoder used for this project is based on the Intel Media SDK, a framework for accelerated media processing using Intel hardware that provides real-time encoding capabilities for developers. [31]


1.1 Motivation

The bandwidth available in video distribution scenarios is often limited, and the utilisation of the bandwidth is therefore of great importance. With statistical multiplexing, less bandwidth is needed to transmit video of the same quality. Statistical multiplexing is also becoming increasingly important for video distribution. With the adoption of the HEVC video codec, the gains from statistical multiplexing are increased compared to previous codecs [44]. Being able to send more video streams on the same channel means that the quality of all streams can be increased by sharing bit rate: it becomes more likely that the high bit rate requirement of one stream can be "cancelled out" by the low bit rate requirement of another at the same quality. The increased interest in statistical multiplexing is visible in recent broadcasting standards. The DVB-C2 standard for cable television states that "DVB-C2 shall allow achieving the maximum benefit from statistical multiplex method" [11]. The DVB-S2X extension to the DVB-S2 standard for satellite television allows bonding of multiple channels to increase total throughput, with the stated goal of reaching higher statistical multiplexing gains [13].

Intel Media SDK is an emerging technology showing great results for efficient hardware-accelerated video decoding and encoding, and its added support for HEVC makes it an important technology for new video distribution systems. [42, 40]

Statistical multiplexing is an ongoing area of research which has been growing more relevant as bandwidth becomes more scarce and higher resolution video sees increasing adoption. Most previous research uses the available reference encoder for the relevant codec, without regard to the limited availability of encoder internals when using closed-source libraries and applications such as the Intel Media SDK, and without regard to applicability across different video codecs.

1.2 Aim

This report presents a solution to the statistical multiplexing problem for video streams using the Intel Media SDK. The capabilities of the Intel Media SDK are explored in the context of statistical multiplexing, and the performance and feasibility of the solution are evaluated with a prototype.

1.3 Research questions

The questions to be answered in this report are as follows:

1. How can the resulting video quality and bit rate, that is valid for use in a statistical multiplexer, be inferred from available data when using the Intel Media SDK?


2. How can a codec-agnostic statistical multiplexer be implemented using the Intel Media SDK and the result from question 1? Specifically, how can an even video quality be achieved, while both utilizing available bit rate and satisfying bit rate constraints?

3. What statistical multiplexing gains are achievable using the Intel Media SDK and the statistical multiplexer from question 2?

1.4 Delimitations

The use of the Intel Media SDK results in some obvious limitations as to what can be achieved by the statistical multiplexer. If a technique cannot be achieved using the Intel Media SDK then it will not be taken into consideration for this report.

Further, we are limited in terms of what codecs the statistical multiplexer supports. While the statistical multiplexer should support all codecs currently supported by the Intel Media SDK, no further assumptions will be made regarding the codec. The statistical multiplexing solution will only use methods that can be applied to all supported codecs, not those that are codec-specific. The codecs actually used for evaluating the statistical multiplexer are the ones most used in broadcasting today.


2 Background

Television programs were initially broadcast as analogue content on a single analogue channel using a fixed frequency band. With the adoption of digital transmission it became possible to transmit the content compressed far more efficiently. The need for a universal standard for transmitting compressed media content led to the MPEG-1 and MPEG-2 standards. The MPEG standards contain a number of parts, each of which defines a specific aspect of the transmission and compression of media content. MPEG-2 expands upon the MPEG-1 standard and is the one most widely used today. Notably, the MPEG-2 Systems part specifies a content-agnostic way of transmitting multimedia content across various channels. It is the reason why this fairly old standard is still a major part of broadcasting technology today, and an important part of this thesis. While MPEG-2 audio and video compression has arguably been superseded, the Systems part can be (and has been) extended to include new compression technologies, allowing new technology to be used on old infrastructure. [48]

A central concept of the MPEG-2 Systems part is the transport stream, which is the format used in broadcasting to send multiple programs in a single stream. The statistical multiplexing operation will produce a constant bit rate transport stream containing all the multiplexed programs. A program consists of the video and audio, along with any subtitles and other ancillary data. As this thesis exclusively focuses on the video stream, a program will sometimes just be referred to as a video stream.


2.1 Video coding standards

The video coding standards used in this thesis are MPEG-2 Video/H.262 [24, 27], MPEG-4 AVC/H.264 [25, 28] and MPEG-H HEVC/H.265 [26, 29]. MPEG is the Moving Picture Experts Group, a working group of ISO/IEC, with MPEG-2, MPEG-4 and MPEG-H being suites of standards proposed by the organization. H.262, H.264 and H.265 are recommendations from the Telecommunication Standardization Sector of the International Telecommunication Union, abbreviated as ITU-T.

As mentioned, MPEG-2, MPEG-4 and MPEG-H each contain a number of standards: H.262 corresponds to MPEG-2 part 2, or ISO/IEC 13818-2; H.264 corresponds to MPEG-4 part 10, or ISO/IEC 14496-10; and H.265 corresponds to MPEG-H part 2, or ISO/IEC 23008-2. H.264 is referred to as Advanced Video Coding (AVC) by MPEG, and H.265 is similarly referred to as High Efficiency Video Coding (HEVC). For the rest of the report we will refer to these standards as MPEG-2, AVC, and HEVC respectively.

The three standards are all commonly used in broadcasting and can be seen as three milestones in video compression technology. MPEG-2 is the oldest of the three, but is still used extensively. AVC is chronologically between the other two and gives a 50 % efficiency increase over MPEG-2; it is the most common format seen today. HEVC provides yet another 50 % efficiency increase over AVC. It is currently gaining in relevance, especially with the increasing adoption of Ultra HD resolution content.

2.2 Statistical multiplexing in television broadcasts

Television broadcast is usually done on fixed bandwidth channels, due to having to be sent modulated on a frequency band. This means that one or several programs are fitted into a single channel with a fixed bit rate.

When multiplexing a number of streams to create a constant bit rate transport stream, there are generally two alternative approaches. The simplest is to divide the total bit rate among all elementary streams and encode each of them with a constant bit rate. However, maintaining a constant bit rate conflicts with the goal of maintaining constant video quality. The better approach is to encode all elementary streams with a variable bit rate while making sure that the total bit rate does not exceed the bit rate of the transport stream. This second alternative is what is called statistical multiplexing.

All video encoders have to solve the problem of rate control to achieve the specified bit rate for the stream. The problem becomes more complex in the statistical multiplexing scenario, since multiple encoders must be controlled without exceeding the total bit rate. Maintaining constant quality across multiple video programs adds another dimension to the already complex problem of maintaining quality over time within a single program.


During the construction (multiplexing) of the transport stream, the bit rate of each video stream has to be constrained to make sure that any supported decoder can decode it. When encoding the video streams with a constant bit rate, this constraint is trivially satisfied by setting the bit rate to be less than the maximum bit rate specified by the decoder. To benefit from statistical multiplexing, however, the video needs to be encoded with a variable bit rate, and the buffer level of the decoder has to be tracked to avoid buffer over- and underflow. A typical encoder expects that the decoder receives data at a certain rate and has a certain buffer size. When multiplexing a video stream into a transport stream, the same constraints have to be respected, along with some additional constraints that similarly guarantee that the transport stream can be demultiplexed as well.

Statistical multiplexing typically has two different categories of use-cases, with differing requirements. The first category is the initial encoding of the broadcast content. Here the recipients are typically numerous enough that the emphasis is on getting the best possible quality; cost is less of an issue. The second category is combining content from multiple sources to be rebroadcast, typically to a cheaper medium and a smaller audience. While quality is still important, this scenario is far more cost- and performance-constrained, as the cost of a better statistical multiplexing solution is not automatically offset by the decreased bit rate requirements, and there is a desire not to add latency to the network.

2.3 Intel Media SDK

The Intel Media SDK is an SDK provided by Intel to access the hardware-accelerated media processing capabilities of Intel processors. The dedicated hardware for media processing has been a part of the integrated graphics hardware of Intel’s processors since the Sandy Bridge architecture.

There is some added confusion around the naming of the SDK, since the library actually called Intel Media SDK [21] has a close sibling called the Intel Media Server Studio [22], which is said to contain the Intel Media SDK. Despite this, there are some obvious differences between the two offerings, such as differing platform support. Although the Intel Media Server Studio's version of the Intel Media SDK is used in this thesis, it will just be referred to as the Intel Media SDK.


3 Theory

There are several important areas of knowledge needed for the design and implementation of a statistical multiplexer. First of all we need a more thorough explanation of what is meant by "multiplexing", whose meaning can be somewhat ambiguous. In this report, multiplexing means combining multiple variable bit rate elementary streams containing video into a transport stream with constant bit rate. The "statistical" part concerns how to allocate bit rate to the video streams in such a way that the overall video quality is maximised.

There are several questions that need to be answered for us to fully understand this problem. Firstly, the transport and elementary streams need to be defined. The rate allocation needs to be explored, and related to this are some details about the video encoding itself, mainly how video encoding techniques influence momentary bit rate. We also need to define what we mean by quality. Although video quality is often approximated using relatively simple metrics, these do not necessarily correlate well with subjective quality as perceived by users.

The main problem in statistical multiplexing is to jointly control the bit rate of all individual video streams to achieve high and/or consistent video quality over time, as well as across all the streams, while maintaining compatibility with the system target decoder described in section 3.3.3. This means that bit rate control in general is most often a part of any proposed solution, and specific details of the encoding used are often, to various degrees, included in the algorithm.

This chapter aims to introduce these necessary concepts as well as what has been done in previous research in this area.


3.1 Video coding

To construct a statistical multiplexer it is not necessary to construct the encoder as well, and a statistical multiplexing solution often depends on an existing encoder. In this thesis the Intel hardware encoder is used, and as such we do not have to (and to some extent, cannot) reason about the encoder implementation. Still, some high-level knowledge of common video compression techniques is needed, specifically the high-level compression techniques that strongly impact the rate characteristics of video streams. The decoder models used to define a bitstream's conformance with the codec standard are also presented.

3.1.1 Independently and dependently coded frames

A typical image compression scheme consists of finding similarities within the image and making sure to only store the similar information once. One reason video compression is even more efficient is that we can also find similarities between frames, leading to higher compression rates than can be achieved with single images. Differential coding is when a frame is coded using the difference from the previous frame. Since a decoder should be able to resume playback from anywhere in the stream, we cannot use only differential encoding, though. About once or twice a second a key frame, also called an I-frame, is sent, which is a full frame that does not depend on any other frame. Every frame between I-frames is a differentially encoded frame that depends directly or indirectly on an I-frame. This leads to very large spikes in bit rate every time an I-frame is sent. An I-frame followed by all dependent frames is called a group of pictures (GOP). [43]

Most video codecs use this system of sending a full frame followed by differentially coded frames, meaning that most video streams (and all covered in this thesis) will have predictable bit rate variations within a GOP. This can be exploited by shifting each stream to avoid overlapping I-frames, as is done by, for example, Polec et al. [39]. Since encoders typically do not have fully synchronized clocks, this solution is impractical in most scenarios when streaming for longer periods of time.
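The effect of overlapping versus staggered I-frames on the aggregate bit rate can be illustrated with a toy simulation. All numbers below are made up for illustration (an I-frame assumed ten times the size of a differentially coded frame, a 12-frame GOP, four streams); this is not a model taken from the thesis:

```python
# Toy illustration: worst-case aggregate per-frame bit usage of several
# streams whose GOPs are either aligned or staggered. An I-frame is assumed
# to be 10x the size of a differentially coded frame (hypothetical sizes).

GOP = 12                     # frames per group of pictures
I_BITS, P_BITS = 10.0, 1.0   # illustrative frame sizes

def frame_bits(t, offset):
    """Bits for one stream at frame index t, with its GOP shifted by offset."""
    return I_BITS if (t - offset) % GOP == 0 else P_BITS

def peak_aggregate(offsets):
    """Worst-case total bits in any single frame slot across all streams."""
    return max(sum(frame_bits(t, off) for off in offsets) for t in range(GOP))

aligned = peak_aggregate([0, 0, 0, 0])     # all I-frames coincide
staggered = peak_aggregate([0, 3, 6, 9])   # evenly shifted GOPs
```

With aligned GOPs the peak slot carries four I-frames at once; with evenly staggered GOPs each slot carries at most one, so the required channel headroom is much smaller.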

3.1.2 Decoder models

To verify the compatibility between encoders and decoders, a decoder model for each video coding standard is specified. This is a virtual decoder that can be implemented at the encoder to guarantee that all decoders supporting that specific codec will be able to decode the generated bitstream. The decoder model emulates the behaviour of a decoder by defining a number of buffers and how the data moves between these buffers according to bit rate, buffer levels, and the frames' timing.

The outline of the decoder models used is presented here, but we refrain from going into detail since it is somewhat out of scope for this thesis. We will mainly concern ourselves with the System target decoder (see section 3.3.3), but there is some unavoidable overlap between the system-level decoder models and the decoder models as used by the encoder.

Virtual buffer verifier

The MPEG-2 video standard defines the Virtual buffer verifier (VBV), which consists of the Elementary stream buffer in the System target decoder model, an instantaneous decoder and a Picture re-ordering buffer for decoded frames. At decode time, the picture is instantaneously removed from the Elementary stream buffer. If the presentation time is not the same as the decode time, the decoded picture is kept in the Picture re-ordering buffer until its presentation time. [48]

Hypothetical reference decoder

The Hypothetical reference decoder (HRD) for the AVC and HEVC standards is the equivalent of the VBV for the MPEG-2 video standard. It contains a Coded picture buffer (CPB), an instantaneous decoder and a Decoded picture buffer (DPB). At decode time, a frame is instantaneously removed from the CPB and the decoded data is added to the DPB. The decoded frame can, unlike in the VBV model, be kept in the DPB after its presentation time if it is referenced by later frames. The CPB is equivalent to the elementary stream buffer as used by the System target decoder and the VBV model. [48, 10]
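The buffer tracking that these decoder models prescribe can be sketched as a simple leaky-bucket simulation: data enters the buffer at a fixed input rate and each frame is removed instantaneously at its decode time. This is a minimal illustration in the spirit of the VBV/HRD models, not the exact standardized arithmetic; all sizes and rates are hypothetical:

```python
# Minimal leaky-bucket sketch of VBV/HRD-style buffer verification:
# data arrives at a constant rate, frames leave instantaneously at decode
# time, and we flag over- and underflow. Numbers are illustrative only.

def check_buffer(frame_sizes, frame_interval, input_rate, buffer_size, preroll):
    """Return buffer fullness (bits) just before each decode, or raise
    if the model over- or underflows."""
    fullness = float(preroll)       # bits buffered before the first decode
    levels = []
    for size in frame_sizes:
        fullness += input_rate * frame_interval   # arrival during one interval
        if fullness > buffer_size:
            raise OverflowError("coded picture buffer overflow")
        levels.append(fullness)
        if size > fullness:
            raise ValueError("coded picture buffer underflow")
        fullness -= size            # instantaneous removal at decode time
    return levels

# A 25 fps stream fed at 4 Mbit/s into a 2 Mbit buffer, 0.5 Mbit preroll;
# frame sizes follow a made-up I, P, P, P pattern.
sizes = [300_000, 120_000, 120_000, 120_000] * 3
levels = check_buffer(sizes, 1 / 25, 4_000_000, 2_000_000, preroll=500_000)
```

An encoder-side statistical multiplexer must perform bookkeeping of this kind for every output stream to guarantee that the emitted bitstreams stay conformant.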

3.1.3 Data dwell time

The time between when a frame arrives at the decoder and when it is decoded is called the data dwell time: the time the frame spends in the decoder's buffer. This is an important metric since it determines the amount of temporal buffering used. The MPEG-2 Systems standard gives an upper limit on the data dwell time of 1 second [23]. A user switching to the stream will have to wait this period before the decoder can present the first frame, so this value has to be limited to keep the waiting period reasonably low.
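With arrival and decode times expressed as 90 kHz clock ticks (the clock used for PES timestamps, see section 3.3.1), the dwell-time check is a one-line computation. The tick values below are hypothetical:

```python
# Data dwell time: gap between a frame's arrival in the decoder buffer and
# its decode time (DTS), capped at 1 second by MPEG-2 Systems.
# Timestamps are hypothetical 90 kHz clock ticks.

CLOCK_HZ = 90_000
MAX_DWELL_S = 1.0

def dwell_time_s(arrival_ticks, dts_ticks):
    return (dts_ticks - arrival_ticks) / CLOCK_HZ

def conforms(arrival_ticks, dts_ticks):
    """A frame must not be decoded before it arrives, nor dwell over 1 s."""
    return 0 <= dwell_time_s(arrival_ticks, dts_ticks) <= MAX_DWELL_S

ok = conforms(arrival_ticks=100_000, dts_ticks=145_000)        # 0.5 s dwell
too_long = conforms(arrival_ticks=100_000, dts_ticks=200_000)  # about 1.11 s
```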

3.2 Rate control

The bit rate variations of the video, aside from variations within a GOP, are dependent on the compression used and the complexity of the current scene. Adjusting the encoding to achieve the wanted bit rate is commonly known as rate control. There are typically two main approaches to rate control: variable bit rate (VBR) and constant bit rate (CBR). There is no strict definition here, but generally VBR tries to maintain constant video quality while varying the bit rate, and CBR maintains a target bit rate, usually with varying quality as a result. The bit rate is usually controlled by the encoder by setting the quantization step-size.

Using VBR also gives rise to the problem of actually allocating the bits to achieve an even quality level while maintaining the correct bit rate. To maintain a constant bit rate, the encoder parameters have to be inferred from the target bit rate. For VBR, the current bit rate also usually has to be inferred given a target average bit rate or quality level. This makes it a harder problem to solve than for CBR, but with better utilization of bit rate as a result.

3.2.1 Quantization

Quantization means mapping a continuous or discrete signal with more information onto a discrete signal with less information. The quantization used during video compression is commonly controlled by the quantization step-size. Essentially, the bit rate of the compressed video is usually controlled by this value, where a higher quantization means more compression and a lower bit rate. [43]

The AVC codec introduced a Quantization Parameter (QP), which controls the quantization. This is an integer between 0 and 51 where the corresponding quantization step-size is doubled for every 6th increase in QP. The quantization step-size is close to inversely proportional to the resulting bit rate [52].
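The QP-to-step-size relationship can be made concrete. The closed-form mapping Qstep ≈ 2^((QP − 4) / 6) is a commonly cited approximation for AVC, assumed here rather than taken from this thesis; what matters for rate control is the doubling every 6 QP:

```python
# Approximate AVC mapping from Quantization Parameter (QP, 0..51) to
# quantization step-size. The step-size doubles for every increase of 6
# in QP; Qstep ~ 2 ** ((QP - 4) / 6) is a common approximation (assumed).

def qstep(qp: int) -> float:
    if not 0 <= qp <= 51:
        raise ValueError("AVC QP must be in 0..51")
    return 2.0 ** ((qp - 4) / 6.0)

ratio = qstep(28) / qstep(22)   # +6 QP doubles the step-size
```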

3.2.2 Rate-distortion theory

Rate-distortion (R-D) theory is often used for optimal bit allocation, by finding the bit rate as a function of the distortion, often represented as a rate-distortion curve. The rate-distortion relationship is commonly found using analytic reasoning or a parametric model.

In all lossy compression there is a trade-off between bit rate and distortion, and rate-distortion theory is the tool we use to reason about this trade-off in order to make informed decisions for rate control during the encoding.

Typically, a rate-distortion curve similar to the one presented in figure 3.1 can be found.

Figure 3.1: A typical rate-distortion curve


When using the QP as the control parameter for rate control, two functions are needed. The rate-quantization function models the relationship between the rate and the quantization step-size, often with QP as the dependent parameter. The distortion-quantization function instead models the relationship between distortion and quantization step-size.

3.2.3 Rate-quantization models

The relationship between quantization and bit rate has been extensively researched. Some of the previously proposed methods are presented below.

Quadratic rate-quantization model

The quadratic rate-distortion model gives the rate R as a quadratic function of the distortion D as given in equation 3.1. The model is fitted using statistics of previous D and R values to determine the parameters a and b in the model. By approximating the distortion D as the quantization Q used for the frame during encoding we get equation 3.2. [6]
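Fitting the model's parameters a and b from past (Q, R) statistics reduces to a two-parameter linear least-squares problem, since the model is linear in a and b. A hypothetical sketch (the sample points are made up, and the normal-equation solve is our illustration, not the method of reference [6]):

```python
# Least-squares fit of the quadratic rate-quantization model
# R(Q) = a * Q**-1 + b * Q**-2 from past (Q, R) observations, via the
# 2x2 normal equations. Sample data below is synthetic.

def fit_quadratic_rq(samples):
    """samples: iterable of (Q, R) pairs. Returns the fitted (a, b)."""
    s11 = s12 = s22 = y1 = y2 = 0.0
    for q, r in samples:
        x1, x2 = 1.0 / q, 1.0 / (q * q)   # the two basis functions
        s11 += x1 * x1; s12 += x1 * x2; s22 += x2 * x2
        y1 += x1 * r;  y2 += x2 * r
    det = s11 * s22 - s12 * s12
    a = (y1 * s22 - y2 * s12) / det
    b = (s11 * y2 - s12 * y1) / det
    return a, b

def predict_rate(q, a, b):
    return a / q + b / (q * q)

# Synthetic data generated from a=1200, b=900 should recover the parameters.
true_a, true_b = 1200.0, 900.0
data = [(q, true_a / q + true_b / q**2) for q in (4, 8, 16, 24, 32)]
a, b = fit_quadratic_rq(data)
```

In practice the fit would be refreshed from a sliding window of recently encoded frames, so the model tracks changes in content complexity.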

R(D) = aD^−1 + bD^−2    (3.1)

R(Q) = aQ^−1 + bQ^−2    (3.2)

Rate-quantization modeling in the ρ-domain

The ρ-domain rate model is based on the observed linear relationship between the number of zeros among the quantized transform coefficients and the coded bit rate. It was proposed by He and Mitra [18], the experimental results from the method have been good, and it is used for rate control in the statistical multiplexing solution by He and Wu [19]. The number of zero transform coefficients is found in an initial encode pass and is then used during the "real" encoding.

Unfortunately, the limitations of the Intel Media SDK mean that we do not have access to the transform coefficients without implementing some custom pre-encoding step, and thus this method is not possible for this thesis.

Rate-quantization modeling in the λ-domain

Given that the allocation is done by finding the optimal point on a number of rate-distortion curves, and given that the gradient of each of these curves is monotonically decreasing, any optimal solution will be found where the gradient (the λ) is the same for all curves. The rationale is that at such a point, reallocating bits from one curve to any other cannot decrease the total distortion, since the distortion decrease per bit is not greater anywhere else. This extremum is the global distortion minimum for the resulting total number of bits, since the gradients of the rate-distortion curves are monotonically decreasing. The tricky part is to find the correct gradient, or λ, for the given total number of bits.


When solving the rate allocation problem using the Lagrangian optimization method, the bit rate is implicitly determined as a function of the Lagrange multiplier λ. The quantization is then inferred from λ. [35]
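Because the total rate decreases monotonically in λ, the correct λ for a given bit budget can be found by a simple bisection search. The hyperbolic per-stream curves R_i(λ) = c_i / λ below are purely illustrative stand-ins for real rate-λ curves, not a model from this thesis:

```python
# Sketch of finding the common slope λ by bisection: each stream's rate
# R_i(λ) decreases monotonically in λ, so the summed rate does too, and we
# search for the λ whose total rate meets the bit budget. The hyperbolic
# per-stream curves below are illustrative placeholders.

def solve_lambda(rate_funcs, budget, lo=1e-6, hi=1e6, iters=200):
    """Bisect for λ such that sum(r(λ) for r in rate_funcs) == budget."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(r(mid) for r in rate_funcs)
        if total > budget:
            lo = mid      # rates too high: move toward a steeper slope
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Three streams with different complexities c_i, using R_i(λ) = c_i / λ.
complexities = [2.0, 5.0, 3.0]
funcs = [lambda lam, c=c: c / lam for c in complexities]
lam = solve_lambda(funcs, budget=20.0)
total = sum(f(lam) for f in funcs)
```

With these curves the analytic answer is λ = (2 + 5 + 3) / 20 = 0.5, which the bisection recovers; a real implementation would replace the placeholder curves with fitted rate-λ models per stream.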

Other rate-quantization models

The rate control method used by Changuel et al. [4] is a simple model that works well given that it is only applied at GOP level. At frame level, the interdependencies between frames make things more complex. The relationship given in equation 3.3 describes the rate R as a function of the Quantization Parameter Q, leaving a and b to be found empirically.

R(Q) = a exp(−bQ) (3.3)
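Because equation 3.3 is log-linear in Q, its two parameters can be recovered exactly from two observed (Q, R) points. A small sketch with made-up observations:

```python
# Fitting the exponential model R(Q) = a * exp(-b * Q) of equation 3.3 from
# two observed (Q, R) points via a log-linear solve. Sample points are
# illustrative, not measurements from this thesis.
import math

def fit_exponential_rq(p1, p2):
    (q1, r1), (q2, r2) = p1, p2
    b = math.log(r1 / r2) / (q2 - q1)   # slope in the log domain
    a = r1 * math.exp(b * q1)
    return a, b

a, b = fit_exponential_rq((10, 6000.0), (20, 2000.0))
```

With more than two samples, the same log-transform turns the fit into ordinary linear regression of ln R against Q.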

3.2.4 Distortion-quantization models

The quantization step-size is intricately linked to distortion, but all we can really say is that increasing the step-size increases the distortion. When comparing actual distortion there is a need to model the distortion as dependent on the quantization in some way.

There exist some models for finding the distortion given the quantization level. Guo et al. [15] propose an analytical quantization-distortion model that goes into some depth on the internal workings of the encoder. It approximates the distortion value fairly well, but its dependence on access to encoder data, and the fact that it approximates a fairly simplistic distortion value, make it unsuitable for our use-case.

Many rate control methods simply assume the quantization used to be equal to the distortion. It is obvious that a smaller quantization step-size gives a lower distortion for any given frame, but this strategy can be problematic when comparing the impact quantization has on different frames. Equating the quantization with distortion assumes that different quantization levels have the same impact on all frames, which is hardly true. The quantization-distortion relationship also becomes more complex when taking into account not only the distortion within a single image, but also the varying quality levels between adjacent frames, which might adversely affect perceived quality. Section 3.4 discusses the commonly used quality metrics, but none of these take temporal changes in quality into account.

3.3 Video multiplexing using MPEG-2 Systems

This section covers how the multiplexing of multiple media content streams is done as described by the MPEG-2 Systems standard. This is not a thorough explanation of the MPEG-2 Systems standard, but presents our specific use-case of sending multiple video streams in a single transport stream. As a consequence, only the case where we want to multiplex a number of video streams is covered, although a transport stream can contain all sorts of media.

3.3.1 Elementary Streams and Packetized Elementary Streams

The encoded bitstream as produced by an encoder is referred to as an elementary stream, with the specific format depending on the codec used. When multiplexed and sent over a transport stream, the elementary stream is packetized into a packetized elementary stream (PES). The PES header contains general metadata, most importantly timing information about when the frame contained is supposed to be decoded and displayed.

The timing information is kept in the PES header for each frame as two values, the Decode Timestamp (DTS) and the Presentation Timestamp (PTS). Both of these values are sampled from a 90 kHz clock. The DTS indicates the time when the frame should be decoded and the PTS indicates the display time for the frame. The decode time is important for determining when the frame leaves the decoder buffer, as explained in section 3.3.3.
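As a small sketch of the timestamp arithmetic described above, assuming a 25 Hz stream and a hypothetical reordering delay of two B-frames (the helper names are illustrative, not from any real library):

```python
# PES timestamps are samples of a 90 kHz clock.
PES_CLOCK_HZ = 90_000

def to_pes_ticks(seconds: float) -> int:
    """Sample a time in seconds on the 90 kHz PES clock."""
    return round(seconds * PES_CLOCK_HZ)

# For a 25 Hz stream, consecutive frames are 90000/25 = 3600 ticks apart.
frame_duration = to_pes_ticks(1 / 25)

# With B-frames, decode order differs from display order, so the DTS can
# precede the PTS; e.g. a frame displayed at 120 ms but decoded two frame
# periods earlier (a hypothetical two-B-frame reordering delay):
pts = to_pes_ticks(0.12)
dts = pts - 2 * frame_duration
```

For frames in display order without reordering, DTS and PTS coincide and the DTS may be omitted from the PES header.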

3.3.2 Transport Streams

A transport stream consists solely of 188 byte-sized packets, identified using a unique packet identifier (PID) carried in the header of each packet. Each video stream is associated with a specific PID and the corresponding PES is again packetized and carried in the transport packets that belong to that specific PID. The transport stream packets also carry the Program Clock Reference (PCR) of each of the video streams, which indicates the timestamp of the packet as sampled from a 27 MHz clock at the sender.
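The fixed packet layout can be illustrated with a small parser. The field layout (sync byte 0x47, 13-bit PID, 4-bit continuity counter) comes from the MPEG-2 Systems specification; the helper function itself is an illustrative sketch:

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the fixed 4-byte header of one 188-byte transport packet."""
    assert len(packet) == 188 and packet[0] == 0x47, "bad sync byte"
    return {
        # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "payload_unit_start": bool(packet[1] & 0x40),
        # The adaptation field, when present, may carry the 27 MHz PCR.
        "has_adaptation_field": bool(packet[3] & 0x20),
        "continuity_counter": packet[3] & 0x0F,
    }

# Null packets used for padding carry the reserved PID 0x1FFF.
null_packet = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184)
header = parse_ts_header(null_packet)
```

The reserved PID 0x1FFF is what a constant bit rate multiplexer uses for the null padding packets mentioned below.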

A transport stream containing multiple programs is sent at a constant bit rate. A constant bit rate transport stream is a different concept than a constant bit rate video stream. Where constant bit rate is a somewhat diffuse concept for a video stream (the main question being over what time period the rate is considered constant), a constant bit rate transport stream means that the transport stream packets are sent at a fixed interval. When there is no real data to send, the stream is padded with null packets.

3.3.3 System Target Decoder

The MPEG-2 Systems standard [23] defines a system target decoder, which is a virtual demultiplexer and decoder model that defines the timing and bit rate requirements of a conforming transport stream. It serves the same purpose as the decoder models defined by the video codecs being transported and is to be seen as an extension of those models.

The system target decoder exists in different versions; the one covered here is the transport stream system target decoder (T-STD). It defines the decoding behaviour of audio, video and other data, such as program metadata. Figure 3.2 shows a simplified version of the T-STD model. Each transport packet is sent to the correct pipeline, which contains a number of buffers. Given the buffer sizes and bit rates between each buffer, conformance is defined by the under- and overflow requirements of each buffer.

Figure 3.2: Overview of the transport system target decoder. The empty boxes represent buffers with a well-defined size, input and output bit rate.

Figure 3.3: The video pipeline from the transport system target decoder.

The only part that is of interest for this thesis is the video decoding pipeline, which is presented in more detail in figure 3.3. The first buffer is the transport buffer and it contains the transport stream packets. The packets are received by the buffer at the bit rate of the transport stream, and emptied at 120 % of the max bit rate of the video stream. The transport headers are stripped before the data is sent to the multiplexing buffer, which is emptied at 100 % of the max bit rate of the video stream. The PES headers are then stripped before reaching the last buffer, the elementary stream buffer, which is equivalent to the buffers for encoded data in the decoder models for the current codec (see 3.1.2). Each frame is removed from the elementary stream buffer at its decode time (defined by its DTS). The purpose of the first buffer is mainly to limit the bit rate reaching the decoder, since the bit rate of the transport stream often is a lot higher than that of an individual video stream. The size of this buffer is thus kept intentionally small at 512 bytes. The multiplexing buffer exists to handle the extra overhead from the PES packetization. It can also provide extra buffering when the size of the elementary stream buffer is less than the maximum for the current video stream. Here we assume that the elementary stream buffer is sized as large as possible, though. The elementary stream buffer handles most of the actual buffering of the video and, as mentioned, is the only one directly connected to the video decoder.

The requirements for the buffers, given the sizes and bit rates, are that the transport and multiplexing buffers should not overflow. The elementary stream buffer simply stops accepting bits from the multiplexing buffer when it is full and thus cannot overflow. The transport and multiplexing buffers should also empty at least once every second. The elementary stream buffer should not underflow, meaning that each frame must be fully in the elementary stream buffer at its decode time.
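A minimal sketch of the overflow check for the first (transport) buffer, under two simplifying assumptions not in the model itself: each 188-byte packet is deposited instantaneously at its arrival time, and the buffer drains continuously at 120 % of the maximum video bit rate:

```python
def transport_buffer_ok(arrival_times, max_video_bps, buffer_size_bytes=512):
    """Return True if the simplified transport buffer never overflows.

    arrival_times: packet arrival times in seconds, non-decreasing.
    The buffer leaks continuously at 1.2 x the max video bit rate.
    """
    drain_bps = 1.2 * max_video_bps
    level_bits, prev_t = 0.0, 0.0
    for t in arrival_times:
        # Drain since the previous packet, never below empty.
        level_bits = max(0.0, level_bits - drain_bps * (t - prev_t))
        level_bits += 188 * 8                      # whole packet arrives
        if level_bits > buffer_size_bytes * 8:
            return False                           # overflow: non-conforming
        prev_t = t
    return True

# Three packets back-to-back overflow the 512-byte buffer;
# spaced 100 microseconds apart they drain fast enough.
ok = transport_buffer_ok([0.0, 0.0001, 0.0002], max_video_bps=15e6)
burst = transport_buffer_ok([0.0, 0.0, 0.0], max_video_bps=15e6)
```

This illustrates why the packet scheduler in chapter 4 must track buffer levels while constructing the transport stream: packet spacing, not just average rate, decides conformance.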

3.3.4 Statistical multiplexing methods

A statistical multiplexing solution combines a number of video streams while making sure that the bit rate constraints of the individual streams as well as the total bit rate constraint are satisfied. Figure 3.4 shows the typical design of a statistical multiplexer. Often the input frames are put through some kind of pre-processing step (although this is not strictly needed), then a decision is made as to how to allocate the total bit rate across all frames. There are two kinds of solutions used to do the allocation: open loop and closed loop. A closed loop uses feedback from previous decisions to try to extrapolate the rate-distortion behaviour of the future frames. The feedback is indicated in figure 3.4 by the dashed lines. An open loop solution lacks this feedback.

Figure 3.4: Architectural overview of a typical statistical multiplexer for video.

A common method for frame-level bit allocation, which has been used with some success [36, 3], is to compute some kind of complexity measure for the raw frame and use this as a weight function for allocating the bits among frames.
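That weight-function scheme can be sketched as follows; the helper is hypothetical, and the complexity values could come from, for example, the average-gradient measure used later in this thesis:

```python
def allocate_bits(total_bits: int, complexities: list) -> list:
    """Split a bit budget across frames proportionally to a per-frame
    complexity measure (complexity acts as the weight function)."""
    total_c = sum(complexities)
    alloc = [int(total_bits * c / total_c) for c in complexities]
    alloc[-1] += total_bits - sum(alloc)   # hand rounding remainder to last
    return alloc

# A frame twice as complex gets twice the bits.
budget = allocate_bits(1000, [1.0, 1.0, 2.0])
```

Giving the rounding remainder to the last frame keeps the allocation summing exactly to the budget, which matters when the budget is itself a hard constraint.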

He and Wu [19] propose a method for statistical multiplexing of AVC streams. They allocate the total number of bits available to maintain an even video quality by using a simple buffer model that determines the initial buffer level and, subsequently, the total number of output and input bits, using the current input and output bit rate of the decoder buffer. The allocation problem is set up as a constrained minimization problem aimed at minimizing a global distortion to be achieved across all programs, while fulfilling all buffer constraints using the simple buffer model. Using a ρ-domain rate control method, the expected bit rate for a fixed distortion value is calculated. The bit rate for each program is then given as proportional to the expected bit rate needed. To find the real distortion value to be used for the encoding, different distortion values are tested until one is found that satisfies all constraints. The authors’ testing shows that this method achieves a 40-50 % reduction in needed bandwidth to reach the same quality as compared to a constant bit rate scheme.

The method proposed by Changuel et al. [4] is another statistical multiplexing solution for AVC coding, and differs from the others in that it allocates bits on GOP level and uses a PID controller to maintain buffer levels. By setting up constraints for maximum bit rate, minimum distortion, “smoothness”, meaning minimal PSNR variations, and “fairness”, meaning an even quality across the different programs, they proceed to maximize the PSNR value given these constraints by allocating bit rate per GOP.

Pang et al. [38] categorise statistical multiplexing algorithms as either minimising distortion variance or minimising average distortion. They categorise the method proposed by He and Wu [19] as the former and themselves propose an algorithm in the latter category, also for AVC coding. Their experimental results show the proposed algorithm to be better than previous algorithms that minimise average distortion. The statistical multiplexing solution allocates bits on the frame level and then uses bit rate control to determine the quantisation parameter per macroblock. Using a previously proposed model for the rate-distortion relationship between dependent frames, the statistical multiplexing problem is solved over a number of future frames, just as was done by He and Wu [19]. This problem is relaxed into a convex optimisation problem that can be solved efficiently with well-known methods.

Blestel et al. [3] propose a simpler, generic statistical multiplexing solution that isn’t dependent on a specific encoder. By assuming the rate control is a given part of the encoder, they propose just a bit allocation scheme, and assume that they can set the bit rate directly. This abstracts away a lot of, most often codec-specific, calculations. Still, the calculated distortion for each program is based upon a very simple relationship between distortion, complexity, and bit rate which assumes the AVC codec is used. The distortion value is provided by a pre-encoding step which partly simulates the real encoding. Given this complexity value and relationship, the problem is solved using Lagrangian optimization such that either maximum or overall distortion can be minimized.


The solution differs from the others in that it makes a point of decoupling the different parts of the statistical multiplexer. It assumes that both encoder and rate control are opaque entities that can be assigned a target bit rate. It also uses an open-loop solution to avoid the coupling between the multiplexing stage and earlier stages. The solution itself uses a pre-encoding analysis stage to determine the complexity of the incoming raw frames. It uses this complexity value in a simple rate-distortion model to find the optimal bit allocation, maximizing PSNR.

3.4 Video quality metrics

So far, the distortion and quality of the video have only been brought up as abstract concepts. This section aims to bring a bit more clarity to the meaning of these concepts, and how they are measured.

The only final and absolute metric for video quality assessment is subjective user tests. ITU-T [30] defines a standard method to conduct these tests. Since these kinds of tests are expensive to do, objective video quality metrics are often used instead. These metrics are well-defined measurements that usually measure the distortion in the image.

There are three major categories of objective video quality assessment methods: no-reference (NR), reduced reference (RR), and full reference (FR). NR methods use only the compressed video for quality assessment, without access to the uncompressed original. Examples of NR methods are those that try to find common compression artifacts in the image. The NR methods have the advantage of being computationally simpler, in that they don’t require any access to reference material, but lack the performance of the FR and RR methods. The RR and FR methods have access to some, respectively all, of the original video when assessing the video quality. FR yields the best results, with some methods coming close to the “real” subjective metric as defined by user tests, but is not usable in real-time applications due to the computational complexity. Simpler FR methods exist and are commonly used, such as peak signal to noise ratio (PSNR). [7, 32]

For the evaluation of rate control algorithms such as those for statistical multiplexing, an FR metric is commonly used because it gives the best accuracy, and performance is not critical when the metric is used only for evaluation.

3.4.1 Peak Signal-to-Noise Ratio and Mean Squared Error

The most commonly used metrics are the Peak Signal-to-Noise Ratio (PSNR) and the Mean Squared Error (MSE). Both of these measure the amount of information loss in the signal while treating all information as equal, measuring the loss as the squared distance between the original and compressed signals. The difference is that PSNR measures the error, or noise, relative to the max amplitude of the signal and commonly uses the decibel scale, while MSE measures the absolute error.


PSNR = 10 log10( x_max^2 / ( (1/N) Σ_{n=1}^{N} (x_n − y_n)^2 ) ) dB    (3.4)

The definition of MSE is [43]

MSE = (1/N) Σ_{n=1}^{N} (x_n − y_n)^2    (3.5)

In both equation 3.4 and 3.5 the input frame is a vector of pixel samples (x_1 . . . x_N) of length N and the output signal after compression is a corresponding vector (y_1 . . . y_N) of equal length. x_max is the largest possible value of a single sample.
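Equations 3.4 and 3.5 translate directly into code; a minimal sketch for 8-bit samples (x_max = 255):

```python
import math

def mse(x, y):
    """Mean squared error between two equal-length sample vectors (eq. 3.5)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def psnr(x, y, x_max=255):
    """Peak signal-to-noise ratio in dB (eq. 3.4); infinite for identical
    inputs, since the error energy is then zero."""
    m = mse(x, y)
    return math.inf if m == 0 else 10 * math.log10(x_max ** 2 / m)
```

Because of the logarithm, halving the error energy always gains 10 log10(2) ≈ 3.01 dB, regardless of content.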

PSNR has been shown to correlate with quality and is usable as a quality comparison tool, but only as long as the codec and content remain the same [20]. PSNR isn’t ideal for assessing quality levels across the different video streams in the statistical multiplexing scenario, since the different programs typically have varying content.

3.4.2 Structural similarity index

The Structural Similarity Index Metric (SSIM) [51] measures image quality by looking at a combination of luminance, contrast, and structure differences between the original and compressed images. This metric has been used extensively in rate optimization research [50, 53, 37, 14, 9] to try to achieve results that correlate better with the quality as perceived by the user.

The method defines the luminance l(x, y), contrast c(x, y) and structure s(x, y) for input x and output y as

l(x, y) = 2µ_x µ_y / (µ_x^2 + µ_y^2),  c(x, y) = 2σ_x σ_y / (σ_x^2 + σ_y^2),  s(x, y) = σ_xy / (σ_x σ_y)    (3.6)

where µ_x, µ_y are the means of x and y, σ_x, σ_y are the standard deviations of x and y, and σ_xy is the covariance of x and y.

The metric itself is then defined as the product of these with added constants C1 and C2 for numerical stability when the denominators are close to zero. Equation 3.7 shows the resulting product of the three metrics, and equation 3.8 shows the definition of the SSIM metric as the same product, modified with the extra constants for numerical stability.

l(x, y) × c(x, y) × s(x, y) = 4µ_x µ_y σ_xy / ((µ_x^2 + µ_y^2)(σ_x^2 + σ_y^2))    (3.7)

SSIM(x, y) = (2µ_x µ_y + C1)(2σ_xy + C2) / ((µ_x^2 + µ_y^2 + C1)(σ_x^2 + σ_y^2 + C2))    (3.8)
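A single-window sketch of equation 3.8 using global statistics (real SSIM implementations average the index over local windows; the C1 and C2 defaults below follow the common 8-bit convention from the original SSIM paper and are an assumption here):

```python
import statistics

def ssim(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Equation 3.8 over one window of samples x, y (lists of floats)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.pvariance(x), statistics.pvariance(y)   # sigma^2
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An identical signal scores exactly 1, and any degradation pushes the index below 1, which is what makes SSIM usable as a bounded quality score.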

3.4.3 Other quality metrics

There exist a lot of other quality metrics, as well as several variations of the PSNR and SSIM metrics. This is an on-going area of research and while there exists a plethora of different metrics to choose from, there is no definitive consensus on what best conforms to subjective user tests. A recent paper by Chen et al. [5] is a good resource for the current state of the art regarding the measuring of video quality. Netflix has also recently chimed in with a quality metric of their own [1], which combines a number of different metrics, weighted by fitting them to a large data set of subjective user tests.

It should be noted that none of these more advanced metrics are used to the same extent as SSIM, and especially not as much as PSNR/MSE.

3.5 Intel Media SDK

The Intel Media SDK handles the encoding in a more-or-less opaque way. The wanted codec can be chosen, along with some configuration of the rate control method used. There are some possibilities for finer control of the encoding process, but these apply only to the AVC codec.

The Intel Media SDK doesn’t support anything apart from the encoding itself, meaning the multiplexing has to be done by an external component.

With an API that is largely the same for different codecs, a statistical multiplexing solution can be found that is generic for the different supported codecs of the Intel Media SDK.

3.5.1 Bit rate control

The Intel Media SDK contains several rate control algorithm implementations for various use-cases: both VBR and CBR, as well as variations thereof. It also enables the developer to set the quantization used directly. Below we briefly introduce the specific bit rate control methods of the Intel Media SDK that were used in this thesis. [45]

Not all rate control algorithms are supported for all codecs, but all of the algorithms covered here support MPEG-2, AVC and HEVC encoding with hardware support.

Constant bit rate (CBR)

The constant bit rate algorithm tries to maintain a constant bit rate, padding the frames with unused bytes when needed. This results in smaller bit rate fluctuations.


Constant Quantization Parameter (CQP)

To also allow for custom rate control algorithms, the Intel Media SDK provides a mode called Constant Quantization Parameter. The Quantization Parameter is a single value that is set to control the quantization, and thereby the “lossy-ness” of the encoding, as further described in section 3.2.1. The name is a bit misleading: although the default QP is set during initialization, it can also be set dynamically during encoding. This mode is recommended [45] for more advanced use-cases or when something other than the given rate control algorithms is needed.


4 Method

This chapter describes how the statistical multiplexing solution was designed, as well as how the tests and evaluations of the Intel Media SDK and the statistical multiplexer were conducted.

The high-level design choices of the statistical multiplexing solution are presented and motivated, as well as related to solutions from previous research.

4.1 Rate control with Intel Media SDK

Out of the different rate control methods available in the Intel Media SDK, the CQP method was chosen since it was the only one that fully supported dynamically changing the bit rate, and it is also the most flexible. Using any other rate control method with the Intel Media SDK would make it more difficult to dynamically adjust the bit rate.

Note that this means that we have a control parameter that is fairly close to the resulting quality, but that makes it harder to predict the bit rate. Compare this to the solution by Blestel et al. [3], which uses the bit rate as control parameter and thus mainly focuses on approximating the resulting quality.

4.2 Evaluating rate and distortion characteristics for Intel encoder

To illustrate the rate and distortion characteristics of the Intel encoder for different QP, a number of tests were done by encoding a lossless sequence using a range of constant QP values. The frame complexity was approximated by the average gradient of the frame, defined by equation 4.1 as C_gradient, for an image of size X × Y with pixel values defined for all image positions (x, y) by the function P(x, y). This is a fairly common and simple complexity metric that has been determined to work fairly well, for example in works by Yao et al. [54] and Wang et al. [49].

C_gradient = (1 / (X × Y)) Σ_{x=2}^{X} Σ_{y=2}^{Y} |2P(x, y) − P(x − 1, y) − P(x, y − 1)|    (4.1)

The Intel Media SDK was tested by using the CQP rate control method to encode 5 10-second clips from the SVT Multi Format [16] collection. The format used was 1920x1080, interlaced, with frame rate 25 Hz. The clips were combined into one continuous sequence and then encoded with the FFmpeg version and command shown in listing 4.1. The sequence was then transcoded using all possible values of QP (1-51).

Listing 4.1: FFmpeg version and options used for encoding test sequences.

$ ffmpeg -version
ffmpeg version N-86011-g36cf422 Copyright (c) 2000-2017 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-4)
configuration: --enable-libmfx --enable-nonfree
libavutil      55. 62.100 / 55. 62.100
libavcodec     57. 95.101 / 57. 95.101
libavformat    57. 72.101 / 57. 72.101
libavdevice    57.  7.100 / 57.  7.100
libavfilter     6. 89.100 /  6. 89.100
libswscale      4.  7.101 /  4.  7.101
libswresample   2.  8.100 /  2.  8.100

$ ffmpeg -s 1920x1080 -r 25 -pix_fmt yuv420p -i 5_oldtowncross.yuv
    -vcodec libx264 -qmin 1 -qmax 1 -b:v 19M -minrate:v 19M
    -maxrate:v 19M -bufsize:v 3M -f mpegts

The distortion of the resulting compressed video was evaluated using the open source Video Quality Measurement Tool (VQMT) [17].

4.3 Satisfying rate constraints

The bit rate constraints of the multiplexed stream were guaranteed using a packet scheduler that tracks the buffer levels in the T-STD model while constructing the transport stream. The packet scheduler sends the video data of the multiplexed streams with equal priority while making sure data is only sent if the T-STD model allows it. The packet scheduler only limits bit rate; the other parts of the algorithm have to make sure that the bit rate of each stream is sufficiently small that each frame can be sent on time.

The stream-specific parameters of the T-STD model, shown in figure 3.3, were chosen according to the settings and profiles of the codecs that are supported by DVB [12]. Table 4.1 summarizes the stream-specific T-STD settings for the used codecs. While the size of the elementary stream buffer can be chosen by the encoder, the implementation in this thesis uses the maximum size allowed. This has limited effect on the T-STD, since a smaller elementary stream buffer only means that the multiplexing buffer will be larger to compensate.

Codec    Maximum bit rate    Elementary stream buffer size
MPEG-2   80 Mbit/s           1222.656 KB
AVC      24 Mbit/s           3750 KB
HEVC     22 Mbit/s           2750 KB

Table 4.1: The maximum bit rate and elementary stream buffer size used by the T-STD for HD content.

4.4 Rate control model

No simple single-pass rate control method was found to work well enough to avoid violating bit rate constraints due to approximation errors. For this reason, a two-pass rate control method was used, where a very simple rate control model approximates the decrease in bit rate gained by increasing the QP for the second pass. The two-pass approach is what gives this method its viability; the model itself is just an obvious step up from a linear model, given that the rate-quantization relationship is not linear.

The main problem with this approach is the performance penalty of encoding each frame twice. Errors in bit rate estimation are compensated for as soon as the resulting frame is returned by the encoder, limiting the impact of the errors.
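The two-pass idea can be sketched as follows. The power-law rate-quantization model and its exponent are hypothetical stand-ins for the simple model mentioned above, not the actual implementation:

```python
import math

def second_pass_qp(first_pass_bits, first_pass_qp, target_bits, alpha=1.2):
    """Pick the QP for the second encoding pass.

    Assumes bits ~ k * QP**(-alpha), a simple non-linear step up from a
    linear model. The constant k cancels out when comparing the two
    passes, so only the bit ratio is needed. Round up and clamp to the
    maximum QP of 51 to err on the side of a smaller frame.
    """
    if first_pass_bits <= target_bits:
        return first_pass_qp            # first pass already fits the budget
    ratio = first_pass_bits / target_bits
    return min(51, math.ceil(first_pass_qp * ratio ** (1 / alpha)))
```

Rounding up means the second pass may slightly undershoot the budget, which is the safe direction when a hard bit rate constraint must hold.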

4.5 Distortion model

The distortion was chosen to be approximated by the quantization used by the encoder, assuming that a larger quantization step-size gives a higher distortion in equal measure across all programs and frames.

Modeling the distortion more exactly becomes a lot more complex and was not deemed feasible within the scope of the project, considering that the model should be valid for all used codecs and that the Intel encoder does not give us access to all the information typically needed for more exact distortion models.


The feasibility of this approach was evaluated together with the rest of the algorithm, by checking whether it resulted in an equal distortion level across the multiplexed programs according to the used video quality metrics.

4.6 Distortion and bit rate control

The core part of the statistical multiplexer is to use the rate-distortion model to set the encoder parameters in such a way that the packet scheduler is able to send each frame on time, while trying to maximize video quality.

This is done by calculating a global distortion setting that, given the rate-distortion model, will yield a bit rate that can be sent on time. The algorithm assumes that the bit rate is mainly limited by the total bit rate of the transport stream, not the bit rate of the individual elementary streams. While the bit rate of the individual streams is taken into account during the calculation, a too high bit rate in a single stream will result in a higher distortion setting for all streams, without full bit rate utilization. Since the maximum bit rate of the individual streams was chosen to be as high as possible (see table 4.1), this was deemed unlikely to happen.

The algorithm was designed to change the distortion setting slowly when it can, to avoid quick quality variations, but quicker changes are allowed when needed to avoid a frame not being sent on time.

4.7 Initial video stream offset

Some consideration was given to how to handle the bit rate similarities generated by the GOP structure (see section 3.1.1). To avoid sending the I-frames of different programs at the same time, an offset was added to each video stream in such a way that the position of each program’s I-frame was evenly spaced across the GOP of the first program being sent.

The offset was implemented by discarding N_i frames at the start of each program i (i = 0 is the first program), where N_i is found by equation 4.2.

N_i = G · i / N;    G = GOP size, N = number of programs, 0 ≤ i < N    (4.2)

This was not formally included in the statistical multiplexing algorithm, since the video streams will have some temporal drift with respect to each other due to different encoders using different clocks. The impact of doing this is negligible as well, since most of the time at least a whole GOP is buffered. The reason this was included in the implementation at all was that it was initially believed to have a larger impact than it had.
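A sketch of equation 4.2, assuming the division is rounded down to an integer frame count:

```python
def initial_offsets(gop_size: int, n_programs: int) -> list:
    """Frames to discard per program (equation 4.2), spreading the
    I-frames evenly across one GOP instead of letting them coincide."""
    return [gop_size * i // n_programs for i in range(n_programs)]

# With a 15-frame GOP and 4 programs, I-frames land 3-4 frames apart.
offsets = initial_offsets(15, 4)
```

Program 0 keeps all its frames, so its GOP phase defines the reference the other programs are spaced against.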


4.8 Evaluation 27

4.8 Evaluation

Previous research uses openly available test sequences [38, 4], other video sequences [3], or both [19, 33] for evaluating the solutions.

The statistical multiplexing solution proposed by this report was evaluated both using publicly available test sequences and with longer tests more similar to a real-world scenario.

The test using the publicly available test sequences was done offline, meaning that input and output were both files and there was no requirement on being able to process the input fast enough for live viewing. The main reason for this was the performance impact of gathering the data needed to monitor the video quality impact of the transcoding. When measuring the quality impact, the baseline used for comparison was the same streams encoded using the Intel Media SDK constant bit rate (CBR) rate control instead, set to use an equal share of the total bit rate. The overhead of the transport stream was taken into account when setting the bit rate of the CBR streams, while the overhead of the PES header was assumed to be negligible. Since the PCR, but nothing more, was always included in the transport stream header in the implementation used, the overhead can easily be calculated as 12/188 ≈ 6.4 % (see the transport stream specification [23] for details).

The longer test was done with real input recorded from satellite and terrestrial sources. For this test the statistical multiplexer was receiving and sending data from and to live sources. For performance reasons, the video quality data was not gathered during this test; the main purpose was to verify proper operation of the statistical multiplexer in a more realistic scenario.

The quality was compared using the PSNR and SSIM metrics, again using the VQMT [17] software. The choice of PSNR was based on it being the current de facto standard for rate control evaluation. To get a more accurate assessment of perceived quality, the SSIM values were presented as well. Newer, more advanced metrics were rejected because of the lack of consensus on which of these really corresponds to visual quality as perceived by humans. In short, measuring PSNR makes it possible to compare the result with the majority of the other research out there, while SSIM gives a hint of what the perceived quality looks like, as opposed to just PSNR.

Hardware and software specifications

The statistical multiplexer was running on an Intel NUC Kit NUC6i7KYK using CentOS 7.2 with Intel Media Server Studio Community Edition 2017 R1.

Test video sequences

The test files used for the offline test were four freely accessible movies:

1. Big Buck Bunny [41]


3. Sintel [8]

4. Elephants Dream [34]

These were encoded from lossless data using the Intel-based encoder integrated in FFmpeg, which uses the Intel Media SDK. The software version of the Intel Media SDK and the hardware were the same as used with the statistical multiplexer. The output in listing 4.2 shows the version of FFmpeg used as well as the encoder options. To test the statistical multiplexer with up to eight programs, the movies were also encoded in reverse.

Listing 4.2: FFmpeg version and options used for encoding test sequences.

$ ffmpeg -version
ffmpeg version 2.6.8 Copyright (c) 2000-2016 the FFmpeg developers
built with gcc 4.8.5 (GCC) 20150623 (Red Hat 4.8.5-4)
configuration: --prefix=/usr --bindir=/usr/bin --datadir=/usr/share/ffmpeg
    --incdir=/usr/include/ffmpeg --libdir=/usr/lib64 --mandir=/usr/share/man
    --arch=x86_64 --optflags='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
    -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4
    -grecord-gcc-switches -m64 -mtune=generic' --enable-bzlib
    --disable-crystalhd --enable-gnutls --enable-ladspa --enable-libass
    --enable-libcdio --enable-libdc1394 --enable-libfaac --enable-nonfree
    --enable-libfdk-aac --enable-nonfree --disable-indev=jack
    --enable-libfreetype --enable-libgsm --enable-libmp3lame --enable-openal
    --enable-libopenjpeg --enable-libopus --enable-libpulse
    --enable-libschroedinger --enable-libsoxr --enable-libspeex
    --enable-libtheora --enable-libvorbis --enable-libv4l2 --enable-libx264
    --enable-libx265 --enable-libxvid --enable-x11grab --enable-avfilter
    --enable-avresample --enable-postproc --enable-pthreads --disable-static
    --enable-shared --enable-gpl --disable-debug --disable-stripping
    --shlibdir=/usr/lib64 --enable-runtime-cpudetect
libavutil      54. 20.100 / 54. 20.100
libavcodec     56. 26.100 / 56. 26.100
libavformat    56. 25.101 / 56. 25.101
libavdevice    56.  4.100 / 56.  4.100
libavfilter     5. 11.102 /  5. 11.102
libavresample   2.  1.  0 /  2.  1.  0
libswscale      3.  1.101 /  3.  1.101
libswresample   1.  1.100 /  1.  1.100
libpostproc    53.  3.100 / 53.  3.100

$ ffmpeg -i "lossless/%05d.png" -vcodec libx264 -b:v 19M -minrate 19M
    -maxrate 19M -bufsize 3M -x264-params "nal-hrd=cbr" -g 15 -bf 2
    -pix_fmt yuv420p -profile:v high -level 4.2 -f mpegts -muxrate 25M
    cbr19M_libx264_tscbr25M.ts

Table 4.2 shows which channels are used for the longer live test and also which source each belongs to. The video streams were all originally encoded with a variable bit rate and streams from the same source were previously processed by a statistical multiplexer.

Program           Source
SVT1 HD Öst       Linköping Vattentornet 618 MHz (Terrestrial) [46]
SVT2 HD Öst       Linköping Vattentornet 618 MHz (Terrestrial) [46]
HR Fernsehen      Astra 19.2°E 10891 MHz Horizontal (Satellite) [2]
RBB Brandenburg   Astra 19.2°E 10891 MHz Horizontal (Satellite) [2]

Table 4.2: Programs used in the live test. All were encoded at 720p and 50 frames per second.


5 Results

The results of the thesis are presented in three parts: a section with the results of the initial testing of the rate and distortion characteristics of the Intel encoder, an explanation of the statistical multiplexing algorithm, and the results of the evaluation of the statistical multiplexer.

5.1 Rate and distortion characteristics of Intel encoder

In this section a number of smaller tests are presented where the Intel encoder is used for encoding the SVT multi format [16] test sequences for different QP. The 5 test sequences have been concatenated and encoded as one sequence. Every test sequence is 10 s long, and the coding difficulty assumed by SVT is presented in table 5.1.

Note that the Intel HEVC hardware encoder did not support “normal” P-frames, only generalized P- and B-frames, meaning all such frames were marked as B-frames in these tests.

Test sequence    Name            Difficulty
1                CrowdRun        Difficult
2                ParkJoy         Difficult
3                DucksTakeOff    Difficult
4                IntoTree        Easy
5                OldTownCross    Easy

Table 5.1: Coding difficulty of the SVT multi format test sequences, as assumed by SVT. [16]

Figure A.2 shows that the coding complexity given by SVT in table 5.1 is accurate: the first three sequences have similarly high bit rates, while the last two have significantly lower bit rates. The complexity as approximated by the gradient in figure A.1 follows this behaviour fairly well, apart from the third sequence, where the gradient is a lot lower than the resulting bit rate would suggest. It is notable that the bit rate of the MPEG-2 encoded stream is a lot lower than for AVC and HEVC. The MPEG-2 codec responds differently to the same QP, compressing the video more. As shown in figures A.3 and A.4, the distortion for the MPEG-2 stream is also a lot higher. Figures A.5, A.6 and A.7 show this difference even more clearly.

The bit rate curve in figure A.2 shows very similar bit rates for AVC and HEVC, and figure A.5 also shows a similar behaviour for different QP values, although HEVC shows somewhat lower bit rates for low QP values.

As for distortion, PSNR and SSIM tell two different stories when looking at the distortion variations in figures A.3 and A.4. The most notable variations in PSNR are an overall higher quality for the less complex scenes for MPEG-2 and some variations between the different frame types. SSIM, on the other hand, shows a higher quality for the more complex scenes instead, and especially for AVC and HEVC the quality seems proportional to the bit rate. SSIM also shows some variations in distortion between the different frame types.

5.2 Statistical multiplexing algorithm

This section presents the algorithm that, given the rate-quantization and distortion-quantization models, determines the least possible distortion that can be main-tained across all programs while satisfying all bit rate constraints as defined by the T-STD. The implementation of the algorithm comes with some caveats as it simplifies some things, mainly regarding the rate-distortion modeling.

The statistical multiplexing algorithm can be divided into three parts: a rate-distortion model, a mechanism for limiting the bit rate to the decoder, and a control algorithm that determines the smallest distortion setting that fits into the total transport stream bit rate.
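For illustration, the interplay of the three parts could be sketched as follows. This is a hypothetical stand-in, not the thesis implementation: the rate model predicts each program's next frame size from a first-pass measurement (assuming the rate roughly halves per 6-QP increase), a simple total-budget check stands in for the T-STD constraints, and the controller searches for the lowest common QP that is feasible. All names and the example numbers are invented.

```python
def predicted_bits(program, qp):
    """Rate-distortion model: bits for the program's next frame at the
    given QP, extrapolated from a first-pass point; the rate is assumed
    to halve for every 6-QP increase (quantizer step doubles per 6 QP)."""
    return program["first_pass_bits"] * 2 ** (-(qp - program["first_pass_qp"]) / 6)

def feasible(programs, qp, frame_budget_bits):
    """Constraint stand-in: do all next frames fit the shared budget?"""
    return sum(predicted_bits(p, qp) for p in programs) <= frame_budget_bits

def choose_qp(programs, frame_budget_bits, qps=range(1, 52)):
    """Controller: the lowest QP (least distortion) that is feasible."""
    for qp in qps:  # lowest distortion first
        if feasible(programs, qp, frame_budget_bits):
            return qp
    return max(qps)  # fall back to the coarsest setting

programs = [{"first_pass_qp": 30, "first_pass_bits": 400_000},
            {"first_pass_qp": 30, "first_pass_bits": 150_000}]
qp = choose_qp(programs, frame_budget_bits=300_000)  # 36 for these numbers
```

In a real multiplexer the feasibility check is far more involved (per-stream buffer states rather than a single shared budget), but the search structure is the same.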

5.2.1 Rate-quantization and distortion-quantization models

The statistical multiplexing algorithm needs a rate-distortion model for the encoder. Since the control parameter for the encoder’s bit rate is the QP in this scenario, this relationship will be defined as a rate-quantization and a distortion-quantization function. This gives both the rate and distortion as functions of the QP, which transitively defines the R-D relationship.

To find the bit rate for a certain QP, the encoding was set up as shown in figure 5.1. The encoding is done in two passes. The first pass gives an initial point on the rate-quantization curve, and from this point the bit rate for a higher QP is calculated using the simple relationship shown in equation 5.1. By using the initial point from the first pass, the constant α can be disregarded when calculating the second point. Since the quantization step doubles for every 6th QP (see section 3.2.1), the constant β was set to 1/6.

Figure 5.1:The rate control used to set the target bit rate.

The distortion is simply approximated to be equal to the quantization used, as shown in equation 5.2.

R(Q) = α · 2^(βQ),    β = 1/6    (5.1)

D(Q) = Q    (5.2)
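Because α cancels when taking the ratio between two points on the curve of equation 5.1, a single first-pass measurement (Q₀, R₀) suffices to predict the rate at any other QP. A minimal sketch (hypothetical function names; the exponent is written with a negative sign here so that a higher QP predicts a lower rate):

```python
def predict_rate(r0, qp0, qp):
    """Predict the bit rate at `qp` from a first-pass point (qp0, r0)
    using R(Q) = alpha * 2**(beta*Q) with beta = 1/6; taking the ratio
    of the two points cancels alpha, and the sign convention assumed
    here makes the rate halve for every 6-QP increase."""
    return r0 * 2.0 ** (-(qp - qp0) / 6.0)

def distortion(qp):
    """Equation 5.2: distortion approximated by the quantizer itself."""
    return qp

# A 4 Mbit/s first pass at QP 30 predicts 2 Mbit/s at QP 36.
second_point = predict_rate(4_000_000, 30, 36)
```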

5.2.2 Maintaining bitstream conformance

The multiplexing is controlled by a packet scheduler that prioritizes the stream with the least number of frames sent and verifies conformance for each packet using the T-STD model. It does this by keeping track of the state of the buffers in the model for each stream and updates the state each time a packet is sent. When no packet from any program can be sent, a null packet is sent instead. Figure 5.2 shows a high-level overview of the logic flow for the packet scheduler.

As explained further in section 5.2.3, the quantization used is determined by simulating the transmission of the frames waiting to be sent to determine what kind of bit rate is allowed for the next frames to be encoded and this simulation starts at the time of the latest sent frame. Therefore, the packet scheduler also updates the time of the last frame sent together with the T-STD buffer state at that time for each frame sent.
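One scheduling step of the policy described above could be sketched as follows. This is a hypothetical stand-in with invented names, not the thesis code: streams are tried in order of fewest frames sent, the first one whose next packet passes the T-STD check transmits, and a null packet pads the multiplex when nothing may be sent.

```python
NULL_PACKET = {"null": True}  # padding when no stream may send

def next_packet(streams, tstd_ok):
    """One scheduling step. `streams` is a list of dicts holding a
    'frames_sent' counter and a 'queue' of pending packets; `tstd_ok`
    is a callable standing in for the T-STD conformance check."""
    for s in sorted(streams, key=lambda s: s["frames_sent"]):
        if s["queue"] and tstd_ok(s, s["queue"][0]):
            pkt = s["queue"].pop(0)
            if pkt.get("last_of_frame"):
                s["frames_sent"] += 1  # a whole frame has been delivered
            return pkt
    return NULL_PACKET

streams = [
    {"frames_sent": 2, "queue": [{"pid": 1, "last_of_frame": True}]},
    {"frames_sent": 1, "queue": [{"pid": 2, "last_of_frame": True}]},
]
pkt = next_packet(streams, lambda s, p: True)  # stream with fewest frames wins
```

A real scheduler would also update the last-frame timestamps and buffer states noted above on every send; that bookkeeping is omitted here.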


Figure 5.2:Overview of the packet scheduler algorithm.

Algorithm B.1 shows the packet scheduler of the multiplexer and algorithm B.2 shows how T-STD conformance is checked when adding a packet. Note that the T-STD is simplified by assuming instantaneous arrival of each transport packet and by not allowing the Elementary stream buffer to overflow into the Multiplexing buffer.
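With those two simplifications, the per-packet check reduces to draining a single buffer at the decoder's removal rate and testing whether one more packet fits. A rough sketch under those assumptions (hypothetical structure, not the thesis implementation):

```python
TS_PACKET_BYTES = 188  # MPEG transport stream packet size

def try_add_packet(buf, elapsed_s):
    """Return True and update `buf` if one more TS packet fits. `buf`
    holds the current fill level, the buffer size and the drain rate in
    bytes per second for a single stream; the packet is assumed to
    arrive instantaneously, and overflow is simply rejected."""
    drained = buf["drain_rate"] * elapsed_s
    fill = max(0.0, buf["fill"] - drained) + TS_PACKET_BYTES
    if fill > buf["size"]:
        return False  # would overflow; the scheduler must defer or pad
    buf["fill"] = fill
    return True
```

The full T-STD tracks several buffers per stream (transport, multiplexing, elementary stream) with distinct transfer rates; this sketch collapses them into one to show only the shape of the check.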

The performance of the packet scheduler algorithm scales well overall, but it is still a fairly heavy operation. It does not scale that well with the number of programs, because of the sorting and updating done for every packet, but these are typically no more than 10. Otherwise it is linear in the number of packets sent, which is as good as can be expected. That being said, a fairly large amount of computation has to be done for each packet, and the packet rate is usually very high (almost 20000 packets per second for a 30 Mbit/s stream). A simple way to lower the computational cost is to send a batch of packets between every program update and sort, at the cost of some of the precision in the T-STD model (frames not being sent instantaneously and program priority not being updated for every packet).
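The packet-rate figure quoted above follows directly from the 188-byte MPEG-TS packet size:

```python
TS_PACKET_BYTES = 188  # fixed MPEG transport stream packet size

def packets_per_second(bitrate_bps):
    """Transport stream packets per second at a given total bit rate."""
    return bitrate_bps / (TS_PACKET_BYTES * 8)

rate = packets_per_second(30_000_000)  # ~19 947 packets per second
```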

5.2.3 Distortion control

This part of the algorithm controls the distortion setting for the current frame, making sure that each frame can be sent on time given the total bit rate of the transport stream as well as the maximum bit rates of the video streams.

The algorithm keeps a global distortion target for all streams. Initially this distor-tion target is used to infer the size of each frame currently considered using the rate-distortion model. Starting at the time the last frame was sent, the sending of
