On Computational Complexity of Motion Estimation Algorithms in MPEG-4 Encoder


Muhammad Shahid

This thesis report is presented as a part of degree of Master of Science in Electrical Engineering

Blekinge Institute of Technology, 2010

Supervisor: Tech Lic. Andreas Rossholm, ST-Ericsson


Abstract

Video encoding in mobile equipment is a computationally demanding feature that requires well designed and well developed algorithms. The optimal solution requires trade-offs in the encoding process; motion estimation, for example, trades low complexity against high perceptual quality and efficiency. This thesis works on reducing the complexity of the motion estimation algorithms used for MPEG-4 video encoding, taking the SLIMPEG motion estimation algorithm as reference. Inherent properties of video, such as spatial and temporal correlation, have been exploited to test new motion estimation techniques. Four motion estimation algorithms have been proposed, and their computational complexity and encoding quality have been evaluated. The resulting encoded video quality has been compared against the standard Full Search algorithm, while the reduction in computational complexity of the improved algorithms is compared against SLIMPEG, which is already about 99% more efficient than Full Search in terms of computational complexity. The fourth proposed algorithm, Adaptive SAD Control, offers a mechanism for dynamically choosing the trade-off between computational complexity and encoding quality.


Acknowledgements

It is a matter of great pleasure to express my deepest gratitude to my advisors Dr. Benny Lövström and Andreas Rossholm for all their guidance, support and encouragement throughout my thesis work. It was a great opportunity to do research work at ST-Ericsson under the marvelous supervision of Andreas Rossholm, and the counseling provided by Benny Lövström was of great value in writing up this manuscript.

I am also grateful for the help I received from Fredrik Nillson and Jimmy Rubin of ST-Ericsson in setting up the working environment and getting started with the ST-E algorithm. I owe my successes in life so far to all of my family members, for their magnificent kindness and love!


Contents

Abstract
Acknowledgements

1 Introduction

2 Basics of Digital Video
  2.1 Color Spaces
  2.2 Video Quality
  2.3 Representation of Digital Video
  2.4 Applications
    2.4.1 Internet
    2.4.2 Video Storage
    2.4.3 Television
    2.4.4 Games and Entertainment
    2.4.5 Video Telephony

3 Video Compression Fundamentals
  3.1 CODEC
  3.2 A Video CODEC
  3.3 Video Coding Standards
    3.3.1 MPEG-1
    3.3.2 MPEG-2
    3.3.3 MPEG-4
    3.3.4 MPEG-7
    3.3.5 MPEG-21
    3.3.6 H.261
    3.3.7 H.263
    3.3.8 H.263+
    3.3.9 H.264
  3.4 MPEG-4
  3.5 Syntax

4 Motion Estimation and its Implementation
  4.1 Block Matching
  4.2 Motion Estimation Algorithms
    4.2.1 Full Search
    4.2.2 Three-Step Search
    4.2.3 Diamond Search
    4.2.4 SLIMPEG

5 Rate Distortion Optimization and Bjontegaard Delta PSNR
  5.1 Measurement of Distortion
  5.2 Bjontegaard Delta PSNR

6 Simulation, Results and Discussion
  6.1 SAD as a Comparison Metric
  6.2 Proposed Techniques
    6.2.1 Spatial Correlation Algorithm
    6.2.2 Temporal Correlation Algorithm
    6.2.3 Adaptive SAD Control
  6.3 Simulations with different video sequences
    6.3.1 Football Sequence
    6.3.2 Foreman Sequence
    6.3.3 Claire Sequence

7 Conclusion and Future Work

List of figures
List of tables
Bibliography


Introduction

Since the advent of the first digital video coding standard, issued in 1984 by the International Telecommunication Union (ITU), the technology has seen great progress. The two main standard-setting bodies in this regard are the ITU and the International Organization for Standardization (ISO).

ITU recommendations include standards such as H.261/262/263/264, which focus on applications in the area of telecommunication. The Moving Picture Experts Group (MPEG) of ISO has released standards such as MPEG-1/-2/-4, which focus on applications in the computer and consumer electronics area. The standards defined by these two groups have some parts in common, and some of the work has been performed jointly. The field of video compression develops continuously, with enhancements to previous versions of the standards and the introduction of new recommendations. The MPEG-4 standard is followed in this thesis work.

Video compression is a core requirement in practically any multimedia storage or transmission scenario: the video is encoded before being sent or stored, and decoded at the receiver end or when viewed. Besides the presence of digital video in television and on CD/DVD, cellular phones will probably be the next place where video content is heavily used. The limited storage capacity of mobile equipment dictates the need for efficient video compression tools. Video encoding in mobile equipment has developed from a high-end feature into something that is taken for granted. Nevertheless, it is a computationally demanding feature that requires well designed and well developed algorithms, and many different algorithms need to be evaluated in order to come close to the optimal solution.

As early as 1929, Ray Davis Kell described a form of video compression for which he obtained a patent [1]. Since a video is actually a series of pictures transmitted at some designated rate, Kell's patent gave rise to the idea of transmitting the difference between successive images instead of sending each whole image. However,


it took a long time for the idea to be implemented in practice, but it remains a keystone of many video compression standards today. Connected to this idea is the concept of motion estimation, which tries to exploit the temporal correlation present between video frames.

It predicts the motion found in the current frame using already encoded frames. The residual frame then contains much less energy than the actual frame, so the motion vectors and the residual frame can be encoded at a much lower bit rate than a regular frame.

Motion estimation may require a tremendous amount of computational work inside the video coding process. Several algorithms exist for performing motion estimation. The basic class, Full Search algorithms, gives optimal performance but is computationally very time consuming. To deal with this, many sub-optimal fast search algorithms have been designed; this thesis focuses on some of them in an attempt to improve the performance of one. The SLIMPEG motion estimation algorithm is taken as the reference, and inherent video properties such as spatial and temporal correlation have been exploited to devise less complex yet well performing motion estimation algorithms.

The rest of the report is organized as follows: Chapter 2 and Chapter 3 deal with the fundamentals of digital video and video compression, respectively. Implementation aspects of motion estimation are explored in Chapter 4, ending with an introduction of the SLIMPEG motion estimation algorithm.

Rate distortion and delta PSNR are the contents of Chapter 5. The results of the main contribution are provided and discussed in Chapter 6. Chapter 7 contains the conclusion and some hints about future work in the field.


Basics of Digital Video

A video image is obtained by capturing a 2D plane view of a 3D scene.

Digital video, then, is a sequence of spatially and temporally sampled frames. The spatio-temporal sampling unit, usually called a pixel (picture element), is represented by a digital value describing its color and brightness. The more sampling points used to form the video frame, the higher the visual quality usually is, but at the cost of higher storage capacity.

The video frame is usually rectangular. The smoothness of a video is determined by the rate at which its frames are presented in succession; a video with a frame rate of thirty frames per second looks fairly smooth for most purposes. A general comparison of the appearance of a video as determined by its frame rate is given in Table 2.1 [2].

Table 2.1: Video frame rates.[2]

Frame rate                   Appearance
Below 10 frames per second   'Jerky', unnatural appearance to movement
10-20 frames per second      Slow movement appears OK; rapid movement is clearly jerky
20-30 frames per second      Movement is reasonably smooth
50-60 frames per second      Movement is very smooth

2.1 Color Spaces

The pixel may be represented by just one number (grey-scale image) or by multiple numbers (colored image). A particular scheme used for representing colors is called a color space. Two of the most common schemes are RGB (red/green/blue) and YCrCb (luminance/red chrominance/blue chrominance).

In the RGB color space, each pixel is represented by three numbers indicating the relative proportions of the three colors. Each of the numbers usually consists of eight bits, so one pixel requires twenty-four bits for its complete representation. Psycho-visual experiments have shown that the human optical system is less sensitive to color than to luminance.

This fact is exploited in the YCrCb color space, where luminance is concentrated in the Y component and the color information is contained in the remaining components. The two color spaces are related by a transformation, so one representation can be converted into the other; for details, see [2].
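The conversion between the two color spaces can be illustrated with a small routine. This is a minimal sketch assuming the commonly used ITU-R BT.601 full-range coefficients; the exact constants depend on the standard and signal range, and the function name is purely illustrative:

```python
# RGB -> YCbCr sketch using ITU-R BT.601 full-range coefficients
# (an assumption; broadcast-range variants use different scaling).

def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, Cb, Cr)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# A grey pixel carries no colour information: Cb and Cr sit at the
# mid-point 128, so all the signal energy ends up in the Y component.
y, cb, cr = rgb_to_ycbcr(200, 200, 200)
```

Concentrating the energy in Y is precisely what makes chrominance subsampling (as in 4:2:0) possible without much visible loss.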

2.2 Video Quality

Video quality is an important parameter and, being judged by humans, inherently a subjective issue. There are many objective criteria for measuring video quality, e.g. PSNR, whose results correlate to some degree with human experience. However, they may not satisfy the demands of the subjective experience of a human observer: experiments show that a picture with a lower PSNR may look visually better than one with a higher PSNR. Human visual experience also varies from person to person, which raises the need for alternatives that cover both objective and subjective tests. An objective test that matches the human visual experience well will give acceptable results.
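As an illustration of the objective metric mentioned above, PSNR is derived from the mean squared error between two frames. A minimal sketch, with frames represented as flat lists of 8-bit samples (the function name is illustrative):

```python
import math

def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equally sized 8-bit frames (flat lists)."""
    mse = sum((o - d) ** 2 for o, d in zip(original, distorted)) / len(original)
    if mse == 0:
        return float("inf")   # identical frames: distortion-free
    return 10.0 * math.log10(peak * peak / mse)

# Toy 2x2 "frames": every pixel off by 1 gives MSE = 1,
# i.e. PSNR = 10 * log10(255^2) ~ 48.13 dB.
value = psnr([10, 20, 30, 40], [11, 21, 31, 41])
```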

2.3 Representation of Digital Video

Before the video is ready for coding, it is often converted to one of the intermediate formats. The most central of these is the Common Intermediate Format, CIF, with a frame resolution of 352 x 288 pixels. Table 2.2 gives information about some standard intermediate formats.

2.4 Applications

There has been exponential growth in the applications of digital video, and the technology continues to evolve rapidly. Some examples of


Table 2.2: Intermediate formats. [2]

Format               Luminance resolution (horz. x vert.)
Sub-QCIF             128 x 96
Quarter CIF (QCIF)   176 x 144
CIF                  352 x 288
4CIF                 704 x 576

widely used digital video applications are given in the following subsections.

2.4.1 Internet

It can safely be said that the internet era holds most of today's digital video applications, ranging from small video clips to full-length movies, and from a casual video chat to a corporate video conference. Remote teaching and learning, video telephony and video sharing have all been made possible by digital video. The video broadcasting service YouTube presents billions of videos to viewers worldwide using the benefits of digital video technology.

2.4.2 Video Storage

Digital video has reshaped the way videos are stored. CD/DVD-ROM and Blu-ray Disc have almost wiped out the classic film tape storage media. These new storage discs come with huge advantages in capacity, portability and durability. The latest of them, the Blu-ray Disc, stores 25 GB per layer, with dual-layer discs holding 50 GB and multi-layer BDXL discs up to 100 GB [3].

2.4.3 Television

Satellite television channels across the planet create a global village by virtue of digital video. There are literally thousands of television channels operating in various areas of the world, and the number is still increasing. News, current affairs shows and popular drama serials gather huge numbers of viewers.


2.4.4 Games and Entertainment

Video games and movies have gained enormous popularity, and these too are applications of digital video. Nowadays we see an increasing trend in the popularity of 3D animated movies, a big success for digital video; take the example of 'Avatar', a blockbuster 3D film that is among the most popular movies of the current era.

2.4.5 Video Telephony

Digital video has made it possible to get video along with voice when communicating by telephone. At both government and private levels, video conferencing is replacing the need to travel far to attend meetings in one place. Skype is probably the brand leader in this field.


Video Compression Fundamentals

The size of an ordinary digitized video signal is far greater than usual storage capacities and transmission media bandwidths. This fact shows the need for systems capable of compressing the video.

For example, one channel of ITU-R 601 television (at 30 fps) requires a bit rate of 216 Mbps for broadcasting in uncompressed form; a 4.7 GB DVD can store only 87 seconds of uncompressed video at this bit rate. There is thus a clear need for mechanisms that make the data fit media of limited capacity. Hence compression, albeit with the drawback of some loss in visual quality: an effective compression system is, in general, lossy in nature.

3.1 CODEC

The term CODEC denotes a combined system capable of encoding (compressing) and decoding (decompressing). A typical codec is shown in Figure 3.1. The encoder compresses the original signal, a process called source coding; after further signal processing the signal reaches the source decoder, where it is decompressed.

According to information theory, there is statistical redundancy in an ordinary data signal. This principle is utilized in Huffman coding, and this kind of CODEC is known as an entropy CODEC. However, entropy encoders alone do not perform well on images and video; source models need to be deployed before entropy coding can be applied to such data.

Figure 3.1: Source coder, channel coder, channel [2].

Certain properties of video are taken into consideration to benefit the source models. These properties include the spatial and temporal redundancy present among pixels in video frames. Moreover, psycho-visual experiments have shown that the human visual system is more sensitive to lower frequencies, so in the video encoding process some high frequencies can safely be ignored. Codecs are often designed to emphasize certain aspects of the media, or their use, to be encoded. For example, a digital video (using a DV codec) of a sports event, such as baseball or soccer, needs to encode motion well but not necessarily exact colors, while a video of an art exhibit needs to encode color and surface texture well. Pertaining to video quality, there are two kinds of codecs.

In order to achieve a good level of compression, most codecs degrade the original quality of the signal; these are known as lossy codecs. There are also codecs which preserve the original quality of the signal, known as lossless codecs [10]. Some examples of coding techniques follow. In Differential Pulse Code Modulation (DPCM), each pixel is predicted from already transmitted pixels, and the prediction error, the difference between the prediction and the actual pixel, is transmitted. Transform coding changes the domain of the frame signal; this change makes it possible to round off insignificant coefficients, achieving lossy compression, and transform coding finds a great deal of application in various video compression techniques. Another technique is motion compensated predictive coding, which is the emphasis of this thesis. In a similar way to DPCM, a model of an actual frame of a video is obtained by prediction from an already encoded frame; this model is then subtracted from the original frame to obtain a residual frame, which contains much less energy than the original frame [2].
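The DPCM idea described above can be sketched in a few lines, using the simplest possible predictor (the previous sample); a real codec would use more elaborate prediction and would entropy code the residuals:

```python
def dpcm_encode(samples):
    """Encode each sample as its difference from the previous one
    (previous-sample prediction; the first sample is sent as-is)."""
    prev = 0
    residuals = []
    for s in samples:
        residuals.append(s - prev)
        prev = s
    return residuals

def dpcm_decode(residuals):
    """Invert the encoding by accumulating the residuals."""
    prev = 0
    samples = []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples

pixels = [100, 102, 104, 104, 101]
residuals = dpcm_encode(pixels)
```

Note how the residuals after the first one are small numbers clustered around zero, which is exactly what makes them cheap to entropy code.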

3.2 A Video CODEC

Video signals consist of a sequence of still images, better known as video frames.

Figure 3.2: Video CODEC with prediction [2].

These frames can be encoded using intra-frame coding techniques, but the resulting compression is not good enough for video. This fact, together with the temporal redundancy present in a video sequence, drives the need for inter-frame encoding. A prediction of the actual video frame, based on the previous frame, is subtracted from the actual frame to form what is called the residual frame. The residual frame is then encoded by the frame codec. A block diagram of such a video coder is shown in Figure 3.2. Encoding the residual frame includes a transformation; the transform coefficients are quantized and then entropy coded for transmission or storage. At the decoder end, the reverse of these steps is applied to recover the data [2].
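The prediction/subtraction step described above can be illustrated on a toy block; the pixel values are made up purely for illustration:

```python
def energy(block):
    """Sum of squared sample values of a (flat) block."""
    return sum(p * p for p in block)

current    = [12, 14, 13, 15]
prediction = [11, 14, 12, 15]   # model built from the previous frame
residual   = [c - p for c, p in zip(current, prediction)]

# The residual carries far less energy than the frame itself,
# which is why it compresses so much better after the transform.
```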

3.3 Video Coding Standards

Most video codecs currently in use follow one of the two mainstream families of video coding standards, from the International Organization for Standardization (ISO) and the International Telecommunication Union (ITU). ISO has introduced the JPEG and MPEG-x series for images and video respectively; similarly, ITU has introduced its standards in the H.26x series. A brief description of these standards follows, with MPEG-4 described in more detail.

ISO has covered applications related to storage and distribution through its standards. The Moving Picture Experts Group (MPEG) has developed the recommendations for video, and its standards include the following [2][3].


3.3.1 MPEG-1

Under this standard, video and audio data can be compressed and played back in real time from CD-ROM (at a bit rate of 1.4 Mbps). VHS-quality digital video is compressed at a ratio of about 26:1.

3.3.2 MPEG-2

The bit rate has been increased from the previous standard to 3-5 Mbps for compression, storage and transmission of video and audio data. Additionally, support for interlaced video has been added.

3.3.3 MPEG-4

Released in late 1998, it provides additional features beyond those of the previous standards. It supports a huge range of bit rates and is discussed in detail at the end of this section.

3.3.4 MPEG-7

It is a multimedia content description standard, providing support for describing multimedia content data with the aim of a standardized system for content-based indexing and retrieval of multimedia information. It is meant for accessing multimedia data rather than for coding and compression. MPEG-7 is formally known as the Multimedia Content Description Interface.

3.3.5 MPEG-21

It is usually referred to as the Multimedia Framework and defines an open framework for multimedia applications. The Rights Expression Language defined by MPEG-21 standardizes the sharing of digital rights for digital content from its source to the consumer end. The standard promotes integration and interoperation between the various technologies of the multimedia field.

ITU has focused on applications related to real-time, two-way video communications. Its standardization working body is called the Video Coding Experts Group (VCEG), and it has produced the following standards.


3.3.6 H.261

It was primarily introduced for video telephony over ISDN lines, where channel capacity is a multiple of 64 kbps. It supports two video sizes, CIF and QCIF.

3.3.7 H.263

It offers videoconferencing at a variety of bit rates ranging from a few kbps to many Mbps, and is quite popular in internet applications.

3.3.8 H.263+

It is the second version of H.263 and adds enhancements to the original standard, including better encoding and a level of immunity to transmission errors. A later version, H.263++, added further annexes with more functionality.

3.3.9 H.264

Also known as MPEG-4 Part 10 or Advanced Video Coding (AVC), its first set of recommendations came in 2003. It is used in applications such as Blu-ray Disc, YouTube videos and television services. H.264 was developed by the Joint Video Team (JVT), a collaborative working group of ITU and ISO.

3.4 MPEG-4

This standard was developed to enhance the functionality of the already existing MPEG video coding standards. One of the added features is efficient compression for applications involving transmission media with low bit rates. A whole new concept of video scenes and video objects has been introduced, which codes video based on its contents instead of treating everything uniformly as rectangular frames.

The MPEG-4 standard is progressive in the sense that it has the capacity to absorb new tools and enhancements. The tools MPEG-4 offers for encoding are organized in various subsets; such subsets are called profiles, and a specific profile addresses a specific application.

One example is the Simple Profile, which aims at applications requiring low bit rate and low resolution. Another is the Advanced Simple Profile, which


has features such as support for bidirectionally predicted frames and quarter-pixel motion compensation. Some salient functionalities of MPEG-4 video frame encoding are described in the following [2].

• Video core: the core of the standard uses coding algorithms designed for very low bit rates.

• Input format: video data is pre-processed, and sometimes converted to one of the picture sizes listed in Table 2.2, at a frame rate of up to 30 frames per second and in 4:2:0 (Y:Cr:Cb) format, before the codec is applied.

• Picture type: frames are encoded as I frames (intra coded), P frames (predictively coded) or B frames (bidirectionally predicted). For video encoding, a frame is usually divided into small sections of a certain size called macroblocks. I frames contain strictly intra coded macroblocks, while P frames may contain either inter or intra coded macroblocks.

• Motion estimation: normally performed on macroblocks of size 16 x 16, with optional block sizes of 8 x 8, 4 x 4, 4 x 8, 8 x 4, 8 x 16 and 16 x 8 depending on the profile in operation. The motion vectors (coordinate pairs representing relative motion) may have sub-pixel resolution.

• Transform coding: the residual frame obtained from the motion estimation process is coded using the discrete cosine transform (DCT). The resulting coefficients are quantized and arranged in zig-zag order, and finally run-level coding is applied.
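The zig-zag arrangement mentioned in the last bullet can be sketched as an index generator: coefficients are visited diagonal by diagonal, alternating direction, so low-frequency coefficients come first. This is a sketch, not the normative scan table of any particular standard:

```python
def zigzag_order(n=8):
    """Return the zig-zag scan order of an n x n coefficient block
    as (row, col) pairs, low frequencies first."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],                       # which diagonal
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

# Scan order of a 4x4 block; the DC coefficient (0, 0) comes first.
order = zigzag_order(4)
```

After quantization, most high-frequency coefficients are zero, so this ordering produces long runs of zeros at the end of the scan, which run-level coding exploits.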

3.5 Syntax

The main features of the syntax of the MPEG-4 coded bit stream are described below.

• Picture layer: the top layer of the syntax contains a complete coded picture. The picture header contains values describing the picture resolution, the type of coded picture (inter or intra) and a temporal reference field.

• Group of blocks layer: a complete row of macroblocks forms a group of blocks (GOB) in QCIF, CIF and SCIF frames; this layer helps the decoder resynchronize if errors cause a loss of synchronization.

• Macroblock layer: four luminance and two chrominance blocks form one macroblock. The header contains information about the macroblock type and, for inter coded macroblocks, the motion vectors.


Motion Estimation and its Implementation

Motion estimation is a key component in video compression, video processing and computer vision. Knowledge of the motion in a video helps eradicate temporal redundancy among successive frames, and consequently a high compression ratio is achieved. This makes motion estimation an essential module in the video coding standards.

Contrary to the older standards, MPEG-4 introduces a region-based motion model which is more flexible and more efficient. Consider a video frame from a set of video frames and call it the current frame. Frames from the video which have already been encoded can be used to predict the contents of the current frame; such frames are called reference frames, and the prediction is called motion prediction. The temporal order of a reference frame can be earlier or later than the current frame, giving forward prediction or backward prediction respectively.

Forward and backward prediction can also be combined, in which case the prediction is called bidirectional. The process can be understood from the pictorial explanation in Figure 4.1 [11] and the block diagram in Figure 4.2 [2]. The target of a motion estimation algorithm is to model the current frame from the reference frame with maximum accuracy and minimum computational effort.

As shown in Figure 4.2, the Motion Estimation block creates such a model by altering a reference frame. The Motion Compensation block creates a residual frame by subtracting the model of the current frame from the original current frame. This residual frame is then transform and entropy coded and sent for transmission along with the motion vector information. Another interesting step taken here is the decoding of this encoded frame, so as to reproduce the current frame to be


Figure 4.1: Motion Estimation and Compensation [11].

Figure 4.2: Motion Estimation and Compensation [2].


Figure 4.3: Block Matching [2].

used as a reference frame in the further encoding process.

The degree of compression can be measured from the size of the coded residual frame, also called the displaced frame difference (DFD), plus the overhead information for the motion vectors. The size of the coded residual frame is proportional to the energy remaining in the DFD after the motion compensation process, and this energy can be reduced using motion estimation and compensation to achieve higher compression efficiency [2].

4.1 Block Matching

To carry out motion estimation and compensation, a video frame is treated as composed of non-overlapping blocks of a certain size, e.g. 16 x 16 pixels; other standard sizes are used in different video coding standards. Such blocks are formally known as macroblocks, and motion estimation applied to them is known as block matching.

Block matching is performed on luminance samples (e.g. on Y blocks in MPEG-4 encoding). A macroblock of the current frame is compared with macroblocks of the reference frame, aiming to minimize the energy difference between them. The search area in the reference frame is centered around the position of the macroblock under consideration, exploiting the temporal redundancy and avoiding a search of the whole reference frame.

The block matching process is depicted in Figure 4.3.

In this figure, a 3 x 3 current block is searched for a match around the corresponding position in the reference frame, with the search region kept one pixel wider than the block. There are various search criteria for estimating the optimum matching point; examples include the Sum of Absolute Differences (SAD), Mean Square Error (MSE) and Mean Absolute Error (MAE). SAD is calculated as:

SAD = \sum_{i=1}^{N} \sum_{j=1}^{N} |C_{ij} - R_{ij}|                (4.1)


where i, j are pixel positions and C, R represent the current and reference frames respectively. SAD is usually chosen because of its simple calculation procedure. The SAD between the current block and its co-located (0, 0) block in the reference frame is taken as the search metric and is given in equation (4.2):

|1-4| + |3-2| + |2-3| + |6-4| + |4-2| + |3-2| + |5-4| + |4-3| + |3-3| = 12                (4.2)

The SAD is calculated for the other positions as well. The best matching block is centered at the position returning the minimum SAD value, in this case (-1, 1), with a SAD value of 2. The same procedure is repeated for the rest of the blocks in the current frame to complete its motion estimation. The video encoding process can thus be itemized as follows [2].

• The energy difference is calculated between a block in the current frame and candidate blocks inside a search window of a certain size, positioned around the center of the corresponding position in the reference frame.

• The matching region picked is the one giving the least value of the search metric.

• The procedure is repeated to obtain an estimation model of the whole current frame.

• A residual frame is then obtained by subtracting the resultant model of the current frame from the original current frame.

• The residual frame, along with its motion vector information, is then encoded and sent for transmission or storage.
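The SAD metric of equation (4.1) can be sketched directly; the 3 x 3 blocks below are illustrative values chosen to reproduce the absolute differences of the worked example in equation (4.2):

```python
def sad(current, reference):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(c - r)
               for row_c, row_r in zip(current, reference)
               for c, r in zip(row_c, row_r))

# Illustrative blocks whose element-wise absolute differences are
# 3+1+1+2+2+1+1+1+0 = 12, as in equation (4.2).
current_block   = [[1, 3, 2], [6, 4, 3], [5, 4, 3]]
reference_block = [[4, 2, 3], [4, 2, 2], [4, 3, 3]]
metric = sad(current_block, reference_block)
```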

4.2 Motion Estimation Algorithms

For the best match, the current block would have to be searched for in the whole reference frame, but this approach is computationally very time consuming. It has been observed that a potential match for a block in the current frame is often found near its position in the reference frame. So the search area is limited to a region smaller than the whole frame, known as the search window. The optimal search window size is steered by a trade-off between two important factors, performance and complexity: large search windows usually perform better, but at the cost of a larger number of comparisons, and hence require more computational resources.

There are various methods for finding the matching region in the reference frame(s), broadly divided into two classes.

Figure 4.4: Full Search Methods [2].

The basic class is called Full Search and is the optimal solution for motion estimation in the block matching domain. The sub-optimal class is called Fast Search, and such methods are, as the name suggests, fast. Starting with Full Search, some of them are described in the following.

4.2.1 Full Search

As said earlier, this is the optimal way of finding motion vectors. The method searches the whole search window, at each of its search points (pixels), for the best match according to the search criterion, e.g. SAD. Two search orders are used in practice.

These are depicted in Figure 4.4. Raster order starts in one corner and proceeds all the way to the opposite corner. Spiral order starts at the middle of the search window and moves outwards, successively scanning the pixels on its way. The latter method is efficient in that it can be stopped early if some minimum value of the search criterion is reached before the whole search window has been scanned [2].

Full Search is brute force by nature. It gives the best results in terms of accuracy, and hence the best matches, but involves a massive amount of computation. This computational complexity restricts the usability of the Full Search method in general and in real-time CODECs in particular.

Coming next are some algorithms which try to reduce the number of com- parisons.


Figure 4.5: Three Step Search [2].

4.2.2 Three-Step Search

This algorithm is one of the earliest fast search methods. As shown in figure 4.5, it starts by placing eight search points at a distance (called the step size) of half the search window length from the (0,0) position. The point giving the least value of the search criterion (SAD) is then chosen as the center for the next round of searching, with half the step size of the previous round. This continues until a step size of one is reached, and the search point with the minimum SAD value is selected as the best match.

The considerable reduction in search points compared to Full Search is obvious from the mentioned figure.

4.2.3 Diamond Search

Contrary to the previous algorithm, the search shape is a diamond, and the number of steps taken by the algorithm to converge is not limited. On the basis of the patterns the algorithm can adopt, it is divided into the Large Diamond Search Pattern (LDSP) and the Small Diamond Search Pattern (SDSP), as depicted in figure 4.6. The algorithm starts with LDSP, and if the minimum SAD value is found at the center, it jumps to step four as given in figure 4.6. The remaining steps are like LDSP except the last one. The number of points checked for SAD calculation can again be 3 or 5, as indicated by the mentioned figure. The last step performs SDSP around the new search origin, and the point with the least SAD value is declared the best match. As the search pattern is neither too small nor too big, and since there is no limit on the number of steps, this algorithm can find the global minimum quite accurately. The performance is very close to Full Search, but the computational complexity is much lower [9].

Figure 4.6: Diamond Search [7].

4.2.4 SLIMPEG

This is the standard reference algorithm for motion estimation used in this thesis work. It follows a recursive/predictive methodology, and motion estimation is done in two steps: first, a rough estimate of the motion fields is obtained, and a refinement is then performed to yield a fine-tuned motion estimate.

The predictive phase is performed by utilizing the motion vectors of a predefined set of neighboring macro blocks to see which of them are closer matches to the current macro block. The predefined set comes from the same frame (spatial neighbors) and from the previous frame (temporal neighbors). The idea is to exploit the natural correlation found amongst neighboring macro blocks. This approach provides an initial estimate to start with, instead of starting from zero every time as happens in Full Search or in some fast search algorithms like Three-Step Search.

Figure 4.7: Coarse and Refinement Phases [6].

In the next phase, a refinement of the earlier coarse estimate is performed.

It involves searching for the best match within a small distance of the initially determined motion fields. The distance here is dictated by the motion content of the video under consideration. This procedure is then repeated for the rest of the macro blocks [5][6]. These phases are depicted in figure 4.7.

A significant advantage of this algorithm over Full Search is that the use of previously calculated motion vectors provides a natural alignment towards the actual motion. This alignment improves the performance of the encoding procedure. It has been observed that Full Search sometimes gets stuck in a local minimum because of its full dependence on the error measure (SAD) value. This algorithm, however, tracks the actual motion very well.

As the number of predictors tried for a macro block is fixed a priori, there is freedom in choosing the optimal search window size. Moreover, the search window size can also be altered dynamically. Such experiments reveal a huge reduction in computational complexity compared to the Full Search algorithm.


Rate Distortion Optimization and

Bjontegaard Delta PSNR

Rate distortion efficiency has been a standard measure of video encoder performance. It is a quite sophisticated process, and a concise description is presented here. Lagrangian optimization techniques are applied to address the decisions found in common hybrid video coders. In such coders, decisions have to be made between various methods of encoding involving different parameter settings. Intra coding stands for the situation where no motion estimation is performed. When a block from the current frame exactly matches a block in the reference frame, a SKIP flag is sent to the decoder, telling it that the block is just a replica of its counterpart in the previous frame. Starting with a description of why rate distortion optimization is required, the optimization process is presented below [7].

Given the background of motion estimation and compensation in the previous chapters, a hybrid video codec is one that has both a motion handling structure and frame coding. In a typical hybrid video coder, the following questions are to be addressed:

• Segmentation of video images to form areas.

• Whether INTRA coding or not.

• Description of INTRA data.

• The various steps inside motion estimation, if INTER coding is used.

The compression efficiency of hybrid video codecs comes with some additional modes, which are selected during online operation for different sections of the encoded image. These modes of operation are related to signal-dependent rate distortion properties, and rate-distortion trade offs are necessarily present in the design phase of such aspects.

5.1 Measurement of Distortion

It is a natural requirement of rate distortion optimization that some measure of distortion be available. On the other hand, distortion is not easy to measure, as the perception of the human visual system has not been well translated into a physical quantity. The distortion measures in common use are: sum of squared differences (SSD), mean square error (MSE) and sum of absolute differences (SAD). Peak signal to noise ratio (PSNR), which is a normalized representation covering the whole range of pixel values, is calculated as:

PSNR = 10 log10[(2^n - 1)^2 / MSE] dB    (5.1)

where n is the pixel bit depth.

So, the rate distortion optimized coding options improve the performance of a video coder. The goal of an encoder is to optimize its overall fidelity:

Minimize the distortion D, subject to a constraint r on the number of bits used, R.

This statement can be formulated as :

min{D} (5.2)

subject to R < r.

This problem can be solved by applying Lagrangian optimization, which minimizes J, where

J = D + λR (5.3)

The Lagrangian rate-distortion cost is minimized for some suitable value of the Lagrangian multiplier.

A solution of equation 5.3 for one value of the Lagrangian multiplier provides an optimal solution to equation 5.2 for a certain value of r. The aforementioned optimization method has proven simple and effective for evaluating a number of coding options.

5.2 Bjontegaard Delta PSNR

It is required to calculate the average PSNR difference between two rate distortion (RD) curves. Gisle Bjontegaard proposed a method [8] for this, and the method was accepted by the Video Coding Experts Group (VCEG). Given are two simulation conditions with four Quantization Parameter (QP) values each. The input data set thus contains four PSNR values and the corresponding data rates for each of the two simulation conditions. The baseline encoder is called the anchor and the other the test.

An interpolation curve through four data values of a normal RD curve is obtained as:

SNR = a + b*bit + c*bit^2 + d*bit^3    (5.4)

where a, b, c and d are determined such that the curve passes through all 4 data points, and 'bit' means the bit rate expressed on a logarithmic scale. In the same way we can interpolate to find the bit rate as a function of SNR:

Bitrate = a + b*SNR + c*SNR^2 + d*SNR^3    (5.5)

In this way we can find both:

• Average PSNR difference in dB over the whole range of bitrates.

• Average bitrate difference in percentage over the whole range of PSNR.

The calculation process is explained below and also depicted in figure 5.1.

Let B1, B2, B3 and B4 be the bit rates, and P1, P2, P3 and P4 their corresponding PSNR values, at one simulation condition. Using equation 5.4, the RD curve equations are given by:

P1 = a + b*B1 + c*B1^2 + d*B1^3    (5.6)

P2 = a + b*B2 + c*B2^2 + d*B2^3    (5.7)

P3 = a + b*B3 + c*B3^2 + d*B3^3    (5.8)

P4 = a + b*B4 + c*B4^2 + d*B4^3    (5.9)

Figure 5.1: BD PSNR [8]

These equations are put in matrix form here:

    | P1 |   | 1  B1  B1^2  B1^3 |   | a |
    | P2 | = | 1  B2  B2^2  B2^3 | * | b |
    | P3 |   | 1  B3  B3^2  B3^3 |   | c |
    | P4 |   | 1  B4  B4^2  B4^3 |   | d |    (5.10)

In compact matrix notation, it is written as

b = A*x    (5.11)

where the column vector b represents the PSNR values on the left-hand side, A the matrix on the right-hand side, and x the column vector of unknown coefficients.

This system of equations is solved by using the expression:

x = inv(A)*b    (5.12)

BD PSNR is found by taking the difference of the values obtained after integrating both of the RD curves given by equation 5.4 over the interval from the lowest bit rate value to the highest bit rate value.

Figure 5.2: BD PSNR

In order to quantify the accuracy of BD measurements, a metric of reliability was introduced [7]. The value of reliability is calculated differently for BD PSNR and BD rate calculations; both are presented next.

As given above, BD PSNR measurements cover the overlapping domain of bit rates of the test and anchor RD curves, and it is observed that the accuracy of BD PSNR depends on the amount of this overlap. The situation is illustrated in figure 5.2. Let Full Search be the anchor and one of our contributions the test curve here. The greater the overlap between the two curves in bit rate or PSNR, the better the accuracy that can be guaranteed for the BD results.

In figure 5.2, the difference between 'oh' and 'ol' is the region of overlap in bit rates, and the whole range of PSNR values, from lowest to highest, is taken into account to get the BD PSNR value. BD rate measurements are performed on similar grounds: the overlapping PSNR area decides the range of bit rate values used to evaluate the BD rate.

Finally, a value of reliability is computed by the following formula:

r = (oh - ol) / max(ah - al, th - tl)    (5.13)

where ah, al are the highest and lowest bit rates of the anchor curve, th, tl those of the test curve, and oh, ol bound the overlap. A reliability metric value approaching 1 is evidence of an acceptable test, while lower values indicate the need for a new simulation with a new set of QP values that could bring better overlap.


Simulation, Results and Discussion

Given the fact that a video is created by frames (pictures) presented in succession at a certain rate, there exists a variable amount of redundancy of information amongst these successive frames, depending upon the video content. Technically, we can say that there exists a correlation between the frames of a video, called temporal correlation. This fact gives a clue to use information from earlier coded frames as an estimate of the upcoming frames to be encoded. Moreover, the dense sampling points within a frame give a fair amount of correlation amongst neighboring blocks, known as spatial correlation. Inspired by the presence of these phenomena in video, various methods based on spatial and temporal correlation have been applied to devise new techniques for motion estimation. The resulting video sequences have been compared against video sequences encoded with the Full Search motion estimation algorithm. The testing measures were delta PSNR and delta bit rates, which were introduced in the previous chapter. The subjective quality was observed with Ericsson's proprietary VIPS viewer. Moreover, the reduction in complexity offered by the new techniques has been compared against the complexity of the SLIMPEG motion estimation algorithm. The software platform used for simulations was an MPEG-4 simple profile encoder. All of the work was performed in the C language in a Linux environment.

The system used was an Intel Core 2 Duo desktop computer.

The algorithms were tested on the first 100 frames of the Football Sequence (high motion content), the Foreman Sequence (moderate motion content) and the Claire Sequence (low motion content), in the luminance domain. The videos were of QCIF size, at 15 fps, in 4:2:0 YUV format.

The values of quantization parameter (QP) used are mentioned wherever required along the text.


6.1 SAD as a Comparison Metric

As this research targets mobile multimedia applications, processing time consumption should always be kept under consideration. Following the SLIMPEG implementation given in [5], after the whole video encoder has been optimized, in this case for an ARM Cortex-A9 with NEON support (a co-processor that performs vectorization), the load is approximately distributed as given in table 6.1. We observe that SAD computation accounts for a large share of the Motion Estimation task, more than half of its workload. Hence, we chose SAD computation complexity as the metric of comparison between motion estimation algorithms.

Table 6.1: Computational workload distribution for SLIMPEG

Task                  Workload   Remarks
Motion Estimation       50 %
  SAD                   30 %
  Pre work              15 %     Create predictors
  Other                  5 %     Limit MVs inside search window
Motion Compensation      6 %
Write                   12 %
Encode                  30 %     Transforms, Quantization, ...
Other                    2 %

6.2 Proposed Techniques

During the work on this thesis, some algorithms were formulated that exploit the inherent properties of video, and they have been tested against the SLIMPEG motion estimation algorithm for reduction in complexity. The complexity metric taken here is the number of times an algorithm has to perform SAD calculations in the search for similarity between the current frame and the reference frame. The coded video quality is estimated by its PSNR and BD PSNR measures. The introduced algorithms offer a good level of reduction in SAD computational complexity. It is common practice to compare the quality of the resulting video against video encoded with the Full Search motion estimation algorithm. A brief description of the contributions made in this thesis comes next, followed by a comparison of results on the mentioned video sequences.

Before we introduce our techniques, it is worth mentioning that SLIMPEG has the possibility to INTRA code a macro block when a set threshold decides that its SAD value is poor. Our temporal and spatial correlation algorithms are not provided with this facility, so that exactly what they estimated can be seen; the motion estimation done by them is kept unchanged. This decision may result in lower quality, but it was adopted to see the exact results of our algorithms.

In the graphs to follow, 'temporal' and 'spatial' mean the temporal correlation and spatial correlation algorithms, respectively.

6.2.1 Spatial Correlation Algorithm

It has been observed that there is a high degree of relation amongst neighboring blocks inside a frame of a video. This relation amongst the blocks inside a video frame builds up a correlation called spatial correlation.

This fact leads to the idea of using some of the motion vectors of earlier blocks in a sequence for some of the successive blocks in the same frame. The idea can be implemented in a number of ways. We introduce some techniques here which considerably reduce the computational complexity of the motion estimation process, as explained in the following text.

Consider a frame from a given video, divided into small grids called macro blocks. It is usual to do motion estimation of these macro blocks individually, by comparing them against a region defined by a given window size in the reference frame. The first row and first column of the frame are coded with the SLIMPEG estimation algorithm. Then an average of motion vectors is taken for the motion estimation of the remaining macro blocks in the same frame. The macro blocks chosen for the average for a particular block are: 1) the preceding macro block in the same row; 2) the macro block exactly above the current block in the previous row. As the prediction is done with some error, this error may grow along the way when predicting frames from their previous frames. This issue is tackled by adding an INTRA coded frame after a certain number of INTER coded frames, and that is also done here. This algorithm has one built-in inability: it cannot deal with motion that does not occur in the area of the macro blocks encoded by SLIMPEG, and hence its performance depends highly on the motion content of the video being encoded.

6.2.2 Temporal Correlation Algorithm

The objects in a video look smooth in motion due to the high temporal sampling. Normally, most parts of an object inside a video undergo the same motion, and hence an equal amount of motion estimation may be applied over the whole object. The motion amongst successive frames thus has some sort of relation, a temporal correlation. Observing the successive images of a video reveals that, depending on the video content, many parts/blocks of one image can be found exactly the same in a later image. Also, the movement found amongst the images is generally related, e.g. all parts of a moving car move in the same direction and with the same speed. These observations give rise to the concept of temporal correlation, stating that the images of a video are correlated in the time domain. This concept can be implemented in different ways; the method adopted here mainly consists of reusing the motion vector information of already encoded macro blocks in temporal order. Moreover, after encoding a fixed number of frames, one frame is encoded fully with SLIMPEG.

Figure 6.1: SAD Control Algorithm [2 successive frames; two motion estimation algorithms are switched between black and grey macro blocks]

6.2.3 Adaptive SAD Control

Our approach here is to minimize the error measure along successive frames while encoding a video. There are two alternatives to implement this technique. One is the SAD Control Algorithm, which is static. This algorithm offers a huge reduction in complexity (SAD calculations), though with a trade off in the quality of the resulting video. It is depicted in figure 6.1. The black and gray colors represent two different motion estimation algorithms. As seen in figure 6.1, the selection of a particular algorithm for a certain block position alternates between consecutive frames. One of them could be SLIMPEG, and the other some reduced complexity algorithm. The error incurred due to reduced motion estimation for a macro block in one frame is then taken care of in the next frame.


Figure 6.2: Adaptive SAD control

We enhance the above algorithm to deal adaptively with the prediction error, forming an algorithm called Adaptive SAD Control. This algorithm dynamically chooses one of two available methods of motion estimation: SLIMPEG, and one which offers reduced complexity in SAD computation. Contrary to SAD Control, it has no fixed pattern of selection of the ME algorithm; instead, it adapts online with respect to the SAD measure values. The decision to choose a particular motion estimation algorithm is made by a 'threshold' value.

The algorithm works as given in the following and also depicted in figure 6.2.

• The first frame of the video sequence under consideration is encoded using intra coding.

• Then a certain number of frames are encoded using predictive coding. The second frame is encoded using the SLIMPEG motion estimation algorithm.

• Starting from the third frame, the following scheme is implemented, with a SAD threshold value of n%: 1) sort the SAD values for all the macroblocks of the previous frame in ascending order; 2) find the index i of the macroblock that has the lowest SAD value among the top n% of SAD values.

• For all positions above index i, the corresponding macroblocks in the current frame are encoded by SLIMPEG. The rest of the macroblocks in the current frame, for which the corresponding macroblocks had SAD values below the n% threshold in the previous frame, are encoded using the temporal correlation motion estimation algorithm.

Table 6.2: Adjustability of Adaptive SAD Control Algorithm

SAD Threshold                        30%                        70%
Video sequence             Football  Foreman  Claire  Football  Foreman  Claire
Reduction in Complexity [%]    63       65       62       35       36       34
Loss in BD PSNR [dB]         -1.62    -3.03    -1.13    -0.64    -1.16    -0.32

• Intra coding is iterated after a certain number of frames have been encoded using the above method.

Altogether, this algorithm offers the liberty of choosing between high data rate, high quality encoding and low data rate, low quality encoding. In addition, the latter option is less computationally involved. A summary of the results obtained by applying the Adaptive SAD Control algorithm with SAD threshold values of 30 % and 70 % is shown in table 6.2.

6.3 Simulations with different video sequences

The aforementioned algorithms were tested on different kinds of video sequences. The framework adopted for the simulations is given in the following.

• Encode the video under consideration using motion estimation by Full Search, SLIMPEG, the Temporal Correlation Algorithm, the Spatial Correlation Algorithm and the Adaptive SAD Control Algorithm separately, with a defined set of four QP values. The QP values are chosen such that the data rates of the resulting videos are limited to the approximate range of 35 kbps to 300 kbps.

• Record the data rates of the encoded videos and SAD computational complexity offered by each algorithm.

• Decode the videos and compare each of them against the original video to calculate individual PSNR values for each QP value.

• Take the video encoded by Full Search as the 'anchor' and evaluate the BD PSNR and BD rate values for the videos encoded by the rest of the algorithms.


• Compare the efficiency of SLIMPEG against Full Search for SAD computational complexity. Take those values as a reference to check the improvement in SAD computational complexity achieved by the other algorithms.

The data so obtained was used to plot graphs for the video sequences presented in the coming subsections. One such graph is the BD PSNR graph, which shows the Bjontegaard Delta PSNR measure of the encoding quality offered by a certain motion estimation algorithm in comparison with Full Search. In this graph, and others to be discussed, the Adaptive SAD Control algorithm has a variable response, with one value for each value of the SAD threshold. The other algorithms have a flat response, as none of their attributes depend on the SAD threshold. The BD rate plot presents a comparison of the data rates offered by the algorithms under test, compared with Full Search. The reduction in SAD computation complexity plot shows the decrease in SAD calculations for an algorithm when compared against SLIMPEG. PSNR plots show the individual PSNR values achieved by each algorithm, compared against Full Search at a certain QP value. Some sample frames from each video sequence are depicted in the next sections, for example figure 6.3. Macro blocks inside such a frame are shown as green squares, and the motion vectors are represented by small arrows inside these squares. Next follows the description of the simulations and results for the three video sequences.

6.3.1 Football Sequence

This video sequence has quite high motion content. One of its frames is shown in figure 6.3, which was encoded using SLIMPEG as the motion estimation algorithm at QP = 16. The pointer gives information about macro block 55: it was INTER coded, and the value of its motion vector is also shown. Figure 6.4 shows the BD PSNR measure of the video encoded by four different motion estimation algorithms. As the players move fast and the background changes accordingly, it can be expected that different macro blocks inside any frame of the video have little relation to each other's movement. The nature of the motion content also gives some idea of the relation amongst the frames in the temporal domain. These intuitive ideas are well supported by the results obtained for the spatial and temporal correlation algorithms, as shown in the mentioned figure. SLIMPEG lies above, as it should. Our Adaptive SAD Control reaches the SLIMPEG quality level gradually with increasing SAD threshold. Figure 6.5 depicts the BD rate measure of the aforementioned algorithms. Similar to the BD PSNR results, motion estimation based on temporal correlation offers better BD rate values than the one based on spatial correlation. The temporal correlation based algorithm is better in SAD computational complexity as well, as shown in figure 6.6.

Figure 6.3: 10th frame of the Football Sequence [Encoded using SLIMPEG]

It is mentioned again here for reference that the computational complexity comparisons were made against the value offered by SLIMPEG. Adaptive SAD Control offers wide liberty to choose between quality and reduced complexity.

The next four graphs, figures 6.7 to 6.10, present the PSNR values of the encoded video for QP values 8, 16, 24 and 31. These values have been chosen carefully so as to keep the minimum and maximum bit rates of the encoded video inside the prescribed range. As expected, high PSNR values are obtained for low QP values and vice versa. The temporal correlation algorithm maintains its better performance over the spatial correlation algorithm at individual QP values as well.

6.3.2 Foreman Sequence

This is a moderate motion content video sequence. One of its frames is shown in figure 6.11, which was encoded using Adaptive SAD Control as the motion estimation algorithm at QP = 11. The pointer gives information about macro block 34: it was INTER coded, and the value of its motion vector is also shown. An intuitive observation of the foreman's movement inside the video suggests rather abrupt changes in the direction of the motion vectors.

Figure 6.4: BD PSNR [Football Sequence]

Figure 6.5: BD Rate [Football Sequence]

Figure 6.6: SAD Computation Complexity [Football Sequence]

Figure 6.7: PSNR values for QP = 8 [Football Sequence]

Figure 6.8: PSNR values for QP = 16 [Football Sequence]

Figure 6.9: PSNR values for QP = 24 [Football Sequence]

Figure 6.10: PSNR values for QP = 31 [Football Sequence]

As most of the motion content of this video comes from the nodding of the foreman, the temporal correlation may be of less value due to the frequent changes in the direction of motion. Figure 6.12 confirms this with the BD PSNR results, where the spatial correlation algorithm lies above the temporal correlation algorithm. The next plot, in figure 6.13, presents the BD rate values, where spatial correlation again offers better performance. Our Adaptive SAD Control algorithm once again shows an exponential fall with increasing SAD threshold. The SAD computational complexity graph in figure 6.14 shows, as expected, better performance for the spatial correlation algorithm than for the temporal one. The next four plots, figures 6.15 to 6.18, depict the PSNR values of the encoded videos at QP values 4, 11, 18 and 25.

6.3.3 Claire Sequence

This is a low motion content video sequence. One of its frames is shown in figure 6.19, which was encoded using the Temporal Correlation Algorithm at QP = 3. The pointer gives information about macro block 6: it was coded as SKIP, and hence no motion vector is required in this case. A look at the video makes it clear that most of its visual areas remain the same frame after frame. As there is very little motion in the sequence, we see in figure 6.20 that the spatial correlation algorithm keeps a higher BD PSNR value than the temporal correlation algorithm. Also, SLIMPEG becomes more efficient here than the so-called optimum Full Search algorithm, because it uses previous estimates for motion vector calculations. The Adaptive SAD Control algorithm performs well here and keeps its values higher than even the spatial algorithm for most of the SAD threshold range. The BD rate shows similar results, as presented in figure 6.21. Temporal correlation gives more reduction in SAD computation complexity, however, as shown in figure 6.22.

Figure 6.11: 4th frame of the Foreman Sequence [Encoded using Adaptive SAD Control algorithm]

Figure 6.12: BD PSNR [Foreman Sequence]

Figure 6.13: BD Rate [Foreman Sequence]

Figure 6.14: SAD Computation Complexity [Foreman Sequence]

Figure 6.15: PSNR values for QP = 4 [Foreman Sequence]

Figure 6.16: PSNR values for QP = 11 [Foreman Sequence]

Figure 6.17: PSNR values for QP = 18 [Foreman Sequence]

Figure 6.18: PSNR values for QP = 25 [Foreman Sequence]

Figure 6.19: 13th frame of the Claire video sequence [Encoded using Temporal Correlation Algorithm]

The PSNR results for the individual QP values are shown in figures 6.23 to 6.26.


Figure 6.20: BD PSNR[Claire Sequence]

Figure 6.21: BD Rate[Claire Sequence]


Figure 6.22: SAD Computation Complexity[Claire Sequence]

Figure 6.23: PSNR values for QP = 2[Claire Sequence]


Figure 6.24: PSNR values for QP = 3[Claire Sequence]

Figure 6.25: PSNR values for QP = 5[Claire Sequence]


Figure 6.26: PSNR values for QP = 7[Claire Sequence]


Conclusion and Future Work

The task of reducing the complexity of motion estimation algorithms for MPEG-4 video encoding, in terms of SAD calculations for the measure of similarity, has been addressed in this thesis. Four methods have been introduced which proved their benefit in reducing motion estimation complexity.

The comparison criteria here are the complexity offered by SLIMPEG and the quality provided by the standard Full Search motion estimation algorithm.

The temporal correlation algorithm focuses on reusing motion vectors from the previous frame as motion vectors for the current macro block. This method provides an enormous reduction in SAD computation complexity, but may not be a practical stand-alone solution for motion estimation because of poor PSNR results.

The spatial correlation algorithm aims at reusing the few fine-tuned motion vectors, provided by SLIMPEG, within the same frame. This method too turned out to be very efficient in terms of the low number of SAD calculations, but it is hard to employ alone for the motion estimation process because of its poor performance in terms of PSNR of the resulting video.

• Football Sequence: The temporal correlation algorithm offers better quality than that offered by the spatial correlation algorithm by 0.79 dB in BD PSNR measure for this video. Moreover, the temporal correlation algorithm has 17 % less complexity of SAD computations than the spatial correlation algorithm.

• Foreman Sequence: The spatial correlation algorithm offers better quality than that offered by the temporal correlation algorithm by 1.1743 dB in BD PSNR measure for this video. Moreover, the tem- poral correlation algorithm has 18 % less complexity of SAD compu- tations than the spatial correlation algorithm.


• Claire Sequence: The spatial correlation algorithm offers quality better than the temporal correlation algorithm by 2.2572 dB in BD-PSNR for this video. Moreover, the temporal correlation algorithm has 18 % lower SAD computation complexity than the spatial correlation algorithm.

The SAD control algorithm alternates between a low-complexity and a full motion estimation algorithm. This method offers a promising reduction in complexity, and we have used this basic concept to formulate a dynamic adaptive algorithm.
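Such alternation may be sketched as a simple per-frame schedule (the frame-level granularity and the period of 4 are our assumptions for illustration; the thesis algorithm is more involved):

```python
def choose_search(frame_number, period=4):
    """SAD control sketch: run the full (SLIMPEG) search on every
    `period`-th frame and the cheap predictor-based search otherwise."""
    return "full" if frame_number % period == 0 else "low_complexity"

# Frames 0 and 4 get the full search, the rest use the cheap one
print([choose_search(n) for n in range(5)])
```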

The Adaptive SAD Control algorithm offers a system where complexity and quality can be traded off in a dynamic way. Higher quality may be obtained by increasing the complexity, and low computational complexity is possible if encoding quality can be compromised a little. We were successful in showing that this algorithm can provide a so-called sliding knob, which can be moved all the way from lowest quality with lowest complexity to the point of highest quality and highest complexity.

At the point of highest quality, the SAD calculation complexity is the same as that offered by SLIMPEG.
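The sliding-knob behaviour may be sketched as a mapping from a knob setting in [0, 1] to a per-frame SAD budget (the linear interpolation and all names are illustrative assumptions; at knob = 1 the budget equals the SLIMPEG cost, matching the remark above):

```python
def sad_budget(knob, slimpeg_sads, min_sads):
    """Map a quality knob in [0, 1] to a per-frame SAD budget by
    interpolating between the cheapest mode and the full SLIMPEG
    cost; the encoder would then spend at most this many SADs."""
    knob = max(0.0, min(1.0, knob))  # clamp the knob to [0, 1]
    return min_sads + knob * (slimpeg_sads - min_sads)

print(sad_budget(0.5, 1000, 100))  # 550.0
print(sad_budget(1.0, 1000, 100))  # 1000.0, i.e. the SLIMPEG complexity
```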

The aforementioned conclusive remarks show the benefit of exploiting the inherent properties of video in the motion estimation process. The correlation found among different parts of a video frame, and between consecutive frames, proved useful for compression.

Future work on motion estimation algorithms may proceed in different directions. Motion vectors obtained under a rate-distortion constraint may lead to lower bit rate usage. Depending on the nature of the motion in the video under consideration, the temporal and spatial correlation algorithms may be combined to achieve better results.
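One way such a combination could work, sketched under our own assumptions, is to merge the temporal and spatial predictions into one candidate set and let a cost function (e.g. SAD against the reference frame) pick the winner:

```python
def best_candidate(candidates, cost):
    """Pick the motion vector with the lowest cost from a merged
    candidate set; `cost` would be a SAD evaluation in an encoder."""
    return min(candidates, key=cost)

# Candidate set merging the zero vector with hypothetical temporal
# and spatial predictions
cands = [(0, 0), (2, -1), (3, 1)]
# Toy cost: L1 distance from an assumed true motion of (2, 0)
cost = lambda mv: abs(mv[0] - 2) + abs(mv[1])
print(best_candidate(cands, cost))  # (2, -1)
```

Only a handful of SAD evaluations (one per candidate) would be spent per macroblock, so the combined scheme could retain most of the complexity savings of both algorithms.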
