Perceptual Quality Assessment of Wireless Video Applications

Ulrich Engelke 1, Tubagus Maulana Kusuma 2, and Hans-Jürgen Zepernick 1

1 Blekinge Institute of Technology, SE-372 25 Ronneby, Sweden, {ulrich.engelke, hans-jurgen.zepernick}@bth.se

2 Gunadarma University, Jl. Margonda Raya 100, Depok 16424, Indonesia, mkusuma@staff.gunadarma.ac.id

Abstract

The rapid evolution of wireless networks is driven by the growth of wireless packet data applications such as interactive mobile multimedia applications, wireless streaming services, and video-on-demand. The largely heterogeneous network structures, severe channel impairments, and complex traffic patterns make wireless networks much less predictable than their wired counterparts. One of the major challenges with the roll-out of these services is therefore the design of wireless networks that fulfill the stringent quality of service requirements of wireless video applications. In this paper, the applicability of perceptual image quality metrics for real-time quality assessment of Motion JPEG2000 (MJ2) video streams over wireless channels is investigated. In particular, a reduced-reference hybrid image quality metric (HIQM) is identified as suitable for an extension to video applications. It outperforms other known metrics in terms of required overhead and prediction performance.

1 Introduction

With the implementation of current and the development of future mobile radio networks, there has been an increasing demand for efficient transmission of multimedia services over wireless channels. These services typically require much higher bandwidth for the delivery of the different applications subject to a number of quality constraints.

On the other hand, impairments such as the time-varying nature of the wireless channel caused by multipath propagation and changing interference conditions make the channel very unreliable. Link adaptation and other techniques have been employed to adapt the transmission parameters in order to compensate for these variations [1]–[3]. The conventional adaptation techniques are based on measures such as the signal-to-noise ratio (SNR) or the bit error rate (BER) as indicators of the received quality. However, in the case of multimedia services it has been shown that these measures do not necessarily correlate well with the quality as perceived by humans [4], [5]. Therefore, the best quality judgement of a multimedia service would be made by humans themselves. Clearly, this would be a tedious and expensive approach that cannot be performed in real-time. Therefore, quality measures have been proposed that incorporate characteristics of the human auditory and visual system and inherently account for user-perceived quality. In contrast to already standardized perceptual quality metrics for audio [6] and speech [7], the standardization process for image and video quality assessment is not yet as developed.

In the sequel, the applicability of perceptual image quality metrics for real-time video quality assessment of Motion JPEG2000 (MJ2) video streams over wireless channels is investigated. This approach is motivated by the fact that MJ2 is solely based on intra-frame coding techniques. In addition, it has been shown that MJ2 encoded video streams can provide good performance over low bit rate error-prone wireless channels [8]. This is mainly due to the non-existence of inter-frame dependencies and the related suppression of error propagation. This characteristic makes MJ2 very error resilient compared to other state-of-the-art video codecs such as MPEG-4, defined by the Moving Picture Experts Group (MPEG). Furthermore, MJ2 offers high coding efficiency and low complexity.

In this paper, a number of image quality metrics are considered for application to real-time perceptual quality assessment of MJ2 video streams over wireless channels. Simulation results reveal that the reduced-reference hybrid image quality metric (HIQM) performs favorably compared to the other examined metrics in terms of required overhead and prediction performance.

This paper is structured as follows. Section 2 presents an overview of the considered quality metrics and measurement techniques. In Section 3, the ideas behind using quality prediction functions for automatic quality assessment are described. Simulation results for the different perceptual quality assessment techniques are provided in Section 4. Conclusions are drawn in Section 5.

2 Perceptual Quality Assessment: From Image to Video

Traditionally, fidelity metrics such as the peak signal-to-noise ratio (PSNR) or the mean-squared error (MSE) have been utilized to estimate the quality of an image. These belong to the group of full-reference (FR) metrics, which means that the original image is needed as a reference for the calculation of the distorted image quality. Therefore, these approaches are not suitable for wireless communication purposes as the original image would typically not be available at the receiver. Instead, reduced-reference (RR) image quality metrics can be used, which shall be based on algorithms that extract features such as structural information from the original image at the transmitting end. The feature data may then be sent over the channel along with the image. At the receiver, the image related data is extracted and the features of the received image are calculated. Given the features of the transmitted and received image, a quality assessment can be performed.

In view of the above arguments, the favorable perceptual video quality assessment shall be based on such an RR image quality metric. This approach finds its support in the fact that MJ2 videos consist of frames which are entirely intra-frame coded. This means that there are no dependencies between consecutive frames. Hence, no temporal artifacts are introduced by either the MJ2 source coding or the wireless channel. As such, the quality of each video frame can be evaluated independently of its predecessors and successors using suitable image quality metrics.

The quality measure of each MJ2 video frame may be used by link adaptation and resource management algorithms to adapt system parameters such that a satisfactory perceived quality is delivered to the end user. The block diagram of such an application scenario is presented in Fig. 1.

The features of each frame are calculated in the pixel domain of the uncompressed video frame. The resulting data is then concatenated with the data stream of the video frame and together they are sent over the channel. At the receiver, the data representing the features is extracted. After MJ2 source decoding, the features of the received video frames are calculated and used, together with the features of the sent video frames, for the quality assessment. Based on this assessment, a decision can be derived for the adaptation of system parameters.

2.1 Hybrid Image Quality Metric

As a reduced-reference metric, HIQM [9] extracts the features of the video frames at both the transmitter and the receiver. The quality evaluation is composed of the outcomes from different image feature extraction algorithms measuring blocking [10], [11], blur [12], image activity [13], and intensity masking [14]. Due to the limited bandwidth of the wireless channel, it is essential to keep the overhead needed to represent the video frame features as low as possible.

Therefore, the overall perceptual quality measure shall be calculated as a weighted sum of the extracted features and represented by a single number. This number can be concatenated with the data stream of each transmitted video frame without creating too much overhead. Specifically, the proposed metric is given by

HIQM = Σ_{i=1}^{5} w_i · f_i    (1)

where w_i denotes the weight of the respective image feature f_i, i = 1, 2, 3, 4, 5. It is noted that the following relationships have been used:

f_1: Blocking metric
f_2: Blur metric
f_3: Edge-based image activity metric
f_4: Gradient-based image activity metric
f_5: Intensity masking metric

TABLE I: Artifact evaluation.

Feature/Artifact          Metric   Algorithm   Weight   Value
Blocking                  f_1      [11]        w_1      0.77
Blur                      f_2      [12]        w_2      0.35
Edge-based activity       f_3      [13]        w_3      0.61
Gradient-based activity   f_4      [13]        w_4      0.16
Intensity masking         f_5      [14]        w_5      0.35

In order to obtain the values of the aforementioned weights, subjective quality tests have been conducted at the Department of Signal Processing of the Blekinge Institute of Technology and an analysis of the results has been performed for the individual artifacts. The test was performed using the Double Stimulus Continuous Quality Scale (DSCQS) methodology, specified in ITU-R Recommendation BT.500-11 [15]. A total of 30 people voted for the perceived quality of both the transmitted and received set of 40 images. The responses of the test subjects are captured by the respective Pearson correlation coefficients. Accordingly, the magnitudes of these correlation coefficients are selected as the weights by which the individual artifacts contribute to the overall HIQM value (see Table I). The final quality measure of an MJ2 encoded video frame at the receiver may then be represented by the magnitude of the difference between the feature measure of the transmitted and the received frame

Δ_HIQM(i) = |HIQM_T(i) − HIQM_R(i)|    (2)

where i denotes the i-th frame within the transmitted (T) and the received (R) video stream. The total length of the time-varying HIQM related quality value may be represented by 17 bits (1 bit for the sign, 8 bits for the integer part in the range 0–255, and 4 bits each for the first and second decimal digit).
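To make the pooling and signaling steps concrete, the following Python sketch implements the weighted sum of (1), the frame difference of (2), and the 17-bit side-information format described above. The feature extraction algorithms [10]–[14] are not reproduced; the feature vector is assumed to be computed elsewhere, so the function names are illustrative only.

```python
import numpy as np

# Weights from Table I (magnitudes of the per-artifact Pearson
# correlation coefficients obtained from the subjective tests).
WEIGHTS = np.array([0.77, 0.35, 0.61, 0.16, 0.35])

def hiqm(features):
    """Weighted feature pooling of eq. (1).

    `features` holds the five artifact measures f_1..f_5 in the
    order: blocking, blur, edge-based activity, gradient-based
    activity, intensity masking. Their extraction algorithms
    [10]-[14] are assumed to be implemented elsewhere.
    """
    return float(WEIGHTS @ np.asarray(features, dtype=float))

def delta_hiqm(features_tx, features_rx):
    """Frame-level quality difference of eq. (2)."""
    return abs(hiqm(features_tx) - hiqm(features_rx))

def pack_17bit(value):
    """Pack a quality value into the 17-bit format: 1 sign bit,
    8 bits for the integer part (0-255), and 4 bits each for the
    first and second decimal digit."""
    sign = 1 if value < 0 else 0
    mag = abs(value)
    integer = min(int(mag), 255)
    d1 = int(mag * 10) % 10   # first decimal digit
    d2 = int(mag * 100) % 10  # second decimal digit
    return (sign << 16) | (integer << 8) | (d1 << 4) | d2
```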

Several other image quality metrics have been proposed in recent years. For comparison purposes we will consider in the sequel two metrics for which the source code has actually been made available to the public.

Fig. 1. Block diagram of a wireless link using reduced-reference perceptual quality metrics for video quality monitoring.

2.2 Reduced-Reference Image Quality Assessment

The reduced-reference image quality assessment (RRIQA) technique has been proposed in [16]. It is based on a natural image statistic model in the wavelet domain. The distortion between the received and the transmitted image is calculated as

D = log_2 ( 1 + (1/D_0) Σ_{k=1}^{K} |d̂_k(p_k ‖ q_k)| )    (3)

where the constant D_0 scales the distortion measure, d̂_k(p_k ‖ q_k) denotes the estimate of the Kullback-Leibler distance between the probability density functions p_k and q_k of the k-th subband in the transmitted and received image, respectively, and K is the number of subbands. The overhead needed to represent the reduced-reference features is given in [16] as 162 bits.
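As a minimal illustration of the pooling step in (3), the sketch below assumes the per-subband Kullback-Leibler distance estimates d̂_k are already available; the wavelet decomposition and the natural image statistic model of [16] are beyond its scope, and the default value for D_0 is an arbitrary placeholder.

```python
import numpy as np

def rriqa_distortion(kl_estimates, d0=0.1):
    """Pool per-subband Kullback-Leibler distance estimates
    d_k(p_k || q_k) into the RRIQA distortion of eq. (3).

    `d0` stands for the scaling constant D_0; the default used
    here is a placeholder, see [16] for the actual choice.
    """
    d = np.abs(np.asarray(kl_estimates, dtype=float))
    return float(np.log2(1.0 + d.sum() / d0))
```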

2.3 Measure of Structural Similarity

The full-reference metric reported in [17] is also taken into account. Although the applicability of this metric to wireless communications is not necessarily given due to its full-reference nature, the comparison regarding quality prediction performance is of high interest as it serves as a benchmark for the reduced-reference metrics. The considered metric is based on the degradation of structural information. Its outcome is a measure of structural similarity (SSIM) between the reference and the distorted image

SSIM(x, y) = [(2 μ_x μ_y + C_1)(2 σ_xy + C_2)] / [(μ_x² + μ_y² + C_1)(σ_x² + σ_y² + C_2)]    (4)

where μ_x, μ_y and σ_x, σ_y denote the mean intensity and contrast of image signals x and y, respectively, and σ_xy denotes their cross-covariance. The constants C_1 and C_2 are used to avoid instabilities in the structural similarity comparison that may occur for particular mean intensity and contrast combinations (μ_x² + μ_y² = 0 or σ_x² + σ_y² = 0). Clearly, the overhead with this approach would be the entire original image.
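For illustration, here is a sketch of (4) computed from global image statistics. Note that SSIM as defined in [17] is evaluated over local sliding windows and the resulting map is averaged; this single-window variant merely shows how the means, variances, and cross-covariance enter the formula. The choice C_i = (K_i L)² with K_1 = 0.01 and K_2 = 0.03 is a common convention and an assumption here.

```python
import numpy as np

def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Structural similarity of eq. (4) from global statistics.

    Simplification for illustration: the SSIM index of [17]
    computes these statistics in local windows and averages
    the resulting similarity map.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    c1 = (k1 * dynamic_range) ** 2  # stabilizes the mean-intensity term
    c2 = (k2 * dynamic_range) ** 2  # stabilizes the contrast term
    mu_x, mu_y = x.mean(), y.mean()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x**2 + mu_y**2 + c1) * (x.var() + y.var() + c2)
    return num / den
```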

3 Prediction of Subjective Quality

Subjective ratings from experiments are typically averaged into a mean opinion score (MOS) which represents the subjective quality of a particular image. On the other hand, the examined metrics relate to the objective image quality and shall be used to predict perceived image quality automatically. In the sequel, exponential functions are suggested for predicting the subjective quality from the considered image quality metrics.

3.1 System Under Test

The system under test comprised a flat Rayleigh fading channel in the presence of additive white Gaussian noise (AWGN) along with hybrid automatic repeat request (H-ARQ) and a soft-combining scheme. A (31, 21) Bose-Chaudhuri-Hocquenghem (BCH) code was used for error protection purposes and binary phase shift keying (BPSK) as the modulation technique. The average bit energy to noise power spectral density ratio (E_b/N_0) was chosen as 5 dB and the maximum number of retransmissions in the soft-combining algorithm was set to 4. These particular settings turned out to be beneficial in generating impaired images and video frames with a wide range of artifacts. It should be mentioned that these are the same settings that have been used in the derivation of the weights given in Table I.
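The following sketch reproduces only the physical-layer core of this setup, namely BPSK over a flat Rayleigh fading channel with AWGN at the stated E_b/N_0 and ideal coherent detection; the BCH(31, 21) coding, H-ARQ, and soft combining are omitted for brevity, so the resulting bit errors correspond to the uncoded case.

```python
import numpy as np

def bpsk_flat_rayleigh(bits, ebn0_db=5.0, rng=None):
    """Send BPSK symbols over a flat Rayleigh fading channel with
    AWGN and detect coherently with perfect channel knowledge.
    BCH coding, H-ARQ, and soft combining are omitted."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(bits)
    s = 1.0 - 2.0 * np.asarray(bits)  # bit 0 -> +1, bit 1 -> -1
    # Complex Rayleigh fading coefficient per symbol (flat fading).
    h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    sigma = np.sqrt(1.0 / (2.0 * ebn0))  # noise std per real dimension
    w = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    r = h * s + w
    # Derotate by the channel coefficient and slice.
    return (np.real(np.conj(h) * r) < 0.0).astype(int)

# Example: empirical bit error rate at Eb/N0 = 5 dB.
tx = np.random.default_rng(0).integers(0, 2, 100_000)
rx = bpsk_flat_rayleigh(tx, ebn0_db=5.0, rng=np.random.default_rng(1))
print("BER:", np.mean(tx != rx))
```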

To obtain the MJ2 videos, a total of 100 consecutive frames of uncompressed quarter common intermediate format (QCIF) videos were compressed at a rate of 1 bpp using the Kakadu software [18]. No error-resilience tools were used during source encoding and decoding in order to capture the full impact of the errors introduced by the channel. The MJ2 videos were then sent over the channel and decompressed at the receiver side to obtain the QCIF videos. In Fig. 2 it can be seen that a wide range of distortions could be created. In order to automatically quantify the subjective quality of this type of impaired video frames in real-time, suitable quality prediction functions are needed.

Fig. 2. Frame samples of the video “Highway drive” [19] after transmission over the wireless channel: (a) frame no. 2; (b) frame no. 33; (c) frame no. 80; (d) frame no. 89.

3.2 Exponential Prediction Function

The selection of an exponential prediction function finds its support in the fact that the image quality metrics considered here relate to image distortion and degradation of structural information. As such, a highly distorted image would be expected to relate to a low MOS while images with low structural degradation would result in a high MOS. A curve fitting of MOS values from subjective tests versus quality measure may then be based on an exponential function, leading to the prediction function

MOS_QM = a · e^{b·QM}    (5)

where QM ∈ {Δ_HIQM, RRIQA, SSIM} denotes the respective perceptual quality metric. The parameters a and b are obtained from the curve fitting and define the exponential prediction function of the respective perceptual quality metric.

TABLE II: Curve fitting parameters.

       HIQM      RRIQA     SSIM
a      96.15     109.1     14.93
b     −0.2975   −0.1817    1.662
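The parameters can be obtained, for instance, with a standard nonlinear least-squares routine. The sketch below uses scipy.optimize.curve_fit; the data arrays are placeholders standing in for the 40 (quality measure, MOS) pairs from the subjective tests, and the initial guess roughly matches the HIQM column of Table II.

```python
import numpy as np
from scipy.optimize import curve_fit

def prediction_function(qm, a, b):
    """Exponential prediction function of eq. (5)."""
    return a * np.exp(b * qm)

# Placeholder data: quality measures (here: Delta_HIQM values)
# and the corresponding MOS from the subjective tests.
qm_values = np.array([0.5, 2.0, 4.0, 8.0, 12.0, 18.0])
mos_values = np.array([85.0, 55.0, 30.0, 9.0, 3.0, 0.5])

(a, b), _ = curve_fit(prediction_function, qm_values, mos_values,
                      p0=(96.0, -0.3))
print(f"a = {a:.4g}, b = {b:.4g}")
```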

Figs. 3 a-c show the MOS obtained for the 40 different image samples used in our subjective tests versus the considered metrics Δ_HIQM, RRIQA, and SSIM, respectively. The parameters a and b of the corresponding exponential prediction functions are given in Table II. The figures also show the 95% confidence interval, from which only a small scattering of image samples around the fitting curve is observed for Δ_HIQM, while larger scattering, and hence more prediction uncertainty, is noticed for RRIQA and SSIM.

Fig. 3. Curve fitting for subjective scores versus perceptual quality measures: (a) Δ_HIQM; (b) RRIQA; (c) SSIM.

The prediction performance of the considered objective quality metrics with respect to the subjective ratings shall be characterized by the Pearson linear correlation coefficient and the Spearman rank order coefficient [20]. The Pearson linear correlation coefficient characterizes the degree of scattering of data pairs around a linear function, while the Spearman rank order coefficient measures the prediction monotonicity. For the purpose of calculating these prediction performance measures, the relationships between MOS and predicted scores MOS_QM with QM ∈ {Δ_HIQM, RRIQA, SSIM} have been established using (5) and are shown in Figs. 4 a-c. The Pearson linear correlation coefficient and the Spearman rank order coefficient can be deduced from the data pairs shown in these figures; the results are reported in Table III. It turns out that Δ_HIQM outperforms RRIQA and SSIM in both prediction accuracy and monotonicity.

Fig. 4. Curve fitting for subjective scores versus predicted scores: (a) Δ_HIQM; (b) RRIQA; (c) SSIM.

TABLE III: Prediction performance.

          HIQM    RRIQA   SSIM
Pearson   0.896   0.769   0.599
Spearman  0.887   0.677   0.461
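Both performance measures are available in common statistics packages; a brief sketch using scipy.stats follows, with placeholder score arrays standing in for the subjective and predicted scores of the image samples.

```python
import numpy as np
from scipy import stats

# Placeholder arrays: subjective MOS and predicted scores MOS_QM
# obtained from eq. (5) for the same image samples.
mos = np.array([82.0, 55.0, 31.0, 10.0, 4.0, 1.0])
mos_pred = np.array([80.0, 50.0, 35.0, 12.0, 2.0, 3.0])

pearson, _ = stats.pearsonr(mos_pred, mos)    # prediction accuracy
spearman, _ = stats.spearmanr(mos_pred, mos)  # prediction monotonicity
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```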

4 Simulation Results

The extensive simulations involved a wide range of video streams taken from the database provided in [19]. The common findings from these simulations will be discussed in the sequel using a representative video stream. Specifically, the “Highway drive” video has been chosen to illustrate the ability of the considered measures in assessing perceptual quality for wireless video applications. The same wireless scenario as described in Section 3 was used in the simulations. The actual quality assessment has been performed on both the transmitted and received uncompressed QCIF videos. The exponential prediction curve (5) with parameters a and b given in Table II was used to translate the perceptual quality measures into predicted mean opinion scores MOS_QM. Finally, the MOS_QM values were normalized to fall in the interval [0, 100]. The progression of the quality measures over the 100 consecutive frames is shown in Fig. 5.

It can be seen from the results shown in Fig. 5 that Δ_HIQM very closely follows the assessment of the benchmark given by SSIM. In particular, Δ_HIQM clearly identifies the same frames as being of perceptually lower quality as those detected by SSIM and also provides stable quality assessments for the frames of good quality. It is remarkable that this behavior can be achieved without requiring reference frames at the receiver, as would be the case with SSIM. It should also be noted that SSIM appears to overestimate the perceptual quality, as is the case with frame number 89 (see Fig. 2 d). Although this particular frame is clearly indicated by both Δ_HIQM and SSIM as being of reduced quality, the low value given by Δ_HIQM seems to more accurately reflect the severe quality degradation.

As far as the comparison with the other reduced-reference metric, RRIQA, is concerned, the proposed Δ_HIQM can differentiate much better among perceptual quality levels, while RRIQA appears to be rather unstable. Therefore, Δ_HIQM would be the preferred metric when it comes to applications for real-time quality assessment or the extraction of decisions for link adaptation techniques. Furthermore, we recall that the additional overhead per video frame needed with Δ_HIQM to communicate the quality reference consumes only 17 bits for representing the weighted sum of features, while RRIQA requires 162 bits to represent the involved individual features [16].

The computational complexity has been measured in terms of the processing time for the 100 frames of the test video and is summarized in Table IV. Clearly, the Δ_HIQM based metric offers a significant reduction in processing time compared to RRIQA. As such, real-time applications would benefit from this feature, presuming an efficient implementation on a digital signal processor platform.

TABLE IV: Overhead and computational complexity (100 frames of MJ2 video “Highway drive”).

                        HIQM    RRIQA    SSIM
Overhead                0.07%   0.65%    100%
Processing time (sec)   51.27   380.39   3.75

For comparison purposes, the progression of PSNR over the 100 frames is also presented in Fig. 5. In order to align with the scale of the other examined metrics, PSNR was set to a constant value of 100 when the video frames at transmitter and receiver were identical. As PSNR is a fidelity metric, it is known that it often does not correlate well with human perception. For example, it can be seen from Fig. 5 that PSNR underestimates the quality of a number of frames compared to the above perceptual quality assessment approaches.

Fig. 5. Progression of the different quality metrics for the video “Highway drive” [19].

5 Conclusions

In this paper, we examined the potential of perceptual image quality metrics for quality assessment of MJ2 video streams over wireless channels. The reduced-reference hybrid image quality metric has been identified as suitable for an extension from image to intra-frame coded video applications. The simulation results have shown that Δ_HIQM outperforms RRIQA in both the overhead needed for representing the features of MJ2 video frames and the quality prediction performance.

References

[1] K. L. Baum, T. A. Kostas, P. J. Sartori, and B. K. Classon, “Performance characteristics of cellular systems with different link adaptation strategies,” IEEE Trans. on Vehicular Technology, vol. 52, no. 6, pp. 1497–1507, Nov. 2003.

[2] A. J. Goldsmith and S.-G. Chua, “Variable-rate variable-power MQAM for fading channels,” IEEE Trans. on Communications, vol. 45, no. 10, pp. 1218–1230, Oct. 1997.

[3] L. Hanzo, C. H. Wong, and M. S. Lee, Adaptive Wireless Transceivers. John Wiley & Sons, 2002.

[4] S. Winkler, E. D. Gelasca, and T. Ebrahimi, “Perceptual quality assessment for video watermarking,” in Proc. of IEEE Int. Conf. on Information Technology: Coding and Computing, Las Vegas, USA, Apr. 2002, pp. 90–94.

[5] A. W. Rix, A. Bourret, and M. P. Hollier, “Models of human perception,” Journal of BT Technology, vol. 17, no. 1, pp. 24–34, Jan. 1999.

[6] “Method for objective measurements of perceived audio quality,” ITU-R, Rec. BS.1387-1, Dec. 2001.

[7] “Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrow band telephone networks and speech codecs,” ITU-T, Rec. P.862, Feb. 2001.

[8] F. Dufaux and T. Ebrahimi, “Motion JPEG2000 for wireless applications,” in Proc. of First Int. JPEG2000 Workshop, Lugano, Switzerland, July 2003.

[9] T. M. Kusuma and H.-J. Zepernick, “A reduced-reference perceptual quality metric for in-service image quality assessment,” in IEEE Symposium on Trends in Communications, Bratislava, Slovakia, Oct. 2003, pp. 71–74.

[10] Z. Wang, A. C. Bovik, and B. L. Evans, “Blind measurement of blocking artifacts in images,” in Proc. of IEEE Int. Conf. on Image Processing, vol. 3, Vancouver, Canada, Sept. 2000, pp. 981–984.

[11] Z. Wang, H. R. Sheikh, and A. C. Bovik, “No-reference perceptual quality assessment of JPEG compressed images,” in Proc. of IEEE Int. Conf. on Image Processing, vol. 1, Rochester, USA, Sept. 2002, pp. 477–480.

[12] P. Marziliano, F. Dufaux, S. Winkler, and T. Ebrahimi, “A no-reference perceptual blur metric,” in Proc. of IEEE Int. Conf. on Image Processing, vol. 3, Rochester, USA, Sept. 2002, pp. 57–60.

[13] S. Saha and R. Vemuri, “An analysis on the effect of image features on lossy coding performance,” IEEE Signal Processing Letters, vol. 7, no. 5, pp. 104–107, May 2000.

[14] A. R. Weeks, Fundamentals of Electronic Image Processing. SPIE Optical Engineering Press, 1996.

[15] “Methodology for the subjective assessment of the quality of television pictures,” ITU-R, Rec. BT.500-11, 2002.

[16] Z. Wang and E. P. Simoncelli, “Reduced-reference image quality assessment using a wavelet-domain natural image statistic model,” in Proc. of SPIE Human Vision and Electronic Imaging, vol. 5666, Mar. 2005, pp. 149–159.

[17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.

[18] D. Taubman. (2005) Kakadu software: A comprehensive framework for JPEG2000. [Online]. Available: http://www.kakadusoftware.com

[19] Arizona State University, Video Traces Research Group. (2005) QCIF sequences © Acticom GmbH. [Online]. Available: http://trace.eas.asu.edu/yuv/qcif.html

[20] S. Winkler, Digital Video Quality – Vision Models and Metrics. John Wiley & Sons, 2005.
