A Minimum Mean Square Error Estimation and Mixture-Based Approach to Packet Video Error Concealment


Linköping University Post Print


Daniel Persson and Thomas Eriksson

N.B.: When citing this work, cite the original article.

©2009 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

Daniel Persson and Thomas Eriksson, A Minimum Mean Square Error Estimation and Mixture-Based Approach to Packet Video Error Concealment, 2007, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2007, 749-752.

http://dx.doi.org/10.1109/ICASSP.2007.366016

Postprint available at: Linköping University Electronic Press


A MINIMUM MEAN SQUARE ERROR ESTIMATION AND MIXTURE-BASED APPROACH TO PACKET VIDEO ERROR CONCEALMENT

Daniel Persson and Thomas Eriksson

Chalmers University of Technology
Department of Signals and Systems
412 96 Göteborg, Sweden

ABSTRACT

In this paper, a minimum mean square error-optimized mixture-based estimator is used for packet video error concealment. At the same time as on-line computational complexity is reduced, performance in peak signal-to-noise ratio (PSNR) is increased in comparison with a Gaussian mixture-based estimator that obtains its parameters through probability density estimation by means of the expectation maximization algorithm. Moreover, our method increases performance in PSNR compared to several other previous error concealment algorithms.

Index Terms- Packet video, error concealment, estimation.

1. INTRODUCTION

Today's frequently used video compression methods utilize block-based motion-compensated inter-frame prediction, discrete cosine transform-based quantization, and variable length coding to reduce the bit stream. High compression ratios are achieved in this way, but the sensitivity of the video stream to communication channel and storage media errors also increases. Among the various error resilience techniques used for combating this problem, methods that work at the decoder side without extra redundancy from the encoder are referred to as error concealment schemes. Error concealment methods are usually categorized into spatial approaches, which use only spatially surrounding pixels for estimation of lost blocks, and temporal approaches, which use motion information and pixels from previous frames [1].

In this paper, we combine spatial and temporal error concealment. The motion vectors (MVs) are considered to be available through protection in a high priority layer, or estimated by the median of the MVs of the surrounding blocks [2], and we propose a mean square error (MSE) risk- and mixture-based estimator that classifies the situation of local image correlation in order to choose spatial or temporal error concealment. In the case of lost MVs, the proposed method may be combined with any method for finding the MVs. Performance in peak signal-to-noise ratio (PSNR) is increased, while on-line computational complexity is reduced, compared to our own Gaussian mixture model (GMM)-based estimator with parameters retrieved by means of the expectation maximization (EM) algorithm that was reported in [3], [4], and [5]. The new method also increases performance in PSNR compared with other previous methods that combine spatial and temporal error concealment, e.g. Shirani et al.'s method [6], Zhu et al.'s method [7], and motion-compensated copying [2]. The rest of the paper is organized as follows. In Section 2, the minimum MSE (MMSE) estimation framework is defined, and an argument for a mixture-based estimator is given. Section 3 describes the experimental setup and presents simulation results. The paper is concluded in Section 4.

2. MMSE ESTIMATION WITH MIXTURE-BASED ESTIMATOR

In this section, mixture-based packet video error concealment is formulated as an MMSE problem. The proposed technique is a further development of the approach reported in [3], [4], and [5], which has previously been shown to increase PSNR compared to the schemes [6], [7], and [2].

2.1. Motivation of the proposed method

We start by summarizing [3], [4], and [5], and further discussing the advantages and the shortcomings of the method. Thereafter, the new approach is explained. In [3], [4], and [5], a group of neighboring pixel values that are lost at the decoder side is represented by the elements of the stochastic vector variable $X$. An estimate

$$\hat{X} = g(Y) \tag{1}$$

is formed, where $Y$ is a stochastic vector whose elements represent pixels in a neighborhood of $X$. Further, the MMSE problem

$$g^*(Y) = \arg\min_{g(Y)} E\left[\|X - g(Y)\|^2\right] \tag{2}$$

is posed, which has the solution

$$g^*(Y) = E(X \mid Y) = \int x\, f_{X|Y}(x \mid Y)\, dx. \tag{3}$$

The density $f_{X,Y}(x, y)$ is modeled with a GMM

$$f_{X,Y}(x, y) = f_Z(z) = \sum_{m=1}^{M} \rho^{(m)} f^{(m)}(z) \tag{4}$$

where $Z^T = [X^T, Y^T]$ and $f^{(m)}(z)$ are Gaussian distributions with means and covariances

$$\mu_Z^{(m)} = \begin{bmatrix} \mu_X^{(m)} \\ \mu_Y^{(m)} \end{bmatrix}, \qquad C_{ZZ}^{(m)} = \begin{bmatrix} C_{XX}^{(m)} & C_{XY}^{(m)} \\ C_{YX}^{(m)} & C_{YY}^{(m)} \end{bmatrix} \tag{5}$$

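and where the a priori weights $\rho^{(m)}$ are all positive and sum to one. The estimator (3) may then be written

$$g^*(y) = \sum_{m=1}^{M} \pi^{(m)}(y) \left( C_{XY}^{(m)} \left(C_{YY}^{(m)}\right)^{-1} \left(y - \mu_Y^{(m)}\right) + \mu_X^{(m)} \right) \tag{6}$$

where

$$\pi^{(m)}(y) = \frac{\rho^{(m)} f^{(m)}(y)}{\sum_{k=1}^{M} \rho^{(k)} f^{(k)}(y)} \tag{7}$$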

are a posteriori probabilities that sum to 1. A qualitative interpretation of the form of the mixture (6) is that the a posteriori weights $\pi^{(m)}(y)$ are used for classifying the local image correlation, and for choosing appropriate linear estimators $C_{XY}^{(m)} (C_{YY}^{(m)})^{-1} (y - \mu_Y^{(m)}) + \mu_X^{(m)}$ for the situation at hand. As seen in [3], [4], and [5], the estimator (6) with $M > 1$ increases performance in PSNR compared to the linear estimator obtained when $M = 1$. Though the above scheme has been shown to increase PSNR compared to several previous methods [6], [7], [2], a few remarks about its disadvantages can be made.

• The formulation in (1) and (2) leaves us with a density estimation problem in order to obtain $f_Z(z)$. In [3], [4], and [5], the density was obtained by means of the EM algorithm, which is a maximum likelihood (ML) method and therefore does not necessarily minimize the MSE. A consistent treatment, where all parameters of the final estimator are obtained by minimizing the MSE, would be preferable. However, an MMSE formulation for finding the $\rho^{(m)}$, $\mu^{(m)}$, and $C^{(m)}$ of the estimator (6) would be an extremely difficult problem.

• The estimator (6) has a high on-line computational complexity, when $X$ is estimated from $Y$. The quadratic forms in the exponents of $f^{(m)}(Y)$ contribute significantly to the computational complexity.

• In [3], [4], and [5], a vector $Z$ with 64 dimensions was employed. It would be desirable to increase the number of dimensions of $Y$ substantially, but with the estimator (6), this would be expensive due to matrix multiplications, both for estimator optimization and for on-line computations.

To summarize, the benefit of a simplification of the estimator (6) is threefold: the parameters may be obtained in the MMSE sense, on-line complexity is reduced, and the estimate may be based on more information. In what follows, we rephrase the problem stated in (1) and (2) with this in mind. We form the new estimate

$$\hat{X} = g(Y, \theta) \tag{8}$$

where $g(y, \theta)$ is the estimator function with parameters $\theta$. Optimal parameters $\theta^*$ are found by minimizing the MSE risk

$$\theta^* = \arg\min_{\theta} E\left(\|X - g(Y, \theta)\|^2\right). \tag{9}$$

In contrast to the parameters of the estimator (6), which depend on ML estimation, the parameters of (8) are obtained in the MMSE sense.

As PSNR is a standard measure of video quality [7], [6], [3], [4], [5], and a function of the MSE, the MSE risk was chosen. By minimizing the MSE, the PSNR is maximized.

For the proposed estimator, we choose a form that is heavily inspired by the solution in (6). The means $\mu_X^{(m)}$ and $\mu_Y^{(m)}$ are removed, the matrices $C_{XY}^{(m)} (C_{YY}^{(m)})^{-1}$ are replaced by matrices $A^{(m)}$, the functions $\pi^{(m)}$ are replaced by simpler functions $\omega^{(m)}$, and two subsets of $Y$ are employed in the estimator (8): $Y_C$ for classification in the a posteriori weights (C stands for classification), and $Y_P$ for prediction (P stands for prediction). By introducing $Y_C$ and $Y_P$, more information may be used in the new simple a posteriori weights $\omega^{(m)}$ than for the prediction. We now have

$$g(Y; \theta) = \sum_{m=1}^{M} \omega^{(m)}(Y_C)\, A^{(m)} Y_P \tag{10}$$

$$\omega^{(m)}(Y_C) = \frac{\rho^{(m)} h^{(m)}(Y_C)}{\sum_{k=1}^{M} \rho^{(k)} h^{(k)}(Y_C)} \tag{11}$$

$$h^{(m)}(Y_C) = \exp\left( -\frac{\|Y_C^{(m),1} - Y_C^{(m),2}\|^2}{c^{(m)} D^{(m)}} \right) \tag{12}$$

where $Y_C^{(m),1}$ and $Y_C^{(m),2}$ are vectors containing elements of $Y_C$, $c^{(m)} > 0$ is a scalar, and $D^{(m)}$ is the dimension of the vectors $Y_C^{(m),1}$ and $Y_C^{(m),2}$. In the discussion of the GMM-based estimator, we concluded that different a posteriori weights focus on different situations of video correlation. In accordance with this, the vectors $Y_C^{(m),1}$ and $Y_C^{(m),2}$ should be chosen so that a specific situation of video correlation is given priority. For example, in order to generate a mixture component that focuses on spatial correlation, $Y_C^{(m),1}$ and $Y_C^{(m),2}$ should be chosen so that the exponent of (12) incorporates the difference of the values of many spatially neighboring pixels. The exponent of (12) may be represented as

$$-\frac{\|Y_C^{(m),1} - Y_C^{(m),2}\|^2}{c^{(m)} D^{(m)}} = -Y_C^T W^{(m)} Y_C \tag{13}$$

where the matrices $W^{(m)}$ are not positive definite. This means that the new mixture (10) is not GMM-based, though it has the essential functionality of the estimator (6). Also, (10) is easily optimized in the MMSE sense, it reduces on-line complexity compared to (6), and it may take a $Y_C$ with a high number of dimensions without severely increasing the computational complexity. The parameters $\theta$ are $\{\rho^{(m)}, c^{(m)}, A^{(m)}\}$. Though the estimation of $X$ from $Y$ is performed on-line, the optimal estimator parameters $\theta^*$ are found off-line.
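As a rough illustration of how the estimator (10) with the weights (11) and the scores (12) could be evaluated on-line, the following Python sketch may help. It is not the authors' implementation. It assumes that the score of component m is the exponential of the negative squared difference norm divided by c^(m) D^(m), and the encoding of the two comparison vectors as a list of index pairs, as well as all function and variable names, are hypothetical.

```python
import math

def posterior_weights(y_c, pairs, rho, c):
    """Weights of (11), built from the difference-based scores of (12).

    pairs[m] lists index pairs (i, j) into y_c. This is a hypothetical
    encoding of the two comparison vectors of mixture component m.
    """
    log_scores = []
    for m, idx in enumerate(pairs):
        sq_dist = sum((y_c[i] - y_c[j]) ** 2 for i, j in idx)
        dim = len(idx)  # plays the role of D^(m)
        log_scores.append(math.log(rho[m]) - sq_dist / (c[m] * dim))
    top = max(log_scores)  # shift by the maximum so that not every term underflows
    scores = [math.exp(s - top) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]

def estimate(y_c, y_p, pairs, rho, c, A):
    """Estimator (10): weighted sum of the linear predictions A^(m) y_p."""
    w = posterior_weights(y_c, pairs, rho, c)
    x_hat = [0.0] * len(A[0])
    for w_m, A_m in zip(w, A):
        for r, row in enumerate(A_m):
            x_hat[r] += w_m * sum(a * y for a, y in zip(row, y_p))
    return x_hat

# Toy data (made up): three similar "spatial" pixels and one distant "temporal" pixel.
y_c = [100.0, 102.0, 101.0, 180.0]
pairs = [[(0, 1), (1, 2)], [(0, 3)]]   # component 0: spatial, component 1: temporal
rho, c = [1.0, 1.0], [1.0, 1.0]
A = [[[1.0, 0.0]], [[0.0, 1.0]]]       # one output pixel, two prediction pixels
y_p = [50.0, 70.0]
weights = posterior_weights(y_c, pairs, rho, c)
x_hat = estimate(y_c, y_p, pairs, rho, c, A)
```

In this toy case the spatial differences are small and the temporal difference is large, so almost all weight goes to component 0 and the estimate follows its linear predictor. Note that no matrix inversion or quadratic form over a full covariance matrix is evaluated on-line, which is the source of the complexity reduction discussed above.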

2.2. Algorithm for solving the MMSE estimation problem

Since the MMSE problem (9) with the proposed estimator (10) does not have a closed-form solution, an algorithm for solving (9) with (10) iteratively is now proposed. We solve iteratively for the parameters $A^{(m)}$, $\rho^{(m)}$, and $c^{(m)}$. The algorithm increases PSNR in every iteration.

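The parameters $A^{(m)}$. The matrices $A^{(m)}$ are first considered. It is easy to show that the problem of finding

$$A^{(r)*} = \arg\min_{A^{(r)}} E\left[\|X - g(Y, \theta)\|^2\right] \tag{14}$$

is convex. The proof will be given in a longer journal paper. By setting the derivative with respect to $A^{(r)}$ equal to zero, and thereafter solving for $A^{(r)}$, we obtain

$$A^{(r)*} = R_1 (R_2)^{-1} \tag{15}$$

$$R_1 = E\left[ \omega^{(r)}(Y_C)\, X Y_P^T - \sum_{m=1,\, m \neq r}^{M} \omega^{(r)}(Y_C)\, \omega^{(m)}(Y_C)\, A^{(m)} Y_P Y_P^T \right] \tag{16}$$

$$R_2 = E\left[ \left(\omega^{(r)}(Y_C)\right)^2 Y_P Y_P^T \right]. \tag{17}$$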
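The closed-form update (15)-(17) can be illustrated with a small Python sketch. To keep it short, it assumes that both X and Y_P are scalars, so that R_1 and R_2 are scalars and the matrix inverse in (15) becomes a division; in the vector case the same sample averages would be formed with outer products. Expectations are replaced by arithmetic means over training samples. The function name and data layout are hypothetical.

```python
def update_a(samples, weights, a, r):
    """Sample version of (15)-(17) for scalar X and scalar Y_P (an assumption).

    samples: list of (x, y_p) training pairs
    weights: list of per-sample weight lists, weights[n][m] = weight of component m
    a:       current scalar coefficients A^(m), one per mixture component
    r:       index of the component whose coefficient is re-estimated
    """
    n = len(samples)
    r1 = 0.0
    r2 = 0.0
    for (x, y_p), w in zip(samples, weights):
        # Contribution of all other components, which are held fixed.
        others = sum(w[m] * a[m] for m in range(len(a)) if m != r)
        r1 += w[r] * (x * y_p - others * y_p * y_p)   # term inside (16)
        r2 += w[r] ** 2 * y_p * y_p                   # term inside (17)
    return (r1 / n) / (r2 / n)                        # scalar form of (15)

# Made-up data generated exactly by a two-component model with A = [2.0, -1.0].
true_a = [2.0, -1.0]
weights = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
y_values = [1.0, -2.0, 3.0, 0.5]
samples = [((w[0] * true_a[0] + w[1] * true_a[1]) * y, y)
           for w, y in zip(weights, y_values)]
a = [0.0, -1.0]            # component 0 starts from a wrong value
a[0] = update_a(samples, weights, a, 0)
```

Because the data in this toy example follow the model exactly and the other coefficient is already correct, one update of component 0 recovers its true value.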

The parameters $\rho^{(m)}$ and $c^{(m)}$. Since the MSE is not convex in $\rho^{(m)}$, these parameters are updated by searches in the space of possible vectors $\rho = [\rho^{(1)}, \ldots, \rho^{(M)}]^T$. In each iteration, the MSE is compared at the point describing the parameter set $\rho$ obtained in the previous iteration, and in the two points $\rho \pm \delta\, [\rho^{(1)} a^{(1)}, \ldots, \rho^{(M)} a^{(M)}]^T$, where $\delta$ is some scalar, and $a$ is a normalized random vector with elements $a^{(1)}$ to $a^{(M)}$. The point that yields the minimum value of the MSE is chosen as the new $\rho$. This algorithm is faster than gradient descent with backtracking line search, since many evaluations of the MSE are avoided. The parameters $c^{(m)}$ are updated in the same way.

For optimization in practice, the expectations are replaced by arithmetic means.
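One step of the three-point random search described above might look as follows in Python. This is a sketch under assumptions: the text only says that a is a normalized random vector, so drawing it from a Gaussian and scaling it to unit Euclidean norm is a choice made here, and the callable `mse` stands in for the arithmetic-mean MSE over the training data.

```python
import math
import random

def random_search_step(rho, mse, delta=0.1, rng=random):
    """Compare the MSE at rho and at the two perturbed points, keep the best.

    rho:   current parameter vector (list of positive floats)
    mse:   callable returning the training MSE for a parameter vector
    delta: step scalar (0.1 is the value quoted in the experiments section)
    rng:   source of randomness with a gauss() method
    """
    # Assumed construction of the normalized random direction a.
    a = [rng.gauss(0.0, 1.0) for _ in rho]
    norm = math.sqrt(sum(v * v for v in a)) or 1.0
    a = [v / norm for v in a]
    # Element-wise scaled step: delta * rho^(m) * a^(m).
    step = [delta * r * v for r, v in zip(rho, a)]
    candidates = [
        list(rho),                               # point from the previous iteration
        [r + s for r, s in zip(rho, step)],      # rho + step
        [r - s for r, s in zip(rho, step)],      # rho - step
    ]
    return min(candidates, key=mse)

# Toy objective (made up) standing in for the training MSE.
toy_mse = lambda p: (p[0] - 3.0) ** 2 + (p[1] - 1.0) ** 2
rng = random.Random(0)
rho = [1.0, 1.0]
history = [toy_mse(rho)]
for _ in range(200):
    rho = random_search_step(rho, toy_mse, 0.1, rng)
    history.append(toy_mse(rho))
```

Since the previous point is always among the three candidates, the MSE cannot increase from one step to the next. With a unit-norm a and delta below 1, each element is multiplied by a factor between 1 - delta and 1 + delta, so positive parameters stay positive. Each step costs three MSE evaluations at most, which is where the saving over a backtracking line search comes from.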

3. EXPERIMENTS

In this section, the proposed method is evaluated and compared to methods suggested by other authors. Details of the simulations, which are chosen to fit state-of-the-art block-based video coders, are given in Section 3.1. These conditions are impartial to all the compared schemes. Results of the experiments are presented in Section 3.2.

3.1. Simulation prerequisites

Coding and packetization. We focus on predictively coded frames (P-frames); an application of the proposed method to restoration of intra-coded frames (I-frames) is completely analogous. MVs are calculated for 8×8-blocks with a search range of 8 pixels for each component. The coder works in the limit of perfect quantization. Each row of 16×16-blocks of pixels is divided into 8×8-blocks of pixels that are interleaved into two packets, so that if a packet is lost, there is a high probability that surrounding pixels are available. Concerning the MVs, two situations are investigated. In the first scenario, the MVs are coded with their respective pixel information in the same packet, and when a packet is lost, lost MVs are estimated by the median of MVs of the neighboring blocks [2]. In the second scenario, MVs are protected in a high priority layer, and considered available. These assumptions are similar to the assumptions in [7].

Errors. The packets are randomly assigned as lost. Simulations are run for loss probabilities ranging from 0.05 to 0.3. In the case of MVs protected in a high priority layer, only packets containing pixel information are lost.
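The simulation conditions above can be mimicked with a few lines of Python. The sketch below makes several assumptions that the text does not fix: the interleaving of 8×8-blocks into two packets is taken to be a checkerboard pattern, the median of the neighboring MVs is taken component by component, and packets are lost independently with a fixed probability. All names are hypothetical.

```python
import random
from statistics import median

def packet_of_block(row, col):
    """Assign an 8x8 block to packet 0 or 1 (assumed checkerboard interleaving)."""
    return (row + col) % 2

def draw_lost_packets(n_packets, loss_prob, rng=random):
    """Mark each packet as lost independently with probability loss_prob (assumption)."""
    return [rng.random() < loss_prob for _ in range(n_packets)]

def median_mv(neighbor_mvs):
    """Estimate a lost MV as the component-wise median of the neighbors' MVs."""
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    return (median(xs), median(ys))

# With a checkerboard pattern, the four nearest neighbors of a block
# always travel in the other packet of the pair.
neighbors_differ = all(
    packet_of_block(r, c) != packet_of_block(r + dr, c + dc)
    for r in range(1, 5) for c in range(1, 5)
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]
)

# Made-up MVs of three available neighbors of a lost block.
estimated_mv = median_mv([(1, 2), (3, -4), (2, 0)])
```

Under this assumed pattern, losing a single packet leaves the four nearest neighbors of every lost block intact, which matches the stated aim that surrounding pixels are available with high probability. With an even number of neighbors, `statistics.median` averages the two middle values, which may yield a non-integer MV component.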

Data. We use the luminance component of 124 MPEG-1 movies from [8]. The movies are divided into two independent sets, one for off-line optimization of the parameters $\theta$, and another for evaluation. In order to show the robustness of our scheme, we use more movies for the evaluation than for the training. The sets used for parameter optimization and evaluation contain 35 and 89 randomly selected movies respectively.

Benchmarking. The proposed estimator is compared to other schemes that mix spatial and temporal information given the MVs: Shirani et al.'s method [6], Zhu et al.'s method [7], and GMM-based error concealment [3], [4], [5], as well as to motion-compensated copying [2]. All methods use the same MV information.

Proposed estimator. Each lost 8×8-block is repaired by splitting it into four 4×4-blocks whose pixels are represented by $X$. The surrounding pixels represented by $Y_C$ and $Y_P$ will vary in the experiments, and will be described for each experiment. We choose to work with few mixture components since we strive for low on-line complexity. A mixture with $M = 2$ components is investigated. Mixture component 1 focuses on spatial correlation by employing $Y_C^{(1),1}$ and $Y_C^{(1),2}$ such that all possible differences between closest spatial neighbors in $Y_C$ are included in the exponent. In the same way, mixture component 2 focuses on temporal correlation by employing $Y_C^{(2),1}$ and $Y_C^{(2),2}$ such that all possible differences between closest temporal neighbors in $Y_C$ are included in the exponent.

Varying available information. Spatially surrounding pixels may not be available, because the block in question is a border block, or because several consecutive blocks are lost. Different models are obtained and stored for each of these cases. In the case when no spatially surrounding information is available, we reduce our estimator to the special linear MMSE solution to (9) and (10), $A^{(1)} = E(X Y_P^T)\,(E(Y_P Y_P^T))^{-1}$, that is obtained when $M = 1$, and now $Y_P$ only represents pixels in the previous frame. By assuming mirror invariance of the model, only four model cases need to be stored.

Off-line parameter estimation. The parameters of the estimator are found off-line. A choice of $M = 2$ mixture components was previously made. The parameters $\rho = [\rho^{(1)}, \rho^{(2)}]^T$ and $c = [c^{(1)}, c^{(2)}]^T$ are initialized to $[1, 1]^T$, and the parameters $A^{(m)}$ are initialized by the linear MMSE solution to (9) and (10) obtained when $M = 1$, i.e. $A^{(m)} = E(X Y_P^T)\,(E(Y_P Y_P^T))^{-1}$. For the update of $\rho^{(m)}$ and $c^{(m)}$ we choose $\delta = 0.1$. In each iteration, 1 470 000 realizations of $Z^T = [X^T, Y^T]$ are used. In each of the 10 first iterations, two iterations are performed for $\rho^{(m)}$ and $c^{(m)}$ respectively, as well as one iteration for the $A^{(m)}$. In the ten final iterations, only the $A^{(m)}$ are updated.
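The update schedule of the off-line optimization can be written down as a short Python sketch. The counts are taken from the description above; the order of the updates within one iteration is not stated there, so the order used here (first the a priori weights, then the scalars c, then the matrices A) is an assumption, and the string labels are placeholders for the actual update routines.

```python
def training_schedule(joint_iters=10, final_iters=10, inner_updates=2):
    """Return the sequence of parameter updates as a list of labels.

    joint_iters:   iterations in which all parameter groups are updated
    final_iters:   trailing iterations in which only the matrices A are updated
    inner_updates: number of random-search updates per group and iteration
    """
    steps = []
    for _ in range(joint_iters):
        steps += ["rho"] * inner_updates   # a priori weights (assumed to come first)
        steps += ["c"] * inner_updates     # scalars c^(m)
        steps.append("A")                  # one closed-form update of the A^(m)
    steps += ["A"] * final_iters           # only the A^(m) are refined at the end
    return steps

schedule = training_schedule()
```

With the default values this gives 20 updates each for the weights, the scalars, and the matrices.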

3.2. Results

Estimator comparison. The purpose of this first experiment is to see that our estimator strategy yields higher performance in PSNR and lower computational complexity than the GMM-based estimator [3], [4], [5], when both methods use the same number of mixture components and the same information. In this experiment, we set $Y_C = Y_P$, as illustrated in Figure 1, and pixels surrounding a lost 8×8-block are guaranteed to be available. The results presented in Table 1 show that the proposed method gives better performance in PSNR, with around a third of the computational complexity.

Error concealment comparison. The proposed estimator (10) is compared to different error concealment schemes. In preliminary simulations, the approach with $Y_C$ and $Y_P$ chosen as in Figure 2 was compared to the approach with $Y_C$ and $Y_P$ chosen as in Figure 1. Since $Y_C$ and $Y_P$ chosen as in Figure 2 yielded better results while maintaining low complexity, this approach was chosen for comparison to other previous error concealment methods. Note that by choosing $Y_C$ as in Figure 2, significantly more information is used for the classification than if $Y_C$ had been chosen as in Figure 1. In the experiments, pixels surrounding a lost 8×8-block were not guaranteed to be available. The errors propagate in a few tens of frames in each movie. Figure 3 presents the results for the case when the MVs are not available, and replaced by the median of the MVs of the surrounding blocks. In Figure 4, we see the results in the case when the MVs are available. The proposed method gives the best performance in PSNR. The GMM-based method with $M = 20$ gives a comparable result, but this comes at a cost of around 26 times higher on-line computational complexity than the proposed method. A GMM-based estimator with $M = 2$ has 2.6 times higher computational complexity than the proposed scheme. Examples in a longer journal paper will show that our method also improves subjective visual performance.
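Since all results are reported in PSNR while the estimator is optimized for MSE, a small Python sketch of the relation between the two may be useful. It assumes 8-bit luminance with a peak value of 255, which the text does not state explicitly, and the function name is hypothetical.

```python
import math

def psnr(mse, peak=255.0):
    """PSNR in dB for a given mean square error (peak=255 assumes 8-bit pixels)."""
    return 10.0 * math.log10(peak * peak / mse)

# PSNR is strictly decreasing in the MSE, so a lower MSE always means a higher PSNR.
example_db = psnr(65.025)   # 255^2 / 65.025 = 1000, i.e. 30 dB
```

Under this definition, a difference of 0.3 dB between two methods corresponds to an MSE ratio of about 10^0.03, i.e. roughly 7 percent.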

Fig. 1. Illustration of variables to be used with the proposed estimator in the estimator comparison. Blocks of size 8×8 are divided into four 4×4-blocks that are estimated separately. The vector X is lost at the decoder side, and is estimated by using a vector of surrounding pixels Y_C = Y_P. [Diagram not reproduced; recoverable labels: Frame t-1, Frame t, 12 pixels, 4 pixels, X, Y_C = Y_P.]

Fig. 2. Illustration of variables to be used with the proposed estimator in the error concealment comparison. Blocks of size 8×8 are divided into four 4×4-blocks that are estimated separately. The vector X is lost at the decoder side, and is estimated by using vectors of surrounding pixels Y_C and Y_P. [Diagram not reproduced; recoverable labels: X, Y_P.]

MV             Type of test   Proposed method, M = 2   GMM, M = 1   GMM, M = 2
Not available  Closed         32.1                     31.6         31.7
Not available  Open           31.4                     31.0         31.0
Available      Closed         34.0                     33.7         33.7
Available      Open           33.4                     33.1         33.1

Table 1. Comparison of results in PSNR (dB) for the proposed method, and the GMM-based method. The number M is the number of mixture components.

Fig. 3. Comparisons of different error concealment schemes, in the case when the MVs are not available, and replaced by the median of the MVs of the surrounding blocks. [Plot not reproduced; horizontal axis: loss rate percentage, 5 to 30; recoverable legend entries: Proposed method, M = 2; GMM, M = 20; GMM, M = 2; Zhu et al.'s method; Shirani et al.'s method; Motion-compensated copying.]

Fig. 4. Comparisons of different error concealment schemes, in the case when the MVs are available. [Plot not reproduced; horizontal axis: loss rate percentage, 5 to 30; legend entries: Proposed method, M = 2; GMM, M = 20; GMM, M = 2; GMM, M = 1; Zhu et al.'s method; Shirani et al.'s method; Motion-compensated copying.]

4. CONCLUSION

In this paper, MMSE- and mixture-based error concealment that may be run in real-time is presented. For a motivation of the new technique, we use as our starting point an estimator methodology based on GMMs and probability density estimation by means of the EM algorithm. All parameters of the new estimator are easily obtained off-line in the MMSE sense. The proposed estimator gives better performance in PSNR than the GMM approach when using the same data and number of mixture components. At the same time, the proposed estimator has lower computational complexity. Moreover, the new technique may, without substantial increase in computational complexity, incorporate a much bigger surrounding to the lost block as input to the estimator. We finally show that the proposed method gives an important increase in performance compared to a range of other well-known previous error concealment methods.

5. REFERENCES

[1] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: a review," Proc. IEEE, vol. 86, pp. 974-997, May 1998.

[2] P. Haskell and D. Messerschmitt, "Resynchronization of motion compensated video affected by ATM cell loss," in Proc. ICASSP, Mar. 1992, pp. 545-548.

[3] D. Persson and P. Hedelin, "A statistical approach to packet loss concealment for video," in Proc. ICASSP, Mar. 2005, pp. 293-296.

[4] D. Persson, T. Eriksson, and P. Hedelin, "Qualitative analysis of video packet loss concealment with Gaussian mixtures," in Proc. ICASSP, May 2006, pp. II-961 - II-964.

[5] D. Persson, T. Eriksson, and P. Hedelin, "Statistical packet video error concealment," IEEE Trans. Image Processing, in review.

[6] S. Shirani, F. Kossentini, and R. Ward, "A concealment method for video communications in an error-prone environment," IEEE J. Select. Areas Commun., vol. 18, pp. 1122-1128, June 2000.

[7] Q.-F. Zhu, Y. Wang, and L. Shaw, "Coding and cell-loss recovery in DCT-based packet video," IEEE Trans. Circuits Syst. Video Technol., vol. 3, pp. 248-258, June 1993.

[8] "Prelinger archives," http://www.archive.org/details/prelinger, Online resource.
