A Nonlinear, Image-content Dependent Measure of Image Quality


1977-11-28


Gösta H. Granlund

INTERNSKRIFT LiTH-ISY-I-0189


ABSTRACT

In recent years, considerable research effort has been devoted to the development of useful descriptors for image quality. The attempts have been hampered by an incomplete understanding of the operation of the human visual system. This has made it difficult to relate physical measures and perceptual traits.

A new model for the determination of image quality is proposed. Its main feature is that it tries to take image content into consideration. The model builds upon a theory of image linearization, which means that the information in an image can be represented well enough using linear segments or structures within local spatial regions and frequency ranges. This also implies that the information in an image has to do with one-dimensional correlations. This makes it possible to separate image content from noise in images, and to measure them both.

A hypothesis is also proposed that the human visual system does in fact perform such a linearization.


INTRODUCTION

It is a truism that a good picture is better than a bad picture, but it is by no means clear what criteria to use to decide if a picture is good or bad. There are two major sets of factors which govern decisions about image quality. One set of factors relates to the physics of light, optics and the performance of photosensitive materials and devices. The second set is related to psychophysics and vision. It has been difficult to establish links between subjective and objective measures of image quality. For a review of the field, see [1,2,3].

The presented model for image quality builds upon a principle of image linearization. The first aspect of this is the assumption that the information in an image can be represented well enough using linear segments or structures within local spatial regions and frequency ranges. The second aspect is the hypothesis that the human visual system does in fact perform such a linearization.

For the quality measure to be useful, it is only necessary that the first requirement be fulfilled. The theory developed, and the experiments performed with an operator based on such a linearization, support this assumption [4].

If, on the other hand, the hypothesis of a linearization in the visual system is also correct, we would have a model for measurements of image quality that is directly related to a part of the model of visual perception.


A THEORY OF LINEARIZATION OF IMAGES

This section proposes the theory that images can be linearized locally; that is, within a local spatial region and a certain frequency range there is a major variation in one direction only. Thus a description of the image can be given using only the amplitudes in the main directions, together with information about these directions. Compared to a complete representation, e.g. Fourier transforms with all directions represented, this gives a considerable reduction of the information.

The assumption of local linearization has been used as a basis for a picture processing operator [4]. The important characteristic of the operator is the use of complex fields. The basic function of the operator is that, in a local region of the image, it describes the situation as a magnitude and a direction, that is, a two-dimensional vector. The magnitude of the vector should indicate the amount of this variation, such as the step size of an edge or the density of a line. As indicated in [4], it has been found that the direction of the vector ought to be not the direction angle where the maximum is found, but this angle multiplied by a factor of two. This makes two directions that differ by 180 degrees, and thus describe the same orientation, map onto the same vector. See Figure 2.

In [4] the operator was formulated in Fourier terms. As indicated in that paper, there is no particular reason for using a Fourier formulation, except that it inherently gives certain useful relationships between properties in the spatial and frequency domains, especially concerning the function of the windows.
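The factor-of-two convention can be illustrated with a minimal sketch. It assumes each local estimate is already available as a magnitude and an orientation angle in radians; `orientation_vector` and `mean_orientation` are hypothetical helpers written for this illustration, not part of the operator in [4].

```python
import cmath
import math
from typing import Iterable, Tuple


def orientation_vector(magnitude: float, angle: float) -> complex:
    """Encode a local line/edge estimate as a complex number whose
    argument is twice the orientation angle (double-angle convention)."""
    return magnitude * cmath.exp(2j * angle)


def mean_orientation(vectors: Iterable[complex]) -> Tuple[float, float]:
    """Average double-angle vectors; return (coherence, orientation in [0, pi))."""
    vs = list(vectors)
    total = sum(vs) / len(vs)
    angle = (cmath.phase(total) / 2.0) % math.pi
    return abs(total), angle


if __name__ == "__main__":
    # The same line described at 30 and at 210 degrees gives one vector.
    a = orientation_vector(1.0, math.radians(30))
    b = orientation_vector(1.0, math.radians(210))
    print(abs(a - b) < 1e-12)  # True

    # Perpendicular lines cancel: no single dominant orientation.
    c = orientation_vector(1.0, 0.0)
    d = orientation_vector(1.0, math.pi / 2)
    print(abs(c + d) < 1e-12)  # True
```

Averaging unit vectors at the raw angles 0.1 and 0.1 + π would give zero even though both describe the same line; in the doubled representation they reinforce each other, and `mean_orientation` recovers 0.1.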

It is also proposed here, as a hypothesis, that local linearization is a process that takes place in the visual system.


It is beyond the scope of this presentation to discuss the reasons for this in detail. It can only be said that the hypothesis is supported by present knowledge of the visual system [5]. A number of discovered features of the visual system are indicative, such as orientation-sensitive line and edge detectors, hypercomplex cells sensing lines in a certain direction over a larger region, several separate spatial frequency channels, certain visual illusions, etc. Many of these features have direct parallels in the parallel hierarchical operator structure suggested in [4].

There are other compelling indications in favour of a linearization in effective picture processing systems. One indication is the present situation in linguistic pattern recognition. The body of knowledge of formal grammars has been developed for one-dimensional grammars, or one-dimensional strings [6]. It has proved most difficult to extend the theory to two dimensions, except for certain special cases [7]. The present grammars build upon the restriction of a linear ordering of events.

The hypothesis suggested here is that the visual system performs a linearization on higher levels too, although there the events are local not in ordinary two-dimensional space but in a more complex concept space. One way to cover a two-dimensional space using a linear operation is to have an ordering rule for the applications of the linear operation. One familiar rule is the one used for scanning a picture, starting e.g. from the upper left corner and moving right, line by line. A priori, there is no preferred way to order points in a two-dimensional space.

A hypothesis suggested here is that the analysis of two-dimensional structures is performed using particular ordering rules together with ordinary one-dimensional grammars. It would then seem plausible that we have hierarchies of linear strings, where a string at a lower level forms a terminal at the higher level. See Figure 1. The space in which the structure is located is partly two-dimensional object space, partly some concept space.

It seems that we would have information flowing in two directions. See Figure 2. From higher to lower levels we would have feedback from adjacent terminals (strings on the lower level), giving contextual information that determines the ordering rule to be used. At the lower level, a string would be formed using the prescribed ordering rule, and a terminal corresponding to this string would be forwarded to the higher level.
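The two-way flow can be made concrete with a toy sketch. Everything in it is a hypothetical illustration rather than a model taken from the text: ordering rules are plain scan orders over labelled points, a lower-level string becomes a higher-level terminal simply by being wrapped as one symbol, and the contextual choice of rule is stood in for by a rule that scans along the longer extent of each region.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[int, int]    # (x, y), y growing downwards
Region = Dict[Point, str]  # labelled points of one lower-level region


def raster_order(points: List[Point]) -> List[Point]:
    """Upper-left corner first, moving right, line by line."""
    return sorted(points, key=lambda p: (p[1], p[0]))


def column_order(points: List[Point]) -> List[Point]:
    """Top to bottom, column by column."""
    return sorted(points, key=lambda p: (p[0], p[1]))


ORDERING_RULES: Dict[str, Callable[[List[Point]], List[Point]]] = {
    "raster": raster_order,
    "column": column_order,
}


def choose_rule(region: Region) -> str:
    """Hypothetical stand-in for contextual feedback: scan along the longer extent."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return "raster" if max(xs) - min(xs) >= max(ys) - min(ys) else "column"


def lower_level_string(region: Region, rule: str) -> str:
    """Linearize a 2-D region into a 1-D string with the prescribed rule."""
    return "".join(region[p] for p in ORDERING_RULES[rule](list(region)))


def higher_level_string(regions: List[Region]) -> str:
    """Each lower-level string is forwarded as one terminal of the higher level."""
    terminals = ["<" + lower_level_string(r, choose_rule(r)) + ">" for r in regions]
    return " ".join(terminals)


if __name__ == "__main__":
    square = {(0, 0): "a", (1, 0): "b", (0, 1): "c", (1, 1): "d"}
    tall = {(5, 0): "x", (5, 1): "y", (5, 2): "z"}
    print(higher_level_string([square, tall]))  # <abcd> <xyz>
```

The same region yields different strings under different rules (`"abcd"` for raster order, `"acbd"` for column order), which is why the text lets higher-level context prescribe the rule.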

As we have a feedback system, there will be conditions for instability, especially if in certain parts of the structure there are conditions that change, either in the input information or in the contextual information available. The interpretation of the pattern will then be identical to an identification of the hierarchical linguistic structure of the pattern. In this model, visual illusions that switch between two interpretations imply that two linguistically acceptable structures are available, but that a minor change of contextual information at the highest levels, e.g. involving experience or attention, makes one structure preferable to the other at a given moment.

Undoubtedly there is a way of linguistically dealing with two-dimensional information that is employed by effective visual systems. In the choice between a complete and general two-dimensional grammar and a linear but hierarchical grammar, the latter alternative seems more plausible. In order to deal with events in general three-dimensional space, it seems that an extension of the linearization can be used. The same discussion can also be applied to a fourth dimension, time.

Figure 1. Illustration of the structure of a hierarchical …

Figure 2. Illustration of the interaction between levels for a hierarchical grammar.

Another indication in favour of local linearization is the current status of the field of two-dimensional recursive filters [8]. It has been found that recurrent lateral reciprocal inhibition is employed in visual systems [5]. This gives rise to a system that is in effect a recursive filter. However, two-dimensional recursive filters are blessed by a curse: a polynomial in two variables cannot, in general, be factored into a product of first-order factors. For technical applications this makes the stability test in two dimensions extremely cumbersome. Furthermore, it implies that a general two-dimensional recursive filter cannot be realized as a combination of low-order filters to reduce the effects of quantization and roundoff noise. In this case, too, the concept of linearization would potentially solve the problems. The use of one-dimensional structures, corresponding to polynomials of one variable only, would allow a factorization into a product of first-order factors with well-defined stability. The choice of variable would be determined by the contextual information at higher levels, as discussed earlier in connection with linguistic operators.
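A minimal sketch of the factorization argument, assuming a one-dimensional recursive filter specified by real poles. Each first-order section $y[i] = x[i] + a\,y[i-1]$ is stable exactly when $|a| < 1$, so stability of the cascade is checked factor by factor. `filter_along` and its `direction` argument are hypothetical names standing in for the contextual choice of variable.

```python
from typing import List, Sequence

Image = List[List[float]]


def first_order_section(signal: Sequence[float], a: float) -> List[float]:
    """y[i] = x[i] + a * y[i-1]; transfer function 1 / (1 - a z^-1)."""
    if abs(a) >= 1.0:
        raise ValueError(f"unstable first-order factor: pole {a}")
    out: List[float] = []
    prev = 0.0
    for x in signal:
        prev = x + a * prev
        out.append(prev)
    return out


def cascade(signal: Sequence[float], poles: Sequence[float]) -> List[float]:
    """Realize a higher-order 1-D recursive filter as first-order factors."""
    out = list(signal)
    for a in poles:
        out = first_order_section(out, a)
    return out


def filter_along(image: Image, poles: Sequence[float], direction: str) -> Image:
    """Apply the 1-D cascade along rows or along columns of a 2-D image."""
    if direction == "rows":
        return [cascade(row, poles) for row in image]
    if direction == "columns":
        cols = [cascade(col, poles) for col in zip(*image)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("direction must be 'rows' or 'columns'")


if __name__ == "__main__":
    # Impulse response of two identical sections: (k + 1) * 0.5**k
    print(cascade([1.0, 0.0, 0.0], [0.5, 0.5]))  # [1.0, 1.0, 0.75]
```

Complex-conjugate pole pairs would need complex arithmetic in `first_order_section`; the sketch keeps to real poles for brevity.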

It should be observed that what has been discussed here concerning linguistic operators and recursive filters in effect applies to the same processing, as the linguistic operators most likely function as recursive digital filters.

The image quality measure presented in the next section is based primarily upon the assumption that images can be linearized locally. Secondly, it is based upon the hypothesis that such a linearization takes place in the visual system. If this is the case, it would be extremely advantageous, as we would have a model for measurements of image quality that is directly related to a part of the model of visual perception. It is well known that in measurement and recognition it is very important to use features which are fundamental to the processes to be described. As an example of the contrary, one can mention Fourier coefficients, which usually do not reflect the correct interrelationships in the system under observation.


MEASURES OF IMAGE QUALITY

A common measure of image fidelity or image quality between an object and an image has been the mean square difference between the radiance distribution $R_0(x,y)$ in a perfect replication of the object and $R_1(x,y)$ in the image.

The mean square difference is

$$\mathrm{MSE} = \iint \left[R_0(x,y) - R_1(x,y)\right]^2 dx\,dy$$

One can normalize the function by dividing through by

$$P = \iint \left[R_0(x,y)\right]^2 dx\,dy$$

A fidelity defect measure is now

$$\Phi_d = \frac{\mathrm{MSE}}{P} = \frac{\iint \left[R_0(x,y) - R_1(x,y)\right]^2 dx\,dy}{\iint \left[R_0(x,y)\right]^2 dx\,dy}$$
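For sampled images the integrals become sums over pixels. The sketch below is a minimal discrete version of $\Phi_d$, assuming both images are given as equally sized lists of rows of radiance values; the pixel area $dx\,dy$ multiplies numerator and denominator alike and therefore cancels.

```python
from typing import Sequence

Image = Sequence[Sequence[float]]


def fidelity_defect(original: Image, reproduction: Image) -> float:
    """Discrete Phi_d = MSE / P, with sums over pixels replacing the integrals."""
    mse = sum(
        (r0 - r1) ** 2
        for row0, row1 in zip(original, reproduction)
        for r0, r1 in zip(row0, row1)
    )
    power = sum(r0 ** 2 for row in original for r0 in row)
    if power == 0:
        raise ValueError("original image has zero energy; Phi_d is undefined")
    return mse / power


if __name__ == "__main__":
    obj = [[2.0, 0.0], [0.0, 2.0]]
    img = [[1.0, 0.0], [0.0, 2.0]]
    print(fidelity_defect(obj, img))  # 0.125
```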

It has been argued, quite correctly, that the MSE is not a good measure of perceived image quality, because the eye is not sensitive to absolute luminance values but rather to the appearance of particular features such as edges. For that reason it seems that, first of all, a better measure of image fidelity for visual purposes would be to compare the derivatives of the object, $R_0'(x,y)$, and of the image, $R_1'(x,y)$.

The corresponding fidelity defect measure would then be

$$\Phi_d' = \frac{\mathrm{MSE}'}{P'} = \frac{\iint \left[R_0'(x,y) - R_1'(x,y)\right]^2 dx\,dy}{\iint \left[R_0'(x,y)\right]^2 dx\,dy}$$
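The difference between $\Phi_d$ and $\Phi_d'$ shows up clearly when the reproduction is only a uniformly brighter copy of the object. The sketch below assumes that $R'$ is approximated by horizontal and vertical forward differences pooled together; the text does not say which derivative is meant, so this choice is an assumption.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def forward_differences(image: Image) -> List[float]:
    """Horizontal and vertical forward differences, pooled into one list."""
    dx = [row[i + 1] - row[i] for row in image for i in range(len(row) - 1)]
    dy = [
        image[j + 1][i] - image[j][i]
        for j in range(len(image) - 1)
        for i in range(len(image[j]))
    ]
    return dx + dy


def ratio_defect(ref: Sequence[float], test: Sequence[float]) -> float:
    """sum (ref - test)^2 / sum ref^2."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / sum(a ** 2 for a in ref)


def derivative_fidelity_defect(original: Image, reproduction: Image) -> float:
    """Discrete Phi_d' computed on forward differences instead of radiances."""
    return ratio_defect(forward_differences(original), forward_differences(reproduction))


if __name__ == "__main__":
    obj = [[0.0, 1.0, 2.0], [1.0, 2.0, 3.0]]
    brighter = [[v + 10.0 for v in row] for row in obj]
    flat_obj = [v for row in obj for v in row]
    flat_brighter = [v for row in brighter for v in row]
    print(ratio_defect(flat_obj, flat_brighter))    # 600 / 19, roughly 31.6
    print(derivative_fidelity_defect(obj, brighter))  # 0.0
```

The luminance-based ratio penalizes the uniform offset heavily, while the derivative-based one ignores it, in line with the argument that the eye is not sensitive to absolute luminance.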


Still, this can be expected to be more of a physical measure than a valid perceptual measure.

With reference to the discussion in the preceding section, it seems that a more valid measure could be obtained if, rather than the image luminance, the output from the operator is compared between the object and the image. According to the theory, the operator output contains the parts of the image information that are essential to the visual system.

As indicated in [4], we can for every point $(x,y)$ in the image form a vector $\mathbf{R}(x,y)$, where

$$\mathbf{R}(x,y) = \begin{pmatrix} f^{(0)}(x,y) \\ f^{(1)}(x,y) \\ \vdots \\ f^{(n)}(x,y) \end{pmatrix}$$

and where each component $f^{(k)}(x,y)$, $k = 0, \dots, n$, is the transform of order $k$.

This vector has a certain similarity to a discrete frequency spectrum of the image. There are, however, a number of important differences. First of all, every vector represents the frequency content within a neighbourhood of a certain size around the point $(x,y)$. Second, it contains only the frequency component in the direction of maximum magnitude. Most important of all, the higher order components do not just contain lower frequencies but also transformations of structures revealed by differences in the lower order transformation products. This is quite important, as it can be anticipated that perceptual clues will be emphasized.

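The corresponding fidelity defect measure, comparing the operator outputs $\mathbf{R}_0(x,y)$ for the object and $\mathbf{R}_1(x,y)$ for the image, would be

$$\frac{\iint \left|\mathbf{R}_0(x,y) - \mathbf{R}_1(x,y)\right|^2 dx\,dy}{\iint \left|\mathbf{R}_0(x,y)\right|^2 dx\,dy}$$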

This is a difference between two complex image descriptions rather than between the frequency spectra of two images.

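In practice, it will most likely be necessary to adjust the weighting between different components. This can be done by replacing the vector $\mathbf{R}(x,y)$ by a vector $\mathbf{W}\mathbf{R}(x,y)$, where $\mathbf{W}$ is a weighting matrix

$$\mathbf{W} = \begin{pmatrix} w_0 & & 0 \\ & \ddots & \\ 0 & & w_n \end{pmatrix}$$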
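A discrete sketch of the operator-output comparison, assuming each point carries its vector $\mathbf{R}(x,y)$ as a list of complex components $f^{(0)}, \dots, f^{(n)}$ and that $\mathbf{W}$ is diagonal, so that weighting reduces to multiplying component $k$ by $w_k$ in both numerator and denominator. The example also shows why this is not a spectrum comparison: changing only the direction of a component, with its magnitude unchanged, still registers as a defect.

```python
from typing import Optional, Sequence

Field = Sequence[Sequence[complex]]  # one operator vector per image point


def vector_fidelity_defect(
    r0: Field, r1: Field, weights: Optional[Sequence[float]] = None
) -> float:
    """sum |W (R0 - R1)|^2 / sum |W R0|^2 over all points, W diagonal."""
    n = len(r0[0])
    w = [1.0] * n if weights is None else list(weights)
    num = sum(
        abs(wk * (a - b)) ** 2
        for v0, v1 in zip(r0, r1)
        for wk, a, b in zip(w, v0, v1)
    )
    den = sum(abs(wk * a) ** 2 for v0 in r0 for wk, a in zip(w, v0))
    return num / den


if __name__ == "__main__":
    obj = [[1 + 0j, 1j]]
    img = [[1 + 0j, -1j]]  # same magnitudes, different direction in component 1
    print(vector_fidelity_defect(obj, img))                       # 2.0
    print(vector_fidelity_defect(obj, img, weights=[1.0, 0.0]))   # 0.0
```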

The individual weights will have to be determined using correlations with perceptual measurements. Although the preceding measure may be closer to perceptual qualities, it is probably not close enough for all purposes, and a more appropriate measure ought to be defined.

Most available methods for image quality determination perceptually compare an image before and after a degradation whose physical properties are known. These degradations may be in terms of sharpness or noise. It would be desirable, however, to devise a measurement procedure that gives results similar to those obtained using perceptual measurements. This would also allow absolute quality measurements on a single picture.

It can be, and has been, argued whether a search for an algorithm giving image quality as a single number makes sense. We will not go into the details of this discussion here, except to observe that an important factor is whether the algorithm closely follows the real model. In producing a single image quality factor, we perform a mapping from a multidimensional to a one-dimensional space. Only if this mapping is performed in the same way in the measurement model as in the visual system can we hope for a high correlation between measurements and perceptual judgements. It is also apparent that the judgement of a picture is influenced by very subtle factors, such as image content and implications for the observer, and such factors will probably never be quantifiable. It will probably prove more elucidating to compute a number of different measures describing different quality aspects, and to present this set for a more flexible judgement.

The following discussion will mainly be devoted to various factors that can be expected to relate to image quality, and to questions of how these factors could be combined. As has been the case for other image quality measures, it will prove necessary to formulate various hypotheses about relationships and to test these against perceptual measurements.

It is well known that the image content affects the perception of an image. For example, noise is perceived differently in regions containing details such as edges than in regions of constant density.

With reference to the operator described in [4], the image content in point $(x,y)$ is defined as the vector

$$\mathbf{C}_o(x,y) = \begin{pmatrix} \left|f^{(1)}(x,y)\right| \\ \left|f^{(2)}(x,y)\right| \\ \vdots \\ \left|f^{(n)}(x,y)\right| \end{pmatrix}$$

where again $f^{(k)}(x,y)$ is the transform product of order $k$ [4].

It may be unwieldy to use a vector in arithmetical expressions for image quality, and thus a scalar value of image content, $C_o$, has been defined:

$$C_o(x,y) = \sum_{k=1}^{n} \log\left(u_k \left|f^{(k)}(x,y)\right|\right)$$

The factors $u_k$ are weights that will have to be determined empirically. Since a sum of logarithms is the logarithm of a product, this implies a multiplication of the magnitudes of the different components in the image transform vector. The effect of this multiplication is to emphasize points which are structurally complicated, as opposed to merely having high frequency content. As described in [4], high frequency content is not propagated to higher order transforms unless there is a structure in the high frequency pattern.
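A minimal sketch of the scalar image content, assuming the formula is read as $\sum_k \log(u_k |f^{(k)}|)$ and that the magnitudes for orders $k = 1, \dots, n$ are already available at a point. The `eps` floor is an added assumption that keeps the logarithm finite when a component vanishes. The example compares a point where every order responds moderately with one that only has a strong first-order (high frequency) response.

```python
import math
from typing import Sequence


def image_content(
    magnitudes: Sequence[float], weights: Sequence[float], eps: float = 1e-12
) -> float:
    """C_o = sum_k log(u_k * |f^(k)|); eps guards against log(0)."""
    return sum(math.log(max(u * m, eps)) for u, m in zip(weights, magnitudes))


if __name__ == "__main__":
    u = [1.0, 1.0, 1.0]
    structured = [1.0, 1.0, 1.0]        # responses at every order
    high_freq_only = [3.0, 1e-6, 1e-6]  # no structure propagated upwards
    print(image_content(structured, u))      # 0.0
    print(image_content(high_freq_only, u))  # about -26.5
```

Although the second point has the larger total magnitude, the product form ranks it far below the structured point.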

An obvious extension is to define the image content as an entropy, $C_e$, where

In this case the factors $f^{(k)}(x,y)$ have to be normalized … for equal values of $f^{(k)}(x,y)$ and small when they are very unequal.
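The entropy expression itself is not legible in this copy. Purely as an assumed stand-in, the sketch below uses the Shannon entropy of the magnitudes normalized to sum to one, which has the qualitative behaviour described: largest when all $|f^{(k)}|$ are equal, and small when one of them dominates.

```python
import math
from typing import Sequence


def content_entropy(magnitudes: Sequence[float]) -> float:
    """Assumed C_e: Shannon entropy (natural log) of normalized magnitudes."""
    total = sum(magnitudes)
    if total <= 0:
        return 0.0
    h = 0.0
    for m in magnitudes:
        p = m / total
        if p > 0:
            h -= p * math.log(p)
    return h


if __name__ == "__main__":
    print(content_entropy([1.0, 1.0, 1.0, 1.0]))  # log(4), about 1.386
    print(content_entropy([1.0, 0.0, 0.0, 0.0]))  # 0.0
```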

The intuitively appealing way in which image content can be described as an entropy is quite interesting from an information theory point of view. It may be useful to investigate ways of decorrelating the point-by-point entropies in order to find new measures of image information.

A number of other ways can be conceived for defining the scalar value as a function of the vector components, such as second moments, angular second moments, correlation, variance, difference measures etc. There is, however, presently no reason to believe that any of these measures would be advantageous.

Another important, but vague, feature of image quality is sharpness. It has been possible to decide fairly well how the perception of sharpness is degraded by physically well-defined factors such as the optical transfer function (OTF) of the lens system. It is much more difficult to give an absolute estimate of the sharpness of a single image. In particular, it is difficult to specify physical measurement procedures. Sharpness has to do with how well edges and lines are defined.

If the analyzed image is noise free, the sharpness can be deduced reasonably well from the Fourier spectrum of the image, in particular its high frequency part. It has been found, however, that the reproduction of the medium frequency ranges is also important for high subjective quality. The presence of components of different frequencies varies between images, and only the extent to which they are present in the analyzed image can be determined.


If the image contains noise, the situation becomes even more difficult. Ordinary test procedures cannot distinguish between high frequency image components and noise, because there is no way of taking structural correlations into account. The visual system, on the other hand, can distinguish very well between high frequency image structure and noise.

It has been suggested earlier that a measure of sharpness could be derived from measurements upon edges [1]. The reproduction of an ideal edge depends upon the point spread function of the system, which in turn leads to the modulation transfer function of the system. The relationship between parameters of the modulation transfer function and perceived quality has been fairly well mapped.

A difficulty in an automated procedure is finding appropriate edges and determining the corresponding frequency responses. It would be necessary to analyze the whole image and form some weighted sum of the results.

The proposed sharpness measure relates to the edge measure discussed earlier, but it defines the existence of an edge in terms of the operator output. It is defined as

$$S(x,y) = \left|\sum_{k=1}^{n} v_k\, f^{(k)}(x,y)\right|$$

This is the magnitude of the weighted sum of the complex coefficients from all transforms [4]. Here again we have a certain similarity to a discrete frequency spectrum whose components are added. Note again, however, that the higher order transforms contain products from structures described in the lower order transforms. This gives a relative overemphasis of the low frequency (higher order) components. The choice of appropriate weighting factors $v_k$ is intended to compensate for that, and rather to give an overemphasis of the high frequency region, as it seems more important for high image quality.

Suitable weights $v_k$ will have to be determined empirically. As outlined in [4], the transform $f^{(k)}(x,y)$ is complex:

$$f^{(k)}(x,y) = \left|f^{(k)}(x,y)\right| e^{i\theta_k}$$

where $\theta_k$ is the direction in which the maximum output from the operator is obtained. Along a straight edge all $\theta_k$ will have essentially the same value, and in this case $S(x,y)$ will be the sum of all magnitudes, which (by the triangle inequality, for non-negative weights $v_k$) is the maximum possible value:

$$S(x,y)\Big|_{\theta_k = \theta} = \sum_{k=1}^{n} v_k \left|f^{(k)}(x,y)\right|$$

If, on the other hand, the point $(x,y)$ is not on an edge, the angles $\theta_k$ will have various directions, and the sum will be considerably smaller. This automatically disqualifies the point, in relative terms, from being considered an edge point.
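The edge-selective behaviour of $S(x,y)$ follows directly from summing complex numbers. In this minimal sketch, the assumed inputs at a single point are the magnitudes $|f^{(k)}|$, the angles $\theta_k$ (the arguments of the complex transforms) and non-negative weights $v_k$.

```python
import cmath
import math
from typing import Sequence


def sharpness(
    magnitudes: Sequence[float], angles: Sequence[float], weights: Sequence[float]
) -> float:
    """S = |sum_k v_k |f^(k)| exp(i theta_k)|."""
    return abs(
        sum(v * m * cmath.exp(1j * t) for v, m, t in zip(weights, magnitudes, angles))
    )


if __name__ == "__main__":
    mags, v = [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]
    edge = sharpness(mags, [0.7, 0.7, 0.7], v)                           # aligned
    mixed = sharpness(mags, [0.0, 2 * math.pi / 3, 4 * math.pi / 3], v)  # spread
    print(round(edge, 9), round(mixed, 9))
```

With aligned angles the result equals the plain weighted sum of magnitudes, 6; with the angles spread evenly it drops to √3, although the magnitudes are unchanged.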

As mentioned earlier, it has been difficult to measure noise in images because the noise spectrum is not separated from the image spectrum. The visual system, on the other hand, can fairly easily determine the relative amount of noise in a highly structured image. The reason is that the visual system can perceive correlation and structure as distinguished from randomness.

The image quality measure suggested here uses the same idea of distinguishing between structure and randomness to detect noise, using the operator described in [4]. The measure of structure is simply given by the image content.

The simplest way seems to be to detect regions which do not contain structure, and to measure the amount of noise there. This means that the computed image content value $C(x,y)$ is below a certain level for the point $(x,y)$. If we go back to the definition of $C(x,y)$, we can see that even if the lowest order (highest frequency) transform is nonzero, the randomness of the angles in the vector field makes the next order transform become zero. The same is true for higher order transforms. This makes the image content $C(x,y)$ assume a value of zero or close to zero when we have a region containing only noise.

In these regions the noise can be determined using a measure similar to the sharpness measure defined earlier. We define the noise component $N_n(x,y)$ in point $(x,y)$ as

$$N_n(x,y) = \sum_{k=1}^{n_1} q_k \left|f^{(k)}(x,y)\right|$$

In this case the summation only has to be performed over a few of the highest frequency components, as noise is usually limited in bandwidth. As we do not care about angular correlation in this case, the absolute value is taken before the summation.
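Putting the last two steps together, a minimal sketch can select the points whose image content falls below a threshold and average $N_n$ over them. The per-point input format, the threshold and the averaging are illustrative assumptions. Magnitudes are listed from $k = 1$ (highest frequency) upwards, and only as many orders as there are weights $q_k$ are summed, which plays the role of the upper limit $n_1$.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, Sequence[float]]  # (image content C, [|f^(1)|, |f^(2)|, ...])


def noise_component(magnitudes: Sequence[float], q: Sequence[float]) -> float:
    """N_n = sum_k q_k |f^(k)|, summed over the first len(q) orders only."""
    return sum(qk * m for qk, m in zip(q, magnitudes))


def mean_noise_in_flat_regions(
    samples: Sequence[Sample], q: Sequence[float], content_threshold: float
) -> Optional[float]:
    """Average N_n over points whose image content is below the threshold."""
    values: List[float] = [
        noise_component(mags, q) for c, mags in samples if c < content_threshold
    ]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    samples = [
        (0.01, [0.5, 0.1, 0.0]),  # unstructured: random angles kill higher orders
        (5.0, [3.0, 2.0, 1.0]),   # structured: excluded from the estimate
    ]
    print(mean_noise_in_flat_regions(samples, q=[1.0, 1.0], content_threshold=0.1))
```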

Noise is not important only in the structurally empty regions, but probably more so in connection with, or interfering with, structure such as edges. In some cases the total noise situation can be determined using only non-structured regions. However, the amount of noise may vary with the density, depending upon whether the noise is additive or multiplicative. We are also interested in how the noise interferes with the perception of structured regions as a function of the image content.

Measuring noise in highly structured regions is considerably more difficult, and measures of structure or correlation have to be used. The measure suggested is based upon the theory of linearization: we decide that signals perpendicular to the maximum constitute noise.

As outlined before [4], the transform $f^{(k)}(x,y)$ is complex:

$$f^{(k)}(x,y) = \left|f^{(k)}(x,y)\right| e^{i\theta_k}$$

where $\theta_k$ is the direction in which the maximum output from the operator, $\left|f^{(k)}(x,y)\right|$, is obtained when the angular orientation $\theta$ is varied. We define the scalar $g^{(k)}(x,y)$ as the magnitude obtained from the operator with orientation $\theta_k + \pi/2$, that is, the output in a direction perpendicular to the one giving the highest output.
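The operator of [4] is not reproduced here. As a plainly different but well-known stand-in, a structure tensor built from local gradient samples yields the two quantities this construction relies on: the energy along the dominant direction and the energy perpendicular to it, which would be treated as noise. The gradient-sample input is an assumption, and the sketch stops at these two energies rather than defining $N_e(x,y)$.

```python
import math
from typing import Sequence, Tuple


def oriented_energies(
    gradients: Sequence[Tuple[float, float]]
) -> Tuple[float, float, float]:
    """Return (theta, e_max, e_perp) from gradient samples in a local window.

    E(phi) = sum (gx cos phi + gy sin phi)^2 is maximal at theta, where
    2*theta = atan2(2*Jxy, Jxx - Jyy); e_perp = E(theta + pi/2).
    """
    jxx = sum(gx * gx for gx, _ in gradients)
    jyy = sum(gy * gy for _, gy in gradients)
    jxy = sum(gx * gy for gx, gy in gradients)
    theta = 0.5 * math.atan2(2.0 * jxy, jxx - jyy)

    def energy(phi: float) -> float:
        c, s = math.cos(phi), math.sin(phi)
        return c * c * jxx + 2.0 * c * s * jxy + s * s * jyy

    return theta, energy(theta), energy(theta + math.pi / 2)


if __name__ == "__main__":
    edge = [(1.0, 0.0), (2.0, 0.1), (1.5, -0.1)]  # nearly one direction
    noise = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    for name, g in (("edge", edge), ("noise", noise)):
        theta, e_max, e_perp = oriented_energies(g)
        print(name, round(e_perp / e_max, 4))
```

The ratio of perpendicular to maximal energy is small for the edge-like samples and equals 1 for the isotropic ones. The halved `atan2` is the same double-angle device used by the operator.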

The noise estimate in the structured region is now defined as $N_e(x,y) = \dots$

If we decide to devise a total expression for image quality, the question is what it should look like. A number of formulas have been suggested, and one would have to test a number of possibilities and correlate them with perception data.

The expressions are often of the type [9]

$$Q = \frac{\text{Sharpness}}{1 + \text{Graininess}}$$

This expression seems a reasonable starting point for investigations. In terms of the factors described earlier, the expression could look like

$$Q = \frac{\sum_{x,y} f_1\big(C(x,y)\big)\, S(x,y)}{1 + \sum_{x,y} f_2\big(C(x,y)\big)\, N(x,y)}$$

The function $f_1$ is a weighting function for the sharpness measure, $S(x,y)$, which depends upon the image content, $C(x,y)$. The function $f_2$ represents the same property for the noise, $N(x,y)$: it indicates the relative importance of the noise given a certain image content, $C(x,y)$.
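A minimal sketch of the combined expression, assuming the per-point contributions are simply summed over the image (one of the open questions raised in the next paragraph). The weighting functions in the example are made up: `f1` is constant, and `f2` gives noise full weight in low-content regions and half weight in structured ones. Neither comes from the text; both would have to be fitted to perception data.

```python
from typing import Callable, Sequence


def quality(
    content: Sequence[float],
    sharp: Sequence[float],
    noise: Sequence[float],
    f1: Callable[[float], float],
    f2: Callable[[float], float],
) -> float:
    """Q = sum f1(C) S / (1 + sum f2(C) N), sums taken over all points."""
    num = sum(f1(c) * s for c, s in zip(content, sharp))
    den = 1.0 + sum(f2(c) * n for c, n in zip(content, noise))
    return num / den


if __name__ == "__main__":
    def f1(c: float) -> float:
        return 1.0  # hypothetical: sharpness weighted equally everywhere

    def f2(c: float) -> float:
        return 1.0 if c < 0.5 else 0.5  # hypothetical: noise matters more in flat areas

    print(quality([0.0, 1.0], [2.0, 4.0], [1.0, 1.0], f1, f2))
```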

An obvious question that cannot be resolved here is whether the different contributions can simply be summed over the whole image, or whether they have to be combined in some other way. A number of questions of this type have to be studied further and subjected to perception experiments.

CONCLUDING REMARKS

In the preceding pages, a number of factors supposedly important to the perception of image quality have been suggested. This is only intended as a first attempt to relate a new approach to the current field of image quality measurements. Other factors and composite measures will have to be defined as experience increases, and tested in relation to perceptual experiments.

Some of the expressions and computations may seem formidable. However, with the use of a planned parallel picture processor for this type of operator [10], processing times can be reduced to a few seconds. Such a system will also allow the preparation of images for perceptual experiments. For example, one can add noise to an image depending upon the computed image content in different regions, and then test the perceptual changes. In the same way, it seems that one can perform adaptive filtering, such as noise removal from unstructured regions as well as from structured regions along dimensions where …

REFERENCES

1. L.M. Biberman, Ed., Perception of Displayed Information, Plenum Press, 1973.
2. O.H. Schade Sr., Image Quality: A Comparison of Photographic and Television Systems, RCA Laboratories, 1975.
3. H. Marmolin and S. Nyberg, Multidimensional Scaling of Subjective Image Quality, FOA Report C 30039-H9, 1975.
4. G.H. Granlund, In Search of a General Picture Processing Operator, Computer Graphics and Image Processing (in press).
5. H. Davson, Ed., The Eye, Volume 2A, Academic Press, 1977.
6. J.E. Hopcroft and J.D. Ullman, Formal Languages and their Relation to Automata, Addison-Wesley, 1969.
7. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, 1973.
8. T.S. Huang, Ed., Picture Processing and Digital Filtering, Springer-Verlag, 1975.
9. C.N. Nelson and G.C. Higgins, Image Sharpness, in Advances in the Psychophysical and Visual Aspects of Image Evaluation, Summary of the Proceedings of an SPSE Technical Section Conference, October …
10. G.H. Granlund, An Architecture of a Picture Processor Using a Parallel General Operator, Report LiTH-ISY-I-0188, 1977.