
Department of Science and Technology, Linköping University

LiU-ITN-TEK-A--10/010--SE

Implementation of an Image Quality Rating System

Pia Falk

Emil Olsson


LiU-ITN-TEK-A--10/010--SE

Implementation of an Image Quality Rating System

Thesis work in Media Technology at the Institute of Technology, Linköping University

Pia Falk

Emil Olsson

Supervisor: Anders Johannesson

Examiner: Björn Kruse


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/


Acknowledgement

We would like to express our gratitude and thank the following people for their help and support during the course of this project: Anders Johannesson, Per Kannermark, Andreas Nilsson, Peter Berggren, Andrew Andersson, Gustav Träff and Martina Lundh at Axis Communications AB. Our examiner Björn Kruse at Linköpings Universitet. We would also like to thank the participants in our psychophysical image quality evaluation. Finally, we would like to thank Axis Communications AB for giving us the opportunity to work with this thesis, giving us access to cameras and the lab environment, as well as making us feel welcome in the organization.


Abstract

As the digital camera surveillance business expands, much effort is put into optimizing the quality of the captured images. There is a need for an image quality rating system when specifying and developing new cameras. This system can be used to compare new technology to old cameras as well as competitor cameras in order to locate the differences. The rating should be based on technical measurements and resemble human perception of image quality.

This thesis has focused on measuring and rating resolution and noise since these are important parts of perceived image quality. Images of a predetermined target (both stationary and rotating) were captured by the camera in three illuminations. An application was developed to facilitate input of the captured images as well as performing the required calculations. The application then returns one calculated image quality rating per illumination.


Contents

1 Introduction 8
1.1 Background . . . 8
1.2 Purpose . . . 9
1.3 Objective . . . 9
1.4 Delimitations . . . 9
1.5 Methodology . . . 10
1.6 Terminology . . . 10
2 Theoretical Framework 11
2.1 Image Quality . . . 11
2.1.1 Imatest™ . . . 12
2.1.2 Axis Definition . . . 13

2.2 Psychophysical Image Quality Evaluation . . . 14

2.3 Cameras with Digital Sensors . . . 15

2.3.1 From Scene to Image . . . 15

2.4 Light . . . 19

2.4.1 What is Light? . . . 19

2.4.2 How to Measure Illumination . . . 20

2.5 Resolution . . . 20

2.5.1 Modulation Transfer Function (MTF) . . . 23

2.5.2 Measuring the MTF . . . 25

2.5.3 Subjective Quality Factor (SQF) . . . 28

2.6 Noise . . . 30

2.6.1 Spatial Noise (Fixed Pattern Noise) . . . 31

2.6.2 Temporal Noise . . . 31

2.7 Exposure Time Estimation . . . 34

3 Implementation 36
3.1 Target Design . . . 36

3.1.1 Siemens Star . . . 36

3.1.2 Gray Fields . . . 38

3.1.3 Exposure Time Pattern . . . 38

3.1.4 Generating the Target . . . 38


3.3 Image Acquisition . . . 39
3.3.1 Camera Alignment . . . 41
3.3.2 Camera Settings . . . 42
3.3.3 Motor Settings . . . 43
3.3.4 Filters . . . 43
3.4 Image Analysis . . . 43
3.4.1 Validating Images . . . 43
3.4.2 Camera Classification . . . 46

3.5 Extracting the Target . . . 46

3.6 Resolution Measurement . . . 49

3.6.1 Data Acquisition . . . 49

3.6.2 Modulation Transfer Function (MTF) . . . 49

3.6.3 Subjective Quality Factor (SQF) . . . 50

3.7 Noise Measurement . . . 52

3.7.1 Data Acquisition . . . 52

3.7.2 Contrast Coefficient . . . 55

3.7.3 Salt & Pepper Noise . . . 57

3.7.4 Gaussian Noise and FPN . . . 59

3.8 Weighted Rating . . . 60

3.8.1 Pre-weighing . . . 61

3.8.2 Fine Tuning . . . 63

3.8.3 Psychophysical Evaluation . . . 64

3.9 Exposure Time Estimation . . . 67

3.9.1 Data Acquisition . . . 67

3.9.2 Calculation . . . 68

3.9.3 Presentation of Estimated Exposure Time . . . 71

3.10 Graphical User Interface . . . 71

3.10.1 Design . . . 71

3.10.2 Generate Reports . . . 73

4 Results 74
4.1 Correspondence with Visual Inspection . . . 75

4.2 Stability of Model against Camera Settings . . . 76

4.3 Possible Improvements . . . 76

4.3.1 Target Design . . . 76

4.3.2 Resolution . . . 77

4.3.3 Noise Measuring . . . 78

4.3.4 Exposure Time Estimation . . . 78

5 Conclusion 79

A A Generated Basic Report 80

B A Generated Detailed Report 81


List of Figures

2.1 The imaging pipeline. . . 15

2.2 Global shutter vs. rolling shutter . . . 16

2.3 Rolling shutter effect on image . . . 17

2.4 A cross-section of a schematic image sensor. . . 18

2.5 The CFA Bayer pattern. . . 18

2.6 The sampling theorem . . . 21

2.7 USAF 1951 resolution test chart . . . 22

2.8 Change of modulation depth . . . 23

2.9 Decrease of modulation depth with increasing frequency . . . 24

2.10 Slanted edge . . . 26

2.11 Sine wave modulation . . . 26

2.12 Square wave modulation (bar patterns) . . . 27

2.13 MTF together with CSF yields the SQF . . . 29

2.14 CSF . . . 30

2.15 Gaussian Noise . . . 32

2.16 Poisson- and normal distributions . . . 33

2.17 Salt & pepper noise . . . 34

3.1 Target legend . . . 37

3.2 Target generating application . . . 39

3.3 Overview of the test process . . . 40

3.4 Target setup . . . 42

3.5 Error messages . . . 45

3.6 Steps in the process of extracting the target . . . 47

3.7 Sampling points for MTF . . . 49

3.8 Theoretical MTF compared with estimated MTF . . . 51

3.9 Theoretical SQF vs. measured SQF - varying pixel pitch . . . . 53

3.10 Theoretical SQF vs. measured SQF - varying f-number . . . 54

3.11 The Midpoint Circle Algorithm . . . 54

3.12 The sampled gray circles . . . 55

3.13 Images for contrast coefficient validity check. . . 56

3.14 A plot of the salt & pepper thresholds. . . 58

3.15 The technique of filtering salt & pepper noise . . . 59

3.16 The four scenes captured with one of the cameras . . . 65


3.18 Areas of interest with regards to a rolling shutter . . . 67

3.19 Sampled pixels for exposure time estimation . . . 68

3.20 Sampled data for exposure time estimation . . . 69

3.21 The GUI . . . 72


List of Tables

2.1 Approximate illumination for typical everyday scenarios. . . 20

3.1 Illuminations used in this thesis. . . 41

3.2 Example of image validation parameters. . . 44

3.3 Camera class with corresponding resolution limits. . . 46

3.4 Pixel pitch per camera class assuming 1/2” optical format. . . . 61

4.1 Details about the ratings of camera A . . . 75


List of Abbreviations

CCD Charge-Coupled Device

CFA Color Filter Array

CLAHE Contrast-limited Adaptive Histogram Equalization

CMOS Complementary Metal Oxide Semiconductor

CTF Contrast Transfer Function

ESF Edge Spread Function

FPN Fixed Pattern Noise

GUI Graphical User Interface

HE Histogram Equalization

LSF Line Spread Function

MTF Modulation Transfer Function

ND Neutral Density

OD Optical Density

OECF Opto-Electronic Conversion Function

OTF Optical Transfer Function

PTF Phase Transfer Function

SEM Slanted Edge Method

SFR Spatial Frequency Response


Chapter 1

Introduction

1.1 Background

Axis® Communications is a Sweden-based company that specializes in professional network video solutions for surveillance and remote monitoring. The end users range from large multinational corporations to small enterprises. Axis has several camera types available on the market since different cameras meet different needs. A camera that is optimized for capturing fast moving objects, such as cars, may not be efficient enough in observing an airport where faces need to be detected.

The surveillance market is a growing business and to further develop the company's position as the market-leading supplier of network video solutions, Axis constantly needs to improve its cameras to meet new demands. Much effort is put into optimizing the quality of the captured images, for example reducing noise and adjusting the colors. When specifying and developing new cameras there is a need for a quality rating system that can be used to compare new technology to old cameras as well as competitor cameras in order to locate the differences. The rating should be based on technical measurements and resemble human perception of image quality.

The approach of testing cameras to determine image quality is not new. There are several companies that specialize in testing certain properties of the captured images. These tests give extensive information about a wide range of parameters. However, none of the tests offer a measurement that presents the results in a way that non-experts could understand.

The definition of image quality is a widely debated subject. This report will focus on rating image quality as defined by Axis. This definition is based on three pillars of image quality: visibility, appearance and fidelity. The definition will be explained in further detail in section 2.1.

1.2 Purpose

This report aims to summarize the work, background studies and theory behind the implementation of an image quality rating system. Furthermore it discusses some of the more abstract problems associated with quantifying image quality from machine-measured data and correlating it with the human perception of image quality.

1.3 Objective

The main objective of this thesis is to develop an application for determining and measuring certain aspects of the camera based on images of a predetermined camera target, under well defined environmental circumstances (illumination, distance etc), acquired from the video camera that is to be tested and rated. The tested camera is to be treated as a magic box, i.e. the only information the user has about how the camera operates is what is described in the marketing material of that particular camera. This criterion is essential in order for the application to be useful in testing competitors' cameras.

The application must be easy to use for non-experts. Furthermore it must be versatile in the sense that a wide range of cameras must be able to be tested using the same test setup and software settings. The application must also be designed in such a manner that it is easy to expand with additional test modules in the future.

In order to achieve the goals described above, this thesis will also attempt to define what good image quality is, both from a computational and a human standpoint. The final score of a camera's image quality based on the predetermined aspects will then be translated from a machine-calculated result to a more perceptive and logical score for humans.

1.4 Delimitations

This project will limit itself to two concrete aspects of image quality: resolution and noise. Another limitation is that only still images will be analyzed, thus excluding video streams from the thesis.

1.5 Methodology

The modules of the application will be developed using a combination of tests in a controlled environment (lab) and processing of the test results with appropriate tools.

Development will mainly be carried out on the MathWorks™ Matlab® platform, which is a high-level language primarily intended for numerical computations.

In a later phase of the thesis a psychophysical test will be carried out to determine correlation between the rating computed by the application and human perception.

1.6 Terminology

The term spatial resolution, or just resolution, in this report refers to the ability to distinguish small objects in the image. The greater the resolution, the smaller the objects that can be distinguished. It does not refer to the image dimensions in pixels, which may be the more common meaning of the word. For this, the report uses the word image size, or just size.

The report may also discuss temporal resolution, which refers to the camera's ability to convey details over time. A fast moving object would require a higher temporal resolution to be captured properly.

The reader should also be aware that the term image quality in this report is based on the use of the images in camera surveillance in general and Axis Communications' description of image quality in particular.


Chapter 2

Theoretical Framework

2.1 Image Quality

The concept of image quality is a widely debated subject. Several scientists have their own definitions of how to measure and grade image quality. To date, after decades of discussion, no generally accepted formal definition of image quality exists.

Most scientists agree that image quality measurement often lies in the eyes of the observer. In, for example, medical imaging the quality of the image may be defined as the ability of expert observers to detect and recognize patterns to support a diagnosis. In a photograph, on the other hand, it may be the contrast and the reproduced colors that determine the image quality.

There are different approaches to study and measure image quality. The approach in this thesis is to study certain aspects individually and weigh them together to achieve a combined image quality score.

According to Keelan [14, p. 9] image quality can be defined as:

"The quality of an image is defined to be an impression of its merit or excellence, as perceived by an observer neither associated with the act of photography, nor closely involved with the subject matter depicted."

Keelan's definition of image quality has nothing to do with the context in which the image is viewed. This may be a general definition and applicable in many situations, but when the purpose is to study certain aspects of image quality in a specific environment it may not be a sufficient definition. A more useful definition was presented by Janssen [12, p. 80] in 1999:

"(1) The quality of an image is the adequacy of this image as input to visual perception. (2) The adequacy of an image as input to visual perception is given by the discriminability and identifiability of the items depicted in the image."

Both these definitions are hard to apply in a complex rating system where the computer will do all the calculations, since they are based on human perception. Despite this, they work well as a basic understanding of image quality and provide a basis for future development. Image quality is often, as in the definitions above, evaluated psychophysically (subjectively or perceptually), and that may give the best quality measurements. But since psychophysical evaluation is both expensive and time consuming, the most optimized result will be achieved when physical (objective) and psychophysical evaluation are combined [16]. This is done by constructing metrics that relate to the subjective image quality from psychophysical procedures and applying these to the objective evaluation methods.

Axis’ own definition of image quality factors will work as a basis for this thesis and will be explained further in section 2.1.2.

2.1.1 Imatest™

Imatest™ is a company that develops software for testing digital camera image quality. Imatest has its own definition of what image quality is and which factors are the most important for image quality. They also have a number of different tests, each specialized at testing a specific image quality factor at a time. For the tests, Imatest uses several ISO standard targets and targets of their own design, each one designed to test a specific aspect of image quality.

The most important image quality factor is, according to Imatest, sharpness, which determines the amount of detail an image can convey. To measure sharpness Imatest uses the Spatial Frequency Response (SFR), synonymous with the Modulation Transfer Function (MTF), which is the contrast at a given spatial frequency (given in cycles or line pairs per distance) relative to low frequencies. The theory behind the MTF is described in detail in section 2.5.1.

Other image quality factors that Imatest finds important are noise (described in section 2.6), dynamic range, color accuracy and distortion, to name a few. All the tests analyze image content with algorithms written in Matlab. Many of the technical concepts are available on their homepage [11] but only paying customers have access to their software.

Imatest's approach is to test one or a few specific image quality factors at a time and measure them in detail. Consequently, the results are very extensive and may be almost incomprehensible for a non-expert. Some issues in this thesis are addressed with an approach similar to Imatest's, but even though the procedures are related they are still adapted to the special needs and requirements of this thesis.

2.1.2 Axis Definition

Axis' definition of image quality was formulated to specify the company's view on image quality, especially for the most common use of the products: surveillance. The definition was developed with the goal of defining which image quality factors are the most important. Axis prioritizes the most important factors in the following order:

1. Visibility
2. Appearance
3. Fidelity

Visibility means that all relevant objects in the scene shall be visible. This involves being able to observe motion in both light and dark areas, making text readable, faces recognizable if necessary, and something as natural as not blurring moving objects.

Appearance means that the image should look good to the observer. Good looking images often have high contrast, exaggerated colors, sharp edges, low visible artifacts and low intensity level at night to hide noise.

Fidelity is the resemblance between the original scene and the captured image. A high fidelity camera will typically show a gray and dull image if the scene is gray and dull, and vice versa. Besides that, the image will be foggy if the scene is foggy and may therefore hide objects in the scene that one would otherwise want to see.

Even though it is desired to have all three factors optimized, it may not always be possible. Visibility is the most important since the image is worthless if the object or event is not visible. Appearance comes next, since the images must look good even for a non-expert. Fidelity is mostly important when the real scene is visible at the same time as the captured image, which is not usually the case for Axis cameras.

From a visibility point of view it may be best to apply a high-pass filter to the image to show only high spatial frequency content, but this would probably destroy both the appearance and fidelity factors entirely. A compromise might be the best solution and since the quality is mostly judged by humans, appearance is valued almost as high as visibility.

Visibility, appearance and fidelity each put different demands on the camera. For instance, visibility requires extremely low noise and high light sensitivity, places high optical requirements regarding sharpness and geometry, and needs clever dynamic range compression. Appearance requires low noise, color enhancement without artifacts, and good color and IR-cut filters. Fidelity requires low noise, accurate color and IR-cut filters, and dynamic range considerations to mimic human scene adaption. All three quality factors have in common that they require low noise and, in some sense, high resolution and sharpness. Therefore, it may be a good start to prioritize measuring these factors.

2.2 Psychophysical Image Quality Evaluation

A psychophysical evaluation is highly valued in order to correlate the computed values of image quality with human perception. The complexity of psychophysical measurements is massive. Since several factors affect the way people judge image quality and several ways exist to measure it, no universal solution for performing evaluations exists.

The purpose of a psychophysical evaluation might be to determine if a new product works better than its predecessor, but getting a reliable answer is difficult for several reasons. One is the fact that improvements are often made at the sacrifice of some other image attributes; it can be a matter of preference whether the results are perceived as improvements or not. Another reason for the answer to be unreliable is the fact that there is a learning effect that can influence the result. At first the image might seem like a big improvement but gradually become regarded as less desirable. Conversely, the image might seem like a deterioration at first but become acceptable or even preferred over time. Yet another reason might be that the image quality judgments are influenced by the context or the instructions [16].

The judgment of image quality is (as described in sections 2.1-2.1.2) several image attributes put together, and the problem with an image quality evaluation can thus be seen as projecting a multi-dimensional space into a one-dimensional space. Besides the mentioned problems with getting a reliable answer, another problem with image quality evaluation is the question of methodology. Two fairly simple ways of performing a psychophysical image quality evaluation are to place an image on a quality scale from 1 to 10, or to use paired comparison where the observer chooses from each pair which image is greater or less than the other in some perceptual attribute [6]. Although the accuracy of both methods can be challenged, studies show that the method of numerical category scaling seems to be consistent with results from other, more time-consuming methods [16].

Another complex attribute of an image quality evaluation is that system performance requires validation with proper sampling in the "photographic space". In general, this implies that a camera can have certain automatic settings, for example automatic white balance and auto exposure, that in some scenes result in a poorer image quality than if the camera was configured manually. If a camera produces many bad pictures in a test, it does not necessarily mean that the camera has poor quality performance. The best way to avoid this is to choose several diverging scenes to best evaluate the camera's quality performance.

2.3 Cameras with Digital Sensors

There are two basic types of image sensors relevant to photographic surveillance: the passive and the active pixel sensor. The Charge-Coupled Device (CCD) is a passive pixel sensor which has been widely used in everything from industrial applications to consumer products. The Complementary Metal Oxide Semiconductor (CMOS) started out as a passive pixel sensor with poor quality but quickly evolved into an Active Pixel Sensor (APS) which could match the CCD in some aspects. Lately however, the CMOS has evolved further and now surpasses the CCD in many applications [7].

2.3.1 From Scene to Image

Below follows a brief overview of the imaging system pipeline as depicted in figure 2.1, starting at the scene and ending with the image provided to the user in a typical camera with a digital sensor.

Figure 2.1: An overview of the imaging pipeline in a typical digital sensor camera (Scene → Lens → Microlens Array → Color Filter Array → Image Sensor → A/D → Color Processing → Image Enhancement and Compression).

The light from the scene first falls through the lens which concentrates the light and brings it to focus on the detector.

The amount of light hitting the detector is controlled by the aperture, which is the opening of the lens through which the light passes before it reaches the detector. The size of the aperture, which can be expressed with an f-number, determines how much light will hit the sensor. The f-number, or f-stop, is the focal length of the lens divided by the diameter of the aperture. Consequently the f-number has no unit. The greater the f-number, the less light per area unit reaches the detector.

The Electronic Shutter

The aperture is however not the only thing controlling the amount of light. There is also a shutter, which can be thought of as a gate that regulates for how long the sensor is illuminated. The shutter can be either a mechanical curtain which physically blocks the sensor or an electronic device. In cameras with an electronic shutter, the light always falls directly on the sensor. The way to regulate the exposure time in this case is to only activate the detector for the given time (corresponding to the time the mechanical shutter would have been open) and consequently only register light hitting the sensor during this time. Some modern Digital Single-Lens Reflex (DSLR) cameras combine a mechanical shutter for normal operation with an electronic shutter for extremely short exposure times when the mechanical shutter is physically incapable of moving that fast.

Most digital video cameras utilize an electronic shutter and there are two main types of electronic shutters based on which type of sensor is used. The first one is called a global shutter and is used on CCD sensors. The global shutter reads all the pixels at the same time and stores the result in covered cells on the sensor surface for later readout (see figure 2.2A). This is much like a photographic film since all parts of the scene are registered simultaneously.

Figure 2.2: Global shutter (A) vs. rolling shutter (B). The length of the blue bars represents the exposure time. A: all rows are active simultaneously. B: there is a time delay before activating each row.

The second type of electronic shutter is the rolling shutter, which is usually found on CMOS sensors, which normally have no extra image memory on the sensor. The rolling shutter differs from the global shutter in that it does not read all pixels at once. The rolling shutter reads the pixels row by row (see figure 2.2B). This means that if there is movement in the scene (or the camera is panning), parts of the movement will be captured at different locations in time.

Assuming the rolling shutter reads line by line starting in the upper left corner, a horizontal camera panning from left to right would result in a skewed image, where for instance a vertical tree would appear to lean to the left as seen in figure 2.3. This is because the top rows of pixels are registered while the tree is in one position in the frame, as opposed to the bottom rows of pixels which register the tree when it has moved to the right of this position in the frame.

Figure 2.3: The effect of rolling shutter and camera/scene movement (the effect is exaggerated).

The amount of skew caused by a rolling shutter depends on the time it takes to read each row of pixels. With a short readout time, the rows are registered in rapid succession and objects in the scene have less time to move during the exposure which results in less skew.
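To get a feel for the magnitude of this effect, the skew can be estimated with a simple back-of-the-envelope calculation. The following Matlab sketch is for illustration only; the row readout time, number of rows and object speed are assumed values, not taken from any particular camera.

% Rough estimate of rolling shutter skew (illustrative, assumed values only).
row_readout   = 30e-6;   % assumed time to read out one row [s]
n_rows        = 720;     % assumed number of sensor rows
object_speed  = 400;     % assumed horizontal object speed in the image [pixels/s]
frame_readout = row_readout * n_rows;          % delay between first and last row [s]
skew          = object_speed * frame_readout;  % horizontal displacement [pixels]
fprintf('Top-to-bottom skew: %.1f pixels\n', skew);   % about 8.6 pixels with these numbers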

The Digital Sensor

The detector consists of a microlens array, a Color Filter Array (CFA) and lastly the image sensor itself. The light first falls on the microlens array which focuses the light in order to direct all the light, which falls on the particular microlens unit, to the photodetector in a pixel. The whole CMOS pixel is not photo responsive due to the fact that some readout circuitry must be integrated on it. The percentage of area occupied by the photodetector in a pixel is called the fill factor. The size of the pixel is called the pixel pitch.

Before the light from the microlens hits the photodetector it passes through a filter which filters out light of certain wavelengths. The filter normally lets only a single band of either red, green or blue light through, thus allowing only one of these spectra to be registered by the detector. The color filter is part of a CFA which is typically deposited in a certain pattern on top of the image sensor pixel array. A schematic view of a typical red-green-green-blue Bayer CFA is shown in figure 2.5.

Figure 2.4: A cross-section of a schematic image sensor.

Figure 2.5: The CFA Bayer pattern.

The reason light has to be divided into color bands is that the photodiode cannot tell which wavelength of light should equal which pixel value. The way around this is to filter the light into the three primary colors red, green and blue, letting the photodiode register the amount of light from just one of these colors. It is also possible to use, for instance, the complementary colors cyan, magenta and yellow to achieve the same effect. The subsequent problem is how to rebuild the missing light, i.e. the green and blue pixel values for a red pixel for instance. The solution is to estimate the missing light through spatial interpolation and this is done with varying elegance depending on how accurate the interpolation must be. This interpolation is referred to as demosaicing and is often combined with additional digital signal processing in order to correct color, white balance and compensate for adverse sensor characteristics. Now each pixel has a red, a green and a blue pixel value which together are used to represent the color of the light which entered through the lens. The RGB color model is convenient to use as it closely mimics the way the human vision works.
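As a minimal illustration of the interpolation idea, the Matlab sketch below performs simple bilinear demosaicing. It is not the processing used in any particular camera; it assumes an RGGB Bayer layout and ignores border effects.

% Bilinear demosaicing sketch for an RGGB Bayer mosaic stored in 'raw'.
raw = rand(4, 6);                                    % stand-in for raw sensor data (double)
[H, W] = size(raw);
Rmask = zeros(H, W); Rmask(1:2:end, 1:2:end) = 1;    % red samples at odd rows/odd columns
Bmask = zeros(H, W); Bmask(2:2:end, 2:2:end) = 1;    % blue samples at even rows/even columns
Gmask = 1 - Rmask - Bmask;                           % green samples elsewhere
kRB = [1 2 1; 2 4 2; 1 2 1] / 4;                     % bilinear kernel for the sparse R and B planes
kG  = [0 1 0; 1 4 1; 0 1 0] / 4;                     % bilinear kernel for the G plane
R = conv2(raw .* Rmask, kRB, 'same');                % interpolate missing red values
G = conv2(raw .* Gmask, kG,  'same');                % interpolate missing green values
B = conv2(raw .* Bmask, kRB, 'same');                % interpolate missing blue values
rgb = cat(3, R, G, B);                               % full-color estimate at every pixel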

2.4 Light

2.4.1 What is Light?

In order to understand some of the concepts in this thesis it is necessary to be familiar with some of the properties of light. Although science does not yet fully understand all aspects of light, it is still possible to very precisely predict its behavior through calculations with formulas, derived mostly from observations of how light behaves.

A key property of light is that it consists of different temporal frequencies. It can be described as a periodic function, at least over a short time span. This is illustrated if a beam of sunlight is shone through a prism, which results in the separation of the different wavelengths of light into different color bands. Another important property of light is that it carries energy and this energy is always absorbed in discrete amounts. This implies the presence of a particle-like element with a constant amount of energy. This element is called a photon and is widely accepted as an integral part of the concept of light.

There are of course a multitude of additional formulas, definitions and observations used to describe all the, more or less, graspable properties of light. Since they have no direct impact on the ideas in this thesis they will not be discussed further.

2.4.2 How to Measure Illumination

The lumen (lm) is the SI unit of luminous flux, which is a measurement of the power of light. The difference between radiant and luminous flux is that radiant flux measures the total power of light emitted, whilst luminous flux measures the same but with consideration to the non-linear sensitivity to different wavelengths of the human eye. Consequently luminous flux is a more perceptive measurement for humans.

Luminous flux is, as mentioned above, a measurement of the power of emitted light. Combining luminous flux with a surface area gives a measurement of the amount of light, as perceived by the human eye, that hits that surface. 1 lx is defined as 1 lm per m², as seen in equation 2.1:

$1 \, \mathrm{lx} = 1 \, \mathrm{lm}/\mathrm{m}^2$ (2.1)

Lux is the SI unit of this measurement which is referred to as illumination and it will be used throughout this thesis whenever there is a need to quantify illuminance.

In order to measure illumination a lux meter is used. The lux meter is a hand-held device with a light-sensitive sensor and a display showing the measured illuminance in lux. It is placed on the illuminated surface, and the lux value is then read off its display.

Table 2.1 shows some typical everyday scenarios and the corresponding approximate illuminations. They are presented here to give the reader an idea of how the lux relates to everyday life.

Illuminance | Scenario
0.27 lx | Full moon at night with clear skies
50 lx | A typical living room
100 lx | Very dark overcast day
320-500 lx | Office lighting
1 000 lx | TV studio lighting
32 000 - 130 000 lx | Direct sunlight

Table 2.1: Approximate illumination for typical everyday scenarios.

2.5 Resolution

The reader should be aware that, as stated in section 1.6, resolution in this thesis refers to the ability of the camera to convey details in the generated image. In some literature the same factor is also referred to as sharpness.

A high resolution is a very important factor in image quality and it is also tightly knit with Axis' definition of image quality, especially the 'visibility' pillar, see section 2.1.2.

The theoretical maximum resolution of an imaging sensor with a one-to-one reproduction scale is basically twice the pixel size of the sensor itself. This fact is described by the sampling theorem, which states that a bandlimited analog signal can be perfectly reconstructed from an infinite sequence of samples if the sampling rate exceeds 2 times the highest frequency. The highest frequency in this case is 1/2 cycle/pixel. In order to perceive some change in modulation on a sensor, the modulation must be registered in at least two pixels, i.e. a modulation of maximum 1/2 cycle/pixel. If the modulation is within one pixel the resulting signal would be a mean value, which can be seen in figure 2.6B. Figure 2.6A shows the signal and 2.6C shows the signal from 2.6A represented by two samples (at the Nyquist frequency). Figure 2.6D shows the same signal sampled at 1/8 cycle/pixel, i.e. 8 samples to represent one cycle.

Figure 2.6: A. The signal (one cycle). B. 1 cycle/pixel. C. 1/2 cycle/pixel. D. 1/8 cycle/pixel.
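The effect of the sampling rate can be reproduced with a few lines of Matlab. The sketch below is for illustration only (not part of the thesis application): one cycle of a sinusoid is sampled at the Nyquist limit and well above it.

% Sample one cycle of a sinusoid at 2 and at 8 samples per cycle.
lambda = 1;                                   % period (cycle length) of the signal
t  = linspace(0, lambda, 1000);               % dense reference of the continuous signal
s  = sin(2*pi*t/lambda);
t2 = lambda/4 + (0:1)*lambda/2;               % 2 samples/cycle (Nyquist limit), placed at the extremes
t8 = (0:7)*lambda/8;                          % 8 samples/cycle
plot(t, s, 'k-', t2, sin(2*pi*t2/lambda), 'ro', t8, sin(2*pi*t8/lambda), 'b+');
legend('signal', '2 samples/cycle', '8 samples/cycle');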

It is possible to measure the resolution in the spatial domain by looking at how test targets with known parameters degrade when they are reproduced, for instance digitally in a camera. The USAF 1951 resolution test chart (figure 2.7) has been used for over half a century for visually estimating and quantifying the resolution of optical imaging systems such as cameras, bomb aiming optics and binoculars.

There are a few inherent problems with this type of chart. It is intended to be used for visually estimating where the limits of the imaging system lie. This is done by looking at the chart and determining the highest resolution at which the bars can be distinguished. The resolution is then looked up in a table using the corresponding numbers in the chart. There are however big problems with this method as people judge this limit differently. Even if all human beings had the exact same contrast sensitivity, it would still yield different results because of the trouble with defining exactly how indistinguishable the black bars should be.

Using a computer to analyze the USAF 1951 chart and in this way remove the human factor from the equation is not ideal due to, amongst many things, the fragmented arrangement of the bars. This thesis will however develop this approach and look at alternative charts, as well as methods of interpreting the ones which are adapted for this purpose.

Figure 2.7: The United States Air Force (USAF), USAF 1951, resolution test chart.

2.5.1 Modulation Transfer Function (MTF)

The Modulation Transfer Function (MTF) is an important tool in order to evaluate imaging systems. It provides a means of expressing image quality of optical systems both objectively and quantitatively. The MTF can for instance be calculated based on lens specifications as well as being measured on actual lenses. This makes it a very useful tool for, amongst other things, evaluating a lens manufacturer’s production quality.

The MTF (2.3) is the magnitude of the Optical Transfer Function (OTF) [5] which is the Fourier Transform of the impulse response of the system. The OTF can be broken down into the following components:

$OTF(\xi, \eta) = MTF(\xi, \eta) \cdot PTF(\xi, \eta)$ (2.2)

where $\xi$ and $\eta$ are the spatial frequencies in the y- and x-plane and $PTF(\xi, \eta) = e^{-i 2\pi \cdot \lambda(\xi, \eta)}$ is the Phase Transfer Function.

From 2.2 above, the MTF (the magnitude of the OTF) is expressed as

$MTF(\xi, \eta) = |OTF(\xi, \eta)|$ (2.3)

MTF is also known as Spatial Frequency Response (SFR) which might be a more descriptive name for some. In its essence, the MTF describes how modulation changes with an increase in spatial frequency.

Figure 2.8: Change of modulation depth in an imaging system.

Figure 2.8 shows the input signal, which can be thought of as an object that is to be photographed. The peak-to-peak amplitude $A_{in}$ of the uniform input signal is:

$A_{in} = A_{max} - A_{min}$ (2.4)

where $A_{min}$ and $A_{max}$ correspond to the lowest and highest intensities of the signal.

The modulation depth M is defined as

$M = \frac{A_{max} - A_{min}}{A_{max} + A_{min}}$ (2.5)

Due to various factors in the imaging system (optics, sensor, demosaicing, noise reduction etc) there is always a decrease of modulation depth in the output (in this case an image) compared to the input signal, i.e. $M_{in} > M_{out}$. Figure 2.8 shows how the output signal's peak-to-peak amplitude $A_{out}$ has decreased. As the peak-to-peak amplitude decreases, so does the modulation depth $M_{out}$.

The MTF describes how the modulation depth changes when spatial frequency increases [2]. The signal in figure 2.8 has a harmonic frequency, which only tells how much the modulation depth decreases at that particular frequency. Figure 2.9, on the other hand, shows the input signal (object) with increasing spatial frequency, followed by the decrease in modulation depth in the output signal (image). The bottom graph in figure 2.9 shows the resulting MTF. It is standard practice to normalize the MTF with the lowest frequency of the signal.

Figure 2.9: The MTF describes the decrease of modulation depth with increasing frequency.
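A minimal Matlab sketch (for illustration only; the synthetic signals are assumptions) of how the modulation depth in equation 2.5 is obtained from a signal's extreme values, and how the ratio of output to input modulation gives the MTF value at a single frequency:

% Modulation depth (equation 2.5) for a synthetic input and output signal.
x      = linspace(0, 4*pi, 400);
in     = 0.5 + 0.5*sin(x);                    % input signal, full modulation
out    = 0.5 + 0.2*sin(x);                    % output signal, attenuated by the imaging system
Mdepth = @(s) (max(s) - min(s)) / (max(s) + min(s));
M_in   = Mdepth(in);                          % 1.0 for this input
M_out  = Mdepth(out);                         % 0.4 for this output
mtf_f  = M_out / M_in;                        % MTF value at this particular frequency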

There are a number of factors to consider when using the MTF as a measuring tool. One thing to consider is that the MTF is only valid at the point in the image where the measurement is taken. An imaging unit is almost never completely uniform in its imaging capabilities due to for instance the lens or anomalies in the sensor. One should therefore be aware that the MTF only speaks for the whole imaging plane if it is either measured over the whole plane or if the plane is uniform.

The MTF can be determined for a particular component in an imaging system, a lens for instance, as well as the system as a whole. It is crucial to know which sort of data the MTF is based on in order to interpret the MTF-curve correctly. It is worth keeping in mind that if the MTF is measured on an image coming from a camera there is no way of knowing exactly how different components in the camera affected the modulation.

2.5.2 Measuring the MTF

There are several ways to measure the MTF using objective methods. Some are more complex than others and trade simplicity for robustness. Other methods are more straightforward but require a well defined set of data to work properly and may be more sensitive to noise, distortion etc.

Slanted Edge Method

One of the more robust methods of measuring MTF is the so called Slanted Edge Method (SEM). This thesis will however only explain SEM briefly as the method is not easily applicable with a dynamic target (as explained in further detail at the end of this section).

SEM is based on a slanted edge (see figure 2.10) which is projected onto the sensor, i.e. photographed with the camera that is to be evaluated. Slanted edge means that the edge should not be aligned with the rows or columns of the sensor. The resulting image is scanned row- or column-wise (depending on in which direction the resolution is measured) to determine the edge location [3]. This information is projected into bins along the scanning direction. The number of bins is greater than the number of pixels scanned, resulting in an over-sampling of the edge. The normalized histogram of the bins is the Edge Spread Function (ESF) from which the Line Spread Function (LSF) can be derived. The MTF for the camera is the Fourier Transform of the LSF [5]. As mentioned above, this method has been rejected in this thesis. This is based on the fact that it is not radially symmetric and therefore not suitable to use in a moving target (as it most likely would not be located in the same position in consecutive images of the target). This problem is by no means unsolvable but it adds complexity to the measuring process, which in turn has an adverse effect on the simplicity factor of the thesis problem formulation (section 1.3).

Figure 2.10: Illustration of how a slanted edge test image could look. The dashed square represents the area from which data is acquired.

Sine Wave Modulation

A sinusoidal signal can be used to calculate the MTF. The result of using a sinusoidal signal is that there are no sharp edges in the image and consequently in-camera processing, for instance sharpening, is not performed. The theory is that this will give a more accurate MTF [17], as it is based on data that has been less processed and looks more like raw sensor data. This thesis is however critical of this approach, as the MTF is then based on data (the signal) which does not correspond well with data coming from the camera when used in a real world application. It can be argued that the input signal should be as good a representation of real world data as possible in order to get a realistic MTF. In this aspect the sine wave modulated signal might not be the best choice.

Figure 2.11: Sine wave modulation.

When using a sinusoidal target, the sampled data will have decreased modulation depth compared to the input signal (see figure 2.8). Since the input signal is sinusoidal (see figure 2.11), it is already known that a sine curve can also be used to describe the captured data. The minimum and maximum values of the signal, which can be derived from the expression of the sine curve, are used to calculate the modulation (as described in section 2.5.1). This process is repeated with a varying input signal, corresponding to spatial frequencies of interest, to obtain the MTF.

One advantage of the sinusoidal signal approach is that the effect noise has on the method is reduced by the sine curve fit. The actual insensitivity to noise depends on the robustness of the technique used for the curve fit. The drawback of using a sinusoidal input signal is that the captured data must be linearized before fitting a sine curve to it [11][17]. This is true not only for sinusoidal data but for all image data that are to be used for measurements. It is particularly hard to linearize sinusoidal data as the Opto-Electronic Conversion Function (OECF) is unknown (the OECF is referred to as the camera transfer function, or camera transfer curve, in this thesis). When using black and white bar patterns, as discussed in the next section, the linearization is done by normalizing the data using the assumption that the lowest values in the data are white and the highest values are black. But linearization of the sinusoidal signal is not that trivial as it contains not only white and black but also shades in-between. There need to be known reference values (of different gray levels), which are transformed simultaneously with the input signal, available to be used for the estimation of the camera transfer function [11]. This estimation is then used to linearize the data before the curve fit.

Square Wave Modulation

Using a square wave modulated input signal (see figure 2.12), as opposed to a sine modulated one, results in a linearization that is very simple. The image data is shifted to zero by subtracting the distance between zero and the lowest value from all data values. The entire data set is then multiplied by the maximum value, e.g. 255 for an 8-bit image, and divided by the highest value in the shifted data set. This linearization is illustrated here for an 8-bit image:

$Im_{linearized} = (im - \min(im)) \cdot \left( \frac{255}{\max(im - \min(im))} \right)$ (2.6)

where $im$ is the image from the camera.
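A minimal Matlab sketch of equation 2.6 (for illustration only; the synthetic image stands in for a captured frame):

% Linearization of an 8-bit image as in equation 2.6.
im     = 30 + 180*rand(480, 640);                            % stand-in for an image from the camera
im_lin = (im - min(im(:))) .* (255 / max(im(:) - min(im(:))));
% After this step the darkest pixel maps to 0 and the brightest to 255.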

The contrast ratio obtained from bar patterns is called the Contrast Transfer Function (CTF) and since the MTF is based on a sine wave response, the CTF is not the same as the MTF. In order to go from a square wave to a sine wave, Fourier mathematics can be used. A square wave can be expressed as an infinite sum of sine functions and the MTF would consequently relate to the CTF as

$MTF(f) = \frac{\pi}{4} \cdot \left( CTF(f) + \frac{CTF(3f)}{3} + \frac{CTF(5f)}{5} + \frac{CTF(7f)}{7} + \ldots \right)$ (2.7)

and simplifying, by omitting some of the terms from the equation above, yields

$MTF(f) = \frac{\pi}{4} \cdot CTF(f)$ (2.8)

The difference between the MTF from 2.7 and 2.8 is only significant at low spatial frequencies [15]. The effect on the accuracy of the resolution measurement is discussed further in section 3.6.
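A short sketch (for illustration only; the frequency grid and CTF values are made up) of how a measured CTF can be converted into an MTF estimate with the simplified relation in equation 2.8:

% Approximate MTF from a square-wave (bar pattern) contrast measurement.
f   = [0.05 0.10 0.20 0.30 0.40 0.50];   % spatial frequencies [cycles/pixel], assumed
ctf = [0.95 0.85 0.60 0.40 0.25 0.15];   % measured square-wave contrast (made-up values)
mtf = (pi/4) .* ctf;                     % equation 2.8
mtf = mtf ./ mtf(1);                     % normalize against the lowest measured frequency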

2.5.3 Subjective Quality Factor (SQF)

Even though the MTF is a useful tool for quantifying image quality, there are some major drawbacks which relate to its nature rather than anything else. As the MTF describes modulation over an increasing spatial frequency, a question that has to be considered is how the spatial frequency relates to the human eye. The Subjective Quality Factor (SQF) is, as the name implies, a subjective measurement which is more directly linked with perceived image quality than the MTF. The SQF weighs the frequencies in the MTF differently to match the sensitivity of the human eye and because of this ought to be a better measurement for resolution. It is a one dimensional measurement based on the use of the MTF curve in conjunction with a Contrast Sensitivity Function (CSF) for the human eye. Together with given parameters of viewing distance and size of the image (in order to match domains between the MTF and CSF, see section 3.6.3), the resulting SQF is defined as the area under the curve which is the MTF multiplied by the CSF, multiplied by K (see figure 2.13).

The equation for the SQF as implied by Granger [9] is

$SQF = K \cdot \int \frac{CSF(f) \cdot MTF(f)}{f} \, df$ (2.9)

where the normalization constant K ensures that an MTF value of 1 yields an SQF value of 100%. K is defined as

$K = \frac{100\%}{\int \frac{CSF(f)}{f} \, df}$ (2.10)

Figure 2.13: MTF together with CSF yields the SQF.

Contrast Sensitivity Function (CSF)

There is a need for a mathematical description of the contrast sensitivity of the human eye in order to get an SQF representation of the measured data. In this thesis the so-called Contrast Sensitivity Function based on the following expression, proposed by Mannos and Sakrison in 1974 [18], is used:

$CSF(f) = 2.6 \cdot (0.0192 + 0.114 \cdot f) \cdot e^{-(0.114 \cdot f)^{1.1}}$ (2.11)

This equation is simplified by omitting the constant 0.0192, which has little impact on the result (see figure 2.14) but a big impact on the complexity of equation 2.9.

Figure 2.14: The Contrast Sensitivity Function (CSF).
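Numerically, equations 2.9-2.11 amount to two integrals that can be approximated with the trapezoidal rule. The following Matlab sketch is for illustration only; the frequency grid and the example MTF curve are assumptions, and the MTF is presumed to already be mapped to the same frequency domain as the CSF (see section 3.6.3).

% Numerical SQF from an MTF curve, using the CSF of equation 2.11.
f   = linspace(0.5, 40, 400);                                % spatial frequencies [cycles/degree], assumed
csf = 2.6 .* (0.0192 + 0.114.*f) .* exp(-(0.114.*f).^1.1);   % equation 2.11
mtf = exp(-f/20);                                            % made-up example MTF in the same domain
K   = 100 / trapz(f, csf ./ f);                              % equation 2.10 (SQF in percent)
sqf = K * trapz(f, csf .* mtf ./ f);                         % equation 2.9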

2.6 Noise

Noise is a random variation of the signal that arises in the camera sensor and is, in most cases, perceived as a degradation in image quality. The performance of imaging sensors is affected by a number of factors and each one affects the amount of noise. For example, when acquiring images with a CCD camera, sensor temperature and light levels are factors that affect the amount of noise in the resulting image [8]. Other factors that contribute are for instance the pixel size of the sensor, exposure time, digital processing and raw conversion [11]. As described in section 2.3, both CCD and CMOS sensors are used in digital cameras today. One of the most significant differences between the two used to be the amount of noise produced in the sensor. Lately, CMOS sensors have improved in quality and the two sensor technologies are now comparable, differing only in detail.

As described in section 2.3.1, demosaicing is performed to estimate the missing color channels in each pixel. Since noise appears at pixel level, it basically affects either the red, green or blue pixels, according to the Bayer CFA (see section 2.3.1). When demosaicing is performed the noise in each color channel is interpolated into the total image. This will result in different noise in different color channels, but since this thesis handles the images as panchromatic images, i.e. one channel (see section 3.5), this chromatic noise will not be measured per color channel.

There are two basic types of noise:

• Spatial noise (fixed pattern noise) that is constant over time and due to sensor irregularities.

• Temporal noise that varies randomly each time an image is captured.

All digital cameras on the market try to minimize both types of noise through image processing. This is done in a variety of ways, but since the approach in this thesis is to measure the noise that is present in the final image, the details of the performed image processing will not be discussed.

2.6.1 Spatial Noise (Fixed Pattern Noise)

Spatial noise, also known as Fixed Pattern Noise (FPN), is noise that is constant over time. It is characterized by the same pattern of noise occurring in images taken with the same camera in the same illumination. The source of FPN may be differences in sensitivity between the pixels in a sensor. The two most significant types of sensitivity-deviating pixels are "hot" and "cold" pixels. A hot pixel is a picture element with an excessive charge compared to the surrounding pixels and a cold pixel is, conversely, a pixel with low or no sensitivity.

There are two types of FPN. Pixel-to-pixel FPN is noise that is mostly visible when the camera is panning and looks similar to dust or grain on the lens. It can be approximated with a Gaussian distribution. The other type of FPN, structured FPN, appears as columns or rows in the image due to, for example, uncorrelated row-wise operations in the image sensor [10]. Structured FPN is easily detectable by the human eye and possible to suppress, and in some sense measure, by subtraction of a reference image. In this thesis however, the locations on the target used for measuring noise are thin circular areas (see section 3.1), hence it is impossible to detect column-like structures.

2.6.2 Temporal Noise

Most temporal noise in an image from a CCD- or CMOS sensor is partly generated in the sensor (so called read noise) and partly intrinsic to the photon flow from the scene (photon shot noise, see section 2.6.2).

Mathematically, noise is often modeled with Probability Density Functions (PDF) to describe the statistical behavior of the signal variations in the noise. The following types of temporal noise are the most common in the images produced by digital surveillance cameras with CCD and CMOS sensors.


Gaussian noise

The Gaussian noise model is often used to simulate image noise because of its mathematical tractability in both the spatial and frequency domain. The PDF of a Gaussian random variable, z, is given by

$p(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(z-\mu)^2 / 2\sigma^2}$ (2.13)

where z is the gray level, µ is the mean or average value of z and σ its standard deviation. The values the noise takes on have a Gaussian (also known as normal) distribution, which means that the values cluster around a mean value. In figure 2.15 an image with added Gaussian noise (σ = 25.5) is shown.

Figure 2.15: A test pattern image containing Gaussian noise.
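A minimal Matlab sketch (for illustration only) of what Gaussian noise with the σ used in figure 2.15 does to a flat gray field, and how σ can be estimated back from the noisy data:

% Add zero-mean Gaussian noise (equation 2.13) to a flat 8-bit gray field.
clean     = 128 * ones(256, 256);                % flat gray field
noisy     = clean + 25.5 * randn(size(clean));   % sigma = 25.5, as in figure 2.15
noisy     = min(max(noisy, 0), 255);             % clip to the 8-bit range
sigma_est = std(noisy(:));                       % close to 25.5 (clipping biases it slightly)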

Photon Shot Noise

Essentially, most image acquisition devices are photon counters. In every pixel in a sensor, the number of received photons gives the intensity of signal in that pixel. The distribution of photons reaching a pixel p is usually modeled as a Poisson distribution with parameter λ.

$P(p = k) = \frac{e^{-\lambda} \lambda^k}{k!}$ (2.14)

An interesting property of the Poisson distribution is that the variance is equal to the expected value. Consider two areas with different levels in an image. If the image only contains photon shot noise, the noise variance will be greater in the brightest area since the number of counted photons is greater there. Note that the signal is also greater here and that thus the signal-to-noise ratio is indeed larger in the brightest areas. The signal-to-noise ratio due to photon shot noise is √N, where N is the number of captured photons.

When λ increases the Poisson distribution approaches a normal distribution (see figure 2.16), which implies that the photon noise can be approximated by a normal distribution with mean and variance both equal to λ [1]. This approximation simplifies the simulation of a reference image only containing photon shot noise (this is described in detail in section 3.8.1).

Figure 2.16: The Poisson distribution approaches a normal distribution when λ increases.
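A small Matlab sketch (for illustration only, with assumed photon counts) of the normal approximation and the √N behavior of the signal-to-noise ratio described above:

% Simulate photon shot noise with the normal approximation (mean = variance = lambda).
lambda = [20 200 2000];                           % assumed mean photon counts
for i = 1:numel(lambda)
    counts = lambda(i) + sqrt(lambda(i)) .* randn(1, 1e5);   % approximately Poisson(lambda)
    snr = mean(counts) / std(counts);             % tends towards sqrt(lambda), i.e. sqrt(N)
    fprintf('lambda = %5d  ->  SNR ~ %.1f\n', lambda(i), snr);
end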

Impulse noise

Impulse noise, also called salt & pepper noise because of its appearance in the image, often arises when transmitting images over noisy digital links [1], but in CCD/CMOS sensors it is often the result of malfunctioning pixels. The PDF of impulse noise is given by

$p(z) = \begin{cases} P_a & \text{for } z = a \\ P_b & \text{for } z = b \\ 0 & \text{otherwise} \end{cases}$ (2.15)

If b > a, gray level b will appear as a light dot in the image. Conversely, gray level a will appear as a dark dot. Figure 2.17 shows an image with added salt & pepper noise.

Figure 2.17: A test pattern image containing salt & pepper noise.
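A minimal Matlab sketch (for illustration only; the probabilities are assumed values) of how salt & pepper noise with the PDF in equation 2.15 can be added to an 8-bit image:

% Add salt & pepper noise with probabilities Pa (pepper) and Pb (salt).
img = 128 * ones(256, 256);          % flat gray test image
Pa  = 0.02;                          % assumed probability of a dark dot (z = a = 0)
Pb  = 0.02;                          % assumed probability of a light dot (z = b = 255)
r   = rand(size(img));
img(r < Pa)     = 0;                 % pepper
img(r > 1 - Pb) = 255;               % salt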

Temporal Row- or Column Noise

Noise that is structured in rows or columns is another type of temporal noise sometimes seen in images captured by digital sensors. This kind of noise is similar to structured FPN in a snapshot, but since it is a temporal noise these rows and columns appear, disappear and move with time. Since the measurements in this thesis are unable to detect structured noise (see section 2.6.1), this type of noise is measured as if it were part of the Gaussian distributed noise.

2.7 Exposure Time Estimation

There is a need to be able to estimate the exposure time for a particular camera in different amounts of light. It is important to know how the camera handles especially low light situations and to see if low light performance is traded for a short, action-freezing, exposure time. In a surveillance situation it is rarely the immobile objects in the image that are of interest but the moving, changing parts of the view. Therefore the exposure time should be short enough not to blur the moving objects. It can be hard to get a feel for how much a slow exposure time affects the blurring of a moving object and it is therefore desirable to measure or approximate the actual exposure time.


Chapter 3

Implementation

3.1

Target Design

Prior to this thesis, Axis proposed a target design for the purpose of measuring resolution and noise and estimating exposure time. The objective of this thesis is to evaluate the image quality in video cameras (see section 1.3) and it is therefore interesting to look at both stationary and moving objects. To make it possible to acquire information in the same way regardless of whether the captured object is moving or not, the target has a circular design. This design worked as a starting point in this thesis and it was evaluated during the development of the application. The original target was found to be well designed and only small changes were made to it. The final target is shown in figure 3.1 and the different components and their purposes are described in detail in the following sections.

3.1.1

Siemens Star

The Siemens star pattern (figure 3.1A), named after the Siemens company, was developed in the 1930s for testing the focus of video cameras rather than for measuring resolution. It is very easy to visually inspect whether the camera is focused when observing the Siemens star through the camera. This is due to the appearance of highly visible Moiré patterns [8] towards the center of the star when it is in focus.

When deciding on the number of cycles in the Siemens star, it would be ideal to choose a number such that the Nyquist frequency is reached just outside the center of the star. This would allow almost the whole radius of the Siemens star to be used in the calculations.

It is however not possible to design a target in this way, as there is great variation in sensor sizes. The number of cycles has to be limited to the imaging unit that produces the lowest number of pixels. The drawback of this is that the accuracy for cameras which produce images with a larger number of pixels could be improved by using a higher number of cycles.

Figure 3.1: A. A Siemens star used for resolution measurement (section 3.1.1). B. Gray fields used for noise measurement (section 3.1.2). C. Pattern used for exposure time estimation (section 3.1.3).

One way of resolving this discrepancy is of course to use different numbers of cycles for different image sizes, but this is once again a question of complexity versus simplicity, as described in section 1.3. It is deemed impractical for the purpose of this thesis to be forced to change the target design based on the size of the image the camera outputs.
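To make the trade-off concrete: the local spatial frequency of a Siemens star with N cycles at radius r (in pixels, measured in the captured image) is N/(2πr) cycles per pixel, so the Nyquist frequency of 0.5 cycles per pixel is reached at r = N/π pixels. The Python sketch below only illustrates this relationship; the 72-cycle example is a hypothetical value, not necessarily the number of cycles used in the thesis target.

```python
import math

def nyquist_radius(num_cycles):
    """Radius (in pixels) where a Siemens star with the given number of
    cycles reaches the Nyquist frequency of 0.5 cycles per pixel.

    The local frequency at radius r is num_cycles / (2 * pi * r).
    """
    return num_cycles / math.pi

# Example: a 72-cycle star aliases inside a radius of about 23 pixels,
# regardless of how large the star appears in the image.
print(nyquist_radius(72))  # ~22.9
```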

3.1.2

Gray Fields

The purpose of the three solid gray fields in the target (figure 3.1B) is to measure the noise in the images. The levels differ by 20 units between fields, and the corresponding pixel values are 108, 128 and 148 respectively (assuming the image levels range between 0 and 255, i.e. 8 bits). The reason for making three fields with different levels is to be able to normalize the noise measurement, since different image processing techniques might change the properties of an image. This is described in section 3.7.2.

The fields are solid all around the circle because they need to be invariant when the target rotates. The reason for this is that the data is acquired and measured in the same way regardless of whether the target is rotating or stationary.
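As a rough illustration of how such fields can be used, the Python sketch below measures the noise standard deviation and mean level in a set of regions of interest. The ROI coordinates and the use of the measured level step for normalisation are assumptions made for this example and do not reproduce the exact method described in section 3.7.2.

```python
import numpy as np

def gray_field_stats(image, rois):
    """Measure noise (standard deviation) and mean level in solid gray fields.

    rois is a list of (row_slice, col_slice) tuples, one per gray field.
    """
    stats = []
    for rows, cols in rois:
        patch = image[rows, cols].astype(np.float64)
        stats.append({"mean": patch.mean(), "std": patch.std()})
    return stats

def measured_level_step(image, rois):
    """Average difference between adjacent field levels (nominally 20 units),
    which can be used to normalise the noise figures."""
    means = sorted(s["mean"] for s in gray_field_stats(image, rois))
    return float(np.mean(np.diff(means)))
```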

3.1.3

Exposure Time Pattern

In order to determine the exposure time the photographed target must be moving. There must also be reference markings on the target (figure 3.1C) in order to register any movement. Because of this, the markings must not be uniform in the direction of movement.

The goal is to register and measure the blurring effect the exposure time leaves on the markings. For this reason, the background in the field designed for this purpose is black. The reference markings are thin white bars which smear into gray areas of a certain length when the moving target is photographed.
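The underlying relationship is simple: with the target rotating at a known angular speed, the angular extent of the smear left by a thin bar is approximately the bar's own angular width plus the angular speed multiplied by the exposure time. The Python sketch below only illustrates this relation; the 30 rpm speed matches section 3.3.3, while the variable names and the bar-width correction are assumptions and do not reproduce the estimation procedure of section 3.9.

```python
def estimate_exposure_time(blur_angle_deg, bar_angle_deg, rpm=30.0):
    """Estimate exposure time from the angular extent of a smeared bar.

    blur_angle_deg: measured angular extent of the gray smear.
    bar_angle_deg:  angular width of the white bar itself (subtracted, since
                    even an infinitely short exposure leaves the bar visible).
    """
    omega_deg_per_s = rpm * 360.0 / 60.0  # 30 rpm corresponds to 180 deg/s
    return (blur_angle_deg - bar_angle_deg) / omega_deg_per_s

# Example: a smear spanning 10 degrees from a 1 degree bar at 30 rpm
# corresponds to an exposure time of 0.05 s (50 ms).
print(estimate_exposure_time(10.0, 1.0))
```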

3.1.4

Generating the Target

A small application was made to generate a digital target. If the characteristics of the target were to change in the future or if the target gets damaged or lost, the same application could be used to generate a new target. The target is then printed at a photo lab and mounted onto a circular acrylic glass pane.


Figure 3.2: The small C# application developed to generate the target.

The application is written in C# and provides the user with the possibility to change some parameters when generating the target, for example the size of the target and the number of cycles in the Siemens star (see section 3.1.1). The application is shown in its entirety in figure 3.2.
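The generator itself is a C# application; purely to illustrate how the Siemens star part of such a target can be rendered, here is a Python/NumPy sketch. The image size and cycle count are example values, not the parameters used for the printed target.

```python
import numpy as np

def siemens_star(size=1024, num_cycles=72):
    """Render a binary Siemens star with the given number of black/white cycles."""
    half = size / 2.0
    y, x = np.mgrid[0:size, 0:size]
    angle = np.arctan2(y - half, x - half)
    # Alternate black and white sectors num_cycles times around the circle.
    star = np.where(np.sin(num_cycles * angle) > 0, 255, 0).astype(np.uint8)
    # Blank everything outside a circular aperture.
    radius = np.hypot(x - half, y - half)
    star[radius > half] = 255
    return star
```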

3.2

The Test Process

The complete test process is shown in figure 3.3, where every box represents one step in the process. The following sections describe the different parts of this process.

3.3

Image Acquisition

To create a reliable image quality rating model certain factors were considered when determining the number of captured images.

As described in section 3.1, it is interesting to look at images of both a stationary and a rotating target. The measurements for resolution and noise are performed on both types of images, while it is only possible to estimate the exposure time in the images of a rotating target (see section 3.9).

Another factor affecting the number of captured images is the measurement of FPN. To be able to separate FPN from Gaussian noise two images are needed (see section 3.7.4).
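For orientation only, a generic two-image approach is sketched below in Python. It follows the common idea that fixed-pattern noise is identical in both frames and therefore cancels in their difference, while temporal noise does not; it is not necessarily the exact procedure described in section 3.7.4.

```python
import numpy as np

def separate_noise(img1, img2):
    """Rough two-image separation of temporal noise and FPN.

    Assumes img1 and img2 are two captures of the same flat gray field
    under identical settings.
    """
    a = img1.astype(np.float64)
    b = img2.astype(np.float64)
    # FPN cancels in the difference; the two independent temporal noise
    # realisations add in variance, hence the division by sqrt(2).
    sigma_temporal = np.std(a - b) / np.sqrt(2)
    # Total noise in a single frame (spread around its mean level).
    sigma_total = np.std(a)
    # FPN is the remaining variance when the temporal part is removed.
    sigma_fpn = np.sqrt(max(sigma_total**2 - sigma_temporal**2, 0.0))
    return sigma_temporal, sigma_fpn
```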


It is also interesting to study the camera performance in different illuminations, and therefore three illuminations were chosen. One illumination represents common indoor illumination and the second represents low illumination. Since some cameras perform poorly in low illumination, making the images immeasurable, the third illumination is set so that the majority of the cameras to be tested can produce measurable data at a moderate illumination. The three levels of illumination are shown in table 3.1.

Scenario                        Illumination
Common indoor illumination      400 lx
Moderate illumination           10 lx
Low illumination                1 lx

Table 3.1: Illuminations used in this thesis.

This results in 12 images that need to be captured in order to get a comprehensive image quality rating, i.e. 4 images (2 with a stationary and 2 with a rotating target) in each of the 3 illuminations. Another important thing to remember is that the test results are only valid when all 12 images are captured with the same camera settings (see section 3.3.2).

The image quality rating model is designed to measure different aspects of 8-bit JPEG images; hence all calculations are made with the assumption that the image levels range between 0 and 255. The images placed in the folder depicted in figure 3.3 should be named according to the order in which they were captured, i.e. ”1.jpg” for the first image, ”2.jpg” for the second image and so on. It is possible that the images in low illumination are immeasurable (see section 3.4.1) and it is therefore practical to produce one image quality rating per illumination. This way, a camera producing immeasurable images in low illumination will still get ratings for the two other illuminations.

3.3.1

Camera Alignment

The goal of the camera alignment is to center the target (figure 3.4C) in the image while the gray cloth (figure 3.4B) completely fills the frame. The left and right edges of the gray background should be just out of frame. This helps to ensure that the target fills approximately one third of the width of the frame. It is important that this is done properly for the test to work. The reason the target should not occupy a bigger area of the frame is that the closer the target gets to the edges of the frame, the more it is affected by potential lens distortion.

To aid in the camera alignment, certain guidelines must be followed. To begin with, the camera has to be mounted at the same height as the center of the target. Next, the camera should be pointed directly at the target, creating an imaginary line from the center of the camera sensor to the center of the target. This line should be normal to both the sensor plane and the target plane. To help with this, graded alignment labels (figure 3.4A) are included in the background.

Figure 3.4: A. Graded labels used for camera alignment. B. Gray cloth. C. The target.

3.3.2

Camera Settings

The lens of the camera should of course focus on the target to get any viable test results. As mentioned in section 3.1 it is easy to determine that the target is in focus by looking at an image of the target. If there are visible Moir´e patterns near the center of the target the focus is correct.

When testing cameras with zoom lenses it is worth noting that the optical properties of the lens usually change as the focal length changes. Consequently, the test result will only reflect the image quality at that focal length. It is recommended to reset the camera to factory default settings as a good starting point for a general rating of cameras. That being said, it can be interesting to test a camera that is configured for a low-light scene and compare it against the default settings of the same camera. It is once again important to point out that the test results only speak for the camera with the configuration used during the capture of the images.


3.3.3

Motor Settings

In order to get a reliable estimate of the exposure time (see section 3.9), the motor must run at a specified rotational speed. The speed is set to 30 rpm (revolutions per minute) in this thesis. To be able to compare test results of different cameras, the motor must always run at this specified speed.

3.3.4

Filters

As mentioned in section 3.3, three illuminations are used when acquiring images. The lab at Axis, where the images used in this thesis were captured, has a number of light sources mounted on a frame in the ceiling. To increase or decrease the illumination, light sources are switched on and off. A problem with this solution is that the target does not get evenly lit in low illumination. Neutral Density (ND) filters [19] are used to solve the problem described above. The ND-filters are placed just in front of the camera, as close to the lens as possible, and work like a kind of sunglasses for the camera. In this way, the illumination in the scene is constant (400 lx) and the amount of light hitting the sensor is controlled by the ND-filters. The transmittance of light through a filter is determined by the Optical Density (OD) of the filter according to the following equation:

T = 10^{-OD} \cdot 100 \qquad (3.1)

where T is the transmission, given in percent, and OD is the Optical Density. A property of the ND-filters is that Optical Density exhibits an additive relationship; for example, stacking filters with OD values of 0.4 and 0.5 gives a resultant density of 0.9. To achieve the illumination conditions given in table 3.1 using ND-filters, the following filters are used:

• A filter with OD 0.6 stacked together with a filter with OD 1.0 gives a total transmittance of 10^{-1.6} · 100 = 2.5% ⇒ 400 lx · 0.025 = 10 lx
• A filter with OD 0.6 stacked together with a filter with OD 2.0 gives a total transmittance of 10^{-2.6} · 100 = 0.25% ⇒ 400 lx · 0.0025 = 1 lx
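The transmittance calculation and the additive behaviour of stacked filters are simple enough to express directly. The following Python sketch reproduces the two stacking examples above; it is an illustration only and not part of the test software.

```python
def transmittance_percent(*optical_densities):
    """Total transmittance (in percent) of one or more stacked ND filters.

    Optical densities add when filters are stacked: T = 10^(-sum(OD)) * 100.
    """
    total_od = sum(optical_densities)
    return 10 ** (-total_od) * 100

# The two filter stacks used for the 10 lx and 1 lx conditions:
print(400 * transmittance_percent(0.6, 1.0) / 100)  # approximately 10 lx
print(400 * transmittance_percent(0.6, 2.0) / 100)  # approximately 1 lx
```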

3.4

Image Analysis

3.4.1

Validating Images

To make the measurements and estimations reliable, it is necessary to validate the acquired images. The validation does not take into consideration whether the target is misplaced or skewed. It does instead look at the dynamic properties
