
Institutionen för systemteknik

Department of Electrical Engineering

Master's thesis (Examensarbete)

GPU-Accelerated Real-Time Surveillance

De-Weathering

Master's thesis carried out in Computer Vision at the Institute of Technology, Linköping University

by

Niklas Pettersson LiTH-ISY-EX--13/4677--SE

Linköping 2013


GPU-Accelerated Real-Time Surveillance

De-Weathering

Master's thesis carried out in Computer Vision

at the Institute of Technology, Linköping University

by

Niklas Pettersson LiTH-ISY-EX--13/4677--SE

Supervisor: Freddie Åström

ISY, Linköping University

Malin Ingerhed

Saab Dynamics AB

Examiner: Michael Felsberg


Division, Department: Computer Vision Laboratory, Department of Electrical Engineering, Linköping University, SE-581 83 Linköping

Date: 2013-06-26

Language: English

Report category: Master's thesis (Examensarbete)

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-92835

ISRN: LiTH-ISY-EX--13/4677--SE

Title: GPU-accelererad väderfiltrering för övervakningssystem i realtid (GPU-Accelerated Real-Time Surveillance De-Weathering)

Author: Niklas Pettersson

Abstract

A fully automatic de-weathering system that increases the visibility and stability in surveillance applications during bad weather has been developed. Rain, snow and haze during daylight are handled in real time with acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no need for special hardware except an NVidia GPU. The system does not use any background model and does not require any pre-calibration. An increase in contrast is obtained in all haze/rain/snow cases, while the system lags by at most one frame during rain or snow removal. De-hazing can be obtained for any distance to simplify tracking or other algorithms operating on a surveillance system.



Copyright

The publishers will keep this document online on the Internet — or its possible replacement — for a period of 25 years from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for his/her own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/


Abstract

A fully automatic de-weathering system that increases the visibility and stability in surveillance applications during bad weather has been developed. Rain, snow and haze during daylight are handled in real time with acceleration from CUDA-implemented algorithms. Video from fixed cameras is processed on a PC with no need for special hardware except an NVidia GPU. The system does not use any background model and does not require any pre-calibration. An increase in contrast is obtained in all haze/rain/snow cases, while the system lags by at most one frame during rain or snow removal. De-hazing can be obtained for any distance to simplify tracking or other algorithms operating on a surveillance system.


Acknowledgments

Without the great help from Saab Dynamics AB this thesis would never have existed. Everything went very smoothly and I had no complications with any equipment they provided me with from the very first day. I also got a good view of the Saab Dynamics AB company and great experience working as an engineer. A special thanks to my supervisor Malin Ingerhed at Saab Dynamics AB, who has given me positive support throughout the whole semester, to my supervisor Freddie Åström at Linköping University for technical feedback and very helpful advice on this report, and to my examiner Michael Felsberg.

I would like to thank the division of Image Processing at Saab Dynamics AB for all the support, help and interest in my thesis work, as well as for letting me be a part of your team.

I would also like to thank my family and friends for all the support you have given me. Finally, access to the roof lab at Saab Dynamics AB made it possible to record data. Even though I only spent time there when the weather was bad, fresh air is always nice.


Contents

1 Introduction
  1.1 Introduction
  1.2 Different kinds of weather
  1.3 Contributions
  1.4 System overview
  1.5 Hardware
    1.5.1 Camera
    1.5.2 GPGPU and CUDA

2 Methods and previous work
  2.1 Image filtering
    2.1.1 The Bilateral filter
    2.1.2 The Guided filter
  2.2 Haze removal
    2.2.1 Models
    2.2.2 CLAHE
    2.2.3 Fattal
    2.2.4 Dark Channel Prior
  2.3 Rain and snow removal
    2.3.1 Two-step approach
    2.3.2 Vision and Rain
    2.3.3 Histogram of Orientations of Streaks
    2.3.4 Analysis in the frequency space
    2.3.5 Motion Robust Rain Detection and Removal

3 Introducing the de-Weathering system
  3.1 System structure
  3.2 Base software
  3.3 Static weather removal
    3.3.1 The algorithm
  3.4 Dynamic weather removal
    3.4.1 Rain segmentation
    3.4.2 Snow segmentation
    3.4.3 Rain classification
    3.4.4 Snow classification
    3.4.5 Rain removal
    3.4.6 Snow removal
    3.4.7 Temporal averaging filter

4 Results and system evaluation
  4.1 Contrast measurement
  4.2 Comparing filter implementations
  4.3 Evaluation per module
    4.3.1 Haze module
    4.3.2 Dynamic weather modules
    4.3.3 Performance measure
    4.3.4 Visual result of removed rain/snow
    4.3.5 Evaluation of mean filtering of five frames
  4.4 Evaluation of the complete system
    4.4.1 Performance
    4.4.2 Tracking conditions

5 Conclusions and future work
  5.1 Conclusions
  5.2 Future work
    5.2.1 Performance
    5.2.2 Robustness
    5.2.3 More situations


Chapter 1

Introduction


1.1 Introduction

Today there is a growing interest in investing in surveillance systems to monitor important areas. Images are normally monitored by an operator or automatically by some tracking/recognition algorithm. Operating such a system works fine until the visibility in the images is degraded by weather. Snow, rain or haze can cause automatic systems to fail dramatically, and operators can have a hard time identifying objects within the scene. This thesis deals with this problem by developing a system to reduce the effects of weather.

Surveillance of large open areas such as airports, train stations or busy roads is sensitive to weather. Those areas also have constant movement of large objects, such as airplanes, in the scene. Surveillance of protected places requires images in real time without latency, and processing of such video has to meet this requirement. Other areas might not be as critical, allowing playback and off-line processing. Depending on the environment, weather in images can look different, which is discussed in Section 1.2. Surveillance systems are used widely in various situations, which makes it impossible to develop a de-weathering system without making restrictions.

The problem to solve in this thesis assumes that the camera is stationary and that the only movement of the camera comes from shaking. Not dealing with camera motion relaxes the problem to the assumption of a stationary background. Only daylight images have been processed, where the scene has a uniform illumination. Night and artificial illumination are left for future work. This thesis also uses monocular cameras, which makes de-hazing a difficult task since good estimates, such as depth, are required in order not to ruin the image. Background information or images from a clear day are usually not provided, and thus this system does not require such information. By looking at previous works, algorithms were selected and modified to fulfill the requirements of this system. A review of those works can be found in Chapter 2.

1.2 Different kinds of weather

Weather-caused image distortions can be split into two groups, static and dynamic. Haze, fog and clear weather lie within the static weather category, as time will not affect the appearance of the scene. Haze and fog degrade the image like a filter whose strength increases with distance, up to a point where it is no longer possible to see through at all. Dynamic weather contains all kinds of visible particles such as rain and snow. Rain often produces streaks in the image due to the high speed of the raindrops combined with the camera exposure time. Snowflakes have more air resistance than raindrops and do not fall as fast, making the streak effect less visible. Only the particles that lie within the camera's focal distance are visible in the image. Distant rain and snow appear as static weather similar to haze. The scene gets a different appearance when the ground is covered by snow instead of grass. Sections 2.2, 2.3, 3.3 and 3.4 contain more information about different weather situations and behavior. Figure 1.1 depicts four images of the same scene during different weather conditions.

Figure 1.1: Different weather with different visibility. Top left: A clear day, Top right: An almost clear winter day, Bottom left: A hazy winter day, Bottom right: More dense haze in winter.

1.3 Contributions

This thesis deals with algorithms that aim to increase viewing distance and improve object tracking conditions during bad weather. The system is built from state-of-the-art research algorithms that have been sped up using the recent GPGPU programming architecture CUDA by NVidia. The system is an autonomous video enhancer for bad weather situations, and it offers a wider spectrum of de-weathering than most previous work. A novel method is used to reduce the complexity and simplify the de-hazing algorithm.


1.4 System overview

The system is divided into two modules: static and dynamic weather removal/processing. These modules can be used individually or together to obtain a wider range of de-weathering. An input image is captured from a video stream sent from a network surveillance camera. The operator chooses how to process the video by input from the keyboard. Processed video is then displayed on the screen, saved to disk or sent back to the network stream. In Figure 1.2 a flowchart of the system is provided.

Figure 1.2: Top: Video stream flowchart showing the available filtering methods, Middle: Static processing flowchart of the main components, Bottom: Dynamic processing flowchart.


1.5 Hardware

The hardware used is a network surveillance camera connected to a PC. Even though the PC used in this work is a high-end workstation, the project aimed to use no special hardware other than a GPU with CUDA support. An NVidia GTX680 together with a dual eight-core Xeon 3.3 GHz CPU has been used when creating the results in Chapter 4.

Algorithms are accelerated with NVidia's CUDA parallel programming language to achieve real-time performance for algorithms previously reported as slow. CUDA was chosen because of the author's previous experience and because the hardware used supports CUDA. This limits the system to NVidia graphics cards only. As no special hardware except the GPU is used in this work, the system can easily be applied to already installed surveillance systems. The algorithms are implemented in Matlab, C, and C with CUDA acceleration to show the strengths of using a parallel computing unit such as the GPU.

1.5.1 Camera

The camera used is a Basler BIP2 network camera with resolutions up to 1920x1080 and various streaming formats. This system takes MJPEG compressed video as input. Video compression with MJPEG does not depend on future frames, which makes it well suited for real-time applications. The camera is stationary during the recordings. The Basler camera is powered through a Power over Ethernet switch, resulting in a highly portable camera that only requires a network cable.

1.5.2 GPGPU and CUDA

GPGPU, or General-Purpose computing on Graphics Processing Units, has recently extended the application areas of graphics hardware thanks to the changed architecture of graphics cards. Previous GPUs had one purpose, to deal with graphics, and the hardware was built to fit the graphics rendering loop. Designed for specific tasks, such as geometric transformations, applying textures or rendering the output image, it was hard to use in other applications. NVidia changed this back in 2007 when their G80 series was released. NVidia introduced a GPU with a central computing unit made for general instructions. The G80 did not only improve the graphics performance but also made it easy to do other kinds of computations on the board. To communicate with the GPU, NVidia has developed CUDA, or Compute Unified Device Architecture, a language built on the programming language C. NVidia is the owner and developer, but CUDA is free to use. Today CUDA is only supported on NVidia GPUs with the G80 architecture or later. GPGPU programming has been favored for simulations and image processing algorithms, as those often take advantage of the parallelism. Another GPGPU language is OpenCL, which is widely used and works with other hardware than just NVidia's. There are reasons to prefer CUDA over OpenCL, such as the similarity to C instructions and the equal or superior performance that CUDA has. In general, the GPU cannot read from the computer's main memory, so all data needs to be transferred from Host to Device. Host is a codename for the CPU, and Device is a plug-in hardware unit that can communicate with the Host, an off-chip device such as a GPU. The time cost of this transfer is hidden by the parallel computations on the GPU. Only algorithms that can be parallelized are expected to run faster on the GPU than on the CPU, and the speedup mainly depends on how many cores on the GPU can be utilized. Independent pixel calculations in image processing can be highly parallelized, whereas algorithms with data-dependent iterations cannot. A speedup factor of 8 is easy to achieve, and often algorithms approach 100 times faster [1]. The programmer needs to know a few things about the GPU architecture to be able to use it the way it is supposed to be used. CUDA works with threads that run in parallel; threads within a block are able to communicate with each other, and each thread has a unique ID. Threads should be spawned in thousands, even if the GPU has fewer cores, to keep a high workload. An upper limit of 64K blocks can be created, where each block holds up to 1024 threads. Figure 1.3a [2] illustrates the older-generation (Geforce 7800) GPU architecture with the graphics rendering loop built into hardware. Figure 1.3b from [3] illustrates the architecture of the GTX680 graphics card (GK104) with its massive amount of cores; it has 1536 cores that can be used at the same time. The similarity of the GTX680 to a heavily parallel CPU makes it much easier to write general-purpose code on it.

Figure 1.3: (a) The Geforce G70 architecture (2005); the graphics rendering loop is done in hardware. (b) The GK104 architecture (2012); green squares are cores that can be run in parallel.


One major difference from programming on the Host, other than the massive parallelization, is that there is a set of different memories available on the Device. NVidia has chosen to name these Global, Shared, Texture, Constant and Cache memory.

Global

The Global memory is accessible to all running threads; it is an off-chip memory through which the Device and Host communicate. Unfortunately the Global memory has slow read and write speeds, up to 600 clock cycles. It is important to make use of the wide bus to transport data from off-chip memory to on-chip super-fast memory. The Global memory is the largest among the different memories, and nowadays cards commonly have up to 4 GB of Global memory.

Shared

Shared memory is an on-chip memory that is shared between threads within a block. It is a small memory, normally limited to 64 KB per block, with very fast read and write speeds. Programmers can use it as a manual cache for data that is going to be reused multiple times and thereby improve processing time.

Constant

Constant memory is a read-only memory located in a reserved space of the Global memory. Data can be read out faster since the GPU optimizes this memory for read-only access.

Texture

Texture memory is like the Constant memory, with the only difference that 2D textures can be mapped directly to the Texture memory. There exist more optimized read instructions for Texture memory, where 32 bits of data can be read at once; this is commonly used in image processing algorithms.

Cache

Like most processing units, the CUDA architecture also has a cache close to the cores. This memory cannot be modified by the programmer, but it helps to know how it works when designing an algorithm for performance. Avoiding cache misses gives faster execution.
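To make the Host/Device split and the per-pixel thread model above concrete, the following minimal CUDA sketch inverts a grayscale image: data is copied to Global memory, one thread is spawned per pixel, and the result is copied back to the Host. The kernel name, image size and launch configuration are chosen for this illustration only and are not taken from the thesis code.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

/* One thread per pixel: each thread reads and writes one element in Global memory. */
__global__ void invert_kernel(const unsigned char *src, unsigned char *dst, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int idx = y * width + x;
        dst[idx] = 255 - src[idx];
    }
}

int main(void)
{
    const int width = 640, height = 480;
    const size_t bytes = (size_t)width * height;

    unsigned char *h_img = (unsigned char *)malloc(bytes);      /* Host buffer */
    for (size_t i = 0; i < bytes; ++i) h_img[i] = (unsigned char)(i % 256);

    unsigned char *d_src, *d_dst;                                /* Device (Global memory) buffers */
    cudaMalloc(&d_src, bytes);
    cudaMalloc(&d_dst, bytes);

    /* Host -> Device transfer. */
    cudaMemcpy(d_src, h_img, bytes, cudaMemcpyHostToDevice);

    /* Launch thousands of threads; 16x16 blocks cover the image. */
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    invert_kernel<<<grid, block>>>(d_src, d_dst, width, height);

    /* Device -> Host transfer of the result. */
    cudaMemcpy(h_img, d_dst, bytes, cudaMemcpyDeviceToHost);
    printf("first output pixel: %d\n", h_img[0]);

    cudaFree(d_src); cudaFree(d_dst); free(h_img);
    return 0;
}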


Chapter 2

Methods and previous work


2.1 Image filtering

In the following sections two image filtering methods are presented. Both methods aim to preserve edges and refine a low-detail image by using a guiding image. The Bilateral filter [4] is a commonly used filter and the Guided filter [5] is a more recently developed one. When the Bilateral filter uses a second image as guidance, it is called a joint-bilateral or cross-bilateral filter [6].

2.1.1 The Bilateral filter

The Bilateral filter [4] is easy to implement but comes with a computational complexity of O(N^2). The filter kernel takes distance and intensity difference as weights. The Bilateral filter is described as

h(x, y) = k^{-1}(x, y) \sum_{u=x-r}^{x+r} \sum_{v=y-r}^{y+r} f(u, v) \, e^{-0.5 \left( d((u,v),(x,y)) / \sigma_d \right)^2} \, e^{-0.5 \left( d(I(u,v),G(u,v)) / \sigma_p \right)^2}    (2.1)

where k is a normalization factor,

k(x, y) = \sum_{u=x-r}^{x+r} \sum_{v=y-r}^{y+r} e^{-0.5 \left( d((u,v),(x,y)) / \sigma_d \right)^2} \, e^{-0.5 \left( d(I(u,v),G(u,v)) / \sigma_p \right)^2}    (2.2)

f(u, v) is a weighting function, G(u, v) is the image to be filtered and d is a distance measure. In a cross-bilateral filter the weighting function is the guiding image itself. The exponential terms are Gaussian distribution functions, whose two sigmas describe the weights. The variable \sigma_d is the standard deviation for the geometrical distance, and changing this value affects how much the filter blurs the image. The variable \sigma_p, on the other hand, is the standard deviation for the Euclidean gray-level difference and affects how sensitive the filter is around edges. In this work I is the guiding image and G is the image to be filtered. An RGB color map is not well suited in this application because of the distinct colors between the intensity values. Therefore, this work converts RGB to intensity, I. Image G is already a grayscale image.
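As an illustration of how (2.1)-(2.2) map to one thread per pixel, the CUDA kernel below sketches a cross-bilateral filter in which the range weight is taken from gray-level differences in the guidance image, one common formulation of the joint-bilateral filter. Buffer names, the clamped border handling and the parameter names are assumptions of this sketch, not the thesis implementation.

/* Joint (cross-) bilateral filter sketch: smooths img using range weights from guide.
   sigma_d: geometric std dev, sigma_p: gray-level std dev, r: window radius. */
__global__ void cross_bilateral(const unsigned char *img, const unsigned char *guide,
                                unsigned char *out, int w, int h,
                                int r, float sigma_d, float sigma_p)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float center = (float)guide[y * w + x];
    float sum = 0.0f, norm = 0.0f;

    for (int v = -r; v <= r; ++v) {
        for (int u = -r; u <= r; ++u) {
            int xs = min(max(x + u, 0), w - 1);        /* clamp at the image borders */
            int ys = min(max(y + v, 0), h - 1);
            float gd = (float)(u * u + v * v) / (sigma_d * sigma_d);   /* geometric term  */
            float diff = (float)guide[ys * w + xs] - center;
            float gp = (diff * diff) / (sigma_p * sigma_p);            /* gray-level term */
            float wgt = expf(-0.5f * (gd + gp));
            sum  += wgt * (float)img[ys * w + xs];
            norm += wgt;
        }
    }
    out[y * w + x] = (unsigned char)(sum / norm + 0.5f);
}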

2.1.2 The Guided filter

The Guided Image Filter [5] is the second filter studied in this work. For edge-preserving filtering, like the Bilateral filter provides, the Guided filter is a superior candidate. It is closely related to Laplacian matting and Bilateral filtering. It takes two images as input, one guiding and one to be filtered; the two images might be the same or two versions of the same scene. Besides giving superior results compared to similar methods, this filter comes with a computational complexity of O(N), independent of the filter radius. A more detailed description of the implementation is given in Section 3.3.2.


2.2 Haze removal

The haze removal problem has been addressed for several decades. Early solutions use polarizing filters in front of the camera to remove scattered light from haze. This increases the viewing distance slightly but quickly reaches a limit. Using image processing, there are different methods [7][8] to estimate how much haze is present, and some methods [9] are built on knowledge of the distance to the scene. This could be done using stereo cameras or by measuring the distance with other methods; such methods are not suited for all situations. It is only recently that researchers have tried to solve this using a single image as input. The simplicity of monocular solutions is highly appreciated for surveillance systems, as they are cheaper than the equipment for more advanced solutions.

2.2.1 Models

Generally there are two models to describe the problem of removing haze in images, one physical and one non-physical. To enhance the image with a non-physical model, methods such as contrast enhancement, histogram equalization or anisotropic diffusion can be used. The non-physical model [10] tries to restore contrast in the image by looking either locally or globally in the image. Algorithms that use a non-physical model try to increase the contrast of the image in various ways. One such method, known for easy computations and pleasing results, is the CLAHE method [10]. It has a good reputation [11] and is used in algorithms for video enhancement. A more sophisticated algorithm has to be chosen for a physics-based model, and in most applications this gives a result superior to the non-physical one. The physical model takes the actual particles in the air into account.

Understanding of the physical model

The physical model relies on Koschmieder's law [7], which states that the apparent light reaching an observer comes from two sources, one direct and one airlight. The latter is light scattered from air particles that lie between the object and the observer. Formally, we have

I(x, y) = I_0(x, y) e^{-k d(x, y)} + I_\infty \left( 1 - e^{-k d(x, y)} \right)    (2.3)

where the left-hand side is the received light at pixel position (x, y), I_0 is the direct term and I_\infty is the airlight term. This holds in a homogeneous atmosphere, and k is the scattering coefficient. The equation also states that the scene radiance is attenuated exponentially with the distance to the object, d. Figure 2.1 illustrates how the light rays are affected by haze.


Figure 2.1: The components in the physical model.

Equation (2.3) can be rewritten into (2.4) which is an equation used in computer vision and computer graphics to model haze in a scene.

I(x, y) = J(x, y) t(x, y) + A (1 - t(x, y))    (2.4)

where J is the direct term and A is the scattering term, the airlight. The transmission, t(x, y), describes the amount of light in J that is not scattered. In a physical model, the airlight models the brightness of pure haze where no objects are visible. The airlight can be represented as a grayscale value or as a brightness for each color channel. Physical models may use more information than a single image, such as geometric measures or multispectral images, but those fall outside the scope of this thesis. More interesting are the physical model-based algorithms that only use input from single images [7][8]. The following sections briefly describe previous works of interest.

2.2.2 CLAHE

CLAHE, Contrast Limited Adaptive Histogram Equalization [10][12], uses an advanced histogram equalization technique. It is also the most used method to de-haze images and video sequences because of its easy implementation and often satisfying results. There are two different ways to enhance the image, one global image equalization and one local. The local one usually gives better performance but can suffer from block effects [10]. The foreground and background in the image are separated by looking at temporal statistics. A pixel with constant or almost constant intensity/color will be classified as background, while alternating intensity/color will be classified as foreground. This method also tries to estimate what the static background behind moving objects looks like. The last step requires a large number of frames to be accurate. Usually, foreground extraction is described as

M_i(x, y) = \begin{cases} 1, & \| F_i(x, y) - NB_i(x, y) \| > \alpha \\ 0, & \| F_i(x, y) - NB_i(x, y) \| \le \alpha \end{cases}    (2.5)

where F_i(x, y) is the current frame, NB_i(x, y) is the background frame and \alpha is a user-set threshold. The CLAHE method is then applied to the background and foreground separately by calculating local histograms in a neighborhood of M x N pixels. A threshold decides whether the histograms are going to be clipped or not. The clipping limits the maximum allowed intensity, which also increases the robustness to noise. Each neighborhood is then equalized and intensity values are distributed on the new allowed set of graylevels. The average number of pixels in each graylevel, N_{avg}, for a new histogram is given by

N_{avg} = \frac{N M}{N_{graylevel}}    (2.6)

where N and M are the neighborhood sizes and N_{graylevel} is the number of existing graylevels. Using N_{avg}, the histogram can be equalized and the region is enhanced. The method uses several frames to estimate the background and is thus sensitive to moving scenes when the system is switched on. The background estimation also makes this method difficult to use on single images, but it is a common method in surveillance and off-line applications.

2.2.3 Fattal

In [7] a single-image de-hazing method based on estimating the airlight is described. In other articles, such as [13], [8] and [14], it is one of the primary methods used for comparison, and it is usually classified as state of the art. It is a physics-based method with focus on scattered light, described by a transmission formula. In [7] a color extension of (2.4) is used, which reads

I^c(x, y) = t(x, y) J^c(x, y) + (1 - t(x, y)) A^c    (2.7)

where I^c(x, y) is defined on all the color channels, c \in \{R, G, B\}, and t(x, y) is the transmission map describing how much scattered light interferes with the light reflected from the scene object. A^c is the airlight color vector and J is the reflected light from the scene object.

The airlight-albedo may differ between every pixel, which increases the complexity of the problem. By letting nearby pixels share a constant albedo for a small region, the degree of complexity decreases. In [7], J is modeled as a pixel-wise product between the surface albedo and a shading factor, Rl. R describes the RGB vector of the surface reflectance coefficients and l is a scalar describing the light reflected from the surface. The locally constant albedo gives that R(x, y) is also locally constant.

An important assumption is that the transmission map, t(x, y), and the shading function, l, are uncorrelated, which gives a solution by estimating the transmission map from the scene illumination. Most of the time there exists no constant albedo over the whole image but multiple albedos (multiple surfaces).

The resulting depth map can, besides de-hazing, be used to refocus images. There are some areas where this method fails, for example when there is insufficient signal-to-noise ratio in the image; the computational time is also among the highest of available de-hazing algorithms. Too much noise gives the wrong transmission map.

2.2.4 Dark Channel Prior

Dark channel prior [8] is a method that uses the same concept as in (2.7). The algorithm estimates a distance map over a scene to finally get a transmission map, t(x). The theory relies on statistics from haze-free outdoor images, where at least one of the RGB channels has low intensity. According to [8] this is true for 90% of the pixels that are not sky pixels. From the statistics, most images will have this behavior, and in most cases this method gives extraordinary results [13]. The dark channel is calculated by inserting the lowest value of R, G and B into a monochrome image. To improve the precision, the algorithm looks in a neighborhood of n x n pixels. The given output is a rough estimate of a distance map taken from a single image. This can be almost directly translated to a transmission map by the simple transformation formula

\tilde{t}(x) = 1 - \omega \min_{y \in \Omega(x)} \left( \min_c \frac{I^c(y)}{A^c} \right)    (2.8)

where \omega is a constant that prevents the filtered image from having all haze removed, as that would look unrealistic. Since the distance map is created relative to the airlight, the model is valid if the objects in the scene do not have an appearance very similar to the haze. The transmission has large blocks when \Omega is large and, if used directly to reconstruct the image, it will cause an output with visible block effects (halos). This is solved by refining the transmission map using an edge-preserving filter guided by the original image. In [8] a soft matting Laplacian method is used for the refinement. In [6] a cross-bilateral filter is used, and in [5] a more sophisticated filter, named the Guided image filter, is used. The Guided filter has better performance than the Bilateral filter in both computational complexity and quality, and there are many similarities between the Guided filter and the soft matting Laplacian method. In [8] the airlight is estimated from the dark channel: the 0.1% brightest pixels in the blocky dark channel are taken and the airlight is chosen as the brightest among those pixels in the original image. This gives a better estimate than simply choosing the brightest pixel of the original image. The final image is then calculated as

J(x) = \frac{I(x) - A}{\max(t(x), t_0)} + A    (2.9)

The algorithm is computationally heavy, and [8] reports computation times of 10-20 seconds for a 0.25 megapixel image using a 3.0 GHz Intel P4 processor. What is also stressed is that the choice of filter is very important to avoid visible halo effects. The main strengths of this method are the preservation of colors and edges. One drawback is that the method might fail in some images, for example under strong sunlight. The model does not take different scattering for different wavelengths into account, which is an approximation of reality. There have been many improvements that make the filters faster [15], but at the cost of reduced image quality. Simple assumptions and well-preserved image quality give this algorithm a great advantage over others.


2.3 Rain and snow removal

Rain and snow differ from haze because of the bigger particles. In [16], extensive research on how the camera sees rain and snow is presented. The particles are modeled with raindrop size, exposure time and background light taken into account. These studies can be used in advanced graphics where rain should be rendered, detected or removed. Rain has another impact on the image than haze: the streaks from rain can be annoying for an observer, and a processing algorithm might fail. As rain and snow particles move, the temporal variation is an important factor for identifying and removing the particles. The visible particles have to be within the camera's depth of field, and the ones that are too distant will appear static with an appearance similar to haze. Rain and snow have different appearances when the particles are visible; rain has a higher velocity than snow, which has more air resistance.

Figure 2.2: Rain affecting the view.

There are many previous studies on methods for enhancing rain- and snow-degraded images. Often a background model is estimated to segment the rain or snow particles [16]. Some work allows a moving camera, which makes the problem much more complex. Others have looked in the frequency space [17] to find and filter rain/snow. Some algorithms do the processing on-line [18] while others do it off-line [19]. Some concepts of the off-line techniques are of interest, such as [19]. The work of [16] also proposes a simple yet effective method of rain removal by looking into spatio-temporal differences. The following sections review some of the previous work used when choosing algorithms for this thesis. The segmentation step in [16] and [18] is highly suited for a massively parallel implementation since most of the computations are local, down to pixel level. Spatial and temporal information from video is used to segment, classify and remove particles from the frames.


2.3.1 Two-step approach

As described in Section 2.2.2, the CLAHE method for removing haze uses a large set of frames to get a good estimate of the background. The method in [20] relies on a learning phase of around 60 frames to estimate the background map. Any changes in the scene during this learning phase could cause problems for the de-rain part. When operating on-line, the algorithm differences the new frame with the estimated background to extract snowflakes or raindrops that should be removed. The algorithm uses (2.10) below to fill the created holes in the image with background information,

P = \alpha P_b + (1 - \alpha) P_f    (2.10)

where 0 \le \alpha \le 1 is a weighting factor that decides how much of the background/foreground should be used in the new pixel value. P is the output image, P_b is the background image and P_f is the foreground image. Another method described in [20] is an in-painting method that takes information from the foreground to fill the holes. This might be useful if a bigger moving object is behind the snowflake; more complex situations can thus be handled with simple implementations. The strength of this method is that it is very fast and suits surveillance applications well. At the same time, its weakness lies in the pre-learning stage, where the static scene has to remain unchanged for a good background estimation.

2.3.2 Vision and Rain

In [16] an advanced and well-recognized work on rain appearance is described. The first step in this algorithm is to detect affected pixels by treating rain as positive temporal intensity shifts in the image. A pixel blocked by rain should contain the same background value in the previous and following frame,

\Delta I = I_n - I_{n-1} = I_n - I_{n+1} \ge c    (2.11)

To get rid of false positives due to motion in the scene, more constraints are used, such as whether the rain is linearly related to the background. A direction field of the rain is also estimated to remove false positives. The rain-damaged pixels are replaced with the average of two adjacent frames. A study of rain visibility depending on camera settings and distance to the rain has also been made in [16]. Rain physics, such as reflecting and transmitting characteristics, velocity and raindrop shape, are all studied. Almost everything is related to both camera hardware and computer vision aspects, which makes [16] a good source.


2.3.3 Histogram of Orientations of Streaks

In [21] a study in detecting rain using Histograms of Orientations of Streaks is proposed. Most theory and basics are taken from [16]. The algorithm estimates a background that is subtracted from every new frame. Rain streaks are classified using three conditions: motion, photometry and streak size. When the rain streaks are classified, the Histogram of Orientations of Streaks is calculated, where 180 bins are used in the angular range 0 \le h(\theta) \le \pi. [21] found that calculating the histogram using Geometric Moments gives the sturdiest result among several methods mentioned in [21]. Longer streaks are given a higher weight, and rain streaks are assumed to be close to a certain angle. On top of this, an uncertainty measure is calculated and the final formula for the Histogram of Orientations of Streaks is given by

h(\theta) = \sum_{i=1}^{P} \frac{a_i}{d\theta_i \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{\theta - \theta_i}{d\theta_i} \right)^2}    (2.12)

where \theta is the streak angle, a_i is the streak length and the variance d\theta_i is the uncertainty measure. This method removes most noise that passed the classification stage. The algorithm is developed to detect rain and its intensity using only a single camera. It is very sensitive to camera motion but performs well if none is present.

2.3.4 Analysis in the frequency space

In [17] a method to detect rain and snow in frequency space rather than image space is proposed. By analyzing particle size, velocity and direction, a robust model to predict how rain or snow will appear in the frequency domain is presented. After detection, rain or snow can be reduced or increased depending on the application. The method in [17] models rain using a very simple analytical model and approximates the appearance in image space with a Gaussian. Movement of the rain and snow appears as a linearly motion-blurred Gaussian. Charts visualize where this method provides better results than a median filter or patch-based methods. The results also show that some content in the image may be falsely classified and removed by accident.

2.3.5 Motion Robust Rain Detection and Removal

In [18] the Wavelet transform is used to segment rain streaks from video frames. The input image is transformed with a second-layer Wavelet transform where the low-pass part is set to zero. The reconstructed image contains the edges of the scene, including the rain streaks. A Bilateral filter is then used on the input image to filter out the streaks, which are thinner than most edges in the image. The Wavelet transform is then applied to the filtered image, where the same procedure is repeated. The two outputs from the Wavelet procedure are thresholded to binary images. The new edge image contains edges of the background, which are subtracted from the full edge image. Like the other methods described here, equation (2.11) is used to detect rain, but with only two images. A binary point-wise multiplication of the peaks and the rain edges then gives the true rain streaks in the image. This segmentation is more robust to object motion than other methods [18]. Streaks are replaced by filling the holes using an in-painting method. A winning concept over other methods is that only two images are needed to obtain the intensity peaks, which gives real-time operation without compromises.


Chapter 3

Introducing the de-Weathering system


3.1 System structure

In its current state, the system requires input from the user to select which kind of de-weathering is preferred. During clear weather the system should be passive and no enhancement needs to be done. By selecting weather enhancement or combined enhancement, a better image will be obtained. The following sections describe in detail how this works. A schematic view of the system is provided here again.

Figure 3.1: Static processing flowchart of the main components.

3.2 Base software

The base layer of the software, on top of which all implemented functions lie, was provided by Saab Dynamics AB. The software reads images or video from disk or stream and handles the visualization of the images. To gain performance, the image displayed is an RGB565 image, that is, an RGB image where the digits 5, 6 and 5 describe the number of bits per channel. All calculations deal with 8 bits per channel using unsigned characters in C. The top-level software is written entirely in C code and compiled on Linux with gcc and nvcc.

3.3 Static weather removal

The static de-weathering module removes haze and fog from video and single frames. The algorithm is mainly the Dark Channel Prior method [8]. The module only utilizes the current frame and can be used as standalone software for still images as well as streaming video. The Dark Channel Prior does most of the calculation at pixel level, which suits a massive parallelization well. The implementation is to a large part done in CUDA to meet the real-time requirement. This is an improvement compared to [8], where processing time is measured in seconds. A new method for finding the dark channel is also proposed to minimize halo effects in the output image. One typical test image is the forest image (Figure 3.2), and all steps in this module will be visualized using that image.


Figure 3.2: The forest image.

3.3.1 The algorithm

The implemented version of the de-haze algorithm is described here in detail. The algorithm steps are depicted in Figure 3.3.


Figure 3.3: The process of the Dark Channel Prior algorithm. 1) Original image, 2) Dark channel, 3) Blocky dark channel, 4) Dark channel with the proposed method, 5) Transmission map, 6) Refined transmission map.


Finding the dark channel

The process starts with estimating an air map from the distance map. According to Koschmieder's law, scattered airlight is an additive term. The dark channel is calculated by assigning to a grayscale image the lowest value of R, G and B. By looking in the neighboring area, \Omega, around each pixel to find the lowest value [8], a more accurate result is obtained. The improved estimate J^{dark} is constructed using

J^{dark}(x, y) = \min_{(x, y) \in \Omega} \left( \min_{c \in \{r, g, b\}} J^c(x, y) \right)    (3.1)

While (3.1) improves the estimate of the dark channel, it also gives rise to other problems, such as visible halo effects in the output image. To avoid the halo effects that occur if a supposedly bright area is given a dark value, a new solution is proposed. With inspiration from the examiner of this thesis, two local regions can be registered when finding the dark channel. By switching the operand \min_{(x, y) \in \Omega} to the filter H_{(x, y) \in \Omega}(x, y), errors in J^{dark}(x, y) are reduced. H_{(x, y) \in \Omega}(x, y) calculates the mean value in \Omega and compares it to the center pixel. If the center pixel has a lower value than the mean, it is assumed to be part of a dark area and the minimum value of \Omega is assigned to the center pixel. If the value of J^{dark}(x, y) is higher than the mean of \Omega, the mean is assigned to the pixel. This protects areas with sharp edges and does not harm areas within dark or bright regions. Figure 3.4 illustrates the procedure and Figure 3.3-4 visualizes the result on the forest image.

Figure 3.4: Finding the dark channel. The center pixel in each region will be compared to the mean. The circle will be assigned the darkest value and the star will be assigned the mean value.


The dark channel can be used as a distance map without additional information about the scene. The true distance will never be found; instead a relative distance with respect to the level of haze is used. Equation (3.1) is missing the airlight vector, A^c, needed to get that relative distance map, which is solved by instead using

J^{dark}(x, y) = 255 \, H_{(x, y) \in \Omega} \left( \min_{c \in \{r, g, b\}} \frac{J^c(x, y)}{A^c} \right)    (3.2)

Because 0 \le J^{dark} \le 255, pixels need to be clamped if they are brighter than the airlight.
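The per-pixel structure of (3.2) combined with the H operator can be sketched as the CUDA kernel below: the airlight-normalized channel minimum is computed over \Omega, and the centre pixel receives either the local minimum or the local mean depending on how the centre compares to the mean. The interleaved RGB layout, window radius and names are assumptions of this sketch, not the thesis code.

/* Dark channel with the proposed H operator (sketch of (3.2), not the thesis code).
   rgb: interleaved 8-bit RGB image, A: airlight per channel in the 0-255 range,
   radius: half size of the window Omega. */
__global__ void dark_channel_H(const unsigned char *rgb, unsigned char *dark,
                               int w, int h, int radius, float3 A)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float minval = 1.0f, sum = 0.0f, center = 0.0f;
    int count = 0;

    for (int v = -radius; v <= radius; ++v) {
        for (int u = -radius; u <= radius; ++u) {
            int xs = min(max(x + u, 0), w - 1);
            int ys = min(max(y + v, 0), h - 1);
            int i = (ys * w + xs) * 3;
            /* minimum over the color channels, normalized by the airlight */
            float m = fminf(fminf((float)rgb[i]     / A.x,
                                  (float)rgb[i + 1] / A.y),
                                  (float)rgb[i + 2] / A.z);
            m = fminf(m, 1.0f);                    /* clamp pixels brighter than the airlight */
            if (u == 0 && v == 0) center = m;
            minval = fminf(minval, m);
            sum += m;
            ++count;
        }
    }
    float mean = sum / (float)count;
    /* H: centre darker than the mean -> take the minimum of Omega, otherwise take the mean. */
    float d = (center < mean) ? minval : mean;
    dark[y * w + x] = (unsigned char)(255.0f * d + 0.5f);
}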

Estimating the airlight

The airlight needs to be estimated automatically. In [7] the brightest pixel of the input image is chosen, but usually the airlight is not as bright as this. Instead, the dark channel obtained in (3.1) is used to estimate the airlight. From this image the brightest values are taken by calculating the brightest pixels within groups of 16x16 pixels. The upper half of the image will contain the most haze-dense area, from which the airlight is estimated. The result is stored in a vector and sent back to the Host, which picks out the first largest value starting from the top left corner. The airlight is then estimated by maximizing R + G + B over the 16x16 pixels the Host picked out. With this method, the airlight does not have to be the brightest pixel of the image, but is closer to the most haze-dense one. Since the airlight changes slowly during hazy weather conditions, there is no need to recalculate it every frame; updating the airlight vector every 40th frame is enough. This also helps to avoid what [10] calls global flickering. The airlight vector is stored to be used in the following frames.
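The block-wise maximum used for the airlight estimate can be sketched as the reduction kernel below: each 16x16 CUDA block reduces its tile of the dark channel in Shared memory and writes one candidate value and position per block, which the Host then scans. The kernel must be launched with a 16x16 block size; names and details are illustrative assumptions, not the thesis code.

/* Per-16x16-block maximum of the dark channel (sketch, launch with blockDim = (16,16)).
   Each block writes its brightest value and that pixel's index to block_max / block_idx. */
__global__ void block_max_dark(const unsigned char *dark, int w, int h,
                               unsigned char *block_max, int *block_idx)
{
    __shared__ unsigned char s_val[256];
    __shared__ int           s_idx[256];

    int x = blockIdx.x * 16 + threadIdx.x;
    int y = blockIdx.y * 16 + threadIdx.y;
    int tid = threadIdx.y * 16 + threadIdx.x;

    s_val[tid] = (x < w && y < h) ? dark[y * w + x] : 0;
    s_idx[tid] = (x < w && y < h) ? (y * w + x) : -1;
    __syncthreads();

    /* Tree reduction in Shared memory. */
    for (int stride = 128; stride > 0; stride >>= 1) {
        if (tid < stride && s_val[tid + stride] > s_val[tid]) {
            s_val[tid] = s_val[tid + stride];
            s_idx[tid] = s_idx[tid + stride];
        }
        __syncthreads();
    }
    if (tid == 0) {
        int b = blockIdx.y * gridDim.x + blockIdx.x;
        block_max[b] = s_val[0];
        block_idx[b] = s_idx[0];
    }
}

/* Host side (not shown): copy block_max/block_idx back, scan from the top-left corner for
   the first largest value, and choose the airlight as the RGB triplet maximizing R + G + B
   inside that 16x16 region of the original image. */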

Filtering the distance map

The radius \Omega of the median filter H(x, y) was chosen by analyzing the effects of different sizes on the output image. To minimize errors in the distance map even more, a structure-preserving filter is used to fit the distance map to the input image. Two different filters have been implemented, the Bilateral filter described in Section 2.1.1 and the Guided filter. The Bilateral filter is modified to use a second image as guidance. The CUDA implementation of the Bilateral filter is fast and uses a lookup table for the first Gaussian term in (2.1). The original image, which uses the RGB color space, is converted into intensity and used as guidance. The Bilateral filter significantly increases the quality of the distance map. The second filter, the Guided image filter [5], does not have as straightforward an implementation as the Bilateral filter.


Guided filter implementation

The CUDA implementation of the Guided filter follows directly from the definition in [5]. The filter output, q_i, is calculated using

q_i = \frac{1}{|\omega|} \sum_{k | i \in \omega_k} (a_k I_i + b_k)    (3.3)

where a_k and b_k are calculated using (3.4) and (3.5) below. Here I_i is the input image and p_i is the guide image (the original). \mu_k and \bar{p}_k are the mean values of I_i and p_i, respectively, over \omega_k. \sigma_k is the standard deviation of I_i, and \epsilon is a key parameter of the Guided filter that also avoids division by zero. Changing \epsilon gives a similar effect to changing the standard deviations of the Bilateral filter. \epsilon was experimentally set to 0.078.

a_k = \frac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}    (3.4)

b_k = \bar{p}_k - a_k \mu_k    (3.5)

Equation (3.3) can be approximated with the mean values of a and b, thus giving the formulation

q_i = \bar{a}_i I_i + \bar{b}_i    (3.6)

The CUDA implementation spawns one thread per pixel and calculates the mean value for its region \omega_k, the filter kernel size. The kernel size should be large compared to the block sizes of (3.1), but as this is the most computationally heavy calculation in the system, the kernel size is a compromise between speed and quality.
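The per-pixel structure described above can be sketched as two CUDA passes: one computing a_k and b_k from local window statistics ((3.4)-(3.5)) and one averaging a and b over the window and applying (3.6). The naive O(r^2) window loops below trade speed for clarity; buffer names and float storage are assumptions of this sketch, not the thesis code.

/* Pass 1: compute a_k and b_k ((3.4)-(3.5)) with naive window means (sketch only).
   I: input image, p: guide image, r: window radius, eps: regularisation epsilon. */
__global__ void guided_ab(const float *I, const float *p, float *a, float *b,
                          int w, int h, int r, float eps)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float sI = 0.f, sp = 0.f, sIp = 0.f, sII = 0.f;
    int n = 0;
    for (int v = -r; v <= r; ++v)
        for (int u = -r; u <= r; ++u) {
            int xs = min(max(x + u, 0), w - 1);
            int ys = min(max(y + v, 0), h - 1);
            float Iv = I[ys * w + xs], pv = p[ys * w + xs];
            sI += Iv; sp += pv; sIp += Iv * pv; sII += Iv * Iv; ++n;
        }
    float muI = sI / n, mup = sp / n;
    float varI = sII / n - muI * muI;          /* sigma_k^2               */
    float cov  = sIp / n - muI * mup;          /* covariance of I and p   */
    float ak = cov / (varI + eps);             /* (3.4)                   */
    a[y * w + x] = ak;
    b[y * w + x] = mup - ak * muI;             /* (3.5)                   */
}

/* Pass 2: q_i = mean(a) * I_i + mean(b)  (3.6), again with a naive window mean. */
__global__ void guided_apply(const float *I, const float *a, const float *b,
                             float *q, int w, int h, int r)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float sa = 0.f, sb = 0.f;
    int n = 0;
    for (int v = -r; v <= r; ++v)
        for (int u = -r; u <= r; ++u) {
            int xs = min(max(x + u, 0), w - 1);
            int ys = min(max(y + v, 0), h - 1);
            sa += a[ys * w + xs]; sb += b[ys * w + xs]; ++n;
        }
    q[y * w + x] = (sa / n) * I[y * w + x] + sb / n;
}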

Restoration

After creating the fully enhanced distance map, the next step is to convert it into an air map, \tilde{t}(x, y). This is done by inverting the distance map,

\tilde{t}(x, y) = 255 \left( 1 - \omega \, H_{(x, y) \in \Omega} \left( \min_{c \in \{r, g, b\}} \frac{J^c(x, y)}{A^c} \right) \right)    (3.7)

\omega is now a constant with an initial value of 0.8, used to prevent the resulting image from becoming too saturated, as that would give an unrealistic impression. The constant is adjustable by the user to change the level of haze removal to suit the needs of the application.

Once the air map is computed, the next step is to remove the haze. Using (3.8) for all color channels, the haze is removed,

J(x, y) = \frac{I(x, y) - A}{\max(\tilde{t}(x, y), t_0)} + A    (3.8)

A constant t_0 is used to avoid division by zero, and J(x, y) is the de-hazed image.
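The restoration step (3.8) is purely per-pixel and maps directly to a CUDA kernel; the sketch below assumes an interleaved RGB input, an air map and airlight stored in the 0-255 range, and illustrative names rather than the thesis code.

/* Haze removal per pixel, (3.8): J = (I - A) / max(t, t0) + A (sketch, not the thesis code).
   rgb: interleaved input, t: refined air map (0-255), A: airlight per channel (0-255). */
__global__ void restore_scene(const unsigned char *rgb, const unsigned char *t,
                              unsigned char *out, int w, int h, float3 A, float t0)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int i = (y * w + x) * 3;
    float trans = fmaxf((float)t[y * w + x] / 255.0f, t0);   /* avoid division by zero */

    float J[3];
    J[0] = ((float)rgb[i]     - A.x) / trans + A.x;
    J[1] = ((float)rgb[i + 1] - A.y) / trans + A.y;
    J[2] = ((float)rgb[i + 2] - A.z) / trans + A.z;

    for (int c = 0; c < 3; ++c)                               /* clamp to valid 8-bit range */
        out[i + c] = (unsigned char)fminf(fmaxf(J[c], 0.0f), 255.0f);
}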


Used filtering procedure

In [5] a filter radius of r = 60 is used, and the implementation is independent of the radius size. The implemented version of the Guided filter does not feature this behavior and is thus dependent on the filter radius. Fast execution and satisfying results can still be achieved for r = 16 when using (3.1) instead of the proposed H method (Figure 3.4). Using H(x, y) together with the Bilateral filter gives a much sharper distance map, and the restored image does not suffer from the halo effects that can be observed with the small Guided filter. The implemented algorithm therefore uses only the Bilateral filter to enhance the distance map, with \sigma_p = 4.47 and \sigma_e = 0.1 in (2.1).

3.4 Dynamic weather removal

The Dynamic Weather Module removes rain and snow from video. The algorithms share most parts between the two weather conditions but have some diverging steps. Inspired by [17] and [18], a Wavelet transform is implemented to detect rain in the frequency domain. The Wavelet method is more robust than other motion-detecting algorithms [18] since it handles larger objects in the image better. In Chapter 4 the different methods are evaluated.

3.4.1 Rain segmentation

One method to find potential rain pixels is to compare intensity levels in three adjacent frames. Visible raindrops show up as strong intensity streaks that are randomly placed in the scene, and it is hard to predict where and when they appear. By assuming that rain streaks are not at the same position more than one third of the time [16], searching for intensity peaks gives a first segmentation step. Using a positive threshold, intensity peaks can be found and a binary rain map, B_n, is computed as

B_n(x, y) = \begin{cases} 1, & \text{if } I_n - I_{n+1} > c_1 \text{ and } I_n - I_{n-1} > c_1 \\ 0, & \text{otherwise} \end{cases}    (3.9)

The threshold limit is set with inspiration from [16] and by analyzing temporal differences for pixels in a static background (Figure 3.5), where c_1 = 3 appeared to be a good value. If the value is set higher, less rain will be segmented; if it is set lower, noise starts to dominate. In [18] only two adjacent frames are compared, using only the rightmost term in (3.9).


Figure 3.5: Plot for two pixels in a rainy scene with almost static background during 130 frames.
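The intensity-peak test of (3.9) maps directly to one CUDA thread per pixel; a minimal sketch over three consecutive grayscale frames is given below. The frame layout and names are assumptions of this sketch, not the thesis code.

/* Binary rain map from three adjacent frames, (3.9) (sketch, not the thesis code).
   prev/cur/next: grayscale frames I_{n-1}, I_n, I_{n+1}; c1: intensity threshold. */
__global__ void rain_segment(const unsigned char *prev, const unsigned char *cur,
                             const unsigned char *next, unsigned char *B,
                             int w, int h, int c1)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int i = y * w + x;
    int d_next = (int)cur[i] - (int)next[i];
    int d_prev = (int)cur[i] - (int)prev[i];
    /* A pixel is a rain candidate only if it is a temporal intensity peak. */
    B[i] = (d_next > c1 && d_prev > c1) ? 1 : 0;
}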

3.4.2 Snow segmentation

Snow usually appears bright, and snowflakes captured on video are therefore often brighter than the background.

Figure 3.6: Three pixel intensities during 400 frames from a snowing scene with static background. The snowflakes appear as clear peaks in the plots.

By comparing three adjacent frames, a binary snow map can be computed as

B_n(x, y) = \begin{cases} 1, & \text{if } I_n - I_{n+1} > c_1 \text{ and } I_n - I_{n-1} > c_2 \\ 0, & \text{otherwise} \end{cases}    (3.10)

where I_n represents the image intensity. Intensity peaks caused by snowflakes are shown in Figure 3.6, where three pixel intensities have been plotted over 400 frames. A pixel blocked by a snowflake in image I_n is assumed not to be blocked in the previous nor the following image. This assumption holds most of the time, although pixels might get misclassified. The constant c_2 is set to four (0 \le I_n \le 255), which segments positive intensity changes. The binary snow map is then dilated with a 4-connected cross structuring element. This eliminates some halo effects that can be visible around the segmented snowflakes.

3.4.3 Rain classification

There are two algorithms to classify whether the potential rain pixels are rain or not. One uses the hue information from the background and the raindrops, while the other uses a Wavelet transform to find rain streaks in the image.

Hue

Since raindrops are transparent, the background hue should not differ from the raindrop hue [16]. Pixels thus have an almost constant hue over time during rain. With this observation, a robust classification can be presented in which objects that are not rain remain unfiltered. The hue is calculated using Algorithm 1 [22] below for both the background and the segmented rain pixels.

Algorithm 1: Hue calculation [22]
Data: R, G, B
Result: H
    max = max(R, G, B); min = min(R, G, B)
    Chroma = max - min
    if Chroma = 0 then
        H = 0
    end
    if max = R then
        H = (G - B) / Chroma
        if H < 0 then
            H = H + 6
        end
    end
    if max = G then
        H = (B - R) / Chroma + 2
    end
    if max = B then
        H = (R - G) / Chroma + 4
    end
    H = 255 * (H * 60 / 360)


A pixel is said to be a rain pixel if it has been segmented and fulfills

|H_b - H_r| \le 10    (3.11)

The background hue, H_b, is the hue from the latest frame pixels that are not rain pixels according to (3.9). The variable H_r is the hue of the segmented rain pixels.

Wavelet

The second method implemented to classify rain is the Wavelet method [18]. The implementation uses the fourth-order Daubechies wavelets (db4) in a three-layer transform. Figure 3.7 shows the one-layer forward Wavelet transform of an image (2D signal) and Figure 3.8 the inverse. The filter h_\psi holds the db4 coefficients, and the original image can be found in W_\varphi(2, m, n).

Figure 3.7: The forward Wavelet transform for 2D-signals.

In the third layer the low-pass part, Wϕ(0, m, n), is set to zero and the edge

information (Horizontal, Vertical and Diagonal) is inversely transformed back to the original size.

Figure 3.8: The inverse Wavelet transform for 2D-signals.

This procedure is further referred to as the Wavelet procedure. The result of the Wavelet procedure is a grayscale image with intensities 0 \le I \le 255 that is thresholded with a single threshold, experimentally set to four. The thresholded result is then a binary edge map containing background edges and rain streaks. Figure 3.9 shows a flowchart of the Wavelet method.

Figure 3.9: Flowchart of the Wavelet method.

The variable w is the output after the forward and inverse Wavelet transforms have been applied. Rain streaks do not have very strong edges and can be filtered out by the Bilateral filter, which preserves the background structure. The filtering smooths the image without erasing strong edges. Background edge information is calculated by applying the Wavelet procedure to a Bilateral-filtered version of the image. By applying the same threshold to w, the two binary images w1 and w2 in Figure 3.9 contain background plus rain and only background, respectively. The intersection of w1 and w2 is then subtracted from w1. The resulting binary image contains most rain streaks; a point-wise multiplication with the segmented binary map classifies whether the pixel is a rain pixel or not. The algorithm is depicted in Figure 3.10 with the results of each step.


Figure 3.10: The procedure for the Wavelet analysis. 1) Rainy scene. 2) Binary image after Wavelet procedure. 3) Binary image after bilateral procedure. 4) The intersection of 2 and 3 is

removed. 5) Pointwise multiplication with the intensity segmentation. 6) Resulting image with rain streaks removed. The images at the bottom are cropped from 1) and 6).
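The mask arithmetic at the end of the Wavelet procedure (remove from w1 its intersection with w2, then intersect with the temporal segmentation) is a per-pixel binary operation; the CUDA sketch below assumes 0/1 masks of equal size and illustrative names, not the thesis code.

/* Combine the Wavelet edge masks with the temporal segmentation (sketch).
   w1: edges of background + rain, w2: edges of background only (after the Bilateral filter),
   seg: binary map from (3.9). rain = (w1 minus (w1 AND w2)) AND seg. */
__global__ void classify_rain_wavelet(const unsigned char *w1, const unsigned char *w2,
                                      const unsigned char *seg, unsigned char *rain,
                                      int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int i = y * w + x;
    unsigned char streak = (w1[i] == 1 && w2[i] == 0) ? 1 : 0;  /* edges not present in the background */
    rain[i] = (streak && seg[i]) ? 1 : 0;                       /* keep only temporal intensity peaks   */
}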


3.4.4 Snow classification

To classify whether the segmented snow pixels are snow or not, a soft constraint is used. It uses the frames I_{n-1} and I_{n+1} to compute a new binary snow map as

B_n(x, y) = \begin{cases} 1, & \text{if } |I_{n-1} - I_{n+1}| < c_2 \\ 0, & \text{otherwise} \end{cases}    (3.12)

Moving objects in the scene are thereby left unfiltered, as the background has to be constant. With inspiration from [16] and some experiments, c_2 is set to five. Values in B_n(x, y) equal to one after both (3.12) and (3.10) are classified as snow and should be replaced.

3.4.5 Rain removal

After segmentation and classification, the rain can simply be replaced by using the information from the non-blocked pixels in the adjacent frames. In the hue algorithm the equation

I_n = (I_{n-1} + I_{n+1}) / 2    (3.13)

is used, and in the Wavelet method the relation

I_n = I_{n-1}    (3.14)

is used. Figure 3.10 illustrates the procedure for removing rain with the Wavelet method.


3.4.6 Snow removal

The classified snow pixels in the current frame are replaced using background information for those pixels. As for rain removal, the background is computed as the average of the previous and following frame (3.13). This removes most of the snowflakes in the scene, but static snow is left with the appearance of haze. Applying the haze removal module to this snow-free image will increase the viewing distance. Figure 3.11 depicts the steps of the snow removal algorithm, where the last step also includes de-hazing.

Figure 3.11: The process of the snow removal algorithm. 1) Original image, 2) Binary segmented snow, 3) Chosen background pixels, 4) Information from adjacent frames, 5) Output image.
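Replacing classified pixels with the temporal average of the adjacent frames, as in (3.13), is again a per-pixel operation; a minimal CUDA sketch for grayscale frames follows. Names are illustrative, not the thesis code.

/* Replace pixels classified as snow (or rain) with the average of the adjacent frames, (3.13).
   B: final binary snow map, prev/next: I_{n-1} and I_{n+1}; cur is modified in place. */
__global__ void remove_particles(const unsigned char *B, const unsigned char *prev,
                                 const unsigned char *next, unsigned char *cur,
                                 int w, int h)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    int i = y * w + x;
    if (B[i])
        cur[i] = (unsigned char)(((int)prev[i] + (int)next[i] + 1) / 2);   /* rounded mean */
}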

3.4.7 Temporal averaging filter

A last method for both rain and snow is a temporal averaging filter. The output image is taken as the average of five consecutive frames. An investigation of the performance of this averaging algorithm is presented in Section 4.3.5.


Chapter 4

Results and system evaluation


4.1 Contrast measurement

A method to measure the performance of the filtering is to measure the contrast gain, $C_g$ [23]. The definition of contrast gain is the difference between the mean contrast of the filtered and the unfiltered image. The mean contrast for an image, $I$, is computed as

$$\bar{C}_I = \frac{1}{MN} \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} \frac{s(x, y)}{m(x, y)} \qquad (4.1)$$

where $m$ and $s$ are calculated from $I$ using

$$m(x, y) = \frac{1}{(2p+1)^2} \sum_{k=-p}^{p} \sum_{l=-p}^{p} I(x + k, y + l) \qquad (4.2)$$

$$s(x, y) = \frac{1}{(2p+1)^2} \sum_{k=-p}^{p} \sum_{l=-p}^{p} |I(x + k, y + l) - m(x, y)| \qquad (4.3)$$

The radius $p$ in (4.2) and (4.3) is set to two, which results in a 5x5 local contrast measure, and the norm in (4.3) is the $L_1$ norm. The gain is then calculated from the de-hazed ($I_d$) and the hazy image ($I_h$) as

$$C_g = \bar{C}_{I_d} - \bar{C}_{I_h} \qquad (4.4)$$

The gain will be positive since images containing haze are contrast degraded and the filtering increases the contrast of the image. A higher value of $C_g$ is equivalent to a higher contrast $\bar{C}_{I_d}$.
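Equations (4.1)-(4.4) translate directly into a small measurement program. The sketch below is one possible GPU implementation under the stated parameters (p = 2, $L_1$ norm); the clamped borders, the guard against division by zero and the synthetic test images are assumptions made for this example and are not part of (4.1)-(4.4).

#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Per-pixel term of (4.1): local mean absolute deviation (4.3) divided by
// the local mean (4.2), computed in a (2p+1)x(2p+1) window with clamped
// borders.
__global__ void localContrast(const float* img, float* ratio,
                              int width, int height, int p)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float norm = 1.0f / ((2 * p + 1) * (2 * p + 1));
    float m = 0.0f, s = 0.0f;
    for (int l = -p; l <= p; ++l)
        for (int k = -p; k <= p; ++k) {
            int xx = min(max(x + k, 0), width - 1);
            int yy = min(max(y + l, 0), height - 1);
            m += img[yy * width + xx];
        }
    m *= norm;                                   // (4.2)
    for (int l = -p; l <= p; ++l)
        for (int k = -p; k <= p; ++k) {
            int xx = min(max(x + k, 0), width - 1);
            int yy = min(max(y + l, 0), height - 1);
            s += fabsf(img[yy * width + xx] - m);
        }
    s *= norm;                                   // (4.3)
    ratio[y * width + x] = (m > 0.0f) ? s / m : 0.0f;
}

// Mean contrast (4.1) of a grayscale image stored row major on the host.
float meanContrast(const std::vector<float>& img, int width, int height)
{
    float *dimg, *dratio;
    size_t bytes = img.size() * sizeof(float);
    cudaMalloc(&dimg, bytes);
    cudaMalloc(&dratio, bytes);
    cudaMemcpy(dimg, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + 15) / 16, (height + 15) / 16);
    localContrast<<<grid, block>>>(dimg, dratio, width, height, 2);

    std::vector<float> ratio(img.size());
    cudaMemcpy(ratio.data(), dratio, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dimg); cudaFree(dratio);

    float sum = 0.0f;
    for (float r : ratio) sum += r;
    return sum / ratio.size();
}

int main()
{
    const int w = 32, h = 32;
    std::vector<float> hazy(w * h), dehazed(w * h);
    for (int i = 0; i < w * h; ++i) {
        float pattern = (i % 2) ? 1.0f : -1.0f;
        hazy[i]    = 150.0f + 5.0f * pattern;    // low-contrast "hazy" image
        dehazed[i] = 120.0f + 40.0f * pattern;   // higher-contrast version
    }
    float gain = meanContrast(dehazed, w, h) - meanContrast(hazy, w, h); // (4.4)
    printf("contrast gain Cg = %.3f\n", gain);
    return 0;
}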

4.2 Comparing filter implementations

Since all code has been implemented by the thesis author and not all errors are indicated by the CUDA compiler, a comparison with other implementations has been made to validate the implemented algorithms. The Bilateral filter is compared to the version in NVidia's open SDK. The frequently used test image Baboon is used here to indicate an error-free implementation of the Bilateral filter. Figure 4.1 includes four versions of the Baboon image: the CUDA-filtered, the CPU-filtered and the NVidia-filtered image, as well as the original.


Figure 4.1: Different implementations of the Bilateral filter. Top left: CUDA version, Top right: NVidia CUDA version, Bottom left: CPU version and Bottom right: the original Baboon image.

It is possible to observe a difference between the NVidia version and the other two. Those differences can be explained by the fact that the images displayed from this system use the RGB565 color space, whereas NVidia uses OpenGL output. There is also another difference in implementation: NVidia's version uses the color image directly, whereas the implementations in this work filter one color channel at a time. The filtering behavior is nevertheless equivalent and the parameters are the same. The differences in computation time are small; this work takes 0.06 seconds per color channel (0.19 seconds per image) whereas NVidia takes 0.1 seconds per image, running on an NVidia Quadro 600 GPU.


4.3 Evaluation per module

The modules operate on different weather conditions and will thus have a different impact on the output image. First, each separate module is considered and then a full system evaluation is presented. All time measurements are taken as the average of stable timings over at least 10 adjacent frames. The system used is the one described in Section 3.1.

4.3.1 Haze module

The evaluation of the de-haze module is described here, divided into sections on performance, noise and quality. The proposed method is compared to [8], which solved the problem with a more complex approach where a soft Laplacian matting is solved for the distance map [8].

Performance measure

The computation time is measured for the separate phases of the image de-hazing module. All tests are performed on data read from disk in four different resolutions. The time is measured by the Host, in milliseconds, and contains all the overhead of the CUDA functionality (memory transfers and kernel setups). The computation time is shown in Figures 4.2 and 4.3. Both labels in Figure 4.2 are for the complete de-hazing module.
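For reference, host-side timings of this kind can be taken by wrapping the transfers, the kernel launch and a final synchronization in a standard host timer, as in the minimal sketch below; the kernel is only a placeholder and the measured numbers are of course not those reported in the figures.

#include <cuda_runtime.h>
#include <chrono>
#include <cstdio>
#include <vector>

__global__ void dummyFilter(float* img, int n)   // placeholder for a module kernel
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = img[i] * 0.5f + 0.25f;
}

int main()
{
    const int n = 1024 * 768;                    // one of the tested resolutions
    std::vector<float> host(n, 1.0f);
    float* dev;
    cudaMalloc(&dev, n * sizeof(float));

    auto t0 = std::chrono::high_resolution_clock::now();

    // Everything the Host sees: upload, kernel launch, download.
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    dummyFilter<<<(n + 255) / 256, 256>>>(dev, n);
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaDeviceSynchronize();

    auto t1 = std::chrono::high_resolution_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("elapsed: %.3f ms\n", ms);

    cudaFree(dev);
    return 0;
}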

Figure 4.2: Computation time in milliseconds over resolution; the filled curve is without memory savings on the GPU while the dotted curve reuses memory.


Figure 4.3: Computation time in milliseconds over resolution for the Dark channel, Bilateral filter, Guided filter and image restoration.

In [8], a de-hazing algorithm with a performance of 0.1 seconds per megapixel is described, and the implementation in this system clearly outperforms the previous ones. The real-time requirement of a 30 Hz frame rate is met by the system for resolutions up to 1024x768, which is more than three times faster than previous works. The algorithm in this work uses the novel method described for finding the dark channel together with a Bilateral filter, instead of using the method of [8] together with the Guided filter. Also, the utilization of the GPU in this work accelerates the process to be faster than other known implementations.

Contrast comparison

Comparing the contrast gain between different scenarios using the method described in Section 4.1 gives an understanding of how good the algorithm is. A positive gain value shows that the contrast in the enhanced image has increased and not decreased, where a higher value is better. Five images were used to evaluate the increase in contrast; they are shown in Figure 4.4 and Figure 4.5.


Figure 4.4: Images before (left) and after enhancement (right). Panels: (a)/(b) Image 1, (c)/(d) Image 2, (e)/(f) Image 3.


(a) Image 4. (b) Image 4.

(c) Image 5. (d) Image 5.

Figure 4.5: Images before (left) and after enhancement (right).

The results of the contrast gain are provided in Table 4.1, where two measures per image are taken: one with the ω parameter in (3.7) set to the initial value and one where it is adjusted down to 0.5 instead of 0.8. The images look very natural with ω set to 0.5, with less amplified noise. The enhanced images in Figure 4.4 and 4.5 use the initial value of ω.

Measure          Image1   Image2   Image3   Image4   Image5
C_I,haze         0.047    0.107    0.044    0.023    0.006
C_Gain           0.048    0.050    0.052    0.060    0.017
C_Gain, ω=0.5    0.019    0.020    0.019    0.019    0.005

Table 4.1: Contrast gain for five images after de-hazing. The contrast of the original image is provided as well as the gain when enhancing with a lower ω.

The contrast increases in all cases, which can be seen from the positive contrast gain. The values are comparable with the ones presented in [23], which are around double. A major difference in this comparison is that this work does not do a post histogram equalization; doing so would increase the contrast gain even further.


Noise sensitivity

The de-haze algorithm is sensitive to noise from the camera sensor, and the noise is most visible in smooth areas. Figure 4.6 illustrates the problem of noise enhancement in smooth areas such as a sky region.

(a) Original sky region. (b) Enhanced sky region.

Figure 4.6: Noise increase in smooth areas. The noise in the sky region is amplified.

The increase in noise can also be seen by calculating the RMSE, the Root Mean Square Error, in these smooth sky areas. Here, a piece of sky region from an image taken with a DSL camera is compared to an enhanced version of the same area. The DSL camera has superior quality compared to the Basler surveillance cameras and has less noise in the original image.

Image   Figure 4.6 (a)   Figure 4.6 (b)   DSL-Original   DSL-Enhanced
RMSE    2.44             6.92             0.68           2.03

Table 4.2: Noise increase in smooth sky areas.

For both sensors, the noise amplification is noticeable as the RMSE increases almost three times compared to the original images.

Quality of transmission map

The transmission map of the de-hazing module aims to resemble the previously implemented version of [8]. A quality comparison between the transmission map from this work and the one from [8] is shown in Table 4.3 below. Data is also included for the case when the neighborhood Ω in (3.1) is set to one, i.e. no blocks. The mean value of the transmission map as well as the RMSE is compared for the test image, with the transmission map provided by [8] taken as reference. The mean value should be maximized without introducing errors to the image; a higher mean value indicates that a better estimate of the true dark channel was found, provided no errors were introduced. In Figure 4.7, parts of the transmission map from [8] and the one created in this work are illustrated.

(a) [8]. (b) This work.

(c) [8]. (d) This work.

Figure 4.7: Comparison between transmission maps of previous work and this work.

Code    [8]     This work   No blocks
mean    197.7   197.9       179
RMSE    X       25.1        30.2

Table 4.3: Mean value and RMSE of the transmission maps, with the map from [8] used as reference.


4.3.2 Dynamic weather modules

To evaluate how well the de-rain and de-snow modules perform, the computation times, the amount of removed streaks/flakes and the number of misclassifications are presented here. Most of the sequences come from the SCOTIA database [24], which contains video clips of various weather. The modules do not run together with the de-haze module when measuring the computation time; only the time for dynamic weather removal is evaluated here.

4.3.3 Performance measure

The computation time is measured for the separate phases of the de-rain and de-snow modules. All tests are performed on data read from disk in four different resolutions. The time is measured by the Host, in milliseconds, and contains all the overhead of the CUDA functionality. The computation times are shown in the graph of Figure 4.8. The de-rain algorithm marked with Wavelet is without the pre-computations on the Host; the Wavelet transform is shown separately in Table 4.4 as it is not implemented in CUDA in this work.

Figure 4.8: Computation time, in milliseconds, over resolution for the various dynamic weather removal methods and the RGB-to-intensity conversion.

The computations are fast across all resolutions, which shows that they are suited for real-time applications. Memory reuse is implemented from the start here; each time a new image is captured it is transferred to the GPU, where the two previous images are already stored.
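The memory reuse described above amounts to keeping the device buffers of the two previous frames alive and only uploading the newest frame, for instance by rotating three device pointers as in the sketch below; the names and the loop are illustrative only and do not reproduce the actual capture code.

#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main()
{
    const int n = 720 * 480;                    // pixels per frame
    float* dev[3];                              // the three most recent frames
    for (int i = 0; i < 3; ++i) cudaMalloc(&dev[i], n * sizeof(float));

    std::vector<float> host(n);
    for (int frame = 0; frame < 10; ++frame) {
        // Simulate a captured frame on the host (the first two iterations
        // only fill the buffers).
        for (int i = 0; i < n; ++i) host[i] = float(frame);

        // Rotate pointers: the oldest buffer is overwritten by the new frame,
        // while the two previous frames stay resident on the GPU.
        float* oldest = dev[0];
        dev[0] = dev[1];
        dev[1] = dev[2];
        dev[2] = oldest;
        cudaMemcpy(dev[2], host.data(), n * sizeof(float),
                   cudaMemcpyHostToDevice);

        // ... de-rain / de-snow kernels would operate on dev[0..2] here ...
    }

    for (int i = 0; i < 3; ++i) cudaFree(dev[i]);
    printf("done\n");
    return 0;
}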


Resolution      720x480   1024x768   1280x960   1920x1080
Wavelet (CPU)   17        54         75         146

Table 4.4: Computation time, in milliseconds, for the Wavelet algorithm.

The CPU implementation (timed in Table 4.4) of the Wavelet transform cannot be used in a system that operates in real-time. It is given here to show that it is possible to classify rain pixels as high frequency objects.

4.3.4 Visual result of removed rain/snow

Although these results are best shown by looking at complete video sequences, they are presented here as images. There are four different methods implemented, and the results follow separately for each method. Two scenes of rain and two of snow are evaluated, where images of the same frames are shown. The original frames are shown in Figure 4.9 and come from the SCOTIA database [24] and from videos recorded at Saab. The videos from SCOTIA that have been used and cropped are

• "rain_5_018_002_002_video_tree_basketry"
• "rain_5_017_002_002_video_house"


(a) Rain 1. (b) Rain 2.

(c) Snow 1. (d) Snow 2.

Figure 4.9: Original frames.

In all videos, the intensity of rain and snow is high, although no measure of how high was provided. All sequences are static with no moving camera or objects, except the sequence Rain 1, which has some moving bushes.


Rain removal using hue

The results from the rain removal algorithm using the hue are shown in Figure 4.10.

(a) Rain 1. (b) Rain 2.

Figure 4.10: Results from rain removal using the hue.

Most of the rain streaks are removed and only small dots are left. When observing the video, it is clear that the rain streaks are heavily reduced.

Rain removal using Wavelet transform

The results from the rain removal algorithm using the Wavelet transform are shown in Figure 4.11.

(a) Rain 1. (b) Rain 2.

Figure 4.11: Results from rain removal using the Wavelet transform.

Using this algorithm to remove rain streaks works well, and most of the streaks are removed from the video. Somewhat more rain remains compared to when the hue is used.


Snow removal

The results from the snow removal algorithm are shown in Figure 4.12. The results should be very similar to those of the rain removal using the hue because of the similarity of the two algorithms. The strong intensity shifts from the snowflakes on a dark background are expected to give a better result than removing rain, since it is easier to spot the snow.

(a) Snow 1. (b) Snow 2.

Figure 4.12: Results of snow removal.

The filter is effective if the background is dark and the snowflakes are spread out. Observing the video in Figure 4.12a, almost no snowflakes are visible. The dark background of the door is an example of an easy scene. Figure 4.12b, on the other hand, shows a more complex scene with some brighter areas where the snow is not as easy to classify. The image in Figure 4.12b shows that far from all snowflakes are removed by this method, but a noticeable improvement is still achieved.


Rain or snow removal using mean filtering

Results of using the mean filtering of five frames are shown in Figure 4.13.

(a) Rain 1. (b) Rain 2.

(c) Snow 1. (d) Snow 2.

Figure 4.13: Results of mean filtering.

The mean filter is effective in the rain sequences when there is a static background. The streaks fade in together with the background. Using this filter in the snow sequences, snowflakes are still visible against the background. The mean filter has the simplest implementation among the algorithms in this work but gives fair and comparable visual results. Figure 4.13b has the best visual result in this section, with smooth areas and no visible rain streaks.

References
