
Implementation and Performance Analysis of Filternets

Henrik Einarsson

2006-01-27


Implementation and Performance Analysis of Filternets

Master's thesis carried out in Biomedical Engineering
at the Institute of Technology, Linköping University

by

Henrik Einarsson

LITH-IMT/MI20-EX--06/419--SE

Supervisors: Björn Svensson, IMT, Linköpings universitet
             Mats Andersson, IMT, Linköpings universitet
             Hagen Spies, Contextvision
             Tomas Loock, Contextvision

Examiner: Hans Knutsson, IMT, Linköpings universitet


Division, Department: Division of Medical Informatics, Department of Biomedical Engineering, Linköpings universitet, S-581 83 Linköping, Sweden

Date: 2006-01-27

Language: English

Report category: Examensarbete (Master's thesis)

URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5601

ISRN: LITH-IMT/MI20-EX--06/419--SE

Title: Implementering och Prestandatest av Filternät / Implementation and Performance Analysis of Filternets

Author: Henrik Einarsson


Keywords: Filter network, sparse filters, efficient filtering, local structure, orientation estimation, spherical harmonics


Abstract

Today, image acquisition equipment produces huge amounts of data that need to be processed. Often the data describe signals with a dimensionality higher than 2, unlike ordinary images. This introduces a problem when it comes to processing such high dimensional data, since ordinary signal processing tools are no longer suitable. New, faster and more efficient tools need to be developed to fully exploit the advantages of e. g. a 3D CT scan.

One such tool is filternets, a layered, networklike structure through which the signal propagates. A filternet has three fundamental advantages that decrease the filtering time: the network structure allows complex filters to be decomposed into simpler ones, intermediate results may be reused, and filters may be implemented with very few nonzero coefficients (sparse filters).

The aim of this study has been to create an implementation of filternets and to optimize it with respect to execution time. In particular, the possibility of using filternets that approximate a harmonic filter set for estimating orientation in 3D signals is investigated.

Tests show that this method is up to about 30 times faster than a full filter set consisting of dense filters. They also show a slightly larger error in the estimated orientation compared with the dense filters; this error should, however, not limit the usability of the method.


Acknowledgements

First I would like to thank all employees at Contextvision for a very enjoyable time during my work there.

I would especially like to thank Hagen Spies for all the help with the image/signal processing part of the work, as well as for numerous proofreadings of this report and for the comments and suggestions that followed.

I also give my gratitude to Tomas Loock and Hans Nordgren for all the feedback and suggestions concerning the programming part of my work and the design of the implementation, and for the suggestions and discussions on how to further optimize the implementation on my quest to eliminate clock cycles.

At IMT my examiner, Hans Knutsson, and my supervisors, Björn Svensson and Mats Andersson, have always had time for discussions. I am especially grateful for the discussions on how the orientation tensor is derived from the harmonic functions, which turned into Appendix B.

I would also like to thank my opponent Erik Pettersson. His comments and suggestions are highly appreciated.

Finally I would also like to thank Per-Erik Forsén at the Department of Electrical Engineering, who provided the software for visualization of 3D tensors, which made the work a lot easier.

Henrik Einarsson Linköping, 27 January, 2006


Contents

1 Introduction
  1.1 High Dimensional Signal Processing
  1.2 Filter networks
  1.3 Aim of the study
  1.4 Thesis outline

I Theory

2 Orientation
  2.1 Representation of orientation
  2.2 Tensors
  2.3 Orientation using tensors
    2.3.1 The Orientation Tensor
    2.3.2 Interpreting the Orientation tensor
  2.4 Estimation of orientation
    2.4.1 Quadrature filter
    2.4.2 Loglets
    2.4.3 Constructing the Orientation tensor from harmonic functions

3 Filter Network
  3.1 Introduction to Filter Networks
  3.2 Designing a Filter Network
  3.3 Examples of filter networks
    3.3.1 Cartesian separability
    3.3.2 Gaussian low pass filtering
    3.3.3 Polynomial expansion
  3.4 Filter network for 3D orientation

4 Parallel Computing
  4.1 Introduction
  4.2 MMX/SSE/SSE2 Technology
  4.3 Data alignment
  4.4 Matrix vector multiplication using SSE2
  4.5 Parallelization in the implementation

II Performance Analysis

5 The Implementation
  5.1 Description of the implementation
  5.2 Optimization of the implementation
  5.3 Convolvers

6 Performance Evaluation
  6.1 Introduction
  6.2 Hardware used for the tests
  6.3 Speed tests
    6.3.1 Single CPU speed
    6.3.2 Filter net versus full filterset
    6.3.3 Dual CPU speed
  6.4 Data alignment test
  6.5 Memory usage
  6.6 Utilization of sparsity
  6.7 Quality tests
    6.7.1 Test signal
  6.8 A real example

7 Discussion
  7.1 Conclusions
  7.2 Future work

Bibliography

A Mathematical proofs
  A.1 The 'uniqueness' requirement
  A.2 The 'polar separability' requirement
  A.3 The 'uniform stretch' requirement
  A.4 Existence of the matrix C

B Derivation of a phase invariant 3D tensor from harmonic functions
  B.1 Preliminaries
  B.2 Filter responses
  B.3 Filter combinations in the spatial domain

C Polynomial expansion, the rather short version
  C.1 Polynomial expansion
  C.2 Solution to polynomial expansion
  C.3 Ordinary convolution as a scalar product

List of Figures

1.1 Example of high dimensional signal processing
2.1 Neighborhoods with different orientations
2.2 Examples of a simple signal, an approximately simple signal and a non simple signal
2.3 Visualization of orientation tensor in 2D
2.4 Visualization of orientation tensor in 3D
2.5 Quadrature filter in the Fourier domain
2.6 Orientation estimation using quadrature filters
2.7 Radial function of the Loglet
2.8 Second order spherical harmonics
2.9 First order harmonic basis functions
2.10 Second order harmonic basis functions
3.1 A general filter network
3.2 Cartesian separation of a Gaussian kernel
3.3 Polynomial expansion by a series of convolutions
3.4 Filter net for polynomial expansion exploiting Cartesian separability
3.5 Filter net for polynomial expansion using Cartesian separability and eliminating redundance
3.6 3D orientation net with 335 coefficients
3.7 3D orientation net with 168 coefficients
4.1 The von Neumann model of a computer
4.2 Modified von Neumann model creating parallel computers
4.3 Loading of XMM register
4.4 SSE2 functions
4.5 Data alignment using SSE2
4.6 Matrix storage
4.7 Inefficient use of SSE2 for matrix vector multiplication
4.8 Efficient use of SSE2 for matrix vector multiplication
5.1 Filter net that differentiate a 2D signal
5.2 XML-description of a filter net
6.1 Developed convolvers filtering a 512^3 volume
6.2 Filtering time when the size of the data changes
6.3 The full filter implemented as a filternet
6.4 Filtering time when using a dual processor system
6.5 Filtering time when placement of the coefficients change
6.6 Utilization of sparsity in the filters
6.7 The test signal for quality measures
6.8 Tensor probability
6.9 Position of the projections in 3D volume
6.11 Result of orientation estimation with SNR 0 dB
6.12 MR volume in the xz- and yz-planes
6.13 MR volume in the xy-plane


List of Tables

3.1 Computational load for separable Gauss kernel
6.1 Coefficients for Ordo-functions
6.2 Comparison between filter net and full filter set
6.3 Memory usage of the 3D nets
6.4 Error in the orientation
6.5 Error in the orientation, slightly LP-filtered tensor


Chapter 1

Introduction

Signal processing in 2D, for example feature extraction, image enhancement and visualization, is today common practice. The signal acquisition equipment used today can however produce signals of higher dimension than 2, e. g. CT and MR scans. The signals can be 3D, resulting in volumes with three spatial coordinates, or image sequences with two spatial and one temporal coordinate. Even 4D signals are possible (volume sequences), with three spatial and one temporal coordinate. When processing these high dimensional signals the filtering methods used in 2D are often no longer suitable, due to e. g. the increased computational load. Therefore new techniques must be developed to process these high dimensional signals. This study is performed within a research project involving the Department of Biomedical Engineering at Linköping University and Contextvision AB, in which such filtering methods are developed.

1.1 High Dimensional Signal Processing

Linear filtering is one of the most fundamental tools in signal processing. The resulting pixel or voxel (depending on the dimensionality of the signal) is a weighted summation of the pixels/voxels in a surrounding neighborhood. The weights used are given by the filter. A filter can e. g. be designed to smooth, find desired features such as lines/edges, differentiate, sharpen or, as in this study, find orientation of structures in the signal.

A problem with high dimensional data is that a direct implementation of filtering using ordinary convolution becomes infeasible due to the computational load. The complexity of filtering is directly dependent on the size of the filter, i. e. the number of coefficients in the filter. A filter of, say, size N^D generally requires N^D multiplications and additions to calculate the result in one point of the resulting signal.

If the dimensionality D is increased, this yields a computational complexity that grows exponentially with D, making a direct implementation impossible due to the resulting execution time and/or memory usage.


Figure 1.1. Example of high dimensional signal processing. Visualization of a 3D CT-scan. Courtesy of CMIV (Center for Medical Image Science and Visualization) http://www.cmiv.liu.se

Take for example a filter set with 9 filters where each filter is of size 11^3. This results in a total of 9 · 11^3 = 11979 filter coefficients. Now if the signal is of size 512^3, approximately 1.6 · 10^12 multiplications are needed to create the filter responses for the entire signal. It is estimated that such a filtering takes up to an hour to complete. It should be obvious that this is not reasonable in practical applications. The need for faster and more efficient filtering methods is evident.
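As a quick check of these figures, the coefficient and multiplication counts above can be reproduced with a few lines (the filter and volume sizes are the ones used in the example):

```python
# Operation count for the dense filter set in the example above.
n_filters = 9                 # filters in the set
coeffs_per_filter = 11 ** 3   # coefficients per dense 3D filter
volume_voxels = 512 ** 3      # voxels in the volume

total_coefficients = n_filters * coeffs_per_filter
multiplications = total_coefficients * volume_voxels

print(total_coefficients)            # 11979
print(f"{multiplications:.2e}")      # about 1.6e12 multiplications
```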

1.2 Filter networks

Filter networks are one such method for efficient filtering of high dimensional signals. A networklike structure of small filters is created through which the signal propagates. The idea when creating the network is to decompose complex filters into simpler ones and to minimize redundant computations by reusing intermediate results, so that the same computations are not carried out more than absolutely necessary. If the filters can be created as sparse filters with very few nonzero coefficients, this further decreases the computational load. This implies that only a fraction of the filter coefficients are used compared to a direct implementation, thus resulting in faster filtering.

1.3 Aim of the study

The aim of this study is to create an implementation of a filter network, optimize it with respect to execution time and to evaluate the performance of the implementation. For the performance analysis both ”dummy” networks (used for tests more focused on computational load and execution time) as well as realistic 3D networks for estimating local orientation in 3D signals are used.

1.4 Thesis outline

This thesis is divided into two parts. The first part covers the theory and the tools used and includes chapters 2-4. In chapter 2 the idea of local orientation is presented, along with methods for estimating it from measurements on real signals. Chapter 3 describes the concept of filter networks and gives some examples of how a networklike structure of the filtering helps to increase the performance. The networks used to estimate local orientation in 3D signals are also presented in this chapter. Chapter 4 gives some basic knowledge on parallel computing and the tools used to parallelize the implementation. The chapter especially looks at SIMD computing using Intel's SSE/SSE2 technology.

Part II, consisting of chapters 5-7, contains the performance analysis and is the main contribution of this work. Chapter 5 is a very brief description of the implementation. Chapter 6 explains the tests that will be performed and the test signals used, as well as the results of the tests.

Finally, chapter 7 sums up the work, presents conclusions and discusses future work relating to the work done here.


Part I

Theory


Chapter 2

Orientation

In this chapter the concept of local orientation will be presented, together with how it can be estimated in a multidimensional signal. Most of the theory will be explained in 2D, to simplify it a bit and to make visualization easier. The generalization to 3D is straightforward in most cases, but some 3D topics will be covered as well. The theory presented in this chapter is based on [11], [18] and [19].

2.1 Representation of orientation

An image can be said to consist of mainly three types of neighborhoods: edges, lines and areas with an almost constant intensity level. It is very interesting to be able to say, first of all, whether an arbitrary n^2 (n^3 in 3D) neighborhood in the picture or volume contains some type of structure, that is if it contains a line and/or an edge in 2D (in 3D the possibility of a plane also exists), and if so what direction this structure has. This information can then be used in later steps of more advanced processing. An example of how this information can be used is image enhancement using adaptive filtering, which is described in [11] (chapter 10). The estimation of the direction of these potential structures in a neighborhood is called local orientation estimation.

First, some type of representation for this orientation must be defined. A very simple representation would be to use the direction of the gradient in the neighborhood. This will however lead to a very fundamental problem. Consider figure 2.1. It should be obvious that in each column the neighborhoods should be considered to have the same orientation (it does not matter if the black field is to the left or the right, since the orientation of the edge is still the same). If the direction of the gradient is used to represent the orientation, it will point in different directions in e. g. figures 2.1(b) and 2.1(e). That is, points with the same orientation are mapped to different points in the feature space. This is due to the fact that an edge is an odd feature whereas orientation is an even feature.

The above problem can be eliminated by instead using a double angle representation, in which the direction of the gradient is doubled. That is, if the orientation in a neighborhood is represented by a vector pointing in the direction


Figure 2.1. Neighborhoods with different orientations: (a) and (d) orientation 0°, (b) and (e) orientation α°, (c) and (f) orientation 90°. The images in every column should be represented by the same orientation.

[cos(θ) sin(θ)]^T, then with the double angle representation it is represented by a new vector

x̂ = [cos(2θ) sin(2θ)]^T    (2.1)
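As a small illustration of eq. (2.1), the sketch below (the angle value is arbitrary) shows that two opposite gradient directions are mapped to one and the same double angle vector:

```python
import numpy as np

def double_angle(theta):
    # Map a gradient direction theta to the double angle vector of eq. (2.1).
    return np.array([np.cos(2 * theta), np.sin(2 * theta)])

theta = np.deg2rad(30)
# The two edges in one column of figure 2.1 have gradients that differ by
# 180 degrees, yet they are mapped to the same point:
print(np.round(double_angle(theta), 3))          # [0.5   0.866]
print(np.round(double_angle(theta + np.pi), 3))  # [0.5   0.866]
```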

It seems like the gradient in the neighborhood could be used after all, if the angle of the gradient is doubled. This idea however breaks down when going to 3D. Consider an arbitrary orientation in the xy-plane given by spherical coordinates (r, θ, π/2). Now if the angles are doubled it gets coordinates (r, 2θ, π). This vector points straight down along the z-axis, and this is valid for all orientations in the xy-plane. That is, all different orientations in the xy-plane are mapped to the same point in the feature space. If the angles are not doubled, this representation inherits the same problem as the gradient in 2D described earlier. Therefore some other representation is needed. To find this, let us look at the multidimensional Fourier transform, defined by

F(u) = F{f(ξ)} = ∫_{R^n} f(ξ) exp(−i u^T ξ) d^n ξ    (2.2)

When the function f(ξ) undergoes an affine transformation g(ξ) = f(Aξ + b) in the spatial domain, this is reflected in the Fourier transform by

G(u) = F{g(ξ)} = (1/|det A|) exp(i u^T A^{-1} b) F([A^T]^{-1} u)    (2.3)


See [3] for a proof of eq. (2.3). Pure rotations are affine transformations with [A^T]^{-1} = A, det A = 1 and b = 0, thus eq. (2.3) can be written as

G(u) = F(Au)

The relationship between the rotated function f(Aξ) and its Fourier transform is then

F(Au) = G(u) = F{g(ξ)} = F{f(Aξ)}    (2.4)

That is the Fourier transform undergoes the same rotation.

Now let us look at the subset of n-dimensional functions for which it holds that

f(ξ) = g(ξ^T x)    (2.5)

where ξ is the spatial coordinate, x is a constant vector pointing in some direction and g is a one dimensional function. This implies that the n-dimensional function f(ξ) varies only in one direction. If the signal f(ξ) can be written as eq. (2.5) the signal/neighborhood is said to be simple or intrinsically one dimensional (i1D).

Consider a simple 2-dimensional function and let the variation be in the direction of ξ_1. That is, the function can be written as

f(ξ) = f(ξ_1, ξ_2) = f_1(ξ_1) f_2(ξ_2)

where f_2(ξ_2) ≡ 1. The Fourier transform is Cartesian separable [3], and thus

F(u) = F(u_1, u_2) = F{1} F{f_1(ξ_1)} = δ(u_2) F_1(u_1).

where δ(u_2) is the Dirac impulse. That is, the Fourier transform of the n-dimensional simple function f(ξ) is the one-dimensional Fourier transform of the function describing the variation. All the energy in the Fourier transform is concentrated along a line through the origin in the direction of ξ_1 (the same direction as the variation).

Now by using eq. (2.4) it is obvious that if the signal is rotated then the line with the energy is rotated as well. It is however not likely to find exactly simple neighborhoods in practical applications, due to e. g. noise or curvature in the signal. Thus energies concentrated along a sharp line in the Fourier transform are unlikely. Instead the energy will be spread over a wider area, but it will be concentrated along a certain direction which coincides with the orientation. Hence estimating orientation is the same as searching for the direction in Fourier space that contains the most energy. Figure 2.2 gives an example of a simple neighborhood, a neighborhood that can be approximated as simple and one that is not simple, together with their respective Fourier transforms.
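This property is easy to verify numerically. The sketch below builds a made-up simple 2D signal and locates the peak of its Fourier magnitude; the signal, size and frequency are assumptions chosen only so that the peaks fall exactly on FFT bins:

```python
import numpy as np

# A simple (i1D) signal: it varies only along the direction (4, 3) / 5.
size = 64
u0, v0 = 4, 3                    # integer frequencies -> peaks fall on FFT bins
xx, yy = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
signal = np.cos(2 * np.pi * (u0 * xx + v0 * yy) / size)

spectrum = np.abs(np.fft.fftshift(np.fft.fft2(signal)))
peak = np.unravel_index(np.argmax(spectrum), spectrum.shape)
center = size // 2
print(peak[0] - center, peak[1] - center)
# The peak lies at +-(4, 3): all the energy sits on a line through the origin
# in the same direction as the spatial variation.
```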

2.2 Tensors

Before continuing with the estimation of local orientation it is time to introduce a new mathematical concept, that will be used to represent the local orientation.


(a) Simple neighborhood

(b) Approximately simple neighborhood

(c) Not a simple neighborhood

Figure 2.2. Examples of a simple signal, an approximately simple signal and a non simple signal. To the left the signal in the spatial domain and to the right the Fourier domain. The Fourier transforms have been manipulated so what is actually shown in the right column is the positions where energy exists in the Fourier domain.


Mathematically, physical quantities can be described with just a number representing their size, as in the case with mass, length and area. These quantities are known as scalars. Sometimes it is not enough to just describe the size; a direction must also be given for the quantity to be useful. Examples of this are velocity, magnetic flow and force. Quantities with this property are known as vectors. But there are situations when a vector cannot give a convenient description. This is very common as the dimensionality of the problem grows. Therefore the vector concept needs to be generalized. One generalization is the tensor. Tensors will here be denoted with bold capital letters, e. g. T.

Tensors are e. g. used for describing the stress at a point due to internal forces, the deformation of an arbitrary infinitesimal element of volume of an elastic body and in the theory of relativity.

Depending on the complexity of the quantity a tensor describes, it can be said to have different orders. Tensors of order 0 are nothing other than scalars, and tensors of order 1 correspond to vectors. Tensors of order 2 can be represented as matrices. Here only tensors of order 2 will be considered, so a tensor can be thought of as a matrix with some special properties.

Since tensors are a generalization of the vector concept a number of concepts from vector algebra can also be generalized to tensor algebra. A few that will be used here are the following

Scalar product

The scalar product between two tensors T and U is defined by (assuming an orthonormal basis)

T • U = Σ_ij t_ij u_ij    (2.6)

where t_ij and u_ij are the elements of the tensors. The scalar product of the tensor with itself defines the norm

‖T‖^2 = T • T = Σ_ij t_ij^2 = Σ_n λ_n^2    (2.7)

where λ_n are the eigenvalues of T.

Basis tensors

Just like in the case with vectors, it is meaningful to define a basis so that a tensor can be written as a weighted sum of a set of basis tensors

T = Σ_i c_i B_i    (2.8)


Outer product tensors

A special case of tensors is the outer product tensor. This tensor can be expressed as the outer product of two vectors a and b,

T = a b^T    (2.9)

The norm of an outer product tensor is, by using eqs. (2.7) and (2.9),

‖T‖^2 = T • T = (a b^T) • (a b^T) = (a · a)(b · b) = ‖a‖^2 ‖b‖^2    (2.10)

For more about tensors see for example [17].
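As a small numerical illustration of the definitions above (eqs. (2.6)-(2.10)), using made-up example tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

def tensor_dot(T, U):
    # Scalar product of eq. (2.6): sum_ij t_ij * u_ij.
    return float(np.sum(T * U))

# Eq. (2.7): the squared norm equals the sum of squared eigenvalues
# (checked here for a symmetric example tensor).
A = rng.standard_normal((3, 3))
T = A + A.T
print(tensor_dot(T, T), np.sum(np.linalg.eigvalsh(T) ** 2))

# Eqs. (2.9)-(2.10): the norm of an outer product tensor a b^T.
a, b = rng.standard_normal(3), rng.standard_normal(3)
T_outer = np.outer(a, b)
print(tensor_dot(T_outer, T_outer), np.dot(a, a) * np.dot(b, b))
```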

2.3 Orientation using tensors

Now back to finding a suitable representation for the local orientation. No matter what type of representation is chosen it should meet the following three basic requirements [18].

The ’uniqueness’ requirement: To remove the discontinuity experienced earlier in the case with the edges in figure 2.1, the representation should map the two vectors x and −x to the same point in the feature space, i.e.

T(x) = T(−x)    (2.11)

where x is a vector in the original space and T is the map of x.

The ’uniform stretch’ requirement: The mapping should locally preserve the angle metric of the original space, i.e.

‖δT‖ = c ‖δx‖,   r = const    (2.12)

where r = ‖x‖ and c is a ’stretch’ constant.

The ’polar separability’ requirement: Since the magnitude of the orientation vector x is (normally) independent of the direction, it is also reasonable to require that

‖T‖ = f(‖x‖)    (2.13)


2.3.1 The Orientation Tensor

A mapping that meets all of the above requirements, and thus will be used to represent the local orientation, is given by

T ≡ r^{-1} x x^T    (2.14)

where r is any constant greater than zero and x is a constant vector pointing in the direction of the orientation. That this actually is a suitable representation for local orientation is proved in appendix A. Notice that no assumption about the dimensionality of x has been made here, which makes the orientation tensor suitable for signals of any dimensionality.

2.3.2 Interpreting the Orientation tensor

In an n-dimensional signal the orientation is represented, according to eq. (2.14), by an n × n matrix. It is not trivial to see how the orientation is given by this matrix.

Interpretation in 2D

A simple neighborhood is represented by a tensor T_S with rank 1 (subscript S for simple). T_S can be written as

T_S = λ_1 ê_1 ê_1^T    (2.15)

In 2D the orientation tensor can generally be written as

T = λ_1 ê_1 ê_1^T + λ_2 ê_2 ê_2^T    (2.16)

This can be visualized as an ellipse with the principal axes pointing in the directions of the eigenvectors ê_1, ê_2 and the lengths of the axes given by the eigenvalues λ_1 ≥ λ_2 ≥ 0 respectively. The simple neighborhood T_S is then represented by a line pointing in the direction of ê_1. In practice, signal neighborhoods are rarely exactly simple, due to for example noise and/or curvature. Therefore tensors of the type in eq. (2.16) are most likely to be found, and the best estimation one can do is to find the simple tensor that minimizes

∆ = ‖T − T_S‖    (2.17)

where T_S is given by eq. (2.15). The norm of a tensor/matrix is invariant under rotation [14]. Then let C be an orthonormal matrix and rewrite eq. (2.17) as

∆ = ‖C^{-1}(T − r^{-1} x x^T)C‖ = ‖C^{-1}TC − C^{-1} r^{-1} x x^T C‖

Also let C be chosen such that C^{-1}TC is diagonal. It is always possible to find such a matrix C as long as T is nondefective, i. e. T has n distinct eigenvalues [14]. A proof of this is found in appendix A. Eq. (2.15) yields that only one eigenvalue of x x^T is nonzero. The norm of C^{-1}TC is the sum of the squares of the eigenvalues of T, so it follows that ∆ is minimized if C^{-1} r^{-1} x x^T C is chosen so that it removes the largest eigenvalue of C^{-1}TC. Then ∆ becomes

∆ = Σ_{k≠1} λ_k^2

and since T and r^{-1} x x^T undergo the same rotation, T_S should be chosen as

T_S = r^{-1} x x^T = λ_1 ê_1 ê_1^T

The value of ∆ is a measure of how well the tensor fits the simple neighborhood; the smaller ∆, the better the fit.

Figure 2.3. Visualization of three different orientation tensors in 2D: (a) a tensor from an almost simple neighborhood, (b) a tensor from an approximately simple neighborhood, (c) a tensor from an almost isotropic neighborhood. The dashed lines are the ellipses given by the eigenvectors and eigenvalues. The solid lines are given by λ_i ê_i. The estimated orientation is in the direction of the longest of the two.

Interpretation in 3D

The Orientation tensor is in 3D generally given by

T = λ_1 ê_1 ê_1^T + λ_2 ê_2 ê_2^T + λ_3 ê_3 ê_3^T    (2.18)

In 3D there are structured neighborhoods that are not described by a rank 1 tensor [11]. The complexity of a neighborhood is given by the rank of the tensor. There are three interesting cases:


1. Plane case (rank 1 neighborhood, λ_1 ≫ λ_2 ≃ λ_3)

   T ≃ λ_1 T_1 = λ_1 ê_1 ê_1^T

   This corresponds to a neighborhood that is approximately planar, i.e. it is constant on planes in a given orientation. The orientation of the normal vectors to the planes is given by ê_1.

2. Line case (rank 2 neighborhood, λ_1 ≃ λ_2 ≫ λ_3)

   T ≃ λ_1 T_2 = λ_1(ê_1 ê_1^T + ê_2 ê_2^T)

   These are neighborhoods that are approximately constant on lines. The orientation of the line is given by the eigenvector corresponding to the smallest eigenvalue, ê_3.

3. Isotropic case (rank 3 neighborhood, λ_1 ≃ λ_2 ≃ λ_3)

   T ≃ λ_1 T_3 = λ_1(ê_1 ê_1^T + ê_2 ê_2^T + ê_3 ê_3^T)

   This corresponds to approximately isotropic neighborhoods in which no typical orientation can be found.

Simple algebraic manipulation of eq. (2.18) [29] shows that the general 3D tensor can be written as a linear combination of the three cases above,

T = (λ_1 − λ_2)T_1 + (λ_2 − λ_3)T_2 + λ_3 T_3    (2.19)

The generalization of the ellipse in 2D is to visualize the 3D tensor as an ellipsoid. A better choice is to visualize the tensor as a combination of a spear (corresponding to case 1), a disc (corresponding to case 2) and a sphere (corresponding to case 3). An example of such a visualization is found in figure 2.4.

Figure 2.4. Visualization of tensors in 3D. The red spear corresponds to a measure of how much case 1 the tensor is. In the same way the yellow disk and the green sphere corresponds to case 2 and case 3 respectively.
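A sketch of how eq. (2.19) can be used to split an estimated 3D orientation tensor into its plane, line and isotropic parts. The example tensor is made up to be almost plane-like; only the eigenvalue decomposition is assumed:

```python
import numpy as np

def decompose_orientation_tensor(T):
    # Split a 3D orientation tensor into plane, line and isotropic parts, eq. (2.19).
    lam, e = np.linalg.eigh(T)           # eigenvalues in ascending order
    lam, e = lam[::-1], e[:, ::-1]       # reorder so that lambda1 >= lambda2 >= lambda3
    T1 = np.outer(e[:, 0], e[:, 0])
    T2 = T1 + np.outer(e[:, 1], e[:, 1])
    T3 = T2 + np.outer(e[:, 2], e[:, 2])
    weights = (lam[0] - lam[1], lam[1] - lam[2], lam[2])
    return weights, (T1, T2, T3)

# Made-up tensor from an almost plane-like neighborhood (lambda1 >> lambda2 ~ lambda3).
n = np.array([0.0, 0.0, 1.0])
T = 10.0 * np.outer(n, n) + 0.5 * np.eye(3)
(w_plane, w_line, w_iso), _ = decompose_orientation_tensor(T)
print(w_plane, w_line, w_iso)   # the plane weight dominates; e1 is the plane normal
```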


2.4 Estimation of orientation

So far a representation for local orientation has been derived, together with how this representation should be interpreted. Nothing has yet been said about how to actually construct the orientation tensor from a signal.

Two different methods will be presented here. The first uses a set of Quadrature filters and might be seen as the standard method for orientation estimation [11]. This method will not be used in the filter network later and is therefore not described in much detail; for more about this method the reader is referred to [11]. The other method, which is used in the filter network, uses a type of filters called Loglets [19]. Other methods to create the orientation tensor are found in e. g. [8] and [13]. A comparison between different methods can be found in [15].

2.4.1 Quadrature filter

This method is presented in [20] and estimates the local orientation in a signal by using a set of Quadrature filters. A Quadrature filter is defined as a filter that is zero over half of the Fourier space (regardless of dimension of the Fourier space).

F_k(u) = 0  if  u · n̂_k ≤ 0    (2.20)

where u is the frequency and n̂_k is the direction of the filter. The filter is polar separable into one part that only depends on the radius, R, and one part that only depends on the direction, D,

F_k(u) = R(‖u‖) D_k(û)

To meet the requirement in eq. (2.20) the directional function is chosen as

D_k(û) = (û · n̂_k)^2   if û · n̂_k > 0
D_k(û) = 0              otherwise    (2.21)

The radial function, R(‖u‖), can be chosen arbitrarily and is used to select the size of the structures the filter should detect. Different frequencies correspond to structures of different sizes.

This filter is real in the Fourier domain (see figure 2.5), thus the filter will be complex in the spatial domain [3], with an even real part and an odd imaginary part. As mentioned earlier, a set of filters is needed to perform the orientation estimation. An argument that more than 2^{N−1} filters, where N is the dimension, are needed can be found in [11]. The minimum number of filters needed is 3 in 2D and 6 in 3D. The 6 complex filters needed in 3D can be implemented as 12 real filters. These filters should be spread uniformly over the Fourier space [11]. The orientation tensor can then be obtained by a linear summation of the magnitudes of the quadrature filter outputs q_k, using the formula

T = Σ_k q_k M_k


Figure 2.5. Example of a Quadrature filter in the Fourier domain. It should be evident that this filter will detect energy in a certain direction in the Fourier space.

where q_k = |q_k| and M_k is a dual tensor associated with quadrature filter k, given by

M_k = α n̂_k n̂_k^T − β I

α and β are constants that depend on the dimensionality of the signal, and n̂_k is the direction of quadrature filter k. For mathematical proofs see [11].
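For illustration, a filter of this kind can be sampled directly on a Fourier-domain grid. The specific radial function below (a lognormal with an assumed center frequency and bandwidth) is only one possible choice; the thesis leaves the radial part free:

```python
import numpy as np

def quadrature_filter_fourier(size, n_hat, rho0=np.pi / 3, bandwidth=2.0):
    """Sample one 3D quadrature filter on a centered Fourier grid.

    Directional part: (u_hat . n_hat)^2 on the half space u . n_hat > 0 (eq. 2.21).
    Radial part: a lognormal with center frequency rho0 and relative bandwidth
    'bandwidth' (an assumed choice; any radial function could be used).
    """
    freqs = np.fft.fftshift(np.fft.fftfreq(size)) * 2 * np.pi
    u = np.stack(np.meshgrid(freqs, freqs, freqs, indexing="ij"), axis=-1)
    rho = np.linalg.norm(u, axis=-1)
    rho[rho == 0] = 1e-12                       # avoid log/divide by zero at DC

    radial = np.exp(-4.0 / (bandwidth**2 * np.log(2)) * np.log(rho / rho0) ** 2)
    cos_angle = (u @ n_hat) / rho
    directional = np.where(cos_angle > 0, cos_angle**2, 0.0)
    return radial * directional

F = quadrature_filter_fourier(32, np.array([1.0, 0.0, 0.0]))
print(F.shape, F.min())   # real and non-negative, zero over half of Fourier space
```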

Some examples of results when estimating orientation using this method can be found in figure 2.6. (The results are best viewed in color; a version of this thesis with color figures can be found at http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-5601.)

2.4.2 Loglets

Another way to construct the orientation tensor is to choose a filter set with filter responses h_k(ξ) that in themselves form a basis tensor, with tensor components

t_n(ξ) = Σ_{i,j} α_nij h_i(ξ) h_j(ξ)

The use of such a basis filter set admits synthesis of filter responses in an infinite number of directions [19]. One type of filters that can be used for this purpose are the Loglet filters. Loglets are similar to the quadrature filters and are polar separable in the Fourier domain as well. They are defined by a radial part and a vector valued directional part,

Q_sk(u) = R_s(‖u‖) D_k(û)



Figure 2.6. Examples of the result when using quadrature filters and the method described above to estimate orientation. Upper row: a simple test pattern (a) and the estimated orientation (b). Bottom row: a CT image (c) and the estimated orientation (d). The colors in figures (b) and (d) indicate the estimated orientation. The color coding is easiest to understand by looking at the upper row: green and red correspond to vertical and horizontal structures respectively.

where s indicates the scale and k is associated with a direction typically given by some vector ˆnk.

The radial function

The radial function is used, like in the case with the Quadrature filters, to select the size of the structures the filter should detect. The radial function, R_s(‖u‖) (shown in the upper graph in figure 2.7), is built by combining functions of the type shown in the lower graph of figure 2.7.

The Directional function

In 3D this basis filter set must support a uniform approximation of functions on the unit sphere [2]. This can be achieved by using polynomials of the type

F(u_1, u_2, u_3) = Σ_{η,µ,ν=0}^{N} a_ηµν u_1^η u_2^µ u_3^ν

Figure 2.7. Top: the radial function R_s(‖u‖), calculated by combining the functions in the graph below. Bottom: example of the functions used to build the radial function of the loglet.

These polynomials form a complete set of functions on the unit sphere. The polynomials can be grouped together as homogeneous polynomials of degree l = η + µ + ν, resulting in

(l + 1)(l + 2)/2

linearly independent polynomials for each l. This is the spherical harmonics basis that is commonly used in modern physics [28]. The basis filters should have a uniform shape and a well defined orientation to simplify interpolation and make the computations more efficient [2]. Unfortunately, not all of the spherical harmonics have these properties, see figure 2.8.

An alternative basis filter that has these properties is given by

D_lj(u) = G(ρ)(û · n̂_lj)^l    (2.22)

In [2] it can be found that the directions for the first order basis filters are

n̂_11 = [1 0 0]^T    n̂_12 = [0 1 0]^T    n̂_13 = [0 0 1]^T

and the second order basis filters have the following directions

n̂_21 = c[b −a 0]^T    n̂_24 = c[0 b a]^T
n̂_22 = c[−a 0 b]^T    n̂_25 = c[b a 0]^T
n̂_23 = c[0 b −a]^T    n̂_26 = c[a 0 b]^T


Figure 2.8. Two second order spherical harmonics. Note the non-uniform shape and lack of well-defined orientation, an undesired property for the basis filters.

where

a = 2,   b = 1 + √5,   c = 1/√(10 + 2√5)

These directions do not fit a Cartesian grid very well. Therefore the second order filters will instead be spread along the diagonals, that is in the following directions

n̂_21 = (1/√2)[1 1 0]^T     n̂_22 = (1/√2)[−1 1 0]^T
n̂_23 = (1/√2)[1 0 1]^T     n̂_24 = (1/√2)[−1 0 1]^T
n̂_25 = (1/√2)[0 1 1]^T     n̂_26 = (1/√2)[0 −1 1]^T

This set of directions will also constitute a basis and will span the same space as the original spherical harmonics, but it will not be orthonormal. This yields that the directional part is given by

D_k(û) = [1  û]^T (û^T n̂_k)^{2a}    (2.23)

where

û is a unit vector in the frequency domain,
n̂_k is a filter directing unit vector,
a ≥ 0 is an integer setting the directional selectivity of the loglet.

The loglet is a vector valued filter, but in this application there is actually no need for any component other than the one resulting from the 1 in eq. (2.23). Thus the directional function is still scalar. One might then argue that the filters used are not really Loglets, but the term Loglet is used anyway.


Figure 2.9. First order harmonic basis functions, with directions n̂_11 (a), n̂_12 (b) and n̂_13 (c). These functions are odd.

2.4.3 Constructing the Orientation tensor from harmonic functions

The orientation tensor is given by (note that the magnitude of the filter response has been changed to the l2 norm in the expression)

T = Σ_{k=1}^{K} ‖q_k‖^2 M_k

where q_k are the responses from single scale loglets and M_k are the filter orientation tensors [19]. Now if the number of orientations, K, is increased to the limit K → ∞, the above sum turns into the following integral [19]

T = ∫_n̂ ‖q(n̂)‖^2 M(n̂) dn̂    (2.24)

where q(n̂) is the response from a loglet in the n̂ direction and M(n̂) is the dual tensor corresponding to n̂. To solve the integral in eq. (2.24) and find the tensor


Figure 2.10. Second order harmonic basis functions, with directions n̂_21 and n̂_22 (a), n̂_23 and n̂_24 (b), n̂_25 and n̂_26 (c). The second order harmonics are even.


components is very cumbersome. But these components can be constructed as a weighted sum of products of the outputs from the basis filters,

t_n(ξ) = Σ_{j=1}^{J} Σ_{l=1}^{j} a_jnl h_j(ξ) h_l(ξ)    (2.25)

where J is the total number of spherical harmonic functions used, a_jnl is the weighting coefficient for the product between filters j and l in tensor component n, and h_j(ξ) is the j:th spherical harmonic function. That this is actually the case will be shown in appendix B by finding the combinations that construct the tensor components. In this chapter only the final filter combinations in the spatial domain will be given. They are

t_1 = h_0(h_0 − h_25 − h_26) − h_11^2
t_2 = h_0(h_0 − h_23 − h_24) − h_12^2
t_3 = h_0(h_0 − h_21 − h_22) − h_13^2
t_4 = (1/2) h_0(h_21 − h_22) − h_11 h_12
t_5 = (1/2) h_0(h_23 − h_24) − h_11 h_13
t_6 = (1/2) h_0(h_25 − h_26) − h_12 h_13

and the tensor is given by

T = [ t_1  t_4  t_5
      t_4  t_2  t_6
      t_5  t_6  t_3 ]

Here only a second order approximation is used (only up to second order harmonics are used). This yields only one set of filter combinations to construct the tensor. If higher order harmonics are used an infinite number of combinations can be found to construct the tensor components.

The filternets used for estimating orientation, which will be presented in the next chapter, approximate this loglet filter set. Thus the orientation tensor will be constructed by using eq. (2.25), where h_i(ξ) is the output from the net.
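As a sketch, the filter combinations above translate directly into code for a single position. The naming of the responses (h_0 for the zeroth order filter, h_11-h_13 for the first order and h_21-h_26 for the second order filters) follows the directions defined earlier, but the function itself is only an illustration:

```python
import numpy as np

def tensor_from_harmonics(h0, h1, h2):
    """Orientation tensor from spherical harmonic filter responses at one position.

    h0: zeroth order response; h1 = (h11, h12, h13): first order responses;
    h2 = (h21, ..., h26): second order responses. A sketch of eq. (2.25) using
    the filter combinations listed above.
    """
    h11, h12, h13 = h1
    h21, h22, h23, h24, h25, h26 = h2
    t1 = h0 * (h0 - h25 - h26) - h11**2
    t2 = h0 * (h0 - h23 - h24) - h12**2
    t3 = h0 * (h0 - h21 - h22) - h13**2
    t4 = 0.5 * h0 * (h21 - h22) - h11 * h12
    t5 = 0.5 * h0 * (h23 - h24) - h11 * h13
    t6 = 0.5 * h0 * (h25 - h26) - h12 * h13
    return np.array([[t1, t4, t5],
                     [t4, t2, t6],
                     [t5, t6, t3]])

print(tensor_from_harmonics(1.0, (0.1, 0.2, 0.3), (0.0,) * 6))
```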


Chapter 3

Filter Network

In this chapter the concept of filter networks will be introduced. The advantages of using such a network structure for the filtering will be presented. Two examples will be given to show how a layered structure can be used to increase the performance of the filtering operation.

3.1 Introduction to Filter Networks

A Filter Network or Filternet is a type of filter bank and the concept was introduced in [1]. As the name implies, a filter network is a networklike structure through which the signal is filtered. The idea is that the introduction of this structure will in some way result in an increase of performance. A general filter network is shown in figure 3.1.

Every arc in the network is a filtering operation, and in every node the incoming filter responses from the previous layer are summed up. The description of a filter network can be divided into two parts: one part describing the structure of the net and another part describing its internal properties. The structure of the network comprises all properties that one can obtain from looking at a figure like figure 3.1; here things like the number of layers, the number of nodes in each layer and which nodes are connected to each other can be found. The internal properties then give the number of filter coefficients in each filter and their spatial coordinates, as well as the values of the coefficients. Filters at the same distance from the root node are said to belong to the same layer.

By using a structure like the one in figure 3.1 at least three advantages are introduced to the filtering.

1. With the possibility of sequential filtering, a complex filter can be decomposed into several smaller subfilters with fewer nonzero coefficients. As a result the convolutions can be performed faster.

2. Since a filter output from, say, layer k can contribute as an input to more than one convolution in layer k + 1, redundant computations can, if not be eliminated, at least be minimized, so that the same intermediate result is not computed more than absolutely necessary.

3. If the filters can be decomposed into sparse subfilters with very few nonzero coefficients, this further lowers the computational load. The sparsity must of course be exploited in the convolver used. An example of such a convolver can be found in [30].

Figure 3.1. A general filter network, with layers L1, L2, ..., Ln and outputs f̃_1, f̃_2, ..., f̃_m.

3.2 Designing a Filter Network

For each output of the network there exists a desired filter response F(u) in the Fourier domain (in the spatial domain as well, but here the Fourier domain is used). By for example using the methods described in [12], a filter response F̃(u) can be created. This filter response is linear with respect to the nonzero coefficients [26] c = [c_1, c_2, . . . , c_N]^T ∈ C^N, since

F̃(u) = F{f̃(ξ)} = Σ_{ξ∈Z^n} f̃(ξ) exp(−i ξ^T u) = Σ_{k=1}^{N} c_k exp(−i ξ_k^T u)    (3.1)

The idea is that the filter network should approximate this filter response as well as possible. By looking at figure 3.1 it is clear that a filter network can have more than one output, meaning that there exists more than one desired filter response. It then follows that an appropriate optimization problem to solve is to choose the coefficients in the filter network as the solution of

c* = arg min_c ε^2(c) = Σ_{m=1}^{M} ‖ W_m(u)( F̃_m(u) − F_m(u) ) ‖^2    (3.2)

where F̃_m(u) is the response of output m of the network and F_m(u) is the desired filter response for filter m. The function W_m(u) is a Fourier weight function which describes at what frequencies a close fit is most important [12]. Usually the estimated spectrum of the signal is used to define this weight function.

When optimizing a filter network the following steps have to be carried out:

1. Choose the structure of the network.

2. Choose internal properties, i.e. the number of filter coefficients and their spatial locations for each subfilter.

3. Choose the coefficient values, i.e. solve the minimization problem given by eq. (3.2).

Ideally all three of the above steps should be solved simultaneously when designing the network. The best would be to have an algorithm that given an arbitrary set of desired filter responses could return the entire filter network with structure, coefficient coordinates and coefficient values [27]. This is however a very complex task and it is doubtful that such a solution ever will be found.


The objective of the optimization process is to find a filter network with a minimal number of coefficients while keeping the performance of the net close to the ideal filter set. Moreover the errors should be spread as evenly as possible among the outputs [27].

The actual optimization process of the filter networks is beyond the scope of this study, but for the interested reader a method based on block relaxation is given in [25] that optimizes a filter network given the structure of the net and the coordinates of the nonzero coefficients in each filter. In [22] a method is presented that uses a genetic algorithm (an optimization method that tries to mimic biological systems, using features like mutations and crossovers to ”breed” new solutions [7]) for placing the filter coefficients.
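For a single subfilter with fixed coefficient positions, the minimization in eq. (3.2) reduces to an ordinary weighted linear least-squares problem, which is roughly the kind of subproblem a block relaxation scheme solves for one subfilter at a time. The sketch below covers only this linear, single-filter case with a made-up 1D low pass target:

```python
import numpy as np

def fit_sparse_filter(positions, desired, weight, freqs):
    """Weighted least-squares fit of one sparse filter: the linear case of eq. (3.2).

    positions: (N, d) spatial coordinates of the nonzero coefficients
    desired:   desired frequency response F(u) sampled at freqs
    weight:    Fourier weight W(u) at the same samples
    freqs:     (M, d) frequency sample points u
    """
    A = np.exp(-1j * freqs @ positions.T)       # columns exp(-i u^T xi_k), cf. eq. (3.1)
    b = (weight * desired).astype(complex)
    c, *_ = np.linalg.lstsq(weight[:, None] * A, b, rcond=None)
    return c

# Toy 1D example: approximate an ideal low pass filter with 5 coefficients.
u = np.linspace(-np.pi, np.pi, 257)[:, None]
xi = np.arange(-2, 3)[:, None].astype(float)
F_desired = (np.abs(u[:, 0]) < np.pi / 2).astype(float)
c = fit_sparse_filter(xi, F_desired, np.ones_like(F_desired), u)
print(np.round(c.real, 3))                      # a short sinc-like coefficient set
```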

In this study it is assumed that an optimized filter network for estimation of local orientation in 3D signals already exists, and the task here is to make the implementation as fast as possible. (The nets have been optimized with the previously mentioned block relaxation algorithm.)

3.3 Examples of filter networks

It is however possible to construct filter networks that use some of the desired properties that come with the network structure, without going through the complicated and time consuming optimization process described above.

First the concept of Cartesian separability will be reviewed since this is used extensively in both examples.

3.3.1 Cartesian separability

A function in two dimensions, f(x_1, x_2), is Cartesian separable if the function can be written as

f(x_1, x_2) = f_1(x_1) f_2(x_2)

for some functions f_1(·) and f_2(·). This also has some influence on the convolution. An ordinary convolution between the signal s(x_1, x_2) and the kernel f(x_1, x_2) is given by

g(x_1, x_2) = (s ∗ f)(x_1, x_2) = Σ_{k=−∞}^{∞} Σ_{l=−∞}^{∞} f(x_1 − k, x_2 − l) s(k, l)


Now by using the Cartesian separability of f(x_1, x_2) the above expression can be rewritten as

g(x_1, x_2) = Σ_k Σ_l f_1(x_1 − k) f_2(x_2 − l) s(k, l)
            = Σ_k f_1(x_1 − k) Σ_l f_2(x_2 − l) s(k, l)    (the inner sum is g_2(k, x_2) = (s ∗ f_2)(k, x_2))
            = Σ_k f_1(x_1 − k) g_2(k, x_2) = (f_1 ∗ g_2)(x_1, x_2)

where all sums run from −∞ to ∞.

So if the kernel is Cartesian separable then the following is also true

(s ∗ f)(x_1, x_2) = (s ∗ f_1 ∗ f_2)(x_1, x_2)    (3.3)

3.3.2 Gaussian low pass filtering

This simple example makes it possible to count the number of operations that have to be performed in the two cases, with and without the filter network. Consider a simple low pass filtering with a Gaussian kernel

f(x_1, x_2) = exp( −(x_1^2 + x_2^2) / (2σ^2) )

This kernel is Cartesian separable with

f_1(x_1) = exp( −x_1^2 / (2σ^2) )    f_2(x_2) = exp( −x_2^2 / (2σ^2) )

By using eq. (3.3) this kernel, of say size n × n, can be decomposed into two smaller kernels of size 1 × n and n × 1 respectively. Assume that the signal is of size m × m. If all border effects are neglected, n^2 multiplications and n^2 − 1 additions are needed to compute the filter response at some point (x_1, x_2). Then for the whole signal n^2 m^2 multiplications and m^2(n^2 − 1) additions are needed.

If the kernel is separated as described above, only n multiplications and n − 1 additions are needed for each of the two decomposed filters, giving a total of 2nm^2 multiplications and 2m^2(n − 1) additions for the whole filter response. If an addition and a multiplication are approximated to cost equally in time and/or computer resources and lower order terms are neglected, this results in table 3.1. From this it should be obvious that the separable implementation is more computationally efficient for n ≥ 3.

The resulting layered filter structure in figure 3.2 does not look very much like a network. Therefore another example is given.


Method                        Number of operations
Direct implementation         2 m^2 n^2
Separable implementation      4 m^2 n

Table 3.1. Computation complexity for separable Gauss kernel

Figure 3.2. Cartesian separation of a Gaussian kernel. Left: Filter network with the two dimensional kernel. Right: Network with the separated kernel.
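The equivalence in eq. (3.3), i.e. that the two networks in figure 3.2 produce the same output, can be checked numerically. The kernel size, σ and the zero-padded boundary handling below are assumptions made for the sketch:

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

# Separable Gaussian filtering, eq. (3.3): one n x n kernel versus two 1D passes.
sigma, n = 2.0, 11
x = np.arange(n) - n // 2
g1 = np.exp(-x**2 / (2 * sigma**2))             # 1D Gaussian f1 (= f2)
g2 = np.outer(g1, g1)                           # the full 2D kernel f = f1 * f2

signal = np.random.default_rng(0).standard_normal((256, 256))

direct = convolve(signal, g2, mode="constant")                   # ~n^2 ops per pixel
separable = convolve1d(convolve1d(signal, g1, axis=0, mode="constant"),
                       g1, axis=1, mode="constant")              # ~2n ops per pixel

print(np.allclose(direct, separable))           # True: identical result, fewer ops
```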

3.3.3 Polynomial expansion

This type of polynomial expansion was introduced by Farnebäck in [8] and is a method to approximate every neighborhood in a signal (in this case a 2D signal) with a polynomial of (in this case) degree 2. That is, every neighborhood is approximated with

f(x) ∼ x^T A x + b^T x + c

where A is a symmetric matrix, b is a vector and c is a scalar. The coefficients of the model can be estimated in terms of normalized convolution [11] with the basis functions {1, x, y, x^2, y^2, xy} and certainty identical to 1 in every point. For a short introduction to polynomial expansion see appendix C. If this introduction is not sufficient the reader is referred to [8].

Figure 3.3. Polynomial expansion by a series of convolutions.

For this example it is enough to know that polynomial expansion can be performed by a series of convolutions; more precisely, the convolution scheme in figure 3.3 can be used to perform polynomial expansion in a kind of ”brute force” way. How the filters are created is described in appendix C.

The solution to the polynomial expansion, that is the coefficients {r_i}_{i=1}^{6} for each basis function, is given by

r = (B^* W_a^2 B)^{-1} B^* W_a^2 f    (3.4)

where r = [r_1 r_2 . . . r_6]^T, f is a neighborhood of the signal, B is a matrix with the basis functions in its columns and W_a is a diagonal matrix with a Gaussian on the diagonal. See appendix C for more details. The part of eq. (3.4) that is to be inverted is independent of the signal f and can therefore be computed once and for all. This will actually result in a sparse matrix [8]. Introduce the matrix G so that G^{-1} = (B^* W_a^2 B)^{-1}, and the solution can now be written

r(x, y) = G^{-1} B^* W_a^2 f = G^{-1} [ (a · b_1) • f(x, y)
                                        (a · b_2) • f(x, y)
                                        ...
                                        (a · b_6) • f(x, y) ]

The · here means elementwise multiplication in the above expression. It should be rather obvious that the basis functions {1, x, y, x^2, y^2, xy} are Cartesian separable, using the following functions

f_0(x) = 1    f_x(x) = x    f_xx(x) = x^2
g_0(y) = 1    g_y(y) = y    g_yy(y) = y^2

1:    b_1(x, y) = f_0(x) g_0(y)  = b_1x(x) b_1y(y)
x:    b_2(x, y) = f_x(x) g_0(y)  = b_2x(x) b_2y(y)
y:    b_3(x, y) = f_0(x) g_y(y)  = b_3x(x) b_3y(y)
x^2:  b_4(x, y) = f_xx(x) g_0(y) = b_4x(x) b_4y(y)
y^2:  b_5(x, y) = f_0(x) g_yy(y) = b_5x(x) b_5y(y)
xy:   b_6(x, y) = f_x(x) g_y(y)  = b_6x(x) b_6y(y)        (3.5)

Now, using the fact that the scalar products can be seen as ordinary convolutions (see appendix C for a proof) and the properties of Cartesian separability, the above solution can be calculated as

r(x, y) = G^{-1} [ [a_x · b_1x] ∗ [a_y · b_1y] ∗ f(x, y)
                   [a_x · b_2x] ∗ [a_y · b_2y] ∗ f(x, y)
                   ...
                   [a_x · b_6x] ∗ [a_y · b_6y] ∗ f(x, y) ]

That is, the polynomial expansion can be performed using the convolution scheme in figure 3.4. From figure 3.4 and eq. (3.5) it is clear that there is some redundancy; some of the convolutions are performed between the exact same functions. By eliminating this redundancy, the network in figure 3.4 can be rewritten into the network in figure 3.5. This results in a more networklike structure, and the fact that the output of a layer can contribute to more than one input in the next layer is used. By using this, the polynomial expansion can be performed with the 9 convolutions in figure 3.5 instead of 12 as in figure 3.4.

By using this convolution scheme and the sparse matrix G^{-1}, one gets a more efficient way to implement the polynomial expansion.
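A compact sketch of the scheme in figure 3.5: three filters along one axis feed shared intermediate results, filters along the other axis follow, and the precomputed G^{-1} is applied per pixel. The neighborhood size, the Gaussian applicability and the use of correlations rather than convolutions (which only affects the signs of the odd basis functions, cf. appendix C) are assumptions made for this sketch:

```python
import numpy as np
from scipy.ndimage import correlate1d

def polynomial_expansion_2d(f, n=4, sigma=1.5):
    """Polynomial expansion with basis {1, x, y, x^2, y^2, xy}, per figure 3.5."""
    x = np.arange(-n, n + 1, dtype=float)
    a = np.exp(-x**2 / (2 * sigma**2))          # 1D applicability a_x = a_y (assumed Gaussian)
    one, lin, quad = np.ones_like(x), x, x**2

    def corr(img, kx, ky):
        # [a_x . b_ix] applied along axis 0, then [a_y . b_iy] along axis 1 (zero padding).
        tmp = correlate1d(img, a * kx, axis=0, mode="constant")
        return correlate1d(tmp, a * ky, axis=1, mode="constant")

    responses = np.stack([corr(f, one, one),    # basis function 1
                          corr(f, lin, one),    # x
                          corr(f, one, lin),    # y
                          corr(f, quad, one),   # x^2
                          corr(f, one, quad),   # y^2
                          corr(f, lin, lin)],   # xy
                         axis=-1)

    # G = B^T W_a^2 B for the same basis and applicability; it does not depend on
    # the signal, so it is computed once (cf. eq. (3.4)).
    xx, yy = np.meshgrid(x, x, indexing="ij")
    B = np.stack([np.ones_like(xx), xx, yy, xx**2, yy**2, xx * yy],
                 axis=-1).reshape(-1, 6)
    w = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)).reshape(-1)
    G = B.T @ (w[:, None] * B)

    return responses @ np.linalg.inv(G).T       # r = G^{-1} B^T W_a^2 f, per pixel

r = polynomial_expansion_2d(np.random.default_rng(1).standard_normal((64, 64)))
print(r.shape)                                  # (64, 64, 6) coefficients per pixel
```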

These were two simple examples of how the introduction of a layered structure and a filter network can help to create more efficient filtering.


Figure 3.4. Filter network for polynomial expansion when the Cartesian separability of the basis functions has been exploited. The symbol after the output layer indicates which coefficient comes out where in the network.

Figure 3.5. Filter network for polynomial expansion when the Cartesian separability of the basis functions has been exploited and the redundancy has been eliminated.

3.4 Filter network for 3D orientation

Two filter networks for estimating orientation in 3D signals will be used. The first one is the one published in [25]. This filternet can be seen in figure 3.6. It uses a total of 335 coefficients. The other filter network uses a more separated structure, needing only 168 coefficients, and can be seen in figure 3.7. Both these numbers should be compared with the 11979 coefficients (based on 9 filters of size 11^3) needed in a direct implementation. Another network, which is based on the net in figure 3.7 but also has a part which can be used for image/volume enhancement, will be used as well. This net is however not shown here. Both orientation networks use the loglet filters described earlier in chapter 2.


3.4 Filter network for 3D orientation 33 L1 8 < :   z1   ?? ? ? ? ? L2 8 < :   z2   ??? ? ? ?  z3 ? ? ?  ? ? ? ttjjjjjjj jjjjjjjj L3 8 < :   z4   ?? ? ? ? ?   ? ? ? ? ? ?  rreeeeeeeeee eeeeeeeeee eeee  ? ? ? ? ? ? L4 8 < :   z5   z6 ? ? ?  ? ? ?    ? ? ? ? ? ?   ? ? ? ? ? ?   ? ? ? ? ? ? L5 8 > > > > > < > > > > > :    // / / / / / / / /  ? ? ? ? ? ? ? ? ? ? ? ? ? $$ J J J J J J J J J J J J J J J J J '' O O O O O O O O O O O O O O O O O O O O O )) R R R R R R R R R R R R R R R R R R R R R R R R R ** T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T ** V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V V ++ W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W W          // / / / / / / / /  ? ? ? ? ? ? ? ? ? ? ? ? ? $$ J J J J J J J J J J J J J J J J J '' O O O O O O O O O O O O O O O O O O O O O )) R R R R R R R R R R R R R R R R R R R R R R R R R ** T T T T T T T T T T T T T T T T T T T T T T T T T T T T T T  wwooooo ooooooo oooooo ooo zztttt tttt tttt tttt t         // / / / / / / / /  ? ? ? ? ? ? ? ? ? ? ? ? ? $$ J J J J J J J J J J J J J J J J J '' O O O O O O O O O O O O O O O O O O O O O  ttjjjjjjj jjjjjj jjjjjj jjjjjj jjjjj uullllll llllll llllll llllll l wwooooo ooooooo oooooo ooo zztttt tttt tttt tttt t         // / / / / / / / /  ? ? ? ? ? ? ? ? ? ? ? ? ?  ssgggggggggggg gggggggg gggggggg gggggggg ggg tthhhhhhhh hhhhhhhh hhhhhhhh hhhhhhhh hh ttjjjjjjj jjjjjj jjjjjj jjjjjj jjjjj uullllll llllll llllll llllll l wwooooo ooooooo oooooo ooo zztttt tttt tttt tttt t         L6 8 < :   z7    z8    z9    z10    z11    z12    z13    z14    z15            h1 h2 h3 h4 h5 h6 h7 h8 h9

Figure 3.6. Filter network for estimating orientation in 3D signals using a loglet filter set as described in chapter 2. Arcs without an associated letter are single-coefficient subfilters. The net has a total of 335 coefficients.

[Figure 3.7 here: a five-layer network (L1–L5) with subfilters z1–z18 producing the nine outputs h1–h9.]

Figure 3.7. Another filter network for estimating orientation. This net has a more separated upper part (L1–L3). Because of the separation only 168 coefficients are needed.
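To give an idea of how a network such as the one in figure 3.6 or 3.7 can be evaluated, a highly simplified C++ sketch is given below. The data layout, the type names and the treatment of out-of-range offsets are illustrative assumptions, not the implementation described in this thesis. Each arc is a sparse subfilter stored as (offset, coefficient) pairs, and every node sums the sparse convolutions of its input arcs, so that earlier results can be reused by several later nodes.

#include <cstddef>
#include <utility>
#include <vector>

using Volume = std::vector<float>;   // flattened 3D volume

// One arc in the network: a sparse subfilter applied to the output of an earlier node.
struct SparseFilter {
    int source;                                      // index of the node it reads from
    std::vector<std::pair<long long, float>> taps;   // (flattened offset, coefficient)
};

// One node: the sum of all its incoming sparse convolutions.
struct Node {
    std::vector<SparseFilter> inputs;
};

// Evaluate one node given the results of all earlier nodes.
Volume evalNode(const Node& node, const std::vector<Volume>& results) {
    const long long n = static_cast<long long>(results[0].size());
    Volume out(static_cast<std::size_t>(n), 0.0f);
    for (const SparseFilter& f : node.inputs) {
        const Volume& src = results[f.source];
        for (long long i = 0; i < n; ++i)
            for (const auto& tap : f.taps) {
                long long j = i + tap.first;
                if (j >= 0 && j < n)                 // skip taps falling outside the volume
                    out[i] += tap.second * src[j];
            }
    }
    return out;
}

// Evaluate the whole network node by node. results[0] is the input volume and every
// later node may reuse any earlier result, which is what makes the network cheaper
// than applying one dense filter per output.
std::vector<Volume> evalNetwork(const std::vector<Node>& nodes, const Volume& input) {
    std::vector<Volume> results;
    results.push_back(input);
    for (const Node& node : nodes)
        results.push_back(evalNode(node, results));
    return results;
}

In a net like the one in figure 3.7 the outputs of the final layer would correspond to the filter responses h1–h9.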


Chapter 4

Parallel Computing

In this chapter a short introduction to parallel computing is given, together with how it can be used to increase the performance of software. Parallelization can be implemented in a number of ways. The type of parallelization that will be used in the implementation of the filter network is called SIMD and uses Intel's SSE2 technology, so this concept will be described in more detail.

4.1 Introduction

An ordinary desktop computer can be modeled with the von Neumann model [9], which is illustrated in figure 4.1. Here a program makes up a stream of instructions that is carried out sequentially by the central processing unit (CPU).

Figure 4.1. The von Neumann model of a computer. A program defines a set of instructions that is to be carried out by the CPU.

It should be rather obvious that a computer according to this model can literally only do one thing at a time. This has motivated attempts to modify computers, and this model, in order to increase their performance, which has led to new machines called multicomputers and/or parallel computers. Depending on how the architecture of the von Neumann model is modified, the resulting types of machines can be classified by how the instruction and data streams are accessed and used [9].


• The von Neumann model in itself is what is called a SISD computer, Single Instruction stream Single Data stream. The computer has one CPU which does all the computations, and this CPU works on one data stream.

• Adding more CPUs to the von Neumann model yields a machine like the one in figure 4.2(a). This is what is called a SIMD computer, Single Instruction stream Multiple Data streams. Here the machine has several CPUs that execute the exact same instruction stream (they are doing the exact same thing) but on different data streams.

• Another way to modify the original model is to merge several von Neumann type machines together. This is what is called a MIMD computer, Multiple Instruction streams Multiple Data streams. Here every CPU has its own instruction stream (that is, they are not necessarily doing the same thing) and all processors work on their own data stream. However, if all processors execute the same program the machine behaves as a SIMD machine. A schematic of such a computer is found in figure 4.2(b).

• The observant reader will notice that, given how the abbreviation is modified, a fourth variant is possible: the MISD computer, Multiple Instruction streams Single Data stream. This type of machine is very rare and it is doubtful whether any has been commercially successful.

(a) Model of a SIMD computer (Single Instruction stream and Multiple Data streams).

(b) Model of a MIMD computer (Multiple Instruction streams and Multiple Data streams); in the figure the CPUs are joined by an interconnecting network.

Figure 4.2. Two different ways to create parallel computers by modifying the von Neumann model in figure 4.1.

In this study only SIMD parallelization will be used. The SIMD model used will however not exactly coincide with the one in figure 4.2(a), since that model has several physical CPUs; it will nevertheless coincide very well with what the abbreviation SIMD stands for. SIMD parallelization is very suitable in applications that consist of repetitive operations on a large array of data, for example in image processing, where a convolution is nothing other than the same weighted linear summation performed on every pixel in the picture (the same instruction stream) but using different neighborhoods (different data streams).

4.2 MMX/SSE/SSE2 Technology

With the introduction of the Intel Pentium II processor in 1997, Intel also introduced the MMX technology, which was created specifically for handling video, audio and graphical applications efficiently [24]. The MMX technology uses a SIMD-like execution to increase the performance of the processor. Later, in 1999, when the Pentium III was introduced, the MMX technology was extended by SSE. SSE was created to further enhance the performance of advanced imaging, 3D, streaming audio, video and speech recognition applications [5]. SSE has since then been extended as well, by SSE2. All three technologies work in a similar way, and since it is SSE2 that will be used, it is what will be described in more detail.

Since the implementation will run on an ordinary Pentium 4 desktop computer, no additional processors will be introduced. The Streaming SIMD Extensions (SSE) is a software model that can be used to achieve parallelization in software. This technology uses instructions that operate on a set of 128-bit floating point registers, the XMM registers. These registers can be loaded with anything that is 128 bits (16 bytes) long, see figure 4.3.

[Figure 4.3 here: a 128-bit XMM register (bits 0–127) loaded as 2 doubles, 4 floats or 16 bytes.]

Figure 4.3. Examples of how an XMM register can be loaded.

This means that different data sizes result in different amounts of speedup: with smaller data types more variables can be loaded into the registers, and operations are then performed on more variables at the same time. For the purpose here, the registers will be loaded with 4 floats.

The parallelization is achieved by letting multiple data elements be manipulated with a single instruction. Two examples of how the SSE2 functions work can be found in figure 4.4.
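To illustrate the principle, a minimal C++ sketch using the SSE intrinsics available through the SSE2 header is given below; it is not the thesis implementation. It accumulates a weighted copy of one array into another, four floats per instruction, and assumes the array length n is a multiple of four (otherwise a scalar loop would be needed for the remaining elements).

#include <emmintrin.h>   // SSE2 intrinsic header (also provides the single-precision SSE operations)

// dst[i] += w * src[i], processed four floats at a time in the XMM registers.
// This is exactly the kind of weighted summation that a convolution repeats for every filter tap.
void scaledAccumulate(float* dst, const float* src, float w, int n) {
    __m128 weight = _mm_set1_ps(w);                   // (w, w, w, w)
    for (int i = 0; i < n; i += 4) {
        __m128 s = _mm_loadu_ps(src + i);             // load 4 floats from src
        __m128 d = _mm_loadu_ps(dst + i);             // load 4 floats from dst
        d = _mm_add_ps(d, _mm_mul_ps(s, weight));     // 4 multiplications and 4 additions
        _mm_storeu_ps(dst + i, d);                    // store 4 floats back
    }
}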
