
Institutionen för systemteknik
Department of Electrical Engineering

Make It Flat: Detection and Correction of Planar Regions in Triangle Meshes

Master's thesis (Examensarbete) in Computer Vision, carried out at the Institute of Technology at Linköpings universitet, by Mikael Jonsson

LiTH-ISY-EX--16/4930--SE

Linköping 2016


Supervisors: Hannes Ovrén (ISY, Linköpings universitet) and Martin Svensson (Spotscale AB)
Examiner: Per-Erik Forssén (ISY, Linköpings universitet)


Division, Department: Computer Vision Laboratory, Department of Electrical Engineering, SE-581 83 Linköping
Date: 2016-03-22
Language: English
Report category: Master's thesis (Examensarbete)
URL for electronic version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-126589
ISRN: LiTH-ISY-EX--16/4930--SE
Title (Swedish): Detektion och tillrättning av plana ytor i 3D-modeller
Title: Make It Flat: Detection and Correction of Planar Regions in Triangle Meshes
Author: Mikael Jonsson



Abstract

The art of reconstructing a real-world scene digitally has been on the mind of researchers for decades. Recently, it has attracted more and more attention from companies seeing a chance to bring this kind of technology to the market. Digital reconstruction of buildings in particular is a niche that has both potential and room for improvement. With this background, this thesis will present the design and evaluation of a pipeline made to find and correct approximately flat surfaces in architectural scenes. The scenes are 3D-reconstructed triangle meshes based on RGB images. The thesis will also comprise an evaluation of a few different components available for doing this, leading to a choice of best components. The goal is to improve the visual quality of the reconstruction.

The final pipeline is designed with two blocks – one to detect initial plane seeds and one to refine the detected planes. The first block makes use of a multi-label energy formulation on the graph that describes the reconstructed surface. Penalties are assigned to each vertex and each edge of the graph based on the vertex labels, effectively describing a Markov Random Field. The energy is minimized with the help of the α-expansion algorithm. The second block uses heuristics for growing the detected plane seeds, merging similar planes together and extracting deviating details.

Results on several scenes are presented, showing that the visual quality has been improved while maintaining accuracy compared with ground truth data.


Acknowledgments

First, for being there and for all discussions big and small, I thank my friends – both those that know the dread of calculating triple integrals in spherical coordinates through variable substitution and those that have no idea what that is. Second, all the help from colleagues at Spotscale and people at Linköpings universitet has been great, in particular the commitment from my examiner Per-Erik Forssén.

I would also like to give thanks to my family for the love and support through my years of education.

Finally, special thanks to the Swedish state for sponsoring me all this time. I promise you will get your money back eventually.

Linköping, March 2016 Mikael Jonsson


Contents

Notation

1 Introduction
   1.1 Motivation
   1.2 Goal
   1.3 Related Work
   1.4 Delimitations
   1.5 Thesis Structure

2 Background Theory
   2.1 Surface Representation: Triangle Mesh
   2.2 Image-based 3D Reconstruction Pipeline
      2.2.1 Image Acquisition
      2.2.2 Structure from Motion
      2.2.3 Multi-View Stereo
      2.2.4 Mesh Surface Generation
   2.3 Principal Curvature and Directions
      2.3.1 Curvature of Curve in 2D
      2.3.2 On 3D Smooth Surfaces
      2.3.3 On 3D Discrete Surfaces

3 Pipeline Components
   3.1 Plane Detection: Curvature Segmentation
      3.1.1 Multi-label energy model
      3.1.2 Finding connected components
      3.1.3 Fitting planes to clusters
      3.1.4 Hierarchical approach
   3.2 Plane Refinement
      3.2.1 Photo-consistency refinement
      3.2.2 Growing, merging and dividing

4 Method
   4.1 Tools
   4.2 Input and Output Data
   4.3 Pipeline Component Evaluation
   4.4 Final Pipeline Evaluation
      4.4.1 Visual inspection
      4.4.2 Surface comparison with ground truth

5 Pipeline Component Selection
   5.1 Component Evaluation/Comparison
      5.1.1 Photo-consistency refinement
      5.1.2 Separation method comparison
   5.2 Final Pipeline

6 Results
   6.1 Pipeline Walk-through
      6.1.1 Local curvature property labels
      6.1.2 Finding connected components
      6.1.3 Hierarchical approach
      6.1.4 Curvature segmentation results
      6.1.5 Conditional region growing
      6.1.6 Merging of similar planes
      6.1.7 Extracting deviating regions
      6.1.8 Mesh flattening
   6.2 Quantitative Results

7 Discussion
   7.1 Results
      7.1.1 Curvature segmentation
      7.1.2 Plane refinement
      7.1.3 Evaluation
   7.2 Method
      7.2.1 Curvature segmentation
      7.2.2 Plane refinement
      7.2.3 Photo-consistency refinement
      7.2.4 Evaluation
      7.2.5 General comments and comparison
      7.2.6 Source criticism
   7.3 Ethical and Societal Aspects

8 Conclusions
   8.1 Summary
   8.2 Problem Formulation Conclusions
      8.2.1 Well-delimited planes
      8.2.2 Unchanged accuracy by flattening
      8.2.3 Improved visual quality
      8.2.4 Different levels of simplification

A Complementary result images
   A.1 Vreta Church
   A.2 Vasallen
   A.3 Herz-Jesu-P8
   A.4 Container


Notation

Abbreviations

CPU: Central Processing Unit
GPGPU: General-Purpose computing on Graphics Processing Units
GT: Ground Truth
KDE: Kernel Density Estimation
LIDAR: LIght Detection And Ranging
MVS: Multi-View Stereo
NCC: Normalized Cross Correlation
RANSAC: RANdom SAmple Consensus
RGB: Red, Green, Blue
RMS: Root Mean Square
SDF: Signed Distance Function
SLAM: Simultaneous Localization And Mapping
STD: STandard Deviation


1 Introduction

This master's thesis presents a method for detecting approximately flat surfaces in 3D meshes and replacing them with planes. The method is designed with architectural scenes in focus but can also be applied to other types of scenes containing flat regions. The method is applied as a post-processing step to a completed 3D mesh, which makes it easy to add at the end of a 3D reconstruction pipeline.

This first chapter presents the thesis motivation, the goal of the thesis, a review of related work in the field, and delimitations in method and data.

1.1 Motivation

The art of 3D reconstruction has been a hot topic for many years now. Techniques have matured quite well and have even begun to specialize toward commercial and scientific interests. One of these specializations is architectural/urban scene reconstruction and understanding. The question asked is: how can we take advantage of the specific characteristics of these environments?

When producing 3D reconstructions of buildings based on structure from motion (SfM), mostly the same algorithms are used as when performing SfM on other types of scenes (e.g. indoor and turn-table). We are thus not utilizing what is particular about building façades to improve the reconstruction, the most apparent characteristic being that large parts of the scanned surfaces can be assumed to be flat.

A gain from finding planes would be to get rid of uneven surfaces. This does not always pose a problem in textured meshes, but it would definitely be an advantage when applying richer lighting to 3D scenes, which does not look good on an uneven mesh, see figure 1.1. Furthermore, finding windows would allow us to set these surfaces as reflective rather than applying a diffuse texture to them. Triangle count and mesh complexity could also be reduced by finding planes.

Figure 1.1: An uneven mesh surface.

These are all important motivations for Spotscale, a company dealing with 3D reconstruction of architectural scenes. This thesis is conducted under their supervision, meant as an addition to already existing functionality.

1.2 Goal

The goal of this thesis is to construct a system that detects surface regions in a mesh and replaces them with planes. This shall be done to improve visual quality while not significantly perturbing the original. The mesh shall be obtained from 3D reconstruction of RGB images. The detection and replacement are to be implemented as a pipeline in two steps, see figure 1.2. The first block shall provide initial plane seeds that are not required to cover complete surfaces. The second block shall improve the limits of the plane seeds, particularly being able to extend a plane to replace as large a surface as possible.

Figure 1.2: Overview of the pipeline with processing blocks (rectangles), input/output (diamonds) and partial result (dotted diamond). The reconstructed 3D mesh enters the plane detection block, which passes initial plane seeds to the plane refinement block, which outputs the flattened 3D mesh.

We express the goal of the thesis by posing the following questions:

• Can well-delimited, approximately planar surfaces be found in meshes acquired through 3D reconstruction?

• Is it possible to replace these surfaces with planes while maintaining reconstruction accuracy?

• Will the overall visual quality improve after replacing surfaces with planes?

• Can detection and replacement of planar surfaces be adapted to different levels of simplification?

The last question means it should be possible to adapt the level of allowed perturbation as a trade-off between faithfulness to the original and simplification of the representation.

The evaluation of this thesis is two-fold. The first part consists of selecting the pipeline components that best suit the questions posed, which is performed by reviewing methods theoretically and experimentally. The second part consists of evaluating how the reconstruction accuracy changes between before and after plane substitution, through visual inspection and with an error measure.

1.3 Related Work

To be able to reach the goal of the thesis, a survey of related work is presented; some of these works will provide valuable algorithms or ideas for the pipeline creation. There is much recent research on the subject of urban reconstruction. Some publications use only 2D images as input [21, 14]. Others use other input data, such as RGB-D imagery [28, 13] or lidar data [17]. There is also a great deal of research that uses aerial imagery alone or combines it with other data [30]. A very recent publication [20] does semantic segmentation completely in the 3D domain, allegedly mainly to gain speed-up compared to using both 2D and 3D information. All of the above either use data that is not interesting for this thesis, or focus more on simplification than is desired.

Another, slightly different, line of research does not explicitly try to find semantic meaning but rather basic geometric shapes (also known as primitives), thus finding meaning implicitly by extracting planes, cylinders, spheres and other shapes, which could represent windows, walls, columns, parts of fences, domes, etc. Schnabel et al. [24] present an efficient way of finding shapes in point clouds, but there everything is substituted by primitives rather than leaving some details unchanged; in our work it is desired to keep details intact. Lafarge et al. [16] handle both primitives and mesh patches. Their method is explicitly aimed at urban scenes, using cues like symmetry and alignment for shape improvement, and works by analyzing and manipulating a mesh based on a sparse or semi-dense 3D reconstruction. Their results seem to fulfill the goal of this thesis, but their method is also a multi-view stereo algorithm. Our work aims to add components as post-processing after a complete 3D reconstruction pipeline.

There are several interesting papers related to finding piece-wise planar surfaces in reconstructed 3D meshes. In Sinha et al. [25] there is no discrimination between what is appropriate to approximate as a plane and what is not; all surfaces are assigned a plane, which is too much simplification for our goal. In Gallup et al. [12] a measure based on a trained classifier is used to decide whether a region is planar or not. The main source of information is however depth maps from different views, which is not interesting as input in the work of this thesis. Furukawa et al. [11] introduce the Manhattan-world assumption, where three dominant axes are estimated and plane hypotheses are generated in these directions. These ideas are inspiring, but the three-dominant-axes assumption is too strict to be useful in this thesis.

1.4 Delimitations

The duration of this thesis work is limited and therefore some delimitations have to be made.

Only plane primitives

Some of the references strive to find not only planes but also other primitives. In this thesis, no effort will be made to find other types of shapes; only planar and non-planar parts will be distinguished.

Used types of data

There are several types of data being produced in the different stages of a 3D reconstruction pipeline, such as raw point clouds and depth maps. In this thesis we have confined the methods to use 3D mesh geometry and topology as well as raw images.

Evaluation data

Even though the system should be applicable to any type of scene that contains planar surfaces, the evaluation meshes will all be of architectural character.

Limited number of component candidates

Part of this work will consist in selecting suitable pipeline components. Many exist in the literature, but only a few will be chosen and reviewed as candidates in this thesis.

1.5 Thesis Structure

The layout structure of the rest of the thesis is as follows.

Chapter 2 presents theoretical background necessary for the understanding of this work.

Chapter 3 presents the components to be evaluated for use in the pipeline.

Chapter 4 gives some practical implementation details, presents the used data sets and explains the evaluation methodology.

The results of the pipeline component evaluation and selection are found in chapter 5, followed by the quality evaluation in chapter 6.

Chapter 7 and chapter 8 present the results and methodology discussion, and the conclusions, respectively.

2 Background Theory

This chapter presents the theoretical background of the components used in this thesis work. Section 2.1 and section 2.2 present definitions and explanations of the data that is to be handled by the proposed system. Section 2.3 introduces the curvature concept, which is used in the plane detection method of section 3.1.1.

2.1 Surface Representation: Triangle Mesh

There is a variety of possibilities for digitally representing a three-dimensional object. It is common to distinguish between two major classes: the implicit representation and the parametric. The implicit representations most notably include the signed distance function, SDF, which allows querying the distance from any point in space to the closest point on the surface of the object. The most common among parametric representations is the triangle mesh, but there also exist other polygonal varieties, such as a mesh made up of quadrilaterals. The triangle mesh is the usual choice due to its simplicity, as it describes a piece-wise linear surface of differentiability class $C^0$ [3, p. 6]. It is also the representation used in this thesis.

Definition 2.1 (Triangle Mesh). A triangle mesh is a set of vertices $V = \{v_i\}$, a set of edges $E = \{e_i\}$ pair-wise connecting these vertices, $e_i = (v_k, v_l)$, and a set of (triangle) faces $F = \{f_i\}$ delimited by three vertices and three corresponding edges, $f_i = (v_k, v_l, v_m)$. A mesh $M$, also called a model, is the tuple of its components $M = (V, E, F)$.

The number of vertices is $N_V$, such that $\{v_i\} = v_1 \dots v_{N_V}$. Likewise, the number of edges is $N_E$ and the number of faces is $N_F$.
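As a concrete illustration of definition 2.1, a minimal index-based version of the tuple $M = (V, E, F)$ could look as follows. This is only a sketch for clarity, not the data structure actually used in this work (the implementation relies on the VCG library, see chapter 4).

```cpp
#include <array>
#include <vector>

// A minimal sketch of the triangle mesh tuple M = (V, E, F) from definition 2.1.
// Edges and faces are stored as indices into the vertex array.
struct TriangleMesh {
    std::vector<std::array<double, 3>> vertices; // V: 3D positions v_i
    std::vector<std::array<int, 2>>    edges;    // E: e_i = (v_k, v_l)
    std::vector<std::array<int, 3>>    faces;    // F: f_i = (v_k, v_l, v_m)
};
```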


A mesh may have attributes related to the components presented in definition 2.1, such as face normals. The normal of a face is defined as the cross product of two vectors going from a vertex $v_1$ to vertices $v_2$ and $v_3$, all corner points of the face. The order chosen for the cross product is such that the direction of the resulting vector is out of the surface. See figure 2.1 for an illustration of a mesh with normals.
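The face normal convention just described can be sketched as below, assuming the face vertices are ordered counter-clockwise when seen from outside the surface, so that the cross product points out of it (Eigen is used here for the vector algebra; it is not mentioned in the thesis itself).

```cpp
#include <Eigen/Dense>

// Normal of the face (v1, v2, v3): the cross product of the two edge vectors
// from v1, normalized. With counter-clockwise vertex order seen from outside,
// the result points out of the surface.
Eigen::Vector3d faceNormal(const Eigen::Vector3d& v1,
                           const Eigen::Vector3d& v2,
                           const Eigen::Vector3d& v3) {
    return (v2 - v1).cross(v3 - v1).normalized();
}
```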

Figure 2.1: Mesh components residing in a 3-dimensional space: (a) four vertices; (b) five edges connecting the vertices in (a); (c) two faces delimited by the vertices in (a) and edges in (b); (d) normals added to the faces in (c). The lower right image is a mesh consisting of two faces, five edges and four vertices.

In this thesis, the terms neighboring and adjacent are used in relation to vertices, faces and planes. For two vertices, a neighbor or adjacency relation means there is an edge connecting the two. For faces, it means the two faces share an edge. For planes, it means the two planes share an edge or a vertex.

The term non-degenerate surface is used to describe that there are neither edges with more than two incident faces, nor vertices that locally form infinitely thin structures. See figure 2.2 for an illustration. Another term used is isotropic as opposed to anisotropic meshes. The former means that the elements are locally uniform, here equilateral triangles. The latter means that triangle shapes are irregular, often adapted to the surface.


Figure 2.2: Degenerate surface details. An edge with three incident faces and a vertex that forms an infinitely thin structure.

2.2 Image-based 3D Reconstruction Pipeline

The 3D reconstruction pipeline upon which this project has been implemented is a multiple-image passive reconstruction. The input is a set of unordered images of the environment to be modeled and the output is a digital representation of the environment in the form of a textured triangle mesh. Here follows a brief review of reconstruction methods.

Note that when the expression 3D reconstructed or 3D reconstruction appears, it refers to the type of multiple-image passive reconstruction described here, unless otherwise stated.

2.2.1 Image Acquisition

When performing image-based 3D reconstruction, a large set of images is usually used to obtain high-quality results, both regarding mesh quality (resolution and accuracy) and texturing quality. There are different methods to obtain the set of images. Some large-scale reconstruction efforts, e.g. Frahm et al. [10], use publicly available photos usually taken by laymen from the ground on different occasions. When dealing with a smaller urban environment, an unmanned aerial vehicle equipped with a camera with known camera parameters is a useful tool.

There might or might not be a system to filter the input images to a smaller set of more relevant images. Decimating the image set improves processing speed and might improve quality.

2.2.2 Structure from Motion

From the image set, the goal is to find points in pairs of images that correspond to the same point in 3D space. Then it is possible to triangulate the position of these points while simultaneously positioning the cameras in the same 3D space. The point correspondence problem is solved with the help of feature matching, such as the popular SIFT [18], LK-tracking [19] or AKAZE [1], recently implemented in the OpenCV [5] library.

The point correspondences are then used as input to a simultaneous localization and mapping, SLAM, procedure for the 3D points and the camera positions. This normally happens after subjecting the correspondences to an outlier rejection algorithm, such as RANSAC [9].

This step outputs a number of 3D points and camera positions with orientation, normally referred to as a sparse point cloud and camera poses.

The point cloud from the previous step is, as mentioned, often sparse, which means that 3D points have only been triangulated where salient features exist in the input images. Since the ultimate goal of the 3D reconstruction pipeline is to obtain a 3D mesh triangulated from the point cloud, the point cloud has to be made more dense.

2.2.3 Multi-View Stereo

At this stage the camera poses are considered to be known. This allows for simplifications in the process of matching each point in an image to a point in 3D space. The issue of finding point correspondences breaks down to a one-dimensional search problem due to the epipolar geometry constraints that can be applied, see figure 2.3.

Figure 2.3: Epipolar lines illustrated in two views. A specific point in the right view must be represented by a point on the epipolar line in the left view.

Now that matching a point in one image against other images is a 1D search problem for each image pair, there is still a need for a criterion for evaluating different points on the line. For this, there are a number of photo-consistency measures that evaluate how well points or groups of points (typically some sort of window) correspond. A straightforward approach is to use the sum of squared differences, SSD, for two sets of pixels. Another choice, which is invariant to bias and gain (unlike SSD) and has been very popular, is normalized cross correlation, NCC, see definition 2.2.


Definition 2.2 (Normalized Cross Correlation). For two sets of points $f$ and $g$ in two different views, the normalized cross correlation is defined as

$$\text{ncc}(f, g) = \frac{(f - \mu_f) \cdot (g - \mu_g)}{\sigma_f \sigma_g} \tag{2.1}$$

with $\mu_x$ being the mean of the set $x$ (repeated in a set of the same size as $x$) and $\sigma_x$ being the standard deviation of the set $x$.
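A minimal sketch of the measure in definition 2.2 follows, assuming the two patches are given as equally sized vectors of intensity values. The division by the two standard deviations is folded into a single square root, which normalizes the result to [-1, 1] (equivalent to the definition up to a constant factor).

```cpp
#include <cmath>
#include <vector>

// NCC of two equally sized sample sets f and g (definition 2.2):
// (f - mu_f) . (g - mu_g) / (sigma_f * sigma_g), normalized to [-1, 1].
double ncc(const std::vector<double>& f, const std::vector<double>& g) {
    const std::size_t n = f.size(); // assumed equal to g.size()
    double muF = 0.0, muG = 0.0;
    for (std::size_t i = 0; i < n; ++i) { muF += f[i]; muG += g[i]; }
    muF /= n; muG /= n;
    double dot = 0.0, varF = 0.0, varG = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        dot  += (f[i] - muF) * (g[i] - muG);
        varF += (f[i] - muF) * (f[i] - muF);
        varG += (g[i] - muG) * (g[i] - muG);
    }
    return dot / std::sqrt(varF * varG);
}
```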

With a method to find a large set of 3D points and stereo correspondences between all the provided images, we arrive at a dense point cloud. The next step is to connect the points and create a surface from them.

2.2.4 Mesh Surface Generation

In order to understand how to generate a surface, we come back to the choice of representation. Going with the explicit representation, one method is very straightforward: for each point, its closest neighbors (based on Euclidean distance) are noted and polygons delimited by the points are created. This gives an exact interpolation of the input points, which means their positions have to be highly accurate. Holes as well as degenerate situations in the mesh may follow if that is not the case.

The most common choice in today's literature is reconstruction to an implicit representation. A way of describing this is to find the zero isosurface of the SDF of the supposed surface to reconstruct. Finding it includes examining the local neighborhood and approximating tangent planes, followed by a step to determine normal orientation. When an SDF has been calculated, it is possible to query any point's distance in space to the surface and whether it is inside or outside of it. Note that even though this reconstruction process leads to an implicit representation of the mesh, efficient conversion methods exist to obtain an explicit representation. Different such methods may produce meshes with different qualities. Emphasis can among other things be on:

• making the mesh as isotropic as possible rather than anisotropic,

• having uniform triangle size and density rather than having smaller and more triangles in detailed parts,

• preferring a smoother mesh rather than a detail-preserving one.

2.3 Principal Curvature and Directions

There are many interesting properties of two-dimensional surfaces embedded in 3D, the most relevant for this thesis being curvature. This concept is presented first for plane curves, then on smooth 3D surfaces and lastly extended to discrete surfaces.


2.3.1 Curvature of Curve in 2D

A plane curve is a curve in a plane. Each such curve has a scalar curvature value, which, put simply, is how much it deviates from a straight line. More precisely, if the curve is parametrized by the variable $s$, a unit tangent vector $T$ can be found for each point on the curve, represented by $T(s)$, illustrated in figure 2.4. The curvature then depends on the rate of change of this tangent vector as a function of $s$:

$$\kappa = \left\| \frac{dT}{ds} \right\| \tag{2.2}$$

Figure 2.4: A curve parametrized by $s$. The normal and tangent vector for a certain choice of $s$ are shown.

2.3.2 On 3D Smooth Surfaces

The normal plane of a point in a tangent direction $\vec{t}$ on a two-dimensional surface embedded in $\mathbb{R}^3$ is the plane defined by the normal and the tangent vector, which are always orthogonal to each other. Intersecting the plane with the surface will locally produce a plane curve with normal curvature $\kappa_n(\vec{t})$ at that point. For a fixed point on the surface, the normal will be the same but the plane curve will be different based on the chosen tangent vector. Figure 2.5 shows an illustration of a normal plane.

Definition 2.3 (Principal Curvatures). The principal curvatures $\kappa_1$ and $\kappa_2$ of a point on the surface are the maximum and minimum curvature values, respectively, of all the plane curves obtained by varying the tangent vector around the normal. They are also known as $\kappa_1 = \kappa_{max}$ and $\kappa_2 = \kappa_{min}$.

Definition 2.4 (Principal Directions). The principal directions $\vec{e}_1$ and $\vec{e}_2$ are the tangent vectors giving rise to the principal curvatures $\kappa_1$ and $\kappa_2$, respectively. They are also known as $\vec{e}_1 = \vec{e}_{max}$ and $\vec{e}_2 = \vec{e}_{min}$.

Figure 2.5: A normal plane on a smooth surface. The normal $n$ and one tangent direction $T$ are displayed.

A curvature value of zero indicates flatness in that direction. The principal directions are always orthogonal to the normal vector and, when $\kappa_1 \neq \kappa_2$, the principal directions are also orthogonal to each other. This can be explained with the Euler theorem that relates the normal curvature to the principal curvatures.

2.3.3 On 3D Discrete Surfaces

As described in section 2.1, the surface representation is a piece-wise planar mesh. Speaking in terms of differential geometry on smooth surfaces, there exists no meaningful curvature for either the faces of a triangle mesh (since they are planar) or the vertices of the mesh (since the gradient of the surface there is undefined due to the $C^0$ nature of the surface). Therefore, there is a need for a way to extend the notion of principal curvature and directions to this sort of representation.

One of the first well-used algorithms for discrete principal curvature was developed by Taubin [27]. It would be too extensive to explain it in detail here, but an important property is the following: for each vertex, it uses the vertex normal and the closest vertex neighbors to calculate principal curvature and directions. The curvature is the result of averaging over the neighbors and thus becomes worse the less regular (the more they deviate from equilateral) the triangles are, since the algorithm does not take triangle shape into consideration, only connectivity. Later algorithms have improved the handling of triangle shape. For this thesis, a method based on Douros and Buxton [8] has been chosen, which locally approximates the surface with an analytic representation (a quadric surface patch) at each vertex. Differential properties are then calculated on this patch and assigned to the vertex.
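To make the quadric-patch idea concrete, the sketch below estimates the principal curvatures at a vertex by fitting the patch z = ax² + bxy + cy² to neighbor positions expressed in the vertex's local tangent frame (x, y in the tangent plane, z along the normal). This is a common textbook construction under the stated assumptions, not the exact algorithm of Douros and Buxton [8]; Eigen is assumed for the linear algebra, and building the frame and collecting neighbors are omitted.

```cpp
#include <Eigen/Dense>
#include <vector>

// Fit z = a*x^2 + b*x*y + c*y^2 to neighbors given in the vertex's tangent
// frame, then read the principal curvatures off the second fundamental form.
void principalCurvatures(const std::vector<Eigen::Vector3d>& localPts,
                         double& kMin, double& kMax) {
    const int n = static_cast<int>(localPts.size());
    Eigen::MatrixXd A(n, 3);
    Eigen::VectorXd z(n);
    for (int i = 0; i < n; ++i) {
        const Eigen::Vector3d& p = localPts[i];
        A.row(i) << p.x() * p.x(), p.x() * p.y(), p.y() * p.y();
        z(i) = p.z();
    }
    // Least-squares coefficients (a, b, c) of the quadric patch.
    const Eigen::Vector3d abc = A.colPivHouseholderQr().solve(z);
    // At the frame origin the first fundamental form is the identity, so the
    // principal curvatures are the eigenvalues of the Hessian [[2a, b], [b, 2c]].
    Eigen::Matrix2d S;
    S << 2.0 * abc(0), abc(1), abc(1), 2.0 * abc(2);
    Eigen::SelfAdjointEigenSolver<Eigen::Matrix2d> es(S);
    kMin = es.eigenvalues()(0);
    kMax = es.eigenvalues()(1);
}
```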


3 Pipeline Components

Here, all the pipeline components are presented that are to be implemented, tested and compared as candidate steps for the two blocks presented in figure 1.2. For the plane detection block, only one method is presented: the curvature segmentation, section 3.1; it is thus not compared to any other in a later chapter. For the plane refinement block, two methods are presented: photo-consistency refinement, section 3.2.1, and plane growing, merging and dividing, section 3.2.2. For the latter, different methods for performing the dividing step are presented here and compared in a later chapter.

A term used in the following is the plane projection error for vertices. This means the shortest Euclidean absolute distance between the vertex's 3D position and the plane.

3.1 Plane Detection: Curvature Segmentation

This section presents the chosen plane detection algorithm. It is a curvature segmentation algorithm based on a multi-label energy model (section 3.1.1) with complementary functions. The name curvature segmentation comes from the fact that it uses local curvature properties to perform a segmentation of the mesh into regions. Planar regions are then singled out and planes are fitted to these regions. The output of this block is a set of initial plane seeds.

3.1.1 Multi-label energy model

With principal curvature values (see section 2.3) for each vertex on a mesh, it is possible to assign a label describing a local surface property to each vertex.

For the sake of simplifying a mesh, four different labels can be considered: planar, developable convex, developable concave and non-developable. This section is an adaptation of Lafarge et al. [15].

A developable surface is a surface that can be unfolded to a plane without any distortion. In terms of curvature, it is where at least one of the principal curvatures is zero. The first three labels are thus specializations of developable. Furthermore:

• Planar: $\kappa_{min} = \kappa_{max} = 0$

• Developable convex: $\kappa_{min} = 0 < \kappa_{max}$

• Developable concave: $\kappa_{min} < \kappa_{max} = 0$

• Non-developable: $\kappa_{min} \cdot \kappa_{max} \neq 0$

Any mesh produced through structure from motion at today's quality will contain a noticeable amount of noise. The curvature values will never be perfectly zero and a per-vertex assignment of the labels will not be especially consistent. There is a need to estimate the local properties and to regularize the labels.

Local estimation of the geometry

Rather than labeling according to where the principal curvature values are zero, the probability of a label at a certain vertex is instead considered. The two curvature values act as input to a function that maps to one of the four labels, based on the closeness to zero of the curvature values. If both are perfectly zero, the probability of a planar label should equal one while the probability of a non-developable label should be zero. Moving either or both of the input variables away from zero, in the positive or the negative direction, should lower the probability monotonically towards zero for the planar label, and inversely for the non-developable one. Analogous reasoning applies for the two other labels.

A choice of function that fits these requirements is the non-normalized zero-centered Gaussian, used in different combinations for the different labels:

$$G(\kappa, \sigma) = e^{-\kappa^2/(2\sigma^2)} \tag{3.1}$$

Notice that the parameter σ has been introduced here, which basically is a choice of scaling. The probabilities for the different labels are:

$$Pr(l \mid \kappa_{min}, \kappa_{max}) =
\begin{cases}
G(\kappa_{min})\,G(\kappa_{max}) & \text{if } l = \text{planar}, \\
G(\kappa_{min})\,(1 - G(\kappa_{max})) & \text{if } l = \text{developable convex}, \\
(1 - G(\kappa_{min}))\,G(\kappa_{max}) & \text{if } l = \text{developable concave}, \\
(1 - G(\kappa_{min}))\,(1 - G(\kappa_{max})) & \text{if } l = \text{non-developable}.
\end{cases} \tag{3.2}$$

Note that these four label probabilities sum to one.

An initial labeling choice can be made based on this local information by choosing the label that gives the highest probability value for each single vertex. This gives a decent initial guess but is in general very noisy.
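A small sketch of equations (3.1) and (3.2) and the greedy per-vertex guess is given below, assuming the principal curvatures are already available for the vertex (the names are illustrative, not taken from the actual implementation).

```cpp
#include <array>
#include <cmath>
#include <cstddef>

enum Label { Planar, DevConvex, DevConcave, NonDev };

// Non-normalized zero-centered Gaussian, equation (3.1).
double G(double kappa, double sigma) {
    return std::exp(-kappa * kappa / (2.0 * sigma * sigma));
}

// Equation (3.2): probabilities of the four labels given the curvatures.
std::array<double, 4> labelProbabilities(double kMin, double kMax, double sigma) {
    const double gMin = G(kMin, sigma), gMax = G(kMax, sigma);
    return { gMin * gMax,                   // planar
             gMin * (1.0 - gMax),           // developable convex
             (1.0 - gMin) * gMax,           // developable concave
             (1.0 - gMin) * (1.0 - gMax) }; // non-developable
}

// Initial per-vertex guess: the most probable label (noisy without the MRF).
Label initialLabel(double kMin, double kMax, double sigma) {
    const std::array<double, 4> p = labelProbabilities(kMin, kMax, sigma);
    std::size_t best = 0;
    for (std::size_t l = 1; l < p.size(); ++l)
        if (p[l] > p[best]) best = l;
    return static_cast<Label>(best);
}
```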

Markov Random Field formulation

In order to view the problem of finding the most suitable label for all vertices as a global one, a Markov Random Field (MRF) formulation is used. This means the choice of label at each vertex is regarded as a stochastic variable and the mesh is viewed as an undirected graph, with mesh edges between vertices being the connections between stochastic variables.

The MRF is formulated such that each vertex and each edge contributes a value to an energy that is to be minimized with respect to the choice of label set $l_1^{N_V} = (l_1, \dots, l_{N_V})$:

$$U(l_1^{N_V}) = \sum_{i \in V} D_i(l_i) + \beta \sum_{\{i,j\} \in E} V_{i,j}(l_i, l_j) \tag{3.3}$$

The energy is the sum of two kinds of terms, with a weighting constant $\beta$ that decides which kind to emphasize.

The consistency term $D_i(l_i)$ is the part that penalizes a label that does not comply well with the curvature values. It can be chosen as

$$D_i(l_i) = 1 - Pr(l_i \mid \kappa_{min}^{(i)}, \kappa_{max}^{(i)}). \tag{3.4}$$

The pairwise topological smoothness constraint $V_{i,j}(l_i, l_j)$ favors coherence of labels across the graph. This term, like the consistency term, depends on the principal curvatures of the two concerned vertices, but also on the principal directions. If two neighboring vertices have different labels, the penalty equals one. If the label is the same, a value is calculated:

$$V_{i,j}(l_i, l_j) =
\begin{cases}
1 & \text{if } l_i \neq l_j, \\
\min(1,\, a\,\lVert \vec{W}_i - \vec{W}_j \rVert^2) & \text{if } l_i = l_j.
\end{cases} \tag{3.5}$$

The parameter $a$ is a positive constant, based on the mean edge length in the mesh. It decides how hard to penalize the difference between $\vec{W}_i$ and $\vec{W}_j$. The vectors $\vec{W}_i$ and $\vec{W}_j$ have size $6 \times 1$ and stack the products of the principal curvatures and directions:

$$\vec{W} = \begin{pmatrix} \kappa_{min}\,\vec{e}_{min} \\ \kappa_{max}\,\vec{e}_{max} \end{pmatrix} \tag{3.6}$$

with variables as defined in definition 2.3 and definition 2.4.


Minimizing the energy

The energy function (3.3) requires a non-convex optimization technique to minimize. Luckily, the MRF formulation is not uncommon in the field of computer vision today and many studies have been made regarding optimization on graphs in general. We use the α-expansion algorithm [4] to minimize the presented function.

3.1.2 Finding connected components

The output of the energy minimization procedure is a set of labels, one corresponding to each vertex of the mesh. Neighboring vertices with the same label form clusters. These clusters are gathered by using individual vertices as seed points and region growing based on label coherence.
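This gathering step can be sketched as breadth-first region growing over the vertex adjacency; the adjacency list and label vector are assumed given, with one entry per vertex.

```cpp
#include <queue>
#include <vector>

// Group neighboring vertices that share a label into connected components.
// adj[v] lists the vertices connected to v by an edge; returns a component
// id per vertex.
std::vector<int> connectedComponents(const std::vector<std::vector<int>>& adj,
                                     const std::vector<int>& label) {
    std::vector<int> comp(label.size(), -1);
    int nextComp = 0;
    for (int seed = 0; seed < static_cast<int>(label.size()); ++seed) {
        if (comp[seed] != -1) continue; // already part of a cluster
        comp[seed] = nextComp;
        std::queue<int> q;
        q.push(seed);
        while (!q.empty()) { // grow while the label stays coherent
            const int v = q.front(); q.pop();
            for (int u : adj[v])
                if (comp[u] == -1 && label[u] == label[v]) {
                    comp[u] = nextComp;
                    q.push(u);
                }
        }
        ++nextComp;
    }
    return comp;
}
```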

3.1.3 Fitting planes to clusters

Clusters with non-planar labels are discarded. The remaining clusters thus have the planar label, and a plane estimation step follows. A single plane is, however, not necessarily appropriate to fit to a given set of vertices. This is because, at different levels of scaling (see the σ parameter in section 3.1.1), an edge that seems apparent to the human eye can be labeled as planar if the angle is not sharp enough. See figure 3.1 for an illustration of the effect of different choices of σ. Single-element clusters are discarded; in a mesh, the same applies to clusters with three vertices or fewer.

In the illustration, the highest σ level would give rise to one plane where several distinct planes would be more appropriate. Because of this, a plane is accepted only if the mean vertex-to-plane projection error is below a certain threshold. The threshold is chosen as a fraction of the mean edge length of the mesh. The plane estimation is performed by calculating a covariance matrix for the vertices and performing an eigenvalue decomposition. The eigenvector corresponding to the smallest eigenvalue is used as the plane normal.
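The covariance/eigenvalue plane fit can be sketched as follows, again using Eigen as an assumed linear algebra backend; the acceptance test on the mean projection error is done outside this function.

```cpp
#include <Eigen/Dense>
#include <vector>

struct Plane {
    Eigen::Vector3d normal; // unit normal of the fitted plane
    Eigen::Vector3d point;  // a point on the plane (the centroid)
};

// Fit a plane to a vertex cluster: the eigenvector of the covariance matrix
// with the smallest eigenvalue is the direction of least variance, i.e. the
// plane normal.
Plane fitPlane(const std::vector<Eigen::Vector3d>& pts) {
    Eigen::Vector3d centroid = Eigen::Vector3d::Zero();
    for (const Eigen::Vector3d& p : pts) centroid += p;
    centroid /= static_cast<double>(pts.size());

    Eigen::Matrix3d cov = Eigen::Matrix3d::Zero();
    for (const Eigen::Vector3d& p : pts) {
        const Eigen::Vector3d d = p - centroid;
        cov += d * d.transpose();
    }
    // Eigenvalues of a self-adjoint matrix come out in increasing order.
    Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> es(cov);
    return { es.eigenvectors().col(0), centroid };
}
```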

3.1.4 Hierarchical approach

It is unlikely that all planar surfaces are found with only one choice of the scaling factor σ in the multi-label energy model. In order to find planes both big (such as building façades) and small (window and chimney sides), it is necessary to rerun the multi-label energy model, connected components and plane fitting steps with different choices of this parameter. This is also motivated by the fact that meshes can come in different resolutions.

For each σ iteration level, accepted estimated planes are kept and excluded from the processing on the next scaling level. After a fixed number of iterations, the plane sets from the different levels are all sent to the next processing block. The other two parameters, β and a, are fixed during processing.


Figure 3.1: Illustration of curvature segmentation for different levels of σ, going from highest (top) to lowest (bottom). Blue means a planar label, red any non-planar label. Boxes indicate planar clusters resulting from finding connected components.


3.2 Plane Refinement

Two methods for plane refinement are presented here. The first one is photo-consistency refinement, which uses geometric and topological attributes as well as images as input data. It is inspired by the description in Lafarge et al. [16]. The second one is plane growing, merging and dividing, which uses only geometric and topological attributes of the mesh. The combination of these steps is novel although the operations are fairly straightforward. In the dividing step, three well-studied clustering algorithms are used, as well as an invention of our own.

3.2.1 Photo-consistency refinement

To refine the quality of planar regions, we introduce the use of a photo-consistency measure as briefly described in section 2.2.3. This is motivated by the basic assumption of this work: some regions on reconstructed buildings are flat while the 3D reconstruction pipeline produces surfaces that are not. Therefore, projecting the geometry in these regions to planes will make the representation more coherent with the real thing, and thus photo-consistency over several views should improve in these regions.

Recall that at this stage, initial planes have been found by the plane detection block. Corresponding vertex positions are first forced onto the surfaces of these planes.

Then, vertex candidates for inclusion in a certain plane are selected (in practice, these are the neighboring vertices of a plane). For each one, all faces that the vertex is part of are collected. We create two sets of faces: one before projecting the vertex to the plane and one after projection. A best-view camera is chosen based on the coherence of the vertices' positions with the camera's position and direction. From this camera, for both sets of faces, a number of 3D points on the surface are projected from the pixels seeing the face set, see figure 3.2.

These two sets of 3D points are then projected into all the other cameras from which the candidate vertex is visible. If the resulting projected pixel coordinates end up at non-integer coordinates (which of course is the general case), the color value is interpolated.

This gives the same number of pixels each time, so the normalized cross correlation, definition 2.2, can easily be calculated between all camera pairs. This is performed for each RGB color channel individually and then averaged. In the definition, $f$ and $g$ are vectors containing the image RGB values of all the $N$ points in each of the $K$ cameras before and after projecting the vertex to the plane:

$$\text{ncc}^{c}_{\text{before}} = \sum_{\substack{\forall\{i,j\} \\ i \neq j}} \frac{(\vec{p}^{\,c}_{i} - \mu_{\vec{p}^{c}_{i}}) \cdot (\vec{p}^{\,c}_{j} - \mu_{\vec{p}^{c}_{j}})}{\sigma_{\vec{p}^{c}_{i}}\,\sigma_{\vec{p}^{c}_{j}}},
\quad \text{where} \quad
\vec{p}^{\,c}_{i} = \begin{pmatrix} C^{c}_{i}p_{1} \\ C^{c}_{i}p_{2} \\ \vdots \\ C^{c}_{i}p_{N} \end{pmatrix} \tag{3.7}$$


Figure 3.2: Projection from the camera center through pixel centers gives a set of 3D points on the surface of a 3D object.

$C^{c}_{i}p_{n}$ is the color $c$ of point $n$ projected into camera $i$. This NCC measure is calculated separately for each of the three color channels, then averaged over the number of channels and cameras such that:

$$\text{ncc}_{\text{before}} = \sum_{c=1}^{3} \frac{\text{ncc}^{c}_{\text{before}}}{3 \cdot K} \tag{3.8}$$

The $\text{ncc}_{\text{after}}$ is calculated analogously, with the difference that the plane-projected point set $\vec{p}^{\,\prime}$ is used instead. We arrive at a simple comparison of the two scalars $\text{ncc}_{\text{before}}$ and $\text{ncc}_{\text{after}}$ to determine whether to accept the plane projection or not. The point set with the highest NCC value has the highest degree of photo-consistency. If the plane-projected set of faces gives the best score, the candidate vertex is moved to the proposed plane.

3.2.2 Growing, merging and dividing

The plane refinement method presented here is based on three steps:

1. Growing planes from the initial seeds.
2. Merging together adjacent and similar planes.
3. Dividing planes into clear substructures.

For the third step, several methods are presented. They will be compared in a later chapter.

Conditional plane growing

Figure 3.3: Face candidate for inclusion in the striped plane. The normals of the face and the plane are shown, as well as the extent of the plane projection error for a face vertex.

Each plane coming from the detection block is described by:

• a normal vector,

• a position in world coordinates, and

• the vertices or faces that are part of it.

To make these planes more complete, the algorithm looks at topological neighbors and compares some basic properties. It first compares the normal of a candidate face with the normal of an adjacent plane. If the coherence is good enough (the normal vectors are oriented similarly), the vertices of the face are projected to the plane and a vertex projection error is calculated, which is also used as an inclusion criterion. See figure 3.3 for an illustration. The thresholds for normal coherence and projection error are inputs to the pipeline.
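The two inclusion criteria can be sketched as a single test per candidate face; this is a sketch under the assumptions above, with the thresholds being the pipeline inputs just mentioned.

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <vector>

// Accept a neighboring face into a plane if (1) its normal is coherent with
// the plane normal and (2) every face vertex projects onto the plane within
// the allowed projection error.
bool acceptFace(const Eigen::Vector3d& faceNormal,
                const std::vector<Eigen::Vector3d>& faceVerts,
                const Eigen::Vector3d& planeNormal,
                const Eigen::Vector3d& planePoint,
                double maxNormalAngleRad, double maxProjError) {
    const Eigen::Vector3d n = planeNormal.normalized();
    // Normal coherence: the angle between the normals must be small enough.
    if (faceNormal.normalized().dot(n) < std::cos(maxNormalAngleRad))
        return false;
    // Plane projection error for each vertex of the face.
    for (const Eigen::Vector3d& v : faceVerts)
        if (std::abs((v - planePoint).dot(n)) > maxProjError)
            return false;
    return true;
}
```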

It is worth mentioning that the normal coherence and plane projection error criteria are two ways of verifying the same thing in the case of isotropic triangle faces. See figure 3.4 for an illustration of the relation between the allowed normal angle difference and the allowed projection error.

For an equilateral triangle with edge length $m$, the relation between projection error $d$ and normal difference $\alpha$ is

$$d = \frac{\sqrt{3}}{2}\, m \sin \alpha. \tag{3.9}$$

Merging of similar planes

In the set of initial planes, one real-world plane might be represented by several incomplete ones lying side-by-side on the surface. It is therefore necessary to carefully merge these planes to represent wholes rather than sub-regions, while not merging together planes that should be kept separate.

Figure 3.4: Two equilateral triangles viewed at an angle (left) and from the side (right), showing the relation between the normal angle difference α and the plane projection error d, with l = m√3/2 and d = l sin α.

This is performed through a straightforward approach of comparing neighboring planes. For a merge to be accepted, the plane normals should point in a similar direction and the plane projection error for each vertex cannot be too large. The plane projection error is calculated by first fitting a new plane to the combined vertex set and then averaging the projection error of each vertex to this plane. The thresholds for normal deviation and projection error are input parameters to this processing step.

Dividing by clustering

A set of vertices can be considered to make up a planar surface within a certain error threshold in terms of vertex deviation from the plane. Depending on how tight this threshold is, sub-structures might exist in planes in the form of windows, doors or other extrusions. Depending on the amount of simplification desired, these sub-structures could be either extracted or left as part of the plane. Here, a few methods for performing the division are presented.

Each vertex belonging to a plane has a 2D position in relation to the plane and a scalar property that we call deviation from the plane – that is, the shortest Euclidean distance to the plane, illustrated in figure 3.5. This property is utilized in some of the clustering methods reviewed below.

The idea of projecting the positions on the deviation axis and then clustering is inspired by Furukawa et al. [11]. The three traditional clustering methods used are kernel density estimation, agglomerative clustering and K-means. A fourth method, an invention of our own called extrema detection and growth, is also evaluated.

Kernel density estimation on deviation

If there exist sub-structures in a plane of vertices with a limited amount of noise, these regions should each have a clear mean level of deviation that is then detectable. To find these different mean levels, a kernel density estimation, KDE, is employed. First, the positions of the vertices are reduced to their one-dimensional (signed) distance from the plane. Then a histogram can be calculated based on this property, which in general is noisy and, of course, non-continuous. It needs to be smoothed with a kernel, which can be freely chosen as a non-negative function.

Figure 3.5: Estimated plane and vertices seen from the side. In the vertex set that has been fitted with a plane, there are two sub-structures that might be relevant to separate.

See figure 3.6 for an illustration. The kernel also has a bandwidth parameter that decides its width and thus how much smoothing is performed.

With the KDE curve calculated, the different maxima in the density function can be isolated by locating the minima and using the corresponding deviation values as separation thresholds. This separates vertices with different deviation values into intervals. Each separated vertex set is then subject to a new plane fitting.

An effect of this method is that the separation of the vertices in directions orthogonal to the plane normal becomes irrelevant. This means several windows or other details at the same depth in the façade will end up having the same plane fitted to them.
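A sketch of the 1D estimate and the threshold extraction follows, with a Gaussian kernel; the bandwidth h and the evaluation grid are assumed inputs.

```cpp
#include <cmath>
#include <vector>

// 1D kernel density estimate of the vertex deviations with a Gaussian kernel,
// evaluated on a regular grid of deviation values.
std::vector<double> kde1d(const std::vector<double>& deviations,
                          const std::vector<double>& grid, double h) {
    const double pi = std::acos(-1.0);
    const double norm = 1.0 / (deviations.size() * h * std::sqrt(2.0 * pi));
    std::vector<double> density(grid.size(), 0.0);
    for (std::size_t g = 0; g < grid.size(); ++g)
        for (double d : deviations) {
            const double u = (grid[g] - d) / h;
            density[g] += norm * std::exp(-0.5 * u * u);
        }
    return density;
}

// Local minima of the density curve: the deviation values used to separate
// the vertex set into intervals.
std::vector<double> separationThresholds(const std::vector<double>& density,
                                         const std::vector<double>& grid) {
    std::vector<double> thresholds;
    for (std::size_t i = 1; i + 1 < density.size(); ++i)
        if (density[i] < density[i - 1] && density[i] < density[i + 1])
            thresholds.push_back(grid[i]);
    return thresholds;
}
```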

Agglomerative clustering with centroid linkage

In agglomerative clustering, clusters are initially assigned one to each sample. Then, iteratively based on a criterion, these clusters are pair-wise merged until all samples are part of the same cluster. In this context it is most suitable to use a centroid-linkage scheme with a Euclidean distance metric. For a thorough discussion of linkage schemes and distance metrics, consult e.g. Rokach and Maimon [23].

A centroid is thus calculated for each cluster and the choice of which two clusters to merge is based on the Euclidean distance between the centroids. There also has to be a stopping criterion for the clustering to be meaningful (rather than clustering each vertex separately or all in the same cluster). This is set as a maximum separation distance, such that the algorithm stops when the two closest centroids are farther apart than this threshold.

A relevant addition to the regular clustering procedure is to exaggerate the distance orthogonal to the plane, in order to favor separation in this direction. This is performed by a simple uniform scaling of the vertices. It is also common to get many clusters across the plane with basically the same deviation value as the original plane. These are merged, since separation based on distance within the original plane is not interesting. See figure 3.7 for an illustration.

Figure 3.6: Example of kernel density estimation separation: (a) histogram of the plane deviation; (b) KDE with a Gaussian smoothing kernel; (c) result with separation and new fitted planes. Note the minimum in the curve, which gives the proper value to perform separation.

Figure 3.7: Example of agglomerative clustering separation: (a) clusters found with a certain threshold; (b) result with separation and new fitted planes. To each identified sub-region a plane is estimated. Regions with little deviation remain in the original vertex set.

Extrema detection and growth

The vertices with the largest absolute deviation values (positive and negative) are selected as a set of deviating region seed points. From these, regions grow to the vertex neighbors as long as the neighbors have absolute deviation values above a threshold. See figure 3.8 for an illustration.

K-means on deviation

In brief, K cluster centroids are initially placed somewhere (e.g. randomly) in the data. Each vertex in the data set is then assigned to the closest centroid in the Euclidean distance sense. The centroids are re-calculated, vertices are re-assigned to centroids, and so on until convergence. The method can get stuck in local minima, which is why it might be useful to run several passes and compare the results to pick the best.

Here, the problem is reduced from 3 dimensions to 1 by looking only at the deviation. The number of clusters, K, is initially set to 2. If two distinct clusters can be found (their centroids are separated by more than a predefined threshold), a new round of separation can be attempted on the obtained subsets. See figure 3.9 for an illustration.

Note that the K-means algorithm is a heuristic and thus does not guarantee an optimal solution. In addition, the algorithm intrinsically favors clusters of similar size.
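A minimal sketch of the 1D two-means split described above, with crude initialization at the extreme deviations; the split is accepted only if the converged centroids are farther apart than the predefined threshold.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One round of K-means with K = 2 on the deviation values. Writes a 0/1
// cluster assignment per vertex and reports whether the two clusters are
// distinct enough to accept the split.
bool twoMeansSplit(const std::vector<double>& dev, double minSeparation,
                   std::vector<int>& assignment) {
    double c0 = dev.front(), c1 = dev.back(); // crude initial centroids
    assignment.assign(dev.size(), 0);
    for (int iter = 0; iter < 100; ++iter) {
        double sum0 = 0.0, sum1 = 0.0;
        int n0 = 0, n1 = 0;
        for (std::size_t i = 0; i < dev.size(); ++i) {
            assignment[i] =
                std::abs(dev[i] - c0) <= std::abs(dev[i] - c1) ? 0 : 1;
            if (assignment[i] == 0) { sum0 += dev[i]; ++n0; }
            else                    { sum1 += dev[i]; ++n1; }
        }
        if (n0 == 0 || n1 == 0) return false;  // degenerate split
        const double new0 = sum0 / n0, new1 = sum1 / n1;
        if (new0 == c0 && new1 == c1) break;   // converged
        c0 = new0; c1 = new1;
    }
    return std::abs(c0 - c1) > minSeparation;  // accept only distinct clusters
}
```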

Figure 3.8: Example of extrema detection and growth: (a) most deviating seed points and neighbor checking; (b) region grown and further neighbor checking; (c) regions after growth; (d) result with separation and new fitted planes. To each identified sub-region a plane is estimated.

Figure 3.9: Example of K-means separation on deviation: (a) result of K-means clustering on deviation; (b) result with separation and new fitted planes.

4 Method

In this chapter we present the tools used for the implementation, the data sets used, and the methodology for pipeline selection and evaluation.

4.1 Tools

The programming language used is C++, which is also the language of the Spotscale components.

The Visualization and Computer Graphics Library [6] (VCG for short) has been used extensively when necessary to generate properties for, and manipulate, the mesh. It is a publicly available header library that includes many different mesh processing algorithms, some of them very recent, and is thus well suited for the needs of the implementation.

The OpenGM library for discrete factor graph models [2] has been used for optimization algorithms.

MATLAB® [22] has been used for prototyping some algorithms. In the component evaluation, experiments performed with MATLAB are presented.

In the following chapters, there are some comments on running time. We therefore present the hardware specifications on which the tests were run: the computer was equipped with a 2.4 GHz Intel® Core™ i7-4700MQ CPU and 8 GB of DDR3 memory.


4.2 Input and Output Data

The input data consists of finished 3D mesh reconstructions of different buildings with corresponding images and applied texture. Some data sets for testing and evaluation are provided by Spotscale and one has been downloaded from the EPFL CVLab. The following data sets are presented: Vreta Church, Vasallen, Container and Herz-Jesu-P8. They all contain images from different directions and a reconstructed 3D mesh. The output of the devised system is the input mesh with approximately planar regions replaced by planes.

The input data is required to be a reasonably isotropic, two-dimensional, non-degenerate surface. Increasingly anisotropic triangles give worse results, especially for the calculation of principal curvatures.

Vreta Church

The Vreta Church data set, seen in figure 4.1, depicts a Swedish stone church established in the 12th century. The building consists of many flat surfaces both large and small, but also details such as window sockets. It has approximately 678 000 faces and is provided by Spotscale.

Figure 4.1: Vreta Church data set. Reconstructed mesh and sample image.

Vasallen

The Vasallen data set, seen in figure 4.2, depicts a typical urban building with right angles and many windows. It has approximately 300 000 faces and is provided by Spotscale.

Container

The Container data set, seen in figure 4.3, has been downloaded online and originally consists of a low-polygon 3D mesh of a container. Snapshots of this mesh have been taken from different perspectives and then 3D reconstructed, resulting in a mesh with approximately 57 000 faces.

Herz-Jesu-P8

The Herz-Jesu-P8 data set, seen in figure 4.4, is provided by the EPFL CVLab and was made publicly available following the publication of Strecha et al. [26]. It consists of images of a stone façade and a lidar-scanned 3D mesh reconstruction that will be used to perform evaluation against. A 3D reconstruction has also been performed from the provided images, consisting of approximately 500 000 faces.

Figure 4.2: Vasallen data set. Reconstructed mesh and sample image.

Figure 4.3: Container data set. Reconstructed mesh and sample image.


Figure 4.4: Herz-Jesu-P8 data set. Reconstructed mesh and sample image.

4.3 Pipeline Component Evaluation

The overall two blocks of the pipeline were identified at the beginning of this thesis work, as depicted in figure 1.2. This was followed by a review of the literature, summarized in section 1.3, which led to the choice of some components to keep as candidates for the pipeline. These components were presented in chapter 3. Using a combination of the presented components, one final pipeline was constructed that best suited the goal of this thesis. For this purpose, not all individual components were quantitatively evaluated; instead, choices were made based on theoretical reasoning and basic experiments on real and synthetic data. This evaluation is presented in chapter 5.

4.4 Final Pipeline Evaluation

Once the final pipeline components had been selected, a more thorough round of evaluation was performed. We use two methods for verifying the quality of the resulting mesh: visual inspection and comparison with ground truth.

4.4.1 Visual inspection

One of the most important requirements on this system is to improve the visual quality of meshes, which does not always align with a quantitative measure of quality. Therefore, it is important to look at the result and comment on where the system seems to perform well and where it does not. This was of course done continuously during this thesis work, but a walk-through of the different components will also be presented.

4.4.2 Surface comparison with ground truth

It is very difficult to produce 3D meshes of real-world objects which could rightfully be called ground truth. One method that comes close enough in this context is to laser-scan and reconstruct a scene. This will be done with the Herz-Jesu-P8 data set presented above, since it comes with both images of the scene and a laser-scanned mesh.

In addition to the laser-scanned mesh, we also employ a simpler approach to get additional evaluation data. Starting from a digitally designed 3D mesh with both perfectly flat surfaces and non-flat details, snapshots can be taken from a number of well-chosen camera poses. These images are then passed through the provided 3D reconstruction pipeline, giving an imperfect version of the original, thus closely imitating real-world 3D reconstruction.

For both types of evaluation mesh, the reconstructed mesh is then subjected to the proposed processing pipeline, substituting surfaces with planes where appropriate. This gives three versions of the evaluation mesh: the gt, the reconstructed and the flattened. See figure 4.5 for an illustration of the idea in the case of digitally designed data. We then form two pairs, each consisting of the gt mesh and one of the two other meshes. The question then is “how much has the error changed before and after substitution with planes?”.

To get a comparable error, we calculate the distance to the gt mesh with the Metro tool [7]. It uses a sampling method to generate points on each face of one of the meshes, and then calculates the closest distance from each of these points to the other mesh. Over all these points, an rms error can be computed, giving a measure of how well the reconstruction corresponds to the gt.
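The principle behind this measure is straightforward to sketch. The following self-contained C++ example is our own illustration of the idea, not the Metro implementation: points are sampled uniformly on the triangles of one mesh, the closest distance from each sample to the other mesh is found (here by brute force, where Metro uses accelerating data structures), and the rms of these distances is returned. All type and function names are ours.

#include <algorithm>
#include <cmath>
#include <limits>
#include <random>
#include <vector>

struct Vec3 {
    double x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    Vec3 operator+(const Vec3& o) const { return {x + o.x, y + o.y, z + o.z}; }
    Vec3 operator*(double s) const { return {x * s, y * s, z * s}; }
};
static double dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static double norm(const Vec3& a) { return std::sqrt(dot(a, a)); }

struct Tri { Vec3 a, b, c; };

static double uniform01() {
    static std::mt19937 rng(0);
    static std::uniform_real_distribution<double> u(0.0, 1.0);
    return u(rng);
}

// Closest point on triangle t to point p, by testing the Voronoi regions of
// the triangle (cf. Ericson, "Real-Time Collision Detection").
static Vec3 closestPointOnTriangle(const Vec3& p, const Tri& t) {
    Vec3 ab = t.b - t.a, ac = t.c - t.a, ap = p - t.a;
    double d1 = dot(ab, ap), d2 = dot(ac, ap);
    if (d1 <= 0 && d2 <= 0) return t.a;                       // vertex region a
    Vec3 bp = p - t.b;
    double d3 = dot(ab, bp), d4 = dot(ac, bp);
    if (d3 >= 0 && d4 <= d3) return t.b;                      // vertex region b
    double vc = d1 * d4 - d3 * d2;
    if (vc <= 0 && d1 >= 0 && d3 <= 0)                        // edge region ab
        return t.a + ab * (d1 / (d1 - d3));
    Vec3 cp = p - t.c;
    double d5 = dot(ab, cp), d6 = dot(ac, cp);
    if (d6 >= 0 && d5 <= d6) return t.c;                      // vertex region c
    double vb = d5 * d2 - d1 * d6;
    if (vb <= 0 && d2 >= 0 && d6 <= 0)                        // edge region ac
        return t.a + ac * (d2 / (d2 - d6));
    double va = d3 * d6 - d5 * d4;
    if (va <= 0 && d4 - d3 >= 0 && d5 - d6 >= 0)              // edge region bc
        return t.b + (t.c - t.b) * ((d4 - d3) / ((d4 - d3) + (d5 - d6)));
    double denom = 1.0 / (va + vb + vc);                      // interior of the face
    return t.a + ab * (vb * denom) + ac * (vc * denom);
}

// Uniform random point on a triangle via the square-root parameterization.
static Vec3 samplePointOnTriangle(const Tri& t) {
    double r1 = std::sqrt(uniform01()), r2 = uniform01();
    return t.a * (1 - r1) + t.b * (r1 * (1 - r2)) + t.c * (r1 * r2);
}

// RMS of sample-to-mesh distances: samples are drawn on `from` and measured
// against the closest triangle of `to`.
static double rmsDistance(const std::vector<Tri>& from, const std::vector<Tri>& to,
                          int samplesPerFace) {
    double sumSq = 0;
    long n = 0;
    for (const Tri& f : from)
        for (int s = 0; s < samplesPerFace; ++s) {
            Vec3 p = samplePointOnTriangle(f);
            double best = std::numeric_limits<double>::max();
            for (const Tri& g : to)
                best = std::min(best, norm(p - closestPointOnTriangle(p, g)));
            sumSq += best * best;
            ++n;
        }
    return std::sqrt(sumSq / n);
}

Note that for an unbiased estimate the number of samples per face should be proportional to the face area; the fixed samplesPerFace above is a simplification of the sampling strategies Metro offers.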


(a) Ground truth mesh.

(b) Reconstructed mesh, perturbed compared to gt.

(c) Reconstructed mesh with surfaces flattened.

Figure 4.5: Illustration of reconstruction and flattening of a digitally designed 3D mesh. A house with flat surfaces and a round window next to a scarecrow without any flat surfaces. The system manages to replace the perturbed flat surfaces with planes, but the window and scarecrow are left as they are.


5 Pipeline Component Selection

This chapter presents an evaluation of the different candidate components for the pipeline. This includes comparisons, theoretical reasoning and experiments. Finally, the chosen pipeline is presented.

5.1 Component Evaluation/Comparison

This section shows the result of the evaluation of the photo-consistency refinement and the different separation methods used in plane growing, merging and dividing.

5.1.1 Photo-consistency refinement

The running time of the photo-consistency refinement was too long on the available hardware to allow a proper evaluation of its use. The results that were obtained did not reveal whether it was a good addition to the pipeline or not, and it is thus not used in the final pipeline. Chapter 7 contains some discussion on this.

5.1.2 Separation method comparison

Four different methods for separating sub-structures in planes were presented in section 3.2.2. Agglomerative clustering and extrema detection and growth produce separate planes for each found sub-region, whereas the other two only do so if the sub-regions are found to be separate clusters. Both varieties are useful in different situations and are rather easy to convert between.

Regarding time complexity, the outlier is agglomerative clustering. For each merge, all clusters are compared to all others, which is O(n²) with n vertices. Merging all clusters into one (or a few) requires O(n) such merges, resulting in O(n³) in total.
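To make the cost concrete, a minimal single-linkage sketch in C++ is given below. It is a simplified stand-in for the actual component, assuming each vertex has already been reduced to a scalar deviation from its plane; the stopping threshold stopDist is an illustrative parameter.

#include <cmath>
#include <limits>
#include <vector>

// Naive agglomerative (single-linkage) clustering of scalar deviations.
// Repeatedly merges the two closest clusters until the smallest gap exceeds
// `stopDist`. Each pass scans all element pairs: O(n^2) per merge, up to
// O(n) merges, hence O(n^3) overall.
std::vector<std::vector<double>> agglomerate(const std::vector<double>& values,
                                             double stopDist) {
    std::vector<std::vector<double>> clusters;
    for (double v : values) clusters.push_back({v});  // start: one cluster per vertex

    while (clusters.size() > 1) {
        double best = std::numeric_limits<double>::max();
        size_t bi = 0, bj = 0;
        for (size_t i = 0; i < clusters.size(); ++i)       // O(n^2) pair scan
            for (size_t j = i + 1; j < clusters.size(); ++j)
                for (double a : clusters[i])
                    for (double b : clusters[j]) {
                        double d = std::abs(a - b);        // single-linkage distance
                        if (d < best) { best = d; bi = i; bj = j; }
                    }
        if (best > stopDist) break;                        // clusters far enough apart
        clusters[bi].insert(clusters[bi].end(),            // merge cluster j into i
                            clusters[bj].begin(), clusters[bj].end());
        clusters.erase(clusters.begin() + bj);
    }
    return clusters;
}

Maintaining a priority queue of pair distances would lower the complexity, at the cost of a more involved implementation.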


The only parameter-free technique is kde, although the bandwidth of the smoothing can be altered if need be. In most cases, a default bandwidth based on the vertex deviation variance is a good choice. All the others require thresholds that specify when to separate.
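A simplified sketch of the kde-based separation is given below, in C++. It is our own illustration rather than the pipeline code, and assumes each vertex has been reduced to one scalar deviation; it also assumes the deviations are not all equal. The bandwidth follows Silverman's rule of thumb, derived from the deviation variance, and the split thresholds are placed at the valleys of the estimated density.

#include <algorithm>
#include <cmath>
#include <vector>

// 1-D Gaussian kernel density estimate over vertex deviations, evaluated on a
// regular grid. Split thresholds are placed at local minima of the density.
std::vector<double> kdeSplitThresholds(const std::vector<double>& devs,
                                       int gridSize = 256) {
    const double n = static_cast<double>(devs.size());
    double mean = 0, lo = devs[0], hi = devs[0];
    for (double d : devs) { mean += d; lo = std::min(lo, d); hi = std::max(hi, d); }
    mean /= n;
    double var = 0;
    for (double d : devs) var += (d - mean) * (d - mean);
    var /= n;
    // Default bandwidth from the deviation variance (Silverman's rule of thumb).
    double h = 1.06 * std::sqrt(var) * std::pow(n, -0.2);

    std::vector<double> density(gridSize, 0.0);
    for (int i = 0; i < gridSize; ++i) {
        double x = lo + (hi - lo) * i / (gridSize - 1);
        for (double d : devs) {
            double u = (x - d) / h;
            density[i] += std::exp(-0.5 * u * u);  // unnormalized Gaussian kernel
        }
    }
    std::vector<double> thresholds;  // valleys separate the deviation clusters
    for (int i = 1; i + 1 < gridSize; ++i)
        if (density[i] < density[i - 1] && density[i] < density[i + 1])
            thresholds.push_back(lo + (hi - lo) * i / (gridSize - 1));
    return thresholds;
}

Each vertex is then assigned to the cluster delimited by the thresholds surrounding its deviation, which is why no separation threshold needs to be chosen by hand.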

Experiments

In matlab, a small simulation of a real situation is set up with two planar surface levels, shown in a noise-free version in figure 5.1. It consists of 1225 points that represent vertices on a real mesh, with one wall surface and one window surface. A regular grid has been used for the XY-coordinates, with a minor amount of noise. The Z-coordinate is used for the deviation from the “true” planar surfaces and is also perturbed with a varying amount of noise, see figure 5.2. The noise has been applied to simulate inaccuracy from the 3D reconstruction process.

(a) Seen from XY perspective.

(b) Seen from XZ perspective.

Figure 5.1: Simulation of two planar surfaces without noise.

(a) Seen from XY perspective.

(b) Seen from XZ perspective.

Figure 5.2: Simulation of two planar surfaces with noise.

On this point set we apply the following three algorithms: kernel density estimation clustering, agglomerative clustering and K-means clustering. Separation by extrema detection and growth has not been experimented with in matlab, since it also uses neighboring information, which makes the experiment much harder to implement. Instead, it was implemented in C++ code and evaluated on real data.

In the experiment, noise is drawn from a normal distribution with varying standard deviation, from 0 % to 30 % of the distance between the wall and window surface levels. The fraction of correctly classified vertices in the two levels is noted for each algorithm, and each algorithm is re-run 1000 times on a new set of data with new randomized noise. The results are presented in table 5.1, simulating one window, and table 5.2, simulating two windows with the same deviation.
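A single run of the K-means variant of this experiment is straightforward to reproduce. The following C++ sketch is our own illustration, not the original matlab code: the 35 × 35 grid gives the 1225 points described above, but the placement and size of the window region are assumptions made for the example.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    std::mt19937 rng(0);
    const double gap = 1.0;              // distance between wall and window levels
    const double noiseStd = 0.20 * gap;  // noise std at 20 % of the level distance
    std::normal_distribution<double> noise(0.0, noiseStd);

    // 35 x 35 grid of deviations: wall at z = 0, a window region at z = -gap.
    std::vector<double> z;
    std::vector<int> label;  // 0 = wall, 1 = window (ground truth)
    for (int ix = 0; ix < 35; ++ix)
        for (int iy = 0; iy < 35; ++iy) {
            bool window = ix >= 12 && ix < 23 && iy >= 12 && iy < 23;  // assumed extent
            label.push_back(window ? 1 : 0);
            z.push_back((window ? -gap : 0.0) + noise(rng));
        }

    // 1-D K-means with k = 2 on the deviations, centers seeded at the extremes.
    double c0 = *std::min_element(z.begin(), z.end());
    double c1 = *std::max_element(z.begin(), z.end());
    std::vector<int> assign(z.size(), 0);
    for (int it = 0; it < 100; ++it) {
        double s0 = 0, s1 = 0;
        int n0 = 0, n1 = 0;
        for (size_t i = 0; i < z.size(); ++i) {
            assign[i] = std::abs(z[i] - c0) < std::abs(z[i] - c1) ? 0 : 1;
            if (assign[i] == 0) { s0 += z[i]; ++n0; } else { s1 += z[i]; ++n1; }
        }
        if (n0 > 0) c0 = s0 / n0;
        if (n1 > 0) c1 = s1 / n1;
    }

    // The cluster with the lower center corresponds to the window level.
    int windowCluster = c0 < c1 ? 0 : 1;
    int correct = 0;
    for (size_t i = 0; i < z.size(); ++i)
        if ((assign[i] == windowCluster) == (label[i] == 1)) ++correct;
    std::printf("correctly classified: %.2f %%\n", 100.0 * correct / z.size());
    return 0;
}

Repeating this with fresh noise realizations and averaging the classification rate corresponds to one cell of the tables below.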

Table 5.1: Amount of correctly classified vertices when clustering with different algorithms, in a simulation with a wall and a window.

Noise STD    KDE                 Agglomerative       K-means
             Wall      Window    Wall      Window    Wall      Window
 0.0 %       100 %     100 %     100 %     100 %     100 %     100 %
10.0 %       100 %     100 %     100 %     100 %     100 %     100 %
15.0 %       99.98 %   99.85 %   99.95 %   99.95 %   99.98 %   99.84 %
17.5 %       99.91 %   99.51 %   99.74 %   99.81 %   99.94 %   97.06 %
20.0 %       99.74 %   98.86 %   99.13 %   99.55 %   99.86 %   97.26 %
22.5 %       99.40 %   97.81 %   97.34 %   99.30 %   99.75 %   93.69 %
25.0 %       98.95 %   96.41 %   91.94 %   99.32 %   99.62 %   88.55 %
27.5 %       98.32 %   94.82 %   84.19 %   98.89 %   99.49 %   82.44 %
30.0 %       97.56 %   93.45 %   78.86 %   96.11 %   99.38 %   74.39 %

Table 5.2: Amount of correctly classified vertices when clustering with different algorithms, in a simulation with a wall and two windows.

Noise STD    KDE                 Agglomerative       K-means
             Wall      Window    Wall      Window    Wall      Window
 0.0 %       100 %     100 %     100 %     100 %     100 %     100 %
10.0 %       100 %     100 %     100 %     100 %     100 %     100 %
15.0 %       99.97 %   99.86 %   99.95 %   99.96 %   99.98 %   99.86 %
17.5 %       99.86 %   99.48 %   99.76 %   99.78 %   99.93 %   99.30 %
20.0 %       99.65 %   98.85 %   99.30 %   99.47 %   99.81 %   98.05 %
22.5 %       99.29 %   97.70 %   98.36 %   98.90 %   99.63 %   95.49 %
25.0 %       98.78 %   96.40 %   96.79 %   98.43 %   99.42 %   91.97 %
27.5 %       97.98 %   95.06 %   94.42 %   97.92 %   99.23 %   87.00 %
30.0 %       97.02 %   93.29 %   91.53 %   97.44 %   99.00 %   80.95 %

The maximum noise level standard deviation is set to 30 % of the distance from wall to window planes. It is hard to give an estimate of how much noise is present in a 3D-reconstructed mesh, but it is certain that the average relative error between points lying close to each other is much less than 30 %. In figure 5.2 the noise is at 20 %.
