Department of Science and Technology
Linköping University

LiU-ITN-TEK-A--18/010--SE

Improving Conventional Image-based 3D Reconstruction of Man-made Environments Through Line Cloud Integration

Martin Gråd

Degree project carried out in Media Technology
at the Institute of Technology, Linköping University

Martin Gråd

Supervisor: Reiner Lenz
Examiner: Sasan Gooran


Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances.

The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/

Improving Conventional Image-based 3D Reconstruction of Man-made Environments Through Line Cloud Integration

Martin Gråd

Linköping University


Abstract

Image-based 3D reconstruction refers to the capture and virtual reconstruction of real scenes through the use of ordinary camera sensors. A common approach is a pipeline of the algorithms Structure from Motion, Multi-view Stereo and Poisson Surface Reconstruction, which fares well for many types of scenes. However, this pipeline often falters when it comes to texture-less surfaces and areas, such as those found in man-made environments. Building facades, roads and walls often lack detail and easily trackable feature points, making this approach less than ideal for such scenes. To remedy this weakness, this thesis investigates an expanded approach, incorporating line segment detection and line cloud generation into the already existing point cloud-based pipeline. Texture-less objects such as building facades, windows and roofs are well-suited for line segment detection, and line clouds are fitting for encoding 3D positional data in scenes consisting mostly of objects featuring many straight lines. A number of approaches have been explored in order to determine the usefulness of line clouds in this context, each of them addressing different aspects of the reconstruction procedure.

The results show that while line data can be used to achieve small improvements on fully-fledged 3D reconstruction pipelines, it might be better suited as a faster substitute for the dense point cloud generation step. As such, it could be used, for instance, as a tool for on-site technicians to facilitate assurance of sufficient image coverage while capturing the scene. The results also indicate, however, that line clouds can be utilized to obtain improvements on specific areas that are problematic for the traditional pipeline, even in the context of dense reconstruction. Wiry structures, such as fences or antennas, which are exceedingly difficult to handle, are accurately reproduced using line clouds, making this method useful in such areas, alongside the traditional, point-based approach. Future work includes use of a certainty measure in the generation and displacement of points, an additional, local curvature-based point generation technique and a localized reconstruction method.

Keywords: Image-based 3D reconstruction, SfM, MVS, Poisson surface reconstruction, line cloud, point cloud, PCA, k-d tree

Contents

1 Introduction
  1.1 Background
    1.1.1 Structure from motion
    1.1.2 Multi-view stereo
    1.1.3 Poisson surface reconstruction
    1.1.4 Line clouds
  1.2 Related work
  1.3 Aim
    1.3.1 Research questions
  1.4 Delimitations
2 Method
  2.1 Implemented approaches
    2.1.1 Line sampling
    2.1.2 Point displacement
    2.1.3 Vertex displacement
    2.1.4 Combinations
  2.2 Implementation details
    2.2.1 Line clouds using Line3D++
    2.2.2 Spatial decomposition using k-d trees
    2.2.3 Normal computation using PCA
    2.2.4 Third-party software
    2.2.5 Code implementation
3 Results
  3.1 Implemented approaches
    3.1.1 Line sampling
    3.1.2 Point displacement
    3.1.3 Vertex displacement
    3.1.4 Combinations
4 Discussion
  4.1 Future work
5 Conclusion
Bibliography

List of Figures

1.1 Structure from motion illustration
1.2 Epipolar geometry
1.3 Poisson surface extraction
1.4 Example line cloud
2.1 Displacement distance function
2.2 K-d tree for 2-dimensional data
3.1 Original point clouds
3.2 Meshes constructed from original point clouds
3.3 Original line clouds
3.4 Point clouds after use of line sampling
3.5 Meshes constructed after use of line sampling
3.6 Point clouds after use of point displacement
3.7 Meshes constructed after use of point displacement
3.8 Meshes constructed after use of vertex displacement
3.9 Point clouds after use of both line sampling and point displacement
3.10 Meshes constructed after use of line sampling and point displacement
3.11 Meshes after use of line sampling, point displacement and vertex displacement


Chapter 1

Introduction

The field of 3D reconstruction is extensive, encompassing a multitude of different approaches to fundamentally similar problems. At their simplest, these problems reduce to capturing the shape and appearance of real-world objects by means of some data collection procedure. Such processes can be separated into two principal categories: active and passive methods [3]. Active methods have in common a physical interaction with their target objects, and include approaches such as range data methods, for example laser scanning. Passive methods, however, merely observe objects through the use of sensors such as those found in digital cameras, recording nothing but the light cast from surfaces, either reflected or emitted. These methods are far simpler to employ, as they do not require specialized hardware, and constitute what can be called image-based 3D reconstruction.

This thesis focuses on image-based 3D reconstruction, specifically through the use of the techniques Structure from motion (SfM), Multi-view stereo (MVS) and Poisson surface reconstruction. This pipeline of processes is a conventional approach to the task of recovering 3D information from image sequences, and has seen extensive development since its inception. Generally, it produces pleasing results for many types of objects through the use of 3D point clouds, but falters when it comes to man-made environments [7]. Objects in such scenes often contain featureless areas, which leads to inadequate reconstruction due to sparsity in the generated 3D data. To circumvent this deficiency, recent work by Hofer et al. [7] proposes the use of line clouds rather than point clouds, and demonstrates their efficiency on a number of challenging data sets. This thesis acknowledges these results and aims to take them further by combining line clouds with the traditional pipeline, exploiting the strengths of both methods in order to improve the results further.

1.1 Background

3D reconstruction through the capture of image sequences has been studied extensively since the 1970s [5], and has recently started to grow significantly in popularity [7]. One of the most common approaches to this problem is the use of a pipeline process incorporating the techniques Structure from motion (SfM), Multi-view stereo (MVS) and Poisson surface reconstruction, which are briefly presented below. Following these sections is a more specific review of related work, as well as a statement of the aim of this thesis.

1.1.1 Structure from motion

Structure from motion, or SfM, is a photogrammetry technique that aims to infer 3D structures from sequences of 2D images. By finding correspondences between features in pairs of images, camera parameters and the relative transformation between poses can be computed. Image features can be identified, for instance, using the well-known SIFT (Scale-invariant feature transform) algorithm [12]. 3D positions are then recovered by finding intersections between projection rays through matching feature points in multiple images. Figure 1.1 shows a series of three camera poses and one tracked feature point, visible in each of the three images, along with its reconstructed 3D position. In addition to its primary use as a camera pose estimator, SfM also generates a sparse point cloud from the feature points.

Figure 1.1: Structure from motion illustration [11]. The three matching image points $p_{j,k-1}$, $p_{j,k}$ and $p_{j,k+1}$ correspond to the 3D point $p_j$, which is computed as the intersection between the rays starting at each camera center and projecting through the corresponding identified 2D feature points on the image planes.
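To make the ray-intersection step concrete, the sketch below triangulates a 3D point as the midpoint of the shortest segment between two (generally skew) viewing rays, a common textbook formulation. This is only an illustration of the principle, not the method VisualSFM uses internally; all types and names here are hypothetical.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
static Vec3 add(const Vec3& a, const Vec3& b) { return {a[0] + b[0], a[1] + b[1], a[2] + b[2]}; }
static Vec3 mul(const Vec3& a, double s) { return {a[0] * s, a[1] * s, a[2] * s}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }

// Midpoint triangulation: minimize |(o1 + t*d1) - (o2 + s*d2)|^2 over t and s,
// then take the midpoint of the two closest ray points as the 3D estimate.
Vec3 triangulateMidpoint(const Vec3& o1, const Vec3& d1,   // camera center + ray direction 1
                         const Vec3& o2, const Vec3& d2) { // camera center + ray direction 2
    const Vec3 r = sub(o1, o2);
    const double a = dot(d1, d1), b = dot(d1, d2), c = dot(d2, d2);
    const double d = dot(d1, r), e = dot(d2, r);
    const double denom = a * c - b * b;            // approaches 0 for parallel rays
    const double t = (b * e - c * d) / denom;
    const double s = (a * e - b * d) / denom;
    const Vec3 p1 = add(o1, mul(d1, t));           // closest point on ray 1
    const Vec3 p2 = add(o2, mul(d2, s));           // closest point on ray 2
    return mul(add(p1, p2), 0.5);                  // noisy rays rarely intersect exactly
}
```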

1.1.2 Multi-view stereo

The results from SfM are generally not sufficiently dense to perform adequate 3D mesh reconstruction. Only specific points in the scene will have a 3D representation, such as edges between facades, corners of windows, or otherwise unique and trackable points. This shortcoming is commonly addressed through the use of Multi-view Stereo, or MVS. This is a general term for a number of techniques that infer 3D information from sequences of 2D image data, using mainly stereo correspondence as their depth cue, on a set of more than two images [4]. These techniques take as input a set of images and corresponding camera parameters, and use them as constraints to generate dense 3D point information. In the case of the SfM-MVS-Poisson pipeline, the camera parameters are supplied by the preceding SfM step.

The SfM point cloud construction is based solely on feature points, identified, as mentioned, using for instance the SIFT algorithm. The reason is that the primary goal of the algorithm is to determine the camera parameters; it cannot, at reasonable cost, impose constraints on 2D image points that are not feature points. Finding matches between individual pixels in pairs of images is a 2D search, where each pixel in one image needs to be compared to all pixels in the other. This is a needlessly costly operation, and one that can be made simpler and faster at a later stage, namely the MVS step.

As the name implies, Multi-View Stereo takes as input multiple views, or in other words: multiple cameras. This means that this algorithm operates under different assumptions than SfM does, specifically with known camera parameters. These parameters are used to turn the 2D search for matching pixels into a 1D operation. When the camera poses are known, each pixel position in one image can be projected into another. This is achieved by constructing a 3D ray from the camera center through the pixel in question, and projecting it into the other image. If the relative transformation between the two images is correct, this projection results in a line in the target image on which a potential match for the original pixel must be located. This relation is called epipolar geometry, and the produced line is significantly faster to search through than an entire image, making this operation feasible at this stage. MVS, through epipolar geometry, thus enables the construction of a much denser point cloud, where every pixel of an image is a potential 3D point, in contrast to the relatively sparse selection of points used in the SfM procedure. A visualization of the concept of epipolar geometry is presented in Figure 1.2.

Figure 1.2: Epipolar geometry [13]. The image point $X_L$ in image $O_L$ can represent any 3D point along the camera ray (the dotted line from $O_L$ through $X$). Projecting this ray into image $O_R$, however, generates an epipolar line in the target image (marked in red), on which a potential pixel matching $X_L$ must be located. In this case, the point $X_R$ matches $X_L$, indicating that the 3D position represented by $X_L$ is $X$, since that is where the two camera rays intersect.
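The 2D-to-1D collapse can be written down compactly using the fundamental matrix F of an image pair (standard epipolar geometry, though not introduced explicitly above): the epipolar line for a left-image pixel x_L is l_R = F · x_L in homogeneous coordinates, and any match x_R must satisfy x_R^T F x_L = 0. A minimal sketch with placeholder values:

```cpp
#include <array>
#include <cstdio>

// Homogeneous 2D point/line: (x, y, w) / (a, b, c) with ax + by + c = 0.
using Hom3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Epipolar line in the right image for a left-image pixel: l_R = F * x_L.
// Every candidate match x_R must satisfy x_R^T * F * x_L = 0, i.e. lie on l_R.
Hom3 epipolarLine(const Mat3& F, const Hom3& xL) {
    Hom3 l{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            l[i] += F[i][j] * xL[j];
    return l;
}

int main() {
    Mat3 F{{{0.0, -1e-6, 1e-3},      // placeholder fundamental matrix; in
            {1e-6, 0.0, -2e-3},      // practice it is estimated from feature
            {-1e-3, 2e-3, 0.0}}};    // correspondences during SfM
    Hom3 xL = {320.0, 240.0, 1.0};   // pixel in the left image
    Hom3 l = epipolarLine(F, xL);
    std::printf("epipolar line: %gx + %gy + %g = 0\n", l[0], l[1], l[2]);
}
```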

1.1.3 Poisson surface reconstruction

Poisson surface reconstruction [10] is a method to produce triangular meshes from sets of 3D points. The method is based on the assumption that the 3D data represents an underlying geometry, and that the points themselves are merely samples on that surface. The underlying geometry is described by an implicit function, and the data points are seen as samples on or near an iso-surface extracted from it. The normals of the data points are used as samples of the gradient of the underlying surface function.

Specifically, the method operates by first computing an indicator function, indicating the membership of points in space to the model. This function equals one at points inside the model and zero everywhere else. The function is derived by relating the normals of the model points to its gradient field. This vector field is zero everywhere except near the surface of the model, where it is equal to the inward-facing surface normal. The reconstruction procedure thus determines the indicator function by finding a scalar function whose gradient best approximates the normal samples. The results are very smooth surfaces that robustly approximate noisy data. A simple illustration of the surface extraction concept is presented in Figure 1.3.

Figure 1.3: Poisson surface extraction [10]. The set of oriented points $\vec{V}$ is related to the indicator function $\chi_M$ through its vector field of normals, from which the indicator gradient $\nabla\chi_M$ is derived. The surface $\partial M$ is generated from the indicator function.
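Written out (a standard identity from the Poisson surface reconstruction literature [10], summarizing the paragraph above), the least-squares fit of the indicator gradient to the normal field reduces to a Poisson problem:

$$\min_{\chi}\,\big\|\nabla\chi - \vec{V}\big\| \quad\Longleftrightarrow\quad \Delta\chi = \nabla\cdot\vec{V},$$

which is what gives the method its name.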

1.1.4 Line clouds

There are more ways than one to store 3D information derived from sequences of images; point clouds are only one example, albeit the most common one. One weakness of this representation is that it is very resolution-dependent. To sufficiently describe surfaces, it requires a very large number of points, which, no matter the specifications of one's system, ultimately imposes limitations on the execution, both in terms of memory requirements and computation time. Another shortcoming of the point cloud approach is the nature of points themselves. Not everything can be sufficiently well represented by a point in space, and not everything lends itself to identification by feature point detection such as the mentioned SIFT method.

An alternative data representation that is not as resolution-dependent as point clouds and also allows for other types of features is line clouds. Instead of identifying and storing single points in 2D and 3D, respectively, line segments can be utilized in the same way. This significantly reduces the memory requirements, since line segments are continuous and thus essentially resolution-free. Large data sets naturally still require a large number of line segments, but each of them encodes information that would require many more points. In addition, line segments need not (nor can they) be adapted resolution-wise in order to sufficiently describe the geometry at different levels of detail, as is the case with point clouds. Line clouds also encode information in such a way that much of the data that is stored in point clouds does not need to be represented. For example, a square would traditionally need a great number of points in order to be robustly described, while a line cloud representing the same object would need only four line segments, representing the border edges. The missing data can be inferred using these borders. Furthermore, increasing the size of the square also necessitates the addition of even more points to the point cloud, while four line segments still suffice. An illustration of a line cloud is shown in Figure 1.4.

Figure 1.4: Example line cloud, depicting a small house. Generated with Line3D++.

1.2 Related work

Much previous work on the subject of 3D reconstruction using line clouds concerns the use of line segment data as a substitute for dense point generation. This particular step of the reconstruction pipeline has been identified as the most time-consuming procedure, and alternative algorithms have been proposed to avoid it. Hofer et al. [7] use line clouds sampled as 3D points, fused with the sparse point cloud generated by SfM, omitting the use of MVS or similar methods entirely, while still managing to achieve a high level of accuracy.

In contrast to Hofer et al., Sugiura et al. [15] build upon the line and point cloud approach by proposing the use of line clouds in the meshing step instead. Here, line segments are used as an additional input to the selected surface extraction method, namely tetrahedra carving. The line segments act as constraints in the reconstruction, by making them constitute edges of tetrahedra when employing the carving method. Densification of the point cloud through MVS is avoided in their work as well.

1.3 Aim

The aim of this thesis is to combine the point and line cloud-based 3D reconstruction methods introduced above, and leverage the strengths of both in order to achieve the most visually pleasing results possible, specifically in the context of man-made, outdoor environments. Such environments include, but are not limited to, houses, churches, car parks, and other buildings. This work also aims to investigate a number of different ways of combining the two approaches, and to determine which are the most useful.

1.3.1 Research questions

The research questions that have been formulated for this thesis are:

• How can line clouds be used to enhance the visual fidelity of reconstructed man-made environments?

• In which way should point and line clouds be used together in order to yield the best results in terms of visual quality?

• Which kinds of data lend themselves well to enhancement using line clouds, and which do not?

1.4 Delimitations

This thesis focuses on the benefits that integration of line clouds can yield in an already existing pipeline for generating 3D representations of real objects. Consequently, code is written solely for this integration; other components of the production pipeline are utilized in their current form or replaced by publicly available alternatives. Correspondingly, generation of line clouds is not included in this thesis, but rather handled by available software developed during previous research. The execution time of the implemented methods is not examined; only the final results in terms of visual fidelity are of interest.


Chapter 2

Method

This chapter gives detailed accounts of proposed approaches to solving the posed problem as well as implementation details of different techniques that have been utilized in order to realize them.

2.1 Implemented approaches

A few different approaches for improving reconstruction of man-made environments using line clouds have been explored. These are aptly named line sampling, point displacement, and vertex displacement, and are detailed in the following sections.

2.1.1 Line sampling

The first approach is the most straightforward: add points to the dense point cloud along the line segments of the line cloud. This idea is based on the assumption that point and line clouds contain information on different features of objects. They are generated using different feature detection algorithms, and should thus complement each other well. Points can be added along the line segments in a uniform fashion, but the spacing can also be dynamic. Utilizing the density of the point cloud around the sample position, the spacing can be either increased or decreased, depending on the desired effect. More points might be desired in low-density regions if the line cloud is assumed to contain information that the point cloud simply lacks; fewer added points might be wished for if scarcity in the point cloud signifies that the line cloud data actually is incorrect.

Points are added by looping through the line segments of the line cloud and generating points along each one. The number of points to generate and the associated point spacing are calculated based on the length of each line segment and a user-supplied initial value. The spacing fully determines the position of the new points on the line, but computation of the accompanying normal is more involved. This computation requires the use of neighbor information, and to this end, the statistical procedure Principal Component Analysis and the data structure k-d tree, detailed in sections 2.2.3 and 2.2.2, respectively, are utilized. The k-d tree allows for fast neighbor access, and PCA accurately approximates a value for the normal.
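A minimal sketch of the uniform variant of this sampling, assuming simple hand-rolled vector and segment types (the names are illustrative, not taken from the thesis code); the comments mark where the density-adaptive spacing and the PCA normal estimation of Section 2.2.3 would plug in:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Vec3 { double x, y, z; };
struct Segment { Vec3 a, b; };

// Uniformly sample points along every segment of a line cloud.
// 'spacing' is the user-supplied target distance between samples.
std::vector<Vec3> sampleLineCloud(const std::vector<Segment>& lines, double spacing) {
    std::vector<Vec3> samples;
    for (const Segment& s : lines) {
        const Vec3 d{s.b.x - s.a.x, s.b.y - s.a.y, s.b.z - s.a.z};
        const double len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
        // At least one interval, so short segments still contribute both endpoints.
        const int n = std::max(1, static_cast<int>(len / spacing));
        for (int i = 0; i <= n; ++i) {
            const double t = static_cast<double>(i) / n;  // parameter along the segment
            samples.push_back({s.a.x + t * d.x, s.a.y + t * d.y, s.a.z + t * d.z});
            // Dynamic spacing would query a k-d tree of the existing point cloud
            // here and stretch or shrink 'spacing' based on the local density;
            // normals for the new points are estimated afterwards with PCA.
        }
    }
    return samples;
}
```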

2.1.2 Point displacement

Another approach starts from the observation that a line cloud in many cases has more attractive qualities when it comes to building facades than the corresponding point cloud does. The line cloud is therefore designated as reference data, and the positions of the points are altered to better match the line segments. The attractiveness comes from the inherent consistency of certain object features in the reconstructed line model, for instance borders between different surfaces (such as facades and roofs). In comparison with point clouds, these are often significantly flatter. This quality is important for the overall appearance of man-made environments, and should thus be preserved as much as possible. By moving points closer to line segments in their proximity, the point cloud can be contracted around these features, in order to better represent them.

Since feature points often can be identified where no line segments are present, reconstructed point clouds often contain information on areas where line clouds are lacking. Due to this, points in such areas should not be altered when translating points toward line segments. To facilitate this, the distance to translate each point is set to depend on the distance to its closest line segment, according to Equation 2.1.

$$d_d = \max\!\left(0,\; d_l - d_l^3 \cdot \frac{1}{c^2}\right) \quad (2.1)$$

Here, $d_d$ is the displacement distance, $d_l$ is the distance to the line segment and $c$ is the cut-off distance, indicating at which distance from the line segment the value goes to zero. This function ensures that the displacement is smooth, resulting in no added irregularities or jagged edges between points, and is visualized in Figure 2.1.

Figure 2.1: Displacement distance function. $d_l$ is the distance to a line segment and $d_d$ is the computed displacement distance. Here, the cut-off value is set to 0.1. The value of the function never exceeds the distance to the line segment.
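Equation 2.1 translates directly into code; a sketch, with function and parameter names of my own choosing:

```cpp
#include <algorithm>

// Displacement distance of Equation 2.1. For dl = c the expression
// dl - dl^3/c^2 evaluates to zero, so points at or beyond the cut-off
// distance c are left untouched, while points very close to a line
// (dl << c) are moved nearly the full distance onto it.
double displacementDistance(double dl, double c) {
    return std::max(0.0, dl - (dl * dl * dl) / (c * c));
}
```

Each point would then be moved by this amount along the direction toward its nearest point on the closest segment, found as described in Section 2.2.2.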

2.1.3 Vertex displacement

Part of the reason for reconstructed models being undesirably smooth is the meshing step, which in the context of this work is performed by Poisson surface reconstruction. This method is designed to create very smooth surfaces that robustly approximate noisy data [10], which makes it less than ideal for reconstructing sharp edges such as those found on corners of houses or between roofs and facades. A simple way around this drawback is to manipulate the produced triangular mesh instead of modifying the underlying point cloud. Displacing vertices of a triangular mesh is very similar to displacing points of a point cloud, since these are essentially the same. The data is made up of points in space; whether they are triangulated or not is irrelevant to this method.

2.1.4 Combinations

The above methods can also be combined in a sequential manner, by applying at least one additional method after another one. The combinations explored in this way are: line sampling followed by point displacement, line sampling followed by both point displacement and vertex displacement, and point displacement followed by vertex displacement.

2.2 Implementation details

The project is implemented in C++, using Line3D++ [8][6] to generate line clouds and nanoflann [2] to speed up computations using k-d trees. In order to implement the presented approaches, the statistical procedure Principal Component Analysis (PCA) has also been implemented. The software used to generate camera parameters, dense point clouds and triangulated meshes is VisualSFM, PMVS and PoissonRecon, detailed below. Also outlined in this section is the code implementation developed during this work.

2.2.1 Line clouds using Line3D++

Line3D++ [8][6] is a C++ framework for generating line clouds from sequences of photographs, developed by Hofer et al. By detecting line segments in images and identifying matching correspondences across multiple images, a three-dimensional line cloud is constructed in a manner similar to that of point cloud generation.

2.2.2 Spatial decomposition using k-d trees

A k-d tree is a data structure that partitions data points in a k-dimensional space [1]. It does so by splitting the data along one dimension at each level of depth. The root node splits the data into two subsets along, for example, a variable x, storing in its left and right child nodes subtrees whose root nodes have smaller and larger x-values, respectively. On the next level, the data is split along, for example, a variable y, and so on. In 3D computer graphics applications, this technique can be used to facilitate fast (O(log n)) point neighbor and range searches. Figure 2.2 illustrates a k-d tree for two-dimensional data.

Figure 2.2: K-d tree for 2-dimensional data [16]. Each level alternately splits the data along the variables X and Y. The point (5, 4), for example, is placed in the left subtree of the root node since its X-value is smaller than that of (7, 2). The point (4, 7) is added to the right subtree of this newly added point since its X-value is smaller than that of the root node, and its Y-value is greater than that of (5, 4).
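The recursive core of such a construction is compact. A sketch for 3D points, choosing the median of the cycling split axis at each level (purely illustrative; the thesis uses the nanoflann library rather than a hand-rolled tree):

```cpp
#include <algorithm>
#include <array>
#include <memory>
#include <vector>

using Point = std::array<double, 3>;

struct KdNode {
    Point point{};                    // splitting point stored at this node
    int axis = 0;                     // 0 = x, 1 = y, 2 = z
    std::unique_ptr<KdNode> left, right;
};

// Build a k-d tree over pts[begin, end) by recursively splitting at the
// median of the axis that cycles with depth (x -> y -> z -> x -> ...).
std::unique_ptr<KdNode> build(std::vector<Point>& pts, int begin, int end, int depth) {
    if (begin >= end) return nullptr;
    const int axis = depth % 3;
    const int mid = begin + (end - begin) / 2;
    // Partially sort so the median for this axis lands at index 'mid'.
    std::nth_element(pts.begin() + begin, pts.begin() + mid, pts.begin() + end,
                     [axis](const Point& a, const Point& b) { return a[axis] < b[axis]; });
    auto node = std::make_unique<KdNode>();
    node->point = pts[mid];
    node->axis = axis;
    node->left  = build(pts, begin, mid, depth + 1);    // smaller values go left
    node->right = build(pts, mid + 1, end, depth + 1);  // larger values go right
    return node;
}
```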

The k-d tree data structure is useful for point data, but not suitable for storing line segments, since they are not restricted to a single point in space. This work utilizes both point and line segment data, and the selected data structure thus needs to accommodate line segments as well. Consequently, either the data structure needs to be adapted and implemented in such a way as to facilitate such use, or the line segment data itself needs to be represented differently. In this work, a combination of the two is used. Points are sampled on the line segments and then readily stored in k-d trees, essentially converting the line clouds to point clouds. These points are only approximate representations of the line segments, but sufficient for facilitating k-nearest neighbor search (knn). The points are recorded with an index, indicating which actual line segment they belong to, and when searching for the line segments closest to a knn query position, the distances to the actual line segments are still used. In this way, the knn search does not return the closest sampled point, but the actual line segment that is the closest. With a dense enough sampling, generating a sufficiently large set of points on the line segments, and with a proportionately large number of requested nearest neighbors (k), a high probability of finding the closest line segment is ensured.

A knn search on k-d trees containing line segment data requires calculating the distance from an arbitrary query point to any line segment. The line segments in question are identified by examining the closest sample points, but the actual distances to them are computed using the line segments themselves, as mentioned above. The distance to a line segment is calculated by first translating the query point and the line segment to the origin, by subtracting the position of the line segment start point. Using vector notation, this is expressed in Equation 2.2.

$$p' = p - p_{begin}, \qquad p_{end}' = p_{end} - p_{begin} \quad (2.2)$$

Here, $p$ is the vector from the origin to the query point ($\overrightarrow{OP}$), $p_{begin}$ and $p_{end}$ are vectors from the origin to the line start and end points, respectively ($\overrightarrow{OP_{begin}}$ and $\overrightarrow{OP_{end}}$), and $p'$ and $p_{end}'$ are vectors from the origin to the translated points ($\overrightarrow{OP'}$ and $\overrightarrow{OP_{end}'}$). Following this operation, the query point is projected onto the translated line segment and then translated back from the origin, using Equation 2.3.

$$p_{proj} = p_{end}' \cdot \frac{p' \bullet p_{end}'}{p_{end}' \bullet p_{end}'} + p_{begin} \quad (2.3)$$

$p_{proj}$ is the projected point, in vector notation ($\overrightarrow{OP_{proj}}$), and $\bullet$ denotes the scalar product between two vectors. Next, the projected point is restricted to the line segment by determining whether its position exceeds the limits of the line segment, according to Equation 2.4.

$$p_{proj} = \begin{cases} p_{begin}, & p' \bullet p_{end}' < 0 \\ p_{end}, & p' \bullet p_{end}' > \|p_{end}'\|^2 \end{cases} \quad (2.4)$$

This sets the closest point on the line segment to the closest end point if the projected point is not within the confines of the line segment. If it is, the projected point retains the value obtained in Equation 2.3. Again, $\bullet$ denotes the scalar product, and $\|\cdot\|$ denotes the 2-norm of a vector. Finally, the distance between the query point and the projected point is calculated, representing the distance to the line segment.
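Equations 2.2-2.4 amount to the familiar clamped projection onto a segment. A sketch with illustrative types (not the thesis code); note that clamping the projection parameter to [0, 1] is equivalent to the two cases of Equation 2.4:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { double x, y, z; };
struct Segment { Vec3 a, b; };  // a = p_begin, b = p_end

static Vec3 sub(const Vec3& u, const Vec3& v) { return {u.x - v.x, u.y - v.y, u.z - v.z}; }
static double dot(const Vec3& u, const Vec3& v) { return u.x * v.x + u.y * v.y + u.z * v.z; }

// Distance from query point p to a line segment, following Equations 2.2-2.4:
// translate so the segment starts at the origin, project, clamp, measure.
double pointSegmentDistance(const Vec3& p, const Segment& s) {
    const Vec3 pPrime = sub(p, s.a);    // Eq. 2.2: p'     = p     - p_begin
    const Vec3 ePrime = sub(s.b, s.a);  //          p_end' = p_end - p_begin
    // Projection parameter t = (p' . p_end') / (p_end' . p_end'); clamping it
    // to [0, 1] covers exactly the two end-point cases of Equation 2.4.
    const double t = std::clamp(dot(pPrime, ePrime) / dot(ePrime, ePrime), 0.0, 1.0);
    const Vec3 diff = {pPrime.x - t * ePrime.x,   // p - p_proj (the translation cancels)
                       pPrime.y - t * ePrime.y,
                       pPrime.z - t * ePrime.z};
    return std::sqrt(dot(diff, diff));
}
```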

2.2.3 Normal computation using PCA

Principal component analysis [14][9] is a statistical method that transforms observed data to bring out and emphasize its inherent variation. This is achieved by changing coordinate systems in such a way as to make the first coordinate axis represent the variable (or principal component) with the greatest variance within the data set. The second axis has the second greatest variance, and so on. These variables do not necessarily need to have any physical meaning; they can be combinations of multiple variables resulting in something that is meant for nothing but expressing variation. The variance is computed using a covariance matrix that is constructed from the data set. From this matrix, eigenvectors and eigenvalues are extracted, representing the principal component directions and their corresponding variances, respectively.

Although the variables extracted by PCA do not have to represent anything physical, in the context of 3D positional data they do; each extracted component is a direction vector. PCA can thus be used to calculate point cloud normals. The algorithm essentially fits a plane (an orthogonal regression plane) to the points and sets the normal of the query point to the normal of that plane. For a set of points representing a surface, the variance is the greatest parallel to the represented surface and the smallest orthogonal to it. Since PCA orders the variables by decreasing variance, the last component thus represents the normal.

The PCA algorithm, as used in this work, requires retrieval of a set of points in the environment surrounding the point for which a normal is to be computed. To this end, the k-d tree data structure described previously (Section 2.2.2) is utilized. The covariance matrix is constructed using these points and their centroid, according to Equation 2.5.

$$C = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T, \quad (2.5)$$

where $C$ is the 3×3 covariance matrix, $n$ is the number of points, $x_i$ is each point in the neighborhood and $\bar{x}$ is the centroid of all those points. The eigenvectors and corresponding eigenvalues are then extracted from this matrix, and the eigenvector corresponding to the smallest eigenvalue is selected as the point normal. This normal is ambiguous in terms of positive or negative direction, since the plane can be oriented either way. In order to ensure a correct direction, the normal is flipped if it points in the opposite direction to that of the closest neighbor point.
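A sketch of this computation, assuming the Eigen library for the eigendecomposition (the thesis text does not say which solver its PCA implementation uses); the neighborhood and the reference direction would come from the k-d tree query and the closest neighbor's normal, respectively:

```cpp
#include <vector>
#include <Eigen/Dense>

// Estimate a point normal via PCA over a neighborhood (Equation 2.5).
// 'neighbors' comes from a k-d tree knn query; 'reference' is a direction
// used to resolve the sign ambiguity (e.g. the closest neighbor's normal).
// Assumes at least two neighbors.
Eigen::Vector3d estimateNormal(const std::vector<Eigen::Vector3d>& neighbors,
                               const Eigen::Vector3d& reference) {
    // Centroid of the neighborhood (x-bar in Equation 2.5).
    Eigen::Vector3d centroid = Eigen::Vector3d::Zero();
    for (const auto& x : neighbors) centroid += x;
    centroid /= static_cast<double>(neighbors.size());

    // Covariance C = 1/(n-1) * sum_i (x_i - x-bar)(x_i - x-bar)^T.
    Eigen::Matrix3d C = Eigen::Matrix3d::Zero();
    for (const auto& x : neighbors) {
        const Eigen::Vector3d d = x - centroid;
        C += d * d.transpose();
    }
    C /= static_cast<double>(neighbors.size() - 1);

    // Eigenvalues of a self-adjoint matrix are returned in increasing order,
    // so column 0 holds the direction of least variance: the plane normal.
    Eigen::SelfAdjointEigenSolver<Eigen::Matrix3d> solver(C);
    Eigen::Vector3d normal = solver.eigenvectors().col(0);

    // Flip the normal if it points away from the reference direction.
    if (normal.dot(reference) < 0.0) normal = -normal;
    return normal;
}
```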

2.2.4 Third-party software

Third-party software used throughout this work has been chosen so as to adequately represent a specific, already existing 3D reconstruction pipeline, while also being free to use. The three applications used are VisualSFM, PMVS and PoissonRecon, which handle the structure from motion, multi-view stereo and Poisson surface reconstruction steps, respectively.

VisualSFM takes as input a series of images, in the case of this thesis work a sequence depicting a small house, captured using a quadcopter drone and a camera. This particular data set was chosen for its simplicity relative to its completeness; only 66 images were needed to cover almost the entire object. This makes the subsequent application of algorithms both faster and more easily analyzed. VisualSFM was used in its GUI mode, providing an efficient means of monitoring the reconstruction while in progress. It calculates camera positions and produces a sparse reconstruction, all viewable within the program.

PMVS (Patch-based multi-view stereo) was used to handle the second stage of the original 3D reconstruction, namely multi-view stereo. It takes the same set of images as input as the previous step, along with the generated camera parameters. It is also available as a plug-in to VisualSFM, making it convenient for use as a direct follow-up procedure within the same program. It produces a dense point cloud, which also can be viewed within the GUI.

PoissonRecon is the third and final software used as part of the 3D reconstruction toolchain. It was used as a command line tool, needing only a set of points as input. These points are the dense point cloud generated by PMVS in the previous step. The mesh reconstruction is customizable through a number of parameters to control, for instance, the level of detail and individual point weights.

In addition to the software used for the initial 3D reconstruction, MeshLab is another tool used during the project. It was used primarily to convert models between different formats, to quickly explore ideas and to produce approximate textures for generated models.

2.2.5 Code implementation

The implemented work is developed in C++ and consists of a number of smaller classes and structs responsible for smaller tasks, with a larger central class containing most of the overall functionality. This central class handles all initialization, import and export of models and user interaction, and controls the overall application of improvement algorithms, such as generation and displacement of points. The helper classes contain the specifications of the data and their related functionality. This includes algorithms for projecting a point onto a specific line of a line cloud, and algebraic operations such as dot product calculation and arithmetic operators. They also contain the data-specific implementation of k-d tree functionality, including indexing and computation of distances between data points.

There is also a main class that is the entry point of the program, handling command line input parsing and customized execution of the program. This makes the program wholly operational from the command line. To run the program, a flag indicating which improvement algorithm to perform is needed, as well as paths to an input model, a line cloud model and an output model. In addition to this, three parameters that change the results of the execution can be supplied: line sampling density, maximum distance from lines at which to move points, and line length weight when moving points. The program also indicates to the user whether it is correctly invoked, and communicates its progress during execution through simple console output.
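An invocation might therefore look something like the line below. Every flag name and value here is hypothetical; the text above describes only the shape of the interface (a method flag, three model paths, three tuning parameters), not its actual syntax:

```
./linecloud_tool --method line_sampling \
    --input dense_points.ply --lines line_cloud.ply --output improved.ply \
    --sampling-density 0.05 --max-distance 0.1 --length-weight 1.0
```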


Chapter 3

Results

This chapter presents results from the implemented approaches.

3.1 Implemented approaches

Results from the different methods developed for this work are presented below. Two example data sets depicting a small house are used to visualize the results. The point clouds were generated from a set of 66 images, processed using VisualSFM and PMVS. The first data set uses images scaled down to a size of 900x506 pixels (16:9 ratio), while the second is generated from the full-size (4000x2250 pixels) images. Figure 3.1 and Figure 3.2 display the original dense point clouds and their reconstructed meshes, respectively, to which subsequent images are compared. The low-resolution reconstruction is on the left side and the high-resolution reconstruction is on the right. The triangulated meshes were generated using PoissonRecon. The line clouds generated from the same data sets are presented in Figure 3.3.

Figure 3.1: Original point clouds. Generated using VisualSFM and PMVS (as a VisualSFM plug-in).


Figure 3.2: Meshes constructed from original point clouds. Generated using PoissonRecon.

Figure 3.3: Original line clouds. Generated using Line3D++, with camera parameters from VisualSFM.

3.1.1 Line sampling

Figure 3.4 and Figure 3.5 display the original point clouds with points added through line sampling, and the corresponding reconstructed meshes, respectively. Already at this stage, adding points along identified lines makes certain features appear in the point cloud that were practically imperceptible before. The improvements to the triangulated meshes are also apparent. Especially the low-resolution reconstruction has been refined in a number of areas, for instance along the roof ridge. Some artifacts can be observed in the meshes, however, where the surface is too tightly conformed to the lines, making it not align well with the rest of the mesh. An example of this is the two lines running the length of the large beam on the roof ridge. Here, the individual lines are clearly visible, and do not create a cohesive shape; instead they give rise to unattractive, jagged structures.


Figure 3.4: Point clouds after use of line sampling.

Figure 3.5: Meshes constructed after use of the line sampling method.

3.1.2 Point displacement

Figure 3.6 and Figure 3.7 display the original point clouds with points altered through point displacement, and the corresponding reconstructed meshes, respectively. The points can be observed to have moved toward the identified lines, condensing the cloud around characteristic features of the house, such as large beams and planks. The subsequent meshes are also enhanced, albeit not to the same extent. Nevertheless, the same areas have seen some important improvements here, bringing out much of the same features as the line sampling does, without introducing the same conspicuous artifacts.


Figure 3.6: Point clouds after use of the point displacement method.

Figure 3.7: Meshes constructed after use of the point displacement method.

3.1.3 Vertex displacement

Figure 3.8 displays the meshes with vertices translated toward line segments. Using this method, certain features of the house are very well reproduced. The beam along the roof ridge is particularly well preserved, with large areas that are essentially flat, much like the original, real-world object. This is, however, only true for the low-resolution reconstruction; the higher-resolution model is practically unaffected by the vertex displacement.


Figure 3.8: Meshes constructed after use of the vertex displacement method.

3.1.4 Combinations

Results from the investigated combinations of techniques are presented in the following sections.

Line sampling and point displacement

Figure 3.9 and Figure 3.10 display the point clouds with both added and displaced points, and the corresponding reconstructed meshes, respectively. Here, the reconstruction features both straighter lines and smoother areas compared to the original mesh.

Figure 3.9: Point clouds after use of both line sampling and point displacement methods.


Figure 3.10: Meshes constructed after use of the line sampling and point displacement methods.

Line sampling, point displacement and vertex displacement

Figure 3.11 displays the reconstructed meshes after application of all three methods: line sampling followed by point displacement and finally vertex displacement. The low-resolution mesh has had a few areas altered by this step, creating large, locally cohesive areas.

Figure 3.11: Meshes after use of line sampling, point displacement and vertex displacement.

Point displacement and vertex displacement

Figure 3.12 displays the reconstructed meshes after application of point and vertex displacement, introducing no new points to the mesh. This combination of methods yields some of the improvements mentioned for the preceding approaches, yet fails to introduce a number of distinct details that the line sampling previously did. An example of this is the vertical beams on the sides of the roof, which are not as clearly visible using this method.


Chapter 4

Discussion

The results show that the implemented algorithms do improve the visual quality of the reconstructed scenes, albeit to different extents depending on the density of the original model. The lower-density model shown in Chapter 3 can be observed to be efficiently refined by all approaches, while the full-resolution model is only marginally improved overall. The smaller model is originally poorly reconstructed compared to the larger one, but with application of the different procedures, the two models start to look more and more alike. Large features, such as the roof ridge and corners of the house, are particularly well refined in, for example, the left image in Figure 3.10. In the right image, however, these features are practically unchanged from the original mesh in Figure 3.2. Features of this second model that do improve with the implemented algorithms are smaller details, for example beams and railings on and below the pergola roof.

The reason the improvements are smaller for models reconstructed using higher-density point data is straightforward: line clouds encode much of the same information as point clouds do, only in a more compact form. This is especially true for man-made environments, where most objects feature many straight lines and texture-less areas, lending themselves well to line segment detection. Line segments are virtually resolution-free and thus equally dense no matter the resolution of the underlying images. Only the number of detections differs between resolutions. Using low-resolution images will consequently result in a smaller number of line segments, likely representing the most prominent features of the scene, but these can be used to generate a denser model through point sampling. This essentially serves the same purpose as an MVS step does, namely densification.

When the original point model is already dense, the implemented approaches improve mostly on specific areas that MVS struggles with. Such areas are, for example, wiry structures such as fences or antennas, where specific feature points are hard to find but line segments are easily detected. Comparing Figure 3.2 and Figure 3.4 makes this apparent. The beams and railings on the pergola (closest to the camera) are significantly better represented after use of the line sampling method.

When computation time is of little importance, a high-density MVS application is a suitable step towards accurately reconstructing a scene. In general, the line cloud approach employed in this work does not improve the reconstruction results significantly, and it is not able to produce models of the same quality on its own. What it does, however, is either provide improvements on specific areas, or produce approximate models faster. This makes the approach useful as either a post-production tool or perhaps as an aid for on-site technicians. During post-production it can be used to refine certain features of models, such as the mentioned fences or similar. As an on-site aid it could facilitate reliable assurance of sufficient camera coverage when capturing scenes, by quickly constructing accurate visualizations of scenes (if a sparse point cloud is not sufficient).

4.1 Future work

In addition to the mentioned potential applications of the implemented ideas, the combined employment of point-based and line-based 3D reconstruction shows further potential. An initial idea for this work was to not only improve the point clouds and meshed models, but also implement changes to the Poisson reconstruction step. This was deemed too involved and time-consuming, though, since changing an already existing algorithm is far more complex than implementing new procedures that solely alter data. This idea thus fell out of the scope of this thesis. However, the idea still holds promise and is something that could be investigated further in future studies. The Poisson reconstruction step could be made to account for line segments in some way, for example akin to the point and line cloud-based approach by Sugiura et al. [15], mentioned in Section 1.2, where they confine reconstructed elements (edges of tetrahedra faces) to identified line segments.

Another idea that was discarded due to time constraints is another way to sample the line segments. Currently, points are generated only directly on the line segments. This could be expanded upon by also generating points along their sides. The idea is that a detected line segment indicates that there are probably two distinct surfaces (most likely planes) meeting at this line segment, and that the line segment itself represents their intersection. Using this information could make it possible to more effectively force generation of sharper features. The position of the points generated on these supposed planes might be computed by determining the curvature and mean normal of neighboring points along a line segment.

An additional idea that was devised during this work but not implemented due to time constraints is to incorporate a certainty measure when using the line clouds. Since the 3D line segments are generated using 2-dimensional line segment detection procedures in images, a certainty measure for each line segment is undoubtedly already involved. This leads to the idea that this 2D measure should carry through to a 3D equivalent, which could be used when both sampling and displacing points in the implemented algorithms. Using Line3D++, however, such a measure was not found (during the rather limited investigation into this idea), leading this idea to be discarded in favor of other ideas. It should be possible to use this idea effectively in the future, though, either if the required information can be extracted after all, or if some other line cloud generation algorithm is used.

As an alternative to using individual certainty measures for each line segment, Line3D++ allows for setting certain parameters that could be used to achieve a similar effect. Most important is a maximum threshold for the spatial uncertainty allowed when clustering 2D line segments across multiple images, which can be defined when calling the program. An idea is to use this measure to generate multiple line clouds of the same scene, with different uncertainty values. This would separate line segments with different uncertainty into multiple partitions, giving line segments at least semi-individual certainty measures. These line clouds could then be used to alter the point clouds iteratively, from most to least certain, through either sampling or point displacement. This should achieve a form of certainty measure use, and it might be worth investigating further in the future.

A final idea that could be studied further in future work is a localized line-based reconstruction. Since the results indicate that dense point clouds are only marginally improved overall by the implemented methods, it might be beneficial in terms of computation time to use them only where they are the most useful. This could be achieved by performing localized line-based reconstruction on areas that have been identified as inadequately reconstructed by the traditional pipeline. The cameras that have coverage of such areas can be extracted from the camera file already generated during the SfM step, and a line cloud can be constructed from these, with which consecutive operations can be performed.


Chapter 5

Conclusion

In this thesis, a line-based approach to improving traditional image-based 3D reconstruction of man-made environments has been studied and successfully implemented. Previous work indicates that line clouds can be used as an alternative to dense point clouds in this context, but this work investigates their use alongside methods such as Multi-view Stereo (MVS), to determine if the accuracy of produced models can be improved by utilizing the strengths of both techniques. The results show that relatively low-density MVS results can receive significant improvements from the implemented algorithms, while denser models are only marginally enhanced overall. Such models can, however, contain local areas that are inadequately reconstructed, and that line segment data can represent much better. With this in mind, the proposed applications of the implemented methods are either as a fast substitute for MVS when only quick preliminary results are needed, or as a tool to improve certain areas that dense point clouds have trouble representing. Future work could extend the implementation to allow for more general improvements of dense reconstruction, and includes ideas such as inclusion of a certainty measure, plane sampling and changes to the surface extraction step.


Bibliography

[1] J. L. Bentley. Multidimensional binary search trees used for associative searching. Commun. ACM, 18(9):509–517, Sept. 1975.

[2] J. L. Blanco-Claraco. nanoflann. GitHub repository, https://github.com/jlblancoc/nanoflann. Accessed: 2017-03-28.

[3] B. L. Curless. New methods for surface reconstruction from range images. PhD thesis, Stanford University, 1997.

[4] Y. Furukawa, C. Hernández, et al. Multi-view stereo: A tutorial. Foundations and Trends® in Computer Graphics and Vision, 9(1-2):1–148, 2015.

[5] S. Herbort and C. Wöhler. An introduction to image-based 3d surface reconstruction and a survey of photometric stereo methods. 3D Research, 2(3):1–17, 2011.

[6] M. Hofer, M. Maurer, and H. Bischof. Line3D++. GitHub repository, https://github.com/manhofer/Line3Dpp. Accessed: 2017-03-28.

[7] M. Hofer, M. Maurer, and H. Bischof. Improving sparse 3d models for man-made environments using line-based 3d reconstruction. In 3D Vision (3DV), 2014 2nd International Conference on, volume 1, pages 535–542. IEEE, 2014.

[8] M. Hofer, M. Maurer, and H. Bischof. Efficient 3d scene abstraction using line segments. Computer Vision and Image Understanding, 157:167–178, 2017.

[9] H. Hotelling. Relations between two sets of variates. Biometrika, 28(3/4):321–377, 1936.

[10] M. Kazhdan and H. Hoppe. Screened poisson surface reconstruction. ACM Transactions on Graphics (TOG), 32(3):29, 2013.

[11] C. Kurz, T. Thormählen, and H.-P. Seidel. Visual fixation for 3d video stabilization. Journal of Virtual Reality and Broadcasting, 8(2):1–12, 2011.

[12] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[13] A. Nordmann. Epipolar geometry. Wikipedia, Creative Commons: CC BY-SA 3.0, https://en.wikipedia.org/wiki/Epipolar_geometry#/media/File:Epipolar_geometry.svg. Accessed: 2017-03-28.

[14] K. Pearson. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572, 1901.

[15] T. Sugiura, A. Torii, and M. Okutomi. 3d surface reconstruction from point-and-line cloud. In 3D Vision (3DV), 2015 International Conference on, pages 264–272. IEEE, 2015.

[16] Wikipedia. k-d tree. Wikimedia Commons: Public Domain, https://upload.wikimedia.org/wikipedia/commons/2/25/Tree_0001.svg. Accessed: 2017-03-28.
