CLASSIFICATION OF BRIDGES IN LASER POINT CLOUDS USING MACHINE LEARNING


School of Innovation Design and Engineering

Västerås, Sweden

Thesis for the Degree of Master of Science in Engineering - Robotics, 30.0 credits


Sunny Liao Nilsson

slo15003@student.mdh.se

Martin Norrbom

mnm16004@student.mdh.se

Examiner:

Miguel Léon Ortiz

Mälardalen University, Västerås, Sweden

Supervisor:

Ning Xiong

Mälardalen University, Västerås, Sweden

Company Supervisors:

Andreas Rönnberg

Lantmäteriet, Gävle

Anders Ekholm

Lantmäteriet, Gävle

22/06/2021


Abstract

In this work, machine learning was used for bridge detection in point clouds, and its performance was estimated by comparing it with an existing algorithm based on traditional methods for point classification. The purpose was to see how today's machine learning algorithms perform in bridge classification and to identify the challenges of using machine learning for point classification. The point clouds used are based on airborne laser scanning and represent the land area of Sweden. To get satisfactory results, several testing areas with varying landscapes were used. To compare the two algorithms, both statistical and visual analyses were made to identify their behaviours, strengths, and weaknesses. The machine learning algorithm tested was PointNet++, and it was compared to the algorithm that the Swedish mapping, cadastral and land registration authority currently uses for bridge classification in point clouds. Based on the results, the current method had higher accuracy in the classification of bridge points, but the machine learning approach could detect more bridges. Thus, it was concluded that there is potential in this machine learning approach, but improvements are still needed.


Acknowledgement

First, we thank Lantmäteriet for giving us the opportunity to do our thesis work with them. We would like to thank our supervisors from Lantmäteriet: Andreas Rönnberg, for supporting us from beginning to end, giving us feedback, and explaining concepts of geodata analysis; and Anders Ekholm, for advising us on how to improve our machine learning approaches and for setting up the workstation that improved the training process. We would also like to thank our supervisor from Mälardalen University, Ning Xiong, for support, professional guidance and advice during the whole thesis work. Finally, we would like to give special thanks to Samuel Vikström for giving us optimization suggestions.


Acronyms

LiDAR: light detection and ranging
ALS: airborne laser scanning
DSM: digital surface model
DEM: digital elevation model
DTM: digital terrain model
GPS: global positioning system
ML: machine learning
MLP: multi-layer perceptron
TP: true positive
TN: true negative
FP: false positive
FN: false negative
FPR: false positive rate
TPR: true positive rate
OA: overall accuracy
CNN: convolutional neural network
KNN: k-nearest neighbors
ROI: region of interest
SVM: support vector machine
RF: random forest
DGCNN: dynamic graph convolutional neural network
MSG: multi-scale grouping
MRG: multi-resolution grouping
PFP: point feature propagation
FPS: farthest point sampling
SA: set abstraction
FPL: feature propagation layer
FC: fully connected
FTP: file transfer protocol
DBSCAN: density-based spatial clustering of applications with noise
MSE: mean squared error


List of Figures

1 Forest laser scanning production status.
2 Illustration of different patterns that can be used in laser scanners.
3 Reflectance and wavelength for different materials from the work of M. Pfenningbauer and A. Ullrich [14]; the data is based on ASTER spectral library version 2.0 [22].
4 Illustration of the number of returned echoes and return number for a given laser pulse.
5 The architecture of PointNet. The classification network takes n points as input and outputs scores for k classes. The segmentation network outputs m scores for each point.
6 PointNet++ architecture for classification and segmentation.
7 Two methods to achieve robust feature learning under non-uniform sampling density for PointNet++.
8 Hierarchical convolution for regular grids and point cloud.
9 PointCNN architecture for classification and segmentation.
10 Clustering process of DBSCAN with minPts = 3.
11 Convex hull of a ten-point set.
12 A part of the provided data [13] in Karlstad with many bridges. The bridges are marked in red, and the intensity of each point is presented with a colour scale from blue (low) to yellow (high).
13 Confusion matrix for binary classes.
14 Working flowchart.
15 Flowchart for the application to automatically generate datasets.
16 Tile block generation types A and B. Orange areas are manually selected coordinates from the web application.
17 Placement of bridge in the tile block for method A.
18 Upsampling and downsampling at point and tile block level.
19 Flowchart for final training and testing with overlapping and filtering. Internal voting was used to vote for the points that were upsampled in the tile block. The second voting is applied to vote for the overlapped points between tile blocks.
20 Locations of training and testing areas for the final test. Orange marks represent the bridge locations of the training set, and blue marks represent the final testing areas.
21 Illustration of overlapping between tile blocks.
22 The learning curve in the final training. The curve converged after 450 training epochs.
23 Comparison of prediction result with and without filter.
24 The edges of watercourses are difficult to distinguish from bridges in the ML algorithm.
25 Industrial buildings with flat roofs (in the square) are difficult to distinguish from bridges in the ML algorithm.
26 The ML algorithm has a problem finding long tunnels (green ring) and identifying where the bridge starts and ends (purple ring).
27 Buildings with special shapes (in the square) and townhouses with flat roofs (in the ring) are difficult to distinguish from bridges in the ML algorithm.
28 Locations of predicted bridges in Karlstad with the ML algorithm.
29 In the ML algorithm, forest with tight crowns and leaves is difficult to distinguish from bridges.
30 Both the current and the ML algorithms have problems identifying narrow and short bridges.
31 Histogram of the number of bridges found by the algorithm.
32 Comparison in complex terrain at Trollhättan.
33 Comparison of two algorithms for prediction. The lime green area is the harbor (FP).
34 Comparison of two algorithms for prediction. The lime green area in the pink ring is the edge of a watercourse (FP).
35 The place found with wrongly labelled bridge points in the final test set.
36 Locations of predicted bridges in Hälsingborg.
37 Locations of predicted bridges in Lund.
38 Locations of predicted bridges in Norrköping.
39 Locations of predicted bridges in Nyköping.


List of Tables

1 The combinations of point features tested. A feature marked 1 is selected; 0 is not selected.
2 The parameter settings tested for the tile blocks. The first row indicates which side lengths were tested for each tile block. The middle rows specify the number of points tested in each tile block. The last row is the average number of points in the provided data for an area of that size.
3 The tested combinations of search radius for the first and the second grouping layers.
4 Information about the final dataset used for training and validation. Bridge blocks were fetched from the training areas described in figure 20.
5 Description of the colours in the point cloud plots.
6 The common parameters for PointNet++ used during training.
7 The extra parameters used during training to find the best feature combination.
8 Results based on the feature combination.
9 The extra parameters used during training to find the best tile block size and number of input points.
10 Results for tile block size and number of input points setting.
11 The extra parameters used during training to find the best search radius.
12 Results for search radius setting.
13 The features and parameters used during training for final result and overlapping.
14 Results for test dataset with overlapping. Note that FP, FN and TP are normalized, which means that TP + FN = 1 and TN + FP = 1.
15 Input parameters used in the filter.
16 Results for test dataset with filter. Note that FP, FN and TP are normalized, which means that TP + FN = 1 and TN + FP = 1.
17 Comparison of the current algorithm and the ML algorithm. Note that FP, FN and TP are normalized, which means that TP + FN = 1 and TN + FP = 1.
18 Result for feature selection in a test set that contained only tile blocks with bridges over each place.
19 Result for tile block size setting in a test set that contained only tile blocks with bridges over each place.
20 Result for search radius setting in a test set that contained only tile blocks with bridges over each place.
21 The names of the LAZ files used in the final test set. They are available for download at Lantmäteriet's home page.


1 Introduction

With the development of airborne light detection and ranging (LiDAR) technology, airborne laser scanning (ALS) has become widely used in basic surveying and mapping, three-dimensional modelling, forestry, and flood modelling and mapping [1]–[3]. The non-contact measurement of airborne LiDAR can quickly and effectively obtain three-dimensional coordinates of points and provide fast and accurate basic data for surveying and mapping [4]. It is difficult to obtain a digital surface model (DSM) of forest-covered areas with traditional digital photogrammetric topographic mapping, but the laser pulses of airborne LiDAR can penetrate the canopy, and the echoes of the signal can be used to generate a DSM of forests and other vegetation-covered areas [5]. In flood modelling and mapping, airborne LiDAR supports flooding analysis: it can help to visually display the extent of a flood, calculate the submerged area and water level, predict damage, and find effective rescue measures during flooding [3]. To create a flood model, a digital elevation model (DEM) is needed to extract hydrological characteristics from rivers and watersheds. The DEM must be adjusted to force drainage and connect to the real river network. The main methods of adjusting a DEM include filling sinks and cutting obstacles such as trees and bridges. Since bridges and bridge-like objects that lie directly over a river will block the river in a flood prediction simulation, detecting and classifying bridges is very important in many research works in this field [6]–[8].

The Swedish mapping, cadastral and land registration authority¹ started an ALS program in 2009 to develop a new national height model for the whole of Sweden. The ALS data has been used extensively in predicting and producing a nationwide forest attribute map of Sweden [9]. In the spring of 2018, a new LiDAR scan was planned for forest prediction, and 87% of Sweden's land area will have been scanned when it is finished. Classification of points on bridges is needed to update the national height model. D. Forsberg [10] developed an automated algorithm for bridge detection. This algorithm is combined with a manual review and is currently used to classify bridge points at the Swedish mapping, cadastral and land registration authority. It uses reference data from several sources, such as roads and railways. The current algorithm does not give a perfect result, and it requires reference data in order to avoid a high number of false positives (FP); a more accurate algorithm that does not rely on reference data is therefore desirable. As machine learning (ML) algorithms become more and more common in point cloud analysis, they have started to challenge the traditional methods [11][12]. Therefore, it is worth investigating whether an ML method can perform better.

This thesis aims to test the classification of bridges in ALS point clouds using existing ML algorithms, on behalf of the Swedish mapping, cadastral and land registration authority. The thesis is structured in eight sections. The problem formulation and the limitations of the thesis are given at the end of the current section. The background is presented in section 2. Ethical and societal considerations are briefly described in section 3. The research methodology is described in section 4. The technical method and implementation used to solve the addressed problems are covered in section 5. The results, evaluation and discussion are presented in section 6. Sections 7 and 8 contain the conclusions and future work, respectively.

1.1 Problem formulation

The point cloud data used for this work covers about one-third of Sweden, with areas from the south to the north. The land types and ecological environments vary, which makes the data heterogeneous. The point density of the dataset is between 1 and 2 points per square metre, which means that the amount of data is huge. There are many challenges in identifying bridges in such a dataset with an ML algorithm. The first challenge is creating a training set that covers as many different types of areas as possible. Since bridges represent a small fraction of the total land area, bridge points make up only a very small share of the total dataset. Another challenge is therefore how to train a model that handles such an imbalanced dataset and still achieves reasonable accuracy. From the problems described above, the following research questions are raised:

RQ 1 How do today's machine learning algorithms perform in ALS point cloud classification for bridges?

RQ 2 How can the accuracy of a learning algorithm for point classification of bridges be increased?

RQ 3 How can a classification model be built that takes imbalanced and large ALS data into account?

¹ https://www.lantmateriet.se/


1.2 Limitations

The biggest limitations during this work were time and hardware resources. Training ML models on point clouds often requires a large amount of graphics processing unit (GPU) memory. One workstation with sufficient hardware resources was used during the project. It took about four days to train a model with a training set that covers 34 km² of land area. The dataset would need to be increased or changed several times to achieve higher performance for the ML model. Due to the limited time available, this work focuses on finding the potential of using ML for point classification in ALS data rather than on obtaining an optimal model with high performance.


2 Background

This section covers the background and theory relevant to the thesis. It also contains details about the provided dataset, potential ML methods for classification in point clouds, evaluation methods for ML models, and related work.

2.1 Dataset

Figure 1: Forest laser scanning production status.

The ALS point cloud dataset used in this thesis is called Laserdata Skog [13] and is provided by the Swedish mapping, cadastral and land registration authority. The dataset is divided into index boxes of 2.5 × 2.5 km and stored as LAZ² files. Figure 1 shows the production status³ of Laserdata Skog. The data is classified at three levels. Level 1 has automated ground classification; levels 2 and 3 additionally have classified bridges, improved ground classification of dams, and separation between land and water. Since level 3 is an improvement of level 2, the production status only shows classification levels 1 and 3. Level 3 is the one used in this thesis.
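To give a feeling for the data volume per index box, the short calculation below combines the 2.5 km box side with the density of 1 to 2 points per square metre stated in the problem formulation. It is only a back-of-the-envelope sketch; the function name is chosen for this illustration.

```python
# Rough size estimate for one Laserdata Skog index box.
TILE_SIDE_M = 2_500        # 2.5 km side length, from the text
DENSITY_RANGE = (1, 2)     # points per square metre, from the text


def points_per_tile(side_m: float, density: float) -> float:
    """Expected number of points in a square tile with the given side length."""
    return side_m * side_m * density


low, high = (points_per_tile(TILE_SIDE_M, d) for d in DENSITY_RANGE)
print(f"{low / 1e6:.2f} to {high / 1e6:.2f} million points per index box")
```

Under these assumptions, a single index box holds roughly 6 to 13 million points, which is why the data has to be cut into smaller blocks before it is fed to a network.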

Laserdata Skog has been captured with a Leica ALS80-HP⁴ scanner since 2018, and this scanner is still in use. Since 2020, another laser scanner, the Leica TerrainMapper⁵, has also been used to scan the land areas. The difference between the two scanners is that the Leica ALS80-HP uses a seesaw pattern. This means that the scanning angle is 0° when the scanner points straight down under the aircraft and 20° when it points to the edge. The Leica TerrainMapper has a rotating scanner and a constant rotation angle. Figure 2 shows a simple illustration of the scanning patterns.

Figure 2: Illustration of the different patterns (rotating and seesaw, relative to the aircraft direction) that can be used in laser scanners.

² https://www.loc.gov/preservation/digital/formats/fdd/fdd000418.shtml

³ https://webgisportal.lantmateriet.se/portal/apps/webappviewer/index.html?id=36d1e8bd49694da289e2ec7f774f531c

⁴ https://leica-geosystems.com/-/media/files/leicageosystems/products/other/specifications/leica_als80_hp_productspec_en.ashx

The planned flight altitude is about 3000 m above the ground. The maximum scanning angle of the laser scanner is ±20°. The footprint of the laser beam is less than 0.75 m at the ground surface, depending on flying height. The point density is about 1-2 points per m², and the overlap between flight lines is at least 10%. The following features are provided in the Laserdata Skog point cloud:

1. [x,y,z] coordinates are collected for each point. From these coordinates, the three-dimensional structure of the measured object can be obtained directly. The three-dimensional information carries geographic information that can be used for data processing.

2. An intensity value is collected for each point. It is an index value between 1 and 256 that describes the return strength of the laser beam. The intensity varies with the composition of the surface that reflects the laser beam. Different ground objects and materials have different reflectivity, as shown in Figure 3. A higher value corresponds to higher reflectivity. The laser scanner used for the Laserdata Skog dataset has a wavelength of 1064 nm. The relative reflectance of asphalt, cement, soil, trees, grass, and snow differs at this wavelength [14]. Therefore, the intensity can be used as an extra feature to classify objects in point cloud data [15]–[19]. However, the intensity values may not be reliable. They can be affected by many factors, such as object surface composition, scan angle, roughness and orientation of the object surface, beam divergence, atmosphere, and moisture content [20][21].


Figure 3: Reflectance and Wavelength for different materials from the work of M. Pfenningbauer and A. Ullrich [14], the data is based on ASTER spectral library version 2.0 [22].

3. Number of returns is the total number of returned echoes for a given pulse. One transmitted pulse might return several echoes. This can occur where there are non-solid materials such as trees and vegetation. In the case shown in figure 4, the number of returns is two.

4. Return number indicates which echo (return) of the pulse the point represents, see figure 4.

Figure 4: Illustration of the number of returned echoes and return number for a given laser pulse.

5. Classification. The points are divided into the following classes:

• 1 Unclassified points.

• 2 Ground points.

• 7 Low points (noise). The points that lie under the ground.

• 9 Ground-classified points within water surfaces.

• 17 Points on bridges (only in classification level 3, see Figure 1).

• 18 High noise. The points that lie above the ground, vegetation, and buildings, for example points on clouds.

6. GPS time is the global positioning system (GPS) timestamp of the laser beam emitted from the aircraft, expressed in seconds of the GPS week⁶.
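The features listed above can be turned into a per-point feature matrix and a binary bridge label. The sketch below works on plain NumPy arrays so that it does not depend on any particular LAZ reader. The function name and the choice to drop the two noise classes are assumptions made for this illustration, not the procedure used in the thesis.

```python
import numpy as np

BRIDGE_CLASS = 17          # class code for points on bridges (from the list above)
NOISE_CLASSES = (7, 18)    # low points and high noise (from the list above)


def build_features_and_labels(xyz, intensity, return_number,
                              number_of_returns, classification):
    """Stack per-point attributes and derive binary bridge labels.

    Dropping the noise classes is an assumption made for this sketch.
    """
    classification = np.asarray(classification)
    keep = ~np.isin(classification, NOISE_CLASSES)
    features = np.column_stack([
        np.asarray(xyz, dtype=np.float64),
        np.asarray(intensity, dtype=np.float64),
        np.asarray(return_number, dtype=np.float64),
        np.asarray(number_of_returns, dtype=np.float64),
    ])[keep]
    labels = (classification[keep] == BRIDGE_CLASS).astype(np.int64)
    return features, labels


# Five made-up points: two bridge, one ground, one low noise, one unclassified.
xyz = np.array([[0, 0, 10.0], [1, 0, 10.5], [2, 0, 3.0], [3, 0, -50.0], [4, 0, 11.0]])
features, labels = build_features_and_labels(
    xyz,
    intensity=[120, 130, 40, 5, 125],
    return_number=[1, 1, 2, 1, 1],
    number_of_returns=[1, 1, 2, 1, 1],
    classification=[17, 17, 2, 7, 1],
)
print(features.shape, labels)
```

In the real data the share of label 1 is far smaller than in this toy example, which is the imbalance problem raised in the problem formulation.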

2.2 Current algorithm for bridge detection

Classification of bridges was investigated with an automated algorithm by D. Forsberg [10] in 2011. The algorithm is divided into four parts: reduce the input data, identify potential bridges, detect the bridges, and capture the points in the point cloud that belong to each bridge. Instead of interpolating the data into voxels, the raw points were used directly and divided into grids. Each grid contains the points within its boundary. Segmentation was used to remove objects, identify potential bridges, and group potential bridge points into bridge segments. A minimum spanning tree is the segmentation algorithm used for the removal of objects. The purpose is to remove objects without a connection to the ground, such as vegetation and roofs. Consecutive slope segmentation was used to find potential bridges. Two assumptions were made to identify bridges: ‘Bridges connect to the ground at a minimum of two locations’, and ‘Points located at the bridge sides are raised above the surrounding neighbourhood’. A convex hull was used to find the boundary of a bridge. If the boundary fulfils the assumptions, it is classified as a bridge. To classify each of the remaining potential bridge points, proximity segmentation was used.
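As an illustration of the gridding step only, the following sketch assigns raw points to square grid cells without interpolating them. The cell size and the function name are hypothetical, and the code does not reproduce Forsberg's implementation.

```python
from collections import defaultdict

import numpy as np


def bin_points_into_grid(xyz, cell_size):
    """Map each grid cell (column, row) to the indices of the raw points inside it."""
    xyz = np.asarray(xyz, dtype=np.float64)
    origin = xyz[:, :2].min(axis=0)                       # lower-left corner of the data
    cells = np.floor((xyz[:, :2] - origin) / cell_size).astype(np.int64)
    grid = defaultdict(list)
    for index, (col, row) in enumerate(cells):
        grid[(int(col), int(row))].append(index)
    return dict(grid)


points = np.array([[0.0, 0.0, 1.0], [0.5, 0.5, 2.0], [2.5, 0.2, 3.0], [4.9, 4.9, 4.0]])
grid = bin_points_into_grid(points, cell_size=2.0)
print(grid)
```

Each cell keeps indices into the original array, so the unmodified coordinates remain available to later segmentation steps.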

The algorithm had problems classifying terrain with steep slopes, roofs and buildings with a smooth connection to the ground, and dense multi-layered vegetation. To get better performance, reference data for the location of roads and railway networks were used to reduce the search area of the point clouds.

2.3 Machine learning algorithms

The point cloud has an irregular format. For machine learning algorithms that require input in a specific order, the points usually need to be converted into regular three-dimensional voxel grids [23][24] or 2D images [25] before they can be analysed. Some algorithms, however, can skip this step and work directly on the point cloud, such as PointNet [26], PointNet++ [27], and PointCNN [28].

2.3.1 PointNet

The classification of point clouds has improved significantly in both time complexity and accuracy since PointNet was released in 2016 [26]. The concept of PointNet is to skip as much pre-processing as possible and have the raw point cloud as input. Three main properties inspire the architecture of PointNet:

1. Point clouds are a set of points without a specific order. This means that the order of the input points must be irrelevant.

2. The points are not isolated. The model needs to be able to capture local structure from nearby points.

3. Rotation or translation of the object should not change the outcome.

Figure 5: The architecture of PointNet. The classification network takes n points as input and outputs scores for k classes. The segmentation network outputs m scores for each point.

Figure 5 represents the architecture of PointNet. First, pose normalization is performed with a transformation network (T-Net); then two multi-layer perceptron (MLP) layers lift each point to a higher dimension to build up features. After that, a feature transformation with regularization is applied to obtain the local features. Three further MLP layers followed by max-pooling produce the global feature.
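The reason a shared MLP followed by max-pooling satisfies the first property (order independence) can be checked numerically. The sketch below uses random, untrained weights and leaves out the T-Nets; the layer sizes loosely follow Figure 5, and all names are chosen for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_weights(sizes):
    """Random (untrained) weights and zero biases for a stack of dense layers."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def shared_mlp(points, weights):
    """Apply the same dense layers (with ReLU) to every point independently."""
    hidden = points
    for w, b in weights:
        hidden = np.maximum(hidden @ w + b, 0.0)
    return hidden


weights = make_weights([3, 64, 128, 1024])


def global_feature(points):
    """Max-pool the per-point features over the point axis (a symmetric function)."""
    return shared_mlp(points, weights).max(axis=0)


points = rng.normal(size=(256, 3))
shuffled = points[rng.permutation(len(points))]
print(np.allclose(global_feature(points), global_feature(shuffled)))
```

Because every point passes through identical weights and the maximum does not depend on the order of its arguments, shuffling the input leaves the global feature unchanged.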

In segmentation, the local and global features are combined and used as input to a segmentation network that classifies every single point. PointNet has proven to be robust to noise and missing points in the data. It has also inspired newer algorithms for point cloud classification, such as PointNet++ and PointCNN, both of which have an improved capability to learn from local structures. However, PointNet has its weaknesses. Because it does not satisfy the second property listed above, it cannot capture local structure and shows limited ability to identify fine-grained patterns and to generalize to complex scenes [27].

2.3.2 PointNet++

Figure 6: PointNet++ architecture for classification and segmentation.

PointNet++ is built upon PointNet. It introduced a hierarchical grouping of points to capture local structure in the point clouds. The architecture of PointNet++ can generally be divided into two main parts: hierarchical feature learning architecture and its application [27].

The hierarchical feature learning structure consists of two parts: point set feature learning, and robust feature learning under non-uniform sampling density. Point set feature learning consists of several set abstraction (SA) levels (Figure 6). They form the feature extraction module, which downsamples the point set. Each SA level is constructed from three layers: a sampling layer, a grouping layer and a PointNet layer. The sampling layer selects a set of input points that define the centroids of local regions by using farthest point sampling (FPS). FPS starts by randomly selecting a point and then repeatedly adds the point that is farthest away from the points selected so far, until the required number of points is reached. Compared with random sampling, this algorithm covers the entire sampling space better. The grouping layer constructs local regions by finding neighbouring points around the centroids. Ball query and k-nearest neighbors (KNN) are the methods used to group points. The principle of ball query is that, given a centre point, all points within a given radius are included, with K as the upper limit on the number of neighbouring points in the ball. Compared with KNN, which searches for the nearest K points, the ball query method has a local area with a fixed radius, which is more suitable for local feature recognition. The PointNet layer uses a mini-PointNet to encode local region patterns into feature vectors.
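The sampling and grouping layers can be sketched in a few lines of NumPy. This is a minimal, unoptimized illustration rather than the PointNet++ reference code. Keeping the closest points when a ball contains more than K points is an assumption of this sketch, and the function names are chosen here.

```python
import numpy as np


def farthest_point_sampling(points, n_samples, seed=0):
    """Return indices of n_samples points chosen by farthest point sampling."""
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(points)))]           # random starting point
    # Distance from every point to its nearest already selected point.
    min_dist = np.linalg.norm(points - points[selected[0]], axis=1)
    while len(selected) < n_samples:
        nxt = int(np.argmax(min_dist))                    # farthest from the selected set
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return np.array(selected)


def ball_query(points, centroid, radius, max_points):
    """Return indices of at most max_points points within radius of centroid."""
    points = np.asarray(points, dtype=np.float64)
    dist = np.linalg.norm(points - centroid, axis=1)
    inside = np.flatnonzero(dist <= radius)
    # Assumption: if the ball holds more than max_points, keep the closest ones.
    return inside[np.argsort(dist[inside])][:max_points]


points = np.random.default_rng(1).uniform(0, 100, size=(500, 3))
centroid_idx = farthest_point_sampling(points, 8)
neighbours = ball_query(points, points[centroid_idx[0]], radius=20.0, max_points=16)
print(centroid_idx, len(neighbours))
```

Unlike a KNN search, the ball query may return fewer than K points in sparse areas, so the spatial extent of a group stays fixed regardless of density.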

Figure 7: Two methods to achieve robust feature learning under non-uniform sampling density for PointNet++: (a) multi-scale grouping (MSG) and (b) multi-resolution grouping (MRG).

Point clouds commonly come with varying densities in different areas, which creates challenges for point feature learning. Features learned in dense areas might not be recognized in sparse areas, and vice versa. To solve this issue, patterns need to be searched at larger scales where the points are sparse. Two grouping methods are used to make the PointNet layer adaptable, so that it learns to combine features from regions of different scales according to the input sampling density. The first method is called multi-scale grouping (MSG), see figure 7a. For the same centroid point, if three different scales are used, three areas are drawn around the centroid, each with a different radius and number of points. The areas of different scales are sent to different PointNets for feature extraction, and the resulting features of the centroid are concatenated. In other words, MSG is equivalent to multiple hierarchical structures in parallel: each structure has the same number of centre points, but the range of the area differs, as do the input and output sizes of the PointNets. During training, input points are randomly dropped out to get varied density in the training data. MSG takes a large neighbourhood for each centroid point and keeps all available points at test time, which reduces the calculation speed. Therefore, a second method, multi-resolution grouping (MRG), was proposed, see figure 7b. This method concatenates two kinds of features for each region. The first is extracted directly from the raw points of the region with a single PointNet, and the second is obtained by summarizing the features of the subregions at the lower level. When the density of the local region is low, the second kind of feature might be less relevant than the first, and in that case the first kind is weighted higher.
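The MSG idea of one centroid, several radii, one small PointNet per radius, and a final concatenation can be sketched as follows. Each mini-PointNet is reduced here to a single random linear layer with ReLU and max-pooling; the radii, feature sizes, and function names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def mini_pointnet(local_points, weight):
    """One shared linear layer with ReLU, followed by max-pooling over the group."""
    return np.maximum(local_points @ weight, 0.0).max(axis=0)


def multi_scale_feature(points, centroid, radii, weights):
    """Concatenate the features extracted around one centroid at several radii."""
    dist = np.linalg.norm(points - centroid, axis=1)
    parts = []
    for radius, weight in zip(radii, weights):
        group = points[dist <= radius] - centroid         # local coordinates
        parts.append(mini_pointnet(group, weight))
    return np.concatenate(parts)


points = rng.uniform(0, 10, size=(200, 3))
radii = (1.0, 2.0, 4.0)                                   # hypothetical scales
weights = [rng.normal(size=(3, n_out)) for n_out in (16, 32, 64)]
feature = multi_scale_feature(points, points[0], radii, weights)
print(feature.shape)
```

The centroid is itself one of the points, so every ball contains at least one point and the max-pooling is always defined.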

Segmentation and classification are the two applications of the algorithm. Segmentation is the process of grouping the point cloud into multiple regions with similar attributes, and classification is the process of labelling these regions. In segmentation, point feature propagation (PFP) is applied to obtain features for the original points by propagating features from the subsampled points, that is, upsampling, see figure 6. Two methods are used to accomplish this: inverse distance weighted interpolation and across-level skip links. The inverse distance weighted average uses KNN to choose the three nearest points as interpolation sources, while the across-level skip links combine the interpolated features with the features of the corresponding SA level. The working principle of classification is similar to PointNet, see the classification layers in figure 6. Feature extraction is first performed on each point, and max-pooling is then used to get a 1 × C4 global feature. An MLP (or fully connected (FC) layer) is applied afterwards to get k scores.
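The interpolation part of feature propagation can be sketched as below: every target point receives a weighted average of the features of its three nearest source points, with weights proportional to the inverse squared distance. The skip-link concatenation is left out, and the function name and the small `eps` term are choices made for this sketch.

```python
import numpy as np


def interpolate_features(target_xyz, source_xyz, source_features, k=3, eps=1e-8):
    """Inverse distance weighted interpolation from source points to target points."""
    target_xyz = np.asarray(target_xyz, dtype=np.float64)
    source_xyz = np.asarray(source_xyz, dtype=np.float64)
    source_features = np.asarray(source_features, dtype=np.float64)
    # Pairwise distances, shape (n_target, n_source).
    dist = np.linalg.norm(target_xyz[:, None, :] - source_xyz[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]             # k nearest sources per target
    nearest_dist = np.take_along_axis(dist, nearest, axis=1)
    weights = 1.0 / (nearest_dist ** 2 + eps)             # eps avoids division by zero
    weights /= weights.sum(axis=1, keepdims=True)
    return (source_features[nearest] * weights[:, :, None]).sum(axis=1)


source_xyz = np.array([[0, 0, 0], [2, 0, 0], [0, 2, 0], [10, 10, 10]], dtype=float)
source_features = np.array([[1.0], [3.0], [5.0], [100.0]])
target_xyz = np.array([[0, 0, 0], [1, 1, 0]], dtype=float)
result = interpolate_features(target_xyz, source_xyz, source_features)
print(result)
```

A target that coincides with a source point essentially copies that point's feature, while a target that is equally far from its three neighbours receives their plain average.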

According to the results reported by its authors [27], PointNet++ achieved better accuracy than PointNet and performed better on point clouds with varying density.

2.3.3 PointCNN

PointCNN was first proposed in 2018 by Y. Li et al. [28] for feature learning in point clouds. The difference between PointCNN and PointNet++ is mainly manifested in two aspects. First, unlike PointNet++, PointCNN applies a 𝜒-transformation before the MLP: randomly sampled centroids and their k neighbouring points pass through the 𝜒-transformation block before the MLP is applied. This transforms the input into a more standardized form and also takes the relationship between points in the local region into account. Second, in the MLP stage, PointNet++ aggregates the lifted features with a symmetric function, whereas PointCNN weights and permutes the input features [29].

PointCNN is inspired by the convolutional neural network (CNN). The convolution operator in a CNN can exploit the spatially local correlation in the data, which is the key factor in the success of CNNs in different tasks [30]. Point clouds are unstructured and unordered, which makes them unsuitable for direct convolution. To allow the convolution operator to be applied directly to this type of data, the 𝜒-transformation was created and used in PointCNN.

Figure 8: Hierarchical convolution for regular grids and point cloud.

Figure 8 shows hierarchical convolution for regular grids (CNN) and point clouds (PointCNN). A CNN can effectively reduce the dimensionality of a large amount of data. As the regular grids in the figure show, the dimension is reduced from a 4 × 4 matrix to a 2 × 2 matrix. PointCNN has the same characteristic, but the reduction is based on unstructured points.
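One way in which convolution shrinks a regular grid from 4 × 4 to 2 × 2 is two successive 2 × 2 convolutions without padding. The kernel size and the averaging kernel below are assumptions made for this sketch; the figure itself may use a different configuration.

```python
import numpy as np


def conv2d_valid(grid, kernel):
    """Plain 2D convolution without padding (cross-correlation, as used in CNNs)."""
    kh, kw = kernel.shape
    out_h, out_w = grid.shape[0] - kh + 1, grid.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(grid[i:i + kh, j:j + kw] * kernel)
    return out


grid = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.full((2, 2), 0.25)        # averaging kernel, chosen only for illustration
step1 = conv2d_valid(grid, kernel)    # 4 x 4 -> 3 x 3
step2 = conv2d_valid(step1, kernel)   # 3 x 3 -> 2 x 2
print(step1.shape, step2.shape)
```

On a point cloud there is no such fixed neighbourhood layout, which is the gap the 𝜒-Conv operator is meant to close.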

Algorithm 1: 𝜒-Conv operator

Input: K, 𝑝, P, F

Output: F𝑝 ⊲ Features “projected”, or “aggregated”, to 𝑝

1: 𝑃′ ← 𝑃 − 𝑝 ⊲ Move P to the local coordinate system of 𝑝

2: 𝐹𝛿 ← 𝑀𝐿𝑃𝛿(𝑃′) ⊲ Individually lift each point into 𝐶𝛿 dim. space

3: 𝐹∗ ← [𝐹𝛿, 𝐹] ⊲ Concatenate 𝐹𝛿 and 𝐹, 𝐹∗ is a 𝐾 × (𝐶𝛿 + 𝐶₁) matrix

4: 𝜒 ← 𝑀𝐿𝑃(𝑃′) ⊲ Learn the 𝐾 × 𝐾 𝜒-transformation matrix

5: 𝐹𝜒 ← 𝜒 × 𝐹∗ ⊲ Weight and permute 𝐹∗ with the learnt 𝜒

6: 𝐹𝑝 ← Conv(K, 𝐹𝜒) ⊲ Finally, typical convolution between K and 𝐹𝜒

The 𝜒-Conv operator is given in algorithm 1. K is the convolution kernel, and a 𝐾-nearest-neighbour search is applied to find the 𝐾 neighbouring points. 𝑝 is the representative point in $\{p_{2,i}\}$. P is the set of neighbouring points, expressed as the matrix $P = (p_1, p_2, \ldots, p_K)^T$, and the unordered set $F = (f_1, f_2, \ldots, f_K)^T$ contains the features of the neighbouring points. F𝑝 contains the input features “projected”, or “aggregated”, onto point 𝑝. The algorithm can be summarized by the equation below:

$$F_p = \chi\text{-Conv}(K, p, P, F) = \mathrm{Conv}\big(K,\ \mathrm{MLP}(P - p) \times [\mathrm{MLP}_\delta(P - p),\ F]\big) \tag{1}$$

𝑀𝐿𝑃𝛿(·) is an MLP that is used for high-dimensional mapping to get a more abstract feature representation, similar to PointNet and PointNet++.

𝜒-Conv can easily be trained with the backpropagation algorithm. It can act on point cloud data with or without additional features in a unified form and can be used on both features and kernel. It makes the feature matrix of the neighbouring points (𝐹𝑝) independent of the order of the neighbouring points.
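The six steps of algorithm 1 can be traced in a small toy sketch. It is written under strong assumptions: fixed random linear maps (`w_delta`, `w_chi`, both hypothetical names) stand in for the learned MLPs, there are no non-linearities, and the kernel produces a single output channel. It is meant to show the data flow and the matrix shapes, not to reproduce PointCNN.

```python
import random


def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x k); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def x_conv(kernel, p, neighbours, feats, w_delta, w_chi):
    """Toy chi-Conv for one representative point p (single output channel).

    kernel:  K x (C_delta + C1) weights
    w_delta: dim x C_delta matrix standing in for MLP_delta
    w_chi:   (K * dim) x (K * K) matrix standing in for the MLP that predicts chi
    """
    k = len(neighbours)
    # 1: move the neighbours into the local coordinate system of p
    local = [[c - pc for c, pc in zip(q, p)] for q in neighbours]
    # 2: lift every point individually into C_delta dimensions
    f_delta = matmul(local, w_delta)
    # 3: concatenate lifted coordinates and input features -> K x (C_delta + C1)
    f_star = [fd + f for fd, f in zip(f_delta, feats)]
    # 4: predict the K x K chi matrix from all local coordinates at once
    flat = [[c for q in local for c in q]]
    chi_flat = matmul(flat, w_chi)[0]
    chi = [chi_flat[i * k:(i + 1) * k] for i in range(k)]
    # 5: weight and permute the features with chi
    f_chi = matmul(chi, f_star)
    # 6: convolution as the sum of element-wise products with the kernel
    return sum(kw * fv for krow, frow in zip(kernel, f_chi) for kw, fv in zip(krow, frow))


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


K, DIM, C1, C_DELTA = 3, 3, 2, 2
kernel = rand_matrix(K, C_DELTA + C1)
w_delta = rand_matrix(DIM, C_DELTA)
w_chi = rand_matrix(K * DIM, K * K)
feats = rand_matrix(K, C1)

p = [1.0, 2.0, 0.0]
neighbours = [[1.0, 3.0, 0.0], [2.0, 2.0, 1.0], [0.0, 2.0, 0.0]]
out = x_conv(kernel, p, neighbours, feats, w_delta, w_chi)

# Translating p and its neighbours together does not change the result,
# because step 1 only uses coordinates relative to p.
shift = [10.0, -5.0, 2.0]
p_shifted = [c + s for c, s in zip(p, shift)]
neighbours_shifted = [[c + s for c, s in zip(q, shift)] for q in neighbours]
out_shifted = x_conv(kernel, p_shifted, neighbours_shifted, feats, w_delta, w_chi)
assert abs(out - out_shifted) < 1e-9
```

The translation check at the end follows from step 1 alone. Invariance to the order of the neighbours depends on the learned 𝜒 matrix and is not demonstrated by this toy version.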

Figure 9: PointCNN architecture for classification (a and b) and segmentation (c).

PointCNN has two network structures for classification, see 𝑎 and 𝑏 in Figure 9. In structure 𝑎 the number of samples drops too fast, with the result that the 𝜒-Conv layers cannot be fully trained. To mitigate this problem, the authors propose structure 𝑏, which aims to control the depth of the network while keeping the growth rate of the receptive field, so that the model can “see larger and larger”. Network structure 𝑐 is used for segmentation.

2.4 DBSCAN

The density-based spatial clustering of applications with noise (DBSCAN) is a density-based clustering algorithm, and it was first proposed by Ester et al. in 1996 [31]. This algorithm can divide regions with sufficiently high density into clusters and find clusters of arbitrary shapes in a noisy spatial database. Usually, the samples of the same category are closely connected, and there must be samples of the same category not far from any sample of the category. DBSCAN is based on such a set of neighbourhoods to describe the density of the sample set.

The eps and minPts are two of the most important parameters of DBSCAN. The eps represents the search radius of the neighbourhood, and minPts represents the minimum number of points required to define a cluster. Based on these two parameters, the points can be divided into core points, border points, and outlier points (noise). A core point is a point for which the total number of points inside its eps neighbourhood is greater than or equal to minPts. A border point is a point that is reachable from a core point but has fewer than minPts points inside its eps neighbourhood. An outlier point is a point that is neither a core point nor a border point.

The principle of DBSCAN algorithm:

1. DBSCAN searches for clusters by checking the eps neighbourhood of each point in the data set. If the eps neighbourhood of a point P contains at least minPts points, a cluster is created with P as a core point.

2. Then, DBSCAN iteratively collects the points whose density is directly reachable from these core points. This process may merge some clusters with reachable density.

3. When no new points are added to any cluster, the process ends.

Figure 10 shows an example of the clustering process of DBSCAN with 𝑚𝑖𝑛𝑃𝑡𝑠 = 3. The red, green, and blue points correspond to the core points, border points, and outlier points, respectively.

Figure 10: Clustering process of DBSCAN with 𝑚𝑖𝑛𝑃𝑡𝑠 = 3.
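The three steps of the DBSCAN principle can be sketched in a few lines of plain Python. This is a minimal illustration for 2D points with a brute-force neighbourhood search. The function names and the example coordinates are made up, and a production implementation would normally use a spatial index instead.

```python
import math

NOISE = -1


def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx], including itself."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Label every 2D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may be relabelled later as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),   # dense group
          (10, 10), (10, 11), (11, 10),     # second dense group
          (50, 50)]                         # isolated point
labels = dbscan(points, eps=1.5, min_pts=3)
```

With these example values the first four points form one cluster, the next three form a second cluster, and the isolated point is labelled as noise.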

2.5 Convex hull

Convex hull⁷ is a concept in computational geometry (graphics). Given a finite point set in a plane, the convex hull of the set is the convex polygon with the smallest area that contains all points in the set. Figure 11 shows an example of a convex hull for a point set with ten points. The hexagon is the convex hull of the points. The blue points are hull points, and the black points are non-hull points.

Figure 11: Convex hull of a set of ten points (𝑝1 to 𝑝10).
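As an illustration, a convex hull of a planar point set can be computed with Andrew's monotone chain algorithm. The sketch below is one common way to do this and is not necessarily the method used in this work. The example points are made up.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Hull vertices in counter-clockwise order (collinear points are dropped)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # pop while the last turn is not a strict left turn
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(points)
```

Here the interior point (1, 1) and the point (1, 0) on the lower edge are not hull vertices, so the hull consists of the four corners of the square.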

2.6 Training

The learning process in ML is called training. In this process, the parameters of the algorithm are adjusted using training data, and the information is stored for future predictions. The model created by this process is called the trained model, and it can be used to make predictions on new data with an unknown target.

A good learning system requires a large amount of training data and an effective training process. The training data is essential and has a great impact on predictive performance. When the number of learning samples and the diversity of the training data increase, the learning ability of the ML algorithm increases [32]. It is also important to keep the data balanced in supervised learning, which trains on labelled samples. If the data is imbalanced and the algorithm uses the overall accuracy as its learning target, it will pay too much attention to the majority class, and the performance on the minority class will be reduced.

2.6.1 Imbalanced data

Figure 12: A part of the provided data [13] in Karlstad with many bridges. The bridges are marked in red, and the intensity of each point is presented with a colour scale from blue (low) to yellow (high).

Bridges represent a small fraction of the total land area, which makes the data set for the classification very imbalanced. Figure 12 is a point plot of an area in Karlstad that contains many bridges, but compared with the non-bridge points, the bridge points are still few and far between. This imbalance problem is more severe in non-urban areas in Sweden. An entire test area, such as a forest or farm field area, may not contain any bridge points at all. Training on such data may lead to under-fitting.

2.7 Evaluation

ML is a process of generalizing from data sets, in which a set of “general laws” is summarized from the input data. The model learned by the machine usually does not cover all situations, which leads to errors between the output predicted by the algorithm and the true output of the sample. Another problem is overfitting, which means that the model performs well on the training set but not on the testing set, because the training data cannot fully represent the real sample space. The generalization error cannot be obtained directly, and the empirical error is not suitable as a criterion because of overfitting. Therefore, a set of model evaluation methods is needed [33].

When dealing with classification problems, a good and straightforward indicator is the confusion matrix, see figure 13. This indicator gives a good overview of the behaviour of the model and is suitable for the evaluation of classification models [34][35]. Based on the confusion matrix, further indices for analysing the classification can be derived, such as overall accuracy (OA), precision, recall and Youden’s J statistic [36].

For general classification problems, OA is intuitive and effective under normal circumstances. For classification tasks with imbalanced data, however, OA does not represent the classifier’s performance well. The classifier might reach a high classification accuracy on the majority class and still perform poorly on the minority classes. Precision, recall and Youden’s index provide more reasonable evaluation criteria for this kind of imbalanced test data.

| | Prediction 0 | Prediction 1 |
|---|---|---|
| Actual 0 | True Negative (TN) | False Positive (FP) |
| Actual 1 | False Negative (FN) | True Positive (TP) |

Figure 13: Confusion matrix for binary classes.

1. Overall accuracy is the ratio between the number of samples correctly predicted by the trained model and the total number of samples in the test set. It is defined as follows:

$$OA = \frac{TP + TN}{TP + FP + TN + FN} \tag{2}$$

2. Precision and recall. Precision and recall are indices used to measure how accurately a machine learning model predicts the positive class. Precision measures how often the model is correct when it predicts the positive class, while recall measures how large a share of the actual positive samples the model predicts correctly.

Precision and recall are defined as:

$$Precision = \frac{TP}{TP + FP} \tag{3}$$

The value ranges between 0 and 1, where 0 means no precision and 1 means perfect precision.

$$Recall = \frac{TP}{TP + FN} \tag{4}$$

The value range of the recall is the same as for precision, where 0 means no recall and 1 means full recall. For a given class, different combinations of precision and recall give different interpretations of the model:

• High precision and high recall: the model can detect this class well.

• High precision and low recall: the model cannot predict this class well. But when it predicts this class, the prediction result is highly reliable.

• Low precision and high recall: the model can predict this class well, but the prediction results also include points of other classes.

• Low precision and low recall: the model cannot predict this class well.

3. Youden’s J statistic, also known as the correctness index, is a method of evaluating the validity of a screening test. Youden’s index can be used under the assumption that false negatives (FN) and false positives (FP) are equally harmful. The formula for Youden’s J statistic is specified below:

$$J = \frac{TP}{TP + FN} + \frac{TN}{TN + FP} - 1 \tag{5}$$

This index can be used to check the reliability of a classifier. The value ranges from −1 to 1, and from 0 to 1 for a classifier that performs no worse than chance. The larger the index, the better and more reliable the classifier. If it is close to 0, the classifier is not reliable even if it has a high overall accuracy.
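Equations (2) to (5) can be evaluated directly from the four counts of the confusion matrix. The sketch below uses made-up counts for an imbalanced test set to show how OA can be high while recall and Youden’s J are zero. The function name and the choice to return `None` for an undefined ratio are assumptions made for this example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Overall accuracy, precision, recall and Youden's J from confusion matrix counts."""
    total = tp + fp + fn + tn
    oa = (tp + tn) / total
    # A ratio is undefined when its denominator is zero; None is returned in that case.
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    if recall is None or specificity is None:
        j = None
    else:
        j = recall + specificity - 1
    return {"oa": oa, "precision": precision, "recall": recall, "j": j}


# Made-up example: 1000 points of which 10 are bridge points.
# A model that always predicts non-bridge still reaches an OA of 0.99.
always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)

# A model that finds 8 of the 10 bridge points with 2 false alarms.
useful_model = classification_metrics(tp=8, fp=2, fn=2, tn=988)
```

For the first model the precision is undefined, because no point is predicted as bridge, and both recall and J are 0 despite the high OA. For the second model precision and recall are both 0.8.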

2.8 Related works

The current state-of-the-art methods and algorithms are presented in this section. The commonly used approaches in point cloud classification include support vector machine (SVM), random forest (RF), and deep learning. The existing deep learning-based classifiers can generally be divided into two categories depending on the input data type. The first category requires the input point cloud data to be converted to another representation, such as a regular voxel structure, before it is sent to the classifier, for example, a CNN [23]. The second category learns features directly from the raw point cloud data. Representative algorithms are PointNet, PointNet++, and PointCNN.

The SVM approach requires few training samples and is usually used to solve high-dimensional classification problems [37]. J. Zhang et al. [38] used SVM to classify ALS point cloud data in urban areas. Thirteen features were selected for classification, and a total of five classes with three testing scenes for each class were evaluated. The results show an overall classification accuracy greater than 92.34%, which increased with point density. It was concluded that the method often fails to extract power lines, since they are similar to vegetation, and that the feature calculation was not time-efficient. C. Chen et al. [39] proposed an improved version, a one-versus-rest SVM classifier with a mixed kernel function of Gauss and polynomial kernels, to classify ground objects from ALS point cloud data. Three datasets, dataset I (8 classes), dataset II (5 classes), and dataset III (5 classes), were used. The mixed kernel function has a good local learning ability and a strong generalization ability. The average classification accuracies for datasets I, II, and III were 97.69%, 99.13%, and 96.20%, respectively. These results show that the proposed one-versus-rest SVM is more robust and practical than the standard SVM. However, the training speed is limited and affected by the number of classes in the dataset.

H. Kim and G. Sohn [24] classified power-line scenes from ALS data using RF. The test classification accuracy was 95.9% with 9 important features and 96.4% with 21 features. The features were extracted by voxel- and point-based methods. The training set and test set had 194,289 and 484,932 points, respectively. Both contained not only the classes of interest but also other classes, which were manually or semi-automatically classified. Segment sizes from two to seven metres were evaluated to compare the classification performance, and a segment size of 3 m was considered to give the best result in this case. Most misclassifications occurred in the pylon parts near the ground and vegetation, because of the short distance between them.

The advantage of using RF is also reflected in the classification of imbalanced data. F. Pirotti and F. Tonion compared RF and a multi-layer neural network in the TensorFlow framework for the classification of ALS point cloud data in urban scenes [11]. The data was divided into nine imbalanced classes, and 23 features were selected. The average F1 score based on a growing set of training data was 82.3% for RF and 45% for the neural network. The complexity of the hidden layer of the neural network caused the low F1 values in that work.

A 2D CNN with 21 layers achieved an OA of 86.4%, the highest among the approaches used for ALS point cloud classification in the work presented by E. Özdemir et al. [12]. Training took 45 minutes and classification less than 10 minutes, with a training set of 753,876 points and an evaluation set of 411,722 points. Geometric features and height features were used, and K-nearest points and radius search were applied in the feature extraction. Both search approaches are affected by changes in point density, which caused some data loss and in turn affected the final accuracy. Feature extraction played an important role in those classification approaches. They required the input point cloud data to be converted to other representations, which might introduce unstable factors that affect the classification accuracy.

M. Soilán et al. [40] applied PointNet to classify ground, vegetation and buildings in ALS point clouds. Zero centering and normalization were used to pre-process the [𝑥, 𝑦, 𝑧] coordinates. The feature vector used as input for the algorithm had the following parameters for each point: the raw [𝑥, 𝑦, 𝑧] coordinates, intensity, return number, the height of the point compared to the lowest point in a 3 m × 3 m neighbourhood, and the pre-processed coordinates [𝑥𝑛, 𝑦𝑛, 𝑧𝑛]. The tile block had a size of 25 × 25 m² and was up/downsampled to 4096 points per tile block. The results showed that buildings were difficult to distinguish from vegetation. The authors suggested testing PointNet++ in future work, since it takes multi-scale context into account. In a similar work by Y. Lin et al. [41], PointNet++ was used instead. They achieved better results in distinguishing buildings from vegetation on the same kind of data. The pre-processing of the input data differed in some respects: the size of the tile block was 50 × 50 m², and the [𝑥, 𝑦] coordinates were normalized but not the 𝑧 coordinate.

In the work of S. Briechle [42], PointNet++ was used to classify coniferous and deciduous trees. The ALS data had a resolution of 54 points per m², and the data was captured in June while the trees had leaves. The tile block had a size of 20 m × 20 m × 60 m with 8192 input points. The [𝑥, 𝑦, 𝑧] coordinates were zero centered, and the parameters of PointNet++ were tuned to appropriate values after some tests. The OA of this classification was 85%.

ML methods have been used for bridge classification in point clouds before. However, the following studies differ from this work in that their point clouds were obtained by laser scanning from the ground. H. Kim et al. [43] used ML to classify the abutment, slab, pier, girder, surface, and background parts of bridges. The algorithms evaluated in that work were PointNet, PointCNN, and the dynamic graph convolutional neural network (DGCNN). The results showed that DGCNN performed best with an overall accuracy of 94.5%, followed by PointNet (93.8%) and PointCNN (92.6%). PointCNN was, however, more stable with respect to parameter settings than the other algorithms. There were also some weaknesses in the process: only three bridges were tested, and all of them were included in the training, validation, and test sets. The authors concluded that the training should be improved in future research. In another similar work by H. Kim et al. [44], PointNet was used to classify the piers and decks of bridges. A total of seven bridges were scanned for their dataset, of which four were used for training and three for testing. The points in each tile block were downsampled to 2048 points. The OA was 96% with the best parameter settings. Both works were tested with similar tile block sizes and with an overlap between the tile blocks. The results showed that PointNet performed best with a high overlap (80% and more).

3 Ethical and societal considerations

Confidential buildings and places are removed or hidden in the provided data. The resolution of the data is too low to recognize a person or the number plates on vehicles. The data is also public, so anyone can access it. Therefore, the work itself raises no ethical concerns.

However, the usage of this work could be critical. One example is if this bridge detection algorithm were used in an application that requires high correctness, such as path planning for autonomous vehicles, where a truck could collide with a low bridge that was not detected. Another example is warfare, where this algorithm could be used to guide missiles toward bridges.

4 Research methodology

Flowchart steps: input data → manually select the datasets that contain the areas of interest → generate the tile blocks for the selected areas → training → testing → evaluation → result. Feedback from the evaluation: update training data, update features and parameters.

Figure 14: Working flowchart.

The research methodology of this thesis started with a literature review to understand the state-of-the-art methods. The research questions were then defined, and existing ML methods were applied to solve the problems. The main work was divided into three parts: data collection, model training and testing, and result evaluation. Numerical and visual analytical methods were used for the result evaluation. The whole methodology consists of the following steps:

1. Literature review and study to understand the current state-of-the-art ML algorithms for point cloud classification, and the issues of training models with imbalanced data.

2. Based on the findings of the literature review, identify the research questions of this thesis.

3. Define experimental methods to solve the research problems.

4. Implement the proposed methods. Figure 14 shows the working flowchart for the implementation. Input data were manually selected and then generated as tile blocks for training. Several temporary models were trained and evaluated to determine the necessary features and parameters. The final model was trained with a much more extensive and diverse dataset that contains the problem areas encountered during the previous feature and parameter selection training.

5. Analyse the result for each training. The final result was compared with the current algorithm. The results were evaluated through statistical learning methods in tabular form and visual analysis methods with images.

5 Technical method and implementation

Point-based algorithms such as PointNet, PointNet++ and PointCNN were planned to be used in this work, since these types of algorithms can skip the feature extraction step and use raw point data as input. They have been tested on similar data in many similar scenarios, as mentioned in section 2.8. Due to time limitations, however, only PointNet++ was tested. One of the main reasons for not choosing PointNet is that it has problems capturing local structure. PointNet++ and PointCNN have improved this capability, which makes them more suitable for this case. PointCNN was not used because of hardware resource limits: it requires much more compute capability than PointNet++. At the beginning of the project, the ML algorithms were set up on the computers provided by Mälardalen University, where PointCNN could not run efficiently due to the lack of GPU resources. Therefore, PointNet++ was prioritised.

Supervised ML was applied in this work. As mentioned previously, the dataset is too large to be used as training data in its entirety, and the number of input points to the algorithm is limited. The dataset was therefore selected manually and then divided into tile blocks of an appropriate size. Several temporary models were trained and evaluated to determine the necessary features and parameters. In the final testing, there is a risk that a bridge is not completely covered by a single generated tile block, which could make it more difficult to classify the bridge. To prevent this, overlapping and voting between the tile blocks were applied. Since a bridge has a size limit in its definition, a filter was created to remove predicted points that did not reach the limit.

This section presents the solution and the implementation of this work. Section 5.1 aims to develop an automatic method for data collection and tile block generation. This method can greatly simplify the data collection process, save collection time, and accurately generate the input data for the desired area. Sections 5.2 and 5.3 focus on the solution for bridge detection using the ML approach. The evaluation method is proposed in section 5.4.

5.1 Data collection and generation

Figure 15 (flowchart, steps recovered from the figure):

1. Mark areas with polygons at https://karthavet.havochvatten.se/visakoordinater/ and download the coordinates as .txt files.

2. Rename each .txt file with the name defined as: Koordinater(method)_(location).txt

3. MATLAB script: check if any file is missing. If yes, locate the missing file and download the LAZ file from the FTP server through the FTP client. If no, generate the tile blocks and export them to .h5 format.

4. Combine the .h5 files to create a training set and set the ratio between bridge and non-bridge.

Result: generated dataset for ML.

To simplify and speed up the collection of training data, an application was implemented, which can be seen in figure 15. The data sets for training and testing are automatically downloaded and converted into tile blocks with the size and format required as input by the ML algorithm. The areas intended for training and testing were first selected by marking polygons on the map⁸ provided by the Swedish Agency for Marine and Water Management⁹. The coordinates of the polygons are in SWEREF 99 TM format and are downloaded as text files.

Figure 16: Tile block generating type A (tile blocks generated for bridges in the selected index boxes) and type B (tile blocks generated for selected areas only). An index box measures 2.5 km × 2.5 km. Orange areas are manually selected coordinates from the web application.

Two methods were implemented to generate tile blocks based on the marked area, see figure 16. Method A collects all bridges in the selected index boxes and generates only tile blocks that contain bridge points. At least one bridge point needs to be placed within a certain range from the edge of the tile block, see figure 17. The reason for placing the bridges like this is that it might be easier for the ML algorithm to learn and find the relations between points from a larger part of a bridge than from just a few bridge points. Method B generates tile blocks within a marked area. Its purpose is to collect tile blocks over smaller areas to limit the number of tile blocks generated, which avoids using whole LAZ files as training data.
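As a rough illustration of method B, the sketch below lays out square tile blocks over the bounding box of a marked area. It assumes that the marked polygon has already been reduced to its bounding box, and the function name, the parameter names and the optional overlap fraction are hypothetical. It is not the MATLAB script used in this work.

```python
def tile_origins(x_min, y_min, x_max, y_max, side, overlap=0.0):
    """Lower-left corners of square tile blocks covering a bounding box.

    side:    tile block side length in metres
    overlap: fraction of the side length shared by neighbouring tile blocks (0 <= overlap < 1)
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = side * (1.0 - overlap)
    origins = []
    y = y_min
    while y < y_max:
        x = x_min
        while x < x_max:
            origins.append((x, y))
            x += stride
        y += stride
    return origins


# A made-up 100 m x 100 m area covered by 50 m tile blocks.
without_overlap = tile_origins(0.0, 0.0, 100.0, 100.0, side=50.0)
with_overlap = tile_origins(0.0, 0.0, 100.0, 100.0, side=50.0, overlap=0.5)
```

With a 50% overlap the stride is halved, so four times as many tile blocks are generated for the same area. Tile blocks at the upper and right borders may extend past the bounding box in this simple version.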

Figure 17: Placement of the bridge in the tile block for method A. Labels in the figure: a, 0.75a, first bridge point detected, tile block, bridge.

⁸ https://karthavet.havochvatten.se/visakoordinater/

Figure 18: Upsampling and downsampling at point and tile block level. (a) Up/downsampling of points in a tile block. (b) Up/downsampling of tile blocks.

All the tile blocks are preprocessed with zero centring for the [𝑥, 𝑦] coordinates. The 𝑧 coordinates are subtracted by the median of the 𝑧 coordinates of all points in each tile block.

The point density in the provided data varies. The point density in areas with much water is usually sparse because the water absorbs the signal from the laser scanner. At the same time, the overlap between the scanned areas causes a high point density. For these reasons, the number of points differs a lot between tile blocks. Therefore, up/downsampling is needed within each tile block, because PointNet++ requires a fixed number of points as input. The left side of figure 18 illustrates in which cases up/downsampling of points is needed. When there are too few points within a tile block, upsampling is performed by duplicating randomly chosen points. If there are too many points within a tile block, downsampling is performed by randomly removing points from the tile block.
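The pre-processing and resampling of a tile block can be sketched as follows. The sketch assumes that zero centring means subtracting the mean of the 𝑥 and 𝑦 coordinates, which is not stated explicitly in the text, and the function names are hypothetical. Indices are returned instead of points so that predictions for duplicated points can later be traced back to the original point.

```python
import random
import statistics


def normalise_tile(points):
    """Zero-centre x and y (assumed: mean subtraction) and subtract the median from z."""
    mean_x = statistics.mean(p[0] for p in points)
    mean_y = statistics.mean(p[1] for p in points)
    median_z = statistics.median(p[2] for p in points)
    return [(x - mean_x, y - mean_y, z - median_z) for x, y, z in points]


def resample_indices(n_points, n_target, rng):
    """Return n_target indices into a tile block that holds n_points points."""
    indices = list(range(n_points))
    if n_points >= n_target:
        # downsampling: keep a random subset of the points
        return rng.sample(indices, n_target)
    # upsampling: keep every point and add random duplicates
    return indices + rng.choices(indices, k=n_target - n_points)


rng = random.Random(42)
tile = [(10.0, 20.0, 5.0), (12.0, 22.0, 7.0), (14.0, 24.0, 100.0)]
normalised = normalise_tile(tile)

upsampled = resample_indices(n_points=3, n_target=8, rng=rng)
downsampled = resample_indices(n_points=100, n_target=8, rng=rng)
```

Using the median for 𝑧 keeps a single very high point, such as the third point in the example, from shifting the height reference of the whole tile block.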

If the generated dataset is imbalanced, downsampling of tile blocks can be used to balance it, see the right side of figure 18. The tile blocks are also shuffled randomly before they are saved as a data set in h5 files, which makes it easier to select files for training and validation.
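One possible way to downsample at tile block level is sketched below. It keeps all bridge tile blocks and draws a random subset of the non-bridge tile blocks so that they make up a chosen share of the result. The function name, the `non_bridge_share` parameter and the example data are assumptions for illustration only.

```python
import random


def balance_tile_blocks(bridge_blocks, non_bridge_blocks, non_bridge_share, rng):
    """Keep all bridge blocks and sample non-bridge blocks to the requested share.

    non_bridge_share: wanted fraction of non-bridge blocks in the result (0 < share < 1)
    """
    wanted = round(len(bridge_blocks) * non_bridge_share / (1.0 - non_bridge_share))
    wanted = min(wanted, len(non_bridge_blocks))  # cannot take more than exist
    selected = list(bridge_blocks) + rng.sample(list(non_bridge_blocks), wanted)
    rng.shuffle(selected)  # shuffle before the blocks are written to file
    return selected


rng = random.Random(0)
bridge = [("bridge", i) for i in range(100)]
non_bridge = [("non-bridge", i) for i in range(1000)]
dataset = balance_tile_blocks(bridge, non_bridge, non_bridge_share=0.5, rng=rng)
```

With the made-up numbers above, 100 of the 1000 non-bridge tile blocks are kept, which gives a balanced set of 200 tile blocks.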

5.2 Machine learning approaches

The input data plays an important role in improving the performance of the algorithm. In terms of features, zero-centred [𝑥, 𝑦, 𝑧] coordinates were used as the necessary features. The laser return number and intensity were used as additional features to see if the model’s predictive ability could be improved. Apart from the features, other parameters that have an impact on the algorithm need to be specified. To determine them, a smaller training and test set was used to find a satisfying combination. The point clouds were divided into tile blocks with a certain number of points to reduce the input size of the algorithm. The number of points per tile block corresponds to the average point density in the ALS data. From the related works in section 2.8, it has been concluded that the tile block size affects the accuracy. The size needs to be adjusted to work well with bridges. In addition to the tile block size, the search radius in the grouping layer was tested and identified.

5.2.1 PointNet++ architecture

The part segmentation network of PointNet++ was used to classify the points in the tile blocks. The architecture of the PointNet++ part segmentation network contains three different levels: SA, feature propagation layer (FPL) and FC. The details of how each level is defined are described here:

1. Notation of the architecture:

SA(𝐾, 𝑟, [𝑙₁, ..., 𝑙𝑑]): 𝐾 is the number of points sampled (𝐾 local regions). 𝑟 is the search radius of the ball query in the local region, given in metres. 𝑑 indicates that the PointNet has 𝑑 fully connected layers with width 𝑙𝑖 (𝑖 = 1, ..., 𝑑).

FPL(𝑙₁, ..., 𝑙𝑑, 𝐶): an FPL with 𝑑 FC layers, where 𝐶 is the number of categories.

FC(𝑙, 𝑑𝑝): an FC layer with width 𝑙 and dropout 𝑑𝑝.

2. Network architecture:

• Three layers were used in the SA level:

SA1(512, 4, [64, 64, 128]) −→ SA2(128, 8, [128, 128, 256]) −→ SA3([256, 512, 1024])

• Three layers were adopted in the FPL level:

FPL(256, 256) −→ FPL(256, 128) −→ FPL(128, 128, 128, 128, 2)

The FC layers in the FPL had a dropout of 50%.
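The notation above can also be written down as a small configuration structure, which is convenient when several radius or width combinations are tested. The class and field names below are hypothetical and do not come from the PointNet++ code base. The sketch further assumes that SA3, which is given without 𝐾 and 𝑟, groups all remaining points into one region.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SetAbstraction:
    n_centroids: Optional[int]  # K, None if all remaining points form one region (assumed for SA3)
    radius_m: Optional[float]   # r, ball query radius in metres
    mlp_widths: List[int]       # [l_1, ..., l_d]


@dataclass
class FeaturePropagation:
    mlp_widths: List[int]       # (l_1, ..., l_d)


SA_LAYERS = [
    SetAbstraction(512, 4.0, [64, 64, 128]),
    SetAbstraction(128, 8.0, [128, 128, 256]),
    SetAbstraction(None, None, [256, 512, 1024]),
]
FP_LAYERS = [
    FeaturePropagation([256, 256]),
    FeaturePropagation([256, 128]),
    FeaturePropagation([128, 128, 128, 128]),
]
NUM_CLASSES = 2  # C: bridge and non-bridge
DROPOUT = 0.5    # dropout of the FC layers in the FPL


def with_radii(layers, first_radius_m, second_radius_m):
    """Copy of the SA layers with other search radii for the first two layers."""
    radii = [first_radius_m, second_radius_m, None]
    return [SetAbstraction(layer.n_centroids, r, list(layer.mlp_widths))
            for layer, r in zip(layers, radii)]


wider = with_radii(SA_LAYERS, 8.0, 16.0)
```

The helper `with_radii` shows how another combination of search radii could be swapped in without touching the rest of the configuration.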

5.2.2 Feature selection and parameter settings

The point features tested were the [𝑥, 𝑦, 𝑧] coordinates, the intensity and the return number. It was assumed that the [𝑥, 𝑦, 𝑧] coordinates are needed to get reliable results. The feature combinations in table 1 were tested.

Point features

| [𝑥, 𝑦, 𝑧] | Intensity | Number of returns |
|---|---|---|
| 1 | 0 | 0 |
| 1 | 1 | 0 |
| 1 | 0 | 1 |
| 1 | 1 | 1 |

Table 1: The combinations of point features tested. A feature marked with 1 is selected, and a feature marked with 0 is not selected.

To find a good combination of the number of points and the tile block size, the set of sizes in table 2 was tested. The number of points for each tile block was chosen in proportion to the average point density.

| Side length (m) | 20 | 30 | 40 | 50 | 60 | 70 |
|---|---|---|---|---|---|---|
| Point number | 512 | 1024 | 2048 | 4096 | 4096 | 8192 |
| Point number | - | 512 | 1024 | 2048 | - | - |
| Average point number | 600 | 1350 | 2400 | 3750 | 5400 | 7350 |

Table 2: The parameter settings tested for the tile blocks. The first row indicates which side lengths were tested for each tile block. The middle rows specify the numbers of points tested in each tile block. The last row is the average number of points in the provided data for an area of that size.
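The last row of table 2 can be reproduced with a short calculation. The density of 1.5 points per m² used below is not stated in the text. It is inferred from the table itself (600 points on a 20 m × 20 m tile block), so it should be read as an assumption.

```python
DENSITY_PER_M2 = 1.5  # inferred from table 2: 600 points / (20 m * 20 m)


def average_points(side_length_m, density_per_m2=DENSITY_PER_M2):
    """Expected number of points in a square tile block with the given side length."""
    return density_per_m2 * side_length_m ** 2


side_lengths = [20, 30, 40, 50, 60, 70]
averages = [average_points(s) for s in side_lengths]
```

The computed values match the average point numbers in the table for all six side lengths.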

A total of five combinations of search radii were tested, see table 3. The range of the radii was chosen according to the point density.

| Radius combination | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| [1st layer, 2nd layer] in m | [4, 8] | [6, 12] | [8, 16] | [12, 24] | [16, 32] |

Table 3: The tested combinations of search radii for the first and the second grouping layers.

5.3 Training and testing

The feature and parameter selection was evaluated three times with different feature and parameter combinations. Overlapping and filtering were then applied in the final testing, see the flowchart in figure 19.

Flowchart steps: training set → training → ML algorithm. Testing set with 0%, 25% and 50% overlapping between the tile blocks → testing → ML algorithm → internal voting for upsampling in tile block → voting for overlapping → filtering → result.

Figure 19: Flowchart for final training and testing with overlapping and filtering. Internal voting was used to vote for the points that were upsampled in the tile block. The second voting is applied to vote for the overlapped points between tile blocks.
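Both voting steps can be expressed with the same majority vote over repeated predictions of a point. In the sketch below, the point identifiers and the rule that a tie falls back to non-bridge are assumptions, since the text does not specify how ties are resolved.

```python
from collections import Counter, defaultdict


def vote(predictions):
    """Majority vote per point.

    predictions: iterable of (point_id, label) pairs with label 0 (non-bridge) or 1 (bridge).
    A point occurs several times if it was duplicated by upsampling within a tile block
    or if it lies in the overlap between tile blocks.
    """
    votes = defaultdict(Counter)
    for point_id, label in predictions:
        votes[point_id][label] += 1
    # Assumption: a tie is resolved as non-bridge (0).
    return {point_id: 1 if counter[1] > counter[0] else 0
            for point_id, counter in votes.items()}


predictions = [
    (1, 1), (1, 1), (1, 0),  # point 1: two bridge votes, one non-bridge vote
    (2, 0), (2, 1),          # point 2: tie
    (3, 1),                  # point 3: predicted once
]
final_labels = vote(predictions)
```

In this example point 1 and point 3 are labelled as bridge, while the tie for point 2 is resolved as non-bridge.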

5.3.1 Program setup

To train PointNet++ efficiently, a high-performance GPU is necessary. Therefore, a workstation was borrowed from the Swedish mapping, cadastral and land registration authority for training and prediction. The workstation was located in Gävle and was used remotely from Västerås. The operating system was Ubuntu 18.04 in a Docker environment¹⁰. The software used was Python 3.7, CUDA 10.0, and TensorFlow 1.14. The workstation had an RTX 2080 Ti GPU with 11 GB of memory. Both the training and the testing of PointNet++ were performed on the workstation. The generation of data sets, the analysis and the filtering were performed on ordinary computers provided by Mälardalen University.

5.3.2 Training and testing set

The training and testing sets were generated three times. The first set was used to make sure that the generated data works with PointNet++. The second data set was used for parameter settings and feature selection. The third, larger data set was used for the final test. Each updated training set contains areas where the previous tests had problems. The final data set for training and validation can be seen in table 4.

Tile block labels Total distribution Type of area/Location Number of tile blocks Distribution per label

Non-bridge 52.90% Total 3711 100.00% Residential apartments 404 10.90% City center 386 10.40% Church yard 88 2.37% Agriculture fields 162 4.37% Forest 292 7.87% Industry/shopping 972 26.20% Rail 143 3.85% Residential houses 172 4.63%

Harbour and watercourse 1092 29.40%

Bridge 47.10% Total 3307 100.00% Falkenberg 88 2.66% Göteborg 1825 55.20% Karlskrona 151 4.57% Luleå 275 8.32% Malmö 848 25.60% Vänersborg 120 3.63%

Table 4: Information about the final data set that was used for training and validation. Bridge blocks were fetched from the training areas that are shown in figure 20.
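The percentages in table 4 follow directly from the tile block counts. The short Python sketch below recomputes them as a consistency check. The counts are copied from the table, while the variable names and the helper `shares` are only illustrative.

```python
# Tile block counts copied from table 4.
non_bridge = {
    "Residential apartments": 404, "City center": 386, "Church yard": 88,
    "Agriculture fields": 162, "Forest": 292, "Industry/shopping": 972,
    "Rail": 143, "Residential houses": 172, "Harbour and watercourse": 1092,
}
bridge = {
    "Falkenberg": 88, "Göteborg": 1825, "Karlskrona": 151,
    "Luleå": 275, "Malmö": 848, "Vänersborg": 120,
}


def shares(counts):
    """Return the share of each entry in percent of the label total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


total_non_bridge = sum(non_bridge.values())
total_bridge = sum(bridge.values())
total_blocks = total_non_bridge + total_bridge

# Share of each label among all tile blocks, in percent.
non_bridge_share = 100.0 * total_non_bridge / total_blocks
bridge_share = 100.0 * total_bridge / total_blocks
```

Rounded to one decimal, the two label shares give the 52.9 : 47.1 ratio between non-bridge and bridge tile blocks.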


The areas for the training and the testing sets were completely isolated from each other. Figure 20 shows the locations where the training and testing sets were collected. Note that the orange marks only represent the locations of the bridges in the training set; non-bridges were taken from other locations as well. The distribution of tile blocks with non-bridge and bridge points used in training is displayed in table 4. The ratio between tile blocks with non-bridge points and tile blocks with bridge points is 52.9 : 47.1. For non-bridge points, a total of nine types of areas were selected. The train-test split approach was used in training. The training set was divided into one training set and one validation set with an approximate ratio of 70 : 30 or 60 : 40. Throughout the training, the model with the highest validation score was saved.
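The split and the rule of keeping the best model can be sketched in Python as follows. This is a simplified illustration under assumptions, not the training code used in this work: `split_blocks` and `train_keep_best` are hypothetical helpers, the shuffle is a plain random shuffle, and the callbacks `train_epoch` and `validate` stand in for the actual PointNet++ training and validation steps.

```python
import random


def split_blocks(blocks, val_fraction=0.3, seed=0):
    """Shuffle tile blocks and split them into a training and a validation part."""
    shuffled = list(blocks)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def train_keep_best(train_epoch, validate, epochs):
    """Run training and remember the epoch with the highest validation score.

    train_epoch(epoch) trains for one epoch, validate(epoch) returns a score.
    Both are placeholders supplied by the caller.
    """
    best_epoch, best_score = None, float("-inf")
    for epoch in range(epochs):
        train_epoch(epoch)
        score = validate(epoch)
        if score > best_score:
            # In a real training loop the model weights would be saved here.
            best_epoch, best_score = epoch, score
    return best_epoch, best_score
```

Calling `split_blocks(blocks, val_fraction=0.4)` gives the 60 : 40 variant. Fixing the seed keeps the split reproducible between runs.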

In the parameter and feature selection, two kinds of test sets were used. To save time and get more varied bridges, one of the test sets contained only tile blocks with bridges from each location. The other test set was located in Karlstad, in a bridge-dense area of 12.5 km². The purpose of the test set in Karlstad was to represent a real-world scenario where bridges are underrepresented and there are many complex and varied objects.

The final test set is based only on real-world scenarios; for this, several LAZ files and cities were selected. The locations are marked in blue in figure 20. The selected LAZ files are listed in table 21 in the appendix.

Overlapping was also tested in the final test set to get more reliable predictions. Filtering was then applied to the best results after overlapping to remove predictions that cannot be a bridge, by checking the size, density and number of points. If a prediction has too few points, too small an area or too low a density of bridge points, it is set to non-bridge.
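The filtering rule can be illustrated with the Python sketch below. It assumes that the predicted bridge points have already been grouped into separate clusters and that the area of a cluster is measured as the area of its 2D convex hull. The function names and the three threshold values are hypothetical placeholders, not the values used in this work.

```python
def convex_hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def keep_as_bridge(points_xy, min_points=50, min_area=20.0, min_density=1.0):
    """Return False if the cluster should be set to non-bridge.

    points_xy: list of (x, y) tuples in metres for one predicted cluster.
    The thresholds (points, square metres, points per square metre) are
    placeholder assumptions.
    """
    n = len(points_xy)
    if n < min_points:
        return False  # too few points
    area = polygon_area(convex_hull(points_xy))
    if area <= 0.0 or area < min_area:
        return False  # too small area
    return n / area >= min_density  # False when the density is too low
```

A cluster that fails any of the three checks is relabelled as non-bridge, which mirrors the rule described above. The thresholds would have to be tuned against the point density of the laser data.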

Figure 20: Locations of training and testing areas for the final test. Orange marks represent the bridge locations of the training set, and blue marks represent the final testing areas.
