
Umeå University

Detect obstacles for forest machinery from laser scanned data.

July 31, 2020

Author: Simon Söderström
Supervisor: Fredrik Linde

A thesis submitted in fulfillment of a Master in engineering physics degree at Umeå University


UMEÅ UNIVERSITY
Faculty of Science and Technology

Department of Physics

Master in engineering physics

Detect obstacles for forest machinery from laser scanned data.
by Simon Söderström

Abstract


Acknowledgements


Contents

1 Introduction
    1.1 Data set
        1.1.1 What is LAS?
2 Theory
    2.1 Linear discriminant analysis
    2.2 Shoelace theorem
    2.3 Spatial point distribution of a point cloud
3 Method
    3.1 Data management
    3.2 Point cloud features
        3.2.1 Ground
        3.2.2 Trees
        3.2.3 Bushes
        3.2.4 Stones
        3.2.5 Uprooted trees
        3.2.6 Huts and houses
        3.2.7 Ground classification
    3.3 Object detection
        3.3.1 Areas of interest
    3.4 Feature measures
        3.4.1 Voxelization
        3.4.2 Smoothness
        3.4.3 Flatness
        3.4.4 Intensity
4 Results
    4.1 Ground classification
    4.2 Voxel size
    4.3 Feature measures and obstacle detection
        4.3.1 Smoothness
        4.3.2 Flatness
        4.3.3 Intensity
        4.3.4 Stone classification
    4.4 The program
5 Discussion and conclusion

1 Introduction

When it comes to utilizing the planet and its resources in a sustainable manner, how we manage the forests of our planet plays a big role. To mention a few countries, 57% of Sweden is covered by forest [1] and 40% of Canada is covered by forest [2], so it comes as no surprise that the forest environment is a natural resource of great richness. Throughout the years, humans have developed and manufactured highly technological machinery in order to harvest the resources of the forest more efficiently. These machines are, however, generally heavy and clunky to maneuver, especially under harsh and varying conditions, and maneuvering them is a task the machine operator carries out actively while working out in the woods. Since the forest is unpredictable and sight might be obstructed, the operator might have to take a detour because of obstacles, such as stones, blocking the path. If these obstacles were discovered in advance, the path could instead be planned so that the forestry machine operates more efficiently by minimizing the distance it has to traverse.

This study will attempt to develop an algorithm that can identify areas that are considered obstacles for a forestry machine. As far as obstacles go, the primary and most common issue is large stones blocking the path, but some other cases will also be considered briefly. Obstacles will be identified by examining LiDAR (Light Detection and Ranging) data of a forest area in the northern part of Sweden. The data is provided by SCA, Svenska Cellulosa AB, which as part of a project scanned the area using LiDAR technology. A LiDAR instrument builds up a detailed map of an area by firing rapid laser light pulses onto a surface; based on the time it takes each pulse to bounce back to the instrument, the distance to the surface can be found [3]. There are typically two types of LiDAR data: Aerial Laser Scan (ALS), where the scanning procedure is done airborne, and Terrestrial Laser Scan (TLS), where the instrument is located at ground level. The data available for this particular study is of the ALS type.

The first step is to break down the raw data into something useful. The raw data in this case consists of large, overlapping point clouds, and in order to get the densest possible data set in a manageable form, these files have to be cut and merged together. The second step is to identify the features that distinguish a stone from other regions in the point cloud. The third step is to utilize these features to extract suspected areas where stones are located. Primarily, stones with an area larger than 1 m² are what would be considered an obstacle. The fourth and final step consists of using a set of feature measures derived from the characteristics of a stone in a point cloud. These feature measures can then be used to train a classifier that identifies stones based on their values, and thus classifies the clusters as stones or non-stones. The clusters which the classifier considers to be stones are regarded as obstacles, and the coordinates of each such stone are the output.

In a previous study [5], the spatial point distribution of a point cloud was used to classify various aspects of forest environments. The first feature measure used in this particular study is based on the smoothness of a cluster, the second on the mean flatness in an area as defined by Olofsson and Holmgren [4], and the third is the mean intensity of the point cloud records in an area.

The method will be implemented in full as a python script, and a description of how this script works and how to use it will be given.

1.1 Data set

The data set provided by SCA comes, as mentioned, from an area in northern Sweden. It covers roughly 50000 acres and was scanned using a dual wave LiDAR scanner attached to a plane [6]. The point cloud density in this data set is around 20 points per square meter, and since the scanned areas overlap, the combined data set approaches 100 points per square meter.

Figure 1: Example of a raw data file. Note the point records high up in the air and below ground level, these would be regarded as noise. Here, the colors are simply coded by the elevation of the points.

1.1.1 What is LAS?

LAS files are a common way to store point cloud data, as they store a lot of different valuable data in a compressed manner [7]. To mention a few features of interest, a LAS file typically contains:

• A header block, consisting of:

  – X, Y, Z coordinates: scale factor, offset, min values and max values.
  – Number of point records.

• A point data record for each point, including among other things:

  – Intensity.
  – Classification.

Primarily, the list above consists of the components that are of interest for this project in particular. However, a LAS file contains other information as well.
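As an illustration of how these fields can be accessed programmatically, the sketch below reads a LAS file and prints the header components listed above. It assumes the laspy package (version 2.x) and a hypothetical file name; the text does not state which LAS library was used.

import laspy  # assumption: laspy 2.x is used for LAS input

las = laspy.read("tile.las")  # hypothetical file name

header = las.header
print("number of point records:", header.point_count)
print("scale factors:", header.scales, "offsets:", header.offsets)
print("min coordinates:", header.mins, "max coordinates:", header.maxs)

# per-point records of interest for this project
intensity = las.intensity
classification = las.classification
x, y, z = las.x, las.y, las.z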

2 Theory

In this section the necessary theory to understand the method will be explained.

2.1 Linear discriminant analysis

Linear discriminant analysis, LDA, is a technique used when one wants to classify an observation as one of K ≥ 2 classes based on a set of predictors. The LDA classifier works under the assumption that the predictors are drawn from a multivariate Gaussian distribution with a common covariance matrix and where the mean vector is class specific. The multivariate Gaussian distribution in turn assumes two things, namely:

1. Each separate predictor is normally distributed.
2. There is correlation between each pair of predictors.

Furthermore, the LDA classifier is based on the Bayes classifier. According to the Bayes classifier, a new observation x is classified as class k for which the expression

\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k \qquad (1)

is the largest. Here, \delta_k(x) is the discriminant score of an observation x for class k, \Sigma is the common covariance matrix, \mu_k is the mean vector of the predictors for class k, and \pi_k is the prior probability of an observation belonging to class k. The LDA classifier estimates the parameters of the Bayes classifier using the data at hand [8].

For two classes, the LDA classifier can be written as

(\vec{\mu}_1 - \vec{\mu}_2)^T \Sigma^{-1} \vec{x} = \frac{1}{2} (\vec{\mu}_1 + \vec{\mu}_2)^T \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_2) - \log\left(\frac{\pi_2}{\pi_1}\right), \qquad (2)

where \vec{\mu}_1 and \vec{\mu}_2 are the means of the observations of class 1 and 2 respectively, and \pi_1, \pi_2 are the proportions of class 1 and 2 respectively. If we define the LDA coefficients \vec{C} as

\vec{C} = (\vec{\mu}_1 - \vec{\mu}_2)^T \Sigma^{-1}, \qquad (3)

we can rewrite equation (2) as

\vec{C} \left( \vec{x} - \frac{1}{2} (\vec{\mu}_1 + \vec{\mu}_2) \right) = -\log\left(\frac{\pi_2}{\pi_1}\right). \qquad (4)


2.2 Shoelace theorem

As described in [9], the area A of a polygon with vertices (a_1, b_1), (a_2, b_2), \dots, (a_n, b_n) ordered clockwise is given by the shoelace theorem as

A = \frac{1}{2} \left| (a_1 b_2 + a_2 b_3 + \dots + a_n b_1) - (b_1 a_2 + b_2 a_3 + \dots + b_n a_1) \right|. \qquad (5)
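Equation (5) translates directly into a short routine; a sketch, assuming the vertices are given as an ordered list of (x, y) pairs:

def shoelace_area(vertices):
    # polygon area from equation (5); vertices ordered along the boundary
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - y1 * x2
    return abs(s) / 2.0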

2.3 Spatial point distribution of a point cloud

The eigenvalues describing the spatial distribution within a point cloud were found with an approach used by Lalonde et al. [5], in which one first calculates the covariance matrix Cov of the point cloud by the equation

\mathrm{Cov} = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^T, \qquad (6)

where N is the number of point records of the form X_i = (x_i, y_i, z_i)^T, and where \bar{X} is defined by

\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i. \qquad (7)

The eigenvalues are then found by decomposing Cov into its principal components.

3 Method

3.1 Data management

Since the raw data consists of multiple strokes of scanned forest and there is overlap among these strokes, it is necessary to merge and cut the raw data into something more useful and easier to handle. This was done by developing a python script which is in essence based on a divide and conquer approach, i.e., the problem at hand is broken down into sub-problems that are then solved. Due to the immense size of the files and the fact that the amount of available random access memory (RAM), i.e., the amount of data that can be loaded into the computer at once, is limited, the data had to be broken down into smaller batches. These batches were then handled separately, whilst saving the necessary information from each batch. A breakdown of the process can be seen below.

1. Find how large area the entirety of the data set makes up. This was done by first identifying the corners of each file and then identifying the outer corners. This defines the dimensions of the grid.

2. Split each file of raw data into manageable sizes, i.e. small enough to fit in the memory of the computer. This will yield a set of sub files for each raw data file.

3. For each sub file, assign every point record to the grid element it falls within, and write the points of each grid element to a text file.

4. For each grid element, go through the text files from every sub file, and create a new LAS file for each grid element.

Running this script across the entire data set results in a data set consisting of 100 × 100 m² tiles, where all point records in the area from all files are stored. Some files will also be rendered unusable because they happen to coincide with an edge. An illustration of how this works can be seen in Figure 2.

Figure 2: Illustration of the grid layout used to break down the data into manageable tiles. Here, two strokes (green, pink) of scanned forest lie on a grid of size (7, 4), with the overlap in orange. The points within each grid element make up one tile.

Once the new LAS files have been created, they have to be crudely filtered to remove some obvious noise, see Figure 1. This filter was rather naively made: every point with a negative elevation (Z coordinate) is removed, as well as every point with an elevation more than 40 m above the average of the ground file. The 40 m limit was chosen since no trees in the area seemed to exceed 40 m in height.

3.2 Point cloud features

In this section, a collection of different types of objects with distinct features that exist in the data set will be presented. For the example figures in the sections below, the color of each point simply corresponds to its height within the file the picture is taken from; the colors carry no meaning in themselves and are only for viewing purposes.

3.2.1 Ground

A large portion of the point records in the data are from ground hits. As mentioned previously these are classifiable using a ground classifying algorithm, more on that subject in a later section.

3.2.2 Trees


Figure 3: Example of dense forest environment displayed in the point cloud data.

Figure 4: Example of sparse forest environment displayed in the point cloud data.

Based on this, one can draw the conclusion that in some cases the objects that lie on the ground are not properly scanned and are thus not detectable by this approach alone. This is simply because, in a dense forest environment and with data of ALS nature, the ground and the objects which lie upon it are potentially not properly scanned.

3.2.3 Bushes

Bushes make up another large portion of the data. In point clouds, bushes appear as seemingly randomly distributed points in space, but still concentrated to an area. An example of a bush in this data can be seen in Figure 5.

Figure 5: Example of a bush in the point cloud data.

Another important thing to mention is that bushes lie as objects on the ground, i.e. in the area where most obstacles are located. This means that they will be point cloud structures to take into consideration when trying to identify obstacles, since bushes by themselves are not considered obstacles.

3.2.4 Stones


1. There are no point records directly underneath the stone; this creates the illusion of a hole in the ground within the point cloud, with an object that floats on top of that hole.

2. The point cloud is rather flat and more coherent, especially when compared to the point cloud of a bush.

3. The intensity measure for the point records of a stone are in general higher than other common objects in the point cloud. This is most likely due to the fact that stone materials have a higher reflectivity than biomass.

These three factors are primarily what will be utilized when constructing the program to identify the stones. An example of a stone in this data can be seen in Figure 6.


3.2.5 Uprooted trees

Timber and fallen trees are, obviously, very similar in appearance. The appearance of an uprooted tree is also very similar to that of a stone, since the dirt and roots that the stem rips up also create the illusion of a hole in the ground, similar to factor 1 in section 3.2.4. However, the intensity measures for the points in an uprooted tree are hopefully different enough from those of stones to distinguish between the two. An example of an uprooted tree in this data can be seen in Figure 7.

Figure 7: Example of an uprooted tree in the point cloud data.

3.2.6 Huts and houses


Figure 8: Example of a small house and a woodshed in the point cloud data.

3.2.7 Ground classification

To classify the ground points in the data, a cloth simulation filter was used. In short, the idea is that one first turns the point cloud upside-down and then drops a piece of cloth from above, which falls by gravity down towards the ground [10]. The algorithm was used as implemented in R via the package lidR [11]. However, some parameters have to be adjusted in order to generate a ground model which does not capture indentations such as stones. The adjustable parameters of interest here are rigidness and resolution. The resolution is the distance between the particles that make up the piece of cloth, and the rigidness corresponds to how rigidly the cloth behaves. Normally, one picks a resolution based on the point cloud density, but for this case it is beneficial to use a lower resolution and higher rigidness than typically recommended, since it is of interest to avoid having the cloth creep up into small objects such as stones. In order to find good values for these parameters, a few were tested on an area with a stone, and the results were interpreted by observation.

3.3 Object detection

The central part of this study is to identify objects of interest within the point cloud. Primarily, the type of obstacle that is most represented in the data set is stones located on the ground. Throughout this section, each step of how regions with potential obstacles are detected will be thoroughly explained.

3.3.1 Areas of interest


The holes in the ground, found by projecting the points classified as ground onto a 2D plane and inverting the resulting image (see Figures 9 and 10), were clustered using density-based spatial clustering of applications with noise, or DBSCAN. DBSCAN is a density-based clustering algorithm which, as opposed to partitional clustering algorithms, requires no prior information about the number of clusters within the data [12], information which in this case is lacking. DBSCAN is also a powerful choice for this application as it captures clusters of arbitrary shapes, which natural objects tend to have. The algorithm was used as implemented in the python package sklearn [13]. An example of this step can be seen in Figure 10, in which the ground points of the LAS file in Figure 9 were inverted and clustered.

Figure 9: Projection of points classified as ground in a point cloud onto a 2D plane.

Figure 10: Clusters of holes in the ground found by inverting the image in Figure 9, and then using DBSCAN to cluster the pixels.
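A minimal sketch of this clustering step using the sklearn implementation mentioned above; the eps and min_samples values, as well as the input file name, are illustrative assumptions, since the text does not state which parameters were used.

import numpy as np
from sklearn.cluster import DBSCAN

# planar coordinates of the hole pixels (or of whatever points are to be clustered)
hole_xy = np.loadtxt("hole_pixels_xy.txt")  # hypothetical file name

# eps and min_samples are illustrative values and have to be tuned to the point density
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(hole_xy)

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)  # label -1 marks noise
print("found", n_clusters, "clusters")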

The next step is to detect points that are located above these holes in the ground. This was done by first cutting out a slice of the point cloud just above the digital terrain model, using the software LAStools [14]. The thickness of this slice was picked large enough to, with certainty, capture most of the stone along the ground. Essentially, this step yields a point cloud with mostly bushes and stones, as well as tree stems. Note that this point cloud will not have any points classified as ground points. This point cloud was then clustered with the same algorithm that was used for identifying holes in the ground points, namely DBSCAN. Again, this is a powerful choice for the same reasons, i.e. that naturally shaped objects are seemingly arbitrary in shape, together with the fact that it is a density-based clustering algorithm which requires no prior knowledge of the number of clusters.

Figure 11: Slice of thickness 3 m above ground level of a las file.

Figure 12: Clusters of points in a slice above the ground level, as seen in Figure 11.

First of all, the clusters which are smaller than 1 m² are filtered out. The clusters that remain, i.e. those larger than 1 m², are then compared with the holes found in the ground, as seen in Figure 10. This is done in order to identify clusters for which no point records underneath have been recorded. This step greatly narrows down which clusters of point records within the point cloud are probable candidates for being obstacles on the ground. These clusters are found by searching for a hole within a circle around the center of a cluster, where the radius is based on the spatial area of the cluster as seen from above. The clusters that remain from Figure 12 after this step can be seen in Figure 13.

Figure 13: What is left after comparing the clusters found in the 3D point cloud (Figure 12) with the holes in the ground points (Figure 10).

Once the steps above have been performed, one ends up with a set of point cloud clusters that are likely to be obstacles. However, further tests are required to distinguish between stones and other objects, such as uprooted trees and bushes, which happened to not get filtered out. For this, a set of feature measures was used with the purpose of identifying the characteristics of a stone.

3.4 Feature measures

In this section each feature measure will be discussed and motivated. The feature measures smoothness and flatness are calculated for voxelized point clouds, where a voxel is essentially a three-dimensional pixel. Here, the voxel sizes have to be predefined. The voxel size that would, in theory, yield the best results is decided based on the test described in the coming section.

As for the feature measures, each of them will be evaluated for clusters which are classified as either stone or non-stone, out of which a classifier model will be constructed to distinguish between the two. The classifier will be based on linear discriminant analysis, see section 2.1. In order to accommodate training data for the LDA classifier, a sample of randomly selected 100 × 100 m² tiles out of the entire data set was used.


3.4.1 Voxelization

In order to perform analysis on smaller areas within a point cloud, one approach is to voxelize the data. For this, a program that breaks down a point cloud into voxels of fixed size xy_{dim} \times xy_{dim} \times z_{dim} was developed. These dimensions are easily adjusted depending on the task. The algorithm is very generic: it checks the coordinates of each point and calculates which \{x, y, z\} voxel position it corresponds to by

x_{voxel} = \mathrm{ceiling}\left(\frac{\mathrm{point}_x}{xy_{dim}}\right) - 1, \quad
y_{voxel} = \mathrm{ceiling}\left(\frac{\mathrm{point}_y}{xy_{dim}}\right) - 1, \quad
z_{voxel} = \mathrm{ceiling}\left(\frac{\mathrm{point}_z}{z_{dim}}\right) - 1.

Once the data is represented as voxels, one can perform analysis on smaller areas separately, as well as look at smaller features within the data.
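A short numpy sketch of the voxel index computation above; it assumes the coordinates have already been shifted so that the lower corner of the cluster is at the origin.

import numpy as np

def voxel_indices(points, xy_dim, z_dim):
    # map each (x, y, z) point to its voxel index with the ceiling rule above
    dims = np.array([xy_dim, xy_dim, z_dim])
    idx = np.ceil(points / dims).astype(int) - 1
    return np.maximum(idx, 0)  # points exactly on the lower edge fall in voxel 0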

To optimize the difference in feature measures between stones and bushes or other hard-to-distinguish clusters, a set of simulated data was generated corresponding to a number of cases. Cases 1 and 2 are set up to mimic the appearance of bushes and other types of randomly scattered points. Cases 3 through 6 are set up to mimic the typical shapes of stones and other flat objects. A description of each case can be seen in the list below, where U(a, b) denotes a uniformly distributed variable between a and b. The number of points in each case is set up to mimic the typical density of the data set used in the study. A sample of each of these cases can be seen in Figures 14 through 19.

• Case 1: Randomly scattered points confined within a box with sides l_x, l_y, l_z ∼ 2 + U(0, 1), drawn independently in each direction.

• Case 2: Randomly scattered points confined within a square pyramid with sides l_x, l_y ∼ U(2, 3) and height l_z ∼ U(2, 3).

• Case 3: Top half of scattered points that lie on a sphere of radius r ∼ U(1, 4).

• Case 4: Top half of scattered points that lie on a sphere of radii r_x, r_y, r_z ∼ 1 + U(0, 2), drawn independently in each direction.

• Case 5: Randomly scattered points which lie on a sphere cap of radii r_x, r_y, r_z ∼ 1 + U(0, 2), together with randomly scattered points which lie on a cylinder with radius corresponding to the radius of the sphere cap.

• Case 6: Randomly scattered points within a box of size l_x, l_y ∼ U(1, 3) and a comparatively small height l_z, forming a flat slab.


Figure 14: Example of case 1: Randomly scattered points within a box, which mimic the typical look of a bush.

Figure 15: Example of case 2: Randomly scattered points confined in a pyramid shape to mimic the typical look of a bush.

Figure 16: Example of case 3: Points randomly scattered on a half-sphere with equal radii to mimic a variant of a stone.

Figure 17: Example of case 4: Points randomly scattered on a half-sphere with varying radii to mimic a variant of a stone.

Figure 18: Example of case 5: Points randomly scattered on a sphere cap with varying radii, together with points randomly scattered on a cylinder adjacent to the sphere cap, to mimic a variant of a stone.

Figure 19: Example of case 6: Randomly scattered points confined in a flat slab to mimic a variant of a stone.

Each of these cases was simulated 500 times, and the feature measures were calculated using voxels of sizes xy_{dim} ∈ {0.1, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50} and z_{dim} large enough to cover the entire point cloud. Afterwards, the voxel size which yielded the greatest difference between the feature measure values of the stone clusters and the bush clusters was found and used in the final model.
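As an illustration of how such clusters can be generated, the sketch below draws one sample each of case 1 (a bush-like box) and case 3 (a stone-like half-sphere). The number of points per cluster is an assumption, since the text only states that it mimics the density of the data set.

import numpy as np

rng = np.random.default_rng(0)

def simulate_case1(n_points=200):
    # case 1: points scattered uniformly in a box with sides ~ 2 + U(0, 1)
    sides = 2 + rng.uniform(0, 1, size=3)
    return rng.uniform(0, 1, size=(n_points, 3)) * sides

def simulate_case3(n_points=200):
    # case 3: points scattered on the top half of a sphere with radius ~ U(1, 4)
    r = rng.uniform(1, 4)
    phi = rng.uniform(0, 2 * np.pi, n_points)       # azimuth angle
    theta = np.arccos(rng.uniform(0, 1, n_points))  # polar angle, upper half only
    return np.c_[r * np.sin(theta) * np.cos(phi),
                 r * np.sin(theta) * np.sin(phi),
                 r * np.cos(theta)]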


3.4.2 Smoothness

By directly comparing a stone in the point cloud to a bush in the point cloud, one can see that a stone is a lot more structured in the sense that each point record lies on a surface, i.e. the point cloud is coherent. Thus, a measure called smoothness was constructed, with the intent to capture how coherent a cluster is in a global manner. This particular smoothness measure is calculated by fitting planes in a voxelized cluster and calculating the average angle between the fitted planes. The angle between two adjacent planes is obtained from the cross product of their normal vectors. Since there are cases in which a voxel does not contain enough points to fit a plane, such boundaries are discarded.

An example of the fitted planes with corresponding normal vectors, for a point cloud representing a somewhat flat surface as well as for a more scattered point cloud, can be seen in Figure 20 and Figure 21 respectively.

Figure 20: Example of fitted planes and corresponding normals within voxels for a point cloud which would be regarded as a somewhat flat surface.

Figure 21: Example of fitted planes and corresponding normals within voxels for a point cloud which would be regarded as a somewhat scattered object.

To describe the idea further, two common cases will be discussed. In Figure 21, the points are more scattered and thus the normal vectors of the voxelized planes point in more scattered directions. This means that the average angle between the normals in Figure 21 is relatively large. On the other hand, in Figure 20, the points are located in a more coherent manner, and thus the normal vectors of the planes in each voxel point more towards the same direction. This means that the average angle between the normals in Figure 20 is comparatively small. The smoothness measure is calculated by

S = \frac{1}{N_{tot}} \sum_{i=1}^{X} \sum_{j=1}^{Y}
\begin{cases} \lVert n_{i-1,j-1} \times n_{i,j-1} \rVert & \text{if } n_{i,j-1}, n_{i-1,j-1} \text{ exist} \\ 0 & \text{otherwise} \end{cases}
+ \frac{1}{N_{tot}} \sum_{i=1}^{X} \sum_{j=1}^{Y}
\begin{cases} \lVert n_{i-1,j-1} \times n_{i-1,j} \rVert & \text{if } n_{i-1,j}, n_{i-1,j-1} \text{ exist} \\ 0 & \text{otherwise} \end{cases}, \qquad (8)

where X and Y are the number of cells in each direction, n_{i,j} are the normal vectors of the planes in cells (i, j), and N_{tot} is the total number of evaluated cross products.


Figure 22: Example to describe how the smoothness measure is calculated. Normal vectors to fitted planes within each voxel are paired with green lines, where the cross product of each existing pair is to be evaluated.

In the example seen in Figure 22, the green lines represent normal vector pairs for which the cross products in equation (8) are evaluated. The first half of the right hand side of equation (8) corresponds to the vertical green lines and the second half to the horizontal green lines.
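The sketch below outlines one possible implementation of equation (8): a plane is fitted to the points of each xy cell by least squares, and the magnitudes of the cross products between normals of adjacent cells are averaged. The floor-based cell indexing and the use of the cross-product magnitude are interpretations of the description above, not the thesis code.

import numpy as np

def cell_normals(points, xy_dim, min_pts=3):
    # fit a plane z = a*x + b*y + c to the points in each xy cell and
    # return {(i, j): unit normal}; cells with too few points are skipped
    ij = np.floor(points[:, :2] / xy_dim).astype(int)
    normals = {}
    for key in {tuple(k) for k in ij}:
        pts = points[(ij[:, 0] == key[0]) & (ij[:, 1] == key[1])]
        if len(pts) < min_pts:
            continue
        A = np.c_[pts[:, 0], pts[:, 1], np.ones(len(pts))]
        (a, b, _), *_ = np.linalg.lstsq(A, pts[:, 2], rcond=None)
        n = np.array([-a, -b, 1.0])
        normals[key] = n / np.linalg.norm(n)
    return normals

def smoothness(points, xy_dim):
    # average cross-product magnitude between normals of adjacent cells (equation (8))
    normals = cell_normals(points, xy_dim)
    total, count = 0.0, 0
    for (i, j), n in normals.items():
        for neighbour in [(i + 1, j), (i, j + 1)]:
            if neighbour in normals:
                total += np.linalg.norm(np.cross(n, normals[neighbour]))
                count += 1
    return total / count if count else np.nan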

3.4.3 Flatness

In [4], a method to identify tree stems in a point cloud was developed. One step in their process was to identify flat surfaces within the point cloud. This was done by first voxelizing the data and then, for each voxel, calculating the eigenvalues of the spatial distribution of the point cloud. The eigenvalues were found with the approach described in section 2.3. The flatness F is then, as defined in [4], calculated by

F = \frac{e_2 - e_3}{e_1}, \qquad (9)

where \{e_3, e_2, e_1\} are the eigenvalues in ascending order. One can observe that the flatness measure in equation (9) will be closer to 1 the flatter the object is, i.e. if one eigenvalue is a lot lower than the other two.
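A compact sketch of the flatness computation, following the covariance and eigenvalue definitions of section 2.3 (equations (6) and (7)) together with equation (9):

import numpy as np

def flatness(points):
    # covariance matrix of the point set (equations (6)-(7)) and its eigenvalues;
    # F = (e2 - e3) / e1 with e1 >= e2 >= e3 (equation (9))
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    e3, e2, e1 = np.linalg.eigvalsh(cov)  # eigvalsh returns ascending eigenvalues
    return (e2 - e3) / e1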

3.4.4 Intensity

The mean intensity of a point cloud cluster can carry some information about the surface from which the laser beam is reflected. The intensity measure is partly based on the reflectivity of the surface of the object struck by the laser scanner [17]. In a study by Ohkawa et al. [18], it was discovered that the intensity measure could be used to distinguish between stones and grass in LiDAR data. Based on their result, an intensity measure was used as a feature measure. For a cluster with N point records, with intensity recordings \{I_1, I_2, \dots, I_N\}, the intensity measure is calculated as the mean intensity within the cluster,

I_{avg} = \frac{1}{N} \sum_{i=1}^{N} I_i. \qquad (10)

However, since the intensity measure depends on how far from the reflection point the scanner is located, the recorded intensity will vary. In order to compensate for this, the intensity measure is adjusted according to

I_{adj} = \frac{I_{avg} - I_{min}}{I_{max} - I_{min}},

where I_{max} and I_{min} are the highest and lowest intensities recorded within the tile. Thus, I_{adj} is the feature measure used for intensity. This is a crude way to counteract the fact that the intensity measure is not compensated for the distance to the scanner. Ideally, one would adjust the intensity based on the distance from the scanner, but this was not possible at the time.
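A small sketch of the intensity feature, computing the cluster mean of equation (10) and rescaling it with the minimum and maximum intensity of the tile as described above:

import numpy as np

def adjusted_intensity(cluster_intensity, tile_intensity):
    # mean cluster intensity (equation (10)) rescaled with the tile-wide intensity range
    i_avg = np.mean(cluster_intensity)
    i_min, i_max = tile_intensity.min(), tile_intensity.max()
    return (i_avg - i_min) / (i_max - i_min)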

4 Results

4.1 Ground classification

As for the ground classification step of the process, the two different algorithms were tested with varying parameter values, and the results were inspected and evaluated against what was sought after. For the cloth simulation filter algorithm [10], the parameters which on average yielded the best result were cloth_resolution = 0.4 and rigidness = 1 for the implementation in R. These parameters yielded a ground classification that does not tend to creep into larger stones.

4.2 Voxel size

For the generated sample data described in section 3.4.1, the results for voxel dimensions xy_{dim} ∈ {0.1, 0.2, 0.25, 0.3, 0.35, 0.4, 0.5} over the 500 simulations can be seen in Table 1 and Table 2.

Table 1: Average flatness measure for 500 iterations for cases 1-6 with varying voxel sizes.

Case \ xy_dim   0.1      0.15     0.2      0.25     0.3      0.35     0.4
1               0.0046   0.0136   0.0183   0.0166   0.0179   0.0284   0.0359
2               0.0169   0.0339   0.0717   0.0921   0.0923   0.1495   0.1001
3               0.0759   0.1767   0.201    0.2461   0.2644   0.2812   0.3344
4               0.1506   0.2282   0.277    0.3149   0.3646   0.3697   0.4706
5               0.1341   0.2086   0.2094   0.2519   0.2771   0.3061   0.3371
6               0.1315   0.3118   0.3301   0.3412   0.4048   0.3049   0.3748

Table 2: Average smoothness measure for 500 iterations for cases 1-6 with varying voxel sizes.

Case \ xy_dim   0.1      0.15     0.2      0.25     0.3      0.35     0.4
1               0.0943   1.3451   1.5204   1.5174   1.48     1.4296   1.3571
2               0.9837   1.3726   1.3457   1.2007   1.0663   0.9469   0.8694
3               0.1043   0.1452   0.1829   0.1961   0.2266   0.2354   0.2661
4               0.139    0.1465   0.1698   0.1982   0.2309   0.2674   0.3034
5               0.0911   0.4366   0.4212   0.3665   0.2876   0.2573   0.2278
6               0.8173   0.8121   0.6144   0.4209   0.3165   0.2365   0.1927


Table 3: Difference in feature measures S and F between cases (1,2) and cases (1,2,3,4) for varying voxel sizes.

xy_dim   0.1      0.15     0.2      0.25     0.3      0.35     0.4
S        0.7437   1.1736   1.1144   1.0707   0.9953   0.9367   0.8511
F        0.1653   0.1964   0.2211   0.2277   0.2548   0.265    0.2834

It can be seen in Table 3 that the larger the voxel size, the larger the difference in flatness between the two groups of cases. The reversed effect can be seen for the smoothness measure, apart from the outlier at xy_{dim} = 0.1. To conclude, based on this result the voxel sizes xy_{dim} should be chosen as follows:

• Smoothness: As low as possible. The size used when calculating the smoothness measure is therefore calculated and set according to the point density within the cluster and the area of the cluster.

• Flatness: Ideally as high as possible, but still small enough to capture distinct features in the cluster. Based on the result of the test as displayed in Table 3, the voxel dimension xy_{dim} was set to 0.4 when calculating the flatness measure F. The value was chosen with the intention to best balance the result displayed in Table 3, together with the assumption that a voxel dimension xy_{dim} of 0.1 could cause complications with point clouds of a lower density.


4.3 Feature measures and obstacle detection.

In this section, the results for each of the three feature measures will be presented. At the end, the classification based on these three measures will be presented.

4.3.1 Smoothness

For the smoothness measure, two histograms of the two cases stone and non-stone, were made and can be seen in Figure23.

Figure 23: Histograms of calculated smoothness value for stones (left) and non stones (right).

For the measures presented in Figure 23, the mean for stone clusters is 0.553 with standard deviation 0.158, and for non-stone clusters the mean is 1.14 with standard deviation 0.268.

4.3.2 Flatness


Figure 24: Histograms of calculated flatness value for stones (left) and non stones (right).


4.3.3 Intensity

For the intensity measure, two histograms of the two cases, stone and non-stone, were made and can be seen in Figure 25.

Figure 25: Histograms of calculated intensity value for stones (left) and non-stones (right).


4.3.4 Stone classification

By plotting the measures against each other, one gets the following for each combination:

Figure 26: Pairwise scatter plot for each feature measure smoothness, flatness and intensity. Here red dots correspond to stones and green dots correspond to non-stones.

In Figure 26, one can see obvious clusters for each combination, which supports the claim that the two classes are indeed different. The linear discriminant analysis classifier only works under certain conditions, as described in section 2.1, conditions which are verified below:

Condition 1: This was checked by drawing a qq-plot of each predictor for each class, which yielded the following plots.


Figure 28: Qq-plot of the flatness measure for stones (left) and non-stones (right). The plots also contain a reference line.

Figure 29: Qq-plot of the intensity measure for stones (left) and non-stones (right). The plots also contain a reference line.

As one can observe in Figures 27 through 29, the data seem to be somewhat normally distributed, but not completely for some of the measures. This implies that condition 1 is more or less fulfilled, but it could be regarded as a potential error source for the model.

Condition 2: The correlation between each predictor pair is summarized in Table

4.

Table 4: Correlation matrix for each pair of feature measures.

              Smoothness   Flatness   Intensity
Smoothness    1            -0.855     -0.704
Flatness      -0.855       1          0.733
Intensity     -0.704       0.733      1

The correlation coefficients in Table 4 show that the predictors are highly correlated, which implies that condition 2 is indeed fulfilled. This means that an LDA classification model is a justified method to use.


Figure 30: Histogram for each label for the training data set, red for stones and blue for non-stones. The green line represents the threshold which is used to classify new observations based on this LDA model.

The model is based on 75% of the 822 clusters that were manually classified; the remaining 25% are used as test data. For these 25% of the clusters, the overall accuracy is 85.4%, with a class-wise accuracy of 82.9% for non-stones and 100% for stones. As for the total set of 822 clusters, of which 708 are non-stones and 114 are stones, the model has an overall accuracy of 89.0%, with a class-wise accuracy of 88.3% for non-stones and 94.7% for stones. A similar plot to Figure 30, but with all 822 clusters, can be seen in Figure 31.

Figure 31: Receiver operating plot of the LDA model, with an area under curve measure of 0.97.


Figure 32: Histograms of calculated smoothness value for stones (left) and non-stones (right).
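A hedged sketch of how the training and evaluation described above could be reproduced with the sklearn LDA implementation; the file names are hypothetical, and the exact split procedure and random seed used in the thesis are not stated.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score

# one row per manually labelled cluster: [smoothness, flatness, adjusted intensity]
features = np.load("cluster_features.npy")  # hypothetical file names
labels = np.load("cluster_labels.npy")      # 1 = stone, 0 = non-stone

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, train_size=0.75, stratify=labels, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

predictions = lda.predict(X_test)
scores = lda.decision_function(X_test)  # the LDA score used for the threshold plot
print("overall accuracy:", accuracy_score(y_test, predictions))
print("area under the ROC curve:", roc_auc_score(y_test, scores))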


4.4 The program

As a final mention in the result section, a few comments about the actual program in which this method is implemented will be given. The program is made as a python script, which only requires a .las file of a forest environment of size 100 × 100 m². The program has a set of flags, which can be seen in Table 5 and which in some regard serve as input parameters. An example of an output file, Summary.pdf, can be seen in appendix A.

Table 5: List of flags and their name, description and default value of the python script.

Flag          Name            Description                                              Default
-f            File name       The name of the file that is to be analyzed. Has to     None
                              be a .las file covering an area of 100 x 100 m2.
-pdf          Generate pdf    Boolean saying if the Summary.pdf file should be        True
                              generated or not.
-plotcluster  Plot clusters   Boolean saying if each individual suspected cluster     False
                              is to be plotted and saved as a .png file. Mainly
                              used for analysis purposes.
-min_area     Minimum area    Minimum area in square meters of detected clusters.     1
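As a sketch of how the flags in Table 5 could be wired up, the snippet below uses Python's argparse module. The script structure, the description string and the boolean parsing helper are illustrative assumptions, not the actual implementation.

import argparse

def str2bool(value):
    # interpret "True"/"False" style strings given on the command line
    return str(value).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser(
    description="Detect obstacles in a 100 x 100 m LAS tile.")  # hypothetical description
parser.add_argument("-f", dest="file_name", default=None,
                    help="Name of the .las file covering a 100 x 100 m area.")
parser.add_argument("-pdf", dest="generate_pdf", type=str2bool, default=True,
                    help="Whether the Summary.pdf file should be generated.")
parser.add_argument("-plotcluster", dest="plot_clusters", type=str2bool, default=False,
                    help="Whether each suspected cluster is plotted and saved as a .png file.")
parser.add_argument("-min_area", dest="min_area", type=float, default=1.0,
                    help="Minimum area in square meters of detected clusters.")
args = parser.parse_args()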

5 Discussion and conclusion

The method developed in this study has proved rather successful in detecting objects lying on the ground in a forest environment which, due to their size, would be regarded as obstacles for forest machinery. The primary focus is on stones, which are the most commonly appearing objects in a forest environment and what is considered an obstacle in most cases.

Initially, it was assumed that the tiles with dense forest would have to be discarded, since the ground and the objects which lie upon it are not properly scanned there. A component of this study was therefore initially intended as a way to filter out regions in which the point cloud would be considered sparse due to the lack of point records underneath tree canopies. However, after some testing it was discovered that there are cases in which stones are indeed rather well defined and thus detectable by the program. Due to the lack of ground truth, in the sense that not all surfaces have been reached by the scanner, it is hard to conclude whether all stones have actually been picked up by the LiDAR scanner, which makes it hard to say how well this adjustment is justified. It would be beneficial in these areas to have complementary TLS data, where the scan has been performed at ground level, to check whether any objects have been missed completely. Sadly this was not available at the time.


Since the training data was limited and the way it was classified was based on my interpretation, I chose to go with an approach that would in theory capture the more typical features of each class. The approach of detecting suspected areas in the data which could be a stone is very reliant on a well defined ground model. Essentially, if an object which lies on the ground is rather low, say a large, somewhat flat spherical stone, it was hard to control the parameters to ensure that the cloth simulation filter algorithm does not creep into the object. This could prove troublesome in some cases and would result in missing a cluster which would otherwise have been captured by the algorithm.

One large issue with this study is that there was a lack of actual training data with which to properly test the model. Because of this, all classification was done by visually inspecting the point cloud in a viewer software and then, based on my interpretation of the region, setting a label on the cluster. Ideally, I would have liked access to some other way of verifying my decision, for example a picture of the area, but this was sadly not the case. This also means that the classified data used as verification data is not completely true, since there might be some bias in what I considered to be a stone or not, despite trying my best to be objective.

Overall, the approach of using a classifier based on linear discriminant analysis proved rather successful in distinguishing between falsely suspected clusters and stones, if one disregards the fact that the class labels might not be 100% true. As seen in Figure 32, the model is rather well constructed, since the area under curve value AUC = 0.97 would be regarded as quite high. One could also observe that, if desirable, which in this particular case it could be, one could adjust the threshold for the score to instead achieve 100% true positives. This would be desirable for obstacle detection intended to optimize forest machinery, since pathing slightly differently to avoid every single obstacle causes far less trouble than not being able to trust the path completely. Do note that an adjusted threshold does not imply detecting 100% of all obstacles, but merely 100% of the obstacles in the manually classified data. The accuracy measures of the model are also very good, especially for stones. However, a class-wise accuracy of 100% for stones does seem unrealistic and is most certainly not fully true, most likely because the set used for testing is not that large.

The way the data was gathered meant that there would be varying point cloud densities across the tiles that were constructed. This made it hard to find general values for some of the parameters, especially for the clustering algorithm DBSCAN, in order to get a well extracted set of clusters across all ranges of density. A set of values that worked most of the time was set as default, but in some cases, generally around the edges of the scanned area, these would have to be adjusted accordingly to get a functioning obstacle detection.


References

[1] Sveaskog, Brief facts 1: What is Swedish Forestry. Available at https://www.sveaskog.se/en/forestry-the-swedish-way/short-facts/brief-facts-1/. Accessed 2019-10-23.

[2] Sustainable forest management in Canada, Overview - Canada's Forests. Available at https://www.sfmcanada.org/en/canada-s-forests. Accessed 2019-10-23.

[3] LiDAR-UK, How does LiDAR work?. Available at http://www.lidar-uk.com/how-lidar-works/. Accessed 2019-10-14.

[4] Kenneth Olofsson, Johan Holmgren, June 2016, 'Single Tree Stem Profile Detection Using Terrestrial Laser Scanner Data, Flatness Saliency Features and Curvature Properties'.

[5] Jean-Francois Lalonde, Nicolas Vandapel, Daniel F. Huber, and Martial Hebert, June 7, 2006, 'Natural Terrain Classification using Three-Dimensional Ladar Data for Ground Robot Mobility'.

[6] SCA bidrar till studie om digitalt skogsbruk. Available at https://www.sca.com/sv/nyheter/2019-12/sca-bidrar-till-studie-om-digitalt-skogsbruk/. Accessed 2020-03-08.

[7] LAS Point Cloud Format. Available at https://www.usna.edu/Users/oceano/pguth/md_help/html/las_format.htm. Accessed 2019-10-11.

[8] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, "An Introduction to Statistical Learning with Applications in R", 8th printing (2013), pages 138-149.

[9] Art of Problem Solving, Shoelace Theorem. Available at https://artofproblemsolving.com/wiki/index.php/Shoelace_Theorem. Accessed 2020-02-10.

[10] W. Zhang, J. Qi, P. Wan, H. Wang, D. Xie, X. Wang, and G. Yan, ’An Easy-to-Use Airborne LiDAR Data Filtering Method Based on Cloth Simulation,’ Remote Sens., vol. 8, no. 6, p. 501, 2016. (Available at http://www.mdpi.com/2072-4292/8/6/501/htm)

[11] Jean-Romain Roussel, ’lidR: Airborne LiDAR Data Manipulation and Visualization for Forestry Applications’, (2020) GitHub repository: https://github.com/Jean-Romain/lidR

[12] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, 1996, 'A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise'. Available at https://www.aaai.org/Papers/KDD/1996/KDD96-037.pdf.

[13] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 'Scikit-learn: Machine Learning in Python', Journal of Machine Learning Research, Volume 12 (2011), pages 2825-2830.


[15] Barber, C.B., Dobkin, D.P., and Huhdanpaa, H.T., "The Quickhull algorithm for convex hulls," ACM Trans. on Mathematical Software, 22(4):469-483, Dec 1996

[16] Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser, Jonathan Bright, Stéfan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, Robert Kern, Eric Larson, CJ Carey, İlhan Polat, Yu Feng, Eric W. Moore, Jake VanderPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen, E.A. Quintero, Charles R Harris, Anne M. Archibald, Antônio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1.0 Contributors. (2020) SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nature Methods, in press.

[17] ArcGIS, What is lidar intensity data?. Available at https://desktop.arcgis.com/en/arcmap/10.3/manage-data/las-dataset/what-is-intensity-data-.htm. Accessed 2020-02-20.

