
Institutionen för systemteknik
Department of Electrical Engineering

Master's thesis (Examensarbete)

Moving object detection in urban environments

Master's thesis in Automatic Control (Reglerteknik), carried out at the Institute of Technology, Linköping University

by David Gillsjö

LiTH-ISY-EX--12/4630--SE

Linköping 2012

Department of Electrical Engineering, Linköpings universitet, SE-581 83 Linköping, Sweden


Moving object detection in urban environments

Master's thesis in Automatic Control (Reglerteknik), carried out at the Institute of Technology, Linköping University

by

David Gillsjö

LiTH-ISY-EX--12/4630--SE

Supervisors: James Underwood
ACFR, University of Sydney

Tim Bailey
ACFR, University of Sydney

Johan Dahlin
ISY, Linköpings universitet

Examiner: Thomas Schön
ISY, Linköpings universitet


Abstract

Successful and high precision localization is an important feature for autonomous vehicles in urban environments. GPS solutions are not sufficient on their own, and laser, sonar and radar are often used as complementary sensors. Localization with these sensors requires techniques grouped under the acronym SLAM (Simultaneous Localization And Mapping). These techniques work by comparing the current sensor inputs to either an incrementally built or a known map, while also adding the new information to the map.

Most of the SLAM techniques assume the environment to be static, which means that dynamics and clutter in the environment might cause SLAM to fail. To obtain a more robust algorithm, the dynamics need to be dealt with. This study seeks a solution where measurements from different points in time can be used in pairwise comparisons to detect non-static content in the mapped area. Parked cars could, for example, be detected at a parking lot by using measurements from several different days.

The method successfully detects most non-static objects in the different test data sets from the sensor. The algorithm can be used in conjunction with pose-SLAM to get a better localization estimate and a map for later use. This map is well suited for localization with SLAM or other techniques, since only static objects are left in it.


Acknowledgments

I would like to begin by thanking my supervisors at the ACFR, James Underwood and Tim Bailey. Thank you for all the valuable feedback, directions and expertise during our weekly discussions. I'm also grateful for the introduction to back-street Thai, the workplace and all the helpful people working there, whom I'd like to thank for the daily lunch discussions and a friendly welcome to the Australian culture. I will remember the great concept of pizza seminars and use it wisely. I also thank my Swedish supervisor Johan Dahlin for feedback on numerous editions of this thesis. On the topic of feedback, I'd like to acknowledge my office neighbour and sounding board Hanna Nyqvist; thank you for all the discussions and lunch walks.

Sincere thanks to my friends and family in Sweden for your support during this exciting and challenging semester. I'd also like to thank my Australian "family", Johan Wågberg and Emanuel Walldén Viklund, for putting up with me and being such nice and understanding roommates.

Linköping, September 2012 David Gillsjö


Contents

1 Introduction
  1.1 Contributions
  1.2 Thesis outline

2 SLAM and LiDAR
  2.1 Feature-based SLAM
  2.2 Pose-SLAM
  2.3 LiDAR
  2.4 Free, occluded, unknown and occupied space

3 Related work in SLAM and motion detection
  3.1 Including moving objects in SLAM estimate
  3.2 Structure of common solutions
    3.2.1 Alignment
    3.2.2 Detection of candidates
    3.2.3 Classifying movement

4 Motion detection using multiple viewpoints
  4.1 Overview
  4.2 Segmentation
  4.3 Spectral registration
  4.4 Selection of scans
  4.5 Motion compensation
  4.6 ICP
  4.7 Map free-space
  4.8 Project scan and detect points

5 Results
  5.1 Data
  5.2 Design choices
    5.2.1 Grid size
    5.2.2 Handling of no returns
    5.2.3 Effects of thresholds
    5.2.4 Thresholding on one comparison or several
  5.3 Result and limitations
    5.3.1 Viewpoint difference
    5.3.2 Lack of background
    5.3.3 Segmentation related detection errors
    5.3.4 Occlusions
    5.3.5 One versus many

6 Concluding remarks
  6.1 Conclusions
  6.2 Future work
    6.2.1 Adaptive threshold
    6.2.2 Alternative ways of choosing scans
    6.2.3 Adjust segmentation
    6.2.4 Handling of no returns from laser
    6.2.5 Final weighting
    6.2.6 Local density
    6.2.7 Implementation details


1 Introduction

Robotic systems that are able to observe and autonomously react in different environments are increasing in number and complexity. Systems like these often need a way to localize themselves in the surroundings, or even to create a map. One type of solution to this is called Simultaneous Localization and Mapping (SLAM).

As the name suggests, the problem is for a system (e.g. a robot) to build a map of its surroundings while at the same time positioning itself in this map. A map can also be given but is updated with information during the session. This enables a robot to be aware of its surroundings and react according to instructions. For example, planning of paths for a robot to take in a changing environment is made possible, since new information, like a temporary obstacle, can easily be incorporated with already known areas of the map. By making localization and mapping one problem instead of two separate ones, the correlation between them can be used. Pose-SLAM is one of the solutions to the SLAM problem and the application considered here. In this technique, each whole sensor measurement is compared with the following measurement, estimating the position and orientation of the robot. The map is all these measurements combined together, for example in an occupancy grid.

Most solutions to SLAM assume that the surroundings are static to reduce the complexity of the problem, which makes it easier to solve in real time. This assumption makes the mapping and localization less robust in dynamic environments, since moving objects might cause false associations.


This thesis seeks an algorithm for detection of non-static objects by pairwise comparison of measurements from a 3D LiDAR sensor in urban environments. The algorithm can be integrated with pose-SLAM and will utilize different viewpoints of the same scene. This makes it possible to detect not only changes that have occurred around the time of measuring, but also dynamic objects that are stationary for long periods. Parked cars or a group of people standing still conversing can for example be detected and left out of the map.

1.1 Contributions

• A technique using several viewpoints from a 3D LiDAR sensor in pairwise comparisons to successfully detect moving objects. This is done by selecting reference measurements from a larger data set and comparing them with the measurement of interest.

• A greedy optimization algorithm for selecting which measurements to compare against.

• A case study showing results achieved and limitations of the technique.

1.2 Thesis outline

SLAM and LiDAR are explained in Chapter 2. Existing work in the area of SLAM and motion detection is then discussed in Chapter 3 followed by a description of the implemented algorithm in Chapter 4. A case study in Chapter 5 presents the results, with conclusions and future work in Chapter 6.


2 SLAM and LiDAR

This chapter first describes the concept of SLAM (Simultaneous Localization And Mapping) and one of its solutions, pose-SLAM. Then follows a section on LiDAR (Light Detection And Ranging), which is the sensor used for the application. At the end of the chapter the terminology for the sensor measurements is explained.

2.1 Feature-based SLAM

Feature-based SLAM is one of the first frameworks used. It works by determining landmarks in the environment, to which the robot's position can be related via sensor measurements. Depending on the sensor choice, the landmarks can range from simple signal-emitting beacons to feature extractions in camera images. This explanation of feature-based SLAM follows the one given in Durrant-Whyte and Bailey [2006]. It is posed as a probabilistic problem with the following variables at time step k:

• The robot state vector, x_k, with position and orientation.

• The robot control signal, u_k, which determined the robot state transition into this time step.

• The landmarks, m = (m_1, m_2, ..., m_k), where m_i denotes the i-th landmark.

• An observation, z_{i,k}, which is the sensor measurement of landmark i at time k. The measurement of all landmarks at time k will be denoted z_k.

A graphical description of the variables can be seen in Figure 2.1. The green triangles represent the states, which are connected by control signals to illustrate the state transitions. The blue circles are landmarks and the red arrows are observations.



Figure 2.1: This figure explains the variables typically used in probabilistic SLAM with landmarks. The green triangles represent the robot's state in different time steps; the control signals causing state transitions are shown as dashed black lines. The red arrows are observations of the blue landmarks.

For example, the green triangle in the middle, representing state x_k, is reached by applying control signal u_k in the former state; this is the dashed line between the states. In state x_k, observations are made of the blue landmarks m_2 and m_3; these observations are shown as red arrows originating from the green triangle. When the feature-based probabilistic SLAM solution is derived, it is assumed that the robot state transition is a Markov process. This means that the robot state doesn't depend on all the preceding states, only on the previous state and the current control signal. This gives the motion model, formulated as a probability distribution

P(x_k | x_{k−1}, u_k).    (2.1)

It is also assumed that the landmarks are static, i.e. a landmark's position does not depend on its preceding positions. The final assumption is that the observations are conditionally independent of each other, meaning that an observation only depends on the landmarks and the current state. The last two assumptions give the following relations:

m_k = m_{k+1},    (2.2)

P(z_k | x_k, m).    (2.3)




Figure 2.2: Graphical model of the probabilistic SLAM problem. Arrows illustrate dependencies between the variables for state x, observation z, control signal u and landmarks m. Observation z_k, which is the observation made by the robot in time step k, is for example dependent on the state x_k and the landmarks m.

These assumptions give the problem a structure with few chained dependencies which can be solved using recursive solutions, for example a Kalman filter. This can be seen in the graphical model in Figure 2.2, where the arrows symbolize dependencies. The variable that the arrow is pointing at is dependent on the variable at the start of the arrow.
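For concreteness, the models (2.1)-(2.3) can be combined into the standard recursive estimator; the form below follows the general SLAM literature (e.g. Durrant-Whyte and Bailey [2006]) rather than being quoted from this thesis:

P(x_k, m | Z_k, U_k) ∝ P(z_k | x_k, m) ∫ P(x_k | x_{k−1}, u_k) P(x_{k−1}, m | Z_{k−1}, U_{k−1}) dx_{k−1},

where Z_k and U_k denote the histories of observations and control signals up to time k. The integral is the time update using the motion model (2.1), and the multiplication by P(z_k | x_k, m) is the measurement update using the observation model (2.3). With linear-Gaussian models, both steps reduce to the Kalman filter mentioned above.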

Since the landmarks are assumed static, the information keeps accumulating and the uncertainty can only decrease with time. If the landmarks were to move, they would need motion models, and these motion models would need to be very accurate to maintain good localization. This is difficult for objects in most environments, since their movements are not predictable enough, and it would also be very computationally demanding [Wang et al., 2003b]. This is why moving objects are often removed from the SLAM estimate rather than accounted for.


2.2 Pose-SLAM

Pose-SLAM is a solution which doesn't estimate the positions of landmarks. Instead it estimates a trajectory of robot poses by setting up constraints between them. Constraints are made by comparing sensor measurements and estimating how the robot moved between them, see Figure 2.3. The map is then constructed by global alignment of the measurements, given the vehicle's estimated trajectory.

The estimation of the trajectory can be made with different methods; apart from common state estimation techniques, there are solutions like FastSLAM and graph-SLAM. The former keeps several trajectory hypotheses as particles, estimating the most likely one. The latter is also a state estimation technique, but it represents the problem with a graph, each node being a scan and the edges being constraints from the measurement matching. More on pose-SLAM and other methods can be found in Bailey and Durrant-Whyte [2006].

Since whole measurements are used instead of features, this method is good in situations where it is hard to find recognizable and distinguishable features. As long as there are enough static objects around, the dynamics will not affect the pose estimate too much either. It would still be beneficial to have the moving objects removed from the measurements: this makes the localization better, and since the measurements are also used to generate the map, a map generated without the moving objects would match better in later use.

An illustration of how moving objects affect the measurement matching is shown in Figure 2.4. The moving object causes an error which can be seen in the matching of the two static objects. The estimated motion will therefore not be the same as in Figure 2.3, where the two measurements only contained static objects. Dynamic objects might also introduce wrong alignments and therefore false hypotheses; this isn't shown in the example.




Figure 2.3: Illustration of a correct alignment. The two upper pictures illustrate the surroundings in two measurements; only the robot itself is moving. By aligning the two measurements, an estimate of the robot's translation q and rotation R can be calculated.



Figure 2.4: Illustration of an incorrect alignment. The two upper pictures illustrate the surroundings in two measurements; here one object is moving in addition to the robot itself. When the two measurements are aligned, the static objects won't match up because the moving object pulls the alignment, and the translation and rotation estimates will not be correct. The alignment might also turn out completely wrong if the algorithm makes false associations due to moving objects; this is not shown here.



2.3 LiDAR

The sensing technique Light Detection And Ranging (LiDAR) is used for measuring range and light reflectivity. The technique uses focused light, often laser pulses, to illuminate the target. Depending on the application, the light can be in the ultraviolet, visible or infrared spectrum. A sensor close to the light source then measures the returning light intensity and travel time to calculate the distance to the target. With waveform analysis it is possible to increase accuracy and estimate distance from more than one return of the light.

Using light makes it possible to measure various types of targets depending on the wavelength used. Cars, pavement, pedestrians, smoke, clouds and dust can all give large enough reflections. For the detection to be more robust against fog and particles in the air, the infrared spectrum should be used. Lasers with longer range (higher power output) also benefit from longer wavelengths, since they are safer for the eye. On the other hand, current sensing techniques have higher accuracy at shorter wavelengths, so a trade-off has to be made. See Fujii and Fukuchi [2005] for LiDAR details.

The Velodyne HDL-64E S2 sensor (Figure 2.5) used for this application uses light just inside the infrared spectrum, at 905 nm. It has a range of 120 m and its light is safe for the eye, a requirement since it is to be used in the streets.

Figure 2.5: A picture of the sensor used, the HDL-64E S2. The picture is from the press section at Velodyne’s web site.


2.4 Free, occluded, unknown and occupied space

When reasoning about what is known about the environment given the measurements, the terms free, occluded, unknown and occupied space will be used. This terminology works very well when the range sensor only returns one reflection per ray, as it does in this application. The free space in the environment is where the light from the sensor passed through before hitting an object. The endpoints of the light rays that reflected light back to the sensor make up the occupied space. These two types of areas are the ones considered known, and they can be reasoned about when deciding whether objects have moved or not.

Unknown parts of the surroundings are where the sensor didn't pick up any returns. Depending on the sensor and algorithm, this could also be classified as free space up to the maximum range of the sensor; more about this in Chapter 4. The occluded space is the area hidden behind the occupied space; since nothing is known about it, it cannot be used when comparing measurements to determine whether objects have moved. An illustration of this can be seen in Figure 2.6. The figure is a 2D illustration, where tiled shapes represent the actual objects and the legend describes the different areas in the measurement.


Figure 2.6: Description of the terms used for the different areas in the environment relative to the measurements. The black tiled objects are objects in the environment; the blue thick lines are the measurements as well as the occupied space. For simplicity the illustration is in 2D and the measurements are continuous.


3 Related work in SLAM and motion detection

This chapter describes different approaches taken in the research community and how this thesis relates to them. The focus is on research using laser as the sensor.

3.1 Including moving objects in SLAM estimate

There are solutions which deal with dynamic environments by including the moving objects in the estimates. Examples include alternative methods of matching measurements which take moving objects into consideration [van de Ven et al., 2010], or SLAM frameworks where the landmarks' state vector is extended with velocities [Bibby and Reid, 2007, Wang et al., 2003b].

Since the sought application is in conjunction with pose-SLAM, where landmarks aren't used in the same sense, the latter alternative isn't the first option that comes to mind. It could be used, but it is computationally expensive and difficult to do. Considering motion during the scan matching is certainly good; it gives more robustness against highly dynamic environments with very few stationary objects. The downside is that the moving objects will not be explicitly marked, which would be beneficial if a static map is expected as a result of the pose-SLAM.

Therefore a method that removes dynamic objects will be used; most of the papers on SLAM solutions for dynamic environments do this. Since the measurements representing uncertain or dynamic objects are removed, the environment can be considered static by the SLAM algorithm. This was also how the first method for outdoor SLAM in dynamic environments was constructed [Wang and Thorpe, 2002].


3.2 Structure of common solutions

This section explains in general how moving objects are usually handled with pose-SLAM, and the choices made in this regard. First the alignment process is discussed, followed by the detection of candidates possibly corresponding to motion. Finally, various implementations of the classification step, which decides what is moving or not, are mentioned and compared.

3.2.1 Alignment

Alignment is the procedure of using a registration technique to match two measurements in order to get the transform relating them. For this thesis, a variant of the commonly used ICP (Iterative Closest Point) algorithm [Rusinkiewicz and Levoy, 2001] is a prerequisite. There are a few alternatives to ICP, including RANSAC [Fischler and Bolles, 1981] and the Fourier-Mellin transform [Chen et al., 1994].

ICP minimizes an error metric based on the distance differences between the measurements. As long as the initial guess is good, the estimate will be globally optimal according to that metric. Basic outlier checks are usually made, excluding measurements from the estimate. RANSAC is probabilistic and tries different hypotheses regarding which data to consider outliers. This makes it more robust to noise, but knowing when the best result has been found is hard, and runtime may vary between matches. Since outliers, in this case mostly changes or motion, are handled by other methods, ICP is a valid choice for this algorithm. The Fourier-Mellin transform is better than ICP at finding matches without a good initial guess. It is however not as accurate and has therefore been used as a pre-alignment step.

3.2.2 Detection of candidates

A common first step in the detection of candidates is to segment the measurements. This involves filtering out the points representing ground and grouping points that are close to each other. Ideally each group represents an object in the environment, but this is not always the case. Therefore there exist methods for keeping multiple hypotheses and choosing the most likely one at the time, such as the multi-scale algorithm used in Yang and Wang [2011]. This implementation will however use a single-hypothesis algorithm [Douillard et al., 2011].



There are three common actions after the segmentation; the first is to just pass on the non-ground segments. The other two are to either select segments via free-space violation detection [Wang et al., 2003a, 2007, Vu et al., 2011], or to select segments based on local alignment errors [Katz et al., 2008]. The first alternative leaves it to a higher level of reasoning, for example comparisons with motion models in the next step. The free-space approach detects objects that have appeared between the sensor and the former closest objects. The approach is simple and also handles possible occlusions, but adds another layer of computation. Using the already calculated alignment errors, on the other hand, requires an extra layer to handle occlusions.

These differences are illustrated in Figure 3.1, where two measurements from a scenario with a moving car are compared. In the first situation a car is driving in between two objects, occluding one of them, which is of similar shape and size. In the second situation the car is out of sight and only the two stationary objects remain. Situation two is searched for moving objects using both methods. We see that using only alignment errors may lead to a false detection of the static object to the left; an occlusion check must be added for this technique.

Since they are not exclusive to each other, using both is also an alternative, as in Miyasaka et al. [2009]. The free-space method is used in this project, partly because occlusions are implicitly handled, but also due to issues with the ICP algorithm used, which was still under development.

3.2.3 Classifying movement

In this step, it is decided which segments correspond to moving objects. Most literature uses either motion model prediction [Wang et al., 2003a,b, 2007, Zhao et al., 2008, Miyasaka et al., 2009] or dynamic occupancy grid maps [Wolf and Sukhatme, 2005, Wang and Thorpe, 2002]. A combination is also possible [Vu et al., 2011].

The Kalman filter is often the first choice when using motion models to predict the movement of the objects. Selection is then made based on which segments are consistent with a motion model over some time frame. These methods can be used without candidate selection in the former step (only deleting the ground). Several models are often combined using a technique called Interacting Multiple Models (IMM) [Mazor et al., 1998, Vu et al., 2011], since the motion of arbitrary objects is hard to predict. This technique optimizes the track estimate by weighting estimates from several motion models. It is often combined with a Multiple Hypothesis Tracker (MHT) [Bar-Shalom and Fortmann, 1988, Vu et al., 2011], which keeps several tracks for each target and outputs the most probable one at the time.

Occupancy grids for dynamic objects are based on a model for the probability of a moving object occupying a cell. Cell values in the grid are increased when they are deemed occupied by a candidate moving object. If the values of the cells occupied by a segment are above some threshold, the segment is said to be dynamic.



Figure 3.1: This figure compares using only errors from the alignment versus free-space detection. The first two boxes show the environments in situations one and two. The next pair of boxes are the measurements from the sensor. The last pair shows the detection of moving objects in the measurement from situation two, using alignment errors or free-space violations. The ellipse shows a detection by the algorithm, even though this is occluded space in the first measurement. An occlusion check is needed for this method.



These methods, which keep a state of the moving objects, can handle situations where a moving object makes a temporary stop. They can also catch slow-moving targets that may otherwise be difficult to detect, since such targets won't intrude much into free space between measurements. The downside of these methods is that they require correct data association, especially with motion models. For example, a Kalman filter needs to compare segments that actually represent the same object. This may at times be hard due to occlusions, which cause the object to quickly change shape.

There are a few other types of methods as well, for example modeling objects according to distributions and using gating criteria [Katz et al., 2008], or using weights in an expectation maximization framework [Rogers et al., 2010]. Most are however only concerned with online application, or at least with handling all measurements in sequence. What is sought in this application is similar to Levinson and Thrun [2010], which during a map building phase collects measurements from the area at different times. A grid map is used where each cell is a Gaussian variable describing the average and variance of the LiDAR return intensity. A high intensity deviation in a grid cell implies a moving object, which can be removed from the map.

The strength of a method like this is that it not only detects objects moving at the time, but also objects that have been stationary for a long time but can move, e.g. a parked car. This makes it possible to create, over time, maps with only stationary objects, which can be used as models or as more robust maps for localization. The next chapter presents a method using range measurements from several viewpoints and points in time to detect motion and changes in the surroundings.


4 Motion detection using multiple viewpoints

This chapter describes the algorithm used to detect moving objects in a sensor measurement, also referred to as a scan, using a set of other scans taken in the vicinity. The algorithm is written in MATLAB and uses functions supplied by the ACFR.

4.1 Overview

The overall flow of the algorithm is described in Figure 4.1 and in more detail in Algorithm 1. The algorithm takes one active scan, which is the measurement that is to have its moving objects removed. It also takes a set of reference scans, which are measurements taken in the proximity of the active scan, but possibly from various different times. The basic idea is to compare the occupied space in the active scan with the free space in a selection of the reference scans.

Together with the measurements, the algorithm also expects that each scan has been segmented. The segmentation algorithm deletes measurements corresponding to ground and creates groups of points that seem to correspond to the same object. This is done with an external algorithm from the ACFR and is further explained in Section 4.2.

Since measurements are collected 20 times per second, the algorithm speeds up considerably if the set is thinned out. This is the first step, and the scans left will be either more than a meter or more than a second apart. The thresholds can probably be increased to higher values without a significant impact on the result.
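To make the step concrete, a minimal sketch of such thinning is given below. Python is used purely for illustration (the thesis implementation is in MATLAB), and the function and variable names are assumptions:

import numpy as np

def thin_scans(positions, times, min_dist=1.0, min_time=1.0):
    # positions: (N, 2) or (N, 3) scan positions; times: (N,) timestamps.
    # Keep a scan if it is more than min_dist metres OR more than
    # min_time seconds away from the previously kept scan.
    keep = [0]  # always keep the first scan
    for i in range(1, len(times)):
        last = keep[-1]
        if (np.linalg.norm(positions[i] - positions[last]) > min_dist
                or times[i] - times[last] > min_time):
            keep.append(i)
    return keep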



Figure 4.1: The flow of the algorithm. One active scan and several possible reference scans are the inputs. In the end, a selected few reference scans are compared with the active scan, and moving objects in the active scan are detected.



Algorithm 1 General algorithm for the method used in this thesis.

Require: Active scan S_a, reference scans S_r, thresholds Θ, number of reference scans n
1: ReadFiles(measurements, segmentation, relativePositions, groundTruth)
2: S_r ← Thin(S_r)   ▷ Make scans more sparse in positions and time
3: for all i ∈ S_r do
4:    (R_si, q_si) ← SpectralRegistration(i, S_a)   ▷ Supplied by ACFR
5: end for
6: S_c ← SelectScans(S_r, n, T, q)
7: (S_a, S_c) ← MotionCompensate(S_a, S_c)   ▷ Supplied by ACFR
8: for all i ∈ S_c do
9:    (R, q) ← ICP(i, S_a, R_si, q_si)   ▷ Supplied by ACFR
10:   S_ap ← R S_a + q
11:   grid ← MapFreeSpace(S_i)
12:   D_i ← DetectSegments(S_ap, grid, Θ)
13: end for
14: movingSegments ← CombineResult(D)

The leftover scans are aligned with the active scan to get their positions. The alignment is done with spectral registration supplied by the ACFR, which is described in Section 4.3. Based on the positions and times of the scans with regard to the active scan, n scans are then chosen for the active scan to be compared against. This is done with a simple greedy optimization maximizing the distance in space and time between the chosen scans; this is further explained in Section 4.4. The selected scans and the active scan are then motion compensated with an algorithm from the ACFR to account for the movement of the sensor during the measurement. This procedure is further described in Section 4.5.

To detect changes and motion, the chosen reference scans need to be compared with the active scan. The relation between a reference scan's free space and the active scan's occupied space is exploited for this. The free space first needs to be represented in a better data structure; this procedure is described in Section 4.7. To make the comparison, the reference scans need a better alignment to the active scan, especially after the motion compensation. This is handled by an ICP algorithm from the ACFR, which is described in Section 4.6.

The active scan is rotated and translated according to the ICP output to fit each reference scan. This projected version of the active scan is compared with the free-space data structure created earlier. Further details can be found in Section 4.8, which also describes how the results from the different reference scans are finally combined.


4.2 Segmentation

The segmentation part of the algorithm tries to delete points corresponding to the ground in the measurement and then assigns labels to the remaining points. Points that are close to each other are assumed to correspond to the same object and are marked with the same label. One set of points with the same label is called a segment, which ideally corresponds to an object in the physical world. This is illustrated in Figure 5.11. A brief description of the segmentation algorithm follows; for more details see Douillard et al. [2011].

The algorithm works with the concepts of radial lines and raster lines. Measurements on a radial line are gathered at the same time instance and therefore lie on a line originating from the sensor. Measurements on a raster line stem from the same individual laser during one scan (one rotation) and resemble a ring around the sensor. The algorithm begins by removing the ground. To do this, a terrain mesh is first created based on these lines. This mesh is a graph where each point is a node. The node has edges to the previous and next measurements taken by the laser, and to the lasers below and above. The gradient is then calculated for all nodes in all four directions; the gradient with maximum L_2 norm is assigned to the node.

Figure 4.2: A segmented measurement with ground and segments marked in separate colours. The picture is reproduced from Douillard et al. [2011] with permission from the authors.



The marking of ground points is then initiated by assuming that the longest set of nodes in the closest scan line with a gradient value below a threshold max_grad corresponds to ground. The ground classification is then propagated throughout the graph, with conditions on neighbourhood to ground-labeled nodes and on the gradient being less than max_grad. Transition points are also removed from the measurement by searching for local deviations along each raster line and radial line. For the segmentation part, a cubic voxel grid is created and populated with all the 3D points not marked as ground or transition in the previous step. Points are grouped based on local voxel adjacency, depending on several parameters. All parameters were used at their default values, except the parameter specifying the minimum number of voxels required for a segment, which was empirically set to eight.

4.3 Spectral registration

This algorithm is used as a preprocessing step for the ICP, since ICP requires an initial guess. The algorithm is fast and can handle large translational and rotational displacements, but it doesn't give as accurate results as the ICP. With some modification it could also handle scaling, but this is not an issue with laser range measurements.

For faster processing, the 3D laser range measurement is projected onto a 2D plane and then fitted to an occupancy grid where each cell has a binary value, occupied or free. A brief explanation of the principle follows. See Checchin et al. [2009] for more details regarding the basis of this particular implementation, or Chen et al. [1994] for more detailed theory.

The algorithm makes use of the shift property of the Fourier transform, meaning that a translation in the variables of a function corresponds to a phase shift in its Fourier transform. Assume we have two 2D range scan grids g_1 and g_2 that are translated by (Δx, Δy) from each other, so that g_2(x, y) = g_1(x − Δx, y − Δy). The shift property then gives the Fourier transforms the following relation:

G_2(u, v) = G_1(u, v) e^{−2πi(uΔx + vΔy)}.    (4.1)

Proceeding by forming the normalized cross power spectrum,

F{Corr(x, y)} = (G_1^*/|G_1^*|) · (G_2/|G_2|) = (G_1^* G_1/|G_1^*||G_1|) e^{−2πi(uΔx + vΔy)} = e^{−2πi(uΔx + vΔy)},    (4.2)

with ^* denoting the complex conjugate. We see that taking the inverse transform of this results in a Dirac pulse δ(x − Δx, y − Δy). This function is only nonzero at the coordinates corresponding to the sought translation; therefore the translation can be expressed as

(Δx, Δy) = argmax_{(x, y)} Corr(x, y).    (4.3)


When rotation is added to the mix, G_1 and G_2 need a change of basis into polar coordinates to represent the rotation as a translation. The resulting image can then go through the same process as g_1 and g_2 above. For more details, see the papers referred to earlier in this section.
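Equations (4.1)-(4.3) correspond to classic phase correlation, and the translation-only case can be sketched compactly. This is an illustration under assumed inputs (binary occupancy grids), not the ACFR implementation, and it omits the polar resampling needed for rotation:

import numpy as np

def phase_correlation(g1, g2):
    # g1, g2: 2D occupancy grids where g2 is (approximately) a shifted g1.
    G1 = np.fft.fft2(g1)
    G2 = np.fft.fft2(g2)
    cross = np.conj(G1) * G2                # numerator of Eq. (4.2)
    cross /= np.abs(cross) + 1e-12          # normalize; epsilon avoids 0/0
    corr = np.real(np.fft.ifft2(cross))     # ideally a Dirac pulse, Eq. (4.3)
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # FFT indices wrap around: large indices mean negative shifts
    if dy > g1.shape[0] // 2:
        dy -= g1.shape[0]
    if dx > g1.shape[1] // 2:
        dx -= g1.shape[1]
    return dx, dy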

4.4 Selection of scans

This algorithm tries to spread the chosen scans so that they are as dispersed in time and space as possible. It is not done optimally, but rather in a greedy fashion. The sequence of actions is described in Algorithm 2. It starts by removing scans that are too far away to give valuable information; this limit is set to half the sensor's range. Feature vectors are then formed and normalized; in the implementation only x, y and time are used. The normalization is done so that the furthest distance in space equals the furthest distance in time. A distance matrix is then calculated and used to determine the nearest neighbour of each vector and the corresponding distance. The vector with the greatest such distance is deemed the most distant from all the other vectors; this is illustrated in Figure 4.3. That vector is chosen and marked as an invalid choice for the remaining iterations. Worth mentioning is that the active scan is an invalid choice from the start. This method is not an optimization of coverage, but of sparseness.

Algorithm 2 Selection algorithm.

Require: Active scan position p_a and time t_a, reference scan positions p_r and times t_r, number of scans to select n
1: Exclude scans with ||p_r − p_a|| > 60
2: Form feature vector V_a = (p_a, t_a) and vectors V_r = (p_r, t_r)
3: (V_a, V_r) ← Normalize(V_a, V_r)
4: Build distance matrix D
5: for i = 1 : n do
6:    C_i ← Max(Min(D))
7:    Mark C_i as an invalid choice in D
8: end for
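A direct transcription of the greedy max-min loop in Algorithm 2 could look as follows; the feature vectors are assumed to be already normalized as in step 3, and all names are hypothetical:

import numpy as np

def select_scans(features, active_idx, n):
    # features: (N, d) normalized (x, y, time) vectors, including the
    # active scan at row active_idx (never a valid choice itself).
    D = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    chosen = [active_idx]
    candidates = set(range(len(features))) - {active_idx}
    picked = []
    for _ in range(min(n, len(candidates))):
        # distance from each candidate to its nearest already-chosen scan
        d_min = {c: D[c, chosen].min() for c in candidates}
        best = max(d_min, key=d_min.get)    # the most remote candidate
        chosen.append(best)
        candidates.remove(best)
        picked.append(best)
    return picked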



Figure 4.3: Illustrates how the most remote point is picked by the selection algorithm. Assume that the algorithm has already chosen the green measurements and is currently choosing between the blue squares. The minimum distance from each candidate measurement to the already chosen and the active measurements is then calculated; these distances are marked with arrows in the figure. The candidate measurement with the largest of these minimum distances is chosen by the algorithm. This procedure is repeated until a specified number of measurements have been selected.

4.5 Motion compensation

The laser range scanner is rotating and moving continuously, but the extracted measurements are treated as a point cloud from one given time instance. This assumption gives errors in the range measurements; the correction of these is what is here called motion compensation. The problem is exaggerated and illustrated in Figure 4.4. The algorithm simply assumes a constant velocity during the measurement and applies part of the rotation and translation to each point; more to those early in the measurement and less to those closer to the end.

Figure 4.4: The left side illustrates the laser's different positions during movement while the laser is rotating. Each line corresponds to a measurement after a small rotation. The right side illustrates the effect of assuming these measurements were taken at the same time. The effect isn't this large in reality, but it illustrates the need for compensation.
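The principle can be sketched as follows: each point is expressed in the sensor pose at the end of the sweep by removing the part of the assumed constant-velocity motion that happened after the point was measured. The ACFR algorithm itself is not described in the thesis, so the interpolation scheme (scaled axis-angle) and all names here are assumptions:

import numpy as np
from scipy.spatial.transform import Rotation

def motion_compensate(points, frac, rotvec, trans):
    # points: (N, 3) raw points, each in the sensor frame at its own
    # measurement time; frac: (N,) sweep fraction in [0, 1] per point;
    # (rotvec, trans): sensor rotation (axis-angle) and translation over
    # the full sweep, assumed to happen at constant velocity.
    out = np.empty_like(points)
    for i in range(len(points)):
        remaining = 1.0 - frac[i]           # motion left after this point
        R = Rotation.from_rotvec(rotvec * remaining).as_matrix()
        # undo the sensor motion that occurred after the point was taken
        out[i] = R.T @ (points[i] - remaining * trans)
    return out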


4.6 ICP

The Iterative Closest Point (ICP) algorithm is commonly used for geometry-based registration between models or measurements in 3D. Since it has been widely used, there are several different variants; they differ in how points are selected, in the distance metric, in the error metric, etc. More examples can be found in Rusinkiewicz and Levoy [2001]. The algorithm used in this work is currently being developed and its details are not allowed to be disclosed here. It has similarities with the point-to-plane variant of ICP by Yang and Medioni [1992], which will now be briefly explained to describe the principles of ICP.

In this simple description ICP is divided into three steps:

• Calculate the distances between a chosen set of points and their corresponding planes in the reference measurement.

• Calculate a translation and rotation that minimize an error metric for these distances.

• Perform the translation and rotation, and repeat from the first step unless the error is within a specified threshold.

Denote the i-th chosen point in the new measurement p_i and its corresponding reference measurement point r_i, where i ranges from one to the number of chosen points N. Points can be chosen based on, for example, surrounding geometry or uniform sampling. Correspondence between points is also defined differently between ICP versions; options include nearest neighbour in different metrics and normal shooting. In point-to-plane ICP, the distance between two points is defined as the length of the distance vector projected onto the normalized surface normal n_i of the reference point. The transformation matrix T, denoting the translation and rotation of the new measurement, also needs to be introduced. This means that the distance D_i is

D_i = ||(n_i · (T p_i − r_i)) n_i|| = ||n_i · (T p_i − r_i)||.    (4.4)

See Figure 4.5 for an illustration. An initial estimate of the transformation matrix is needed because the point pairs need to be decided before forming this distance. For the ICP to converge, the estimate also needs to be good, since the algorithm easily falls into a local minimum.

Step two is a parameter estimation of T where the error is minimized. Outliers are often excluded before this stage; the easiest approach is to exclude point pairs with a distance above a specified threshold. In this example a least squares estimation is used, which can be written as

T ← argmin_T Σ_{i=1}^{N} D_i².    (4.5)




Figure 4.5: In this 2D example two lines are to be matched. The distance D_i is the distance vector projected onto the normal of the reference point r_i.

These steps are then repeated until a small enough average error is achieved. A maximum number of iterations might also be imposed, should the algorithm not converge.
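For reference, one least-squares update of the point-to-plane error in Equations (4.4) and (4.5) has a standard closed form after linearizing T for small rotations. The sketch below shows that generic derivation, not the undisclosed ACFR variant:

import numpy as np

def point_to_plane_step(p, r, n):
    # p, r, n: (N, 3) matched points, reference points and unit normals.
    # Linearization: T p ~ p + omega x p + t, so the residual
    # n . (T p - r) ~ n . (p - r) + (p x n) . omega + n . t.
    b = -np.einsum('ij,ij->i', n, p - r)    # negative residuals
    J = np.hstack([np.cross(p, n), n])      # (N, 6) Jacobian [omega | t]
    x, *_ = np.linalg.lstsq(J, b, rcond=None)
    return x[:3], x[3:]                     # small rotation vector, translation

Applying the returned small transform, re-matching the point pairs and repeating realizes the three steps listed above.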

4.7 Map free-space

To detect points in the active scan that correspond to moving objects, the free space in the reference scans needs to be compared to the occupied space in the active scan. To make this an efficient operation, the free space needs to be represented in an alternative way. In this case a lookup table is used.

The table is a grid over azimuth and elevation angle in spherical coordinates. Each grid cell stores the range of the point closest to the laser among all the points that fall into that grid cell. A 2D example can be seen in Figure 4.6, where the discretization is made over the azimuth angle. This also illustrates the consequences of having too low a resolution: the free space between the laser and the objects in the upper left doesn't get correctly mapped. A too high resolution, on the other hand, gives problems in the detection process; this will be shown in the results (Chapter 5).

The figure also illustrates that no measurements in a section yields an unknown range. According to the manual, a measurement without return can be regarded as free space up to 65 meters. During development, no returns were also observed in other cases. These stem from specular reflection, which means that the light didn't disperse and reflect back to the sensor.

This can be seen, for example, on car roofs at certain angles, which often don't give returns due to this phenomenon. This is unpredictable and often gives a strong indication of motion when the object is in fact standing still. These sections are therefore treated as unknown range and cannot be used for detection of free-space violations. Results showing this can be seen in Chapter 5.


When the grid is created, the active scan is also trimmed so as not to have points outside the grid's boundaries. Indices mapping all points to their respective cells in the lookup table are also created. In total, the algorithm has four design parameters: the range of elevation angles in the table and the resolution of the grid in both angles. The resolution is set slightly lower than the resolution specified by the Velodyne manual; this is to be on the safe side and not get empty cells at an object's location. The angle ranges are also set with some margin, at least for measurements toward the ground. The parameters are listed in Table 4.1.


Figure 4.6: Illustrative figure displaying the free-space mapping in 2D. The dots are laser measurements; each circle section has a radius equal to the range of the closest point. Question marks illustrate that nothing can be said about those sections since no measurements exist. R is the laser position in the reference scan being free-space mapped.

Parameter              Value
Azimuth resolution     0.5 degrees/cell
Elevation resolution   0.4 degrees/cell
Max elevation          25 degrees
Min elevation          -4 degrees

Table 4.1: Design parameters for the free-space lookup table.
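Building the lookup table amounts to binning the reference scan by azimuth and elevation and keeping the minimum range per cell, with NaN marking cells without returns (treated as unknown, per the discussion above). A sketch with the parameters of Table 4.1; the coordinate conventions and names are assumptions:

import numpy as np

def build_free_space_table(points, az_res=0.5, el_res=0.4,
                           el_min=-4.0, el_max=25.0):
    # points: (N, 3) in the reference scan's sensor frame (nonzero range).
    rng = np.linalg.norm(points, axis=1)
    az = np.degrees(np.arctan2(points[:, 1], points[:, 0])) % 360.0
    el = np.degrees(np.arcsin(points[:, 2] / rng))
    n_az = int(round(360.0 / az_res))
    n_el = int(round((el_max - el_min) / el_res))
    table = np.full((n_az, n_el), np.nan)   # NaN = unknown range
    i = (az / az_res).astype(int) % n_az
    j = ((el - el_min) / el_res).astype(int)
    ok = (j >= 0) & (j < n_el)              # drop points outside elevation range
    for ii, jj, rr in zip(i[ok], j[ok], rng[ok]):
        if np.isnan(table[ii, jj]) or rr < table[ii, jj]:
            table[ii, jj] = rr              # keep the closest return per cell
    return table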



4.8 Project scan and detect points

Since the lookup table is in the coordinate frame of the reference scan, the active scan needs to be projected into the same frame. Projecting the reference scan into the active scan's coordinate frame would be more error prone, since the center of the lookup table wouldn't be the center of the reference scan. This would give a false picture of the free space and lead to false detections. Figure 4.7 and Figure 4.8 show examples of these situations. The former figure uses the reference scan's sensor position as the center for the free-space mapping, while the latter uses the active scan's. The latter case will detect the object which is now further away from the sensor, even though it could also be an object that was occluded before; the occluded space is falsely marked as free space.

In the free-space mapping the full reference measurement was used, to give a good free-space map. When obtaining the rotation R and translation q from the ICP algorithm, the segmented scans were used for alignment. This is because the ground points in both scans tend to pull the alignment so that the sensor positions match, which is not desired. With the rotation R and translation q from the ICP algorithm, the active scan can be aligned with the reference scan, giving the new projected active scan S_ap:

S_ap = R S_a + q.    (4.6)

Each point in S_ap is then compared with the corresponding cell in the free-space lookup table. Points that intrude into the free space by more than a static threshold are marked as candidate points for moving objects. The threshold is based on the range measurement deviation estimated in earlier experiments, and on empirical testing. A threshold requiring a 0.3 m difference between the range measurements was found to work sufficiently well. This is somewhat higher than what would be necessary if the measurements were taken from the same viewpoint, but this is mostly not the case.

Segments are then marked as moving if they pass two further thresholds. A certain percentage of a segment's points are required to be candidate points; this is to reduce the effects of different viewpoints, discretization and segmentation faults. The second threshold is a minimum required number of candidate points per segment. This threshold serves to avoid marking segments that consist of very few points, which may arise from dust, strange reflections, etc. Both thresholds are only empirically tested. The standard thresholds used are given in Table 4.2.

This procedure is for one comparison between a reference scan and the active scan; the results then need to be brought together. This is simply done by classifying a segment as moving if any of the comparisons yielded that result. Different variants of the final weighting will be shown in the results.
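Combining Equation (4.6) with the thresholds of Table 4.2, a single pairwise comparison can be sketched as below; it reuses the hypothetical lookup-table layout from the sketch in Section 4.7, and segment labels are assumed to come from the segmentation step (0 meaning discarded):

import numpy as np

def detect_moving_segments(proj_points, labels, table, az_res=0.5,
                           el_res=0.4, el_min=-4.0,
                           d_thresh=0.3, pct=0.5, min_pts=5):
    # proj_points: active scan projected into the reference frame, Eq. (4.6),
    # with nonzero ranges; labels: per-point segment ids.
    rng = np.linalg.norm(proj_points, axis=1)
    az = np.degrees(np.arctan2(proj_points[:, 1], proj_points[:, 0])) % 360.0
    el = np.degrees(np.arcsin(proj_points[:, 2] / rng))
    i = (az / az_res).astype(int) % table.shape[0]
    j = np.clip(((el - el_min) / el_res).astype(int), 0, table.shape[1] - 1)
    ref_rng = table[i, j]
    # candidate point: at least d_thresh inside the reference free space,
    # and only where the cell is known (not NaN)
    candidate = ~np.isnan(ref_rng) & (rng < ref_rng - d_thresh)
    moving = [seg for seg in np.unique(labels) if seg != 0
              and candidate[labels == seg].sum() >= min_pts
              and candidate[labels == seg].mean() >= pct]
    return moving, candidate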



Figure 4.7: The dots A and R represent the viewpoints from which the active and reference scans were taken. The scans are matched and free space is mapped with R as center. The blue dashed line is the object's position in the reference scan; the blue dash-dotted line represents the measurement points. None of them are in free space, as it should be.


Figure 4.8: The dots A and R represent the viewpoints from which the active and reference scans were taken. The scans are matched and free space is mapped with R as center. The blue dashed line is the object's position in the reference scan; the blue dash-dotted line represents the measurement points. We see that some of them are in free space; this is not a valid conclusion, since it could simply be another object in S_r blocking the one represented in S_a.


There is a drawback in this approach that arises when scans from different viewpoints are compared. The laser sensor has an uncertainty in both range and angular precision, which in this angular grid approach means that an angular uncertainty can lead to a range difference when there should not be any, or a range uncertainty can make the measurement end up in the wrong grid cell. An illustration of the range uncertainty case can be seen in Figure 4.9.


Figure 4.9: Difference in viewpoint between the active and reference scans might cause measurements of the same spot to fall into different grid cells due to range error.

Threshold               Value
Range difference        0.3 m
Percentage candidates   50%
Minimum candidates      5

Table 4.2: Standard thresholds used in the detection step.


5 Results

This chapter explains the data sets used. Results that illustrate the design choices are presented, followed by figures showing the limitations of the algorithm.

5.1 Data

The data used is supplied by the ACFR and was collected using a Velodyne HDL-64E S2. This is a LiDAR (Light Detection and Ranging) sensor with 64 vertically mounted lasers in a unit rotating 360 degrees. All of the data was collected in Sydney with the sensor doing 20 revolutions per second; Table 5.1 describes the data sets used. There are three sets where the sensor moves and two where it is stationary. Redfern1 is useful for testing performance at standstill in an urban environment and is compared with the moving data sets. The other set with a non-moving sensor, Aikido, is useful for testing detection of slight human motion. Redfern2 has cars driving by at higher speed than Redfern3, which has more cars and pedestrians moving. The Opera set is useful for testing pedestrian motion detection during sensor motion and also features a loop closure, making it possible to use scans from different passes of the same area.


Data set   Laser    Scenario
Redfern1   Still    Laser base standing still by an intersection; a truck drives by and pedestrians walk.
Redfern2   Moving   Laser base meets two cars on a road with cars parked on both sides.
Redfern3   Moving   Laser base approaches an intersection just as the lights turn green and drives through; cars and people move around.
Opera      Moving   Laser base drives in by the Sydney Opera House, with a large number of people moving around. It also comes back the same way and therefore gives data from different points in time for the same area.
Aikido     Still    Laser base standing in the grass during Aikido practice, with a crowd standing mostly still and watching.

Table 5.1: The different data sets used.

5.2 Design choices

This section shows results regarding design choices.

5.2.1 Grid size

Grid size was an important factor for correct detection. Three examples are shown here: too small, close to the true resolution and too large. As seen in Figure 5.1, a small grid size (high resolution) yields fewer true detections, because the points are often compared to empty cells in between the cells that hold a range value. With too large cells (low resolution), the free space in the scan becomes smaller than it actually is, and detections might therefore be missed. Since the cells are large, the errors from viewpoint difference are smaller, which can be seen at the wall in the figure. The large cells however also make objects larger, so the car in the reference scan that is background to the car in the active scan stretches into the space that is unknown. Therefore a higher detection rate will be seen on objects that have little background in the reference scan; other objects will see a lower detection rate.

5.2.2 Handling of no returns

In the algorithm, areas that don't give a laser measurement are considered unknown. Figure 5.2 shows the result of treating these areas as maximum range measurements, while Figure 5.3 shows the result of treating them as unknown. Assuming a maximum range measurement increases the number of false candidate points.



(a) Too high a resolution yields fewer true positives, since the measurements are often compared with empty grid cells.

(b) A resolution close to the laser's true resolution (set slightly lower than the true one). Detections along the wall are from viewpoint difference and discretization.

(c) A too low resolution enlarges the background where it is lacking; the number of detected points is therefore higher there. In general the free space gets smaller and the detection rate lower.

Figure 5.1: These figures show the effects of different grid cell sizes in the free-space map. Green points are the active scan, magenta points are within the range threshold in free space, and blue asterisks are centers of grid cells in the reference scan's free-space map.


(a) Both active scan and detected points.

(b) Showing only detected points.

Figure 5.2: Green points belong to the active scan and magenta points are within the range threshold in the comparison. The green asterisk is the active scan's laser position and the magenta asterisk is the reference scan's. Black ellipses mark the true moving objects. Treating lack of returns as max range readings gives many points that are falsely detected by the range threshold.



(a) Both active scan and detected points.

(b) Showing only detected points.

Figure 5.3: Green points belong to the active scan and magenta points are within the range threshold in the comparison. The green asterisk is the active scan's laser position and the magenta asterisk is the reference scan's. Black ellipses mark the true moving objects. Treating lack of returns as unknown gives fewer false detections, but also fewer detections of true moving objects that lack background.


5.2.3 Effects of thresholds

This section holds figures showing the difference in results when changing the thresholds. Both the minimum required percentage of detected points in a segment and the minimum required number of detected points are tested in different scenarios.

Of the two, the percentage threshold has the greater impact on the results. It is used to lessen the impact of errors from the sensor, the segmentation and viewpoint differences. Figure 5.4 shows a scenario where the sensor is in the same position in both of the compared scans. The four persons training Aikido and the passing person are marked by black ellipses and are certainly known to have moved. Only the 10 percent setting manages to detect all of them in this scenario. This is on the other hand a difficult situation, where the objects move on almost the same spot and only one second has passed in between.

In the next scenario, the measurements are taken from different viewpoints and the percentage threshold needs to be increased to a higher value. This can be seen in Figure 5.5, where the actual moving objects are again marked with black ellipses. The 10 percent setting gives many false positives and even detects a wall. The 30 percent setting still detects the same correct segments as the former setting and gives 7 false positives. At 50 percent, 2 of the true moving segments that were detected at the 10 percent setting are missed. This setting however gives only two false positives and is the one used to generate the other results. The setting is chosen so as not to risk deleting large static segments; since many viewpoints are used, it still gives a high detection rate.

The threshold specifying the minimum number of required points hasn't been tested as much. In the used data sets from an urban environment it is not as useful as the percentage threshold. If it is set too high, it might cause pedestrians at the sensor's outer range to be marked as static. Additionally, any small static object that might be detected as moving and excluded from the map won't make much difference for localization with the map. It might however be of interest if the goal is to make a model of the static environment.

Most of the results are generated with this threshold set to five; the algorithm still seems to detect far away pedestrians while avoiding detection of some small static objects. Figure 5.6 shows one comparison from the Opera data with the threshold set to 0, 5 and 10. In the case with 10, pedestrians close to the water (top of figure) aren't detected. A threshold of five gives the same true detections as zero, but one fewer false positive. Looking at the results in general, it seems that this threshold can be skipped in these environments, at least if the percentage threshold is set as high as 50%.



(a) The two scans compared; moving objects are marked with ellipses.

(b) Percentage threshold set to 10%; detected segments are cyan.

(c) Percentage threshold set to 30%; detected segments are cyan.

(d) Percentage threshold set to 50%; detected segments are cyan.

Figure 5.4: This figure shows the detected segments in a scenario with four persons practicing Aikido and one person passing by. The difference in time between the scans is approximately one second. Only the 10% setting manages to detect all of them, but many other objects are also detected. The minimum number of points is set to five.


(a) The two scans compared, moving objects are marked with ellipses.

(b) Percentage threshold set to 10%, detected segments are cyan.

(c) Percentage threshold set to 30%, detected segments are cyan.

(d) Percentage threshold set to 50%, detected segments are cyan.

Figure 5.5: This figure shows the detected segments in a scenario where a car drives through an intersection. The difference in time between the scans is approximately one second. The minimum number of points is set to five. In this scenario, the 10 percent threshold gives many falsely detected segments. The 30 percent case gives around 7 false positives but still detects all the segments correctly detected at 10 percent. The 50 percent setting misses 2 of the moving segments but gives only 2 false positives.


(a) The two scans compared. (b) Minimum detected points threshold set to zero, detected segments are cyan.

(c) Minimum detected points threshold set to five, detected segments are cyan.

(d) Minimum detected points threshold set to 10, detected segments are cyan.

Figure 5.6: This figure shows the detected segments in a scenario at the roundabout close to the Sydney Opera House. A large number of people are walking alongside the water. If the minimum points threshold is set to five, the only difference in detection compared to no threshold is a small static object marked by the black ellipse. If the threshold is increased to 10, more objects go undetected; the two at the top of the picture are people that should be detected. The percentage threshold is here set to 50%.


5.2.4 Thresholding on one comparison or several

There are several options for which thresholds to use and how to apply them. The two thresholds used when generating these results have been applied in two ways. The approach used the most is to apply the thresholds to each pairwise comparison and then consider a segment moving if it is detected in any comparison.

The other approach is to use all the candidate points, i.e. points within free space according to the range threshold, from all the pairwise comparisons. This means that a point counts as detected if it was within the range threshold in any of the pairwise comparisons. The percentage and minimum number of points thresholds are then applied to these detected points, collected from all reference scans that were compared with the active scan.

An example is shown in Figure 5.14 and Figure 5.7. The former figure shows the first approach, where any segment that is not green is detected as moving. The latter figure shows the second approach, where all the cyan segments are detected as moving. Since both approaches use minimum 5 and at least 50% detected points as thresholds, the latter approach, which fuses the candidate points, will give more detections.

Although the latter approach combines the information, it also combines the errors from all the measurements; it might therefore be necessary to increase the percentage threshold. Doing this will however make it more difficult to detect objects that can only be seen in one of the reference scans.
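Reusing the segment_is_moving sketch from the previous section, the two ways of combining the pairwise comparisons can be contrasted as follows. Again, this is an illustrative sketch with hypothetical names and data structures, not the actual implementation.

    def moving_any_comparison(counts_per_ref: list[int], n_total: int) -> bool:
        """First approach: threshold each pairwise comparison on its
        own and flag the segment if any single comparison detects it."""
        return any(segment_is_moving(n, n_total) for n in counts_per_ref)

    def moving_fused(candidates_per_ref: list[set[int]], n_total: int) -> bool:
        """Second approach: pool the candidate point indices from all
        comparisons (a point counts if it was inside free space in any
        comparison) and apply the thresholds once to the union."""
        fused = set().union(*candidates_per_ref)  # empty input gives an empty set
        return segment_is_moving(len(fused), n_total)

With identical threshold values, moving_fused can only flag more segments than moving_any_comparison, which matches the behaviour described above.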


Figure 5.7: The figure shows the result of thresholding combined candidate points from four pairwise comparisons. The thresholds used are 50% and minimum 5 points. The ellipses mark the segments detected with this way of thresholding that were not detected by thresholding every pairwise comparison. One of them is a correctly detected segment, but the others are false positives.


5.3 Result and limitations

Different cases are presented here to illustrate the capabilities and limitations of the algorithm.

5.3.1 Viewpoint difference

Figure 5.8 shows points within the range threshold when the same viewpoint is used at different times. Figure 5.9 shows points within the range threshold when the scans are taken from different viewpoints. In the latter case there are more false positives and they are more clustered, which makes it difficult to distinguish true moving objects from some static objects. This effect is due to the sensor uncertainty in combination with the different viewpoints, as discussed earlier.

5.3.2 Lack of background

Figure 5.10 shows a case where a car in the active scan is within the sensor’s blind spot in the reference scan. Since that is unknown space in the reference scan, the points in the active scan have no grid cells to compare against and the car cannot be detected. Similar cases appear with an empty street as background.

5.3.3 Segmentation-related detection errors

The segmentation algorithm sometimes has problems separating objects when they are too close, which will be referred to as under-segmentation. The opposite case might also occur, when measurements from one object result in several segments; this will be referred to as over-segmentation. In Figure 5.11a, a case with a pedestrian is shown. The pedestrian in question is marked by the black ellipse and most of its points are within the range threshold; these are colored magenta. This pedestrian isn’t detected because it is in the same segment as the wall. The wall contains a large number of points that aren’t within the free space, and the whole segment is therefore marked as static.

A case of over-segmentation is shown in Figure 5.11b. The truck is split into three segments, and the one marked by the ellipse doesn’t have enough points in the free space. This part of the truck is therefore considered static in that comparison. In this specific case, the segment was still detected in comparisons with other reference scans.

5.3.4 Occlusions

Some objects will not be detected due to occlusions in the reference scan. These situations are helped by using several reference scans, as this algorithm does. An example is shown in Figure 5.12, where the sensor position in the reference scan makes most of the pedestrian occluded. Only the points marked in magenta are deemed inside free space, which isn’t enough to detect it as non-static. Green points belong to the active scan and blue points are the ones the free space grid is based on.


(a) Both active scan, reference scan and detected points.

(b) Showing only detected points.

Figure 5.8: Green points belong to the active scan, red to the reference scan and the magenta points are within the range threshold in the comparison. The green asterisk is the active scan’s laser position, the magenta asterisk is the reference scan’s. Black ellipses mark the true moving objects. When the same viewpoint is used there are fewer false positives.

(a) Both active scan, reference scan and detected points.

(b) Showing only detected points.

Figure 5.9: Green points belong to the active scan, red to the reference scan and the magenta points are within the range threshold in the comparison. The green asterisk is the active scan’s laser position, the magenta asterisk is the reference scan’s. Black ellipses mark the true moving objects. When different viewpoints are used there are more false positives than when the same viewpoint is used.


Figure 5.10: The car marked by the ellipse isn’t detected since it’s in the blind spot of the reference scan. Green is the active scan, blue are the points included in the free space map and magenta are the points within the range threshold.

(a) Pedestrian marked by the ellipse is not detected because it is in the same segment as the wall.

(b) The truck is split into three segments, the one marked by the ellipse isn’t detected because not enough points are deemed to be in free space.

Figure 5.11: Figures showing under- and over-segmentation. The former case might result in an object not being detected, while the latter might result in part of an object not being detected.


Figure 5.12: Points forming the free space border are blue, the active scan is green and magenta marks points deemed inside free space. The sensor position in this reference scan makes most of the pedestrian occluded; only a bit of its left side is detected.

5.3.5 One versus many

Figure 5.13 shows an example where one scan is used as a reference scan. Figure 5.14 shows the result of using four scans as references instead, where each color marks detections from one reference scan. More true moving objects are detected this way. The only one missed is a part of the car, indicated by a red ellipse.


Figure 5.13: Detections comparing the active scan and one reference scan. Correctly detected objects are marked with black ellipses, red indicates moving objects not detected. 50% and at least 5 detected points required.

Figure 5.14: Detections using four reference scans. Correctly detected objects are marked with black ellipses, red indicates moving objects not detected. 50% and at least 5 detected points required.


6 Concluding remarks

This chapter first discusses what is achieved by this algorithm and then suggests some improvements that can be made.

6.1 Conclusions

The algorithm developed can detect non-static objects in LiDAR measurements by comparing them with other measurements. It can utilize measurements from different points in time and can therefore be used both offline and online, which is in accordance with the specifications made. The detection rate seems high, especially considering that only static thresholds are used, though more testing needs to be done to establish how good it actually is. The test cases show a potential for experimentation with different settings: not only the static values, but also how the final result for a scan is decided.

The threshold requiring a minimum number of detected points might very well be dropped in the scenarios tested; at the least, a threshold much larger than five is not recommended. The threshold requiring a minimum percentage of detected points was found to be crucial for a good balance between true positives and false negatives. Of the two methods for combining the results from the pairwise comparisons, the one that requires a segment to be detected in at least one comparison seems most beneficial. With this setting, a percentage threshold of around 30-50% gives good results. The threshold can be varied depending on the requirements regarding false detections.
