
DEGREE PROJECT IN COMPUTER SCIENCE AND ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020

3D LiDAR based Drivable Road Region Detection for Autonomous Vehicles

JIANGPENG TAO

KTH ROYAL INSTITUTE OF TECHNOLOGY


3D LiDAR based Drivable Road Region Detection for Autonomous Vehicles

JIANGPENG TAO

Master in Systems, Control and Robotics
Date: March 4, 2020
Supervisor: John Folkesson
Examiner: Patric Jensfelt
School of Electrical Engineering and Computer Science
Host company: Scania AB


Abstract


Sammanfattning (Abstract in Swedish)


Acknowledgements

This thesis project was performed at SCANIA CV AB, EARM group. The work was financed by the Vinnova research project iQPilot (project number 2016-02547), for which I am grateful. The objective of this project is to move a step closer to the introduction of self-driving heavy-duty vehicles in traffic environments. I would first like to express my great appreciation to Prof. John Folkesson and Dr. Batool Nazre for their persistent supervision and help throughout this thesis project. They encouraged and steered me in the right direction whenever I ran into trouble.

I would also like to thank all my colleagues at Scania who supported my work and provided valuable suggestions during my mid-term and final presentations, especially Dr. Zhan Wang and my manager Per Sahlholm.

In addition, a thank you to my thesis examiner Prof. Patric Jensfelt for his assessment and comments on this thesis.


Contents

1 Introduction
  1.1 Project Overview
  1.2 Research Question
  1.3 Outline

2 Related Work
  2.1 Rigid Model Fitting
  2.2 Occupancy Grid Map based
  2.3 Disparity based
  2.4 Deep Learning based
  2.5 Method Choice

3 Theory
  3.1 Stereo Vision
    3.1.1 Stereo Camera Model
    3.1.2 Stereo Matching
    3.1.3 U-V-disparity Domain
    3.1.4 3D Planes Projection in U-V-disparity Domain
  3.2 3D LiDAR
    3.2.1 LiDAR Model

4 Methodology
  4.1 Ground Surface Extraction
    4.1.1 Disparity Image Generation
    4.1.2 U-disparity Map
    4.1.3 V-disparity Map
    4.1.4 Crude Obstacle Removal
    4.1.5 Longitudinal Road Profile Extraction
    4.1.6 Road Point Extraction
  4.2 Road Boundary Detection
    4.2.1 Feature Design
    4.2.2 Boundary Model
  4.3 Implementation
    4.3.1 Dataset
    4.3.2 Experimental Equipment
    4.3.3 Practical Considerations

5 Results and Discussion
  5.1 Case study
    5.1.1 Ground Surface Extraction
    5.1.2 Candidate Boundary Point Identification
  5.2 Evaluation on KITTI Dataset
  5.3 Computation Complexity
  5.4 Good properties
  5.5 Limitations

6 Conclusions
  6.1 Conclusions
  6.2 Future Works

Bibliography

A Societal Aspects
  A.1 Ethics
  A.2 Sustainability
  A.3 Social Relevance

B Resulting Samples
  B.1 Road Boundary Extraction
    B.1.1 LiDAR Supplier's Dataset (Urban Scene)
    B.1.2 Scania Dataset (Suburb Scene)


Chapter 1

Introduction

Modern vehicles are increasingly equipped with advanced driver assistance systems (ADAS) to enhance driving and safety. More and more features are added and improved, including collision avoidance and alerts to potential dangers. Such systems are expected to grow more complex and reliable towards fully autonomous driving during the following decades. Historically, autonomous vehicle research began to gain momentum with the DARPA Grand Challenge in 2005 and the Urban Challenge in 2007. Significant research work has been done on the related sensing, perception, planning, and control systems. Since then, a large number of companies and research organizations have joined and developed prototypes of autonomous vehicles (AVs).

Drivable road region detection is a fundamental task in both ADAS and autonomous driving. The system is required to identify the road boundary and detect the surrounding obstacles, such as vehicles, pedestrians, guardrails, and buildings. Importantly, the detection result provides a straightforward perceptual cue for collision avoidance and path planning. Therefore, real-world applications put very high demands on its reliability. Research work [1] states that an acceptable error rate should be less than one error in 54,000 frames. Achieving such low error rates faces a myriad of challenges in realistic traffic scenes. There are numerous kinds of obstacles with different shapes, sizes, colors, textures, and dynamic states. The ground surface can be planar, bumpy, sloped, or undulating. Even structured urban roads might be built to different standards.

A wide variety of sensors equipped on AVs have proven useful for the task of drivable road region detection, including imaging sensors, LiDAR (light detection and ranging), and RADAR (radio detection and ranging). Each kind of sensor has its advantages and drawbacks. The imaging sensor is the most commonly used modality and can obtain regular and dense visual data with color and texture information. However, the data quality heavily depends on the visibility conditions: shadows, extreme weather (rain, fog, snow), and sun glare may cause unpredictable model output. On the other hand, RADAR has high reliability in range and velocity measurement even under bad weather, but its weaknesses are also obvious: it has a very narrow field of view (less than 15 degrees) and low accuracy in lateral directions, which means a large blind region exists. In contrast, LiDARs capture highly accurate geometry with a larger field of view and do not suffer from external illumination. Although the high cost currently limits the extensive use of LiDARs, experts expect this to fall to less than 100 dollars in the next 5 years. It is therefore of interest to survey the applicability of LiDAR data, without using photometric information, for the road perception task.

1.1 Project Overview

This project was conducted at the company Scania AB in Stockholm. Scania is developing a prototype of a fully autonomous bus called Klara. A 64-channel LiDAR is mounted with around 14 degrees of pitch angle on the head of Klara to capture depth cues ahead. This project aims to propose an algorithm to detect the drivable road region in front by classifying each 3D point as drivable road or obstacle. For this task, deep learning techniques have shown extraordinary results recently. However, Scania would prefer not to implement such data-driven black-box models, which may output unpredictable and unexplainable detection results under unseen conditions. Thus, the primary focus of this project is restricted to traditional methods exploiting geometry and projection attributes.


Figure 1.1: The block diagram of the proposed drivable road region detection system


computing the corresponding inverse depth up to a scale. In the U-disparity map, obstacles such as vehicles, pedestrians, trees, and guardrails are mostly presented as pixels with high intensity. In the V-disparity map, an ideal plane is projected as a straight line. However, for the non-flat or multiplanar ground surfaces of the real world, the corresponding projection is an indeterminate combination of line segments. Instead of 2D line fitting algorithms, the Sobel edge detection filter is used to extract a road profile from the V-disparity map. The extracted ground surface still retains some undrivable road structures, including curbs, ditches, and grass. The points corresponding to these road structures generally have a large geometrical variation from their neighbors. Therefore, we propose three kinds of features to detect such points, which are treated as candidate boundary points. The road boundary is assumed to be observable and straight. Thus, a robust linear regression method is applied to fit the left and right road boundaries. The ground points between the two resulting road boundaries represent the final drivable road region.

(a) Result of ground surface extraction and boundary regression (the ground truth of the drivable road region is presented in light blue; yellow denotes the ground surface; green points represent obstacles; feature points are denoted in blue; and the regressed road boundaries are plotted as red lines)

(b) Final result of drivable road region detection (green denotes true positives, blue false positives, and red false negatives)


1.2 Research Question

A frame of point cloud from a 64-channel LiDAR contains millions of 3D points, which makes searching and filtering operations among points computationally expensive. How can we design an efficient non-data-driven method to segment the point cloud into clusters of drivable road region and obstacles? Can such a method overcome the irregularity and sparsity of LiDAR data? Can such a method remain general across different traffic scenarios?

1.3 Outline

This report is organized as follows:

• Chapter 2 reviews the existing work in drivable road region detection.

• Chapter 3 describes the prerequisite theory needed to understand the proposed method.

• Chapter 4 explains the proposed method and the experimental setup.

• Chapter 5 presents and analyzes the experimental results in different road scenarios. The advantages and limitations of the proposed method are also discussed.


Chapter 2

Related Work

This chapter summarizes the related work in the area of drivable road region detection.

Over the decades, a large number of solutions based on different sensors have been proposed for road perception. Hillel [1] surveys the research progress in road and lane detection. The predominant approaches in this area apply their models to image data. Since traffic infrastructure, such as lane markings and road signs, is specifically designed and built for the human visual system, the texture and color cues in image data are the most straightforward. Besides, with current manufacturing development, cameras are cheap to acquire and robust in use. With prior knowledge of traffic scenes, road region detection is generally cast as a pixel-wise segmentation problem [4, 5, 6] or a vanishing point detection task [7, 8, 9]. On the other hand, stereo imaging methods extract 3D information from an image pair and build models based on 3D spatial relations rather than photometric information [10, 11]. Although the depth estimation of stereo imaging cannot reach the same accuracy and reliability as LiDARs, the modeling methodology regarding object detection and classification can mostly be transferred to LiDAR applications. In addition, in order to overcome the drawbacks of a single sensor, some works focus on the fusion of image and LiDAR data by projecting the LiDAR points onto the image plane [12, 13, 14].

This project aims at exploring a solution with pure LiDAR data. Thus, only work exploiting 3D cues will be discussed in the following.


2.1 Rigid Model Fitting

Traditionally, given 3D data, road perception follows this process: first, separate the traffic scene into geometrical structures such as road surfaces, curbs, poles, planes, and corners; then, establish rigid models to represent these structures.

Road surface

In general, the road surface is mostly assumed to be an ideal plane. The most conventional approaches to plane extraction are random sample consensus (RANSAC) [15], Hough based methods [16], and normal estimation. However, these methods are restricted to planar road surfaces. For non-planar road surfaces, Ai [17] applies a quadratic road model combined with least-squares fitting. In [18], the ground surface approximation is represented by a B-spline model.

Road boundary

Traffic roads are generally delimited by different kinds of road structures, including curbs, grass, ditches, and guardrails. Similarly, all these structures have large elevation variations in their surroundings. The differential filter is commonly used to extract curb features from a single laser scan and can be convolved with different kinds of data representations, such as elevations [19, 20], ranges [21], and top-view Euclidean positions [22]. Some prior knowledge is utilized to improve the accuracy of curb detection, including road width and curb height [21]. In [20], the least trimmed squares regression method is applied to robustly fit the road boundary. Work by Fernández [23] proposes a method based on 3D curvature which can tackle curbs with different heights.

Lane mark


2.2 Occupancy Grid Map based

The occupancy grid map is one of the commonly used representations of LiDAR data. It splits a certain range of space into equally sized cells. Each cell stores some predefined state variables and is recursively updated based on the Bayes rule. Herbert et al. [26] introduce the concept of the elevation map, which represents 3D information in a 2D bird's-eye view and stores the height of the surface in each cell. Triebel et al. [27] propose a multiple-surface representation which extends the model's ability to handle complex structures. Compared to the conventional point cloud representation, such grid maps are more compact and organized.

In the grid map, each cell corresponds to a number of different 3D points. Generally, the variation of each set of points acts as a strong cue for the 3D object detection task. In [28], the elevation histogram of the points in each cell is combined with a graph model to detect obstacles. Zhao et al. [29] propose a robust curb and road surface detection method by extracting three spatial cues: the elevation difference, the gradient value, and the normal orientation.

2.3 Disparity based

Disparity encodes 3D information, and several kinds of methods model traffic scenes based on disparity characteristics.

U-V-disparity


further perception work on other road features was carried out, including lane markings [33, 34] and potholes [35, 36].

Stixel world

Another technique based on disparity is the "Stixel World" [11], which was proposed and mainly promoted by David Pfeiffer. The Stixel World is a medium-level representation of 3D traffic scenes that segments an image or depth map into superpixels, where each superpixel is a rectangular stick with a certain height and class label, named a "stixel". In the U-V-disparity domain, its segmentation is inferred by solving a maximum-a-posteriori (MAP) problem, minimizing an energy function [37, 38]. In general, the Stixel World can be used to separate free space, static obstacles, moving objects, and background [11, 39, 40]. It has also proven applicable to both stereo images and 3D LiDAR range data [41, 37]. Franke further extended the availability of this technique to the cases of adverse weather [42] and slanted streets [43]. Besides disparity, image color and texture information is used to improve the performance [44, 45, 46].

Lidar-histogram

Inspired by the U-V-disparity work, Kong et al. [47] proposed a method called Lidar-histogram to segment a LiDAR point cloud into the road plane, positive obstacles, and negative obstacles. Generally, the U-V-disparity technique requires the disparity map generated from stereo image matching. An alternative way of acquiring a disparity map is projecting 3D point clouds onto an image plane. Similar to the V-disparity map, the Lidar-histogram simplifies the segmentation problem to a 2D linear fitting task. To refine the detected road region, Kong et al. [48, 49] further designed a row and column scanning strategy based on the height difference. However, its performance is limited by the sparsity and discontinuity of LiDAR data. Therefore, Kong explored improvements based on the fusion of LiDAR and camera, including upsampling the point cloud [50] and combining the FCNN-based results [51].

2.4 Deep Learning based

learning based methods have proven more generalized and accurate than traditional methods.

Predominantly, one category of approaches transforms the irregular LiDAR data into 2D images and treats the task as a typical image semantic segmentation problem, including SqueezeSeg [52] and PointSeg [53] in spherical coordinates, LiLaNet [54] in cylindrical coordinates, and work by Dewan [55] in bird's-eye view. With a 2D image representation, a great number of state-of-the-art image-based deep learning frameworks can be applied. Another category of methods projects point clouds onto regular Cartesian voxel grids and extends convolution layers to 3D space, such as SEGCloud [56] and OctNet [57]. Although the spatial relations are kept, this kind of method struggles to deal with sparse LiDAR point clouds and to reach real-time performance. Differently, PointNet [58] proposes a network architecture which can directly consume raw point clouds. Basically, it combines CNN structures with both local and global point features which are invariant to point order. It has proven useful and successful in both object classification and semantic segmentation tasks, but is restricted to relatively small-scale scenes.

2.5 Method Choice

For this project, deep learning based methods are not preferred due to their unexplainable output and the need for heavy data annotation work. Besides, the occupancy grid map discretizes point clouds into compact and organized cells, but results in lots of empty or barren cells; processing such data is inefficient and wastes computation.


Chapter 3

Theory

The basic theories about the U-V-disparity and 3D LiDAR sensor are explained in the following.

3.1 Stereo Vision

Stereo vision is a broad research topic in computer vision aimed at extracting 3D information from two or more images.

3.1.1 Stereo Camera Model

The most specialized and standard case of a stereo vision system consists of two identical pinhole cameras displaced horizontally from each other, in a manner similar to human binocular vision. Its general setup is illustrated in Fig. 3.1. Under the pinhole camera model, the central projection of a point (X, Y, Z) in the camera coordinates is simply expressed as a linear mapping:

\[
\begin{cases}
u = f_u \dfrac{X}{Z} + u_0 \\[2mm]
v = f_v \dfrac{Y}{Z} + v_0
\end{cases}
\tag{3.1}
\]

where u and v denote the image coordinates; f_u = α/s_u and f_v = α/s_v represent the focal length in terms of pixels; α is the focal length in terms of distance; s_u and s_v are the sizes of each pixel in the u and v directions. In the following, we assume f_u = f_v and replace both with f.


Figure 3.1: The general stereo camera model, where P is a point in the real-world coordinates, b is the baseline distance, O_{l,r} are the optical centers, and p = (u_0, v_0) is the principal point.

Homogeneous coordinates

Homogeneous coordinates are a specific coordinate system used in projective geometry. They work by adding an extra coordinate to a Euclidean coordinate. For example, a coordinate triple (kx, ky, k), with any non-zero value k, represents the same point (x, y) in 2D Euclidean space. In this way, the triple (x, y, 0) corresponds to the point (x/0, y/0) at infinity.


Where K is the intrinsic matrix corresponding to Eq. 3.1; γ denotes the skew coefficient between the x and y axes of the camera coordinates and is often approximately 0. Besides, [R | T] are the extrinsic parameters used to transform from world coordinates to camera coordinates, where R and T are respectively the rotation matrix and the translation vector, and k is a scaling factor.

Disparity

The disparity refers to the difference between the u coordinate of two corre-sponding points within a stereo image pair.

Based on Fig. 3.1, the transformation from the world coordinates to the two camera coordinates is achieved by a simple vector translation of ±b/2 in the x direction. Therefore, the disparity can easily be obtained by:

\[ \Delta = u_l - u_r = \frac{f b}{Z} \tag{3.4} \]

As shown in Eq. 3.4, the disparity is inversely proportional to the distance from the observer. Thus, given a disparity map, we can directly recover the 3D depth of each pixel.

3.1.2 Stereo Matching

Stereo matching is the process of estimating a 3D model of the scene by finding the pixels in two or more views that correspond to the same 3D position. Its primary task is to measure the disparity map accurately and efficiently. So far, stereo matching is still an actively studied and fundamental research area.

Basically, stereo matching techniques follow four steps:

1) Calibration: Obtains intrinsic and extrinsic parameters of the stereo camera offline.

2) Rectification: Uses the calibrated parameters to remove lens distortions and transform the stereo pair into the standard setup as Fig. 3.1.


4) Triangulation: Computes the 3D positions of each pixel based on Eq. 3.1 and 3.4, given the calibrated parameters and disparity map.

Although typical stereo matching algorithms are computationally expensive, utilizing FPGAs (Field Programmable Gate Arrays) or GPUs (Graphics Processing Units) can allow for real-time performance.

3.1.3 U-V-disparity Domain

The U-V-disparity domain is used to describe the relationship between image coordinates (u, v) and disparity in stereo vision. It is commonly used to detect ground surfaces and structured obstacles in 3D scenes.

In general, the stereo rig is mounted on a robot approximately parallel to the ground surface, as shown in Fig. 3.2. We use ψ, φ, θ to represent the yaw, roll, and pitch angles of the camera coordinates with respect to the world coordinates. These three angles are often approximately equal to 0.

Figure 3.2: The setup of the stereo camera and world coordinate system, where the parameter notation is the same as in Fig. 3.1.

\[
K = \begin{bmatrix} f & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad
R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\psi & 0 & \sin\psi \\ 0 & 1 & 0 \\ -\sin\psi & 0 & \cos\psi \end{bmatrix}
\begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
T_{l,r} = \begin{bmatrix} \pm b/2 \\ 0 \\ 0 \end{bmatrix}
\tag{3.5}
\]

From Eq. 3.2 and 3.5, we can derive the image coordinates (u, v) and the disparity:

\[
\begin{cases}
u_{l,r} = u_0 + f\,\dfrac{X\cos\psi\cos\phi - Y\cos\psi\sin\phi + Z\sin\psi \pm b/2}{k} \\[2mm]
v = v_0 + f\,\dfrac{X(\cos\theta\sin\phi - \sin\theta\sin\psi\cos\phi) + Y(\cos\theta\cos\phi + \sin\theta\sin\psi\sin\phi) - Z\sin\theta\cos\psi}{k} \\[2mm]
\Delta = u_l - u_r = \dfrac{f b}{k} \\[2mm]
k = X(\sin\theta\sin\phi - \cos\theta\cos\phi\sin\psi) + Y(\sin\theta\cos\phi + \cos\theta\sin\phi\sin\psi) + Z\cos\theta\cos\psi
\end{cases}
\tag{3.6}
\]

To simplify Eq. 3.6, we set the yaw and roll angles to 0 and define new image coordinates (U, V) with respect to the camera principal point:

\[
\begin{cases}
U_{l,r} = u_{l,r} - u_0 = f\,\dfrac{X \pm b/2}{Y\sin\theta + Z\cos\theta} \\[2mm]
V = v - v_0 = f\,\dfrac{Y\cos\theta - Z\sin\theta}{Y\sin\theta + Z\cos\theta} \\[2mm]
\Delta = u_l - u_r = \dfrac{f b}{Y\sin\theta + Z\cos\theta}
\end{cases}
\tag{3.7}
\]

3.1.4 3D Planes Projection in U-V-disparity Domain


Figure 3.3: The typical planes in stereo system

Horizontal plane

Horizontal planes (white in Fig. 3.3) in the world coordinates can be simply described as:

\[ Y = \lambda \tag{3.8} \]

Substituting Eq. 3.8 into Eq. 3.7 gives:

\[ \frac{\lambda}{b}\,\Delta = V\cos\theta + f\sin\theta \tag{3.9} \]

Therefore, a horizontal plane in the world coordinates is projected as a straight line in the V-disparity domain.

Vertical plane

Vertical planes (yellow in Fig. 3.3) in the world coordinates can be represented as:

\[ Z = \lambda \tag{3.10} \]

Combining Eq. 3.7 and 3.10, we can deduce:

\[ \frac{\lambda}{b}\,\Delta = -V\sin\theta + f\cos\theta \tag{3.11} \]

This shows that a vertical plane in the world coordinates is also projected as a straight line in the V-disparity domain. If the pitch angle θ is sufficiently small, Eq. 3.11 becomes equivalent to Eq. 3.4:

\[ \Delta \approx \frac{f b}{\lambda} \tag{3.12} \]

Side surface plane

Side surface planes (purple in Fig. 3.3) in the world coordinates can be expressed as:

\[ X = \lambda \tag{3.13} \]

Substituting Eq. 3.13 into Eq. 3.7 yields a linear relationship in the U-disparity domain with respect to the left image:

\[ \frac{2\lambda + b}{2b}\,\Delta = U_l \tag{3.14} \]

Oblique plane

Some other more general types of planes also exist in man-made environments. The red case in Fig. 3.3 can be described as:

\[ Z = kY + m \tag{3.15} \]

Combining Eq. 3.7 and 3.15, we can show that such planes are also projected as straight lines in the V-disparity domain:

\[ \frac{m}{b}\,\Delta = -V(\sin\theta + k\cos\theta) + f(\cos\theta - k\sin\theta) \tag{3.16} \]

The green case in Fig. 3.3 can be expressed as:

\[ Z = kX + m \tag{3.17} \]

Similarly, we can deduce the relationship between (U, V) and the disparity:

\[ \frac{2m - kb}{2b}\,\Delta = -V\sin\theta - kU_l + f\cos\theta \tag{3.18} \]

When the pitch angle θ is sufficiently small, the disparity becomes linearly related to U:

\[ \frac{2m - kb}{2b}\,\Delta \approx f - kU_l \tag{3.19} \]

The blue case in Fig. 3.3 can be modeled by:

\[ Y = kX + m \tag{3.20} \]

With a similar principle, we can obtain:

\[ \frac{2m - kb}{2b}\,\Delta = kU_l + V\cos\theta + f\sin\theta \tag{3.21} \]

Differently, the projection of this type of oblique plane does not follow a linear relation in a single disparity map, since Eq. 3.21 involves U, V, and Δ simultaneously.

3.2 3D LiDAR

Currently, the mechanical scanning LiDAR is the most commonly used type of laser sensor in autonomous driving. It can collect data over a wide area of up to 360 degrees by physically rotating a laser/receiver assembly or by rotating a mirror to steer a light beam. Each emitted beam is known as one channel, and various numbers of channels are available, including 1, 4, 16, 32, 64, and 128. The range measurement can be directly calculated based on the time difference between the emitted pulse and the received pulse.

3.2.1 LiDAR Model

A typical 64-channel LiDAR can capture millions of precise distance measurement points every second. The emitted orientation of each point is generally fixed and defined by the azimuth and zenith angles. Therefore, the raw LiDAR data consists of four kinds of information: azimuth angle, zenith angle, range, and intensity. Besides, the resolution of the azimuth angle can be adjusted by changing the rotational speed. For example, a rotational frequency of 10 Hz corresponds to collecting one point per 0.2 degrees, so the azimuth angle resolution is 360/0.2 = 1800 points per scan.
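To make this sensor model concrete, a minimal sketch of converting raw measurements into Cartesian points is given below. The axis convention (x forward, y left, z up, zenith measured from the horizontal plane) and the function name are illustrative assumptions, not the specification of the sensors used in this project.

```python
import numpy as np

def spherical_to_cartesian(azimuth_deg, zenith_deg, rng):
    """Convert raw LiDAR measurements (azimuth, zenith, range) to Cartesian XYZ.

    Assumed convention: x forward, y left, z up; the zenith angle is measured
    from the horizontal plane. Inputs are arrays of equal length.
    """
    az = np.radians(azimuth_deg)
    ze = np.radians(zenith_deg)
    x = rng * np.cos(ze) * np.cos(az)
    y = rng * np.cos(ze) * np.sin(az)
    z = rng * np.sin(ze)
    return np.stack([x, y, z], axis=-1)
```

Under the 10 Hz example above, one full sweep of a 64-channel sensor would yield a 64 × 1800 grid of such measurements.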


Figure 3.4: Model of mechanical scanning LiDARs (the blue cylinder denotes the LiDAR sensor and the long arrows represent laser beams)

(a) Intensity image

(b) Range image


Chapter 4

Methodology

This chapter describes the proposed method for detecting the drivable road region and its actual implementation in detail.

4.1 Ground Surface Extraction

The U-V-disparity techniques have proven useful and robust for obstacle and ground surface detection in traffic scenes. However, most research works derive and apply the model for the case of image data. This work extends the applicability of the disparity-based method to 3D LiDAR data.

4.1.1 Disparity Image Generation

In general, the disparity in stereo vision is obtained by matching pixels in an image pair and computing its distance. Alternatively, LiDARs measure the depth of each real-world point, and the disparity can be directly obtained based on Eq. 3.4.

Central projection

We can assume one camera plane is placed parallel to the Y-Z plane of the LiDAR coordinates as shown in Fig. 4.1. Under the basic pinhole model, each forward 3D point from LiDAR can be mapped to image coordinates (u, v) by Eq. 3.1.

In general, the density of the LiDAR point cloud is relatively lower than the image resolution. As a result, the projected image will be full of missing pixels, as shown in Fig. 4.2. The data density is principally determined by the focal length in terms of pixels f and the range of azimuth angles. Eq. 3.1 shows a negative correlation between the focal length and the density: if the focal length is too small, a large proportion of 3D points will be projected onto the same pixels; on the contrary, the image data will be too sparse with a large focal length. On the other hand, the image data becomes sparser as the deflection angle from the Z-axis increases. Thus, suitable values of these two parameters should be chosen carefully.

Figure 4.1: The setup of the virtual camera and LiDAR coordinate system (O_l is the origin of the LiDAR coordinates and coincides with the optical center O_i)

Due to the LiDAR's scanning property, the projected region has the shape of an hourglass, as shown in Fig. 4.2. A regular image is therefore obtained by cropping it to a rectangle.


Scaled disparity

Since the U-V-disparity methods only need to compute histograms of disparity, it is unnecessary to obtain the real disparity of stereo vision. The equivalent, scaled disparity is defined as:

\[ \Delta = k \cdot \frac{1}{X} \tag{4.1} \]

where k is a scale factor that can be an arbitrary appropriate constant value, and X is the forward distance of the point in the LiDAR coordinates.
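As an illustration of the central projection and scaled disparity, the following sketch projects forward-facing LiDAR points onto a virtual image plane; the focal length, image size, scale factor k, and axis convention are placeholder assumptions rather than the settings used in the thesis.

```python
import numpy as np

def lidar_to_disparity_image(points, f=400.0, width=800, height=300, k=100.0):
    """Project forward-facing LiDAR points onto a virtual image plane and store
    a scaled disparity (k / depth) per pixel.

    points: (N, 3) array in LiDAR coordinates, x forward, y left, z up.
    f, width, height, k are illustrative values, not the thesis settings.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    front = x > 0.5                              # keep points ahead of the sensor
    x, y, z = x[front], y[front], z[front]

    u = (-f * y / x + width / 2).astype(int)     # pinhole projection, Eq. 3.1
    v = (-f * z / x + height / 2).astype(int)
    disp = k / x                                 # scaled disparity, Eq. 4.1

    img = np.zeros((height, width))
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # if several points land on the same pixel, keep the nearest (largest disparity)
    np.maximum.at(img, (v[valid], u[valid]), disp[valid])
    return img
```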

Interpolation

As for the sparsity problem, the simplest solution is to linearly interpolate the missing pixels by finding the nearest non-zero pixels. In particular, some materials, such as water and glass, cannot bounce laser pulses back to the LiDAR receiver. Thus, in real datasets relatively large segments are often missing, so the search for non-zero pixels is limited to a particular range.

Basically, the interpolation follows two steps:

1) For a missing pixel with coordinates (u, v), find the nearest non-zero pixels above and below within the coordinate range [v − d, v + d], then calculate the new disparity by:

\[ \Delta_{new} = \frac{\Delta_{up}(v_{up} - v) + \Delta_{down}(v - v_{down})}{v_{up} - v_{down}} \tag{4.2} \]

2) If two non-zero pixels are not found in the v direction, process similarly in the u direction:

\[ \Delta_{new} = \frac{\Delta_{left}(u_{right} - u) + \Delta_{right}(u - u_{left})}{u_{right} - u_{left}} \tag{4.3} \]
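A minimal sketch of this two-step filling scheme is given below, assuming a 2D disparity image with zeros marking missing pixels; the search radius d and the standard distance-weighted interpolation are illustrative assumptions.

```python
import numpy as np

def interpolate_missing(img, d=4):
    """Fill zero-valued pixels of a sparse disparity image by linear interpolation,
    searching first along the v (row) direction and then along the u (column)
    direction within a radius of d pixels. d = 4 is an assumed value.
    """
    out = img.copy()
    h, w = img.shape
    for v in range(h):
        for u in range(w):
            if img[v, u] > 0:
                continue
            # step 1: nearest non-zero neighbours above and below within [v-d, v+d]
            up = next((i for i in range(v - 1, max(v - d, 0) - 1, -1) if img[i, u] > 0), None)
            dn = next((i for i in range(v + 1, min(v + d, h - 1) + 1) if img[i, u] > 0), None)
            if up is not None and dn is not None:
                out[v, u] = (img[up, u] * (dn - v) + img[dn, u] * (v - up)) / (dn - up)
                continue
            # step 2: fall back to the u direction
            lf = next((j for j in range(u - 1, max(u - d, 0) - 1, -1) if img[v, j] > 0), None)
            rt = next((j for j in range(u + 1, min(u + d, w - 1) + 1) if img[v, j] > 0), None)
            if lf is not None and rt is not None:
                out[v, u] = (img[v, lf] * (rt - u) + img[v, rt] * (u - lf)) / (rt - lf)
    return out
```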


(a) Original disparity image in grayscale

(b) Interpolated disparity image in grayscale

(c) Interpolated disparity image in pseudo color

Figure 4.3: Resulting sample of linear interpolation

4.1.2 U-disparity Map

The U-disparity map is obtained by computing disparity histograms column by column from the disparity image. The disparity value is manually set to range from 0 to maxDisparity_U and then evenly split into bins_U bins. The histogram is defined as follows:

\[
H_m^i = \sum_{n=0}^{rows} \xi_{m,n}, \qquad
\xi_{m,n} =
\begin{cases}
1, & \text{if } \Delta(m, n) \text{ is in the } i\text{th bin} \\
0, & \text{otherwise}
\end{cases}
\tag{4.4}
\]

where m and n are respectively the column and row index.


Figure 4.4: U-disparity map sample

4.1.3 V-disparity Map

Using a similar principle, the V-disparity map is obtained by accumulating the pixels with the same disparity in a row-wise manner. The two parameters maxDisparity_V and bins_V should be set up as well. The histogram is defined as:

\[
H_n^i = \sum_{m=0}^{cols} \xi_{m,n}, \qquad
\xi_{m,n} =
\begin{cases}
1, & \text{if } \Delta(m, n) \text{ is in the } i\text{th bin} \\
0, & \text{otherwise}
\end{cases}
\tag{4.5}
\]

As shown in Fig. 4.5a, the V-disparity map provides a side-view projection of the 3D scene.

Figure 4.5: V-disparity map: (a) original; (b) after crude obstacle removal
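As a sketch of how both histograms of Eq. 4.4 and 4.5 can be accumulated from the disparity image produced by the earlier projection step; max_disp and the number of bins are placeholder values, not the thesis settings.

```python
import numpy as np

def u_v_disparity(disp_img, max_disp=100.0, bins=64):
    """Build U- and V-disparity maps (Eq. 4.4 and 4.5) from a disparity image."""
    h, w = disp_img.shape
    valid = disp_img > 0
    # assign each valid pixel to a disparity bin in [0, max_disp]
    bin_idx = np.clip((disp_img / max_disp * bins).astype(int), 0, bins - 1)

    u_disp = np.zeros((bins, w), dtype=np.int32)   # rows: disparity bins, cols: image columns
    v_disp = np.zeros((h, bins), dtype=np.int32)   # rows: image rows, cols: disparity bins
    vs, us = np.nonzero(valid)
    np.add.at(u_disp, (bin_idx[vs, us], us), 1)    # column-wise histogram (U-disparity)
    np.add.at(v_disp, (vs, bin_idx[vs, us]), 1)    # row-wise histogram (V-disparity)
    return u_disp, v_disp
```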


4.1.4 Crude Obstacle Removal

In traffic scenes, most obstacles, such as vehicles, pedestrians, cyclists, buildings, and guardrails, can be represented as a combination of vertical planes in the world coordinates. As explained in Section 3.1.4, for the case of vertical planes the disparity is linearly correlated with only the column index u. Therefore, in a certain column, the pixels from obstacles will have roughly the same disparity value. That is, the pixels with high intensity in the U-disparity map correspond to obstacles.

Thus, the obstacles can be identified by applying a thresholding operation to the U-disparity map. If the intensity value of a pixel in the U-disparity map is larger than a certain threshold thr_U, the corresponding pixels in the original disparity image are labeled as obstacles.
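A sketch of this thresholding step, reusing the U-disparity map from the previous sketch; thr_u and max_disp are assumed values, not the thesis settings.

```python
import numpy as np

def remove_obstacles(disp_img, u_disp, max_disp=100.0, thr_u=15):
    """Mark pixels whose U-disparity count exceeds thr_u as obstacles and zero
    them out of the disparity image.
    """
    bins = u_disp.shape[0]
    h, w = disp_img.shape
    bin_idx = np.clip((disp_img / max_disp * bins).astype(int), 0, bins - 1)
    cols = np.tile(np.arange(w), (h, 1))
    obstacle = (disp_img > 0) & (u_disp[bin_idx, cols] > thr_u)
    cleaned = np.where(obstacle, 0.0, disp_img)   # obstacle pixels removed
    return cleaned, obstacle
```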

Figure 4.6: Resulting disparity image in pseudo color after obstacle removal (red means value of zero)

As shown in Fig. 4.6, not all pixels from obstacles are removed completely. Actually, this stage is intended to simplify the representation of the V-disparity map. As illustrated in Fig. 4.5b, the vertical lines with high intensity in Fig. 4.5a, which correspond to obstacles, are eliminated. In this way, the road profile can be extracted more accurately from the preprocessed V-disparity map, which is discussed in the following section.

4.1.5 Longitudinal Road Profile Extraction

With the linear characteristics described in Section 3.1.4, we can infer that, after the previous step, the pixels with high intensity in the V-disparity map are very likely to correspond to the ground surfaces and are supposed to form a continuous curve. In general urban scenes, the well-structured ground surface mostly consists of one single plane and is projected as an ideal straight line in the V-disparity map. Thus, typical 2D line fitting algorithms, like the Hough transform and geometric hashing, are proven applicable and reliable, but fail for the cases of non-flat or multiplanar ground surfaces.

Instead, the proposed method treats road profile extraction as an edge detection problem and applies the Sobel operator to detect the corresponding edges. The operator uses two 3×3 kernels which are convolved with the V-disparity map to compute the vertical and horizontal derivative approximations:

\[
G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * I_v, \qquad
G_y = \begin{bmatrix} +1 & +2 & +1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} * I_v
\tag{4.6}
\]

where I_v is the V-disparity map and ∗ denotes the 2D convolution operation.

The gradient in any orientation is approximated by combining the filter responses in the vertical and horizontal orientations, and the resulting gradient magnitude is defined as follows:

\[ G = \sqrt{G_x^2 + G_y^2} \tag{4.7} \]

Using the Sobel operator enhances the projection of the ground surfaces in the V-disparity map. Applying a thresholding operation can then directly separate the potential road-profile pixels, as shown in Fig. 4.7. The threshold parameter is denoted as magThr.

(a) Gradient magnitude (b) After thresholding operation

Figure 4.7: Resulting sample of the Sobel operator (Pixels in the red circles are outliers)


• First, the initial road profile consists of the rightmost white pixel of each row.

• Then an outlier checking process is carried out from the second-to-bottom row up to the top row. If the column index of the initial road profile Λ_i is out of the range [Λ_{i+1} − outlierThr, Λ_{i+1} + outlierThr], the outlier is replaced by the rightmost white pixel within this range. In particular, if no white pixel exists within this range, then:

\[ \Lambda_i = \Lambda_{i+1} - 1 \tag{4.8} \]

where i denotes the row index from 1 to the image height and outlierThr is a constant threshold value.
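The following sketch implements this row-by-row selection with the outlier check, assuming the thresholded V-disparity map is given as a boolean array; outlier_thr is an assumed value.

```python
import numpy as np

def extract_road_profile(binary_vdisp, outlier_thr=3):
    """Extract the longitudinal road profile from a thresholded V-disparity map.

    binary_vdisp: 2D boolean array (rows = image rows, cols = disparity bins).
    Returns, per row, the column index of the selected road-profile pixel.
    """
    h, w = binary_vdisp.shape
    profile = np.full(h, -1, dtype=int)
    for r in range(h):
        cols = np.flatnonzero(binary_vdisp[r])
        profile[r] = cols[-1] if cols.size else -1        # rightmost white pixel

    # outlier check from the second-to-bottom row up to the top row
    for r in range(h - 2, -1, -1):
        lo, hi = profile[r + 1] - outlier_thr, profile[r + 1] + outlier_thr
        if not (lo <= profile[r] <= hi):
            cols = np.flatnonzero(binary_vdisp[r, max(lo, 0):hi + 1])
            profile[r] = max(lo, 0) + cols[-1] if cols.size else profile[r + 1] - 1
    return profile
```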

Applicability without the interpolation step

All aforementioned results are based on the interpolated disparity image of Fig. 4.3b. We found that the Sobel operator also works for the original disparity image without the interpolation process.

As shown in Fig. 4.8a, some rows of the V-disparity map lack pixels with high intensity due to the missing-pixel problem in the original disparity image. However, the corresponding gradient magnitude remains high since the 3×3 Sobel filter takes the surrounding pixels into consideration. Further experiments show that the interpolation step is not necessary and that the road profile extraction method is applicable to sparse disparity images.

Figure 4.8: (a) V-disparity map without interpolation; (b) thresholded map


4.1.6 Road Point Extraction

Given the longitudinal road profile in the V-disparity map, the pixels whose disparity is smaller than the road profile can only correspond to negative objects, such as ditches and potholes. Pothole detection is beyond the scope of this project, and we simply assume that no pothole or other road surface distress exists. Besides, ditches are regarded as a class of road boundaries and are discussed in the following section. Thus, the region with disparity no larger than the corresponding road profile is labeled as ground surface, as shown in Fig. 4.9.

Figure 4.9: Extracted road region in pseudo color (red means value of zero)

Similarly, it is straightforward to label the LiDAR points based on the disparity of the extracted road profile as follows. The 3D points whose corresponding disparity is not larger than the road profile value are labeled as ground points. Besides, the points projected outside the image bounding box are called dead points, since they are not involved in the proposed method. All remaining points are obstacle points. Fig. 4.10 illustrates the labeling result.
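A sketch of this labeling rule, assuming the per-point image row v, disparity bin, and inside-image mask computed by the earlier projection sketch, together with the extracted road profile:

```python
import numpy as np

ROAD, OBSTACLE, DEAD = 0, 1, 2

def label_points(v, disp_bin, inside, profile):
    """Classify projected LiDAR points given their image row v, disparity bin,
    an 'inside image' mask, and the road profile (one disparity bin per row).

    Points whose disparity does not exceed the road profile are road points,
    the rest inside the image are obstacles, and points outside are dead points.
    """
    labels = np.full(v.shape, DEAD, dtype=int)
    road_bin = profile[np.clip(v, 0, len(profile) - 1)]
    is_road = inside & (disp_bin <= road_bin)
    labels[is_road] = ROAD
    labels[inside & ~is_road] = OBSTACLE
    return labels
```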

Why not label based on the image extraction result

Since comparing the disparity value pixel by pixel against the road profile yields the image result of Fig. 4.9, one intuitive but defective solution is to label the LiDAR points by checking whether the corresponding projection lies within the 2D road region.


(a) Front view

(b) Top view

Figure 4.10: Resulting sample of labeling LiDAR points (Yellow denotes road class, blue is dead points and green represents obstacle class).

(a) Pixel based (b) Point based


4.2 Road Boundary Detection

Despite the successful obstacle removal in the previous step, the result of ground surface extraction still retains some road structures which mostly act as the drivable road boundary, including curbs, ditches, grass, and sidewalks. These different boundary structures commonly have a large geometrical variation in their surroundings. In the following, several types of features are proposed to obtain the candidate boundary points. Then, a regression algorithm is used to fit the boundary function for the case of a simple straight road.

4.2.1 Feature Design

Each of the following three types of feature descriptors assigns a response value to every LiDAR point. A large value means that the corresponding point differs strongly from its surrounding points. A simple thresholding operation is applied to filter out the road boundary candidate points. The threshold parameter is denoted as feature_thr.

3D surface curvature

The surface curvature describes the variation along the surface normal and is defined as follows: for each point p, its surrounding points p_i within a certain radius are selected. The curvature corresponds to the relative weight of the smallest eigenvalue:

\[
\begin{cases}
\bar{p} = \dfrac{1}{k}\displaystyle\sum_{i=1}^{k} p_i \\[2mm]
C = \dfrac{1}{k}\displaystyle\sum_{i=1}^{k} (p_i - \bar{p})(p_i - \bar{p})^T \\[2mm]
\sigma = \dfrac{\lambda_0}{\lambda_0 + \lambda_1 + \lambda_2}, \quad \lambda_0 < \lambda_1 < \lambda_2
\end{cases}
\tag{4.9}
\]

where λ_0, λ_1, λ_2 are the eigenvalues of the covariance matrix C.
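A sketch of this curvature feature, assuming SciPy is available for the radius search; radius = 0.5 follows the parameter reported with Fig. 5.3, while the minimum neighbour count is an added guard.

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_curvature(points, radius=0.5):
    """Per-point surface curvature sigma = l0 / (l0 + l1 + l2), computed from the
    covariance of neighbours within `radius` (Eq. 4.9).
    """
    tree = cKDTree(points)
    sigma = np.zeros(len(points))
    for i, nbrs in enumerate(tree.query_ball_point(points, r=radius)):
        if len(nbrs) < 3:
            continue                              # too few neighbours for a stable estimate
        q = points[nbrs]
        cov = np.cov(q.T, bias=True)              # 3x3 covariance of the neighbourhood
        eig = np.sort(np.linalg.eigvalsh(cov))    # ascending eigenvalues
        total = eig.sum()
        sigma[i] = eig[0] / total if total > 0 else 0.0
    return sigma
```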


Elevation variance

It is assumed that the road is the smoothest surface, with unsmooth edges on both sides. Considering each laser scan, the variance in the vertical direction provides a discriminative descriptor of the smoothness. For each scan, the points labeled as ground surface are retained and sorted by their azimuth angles. For the ith point p_i = [x_i, y_i, z_i], its front and back k points are selected. The elevation variance is obtained by calculating the variance in the z direction:

\[
\begin{cases}
\bar{z} = \dfrac{1}{2k + 1}\displaystyle\sum_{j=i-k}^{i+k} z_j \\[2mm]
var = \dfrac{1}{2k + 1}\displaystyle\sum_{j=i-k}^{i+k} (z_j - \bar{z})^2
\end{cases}
\tag{4.10}
\]

As shown in Fig. 4.12, the road boundary can be identified by searching for the left and right local extreme peaks.

(a) LiDAR scan

(b) Elevation variance (k = 3)

Figure 4.12: Resulting samples of the elevation variance feature (yellow denotes the ground surface points and obstacle points are plotted in green; the points in the red circles correspond to curbs)
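A sketch of the windowed computation of Eq. 4.10, assuming the ground points of one scan are already sorted by azimuth; k = 3 follows the parameter reported with Fig. 5.3.

```python
import numpy as np

def elevation_variance(scan_points, k=3):
    """Elevation variance feature for one laser scan (Eq. 4.10).

    scan_points: (N, 3) ground points of a single scan, sorted by azimuth.
    """
    z = scan_points[:, 2]
    n = len(z)
    var = np.zeros(n)
    for i in range(k, n - k):
        window = z[i - k:i + k + 1]               # 2k + 1 neighbouring elevations
        var[i] = np.mean((window - window.mean()) ** 2)
    return var
```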

Least square error (LSE) of linear regression

edges. Therefore, considering the bird's-eye view of each scan, sequential points with low linear correlation are very likely to correspond to the road boundary.

Figure 4.13: Model of a laser scan on road surfaces, curbs, ditches and grass (the dotted line segments represent one laser scan)

For each scan, the points labeled as ground surface are first sorted by azimuth angle. For the ith point p_i = [x_i, y_i, z_i], the selected sequence consists of its front and back k points. It is assumed that the sequence of points can be modeled by a simple linear function:

\[ x_j = \beta_0 + \beta_1 y_j + \varepsilon_j, \quad j \in [i - k, i + k] \tag{4.11} \]

The parameters β can be easily estimated by standard linear least squares. The sum of squared residuals is a straightforward indicator of the degree of linearity:

\[ \text{Least square error} = \sum_{j=i-k}^{i+k} \varepsilon_j^2 \tag{4.12} \]

Similar to the elevation variance feature, the extreme peaks correspond to the candidate boundary points, as shown in Fig. 4.14.
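A sketch of the per-point least-square-error feature of Eq. 4.11 and 4.12 in the bird's-eye view; k = 5 follows the parameter reported with Fig. 5.3.

```python
import numpy as np

def lse_feature(scan_points, k=5):
    """Least-square-error feature for one laser scan.

    Fits x = b0 + b1 * y over a window of 2k + 1 points in the bird's-eye view
    and returns the sum of squared residuals per point.
    """
    x, y = scan_points[:, 0], scan_points[:, 1]
    n = len(x)
    lse = np.zeros(n)
    for i in range(k, n - k):
        ys = y[i - k:i + k + 1]
        xs = x[i - k:i + k + 1]
        A = np.stack([np.ones_like(ys), ys], axis=1)
        beta, residuals, *_ = np.linalg.lstsq(A, xs, rcond=None)
        # lstsq returns the residual sum directly unless the system is rank deficient
        lse[i] = residuals[0] if residuals.size else np.sum((xs - A @ beta) ** 2)
    return lse
```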

4.2.2 Boundary Model


(a) LiDAR scan

(b) LSE (k = 5)

Figure 4.14: Resulting samples of the LSE feature (the points in the red circles correspond to curbs)

The least trimmed squares (LTS) regression is used to robustly fit the road boundary from the candidate boundary points. Instead of minimizing the sum of squared residuals over all input points, it minimizes the sum of squared residuals over a subset of the input points. The feature points with large residuals are iteratively removed and do not affect the final fit.

In order to simplify this task, it is assumed that straight boundaries exist on both the left and right sides, and the road boundaries are fitted with a simple linear function. Therefore, this method cannot tackle the cases of no boundary observation, a single boundary observation, or winding boundaries.

Algorithm 1 The Least Trimmed Squares Algorithm

Input: Independent variable X = {x_1, ..., x_N}, dependent variable Y = {y_1, ..., y_N}, outlier portion p
Output: Estimated parameter β

1: Number of points to remove: n = length(X) × p
2: for 1 to n do
3:    β = EstimateParameter(X, Y)
4:    Ỹ = FitData(β, X)
5:    residual γ = abs(Ỹ − Y)
6:    remove the point with the largest residual γ from X and Y
7: end for
8: return β
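A runnable sketch of Algorithm 1 for a simple linear model; the outlier portion is an assumed value.

```python
import numpy as np

def least_trimmed_squares(indep, dep, outlier_portion=0.2):
    """Iteratively trimmed linear fit dep = b0 + b1 * indep, following Algorithm 1:
    at each iteration the point with the largest absolute residual is removed.
    """
    indep = np.asarray(indep, dtype=float)
    dep = np.asarray(dep, dtype=float)
    n_remove = int(len(indep) * outlier_portion)

    def fit(a, b):
        A = np.stack([np.ones_like(a), a], axis=1)
        beta, *_ = np.linalg.lstsq(A, b, rcond=None)   # EstimateParameter
        return beta, np.abs(A @ beta - b)              # residuals of the fitted data

    for _ in range(n_remove):
        beta, residual = fit(indep, dep)
        worst = int(np.argmax(residual))               # point with the largest residual
        indep = np.delete(indep, worst)
        dep = np.delete(dep, worst)
    beta, _ = fit(indep, dep)                          # final estimate on trimmed data
    return beta
```

For the boundary model of Eq. 4.11, the y coordinates of the candidate boundary points would serve as the independent variable and their x coordinates as the dependent variable.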


4.3 Implementation

This section introduces how the proposed method is implemented.

4.3.1 Dataset

We evaluated the aforementioned algorithm on three datasets, which encompass both urban and suburban scenes and two types of 64-channel LiDARs.

Scania dataset

A 64-channel LiDAR is mounted with a relatively large tilt angle on the head of the Scania autonomous bus Klara, as shown in Fig. 4.15. Such a large mounting angle makes the nearby (<12 m) point cloud especially dense, but at the expense of decreasing the effective perception distance. We collected a set of LiDAR data with 414 sequential frames in total at Södertälje. This data contains some typical suburban traffic scenes, such as ditches, tangled trees, non-flat road surfaces, and hills.

Figure 4.15: LiDAR setup on Scania Klara bus

Dataset provided by the LiDAR supplier


KITTI road dataset

The KITTI road estimation benchmark [59] consists of 289 training and 290 testing images, as well as the synchronized LiDAR data captured by the Velodyne HDL-64E. This dataset mainly focuses on well-structured urban road scenarios, but covers quite a range of different road contexts. The ground truth of drivable road areas has been annotated on each image. Using the provided calibration parameters between the camera and LiDAR frames, we can easily obtain the corresponding road annotation of each point.

4.3.2 Experimental Equipment

Hardware resources

In this project, a common laptop with an Intel Core i7-4720HQ CPU and 8 GB RAM was used for the proposed model.

Software resources

Several pieces of software and programming languages were used for the implementation of the different stages. The primary development was done in Matlab 2017b. However, the LiDAR supplier cooperating with Scania only provides a ROS interface to load raw data. Therefore, some work regarding raw data preprocessing and format conversion was developed on ROS Kinetic in C++. As for the KITTI dataset, an official Matlab development kit is available for basic data processing and visualization.

4.3.3 Practical Considerations

In general, a frame from a 64-channel LiDAR contains over one million 3D points. Processing such a huge amount of data is computationally heavy. In order to speed up the proposed algorithm, several assumptions are made as follows:

• The lane width generally ranges from 2.5 to 3.7 meters. The widest road with eight lanes and pavements should be less than 35 meters wide. Therefore, it is reasonable to only consider 3D points within the range of [−25, 25] meters in the y direction.


disparity image. Then the 3D points projected above the particular row are directly labeled as obstacles.


Chapter 5

Results and Discussion

Unfortunately, neither the Scania nor the LiDAR supplier dataset contains point-wise annotations of the obstacle and road surface classes. Therefore, we can only qualitatively analyze results for different traffic scenarios. Quantitative evaluation is then carried out on the KITTI benchmark. After that, both advantages and disadvantages of the proposed method are described.

5.1 Case study

Scania uses a 64-channel LiDAR with different specifications from the Velodyne HDL-64E, including a lower vertical angular resolution, a larger vertical field of view, and lower precision. Moreover, the LiDAR is unconventionally mounted with a large pitch angle. Most importantly, the proposed method is expected to prove useful for such LiDAR data and setup.

5.1.1 Ground Surface Extraction

Urban Scene

Urban traffic scenes generally consist of flat and planar road surfaces and well-structured road design, as well as vertical on-road obstacles. For a typical urban road surface, the longitudinal road profile is mostly presented as a perfect straight line in the V-disparity domain. Common line extraction algorithms like the Hough transform can be used to extract the road profile [10, 30, 47]. Although the proposed method instead formulates road profile extraction as an edge detection problem, it achieves satisfying performance on urban scenes.


As shown in Fig. 5.1, obstacles are all successfully identified, including vehicles, pedestrians, trees, and buildings. Points over 30 meters away are actually very sparse, but the method still successfully extracts the correct road surface points. More resulting samples in urban scenes are presented in Appendix B.1.1 to show its robustness and accuracy.

(a) Front view

(b) Top view and V-disparity map (The white pixels denote the extracted road profile)

Figure 5.1: Resulting sample of the LiDAR supplier’s dataset (The dead points are not included for better visualization)

Suburb Scene

Suburb scenes are dominated by nonplanar road surfaces and irregular structures, such as ditches, bushes, and hills. Line fitting techniques definitely fail to tackle such road cases since the mapping of the road profile in the V-disparity map is not a simple straight line. The work in [32] proposes a road profile extraction algorithm that identifies the point with the maximum intensity value in each row of the V-disparity map. We find it is only useful for traffic scenarios with a single lane.


the left bush, are misclassified as obstacles as shown in Fig. 5.2c. In contrast, our proposed method can deal with this challenging case. Appendix B.1.2 presents more resulting samples in other suburb scenarios.

In summary, the proposed method is reliable and accurate for the ground surface extraction task. Although a few misclassified points exist around the obstacles or on ground surfaces, it never incorrectly labels the whole vehicle or pedestrian clusters as ground surface on any LiDAR frame of the two datasets.

(a) Result of the proposed algorithm (front and right view)

(b) Result of the proposed algorithm (c) Result of the baseline algorithm [32]

Figure 5.2: Resulting samples of Scania dataset (Green points in the red circles represent trees)

5.1.2 Candidate Boundary Point Identification


The curvature feature is the most stable, but requires a dense enough point cloud. On the contrary, the LSE feature behaves better for distant and sparse point clouds. The elevation variance feature can deal with both close and distant points, but is relatively sensitive to noise.

(a) Curvature (b) Elevation variance (c) LSE

Figure 5.3: Resulting samples for identifying curb points (the blue points are the detected features; the parameter settings are as follows: curvature: radius = 0.5, feature_thr = 0.03; elevation variance: k = 3, feature_thr = 0.0015; LSE: k = 5, feature_thr = 0.02)

(a) Front view

(b) Curvature (c) Elevation variance (d) LSE


5.2 Evaluation on KITTI Dataset

Since the KITTI dataset is dominated by planar urban road scenarios, the ground plane extraction does not face much difficulty, as explained in Section 4.1.1. Besides, the elevation variance feature is chosen after carefully tuning parameters and comparing the performance of the three types of features. However, limited by the road boundary regression model, the proposed method fails in some tricky cases, as shown in Fig. 5.5. Therefore, the road boundaries are required to be straight and observable on both sides in order to make the boundary model effective. Fig. 5.6 presents some successful resulting samples. Finally, the yellow points between the two detected road boundaries are labeled as drivable road region. More resulting samples are shown in Appendix B.2.

KITTI officially defines several pixel-based metrics for evaluation in the 2D bird's-eye view space [60]. With a similar principle, classical metrics are used for the point-based evaluation as follows:

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{5.1} \]

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{5.2} \]

\[ \text{False Positive Rate (FPR)} = \frac{FP}{TP + FP + TN + FN} \tag{5.3} \]

\[ \text{False Negative Rate (FNR)} = \frac{FN}{TP + FP + TN + FN} \tag{5.4} \]

\[ \text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{5.5} \]

Results on the test set can only be evaluated via the official KITTI website. From the available training set, we manually select 165 frames which contain two straight, observable road boundaries. The evaluation result on these 165 frames is presented in Tab. 5.1.

Table 5.1: Evaluation on the selected 165 frames

Precision    Recall     FPR      FNR      Accuracy
94.36 %      98.45 %    2.07 %   0.56 %   97.37 %


negative rate. Generally, precision and recall are inversely related to each other: it is only possible to increase one at the cost of reducing the other. Considering the safety of an obstacle avoidance system, a relatively high precision is more vital. Therefore, the proposed method may encounter potential safety issues.

5.3 Computation Complexity

The proposed algorithm does not involve many complex calculations. In particular, the ground surface extraction method segments point clouds in 2D space mainly by simple comparison operations. Besides, after removing obstacles, the point cloud fed to the road boundary detection is greatly reduced.

The computational complexity is evaluated by the runtime of the developed programs in Matlab. Based on the results of Tab. 5.2, the ground surface extraction algorithm reaches an average speed of 23.26 frames per second. Computing the curvature feature is much more time-consuming than the other two features since it requires massive point-wise searching operations. When using the elevation variance feature, the average speed of the whole algorithm is about 5.46 fps. However, it is anticipated that there is plenty of room for optimization, especially if the method is implemented in C++.

Table 5.2: Runtime of each step on 100 LiDAR frames in Matlab

Stage                        Step                                   Time / s
Ground surface extraction    Project to image plane                 0.81
                             U-disparity                            3.29
                             Remove obstacles roughly               1.92
                             V-disparity                            0.37
                             Extract road profile                   0.18
                             Classify point cloud                   1.03
Road boundary detection      Compute surface curvature feature      99.88
                             Compute elevation variance feature     7.65


Figure 5.5: Failure cases of the boundary model: (a) fork road; (b) too short curb; (c) winding road; (d) blocked by vehicles


5.4 Good properties

Besides the detection performance and computation complexity discussed above, some other good properties of the proposed algorithm are summarized as follows:

1) Omit the stereo matching process

Most camera-based U-V-disparity approaches obtain the disparity map from dense stereo matching. However, stereo matching techniques generally require specific optimization to reach real-time performance, and the error of the disparity values increases with depth. In contrast, our method directly obtains the disparity map by a simple linear projection and is successfully adjusted to tackle the sparsity of the disparity map. Therefore, such a LiDAR-based algorithm is not only more computationally efficient, but also more reliable for 3D information inference.

2) Identify outliers of LiDAR point clouds

As shown in Fig. 5.7, some extreme points exist which do not actually correspond to any real-world objects. However, such outliers are generally isolated from the ground surface, thereby resulting in a large difference between their disparity values and the road profile. Therefore, the outliers are successfully identified as obstacles and do not affect the subsequent stage of road boundary detection.

Figure 5.7: Ability to identify outliers of LiDAR data (The isolated points in the red circles are definitely outliers and all classified as obstacles).

3) Parameters are easy to tune


5.5 Limitations

As described in Section 4.2, the boundary detection method is restricted to the simplest case. In addition, some assumptions and simplifications are applied to the surface extraction model. The corresponding limitations are listed in the following:

1) Reduce the field of view to a camera case

The two types of LiDARs in this project both capture 3D data over a 360-degree scanning area. LiDARs are generally mounted horizontally on the roof of autonomous cars, thereby providing 3D cues all around. However, the proposed method only considers the front points which are projected within the predefined image region. In the real world, perception of the rear and side environment is an indispensable capability, especially when AVs need to back up or make a turn.

2) Require calibration on LiDARs’ pitch and yaw angles

As explained in Section 3.1.4, only when the pitch and yaw angles are small enough is the linear characteristic obeyed in the U-V-disparity domain. Otherwise, the obstacles and ground surfaces cannot be projected as curves with high intensity. Thus, calibration of the two angles is necessary, so that the LiDAR coordinates can be rotated to approximately zero pitch and yaw angles. In other words, a large calibration error, mounting looseness, or an accidental displacement could invalidate the proposed method.

3) Require a decent-sized road area in front

The assumption of a decent-sized road area is reasonable since AVs generally keep a certain distance from surrounding obstacles. If the ground surface is almost completely blocked by obstacles, the method will extract an incorrect road profile.

4) May fail to detect slanted obstacles


Chapter 6

Conclusions

6.1 Conclusions

In this project, a drivable road region detection system was developed for autonomous vehicles equipped with a 64-channel LiDAR. The thesis work consists of three contributions:

• A U-V-disparity based method is proposed to extract the ground surface from the point cloud. In the U-V-disparity domain, obstacles are projected as points with high intensity, and the ground surface is mapped as a curve. Using this attribute, the proposed method converts the 3D ground surface extraction problem to a simple edge detection task in a 2D image. The 3D points above the extracted ground surface are classified as obstacles.

• Three types of features are designed to filter the 3D ground points and obtain the candidate boundary points.

• A robust regression algorithm is proposed to fit the road boundaries.

Experiments have been conducted under variations involving different road scenarios, LiDAR mounting poses, and two LiDAR configurations. The experimental results illustrate that the ground surface extraction method achieves real-time and robust performance. The three types of features can roughly identify the boundary points, but are all too computationally heavy due to massive filtering operations among points. Limited by the regression algorithm, the proposed system is finally restricted to the most common case: two straight, observable boundaries. Evaluation on the KITTI road benchmark shows that for this case the overall recall and accuracy both exceed 97%.


6.2 Future Works

The potential directions for future works are suggested as follows:

Expanding and annotating the Scania dataset

What concerns Scania most is the performance of the proposed system on the Scania Klara bus, but the Scania dataset used in this project only contains 414 frames of a suburban scene. Moreover, the lack of pointwise annotation makes it impossible to conduct a quantitative evaluation and compare with other methods. Therefore, it is essential to expand and annotate the dataset, covering a larger variety of road contexts.

Improving the road boundary model

The linear regression model used in this project can only fit straight road boundaries, whereas real-world cases are much more complicated. Nonlinear curve fitting functions, such as B-splines, may be useful for winding or forked roads. Besides, one interesting direction is to incorporate prior knowledge of the surrounding road from a map; in this way, the system would know which model is suitable for the current case.
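As an illustration of what such a nonlinear model could look like, the snippet below fits a parametric smoothing B-spline through a handful of hypothetical boundary points with SciPy; the point values, the smoothing factor and the spline degree are assumed, and this is a sketch of the suggested direction rather than part of the implemented system.

```python
import numpy as np
from scipy.interpolate import splprep, splev

# Hypothetical candidate boundary points (x: lateral, y: longitudinal), in metres.
boundary_pts = np.array([[3.1, 0.0], [3.0, 5.0], [2.7, 10.0],
                         [2.2, 15.0], [1.5, 20.0], [0.6, 25.0]])

# Fit a parametric smoothing B-spline; s controls the smoothing strength.
tck, _ = splprep([boundary_pts[:, 0], boundary_pts[:, 1]], s=0.1, k=3)

# Evaluate the fitted curve densely, e.g. for visualisation or clipping
# the drivable region against a curved boundary.
u_fine = np.linspace(0.0, 1.0, 100)
x_fit, y_fit = splev(u_fine, tck)
```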

What if the road boundary becomes unobservable

The least trimmed squares algorithm was used to avoid the impact of false positives, but it cannot handle the situation in which the road boundary is blocked by obstacles. One possible solution is to track the detected boundary points with a Kalman filter. In general, the detection part would be much more computationally heavy than the tracker, so how to choose a suitable detection frequency can be investigated in future work.
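A minimal version of such a tracker is sketched below: the state is the slope and intercept of one fitted boundary line, propagated with a random-walk model so that the last estimate survives frames in which the boundary is occluded, and corrected whenever the detector provides a new fit. The noise magnitudes and the line parameterisation x = a*y + b are assumptions for illustration only, not part of the implemented system.

```python
import numpy as np

class BoundaryTracker:
    """Kalman filter over the parameters (a, b) of a boundary line x = a*y + b.
    A random-walk (constant-parameter) motion model keeps the last estimate
    alive while the boundary is occluded."""

    def __init__(self, a0, b0):
        self.x = np.array([a0, b0], dtype=float)   # state: slope, intercept
        self.P = np.eye(2) * 1.0                   # state covariance
        self.Q = np.eye(2) * 0.01                  # process noise (assumed)
        self.R = np.eye(2) * 0.1                   # measurement noise (assumed)

    def predict(self):
        # Constant-parameter model: state unchanged, uncertainty grows.
        self.P = self.P + self.Q
        return self.x

    def update(self, a_meas, b_meas):
        # Standard Kalman update; the measurement is a new line fit, so the
        # state-to-measurement mapping is the identity.
        z = np.array([a_meas, b_meas], dtype=float)
        S = self.P + self.R
        K = self.P @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.x)
        self.P = (np.eye(2) - K) @ self.P
        return self.x

# Usage: predict every frame, update only when the detector fires.
tracker = BoundaryTracker(a0=0.02, b0=3.0)
tracker.predict()
tracker.update(0.03, 2.9)
```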

Exploiting the reflectivity information


Extending to 3D object detection

