
DEGREE PROJECT IN COMPUTER SCIENCE, SECOND LEVEL. STOCKHOLM, SWEDEN 2015

Visual Map-based Localization applied to Autonomous Vehicles

JEAN-ALIX DAVID


Visual Map-based Localization applied to Autonomous Vehicles

Jean-Alix DAVID jadavid@kth.se

Master’s Thesis in Computer Science at the School of Computer Science and Communication

Supervisor: Patric JENSFELT
Examiner: Stefan CARLSSON
INRIA Supervisor: Amaury NEGRE

June 2015


Abstract

This thesis is carried out in the context of Advanced Driver Assistance Systems, and especially autonomous vehicles. Its aim is to propose a method to enhance the localization of vehicles on roads.

It suggests using a camera to detect lane markings, and matching these to a map to extract the corrected position of the vehicle.

The thesis is divided into three parts dealing with the map, the line detector and the evaluation. The map is based on OpenStreetMap data. The line detector is based on ridge detection. The results are compared with an Iterative Closest Point algorithm.

The thesis also focuses on implementing the components under real-time constraints. Technologies such as ROS, for synchronization of the data, and CUDA, for parallelization, are used.


Contents

Acknowledgments
List of Figures

1 Introduction
  1.1 Problem statement

2 Background

3 Methods
  3.1 OpenStreetMap data
    3.1.1 Basic structure
    3.1.2 Lane markings generation
  3.2 Ridge detector
    3.2.1 Theory
  3.3 ICP algorithm
    3.3.1 Matching
    3.3.2 Minimization

4 Tests and results
  4.1 Platform and test environment
    4.1.1 Platform
    4.1.2 Environment
    4.1.3 Experimental protocol
  4.2 Map
    4.2.1 Data storage
    4.2.2 Discussion
  4.3 Ridge detector
    4.3.1 Implementation
    4.3.2 Results
    4.3.3 Discussion
  4.4 ICP
    4.4.1 Implementation
    4.4.2 Results
    4.4.3 Discussion

5 Conclusion
  5.1 Future works

Bibliography

Appendices
  A OpenStreetMap
    A.1 Node
      A.1.1 Point features
      A.1.2 Nodes on Ways
      A.1.3 Structure
    A.2 Way
      A.2.1 Types of way
    A.3 Relation
      A.3.1 Usage
      A.3.2 Size
      A.3.3 Roles
      A.3.4 Types of relation
      A.3.5 Examples
    A.4 Tag
      A.4.1 Keys and values
  B ROS
    B.1 Robot Operating System
    B.2 ROS Concepts
  C Platform
    C.1 Car
    C.2 Sensors
      C.2.1 Stereo camera
      C.2.2 RGB camera


Acknowledgments

I would like to thank:

– Amaury NEGRE, my supervisor in INRIA, for his help and advice.

– Christian LAUGIER and INRIA for accepting me and allowing me to do my master’s thesis in their lab.

– Patric JENSFELT, my supervisor in KTH, for his help and guidance.

– Stefan CARLSSON, my examiner for this thesis.


List of Figures

1.1 Approach
3.1 Representation of OpenStreetMap data on a top-down view
3.2 Line generator graph
3.3 Representation of the lane markings generation
3.4 Corrected data for a crossroads
3.5 Laplacian
3.6 Ridges detection
4.1 Lexus
4.2 Tested route
4.3 Ridges detection on highway
4.4 Ridges detection on residential road
4.5 ICP correction on highway
B.1 ROS concepts


1 Introduction

Advanced Driver Assistance Systems (ADAS) have been around for a long time. Good examples of such systems are the well-known Anti-lock Braking System (ABS) and Electronic Stability Program (ESP). They already provide a great increase in car safety for the driver, the passengers and other road users.

However it is still possible to improve safety. The next step is the fully autonomous vehicle, which makes it possible to overcome human errors entirely. It will be achieved through high-level perception and decision algorithms and performant control of the vehicle. Given the lack of precision of GPS for localization, it is necessary to implement new ways to improve localization for precise control.

Here we introduce a method that uses a geographic map and images from a camera to localize the vehicle, comparing the two with an Iterative Closest Point (ICP) algorithm.

1.1 Problem statement

The purpose of this thesis is to implement a method to improve vehicle localization using visual information and a map.

We also want the method to satisfy the following constraints:

• Real-time processing

• Cheap equipment adapted to vehicle use

• Embedded on the vehicle

The method is described in figure 1.1. It has been divided into three parts:

• The processing of the OpenStreetMap map

• The lane markings detection

• The evaluation including a comparison with the ICP algorithm


The first part consists of analyzing and adapting the map. We chose OpenStreetMap because it is free, open source and highly adaptable; in particular, we can store and query the database directly on the car. The second part corresponds to the implementation of a line detector to detect the lane markings. We implemented a ridge detector, as it is an efficient method and only requires a cheap monocular camera. The last part is the implementation of the comparison algorithm. We chose to implement an ICP algorithm because it is precise.

Figure 1.1: The three parts of the approach: the map generator turns OpenStreetMap data into map data, the ridge detector turns camera images into detected ridges, and the ICP combines both to produce the corrected position.


2 Background

It is important for autonomous vehicles to ensure precise control, and thus to have a precise localization. Moreover, autonomous vehicles require global positioning for path planning, but affordable sensors that give absolute localization, such as GPS, are not precise enough for control. To stay in a lane a vehicle needs centimeter-level precision, whereas a simple GPS is only precise to a few meters. The sensors that provide good absolute precision are too expensive. Thus a local localization is needed, together with a map to deduce the global position. Local localization can be achieved with cheap sensors, for example a camera.

Labayrade [1] proposed to generate a map using visual information and to use it to improve lane detection, but he confined himself to lane detection.

Parra [2] proposed to use visual information as odometry, allowing localization to be maintained even during a GPS blackout. To obtain good results he used a stereo camera to acquire the visual information, which is still expensive. We deduce from these works that using visual information is a good way to do localization. Moreover, it is possible to find cheap monocular cameras giving good images.

In 2014 Mercedes-Benz achieved a 103 km journey with an autonomous vehicle using lanelets [3] as the map representation. Lanelets are a map representation defining the parts of the road where the vehicle can drive, like virtual rails.

They are efficient features for localization, but as they form a complex hand-made map representation, it is not practical to adapt them to any situation; we therefore want a simpler and more global map. OpenStreetMap [4, 5], a free, open-source, user-generated map which provides a lot of up-to-date information about roads, is an excellent solution.

Several features can be used to track the road and match it to the map, lanelets being one example. Some methods use 3D cameras to detect the shape of the road, as Danescu [6] and Nedevschi [7] proposed. They both use stereo cameras to detect the curvature of the road. These methods are precise, as they give direct information on the position of the vehicle relative to the road. The drawback of stereo cameras is that they are more expensive than monocular cameras, and they require a lot of information on the road


stored in the map. The same inconvenience applies to the methods using geometrical models of the road [8, 9, 10], which try to match the road seen in the image to a geometrical model such as a clothoid curve. Other solutions use monocular cameras and different methods to track the road. Xu [11] showed that it is possible to use the Hough transform to detect curves. Kuk [12] and Liu [13] applied it to lane detection. The Hough transform detects lines and curves, but it is not suited to being used together with the map, as the map is not in Hough space. Several methods have been proposed that use lane markings as visual features. Gruyer [14] proposed a method using a map of lane markings and two lateral cameras to detect them; this corrects the lateral drift and localizes the vehicle in its lane. During the Mercedes-Benz journey, lane markings were also used in addition to lanelets [15]. In a road context, lane markings appear to be good features for localization: they bound the roads, they are the least prone to changes, and they are normally present on every road. A drawback is that they can be temporarily invisible, for example when the road is covered by snow in winter or during roadwork. However this generally occurs in difficult situations, which would still require the driver’s attention, so it can be ignored.

To detect lane markings, ridgeness is the most commonly used feature. Negre [16] proposed an algorithm using ridgeness to detect elongated structures in a scene at different scales. The algorithm detects ridge points and provides elongation and orientation, which is interesting as orientation can be another feature for matching the lines of the map. Lopez [17, 18, 19] applied ridgeness to lane marking detection. Kang [20] extended it to multi-lane detection. Ridgeness requires neither a priori information about the road nor an expensive sensor, as only a monocular monochrome camera is necessary, and it is easily parallelizable, which keeps performance high.

Lane markings have often been used to locate a vehicle on the road, but generally relative to a locally generated map. Here we want a global localization, which allows precise local control to be combined with global positioning, for path finding for example. To make the local map generated by the sensors coincide with the global map, we need an algorithm that can compare them. Different algorithms can be used to match the detected lines to the map and correct the localization, for example filters such as the Kalman filter and its variants, or the particle filter, as done in [6]. They offer good and smooth results while tracking the position, but lack precision in complex scenarios. For example, in a multi-lane scenario a particle filter evaluates two particles on parallel lanes identically and cannot choose the correct one, as parallel lanes are not distinguishable.

Iterative algorithms can also be used, such as the Iterative Closest Point (ICP) algorithm [21]. The advantage of such algorithms is that they provide a precise localization, but the results are less smooth than those obtained with filters.


3 Methods

This chapter describes the theoretical approach for each part of the thesis.

3.1 OpenStreetMap data

To improve localization we need a precise map. Moreover, we want it free and open source, because of our cost constraint. Thus OpenStreetMap [4, 5] appears to be a good solution, as it is free, open source and user-made, which means it is simple to use and to adapt to our needs.

3.1.1 Basic structure

The OpenStreetMap data are composed of three basic primitives:

• Nodes

• Ways

• Relations

Nodes define geographical points with their latitude and longitude. A node can be either a real physical object, for example a bus stop, or an imaginary point defining the shape of a road. Ways define more complex features such as roads and boundaries; they consist of ordered lists of nodes. The list of nodes defines the shape of the feature. If the first and last nodes of the way are the same, then it is a closed way, which can define an area. Relations describe other constraints between nodes, ways and/or relations. All of them can have several associated tags describing the meaning of a particular element. A tag is composed of a key and a value. The key describes the class of the feature, for example the key "highway" means the feature is a road, and the value details the specific feature, for example the value of "highway" can be "motorway" or "residential". An extract of the OpenStreetMap wiki can be found in Appendix A for more details and examples, and an example of OpenStreetMap data can be found in figure 3.1.
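As a concrete illustration of these primitives, a small OSM XML extract can be parsed with nothing but the Python standard library. This is a sketch, not the thesis code; the ids, coordinates and the way itself are invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented two-node residential way, for illustration only
OSM_XML = """
<osm version="0.6">
  <node id="1" lat="45.2100" lon="5.8000"/>
  <node id="2" lat="45.2105" lon="5.8010"/>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
    <tag k="lanes" v="2"/>
  </way>
</osm>
"""

def load_osm(xml_text):
    """Return the nodes (id -> lat/lon) and ways (ordered node refs + tags)."""
    root = ET.fromstring(xml_text)
    nodes = {n.get("id"): (float(n.get("lat")), float(n.get("lon")))
             for n in root.findall("node")}
    ways = [{"id": w.get("id"),
             "tags": {t.get("k"): t.get("v") for t in w.findall("tag")},
             "refs": [nd.get("ref") for nd in w.findall("nd")]}
            for w in root.findall("way")]
    return nodes, ways

nodes, ways = load_osm(OSM_XML)
road = ways[0]
assert road["tags"]["highway"] == "residential"   # key -> class of feature
shape = [nodes[r] for r in road["refs"]]          # ordered nodes give the shape
```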

Figure 3.1: Representation of OpenStreetMap data on a top-down view. (a) Without satellite view; (b) with satellite view.

3.1.2 Lane markings generation

A problem with OpenStreetMap is that it lacks data on lane markings. Thus lane markings have to be added to the map. This has been done semi-automatically using the data on roads and lanes. We created a new OpenStreetMap tag to identify lane markings. The key of the tag is "marking" and its value is either "middle" or "border", depending on the position of the line. Then for each road we created as many lines as there are lanes on the road plus one, the border lines having the "marking" tag value "border" and the others the value "middle". This part was done automatically by what we call the line generator, as shown in figure 3.2. The line generator iterates over each way, and then over each node of the way, duplicating the node the desired number of times to create the new ways representing the lane markings. As there is no convention for the position of the lanes relative to the coordinates of the nodes, we decided to set the coordinates as the center of the road and split the lanes on either side of the road.
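The core idea of the line generator can be sketched as follows. This is a simplified illustration, not the thesis code: the function and parameter names are ours, and real WGS84 node coordinates would first need a metric projection, whereas here we work in plain x/y metres.

```python
import math

def generate_lane_lines(centerline, lanes, lane_width=3.5):
    """Duplicate the centerline nodes into lanes+1 parallel polylines, tagged
    'border' for the two outermost lines and 'middle' for the others."""
    lines = []
    for k in range(lanes + 1):
        offset = (k - lanes / 2.0) * lane_width   # signed lateral offset
        pts = []
        for i, (x, y) in enumerate(centerline):
            # direction of the adjacent segment (last node reuses previous one)
            a = centerline[i - 1] if i == len(centerline) - 1 else centerline[i]
            b = centerline[i] if i == len(centerline) - 1 else centerline[i + 1]
            dx, dy = b[0] - a[0], b[1] - a[1]
            n = math.hypot(dx, dy)
            # shift the node along the unit normal of the segment
            pts.append((x - offset * dy / n, y + offset * dx / n))
        lines.append({"marking": "border" if k in (0, lanes) else "middle",
                      "points": pts})
    return lines

# straight two-lane road along x: three lines, at y = -3.5, 0 and +3.5
lines = generate_lane_lines([(0.0, 0.0), (10.0, 0.0)], lanes=2)
```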


Figure 3.2: The line generator takes the raw data from the database, generates the lines and puts them back into the database. This is done offline, before manual correction.

Figure 3.3: Representation of the lane markings generation. (a) Raw OpenStreetMap data: a line represents a road, so we do not see the lanes. (b) Modified OpenStreetMap data: a line represents a lane marking, so we can see the lanes.


We finally corrected some places manually, essentially crossroads. This was done using a dedicated software tool in which the user can independently move each node and way, using a satellite view as reference. This part was done offline, and the database could then be used onboard by the other algorithms. It is a long and tedious process, which shows another benefit of OpenStreetMap for a full-scale implementation: the work could be done by all contributors if we publish the new tags.

Figure 3.3 shows the results of the conversion of data, and figure 3.4 shows a detailed view of a crossroads with satellite image.

Figure 3.4: Data were corrected for complicated areas such as crossroads, here a detailed view of a corrected crossroads with satellite view.

Finally, the data stored are lists of points that represent lane markings. Each line can be seen as a list of segments, where the end of each segment is the beginning of the next one.

3.2 Ridge detector

This part of the report describes the method used to detect lane markings. It is based on the method proposed by López [18], using ridges as features to detect lines.

3.2.1 Theory

This method uses Laplacian values of images to detect ridges, as proposed by Tran and Lux [22]. For each image the algorithm follows these steps:

1. Projection of the image onto the horizontal plane, using the camera position relative to the vehicle and the roll and pitch angles of the vehicle relative to the ground.

2. Computation of the Laplacian.

3. Elimination of pixels where the Laplacian is lower than a threshold.

4. Computation of the gradient and of the ratio between the Laplacian and the gradient.

5. Elimination of pixels where the ratio is lower than another threshold.

6. Computation of the Hessian matrix, its eigenvalues and the corresponding eigenvectors.

7. Elimination of pixels where eigenvalues are almost equal.

The first step allows us to work in the same plane as that of the map; it uses the camera calibration information and the roll and pitch angles of the vehicle to do the projection. Using the roll and pitch angles, given by an Inertial Measurement Unit (IMU), allows us to correct the projection.

Then, by keeping pixels with a high Laplacian value, we keep only bright objects surrounded by darker zones, which correspond to the ridges. The Laplacian is defined as follows:

\[
L(f(x, y)) = \frac{\partial^2 f(x, y)}{\partial x^2} + \frac{\partial^2 f(x, y)}{\partial y^2}
\]


Each derivative is calculated using the Sobel operator with a 5 × 5 kernel. Thus:

\[
\frac{\partial}{\partial x} =
\begin{pmatrix}
1 & 2 & 0 & -2 & -1 \\
4 & 8 & 0 & -8 & -4 \\
6 & 12 & 0 & -12 & -6 \\
4 & 8 & 0 & -8 & -4 \\
1 & 2 & 0 & -2 & -1
\end{pmatrix}
\quad\text{and}\quad
\frac{\partial}{\partial y} =
\begin{pmatrix}
-1 & -4 & -6 & -4 & -1 \\
-2 & -8 & -12 & -8 & -2 \\
0 & 0 & 0 & 0 & 0 \\
2 & 8 & 12 & 8 & 2 \\
1 & 4 & 6 & 4 & 1
\end{pmatrix}
\]
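As an illustration of steps 2 and 3, the Laplacian can be computed in pure Python by applying each 5 × 5 kernel twice. This is an unoptimized sketch with invented function names; the thesis implementation relies on OpenCV and CUDA instead.

```python
# 5x5 Sobel derivative kernels from the text above
KX = [[1, 2, 0, -2, -1],
      [4, 8, 0, -8, -4],
      [6, 12, 0, -12, -6],
      [4, 8, 0, -8, -4],
      [1, 2, 0, -2, -1]]
KY = [[-1, -4, -6, -4, -1],
      [-2, -8, -12, -8, -2],
      [0, 0, 0, 0, 0],
      [2, 8, 12, 8, 2],
      [1, 4, 6, 4, 1]]

def conv2(img, k):
    """Interior-only 2D correlation; borders are left at zero."""
    r, h, w = len(k) // 2, len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            out[y][x] = sum(k[dy + r][dx + r] * img[y + dy][x + dx]
                            for dy in range(-r, r + 1)
                            for dx in range(-r, r + 1))
    return out

def laplacian(img):
    dxx = conv2(conv2(img, KX), KX)   # second derivative in x
    dyy = conv2(conv2(img, KY), KY)   # second derivative in y
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(dxx, dyy)]

# sanity check: the Laplacian of a linear intensity ramp vanishes away
# from the borders
ramp = [[float(x) for x in range(20)] for _ in range(20)]
assert laplacian(ramp)[10][10] == 0.0
```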

On figure 3.5 we can see the Laplacian for an image. Further steps are needed to extract the lane markings.

Figure 3.5: Results of the Laplacian computation. (a) Camera image; (b) projected image; (c) Laplacian of the image.

The ratio between the Laplacian and the norm of the gradient allows us to remove the edges of these objects, keeping only the center part of each ridge. Indeed, in the middle of a bright object the gradient is almost zero, while on the edges it is very high. Thus, by dividing by the norm of the gradient, we remove the pixels that correspond to the borders of bright objects.


Moreover, by choosing the threshold, this allows us to choose the size of the objects we want to keep, i.e. the width of the lane markings we detect. The norm of the gradient is defined by:

\[
\lVert \operatorname{grad}(f) \rVert = \sqrt{\left(\frac{\partial f}{\partial x}\right)^{2} + \left(\frac{\partial f}{\partial y}\right)^{2}}
\]

The Hessian matrix gives us the direction of the ridges, so we can keep only the elongated ridges corresponding to lines, by keeping the ridges where one eigenvalue of the Hessian matrix is significantly greater than the other. The Hessian matrix is defined as follows:

\[
H(f) =
\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\
\dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2}
\end{pmatrix}
\]

The second-order derivatives are defined by composing the previous 5 × 5 Sobel kernels, which gives three 9 × 9 kernels for the second order.

The greatest eigenvalue of the Hessian matrix gives us the direction of the line in question. We then search for the maximum value of the ratio defined earlier along the orthogonal direction. We only keep this maximum and discard the other values, which leaves only one-pixel-wide lines. Having only one-pixel-wide lines minimizes the quantity of data sent as input to the ICP, which is important as it means less computation and thus better performance.

Figure 3.6 shows the results of the algorithm. Image 3.6a is the one from the camera, image 3.6b is the projection of this image onto a top-down view, and image 3.6c shows the detected lines. These results will be detailed and discussed later.
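The eigenvalue test of steps 6 and 7 can be illustrated with the closed-form eigenvalues of a symmetric 2 × 2 matrix. The function names and the ratio threshold below are our own choices, for illustration only.

```python
import math

def hessian_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues of [[fxx, fxy], [fxy, fyy]], ordered so |l1| >= |l2|."""
    mean = 0.5 * (fxx + fyy)
    dev = math.hypot(0.5 * (fxx - fyy), fxy)
    l1, l2 = mean + dev, mean - dev
    return (l1, l2) if abs(l1) >= abs(l2) else (l2, l1)

def is_elongated(fxx, fxy, fyy, ratio=5.0):
    """A ridge pixel: one curvature clearly dominates the other."""
    l1, l2 = hessian_eigenvalues(fxx, fxy, fyy)
    return abs(l1) >= ratio * abs(l2)

# a horizontal bright line: strong curvature across it (y), none along it (x)
assert is_elongated(0.0, 0.0, -10.0)
# an isotropic blob: equal curvature in both directions, eliminated in step 7
assert not is_elongated(-10.0, 0.0, -10.0)
```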

3.3 ICP algorithm

In this part we describe the ICP algorithm [23] used to register the map and the detected lines, and then correct the localization of the vehicle. This algorithm consists of matching the detected lines to the ones stored in the map, finding the transformation that corrects the position of the vehicle in the map, applying the transformation, and iterating.

3.3.1 Matching

The inputs of the matching are the pointcloud of pixels considered to be lane markings and a list of local segments extracted from the map around the position given by the GPS. For each point in the pointcloud we search for the


Figure 3.6: Ridges detection. (a) Input image; (b) projected image; (c) detected ridges.


closest line to the point within a predefined range whose direction coincides with the principal component of the Hessian matrix; this allows a better match when there are lines in different directions. The range is chosen to avoid matching points to lines that are too far away, no more than one lane at first, and it is increased if no pixels match. This matching is then used to find the corrected position.
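A minimal sketch of this matching step (our own naive linear scan; the thesis does not detail its data structures, and the direction test against the Hessian's principal component is omitted for brevity):

```python
import math

def closest_point_on_segment(p, a, b):
    """Orthogonal projection of p onto segment ab, clamped to its endpoints."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    l2 = vx * vx + vy * vy
    t = 0.0 if l2 == 0 else max(0.0, min(1.0,
        ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / l2))
    return (a[0] + t * vx, a[1] + t * vy)

def match(point, segments, max_range):
    """Closest point on the closest segment, or None if all are out of range."""
    best, best_d = None, max_range
    for a, b in segments:
        q = closest_point_on_segment(point, a, b)
        d = math.hypot(q[0] - point[0], q[1] - point[1])
        if d < best_d:
            best, best_d = q, d
    return best

# two parallel lane markings, represented as segments
segs = [((0.0, 0.0), (10.0, 0.0)), ((0.0, 5.0), (10.0, 5.0))]
q = match((3.0, 1.0), segs, max_range=2.0)        # matched to the lower line
assert match((3.0, 3.1), segs, max_range=1.0) is None   # nothing within range
```

Increasing `max_range` when nothing matches reproduces the growing search range described above.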

3.3.2 Minimization

The corrected position is given by the transformation minimizing the following error:

\[
E = \sum_{i=0}^{N-1} \lVert q^{(i)} - T_\alpha \, p^{(i)} \rVert^2, \qquad \alpha = (t_x, t_y, \theta)^t
\]

where p^{(i)} is a point of the pointcloud, q^{(i)} is the closest point on the matching segment, N is the number of points in the pointcloud and T_\alpha is the 2D transformation we are looking for, with parameters t_x, t_y and θ. T_\alpha can be written as a matrix in homogeneous coordinates:

\[
T_\alpha =
\begin{pmatrix}
\cos\theta & -\sin\theta & t_x \\
\sin\theta & \cos\theta & t_y \\
0 & 0 & 1
\end{pmatrix}
\]

The sum corresponds to the sum of the errors between the pointcloud and the map, which is what we want to minimize. The minimization is done using the Levenberg-Marquardt algorithm, because simpler algorithms such as the Gauss-Newton algorithm may not converge: we often only have information in one direction, because locally the lane markings are parallel lines.

For a point p(i) the Jacobian matrix of Tα is the 2 × 3 matrix:

J(i)= 1 0 −p(i)x sin(θ) − p(i)y cos(θ) 0 1 p(i)x cos(θ) − p(i)y sin(θ)

!

And the global Jacobian matrix is the 2N × 3 matrix:

J =

1 0 −p(0)x sin(θ) − p(0)y cos(θ) 0 1 p(0)x cos(θ) − p(0)y sin(θ) 1 0 −p(1)x sin(θ) − p(1)y cos(θ) 0 1 p(1)x cos(θ) − p(1)y sin(θ)

... ... ...

1 0 −p(N −1)x sin(θ) − p(N −1)y cos(θ)


The correction of the transformation is:

\[
t = \left[ J^t J + \lambda \operatorname{diag}(J^t J) \right]^{-1} J^t r
\]

with r the vector of matching errors:

\[
r =
\begin{pmatrix}
q^{(0)} - T_\alpha \, p^{(0)} \\
q^{(1)} - T_\alpha \, p^{(1)} \\
\vdots \\
q^{(N-1)} - T_\alpha \, p^{(N-1)}
\end{pmatrix}
\]

where λ is the damping factor, adapted depending on the eigenvalues of J^t J.

Finally, the updated transformation is T_{α+t}. We then iterate these two steps with the updated position. The number of iterations is a parameter: a small value gives a fast algorithm, a larger one gives better convergence.

To ensure better convergence we add to the error the term ‖p_gps − T_α · p_pos‖², where p_pos is the current position of the vehicle and p_gps is the position given by the GPS. Thus, when there is no match between pixels and lines, the algorithm converges toward the GPS position instead of diverging because of the increasing matching range.
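Putting the Jacobian and update formula together, one damped iteration can be sketched in plain Python. This is illustrative only: it assumes perfect point-to-point correspondences, uses a fixed λ rather than the adaptive one described above, and omits the GPS prior term.

```python
import math

def lm_step(points, targets, tx, ty, th, lam=1e-3):
    """One Levenberg-Marquardt update of (tx, ty, theta) for matched pairs
    (p, q), following t = [J^t J + lam*diag(J^t J)]^-1 J^t r."""
    c, s = math.cos(th), math.sin(th)
    A = [[0.0] * 3 for _ in range(3)]            # accumulates J^t J
    g = [0.0, 0.0, 0.0]                          # accumulates J^t r
    for (px, py), (qx, qy) in zip(points, targets):
        fx, fy = c * px - s * py + tx, s * px + c * py + ty   # T_alpha . p
        for row, res in (((1.0, 0.0, -px * s - py * c), qx - fx),
                         ((0.0, 1.0, px * c - py * s), qy - fy)):
            for i in range(3):
                g[i] += row[i] * res
                for j in range(3):
                    A[i][j] += row[i] * row[j]
    for i in range(3):
        A[i][i] *= 1.0 + lam                     # damping on the diagonal

    def det3(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

    D = det3(A)
    d = []
    for k in range(3):                           # Cramer's rule for A d = g
        M = [r[:] for r in A]
        for i in range(3):
            M[i][k] = g[i]
        d.append(det3(M) / D)
    return tx + d[0], ty + d[1], th + d[2]

# recover a known transform from perfect correspondences
true_tx, true_ty, true_th = 1.0, -0.5, 0.1
pts = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0), (5.0, 5.0)]
ct, st = math.cos(true_th), math.sin(true_th)
tgt = [(ct * x - st * y + true_tx, st * x + ct * y + true_ty) for x, y in pts]
est = (0.0, 0.0, 0.0)
for _ in range(50):
    est = lm_step(pts, tgt, *est)
```

Repeating `lm_step` until the update is small yields the corrected (t_x, t_y, θ); in the real algorithm the matching step is rerun between iterations.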


4 Tests and results

This chapter presents the experimental protocol for the tests, the results of the tests and a discussion of these results.

4.1 Platform and test environment

4.1.1 Platform

The tests have been done with a Lexus LS600h (as this thesis is part of a project in partnership with Toyota), shown in figure 4.1, equipped with the following sensors:

• GPS

• IMU

• Monocular RGB camera

• Stereovision camera

• Lidars

• CAN bus

In our experiment we only use the GPS, the IMU and a monocular camera.

More technical details on the equipment can be found in Appendix C.

4.1.2 Environment

The test environment is composed of different kinds of roads, mostly highway, but also residential roads with crossroads and roundabouts. The predefined route, shown in figure 4.2, is 11.3 km long. The tests mainly took place in daylight with good weather conditions.


Figure 4.1: The Lexus test vehicle, with an IMU and GPS for localization, lidar sensors and cameras for perception, an onboard computer for online computation and data acquisition, and a user interface.

4.1.3 Experimental protocol

The tests have been carried out offboard, with a configuration equivalent to the onboard one. We first recorded the sensor data for a trip on the predefined route, then tested the algorithms on the recorded data.

This was done using ROS [24, 25] (Robot Operating System); more details on ROS can be found in Appendix B. ROS allows us to record the data while driving the car on the chosen route, and then replay them on another computer to test the algorithms under the same conditions.

4.2 Map

4.2.1 Data storage

The data have to be stored on a local computer, as they need to be used in the vehicle while driving. To do so we use a piece of software released by OpenStreetMap named OSM3S [26]. It is an API which acts as a database to which the


Figure 4.2: Tested route

user can send queries and get the data back as an XML file. The database can be populated with different datasets; here we populated it with the dataset of the French region Rhône-Alpes, plus our lane markings dataset for a small route around INRIA.

4.2.2 Discussion

The map is globally correct, but there are sometimes differences from the real road, which can lead to errors in the results of the algorithm. These errors are due to the fact that the OpenStreetMap data are not always correct, as this is a user-made map. But there is also an unknown introduced during the creation of the lines: we generally do not know what the coordinates of the points constituting a road refer to. We assumed that they refer to the center of the road, but this is sometimes wrong, as they can refer to the center of a particular lane, depending on how the creator of the road did it.

Also some lines that exist on the road may not appear in the map as


they are not marked as roads, for example cycle ways or pedestrian paths. Another problem is that the data are not always up-to-date; to correct this we could use an internet connection to update the map even while on the road.

4.3 Ridge detector

4.3.1 Implementation

The implementation was done in C++, using the OpenCV [27] library for image processing and ROS to manage the interaction between all sensors and parts of the platform, especially the synchronization between images and inertial data. The inputs are the images from the camera and the roll and pitch angles of the vehicle, and the output is a pointcloud of pixels considered to be lane markings.

To improve performance, the Laplacian, gradient and Hessian matrix computations were parallelized using CUDA on a GPU. This makes the algorithm real-time, as the image processing is much faster.

4.3.2 Results

Figures 3.6 and 4.3 show the results of line detection on a highway. The results are good: all lines are detected, and there are no detections where there are no lines. However, lines that are too thin are not always detected, or only partially, but this is not really a problem, as having more pixels for a line does not significantly improve the results of the ICP. We can also see some aliasing on long lines; indeed, they are not always straight and aligned. This is due to the fact that we only keep a one-pixel-wide line.

On residential roads the detection also works for the lines, but there are also many detections that are not lane markings. For example, pavements, poles or trees are often detected as lines, due to their elongated shape and their color, which is brighter than the background. We can see in figure 4.4 that lines are detected, but also background objects such as safety rails or trees. The results are even worse for roundabouts, because the camera does not see much of the road: while entering a roundabout, the road leaves the field of view of the camera.


Figure 4.3: Ridges detection on highway. (a) Camera image; (b) projected image; (c) detected ridges.

Figure 4.4: Ridges detection on residential road. (a) Camera image; (b) projected image; (c) detected ridges.


4.3.3 Discussion

The ridge detector gives good results in highway scenarios, but more mixed results on other types of roads.

These results are mainly explained by the fact that on the highway a large part of the image is covered by the road, whereas in residential areas the camera sees more background, so the image contains more useless information.

A way to improve these results could be to adapt the detector to different line widths, as opposed to the current detector, which is calibrated for an average line width. Moreover, a better orientation of the camera could also improve the results: for this application we only need to see the road, and it would induce fewer errors when projecting the image.

4.4 ICP

4.4.1 Implementation

This part was also implemented in C++ and using ROS to handle pointclouds and transformations between the different frames.

4.4.2 Results

The results of the ICP for improving localization depend highly on the results of the ridge detector. Indeed, when the ridge detector returns good results we can expect good results from the ICP, but with bad detections the ICP is likely to diverge. Thus the results are good in highway scenarios, meaning the localization is well corrected, as the car is detected in the right lane, and less good on roads with many crossroads and roundabouts.

Figure 4.5 shows the results of the ICP in a highway scenario. It takes place on a two-lane road before it merges with another two-lane road. In the lower right corner we can see the view of the camera, showing that we are in the rightmost lane. The green lines correspond to the lines of the map.

The red dots correspond to the detected lines. The red arrow corresponds to the position given by the GPS, which puts the car to the left of the leftmost lane. The white rectangle represents the car in its corrected position, in the middle of the rightmost lane, which is the correct position.

On roads with many lanes, like highways, the matching can be wrong by a couple of lanes: the position relative to the lane is almost always well corrected, but the offset in number of lanes can sometimes be wrong. It depends on the initial position, which is set to the GPS position at the beginning. The correction works well when the initial position is not too


Figure 4.5: ICP correction on highway

far away from the real position, or when there are not too many lanes, as on an entrance ramp to the highway.

On residential roads the results are equivalent when the ridge detector works well. But the quality of the results drops when arriving at a crossroads or a roundabout, where the ICP is likely to diverge due to bad line detections. This is only corrected when the number of matches is low and the algorithm converges to the GPS position, which is equivalent to a reset of the position.

4.4.3 Discussion

The ICP works well, but a few improvements can still be made. There are sometimes jumps in position, due to the fact that it is not filtered. Filtering could thus be added to smooth the variation of the corrected position and


thus avoid some discontinuities. Another way to improve this algorithm could be to weight the error for each point depending on its ridge-detector response. Another problem to correct is the longitudinal correction, the one in the direction of the road. On a straight road the lateral position is well corrected, but changing the longitudinal position of the vehicle does not impact the matching, as we match points to lines, and thus it does not impact the results either. A way to correct this could be to take the velocity of the car into account and use it to correct the longitudinal position.

We also briefly tested an approach other than the ICP to improve the results of the algorithm on residential roads. We implemented a particle filter, where each particle was evaluated using the results of the ridge detector and the matching. It improved the results on residential roads, as errors due to bad line detections were filtered out, but it also deteriorated the results on highway. The overall results were a bit better, but a lack of precision appeared. Thus it was not kept as a viable solution, but the two approaches could be combined to improve the global method.


5 Conclusion

In this report we have presented a method to localize a vehicle on roads using visual information and an open-source map. The approach was divided into three parts corresponding to the different modules of the developed software. The first part treated the map: it analysed the existing data from OpenStreetMap and extended it with lane-marking data. The second part corresponded to the lane-marking detection using the camera and a ridge detection method. The third and last part implemented an ICP algorithm to compare the detected lines with the ones stored in the map, and then return the updated localization of the vehicle.

Our results show that this method is viable. Indeed, we obtained good results on highways and more mixed results on other kinds of roads. This is mainly due to the quality of the road and thus the quantity of useful data seen by the camera.

5.1 Future works

The proposed algorithm can be improved and extended in many ways. We have already proposed several improvements for each part of the algorithm. However, other upgrades can be made across all parts. For example, we could parallelize the parts of the code that are not yet parallelized; this would result in a great gain in computation time, especially for the ICP part, where manipulations of point clouds can easily be parallelized when applying transformations to them. Another way to enhance the overall algorithm would be to improve the map, especially the semi-automatic construction of the lines. Indeed, a better map means a better localization; here we only corrected the map for crossroads and roundabouts, so there were still errors on the rest of the map.
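The point-cloud transformation mentioned above is a natural candidate for data parallelism. As an illustrative sketch (vectorized NumPy standing in for a CUDA kernel; the function name is hypothetical), a 2D rigid transform can be applied to all points at once:

```python
import numpy as np

def transform_cloud(points, theta, tx, ty):
    """Apply a 2D rigid transform (rotation theta, translation (tx, ty))
    to an (N, 2) point cloud in a single vectorized operation."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T + np.array([tx, ty])
```

Each output point depends only on its input point, so the same loop maps directly onto one GPU thread per point.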

Finally, it would be interesting to develop an application using this algorithm and a controllable vehicle to implement a lane follower.


Bibliography

[1] Raphael Labayrade. How autonomous mapping can help a road lane detection system? In Control, Automation, Robotics and Vision, 2006. ICARCV'06. 9th International Conference on, pages 1–6. IEEE, 2006.

[2] Ignacio Parra Alonso, David Fernández Llorca, Miguel Gavilán, Sergio Álvarez Pardo, Miguel Ángel García-Garrido, Ljubo Vlacic, and Miguel Ángel Sotelo. Accurate global localization using visual odometry and digital maps on urban environments. Intelligent Transportation Systems, IEEE Transactions on, 13(4):1535–1545, 2012.

[3] Philipp Bender, Julius Ziegler, and Christoph Stiller. Lanelets: Efficient map representation for autonomous driving. In Intelligent Vehicles Symposium Proceedings, 2014 IEEE, pages 420–425. IEEE, 2014.

[4] Mordechai Haklay and Patrick Weber. Openstreetmap: User-generated street maps. Pervasive Computing, IEEE, 7(4):12–18, 2008.

[5] OpenStreetMap wiki. http://wiki.openstreetmap.org/wiki/Main_Page. [Online; accessed 17-March-2015].

[6] Radu Danescu and Sergiu Nedevschi. Probabilistic lane tracking in difficult road scenarios using stereovision. Intelligent Transportation Systems, IEEE Transactions on, 10(2):272–282, 2009.

[7] Sergiu Nedevschi, Rolf Schmidt, Thorsten Graf, Radu Danescu, Dan Frentiu, Tiberiu Marita, Florin Oniga, and Ciprian Pocol. 3D lane detection system based on stereovision. In Intelligent Transportation Systems, 2004. Proceedings. The 7th International IEEE Conference on, pages 161–166. IEEE, 2004.

[8] Jens Goldbeck and Bernd Huertgen. Lane detection and tracking by video sensors. In Intelligent Transportation Systems, 1999. Proceedings. 1999 IEEE/IEEJ/JSAI International Conference on, pages 74–79. IEEE, 1999.


[9] Yue Wang, Dinggang Shen, and Eam Khwang Teoh. Lane detection using spline model. Pattern Recognition Letters, 21(8):677–689, 2000.

[10] ZuWhan Kim. Robust lane detection and tracking in challenging scenarios. Intelligent Transportation Systems, IEEE Transactions on, 9(1):16–26, 2008.

[11] Lei Xu, Erkki Oja, and Pekka Kultanen. A new curve detection method: randomized Hough transform (RHT). Pattern Recognition Letters, 11(5):331–338, 1990.

[12] Jung Gap Kuk, Jae Hyun An, Hoyong Ki, and Nam Ik Cho. Fast lane detection & tracking based on Hough transform with reduced memory requirement. In Intelligent Transportation Systems (ITSC), 2010 13th International IEEE Conference on, pages 1344–1349. IEEE, 2010.

[13] Guoliang Liu, F Worgotter, and Irene Markelic. Combining statistical Hough transform and particle filter for robust lane detection and tracking. In Intelligent Vehicles Symposium (IV), 2010 IEEE, pages 993–997. IEEE, 2010.

[14] Dominique Gruyer, Rachid Belaroussi, and Marc Revilloud. Map-aided localization with lateral perception. In Intelligent Vehicles Symposium Proceedings, 2014 IEEE, pages 674–680. IEEE, 2014.

[15] Julius Ziegler, Henning Lategahn, Markus Schreiber, Christoph G Keller, Carsten Knoppel, Jochen Hipp, Martin Haueis, and Christoph Stiller. Video based localization for Bertha. In Intelligent Vehicles Symposium Proceedings, 2014 IEEE, pages 1231–1238. IEEE, 2014.

[16] Amaury Nègre, James L Crowley, and Christian Laugier. Scale invariant detection and tracking of elongated structures. In Experimental Robotics, pages 525–533. Springer Berlin Heidelberg, 2009.

[17] A López, J Serrat, J Saludes, C Cañero, F Lumbreras, and T Graf. Ridgeness for detecting lane markings. In Proceedings of the 2nd International Workshop on Intelligent Transportation Systems (WIT'05), 2005.

[18] A López, C Cañero, J Serrat, J Saludes, F Lumbreras, and T Graf. Detection of lane markings based on ridgeness and RANSAC. In Intelligent Transportation Systems, 2005. Proceedings. 2005 IEEE, pages 254–259. IEEE, 2005.


[19] A López, J Serrat, C Cañero, F Lumbreras, and T Graf. Robust lane markings detection and road geometry computation. International Journal of Automotive Technology, 11(3):395–407, 2010.

[20] Seung-Nam Kang, Soomok Lee, Junhwa Hur, and Seung-Woo Seo. Multi- lane detection based on accurate geometric lane estimation in highway scenarios. In Intelligent Vehicles Symposium Proceedings, 2014 IEEE, pages 221–226. IEEE, 2014.

[21] Szymon Rusinkiewicz and Marc Levoy. Efficient variants of the ICP algorithm. In 3-D Digital Imaging and Modeling, 2001. Proceedings. Third International Conference on, pages 145–152. IEEE, 2001.

[22] Thanh Hai Tran Thi and Augustin Lux. A method for ridge extraction. In 6th Asian Conference on Computer Vision 2004-ACCV'04, volume 2, 2004.

[23] Zhengyou Zhang. Iterative point matching for registration of free-form curves. 1992.

[24] ROS website. http://www.ros.org/. [Online; accessed 17-March-2015].

[25] ROS wiki page. http://wiki.ros.org/fr. [Online; accessed 17-March-2015].

[26] OSM3S wiki page. http://wiki.openstreetmap.org/wiki/Overpass_API. [Online; accessed 17-March-2015].

[27] OpenCV website. http://www.opencv.org/. [Online; accessed 17-March-2015].


A OpenStreetMap

This appendix is an extract of the OpenStreetMap wiki, and aims to explain the structure of OpenStreetMap data in more detail.

A.1 Node

A node is one of the core elements in the OpenStreetMap data model. It consists of a single point in space defined by its latitude, longitude and node id. A third, optional dimension (altitude) can also be included: key:ele. A node can also be defined as part of a particular layer=* or level=*, where distinct features pass over or under one another; say, at a bridge. Nodes can be used to define standalone point features, but are more often used to define the shape or ‘path’ of a way. Over 2,000,000,000 nodes exist in the global OSM data set (as of 2013).

A.1.1 Point features

Nodes can be used on their own to define point features. When used in this way, a node will normally have at least one tag to define its purpose. Nodes may have multiple tags and/or be part of a relation. For example, a telephone box may be tagged simply with amenity=telephone, or could also be tagged with operator=*.

A.1.2 Nodes on Ways

See also: Way

Many nodes form part of one or more ways, defining the shape or ‘path’ of the way. Where ways intersect at the same altitude, the two ways must share a node (for example, a road junction). If highways or railways cross at different heights without connecting, they should not share a node (e.g. a highway intersection with a bridge=*). Where ways cross at different heights they should be tagged with different layer=* or level=* values, or be tagged with location=* ‘overground’ or ‘underground’. There are some exceptions to this rule: roads across dams are by current definition required to share a node with the waterway crossing the dam. Some nodes along a way may have tags.

For example:

• highway=crossing + crossing=* to define a pedestrian crossing along a highway=*

• natural=tree to identify a lone tree on a barrier=hedge

• building=entrance to identify a doorway into a building=*

A.1.3 Structure

A node has the following fields:

• id: integer number ≥ 1

• lat: decimal number ≥ −90.0000000 and ≤ 90.0000000, with 7 decimal places

• lon: decimal number ≥ −180.0000000 and ≤ 180.0000000, with 7 decimal places

• tags: a set of key/value pairs, with unique keys
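For illustration, a minimal way to read these fields from an OSM XML extract (hypothetical data and function name; Python's standard xml.etree, not the parser used in the thesis) could look like:

```python
import xml.etree.ElementTree as ET

# Hypothetical OSM extract with a single node (coordinates made up).
OSM_XML = """
<osm>
  <node id="1001" lat="45.1885000" lon="5.7245000">
    <tag k="highway" v="crossing"/>
  </node>
</osm>
"""

def parse_nodes(xml_text):
    """Return {id: (lat, lon, tags)} for every <node> in an OSM XML extract."""
    nodes = {}
    for node in ET.fromstring(xml_text).iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.iter("tag")}
        nodes[int(node.get("id"))] = (float(node.get("lat")),
                                      float(node.get("lon")),
                                      tags)
    return nodes
```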

A.2 Way

A way is an ordered list of nodes which normally also has at least one tag or is included within a Relation. A way can have between 2 and 2,000 nodes, although it’s possible that faulty ways with zero or a single node exist. A way can be open or closed. A closed way is one whose last node on the way is also the first on that way. A closed way may be interpreted either as a closed polyline, or an area, or both.
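The open/closed distinction boils down to a first-equals-last check on a way's node list; a minimal sketch (a hypothetical helper, not part of any OSM tooling):

```python
def is_closed(way_node_ids):
    """A way is closed when its first and last node ids coincide and it
    has enough nodes to enclose something (ruling out faulty tiny ways)."""
    return len(way_node_ids) >= 3 and way_node_ids[0] == way_node_ids[-1]
```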

A.2.1 Types of way

Open way

An open way is a way describing a linear feature which does not share a first and last node. Many roads, streams and railway lines are open ways.


Closed way

A closed way is a way where the last node of the way is shared with the first node. A closed way that also has an area=yes tag should be interpreted as an area (but the tag is not required most of the time; see the section below). The following closed ways would be interpreted as closed polylines:

• highway=* Closed ways are used to define roundabouts and circular walks

• barrier=* Closed ways are used to define barriers, such as hedges and walls, that go completely round a property.

Area

An area (also polygon) is an enclosed filled area of territory defined as a closed way. Most closed ways are considered to be areas even without an area=yes tag (see above for some exceptions). Examples of areas defined as closed ways include:

• leisure=park to define the perimeter of a park

• amenity=school to define the outline of a school

For tags which can be used to define closed polylines it is necessary to also add an area=yes if an area is desired. Examples include:

• highway=pedestrian + area=yes to define a pedestrian square or plaza.

Areas can also be described using one or more ways which are associated with a multipolygon relation.

Combined closed-polyline and area

It is possible for a closed way to be tagged such that it should be interpreted both as a closed polyline and as an area.

For example, a closed way defining a roundabout surrounding a grassy area might be tagged simultaneously as:

highway=primary + junction=roundabout, both being interpreted as a polyline along the closed way, and landuse=grass, interpreted on the area enclosed by the way.


A.3 Relation

A relation is one of the core data elements. It consists of one or more tags and an ordered list of one or more nodes, ways and/or relations as members, and is used to define logical or geographic relationships between other elements. A member of a relation can optionally have a role which describes the part that a particular feature plays within a relation.

A.3.1 Usage

Relations are used to model logical (and usually local) or geographic relationships between objects. They are not designed to hold loosely associated but widely spread items. It would be inappropriate, for instance, to use a relation to group ‘All footpaths in East Anglia’.

A.3.2 Size

It is recommended to use no more than about 300 members per relation. If you have to handle more than that number of members, create several relations and combine them with a Super-Relation. Reason: the more members are stuffed into a single relation, the harder it is to handle, the more easily it breaks, the more easily conflicts can show up, and the more resources it consumes at the database and server.

Note: ‘super-relations’ are a good concept on paper, but none of the many OSM software applications works with them.

A.3.3 Roles

A role is an optional textual field describing the function of a member of the relation. For example, in North America, role:east indicates that a way would be posted as East on the directional plate of a route numbering shield. Or, in a multipolygon relation, role:inner and role:outer are used to specify whether a way forms the inner or outer part of that polygon.

A.3.4 Types of relation

There are many types of relation including:

• Relation:route is used to describe routes of many types, including major numbered roads like E26, A1, M6, I 80, US 53; or hiking routes, cycle routes and bus routes.


• Relation:multipolygon, used for defining larger Areas such as river banks and administrative boundaries.

• Relation:boundary to exclusively define administrative boundaries

• Relation:restriction to describe restrictions such as ‘no left turn’, ‘no U-turn’ etc.

A.3.5 Examples

Multipolygon

In the multipolygon relation, the role:inner and role:outer roles are used to specify whether a member way forms the inner or outer part of that polygon enclosing an area. For example, an inner way could define an island in a lake (which is mapped as a relation).

Bus route

A bus route might have a relation with type=route, route=bus and ref=* and operator=* tags. The ways over which the bus travels would be members, along with bus stop nodes. The ways would have role:forward or role:backward roles, depending on whether the buses operate in the direction of the way, or the opposite way (or the role might be left blank, meaning the bus route uses the way in both directions).

A.4 Tag

A Tag consists of a ‘Key’ and a ‘Value’. Each tag describes a specific feature of a data element (nodes, ways and relations) or changesets. Both the key and value are free-format text fields. In practice, however, there are agreed conventions on how tags are used for most common purposes.

A key can be modified with a prefix, infix or suffix namespace to further qualify it. Common namespaces are a language specification and a date namespace specification for name keys.

A.4.1 Keys and values

Each tag has only a key and value. Tags are written in OSM documentation as key=value.


The key describes a broad class of features (for example, highways or names). The value details the specific feature that was generally classified by the key (e.g. highway=motorway). If multiple values are needed for one key, the semi-colon value separator may be used in some situations.

Here are a few examples of how keys and values are used in practice:

• highway=residential a tag with a key of ‘highway’ and a value of ‘residential’ which should be used on a way to indicate a road along which people live.

• name=* a tag for which the value field is used to convey the name of the particular street

• maxspeed=* a tag whose value is a numeric speed in km/h (or in miles per hour if the suffix ‘mph’ is provided). Metric units are the default (and do not need to be mentioned explicitly). Other units, such as miles per hour, knots, yard or pounds must be stated after the value. Where a regulation is specified in a particular unit then that unit should be used within the value field.

• maxspeed:winter=* a key that includes a namespace for ‘maxspeed’; it identifies a different value for maxspeed that applies only in winter.

• name:de:1953-1990="Ernst-Thälmann-Straße" a name key with suffixed namespaces to specify the German name of a street which was valid from 1953 to 1990.
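The namespace convention above can be illustrated with a small helper (a sketch, not an official OSM library function) that splits a key into its base and namespace parts:

```python
def split_key(key):
    """Split an OSM key into its base key and namespace parts,
    e.g. 'name:de:1953-1990' -> ('name', ['de', '1953-1990'])."""
    base, *namespaces = key.split(":")
    return base, namespaces
```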


B ROS

This appendix describes what ROS is, and how parts of it work.

B.1 Robot Operating System

ROS [24] stands for Robot Operating System. The official website explains what it consists of: “ROS is a flexible framework for writing robot software. It is a collection of tools, libraries, and conventions that aim to simplify the task of creating complex and robust robot behavior across a wide variety of robotics platforms.” The ROS wiki page [25] proposes a more technical definition:

“ROS is an open-source, meta-operating system for your robot. It provides the services you would expect from an operating system, including hardware abstraction, low-level device control, implementation of commonly-used func- tionality, message-passing between processes, and package management. It also provides tools and libraries for obtaining, building, writing, and running code across multiple computers.”

B.2 ROS Concepts

If a robot has to accomplish a global task, this global task can be split into elementary tasks like image processing, sound processing, environment analysis, moving, etc. These tasks require computation from a computer. We call a ROS node ‘a process that performs computation’. In order not to let these processes live alone, the computer needs something to inventory and link them. That is the role of the Master. The Master is a managing superstructure. Nodes can communicate with their peers. The communication works as follows: a node can publish a message (i.e. variables) on what is called a topic. Another node is free to subscribe to this topic or not. It launches an operation just after data is published on the topic. Here the publisher has the initiative. But nodes can also interact directly by using a service, that is to say a client/server structure. One client asks one server to do something and waits for its response. This is less rigid than using a topic because the client has the initiative. With a service, a client node asks directly for something from a server node and waits for the response, contrarily to the first case where the node holding the information controls the reaction of the subscribers. Figure B.1 schematizes the relations between nodes using services or topics to communicate.
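To make the topic mechanism concrete without depending on ROS itself, here is a plain-Python sketch of the publish/subscribe pattern described above (the Topic class and topic name are illustrative, not the ROS API):

```python
class Topic:
    """Minimal stand-in for a ROS topic: publishers push messages and
    every subscriber callback fires once per published message."""

    def __init__(self, name):
        self.name = name
        self.callbacks = []

    def subscribe(self, callback):
        self.callbacks.append(callback)

    def publish(self, msg):
        for cb in self.callbacks:
            cb(msg)

# One node publishes a pose; another node's callback receives it.
pose_topic = Topic("/vehicle/pose")
received = []
pose_topic.subscribe(received.append)
pose_topic.publish({"x": 1.0, "y": 2.0})
```

Note how the publisher has the initiative: the subscriber's callback runs as a consequence of the publish call, which is the inversion of control that distinguishes topics from services.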

Figure B.1: ROS concepts


C Platform

This appendix technically describes the equipment of the platform.

C.1 Car

The experimental platform is built on a Lexus LS600h car, seen on figure 4.1 and equipped with:

• 2 IBEO Lux Lidars.

• 1 TYZX Stereo camera.

• 1 Monocular RGB camera.

• 1 GPS Xsens MTi-G Inertial sensor.

• DELL computer with GPU and SSD memory.

• CAN bus.

C.2 Sensors

C.2.1 Stereo camera

The stereo camera is a TYZX unit based on the Aptina MT9V022 CMOS sensor, and its characteristics are:

• 22cm baseline.

• 62° HFOV.

• Depth range of 1.8–23 m.

• Resolution 512 × 320 pixels.

• PCI board for the disparity calculation in real time, and Linux drivers.


C.2.2 RGB camera

The RGB camera is an IDS UI-5240CP-C color camera, and its characteristics are:

• Resolution 1280 × 1024 pixels.

• Gigabit Ethernet interface GigE.

• 50 fps max rate in Freerun mode.

