### Automatic Lane Labeling Using an Off-line Smoothing Stereo-lane Algorithm

### Master of Science Thesis in Applied Mathematics

### Department of Electrical Engineering, Linköping University, 2017

Mikael Hartman

LITH-ISY-EX–17/5087–SE

Supervisor: Parinaz Kasebzadeh, ISY, Linköpings universitet; Martin Nilsson, Autoliv AB

Examiner: Gustaf Hendeby, ISY, Linköpings universitet

Automatic Control
Department of Electrical Engineering, Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2017 Mikael Hartman

**Sammanfattning**

The goal of this thesis has been to create a map of lane tracks from piecewise detections. An existing algorithm is able to detect lane tracks in real time from stereo camera images. The use of basis splines (B-splines) makes it possible to convert the tracks into a point-based representation. These are combined off-line with a motion estimate to create a joint motion and lane track map. The fusion is performed using an optimization-based simultaneous localization and mapping (slam) method.

The results are verified through visual inspection and comparison against measurements from a differential global positioning system (dgps). The proposed method is able to improve position estimates by exploiting the curvature of the lane tracks. By projecting the lane track map into the camera view, lane tracks can be marked in the image with longer detection range and higher accuracy than the underlying real-time algorithm provides.

**Abstract**

The goal of this thesis has been to create a map of road lanes from piecewise detections. Using images from a stereo camera, an existing algorithm is able to detect lane tracks in real-time. With a basis spline (B-spline) representation of the tracks, these curve features are converted into point features. These are combined off-line with an odometric estimate for the creation of a motion and lane track map. Fusion of odometric and feature measurements is done using an optimization based simultaneous localization and mapping (slam) method.

By use of visual inspection and a differential global positioning system (dgps) the result is validated. The method is able to improve on existing motion estimators by using the curvature of lane tracks. A projection of the resulting map is used to automatically label lane tracks as seen from camera view with accuracy and range exceeding that of the real-time algorithm.

**Acknowledgments**

First of all I would like to thank Autoliv AB for the opportunity to perform this master's thesis. I am very grateful for all the help and support I have received from everyone working in the lane detection development team, and especially my supervisor Martin Nilsson.

I would also like to thank my examiner Gustaf Hendeby and ISY supervisor Parinaz Kasebzadeh for all the help and feedback I have received and the time they have put in. They have both been essential in determining the path and focus of the thesis throughout the work.

Finally, I also wish to thank my family and friends for all the encouragement and support you have given me throughout the years.

*Linköping, June 2017*
*Mikael Hartman*

**Contents**

Notation

1 Introduction
1.1 Objective
1.2 Limitations
1.3 Methodology
1.4 Related work
1.5 Outline

2 Background
2.1 System
2.2 Reference frames
2.2.1 Position
2.2.2 Orientation
2.3 Lane track model
2.4 Input pre-processing
2.4.1 Motion
2.4.2 Lane track curves
2.5 Curve representation
2.5.1 Bézier curves
2.5.2 B-splines
2.6 Simultaneous localization and mapping
2.7 Nonlinear optimization
2.7.1 Newton
2.7.2 Gauss-Newton

3 Method
3.1 Lane parametrisation
3.2 Models
3.2.1 System state
3.2.2 Odometric measurement model
3.2.3 Motion model
3.2.4 Track measurement model
3.3 SLAM formulation
3.4 Separate localization and mapping

4 Result
4.1 Implementation
4.1.1 Covariance
4.1.2 Convergence
4.2 Data sequences
4.3 Performance validation
4.3.1 Mean Squared Error
4.3.2 Camera view
4.4 Input
4.4.1 Motion
4.4.2 Lane Tracks
4.5 Performance
4.5.1 DGPS
4.5.2 Visual
4.5.3 Marked data
4.6 Fault detection
4.7 Discussion of results
4.7.1 Inputs
4.7.2 Performance

5 Conclusion and future work
5.1 Conclusion
5.2 Future work

A Manual marking data

**Notation**

Sets, norms and denotations

| Notation | Meaning |
|---|---|
| $\lVert \cdot \rVert_2$ | Euclidean norm. |
| $\lVert \cdot \rVert_P$ | Mahalanobis norm weighted with $P$. |
| $I_n$ | Identity matrix of size $n \times n$. |
| $\mathbb{R}^n$ | Set of all real numbers in $n$-dimensional space. |

Abbreviations

| Abbreviation | Meaning |
|---|---|
| autosar | Automotive Open System Architecture |
| afacg | Automotive Open System Architecture Front Axis Coordinate Ground |
| acrc | Automotive Open System Architecture Camera Referenced Coordinate |
| dgps | Differential Global Positioning System |
| ekf | Extended Kalman Filter |
| gps | Global Positioning System |
| ico | Image Coordinate |
| lidar | Light Detection and Ranging |
| mse | Mean Squared Error |
| slam | Simultaneous Localization and Mapping |
| seplam | Separate Localization and Mapping |

**1 Introduction**

In the field of automation, accurate reference data sets are essential to evaluate performance or, combined with machine learning, to teach computers to identify features and patterns autonomously. This is certainly the case when developing self-driving cars, which require a reliable mapping of the surrounding area, usually achieved by use of a combination of camera, light detection and ranging (lidar) sensor and/or radar. Manually looking through a video sequence frame by frame, identifying the desired features to create reference data, is a time-consuming task which is impractical for large data sets. One important feature to map is the driving lanes, since they are used to keep the car on the road, not only in self-driving cars but also in modern human-driven vehicles that can alert the driver if the vehicle accidentally drifts out of its lane.

This thesis is done in collaboration with Autoliv AB, a company specializing in car safety, producing products both in passive safety, e.g. seat belts and airbags, and active safety, e.g. automated emergency brakes and lane departure warning. Autoliv AB has years of driving logs available from test vehicles that are hard to use in development due to the lack of reference data. These test vehicles are equipped with a front facing stereo camera and corresponding software that can, from each pair of stereo images, estimate polynomial curves of the lane tracks ahead. The estimates are of high accuracy close to the vehicle, but the accuracy drops with range. Since the vehicle travels along the road, there will be better estimates of the tracks farther ahead at a later time in the driving log. Using the fact that the entire driving log is available when analyzing, the lane track estimates over time can be utilized to get accurate track curves along the entire driving path. These curves can then be projected onto each separate frame to get reference tracks with range and accuracy possibly exceeding even that which human vision can provide.


To provide a better understanding of the problem, this chapter contains an objective description and problem formulation in addition to the problem limitations. The chapter also includes a brief introduction to the methodology used and a summary of related work that has been done in this area, and concludes with an outline of the remainder of the thesis.

**1.1 Objective**

The desired output from the thesis implementation is high quality frame markings of lane tracks. This requires accurate motion and lane track estimates. Therefore, the goal of this thesis is to improve existing motion and lane track estimates by taking the whole data sequence into account. Additionally, the use of piecewise observations of curve features for improvements to motion estimates is investigated. This will not be done in general but specific to the lane labeling application.

**1.2 Limitations**

In this thesis, only lane tracks will be considered for curve features, limiting the scope to continuous and smooth curves.

Loop closures and long term drift will be disregarded unless handling them improves the accuracy of the shorter estimates. The reason for this is that the reference data only needs to mark the lane tracks that can be seen at a given time.

Image processing and frame to frame odometric estimates are done beforehand by existing algorithms provided by Autoliv AB. The problem solved is based on output from these algorithms and will not directly utilize any images.

**1.3 Methodology**

To obtain long continuous lane estimates from a number of local polynomial approximations, the local estimates must be combined. Due to noise and approximations, the local estimates will not fit perfectly together. For this reason, a fusion of the local polynomial curves is needed.

In this thesis, this long continuous lane is modeled using a basis spline (B-spline) curve. A spline is any smooth function that is piecewise polynomial with continuous derivatives up to a given order where the pieces connect [6]. Additionally, while also being true for other types of splines, B-splines allow the curve to be represented by a set of points in geometrical space, in contrast to the more abstract coefficients that normally define a polynomial curve. These points in geometrical space are independent of the observation point, meaning they can be used as distinct feature points in space, while also having desired locality properties [13]. A more in depth description of B-splines is presented in Section 2.5.

Each observation of the curve is relative to the vehicle's current pose. This means that in order to relate the observations to each other, a motion estimate is required. To handle the need of localizing both the vehicle and the feature points, a simultaneous localization and mapping (slam) approach is proposed. The approach formulates a slam problem that, when solved, gives a map containing both vehicle motion and environmental features. More details about the slam approach and the resulting problem are given in Section 2.6.

Solving the slam problem in real-time is often done using filtering, a stepwise update approach where the position estimates are based on the previous positions and current measurements. This means that any calculation will only have to take a few variables at a time into account. However, in this case the entire sequence of measurements is known before making any position estimates, which means that a smoothing approach is preferred. Smoothing is here done by formulating an optimization problem over all variables. This means that the dimensionality of the problem grows linearly with the number of observation positions and feature points. Even for shorter driving sequences, the number of variables becomes large, and finding an optimal solution to the corresponding slam problem can then be computationally difficult.

**1.4 Related work**

The inspiration for the thesis is an article by Jeffrey Kiske [7]. In this article, the measurements of detected lane tracks over five seconds are projected into a single frame by use of a high precision global positioning system (gps). The resulting tracks have a long detection distance while still being accurate enough to use as learning data for a neural network.

Instead of using expensive hardware to transform measurements into a single frame, the work of Pedraza et al. [12] shows that B-splines can be used to map curve features in the slam framework. They use B-spline control points as measurements in an extended Kalman filter (ekf) to map the curvatures of an indoor environment by use of laser scans and odometry data. A similar approach is used by Meier et al. [11] in outdoor environments including lane tracks. They manage to create a very sparse map that still accurately represents the original tracks. The approach utilized here is more based on the doctoral thesis by Minjie Liu [9] and related articles [8, 10], where curve measurements are utilized in a more general slam framework. A B-spline based observation model is presented which lets the control points be expressed as positions relative to the observer. This allows for optimization based techniques to be used instead of step based methods such as ekf. The observation model is demonstrated to work in an indoor environment using a robot equipped with a laser scanner and odometric data.

**1.5 Outline**

The remainder of the report is split into four chapters separating the stages of the work. First, general information about the system and theory gathered from outside sources is presented in Chapter 2. This is followed by a more application specific description of the proposed solution to the above problem formulation in Chapter 3, where the used models and the optimization problem formulation are presented. After that, implementation and validation details are presented in Chapter 4, followed by method results and analysis in the same chapter. Finally, the report is wrapped up in Chapter 5 with conclusions and suggestions for future work.

**2 Background**

This chapter introduces the platform used in the study, including descriptions of the input used. In some cases direct sensor values are used, but also pre-processed measurements from existing algorithms. The used reference frames are defined along with the model used to represent lane tracks. Finally, the slam problem and the mathematical tools used are introduced in a general sense.

**2.1 System**

The system primarily consists of a stereo camera with internal logic produced by Autoliv AB, mounted behind the rear-view mirror in some modern cars, as seen in Figure 2.1. The camera system contains two separate cameras whose simultaneous images can be compared to get depth perception from 2D images. This can be done using triangulation, knowing the relative positions of the cameras. This, along with other methods, allows for a 3D reconstruction of the area in front of the car. The camera system is calibrated both statically and dynamically to estimate the camera system's pose in the vehicle and the camera parameters.

Besides the camera, other sensors are used that measure forward speed, acceleration and the angle of the front wheels relative to the car frame. A Global Positioning System (gps) is also available but is not directly used due to synchronization and accuracy issues. In certain limited test situations a more accurate Differential gps (dgps) is used to validate the motion estimates. A dgps does, in addition to the function of a normal gps, measure relative distances to ground mounted receivers, which are impractical to place along public roads. This limits the use of dgps in this thesis.


Figure 2.1: The stereo camera used, alongside a depiction of the camera placement.

**2.2 Reference frames**

Formulating the slam problem in relation to the measurements requires several reference frames, which are defined in this section.

**2.2.1 Position**

A vehicle driving along a road will move in three dimensions, but each observation made will be a projection onto the two-dimensional image plane. The camera can be considered fixed in the vehicle and thus also traverses the three-dimensional world. This creates a need for several different coordinate systems: a three-dimensional world coordinate system in which the vehicle's position can be described at a given time, a three-dimensional coordinate system that relates the camera pose to the vehicle, and finally a two-dimensional coordinate system that represents the camera view. Transformations to relate these coordinate systems are also necessary.

**World coordinate**

The vehicle's position over time is represented in a three-dimensional Cartesian coordinate system with origin in the vehicle's initial position, with the vehicle pointing along the $x$-axis, the $y$-axis pointing to the left and the $z$-axis upwards, as depicted in Figure 2.2. This coordinate system will be referred to as world coordinates.

A development partnership between automotive interested parties, known as Automotive Open System Architecture (autosar), was founded in 2003. The goal of the partnership is to create standards in software architecture [1]. Among others, autosar has defined some reference frames that will be used in this thesis.


Figure 2.2: View depicting the world coordinate system with origin in the front wheel axis at the ground level of the vehicle’s initial position.

At a given time, an object detected by the vehicle will not have known world coordinates but rather coordinates relative to the vehicle's current pose. Such observations are given in the autosar Front Axis Coordinate System Ground (afacg), which uses the vehicle's current pose as origin but is otherwise defined the same as the world coordinates. This means that at the initial time the afacg coordinate system coincides with the world coordinates. A depiction of the afacg is shown in Figure 2.3.

The observations are in fact made by the camera which, while rigidly mounted in the car, can shift and be slightly angled. This creates a need for the autosar Camera Referenced Coordinate System (acrc), which instead uses the camera's current pose as the origin. A depiction of the acrc is shown in Figure 2.4.

Transformations between afacg and acrc are given by the camera calibration and will be approximately the same at all time instances. The relation between afacg/acrc and the world coordinates does however change with vehicle motion and is estimated as part of the slam problem.

**Camera view**

Before a 3D reconstruction, the observations will only be in terms of pixels detected by the camera. The coordinates of the pixels are given in the Image Coordinate System (ico), as depicted in Figure 2.5. Transformation from the ico into world coordinates is done by triangulation. The inverse, projecting world coordinates into the ico, is done using a pinhole camera model with focal lengths provided by the calibration.
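As an illustrative sketch of this projection step, a pinhole model maps a 3D point to pixel coordinates by dividing by depth and scaling with the focal lengths. The calibration values below are invented, and the code uses the common computer-vision convention with the optical axis along $z$, which differs from the vehicle frames defined above:

```python
def project_pinhole(point_cam, fx, fy, cx, cy):
    """Project a 3D point in a camera frame (optical axis along z, metres)
    onto the image plane with an ideal pinhole model."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = fx * x / z + cx  # horizontal pixel coordinate
    v = fy * y / z + cy  # vertical pixel coordinate
    return u, v

# Hypothetical calibration: focal lengths and principal point in pixels.
fx, fy, cx, cy = 800.0, 800.0, 640.0, 360.0
# A point 20 m ahead and 2 m to the side.
u, v = project_pinhole((2.0, 0.0, 20.0), fx, fy, cx, cy)
assert abs(u - 720.0) < 1e-9 and abs(v - 360.0) < 1e-9
```

Going the other way, from pixels to 3D, requires the stereo triangulation mentioned above, since a single pinhole projection discards the depth.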


Figure 2.3: View depicting the AUTOSAR Front Axis Coordinate System Ground (AFACG) as seen from above the car to the left and from the right side of the car to the right. Origin is located in the center of the car front wheel axis at the ground level.

Figure 2.4: View depicting the AUTOSAR Camera Referenced Coordinate System (ACRC) as seen from above the car to the left and from the right side of the car to the right. Origin is located at the camera position.


**2.2.2 Orientation**

To relate the world coordinates, afacg and acrc, the orientational differences are represented in Euler angles. In this part, the applied orientational and rotational transformations are presented.

**Euler angles**

The orientation of an object in three-dimensional Cartesian space can be represented by Euler angles. The definition of the Euler angles varies depending on the author and the application. In this study, the angles are defined by Figure 2.6. The deficiencies of Euler angles are that they suffer from a $2\pi$ ambiguity and have singular points at the poles of the unit sphere, where $\theta = \pm\pi/2$. Neither of these causes problems in this application, since it is unlikely for a vehicle to turn as much as $2\pi$ radians around any axis between two camera frames or to be facing straight towards the sky or ground.

Figure 2.6: The definition of Euler angles used in this study.

**Rotation matrix**

A rotation in Euler angles will consist of three separate rotations around the $X$, $Y$ and $Z$ axes. The rotations can be done in any order; in this case all rotations are done in the order yaw, pitch, then roll. The combined rotation $R$ is given by

$$
R(\phi, \theta, \psi) = R_x(\phi) R_y(\theta) R_z(\psi)
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{bmatrix}
\begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}
\begin{bmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$
$$
= \begin{bmatrix}
\cos\theta\cos\psi & \cos\theta\sin\psi & -\sin\theta \\
\sin\phi\sin\theta\cos\psi - \cos\phi\sin\psi & \sin\phi\sin\theta\sin\psi + \cos\phi\cos\psi & \sin\phi\cos\theta \\
\cos\phi\sin\theta\cos\psi + \sin\phi\sin\psi & \cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi & \cos\phi\cos\theta
\end{bmatrix}. \tag{2.1}
$$
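The composition in (2.1) is easy to check numerically. The following sketch (illustrative only, not the thesis implementation) builds the three elementary rotations in the stated yaw, pitch, roll order and verifies basic rotation-matrix properties:

```python
import numpy as np

def rotation_matrix(phi, theta, psi):
    """Combined rotation R = R_x(phi) R_y(theta) R_z(psi) as in (2.1):
    yaw (psi), then pitch (theta), then roll (phi)."""
    c, s = np.cos, np.sin
    Rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, c(phi), s(phi)],
                   [0.0, -s(phi), c(phi)]])
    Ry = np.array([[c(theta), 0.0, -s(theta)],
                   [0.0, 1.0, 0.0],
                   [s(theta), 0.0, c(theta)]])
    Rz = np.array([[c(psi), s(psi), 0.0],
                   [-s(psi), c(psi), 0.0],
                   [0.0, 0.0, 1.0]])
    return Rx @ Ry @ Rz

# A rotation matrix is orthonormal with determinant 1, and the (1,3)
# entry of the product equals -sin(theta), matching the expanded matrix.
R = rotation_matrix(0.2, 0.1, -0.3)
assert np.allclose(R @ R.T, np.eye(3))
assert np.isclose(np.linalg.det(R), 1.0)
assert np.isclose(R[0, 2], -np.sin(0.1))
```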


**2.3 Lane track model**

To get smooth curves, roads are typically designed using a clothoid model. The lane tracks naturally follow the same pattern and will also be modeled using clothoids. A clothoid is a curve where the curvature $c$, which is defined as the inverse of the radius, $c = 1/r$, changes linearly with the distance along the curve. At a distance $s$ along the curve, the curvature will then be $c = c_0 + \zeta s$, where $c_0$ is the initial curvature and $\zeta = \frac{dc}{ds}$ is the curvature rate.

To represent the transition between two following clothoids, a double clothoid model can be used. The double clothoid is a clothoid curve where the curvature rate is changed after a given distance. The parameters of the double clothoid are described in Table 2.1.

Table 2.1: Parameters of a double clothoid.

| Parameter | Meaning |
|---|---|
| $y_0$ | Lateral deviation of the start of the clothoid from the afacg along the $y$-axis. |
| $\alpha$ | Heading angle, i.e. the initial angle of the first clothoid. |
| $x_t$ | Transition distance, i.e. the distance along the $x$-axis where the curvature rate is changed. |
| $c_0$ | Initial curvature of the double clothoid curve. |
| $\zeta_1$ | Curvature rate before the transition distance is reached. |
| $\zeta_2$ | Curvature rate after the transition distance is reached. |

By approximating the two consecutive clothoids in the double clothoid with third degree polynomials, the lateral deviation $y$ at distance $x$ can be calculated according to

$$
y(x) =
\begin{cases}
y_0 + \tan(\alpha)x + c_0\dfrac{x^2}{2} + \zeta_1\dfrac{x^3}{6}, & x < x_t \\[6pt]
y_0 + \tan(\alpha)x + c_0\dfrac{x^2}{2} + \zeta_1\left(\dfrac{x_t^3}{6} + \dfrac{x(x-x_t)x_t}{2}\right) + \zeta_2\dfrac{(x-x_t)^3}{6}, & x \geq x_t
\end{cases} \tag{2.2}
$$
A graphical description of the double clothoid model is shown in Figure 2.7.


Figure 2.7: Double clothoid model.

**2.4 Input pre-processing**

Most of the raw output from the sensors is not directly used but is instead pre-processed by existing algorithms and converted into more useful formats, such as odometric data. Similarly, the camera images are pre-processed to identify the lane tracks.

**2.4.1 Motion**

The vehicle's motion between the time instances of sequential camera frames is estimated either using only the vehicle's motion sensors (speed, acceleration and wheel angle), here referred to as EgoMotion, or by use of the camera images to get visual odometry. Regardless of the method used, the motion estimate is given as the translation and rotation of the afacg between sequential camera frames.

**Visual Odometry**

To estimate vehicle motion using camera images, distinctive points are identified and matched in two sequential images. From the matches, the change in camera angle relative to each point can be acquired. This gives an accurate estimate of the rotational motion but not of the translational. The depth perception of the stereo camera 3D reconstruction is not sufficiently precise for measuring the changes in distance that would be required. Instead, the vehicle speed sensor is used to determine the distance traveled. Covariances of the angle estimates are available.


Figure 2.8: Example of marked lane tracks overlaid on the corresponding camera view image.

**2.4.2 Lane track curves**

Existing algorithms for estimating lane track curves identify lane track markings in camera images. These marking coordinates are transformed from the ico into the afacg, and clothoid curves are then fitted to them, forming tracks. The $XY$ and $XZ$ planes are handled separately, with one double clothoid curve in the $XY$ plane and a single clothoid curve in the $XZ$ plane. The result is an estimate of the currently seen lane tracks in 3D relative to the vehicle.

Some tracking and validation of the lane tracks between camera frames is done, which removes false detections but requires a lane track to be seen for a few frames before being detected. The accuracy of the curve estimation is high but the range is limited, as Figure 2.8 shows. This study uses the estimated clothoid curves in the current afacg as the input.

**2.5 Curve representation**

An explicit expression, $y(x)$, for the curves is not practical when trying to create a map of lane tracks, since it does not allow for different $y$-values for a single $x$-value. A parametric curve

$$
a(s) = [x(s) \;\; y(s) \;\; z(s)]^T \tag{2.3}
$$

for some parametrization $s$ avoids this problem. Additionally, it allows the dimensions to be handled separately. A parametric curve $a(s)$ represented using a regular power basis of degree $n$ is given by

$$
a(s) = \sum_{i=0}^{n} a_i s^i, \tag{2.4}
$$

where $a_i \in \mathbb{R}^3$ are the coefficients. Using this basis to represent a long complicated curve would require the basis to be of a high degree. Additionally, the coefficients are abstract and numerically unstable, since small changes to selected coefficients have a large effect on the curve. As an alternative, the Bernstein polynomials of degree $n$, defined as

$$
B_i^n(s) = \binom{n}{i} s^i (1-s)^{n-i}, \quad i = 0, \dots, n, \tag{2.5}
$$

form an alternative basis for parametric polynomials on the interval $s \in [0, 1]$, where some of the deficiencies of the power basis are avoided [5]. A Bernstein basis has the property that the only non-zero basis function for $s = 0$ is $B_0^n$ and for $s = 1$ is $B_n^n$. A curve $C(s)$ represented in a Bernstein basis of degree $n$,

$$
C(s) = \sum_{i=0}^{n} P_i B_i^n(s), \tag{2.6}
$$

where $P_i \in \mathbb{R}^3$ are the coefficients, will for $s \in [0, 1]$ be a curve starting in $P_0$ and ending in $P_n$. This gives the coefficients a clear geometric interpretation, as they are the points in space that bend the curve. This is similar to interpolating methods, but the curve is not required to pass through the intermediate coefficient points. A change in coefficients will only affect the curve by a similar amplitude, thus giving greater numerical stability compared to the normal power basis.

**2.5.1 Bézier curves**

The polynomial curve in (2.6) for $s \in [0, 1]$ is known as a Bézier curve of order $n$, and the coefficients $P_i$, $i = 0, \dots, n$, are known as control points [13]. Apart from direct calculation, a Bézier curve can be evaluated using De Casteljau's algorithm. The algorithm utilizes the recursive property of Bernstein polynomials,

$$
B_i^k(s) = s B_{i-1}^{k-1}(s) + (1 - s) B_i^{k-1}(s), \tag{2.7}
$$

where $B_{-1}^{k-1} = B_k^{k-1} = 0$ and $B_0^0 = 1$ [13]. The recursion allows the evaluation to be done recursively using linear interpolation of intermediate points, as Figure 2.9 illustrates.
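De Casteljau's algorithm can be sketched in a few lines. The code below is illustrative (the control points are hypothetical, not thesis data) and checks the result against a direct evaluation of the Bernstein sum (2.6):

```python
from math import comb

def de_casteljau(points, s):
    """Evaluate a Bezier curve at s via repeated linear interpolation (2.7)."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # Each pass replaces the points with interpolations a fraction s along
        # the segments between consecutive points.
        pts = [tuple((1 - s) * a + s * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def bernstein_eval(points, s):
    """Direct evaluation of C(s) = sum_i P_i B_i^n(s), see (2.5)-(2.6)."""
    n = len(points) - 1
    return tuple(sum(comb(n, i) * s**i * (1 - s)**(n - i) * p[d]
                     for i, p in enumerate(points))
                 for d in range(len(points[0])))

# A cubic Bezier with four hypothetical control points.
P = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
for s in (0.0, 0.25, 0.5, 1.0):
    a, b = de_casteljau(P, s), bernstein_eval(P, s)
    assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
# The curve starts in P_0 and ends in P_n.
assert de_casteljau(P, 0.0) == P[0] and de_casteljau(P, 1.0) == P[-1]
```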

**2.5.2 B-splines**

A Bézier curve has the desired geometrical interpretation and, being a polynomial, will be smooth. However, it would need to be of a high degree to represent a long complicated curve. A Bézier curve of high degree is computationally expensive to evaluate and is generally avoided. Instead, this can be solved by piecing together several low order Bézier curves, forming a composite Bézier curve. This approach needs additional requirements to preserve smoothness in the junctions between the curves.


Figure 2.9: Graphical illustration of De Casteljau's algorithm used to evaluate a point on the cubic Bézier curve (black) resulting from the four control points $P_0^0, P_1^0, P_2^0, P_3^0$. Linear interpolations (blue) between the control points give a set of intermediate control points $P_0^1, P_1^1, P_2^1$ placed a fraction $s$ along the interpolations. The process is repeated until a single control point $P_0^3$ gives the Bézier curve evaluated for the given $s$.

One solution to this problem is the use of splines, which are piecewise polynomial functions that are continuous and continuously differentiable up to a given order even at the breakpoints. The basis spline (B-spline) is a spline function with minimal support for a given degree, smoothness and domain partition [5]. An example of a B-spline curve is illustrated in Figure 2.10. The figure shows how the control points of a B-spline shape the curve by bending it locally.

A B-spline of degree $k$, order $k + 1$, is defined as

$$
\Gamma(s) = \sum_{i=0}^{n} P_i \beta_{i,k}(s), \tag{2.8}
$$

where $P_i$ are control points and $\beta_{i,k}(s)$ are the normalized B-spline basis functions of order $k + 1$, defined over a knot vector

$$
T = [\tau_0, \dots, \tau_{n+k}] \in \mathbb{R}^{n+k+1}. \tag{2.9}
$$

The values $\tau_i$, $i = 0, \dots, n+k$, are the values of $s$ for which the breakpoints of the B-spline occur. The basis functions $\beta_{i,k}(s)$ are given by the Cox-de Boor recursion formulas [2]:

$$
\beta_{i,0}(s) =
\begin{cases}
1, & \text{if } \tau_i \leq s \leq \tau_{i+1} \\
0, & \text{otherwise,}
\end{cases}
$$


Figure 2.10: Quadratic B-spline curve with clamped knot vector, marked with control points (•), control polygon (black) and component curves (red, blue, red). The component curves are the three separate second degree polynomials that make up the B-spline.

and for $k > 0$

$$
\beta_{i,k}(s) = \frac{s - \tau_i}{\tau_{i+k} - \tau_i} \beta_{i,k-1}(s) + \frac{\tau_{i+k+1} - s}{\tau_{i+k+1} - \tau_{i+1}} \beta_{i+1,k-1}(s). \tag{2.10}
$$

The recursion formula is a generalization of (2.7) that is not limited to a single basis expression over the fixed interval $s \in [0, 1]$. If the knot vector is chosen to be strictly increasing, then for a given $s \in [\tau_0, \tau_{n+k}]$ only a single $\beta_{i,0}(s)$ will be non-zero. Inserted into the recursion, this reveals that only $k + 1$ basis functions can be non-zero, as illustrated in Figure 2.11. Inserted into (2.8), this shows that the B-spline is locally determined by only $k + 1$ control points.

The property of Bézier curves beginning and ending in control points is not in general shared by B-spline curves. However, this can be achieved by substituting the strictly increasing knot vector with a clamped knot vector. A clamped knot vector is a non-decreasing knot vector where the first and last elements are repeated $k$ times,

$$
T = [\underbrace{\tau_k, \dots, \tau_k}_{k+1}, \tau_{k+1}, \dots, \tau_{n-1}, \underbrace{\tau_n, \dots, \tau_n}_{k+1}]. \tag{2.11}
$$

One essential property of a B-spline is that any affine transformation of the curve can be achieved by applying the transformation to the control points [9]. This means that the curve can easily be translated, rotated and/or scaled. The B-spline basis functions of a curve for the sample points $s_0, \dots, s_m$ are known as a collocation matrix, defined by

$$
B = \begin{bmatrix}
\beta_{0,k}(s_0) & \dots & \beta_{n,k}(s_0) \\
\vdots & \ddots & \vdots \\
\beta_{0,k}(s_m) & \dots & \beta_{n,k}(s_m)
\end{bmatrix}. \tag{2.12}
$$

Figure 2.11: Illustration of finding the non-zero basis functions of a cubic B-spline.
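The recursion (2.10) and the collocation matrix (2.12) can be sketched as follows. This illustration uses the common convention of $n+k+2$ knots for a clamped B-spline, which indexes slightly differently from (2.9), and the knot values and control points are invented for the example:

```python
import numpy as np

def bspline_basis(i, k, s, T):
    """Normalized B-spline basis beta_{i,k}(s) via the Cox-de Boor recursion (2.10)."""
    if k == 0:
        if T[i] <= s < T[i + 1]:
            return 1.0
        # Treat the final knot span as closed so s = T[-1] is included.
        return 1.0 if s == T[-1] and T[i] < T[i + 1] == s else 0.0
    left, right = 0.0, 0.0
    if T[i + k] != T[i]:
        left = (s - T[i]) / (T[i + k] - T[i]) * bspline_basis(i, k - 1, s, T)
    if T[i + k + 1] != T[i + 1]:
        right = ((T[i + k + 1] - s) / (T[i + k + 1] - T[i + 1])
                 * bspline_basis(i + 1, k - 1, s, T))
    return left + right

def collocation_matrix(samples, n, k, T):
    """Collocation matrix with entries beta_{i,k}(s_j), as in (2.12)."""
    return np.array([[bspline_basis(i, k, s, T) for i in range(n + 1)]
                     for s in samples])

# Quadratic B-spline (k = 2) with four control points (n = 3) and a clamped
# knot vector in the spirit of (2.11).
k, n = 2, 3
T = [0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0]
B = collocation_matrix(np.linspace(0.0, 2.0, 9), n, k, T)
assert np.allclose(B.sum(axis=1), 1.0)  # partition of unity on the domain
P = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 2.0], [4.0, 0.0]])
curve = B @ P                           # curve points Gamma(s_j), see (2.8)
# With a clamped knot vector the curve starts and ends in control points.
assert np.allclose(curve[0], P[0]) and np.allclose(curve[-1], P[-1])
```

The affine-invariance property mentioned above means that, for example, rotating `P` with a rotation matrix and recomputing `B @ P` gives the rotated curve directly.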

**2.6 Simultaneous localization and mapping**

A major problem when mapping lane tracks in the surrounding environment is that measurements from most sensors will be relative to the vehicle and not to the world coordinates. These can be related to a pose estimate in the map while it is forming. It then becomes necessary to simultaneously determine the vehicle pose and the positions of the map features. This problem is known as simultaneous localization and mapping (slam), which is common in robot applications and has recently been studied extensively [3], [4].

Finding the vehicle poses and feature positions that minimize a given cost function, e.g. the weighted squared distance between predicted and obtained measurements, while taking noise into account, can be formulated as an optimization problem. Let

• $x$ be a vector containing the current estimates of all robot poses and feature positions,

• $m_i^j$ be measurement $j$ at frame $i$,

• $H_{m_i^j}(x)$ be the expected value of measurement $m_i^j$ based on $x$,

• $P_{m_i^j}$ be the positive definite covariance matrix of the noise for measurement $m_i^j$.


Using a weighted least squares cost function, the slam problem can then be formulated as

$$\min_x \sum_{i=1}^{p} \sum_{j=1}^{k_i} \left(m_i^j - H_{m_i^j}(x)\right)^T P_{m_i^j}^{-1} \left(m_i^j - H_{m_i^j}(x)\right), \tag{2.13}$$

where $i$ is an index over all frames and $j$ is an index over all measurements in frame $i$. Since $P_{m_i^j}$ is positive definite, the same holds for its inverse, which can be split using a Cholesky decomposition, giving

$$P_{m_i^j}^{-1} = L_{m_i^j} L_{m_i^j}^T, \tag{2.14}$$

where $L_{m_i^j}$ is lower triangular. By use of (2.14) and by defining the residual

$$r_i^j(x) = m_i^j - H_{m_i^j}(x), \tag{2.15}$$

(2.13) can be reformulated as a least squares problem

$$\min_x \sum_{i=1}^{p} \sum_{j=1}^{k_i} (r_i^j(x))^T L_{m_i^j} L_{m_i^j}^T r_i^j(x) = \min_x \sum_{i=1}^{p} \sum_{j=1}^{k_i} (L_{m_i^j}^T r_i^j(x))^T (L_{m_i^j}^T r_i^j(x)). \tag{2.16}$$

Finally, the terms $L_{m_i^j}^T r_i^j(x)$ can be stacked in a single vector $r(x)$ so that the slam problem is formulated on standard least squares form,

$$\min_x \; r^T(x) r(x). \tag{2.17}$$
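The Cholesky-based conversion from a weighted to a standard least squares cost can be sketched in a few lines of Python; the 2 × 2 covariance and residual below are illustrative values only, not thesis data:

```python
import math

def cholesky(M):
    """Lower-triangular L with M = L L^T (M symmetric positive definite)."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(M[i][i] - s) if i == j else (M[i][j] - s) / L[j][j]
    return L

# Illustrative 2x2 measurement covariance P and residual r.
P = [[4.0, 1.0], [1.0, 3.0]]
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
P_inv = [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]

r = [1.0, 2.0]
L = cholesky(P_inv)                      # P^{-1} = L L^T, as in (2.14)
w = [L[0][0] * r[0] + L[1][0] * r[1],    # whitened residual w = L^T r
     L[0][1] * r[0] + L[1][1] * r[1]]

weighted = sum(r[i] * sum(P_inv[i][j] * r[j] for j in range(2)) for i in range(2))
whitened = w[0] ** 2 + w[1] ** 2
print(abs(weighted - whitened) < 1e-12)  # the two costs agree
```

Whitening each residual with $L^T$ turns the Mahalanobis-weighted cost into an ordinary sum of squares, which is what allows standard least squares solvers to be applied.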

**2.7 Nonlinear optimization**

Solving a large nonlinear optimization problem analytically can in practice be impossible. Instead, an approximate optimum can be found using numerical algorithms such as the Newton or Gauss-Newton methods. Both methods find approximations of local minima of twice differentiable functions $f: \mathbb{R}^n \mapsto \mathbb{R}$ by truncating the Taylor series

$$f(x) = f(x^{(k)}) + (\nabla f(x^{(k)}))^T (x - x^{(k)}) + \tfrac{1}{2} (x - x^{(k)})^T H_f(x^{(k)}) (x - x^{(k)}) + \dots, \tag{2.18}$$

where $H_f$ is the Hessian matrix of $f$. Removing all terms of order higher than two from the Taylor series gives a local quadratic approximation, which has a minimum only if the Hessian is positive definite.


**2.7.1 Newton**

From a given starting point $x^{(0)}$, the Newton algorithm consists of iterating

$$H_f(x^{(k)}) s^{(k)} = -\nabla f(x^{(k)}), \qquad x^{(k+1)} = x^{(k)} + s^{(k)} \tag{2.19}$$

until convergence. A local minimum is found if the Hessian is indeed positive definite. Instead of running until full convergence, the Newton method usually adds a stopping condition, such as a threshold on the step length, at which the current approximation is considered sufficient.
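The iteration (2.19) with a step-length stopping condition can be written as a one-dimensional sketch (not the thesis implementation; the test function is an illustrative convex choice):

```python
import math

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """1-D Newton: iterate f''(x) s = -f'(x) until the step is small."""
    x = x0
    for _ in range(max_iter):
        s = -grad(x) / hess(x)
        x += s
        if abs(s) < tol:          # stopping condition on step length
            break
    return x

# f(x) = x^2 - ln(x), convex for x > 0, minimum at x = 1/sqrt(2)
grad = lambda x: 2 * x - 1 / x
hess = lambda x: 2 + 1 / x ** 2
x_star = newton_minimize(grad, hess, x0=1.0)
print(x_star)  # ≈ 0.707107 (= 1/sqrt(2))
```

Since the Hessian of this test function is positive everywhere on $x > 0$, each Newton step points toward the minimum and convergence is quadratic near the solution.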

**2.7.2 Gauss-Newton**

Calculating the Hessian needed in the Newton method is often computationally expensive. Gauss-Newton is a specialization of the Newton method that avoids this calculation by restricting itself to nonlinear least squares minimization problems. Using the same notation as in (2.17), the gradient and Hessian in the least squares minimization problem are

$$\nabla f(x) = J_r^T(x) r(x), \qquad H_f(x) = J_r^T J_r + \sum_{i=1}^{m} r_i(x) H_{r_i}(x), \tag{2.20}$$

where $J_r$ is the Jacobian of $r$ and $H_{r_i}$ is the Hessian of the component function $r_i$. Instead of calculating the $H_{r_i}$, these terms are skipped. This gives the Hessian approximation $H_f(x) \approx J_r^T J_r$.
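A one-parameter sketch of Gauss-Newton with the $J_r^T J_r$ Hessian approximation, fitting $y = e^{ax}$ to illustrative noise-free data (not thesis data):

```python
import math

def gauss_newton(residual, jacobian, theta0, iters=20):
    """One-parameter Gauss-Newton: Hessian approximated by J^T J."""
    theta = theta0
    for _ in range(iters):
        r = residual(theta)
        J = jacobian(theta)
        JtJ = sum(j * j for j in J)          # approximate Hessian J^T J
        Jtr = sum(j * ri for j, ri in zip(J, r))
        theta -= Jtr / JtJ                   # Newton step with approximated Hessian
    return theta

# Fit y = exp(a * x) to noise-free samples generated with a = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.5 * x) for x in xs]
residual = lambda a: [math.exp(a * x) - y for x, y in zip(xs, ys)]
jacobian = lambda a: [x * math.exp(a * x) for x in xs]
a_hat = gauss_newton(residual, jacobian, theta0=0.0)
print(a_hat)  # ≈ 0.5
```

For a zero-residual problem like this, the skipped $r_i H_{r_i}$ terms vanish at the solution, so Gauss-Newton retains the fast local convergence of full Newton.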

**3 Method**

This chapter introduces the application specific models and the slam problem formulation. A lane track map is created where each separate track is modeled by a B-spline, and the vehicle pose for each camera frame is given in world coordinates. Pose changes between sequential camera frames and sampled clothoid parameters of detected lane tracks are used as measurements. Differences between models and measurements are used to formulate an optimization slam problem that is solved using the Gauss-Newton method. For validation purposes, a simple alternative method is also introduced.

**3.1 Lane parametrisation**

The B-spline parameters are chosen based on the lane track model. Since the lane track model is approximated with a third-degree polynomial, it is reasonable to choose a B-spline of degree three.

To get the B-spline control points evenly distributed along the track curve independent of driven speed, the knot vector values are chosen based on accumulated curve length. Since the true curve length is unknown, it is approximated by the driven distance. The accumulated driven distance for the first camera frame is $l_0 = 0$, and for the remaining frames it is approximated by the chord length

$$l_i = \sum_{j=1}^{i} \| d_j \|_2, \tag{3.1}$$

where $d_j$ is the odometric measurement of the afacg translation between camera frames $j-1$ and $j$. The knot vector is chosen to be clamped and uniformly spaced, with starting value $l_a$ and end value $l_b$, where $a$ and $b$ are the first and last frames where the track is detected. By basing the B-spline parametrisation $s$ on the vehicle motion, an association between the camera frames and detected track features is directly given. This removes the need for the point matching that often occurs in visual tracking.
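The chord-length accumulation (3.1) and the clamped, uniformly spaced knot vector can be sketched as follows; the translation vectors and the number of knot spans are illustrative, not thesis data:

```python
import math

def chord_lengths(translations):
    """Accumulated driven distance l_i from per-frame translations d_j, as in (3.1)."""
    l = [0.0]
    for d in translations:
        l.append(l[-1] + math.sqrt(sum(c * c for c in d)))
    return l

def clamped_knots(l_a, l_b, degree, n_spans):
    """Clamped knot vector, uniformly spaced between first and last detection distance."""
    step = (l_b - l_a) / n_spans
    interior = [l_a + i * step for i in range(1, n_spans)]
    return [l_a] * (degree + 1) + interior + [l_b] * (degree + 1)

# Hypothetical odometric translations between consecutive frames (metres).
d = [(1.0, 0.0, 0.0), (1.0, 0.1, 0.0), (0.9, 0.2, 0.05)]
l = chord_lengths(d)
knots = clamped_knots(l[0], l[-1], degree=3, n_spans=2)
print(l)
print(knots)
```

Because each knot is tied to a driven distance, the knot span containing $s = l_i$ identifies which control points a detection in frame $i$ constrains, without any explicit point matching.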

**3.2 Models**

In this section, a description of the input and motion models is given, including a definition of the states that represent the map and vehicle pose. It is given in two versions: a simple one that does not use a motion model, and an extended one utilizing a motion model.

**3.2.1 System state**

The vehicle motion is represented by a set of pose states. When a motion model is used, additional states related to derivatives are required, and an extended vehicle state is used. Since B-splines are used to model the tracks, only a few feature points are needed to form the surrounding map.

**Vehicle state**

The vehicle state, determined for each frame $i = 0, \dots, N$, contains the Cartesian coordinates $p_i$ and the Euler angles $q_i$ of the afacg in frame $i$, given in world coordinates. Together they form the vehicle pose $x_i^v$, which is used to relate measurements in the frame to world features. The vehicle state in the simple state model is given by

$$p_i = \begin{pmatrix} p_i^X \\ p_i^Y \\ p_i^Z \end{pmatrix}, \quad q_i = \begin{pmatrix} \phi_i \\ \theta_i \\ \psi_i \end{pmatrix}, \quad x_i^v = \begin{pmatrix} p_i \\ q_i \end{pmatrix}. \tag{3.2}$$

In the extended state model, the additional states forward velocity $v_i^X$ and front wheel angle $\gamma_i$ are added, giving the vehicle state model

$$x_i^v = (p_i^T \;\; q_i^T \;\; v_i^X \;\; \gamma_i)^T. \tag{3.3}$$

**Map state**

The B-spline representing a lane track is a curve in world coordinates that is defined by the knot vector and the control points. The knot vector is considered constant, which fixes the order and number of control points. Each control point $j$ for track $l$ has a state vector containing the world coordinates $\tau$,

$$x_{l,j}^f = \tau_{l,j} = \begin{pmatrix} \tau_{l,j}^X \\ \tau_{l,j}^Y \\ \tau_{l,j}^Z \end{pmatrix}. \tag{3.4}$$


Assuming $L$ tracks with $M_l$ control points each, the combined state vector becomes

$$x = \left( (x_0^v)^T, \dots, (x_N^v)^T, (x_{1,1}^f)^T, \dots, (x_{1,M_1}^f)^T, \dots, (x_{L,1}^f)^T, \dots, (x_{L,M_L}^f)^T \right)^T. \tag{3.5}$$

**3.2.2 Odometric measurement model**

The measurement of the change in pose $y_i^o$ from frame $i-1$ to $i$ is given by

$$y_i^o = h^o(x_{i-1}^v, x_i^v) + e_i^o, \tag{3.6a}$$

where $e_i^o$ is Gaussian measurement noise.

Measurements from the pre-processed motion estimates between frames directly give the next pose in the afacg of the originating frame. The change in angles will be the same in both the world coordinates and the afacg. This means that a transformation of the world coordinate point of the next pose is sufficient to model the pose measurement. In other words, by moving from pose $i-1$ to $i$, the odometric measurement model is

$$h^o(x_{i-1}^v, x_i^v) = \begin{pmatrix} R_{i-1} & 0 \\ 0 & I_3 \end{pmatrix} (x_i^v - x_{i-1}^v). \tag{3.6b}$$

The additional states in the extended state model are measured directly; therefore the measurement models are

$$y_i^{o,v} = v_i^X + e_i^{o,v}, \tag{3.6c}$$

$$y_i^{o,\gamma} = \gamma_i + e_i^{o,\gamma}, \tag{3.6d}$$

where $e_i^{o,v}$ and $e_i^{o,\gamma}$ are Gaussian measurement noise.

**3.2.3 Motion model**

In addition to the odometric measurements, a motion model is added in the extended model to improve the motion estimation. Additional input is given as

$$u_i = [a_i^X \;\; \omega_i]^T, \tag{3.7}$$

where $a_i^X$ is the forward acceleration and $\omega_i$ is the angular velocity of the front wheels. The forward acceleration is measured by a sensor independent of the speed measurement. However, the angular velocity of the front wheels is given by the same front wheel angle sensor as used in (3.6d). The motion model is introduced by

$$x_{i+1}^v = f(x_i^v, u_i) + \nu_i, \tag{3.8a}$$

where $\nu_i$ is Gaussian process noise. With $T$ as the sampling time and $W$ as the wheel base,

$$f(x_i^v, u_i) = \begin{pmatrix} p_i + T v_i^X R_i (1 \;\; 0 \;\; 0)^T \\ q_i + \frac{T v_i^X}{W} (0 \;\; 0 \;\; \tan(\gamma_i))^T \\ v_i^X + T a_i^X \\ \gamma_i + T \omega_i \end{pmatrix}. \tag{3.8b}$$
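A planar sketch of the motion model (3.8b), keeping only the yaw angle and using illustrative values for $T$ and $W$ (the full model propagates all three Euler angles):

```python
import math

# Planar sketch of (3.8b): state (x, y, psi, v, gamma), input (a, omega).
# T is the sampling time and W the wheel base; both values are illustrative.
def motion_model(state, u, T=0.1, W=2.7):
    x, y, psi, v, gamma = state
    a, omega = u
    return (
        x + T * v * math.cos(psi),          # p_i + T v_i^X R_i e_1, planar case
        y + T * v * math.sin(psi),
        psi + T * v / W * math.tan(gamma),  # yaw rate from the front-wheel angle
        v + T * a,                          # v_i^X + T a_i^X
        gamma + T * omega,                  # gamma_i + T omega_i
    )

state = (0.0, 0.0, 0.0, 10.0, 0.0)          # driving straight at 10 m/s
state = motion_model(state, u=(1.0, 0.0))   # accelerate, keep wheels straight
print(state)  # moved 1 m forward, sped up to 10.1 m/s
```

This is the standard kinematic bicycle model: the front-wheel angle enters the yaw rate through $\tan(\gamma)/W$, which is why both $\gamma$ and $v^X$ must be part of the extended state.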

**3.2.4 Track measurement model**

A track curve of degree $k$ is modeled by extracting the $k+1$ control points $x_{l,a-k}^f, x_{l,a-k+1}^f, \dots, x_{l,a}^f$ that locally define the curve. The value of $a$ is determined by the frame parametrisation $s = l_i$ and the knot vector, such that $s \in [l_a, l_{a+1}[$. For $s \in [l_a, l_{a+1}[$ the only non-zero basis functions will be $\beta_{a-k,k}, \beta_{a-k+1,k}, \dots, \beta_{a,k}$, which are polynomials of degree $\leq k$. This means that the local track curve $\Gamma(s)$ can be written as

$$\begin{aligned}
\Gamma(s) &= \sum_{j=0}^{M} x_{l,j}^f \beta_{j,k}(s) = \sum_{j=a-k}^{a} x_{l,j}^f \beta_{j,k}(s) = \sum_{j=a-k}^{a} x_{l,j}^f (\beta_0^j + \beta_1^j s + \dots + \beta_k^j s^k) \\
&= \begin{pmatrix} x_{l,a-k}^f & x_{l,a-k+1}^f & \dots & x_{l,a}^f \end{pmatrix}
\begin{pmatrix}
\beta_0^{a-k} & \beta_1^{a-k} & \dots & \beta_k^{a-k} \\
\beta_0^{a-k+1} & \beta_1^{a-k+1} & \dots & \beta_k^{a-k+1} \\
\vdots & \vdots & \ddots & \vdots \\
\beta_0^{a} & \beta_1^{a} & \dots & \beta_k^{a}
\end{pmatrix}
\begin{pmatrix} 1 \\ s \\ \vdots \\ s^k \end{pmatrix}
= X_a^f B_a \begin{pmatrix} 1 & s & \dots & s^k \end{pmatrix}^T.
\end{aligned} \tag{3.9}$$

The lane track curve measurements will be in the afacg, so the affine invariance of B-splines is utilized by transforming the control points using the estimated frame pose. For brevity, addition and subtraction between an $m \times n$ matrix $M$ and an $m \times 1$ vector $V$ are defined column-wise as

$$M \pm V = \begin{pmatrix}
M_{11} \pm V_1 & M_{12} \pm V_1 & \dots & M_{1n} \pm V_1 \\
M_{21} \pm V_2 & M_{22} \pm V_2 & \dots & M_{2n} \pm V_2 \\
\vdots & \vdots & \ddots & \vdots \\
M_{m1} \pm V_m & M_{m2} \pm V_m & \dots & M_{mn} \pm V_m
\end{pmatrix}.$$

Using a transformation from the world coordinates to the afacg, the modeled B-spline curve as detected from the vehicle is

$$\Gamma_{\text{AFACG}}(s) = R_i (X_a^f - p_i) B_a \begin{pmatrix} 1 & s & \dots & s^k \end{pmatrix}^T. \tag{3.10}$$


For the remainder of this section, non-indexed $x$, $y$ and $z$ will refer to the Cartesian coordinates or functions of such. The curve now consists of a parametric polynomial expression of degree $k$ for each axis in the afacg coordinate system,

$$\Gamma_{\text{AFACG}}(s) = \begin{pmatrix} x(s) \\ y(s) \\ z(s) \end{pmatrix}. \tag{3.11}$$

The lane track measurements are given as the coefficients of (2.2), which for $x < x_t$ with a small angle approximation gives the explicit polynomial curves $y(x)$ and $z(x)$. Since the parametrisation $s$ is chosen as the driven distance, and the $x$-axis in the afacg is the forward direction, the curve $x(s)$ is approximated by a first-degree polynomial $x_1(s)$, which can then be used to substitute $s$ in $y(s)$ and $z(s)$ to form comparable explicit curves $y(x)$ and $z(x)$ from the B-spline. The measurements are the vectorized polynomial coefficients, which results in the track measurement model

$$y_{l,i}^f = h^f(x_i^v, x_{l,1}^f, \dots, x_{l,M_l}^f) + e_{l,i}^f, \tag{3.12a}$$

where $e_{l,i}^f$ is Gaussian measurement noise and

$$h^f(x_i^v, x_{l,1}^f, \dots, x_{l,M_l}^f) = \begin{pmatrix} B_a (R_i(X_a^f - p_i))^T (0 \;\; 1 \;\; 0)^T \\ B_a (R_i(X_a^f - p_i))^T (0 \;\; 0 \;\; 1)^T \end{pmatrix}. \tag{3.12b}$$

**3.3 SLAM formulation**

The odometric and track measurement models are combined with the motion model to formulate the slam problem, with the goal of minimizing the sum of squared residuals, weighted with the measurement and process noise covariances. For the measurements, the residuals are the differences between the measurements and the models presented in Section 3.2. The residuals for the motion model are the differences between the modeled next vehicle state and the corresponding entries in the current state vector. The resulting optimization problem is

$$\min_x \sum_{i=0}^{N} \sum_{l=0}^{L} \| h^f(x_i^v, x_{l,1}^f, \dots, x_{l,M_l}^f) - y_{l,i}^f \|^2_{P_{e_{l,i}^f}^{-1}} + \sum_{i=1}^{N} \| h^o(x_{i-1}^v, x_i^v) - y_i^o \|^2_{P_{e_i^o}^{-1}} + \sum_{i=0}^{N-1} \| f(x_i^v, u_i) - x_{i+1}^v \|^2_{P_{\nu_i}^{-1}}, \tag{3.13}$$

where the vector norm $\| \cdot \|_{P^{-1}}$ is the Mahalanobis distance, i.e.,

$$\| e \|^2_{P^{-1}} = e^T P^{-1} e = (P^{-T/2} e)^T (P^{-T/2} e) = \| P^{-T/2} e \|^2_2. \tag{3.14}$$

No additional constraints are needed since the states can assume any real value. If the extended model is not used, the sum containing motion model residuals is ignored and the odometric model is adjusted accordingly. As shown in Section 2.6, this weighted least squares formulation can be converted to a standard least squares problem that can be solved using Gauss-Newton. However, there is no guarantee that the local optimum found by Gauss-Newton will be a global optimum.

**3.4 Separate localization and mapping**

For comparative purposes, a simple separate approach is used, which in this thesis will be referred to as separate localization and mapping (seplam). This method consists of first estimating the motion taking nothing but the odometric measurements into account, and then mapping the lane tracks.

The motion estimate is obtained by using the odometric measurements as transformations between the afacg coordinate systems of consecutive frames. These transformations are multiplied until the first pose is reached, which is defined as the world coordinates. For each frame, a sample point of the detected lane tracks is extracted into the current afacg coordinate system from the clothoid model. The sample point is then transformed into world coordinates using the existing motion estimate.

Finally, the lane tracks are smoothed by fitting a B-spline to the sample points. The B-spline is chosen to be of order four, resulting in a third-degree polynomial curve, which corresponds to the polynomial degree of the used road model. A clamped, uniformly spaced knot vector is used, with additional control points added for each 50 m of chord length. The fitting is done by finding the control point matrix $P$ that minimizes the sum of squared distances between the sample points and the B-spline curve. Assuming there are $n+1$ control points and $m+1$ sample points, this is formulated as

$$\min_P \sum_{j=0}^{m} \Big\| \sum_{i=0}^{n} P_i \beta_{i,3}(u_j) - d_j \Big\|^2_2, \tag{3.15}$$

where $\beta_{i,3}$ are the basis functions and $d_j$ are the sampled coordinates. This is a least squares problem with solution

$$P = (B^T B)^{-1} B^T D, \tag{3.16}$$

where $B$ is the collocation matrix (2.12) of the basis functions at $u_0, \dots, u_m$ and $D$ contains the stacked sample points $d_j$.
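The fit (3.15)-(3.16) can be sketched in pure Python by building a collocation matrix and solving the normal equations. The spline here is quadratic rather than cubic to keep the example short, the knot vector and samples are illustrative, and the data lie on a straight line so the fit is exact:

```python
def basis(j, k, t, s):
    """Cox-de Boor value of the basis function beta_{j,k} at s."""
    if k == 0:
        return 1.0 if t[j] <= s < t[j + 1] else 0.0
    v = 0.0
    if t[j + k] != t[j]:
        v += (s - t[j]) / (t[j + k] - t[j]) * basis(j, k - 1, t, s)
    if t[j + k + 1] != t[j + 1]:
        v += (t[j + k + 1] - s) / (t[j + k + 1] - t[j + 1]) * basis(j + 1, k - 1, t, s)
    return v

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for the normal equations."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

k = 2
t = [0.0, 0.0, 0.0, 1.0, 2.0, 2.0, 2.0]   # clamped quadratic, 4 control points
n_cp = len(t) - k - 1
u = [0.0, 0.4, 0.8, 1.2, 1.6, 1.99]       # sample parameters (illustrative)
d = list(u)                               # samples taken from the line d(s) = s

B = [[basis(j, k, t, s) for j in range(n_cp)] for s in u]
BtB = [[sum(B[r][i] * B[r][j] for r in range(len(u))) for j in range(n_cp)]
       for i in range(n_cp)]
Btd = [sum(B[r][i] * d[r] for r in range(len(u))) for i in range(n_cp)]
P = solve(BtB, Btd)                       # P = (B^T B)^{-1} B^T d, as in (3.16)
print(P)  # for data on a line, the control points are the Greville abscissae
```

For one-dimensional samples the normal equations give the control points directly; for 3-D sample points the same system is solved once per coordinate axis.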

**4 Result**

In this chapter, implementation details of the methods described in Chapter 3 are presented. Results from the methods are then compared to "ground truth" from dgps and camera view images. The feasibility of using the estimated motion and track map for fault detection is also investigated. Finally, the results are discussed and analyzed.

**4.1 Implementation**

The methods implemented and compared are slam with and without a motion model, along with seplam. For brevity, slam with a motion model will be referred to as slam Extended (slame), while slam will denote the method without a motion model. Unless otherwise mentioned, the motion estimate used as input is visual odometry, but in some cases EgoMotion is also utilized.

The models and cost function are implemented as described in Chapter 3. From the cost function, a residual vector is formed that, along with the corresponding Jacobian, forms a least squares optimization problem that is iteratively solved using Gauss-Newton. The state vector is initialized to zeros, as no better initial guess is needed.

**4.1.1 Covariance**

The slam problem formulation contains weights in terms of inverse covariance matrices of the measurement and process noise. The true covariances of the measurement noise are not known, as the pre-processing algorithms do not currently provide them. Until these become available, the covariance matrices are treated as parameters, which is always the case for the covariance of the process noise.


To reduce the number of parameters, all noise is considered independent. This is known not to be true in almost any case for the measurement noise, but the approximation is still made. The noise is also considered constant across a sequence, apart from the lane track parameters, which are scaled by detection distance. For a 30 s long driving sequence, these combined approximations reduce the number of parameters to tune from 90000 to 32. Tuning the remaining parameters is done manually, only to the accuracy of a power of ten.

**4.1.2 Convergence**

For non-convex optimization problems the Gauss-Newton method is not guaranteed to converge, and if it converges, the rate may still vary. The parameters are therefore tuned not only based on accuracy but also with regard to convergence. Poor convergence occurs when the Hessian is ill-conditioned, so the tuning is done with the resulting Hessian in mind. An example of the convergence for slam can be seen in Figure 4.1, where only three iterations are sufficient until improvements are negligible. Since convexity of the cost function has not been proven, there is no guarantee that the found state is the global optimum. To determine whether Gauss-Newton gets stuck in a local optimum, the algorithm is run several times starting from randomized initial state vectors. Every time, the resulting state vector is the same, although the number of iterations required to reach it varies. This indicates that an approximation of the global optimum is reached without applying methods for avoiding local optima, such as grid search.

Figure 4.1: Cost function in the slam problem for each iteration of the Gauss-Newton method.


**4.2 Data sequences**

The data used is from existing databases of Autoliv AB. These contain driving sequences from all over the world under varying road and weather conditions. A typical sequence is a 30 s long driving log. In total, a few hours of data are used for validation, while only a few minutes are used for parameter estimation. The parameters are not adjusted based on conditions, but inputs such as the range of detected lane tracks and the visual odometry are affected. Some of the sequences used are described in Table 4.1, while the rest are classified only by road type and time of day.

Table 4.1: Descriptions of driving sequences.

| Sequence | Description |
| --- | --- |
| A | Highway during day; road is initially flat and straight, followed by a slow elevated curve to the right. |
| B | Test track during day; road is flat and straight. |
| C | Rural road during day; road is curved in alternating directions, initially heading down, followed by a climb. |
| D | 2+1 road during night; road is flat and initially straight, followed by a flat curve to the left. |
| E | Rural road during day; road is flat and straight apart from a 90° left turn. |

**4.3 Performance validation**

Estimated track and motion curves are validated by comparison to corresponding "ground truth" curves. What forms the "ground truth" varies depending on whether the result is compared in world or image coordinates. There is no available "ground truth" containing lanes in world coordinates. However, dgps, when available, does provide "ground truth" in world coordinates for the motion estimate. In image coordinates there exists data with manually marked lane tracks, which serves as "ground truth".

**4.3.1 Mean Squared Error**

The measure used to compare two curves is the mean squared error (mse), which for two sampled curves $u$, $v$ is

$$\text{mse}(u, v) = \frac{1}{N} \sum_{i=1}^{N} \| u_i - v_i \|^2_2, \tag{4.1}$$

where $N$ is the number of samples. Curves are sampled each meter in the world coordinates and each pixel in the ico.
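A direct sketch of (4.1) for curves sampled as 2-D points; the sample values below are illustrative:

```python
def mse(u, v):
    """Mean squared error (4.1) between two equally sampled point curves."""
    assert len(u) == len(v) and len(u) > 0
    return sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(u, v)) / len(u)

# Two hypothetical curves sampled as 2-D points, one metre apart.
u = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
v = [(0.0, 0.1), (1.0, 0.0), (2.0, -0.1)]
print(mse(u, v))  # (0.01 + 0 + 0.01) / 3
```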


**4.3.2 Camera view**

Curves in the ico plotted over camera images, subject to visual inspection, form a qualitative measure. Transforming the motion and track maps into pixels as seen in the camera view is done as described in Section 2.2. This method of comparison has the drawback of being subjective and slow for assessing large data sets, thus forming a risk of overfitting to a few sequences.

**4.4 Input**

Before evaluating the proposed slam methods the quality of inputs is tested. As presented in Section 3.4, seplam creates a motion and lane track map directly from the input with little to no alteration. Therefore, it is used to benchmark the input and will also later act as reference.

**4.4.1 Motion**

Using gps for a rough global positioning, the Google Maps area corresponding to sequence A is extracted. This is then overlaid with the EgoMotion and visual odometry motion estimates. The curves are aligned to the map manually, which may be a source of error in this test. The resulting comparison is shown in Figure 4.2, where both estimates correspond along the straight part of the road but differ largely when the road starts to curve. The same motion estimates can be seen plotted in camera view in Figure 4.3.

**4.4.2 Lane Tracks**

It is already known that the accuracy of the lane detections is high, particularly at short range. Therefore this is not evaluated separately; instead the combined motion and lane track map created by seplam is tested. The seplam mapped lane tracks are smoothed using the same B-spline curve representation as used by slam and slame. To test whether the restrictions posed by this representation are a source of noticeable errors, seplam is in this test run both with and without the smoothing. This comparison is done for sequence A, with a camera view of the resulting marked lane tracks in Figure 4.4. The estimated track and motion map in world coordinates is shown in Figure 4.5. No noticeable difference between the smoothed and non-smoothed curves of seplam can be seen for this sequence.


Figure 4.2: Comparison of input motion estimates EgoMotion (red) and visual odometry (blue) overlaid on manually fitted map, where blue markers are gps start and stop positions.

Figure 4.3: Comparison of input motion estimates EgoMotion (red) and visual odometry (blue) overlaid on camera view for sequence A.


Figure 4.4: Comparison of seplam smoothed using B-splines (blue) and without smoothing (cyan) overlaid on camera view for sequence A.


Figure 4.5: Comparison of seplam smoothed using B-splines (blue) and without smoothing (cyan) in world coordinate system. The motion estimate (red) and B-spline control points (red *) are also added.


**4.5 Performance**

The methods slam and slame are tested using three different sources of "ground truth" for comparison. The only "ground truth" in world coordinates comes in terms of 2D positions from dgps, which are compared to the motion estimate. In camera view, both visual inspection and manually marked data are available for reference. These serve as "ground truth" for the lane tracks that could be detected at each time instance.

**4.5.1 DGPS**

On the test track used in sequence B, dgps measurements are available. Using these as "ground truth", the performance of the methods seplam and slam is evaluated. The result is shown in Figure 4.6, where the dgps positions are compared to the motion estimate. It shows that small sidewise motions in the lane are better captured by slam than by seplam. Using mse to compare also shows the overall better fit of slam, with an mse of 0.0046 compared to the 0.0163 of seplam. Note that the mapped lane tracks from the seplam method are not perfectly straight, in contrast to the slam mapped tracks. Also note that the estimated motion curve using both methods only overshoots the traveled distance by approximately 3.5 m, which is less than 1.0% of the driven distance. Although this test is only done on a straight road, the result stands in contrast to that of Figure 4.2, where the estimated driven distance is measured to undershoot by over 7.5%.

Figure 4.6: Comparison between estimated motion (red) and dgps positions (black) with modeled lane tracks (blue) added. The motion and track map is estimated using seplam in the left figure and slam in the right figure.


**4.5.2 Visual**

The main application of this thesis is to mark lanes in camera view for each separate frame. Because of this, the primary test is to see how well this is accomplished. The goal is to only mark curves fitting the painted lane markings on the side closest to the vehicle. The pre-processing algorithms do on occasion accidentally mark other features, such as cracks in the road, which then also will be marked in the track map. This cannot be avoided, as no distinction is made between detections and the method only processes input from other algorithms, not the images themselves. The marked camera views included in this report are selected because they reveal differences between the methods, while the more common case is that all methods align.

**Highway, sequence A**

In sequence A, the conditions are considered good, with clear markings, a flat straight road and good weather. Comparisons between the methods are shown in Figure 4.7 and Figure 4.8. In the initial straight part of the road, the lane marking is correct as far as the eye can see for seplam and slam, while slame further ahead shows behavior similar to the EgoMotion estimate in Figure 4.3. The main difference between slam and seplam that can be seen here is that slam produces track curves with higher elevation towards the end.

Later in the sequence, when entering the elevated road curve, the mapped tracks are noticeably different. Both seplam and slame overestimate the road's horizontal curvature and miss the painted markings significantly. slam does, however, appear to be able to compensate, only missing by a few pixels.

**Rural, sequence C**

The performance is expected to be reduced in the more difficult case of sequence C, with subsequent road curves and elevation changes. The results are shown in Figures 4.9, 4.10 and 4.11. In the first marked camera view, the slame estimated tracks follow the correct path in the horizontal plane while gravely underestimating the vertical curvature. seplam handles the elevation changes but, similarly to sequence A, overestimates the horizontal curvatures. The slam tracks outperform seplam by compensating for the error in horizontal curvature, but still miss by a few pixels in the distance.

Later on, in Figures 4.10 and 4.11, when the vehicle has traveled along the path, the accuracy of all methods is significantly lower at short range. At long range only slam marks correctly, but still misses by a few pixels.


Figure 4.7: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of sequence A.

Figure 4.8: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of sequence A.


Figure 4.9: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of early sequence C.

Figure 4.10: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of mid sequence C.


Figure 4.11: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of late sequence C.

**Night, sequence D**

The use of a motion model in sequences A and C has proven worse than the input estimates from seplam. However, this is not always the case, as is shown during night in sequence D. The marked tracks are shown in Figure 4.12. Again, the slame tracks noticeably differ from the others, but in this case they correctly mark the upcoming turn, while seplam and slam miss it. Behavior similar to slame is achieved by the other methods when instead using the EgoMotion odometric estimate, as shown in Figure 4.13. Then all methods correlate but estimate too steep a horizontal road curvature. The most accurate marking in this sequence is achieved by slame using visual odometry.


Figure 4.12: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of sequence D using visual odometry.

Figure 4.13: Comparison of slam (blue), slame (red) and seplam (cyan) in camera view of sequence D using EgoMotion.


**4.5.3 Marked data**

To not only evaluate the methods on a few select camera frames, the algorithms are run over multiple sequences and compared to manually marked lane tracks. The sequences are split into the categories highway day, highway night, rural day and rural night. The results of the comparisons are presented in Figure 4.14, with a more extensive description of the data collection and the data itself in Appendix A.


Figure 4.14: Median pixel marking error for automatically compared to manually marked lane tracks.

**4.6 Fault detection**

A useful application of mapping tracks is to detect instances where a real-time lane detection algorithm fails. Therefore, a comparison between measured and mapped tracks is made with the goal of detecting outliers. The comparison is partly made graphically, by adding track measurements to a world coordinate map using the motion estimate to transform from the afacg to the world coordinates. A result of this is shown in the left image of Figure 4.15. This is for sequence A, where the measurements are accurate, so no notable difference between measurement and track can be seen.

Another means of comparison is to, in afacg coordinates, calculate the mse of mapped to measured tracks. This is done for each camera frame, and the result is shown in the right image of Figure 4.15. If no lane track is measured in the frame, the comparative value is set to −1. This method also shows no clear differences between measured and mapped lane tracks for sequence A.


Figure 4.15: Comparison between mapped tracks and measurements for sequence A. The left figure is in world coordinates with measurements (green) compared to estimated curves (blue) and motion estimate (red). The right figure is frame-wise mse.

In Figure 4.16 the same comparison is made for sequence E, where the measurement quality is lower. It is clear from the graphical comparison that some outlying measurements exist towards the end of the driving sequence. The same can be seen as clear peaks in the right image. To verify that these indeed are faulty detections, the measurement is overlaid on the camera view in Figure 4.17. It is clear that in this particular case the fault detection has worked, but no further test cases have been done.


Figure 4.16: Comparison between mapped tracks and measurements for sequence E. The left figure is in world coordinates with measurements (green) compared to estimated curves (blue) and motion estimate (red). The right figure is frame-wise mse.