Exploring the Training Data for Online Learning of Autonomous Driving in a Simulated Environment

Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2020

Exploring the Training Data for Online Learning of Autonomous Driving in a Simulated Environment

Mathias Kindstedt
LiTH-ISY-EX--20/5325--SE

Supervisor: Zahra Gahraee
ISY, Linköping University

Examiner: Michael Felsberg
ISY, Linköping University

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2020 Mathias Kindstedt


Abstract

The field of autonomous driving is as active as it has ever been, but the reality where an autonomous vehicle can drive on all roads is currently decades away. Instead, using an on-the-fly learning method, such as qHebb learning, a system can, after some demonstration, learn the appearance of any road and take over the steering wheel. By training in a simulator, the amount and variation of training can increase substantially; however, an on-rails auto-pilot does not sufficiently populate the learning space of such a model. This study aims to explore concepts that can increase the variance in the training data whilst the vehicle trains online. Three computationally light concepts are proposed, each of which results in a model that can navigate through a simple environment, thus performing better than a model trained solely on the auto-pilot. The most noteworthy approach uses multiple thresholds to detect when the vehicle deviates too much and replicates the action of a human correcting its trajectory. After training on fewer than 300 frames, a vehicle successfully completed the full test environment using this method.


Sammanfattning

Autonomous driving is an active area in both industry and academia, but a reality where driverless vehicles can navigate any road is still decades away. Instead, by using an adaptive learning model such as qHebb learning, a system can be achieved that can drive itself on any road after an initial learning period. Using a simulator would substantially increase the opportunity to train such a model, as well as the variation of road types and surrounding landscape. However, a monotonous auto-pilot cannot sufficiently populate the learning space of the model. This work aims to explore concepts that can increase the variance of the training data while the vehicle is driving. Three computationally light methods are presented, all of which outperform the auto-pilot and result in a model that has learnt to follow a road through curves and straights. The foremost concept uses two thresholds to correct the steering of the vehicle when it deviates too far from the correct route. After training on fewer than 300 frames, this method completed all test segments without collision.


Acknowledgments

Firstly, I would like to thank my supervisor Zahra and examiner Michael for the interesting ideas and approaches, and for the help and feedback I’ve received during my work. My thanks also go to Karl, Gustav, and Mikael for much-needed help when the code was making my life hard.

The ongoing COVID-19 pandemic has beyond doubt increased the effort to produce this thesis, at least in the sense that the effort of finding papers on online learning — as in actually related to online learning, the machine-learning technique, and not online (distance) learning — has increased substantially. With that in mind, I would like to thank the whole of CVL for the daily fika breaks making each day better, and family and friends for keeping me sane with day trips, movie nights, runs, and a lot of ice cream throughout this spring.

Linköping, May 2020 Mathias Kindstedt


Contents

List of Figures
List of Tables
Notation

1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Problem Formulation
  1.4 Limitations

2 Related Work
  2.1 Behaviour Reflex Approaches
  2.2 Scene Abstraction Approaches
  2.3 On-the-fly Models
  2.4 qHebb Model

3 Theory
  3.1 Learning Models
    3.1.1 Hebbian Learning
    3.1.2 Associative Learning
    3.1.3 Associative Hebbian Learning
    3.1.4 Online Learning
  3.2 Information Representation
    3.2.1 Channel Representation
    3.2.2 Feature Descriptor - GIST
  3.3 qHebb Learning
    3.3.1 Concept
    3.3.2 Implementation Overview
  3.4 CARLA
    3.4.1 Simulator
    3.4.2 Benchmark
  3.5 Unbalanced Data

4 Method
  4.1 Changes from RC-car to Simulator
  4.2 Parameter Tuning
    4.2.1 Tuning Channel Coding Range for Features
    4.2.2 Model Tuning
  4.3 Transition to CARLA 0.9.8
    4.3.1 Experiment Setup
  4.4 Evaluation Metrics
    4.4.1 Mean Deviation from Auto-pilot Route, µ∆
    4.4.2 Average Sequence Length Difference, ASLD
    4.4.3 Minimum Relative Distance to Goal, MRDG
    4.4.4 Metric Interpretation
  4.5 Online Training
    4.5.1 Auto-pilot Baseline
    4.5.2 Noisy Learning
    4.5.3 Guided Learning
    4.5.4 Siphon Learning

5 Results
  5.1 Parameter Tuning
    5.1.1 Auto-pilot PID Regulation
  5.2 Online Training
    5.2.1 Noisy Learning
    5.2.2 Guided Learning
    5.2.3 Siphon Learning

6 Analysis
  6.1 Parameter Tuning
    6.1.1 Channel Coding Range for GIST Features
    6.1.2 Training Parameters
    6.1.3 Auto-pilot PID Regulation
  6.2 Online Training
    6.2.1 Auto-pilot
    6.2.2 Noisy Learning
    6.2.3 Guided Learning
    6.2.4 Siphon Learning
  6.3 Method
  6.4 Ethical and Social Implications

7 Conclusions
  7.1 Future Development

A Experiment Setup

B.1 Tuning
B.2 Training
  B.2.1 Auto-pilot
  B.2.2 Uniform Noise U(−0.1, 0.1)
  B.2.3 Uniform Noise U(−0.25, 0.25)
  B.2.4 Gaussian Noise N(0, 0.01/3)
  B.2.5 Guided Learning, Logistic Blending
  B.2.6 Guided Learning, Equal Blending
  B.2.7 Siphon Learning, Thresholds 0.01 – 0.15
  B.2.8 Siphon Learning, Thresholds 0.01 – 0.40
  B.2.9 Siphon Learning, Thresholds 0.05 – 0.15
  B.2.10 Siphon Learning, Thresholds 0.05 – 0.40

List of Figures

3.1 Hebbian learning — Toy example
3.2 Channel encoding visualisation
3.3 qHebb flow diagram
3.4 CARLA — RGB frame
3.5 CARLA — Post-processing cameras
3.6 CARLA — qHebb perception image
4.1 Images from the offline parameter tuning dataset
4.2 GIST feature visualisation
4.3 CARLA — Used towns
4.4 Metrics — Examples
4.5 PDFs for the noises applied to the auto-pilot
4.6 Steering control with added noise
4.7 Plot of the logistic blending function used for guided learning
4.8 Illustration of how to conceptualise siphon learning
5.1 Top two configuration controls
5.2 PID regulation comparison
5.3 Town 2 map for U(−0.25, 0.25) trained on town 4
5.4 Town 7 map for U(−0.25, 0.25) trained on town 4
5.5 Steering control for U(−0.25, 0.25) trained on town 4
5.6 Town 2 map for U(−0.25, 0.25) trained on 4×town 1
5.7 Town 7 map for U(−0.25, 0.25) trained on 4×town 1
5.8 Steering control for U(−0.25, 0.25) trained on 4×town 1
5.9 Town 2 map for N(0, 0.25/12) trained on town 1
5.10 Town 7 map for N(0, 0.25/12) trained on town 1
5.11 Steering control for N(0, 0.25/12) trained on town 1
5.12 Town 2 map for logistic blending trained on town 1
5.13 Town 7 map for logistic blending trained on town 1
5.14 Town 2 map for equal blending trained on 4×town 1
5.15 Town 7 map for equal blending trained on 4×town 1
5.16 Town 2 route, siphon learning trained on town 4
5.17 Town 7 route, siphon learning trained on town 4
5.18 Steering control for siphon learning trained on town 4
5.19 Town 2 route, siphon learning trained on 4×town 1
5.20 Town 7 route, siphon learning trained on 4×town 1
5.21 Steering control for siphon learning trained on 4×town 1
6.1 Different driving lines in a turn
6.2 Corner handling before and after PID regulation
6.3 Determining steering from a still image
6.4 False positive — MRDG
6.5 Example of how the route is affected by noise
6.6 Example of smoother predicted control
6.7 Taken route with small oscillations
6.8 Corner handling before and after PID regulation
6.9 Driving too far in town 2
6.10 Driving too far in town 7
6.11 Model driving in a loop during testing
A.1 Segments in town 1, labelled in the order of training
A.2 Segments in town 2
A.3 Segments in town 4, labelled in the order of training
A.4 Segments in map 7
B.1 GIST feature visualisation — cloudy-wet dataset
B.2 Steering control for top configurations when tuning
B.3 Steering control for chronological tuning

List of Tables

4.1 Parameters used for tuning
4.2 Notation used in section 4.4
4.3 Metrics — Example evaluation
4.4 Notation used in section 4.5.3
5.1 Distribution of GIST parameter values for the clear weather dataset
5.2 Mean and max value of GIST features
5.3 Top random tuning configurations
5.4 Top chronological tuning configurations
5.5 Completed segments – auto-pilot, training threshold 0.1
5.6 Completed segments – U(−0.1, 0.1), training threshold 0.1
5.7 Completed segments – U(−0.25, 0.25), training threshold 0.1
5.8 Completed segments – N(0, 0.25/12), training threshold 0.1
5.9 Completed segments – guided learning with equal blending
5.10 Completed segments – guided learning with logistic blending
5.11 Completed segments – siphon learning, thresholds 0.01–0.15
5.12 Completed segments – siphon learning, thresholds 0.01–0.4
5.13 Completed segments – siphon learning, thresholds 0.05–0.15
5.14 Completed segments – siphon learning, thresholds 0.05–0.4
6.1 Parameters used for online experiments
B.1 Completed segments – auto-pilot, training threshold 0.01
B.2 Town 1 results, training on the auto-pilot
B.3 4×town 1 results, training on the auto-pilot
B.4 Town 4 results, training on the auto-pilot
B.5 Completed segments – U(−0.1, 0.1), training threshold 0.01
B.6 Town 1 results, training on the auto-pilot with U(−0.1, 0.1) noise
B.7 4×town 1 results, training on the auto-pilot with U(−0.1, 0.1) noise
B.8 Town 7 results, training on the auto-pilot with U(−0.1, 0.1) noise
B.9 Completed segments – U(−0.25, 0.25), training threshold 0.01
B.10 Town 1 results, training on the auto-pilot with U(−0.25, 0.25) noise
B.11 4×town 1 results, training on the auto-pilot with U(−0.25, 0.25) noise
B.12 Town 7 results, training on the auto-pilot with U(−0.25, 0.25) noise
B.13 Completed segments – N(0, 0.25/12), training threshold 0.01
B.14 Town 1 results, training on the auto-pilot with N(0, 0.01/3) noise
B.15 4×town 1 results, training on the auto-pilot with N(0, 0.01/3) noise
B.16 Town 7 results, training on the auto-pilot with N(0, 0.01/3) noise
B.17 Town 1 results, guided learning with logistic blending
B.18 4×town 1 results, guided learning with logistic blending
B.19 Town 4 results, guided learning with logistic blending
B.20 Town 1 results, guided learning with equal blending
B.21 4×town 1 results, guided learning with equal blending
B.22 Town 4 results, guided learning with equal blending
B.23 Town 1 results, siphon learning with thresholds 0.01 and 0.15
B.24 4×town 1 results, siphon learning with thresholds 0.01 and 0.15
B.25 Town 4 results, siphon learning with thresholds 0.01 and 0.15
B.26 Town 1 results, siphon learning with thresholds 0.01 and 0.40
B.27 4×town 1 results, siphon learning with thresholds 0.01 and 0.40
B.28 Town 4 results, siphon learning with thresholds 0.01 and 0.40
B.29 Town 1 results, siphon learning with thresholds 0.05 and 0.15
B.30 4×town 1 results, siphon learning with thresholds 0.05 and 0.15
B.31 Town 4 results, siphon learning with thresholds 0.05 and 0.15
B.32 Town 1 results, siphon learning with thresholds 0.05 and 0.40
B.33 4×town 1 results, siphon learning with thresholds 0.05 and 0.40


Notation

Sets and Functions

ℝ          The set of real numbers
|·|        Absolute value
‖·‖        L2-norm
d(x, y)    Euclidean distance between x and y, i.e., ‖y − x‖
diff(x)    Difference between adjacent elements in the vector x = [x1, x2, ..., xn]^T, i.e., [x2 − x1, x3 − x2, ..., xn − x(n−1)]^T
U(a, b)    Uniformly distributed noise in the range [a, b]
N(µ, σ²)   Gaussian noise of mean µ and variance σ²

Abbreviations

CNN        Convolutional neural network
IMU        Inertial measurement unit
PDF        Probability density function
PID        Proportional, integral, differential (regulator)
RC-car     Remote-controlled car


1 Introduction

In this chapter, the motivation and goals of the work are presented. The chapters Related Work and Theory present the background needed for this thesis. The method used and the experiments performed are described in the chapter Method, while the chapter Results presents the obtained results. The results are then analysed and discussed in the following chapter, Analysis, upon which the chapter Conclusions reflects, also providing suggestions for future work.

1.1 Motivation

The field of autonomous driving has grown vastly in recent years, with a large part of the automotive industry invested, and development is projected to keep expanding in the near future [3]. While fully autonomous vehicles that require no human interaction are the end goal, this is not projected to be feasible for several decades to come. This depends not only on scientific and computational advancements but also on much slower factors, such as changes to infrastructure and legislation. Due to this, the topic of autonomous driving is often divided into several increasing levels of automation, each level requiring less interaction by the driver, with level 0 using no automation whatsoever and the end goal, level 5, requiring no human interaction in any situation [52].

At the time of writing, many newer car models are equipped with level 3 functionality such as cruise control and lane-keeping [59]. Level 4, where the driving can be fully passive except in some unusual cases, is already a reality within some confined areas, such as shuttle buses at airports. Automakers are closing in on a full-scale level 4 system, with the Tesla auto-pilot perhaps being the most notable implementation. This still comes with the caveat that the environmental conditions must match the model assumptions sufficiently well; in most cases, however, they do not. Current level 4 systems are not optimal when lane markings are gone or when driving on anything other than tarmac [2]. Even in otherwise optimal conditions, muddy or slushy roads may be a hindrance.

To handle these situations, a static and fully trained autonomous car would need to be at the level of a human driver in generalising the act of driving. Advances are currently being made in the field of deep learning to approach a universal model able to work in any condition [24]. Such a model would demand a huge amount of training and testing data, containing a large variety of weather conditions, traffic situations, and road types. Collecting such data is a big focus of many substantial actors in the industry [59]. Another possibility, requiring less data and training, is an adaptive system that is able to learn on the fly, i.e., an online learning model. Such a system would also have to be trained while driving, so while not as hands-off as a static model, it would have the advantage of being able to learn to drive on roads of any type.

One implementation of such an adaptive system is the qHebb model by Öfjäll [43], which will be the online learning method used throughout this thesis. The model, explained in detail in chapter 3, could in loose terms be compared to a big look-up table, where visual features from a camera mounted on a car correspond to learned commands to perform in the given situation. In previous work, the qHebb model has been tested on an RC-car; it has now been modified to function in the CARLA driving simulator, with the opportunity for more substantial and varied testing.

1.2 Goals

With the qHebb model running in CARLA, initial testing with the built-in auto-pilot has highlighted a key ingredient that must be present in the initial training of the model. As described above, the qHebb model can trivially be compared to a big look-up table. Due to the monotonicity of the auto-pilot, the table would not be sufficiently populated. It therefore appears that a larger variance in the training data is needed.

The goal of this work is therefore to find a more suitable structure for this modified or expanded training data. As the final product is an online learning model, the data should sufficiently populate the learning space under those circumstances.

1.3 Problem Formulation

Based on the goal, the following problem formulation was chosen and is to be answered in this thesis:

1. What concepts for online learning of autonomous driving are useful for exploring the training data beyond a given trajectory?

1.4 Limitations

The following limitations are applied in this work:

• The qHebb model by Kristoffer Öfjäll [42] will be the sole online learning model used in this thesis, as this model had been mostly integrated into a driving simulator prior to the beginning of this work. Adding further models would be well beyond the scope and time budget of the thesis, and due to the low number of such models, compatibility and licensing might also have been a hindrance.

• Only steering is used as the control signal to be estimated. To predict throttle control, some temporal information or a speed sensor would have to be included in the qHebb model, which is beyond the scope of this thesis.

• The model will be trained in a very simple environment. As the focus is on lane-keeping, no traffic junctions will be used. Only straights and turns, right and left at various angles, will be included in the training and test data. In conjunction, no other actors, such as pedestrians or other vehicles, will be present.

• Other than the controls, the only input to the system was visual input from one camera placed on the hood of the car. The main focus of this work was to determine steering based on visual data. Other important parts of an autonomous vehicle, such as collision detection and traffic logic, were deemed out of scope, but could — and would — in practice be handled by systems other than the qHebb model.


2 Related Work

Applying autonomous control to a vehicle has been the topic of numerous works over the years. This chapter will present some of the methods used, highlighting different types of learning as well as some of their strengths and weaknesses. The majority of systems for autonomous driving use a combination of other sensors in conjunction with a camera, e.g., a laser range finder, IMU, or GPS [12] [64] [62]. The number of possible sensor combinations is vast, and all have the purpose of gathering as much information as possible to make the most accurate prediction.

However, for the remainder of this chapter, the focus will be on systems whose only source of information is a visual image captured by a camera or a stereo rig. One should note, though, that a vision-only system can still theoretically achieve fully functioning steering, as the task is achievable by a human driver. Even after removing all information from other senses, such as hearing other traffic and feeling the inertia of the car — fully doable in a simulator — a human is still able to drive perfectly.

Sections 2.1 and 2.2 will present two different ways of managing the incoming data, with no or some high-level abstraction, respectively. Section 2.3 will present work on real-time models similar to the qHebb model, whose previous work will be detailed in section 2.4. A more in-depth technical description of the qHebb model will be presented in chapter 3.

2.1 Behaviour Reflex Approaches

The behaviour reflex approach to autonomous driving is perhaps conceptually the simplest one to grasp [14]. The system is directly trained and evaluated by linking the inputted image, or perception, to the outputted command, or action, with no, or only a small amount of, pre-processing of the image. The model architecture and learning method may vary greatly between approaches, but the central trait is that no scene description or abstraction is performed; such models are instead detailed in section 2.2.

One of the most notable and earliest works using a behaviour reflex approach is LeCun’s DAVE [35]. Here, a 6-layer convolutional neural network (CNN) was trained on depth images — disparity maps estimated from a stereo camera rig — and their corresponding steering angles, captured from a manually driven RC-car off-road. To train this supervised system, end-to-end learning with back-propagation was performed. DAVE was able to drive autonomously, but it was shown that less than 40 % of the predicted angles were correct, as in either left, right, or forward, when compared to a pre-recorded test set. LeCun notes that this is due to there existing multiple correct steering angles for a given image, especially off-road.

A more recent example of a system using a behaviour reflex approach is the work by Koutník et al. [31]. In this reinforcement learning model, neuroevolution was used to train a recurrent neural network (RNN) to predict the steering and throttle controls of a vehicle using images from the TORCS simulator. Further development of this model also incorporated unsupervised learning to compress the inputted image into a lower-dimensional trained feature vector [32]. Other pre-processing methods applied to the image before training include, for instance, optical flow [54].

2.2 Scene Abstraction Approaches

A behaviour reflex approach may, however, despite the various configurations of learning methods and model architectures, have difficulty producing the correct action in certain situations where a higher understanding of the scene, similar to Zhang et al. [65], could be of assistance, or even be fully needed. This is especially applicable in complex environments, e.g., with surrounding traffic or local rules such as traffic lights or prohibited-turn signs. Due to this, scene abstraction is currently the most commonly used approach for industrial and commercial applications [24].

Common features to abstract from a scene are, for instance, lane markings and bounding boxes for other vehicles and pedestrians. The rise of deep learning has shifted the field of research from more classical to newer methods for detecting these features, with the hope that they will generalise better [24] [28]. Some models go as far as fully mapping their surroundings [14] [33], i.e., creating a 3D environment from which to extract the wanted actions [51].

As these types of systems strive to describe their surroundings as accurately as needed, certainly to a greater extent than those using a behaviour reflex approach, relatively few exist that rely solely on vision. Some models of this kind are the works by Kim [30], where particle clusters derived from an image are used to detect other vehicles, and Sotelo [57], which uses the colour of each pixel to extract the location of the road. Instead, most works also combine the camera with IMU data [36] or a laser range scanner, as in VITS [60].

DeepDriving [9] is a similar method where, rather than modelling a scene or drawing bounding boxes around the surroundings, simple affordance indicators are determined. These indicators include the angle of the car relative to the direction of the road, the distance to the lane markings, and the distance to the end of the road. By shrinking the perception vector from an image into a smaller set of values with more refined information, a trained CNN was able to traverse both real and simulated environments successfully.

Many of the methods using this approach are sensitive to changes in environmental conditions [14]. Shadows on the ground cast by trees and lamp posts; the texture, colour, and shape of tarmac and line markings; as well as rain, snow, or mud can cause problems when extracting the features. As such, a varied and vast dataset is needed to be able to accurately determine corrective actions for a car on the road.

2.3 On-the-fly Models

The precondition of a vast and varied dataset is not confined to the higher-level abstraction approach; the same also applies to all models trained offline, i.e., on a pre-recorded dataset. A model trained end-to-end offline has no possibility to adapt to new conditions, and as such an enormous amount of data must be collected to obtain a model that generalises.

However, instead of depending on a sufficiently varied and large dataset, if the model can train on-the-fly it could learn the characteristics of each environment as it appears. For these types of systems, the behaviour reflex approach is most commonly used. With a real-time method, the computational resources for each training step are much more limited, and as such simpler, or rather light-weight, models have generally been in use. With today’s advancements in computational power, however, a combination of an on-the-fly model and a higher-level abstraction approach becomes increasingly viable.

Pomerleau’s ALVINN [49] is perhaps the most notable on-the-fly system. Here, a car is driven in the Navlab simulator using the actions predicted by a neural network with a single hidden layer. While driving, the current perception — the simulator image — is passed through the system, which is then back-propagated with the applied steering control. Through this approach, ALVINN was able to adapt to changes in the environment by training manually at first and then continuing autonomously.

One noted drawback of ALVINN is the way training works using back-propagation. Since training is done on each frame, the system may, on long straights, overfit to the forward action and thus forget how to turn. This problem is not only a result of the every-frame training, but also of how learning and forgetting in the system are directly connected [49].

2.4 qHebb Model

With the qHebb model, Öfjäll managed to decouple the rates of learning and forgetting. Much like ALVINN, qHebb is a behaviour reflex model that maps an image to a steering angle. But rather than feeding raw pixels into a neural network, qHebb extracts the orientation of small regions all over the image using the GIST feature descriptor and uses these features to index a look-up table, or linkage matrix, to determine the angle. Rather than back-propagation, the training of the linkage matrix is done in a non-linear, quasi-additive fashion as

    C_n = ((1 − γ) C_{n−1}^q + γ D_n^q)^{1/q},

the qHebb learning equation. Chapter 3 will derive this equation and explain in detail the underlying theory behind its terms and the model in general.

Previous experiments with the qHebb model have been done using an RC-car in a real environment [46]. Each experiment has started with a learning-from-demonstration phase, after which the car should be able to operate by itself. For some tests, further short training periods have been added to train the model on additional actions it may have problems with.
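As a rough numeric illustration of the qHebb learning equation above, a single training step could look as follows. This is a minimal sketch, not the thesis implementation; the matrix shapes, the outer-product form of D_n, and the parameter values are assumptions made here for illustration only.

```python
import numpy as np

def qhebb_update(C, x, y, gamma=0.1, q=4):
    """One qHebb step: C_n = ((1 - gamma) * C_{n-1}^q + gamma * D_n^q)^(1/q).

    C     -- non-negative linkage matrix (outputs x inputs)
    x, y  -- input (perception) and output (action) activation vectors
    gamma -- learning rate
    q     -- non-linearity; q = 1 reduces to a plain additive Hebbian update
    """
    D = np.outer(y, x)  # co-activation of output and input channels (assumed form)
    return ((1.0 - gamma) * C**q + gamma * D**q) ** (1.0 / q)

# Toy usage: three input channels, two output channels.
C = np.zeros((2, 3))
x = np.array([0.0, 1.0, 0.0])  # one active perception channel
y = np.array([1.0, 0.0])       # the associated action channel
C = qhebb_update(C, x, y)
```

With q > 1 the non-linearity lets strong existing associations persist under small decay, which is one way to read the decoupling of learning and forgetting.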

One test examined offline learning of steering [43]. Here, data was collected whilst driving on a simple track, as well as corrective data where the vehicle was placed at the side of the road and steered in towards the main track. The results show that the car had some problems with accuracy, but the overall direction of steering was learnt.

Another experiment tested unimodal steering and throttle. The RC-car was trained using manual control on a closed-circuit loop consisting of four identical corners and straights. After one and a half laps, control was handed to the qHebb model, and the car was able to continue the path autonomously, in most cases for at least three more laps. In one case, manual control had to be invoked to steer the car back to the correct path, but after that additional training the car followed the road.


3 Theory

This chapter will detail the underlying theory for this thesis in general and the qHebb learning model in particular. The structure of the chapter follows, in a loose sense, a chronological timeline of the topics, as well as the structure of the experiment setup, from Hebbian learning and channel representations at the core of the model to the CARLA simulator at the top level.

As such, the chapter will start by detailing learning models in section 3.1, followed by a basic description of channel representations and the GIST feature descriptor in section 3.2. Section 3.3 will combine the previous topics into the central concept of this thesis — the qHebb learning model. The last two sections, 3.4 and 3.5, will detail the CARLA simulator and the topic of how to deal with unbalanced datasets, respectively.

3.1 Learning Models

At the core of every autonomous vehicle is a learning model — a concept of how to get the car to perform the correct actions based on its current state and surroundings. For the qHebb model detailed later, two concepts are important to grasp: Hebbian learning and associative learning. These are later combined into associative Hebbian learning. The chosen training method also affects the system; as such, some central terminology will be briefly explained in section 3.1.4.

3.1.1 Hebbian Learning

Hebbian learning stems from a theory by Donald Hebb in 1949 regarding the stimulation of neurons in the human brain. The core principle of the theory is stated in what is commonly called Hebb’s rule:


Let us assume that the persistence or repetition of a reverberatory activity (or "trace") tends to induce lasting cellular changes that add to its stability. ... When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased [26].

Donald Hebb, 1949

In simpler terms, the concept of Hebbian learning says that continuous activation of neurons by a specific input strengthens the bond between them. If the same neuron were to be activated by another input, the bond would weaken. This can be viewed as a way of finding patterns by deduction; for a neural network, this would equal increasing or decreasing the weights of the activated features [43].

A toy example, visualised in figure 3.1, is to observe the connection between symptoms and infection when testing three people for a coronavirus. The subjects in question:

1. Infected, with a fever and a black eye
2. Infected, with a fever
3. Not infected, with a black eye

After testing the first subject, both symptoms are as likely to be caused by the virus. After testing the second person the bond between fever and coronavirus in-fection would strengthen. The third subject, testing negative for the virus, would weaken the relationship between a black eye as a symptom and the virus, due to a black eye showing in both positive and negative cases.

As Öfjäll mentions [42], the Hebbian learning rule should not be applied directly to a variable represented by a single value. Instead, for the learning to work as intended and to replicate the biological setup, multiple neurons should represent the variable. Such a representation is detailed in section 3.2.1.

3.1.2 Associative Learning

Using the notation of Öfjäll [43], the core principle of associative learning can be formulated as the equation

y = Cx,  (3.1)

where C corresponds to the linkage matrix connecting the input activations x to the associated output activations y. Alternative names for x and y are perceptions and actions. As hinted by the name, the linkage matrix links the observed perceptions to the corresponding actions. If linear, (3.1) can be viewed as a single-layer neural network where C functions as the input weights for each output.

There is no de facto way of calculating C; multiple approaches have been presented, including closed-form and incremental evaluation [42] [20]. A traditional


Figure 3.1: Toy example visualising the relation between input and output activations in Hebbian learning. Here a smile or frown symbolises a patient not infected or infected with a coronavirus, respectively. As each person is tested, the connections between the symptoms and disease change depending on new information, highlighted by the red colour.


approach has been to minimize an L2-norm as

(1/N) Σ_{n=1}^{N} ‖y_n − C x_n‖.  (3.2)
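The closed-form alternative is an ordinary linear least-squares problem; a NumPy sketch on synthetic data (illustrative, not the thesis implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))      # rows are perceptions x_n
C_true = rng.normal(size=(2, 4))   # hidden linkage matrix to recover
Y = X @ C_true.T                   # actions y_n = C x_n, stacked as rows

# Minimising (1/N) sum_n ||y_n - C x_n|| over C is a least-squares problem
C_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T

# With noise-free linear data, the linkage matrix is recovered exactly
```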

3.1.3 Associative Hebbian Learning

By combining the Hebbian theory with associative learning, the result is an alternative method for calculating the linkage matrix [42]. This can be presented as

C = Σ_{n=1}^{N} y_n x_n^T.  (3.3)

Iteratively, (3.3) can be written as

C_n = C_{n−1} + y_n x_n^T.  (3.4)

This results in a linking matrix whose values can grow without bound as more terms are added. To avoid this, the qHebb learning model is presented in section 3.3.

3.1.4 Online Learning

Equation (3.4) results in a separable update of the linkage matrix per iteration. Such a system, where each training step may update the linkage matrix — or correspondingly the weights of a neural network — uses online learning. This is different from systems where a full dataset is used for training. The true definition of online learning is not unique throughout the literature; however, two other criteria specified by Öfjäll are that the continuous updates should not grow unboundedly heavy with regards to either computation or storage [42].

For online learning, each training data sample can be used only once, due to the iterative nature, and is afterwards discarded as an individual data point. This is contrary to other systems where the model can re-train using the same sample repeatedly. The advantage of an online system is that it can be highly adaptive to the training data and, in many cases, computationally light-weight, and therefore suitable for real-time systems [56].

However, online learning does not set requirements on the type of learning to be performed. The three main types of training — supervised [61], unsupervised, and reinforcement [5] [63] learning — can all be used in conjunction with it. For the qHebb model detailed later, both explorative learning, a mix between supervised and reinforcement learning, and learning by demonstration, solely supervised learning, will be used.

3.2 Information Representation

An important question of every project, algorithm, or learning method is how to represent information in the best way depending on its conditions. For many


applications, the data sent as input to a system can often contain redundant information or be packaged in a non-optimal way. Here, two different ways of representing data will be presented: first the basis of channel representations as a different way to represent a value, and later the GIST feature descriptor to extract orientation information from an image.

3.2.1 Channel Representation

Like the Hebbian theory detailed in section 3.1.1, the channel representation is also inspired by the way the human brain processes data [20]. The channel representation, or the closely related population coding [22] [1], may at a glance seem similar to a histogram, as both are divided into a fixed number of bins or channels. However, for channel representations, contrary to histograms, the value of each channel does not represent the number of data points corresponding to it; instead, all the channels combined function as a way to store one or multiple values.

A way to visualise it is as K different microphones placed at regular intervals in one dimension, i.e., on a line. Imagine that a sound is played at any point between two adjacent microphones, a sound that attenuates fast towards zero, so that only microphones within the distance w/2 from the source can pick it up. By measuring the loudness at each microphone and knowing how the sound attenuates, one can calculate where the sound was originally emitted with high precision. Even if two sounds were played at the same time, at any positions, the same applies: both positions can be calculated, if the original intensity is known.

The vector containing the microphone recordings, from which it was possible to determine the sound origins, is called the channel vector. The transformation from the sound position to the channel vector can be seen as the channel encoding, and vice versa the channel decoding. In the example, the interval at which the microphones are placed is arbitrary and can be translated and scaled to the desired interval. Figure 3.2 shows such a transition from the value ξ = 3.8 to the corresponding channel representation, where K = 6 and w = 2.

In theory, the attenuation function, or rather basis function, would be, in the case of the qHebb learning model detailed in section 3.3, a cos² basis function b(ξ) [42],

b(ξ) = cos²(πξ/w)  if |ξ| < w/2,
b(ξ) = 0           if |ξ| ≥ w/2.   (3.5)

Again, using the notation of Öfjäll, w is the desired width of a channel times 3, i.e., if channels are placed at integer positions, then w = 3. A K-sized channel vector using integer-placed channels would be described as

x = C(ξ) = [b(ξ − 1), b(ξ − 2), . . . , b(ξ − K)]^T,  (3.6)

where C(·) denotes the channel coding operator and C†(·) serves as the channel decoding operator. It is possible to scale the channel vector to represent any range


Figure 3.2: Visualisation of the relation between a single value and its corresponding channel representation. Moving downwards in the figure, channel encoding is shown, where the value ξ = 3.8 transforms into the corresponding channel vector x via a basis function. Transitioning upwards displays the inverse, i.e., decoding.


not only positive integers, by scaling and translating the basis function.

The cos² basis function is invariant to the encoded value and has constant L1 and L2 norms, which in turn allows for accurate and simple encoding and decoding to and from a channel vector representation [20] [42]. The chain of encoding and decoding is, however, not always lossless when more than one value is used, although there exist methods to mitigate some of the loss [42].
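A minimal NumPy sketch of (3.5) and (3.6), assuming integer channel positions 1..K and w = 3. The decoding below is a standard local decoding over three consecutive channels; it recovers a single encoded value exactly, as long as all three active channels lie inside the channel range.

```python
import numpy as np

def b(xi, w=3.0):
    """cos^2 basis function, eq. (3.5)."""
    return np.where(np.abs(xi) < w / 2, np.cos(np.pi * xi / w) ** 2, 0.0)

def encode(xi, K=6):
    """Channel encoding, eq. (3.6), with channels at positions 1..K."""
    return b(xi - np.arange(1, K + 1))

def decode(x):
    """Local decoding: pick the strongest window of three consecutive
    channels and recover the value from their relative activations."""
    K = len(x)
    l = int(np.argmax([x[i:i + 3].sum() for i in range(K - 2)]))
    z = np.sum(x[l:l + 3] * np.exp(2j * np.pi * np.arange(3) / 3))
    return (l + 1) + 3 / (2 * np.pi) * np.angle(z)
```

Encoding ξ = 3.8 with K = 6 activates three consecutive channels, the L1 norm of the channel vector is the constant 3/2, and decoding returns 3.8 again.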

3.2.2 Feature Descriptor - GIST

For many applications, using the raw pixel values of an image is not the best option. For reasons such as redundancy of information, limited computational power, or constraints of algorithms, other representations may be more suitable. Such alternative descriptors of information or interesting features in an image may be a per-pixel object segmentation mask or, popularised with the rise of machine learning, deep and shallow features [58] [39]. For the qHebb model, GIST is used as a feature descriptor.

GIST is based on Gabor filters [47] and condenses the image to a more compact representation by calculating the gradient for each pixel and sorting the gradients, similarly to histograms, into bins depending on their orientation. This is done for multiple regions in the image, as well as at different scales. The GIST feature descriptor preserves orientation information in the image but is evidently a lossy representation.
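A simplified, NumPy-only illustration of the idea — per-cell gradient-orientation histograms rather than the actual Gabor filter bank — is sketched below. With an 8×8 grid and eight orientations this gives 512 features per scale; a configuration with 64 patches, eight orientations, and four scales analogously yields 2048.

```python
import numpy as np

def gist_like(img, grid=8, n_orient=8):
    """GIST-style sketch: gradient magnitudes pooled into orientation
    bins for each cell of a grid x grid partition of the image."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ori = np.mod(np.arctan2(gy, gx), np.pi)            # orientation in [0, pi)
    bins = np.minimum((ori / np.pi * n_orient).astype(int), n_orient - 1)
    hs, ws = img.shape[0] // grid, img.shape[1] // grid
    feat = []
    for i in range(grid):
        for j in range(grid):
            cm = mag[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws]
            cb = bins[i * hs:(i + 1) * hs, j * ws:(j + 1) * ws]
            feat.append(np.bincount(cb.ravel(), weights=cm.ravel(),
                                    minlength=n_orient))
    return np.concatenate(feat)
```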

3.3 qHebb Learning

qHebb learning is a model developed by Öfjäll and Felsberg [43]. At its core, it is an online learning model suited for biased datasets. It is the main concept and the foundation of further work in this thesis. Much of this section is based on Öfjäll's doctoral dissertation [42].

3.3.1 Concept

The qHebb model has many roots in biological systems, combining the Hebbian learning from section 3.1.1 with the channel representation detailed in section 3.2.1. By channel coding the input and output activations, or actions and perceptions, y = C(α) and x = C(ρ), (3.4) becomes

C_n = C_{n−1} + C(α_n) C(ρ_n)^T = C_{n−1} + D_n,  (3.8)

where Öfjäll uses the definition

D_n = C(α_n) C(ρ_n)^T.  (3.9)

Still, (3.8) is just an alternative representation of Hebbian associative learning and has the same problem mentioned in section 3.1.3, where a possible explosion of values may happen as n tends to infinity. To mitigate this problem and move


towards the qHebb learning equation, a learning factor, γ, is introduced in (3.8) as

C_n = (1 − γ) C_{n−1} + γ D_n.  (3.10)

This removes the possibility of exploding values, as D, the outer product of the channel coded actions and perceptions, is of limited range.

In (3.10) the learning factor, γ, of the current update sample represented by D is directly linked to the forgetting factor, (1 − γ), of the so far learnt linking matrix C. Hence, if γ is increased, the new sample is weighed higher and the previously learnt samples are weighed lower, i.e., forgotten faster.

As a solution to this problem, Öfjäll introduced a q-term in (3.10) as

C_n = ((1 − γ) C_{n−1}^q + γ D_n^q)^{1/q}  (3.11)

to get the qHebb learning equation. By adding this q-term, the learning and forgetting could be de-coupled.

Using a q-value of 1 would return the Hebbian associative learning model where all data points have the same weight, while increasing the q-value would give a higher effect to data with low representation in the dataset. A q-value below 1 can reduce the effect of outliers.
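Elementwise, (3.11) is a weighted power mean of the old linkage matrix and the new update; a NumPy sketch (illustrative, not the thesis C/C++ implementation):

```python
import numpy as np

def qhebb_update(C, D, gamma=0.05, q=5.0):
    """One qHebb step, eq. (3.11); powers are taken elementwise."""
    return ((1 - gamma) * C ** q + gamma * D ** q) ** (1 / q)
```

With q = 1 this reduces to the linear blend in (3.10); with a large q, a single strong activation in D lifts the corresponding entry of C far above what the linear rule would give, which is what raises the influence of rare samples.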

3.3.2 Implementation Overview

One implementation of the qHebb learning model is as a steering predictor of a car, like the one described by Öfjäll [43]. Using (3.9), the sole perception, in this case, is a visual camera placed on the hood of the car, while the corresponding action is the desired steering. To get an overview of the software implementation of the qHebb model, a stripped-down diagram of the code structure can be seen in figure 3.3. Here only the absolute basic steps are displayed, with two possible outcomes depending on whether the system is training or testing.

For the qHebb learning model, the GIST feature descriptor is used, described in section 3.2.2. As the descriptor is dependent on orientations, it is suitable for predicting the steering of a car based on lane markings and road shape, which are highly directional [42]. Here, GIST is configured to partition a 500×350 px image into 64 patches at four different scales. The gradients in each patch are summarised and divided into eight different orientations. This results in 2048 features per image — the perceptions of the qHebb model — which are then channel coded.

To predict the controls based on the GIST features, the relation between the actions and perceptions in (3.1) is used. As seen, the current linkage matrix serves as the connection between them. With both y = C(α) and x = C(ρ) being channel coded, decoding C†(y) must be done to get the resulting value.

If the model is to be trained, the linkage matrix update D detailed in (3.9) is used, with the current features and correct control. For a control signal consisting of only steering, the result of the outer product is here a three-dimensional matrix. The linkage matrix is then updated as in (3.11), as long as the error between the predicted and correct control value is over a defined threshold.


Figure 3.3: Flow diagram of the core steps of the qHebb model implementation.

Worth noting is that no information other than the linkage matrix is stored for later use, as the model has no temporal knowledge. This means that the system has no possibility to filter the result towards a smoother trajectory, as would be possible with, for instance, a Kalman filter. In theory, it is therefore of no significance whether the frames are in chronological order.

The loop in figure 3.3 has been implemented in C and C++ by Öfjäll [44], using channel coding and Hebbian learning models from Felsberg [18] and a GIST feature extractor by Pugeault [16]. Due to the effect of channel coding, the resulting linkage matrix is highly sparse and therefore fits OpenCV's sparse matrix implementation, thus increasing the efficiency of the calculations [43]. As such, the code is able to function at run-time speed, crucial for an online learning system.

3.4 CARLA

Prior to the start of this work, the qHebb implementation detailed above was integrated into the car simulator CARLA to enable a higher load of training and testing without the need for a physical car. Here, brief details regarding CARLA will be presented, as well as other virtual learning implementations and benchmarks.

(34)

3.4.1 Simulator

CARLA is an open-sourced, physics-based car simulator. It was developed by Dosovitskiy et al. in 2017 as a collaboration between the industry and the research community [15]. The simulator is divided into two components: a server, which uses Unreal Engine, and at least one client, defined and programmable by the user, but where predefined examples are available. The simulator is viewed via a freely movable camera. See figure 3.4 for an example of how CARLA may look.

Figure 3.4: A view of the CARLA simulator with an RGB camera.

The simulator is published, at the time of writing, with seven towns, as well as multiple weather presets to choose from. A vehicle is defined as an agent, capable of making its own decisions independently of other actors or clients. There is the ability to use a wide selection of cameras and sensors, such as an optical camera, LIDAR, IMU, and GPS. For cameras, several post-processed images can also be used, for instance depth and object-segmented images, see figure 3.5.

CARLA as a simulator is in active development, with minor releases approximately once every quarter [7]. The latest version, 0.9.8, contains agents useful for both a smooth auto-pilot and a more erratic behaviour agent, controlled by a PID-regulator.

For the qHebb implementation, the sole sensor on the car is a hooded camera viewing the road ahead at a downward angle, as mentioned in section 3.3.2; the resolution of this camera is 500 × 350 pixels. A sample image from this camera can be seen in figure 3.6. The qHebb implementation has support for training on both different auto-pilots and manual control. To aid manual control and get a clearer overview, a camera view such as figure 3.4 and an eye-in-the-sky camera are used, for visualisation purposes only.


(a) Depth image. (b) Segmented image.

Figure 3.5: Showcase of multiple post-processing cameras available in CARLA. (a) shows a depth image, while (b) shows a segmented image of the same scene as figure 3.4.

3.4.2 Benchmark

The majority of the research done using CARLA has involved end-to-end learning, as in supervised deep-learning models where all parameters are trained jointly, as for Sallab [53] and Mehta [40], or various types of reinforcement learning, such as Liang [37]. In these cases, the goal has mostly been to test advanced navigation or the ability to drive in a complex environment, with obstacles such as traffic lights, pedestrians, or other vehicles.

Simpler benchmarks for sole testing of road following have been done previously by Codevilla et al., both for testing imitation learning [10] and when using behaviour cloning to train a model [11]. Both benchmarks were done with CARLA version 0.8.x and, as of now, no similar benchmark in CARLA 0.9.x has been published [6].

3.5 Unbalanced Data

Unbalanced data has grown as a problem with the increase of large datasets, in part due to lower storage costs and a raised demand for deep networks that highly benefit from large amounts of data. One should note, though, that this is far from a new problem. Especially in the area of object detection this problem often prevails — it is more likely that something is not an object than that it is — as the datasets can grow excessively large with many classes. Keeping parity between the number of images in each class can be difficult. However, even a binary classifier can suffer from biased data. In medicine, collecting images for diseases such as breast cancer often results in far fewer positives than negatives [25].

If such data is limited in number, i.e., a highly biased dataset, it is important to have a suitable evaluation metric. Using a dataset with a ratio of 1:100 between the infrequent and common class, a model could achieve a 99 % accuracy by simply labelling every sample as the common class. Metrics such as Precision-Recall curves or Receiver Operating Characteristic curves can better convey the success of a model in such cases.

Figure 3.6: Sample image from the camera setup used in the qHebb implementation in CARLA. This is representative of the image sent to the code.

This problem exists due to the infrequency of the rare samples; however, there exist methods to decrease the ratio [29] [8]. Following are three different methods to achieve this:

• Random oversampling — The thought here is to randomly pick any of these rare samples, disregarding prior usage, to the extent that the ratio at the end is closer to equal [38]. By doing so, one would mitigate the under-representation of the rare class in the dataset, but as a result some samples would be used multiple times — increasing the risk of over-fitting. To lessen the risk, samples could be augmented, for instance by mirroring or rotating [17].

• Down-sizing or random under-sampling — Contrary to oversampling, the idea here is to remove samples at random from the over-represented class. This will result in a loss of information [8].

• Learning by recognition — Here the method is to ignore samples from one of the classes, i.e., skip training on certain samples from a class. This could be done as by Japkowicz [29], who used a multi-layered neural network trained to compress and then reconstruct samples from the over-represented class. If the reconstruction error was small, the sample was deemed to belong to the trained class, otherwise not.
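The first of these methods can be sketched in a few lines; the class lists and seed below are hypothetical, and augmentation is omitted:

```python
import random

def random_oversample(majority, minority, seed=0):
    """Randomly re-draw minority samples, disregarding prior usage,
    until the two classes are balanced."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra
```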

Learning by recognition is in ways similar to what is used in the qHebb model, where, if the model predicts a command sufficiently similar to the correct one, the model is not trained. Moreover, qHebb uses the decoupled learning and forgetting rates to increase the importance of rare training samples [42].

4 Method

This chapter will outline how the experiments in this thesis were constructed and performed. Section 4.1 will describe some limitations in this work compared to Öfjäll's previous work on a physical RC-car [42]. Details on the tuning of the model parameters will be given in section 4.2, while section 4.3 dives deeper into the CARLA and experiment setup. The metrics used for the qHebb training and testing are presented in section 4.4, while the experiments themselves are described in section 4.5.

4.1 Changes from RC-car to Simulator

In [46] the channel coding in the qHebb model was used in a multi-modal sense to represent probability density functions. By doing this, it was possible to extract multiple hypotheses from the channel representation. This was most prevalently used with a set prior, e.g., a turn signal [45], to determine the desired turning direction at a junction. However, for this work the multi-modal approach has been scaled back, and no ambiguous traffic situations will be tested. Another feature mentioned by Öfjäll [42] is a self-feedback loop where the model itself would strengthen or weaken the bond between features and control in the linkage matrix, by predicting the next frame. As noted in section 3.3.2, the current implementation does not have support for such a feature. Instead, this work will solely focus on road following and will not implement anything that results in a temporal dependence. However, an approach similar to self-feedback can be seen in section 4.5.4.


4.2 Parameter Tuning

Before any training could be done using the qHebb model online in CARLA, the various parameters used were tuned. The parameters in question were the following:

• The optimal range [α, β], see (3.7), based on the GIST features.
• The learning rate, γ.
• The q parameter, which affects the forgetting factor depending on the chosen learning rate.
• The number of channels used for steering, K.

Also varied was:

• The number of frames used for training.

Previous experiments using the model had been done with a physical RC-car and camera in vastly different environmental conditions, either in a sterile environment or at night on a forest road [46] [42]. In these experiments, some or all of these parameters were not reported. Due to this, and to accommodate the change of environment, from real to simulated data, it was deemed critical to perform this tuning.

For tuning, a previously collected dataset from CARLA version 0.8.4 was used, which only included simple segments such as straights and 90-degree turns. The segments, totalling 980 frames, were recorded in two different weather conditions, clear and cloudy-wet, at 10 Hz. See figure 4.1 for an image of the same section in the different weather types. This dataset was collected using a camera with very similar placement and intrinsic parameters as the one used in the online version, in a town that has had no major changes in later versions. As such, the dataset was suitable for tuning, particularly as the updated CARLA was not yet ready at this early stage.

4.2.1 Tuning Channel Coding Range for Features

To choose a suitable range for the channel coding of the GIST features, the features from all frames in the dataset were calculated. As in previous work, the number of channels was consistently kept at seven for the feature encoding [42]. Only the channel coding range was affected by this tuning. In figure 4.2, the features from the clear weather dataset are shown next to the steering command for the corresponding frame. The intensity of each pixel represents the related feature's activation. From the figure one can observe a clear pattern in how the features are structured, both across the frames and across the features. This is due to how the GIST feature descriptor is implemented. The distribution of features was also determined.


(a) Sample image from the clear dataset. (b) Sample image from the cloudy-wet dataset.

Figure 4.1: Images from the dataset used for the offline tuning of parameters.

4.2.2 Model Tuning

With the range for the channel coding of the features determined, the rest of the parameters were to be tuned. The same offline dataset as before was used, however only the clear weather part. Due to the dataset being of a smaller size, it was partially used both for training and testing. The training was done on a subset consisting of turns in both directions, totalling 490 frames. For testing, the complete dataset was used. While this could potentially increase the risk of over-fitting, such an outcome would not affect this experiment at large. The purpose here was to observe whether the chosen parameters could result in a learnt model at all, i.e., to test whether the model could train, rather than how well it trained.

The tested learning rates were chosen based on previous driving and tracking experiments with the qHebb model, as well as code from Öfjäll [42]. To get a wide range of values, an increment of a factor of 2 was used per step.

As Öfjäll mentions, setting a higher value of the q-parameter mitigates problems due to biased training to a certain degree. Therefore, the q-parameter was deliberately kept at a value of at least three [46].

For the number of channels used for encoding the steering value, an uneven number was used. This was done to be consistent with Öfjäll's experiments, as the centre of the middle channel would then represent a steering signal of 0, i.e., driving straight forward.

The last variable altered during the tuning was the number of times the model trained on the dataset. Up to 16 epochs of the training set, close to 8000 frames, were used. Again, this would enable over-fitting, but as mentioned it would not affect the purpose of the experiment.

For the model tuning, all 525 possible combinations of the parameters shown in table 4.1 were trained and tested.
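Enumerating such a full grid can be sketched with itertools.product; the value lists below follow table 4.1 and multiply to 525 configurations (the parameter names are illustrative):

```python
from itertools import product

grid = {
    "learning_rate": [0.025, 0.05, 0.1, 0.2, 0.4],
    "q": [3, 4, 5, 6, 7, 8, 9],
    "steering_channels": [5, 7, 9],
    "epochs": [1, 2, 4, 8, 16],
}

# 5 * 7 * 3 * 5 = 525 configurations
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
```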

Still, to decrease the risk of over-fitting and to help the system in general, the training was not done in chronological order; instead, the order of the frames was randomised. This should improve the learning of the system [19]. To validate the


Figure 4.2: Visualisation of all GIST parameters from the clear weather dataset next to the corresponding steering control for the respective frame. For each frame the steering response is displayed to the left and the activation of all GIST parameters to the right, ordered as in the code.


Table 4.1: Parameters tested when tuning the qHebb model on the clear weather dataset. All possible combinations of these values were tested, resulting in 525 configurations.

Learning Rate, γ | q-value | Steering Channels | Epochs
0.025            | 3       | 5                 | 1
0.05             | 4       | 7                 | 2
0.1              | 5       | 9                 | 4
0.2              | 6       |                   | 8
0.4              | 7       |                   | 16
                 | 8       |                   |
                 | 9       |                   |

effect of the randomisation, testing was also done in chronological order.

4.3 Transition to CARLA 0.9.8

A key ingredient to make the thesis possible was a functioning qHebb implementation in CARLA. By the start of this thesis, the qHebb model had been integrated into CARLA version 0.8.4. As the code then ran at a frame rate below 5 Hz, a transition to a newer version began, to enable more and faster training.

Initially, the code was running in a modified version of CARLA 0.9.4, called macad-gym [48]. Macad-gym is a client primarily used for the implementation of reinforcement learning models. While that kind of feature was out of scope for this thesis, it was the version of CARLA used at CVL. As such, there were several experts with relevant experience to address any questions and concerns.

However, many critical functions required for the thesis were absent, such as an easy multi-camera setup, simple location extraction for constructing routes, and, most importantly, an easily accessible auto-pilot. In macad-gym, the auto-pilot was located on the server side, with the corresponding commands not available until after the execution. While these obstacles could be overcome with some effort, it was deemed that there was more to gain by transitioning to an unmodified CARLA setup, which would have all these features.

Using CARLA version 0.9.8, the provided auto-pilot client was modified to also handle manual control, and the qHebb model was later integrated into it. In this updated system, a frame rate varying around 20 Hz was achieved, which was a big improvement over version 0.8.4. This was also an important criterion for the new physics engine to function optimally.


(a) Town 1. (b) Town 2. (c) Town 4. (d) Town 7.

Figure 4.3: Birds-eye view of the towns in CARLA used for training, (a) and (c), and testing, (b) and (d).

4.3.1 Experiment Setup

For the experiments, simple segments were collected for some of the maps in CARLA. See figure 4.3 for an overview of the maps. Appendix A goes into more detail regarding the sections used in each map and the setup at large. In brief, town 1 and town 2 are of similar characteristics, with only sharp 90-degree turns on two-lane streets in an urban environment, while town 4 and town 7 have a larger variety of turns and numbers of lanes in a much more rural setting. In the following experiments, training was performed in town 1 and town 4, respectively, while testing was done in both town 2 and town 7. As online learning is by itself a non-reproducible method, precautions were taken in the design and ordering of the segments to achieve a well-defined and varied training sequence, all to ensure that the results could be comparable to one another.

Speed was kept rather low, at a constant 20 km/h, to avoid ragged steering due to having to take a sharp turn at high speed. Still, as can be seen in figure 4.2, the steering control of the auto-pilot is far from smooth in a turn. Note that this is


Table 4.2: Notation used in section 4.4.

Notation | Meaning
S        | The number of sequences in an experiment
N_s      | The number of data points, frames, collected in sequence s
N        | The total number of data points in an experiment, N = Σ_{s=1}^{S} N_s
p_{s,n}  | Predicted (x, y)-position of the car at data point n in sequence s
p_s      | Vector of the predicted route in sequence s, [p_{s,1}, p_{s,2}, . . . , p_{s,N_s}]
a_{s,n}  | As p_{s,n}, but for the auto-pilot control
a_s      | As p_s, but for the auto-pilot control

independent of the version of CARLA. The auto-pilot has a tendency to perform several short bursts of near-max steering instead of a smooth response. Initial training showed that this could be a problem for learning, as will be discussed in section 6.1.3. As the model only links a visual representation of the view to a steering command, big fluctuations in the steering signal could result in the model having trouble learning the appropriate control. To dampen this effect, the PID-controller of the auto-pilot was tuned as well, by changing parameters in a structured fashion until an acceptable performance had been reached.
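For reference, a generic discrete PID step is sketched below; the gains are illustrative, not CARLA's defaults, and the actual controller in CARLA's agent code differs in its details.

```python
class PID:
    """Minimal discrete PID controller: u = kp*e + ki*integral(e) + kd*de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv
```

Lowering the proportional gain and raising the derivative gain is the typical way to trade bursty near-max steering for a smoother response.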

4.4 Evaluation Metrics

Evaluating a trained model, and the way of doing it, is in many ways the most important step of the process, as the integrity of the work relies on it being done in a correct and fair manner. But measuring performance can be far from trivial depending on the task at hand, especially when the task is not an easily quantifiable question such as What is the top speed of a human?, but rather an abstract one such as How well does this autonomous car perform?

Depending on the application, the desired traits may vary, e.g., driving as safely as possible may be preferable over taking the shortest route. Overall, the driving performance is in many cases best visualised in a plot of the route, but for the amount of experiments to be done, an alternative well-defined metric is needed to present the data on paper and filter the good from the bad results.

For the qHebb implementation, the chosen metrics will be explained in sections 4.4.1 - 4.4.3, which together try to convey the same information as a plot of the taken route. For some configurations, though, the taken route will be presented, as well as other data, such as the steering control from the auto-pilot and the qHebb model. An aggregated result of the number of completed segments will also be shown for each training method.

4.4.1 Mean Deviation from Auto-pilot Route, µ∆

One of the metrics observed is the mean deviation from the auto-pilot route, µ∆. This measures, for every recorded frame, the smallest distance between the current location of the car and the route taken by the auto-pilot on the corresponding segment, and averages these distances. Using the notation in table 4.2,

\mu_\Delta = \frac{1}{N} \sum_{s=1}^{S} \sum_{n=1}^{N_s} \min d(p_{s,n}, a_s) . \quad (4.1)

A low µ∆ is desirable, as the model then closely follows the auto-pilot route. However, one should note that this metric does not take the distance traversed into consideration.
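As a sketch of how the metric can be computed, the function below assumes each route is stored as an array of 2-D positions and that d is the Euclidean distance; neither representation is stated explicitly in the text, so this is an interpretation.

```python
import numpy as np

def mean_deviation(model_routes, autopilot_routes):
    """Mean deviation from the auto-pilot route, eq. (4.1).

    model_routes[s] and autopilot_routes[s] are (n_frames, 2) arrays
    of positions for segment s (assumed representation).
    """
    total, n_frames = 0.0, 0
    for p_s, a_s in zip(model_routes, autopilot_routes):
        for p in p_s:
            # Smallest distance from this frame's position to any point
            # on the auto-pilot route of the same segment.
            total += np.linalg.norm(a_s - p, axis=1).min()
            n_frames += 1
    return total / n_frames
```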

4.4.2 Average Sequence Length Difference, ASLD

Average sequence length difference, ASLD, is another of the used metrics. For each sequence, it compares the traversed distance of the qHebb model with that of the auto-pilot,

\text{ASLD} = \frac{1}{S} \sum_{s=1}^{S} \frac{\lVert \mathrm{diff}(p_s) \rVert}{\lVert \mathrm{diff}(a_s) \rVert} , \quad (4.2)

using the notation in table 4.2.

As the length of each sequence varies, the average is determined using the ratio of the two distances. It should be noted that this metric could have a value over 1, as the distance travelled by the model may be larger than the respective distance of the auto-pilot.
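Under the same assumed route representation as above (arrays of 2-D positions), ASLD can be sketched as follows, where the traversed distance of a route is taken as the summed length of its frame-to-frame displacements.

```python
import numpy as np

def asld(model_routes, autopilot_routes):
    """Average sequence length difference, eq. (4.2).

    Each route is an (n, 2) array of 2-D positions (assumed
    representation); diff() gives frame-to-frame displacements.
    """
    ratios = []
    for p_s, a_s in zip(model_routes, autopilot_routes):
        len_p = np.linalg.norm(np.diff(p_s, axis=0), axis=1).sum()
        len_a = np.linalg.norm(np.diff(a_s, axis=0), axis=1).sum()
        ratios.append(len_p / len_a)  # may exceed 1, as noted above
    return float(np.mean(ratios))
```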

4.4.3 Minimum Relative Distance to Goal, MRDG

The final metric used is the minimum relative distance to goal. For this metric, the minimum orthogonal distance between the car's trajectory and the final position of the auto-pilot is determined for each segment. The smallest one, relative to the overall length of the segment, is defined as the MRDG. The notation in table 4.2 gives

\text{MRDG} = \min_{s} \left( \frac{\min d(p_s, a_{\text{end},s})}{\lVert \mathrm{diff}(a_s) \rVert} \right) , \quad s \in [1, S] . \quad (4.3)

The reason for not only comparing the last position of the predicted route by the qHebb model is that a small lateral offset could result in the car moving further along the road than the auto-pilot. In that case the minimum distance to the auto-pilot endpoint would not necessarily be the endpoint of the qHebb model. The closer MRDG is to zero the better and would show that the car has done well in at least one segment.
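A sketch of the computation, under the same assumptions as for the previous metrics (routes as arrays of 2-D positions, Euclidean distance):

```python
import numpy as np

def mrdg(model_routes, autopilot_routes):
    """Minimum relative distance to goal, eq. (4.3).

    For each segment, the smallest distance from any point on the
    model's trajectory to the auto-pilot's end position is divided
    by the segment length; the minimum over all segments is returned.
    """
    scores = []
    for p_s, a_s in zip(model_routes, autopilot_routes):
        goal = a_s[-1]  # a_end,s: the auto-pilot's final position
        seg_len = np.linalg.norm(np.diff(a_s, axis=0), axis=1).sum()
        scores.append(np.linalg.norm(p_s - goal, axis=1).min() / seg_len)
    return float(min(scores))
```

Note that, as the text explains, the minimum is taken over the whole trajectory rather than only its endpoint, so a model that overshoots the goal laterally is not penalised for travelling further than the auto-pilot.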

4.4.4 Metric Interpretation

To visualise the situations where the different metrics perform well or not, table 4.3 shows the outcome of the cases in figure 4.4. Looking only at µ∆, it is clear that the configurations producing the largest numbers can be discarded independently of the other metrics. But in this example c) would result in the lowest score, despite crashing early. The same applies to ASLD and MRDG, where the optimal values, 1 and 0 respectively, do not necessarily equal a well-trained system. It is apparent that all metrics must be taken into consideration, but that an individually good result can highlight when a potentially good configuration appears.

Figure 4.4: Example of different routes used to evaluate the metrics.

Table 4.3: Evaluation of the routes shown in figure 4.4, using the metrics in section 4.4

       µ∆       ASLD    MRDG
  a)   High     1       1
  b)   Low      < 1     0.5
  c)   Medium   < 1     0.5
  d)   High     > 1     0
  e)   Medium   < 1     0

4.5 Online Training

As tuning did not result in an unambiguous parameter configuration, the new setup of different parameter configurations, see table 6.1, was trained and tested
