DEGREE PROJECT IN ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Adaptive Steering Behaviour for Heavy Duty Vehicles

TOM ÅFELDT

KTH ROYAL INSTITUTE OF TECHNOLOGY


Adaptive Steering Behaviour for Heavy Duty Vehicles

Tom Åfeldt
Tom.afeldt@outlook.com

September 19, 2017

Master Thesis in Automatic Control, KTH Electrical Engineering

Supervisor at Scania: Peter Sidén, Peter.Siden@Scania.com
Examiner at KTH: Mikael Johansson, Mikaelj@ee.kth.se


Abstract

Today, the majority of driver assistance systems are rule-based control systems that help the driver control the truck. But drivers are looking for something more personal and flexible that can control the truck in a human way, according to their own preferences. Machine learning and artificial intelligence can help achieve this aim. In this study, Artificial Neural Networks are used to model the driver steering behaviour in the Scania Lane Keeping Assist. Based on this, trajectory planning and steering wheel torque response are modelled to fit the driver preference. A model predictive controller can be used to maintain state limitations and to weigh the two modelled driver preferences together. Due to the difficulties in obtaining an internal plant model for the model predictive controller, a variant of a PI-controller is added for integral action instead. The artificial neural network also contains an online learning feature to further customize the fit to the driver preference over time.

Keywords:

Scania, ANN, Artificial Neural Network, MPC, Model Predictive Control, System identification, Vehicle dynamics, Recursive filtering, Human behaviour, Machine learning, Online learning, Sample based processing.


Sammanfattning

Today, rule-based control systems are mostly used for driver assistance systems in trucks. But truck drivers want something more personal and flexible that can steer the truck in a human way, according to the driver's own preferences. Machine learning and artificial intelligence can help achieve this goal. In this study, artificial neural networks are used to model the driver's steering behaviour through the Scania Lane Keeping Assist. Using this, the driver's preferences regarding lane placement and steering wheel torque are modelled. A model predictive controller can be used to constrain states and to weigh the two modelled preferences against each other. Since it was very difficult to derive the internal plant model required for that controller, a variant of a PI-controller is used to steer the truck instead. The artificial neural networks can also be allowed to learn during driving in order to adapt to the driver's preferences over time.

Keywords:

Scania, ANN, Artificial Neural Networks, MPC, Model Predictive Control, System identification, Vehicle dynamics, Recursive filtering, Human behaviour, Machine learning, Online learning, Sample based signal processing.


Acknowledgements

Thanks to my supervisors, Peter Sidén at Scania and Mikael Johansson at KTH (Royal Institute of Technology), for the ideas, suggestions and support that formed this study.


Contents

1 List of Definitions

2 Introduction
2.1 Goals
2.2 Delimitations
2.3 Outline

3 Theory
3.1 Data Processing
3.2 Artificial Neural Networks
3.3 Model Predictive Control

4 Related Work

5 Implementation
5.1 Control Structure
5.2 Data Processing
5.3 Artificial Neural Networks
5.4 Controller Choice

6 Results
6.1 Data Processing
6.2 Artificial Neural Networks
6.3 ANN Performance on Unsorted Data
6.4 PI-Controller

7 Analysis
7.1 Setbacks and Evaluation
7.2 Modelling of the Vehicle Dynamics

8 Future Work

9 Summary


1 List of Definitions

ACC - Adaptive Cruise Control.

ADAS - Advanced Driver Assistance System.

ANN - Artificial Neural Network.

ARE - Algebraic Riccati Equation.

BP - Backpropagation.

BR - Bayesian Regularization.

CNN - Convolutional Neural Network.

Frame - Multiple samples of data form a frame.

LKA - Lane Keeping Assist.

MLP - Multi-Layer Perceptron.

MPC - Model Predictive Control.

MSE - Mean Square Error.

Path - The predicted future placements form the path.

PathNet - ANN that predicts preferred future placements.

PCA - Principal Component Analysis.

PDF - Probability Density Function.

Placement - The placement of the truck relative to the center of the lane.

PP - Peak to Peak.

RBF - Radial Basis Functions.

Standardized data - Data with zero mean and unit variance.

Torque - The driver steering wheel torque.

TorqueNet - ANN that predicts preferred steering wheel torque.

u - Control signal [Nm].

w_ij - The weight of the connection from node i in one layer to node j in the next layer.

x - System state vector.


2 Introduction

Scania trucks today have several Advanced Driver Assistance Systems (ADAS) such as Adaptive Cruise Control (ACC) and Lane Keeping Assist (LKA). The ACC is more than just a conventional cruise control regulating a constant speed. It can, for example, keep a certain distance to vehicles ahead and optimize fuel consumption. The LKA is a system that helps the driver stay in the lane during long highway drives. To do this it uses sensors such as cameras but also map data. The LKA is today, like most of the other ADAS, a rule-based control system that uses, for example, a PID controller or similar to control the vehicle. One problem with rule-based control systems is that they are not very adaptive to the driver style, which can result in major differences between the controller actions and the driver preferences. For example, the ACC will not have the same preview or throttle response as the driver, and the LKA might be too strict or might apply unwanted lane placement for some drivers or in some situations. This would cause the driver to feel uncomfortable using the ADAS. The idea is that every driver has their own style and that the surrounding environment and events play a big part in how the driver acts [1]. In order to overcome the problem of adaptivity, the control system should be adapted to the driver style. By using the LKA as a basis for this study, an adaptive LKA with focus on the driver feeling can be developed.

An adaptive LKA may give a more personalized feeling and a system that acts in a human, more predictable way for the driver. The conceptual goal is to have perfect artificial intelligence, copying the driver's brain and controlling the car with that knowledge and sensor perception. However, since this is not possible, the author resorts to mathematical models. Developing this driver model is also useful for simulation of ADAS, since the assistance systems may change the closed loop dynamics in an unpredictable way when interacting with a driver. The hope when designing this type of system is that the driver will be willing to use the system to a larger extent than before and that he or she will feel more comfortable using it. This is a small step towards autonomous systems and aims to improve the acceptance, trust and user feeling for these types of systems. But the idea of this system could be applied to any type of autonomous system where human users are involved, such as vehicles or machines in general.

To create an adaptive LKA one must first identify the driver style from some collected data, then learn or adapt the controller to this steering behaviour while making sure some state limitations are respected. For example, the steering wheel torque needs to be limited to a safe and comfortable level.

The data in this study was collected from Scania's database, which contains a large amount of real driver data from various drivers, trucks and locations. The data was extracted according to the definition in Section 2.2 and filtered to remove noise. The human steering behaviour was modelled considering two driver aspects: lane placement preference and steering wheel torque response. This was done using Artificial Neural Networks (ANNs).

The ANNs were pre-trained offline, but training can also be applied online during driving for driver adaptation over time. The final part of the control structure is a variant of a PI-controller which adds integral action to remove the identified lane placement error over time. A Model Predictive Controller (MPC) solution, able to take both modelled driver preferences into account while respecting the state limitations, was also investigated. However, the MPC requires an advanced vehicle model, which can be technically difficult to obtain and even more difficult to formulate in a physical way. This is, among other things, due to unknown dynamics, vehicle variations and time-varying parameters.

The contribution of this study is the attempt to model human steering behaviour in a more general scenario than before and to use it in an actual driving environment while applying both offline and online learning to adapt to the human steering behaviour. Another important aspect is the focus on the driver feeling and human behaviour. This is why the steering wheel torque is chosen as control signal instead of the conventional steering wheel angle that is used in many other studies. Using the steering wheel angle to control the vehicle skips one driver aspect and gives no information on how fast the target angle should be reached. This conventional approach is, however, well suited for fully autonomous vehicles, since it is easier to develop physical models with the steering wheel angle as control signal and in that case there is no driver complaining about the driver feeling or comfort of the vehicle. The steering wheel torque that was used in this study is the human response used to control the vehicle's lateral position and motion. The steering wheel torque applied over time results in a steering wheel angle that gives the vehicle a lateral acceleration, which in turn controls the lateral position.

However, modelling human behaviour and formulating a physical model of the vehicle dynamics from steering wheel torque to lateral position is difficult. Extensive research on particular vehicle setups would have to be made just to create a grey-box model, meaning that some constants and parts of the model would still have to be experimentally identified and estimated from data. This is why the author resorts directly to black-box models such as ANNs. The study creates this proposed framework for further studies to build and improve on.

The idea of how the lateral controller should work is illustrated in Figure 1. A pretrained ANN is used to predict lateral position errors, here called placement errors. Future positions, at 0.5 s and 2 s ahead of the vehicle, are predicted using sensor and map data, and the current lateral position is then subtracted to create the predicted lateral position errors. From this information, together with more sensor data, a steering wheel torque is predicted using another pretrained ANN. The predicted steering wheel torque is then used, together with a PI-controller, to control the lateral position of the vehicle and to give a response that corresponds to the human behaviour and steering preference of the current driver. The ANNs can also be enabled to adapt over time to the driver style and behaviour.

Figure 1: This is the goal of the control system and lateral controller. Lane placement errors are predicted for future positions using sensor and map data. From this a steering wheel torque response is estimated.

The key to success with ANNs is to create the fundamental prerequisites so that they can learn and predict the dynamics in a correct way. In order to find the mapping from input signals to output signals one needs enough input signals that carry information about the output signals. That is why many input signals were considered to find correlation with the output signals. The data needs to be extracted and processed carefully before training the ANNs; the amount of data used is also important. Too little, noisy or inconsistent data will not give good results in the end. Noise, outliers and uninteresting modes or frequencies in the data were filtered out. To avoid inconsistency in the data, only interesting sequences of driver behaviours and road environments that matched the considered static highway scenario were extracted. This is one of the key points to obtain good results. The data must be consistent in both road environment and driver behaviour. If too many drivers or road environments are considered at once, it will be difficult to find and learn a pattern and behaviour.

Below follow the project goals of this study and the delimitations, stated in a measurable way. There is also a short presentation of the data sets that were used.

2.1 Goals

• Define a delimited scenario of the study.

• Determine the most important parameters and events that affect the driver steering behaviour: lateral lane position and steering wheel torque response.

• Determine what kind of sensors are needed to measure the interesting events and parameters above.

• Model the driver's lateral lane position and steering wheel torque response using sensors and map data.

• Construct a controller to control the lateral position and steering wheel torque response adapted to the driver while keeping some states within certain boundaries.

• Achieve this in 800 hours of work or less.

2.2 Delimitations

The following assumptions have been made within this study in order to be able to do a consistent analysis. It is assumed that the scenario is somewhat ideal and repeatable with respect to the gathered data, specifically:

• Repeatable summer conditions, no snow, tunnels, strong crosswinds or other things that make the sensors unable to measure the required states.

• That it is possible to find driver models for the available data set.

• The driver is assumed to always be correct and consistent in his steering behaviour, unless he violates fundamental driving rules.

• There must be a static highway scenario, driving in the right-hand lane at more or less constant highway speed (around 90 km/h) without any lane changes.

• If the ADAS should fail, the driver is always responsible to act at any time. Hence, the system is not autonomous.

In practice this means that data needs to be extracted before training the ANNs, requiring some conditions to hold:

• Vehicle speed > 70 km/h

• The driver is not using the direction indicator or the brake.

• The driver keeps her hands on the wheel.

• The lane width is reasonably estimated.

• The truck is positioned inside of the lane.

• The sensor signals are of good quality.

The sequences that fulfill the conditions above form the data set that was used. This data set is called sorted data. The full continuous data set, where no sequences of data were removed, is called unsorted data. The unsorted data set contains several other situations than those that fall within the defined scenario of the study. Two data sets from two different drivers were used. They were from the same truck, on the same type of roads, recorded on the same day and of approximately the same length, sampled at 100 Hz.
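As an illustration only, a minimal sketch of how the scenario conditions above could be applied as a boolean mask over a logged data set (the column names are hypothetical placeholders, not signal names from the thesis, and the thesis implementation was done in Matlab):

```python
import pandas as pd

def extract_sorted_data(log: pd.DataFrame) -> pd.DataFrame:
    """Keep only samples that match the static highway scenario.

    Column names are hypothetical placeholders for the logged signals.
    """
    mask = (
        (log["vehicle_speed_kmh"] > 70.0)
        & ~log["direction_indicator"]
        & ~log["brake_pedal_active"]
        & log["hands_on_wheel"]
        & log["lane_width_ok"]
        & log["inside_lane"]
        & log["sensor_quality_ok"]
    )
    return log[mask]
```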

The first data set was used for investigation, training and validation. The second data set was solely used for evaluation and testing. The full data set used in the investigation of this study was 256 minutes long. When the data had been sorted and the interesting parts extracted, 162 minutes remained.

Because of this extraction the sorted data set will not be fully continuous but piece-wise continuous, with varying length of the concatenated sequences. In Figure 2 the distribution of the removed data is illustrated; the red lines are samples that were removed from the data set. The blue lines show the full range of the full data set.

Since the model is identified from a data set it is important to remember that the model is not a full picture of reality and can only reflect what is seen in the data set. If the driver does not have a consistent behaviour in lateral positioning or steering wheel torque response it will be difficult or impossible to identify the general behaviour.

There will be some things that affect the driver but are difficult or impossible to measure, such as visibility, driver fatigue, road grip, crosswinds or deep ruts. However, a technique for measuring deep ruts in the road is proposed in Section 5.

In this study, open loop simulations were used to test and to implement the system. No new sensor data will arrive no matter how the truck is controlled, since there is no feedback from the real world. Open loop simulations also mean that there is no 'driver in the loop' compensating and interacting with the controller. In reality, the driver behaviour and closed loop dynamics of the system can change when the system interacts with the driver. The driver might change her behaviour when steering wheel torque is applied by the controller. This can also cause the closed loop dynamics to change in an unexpected way. In some cases it is also possible that, in this simulated system, the prediction from the black box identification is actually not a prediction but merely an effect of the driver actions. This could happen if the prediction lags behind the log data. The steering wheel torque prediction could then be an effect of past driver actions such as the steering wheel angle. For example, after the driver applies a steering wheel torque, effects in the steering wheel angle can be seen. If the steering wheel torque prediction lags behind in time it is possible to use this effect to recreate the applied steering wheel torque. This then becomes an identification of the past rather than a prediction of the future. However, as long as there is some margin in time when predicting, this is not a problem. The aim is to predict the current or near future driver steering wheel torque. If the identification is set to match the driver steering wheel torque some time ahead of the driver action, small lags will be acceptable.


Figure 2: An illustration of the amount and distribution of removed data due to sorting the full data set. The red lines are samples that were removed from the data set. The blue lines show the full range of the full data set.


2.3 Outline

The rest of this report is organized as follows: Next there is a theory section that explains all the necessary theory for this study, such as filtering, system identification, ANNs and MPC. After this there is a section on related work and papers that were considered important in this study. The implemented methods and control structure are then explained in greater detail in Section 5 and the simulation results are shown in Section 6. After this there is a discussion about the results and a section with suggestions for future work, followed by a summary in the last section.

3 Theory

Theory for the proposed parts of the system is explained here to give the reader a more complete picture of the algorithms and methods used in the implementation.

3.1 Data Processing

As in [2], recursive filters are used to filter samples as they arrive and to standardize the data online. Discrete IIR filters of order two were used.

Standardizing the data means transforming it to have zero mean and unit variance.

Recursive algorithms for updating mean and variance online can be seen in Equations 1 and 2 respectively. Proofs are found in [3], [4]. The data is then standardized using Equation 3.

\mu_t = \frac{t-1}{t}\,\mu_{t-1} + \frac{1}{t}\,x_t \qquad (1)

\sigma_t^2 = \frac{t-1}{t}\,\sigma_{t-1}^2 + \frac{1}{t-1}\,(x_t - \mu_t)^2 \qquad (2)

x_t = \frac{x_t - \mu_t}{\sigma_t} \qquad (3)
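As an illustration, a minimal sample-by-sample sketch of the recursive updates in Equations 1-3 (written in Python for readability; the thesis implementation was done in Matlab/Simulink):

```python
class OnlineStandardizer:
    """Recursively track mean and variance (Eqs. 1-2) and standardize samples (Eq. 3)."""

    def __init__(self):
        self.t = 0        # number of samples seen so far
        self.mu = 0.0     # running mean
        self.var = 0.0    # running variance

    def update(self, x: float) -> float:
        self.t += 1
        if self.t == 1:
            self.mu = x
            return 0.0    # variance is undefined for a single sample
        self.mu = (self.t - 1) / self.t * self.mu + x / self.t
        self.var = (self.t - 1) / self.t * self.var + (x - self.mu) ** 2 / (self.t - 1)
        return (x - self.mu) / self.var ** 0.5
```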

Principal Component Analysis (PCA) was also used to process the data. This statistical tool uses an orthogonal transformation to convert multidimensional data into linearly uncorrelated variables. The directions in the data that have little or no variance will no longer remain [5]. By transforming the data in this way one can reduce the number of dimensions of the input data while preserving the variance, giving the ANN a better chance of success. The PCA principle is illustrated in Figure 3.
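A minimal sketch of such a PCA-based dimensionality reduction via the singular value decomposition, assuming the data has already been standardized (illustrative only; the thesis used the Matlab toolbox for this step):

```python
import numpy as np

def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project standardized data X (samples x features) onto its n_components
    directions of highest variance using the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)                   # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T           # scores in the reduced space
```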


Figure 3: An illustration of the PCA process. The PCA dimensions in the new PCA coordinate system are chosen to be the dimensions that have the highest variance and at the same time are uncorrelated.

3.2 Artificial Neural Networks

References used for this section, Section 3.2, and more information can be found in [6].

Artificial Neural Networks (ANNs) were originally inspired by the human brain, which uses networks of neurons for computation and learning. An ANN consists of nodes and layers with weights that connect the nodes together.

ANNs are black box models that can, amongst other things, learn to fit functions from user data. An ANN can learn to fit a vast range of nonlinear functions as long as enough nodes, layers and input data are used. The function can be some unknown system dynamics or some other relation that needs to be modelled. By feeding the network with input data and target data, the weights of the ANN can be trained. They are trained so that the output signal approximates the target data. There are some different types of ANNs; the differences between them are the topology and the training algorithms that are used. There are feedforward networks and recurrent networks, both with supervised and unsupervised learning. The nodes of a feedforward network only have connections forward in the architecture, while recurrent networks involve connections in loops. Some examples of feedforward networks are the Multi-Layer Perceptron (MLP), Radial Basis Functions (RBF) and Convolutional Neural Networks (CNN), and they are well suited for the task at hand. Recurrent networks are for example used for tasks such as pattern recognition, simulating a long- and short-term memory. Other applications of ANNs involve clustering, signal processing and classification.

Supervised learning means that the network is told how to learn and what inputs to use in order to approximate the target. A set of inputs that are believed to directly cause another set of outputs needs to be defined.

There might not always be a causal relation directly between input and output, but rather to another mix or representation of the input data. For example, in this study, no causal relation was found between the distance to the left hand side traffic and the lane placement. If the radar data was instead used to calculate the density of the traffic, a relation to the lane placement might be found. As a designer it can be difficult to find these significant inputs, and for supervised learning it is difficult to learn deep relations. The difficulty in learning increases exponentially with the number of layers used in the network [7]. This problem can be solved with unsupervised learning. The network then decides for itself what features in the data to use as inputs in order to approximate the target data. PCA, mentioned in Section 3.1, is for example an unsupervised signal separation technique, since it is not told beforehand which input directions to use. When using unsupervised learning it is possible to learn larger and more complex networks [7]. However, one disadvantage with unsupervised learning is the lack of direction for the learning algorithm, since it is not told exactly how to approach the problem. Another is that it might become difficult to tell what the network learned and what inputs it used to approximate the target, and it might lose physical meaning. Then there is of course the 'curse of dimensionality' [8]. For every input dimension used, the computational complexity grows exponentially. The demand for more data also grows large due to the risk of poor generalization in high dimensional spaces. This is truly a 'curse' for the use of high dimensional inputs and data, seen in many other applications besides ANNs.


However, this study only considers supervised learning for the MLP, since it is believed that the input set directly causes the output in a few steps.

The MLP has a feedforward topology, which means that there are no connections in loops, only forward ones, and it can be used to model nonlinear functions. Each layer consists of several nodes and every node in one layer is connected to all the nodes in the next layer. An example of an MLP network can be seen in Figure 5. The connections are represented by weights, w. A weight from a node i in one layer to a node j in the next layer is denoted w_ij. The inputs are fed into the first layer, called the input layer, and each node in this layer represents one input. In an analogous manner the target is represented in the last layer, called the output layer, and each node in this layer represents one output. The target is the output that the network is trying to reproduce based on the input data. In between the input and output layers there are one or several hidden layers with a number of nodes in each layer. The hidden layers and nodes are the part that defines the complexity of the network, and this is where the magic happens.

When training, the weights are usually initialized by setting them to small random numbers [6]. Backpropagation (BP) is a common algorithm for training MLP networks. The algorithm has two phases, the forward and the backward pass. The ANN is basically trained using gradient descent on the square loss, where the gradient is computed via the chain rule. Below follows a walkthrough of the BP algorithm for an MLP with one hidden layer. The algorithm is also summarized in Algorithm 1 [6].

In the forward pass the inputs propagate forward through all the layers of the network. To pass through the layers there are some steps. The activation and activation function are calculated for each node in all the layers, and this, together with a threshold θ, decides whether the neuron fires or not. If the neuron fires it outputs the value one, otherwise zero. The activation for each neuron in layer j is denoted a_j and is calculated according to Equations 5, where x_i are the inputs to the neuron, w_ij are the weights of the hidden layer and ϕ is the activation function. One common activation function is the sigmoid function, which can be seen in Equation 4.

\varphi(x) = \frac{1}{1 + e^{-x}} \qquad (4)

h_j = \sum_i w_{ij} x_i \qquad (5a)

a_j = \varphi(h_j) \qquad (5b)


The activations from the hidden layer are fed to the output layer, where the final activations are calculated in an analogous manner, as seen in Equations 6. The only difference is that the activation function Φ is now a linear function, since a continuous, linear output (a straight line through the origin) is desired as output signal. In Figure 4 a more detailed view of a neuron is shown, containing the transfer function and activation function.

h_k = \sum_j w_{jk} a_j \qquad (6a)

y_k = \Phi(h_k) = h_k \qquad (6b)

Figure 4: Here is a closer look at what happens in a single neuron. The activation a_j is the output of the neuron, which turns into zero or one depending on whether the neuron fires or not.

This was the first phase, called the forward pass: propagating the inputs forward through the network and calculating the outputs of the network. In the next phase, called the backward pass, the errors of the network outputs are calculated for every node and propagated backwards in the network. The error for each neuron in layer k is denoted δ_k and is calculated according to Equation 7, where y_k is the output of the nodes in layer k and t_k is the given target value for the same layer.

\delta_k = (t_k - y_k)\,\varphi'(y_k) \qquad (7)


The errors of the output layer are then fed backwards to calculate the errors in the hidden layer and the rest of the network. This is done for the next layer in Equation 8.

\delta_j = \sum_k \delta_k w_{jk}\,\varphi'(h_j) \qquad (8)

Finally, when all the errors are calculated, it is time to adjust the weights in order to reduce the errors. This is done according to Equations 9, where η is the learning rate that can be predetermined by the user or set by the training algorithm.

w_{jk} \leftarrow w_{jk} + \eta\,\delta_k a_j \qquad (9a)

w_{ij} \leftarrow w_{ij} + \eta\,\delta_j x_i \qquad (9b)

After the network has seen all the input data once, the training algorithm is repeated in the next so-called epoch until the performance measure stops decreasing or a certain stopping criterion is met. When the training is completed, the forward pass can be used to predict the output given input data.

Figure 5 shows an example of an MLP topology with one hidden layer and three hidden nodes plus bias nodes. The output torque is mapped against two inputs, placement error and heading angle. Nodes are represented as circles, and the hidden layer is placed in between the input and output layers. The arrows from each node represent the corresponding weights that are tuned during the training process. Bias nodes take care of any offset to enable representation of values in the range [-1, 1].


Figure 5: An example of an MLP topology with one hidden layer and three hidden nodes plus bias nodes. The output torque is mapped against two inputs, placement error and heading angle. Nodes are represented as circles, and the hidden layer is placed in between the input and output layers. The arrows from each node represent the corresponding weights that are tuned during the training process. Bias nodes take care of any offset to enable representation of values in the range [-1, 1].


Algorithm 1: The backpropagation algorithm for training the MLP.

1) Forward pass: compute the activations a_j and y_k:
   a_j = \varphi\left(\sum_i w_{ij} x_i\right), \qquad y_k = \Phi\left(\sum_j w_{jk} a_j\right)

2) Backward pass: compute the errors \delta_k and \delta_j:
   \delta_k = (t_k - y_k)\,\varphi'(y_k), \qquad \delta_j = \sum_k \delta_k w_{jk}\,\varphi'(h_j)

3) Weight update: update the weights w_{jk} and w_{ij}:
   w_{jk} \leftarrow w_{jk} + \eta\,\delta_k a_j, \qquad w_{ij} \leftarrow w_{ij} + \eta\,\delta_j x_i
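For illustration, a minimal NumPy sketch of one training step of Algorithm 1 for a single-hidden-layer MLP with a sigmoid hidden layer and a linear output layer (an assumption consistent with Equations 4-6; the thesis itself used the Matlab ANN toolbox rather than hand-written backpropagation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, W_ij, W_jk, eta=0.01):
    """One forward/backward pass of Algorithm 1.
    x: input vector, t: target vector, W_ij: input-to-hidden weights,
    W_jk: hidden-to-output weights, eta: learning rate."""
    # Forward pass (Eqs. 5-6)
    h_j = W_ij @ x
    a_j = sigmoid(h_j)
    y_k = W_jk @ a_j                                   # linear output, Phi(h_k) = h_k

    # Backward pass (Eqs. 7-8); the output layer is linear, so its derivative is 1
    delta_k = t - y_k
    delta_j = (W_jk.T @ delta_k) * a_j * (1.0 - a_j)   # sigmoid derivative

    # Weight update (Eq. 9)
    W_jk += eta * np.outer(delta_k, a_j)
    W_ij += eta * np.outer(delta_j, x)
    return y_k
```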

A number of advanced training algorithms are available today; two good ones are Levenberg-Marquardt and Bayesian Regularization (BR). Levenberg-Marquardt is a fast algorithm that often gives the best training result [9]. It heads straight for the closest minimum, interpolating between Gauss-Newton and gradient descent [10]. BR aims to control the complexity of the network by penalizing non-smooth solutions and the number of active weights [11], [12]. This can help to improve generalization and avoid fitting the noise. BR is a type of backpropagation algorithm, but instead of minimizing the MSE directly it looks at the Probability Density Function (PDF) over the weight space [11], [12]. The algorithm is not very fast but can give good results for noisy, small and difficult data sets [6]. Further evaluation of other training techniques is left to the reader.

One important thing when using system identification tools is to avoid overfitting the data (also called fitting the noise) in order to generalize as much as possible. Cross validation and early stopping can help avoid overfitting [6]. Also, some training algorithms like BR try to counter this problem.


Early stopping and cross validation require that the data is split into several sets. One example could be to have a training set used for training the network, a validation set used to measure the MSE during training, and an independent test set used solely for performance evaluation after the training is completed. The idea is to stop training the network if the MSE of the validation set starts to increase for some time. The MSE for the training set usually keeps decreasing as long as you keep training the network, and this is also the point of the training algorithm. During the training process the validation and test sets are not shown to the network when tuning the weights and reducing the MSE; they are only used for evaluation purposes. This keeps the network from overfitting these separated data sets. One potential problem with the early stopping approach could be that you stop training before you reach the global optimum and get stuck in a local minimum. Another downside of splitting the data into several sets is that you reduce the amount of training data, which can be crucial for small data sets. If the data set is small, BR might be a better choice, since there is less need for a validation set when using BR, which means that more data can be used for training [9]. This is because of the algorithm's ability to find a more general solution.
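A minimal sketch of the early stopping idea described above; the helper routines train_one_epoch and mse and the net's get_weights/set_weights methods are hypothetical placeholders, and the patience of 10 consecutive validation failures mirrors the setting later listed in Table 4:

```python
def train_with_early_stopping(net, train_set, val_set, max_epochs=1000, patience=10):
    """Stop training when the validation MSE has failed to improve for
    `patience` consecutive epochs; keep the weights with the best validation MSE."""
    best_mse, best_weights, fails = float("inf"), net.get_weights(), 0
    for _ in range(max_epochs):
        train_one_epoch(net, train_set)     # hypothetical training routine
        val_mse = mse(net, val_set)         # hypothetical evaluation routine
        if val_mse < best_mse:
            best_mse, best_weights, fails = val_mse, net.get_weights(), 0
        else:
            fails += 1
            if fails >= patience:
                break
    net.set_weights(best_weights)
    return net
```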

In order to create the best conditions for success it is recommended that the data is pre-processed before it is used in the ANN. The following pre-processing steps can be performed: low pass filters can remove noise, high pass filters can remove slow drifts, other filters such as a Hampel filter can remove outliers and repair damaged data [13], PCA can remove correlated directions and reduce the number of input dimensions, and standardizing the data can also improve training time and performance [6].

Other things that can be done to improve the results are to use more training data, train for more iterations, use another training algorithm, use more inputs and a larger model, or try different topologies and sizes. One could also train multiple networks with different initializations and data divisions, and then combine the results by averaging the weight matrices. Adding new random Gaussian white noise to the input signals every time they are presented to the network is also reported to improve generalization, preventing the network from overfitting the original data set [6].

All of these things mentioned to improve the performance are of course tradeoffs. One would like to keep the network as small as possible to reduce training time, and the same holds for the amount of data used, but larger networks and more data can give better predictions. One of the hardest tradeoffs is generalization. It is the tradeoff between bias (approximation error) and variance (estimation error). An issue is often that the ground truth is not known.

3.3 Model Predictive Control

Good references for this section, Section 3.3, and more information can be found in [14] and [15].

Model Predictive Control (MPC) is an optimal control strategy based on numerically optimizing the control and plant response over some receding horizon at regular time intervals. Using a plant model, future responses can be predicted and optimized with respect to some cost function and set of constraints. MPC was originally used in slower chemical processes with sampling times of minutes or seconds, but has recently reached faster applications as the computational power increases. A summarized MPC algorithm can be seen in Algorithm 2 and a general MPC formulation is shown in Equation 10. Here N_c ≤ N and Q, Q_N, R ≪ W, where N is the prediction horizon, N_c is the control horizon, Q, W, R are tunable weight matrices and s_t is a slack variable that softens the constraints to ensure feasibility. Q_N, the end penalization weight, should be chosen as the solution to the Algebraic Riccati Equation (ARE) [14]:

P = Q + A^T P A - A^T P B (R + B^T P B)^{-1} B^T P A.

The MPC formulation is used at each time step to predict the plant dynamics N steps ahead and to calculate N_c optimal control inputs, while minimizing the cost function subject to linear dynamics and linear inequalities. If N_c < N the last optimal control input at time N_c is held until time N.

\begin{aligned}
\text{minimize} \quad & \sum_{t=0}^{N-1} \left(x_t^T Q x_t + W s_t\right) + \sum_{t=0}^{N_c} u_t^T R u_t + x_N^T Q_N x_N + W s_N \\
\text{subject to} \quad & x_{t+1} = A x_t + B u_t, \quad t = 0, \ldots, N-1 \\
& C_t x_t - s_t \le c_t \\
& F_t u_t \le f_t \\
& C_N x_N - s_N \le c_N
\end{aligned} \qquad (10)


Algorithm 2: The model predictive control algorithm with receding horizon N.

Input: x_t. Output: u_t.
while t ≥ 0 do
    1) Measure the states x_t.
    2) Solve the finite-horizon constrained LQR problem in Equation 10.
    3) Apply the first optimal control input, u_{t=0}(x_t), from the solution of Equation 10.
    t ← t + 1
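As an illustration, a minimal sketch of one iteration of Algorithm 2 for a simplified instance of Equation 10 (CVXPY in Python is assumed here purely for readability; the slack variables and the time-varying constraint matrices of the full formulation are omitted):

```python
import cvxpy as cp
import numpy as np

def mpc_step(x0, A, B, Q, R, Q_N, u_max, N=20):
    """Solve a simplified instance of Equation 10 and return the first input.
    Q, R, Q_N are assumed symmetric positive semidefinite weight matrices."""
    n, m = A.shape[0], B.shape[1]
    x = cp.Variable((n, N + 1))
    u = cp.Variable((m, N))
    cost, constraints = 0, [x[:, 0] == x0]
    for t in range(N):
        cost += cp.quad_form(x[:, t], Q) + cp.quad_form(u[:, t], R)
        constraints += [x[:, t + 1] == A @ x[:, t] + B @ u[:, t],
                        cp.abs(u[:, t]) <= u_max]
    cost += cp.quad_form(x[:, N], Q_N)        # end penalization
    cp.Problem(cp.Minimize(cost), constraints).solve()
    return u[:, 0].value                       # apply only the first input
```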

4 Related Work

In 1993 the authors of [16] developed a neuro-controller using a three-layer neural network with 21 neurons, learning from human driving data. The data consisted of recordings on a highway from different drivers, containing several sequences between 140 and 580 s in length. The significant inputs were heading angle, road curvature, lateral deviation, and the time averages of both road curvature and lateral deviation. The vehicle speed and lane width were left out since they produced effects contrary to experienced driver knowledge. The output was the steering wheel angle. Training the network every sample, it converged after 50,000 cycles of learning, which equals about 70 min at 12.5 Hz. It was also found that the data in the training set was more important than the network topology. Lateral road banking, crosswinds, deep ruts in the lane and badly painted markings were found to act as disturbances in the system, leaving a static error of about 25 cm in lateral displacement. It was reported that compared to a conventional PID controller, on a public highway, the neuro-controller was slightly better at minimizing the lateral deviation and felt more comfortable for the driver.

In 1998 the authors of [17] also used multi-layer neural networks to output a steering angle, considering lateral deviation, vehicle speed and acceleration. The training data was simulated data from a vehicle model. The most significant input was the lateral deviation, depending on the driver's visual perception.

In 2005 the authors of [18] trained ANNs from driver data to develop a driver handling behaviour model. The inputs used were yaw angular velocity, lateral velocity, lateral acceleration, roll angle, roll angle velocity, lateral displacement and preview lateral offset, and the outputs were the steering wheel angle and steering wheel angle velocity. The results were evaluated on several different lane change maneuvers, the same maneuvers that the drivers made and that data was collected on. Multi-layer networks are reported to work well. This is probably the paper that influenced this study the most, due to its complete picture and good references.

One specific author has written several important articles on this subject, [19], [20], [21]. In [19] it was suggested that ANNs should be used to model adaptive control system behaviour such as human behaviour. The authors made basic use of an elementary two-layer ANN to model the driver steering behaviour in a path regulation problem during lane change maneuvers. Lateral deviation was used as input, both current, preview and delayed versions, to output a steering angle. The data was collected both from simulation models as well as real data from drivers. The results are quite promising, but it is unclear if the authors evaluated their network on a separate validation data set. If not, the ANN might be overfitted, causing it only to fit the training data.

The authors of [22] also used the lateral deviation and a backpropagation network to control the vehicle.

Another interesting and different approach was made by the authors of [23] with the use of unsupervised learning. The system processed entire video frames of data to output a steering angle. The authors struggled to successfully train the net to achieve satisfactory results. This might work better today with more computational power and after the recent trend in Convolutional Neural Networks (CNNs) and deep learning.

The authors of [24] developed a computational model of driver behaviour in a cognitive architecture. The Adaptive Control of Thought-Rational (ACT-R) cognitive architecture was used to model driver profiles such as steering, lateral position, gaze, curve negotiation and lane changing profiles, and for decision making in a multi-lane highway environment. This cognitive architecture could be used to predict and recognize driver behaviour instead of the ANNs that were used in this study.

A good thing suggested in many of the papers, for example in [24] and [18], is the use of the driver's preview behaviour. In [18] the authors calculate a lateral preview offset to get the predicted future placement error some time ahead of the vehicle, while the authors of [24] use two points to represent the vehicle's current lane position and the upcoming curvature respectively. Using the driver's preview behaviour gives a model for slow and fast driver dynamics that can be used in the controller.

There are also other methods to model human driving behaviour. In [25] human behaviour is modelled with a set of linear Kalman filters combined by a probabilistic Markov chain. The dynamical model and sequence with the highest probability given the observed data was chosen to recognize human behaviour. Driver data from a simulation environment was used when classifying events such as lane changes, stopping, turning or overtaking. Good results were reported.

In [26] the author proposes an approach to driver modelling in three steps: sensory perception, ability to learn and ability to optimize. He argues that this is essentially a model predictive controller and develops physical prediction models for driver dynamics based on this.

In [27] some different driver models are summarized. The authors also propose an adaptive predictive control framework. Most importantly for this study, they conclude that the selection of driver model depends on the scope of the study, and they give some intuition of how to model a driver.

Several other authors implemented fuzzy logic controllers instead of ANNs, with various results, for example [28], [29]. However, more knowledge about the physical system is needed and it might be more difficult to identify the driver behaviour using fuzzy logic.

There are also numerous interesting historic projects and competitions in the vehicle automation control field, such as the VaMoRs project, VAMP project, PROMETHEUS project, ARGO vehicle, PATH program, NAVLAB project, VISTA project, NAVIA vehicle and of course Google’s projects and the well known DARPA challenge [30], [31], [32], [33], [34].

5 Implementation

5.1 Control Structure

To implement and simulate the system, Matlab and Simulink were used. The block schematics of the implemented system can be seen in Figure 6. First the signals are recursively lowpass filtered, then the predictions are made from the ANNs, and online learning is applied if the driving scenario passes the terms stated in Section 2.1 and the driver is actively steering. One ANN called PathNet is used to predict the future lateral lane placements at 0.5 s and 2 s ahead of the vehicle, forming a path. This path is one of two aspects that model the driver steering behaviour. The other aspect is the driver steering wheel torque response to reach the formed path. In a similar way as the first ANN, another ANN called TorqueNet is used to predict the driver preferred steering wheel torque response. The ANNs consider the current and delayed measured states and the calculated predicted placement errors as inputs. The predicted placement errors are formed by subtracting the current placement from the predicted placements. A controller can be used in the end to create robustness and to weigh the two driver aspects together.

The controller could use the placement predictions and the steering wheel torque prediction as references while calculating an optimized steering wheel torque response based on a vehicle model. An MPC would be ideal for this type of controller. The control signal would be chosen to deviate as little as possible from the steering wheel torque prediction while at the same time minimizing the deviation from the predicted path, creating an optimized response. Which one of these two minimizations is considered the most important is a trade-off that could be chosen by the designer or user. In order for the controller to predict the response over some time horizon, a vehicle model is needed that creates a mapping from steering wheel torque to lane placement. The response is optimized over some receding horizon, computing a sequence of optimized control signals for the whole horizon at every step. Using an MPC it is also possible to respect certain boundaries on the states, such as the maximum allowed steering wheel torque or boundaries on lane placements. A formulation of this type of controller can be seen in Equation 13.


Figure 6: The block schematics of the implemented system. First the signals are recursively lowpass filtered, then the predictions are made from the ANNs, and online learning is applied if the scenario passes the terms stated in Section 2.1. Finally the controller uses the predictions as references and calculates an optimized response.

5.2 Data Processing

Since the system is used online in real time, new data streams in sample by sample as time passes. If frame based processing is desired, with multiple samples processed at once, large buffers have to be used to be able to process the data. Frame based processing is usually more effective and appealing. However, this can be demanding for the system memory and thus cannot be implemented in many embedded systems. Instead, sample based processing has to be applied in such implementations, processing each sample of data as it arrives. Normally when working with saved data, as has been done in this case, one would filter batches of data, but since the real system is better represented with sample based processing it is used in the simulation as well. Conventional lowpass filters cannot filter one sample at a time, and therefore recursive lowpass filters are required to solve this issue.

Discrete IIR filters of order two were used to do this. They were designed with the help of second order Butterworth filters, to find the corresponding filter constants. After the filtering the data was standardized, giving it zero mean and unit variance. A recursive algorithm for online standardization was used to do this, as explained earlier in Section 3.1. Finally the data was processed with PCA to find fewer and at the same time uncorrelated input dimensions.
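A minimal sketch of such sample-by-sample second-order lowpass filtering, with the filter constants obtained from a Butterworth design (SciPy is assumed here for the design step; the thesis implementation was in Matlab/Simulink):

```python
import numpy as np
from scipy.signal import butter

class RecursiveLowpass:
    """Second-order Butterworth IIR filter applied one sample at a time
    (direct form II transposed)."""

    def __init__(self, cutoff_hz: float, fs_hz: float = 100.0):
        self.b, self.a = butter(2, cutoff_hz, fs=fs_hz)  # filter constants
        self.z = np.zeros(2)                             # internal filter state

    def step(self, x: float) -> float:
        y = self.b[0] * x + self.z[0]
        self.z[0] = self.b[1] * x - self.a[1] * y + self.z[1]
        self.z[1] = self.b[2] * x - self.a[2] * y
        return y
```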

In order to process the data in the best way and to choose the best set of inputs for the ANNs, a lot of testing was done to get the best results. Knowing beforehand which inputs are significant and how to filter them was very difficult. Some leads could be found in Section 4, but ultimately testing had to be done. Guided by intuition about the physical system and by previous research, correlation tests and performance tests in ANN models were done on different sets of inputs to select the most significant ones. The same empirical experiments were done for the filtering constants, trying to lowpass filter the signals above a frequency that seems physically incorrect. For example, one could imagine that the driver only compensates at a certain rate and that the rest is noise from the sensor. Arguing in this manner, the time between peak to peak (PP) values of the inputs was limited to certain upper boundaries. The filtered signals were evaluated in the ANNs to ensure good filtering and the process was looped several times. The time between peak to peak values can be seen in Table 1. The actual cutoff frequencies used do not give much information, because they depend on other things such as the actual data set, which means they have to be adjusted accordingly. In the filtering process, Hampel filters were also tried to remove outliers and repair data, but it was seen that lowpass filters gave better results in the end.


Signal | PP time boundary [s] | PP value
Barrier Pos. Right | long | N/A
Barrier Pos. Left (for cars passing) | short | N/A
Current road curvature | 25 | 3.5/1000 m^-1
Driver input torque | 5 | 6 Nm
Heading angle | 4 | 1.75 deg
Lane width | 50 | 0.45 m
Map curvature | 32 | 3.5/1000 m^-1
Current placement | 30 | 1.2 m
Steering wheel angle | 10 | 6 deg

Table 1: Upper boundaries for the time between peak to peak values in the input signals. Barrier Pos. was not used in the end, but different filters were still tried in an attempt to find correlation to the target.

For measuring deep ruts in the road, it is proposed that one measures the ground clearance and compares it to some standard value or moving average. This would at least give an indication of how the road conditions change over time.

Due to space limitations, all the tested filters and combinations of inputs are not presented. Only the best combinations and results are presented in the following results section, Section 6.

5.3 Artificial Neural Networks

Since Matlab and Simulink were used in the implementation, it was natural to make use of the Matlab ANN toolbox. In Simulink, a user defined Matlab Level-2 S-function block was written to be able to use these toolbox functions in simulation. In this function, ANN predictions and online learning were implemented. A two second buffer was also necessary for the online learning feature to work. Unfortunately, code generation is not supported for this type of solution. If code generation is desired, offline generated ANN blocks can be used in Simulink instead; however, the online learning feature is then lost. A solution to this problem could be to code directly in C or similar. The outline of the ANN algorithm with online learning is explained below and is also summarized in Algorithm 3.

First the current state is checked to see if it fulfills the scenario terms stated in Section 2.1. To check if the driver is active, the driver input torque needs to be larger than 0.75 Nm for a period of 0.5 s. If this is fulfilled, the sensor data is buffered as stated in Equations 11. The filled buffer has a length of two seconds. Remember that in simulation there is no driver in the loop, which means that there is no need to check this criterion. Due to this, the Torque target is also defined slightly differently, which can be seen in Equations 12.

Torque buffer = [Torque input; Torque target]   (11a)
Path buffer = [Path input; Current placement]   (11b)

Then the ANNs are trained; the TorqueNet can be trained from the start of a single buffered scenario, while the PathNet needs a full buffer in order to have target data for the training. The TorqueNet is, as mentioned before in Section 5.1, the ANN that predicts the driver steering wheel torque, and the PathNet is the ANN that predicts the future lateral lane placements ahead of the vehicle. The definitions of inputs and targets for the ANNs can be seen in Equations 12 and the definitions of the signals can be seen in Table 2. The true Path and Torque inputs also include three delayed versions of themselves, with a 0.5 s delay between each delayed version.

Torque input = [Placement error05, Placement error2, Current placement, Heading angle, Yaw rate, Current road curvature, Map curvature, Steering wheel angle, Lateral acceleration]   (12a-c)

Torque target (reality) = [u - Driver input torque]   (12d)
Torque target (simulation) = [Driver input torque]   (12e)

Path input = [Current placement]   (12f)
Path target = [Current placement05, Current placement2]   (12g)


Signal | Description | Sensor
Barrier pos. Right/Left | Distance to objects on the right and left hand side: barriers, cars passing by etc. | Radar
Brake ped. position | The brake use in percentage. | -
Current road curvature | The curvature of the road (1/radius). | Camera
Dir. Ind. lever used | Logical signal for use of the indication lever. | -
Driver input torque | The steering wheel torque input by the driver. | Torque sensor
Est. road banking angle | Estimated banking angle of the road. | Yaw rate and lateral acceleration
Hands off detection | Logical signal for hands on wheel detection. | Torque sensor
Heading angle | The heading of the truck relative to the road. | Camera
Lane width | The width of the lane. | Camera
Map curvature | The curvature of the road ahead of the truck. | Map data
Placement05/2 | The placement of the truck relative to the center of the lane. 05/2 are ANN predictions at 0.5 s and 2 s ahead of the truck. | Camera
Placement error | The lateral placement error, defined as Placement(t+∆) - Placement(t). | -
Quality Right/Left | Quality of the left and right road markings. | Camera
Steering wheel angle | The angle of the steering wheel. | Gyro
Vehicle speed | Vehicle speed. | -

Table 2: Description of signals that were considered in the system. Sorted in alphabetical order.

In the online training the entire buffer is used to train the ANNs in batch mode every time a new sample arrives. The buffer size can be adjusted if needed, but the minimum size is two seconds, which in this case is 200 samples. This is because it takes two seconds before a target value is available for Current placement2. When the buffers are filled there is one set of inputs and targets for the PathNet and 200 sets of inputs and targets for the TorqueNet. Every time step the buffers are shifted one sample, updated and used to train the networks in batch mode as new data streams in. This means that at every time step the networks are trained to adapt to past situations seen in the last two seconds. Before the ANNs are used for prediction for the first time in the system, they are trained offline on the training set.
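A minimal sketch of the sliding two-second buffer logic described above, assuming a 100 Hz sample rate; retrain_on_batch is a hypothetical placeholder standing in for the toolbox training call:

```python
from collections import deque

BUFFER_LEN = 200  # two seconds at 100 Hz

torque_buffer = deque(maxlen=BUFFER_LEN)  # (torque_input, torque_target) pairs
path_buffer = deque(maxlen=BUFFER_LEN)    # (path_input, current_placement) pairs

def online_update(torque_net, path_net, torque_input, torque_target,
                  path_input, current_placement):
    """Shift the buffers by one sample and retrain both nets in batch mode."""
    torque_buffer.append((torque_input, torque_target))
    path_buffer.append((path_input, current_placement))
    retrain_on_batch(torque_net, torque_buffer)     # hypothetical training call
    if len(path_buffer) == BUFFER_LEN:              # a 2 s old placement now has
        retrain_on_batch(path_net, path_buffer)     # its future target available
    return torque_net, path_net
```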

Differences in training parameters between online and offline training can be found in Table 4, and general ANN parameters that apply to both ANNs can be seen in Table 3.

ANN type | MLP
Training algorithm | Bayesian regularization
Performance measure | MSE
Data division | Block
Input processing | PCA

Table 3: General ANN parameters that apply to both nets.

Method | Max epochs | Early stopping | Data division
Offline | 1000 | Yes, 10 consecutive fails | Train 80%, Validation 15%, Test 5%
Online | 1 | No | Training only

Table 4: Online/offline training differences.

After the training, the ANNs are used to predict the future preferred placements, Current placement05 and Current placement2, at 0.5 s and 2 s ahead, and the preferred Torque to use.

Even if the scenario terms are not fulfilled and training does not occur, the ANNs still need to try to predict the driver behaviour, since this is the driver model used at every sample. These outputs from the ANNs can later be used as references in a controller. The algorithm for the use of the ANNs and the online training just explained is shown in Algorithm 3.

The ANNs that were used in the end were MLP networks trained with Bayesian regularization. The MSE was used as the performance measure and early stopping was used as a stopping criterion. The data was divided into three sets: the training set consisted of 80% of the data, the validation set of 15% of the data and the test set of the remaining 5%. The block division of the data is illustrated in Figure 7.


Figure 7: The data set was divided into blocks of training, validation and test sets.

Algorithm 3: The algorithm for the use of the ANNs and online learning explained above.

if Scenario is OK then
    if Controller is ON then
        if Driver is actively steering for some time then
            fill or update buffers
            if Training is ON & PathBuffer is full then
                online train ANNs
            else if Training is ON then
                online train TorqueNet
        Predict Path and Torque
    else if Controller is OFF then
        fill or update buffers
        if Training is ON & PathBuffer is full then
            offline train ANNs
        else if Training is ON then
            offline train TorqueNet
else if Scenario is NOT OK then
    if Controller is ON then
        Predict Path and Torque
    else if Controller is OFF then
        Do nothing

5.4 Controller Choice

As mentioned in Section 5.1 and as can be seen in Figure 6, the input signals are first processed in lowpass filters, standardized and treated with PCA. The signals are then used in the ANNs to predict the lane placement and the steering wheel torque. To predict the steering wheel torque, the predicted placement error is first computed from the predicted lane placement. The lane placement and the steering wheel torque are the two aspects that model the driver steering behaviour. To control the lateral position of the vehicle one could use the predicted steering wheel torque directly, but it might be better to use both driver aspects.

The advantage of using an MPC instead of a PID controller is that the MPC can optimize the response while respecting constraints; this is illustrated in Figure 8. The internal plant model also makes for a better control strategy, predicting the response ahead of the vehicle. The lateral placement predictions, the predicted torque response and the boundaries are weighed together in the optimization and an optimized output is calculated. The internal plant model in the controller is used to predict the closed loop response and to optimize over a discretized time horizon N. The states can be optimized over a horizon N while optimizing the control signal over a shorter horizon N_c to save some computational time without losing much performance [35]. The formulation in Equation 13 was tested together with the parameters in Table 5.

Figure 8: The advantage of using an MPC is that one can optimize the response while respecting constraints. The placement references, torque response and boundaries are weighed together in the optimization and an optimized control signal is calculated.

To estimate the vehicle dynamics, a simple system identification was carried out in order to obtain a linear discrete state-space representation with lane placement and steering wheel torque as inputs, outputting the lane placement one sample ahead. After testing, the results were not satisfactory and it was realized that another model was needed. Another approach was therefore investigated to improve the internal plant model. An ANN was used as an internal black-box model. The ANN was trained offline to predict the vehicle dynamics and the mapping between lane placement and applied steering wheel torque, and was then used inside the MPC as the internal plant model. This was a better model, but it instead compromises the ability to respect the state constraints, because the model is highly complex and nonlinear. However, since the original motivation for an MPC was the ability to respect state constraints and to improve robustness, the conclusion is that the internal plant modelling was still not good enough to fulfil the requirements and to motivate the use of an MPC.

An alternative applying a variant of a PI-controller was formulated and can be seen in Equation 14. The P-part is the predicted steering wheel torque, which is close and proportional to the control signal that is desired in the end. The I-part integrates the predicted placement error to remove the static error. This is the controller that was used in the end.

$$
\begin{aligned}
\text{minimize} \quad & \sum_{t=0}^{N}\Big(\big(\mathit{Placement}_t-\mathit{Predicted\ placement}_{0.5/2}\big)^2+W\,s_t\Big) \\
& +\rho\sum_{t=0}^{N_c}\big(\mathit{Torque}_t-\mathit{Predicted\ steering\ wheel\ torque}\big)^2 \\
\text{subject to:} \quad & \mathit{Placement}_{t+1}=A\cdot \mathit{Placement}_t+B\cdot \mathit{Torque}_t, \qquad t=0,\dots,N-1 \\
& |\mathit{Torque}_t|\le \mathit{Torque}_{\max} \\
& |\mathit{Placement}_t|\le \mathit{Placement}_{\max}-s_t \\
& |\mathit{Torque}_{t+1}-\mathit{Torque}_t|\le \mathit{SlewRate}
\end{aligned}
\tag{13}
$$


Parameter                                          Symbol          Value
Receding horizon                                   N               200 samples
Control horizon                                    Nc              100 samples
Trade-off parameter                                ρ               10
Torque limitation                                  Torque_max      4 Nm
Maximum lane placement deviation from the center   Placement_max   1.5 m
End penalization weight                            Q_N             solution to ARE
Slack variable to ensure feasibility               s(t)            solution of Equation 13
Slack weight                                       W               1000
Maximum change of torque in one step               SlewRate        0.03 Nm/s
Sampling frequency                                 f_s             100 Hz

Table 5: MPC parameter settings.
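For illustration, a minimal sketch of how a formulation like Equation 13 could be set up with the CVXPY modelling library is given below. The scalar plant gains A and B and the reference trajectories are placeholders, the end penalization Q_N is omitted, and the slack enters the state constraint with a plus sign (the usual soft-constraint form); this is not the controller that was implemented in the thesis, since the internal plant model turned out to be the weak point.

```python
# Minimal sketch of one MPC step in the spirit of Equation 13 (assumptions:
# CVXPY, a scalar linear plant, and placeholder reference trajectories).
import numpy as np
import cvxpy as cp

N, Nc = 200, 100          # receding and control horizons (Table 5)
rho, W = 10.0, 1000.0     # trade-off and slack weights
torque_max, placement_max, slew = 4.0, 1.5, 0.03
A, B = 0.99, 0.05         # placeholder scalar plant model (not identified here)

placement = cp.Variable(N + 1)
torque = cp.Variable(N + 1)
slack = cp.Variable(N + 1, nonneg=True)

placement_ref = np.zeros(N + 1)   # hypothetical predicted preferred placement
torque_ref = np.zeros(Nc + 1)     # hypothetical predicted steering wheel torque

cost = (cp.sum_squares(placement - placement_ref) + W * cp.sum(slack)
        + rho * cp.sum_squares(torque[:Nc + 1] - torque_ref))

constraints = [placement[0] == 0.0,                               # current measured placement
               placement[1:] == A * placement[:-1] + B * torque[:-1],
               cp.abs(torque) <= torque_max,
               cp.abs(placement) <= placement_max + slack,        # softened state limit
               cp.abs(torque[1:] - torque[:-1]) <= slew]

cp.Problem(cp.Minimize(cost), constraints).solve()
u0 = torque.value[0]   # only the first torque sample is applied each cycle
```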

$$
u(t)=\mathit{Predicted\ steering\ wheel\ torque}(t)+I\cdot\int_{0}^{t}\mathit{Predicted\ placement\ error}(\tau)\,d\tau
\tag{14}
$$
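A discrete-time realisation of Equation 14 is straightforward; the sketch below shows one possible form, with the integral approximated by a running sum. The integral gain value and the anti-windup clamp are added assumptions and are not part of the thesis formulation.

```python
# Minimal sketch of the PI-variant controller in Equation 14 (assumptions:
# sample time from Table 5; gain and saturation values are placeholders).
class PredictivePI:
    def __init__(self, integral_gain: float, dt: float = 0.01, torque_max: float = 4.0):
        self.I = integral_gain          # integral gain (tuning value not given here)
        self.dt = dt                    # 100 Hz sampling (Table 5)
        self.torque_max = torque_max    # saturation limit, also used for anti-windup
        self.integral = 0.0

    def step(self, predicted_torque: float, predicted_placement_error: float) -> float:
        # "P-part": the ANN's predicted steering wheel torque, used directly.
        # "I-part": integrate the predicted placement error to remove static error.
        self.integral += predicted_placement_error * self.dt
        u = predicted_torque + self.I * self.integral
        # Saturate and undo the last integration step if saturated (anti-windup).
        if abs(u) > self.torque_max:
            self.integral -= predicted_placement_error * self.dt
            u = max(-self.torque_max, min(self.torque_max, u))
        return u
```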

6 Results

Here follow the results from open-loop simulations. Each part of the online control structure was compared against the offline solution. The whole system was simulated in an open-loop environment to see how well the system performs as a whole, both on validation data and on new driving scenarios. A variant of a PI-controller was also used and compared to the ANN steering wheel torque prediction.

6.1 Data Processing

The discrete IIR filters and the recursive standardization of data seem to be in good agreement with conventional frame-based processing techniques. The sample-based data processing has some settling time before converging close to the frame-based solution. To reduce the settling time, the filters can be initialized with initial states and the recursive standardization with pre-computed starting means and variances. The processed signals are delayed compared to the original raw data signals. This is to be expected, as low-pass filtering by nature introduces delays in the signals. The convergence time and the slight difference between online and offline processing in this case can be seen in Figure 9.
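A minimal sketch of the sample-based processing described here is given below: a first-order IIR low-pass filter followed by a recursive (running) mean/variance update in the style of Welford's algorithm. The filter coefficient and the option to seed the statistics with pre-computed values are assumptions about the implementation, chosen to match the behaviour described in the text.

```python
# Minimal sketch of sample-based (online) pre-processing (assumptions: first-order
# IIR low-pass and Welford-style running statistics; coefficients are placeholders).
class OnlinePreprocessor:
    def __init__(self, alpha: float = 0.05, init_mean: float = 0.0,
                 init_var: float = 1.0, init_count: int = 0):
        self.alpha = alpha      # IIR low-pass coefficient (smaller = slower, more delay)
        self.y = init_mean      # filter state; can be seeded to shorten settling time
        self.mean = init_mean   # recursive mean, optionally pre-computed offline
        self.m2 = init_var * max(init_count, 1)
        self.count = init_count

    def step(self, x: float) -> float:
        # First-order IIR low-pass: y[k] = (1 - alpha) * y[k-1] + alpha * x[k]
        self.y = (1.0 - self.alpha) * self.y + self.alpha * x
        # Recursive mean/variance update on the filtered sample (Welford).
        self.count += 1
        delta = self.y - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (self.y - self.mean)
        var = self.m2 / self.count if self.count > 1 else 1.0
        # Standardize with the running statistics.
        return (self.y - self.mean) / (var ** 0.5 + 1e-9)
```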

Figure 9: Data that was processed online is here compared to data that was processed offline. It can be seen that it takes time to converge to the offline solution, but in the end they overlap.


6.2 Artificial Neural Networks

Here the results from the ANNs are presented. In Table 6 the final parameter settings used for the ANNs are shown, together with training time and epochs (the number of times that the networks were shown all the data).

The PathNet is a very small network with only one hidden layer and five hidden neurons. The identification of the steering wheel torque is a bit more complex, so the TorqueNet uses three hidden layers with 8, 32 and 8 neurons respectively. This is still not a very large network, but because of its size it takes more time to train and needs more data than the PathNet.
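For reference, the two network shapes in Table 6 could be declared as follows with scikit-learn's MLPRegressor. This is only a stand-in: the thesis networks were trained with Bayesian regularization, which this class does not offer, so an L2 penalty via alpha is used here as a rough analogue.

```python
# Minimal sketch of the two network architectures (assumption: scikit-learn
# MLPRegressor as a stand-in; the actual training used Bayesian regularization).
from sklearn.neural_network import MLPRegressor

# PathNet: one hidden layer, five neurons; predicts the preferred lane
# placement 0.5 s and 2 s ahead.
path_net = MLPRegressor(hidden_layer_sizes=(5,), alpha=1e-3, max_iter=1000)

# TorqueNet: three hidden layers with 8, 32 and 8 neurons; predicts the
# preferred steering wheel torque.
torque_net = MLPRegressor(hidden_layer_sizes=(8, 32, 8), alpha=1e-3, max_iter=1000)
```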

The ANNs were pre-trained offline, before use, on the first data set containing driver number one. The data division mentioned in Section 5.3 was used in this process. Online adaptation was also tried, for further training and adaptation during driving. The results can be seen below in Figures 10, 11, 12, 13 and 14.

For both the PathNet and the TorqueNet a regression plot and a time series plot are shown. The regression plots show how well each output value corresponds to each target value in the data set, i.e. the distribution. The data is separated with the mentioned data division. Ideally the data should fall onto the straight line and form a one-to-one mapping. None of the separated data sets should perform markedly worse than the others, as this can indicate overfitting.

The time series plots show what the predictions on validation data look like in the time plane. In Figure 12 a cloud above the straight line can be seen in the green validation set. This is a set of faulty predictions and can also be seen in the time plane in Figure 13. When online adaptation is used these faulty predictions are reduced, which is shown in Figure 14.

When faulty predictions are made they can be quite large, but most of the errors are considered small. About 80% of the steering wheel torque errors are smaller than 0.5 Nm, and after online adaptation 95-99% of the errors are small. These errors are considered small because the accuracy of the steering wheel torque sensor is currently ±0.5 Nm.
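The error statistics quoted here can be reproduced from a prediction trace with a few lines; the sketch below assumes two arrays of predicted and measured torque and uses the ±0.5 Nm sensor accuracy as the "small error" threshold.

```python
# Minimal sketch of the error statistics used in the text (assumption: the
# 0.5 Nm "small error" threshold comes from the torque sensor accuracy).
import numpy as np

def torque_error_stats(predicted: np.ndarray, measured: np.ndarray,
                       small_threshold: float = 0.5):
    errors = predicted - measured
    mse = float(np.mean(errors ** 2))
    small_fraction = float(np.mean(np.abs(errors) < small_threshold))
    return mse, small_fraction
```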

The ANN results are also summarized in Table 6.2, together with some results from the second data set involving the second driver.


Network     Hidden layers   Hidden nodes   Training time   Training epochs
PathNet     1               5              1.5 min         113
TorqueNet   3               8-32-8         4.5 min         14

Table 6: ANN parameters. Computed on an i7 @ 3 GHz with 16 GB RAM.


Figure 10: A regression plot between output from the PathNet and the corresponding target value. Ideally the plotted values would fall onto the straight line.


Figure 11: A time series plot from validation data of the PathNet. It shows, from right to left: the current placement, the target and prediction at 0.5 s ahead, and the target and prediction at two seconds ahead. The prediction at two seconds ahead is not as accurate as the one at 0.5 s ahead. The whole data set has an average MSE of 1.2e-06 m for the prediction at 0.5 s ahead and 2.8e-05 m for the prediction two seconds ahead.


Figure 12: A regression plot between output from the TorqueNet and the corresponding target value. Ideally the plotted values would fall onto the straight line. In the validation set a cloud above the straight line can be seen; this is a set of faulty predictions.


Figure 13: A time series plot from validation data of the TorqueNet. It is easier to predict the turning points of the torque than the exact torque level. In the beginning a set of faulty predictions can be seen. The average MSE for the whole data set is 0.166 Nm and 82% of the errors are so-called small.


Figure 14: A time series plot from validation data of the TorqueNet. Here online training was applied to further customize the fit over time. It can be seen that the MSE decreases and the number of large errors is reduced. The average MSE for the whole data set is 0.057 Nm and 95% of the errors are so-called small.

References
