
Linköping University | Department of Computer and Information Science
Bachelor thesis, 16 ECTS | Datateknik
2020 | LIU-IDA/LITH-EX-G--20/046--SE

Using machine learning for control systems in transforming environments

Felicia Barkrot, Mathias Berggren

Supervisor: Lennart Ochel
Examiner: Peter Fritzson

Linköpings universitet, SE-581 83 Linköping


Copyright

The publishers will keep this document online on the Internet – or its possible replacement – for a period of 25 years starting from the date of publication barring exceptional circumstances. The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.

© Felicia Barkrot, Mathias Berggren


Abstract

The development of computational power is constantly on the rise and opens new possibilities in many areas. Two of the areas that have made great progress thanks to this development are control theory and artificial intelligence. The most eminent area of artificial intelligence is machine learning. The difference between an environment controlled by control theory and an environment controlled by machine learning is that the machine learning model will adapt in order to achieve a goal, while the classic model needs preset parameters. This supposedly makes the machine learning model better suited for an environment which changes over time. This theory is tested in this paper on a model of an inverted pendulum. Three different machine learning algorithms are compared to a classic model based on control theory. Changes are made to the model and the adaptability of the machine learning algorithms is tested. As a result, one of the algorithms was able to mimic the classic model, although with varying accuracy. When changes were made to the environments, the results showed that only one of the algorithms was able to adapt and achieve balance.


Acknowledgments

We would like to express our very great appreciation to our examiner Peter Fritzson and our supervisor Lennart Ochel.


Contents

Abstract . . . iii
Acknowledgments . . . iv
Contents . . . v
List of Figures . . . vii
1 Introduction . . . 1
1.1 Motivation . . . 1
1.2 Aim . . . 2
1.3 Research Questions . . . 2
1.4 Delimitations . . . 2
2 Theory . . . 3
2.1 Control theory . . . 3
2.1.1 Open-loop and closed-loop control systems . . . 4
2.1.2 Controllers . . . 5
2.2 Physics of the pendulum . . . 8
2.2.1 Equations of motion . . . 8
2.3 Artificial intelligence . . . 9
2.4 Machine learning . . . 10
2.5 Supervised Learning . . . 10
2.5.1 Regression Tree . . . 10
2.5.2 Linear regression . . . 11
2.5.3 Artificial Neural Networks . . . 12
2.5.4 Backpropagation . . . 14
2.5.5 Activation functions . . . 14
2.5.6 Hyperparameters . . . 16
2.6 Modelica . . . 17
2.7 FMI - Functional Mock-up Interface . . . 17
2.8 Related Works . . . 17
3 Method . . . 18
3.1 Virtual model of pendulum . . . 18
3.1.1 Implementation in OpenModelica . . . 18
3.1.2 Altering of the environments . . . 19
3.2 PID Controller . . . 19
3.3 Choosing algorithms . . . 19
3.4 Training data . . . 20
3.5 Implementing the learning algorithms . . . 20
3.5.1 Neural Network . . . 21
3.5.2 Linear Regression . . . 24
3.5.3 Regression Tree . . . 25
3.6 Selection of hyperparameters for the algorithms . . . 26
3.7 Comparing the algorithms . . . 26
4 Results . . . 27
4.1 Base pendulum . . . 27
4.2 Data from altered environments . . . 28
4.2.1 Altered environment 1 . . . 28
4.2.2 Altered environment 2 . . . 29
4.2.3 Altered environment 3 . . . 30
4.2.4 Altered environment 4 . . . 31
4.2.5 Altered environment 5 . . . 32
5 Discussion . . . 33
5.1 Results . . . 33
5.2 Method . . . 34
5.2.1 Source criticism . . . 35
5.3 The work in a wider context . . . 35
6 Conclusion . . . 36
6.1 Future work . . . 36


List of Figures

1.1 An inverted pendulum balancing on a cart . . . 1

2.1 A schematic over a simple feedback control system with exogenous signals. . . . 5

2.2 A schematic over a PI-controller. . . 6

2.3 A schematic over a PID-controller . . . 7

2.4 Diagram of the pendulum environment with actuating forces . . . 8

2.5 A diagram of a classic decision tree . . . 11

2.6 A graph showing an example of linear regression . . . 12

2.7 Simplified image of a biological neuron . . . 13

2.8 Schematic of Artificial Neuron . . . 13

2.9 A graph showing weight altering by using stochastic gradient descent . . . 14

2.10 Graphs of hyperplane decision surface for a single layer perceptron network . . . 15

2.11 Graph showing a plot of the sigmoid activation function . . . 16

2.12 Graph showing the tanh activation function . . . 16

3.1 Schematic of the neural network structure . . . 21

4.1 Graphs of simulation results for the base pendulum with different start angles. . . 27

4.2 Graphs containing the different simulation results in altered environment 1 with changed pendulum mass . . . 28

4.3 Graphs containing the different simulation results in altered environment 2 with changed pendulum mass and pendulum length . . . 29

4.4 Graphs containing the different simulation results in altered environment 3 with changed cart mass and pendulum length . . . 30

4.5 Graphs containing the different simulation results in altered environment 4 with changed cart mass . . . 31

4.6 Graphs containing the different simulation results in altered environment 5 with changed cart mass, pendulum mass and pendulum length . . . 32


1 Introduction

This chapter introduces the main motivation of the project. The aim is presented before the research questions. Lastly, the delimitations of the project are given.

1.1 Motivation

Control theory is a field of engineering which has seen a lot of development during the 20th century. Technological advances and the growth in computational power are two of the reasons for this development. Most of the present methods for controller design within the linear control theory branch are static; this means that as soon as the parameters for the system are set, it is going to work that way until you change it. This sounds like a good approach in most cases, but if the environment changes or the user wants to make variations to the environment, the parameters need to be reset, something that could be a complicated process.

When studying control theory, an environment which appears very frequently is the inverted pendulum. The inverted pendulum controller is a classic example in control engineering because of the instability of the system. The inverted pendulum is placed on a cart, and both the pendulum and cart are constrained to move within the vertical plane. The goal is for the cart to balance the pendulum 90 degrees straight up by moving in the horizontal direction.

Figure 1.1: An inverted pendulum balancing on a cart

Just as the development of computational power has made an impact on the control theory branch, it has also made artificial intelligence (AI) a hot topic in the 2010s. AI becomes more relevant every day in our society, and research in this field has made some big breakthroughs over the last ten years. Most of those breakthroughs have been in the area of machine learning, with neural networks as one of the big buzzwords of the area.


The difference between a classic controller of the pendulum and a pendulum operated with machine learning is that the machine learning model will learn how to operate in order to balance the pendulum, while the classic model needs preset parameters. This supposedly makes the machine learning model better suited for an environment which may have deviations over time, since there would be no need to change any parameters.

1.2 Aim

The aim of the project is to compare and study the data produced by different machine learning algorithms to map their behavior and ability to adapt. The research environment, a simulated inverted pendulum, will be controlled by a PID controller. From these simulations, training data will be extracted to train machine learning algorithms to balance the pendulum. Different changes will be made to the environment, and the algorithms will then be run on the new environment to compare how control systems based on different machine learning algorithms adapt when placed in dynamic environments.

1.3 Research Questions

AI is one of the biggest fields within computer science as of today. There are numerous areas in which it is used, and one of these is control theory. The inverted pendulum is a classic experiment within this area, and it is interesting to implement it with machine learning in order to examine how a well-studied environment functions with more modern techniques. It is also a model to which you can apply different changes, such as different weights, lengths and start angles, to see how an algorithm adapts.

With previously mentioned problem as a base, the questions that will be answered in this report are:

• How does a pendulum based on classic control theory compare to a pendulum based on different machine learning algorithms?

• How will the machine learning algorithms behave when altering the environment by changing, for example, weight, length and size?

The necessary theory and method will be described in this report before presenting the result and discussion.

1.4 Delimitations

Considering the time frame of the project, a number of delimitations have been made. Many different machine learning algorithms exist today; given the mentioned time frame we had to pick three of them, since comparing them all would take too much time. This work should therefore not be considered a complete comparison, since many algorithms have not been considered. Also due to the time frame of the project, the only problem which will be studied is the inverted pendulum. With more time it would be possible to research different types of problems. The pendulum will only be virtual, also as a result of the time frame of the project.


2 Theory

This chapter explains the theory behind the project, starting off by presenting the main control theory which has been used to build the controller and the classic model of the pendulum. Then artificial intelligence is presented as an area in the field of computer engineering. The area is wide and has several subareas; the one that is used in this thesis is machine learning. After this, the selected algorithms are thoroughly explained. The modeling language which has been used in this project to build the pendulum, Modelica, is explained last.

2.1 Control theory

Control theory deals with automatic systems and open- and closed-loop systems. Automatic systems refers to systems that can work without human supervision. Control theory is an interdisciplinary subject, which means that it is used within a number of different branches, such as vehicles, robots and space technology [27]. Without control systems it is possible that our technology today would be very different. Control systems are what make many of our machines work as intended. A control system is often based on a feedback principle: there is an input signal that is compared to a reference signal which represents the goal that is to be reached [9].

To be able to construct a control system, deep knowledge about the processes that are to be regulated is required. The most important factors are the outputs of the system and how they react to changes in the inputs. The input is some kind of command or stimulus that is applied to the system. The form of the inputs and outputs can vary. The input and output need to be given to be able to identify the components of the system, and a control system can have more than one input and output [8]. A process has static and dynamic characteristics. The static characteristics are the ones that hold in the process's static condition; this means that the static amplification is leveled everywhere in the operation of the process. The dynamic characteristics of a process take into account rigidities, time delays and transients. For example, when you press the pedal of a car it will take some time before the wanted speed is acquired. The static characteristics of different systems are often very alike, while the dynamic characteristics may differ. Because of this, different processes can be divided into different types: there are processes with downtime, processes with overshoot, and so forth. The inverted pendulum is an unstable process. This means that feedback is required to be able to keep the output close to the wanted reference signal. [27]


When designing a control system there are a number of steps to go through. First the system must be studied: to be able to build the control system one must know what types of sensors are necessary, what actuators are to be used and where they should be placed. After identifying all the necessities of the system, it can be modeled. After the system is modeled, the resulting model's properties can be determined and the performance specifications can be set. Based on earlier conclusions, the type of controller that is to be used is chosen and the controller is designed to meet the measured properties. There are different types of controllers that can be used on different types of systems; some of these controllers are explained more closely in the following sections. After identifying the appropriate controller, the system can be simulated. [9]

2.1.1 Open-loop and closed-loop control systems

Control systems are generally divided into two categories: open-loop and closed-loop systems. These systems are separated based on the control action which is responsible for activating the system and producing the output. The word action in the term control action is not necessarily a change, motion or activity. In a system designed to have an object hit a target, the control action is the distance between the object and the target. A distance is not an action, but the action motion is implied in this case due to the goal of the object hitting the target. In an open-loop control system the control action is independent of the output, in contrast to the closed-loop control system where the control action is dependent on the output. The open-loop control system's ability to perform accurately is entirely determined by its calibration. When a system is calibrated, the input-output relation is established to obtain a desired system accuracy. Closed-loop control systems are also known as feedback control systems. This is the kind of system which is used in this thesis, and the feedback control system is explained more thoroughly in the following section. [27]

2.1.1.1 Feedback control systems

Feedback is the main difference that separates the closed-loop control system from the open-loop one. The feedback permits the output to be compared with the input to the system and allows for the appropriate control action to be formed as a function of the output and input. The presence of the feedback gives the system a number of different properties: for example, a feedback control system increases accuracy, and the feedback reduces the effects of external disturbances or noise.


The simplest kind of feedback control system has three components:

1. The object that is to be controlled.
2. A sensor to measure the output from the object.
3. A controller to generate the input to the object.

Figure 2.1: A schematic over a simple feedback control system with exogenous signals.

As seen in figure 2.1, the output signal is fed back into the object that is to be controlled; this is called the feedback. Apart from the feedback there are some signals that come from the outside; the external disturbance and the sensor noise are examples of this. These signals are called exogenous signals [9]. The controller oversees the process and has an input variable, the command input, which is compared to a set-point that is the wanted outcome. If the input is the same as the set-point the control system has reached the goal; if they differ, the controller sends the object input to the object to tell it how to perform to reach the set-point. [1]

2.1.2 Controllers

The controller is the heart of the control system. The controller's task in the system is to use the information from the feedback to create the control signal that will try to decrease the error. The simplest form of controller is the on-off controller, in which the control signal can only take two different values. The value of the signal depends on whether the output is positive or negative. The on-off controller is simple but not always accurate enough. A more advanced type of the on-off controller is the multi-stage controller; the difference is basically that there are more than two stages in this type of controller. The on-off and multi-stage controllers are not very common; in many cases a P-, PI- or PID-controller is used. [27]


2.1.2.1 P-controller

With proportional control the variations of the control signal are proportional to the control error signal, the input to the controller. The relationship between the input and output can be described with the following formula:

u = u_0 + Ke    (2.1)

where u is the control signal, e is the control error signal and u_0 is the set point, the goal value. The parameter K is the controller's amplification, which decides how strongly the controller acts to correct the error.

A P-controller is often used as a basic function in most controllers but is often combined with an I-controller and a D-controller, which are brought up in the following sections. The P-controller gives a softer control compared to the previously mentioned on-off controllers. There is no K-value that gives both good speed and high stability; if both speed and stability are wanted, the controller must be supplemented with a D-controller. The P-controller might be enough if the requirements are not that high. [27]

2.1.2.2 I-controller

An I-controller is a controller where the output is an integral of the error:

u(t) = \frac{1}{T_I} \int_0^t e(t)\,dt    (2.2)

where T_I is the integration time, which decides the velocity of the integration, e is the input and u is the output. The I-controller's output at a certain time depends on the size of the error at that time. In an I-controller the control signal's initial value is set to the set point's value. As long as the error is 0, the control signal is the same as the set point and will stay at its initial value. If there is an error, the control signal will increase or decrease depending on whether the error signal is positive or negative. When the error is brought under control, the control signal will have returned to its initial value. [27]

2.1.2.3 PI-controllers

Often a P-controller and an I-controller are combined into a PI-controller. In this way you can use the advantages of both types.

u(t) = K\left[ e(t) + \frac{1}{T_I} \int_0^t e(t)\,dt \right]    (2.3)

Figure 2.2: A schematic over a PI-controller.

The amplification K now affects both terms in the PI-controller. The integration time T_I in a PI-regulator is consciously chosen to be big, which results in a slow change in the I-part of the controller when an error occurs, compared to the P-part. The integration time in the PI-controller corresponds to the time it takes for the I-part's output to match the P-part's output. The constants K and T_I need to be set to suitable values for the PI-controller to work properly. [27]


2.1.2.4 Derivatives and PID-controller

The third part of the PID-controller is the derivative part, the D-part:

u(t) = T_D e'(t) = T_D \frac{de(t)}{dt}    (2.4)

where the derivation time T_D is a constant.

Figure 2.3: A schematic over a PID-controller

The output of the D-part differs from 0 only when the input's value is changing, that is, when the derivative of the input differs from 0. If the change of the input is fast, the derivative will be big, and when the input takes a constant value the derivative will be 0. The derivative part is never present as a single entity in a control system; it is always used together with some of the previously mentioned parts, for example in a PD-controller or a PID-controller. [27]

The output of a PID-controller is made up of the outputs from the three different parts, the P-, I- and D-parts. The relationship between the input e and the output u in a PID-controller can be described in the following form:

u(t) = K\left[ e(t) + \frac{1}{T_I} \int_0^t e(t)\,dt + T_D e'(t) \right]    (2.5)

The derivative part of the PID-controller can improve the stability, the speed and the interference suppression. If the PID-controller is to function as intended, the parameters K, T_I and T_D need to be set to suitable values. [27]
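In a sampled implementation the integral and the derivative in equation 2.5 are approximated numerically. The following is a standard textbook discretization with sample time \Delta t, added here for illustration (the pseudo code in chapter 3 instead folds the sample time and the factor K into the coefficients K_p, K_i and K_d):

I_k = I_{k-1} + e_k \,\Delta t

u_k = K\left( e_k + \frac{1}{T_I} I_k + T_D \,\frac{e_k - e_{k-1}}{\Delta t} \right)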


2.2 Physics of the pendulum

The inverted pendulum operates in an environment with the following parameters: a cart that has a mass M; an external force F applied at the sides; a pendulum with mass m connected to the cart through a rigid massless rod of length l. The pendulum is rotated from the vertical line by an angle θ in the counter-clockwise direction. There is also a friction force f that acts in the opposite direction of the external force, and a gravitational constant g. Figure 2.4 describes the environment.

Figure 2.4: Diagram of the pendulum environment with actuating forces

The system has the freedom to move in two different ways: the cart can move horizontally along the x-axis, and the pendulum can rotate 360 degrees around its pivot point.

2.2.1 Equations of motion

In section 2.2 it was stated that the pendulum has the freedom to move in two different ways. This leads to two state variables:

x_s = displacement of the cart on the x-axis relative to the starting position.
θ_s = angular displacement of the pivot relative to the upright position.

To derive the equations of motion we used Lagrange's equations (2.6):

\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_i}\right) - \frac{\partial L}{\partial q_i} = Q_i    (2.6)

where L is the difference between the kinetic energy (T) and the potential energy (V):

L = T - V    (2.7)

The potential energy of the system is the potential energy of the pendulum, since the cart will never have any stored energy:

V = mgl\cos\theta    (2.8)


Finding the kinetic energy is a little more complicated, since it involves both the pendulum and the cart:

T = \tfrac{1}{2}M\dot{x}^2 + \tfrac{1}{2}mV_p^2
  = \tfrac{1}{2}\left(M\dot{x}^2 + m\left(V_x^2 + V_y^2\right)\right)
  = \tfrac{1}{2}\left(M\dot{x}^2 + m\left((\dot{x} - l\dot{\theta}\cos\theta)^2 + (l\dot{\theta}\sin\theta)^2\right)\right)
  = \tfrac{1}{2}\left((M+m)\dot{x}^2 - m\left(2\dot{x}l\dot{\theta}\cos\theta - l^2\dot{\theta}^2(\cos^2\theta + \sin^2\theta)\right)\right)
  = \tfrac{1}{2}\left((M+m)\dot{x}^2 - m\left(2\dot{x}l\dot{\theta}\cos\theta - l^2\dot{\theta}^2\right)\right)    (2.9)

By then combining equations 2.8 and 2.9, equation 2.7 can be solved for L:

L = \tfrac{1}{2}\left((M+m)\dot{x}^2 - 2m\dot{x}l\dot{\theta}\cos\theta + ml^2\dot{\theta}^2\right) - mgl\cos\theta    (2.10)

Using Lagrange's equations (2.6) and calculating the equations of motion for our state variables x_s and θ_s, we get the equations of motion for the system.

For x_s:

(M+m)\ddot{x} - ml\ddot{\theta}\cos\theta + ml\dot{\theta}^2\sin\theta = F    (2.11)

where F is the external force applied to the state variable x_s.

For θ_s:

m\dot{x}l\dot{\theta}\sin\theta + ml^2\ddot{\theta} - m\ddot{x}l\cos\theta - m\dot{x}l\dot{\theta}\sin\theta - mgl\sin\theta
  = ml^2\ddot{\theta} - m\ddot{x}l\cos\theta - mgl\sin\theta
  \;\Rightarrow\; l\ddot{\theta} - \ddot{x}\cos\theta - g\sin\theta = 0    (2.12)

Equation 2.12 is equal to 0 because no external force acts on the state variable θ_s.

2.3 Artificial intelligence

The term artificial intelligence (AI) was first proposed at a conference held at Dartmouth College in 1956 [6]. Since then the area has gone through remarkable progress. The cognitive system Watson, which was developed by IBM, beat the reigning masters in Jeopardy in 2011. In 2016 Google's AI system AlphaGo achieved great success in a challenge with Lee Se-dol, one of the world's best players of the game Go. Simply put, AI can be defined as the research of intelligent agents. This includes devices that observe their environment and, based on these observations, make decisions that maximize the likelihood of obtaining a goal. The device should be programmed to mimic the cognitive behaviour of a human brain, such as learning and problem solving. [29]

AI can be classified into two different categories: weak and strong AI. The strong AI category is considered to have human-like, high-level cognitive ability, including for example common sense and self-awareness. On the other hand, there is weak AI, which simulates human intelligent processes without real understanding. Modern AI systems are all at the stage of weak AI, and as of today strong AI does not exist. [29]


2.4 Machine learning

Machine learning is an area of computer science and AI. It gives a computer the ability to learn without being deliberately programmed [6]. The area evolved from the study of pattern recognition and computational learning theory in AI. Machine learning is used for problems that can be solved using inference and that have large, representative training data [3]. This is done with different algorithms. Simply explained, these algorithms are sequences of instructions which interpret input and transform it into an output. There are many different types of algorithms, and the challenge is to find the most efficient one for a specific task. For some tasks a hand-crafted algorithm does not exist, for example classifying emails into spam and not spam. When doing this, the input is the email and the output is a simple yes/no depending on whether the email is spam or not. The person using the email service can change or affect what counts as spam, which means it will change over time [2].

The main motivation for using this type of system is that if a system can learn and adapt to changes in an environment, the designer of the system does not need to foresee and provide solutions for all possible situations. The areas in which machine learning is used today are many and include pattern recognition, speech recognition and robotics. One of the main themes of pattern recognition is recognizing faces; it is an easy task for the human brain, and a human does it without effort every day. This is however done unconsciously, which means without awareness, sensation or cognition. To build a computer program that works with awareness or cognition is impossible; the program will never make its own decisions, it will take action based on the code that the programmer has written. To mimic the brain, the algorithm is programmed to look at the known facts: a face has a pattern, it has a nose, eyes and a mouth. These are all placed at certain positions in the face; there is a structure. With data consisting of photos of different faces, a learning program can analyze the face pattern and recognize it by checking for the pattern in each image. [2]

2.5 Supervised Learning

Supervised learning uses models to map inputs to comparable outputs with the help of designated training data sets. This can be used when trying to solve classification and regression problems, which apply to predicting discrete or continuous valued outputs, respectively. To be able to solve a supervised learning problem you first need to determine what type of data is going to be used as the training set; different types of data require different actions. After determining the type of data, it needs to be collected. The data is a must to be able to achieve the requested result. When the data is gathered, the input representation of the learned function needs to be determined. It is important that the structure of the learned function and the corresponding learning algorithm are thought through and compatible. The training sets consist of input objects and corresponding outputs and are gathered from human experts or from measurements. The most common way to decide the input representation of the learned function is to transform the input object into a vector containing features that describe the object. [3]

2.5.1 Regression Tree

The regression tree is a long-established machine learning imputation algorithm. The algorithm forms a binary tree structure model which outputs a predicted value by conditional branching. The tree begins at a root; a leaf in the tree is a terminal node, and in between the root and a leaf are the regular nodes. Originating from the root, questions are asked about the features, and the branches answer the questions. The next question is determined by the previous answer. In the classic version, each question refers only to a single attribute and has a yes or no answer. [14, 12]


Figure 2.5: A diagram of a classic decision tree

The output, a leaf, of a regression tree is obtained by making a series of comparisons rather than asking yes or no questions. To train a tree you need a dataset. The dataset is divided into a training set and a testing set. The training set is usually composed of complete data which determines the structure of the tree, while the testing set has missing values for certain attributes. The model searches every distinct value of the input data to find the split value that separates the data into two regions. After finding the best split, the splitting process is repeated on each of the two new regions. This is repeated until a stopping point is reached. [14, 12]
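As an illustration of the split search described above, the following is a minimal C++ sketch that finds the best split point for a single feature by minimizing the summed squared residuals of the two resulting regions. The function names and structure are our own, not taken from [14, 12]:

#include <algorithm>
#include <cstddef>
#include <vector>

// One training example: a single feature value and its target output.
struct Point { double x, y; };

// Mean of the targets in the index range [b, e).
static double meanY(const std::vector<Point>& d, size_t b, size_t e) {
    double s = 0;
    for (size_t i = b; i < e; ++i) s += d[i].y;
    return s / (e - b);
}

// Sum of squared residuals against the region mean.
static double sse(const std::vector<Point>& d, size_t b, size_t e) {
    double m = meanY(d, b, e), s = 0;
    for (size_t i = b; i < e; ++i) s += (d[i].y - m) * (d[i].y - m);
    return s;
}

// Try a split between every pair of consecutive feature values and
// return the split with the smallest total residual of the two regions.
double bestSplit(std::vector<Point> d) {
    std::sort(d.begin(), d.end(),
              [](const Point& a, const Point& b) { return a.x < b.x; });
    double bestValue = 0, bestResidual = 1e300;
    for (size_t i = 1; i < d.size(); ++i) {
        double candidate = 0.5 * (d[i - 1].x + d[i].x);
        double residual = sse(d, 0, i) + sse(d, i, d.size());
        if (residual < bestResidual) {
            bestResidual = residual;
            bestValue = candidate;
        }
    }
    return bestValue;
}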

A regression tree has a lot of advantages. It is an excellent way for the user to visualize each step of a decision-making process, which can help with making rational decisions. There is a possibility to give priority to a decision criterion. A lot of the undesired data is filtered out in each step, which makes for a manageable amount of data. It is a very presentable algorithm that is easy to explain.

2.5.2 Linear regression

This method is a linear approach to modeling the relationship between a dependent variable and one or more independent variables. This is done by fitting a linear equation to the data. For example, say there is a dataset with two different variables, X and Y, which are plotted in a graph. Linear regression then strives to find the optimal straight line among these data points. The line, which is called a regression line, contains the predicted score of Y for each achievable value of X. The line represents a mean over the data points, which means that the prediction often will not be exact and will have some errors of prediction. [18, 15]


Figure 2.6: A graph showing an example of linear regression

With two variables, and thus only one explanatory variable, the method is referred to as simple linear regression. If there is more than one explanatory variable, this is referred to as multiple linear regression. It is a fairly simple algorithm with well-known properties, which makes it popular not only in machine learning but also in areas such as finance, economics and epidemiology. [18, 15]
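For the simple case with one explanatory variable, the least-squares regression line y = kx + m has a well-known closed form; this is a standard result, included here for reference rather than taken from [18, 15]:

k = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad m = \bar{y} - k\bar{x}

where \bar{x} and \bar{y} are the sample means. The gradient-descent implementation used later in chapter 3 should converge toward this same line.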

2.5.3 Artificial Neural Networks

An Artificial Neural Network (ANN) is an attempt to model how a biological brain operates. There are several different types of ANNs; this section covers the ANN models relevant for the report, from single-layer networks with perceptron activation functions to more modern and evolved networks represented by multi-layered neurons with ReLU functions. ANNs are considered a viable solution for problems which have noisy and complex sensor data [21]. Two broad classes of neural networks are feedforward networks, in which everything is sequential, and recurrent neural networks (RNNs), which have loops inside of them. Every time a neural network is mentioned in this thesis it refers to a sequential, feedforward network if nothing else is mentioned.


2.5.3.1 Biological neurons

A human brain consists of a large set of connected nodes (neurons). There are approximately 20 billion neurons in a human brain, where each neuron is on average connected to 7,000 other neurons [10]. A neuron is sometimes also referred to as a nerve cell. The neurons transmit and receive signals through an electrochemical process: first an electrical pulse is sent, then the pulse is converted into a chemical message which is transported to other neurons [13]. The neurons in the brain receive the signals with something called dendrites. After receiving, the neuron has a system, the synapses, that keeps track of which input signals are of most significance. After gathering the received messages, the neuron evaluates what to do next and sends messages to other neurons or organs; the part that sums up all the input signals and evaluates whether they reach a certain threshold is named the soma. The part of the neuron that handles the transmission is referred to as the axon [30]. The speed at which a human neuron can perform a switch is estimated to 10^{-3} seconds, compared to a computer which has switching speeds of 10^{-10} seconds. However, a human brain can recognize a person in approximately 10^{-1} seconds because of the human neurons' ability to operate in parallel [21].

Figure 2.7: Simplified image of a biological neuron

2.5.3.2 Artificial Neurons

Already in 1943 the first mathematical model for representing a neural network was presented [20]. Since then there have been significant improvements and major breakthroughs. However, the basic model for how a neuron is represented remains the same. There is an input vector {x_1 ... x_n} that works like the dendrites described in 2.5.3.1. There is also a weight vector {w_0 ... w_n} which works like the synapses; the weights themselves are represented by a decimal number between 0 and 1. After receiving the inputs, they are all summed and checked against the threshold of the activation function. Depending on the triggering of the activation function, different values are sent to the output, represented by an output vector [21].
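The weighted-sum-and-threshold behaviour described above can be written in a few lines. The following is a minimal C++ sketch of a single artificial neuron, with names of our own choosing; the step activation here is a placeholder for the functions discussed in section 2.5.5:

#include <cstddef>
#include <vector>

// A single artificial neuron: the weights play the role of synapses,
// the inputs play the role of dendrite signals.
struct Neuron {
    std::vector<double> weights;  // w_1 ... w_n
    double bias = 0.0;            // w_0, the threshold term (with x_0 = 1)

    // Weighted sum of the inputs plus the bias.
    double net(const std::vector<double>& x) const {
        double sum = bias;
        for (std::size_t i = 0; i < weights.size(); ++i)
            sum += weights[i] * x[i];
        return sum;
    }

    // Step activation: fire (1) if the weighted sum exceeds 0, else 0.
    double output(const std::vector<double>& x) const {
        return net(x) > 0.0 ? 1.0 : 0.0;
    }
};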


2.5.4 Backpropagation

To train a network it is necessary to compare its behaviour with the desired functionality. Comparing the output from the network to the desired output gives us an error value (also referred to as loss). This is done by using an error function, the most common one being the sum of squared errors (SSE) [31]:

E_{sse} = \frac{1}{2} \sum_{l=1}^{L} \sum_{h=1}^{H} (O_{lh} - Y_{lh})^2    (2.13)

where l = 1, 2, 3, ..., L indexes the different observations and h = 1, 2, 3, ..., H is the index of the respective output node. O is the desired output and Y is the observed output.

As mentioned earlier, each individual neuron has weights associated with it. Altering the behaviour of the network is done by adjusting these weights. The weights adjust the importance of certain input values and change the output from the neurons; therefore they can be used to alter the functionality of the network. The altering of the weights is done by something called backpropagation. Backpropagation works by using stochastic gradient descent: the idea is to minimize the error value generated by the error function. Calculating the partial derivatives of the error function with respect to the weights (\partial E / \partial W) gives us the possibility to move towards a smaller error value; this is displayed in figure 2.9. Each iteration the gradient is calculated and the error is moved towards a smaller value by adjusting the weights in the appropriate direction [16].

Figure 2.9: A graph showing weight altering by using stochastic gradient descent
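Written out, one stochastic gradient descent step moves each weight against its gradient; the learning-rate symbol \eta is our notation for the hyperparameter set in chapter 3:

w \leftarrow w - \eta \frac{\partial E}{\partial w}

so a positive gradient (error increasing with the weight) decreases the weight, and vice versa.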

2.5.5 Activation functions

As mentioned in the previous section, there is an activation function that decides whether or not the input was larger than a certain threshold. During recent years there has been a lot of research in this area, which has resulted in numerous different functions. This section briefly describes what has been used, what exists and some of the setbacks of the different activation functions. There have also been experiments with using different activation functions for different layers [28].


2.5.5.1 Perceptron

Old ANN systems used an activation function called the perceptron. If the result is bigger than a threshold, the perceptron outputs a binary value (1 or -1); since the output is binary it can be considered a Boolean value. Given inputs x_1 to x_n, and assuming x_0 = 1, the output o(x_1, ..., x_n) can be written as:

o(x_1, ..., x_n) = \begin{cases} 1 & \text{if } \sum_{k=0}^{n} w_k x_k > 0 \\ -1 & \text{otherwise} \end{cases}    (2.14)

where w_k is a real-valued constant (weight) that decides how much each input contributes.

As mentioned earlier, the output of a perceptron can be interpreted as a Boolean value depending on the value of the sum of the weights and inputs; figure 2.10a shows a representation of this. Figure 2.10b shows a limitation of the perceptron: there is no way to model an XOR function with a single perceptron, since the XOR function is not linearly separable. This has led to the evolution of more complex functions that can be used when modeling neural networks [21].


Figure 2.10: Graphs of hyperplane decision surface for a single layer perceptron network

2.5.5.2 Sigmoid

The sigmoid receives a real-valued input and produces a real-valued output, compared to the perceptron activation function which outputs a binary value [19]. The sigmoid function is described in equation 2.15 and plotted in figure 2.11:

f(x) = \frac{1}{1 + e^{-x}}    (2.15)

As seen in the plot, the function squashes the input value to a value between 0 and 1. Looking at figure 2.11 and remembering how backpropagation works (calculating gradients), it is apparent that this function will have a hard time converging for large input values, since the gradient vanishes the closer the input comes to -\infty or \infty. Computing exponential functions is also computationally expensive, which is important to avoid when training large networks [25].

(23)

2.5. Supervised Learning

Figure 2.11: Graph showing a plot of the sigmoid activation function
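The vanishing-gradient behaviour can be read directly off the derivative of the sigmoid, which (a standard identity, added here for reference) can be written in terms of the function itself:

f'(x) = f(x)\,(1 - f(x))

Since f(x) approaches 0 or 1 for large negative or positive inputs, the product, and thus the gradient used by backpropagation, approaches 0 at both extremes.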

2.5.5.3 Tanh

The tanh activation function is a spin-off from the sigmoid, where the gradients increase when the data is centered around 0. This can be visualised by plotting the tanh function: in figure 2.12 the output ranges over [-1, 1], whereas for the sigmoid the possible output lies between [0, 1] [17]. Equation 2.16 is the tanh function, where the sigmoid function 2.15 is denoted σ(x):

\tanh(x) = 2\sigma(2x) - 1    (2.16)

Figure 2.12: Graph showing the tanh activation function

2.5.6 Hyperparameters

There are a number of different learning algorithms that exist today. Oftentimes these algorithms have sets of hyperparameters. A hyperparameter is a parameter that has to be set by the user for the algorithm to be able to perform to its full extent. These hyperparameters influence the algorithm and its performance a great deal and are therefore used to configure different aspects of the algorithm. The tuning of hyperparameters is generally done manually, which can be time consuming and hard to reproduce by others. The number of hyperparameters can vary substantially, but usually only a few of them impact the performance. Identifying in advance which of the parameters have this kind of impact is hard. [5]

There are a number of different methods for optimizing hyperparameters, such as grid search, random search, Bayesian optimization and evolutionary optimization. The one that is used in this paper is grid search, which is one of the most popular methods. The method searches through a user-specified subset of the hyperparameter space of the learning algorithm. This space may contain real-valued parameters, which makes it necessary to manually set bounds for the search. [4]


2.6 Modelica

Modelica is an open, freely available language which is used for modeling different systems. The development of the language has been ongoing since 1996. The language is object-oriented and is suited for a number of different multi-domain models. The models in Modelica are described mathematically by algebraic, differential and discrete equations, and from a user's point of view they are described by schematics. These schematics consist of components that have connectors describing the possible interactions. A diagram model is made by drawing connection lines between connectors on different components. To be able to graphically edit and browse a Modelica model, a Modelica simulation environment is needed. This environment is used to perform model simulations and other analysis.

2.7 FMI - Functional Mock-up Interface

Functional Mock-up Interface (FMI) is a tool-independent standard that supports both model exchange and co-simulation of dynamic models. This is done by using a combination of XML files and compiled C code. It was the European project MODELISAR that started the work with FMI in July 2008. The goal was to improve the design of systems and of embedded software in vehicles. The purpose has broadened since 2008, and the intention is now that dynamic system models from different software systems can be used together in different simulation models, in both cyber-physical systems and other applications. FMI functions are called by a simulation environment to create and simulate one or more executables called Functional Mock-up Units (FMUs). An FMU can require the simulation environment to perform numerical integration, or it can have its own solvers. The goal is that the calling of an FMU in a simulation environment should be fairly simple. [26]

2.8 Related Works

There have been numerous experiments with balancing a pendulum with machine learning, especially with neural networks. In [24], [22], [23] and [7] this was performed successfully.


3 Method

In response to the research questions in section 1.3, a virtual model of the pendulum was created, which allowed for repeated runs with different parameters for different algorithms. For all experiments a PID controller was used as reference point. To allow the algorithms to run on a real-time hardware system, the algorithms were implemented in C++ to achieve fast execution times.

3.1 Virtual model of pendulum

The model was written by deriving the equations of motion mathematically, which gave a strong understanding of the physics of the pendulum. The C++ library used to access the FMU model is FMI4cpp, chosen because of its focus on being easy to set up and use [11]. The simulation of the virtual pendulum was done with step sizes of 0.001 seconds, and the total simulation time for one simulation was 10 seconds. Both of these parameters were selected after trial-and-error investigations.

3.1.1 Implementation in OpenModelica

Equations 2.11 and 2.12 are used to simulate the behaviour of the pendulum and the cart, where the input force is F in equation 2.11. The implementation of the inverted pendulum can be seen below.

model InvertedPendulum
  import SI = Modelica.SIunits;

  parameter SI.Mass M = 1;
  parameter SI.Mass m = 1.5;
  parameter SI.Length L = 1.0;
  parameter SI.Acceleration g = 9.82;

  SI.Angle theta(start=3.14);
  SI.AngularVelocity theta_velo = der(theta);
  SI.AngularAcceleration theta_accel = der(theta_velo);
  SI.Position x_pos(start=0);
  SI.Velocity x_vel = der(x_pos);
  SI.Acceleration x_acc = der(x_vel);

  input SI.Force F(start=0);
  output SI.Angle Y = theta;
equation
  // Equation 2.11: horizontal force balance for the cart.
  (M + m) * x_acc - m * L * theta_accel * cos(theta) + m * L * theta_velo^2 * sin(theta) = F;
  // Equation 2.12: torque balance for the pendulum.
  L * theta_accel - x_acc * cos(theta) - g * sin(theta) = 0;
end InvertedPendulum;


3.1.2 Altering of the environments

When considering which variables to make changes to, we decided that the mass of the cart (M), the mass of the pendulum (m), the length of the pendulum axis (L) and the displacement of the angle at the start were the most suitable. Our result is measured by the displacement of θ_s, and the PID controller we are using operates on the offset of the angle. Because of this we do not make any changes to the initial value of x_s or take measurements from it.

The angle offset was initialized with the values {90, 60, 45, 30, 20, 14, 9, 4, 0} to cover the possible scenarios a controller could face in a varying environment. We did not consider the opposite angles {-90, -60, ..., 0}, since the target function for those values is expected to be a horizontal reflection of the values chosen.

3.2 PID Controller

The integral part of the PID controller was achieved by having an internal state that accumulates the error, referred to as ITerm in the pseudo code below. The derivative is obtained by saving the last error input and calculating the change since the last time step; as mentioned earlier, our step size (dt in the code below) is 0.001. There are also two hyperparameter values used for limiting the output from the controller; we set these to -100 and 100 respectively, as these were seen as good thresholds after initial test runs. All the coefficient values were set through a grid search over all parameters.

hyperparameters: Kp, Ki, Kd, out_max, out_min
output: external force F for x_s

Function PID(error)
    ITerm += error
    output = Kp * error + Ki * ITerm - Kd * (error - old_error) / dt
    if output > out_max then
        output = out_max
    else if output < out_min then
        output = out_min
    end
    old_error = error
    return output
end

Algorithm 1: Pseudo code of the PID controller
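A runnable C++ version of Algorithm 1 could look as follows. This is our own sketch of the pseudo code above, not the thesis source; the clamping bounds and gains are the hyperparameters discussed in the text:

// Minimal PID controller following Algorithm 1.
class PID {
public:
    PID(double kp, double ki, double kd,
        double out_min, double out_max, double dt)
        : kp_(kp), ki_(ki), kd_(kd),
          out_min_(out_min), out_max_(out_max), dt_(dt) {}

    double update(double error) {
        iterm_ += error;  // accumulate the integral state
        double output = kp_ * error + ki_ * iterm_
                        - kd_ * (error - old_error_) / dt_;
        // Clamp the control signal to the allowed actuator range.
        if (output > out_max_) output = out_max_;
        else if (output < out_min_) output = out_min_;
        old_error_ = error;
        return output;
    }

private:
    double kp_, ki_, kd_, out_min_, out_max_, dt_;
    double iterm_ = 0.0, old_error_ = 0.0;
};

// Example usage, with the thresholds and step size from the text:
//   PID pid(kp, ki, kd, -100.0, 100.0, 0.001);
//   double force = pid.update(target_angle - theta);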

3.3 Choosing algorithms

When choosing which algorithms to use in the experiment we looked at a number of different factors. The first aspect that was discussed was the future aspect: we wanted an algorithm that is modern and up-to-date but most importantly current, due to the usability and development of the study. As a contrast to this, we also wanted an older, more well-researched and proven algorithm. We also considered using algorithms of different complexity, since we found it interesting to see how algorithms of different complexity would adapt and learn in different environments. As for the current and modern algorithm we settled on a neural network. Even if the possibility of neural networks has been discussed for a long time, this is one of the biggest and most up-and-coming algorithms in AI as of today.


As a contrast to the complex neural network algorithm we wanted a simple and well-established algorithm. We considered different ones but settled on the linear regression algorithm. This algorithm is very common and easy to use and implement, and we thought that the contrast to the neural network would make for an interesting discussion. We wanted three algorithms to be able to make a deeper comparison, and for the last algorithm we wanted to find a middle ground between the neural network and linear regression. Here we settled on a regression tree algorithm. This algorithm is fairly complex but easy to understand and explain. We thought that the tree structure seemed interesting and found it intriguing to see how this algorithm would compare to the first two algorithms we picked. We felt that all three algorithms were dissimilar enough to give us a bigger picture of how different kinds of algorithms work on the same kind of problem.

3.4 Training data

To be able to train our algorithms, training data was needed. The same sets of training data were used for all three machine learning algorithms. The training data was extracted by simulating the virtual model and letting the PID controller act as a point of reference. We ran 6 simulations with different starting values for θ, varying between -60, -40, -30, 30, 40 and 60 degrees offset from vertical upright. For these simulations we took the values of θ and θ̇ as input for each data point, and the output from the PID controller as the target for each data point at each step. This gave us 60,000 data points. Since a majority of these training points were for a pendulum balanced upright with nothing affecting it, we decided to remove all data points gathered after 4.5 seconds into the simulation. This was done to avoid overfitting issues and left us with 27,500 data points to be used for fitting and testing of the algorithms.

3.5 Implementing the learning algorithms

This section presents the implementations of the different algorithms used throughout the project. Both pseudo code and text are used to describe the implementations. When implementing the algorithms, the focus was put on creating easy-to-understand algorithms; since the training phase was not run during the simulation, it was not required to be optimized with regard to time. All pseudo code was derived from the theory chapter.

For simplicity, the mean squared error (MSE) loss function was used as the error function for all algorithms. The MSE is equal to the mean of the sum of squared errors that was mentioned briefly in section 2.5.4. The MSE was selected because its derivative is easy to calculate for the linear regression. The MSE is:

\frac{1}{N} \sum_{i=1}^{N} (y_i - \text{predict}(x_i))^2    (3.1)


3.5.1 Neural Network

Neural networks (NNs) are a large field and there is a lot that can be done for optimization. In [22] a pendulum was stabilized virtually with a neural network, using an energy swing-up and a PID controller for stabilizing. However, the length of the cart rail was assumed to be infinite to reduce the input variables to only θ and θ̇. One hidden layer with 25 neurons was used, together with the sigmoid activation function. In [23] a NN was also used for stabilizing a pendulum virtually, but there the length of the cart rail was considered. Therefore there were four input variables to the system: x, ẋ, θ and θ̇. The pendulum was successfully controlled using only 4 neurons in the hidden layer of the NN, although using up to 20 neurons inside the hidden layer did generate better results.

Our network was chosen to have one hidden layer with 25 neurons, as in [22], because we used the same input variables and derived our training data in the same way. The tanh function (2.16) was chosen as activation function over the sigmoid function (2.15): since our output data was centered around 0, it would provide better gradients. The computation saved by using ReLU was not prioritized, since the amount of data was not large enough to motivate its use. ReLU gives derivatives of 0 or 1, since it is piecewise linear; using tanh was thought to make the gradients more dependent on the target output and therefore yield better convergence.

Figure 3.1: Schematic of the neural network structure

Since this was not a classification problem, there was no need for more than one output neuron; by giving the last neuron a linear output function, the output was as expected. During training, the backpropagation was done from the back to the front to make sure all gradients were calculated for previous neurons and did not affect each other. Correspondingly, in the prediction the layers were traversed from front to back. The updates of the weights were done only after all changes had been calculated; this was done so that no changes would affect other nodes during the same iteration.

The derivative of tanh was used in the backpropagation and is equal to:

\frac{\partial}{\partial x} \tanh(x) = 1 - \tanh^2(x)    (3.2)


For the neural network we used three different hyperparameters: learning rate, momentum and epochs. The learning rate is a factor added in order to not overshoot when decreasing or increasing the weights. Momentum was used to converge more quickly towards the goal output when multiple training points indicated the same gradients. Epochs was used to re-use the training data within a training session. Their respective settings are discussed in section 3.6.
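The weight update with momentum used in Algorithm 2 below can be written out as follows; the symbols η (learning rate) and μ (momentum) are our notation for the hyperparameters just described:

\Delta w_t = \eta \, g_t + \mu \, \Delta w_{t-1}, \qquad w \leftarrow w + \Delta w_t

where g_t is the gradient-based update signal for the weight at iteration t. The momentum term re-applies a fraction of the previous update, so consecutive updates in the same direction accumulate speed.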

hyperparameters: learning_rate, momentum, epochs
output: trained NN

Function Train(data_points)
    foreach data_point i do
        error = mse(y, Predict(i))
        // Calculate gradients
        foreach layer l do
            foreach neuron n in l do
                if n == output_neuron then
                    if error > 0 then
                        gradient_n = 1
                    else if error < 0 then
                        gradient_n = -1
                    else
                        gradient_n = 0
                    end
                else
                    foreach neuron m in layer[l+1] do
                        sum += m.gradient * n.weight_m
                    end
                    gradient_n = Derivative(n.output) * sum
                end
            end
        end
        // Update weights
        foreach layer l do
            foreach neuron n in l do
                foreach neuron m in layer[l+1] do
                    old = delta_w
                    delta_w = learning_rate * m.output * n.gradient + momentum * old
                    n_w += delta_w
                end
            end
        end
    end
end

Algorithm 2: Pseudo code of the training function for the Neural Network


output: float

Function Predict(data_point)
    foreach neuron n in layer[0] do
        n.output = data_point_n * n.weight
    end
    // Hidden layers
    foreach layer l where l.index > 0 do
        foreach neuron n in l do
            foreach neuron m in layer[l-1] do
                sum += m.output * m.weight_n
            end
            if n == output_neuron then
                n.output = sum
            else
                n.output = TransferFunction(sum)
            end
        end
    end
    return output_neuron.output
end

Algorithm 3: Pseudo code of the prediction function for the Neural Network


3.5.2 Linear Regression

When implementing the linear regression (LR) there was not as much flexibility as in the case of the NN. The gradient for each feature was derived from 3.1 by writing the MSE with the prediction function for linear regression expanded:

\frac{1}{N} \sum_{i=1}^{N} \left( y_i - m - \sum_{j=1}^{f} k_j x_{ij} \right)^2    (3.3)

where m represents the bias, k_j are the regression coefficients and x_{ij} is the input data. f is the number of features, which in our case is two, and N is the total number of data points. Equation 3.3 allows us to derive the gradients for a specific feature coefficient k_f and for m:

\begin{bmatrix} \dfrac{\partial}{\partial k_f} \\[4pt] \dfrac{\partial}{\partial m} \end{bmatrix} = \begin{bmatrix} \dfrac{1}{N} \sum_{i=1}^{N} -2 x_{if} \left( y_i - m - \sum_{j=1}^{f} k_j x_{ij} \right) \\[4pt] \dfrac{1}{N} \sum_{i=1}^{N} -2 \left( y_i - m - \sum_{j=1}^{f} k_j x_{ij} \right) \end{bmatrix}    (3.4)

The start value of m (the bias) was initialized to achieve a straight line between the maximum and minimum output values.

hyperparameters: learning_rate, epochs
output: trained linear regression

Function Train(data_points)
    m = (max(data_points_y) + min(data_points_y)) / 2
    for i = epochs; i > 0; i-- do
        foreach data_point d do
            foreach feature f do
                change_f += learning_rate * DerivateLoss_f(d)
            end
            change_m += learning_rate * DerivateLoss_m(d)
        end
        k_f = k_f - change_f
        m = m - change_m
    end
end

Algorithm 4: Pseudo code of the training function for Linear Regression

output: float

Function Predict(data_point)
    value = m
    foreach feature f do
        value += data_point_f * k_f
    end
    return value
end

Algorithm 5: Pseudo code of the prediction function for Linear Regression


3.5.3 Regression Tree

Both the training and prediction algorithms were implemented recursively, since binary tree structures are suitable for recursive functions and time efficiency was not a priority in the training phase. Right and left are the children of a specific node.

hyperparameters: max_depth, leaf_k
output: trained tree

Function TrainRecursive(data_points, element, parent = nullptr)
    if parent.depth + 1 >= max_depth then
        return
    end
    // For each feature, evaluate every candidate split point
    foreach feature f do
        foreach data_points[f] i do
            node[i].value = mean(data_points_f[i], data_points_f[i+1])
            foreach data_points[f] j do
                node[i].residual += residual(node[i].value, j)
            end
        end
        best_node_f = min(node.residual)
    end
    element = min(best_node)
    // Partition the data on the chosen split
    foreach data_point i do
        if i < element then
            leftdata.append(i)
        else
            rightdata.append(i)
        end
    end
    if size(leftdata) > leaf_k and size(rightdata) > leaf_k then
        TrainRecursive(leftdata, element.left, element)
        TrainRecursive(rightdata, element.right, element)
    end
end

Algorithm 6: Pseudo code of the recursive training function for Regression Tree

output: float

Function PredictRecursive(data_point)
    if has children then
        if data_point_f < value then
            return leftchild.PredictRecursive(data_point)
        else
            return rightchild.PredictRecursive(data_point)
        end
    else
        return value
    end
end

Algorithm 7: Pseudo code of the recursive prediction function for Regression Tree
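In C++ the recursive prediction maps naturally onto a node struct with child pointers. The layout below (split feature, threshold, children) is an assumption consistent with Algorithms 6 and 7, where a leaf stores the output value:

    #include <memory>
    #include <vector>

    // Hypothetical node layout: internal nodes hold a split feature and a
    // threshold, leaves hold the output value in the same field.
    struct TreeNode {
        int feature = 0;
        double value = 0.0;
        std::unique_ptr<TreeNode> left, right;

        double predict(const std::vector<double>& point) const {
            if (!left || !right)
                return value; // leaf: return the stored output value
            // Internal node: descend into the matching half of the split.
            return point[feature] < value ? left->predict(point)
                                          : right->predict(point);
        }
    };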


3.6 Selection of hyper parameters for the algorithms

In order to select the hyper parameters for the different machine learning algorithms we performed a grid search through the following values and selected the combination for which the mean squared error was smallest; a sketch of this search is given after the value lists below.

For Neural Network:

Epochs = {1, 5, 10, 15, 20, 25, 30, 35, 40}
Momentum = {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03}
Learning rate = {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03}

For Linear Regression:

Epochs = {1, 5, 10, 15, 20, 25, 30, 35, 40}
Learning rate = {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03}

For Regression tree:

Depth = {10, 20, 50}
Leaf k = {10, 50, 100}
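In code the grid search is just nested loops over the candidate values, keeping the combination with the smallest mean squared error. The sketch below shows the neural network case; trainAndEvaluateMSE is a hypothetical helper wrapping Algorithms 2 and 3.

    #include <limits>
    #include <vector>

    // Hypothetical helper: trains a fresh network with the given
    // hyper parameters and returns the resulting mean squared error.
    double trainAndEvaluateMSE(int epochs, double momentum, double learning_rate);

    struct Best { int epochs; double momentum; double learning_rate; double mse; };

    Best gridSearchNN() {
        const std::vector<int> epochs = {1, 5, 10, 15, 20, 25, 30, 35, 40};
        const std::vector<double> values =
            {0.001, 0.002, 0.005, 0.008, 0.01, 0.02, 0.03};
        Best best{0, 0.0, 0.0, std::numeric_limits<double>::max()};
        for (int e : epochs)
            for (double mom : values)      // momentum candidates
                for (double lr : values) { // learning rate candidates
                    double mse = trainAndEvaluateMSE(e, mom, lr);
                    if (mse < best.mse) best = {e, mom, lr, mse};
                }
        return best;
    }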

3.7 Comparing the algorithms

To answer our first research question the data from the pendulum simulation based on the PID-controller, i.e. classic control theory, was compared to the data from the simulations with the different machine learning algorithms. The data was plotted in graphs that were compared in order to separate the results. We compared how long it took for each algorithm (if ever) to balance the pendulum in an upright position with less than 1 degree error. The algorithms were then compared with respect to how much bias and variance each of them has and how this could have affected their respective results.
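The time-to-balance measure can be extracted from a simulation trace as in the sketch below: the first time after which the angle error stays within 1 degree for the remainder of the simulation. The trace format is an assumption made for the example.

    #include <cmath>
    #include <vector>

    struct TracePoint { double time; double angle_deg; }; // angle error from upright

    // Returns the first time after which |angle| stays below 1 degree for
    // the rest of the simulation, or -1.0 if the pendulum never balances.
    double timeToBalance(const std::vector<TracePoint>& trace) {
        double candidate = -1.0;
        for (const TracePoint& p : trace) {
            if (std::fabs(p.angle_deg) < 1.0) {
                if (candidate < 0.0) candidate = p.time; // entered the band
            } else {
                candidate = -1.0; // left the band, restart the search
            }
        }
        return candidate;
    }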

When adding different weights to the pendulum the same comparisons were made in order to determine how the different algorithms performed when the environment was altered.


4 Results

In this chapter the results of the different simulations are presented in graphs. The starting angle is different in each graph and can be seen above the graphs. The first section presents the results for the base pendulum, where no changes have been made. After this, 5 different altered environments are presented in the same way; in each of these, different changes have been made.

4.1 Base pendulum

The results of the simulations are presented in the graphs below. The parameters of the pendulum can be seen at the top of the image. In the simulation of the base pendulum the cart mass is 1.0, the pendulum mass is 0.3 and the pendulum length is 0.5. The graphs show that the linear regression algorithm, together with the tree algorithm, has the worst performance. The tree algorithm does not succeed in balancing the pendulum when starting in an upright position (balanced starting state). The controller that is closest to the PID-controller is the neural network; it mimics the PID-controller for all starting angles.

Figure 4.1: Graphs containing the different simulation results for the base pendulum


4.2 Data from altered environments

Here the data from our altered environments are presented. In total 5 different altered environments have been studied. In each environment some of the three parameters shown at the top of each image have been changed: cart mass, pendulum mass and pendulum length.

4.2.1 Altered environment 1

In this environment we have changed the pendulum mass from 0.3 to 1.0. The linear regression algorithm and the tree algorithm again have the worst performance. The tree algorithm manages to balance the pendulum for starting offsets smaller than 9 degrees, but not when starting in a balanced state. The controller using the linear model only manages to balance the pendulum when starting in a balanced state. The neural network controller performs almost identically to the PID-controller: a little better when starting with a 90 degree offset and a little worse when starting with a 60 degree offset.

Figure 4.2: Graphs containing the different simulation results in altered environment 1 with changed pendulum mass


4.2.2 Altered environment 2

In this environment we have changed the pendulum mass from 0.3 to 1.0 and the pendulum length from 0.5 to 1.0. The controllers' performance is very close to that in the first altered environment, with the exception that the controller using a neural network performs slightly better than the PID-controller. This is most visible when starting with large offset values on the angle.

Figure 4.3: Graphs containing the different simulation results in altered environment 2 with changed pendulum mass and pendulum length


4.2.3 Altered environment 3

In this environment the cart mass has been changed from 1.0 to 2.0 and the pendulum length has been changed from 0.5 to 2.0 compared to the base environment. Here we can see that for the simulations that started with large offset values the neural network significantly outperformed the PID-controller: the neural network managed to balance the pendulum when starting with a 90 degree offset, while the PID-controller failed for the same angle. The regression tree algorithm for the first time managed to balance the pendulum when starting in an upright position, and also for the 9 and 4 degree offsets. The linear controller did not succeed in any simulation except when starting in a balanced state.

Figure 4.4: Graphs containing the different simulation results in altered environment 3 with changed cart mass and pendulum length


4.2.4 Altered environment 4

In this environment the mass of the cart is the only thing that has been changed, from 1.0 to 2.0. The graphs show a performance very similar to altered environment 2. The regression tree algorithm is again unable to balance the pendulum when starting in an upright position but succeeds with an offset of 9 or 4 degrees. The linear controller is not able to balance the pendulum in any simulation other than the one starting in an upright position. The PID-controller and the neural network follow each other, with a slight advantage for the neural network.

Figure 4.5: Graphs containing the different simulation results in altered environment 4 with changed cart mass


4.2.5 Altered environment 5

For the last environment we have changed the mass of the cart to 2.0, the length of the pendulum to 2.0 and the mass of the pendulum to 1.0. The neural network controller deviates from the PID-controller but does not perform better. The tree algorithm only manages to balance the pendulum when the starting offset is 4 degrees, while the linear regression controller only manages to balance the pendulum when starting in a balanced state.

Figure 4.6: Graphs containing the different simulation results in altered environment 5 with changed cart mass, pendulum mass and pendulum length


5 Discussion

The discussion is grounded in the theory and results of this thesis. First the results are discussed by going over the different algorithms one by one, and speculations about the outcome are made in connection with the theory. The method is then discussed, with suggestions for improvements that could have been made and the assumed consequences of these improvements. In connection with the method the sources are discussed. Finally the work is discussed in a wider context.

5.1 Results

When studying and comparing the results we could see that one of the algorithms, the neural network, was able to mimic the PID-controller fully. This is clear in figure 4.1, where the lines of the PID-controller and the neural network follow each other. The controller using the neural network is also the only one of our trained algorithms which manages to balance the pendulum for all simulations with starting offsets larger than 9 degrees. The linear regression never manages to balance the pendulum when not starting in a balanced state and therefore does not perform significantly better than using no controller at all. The controller using the regression tree algorithm on most occasions only performs better than using no controller for two starting offsets, 9 and 4 degrees, while in all simulations except one it performs worse than no controller when starting in a balanced state.

As seen in 2.5.3 the neural network algorithm can do non-linear mapping. This, together with a small bias and low variance, is what allows it to mimic the reference data fully. The regression tree has the freedom to map in hypercubes; it makes no assumptions about the target function and therefore gives us a low bias and high variance. The tree's output values are limited to the number of leaves, compared to the linear regression and the neural network which have continuous output functions that suit this problem better. There are more advanced tree algorithms, such as random forest and GBM, which would probably have yielded a better result, but due to the linearity of the target function they would still not be suitable for this problem, because they make too few assumptions about the target function. As for the linear regression, it did not mimic the reference data at all and performed worst of all the algorithms. Based on 2.2 the balancing of the pendulum becomes a linear target function as the angle approaches the upright position. We believe that because of this the linear regression should be able to balance the pendulum for more angles if the training data is tweaked: since it assumes that the target function is linear, it is reasonable that it fails on training data that requires non-linear mapping, and removing those training data points would probably make the linear controller better.
