
Using Artificial Intelligence for the Evaluation of the Movability of Insurances

Martin Åslin

June 29, 2013

Master's Thesis in Computing Science, 30 ECTS credits
Supervisor at CS-UmU: Patrik Eklund

Examiner: Fredrik Georgsson

Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN


Abstract

Today the decision to move an insurance from one company/bank to another is made manually, so there is always the risk that an incorrect decision is made due to human error. The goal of this thesis is to evaluate the possibility of using artificial intelligence, AI, to make that decision instead. The thesis evaluates three AI techniques: fuzzy clustering, Bayesian networks and neural networks. These three techniques were compared, and fuzzy clustering was chosen as the technique to use. Even though fuzzy clustering only achieved a hit rate of 69%, there is a lot of potential in the approach. In section 4.2 a few improvements are discussed which should help raise the hit rate.


Contents

1 Background and motivation
   1.1 Introduction
   1.2 Insurances
      1.2.1 What is an insurance
      1.2.2 Structure
      1.2.3 Types
      1.2.4 Pitfalls
   1.3 Problem description
      1.3.1 Goal
      1.3.2 Data
      1.3.3 Rules
      1.3.4 The structure of the system

2 Theory
   2.1 Overview of 'intelligent computing'
      2.1.1 The logical approach
      2.1.2 The probabilistic approach
      2.1.3 The numerical approach
      2.1.4 Pros and cons
   2.2 The chosen method
   2.3 Algorithms
      2.3.1 C-Means Fuzzy Clustering Algorithm
      2.3.2 Rulebase algorithms
      2.3.3 Inference algorithms

3 Implementations, results and validations
   3.1 Development environment
      3.1.1 .Net
      3.1.2 Useful embedded namespaces
      3.1.3 Database management via XML export
   3.2 Result
   3.3 Impact of the parameters
      3.3.1 The Fuzziness Variable
      3.3.2 The Number of Runs
      3.3.3 The Cluster Interval

4 Conclusion
   4.1 Restrictions and Limitations
   4.2 Future work
      4.2.1 Better selections
      4.2.2 Small and focused
      4.2.3 Reinforced learning

5 Acknowledgments

References

List of Figures

2.1 The process of fuzzy controls
2.2 Inference using Mamdani's method
2.3 K-Means clustering example
2.4 Membership values for clusters
2.5 Projection of a cluster
2.6 Simple Bayesian network
2.7 Bayesian networks - order of nodes
2.8 Calculating probability in a Bayesian network
2.9 A neuron
2.10 Neural network with one hidden level
2.11 C-Means clustering algorithm
2.12 Criterion number
2.13 Create rules
2.14 Calculate the input
2.15 Process of Mamdani's method
2.16 Mamdani - Output membership function
2.17 Centre of Gravity
2.18 Larsen - Output membership function
2.19 Takagi-Sugeno - X matrix
2.20 Takagi-Sugeno - Y vector
2.21 Takagi-Sugeno - P vector
2.22 Takagi-Sugeno - β variable
2.23 Takagi-Sugeno - matrix computation
3.1 The impact of the fuzziness variable
3.2 The impact of the number of runs variable
3.3 The impact of the interval limits

List of Tables

2.1 Randomized Membership Matrix
2.2 Adjusted Membership Matrix
3.1 Output intervals of the inference methods
3.2 Output from the system

Chapter 1

Background and motivation

1.1 Introduction

Today's banks and insurance companies have the possibility to, with the customer's consent, take over personal insurances such as capital, service and private pension insurances.1

Whether an insurance is movable, i.e. possible to take over, or which part of the insurance is movable, is partly decided by legislation and partly by the content of the insurance.

The size of one's capital can be of big significance for an insurance company or bank when they decide if they want to take over an insurance. The movability is also decided by rules set internally by each separate company. This makes the movability of an insurance dependent on many parameters that can change frequently. Today the majority of the evaluation of the movability of insurances is done manually and from manually created sets of rules. There are three main types of assessment criteria: green, the insurance is movable; yellow, the insurance might be movable or partly movable; and red, the insurance is not movable. Sometimes there are a lot of sub-criteria within each of the main criteria, where a more detailed assessment is described. These can, for example, describe how to proceed in a move matter, such as by requesting a health certificate. Manual evaluation of insurances is a time consuming process which could benefit from becoming fully or partly automated. The goal of this thesis project is to create a system that can classify insurance information.

1 Most of the information about insurances in chapter 1 has been found in [16], which is an internal document at Svenska Försäkringsfabriken.

1.2 Insurances

This section will explain what an insurance is, what the structure of an insurance looks like and what you have to keep in mind when reviewing whether the information in the insurance is correct. Keep in mind that this is based on Swedish insurances and might differ from insurances in other countries.

1.2.1 What is an insurance

An insurance is an agreed-upon financial protection in exchange for payment. You pay a bank or a company a fee and they will give you money in case something happens, like an accident, or when you retire. Some insurances will only give you back roughly the same amount of money as you have paid them; these are called savings insurances, and an example of such an insurance is the pension insurance. There is another type called risk insurances, which will in many cases give you a lot more money than you have paid to the insurance company, but they are only valid as long as you pay the fee, and you will not get the money back if you stop paying. The risk insurance system works because of the number of people they insure and the fact that not everyone will be in, e.g., an accident and in need of insurance money.

So the customers pay for each other; if all of a company's customers would suddenly need insurance money, the company would likely go bankrupt. The insurance companies have rules by which they evaluate the risk of something happening and adjust the required fee, or simply do not approve the insurance.

For example, let's say an 18-year-old man wants to insure a brand-new super car in a major city. He will most likely have to pay a huge yearly fee, or the insurance company might not want to sign him at all, because of the high risk of something happening to the car. The risks that they might look at in this case might be:

– His young age and the fact that he is male, which statistically means that he is more likely to be in some kind of accident.

– A powerful sports car is easier to lose control of, and such cars are a more desirable target for thieves.

– He lives in a major city with a big population, which means more interaction with other people and therefore a higher chance that some kind of accident occurs.

1.2.2 Structure

An insurance is made up of two main parts. The first part contains overall information about the insurance, such as the insurance number, the person that is insured, the owner of the insurance, the insurance provider, when it was signed, the cost of the insurance, etc. The second part contains information about the content of the insurance. The content of the insurance is divided into moments. It is possible for insurances to include more than one moment, so e.g. an accident insurance will contain an accident moment, and a pension insurance can contain a pension moment and a health insurance moment. The possibility to add more moments to the insurance is up to the insurance provider.

1.2.3 Types

There are a lot of different insurances, but they can all be divided into the two main types mentioned in section 1.2.1, namely savings insurances and risk insurances. Savings insurances are paid to the insured after a certain date, e.g. pension savings, while risk insurances are paid after accidents, sickness etc. and are only valid as long as the insurance fee is paid; so if a person is never in an accident, they will not be able to use the money they have paid for the insurance. The advantage of a risk insurance, though, is that the money you get if you e.g. are in an accident can be much higher than the amount you have paid in.

Here are some of the types of insurances:

Pension: an insurance that lets you receive money during your retirement, over a certain period of time and at a certain interval. Can be signed by a private person and/or a company.

Survivor's protection: this moment exists so that the husband/wife, children or other heirs of the insured get the money if the insured person dies.

Health insurance: this insurance will provide the insured person with an income in case of sickness or early retirement due to ill health, though a qualifying period of sickness might be needed.

Sickness-/prematurely-capital: this insurance grants you a one-time amount of money if you become so sick or hurt that you are granted sickness benefit.

Medical treatment insurance: this insurance covers costs during medical treatment/attendance.

Accident insurance: usually contains three moments:

Medical treatment costs: costs that occur with sickness/injury, including costs for treatment by a doctor/dentist but also necessary travel costs during the treatment.

Disability capital: a one-time amount of money one receives if one is afflicted with a permanent disability or decreased capacity to work.

Death capital: a sum that is paid to the husband/wife, children or heirs of the insured person in case of death.

Premium exemption: this is an add-on for insurances that makes the insurance provider take responsibility for the payment of the premium if the insured becomes so sick that the period of sickness is greater than the qualifying period of sickness.

1.2.4 Pitfalls

There are several pitfalls to keep a lookout for when reviewing an insurance, and they differ slightly between the different types of insurances. Some general ones that apply to most of the insurances are, e.g.: are the insurance number, personal number, dates, sums, status, etc. correct? Another pitfall is if an insurance has a status of continuous and a Z-time that has passed; then something is wrong. Z-time marks the last day that the insurance is valid, but that is only the case for the risk type of insurance. A Z-time in a savings type insurance means the date when they should start paying out e.g. a person's pension.

1.3 Problem description

This section will describe the goal of this thesis and cover some of the problems that have to be considered and/or solved during the thesis.

1.3.1 Goal

The process of evaluating the movability of insurances is time consuming, mainly because it is done manually and because of the complex rules. The goal of this thesis is to make a system that can, based on manually pre-classified insurance evaluations, learn to classify insurance information. This could potentially save a lot of time while reducing the number of human errors. There are a few requirements of the system:

– The result of the evaluation should consist of a flag and a text.


– Use data that has been serialized as XML.

– The system should be as general as possible.

– The system should be implemented in C# and .NET 4.0.

1.3.2 Data

The data this system will use for training will be stored in XML files. The problem with the data is that the number of elements an insurance has will differ between the insurances. The possibility to add additional "moments" to most of the insurances creates a lot of possible combinations and a lot of different elements that an insurance can contain. Some of these elements can of course be filtered out with a preprocessor, since they might not affect any decision. But that can, and probably will, still leave us with a variable number of remaining elements. It would be possible to create a large neural network, NN, that takes all possible elements and just sets the ones not used to NULL or 0. But that would make it harder to train the system, since it would have to learn a more complex model. It would probably be more efficient to somehow divide the problem into a lot of smaller NNs that are each focused on a subset of the problem, for example one network for each of the different types of insurance. They would still be large and/or complex, since insurances can have a lot of different moments, but perhaps it is possible to divide the problem even further and maybe combine it with other approaches such as Bayesian networks or a fuzzy clustering network.

Another problem to consider with the data is how the date and sum values should be represented in the system. The problem is that they are values that can be continuous, meaning that they might range from 13 January 1988 to 11 February 2013, or from 20,000 to 60,000 SEK.

1.3.3 Rules

There are of course a lot of rules regarding insurances, and the rules can help us filter out insurances which are incorrectly filled in, and thus pointless to evaluate, and of course help us make a decision on whether or not the insurance is movable. A problem with the rules is that they are complex and might be hard to implement, and that companies can have different rules. Another problem is that the rules can change a lot in the insurance industry, which makes it important to build a system that can adapt to new rules, or a system where it is easy to change or add rules. It would be desirable if the system could detect any ambiguity in the rules before and after any change or addition of new rules.

1.3.4 The structure of the system

If we decide to make one large structure/network that should handle every case, then we will end up with a behemoth of a system that will need a lot of data samples to be trained well enough. Another problem would be the training time, which would be very long since it would have to model a very complex function.

On the other hand, if we decide to use small expert systems, e.g. one structure/network for each type of moment, we could shorten the training time, since each network would have to model a less complex function with a smaller amount of training data. During retraining it might also be possible to avoid retraining all of them, which can save time. But we would have to come up with a way to train the system, since insurances can have multiple moments but only one verdict. So if an insurance has a yellow flag, should we assume that both moments were yellow, or was one of them yellow and the other green? The same has to be considered after the moments have been evaluated: if one moment is green while the other is yellow, will that make the complete verdict yellow?


Chapter 2

Theory

2.1 Overview of ’intelligent computing’

Three types of approaches will be briefly explained in this section: logical, probabilistic and numerical. They will then be compared against each other and the pros and cons of each approach will be presented. At the end of the section there will be a more thorough explanation, with algorithms, of the chosen approach.

2.1.1 The logical approach

The method in the logical approach that will be studied is the fuzzy clustering method.1 Before explaining fuzzy clustering it is necessary to explain fuzzy sets, fuzzy logic and fuzzy control, which are all used in fuzzy clustering.

1 The majority of the information for this basic explanation of fuzzy logic, sets, control and clustering has been found in [2] and [9].

Fuzzy sets and fuzzy logic

Let's say we have the proposition it is cold outside; is this true if we know that the temperature is 17°C? Some people will probably find it cold while some might find it warmish. The problem with the proposition is the linguistic term cold, which is not a defined line where every value below the line is considered to be cold. Instead the linguistic term cold is an arbitrary value that changes from person to person. This is where fuzzy sets come into the picture: they allow us to represent a value such as cold.

What a fuzzy set does is divide the cold value into degrees of cold, usually a value between 0 and 1. So if we go back to is it cold outside when it is 17°C?, a fuzzy set might represent this as 0.65 true instead of definitely stating that it is true or false.

When fuzzy sets are used in logical expressions it is called fuzzy logic. Fuzzy logic describes fuzzy truth values, which are a function of the truth values of the components.

The standard rules for evaluating these fuzzy truth values, T, of a complex sentence are:

– T(A ∧ B) = min(T(A), T(B))
– T(A ∨ B) = max(T(A), T(B))
– T(¬A) = 1 − T(A)

So, for example, if we have T(Cold(outside)) = 0.65 and T(Freezing(Martin)) = 0.55, then T(Cold(outside) ∧ Freezing(Martin)) = min(0.65, 0.55) = 0.55.

The result of these rules might not change when a variable is modified, even when we would have wanted it to change. For example, if we have T(A) = 0.75 and T(B) = 0.33, then min(T(A), T(B)) = 0.33; but if we change T(A) to 0.85 we still get min(T(A), T(B)) = 0.33, even though we might want a new value since one of the inputs changed. There are a few ways to improve this, but they will not be covered here.
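To make the connectives concrete in the implementation language of this thesis (C#), here is a minimal sketch of the three truth-value rules applied to the example above; it is illustrative only and not code from the system.

```csharp
using System;

class FuzzyLogicDemo
{
    // Standard fuzzy connectives from the rules above.
    static double And(double a, double b) => Math.Min(a, b);
    static double Or(double a, double b) => Math.Max(a, b);
    static double Not(double a) => 1.0 - a;

    static void Main()
    {
        double coldOutside = 0.65;     // T(Cold(outside))
        double freezingMartin = 0.55;  // T(Freezing(Martin))

        Console.WriteLine(And(coldOutside, freezingMartin)); // 0.55
        Console.WriteLine(Or(coldOutside, freezingMartin));  // 0.65
        Console.WriteLine(Not(coldOutside));                 // 0.35
    }
}
```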

Fuzzy control

Fuzzy control uses rules for making decisions. A rule, R, is expressed as R: IF <fuzzy criteria> THEN <fuzzy conclusion>. Fuzzy control has a number of rules, and these rules are stored in what is called a rulebase. There are a few ways to create a rulebase:

– Have an expert write the rules.

– Observe and record the input and output data while an expert performs the actions for a period of time.

– Generate rules based on data.

In this project the last one is the one that is of most interest since we want to minimize the number of human decisions.

Figure 2.1: The process of fuzzy controls, [2] page 58.

Figure 2.1 describes the process of fuzzy control. First the input, x, gets fuzzified, which means that x is transformed into its corresponding truth value. The fuzzified x then gets combined by a logical conjunction, which in turn is combined with the output membership function of the rule. The newly created membership function is then calculated before being defuzzified.

Inference is used to create the conjunctions. In figure 2.2 we can see the Mamdani inference method, which in this case uses the minimum operation and then combines the output results using the maximum operation. In figure 2.2 the result given by the maximum operation is the grey field in U1, since that result has a higher value than U2.


Figure 2.2: Inference using Mamdani’s method, [2] page 59.

Fuzzy clustering

Figure 2.3: K-Means clustering example.2

In cluster analysis, or clustering, one strives to divide the data into different groups (clusters). The data points that are clustered together are more similar to each other than to the ones in the other clusters; figure 2.3 is an example of a set of data points that has been divided into three clusters. In fuzzy clustering, a number of data points also gets divided into clusters, but now all data points belong to each one of the clusters, in different degrees, just like in fuzzy logic. The closer a point is to the center of the cluster, the more it belongs to that cluster. The degrees of belonging, or membership values, that a data point has have to sum up to 1.0. So a data point could have the membership values U0 = 0.17, U1 = 0.35 and U2 = 0.48, which sum up to 1.0.

2 http://en.wikipedia.org/w/index.php?title=File:KMeans-Gaussian-data.svg&page=1, 8 Oct 2012

Figure 2.4: Membership values for clusters, [2] page 68.

Looking at figure 2.4 we can see two clusters and a number of data points. The number above each point is the membership value of the point for that cluster. In the left square we can see the membership values to cluster 1, and as you can see even the points that are really close to cluster 2's center, represented by the rightmost x, have a membership value to cluster 1. In fuzzy C-means clustering, which is the technique that will be used if fuzzy clustering is chosen, it is usually the distance from the center of the cluster that decides the membership value.

Figure 2.5: Projection of a cluster, [2] page 71.

Projections of the data points will be created after the set of data points has been divided into clusters. Each cluster gets its own set of projections, as can be seen in figure 2.5. These projections will be used to generate the rules of the clusters, which together form the rulebase. The creation of the rulebase marks the end of the training, and the rulebase can now be used to create output. In order to get this output we need something called inference, which will translate any new input to output with the help of the rulebase.

2.1.2 The probabilistic approach

This section will go through Bayesian networks, which are based on Bayes' theorem, so Bayes' theorem will be explained first.3

3 The majority of the information for this basic explanation of Bayes' theorem and Bayesian networks has been found in [9].

Bayes’ theorem

The product rule can be written in two forms, namely P(A ∧ B) = P(B|A)P(A) and P(A ∧ B) = P(A|B)P(B). By combining these two formulas we get P(B|A)P(A) = P(A|B)P(B), and by dividing by P(A) it turns into Bayes' theorem:

$$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$$

P(B|A) is read as the probability of B given A. With Bayes' theorem it is possible to calculate an unknown probability from three known ones, which is a common case: a few probabilities are known while the one that we need is unknown.

In [9] they give an example where a doctor knows P(symptoms|disease), the probability of symptoms given disease, but wants to know P(disease|symptoms). In the example the doctor knows that:

– P(s|m) = 0.7
– P(m) = 1/50000
– P(s) = 0.01

where s means the patient has a stiff neck and m that the patient has meningitis. By using Bayes' theorem the doctor can calculate the probability of a patient having meningitis when the patient has a stiff neck:

$$P(m|s) = \frac{P(s|m)\,P(m)}{P(s)} = \frac{0.7 \cdot 1/50000}{0.01} = 0.0014$$

Bayesian networks

Bayesian networks are a common approach when a system might have to deal with uncertainty. They are based on Bayes' theorem (section 2.1.2). A Bayesian network can be described as a probabilistic graphical model that can represent dependencies between variables, see figure 2.6. Figure 2.64 shows how a simple Bayesian network can look: a graphical model describing the relationships between the different nodes.

Each of the nodes in the network has a probability table associated with it, and since the rain node is not dependent on any other node, only the unconditional probability of each state is necessary. When building a Bayesian network it is important to make a good model of the relationships, so that a node does not depend on variables that are unnecessary for that node. The way the nodes are introduced in the system, the order of them, can have a big impact on performance. If the nodes are introduced in a 'not so good' way it could potentially mean that some nodes get unnecessary dependencies, and sometimes that could give dependencies that are difficult to calculate. In [9] they give the following example:

4 http://en.wikipedia.org/wiki/Bayesian_network, 20 Sep 2012

Figure 2.6: Simple Bayesian network

Figure 2.7: Bayesian networks - order of nodes

The Bayesian networks in figure 2.7 both describe the same problem; the only difference is the order of the nodes. In network A we have Alarm, which is dependent on Burglary and Earthquake, while MaryCalls and JohnCalls are dependent on Alarm. This means that if either a burglary is in progress or there is an earthquake, the alarm will go off, which will cause either or both of Mary and John to call the owner. In network B we have JohnCalls, which is dependent on MaryCalls, and Alarm, which is dependent on both MaryCalls and JohnCalls. Burglary is dependent on Alarm, and Earthquake depends on both Burglary and Alarm.

A few details from the example in the book are needed in order for this to make sense. In the example they state that Mary often listens to loud music, so she might not hear the alarm; so if she is calling, there is a high probability that John will call as well, which makes JohnCalls dependent on MaryCalls. The book, [9], is quoted for the dependency between Burglary and Earthquake:

If the alarm is on, it is more likely that there has been an earthquake. (the alarm is an earthquake sensor of sorts.) But if we know that there has been a burglary, then that explains the alarm, and the probability of an earthquake would only be slightly above normal. Hence, we need both alarm and burglary as parents.

$$P(R{=}T \mid G{=}T) = \frac{P(G{=}T,\, R{=}T)}{P(G{=}T)} = \frac{\sum_{S \in \{T,F\}} P(G{=}T, S, R{=}T)}{\sum_{S,R \in \{T,F\}} P(G{=}T, S, R)}$$

Writing each joint probability as $P(G,S,R) = P(G|S,R)\,P(S|R)\,P(R)$ and inserting the numbers gives

$$P(R{=}T \mid G{=}T) = \frac{0.99 \cdot 0.01 \cdot 0.2 + 0.8 \cdot 0.99 \cdot 0.2}{0.99 \cdot 0.01 \cdot 0.2 + 0.9 \cdot 0.4 \cdot 0.8 + 0.8 \cdot 0.99 \cdot 0.2 + 0.0 \cdot 0.6 \cdot 0.8} = \frac{0.00198 + 0.1584}{0.00198 + 0.288 + 0.1584 + 0.0} \approx 35.77\%$$

Figure 2.8: Calculating probability in a Bayesian network.

As we can see, we get two additional dependencies when the Bayesian network is arranged like B instead of A. This could, as stated above, mean that the computations become harder to calculate.

Let's do an example5 that requires some calculations. Say that we have a Bayesian network that looks like figure 2.6 and we want to know: what is the probability that it is raining, given that the grass is wet? What we want to know is P(R|G), where R means raining, G means grass wet, S means sprinkler turned on and T means true. In figure 2.8 we can see how it is solved by using Bayes' theorem and other statistical formulas/rules. With these it is possible to describe the probability of most of the scenarios that can occur, based on the probability functions, even if some parts are unknown. There is a lot more that can be said about Bayesian networks, but this is supposed to be a brief introduction/explanation. If this approach gets chosen, a more thorough explanation can be read in section 2.3.
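To make the enumeration in figure 2.8 concrete, here is a minimal C# sketch that computes P(R|G) for this network by summing the joint distribution over the hidden variable. The probability tables are the ones used in the figure; the array-based encoding is just one possible choice.

```csharp
using System;

class SprinklerNetwork
{
    static void Main()
    {
        // Conditional probability tables from the example (index 0 = F, 1 = T).
        double[] pRain = { 0.8, 0.2 };                  // P(R = F), P(R = T)
        double[,] pSprinklerGivenRain =
        {
            { 0.6, 0.4 },                               // R = F: P(S = F|R), P(S = T|R)
            { 0.99, 0.01 }                              // R = T: P(S = F|R), P(S = T|R)
        };
        double[,] pGrassWetGivenSprinklerRain =
        {
            { 0.0, 0.8 },                               // S = F: P(G = T | S, R = F/T)
            { 0.9, 0.99 }                               // S = T: P(G = T | S, R = F/T)
        };

        // P(G = T, R = T) and P(G = T): sum the joint over the hidden variable S.
        double jointWetAndRain = 0.0, wet = 0.0;
        for (int r = 0; r <= 1; r++)
        for (int s = 0; s <= 1; s++)
        {
            double joint = pGrassWetGivenSprinklerRain[s, r]
                         * pSprinklerGivenRain[r, s]
                         * pRain[r];
            wet += joint;
            if (r == 1) jointWetAndRain += joint;
        }

        Console.WriteLine($"P(R|G) = {jointWetAndRain / wet:P2}"); // ~35.77 %
    }
}
```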

2.1.3 The numerical approach

Some problems might be too hard for designers to solve on their own, since it can sometimes be hard (if not impossible) for a designer to predict all of the situations/states in which the system might find itself.6 Change over time is another thing that is hard to predict, e.g. the stock market, and sometimes the designers have no idea how to program a solution. This is where learning-based AI can be a good choice, since the system will learn to become a solution. There are many different types of learning approaches, but this project will focus on neural networks and explain how those work.

5 http://en.wikipedia.org/wiki/Bayesian_network, 20 Sep 2012

6 The majority of the information for this basic explanation of neural networks has been found on Wikipedia and some in [9]. The information about artificial neural networks has been found in [9].

Neural networks

In the world of artificial neural networks (ANN), or simply neural networks (NN), one tries to achieve 'intelligence' by modelling the system after a biological neural network (BNN), like the human brain. Before I continue to explain ANNs I will try to explain how a BNN works.

Disclaimer: This will be a simple explanation since I am far from an expert in the field of neuroscience.

Figure 2.9: A neuron.7

A BNN is a vast network of connected nerve cells called neurons. Each neuron consists of a cell body (soma) which contains a cell nucleus, which in turn contains the cell's genetic material. Stretching out from the body are a number of dendrites, which receive signals from other neurons, and a single long fiber called the axon. The axon sends signals to other neurons. The axon and the dendrites are connected to other neurons at junctions called synapses. So that is the structure of the BNN; now to explain how it works. Let's say that you see a flower. A lot of your neurons will start firing and sending signals to other neurons until they stop at a state where you either recognise that it is in fact a flower, maybe even what type of flower, or that it is something unknown.

So this is what an ANN is trying to mimic. An ANN is built with a few layers: first one input layer, then a number of hidden layers and lastly one output layer. Each layer consists of a number of nodes (neurons) and each of these nodes is connected to all the nodes in the next layer, in one direction, see figure 2.10. The connection between two nodes, let's say i and j, serves to propagate activation a_i from i to j, and this connection has a numeric weight, w_ij, associated with it. This weight describes the strength and sign of the connection. During the learning process the weights will be updated to produce a desired signal flow. When a node derives its output it first calculates the weighted sum of all its inputs and then applies an activation function to this sum. This is called a feed-forward network, and it is the type that would be used in case ANNs are chosen.
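As a minimal sketch of the forward pass just described: each node computes the weighted sum of its inputs and applies an activation function. The sigmoid, weights and inputs below are illustrative assumptions, not values from the thesis.

```csharp
using System;

class FeedForwardDemo
{
    // One common activation function; the thesis does not prescribe a specific one.
    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Output of a single node: weighted sum of its inputs, then activation.
    static double Neuron(double[] inputs, double[] weights, double bias)
    {
        double sum = bias;
        for (int i = 0; i < inputs.Length; i++)
            sum += weights[i] * inputs[i];
        return Sigmoid(sum);
    }

    static void Main()
    {
        double[] input = { 0.3, 0.7 };

        // One hidden layer with two nodes, one output node (made-up weights).
        double h1 = Neuron(input, new[] { 0.5, -0.2 }, 0.1);
        double h2 = Neuron(input, new[] { -0.4, 0.9 }, 0.0);
        double output = Neuron(new[] { h1, h2 }, new[] { 1.2, -0.8 }, 0.05);

        Console.WriteLine(output);
    }
}
```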

One of the biggest risks when working with AI and systems that need to be trained is the risk of overtraining them. This could mean that the system starts over-fitting, and if the network is too big it might become a big lookup table. There are a few available techniques that can help reduce over-fitting, but those would only be covered in section 2.3 if this approach were chosen.

7 http://en.wikipedia.org/wiki/Nervous_system, 18 Sep 2012

Figure 2.10: Neural network with one hidden level.8

2.1.4 Pros and cons

In this section the pros and cons of each of the methods will be evaluated and compared against the others, and based on those a method will be chosen as the most suited for this project. The pros and cons will be based on these criteria:

– Can the method handle a variable number of inputs?

– How would it have to handle dates/sums?

– Can it handle complex rules?

– How well can it handle changes to the rules?

Can the method handle a variable number of inputs

Fuzzy clustering and neural networks do not handle a variable number of inputs so well, so it would be necessary to either assign the missing inputs as NULL or 0, or make many small expert systems for e.g. every type of moment. If the first choice is used we will end up with a big system that will be harder to train and would require a lot more data for the training. The second choice, with the expert systems, would be simpler to train, require less training data and, after a rule change, it might only be necessary to retrain a few of all the expert systems. But a problem that arises is that the results would have to be combined if an insurance contains multiple moments, and it would then be necessary to know what to do if the moments produce different results, e.g. one moment is green while another is yellow or red. We could make small expert systems for every insurance type as well, but then we would still have the problem that an insurance can contain multiple moments, i.e. the number of inputs will vary. Bayesian networks can handle variable inputs, since it is possible to calculate the probability of missing variables by using Bayes' theorem and other statistical functions/rules. Given the number of input variables available in this project and their dependencies, it could mean that we would have to construct a big and complex model. That could lead to complex calculations.

8 http://en.wikipedia.org/wiki/Artificial_neural_network, 18 Sep 2012

How can it handle dates and/or sums

This problem is the same for all methods. The date or sum will have to be converted into a numerical representation before the system can use them, which means that we need to figure out how they should be represented.

Can it handle complex rules

Neural networks are good at learning complex rules, but one can never really be sure which complex rule they have learned. Fuzzy clustering can also learn complex rules, but unlike neural networks it is possible to show how/why it makes the choices it makes. Bayesian networks can describe complex stochastic relationships between variables.

How well can it handle rule changes

If the rules change, then both the neural network and fuzzy clustering require retraining. It takes a long time to train a neural network and even longer for a fuzzy clustering system. This makes the idea of one expert system per moment even more interesting, since that could help us reduce training times: the smaller expert systems would require less training data and would have to learn a less complex function, which saves time. Bayesian networks do not require retraining, though it might be necessary to update the probability tables.

2.2 The chosen method

The method that was chosen for this project was fuzzy clustering. The two deciding reasons were:

Show how the system 'thinks': with fuzzy clustering it is possible to show, with e.g. graphs, how the system 'thinks', which I judged to be a strong reason for picking fuzzy clustering.

Similar problem with good results: Patrik, my supervisor at the CS department, had done similar work with fuzzy clustering, with good results.

In section 2.3 the algorithms for fuzzy clustering will be explained.

2.3 Algorithms

In this section I will describe the fuzzy clustering algorithms used by this system.


2.3.1 C-Means Fuzzy Clustering Algorithm

This is the main algorithm for the type of fuzzy clustering that will be used in this project, namely C-means fuzzy clustering. The algorithm for C-means clustering can be seen in figure 2.11, taken from [2], page 69.

Step 1: Fix $c$ and $m$. Initialize $U$ to some $U^{(1)}$. Select $\varepsilon > 0$ for a stopping condition.

Step 2: Update the midpoint values $v_i$ for each cluster $c_i$.

Step 3: Compute the set $\mu_k \equiv \{\, i : 1 \le i \le c,\ \|x_k - v_i\| = 0 \,\}$ and update $U^{(\ell)}$ according to the following: if $\mu_k = \emptyset$, then
$$u_{ik} = 1 \Big/ \sum_{j=1}^{c} \big( \|x_k - v_i\| / \|x_k - v_j\| \big)^{2/(m-1)},$$
otherwise $u_{ik} = 0$ for all $i \notin \mu_k$ and $\sum_{i \in \mu_k} u_{ik} = 1$.

Step 4: Stop if $\|U^{(\ell+1)} - U^{(\ell)}\| < \varepsilon$, otherwise go to Step 2.

Figure 2.11: C-Means clustering algorithm

Step 1: In this step the membership matrix U is initialized. The membership matrix is a matrix that contains all the membership values in the application. It is used to check how strongly an input is tied to a particular cluster. Creating and initializing a membership matrix is very simple: just create a matrix of size C×I, where C is the number of clusters and I the number of inputs, and fill the matrix with randomized values 0 ≤ value ≤ 1, as can be seen in table 2.1. There is, however, a criterion that the values have to satisfy: the values for one cluster must sum up to 1, which in table 2.1 is not satisfied. That is easily fixed by dividing all the values in that cluster by their sum. In table 2.2 the values have been divided by the sums in table 2.1 and now sum up to 1.

Table 2.1: Membership Matrix with randomized values between 0 and 1

           I1     I2     I3     I4     Sum*
Cluster1   0.05   0.53   0.48   0.16   1.22
Cluster2   0.79   0.69   0.60   0.91   2.99
Cluster3   0.67   0.80   0.40   0.35   2.22

* The sum is not part of the membership matrix; it is just there to show that it is > 1.

Table 2.2: The Membership Matrix has been adjusted to sum up to 1 by dividing the input values by the previous sum

           I1            I2            I3            I4            Sum*
Cluster1   0.040983607   0.434426230   0.393442623   0.131147541   1
Cluster2   0.264214047   0.230769231   0.200668896   0.304347826   1
Cluster3   0.301801802   0.360360360   0.180180180   0.157657658   1

* The sum is not part of the membership matrix; it is just there to show that it now sums up to 1.
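A small C# sketch of this initialization step, following the thesis's convention of normalizing each cluster row to sum to 1; the matrix size and random seed are arbitrary.

```csharp
using System;

class MembershipMatrixDemo
{
    // Create a C x I matrix of random values and normalize each cluster row
    // so that its values sum to 1, as described in Step 1.
    static double[,] InitMembership(int clusters, int inputs, Random rng)
    {
        var u = new double[clusters, inputs];
        for (int c = 0; c < clusters; c++)
        {
            double sum = 0.0;
            for (int i = 0; i < inputs; i++)
            {
                u[c, i] = rng.NextDouble();
                sum += u[c, i];
            }
            for (int i = 0; i < inputs; i++)
                u[c, i] /= sum; // row now sums to 1, as in table 2.2
        }
        return u;
    }

    static void Main()
    {
        var u = InitMembership(clusters: 3, inputs: 4, rng: new Random(42));
        for (int c = 0; c < 3; c++)
            Console.WriteLine($"Cluster{c + 1}: {u[c, 0]:F3} {u[c, 1]:F3} {u[c, 2]:F3} {u[c, 3]:F3}");
    }
}
```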

Step 2: In this step we calculate the center, or midpoint, of a cluster. The center can be calculated with:

$$v_i = \sum_{k=1}^{n} (u_{ik})^m x_k \Big/ \sum_{k=1}^{n} (u_{ik})^m$$

where $u_{ik}$ is the membership value of point $x_k$ for the $i$:th cluster and $m$ is a fuzziness value that works as a weighting exponent.

Step 3: In this step the membership matrix U is updated, and there are basically two cases that can occur. One is that the center of a cluster is right on top of one or more points. In that case, the points under the center get a membership value of 1.0 for that cluster and a value of 0.0 for the rest of the clusters, unless two clusters have the same center, which is unlikely but still possible. If two clusters have the same center and a point lies on that center, then that point gets a value of 1.0/(number of clusters in the same location); so if there are two such clusters the point would get the membership value 1.0/2 = 0.5. In the other case the value is updated based on this formula:

$$u_{ik} = 1 \Big/ \sum_{j=1}^{c} \big( \|x_k - v_i\| / \|x_k - v_j\| \big)^{2/(m-1)}$$

where $\|x_k - v_i\|$ is the Euclidean distance between the current cluster center $v_i$ and the current point $x_k$, $\|x_k - v_j\|$ is the Euclidean distance between cluster center $v_j$ and the current point, and $m$ is the fuzziness variable that is used as a weighting exponent.

Step 4: In this step we check whether the difference between the old membership matrix and the new one is less than the chosen stopping condition, or whether there has not been any change to the matrix at all. If not, we go back to Step 2.
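Putting steps 1-4 together, the following is a compact C# sketch of the C-means loop for one fixed number of clusters c. The data, the fuzziness m and the stopping threshold are illustrative; note that the initialization here is normalized per data point, which is the form the Step 3 update formula produces, and the coinciding-center special case is handled by splitting the membership as described above.

```csharp
using System;
using System.Linq;

class FuzzyCMeans
{
    // One run of the C-means loop in figure 2.11 for a fixed c and m.
    // data[k] holds the k:th data point, one value per input dimension.
    static double[,] Cluster(double[][] data, int c, double m, double eps)
    {
        int n = data.Length, dims = data[0].Length;
        var rng = new Random();
        var u = new double[c, n];

        // Step 1: random initialization; each point's memberships sum to 1,
        // which is the form the Step 3 update formula produces.
        for (int k = 0; k < n; k++)
        {
            double sum = 0;
            for (int i = 0; i < c; i++) { u[i, k] = rng.NextDouble(); sum += u[i, k]; }
            for (int i = 0; i < c; i++) u[i, k] /= sum;
        }

        double diff;
        do
        {
            // Step 2: update the cluster midpoints v_i (weighted means).
            var v = new double[c][];
            for (int i = 0; i < c; i++)
            {
                v[i] = new double[dims];
                double denom = 0;
                for (int k = 0; k < n; k++)
                {
                    double w = Math.Pow(u[i, k], m);
                    denom += w;
                    for (int d = 0; d < dims; d++) v[i][d] += w * data[k][d];
                }
                for (int d = 0; d < dims; d++) v[i][d] /= denom;
            }

            // Step 3: update the memberships from the distances to the midpoints.
            diff = 0;
            for (int k = 0; k < n; k++)
            {
                var dist = new double[c];
                for (int i = 0; i < c; i++)
                    dist[i] = Math.Sqrt(data[k].Select((x, d) => (x - v[i][d]) * (x - v[i][d])).Sum());

                int onCenter = dist.Count(x => x == 0.0);
                for (int i = 0; i < c; i++)
                {
                    double newU;
                    if (dist[i] == 0.0)
                        newU = 1.0 / onCenter;   // point lies on this midpoint
                    else if (onCenter > 0)
                        newU = 0.0;              // membership goes to the coinciding midpoints
                    else
                    {
                        double s = 0;
                        for (int j = 0; j < c; j++)
                            s += Math.Pow(dist[i] / dist[j], 2.0 / (m - 1));
                        newU = 1.0 / s;
                    }
                    diff = Math.Max(diff, Math.Abs(newU - u[i, k]));
                    u[i, k] = newU;
                }
            }
        } while (diff >= eps); // Step 4: stop once U barely changes

        return u;
    }

    static void Main()
    {
        var data = new[] { new[] { 1.0, 1.0 }, new[] { 1.2, 0.8 }, new[] { 8.0, 8.0 }, new[] { 7.8, 8.3 } };
        var u = Cluster(data, c: 2, m: 2.0, eps: 1e-4);
        for (int k = 0; k < data.Length; k++)
            Console.WriteLine($"point {k}: u0 = {u[0, k]:F3}, u1 = {u[1, k]:F3}");
    }
}
```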

This algorithm runs with c set to a fixed number of clusters, so it has to run several times with different c values in order to find the best c. It is preferred to have as small a c as possible to limit the number of rules to a reasonable amount. If the c value is too big the system will become overfit, and it could mean that each data point ends up in its own cluster, so that the rule most likely becomes that data point for that cluster. When the algorithm has run several times for many different c values it is time to see which c is the best. To find the best c we use something called the criterion number. The criterion number is calculated according to:


$$S(U, c) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m \big[ \|x_k - v_i\|^2 - \|v_i - \bar{x}\|^2 \big]$$

Figure 2.12: Criterion number

where $\bar{x}$ is the center of all data points. The goal is to find the c which generates the smallest criterion number, though having c equal to the number of data points usually generates the best criterion number. But, as stated above, that is not what we want, since it would make the system overfit. After the best number of clusters has been discovered it is time to identify the rules of each of the clusters, which you can read about in section 2.3.2.

2.3.2 Rulebase algorithms

After we have divided all of the data points into clusters we need to make projections of these clusters in order to create rules for the rulebase. All clusters generate one rule each, based on the projections of the cluster. In figure 2.5 a projection of a cluster can be seen. The closer the projection is to the center, the higher the projection will be, so the highest point of the projection is the center. When the projection has been created we need to find a function which best fits this curve, and that function becomes the rule.

The number of projections that is needed depends on how many inputs a data point consists of; so if we have data points that have three inputs, Age, Sex and Income, then we would need to make three projections of the cluster, one for every input.

With these projections we can now create rules. To create a rule we use the following formula:

$$u_{ik} - e^{-\beta_p (x_{kp} - \alpha_{ip})^2}$$

Figure 2.13: Create rules

where $u_{ik}$ is the membership value of data point $x_k$ in the $i$:th cluster and $\pi_p$ is the projection on the $p$:th axis, or $p$:th input. The variable $\beta_p$ is expressed as $\beta_p = 1/(2\sigma^2)$, where $\sigma$ is the standard deviation of $\pi_p$, and $\alpha_{ip}$ is the center of cluster $i$ in $\pi_p$. Running this on all clusters and all axes will create the set of rules that constitutes the rulebase. After the rulebase has been created the fuzzy clustering is completed and the system is ready for real input.

2.3.3 Inference algorithms

In order to get output from the rulebase we need something that can take the input and the rulebase and translate them into output, and for that we have inference methods. In this project two inference techniques have been implemented: Mamdani's method and Takagi-Sugeno's method.


Activation

Both of the inference methods use an activation function. The first thing this function does is calculate the input using this formula:

$$e^{-\frac{(x_{kp} - v_{ip})^2}{2\sigma^2}}$$

Figure 2.14: Calculate the input.

where $x_{kp}$ is the $p$:th input of data point $x_k$ and $v_{ip}$ is the center of the $p$:th input in the $i$:th cluster. $\sigma$ is the $i$:th cluster's $\sigma$ that we used when calculating the rules in section 2.3.2. The results are saved in arrays, where each array represents one of the clusters and contains the calculations for all dimensions of the input. So if we have a data point x with the inputs age, sex and income, we would get an array that looks like array_1 = [calc(age), calc(sex), calc(income)]. After all of the arrays have been created we go to the next step, which is to, depending on the configuration chosen, either select the smallest value, minimum, or the largest value, maximum, in each of these arrays and save them in a new array which represents the activation function.
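A sketch of this activation step in C#: one Gaussian evaluation (figure 2.14) per input dimension, then a minimum or maximum over each rule's array. A single σ per rule and the sample values are simplifying assumptions.

```csharp
using System;
using System.Linq;

class ActivationDemo
{
    // Gaussian response of rule i's p:th input (figure 2.14).
    static double Calc(double x, double center, double sigma)
        => Math.Exp(-Math.Pow(x - center, 2) / (2 * sigma * sigma));

    // Activation of one rule: per-dimension responses combined with min or max.
    static double Activation(double[] point, double[] centers, double sigma, bool useMin)
    {
        var responses = point.Select((x, p) => Calc(x, centers[p], sigma));
        return useMin ? responses.Min() : responses.Max();
    }

    static void Main()
    {
        double[] point   = { 35.0, 1.0, 28000.0 };   // age, sex, income (made up)
        double[] centers = { 40.0, 1.0, 30000.0 };   // this rule's cluster center
        Console.WriteLine(Activation(point, centers, sigma: 5000.0, useMin: true));
    }
}
```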

Mamdani’s method

The process of Mamdani's method can be seen in figure 2.15.

After the input has been calculated and the correct values have been chosen, see the Activation paragraph above, Mamdani's method will try to find the best rule for each output. Mamdani's membership function can be seen in equation (2.16), where $\alpha_i$ are the activation values, $U_i$ are the output values from the rulebase and U is a fuzzy set.

To clarify things we will go through figure 2.15 step by step. In this figure we have a system that contains three rules: 1, 2 and 3. The curves represent how each rule depends on the input. In the figure, input 1 has been calculated to be 3 and input 2 to be 8, and following the arrows we can see the results of the inputs where they hit the curves.

The next step is to see which of the inputs has the biggest effect on the final result, and in Mamdani's case we do this with the maximum function. The result of this is a fuzzy set, U, which is represented by the green graphs in figure 2.15.

Now, to make sense of this fuzzy set we need to defuzzify it. There are a few techniques for defuzzification, but the one that has been used in this project is called Centre-of-Gravity. First we combine all values in the fuzzy set so we get something that looks like the graph called "Result of aggregation" in figure 2.15. It is on this graph that we want to find the centre of gravity, which can be done by using equation (2.17), where $\mu_U$ is the combined fuzzy set and $u_k$ is the $k$:th member of the fuzzy set.

9 http://www.dma.fi.upm.es/java/fuzzy/fuzzyinf/mamdani3_en.htm, 15 Feb 2013

Figure 2.15: Process of Mamdani’s method.9

$$U = \bigvee_{i=1}^{n} (\alpha_i \wedge U_i)$$

Figure 2.16: Mamdani - Output membership function.

$$u = \frac{\sum_{k=1}^{l} u_k \cdot \mu_U(u_k)}{\sum_{k=1}^{l} \mu_U(u_k)}$$

Figure 2.17: Centre of Gravity.
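A small C# sketch of Centre-of-Gravity defuzzification over a sampled fuzzy set, following equation (2.17); the sample points and memberships are invented.

```csharp
using System;
using System.Linq;

class CentreOfGravityDemo
{
    // Centre of Gravity (figure 2.17): weighted average of the members u_k
    // of the aggregated fuzzy set, weighted by their memberships mu_U(u_k).
    static double CentreOfGravity(double[] u, double[] mu)
        => u.Zip(mu, (uk, m) => uk * m).Sum() / mu.Sum();

    static void Main()
    {
        double[] u  = { 0.0, 1.0, 2.0, 3.0, 4.0 };   // sampled output axis
        double[] mu = { 0.1, 0.4, 0.7, 0.4, 0.1 };   // aggregated memberships (made up)
        Console.WriteLine(CentreOfGravity(u, mu));   // 2.0 by symmetry
    }
}
```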

There is another method that is based on Mamdani's method but with a small alteration. The method is called Larsen's method and uses the product as implication instead of the minimum that Mamdani's method uses. This is mentioned because the implementation makes four different configurations possible: the activation has two different settings, namely maximum or minimum, and in the Mamdani method it is possible to use Larsen's method instead. Larsen's membership function can be seen in equation (2.18).

$$U = \bigvee_{i=1}^{n} (\alpha_i \cdot U_i)$$

Figure 2.18: Larsen - Output membership function.

Takagi-Sugeno’s method

This method uses linear functions to create an inference. The output is represented as $u_i = p_{i1} + p_{i2} x_1 + p_{i3} x_2$, one for each rule. The first thing that needs to be done is to compute the constants, p, for each rule. In [15], Takagi and Sugeno describe a way to calculate these constants.

Let X be an $m \times n(k+1)$ matrix (figure 2.19), Y an $m$-vector (figure 2.20) and P an $n(k+1)$-vector (figure 2.21).

$$X = \begin{bmatrix} \beta_{11}, \ldots, \beta_{n1}, & x_{11}\beta_{11}, \ldots, x_{11}\beta_{n1}, & \ldots, & x_{k1}\beta_{11}, \ldots, x_{k1}\beta_{n1} \\ \vdots & & & \vdots \\ \beta_{1m}, \ldots, \beta_{nm}, & x_{1m}\beta_{1m}, \ldots, x_{1m}\beta_{nm}, & \ldots, & x_{km}\beta_{1m}, \ldots, x_{km}\beta_{nm} \end{bmatrix}$$

Figure 2.19: Takagi-Sugeno - X matrix.

$$Y = [y_1, \ldots, y_m]^T$$

Figure 2.20: Takagi-Sugeno - Y vector.

$$P = [p_{10}, \ldots, p_{n0}, p_{11}, \ldots, p_{n1}, \ldots, p_{1k}, \ldots, p_{nk}]^T$$

Figure 2.21: Takagi-Sugeno - P vector.

β is defined as seen in figure 2.22; $i$ represents the $i$:th rule, $j$ and $m$ index and count the data points in the system, $k$ is the $k$:th input in a data point and $n$ is the number of rules in the system. $A_{ik}$ means the membership value of the $k$:th input in rule $i$, $x_{kj}$ means the $k$:th input from the $j$:th data point and $y_m$ is the output of data point $m$. The P vector represents the constant values needed to calculate the expected output and is generated by the matrix computation seen in figure 2.23. In figure 2.21, $p_{kn}$ means that it is the $n$:th constant for rule $k$. After the P vector has been calculated we use the formula $u_i = p_{i1} + p_{i2} x_1 + p_{i3} x_2$ to get the output $u_i$, which we then use to calculate the final output with:

$$u = \frac{\sum_{i=1}^{n} \alpha_i u_i}{\sum_{i=1}^{n} \alpha_i}$$

where $\alpha_i$ is the activation value described in the Activation paragraph.

$$\beta_{ij} = \frac{A_{i1}(x_{1j}) \wedge \cdots \wedge A_{ik}(x_{kj})}{\sum_{j} A_{i1}(x_{1j}) \wedge \cdots \wedge A_{ik}(x_{kj})}$$

Figure 2.22: Takagi-Sugeno - β variable.

$$P = (X^T X)^{-1} X^T Y$$

Figure 2.23: Takagi-Sugeno - matrix computation
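To show how the pieces fit together at inference time, here is a C# sketch of the Takagi-Sugeno output for two inputs: each rule contributes a linear output u_i = p_i1 + p_i2·x_1 + p_i3·x_2, and the rule outputs are combined in the activation-weighted average above. The constants and activations are made-up numbers standing in for the least-squares solution P = (XᵀX)⁻¹XᵀY.

```csharp
using System;

class TakagiSugenoDemo
{
    // Final Takagi-Sugeno output: activation-weighted average of the
    // linear rule outputs u_i = p_i1 + p_i2 * x1 + p_i3 * x2.
    static double Infer(double[][] p, double[] alpha, double x1, double x2)
    {
        double num = 0, den = 0;
        for (int i = 0; i < p.Length; i++)
        {
            double ui = p[i][0] + p[i][1] * x1 + p[i][2] * x2;
            num += alpha[i] * ui;
            den += alpha[i];
        }
        return num / den;
    }

    static void Main()
    {
        // Two rules with made-up constants (in practice from the least-squares fit).
        var p = new[] { new[] { 0.5, 0.2, 0.1 }, new[] { 2.0, -0.1, 0.3 } };
        var alpha = new[] { 0.7, 0.3 };   // activations from the Activation step
        Console.WriteLine(Infer(p, alpha, x1: 3.0, x2: 8.0));
    }
}
```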


Chapter 3

Implementations, results and validations

3.1 Development environment

3.1.1 .Net

One criterion for this thesis was that it should be implemented in the .Net 4.0 environment, since that is what they use at Acino. C# and .Net have excellent support for computers running Windows, since Microsoft develops them both. I had never used C# prior to this project and knew that it could be a hindrance. However, I found it very similar to Java, which is a language that I have used many times, and that made it easier to learn. I might have been influenced to program in a more Java-like style and thus might not have fully taken advantage of the C# language. But it was an opportunity to learn C#, which is a widely used language in the business world and thus good to know to make you more attractive on the job market.

3.1.2 Useful embedded namespaces

A namespace in the .Net environment is basically a collection of useful methods and functions. For example, the System.IO namespace contains useful methods for input and output, like reading and writing data streams. In this section I will mention some of the namespaces that I have used in this project. The descriptions are taken from the MSDN site.11

System.IO: namespace contains types that allow reading and writing to files and data streams, and types that provide basic file and directory support.

System.Runtime.Serialization: namespace contains classes that can be used for serial- izing and deserializing objects. Serialization is the process of converting an object or a graph of objects into a linear sequence of bytes for either storage or transmission to another location. Deserialization is the process of taking in stored information and recreating objects from it.

11 http://msdn.microsoft.com/en-us/


System.Linq: namespaces contain types that support queries that use Language-Integrated Query (LINQ). This includes types that represent queries as objects in expression trees.

System.Windows.Forms: namespace contains classes for creating Windows-based appli- cations that take full advantage of the rich user interface features available in the Microsoft Windows operating system.

System.Xml: namespaces contain types for processing XML. Child namespaces support serialization of XML documents or streams, XSD schemas, XQuery 1.0 and XPath 2.0, and LINQ to XML, which is an in-memory XML programming interface that enables easy modification of XML documents.

System.Text: namespaces contain types for character encoding and string manipulation. A child namespace enables you to process text using regular expressions.

3.1.3 Database management via XML export

This implementation uses XML to get the information about the insurances. Currently you have to download this information to a file in order for the program to read it, but it should not be too difficult to tie the implementation to a server that is hosting the insurance information. One thing to keep in mind is that there are two styles in use: at Acino, where I am doing this thesis, they have one style, and at Svenska Försäkringsfabriken they have another. The biggest difference is that the latter uses codes for the different fields while Acino uses words/names, so the Acino style is easier to read, and that is the format that the implementation uses. The XML document is read by a preprocessor which parses out the interesting parts of the data and sends them to the main program.
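As a minimal sketch of such a preprocessor step, using the System.Xml.Linq types from the namespaces listed earlier: the element and attribute names (Insurance, Type, Moment, etc.) are hypothetical, since the actual export format is not reproduced in this thesis.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class XmlPreprocessorDemo
{
    static void Main()
    {
        // Hypothetical export format; the real field names differ.
        var doc = XDocument.Parse(@"
            <Insurances>
              <Insurance number='123'>
                <Type>Pension</Type>
                <Moment name='Pension' sum='250000' />
                <Moment name='HealthInsurance' sum='1000' />
              </Insurance>
            </Insurances>");

        // Parse out the interesting parts and hand them to the main program.
        foreach (var ins in doc.Descendants("Insurance"))
        {
            Console.WriteLine($"Insurance {ins.Attribute("number")?.Value}, type {ins.Element("Type")?.Value}");
            foreach (var moment in ins.Elements("Moment"))
                Console.WriteLine($"  moment {moment.Attribute("name")?.Value}: {moment.Attribute("sum")?.Value}");
        }
    }
}
```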

3.2 Result

The results were a bit of a surprise for a number of reasons. The first one was that the Mamdani methods that used the maximum function in the activation phase had a constant hit rate which never changed. Looking at a sample of the output of the functions, table 3.2, we can see that the interval of these methods only covers two outcomes, namely 3.0 = Green and 4.0 = Gray, which explains the poor results. The method with the best results is the Takagi-Sugeno method, which covers the whole range of expected results.

In table 3.1 the observed upper and lower limits of each method can be seen.

Table 3.1: The output intervals of the different inference methods

Inference       Lower limit   Upper limit
Takagi-Sugeno   1.0           3.5
T/F*            0.5           2.5
T/T             0.5           2.5
F/F             3.0           4.0
F/T             3.0           4.0

* T/F = (activation function: T = Minimum, F = Maximum) / (Mamdani function: T = Product, F = Minimum)


The second surprise was that the hit rate for all the methods was overall very stable, with only a few drops. It was expected that the results would improve or at least change with different configurations, but the results were very stable, with the exception of the T/T Mamdani method, which struggled when the fuzziness variable was > 12. Another surprise with that Mamdani method was that it struggled when the number of runs was > 140, which is strange since all that does is run the program again to find the best system, the one with the best criterion number.

3.3 Impact of the parameters

There are a few settings that can affect the results. In this section we will go through some of them and see what impact they have on the hit rate. But first one thing needs to be explained: in the charts the Mamdani inference is named T/T, T/F, F/T or F/F, where T stands for true and F for false. The first symbol states whether the Min-function or the Max-function has been used in the activation; T means that Min was used. The second symbol states whether the Product-function or the Min-function has been used; T means that the Product-function was used.

3.3.1 The Fuzziness Variable

The fuzziness variable, m, is a weighting exponent. It is related to the weight that is given to the closest center.

Figure 3.1: The impact of the fuzziness variable

In figure 3.1 we can see that the Mamdani inferences where we use the max function have a steady hit rate of 12% which never changes, and 12% is far from acceptable. The Mamdani inferences where we instead use the min function perform around 40%, which is still not acceptable. There is some slight difference between the Mamdani that uses the Min-function and the one that uses the Product-function: the one that uses the min-function in both stages performs slightly better and is a bit more stable than the one that uses the min-function in the activation stage and the product-function in the next stage.

Table 3.2: Sample of the output from the system*

Expected   Takagi-Sugeno   T/F   T/T   F/F   F/T
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.1   3.1
3          1.2             0.6   0.5   3.7   3.7
1          1.2             2.3   2.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
4          3.2             0.6   0.4   3.4   3.4
3          2.6             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
2          2.4             2.1   1.6   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
2          1.4             2.3   1.9   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
3          1.2             2.4   2.4   3.9   3.9
3          1.2             2.4   2.4   3.9   3.9
1          1.2             2.3   2.3   4.0   4.0
3          2.6             1.6   1.2   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
1          1.2             2.3   1.9   3.7   3.7
1          1.2             2.3   1.9   3.7   3.7
1          1.2             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
1          1.2             1.6   1.2   3.7   3.7
1          1.2             1.6   1.2   3.7   3.7
2          1.4             2.3   1.9   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
2          2.4             2.1   1.6   3.1   3.1
4          3.2             0.6   0.4   3.4   3.4
3          2.6             1.6   1.2   3.9   3.9
3          2.6             1.6   1.2   3.9   3.9
4          3.5             0.5   0.4   3.4   3.4

* The numbers displayed are the actual numbers produced; during evaluation they are rounded off to the nearest integer.

The inference method with the best performance is the Takagi-Sugeno inference, which performs with a hit rate around 69%, with a few drops. 69% is a lot better than the Mamdani methods but still not acceptable; the hit rate that we want to see is at least 90%.


3.3.2 The Number of Runs

Since the initialization of the membership matrix is random, we can see differences between training runs. This means that it is a good idea to do a number of runs in order to get the 'best' clustering we can get.

Figure 3.2: The impact of the number of runs variable

In figure 3.2 we can see the impact of the number of runs performed. Most of the inference methods are stable, which could mean that on some runs they were lucky or unlucky. The number of runs does seem to have a big impact on one of the inference methods, namely the Mamdani version where the minimum function is used in the activation and the product function is used in the Mamdani method. When using the product function inside the Mamdani method it is known as Larsen's method.

3.3.3 The Cluster Interval

It is preferable to have as few clusters as possible in order to avoid overfitting the system.

In this test we look for an interval that does not affect the hit rate while minimising the number of cluster counts we have to try. The label 20/90 means that the system will cluster between 20% and 90% of the data points. So if we have one hundred data points in the system, 90 − 20 = 70 different clusterings would be created and then compared to see which one yields the best result.

In figure 3.3 on the next page we can see that the methods are relatively stable, with a few drops and peaks. This could just be a coincidence, since the interval should not make any difference in performance unless an area that usually produces the best clusters is removed from it. Let's say the winning clustering usually uses around 60% of the data points; if we only check between 20% and 50% we would miss it and get a lower hit rate. Looking at figure 3.3 on the following page, the interval between 20/60 and 20/40 seems to be the best. Keep in mind that this is mostly to reduce training time, not to improve the hit rate.


Figure 3.3: The impact of the interval limits


Chapter 4

Conclusion

There was a scheduling conflict at the beginning of the project: I had forgotten that I had planned to retake two courses at the same time, which in hindsight was not the best of ideas. When the courses started, the project was cut to a pace of 50%, so half the day was spent on one of the courses and the other half on the project. It always took a while to get back into the project, however, so a lot of time was lost in the end. It would probably have been best not to have taken courses on the side, or at least only one.

The goal for this project was to create a system that could evaluate insurances using AI. In section 1.3.1 on page 3 some criteria are presented; the only criterion that is not fully fulfilled is The result of the evaluation should consist of a flag and a text. The system currently only gives a flag, and the reason for that is that I was unsure how the texts should be interpreted by the system. The system needs the input and output to be represented by floating-point numbers; it is probably possible to look up all combinations of texts and give them a numerical representation that can be combined in a good way.
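As a rough illustration of one such numerical representation, the sketch below simply enumerates the distinct text values as they are encountered; all the names are made up for the example:

    using System.Collections.Generic;

    // Illustration only: enumerate the distinct text values as they appear,
    // so that every text gets a floating-point code the clustering can use.
    var textCodes = new Dictionary<string, double>();
    double CodeFor(string text)
    {
        if (!textCodes.TryGetValue(text, out double code))
        {
            code = textCodes.Count;      // codes 0, 1, 2, ... in order seen
            textCodes[text] = code;
        }
        return code;
    }

Note that a plain enumeration imposes an artificial ordering and closeness between unrelated texts, so a one-hot style encoding (one input per possible text value) might suit the clustering better, at the cost of more inputs.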

The system is currently not a good replacement for the manual evaluation of insurances. But even though the system only managed a hit rate of 69% at best, there is potential in fuzzy clustering. In section 4.2 on the following page a few suggestions are mentioned that could help boost the performance and make it a dependable replacement.

4.1 Restrictions and Limitations

There are a few restrictions and limitations in this project and those are:

Slow learning process: The process of training the system can be quite time consuming depending on the number of data points used. But since training is not something that needs to be done very often, this can probably be disregarded. A way to reduce the training time would be to make it parallel, so that multiple cluster counts can be evaluated at the same time; the fuzzy clustering process is very parallel friendly because it can easily be divided into a number of independent tasks (see the sketch after this list).

Requires retraining: If there is a change in the rules, the system will need to be retrained. This could mean that new training data is required, which might have to be manually evaluated and could take some time, on top of the actual training time mentioned in the previous block.


Variable number of inputs: The system cannot handle a variable number of inputs, and since two insurances can have different numbers of values/attributes we cannot just read the whole insurance and send it to the system. We need to choose a number of important values/attributes that all insurances have and use them. The fewer that are used, the less complex the problem will be for the system and the shorter the training time. If a new value/attribute is introduced, the system will have to be retrained.

No support for multiple instances: A way to improve the performance of the system would be to have different systems that each evaluate one type of insurance; this is discussed further in section 4.2. Currently the system does not support that. A few modifications would have to be made in order to support multiple instances: first, fix the start-up to support starting multiple instances and assigning tasks to them, and then make sure that each instance can save its own training data without overwriting the others.
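As an illustration of the parallelisation idea from the first point above, the sketch below evaluates each cluster count as an independent task; TrainAndScore is the same hypothetical helper as in the earlier sketch, not the actual API:

    using System.Threading.Tasks;

    // Illustration only: each cluster count is trained and scored as an
    // independent task, which is what makes the process parallel friendly.
    var scores = new double[cMax - cMin + 1];
    Parallel.For(cMin, cMax + 1, c =>
    {
        scores[c - cMin] = TrainAndScore(dataPoints, c);
    });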

4.2 Future work

There are a few improvements that could be made in the future that might help improve the results. They are discussed in the following sections:

– Better selections of training data

– Small and focused

– Reinforced learning

4.2.1 Better selections

It could be that the training set currently in use is too hard or too complex for the system as it is. By making a more thorough selection of training data it could be possible to get better results, since the data would be easier for the system to cluster.

4.2.2 Small and focused

At the moment the application is used as a single instance to evaluate all types of insurances. This means that the system needs to build one big and complex clustering that handles all these cases. By making a lot of small systems that each handle one special case of the insurances, it should be easier for the systems to form good clusterings. There are a few problems that can occur when doing this, depending on how finely the insurances are divided.

An insurance can contain a number of moments, and if we build a system composed of many smaller systems, each focused on one moment, then we would have to come up with a way to combine the results of these moments. If, for example, one moment gives Green (movable) and another gives Red (not movable), should the combined output be Red, Green or Yellow (might be movable)? One possible policy is sketched below.
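One conservative policy, shown purely as an illustration and not as the project's actual rule, is to let the insurance be only as movable as its least movable moment:

    using System.Collections.Generic;
    using System.Linq;

    // Illustration of one possible combination policy: the insurance is
    // only as movable as its least movable moment.
    enum Verdict { Red, Yellow, Green }   // ordered Red < Yellow < Green

    static Verdict Combine(IEnumerable<Verdict> momentVerdicts) =>
        momentVerdicts.DefaultIfEmpty(Verdict.Green).Min();

Under this policy a single Red moment makes the whole insurance Red; a policy that instead outputs Yellow on any disagreement would flag the insurance for manual review.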

4.2.3 Reinforced learning

During reinforced learning a system will, after training, run the training data again and see how well it performs. If the system makes a good call it is 'rewarded' and a bad call results in a 'penalty'. The system will then go back to training and try to adapt to the 'rewards' and 'penalties' it received. This process continues until a certain limit has been reached, e.g. a hit rate of at least 90%. By introducing reinforced learning to the application it would thus be possible to improve the results.


Chapter 5

Acknowledgments

I would like to thank Acino for letting me do my thesis there, and I would also like to thank all of Acino's employees for making me feel welcome. I especially want to give a big thanks to Hannes Kock, my supervisor at Acino, Patrik Eklund, my supervisor at the CS department, and Anna Theorin, my contact at Svenska Försäkringsfabriken, for helping me with this thesis.

