
Machine Learning and Telematics for Risk Assessment in Auto Insurance

FRITHIOF EKSTRÖM
ANTON CHEN

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT
STOCKHOLM, SWEDEN 2020


Machine Learning and Telematics for Risk Assessment in Auto Insurance

Frithiof Ekström, KTH; Anton Chen, KTH

Abstract—Pricing models for car insurance traditionally use variables related to the policyholder and the insured vehicle (e.g. car brand and driver age) to determine the premium. This can lead to situations where policyholders belonging to a group that is seen as carrying a higher risk for accidents wrongfully get a higher premium, even if the higher risk might not necessarily apply on a per-individual basis. Telematics data offers an opportunity to look at driving behavior during individual trips, enabling a pricing model that can be customized to each policyholder. While these additional variables can be used in a generalized linear model (GLM) similar to the traditional pricing models, machine learning methods can possibly unravel non-linear connections between the variables. Using telematics data, we build a gradient boosting model (GBM) and a neural network (NN) to predict the claim frequency of policyholders on a monthly basis. We find that both GBMs and NNs offer predictive power that can be generalized to data that has not been used in the training of the models. The results of the study also show that telematics data play a considerable role in the model predictions, and that the frequency and distance of trips are important factors in determining the risk using these models.

Sammanfattning—Pricing models for car insurance traditionally use variables related to the policyholder and the insured vehicle (e.g. car brand and driver age) to determine the insurance premium. This can lead to situations where policyholders who belong to a group considered to carry a higher risk of accidents receive an unfairly high premium, even if the higher risk does not necessarily apply on an individual basis. Telematics data offers an opportunity to look at driving behavior during individual trips, which enables a pricing model that can be adapted to each individual policyholder. Although these variables can be used in a linear model similar to the traditional pricing models, machine learning methods may be able to reveal non-linear relationships between the variables. Using telematics data, we build a model based on gradient boosting (GBM) and a neural network (NN) to predict the frequency of accidents for policyholders on a monthly basis. We find that both models have a predictive ability that generalizes to data not used in the training of the models. The results of the study also show that telematics data plays a considerable role in the models' predictions, and that the frequency and distance of trips are important factors when assessing risk with these models.

Index Terms—Telematics for car insurance, Gradient Boosting Machine, Neural Network, Machine Learning

I. INTRODUCTION

TWO of the most fundamental aspects needed to compare insurance policies are coverage and premium. To attract and retain customers, an insurance company needs to offer good coverage at a competitive price. However, the ability to attract customers is a poor measure of success in the insurance industry, since the claims of some policyholders will inevitably result in expenses exceeding the revenue generated from their premium payments. It is in the interest of the insurer to avoid these customers altogether, effectively achieved by offering unattractive premiums on their policies. Thus, a critical task for insurers is to distinguish between policies that are likely to result in costly claims and policies with a high probability of generating a net income.

In the auto insurance industry, the most common way of assessing risk is using Generalized Linear Models (GLMs) to perform regressions on variables such as age, area of residence, estimated mileage and car model. However, new methods of performing these assessments have been gaining popularity in the last decade.

One change that can be observed is the shift towards usage-based policies, where data on how the car is used is collected and combined with the traditional variables to make risk assessments.

This introduces the ability for insurers to add new metrics to their models, including actual driving distance, instances of speeding, hard braking, and aggressive acceleration. It also gives policyholders the opportunity to prove to the insurer that they are safe drivers.

Another development can be seen in the methods used to analyze the data and predict risk. While traditional mathematical methods are still widely used, the power of machine learning algorithms as tools for performing this analysis is becoming increasingly apparent, and the technology is being used by multiple companies in the industry.

The purpose of this study is to investigate how telematics can be used in conjunction with state-of-the-art machine learning algorithms to increase the accuracy of risk assessments in the context of auto insurance.

Friday 26th June, 2020

II. BACKGROUND AND GOAL

A. The area

The topic of modern statistical and machine learning (ML) methods for insurance has been studied in multiple insurance contexts. Of the studies in the context of car insurance, many have applied such algorithms to data consisting of traditional parameters, such as driver age, previous accidents, and car model. However, there have also been multiple studies on the topic of using telematics for achieving a more accurate estimate of how much a car is used. The same information has also been explored in the context of evaluating driving behavior.

Both the usage and the behavioral aspect are believed to be useful for improving risk assessment.

B. Relevance for the company

The company providing the data is the insurance company Paydrive.

The company's strategy is to collect data about the driving behavior of its policyholders. The data in its raw form consists of GPS coordinates and accelerometer data, which can, in turn, be aggregated into variables such as the number of hard brakes and meters driven over the speed limit.

Inaccurate prediction of risk relative to competing firms exposes the company to the risks of losing low-risk customers due to overpricing and attracting high-risk customers due to underpricing.

If the telematics data can be used to better predict the expected cost of claims from each policyholder, the company can potentially gain an information advantage that is difficult and costly for competitors to replicate.

C. Goal

The goal of the study is to build upon the current state-of-the-art research in the field by introducing and evaluating the relative importance of multiple new variables for estimating risk in the context of car insurance. Another goal is to find models that can be used to predict the expected number of accidents for a customer and to evaluate the performance of a Gradient Boosting Machine as well as a Neural Network. Finally, the implications of the results for the business's pricing model will be discussed.

D. Ethical aspects

An insurance premium that rewards safe driving will not only increase the profitability of the insurance company. It will also allow drivers whose risk is traditionally overestimated to get a fair premium. Also, it encourages people to drive more carefully, potentially resulting in a reduction in the number and severity of traffic accidents. If this way of pricing car insurance proves successful and starts to spread, this effect has the potential to extend to the entire population of car drivers, making it increasingly expensive to drive incautiously as insurers are forced to raise prices of policies where telematics are not part of the risk assessment.

III. RESEARCH QUESTIONS

The study explores two sub-research questions:

SRQ1: How do Gradient Boosting Machines (GBMs) and Neural Networks (NNs) compare in the task of predicting accident frequency from telematics?

SRQ2: How can an ML-based model for predicting risk be integrated into the product portfolio of an insurance company?

To answer these questions, a study of the current literature in the field of telematics for car insurance is needed. Also, a study of how GBMs and NNs can be trained on the zero-heavy data set of insurance claims is required.

As part of the study, two machine learning models are trained on telematics data, as well as on data about the customer and their vehicle. Their predictive power is then evaluated and compared.

Finally, based on the results and conclusions of these analyses, the implications for the pricing of the insurance policies are discussed.

IV. THEORY

A. Gradient Boosting

One of the ML algorithms that is used for estimating policyholder risk is Gradient Boosting, configured for regression. The algorithm is used to combine a collection of weak learners, typically fixed-size decision trees, into a strong learner. This is achieved by constructing fixed-size decision trees with nodes randomized from the input variables. The trees are created sequentially and weighted according to how well they compensate for the loss of the previous trees in the sequence.

The algorithm takes two inputs: the labeled data points and a differentiable loss function. The choice of loss function depends on the probability distribution of the variable being predicted.
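As an illustration of this sequential fitting, the following is a minimal sketch of gradient boosting for regression with squared-error loss and fixed-depth trees; the function names and hyperparameter values are illustrative, and the study itself used LightGBM with a Poisson loss rather than this toy loop.

# Minimal gradient-boosting sketch: each tree is fit to the negative
# gradient of the ensemble's loss so far. For squared-error loss, the
# negative gradient is simply the residual y - prediction.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbm(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    base = y.mean()                          # initial constant model
    prediction = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction           # negative gradient of the loss
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, trees

def predict_gbm(model, X, learning_rate=0.1):
    base, trees = model
    return base + learning_rate * sum(t.predict(X) for t in trees)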

B. Feed Forward Neural Networks

The model that the GBM is compared to is a feed-forward neural network. It consists of multiple layers of connected neurons. The first layer takes the feature values (including a dummy feature) as its input. In each neuron, the inputs x̄ are multiplied by the weights θ̄ and passed through a non-linear activation function. The final layer outputs a prediction.

The network learns through backpropagation. This is done by computing the gradient of the loss function, i.e. the partial derivatives of the loss function with respect to the weights θ̄. The gradient, scaled by a learning rate, is then subtracted from the weights θ̄ to adjust the network towards predictions closer to the training labels. The model is trained until the rate of improvement sinks below a predetermined threshold, or until the training is interrupted by the user.
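As a small worked example of the forward pass and weight update described above, the sketch below evaluates a single sigmoid neuron (cf. Fig. 1) and takes one gradient-descent step on a squared-error loss; all values are made up for illustration.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x_bar = np.array([1.0, 0.5, -1.2])      # inputs, constant dummy feature first
theta_bar = np.array([0.1, -0.3, 0.8])  # weights; the bias is the first entry
y = 1.0                                 # training label
lr = 0.05                               # learning rate

z = x_bar @ theta_bar                   # weighted sum of inputs
y_hat = sigmoid(z)                      # neuron output

# Gradient of L = 1/2 * (y_hat - y)^2 with respect to theta_bar, using
# d(sigmoid)/dz = y_hat * (1 - y_hat) and the chain rule.
grad = (y_hat - y) * y_hat * (1.0 - y_hat) * x_bar
theta_bar -= lr * grad                  # step against the gradient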

Fig. 1. Example of a Neural Network neuron using the sigmoid activation function. [1]

C. SHAP Values

To interpret the results and analyze the importance of individual features, we will use SHAP (SHapley Additive exPlanations), a method used to explain the output of machine learning models based on the classic Shapley values from game theory. The general idea of Shapley values is to assign payouts to players depending on how much they contribute to the total payout. In the context of machine learning predictions, the feature values can be seen as the players, and the game is the prediction task for one instance of the dataset, while the payout is the prediction itself. For a particular prediction, SHAP assigns each feature of the model an importance value that represents the contribution of the feature to the prediction. [2]

TreeSHAP is a variant of SHAP made for tree-based machine learning models such as gradient boosted trees, random forests, and decision trees, and will be used to analyze the feature importances of the gradient boosted model. Similarly, DeepSHAP is a variant suited for deep learning models, and it will be used to analyze the feature importances of the neural network model.
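A sketch of how these explainers might be invoked with the shap Python package is shown below; gbm_model, nn_model, X_background, and X_test are hypothetical placeholders for the trained models and feature matrices, not names from the study.

import shap

def explain_models(gbm_model, nn_model, X_background, X_test):
    # TreeSHAP for the tree-based model (feature importances as in Fig. 4)
    tree_explainer = shap.TreeExplainer(gbm_model)
    gbm_shap = tree_explainer.shap_values(X_test)
    shap.summary_plot(gbm_shap, X_test)

    # DeepSHAP for the neural network (as in Fig. 5); a background sample
    # serves as the reference for the expected model output
    deep_explainer = shap.DeepExplainer(nn_model, X_background)
    nn_shap = deep_explainer.shap_values(X_test)
    shap.summary_plot(nn_shap[0] if isinstance(nn_shap, list) else nn_shap,
                      X_test)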

It is important to point out that SHAP values provide model interpretability, but not causality. The values allow humans to understand why a model has yielded specific results, but do not necessarily explain the real-life causes that lead up to the observed results.

V. PREVIOUS RESEARCH

The usage of telematics data in car insurance pricing has been studied and compared with traditional pricing models in several studies. One such study shows that the combination of telematics data and traditional policyholder variables can be used to construct models with a better predictive performance than classical models that only use traditional policy data. [3] Another study shows that the behavior of drivers changes depending on the road type. The authors propose a risk index that takes several different parameters into account, such as driver aggressiveness on different road types and the fractions of the distance travelled during day- or nighttime. [4] This motivates further exploration of the predictive power of telematics data in our study, as well as taking additional parameters that might not be explicitly present in the telematics data, such as road type, into account when trying to predict the risk.

The value of telematics data lies not only in the added information that can be used in models, but also in a wider perspective that covers societal aspects. Litman shows that annual insurance claims tend to increase as annual vehicle travel rises. By using distance-based key variables for insurance pricing, prices can reflect the costs instead of just shifting costs between different groups of drivers. Since most crashes involve more than one vehicle, reduced mileage can cause a considerable reduction in both insurance claims and crash costs, which means that distance-based pricing can act as a traffic safety strategy. [5]


Several studies that have utilized machine learning approaches highlight gradient-boosted tree models specifically. When compared to traditional GLMs and other tree-based algorithms, the GBM consistently outperforms the other modelling approaches. Even in cases where the structure of GLMs is required due to regulatory issues, the gradient-boosted tree approach can reveal important variables and their interactions with each other. [6] This is supported by Yang, Qian, and Zou, who find that a tree-based gradient boosting method can be a good complement to traditional GLMs due to its ability to extract information such as non-linearity and important interactions between variables. [7] These highlighted benefits of the gradient-boosted tree approach are part of the reason why we have chosen a GBM as one of the approaches in our work.

VI. METHOD

A. General preprocessing of data

The data used in the study consists of telematics from 150 000 trips and a-priori data from 24 000 insurance policies. Among the telematic variables are speeding distances, distances driven during different hours, total trip distances, as well as start and end coordinates. The a-priori data consists of variables such as driver age, car model and previous accidents. In addition to the provided variables, two new variables were added by connecting coordinates and timestamps with third party geographic and weather data. One of the variables indicates the percentage of trips starting or ending in regions classified as metropolitan by the Swedish Agency for Economic and Regional Growth [8]. The data points were classified by reverse geocoding, mapping each datapoint to a region in the GeoNames database.

The other variable classifies the weather by assigning a weather code to each trip, with each code belonging to one of three weather classes in ascending order of severity. The assignment was done by connecting the midpoint (with regard to both time and space) of each trip with the closest SMHI (Swedish Meteorological and Hydrological Institute) weather station, and then fetching the weather code from the closest timestamp in the station data. Each weather code was assigned a class using the following logic:

1) class 1 consists of any type of insignificant weather conditions or clear weather

2) class 2 consists of all types of mild to moderate precipitation and other weather conditions that might affect factors such as visibility negatively

3) class 3 consists of all types of severe precipitation, snow, storms, heavy fog, and other extreme weather conditions.

The motivation for adding these variables is that empirical data shows that the frequency of road accidents is higher in large cities and in bad weather conditions. [9], [10] The trip data was then aggregated by insurance policy number and month and joined with the customer data, creating the final data set. This was randomly split into a training (90%) and a test (10%) data set, with claims balanced between the sets. 20% of the training data was allocated for validation during training.
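The aggregation and split could look roughly as follows in pandas; the column names (policy_id, distance_km, and so on) are hypothetical stand-ins for the actual schema, which is not public.

import pandas as pd
from sklearn.model_selection import train_test_split

def build_dataset(trips, customers, claims_per_month):
    """Aggregate trip-level telematics per policy and month, join customer
    data, and split 90/10 with claims balanced between the sets."""
    trips = trips.assign(month=trips["start_time"].dt.to_period("M"))
    monthly = (trips.groupby(["policy_id", "month"], as_index=False)
                    .agg(n_trips=("trip_id", "count"),
                         total_km=("distance_km", "sum"),
                         speeding_km=("speeding_km", "sum")))

    data = (monthly.merge(customers, on="policy_id", how="left")
                   .merge(claims_per_month,
                          on=["policy_id", "month"], how="left"))
    data["n_claims"] = data["n_claims"].fillna(0)

    # Stratify on claim occurrence so claims are balanced between the sets
    train, test = train_test_split(data, test_size=0.10, random_state=0,
                                   stratify=data["n_claims"] > 0)
    train, val = train_test_split(train, test_size=0.20, random_state=0)
    return train, val, test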

B. Selection of loss function

In order to select a suitable loss function to use in training, the distribution of claims was analyzed. Since the labels are count data, some reasonable distributions to consider are the Poisson distribution, the negative binomial distribution, the Poisson-gamma distribution, and other Tweedie distributions.

The data was found to satisfy the conditions of the Poisson distribution, with a variance almost equal to the mean (variance/mean > 0.999). A chi-square test against the distribution yields a p-value greater than 0.95 for the data aggregated by day, indicating a good fit. Based on these findings, the Poisson loss function was selected.

Training was also done using the mean squared error loss function, achieving nearly identical performance.
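The dispersion check and goodness-of-fit test described above could be carried out as in the sketch below; a synthetic Poisson sample stands in for the real per-day claim counts, which are not public.

import numpy as np
from scipy import stats

# Synthetic stand-in for the actual claim counts per day
claims_per_day = np.random.default_rng(0).poisson(lam=0.1, size=3650)

# Dispersion: variance/mean is approximately 1 for Poisson data
dispersion = claims_per_day.var() / claims_per_day.mean()

# Chi-square goodness-of-fit against a Poisson with the sample mean
values, observed = np.unique(claims_per_day, return_counts=True)
expected = stats.poisson.pmf(values, mu=claims_per_day.mean()) * claims_per_day.size
expected *= observed.sum() / expected.sum()   # rescale so the totals match
chi2, p_value = stats.chisquare(observed, expected)
print(f"variance/mean = {dispersion:.3f}, chi-square p-value = {p_value:.3f}")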

C. Gradient Boosting Machine Configuration

The GBM was implemented using Microsoft's LightGBM. Various hyperparameter configurations were also tested and evaluated, including different numbers of leaves, maximum depth, and the maximum number of bins for the features.
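A LightGBM setup of the kind described here might look as follows; the objective matches the Poisson loss selected above, while the specific hyperparameter values and data names are illustrative assumptions rather than the tuned configuration from the study.

import lightgbm as lgb

def train_gbm(X_train, y_train, X_val, y_val):
    params = {
        "objective": "poisson",   # matches the selected Poisson loss
        "num_leaves": 31,         # number of leaves per tree
        "max_depth": 6,           # maximum tree depth
        "max_bin": 255,           # maximum number of bins per feature
        "learning_rate": 0.05,
    }
    train_set = lgb.Dataset(X_train, label=y_train)
    val_set = lgb.Dataset(X_val, label=y_val, reference=train_set)
    return lgb.train(params, train_set, num_boost_round=1000,
                     valid_sets=[val_set],
                     callbacks=[lgb.early_stopping(stopping_rounds=50)])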

D. Neural Network Configuration

The neural network used in the study was a simple feed-forward network with one hidden layer. The data was normalized using the following function:

norm(x) = (x − µ(x)) / σ(x)

Since the predicted variable is a rate, it cannot fall in (−∞, 0). This calls for an activation function allowing only positive values in the output layer. Due to the zero-inflation in the labels, the activation function needs to be smooth around x = 0 in order to prevent "dying" neurons, i.e. to prevent the net from learning to output only zeros. [11] The chosen function, meeting both of these criteria, is the sigmoid function.

The neural network was implemented using Tensorflow and Keras with various neuron configurations.
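Put together, a Keras version of the network described above might look like the sketch below; the hidden-layer width and training settings are illustrative assumptions, not the exact configuration used in the study.

import tensorflow as tf

def build_and_train(X_train, y_train):
    # Standardize features: norm(x) = (x - mu(x)) / sigma(x)
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    X_norm = (X_train - mu) / sigma

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu",
                              input_shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # non-negative rate
    ])
    model.compile(optimizer="adam", loss="poisson")  # selected Poisson loss
    model.fit(X_norm, y_train, epochs=200, validation_split=0.2,
              callbacks=[tf.keras.callbacks.EarlyStopping(patience=10)])
    return model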

E. Features

The set of features used in both of the aforementioned algorithms can be categorized into three general types:

1) Telematics data: pertains to the data collected from individual trips. Includes information such as total distance travelled, occurrences of speeding, and distance travelled during specific hours of the day.

2) Customer data: pertains to the data related to the customers and their insured objects, which corresponds to the a-priori data. Includes information such as customer age, vehicle age, and vehicle brand.

3) External data: pertains to the data collected from external sources that can be connected to the telematics data. Includes information such as distance travelled in certain weather conditions or in urban locations.

The features remain largely unchanged from the variables in the originating datasets, but slight adjustments have been made to convert some features from absolute to relative values.

TABLE I
FEATURES

TEL1, TEL8 (Telematics): Measured in absolute terms. Includes frequency and distance of trips.
TEL2, TEL3, TEL4 (Telematics): Measured in relative terms. Connected to certain driving behaviour. Includes occurrences of speeding.
TEL5, TEL6, TEL7 (Telematics): Measured in relative terms. Connected to trips being made under certain conditions. Includes distances travelled during certain hours of the day.
TEL9, TEL10, TEL11, TEL12 (External): Measured in relative terms. Connected to the location of trips. Includes distances travelled in certain areas or weather conditions.
TRAD1, TRAD2, TRAD4 (Customer): Measured in absolute terms. Connected to vehicle specifications such as weight and age.
TRAD3, TRAD5 (Customer): Measured in absolute terms. Connected to policyholder information such as age.


F. Evaluation

The objective of the models is to predict the rate at which a customer will make insurance claims resulting in a net positive cost for the insurer. However, there is no real “rate” to compare the predictions to. Instead, the labels that can be used for validation are discrete values representing the number of claims in a given day or month.

The performance of the models was evaluated using multiple methods:

1) The root mean squared error (RMSE) was used as the primary metric for evaluating goodness-of-fit.

2) 10-fold cross validation was used to check for overfitting.

3) The models' predicted total number of claims in the test set was compared to the actual number of claims.

4) Training and validation loss were compared to evaluate how much the performance improvement differed between the training and validation sets.

5) Mean claim frequency predictions were compared to historical claim frequencies to evaluate whether they were realistic.
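The first two of these checks could be implemented as in the following sketch, where fit_model is a hypothetical callable that trains either of the two models on the given subset.

import numpy as np
from sklearn.model_selection import KFold

def rmse(pred, actual):
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

def cross_val_rmse(fit_model, X, y, n_splits=10):
    """10-fold cross validation; a small spread in the fold-wise RMSE
    suggests the model is not overfitting."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=n_splits, shuffle=True,
                                    random_state=0).split(X):
        model = fit_model(X[train_idx], y[train_idx])
        scores.append(rmse(model.predict(X[val_idx]), y[val_idx]))
    return float(np.mean(scores)), float(np.std(scores))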

VII. RESULTS

Fig. 2. Predicted versus actual claims for the Gradient Boosting Machine and Neural Network on groups of 149 customer months. The quantiles have been created from the predicted claims with the first quantile containing the 149 customer months with the lowest predicted risk, the second quantile containing the 149 customer months with the second lowest predicted risk, etc.

A. Model comparison

When plotting the predictions for small groups of 149 customer months (Figure 2), it can be observed that the models do a poor job of predicting the actual outcome, with some quantiles predicting as many as five times the number of claims compared to the actual claims. As the size of the groups increases, however, the predictions become increasingly accurate, with a distinct correlation between the number of predicted claims and the actual claims.

Fig. 3. Predicted versus actual claims for the Gradient Boosting Machine and Neural Network on groups of 2988 customer months. The quantiles have been created from the predicted claims with the first quantile containing the 2988 customer months with the lowest predicted risk, the second quantile containing the 2988 customer months with the second lowest predicted risk, etc.

Fig. 4. SHAP values for 12 telematic variables and 5 traditional variables for the GBM. Shows the individual feature importances in descending order, with each dot representing a prediction of the model. Negative SHAP values indicate a lowering effect on the model output, while positive SHAP values indicate an increasing effect.

Fig. 5. SHAP values for 12 telematic variables and 5 traditional variables for the NN. Shows the individual feature importances in descending order, with each dot representing a prediction of the model. Negative SHAP values indicate a lowering effect on the model output, while positive SHAP values indicate an increasing effect.

(6)


Another observation is that the NN predictions have smaller errors on average compared to the GBM predictions (Table II). It can also be seen that both models predict claim rates of similar magnitude.

Finally, it should be noted that while the models have some degree of predictive power, it is difficult to evaluate exactly how good they are due to the small number of claims in the test set. The relative errors for the quantiles in Figure 2 are large (some are infinite) due to the small groups. The relative errors for the quantiles in Figure 3 are smaller (10.49% on average for the NN and 24.45% for the GBM), but so is the sample size, preventing the strength of the models from being proven.
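The grouping behind Figures 2 and 3 can be reproduced with a few lines of NumPy, as sketched below: customer months are sorted by predicted risk, cut into equally sized groups, and predicted and actual claims are summed per group. The names are hypothetical.

import numpy as np

def quantile_comparison(preds, actual, group_size):
    """Sum predicted and actual claims per risk-sorted group of customer
    months, e.g. group_size=149 for Figure 2 or 2988 for Figure 3."""
    order = np.argsort(preds)
    n_groups = len(preds) // group_size
    rows = []
    for g in range(n_groups):
        idx = order[g * group_size:(g + 1) * group_size]
        rows.append((preds[idx].sum(), actual[idx].sum()))
    return rows  # one (predicted, actual) pair per quantile group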

B. Feature importance

Figure 4 and Figure 5 show the feature importance plots generated from the SHAP values for both the GBM and the NN. The features are ordered by impact on the model output in descending order. Each dot represents a prediction of the model, and the color of the dot indicates whether the feature value was high or low in the specific prediction.

Negative SHAP values indicate that the value of the specific feature had an effect of lowering the model output for a specific prediction. Likewise, positive SHAP values indicate an effect of increasing the model output for a specific prediction.

As shown by these plots, TEL1 and TEL8 are the most important features in both models. The order of the remaining features differs, with the GBM seemingly placing a higher weight on telematics variables for its predictions than the NN. Both models place telematics variables among their three most important features, however. The features constructed from externally collected data generally carry a relatively high weight in the GBM, with TEL12 and TEL9 placing in the five most important features for the model. This applies less to the NN, but TEL9 still places as the sixth most important feature in this model.

Another observation that can be made from these plots is that several features seem to have high values on one side of the line marking zero, and low values on the other. This implies that certain types of behavior will indeed affect the model’s prediction of risk either positively or negatively. For instance, TEL9 and TEL12 are both features related to distance driven under certain conditions (either in certain weather conditions or in certain areas). The plot for the GBM model suggests that drivers who travel more often in these conditions will carry a higher risk in the model than those who do not. Likewise, TEL1 and TEL8 are both features measured in absolute terms, representing how much a driver travels (both the frequency of individual trips and the total distance travelled). In line with expectations, the predicted risk increases as driving becomes more frequent and trips cover longer distances.

Some unreasonable results arise when comparing the two plots to each other. The plot for the NN shows that high values for features TEL4 and TEL3, features that are known to be connected to high-risk driving behavior, contribute to a lower risk in the model. A possible explanation could be that there is too little data for the model to distinguish these features as indicators of high risk. It is again important to note that SHAP values do not provide causality, which means that the magnitude of a feature value cannot be directly proven to cause a higher or lower actual risk in real life based on these plots alone.

VIII. DISCUSSION

A. Interpreting the results

The GBM and the NN resulted in nearly identical predictive power in terms of RMSE, with the NN performing slightly better. Both of the models gave realistic predictions when compared to the historical claim frequencies and the cross validations showed that overfitting was not an issue.

TABLE II
RMSE AND CROSS-VALIDATION STANDARD DEVIATION

Model   RMSE    Standard deviation
GBM     0.128   0.002
NN      0.125   -

The SHAP values showed that the two features related to the frequency and distance of trips had the highest impact on model output for both models. This motivates a pricing model that takes the distance driven into account, due to the risk seemingly increasing with the distance travelled. Furthermore, both telematics data and a-priori data are important for the predictions. Considering the fact that both models show at least one a-priori data feature amid the telematics features in the upper half of the feature importance ranking, it is likely that a mix of the two data types offers a balanced bundle of information. However, the results reveal that it is ultimately telematics data that affects the model output the most.

Several implications can be raised from the differences and similarities between the SHAP values for the GBM and the NN. The two models agree on the two most important features, but disagree on the ordering of many of the other features. Some very obvious examples are features TRAD4 and TEL12, which are both valued highly in terms of feature importance in one model, but seemingly have very low impact in the other. Features TEL3 and TEL4, both being connected to high-risk driving behavior, also play different roles in the two models. Based on real-life knowledge about this type of behavior (such as speeding being seen as more aggressive driving, and therefore carrying a higher risk), the GBM's interpretation of the features, where high values correspond to higher risk, is more logical than the NN's reverse interpretation. As mentioned when presenting the results, it is possible that the ordering and SHAP values are partly arbitrary for the features below the top two due to a lack of data and a relatively large number of features. Data rows containing claims were relatively few compared with data rows not containing any claims at all. This, coupled with the fact that several features related to high-risk driving behavior (such as major instances of speeding or nighttime driving) were heavily skewed towards zero, may have caused some patterns that are known to exist to not be represented in the models. Despite some differences, the two models show similar results regarding whether a low or high feature value leads to a higher or lower risk in the model output (features TEL2, TEL5, and TEL9 to name a few), suggesting that there might still exist common patterns in the data that both models have been able to identify.

B. Limitations and future research

The biggest limitation to the study was the amount of data available for training the models. Compared to the major insurance companies, the data used for this study is very limited. With only a few thousand recorded accidents, the lack of information contained in the data itself limits the possibilities of making accurate predictions. Some features have values that are zero-skewed, which further limits the predictive power due to these features having very few occurrences that are non-zero in relation to the amount of data rows. Furthermore, a lot more work could be done to improve the quality of data aggregation and to tune the hyperparameters of the models.

There are also multiple aspects of the study that could be further explored. Perhaps the most interesting addition would be to compare the performance of the ML models to state-of-the-art linear models.

Another interesting area to explore is how the models compare when trained using only traditional variables versus a mix of traditional and telematics variables. Likewise, further work can be done to compare the feature importances between different models and to attempt a causality analysis.

We also discovered that what could be considered aggressive driving behavior, speeding and hard braking, was more prominent on the days when accidents occurred. This indicates that predictions could possibly be improved by studying the driving behavior at the resolution of a day rather than a month, because these aggressive behaviors are difficult to identify among policyholders who drive safely on average if the values are aggregated over a month.


Fig. 6. Black-box, grey-box and white-box models compared on attributes of explainability. [12]


Future research could explore whether aggregating by day, or not aggregating at all, yields different results.

IX. INTEGRATING AN ML MODEL FOR PREDICTING RISK INTO A PRODUCT

This section is dedicated to discussing the opportunities and challenges of integrating an ML model for risk prediction into the product portfolio of an insurance company.

A. Model explainability

Integrating these alternative methods of predicting risk into the pricing of insurance policies will inevitably introduce some new challenges. One of the main challenges is connected to the explainability of the model. Models can generally be divided into white-box models, black-box models and grey-box models, with white-box models being models for which the internals are completely observable and understandable. Black-box models, on the other hand, cannot be observed or understood to the same extent as white-box models.

A traditional GLM would fit into the white-box definition, as the underlying mathematics are simple. A neural net, with the abstract meanings of weights in hidden layers, would classify as a black-box model, and a GBM would fall in the grey-box category, i.e. somewhere in between the other two.

There are two main perspectives to consider when deciding on which model to use:

1) The accuracy of predictions: With better predictions, the insurer can accurately align the price of insurance policies with the actual risk associated with each policyholder, creating financial benefits for both the insurer and its safe-driving customers.

2) The transparency and predictability of the price: If machine learning models were to be used instead of the traditional GLMs, it would come at the expense of the explainability of the model. The linearity of a GLM gives it the advantage of having a single factor that is able to explain the effect of each variable on the price. For instance, if a customer wants to know why their premium has increased, it can be explained by looking at how their data has changed from the previous month and multiplying the differences with their respective factors.

With a non-linear model, however, this is not as easily done. While a sensitivity analysis can be used to give a rough idea of how the model reacts to changes in the factors, it is not enough to explain the full extent to which it changes its predictions. This is because it can capture complex and non-linear correlations, potentially involving more than one variable. The lack of interpretability is especially problematic in the case of overfitting, where the model wrongly predicts high risk. While measures have been taken to prevent this, mainly by early stopping in the case of increasing validation loss, it cannot be completely eliminated. The consequence is an increased risk of customer dissatisfaction. A quantitative analysis of the trade-off between accuracy and transparency has been left out of this study.

B. Pricing strategies

A central question to integrating a new model for predicting risk into the value offering is what pricing strategy to choose. We suggest considering the following key aspects when deciding on what pricing model to use:

1) The value of data: Determining the financial value of adding more data to the model is a difficult task due to the complex relationship between accuracy of predictions and fiscal performance. To model the relationship between these two variables, the price sensitivity of customers must be estimated, as well as the financial effects of the attraction and repulsion of customers that will happen when pricing inaccuracies, both over- and underpricing, are corrected.

As previously discussed, the models of this study learned some odd behaviors that are likely to result in systematic over- or underestimation of risk if included. This suggests that adding more data to the model will allow a greater set of variables to be included in the risk assessment, ultimately leading to better predictive performance.

2) Brand loyalty: Auto insurance is a product that is characterized by strong brand loyalty, with previous studies having found average annual customer retention rates of 70% to 90% [13], [14].

This can partially be explained by high search costs, i.e. the cost to the customer of searching and comparing different insurance policies [14]. However, a study of the Swiss market for health insurance found that customers were reluctant to switch insurer despite large price discrepancies between policies with similar coverage [15].

These findings suggest that the pricing of insurance is primarily important to attract customers, whereas it is less important to retain them.

3) Online conversion: Around 70% of consumers globally use digital research, such as price comparison, before purchasing a policy [16]. This highlights the fact that performance in online channels is critical to attract customers as an insurer.

A study on search, obfuscation and price elasticity on the internet found empirical evidence suggesting that the presence of price search on the internet leads to extreme demand elasticity [17]. This finding suggests that to effectively convert customers, insurers need to consider a pricing model that supports them in the task of ranking high on price comparison sites.

4) Loss prevention: Another aspect to take into account is the fact that the pricing model itself can serve as a financial incentive for customers to drive less and more carefully. This is in the interest of the environment, the safety of the driver, and the insurer. Researchers have found that usage of a product or a service is closely related to the buyer’s awareness of its cost [18]. This finding leads to the conclusion that for customers to adjust their behavior, the magnitude of potential savings, as well as the behavior that is required to achieve these savings, need to be communicated effectively.

C. Implications for the product portfolio strategy

There are multiple ways that an ML-based insurance product can be integrated into the product portfolio of an insurance company. For the integration to be successful, it is critical to understand the different customer segments. Rogers [19] proposed a division of adopters into five categories: innovators, early adopters, early majority, late majority, and laggards.


Fig. 7. Rogers' adopter categories, visualizing the size of different groups of customers, categorized by their speed to adopt new technology. [19]


The innovators are described as seekers of new ideas, with high exposure to media, high social status, large interpersonal networks, and a high tolerance of uncertainty. These are the first to adopt new innovations and are also the first to review and spread information about them.

The novelty of an insurance pricing model based on black-box ML risk predictions from telematics is likely to cause some scepticism. Prospective customers may be uncomfortable with the idea of letting a new, unknown algorithm judge their driving skills. To build trust and to be able to sell the product to the masses, early adopters must be convinced of the benefits of the new technology. Also, customers need to be assured that the expected price of the policy is lower than that of alternative products. This can be achieved in multiple ways, for instance by showing statistics on how the premiums of policyholders with the ML product compare to the premiums of other policies, and by providing financial incentives for testing the product at an early stage.

An important note is that an ML product can coexist with other, less risky offerings. In this way, the product portfolio can meet demands from a wider group of customers, enabling the insurer to capture a greater market share.

D. Competence enhancing or competence destroying?

Another question to consider is whether the ML methods of risk prediction augment or replace the knowledge of today's actuaries. Schilling (2017) proposes to look at innovation as competence enhancing or competence destroying. Competence enhancing innovation is described as building on the core competencies of the business, whereas competence destroying innovation does not.

The ML methods could be considered competence destroying for insurers lacking ML expertise in their teams. This is because ML methods are fundamentally different from the GLMs traditionally used by actuaries. This can potentially open the door for new actors in the insurance space. Companies with strong ML expertise are now in the position to develop models for risk prediction that can compete with the models developed by the large insurance companies over many decades.

However, the barriers to entry are high, because the accuracy of risk assessments is dependent on large quantities of data, leaving mature insurance companies with a significant advantage. To gain access to data to train the models on, as well as to a large customer base, it is natural for companies with ML expertise to collaborate with established insurance companies. With the introduction of telematics, however, there are not yet many actors with substantial amounts of this new kind of data, potentially creating a window of opportunity for new entrants.

E. Communicating an ML-based pricing model

When pricing insurance policies, some basic data relating to the customer always has to be collected. However, telematics data adds another layer of data collection that pertains to each individual customer's behavior. Justifying this data collection by making the exchange of data for added customer value transparent is key to building trust between the firm and both existing and potential customers.

Surveys reveal that customers generally perceive enhancements made to a product or service to be a fair exchange for their data. Self-reported data such as email address and age is valued less than exhaust data such as location data. [20]

Telematics data falls under the exhaust data category, which means that customers will generally expect more value in return for their data. It can be argued that telematics data is self-reported, as the entire function of the box device is to collect driving data, and users clearly agree to using this device when signing up for an insurance policy. However, the data is still collected passively without much user input, and there might be a group of potential customers who are hesitant to sign up due to the nature of the exhaust data collection. It is therefore in the firm's best interest to focus on effectively communicating how an ML-based pricing model can create value for the customer.

There are three principles that firms can practice to build trust: education, giving control to users, and delivering in-kind value. Users need to understand how the data is used to be able to trust the firm. More control can be given to users over their data by letting them see their own data at any time, for instance through a web application. Customers are also more willing to provide data when they know what value they can expect in return. [20]

As established previously, collecting telematics data offers an opportunity for value co-creation between the firm and its customers. While communicating exactly how the driving behavior affects the premium is difficult with an ML-based model, it is also possible that customers do not care about all the details behind the pricing algorithm. Having a participative platform that attracts customers to actively engage with the improvement of the service (in this case tuning the premium pricing) can lead to customized benefits for the customer, and operational benefits for the firm. [21]

Since customers value their personal data, and research has shown that value co-creation can occur between the customers and the firm through sharing of data, we propose a platform that offers users access to their data in an easily interpretable manner to facilitate customer education and to provide some control to the users. This meets two of the aforementioned principles for building trust, and the delivered value can also become more visible to users as they are able to monitor their own behavior changes as their premium fluctuates.

Furthermore, this can also become an opportunity for the firm to provide users with feedback on how their data is used, and what they can do to potentially increase the value provided in both directions. In this way, such a platform for communication with users can help the firm achieve a higher degree of transparency, while offering benefits both to the customers and the firm.

X. CONCLUSIONS

Gradient boosting machines and neural networks have clear advantages compared to traditional linear models for risk prediction in that they can capture non-linear, complex correlations involving multiple variables. While these models require greater amounts of data compared to linear models in order to reach a satisfactory level of performance, they also have the potential to improve the accuracy of predictions beyond what a linear model can achieve.

Our study shows that both methods are able to make predictions of claim frequencies that correlate with actual claim frequencies on data that was not used for training. The strongest performance was achieved with the feed-forward neural network. However, the claims in the test set are too few to confidently draw conclusions about the models’ predictive power.

Explainability and interpretability are areas where the traditional mathematical models are still clearly superior to the machine learning models. This is an obstacle that can weaken the ability of insurers to provide useful feedback to policyholders about how they can reduce their exposure to risk, and thereby lower their insurance premiums.


The strength of telematics in the task of predicting claim frequencies was another important finding of the study. Two of the telematic variables vastly outperformed the other variables in this regard, and multiple other telematic variables also increased the predictive power of the models.

Still, it is clear that variables relating to the attributes of policyholders and their vehicles have significant predictive power.

We have also provided some insights into the opportunities, challenges and key aspects to consider when integrating a machine learning model for predicting risk into the product portfolio of an insurer. These revolve around the pricing of insurance policies, communicating the product to customers, and methods of building and accessing the competencies necessary to succeed in this task.

Finally, it can be said that machine learning and telematics for risk prediction in car insurance are topics still in their early days. There is much left to explore, both in the area of making the predictions themselves and in developing business models around them. An exciting future awaits!

REFERENCES

[1] J. Boye, "Neural networks," Lecture, 2019-11-20.
[2] C. Molnar, "Interpretable machine learning, chapter 5.10," https://christophm.github.io/interpretable-ml-book/shap.html, accessed: 2020-05-19.
[3] R. Verbelen, K. Antonio, and G. Claeskens, "Unravelling the predictive power of telematics data in car insurance pricing," Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 67, no. 5, pp. 1275–1304, 2018.
[4] M. F. Carfora, F. Martinelli, F. Mercaldo, V. Nardone, A. Orlando, A. Santone, and G. Vaglini, "A "pay-how-you-drive" car insurance approach through cluster analysis," Soft Computing, vol. 23, no. 9, pp. 2863–2875, 2019.
[5] T. Litman, "Distance-based vehicle insurance feasibility, costs and benefits," Victoria, vol. 11, 2007.
[6] R. Henckaerts, M.-P. Côté, K. Antonio, and R. Verbelen, "Boosting insights in insurance tariff plans with tree-based machine learning methods," arXiv preprint arXiv:1904.10890, 2019.
[7] Y. Yang, W. Qian, and H. Zou, "Insurance premium prediction via gradient tree-boosted tweedie compound poisson models," Journal of Business & Economic Statistics, vol. 36, no. 3, pp. 456–470, 2018.
[8] "Kommuntyper - stad och landsbygd," https://tillvaxtverket.se/statistik/regional-utveckling/regionala-indelningar/kommuntyper.html, accessed: 2020-05-11.
[9] T. Brijs, D. Karlis, and G. Wets, "Studying the effect of weather conditions on daily crash counts using a discrete time-series model," Accident Analysis & Prevention, vol. 40, no. 3, pp. 1180–1190, 2008.
[10] "Fatality Facts 2018 urban/rural comparison," https://www.iihs.org/topics/fatality-statistics/detail/urban-rural-comparison, accessed: 2020-05-11.
[11] "CS231n Convolutional Neural Networks for Visual Recognition quick intro," https://cs231n.github.io/neural-networks-1/actfun, accessed: 2020-05-11.
[12] R. F. Turkson, F. Yan, M. K. A. Ali, and J. Hu, "Artificial neural network applications in the calibration of spark-ignition engines: An overview," Engineering Science and Technology, an International Journal, vol. 19, no. 3, pp. 1346–1359, 2016.
[13] L.-H. Lai, C.-T. Liu, and J.-T. Lin, "The moderating effects of switching costs and inertia on the customer satisfaction-retention link: auto liability insurance service in Taiwan," Insurance Markets and Companies: Analyses and Actuarial Computations, vol. 2, no. 1, pp. 69–78, 2011.
[14] E. Honka, "Quantifying search and switching costs in the US auto insurance industry," The RAND Journal of Economics, vol. 45, no. 4, pp. 847–884, 2014.
[15] R. G. Frank and K. Lamiraud, "Choice, price competition and complexity in markets for health insurance," Journal of Economic Behavior & Organization, vol. 71, no. 2, pp. 550–562, 2009.
[16] PwC, "Insurance 2020: The digital prize – taking customer connection to a new level," https://www.pwc.se/sv/forsakring/assets/insurance-2020-the-digital-prize-taking-customer-connection-to-a-new-level.pdf, accessed: 2020-05-29.
[17] G. Ellison and S. F. Ellison, "Search, obfuscation, and price elasticities on the internet," Econometrica, vol. 77, no. 2, pp. 427–452, 2009.
[18] J. Gourville and D. Soman, "Pricing and the psychology of consumption," Harvard Business Review, vol. 80, no. 9, pp. 90–96, 2002.
[19] E. M. Rogers, Diffusion of Innovations. Simon and Schuster, 2010.
[20] T. Morey, T. Forbath, and A. Schoop, "Customer data: Designing for transparency and trust," Harvard Business Review, vol. 93, no. 5, pp. 96–105, 2015.
[21] K. Xie, Y. Wu, J. Xiao, and Q. Hu, "Value co-creation between firms and customers: The role of big data-based cooperative assets," Information & Management, vol. 53, no. 8, pp. 1034–1048, 2016.

Frithiof Ekström is pursuing a Master's degree in Machine Learning at KTH Royal Institute of Technology in Stockholm, Sweden. His main focus in this study was building and evaluating the Neural Network model for risk prediction as well as studying the critical factors to succeed in pricing an ML-based insurance.

Anton Chen is pursuing a Master’s degree in Machine Learning at KTH Royal Institute of Technology in Stockholm, Sweden. His main focus in this study was the process of assigning weather codes to individual trips, the analysis of features, and the examination of possible approaches to communicating an ML-based insurance.

