Validation Techniques for Credit Risk Models - Applying New Methods on Nordea’s Corporate Portfolio



Kungliga Tekniska Högskolan

Degree Project in Engineering Physics

Validation Techniques for Credit Risk Models - Applying New Methods on Nordea's Corporate Portfolio

Author:

Katja Dalne dalne@kth.se

Supervisor Nordea: Jon Enqvist, jon.enqvist@nordea.com

Supervisor KTH: Gunnar Englund, gunnare@kth.se

May 13, 2013


Abstract

Nordea, the largest corporate group of its kind in Northern Europe, has a great need to evaluate its customers' ability to repay a debt as well as their probability of bankruptcy. The evaluation is done by different statistically derived internal rating models, based on logistic regression. The models have been developed using historical data and attain good predictiveness when plenty of observational data is available for each specific customer. In order to improve the rating models, Nordea wants to implement two new validation methods recommended by the reputable credit rating agency Moody's: information entropy and accuracy ratio with simulated defaults. A default is a customer that either is close to or is bankrupt. Information entropy measures how much information is contained within a given variable, while accuracy ratio with simulated defaults validates the ability of the model to discriminate between "good" and "bad" customers when simulating default data. The simulation is used when sufficient default data does not exist, which is the case for large corporates. After implementing these validation methods on the same set of data that Moody's were given, the results that Moody's presented could be confirmed by the chosen implementation method. This method was then used for analysis of a general set of data, and it could be concluded that the use of each validation method recommended by Moody's would improve the validation of the model.

Sammanfattning

Nordea is the largest banking group in Northern Europe. It has a great need to know its borrowers' ability to repay their debts and to accurately estimate the probability of bankruptcy. The evaluation is done with credit rating models based on logistic regression. The models build on historical data and predict well when the underlying data is extensive in the individual case. In order to develop and improve the model used for large corporates, Nordea wishes to implement two new validation methods recommended by the reputable credit rating agency Moody's: information entropy and accuracy ratio with simulated defaults. A default is a customer that either is at risk of bankruptcy or has gone bankrupt. Information entropy measures how much information is contained in a given variable, while accuracy ratio with simulated defaults validates how well a model can distinguish between "good" and "bad" customers when simulated default data is used. The simulation is used when there is not enough default data available, which is the case when the number of observed defaults within a certain borrower category is small. After implementing the validation methods on the same data that Moody's used, the results Moody's presented could be confirmed with the chosen implementation method. This method was then used for analysis of a larger set of data, and the conclusion could be drawn that the use of each of the validation methods proposed by Moody's leads to an improved validation of the model.


Contents

1 Introduction
    1.1 Background
    1.2 Aim of the Thesis
2 Description of the Present Situation
3 Theoretical Background
    3.1 Regression Analysis
    3.2 Logistic Regression
    3.3 The Corporate Rating Model
    3.4 Information Entropy
    3.5 Accuracy Ratio with Simulated Defaults
4 Data Collection
5 Methods
    5.1 Method to Confirm Moody's Results
        5.1.1 Information Entropy
        5.1.2 AR with Simulated Defaults
    5.2 Method for General Set of Data
6 Implementation
    6.1 Confirming Moody's Results
        6.1.1 Information Entropy
        6.1.2 AR with Simulated Defaults
    6.2 Implementation on General Set of Data
7 Results
    7.1 Information Entropy - LC
    7.2 AR with Simulated Defaults - LC
    7.3 Information Entropy - C
    7.4 AR with Simulated Defaults - C
8 Analysis and Discussion
    8.1 Conclusions
    8.2 Recommendations
    8.3 Accuracy


Chapter 1

Introduction

1.1 Background

A credit rating model is used to estimate the credit risk of a company. For a bank, it is necessary to evaluate a company's ability to repay a debt and its probability of default in order to remain profitable. A default is defined as a company that is (or is close to being) bankrupt; the concept will be explained in detail later in the text.

Nordea has developed an internal rating system in which a statistically derived rating model produces a Probability of Default (PD) for a specific customer. The PD is thereafter transformed into a rating (on a 21-point scale), which is in turn quantified to give the repayment capacity of the customer. The statistical analysis relies on logistic regression, which will be explained in detail in the theoretical background. The logistic regression enables the model to predict a customer's future performance from historical behaviour.

Nordea has different rating models for different portfolios, i.e. different lines of business (corporate, shipping, financial institutions, real estate, bank, etc.). One of them is the corporate (C) portfolio, which contains corporations of all sizes, from small to large. A customer needs to fulfil one out of four criteria based on financial properties in order to be defined as a customer of the large corporate (LC) portfolio. Nordea decided to create a new rating model for the LC-portfolio and, once it was created, engaged an external world-leading rating agency, Moody's, to validate the model and give recommendations for improvements. When Moody's results were received, some of their recommendations were directly implemented while others were left for future discussion. Two of the remaining recommendations concerned Information Entropy and Accuracy Ratio with Simulated Defaults.


1.2 Aim of the Thesis

In order to improve the validation of Nordea's Corporate Rating Model, Nordea wants to examine the value of implementing Information Entropy as well as Accuracy Ratio with Simulated Defaults. Moody's presented their conclusion that these two validation methods would improve the validation of the model; however, no calculations or course of action were presented. To be able to implement the validation methods, Nordea primarily wants to examine the procedure in order to see which implementation method leads to the same results as Moody's attained concerning the LC-portfolio. Secondarily, the methods will be implemented on the C-portfolio, which contains much more default data, in order to see if they are reliable for a general set of data. Depending on the achieved result, Nordea will be recommended to develop its validation of the model according to the conclusions drawn.


Chapter 2

Description of the Present Situation

Nordea is a Nordic financial and banking services group, established in 2000 by a merger of Kreditkassen, Merita-Nordbanken and Unibank. This makes it the largest corporate group of its kind in Northern Europe. The parent company Nordea Bank AB is a Swedish joint-stock bank [1].

Nordea has several shareholders; the largest one, with 21,3% of the shares, is the Finnish insurance company Sampo. Nordea is listed on the NASDAQ OMX Nordic Exchange in Stockholm as well as in Helsinki and Copenhagen. Currently, Nordea serves 11 million private customers and 700 000 active corporate customers [2].

As for banks in general, Nordea is a financial institution and intermediary, accepting deposits and channeling those deposits into lending activities. Being an influential part of the financial system, banks are highly regulated. They generally follow minimum capital requirements based on international capital standards, known as the Basel Accords.

Nordea's headquarters is situated in Stockholm and is divided into several departments, of which "Group Capital & Risk Modelling" is one. This department is divided into five units, one of which is "Capital Analysis & Forecast". "Credit Rating & Analysis" is in turn a unit within it and the one where this thesis was commissioned and performed [3].


Chapter 3

Theoretical Background

3.1 Regression Analysis

A statistical model describes the relationship between variables. As opposed to a deterministic model, a statistical model accounts for the possibility that the relationship is not perfect. This is done by accepting unexplained variation, in the form of residuals. A way of describing a statistical model is:

Response = Systematic component + Residual component

The best model that can be found is one that yields a reasonable approximation to reality.

For a general linear regression model, the dependent variable y for observation number i (i = 1, 2, ..., n) is modeled as a linear function of (p − 1) independent variables (or covariates) x_1, x_2, ..., x_{p−1} as

y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_{p-1} x_{i(p-1)} + e_i \quad (3.1)

or in matrix form

y = X\beta + e \quad (3.2)

where

y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1(p-1)} \\ 1 & x_{21} & \cdots & x_{2(p-1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{n(p-1)} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} \quad (3.3)


Here, β_0 represents the intercept with the y-axis [4].

This means that one can take an observed data set of y and x values and fit it to a linear function (by ordinary least squares), and the result will be an estimate of the unknown regression coefficients (model parameters), the β's. These β's are then used in the model to estimate y_i from an additional data set of x values. This kind of regression analysis is generally used for prediction and forecasting [5].
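As an illustration of the fitting step, a single-covariate OLS fit can be written out directly from the closed-form estimates. This is a minimal sketch with made-up numbers, not Nordea's data or tooling:

```python
def fit_ols(xs, ys):
    # Closed-form OLS for one covariate: beta1 = cov(x, y) / var(x),
    # beta0 = mean(y) - beta1 * mean(x).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    beta1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    beta0 = my - beta1 * mx
    return beta0, beta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.1, 7.9]        # roughly y = 2x plus noise
b0, b1 = fit_ols(xs, ys)         # estimated intercept and slope
```

The estimated β's would then be applied to new x values to predict y, exactly as described above.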

Figure 3.1: The principle of linear regression

3.2 Logistic Regression

Logistic regression is based on linear regression and follows the same principles for obtaining a predicted value from observational data. What characterises logistic regression compared to linear regression is that the dependent variable, y_i, is binary. The observed outcome can only take two possible values, for instance: dead versus alive [5].

In binary models, it is of interest to model the response probability as a function of the predictors, keeping in mind that a probability has range 0 ≤ p ≤ 1. Since the linear predictor Xβ can take any real value, it is desirable that the model uses a link g(p) that transforms a probability to the range (−∞, ∞). A link that can be used for this purpose is the logit link (i.e. the logistic transformation), which transforms a response probability as follows:

logit(p) = \log\left(\frac{p}{1 - p}\right) \quad (3.4)

The ratio p/(1 − p) can be seen as the odds of success. The logit link corresponds to the logistic distribution, which has the density:


f(y) = \frac{\beta e^{\alpha + \beta y}}{\left[1 + e^{\alpha + \beta y}\right]^2} \quad (3.5)

The principles explained above are illustrated by figure (3.2) below, with the binary system defined as follows: "default" = 1 and "non-default" = 0.

Figure 3.2: The principle of logistic regression

Logistic regression modelling is nowadays commonly used as an important analysis method in a variety of domains, such as biomedicine, engineering, biology and finance [4-6]. For instance, logistic regression is used as a statistical analysis tool in rating models, as can be seen in the section below.
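The logit link of formula (3.4) and its inverse can be sketched directly; this is a small illustrative snippet, not part of the rating model itself:

```python
import math

def logit(p):
    # Logistic transformation: maps a probability in (0, 1)
    # onto the whole real line, formula (3.4).
    return math.log(p / (1.0 - p))

def inv_logit(z):
    # Inverse link: maps a linear predictor back to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))
```

Applying `inv_logit` to a fitted linear predictor is what turns the regression output into a response probability.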

3.3 The Corporate Rating Model

The Corporate Rating Model that Nordea has developed is statistically derived and based on a logistic regression of a historical sample of data. The data consist of the customers' repayment capacity, represented by so-called factor values. These are in turn mapped to factor scores (FS), which are easier to interpret as they are integers from 1 to 6; the mapping is done through a so-called cut-off method. By running a logistic regression, the data are used to find the regression coefficients, the β's, which in turn represent the factor weights. The mathematical formula for the logistic regression is:

logit(PD) = \ln\left(\frac{PD}{1 - PD}\right) = \beta_0 + \beta_1 \times FS_1 + \dots + \beta_n \times FS_n \quad (3.6)

The β's are the coefficients of the different quantitative and qualitative factors, and therefore n = 12 in the regression equation for this model.


To obtain the best possible model, a final factor selection is done via a backwards elimination process in order to exclude unintuitive or insignificant factors. The criterion for a factor to be unintuitive is that it gets a negative regression coefficient. This is due to the fact that the regression coefficients, i.e. the factor weights, must sum up to 100% as the outcome of the model is a probability. If there are several negative regression coefficients, the factor with the largest negative coefficient is excluded. A new regression is then performed, repeating the process until no factors with negative regression coefficients remain. The criterion for a factor to be insignificant is a p-value above the significance level 0,05 [7]. In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true [8].

Once the β's are derived, by finding the best combination of the factor scores as well as going through the factor selection, additional independent observations for a specific customer are used in the model in order to predict its outcome: the estimated PD for that customer.
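The prediction step amounts to evaluating formula (3.6) and inverting the logit link. The coefficients and scores below are hypothetical, purely for illustration (the real model has n = 12 factors):

```python
import math

# Hypothetical estimated coefficients: intercept beta0 followed by
# non-negative factor weights beta1..beta3 (illustrative values only).
betas = [-6.0, 0.4, 0.3, 0.2]
scores = [3, 5, 2]               # factor scores FS_1..FS_3 on the 1-6 scale

# Linear predictor z = beta0 + beta1*FS_1 + ... + betan*FS_n, formula (3.6)
z = betas[0] + sum(b * fs for b, fs in zip(betas[1:], scores))
pd_estimate = 1.0 / (1.0 + math.exp(-z))   # invert logit(PD) = z
```

The resulting PD is then mapped to a rating grade on the 21-point scale, as described earlier.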

As default data is used as input to the logistic regression analysis, the criterion for good predictiveness of the model is to have many default observations. Hence, the development of the model requires historical data containing a sufficient amount of default observations of good quality. 80% of the observational default data is used to build the model (the estimation data), while the remaining 20% (the validation data) is put aside in order to validate the final model once it has been created.

The observation period has to be long enough to give a long-term estimation of a customer's future performance, but at the same time short enough to detect time trends. With these objectives in mind, as well as regulations concerning Basel II, Nordea chose the observation period to be 12 months.

The main objective of the model remains to make a distinction between "good" and "bad" customers. Nordea has defined "bad" customers as defaulted customers and "good" customers as non-defaulted ones. Consequently, there are no intermediate customers [7]. This is why logistic regression is used for the modelling, see figure (3.2). See the detailed definition of "good" versus "bad" customers with corresponding PD in table (3.1), as well as the relationship between PD and rating in figure (3.3).


Rating grade   Description                                  PD
"Bad" - DEFAULTED
0-             Loss, bankruptcy with no hope of recovery
0              Loss, non-performing
0+             Doubtful but performing
"Good" - NON-DEFAULTED
6+             Excellent+                                   0,030%
6              Excellent                                    0,034%
6-             Excellent-                                   0,048%
5+             Strong+                                      0,070%
5              Strong                                       0,104%
5-             Strong-                                      0,143%
4+             Good+                                        0,196%
4              Good                                         0,323%
4-             Good-                                        0,536%
3+             Acceptable+                                  0,850%
3              Acceptable                                   1,310%
3-             Acceptable-                                  2,038%
2+             Weak+                                        3,388%
2              Weak                                         5,208%
2-             Weak-                                        8,285%
1+             Critical+                                    12,430%
1              Critical                                     17,735%
1-             Critical-                                    26,845%

Table 3.1: Definitions of performance

Figure 3.3: PD versus Rating


3.4 Information Entropy

Information entropy is defined by Moody's as a measure of the information included within a given variable; here it measures the diversity of answers to a subjective question. It is used to detect subjective questions that contribute little or no information, since it is of high interest in statistical analysis to avoid biased information. Generally, information entropy is defined as the minimum number of bits (either True or False) required to preserve the information contained within a given question. As an example, a question with eight different answers, each equally likely to occur, requires three bits to store the information, since 2^3 = 8 possible combinations. If the answers are not equally likely to occur, fewer bits are required to store the same amount of information. A question with n possible answers and probability p_i for answer i to occur has the information entropy, H, defined as:

H = -\sum_{i=1}^{n} p_i \times \log_2(p_i) \quad (3.7)

Consequently, the information entropy reaches its maximum when each answer is equally likely, p_i = 1/n. In this analysis, Moody's regarded relative information entropy, defined as the ratio of observed entropy to maximum entropy. The relative information entropy can be interpreted as the proportion of the possible information actually used by the question; when this proportion becomes low, the conclusion can be drawn that the question is suspect. The figure below, see (3.4), provides an example of a biased question with low information entropy. In this analysis, H ≥ 0,70 shows that a question has average or high information content and H ≤ 0,50 that it has low or non-existent information content [9].

Figure 3.4: Example - Biased question
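The bit-count example above can be checked numerically with formula (3.7). A minimal sketch, with an arbitrary made-up distribution standing in for a biased question:

```python
import math

def entropy_bits(probs):
    # Shannon information entropy, formula (3.7); p = 0 terms contribute 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Eight equally likely answers require three bits, as in the text's example.
h_uniform = entropy_bits([1.0 / 8] * 8)
# A biased question (one dominant answer) carries less information.
h_biased = entropy_bits([0.9, 0.05, 0.03, 0.02])
```

The more skewed the answer distribution, the lower the entropy, which is what flags a suspect question.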

3.5 Accuracy Ratio with Simulated Defaults

Accuracy ratio (AR) is defined as follows:

AR = \frac{B}{A + B} \quad (3.8)

where

B = the area between the actual model and the random model
A = the area between the perfect model and the actual model

Hence, the perfect model (the one that equals reality) would have an AR of 100% and a random model an AR of 0%. In this analysis, an AR ≥ 30% indicates above average to high accuracy in the model's capacity to discriminate between "good" and "bad" obligors, while an AR < 10% indicates low accuracy. Typically for the LC-portfolio, as for other low-default portfolios, there is an insufficient number of observed defaults. To overcome this, "bad" obligors in the portfolio are approximated in order to calculate the AR: the worst 10% of the customers in the portfolio are used as indicators of default [9].

To visualise the AR, see figure (3.5) below. It shows that the perfect model occurs when all defaults happen at the lowest rating, which in turn is the goal of the rating process.

Figure 3.5: Example - AR


Chapter 4

Data Collection

Due to the large amount of data required for the analysis, already existing data was used throughout the project. Data was sourced from Nordea’s different rating databases.

The observational data for the LC-portfolio consisted of 414 customers and was taken from 2011. The observational data for the C-portfolio consisted of 17 900 customers and was taken from 2010 and 2011, with the actual year being T = 2011, i.e. t = T = 2011 and t = T − 1 = 2010.


Chapter 5

Methods

5.1 Method to Confirm Moody’s Results

Initially, it is of interest to confirm Moody's results by finding the path of calculations that led to their recommendations.

5.1.1 Information Entropy

Applying the idea of information entropy to the data set for the LC-portfolio that Moody's were given, the distribution of scores for each financial (quantitative) and qualitative factor is regarded.

Each factor can be seen as a question and the scores for each factor as possible answers. For instance: "What is the factor x_1 score for customer X? Answer: 3". Each factor can be scored by an integer from 1 to 6, hence n = 6 in our analysis according to formula (3.7). To calculate p_i, the number of customers having score i is divided by the total number of scores, i.e. the total number of customers. This is calculated for i = 1, ..., 6. According to formula (3.7), the observed information entropy can now be calculated for each factor. The maximum information entropy is calculated by setting p_i = 1/6 for all i. Finally, the relative information entropy is calculated as the observed information entropy divided by the maximum information entropy.
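The steps above can be sketched in Python; the score list below is made up for illustration (the real LC data covers 414 customers):

```python
import math

def relative_entropy(scores, n_answers=6):
    # Observed entropy of the score distribution divided by the maximum
    # entropy log2(n_answers) (uniform p_i = 1/n), per formula (3.7).
    counts = {s: scores.count(s) for s in set(scores)}
    total = len(scores)
    h_obs = -sum((c / total) * math.log2(c / total) for c in counts.values())
    h_max = math.log2(n_answers)
    return h_obs / h_max

# Hypothetical factor scores (1-6) for a handful of customers.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
rel_h = relative_entropy(scores)
```

A perfectly uniform score distribution gives a relative entropy of 1; a distribution concentrated on one score drives it toward 0.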

5.1.2 AR with Simulated Defaults

Applying AR to the data set for the LC-portfolio that Moody's were given, the same factors are evaluated as mentioned above. The score of each factor is compared to the expert rating, which is seen as the reality (whether the customer actually defaulted or not). By sorting the customers by score, those having the 10% lowest scores can be found and defined as defaulted; this is how the defaults are simulated. Building up the plot with the different areas involved in the AR, one puts the cumulative percentage of non-defaults ordered by rating (Non-default %) on the x-axis and the cumulative percentage of defaults ordered by rating (Default %) on the y-axis. The percentage of defaults within a specific rating contributes one data point to the plot. In order to calculate the area between the plots, an approximate method is used. The method is based on dividing the area under the plot into rectangles and triangles, see figure (5.1) below, where the method of area approximation is demonstrated for the case of three data points.

Figure 5.1: Method of area approximation

\text{Area under the plot} = \frac{x_1 y_1}{2} + (x_2 - x_1) y_1 + \frac{(x_2 - x_1)(y_2 - y_1)}{2} + (x_3 - x_2) y_2 + \frac{(x_3 - x_2)(y_3 - y_2)}{2}

= \frac{x_1 y_1}{2} + (x_2 - x_1) \frac{y_1 + y_2}{2} + (x_3 - x_2) \frac{y_2 + y_3}{2} \quad (5.1)

As the axes go from 0 to 1 (0 to 100%), the area between the perfect and the random model is A + B = 0,5, so according to formula (3.8):

AR = \frac{B}{A + B} = 2 \times B \quad (5.2)

Once the area under the plot has been calculated, B is obtained by subtracting the area 0,5 lying under the random model, and the AR then follows from formula (5.2).
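The area approximation and the resulting AR can be sketched as follows; the trapezoidal sums correspond to formula (5.1), and the two example curves are synthetic:

```python
def accuracy_ratio(points):
    # points: cumulative (non-default %, default %) pairs, sorted ascending
    # and ending at (1.0, 1.0); the origin (0, 0) is implicit.
    area = 0.0
    prev_x, prev_y = 0.0, 0.0
    for x, y in points:
        area += (x - prev_x) * (prev_y + y) / 2.0   # trapezoid, formula (5.1)
        prev_x, prev_y = x, y
    # The random model has area 0.5 under it and A + B = 0.5,
    # so AR = B / (A + B) = 2 * (area - 0.5).
    return 2.0 * (area - 0.5)

# A perfect model concentrates all defaults at the lowest ratings:
ar_perfect = accuracy_ratio([(0.0, 1.0), (1.0, 1.0)])
# A random model climbs along the diagonal:
ar_random = accuracy_ratio([(0.5, 0.5), (1.0, 1.0)])
```

The perfect curve yields an AR of 1 (100%) and the diagonal an AR of 0, matching the definitions in section 3.5.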

5.2 Method for General Set of Data

The next step is to apply the validation methods to a portfolio consisting of a bigger amount of data, the C-portfolio. This is done in order to show that they are valid in a more general case.

Firstly, the information entropy will be calculated for a total financial factor, where the financial factors have been weighted together. The data for the total financial factor is taken from t = 2010. The method used to confirm Moody's results (section 5.1.1) will be used, though regarding only one factor this time.

Secondly, the AR will be calculated for the C-portfolio using both calculated and approved rating, with the same method as for the LC-portfolio (section 5.1.2). Calculated rating has been produced directly by the rating model, while approved rating has been adjusted after the rating model due to specific criteria. However, the default simulation will be done by a different procedure than above. Instead of simulating the 10% worst performing customers as defaulted, customers up to a certain rating level will be defined as defaulted: in the first case, customers with rating 2+ or lower and, in the second case, customers with rating 1+ or lower. To enable analysis of the accuracy of the results due to the simulations, the AR will also be calculated for customers having rating 0+ or lower, i.e. without simulation (as 0+ and lower are defaults per definition). This value will be compared to the result in Nordea's internal validation report. The only issue when comparing the results is that the AR presented in the report has been calculated according to another definition. Instead of defining the quantity on the x-axis of the AR-plot as "Non-default %", like Moody's did (and as has been used throughout this project), Nordea has chosen to define the quantity on the x-axis as "Portfolio %", i.e. the total cumulative percentage of all customers in the portfolio. Therefore, in order to be able to compare the results, the AR is calculated both for "Portfolio %" and "Non-default %", so that the case without simulation can first be confirmed against the report value and then be compared with the calculations for the simulations.


Chapter 6

Implementation

6.1 Confirming Moody’s Results

In all implementations, Excel was used as the computational tool.

6.1.1 Information Entropy

The implementation of the methods mentioned in the preceding chapter required the same calculations for each factor when calculating the information entropy. To visualise this, the calculations for the financial factor x_1 are presented as an example, see figure (8.1) in the Appendix.

To find the total number of customers with each score out of the 414 customers in the LC-portfolio, the Excel function countif was used. Further, various arithmetic operations in Excel were applied in order to implement the chosen method and calculate H, defined in formula (3.7). The same procedure was followed for all factors. The result is presented in the following chapter.

6.1.2 AR with Simulated Defaults

Implementing the method used to calculate the AR required several calculations in Excel, see figure (8.2) in the Appendix, where the calculations for financial factor x_1 are presented; all factors (financial and qualitative) required the same calculations. Firstly, each customer's expert rating was mapped to a score (21-point scale) using the Excel function vlookup on a table containing the transformation between a rating (0-, ..., 6+) and a score (1-21). To simulate the defaults, the 10% worst customers were picked out of the total 414 customers, which yielded 41 customers. The 41st customer from the bottom (arranged by lowest score) had rating 2 (i.e. score 8), and therefore all customers with rating 2 or lower (this boundary differs between factors) were defined as defaulted, resulting in 52 customers being defined as defaulted. The default indicators were thus extended to include all customers having the same rating as those in the worst 10%, in order to avoid inconsistency in the simulation. The customers were then arranged by lowest financial factor x_1 score. The customers defined as defaulted in the preceding step were labelled with a default flag, 1 indicating default and 0 indicating non-default. Hence, analysis of the migration of defaults was now possible.
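The default-simulation step can be sketched as follows; the score lists are made up for illustration, while the tie-extension mirrors the behaviour described above (41 targeted customers becoming 52 flagged ones in the LC data):

```python
def simulate_defaults(scores):
    # Flag the worst ~10% of customers as simulated defaults, extending
    # the cut to every customer sharing the boundary score so that ties
    # are treated consistently.
    ranked = sorted(scores)
    cutoff = ranked[max(1, round(0.10 * len(scores))) - 1]
    return [1 if s <= cutoff else 0 for s in scores]

# Three customers share the boundary score, so all three get the flag.
flags = simulate_defaults([1, 1, 1, 5, 5, 5, 5, 5, 5, 5])
```

The returned flags (1 = default, 0 = non-default) then feed the cumulative default and non-default percentages used in the AR plot.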

Furthermore, to be able to calculate the AR, the total number of customers with each score was found in the same way as in section 6.1.1. To find how many defaults and non-defaults there were for each group of customers with a specific score, countif was used on the interval containing that score. From this, the columns with cumulative calculations (columns J, K and L in figure (8.2)) could easily be calculated by adding up the rows as the score increased.

The next step was to calculate the probability of being defaulted/non-defaulted for all customers within a specific rating (P_def/P_nondef). This was done by dividing the cumulative number of defaults (respectively non-defaults) for a specific score by the cumulative total number of defaults (respectively non-defaults). These probabilities are visualised by the area under the actual model in the plot, see figure (3.5). The total area, needed for the calculation of AR, was found by using the approximation formula (5.1). Once the area was calculated, the AR could be calculated using formula (5.2). The results for all factors are presented in the following chapter.

6.2 Implementation on General Set of Data

Implementing the method used to calculate the information entropy on a general set of data, the C-portfolio, the factor regarded was a total financial factor. This factor is a combination of the financial factors weighted together with weights defined by Nordea. The information entropy was calculated for the total financial factor using the same principles as in section 6.1.1; see figure (8.3) in the Appendix for details. The total financial factor in the Excel sheet is represented by column A.

Implementing the method used to calculate the AR for the C-portfolio, similar calculations were done in Excel as in section 6.1.2, see figure (8.4) in the Appendix. As mentioned in section 5.2, the simulation was done twice, regarding customers with rating 2+ or lower versus 1+ or lower as defaulted. Figure (8.4) presents the calculations for the calculated rating, with defaults defined as customers with rating 2+ or lower. The calculations followed the same principles for the second default simulation as well as for the approved rating. The rating from 2010 was seen as the internal rating (corresponding to the factor score in 6.1.2) and the rating from 2011 as the external rating (corresponding to the expert rating in 6.1.2). To be able to compare the results of the simulation with the AR for the C-portfolio without simulation, the same calculations were done for the case where customers having rating 0-, 0 or 0+ got the default flag, see figure (8.5) in the Appendix. As mentioned at the end of section 5.2, the AR was calculated both for "Portfolio %" and "Non-default %", where the calculations for "Portfolio %" followed exactly those for "Non-default %", though with the data for the total portfolio instead. All results concerning the implementation on the C-portfolio are presented in the following chapter.


Chapter 7

Results

7.1 Information Entropy - LC

Calculating the information entropy for each factor of the LC-portfolio gave the following results. Moody's results are presented in the table as well, in order to enable comparison. See table (7.1) below:

Factor        H [Moody's]   H [Result]
Financial factors
Factor x1     0,88          0,880047
Factor x2     0,86          0,862368
Factor x3     0,78          0,781216
Factor x4     0,88          0,881766
Factor x5     0,82          0,819734
Factor x6     0,91          0,908202
Qualitative factors
Factor y1     0,74          0,686433
Factor y2     0,66          0,559481
Factor y3     0,69          0,58514
Factor y4     0,70          0,589981
Factor y5     0,69          0,582711
Factor y6     0,69          0,641208

Table 7.1: Information Entropy LC-Portfolio

7.2 AR with Simulated Defaults - LC

Calculating the AR for each factor of the LC-portfolio, with the 10% worst customers defined as defaulted, gave the following results. Moody's results can be found in the table as well. See table (7.2) below:

Factor        AR [Moody's]   AR [Result]
Financial factors
Factor x1     52%            52%
Factor x2     19%            17%
Factor x3     41%            41%
Factor x4     39%            41%
Factor x5     66%            64%
Factor x6     59%            58%
Qualitative factors
Factor y1     58%            57%
Factor y2     50%            52%
Factor y3     52%            55%
Factor y4     55%            57%
Factor y5     50%            51%
Factor y6     60%            60%

Table 7.2: AR with Simulated Defaults LC-Portfolio

7.3 Information Entropy - C

Running the calculations for the C-portfolio for the total financial factor for t = T − 1 yielded the following result:

H_{T−1} = H_{2010} = 0,692155

where T = the actual year.

7.4 AR with Simulated Defaults - C

Calculating the AR with simulated defaults for the C-portfolio gave the following results. See table (7.3) below:

Default definition     AR Calculated   AR Approved
Score 2+ and lower     57%             58%
Score 1+ and lower     65%             68%

Table 7.3: AR with Simulated Defaults C-portfolio

Calculating the AR without simulated defaults for the C-portfolio gave the following results. See table (7.4) below:

Default definition     AR Calculated          AR Approved
Score 0+ and lower     67% (Portfolio %)      72% (Portfolio %)
                       68% (Non-default %)    74% (Non-default %)

Table 7.4: AR without Simulated Defaults C-portfolio


Chapter 8

Analysis and Discussion

8.1 Conclusions

From the results in section 7.1 it can be seen that the results for the financial factors correspond very well to Moody's. The results for the qualitative factors, on the other hand, were not satisfactory. It was found during the project that Moody's analysis of the qualitative factors had been based on a different data set than the one used here; those results are therefore presented but not taken into account when evaluating the implementation method. Since the results for the financial factors agreed so closely, the method chosen to confirm Moody's information entropy results can be considered correct. Section 7.2 shows that the AR for each factor also corresponds well to Moody's results, with a deviation of 1-3% that is most likely caused by the area approximation used in the calculations. Overall the results are satisfactory, and this confirmation makes the selected methods reliable enough to be applied to another, larger data set: the C-portfolio.

Regarding the result in section 7.3, and according to the definition of low versus high information entropy in section 3.4, the information entropy of the total financial factor for 2010, H(2010), is average. Since it is not far from being considered high, the result is satisfactory.

Table 7.4 in section 7.4 presents the results without simulated defaults. It is of interest to compare them with the results in Nordea's internal validation report, where the AR was 0.68 for the calculated rating and 0.74 for the approved rating [10].

Comparing these values with the "Portfolio %" values obtained from the implementation, there is a deviation of 1-2%. As with the AR calculations for the LC-portfolio, this is most likely caused by the area approximation used in the calculations. Since the obtained result corresponds well to the value in the validation report, the method can be considered reliable, and the AR with simulated defaults can then be analysed. According to the definition of high versus low AR, both simulations yield satisfactory results. Defining customers with score 1+ and lower as defaulted, however, resulted in a higher AR, which is clearly preferable for making the best predictions. This simulation is therefore applicable to a low-default portfolio.

8.2 Recommendations

In the light of these conclusions, Nordea is recommended to implement both validation methods in order to improve its rating model. As shown throughout the thesis, the methods are applicable to different types of portfolios. Nordea should strive to raise the information entropy by attaining as unbiased a distribution of the factor scores as possible, using information entropy as a tool to measure improvements and deteriorations.

Secondly, Nordea should consider implementing AR with simulated defaults for low-default portfolios, defining customers with score 1+ and lower as defaulted. This gives high accuracy and is a useful tool for any low-default portfolio.

8.3 Accuracy

A possible source of error throughout the project is the approximation of the area; see figure (5.1). The approximation is based on the principle of a Riemann sum, and its accuracy increases with the number of rectangles.

For the type of data used in this model, this is not possible, since the factor scores (FS) are fixed to integers from 1 to 6. By redefining the mapping from factor values to FS, for instance by accepting non-integers, the number of rectangles in the AR plot would increase and the approximation would come closer to reality. Mathematically, letting the number of rectangles approach infinity would make the area calculation exact. This is not attainable in practice, however, so some deviation from the approximation will always remain.
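The effect can be illustrated numerically. A small sketch, using a generic concave curve in place of the actual CAP curve: with only six rectangles (one per factor score) the left-endpoint Riemann sum misses the true area by several percent, and the error shrinks as the partition is refined.

```python
def riemann_area(f, n):
    """Left-endpoint Riemann sum of f over [0, 1] with n rectangles."""
    return sum(f(i / n) / n for i in range(n))

f = lambda x: x ** 0.5   # a concave curve, qualitatively like a CAP curve
exact = 2 / 3            # true area under sqrt(x) on [0, 1]
for n in (6, 60, 600):
    print(n, abs(riemann_area(f, n) - exact))
```

With n = 6 the error is close to 0.1, i.e. a deviation of the same order as the 1-3% observed in the AR comparisons, while finer partitions drive it toward zero, which is the point made above about redefining the factor-score mapping.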

(24)

References

1. Nationalencyklopedin: Nordea Bank AB. [web page]. [Read 2013-04-23]. Available: http://www.ne.se.focus.lib.kth.se/lang/nordea-bank-ab

2. Nordea Facts and Figures. [web page]. [Read 2013-04-21]. Available: http://www.nordea.com/About+Nordea/Nordea+overview/Facts+and+figures/1081354.html

3. Nordea Organisation. [web page]. [Read 2013-04-20]. Available: http://www.nordea.com/About+Nordea/Organisation/70432.html

4. Olsson U. Generalized Linear Models - An Applied Approach. Lund: Studentlitteratur; 2002.

5. Lang H. Topics on Applied Mathematical Statistics (booklet). Stockholm: KTH; 2012.

6. Hosmer D, Lemeshow S. Applied Logistic Regression. Ohio: A Wiley-Interscience Publication; 2000.

7. Nordea. [rating model documentation]. "Corporate Rating Model"; 2009.

8. Blom G, Enger J, Englund G, Grandell J, Holst L. Sannolikhetsteori och statistikteori med tillämpningar. 5th ed. Stockholm: Studentlitteratur; 2004.

9. Moody's Analytics. [validation report]. "Large Corporate Scorecard Review and Verification"; 2012.

10. Nordea. [validation report]. "Internal rating model validation report"; 2011.


Appendix

Figure 8.1: Calculations Information Entropy (LC) Excel


Figure 8.2: Calculations AR (LC) Excel


Figure 8.3: Calculations Information Entropy (C) Excel


Figure 8.4: Calculations AR with simulation (C) Excel


Figure 8.5: Calculations AR without simulation (C) Excel
