THE IMPACT OF DISTANCE, FEATURE WEIGHTING AND SELECTION FOR KNN IN CREDIT DEFAULT PREDICTION

Master Degree Project in Informatics with a specialisation in Data Science
Spring term 2020

Huicheng Zhang

Supervisor: Nikolaos Kourentzes
Examiner: Huseyin Kusetogullari


ABSTRACT

With the rapid spread of the credit card business around the world, credit risk has also expanded dramatically. The large number of credit card customer defaults has caused huge losses to financial institutions such as banks. Therefore, it is particularly important to accurately identify default customers. We investigate the use of the K Nearest Neighbor (KNN) algorithm to evaluate the impact of alternative distance functions, feature weighting, and feature selection on the accuracy and the area under the curve (AUC) of a credit card default prediction model. For our evaluation, we use a credit card customer dataset from Taiwan. We find that the Mahalanobis distance function performs best, and that feature weighting and feature selection improve the accuracy and AUC of the model.

TABLE OF CONTENTS

1. Introduction
1.1 Research question
1.2 Motivation
1.3 The structure of the work
2. Background literature
2.1 Classification algorithms
2.2 Related work on credit card default modelling
3. Methods
3.1 The KNN algorithm
3.2 Feature weighting
3.3 Feature selection
3.4 Distance functions
4. Experimental design
4.1 The description of the dataset
4.2 The data preprocessing
4.3 Implementation details on the KNN
4.4 Metrics of evaluation
5. Results
6. Discussion
6.1 Regarding results
6.2 Limitations
6.3 Ethical considerations
6.4 Future work
7. Conclusions
8. References

1. Introduction

Credit card default occurs when a customer is unable to repay before the credit card bill is due. With the increasing use of credit cards, the number of defaulters has also increased (Shetty & Manoj, 2019). When credit card customers default, banks suffer significant financial losses, which makes default an important problem to solve.

Predicting exactly which credit card customers are most likely to default is a big opportunity for all banks. In China, from 2008 to 2011, the amount of credit card defaults increased from 3.37 billion yuan to 11.03 billion yuan, an increase of 7.66 billion yuan (Nie, 2012). In a developed financial system, default prediction is essential for predicting the credit risks of bank customers and reducing losses and uncertainties (Yeh & Lien, 2009). In any credit agreement, accurate estimates of individual credit risk are beneficial to banking institutions and borrowers (Hand & Henley, 2007). One aspect that is often omitted is that we are really interested in the borderline default cases, in order to decide on credit limits and costs.

The research focus of this paper is the use of the KNN algorithm for credit card default prediction. By comparing the AUC of different distance functions, the distance function with the highest AUC value is selected to build the model. We then evaluate the impact of feature weighting and feature selection on the accuracy and the AUC of the KNN model.

The Euclidean distance function is the most widely used in the KNN (Hu et al., 2016). But in credit card default prediction applications, no study has examined the classification performance of different distance functions on the KNN. Therefore, it is worth exploring.

Feature selection and feature weighting are popular data preprocessing steps in general (Duch & Grudzinski, 1999; Wolf & Shashua, 2005). The former selects a subset of relevant features to build the model. The latter weights features according to their relevance, and features with relatively low weights can be removed from the dataset. Both can simplify the model, shorten the training time, and improve the performance of the model (Panday & Amorim, 2017). Feature weighting and feature selection are therefore also research directions we pay attention to. To evaluate our model, accuracy and AUC are good indicators of model performance.


1.1 Research question

How can the KNN model for credit card default prediction be improved? In particular:

a. What is the benefit of feature selection?

b. What is the benefit of feature weighting?

c. What is the impact of using alternative distance functions?

1.2 Motivation

Credit cards have become popular and have started squeezing out cash: there are more and more products and services that do not require cash, such as various applications, airline tickets, online stores, and so on. There are fewer large-denomination notes and coins in circulation, especially the latter, because of the high minting and handling costs. Banks have started to encourage cashless payments (Fabris, 2019).

Lending is one of the key businesses for banks, and credit cards have been a huge success. To increase their market share, banks often issue credit cards without knowing their customers' backgrounds. Customers then use credit cards beyond what they can repay, resulting in late payments and debt accumulation (Sharma & Mehra, 2018).

An effective forecasting model helps banks determine whether their customers can pay their credit card bills on time. Banks can make money with credit card services, but the disadvantage is that the customer may not be able to repay in time. Therefore, it is important to build an accurate model of credit card default, which can reduce the losses of the bank. Even an improvement of a small fraction of one percent can help avoid significant losses (Zhou & Lai, 2009).

Credit card defaults have caused significant financial losses to banks, while customers’ credit ratings have also been damaged. This is an important problem that needs to be resolved (Merikoski et al., 2018). Chou and Lo (2018) compared the neural network model with the KNN model in the prediction of credit card default and found that the neural network model had better performance. But they did not study the KNN further, such as whether different distance functions could improve the performance of the model. They also did not further study whether feature selection and feature weighting were beneficial to the KNN model. Therefore, it is still worth exploring whether a more accurate credit card default prediction model based on alternative distance functions, feature weighting, and feature selection can solve this problem well.


Compared with other supervised learning models, Liu (2018) and Mouhoub et al. (2018) found that the KNN model based on the Euclidean distance function does not perform well in credit default applications. But they also did not make more improvements to the KNN model, so it is worth exploring whether using other distance functions can improve the performance of the KNN model. More importantly, the KNN algorithm is intuitive and easy to implement, and its strong interpretability is very important for banks to build the credit default prediction model.

1.3 The structure of the work

In section 2 we discuss the background research. In section 3 we expand on the methods. In section 4 we discuss the design of the experiment. In section 5 we present the results of the KNN model for k-value, alternative distance functions, feature weighting, feature selection, and compare against benchmarks. In section 6 we discuss the whole work. In section 7 we discuss the conclusions.


2. Background literature

The purpose of this section is to provide readers with the necessary information to understand the content of the paper. It describes the main classification algorithms used in credit card default classification, with additional attention to the KNN method that is the focus of this work, as well as other related work in credit card default prediction.

2.1 Classification algorithms

2.1.1 K Nearest Neighbor

The KNN is a commonly used classification algorithm in data mining applications (He & Wang, 2007). And it is a non-parametric classification method, which is simple, but effective in many cases (Hand et al., 2001). Validity is very important in the process of building a credit card default prediction model because an invalid prediction result is useless for the bank. The KNN algorithm is intuitive and easy to implement. The first step is to select a suitable distance function including the Euclidean distance, Manhattan distance, Cosine distance, Mahalanobis distance, and so on. The Euclidean distance is widely used in distance-based classification tasks (Sainin & Alfred, 2010). In the second step, the distance between the unlabelled data and all neighbors is sorted from small to large, and then the top K are selected. The third step is to count the number of occurrences of each category in K neighbors and classify the unlabelled data as the category with the most occurrences.

Conventionally, the KNN algorithm uses the Euclidean distance function to measure the distance between two observations. Xie and Zhao (2015) adopted the chi-square distance as the measurement function of the KNN algorithm; compared with the Euclidean distance, it increased accuracy by 2%. The performance improvement from the chi-square distance was not significant, and they did not explore whether other distance functions, feature weighting, or feature selection could also improve the performance of the model.

Since each feature contributes to the label differently in an actual dataset, feature weighting can be adopted. Feature weighting is a good method to improve the performance of the model: Jankowski and Usowicz (2011) proposed that when feature weighting is applied, the weight of relevant features will be higher, while the weight of irrelevant features will be close to zero.

Feature weighting (using, for example, the Pearson correlation coefficient or information entropy) can not only improve the classification accuracy but also discard features whose weights fall below a certain threshold, improving the resource efficiency of the classifier. Sang and Liu (2008) proposed a feature-weighted KNN model, which was shown empirically to be effective in classification. However, they did not explore whether feature weighting could improve the performance of the KNN model in credit card applications, which is relevant here. Furthermore, the interaction of feature weighting with distance functions has not been explored.

Feature selection (such as random forest, ridge regression, etc) can reduce measurement and storage requirements and improve prediction performance (Kaelbling, 2003). Zhao et al. (2012) applied feature selection in text exploration and found that it could influence the effect of text classification. Karabulut et al. (2012) applied feature selection in the bayesian algorithm and found that the classification accuracy was improved by 15.55%. However, they did not consider whether feature weighting can also improve the performance of the model, and did not use feature selection and feature weighting together. Especially in credit card applications, whether feature selection and feature weighting can improve the accuracy of the KNN model has not been solved.

2.1.2 Decision tree

This is one of the most commonly used algorithms in credit scoring applications (Shetty & Manoj, 2019). The decision tree algorithm uses a tree structure and uses layer-by-layer reasoning to achieve the final classification. The decision tree consists of the following elements:

● Root node: a dataset containing all samples

● Internal nodes: each internal node is a question on features. It branches out to new nodes according to the answers.

● Leaf node: each leaf node has a class label, determined by the majority vote of training examples reaching that leaf.

When predicting, a certain attribute value is used to judge at the internal nodes of the tree, which branch node is entered according to the judgment result, and the classification result is obtained until reaching the leaf node. The decision tree is simple to interpret, but the risk of overfitting is high.

2.1.3 Random forest

Random forest uses random feature selection to generate many decision trees and lets the decision trees vote for how each example should be classified (Shetty & Manoj, 2019). When we perform a classification task, a new input sample is entered, and each decision tree in the forest classifies it separately. Each decision tree produces its own classification result, and all the trees vote for how the sample should be classified. Finally, the sample is assigned to the category with the most votes.

The advantage of the random forest is that it is resistant to overfitting and has high accuracy because it combines multiple decision trees with random features. But it is much harder to interpret and time-consuming to create decision trees.

The random forest has a good performance in feature selection (Uddin & Uddiny, 2015; Genuer et al., 2010), the specific implementation process is introduced in section 3.3.

2.1.4 Naïve bayesian

Naïve bayesian classification is based on Bayes' theorem and assumes that the features are conditionally independent of each other (Merikoski et al., 2018). Taking the independence between features as a premise, the learned model finds, for an input X, the output Y that maximizes the posterior probability. Naïve bayesian classifiers can deal with multi-class problems but have a certain error rate, because they determine the classification from posterior probabilities estimated from prior data.

2.1.5 Logistic regression

Logistic regression is a statistical model that can analyze the relationship between a categorical dependent variable and a set of independent variables. The logistic regression model produces a classification probability formula for solving binary classification (0 or 1) problems (Merikoski et al., 2018). It is simple to implement and easy to understand, but it tends to underfit and its classification accuracy is not high.

2.1.6 Support vector machine

The support vector machine (SVM) is a widely used supervised model that can be used for regression and classification tasks. It has been successfully applied in many areas, such as time series prediction in the medical field (Veropoulos et al., 1999). It can solve nonlinear problems and can handle high-dimensional datasets, but its interpretability is limited.

Given a set of training instances, each marked as belonging to one of two categories, the SVM builds a model that assigns new instances to one of the two categories, making it a non-probabilistic binary linear classifier. The SVM model represents instances as points in space, mapped so that instances of the separate categories are divided by as wide a margin as possible. New instances are then mapped into the same space and their category is predicted based on which side of the margin they fall on.

2.1.7 The neural network

Neural networks are composed of neurons that can process information. The interconnection between neurons constitutes the model, including the input layer, hidden layer, and output layer (Hewahi, 2017). Neural networks have been successfully applied to practical classification tasks in various industries, including industrial, commercial, and scientific fields (Zhang, 2000).

When using a neural network to solve a classification problem, we can, for example, apply the Sigmoid activation function, which is often used as the threshold function of a neural network to map variables to values between 0 and 1, to complete the classification task.

Mhatre et al. (2015) proposed that neural networks are stable because, when an element of the network fails, the network can continue operating thanks to its parallel nature. However, for large data processing, more processing time is needed (Vishnu & Patel, 2013).

2.2 Related work on credit card default modelling

Credit card default prediction is part of the credit scoring application area. For society, the application of credit scoring can reduce the application cost for borrowers, reduce lenders' credit interest, reduce loan interest across society as a whole, and improve economic efficiency. In addition, it can help enterprises make decisions in a fast and efficient way. The resulting speed and accuracy of decision making make credit scoring the cornerstone of risk management in the banking, telecommunications, insurance, and retail industries (Abdou & Pointon, 2009).

The credit default prediction model divides customers into good (non-default) or bad (default) classes, which increases the value of credit scoring applications. Compared with the traditional method based on expert experience, this method can improve the efficiency and reduce the time of loan approval, reduce the subjectivity in the loan approval process, and provide consistent decision-making. In the United States, over 90% of top financial institutions use such scores to make billions of dollars of credit decisions every year (Zhou & Lai, 2009).


There are many algorithms for building credit card default prediction models. Shetty & Manoj (2019) predicted credit card defaults by using four machine learning algorithms including support vector machine, KNN, decision tree, and random forest. Compared with other models, the accuracy of the KNN model based on the Euclidean distance function is the lowest (only 0.6), and the AUC is only 0.57. However, they did not further explore whether feature selection or feature weighting could improve the accuracy of the KNN model. And they also did not use alternative distance functions to test the performance of the KNN model.

Liu (2018) compared neural networks with traditional machine learning models (SVM, KNN, decision tree, and random forest) and found that neural networks have higher accuracy than the KNN model. But he only compared different k values and did not compare different distance functions on the model. Mouhoub et al. (2018) compared logistic regression, Bayes, KNN, and random forest, and found that the precision of the KNN model was the lowest (only 73%) and the accuracy was only 72.9%, and only the Euclidean distance function was used.

In order to predict the consumer's transaction behavior, credit score, and related economic variables, Butaru et al. (2016) used decision tree, logistic regression, and random forest and other machine learning algorithms, and found that the single model could not predict the default customers well. So, it is not enough just to use different algorithms to build a credit card default model.

Therefore, when using the KNN algorithm to build a credit card default model, it is interesting to explore the influence of the distance function, feature selection, and feature weighting on the model. It is still an open question whether their interaction can improve the performance of the KNN model, especially in credit scoring applications.


3. Methods

This section introduces the KNN, the methods for feature weighting and feature selection, and the formulation of the different distance functions.

3.1 The KNN algorithm

The KNN algorithm is easy to implement and understand, and the following is its specific implementation process:

1) Establish a testing sample set and a training sample set.

The training sample set is expressed as: $X = \{(x_i, c_i) \mid i = 1, 2, \ldots, n\}$

In the formula, $x_i = (x_{i1}, x_{i2}, \ldots, x_{il})$ is an $l$-dimensional vector; that is, the feature dimension is $l$, and $x_{ij}$ represents the $j$-th feature component value of the $i$-th training sample. $c_i$ represents the corresponding category of the $i$-th sample; $c_i$ belongs to the category label set $C = \{1, 2, \ldots, m\}$, where $m$ is the number of categories.

The testing sample set is expressed as: $Y = \{y_j \mid j = 1, 2, \ldots, n\}$

In the formula, $y_j = (y_{j1}, y_{j2}, \ldots, y_{jl})$ is an $l$-dimensional vector; that is, the feature dimension is $l$, and $y_{ji}$ represents the $i$-th feature component value of the $j$-th testing sample.

2) Set the k value. The k value is adjusted repeatedly according to the classification performance in the experiment until the optimal value is found. We use cross-validation to determine the k value, which is a standard method for evaluating and selecting this parameter.

3) Calculate the distances between the testing sample points and all training sample points. We first use the Euclidean distance function, with other distance functions as alternatives.

4) Select the k nearest neighbor training samples. For a testing sample point y, the k training sample points nearest to y in the training sample set are found according to the Euclidean distance function (or one of the alternative distance functions).

5) Apply the discriminant rule for the category of testing sample y. That is, examine the k nearest neighbor training sample points obtained in step 4), count the number of points belonging to each category among those k training sample points, and assign the testing sample to the category with the largest count.
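The procedure above can be summarised in a few lines of code. The following is a minimal sketch using NumPy only; the function and variable names (knn_predict, train_X, train_y) are illustrative assumptions, not code from this work.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    # Step 3: Euclidean distances between the query and every training sample
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Step 4: indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # Step 5: majority vote among the labels of the k neighbours
    return Counter(train_y[nearest]).most_common(1)[0][0]

# Tiny usage example with made-up points
train_X = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_y = np.array([0, 0, 1])
print(knn_predict(train_X, train_y, np.array([0.5, 0.5]), k=3))  # -> 0
```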

In two-dimensional space, a visual example is shown in figure 1: when k is equal to 3, the green circle is classified as a triangle, but when k is equal to 7, it is classified as a rectangle. The Euclidean distance is implied by the circles.

Figure 1. 2-D example of the KNN algorithm

3.2 Feature weighting

Feature weighting can improve the accuracy of the classification model and the efficiency of the classifier (Jankowski & Usowicz, 2011). The working principle of feature weighting is to calculate the weight of each feature through some method (such as the correlation coefficient method or principal component analysis). After applying feature weighting, the weight of relevant features will be higher, while the weight of irrelevant features will be close to zero. The higher the feature weight, the more the feature can affect the label. The KNN algorithm, however, cannot weight features directly (Poorna et al., 2018).

The Pearson correlation coefficient can well reflect the degree of correlation between features and the label, and its value lies between -1 and 1. Bensman (2004) proposed that the Pearson correlation coefficient can be used on datasets with negative values; negative values are allowed in this measure due to its implicit normalization. Negative values represent negative correlations, which also reflect the relationship between features and the label, so they are an important consideration. Because our dataset contains negative values, the Pearson correlation coefficient is the most appropriate measure of the correlation between features and the label. By applying it to the features, we can determine the correlation between each feature and the label and carry out feature weighting accordingly: the higher the correlation, the greater the weight value.

No prior study has used the Pearson correlation coefficient for feature weighting in credit card default prediction, so it is very valuable to explore. Therefore, we use the Pearson correlation coefficient as the feature weight and take the product of the weight value and the data as the new feature value.

We use binary variables as an example to briefly introduce the calculation of the Pearson correlation coefficient. Suppose $r_{xy}$ represents the Pearson correlation coefficient between vector x and vector y. Egghe (2008) mentioned the definition of r as:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left(\sum_{i=1}^{n} (x_i - \bar{x})^2\right)\left(\sum_{i=1}^{n} (y_i - \bar{y})^2\right)}} \qquad (1)$$

where $x_i$ and $y_i$ represent the components of the vectors x and y, and $\bar{x}$ and $\bar{y}$ represent the averages of the vectors x and y.
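As a concrete illustration of this weighting scheme, the sketch below computes the Pearson correlation between each feature and the label and multiplies the features by the resulting weights. It is a minimal sketch under stated assumptions: the use of the absolute correlation as the weight and the function name pearson_weighted are our illustrative choices, not details taken from this work.

```python
import pandas as pd

def pearson_weighted(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    # Pearson correlation between each feature column and the label (Eq. 1)
    r = X.corrwith(y)
    # Assumption: use |r| as the weight so that strong negative correlations
    # also receive large weights
    weights = r.abs()
    # New feature value = weight * original value
    return X * weights
```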

3.3 Feature selection

Feature selection refers to selecting a subset of features from all the features to make the constructed model better. The principle of feature selection is to eliminate redundant features by using feature selection algorithms (such as decision tree, random forest, etc.), so as to reduce the number of features, improve model accuracy, and reduce the running time (Miao & Niu, 2016). On the other hand, the selection of truly relevant features simplifies the model and makes it easy for researchers to understand the process of data generation (Kaelbling, 2003).

Random forest is a popular and very effective feature selection algorithm, which is based on the idea of model aggregation and can calculate the importance of each individual feature variable (Genuer et al., 2010; Han et al., 2006; Rogers & Gunn, 2005). It is very interesting to apply random forest for feature selection in credit card default prediction models. Therefore, we use the random forest as the feature selection algorithm to calculate the importance of each feature, rank the features, and select the features with high importance from all the features.

When using the random forest algorithm to calculate the importance of feature X, the specific steps are as follows:

1) For each decision tree in the random forest, the corresponding out-of-bag (OOB) data is used to calculate its OOB data error, denoted errOOB1. The OOB error is an error estimation technique often used to evaluate the accuracy of the random forest and to select appropriate values for tuning parameters (Janitza & Hornung, 2018).

2) Randomly add noise to feature X of all OOB samples (that is, randomly change the value of feature X in each sample) and calculate the OOB data error again, denoted errOOB2.

3) Suppose there are N trees in the random forest; then the importance score of feature X is:

$$\text{Importance score}(X) = \frac{\sum (\text{errOOB2} - \text{errOOB1})}{N}$$

When random noise is added to feature X, a large drop in OOB accuracy indicates that feature X has a great influence on the classification results of the samples, that is to say, its importance is relatively high. The greater the importance score, the more important the feature, and we select features by applying a threshold on the importance score.
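The sketch below shows one way to rank features with a random forest and keep those above a threshold. It is illustrative only: scikit-learn's impurity-based feature_importances_ is used here as a simple stand-in for the OOB permutation procedure described above, and the function name select_features is an assumption.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def select_features(X: pd.DataFrame, y, threshold: float = 0.05):
    # Fit a random forest and rank the features by its importance scores
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, y)
    scores = pd.Series(forest.feature_importances_, index=X.columns)
    # Keep only features whose importance exceeds the threshold
    # (0.05 is the cut-off used later in this work)
    return scores[scores > threshold].index.tolist()
```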

Feature selection and feature weighting can be applied separately in data preprocessing, or together. Whether applying them together can further improve the performance of the model is also of interest to us. In addition, how the KNN model performs with different distance functions after applying feature selection and feature weighting is worth exploring.

3.4 Distance functions

1) Euclidean distance function

When applying the Euclidean distance function, we need to scale the data, because the result will be affected by the variable unit. The Euclidean distance matrix is a matrix of squared distances between points, which have been used in machine learning, wireless sensor networks, acoustics, and other fields (Dokmanic et al., 2015).

Agostino and Dardanoni (2009) mentioned the Euclidean distance formula in an n-dimensional space:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \qquad (2)$$

The formula represents the true distance between two points in the n-dimensional space, that is, the straight-line distance from one point to the other.


The Euclidean distance is easy to understand. However, the Euclidean distance treats the differences among variables equally, which may not meet the requirements in practical application and its calculation cost is high (Faber & Fisher, 2007).

2) Cosine distance function

When applying the Cosine distance function, we do not need to scale the data, because the result is not affected by the variable unit. The Cosine distance matrix is a matrix of the distance between elements using the Cosine of the angle between two vectors in the vector space. A vector is a directional line segment in a multidimensional space. If the direction of the two vectors is the same, that is, the angle between them is close to zero, then the two vectors are close.

The Cosine distance function can calculate the distance between features. The small distance between these features means the high similarity, while the large distance means the low similarity (Sitikhu et al., 2019).

Egghe and Leydesdorff (2009) mentioned the Cosine distance formula in an n-dimensional space:

$$d(x, y) = \cos(\theta) = \frac{\sum_{i=1}^{n} (x_i \times y_i)}{\sqrt{\sum_{i=1}^{n} x_i^2} \times \sqrt{\sum_{i=1}^{n} y_i^2}} \qquad (3)$$

where $\theta$ represents the angle between the vectors x and y, and $x_i$ and $y_i$ represent the components of the vectors x and y. The Cosine distance function can be used for both continuous variables and categorical variables, but it cannot work effectively with nominal data (Goswami et al., 2018).

3) Manhattan distance function

When applying the Manhattan distance function, we need to scale the data, because, as with the Euclidean distance function, the result is affected by the variable units. The Manhattan distance matrix is a matrix of the sums of the absolute differences between points along axes at right angles in the standard coordinate system.

The Manhattan distance formula in an n-dimensional space is:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i| \qquad (4)$$

where $x_i$ and $y_i$ represent the components of the vectors x and y. The Manhattan distance is easy to generalize to higher dimensions, but it cannot work efficiently with image data (Goswami et al., 2018).


4) Mahalanobis distance function

Mahalanobis distance is an effective method to calculate the similarity between two unknown sample sets. Unlike Euclidean distance, Mahalanobis distance takes into account the relationship between various features. It can use distance in multivariate analysis, including classification, clustering, hypothesis testing, and outlier detection, etc. (Joseph et al., 2013).

Assuming that the covariance matrix of x and y is denoted as S, the Mahalanobis distance formula is:

$$d(x_i, y_i) = \sqrt{(x_i - y_i)^T S^{-1} (x_i - y_i)} \qquad (5)$$

where $x_i$ and $y_i$ represent the components of the vectors x and y. The Mahalanobis distance uses the group mean and variance of each variable to solve the problem of correlation and scale, but it cannot be calculated if the variables are highly correlated (Goswami et al., 2018).

We use a simple example to introduce the calculation process of the Mahalanobis distance function.

Assume that x = [3, 4, 5, 6] and y = [2, 2, 8, 4].

Then, combine x and y into a matrix X, so X = [[3, 4, 5, 6], [2, 2, 8, 4]].

The transpose of the matrix X is: $X^T$ = [[3, 2], [4, 2], [5, 8], [6, 4]]. The inverse of the covariance matrix of X is:

$S^{-1}$ = [[0.86, -0.21], [-0.21, 0.18]]

Finally, the Mahalanobis distances between the observation points are: [0.93, 2.17, 2.42, 2.17, 1.56, 2.33]
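For reference, the following sketch reproduces these quantities with NumPy and SciPy, using the example vectors above; the code itself is illustrative and not taken from this work.

```python
import numpy as np
from scipy.spatial import distance

x = np.array([3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 2.0, 8.0, 4.0])

# Euclidean (Eq. 2), Cosine (Eq. 3) and Manhattan (Eq. 4) between x and y
print(distance.euclidean(x, y))
print(1.0 - distance.cosine(x, y))   # SciPy returns 1 - cos(theta)
print(distance.cityblock(x, y))

# Mahalanobis distances (Eq. 5) between the four observation points (x_i, y_i),
# using the inverse covariance matrix of the two variables
points = np.column_stack([x, y])
S_inv = np.linalg.inv(np.cov(points, rowvar=False))
print(distance.pdist(points, metric="mahalanobis", VI=S_inv))
# -> approximately [0.93, 2.17, 2.42, 2.17, 1.56, 2.33]
```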


4. Experimental design

This section introduces the information of the dataset, the data preprocessing, implementation details of the KNN model, and metrics of evaluation.

4.1 The description of the dataset

Taiwan's credit card customer data has been used in this paper. The dataset consists of 30,000 records and 23 features, each record corresponding to a credit card customer. The target variable is a binary variable (0: no default, 1: default) that records whether the customer has made a default payment. The dataset includes the following features:

Table 1. Features of the credit card dataset

Feature      Feature name
x1           LIMIT_BAL (amount of the given credit)
x2           SEX
x3           EDUCATION
x4           MARRIAGE (marital status)
x5           AGE (years)
x6 - x11     PAY (history of past payment, each of the last six months)
x12 - x17    BILL_AMT (amount of bill statement, each of the last six months)
x18 - x23    PAY_AMT (amount of previous payment, each of the last six months)


4.2 The data preprocessing

1) Unbalanced data processing

Visualize the label in the credit card dataset as follows:

Figure 2. The number of no default/ default in the credit card dataset

Figure 2 shows that the total percentage of defaults in the dataset is 22.12%, i.e. 6,636 defaults out of 30,000 records, so the dataset is unbalanced. When the data is unbalanced, we cannot use accuracy to evaluate our model, because accuracy is only appropriate when the dataset contains a similar number of observations in each class (Sharma & Mehra, 2018). So we need to balance the data.

The synthetic minority oversampling technique (SMOTE) is a good way to deal with the imbalance of a dataset (Ha & Bunke, 1997; Chawla et al., 2002). SMOTE is an oversampling method in which the minority class in the training set is oversampled and new synthetic samples are generated to relieve the class imbalance. Therefore, we decide to use the SMOTE method to solve the data imbalance problem.

Figure 3. The number of no default/ default in the credit card dataset after applying SMOTE method


As shown in figure 3, after applying the SMOTE method to balance the dataset, the number of default and non-default records is the same, namely 23,364 each. So we can use accuracy to evaluate the performance of the model.
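A minimal sketch of this balancing step with the imbalanced-learn package is shown below; X and y are assumed to hold the features and the default label, and the random seed is an illustrative choice.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE

# Oversample the minority (default) class with SMOTE
X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_balanced))  # both classes now have the same number of samples
```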

2) Data standardization

Scikit-learn is a Python module that integrates the latest machine learning algorithms for both supervised and unsupervised problems (Pedregosa et al., 2011). The standard scaler method of scikit-learn standardizes each numerical feature by removing the feature's mean and then dividing by its standard deviation, so that each feature has zero mean and unit variance, which helps the learning of the model (Pedregosa et al., 2011). So, we use this method to standardize the dataset.

Suppose x is an array; then the formula for standardization is:

$$z_i = \frac{x_i - u}{\sigma} \qquad (6)$$

where $x_i$ is each value of the array x, $u$ is the mean value of the array x, and $\sigma$ is the standard deviation of the array x.
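A short sketch of this standardization step with scikit-learn is given below; X_train and X_test are assumed to be the feature matrices produced by the split described in the next subsection.

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training data (Eq. 6)
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics on the test set
```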

3) Dataset separation

We use all the data after applying the SMOTE method to balance the dataset. Using the train_test_split method provided by the scikit-learn library to divide the dataset into a training set (80% of the dataset) and a testing set (20% of the dataset) is popular (Bokka et al., 2019; Modi et al., 2019; Géron, 2019). So we split the dataset in the same way and set test_size equal to 0.2. The train_test_split method randomly splits the sample data into the training set and the testing set, so it can avoid an uneven division of the sample data. K-fold cross-validation can also split datasets, but it is computationally complex and time-consuming. By using validation data, we could adjust the parameters of the model; however, it is more popular to use cross-validation in the KNN model to determine the k value, which can avoid overfitting or underfitting (Kohavi, 1995; Salzberg, 1997). So we use cross-validation instead of a separate validation set.
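The split itself is a one-liner; a sketch is shown below, where the random_state value is an assumption added for reproducibility.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_balanced, y_balanced, test_size=0.2, random_state=42)  # 80% train / 20% test
```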

4.3 Implementation details on the KNN

1) The choice of k-value

Shetty & Manoj (2019) used k equals 5 to predict credit card defaults in the KNN model, and the model performed well. But we want to use a cross-validation method to determine whether the model performs best when k is equal to 5.


2) The choice of distance functions

We start from the Euclidean distance function in the implementation of the KNN, but this does not mean that we do not replace the distance function: the Cosine, Manhattan, and Mahalanobis distance functions are used as alternatives, and by comparing the performance of the final models we determine which distance function is most suitable.
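As a sketch of how both choices can be made together, the code below runs a cross-validated grid search over k and the distance metric with scikit-learn's KNeighborsClassifier, and then fits a Mahalanobis-based model, which needs the inverse covariance matrix as an extra parameter. The grid values and variable names are illustrative assumptions, not the exact setup of this work.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Cross-validated search over k and the distance metric (AUC as the score)
param_grid = {"n_neighbors": [3, 5, 7, 9, 11],
              "metric": ["euclidean", "manhattan", "cosine"]}
search = GridSearchCV(KNeighborsClassifier(algorithm="brute"),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X_train_scaled, y_train)
print(search.best_params_, search.best_score_)

# The Mahalanobis metric needs the inverse covariance matrix of the features
VI = np.linalg.pinv(np.cov(X_train_scaled, rowvar=False))
knn_maha = KNeighborsClassifier(n_neighbors=5, metric="mahalanobis",
                                metric_params={"VI": VI}, algorithm="brute")
knn_maha.fit(X_train_scaled, y_train)
```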

4.4 Metrics of evaluation

Sharda et al. (2015) proposed the use of a confusion matrix in classification problems to estimate the real accuracy of classification models. Figure 4 shows the confusion matrix of two classification problems.

Figure 4. A confusion matrix of two-class classification results

We use a simple example to explain TP, FP, TN, and FN. Suppose there is a test to check for a disease, and everyone who is tested either has the disease or does not. The test result can be positive (classifying the person as having the disease) or negative (classifying the person as not having the disease). A person's test result may or may not match their actual condition, so:

● True Positive: sick people correctly identified as sick
● False Positive: healthy people incorrectly identified as sick
● True Negative: healthy people correctly identified as healthy
● False Negative: sick people incorrectly identified as healthy

1) Accuracy

Classification accuracy is the number of correctly classified instances (positives and negatives) divided by the total number of instances, and it is only applicable to balanced datasets.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$

2) Receiver operating characteristic curve

The receiver operating characteristic (ROC) curve is a graphical plot of the true-positive rate against the false-positive rate (Provost et al., 1998). The area under the curve (AUC) stands for the area under the ROC curve and summarizes the accuracy of the classifier: 1 means that the classifier is perfect, and 0 means that every prediction is wrong (Sharda et al., 2015).

The AUC is an accepted traditional performance metric for a ROC curve, which can well reflect the performance of the model (Duda et al., 2001; Bradley, 1997; Lee, 2000). So we use AUC and accuracy to measure the performance of our model.
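Both metrics are available in scikit-learn; the sketch below computes them for the Mahalanobis KNN model fitted in the earlier sketch, with variable names carried over as assumptions.

```python
from sklearn.metrics import accuracy_score, roc_auc_score

y_pred = knn_maha.predict(X_test_scaled)
y_prob = knn_maha.predict_proba(X_test_scaled)[:, 1]  # estimated probability of default
print("Accuracy:", accuracy_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, y_prob))
```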

3) Benchmarks from other classifiers

It is good practice to compare against benchmark results, and it is common in machine learning (Zhu et al., 2018; Boden et al., 2018). We use the basic KNN model, logistic regression, decision tree, and random forest to demonstrate the relative performance of the best KNN model that we built, because these algorithms have been found to perform well in credit scoring applications (Liu, 2018; Mouhoub et al., 2018; Shetty & Manoj, 2019).

For the basic KNN model, we use KNeighborsClassifier from the scikit-learn library and set the same k value as the KNN model we built, as well as the same feature selection and feature weighting method.

For logistic regression, decision tree and random forest model, we also use the models in the scikit-learn library and implement them in a standard way. For example, we use the LogisticRegression from the scikit-learn library with the default parameters. Similarly, the DecisionTreeClassifier and RandomForestClassifier are also applied from the scikit-learn library, and the parameters are also set to the default values. The same feature weighting and feature selection methods are applied.
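A sketch of this benchmark setup is shown below; the models use scikit-learn defaults as described above, and the loop structure and variable names are illustrative assumptions.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

benchmarks = {
    "basic KNN": KNeighborsClassifier(n_neighbors=5),
    "logistic regression": LogisticRegression(),
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
}
for name, model in benchmarks.items():
    model.fit(X_train_scaled, y_train)  # same preprocessed data as our KNN
    acc = accuracy_score(y_test, model.predict(X_test_scaled))
    auc = roc_auc_score(y_test, model.predict_proba(X_test_scaled)[:, 1])
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```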


5. Results

This section describes the results of the KNN model for the k value, distance functions, feature weighting, and feature selection, and compares them against benchmark results.

1) The influence of k-value on the KNN model

As shown in figure 5, using Taiwan's credit card customer dataset and the KNN model with only the alternative distance functions, we test different k values and find that when k is equal to 5, the AUC of the KNN model based on the Mahalanobis distance function is the highest (0.639). So we use k equal to 5 to build the model.

Figure 5. Performance of the KNN model

2) The influence of only distance functions on the KNN model

Table 2. Performance of the KNN model

Distance function      Accuracy   AUC
Euclidean distance     0.725      0.610
Cosine distance        0.735      0.622
Manhattan distance     0.733      0.613
Mahalanobis distance   0.743      0.639

As shown in table 2, with k equal to 5 and neither feature selection nor feature weighting applied, the Mahalanobis distance yields higher accuracy in the KNN model than the Euclidean distance, although the Euclidean distance is the common choice in credit scoring. The AUC of the Mahalanobis distance is also the highest.

The Mahalanobis distance also eliminates the interference of correlation between features; therefore, it is the best choice of distance measure when predicting credit card default, and the model performs better with it. Moreover, the use of the Mahalanobis distance as the distance measure of the KNN model in credit card default prediction is new, so it is well worth exploring.

3) The influence of only feature weighting on the KNN model

The Pearson correlation coefficient between each feature and label in the credit card dataset is as follows:

Table 3. The Pearson correlation coefficient between each feature and the label

Feature      Pearson correlation coefficient
LIMIT_BAL    -0.1535
SEX          -0.0400
EDUCATION     0.0280
MARRIAGE     -0.0243
AGE           0.0139
PAY_0         0.3248
PAY_2         0.2636
PAY_3         0.2353
PAY_4         0.2166
PAY_5         0.2041
PAY_6         0.1869
BILL_AMT1    -0.0196
BILL_AMT2    -0.0142
BILL_AMT3    -0.0141
BILL_AMT4    -0.0102
BILL_AMT5    -0.0068
BILL_AMT6    -0.0054
PAY_AMT1     -0.0729
PAY_AMT2     -0.0586
PAY_AMT3     -0.0563
PAY_AMT4     -0.0568
PAY_AMT5     -0.0551
PAY_AMT6     -0.0532

Based on the Mahalanobis distance function and k equal to 5, we apply only feature weighting; the performance of the KNN model is as follows:

Table 4. Performance of the KNN model

Feature weighting                 Accuracy   AUC
No                                0.743      0.639
Pearson correlation coefficient   0.775      0.679

In table 4, it is clear that when the Pearson correlation coefficient is used as the feature weighting, the KNN model has better performance: accuracy is 0.775 and AUC is 0.679.

4) The influence of only feature selection on the KNN model

Table 5. The importance score of features

Feature      Importance score   Importance score after feature weighting   Difference
SEX          0.0121             0.0114                                      0.07%
EDUCATION    0.0343             0.0329                                      0.14%
MARRIAGE     0.0185             0.0218                                     -0.33%
AGE          0.0674             0.0672                                      0.02%
PAY_0        0.0864             0.0928                                     -0.64%
PAY_2        0.0482             0.0528                                     -0.46%
PAY_3        0.0278             0.0335                                     -0.57%
PAY_4        0.0506             0.0289                                      2.17%
PAY_5        0.0226             0.0276                                     -0.5%
PAY_6        0.0297             0.0316                                     -0.19%
BILL_AMT1    0.0512             0.0502                                      0.1%
BILL_AMT2    0.0469             0.0471                                     -0.02%
BILL_AMT3    0.0443             0.0451                                     -0.08%
BILL_AMT4    0.0451             0.0444                                      0.07%
BILL_AMT5    0.0432             0.0439                                     -0.07%
BILL_AMT6    0.0426             0.0436                                     -0.1%
PAY_AMT1     0.0440             0.0445                                     -0.05%
PAY_AMT2     0.0420             0.0428                                     -0.08%
PAY_AMT3     0.0425             0.0421                                      0.04%
PAY_AMT4     0.0412             0.0407                                      0.05%
PAY_AMT5     0.0432             0.0425                                      0.07%
PAY_AMT6     0.0463             0.0461                                      0.02%

The higher the importance score, the more important the feature. From table 5, we can clearly see that the importance scores change very little when the features are weighted.

Therefore, we use the original feature importance scores and select features with an importance score greater than 0.05 to build our model. Only five features have a score greater than 0.05, and it is interesting to explore whether these five features can improve the performance of the model.

Based on the Mahalanobis distance function and k equal to 5, we apply only feature selection; the performance of the KNN model is as follows:

Table 6. Performance of the KNN model

Feature selection   Accuracy   AUC
No                  0.743      0.639
Random forest       0.765      0.672

As shown in table 6, the KNN model has higher accuracy (0.765) when the random forest algorithm is used to select features, and the AUC also improves, because the random forest can detect interactions between features during training. Moreover, since we chose the features with an importance score greater than 0.05, the KNN model performs better in training and prediction and saves memory resources.

5) The influence of feature weighting and feature selection on the KNN model

Figure 6. Performance of the KNN model

In figure 6, "No" means that only the Mahalanobis distance function is applied in the KNN model. We explore the interaction between feature weighting and feature selection based on the Mahalanobis distance function, because the accuracy and AUC of the KNN model based on the Mahalanobis distance function are higher than with the other distance functions. "Weighting & Selection" means that we first use the random forest for feature selection and choose the five features with an importance score greater than 0.05, and then use the Pearson correlation coefficient to calculate the feature weights. In this process, the selection of features does not affect the feature weighting, because each feature's weight is calculated from the correlation between that feature and the label, not from relationships between features.

We find that when both feature selection and feature weighting are applied, the accuracy and AUC of the KNN model are higher than using feature weighting or feature selection alone, and it performs better than only applying the Mahalanobis distance function.

The main reason that the results of weighting only and of selection & weighting are close to each other is that feature weighting influences the results more than feature selection does for the KNN model in credit card default prediction. More specifically, there is little difference between using the five selected features and using all the features, indicating that the five features we selected capture most of the impact of all the features on model performance, so the difference between the two settings is minimal.

More importantly, after applying feature weighting and feature selection, the AUC of the KNN model is higher than that of the model of Shetty and Manoj (2019), which is based solely on the Euclidean distance function. The accuracy is also higher than that of the KNN model created by Mouhoub et al. (2018).

6) Compare against benchmark results


Based on the same feature selection (random forest) and feature weighting (Pearson correlation coefficient) methods, we run the experiment separately with different classifiers and obtain the accuracy and AUC of each classifier. In figure 7, we find that our best KNN model has the highest AUC and accuracy compared to the basic KNN, logistic regression, and decision tree. Compared with the random forest model, the AUC of our best KNN model is the same, but its accuracy is slightly lower.

In short, our recommended changes have made the KNN model outperform the standard classifiers and be competitive with a fairly advanced one, while remaining very simple.


6. Discussion

This section briefly introduces results, limitations of the work, related ethical considerations, and future work.

6.1 Regarding results

In terms of distance functions, the Mahalanobis distance and Cosine distance have little difference in the accuracy of the KNN model prediction, but the accuracy of Mahalanobis distance is higher (0.743). Moreover, the AUC and accuracy of Mahalanobis distance are both higher than those of Euclidean distance. Although the differences between them are not significant, small improvements in the models are important for banks in the application of credit card default prediction. Therefore, the Mahalanobis distance function can perform better in the KNN model when default prediction is made for credit card customers.

In default prediction for credit card customers, both feature weighting and feature selection can improve the accuracy and AUC of prediction. More importantly, when feature weighting and feature selection are applied at the same time, the accuracy and AUC of the model can be further improved. In terms of benchmarks, especially compared with the basic KNN model, our best KNN model has been greatly enhanced in terms of accuracy and AUC and is competitive with advanced classifiers. More importantly, in the actual credit scoring application of companies, the methods we used are very valuable to them, because they can try these methods to improve the performance of the model. In addition, for companies that have applied the KNN model, the Mahalanobis distance function can also be considered to improve the performance of the KNN model. After improving the performance of the model, they can reduce the losses due to default.

6.2 Limitations

There are several following limitations to this work:

1) The Taiwan credit card dataset does not reflect the recent behavior of customers, so the default analysis cannot truly reflect the default situation of bank customers in recent years. It would be more worth exploring if we could obtain the latest credit card data from a bank.

2) The number of features is relatively small, only 23. If more features could be collected, the exploration of feature weighting and feature selection would be more helpful, and the accuracy of model prediction could be improved to some extent.

3) In the selection of machine learning algorithms, we only improved the KNN algorithm and did not improve other algorithms such as Naïve bayesian and random forest. Therefore, we cannot know whether the KNN algorithm is more competitive than other algorithms after our improvements.

4) We applied only one approach to feature selection and one to feature weighting and did not test whether a combination of multiple approaches (such as linear models, regularization, random forest, and so on) could improve model performance. Moreover, we did not use other feature processing methods, such as feature transformation and generation, so we do not know whether they could further improve the performance of the KNN model.

6.3 Ethical considerations

When it comes to ethical considerations in research, we adhere to the code of conduct established by Canter et al. (1994) to protect intellectual property rights and data privacy. Therefore, in this paper, we use publicly available data, and data collection is consistent with research principles and copyright law. In terms of model transparency, the KNN model we used is highly interpretable and understandable; for example, for the measurement of the distance between observation points, we used common distance functions. The treatment of features such as race could cause problems, so such features are suppressed when applying feature weighting and selection.

6.4 Future work

Feature transformation and feature generation can also be applied to the classifier model, and the comparison of different feature processing in terms of classifier performance may be interesting.

In addition, whether using different feature weighting and feature selection methods in the KNN model can further improve the performance of the model is also worth exploring.

Last but not least, based on the methods we used, more general classification methods, such as random forest and Naïve bayesian, could be examined to explore whether these methods can improve the performance of other models. For example, in the Naïve bayesian model, we can test the accuracy and AUC of the model when only the Pearson correlation coefficient is used for feature weighting, or when only the random forest is used for feature selection. Then, we can use feature weighting and feature selection in the model at the same time. By comparing the results, we can know whether the methods we used can improve the performance of the Naïve bayesian model.


7. Conclusions

In this project, we first used oversampling to balance the dataset in the preprocessing stage. We then implemented the KNN classification algorithm and applied the Mahalanobis distance function, because it performed better than the other distance functions in terms of the accuracy and AUC of the model's predictions. In order to better evaluate the impact of feature weighting and feature selection on model prediction, we set up controlled experiments, such as applying feature weighting versus not applying it. Through this comparison, we can clearly see their influence on model accuracy. After applying feature weighting to the model, the accuracy of model prediction increased from 0.743 to 0.775, and the AUC increased from 0.639 to 0.679. After applying feature selection, the accuracy of model prediction increased from 0.743 to 0.765. When applying both at the same time, the accuracy of the model prediction increased to 0.78. In addition, our model performs well against the benchmarks.

Therefore, in the KNN model, basing the distance measure on the Mahalanobis distance function, using the Pearson correlation coefficient for feature weighting, and using the random forest for feature selection can all improve the accuracy and AUC of the model.

In the process of building a model, the processing of features is critical, because it is clear from our results that in the KNN model, applying feature selection and feature weighting can improve the accuracy and AUC of the model. Therefore, when considering the improvement of model performance, feature selection and feature weighting are conceivable solutions.

In the application of credit scoring, our results are helpful to those banks or enterprises that use the KNN to build a credit default prediction model and we provide some insights to improve model performance.


8. References

Abdou, H. A., and Pointon, J. (2009) Credit scoring and decision making in Egyptian public sector banks. International Journal of Managerial Finance, pp. 391-406.

Agostino, M. D., and Dardanoni, V. (2009) What's so special about Euclidean distance? A characterization with applications to mobility and spatial voting.

Social Choice and Welfare​, pp. 211-233.

Bensman, S. J. (2004) Pearson's r and author cocitation analysis: A commentary on the controversy. ​Journal of the American Society for

Information Science and Technology​, pp. 935.

Boden, C., Rabl, T., Schelter, S., and Markl, V. (2018) Benchmarking Distributed Data Processing Systems for Machine Learning Workloads. Performance Evaluation and Benchmarking for the Era of Artificial Intelligence, pp. 42-57.

Bokka, K. R., Hora, S., Jain, T., and Wambugu, M. (2019) Deep Learning for Natural Language Processing: Solve your natural language processing problems with smart deep neural networks, 1st Edition, pp.309.

Bradley, A. P. (1997) The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms. ​Pattern Recognition​, 30(6), pp. 1145–1159.

Butaru, F., Chen, Q. Q., Clarks, B., Das, S., Andrew, W. L., and Siddique, A. (2016) "Risk and risk management in the credit card industry", ​Journal of

Banking & Finance​, vol. 72, pp. 218-239.

Canter, M. B., Bennett, B. E., Jones, S. E., and Nagy, T. F. (1994). Ethics for psychologists: A commentary on the APA Ethics Code. ​American Psychological

Association​. pp. 1-254.

Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P. (2002) SMOTE: Synthetic Minority Over-sampling Technique.​Journal of Artificial Intelligence

Research​, pp. 321-357.

Chou, T., and Lo, M. (2018) Predicting Credit Card Defaults with Deep Learning and Other Machine Learning Models. ​International Journal of

(35)

Dokmanic, I., Parhizkar, R., Ranieri, J., and Vetterli, M. (2015) Euclidean Distance Matrices: Essential Theory, Algorithms and Applications. IEEE Signal Processing Magazine, pp. 12-30.

Duch, W., and Grudzinski, K. (1999) Weighting and selection of features. Intelligent Information Systems VIII, pp. 14-18.

Duda, R., Hart, P., and Stork, D. (2001) Pattern Classification. Wiley-Interscience, pp. 1-688.

Egghe, L. (2008) New relations between similarity measures for vectors based on vector norms. Journal of the American Society for Information Science and Technology, pp. 232-239.

Egghe, L., and Leydesdorff, L. (2009) The relation between Pearson's correlation coefficient r and Salton's cosine measure. Journal of the American Society for Information Science & Technology (forthcoming), pp. 1027-1036.

Faber, P., and Fisher, R. (2007) Pros and Cons of Euclidean Fitting. Pattern Recognition: 23rd DAGM Symposium Munich, pp. 414-420.

Fabris, N. (2019) Cashless Society -- The Future of Money or a Utopia? Journal of Central Banking Theory and Practice, pp. 53-66.

Genuer, R., Poggi, J. M., and Malot, C. T. (2010) Variable selection using Random Forests. Pattern Recognition Letters, 31(14), pp. 2225-2236.

Géron, A. (2019) Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc., pp. 307-308.

Goswami, M., Babu, A., and Purkayastha, B. S. (2018) A Comparative Analysis of Similarity Measures to find Coherent Documents. International Journal of Management, Technology And Engineering, vol. 8, Issue XI, pp. 786-797.

Han, L., Embrechts, M. J., Szymanski, B., Sternickel, K., and Ross, A. (2006) Random Forests Feature Selection with K-PLS: Detecting Ischemia from Magnetocardiography. European Symposium on Artificial Neural Networks, Bruges (Belgium), pp. 26-28.

Hand, D., Mannila, H., and Smyth, P. (2001) Principles of Data Mining. A Bradford Book, The MIT Press.

Hand, D. J., and Henley, W. E. (2007) Statistical classification methods in consumer credit scoring: a review. Journal of the Royal Statistical Society:

Ha, T. M., and Bunke, H. (1997) Off-line, Handwritten Numeral Recognition by Perturbation Method. Pattern Analysis and Machine Intelligence, 19(5), pp. 535-539.

He, Q. P., and Wang, J. (2007) Fault detection using the k-nearest neighbor rule for semiconductor manufacturing processes. IEEE Trans. Semicond. Manuf., 20, pp. 345-354.

Hewahi, N. M. (2017) A hybrid approach based on genetic algorithm and particle swarm optimization to improve neural network classification. Journal of Information Technology Research, 10(3), pp. 48-49.

Hu, L. Y., Huang, M. W., Ke, S. W., and Tsai, C. F. (2016) The distance function effect on k-nearest neighbor classification for medical datasets. SpringerPlus, pp. 1304.

Janitza, S., and Hornung, R. (2018) On the overestimation of random forest's out-of-bag error. PLoS ONE. Available from: https://doi.org/10.1371/journal.pone.0201904 [Accessed 6th May 2020].

Jankowski, N., and Usowicz, K. (2011) Analysis of Feature Weighting Methods Based on Feature Ranking Methods for Classification. Neural Information Processing, pp. 238-247.

Joseph, E., Galeano, P., and Lillo, R. E. (2013) The Mahalanobis distance for functional data with applications to classification. Technometrics, pp. 1-13.

Kaelbling, L. P. (2003) An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, pp. 1157-1182.

Karabulut, E. M., Ozel, S. A., and Ibrikci, T. (2012) A comparative study on the effect of feature selection on classification accuracy. Procedia Technology, vol. 1, pp. 323-327.

Kohavi, R. (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of International Joint Conference on AI, pp. 1137-1145.

Lee, S. (2000) Noisy Replication in Skewed Binary Classification. Computational Statistics and Data Analysis, pp. 165-191.

Liu, R. L. (2018) Machine Learning Approaches to Predict Default of Credit Card Clients. Modern Economy, pp. 1828-1838.

Merikoski, M., Viitala, A., and Shafik, N. (2018) Predicting and Preventing Credit Card Default. Seminar on Case Studies in Operations Research, pp. 1-7.

Mhatre, M. S., Siddiqui, F., Dongre, M., and Thakur, P. (2015) A Review paper on Artificial Neural Network: A Prediction Technique. International Journal of Scientific & Engineering Research, vol. 6, pp. 161-163.

Miao, J. Y., and Niu, L. F. (2016) A Survey on Feature Selection. Procedia Computer Science, pp. 919-926.

Modi, J., Traore, I., Ghaleb, A., Ganame, K., and Ahmed, S. (2019) Detecting ransomware in encrypted web traffic. Springer International Publishing, pp. 345-352.

Mouhoub, M., Sadaoui, S., Mohamed, O. T., and Ali, M. (2018) Recent Trends and Future Technology in Applied Intelligence. IEA/AIE: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, pp. 99.

Nie, Y. (2012) Research on Credit Rating of Consumer Client of Credit Card Based on Data Mining. Available from: https://doi.org/10.7666/d.d224290 [Accessed 1st April 2020].

Panday, D., and Amorim, R. C. D. (2017) Feature weighting as a tool for unsupervised feature selection. Information Processing Letters, pp. 129.

Pedregosa et al. (2011) Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85), pp. 2825-2830.

Poorna, S. S., Anuraj, K., and Nair, G. J. (2018) A Weight Based Approach For Emotion Recognition from Speech: An Analysis Using South Indian Languages. Department of Electronics and Communication Engineering, pp. 14-24.

Provost, F., Fawcett, T., and Kohavi, R. (1998) The case against accuracy estimation for comparing induction algorithms. Proceedings of the 15th International Conference on Machine Learning, pp. 445-453.

Rogers, J., and Gunn, S. (2005) Identifying Feature Relevance Using a Random Forest. Latent Structure and Feature Selection, pp. 173-184.

Sainin, M. S., and Alfred, R. (2010) Nearest Neighbour Distance Matrix Classification. Advanced Data Mining and Applications, pp. 114-124.

Salzberg, S. L. (1997) On comparing classifiers: pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1(3), pp. 317-328.

Sang, Y. B., and Liu, Q. S. (2008) A Weighting-based on Feature of KNN Algorithm. Journal of Hainan University (Natural Science), pp. 352-355.

Sharda, R., Delen, D., and Turban, E. (2015) Business Intelligence and Analytics. Tenth Edition.

Sharma, S., and Mehra, V. (2018) Default Payment Analysis of Credit Card Clients. Available from: https://doi.org/10.13140/rg.2.2.31307.28967 [Accessed 20th April 2020].

Shetty, A. S., and Manoj, R. (2019) Prediction of default credit card users using data mining techniques. International Journal of Innovative Technology and Exploring Engineering, 8(7), pp. 816-821.

Sitikhu, P., Pahi, K., Thapa, P., and Shakya, S. (2019) A Comparison of Semantic Similarity Methods for Maximum Human Interpretability. Available from: https://doi.org/10.1109/AITB48515.2019.8947433 [Accessed 7th May 2020].

Uddin, T., and Uddiny, A. (2015) A guided random forest based feature selection approach for activity recognition. Available from: https://doi.org/10.1109/ICEEICT.2015.7307376 [Accessed 15th May 2020].

Veropoulos, K., Cristianini, N., and Campbell, C. (1999) The application of support vector machines to medical decision support: A case study. In ECCAI Advanced Course in Artificial Intelligence, pp. 17-21.

Vishnu, D. A., and Patel, R. I. (2013) A review on prediction of EDM Parameter using artificial neural network. International Journal of Scientific Research, 2(3), pp. 145-149.

Wolf, L., and Shashua, A. (2005) Feature Selection for Unsupervised and Supervised Inference: The Emergence of Sparsity in a Weight-Based Approach. Journal of Machine Learning Research, pp. 1855-1887.

Xie, H., and Zhao, H. Y. (2015) Improved KNN algorithm based on chi-square distance measurement. Applied Science and Technology, vol. 42, pp. 10-14.

Yeh, I. C., and Lien, C. H. (2009) The Comparisons of Data Mining Techniques for the Predictive Accuracy of Probability of Default of Credit Card Clients. Expert Systems with Applications, 36, pp. 2473-2480.

Zhang, G. (2000) Neural Networks for Classification: a Survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), pp. 451-462.

Zhao, H., Liu, H. L., and Fan, Y. J. (2012) Study on the Application of Complex Network Theory in Chinese Text Feature Selection. Modern library

Zhou, L. G., and Lai, K. K. (2009) Adaboosting Neural Networks for Credit Scoring. The Sixth International Symposium on Neural Networks, pp. 875-884.

Zhu, H., Akrout, M., Zheng, B. J., Pelegris, A., Phanishayee, A., Schroeder, B., and Pekhimenko, G. (2018) Benchmarking and Analyzing Deep Neural Network Training. Available from:
