Stock market prediction using the K Nearest Neighbours algorithm and a comparison with the moving average formula

Ida Vainionpää and Sophie Davidsson

Degree Project in Computer Science

DD143X

Supervisor: Pawel Andrzej Herman

Examinator: Örjan Ekeberg


be used and whether or not the predicted results are valid. This report compares two prediction methods, the K Nearest Neighbour (KNN) algorithm and the moving average (MA) formula, using the closing prices of four Swedish equities listed on the Stockholm stock exchange OMX. To familiarize the reader with the background of stock markets and the formulas used, the report explains these theoretical concepts. The results are presented with appropriate charts and tables. Lastly, a discussion explains the implications of the results and the conclusion that the KNN algorithm produced more accurate predictions than the MA formula.


List of Figures

1 KNN algorithm explained in a picture

2 Pseudocode explaining the KNN algorithm in a simple way (KNN Classification Algorithm Implemented in Lisp)

3 The mathematical formula for the MA

4 The mathematical formula for the euclidean distance

5 The mathematical formula for the average

6 The mathematical formula for the MA

7 The mathematical formula for RMSE

8 The mathematical formula for MPE

9 The mathematical formula for AD

10 The difference between the average actual price and each prediction method, as per each equity

11 Graph displaying the actual values for ABB against the predicted values from the KNN algorithm

12 Table displaying a short cutout of the actual price values, those predicted using the KNN algorithm and their difference (deviation)

13 Graph displaying the actual values for ABB against the predicted values from the MA formula

14 Table displaying a short cutout of the actual price values, those predicted using the MA formula and their difference (deviation)

15 Graph displaying the actual values for Astrazeneca against the predicted values from the KNN algorithm

16 Table displaying a short cutout of the actual price values, those predicted using the KNN algorithm and their difference (deviation)

17 Graph displaying the actual values for Astrazeneca against the predicted values from the MA formula

18 Table displaying a short cutout of the actual price values, those predicted using the MA formula and their difference (deviation)

19 Graph displaying the actual values for H&M against the predicted values from the KNN algorithm

20 Table displaying a short cutout of the actual price values, those predicted using the KNN algorithm and their difference (deviation)

21 Graph displaying the actual values for H&M against the predicted values from the MA formula

22 Table displaying a short cutout of the actual price values, those predicted using the MA formula and their difference (deviation)

23 Graph displaying the actual values for Investor against the predicted values from the KNN algorithm

24 Table displaying a short cutout of the actual price values, those predicted using the KNN algorithm and their difference (deviation)

25 Graph displaying the actual values for Investor against the predicted values from the MA formula

26 Table displaying a short cutout of the actual price values, those predicted using the MA formula and their difference (deviation)

27 Table displaying the three different error calculation methods and the values corresponding to each prediction method, per equity


1 Introduction

Stock markets are today a large part of each nation’s financial system, and they can directly reflect how a nation’s economy is developing or declining. Several methods are currently used to help market participants forecast market movements. The most commonly used prediction methods are software-based solutions that use graphical and statistical approaches to foresee market movements (Beattle, 2011). One example of a statistical method is the moving average (MA), which is often used by traders (Interactive Data Corp, 2014). These methods are, however, not reliable enough to be depended on completely, and the opinion of an experienced market trader is often valued higher than software-based predictions (Beattle, 2011). If a valid prediction method were to exist, one could foresee large fluctuations in the market and take countermeasures to reduce their consequences. This could also lead to less chaotic behaviour in the stock market.

If this kind of prediction method were to exist, the field of computer science would become more essential, because computers would be used more frequently. The field of computer science is ever evolving and is becoming more and more evident in every aspect of one’s lifestyle. As the capability of computers continues to broaden, more and more tasks are being completed with the help of computers instead of humans, thus eliminating the human error factor. Complicated tasks are now being achieved by computers through the use of artificial intelligence, which is coming ever nearer to replacing a part of the error-prone human mind. Many examples of technology replacing human error are seen in aviation, which suggests that the same could one day be evident in more fields (Ihilliard1, 2011). This elimination of human error could also be introduced in the finance field, for example when predicting stock market movements. Computational intelligence, machine learning and data mining, which find correlations in large data sets that humans are not capable of finding, can now be used as prediction methods in finance as well as in the fields of medicine and biology (Alexander, 1998).


There are existing techniques for stock prediction, among them multispectral prediction, distortion controlled prediction and Lempel-Ziv based prediction. These are based on the fact that the data representation is made more compact by removing redundancy, while the essential information is kept in a format that is accessible (Azhar et al., 1994). Given the scope of the project, however, the techniques most suitable to work with were the KNN algorithm and the MA formula rather than the techniques listed above.

The most accurate way to predict the outcome of the stock market is a frequently discussed matter. It is extremely complicated to take into consideration all the factors that can influence a stock, for example internal development, world events, inflation and interest rates, exchange rates and, lastly, hype (Wolski, 2014). Over the last couple of years the idea of using data mining to forecast the stock market has been increasingly explored because of people’s growing interest in the stock market and in the ability to make a profit.

1.1 Problem Statement

The problem statement we have chosen to work with in this project follows from the hypothesis that the KNN algorithm is a more precise way of predicting closing prices than the MA formula. The problem statement is formulated below:

Is using the KNN algorithm a more precise way of predicting the future closing prices of equities than using the more common method of MA?

1.2 Problem Scope

The following report will delve into the concept of using a data mining algorithm to attempt to predict the movement of four equities on the Stockholm stock exchange OMX as accurately as possible. This is an important contribution to knowledge because it can give rise to many prediction techniques in the future, and it introduces a link between the finance field and the field of computer science. To explain the approach to the research question, the following section describes the scope of the project.

This report will explore the ability of using the KNN algorithm and the MA formula as methods of predicting stock market movements. Several steps will be followed as we attempt to answer and explore the research question.

- As an initial step, the algorithm and the formula will be implemented in a programming language that fits the purpose of the research question and then tested to ensure that it performs as it should.


- The results of the gathered data from the algorithm and the MA method will then be accumulated and displayed graphically through graphs, charts and explanatory text, in order to clearly display the accuracy.

- The error calculations from running the algorithm and the MA method are the numerical quantification of the project. They are necessary in order to test the correctness of the data collected and to allow an evaluation of whether or not the KNN algorithm is a well-working prediction model for stock market movements compared to the MA formula.

1.3 Report Outline

• The background (2) gives a brief overview of the different aspects involved within this project. This includes some financial theory, for example how a stock market works, a concise explanation of how the KNN algorithm is structured, an overview of what data mining involves and a description of the MA formula.

• How the results were achieved is described in method (3): the process by which the algorithm was implemented and how the data was used to obtain results. It will also describe how the MA formula was used to arrive at results. All the thoughts and decisions made within the implementation phase will be documented and clearly displayed within the methods section.

• The findings are then presented in the results (4) section, where they are presented numerically through graphs and tables. Error calculations will also be exhibited in this section.

• The results are then analyzed within the discussion (5) section. This means considering the error calculations and the results, attempting to explain why the results look the way they do, and assessing whether or not the research question can be answered based on them.

• The conclusion (6) then consists of a final statement which states whether or not the research question has been answered. It must also state what implications the results have for the future of the areas concerned.

2 Background

2.1 Stock Markets

The stock market consists of buyers and sellers and their stocks. A stock is essentially a document that proves your right to a part of a company’s current and future net assets. For example, when companies wish to expand their business but do not have the financial resources for it, they can choose to sell a share of their stock. The owners would sell a share of the company in order to raise new capital to invest in the expansion of the business (Kennon, 2014).


would cause such a rise would be when a company announces an unexpected increase in its quarterly result. This would cause the price of the particular stock to rise even if the market is relatively stable (Little, 2009).

The Stockholm stock exchange is the largest stock exchange in the Nordic countries. It consists of approximately three hundred listed companies. The market data used in this report was gathered from the Stockholm stock exchange (Nasdaq OMX Nordic, 2014).

Many market participants dedicate a lot of time to trying to forecast the market for their own or their customers' benefit, in order to make a strong return. If they were to succeed, they would make a large financial gain. The subject is interesting, rather complicated and a source of valued knowledge, but the main aim is in fact to be able to predict the stock market.

2.2 Data Mining in Stock Market Analysis

Data mining consists of analyzing data from different perspectives and putting the data to use by summarizing it into helpful information. The concept of data mining is essentially finding patterns in large collections of datasets.

By recognizing these patterns in large datasets, analysts can understand the behaviour of the data and therefore perhaps even predict the future (Alexander, 1998). Effective use of data mining and pattern recognition can lead to more effective investment decisions on the stock market (Palace, 1996). Data mining is a helpful methodology because it helps people detect patterns that could easily be missed without extensive analysis, and it is thus a large part of this project (Alexander, 1998).

Data mining tools help people see what is going on in the stock market that could be missed without these tools. Such tools are said to be helpful because of the difficulty of stock market prediction (Nawawi, 2013).

The basic steps of using data mining as a stock market prediction tool are to define the pattern, recognize the problem, collect the data, prepare it and lastly preprocess it (Unica Technologies Inc, 1997). Thus a large amount of time is spent on preparation before the user can start to predict the desired prices.

2.3 KNN Algorithm


training data is divided up into vectors and then a distance from the test data to its neighbour is calculated by using one of several methods. The most common method is the weighted Euclidean distance, which is essentially the straight-line distance between the two points. This can be seen in figure 1, where the nearest neighbours to B? are selected using the Euclidean distance.

Figure 1: KNN algorithm explained in a picture

The figure shows how the KNN algorithm uses the Euclidean metric to choose the nearest neighbours.


2.4 Moving Average

The MA is a statistical method which traders have traditionally used as a tool in predicting future prices (Interactive Data Corp, 2014). The formula takes a certain amount of prior data and averages it in order to make a prediction. The formula may vary in appearance but usually has the following structure:

Figure 3: The mathematical formula for the MA

As a common tool for traders this method will later be compared to the prediction using the KNN algorithm.

3 Method

In order to efficiently provide an answer to our research question, the correct data first had to be gathered, and then the two methods were implemented and used to predict the closing prices of the four equities.

3.1 The Data Selection

One large part of this project was choosing which data to use the prediction methods with. As previously discussed, the data was chosen from the Stockholm stock exchange, as this was the most easily accessed and most relevant stock exchange data. Four equities were then chosen from different sectors. This was important because the data for the four equities should not all follow the same trends and movements, as is often seen with equities within the same sector. The chosen equities are large global companies that are often not based only in Stockholm; this means that the data would also react to changes within other stock markets and not only to the Stockholm stock exchange. The chosen four were: Investor, a Swedish-based investment company; H&M, a Swedish-based retail company; Astrazeneca, a Swedish- and British-based pharmaceutical company; and lastly ABB, a Swedish- and Swiss-based technical company.


during the day than over night. So if, for example, a major macroeconomic event were to happen in a different time zone, the trader would not have the chance to adjust his/her position (Saint-Leger, 2014).

When choosing the time frame for which the closing prices were to be selected, there were two major constraints to consider. Firstly, an unlimited amount of data could not be selected, as this would cause practical problems. Secondly, the selected data set had to be large enough to ensure that the predictions would be accurate. After considering this, a time frame of two years was selected, as this was enough data to produce accurate predictions from while the amount of data was still manageable.

With this in mind, the historical daily closing prices for a time span of two years were selected as the data which the predictions would be based on and compared to.

3.2 KNN Algorithm

As an initial step the KNN algorithm had to be implemented and valid data had to be gathered from it. Several programming languages that could be used to implement the algorithm were considered. After the question had been researched and a handful of languages compared, it quickly became evident that the most suitable language was Matlab. Matlab has several built-in functions which were utilized and which simplified the implementation phase. The built-in function central to our version of the algorithm is the knnsearch function, which essentially returns the nearest neighbours of the values within a matrix based on values within another matrix (the training data). This function was used and then adapted to the specific situation of predicting closing prices. The function allows one to specify the number of neighbours to be selected, and after testing several different numbers of neighbours, K = 4 was selected as it yielded the most accurate results, meaning the smallest deviation from the actual price.


The euclidean distance is calculated by:

Figure 4: The mathematical formula for the euclidean distance

n - number of coordinates of the two points

x and y - the coordinates of the two points
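For reference, the standard Euclidean distance between two points x and y with n coordinates can be written as follows. This is the textbook form and is given here only as a reading aid for the definitions above.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
```

When each point is a single closing price (n = 1), this reduces to the absolute difference |x - y|.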

In order to find the next day's prediction, the indexes of the selected nearest neighbours were all increased by one in order to select the day following each neighbour. This causes an issue when a neighbour is the last price in the matrix, since adding one makes the index out of bounds. This was solved with a special case in a for-loop. The values of the selected next-day neighbours are then averaged in order to make a prediction of how the closing price will evolve the following day (PlatinumPrep, 2009).
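The procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the Matlab implementation used in the project: the function name `knn_predict_next`, the toy price list and the use of a single price as the query are assumptions made for the example. The wrap-around to the first price for the last index follows the special case mentioned in the discussion section.

```python
def knn_predict_next(prices, query, k=4):
    """Predict the closing price that follows `query`, given a price history."""
    n = len(prices)
    # Rank every historical day by its distance to the query price.
    # For single prices the Euclidean distance reduces to |x - y|.
    ranked = sorted(range(n), key=lambda i: abs(prices[i] - query))
    neighbours = ranked[:k]
    # Move each neighbour index one day forward. If the neighbour is the
    # last entry, wrap around to the first price so the index stays in bounds.
    next_day = [prices[(i + 1) % n] for i in neighbours]
    # The prediction is the plain average of the next-day prices.
    return sum(next_day) / len(next_day)


# Hypothetical toy history, not data from the report.
history = [10.0, 11.0, 12.0, 11.0, 10.0, 9.0]
print(knn_predict_next(history, query=11.0, k=2))  # prints 11.0
```

In the toy run the two nearest neighbours of 11.0 are the prices at positions 1 and 3, whose following prices are 12.0 and 10.0, giving a prediction of 11.0.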

The average was calculated by:

Figure 5: The mathematical formula for the average

where the x_i values were the nearest neighbours and n was the number of neighbours k.
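Written out, the arithmetic mean over the selected neighbours takes the usual form below, using the same symbols as the description above.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

As a purely illustrative example with made-up numbers: with k = 4 and next-day neighbour prices of 100, 102, 101 and 105 SEK, the predicted price would be (100 + 102 + 101 + 105) / 4 = 102 SEK.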

As a last step a while loop was implemented in order to be able to predict several prices in one run; this simplified the process of gathering the prediction data for two years.

3.3 Moving Average


The MA was calculated by:

Figure 6: The mathematical formula for the MA

Where CLOSE(i) is the closing price and N is the number of days one has selected data for.
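With the symbols defined above, a simple moving average is commonly written as shown below. The indexing, with i running over the N most recent closing prices, is an assumption, since the original figure is not reproduced here.

```latex
MA = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CLOSE}(i)
```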

This formula was then applied to the gathered closing data for all of the equities, where the 5 previous days were used to predict a sixth. As this prediction method requires a certain amount of previous data, predictions were only possible 5 days after the first data was gathered. This means that for the KNN algorithm the predictions begin on 2012-01-02 and for the MA formula they begin on 2012-01-09.
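The windowing described above can be illustrated with a short Python sketch. The function name `moving_average_forecast` and the sample prices are hypothetical; the sketch only assumes that each forecast is the plain mean of the N closing prices before it.

```python
def moving_average_forecast(closes, n=5):
    """Forecast each day as the mean of the n closing prices before it."""
    forecasts = []
    # The first forecast is only possible once n earlier prices exist.
    for t in range(n, len(closes)):
        window = closes[t - n:t]  # the n days preceding day t
        forecasts.append(sum(window) / n)
    return forecasts


# Hypothetical closing prices, not data from the report.
closes = [100.0, 102.0, 101.0, 103.0, 104.0, 106.0, 105.0]
print(moving_average_forecast(closes))  # prints [102.0, 103.2]
```

Seven prices give only two forecasts, which mirrors why the MA predictions in the report start five trading days later than the KNN predictions.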

3.4 Error Calculation

3.4.1 Root Mean Square Error (RMSE)

RMSE is a method based on measuring the difference between an estimated value and an actual observation (Holmes, 2000).

Figure 7: The mathematical formula for RMSE

The formula can be described as the difference between the observed values X_obs and the estimated values X_model, which is squared, summed and divided by the number of values in one data set (n), i.e. the number of observed values, which is the same as the number of estimated values, and lastly square rooted. This universal error measurement will be used to measure the accuracy of the two different prediction methods that are implemented.
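In symbols, the standard definition of the RMSE that matches this description is:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{\mathrm{obs},i} - X_{\mathrm{model},i} \right)^{2}}
```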

3.4.2 Mean Percentage Error (MPE)


Figure 8: The mathematical formula for MPE

The formula can be described as the difference between the forecasted value f_t and the actual value a_t as a percentage of the actual value, which is then averaged using the number of values in one set (n), i.e. the number of forecasted values, which is the same as the number of actual values.
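One common way of writing the MPE is shown below. The sign convention (actual minus forecast) is an assumption, as the original figure is not reproduced here; some authors take the absolute value of each term instead, which gives the mean absolute percentage error.

```latex
\mathrm{MPE} = \frac{100\%}{n} \sum_{t=1}^{n} \frac{a_t - f_t}{a_t}
```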

3.4.3 Average Difference

The AD is a simple error calculation where the average difference between the observed value and the actual value is calculated (World Encyclopedia, 2013).

Figure 9: The mathematical formula for AD

The formula can be described as the sum of all the differences between the observed values (x) and their corresponding actual values (y), divided by the number of values in one data set (n), i.e. the number of observed values, which is the same as the number of actual values.
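A form of the AD consistent with this description is given below. The absolute value is included because the results section describes the AD as averaging the absolute value of the difference; the exact notation of the original figure is an assumption.

```latex
\mathrm{AD} = \frac{1}{n} \sum_{i=1}^{n} \left| x_i - y_i \right|
```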

4 Results

4.1 Graphs and Tables


Figure 10: The difference between the average actual price and each prediction method, as per each equity

The combined graph and table in figure 10 presents the deviation between the averaged prices from each prediction method and the actual closing prices. In other words, an average price was calculated over all the predicted data from each forecasting method, as well as over all the actual closing prices. The average price from the KNN algorithm was then compared to the actual average price, and the same was done for the MA formula. The bar chart in figure 10 displays the difference between the predicted average price and the actual average price for each equity. As seen in figure 10, the KNN algorithm (blue) has a consistently lower value than the MA formula (red). This suggests that the KNN prediction was closer to the actual value than the MA formula.


4.1.1 ABB

Figure 11: Graph displaying the actual values for ABB against the predicted values from the KNN algorithm


Figure 13: Graph displaying the actual values for ABB against the predicted values from the MA formula


4.1.2 Astrazeneca

Figure 15: Graph displaying the actual values for Astrazeneca against the predicted values from the KNN algorithm


Figure 17: Graph displaying the actual values for Astrazeneca against the predicted values from the MA formula


4.1.3 H&M

Figure 19: Graph displaying the actual values for H&M against the predicted values from the KNN algorithm


Figure 21: Graph displaying the actual values for H&M against the predicted values from the MA formula


4.1.4 Investor

Figure 23: Graph displaying the actual values for Investor against the predicted values from the KNN algorithm


Figure 25: Graph displaying the actual values for Investor against the predicted values from the MA formula


4.2 Error Calculation

As described earlier, three methods of error calculation were used in order to clearly state the accuracy of each of the prediction methods. The three methods used were RMSE, MPE and AD.

Figure 27: Table displaying the three different error calculation methods and the values corresponding to each prediction method, per equity


finds an average for all the produced values. As one can see from the table in Figure 27, the KNN algorithm has a lower RMSE than the MA formula for each of the four equities. The KNN values are almost half of the MA values, meaning that the KNN predictions deviated only about half as much as the MA predictions did.

The MPE method looks at the difference between the actual values and the predicted values and then finds a percentage difference. Similarly to the RMSE method, the MPE values for the KNN algorithm were also lower than those for the MA formula. This implies that the KNN algorithm values deviated less, percentage-wise, from the actual prices than the MA formula values did. One interesting observation with this error method is that the equity Astrazeneca, which received the highest values using the other two methods, received the lowest error using this method. This curious observation will be discussed further later on in the paper.

Lastly, the AD method also looks at the difference between the actual price and the predicted price, by taking the absolute value of the difference and then averaging it. Just like with the previously mentioned methods, the values for the KNN algorithm were lower than those for the MA formula.
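To make the three measures concrete, the following Python sketch computes them for a pair of made-up price series. The function names and the numbers are hypothetical, and the sign convention in `mpe` (actual minus predicted, divided by actual) is an assumption rather than the exact formula used in the report.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equally long series."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mpe(actual, predicted):
    """Mean percentage error; signed, so opposite errors can cancel."""
    n = len(actual)
    return 100.0 / n * sum((a - p) / a for a, p in zip(actual, predicted))


def average_difference(actual, predicted):
    """Mean of the absolute differences, in the unit of the prices."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


# Hypothetical values, not data from the report.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(rmse(actual, predicted))                # prints 10.0
print(mpe(actual, predicted))                 # about -2.5 (percent)
print(average_difference(actual, predicted))  # prints 10.0
```

Both predictions miss by 10 in absolute terms, so RMSE and AD are equal here, while the percentage error weighs the miss on the cheaper price more heavily. This is the same effect that lets an equity with a high price level score a low percentage error.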

5 Discussion

5.1 KNN Algorithm

After viewing the graphs and tables in the results section, a clear trend becomes evident. This is especially evident in the average prices graph and the error calculation table. The average prices graph displayed how close the average closing prices using the KNN algorithm were to the actual average closing prices. The difference between the two was smaller than 0.1 SEK, a considerably small difference. One could compare this to the AD, which is the average difference per day; its values were between 1.09 SEK and 2.05 SEK, which is also a small difference per day. However, these measurements are just averages, and the equity graphs (showing the actual price against the price predicted using KNN) show that the price levels were ever changing. So in order to answer the question of whether the KNN prediction method was more precise than the MA formula, one cannot take only the average prices into consideration.
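The point that average prices alone can hide daily errors can be shown with a small Python example. The numbers are invented for illustration: the two series have identical averages even though every single prediction is off by 4.

```python
# Hypothetical series, not data from the report.
actual = [100.0, 104.0, 98.0, 102.0]
predicted = [104.0, 100.0, 102.0, 98.0]

# Difference between the average predicted and the average actual price.
mean_gap = abs(sum(predicted) / len(predicted) - sum(actual) / len(actual))

# Average absolute difference per day (the AD measure).
ad = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

print(mean_gap)  # prints 0.0
print(ad)        # prints 4.0
```

Over- and under-predictions cancel in the average price, but not in the AD, which is why the per-day error measures are needed alongside the average prices graph.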


not taking the size of the total price into consideration. As the MPE is an error percentage of the total price, this indicator may be the most appropriate, as it takes the large price difference between the equities into consideration. However, when comparing with the MA this does not make a difference, as the values for the KNN error were considerably lower than those for the MA formula, including for Astrazeneca.

Another matter to discuss is the choice of the value of K in the KNN algorithm. As the value was chosen based on experimentation and evaluation, there is no way of knowing whether our choice was optimal without a systematic evaluation of the method, which was not done. However, since the data already supports our hypothesis that the KNN algorithm would be a more accurate method of prediction, even a better value of K would not have changed the conclusions reached. It could, however, have had an impact on the KNN results, making them even more accurate. This is something that would have to be researched further.

As mentioned in the method section of the paper, we inserted a special case for arriving at the last price in the training data, where the next index would otherwise fall out of bounds because there are no more days to calculate on. In this case the next index is taken to be the initial closing price, which of course may not have a direct relation to the last closing price of the matrix. It is nevertheless used to predict the closing price after the last price, which can lead to an invalid prediction. However, as this is only one prediction out of 499, its impact on the overall accuracy of the algorithm is relatively small. This could have been solved by inserting more data.

The way the predictions were made must also be considered in detail. The method used was based on one single prediction at a time. A single price was selected and then compared to the training data, finding its nearest neighbours and taking these neighbours’ next prices. However, each predicted price was not stored in order to make another prediction based on it. Each price was predicted using only the already existing training data and not the previous predictions. Whether or not this method was optimal can be argued, and if a prediction were to be made further than one future day at a time, another method of prediction would have to be implemented. During the implementation stage, the method of using already predicted prices to predict further was tested, but it had a tendency to converge towards a single value, so the single-day prediction was favoured. The method used can easily be adapted to predict further into the future, but then the risk arises that it will also tend to converge towards a single value if one tries to predict too far ahead. However, if one were able to overcome the converging problem, both methods could be used to predict closing values on the stock market.

5.2 MA Formula


average prices graph, the difference between the average actual price and the price predicted by the MA formula was almost ten times as large as for the KNN algorithm's predicted prices. However, when regarding the error calculations, the difference between the two prediction methods is not as great. For example, the MPE values, which ranged from 1.07% to 1.4%, were only slightly higher than the values for the KNN algorithm. An average error of approximately 1% over 2 years of data is still a very accurate result, even though the KNN algorithm performed more accurately.

There can be many reasons why the MA formula was not as precise as the KNN algorithm, but one main reason could be the difference in how these methods consider previous data. While the KNN method searches for relevant data, the MA formula only regards the previous N days. The MA formula then bases the entire prediction on only those N days, whilst the KNN algorithm uses the entire 2 years' worth of data to find relevant neighbours. This is a limitation of the MA formula, as it does not take longer historic trends into consideration but only looks at the closest previous data.

The value of N was chosen as 5, after some experimenting and comparison. However, the question arises whether this way of choosing an N value was the most effective. Only whole numbers were considered, and this could have limited how well the most accurate value of N was identified in this case. A further study and evaluation taking more values of N into account could result in more accurate values using the MA formula. However, given the difference in accuracy between the KNN algorithm and the MA formula, a change in the value of N may improve the values from the MA formula but will most probably not make the method more accurate than the KNN algorithm, as the difference in deviation between the two methods is large.

5.3 Implications of the results

As the results gathered show that the algorithm is a more accurate way of predicting the future closing prices, the question arises why traders do not usually utilize more technical approaches in order to predict stock market changes and in turn earn more money. Assuming that most traders have a purely economic background and may perceive the world of algorithms and programming as rather intimidating and complex, this would explain the absence of more technical approaches to the prediction problem.

Another implication which must also be considered is that if algorithms were able to predict the market movements extremely accurately, and were available to all market participants, then the potential for making an extraordinary return from trading would disappear.


be based on the algorithm, as social aspects which can only be incorporated by a trader also need to be considered. However, a combination of the two could only be beneficial to the trader.

6 Conclusion

To answer our problem statement, Is using the KNN algorithm a more precise way of predicting the future closing prices of equities than using the more common method of MA? In order to answer this question, the two methods of prediction were implemented and data was produced. The results were graphed and displayed partially in tables, calculations were then done in order to measure the accuracy of the data. After taking all of the diﬀerent versions of the data and presented calculations into account, there was a clear answer to the stated question, the KNN algorithm was in fact more accurate on all accounts.

This was not largely surprising as the KNN algorithm not only consid-ered the entire two years of data for each prediction but also took previous historic trends into consideration. It looked at the historic movement of the days which resembled the value one was attempting to predict. This method of prediction clearly proved to be eﬀective and more accurate than a method which is commonly used by traders.

The implication of this is that traders should take the step of exploring prediction using algorithms and machine learning more than they do today. As the computer science sector continues to evolve and take new steps towards simplifying the environment around us, maybe it is time for the trading environment to follow and rely more on computer science than on statistical methods.

7 References

• Alexander, D. (1998) Data mining. Available at: http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/ [Accessed: 3rd March 2014].

• Alkhatib, K., Najadat, H., Hmeidi, I., Ali Shantnawi, M.K. (2013) 'Stock Price Prediction Using K-Nearest Neighbor (kNN) Algorithm', International Journal of Business, Humanities and Technology, 3 (3), pp. 34-44. Available at: http://www.ijbhtnet.com/journals/Vol_3_No_3_March_2013/4.pdf [Accessed: 28th February 2014].

• Azhar, S., Badros, G., Glodjo, A., Kao, M.Y, and Reif, J. (1994) ’Data compression techniques for stock market prediction’. Data Compression Conference, pp. 1-11.

• Beattle, A. (2011) The Basics Of Business Forecasting. Available at: http://www.investopedia.com/articles/financial-theory/11/basics-business-forcasting.asp [Accessed: 1st March 2014].

• Berson, A., Smith, S. and Thearling, K. (1999) Building Data Mining Applications for CRM. New York: McGraw-Hill.

• Dominique, G., Huck, N. (2005) On the Use of Nearest Neighbors in

• Greenacre, M. (2008) ’Measures of distance between samples: Euclidean’. In: Greenacre, M. ed. Correspondence Analysis and Related Methods. Stanford University.

• Holmes, S. (2000) RMS Error [RMS Error]. Stanford University, 28th November.

• Ihilliard1. (2011) Faa guidance in human error for air traffic control and pilot crew. Available at: http://www.studymode.com/essays/Faa-Guidance-In-Human-Error-For-704108.html [Accessed: 1st April 2014].

• Imandoust, S.B., Bolandraftar, M. (2013) ’Application of K-Nearest Neighbor (KNN) Approach for Predicting Economic Events: Theoretical Background’, Int. Journal of Engineering Research and Applications, 3 (5), pp. 605-610. Available at: http://www.ijera.com/papers/Vol3_issue5/DI35605610.pdf [Accessed: 28th February 2014].

• Interactive Data Corp. (2014) Moving Averages - Simple and Exponential. Available at: http://stockcharts.com/help/doku.php?id=chart_school:technical_indicators:moving_averages [Accessed: 5th March 2014].

• Kennon, J. (2014) An introduction to the stock market. Available at: http://beginnersinvest.about.com/cs/newinvestors/l/bl_lesson1c.htm [Accessed: 4th March 2014].

• Little, K. (2009) How stock prices are changed. Available at: http://stocks.about.com/od/tradingbasics/a/032909prices.htm [Accessed: 4th March 2014].

• Nasdaq OMX Nordic. (2014) Companies listed on Nasdaq OMX Stockholm. Available at: http://www.nasdaqomxnordic.com/aktier/listed-companies/stockholm [Accessed: 4th March 2014].

• Nawawi, A. (2013) ’Stock market tip: use Google Trends’, The Conversation, 26th April. Available at: http://theconversation.com/stock-markettipusegoogletrends13745 [Accessed: 1st April 2014].

• Palace, B. (1996) Data Mining. Available at: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm [Accessed: 3rd March 2014].

• PlatinumPrep, LLC. (2009) Arithmetic Mean (Average) - GMAT Math Study Guide. Available at: http://www.platinumgmat.com/gmat_study_guide/statistics_mean [Accessed: 8th April 2014].

• Saint-Leger, R. (2014) What Is the Significance of a Closing Price on a Stock? Available at: http://finance.zacks.com/significance-closing-price-stock-3007.html [Accessed: 9th April 2014].

• Swanson, D.A., Tayman, J., Bryan, T.M. (2010) Mape-R: a rescaled measure of accuracy for cross-sectional forecasts. University of California Riverside and University of California San Diego.

• TA-Guru. (2010) Moving average. Available at: http://www.ta-guru.com/Book/TechnicalAnalysis/TechnicalIndicators/MovingAverage.php5 [Accessed: 9th April 2014].

• Unica Technologies Inc. (1997) Solving Data Mining Problems Using Pattern Recognition Software with Cdrom. Upper Saddle River, NJ, USA: Prentice Hall PTR.

• World Encyclopedia. (2013) Genomsnittlig avvikelse. Available at: http://sv.swewe.com/word_show.htm/?26143_1&Genomsnittlig%7Cavvikelse [Accessed: 9th April 2014].

• Wu, X., Kumar, V., Quinlan, J.R., Ghosh, J., Yang, Q., Motoda, H., McLachlan, J.G., Ng, A., Liu, B., Yu, S.P., Zhou, Z.H., Steinbach, M., Hand, J.D. and Steinberg, D. (2007) ’Top 10 Algorithms in Data Mining’, Springer-Verlag London Limited, 14 (1), pp. 1-37. Available at: http://www.cs.umd.edu/~samir/498/10Algorithms08.pdf [Accessed: 1st March 2014].

7.1 Figures:

• Figure 1: Sergei Savchenko (1998). Editing Nearest Neighbour Decision Rules. Available: http://jeff.cs.mcgill.ca/~godfried/teaching/projects.pr.98/sergei/project.html Last accessed: 1st April 2014

• Figure 2: k-Nearest Neighbor Classification Algorithm Implemented