
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Predicting The Stock Market With Financial Time Series Using Hybrid Models - A Comparative Analysis

VILHELM BUREVIK SANDBERG

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ENGINEERING SCIENCES



Sammanfattning

Predicting stock indexes has proven difficult with traditional models. Machine learning is a field within computer science that has shown good results when applied to time series analysis, owing to its use of smart learning algorithms. However, much remains to be improved in the search for better and more distinctive models.

This thesis aims to fill this gap by examining how well the methods SVR, SVM, LMANN and BRANN perform when predicting financial time series, in the form of stock indexes, one day ahead.

The individual methods will also be used as benchmarks to evaluate two hybrids combining them. These hybrids are constructed using an optimization algorithm, GA, with the aim of minimizing the RMSE. The results indicate that the two hybrids outperform the individual models and therefore prove more successful at predicting the next day's closing price of stock indexes.

Furthermore, the statistical test confirms that the hybrids perform differently from the individual models, though not in what way. It is, however, not statistically significant which of the two hybrids performs best. In addition, the standard deviation of the hybrids' errors varies somewhat, which should be kept in mind if the model is to be used and applied. Key information about the results is presented in tables 5.1 through 5.5.

In conclusion, hybrid models have once again proven valuable for predicting the next day's stock index. The results of this thesis will hopefully contribute to further research in the pursuit of perfection in the field of financial time series.


Abstract

Predicting stock indexes has proven to be difficult using traditional technical analytics. Machine learning is a field within computer science that is frequently applied to this problem and, owing to its learning algorithms, has shown good results. However, there is still a lot of unexplored territory in the search for improved and superior methods.

This thesis seeks to fill this gap by examining how well the following methods perform when predicting one-day-ahead financial time series: SVR, SVM, LMANN and BRANN. The individual methods will also be used as benchmarks to evaluate two hybrids constructed from the above-mentioned algorithms, merged by minimizing the RMSE using GA. The results do not establish which of the individual models performs best when predicting short-term stocks. Further, the results strongly indicate that both hybrids outperform all the individual methods by a wide margin, and therefore that they are successful in their performance. The statistical test confirms that the hybrids perform differently from the individual models, though not in what way. It is, however, not statistically significant which of the two hybrids performs best. Moreover, the standard deviation shows that the performance of the hybrids varies somewhat, which should be taken into account if the models are to be used to invest real money. Important information about the results is presented in tables 5.1 through 5.5.

In conclusion, hybrid models have again proven significant for predicting short-term stocks. The findings of this thesis will hopefully aid further research in the quest for perfection in the field of financial time series prediction.


Table of contents

List of figures vi

List of tables vii

Nomenclature viii

1 Introduction 1

1.1 Problem Statement . . . 2

1.2 Scope . . . 2

1.3 Thesis Outline . . . 3

2 Background 4

2.1 Support Vector Machine (SVM) . . . 4

2.2 Support Vector Regression (SVR) . . . 5

2.3 Artificial Neural Network (ANN) . . . 5

2.4 Back Propagation Neural Network (BPNN) . . . 7

2.5 Nonlinear Autoregressive Network with Exogenous Input (NARX) . . . 7

2.6 Genetic Algorithm (GA) . . . 9

2.7 Related Work . . . 9

3 Model 12

3.1 Bayesian Regularized Artificial Neural Network (BRANN) . . . 12

3.2 Levenberg-Marquardt Artificial Neural Network (LMANN) . . . 12

3.3 SVR, SVM . . . 12

3.4 GA-WA . . . 12

4 Method 13

4.1 Data Analytics . . . 13

4.2 Steps of Prediction . . . 14

4.3 Technical Indicators . . . 14

4.4 Benchmark . . . 15

4.5 Measurements of Performance . . . 15

5 Results 17

5.1 Evaluation of Performance . . . 17

5.2 Statistical Analysis . . . 24


6 Discussion 25

6.1 Ethical Considerations . . . 25

6.2 Sustainability . . . 25

6.3 Discussion of Result . . . 25

7 Conclusion 27

7.1 Future Research . . . 28

References 29


List of figures

2.1 Figure illustrating the ANN architecture . . . 6

2.2 Figure illustrating the architectural NARX model [2]. . . 8

2.3 Figure illustrating the architectural GA-WA model [23]. . . 11

5.1 Prediction of OMX30 stock index compared to the actual for both the hybrids. . . 18

5.2 Figure illustrating the architectural GA-WA model [23]. . . 19

5.3 Figure illustrating the architectural GA-WA model [23]. . . 20

5.4 Prediction of S&P 500 stock index compared to the actual for both hybrids. . . 21

5.5 Prediction of Nikkei 225 stock index compared to the actual for both hybrids. . . 22

5.6 Prediction of OMX30 stock index compared to the actual for both hybrids. . . 23


List of tables

4.1 Table showing standard deviation for each time series . . . 14

5.1 Table showing the performances of methods . . . 23

5.2 Table showing standard deviation for the error of the best hybrid prediction . . . 23

5.3 Table showing average weight for GA-WA hybrids . . . 24

5.4 Table showing significance level for the hybrids . . . 24

5.5 Table showing significance level for all models . . . 24


Nomenclature

Acronyms / Abbreviations

ANN Artificial Neural Network

ARCH Autoregressive Conditional Heteroskedasticity

ARMA Autoregressive Moving Average Model

BP Back Propagation

BPNN Back Propagation Neural Network

BR Bayesian Regularization

BRANN Bayesian Regularized Artificial Neural Network

GA Genetic Algorithm

GARCH Generalised Autoregressive Conditional Heteroskedasticity

LM Levenberg–Marquardt

LMANN Levenberg-Marquardt Artificial Neural Network

MLP Multilayer Perceptron

NARX Nonlinear Autoregressive Network with Exogenous Input

NN Neural Network

RMSE Root Mean Square Error

SMAPE Symmetric Mean Absolute Percent Error

SVM Support Vector Machine

SVR Support Vector Regression


Chapter 1

Introduction

In recent years researchers have sought to improve ways to predict the stock market. Because of numerous dynamic factors in combination with the volatility of the market price, the prediction problem is complex. A lot of research has been put into characterization and prediction. The price of an asset, for example a stock, produces what is called a time series. Several types of financial time series have been documented and studied throughout the years. Lately, almost all financial transactions have been documented and stored. Analysis of time series is of great interest both theoretically and practically when making predictions on previously collected data [1].

Further, the theory needed to handle the stochastic uncertainties characteristic of financial time series makes the subject of interest not only to economists but also to physicists and mathematicians [2]. Lately, several machine-learning algorithms have been spotlighted by researchers. However, due to inbuilt characteristics such as hidden relations in models, non-stationary results, noisy data and a high degree of uncertainty in prediction, researchers have argued that artificial intelligence alone is insufficient, and various types of methods have been implemented, examined and proposed, all in order to optimize the accuracy of the predictions. Some of these combinations are called hybrid models [3][4].

Hybrid models combine two or more algorithms in order to utilize components of each and to compensate for the weaknesses of each model. Various hybrid models have been suggested for predicting time series. For example, Asadi, Hadavandi, Mehmanpazir and Nakhostin [3] suggest a preprocessed evolutionary LMANN algorithm, and Paluch and Jackowska [4] suggest a hybrid based on fractal analysis. The overall results indicate that hybrids are more likely to make an accurate prediction of the stock market than individual machine learning algorithms.

Various genetic techniques have been merged with single machine learning algorithms. For example, Wu and Zhang [5] integrate a BPNN with an improved IBCO. Asadi and Hadavandi suggested merging a data preprocessing method, GA, and the LM algorithm in the training of a BPNN [3]. A unique hybrid to predict the FTSE 100 next-day closing price was recommended by Al-hnaity and Abbod [6]. In that model, GA is used to determine the weights of the combined algorithm, and the results show that the suggested model exploits GA and outperforms the other approaches examined. In that research, the choice of input variables and data transformation were applied as data preprocessing in order to enhance the prediction performance of the model, and the results showed that the models can handle data fluctuations as well as improve prediction performance. Because of the many complex factors in the stock market and the intense shifts in stock indexes, there is plenty of room for enhancing prediction models.

In recent research there have been few studies that compare the most recent hybrid ANN models with each other. In Alam and Ljungehed [7] three different ANN hybrid methods are examined and evaluated on predicting the one-day-ahead closing price of stocks. The results indicate that one of the methods, BRANN, outperforms the other two when it comes to predicting the market. The BRANN method was proposed by Ticknor [8] and is a three-layered feed-forward ANN using Bayesian regularization in the BP process, used for one-day stock price prediction. It is a hybrid in the sense that it uses ANN in combination with technical analysis. This model is of interest because of its use of a non-traditional training algorithm for stock prediction.

With the exception of the above-mentioned techniques involving GA, there have been few studies comparing an ANN hybrid with a GA-trained hybrid. This comparative study seeks to fill the gap in this area of research. The aim of this thesis is to compare two machine learning hybrid algorithms using two different training methods in the back propagation: one using Bayesian Regularization and one using Levenberg-Marquardt. Both of these will then be merged with a combination of SVR and SVM, formed using GA, to predict the one-day-ahead closing price of stock indexes.

1.1 Problem Statement

This thesis is a comparative analysis of Support Vector Machines and Artificial Neural Networks applied to the problem of predicting one-day-ahead financial time series. In addition, I intend to use a genetic algorithm to examine hybrids of the above methods. The objectives of this thesis are to study how well these machine learning models perform by comparing them when predicting three different financial time series, and also how well these individual methods stand against two combined hybrids constructed using GA.

1.2 Scope

The study is limited by numerous factors. These factors restrict the study to being solely comparative. The choice of parameters for SVM, SVR, GA and ANN is based on existing literature and will not be examined in this study. Further, the selection of time series will not be analyzed in depth, and the preprocessing as well as the number of data points will be based on existing literature. The chosen time series consist of three different financial indexes: S&P 500, Nikkei 225 and OMX30. These three indexes were chosen because I want to test the models on three different time series, two of which are more sensitive to changes in the world economy.

The restriction in knowledge of the field and time constraints play a major part in why the models will be implemented using built-in toolboxes in MATLAB. Regarding the results, there will not be any guidelines as to how well the models would yield a return when applied to any stock market. All the financial indicators and parameters were chosen based on existing literature indicating that they produce good results.


1.3 Thesis Outline

The thesis will be structured as follows. Chapter two will cover the background to provide an overview of the different machine learning methods that will be used to conduct the time series prediction. Chapter three will dive into the models that have been chosen to conduct the prediction and the nature of their parameters. In chapter four, the choice of data and how it will be preprocessed will be described as well as different algorithms that will be used to measure the performance. Chapter five will then describe the simulation and present the prediction result made by the models as well as compare their performances. The sixth chapter will discuss the findings in depth from a broader perspective including the obtained result and the reasons behind it. Lastly, the seventh chapter will conclude and summarize the thesis.


Chapter 2

Background

Due to the wide movements in and complex parameters of stock prices, there is ample room for enhanced financial algorithms. Recent research has turned its focus away from traditional statistical models such as ARMA, ARCH and GARCH, all of which have failed to capture the complexity of the market.

Prediction is a dynamic problem in which future values are estimated using one or more time series of past observations. Dynamic neural networks, which include tapped delay lines, are effective for nonlinear filtering and prediction.

Therefore, recent research has introduced and tried different combinations to optimize the predictions and build new hybrid models in order to increase accuracy and prediction speed, some of which are relevant for this thesis. The following methods and algorithms are of interest to the project.

2.1 Support Vector Machine (SVM)

The SVM model was originally developed by Vladimir Vapnik. It is one of the most fundamental developments in the field of machine learning and can be applied to classification and regression [9]. The main goal of the algorithm is to choose the optimal hyperplane in a high-dimensional space so that the upper bound of the generalization error is minimized. One limitation is that SVM only deals directly with linear parameters.

This is solved by mapping the original space into a higher-dimensional space, which allows the analysis of a non-linear sample [10]. If, for example, we have a function that generates randomly produced data points (x_i, y_i), then the underlying function is approximated by SVM as g(x) = w\phi(x) + b, where \phi(x) is non-linearly mapped from the input space. The constants w and b are both estimated by minimizing the regularized risk function:

R(C) = C \frac{1}{N} \sum_{i=1}^{N} L(d_i, y_i) + \frac{1}{2} \|w\|^2    (1)

L(d, y) = \begin{cases} |d - y| - \epsilon, & |d - y| \ge \epsilon \\ 0, & \text{otherwise} \end{cases}    (2)

where C is called the regularization constant and \epsilon is called the tube size. C determines the trade-off between the empirical risk and the regularization term, and \epsilon is the approximation accuracy placed on the training data points. The term C \frac{1}{N} \sum_{i=1}^{N} L(d_i, y_i) is the risk term, called the empirical error, and L(d, y) is the loss function, which enables one to use sparse data points to represent the function g(x). The second term in (1) is the regularization term. By introducing the slack variables \zeta_i, \zeta_i^*, equation (1) can be transformed into:

R(w, \zeta, \zeta^*) = \frac{1}{2} w w^T + C \sum_{i=1}^{N} (\zeta_i + \zeta_i^*)    (3)

subject to:

w\phi(x_i) + b_i - d_i \le \epsilon + \zeta_i^*    (4)

d_i - w\phi(x_i) - b_i \le \epsilon + \zeta_i    (5)

\zeta_i, \zeta_i^* \ge 0    (6)

The kernel function is derived after the Lagrange multipliers are defined and the optimality constraints are exploited:

g(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x_j)    (7)

where \alpha_i^* is the Lagrange multiplier that satisfies \alpha_i \cdot \alpha_i^* = 0 and \alpha_i, \alpha_i^* \ge 0. A popular kernel function is the Radial Basis Function (RBF):

K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)    (8)

where the kernel value is equal to the inner product of the two vectors x_i and x_j in the feature space \phi(x_i), \phi(x_j).

Theoretical background and mathematical tractability are some of the advantages of the SVM model. This has motivated researchers to apply it in various fields, such as predicting financial time series.

This model is of interest in this thesis project since it is used in the hybrid model that will be applied in the prediction and simulation.
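As a concrete illustration of equation (8) (a hedged sketch, not the MATLAB implementation used in the thesis), the RBF kernel can be computed directly; the points and the value of gamma below are arbitrary:

```python
import math

def rbf_kernel(x_i, x_j, gamma=0.5):
    """Radial Basis Function kernel, equation (8):
    K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)

# The kernel of a point with itself is always 1 (zero distance),
# and it decays toward 0 as the points move apart.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))               # 1.0
print(rbf_kernel([1.0, 2.0], [3.0, 4.0], gamma=0.5))    # exp(-4) ≈ 0.0183
```

In practice gamma controls how quickly similarity decays with distance, which is why it is treated as a tuning parameter alongside C and epsilon.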

2.2 Support Vector Regression (SVR)

In 1996 a version of SVM used for regression was proposed by several people, including the developer of SVM, Vladimir Vapnik [11]. As mentioned above, the idea of the SVM algorithm is to construct hyperplanes in a higher- or infinite-dimensional space, which can be used for classification. The aim of SVR is to construct a hyperplane that is as close to as many of the data points as possible. A main objective is to choose a small norm for the hyperplane while simultaneously minimizing the sum of the distances from the data points to the hyperplane [12].

2.3 Artificial Neural Network (ANN)

ANN is an algorithm inspired by the biological NN found in our brain, where billions of neurons interact by sending signals to each other [13]. Each neuron consists of a cell body, dendrites, and synaptic terminals. It is when the synaptic input surpasses a certain threshold that a pulse is sent from one neuron to another.

This can occur with different strengths, meaning that some signals contribute more than others to the activation of neurons. Signals can be sent to multiple neurons, which creates a chain reaction [14]. ANN is used in several fields such as optimization, pattern recognition, and financial prediction [15]. Compared to the biological network, the artificial version is highly abstract, and the algorithm uses layers instead of the complex web of connections that makes up the biological NN. An ANN can be illustrated as a weighted directed graph. MLP is one of the most common architectures [15]. An MLP is built so that the input and output are single layers, while the middle (the hidden layers) consists of one or more layers. Each node in the input layer is connected to the nodes of the first hidden layer. The data is then directed through the hidden layers until it reaches the output layer.

A node and a mathematical function replace each neuron. The input to each node is multiplied by adaptive weights, and if the sum of the weighted values satisfies the activation function, the node is activated. This corresponds to the biological case, where a neuron sends an impulse signal to another neuron [14]. Various functions are used to determine whether a node has been activated; the most commonly used is the sigmoid function:

f_h(x) = \frac{1}{1 + \exp(-x)}    (9)

The learning process of an ANN consists of updating the network architecture and the connection weights. The network must learn and adapt the weights iteratively, based on training techniques, to improve performance.

The functions representing the nodes process the data from the input nodes. The weighted value of each connection represents how much each node affects the output node. Training data is used to adjust the weighted values of the NN. To avoid over-fitting, each ANN method holds back some reference data on which it estimates and measures accuracy on untrained data. The advantage of the MLP method is that using several hidden layers increases the network's ability to learn complex relationships between inputs and outputs, which can be difficult with traditional algorithms [15][3].

Fig. 2.1 Figure illustrating the ANN architecture.¹

¹ http://neuralnetworksanddeeplearning.com/chap1.html


If we let x_j denote the input variables, then

I_i = \sum_{j=0}^{n} x_j w_{ji},  i = 1, ..., m    (10)

where I_i is the activation of the i:th node. We can then define the output of the hidden layer as:

z_i = f_h(I_i),  f_h(x) = \frac{1}{1 + \exp(-x)}    (11)

where f_h is the sigmoid activation function. The output of all the neurons can then be written as:

y_t = f_t\left( \sum_{i=0}^{m} z_i w_{ij} \right)    (12)
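Equations (10)–(12) amount to a single forward pass through an MLP with one hidden layer. A minimal sketch in Python, with hand-picked illustrative weights (not the network trained in the thesis):

```python
import math

def sigmoid(x):
    # Equations (9)/(11): f_h(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hidden, w_output):
    """One forward pass: eq (10) weighted sums, eq (11) hidden
    activations, eq (12) with a linear output neuron."""
    # Eq (10): activation I_i of each hidden node
    hidden_in = [sum(w_ji * x_j for w_ji, x_j in zip(row, x)) for row in w_hidden]
    # Eq (11): z_i = f_h(I_i)
    z = [sigmoid(I) for I in hidden_in]
    # Eq (12): output as a weighted sum of the hidden activations
    return sum(w_i * z_i for w_i, z_i in zip(w_output, z))

# Toy example: 2 inputs, 2 hidden nodes, 1 output
x = [0.5, -0.2]
w_hidden = [[0.1, 0.4], [-0.3, 0.2]]
w_output = [0.7, -0.5]
print(forward(x, w_hidden, w_output))  # ≈ 0.1184
```

A real network would add bias terms and learn the weights; the sketch only traces the data flow the equations describe.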

2.4 Back Propagation Neural Network (BPNN)

One of the most popular methods for modeling time series with non-linear structures is the feed-forward network [16]. A popular version is the three-layer feed-forward back-propagation algorithm. In this model, the weights are adjusted to minimize the LMS error between the desired (actual) value and the value estimated at the output of the ANN.

As in the normal ANN, the weights are determined in the propagation process by building connections between the nodes. The connection weights are given initial values, and the error between the predicted and actual output values is back-propagated in order to change and update the weights. To minimize the error, the desired and predicted outputs are compared throughout the supervised learning procedure [17]. The network contains a hidden layer of neurons with non-linear transfer functions as well as an output layer of neurons with linear transfer functions. This method is used in this thesis, as both training algorithms use BP to update the weights and bias values.
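As a hedged sketch of the update described above (plain gradient descent on a one-hidden-layer network with a linear output; not the LM or BR training variants used later in the thesis):

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, target, w_h, w_o, lr=0.1):
    """One back-propagation step: forward pass, then propagate the
    output error back through both weight layers."""
    z = [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in w_h]
    y = sum(w * zi for w, zi in zip(w_o, z))   # linear output neuron
    err = y - target                           # dE/dy for E = (y - t)^2 / 2
    for i in range(len(w_o)):
        # hidden-layer gradient uses the sigmoid derivative z * (1 - z)
        delta = err * w_o[i] * z[i] * (1.0 - z[i])
        for j in range(len(x)):
            w_h[i][j] -= lr * delta * x[j]
        w_o[i] -= lr * err * z[i]              # output-layer update
    return err ** 2

random.seed(0)
w_h = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
w_o = [random.uniform(-0.5, 0.5) for _ in range(3)]
data = [([0.1, 0.9], 0.4), ([0.8, 0.2], 0.7)]
losses = [sum(train_step(x, t, w_h, w_o) for x, t in data) for _ in range(300)]
print(losses[0], losses[-1])  # the squared error shrinks as the weights adapt
```

Both BR and LM replace this plain gradient step with more elaborate update rules, but the forward/backward structure is the same.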

2.5 Nonlinear Autoregressive Network with Exogenous Input (NARX)

The NARX is a dynamic recurrent neural network that uses feedback connections linking multiple layers of the network together. The model is based on the linear ARX model, which is frequently used in time series modeling. The core of the NARX model is the following equation:

y(n+1) = f(y(n), y(n-1), ..., y(n-d_y+1), u(n), u(n-1), ..., u(n-d_u+1))    (13)

In this equation the output signal y(n+1) is regressed on prior values of the output signal y and on prior values of the exogenous input signal u. The NARX model is implemented using a feed-forward NN in order to approximate the function f.

The recurrent neural architectures of NARX have, in contrast to other recurrent neural models, limited feedback architectures that come only from the output instead of from the hidden neurons [18]. Previous studies have shown that, in theory, NARX networks can be used instead of conventional recurrent networks without computational losses and that they are at least equivalent to Turing machines [19]. NARX is not only powerful in theory; there are many practical examples where NARX possesses several advantages. In [20] the author concludes that gradient-descent learning can be more effective in NARX networks than in other recurrent architectures with hidden states.


Fig. 2.2 Figure illustrating the architectural NARX model [2].


According to [21] the NARX approach outperforms standard neural-network-based predictions. In this thesis, MATLAB's built-in standard NARX will be used for the time series model when predicting the one-day-ahead stock index. The architecture of the NARX that will be used is illustrated in figure 2.2, where the y(n) input will be the normalized stock index data extracted from Yahoo Finance, while the u(n) input will be preprocessed using financial indicators. Since the standard shallow-network function will be used, the underlying structure will not be analyzed and optimized for the given time series. This is, of course, a limitation in finding the best prediction, but because of time constraints it was out of scope for this study.
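Equation (13) amounts to building feature vectors of lagged outputs and lagged exogenous inputs. A sketch of that construction on hypothetical series, with d_y = d_u = 2 (the series values below are invented for illustration):

```python
def narx_features(y, u, d_y=2, d_u=2):
    """Build (features, target) pairs for eq (13): each target y[n+1]
    is paired with the d_y most recent outputs and the d_u most
    recent exogenous inputs."""
    rows, targets = [], []
    start = max(d_y, d_u) - 1
    for n in range(start, len(y) - 1):
        lagged_y = [y[n - k] for k in range(d_y)]   # y(n), y(n-1), ...
        lagged_u = [u[n - k] for k in range(d_u)]   # u(n), u(n-1), ...
        rows.append(lagged_y + lagged_u)
        targets.append(y[n + 1])
    return rows, targets

# Hypothetical normalized index values and an exogenous indicator series
y = [0.10, 0.12, 0.11, 0.15, 0.14]
u = [0.50, 0.55, 0.52, 0.58, 0.57]
X, t = narx_features(y, u)
print(X[0], t[0])  # [0.12, 0.10, 0.55, 0.50] 0.11
```

MATLAB's NARX toolbox performs this delay-line construction internally; the sketch only makes the regression structure of eq (13) explicit.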

2.6 Genetic Algorithm (GA)

GA is modeled on the Darwinian selection mechanism and was first proposed by Holland [22] and later developed by Koza. The objective of the algorithm is to solve optimization problems, such as determining the optimal weights for the proposed hybrid model. This is used in [23] to construct a hybrid that predicts the one-day stock market; the same hybrid will be used in this thesis as one of the two that will be compared.

GA differs considerably from conventional optimization methods, which contributes to its efficiency in searching for optimal solutions. Some of its advantages are presented in [24]. One advantage is that GA encodes and decodes candidate solutions as discrete strings rather than operating on the original parameter values; compared to traditional calculus-based methods, GA can therefore handle non-differentiable and discontinuous functions. The adoption of binary strings also makes GA well suited to logic operations. Furthermore, GA evaluates a suggested solution with a fitness function rather than relying on prior information, which allows the initial generation to be selected randomly.

2.7 Related Work

Several hybrid methods have been examined in different studies of financial time series, with varying outcomes. In [7] a hybrid model using BR is tested and compared with two other hybrids in predicting the stock market. Further, [23] explores a hybrid constructed from three methods that are merged using GA. The results indicate that the suggested hybrid outperforms the individual methods when predicting different stock indexes. In this report, two different hybrids of the machine learning methods mentioned above are first introduced. Each is then merged with SVM and SVR using GA to form a new hybrid, called GA-WA, whose performance will be compared when predicting three different one-day-ahead stock indexes.

In 2013 Ticknor proposed a hybrid model that uses Bayesian Regularization in the back-propagation process [8]. The model, constructed as a three-layer feed-forward ANN using BR, was used for one-day stock prediction. The method uses BR as the training algorithm, which makes it a hybrid. Ticknor performed several experiments to determine how many hidden nodes would optimize the stock prediction, and concluded that with 5 the BRANN algorithm provided on average 98 percent accuracy for the stocks over the total period. It also managed to handle volatility and noise without over-fitting the data. The advantages of the method are that it reduces over-fitting and over-training. The goal of the BR algorithm is to increase the generalization of the network; this is done not by finding a local minimum but by finding the global minimum for the most optimal weights. To understand how this is achieved, one must understand the theory behind the algorithm. The main idea is to transform nonlinear systems into well-posed problems by adding an extra term to the error function, which is reduced during training. When predicting financial stocks, the weights are driven to zero and the network trains on the nontrivial weights. In [25] Mammadli explores and predicts financial time series based on the LM algorithm. He uses the LM learning algorithm for the back-propagation ANN with good results, showing that the relative error of prediction is less than 3%.

LM is used in the back-propagation process and trains the NN to update the weights and biases according to the LM algorithm. LM interpolates between the Gauss-Newton algorithm and gradient descent as an optimization technique. By using the information obtained about the error, the algorithm adjusts each weight in order to reduce the error function. After some iterations the network hopefully converges to a state with a small calculation error [26].

As mentioned in the introduction, machine learning provides different technologies for predicting the stock market across many different models. GA is well suited to optimizing the weights calculated for the input variables of a hybrid ANN model [27]. In [23] Al-hnaity and Abbod use a hybrid called GA-WA, with good results, on a set of models in order to optimize the weights. The GA-WA approach is based on minimizing the combination error. If the prediction of model i at time t is f_{it}, i = 1, ..., n, t = 1, ..., m, and the weight vector is w_t = [w_1, w_2, w_3, ..., w_n]_t, then

1 = \sum_{i=1}^{n} w_i    (14)

The predicted value can be expressed as:

\hat{y}_t = \sum_{i=1}^{n} w_{it} f_{it}    (15)

We can write this as \hat{y} = FW, where F = [f_{it}]_{m \times n} and \hat{y} = [\hat{y}_1, \hat{y}_2, ..., \hat{y}_m]. The error of the prediction can be expressed as:

e_t = y_t - \hat{y}_t = \sum_{i=1}^{n} w_{it} y_t - \sum_{i=1}^{n} w_{it} f_{it} = \sum_{i=1}^{n} w_{it} (y_t - f_{it}) = \sum_{i=1}^{n} w_{it} e_{it}    (16)

using the relation that the individual prediction error can be expressed as e_{it} = y_t - f_{it} (the first equality holds because the weights sum to one, eq. (14)). The combined prediction value is a weighted mean of the hybrid's components:

\hat{Y}_{comb} = \frac{w_{1t} \hat{Y}_{SVR} + w_{2t} \hat{Y}_{SVM} + w_{3t} \hat{Y}_{ANN}}{w_{1t} + w_{2t} + w_{3t}}    (17)

The sum of w_{1t}, w_{2t}, w_{3t} lies between 0 and 1 for every time step t. An important step in the development of a hybrid model is to choose the value of the weights w for each method that gives the best prediction. The simplest approach is to set w_{1t} = w_{2t} = w_{3t} = 1/3; however, this is not necessarily the optimal approach. This is where the GA is utilized, to determine the optimal weights for each model at every time step.
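The weight-optimization idea can be sketched with a toy GA in Python (an illustrative sketch under assumed hyper-parameters, not the MATLAB GA toolbox used in the thesis): candidate weight vectors evolve to minimize the RMSE of the combined forecast in equation (17). The forecasts and actual series below are invented.

```python
import random, math

def combined_rmse(weights, preds, actual):
    """RMSE of the weighted-average forecast, eq (17)."""
    total = sum(weights)
    sq = [(actual[t] - sum(w * p[t] for w, p in zip(weights, preds)) / total) ** 2
          for t in range(len(actual))]
    return math.sqrt(sum(sq) / len(sq))

def ga_weights(preds, actual, pop=30, gens=60, seed=1):
    """Tiny GA: random initialization, truncation selection,
    blend crossover, and Gaussian mutation on the weight vectors."""
    rng = random.Random(seed)
    n = len(preds)
    population = [[rng.random() for _ in range(n)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda w: combined_rmse(w, preds, actual))
        parents = population[: pop // 2]          # keep the fittest half
        children = []
        while len(parents) + len(children) < pop:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 + rng.gauss(0, 0.05) for x, y in zip(a, b)]
            children.append([max(1e-6, min(1.0, w)) for w in child])
        population = parents + children
    return min(population, key=lambda w: combined_rmse(w, preds, actual))

# Hypothetical one-step forecasts from three models and the actual series
actual = [1.0, 1.2, 1.1, 1.3, 1.25]
preds = [[1.1, 1.3, 1.2, 1.4, 1.35],    # model 1: biased high
         [0.9, 1.1, 1.0, 1.2, 1.15],    # model 2: biased low
         [1.0, 1.25, 1.05, 1.3, 1.3]]   # model 3: closest
best = ga_weights(preds, actual)
print(round(combined_rmse(best, preds, actual), 4))
```

Because fitness here is just the combination RMSE, the GA never needs gradients of the underlying models, which is exactly why it suits weight selection for black-box hybrids.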


Fig. 2.3 Figure illustrating the architectural GA-WA model [23].


Chapter 3

Model

In this chapter the models that will be used throughout the study will be presented and their nature described.

3.1 Bayesian Regularized Artificial Neural Network (BRANN)

The input used for the BR training includes daily stock data (opening price, minimum price, maximum price, closing price) and three different financial indicators (5- and 10-day Exponential Moving Average (EMA) and RSI). The input was normalized to the range [-1,1], while the output was pre-normalized. The transfer function used for the hidden layer was the tangent sigmoid, and the network was constructed as 6-5-1 (input-hidden-output). The number of training iterations was set to 1,000.
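The indicators named above can be computed from closing prices. A hedged sketch using common textbook definitions (the exact formulas and periods in the thesis's MATLAB code are not specified, so treat these as illustrative):

```python
def ema(prices, period):
    """Exponential Moving Average with smoothing factor 2/(period+1),
    seeded with the first price."""
    alpha = 2.0 / (period + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def rsi(prices, period=14):
    """Relative Strength Index: ratio of average gain to average loss
    over the last `period` changes (basic Wilder-style formulation)."""
    gains, losses = [], []
    for prev, cur in zip(prices, prices[1:]):
        change = cur - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[-period:]) / period
    avg_loss = sum(losses[-period:]) / period
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

closes = [10, 11, 12, 11, 13, 14, 13, 15, 16, 15, 17, 18, 17, 19, 20]
print(round(ema(closes, 5)[-1], 3))
print(round(rsi(closes, 14), 2))  # 77.78 for this mostly rising series
```

An RSI near 100 flags an overbought series and near 0 an oversold one, which is why it is a common ANN input alongside the smoothed EMA price.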

3.2 Levenberg-Marquardt Artificial Neural Network (LMANN)

In the simulation the exact same inputs as for BRANN were used and normalized. Also, the transfer function, the network architecture, and the number of iterations were made identical.

3.3 SVR, SVM

In constructing the SVR and SVM models, a subset of the inputs is first used for training. The opening, maximum, minimum, and closing prices of four consecutive days are used to predict the closing price of the fifth day. The trained model is saved and then used to predict the closing price for the entire time series. Cross-validation is also used, by partitioning the data set into folds and estimating the accuracy on each fold, in order to guard against over-fitting.

For the SVM method the fine Gaussian function gave the best accuracy and is therefore used as the kernel function. In the SVR model Cubic SVM gave the lowest RMSE and is hence used as the kernel function.
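The four-day windowing described above can be sketched as plain feature construction (the flattened open/high/low/close ordering is an assumption for illustration):

```python
def make_windows(ohlc, window=4):
    """Each sample: the open/high/low/close of `window` consecutive
    days flattened into one feature vector; the target is the close
    (last column) of the following day."""
    X, y = [], []
    for start in range(len(ohlc) - window):
        days = ohlc[start:start + window]
        X.append([v for day in days for v in day])
        y.append(ohlc[start + window][3])   # next day's closing price
    return X, y

# Hypothetical rows: [open, high, low, close]
ohlc = [[10, 11, 9, 10.5],
        [10.5, 12, 10, 11.0],
        [11.0, 11.5, 10.5, 11.2],
        [11.2, 12.5, 11.0, 12.0],
        [12.0, 12.2, 11.5, 11.8]]
X, y = make_windows(ohlc)
print(len(X[0]), y[0])  # 16 features per sample, target 11.8
```

The resulting (X, y) pairs are what a kernel regressor such as SVR would then be trained on, fold by fold.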

3.4 GA-WA

In this report the genetic algorithm will be used to optimize the weights for two hybrids: one assembled from SVR, SVM and LMANN, and the other from SVR, SVM and BRANN.


Chapter 4

Method

4.1 Data Analytics

For the prediction and comparison, three different stock indexes were used: the S&P 500, OMX30, and Nikkei 225.

The data was downloaded from Yahoo Finance and preprocessed, and stretches from 27/9 2016 to 27/3 2018. The data has also been adjusted for all applicable splits and dividend distributions using appropriate split and dividend multipliers, adhering to Center for Research in Security Prices (CRSP) standards. The input data consists of 377 data points, where each data point represents one day and contains the highest, lowest, opening, and closing index values. The input can therefore be expressed as a 377 by 4 matrix. The output is the closing price of the following day.

The reason for choosing this amount of data is that, according to Walczak [28], who conducted an empirical analysis of data requirements for forecasting with ANNs, the optimal amount of data spans one to two years. Furthermore, in the ANN training the data is split into three categories: validation, testing, and training.

By trial and error the best distribution found was 15 percent for validation, 15 percent for testing, and 70 percent for training. For SVM and SVR the data was divided into sets of five: four consecutive days are used to predict the fifth, so the training set consists of 80 percent of the data and the test set of 20 percent.

For all simulations the data is normalized so that all inputs lie in the same range. Otherwise, if the neurons lie in different ranges, the input with the highest absolute value would be favored during training [2].
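The normalization step can be sketched as a min-max rescaling to [-1, 1] (an assumption; the exact scaling performed by the MATLAB tools may differ):

```python
import numpy as np

def scale_to_range(x, lo=-1.0, hi=1.0):
    """Min-max normalize so all inputs share the same range and no
    feature dominates training by sheer magnitude."""
    x = np.asarray(x, dtype=float)
    xmin, xmax = x.min(), x.max()
    return lo + (x - xmin) * (hi - lo) / (xmax - xmin)

print(scale_to_range([100.0, 150.0, 200.0]))  # -> [-1.  0.  1.]
```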

The reason for choosing these different data sets is to test how the algorithms fare against each other when simulating markets in different regions of the world and of different sizes. For instance, the smaller Swedish market is less affected by international events than the S&P 500, which means the stock indexes might respond differently to various world events.

The following table shows the standard deviation for each time series during the simulated time frame.

The table shows that both the Nikkei 225 and the S&P 500 have around twice as much spread as the OMX30.


Stock index   σ_timeseries
S&P 500       0.07650
OMX30         0.0394
Nikkei 225    0.0886

Table 4.1 Table showing the standard deviation of each time series

4.2 Steps of Prediction

The predictions are made in several steps. For the ANNs the data is first downloaded and preprocessed through normalization and the computation of technical indicators. The network is then configured, initialized, and trained with the parameters stated in the model chapter. The performance of the prediction is then measured using the stated performance metrics.

For the SVM and SVR the data is first downloaded, preprocessed by normalization, and divided into training and test sets. The models are then trained using the kernel functions stated in the model chapter, and the performance of the prediction is measured using the stated performance metrics.

The SVM, SVR, and ANN are then merged into one single hybrid and the performance is measured again.

4.3 Technical Indicators

To improve the performance of the ANN methods, the following financial indicators are used as part of the training input.

Relative Strength Index (RSI)

The Relative Strength Index compares the gains and losses over a certain time period in order to measure the change in price movement. Its main objective is to determine whether the asset is oversold or overbought.

RS = EMA(U, n) / EMA(D, n)   (18)

RSI = 100 - 100 / (1 + RS)   (19)

where U = upward change and D = downward change:

U = close_now - close_previous,  D = 0   if we have a gain
D = close_previous - close_now,  U = 0   if we have a loss   (20)
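Equations (18)-(20) can be sketched in Python as follows; the recursive EMA form used for smoothing U and D is an assumption, since the thesis does not spell out the smoothing variant:

```python
import numpy as np

def ema(x, n):
    """Recursive exponentially weighted average with a = 2/(n + 1)."""
    a = 2.0 / (n + 1)
    out = x[0]
    for v in x[1:]:
        out = a * v + (1 - a) * out
    return out

def rsi(closes, n=14):
    """RSI from equations (18)-(20): gains U and losses D are smoothed
    with an EMA and combined as 100 - 100 / (1 + RS)."""
    closes = np.asarray(closes, dtype=float)
    diff = np.diff(closes)
    U = np.where(diff > 0, diff, 0.0)   # upward changes, 0 on losses
    D = np.where(diff < 0, -diff, 0.0)  # downward changes, 0 on gains
    rs = ema(U, n) / ema(D, n)
    return 100.0 - 100.0 / (1.0 + rs)

print(rsi([1, 2, 1, 2, 2], n=3))  # -> 75.0
```

A strictly falling series gives RSI = 0 (deeply oversold), a strictly rising one RSI = 100.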

Exponential Weighted Moving Average (EMA)

EMA is an impulse-response filter that applies weights decreasing exponentially with each time step. It is a kind of weighted mean of the n previous data points, mostly used to capture short-term trend movements.


EMA_today = ( p_1 + (1 - a)p_2 + (1 - a)²p_3 + ... + (1 - a)^(n-1)p_n ) / ( 1 + (1 - a) + (1 - a)² + ... + (1 - a)^(n-1) )   (21)

where p_1 is the price today, p_2 is the price yesterday, and so on, and a = 2/(n + 1).
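Equation (21) can be evaluated directly; a small sketch, assuming at least n prices are available, newest first:

```python
import numpy as np

def ema_weighted(prices, n):
    """Direct evaluation of equation (21): the price i days back gets
    weight (1 - a)^i with a = 2/(n + 1), normalized by the weight sum."""
    a = 2.0 / (n + 1)
    p = np.asarray(prices[:n], dtype=float)   # p[0] = today, p[1] = yesterday, ...
    w = (1 - a) ** np.arange(n)
    return float(np.sum(w * p) / np.sum(w))

# With n = 1 the EMA is just today's price
print(ema_weighted([42.0], 1))  # -> 42.0
```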

4.4 Benchmark

In order to gauge how well the two hybrids perform, their prediction errors will be calculated and compared.

Each individual method contained in a hybrid will be examined, and its respective error will be used as a benchmark for how well the hybrid performs.

4.5 Measurements of Performance

The performance of the stock prediction is calculated by comparing how much the predicted closing price differs from the actual value. Two popular methods used for measuring this performance are RMSE and SMAPE. In order to determine if the compared hybrids are distinctive, statistical analysis is used in the form of a Friedman test.

Friedman Test

The Friedman test is a non-parametric test used to detect differences across repeated test attempts. Its null hypothesis is that the effect of the compared columns is the same [29].

Since all the hybrids use the same time series as input, no assumption can be made about the distribution of the input data. This makes the Friedman test suitable as a statistical test.

In this thesis the null hypothesis is that there is no difference between the performance of the models in predicting the next-day stock indexes. If the significance level is greater than 0.05 the null hypothesis cannot be rejected. For each test the predicted time series of each compared model is used as input data, and the test is conducted over all three time series.
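The Friedman statistic itself is simple to compute; a sketch (assuming no tied values within a time step, which would otherwise require average ranks) that ranks the models' errors at each time step:

```python
import numpy as np

def friedman_stat(errors):
    """Friedman chi-square statistic. `errors` has shape
    (n_blocks, k_models): each row is one time step (block), each
    column one model. The statistic is compared against a chi-square
    distribution with k - 1 degrees of freedom."""
    errors = np.asarray(errors, dtype=float)
    n, k = errors.shape
    ranks = errors.argsort(axis=1).argsort(axis=1) + 1  # rank 1 = smallest error
    rbar = ranks.mean(axis=0)
    return 12.0 * n / (k * (k + 1)) * np.sum((rbar - (k + 1) / 2.0) ** 2)

# Model 0 always best, model 2 always worst -> maximal disagreement
e = np.array([[0.1, 0.2, 0.3],
              [0.2, 0.5, 0.9],
              [0.0, 0.4, 0.6]])
print(friedman_stat(e))  # -> 6.0 (ranks identical in every block)
```

In the thesis the test was presumably run through MATLAB's statistics tools rather than by hand.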

Root Mean Square Error (RMSE)

RMSE is the square root of the average of the squared prediction errors over a time series. It measures accuracy by comparing the actual and predicted value at each step, and it is scale-dependent. The equation can be written as:

RMSE = sqrt( (1/m) Σ_{t=1}^{m} (y_t - ŷ_t)² )   (22)
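Equation (22) as a one-liner (illustrative Python, not the MATLAB used in the thesis):

```python
import numpy as np

def rmse(actual, predicted):
    """Equation (22): root of the mean squared one-step errors."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

print(rmse([2.0, 4.0], [1.0, 3.0]))  # -> 1.0
```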

Symmetric Mean Absolute Percent Error (SMAPE)

SMAPE is the average of the percentage error by which the predicted and the actual value differ when predicted over a series of time steps.


SMAPE = (1/m) Σ_{t=1}^{m} |y_t - ŷ_t| / ( (|y_t| + |ŷ_t|) / 2 )   (23)
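Equation (23), sketched the same way:

```python
import numpy as np

def smape(actual, predicted):
    """Equation (23): mean absolute error relative to the average
    magnitude of the actual and predicted values."""
    y, yhat = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(y - yhat) / ((np.abs(y) + np.abs(yhat)) / 2)))

print(smape([100.0], [110.0]))  # -> ~0.0952
```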

The models are implemented in MATLAB: the NN algorithms use the built-in NNtools while the SVM and SVR use the Classification Learner and Regression Learner apps. The chosen parameters for the models are stated in the model chapter and are all based on previously conducted work found as part of a literature study.


Chapter 5

Results

The simulation was made using MATLAB's built-in apps NNtools, Classification Learner, and Regression Learner. Since the NN training generates different results due to varying initial conditions and sampling, the predicted closing price was averaged over 100 iterations.

Throughout the results the following notation applies:

GA-WA1 = hybrid using LM training, GA-WA2 = hybrid using BR training

5.1 Evaluation of Performance

First, each individual method was simulated with the three different financial time series. The first 75 days' predicted closing prices are plotted alongside the actual prices in figures 5.1 to 5.3. Furthermore, by using the stated performance metrics, table 5.1 was produced.

The methods were then combined into two hybrids constructed using the GA, which optimizes the weights in order to minimize the RMSE. The average weights for each model, which correspond to the degree to which each model contributes to the hybrid, are presented in table 5.3. Each weight lies between 0 and 1 in every time step, and the theory on how the weights are used to construct the hybrids is found in the related work section.

The first 75 days' closing prices for each hybrid are plotted alongside the actual prices in figures 5.4 to 5.6. Furthermore, a 95% confidence interval, illustrated in the figures using error bars, is used to indicate whether the difference between the hybrids is statistically significant. The performance of the hybrids is also included and highlighted in table 5.1.

In order to examine how much the performance of the best hybrid varies, the standard deviation of the error for each time series is shown in table 5.2.


Fig. 5.1 Prediction of the OMX30 stock index compared to the actual prices.


Fig. 5.2 Prediction of the S&P 500 stock index compared to the actual prices.


Fig. 5.3 Prediction of the Nikkei 225 stock index compared to the actual prices.


Fig. 5.4 Prediction of S&P 500 stock index compared to the actual for both hybrids.


Fig. 5.5 Prediction of Nikkei 225 stock index compared to the actual for both hybrids.


Fig. 5.6 Prediction of OMX30 stock index compared to the actual for both hybrids.

Index        Error   SVM           SVR           LMANN         GA-WA1        BRANN         GA-WA2
S&P 500      RMSE    13.43         8.494         7.748         3.718         7.577         3.574
Nikkei 225   RMSE    70.02         75.66         77.41         51.72         79.63         54.92
OMX30        RMSE    6.329         6.758         5.777         3.973         5.922         3.914
S&P 500      SMAPE   3.039·10^-3   2.239·10^-3   1.818·10^-3   7.637·10^-4   1.782·10^-3   7.409·10^-4
Nikkei 225   SMAPE   2.372·10^-3   2.594·10^-3   2.409·10^-3   1.581·10^-3   2.438·10^-3   1.634·10^-3
OMX30        SMAPE   4.430·10^-3   3.119·10^-3   2.787·10^-3   1.463·10^-3   2.858·10^-3   1.419·10^-3

Table 5.1 Table showing the performance of each method

Stock index   σ_error
S&P 500       3.080
OMX30         3.210
Nikkei 225    40.63

Table 5.2 Table showing the standard deviation of the error of the best hybrid prediction


Table 5.3 Table showing the average weights for the GA-WA hybrids

Index        GA-WA1: w1   w2       w3       GA-WA2: w1   w2       w3
OMX30        0.2906       0.1237   0.5855   0.2993       0.1775   0.5231
S&P 500      0.1948       0.1400   0.6651   0.1967       0.1378   0.6653
Nikkei 225   0.0530       0.3810   0.5657   0.0656       0.3775   0.5566

5.2 Statistical Analysis

In order to determine whether the results were statistically significant, a statistical test was also performed. This was done as mentioned in section 4.5 by applying the Friedman test to each time series. The following two tables show the results: table 5.4 shows the results when comparing the two hybrids, while table 5.5 shows the results when comparing the hybrids to each individual model.

Stock index   p > χ²
S&P 500       0.03720
OMX30         0.01680
Nikkei 225    0.4484

Table 5.4 Table showing the significance level for the hybrids

Stock index   p > χ²
S&P 500       1.932·10^-45
OMX30         7.178·10^-44
Nikkei 225    1.312·10^-19

Table 5.5 Table showing the significance level for all models


Chapter 6

Discussion

The following chapter delves deeper into the model performance and outcome, and discusses the proposed hybrid combinations in a broader context.

6.1 Ethical Considerations

Several ethical issues arise in light of this research and need to be considered. As financial predictions improve, the labor market and the work tasks of stockbrokers will change. There is a challenge in moving people into more complex roles as these methods could automate jobs. I believe it is important to maintain transparency, and that there is value in allowing as many people as possible to access the hybrids for future development and improvement.

Machines can also make mistakes, and implementing a new stock prediction method should be done carefully.

In order to stay in control of a complex, data-driven system, the models should be carefully monitored, especially since the stock market is volatile and the method has been tested neither on major fluctuations nor over a long time frame.

6.2 Sustainability

The use of machine learning methods can be discussed in a broader perspective. There are a variety of aspects where machine learning and AI can contribute to the public good and a more sustainable future. Further studies in the field of machine learning can contribute to the sustainable management of the financial industry through increasing data acquisition, interpretation, integration, and model fitting.

In this thesis, all the data used are collected for free from Yahoo Finance and all methods are implemented in a way that makes the experiment easy to reconstruct, improve, and reevaluate.

6.3 Discussion of Result

Predicting financial time series is not an easy task. There is no absolute criterion that tells us whether an individual prediction is good or bad; if the data is noisy and forecasting-unfriendly, one will be happy to classify any alternative forecasting method that behaves better as good. The two hybrid methods that are compared are developed based on suggestions found in previous studies. The chosen parameters are therefore selected based on these proposed methods and not re-evaluated. In evaluating the presented hybrids, no real investment has been made and any possible profits are therefore unknown.

Table 5.1 clearly shows the power of hybrid models when predicting one-day-ahead stock indexes for different markets: both combined hybrids outperform each individual algorithm. Individually, the support vector methods and the ANNs are based on fundamental differences, and it would be ignorant to say that one of the models generally performs better. They are all within good range of each other, and many parameters could, if changed, have altered the result.

Further, as table 5.3 shows, the weights corresponding to the NN algorithms are all greater than 0.5, so the NN algorithm is the main contributor to the combined hybrid. For the Nikkei 225 the NN algorithms are individually inferior to SVR and SVM, but the weights corresponding to the NN are still greater than those of SVR and SVM. There could be several reasons for this; one is that the ANN methods might diverge much more at sharp peaks than SVM and SVR but perform better in general, when prices are somewhat stable and the curves are somewhat smooth. This gives the NN method more points in close range of the actual value.

By comparing the RMSE values for the respective indexes, one can see that the Nikkei 225 has an RMSE an order of magnitude greater than that of the OMX30 and S&P 500. This is because the Nikkei 225 index value is of order 10^4 while the OMX30 and S&P 500 both have index values of order 10^3. However, these are all in different currencies; given that a Japanese yen is worth less than 0.1 Swedish kronor and that a Swedish krona is worth much less than a US dollar, the S&P 500 is the index that yields the highest return.

It can also be seen from the graphs that the predicted value follows the curve well from one day to the next, including through upturns and downturns. The algorithms seem to have the most difficulties predicting the price at sudden, sharp peaks, where the deviations are greatest.

Regarding the statistical analysis, the hybrid figures show that the 95% confidence intervals (error bars) overlap well when the fluctuations are relatively smooth, but less so at peaks when the market fluctuates more. However, this is not sufficient to draw any conclusion about the statistical significance of the experiment, and statistical testing such as the Friedman test is required.

As table 5.4 shows, the null hypothesis of the Friedman test can be rejected for the OMX30 and, more marginally, for the S&P 500. The table also shows that for the Nikkei 225 time series one cannot say that one of the hybrids outperforms the other. The reason for this is not clear, but there could be several contributing factors; one is that the hybrids only differ by the training algorithm of the neural network, making them very similar.


Chapter 7

Conclusion

The main objective of this thesis has been to study how well various individual machine learning models perform when predicting stock indexes from three different financial time series, and how these individual methods compare with two combined hybrids constructed using the GA.

As the performance measurements show, both hybrids outperform each individual model on all three time series. Further, it is hard to pick a winner among the individual models since they all perform roughly equally. The S&P 500 is the time series that the hybrids predict best: the best hybrid's predictions differ on average by 0.146 percent from the actual value, compared with around 0.256 percent for the Nikkei 225 and around 0.248 percent for the OMX30, making the S&P 500 the index that would theoretically give the largest return when investing money.

All of the algorithms were implemented using built-in MATLAB functions. This restricts one's understanding of how they work at a deeper level, as well as the possibility of examining the underlying construction of the parameters and technical indicators that were used.

The statistical test also shows that it is difficult to distinguish which of the two hybrids performs best. We can, however, conclude that both hybrids are much better than each individual model and that they performed statistically differently, even though the test does not say in which way. In general, both hybrids have shown high potential in short-term stock prediction.

From table 4.1 we see that the OMX30 has approximately half as much spread as the two other time series.

Since the S&P 500 was the time series that gave the best performance, there is no clear correlation between the fluctuations of a time series and how good the prediction is. Further, as table 5.2 shows, the standard deviation of the error for the best hybrid is just slightly lower than the RMSE itself. The fluctuation of the RMSE is therefore somewhat significant, yet not large enough to call the prediction bad.


7.1 Future Research

This section provides suggestions for further research that could improve and expand the ability to predict short-term stock movements. Due to lack of time, there were some things that I did not delve into but wish I had spent more time on.

First, I would have investigated the characteristics of time series more profoundly. Time series are rather complex, and many models are used to predict them, so many parameters affect the analysis and the results on a large scale. I would also try to find individual machine learning algorithms that complement each other when dealing with different financial phenomena. These could merge into a more optimal hybrid for predicting financial time series and hence give better stock predictions.

It is not clear that the hybrid will generate the highest return when the GA is used to minimize the RMSE; in order to use the model for earning money, this should be studied further. It is also not evident that the hybrids are the overall best-known method for short-term prediction, so further research has to examine whether there is a better way of using the GA, or alternative optimization models, when predicting short-term stocks. Further, nonlinear financial time series can be preprocessed and optimized in a more detailed manner, since several financial indicators were excluded in this thesis, among them Williams %R, momentum, and volume rate of change. The proportions of data used for validation, training, and testing can also be varied in order to optimize the prediction.

As stated above, there is much room for improvement, and this study can help guide future research in search of the optimal method amid a vast ocean of candidates.


References

[1] G William Schwert. Why does stock market volatility change over time? The journal of finance, 44(5):1115–1153, 1989.

[2] R.S. Tsay, Analysis of Financial Time Series, John Wiley, New York (2002).

[3] Asadi, S., Hadavandi, E., Mehmanpazir, F., Nakhostin, M.M.: Hybridization of evolutionary Levenberg–Marquardt neural networks and data pre-processing for stock market prediction. Knowl. Based Syst. 35, 245–258 (2012)

[4] M. Paluch and L. Jackowska-Strumillo. The influence of using fractal analysis in hybrid mlp model for short-term forecast of close prices on warsaw stock exchange. In Computer Science and Information Systems (FedCSIS), 2014 Federated Conference on, pages 111–118, Sept 2014. doi: 10.15439/2014F358.

[5] Zhang, Y., Wu, L.: Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Syst. Appl. 36(5), pp. 805–818 (2009)

[6] Al-Hnaity, B., Abbod, M.: A novel hybrid ensemble model to predict FTSE100 index by combining neural network and EEMD. In: 2015 European Control Conference (ECC), pp. 3021–3028. IEEE, New York (2015)

[7] Alam, J., Ljungehed, J.: A comparative study of hybrid artificial neural network models in one-day stock price prediction. KTH, Stockholm (2015)

[8] Jonathan L. Ticknor. A bayesian regularized artificial neural network for stock market forecasting. Expert Systems with Applications, 40(14):5501 – 5506, 2013. ISSN 0957-4174.

[9] Cortes, C., Vapnik, V.: Support-vector Networks. In: Machine Learning, vol. 20(3), pp. 273– 297. Springer, Heidelberg (1995).


[10] Pontil, M., Verri, A.: Properties of support vector machines. In: Neural Computation, vol. 10(4), pp. 955–974. MIT Press, Cambridge (1998)

[11] Drucker, Harris; Burges, Christopher J. C.; Kaufman, Linda; Smola, Alexander J.; Vapnik, Vladimir N. (1997): "Support Vector Regression Machines", in Advances in Neural Information Processing Systems 9, NIPS 1996, pp. 155–161. MIT Press.

[12] Xia, Y., Liu, Y., Chen, Z.: Support Vector Regression for prediction of stock trend. In: 2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering (ICIII), vol. 2, pp. 123–126. IEEE, New York (2013)

[13] Suzana Herculano-Houzel. The human brain in numbers: a linearly scaled-up primate brain. Front. Hum. Neurosci., 3, 2009. doi: 10.3389/neuro.09.031.2009.

[14] Carlos Gershenson. Artificial neural networks for beginners. arXiv preprint cs/0308031, 2003.

[15] A.K. Jain, Jianchang Mao, and K.M. Mohiuddin. Artificial neural networks: a tutorial. Computer, 29(3):31–44, Mar 1996. ISSN 0018-9162. doi: 10.1109/2.485891.

[16] Diaz-Robles, L.A., Ortega, J.C., Fu, J.S., Reed, G.D., Chow, J.C., Watson, J.G., Moncada-Herrera, J.A.: A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: the case of Temuco, Chile. Atmos. Environ. 42(35), 8331–8340 (2008)

[17] Kubat, Miroslav: Neural Networks: A Comprehensive Foundation by Simon Haykin, Macmillan. Cambridge University Press, Cambridge (1999)

[18] K. S. Narendra and K. Parthasarathy. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1):4–27, 1990.

[19] H. T. Siegelmann, B. G. Horne, and C. L. Giles. Computational capabilities of recurrent NARX neural networks. IEEE Trans. Syst., Man, Cybern., pt. B, vol. 27, p. 208, Apr. 1997.

[20] B. G. Horne and C. L. Giles, An experimental comparison of recurrent neural networks, in Advances in Neural Information Processing Systems 7, G. Tesauro, D. Touretzky, and T. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 697-704.

[21] H. Xie, H. Tang, Y. Liao. Time series prediction based on NARX neural networks: an advanced approach. State Key Laboratory for Manufacturing Systems Engineering, Research Institute of Diagnostics & Cybernetics, Xi'an Jiaotong University, Xi'an, Shaanxi, China.

[22] Holland, J.H.: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence. MIT Press, Cambridge (1992)


[23] Al-Hnaity, B., Abbod, M.: Predicting financial time series data using hybrid model. Electronic and Computer Engineering Department, Brunel University, Uxbridge (2016)

[24] The Cross-Entropy Method for Continuous Multi-Extremal Optimization. (Dirk P. Kroese, Sergey Porotsky, Reuven Y. Rubinstein). Methodology And Computing In Applied Probability 08/2006; 8(3):383-407.

[25] Sadig Mammadli. Financial time series prediction using artificial neural network based on Levenberg-Marquardt algorithm. 9th International Conference on Theory and Application of Soft Computing, Computing with Words and Perception, ICSCCW 2017, pp. 242–253, August 2017, Budapest, Hungary.

[26] P.A. Wise. The Levenberg-Marquardt algorithm with finite difference approximations to the Jacobian matrix. Cornell University, 1973.

[27] Wu, B., Chang, C.-L.: Using genetic algorithms to parameters (d, r) estimation for threshold autoregressive models. Comput. Stat. Data Anal. 38(3), 315–330 (2002)

[28] Steven Walczak. An empirical analysis of data requirements for financial forecasting with neural networks. Journal of Management Information Systems, 17(4):203–222, 2001.

[29] Ulf Grandin. Dataanalys och hypotesprövning för statistikanvändare. Naturvårdsverket 2013
