
DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2018

Stock forecasting using ensemble neural networks

DANIEL SKANTZ

WILLIAM SKAGERSTRÖM


Stock forecasting using ensemble neural networks

Daniel Skantz, William Skagerström
KTH, 2018-06-06

Degree Project for the course DD142X
Supervisor: Arvind Kumar

Examiner: Örjan Ekeberg

Swedish Title/Svensk titel: Aktieprediktion med kombinerade neurala nätverk


Abstract

This paper explores the viability of creating an artificial neural network for stock forecasting using an ensemble method, where each network is differentiated through a unique selection of input parameters. The parameters used in this paper were chosen based on ones used in previous research, and through a stepwise-addition parameter search method. The problem was approached both as a regression and as a classification problem, and we evaluated the networks' performance for the purpose of stock forecasting using relevant measurements. For the regression part, the result was negative: the neural network was not able to beat a naive prediction strategy. For classification, however, a modest but significant positive result was achieved.


Sammanfattning

This report explores the ability of neural networks to predict stock prices using ensemble methods, where the networks are differentiated by their input parameters. These parameters were chosen partly based on previous research, and partly through a stepwise-addition method. The problem was treated both as a regression and as a classification problem, and the network's performance was evaluated with relevant measures. For the regression approach our result was negative, as the network could not outperform naive strategies. For classification of the price direction, however, a modest but significant positive result was obtained.


Contents

1 Introduction
  1.1 Problem Statement
  1.2 Scope
2 Background
  2.1 The stock market
    2.1.1 Technical Indicators
  2.2 Artificial Intelligence
    2.2.1 Artificial Neural Networks
    2.2.2 Construction of artificial neural networks for stock prediction
    2.2.3 Ensemble models
3 Method
  3.1 Data Acquisition
  3.2 Training the Neural Networks
  3.3 Evaluating Network Performance
  3.4 Structure of the Neural Network
    3.4.1 Network layout
    3.4.2 Input parameters
  3.5 Models used
4 Results
  4.1 Regression
    4.1.1 Plots from the ensemble model's forecasting predictions
    4.1.2 Ensemble performance
    4.1.3 Mean performance
  4.2 Classification
    4.2.1 Ensemble performance
    4.2.2 Mean Performance
5 Discussion
  5.1 Discussion of method
  5.2 Discussion of results
  5.3 A note on regularization
6 Conclusion and recommendations
Appendix A Additional graphs


1 Introduction

There is an obvious profit to be made in the ability to predict future stock prices.

For a long time, economists strongly believed in the "efficient market hypothesis", which states that if one could make a profit by predicting future stock prices, the market would quickly react to this inefficiency, and the opportunity would vanish. From the efficient market hypothesis, it follows that technical analysis, the prediction of future stock prices using historical pricing, is a meaningless endeavor. Under this hypothesis, neither is credence given to fundamental analysis, which is based on information such as market share and interest rates [1]. It is implied that stock prices would follow a "random walk". Since the start of the current century, a growing number of economists believe not only that stock prices are to some degree predictable using historical prices, but also that there exists a possibility of making excess returns using this information [2].

The task of predicting stock prices is challenging, as the pricing is non-linear, noisy and affected by many external factors [3][4]. Examples of such factors are foreign currency exchange rates, the price of oil, GDP and public debt [3][5].

The ability to forecast the development of stock prices using methods within artificial intelligence has been a rising trend for the past few decades within the financial sector [6]. Some methods that have been used for this type of stock forecasting include genetic algorithms, trend and regression analysis, and classification methods such as support vector machines [7]. Another method used is the training of artificial neural networks, due to their ability to pick up on data patterns that could elude conventional analytic techniques [6].

1.1 Problem Statement

This paper investigates the ability of an ensemble average of several differently configured neural networks to forecast the development of stock prices, with the overarching research question being: can one train an artificial neural network to predict stock prices with significant precision?

1.2 Scope

The neural networks were evaluated on stocks from NASDAQ Nordic's index of historical prices [8]. Prediction was treated both as a regression problem (predicting the next day's closing price for a given stock) and as a classification problem (predicting whether tomorrow's closing price will be higher or lower than today's).

The networks were evaluated using relevant measurements that are introduced in the method section of this paper. The choice of network structure, as well as its parameters, was based on selections made in previous studies, in addition to our own experimentation using trial and error and parameter search. Stock predictions were made on a day-to-day basis.

2 Background

2.1 The stock market

The stock market is an organized market where individuals, groups, companies or banks may exchange and trade liquid assets. The exact structure of a given stock market varies from country to country, but the fundamental idea remains the same: one may exchange capital for shares, representing a degree of ownership over a given company determined by the ratio of shares owned to the total number of shares issued by said company [9]. Stock prices are continually updated over the course of a trading day, reflecting the influence of supply and demand from available customers and shares, as well as additional external factors [3].

The market itself is created by clerks and computer automation matching customer orders with the best available prices. The market price of a given stock is determined by the price at which a buyer and a seller agree to exchange the stock for capital corresponding to the listed price. The availability of a given stock depends on the number of shares a given company has issued, and varies from company to company. Thus the price of a stock is not necessarily always a reflection of the value of a company, but is also influenced by the total amount of liquid assets available.

2.1.1 Technical Indicators

The use of technical indicators is a common method for analyzing the stock market. Technical indicators are derived from mathematical functions that analyze the behavior of data over one or more data points. In the financial sector, trend-following indicators are commonly used for stock market predictions. Trend-following indicators are "lagged values of the independent variable, or indicators calculated from the lagged values" [10]. Some examples of these are stochastic oscillators and moving averages [11]. Examples of trend-following indicator definitions follow:

$$\text{Williams } \%R(t) = \frac{HighPrice_{n} - ClosingPrice_{t}}{HighPrice_{n} - LowPrice_{n}}$$

$$\text{Stochastic } \%K(t) = \frac{ClosingPrice_{t} - LowestLowPrice_{t-n}}{HighestHighPrice_{t-n} - LowestLowPrice_{t-n}}$$

$$\text{Exponential Moving Average: } EMA_{t} = (ClosingPrice_{t} \cdot A) + (EMA_{t-1} \cdot (1 - A)), \quad A = \frac{2}{n+1}$$

where n denotes a chosen time period.
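For concreteness, the three definitions above map directly onto rolling operations over a price series. Below is a minimal sketch in pandas; the function names and the use of rolling highest-high/lowest-low windows are our own illustration, not code from the project.

```python
import pandas as pd

def williams_r(high: pd.Series, low: pd.Series, close: pd.Series, n: int) -> pd.Series:
    """Williams %R over an n-day look-back window."""
    hh, ll = high.rolling(n).max(), low.rolling(n).min()
    return (hh - close) / (hh - ll)

def stochastic_k(high: pd.Series, low: pd.Series, close: pd.Series, n: int) -> pd.Series:
    """Stochastic %K: position of the close within the n-day high/low range."""
    hh, ll = high.rolling(n).max(), low.rolling(n).min()
    return (close - ll) / (hh - ll)

def ema(close: pd.Series, n: int) -> pd.Series:
    """Exponential moving average with smoothing factor A = 2 / (n + 1)."""
    return close.ewm(alpha=2.0 / (n + 1), adjust=False).mean()
```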

2.2 Artificial Intelligence

Artificial Intelligence is a branch within computer science that deals with the study, development and application of intelligent machines and software. Common applications of artificial intelligence are processes designed for learning, planning, visual perception, natural language processing and decision making [12]. Machine learning is a major branch of Artificial Intelligence that focuses on algorithms that improve themselves through trial and error; they learn by statistical optimization from previous experience in the form of input data [13]. Machine learning can be further split into three separate branches, each representing a way of learning for the involved algorithms: supervised learning, unsupervised learning, and reinforcement learning. The area relevant to this paper is supervised learning, which works by presenting an algorithm with input examples that map to the desired output. The algorithm then continually learns how to map from the input to the output by training on these examples. After the algorithm has been trained sufficiently, it is able to make predictions on new (not previously trained-on) examples by using the experience it acquired during the training phase. These types of prediction methods have a wide range of applications, and can be used for both classification and regression [14].

2.2.1 Artificial Neural Networks

Artificial neural networks, commonly referred to as neural networks, are a method within machine learning that tries to emulate the functionality of a biological brain on a smaller scale [14]. A regular neural network consists of neurons, each of which is an individual processor that generates real-valued output. Neurons in the outer layer react to the external environment, and transmit stimuli throughout the network by means of weighted connections. Input paths serve a similar purpose to dendrites in the human brain, and output paths correspond to axons [15]. Neural networks are used as a technique in supervised learning; for example, they can be trained to minimize an error measure on a particular set of training data, and then be used on previously unseen data for classification or regression tasks [16]. There are many ways one can structure an artificial neural network, but simple variants typically consist of an input layer, one or more hidden layers, and an output layer.

Figure 1: Structure of an Artificial Neural Network (ANN), showcasing the input layer, the hidden layer, and the output layer. Figure from Ticknor, "A Bayesian regularized artificial neural network for stock market forecasting" (2013) [17].

The multilayer perceptron is one of the more common types of neural networks. Even when outfitted with only one hidden layer, it can approximate most non-linear functions with arbitrarily high precision [17][18]. The network uses supervised learning to reduce an error function appropriate for the given task. Figure 1 displays a typical structure for an artificial neural network with one input layer, one hidden layer, and an output layer. The size of the input layer is determined by the number of inputs used for the network, while the size of the hidden layer can be chosen freely. The hidden layers (and sometimes the output layer) are usually outfitted with an activation function that signals that the node is "active" given an input. Common activation functions are sigmoid, tanh, relu, and softmax. The edges between the nodes are represented as a matrix containing numeric weighted values (often denoted simply as weights) for the connections between the different nodes [19][20]. The learning process then involves changing these weights to reduce the chosen error function: an input is first propagated through the network from the input layer all the way to the output layer to determine the resulting output value, after which the error is propagated backwards using the chain rule and the weights are adjusted using a method of gradient descent. This is usually denoted a learning step, and the algorithm by which the network learns using these steps is called back-propagation [17].

Some relevant terms for the algorithm that are briefly mentioned later include the learning rate, a hyperparameter that specifies how large adjustments are made after each gradient update, and the batch size, the number of input samples used in one forward and backward pass of the back-propagation algorithm.
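To make the learning step concrete, here is a minimal numpy sketch of one forward and backward pass for a single-hidden-layer network with tanh activation and a squared-error loss. The layer sizes, batch size and learning rate are illustrative values, not the settings used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, lr = 4, 5, 1, 0.01     # illustrative sizes and learning rate

W1, b1 = rng.normal(size=(n_in, n_hidden)), np.zeros(n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), np.zeros(n_out)

def learning_step(x: np.ndarray, target: np.ndarray) -> float:
    """One forward pass, one backward pass via the chain rule, one weight update."""
    h = np.tanh(x @ W1 + b1)              # hidden activations
    y = h @ W2 + b2                       # linear output
    dy = y - target                       # d(loss)/dy for loss = 0.5 * (y - target)^2
    dW2, db2 = h.T @ dy, dy.sum(axis=0)   # output-layer gradients
    dz = (dy @ W2.T) * (1.0 - h ** 2)     # chain rule through tanh
    dW1, db1 = x.T @ dz, dz.sum(axis=0)   # hidden-layer gradients
    for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        param -= lr * grad                # in-place gradient-descent update
    return float(0.5 * np.sum(dy ** 2))

# one batch of 8 samples = one forward and one backward pass
loss = learning_step(rng.normal(size=(8, n_in)), rng.normal(size=(8, n_out)))
```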

The advantage of neural networks is that they are universal function approximators that can approximate a large class of functions on a bounded interval arbitrarily closely [21][22]. They are also able to deal efficiently with the presence of noise in data [10]. Unlike many traditional statistical methods, neural networks require no assumptions about the underlying mathematical model describing the data. Instead, such a model is created dynamically by the neural network during the training process [23]. When the underlying distribution is extremely non-linear, regular statistical methods can be unsuitable [24]. For these reasons, the use of neural networks to solve regression and classification problems has become more common [25][26].

2.2.2 Construction of artificial neural networks for stock prediction

One of the many applications of neural networks is time series forecasting, as they can be flexibly applied to non-linear problems. Using an artificial neural network for stock forecasting can be done by creating a prediction algorithm that trains on historical data related to a given stock, and then makes predictions of future prices.

Neural networks have been used by several researchers to predict future prices and the direction of change of stock price indices. It has been shown that their predictive ability can be superior to that of traditional regression techniques [27].

The output of the neural network is of one of two types, which defines the problem-solving approach that is applied for the purpose of stock forecasting: regression or classification. Regression output from a neural network comes in the form of a number x ∈ R, where x represents the network's prediction of the price of a stock on a given day. Classification instead attempts to output the trend of the stock, that is to say, whether the price of the stock will increase or decrease on a given day. The output is represented as a binary number, where 0 represents a downward trend and 1 represents an upward trend of the stock on a given day.

Neural networks are often constructed in a non-systematic way. The optimal structure depends on the specific task that is to be solved, and is often decided by heuristics, rules of thumb, and attempts using trial and error [28].

In a survey by Tkáč and Verner (2015), it was found that neural networks of multi-layer feed-forward type were used in a majority of papers on neural networks aimed at aiding business decisions [29]. In a survey by Atsalakis and Valavanis (2008), different architectures for neural networks used in stock price forecasting are described. Typical structures include one or two hidden layers, with between 3 and 50 neurons in each. Many authors used technical indicators, typically between 2 and 25. The number of trading days used varied; some authors used decades of data, while some used only a few months; normally, a few years of trading data was used [7]. Lasfer et al. (2013) conducted statistical analysis of different parameter choices for forecasting financial time series with neural networks. In the tested ranges, a learning rate of 0.001, 6 hidden neurons, and a linear output transfer function were found optimal for their dataset. It was also found that neural networks of feed-forward type were affected less by changing the number of hidden neurons, and that the choice of output transfer function becomes less important with more hidden neurons [30].

Walczak and Cerpa (1999) provided guidelines and heuristics for selecting an efficient neural network structure [31]. The performance of a neural network improves with the selection of relevant input variables, the right number of input variables, the number of hidden layers and of neurons in each hidden layer, the right learning rule, and the dataset size. For stock price forecasting, the inclusion of correlated variables did not degrade performance [31].

Parametric searches are challenged by the non-deterministic nature of neural networks, and by the risk of over-fitting on the given data. Over-fitting is the phenomenon of adapting a model that performs especially well on the training set, but poorly on out-of-sample data [26][23]. To reduce the probability of selecting a suboptimal set of parameters by chance, one can iterate the neural network over the same set of parameters several times, and for example use the mean error term as the fitness indicator [24].

The choice of input is crucial to the performance of a neural network. In practice, a subset of all reasonable inputs should be used. An exhaustive parameter search for an optimal set of inputs can be computationally infeasible; heuristics such as stepwise selection have been used instead [32][33].

2.2.3 Ensemble models

Ensemble modeling is a technique used in machine learning whereby multiple algorithms are combined to yield better predictions. J. Stock writes that when two predictive models yield different results, a combination is often preferred to either of the two predictions alone [34]. This method has previously been used to predict stock prices and stock price direction of change, and for generating investment strategies using machine learning algorithms [35]. Four general ensemble methods for neural networks are given by Yu (2008) [36]:

1. Vary the hyperparameters, such as initial weights, learning rate, and training batch size.

2. Vary the number of hidden layers and the number of neurons used.

3. Vary the data used.

4. Vary the training algorithm.


3 Method

3.1 Data Acquisition

The data used for the project was obtained from NASDAQ OMX Nordic's historical prices service [8]. The time interval chosen was 600 days, roughly 2 years in trading days. The data consisted of 10 date-stamped data points per trading day, describing different properties of the stock's behavior on that day: Bid, Ask, Opening price, High price, Low price, Closing price, Average price, Total trading volume, and number of Trades.

The following stocks were used in this paper:

• Ericsson B (ERIC B).

• NAXS (NAXS).

• Net Insight B (NETI B).

• Investor B (INVE B).

• Sv. Handelsbanken A (SHB A).

• Öresund (ORES).

• Vostok New Ventures (VNV SDB).

• Proact IT Group (PACT).

• Hexagon B (HEXA B).

• HMS Networks (HMS).

The timeline used for each stock was 600 trading days, ending at 2018-03-13.
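A minimal loading sketch follows. The file name, separator and column labels are assumptions about the exported CSV layout, not a documented NASDAQ OMX Nordic format.

```python
import pandas as pd

df = pd.read_csv("ERIC_B.csv", sep=";", parse_dates=["Date"])  # hypothetical file name
df = df.sort_values("Date").tail(600).reset_index(drop=True)   # last 600 trading days
fields = ["Bid", "Ask", "Opening price", "High price", "Low price",
          "Closing price", "Average price", "Total volume", "Trades"]
df = df[["Date"] + fields]
```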

3.2 Training the Neural Networks

The network was created using the Keras framework, with TensorFlow as backend [37][38]. Pandas and NumPy were utilized for managing data, and Matplotlib for creating graphs [39][40][41]. All code was written in Python 3.5.

The data was split into two separate sections for training and testing. Of the total 600 days used, 120 (20% of the total) were set aside as a testing partition, while the remaining 480 days were used for training. The predictions were made one day at a time, meaning that for a trading day t, the goal is to predict the closing price of day t + 1.
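A sketch of the chronological split and the day-ahead targets, continuing from the loading sketch above (the 80/20 proportions are the paper's; using only the closing price as feature here is a simplification):

```python
import numpy as np

close = df["Closing price"].to_numpy()

X = close[:-1]                                # what is known on day t
y_reg = close[1:]                             # regression target: close of day t + 1
y_cls = (close[1:] > close[:-1]).astype(int)  # classification target: up (1) / down (0)

split = int(len(X) * 0.8)                     # roughly the paper's 480/120 day split
X_train, X_test = X[:split], X[split:]
y_train, y_test = y_reg[:split], y_reg[split:]
```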

3.3 Evaluating Network Performance

Since predictions are made one day at a time, we chose a simple naive baseline to determine the significance of our results: we compare the error scores of our predictions with the baseline error scores achieved using the naive prediction that the price on the next trading day t + 1 equals that on the current trading day t, which can be expressed as

$$Price(t + 1) = Price(t) \qquad (1)$$

where Price(t) denotes the price of a stock on the trading day t for which a prediction is made. Given the random walk hypothesis, this prediction minimizes the expected error. This naive strategy has also been used in previous studies, for example in a comparative study by Sermpinis et al. (2012) [42].

There are different ways of judging the performance of neural networks that perform regression-based predictions. Common measures of error are the mean absolute percentage error (MAPE), mean absolute error, root mean square error and mean square error. We use the MAPE for our regression approach, as this measure makes predictions on stocks with different price levels more comparable.


The mean absolute percentage error is defined by:

$$MAPE = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{T_t - P_t}{T_t} \right| \qquad (2)$$

where $T_t$ is the true value and $P_t$ is the predicted value of a given stock at day t.

For our classification approach, we use accuracy, which is measured in percent and defined by:

$$Accuracy = 100 \cdot \frac{CorrectPredictions}{n} \qquad (3)$$

where CorrectPredictions is the number of correct classifications and n is the total number of predictions generated.
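The baseline (1) and the two measures (2) and (3) are straightforward to compute; a small sketch:

```python
import numpy as np

def mape(true: np.ndarray, pred: np.ndarray) -> float:
    """Mean absolute percentage error, Eq. (2)."""
    return 100.0 / len(true) * float(np.sum(np.abs((true - pred) / true)))

def accuracy(true_dir: np.ndarray, pred_dir: np.ndarray) -> float:
    """Share of correct up/down classifications in percent, Eq. (3)."""
    return 100.0 * float(np.sum(true_dir == pred_dir)) / len(true_dir)

def baseline_mape(close: np.ndarray) -> float:
    """MAPE of the naive prediction Price(t + 1) = Price(t), Eq. (1)."""
    return mape(close[1:], close[:-1])
```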

3.4 Structure of the Neural Network

3.4.1 Network layout

Using trial and error and previous research as a basis, we constructed a neural network with 5 neurons in the hidden layer. Trading data was normalized to the range [−1.0, 1.0] using the min-max function. Normalization can be necessary to meet algorithmic needs and to reduce the time needed to converge when using gradient descent to train the neural network [32]. The hyperbolic tangent was chosen as the activation function for the hidden layer of the regression variant of the neural network. It is defined as:

$$\tanh(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (4)$$

For the classification network, we chose the rectified linear unit (relu), defined by:

$$relu(x) = \max(0, x) \qquad (5)$$

The optimization method chosen was Adam, or Adaptive moment estimation [43]. The error function chosen for regression was the mean absolute error (MAE), because it translates readily into the mean absolute percentage error when comparing error values between different stocks; otherwise, performance on stocks priced at different levels would not be comparable. For the classification network, we used binary cross-entropy, since we are dealing with a binary classification problem with two classes (stock increases or decreases).
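A minimal Keras sketch of the normalization and the two network variants described above. The linear output of the regressor and the sigmoid output of the classifier are our assumptions; the paper specifies the hidden-layer activations and loss functions but not the output activations.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

def minmax_scale(x: np.ndarray) -> np.ndarray:
    """Min-max normalization of a feature to the range [-1.0, 1.0]."""
    return -1.0 + 2.0 * (x - x.min()) / (x.max() - x.min())

def build_regressor(n_inputs: int) -> Sequential:
    """5 hidden tanh neurons, MAE loss, Adam optimizer."""
    model = Sequential()
    model.add(Dense(5, activation="tanh", input_dim=n_inputs))
    model.add(Dense(1))                        # linear output (assumption)
    model.compile(optimizer="adam", loss="mae")
    return model

def build_classifier(n_inputs: int) -> Sequential:
    """5 hidden relu neurons, binary cross-entropy loss."""
    model = Sequential()
    model.add(Dense(5, activation="relu", input_dim=n_inputs))
    model.add(Dense(1, activation="sigmoid"))  # sigmoid output (assumption)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```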


3.4.2 Input parameters

Previous research has been conducted on stock price prediction using technical analysis [2]. Commonly used inputs are historical opening prices, high prices, low prices, and closing prices for given days, where the future closing price is treated as the dependent variable.

The models used for the ensemble were chosen from previous work, from commonly used sets of inputs, and from the results of a forward-addition parameter search heuristic. The parameter search works by iteratively adding indicators: a network is trained on a subset of the training data, keeping the rest as validation data for evaluation (so that the test data remains independent), and the indicator that yields the lowest error score is kept in each iteration. The validation split was the final 20% of trading days in the training set.
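A sketch of the forward-addition heuristic. Here score_fn stands for training a network on the chosen inputs and returning the mean validation error over repeated runs; the stopping rule (stop when no candidate improves the score) is our assumption.

```python
def forward_selection(candidates, score_fn):
    """Greedy forward-addition search over technical indicators."""
    selected, remaining = [], list(candidates)
    best_score = float("inf")
    while remaining:
        scores = {c: score_fn(selected + [c]) for c in remaining}
        best = min(scores, key=scores.get)
        if scores[best] >= best_score:
            break                      # no candidate improves the validation error
        selected.append(best)
        remaining.remove(best)
        best_score = scores[best]
    return selected, best_score
```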

The considered technical indicators were given by Ticknor (2013), Patel et al. (2015), and Zhang and Wu (2009); we refer to their articles for the mathematical definitions of each indicator. Some technical indicators were also handpicked for use in the parameter search to obtain additional models. The technical indicators are referred to in abbreviated form in this report. Below follows a list of all the technical indicators used and their abbreviations. Some of them take parameters that specify, for example, the look-back period (timespan) used for the indicator.

• Typical price, using a 1-day period.

• W%R-t: Williams %R, where t is the timespan.

• SO%K: Stochastic %K.

• SO%D: Stochastic %D.

• EMA-t: Exponential Moving Average, where t is the timespan.

• MACD(x,y,z): Moving Average Convergence/Divergence, where x is the fast period, y is the slow period, and z is the signal period.

• ROC-t: Rate of Change, where t is the timespan.

• ADO-x-y: Chaikin A/D Oscillator, where x is the fast period and y is the slow period.

• CCI-t: Commodity Channel Index, where t is the timespan.

• ATR-t: Average True Range, where t is the timespan.

• HPACC-t: High Price Accelerator, where t is the timespan.

• WMA-t: Weighted Moving Average, where t is the timespan.

• RSI-t: Relative Strength Index, where t is the timespan.

• MOM-t: Momentum, where t is the timespan.

The technical indicators were implemented using the TA-Lib open-source library [44]. Parameters not specified above retain their default values.
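A sketch of how such indicators can be computed with TA-Lib's Python wrapper. The price arrays below are placeholders, and the High Price Accelerator (HPACC) is omitted since it is not a standard TA-Lib function.

```python
import numpy as np
import talib

rng = np.random.default_rng(0)
close = 100.0 + np.cumsum(rng.normal(0, 1, 600))  # placeholder price series
high, low = close + 1.0, close - 1.0
volume = np.full(600, 1e5)

indicators = {
    "W%R-14":        talib.WILLR(high, low, close, timeperiod=14),
    "SO%K":          talib.STOCH(high, low, close)[0],  # STOCH returns (slowk, slowd)
    "SO%D":          talib.STOCH(high, low, close)[1],
    "EMA-5":         talib.EMA(close, timeperiod=5),
    "MACD(12,26,9)": talib.MACD(close, fastperiod=12, slowperiod=26, signalperiod=9)[0],
    "ROC-12":        talib.ROC(close, timeperiod=12),
    "ADO-3-10":      talib.ADOSC(high, low, close, volume, fastperiod=3, slowperiod=10),
    "CCI-14":        talib.CCI(high, low, close, timeperiod=14),
    "ATR-14":        talib.ATR(high, low, close, timeperiod=14),
    "WMA-10":        talib.WMA(close, timeperiod=10),
    "RSI-14":        talib.RSI(close, timeperiod=14),
    "MOM-10":        talib.MOM(close, timeperiod=10),
    "Typical price": talib.TYPPRICE(high, low, close),
}
```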


Each iteration of the parameter search was in turn conducted 8 times (accounting for slight variation due to the random initialization of weights), after which the mean of the error scores was used to select variables for addition. The end result was a number of models for consideration for use in the ensemble model. The parameter search was conducted for each of the two types of networks (regression and classification).

Out of the numerous combinations of inputs that resulted from the heuristic, two from each were selected: one model that showed the best performance of all evaluated models, and one similarly performing model with more input variables:

Model 1: Close(t), Close(t-1), Close(t-2), EMA-5, ADO-3-10
Model 2: Close(t), Close(t-1), Close(t-2), EMA-5, MACD(12,26,9), W%R-14, ROC-12, ADO-3-10

Table 1: Models from the regression parameter search

Model 3: Close(t), Close(t-1), Close(t-2), ROC-12, RSI-9, CCI-14, SO%K, EMA-5
Model 4: Close(t), Close(t-1), Close(t-2), Close(t-3), Close(t-4), ROC-12, RSI-9, CCI-14, SO%K, W%R-14, RSI-14, MOM-10, Typical price, MOM-9, HPACC-10, ATR-14, WMA-10

Table 2: Models from the classification parameter search

In addition to these four models, we chose to include a traditional candlestick model:

Model 5: Close(t), High(t), Low(t), Opening(t)

Table 3: Candlestick model

We also included a model obtained from our own trial and error experiments in setting up the network:

Model 6: Opening(t), High(t), Low(t), EMA-5, EMA-10, SO%K, SO%D

Table 4: Custom model


3.5 Models used

The ensemble models were varied by input variables. The selections of input variables were obtained from our own experiments, from the parameter search we conducted, and from the tradition of candlestick patterns.

Model 1: Close(t), Close(t-1), Close(t-2), EMA-5, ADO-3-10
Model 2: Close(t), Close(t-1), Close(t-2), EMA-5, MACD(12,26,9), ADO-3-10, W%R-14, ROC-12
Model 3: Close(t), Close(t-1), Close(t-2), ROC-12, RSI-9, CCI-14, SO%K, EMA-5
Model 4: Close(t), Close(t-1), Close(t-2), Close(t-3), Close(t-4), ROC-12, RSI-9, CCI-14, SO%K, W%R-14, RSI-14, MOM-10, Typical price, MOM-9, HPACC-10, ATR-14, WMA-10
Model 5: Close(t), High(t), Low(t), Opening(t)
Model 6: Opening(t), High(t), Low(t), EMA-5, EMA-10, RSI, W%R-14, SO%K, SO%D

Table 5: Inputs of the ensembled models

The neural network corresponding to each model was run for 10 iterations, thereby creating an ensemble of networks with different initial weights and generating 60 predictions for each forecasted day. The ensemble model was then created using two different composition methods: taking the median prediction for each day, and taking the mean prediction for each day.
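The composition step itself reduces to averaging across the 60 prediction series; a sketch with placeholder forecasts:

```python
import numpy as np

# 6 models x 10 runs with different random initial weights = 60 prediction
# series over the 120 test days; random values stand in for real forecasts.
predictions = np.random.default_rng(0).normal(100.0, 1.0, size=(60, 120))

mean_ensemble = predictions.mean(axis=0)          # mean composition, per day
median_ensemble = np.median(predictions, axis=0)  # median composition, per day
# For the classifier, composing the 0/1 outputs the same way and rounding at
# 0.5 amounts to a majority vote (our reading; the paper does not spell it out).
```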

4 Results

This section covers the results produced from our testing and evaluation of the neural networks' performance for the purpose of stock forecasting. In the regression part, we showcase the plots produced from the output of the neural network, in addition to tables of the performance of the ensemble model and the mean performance over all stocks for each model. Classification is structured in the same manner, except that there are no plots, due to the nature of the output of the classification approach.


4.1 Regression

Below follow three sections related to the performance of the neural network designed for the regression task.

4.1.1 Plots from the ensemble model's forecasting predictions

Below are the plots of predictions generated by the regression neural network.

Two samples have been picked out of the total of 10 stocks (2 plots each, 20 plots in total), containing the predictions of the neural network and the error. These two sets of plots were picked because they display two different characteristic prediction patterns: either the neural networks converge to a strategy similar to the naive baseline, predicting the current day's price to be roughly the same as that of the previous day, or the neural networks make less obvious predictions. Figures 2 and 3 are an example of the former, where the plot of the predicted price closely mirrors that of the actual price but with one day's lag:

Figure 2: Regression predictions from the neural network on the Ericsson B stock.


Figure 3: Regression error from the neural network on the Ericsson B stock.

Figures 4 and 5 below correspond to predictions on the NAXS stock, where the naive strategy is utilized to a lesser degree:

Figure 4: Regression predictions from the neural network on the NAXS stock.


Figure 5: Regression error from the neural network on the NAXS stock.

There are a total of 20 plots, 2 for each stock: one showing the predictions of the neural network, accompanied by one showing the error over the prediction timeline. All of the plots can be found in Appendix A.

4.1.2 Ensemble performance

Table 6 below showcases the performance of the ensemble regressor with the mean and median composition of the network output versus that of the naive baseline.


Stock      Mean MAPE (%)   Median MAPE (%)   Baseline MAPE (%)
ERIC B     1.48099         1.48256           1.48281
HEXA B     0.98422         0.97919           0.95881
HMS        1.54586         1.54629           1.50740
INVE B     0.82879         0.81846           0.81590
NAXS       0.91658         0.91047           0.97956
NETI B     2.08182         2.06832           2.08965
ORES       1.01176         1.01484           0.99681
PACT       1.53612         1.54716           1.46596
SHB A      0.83340         0.83312           0.86971
VNV SDB    1.13662         1.13350           1.13886

Table 6: Ensemble model MAPE using mean and median composition on each stock vs the baseline

4.1.3 Mean performance

Table 7 displays the mean performance of each individual model over all 10 stocks.

Model             Mean MAPE over all stocks (%)
Baseline          1.23054
Median Ensemble   1.23339
Mean Ensemble     1.23562
Model 1           1.22918
Model 2           1.23603
Model 3           1.25617
Model 4           1.29794
Model 5           1.23969
Model 6           1.24112

Table 7: Mean MAPE over all stocks


4.2 Classification

Below are two sections, each containing a table showcasing the performance of the ensemble classifier against the baseline. Color highlighting was used to signify the significance of the individual results. Given the null hypothesis of a random walk, we can assume that correct predictions are binomially distributed with probability 50%.

4.2.1 Ensemble performance

The number of trials for the ensemble performance is 120 for each stock. Green marks a significant predictive ability at the 99% confidence level (greater than or equal to 61.67%), red marks an inferior predictive ability (less than or equal to 38.33%), and yellow indicates uncertainty in the network's predictive ability.
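The stated cut-offs are consistent with a one-sided exact binomial test at the 1% level; the following sketch reproduces them:

```python
from scipy.stats import binom

def significance_cutoffs(n: int, alpha: float = 0.01):
    """Smallest accuracy significantly above chance under H0: Binomial(n, 0.5)."""
    k = n // 2
    while binom.sf(k - 1, n, 0.5) > alpha:  # binom.sf(k - 1, ...) = P(X >= k)
        k += 1
    upper = 100.0 * k / n
    return upper, 100.0 - upper

print(significance_cutoffs(120))   # approx. (61.67, 38.33)
print(significance_cutoffs(1200))  # approx. (53.42, 46.58)
```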

Stock      Accuracy (%)
ERIC B     45.83
HEXA B     53.33
HMS        52.50
INVE B     52.50
NAXS       69.17
NETI B     59.17
ORES       55.00
PACT       48.33
SHB A      60.00
VNV SDB    64.17

Table 8: Ensemble model Accuracy


4.2.2 Mean Performance

The number of trials for the mean performance is 1200 (120 for each stock). Green marks a significant predictive ability at the 99% confidence level (greater than or equal to 53.42%), red marks an inferior predictive ability (less than or equal to 46.58%), and yellow indicates uncertainty in the network's predictive ability.

Model      Mean accuracy across all stocks (%)
Ensemble   56.00
Model 1    53.83
Model 2    53.92
Model 3    54.83
Model 4    54.50
Model 5    54.83
Model 6    54.92

Table 9: Mean accuracy across all stocks for each individual model

5 Discussion

5.1 Discussion of method

Wishing for a more systematic approach to input parameter selection than those used in most previous research, we utilized a forward-addition parameter search, a heuristic inspired by stepwise regression. However, the selections of technical indicators, which varied in size and in how traditionally or systematically they were chosen, turned out not to give considerable differences in the predictive abilities of the neural network.

We also found that most research is conducted using just one or a couple of stocks. Our sentiment that this is a weakness is shared by Nguyen et al. (2015) [45]. Using a very limited number of stocks poses the risk of selecting network parameters that are too specific for the given stocks, and could cause a form of over-fitting, especially as neural networks often are constructed using trial and error. While it could be desirable to specialize in the prediction of a single stock for investment purposes, one attribute we sought for the neural network was that it be capable of predicting on several different sets of data, and so we decided to evaluate the network's predictions on 10 different stocks.

A baseline for regression performance was needed to evaluate the predictive ability of the neural network at all. We argue that the proposed neural network's predictive ability cannot be judged solely by comparison with other neural network models, or with the performance of traditional statistical methods of time series analysis, as these only indirectly offer measures for comparison while their own significance remains unshown.

5.2 Discussion of results

Our neural network was not able to significantly beat a naive baseline in predicting the adjacent closing price on a selection of 10 stocks; in fact, graphs of generated predictions show a clear convergence towards a naive strategy. Similar results were obtained by R. Sitte and J. Sitte (2002), who argue that this is not a limitation of the neural network model, but a natural result of predicting an essentially random process [46]. Zhang and Wu (2009) argue the contrary: that convergence toward a naive prediction strategy is a drawback of the back-propagation neural network model, and propose augmented strategies such as noise-reducing data preprocessing [25].

For classification, a modest but significant predictive ability was demonstrated. An ensemble created from randomized initial neural network weights and varied input data was shown to be slightly more accurate than the individual models used to create it. Accuracy varied from roughly 40% to 70% between stocks, which shows that classification performance can hardly be tested on just one or a few stocks.

5.3 A note on regularization

Most neural networks apply some form of regularization in order to reduce the over-fitting that occurs when the optimizer maps the input samples in the training data to their corresponding desired outputs, so that the network generalizes better to the independent testing data. During the process of experimenting with several different structures of the neural network, several attempts were made to add regularization to the network in hopes of improving the results by reducing over-fitting on the dataset; however, each attempt degraded both the MAPE and the accuracy. Attempts were made to add an L2 regularization term to penalize the weight updates, and we also tried adding a dropout layer between the hidden layer and the output layer, which ignores a chosen fraction of the neurons in order to generalize better [47].

We also conducted trials where we added batch normalization, which reduces the internal covariate shift of the individual training batches and also indirectly works as a form of regularization [48]. The fact that every attempt at regularization, however small, was met with a flat reduction in performance perhaps hints at the difficulty the neural network has in learning patterns from the provided input data.
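For reference, the three regularization variants can be expressed in Keras as below. In our experiments each was tried separately; they are combined in one model here only for brevity, and the penalty strength, dropout rate and input size are illustrative values.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization
from keras.regularizers import l2

model = Sequential()
model.add(Dense(5, activation="tanh", input_dim=8,
                kernel_regularizer=l2(0.01)))  # L2 penalty on the hidden weights
model.add(BatchNormalization())                # batch normalization [48]
model.add(Dropout(0.2))                        # dropout before the output layer [47]
model.add(Dense(1))
model.compile(optimizer="adam", loss="mae")
```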

6 Conclusion and recommendations

The results that we obtained show a negative result for stock price regression using neural networks, and a modest but positive result for classification of stock price direction change. For regression, a convergence towards a naive strategy was shown. For classification, large variance was shown in the ability to predict stock price direction change between different stocks. This shows the importance of using many stocks when evaluating a neural network's classification ability. We therefore recommend that authors use more than a few stocks when researching stock prediction, and provide graphs where predictive ability is clearly shown, as well as comparisons with a proper baseline. Furthermore, a rolling window should be used for normalization, as normalizing over the full dataset otherwise causes information leakage from the testing set to the training set; on the other hand, iterative re-training could yield better predictive performance, as more, and more recent, data is used to generate predictions. However, very few authors of the reports we surveyed utilized this technique.
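A sketch of the rolling-window normalization recommended above, scaling each value with statistics from past days only so that no test-set information leaks into the training data (the window length is an illustrative choice):

```python
import numpy as np

def rolling_minmax(x: np.ndarray, window: int = 60) -> np.ndarray:
    """Scale each value to [-1, 1] using only the trailing `window` past values."""
    out = np.full(len(x), np.nan)
    for t in range(window, len(x)):
        lo, hi = x[t - window:t].min(), x[t - window:t].max()
        out[t] = -1.0 + 2.0 * (x[t] - lo) / (hi - lo) if hi > lo else 0.0
    return out
```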


References

[1] William Leigh, Russell Purvis, and James M. Ragusa. "Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in romantic decision support". In: Decision Support Systems 32.4 (2002), pp. 361–377. doi: 10.1016/S0167-9236(01)00121-X.

[2] Burton G. Malkiel. "The efficient market hypothesis and its critics". In: Journal of Economic Perspectives 17.1 (2003), pp. 59–82.

[3] Michel Ballings et al. "Evaluating multiple classifiers for stock price direction prediction". In: Expert Systems with Applications 42.20 (2015), pp. 7046–7056. doi: 10.1016/j.eswa.2015.05.013.

[4] Jian-Zhou Wang et al. "Forecasting stock indices with back propagation neural network". In: Expert Systems with Applications 38.11 (2011), pp. 14346–14355. doi: 10.1016/j.eswa.2011.04.222.

[5] Niall O'Connor and Michael G. Madden. "A neural network approach to predicting stock exchange movements using external factors". In: Knowledge-Based Systems 19.5 (2006), pp. 371–378. doi: 10.1016/j.knosys.2005.11.015.

[6] Robert R. Trippi and Efraim Turban. Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real World Performance. McGraw-Hill, Inc., 1992.

[7] George S. Atsalakis and Kimon P. Valavanis. "Surveying stock market forecasting techniques – Part II: Soft computing methods". In: Expert Systems with Applications 36.3, Part 2 (2009), pp. 5932–5941. doi: 10.1016/j.eswa.2008.07.006.

[8] Nasdaq OMX Nordic. Historical Prices. http://www.nasdaqomxnordic.com. Visited 2018-03-13.

[9] Richard J. Teweles and Edward S. Bradley. The Stock Market. Vol. 64. John Wiley & Sons, 1998.

[10] Iebeling Kaastra and Milton Boyd. "Designing a neural network for forecasting financial and economic time series". In: Neurocomputing 10.3 (1996), pp. 215–236. doi: 10.1016/0925-2312(95)00039-9.

[11] Mark P. Taylor and Helen Allen. "The use of technical analysis in the foreign exchange market". In: Journal of International Money and Finance 11.3 (1992), pp. 304–314.

[12] Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Pearson Education Limited, 2016.

[13] Nasser M. Nasrabadi. "Pattern recognition and machine learning". In: Journal of Electronic Imaging 16.4 (2007), p. 049901.

[14] Simon Haykin. Neural Networks: A Comprehensive Foundation. 2nd ed. Prentice Hall, 2004.

[15] Tim Hill, Marcus O'Connor, and William Remus. "Neural Network Models for Time Series Forecasts". In: Management Science 42.7 (1996), pp. 1082–1092. url: http://www.jstor.org/stable/2634369.

[16] Jürgen Schmidhuber. "Deep learning in neural networks: An overview". In: Neural Networks 61 (2015), pp. 85–117. doi: 10.1016/j.neunet.2014.09.003.

[17] Jonathan L. Ticknor. "A Bayesian regularized artificial neural network for stock market forecasting". In: Expert Systems with Applications 40.14 (2013), pp. 5501–5506. doi: 10.1016/j.eswa.2013.04.013.

[18] Lin Wang, Yi Zeng, and Tao Chen. "Back propagation neural network with adaptive differential evolution algorithm for time series forecasting". In: Expert Systems with Applications 42.2 (2015), pp. 855–863. doi: 10.1016/j.eswa.2014.08.018.

[19] G. David Garson. "Interpreting neural-network connection weights". In: AI Expert 6.4 (1991), pp. 46–51.

[20] A. T. C. Goh. "Back-propagation neural networks for modeling complex systems". In: Artificial Intelligence in Engineering 9.3 (1995), pp. 143–151.

[21] Kurt Hornik. "Approximation capabilities of multilayer feedforward networks". In: Neural Networks 4.2 (1991), pp. 251–257.

[22] Zhe Liao and Jun Wang. "Forecasting model of global stock index by stochastic time effective neural network". In: Expert Systems with Applications 37.1 (2010), pp. 834–841. doi: 10.1016/j.eswa.2009.05.086.

[23] G. Peter Zhang. "Time series forecasting using a hybrid ARIMA and neural network model". In: Neurocomputing 50 (2003), pp. 159–175. doi: 10.1016/S0925-2312(01)00702-0.

[24] Harri Niska et al. "Evolving the neural network model for forecasting air pollution time series". In: Engineering Applications of Artificial Intelligence 17.2 (2004), pp. 159–167.

[25] Yudong Zhang and Lenan Wu. "Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network". In: Expert Systems with Applications 36.5 (2009), pp. 8849–8854. doi: 10.1016/j.eswa.2008.11.028.

[26] Erkam Guresen, Gulgun Kayakutlu, and Tugrul U. Daim. "Using artificial neural network models in stock market index prediction". In: Expert Systems with Applications 38.8 (2011), pp. 10389–10397. doi: 10.1016/j.eswa.2011.02.068.

[27] Tae Hyup Roh. "Forecasting the volatility of stock price index". In: Expert Systems with Applications 33.4 (2007), pp. 916–922. doi: 10.1016/j.eswa.2006.08.001.

[28] Alfonso Palmer, Juan Jose Montano, and Albert Sesé. "Designing an artificial neural network for forecasting tourism time series". In: Tourism Management 27.5 (2006), pp. 781–790.

[29] Michal Tkáč and Robert Verner. "Artificial neural networks in business: Two decades of research". In: Applied Soft Computing 38 (2016), pp. 788–804. doi: 10.1016/j.asoc.2015.09.040.

[30] Assia Lasfer, Hazim El-Baz, and Imran Zualkernan. "Neural Network design parameters for forecasting financial time series". In: IEEE, Apr. 2013, pp. 1–4. isbn: 978-1-4673-5812-5.

[31] Steven Walczak and Narciso Cerpa. "Heuristic principles for the design of artificial neural networks". In: Information and Software Technology 41.2 (1999), pp. 107–117. doi: 10.1016/S0950-5849(98)00116-5.

[32] Sven F. Crone. "Stepwise selection of artificial neural network models for time series prediction". In: Journal of Intelligent Systems 14.2-3 (2005), pp. 99–122.

[33] Junsub Yi and Victor R. Prybutok. "A neural network model forecasting for prediction of daily maximum ozone concentration in an industrialized urban area". In: Environmental Pollution 92.3 (1996), pp. 349–357.

[34] James H. Stock. "Forecasting economic time series". In: A Companion to Theoretical Econometrics. Blackwell Publishers, 2001, pp. 562–584.

[35] Rodolfo C. Cavalcante et al. "Computational Intelligence and Financial Markets: A Survey and Future Directions". In: Expert Systems with Applications 55 (2016), pp. 194–211. doi: 10.1016/j.eswa.2016.02.006.

[36] Lean Yu et al. Bio-Inspired Credit Risk Analysis. Springer, 2008.

[37] François Chollet et al. Keras. https://keras.io. 2015.

[38] Martín Abadi et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Software available from tensorflow.org. 2015. url: https://www.tensorflow.org/.

[39] Wes McKinney. "Data structures for statistical computing in Python". In: Proceedings of the 9th Python in Science Conference. Vol. 445. Austin, TX, 2010, pp. 51–56.

[40] Travis E. Oliphant. A Guide to NumPy. Vol. 1. Trelgol Publishing, 2006.

[41] John D. Hunter. "Matplotlib: A 2D graphics environment". In: Computing in Science & Engineering 9.3 (2007), pp. 90–95.

[42] Georgios Sermpinis et al. "Forecasting and trading the EUR/USD exchange rate with stochastic Neural Network combination and time-varying leverage". In: Decision Support Systems 54.1 (2012), pp. 316–329. doi: 10.1016/j.dss.2012.05.039.

[43] Diederik P. Kingma and Jimmy Ba. "Adam: A method for stochastic optimization". In: arXiv preprint arXiv:1412.6980 (2014).

[44] TA-Lib. http://mrjbq7.github.io/ta-lib/. Accessed 2018-06-06.

[45] Thien Hai Nguyen, Kiyoaki Shirai, and Julien Velcin. "Sentiment analysis on social media for stock movement prediction". In: Expert Systems with Applications 42.24 (2015), pp. 9603–9611. doi: 10.1016/j.eswa.2015.07.052.

[46] Renate Sitte and Joaquin Sitte. "Neural networks approach to the random walk dilemma of financial time series". In: Applied Intelligence 16.3 (2002), pp. 163–171.

[47] Nitish Srivastava et al. "Dropout: A simple way to prevent neural networks from overfitting". In: The Journal of Machine Learning Research 15.1 (2014), pp. 1929–1958.

[48] Sergey Ioffe and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift". In: arXiv preprint arXiv:1502.03167 (2015).

A Additional graphs

The graphs in this appendix showcase the forecasts of the regression network (in blue) versus the true value of the stock (in orange). Each prediction plot is accompanied by an error plot, showing the error in price over the timespan of the forecast.

Figure 6: (a) Stock forecast of ERIC B. (b) Error of the forecast over time.

Figure 7: (a) Stock forecast of HEXA B. (b) Error of the forecast over time.

Figure 8: (a) Stock forecast of HMS. (b) Error of the forecast over time.

Figure 9: (a) Stock forecast of INVE B. (b) Error of the forecast over time.

Figure 10: (a) Stock forecast of NAXS. (b) Error of the forecast over time.

Figure 11: (a) Stock forecast of NETI B. (b) Error of the forecast over time.

Figure 12: (a) Stock forecast of ORES. (b) Error of the forecast over time.

Figure 13: (a) Stock forecast of PACT. (b) Error of the forecast over time.

Figure 14: (a) Stock forecast of SHB A. (b) Error of the forecast over time.

Figure 15: (a) Stock forecast of VNV SDB. (b) Error of the forecast over time.


TRITA-EECS-EX-2018:210
