
Forecasting Stock Index using Deep Learning and how it can be applied in the financial sector

SALEH SALEH ABBAS

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT


Deep learning in the stock market has gained increasing traction in recent years, as a result of the growing amount of high-quality data. The introduction of TensorFlow has made it possible to build a capable artificial neural network in a relatively short time. This work focuses on building a deep learning model in TensorFlow capable of predicting the price movement of a major stock index. The artificial neural network is based on the multilayer perceptron architecture, which proves to be very good at finding and predicting non-linear patterns. The model is then trained on, and used to predict, the S&P 500. The data was divided into two parts, one for training and one for testing how accurate the model is. The result is presented in the form of two graphs that show how the model learns and becomes increasingly better at predicting the index. The final MSE is 0.038. The model's results are evaluated and compared with other similar studies, and the report discusses the model's strengths and weaknesses as well as recommendations for future studies.

The second part of the work focuses on the industrial management aspect, with focus on how the model can be implemented at the bank where the work was carried out. The results reveal a division within the company. They also show what an investment in the model will cost, and how much the bank must earn for the investment to have a positive cash flow.


Abstract— The idea of predicting the stock market has existed for hundreds of years, from the pre-industrial age of Japan, when investors used candlestick patterns to predict the movement of rice prices, to the modern age of high-frequency robot traders. The rise of computational power and the availability of high-resolution data at massive scale have given investors the opportunity to use neural networks in more advanced and compelling ways. The recent introduction of TensorFlow has made it even easier to implement this technology in business. This report focuses on building a neural network to be trained and applied to predict a major stock market index. The data is minute-based and is used for training and testing.

The result showcases how the model learns and adapts based on the training data; after several epochs the model can adapt to even more complex parts of the graph. The stock data carries lots of noise, which affects the training and the results. Furthermore, the architecture of the neural network is vital for the model's capability. This model was built to run on a CPU, which also affects the model's efficiency, as GPUs have recently been shown to be more effective.

The second part, which focuses on the economic aspect of the model and how it can be implemented, discusses the cost of investing in machine learning and how much the investment must yield per year to break even. Furthermore, the mistrust of algorithms and the scepticism regarding machine learning within the bank are also brought up and discussed, as the biggest challenge is not actually creating a sophisticated model but convincing the analysts of its usefulness.

I. INTRODUCTION

The research in computer science and statistics has yielded the ability to find patterns in vast numbers of data sets, which can be categorized into structured and unstructured data. This can be used by the computer to build its experience, making it able to perform tasks such as natural language processing and image processing. The computer's ability to perform such tasks is usually referred to by the vague term 'artificial intelligence', a field that has existed for years. The new surge of computational power and the vast availability of high-quality data have sparked a new interest in the potential use of artificial intelligence [1]. It is already being used in medicine to diagnose diseases, in language translation and in driverless cars. Increasingly, huge efforts are being made to also implement artificial intelligence in the financial sector.

Artificial intelligence (AI) is a general name for a field that encompasses several sub-categories, of which machine learning is one. Machine learning is built on algorithms that are optimized automatically using data and then used to solve a problem.

This boils down to finding patterns in a substantial number of diverse data sources. Statistics underpin many machine learning tools; for example, linear regression models are used in cases that deal with millions of inputs. Machine learning adds a dimension of flexibility by not being bound to linear relationships, which is a key element in financial analysis.

The general concept of machine learning is to deal with prediction and optimization, and not with causal inference.

Deep learning, which uses layer-based algorithms, can be used for supervised, unsupervised or reinforcement learning. It has yielded interesting results in fields such as image recognition and natural language processing. Deep learning algorithms can be used to discover generalizable concepts, for example encoding the concept of "car" from a data set of images; hedge funds can use such algorithms to count the number of cars parked outside a retail store in a satellite image in order to predict sales figures for a period. What motivates this study is the TensorFlow library, whose first stable release came in 2017; it provides a framework for machine learning that is both highly flexible and highly scalable [2].

A. Scientific Question and Scope

This thesis and the deep learning model will solely focus on the financial sector and how the model can be applied in financial institutions like banks.

This work is conducted at a bank that is seeking to implement artificial intelligence into its core business practice of predicting, evaluating and analyzing the stock market. It is seeking models capable of assisting or even replacing today's analysts. The reason for this is to improve the efficiency of its analysis processes, and also the fact that it is getting mixed analytical conclusions from different analysts, making it difficult to give any recommendations to clients.

Scientific question: How can a deep learning model be built in TensorFlow to predict a stock index? How can the deep learning model be applied at a bank?


The scope of this project is to create a deep learning model capable of predicting stock prices. The model is created in TensorFlow, and the prediction is based on a major stock index, e.g. the S&P 500. The model is capable of being used for other types of data-based analyses; however, this is out of scope for this thesis. Hardware limitations are an issue, since they determine how much data can be fed into the model. There is noise in the stock data, which heavily affects the training of the model and its forecasting accuracy.

B. Previous Studies

Recent studies on predicting the stock market use techniques based on linear models [20][21], which cannot capture important non-linear aspects. Another study, which relies on deep learning models to predict the stock market, showcases that deep neural networks outperform shallow neural networks as well as other machine learning models [17]. It also indicates that deep learning shows promise as a method to predict stock returns. That study had a mean squared error (MSE) of 0.0833 [17]. Another study, using a multilayer perceptron (MLP), showed that a neural network model can achieve comparable results against the buy-and-hold strategy [16].

Work using artificial neural networks has been conducted before; this project is an extension of these previous works and an attempt to apply a deep learning model at the bank this thesis is done in collaboration with.

II. BACKGROUND

A. Neural networks

Modern deep learning models have, to some degree, taken inspiration from findings in neuroscience [8]; however, computers don't come close to the complexity of how neurons operate in the human brain. The concept is to have neurons that process information through networking.

Deep learning is the architecture of a neural network built upon several hidden layers [9]. The added benefit is that additional information can be formed between the layers, constituting a representation of the original information; these representations are modifications or cancellations of the actual input signals [8][9]. This also makes the data much more general in nature than the original input.

As seen in the figure, on the left side we have the inputs x1 to xn, which can represent any numerical value. The data can be existing data or come as a signal from previous neurons [10]. The neuron assesses the input and calculates the output Oj, which is then passed to the next connected neurons. The input of a neuron is thus the weighted sum of the outputs of all previous neurons associated with it [10].

The core concept of deep learning models is the hidden layers. Additionally, there are input and output layers. The input layer is the entry point of the data, while the output layer presents the resulting data. In the most basic neural network model, all neighboring layers are fully connected to each other [11].

In a neural network, the connections adapt based on the current data; in a simple architecture, the network remains constant. Deep learning uses an iterative adaptation method called gradient descent [12], which adjusts the model's parameters in the direction in which the error decreases. This method is commonly used in the training of neural networks [10][13].
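As a minimal illustration of the idea (this is not the thesis code), the sketch below runs gradient descent on a toy one-dimensional error surface; the error function, starting point and learning rate are arbitrary choices for demonstration.

```python
# Minimal gradient descent sketch on a toy error surface E(w) = (w - 3)^2.
# The error function, starting point and learning rate are illustrative only.
w = 0.0                 # initial parameter value
learning_rate = 0.1
for step in range(100):
    grad = 2.0 * (w - 3.0)        # dE/dw, the gradient of the error
    w -= learning_rate * grad     # step in the direction that decreases the error
print(w)  # converges toward the minimum at w = 3
```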

The main application of neural networks is pattern recognition, which can be implemented with a feedforward neural network trained to associate outputs with input patterns. When the neural network is used, it recognizes the input pattern and produces the associated output pattern. A core property of the neural network is that, when given an input with no corresponding taught output, it produces the taught output that is least different from the given pattern. Artificial neural networks (ANNs) are densely built of interconnected neurons, where each neuron takes real-valued inputs and produces a single output value, which, depending on where in the network it sits, can be an input to another layer. Through this process ANNs gain the ability to learn and acquire knowledge; they are among the best and most effective learning methods for acquiring, learning from and interpreting complex data [18]. The core mechanics of an ANN is a computational structure designed to mimic the human brain: interconnected neurons, where each connection has a weight. A bias term is essential for neural networks; it is treated as an additional input that offsets the weighted sum of the other inputs.

The class of sigmoid functions $S$ is defined by the formula

$$S(a) = \left(1 + \exp\{-\beta a\}\right)^{-1}$$

and the output of the neuron is defined by

$$Y = S\left(\sum_{i=1}^{n} w_i x_i - \theta\right)$$

where $\beta$ is a positive steepness parameter, $\theta$ is the bias of the neuron, $x_i$ are the inputs and $w_i$ the weights.
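Translated directly into NumPy, the two formulas might look as follows; the input, weight and bias values are arbitrary examples.

```python
import numpy as np

def sigmoid(a, beta=1.0):
    # S(a) = (1 + exp(-beta * a))^(-1)
    return 1.0 / (1.0 + np.exp(-beta * a))

def neuron_output(x, w, theta, beta=1.0):
    # Y = S(sum_i w_i * x_i - theta)
    return sigmoid(np.dot(w, x) - theta, beta)

x = np.array([0.5, 0.2, 0.8])   # example inputs
w = np.array([0.4, 0.3, 0.9])   # example weights
print(neuron_output(x, w, theta=0.5))
```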

B. Architectures of neural networks

The user can freely adjust the architecture of neural networks, which is possible thanks to tools like TensorFlow that allow users to model neural networks and use them on data sets. Due to the demand for neural networks and deep learning, a vast number of architectures for specific applications have been created, each built with specific features to process a certain type of information. Convolutional Neural Networks (CNNs) are an example of an image-processing neural network architecture consisting of layers. The multilayer perceptron (MLP) architecture is structured in such a way that all input nodes are in one input layer, the hidden nodes are spread over one or several hidden layers, and the output nodes form the output layer. An MLP is determined by the following variables: the number of input nodes, the number of hidden layers and hidden nodes, and the number of output nodes.

How these parameters are structured depends on the problem at hand. There are several approaches, and finding the optimal architecture is difficult; in addition, there is no guarantee that a given method is optimal for all forecasting tasks. There are no exact methods for determining these parameters, and the methods at hand are based on simulations with limited experiments and on trial-and-error guidelines. Therefore, designing an architecture is an art form.

C. Multilayer perceptron

The multilayer perceptron (MLP) is a feedforward neural network architecture with one or several hidden layers, where feedforward means that the data only flows in one direction, from the input layer to the output layer. The MLP consists of three core layer types: the input layer, the hidden layers (four of them in the model built for this thesis) and the output layer. The data is fed into the input layer; after being processed within the individual neurons of the input layer, the output values are forwarded to the neurons in the hidden layers and then to the neurons in the output layer. Multilayer perceptrons are used for prediction, approximation, pattern classification and recognition. As mentioned before, connections between neurons are associated with weights, and changing the weights reinforces the learning of the network.

The weights are calculated during the learning phase. The most widely used learning algorithm is the backpropagation algorithm, which consists of a forward pass and a backward pass. In the forward pass, an input vector is applied to the nodes of the network and produces a set of outputs at the output layer; the weights are all fixed during this phase. In the backward pass, the error term is calculated by finding the difference between the actual response and the network's prediction, and the weights are altered in such a manner that the prediction comes closer to the actual value [19]. The training is an unconstrained nonlinear minimization problem in which the weights are iteratively modified to minimize the overall mean squared error between the predicted value and the actual value. The backpropagation algorithm follows the gradient descent method.

III. METHOD

A. Data

The data was gathered from the Google Finance API, based on the S&P 500 on a minute basis. The data consists of 40 000 rows and one column; the column holds the date, the S&P 500 and all its 500 underlying stocks, separated by commas. The data was imported in Python, and the TensorFlow framework was controlled using Python. This was due to previous knowledge of Python; Python is also TensorFlow's first and best-supported language.

Placeholders: the input X was built as a two-dimensional matrix with shape [None, number_of_stocks] and the output Y as a one-dimensional matrix with shape [None]. The idea behind None is to keep the number of observations flexible, so it can be adjusted later.
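The thesis code is not reproduced in the report; a sketch of how such placeholders are typically declared (TensorFlow's 1.x-style graph API, written here against tf.compat.v1 so it runs on current installations) might look like this:

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()  # use the graph/placeholder style described here

n_stocks = 500  # number of S&P 500 constituents used as inputs

# None keeps the number of observations flexible; it is fixed only at run time.
X = tf.compat.v1.placeholder(tf.float32, shape=[None, n_stocks])  # constituent prices at t
Y = tf.compat.v1.placeholder(tf.float32, shape=[None])            # index value at t + 1
```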

A multilayer perceptron (MLP) [5] with four hidden layers was used for this thesis, based on sequentially halving the size of each layer. The model built in TensorFlow has three core layer types: the input layer, the hidden layers (four of them) and the output layer. MLPs are feedforward neural networks [6].

Parameters used for the model:

• Input layer: the number of stocks in the training data
• First hidden layer: 1024 neurons
• Second hidden layer: 512 neurons
• Third hidden layer: 256 neurons
• Fourth hidden layer: 128 neurons

The first parameter accounts for the number of stocks in the underlying index; it is kept as a variable in order to accommodate indexes of any size. To date there is no exact science on how to build the most optimized neural network architecture for a given prediction problem, so the other parameters were based on the characteristics of the multilayer perceptron architecture. The main idea behind these parameters is to build a neural network that sequentially decreases in size, making the data more compressed. The goal is to map N inputs onto one output; by decreasing the number of neurons, noise in the data is removed while the necessary information is retained. Narrowing the neural network filters out some of the data while holding on to as much of the necessary information as possible. Since predicting stock prices is not a linear problem, several layers were used (see the sketch below). The neurons' interlinked connections are associated with weights, and changing the weights reinforces learning. The reason for using a deep learning method is that previous studies have shown it to outperform other machine learning methods [17].
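A sketch of this halving layer stack in the same 1.x-style API; the layer function and names are assumptions on my part, though ReLU and variance scaling are named later in the report:

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

X = tf.compat.v1.placeholder(tf.float32, shape=[None, 500])  # constituents at time t
init = tf.compat.v1.variance_scaling_initializer()

# Hidden layers halve in size, compressing the information at each step.
h1 = tf.compat.v1.layers.dense(X, 1024, activation=tf.nn.relu, kernel_initializer=init)
h2 = tf.compat.v1.layers.dense(h1, 512, activation=tf.nn.relu, kernel_initializer=init)
h3 = tf.compat.v1.layers.dense(h2, 256, activation=tf.nn.relu, kernel_initializer=init)
h4 = tf.compat.v1.layers.dense(h3, 128, activation=tf.nn.relu, kernel_initializer=init)
prediction = tf.squeeze(tf.compat.v1.layers.dense(h4, 1), axis=1)  # index value at t + 1
```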

B. Training and Testing of data

The data was split into two parts, one used for training and one for testing. The data was divided chronologically [14]: 90% for training and 10% for testing. This means the training data ran from t = 0 to t = 0.9 × 40 000, and the testing data from t = 0.9 × 40 000 to t = 40 000.
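In Python this chronological split is a simple slice; the array below is a random stand-in for the real data set.

```python
import numpy as np

# Stand-in for the 40 000-minute data set (rows ordered chronologically);
# the real data holds the S&P 500 and its 500 constituents per minute.
data = np.random.rand(40_000, 501)

split = int(0.9 * len(data))               # t = 0.9 * 40 000 = 36 000
train, test = data[:split], data[split:]   # no shuffling: the split is chronological
```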

C. Neural Network in market forecasting

Because the stock market carries uncertainty and volatility, forecasting it is challenging. Furthermore, its complexity and nonlinearity make linear models obsolete. When the problem is dynamic and not well defined, or when the data is noisy or incomplete, neural networks can adapt and create an input-output relationship for nonlinear data.

D. Input and Output data

The input data consists of the 500 underlying stocks of the S&P 500 at T = t, and the model predicts the S&P 500 price at T = t + 1.

IV. RESULT & EVALUATION

The deep learning model is built with TensorFlow, using S&P 500 data on a minute basis.

A. Data preparation

The data consisted of roughly 40 000 minutes of S&P 500 index prices stored in a CSV file spanning four months. Each row represents one minute, so the data has 40 000 rows, i.e. 40 000 minutes. The file was checked and cleaned, and all missing values were filled.

B. Training and test preparations

The data was divided chronologically, where 90% was dedicated to training and the other 10% was used to evaluate the accuracy of the forecast.

C. Scaling of Data

Python's sklearn MinMaxScaler was used to scale the input and the output. Scaling is beneficial for the architecture, since the most commonly used activation functions are defined within a certain interval: the sigmoid on [0, 1] and tanh on [-1, 1].

It is important to scale the correct data at the right time: the scaler is first fitted on the training data and then applied to the test data, so that no information from the test set leaks into training.
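A sketch of this order of operations with scikit-learn (the arrays are stand-ins for the chronological split described above):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.random.rand(36_000, 501)  # stand-in for the 90% training slice
test = np.random.rand(4_000, 501)    # stand-in for the 10% test slice

scaler = MinMaxScaler()
train_scaled = scaler.fit_transform(train)  # fit min/max on training data only
test_scaled = scaler.transform(test)        # reuse the training statistics on the test data
```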

D. Placeholders

This model uses two placeholders: one for the input X to the network at T = t (the values of all the underlying stock prices of the S&P 500 at time t), while the network outputs Y, the index value of the S&P 500 at T = t + 1. The model thus predicts the next minute of the index based on the prices of the 500 underlying stocks of the S&P 500 one minute earlier. The input is a two-dimensional matrix and the output is a one-dimensional vector.

Biases and weights are the variables used in this model; they adapt during training and are defined before the model is trained.

The model has four hidden layers: the first layer consists of 1024 neurons (roughly double the size of the input), and the subsequent hidden layers consist of 512, 256 and 128 neurons. This build-up compresses the information at each layer. It is reasonable to assume that other neuron architectures could be designed and implemented for this kind of task; however, that is out of scope for this paper. To avoid dimension mismatches between layers, the second dimension of the preceding layer is the first dimension of the present layer.

E. Network architecture

The variables and placeholders are combined into a network of sequential matrix multiplications. Activation functions are used to transform the hidden layers; the activation function is what makes the model non-linear. The rectified linear unit (ReLU) was used as the activation function in this model. The input layer, the hidden layers and the output layer are the core matrices of this model's architecture, built in such a way that data is only allowed to proceed forward. Other, less restrictive models allow data to flow both ways.

The mean squared error function is used to measure the deviation between the actual data and the model's output. During training the model adapts its variables by calculating the gradient and adjusting the variables in the direction that decreases the error. For optimization, Adaptive Moment Estimation (Adam) was used. Before training, the variables are initialized using TensorFlow's variance scaling initializer.
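A self-contained sketch of the cost and optimization step; a one-layer stand-in replaces the MLP from the earlier sketch, and Adam's default hyperparameters are an assumption, as the report does not state them:

```python
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

X = tf.compat.v1.placeholder(tf.float32, shape=[None, 500])
Y = tf.compat.v1.placeholder(tf.float32, shape=[None])
prediction = tf.squeeze(tf.compat.v1.layers.dense(X, 1), axis=1)  # stand-in for the MLP above

mse = tf.reduce_mean(tf.square(prediction - Y))              # mean squared error cost
train_op = tf.compat.v1.train.AdamOptimizer().minimize(mse)  # Adaptive Moment Estimation
init_op = tf.compat.v1.global_variables_initializer()        # variance-scaled weights set here
```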

F. Training the Neural Network

The training is done in batches: random data samples are fed in. The data is distributed equally over several batches, and the batches train the network by presenting inputs together with their corresponding outputs. The prediction generated by the model is compared with the expected data, and based on that TensorFlow optimizes the network by adjusting the variables. This process is conducted for all batches, and a full pass over all batches is one epoch.
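A sketch of such a batch training loop; the graph is rebuilt with a one-layer stand-in so the snippet runs on its own, and the batch size is not stated in the report, so the value here is illustrative:

```python
import numpy as np
import tensorflow as tf
tf.compat.v1.disable_eager_execution()

X = tf.compat.v1.placeholder(tf.float32, shape=[None, 500])
Y = tf.compat.v1.placeholder(tf.float32, shape=[None])
prediction = tf.squeeze(tf.compat.v1.layers.dense(X, 1), axis=1)  # stand-in for the MLP
train_op = tf.compat.v1.train.AdamOptimizer().minimize(
    tf.reduce_mean(tf.square(prediction - Y)))

X_train = np.random.rand(36_000, 500).astype("float32")  # stand-in for scaled training data
y_train = np.random.rand(36_000).astype("float32")
batch_size = 256  # illustrative; not stated in the report

with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    for epoch in range(10):                              # one epoch = one full pass
        order = np.random.permutation(len(X_train))      # random sampling into batches
        for start in range(0, len(X_train), batch_size):
            idx = order[start:start + batch_size]
            sess.run(train_op, feed_dict={X: X_train[idx], Y: y_train[idx]})
```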

G. Result

After several epochs of training, the model learns the behavior of the data. The data was split into training and testing sets; the model predicted future values, which were compared with the actual data. With little training the model's predictions lag and it struggles to forecast the complex patterns in the data set. However, the model's predictive ability increases gradually as the number of epochs increases.


Fig. 1. Testing the model on the test data: orange is the prediction and blue is the actual index. The model was trained for one epoch. The test is conducted on 10% of the total data, i.e. minutes 0-4000. The X axis is the time horizon in minutes and the Y axis is the scaled value of the index.

MSE 1      1.12
MSE 2      0.13
MSE 3      0.086
MSE 4      0.090
MSE 5      0.011
MSE 6      0.078
MSE 7      0.068
MSE 8      0.072
MSE 9      0.045
Final MSE  0.038

Table 1 shows the progression of the MSE (mean squared error). Note that this is only a small part of the overall MSE progression.

Table 1 showcases how the MSE decreases, though not monotonically: the error sometimes increases before decreasing again. One apparent drawback of gradient descent is that there is no guarantee it will find the global minimum; the error surface can have several local minima, and the gradient can get stuck in one of them.

Fig. 2. After several epochs the model starts to adapt, learning more complex parts of the data (orange is the model's prediction). The X axis is the time horizon in minutes and the Y axis is the scaled value of the index.

At about 10 epochs the prediction follows the actual data quite accurately. The forecast had a mean squared error of 0.038; this number is low due to scaling. The mean absolute percentage error (MAPE) for the forecast was 5.2%. Regarding the baseline: assuming that the value at t + 1 will be the same as the value at t and using that to predict the same test data, the baseline model had a mean absolute percentage error of 6.7%. The deep learning model was trained on several months' worth of data, where each prediction was based on the S&P 500 constituents one minute earlier; the time horizon for this model was n = 1.
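The naive baseline is easy to reproduce: predict that the value at t + 1 equals the value at t and measure the MAPE. The sketch below uses a random-walk stand-in for the test series; the 6.7% figure is the report's measurement, not something this snippet will reproduce.

```python
import numpy as np

def mape(actual, predicted):
    # Mean absolute percentage error, in percent.
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Naive baseline: the value at t + 1 is predicted to equal the value at t.
index = np.cumsum(np.random.randn(4_000)) + 2_500   # stand-in for the test-set index values
baseline_pred = index[:-1]                          # y_hat(t + 1) = y(t)
print(mape(index[1:], baseline_pred))               # the report measured 6.7% for this baseline
```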

H. Evaluation

The model's neural network takes in a sample data batch, which passes through the whole network to the output layer. A comparison between the model's prediction and the actual data is made; based on that, an optimization step is taken and the network's parameters are updated. The weights and biases are updated, and the next data batch is fed into the neural network. This process continues until 10 epochs have been completed. The neural network's prediction is evaluated on the test set, consisting of data set aside and not used for training. The model becomes more accurate as more training data is fed in.

The result is very compelling, and it compares well with other similar studies. For instance, a recent study from January 2018, also using deep learning to forecast a stock index, achieved an MSE of 0.083 [17]. The reason this study achieved a better MSE (0.038) has to do with several factors.

This study uses a multilayer perceptron, which in this case proves to be a very capable neural network architecture for forecasting problems. Secondly, this study uses more neurons, and the architecture is built in such a way that the number of neurons decreases by half for each layer, compressing the information, which proved to be a good architectural design for this problem.

The result is based on previous historical data. However, this is never a guarantee for future forecasts: if a stock went up 50% last year, that doesn't mean it will increase by the same percentage next year. However, most of today's stock trades are conducted by computer algorithms, which could have made it easier for the deep learning model to forecast. Since computers are more pragmatic and calculated than humans, the market is now largely driven by mathematical formulas and algorithms, which makes it easier for the deep learning model to find and forecast patterns; it is much harder to quantify the random nature of human emotional behaviour. Interestingly, the model built for this thesis almost immediately assumes the stock index will go up, as seen in Fig. 3 below.

Fig. 3 showcases the model’s prediction attempt after a few batches.

This result is attributed to the fact that the stock market's value has been increasing since 2010, which may be because stock-trading computer algorithms are designed to hold their positions. As fewer participants are willing to sell, stock prices go up. This could have played to the model's advantage, as it is easier to forecast an upward trend than a highly volatile market.

One initial idea when building this model was to make it scalable. It is not possible to simply scale up the input data and get a significantly better result; what plays a more significant role in building a deep learning model in TensorFlow is the architectural design of the neural network.

Another reason for the model's good forecasting is that the data covered a period of economic and political stability and the market was bullish, making the market less volatile and more predictable. The stock prediction is based on the underlying stocks at T = t - 1 and is inspired by the high-frequency trading algorithms widely used on the New York Stock Exchange; the idea is to find emerging trends minute by minute. The train/test split is quite coarse; rolling through the data and rescaling the predictions should have been used.

However, the reason the data was split into two parts was to make it comprehensible and presentable for the non-technologically inclined, as this project was conducted at a bank and aimed mainly at its staff. Previous courses at KTH also taught splitting data into two parts for training and testing.

The 500 underlying constituents of the S&P 500 at time t are combined: the constituents' prices at time t - 1 are used to forecast the S&P 500 price at time t, so it is not possible to simply add everything up to make a prediction. This model, like every other multilayer model, is based on the assumption that patterns found in past data will remain forecastable in the future.

V. DISCUSSION

There are several ways to achieve better results, e.g. designing the neural network architecture differently, such as using a recurrent neural network. Different kinds of initialization and activation functions influence the forecasting. Dropout layers and early stopping are two additional techniques that could further improve the model. The data sets contain lots of noise, and the model cannot account for extreme circumstances; to make the model more robust, more variables and correlations may be needed. However, wrongly added variables can produce false correlations and misleading output. The model can also be improved by using other stopping criteria, e.g. stopping when the MSE hasn't changed significantly for several epochs. The data is minute-based in order to use a large volume of data over a shorter time horizon, as a larger amount of data makes the model more accurate. On the other hand, the drawback of this approach is that economic trends play out over months and years; what happens on a minute basis is very noisy and contingent on that limited time frame. Therefore, the training data for the model should have a similar time frequency as the intended investment horizon.

There are different ways to use time series cross-validation, which can generate different results; some of the most common are rolling forecasts and time series bootstrap sampling. For future studies it would be interesting to evaluate different approaches to time series cross-validation and compare the results. In this project, only historical stock prices have been used for forecasting with the neural network. To improve the accuracy of the model, macro- and microeconomic factors could be used as input variables. Technical indicators could also be used as input variables to further improve the model.

Neural networks have shown to be state of the art for this type of problem; however, they lack the ability to express how certain a prediction is. For future studies, the use of Bayesian methods would be valuable, as they give the ability to model the neural network with probability distributions instead of constant values. The industry that revolves around deep learning, and its engineers, have a hard time understanding neural networks and their decision making: it is not possible to ask the machine why it took a certain action, and it is very difficult to debug neural networks. Therefore, to gain a deep understanding of neural nets, advanced debugging tools must be developed. The model created for this thesis proved to adapt and recognize more complex parts of the graph as the training set increased; as the training data increased, the lag decreased. A further development of this study would be to examine the noise in stock market data and evaluate its effect on the model's training. The model could be further improved by studying and applying financial correlations. This thesis showcases that it is possible to build a deep learning model in TensorFlow capable of forecasting a stock index. Thanks to TensorFlow, it was possible to build a capable deep learning model within a reasonable timeframe; TensorFlow simplified the process and made it possible to focus more on the architectural framework of the neural network.

Recurrent neural networks (RNNs) were for a while the dominating neural architecture; however, they are being replaced in a wide range of tasks. Feedforward models, which this paper used, offer improvements in both stability and speed; while RNN models are more expressive, this does not increase their performance [22]. Recurrent models depend on the entire history X0 to XT, and it is not proven that this approach gives better long-term pattern recognition [22]. Instead of making predictions based on the entire history, one can predict YT using only the most recent inputs, thus making a strong conditional independence assumption: the feedforward model predicts a target based only on the k most recent inputs. Recurrent models may seem a more flexible and expressive architecture compared with feedforward architectures, but there are several reasons to prefer feedforward models. Training is the core concern of any neural network, and recurrent models are difficult to optimize and more demanding [23]. There has been a greater effort to design feedforward architectures and software that make the training of deep feedforward networks effective. In some instances, the feedforward network is vastly lighter weight and performs inference faster than comparable recurrent models [23]. A recent study furthermore showed that a generic feedforward model outperforms recurrent baselines on a wide range of tasks, from synthetic copying tasks to music generation [24]. "The 'infinite memory' advantage of RNNs is largely absent in practice" [24]: one does not need a vast amount of context to obtain, on average, a good prediction [24]. Even when an experiment explicitly required long-term context, the RNN proved incapable of learning long sequences [24]. This means trained RNN models are in practice feedforward, since backpropagation cannot learn patterns significantly longer than n steps, and models trained using gradient descent cannot maintain long-term memory.

The model uses n = 1 as the time horizon and predicts on a minute basis; the inspiration comes from Wall Street's high-frequency trading, which makes predictions on a millisecond basis. Most of today's S&P 500 trading volume is executed by high-frequency robots. The use of a short time horizon was likewise inspired by these robots; furthermore, as one study showed, one does not need a vast amount of context to obtain a good prediction [26]. Most studies that built ANN models for stock prediction used a large time frame, and many of them performed worse than the baseline. One must also take into consideration that the stock market is open a limited number of hours each day, is closed on weekends, and is additionally closed about 10 times a year due to holidays. The ANN loses context when a major event occurs during the market's closing hours and then suddenly sees a large stock movement the next time the market opens; this must have an impact on the model's prediction accuracy as it tries to adapt to such sudden, incoherent movements. This thesis tried to circumvent this probable effect on the model's training by relying only on the most recent input. The short time horizon also makes it possible to train and test the built model rapidly; furthermore, machines are significantly better at short-term prediction [26]. As for future improvements, one could test and compare different time horizons and conduct the training on a GPU rather than a CPU. When it comes to the optimal number of observations, there is no established method for neural networks. Using a multilayer perceptron (MLP) for time series forecasting is essentially a non-linear autoregressive model, so one could theoretically use the ACF (autocorrelation function) and PACF (partial autocorrelation function). However, it has not been proven that these two functions generate a better result for MLP models, and this is therefore a good basis for future studies.

VI. SECOND PART

INDUSTRIAL MANAGEMENT

INTRODUCTION

This part will focus on the industrial management aspect of the question: how can this model be applied at the bank? First and foremost, this project was conducted at a top-tier Swedish bank that focuses on investment, advisory and portfolio management. As the project was conducted inside that organization, great insights were collected, and there are many interesting observations to discuss. While the bank is trying to implement machine learning into its daily practice, there is a clear division in the company between those who believe in machine learning and those who do not. There is a general tendency to mistrust an algorithm's output even when the results show clear evidence of better judgment [3]; this can also be attributed to lack of knowledge. The root of this division and divided vision in the company is that employees have different educational backgrounds: basically, those with and those without a technical education. Recent layoffs following the bank's further investments in automation have created a resistance to change. The fear of losing employment is one of the driving forces behind resentment of technological change. However, the focus of this report is on how the model can be implemented to become part of the already established bank. There are two business implications:

1. Fully integrate machine learning into the business and rely solely on it for business investment.

2. A combination of analysts and machine learning to enhance accuracy and aid further decision making [4].

Hedge funds that rely heavily on machine learning have doubled their returns compared with their traditional counterparts. AI-based hedge funds have outperformed both average quant funds and average hedge funds since 2010 [6].

Fig. 4. AI/machine learning hedge fund index vs. quant and traditional hedge funds [6]. The blue curve shows the return of AI/machine learning hedge funds, and the other colors show the returns of traditional hedge funds, i.e. those that don't use AI.

VII. THEORY

A. Task Analysis

To get a deeper understanding of how machine learning can be implemented, a task analysis is vital. By observing in detail how employees perform their tasks and how the project goal is achieved, great insights are gained. The knowledge gained can then be used to evaluate the necessity and usefulness of a new tool [5].

The analysis is performed as follows:

1. Task identification

2. Collection of data on how the task is performed, for example by observation.

3. Breaking the task down into subtasks, where each subtask is specified in terms of its objective.

4. Visualization of the subtasks

5. Presentation of the analysis to a person who has not been part of the analysis but who knows the tasks well.

B. Investment Evaluation

Depending on the bank's current situation in terms of computer hardware, software and employees, implementing machine learning requires a certain amount of money. To be able to run machine learning based systems, data scientists, software engineers and expert knowledge of deep learning are required. Thus, it is essential to evaluate whether the use of machine learning will be cost effective. Net present value and internal rate of return can be used [15].

Net present value (NPV):

$$NPV = \sum_{t=1}^{T} \frac{C_t}{(1+r)^t} - C_0$$

where $C_t$ is the net cash inflow in period $t$, $C_0$ the initial investment, $r$ the discount rate and $T$ the number of time periods.

Internal rate of return (IRR):

$$0 = P_0 + \frac{P_n}{(1 + IRR)^n}$$

where $P_n$ is the cash flow for period $n$ and $IRR$ is the internal rate of return.

The payback method can also be used to evaluate the investment, based on how long it will take to earn back the initial investment cost:

$$\text{Payback period} = \frac{\text{Cost of investment}}{\text{Annual net cash flow}}$$

There are no fixed rules for how long a payback period should be; it is up to the business to decide what a reasonable time horizon is.
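As a worked illustration of these formulas (not from the thesis; the numbers are arbitrary), the two evaluation criteria translate directly into Python:

```python
def npv(rate, cash_flows, initial_investment):
    # NPV = sum over t = 1..T of C_t / (1 + r)^t, minus C_0
    return sum(c / (1 + rate) ** t
               for t, c in enumerate(cash_flows, start=1)) - initial_investment

def payback_period(cost_of_investment, annual_net_cash_flow):
    # Payback period = cost of investment / annual net cash flow
    return cost_of_investment / annual_net_cash_flow

# Illustrative values only:
print(npv(0.05, [400_000, 400_000, 400_000], 1_000_000))  # discounted three-year inflow
print(payback_period(1_000_000, 400_000))                 # 2.5 years
```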

C. Comparison between methods

The payback method is simpler than net present value, since there is no need to estimate the discount rate. However, it does not account for inflation, so the payback method should not be used for long-term investments.

VIII. METHOD

A. Data gathering

This project started early, in close collaboration with the bank and in close discussion with many of the bank's analysts and software engineers. All data was collected from managers and employees. The data required for the payback method was retrieved from the managing software engineer, and the payback period was discussed accordingly.

B. Task analysis

Working at the bank for months gave great insights. To fully understand the tasks conducted every day at the bank, a close study of the analysts' work was carried out; the author of this paper has also worked as an analyst. The overall process of producing a forecast was documented and discussed with a senior analyst for feedback, the purpose being an accurate depiction of the work conducted.

C. Method for investment decision

The payback method was reformulated to calculate the required annual net cash flow; that is, the method was used to solve for the annual net cash flow rather than the payback period. Based on the data gathered from the bank, it was possible to calculate this cash flow. Net present value was not used, due to the need to estimate future cash flows.

IX. RESULT & EVALUATION

A. Legacy systems

One of the major drawbacks of the bank's current system is aging hardware and software. To be able to use TensorFlow and integrate it into the system, the minimum requirement is a 64-bit operating system and CPUs capable of using Advanced Vector Extensions. These requirements were lacking in many of the bank's hardware systems, making it impossible to run the model there. However, on systems that met the minimum requirements, the model ran flawlessly.

B. Task Analysis

Forecasting is usually conducted by several associates and analysts. In a general sense, the forecasting is based on future company results, such as future cash flows and future profit and loss statements; the analysts rely heavily on future predictions and assumptions. The first step is a free cash flow analysis, in which the analyst estimates the free cash flow seven years into the future. Depending on the analyst, these estimations vary a lot, with some analysts more optimistic than others. Moreover, the free cash flow analysis is mostly based on data gathered from the company being evaluated, which are themselves future assumptions; publicly traded companies want to keep their stock prices up, so these numbers tend to be exaggerated. The next step is to compare the company being evaluated with other similar companies in the industry. This is prone to human error: an analyst might, for example, fail to take into consideration a poorly performing company in that sector, a mistake that leads to overvaluation. Furthermore, which companies are chosen for the comparable company analysis plays a major role in the valuation, which makes it more difficult to evaluate companies in niche markets. If, for instance, the growth rate is changed by ±0.5 percentage points, the evaluated stock price shifts by about ±10% in value. These evaluations usually take 100 hours to complete.

As this bank is multinational, this tends to lead to contradictory evaluations; for example, one analyst in Copenhagen gives a forecast that is totally different from the forecast done in Stockholm. This makes it difficult for the bank to give a final evaluation to its clients.

C. Investment decision

According to information from the current manager, there are two alternatives: invest in machine learning as an addition, or keep using the current analytical system.

The required payback period is estimated to be three years, as computing power according to Moore's law doubles every 18 months. Since the bank does not intend to rely completely on machine learning, it does not need to upgrade its hardware every 18 months. Furthermore, this is a replacement investment, as new technology replaces old. The two alternatives are mutually exclusive: choosing one excludes the other.

The costs for the base investment are the following:

• Further development and maintenance of the software

• Servers

• Hardware

The following assumptions have been made: the model created for this thesis is still at an initial stage, and more layers might need to be added to fully accommodate the bank's needs. The model also needs to be maintained to stay up to date with TensorFlow's updates; for further development and maintenance, one year is used as the baseline approximation. Programmers at the bank earn about 550 000 SEK per year. The bank needs to expand its servers to accommodate larger amounts of data, which is approximated to cost 40 000 SEK. The bank must also upgrade its 40 outdated computers, with an approximate cost per PC of 14 000 SEK.

The calculation is as follows:

Programmer: 550 000 SEK
Server: 40 000 SEK
Hardware: 560 000 SEK
Base investment: 1 150 000 SEK

Investment calculation:

$$\text{annual net cash flow} = \frac{1\,150\,000\ \text{SEK}}{3\ \text{years}} \approx 380\,000\ \text{SEK/year}$$

This result showcases how much the investment must generate to break even. As the base investment is known and the payback period is set to three years, reformulating the payback period formula yields the required annual surplus.
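In code, the reformulation is a single division; a minimal check of the report's figures (the exact quotient is 383 333 SEK, which the report rounds to roughly 380 000):

```python
base_investment = 550_000 + 40_000 + 560_000   # programmer + server + hardware (SEK)
payback_years = 3                              # required payback period

# Reformulated payback formula: annual net cash flow needed to break even.
required_annual_cash_flow = base_investment / payback_years
print(round(required_annual_cash_flow))        # 383333, rounded in the report to ~380 000
```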

D. Implementation of the model

Since the bank has legacy systems with outdated hardware and software, it will not be possible to rely fully on a machine learning based system. Furthermore, part of the bank's business is to act as an advisor for its clients, and the model created is not sophisticated or encompassing enough to be fully relied upon.

This leads to the second option, which relies on a mixture of analysts and a machine learning system. As stated previously, the investment must yield a surplus of 380 000 SEK per year to break even. The whole idea of implementing deep learning tools is to reduce the number of analysts. Today the bank relies on 15 analysts who specialize in analyzing and forecasting major stock indexes and futures. The average analyst costs the bank 62 000 SEK per month, about 744 000 SEK per year, so by reducing staff by a single analyst the investment yields a surplus of roughly 364 000 SEK per year. Moreover, the model can be trained on any type of stock-related data; it all boils down to what data is used. The model can forecast more than a stock index: it can be used for company stocks, futures and warrants. If the model is applied in several different areas, the bank will be able to further reduce staff dependency and thus further increase the yearly surplus of the investment.

E. Algorithm mistrust

The biggest challenge for the bank in implementing this model is not showcasing its capabilities or its error margins; it is the mistrust of the algorithm. As stated before, there is a division inside the company, with opposite views regarding the use of machine learning. This creates a great challenge for an enterprise trying to adapt its business to integrate machine learning, as strong opposing forces inside the company will prevent it. This opposition is rooted in lack of knowledge and the fear of losing one's position in the company. To navigate around this, the bank must be willing to educate those who lack technical knowledge. The bank is rooted in old company culture and traditional work routines, which makes it vulnerable to future fintech companies.


F. Time reduction

One of the main benefits of implementing this model is time reduction: for a typical analyst a forecast can take 100 hours to complete, whereas the model, depending on the data set, needs only a few hours. The current process requires many hours of going through financial statements, documents and company reviews, and along the way analysts must make assumptions, which further increases the errors. The model built for this thesis depends on historical data to make a forecast, which is not flawless; it is worth pointing out that the model had a deviation of 5.2%. A small calculation error by an analyst, within ±0.5 percentage points, gives an error margin of at least 10% in the resulting forecast, and this after spending 100 hours, while the deep learning model needed only a few hours for the same goal.

G. Limitation

The model is based on historical data and cannot account for extreme circumstances such as war, earthquakes or political turmoil. Nor can it account for rare random events such as flash crashes or external political threats. Thus, once implemented, the model should not be relied upon during the above-mentioned events, since the results will be misleading.

H. Other Industries

This model was built for finance; however, it is fully applicable to other industries where forecasting is of the essence. Retailers, wholesalers and logistics companies could use an adjusted version of this model for demand forecasting and fully integrate it into their daily business practice.

X. CONCLUSION

It was difficult to sell the idea of using machine learning to the bank's analysts, since it was difficult to explain the model's course of action and its reasoning. Today it is not possible to scrutinize a neural network to gain knowledge of how it works.

Thousands of simulated neurons arranged in layers make up its embedded reasoning. The idea of the layers is to give the neural network the capability of understanding patterns at different levels of abstraction.

From the research conducted, it was observed that the bank spends over 100 hours forecasting an index, which is inefficient and expensive, as hiring analysts is costly. By investing in the model and reducing staff by one analyst, the investment will yield a surplus return.

REFERENCES

[1] European Joint Committee, "Discussion Paper on the Use of Big Data by Financial Institutions," JC 2016 86.
[2] T. Hope, Learning TensorFlow: A Guide to Building Deep Learning Systems.
[3] P. Michelman, "When People Don't Trust Algorithms," MIT Sloan Management Review, 2017.
[4] I. H. Witten, Data Mining: Practical Machine Learning Tools and Techniques, 4th ed., Cambridge, MA, USA, 2017.
[5] D. E. Rumelhart, "Learning Internal Representations by Error Propagation," in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, MIT Press, 1986.
[6] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, 2015.
[7] W. Huang, "Forecasting stock market movement direction with support vector machine," vol. 32, Elsevier, 2005.
[8] I. Goodfellow, Deep Learning, MIT Press, 2016.
[9] C. M. Bishop, Neural Networks for Pattern Recognition, Oxford, UK: Oxford University Press, 1995.
[10] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, p. 50, 2015.
[11] R. Salakhutdinov, "Learning with Hierarchical-Deep Models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, 2013.
[12] S. Dreyfus, "The computational solution of optimal control problems with time lag," IEEE Transactions on Automatic Control, vol. 18, no. 4, Aug. 1973.
[13] A. Griewank, Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation, 2nd ed., SIAM, 2008.
[14] B. D. Ripley, Pattern Recognition and Neural Networks, Cambridge, UK: Cambridge University Press, 2007.
[15] A. Damodaran, Investment Valuation, 3rd ed., ISBN 978-1-118-01152-2, Apr. 2012.
[16] O. Berat, "An Artificial Neural Network-based Stock Trading System Using Technical Analysis and Big Data Framework," Kennesaw State University, GA, USA, 13-15 Apr. 2017.
[17] M. Abe, "Deep Learning for Forecasting Stock Returns in the Cross-Section," 2018.
[18] T. Mitchell, "Artificial Neural Networks: Based on Machine Learning," McGraw-Hill, 1994.
[19] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Prentice Hall, pp. 161-173, 1998.
[20] Y. Liu, "Cross-section expected returns," 2016.
[21] R. D. McLean, "Does academic research destroy stock return predictability?," 2015.
[22] J. Miller and M. Hardt, "When Recurrent Models Don't Need to Be Recurrent," 2018.
[23] R. Pascanu, T. Mikolov and Y. Bengio, "On the difficulty of training recurrent neural networks," Université de Montréal, 2012.
[24] S. Bai, J. Z. Kolter and V. Koltun, "An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling," 2018.
[25] V. Sharan, "Prediction with a Short Memory," 2016.
[26] S. Makridakis, "Statistical and Machine Learning forecasting methods," Universidad Veracruzana, Mexico, 2018.

Author presentation: The author of this thesis is studying Industrial Engineering and Management with a specialization in Computer Science at KTH Royal Institute of Technology. The author has worked in the financial sector since 2013 and has held numerous positions in one of Europe's largest banks, as a financial analyst and business analyst, where most of the work has revolved around business valuations and stock forecasting. Furthermore, the author has worked with corporate bankruptcy cases and as a financial consultant for different financial institutions.
