Applying investor sentiment to a prediction model of the stock market


Degree project in technology, first cycle, 15 credits

Stockholm, Sweden 2017


AUGUST BERGMAN

SONJA ERICSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION


Degree Project in Computer Science

Date: June 5, 2017

Supervisor: Jeanette Hällgren Kotaleski

Examiner: Örjan Ekeberg

Swedish title: Applicering av tonalitetsanalys för att förutspå rörelser på aktiemarknaden



Abstract

Using data-driven methods to predict the movements of the stock market is a growing field of research. Recently, large amounts of data sourced from online news and social media have been utilized to predict movements in financial markets. With the emergence of social media platforms aimed at investors, such data can be gathered and used to quantify the sentiment of the market. This study investigates whether investor sentiment can be used to improve the precision of a prediction model of the stock market, specifically whether a model that predicts the direction of the intraday price change of certain equities can be enhanced by the addition of investor sentiment. Sentiment data derived from the classification of large amounts of messages from a social media platform aimed at investors and traders were collected. A model was trained using technical data and subsequently retrained on technical data combined with sentiment data, and the performance of the two was compared. The results show that the predictive performance of the model is enhanced slightly by using sentiment data, which indicates that there are potential benefits in using sentiment data to predict the direction of intraday price changes. However, as neither of the models shows significant classification performance, the results of this study should not be viewed as conclusive.


Sammanfattning (Summary)

Using data to predict upcoming price changes in the stock market is receiving increasing attention in research. Recently, news and social media activity have been used to predict movements in financial markets. With the emergence of social media platforms aimed at investors, it has become possible to collect large amounts of data and use it to quantify the collective perception of the market. This study investigates whether the precision of a prediction model of the stock market can be improved by applying sentiment analysis to investor platforms in order to find the collective assessment of a financial asset, more specifically whether the precision of a model that predicts the daily price changes of specific stocks can be improved. This was done by collecting data produced by classifying a large number of messages from a social media platform for investors. The results of the study suggest that sentiment analysis improved the classification precision of the model, which indicates that investors' perceptions of the market can be used to predict price changes for a stock. The precision of the models is, however, not significant enough for the results of the study to be considered conclusive.


Chapter 1 Introduction

Modeling the behavior of the stock market and predicting the future value of financial assets can potentially result in massive investment returns. While there are many different methodologies used for the prediction of the stock market, with the advent of large quantities of data sourced from the Internet, effective machine learning algorithms have made the prediction of the stock market using data-driven methods an important field of research.

The efficient-market hypothesis (EMH) states that financial markets are informationally efficient and that, as such, all available information related to a financial asset is reflected in its valuation, making the prediction of future stock prices inherently impossible [6]. The EMH is, however, not without its critics. Behavioral finance theory asserts that since the valuation of financial assets is driven by human agents, the psychological behaviors of these agents have an influence on future market prices. It is suggested that stock market trading decisions are influenced by emotions and that the behaviors of investors are thus shaped by periods of optimism or pessimism regarding the future price of stocks [2]. Being able to model these behaviors would then lead to at least a certain degree of predictability of the stock market.

Digital trends and large quantities of data can potentially be utilized to predict movements in financial markets, as some research indicates that such trends can be used as early indicators of new information that is not yet reflected in the pricing of the financial asset concerned [12] [2] [25] [5] [4] [26] [10].

Several studies indicate that there are potential benefits of using the sentiment of the market to predict changes in the stock market [3] [22] [21] [17] [20] [29] [1] [16]. Sentiment analysis uses natural language processing to determine public opinion, and it has been applied to stock market prediction by extracting the collective opinion regarding certain financial assets. Bollen, Mao, and Zeng [3] implement a prediction model of the Dow Jones Industrial Average (DJIA) combined with indicators of the public mood to predict daily changes in the DJIA closing prices with an accuracy of 87% and with a 6% reduction of the mean average percentage error. Zhang, Fuehres, and Gloor [29] measure collective emotions by analyzing Twitter posts and successfully correlate them to stock market indicators the following day. Ruiz et al. [20] study the correlation between micro-blog activity and stock market events, specifically messages relating to certain companies.

Quantifying and learning from trends that model the market sentiment has been made possible by access to large amounts of data, through the emergence of social media platforms aimed at the trading community. This makes it possible to investigate not only whether investor sentiment can be used as an early indicator of new information, but also to what extent collective information gathering can be used to predict market trends, and whether social media activity precedes decision making.

This study aims to gather sentiment data regarding particular stocks in order to analyze whether investor sentiment can be used to enhance stock market price prediction models.

1.1 Research Question

This report investigates whether investor sentiment can be used to enhance data-driven stock market prediction, and thus seeks to answer the following question: Can investor sentiment data be used to improve the precision of a prediction model of the stock market?

1.2 Scope

The scope of the study is limited by a number of factors, including time and knowledge in the field of machine learning. Hence, a high-level framework and data preprocessing tools are used to construct the prediction model. The results are derived and evaluated using basic statistical methods.


The models will be trained using technical and sentiment data for Apple and Facebook. Historical stock data will be gathered using Google Finance. Investor sentiment data will be retrieved from PsychSignal, a provider of trader sentiment data and analysis, which mines social network data from StockTwits, a social media platform for investors and traders. The availability of social media data is a limitation of this study, as these platforms have only been available for a few years and much of the data mining done in this field uses proprietary techniques that are not publicly available. Since the study uses pre-processed data, it is also limited by a lack of insight into the methods used for deriving the investor sentiment metrics.

The model used is selected from models evaluated by earlier studies in the field and will be implemented with a high-level framework. The model will only be used to predict the direction of intraday price changes. The study of sound investing strategies and the determination of causality are beyond the scope of this study.

1.3 Outline

The rest of the report is structured as follows:

Section two outlines the technical background and earlier research to review relevant theoretical material and define concepts essential for the rest of the report. Section three presents the method used to perform the study. Section four shows the results acquired by the study. Section five outlines an evaluation of the results and an analysis of the model’s performance. Finally, the paper concludes with a summary in section six.


Chapter 2 Background

This section describes the stock market, the different technological prediction models and machine learning algorithms used for stock prediction, and the concept of sentiment analysis.

2.1 The stock market

The stock market is the collective term for the stock markets and exchanges of the world where stocks are issued, bought and sold. A stock represents a share of ownership in a company, and issuing stock is an effective way for corporations to raise money for expansion or other purposes. By buying shares that a corporation has issued, investors become entitled to a share of its assets and earnings.

Stock trading is one of the most common ways for investors to invest money with high liquidity, and many companies exist with the sole purpose of investing money in the stock market. There are different opinions about what affects the price of a stock, but ultimately the price is determined by the supply of and demand for it. According to the Stockholm Stock Exchange, the supply of and demand for a stock are mainly affected by expectations, company earnings and dividends, interest rates and the well-being of the economy, taxes and subsidies, trends in society, changes in the company, rumors, and speculations. The price of a stock is determined by earnings in the long run, but in the short run, there are a number of factors influencing it [14].

The predictability of stock market prices has long been debated and still is today. The EMH constitutes an important part of modern financial theory and maintains that markets are informationally efficient, meaning that the pricing of assets reflects all available information and that, as such, it is impossible to outperform the market. The efficiency of markets is a consequence of competition; assets will always trade at their fair value because information is available to anyone, and any changes will be incorporated in prices without delay. Associated with the EMH is the random-walk hypothesis, which states that the price series of stocks evolves randomly. The intuition behind it is that if information is incorporated in prices without delay, then prices are only affected by news, which is unpredictable by definition [12].

Stock market prediction can be divided into three categories: technical, fundamental and technological analysis. Technical analysis looks at technical data such as past prices, while fundamental analysis uses information about a company’s business. Technological analysis is the use of machine learning and data mining techniques to predict the stock market.

The constraints of the EMH have been shown to be potentially too restrictive, and some prediction of the stock market may be possible [15]. One argument against the EMH is that it has been empirically disproven by the multitude of investors who have consistently succeeded in outperforming the market. Another is that people do not perceive information in the same way, and further that there are actors in the market who are driven by predictable behavioral and psychological elements [12] [2] [15].

2.2 Technological prediction models and support vector machines

For a machine learning model to be accurately applied to the stock market prediction problem, it must be able to handle the complex logic it represents, and particularly be able to handle noisy data and avoid overfitting. The learning algorithm used has a great impact on a model’s ability to predict future changes.

A variety of machine learning algorithms has been applied to the stock market prediction problem. One commonly used model is that of supervised learning, which is used to classify input data by training the learning algorithm using labeled data. Supervised learning can be implemented using a variety of different algorithms.

Support vector machine (SVM) is one specific model of supervised learning commonly used for both classification and regression analysis, which is resilient to overfitting. Using the kernel trick, SVMs can be used to learn classification rules even if the input data is not linearly separable. The selection of kernel method is usually done through cross-validation [11]. SVMs have been successfully applied to the forecasting of financial time series in several earlier studies [9] [28] [24].
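As a rough illustration of the kernel trick mentioned above, the following sketch compares a linear and an RBF kernel on synthetic data that is not linearly separable. The data set (`make_circles`), the sample sizes and the default hyper-parameters are assumptions chosen for the example; none of them come from this study.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data (an assumption for illustration): one ring inside
# another, so no straight line can separate the classes.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Same learner, two kernels: only the RBF kernel can bend the decision boundary.
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf").fit(X_train, y_train)

linear_accuracy = linear_svm.score(X_test, y_test)
rbf_accuracy = rbf_svm.score(X_test, y_test)
print(f"linear kernel accuracy: {linear_accuracy:.2f}")
print(f"RBF kernel accuracy:    {rbf_accuracy:.2f}")
```

On data of this shape the RBF kernel is expected to score clearly higher than the linear kernel, which is the property that makes it attractive for noisy, non-linear inputs.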

2.3 Sentiment analysis

Sentiment analysis has become an attractive field of study in computer science [7]. It can be defined as the act of using natural language processing to extract the subjective attitude about a particular topic. The emergence of big data and social media platforms has made it possible to use sentiment analysis to extract the collective opinion about stocks [3].

The internet and the availability of big data have changed the way investors retrieve and react to information. It is natural for a human to resort to ‘the wisdom of the crowd’ when facing large quantities of information and potentially great uncertainty [13]. Some research also indicates that while the stock market may be inherently unpredictable, since it is driven by new rather than old information, social media activity can be used as an early indicator of upcoming events. Extracted social media activity could thus serve as an early predictor of changes in the price of financial assets, i.e. an early indicator of news that the market may not have reacted to yet. In fact, it may be that not only news but also social media is used by actors in the stock market when deciding whether to invest in a stock, as human decision making is preceded by a phase of information gathering [3].

StockTwits is a social media platform aimed at the investing community, where investors can follow communications regarding a specific stock using its corresponding ticker symbol through the use of “cashtags” [27]. The classification of communication according to its relevant stock makes the platform suitable for extracting data that reflects the collective opinion regarding a certain equity. PsychSignal uses a proprietary algorithm to extract quantitative market data. Their technology can be used to identify the stock-specific market sentiment by classifying StockTwits data [18].


Chapter 3 Method

The study started with a literature study in order to investigate the state of the art in technological stock prediction and social media sentiment data extraction, so that a model suitable for the problem domain could be trained. A phase of data gathering then commenced, in which historical stock data and pre-processed sentiment data were selected and retrieved. A support vector machine (SVM) classifier using a radial basis function (RBF) kernel was then selected through cross-validation and subsequently trained using scikit-learn, a data mining and machine learning library for the Python programming language. The RBF kernel has many advantages, such as the ability to map non-linear training data and the ease of implementation [24].

3.1 Data

Historical stock data containing intraday trading metrics were extracted using Google Finance. Each day of trading represents one data sample, labeled according to the direction of its intraday price change. This data is presented in table 3.1.

PsychSignal sentiment data was extracted using Quantopian. This data corresponds to messages on the StockTwits platform that mention a company's cashtag, which makes it possible to classify sentiment regarding specific stocks. These messages are classified as either bullish or bearish, representing a positive or negative view of the stock and thus contributing to an upward or downward movement in sentiment trends, respectively. This data is presented in table 3.2.


Table 3.1: Technical data

Metric   Description
Open     Price of first trade of the day
High     Highest trading price of the day
Low      Lowest trading price of the day
Close    Price of the last trade of the day
Volume   Number of trades of the day

These metrics were then processed for suitable feature extraction. The formulas used for this processing are presented in table 3.3.

The features derived from applying the preprocessing formulas to the input metrics are presented in table 3.4. All feature values were scaled to zero mean and unit variance prior to training, a necessary pre-processing step for SVMs [23].
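The scaling step can be sketched as follows with scikit-learn's `StandardScaler`. The feature values are hypothetical, and fitting the scaler on the training rows only is a design choice made for this example; the text does not state how the scaling statistics were computed.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix (illustration only): close price and log-scaled volume.
X_train = np.array([
    [135.34, 16.93],
    [135.72, 16.92],
    [136.70, 17.01],
    [137.11, 16.85],
])
X_test = np.array([[136.53, 16.85]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training rows
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

print(X_train_scaled.mean(axis=0))  # approximately 0 for every column
print(X_train_scaled.std(axis=0))   # approximately 1 for every column
```

Scaling matters for an RBF kernel because the kernel depends on distances between samples, so a feature with a large numeric range such as volume would otherwise dominate the others.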

The particular stocks that were used as input data were chosen based on the availability of data. The data collected was restricted to the time period within which sufficient amounts of sentiment data were available. This resulted in the gathering of Apple (AAPL) stock and sentiment data during the period 2012-01-01 to 2017-03-02 resulting in 1293 data points, and 2014-01-01 to 2017-03-02 for Facebook (FB), resulting in 792 data points. Since there is no trading done during weekends or holidays, data points from these dates were removed.

3.2 Evaluating performance

K-fold cross-validation was used to evaluate the model hyper-parameters, as the amount of data was restricted to the time period during which the StockTwits platform was popularly used. This validation technique is used when there is not enough available data to set aside a separate partition for validation [8]. The model was then evaluated on its test performance, measured as the classification accuracy of its predictions of the direction of price change.
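A minimal sketch of hyper-parameter selection with k-fold cross-validation in scikit-learn is given below. The synthetic data, the number of folds and the grid of `C` and `gamma` values are assumptions for illustration; the values actually searched in this study are not reported.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the feature matrix and the up/down labels (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Scaling lives inside the pipeline so each fold is scaled with its own training part.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Hypothetical search grid; make_pipeline names the SVC step "svc".
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}

search = GridSearchCV(pipeline, param_grid, cv=KFold(n_splits=5), scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(f"mean cross-validated accuracy: {search.best_score_:.3f}")
```

Each candidate pair of hyper-parameters is scored by its mean accuracy over the five held-out folds, and the best pair is then refitted on all of the supplied data.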


Table 3.2: Sentiment data

Metric                   Description
Bull scored messages     Total count of bullish sentiment messages scored by PsychSignal’s algorithm.
Bear scored messages     Total count of bearish sentiment messages scored by PsychSignal’s algorithm.
Bullish intensity        Score for each message’s language for the strength of the bullishness present in the messages on a 0-4 scale. 0 indicates no bullish sentiment measured, 4 indicates strongest bullish sentiment measured.
Bearish intensity        Score for each message’s language for the strength of the bearishness present in the messages on a 0-4 scale. 0 indicates no bearish sentiment measured, 4 indicates strongest bearish sentiment measured.
Total scanned messages   Number of messages coming through PsychSignal’s feeds and attributable to a symbol regardless of whether the PsychSignal sentiment engine can score them for bullish or bearish intensity.

The descriptions are taken as is from the PsychSignal Example Notebook [19].

Table 3.3: Preprocessing formulas

Name                         Definition
Price delta                  $\text{Close price}_t - \text{Close price}_{t-1}$
Price movement               If Price delta > 0 then 1 else 0
Positivity measure           $\dfrac{\text{Bull scored messages}}{\text{Bear scored messages} + \text{Bull scored messages}}$
Activity measure             $\text{Total scanned messages}_t - \text{SMA}$
Simple moving average, SMA   $\dfrac{1}{5} \sum_{i=1}^{5} \text{Total scanned messages}_{t-i}$
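The preprocessing formulas of table 3.3 can be sketched in plain Python as follows. The function names are hypothetical, the closing prices are the Apple values from table 4.1 in chronological order, and the message counts are invented for the example.

```python
def price_delta(closes):
    """Close_t - Close_{t-1}; the first day has no predecessor and is skipped."""
    return [today - yesterday for yesterday, today in zip(closes, closes[1:])]


def price_movement(deltas):
    """Label 1 when the price delta is positive, otherwise 0."""
    return [1 if delta > 0 else 0 for delta in deltas]


def positivity(bull_scored, bear_scored):
    """Share of bullish messages among all scored messages."""
    total = bull_scored + bear_scored
    return bull_scored / total if total else None  # undefined when nothing was scored


def activity(total_scanned, window=5):
    """Messages on day t minus the simple moving average of the previous `window` days."""
    values = []
    for t in range(window, len(total_scanned)):
        sma = sum(total_scanned[t - window:t]) / window
        values.append(total_scanned[t] - sma)
    return values


closes = [135.34, 135.72, 136.70, 137.11, 136.53, 136.66]  # table 4.1, oldest first
deltas = price_delta(closes)
labels = price_movement(deltas)

scanned = [1000, 1200, 900, 1100, 800, 1500]  # hypothetical daily message counts
print([f"{delta:.2f}" for delta in deltas])
print(labels)
print(positivity(62, 38), activity(scanned))
```

The activity measure is negative on days with fewer messages than the recent average, which matches the negative activity values that appear in table 4.2.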


Table 3.4: Features

Feature             Description
Open                Price of first trade of the day
High                Highest trading price of the day
Low                 Lowest trading price of the day
Close               Price of the last trade of the day
Log-scaled volume   Logarithmized volume
Delta               Change in closing price from day before
Positivity          Ratio of bullish tweets from all classified
Activity            Activity measure
Bullish intensity   Aggregated bullish intensity in messages
Bearish intensity   Aggregated bearish intensity in messages


Chapter 4 Results

The results from the trained models for both evaluated stocks are presented here. The first model is trained using technical data only, the other with technical and sentiment data combined for the same time period. Excerpts from the technical and sentiment features for a few of the data points are shown in tables 4.1 and 4.2, respectively.

Figures 4.1 and 4.2 show how the historical price for the respective stocks has changed during the considered time period.

Table 4.1: Technical data points for the Apple Stock

Date         Open     High     Low      Close    Delta   Volume
2017-03-02   140.0    140.28   138.76   138.96   -0.83   26210984
2017-03-01   137.89   140.15   137.6    139.79   2.80    36414585
2017-02-28   137.08   137.44   136.7    136.99   0.06    23482860
2017-02-27   137.14   137.44   136.28   136.93   0.27    20257426
2017-02-24   135.91   136.66   135.28   136.66   0.13    21776585
2017-02-23   137.38   137.48   136.3    136.53   -0.58   20788186
2017-02-22   136.43   137.12   136.11   137.11   0.41    20836932
2017-02-21   136.23   136.75   135.98   136.7    0.98    24507156
2017-02-17   135.1    135.83   135.1    135.72   0.38    22198197
2017-02-16   135.67   135.9    134.84   135.34   -0.17   22584555


Table 4.2: Sentiment data points for the Apple Stock

Date         Positivity   Activity   Bullish Intensity   Bearish Intensity
2017-03-02   0.62         45.80      1.82                1.87
2017-03-01   0.68         637.20     1.75                1.73
2017-02-28   0.63         -247.60    1.80                1.79
2017-02-27   0.69         120.20     1.90                1.85
2017-02-24   0.62         55.80      1.73                1.66
2017-02-23   0.61         -146.80    1.95                1.64
2017-02-22   0.64         -228.00    1.87                1.82
2017-02-21   0.69         -270.40    1.83                1.75
2017-02-17   0.62         -438.20    1.71                1.78
2017-02-16   0.68         -295.20    1.85                1.49

Figure 4.1: Close price, Apple

Figure 4.2: Close price, Facebook


Tables 4.3 and 4.4 show summary statistics for the sentiment data used. Note that the standard deviation of the number of messages posted each day is quite high and that the number of messages is very low for certain days.

Table 4.3: Sentiment data summary, Apple

Metric                   mean      std      min     max
Total scanned messages   1072.41   966.04   26.00   8387.00
Bull scored messages     257.51    230.27   3.00    1995.00
Bear scored messages     157.33    155.76   0.00    1380.00
Bullish intensity        1.71      0.11     1.16    2.18
Bearish intensity        1.76      0.18     0.00    2.80

Table 4.4: Sentiment data summary, Facebook

Metric                   mean     std      min    max
Total scanned messages   368.12   438.75   8.00   4953.00
Bull scored messages     92.32    107.82   1.00   999.00
Bear scored messages     48.82    65.39    0.00   784.00
Bullish intensity        1.77     0.16     1.07   2.69
Bearish intensity        1.75     0.36     0.00   4.00


4.1 Performance of each model

The results for each model run on each of the two stocks are presented in table 4.5. The table shows the classification accuracy on the test data, as a percentage, for the best performing run of each model. Model 1 is trained on technical data only and model 2 on technical and sentiment data combined.

Table 4.5: Prediction accuracy (%)

Stock      Model 1   Model 2
Apple      52.124    55.985
Facebook   52.201    57.233

Figure 4.3: Model comparison


Chapter 5 Discussion

5.1 Discussion of results

The research conducted in this paper sought to explore the question of whether sentiment data could enhance a data-driven prediction model of the stock market. The results stated in table 4.5 are to be analyzed here.

The models that were trained to predict the direction of intraday price changes for two different stocks using technical data only achieved a classification accuracy slightly higher than random guessing. Training the same models using the same technical data from the same time period along with sentiment data containing information about the mood of the market did increase the accuracy of the trained models, for both tested stocks. The improvement in accuracy was, however, only a few percentage points: about 3.9 for Apple and 5.0 for Facebook.

These results were achieved using training data spanning the rather short period of time during which investor sentiment data has been available, which may have greatly limited the performance of the trained models. These restrictions, however, apply to both versions of the trained models, and thus do not inherently limit the enhancing capabilities of applying sentiment data to a prediction model, but rather the performance of any of the models trained. These limitations thus do not necessarily void the results gathered from this study, but modeling the wisdom of the crowd is, of course, made less accurate when the crowd is too small.

The results indicate that the predictive performance of the model is enhanced slightly by using sentiment data. However, the performance of the trained models is not significantly better than random guessing, which suggests that the technical data used for training may not be sufficiently indicative of changes in the direction of stock market prices to give a significant predictive performance. This may have limited the success of the conducted research.
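One way to make the comparison with random guessing concrete is a one-sided binomial test, sketched below. This analysis was not performed in the study. The test-set size `n_test` is an assumption: the Apple accuracies in table 4.5 are numerically consistent with 135 and 145 correct predictions out of 259, but the size of the test set is not stated in the text. The 50% chance level is also an assumption, since it presumes that up and down days are equally common.

```python
from math import comb


def binomial_p_value(correct, total, p_chance=0.5):
    """Probability of at least `correct` hits out of `total` under pure guessing."""
    return sum(
        comb(total, k) * p_chance**k * (1 - p_chance) ** (total - k)
        for k in range(correct, total + 1)
    )


n_test = 259  # ASSUMPTION: hypothetical test-set size, not reported in the study

for label, accuracy in [("technical only", 0.52124), ("technical + sentiment", 0.55985)]:
    correct = round(accuracy * n_test)
    p_value = binomial_p_value(correct, n_test)
    print(f"{label}: {correct}/{n_test} correct, one-sided p-value {p_value:.3f}")
```

A small p-value would indicate that the accuracy is unlikely to arise from guessing alone; with only a few hundred test samples, accuracies close to 50% are hard to distinguish from chance.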

The results obtained by applying sentiment data to the prediction model trained here imply a slight increase in accuracy in predicting the direction of intraday price changes but do not constitute a sound investing strategy, as this is outside the scope of this research. These results and their underlying correlations would need to be studied further, using more rigorous methods than those used in this research, to determine an answer to the problem stated.

5.2 Discussion of method

Using large quantities of data mined from the Internet in order to explore the predictability of the stock market is a relatively new area of research, so attempting to disprove the EMH is beyond the scope of this study. There are undoubtedly many complex factors that have affected the efficiency of this study with the methods that were used. Many of these complex relationships are simply not known, which naturally affects how efficiently a particular model of these effects can be used to explore them.

Due to time constraints, the method used might have lacked the necessary complexity to accurately model these relationships, which has affected the effectiveness of the trained model. A fast and easy-to-implement machine learning framework was used to implement the trained models, with limited means of analyzing results. This made it difficult to determine whether the results achieved here were limited by the simplified model or by the absence of an actual relationship between sentiment data and future stock prices. This dependency between evaluating the model and the phenomenon being modeled might have limited the success of this research.

Another limitation of the study is how we decided to model what it actually means for a financial asset to be overvalued or undervalued. This is evident in two ways. First of all, we simply label data by using the relative intraday change in value, which in and of itself does not necessarily determine whether a stock actually is overvalued or undervalued. This would have required a more sophisticated way of labeling data, possibly also on a larger timeframe. A system as complex as the stock market justifies a more thorough analysis of what it actually means for a financial asset to be correctly valued. Secondly, the magnitude of change is not taken into account. One possible improvement would have been to define a certain magnitude of change below which the stock is considered correctly valued.
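The suggested improvement could be sketched as a three-class labeling rule, as below. The function name, the class names and the 0.5% threshold are hypothetical; the closing prices are taken from table 4.1, and the change is measured from the previous close in line with table 3.3.

```python
def label_with_threshold(previous_close, close, threshold=0.005):
    """Return 'up', 'down' or 'flat' from the relative change in closing price.

    Changes smaller in magnitude than `threshold` are labeled 'flat', which
    stands in for the stock being treated as correctly valued.
    """
    relative_change = (close - previous_close) / previous_close
    if relative_change > threshold:
        return "up"
    if relative_change < -threshold:
        return "down"
    return "flat"


# Consecutive Apple closing prices from table 4.1.
print(label_with_threshold(136.99, 139.79))  # rise of about 2.0%
print(label_with_threshold(139.79, 138.96))  # fall of about 0.6%
print(label_with_threshold(136.93, 136.99))  # change of about 0.04%
```

A rule of this kind turns the task into a three-class problem, so the chance level and the evaluation measure would have to be reconsidered accordingly.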

The learning model that was used to train our classifier was chosen due to its relative ease of implementation using a high-level framework. One disadvantage of the way this model was used is the lack of interpretability of our results. A more thorough statistical analysis of the model’s performance would have greatly improved the reliability of our results.

Another clear limitation of the study is the relative lack of training data. The prediction model was trained solely on data from the period for which sentiment data was available from the vendor we used, which greatly reduced the amount of technical data that we were able to use in training the model. The relative lack of data also reduced the amount of validation we were able to perform on the model’s performance. One way to counter this would have been to train the model on substantially more individual stocks and combine the results. This could also have made the results of the study more general.

In choosing pre-processed sentiment data, we compromised on the amount of analysis of sentiment data that might be necessary to more accurately define what constitutes investor sentiment. This was, however, a necessary step due to time constraints. This data contains many interesting points of analysis, such as the underlying motivations of people speaking positively or negatively about certain stocks. The way this data has been interpreted undoubtedly has a direct impact on the model’s performance but was, again, necessary under the constraints of this study.

Another limitation of the study is that there possibly exists a degree of ambiguity as to what the data actually models, meaning that the way we have chosen to interpret sentiment data might not be an accurate model of what the data in reality represents. One could just as well have interpreted a negative attitude towards a stock as a sign of a degree of risk associated with it rather than interpreting it as a sign of the stock being overvalued. We have also chosen to treat a bullish sentiment in the same way as a bearish sentiment towards a stock, which may not necessarily reflect the intent of the market.


Chapter 6 Conclusion

In this study, a predictive model of the stock market using historical technical data was compared to one enhanced using social media sentiment data in order to determine if investor sentiment can be used to enhance a prediction model of the stock market. An SVM classifier was trained using technical data for two different stocks in the technology sector using the Python library scikit-learn. The accuracy of the trained models was then evaluated, and the models were subsequently retrained using sentiment data derived by PsychSignal technology from the StockTwits investor social media platform, where messages are classified as expressing a bullish (positive) or a bearish (negative) sentiment towards a certain stock. The training data was labeled using the direction of its intraday price change.

An evaluation of the performance of the trained models shows an increased classification accuracy of intraday price directions when applying sentiment data. However, due to the relatively poor performance of the trained models and the limitations in the methods used in this study, further research is needed to confirm that this holds true. The results of this study should thus not be viewed as conclusive.


Chapter 7 Future research

Future research may improve on this study in several ways. A more sophisticated model of stock prediction using technical data, with a more rigorous statistical analysis, would improve the certainty of the results. One way to do this is to compare the predictive performance of several model architectures and to increase the amount of training data by studying many different individual stocks. One might also need to introduce a lower threshold for how many classified messages can constitute an adequate measure of sentiment.

This study was conducted by evaluating the intraday changes in price directions. This may be improved by studying a larger span of time-ahead prediction evaluation, to more accurately model the reality of stock trading.


Bibliography

[1] Malcolm Baker and Jeffrey Wurgler. Investor Sentiment and the Cross-Section of Stock Returns. Working Paper 10449. National Bureau of Economic Research, Apr. 2004. DOI: 10.3386/w10449. URL: http://www.nber.org/papers/w10449.

[2] J. Bollen and H. Mao. “Twitter Mood as a Stock Market Predic- tor”. In: Computer 44.10 (Oct. 2011), pp. 91–94. ISSN: 0018-9162.

DOI: 10.1109/MC.2011.323.

[3] Johan Bollen, Huina Mao, and Xiaojun Zeng. “Twitter mood predicts the stock market”. In: Journal of Computational Science 2.1 (2011), pp. 1–8. ISSN: 1877-7503. DOI: http://doi.org/10.

1016/j.jocs.2010.12.007.URL: http://www.sciencedirect.

com/science/Article/pii/S187775031100007X.

[4] Werner F. M. De Bondt and Richard Thaler. “Does the Stock Mar- ket Overreact?” In: The Journal of Finance 40.3 (1985), pp. 793–805.

ISSN: 00221082, 15406261. URL: http : / / www . jstor . org / stable/2327804.

[5] Stefano DellaVigna. Psychology and Economics: Evidence from the Field. Working Paper 13420. National Bureau of Economic Re- search, Sept. 2007.DOI: 10.3386/w13420.URL: http://www.

nber.org/papers/w13420.

[6] Eugene F Fama. “Efficient capital markets: A review of theory and empirical work”. In: The journal of Finance 25.2 (1970), pp. 383–

417.

[7] Ronen Feldman. “Techniques and Applications for Sentiment Analysis”. In: Commun. ACM 56.4 (Apr. 2013), pp. 82–89. ISSN: 0001-0782. DOI: 10.1145/2436256.2436274.URL: http://

doi.acm.org.focus.lib.kth.se/10.1145/2436256.

2436274.

22

(30)

BIBLIOGRAPHY 23

[8] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference and prediction.

2nd ed. Springer, 2009, p. 241. URL: http : / / www - stat . stanford.edu/~tibs/ElemStatLearn/.

[9] Wei Huang, Yoshiteru Nakamori, and Shou-Yang Wang. “Fore- casting stock market movement direction with support vector machine”. In: Computers Operations Research 32.10 (2005). Appli- cations of Neural Networks, pp. 2513–2522.ISSN: 0305-0548.DOI: http://doi.org/10.1016/j.cor.2004.03.016. URL: http://www.sciencedirect.com/science/Article/

pii/S0305054804000681.

[10] Daniel Kahneman and Amos Tversky. “Prospect Theory: An Anal- ysis of Decision under Risk”. In: Econometrica 47.2 (1979), pp. 263–

291.ISSN: 00129682, 14680262.URL: http://www.jstor.org/

stable/1914185.

[11] Yong Liu, Shali Jiang, and Shizhong Liao. “Efficient Approxima- tion of Cross-validation for Kernel Methods Using Bouligand In- fluence Function”. In: Proceedings of the 31st International Confer- ence on International Conference on Machine Learning - Volume 32.

ICML’14. Beijing, China: JMLR.org, 2014, pp. I-324–I-332. URL: http : / / dl . acm . org / citation . cfm ? id = 3044805 . 3044843.

[12] Burton G. Malkiel. “The Efficient Market Hypothesis and Its Crit- ics”. In: The Journal of Economic Perspectives 17.1 (2003), pp. 59–

82. ISSN: 08953309.URL: http://www.jstor.org/stable/

3216840.

[13] Michela Nardo, Marco Petracco-Giudici, and Minás Naltsidis.

“WALKING DOWN WALL STREET WITH A TABLET: A SUR- VEY OF STOCK MARKET PREDICTIONS USING THE WEB”.

In: Journal of Economic Surveys 30.2 (2016), pp. 356–369.ISSN: 1467- 6419. DOI: 10.1111/joes.12102. URL: http://dx.doi.

org/10.1111/joes.12102.

[14] Nasdaq Nordic. Vad bestämmer priset på aktier? http://www.

nasdaqomxnordic.com/utbildning/aktier/vadbestammerprisetpaaktier.

Accessed: 2017-04-23.

(31)

24 BIBLIOGRAPHY

[15] Alya Al Nasseri, Allan Tucker, and Sergio de Cesare. “Quanti- fying StockTwits semantic terms’ trading behavior in financial markets: An effective application of decision tree algorithms”.

In: Expert Systems with Applications 42.23 (2015), pp. 9192–9210.

ISSN: 0957-4174.DOI: http://doi.org/10.1016/j.eswa.

2015.08.008. URL: http://www.sciencedirect.com/

science/Article/pii/S0957417415005473.

[16] Chong Oh and Olivia Sheng. “Investigating predictive power of stock micro blog sentiment in forecasting future stock price di- rectional movement”. In: (2011).

[17] Tobias Preis, Helen Susannah Moat, and H. Eugene Stanley. “Quan- tifying Trading Behavior in Financial Markets Using Google Trends”.

In: Scientific Reports 3.1684 (Apr. 2013).URL: http://dx.doi.

org/10.1038/srep01684.

[18] PsychSignal. About Us. https://psychsignal.com. Accessed:

2017-04-23.

[19] Psychsignal Example Notebook. StockTwits Trader Mood. https:

//www.quantopian.com/data/psychsignal/stocktwits.

Accessed: 2017-05-11.

[20] Eduardo J. Ruiz et al. “Correlating Financial Time Series with Micro-blogging Activity”. In: Proceedings of the Fifth ACM Inter- national Conference on Web Search and Data Mining. WSDM ’12.

Seattle, Washington, USA: ACM, 2012, pp. 513–522. ISBN: 978-1- 4503-0747-5. DOI: 10.1145/2124295.2124358. URL: http:

//doi.acm.org.focus.lib.kth.se/10.1145/2124295.

2124358.

[21] Serguei Saavedra et al. “Synchronicity, instant messaging, and performance among financial traders”. In: Proceedings of the Na- tional Academy of Sciences of the United States of America 108.13 (2011), pp. 5296–5301.ISSN: 00278424.URL: http://www.jstor.

org/stable/41125693.

[22] Robert P. Schumaker and Hsinchun Chen. “Textual Analysis of Stock Market Prediction Using Breaking Financial News: The AZFin Text System”. In: ACM Trans. Inf. Syst. 27.2 (Mar. 2009), 12:1–12:19.ISSN: 1046-8188.DOI: 10.1145/1462198.1462204.

URL: http : / / doi . acm . org . focus . lib . kth . se / 10 . 1145/1462198.1462204.

(32)

BIBLIOGRAPHY 25

[23] Scikit-learn. Preprocessing data. Accessed: 2017-04-23.

[24] Alaa F. Sheta, Sara Elsir M. Ahmed, and Hossam Faris. “A Com- parison between Regression, Artificial Neural Networks and Sup- port Vector Machines for Predicting Stock Market Index”. In:

International Journal of Advanced Research in Artificial Intelligence (IJARAI) 4.7 (2015).DOI: http://dx.doi.org/10.14569/

IJARAI . 2015 . 040710 # sthash . hZ48g3h8 . dpuf. URL: http://thesai.org/Publications/ViewPaper?Volume=

4&Issue=7&Code=IJARAI&SerialNo=10.

[25] Andrei Shleifer. Inefficient Markets: An Introduction to Behavioral Finance. 2000.DOI: 10.1093/0198292279.001.0001.

[26] Andrei Shleifer and Lawrence H. Summers. “The Noise Trader Approach to Finance”. In: The Journal of Economic Perspectives 4.2 (1990), pp. 19–33. ISSN: 08953309. URL: http://www.jstor.

org/stable/1942888.

[27] StockTwits. About StockTwits. https : / / stocktwits . com / about. Accessed: 2017-04-23.

[28] Francis E.H Tay and Lijuan Cao. “Application of support vector machines in financial time series forecasting”. In: Omega 29.4 (2001), pp. 309–317. ISSN: 0305-0483. DOI: http://doi.org/

10.1016/S0305- 0483(01)00026- 3. URL: http://www.

sciencedirect.com/science/Article/pii/S0305048301000263.

[29] Xue Zhang, Hauke Fuehres, and Peter A. Gloor. “Predicting Stock Market Indicators Through Twitter “I hope it is not as bad as I fear””. In: Procedia - Social and Behavioral Sciences 26 (2011), pp. 55–

62.ISSN: 1877-0428.DOI: http://dx.doi.org/10.1016/j.

sbspro.2011.10.562.URL: http://www.sciencedirect.

com/science/Article/pii/S1877042811023895.

(33)
