
Master Thesis, 30 hp

M.Sc. Industrial Engineering and Management – Risk Management, 300 hp

Trading strategies based on a pattern detection algorithm

Elias Björklund


TRADING STRATEGIES BASED ON A PATTERN DETECTION ALGORITHM

Submitted in partial fulfillment of the requirements for the degree Master of Science in Industrial Engineering and Management

Department of Mathematics and Mathematical Statistics Umeå University

SE-901 87 Umeå, Sweden


Abstract

This thesis aims to develop a method to algorithmically detect patterns used in technical analysis. Non-parametric kernel regression is used to smoothen the otherwise extremely noisy data of how stock prices develop over time. To find these patterns, previously described, quantitatively defined criteria are used with some modifications. In total, six patterns are searched for, where three of them are intended to predict an incline of the asset price and three to predict a decline.

As a basis for the study, data on 500 U.S. stocks are analyzed. These 500 stocks were all present in the S&P 500 index at the beginning of 2010, and the daily closing price of each of these assets is obtained from the beginning of 2010 until the end of 2020. This period is divided into two parts: one training set and one test set. The longer training period is used to optimize trading strategies, and the shorter test period is used to test these strategies.

The algorithm to detect these patterns was successfully implemented, and this resulted in the detection of a sufficient number of each pattern to be able to evaluate their efficiency during the training period. All of the patterns intended to predict a decline in the asset price failed. This is most likely due to the fact that the stock market had a nearly continuous increase during the entire study period. These patterns are therefore not used for the analysis of the test period. In contrast, the three remaining patterns, which are all intended to predict an incline of the asset price, could generate excess returns over the risk-free rate, before adjusting for risk. After risk adjustment, two of these patterns outperformed a Buy-and-hold strategy during the training period. The best combinations of parameters for each of these three patterns are then applied to the test data. The most interesting conclusion from the analysis of the test period is that none of the pattern-based strategies that could outperform the Buy-and-hold strategy during the training period can do so during the test period. The conclusion is therefore that the strategies that are able to beat the Buy-and-hold strategy during the training period have a high probability of being over-optimized on that particular data set and do not perform well enough to be relied on.


Summary

In this study, the goal is to develop an algorithm for detecting patterns used in technical analysis. Non-parametric kernel regression is used to smooth stock prices, which are otherwise highly volatile. To find these patterns, six previously quantitatively defined criteria are used with some modifications. In total, six patterns are used, where three of them are intended to predict a rise in the stock price, while the other three are intended to predict a decline in the stock price.

As a basis for the study, 500 American stocks are used. These 500 stocks constituted the S&P 500 index at the beginning of 2010, and the daily closing price for these assets was collected from the beginning of 2010 to the end of 2020. This time period is divided into two parts: a longer period that constitutes the training data, and a shorter period used for testing. The training data is used to optimize the trading strategies, while the shorter period is used to test these strategies.

The algorithm for finding the intended patterns is implemented successfully and results in a sufficient number of each pattern being found for their effectiveness to be evaluated. None of the patterns associated with a future decline in the stock price work. This is most likely because the prices of the studied stocks have shown an almost continuously positive development during the chosen time period. For this reason, these three patterns are not applied to the data constituting the test period. The three other patterns show good results when analyzing the data used to train and optimize the patterns; all three are able to generate higher returns than the risk-free rate, at least before adjusting for the associated risk.

After risk adjustment of the returns, two of these patterns can outperform the return of a strategy in which, at the beginning of the training period, an equal amount is invested in each of the 500 companies and the shares are then held until the last day of the training period, a so-called "Buy-and-hold" strategy. The best parameters of the three patterns that show a higher return than the risk-free rate are then applied to the data intended for testing the strategies. From that test, the most interesting conclusion is that neither of the two strategies that previously showed a higher risk-adjusted return than the Buy-and-hold strategy can do so now. From this, the conclusion is drawn that the parameters of these strategies are most likely over-optimized for the training data and therefore do not work well enough to consistently deliver a risk-adjusted excess return.


Acknowledgements

The field of technical analysis has always inspired me tremendously. I am therefore very grateful that I got the opportunity to write my master thesis on this subject. Even though it has been a challenge to collaborate with a company based in another city during a pandemic, I am grateful to Amplify Capital and especially Tony Sarossy, who gave me the opportunity to do this study in collaboration with them.

I would also like to thank my supervisor Dr. Eric Libby at the Department of Mathematics and Mathematical Statistics, Umeå University, for valuable comments and advice during my master thesis period.

Last but not least, I would like to thank my family, who have encouraged me at times when I have not seen any end to this paper. Without you, this would not have been possible.


Contents

1 Introduction
1.1 Background
1.2 Task description
1.3 Aim and purpose
1.4 Delimitations
1.5 Motivation of chosen work procedure
1.6 Disposition of the paper & advice to the reader

2 Literature studies
2.1 Efficient Market Hypothesis
2.2 Support for technical analysis
2.3 Related studies to pattern recognition in financial assets

3 Theory
3.1 Smoothing estimators and non-parametric Kernel Regression
3.1.1 Selection of bandwidth
3.2 Excess returns and risk adjusted returns
3.2.1 Sharpe Ratio

4 Method and choice of data
4.1 Data samples
4.2 Pattern detection algorithm
4.2.1 Quantitative definitions of technical patterns
4.2.2 Smoothening of price series
4.2.3 Rolling window
4.2.4 Locating the extremes
4.2.5 Finding the patterns
4.3 Evaluating the performance of the patterns
4.3.1 Conditional excess returns
4.3.2 Hypothesis testing
4.3.3 Calculating portfolio returns and risk adjustment of returns
4.3.4 Differences in evaluation of the training versus the test data

5 Results
5.1 Examples of identified patterns
5.2 Results from the training sample
5.2.1 Conditional excess returns
5.2.2 Risk-adjusted portfolio returns
5.3 Results from the test sample


List of Figures

1 The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1).
2 The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1), and a kernel regression estimate of the data points with bandwidth h = 0.1.
3 The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1), and a kernel regression estimate of the data points with bandwidth h = 2.5.
4 The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1), and a kernel regression estimate of the data points with bandwidth h = 0.35.
5 Example of what a Head & Shoulder pattern could look like, where E1 to E5 denote the extremes of the pattern.
6 Example of what an Inverse Head & Shoulder pattern could look like, where E1 to E5 denote the extremes of the pattern.
7 Example of what a Broadening Top pattern could look like, where E1 to E5 denote the extremes of the pattern.
8 Example of what a Broadening Bottom pattern could look like, where E1 to E5 denote the extremes of the pattern.
9 Example of what a Rectangle Top pattern could look like, where E1 to E5 denote the extremes of the pattern and the dotted line represents the linear line between the two local minima.
10 Example of what a Rectangle Bottom pattern could look like, where E1 to E5 denote the extremes of the pattern and the dotted line represents the linear line between the two local maxima.
11 Illustration of the rolling window from P_1 to P_{1+n-1}.
12 Illustration of the rolling window from P_2 to P_{2+n-1}.
13 Illustration of the difference between an extreme in the smoothened price and an Important extreme in the original price series.
14 Example of a Head & Shoulder pattern found within the 63-day window for stock symbol MMN.N at day 265-328. The blue vertical line represents the position in the window where the last extreme must be detected.
15 Example of an Inverse Head & Shoulder pattern found within the 63-day window for stock symbol MMN.N at day 124-187. The blue vertical line represents the position in the window where the last extreme must be detected.
16 Example of a Broadening Top pattern found within the 63-day window for stock symbol MMN.N at day 227-290. The blue vertical line represents the position in the window where the last extreme must be detected.
17 Example of a Broadening Bottom pattern found within the 63-day window for stock symbol MMN.N at day 1272-1335. The blue vertical line represents the position in the window where the last extreme must be detected.
18 Example of a Rectangle Top pattern found within the 63-day window for stock symbol MMN.N at day 194-257. The blue vertical line represents the position in the window where the last extreme must be detected. The black line is the linear line between the two minima. The value of the stock at the last day of the window must be below this line for the pattern to be recognized.
19 Example of a Rectangle Bottom pattern found within the 63-day window for stock symbol MMN.N at day 1261-1324. The blue vertical line represents the position in the window where the last extreme must be detected. The black line is the linear line between the two maxima. The value of the stock at the last day of the window must be above this line for the pattern to be recognized.


List of Tables

1 Means, standard deviations, 95% confidence intervals of returns and number of Head & Shoulder patterns found for different holding periods and multiples of the bandwidth.
2 Means, standard deviations, 95% confidence intervals of returns and number of Inverse Head & Shoulder patterns found for different holding periods and multiples of the bandwidth.
3 Means, standard deviations, 95% confidence intervals of returns and number of Broadening Top patterns found for different holding periods and multiples of the bandwidth.
4 Means, standard deviations, 95% confidence intervals of returns and number of Broadening Bottom patterns found for different holding periods and multiples of the bandwidth.
5 Means, standard deviations, 95% confidence intervals of returns and number of Rectangle Top patterns found for different holding periods and multiples of the bandwidth.
6 Means, standard deviations, 95% confidence intervals of returns and number of Rectangle Bottom patterns found for different holding periods and multiples of the bandwidth.
7 Sharpe ratios of Inverse Head & Shoulder patterns found for different holding periods and multiples of the bandwidth.
8 Sharpe ratios of Broadening Bottom patterns found for different holding periods and multiples of the bandwidth.
9 Sharpe ratios of Rectangle Bottom patterns found for different holding periods and multiples of the bandwidth.
10 The mean annualized Sharpe ratio of the Buy-and-hold strategy during the training period.
11 The mean annualized Sharpe ratio during the test period, for the best performing strategies from the training period.
12 The mean annualized Sharpe ratio of the Buy-and-hold strategy during the test period.


1 Introduction

1.1 Background

Predicting the future price movements of financial markets is a relatively well-studied area. The Efficient Market Hypothesis (EMH) is a theory which declares that it is not possible to consistently beat the market (Fama 1965). The theory states that asset prices reflect all information making it impossible for investors to find undervalued market assets to buy and overpriced market assets to sell. This implies that strategies based on stock selection or market timing will not outperform the overall market and the only way for an investor to generate higher returns is by purchasing riskier assets.

Even though studies related to market efficiency have been awarded the Nobel Prize, there are still those who oppose this theory and believe that there are ways to continuously generate excess returns. Therefore, methods that attempt to forecast and predict future trends and movements of financial assets remain prevalent. One of these methods is called technical analysis, a discipline that seeks to identify investment opportunities based on past trading activity, such as volume and price movements.

In the field of technical analysis there are popular and well-known patterns in the prices of financial assets. These patterns can be categorized either as a reversal pattern, which signals that the trend of the price may be about to change direction, or a continuation pattern, which signals that the trend of the price will continue in the current direction. These patterns have been used within technical analysis for a long time in the belief that they can forecast future price movements.

1.2 Task description

The assignment for this project was designed as a proposal from the student and was then approved by the employer. The employer, Amplify Capital, currently only trades financial assets manually but is interested in developing algorithmic trading in the future. This project will be an example of how a quantitative strategy can be developed and how its performance can be evaluated using backtesting on historical data.

The assignment will be to create a system that can recognize price patterns used in technical analysis that have been quantitatively defined, and to develop a trading strategy based on these patterns. Once the strategy is established, it will be optimized by backtesting its performance on historical price data.

1.3 Aim and purpose

The aims and purposes of this project are to:

• Through a quantitative approach examine whether known price patterns used in technical analysis can provide an edge for an investor.

• Show how a quantitative trading strategy can be developed and how its performance can be evaluated.

1.4 Delimitations

The strategy will only be adapted for liquid markets. This is because, when trading illiquid markets, there is a risk that you cannot enter or exit a trade with the desired quantity without moving the price of the asset. To develop a trading system for an illiquid market, one would have to take into account how much a trade moves the price, and that is not the purpose of this thesis.


There are many known patterns used in the field of technical analysis. Creating a trading system based on all of these known patterns would be too time-consuming, and therefore only six patterns were chosen for examination.

The developed trading strategy will not be used on real-time data or in real trading. Instead, the strategy will be backtested on historical data to evaluate its performance.

1.5 Motivation of chosen work procedure

Many aspects of technical analysis can be perceived as rather subjective, and even though there is a lot of literature about technical analysis, little of it is statistically based. Because of the subjectivity and lack of proper statistical analysis regarding price patterns related to technical analysis, the work procedure of this paper has been influenced by the limited number of previous studies done in the field, and by the literature that takes a statistical and quantitative approach to the subject. The way I have chosen to quantitatively define the patterns, and the algorithm for detecting these patterns, are essentially as described by Lo, Mamaysky, and Wang (2000), but with some modifications. The way the effectiveness of these patterns is evaluated is partially influenced by Savin, Weller, and Zvingelis (2007), but they only tested the predictive power of one pattern, while I have chosen to test the predictive power of six patterns.

1.6 Disposition of the paper & advice to the reader

After this introduction, the thesis is divided into the following five main parts:

• Literature study

• Theory

• Method & choice of data

• Results

• Discussion and conclusions

After these five main parts of the paper there is also a list of references. My advice to the reader is to read the entire paper from start to finish, both to understand all aspects that this paper covers and to properly interpret the results.


2 Literature studies

This section covers the Efficient Market Hypothesis, earlier studies that give support to technical analysis, and related studies of pattern recognition in time series of financial assets. All of this provides useful background for reading this paper.

2.1 Efficient Market Hypothesis

According to the EMH of Fama (1965, 1970), all past prices of a stock are reflected in today's asset prices, meaning that the price of an asset follows a random walk over time. This implies that it should not be possible to construct a trading strategy based on historical prices that generates positive expected returns when accounting for transaction costs. Since technical analysis is based on predicting price movements using historical prices, the EMH indicates that technical analysis is of no use for earning positive expected returns. C. J. Neely (2003) argues that for technical analysis to be informative, it is not enough for such a strategy to generate positive expected returns; it should also outperform the Buy-and-hold strategy on a risk-adjusted basis. The problem with risk adjustment of returns is that risk is difficult to measure and every risk adjustment is subject to criticism (C. J. Neely 2003). White (2000) explains that when a trading strategy is developed on a set of data and then evaluated on the same data set, the performance of that strategy does not necessarily depend on the strategy itself, but on luck. This phenomenon is called data snooping and implies that by changing the parameters of a trading or forecasting model, one can succeed in making that model perform well on the data set on which it was developed, but this provides no certainty that the model will work well on an unseen set of data (White 2000).

2.2 Support for technical analysis

Brock, Lakonishok, and LeBaron (1992) tested two simple and popular trading strategies, based on moving averages and trading range breaks, on the Dow Jones Index from 1897 to 1986. They found strong support that these two strategies had predictive power, even after testing the strategies on non-overlapping subperiods in order to avoid the problem of data snooping. C. Neely, Weller, and Dittmar (1996) used genetic programming techniques to determine rules for trading strategies and were able to find support for significant out-of-sample excess returns for six exchange rates between 1981 and 1995. Through the use of bootstrapping, they were able to detect patterns in the data that standard statistical models were not able to find.

2.3 Related studies to pattern recognition in financial assets

Several previous studies of pattern recognition and pattern matching in time series of financial assets have been conducted. These studies propose pattern detection algorithms ranging from simpler ones to machine learning techniques.

Osler and Chang (1995) developed an algorithm to detect head-and-shoulders patterns in currency exchange rates by studying defined local extremes in the data. By applying the algorithm, they found that a trading strategy based on their method of detection (the head-and-shoulders pattern) was able to generate statistically significant profits for two of the six currencies on which the strategy was tested. Lo, Mamaysky, and Wang (2000) proposed a pattern detection algorithm that finds local extremes in price series smoothed using non-parametric kernel regression. They defined ten patterns based on the local extremes and tested the statistical significance of the one-day continuously compounded return after a pattern was completed. To determine whether the chart patterns were informative, Lo, Mamaysky, and Wang (2000) compared the distribution of the one-day returns following one of the ten patterns with the distribution of all one-day returns, i.e., they compared the conditional one-day returns with the unconditional one-day returns. What they found was that these patterns do provide incremental information, but that this does not necessarily generate excess trading profits. Savin, Weller, and Zvingelis (2007) utilize the pattern recognition algorithm of Lo, Mamaysky, and Wang (2000) with some modifications to examine the predictive power of the head-and-shoulders pattern. While Lo, Mamaysky, and Wang (2000) try to determine the predictive power one day after the patterns are completed, Savin, Weller, and Zvingelis (2007) instead assess the predictive power of the head-and-shoulders pattern one, two and three months ahead. In their research, Savin, Weller, and Zvingelis (2007) provide strong evidence that their investigated pattern had potential to predict excess returns.

Even though many books have been published on the subject of technical analysis, these books rarely use any statistical analysis to verify their content. The book Encyclopedia of Chart Patterns (Bulkowski 2005) is one exception, which uses statistical rules as an attempt to measure the effectiveness of different chart patterns. To do this, Bulkowski uses a computer algorithm to search for 53 different patterns in databases containing approximately 1,000 different stocks. Bulkowski does not only report the statistics of the different patterns but also presents visual examples of them.


3 Theory

This section covers the theory on which this paper is based. To guide the reader through the outline, it is divided into the following subsections:

• Smoothing estimators and non-parametric Kernel Regression

• Excess returns and risk adjusted returns

3.1 Smoothing estimators and non-parametric Kernel Regression

The concept of kernel regression will later be used in the method for detecting patterns in time series of financial assets, but first the general theory of smoothing estimators and kernel regression is described.

The purpose of regression is to find a curve that describes a general relationship between an explanatory variable X and a response variable Y. For a dataset consisting of n data points $\{(X_i, Y_i)\}_{i=1}^{n}$, the relationship between the explanatory variable X and the response variable Y can be described as:

$$Y_i = m(X_i) + \epsilon_i, \qquad i = 1, \ldots, n \tag{1}$$

where m is the unknown regression function and $\epsilon_i$ are the observation errors (Härdle 1990, p. 3). Smoothing of a dataset $\{(X_i, Y_i)\}_{i=1}^{n}$ concerns the method of approximating the mean response curve m in the regression relationship of Equation 1. If a dataset consists of several observed values of Y at a fixed point X = x, the estimation of m(x) can be performed by taking the average of those corresponding Y-values. However, often this is not the case, and the dataset is structured such that there is only a single response variable Y and a single explanatory variable X (Härdle 1990, p. 17).

In the case where there is only a single response variable Y and a single explanatory variable X, a natural choice is to take the mean of the response variables near a point x. This is called the local average and should be implemented so that the mean is taken over response variables Y in the neighbourhood of x, because observations of Y far away from x will in most cases have very different means compared to the observations of Y close to x. This can be seen as the basic idea of smoothing and can be formally defined as:

$$\hat{m}(x) = n^{-1} \sum_{i=1}^{n} W_{ni}(x) Y_i \tag{2}$$

where $\{W_{ni}(x)\}_{i=1}^{n}$ is a sequence of weights which determines how $\hat{m}(x)$ is smoothed based on the observations of Y around x (Härdle 1990, p. 22).

One approach to determining the weights $\{W_{ni}(x)\}_{i=1}^{n}$ of Equation 2 is the kernel smoothing technique. A kernel is a continuous, bounded and symmetric real function K which integrates to one:

$$\int K(u)\,du = 1 \tag{3}$$

The sequence of weights for kernel smoothers is defined by:

$$W_{ni}(x) = K_{h_n}(x - X_i) / \hat{f}_{h_n}(x) \tag{4}$$

where

$$\hat{f}_{h_n}(x) = n^{-1} \sum_{i=1}^{n} K_{h_n}(x - X_i) \tag{5}$$

which ensures that the weights always sum to one, and where

$$K_{h_n}(u) = h_n^{-1} K(u / h_n) \tag{6}$$

is the kernel with a scaling factor $h_n$, also called the bandwidth, which provides the flexibility of scaling the weights of the kernel (Härdle 1990, p. 32).

Inserting the weight sequence for kernel smoothers of Equation 4 into the smoothing estimator of Equation 2 provides the kernel estimator $\hat{m}_h(x)$ of m(x) as:

$$\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i) Y_i}{\sum_{i=1}^{n} K_h(x - X_i)} \tag{7}$$

which is best known as the Nadaraya-Watson estimator (Härdle 1990, p. 32).

As mentioned in Härdle (1990, p. 33), many different kernel functions are possible in general. For this paper the Gaussian kernel is used, which is the Gaussian distribution scaled by the bandwidth h:

$$K_h(x) = \frac{1}{h\sqrt{2\pi}} e^{-\frac{x^2}{2h^2}} \tag{8}$$
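To make the estimator concrete, the following is a minimal Python sketch of the Nadaraya-Watson estimator of Equation 7 with the Gaussian kernel of Equation 8. The code is illustrative and not taken from the thesis; the function names and array-based interface are my own choices.

```python
import numpy as np

def gaussian_kernel(u: np.ndarray, h: float) -> np.ndarray:
    """Gaussian kernel scaled by the bandwidth h (Equation 8)."""
    return np.exp(-u**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))

def nadaraya_watson(x_eval: np.ndarray, x: np.ndarray,
                    y: np.ndarray, h: float) -> np.ndarray:
    """Nadaraya-Watson estimate of m(x) at each point of x_eval (Equation 7)."""
    # Pairwise differences between evaluation points and observations.
    diffs = x_eval[:, None] - x[None, :]   # shape (len(x_eval), len(x))
    weights = gaussian_kernel(diffs, h)    # K_h(x - X_i)
    # Kernel-weighted average of the responses; the denominator normalizes
    # the weights so that they sum to one, exactly as in Equation 7.
    return (weights * y[None, :]).sum(axis=1) / weights.sum(axis=1)
```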

3.1.1 Selection of bandwidth

A challenge with non-parametric kernel regression is choosing a bandwidth that gives the appropriate amount of smoothing. This means choosing a value of $h_n$ in Equation 7 such that $\hat{m}(x)$ gives a fair approximation of m(x). A bandwidth value that is too low comes with the risk of overfitting the data, and the smoothing estimator $\hat{m}(x)$ will be too rough compared to the unknown regression function m(x). If the value of the bandwidth instead is too high, the smoothing estimator $\hat{m}(x)$ will be too smooth and will underfit the data. To illustrate what is meant by this, Figure 1 shows a blue curve generated by the equation:

$$y_i = \sin(x_i), \qquad x_i \in [0, 2\pi] \tag{9}$$

and 500 red data points generated by the equation:

$$y_i = \sin(x_i) + 0.5 Z_i, \qquad x_i \in [0, 2\pi] \text{ and } Z_i \sim N(0, 1) \tag{10}$$

which means that the data points have the sine curve as their mean but a standard deviation of 0.5 around it.


Figure 1: The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1).

Next, kernel estimators with three different values of the bandwidth are fitted to the data. Figures 2-4 show the sine curve plotted as the blue line, the data points from Equation 10 as red points, and the kernel estimator as the black line. In Figure 2 the value of the bandwidth is 0.1, which implies that too much weight is given to nearby observations and too little weight to observations further away; the resulting kernel estimator is too noisy. Figure 3 illustrates the kernel estimate with bandwidth 2.5, and in this case it is clear that too much weight is given to observations far away, since the kernel estimator does not fit the sine curve well. A better value of the bandwidth is shown in Figure 4, where the kernel regression function is very similar to the sine curve.

Figure 2: The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1), and a kernel regression estimate of the data points with bandwidth h = 0.1.


Figure 3: The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1), and a kernel regression estimate of the data points with bandwidth h = 2.5.

Figure 4: The true curve of Y = sin(X) and 500 data points generated from Y = sin(X) + 0.5Z, where Z ∼ N(0, 1), and a kernel regression estimate of the data points with bandwidth h = 0.35.
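The under- and over-smoothing illustrated in Figures 1-4 can be reproduced with the short, self-contained sketch below. The random seed, evaluation grid and RMSE diagnostic are my own choices for illustration, not from the thesis.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# 500 noisy observations of Y = sin(X) + 0.5 Z, Z ~ N(0, 1) (Equation 10).
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 500))
y = np.sin(x) + 0.5 * rng.standard_normal(500)

def smooth(x_eval, x_obs, y_obs, h):
    """Nadaraya-Watson estimate with a Gaussian kernel; the kernel's
    normalizing constant cancels in the ratio and is omitted."""
    w = np.exp(-((x_eval[:, None] - x_obs[None, :]) ** 2) / (2.0 * h**2))
    return (w * y_obs).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(0.0, 2.0 * np.pi, 200)
for h in (0.1, 2.5, 0.35):  # the three bandwidths of Figures 2-4
    rmse = np.sqrt(np.mean((smooth(grid, x, y, h) - np.sin(grid)) ** 2))
    print(f"h = {h}: RMSE against sin(x) = {rmse:.3f}")
```

Running this shows the same qualitative behaviour as the figures: the small bandwidth tracks the noise, the large bandwidth flattens the curve, and the intermediate bandwidth gives the closest fit to the true sine curve.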


3.2 Excess returns and risk adjusted returns

Excess returns is a metric that helps investors compare the performance of their investments to other investment alternatives. If the returns of an investment exceed the returns of a proxy, then excess returns are achieved. The proxy used for comparison can differ, but commonly used proxies are the risk-free rate or a benchmark with a level of risk similar to the investment being considered. Excess returns are determined by subtracting the returns of the proxy from the returns of the investment. A positive value demonstrates that the returns of the investment outperformed the proxy, while a negative value indicates that investing in the proxy would have generated a higher return. (Chen 2021)

Risk-adjusted returns measure the yield of an investment relative to the amount of risk associated with it during that period. The concept of risk-adjusted returns can, for instance, be applied to individual stocks, investment funds or entire portfolios. There are different measures for adjusting returns for risk, but the general idea is to help investors determine whether the rewards of their investment were worth the risk taken. (Chen 2020)

3.2.1 Sharpe Ratio

The Sharpe ratio describes the excess return received for the extra volatility endured by holding a riskier asset. The Sharpe ratio is calculated as:

$$\text{Sharpe ratio} = \frac{R_p - R_f}{\sigma_p} \tag{12}$$

as originally described by the Nobel prize laureate William F. Sharpe (Sharpe 1966), where $R_p$ is the portfolio return, $R_f$ the risk-free rate, and $\sigma_p$ the standard deviation of the portfolio excess return.


4 Method and choice of data

The first part of this section describes what data samples have been used and where they were obtained. The second part describes the method and the algorithm used to detect the patterns. Finally, the third part presents the trading strategy based on the identified patterns, together with an evaluation of the trading strategy's performance.

4.1 Data samples

Collecting the data for this paper was a challenge. Since every pattern occurs rather infrequently, it was necessary to have a data sample of more than one stock in order to detect enough patterns to make the evaluation of these patterns statistically significant. To obtain a data sample of many companies, the data set for this paper was chosen to be the companies listed in the S&P 500 index in January 2010. The S&P 500 index is a composition of the 500 largest companies in the United States by market capitalization, and the constituents of the index are updated once every quarter. Finding which stocks were in the S&P 500 index at the beginning of 2010 was a challenge, since this data is not easily found without some kind of paid service. The constituents of the S&P 500 index were eventually found with the help of the Eikon Data API, a paid service that is accessible through the Umeå University library. Once the constituents of the index were found, the daily closing prices were collected between 2010-01-04 and 2020-12-31.

This data was also obtained from the Eikon Data API, and the closing prices were adjusted for stock splits and dividends. The motivation for choosing the companies listed in the S&P 500 index in January 2010 is that this generates data both for assets that have performed well and are still in the index at the end of 2020, and for stocks that have performed less well and are therefore no longer in the index at the end of 2020. If one instead chose the constituents that were in the index at the end of 2020 and obtained data for these stocks back to January 2010, the data set would consist only of assets that have outperformed the rest of the market, and it would be biased towards the best performing stocks of that decade.

This data sample was later divided into two subsamples. The first subsample was the training data, on which the trading strategies were developed and optimized. The second subsample was the test data, on which the optimized strategies from the training data were tested. The training data consists of the data between 2010-01-04 and 2017-12-29, and the test data consists of the data between 2018-01-02 and 2020-12-31. The original data set was divided into these subsets so that the strategies were not developed and tested on the same data, which avoids the problem of data snooping mentioned by White (2000).
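As a minimal pandas sketch of this split, the snippet below loads a price table and slices it on the thesis's two date ranges. The file name and the layout of `prices` (daily adjusted closes indexed by trading date, one column per ticker) are illustrative assumptions, not the thesis's actual code.

```python
import pandas as pd

# Assumed layout: daily adjusted closing prices, one column per ticker,
# indexed by trading date (hypothetical file name).
prices = pd.read_csv("sp500_jan2010_constituents.csv",
                     index_col="date", parse_dates=True)

# Training and test periods as defined in the thesis.
train = prices.loc["2010-01-04":"2017-12-29"]
test = prices.loc["2018-01-02":"2020-12-31"]
```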

The risk-free rate used in this paper was the daily 1 month Treasury bill (T-bill) rate, which was obtained from the website of Kenneth R. French (French 2021).

4.2 Pattern detection algorithm


4.2.1 Quantitative definitions of technical patterns

To be able to detect technical patterns in time series of asset prices, they must first be quantitatively defined. Lo, Mamaysky, and Wang (2000) defined ten patterns, and in this paper six of these definitions are used. However, they did not define whether these patterns were bullish or bearish. A bullish pattern suggests that the price of an asset will rise; in contrast, a bearish pattern suggests that the price will decline. By adding additional constraints for two of these six patterns, as indicated below, and based on the studies of the six patterns in Bulkowski (2005), I was able to categorize each of the patterns as either bullish or bearish.

The names of these patterns, whether they are considered bullish or bearish, and how they are defined in terms of their geometric shape are as follows:

Head & Shoulder (HS) - Bearish pattern

The HS-pattern is defined by five consecutive local extremes, which have to satisfy the following conditions (a code sketch of these checks follows Figure 5):

• The 1st and 5th extremes must be local maxima.

• The 3rd extreme must be a local maximum that is larger than the local maxima of the 1st and the 5th.

• The 2nd and 4th extremes must be local minima.

• The 1st and 5th extremes have to be within 1.5% of their average.

• The 2nd and 4th extremes have to be within 1.5% of their average.

Figure 5 illustrates an example of what a Head & Shoulder pattern could look like.

Figure 5: Example of what a Head & Shoulder pattern could look like, where E1 to E5 denote the extremes of the pattern.
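As an illustration, the sketch below tests the five HS conditions on a sequence of detected extremes. The `(price, is_max)` tuple representation is a hypothetical choice of mine, not the thesis's data structure; the 1.5% tolerance follows the definition above.

```python
def is_head_and_shoulders(extremes) -> bool:
    """Test the HS conditions on five consecutive extremes, each given as a
    (price, is_max) tuple in time order (an assumed representation)."""
    if len(extremes) != 5:
        return False
    p = [price for price, _ in extremes]
    kinds = [is_max for _, is_max in extremes]

    # E1, E3, E5 must be maxima; E2, E4 must be minima.
    if kinds != [True, False, True, False, True]:
        return False
    # The head (E3) must be larger than both shoulders (E1, E5).
    if not (p[2] > p[0] and p[2] > p[4]):
        return False
    # Shoulders within 1.5% of their average; likewise the two minima.
    # By symmetry |E1 - avg| = |E5 - avg|, so one check per pair suffices.
    shoulder_avg = (p[0] + p[4]) / 2
    minima_avg = (p[1] + p[3]) / 2
    return (abs(p[0] - shoulder_avg) <= 0.015 * shoulder_avg
            and abs(p[1] - minima_avg) <= 0.015 * minima_avg)
```

The remaining five patterns below can be checked with directly analogous functions by swapping the inequality and tolerance conditions.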


Inverse Head & Shoulder (IHS) - Bullish pattern

The IHS-pattern is the inverse of the HS-pattern and is therefore also defined by five consecutive local extremes, which have to satisfy the following conditions:

• The 1st and 5th extremes must be local minima.

• The 3rd extreme must be a local minimum that is lower than the local minima of the 1st and the 5th.

• The 2nd and 4th extremes must be local maxima.

• The 1st and 5th extremes have to be within 1.5% of their average.

• The 2nd and 4th extremes have to be within 1.5% of their average.

Figure 6 illustrates an example of what an Inverse Head & Shoulder pattern could look like.

Figure 6: Example of what an Inverse Head & Shoulder pattern could look like, where E1 to E5 denote the extremes of the pattern.

Broadening Top (BT) - Bearish pattern

The BT-pattern is defined by five consecutive local extremes, which have to satisfy the following conditions:

• The 1st, 3rd and 5th extremes are local maxima.

• The 2nd and 4th extremes are local minima.

• The 1st extreme < the 3rd extreme < the 5th extreme.

• The 2nd extreme > the 4th extreme.

Figure 7 illustrates an example of what a Broadening Top pattern could look like.

Figure 7: Example of what a Broadening Top pattern could look like, where E1 to E5 denote the extremes of the pattern.

Broadening Bottom (BB) - Bullish pattern

The BB-pattern is the inverse of the BT-pattern and is therefore also defined by five consecutive local extremes, which have to satisfy the following conditions:

• The 1st, 3rd and 5th extremes are local minima.

• The 2nd and 4th extremes are local maxima.

• The 1st extreme > the 3rd extreme > the 5th extreme.

• The 2nd extreme < the 4th extreme.

Figure 8 illustrates an example of what a Broadening Bottom pattern could look like.


Figure 8: Example of what a Broadening Bottom pattern could look like, where E1 to E5 denote the extremes of the pattern.

Rectangle Top (RT) - Bearish pattern

The RT-pattern is defined by five consecutive local extremes that have to satisfy the following conditions:

• The 1st, 3rd and 5th extremes are local maxima.

• The 2nd and 4th extremes are local minima.

• The 1st, 3rd and 5th maxima, as well as the 2nd and 4th minima, must be within 0.75% of their respective averages.

• The highest local minimum must be lower than the lowest local maximum.

• The asset's price when performing the trade must be below a linear line between the two local minima, see Figure 18 and the sketch after Figure 9 below. This is an additional constraint added in this paper in order to categorize the pattern as bearish.


Figure 9: Example of what a Rectangle Top pattern could look like, where E1 to E5 denote the extremes of the pattern and the dotted line represents the linear line between the two local minima.
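The extra bearish constraint can be sketched as follows: extrapolate the straight line through the two local minima (E2 and E4) to the trading day and require the price to lie below it. The day-index arguments and function name are illustrative assumptions of mine.

```python
def below_minima_line(t2: int, p2: float, t4: int, p4: float,
                      t_trade: int, price_trade: float) -> bool:
    """RT filter: is the trading-day price below the straight line through
    the two local minima (t2, p2) and (t4, p4)? The mirrored check, price
    above the line through the two maxima, gives the bullish RB filter."""
    slope = (p4 - p2) / (t4 - t2)
    line_at_trade = p2 + slope * (t_trade - t2)
    return price_trade < line_at_trade
```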

Rectangle Bottom (RB) - Bullish pattern

The RB-pattern is the inverse of the RT-pattern and is therefore also defined by five consecutive local extremes, which have to satisfy the following conditions:

• The 1st, 3rd and 5th extremes are local minima.

• The 2nd and 4th extremes are local maxima.

• The 1st, 3rd and 5th minima, as well as the 2nd and 4th maxima, must be within 0.75% of their respective averages.

• The lowest local maximum must be higher than the highest local minimum.

• The asset's price when performing the trade must be above a linear line between the two local maxima, see Figure 19. This is an additional constraint added in this paper in order to categorize the pattern as bullish.

Figure 10 illustrates an example of what a Rectangle Bottom pattern could look like.


Figure 10: Example of what a Rectangle Bottom pattern could look like, where E1 to E5 denote the extremes of the pattern and the dotted line represents the linear line between the two local maxima.

4.2.2 Smoothening of price series

Time series of financial assets are extremely noisy, and looking for every local extreme in such data would not be meaningful. There are different methods to smoothen the data, for example kernel regression and local polynomial regression. In this thesis, I have used non-parametric kernel regression, the general theory of which is described in detail in Section 3.1. When trying to find a curve that describes a general relationship between an explanatory variable X and a response variable Y, Equation 1 states that it can be described as:

$$Y_i = m(X_i) + \epsilon_i, \qquad i = 1, \ldots, n$$

where m is the unknown regression function and $\epsilon_i$ are the observation errors.

In this paper the data set consists of daily stock prices, and the data set for one stock is defined as $\{(X_i, P_i)\}_{i=1}^{T}$, where $P_i$ is the price of the stock on day $X_i$. In other words, X is the explanatory variable and P is the response variable. Inserting this into the smoothing estimator of Equation 2, with the kernel weights of Equation 4, gives the Nadaraya-Watson estimator of Equation 7 applied to the price series, where $K_h$ is the kernel with the bandwidth h. In this paper the kernel is the Gaussian distribution from Equation 8, scaled by the bandwidth h. To evaluate how the choice of the bandwidth affects the patterns being detected, a multiple of 1.5 of h was also tested.

4.2.3 Rolling window

Once an approach to smoothen the price of a stock has been obtained, a rolling window is applied. This means that the entire price series of a stock is divided into overlapping windows of a certain length, where the first days of two consecutive windows differ by one day. Figures 11 and 12 illustrate how the entire price series is divided into subseries of two consecutive windows using the rolling window approach. There are two reasons for using the rolling window approach and dividing the price series into overlapping windows:

• It provides the opportunity to decide the maximum number of days over which a pattern can occur.

• It eliminates the look-ahead bias, which will be described later in Section 4.2.5.

In this paper the length of each window was selected to be 63 days. Lo, Mamaysky, and Wang (2000) used a length of 38 days for each window, but the study of 53 different patterns by Bulkowski (2005) showed that most patterns can form over time periods longer than 38 trading days. Therefore, the length of each window was chosen to be 63 days, which is the same number of days used in the study by Savin, Weller, and Zvingelis (2007), where they analyzed the Head & Shoulder pattern.

Figure 11: Illustration of the rolling window from $P_1$ to $P_{1+n-1}$.

Figure 12: Illustration of the rolling window from $P_2$ to $P_{2+n-1}$.

Given the price series $\{P_1, \ldots, P_T\}$, windows were created from t to t + n - 1, where t varies from 1 to T - n + 1 and n, the selected length of each window, was 63 days. This ensures that a rolling window of 63 days is created over the entire price series.

For every window, the Nadaraya-Watson estimator was used to smoothen the price in the window. Each window of prices was smoothened by:

$$\hat{m}_{t,n}(x) = \frac{\sum_{j=t}^{t+n-1} K_h(x - X_j) P_j}{\sum_{j=t}^{t+n-1} K_h(x - X_j)}, \qquad t = 1, \ldots, T - n + 1 \tag{15}$$
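A sketch of this windowing and per-window smoothing is given below. The bandwidth value is a placeholder, since the thesis's data-driven bandwidth rule is not reproduced in this extract; the function names are my own.

```python
import numpy as np

def smooth_window(window: np.ndarray, h: float) -> np.ndarray:
    """Nadaraya-Watson smoothing of one price window against the day index,
    with a Gaussian kernel (the normalizing constant cancels in the ratio)."""
    x = np.arange(len(window), dtype=float)
    w = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2.0 * h**2))
    return (w * window).sum(axis=1) / w.sum(axis=1)

def rolling_smoothed_windows(prices: np.ndarray, n: int = 63, h: float = 2.0):
    """Yield (t, smoothed window) for every window P_t..P_{t+n-1}; windows
    overlap and consecutive windows start one day apart. h is a placeholder
    bandwidth, not the thesis's choice."""
    for t in range(len(prices) - n + 1):
        yield t, smooth_window(prices[t:t + n], h)
```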


4.2.4 Locating the extremes

Since the patterns in Section 4.2.1 are defined by local maxima and minima, the extremes in the price for each window must be identified. This was done in two steps. First, the extremes in the smoothened price series $\hat{m}_{t,n}(x)$ had to be identified. A point $\hat{m}_{t,n}(x_t)$ is a local maximum if:

$$\hat{m}_{t,n}(x_{t-1}) < \hat{m}_{t,n}(x_t) \quad \text{and} \quad \hat{m}_{t,n}(x_t) > \hat{m}_{t,n}(x_{t+1})$$

and a point $\hat{m}_{t,n}(x_t)$ is a local minimum if:

$$\hat{m}_{t,n}(x_{t-1}) > \hat{m}_{t,n}(x_t) \quad \text{and} \quad \hat{m}_{t,n}(x_t) < \hat{m}_{t,n}(x_{t+1})$$

Once a local maximum or minimum was identified at $\hat{m}_{t,n}(x_t)$, the original price series was examined over $[P_{t-1}, P_{t+1}]$ to ensure that the extreme of the smoothened price corresponded to an extreme in the original price curve as well. These extremes were labeled as Important extremes. After these two steps were applied, the Important extremes could be found in the original price series with the help of the smoothened price series. See Figure 13 for an example.

Figure 13: Illustration of the difference between an extreme in the smoothened price and an Important extreme in the original price series.
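The two-step location of Important extremes can be sketched as follows: find the turning points of the smoothed series, then refine each one against the original prices in the band [t - 1, t + 1]. The function name and the (day index, price, is_max) output format are my own choices.

```python
import numpy as np

def important_extremes(prices: np.ndarray, smoothed: np.ndarray):
    """Return (day index, price, is_max) tuples for the Important extremes:
    turning points of the smoothed series, refined to the actual extreme of
    the original prices within one day on either side."""
    extremes = []
    for t in range(1, len(smoothed) - 1):
        if smoothed[t - 1] < smoothed[t] > smoothed[t + 1]:    # local maximum
            t0 = t - 1 + int(np.argmax(prices[t - 1:t + 2]))
            extremes.append((t0, float(prices[t0]), True))
        elif smoothed[t - 1] > smoothed[t] < smoothed[t + 1]:  # local minimum
            t0 = t - 1 + int(np.argmin(prices[t - 1:t + 2]))
            extremes.append((t0, float(prices[t0]), False))
    return extremes
```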

4.2.5 Finding the patterns

Once the Important extremes have been located, patterns can be identified by testing five consecutive extremes against the definitions in Section 4.2.1. However, it is not enough that five consecutive extremes satisfying the conditions for a pattern are located. In addition, the last extreme must also occur at position n - 3 in the window. This means that there is a lag of 3 days between the last extreme and the day the pattern is considered identified, which ensures that no look-ahead bias enters when the returns of a pattern are later calculated. Since the kernel estimator $\hat{m}_{t,n}(x_t)$ smoothens the price based on information from both before and after a point t, calculating returns from that point would condition the returns of a pattern on information that was not yet available when the extreme at point t was determined, leading to a look-ahead bias. Therefore, the last of the five consecutive extremes that satisfy the conditions of a pattern must be located at day n - 3 within a window, and returns are calculated from point n, where n in this paper was selected to be 63 days.
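This timing rule can be expressed as a small filter on the extremes of one window, sketched below with the thesis's 1-based day numbering (day n is the last day of the window). The tuple format matches the extreme-location sketch above and is my own assumption.

```python
def tradable_pattern(extremes, n: int = 63):
    """Given the Important extremes of one window as (day, price, is_max)
    tuples with 1-based days, return the final five extremes only if the
    last one falls on day n - 3. The 3-day lag keeps the two-sided kernel
    smoother from leaking future information into the trade taken at day n."""
    if len(extremes) < 5 or extremes[-1][0] != n - 3:
        return None
    return extremes[-5:]  # candidate pattern for the checks of Section 4.2.1
```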

4.3 Evaluating the performance of the patterns

When computing the returns conditioned on a pattern, and when evaluating the performance of a strategy based on a pattern, the following assumptions were made:

• The transaction costs are negligible.

• The performance of a strategy is not evaluated based on the gain in capital, but rather as a percentage of the increment or decrement relative to the trading price.

• Every time a pattern is identified, a trade can be made. There is never a lack of capital.

• The conditional excess returns follow a normal distribution.

4.3.1 Conditional excess returns

For every pattern detected, Lo, Mamaysky, and Wang (2000) calculated the continuously compounded return for one day, but as Savin, Weller, and Zvingelis (2007) mention, the time horizons stated in technical trading manuals are often longer, even if there is no clear consensus on how long the horizon should be. For this paper, time horizons of 10, 20 and 30 days were used to calculate the continuously compounded returns once a pattern was detected. Suppose that a pattern is identified in window t, where t = 1, ..., T - n + 1; then the continuously compounded return is calculated by:

$$r_{t,k} = \ln\left(\frac{P_{t+n+k}}{P_{t+n}}\right), \qquad k = 10, 20, 30 \tag{16}$$

To calculate the excess return over this period, the daily 1 month Treasury bill (T-bill) rate is continuously compounded over the same period k and then subtracted from $r_{t,k}$. The six patterns defined in Section 4.2.1 are either bullish or bearish. For a bullish pattern to generate positive returns, the conditional excess returns should be positive, and for a bearish pattern to generate positive returns, the conditional excess returns should be negative.

For all stocks, the excess returns conditioned on a specific pattern were calculated. Then the mean and standard deviation were calculated for each specific pattern, each value of k and both bandwidth multiples, 1.0 and 1.5, to evaluate the predictive power of the patterns.
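A sketch of this computation follows. The 0-based array indexing and the assumption that `tbill_daily` holds daily continuously compounded risk-free rates aligned with `prices` are mine, for illustration.

```python
import numpy as np

def conditional_excess_return(prices: np.ndarray, tbill_daily: np.ndarray,
                              t: int, n: int = 63, k: int = 20) -> float:
    """Excess k-day continuously compounded return after a pattern completed
    in the window starting at t (Equation 16): ln(P_{t+n+k} / P_{t+n}) minus
    the T-bill rate compounded over the same k days."""
    r = np.log(prices[t + n + k] / prices[t + n])
    rf = tbill_daily[t + n:t + n + k].sum()  # compounded risk-free rate
    return float(r - rf)
```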

4.3.2 Hypothesis testing

Every pattern was evaluated on its own. To evaluate the performance of the conditional excess returns, two steps were used to determine whether each pattern could generate excess returns:

1. Determine if the mean of the conditional excess returns is greater than zero for bullish patterns and less than zero for bearish patterns.

2. Create a 95% confidence interval of the conditional excess returns for the bullish and bearish patterns that passed step 1.

The confidence interval was only computed on the training set, to test the statistical significance of each pattern and each value of k.
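Under the normality assumption of Section 4.3, the interval can be computed with a normal approximation, as in the sketch below. Applied to the first row of Table 1 (mean 0.003830, standard deviation 0.044730, 2919 patterns), it reproduces the reported interval (0.002207, 0.005452).

```python
import numpy as np

def mean_ci95(returns):
    """Mean and 95% confidence interval of the conditional excess returns,
    using the normal approximation mean +/- 1.96 * s / sqrt(N)."""
    r = np.asarray(returns, dtype=float)
    half_width = 1.96 * r.std(ddof=1) / np.sqrt(len(r))
    return r.mean(), r.mean() - half_width, r.mean() + half_width
```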

4.3.3 Calculating portfolio returns and risk adjustment of returns

Testing whether each pattern can generate excess returns does not give enough information; the returns also had to be adjusted for the related risk. This was done with the Sharpe ratio from Equation 12. In order to calculate the Sharpe ratio of a strategy, the development of the portfolio over time had to be estimated. To estimate the portfolio return of a strategy, the one-day returns of each asset were calculated. Then, for each day, the mean of the one-day returns of all assets held on that day was calculated. By applying this for each pattern, holding period and bandwidth multiple, a daily portfolio value was obtained.

Once a daily portfolio value was obtained for all days in the period, the average annualized Sharpe ratio was computed by:

$$\text{Sharpe Ratio} = \frac{\bar{r}_p - \bar{r}_f}{\sigma_p} \cdot \sqrt{252} \tag{17}$$

where $\bar{r}_p$ is the mean daily portfolio log return, $\bar{r}_f$ is the mean daily risk-free log rate, $\sigma_p$ is the standard deviation of the daily portfolio log return minus the daily risk-free log rate, and the multiplication by $\sqrt{252}$ annualizes the Sharpe ratio.

To calculate the average of log returns over a period, the following formula was used:

$$\bar{r} = \frac{\sum_{t=1}^{T} r_t}{T}$$

where T is the number of returns during the period and $r_t$ is each individual log return during that period.
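A minimal sketch of Equation 17, assuming aligned arrays of daily portfolio log returns and daily risk-free log rates:

```python
import numpy as np

def annualized_sharpe(portfolio_log_returns, rf_log_rates) -> float:
    """Mean annualized Sharpe ratio of Equation 17: mean daily excess log
    return over its standard deviation, scaled by sqrt(252)."""
    excess = np.asarray(portfolio_log_returns) - np.asarray(rf_log_rates)
    return float(excess.mean() / excess.std(ddof=1) * np.sqrt(252))
```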

4.3.4 Differences in evaluation of the training versus the test data

• The lengths of the sample periods. The training data consists of the dates between 2010-01-04 and 2017-12-29, and the test data consists of the dates between 2018-01-02 and 2020-12-31.

• The hypothesis testing was only performed on the training data.

• The average annualized Sharpe ratios were calculated for both data samples. For the training data, the Sharpe ratios were calculated for all patterns, holding periods and bandwidth multiples that had passed the hypothesis testing. For the test data, the Sharpe ratios were only calculated for the best combination of holding period and bandwidth multiple of each pattern from the training data.

5 Results

The results obtained in this study are divided into the following parts:

• Empirical examples of identified patterns.

• Results from the training sample.

• Results from the testing sample.

5.1 Examples of identified patterns

As described in Section 4.2, I used non-parametric kernel regression to identify each of the six described patterns: HS, IHS, BT, BB, RT and RB. Figures 14-19 show empirical examples of patterns detected in my analyses. To detect the patterns, I used two different multiples of the bandwidth, 1.0 and 1.5, in order to evaluate two degrees of smoothening. A higher multiple of the bandwidth results in a more smoothened curve and vice versa. In general, more patterns were detected using the bandwidth multiple of 1.0. The most common pattern detected in the training data was the IHS-pattern, for which I found between 2532 and 3061 unique patterns, see Table 2. The least common pattern in the training data was the RB-pattern, for which I found between 1514 and 1781 unique patterns, see Table 6.

Figure 14: Example of a Head & Shoulder pattern found within the 63-day window for stock symbol MMN.N at day 265-328. The blue vertical line represents the position in the window where the last extreme must be detected.


Figure 15: Example of an Inverse Head & Shoulder pattern found within the 63-day window for stock symbol MMN.N at day 124-187. The blue vertical line represents the position in the window where the last extreme must be detected.

Figure 16: Example of a Broadening Top pattern found within the 63-day window for stock symbol MMN.N at day 227-290. The blue vertical line represents the position in the window where the last extreme must be detected.

Figure 17: Example of a Broadening Bottom pattern found within the 63-day window for stock symbol MMN.N at day 1272-1335. The blue vertical line represents the position in the window where the last extreme must be detected.

Figure 18: Example of a Rectangle Top pattern found within the 63-day window for stock symbol MMN.N at day 194-257. The blue vertical line represents the position in the window where the last extreme must be detected. The black line is the linear line between the two minima. The value of the stock at the last day of the window must be below this line for the pattern to be recognized.


Figure 19: Example of a Rectangle Bottom pattern found within the 63-day window for stock symbol MMN.N at day 1261-1324. The blue vertical line represents the position in the window where the last extreme must be detected. The black line is the linear line between the two maxima. The value of the stock at the last day of the window must be above this line for the pattern to be recognized.

5.2 Results from the training sample

5.2.1 Conditional excess returns

In order to evaluate the performance of the patterns, I calculated the means, the standard deviations and the 95% confidence intervals for each pattern. I made these calculations for all holding periods k, where k = 10, 20 and 30 days. Similar to Savin, Weller, and Zvingelis (2007), I used two different multiples of the bandwidth, set to 1.0 and 1.5. These data, together with the Sharpe ratio described in Section 3.2.1 calculated on the portfolio returns, were used to select which patterns and parameters should be applied to the test data. All results from these calculations are presented in Tables 1-6. As stated in Section 4.3.1, all bullish patterns (IHS, BB, RB) should generate excess returns if they result in a positive mean return. Conversely, all bearish patterns (HS, BT, RT) should generate excess returns if they result in a negative mean return. However, I found that all patterns, holding periods and bandwidth multiples generated positive mean returns. This indicates that none of the bearish patterns were able to generate excess returns, and they were therefore not considered further.


Table 1: Means, standard deviations, 95% confidence intervals of returns and number of Head & Shoulder patterns found for different holding periods and multiples of the bandwidth.

Holding period (=k) | Bandwidth multiple | Mean | Standard deviation | Lower 95% CI | Upper 95% CI | Patterns found
10 | 1.0 | 0.003830 | 0.044730 | 0.002207 | 0.005452 | 2919
10 | 1.5 | 0.004015 | 0.046422 | 0.002189 | 0.005841 | 2483
20 | 1.0 | 0.009103 | 0.062431 | 0.006831 | 0.011375 | 2901
20 | 1.5 | 0.009951 | 0.062287 | 0.007494 | 0.012409 | 2468
30 | 1.0 | 0.013824 | 0.074804 | 0.011096 | 0.016552 | 2888
30 | 1.5 | 0.014950 | 0.073902 | 0.012025 | 0.017874 | 2453

Table 2: Means, standard deviations, 95% confidence intervals of returns and number of Inverse Head & Shoulder patterns found for different holding periods and multiples of the bandwidth.

Holding period (=k) | Bandwidth multiple | Mean | Standard deviation | Lower 95% CI | Upper 95% CI | Patterns found
10 | 1.0 | 0.002112 | 0.043640 | 0.000571 | 0.003653 | 3028
10 | 1.5 | 0.001493 | 0.043012 | -0.000168 | 0.003154 | 2577
20 | 1.0 | 0.005087 | 0.062030 | 0.002889 | 0.007284 | 3061
20 | 1.5 | 0.005449 | 0.059955 | 0.003126 | 0.007772 | 2559
30 | 1.0 | 0.006196 | 0.078029 | 0.003421 | 0.008971 | 3037
30 | 1.5 | 0.006307 | 0.076011 | 0.003352 | 0.009262 | 2542

Table 3: Means, standard deviations, 95% confidence intervals of returns and number of Broadening Top patterns found for different holding periods and multiples of the bandwidth.

Holding period (=k) | Bandwidth multiple | Mean | Standard deviation | Lower 95% CI | Upper 95% CI | Patterns found
10 | 1.0 | 0.000931 | 0.048683 | -0.001017 | 0.002880 | 2397
10 | 1.5 | 0.000746 | 0.050152 | -0.001236 | 0.002728 | 2459
20 | 1.0 | 0.004578 | 0.068092 | 0.001840 | 0.007316 | 2376
20 | 1.5 | 0.002569 | 0.070288 | -0.000216 | 0.005354 | 2447
30 | 1.0 | 0.007311 | 0.083403 | 0.003953 | 0.010669 | 2370
30 | 1.5 | 0.006572 | 0.084772 | 0.003207 | 0.009938 | 2438


Table 4: Means, standard deviations, 95% confidence intervals of returns and number of Broadening Bottom patterns found for different holding periods and multiples of the bandwidth.

Holding period (=k) | Bandwidth multiple | Mean | Standard deviation | Lower 95% CI | Upper 95% CI | Patterns found
10 | 1.0 | 0.000689 | 0.057276 | -0.001767 | 0.003145 | 2089
10 | 1.5 | 0.000396 | 0.056435 | -0.002019 | 0.002810 | 2099
20 | 1.0 | 0.004562 | 0.078618 | 0.001182 | 0.007941 | 2079
20 | 1.5 | 0.004823 | 0.079264 | 0.001427 | 0.008220 | 2092
30 | 1.0 | 0.005988 | 0.093291 | 0.001970 | 0.010005 | 2071
30 | 1.5 | 0.005308 | 0.091648 | 0.001373 | 0.009242 | 2084

Table 5: Means, standard deviations, 95% confidence intervals of returns and number of Rectangle Top patterns found for different holding periods and multiples of the bandwidth.

Holding period (=k) | Bandwidth multiple | Mean | Standard deviation | Lower 95% CI | Upper 95% CI | Patterns found
10 | 1.0 | 0.004524 | 0.043670 | 0.002500 | 0.006548 | 1789
10 | 1.5 | 0.004879 | 0.043129 | 0.002739 | 0.007019 | 1560
20 | 1.0 | 0.011320 | 0.061586 | 0.008458 | 0.014183 | 1778
20 | 1.5 | 0.009607 | 0.060134 | 0.006614 | 0.012599 | 1551
30 | 1.0 | 0.015238 | 0.077376 | 0.011633 | 0.018843 | 1770
30 | 1.5 | 0.013230 | 0.075777 | 0.009453 | 0.017008 | 1546

Table 6: Means, standard deviations, 95% confidence intervals of returns and number of Rectangle Bottom patterns found for different holding periods and multiples of the bandwidth.

Holding period (=k) | Bandwidth multiple | Mean | Standard deviation | Lower 95% CI | Upper 95% CI | Patterns found
10 | 1.0 | 0.003245 | 0.039920 | 0.001391 | 0.005099 | 1781
10 | 1.5 | 0.003025 | 0.041781 | 0.000931 | 0.005118 | 1530
20 | 1.0 | 0.007340 | 0.056522 | 0.004708 | 0.009973 | 1771
20 | 1.5 | 0.006278 | 0.057483 | 0.003391 | 0.009166 | 1523

5.2.2 Risk-adjusted portfolio returns

The mean annualized Sharpe ratios for the patterns and parameter combinations that passed the hypothesis testing are presented in Tables 7-9. I compared the best combinations of parameters, marked with an asterisk in Tables 7-9, with the mean annualized Sharpe ratio of a Buy-and-hold strategy, see Table 10. The results show that the IHS- and RB-patterns both had combinations of parameters that could outperform the Buy-and-hold strategy on a risk-adjusted basis, while no combination of the BB-pattern managed to outperform the Buy-and-hold strategy.

Table 7: Sharpe ratios of Inverse Head & Shoulder patterns found for different holding periods and multiples of the bandwidth. The value marked with * is the highest Sharpe ratio.

Holding period (=k) | Bandwidth multiple | Sharpe ratio
10 | 1.0 | 0.904329 *
20 | 1.0 | 0.895330
20 | 1.5 | 0.832638
30 | 1.0 | 0.813140
30 | 1.5 | 0.764897

Table 8: Sharpe ratios of Broadening Bottom patterns found for different holding periods and multiples of the bandwidth. The value marked with * is the highest Sharpe ratio.

Holding period (=k) | Bandwidth multiple | Sharpe ratio
20 | 1.0 | 0.592857
20 | 1.5 | 0.538260
30 | 1.0 | 0.674423 *
30 | 1.5 | 0.662958

Table 9: Sharpe ratios of Rectangle Bottom patterns found for different holding periods and multiples of the bandwidth. The value marked with * is the highest Sharpe ratio.

Holding period (=k) | Bandwidth multiple | Sharpe ratio
10 | 1.0 | 0.688960
10 | 1.5 | 0.689927
20 | 1.0 | 0.833191 *
20 | 1.5 | 0.671701
30 | 1.0 | 0.820073
30 | 1.5 | 0.725508

Table 10: The mean annualized Sharpe ratio of the Buy-and-hold strategy during the training period.

Training data | Sharpe ratio
Buy-and-hold strategy | 0.706497

5.3 Results from the test sample

Based on the results from the training data, I selected the best performing combination for each of the bullish patterns to be used on the test sample. Those combinations are marked with an asterisk in Tables 7-9. I noted that the bandwidth multiple that provided the highest mean annualized Sharpe ratio was 1.0 for all three patterns. However, the patterns differed in which holding period gave the highest mean annualized Sharpe ratio: for the IHS-pattern the optimal holding period was 10 days, and for the BB-pattern and the RB-pattern the optimal holding periods were 30 and 20 days, respectively.

When analyzing the test samples using the optimal combinations of holding periods and bandwidth multiples from the training samples, I found that these strategies in general performed worse on the test period than on the training data. Both the IHS-pattern and the RB-pattern resulted in negative mean annualized Sharpe ratios. The BB-pattern resulted in a positive mean annualized Sharpe ratio, but it was lower than the corresponding ratio from the training sample.

In order to evaluate what these values mean, they need to be compared with how the portfolio would have performed relative to some benchmark. I therefore calculated the mean annualized Sharpe ratio had one bought an equal part of each asset at the starting date (the Buy-and-hold strategy), see Table 12. Similar to the comparison of the pattern performances between the training and the test data, the Buy-and-hold strategy resulted in a lower Sharpe ratio for the test data relative to the training data. This indicates that the overall performance of the assets has probably been worse during the test period. A possible explanation could be that the test period partly overlaps with the Covid-19 pandemic. However, the most interesting conclusion was that none of the pattern-based strategies that could outperform the Buy-and-hold strategy during the training period could outperform this benchmark during the test period. In contrast, the only bullish pattern (the BB-pattern) that could not beat the Buy-and-hold strategy during the training period was the only strategy that outperformed it during the test period.

Table 11: The mean annualized Sharpe ratio during the test period, for the best performing strategies from the training period.

Pattern | Holding period (=k) | Bandwidth multiple | Sharpe ratio
Inverse Head & Shoulder | 10 | 1.0 | -0.002192
Broadening Bottom | 30 | 1.0 | 0.242376
Rectangle Bottom | 20 | 1.0 | -0.337638

Table 12: The mean annualized Sharpe ratio of the Buy-and-hold strategy during the test period.

Test data | Sharpe ratio
Buy-and-hold strategy | 0.212277

References
