Past predicts the future: A study on how machine learning can be used for investments

TVE 16 013 May

Degree project, 15 credits

2016-06-07

Past predicts the future

A study on how machine learning can be used for investments

Iliam Barkino

Mattias Bertolino

Faculty of Science and Technology, UTH unit. Visiting address: Ångströmlaboratoriet, Lägerhyddsvägen 1, House 4, Floor 0. Postal address: Box 536, 751 21 Uppsala. Telephone: 018 – 471 30 03. Fax: 018 – 471 30 00. Website: http://www.teknat.uu.se/student

Abstract

Past predicts the future - A study on how machine learning can be used for investments

Iliam Barkino & Mattias Bertolino

Moving Average Crossover and Multiple Linear Regression are two investment methods in use today. They were tested empirically on historic data from 2009 to 2015. In particular, they yield relatively good investment advice for assets classified as Fixed Income. How the data is weighted matters little when using Moving Average Crossover. Instead, the most important variables are the length of the long moving average filter and the asset class to which the method is applied. The results showed that at least two years of past data is needed for the long filter in order to yield relatively acceptable Information Ratios of 0.4-0.9.

In Multiple Linear Regression, the most important step is designing the predictive model, as the selection of covariates assumed to be predictive is crucial to the certainty of the prediction. Other important parameters are the length of the filter and the value chosen for the ridge tuning parameter, as some asset classes need to be regularized more than others when fitting a linear model to them. The results showed that when using Multiple Linear Regression, different groups of assets needed different values of the tuning parameter and filter length in order to obtain good Information Ratios. That is, no single value of any parameter yielded good results for all groups of assets.

Generally, when applying a filter to an asset, it is important to have some indication of the length of its market trends as the filter needs to be long enough to foresee a change of trend, but short enough to be computationally manageable.

Examiner: Martin Sjödin. Subject reader: Alex Basu

Acknowledgments

We, the authors, would like to thank our families and friends for their love, for their continuous support throughout the project, and for their understanding of our absence. We would also like to extend a thank you to our appointed thesis opponents and classmates, Håkan Öhrn and Adam Lindell, for their valuable feedback and suggestions on how to improve this thesis, as well as our project mentor, Alex Basu, and our supervisor at the IT-institution, Per Lötstedt, for their continuous feedback on our progress. Finally, we would like to give special thanks to Lynx Asset Management for providing us with data, and to our main supervisor from Lynx, Tobias Rydén, for his experienced guidance, for his valuable reflections on every part of the project and for always being available as a sounding board. All of your input has contributed considerably to this thesis.

Iliam Barkino Mattias Bertolino

7th June 2016 Uppsala University

Keywords

Moving Average Crossover (MAC), Multiple Linear Regression (MLR), Ordinary Least Squares (OLS), Ridge Regression, Information Ratio, Futures.

1 Introduction

1.1 Background

Historic data can be used to predict future data values. This is very useful in various disciplines, e.g. weather forecasting, advertising, life science or finance. In the latter, hedge funds have specialized in calculating future values and predicting changes in asset values, i.e. price changes of fixed income, commodities, equities or foreign exchange.

There are numerous tools for deciding whether or not to invest in a specific asset. Two methods have been especially useful during the past decades: Moving Average Crossover (MAC) and Multiple Linear Regression (MLR). MAC gave reliable indications of whether to enter a long or short position in a future until around 2009, when it stopped being useful.[1] This raises the question of how MAC worked and whether it is still suitable to use, given specific classes of assets or other constraints.

Furthermore, a well-fitted MLR model to certain assets is a powerful tool for determining future values of assets and therefore very useful for hedge funds and private investors, making the method intriguing for examination.

1.2 Problem description

Various parameters affect the result of MAC and MLR. The set of variables used in MAC could explain why the method has been inefficient since 2009, and these variables should therefore be tested in order to shed light on the efficiency of MAC today. Accordingly, various parameters in MLR should be tested to determine how to streamline the method.

1.3 Purpose

The purpose of this report is to use methods from the fields of machine learning, statistics and system engineering to construct two investment methods, MAC and MLR. The utility of the methods is also examined to determine which parameters optimize their performance and whether the methods can be used commercially.

2 Theory

2.1 Description of assets

In Table 1, a description of the assets used in this report is presented.

Table 1: The assets used in this report

Asset      Name                                        Description
CAC        Cotation Assistée en Continu                40 largest equities listed in France
Canada60   S&P Canada 60 Index                         Canadian Stock Market Index
DAX        Deutsche Boerse AG German Stock Index       30 largest equities listed in Germany
DOW        Dow Jones Americas Financial Index          Price-weighted average of 30 blue-chip stocks
Estoxx     The EURO STOXX 50                           Leading blue chip index for the Eurozone
FTSE       Financial Times Stock Exchange 100 Index    100 largest companies listed in London
Hangseng   Hong Kong Hang Seng Index                   50 largest stocks listed in Hong Kong
Nasdaq     NASDAQ Stock Market
Nikkei     Nikkei 225                                  Index for Tokyo Stock Exchange
OMX        OMX Stockholm 30                            30 largest companies listed in Stockholm
SP500      The Standard & Poor's 500                   500 leading companies in US
SPI        Swiss Performance Index                     Switzerland's overall stock market index
Taiwan     Taiwan Cap. Weighted Stock Index
10ynote    U.S. 10 Year Treasury Note Futures
5ynote     U.S. 5 Year Treasury Note Futures
aus10y     Aus. 10 Year Treasury Note Futures
bobl       Bundesobligationen                          Medium-term futures issued in Germany
bund       Bund                                        Long-term futures issued in Germany
cgb10y     China Government 10 Year Bond               Futures issued by China Government
gilt       Gilt-edged Security                         U.K. equivalent to U.S. Treasury securities
jgb        Japanese Government Bond
tbond      U.S. Treasury Bond
Aluminium  Price of aluminium
Brent      Price of brent oil
Copper     Price of copper
Corn       Price of corn
Crude      Price of crude oil
Gold       Price of gold
Natgas     Price of natural gas
Rbob       Price of RBOB gasoline
Silver     Price of silver
Soybeans   Price of soybeans
Sugar      Price of sugar
Wheat      Price of wheat
Zinc       Price of zinc
AUD        Australian Dollar
CAD        Canadian Dollar
EUR        Euro
GBP        Pound Sterling
JPY        Japanese Yen

These assets are categorized into four asset classes: equities, fixed income, commodities and foreign exchange. In Table 2 the assets in each class are presented.

Table 2: Assets and asset classes used.

Equities    Fixed Incomes   Commodities   Foreign Exchange
CAC         10ynote         Aluminium     AUD
Canada60    5ynote          Brent         CAD
DAX         aus10y          Copper        EUR
DOW         bobl            Corn          GBP
Estoxx      bund            Crude         JPY
FTSE        cgb10y          Gold
Hangseng    gilt            Natgas
Nasdaq      jgb             Rbob
Nikkei      tbond           Silver
OMX                         Soybeans
SP500                       Sugar
SPI                         Wheat
Taiwan                      Zinc

These four asset classes have different features and characteristics, which are important to consider when using machine learning to predict future price changes. Different asset classes may have different underlying market forces and respond differently to sudden macroeconomic impacts. This may result in different trend lengths and volatilities, which are important conditions when designing a filter to predict future changes in price.

2.2 Financial derivatives

A forward is a mutual agreement on a future transaction of an underlying asset between two parties, in which one party delivers the asset to the other party for a price specified at the agreement date. The underlying asset to be delivered can be commodities, fixed income, foreign currencies, bonds or stocks. Both parties are bound to fulfill their part of the transaction, which results in a risk that one or both parties default.[3]

Futures are similar to forwards in the sense that they are agreements between two parties on an asset. Futures contracts can however, in addition to securing a price on the underlying asset for a maturity date, be rolled over to new contracts in order to avoid buying or selling the underlying asset, i.e. futures contracts can be exited. Futures can therefore be used to "bet" on changes in asset prices. The reason for this is that futures are traded on an exchange, while forwards are over-the-counter (OTC) products, i.e. the dealers buy and sell the products directly with each other rather than publicly at specific market prices. When entering a future, both parties make a deposit of cash. For every daily change in the futures price, cash is taken from the deposit of the party for which the price change is unfavorable and given to the other party. The buying side, i.e. the side that wants the price of the asset to rise, is said to have entered a long position. The selling side is said to have a short position. [4]

rather than to hedge against price changes. Moreover, the performance of the investment methods will be measured by, and compared to each other through, their Information Ratios.

2.3 Information Ratio

The Information Ratio is a measure of how a security performs in relation to its risk. It is calculated as follows. Let $R_{pt}$ denote the return of an active portfolio in period $t$ and $R_{Bt}$ denote the return of a benchmark in the same period $t$. The excess return, $ER_t$, can then be defined as:

\[ ER_t = R_{pt} - R_{Bt} \tag{1} \]

The arithmetic average of $ER_t$ over the historic time $1 \le t \le T$, $\overline{ER}$, can be calculated as:

\[ \overline{ER} = \frac{1}{T} \sum_{t=1}^{T} ER_t \tag{2} \]

Let now $\hat{\sigma}_{ER}$ denote the standard deviation of $ER_t$ over the same period. It is then given by:

\[ \hat{\sigma}_{ER} = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T} \left( ER_t - \overline{ER} \right)^2} \tag{3} \]

A historic Information Ratio, which is used in this study, is then given by:

\[ IR = \frac{\overline{ER}}{\hat{\sigma}_{ER}} \tag{4} \]

[5]
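The calculation in equations (1) to (4) can be illustrated with a minimal sketch. The function name `information_ratio` and the example numbers are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def information_ratio(portfolio_returns, benchmark_returns):
    """Historic Information Ratio following equations (1)-(4)."""
    if len(portfolio_returns) != len(benchmark_returns):
        raise ValueError("return series must have the same length")
    # Equation (1): excess return per period.
    excess = [rp - rb for rp, rb in zip(portfolio_returns, benchmark_returns)]
    T = len(excess)
    if T < 2:
        raise ValueError("at least two periods are needed")
    # Equation (2): arithmetic mean of the excess returns.
    mean_excess = sum(excess) / T
    # Equation (3): sample standard deviation with the 1/(T-1) factor.
    variance = sum((e - mean_excess) ** 2 for e in excess) / (T - 1)
    std_excess = math.sqrt(variance)
    if std_excess == 0.0:
        raise ValueError("excess returns have zero variance")
    # Equation (4): mean excess return per unit of risk.
    return mean_excess / std_excess


# Hypothetical example: three periods against a zero benchmark.
ir = information_ratio([0.02, 0.01, 0.03], [0.0, 0.0, 0.0])
```

In this toy example the mean excess return is 0.02 and its standard deviation is 0.01, so the ratio is about 2.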

2.4 Moving Average Crossover

A moving average is an average of the last $L$ data points, calculated for every day of the time series. It is one of the most commonly used trend estimators in finance and can be used as an indicator of whether to take a long or short position. The moving average can be viewed as a causal finite impulse response (FIR) filter with each zero possibly weighted and is defined as:

\[ M_\tau^{L} = \frac{1}{L} \sum_{i=0}^{L-1} \omega_i P_{\tau-i} \tag{5} \]

where $\tau$ is the day when the positioning is determined, $L$ is the length of the filter, $\omega_i$ are the weights of the zeros and $P_{\tau-i}$ are the prices at time $\tau - i$. A new filter can be constructed by taking the difference of a long and a short filter. The long filter will be a smoother filter with a longer response time than the short filter.

There are three widely used ways to choose the weights $\omega_i$ for day $\tau - i$. If $\omega_i = \frac{1}{L} \; \forall i$, the moving average is called a standard moving average (SMA). The drawback of this choice is that a data point from a long time ago has the same influence on the estimate of the trend as a data point much closer in time. Another choice is a set of linearly decreasing weights, where the most recent data point has the most impact on the estimate of the trend. The weights $\omega_i$ for $L$ data points are calculated as:

\[ \omega_i = \frac{L - i}{\sum_{j=1}^{L} j} \tag{6} \]

A third variation of the moving average, known as the exponentially weighted moving average (EWMA), uses exponentially decreasing weights. This choice of weights emphasizes the impact of recent data even more. The weights can be calculated as:

\[ \omega_i = \alpha (1 - \alpha)^i \tag{7} \]

which recursively gives:

\[ \omega_i = (1 - \alpha)\, \omega_{i-1} \tag{8} \]

where $\alpha$ is the smoothing factor. Therefore, EWMA is an ARMA filter of order (1,0). The different weightings for the short and long filters are presented in Figure 1.
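The three weighting schemes can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the function names are hypothetical, the weights are treated as already normalized to sum to one (so the additional 1/L factor of equation (5) is not applied), and the exponential weights of a finite filter are renormalized after truncation.

```python
def simple_weights(L):
    """Uniform weights of the standard moving average (SMA)."""
    return [1.0 / L] * L


def linear_weights(L):
    """Linearly decreasing weights, equation (6); index 0 is the newest point."""
    denom = L * (L + 1) / 2  # equals the sum 1 + 2 + ... + L
    return [(L - i) / denom for i in range(L)]


def exponential_weights(L, alpha):
    """Exponentially decreasing weights, equation (7), truncated to L terms.

    Assumption: the truncated weights are rescaled so that they sum to one.
    """
    raw = [alpha * (1 - alpha) ** i for i in range(L)]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical example with a filter of length 4 and smoothing factor 0.5.
sma = simple_weights(4)
lin = linear_weights(4)
ewma = exponential_weights(4, 0.5)
```

Each consecutive pair of exponential weights keeps the ratio (1 − α) from equation (8), since the rescaling multiplies all weights by the same constant.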

Figure 1: Relative weights of different MAC filters. [Plot: share of total weight versus past data point for simple, exponential and linear weights, each with a long and a short filter.]

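A simple strategy is to take a long position when the sign of the difference between the short and the long filter is positive and a short position when the sign is negative:

\[ \gamma_\tau = \operatorname{sgn}\left[ M_\tau^{L_{short}} - M_\tau^{L_{long}} \right] = \begin{cases} 1, & M_\tau^{L_{short}} - M_\tau^{L_{long}} > 0 \\ 0, & M_\tau^{L_{short}} - M_\tau^{L_{long}} = 0 \\ -1, & M_\tau^{L_{short}} - M_\tau^{L_{long}} < 0 \end{cases} \tag{10} \]

To facilitate the practical application of this, $\gamma_\tau$ can be set to $\gamma_{\tau-1}$ should $\gamma_\tau = 0$, resulting in:

\[ \gamma_\tau \in \{1, -1\} \tag{11} \]

The daily return for an asset $n$ at time $\tau$, $r_{\tau,n}$, is then given by:

\[ r_{\tau,n} = \Delta P_{\tau+1,n}\, \gamma_{\tau,n} \tag{12} \]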
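A minimal sketch of the crossover rule and the resulting daily returns is given below. It assumes uniform weights that sum to one, reads the price change as ΔP at τ+1 = P at τ+1 minus P at τ, and starts from a hypothetical long position before the first signal; all function names are illustrative and not taken from the study.

```python
def moving_average(prices, tau, weights):
    """Weighted average of the len(weights) most recent prices up to day tau."""
    return sum(w * prices[tau - i] for i, w in enumerate(weights))


def mac_positions(prices, L_short, L_long):
    """Positions gamma_tau from equations (10)-(11), using uniform weights."""
    short_w = [1.0 / L_short] * L_short
    long_w = [1.0 / L_long] * L_long
    positions = {}
    previous = 1  # hypothetical starting position before the first signal
    for tau in range(L_long - 1, len(prices)):
        diff = (moving_average(prices, tau, short_w)
                - moving_average(prices, tau, long_w))
        if diff > 0:
            gamma = 1
        elif diff < 0:
            gamma = -1
        else:
            gamma = previous  # tie: keep the previous position
        positions[tau] = gamma
        previous = gamma
    return positions


def mac_returns(prices, positions):
    """Equation (12): next-day price change times the position taken today."""
    return {tau: (prices[tau + 1] - prices[tau]) * gamma
            for tau, gamma in positions.items() if tau + 1 < len(prices)}


# Hypothetical price path: a rise followed by a fall.
prices = [1, 2, 3, 4, 5, 4, 3, 2, 1]
positions = mac_positions(prices, L_short=2, L_long=3)
returns = mac_returns(prices, positions)
```

On this toy path the position stays long until one day after the peak, which illustrates the response delay of the filters: the first two falling days are held long and lose, after which the short position gains.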

2.5 Multiple Linear Regression

Multiple linear regression is a technique used to make a prediction of a quantitative response $y$ by using a set of covariates, $x_j$, with regression coefficients, $\beta_j$. Common practice is to add a constant $x_0 = 1$ to get a variable intercept.[6] This yields a model given by:

\[ y = \beta_0 + x_1\beta_1 + x_2\beta_2 + \dots + x_p\beta_p + \epsilon \tag{13} \]

For $N$ observations, multiple linear regression can be expressed in matrix notation as $Y = X\beta + \epsilon$, where:

\[ Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} \tag{14} \]

are $N$ observations of the dependent variable,

\[ X = \begin{pmatrix} 1 & x_{11} & x_{12} & \dots & x_{1p} \\ 1 & x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{N1} & x_{N2} & \dots & x_{Np} \end{pmatrix} \tag{15} \]

is the design matrix constructed of $N$ observations of $p$ predictors, with an additional column of ones indicating the intercept,

\[ \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} \tag{16} \]

are the regression coefficients and:

\[ \epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_N \end{pmatrix} \tag{17} \]

is the model error, assumed to be $\epsilon \sim N(0, \sigma^2 I)$.

The regression coefficients $\beta_j$ need to be estimated assuming the error is uncorrelated and has zero mean. [6]

\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \dots + \hat{\beta}_p x_p \tag{18} \]

is used to estimate the regression coefficients. This can be done using ordinary least squares regression, or a regularized regression such as ridge regression, described below.

in MLR are dependent on the application. This can be seen in Figure 2 (a) and 2 (b), where the β-coefficients are plotted for arbitrarily chosen settings for asset 1 (CAC) and asset 36 (AUD). Notice how a large tuning parameter shrinks the coefficients more aggressively.

Figure 2: Filter coefficients when using MLR on (a) CAC and (b) AUD with filter length 100.

In various applications the variables are standardized by subtracting the mean and dividing by an estimate of the standard deviation. This makes the problem more convenient in situations where two or more dependent variables are compared on a risk-adjusted scale.

2.5.1 Ordinary Least Squares

For an overdetermined linear system of equations,

\[ Y = X\beta, \tag{19} \]

an exact solution does not exist for every choice of $y$. It is, however, possible to find an approximate solution by fitting a $p$-dimensional hyperplane to the training data. The ordinary least squares method minimizes the residual sum of squared distances from all data points to the fitted hyperplane with respect to the coefficients $\beta$. The residual sum of squares is defined as:

\[ RSS(\beta) \triangleq \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 \tag{20} \]

where $\hat{y}_i$ is the i:th estimated value of $y_i$.[7] Hence the least squares solution $\beta_{LS}$ is given by minimizing the objective function in the $L_2$-norm:

\[ \beta_{LS} = \arg\min_{\beta} \|X\beta - Y\|_2^2 \tag{21} \]

For $X \in \mathbb{R}^{m \times n}$, the objective function is given by:
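Written out with the symbols above, a standard expansion of this squared norm (given here as a sketch, not necessarily in the exact form used by the authors) is:

\[ \|X\beta - y\|_2^2 = (X\beta - y)^T (X\beta - y) = \beta^T X^T X \beta - 2\, y^T X \beta + y^T y \]

Differentiating the quadratic term gives $2X^TX\beta$ and the linear term gives $-2X^Ty$, while $y^Ty$ does not depend on $\beta$.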

Taking the gradient with respect to $\beta$ and setting this expression to zero yields $\beta_{LS}$:

\[ \frac{\partial \|X\beta - y\|_2^2}{\partial \beta} = 2X^TX\beta - 2X^Ty = 0, \tag{23} \]

which leads to:

\[ \beta_{LS} = \arg\min_{\beta} \|X\beta - y\|_2^2 = (X^TX)^{-1}X^Ty \tag{24} \]

[8]

2.5.2 Ridge Regression

Ridge regression, or Tikhonov regularized regression, aims to cope with the overfitting that can occur when ordinary least squares is used on an ill-posed problem. In ridge regression, a penalty factor called the tuning parameter is used to adjust the sizes of the regression coefficients using $L_2$-regularization. The intercept is not penalized, as no regularization is imposed on it. The objective function in ridge regression is:

\[ RSS(\beta, \lambda) \triangleq \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2. \tag{25} \]

If the tuning parameter, $\lambda$, is set to zero, the estimate reduces to the least squares estimate, whereas if $\lambda \to \infty$, the regression coefficients are shrunk towards zero. Therefore, an umbrella term for ridge regression and similar methods is shrinkage methods. [9]

As in ordinary least squares, the solution is given by taking the derivative with respect to $\beta$ and setting it to zero. [10] In matrix form:

\[ RSS(\beta, \lambda) \triangleq \|X\beta - y\|_2^2 + \lambda \|\beta\|_2^2 \tag{26} \]

\[ \frac{\partial RSS(\beta, \lambda)}{\partial \beta} = 2X^TX\beta - 2X^Ty + 2\lambda\beta \tag{27} \]

Setting this to zero simplifies to:

\[ (X^TX + I\lambda)\beta = X^Ty, \tag{28} \]

and thus the solution is given by:

\[ \beta_R = (X^TX + I\lambda)^{-1}X^Ty \tag{29} \]

The penalty term introduces a bias in the estimate but reduces the variance. The choice of the tuning parameter is crucial in order to reach a good bias-variance trade-off. [10]
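The closed-form solution in equation (29) can be sketched with NumPy as follows. The function name `ridge_fit` is hypothetical, and centering the data so that the intercept stays unpenalized is one common way to realize the statement about the intercept; it is an assumption here, not necessarily the procedure used in the study.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate via equation (29), with an unpenalized intercept.

    X: (N, p) covariates without a column of ones; y: (N,) responses.
    Returns the intercept and the p slope coefficients.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    # Centering removes the intercept from the penalized problem (assumption).
    Xc = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    # Solve (Xc^T Xc + lam I) beta = Xc^T yc instead of forming the inverse.
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    beta0 = y_mean - x_mean @ beta
    return beta0, beta


# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]
beta0_ols, beta_ols = ridge_fit(X, y, 0.0)        # lam = 0: least squares
beta0_big, beta_big = ridge_fit(X, y, 1000.0)     # large lam: shrunk slopes
```

With λ = 0 the sketch recovers the generating coefficients, and with a large λ the slope vector has a smaller norm, which is the shrinkage described above.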

To implement ridge regression efficiently, singular value decomposition can be used. Singular value decomposition is a factorization of a matrix $A \in \mathbb{R}^{m \times n}$ into a unitary matrix $U \in \mathbb{R}^{m \times m}$, a diagonal matrix $\Sigma \in \mathbb{R}^{m \times n}$ and a unitary matrix $V \in \mathbb{R}^{n \times n}$, where the diagonal entries of $\Sigma$ satisfy $\sigma_i \ge 0$ and $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r \ge \sigma_{r+1} = \dots = \sigma_p$, where $r \le p = \min(m, n)$. [8] In short notation, this can be written as:

\[ A = U \Sigma V^T. \tag{30} \]

In a Tikhonov regularization problem, the calculation of $\beta_R$ is simplified to:

\[ \beta_R = V (\Sigma^T\Sigma + \lambda I_{n \times n})^{-1} \Sigma^T U^T y \tag{31} \]

which improves the computation time since the inversion is limited to a diagonal matrix in each step. This is even more efficient if multiple tuning parameters are tested at once, as the decomposition only needs to be done once.
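The reuse of a single decomposition for several tuning parameters can be sketched as below. The function name `ridge_path_svd` is hypothetical, the thin SVD is used instead of the full one, and all coefficients are penalized, so the data are assumed to be centered beforehand.

```python
import numpy as np


def ridge_path_svd(X, y, lambdas):
    """Ridge coefficients for several tuning parameters from one SVD.

    Implements equation (31) with the thin SVD X = U diag(s) V^T.
    Note: lam = 0 requires all singular values to be nonzero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # computed once
    Uty = U.T @ y
    betas = []
    for lam in lambdas:
        # (Sigma^T Sigma + lam I)^{-1} Sigma^T is diagonal: s / (s^2 + lam).
        d = s / (s ** 2 + lam)
        betas.append(Vt.T @ (d * Uty))
    return np.array(betas)


# Hypothetical data, compared against the normal-equation form (29).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = rng.normal(size=40)
lambdas = [0.1, 1.0, 10.0]
path = ridge_path_svd(X, y, lambdas)
direct = np.array([np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
                   for lam in lambdas])
```

Each additional tuning parameter then costs only an elementwise division and one matrix-vector product, whereas the direct form solves a new linear system every time.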

3 Method

3.1 Grouping of assets

The assets presented in Section 2.1 are gathered in groups in order to speed up the computations and to make the obtained results easier to compare. Also, in the MLR strategy, the number of observations needed increases with the number of covariates in order to keep the design matrix overdetermined. This requires very old observations, which may not be as representative as newer observations. The asset groups 1-7 that were analyzed are defined in Table 3. MAC and MLR were applied to each group of assets, and the performance of the methods was calculated as the Information Ratio of the mean performance of each group. That is, for every group, the mean performance was measured in terms of mean returns and standard deviation of mean returns, and the Information Ratio for the entire group was calculated from this.

Table 3: Definition of grouping of the assets

Asset Group   Underlying assets
Group 1       CAC, Canada60, DAX, DOW, Estoxx, FTSE
Group 2       Hangseng, Nasdaq, Nikkei, OMX, SP500, SPI, Taiwan
Group 3       10ynote, 5ynote, aus10y, bobl, bund
Group 4       cgb10y, gilt, jgb, tbond
Group 5       Aluminium, Brent, Copper, Corn, Crude, Gold, Natgas
Group 6       Rbob, Silver, Soybeans, Sugar, Wheat, Zinc
Group 7       AUD, CAD, EUR, GBP, JPY

3.2 Moving Average Crossover

A short and a long filter were constructed using either SMA, WMA or EWMA for a set of defined long and short periods. After constructing a trend filter, a position was taken. In order to make the returns of all assets comparable, the asset-specific returns were scaled by an estimate of the price volatility of each asset at time $\tau$, $\hat{\sigma}_{n,\tau}$:

\[ \hat{\sigma}_{n,\tau} = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \left( \Delta P_{\tau-i,n} - \overline{\Delta P_n} \right)^2} \tag{32} \]

where $k$ is the number of past changes that were used to calculate the standard deviation and $\overline{\Delta P_n}$ is their mean. By applying the scaling, the mean daily return over all assets was calculated as (here $L$ denotes the number of assets, indexed by $l$):

\[ \tilde{r}_\tau = \frac{1}{L} \sum_{l=1}^{L} r_{\tau,l} = \frac{1}{L} \sum_{l=1}^{L} \frac{\Delta P_{\tau+1,l}\, \gamma_{\tau,l}}{\hat{\sigma}_{l,\tau}} \tag{33} \]

For every day, the daily returns were added cumulatively in order to present the accumulated return at time $\tau$, $R_\tau$, as a function of time:

\[ R_\tau = \sum_{t=1}^{\tau} \tilde{r}_t \tag{34} \]
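The scaling and accumulation in equations (32) to (34) can be sketched as follows. The function names and the toy numbers are hypothetical; the volatility window is assumed to consist of the k price changes immediately preceding day τ.

```python
import math


def rolling_volatility(price_changes, tau, k):
    """Equation (32): standard deviation of the k changes preceding day tau."""
    if tau < k:
        raise ValueError("not enough history for the requested window")
    window = price_changes[tau - k:tau]
    mean_change = sum(window) / k
    return math.sqrt(sum((c - mean_change) ** 2 for c in window) / k)


def group_return(next_changes, positions, volatilities):
    """Equation (33): mean of the volatility-scaled returns over the assets."""
    scaled = [dp * g / s
              for dp, g, s in zip(next_changes, positions, volatilities)]
    return sum(scaled) / len(scaled)


def accumulate(daily_returns):
    """Equation (34): running sum of the daily group returns."""
    total, out = 0.0, []
    for r in daily_returns:
        total += r
        out.append(total)
    return out


# Hypothetical two-asset example for a single day.
vol = rolling_volatility([1.0, 3.0, 1.0, 3.0, 9.0], tau=4, k=4)
r_tilde = group_return([2.0, -1.0], [1, -1], [1.0, 0.5])
R = accumulate([1.0, 2.0, 3.0])
```

Dividing by the volatility puts a calm asset and a volatile asset on the same scale, so that no single asset dominates the group mean.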

For every set of variable constraints, the daily return and the accumulated return were plotted and compared to observe how different variables affect the result. The analyzed variables were:

• Number of data points used to determine the short mean and the long mean value, $L_{short}$ and $L_{long}$ respectively.

• Weighting type, i.e. SMA, exponential or linear weights

• Different types of calculating the positioning, γ

3.3 Multiple Linear Regression

When setting up the regression model for a given asset, the daily changes in closing prices of each asset in the same asset group over a defined period back in time, referred to as the time lag, were used as covariates. The response variable was set to the change in price from today to 21 days ahead, i.e. the prediction time was 21 days. For $M$ markets and changes in price $L$ days back, $M \cdot L + 1$ covariates were used (including the intercept term). That is, each asset was predicted with the assets in the same asset group. The model setting at day $\tau$ and observation $i$ was set as:

\[ \hat{y}_{\tau,i} = \beta_0^{\tau} + \sum_{m=1}^{M} \sum_{l=1}^{L} \beta_{m,\tau-l}^{\tau}\, x_{m,\tau-l}^{\tau,i} \tag{35} \]

The covariates were centered and standardized by subtracting the mean and dividing by an estimate of the standard deviation. This was done in order to compare different assets on a risk-adjusted scale, rather than by absolute market value.
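How one observation of this model could be assembled is sketched below. The helper names `design_row`, `response` and `standardize` are hypothetical, `changes[m][t]` is assumed to hold the daily price change of market m on day t, and the sample standard deviation with the 1/(n-1) factor is an assumption since the estimator is not specified.

```python
import math


def design_row(changes, tau, lag):
    """Covariates for day tau: 1 for the intercept, then the last `lag`
    daily changes of every market, M * lag + 1 values in total."""
    row = [1.0]
    for market in changes:
        for l in range(1, lag + 1):
            row.append(market[tau - l])
    return row


def response(prices, tau, horizon=21):
    """Change in price from day tau to `horizon` days ahead."""
    return prices[tau + horizon] - prices[tau]


def standardize(values):
    """Center and scale one covariate; also return the mean and std used."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values], mean, std


# Hypothetical example: M = 2 markets and a lag of 2 days.
changes = [[1, 2, 3, 4], [10, 20, 30, 40]]
row = design_row(changes, tau=3, lag=2)
y = response(list(range(30)), tau=5)
z, mean, std = standardize([1.0, 2.0, 3.0])
```

Returning the mean and standard deviation from `standardize` makes it possible to apply the training-set scaling to new covariates at prediction time.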

To predict future price changes of the asset, the regression coefficients were estimated from a training set using ordinary least squares and ridge regression, and the main idea was that in the near future, the same model will apply. Using covariates standardized with the volatility and mean estimated in the training set, a prediction was made.

The prediction was used to determine whether to take a long or short position, γ, in the asset using two positioning methods. The first method, γ1, was a simple binary choice method. The sign of the prediction indicated whether the asset was assumed to increase or decrease in value, and a position of +1 or -1 was taken, i.e. 100% of the money reserved for the asset was invested. The second method, γ2, was a continuous method where the predicted change in price was divided by the mean of the magnitudes of the last 100 predictions. This was done in order to relate the magnitude of the prediction to past predictions. A greater magnitude of the current prediction indicated a more certain prediction, and thus a bigger portion of the money reserved for the asset was invested. If a prediction magnitude was greater than 1, the binary choice method was used to ensure that only money in our possession was used.
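The two positioning methods can be sketched as follows. The function names are hypothetical, and two details are assumptions not stated in the text: a prediction of exactly zero gives a zero position, and the binary rule is used as a fallback when no usable past predictions exist.

```python
def binary_position(prediction):
    """Method gamma_1: invest fully in the direction of the predicted change."""
    if prediction > 0:
        return 1.0
    if prediction < 0:
        return -1.0
    return 0.0  # assumption: no position on a zero prediction


def continuous_position(prediction, past_predictions, window=100):
    """Method gamma_2: scale by the mean magnitude of recent predictions."""
    recent = past_predictions[-window:]
    scale = sum(abs(p) for p in recent) / len(recent) if recent else 0.0
    if scale == 0.0:
        return binary_position(prediction)  # assumption: fallback rule
    position = prediction / scale
    if abs(position) > 1.0:
        # Cap at full investment so that no more than the reserved money is used.
        return binary_position(prediction)
    return position


# Hypothetical predictions with a mean past magnitude of 1.
half = continuous_position(0.5, [1.0, -1.0])
capped = continuous_position(-3.0, [1.0, -1.0])
```

The continuous method invests half of the reserved money for a prediction half as large as the typical past magnitude, and never more than the full amount.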

To evaluate the prediction, the actual price change was compared with the prediction at a later point and this gave the return, $\tilde{r}_\tau$, of the prediction period for both OLS and ridge regression, calculated with equation (33). To get the final profit at time $\tau$, the accumulated return, $R_\tau$, was calculated with equation (34). This procedure was repeated at the end of each prediction cycle using the most recent training data, i.e. a new prediction was made every 21st day.

The risk coefficient was set to 5%. For every asset class the standardized return and the development of a holding starting from $10 000 were plotted for ridge tuning parameters 0, 1, 10, 100, 1000 and 10 000, varying the time lag as 100, 200 and 300 and the positioning method γi. For every setting the Information Ratio was calculated. The predictions were constructed from historic data of 1000, 1640 or 2040 days, depending on the size of the lag. The amount of historic data varies because the larger the lag, the more historic data are needed to ensure that the design matrix is overdetermined. However, too much data means that older data will be used, and old data are assumed to have different statistical properties than recent data.

4 Results

4.1 Moving Average Crossover

When using MAC on different asset classes, all classes except foreign exchange yielded positive returns on investments. This is presented in Figures 3 to 6. Nevertheless, no asset class yielded Information Ratios above 1, as presented further on in the results.

Figure 3: Development of holdings using MAC on equities. The long filter has length 500 days and the short 200 days. A simple positioning method γ1 was used. [Plot: holdings [$] (×10^4) versus time [year], 2009 to 2016.]

Figure 4: Development of holdings using MAC on fixed income. The long filter has length 500 days and the short 200 days. A simple positioning method γ1 was used. [Plot: holdings [$] (×10^4) versus time [year], 2009 to 2016.]

Figure 5: Development of holdings using MAC on commodities. The long filter has length 500 days and the short 200 days. A simple positioning method γ1 was used. [Plot: holdings [$] (×10^4) versus time [year], 2009 to 2016.]

Figure 6: Development of holdings using MAC on foreign exchange. The long filter has length 500 days and the short 200 days. A simple positioning method γ1 was used. [Plot: holdings [$] (×10^4) versus time [year], 2009 to 2016.]

When testing different weighting types, it was determined that the three different weighting types yield very similar results. This is presented in Table 4 and Table 5, which show the Information Ratios that the three weighting methods yield for different groups of assets. The difference between Table 4 and Table 5 is the calculation of the positioning, γ: γ1 in Table 4 represents a simple long/short positioning without consideration of how strong the recommendation of long/short positioning is.

Table 4: Highest Information Ratios calculated for MAC with γ1

Weighting Type   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7
Simple           0.3163    0.4566    0.8301    0.8846    0.4485    0.5269    0.2889
Exponential      0.2756    0.4781    0.8024    0.8155    0.4264    0.4878    0.3610
Linear           0.2756    0.4754    0.8024    0.8155    0.4329    0.4947    0.3572

Table 5: Highest Information Ratios calculated for MAC with γ2

Weighting Type   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7
Simple           0.2772    0.4611    0.9465    0.8502    0.4464    0.6329    0.3933
Exponential      0.2763    0.4568    0.8109    0.8143    0.4246    0.5107    0.3728
Linear           0.26781   0.4552    0.8024    0.8133    0.4115    0.4766    0.3337

The Information Ratios presented in Table 4 and Table 5 are the highest Information Ratios obtained when using different combinations of lengths for the long and short mean values. The long moving average varied between 50 and 2000 trading days, while the short moving average varied between 25 and 500 trading days.

It can further be noticed in Table 4 and Table 5 that MAC can be considered inefficient for most asset classes, as it only yields relatively high Information Ratios for Group 3 and Group 4. Both of these groups represent assets labeled as Fixed Income.

As for which combination of long and short moving average yields the highest Information Ratio, the study showed that at least 600 trading days are needed for the long moving average in order to obtain a relatively high Information Ratio. This is presented in Figure 7 (a). Moreover, the results presented in Figure 7 (b) show that MAC is not as dependent on the short moving average as on the long moving average. However, the results suggest that either few or many days should be used for the short moving average.

Figure 7: Information Ratio as function of long/short combination for moving averages, panels (a) and (b). A simple positioning method γ1 and a normal distribution of weights were used. [Plots: Information Ratio against the lengths of the long and short moving averages.]

4.2 Multiple Linear Regression

In Tables 6 to 8, the highest measured Information Ratios for the choice of positioning method γ1 or γ2, varying lag time and varying ridge tuning parameter are presented. In Tables 6 and 7, the highest measured Information Ratio for a fixed lag and asset group was chosen among all tuning parameters. In Table 8, the highest Information Ratios among all measured lags were chosen for each asset group and tuning parameter.

Table 6: Highest Information Ratios calculated for MLR with γ1

Lag   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7
100   0.2115    0.7301    0.1935    0.2914    0.6950    0.0156    -0.8317
200   0.2135    -0.0662   0.5028    0.7236    0.6836    0.7081    0.2357
250   -0.1277   -0.0991   0.4479    0.4283    0.5362    0.2779    0.1875

Table 7: Highest Information Ratios calculated for MLR with γ2

Lag   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7
100   0.1751    0.0667    0.0901    0.5359    0.0131    -0.2512   -0.5077
200   0.2208    -0.1632   0.4199    0.6585    0.5977    0.4498    0.0351

Table 8: Highest Information Ratios calculated for MLR for each tuning parameter

Tuning   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7
0        0.1932    0.7137    -0.1521   0.2397    0.4771    0.6066    0.1816
1        0.2115    0.7301    0.0676    0.2176    0.6836    0.7081    0.2357
10       0.1004    0.6598    0.0264    0.2914    0.6950    0.6024    0.1875
100      0.2135    0.4157    0.1128    0.5621    0.6892    0.0953    0.1654
1000     -0.4378   -0.1266   0.3682    0.5189    0.5362    0.1179    0.0098
10 000   -0.4651   -0.5833   0.5027    0.7236    0.5114    0.2779    -0.1058

The Information Ratios presented in Tables 6 to 8 show the importance of carefully choosing the tuning parameter and filter length. Altogether, the highest measured Information Ratio for each setting varied from 0.2135 to 0.7301. It can be noticed that groups that performed better for a lag time of 100 days performed worse for a lag time of 200 days and vice versa. Lag time 250 always yielded positive Information Ratios but never over 0.75.

In Figures 8 to 11, the development of a holding in equities, fixed income, commodities and foreign exchange, respectively, is plotted.

Figure 8: Development of a holding in equities (group 2) with MLR for lag time 200 and a smart positioning method, γ2. Plotted for every tuning parameter lambda.

Figure 9: Development of a holding in fixed income (group 4) with MLR for lag time 200 and a smart positioning method, γ1. Plotted for every tuning parameter lambda.

Figure 10: Development of a holding in commodities (group 5) with MLR for lag time 200 and a smart positioning method, γ1. Plotted for every tuning parameter lambda.

Figure 11: Development of a holding in foreign exchange (group 7) with MLR for lag time 200 and a smart positioning method, γ2. Plotted for every tuning parameter lambda.

5 Discussion

5.1 Moving Average Crossover

As mentioned in the Background, MAC has not performed well enough since 2009 to be used by hedge funds. However, the results showed that MAC could yield relatively acceptable Information Ratios. This was noticed especially for groups 3 and 4, which are assets labeled as Fixed Income, most of them treasury bonds. This could be explained by the fact that bonds are connected to a state, and securities issued by states are very unlikely to change drastically in value, as the central banks of states want reliable and stable developments of GDP. Bonds with high volatility may therefore reduce the credibility of the state's economic ability to act and could reflect an unstable economic situation in the country. Additionally, the bonds used in this study belong to the economically strongest countries in the world, so it is highly unlikely that their volatility would be high.

Applying MAC to the other groups yielded Information Ratios between 0.32 and 0.53, which is acceptable for such a simple method. It is however considered too low to be used strictly for investments, as the results suggest the risks are 2-3 times higher than the returns when using MAC. Nevertheless, this still suggests that MAC could be used to give hints about how asset prices are changing and be one of multiple tools for investors deciding whether or not to invest in a specific asset.
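As a rough check of the "2-3 times" statement, and assuming that risk here means the standard deviation of the excess returns from equation (3), the ratio of risk to return is the reciprocal of the Information Ratio:

\[ \frac{\hat{\sigma}_{ER}}{\overline{ER}} = \frac{1}{IR}, \qquad \frac{1}{0.53} \approx 1.9, \qquad \frac{1}{0.32} \approx 3.1 \]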

As for optimal parameter values for MAC, the results imply that longer filters yield more reliable results. This is presented in Figure 7. Again, this could be explained by lower standard deviations when longer filters are used, while the returns are relatively small as well. The larger a filter is, the slower it responds to changes. So if the long mean value is a large filter, all the dynamics of the MAC will be in the short mean value. Additionally, the larger the filters are, the fewer position changes are made, due to the low variance. So even though large filters yield higher Information Ratios, they are not necessarily good at predicting trends.
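The slower response of a longer filter can be illustrated with a small sketch. It assumes a toy price series with a single upward step and uniform weights; the step size, the filter lengths and the variable names are chosen for illustration and are not taken from the study.

```matlab
% Sketch (assumed toy data): how quickly uniform moving averages of
% different lengths react to a single upward step in the price.
n = 600;
price = [100*ones(300, 1); 110*ones(300, 1)];   % step at day 301

lengths = [50 200];                  % filter lengths in days (illustrative)
halfwayDelay = zeros(size(lengths));
for k = 1:numel(lengths)
    L = lengths(k);
    w = ones(1, L)/L;                % simple (uniform) weights
    avg = filter(w, 1, price);       % causal moving average
    % first day after the step on which the average has covered half
    % of the step (small tolerance guards against rounding)
    firstDay = find(avg >= 105 - 1e-9 & (1:n)' > 300, 1);
    halfwayDelay(k) = firstDay - 300;
end
disp(halfwayDelay)
```

With uniform weights the delay is half the filter length, so the 200-day filter needs four times as long as the 50-day filter to register the same move.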

Furthermore, using positioning proportional to the difference between the short and long filter yielded slightly better Information Ratios. This can be seen from the difference between Table 4 and Table 5. During this study, three different ways of weighting previous data were used: simple (uniform) weights, exponential weights and linear weights. The obtained results showed little difference in the performance of MAC when varying the weighting type. These results are unexpected, as the weighting type is connected to the filter length: linear weighting and, especially, exponential weighting use recent data much more than older data. In that sense, a MAC using exponential weights on long filters, e.g. 500 and 200 days respectively, would perform very similarly to a MAC using exponential weights on filters of length 200 and 100 days respectively. In contrast, a MAC using simple weights will perform very differently depending on the length of the filter.

As the different weighting types yielded very similar Information Ratios, this suggests that MAC depends more on the length of the filters than on the weighting type. As seen in Figure 1, the exponential and linear filters are similar, and it is therefore natural that their performance is alike. Another thing to notice is that the weighting difference between the short filters is larger than the difference between the long filters. Figure 7 suggests that the Information Ratio depends more on the size of the long filter, and this could explain why the difference in performance between the weighting types is marginal.
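The similarity between the exponential and linear filters can be checked numerically. The sketch below builds the three weight vectors in a way equivalent to the appendix code (index 1 is the most recent sample) and computes how much of the total weight falls on the most recent half of the window. The filter length of 200 and the variable names are assumptions made for illustration.

```matlab
% Sketch: share of total weight on the most recent half of the window
% for the three weighting types.
n = 200;                                   % filter length (illustrative)
wSimple = ones(1, n)/n;
alpha = 2/(n + 1);                         % smoothing parameter
wExp = (1 - alpha).^(1:n);
wExp = wExp/sum(wExp);
wLin = (n:-1:1)/((n + 1)*(n/2));           % largest weight first

recent = 1:n/2;                            % first half = most recent samples
share = [sum(wSimple(recent)), sum(wExp(recent)), sum(wLin(recent))];
fprintf('simple %.3f, exponential %.3f, linear %.3f\n', share);
```

For this length the exponential and linear weights both place roughly three quarters of the weight on the recent half, against one half for simple weights, which is consistent with the two behaving alike.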

5.2 Multiple Linear Regression

The difficulty of finding a general set of parameters, or a guideline for setting the parameters, is displayed in the results. Although it is difficult to find the optimal set of parameters, it is clear that OLS regression tends to overfit the training data and thus gives an inferior prediction. Conversely, a tuning parameter that is too high will aggressively shrink the regression coefficients, inducing a strong bias.

The bias induced by regularizing the regression should therefore not discourage from using it. While OLS regression is the best linear unbiased estimator (BLUE) on the training data set, a biased estimator may perform better on the validation set, which is the important set, giving a prediction with less variance. To find an optimal tuning parameter, one should carefully validate multiple predictions in parallel to see which model has a better fit to the validation data.
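The trade-off described above can be sketched as follows. This is a minimal illustration on assumed toy data, using the closed-form ridge estimate from the normal equations rather than the SVD route of the appendix code; the data, the train/validation split and the list of tuning parameters are hypothetical.

```matlab
% Sketch (assumed toy data): ridge regression for several tuning
% parameters, scored on a hold-out validation set.
t = (1:120)';
X = [sin(0.3*t), cos(0.7*t), sin(1.1*t + 1), cos(0.2*t.^1.5)];
noise = 0.5*sin(12.9898*t.^2);             % deterministic stand-in for noise
y = X*[1; -2; 0; 0.5] + noise;

train = 1:80;  valid = 81:120;             % hypothetical split
lambda = [0 1 10 100 1000];
p = size(X, 2);
coefNorm = zeros(size(lambda));
valErr = zeros(size(lambda));
for k = 1:numel(lambda)
    % closed-form ridge estimate; lambda = 0 is ordinary least squares
    b = (X(train,:)'*X(train,:) + lambda(k)*eye(p)) \ (X(train,:)'*y(train));
    coefNorm(k) = norm(b);                           % shrinks as lambda grows
    valErr(k) = mean((y(valid) - X(valid,:)*b).^2);  % validation error
end
[~, best] = min(valErr);
fprintf('lambda with lowest validation error: %g\n', lambda(best));
```

The coefficient norm decreases with every increase of the tuning parameter, while the validation error decides which amount of shrinkage is actually useful for the data at hand.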

Likewise, when choosing the lag time, it is of utmost importance to tailor the lag time to match the trend of the asset of choice. This is shown in Table 7, where a lag time of 100 days performed particularly well on asset group 2, whereas a lag time of 200 days performed poorly, resulting in a negative Information Ratio. But using a lag time of 200 days on asset group 4 yielded an Information Ratio of 0.7236, whereas 100 days only yielded an Information Ratio of 0.2914 when using the simple positioning method. This indicates that two very similar models can perform very differently depending on the application.

When choosing the positioning method, it can be seen in Tables 6 to 8 that the outcome depends highly on the assets examined. This indicates that no general conclusion can be drawn as to whether a more dynamic positioning method like γ2 is to be preferred over a simple positioning method. A more dynamic positioning method could therefore be of interest for some settings.

The configuration of MLR in this study is not good enough to be used as a reliable investment method. In order to be used commercially, it has to include a method for selecting covariates. This could be achieved using methods such as cross-validation or Akaike's Information Criterion, and is a suggestion for further studies. It could be of interest to find an individual set of covariates for each asset predicted.
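As an illustration of how such a selection step might look, the sketch below compares two hypothetical covariate sets using Akaike's Information Criterion for a least-squares fit, written as n*log(RSS/n) + 2k up to an additive constant. The data and the candidate sets are assumptions; this was not part of the study.

```matlab
% Sketch (assumed toy data): comparing two candidate covariate sets with
% AIC for a least-squares fit, AIC = n*log(RSS/n) + 2*k (up to a constant).
t = (1:150)';
x1 = sin(0.3*t);  x2 = cos(0.7*t);  x3 = sin(1.1*t + 1);
y = 2*x1 + 0.3*sin(12.9898*t.^2);          % only x1 drives y here

candidates = {x1, [x1 x2 x3]};             % hypothetical covariate sets
n = numel(y);
aic = zeros(1, numel(candidates));
rss = zeros(1, numel(candidates));
for c = 1:numel(candidates)
    Xc = [ones(n, 1), candidates{c}];      % add intercept
    b = Xc \ y;                            % least-squares fit
    rss(c) = sum((y - Xc*b).^2);           % residual sum of squares
    aic(c) = n*log(rss(c)/n) + 2*size(Xc, 2);
end
[~, chosen] = min(aic);
fprintf('AIC: %s, chosen set: %d\n', mat2str(aic, 4), chosen);
```

The larger set can never have a higher residual sum of squares than the smaller one it contains, so the penalty term 2k is what keeps covariates without predictive value from being selected automatically.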

In this study, however, the main focus was to investigate how parameters such as lag time, magnitude of shrinking the regression coefficients and choice of positioning method could be used to find a good design of MLR. Thus, the study did not include a method for selecting covariates.

5.3 Comparison of MAC and MLR

In contrast to MAC, MLR yielded better results in several groups, making MLR more versatile across different asset classes and markets. Moreover, MLR can be further developed and tailored to yield even better results, whilst the expansion options of MAC are somewhat limited. On the other hand, MAC is a very simple method and a good starting point for machine learning, and it can still be used by investors, just not to the same extent as MLR, especially a more developed MLR.

6 Conclusion

Moving Average Crossover should not be used as an investment method, but rather as an indication of how asset prices are changing. The most important parameters when using MAC are the filter lengths. MLR could, if developed further than in this study, be a commercial option for investment methods based on machine learning. In order to improve MLR further, it should include an intelligent selection of predictors. The most important parameters when using MLR depend highly on the investigated asset, and no general conclusion for the parameters can be drawn.


Popular science summary

Models for investing in different assets have long been a hot topic for banks and hedge funds as well as private individuals. As technology has developed in recent years, new investment methods based on complex calculations have emerged. Two methods that have been used extensively during the 2000s are Moving Average Crossover and Multiple Linear Regression. The former is simpler but has proven to be rather unusable since 2009, while the latter has become increasingly popular. In this study, different parameter values for both methods were examined over the years 2009-2015 in order to find optimal conditions for the methods. The results were assessed based on the development of invested capital and the investment risk. The methods were compared using the Information Ratio, which describes the relative return in relation to the standard deviation of the return.

The results showed that Moving Average Crossover as a method for investing in futures gave, on average, returns of half the risk, i.e. an Information Ratio of around 0.5, which is acceptable for such a simple method. The method can be useful for getting an indication of how the price trend of an asset is currently changing. However, it is not recommended as a pure investment method, but rather as an aid.

As for Multiple Linear Regression, the method gave acceptable results with Information Ratios of 0.22-0.73 for all tested groups. The method can be developed further through, for example, cross-validation to determine which assets best predict each other, so that the predictive model improves. Through further development the method could, unlike Moving Average Crossover, be used as an investment method.


References

[1] WallStreetCourier. Golden Cross or other Simple Moving Average Crossover Strategies?, 2014. http://www.wallstreetcourier.com/port/po-articles-simple-moving-average-strategies.htm (Accessed 2016-05-22).

[2] Law, Jonathan. Option. A Dictionary of Business and Management (6 ed.), 2016.

http://www.oxfordreference.com.ezproxy.its.uu.se/view/10.1093/acref/9780199684984.001.0001/acref-9780199684984-e-4557# (Accessed 2016-05-22). eISBN: 9780191765278

[3] Law, Jonathan. Forward dealing. A Dictionary of Business and Management (6 ed.),

2016. http://www.oxfordreference.com.ezproxy.its.uu.se/view/10.1093/acref/9780199684984.001.0001/acref-9780199684984-e-2673 (Accessed 2016-05-22). eISBN: 9780191765278

[4] Law, Jonathan. Futures contract. A Dictionary of Business and Management (6 ed.),

2016. http://www.oxfordreference.com.ezproxy.its.uu.se/view/10.1093/acref/9780199684984.001.0001/acref-9780199684984-e-2801 (Accessed 2016-05-22). eISBN: 9780191765278.

[5] Goodwin, Thomas H. The Information Ratio. Financial Analysts Journal, Vol. 54, No. 4, pp. 34-43., 1998 Accessed: 22-05-2016 13:51 UTC.

[6] Dimitrios, A., & Stephen, G. H. Applied Econometrics. Palgrave Macmillan, 2011. [7] James, G., Witten, D., Hastie, T., & Tibshirani, R. An Introduction to Statistical

Learning. Springer New York Heidelberg Dordrecht London, 2013.

[8] Spagnolini U. Statistical Signal Processing in Engineering Draft(v2). Unpublished manuscript, Politecnico di Milano, 2015.

[9] Stanford University. Regularization: Ridge Regression and the LASSO. 2006. http://statweb.stanford.edu/ tibs/sta305files/Rudyregularization.pdf (Accessed 2016-05-22).

[10] Ginestet, Cedric E. Regularization: Ridge Regression and Lasso. http://math.bu.edu/people/cgineste/classes/ma575/p/w14 1.pdf (Accessed 2016-05-22).

Bibliography

[11] Papailias, F., Thomakos, D., D. An improved moving average technical trading rule. Elsevier B.V, 2015.

[12] Anghel D., G., I. How reliable is the moving average crossover rule for an investor on the romanian stock market?. The Review of Finance and Banking. Volume 05, Issue 2: 89—115, 2013.

[13] Pavlov V., Hurn S. Testing the profitability of moving-average rules as a portfolio selection strategy. Pacific-Basin Finance Journal 20: 825–842, 2013.

(28)

[14] Nguyen H., H., Yang Z., Le Duc T. Moving Average Trading Rules: Are They Trend-ing FollowTrend-ing Devices? Evidence from the Vietnamese Stock Market. International Review of Management and Business Research. Volume 03, Issue 4, 2014.

[15] Cruz T., R., da Costa C., Naz´ario R., T., Bergo G., S., Z., Sobreiro V., A., Kimura H. Trading System based on the use of technical analysis: A computational experiment. Journal of Behavioral and Experimental Finance 6: 42–55, 2015.

A Extra Results

A.1 MAC

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 12: Development of holdings using MAC on equities. The long filter has length 200 days and the short 50 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 13: Development of holdings using MAC on fixed income. The long filter has length 200 days and the short 50 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 14: Development of holdings using MAC on commodities. The long filter has length 200 days and the short 50 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 15: Development of holdings using MAC on foreign exchange. The long filter has length 200 days and the short 50 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 16: Development of holdings using MAC on equities. The long filter has length 1000 days and the short 500 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^5) versus Time [Year], 2009-2016]

Figure 17: Development of holdings using MAC on fixed income. The long filter has length 1000 days and the short 500 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 18: Development of holdings using MAC on commodities. The long filter has length 1000 days and the short 500 days. A smart positioning method γ2 and a simple weighting distribution were used.

[Plot: Holdings [$] (×10^4) versus Time [Year], 2009-2016]

Figure 19: Development of holdings using MAC on foreign exchange. The long filter has length 1000 days and the short 500 days. A smart positioning method γ2 and a simple weighting distribution were used.

A.2 MLR

Equities

Table 9: Information Ratios calculated for an MLR strategy in equities 1-6 with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.1932    0.0265   -0.4305
1                   0.2115    0.1162   -0.3571
10                  0.1004    0.0684   -0.1277
100                -0.5197    0.2135   -0.2178
1000               -1.0188   -0.4378   -0.8664
10 000             -1.0138   -0.4651   -1.1361

Table 10: Information Ratios calculated for an MLR strategy in equities 1-6 with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.1751   -0.0521   -0.3568
1                   0.1033    0.1302   -0.3415
10                 -0.1747    0.2208   -0.0987
100                -0.5546    0.1471   -0.2672
1000               -0.8657   -0.5736   -1.0108
10 000             -0.7830   -0.7005   -1.0705

Equities 2

Table 11: Information Ratios calculated for an MLR strategy in equities 7-13 with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.7137   -0.0662   -0.4265
1                   0.7301   -0.5094   -0.0991
10                  0.6598   -0.7234   -0.2425
100                 0.4157   -0.4898   -0.6959
1000               -0.1266   -0.5463   -0.6652
10 000             -0.8459   -0.5833   -0.9353

Table 12: Information Ratios calculated for an MLR strategy in equities 7-13 with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.1348   -0.1632   -0.4685
1                  -0.1139   -0.4738   -0.2059
10                 -0.0095   -0.5704   -0.2425
100                 0.0667   -0.5816   -0.6059
1000               -0.1695   -0.5642   -0.7216
10 000             -0.4108   -0.6740   -0.9094

Fixed Income 1

Table 13: Information Ratios calculated for an MLR strategy in fixed income 1 with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.1521   -0.5637   -0.1445
1                  -0.1225   -0.6524    0.0676
10                 -0.2006   -0.6263    0.0264
100                -0.0482   -0.1202    0.1128
1000                0.1935    0.3682    0.3333
10 000              0.1294    0.5027    0.4479

Table 14: Information Ratios calculated for an MLR strategy in fixed income 1 with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.2504   -0.5030   -0.1459
1                  -0.2495   -0.5088   -0.0579
10                 -0.2760   -0.5256    0.0088
100                -0.2319   -0.0833    0.1416
1000                0.0901    0.3426    0.2647
10 000              0.0898    0.4199    0.4758

Fixed Income 2

Table 15: Information Ratios calculated for an MLR strategy in fixed income 2 with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.2397    0.1897    0.0743
1                   0.2176    0.2018    0.0743
10                  0.2914    0.1993    0.1379
100                 0.2014    0.5621    0.2643
1000               -0.1839    0.51885   0.4283
10 000             -0.2436    0.7236    0.3940

Table 16: Information Ratios calculated for an MLR strategy in fixed income 2 with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.5136    0.3390    0.2247
1                   0.5163    0.3525    0.2306
10                  0.5359    0.4377    0.2714
100                 0.5285    0.5483    0.3835
1000                0.3382    0.6585    0.5481
10 000              0.1156    0.5734    0.5664

Commodities 1

Table 17: Information Ratios calculated for an MLR strategy in commodities 1 with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.4771    0.4183   -0.2597
1                   0.5476    0.6836   -0.0613
10                  0.6950    0.3172    0.1837
100                 0.6892    0.1984    0.3910
1000                0.1235    0.2478    0.5362
10 000             -0.0872   -0.0533    0.5114

Table 18: Information Ratios calculated for an MLR strategy in commodities 1 with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                   0.01313   0.4962   -0.3008
1                   0.0101    0.5977   -0.0582
10                  0.0025    0.3735    0.3197
100                -0.0223    0.3105    0.4293
1000               -0.2156    0.2270    0.4232
10 000             -0.4108    0.1707    0.4953

Commodities 2

Table 19: Information Ratios calculated for an MLR strategy in commodities 2 with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.3428    0.6066    0.1646
1                  -0.3515    0.7081    0.1478
10                  0.0156    0.6024    0.1419
100                -0.1641    0.0272    0.0953
1000               -0.0737    0.1179    0.1088
10 000             -0.5351    0.1967    0.2779

Table 20: Information Ratios calculated for an MLR strategy in commodities 2 with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.6248    0.4498    0.1585
1                  -0.6220    0.4236    0.1597
10                 -0.6022    0.3087    0.1131
100                -0.4806   -0.0715    0.1160
1000               -0.2512   -0.0723    0.1555
10 000             -0.2576   -0.1259    0.0898

Table 21: Information Ratios calculated for an MLR strategy in foreign exchange with a simple +/- positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.9255    0.1816    0.1310
1                  -0.8914    0.2357    0.1041
10                 -0.8610    0.1706    0.1875
100                -0.9677    0.0048    0.1654
1000               -0.8317   -0.0151    0.0098
10 000             -1.0874   -0.1058   -0.1986

Table 22: Information Ratios calculated for an MLR strategy in foreign exchange with a smart positioning method

Tuning Parameter   Lag 100   Lag 200   Lag 250
0                  -0.5418   -0.0479   -0.0834
1                  -0.5399   -0.0412   -0.0824
10                 -0.5267   -0.0240   -0.0768
100                -0.5077   -0.0416   -0.0698
1000               -0.7088    0.0351    0.0368
10 000             -0.9888   -0.0878   -0.0011

Figure 20: Development of a holding in equities (group 1) with MLR for lag time 100 and a smart positioning method, γ1. Plotted for every tuning parameter lambda.


Figure 21: Development of a holding in equities (group 2) with MLR for lag time 100 and a simple positioning method, γ1. Plotted for every tuning parameter lambda.


Figure 24: Development of a holding in commodities (group 5) with MLR for lag time 100 and a smart positioning method, γ1. Plotted for every tuning parameter lambda.


B MATLAB code

B.1 MAC

% Moving Average Crossover - main program
% This code applies moving average crossover on 40 assets for three
% different weighting types. The filter lengths are varied between
% 50 and 2000 respectively 25 and 500.
% The assets were divided into seven groups in order to compare the results
% to the results from MLR.
%
% 2016 Iliam Barkino, Mattias Bertolino

clear

% Load Data
load('KexJobbData.mat')

%% Process data to adjust for NaNs
clPr = closingPrice;
[dates2, clPr] = removeNaN(dates, clPr);
dates2 = dates2(2062:end); % Starting from 2009
clPr = clPr(2062:end,:);

%% Parameters
longVector = (50:50:2000); % Different lengths of long filter
shortVector = (25:25:500); % Different lengths of short filter
stdevDays = 21; % Number of days used to calculate the standard deviation
iWeights = 3; % Number of weighting types
iMarkets = 7; % Number of asset groups
iGamma = 1; % Binary switch to determine what positioning strategy to use

% Matrix containing all obtained maximal values of information ratio
maxIR = zeros(iWeights, iMarkets, 1);

% Matrix showing what combinations of long and short filters that yielded
% the highest information ratios
longShort = zeros(iWeights, iMarkets, 2);

% Matrix containing all developments of invested capital for different
% combinations of long and short filters
holdingsMatrix = zeros(length(dates2), length(longVector),...
  length(shortVector));

risk = 0.05; % risk aversion

% Matrix containing all information ratios for different combinations of
% long and short filters
infoRatios = zeros(length(longVector), length(shortVector));

%% Starting looping for all possible combinations
for iw = 1:iWeights
  for im = 1:iMarkets
    % Bar that shows status of the run
    h = waitbar(0,['Weight type: ' num2str(iw) '/' num2str(iWeights) ...
      ', Market category: ' num2str(im) '/' num2str(iMarkets)]);
    waitbar(im/(iMarkets*iWeights) + (iw - 1)/iWeights)

    switch(im) % Deciding which group of assets to evaluate
      case(1)
        clPr2 = clPr(:,1:6);
      case(2)
        clPr2 = clPr(:,7:13);
      case(3)
        clPr2 = clPr(:,14:18);
      case(4)
        clPr2 = clPr(:,19:22);
      case(5)
        clPr2 = clPr(:,23:29);
      case(6)
        clPr2 = clPr(:,30:35);
      case(7)
        clPr2 = clPr(:,36:40);
    end

    for l = 1:length(longVector)
      long = longVector(l);
      for s = 1:length(shortVector)
        short = shortVector(s);
        if short < long % long filter must be larger than short filter
          %% Weights
          if iw == 1
            % Normal Weights
            wLong = ones(1, long)/long;
            wShort = ones(1, short)/short;
          elseif iw == 2
            % Exponential Weights
            alphaLong = 2/(long + 1); % Smoothing Param
            wLong = repmat(1-alphaLong, 1, long).^(1:long);
            wLong = wLong/sum(wLong);
            alphaShort = 2/(short + 1); % Smoothing Param
            wShort = repmat(1-alphaShort, 1, short).^(1:short);
            wShort = wShort/sum(wShort);
          else
            % Linear Weights (sums of digits)
            wLong = 1/((long+1)*(long/2)) * flipud((1:long)');
            wShort = 1/((short+1)*(short/2)) * flipud((1:short)');
          end

          % Filters

          avgClL = filter(wLong, 1, clPr2);
          avgClS = filter(wShort, 1, clPr2);

          %% Positioning
          % To be able to redo the matrixes later
          [row, col] = size(clPr2);
          trend = avgClS - avgClL;

          % Choose positioning method (1 = smart, 0 = simple)
          if iGamma > 0
            absTrend = abs(trend);
            absMeans = zeros(row-short, col);
            for mm = 1:row-short
              absMeans(mm,:) = mean(absTrend(mm:mm + short, :));
            end
            trendMean = [ones(short,col); absMeans];
            gamma = trend./trendMean;
            gamma(abs(gamma) > 1) = sign(trend(abs(gamma) > 1));
          else
            gamma = sign(trend); % gamma(i,j) = +/- 1
            % Calculating position changes, if
            % long average = short average, the position is kept
            [row2, col2] = find(gamma == 0);
            for i = 1:length(row2)
              gamma(row2(i), col2(i)) = ...
                gamma(row2(i) - 1, col2(i));
            end
          end

          %% Returns
          % One day price difference
          deltaP = diff(clPr2); % daily return
          % row of zeros for later calculations
          deltaP = [deltaP; zeros(1, col)];
          % Standard deviation for returns
          stdev1 = zeros(length(clPr2) - stdevDays, col);
          for i = 1:length(clPr2) - stdevDays
            A = deltaP(i:i + stdevDays - 1, :);
            stdev1(i,:) = std(A);
          end
          % Dimension fit
          stdev1 = [ones(stdevDays, col); stdev1];
          % Adjust for standard deviation dimension
          deltaP(1:stdevDays, :) = zeros(stdevDays, col);
          % Return of each asset on each day
          ret1 = deltaP.*gamma./stdev1;
          % Mean return of all assets on each day
          retTot1 = sum(ret1, 2)/col;

          %% Investing
          % Development of invested capital
          holdings = zeros(length(dates2), col);
          holdings(1,:) = 10000; % Starting at $10,000
          for ii = 2:length(ret1)
            % Calculating development
            holdings(ii,:) = holdings(ii - 1,:).*...
              (1 + risk*ret1(ii,:));
          end

          % Storing the mean development for all assets in a
          % specific group
          holdingsMatrix(:,l,s) = mean(holdings, 2);
          % Calculating mean return
          meanProffit = mean((ret1(22:end, :)));
          % Calculating standard deviation of returns
          proffitStd = std((ret1(22:end, :)));
          % Information ratio on daily basis
          % (for two row vectors, / is a matrix right division that
          % returns a scalar)
          meanInfoRet = meanProffit/proffitStd;
          % Annualizing the information ratio
          infoRatios(l,s) = meanInfoRet*sqrt(250);
        end
      end
    end

    % Finding maximal Information Ratio between all combinations of
    % long and short filter length
    IR = infoRatios;
    IR(IR == 0) = nan;
    maxIR(iw,im) = nanmax(nanmax(nanmax(IR)));

    % Finding what combination of long and short filter length that
    % yielded maximal Information Ratio
    [iLong, iShort] = find(IR == maxIR(iw,im));
    if length(iLong) > 1
      iLong = floor(mean(iLong));
      iShort = floor(mean(iShort));
    end

    % Storing filter combinations with highest Information Ratio
    longShort(iw,im,1) = longVector(iLong);
    longShort(iw,im,2) = shortVector(iShort);
    close(h);
  end
end
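The core of the MAC rule can be reduced to a few lines. The sketch below is a simplified illustration on an assumed toy price series that rises and then falls; it uses uniform weights and the simple +/- positioning only, and omits the risk scaling of the main program. The variable names are chosen for illustration.

```matlab
% Sketch (assumed toy data): simple +/- MAC position on a price that
% rises for 300 days and then falls.
price = [(1:300)'; (299:-1:1)'];
short = 50;  long = 200;                   % filter lengths as in Figure 12
avgS = filter(ones(1, short)/short, 1, price);
avgL = filter(ones(1, long)/long, 1, price);
position = sign(avgS - avgL);              % +1 long, -1 short
% the first long-1 samples are affected by the filter warm-up,
% so the search for the first short position starts at day 'long'
flipDay = find(position(long:end) < 0, 1) + long - 1;
fprintf('price peaks on day 300, position flips on day %d\n', flipDay);
```

The position only turns negative some time after the peak, which is the lag of the crossover rule discussed in Section 5.1.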

B.2 MLR

B.2.1 Main program


% Multiple Linear Regression - main program
% Description:
% Computes regression vector, predicts a change in price in a group
% of assets and takes a position according to the prediction and the
% standard deviation of each asset in the asset group.
%
% The Information Ratio is calculated and the development of
% the holding of each asset class is plotted for multiple tuning
% parameters.
%
% 2016 Iliam Barkino, Mattias Bertolino

clear;
tic;

%% Setup
% Load Data
load('KexJobbData.mat')

% Prediction Param
trainTime = 2040; % 1000 1640 2040
predTime = 21;
timeFrame = 7448;
lag = [250]; % 100 200 250
lambda = [0 1e0 1e1 1e2 1e3 1e4];
Ll = length(lambda);
stdTime = 99;

% Investment Param
assetIndex = 1:7;
bankStart = 10000;
risk = 0.05;
smart = 1; % 0 or 1, 1 uses smart positioning, 0 doesn't

% Assets
name = {'Equities 1', 'Equities 2', 'Fixed Income 1', 'Fixed Income 2', ...
  'Commodities 1', 'Commodities 2', 'Foreign Exchange'};

% Calculate all asset classes at once
for asset = assetIndex
  switch(asset)
    case(1)
      depAsset = [1:6]';
    case(2)
      depAsset = [7:13]';
    case(3)
      depAsset = [14:18]';
    case(4)
      depAsset = [19:22]';
    case(5)
      depAsset = [23:29]';
    case(6)
      depAsset = [30:35]';
    case(7)
      depAsset = [36:40]';
  end

  indepAsset = depAsset;
  Ld = length(depAsset(:,1));
  Li = length(indepAsset(:,1));

  % Repeat for every lag setting
  for l = 1:length(lag)
    % Remove NaN's
    % Start at 02-Jan-2009
    % End at 06-Jan-2016
    [datesNoNaN, clPr] = removeNaN(dates(timeFrame - predTime - ...
      trainTime:end), closingPrice(timeFrame - predTime - ...
      trainTime:end, :));
    tradePeriods = floor((length(datesNoNaN) - trainTime)/predTime);
    diffClPr = diff(clPr);

    % Pre-allocating lag-dependent variables
    b = zeros(lag(l)*Li, 1);
    yTrain = zeros(trainTime - lag(l) - predTime, Ld);
    xTrain = zeros(trainTime - lag(l) - predTime, lag(l)*Li);
    yVal = zeros(tradePeriods, Ld);
    yPred = zeros(tradePeriods, Ll*Ld);
    sigmay = zeros(1, Ld);
    holding = zeros(tradePeriods, Ll*Ld);
    holdingTot = zeros(tradePeriods, Ll);
    holding(1,:) = bankStart;
    holdingTot(1,:) = bankStart;
    datesAdjusted = holdingTot;
    gamma = zeros(tradePeriods, Ll*Ld);

    % Speed up - This matrix needs to be created for every lag
    % when regressing, but not for every calculation.
    ridgeEye = diag(repelem(lambda, 1, 1 + lag(l)*Li)*eye(Ll*(lag(l)*Li + 1)));
    % ridgeEye = diag(repelem(lambda, 1, lag(l)*Li)*eye(Ll*lag(l)*Li));

    %% Regression
    % Create a waitbar to show calculation time
    h = waitbar(0,['Lag: ' num2str(l) '/' num2str(length(lag)) ...
      ', Class: ' num2str(asset) '/' num2str(assetIndex(end))]);

    % Sliding window
    for j = 1:tradePeriods
      % Speed Up - reuse data
      if j > 1
        yTrain(1:end - predTime, :) = yTrain(predTime + 1:end, :);
        xTrain(1:end - predTime, :) = xTrain(predTime + 1:end, :);
        start = trainTime - 2*predTime;
      else
        start = 1 + lag(l);
      end

      % Train the model
      for i = start:trainTime - predTime
        yTrain(i-lag(l), :) = clPr(i + j*predTime, depAsset) ...
          - clPr(i + (j-1)*predTime, depAsset);
        xTemp = diffClPr(i - lag(l) + (j-1)*predTime : ...
          i - 1 + (j-1)*predTime, indepAsset);
        xTrain(i-lag(l), :) = reshape(xTemp, 1, []);
      end

      % Standardize data
      [xTrainStd, mux, sigmax] = zscore(xTrain);
      XTrainStd = [ones(size(xTrainStd,1),1) xTrainStd];

      % For every invested asset, calculate the regression
      % coefficients using both OLS and Ridge
      b = ridgeRegress(yTrain, XTrainStd, lambda, ridgeEye);

      %% Prediction & Validation
      % Prediction of the change in price of each asset
      % xPred - are the predictors
      % yPred - is the predicted change in price of each asset
      xTemp = diffClPr(i - lag(l) + j*predTime : ...
        i - 1 + j*predTime, indepAsset);
      xPred = reshape(xTemp, 1, []);
      xPred = (xPred - mean([xTrain; xPred]))./std([xTrain; xPred]);
      XPred = [1 xPred];
      yPred(j,:) = XPred*b;

      % Smart positioning (optional)
      if smart > 0.5
        if j > stdTime
          gamma(j,:) = yPred(j,:) ...
            ./mean(abs(yPred(j-stdTime:j,:)));
        else
          gamma(j,:) = yPred(j,:)./mean(abs(yPred(1:j,:)));
        end
      end

      % Validation
      % yVal - is the actual standardized price change measured
      % at the end of the prediction time
      yVal(j,:) = clPr(i + (j+1)*predTime, depAsset) ...
        - clPr(i + j*predTime, depAsset);
      yVal(j,:) = (yVal(j,:) - mean(yTrain(end-stdTime:end, :))) ...
        ./std(yTrain(end-stdTime:end, :));

      % Dates adjustment
      % At each predicted day, the date is extracted
      datesAdjusted(j,:) = datesNoNaN(i + (j+1)*predTime);
      waitbar(j/tradePeriods);
    end

    %% Strategy
    % gamma - is the position to take for each asset
    % holdingTot - is the evolution of a holding in each asset group
    % infoR - is the Information Ratio for a strategy
    % ret - is the risk adjusted return for each asset
    % retTot - is the total r.a return for each ridge tuning param
    % risk - is the risk aversion coefficient

    % Positioning
    if smart > 0.5
      gamma(abs(gamma) > 1) = sign(gamma(abs(gamma) > 1));
    else
      gamma = sign(yPred); % +/- 1
    end

    % Returns and Sharpe for each asset (/Ld)
    ret = repelem(yVal,1,Ll).*gamma;
    retTot = cell2mat(arrayfun(@(x) sum(ret(:, x:Ll:end), 2), ...
      1:Ll, 'uni', 0))/Ld;
    infoR = mean(retTot)./std(retTot)*sqrt(250/predTime);

    % Calculate the development of the total holding for each lambda
    for ih = 2:length(ret(:,1))
      holdingTot(ih,:) = holdingTot(ih - 1, :) ...
        .*(1 + risk*retTot(ih - 1, :));
    end

    %% Plots
    % Plot the evolution of the total holding
    figure()
    plot(datesAdjusted, holdingTot)
    ylabel('Holding [$]')
    xlabel('Time [Year]')
    str = cellstr(num2str(lambda', 'lambda = %d'));
    legend(str, 'Location', 'NorthWest');
    datetick('x')
    disp(['Sharpe ratio for lag ' num2str(lag(l)) ...
      ', and asset ' num2str(asset) ': ' num2str(infoR)])
    close(h);
  end
end

hold off;
toc;
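How one row of the design matrix is assembled from lagged price changes may be easier to see on a small example. The sketch below uses an assumed toy price matrix with two assets and a lag of three days. The indexing is simplified compared with the main program, but the reshape step is the same.

```matlab
% Sketch (assumed toy data): rows of a lagged design matrix, built with
% the same reshape step as xTrain in the main program.
clPr = [10 100; 11 102; 13 101; 12 105; 15 104; 16 108];  % 6 days, 2 assets
diffClPr = diff(clPr);                     % daily price changes, 5 x 2
lag = 3;
nRows = size(diffClPr, 1) - lag + 1;
xTrain = zeros(nRows, lag*size(clPr, 2));
for r = 1:nRows
    xTemp = diffClPr(r:r + lag - 1, :);    % lag days x assets
    % reshape is column-major: all lags of asset 1, then all lags of asset 2
    xTrain(r, :) = reshape(xTemp, 1, []);
end
disp(xTrain)
```

Each row therefore holds lag*Li predictors, which is why the number of regression coefficients grows quickly with the lag time and why regularization becomes relevant.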

B.2.2 Ridge Regression Function

function bOut = ridgeRegress(yTrain, xTrain, lambda, ridgeEye)
% ridgeRegress calculates regression coefficients.
% bOut = ridgeRegress(yTrain, xTrain, lambda, ridgeEye) calculates
% regression coefficients in parallel for multiple dependent variables
% with shared design matrix using Singular Value Decomposition.
%
% Input:
% yTrain (matrix) - is a matrix with each column representing a dependent
%                   variable and each row representing an observation.
% xTrain (matrix) - is the shared design matrix with each column
%                   representing a predictor and each row representing
%                   an observation.
% lambda (vector) - is a vector of each tuning parameter
% ridgeEye (matrix) - is diagonal with size length(lambda) x col(yTrain)
%                   with each element of lambda repeated length(lambda)
%                   times along the diagonal
%
% Output:
% bOut (matrix) - is the resulting regression coefficient vector for each
%                 dependent variable repeated for each lambda
%
% 2016 Iliam Barkino, Mattias Bertolino

[rowX, colX] = size(xTrain);
[rowy, coly] = size(yTrain);
lambdaLength = length(lambda);
[U, D, V] = svd(xTrain);

% Resize to match number of tuning parameters lambda
diagD = diag(repmat(diag(D'*D), lambdaLength, 1));

% Regress for all lambda and dependent variables at once
b = (diagD + ridgeEye)\repmat(D'*U'*yTrain, lambdaLength, 1);
b = reshape(b, colX, coly*lambdaLength);
bOut = V*b;
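As a sanity check of the SVD route, the sketch below compares it with the normal-equation form of ridge regression for a single tuning parameter. It runs on assumed toy data and does not call ridgeRegress itself; the variable names are chosen for illustration. As in the main program, the intercept column is penalized together with the other coefficients.

```matlab
% Sketch (assumed toy data): the SVD route gives the same coefficients
% as the normal equations, shown for a single lambda.
t = (1:40)';
X = [ones(40, 1), sin(0.3*t), cos(0.7*t), sin(1.1*t + 1)];
y = [X*[1; 2; -1; 0.5] + 0.1*sin(5*t.^2), cos(0.4*t)];   % two dependent variables
lambda = 10;

[U, D, V] = svd(X);
p = size(X, 2);
% X = U*D*V'  =>  (X'X + lambda*I)^-1 X'y = V*(D'D + lambda*I)^-1 * D'U'y
bSvd = V*((D'*D + lambda*eye(p)) \ (D'*U'*y));
bNormal = (X'*X + lambda*eye(p)) \ (X'*y);
maxDiff = max(abs(bSvd(:) - bNormal(:)));
fprintf('largest coefficient difference: %.2e\n', maxDiff);
```

Because D'*D is diagonal, the system solved in the SVD form is diagonal as well, which is what allows all tuning parameters to be stacked into one large diagonal system.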


B.3 Data adjustment

function [dates, clPr] = removeNaN(dates, clPr)
% removeNaN removes NaN's from matrices.
% [dates, clPr] = removeNaN(dates, clPr) removes all rows until the
% first row has no NaN's. Then all NaN's are replaced
% with the previous row's value.
%
% Input:
% dates (matrix) - dates for each asset
% clPr (matrix) - closing prices for each asset with NaN's
%
% Output:
% dates (matrix) - dates for each asset w/ out NaN-days
% clPr (matrix) - closing prices for each asset w/ out NaN's
%
% 2016 Iliam Barkino, Mattias Bertolino

N = length(clPr(1, :));

% Replace NaNs until common start
index = zeros(N, 1);
for i = 1:N
  index(i) = min(find(~isnan(clPr(:, i))));
end
startIndex = max(index);
clPr(1:startIndex-1, :) = [];
dates(1:startIndex-1, :) = [];

% Replacing next NaNs with previous values
[row, col] = find(isnan(clPr));
for i = 1:length(col)
  clPr(row(i), col(i)) = clPr(row(i) - 1, col(i));
end
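The two steps of removeNaN can be followed on a small example. The sketch below writes them inline on an assumed toy matrix instead of calling the function, so that it runs on its own. The data are chosen so that the first remaining row contains no NaN.

```matlab
% Sketch (assumed toy data): the two steps of removeNaN written inline.
clPr = [NaN 5; 1 5.5; 2 6; NaN 7; NaN 8; 4 NaN];

% 1) drop leading rows until every asset has its first observation
firstValid = zeros(1, size(clPr, 2));
for c = 1:size(clPr, 2)
    firstValid(c) = find(~isnan(clPr(:, c)), 1);
end
clPr(1:max(firstValid) - 1, :) = [];

% 2) replace remaining NaNs with the previous row's value; find returns
%    the indices column by column with ascending rows, so runs of
%    consecutive NaNs are filled from the top down
[row, col] = find(isnan(clPr));
for i = 1:numel(row)
    clPr(row(i), col(i)) = clPr(row(i) - 1, col(i));
end
disp(clPr)
```

Carrying the last price forward means the price change on the filled days is zero, so missing days contribute neither return nor risk in the later calculations.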


Contents

1 Introduction
1.1 Background
1.2 Problem description
1.3 Purpose
2 Theory
2.1 Description of assets
2.2 Financial derivatives
2.3 Information Ratio
2.4 Moving Average Crossover
2.5 Multiple Linear Regression
3 Method
3.1 Grouping of assets
3.2 Moving Average Crossover
3.3 Multiple Linear Regression
4 Results
4.1 Moving Average Crossover
4.2 Multiple Linear Regression

Lag   Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7
100    0.2115    0.7301    0.1935    0.2914    0.6950    0.0156   -0.8317
200    0.2135   -0.0662    0.5028    0.7236    0.6836    0.7081    0.2357
250   -0.1277   -0.0991    0.4479    0.4283    0.5362    0.2779    0.1875

Lag

Group 1