Modeling and Forecasting Stock Index Returns using
Intermarket Factor Models
Predicting Returns and Return Spreads using Multiple Regression and Classification
Emil Tingström
SA104X Degree Project in Mathematical Statistics
Department of Mathematical Statistics Royal Institute of Technology
Abstract
The purpose of this thesis is to examine the predictability of stock indices with regression models based on intermarket factors. The underlying idea is that there is some correlation between past price changes and future price changes, and that models attempting to capture this could be improved by including information derived from correlated assets to make predictions of future price changes. The models are tested using the daily returns from Swedish stock indices and evaluated from a portfolio perspective and their statistical significance. Prediction of the direction of the price is also tested by Support vector machine classification on the OMXS30 index. The results indicate that there is some predictability in the market, in disagreement with the random walk hypothesis.
Contents
1 Introduction
  1.1 Quantitative trading
  1.2 Stock indices
  1.3 Purpose
  1.4 Outline
2 Initial Data Analysis
3 Model and Methodology
  3.1 Ordinary Least Squares multiple regression
  3.2 Coefficient of determination R²
  3.3 Data snooping
  3.4 Akaike Information Criterion
  3.5 Normalization
  3.6 Portfolio evaluation and Sharpe ratio
  3.7 Testing for statistical significance
4 Results and Analysis
  4.1 Predicting the next day's return for OMXS30
  4.2 Using returns over several days as input
  4.3 Prediction of deviations from OMXS30
  4.4 Practical trading considerations
5 Support Vector Machine
  5.1 Linear SVM
    5.1.1 Dual form
  5.2 Soft margin and kernels
    5.2.1 Nonlinear kernels
  5.3 Results for SVM on OMXS30
    5.3.1 Results for linear SVM
    5.3.2 Results with radial kernel
6 Discussion and Conclusion
1 Introduction
This section will introduce the subject of quantitative trading and trading strategies and note some prior research on the subject. The purpose of this paper will then be presented as well as the outline of the report.
1.1 Quantitative trading
With the advent of information technology and computers, approaching the market with quantitative models has become common. Quantitative trading models make use of mathematical and statistical analysis to exploit predictable patterns in financial data on which to base trading decisions.
Strategies based on quantitative trading models can usually be classified as either contrarian or trend following. Contrarian strategies trade against price changes, seeking to capitalize when the price returns to its previous equilibrium level. In contrast, trend-following strategies trade in the direction of previous price changes to capitalize on shifts in the balance of supply and demand. The success of these strategies depends on how well past price changes correlate with future price changes. If there were no correlation to exploit, then the logarithm of the price at a point in time, X_t, could be represented by

X_t = X_{t-1} + ε_t (1)

with E[ε_t] = 0 and zero autocorrelation, E[ε_t ε_τ] = 0 for t ≠ τ. This is referred to as the random walk model and is consistent with the hypothesis that markets are efficient. Evidence from academic research has suggested that this model might be flawed and that stock index prices exhibit some level of correlation, E[ε_t ε_τ] ≠ 0, that would lead to predictability; however, it might not be large enough to produce a risk-adjusted return above the risk-free rate after accounting for transaction costs [6].
1.2 Stock indices
The OMXS30 PI stock index is a price index that represents the 30 most heavily traded stocks listed in Stockholm on Nasdaq OMX. It is generally used to track broad market movements in the Swedish stock market, since it accounts for a significant share of the total market capitalization listed in Sweden. Other indices track the performance of specific business sectors, or of a specific segment based on market capitalization, such as small-capitalization stocks.
1.3 Purpose
The purpose of this thesis is to investigate the predictability of Swedish stock indices. A previous study investigating contrarian strategies on the OMXS30 using daily data found that the index exhibits some tendency to regain short-term losses and pull back after short-term gains, indicating a negative autocorrelation in short-term returns [2]. Another study examining autoregressive models to predict European stock indices found that while the performance using the past 1 to 10 days' returns was generally poor, it improved when the returns of other, correlated indices were included as input to the model. The intuition behind this was that combining correlated variables would eliminate some of the white noise, since linear combinations of the uncorrelated noise terms offset each other [7]. Predictability may thus be driven not only by autocorrelation but also by intermarket relationships. For example, capital flow into a riskier sector could be an indication of positive market sentiment and positive future price changes.
Based on this, an attempt is made to predict the future return of an asset from the returns of a group of correlated stock indices. Using different methods of regression, the future returns of different assets are predicted and the forecasts are evaluated from a portfolio perspective and for statistical significance. The dataset analyzed here consists of the closing prices for the main index of the Swedish stock market, OMXS30 PI, along with different sector indices, for the period 2002-12-27 to 2015-04-13. Numerical analysis is carried out using the programming language R with additional packages for the statistical analysis.
1.4 Outline

2 Initial Data Analysis
The closing prices for all indices are collected from Nasdaq OMX and cleaned by removing dates where the price for any index was missing. The indices used are:
• OMXS30 PI
• OMXS Oil & Gas PI
• OMXS Financials PI
• OMXS Automobiles & Parts PI
• OMXS Health Care PI
• OMXS Industrials PI
• OMXS Consumer Services PI
• OMXS Consumer Goods PI
• OMXS Utilities PI
• OMXS Food Producers PI
• OMXS Basic Materials PI
• OMXS Travel & Leisure PI
• OMXS Technology PI
• OMXS Telecommunications PI
• OMXS Small Cap PI
• OMXS Mid Cap PI
From this the daily log return can be computed:

X_t = ln(P_t) − ln(P_{t-1}) = ln(P_t / P_{t-1}) (2)

A total of 3084 data points per index are used. The correlations between the returns are displayed in Table 1.
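As a minimal illustration of equation (2): the thesis carries out its analysis in R, but the sketch below uses Python with NumPy, and the price series is made up for the example.

```python
import numpy as np

# Hypothetical daily closing prices for one index (illustrative values only).
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9])

# Equation (2): X_t = ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1})
log_returns = np.diff(np.log(prices))

print(log_returns)
```

A convenient property of log returns is that they telescope: the sum of the daily log returns equals the log return over the whole period.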
                     OMX  O&G  Fin  A&P  HlC  Ind  CSv  CGd  Utl  FdP  BMa  T&L  Tec  Tel  SmC  MdC
OMXS30                1   .55  .93  .68  .62  .93  .78  .83  .27  .43  .77  .56  .69  .71  .72  .82
Oil & Gas                  1   .53  .45  .37  .54  .42  .50  .19  .32  .55  .34  .31  .40  .53  .59
Financials                      1   .63  .55  .84  .68  .77  .26  .43  .74  .56  .55  .63  .70  .81
Automobiles & Parts                  1   .49  .70  .55  .67  .25  .41  .63  .50  .45  .47  .67  .74
Health Care                               1   .55  .50  .57  .22  .37  .51  .42  .41  .45  .57  .61
Industrials                                    1   .69  .79  .27  .43  .79  .56  .56  .61  .71  .83
Consumer Service                                    1   .66  .20  .39  .58  .52  .46  .52  .59  .69
Consumer Goods                                           1   .26  .45  .71  .52  .51  .57  .67  .78
Utilities                                                     1   .21  .27  .21  .19  .18  .33  .32
Food Producers                                                     1   .39  .35  .26  .33  .52  .56
Basic Materials                                                         1   .51  .45  .51  .67  .78
Travel & Leisure                                                             1   .35  .40  .59  .69
Technology                                                                        1   .43  .50  .52
Telecommunications                                                                     1   .52  .57
Small Cap                                                                                   1  .86
Mid Cap                                                                                         1

Table 1: Return correlation matrix (upper triangle; column abbreviations follow the row order).
                       Mean    Standard deviation
OMXS30                 0.086   0.223
Oil & Gas              0.062   0.415
Financials             0.097   0.250
Automobiles & Parts    0.115   0.217
Health Care            0.069   0.179
Industrials            0.117   0.257
Consumer Service       0.101   0.211
Consumer Goods         0.070   0.199
Utilities              -0.103  0.357
Food Producers         0.103   0.204
Basic Materials        0.070   0.283
Travel & Leisure       0.038   0.260
Technology             0.050   0.313
Telecommunications     0.026   0.229
Small Cap              0.102   0.139
Mid Cap                0.127   0.173

Table 2: Mean and standard deviation of the returns.
3 Model and Methodology
The main model tested in this paper is a multivariate autoregressive (AR) model written as

Y_t = Σ_k β_k X^k_{t-1} + ε_t (3)

where Y is the return of the asset to be predicted and X^k represents the return of sector index k from the previous day. The model is based on the hypothesis that the next day's return of an asset (for example OMXS30) can be partially described by the previous day's returns of a group of sector indices. The coefficients need to be estimated before the model can be used, and to improve the accuracy of the model's predictions the variables Y and X^k can be transformed and refined by excluding irrelevant variables.
3.1 Ordinary Least Squares multiple regression
The coefficients in the model are estimated by Ordinary Least Squares (OLS) regression. In (3) the coefficients β_k are obtained by projecting the vector of the dependent variable Y onto the space spanned by the vectors of the covariates X^k and taking the resulting coefficients. In matrix form the solution is

β̂ = (XᵀX)⁻¹XᵀY (4)

which minimizes the sum of squared residuals Σ_i ε̂_i².
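Equation (4) can be checked numerically. The sketch below (Python/NumPy, with synthetic data standing in for the index returns) recovers the coefficients of a known linear model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the regression: 200 observations, 3 covariates.
X = rng.standard_normal((200, 3))
true_beta = np.array([0.5, -0.2, 0.1])
Y = X @ true_beta + 0.1 * rng.standard_normal(200)

# Equation (4): beta_hat = (X^T X)^{-1} X^T Y.
# Solving the normal equations is numerically preferable to forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat)
```

With this little noise, beta_hat lands very close to true_beta; on real return data the estimates are far noisier.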
3.2 Coefficient of determination R²

The coefficient of determination, R², is a number that indicates how well the data fit a regression model. It is defined as

R² = 1 − Σ_i ε̂_i² / Σ_i (Y_i − Ȳ)² (5)

where Ȳ is the mean value of Y_i. The coefficient ranges from 0 to 1 and can be interpreted as the proportion of the variance explained by the model.
3.3 Data snooping
When estimating the coefficients for the regression model, the resulting predictions cannot be used to validate the model on the same set of data. This is commonly known as data snooping, where the hypothesis tested on a sample is also the one suggested by the same sample. To ensure out-of-sample performance going forward in time, each prediction must be made using estimates based only on the data available at that time. Also, due to the non-stationary properties of the underlying process, the estimates will likely change over time. Taking this into account, the model will use a rolling window for estimating the coefficients, including only the past N trading days in the regression. A longer look-back period will reduce the noise in the estimates, but will also be less responsive when the estimates change.
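The rolling-window scheme can be sketched as follows (Python/NumPy on synthetic data; the thesis itself uses R with N = 500, 1000 or 2000 days):

```python
import numpy as np

def rolling_predictions(X, Y, window):
    """For each day t, estimate coefficients by OLS on the `window` days
    before t only, then predict day t -- so no future data leaks in."""
    preds = np.full(len(Y), np.nan)
    for t in range(window, len(Y)):
        beta = np.linalg.lstsq(X[t - window:t], Y[t - window:t], rcond=None)[0]
        preds[t] = X[t] @ beta
    return preds

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4))                 # stand-in covariates
Y = X @ np.array([0.3, 0.0, -0.3, 0.1]) + rng.standard_normal(300)
preds = rolling_predictions(X, Y, window=100)

valid = ~np.isnan(preds)
print(np.mean(np.sign(preds[valid]) == np.sign(Y[valid])))  # hit rate
```

The first `window` entries are NaN by construction, since no coefficients can be estimated before a full window of history exists.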
3.4 Akaike Information Criterion
One problem when specifying the regression model is choosing which covariates should be included. A common way of determining the specification is by the Akaike Information Criterion (AIC). The AIC value of a model is given by

AIC = n ln(Σ_t ε̂_t²) + 2k (6)

where n is the number of observations and k the number of estimated parameters. The criterion rewards models with small error, but at the same time penalizes models with many parameters to discourage over-fitting the data [5].

Finding the model with the lowest AIC value is done by a stepwise algorithm, which starts with the full model and then removes covariates step by step to find improvements [8].
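A minimal version of the AIC computation and the backward stepwise search might look like this (Python/NumPy sketch on synthetic data; R's step() function performs this kind of search in the thesis's setting):

```python
import numpy as np

def aic(X, Y):
    # Equation (6): n * ln(sum of squared residuals) + 2k, with k parameters.
    n, k = X.shape
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = np.sum((Y - X @ beta) ** 2)
    return n * np.log(rss) + 2 * k

def backward_stepwise(X, Y):
    """Start from the full model; repeatedly drop the covariate whose
    removal lowers AIC, stopping when no single removal helps."""
    cols = list(range(X.shape[1]))
    best = aic(X[:, cols], Y)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [j for j in cols if j != c]
            score = aic(X[:, trial], Y)
            if score < best:
                best, cols, improved = score, trial, True
                break
    return cols

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 5))
Y = 0.8 * X[:, 0] + 0.5 * X[:, 2] + 0.5 * rng.standard_normal(500)
print(backward_stepwise(X, Y))  # the informative columns 0 and 2 survive
```

Irrelevant covariates are usually, but not always, dropped; the AIC penalty of 2 per parameter only removes them on average.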
3.5 Normalization
The daily returns for the stock market are very dependent on the specific regime they fall into. The model specified in (3) does not include an intercept and therefore assumes that the mean of Y_t is close to zero. This is a necessary restriction, since any estimate of the long-term mean return will be highly dependent on the period in the sample and is likely not a robust estimate of future returns. Also, since the volatility varies over the sample, the residuals of (3) will be dependent on time, E[ε̂_t²] = σ_t², i.e. there will be heteroskedasticity in the sample.
To account for this and improve the ability of the model to generalize out-of-sample, the daily log returns are normalized by percentile ranking the return X^k_t against the returns for the past 252 days (roughly one trading year), {X^k_t, X^k_{t-1}, ..., X^k_{t-251}}. The percentile ranking is done by giving the raw log return X^k_t a rank according to its value in an ordered list, e.g. the lowest return for the past year is given rank 1 and the highest rank 252, with ties being given the lowest rank, and then calculating the percentile for the return as

R_t = (Rank(X_t) − 1) / (252 − 1).

This is then bounded between −1 and 1 by the transformation

200% × R_t − 100% (7)

The scaling centers the percentile of the median (0.5) at zero, so that each variable is distributed around zero. The same normalization procedure is applied to the dependent variable Y_t and the covariates X^k_t. This will hopefully reduce some of the heteroskedasticity and center the mean close to zero. It also relaxes the linearity restriction of the model, since any monotone dependency between the predicted variable and a covariate can be fitted by the model.
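The normalization step can be sketched as follows (Python/NumPy; the thesis implements this in R, and the toy series below is random):

```python
import numpy as np

def normalize(returns, lookback=252):
    """Percentile-rank each return against the window of the past `lookback`
    observations (current day included), then rescale to [-1, 1] per (7)."""
    out = np.full(len(returns), np.nan)
    for t in range(lookback - 1, len(returns)):
        window = returns[t - lookback + 1:t + 1]
        rank = 1 + np.sum(window < returns[t])   # ties get the lowest rank
        pct = (rank - 1) / (lookback - 1)        # R_t in [0, 1]
        out[t] = 2.0 * pct - 1.0                 # 200% * R_t - 100%
    return out

rng = np.random.default_rng(3)
z = normalize(rng.standard_normal(400))
print(np.nanmin(z), np.nanmax(z))   # stays within [-1, 1]
```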
3.6 Portfolio evaluation and Sharpe ratio
The models will be evaluated by simulating a trading portfolio holding a 100% or −100% exposure to the return of the asset when the predicted next day's return is positive or negative, respectively. The value of the portfolio is given by

V(t) = exp(Σ_{i=1}^{t} Y_i × sign(Ŷ_i)) (8)

where Y_i is the actual raw log return of the asset and Ŷ_i is the return predicted by the model. This expression gives the value of a portfolio with compounded returns, since the sum of logarithms is equal to the logarithm of the product, ln(a) + ln(b) = ln(ab). When the variables are normalized, the next day's return is expected to be positive when the predicted normalized return Ŷ_i is positive. Since the normalized returns are centered with zero at the median of returns for the past 252 days, this means that the implementation expects the median to be close to zero. This is a necessary assumption, since it is difficult to predict what the median of future returns will be, and it is reasonable given that the median is likely small compared to other errors in the model's prediction.
The performance of the portfolio will be evaluated using the Sharpe ratio, a standard measure of the risk-adjusted return of a portfolio. The Sharpe ratio is defined as

S = CAGR / σ (9)

where CAGR stands for the compound annual growth rate,

CAGR = (V(n) / V(0))^{252/n} − 1 (10)

which is the annual geometric percentage return of the portfolio, and σ is the annualized standard deviation of the portfolio's returns. The Sharpe ratio gives the average return earned per unit of risk, where risk is defined by the volatility of the portfolio. A high Sharpe ratio indicates that years with negative returns will be rare and that the value of the portfolio will tend upwards along a smooth path.
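Equations (8)-(10) combine into a short evaluation routine (Python/NumPy sketch; the return series is synthetic, and 252 trading days per year are assumed as in the thesis):

```python
import numpy as np

def sharpe_ratio(log_returns, predictions):
    """Portfolio value per (8), CAGR per (10), Sharpe ratio per (9)."""
    daily = log_returns * np.sign(predictions)       # signed daily log returns
    n = len(daily)
    growth = np.exp(np.sum(daily))                   # V(n) / V(0)
    cagr = growth ** (252.0 / n) - 1.0
    sigma = np.std(daily, ddof=1) * np.sqrt(252.0)   # annualized volatility
    return cagr / sigma

rng = np.random.default_rng(4)
r = 0.01 * rng.standard_normal(1000)
print(sharpe_ratio(r, r))                          # perfect foresight: large S
print(sharpe_ratio(r, rng.standard_normal(1000)))  # random guesses, for comparison
```

Perfect foresight turns every daily return positive, so the Sharpe ratio becomes implausibly large; random guesses give a value scattered around zero.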
3.7 Testing for statistical significance
While accurate predictions from the model will give a high expected Sharpe ratio, positive returns could be obtained by chance alone due to the randomness in the data sample. This means that the calculated Sharpe ratio is only an estimate of the true value and will contain errors that need to be accounted for in the analysis. A common method to infer statistical significance of the estimated values is hypothesis testing. In statistical hypothesis testing a default position, called the null hypothesis, is defined as the assumption that nothing but random chance affects the results. The alternative hypothesis is that there is a phenomenon to be observed in the sample. From this a p-value is calculated, corresponding to the probability of observing results as extreme as those observed if the null hypothesis were true. Commonly, a p-value lower than 0.05 is considered significant, favoring the alternative hypothesis over the null hypothesis, since it is unlikely (less than 1 in 20) to observe such results if the null hypothesis were true.
To determine the significance of the Sharpe ratio, the p-value will be calculated using a Monte Carlo method. If the model could not predict the next day's return with any accuracy, the expected Sharpe ratio would be equivalent to what would be obtained by random chance. The null hypothesis H_0 is therefore that the return and Sharpe ratio of the portfolio are equivalent to those obtained by chance when trading the asset at random with the same net exposure to the returns of the asset traded. In order to calculate the p-value, corresponding to the probability of observing a Sharpe ratio as high as or higher than the observed one if the model's predictions are random, the distribution of Sharpe ratios for random predictions must be created. This is done by reordering the trading exposure for each day, as given by sign(Ŷ_i), at random and calculating the resulting Sharpe ratio S* for this random portfolio using equation (9). Repeating this process a sufficiently large number of times gives the distribution of Sharpe ratios that would be observed if the null hypothesis were true. The p-value can now be calculated from the resampled distribution of Sharpe ratios according to P(S* ≥ S | H_0), which is the percentage of random Sharpe ratios that are as high as or higher than the observed Sharpe ratio.
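The resampling procedure is a permutation test; it can be sketched in Python/NumPy as follows (synthetic data; the thesis does this in R, using the Sharpe definition from Section 3.6):

```python
import numpy as np

def sharpe(daily):
    n = len(daily)
    cagr = np.exp(np.sum(daily)) ** (252.0 / n) - 1.0
    return cagr / (np.std(daily, ddof=1) * np.sqrt(252.0))

def mc_pvalue(log_returns, predictions, n_resamples=2000, seed=0):
    """Fraction of randomly reordered exposure sequences whose Sharpe ratio
    is at least as high as the observed one (net exposure is preserved)."""
    gen = np.random.default_rng(seed)
    exposure = np.sign(predictions)
    observed = sharpe(log_returns * exposure)
    hits = sum(
        sharpe(log_returns * gen.permutation(exposure)) >= observed
        for _ in range(n_resamples)
    )
    return hits / n_resamples

rng = np.random.default_rng(5)
r = 0.01 * rng.standard_normal(750)
print(mc_pvalue(r, r))                         # perfect predictions: p close to 0
print(mc_pvalue(r, rng.standard_normal(750)))  # random predictions, for comparison
```

Shuffling the exposures keeps the number of long and short days fixed, which is exactly the "same net exposure" condition in the null hypothesis above.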
4 Results and Analysis
This section will give the results of the model explained in the previous section when applied to returns of stock indices.
4.1 Predicting the next day's return for OMXS30
The model is tested on OMXS30 using the previous day's returns for all 16 indices: OMXS30 and the 15 sector indices. Each day, the coefficients are estimated by a regression over the past N = (500, 1000, 2000) days, and the returns of the sector indices are used to make a prediction of the next day's return for OMXS30. The resulting predictions are then used to simulate a portfolio that trades the OMXS30 based on the predicted direction for each day, taking a positive or negative exposure to the OMXS30 index depending on the predicted direction. The performance of the portfolio is evaluated by calculating its Sharpe ratio and the associated p-value for the period tested.
The results are tested using as input variables raw returns, normalized returns, and normalized returns combined with AIC to select the most appropriate covariates each day.
                            N = 500           N = 1000          N = 2000
Raw Returns                 -0.0939 (0.6071)  0.1137 (0.3936)   -0.2957 (0.7068)
Normalized Returns          0.1428 (0.3271)   0.5962 (0.0662)   0.7492 (0.1089)
Normalized Returns & AIC    0.2593 (0.2322)   0.7195 (0.0372)   1.1327 (0.0304)

Table 3: Performance on OMXS30, Sharpe ratio and the associated p-value.
Table 3 summarizes the Sharpe ratio for different combinations of the model, along with the p-value for the Sharpe ratio inside the parentheses. The Sharpe ratios increase with each added specification to the model. A notable improvement comes from using normalized returns instead of raw returns, which allows for better generalization out-of-sample. With normalized variables the Sharpe ratios were positive for each value of N, but only borderline significant for N = 1000. The results were significant (p-value below 0.05) for N = 1000 and N = 2000 using predictions from the model with both normalized variables and stepwise AIC selection. This is in agreement with the hypothesis that AIC is useful for removing irrelevant covariates from the model, since including some sector indices might be redundant or introduce more noise into the predictions.
The R² for each regression is fairly low, around 0.005 to 0.03. This is expected, since any predictable component will likely be small and noise will dominate. The return for OMXS30 is negatively correlated with the predicted return for the next day, indicating that negative autocorrelation is a component of the model's predictions.
Figure 1: Portfolio with normalization and AIC.
4.2 Using returns over several days as input
The previous model used only one-day returns as inputs, and better performance could possibly be obtained by using the return over more than one day. Redefining the input variables as

X_t = ln(P_t / P_{t-n}) (11)

and normalizing the result in the same way as before, the model with AIC is tested on the same sample data, with the results shown in Table 4.
         N = 500           N = 1000          N = 2000
n = 2    0.6655 (0.0293)   0.9756 (0.0096)   0.7128 (0.1188)
n = 3    0.6428 (0.0362)   0.7965 (0.0267)   0.4240 (0.2187)
n = 4    0.1678 (0.3085)   0.1869 (0.3222)   0.7677 (0.1099)
n = 5    0.2803 (0.2094)   0.8389 (0.0217)   0.9719 (0.0528)

Table 4: Sharpe ratio and the associated p-value on OMXS30.
The performance is similar to the model with one day's return as input, with no significant improvement. However, with longer-term returns as inputs the portfolio adjustments will be less frequent. This is an advantage from a trading perspective, as lower turnover means lower transaction costs. Portfolio plots are included in the appendix.
4.3 Prediction of deviations from OMXS30
The previous section examined the performance of the model when predicting daily returns for the broad market index. However, the return series contains a lot of white noise that is not captured by the model. One way to reduce the noise and obtain better predictions could be to take the return of a sector index and subtract the return of the broad market, predicting the relative return of the sector. The dependent variable is calculated as

Y_t = ln(P_t / P_{t-1}) − ln(P_t^OMXS30 / P_{t-1}^OMXS30) (12)

where P_t is the price of a sector index. The returns are then ranked as a percentile of the 252 previous days' returns as described in Section 3.5.
The performance of the model is tested in the same way as in the previous section, by simulating trading of a synthetic asset with the daily return given by Y_t in equation (12), with normalized variables and AIC selection for each regression.
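Equation (12) is a one-liner given two price series (Python/NumPy; the prices below are hypothetical):

```python
import numpy as np

# Hypothetical closing prices for a sector index and for OMXS30.
sector = np.array([100.0, 101.0, 103.0, 102.0])
omxs30 = np.array([1500.0, 1512.0, 1520.0, 1516.0])

# Equation (12): relative (spread) log return of the sector versus the market.
Y = np.diff(np.log(sector)) - np.diff(np.log(omxs30))
print(Y)
```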
                       N = 500            N = 1000           N = 2000
Oil & Gas              0.5846 (0.0512)    0.9355 (0.0132)    0.4582 (0.2118)
Financials             1.1349 (0.0006)    1.1145 (0.002)     0.8566 (0.0811)
Automobiles & Parts    2.0158 (<0.0001)   2.2351 (<0.0001)   1.4071 (0.0123)
Health Care            0.0499 (0.4407)    0.3605 (0.1684)    -0.3146 (0.7186)
Industrials            0.8743 (0.0052)    0.9446 (0.0084)    1.8143 (0.0011)
Consumer Service       -0.1072 (0.6292)   -0.1988 (0.7078)   0.5747 (0.1558)
Consumer Goods         0.9478 (0.0031)    1.2454 (0.0009)    1.4162 (0.0081)
Utilities              0.8361 (0.0159)    1.6338 (0.0002)    1.5401 (0.0083)
Food Producers         1.6500 (<0.0001)   1.5951 (0.0002)    1.9845 (0.0009)
Basic Materials        1.3218 (0.0002)    1.4859 (0.0002)    0.5265 (0.1779)
Travel & Leisure       1.0400 (0.0022)    1.4865 (0.0003)    1.1418 (0.0280)
Technology             0.3136 (0.1743)    0.5102 (0.0970)    -0.4474 (0.7951)
Telecommunications     -0.1204 (0.6404)   0.0535 (0.4472)    -0.2625 (0.6891)
Small Cap              2.7341 (<0.0001)   2.2689 (<0.0001)   2.2845 (0.0002)
Mid Cap                3.4078 (<0.0001)   3.7197 (<0.0001)   3.3986 (<0.0001)

Table 5: Sharpe ratio and the associated p-value for portfolios trading the relative return of the sector versus OMXS30.
4.4 Practical trading considerations
For accurate evaluation of the results, some of the assumptions made in the simulation need to be considered. The first is that stock indices are not tradable assets. To get exposure to the price changes of a stock index, a trader needs to either invest in the stocks that make up its components or in a derivative with the index as its underlying. The other consideration is that transaction costs will degrade the returns. Unless the transaction costs per trade are low, the returns at a daily trading frequency will be significantly affected once costs are considered.
5 Support Vector Machine
The previous models all used linear regression to determine the next day's return and then used only the predicted sign for trading. This section examines a more advanced method for predicting just the sign, by classification using the Support Vector Machine (SVM).
5.1 Linear SVM
Given some training data with N points of the form {(x_i, y_i)}, i = 1, ..., N, where x_i ∈ R^p is a p-dimensional vector and y_i ∈ {−1, 1} is the associated label, the training data is said to be linearly separable if there exists a vector w and a scalar b such that

w · x_i + b ≥ 1 if y_i = 1 (13)
w · x_i + b ≤ −1 if y_i = −1 (14)

hold for all i. The inequalities can be rewritten as a single condition,

y_i(w · x_i + b) ≥ 1 (15)
so that the training data is separated by the hyperplane w · x + b = 0, with the margin of the separation given by the distance between the two hyperplanes

w · x_i + b = 1 (16)
w · x_i + b = −1 (17)

The distance between the hyperplanes is 2/|w|, meaning that the best separation of the training data is obtained by minimizing (1/2)|w|² (using the square and the factor 1/2 for mathematical convenience). The optimization problem is therefore

arg min_{w,b} (1/2)|w|² (18)
subject to y_i(w · x_i + b) ≥ 1 (19)

for i = 1, ..., N. This can be solved using standard methods for quadratic programming.
5.1.1 Dual form
Since w is determined by the hyperplane that allows for perfect separation of the data points according to y_i, it will depend on the points x_i that lie precisely on the margin. These vectors x_i are called support vectors and satisfy y_i(w · x_i + b) = 1. Writing w as a linear combination of the training vectors,

w = Σ_i α_i y_i x_i

for some constants α_i ≥ 0 that are non-zero only for the support vectors, using |w|² = wᵀw and deriving the Lagrangian, it is possible to show that the optimization problem has the dual form

arg max_{α_i ≥ 0} Σ_{i=1}^N α_i − (1/2) Σ_{j,k} α_j α_k y_j y_k k(x_j, x_k) (20)
subject to Σ_{i=1}^N α_i y_i = 0 (21)

where k(x_j, x_k) = x_jᵀ x_k is the inner product in Euclidean space, here called the kernel [4].
Figure 4: The hyperplane with the support vector.
5.2 Soft margin and kernels
If the training data is not linearly separable, no hyperplane exists that can split the sample. This means that in order to find a solution, some misclassification must be allowed. Cortes and Vapnik suggested a modification of the margin, called the soft margin method, by introducing a non-negative slack variable ξ_i [4]. The slack variable ξ_i measures the degree of error when classifying point i with a hyperplane. The inequalities that define the separating hyperplanes can be rewritten as

y_i(w · x_i + b) ≥ 1 − ξ_i (22)

for i = 1, ..., N. The objective function for the optimization problem now needs to penalize large values of ξ_i. Using a linear penalty function for the slack variables, the optimization problem is

arg min_{w,b,ξ} { (1/2)|w|² + C Σ_{i=1}^N ξ_i } (23)
subject to y_i(w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0 (24)

for i = 1, ..., N. Here C ≥ 0 is a constant which gives the cost of misclassification during training. The soft margin gives the dual optimization problem in the form

arg max_{α_i ≥ 0} Σ_{i=1}^N α_i − (1/2) Σ_{j,k} α_j α_k y_j y_k k(x_j, x_k) (25)
subject to Σ_{i=1}^N α_i y_i = 0, 0 ≤ α_i ≤ C for all i (26)
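The soft-margin primal (23) is equivalent to minimizing a hinge-loss objective, which permits a simple (sub)gradient-descent sketch instead of the quadratic-programming solvers used in practice (Python/NumPy, synthetic 2-D data; note the misclassification term is averaged over the sample here, which just rescales C):

```python
import numpy as np

def svm_subgradient(X, y, C=1.0, epochs=500, lr=0.01):
    """Minimize 0.5*|w|^2 + C * mean(max(0, 1 - y*(w.x + b))) by
    (sub)gradient descent -- a rescaled form of objective (23)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points with nonzero slack
        grad_w = w - C * (y[viol] @ X[viol]) / n
        grad_b = -C * np.sum(y[viol]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(7)
y = rng.choice([-1.0, 1.0], size=200)
X = y[:, None] * np.array([2.0, 0.0]) + rng.standard_normal((200, 2))
w, b = svm_subgradient(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(accuracy)
```

This is a deliberately swapped-in training method: the thesis (and libraries such as LIBSVM [3]) solve the dual QP (25)-(26); the subgradient route is only meant to make the objective concrete.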
5.2.1 Nonlinear kernels
While linear in its original formulation, the SVM can be used as a nonlinear classifier by replacing the kernel in the optimization problem with a nonlinear kernel function. This allows the algorithm to fit a hyperplane which separates the data points in the transformed feature space, and which may be nonlinear in the original input space. An example of a commonly used kernel is the Gaussian radial basis function

k(x_j, x_k) = exp(−γ |x_j − x_k|²) (27)

with some constant γ ≥ 0. The corresponding feature space is a Hilbert space of infinite dimension, mapping each data point by ϕ(x_j), where ϕ is defined by k(x_j, x_k) = ϕ(x_j) · ϕ(x_k). The SVM does not need to calculate ϕ(x_j) to classify the data points, only the dot product w · ϕ(x) = Σ_i α_i y_i k(x_i, x).

Other examples of kernels are the polynomial kernel k(x_j, x_k) = (x_j · x_k + γ)^d and the hyperbolic tangent k(x_j, x_k) = tanh(x_j · x_k + γ).
5.3 Results for SVM on OMXS30
5.3.1 Results for linear SVM
The SVM classification method is tested in the same way as in previous sections. The sign of the normalized return of OMXS30 is taken as y_i to train the SVM, with the normalized returns from the 16 indices as the data points x. The SVM is trained using data from the past N days, and the predicted sign of the following day is used as the position when simulating a portfolio trading the OMXS30. The Sharpe ratio, with its p-value calculated by the Monte Carlo method, is used to evaluate the portfolios.
Since the soft margin method includes a penalty parameter, the cost of misclassification C, a parameter has to be decided in advance. The effectiveness of the SVM will depend on the selection of C, but interpreting the best choice of setting is difficult. For large values of C the optimization will choose a hyperplane with a smaller margin but few misclassifications; conversely, small values of C will cause the optimization to choose a larger-margin separating hyperplane even though it misclassifies more points. Selecting the parameter using the in-sample data could be done by a parameter sweep, testing an exponentially growing sequence of C, for example C ∈ {2⁻⁵, 2⁻³, ..., 2¹¹, 2¹³}. The result of each parameter value could then be evaluated by cross-validation, excluding some segments of the in-sample data during training and selecting the value with the highest accuracy on the excluded data points. However, this is very computationally expensive when testing a model that continually updates on past data.
To evaluate the impact of the cost parameter, the model is tested with values C ∈ {10⁻², 10⁻¹, 10⁰, 10¹, 10²}. The results of the SVM model using a linear kernel are shown in Table 6.

           N = 500           N = 1000          N = 2000
C = 0.01   0.0368 (0.4665)   0.7176 (0.0389)   1.0293 (0.0405)
C = 0.1    0.3001 (0.1914)   0.4251 (0.1428)   1.0158 (0.0439)
C = 1      0.7927 (0.0133)   0.5562 (0.0803)   1.2260 (0.0211)
C = 10     0.7793 (0.0146)   0.5420 (0.0853)   1.0354 (0.0442)
C = 100    0.7858 (0.0148)   0.5765 (0.0737)   1.0626 (0.0413)

Table 6: Sharpe ratio and the associated p-value on OMXS30 using linear SVM.
5.3.2 Results with radial kernel
The SVM with a radial kernel is also tested. This requires a second parameter, γ, and therefore increases the number of possible variations. The results for γ ∈ {10⁻², 10⁻¹, 10⁰, 10¹, 10²} and the different values of C are summarized in Table 7.
                      N = 500            N = 1000           N = 2000
C = 0.01, γ = 0.01    -0.0066 (0.5051)   0.2312 (0.2416)    -0.4875 (0.6969)
C = 0.01, γ = 0.1     0.0321 (0.4579)    0.2104 (0.2618)    -0.4634 (0.6877)
C = 0.01, γ = 1       0.0173 (0.4702)    0.1364 (0.3204)    -0.4707 (0.6199)
C = 0.01, γ = 10      0.0311 (0.4508)    0.1462 (0.3054)    -0.4888 (0.6358)
C = 0.01, γ = 100     0.0354 (0.4390)    0.3091 (0.1863)    -0.4802 (0.7591)
C = 0.1, γ = 0.01     0.0896 (0.3801)    0.3558 (0.1762)    0.6052 (0.1464)
C = 0.1, γ = 0.1      0.0421 (0.4401)    0.1359 (0.3551)    0.7505 (0.1277)
C = 0.1, γ = 1        0.0173 (0.4688)    0.1364 (0.3216)    -0.4707 (0.6230)
C = 0.1, γ = 10       0.0311 (0.4472)    0.1462 (0.3016)    -0.4888 (0.6341)
C = 0.1, γ = 100      0.0354 (0.4391)    0.3091 (0.1865)    -0.4802 (0.7537)
C = 1, γ = 0.01       -0.2360 (0.7741)   0.7173 (0.0400)    0.6957 (0.1438)
C = 1, γ = 0.1        0.2044 (0.2798)    0.4489 (0.1338)    0.4389 (0.2839)
C = 1, γ = 1          0.2312 (0.2393)    0.5361 (0.0699)    -0.3090 (0.5367)
C = 1, γ = 10         -0.0202 (0.5142)   0.1118 (0.3493)    -0.8368 (0.7834)
C = 1, γ = 100        0.0886 (0.3836)    0.2879 (0.2021)    -0.4802 (0.7573)
C = 10, γ = 0.01      0.1259 (0.3536)    0.6407 (0.0581)    0.4578 (0.2612)
C = 10, γ = 0.1       0.1865 (0.3005)    0.5607 (0.0793)    0.1870 (0.34016)
C = 10, γ = 1         -0.2055 (0.7299)   -0.2523 (0.7274)   -0.1666 (0.4506)
C = 10, γ = 10        0.0785 (0.3935)    0.1522 (0.3002)    -0.9093 (0.8338)
C = 10, γ = 100       0.0479 (0.4297)    0.2735 (0.2135)    -0.4771 (0.7589)
C = 100, γ = 0.01     0.2033 (0.2855)    0.4149 (0.1504)    -0.1155 (0.6888)
C = 100, γ = 0.1      -0.0627 (0.5814)   0.6739 (0.0468)    0.5415 (0.1820)
C = 100, γ = 1        -0.2283 (0.7556)   -0.0156 (0.4782)   -0.2906 (0.5159)
C = 100, γ = 10       0.0785 (0.3920)    0.1662 (0.2916)    -0.9153 (0.8340)
C = 100, γ = 100      0.0479 (0.4313)    0.2735 (0.2115)    -0.4771 (0.7538)
6 Discussion and Conclusion
The purpose of this thesis was to test the predictability of stock indices with regression models using intermarket factors. The first models used multiple linear regression to predict the daily return of the OMXS30 index with the returns from 16 different sector indices as covariates. To improve performance, the model was also tested with normalized covariates and then refined by model selection with AIC. The results were generally positive and to some degree support the hypothesis that past returns can give an indication of future returns. Some combinations of the model were able to generate statistically significant risk-adjusted returns when tested on historical data.

The multiple regression model was also used to test the predictability of the relative return between a sector index and the main index OMXS30. The results were generally positive and in some cases highly significant; however, practical use of the models would need to address important implementation issues.

A classification method using a support vector machine was also tested to predict the direction of the OMXS30 index. The results for linear classification were just as good as those for the linear regression model with AIC; however, nonlinear classification with a radial kernel failed to generate consistent results.
Appendix
Figures 5-7: Portfolios with n = 2.
Figures 8-10: Portfolios with n = 3.
Figures 11-13: Portfolios with n = 4.
References
[1] David Aronson. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley, 1 edition.
[2] Anna Bergfast. Automated trading using a dip searching strategy. Master’s thesis, Royal Institute of Technology, Stockholm, 2009.
[3] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. 2001, updated March 4, 2013. URL: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf [Online; accessed 2015-04-30].
[4] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Technical report, AT&T Labs-Research, USA.
[5] Harald Lang. Elements of Regression Analysis. Royal Institute of Technology, Stockholm.
[6] Andrew W. Lo and A. Craig MacKinlay. A Non-Random Walk Down Wall Street (5th ed.). Princeton University Press, Princeton, 2002.
[7] Fredrik Hallgren. On Prediction and Filtering of Stock Index Returns. Master's thesis, Royal Institute of Technology, Stockholm, 2011.