Modeling and Forecasting Stock Index Returns using
Intermarket Factor Models
Predicting Returns and Return Spreads using Multiple Regression and Classification
Emil Tingström
SA104X Degree Project in Mathematical Statistics
Department of Mathematical Statistics Royal Institute of Technology
Abstract
The purpose of this thesis is to examine the predictability of stock indices with regression models based on intermarket factors. The underlying idea is that there is some correlation between past price changes and future price changes, and that models attempting to capture this could be improved by including information derived from correlated assets to make predictions of future price changes. The models are tested using the daily returns from Swedish stock indices and evaluated from a portfolio perspective and their statistical significance. Prediction of the direction of the price is also tested by Support vector machine classification on the OMXS30 index. The results indicate that there is some predictability in the market, in disagreement with the random walk hypothesis.
Contents
1 Introduction
  1.1 Quantitative trading
  1.2 Stock indices
  1.3 Purpose
  1.4 Outline
2 Initial Data Analysis
3 Model and Methodology
  3.1 Ordinary Least Squares multiple regression
  3.2 Coefficient of determination R²
  3.3 Data snooping
  3.4 Akaike Information Criterion
  3.5 Normalization
  3.6 Portfolio evaluation and Sharpe ratio
  3.7 Testing for statistical significance
4 Results and Analysis
  4.1 Predicting the next day's return for OMXS30
  4.2 Using returns over several days as input
  4.3 Prediction of deviations from OMXS30
  4.4 Practical trading considerations
5 Support Vector Machine
  5.1 Linear SVM
    5.1.1 Dual form
  5.2 Soft margin and kernels
    5.2.1 Nonlinear kernels
  5.3 Results for SVM on OMXS30
    5.3.1 Results for linear SVM
    5.3.2 Results with radial kernel
6 Discussion and Conclusion
1 Introduction
This section will introduce the subject of quantitative trading and trading strategies and note some prior research on the subject. The purpose of this paper will then be presented as well as the outline of the report.
1.1 Quantitative trading
With the advent of information technology and computers, approaching the market with quantitative models has become common. Quantitative trading models make use of mathematical and statistical analysis to exploit predictable patterns in financial data on which to base trading decisions.
Strategies based on quantitative trading models can usually be classified as either contrarian or trend following. Contrarian strategies trade against price changes, seeking to capitalize when the price returns to its previous equilibrium level. In contrast, trend-following strategies trade in the direction of previous price changes to capitalize on shifts in the balance of supply and demand. The success of these strategies depends on how well past price changes correlate with future price changes. If there were no correlation to exploit, then the logarithm of the price at a point in time, X_t, could be represented by

X_t = X_{t-1} + ε_t (1)

with E[ε_t] = 0 and zero autocorrelation, E[ε_t ε_τ] = 0 for t ≠ τ. This is referred to as the random walk model and is consistent with the hypothesis that markets are efficient. Evidence from academic research has suggested that this model might be flawed and that stock index prices exhibit some level of correlation, E[ε_t ε_τ] ≠ 0, that would lead to predictability; however, it might not be large enough to produce a risk-adjusted return above the risk-free rate after accounting for transaction costs [6].
1.2 Stock indices
The OMXS30 PI stock index is a price index that represents the 30 most heavily traded stocks listed in Stockholm on Nasdaq OMX. It is generally used to track broad market movements in the Swedish stock market, since it accounts for a significant share of the total market capitalization listed in Sweden. Other indices track the performance of specific business sectors, or of a specific segment based on market capitalization, such as small-capitalization stocks.
1.3 Purpose
The purpose of this thesis is to investigate the predictability of Swedish stock indices. A previous study investigating contrarian strategies on the OMXS30 using daily data found that the index exhibits some tendency to regain short-term losses and pull back after short-term gains, indicating a negative autocorrelation in short-term returns [2]. Another study examining autoregressive models to predict European stock indices found that while the performance using the past 1 to 10 days' returns was generally poor, it improved when the returns of other, correlated indices were included as input to the model. The intuition behind this was that combining correlated variables would eliminate some of the white noise, since linear combinations of the uncorrelated noise terms offset each other [7]. Predictability may thus be driven not only by autocorrelation but also by intermarket relationships. For example, capital flow into a riskier sector could be an indication of positive market sentiment and positive future price changes.
Based on this, an attempt is made to predict the future return of an asset from the returns of a group of correlated stock indices. Using different methods of regression, the future returns of different assets are predicted and the forecasts are evaluated from a portfolio perspective and for statistical significance. The dataset analyzed here consists of the closing prices for the main index of the Swedish stock market, OMXS30 PI, along with different sector indices, for the period 2002-12-27 to 2015-04-13. Numerical analysis is carried out using the programming language R with additional packages for the statistical analysis.
1.4 Outline

2 Initial Data Analysis
The closing prices for all indices are collected from Nasdaq OMX and cleaned by removing dates where the price for any index was missing. The indices used are:
• OMXS30 PI
• OMXS Oil & Gas PI
• OMXS Financials PI
• OMXS Automobiles & Parts PI
• OMXS Health Care PI
• OMXS Industrials PI
• OMXS Consumer Services PI
• OMXS Consumer Goods PI
• OMXS Utilities PI
• OMXS Food Producers PI
• OMXS Basic Materials PI
• OMXS Travel & Leisure PI
• OMXS Technology PI
• OMXS Telecommunications PI
• OMXS Small Cap PI
• OMXS Mid Cap PI
From this the daily log return can be computed:

X_t = ln(P_t) − ln(P_{t-1}) = ln(P_t / P_{t-1}) (2)

A total of 3084 data points per index are used. The correlations between the returns are displayed in Table 1.
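As a minimal illustration of equation (2): the thesis carries out its analysis in R, but the sketch below uses Python with NumPy, and the price series is made up for the example.

```python
import numpy as np

# Hypothetical daily closing prices for one index (illustrative values only).
prices = np.array([100.0, 101.5, 100.8, 102.3, 101.9])

# Equation (2): X_t = ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1})
log_returns = np.diff(np.log(prices))

print(log_returns)
```

A convenient property of log returns is that they telescope: the sum of the daily log returns equals the log return over the whole period.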
                     OMX  O&G  Fin  A&P  HlC  Ind  CSv  CGd  Utl  FdP  BMa  T&L  Tec  Tel  SmC  MdC
OMXS30                1   .55  .93  .68  .62  .93  .78  .83  .27  .43  .77  .56  .69  .71  .72  .82
Oil & Gas                  1   .53  .45  .37  .54  .42  .50  .19  .32  .55  .34  .31  .40  .53  .59
Financials                      1   .63  .55  .84  .68  .77  .26  .43  .74  .56  .55  .63  .70  .81
Automobiles & Parts                  1   .49  .70  .55  .67  .25  .41  .63  .50  .45  .47  .67  .74
Health Care                               1   .55  .50  .57  .22  .37  .51  .42  .41  .45  .57  .61
Industrials                                    1   .69  .79  .27  .43  .79  .56  .56  .61  .71  .83
Consumer Service                                    1   .66  .20  .39  .58  .52  .46  .52  .59  .69
Consumer Goods                                           1   .26  .45  .71  .52  .51  .57  .67  .78
Utilities                                                     1   .21  .27  .21  .19  .18  .33  .32
Food Producers                                                     1   .39  .35  .26  .33  .52  .56
Basic Materials                                                         1   .51  .45  .51  .67  .78
Travel & Leisure                                                             1   .35  .40  .59  .69
Technology                                                                        1   .43  .50  .52
Telecommunications                                                                     1   .52  .57
Small Cap                                                                                   1  .86
Mid Cap                                                                                         1

Table 1: Return correlation matrix (upper triangle; column abbreviations follow the row order).
                       Mean    Standard deviation
OMXS30                 0.086   0.223
Oil & Gas              0.062   0.415
Financials             0.097   0.250
Automobiles & Parts    0.115   0.217
Health Care            0.069   0.179
Industrials            0.117   0.257
Consumer Service       0.101   0.211
Consumer Goods         0.070   0.199
Utilities              -0.103  0.357
Food Producers         0.103   0.204
Basic Materials        0.070   0.283
Travel & Leisure       0.038   0.260
Technology             0.050   0.313
Telecommunications     0.026   0.229
Small Cap              0.102   0.139
Mid Cap                0.127   0.173

Table 2: Mean and standard deviation of the returns.
3 Model and Methodology
The main model tested in this paper is a multivariate autoregressive (AR) model written as

Y_t = Σ_k β_k X^k_{t-1} + ε_t (3)

where Y is the return of the asset to be predicted and X^k represents the return of sector index k from the previous day. The model is based on the hypothesis that the next day's return of an asset (for example OMXS30) can be partially described by the previous day's returns of a group of sector indices. The coefficients need to be estimated before the model can be used, and to improve the accuracy of the model's predictions the variables Y and X^k can be transformed and refined by excluding irrelevant variables.
3.1 Ordinary Least Squares multiple regression
The coefficients in the model are estimated by Ordinary Least Squares (OLS) regression. In (3) the coefficients β_k are obtained by projecting the vector of the dependent variable Y onto the space spanned by the vectors of the covariates X^k and taking the resulting coefficients. In matrix form the solution is

β̂ = (XᵀX)⁻¹XᵀY (4)

which minimizes the sum of squared residuals Σ_i ε̂_i².
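Equation (4) can be checked numerically. The sketch below (Python/NumPy, with synthetic data standing in for the index returns) recovers the coefficients of a known linear model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the regression: 200 observations, 3 covariates.
X = rng.standard_normal((200, 3))
true_beta = np.array([0.5, -0.2, 0.1])
Y = X @ true_beta + 0.1 * rng.standard_normal(200)

# Equation (4): beta_hat = (X^T X)^{-1} X^T Y.
# Solving the normal equations is numerically preferable to forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat)
```

With this little noise, beta_hat lands very close to true_beta; on real return data the estimates are far noisier.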
3.2 Coefficient of determination R²

The coefficient of determination, R², is a number that indicates how well the data fit a regression model. It is defined as

R² = 1 − Σ_i ε̂_i² / Σ_i (Y_i − Ȳ)² (5)

where Ȳ is the mean value of Y_i. The coefficient ranges from 0 to 1 and can be interpreted as the proportion of the variance explained by the model.
3.3 Data snooping
When estimating the coefficients for the regression model, the resulting predictions cannot be used to validate the model on the same set of data. This is commonly known as data snooping, where the hypothesis tested on a sample is also the one suggested by the same sample. To ensure out-of-sample performance going forward in time, each prediction must be made using estimates based only on the data available at that time. Also, due to the non-stationary properties of the underlying process, the estimates will likely change over time. Taking this into account, the model will use a rolling window for estimating the coefficients, including only the past N trading days in the regression. A longer look-back period will reduce the noise in the estimates, but will also be less responsive when the estimates change.
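The rolling-window scheme can be sketched as follows (Python/NumPy on synthetic data; the thesis itself uses R with N = 500, 1000 or 2000 days):

```python
import numpy as np

def rolling_predictions(X, Y, window):
    """For each day t, estimate coefficients by OLS on the `window` days
    before t only, then predict day t -- so no future data leaks in."""
    preds = np.full(len(Y), np.nan)
    for t in range(window, len(Y)):
        beta = np.linalg.lstsq(X[t - window:t], Y[t - window:t], rcond=None)[0]
        preds[t] = X[t] @ beta
    return preds

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 4))                 # stand-in covariates
Y = X @ np.array([0.3, 0.0, -0.3, 0.1]) + rng.standard_normal(300)
preds = rolling_predictions(X, Y, window=100)

valid = ~np.isnan(preds)
print(np.mean(np.sign(preds[valid]) == np.sign(Y[valid])))  # hit rate
```

The first `window` entries are NaN by construction, since no coefficients can be estimated before a full window of history exists.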
3.4 Akaike Information Criterion
One problem when specifying the regression model is choosing which covariates should be included. A common way of determining the specification is by the Akaike Information Criterion (AIC). The AIC value of a model is given by

AIC = n ln(Σ_t ε̂_t²) + 2k (6)

where n is the number of observations and k the number of estimated parameters. The criterion rewards models with small error, but at the same time penalizes models with many parameters to discourage over-fitting the data [5].

Finding the model with the lowest AIC value is done by a stepwise algorithm, which starts with the full model and then removes covariates step by step to find improvements [8].
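A minimal version of the AIC computation and the backward stepwise search might look like this (Python/NumPy sketch on synthetic data; R's step() function performs this kind of search in the thesis's setting):

```python
import numpy as np

def aic(X, Y):
    # Equation (6): n * ln(sum of squared residuals) + 2k, with k parameters.
    n, k = X.shape
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]
    rss = np.sum((Y - X @ beta) ** 2)
    return n * np.log(rss) + 2 * k

def backward_stepwise(X, Y):
    """Start from the full model; repeatedly drop the covariate whose
    removal lowers AIC, stopping when no single removal helps."""
    cols = list(range(X.shape[1]))
    best = aic(X[:, cols], Y)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in list(cols):
            trial = [j for j in cols if j != c]
            score = aic(X[:, trial], Y)
            if score < best:
                best, cols, improved = score, trial, True
                break
    return cols

rng = np.random.default_rng(2)
X = rng.standard_normal((500, 5))
Y = 0.8 * X[:, 0] + 0.5 * X[:, 2] + 0.5 * rng.standard_normal(500)
print(backward_stepwise(X, Y))  # the informative columns 0 and 2 survive
```

Irrelevant covariates are usually, but not always, dropped; the AIC penalty of 2 per parameter only removes them on average.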
3.5 Normalization
The daily returns for the stock market are very dependent on the specific regime they fall into. The model specified in (3) does not include an intercept and therefore assumes that the mean of Y_t is close to zero. This is a necessary restriction, since any estimate of the long-term mean return will be highly dependent on the period in the sample and is likely not a robust estimate of future returns. Also, since the volatility varies over the sample, the residuals of (3) will be dependent on time, E[ε̂_t²] = σ_t², i.e. there will be heteroskedasticity in the sample.
To account for this and improve the ability of the model to generalize out-of-sample, the daily log returns are normalized by percentile ranking the return X^k_t against the returns for the past 252 days (roughly one trading year), {X^k_t, X^k_{t-1}, ..., X^k_{t-251}}. The percentile ranking is done by giving the raw log return X^k_t a rank according to its value in an ordered list, e.g. the lowest return for the past year is given rank 1 and the highest rank 252, with ties being given the lowest rank, and then calculating the percentile for the return as

R_t = (Rank(X_t) − 1) / (252 − 1).

This is then bounded between −1 and 1 by the transformation

200% × R_t − 100% (7)

The scaling centers the percentile of the median (0.5) at zero, so that each variable is distributed around zero. The same normalization procedure is applied to the dependent variable Y_t and the covariates X^k_t. This will hopefully reduce some of the heteroskedasticity and center the mean close to zero. It also relaxes the linearity restriction of the model, since any monotone dependency between the predicted variable and a covariate can be fitted by the model.
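The normalization step can be sketched as follows (Python/NumPy; the thesis implements this in R, and the toy series below is random):

```python
import numpy as np

def normalize(returns, lookback=252):
    """Percentile-rank each return against the window of the past `lookback`
    observations (current day included), then rescale to [-1, 1] per (7)."""
    out = np.full(len(returns), np.nan)
    for t in range(lookback - 1, len(returns)):
        window = returns[t - lookback + 1:t + 1]
        rank = 1 + np.sum(window < returns[t])   # ties get the lowest rank
        pct = (rank - 1) / (lookback - 1)        # R_t in [0, 1]
        out[t] = 2.0 * pct - 1.0                 # 200% * R_t - 100%
    return out

rng = np.random.default_rng(3)
z = normalize(rng.standard_normal(400))
print(np.nanmin(z), np.nanmax(z))   # stays within [-1, 1]
```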
3.6 Portfolio evaluation and Sharpe ratio
The models will be evaluated by simulating a trading portfolio holding a 100% or −100% exposure to the return of the asset when the predicted next day's return is positive or negative, respectively. The value of the portfolio is given by

V(t) = exp(Σ_{i=1}^{t} Y_i × sign(Ŷ_i)) (8)

where Y_i is the actual raw log return of the asset and Ŷ_i is the return predicted by the model. This expression gives the value of a portfolio with compounded returns, since the sum of logarithms is equal to the logarithm of the product, ln(a) + ln(b) = ln(ab). When the variables are normalized, the next day's return is expected to be positive when the predicted normalized return Ŷ_i is positive. Since the normalized returns are centered with zero at the median of returns for the past 252 days, this means that the implementation expects the median to be close to zero. This is a necessary assumption, since it is difficult to predict what the median of future returns will be, and it is reasonable given that the median is likely small compared to other errors in the model's prediction.
The performance of the portfolio will be evaluated using the Sharpe ratio, a standard measure of the risk-adjusted return of a portfolio. The Sharpe ratio is defined as

S = CAGR / σ (9)

where CAGR stands for the compound annual growth rate,

CAGR = (V(n) / V(0))^{252/n} − 1 (10)

which is the annual geometric percentage return of the portfolio, and σ is the annualized standard deviation of the portfolio's returns. The Sharpe ratio gives the average return earned per unit of risk, where risk is defined by the volatility of the portfolio. A high Sharpe ratio indicates that years with negative returns will be rare and that the value of the portfolio will tend upwards along a smooth path.
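Equations (8)-(10) combine into a short evaluation routine (Python/NumPy sketch; the return series is synthetic, and 252 trading days per year are assumed as in the thesis):

```python
import numpy as np

def sharpe_ratio(log_returns, predictions):
    """Portfolio value per (8), CAGR per (10), Sharpe ratio per (9)."""
    daily = log_returns * np.sign(predictions)       # signed daily log returns
    n = len(daily)
    growth = np.exp(np.sum(daily))                   # V(n) / V(0)
    cagr = growth ** (252.0 / n) - 1.0
    sigma = np.std(daily, ddof=1) * np.sqrt(252.0)   # annualized volatility
    return cagr / sigma

rng = np.random.default_rng(4)
r = 0.01 * rng.standard_normal(1000)
print(sharpe_ratio(r, r))                          # perfect foresight: large S
print(sharpe_ratio(r, rng.standard_normal(1000)))  # random guesses, for comparison
```

Perfect foresight turns every daily return positive, so the Sharpe ratio becomes implausibly large; random guesses give a value scattered around zero.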
3.7 Testing for statistical significance
While accurate predictions from the model will give a high expected Sharpe ratio, positive returns could be obtained by chance alone due to the randomness in the data sample. This means that the calculated Sharpe ratio is only an estimate of the true value and will contain errors that need to be accounted for in the analysis. A common method to infer statistical significance of the estimated values is hypothesis testing. In statistical hypothesis testing a default position, called the null hypothesis, is defined as the assumption that nothing but random chance affects the results. The alternative hypothesis is that there is a phenomenon to be observed in the sample. From this a p-value is calculated, corresponding to the probability of observing results as extreme as those observed if the null hypothesis were true. Commonly, a p-value lower than 0.05 is considered significant, favoring the alternative hypothesis over the null hypothesis, since it is unlikely (less than 1 in 20) to observe such results if the null hypothesis were true.
To determine the significance of the Sharpe ratio, the p-value will be calculated using a Monte Carlo method. If the model could not predict the next day's return with any accuracy, the expected Sharpe ratio would be equivalent to what would be obtained by random chance. The null hypothesis H_0 is therefore that the return and Sharpe ratio of the portfolio are equivalent to those obtained by chance when trading the asset at random with the same net exposure to the returns of the asset traded. In order to calculate the p-value, corresponding to the probability of observing a Sharpe ratio as high as or higher than the observed one if the model's predictions are random, the distribution of Sharpe ratios for random predictions must be created. This is done by reordering the trading exposure for each day, as given by sign(Ŷ_i), at random and calculating the resulting Sharpe ratio S* for this random portfolio using equation (9). Repeating this process a sufficiently large number of times gives the distribution of Sharpe ratios that would be observed if the null hypothesis were true. The p-value can now be calculated from the resampled distribution of Sharpe ratios according to P(S* ≥ S | H_0), which is the percentage of random Sharpe ratios that are as high as or higher than the observed Sharpe ratio.
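The resampling procedure is a permutation test; it can be sketched in Python/NumPy as follows (synthetic data; the thesis does this in R, using the Sharpe definition from Section 3.6):

```python
import numpy as np

def sharpe(daily):
    n = len(daily)
    cagr = np.exp(np.sum(daily)) ** (252.0 / n) - 1.0
    return cagr / (np.std(daily, ddof=1) * np.sqrt(252.0))

def mc_pvalue(log_returns, predictions, n_resamples=2000, seed=0):
    """Fraction of randomly reordered exposure sequences whose Sharpe ratio
    is at least as high as the observed one (net exposure is preserved)."""
    gen = np.random.default_rng(seed)
    exposure = np.sign(predictions)
    observed = sharpe(log_returns * exposure)
    hits = sum(
        sharpe(log_returns * gen.permutation(exposure)) >= observed
        for _ in range(n_resamples)
    )
    return hits / n_resamples

rng = np.random.default_rng(5)
r = 0.01 * rng.standard_normal(750)
print(mc_pvalue(r, r))                         # perfect predictions: p close to 0
print(mc_pvalue(r, rng.standard_normal(750)))  # random predictions, for comparison
```

Shuffling the exposures keeps the number of long and short days fixed, which is exactly the "same net exposure" condition in the null hypothesis above.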
4 Results and Analysis
This section will give the results of the model explained in the previous section when applied to returns of stock indices.
4.1 Predicting the next day's return for OMXS30
The model is tested on OMXS30 using the previous day's returns for all 16 indices: OMXS30 and the 15 sector indices. Each day, the coefficients are estimated by a regression over the past N = (500, 1000, 2000) days, and the returns of the sector indices are used to make a prediction of the next day's return for OMXS30. The resulting predictions are then used to simulate a portfolio that trades the OMXS30 based on the predicted direction for each day, taking a positive or negative exposure to the OMXS30 index depending on the predicted direction. The performance of the portfolio is evaluated by calculating its Sharpe ratio and the associated p-value for the period tested.
The results are tested using as input variables raw returns, normalized returns, and normalized returns combined with AIC to select the most appropriate covariates each day.
                            N = 500           N = 1000          N = 2000
Raw Returns                 -0.0939 (0.6071)  0.1137 (0.3936)   -0.2957 (0.7068)
Normalized Returns          0.1428 (0.3271)   0.5962 (0.0662)   0.7492 (0.1089)
Normalized Returns & AIC    0.2593 (0.2322)   0.7195 (0.0372)   1.1327 (0.0304)

Table 3: Performance on OMXS30, Sharpe ratio and the associated p-value.
Table 3 summarizes the Sharpe ratio for different combinations of the model, along with the p-value for the Sharpe ratio inside the parentheses. The Sharpe ratios increase with each added specification to the model. A notable improvement comes from using normalized returns instead of raw returns, which allows for better generalization out-of-sample. With normalized variables the Sharpe ratios were positive for each value of N, but only borderline significant for N = 1000. The results were significant (p-value below 0.05) for N = 1000 and N = 2000 using predictions from the model with both normalized variables and stepwise AIC selection. This is in agreement with the hypothesis that AIC is useful for removing irrelevant covariates from the model, since including some sector indices might be redundant or introduce more noise into the predictions.
The R² for each regression is fairly low, around 0.005 to 0.03. This is expected, since any predictable component will likely be small and noise will dominate. The return for OMXS30 is negatively correlated with the predicted return for the next day, indicating that negative autocorrelation is a component of the model's predictions.
Figure 1: Portfolio with normalization and AIC.
4.2 Using returns over several days as input
The previous model used only one-day returns as inputs, and better performance could possibly be obtained by using the return over more than one day. Redefining the input variables as

X_t = ln(P_t / P_{t-n}) (11)

and normalizing the result in the same way as before, the model with AIC is tested on the same sample data, with the results shown in Table 4.
         N = 500           N = 1000          N = 2000
n = 2    0.6655 (0.0293)   0.9756 (0.0096)   0.7128 (0.1188)
n = 3    0.6428 (0.0362)   0.7965 (0.0267)   0.4240 (0.2187)
n = 4    0.1678 (0.3085)   0.1869 (0.3222)   0.7677 (0.1099)
n = 5    0.2803 (0.2094)   0.8389 (0.0217)   0.9719 (0.0528)

Table 4: Sharpe ratio and the associated p-value on OMXS30.
The performance is similar to the model with one day's return as input, with no significant improvement. However, with longer-term returns as inputs the portfolio adjustments will be less frequent. This is an advantage from a trading perspective, as lower turnover means lower transaction costs. Portfolio plots are included in the appendix.
4.3 Prediction of deviations from OMXS30
The previous section examined the performance of the model when predicting daily returns for the broad market index. However, the return series contains a lot of white noise that is not captured by the model. One way to reduce the noise and obtain better predictions could be to take the return of a sector index and subtract the return of the broad market, predicting the relative return of the sector. The dependent variable is calculated as

Y_t = ln(P_t / P_{t-1}) − ln(P_t^OMXS30 / P_{t-1}^OMXS30) (12)

where P_t is the price of a sector index. The returns are then ranked as a percentile of the 252 previous days' returns as described in Section 3.5.
The performance of the model is tested in the same way as in the previous section, by simulating trading of a synthetic asset with the daily return given by Y_t in equation (12), with normalized variables and AIC selection for each regression.
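Equation (12) is a one-liner given two price series (Python/NumPy; the prices below are hypothetical):

```python
import numpy as np

# Hypothetical closing prices for a sector index and for OMXS30.
sector = np.array([100.0, 101.0, 103.0, 102.0])
omxs30 = np.array([1500.0, 1512.0, 1520.0, 1516.0])

# Equation (12): relative (spread) log return of the sector versus the market.
Y = np.diff(np.log(sector)) - np.diff(np.log(omxs30))
print(Y)
```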
                       N = 500            N = 1000           N = 2000
Oil & Gas              0.5846 (0.0512)    0.9355 (0.0132)    0.4582 (0.2118)
Financials             1.1349 (0.0006)    1.1145 (0.002)     0.8566 (0.0811)
Automobiles & Parts    2.0158 (<0.0001)   2.2351 (<0.0001)   1.4071 (0.0123)
Health Care            0.0499 (0.4407)    0.3605 (0.1684)    -0.3146 (0.7186)
Industrials            0.8743 (0.0052)    0.9446 (0.0084)    1.8143 (0.0011)
Consumer Service       -0.1072 (0.6292)   -0.1988 (0.7078)   0.5747 (0.1558)
Consumer Goods         0.9478 (0.0031)    1.2454 (0.0009)    1.4162 (0.0081)
Utilities              0.8361 (0.0159)    1.6338 (0.0002)    1.5401 (0.0083)
Food Producers         1.6500 (<0.0001)   1.5951 (0.0002)    1.9845 (0.0009)
Basic Materials        1.3218 (0.0002)    1.4859 (0.0002)    0.5265 (0.1779)
Travel & Leisure       1.0400 (0.0022)    1.4865 (0.0003)    1.1418 (0.0280)
Technology             0.3136 (0.1743)    0.5102 (0.0970)    -0.4474 (0.7951)
Telecommunications     -0.1204 (0.6404)   0.0535 (0.4472)    -0.2625 (0.6891)
Small Cap              2.7341 (<0.0001)   2.2689 (<0.0001)   2.2845 (0.0002)
Mid Cap                3.4078 (<0.0001)   3.7197 (<0.0001)   3.3986 (<0.0001)

Table 5: Sharpe ratio and the associated p-value for portfolios trading the relative return of the sector versus OMXS30.
4.4 Practical trading considerations
For accurate evaluation of the results, some of the assumptions made in the simulation need to be considered. The first is that stock indices are not tradable assets. To get exposure to the price changes of a stock index, a trader needs to either invest in the stocks that make up its components or in a derivative with the index as its underlying. The other consideration is that transaction costs will degrade the returns. Unless the transaction costs per trade are low, the returns at a daily trading frequency will be significantly affected once costs are considered.
5 Support Vector Machine
The previous models all used linear regression to determine the next day's return and then used only the predicted sign for trading. This section examines a more advanced method for predicting just the sign, by classification using the Support Vector Machine (SVM).
5.1 Linear SVM
Given some training data with N points of the form {(x_i, y_i)}, i = 1, ..., N, where x_i ∈ R^p is a p-dimensional vector and y_i ∈ {−1, 1} is the associated label, the training data is said to be linearly separable if there exists a vector w and a scalar b such that

w · x_i + b ≥ 1 if y_i = 1 (13)
w · x_i + b ≤ −1 if y_i = −1 (14)

hold for all i. The inequalities can be rewritten as a single condition,

y_i(w · x_i + b) ≥ 1 (15)
so that the training data is separated by the hyperplane w · x + b = 0, with the margin of the separation given by the distance between the two hyperplanes

w · x_i + b = 1 (16)
w · x_i + b = −1 (17)

The distance between the hyperplanes is 2/|w|, meaning that the best separation of the training data is obtained by minimizing (1/2)|w|² (using the square and the factor 1/2 for mathematical convenience). The optimization problem is therefore

arg min_{w,b} (1/2)|w|² (18)
subject to y_i(w · x_i + b) ≥ 1 (19)

for i = 1, ..., N. This can be solved using standard methods for quadratic programming.
5.1.1 Dual form
Since w is determined by the hyperplane that allows for perfect separation of the data points according to y_i, it will depend on the points x_i that lie precisely on the margin. These vectors x_i are called support vectors and satisfy y_i(w · x_i + b) = 1. Writing w as a linear combination of the training vectors,

w = Σ_i α_i y_i x_i

for some constants α_i ≥ 0 that are non-zero only for the support vectors, using |w|² = wᵀw and deriving the Lagrangian, it is possible to show that the optimization problem has the dual form

arg max_{α_i ≥ 0} Σ_{i=1}^N α_i − (1/2) Σ_{j,k} α_j α_k y_j y_k k(x_j, x_k) (20)
subject to Σ_{i=1}^N α_i y_i = 0 (21)

where k(x_j, x_k) = x_jᵀ x_k is the inner product in Euclidean space, here called the kernel [4].
Figure 4: The hyperplane with the support vector.
5.2 Soft margin and kernels
If the training data is not linearly separable, no hyperplane exists that can split the sample. This means that in order to find a solution, some misclassification must be allowed. Cortes and Vapnik suggested a modification of the margin, called the soft margin method, by introducing a non-negative slack variable ξ_i [4]. The slack variable ξ_i measures the degree of error when classifying point i with a hyperplane. The inequalities that define the separating hyperplanes can be rewritten as

y_i(w · x_i + b) ≥ 1 − ξ_i (22)

for i = 1, ..., N. The objective function for the optimization problem now needs to penalize large values of ξ_i. Using a linear penalty function for the slack variables, the optimization problem is

arg min_{w,b,ξ} { (1/2)|w|² + C Σ_{i=1}^N ξ_i } (23)
subject to y_i(w · x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0 (24)

for i = 1, ..., N. Here C ≥ 0 is a constant which gives the cost of misclassification during training. The soft margin gives the dual optimization problem in the form

arg max_{α_i ≥ 0} Σ_{i=1}^N α_i − (1/2) Σ_{j,k} α_j α_k y_j y_k k(x_j, x_k) (25)
subject to Σ_{i=1}^N α_i y_i = 0, 0 ≤ α_i ≤ C for all i (26)
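The soft-margin primal (23) is equivalent to minimizing a hinge-loss objective, which permits a simple (sub)gradient-descent sketch instead of the quadratic-programming solvers used in practice (Python/NumPy, synthetic 2-D data; note the misclassification term is averaged over the sample here, which just rescales C):

```python
import numpy as np

def svm_subgradient(X, y, C=1.0, epochs=500, lr=0.01):
    """Minimize 0.5*|w|^2 + C * mean(max(0, 1 - y*(w.x + b))) by
    (sub)gradient descent -- a rescaled form of objective (23)."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                       # points with nonzero slack
        grad_w = w - C * (y[viol] @ X[viol]) / n
        grad_b = -C * np.sum(y[viol]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(7)
y = rng.choice([-1.0, 1.0], size=200)
X = y[:, None] * np.array([2.0, 0.0]) + rng.standard_normal((200, 2))
w, b = svm_subgradient(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(accuracy)
```

This is a deliberately swapped-in training method: the thesis (and libraries such as LIBSVM [3]) solve the dual QP (25)-(26); the subgradient route is only meant to make the objective concrete.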
5.2.1 Nonlinear kernels
While linear in its original formulation, the SVM can be used as a nonlinear classifier by replacing the kernel in the optimization problem with a nonlinear kernel function. This allows the algorithm to fit a hyperplane which separates the data points in the transformed feature space, and which may be nonlinear in the original input space. An example of a commonly used kernel is the Gaussian radial basis function

k(x_j, x_k) = exp(−γ |x_j − x_k|²) (27)

with some constant γ ≥ 0. The corresponding feature space is a Hilbert space of infinite dimension, mapping each data point by ϕ(x_j), where ϕ is defined by k(x_j, x_k) = ϕ(x_j) · ϕ(x_k). The SVM does not need to calculate ϕ(x_j) to classify the data points, only the dot product w · ϕ(x) = Σ_i α_i y_i k(x_i, x).

Other examples of kernels are the polynomial kernel k(x_j, x_k) = (x_j · x_k + γ)^d and the hyperbolic tangent k(x_j, x_k) = tanh(x_j · x_k + γ).
5.3 Results for SVM on OMXS30
5.3.1 Results for linear SVM
The SVM classification method is tested in the same way as in previous sections. The sign of the normalized return of OMXS30 is taken as y_i to train the SVM, with the normalized returns from the 16 indices as the data points x. The SVM is trained using data from the past N days, and the predicted sign of the following day is used as the position when simulating a portfolio trading the OMXS30. The Sharpe ratio, with its p-value calculated by the Monte Carlo method, is used to evaluate the portfolios.
Since the soft margin method includes a penalty parameter, the cost of misclassification C, a parameter has to be decided in advance. The effectiveness of the SVM will depend on the selection of C, but interpreting the best choice of setting is difficult. For large values of C the optimization will choose a hyperplane with a smaller margin but few misclassifications; conversely, small values of C will cause the optimization to choose a larger-margin separating hyperplane even though it misclassifies more points. Selecting the parameter using the in-sample data could be done by a parameter sweep, testing an exponentially growing sequence of C, for example C ∈ {2⁻⁵, 2⁻³, ..., 2¹¹, 2¹³}. The result of each parameter value could then be evaluated by cross-validation, excluding some segments of the in-sample data during training and selecting the value with the highest accuracy on the excluded data points. However, this is very computationally expensive when testing a model that continually updates on past data.
To evaluate the impact of the cost parameter, the model is tested with values C ∈ {10⁻², 10⁻¹, 10⁰, 10¹, 10²}. The results of the SVM model using a linear kernel are shown in Table 6.

           N = 500           N = 1000          N = 2000
C = 0.01   0.0368 (0.4665)   0.7176 (0.0389)   1.0293 (0.0405)
C = 0.1    0.3001 (0.1914)   0.4251 (0.1428)   1.0158 (0.0439)
C = 1      0.7927 (0.0133)   0.5562 (0.0803)   1.2260 (0.0211)
C = 10     0.7793 (0.0146)   0.5420 (0.0853)   1.0354 (0.0442)
C = 100    0.7858 (0.0148)   0.5765 (0.0737)   1.0626 (0.0413)

Table 6: Sharpe ratio and the associated p-value on OMXS30 using linear SVM.
5.3.2 Results with radial kernel
The SVM with a radial kernel is also tested. This requires a second parameter, γ, and therefore increases the number of possible variations. The results for γ ∈ {10⁻², 10⁻¹, 10⁰, 10¹, 10²} and the different values of C are summarized in Table 7.
                      N = 500            N = 1000           N = 2000
C = 0.01, γ = 0.01    -0.0066 (0.5051)   0.2312 (0.2416)    -0.4875 (0.6969)
C = 0.01, γ = 0.1     0.0321 (0.4579)    0.2104 (0.2618)    -0.4634 (0.6877)
C = 0.01, γ = 1       0.0173 (0.4702)    0.1364 (0.3204)    -0.4707 (0.6199)
C = 0.01, γ = 10      0.0311 (0.4508)    0.1462 (0.3054)    -0.4888 (0.6358)
C = 0.01, γ = 100     0.0354 (0.4390)    0.3091 (0.1863)    -0.4802 (0.7591)
C = 0.1, γ = 0.01     0.0896 (0.3801)    0.3558 (0.1762)    0.6052 (0.1464)
C = 0.1, γ = 0.1      0.0421 (0.4401)    0.1359 (0.3551)    0.7505 (0.1277)
C = 0.1, γ = 1        0.0173 (0.4688)    0.1364 (0.3216)    -0.4707 (0.6230)
C = 0.1, γ = 10       0.0311 (0.4472)    0.1462 (0.3016)    -0.4888 (0.6341)
C = 0.1, γ = 100      0.0354 (0.4391)    0.3091 (0.1865)    -0.4802 (0.7537)
C = 1, γ = 0.01       -0.2360 (0.7741)   0.7173 (0.0400)    0.6957 (0.1438)
C = 1, γ = 0.1        0.2044 (0.2798)    0.4489 (0.1338)    0.4389 (0.2839)
C = 1, γ = 1          0.2312 (0.2393)    0.5361 (0.0699)    -0.3090 (0.5367)
C = 1, γ = 10         -0.0202 (0.5142)   0.1118 (0.3493)    -0.8368 (0.7834)
C = 1, γ = 100        0.0886 (0.3836)    0.2879 (0.2021)    -0.4802 (0.7573)
C = 10, γ = 0.01      0.1259 (0.3536)    0.6407 (0.0581)    0.4578 (0.2612)
C = 10, γ = 0.1       0.1865 (0.3005)    0.5607 (0.0793)    0.1870 (0.34016)
C = 10, γ = 1         -0.2055 (0.7299)   -0.2523 (0.7274)   -0.1666 (0.4506)
C = 10, γ = 10        0.0785 (0.3935)    0.1522 (0.3002)    -0.9093 (0.8338)
C = 10, γ = 100       0.0479 (0.4297)    0.2735 (0.2135)    -0.4771 (0.7589)
C = 100, γ = 0.01     0.2033 (0.2855)    0.4149 (0.1504)    -0.1155 (0.6888)
C = 100, γ = 0.1      -0.0627 (0.5814)   0.6739 (0.0468)    0.5415 (0.1820)
C = 100, γ = 1        -0.2283 (0.7556)   -0.0156 (0.4782)   -0.2906 (0.5159)
C = 100, γ = 10       0.0785 (0.3920)    0.1662 (0.2916)    -0.9153 (0.8340)
C = 100, γ = 100      0.0479 (0.4313)    0.2735 (0.2115)    -0.4771 (0.7538)
6 Discussion and Conclusion
The purpose of this thesis was to test the predictability of stock indices with regression models using intermarket factors. The first models used multiple linear regression to predict the daily return of the OMXS30 index with the returns from 16 different sector indices as covariates. To improve performance, the model was also tested with normalized covariates and then refined by model selection with AIC. The results were generally positive and to some degree support the hypothesis that past returns can give an indication of future returns. Some combinations of the model were able to generate statistically significant risk-adjusted returns when tested on historical data.

The multiple regression model was also used to test the predictability of the relative return between a sector index and the main index OMXS30. The results were generally positive and in some cases highly significant; however, practical use of the models would need to address important implementation issues.

A classification method using a support vector machine was also tested to predict the direction of the OMXS30 index. The results for linear classification were just as good as those for the linear regression model with AIC; however, nonlinear classification with a radial kernel failed to generate consistent results.
Appendix
Figures 5-7: Portfolios with n = 2.
Figures 8-10: Portfolios with n = 3.
Figures 11-13: Portfolios with n = 4.
References
[1] David Aronson. Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals. Wiley, 1 edition.
[2] Anna Bergfast. Automated trading using a dip searching strategy. Master’s thesis, Royal Institute of Technology, Stockholm, 2009.
[3] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. 2001, updated March 4, 2013. URL: http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf [Online; accessed 2015-04-30].
[4] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Technical report, AT&T Labs-Research, USA.
[5] Harald Lang. Elements of Regression Analysis. Royal Institute of Technology, Stockholm.
[6] Andrew W. Lo and A. Craig MacKinlay. A Non-Random Walk Down Wall Street (5th ed.). Princeton University Press, Princeton, 2002.
[7] Fredrik Hallgren. On Prediction and Filtering of Stock Index Returns. Master's thesis, Royal Institute of Technology, Stockholm, 2011.