
U.U.D.M. Project Report 2009:20

Degree project in mathematics, 30 credits

Supervisor and examiner: Johan Tysk

September 2009

Predicting Turning Points

Jón Árni Traustason


Abstract


Contents

1 Introduction
1.1 Defining a Turning Point
1.1.1 Adjustments in the Short Run
1.1.2 Adjustments in the Long Run
1.1.3 Definition Overview
1.2 Disposition
2 The Model
2.1 Markov Chains
2.2 Hidden Markov Model
2.3 Markov Bayesian Classifier
2.4 The Algorithm
2.5 The cost function
3 Estimation
3.1 Smoothing
3.2 Optimization Method
3.2.1 Optimality
3.3 Verifying the Assumptions
3.4 The Data
4 Estimation Results
4.1 In-Sample Adjustments and Results
4.1.1 Definition Comparison
4.1.2 Lead
4.1.3 Choosing the Sample
4.1.4 The Transition Probabilities and the Smoothing Parameters
4.1.5 Different Estimation Periods
4.1.6 The length of the in-sample
4.2 Out-Sample Results
4.2.1 Short Run
4.3 Buy or Sell
4.3.1 Different Cost Function
4.3.2 Available Information
5 Conclusion
6 Further Studies
6.1 Other Time Series
6.2 Behavior Over Time
A Additional Derivations
A.1 Bayes' Theorem
A.2 Adding Serial Correlation


List of Figures

1 Defined Historical Short Run Turning Points, STP
2 Defined Historical Long Run Turning Points, LTP
3 Hidden Markov Model
4 The Transition between States and the Information Series
5 Hidden Markov Model with lead
6 Different Definitions
7 Different leads for the short and long run
8 Different sample sizes for the short and long run
9 Out-Sample Definition Comparison
10 Out-Sample Turning Points
11 Out-Sample Probabilities
12 Out-Sample Turning Points
13 Out-Sample Turning Points
14 Strategies based on prediction vs. buy and hold
15 Turning Points in More Markets
16 Turning Points in More Markets
17 Out-Sample prediction in USA using Euro Zone plus macro sample
18 Out-Sample prediction in Sweden using other markets

List of Tables

1 Historical Bear and Bull markets of the S&P 500: 1990-2009
2 The Macroeconomic Variable Sample
3 Model fitted Bear and Bull markets of the S&P 500: 1990-2009
4 The Optimal Combinations for Different Sample Sizes
5 Optimization Parameter Analysis
6 Calculated Expected Time vs Historical Average Time
7 Predicted Bear and Bull markets of the S&P 500: 2002-2009


1 Introduction

It is the dream of every investor to be able to predict the stock market, to be able to answer the question: what will the stock market do? The only answer J.P. Morgan could give was: "It will fluctuate"¹. Although this is probably the only correct answer, it is not very helpful, and a more detailed answer is desired. The aim of this thesis is to determine the turning points of the stock market cycle using macroeconomic variables, where the S&P 500 stock index is assumed to represent the market. The exact value of the market will not be discussed in any detail; the focal point is to predict when it will turn. This will help investors time their strategies and could also be a useful guideline or benchmark for macroeconomic analysts.

Earlier literature about turning points focuses on the business cycle, both identifying and predicting them. The standard definition of a business cycle provided by Arthur F. Burns and Wesley C. Mitchell (1946, p. 3):

Business cycles are a type of fluctuation found in the aggregate economic activity of nations that organize their work mainly in business enterprises: a cycle consists of expansions occurring at about the same time in many economic activities, followed by similarly general recessions, contractions, and revivals which merge into the expansion phase of the next cycle.

frames the problem at hand. The fundamental element of this definition is that the business cycle can be divided into two distinct regimes, and the goal becomes to identify or predict the point of a regime shift, a turning point. To be able to predict a turning point in the stock market cycle the above also has to hold, i.e. the stock market cycle can be divided into distinct phases of different behavior. The stock market cycle, like the business cycle, is despite the name far from being cyclical; it is an irregular fluctuation, as one can clearly see in, e.g., Figure 1. Market fluctuation might therefore be a more transparent name, but stock market cycle is more common, so we will use that term throughout this paper. We would also expect investors to behave differently depending on how long we have been in a certain state. Maheu & McCurdy show, for example, that stock prices increase the most in the beginning of an expansion, a bull market, so being able to predict when it starts is crucial for everyone who wants to profit from stock trades. They also found, contrary to, at least, my prior belief, that the probability of leaving a state decreases with time. Despite this we will not take state duration into account in our calculations; instead we assume that the state of the economy affects the probability of changing states. The economy, on the other hand, changes with time, so time is implicitly taken into account.

In theory the stock price today reflects the expected future return of the company, so the correlation between stock prices and general economic activity is high. But, as stated, stock prices reflect future returns, so stock indexes are a leading indicator and stock prices should therefore reach their minimum before the business cycle does.

¹ See e.g.


The goal is to use macroeconomic variables to predict future stock prices; in fact, expected future macroeconomic variables should be used. Trying to predict future stock prices using available public information and past prices is also a violation of the semi-strong form of the efficient market hypothesis defined by Fama (1970). So why bother?

First of all, some macroeconomic variables today give a clue about the future. The yield curve gives us information about future interest rates, and unemployment, inflation and GDP give us a clue about future consumption and investment. Second, is the market really efficient? Many claim otherwise. The efficient market hypothesis has also been attacked by critics who blame the belief in rational markets for much of the recent bubbles and bursts. One can also note the obvious: if no one could make money from analysing the market, no one would bother. Last, we note that the aim is not to predict every step of the stock return, only when it will turn. With that in mind we are set to go, but to be able to predict a turning point a concrete definition is needed.

1.1 Defining a Turning Point

As the name suggests, a turning point is when the market turns, i.e. goes from moving upward to moving downward or vice versa. An upward moving market is commonly called a bull market and a downward period a bear market, but although bull and bear market are common words in an investor's dictionary, there is no generally accepted academic definition. The definition first and foremost needs to capture the trend of the market, upward or downward. It should however not follow every turn the market takes: it should ignore short-term decreases of stock prices during a bull market, corrections, and short-term increases of stock prices during a bear market, rallies.


LTP might be signaled before the actual turn.

Since a general definition is lacking, and more importantly since the model applied in this paper depends heavily on how the past states are categorized, a few alternative definitions or adjustments are considered. The different definitions will be investigated in more detail, both in how they fit the model and in how they capture the trends.

1.1.1 Adjustments in the Short Run

The default definition might be misleading since stock prices tend to have an increasing long-term trend; to avoid this, the returns are adjusted with respect to the historical average. Adjusted turning points, ATP, are defined when stock returns go from being lower than the historical average to being higher than it for two periods in a row. This way the economic values will only signal a change when they fluctuate more than usual. Here the historical average is assumed to be 0.6% a month, and this kind of adjustment will only be made for the short run turning points.

A different adjustment can be made by classifying the regimes by the market price of risk, or the Sharpe ratio. The market price of risk is the return in excess of the risk-free interest rate that the market demands as compensation for the risk taken, the reward-to-risk ratio of the market portfolio. The market price of risk is defined as

MPR = \frac{\mu - r_f}{\sigma},

where µ is the return during the time period, σ is the volatility and r_f is the risk-free interest rate. As for ATP, the market price of risk turning points, MTP, are defined when the MPR goes from being below its historical average to being higher and stays above the average for at least two periods. The idea here is to take both the return and the risk, the volatility, of the stock into account. The states would represent when it pays off for an investor to take a risk, buy stocks, and when it does not.

Lastly, the definitions of the turning points given above are compared to the turning points that follow every change of direction in the stock returns, or every direction turning points, EDTP. This obviously eliminates the risk of a hidden market, but on the other hand it violates the principle that rallies and corrections should not be defined as trends. It is, however, what investors would most like to be able to predict and will therefore be tested.

1.1.2 Adjustments in the Long Run

Volatility of the market is far from constant over time, so dividing the stock market into different states with respect to volatility is plausible. Some evidence points to volatility being higher during a downward trend, e.g. Maheu & McCurdy. In addition, expected stock returns and volatility are related through the capital asset pricing model (CAPM)², where higher volatility leads to lower expected return when the stock and the market are positively correlated; here the portfolio in hand is the market portfolio, so the correlation is obviously one. It is therefore concluded that defining upward and downward regimes in the stock market based on volatility is reasonable. A volatility turning point, VTP, is defined when the volatility goes from being higher to being lower than the historical average for two consecutive periods, or vice versa. Here the volatility is estimated by the standard deviation of daily returns within each month. Since volatility is more stable over time than the stock returns, the VTP are categorized as long run turning points.

² E[r_i] = r_f + \frac{Cov(r_i, r_m)}{Var(r_m)} (E[r_m] - r_f), where r_i is the return of stock i and r_m is the return of the market.

So far the turning points have all been defined strictly based on whether returns (or volatility) are higher or lower than they used to be, not on how much higher or lower. The third alternative is to define a bull market as a gain of 15% or more from a low point which was preceded by a 15% decline, and vice versa for a bear market. A degree turning point, DTP, is defined at the trough or the peak between the trends that are identified by the minimum 15% change.

1.1.3 Definition Overview

Given the definitions above we can now identify the turning points in the stock market; in Figures 1 and 2 the STP and LTP turning points are presented, respectively.

Figure 2: Defined Historical Long Run Turning Points, LTP

Given the nature of the definitions, the statistics of the regimes obviously differ, and in Table 1 a few statistics are given. In short, the statistics indicate that the average monthly returns are two to three percent, where the sign depends on the trend, and that the volatility is significantly higher in bear markets. This is interesting since no condition was set on the volatility in any definition except one, which strengthens the case for VTP. Despite the seeming relation between return and volatility, the VTP does not capture the trends as well as the other definitions. It should also be noted that DTP seems to capture bear markets best of the long run definitions considered, so setting it as the default long run definition is tempting.

Definition   Avg Bear   Vol Bear   Avg Bull   Vol Bull   AT Bear   AT Bull
STP          -2.61      18.67      2.02       13.57      5.57      10.06
ATP          -1.99      16.92      2.54       13.92      6.62      7.24
MTP          -1.53      15.86      2.81       14.51      7.81      6.50
EDTP         -3.73      16.23      3.19       14.70      1.54      2.30
AEDTP        -3.14      15.44      3.58       15.08      1.88      2.10
LTP          -2.13      21.40      1.48       12.72      12.80     32.60
VTP          -0.77      22.18      1.24       10.78      13.00     19.71
DTP          -2.98      23.07      1.48       12.82      13.00     43.75

Here the average returns (Avg) are calculated by taking the average of the monthly returns in the corresponding state, and the average time (AT) is given in months. The average volatility (Vol) is calculated by taking the average of the annualized within-month volatility of daily returns in each state.

Table 1: Historical Bear and Bull markets of the S&P500: 1990-2009

1.2 Disposition

section three the estimation method is introduced in detail and the optimality conditions are investigated. In Section 4 the empirical results are calculated, presented, and briefly discussed. The last section, Section 5, concludes the paper.

2 The Model

To be able to capture the different behavior in the stock market, the model has to take into account different regimes, where the values of the parameters change depending on which regime the market is in. In our model the stock market can only be in one of two regimes, bear or bull, and the regimes are recurrent but not periodic. The future states are unobserved and the goal is to predict when a switch between them will occur given the information today. There are two main methods to determine this switch of regime: using a specific threshold or a hidden Markov chain. When a specific threshold is applied, the regime change is assumed to happen when a certain value, the threshold, is reached. Although one can say that the behavior of the stock market changes when a top or bottom is reached, there is no way of defining that kind of threshold here. A general threshold suits better when modeling, e.g., inflation or exchange rates, where reaching a certain level will trigger government intervention. In the Markov chain approach the regimes are assumed to follow a particular stochastic process, namely a Markov chain. The stochastic evolution of the states is a closer fit to stock market behavior and therefore the focus is set on the Markov chain approach.

2.1 Markov Chains

A Markov chain is a stochastic process with the Markov property. The idea is that the state of the Markov chain determines which regime the economy is in at time t. In our case we have defined two regimes, and the Markov chain S_t is defined as

S_t = 0 if we are in a bear market (recession),
      1 if we are in a bull market (expansion).   (1)

Our goal is to estimate P(S_{t+l} = j | I_t), where I_t is the information we have at time t, and then determine from that whether a turning point will occur or not. The Markov property states that the future states of a Markov chain do not depend on all past states, only on a fixed number of states. The simplest form, a first order Markov chain, is one where the future state of the chain only depends on the present state, i.e. the Markov chain has no memory:

P(S_{t+1} = j | S_t = i, S_{t-1} = q, ...) = P(S_{t+1} = j | S_t = i).   (2)

In our calculations the time duration of a state is not taken into account, i.e. our Markov chain is of the simplest form, a first order Markov chain with two discrete states. The probability of going from state i to state j is given by p_{ij}, and the transition matrix is

P = [ p_{00}   1 - p_{00} ;  1 - p_{11}   p_{11} ].   (3)

From equation (3) we see that only two transition probabilities need to be estimated to get the transition matrix; the general rule is that z² - z probabilities have to be estimated, where z is the number of states in the Markov chain. It is therefore clear that the dimension of the problem increases significantly for every state added. As noted above, our Markov chain is recurrent, so we will always return to both states, and we are interested in forecasting which state we are in in the future, or more precisely when we will change states. The probability of being in state j after n months, given that we are in state i today, is given by p_{ij}^{(n)}. Now, since the chain is recurrent, and given that the natural assumption p_{00}, p_{11} > 0³ holds, the probability of being in the same state after n months can be calculated⁴:

p_{00}^{(n)} = \frac{1 - p_{11}}{2 - p_{00} - p_{11}} + \frac{1 - p_{00}}{2 - p_{00} - p_{11}} (p_{00} + p_{11} - 1)^n.   (4)

Like before we have the relation p_{01}^{(n)} = 1 - p_{00}^{(n)}, and a similar formula can be derived for p_{11}^{(n)}.

It is also interesting to take a quick look at the expected duration in the current state. We define the expected time of leaving state i as k_i = E[time to leave i], which in our case is very easy to derive:

k_i = 1 + p_{ii} k_i + p_{ij} k_j.   (5)

Now k_j = 0, since when we are in state j we have left i, and we have

k_0 = \frac{1}{1 - p_{00}} and k_1 = \frac{1}{1 - p_{11}}.   (6)

This is not, however, the whole story, since our Markov chain is hidden, but it will be interesting to compare, e.g., the real average time spent in a state with the expected time estimated using eq. (6).
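To make equations (4) and (6) concrete, a minimal Python sketch is given below. The transition probabilities used in the example are hypothetical placeholders, not estimates from this paper.

```python
# Two-state Markov chain: n-step same-state probability (eq. 4)
# and expected duration in a state (eq. 6).

def n_step_prob_00(p00: float, p11: float, n: int) -> float:
    """Probability of being in state 0 after n steps, starting from state 0."""
    stationary_0 = (1 - p11) / (2 - p00 - p11)   # long-run share of state 0
    decay = (p00 + p11 - 1) ** n                 # geometric memory term
    return stationary_0 + (1 - p00) / (2 - p00 - p11) * decay


def expected_duration(p_stay: float) -> float:
    """Expected number of periods spent in a state with self-transition probability p_stay."""
    return 1.0 / (1.0 - p_stay)


if __name__ == "__main__":
    p00, p11 = 0.85, 0.75    # hypothetical bear/bull persistence
    for n in (1, 3, 12):
        print(f"p00^({n}) = {n_step_prob_00(p00, p11, n):.4f}")
    print("expected months in bear:", expected_duration(p00))
    print("expected months in bull:", expected_duration(p11))
```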

2.2 Hidden Markov Model

In a Hidden Markov Model, HMM, a finite set of states that follows a Markov chain is assumed to explain the different behavior of a time series. The states are however not directly observable, only the dependent time series. In our case the unobserved states are given by eq. (1) and the observed series is X_t, which represents the information that we can observe at time t; the relationship can be expressed graphically as in Figure 3.

Figure 3: Hidden Markov Model

³ In fact it is enough that p

Each observation X_t is generated by a probability density function, b_{S_t}(X_t), that depends on the state S_t. Here the information X_t is a vector of time series given by

X_t = the observed time series = [X_{1t}, X_{2t}, ..., X_{kt}],   (7)

where the sample of choice in this paper is presented in Table 2. The connection between the information and the states is graphed in Figure 4.

Figure 4: The Transition between States and the Information Series

2.3 Markov Bayesian Classifier

We follow Koskinen and Öller (1999) in detail, but we apply the method to the stock market cycle instead of the business cycle. To be able to apply the model three assumptions have to hold: some macroeconomic variables lead the stock market, the stock market cycle can be modeled with a two state Markov chain, and the observed variables are normally distributed within the states.

Since the information today, X_t, is assumed to lead the state, S_t, the relation graphed in Figure 3 above changes. In Figure 5 the Hidden Markov Model is presented where X_t is assumed to lead S_t by l periods.

Figure 5: Hidden Markov Model with lead

The probability density function now depends on a future state S_{t+l}, and the normality assumption gives X_t ~ N(µ_{S_{t+l}}, σ²_{S_{t+l}}). The goal is, as stated above, to estimate the probability of the stock market cycle being in a certain state in the future given the present information, that is

P(S_{t+l} = j | X_t).   (8)

These probabilities are estimated with the recursive algorithm presented below, using the transition probabilities of S_t and the density function of X_t.

It is not enough to state some probability of being in a state; a decision has to be made. So a function, g(X_t), is defined that decides which state is predicted,

g(X_t) = j if P(S_{t+l} = j | X_t) ≥ 1/2,   (9)

and state j is predicted at time t + l if g(X_t) = j. This with the normality assumptions

2.4 The Algorithm

To estimate the probability we use a recursive algorithm⁵. In the beginning of every iteration we start with P(S_{t+l-1} | X_{t-1}) from the earlier calculations, and we take the following steps.

Step 1 - a natural probability estimate given the earlier probability estimate and the transition probabilities:

P(S_{t+l} = j | X_{t-1}) = \sum_{i=0}^{1} p_{ij} P(S_{t+l-1} = i | X_{t-1}).   (10)

Step 2 - Bayes' theorem⁶ ⁷:

P(S_{t+l} = j | X_t) = \frac{f(X_t | S_{t+l} = j) P(S_{t+l} = j | X_{t-1})}{\sum_{i=0}^{1} f(X_t | S_{t+l} = i) P(S_{t+l} = i | X_{t-1})},   (11)

where f is the density function of the multivariate normal distribution⁸.

One can now put these steps together and obtain one formula that estimates the probability at each time:

P(S_{t+l} = j | X_t) = \frac{(p_{0j} p_0(t+l-1) + p_{1j} p_1(t+l-1)) f(X_t | S_{t+l} = j)}{\sum_{i=0}^{1} (p_{0i} p_0(t+l-1) + p_{1i} p_1(t+l-1)) f(X_t | S_{t+l} = i)}.   (12)

To be able to use this algorithm we need µ_{S_t}, σ_{S_t} and the transition matrix P. We can calculate µ_{S_t} and σ_{S_t}, since we know which states we have been in so far. This emphasizes how important the definition of choice is, since it is needed to fit the algorithm. The transition matrix P is, however, the variable that we use to minimize the cost of being wrong; see the cost function below. To start the algorithm some starting value is needed; one simple approach is to just start with the neutral probability 1/2 for both states (Koskinen and Öller (1999)). One could also consider the steady state probabilities⁹

P(S_0 = 0) = \frac{1 - p_{11}}{2 - p_{00} - p_{11}}, \quad P(S_0 = 1) = \frac{1 - p_{00}}{2 - p_{00} - p_{11}}.   (13)

This can be added to the algorithm and estimated, or we can use the method of estimating transition probabilities from Andersson (2006),

\hat{p}_{00} = \frac{n_{00}}{n_{00} + n_{01}}, \quad \hat{p}_{11} = \frac{n_{11}}{n_{11} + n_{10}},   (14)

where n_{ij} is the number of transitions from state i to j. Neftçi (1984) states that when the time series is long the initial probability does not play a big part in the final result, so this complication might not pay off.

⁵ Hamilton (1994), p. 692.
⁶ Derived in Appendix A.
⁷ Note that the decision function, eq. (9), can now be expressed as P(S
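To illustrate the recursion (10)-(12) and the decision rule (9), a minimal Python sketch follows. The means, covariances and transition matrix below are placeholders rather than estimates from this paper, and the multivariate normal density is taken from SciPy.

```python
import numpy as np
from scipy.stats import multivariate_normal


def filter_state_probs(X, means, covs, P, p0=(0.5, 0.5)):
    """Recursive state probabilities P(S_{t+l} = j | X_t), eqs. (10)-(12).

    X     : (T, k) array of (smoothed) information variables
    means : dict {state: (k,) mean vector}; covs: dict {state: (k, k) covariance}
    P     : 2x2 transition matrix, P[i, j] = p_ij
    p0    : starting probabilities for the two states
    """
    T = X.shape[0]
    probs = np.zeros((T, 2))
    prev = np.asarray(p0, dtype=float)
    for t in range(T):
        # Step 1: propagate the previous probabilities through the transition matrix
        prior = prev @ P
        # Step 2: Bayes' theorem with the state-dependent normal densities
        like = np.array([multivariate_normal.pdf(X[t], means[j], covs[j]) for j in (0, 1)])
        post = prior * like
        probs[t] = post / post.sum()
        prev = probs[t]
    return probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(24, 2))                    # two dummy information series
    means = {0: np.array([-0.5, 0.0]), 1: np.array([0.5, 0.0])}
    covs = {0: np.eye(2), 1: np.eye(2)}
    P = np.array([[0.85, 0.15], [0.25, 0.75]])      # hypothetical p00, p11
    probs = filter_state_probs(X, means, covs, P)
    decisions = probs.argmax(axis=1)                # decision rule g(X_t), eq. (9)
    print(np.round(probs[:3], 3), decisions[:12])
```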

2.5 The cost function

The cost function C is used in the estimation and minimized with respect to the transition matrix in the recursive algorithm above. It is given by

C = ω · MSE + (1 - ω) · ECE,   (15)

where ω ∈ (0, 1) and its value depends on whether MSE or ECE is valued more. MSE is Brier's probability score,

MSE = \frac{1}{T} \sum_{t} e_t²,   (16)

with the error estimate e_t = P(S_{t+l} = j | X_t) - δ(S_{t+l}, j), where δ(S_{t+l}, j) is the Kronecker delta function and T is the length of X_t. ECE is the error count estimate,

ECE = \frac{1}{T} count(g(X_t) ≠ S_{t+l}).   (17)

MSE can be interpreted as how wrong the prediction is and ECE as how often it is wrong. So that the decision can be made with confidence, the emphasis will be put on MSE, i.e. MSE is valued more, so ω > 1/2. We will follow Koskinen and Öller and set ω = 2/3.
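A small sketch of the cost computation in eqs. (15)-(17), assuming the filtered probabilities and the realized states are available as arrays; the weight ω = 2/3 follows the text.

```python
import numpy as np


def cost(prob_bull, true_states, omega=2 / 3):
    """C = omega * MSE + (1 - omega) * ECE, eqs. (15)-(17).

    prob_bull   : array of P(S_{t+l} = 1 | X_t)
    true_states : array of realized states (0 = bear, 1 = bull)
    """
    prob_bull = np.asarray(prob_bull, dtype=float)
    true_states = np.asarray(true_states)
    mse = np.mean((prob_bull - true_states) ** 2)     # Brier score: how wrong we are
    predictions = (prob_bull >= 0.5).astype(int)      # decision rule g(X_t)
    ece = np.mean(predictions != true_states)         # error count: how often we are wrong
    return omega * mse + (1 - omega) * ece


if __name__ == "__main__":
    p = [0.9, 0.8, 0.3, 0.6, 0.2]   # dummy filtered bull probabilities
    s = [1, 1, 0, 0, 0]             # dummy realized states
    print(round(cost(p, s), 4))
```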

Now our problem can be formulated as:

minimize C(P_trans)
subject to 0 < p_ii < 1, for i = 0, 1.   (18)

Here P_trans is the transition matrix (3) that determines the probabilities entering (16). The upper constraint is strict since we assume that the Markov chain is recurrent, and the natural assumption that the transition probability is strictly larger than 0 is made since otherwise the market would jump out of the state after each time period with probability 1. The problem looks simple at first glance, but the estimation becomes somewhat cumbersome because of the complexity of the cost function.

Our objective function is obviously not continuous, since at some points the ECE part causes a jump of size n/T, where n is the number of periods where the prediction changes and T is the number of periods. However, this does not apply for all predictions, since at some points the decision will not change even though the decision probabilities change slightly, i.e. the cost function is continuous around these points.

Definition 2. A prediction is said to be stable if the Error Count Estimator is continuous in a neighborhood of it. Let v ∈ ℝⁿ be the variables that are taken

In other words, a prediction is said to be stable if a small change in the input variables does not change it. In our case the prediction is the decision vector g(X_t) and the input variables are the transition matrix P_trans. Our prediction is stable as long as the probability vector does not have any values close to 1/2, which is desirable, since a 50/50 prediction is not really reliable.

3 Estimation

After the model has been defined it has to be estimated. In this section the estimation method is discussed, as well as which macroeconomic variables will be used as predictors.

3.1 Smoothing

Before the model is applied some filter should be considered. Smoothing the data should decrease the risk of a false alarm, since high noise will be smoothed out. Applying a filter should also help to shift the time series into the right phase; this is essential since the macroeconomic variables may have different leads on the states of the stock market. There is also a drawback of smoothing the data: the smoothing shortens the distance between the observations, which reduces the probability of a successful detection and the probability of detecting it in time. We will therefore apply the model both with and without smoothing. The smoothing filter that is applied is an Exponentially Weighted Moving Average, EWMA, and our smoothed series is defined as

\hat{X}_t = λ .* X_t + (1 - λ) .* \hat{X}_{t-1},   (19)

where .* is element-wise multiplication and λ is the weight vector of the variables. Now the smoothing parameters λ are added to our optimization variables and the problem (18) becomes:

minimize C(P_trans, λ)
subject to 0 < p_ii < 1, for i = 0, 1
           0 < λ_j < 1, ∀ λ_j ∈ λ.   (20)

Now let us define the feasible domain D as the domain where all constraints are fulfilled; the problem becomes

min_{(P, λ) ∈ D} C(P, λ).   (21)
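The element-wise EWMA filter of eq. (19) can be sketched as follows; the weight vector in the example is purely illustrative.

```python
import numpy as np


def ewma_smooth(X, lam):
    """Element-wise EWMA, eq. (19): X_hat_t = lam .* X_t + (1 - lam) .* X_hat_{t-1}.

    X   : (T, k) array of raw information variables
    lam : (k,) vector of smoothing weights, one per variable, each in (0, 1)
    """
    X = np.asarray(X, dtype=float)
    lam = np.asarray(lam, dtype=float)
    X_hat = np.empty_like(X)
    X_hat[0] = X[0]                      # initialize with the first observation
    for t in range(1, X.shape[0]):
        X_hat[t] = lam * X[t] + (1.0 - lam) * X_hat[t - 1]
    return X_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    raw = rng.normal(size=(12, 3))
    smooth = ewma_smooth(raw, lam=[0.05, 0.2, 0.6])   # hypothetical weights per series
    print(np.round(smooth[-1], 3))
```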

3.2 Optimization Method

The most straightforward method to apply when estimating the solution is a simple grid search, which is easy to apply when k is low, where k is the number of macroeconomic variables. The grid search is applied in two stages; in the first stage the smoothing parameters, λ, and the transition probabilities, P, are estimated using a 0.1 grid, i.e. λ_j = p_ii = 0.1, 0.2, ..., 1 for j = 1, 2, ..., k+1. The number of calculations that have to be made in the first stage of the grid search is given by 10 · 10^(k+1). Most of the more sophisticated techniques, such as Newton's or quasi-Newton methods, rely on the derivative of the objective function, but in our case the objective function, eq. (15), is not differentiable. Therefore a more advanced search algorithm, one that remains efficient as the dimension of the problem increases, is needed.
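For completeness, the first-stage 0.1 grid search described above can be sketched as below. The callable cost_fn is assumed to wrap the whole filtering and cost computation of Sections 2.4-2.5; the toy cost used in the example only demonstrates the mechanics.

```python
import itertools
import numpy as np


def grid_search(cost_fn, k, step=0.1):
    """Brute-force first-stage grid over (p00, p11, lambda_1, ..., lambda_k).

    cost_fn : callable(p00, p11, lambdas) -> cost C
    k       : number of smoothing parameters (one per information series)
    """
    grid = np.arange(step, 1.0 + 1e-9, step)      # 0.1, 0.2, ..., 1.0
    best = (np.inf, None)
    for p00, p11 in itertools.product(grid, grid):
        for lambdas in itertools.product(grid, repeat=k):
            c = cost_fn(p00, p11, np.array(lambdas))
            if c < best[0]:
                best = (c, (p00, p11, lambdas))
    return best


if __name__ == "__main__":
    # Toy stand-in for the real cost, minimized at p00=0.8, p11=0.9, lambda=(0.2, 0.5)
    def toy_cost(p00, p11, lam):
        return (p00 - 0.8) ** 2 + (p11 - 0.9) ** 2 + np.sum((lam - np.array([0.2, 0.5])) ** 2)

    print(grid_search(toy_cost, k=2))
```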

The algorithm of choice is DIRECT, a Lipschitzian optimization algorithm¹⁰. In standard Lipschitzian optimization the objective function f is assumed to be Lipschitz continuous on the feasible domain D', that is, there exists a known constant C such that

|f(x) - f(x')| ≤ C |x - x'|, ∀ x, x' ∈ D'.   (22)

Thus there exists a constant C that is an upper bound on the rate of change of the function, and a lower bound on f can be estimated in any closed hyper-rectangle once the vertex points have been evaluated. An algorithm, such as Shubert's algorithm, then uses this lower bound to choose between hyper-rectangles, i.e. where to search, until the optimal point is obtained. In DIRECT, however, the Lipschitz constant is not needed. The advantages of not needing the Lipschitz constant are that it can be hard to estimate, which it is for eq. (15), and that the Lipschitz constant is often fairly large, since it is a bound on the rate of change of the function. The problems that follow from a cumbersome estimation of it are obvious, and when it is large it will lead optimization algorithms based on it to overemphasize global search, which leads to slow convergence. As mentioned above, the Lipschitz constant is not used in the classical sense in DIRECT, but can be viewed as a weighting parameter between global versus local search. That is, instead of calculating the Lipschitz constant, all possible constants between zero and infinity are applied to select the set of potentially optimal intervals. This addresses two of the three problems of traditional Lipschitz optimization, namely slow convergence and specifying the Lipschitz constant. The third problem is complexity in higher dimensions. Conventional algorithms evaluate the vertices of each hyper-rectangle, which leads to 2ⁿ evaluations, where n is the number of dimensions. The algorithm at hand however uses the midpoint of the space, which is one evaluation no matter the dimension of the problem. Like other Lipschitzian algorithms, the DIRECT algorithm is also deterministic, which makes multiple runs unnecessary, unlike genetic optimization where the optimal solution obtained is in some sense stochastic.

The selection of a potentially optimal interval does, as noted, depend on the Lipschitz constant, where a hyper-rectangle j is said to be potentially optimal if there exists some Ĉ > 0 such that

f(c_j) - Ĉ d_j ≤ f(c_i) - Ĉ d_i, for all i = 1, ..., m,   (23)
f(c_j) - Ĉ d_j ≤ f_min - ε |f_min|,   (24)

where c_i is the midpoint of the i-th hyper-rectangle, d_i the distance from the midpoint to the vertices, and ε is a lower bound for how much the checked solution needs to exceed the current best solution, f_min. ε is the only parameter that has to be determined beforehand, and evidence points out that DIRECT is fairly insensitive to its setting; here the default value 10⁻⁴ is used. For further details, such as graphical explanations, performance comparisons, convergence, and how to divide the hyper-rectangles, see reference [8].

To apply DIRECT the constraints of the problem need to be fairly simple, and in our case they are indeed so; the only thing we assume about the optimization variables is that they are in the domain D, i.e. inside the unit hypercube. A sufficient condition for convergence of the algorithm is that the function is continuous, or at least continuous in the neighborhood of the optimal solution. However, when the prediction is stable, the function is continuous in the neighborhood of the input variables; that is, when our global optimum leads to a stable prediction the algorithm will converge to it.
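As a hedged illustration, the minimization could be run with an off-the-shelf DIRECT implementation such as the one shipped in recent versions of SciPy (scipy.optimize.direct). This is an assumption about available tooling, not the implementation used in this paper, and cost_fn is again assumed to wrap the full cost computation.

```python
import numpy as np
from scipy.optimize import direct  # assumes a SciPy version that provides DIRECT


def optimize_cost(cost_fn, k, eps=1e-4, maxfun=2000):
    """Minimize C(p00, p11, lambda_1..lambda_k) over the unit hypercube with DIRECT.

    cost_fn : callable(p00, p11, lambdas) -> cost C
    k       : number of smoothing parameters
    """
    def objective(v):
        return cost_fn(v[0], v[1], v[2:])

    # p00, p11 and the smoothing weights are kept strictly inside (0, 1)
    bounds = [(1e-6, 1 - 1e-6)] * (2 + k)
    result = direct(objective, bounds, eps=eps, maxfun=maxfun)
    return result.x, result.fun


if __name__ == "__main__":
    def toy_cost(p00, p11, lam):
        return (p00 - 0.8) ** 2 + (p11 - 0.9) ** 2 + np.sum((np.asarray(lam) - 0.3) ** 2)

    x, fun = optimize_cost(toy_cost, k=3)
    print(np.round(x, 3), round(fun, 6))
```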

3.2.1 Optimality

A little further investigation of the optimality conditions is needed. The Lagrangian of the problem is defined by

L(P, λ, ν, γ) = C(P, λ) + \sum_{i=0}^{1} [(ν_{2i} - ν_{2i+1}) p_i - ν_{2i}] + \sum_{j=0}^{k} [(γ_{2j+1} - γ_{2j+2}) λ_{j+1} - γ_{2j+1}],   (25)

and the dual function is defined as the minimum of the Lagrangian,

d(ν, γ) = inf_{(P, λ) ∈ D} L(P, λ, ν, γ).   (26)

The Lagrangian dual problem becomes:

maximize d(ν, γ)
subject to ν, γ ≥ 0.   (27)

One can see that an optimal solution to the Lagrangian dual problem, d*, is always less than or equal to the optimum of the initial problem, c*, that is d* ≤ c*; this is called weak duality. When d* = c* we say that strong duality holds and the optimal duality gap is zero. When all constraints are inactive, the so-called Slater's condition holds and the point is strictly feasible. In our case we assume that the transition probabilities are always strictly feasible, but the smoothing parameters can theoretically be equal to 1, if the future state only depends on the present value of the variable and not on its past values. The optimal solution of the dual problem, d(ν*, γ*), is in our case obviously obtained when \sum_{i=0}^{1} [(ν_{2i} - ν_{2i+1}) p_i - ν_{2i}] + \sum_{j=0}^{k} [(γ_{2j+1} - γ_{2j+2}) λ_{j+1} - γ_{2j+1}] = 0. Thus d* = c*, strong duality holds, and the complementary slackness conditions give:

0 < p*_i < 1 ⟹ ν*_{2i}, ν*_{2i+1} = 0,   (28)
0 < λ*_j < 1 ⟹ γ*_{2j}, γ*_{2j+1} = 0,   (29)

or,

The problem is that the stationarity condition of the KKT optimality conditions does not hold when some λ_j = 1; thus the necessary optimality conditions are not satisfied,

∇C(P, λ) + \sum_{i=0}^{1} (ν_{2i} - ν_{2i+1}) + \sum_{j=0}^{k} (γ_{2j+1} - γ_{2j+2}) ≠ 0,   (31)

since γ*_{2j} ∈ ℝ₊. It might be added that the strange-looking cost function is still a bit of a problem, since it is not possible to calculate its derivative rigorously (it can only be estimated numerically). However, if Proposition 1 holds, then the necessary conditions for ∇C(P, λ) = 0 are fulfilled.

The conclusion is that in order for our solution to converge to the optimal solution we must have a stable prediction and none of the smoothing parameters can equal 1. Both cases are highly unlikely, but neither of them is impossible. Note also that if the optimal value of some smoothing constant equals one, no smoothing should be applied to the corresponding variable.

3.3 Verifying the Assumptions

To apply our model we make three assumptions: that the macroeconomic variables lead the future state of the stock market, that the stock market behavior can be categorized into two regimes, and that within these states the information series is normally distributed.

There are in fact no tests that we can apply to verify that the macroeconomic variables do lead the state of the stock market cycle; the variables included in our sample are simply picked with economic intuition and partly based on the results from S-S. Chen.

The next two assumptions go hand in hand, since the distribution within the states depends on how many states we assume and how the states are defined. In our case several definitions dividing the market into two states have been proposed and a multivariate normal distribution assumed¹¹. But to check for normality the smoothing parameters have to be determined, so the normality test cannot be done until after the calculations have been carried out. The calculations, on the other hand, depend on the distribution, so for the moment normality is simply assumed to hold¹².

3.4 The Data

Before we apply the model, a closer look has to be taken at the data. We have to choose which macroeconomic variables to consider and which of them to use.

¹¹ The Student's t distribution was also tried, simply by comparing its cost with that of the normal distribution when fitting the model to the whole period, with poor results.

¹² After the estimation had been done and the smoothing coefficients determined, a Henze

First of all, the sample of possible macroeconomic variables has to be chosen. There is a lot of literature that looks into the connection between the stock market and macroeconomic variables. A simple solution would be to consider the same macroeconomic variables as Boström (2009) or Shiu-Sheng Chen (2008) and then check the forecasting performance of the variables. We do, however, not want our sample to be extremely large and therefore try to pick our sample carefully; the sample that is considered is presented in Table 2.

The Macroeconomic Variable Sample
Stock Returns                                   X1
The Purchasing Managers' Index Change, PMI      X2
Total Production Growth                         X3
Yield Curve (5y - 1y)                           X4
Vehicle Sales Growth                            X5
Housing Starts Growth                           X6
Inflation (from PPI)                            X7
The VIX Index Change¹³                          X8

Table 2: The Macroeconomic Variable Sample.

The closing value in each month is used in all cases, and growth rates are calculated for all of the variables except the yield curve (since its value can be negative). The change is calculated by taking the first difference of the natural logarithm of the series, i.e. Z_t = ln(Y_t) - ln(Y_{t-1}). The PMI and total production are seasonally adjusted, and it should be added that dividends are excluded from the stock prices.
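A small sketch of this transformation, assuming the monthly closing values sit in a pandas DataFrame; the column names are illustrative, and keeping the yield curve untransformed is an assumption, since the text only states that no growth rate is computed for it.

```python
import numpy as np
import pandas as pd


def to_model_inputs(levels: pd.DataFrame) -> pd.DataFrame:
    """Monthly log-changes Z_t = ln(Y_t) - ln(Y_t-1) for all series except the
    yield curve, which can be negative and is therefore left untransformed."""
    out = pd.DataFrame(index=levels.index)
    for col in levels.columns:
        if col == "yield_curve":
            out[col] = levels[col]            # assumption: kept in levels
        else:
            out[col] = np.log(levels[col]).diff()
    return out.dropna()


if __name__ == "__main__":
    demo = pd.DataFrame({"sp500": [330, 332, 340, 338, 350, 355],
                         "yield_curve": [0.8, 0.7, 0.9, 1.0, 0.6, 0.5]})
    print(to_model_inputs(demo).round(4))
```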

The data sample is not picked randomly; economic reasoning lies behind the sample of choice. The PMI reflects the expectations of companies, hence their expected revenues, which affect stock prices. Total production growth obviously affects the economy as a whole, and thus the stock market, as it reflects the overall demand side. The yield curve is assumed to indicate the monetary policy stance, which gives a clue about future or expected inflation. The vehicle sales and housing starts growth represent households and their financial expectations. Inflation calculated from the PPI does in some sense lead CPI inflation and carries information about the expected revenues of both companies and households, through costs and wages respectively. Finally, the VIX index change gives information about the volatility, i.e. the risk, of the market.

After choosing the data, some transformation is usually needed to obtain stationarity, but here two states of different behavior of the time series are assumed, i.e. nonstationarity is assumed. The two states also address the problem of heteroskedasticity, at least to some degree, since we have different volatility in the states.

Autocorrelation is a common problem when dealing with financial time series. One solution would be to address this directly in the model, that is, assume that the stock market cycle follows a switching AR(1) model with noise term ε_{S_{t+l}} ~ N(0, σ_{S_{t+l}}). This would complicate our recursive algorithm, because the past probability would have to be taken into account and the density function would now depend on two states¹⁴. The main problem is however to estimate φ, because φ is a vector, since the autocorrelation coefficient could differ between the time series, i.e. if multivariate autocorrelation is assumed. If we instead assume a vector autoregressive model, VAR, φ is a matrix and the estimation becomes even more cumbersome. On top of these complications, Ivanov (2000) states that the problem of autocorrelation is mostly captured by the probability of staying in the same state. The adjustment for serial correlation is therefore considered unnecessary and left for further studies.

4 Estimation Results

When testing our model the smoothing coefficients and the transition probabilities are estimated in an in-sample period. Then, holding the coefficients constant, the model is used to predict the states in an out-sample period. An expanding out-sample window is applied, starting from January 2002, and the expanding step is one month, up to July 2009. Note that although the return for the last month can be observed, the state cannot be determined with full certainty, since a turning point has occurred if and only if the return has changed for two consecutive months. The latest information that is included in the in-sample is therefore X_{t-l-1}. Before the model is put to use, the lead, the size of the sample and the combination of variables have to be decided. To make these decisions the model is applied to the whole period and the results are compared; in all cases the decision is thus made by a simple trial and error method.
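Schematically, the out-of-sample exercise can be sketched as below; fit and predict_state stand for the estimation and prediction steps described above and are assumed helper functions, not code from this paper.

```python
def expanding_out_sample(X, states, first_out, lead, fit, predict_state):
    """Out-of-sample prediction with coefficients held fixed, as described above.

    X             : sequence of monthly information vectors (possibly smoothed)
    states        : historically defined states (0/1) used for the in-sample fit
    first_out     : index of the first out-of-sample month (January 2002 in the paper)
    lead          : assumed lead l of the information series
    fit           : callable(X_in, states_in) -> fitted parameters   (assumed helper)
    predict_state : callable(params, x) -> predicted state           (assumed helper)
    """
    # the latest information included in the in-sample is X_{t-l-1} for t = first_out
    params = fit(X[: first_out - lead], states[: first_out - lead])
    predictions = {}
    for t in range(first_out, len(X)):
        predictions[t] = predict_state(params, X[t - lead])   # X_{t-l} calls the state at t
    return predictions


if __name__ == "__main__":
    X = list(range(30))                  # dummy information series
    S = [0] * 15 + [1] * 15              # dummy historical states
    fit = lambda X_in, S_in: sum(S_in) / len(S_in)   # "parameter" = share of bull months
    predict = lambda p, x: int(p >= 0.5)             # trivial decision rule
    print(expanding_out_sample(X, S, first_out=24, lead=1, fit=fit, predict_state=predict))
```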

4.1 In-Sample Adjustments and Results

We start off with a comparison of the different definitions from the introduction. Then the short and the long run definitions, STP and LTP, are analyzed in more detail, and the other definitions are assumed to have similar characteristics depending on which category they are in, short or long run.

4.1.1 Definition Comparison

The results for the fit of the different definitions to the model are presented in Figure 6, where the fit of the non-smoothed STP and LTP is also presented, denoted NSTP and NLTP respectively.


Figure 6: Different Definitions

We clearly see that when trying to capture every direction change, EDTP, the model does not give as good a fit as when the focus is set on the suggested short run definitions. It can also be concluded from Figure 6 that the definition of turning points based on the MPR does not work, and that smoothing is necessary to get decent results. The STP and ATP, on the other hand, give a good fit of the model, and the fit of those two is almost identical. In the long run DTP gives the best fit, which together with the historical statistics in Table 1 hints that it should be used for the long run.¹⁵ Although the cost does give us a good idea of how well the model fits a definition, the definitions are different, which skews the comparison. One should therefore also take a brief look at the statistics obtained by fitting the model; see Table 3, which corresponds to the historical statistics given in Table 1.

¹⁵ Note that taking a different period might lead to different results. A couple of other periods were tested and gave similar results; the cost however did vary, which is not a surprise since the longer the period is the harder it is to fit the model, which can be explained by the relation min_x(f_1(x) + f_2(x)) ≥ min_{x_1} f_1(x_1) + min_{x_2} f_2(x_2). Thus one would expect the


Definition   Avg Bear   Vol Bear   Avg Bull   Vol Bull   AT Bear   AT Bull
STP          -1.20      20.02      1.14       13.32      6.80      17.88
ATP          -1.23      18.04      1.79       13.11      11.33     15.88
MTP          -0.05      16.27      1.31       14.31      7.33      8.50
EDTP         -2.51      18.53      1.42       14.24      1.27      3.82
AEDTP        -1.33      16.12      1.84       14.56      2.02      2.56
NSTP         -0.57      22.46      0.72       13.36      2.58      10.00
LTP          -1.57      22.61      1.27       12.17      10.83     32.40
VTP          -0.47      22.41      1.02       10.79      12.71     23.33
DTP          -2.58      22.91      1.41       12.75      10.80     43.25
NLTP         -0.95      22.26      0.94       12.73      5.27      16.9

Table 3: Model fitted Bear and Bull markets of the S&P 500: 1990-2009

This strengthens our decision to disregard further investigation of the MTP, since it does not capture the bear markets at all according to the average monthly returns. The VTP and the NSTP also have a hard time capturing the bear markets, and due to the bad fit of the NSTP further investigation of it is disregarded with fair confidence. The general trend through the definitions is that the average returns are between one and two percent, where the sign depends on the state, and that the monthly volatility is in the interval 5-5.5% in the bear market against 3-3.5% in the bull market. Note that, as before, in none of the definitions except VTP is a condition put on the volatility, which suggests that volatility is higher in a downward trend.

4.1.2 Lead

To be able to predict turning points the information obviously has to give a hint about the future behavior of the stock returns. To determine how long this lead is, we test our model using the whole sample of variables and the whole period, defining the turning points for different leads. We expect the model to give better results when more information is taken into account, so the model should fit better when the lead is shorter. The costs for leads up to one year are presented for STP and LTP in Figure 7.

(a) Short Run, STP (b) Long Run, DTP

Figure 7: Different leads for the short and long run.


the cost is not minimized when no lead is assumed but when a one month lead is assumed, which suggests that the assumption of a leading information series holds. The cost is also fairly low up to a three month lead, which gives some freedom when making a prediction, where a couple of possible leads can be tested. After three months the cost jumps up, and again after eight, so one can see a stepwise increasing trend. The lead in the short run will be assumed to be one month throughout the paper if not stated otherwise. In the long run, Figure 7(b), the trend is not as obvious, and perhaps the only thing one can read from the results is that the cost is minimized when the lead is assumed to be three months, which will be the default lead for the long run hereafter.

4.1.3 Choosing the Sample

Next the optimal sample is found when the model is fitted to the whole period. Here we investigate both how the fit changes with the size of the sample used and which variables are in the optimal samples. From above we see that the cost is minimized when the lead is assumed to be one month in the short run case and one quarter in the long run; we therefore assume those leads in the estimations below.

(a) Short Run, STP (b) Long Run, DTP

Figure 8: Different sample sizes for the short and long run.

Size   Short Run, STP                     Long Run, LTP
1      X6                                 X7
2      X6 and X8                          X1 and X3
3      X4, X5 and X6                      X1, X3 and X7
4      X1, X5, X6 and X7                  X1, X2, X3 and X7
5      X1, X5, X6, X7 and X8              X1, X2, X3, X6 and X7
6      X1, X2, X3, X6, X7 and X8          X1, X2, X3, X6, X7 and X8
7      X1, X2, X3, X5, X6, X7 and X8      X1, X2, X3, X5, X6, X7 and X8

Table 4: The Optimal Combinations for Different Sample Sizes


when a variable is added to a sample of 5 or more variables¹⁶. The results for the short run optimal combinations are rather puzzling and not in line with our expectations. There the housing starts, X6, are an obvious winner, included in all samples, while the yield curve, X4, is excluded in all but the seven variable sample. This contradicts S-S Chen, where the yield curve was found to give the best predicting power of a sample of variables when examined individually. The long run results are in some ways similar to those in the short run, where the general trend is a decreasing cost with a larger sample. The cost also seems to decrease only slightly with an extra variable when the sample consists of 5 or more variables, and it is again minimized with a sample of seven variables where the yield curve is excluded. An optimal seven variable sample, opt7, where the yield curve is excluded, will therefore also be tested in both cases. These results hint that a five variable sample could be sufficient. Keep in mind, though, that these optimal samples change over time, and therefore both the optimal sample and the whole sample will be applied in the out-sample. The optimal combinations are, however, extremely fragile with respect to which period the model is fitted to. The explanation of why the yield curve is excluded could be a change in its smoothing coefficient or a regime shift in monetary policy over the period. A simple test was done for the short run by taking the periods 1990 to 1999 and 2000 to 2009 separately; in both cases the yield curve proved to be a strong candidate, being included in most of the optimal combinations. The cost also decreased continuously for every added variable, so the predictions presented will be given using the whole sample.

4.1.4 The Transition Probabilities and the Smoothing Parameters

Here we show the smoothing parameters when they are estimated using the whole period as the estimation sample. The problem is that the estimated values change depending on the period used for estimation, so the analysis below is not final. As concluded above, the cost is not minimized when all variables are taken into the prediction but when the yield curve is excluded, so an analysis of the coefficients for the optimal sample is also presented.

Parameter   STP      STP, opt7   DTP      DTP, opt7
p00         0.7469   0.7780      0.9815   0.9787
p11         0.8512   0.9799      0.7346   0.9252
λ1          0.0432   0.1703      0.1626   0.2325
λ2          0.1488   0.1237      0.2654   0.1612
λ3          0.0556   0.0556      0.0556   0.0501
λ4          0.1118   ***         0.0556   ***
λ5          0.6399   0.0514      0.6605   0.118
λ6          0.1749   0.1644      0.1667   0.1680
λ7          0.0501   0.0432      0.0514   0.0556
λ8          0.1584   0.1310      0.0556   0.5562

Table 5: Optimization Parameter Analysis


From the smoothing parameters one can conclude how long-lasting a change has to be to affect the state, where a lower smoothing parameter indicates that only a long-lasting trend change in the time series affects the state; i.e. if the coefficient is 0.0432, only a 4.32% weight is put on the current value. To put things in perspective it should be noted that the EWMA smoothing coefficient corresponds to an n-step simple moving average, where the relationship between the length n and the smoothing constant a is a = 2/(n+1); so the smaller a is, the longer the period taken into account in the smoothing process (a coefficient of 0.0432, for example, corresponds to n = 2/a - 1 ≈ 45 months). The long run results are therefore more in line with our expectations, since the stock market should take in information rather quickly; the results however do suggest that our model needs some smoothing to be considered. From the transition probabilities the expected time in each state can now be calculated using equation (6).

Time                          STP    STP, opt7   DTP    DTP, opt7
Model expected time in Bear   4.0    4.5         54.1   46.9
Model expected time in Bull   6.7    49.8        3.8    13.3
Real average time in Bear     5.6    ***         13.0   ***
Real average time in Bull     10.1   ***         43.8   ***

Table 6: Calculated Expected Time vs Historical Average Time

As mentioned, these calculated expected times do not give a clear picture, since we are not in a general Markov chain framework but in an HMM. That is indeed the case here: the only estimated time that gives some evidence is the estimated time for the STP.

4.1.5 Different Estimation Periods

In the above examples the whole period has been taken into account. One could argue that taking the initial in-sample (up to 2002) would be a more coherent example, but since the in-sample will expand and the question can be raised which in-sample should be taken, we use the whole period. A similar exercise was however carried out for the initial in-sample, from 1990 to 2002. It gave almost identical solutions in the definition comparison, only with lower cost (see footnote). The lead comparison did however not give any concrete results, where the cost started to increase as above but then decreased again. Finally, the sample comparison was in line with the results above, i.e. the cost decreased with the size of the sample. The optimal combination for each size did however change, which implies that the predicting power of the macroeconomic variables is unstable.

4.1.6 The length of the in-sample

the best prediction is obtained when as much information as possible is taken into the estimation, i.e. when the in-sample is always taken from the starting date, January 1990. There is, however, no clear trend in the cost reduction obtained with a larger in-sample, and that might be explained by the fact that the business cycle is cyclical.

4.2 Out-Sample Results

4.2.1 Short Run

We start off as in the in-sample analysis by comparing the definitions, but now we have excluded some of them and added other variations of lead and samples found in the in-sample testing; see Figure 9.

Figure 9: Out-Sample Definition Comparison

As before, the cost alone is not considered sufficient, and the statistical table is revisited, but now for the predicted states over the out-sample period.

Definition   Avg Bear   Vol Bear   Avg Bull   Vol Bull   AT Bear   AT Bull
STP          -1.18      24.25      0.80       12.35      4.88      5.78
ATP          -0.48      21.87      0.45       12.30      3.78      3.23
LTP          -0.055     25.87      -0.047     14.26      2.78      7.33
DTP          -2.13      29.12      0.54       14.16      2.5       8.88
VTP          -0.35      22.65      0.23       12.59      4.40      5.22

Table 7: Predicted Bear and Bull markets of the S&P 500: 2002-2009

Figure 10: Out-Sample Turning Points

Figure 11: Out-Sample Probabilities

The model signals clusters of turning points in the beginning of the bull market in 2003 and in the middle of the bull market throughout 2005. The cluster that is signaled in the W-shaped recovery in 2003 might be caused by uncertainty in the economy that is reflected in the model. The uncertainty in the economy was caused by a few things, e.g. the aftermath of the Enron case, which ended with the Sarbanes-Oxley Act¹⁷ in July 2002 and affected the financial statements over the following periods and thus the expected revenues; the invasion of Iraq, which led to increased uncertainty in the US economy and oil prices; and the aftershock of the IT bubble in the labor market, where the volatility in monthly employment was exceptionally high around 2003. Finally, one can look at the expansionary monetary policy in the US, where the interest rate was set at a historically low level for a long period. In 2005 we have a hidden bull market, where the market is in a slow uptrend while the definition is stuck in a bear market. This might explain the problem that the model has, where the model signals a bull market in 7 out of 10 months of the hidden bull market. If one would count these signals as right ones, as they surely are, the ECE would drop from 0.4066 to 0.3297. This struggle could also indicate that a third state should be considered, since the market moves almost sideways throughout the cluster.

Apart from these struggles the prediction is good. It does get the right signal after the clusters and, more importantly, it signals a bear market at the beginning of the financial crisis in 2007, where the signal is given in October 2007 with 99% probability. Although the model misses the short bull market in the beginning of 2008, one might regard that as insignificant since it is followed by a massive bear market. The signal for the bull market is, like in 2003, a bit too late, where a bull market is first signaled in July 2009 with a probability of 0.7267. A W-shaped recovery like in 2003 is therefore not out of the question, but at least the signal indicates that a bottom has been reached. The model does however also predict a bull market in August with roughly 94% probability (which was correct since the return for August was around 1.86%), and that strengthens the likelihood that the market has turned.


Figure 13: Out-Sample Turning Points

Again the model returns a cluster of signals around the turning point in 2003, which might be caused by the above-mentioned events that shocked the market. On the other hand, the model does not get into trouble in 2005, perhaps because the DTP definition does not define the hidden bull market. The model does, however, give misleading signals in the 2007 crisis; the lead assumed here is three months, which gives the user time to reconsider. For example, a signal could be disregarded unless it holds for at least two months. Looking into the future, the model gives an extremely high probability of a continuing bull market, with probabilities close to 100%.

4.3 Buy or Sell

We don't get paid in probabilities so we would like to value our prediction somehow. Three simple investment strategies are proposed:

1. A passive strategy, where stocks are held when the model signals a bull market and bonds are held when a bear market is signaled.

2. An aggressive strategy, where the investor buys stocks during bull markets and shorts stocks during bear markets.

3. A probability approach, where the investor buys stocks if the probability of a bear market is less than 0.3, shorts stocks when the probability is above 0.7, and holds bonds when the model returns a probability in between.

The return on the bond is calculated using the 3-month Treasury Bill, where the given yearly return from the data series, y, is transformed into a monthly rate using (1 + y)^{1/12}. To keep things simple all transaction costs are excluded, which might skew our results if the transitions between states are frequent.
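A minimal sketch of how the three strategies could be evaluated month by month is given below. The probability thresholds follow the text, the monthly bond return is taken as (1 + y)^{1/12} - 1, the input series are dummy data, and transaction costs are ignored as in the paper.

```python
import numpy as np


def strategy_returns(stock_ret, bond_yield_annual, prob_bear, strategy):
    """Monthly strategy returns given predicted bear-market probabilities.

    stock_ret         : array of monthly stock returns
    bond_yield_annual : array of annual T-bill yields (as decimals)
    prob_bear         : array of P(bear) predicted for each month
    strategy          : 1 = passive, 2 = aggressive, 3 = probability thresholds
    """
    bond_ret = (1.0 + np.asarray(bond_yield_annual)) ** (1.0 / 12.0) - 1.0
    out = np.empty(len(stock_ret))
    for t, (r, rb, p) in enumerate(zip(stock_ret, bond_ret, prob_bear)):
        bull = p < 0.5
        if strategy == 1:                  # stocks in bull, bonds in bear
            out[t] = r if bull else rb
        elif strategy == 2:                # stocks in bull, short stocks in bear
            out[t] = r if bull else -r
        else:                              # probability approach: 0.3 / 0.7 thresholds
            out[t] = r if p < 0.3 else (-r if p > 0.7 else rb)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    stock = rng.normal(0.005, 0.04, size=24)   # dummy monthly stock returns
    tbill = np.full(24, 0.03)                  # dummy 3% annual T-bill yield
    prob = rng.uniform(0, 1, size=24)          # dummy bear probabilities
    for s in (1, 2, 3):
        growth = np.prod(1.0 + strategy_returns(stock, tbill, prob, s))
        print(f"strategy {s}: terminal value of 100 = {100 * growth:.2f}")
```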


The value of the initial investment is plotted in Figure 14 and the value at the end of the period is given in Table 8.

(a) Short Run, STP

(b) Long Run, DTP

Figure 14: Strategies based on prediction vs. buy and hold.

Strategy         Str1     Str2     Str3     Bond     Stock
Short Strategy   157.33   220.30   178.53   119.44   86.98
Long Strategy    143.53   204.99   190.88   119.44   86.98


bond portfolio gives a better or similar return up until the beginning of 2006. The stock portfolio even gives more revenue than strategy three for a long period; thus one should avoid investing in the market based on strategy two for the short run estimated model. Strategies one and two do however bring in more, or as much, money as the stock portfolio over the whole period.

4.3.1 Different Cost Function

Since the model is used in an investment strategy to profit from the market, the cost function could be tailor-made to that purpose. Recall that the cost function is made up of two parts, MSE and ECE, where MSE (see eq. (16)) can be viewed as how right we are and ECE (see eq. (17)) is a measure of how often we are right. Now, instead of minimizing the cost of missing a turning point, ECE, the focus will be set on capturing the degree of the movement within the states with a performance estimator,

PE = \left\| \frac{1}{tr^0} \sum_i r_i^0 + \frac{1}{tr^1} \sum_i r_i^1 - 1 \right\|,   (33)

where tr^j is the total return over state j and r_i^j is the return at time i where the model predicts state j, for j = 0, 1. Similar to MSE and ECE, the performance estimator PE returns a number from 0 to 1 that indicates how much of the movement in stock returns the prediction captures, where 0 is optimal and is attained if all movements are captured. This change of cost function does however not improve our results in the short run, where the ECE and the investment strategies are worse off (the MSE is however slightly better). In the long run the cost does decrease significantly, while the investment strategies, which the aim was to improve, do not give a higher return.

4.3.2 Available Information

When applying the model one should take into consideration when the information is available. Three of the variables in the sample are not available in the first week of the month, namely X3, X6 and X7, which makes it impossible to apply them if one is relying on a one month lead, as for the STP prediction, to invest in the market. One therefore needs either to apply the sample with a two month lead or to apply a sample of the quickly available variables. This exercise does however not give impressive results; it should though be noted that assuming a two month lead gives fairly good results, while the sample stripped of X3, X6 and X7 gives bad results. That indicates that the excluded variables are necessary, which is in line with the results from the sample combinations found in Table 4.

5 Conclusion


The optimal lead was found to be one month in the short run and three months in the long run. Assuming the optimal leads even gave better results than taking no lead at all when the model was fitted to the whole period, 1990-2009. The cost of the optimal sample used to fit the model decreased with sample size: significantly at first, but once the sample reached five or more variables the further cost reduction was little or none. Constructing an optimal sample did, however, not give a concrete solution, as the optimal combination for the different sizes was extremely unstable over time.

As mentioned, a larger sample of variables did improve our results, and that is clearly seen when a prediction is made with only a single variable and the historical stock returns. To be able to apply the model in higher dimensions a suitable optimization method, DIRECT, is introduced. The prediction made with the whole sample showed some promising results, and both in the short and the long run a bear market was signaled for the current crisis. In both cases the signal of a bull market has also been given, which might indicate that the worst is over. Both the long and the short run did, however, have some difficulties identifying the W-shaped recovery in 2003. That could be explained by the extreme uncertainty in the US market at the time. The short run prediction also clusters around 2005, which might indicate that a third state, a sideways market, should be introduced. The hidden bull market in 2005 might though be a better explanation, since the long run prediction is free from a heavy signal cluster. The prediction is, however, fairly accurate when the trend is strong for a period of time, and any signal might therefore indicate turbulence in the trend.

Three naive investment strategies that rely on the prediction were also suggested, and in all cases they beat the simple buy-and-hold strategies for both bonds and stocks. That strengthens the evidence that the model gives reliable predictions, at least to some extent.

6 Further Studies

Here we present a couple of suggestions for further investigation; we will only dip our toes in the water and not go into any details.

6.1 Other Time Series

Applying this model to other time series, e.g. housing prices, interest rates or exchange rates, and predicting turning points in them using macroeconomic variables would be interesting. It is also a bit closer to economic theory, i.e. that present macroeconomic variables affect the future value of, for example, housing prices.

6.2 Behavior Over Time


References

[1] Andersson, E., Bock,D. and Frisén,M. (2006): "Some Statistical Aspects of Methods for Detection of Turning Points in Business Cycles", Journal of Applied Statistics, 33, 257-278.

[2] Boström, J. (2009): "Forecasting Stock Market Return based on Macroe-conomic Variables".

[3] Chen, SS. (2009): "Predicting the bear stock market: Macroeconomic variables as leading indicators", Journal of Banking & Finance, 33, 211-223.

[4] Fama, Eugene F. (1970): "Efficient Capital Markets: A Review of Theory and Empirical Work", Journal of Finance, 25, 383-417.

[5] Hamilton, J.D. (1989): "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle", Econometrica, 57, 357-384.

[6] Hamilton, J.D. (1994): "Time Series Analysis", Princeton, New Jersey, Princeton University Press.

[7] Hamilton, J.D., and Susmel, R. (1994): "Autoregressive conditional heteroskedasticity and changes in regime", Journal of Econometrics, 64, 307-333.

[8] Jones, D.R., Perttunen, C.D., Stuckman, B.E. (1993): "Lipschitzian Optimization without the Lipschitz Constant", Journal of Optimization Theory and Applications, 79, 157-181.

[9] Ivanov, D., Lahiri, K., and Seitz, F. (2000): "Interest rate spreads as predictors of German inflation and business cycle", International Journal of Forecasting, 19, 39-58.

[10] Kim, CJ., and Nelson, C.R. (1999): "State-Space Models with Regime Switching", Cambridge, Massachusetts, The MIT Press.

[11] Knif, J., Kolari, J. and Pynnönen, S. (2005): "What drives correlation between stock market returns? International Evidence".

[12] Koskinen, L. and Öller, LE. (2004): "A Classifying Procedure for Signaling Turning Points", Journal of Forecasting, 23, 197-214.

[13] Maheu, J.M. and McCurdy, T.H. (2000): "Identifying Bull and Bear Markets in Stock Returns", Journal of Business & Economic Statistics, 18, 100-112.

[14] Neftci, S.N. (1982): "Optimal Prediction of Cyclical Downturns", Journal of Economic Dynamics and Control, 4, 225-241.

[15] Norris, J.R. (1997): "Markov Chains", New York, New York, Cambridge University Press.


[17] Shubert, B.O. (1972): "A Sequential Method Seeking the Global Maximum of a Function", SIAM Journal on Numerical Analysis, Vol. 9, No. 3, 379-388.

[18] Trujillo-Ortiz, A., R. Hernandez-Walls, K. Barba-Rojo and L. Cupul-Magana (2007): HZmvntest: Henze-Zirkler's Multivariate Normality Test. A MATLAB file. [WWW document]. URL http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=17931

[19] www.investopedia.com


A Additional Derivations

A.1 Bayes' Theorem

The Bayes' Theorem used in the algorithm is a bit different from the usual one because of the state dependence of the density of $X_t$ and the fact that the state is hidden, and we wish to derive it (a derivation can also be found in Hamilton (1994), p. 693). The joint density function of $X_t$ and $S_{t+l}$ is given by

$$f(X_t, S_{t+l} \mid X_{t-1}) = f(X_t \mid S_{t+l}, X_{t-1}) \ast P(S_{t+l} \mid X_{t-1}),$$

where $X_{t-1}$ is the information up to time $t-1$. Because the state is hidden and the density of $X_t$ depends on it, the density function is defined by summing over all possible values of $S_{t+l}$ (Kim and Nelson, p. 60-61):

$$f(X_t \mid X_{t-1}) = \sum_{i=0}^{1} f(X_t \mid S_{t+l}=i) \ast P(S_{t+l}=i \mid X_{t-1}).$$

The joint density can also be defined as

$$f(X_t, S_{t+l} \mid X_{t-1}) = f(X_t \mid X_{t-1}) \ast P(S_{t+l} \mid X_t).$$

Now the version of Bayes' Theorem used in step 2 of the recursive algorithm can be obtained by putting the definitions of the joint distribution together:

$$P(S_{t+l}=j \mid X_t) = \frac{f(X_t \mid S_{t+l}=j) \ast P(S_{t+l}=j \mid X_{t-1})}{\sum_{i=0}^{1} f(X_t \mid S_{t+l}=i) \ast P(S_{t+l}=i \mid X_{t-1})}.$$
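To make the recursion concrete, the sketch below implements this single update step in Python, assuming (as in the model) that the state-dependent densities $f(X_t \mid S_{t+l}=i)$ are multivariate normal with state-dependent means and covariances; the function name and signature are illustrative, not the thesis code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def filter_update(p_prior, x_t, mus, sigmas):
    """One application of the Bayes' theorem step above.

    p_prior : length-2 vector with P(S_{t+l}=i | X_{t-1}), i = 0, 1
    x_t     : observation vector of (smoothed) variables at time t
    mus     : the two state-dependent mean vectors
    sigmas  : the two state-dependent covariance matrices
    Returns the updated probabilities P(S_{t+l}=j | X_t), j = 0, 1.
    """
    # f(X_t | S_{t+l}=i) for the two states
    lik = np.array([multivariate_normal.pdf(x_t, mean=mus[i], cov=sigmas[i])
                    for i in (0, 1)])
    post = lik * np.asarray(p_prior)      # numerators of Bayes' theorem
    return post / post.sum()              # divide by f(X_t | X_{t-1})
```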

A.2 Adding Serial Correlation

When the stock market cycle is assumed to follow an AR(1), see equation (32), the density function and the algorithm have to be adjusted accordingly. We still have the normal density function, but now it depends on two states, $S_{t+l}$ and $S_{t+l-1}$. Let $Z_t^{ij} = X_t - \mu_j - \phi(X_{t-1} - \mu_i)$; then

$$f(X_t \mid S_{t+l}=j, S_{t+l-1}=i) = f_{ji}(X_t) = \frac{1}{(2\pi)^{n/2} \ast |\sigma_j|^{1/2}} \ast \exp\!\left(-\frac{1}{2} \ast (Z_t^{ij})' \ast \sigma_j^{-1} \ast Z_t^{ij}\right) \qquad (34)$$

and equation (12) now becomes

$$P(S_{t+l}=j \mid X_t) = \frac{p_{0j} \ast p_0(t+l-1) \ast f_{j0}(X_t) + p_{1j} \ast p_1(t+l-1) \ast f_{j1}(X_t)}{\sum_{i=0}^{1} \left( p_{0i} \ast p_0(t+l-1) \ast f_{i0}(X_t) + p_{1i} \ast p_1(t+l-1) \ast f_{i1}(X_t) \right)}. \qquad (35)$$
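A minimal numerical transcription of equation (34) is given below; it is only a sketch, and the function and argument names are illustrative.

```python
import numpy as np

def density_ar1(x_t, x_tm1, mu_j, mu_i, sigma_j, phi):
    """Evaluate f_{ji}(X_t) of eq. (34): the density of X_t given S_{t+l}=j and
    S_{t+l-1}=i when the series follows an AR(1) with coefficient phi."""
    x_t = np.atleast_1d(np.asarray(x_t, dtype=float))
    x_tm1 = np.atleast_1d(np.asarray(x_tm1, dtype=float))
    n = x_t.size
    z = x_t - mu_j - phi * (x_tm1 - mu_i)                    # Z_t^{ij}
    quad = z @ np.linalg.inv(sigma_j) @ z                    # (Z_t^{ij})' sigma_j^{-1} Z_t^{ij}
    norm_const = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma_j))
    return float(np.exp(-0.5 * quad) / norm_const)
```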



B Other Markets

Although the focus is on the U.S. market, other markets are of course of interest, and it would therefore be very convenient if other markets and the U.S. market turned simultaneously. In Figure 15 the STP turning points of different markets are plotted, where the indices are assumed to reflect the corresponding markets, and just by observation we see that the indices turn at similar times.

Figure 15: Turning Points in More Markets


Figure 16: Turning Points in More Markets

The results are not surprising; the prediction gave the best results in the Swedish market, which is the smallest and might therefore lag or depend more on the others. The Japanese market, on the other hand, gives the poorest results, while the USA, UK and Euro Zone give fairly similar costs. It is also obvious that the macroeconomic sample gave a better prediction based on the cost; the stock market returns might though improve the sample, and the results for the Swedish market encourage us to take the analysis further.


Figure 17: Out-Sample prediction in USA using Euro Zone plus macro sample.

The results are similar to those obtained before, where the clusters still show up in 2003 and 2005 and a bear market is signaled for the 2007 crisis (although here with a strange bull market lasting one month after the signal). Here the model does not give the signal until August 2009, with an 87% probability, which is a month later than before. The investment strategies all give a worse return; they do, however, still beat holding bonds or stocks over the period, and when looking over the whole period they perform better in the beginning, since the 2003 turning point is captured better.

