DEGREE PROJECT IN MATHEMATICS, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2020
Spectral Portfolio Optimisation
with LSTM Stock Price Prediction
NANCY WANG
Degree Projects in Mathematical Statistics (30 ECTS credits)
Master's Programme in Applied and Computational Mathematics (120 credits) KTH Royal Institute of Technology year 2017
TRITA-SCI-GRU 2020:083 MAT-E 2020:046
Royal Institute of Technology
School of Engineering Sciences
KTH SCI
ABSTRACT
Nobel Prize-winning modern portfolio theory (MPT) is considered one of the most important and influential economic frameworks within finance and investment management. MPT assumes investors to be risk-averse and uses the variance of asset returns as a proxy of risk to maximise the performance of a portfolio. Successful portfolio management thus relies on accurate risk estimates and asset return predictions. Risk estimates are commonly obtained through traditional asset pricing factor models, which allow the systematic risk to vary over the time domain but not in the frequency space. This approach can impose limitations in, for instance, risk estimation. To tackle this shortcoming, interest in applications of spectral analysis to financial time series has increased lately; among others, the novel spectral portfolio theory and the spectral factor model demonstrate enhanced portfolio performance through spectral risk estimation [1][11]. Moreover, stock price prediction has always been a challenging task due to its non-linearity and non-stationarity. Meanwhile, machine learning has been successfully implemented in a wide range of applications where it is infeasible to accomplish the needed tasks traditionally. Recent research has demonstrated significant results in single stock price prediction by artificial LSTM neural networks [6][34]. This study aims to evaluate the combined effect of these two advancements in a portfolio optimisation problem and optimise a spectral portfolio with stock prices predicted by LSTM neural networks. To do so, we began with mathematical derivation and theoretical presentation, and then evaluated the portfolio performance generated by the spectral risk estimates and the LSTM stock price predictions, as well as the combination of the two. The results demonstrate that the LSTM predictions alone performed better than the combination, which in turn performed better than the spectral risk alone.
Keywords: Artificial Neural Network, LSTM, Spectral factor model, Portfolio
SAMMANFATTNING
The Nobel Prize-winning modern portfolio theory (MPT) is without doubt one of the most successful investment frameworks within the financial world and investment strategies. MPT assumes that investors are risk-averse and approximates risk exposure with the variance of the assets' returns. The key to successful portfolio management is therefore good risk estimates and good predictions of asset prices. Risk estimation is usually carried out through the traditional asset pricing models, which allow the risk to vary in time but not in the frequency space. This limitation causes, among other things, a larger error in risk estimation. To tackle this, interest in applications of spectral analysis to financial time series has increased in recent years. In particular, a new approach to the problem is the recently introduced spectral portfolio theory and the spectral factor model, which demonstrated increased portfolio performance through spectral risk estimation [1][11]. Meanwhile, prediction of stock prices has long been a major challenge due to their non-linear and non-stationary properties, while machine learning has been used to solve otherwise intractable tasks. Recent studies have demonstrated significant results in stock price prediction using artificial LSTM neural networks [6][34]. This work investigates the combined effect of these two advances in a portfolio optimisation problem by optimising a spectral portfolio with future returns predicted by an LSTM neural network. The work begins with mathematical derivations and a theoretical introduction, and then studies the portfolio performance generated by the spectral risk, the LSTM stock price predictions, and a combination of the two. The results show that the LSTM prediction alone performed better than the combination, which in turn performed better than the spectral risk estimation alone.
Keywords: Artificial neural networks, LSTM, Spectral factor model,
ACKNOWLEDGEMENTS
My sincerest gratitude goes to my supervisors Lars Meuller and Fredrik Viklund for their guidance, support and enthusiastic engagement. Lars has always provided multiple choices during the planning of this paper, and he has consistently allowed this paper to be my work while being supportive and enlightening within areas of his expertise. Fredrik has been very encouraging and kind, and he has given valuable and constructive suggestions during the planning and development of this work. It has been a delightful experience to have them as my supervisors.
Moreover, I would like to direct my appreciation to AP4 for offering this thesis opportunity. It has been a very educational experience.
Contents
1 Introduction 10
1.1 Background . . . 10
1.2 Objective . . . 13
2 Preliminary 14
2.1 Time Series Analysis . . . 14
2.1.1 Time Series . . . 14
2.1.2 Stationarity . . . 15
2.1.3 Spectral Density and Periodogram . . . 17
2.1.4 Time Series Models . . . 20
2.1.5 The Projection Theory . . . 24
2.1.6 Wold Decomposition . . . 26
2.2 Multivariate Time Series Analysis . . . 28
2.2.1 Introduction to Vectorial Time Series . . . 28
2.2.2 Multivariate Time Series Models . . . 29
2.2.3 Multivariate Wold Decomposition . . . 31
2.3 Extended Wold Decomposition . . . 33
2.3.1 Formal Derivation . . . 33
2.4 Spectral Decomposition Methods . . . 36
2.4.1 Parametric VAR(p) Method . . . 36
2.4.2 Nonparametric VMA(q) Method . . . 37
2.4.3 Scale Interpretation and Decomposition Example . . . 38
3 Spectral Factor Model 41
3.1 Two-Component Decomposition . . . 41
4 Spectral Portfolio-Optimisation Framework 45
4.1 Markowitz Optimisation Theory . . . 45
4.2 Spectral Risk . . . 46
4.3 Asset Pricing Factor Models . . . 47
4.3.1 Macro-Economical Factor Models . . . 47
4.3.2 Fundamental Factor Models . . . 48
4.4 Example of Risk and Beta Decomposition . . . 50
5 LSTM 53
5.1 Deep LSTM Network . . . 53
5.1.1 Introduction to Recurrent Neural Networks . . . 53
5.1.2 LSTM Neural Network . . . 55
5.2 Global Stock Prediction Example . . . 57
5.3 Hyperparameter Tuning . . . 58
5.4 Prediction Accuracy Evaluation . . . 63
6 Model Design and Data 66
6.1 Model Design . . . 66
6.2 Data . . . 67
7 Empirical Evaluation 68
8 Discussion 74
Symbol Glossary
N {0, 1, 2, . . . }
R the set of real numbers
Z the set of integers
Z+ the set of positive integer numbers
C the set of complex numbers
E[X] the expectation of random variable X
Cov(X, Y) the covariance of the random variables X and Y
Var(X) the variance of the random variable X
L the back shift operator
a′ the transpose of vector a
A−1 the inverse of matrix A
List of Tables
1 Scale interpretation . . . 38
2 Spectral risk decomposition . . . 51
3 Spectral betas & weights . . . 51
4 Prediction MSE for single-layered LSTM . . . 59
5 Prediction MAPE for single layered NNs . . . 59
6 Model summary of the final LSTM model . . . 60
7 Prediction MSE for 2 layer NNs . . . 61
8 Prediction MAPE for 2 layer NNs . . . 61
9 Prediction errors of dropout rates . . . 62
10 Prediction outliers - error distribution . . . 64
11 Prediction outliers - data overview . . . 64
12 Data overview . . . 67
13 Portfolio return overview - CAPM . . . 69
14 Portfolio return overview - FF3 . . . 70
15 Portfolio return overview - Carhart . . . 70
16 Portfolio return overview - FF5 . . . 71
List of Figures
1 Time series de-trending . . . 17
2 Google stock before DFT . . . 19
3 Power spectrum of Google stock . . . 19
4 Reconstruction from partial signal 1 . . . 20
5 Reconstruction from partial signal 2 . . . 20
6 Random variable projection . . . 26
7 Extended Wold decomposition - original data . . . 39
8 Distribution of spectral components . . . 39
9 Spectral orthogonal components . . . 40
10 Spectral betas & risk example - data visualisation . . . 50
11 Unit structure of a RNN . . . 55
12 Unit structure of the LSTM neural network . . . 56
13 One-month ahead global stock prediction by deep LSTM neural network . . . 57
14 Errors of different dropout rates. . . 62
15 Stock prediction error including outliers . . . 63
16 Outliers visualisation . . . 65
17 Stock prediction error excluding outliers . . . 65
18 Optimisation model design . . . 66
19 Portfolio return box plot - 10% target return . . . 71
20 Portfolio return box plot - 15% target return . . . 72
1 Introduction
1.1 Background
Data analysis has become well developed both in theory and in practice during the past decades. As large datasets become available and computational techniques advance like never before, discovering new quantitative methods and refinements of classical investment problems has become crucial for investors to remain competitive. One of the essential tasks within investment management is portfolio management. A portfolio is a collection of investment assets such as stocks, bonds and commodities. Portfolio management refers to the process of determining an investment strategy such that the return is maximised over the investment period. In other words, the main challenge within portfolio management is to allocate a given capital within a portfolio such that the return is maximal for a given acceptable risk level.
What is risk? In asset pricing theory, there are two types of risk, namely systematic and unsystematic risk. Unsystematic risk can be eliminated through diversification, i.e. by dividing the capital among various kinds of assets and spreading the risk so that individual risks cancel one another out. According to MPT, diversification is accomplished by choosing assets whose returns are uncorrelated with one another, which makes the covariance matrix of the asset returns a good proxy for risk. Systematic uncertainty is, on the contrary, non-diversifiable. It is, therefore, vital to find an optimal strategy which maximises the portfolio return without exposure to unnecessary risk. By Markowitz portfolio theory [25], this is a quadratic optimisation problem with two inputs, namely the expected asset returns and the risk estimate, i.e. the covariance matrix of asset returns. For this reason, the problem is also called a mean-variance problem. The key to successful portfolio optimisation is thus accurate risk estimation and asset return prediction.
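To make the mean-variance problem concrete, the following sketch solves the equality-constrained problem of minimising w′Σw subject to a target expected return and fully invested weights, via its Lagrangian closed form. It is purely illustrative: the expected returns and covariance matrix below are hypothetical numbers, not data used in this thesis.

```python
import numpy as np

def mean_variance_weights(mu, sigma, target_return):
    """Minimum-variance weights achieving a given expected return.

    Solves min_w w' Sigma w  s.t.  w' mu = target_return, sum(w) = 1,
    using the closed form w = Sigma^{-1} A' (A Sigma^{-1} A')^{-1} b
    (short positions allowed).
    """
    n = len(mu)
    A = np.vstack([mu, np.ones(n)])      # stacked constraint matrix
    b = np.array([target_return, 1.0])
    sigma_inv = np.linalg.inv(sigma)
    return sigma_inv @ A.T @ np.linalg.solve(A @ sigma_inv @ A.T, b)

mu = np.array([0.08, 0.12, 0.10])        # hypothetical expected returns
sigma = np.array([[0.04, 0.01, 0.00],    # hypothetical covariance matrix
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.06]])
w = mean_variance_weights(mu, sigma, 0.10)
```

By construction the weights satisfy both constraints exactly; varying the target return traces out the efficient frontier.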
Applications of spectral analysis to such problems were, however, limited until recently. During the past years, there has been a rebirth of interest in applying spectral analysis to financial and economic time series. Among others, Croux et al. proposed a frequency-based measure to describe the co-movement of co-stationary processes [15], and Barunik et al. used frequency decomposition to examine the pricing of market risks in the cross-section of asset returns at various time horizons [16]. In particular, Chomesh et al. [11] introduced a spectral portfolio theory in 2016 and decomposed alphas, betas and risk into sums of periodic functions. Bandi et al. [1] advanced this further and presented the spectral factor model in 2019. The authors performed optimisation on their brand-new spectral portfolio and demonstrated that the spectral decomposition leads to better risk estimates in a mean-variance optimisation problem.
Another equally important input for the mean-variance problem is the prediction of future asset returns. Due to the stochastic nature of asset returns, with their non-linearity and non-stationarity, it is not a simple task to forecast asset returns with high precision, especially over long forecasting periods. Asset returns are commonly predicted by fundamental or technical analysis. Fundamental analysis evaluates a company's past performance as well as the credibility of its accounts, while technical analysis predicts stock prices solely from trends in the historical data. These analyses have been successful and are well established among investors. However, some shortcomings are yet to be overcome. In particular, fundamental analysis requires an extensive amount of work and acts as a long-term strategy. Technical analysis, in turn, is a short-term strategy due to the non-linearity and non-stationarity of asset prices.
LSTM neural networks have recently been applied with success to forecast major global stock indices [6], and Ta et al. performed stock price prediction with significant results in the context of portfolio optimisation [34].
1.2 Objective
With the background presented above, this paper aims to study the combined effect of spectral portfolio theory and LSTM-based asset price prediction. This goal can be divided into two parts. The first part is to understand the spectral risks and betas, how they appear in a spectral factor model and how they act in a spectral portfolio context. This part comprises both theoretical and practical understanding of the spectral factor model, which includes but is not limited to the mathematical derivation and empirical portfolio optimisation with financial datasets. The second part is to explore the possibility of long-horizon time series forecasting by an LSTM neural network. This part consists of a theoretical understanding of the LSTM neural network and hyperparameter tuning to set up an optimal prediction model. Finally, to achieve the objective of this study, a mean-variance portfolio was optimised with the risk estimates evaluated by the spectral factor model and the expected asset returns predicted by the LSTM neural network, as well as the combination of the two. Thereafter, the portfolio performances were evaluated and compared between the different models.
2 Preliminary
2.1 Time Series Analysis
In this section, we introduce some basic time series properties and relevant concepts that will be needed for the financial time series studies in this paper.
2.1.1 Time Series
Let us begin by defining a stochastic process:
Definition 1. A stochastic process is a family of random variables {X_t, t ∈ T} defined on a probability space (Ω, F, P).

The set T above is known as the parameter set or the index set; some common index sets are Z+, Z, R+ and R. A stochastic process with T ⊂ Z is usually called a time series. Moreover, a stochastic process is usually described by its means and covariances, occasionally under some general assumptions.
Definition 2. For {X_t, t ∈ T} a stochastic process with Var(X_t) < ∞, the mean and covariance functions are defined as µ_X(t) = E[X_t] and γ_X(r, s) = Cov(X_r, X_s), respectively, where {t, r, s} ∈ T.
Definition 3. A realisation is a function f : t → x_t which assigns to the random variable X_t a realisation x_t at each point in time t. We denote such a realisation by {x_t}.
We call the realisation or its underlying stochastic process a time series. Loosely speaking, a time series is a sequence of observations {xt} whose element xt is an
observed outcome of the random variable Xt at time t. In general, observations
can be done randomly over a continuous time interval or at some fixed discrete time points. In this paper, we restrict ourselves to equidistant discrete time series since we aim to study financial time series which are commonly daily, monthly or quarterly data.
The main focus of time series analysis is to understand the relationship between observations. Often, we assume some parametric trend and de-trend the data before further analysis. Consider the classical time series decomposition:
$$X_t = m_t + s_t + Y_t, \qquad (1)$$
where
• $m_t$: a trend component, commonly a slowly changing function,
• $s_t$: a seasonal component, commonly a periodic function,
• $Y_t$: a zero-mean time series.
Some common trends are:
• The linear parametric trend ($s_t = 0$):
$$X_t = \underbrace{\beta_0 + \beta_1 t}_{m_t} + Y_t.$$
• The quadratic parametric trend ($s_t = 0$):
$$X_t = \underbrace{\beta_0 + \beta_1 t + \beta_2 t^2}_{m_t} + Y_t.$$
• The seasonal parametric trend ($m_t = 0$, $\Omega \in \mathbb{R}$ denotes the period):
$$X_t = \underbrace{\beta_0 + \beta_1 \sin\frac{2\pi t}{\Omega} + \beta_2 \cos\frac{2\pi t}{\Omega}}_{s_t} + Y_t.$$
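As a small illustration of de-trending with a parametric trend (a sketch on simulated data, not the thesis dataset), a linear trend can be fitted by least squares and subtracted to obtain an estimate of the zero-mean component $Y_t$:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)       # e.g. 120 monthly observations

# Simulated series with linear trend m_t = 2 + 0.5 t plus noise Y_t
x = 2.0 + 0.5 * t + rng.normal(0.0, 1.0, t.size)

# Fit the linear parametric trend by least squares: x ≈ beta0 + beta1 t
beta1, beta0 = np.polyfit(t, x, deg=1)

# De-trended residual: an estimate of the zero-mean component Y_t
y = x - (beta0 + beta1 * t)
```

The residual `y` has zero sample mean by construction of the least-squares fit, mirroring the requirement that $Y_t$ be a zero-mean series.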
2.1.2 Stationarity
When studying a time series, we often look at its stationarity, i.e. whether its statistical properties remain constant over time. Mathematically, there are different levels of stationarity. Here we introduce strict and weak stationarity.
Definition 4. A stochastic process {X_t, t ∈ Z} is strictly stationary if the distributions of (X_{t_1}, …, X_{t_k}) and (X_{t_1+h}, …, X_{t_k+h}) are the same ∀ t_1, …, t_k, h ∈ Z.
Definition 5. For a stochastic process {Xt, t ∈ Z}, if
(i) Var(Xt) < ∞ ∀ t ∈ Z,
(ii) µX(t) = µ ∀ t ∈ Z,
(iii) γX(r, s) = γX(r + t, s + t) ∀ r, s, t ∈ Z,
then the process is said to be weakly stationary or covariance stationary.
From Definition 5, one can see that it is convenient to consider the covariance in terms of r − s and define γ_X(h) ≡ γ_X(h, 0), where h is commonly known as the lag. A weakly stationary process is often studied through its autocovariance and autocorrelation functions.
Definition 6. For a covariance stationary stochastic process {X_t, t ∈ T}, the autocovariance and autocorrelation functions are defined as γ_X(h) = Cov(X_{t+h}, X_t) and ρ_X(h) = γ_X(h)/γ_X(0), respectively.
One of the simplest covariance stationary process is the white noise which is defined as follows.
Definition 7. A process {X_t, t ∈ Z} is said to be a white noise with mean µ and variance σ², written {X_t} ∼ WN(µ, σ²), if E[X_t] = µ, γ(0) = σ² and γ(h) = 0 otherwise.
One example of a strictly stationary process is the Gaussian process, defined as follows.

Definition 8. The process {X_t, t ∈ Z} is said to be Gaussian if all finite-dimensional distributions of {X_t} are multivariate normally distributed.
Most time series cannot be considered strictly stationary. For this reason, much of the literature drops the word weakly and simply uses stationary for weakly stationary processes without further specification. We adopt this custom in the rest of this paper.
Theorem 1. The covariance function γ_X(h) of a stationary process {X_t, t ∈ Z} is an even positive semi-definite function. Conversely, for any even positive semi-definite function there exists a stationary time series with this function as its covariance function.

Theorem 1 follows from the definition of the covariance function of a stationary process. The proof of the theorem can be found in e.g. [28].
Let us re-consider the classical time series decomposition of Equation (1) with s_t = 0. One way to extract the deterministic component m_t such that Y_t is zero-mean is to de-trend the data, for instance by taking logarithmic differences, as in Figure 1.

Figure 1: Monthly adjusted closing price of Amazon.com, Inc. in the upper plot and the de-trended log difference plot at the bottom.
2.1.3 Spectral Density and Periodogram
A time series is often characterised by its mean and covariance. However, it is not always a simple task to work with the covariance function in the time domain. Sometimes, a workaround is to look into the frequency space, or reciprocal space, of the time series and study its spectral density.
Assume the autocovariance function γ of a time series {X_t, t ∈ Z} is summable in the sense that $\sum_{h=-\infty}^{\infty} |\gamma(h)| < \infty$. Then the spectral density f of the time series is defined as
$$f(\lambda) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} e^{-ih\lambda} \gamma(h), \qquad -\pi \le \lambda \le \pi.$$
It then follows by taking the inverse Fourier transform that
$$\int_{-\pi}^{\pi} e^{ih\lambda} f(\lambda)\, d\lambda = \int_{-\pi}^{\pi} \frac{1}{2\pi} \sum_{l=-\infty}^{\infty} e^{i(h-l)\lambda} \gamma(l)\, d\lambda = \frac{1}{2\pi} \sum_{l=-\infty}^{\infty} \gamma(l) \int_{-\pi}^{\pi} e^{i(h-l)\lambda}\, d\lambda = \gamma(h).$$
Let us consider Equation (1) with a seasonal parametric trend and zero mean, and let $\omega = 2\pi/\Omega$:
$$X_t = \beta_1 \sin(\omega t) + \beta_2 \cos(\omega t) + Y_t,$$
where $\beta_1$ and $\beta_2$ are uncorrelated random variables with zero means and common variance $\sigma^2$. We focus on the periodic component (taking $Y_t \equiv 0$) and note that $E[X_t] = 0$; the covariance function is then
$$\begin{aligned}
\gamma_X(h) &= E[X_{t+h} X_t] \\
&= E[\beta_1 \beta_2 (\cdots)] + E[\beta_1^2 \sin(\omega(t+h))\sin(\omega t)] + E[\beta_2^2 \cos(\omega(t+h))\cos(\omega t)] \\
&= 0 + \sigma^2 \sin(\omega(t+h))\sin(\omega t) + \sigma^2 \cos(\omega(t+h))\cos(\omega t) \\
&= \sigma^2 \{\cos(\omega(t+h))\cos(\omega t) + \sin(\omega(t+h))\sin(\omega t)\} = \sigma^2 \cos(\omega h).
\end{aligned}$$
Expressed in complex form, we get the spectral representation of $X_t$:
$$\gamma_X(h) = \frac{\sigma^2}{2}\left(e^{-ih\omega} + e^{ih\omega}\right). \qquad (2)$$
If the process is covariance stationary, then by the Wiener–Khinchin theorem [35] the power spectral density S of the process is
$$S(\nu) = \sum_{h=-\infty}^{\infty} \gamma_X(h) e^{-i 2\pi \nu h},$$
where ν denotes the frequency. In words, the Wiener–Khinchin theorem states that the power spectrum of a covariance stationary stochastic process is the Fourier transform of its autocovariance function, providing a direct link between the time domain and the frequency domain.
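The spectral viewpoint can be illustrated numerically: the periodogram, the squared magnitude of the discrete Fourier transform, estimates the power spectrum and recovers the dominant frequency of a noisy sinusoid. This is a sketch on simulated data; the frequency and noise level are arbitrary choices, not related to the thesis data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 512
nu0 = 0.125                              # true frequency, cycles per sample

# A noisy sinusoid: one strong periodic component plus white noise
x = np.sin(2 * np.pi * nu0 * np.arange(n)) + 0.5 * rng.normal(size=n)

# Periodogram: squared DFT magnitude, an estimate of S(nu)
spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n
freqs = np.fft.rfftfreq(n)               # frequencies in cycles per sample

peak = freqs[np.argmax(spectrum)]        # should sit near nu0
```

The peak of the periodogram lands on the grid frequency closest to the true frequency of the periodic component, despite the added noise.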
Figure 2: Monthly return of the Google stock before decomposition; the data spans January 2005 to December 2019.
Figure 4: Reconstruction from the partial signal below 1 cycle per year. Note that the 2008 financial crisis is visible in this spectrum.
Figure 5: Reconstruction from the signal between 2 and 3 cycles per year. Note that the 2008 financial crisis is no longer visible in the curve.
2.1.4 Time Series Models
Definition 9. A time series model is a specification of the joint distribution of
{Xt} for which {xt} is a realisation.
One can think of a time series model as a model of the observations, i.e. a model of the data {x_t}. We proceed to introduce some of the standard time series models.
Definition 10. The process {X_t, t ∈ Z} is said to be a moving average of order q, MA(q), if
$$X_t = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q}, \qquad \{Z_t\} \sim WN(0, \sigma^2),$$
where {θ_1, …, θ_q} ∈ R.
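A defining property of an MA(q) process is that its autocovariance function vanishes beyond lag q, which is easy to check by simulation (a sketch with hypothetical coefficients):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([0.6, 0.3])             # hypothetical MA(2) coefficients
sigma = 1.0
n = 200_000

z = rng.normal(0.0, sigma, n + len(theta))   # white noise {Z_t}
# X_t = Z_t + theta_1 Z_{t-1} + theta_2 Z_{t-2}
x = z[2:] + theta[0] * z[1:-1] + theta[1] * z[:-2]

# Sample autocovariance at lag 3: an MA(q) process has gamma(h) = 0
# for every |h| > q, so this should be close to zero.
gamma3 = np.mean((x[3:] - x.mean()) * (x[:-3] - x.mean()))
```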
Definition 11. The process {X_t, t ∈ Z} is said to be a linear process if it has the representation
$$X_t = \sum_{k=-\infty}^{\infty} \psi_k Z_{t-k}, \qquad \{Z_t\} \sim WN(0, \sigma^2), \quad \text{where} \quad \sum_{k=-\infty}^{\infty} |\psi_k| < \infty.$$
Theorem 2. A linear process {X_t, t ∈ Z} with representation
$$X_t = \sum_{k=-\infty}^{\infty} \psi_k Z_{t-k}, \qquad \{Z_t\} \sim WN(0, \sigma^2),$$
is stationary with µ_X(t) = 0, autocovariance function
$$\gamma_X(h) = \sum_{k=-\infty}^{\infty} \psi_k \psi_{k+h} \sigma^2,$$
and spectral density
$$f_X(\lambda) = \frac{\sigma^2}{2\pi} |\psi(e^{-i\lambda})|^2, \qquad \text{where} \quad \psi(x) = \sum_{k=-\infty}^{\infty} \psi_k x^k.$$
Proof. We start by computing the mean and autocovariance function. Given {Z_t} ∼ WN(0, σ²), we have µ_X(t) = Σ_k ψ_k E[Z_{t−k}] = 0, and
$$\begin{aligned}
\gamma_X(h) &= E[X_{t+h} X_t] = E\Big[\Big(\sum_{k=-\infty}^{\infty} \psi_k Z_{t+h-k}\Big)\Big(\sum_{j=-\infty}^{\infty} \psi_j Z_{t-j}\Big)\Big] \\
&= \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \psi_k \psi_j E[Z_{t+h-k} Z_{t-j}] = \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} \psi_{k+h} \psi_j E[Z_{t-k} Z_{t-j}] \\
&= \sum_{k=-\infty}^{\infty} \psi_{k+h} \psi_k E[Z_{t-k}^2] = \sum_{k=-\infty}^{\infty} \psi_k \psi_{k+h} \sigma^2.
\end{aligned}$$
Moreover, using the definition of the spectral density and γ_X(h), we have
$$\begin{aligned}
f_X(\lambda) &= \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} e^{-ih\lambda} \gamma_X(h) = \frac{1}{2\pi} \sum_{h=-\infty}^{\infty} e^{-ih\lambda} \sum_{k=-\infty}^{\infty} \psi_k \psi_{k+h} \sigma^2 \\
&= \frac{\sigma^2}{2\pi} \sum_{h=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \psi_k e^{ik\lambda}\, \psi_{k+h} e^{-i(k+h)\lambda} = \frac{\sigma^2}{2\pi} \Big(\sum_{k=-\infty}^{\infty} \psi_k e^{ik\lambda}\Big) \Big(\sum_{m=-\infty}^{\infty} \psi_m e^{-im\lambda}\Big) \\
&= \frac{\sigma^2}{2\pi} \psi(e^{i\lambda})\, \psi(e^{-i\lambda}) = \frac{\sigma^2}{2\pi} |\psi(e^{-i\lambda})|^2.
\end{aligned}$$
Definition 12. The process {X_t, t ∈ Z} is said to be an autoregressive moving average process of order (p, q), ARMA(p, q), if it is stationary and if
$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \cdots + \theta_q Z_{t-q},$$
where {Z_t} ∼ WN(0, σ²). In addition, we say that {X_t} is an ARMA(p, q) process with mean µ if {X_t − µ} is an ARMA(p, q) process.

It is common to write the ARMA(p, q) process in the alternative form
$$\phi(L) X_t = \theta(L) Z_t, \qquad t \in \mathbb{Z},$$
where
$$\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p, \qquad \theta(z) = 1 + \theta_1 z + \cdots + \theta_q z^q,$$
and the backshift operator (also called lag operator) L is defined by $L^j X_t = X_{t-j}$.
Specifically, we have an MA(q) process if p = 0, and if q = 0 we have an autoregressive process defined as follows.

Definition 13. The process {X_t, t ∈ Z} is said to be an autoregressive process of order p, AR(p), if it is stationary and if
$$X_t - \phi_1 X_{t-1} - \cdots - \phi_p X_{t-p} = Z_t, \qquad \{Z_t\} \sim WN(0, \sigma^2).$$
We say that {X_t} is an AR(p) process with mean µ if {X_t − µ} is also an AR(p) process.

Definition 14. An ARMA(p, q) process defined by the equations
$$\phi(L) X_t = \theta(L) Z_t, \qquad \{Z_t\} \sim WN(0, \sigma^2),$$
is said to be causal if there exist constants {ϕ_j} satisfying $\sum_{j=0}^{\infty} |\varphi_j| < \infty$ and
$$X_t = \sum_{j=0}^{\infty} \varphi_j Z_{t-j}, \qquad t \in \mathbb{Z}.$$
In other words, causality is a relationship between {X_t} and {Z_t}; it is not a property of the process {X_t} alone. An equivalent way to express causality is to require that
$$\mathrm{Cov}(X_t, Z_{t+j}) = 0, \qquad \forall\, j \in \mathbb{Z}^+.$$
Theorem 3. Let {X_t} be an ARMA(p, q) process for which φ and θ have no common zeros. Then {X_t} is causal if and only if φ(z) ≠ 0 ∀ |z| ≤ 1, z ∈ C. The coefficients {ϕ_j} are uniquely determined by the identity
$$\sum_{j=0}^{\infty} \varphi_j z^j = \frac{\theta(z)}{\phi(z)}.$$
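The identity $\sum_j \varphi_j z^j = \theta(z)/\phi(z)$ yields the causal coefficients by formal power-series division, i.e. the recursion $\varphi_j = \theta_j + \sum_k \phi_k \varphi_{j-k}$ with $\theta_0 = 1$. A sketch; the helper name and example coefficients are our own illustration:

```python
import numpy as np

def causal_coefficients(phi, theta, n):
    """First n coefficients of theta(z)/phi(z) for a causal ARMA process.

    phi   -- AR coefficients (phi_1, ..., phi_p) of phi(z) = 1 - sum phi_i z^i
    theta -- MA coefficients (theta_1, ..., theta_q) of theta(z) = 1 + sum theta_j z^j
    Uses the recursion varphi_j = theta_j + sum_k phi_k varphi_{j-k}.
    """
    varphi = np.zeros(n)
    for j in range(n):
        t_j = 1.0 if j == 0 else (theta[j - 1] if j - 1 < len(theta) else 0.0)
        ar_part = sum(phi[k - 1] * varphi[j - k]
                      for k in range(1, min(j, len(phi)) + 1))
        varphi[j] = t_j + ar_part
    return varphi

# AR(1): phi(z) = 1 - 0.5 z, theta(z) = 1  =>  varphi_j = 0.5^j
coeffs = causal_coefficients([0.5], [], 6)
```

For the causal AR(1) the recursion reproduces the familiar geometric coefficients $\varphi_j = \phi_1^j$.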
2.1.5 The Projection Theory
Before proceeding further, we need to familiarise ourselves with projection theory in vector spaces. It is assumed that the reader is familiar with the definition and basic properties of Hilbert spaces; see [37] for a more detailed introduction.
Consider the random variables X_1, …, X_n, Y, all with finite mean and variance. We would like to predict Y in terms of X_1, …, X_n. We restrict ourselves to linear prediction and aim to find an estimator
$$\hat{Y} = a_0 + a_1 X_n + \cdots + a_n X_1$$
that minimises the mean square error E[(Y − Ŷ)²].

Denote E[X_i] = µ_i, E[Y] = µ, γ = (γ_1, …, γ_n)′ = (Cov(X_n, Y), …, Cov(X_1, Y))′, and
$$\Gamma = \begin{pmatrix} \gamma_{1,1} & \cdots & \gamma_{1,n} \\ \vdots & & \vdots \\ \gamma_{n,1} & \cdots & \gamma_{n,n} \end{pmatrix} = \begin{pmatrix} \mathrm{Cov}(X_n, X_n) & \cdots & \mathrm{Cov}(X_n, X_1) \\ \vdots & & \vdots \\ \mathrm{Cov}(X_1, X_n) & \cdots & \mathrm{Cov}(X_1, X_1) \end{pmatrix}.$$
Note that the indices are purposely reversed. For simplicity we can then write
$$\hat{Y} - \mu = a_0 + a_1 (X_n - \mu_n) + \cdots + a_n (X_1 - \mu_1).$$
Now
$$\begin{aligned}
E[(Y - \hat{Y})^2] &= E[(Y - \mu - a_0 - a_1(X_n - \mu_n) - \cdots - a_n(X_1 - \mu_1))^2] \\
&= a_0^2 - 2 a_0\, E[(Y - \mu) - a_1(X_n - \mu_n) - \cdots - a_n(X_1 - \mu_1)] \\
&\quad + E[((Y - \mu) - a_1(X_n - \mu_n) - \cdots - a_n(X_1 - \mu_1))^2] \\
&= a_0^2 + E[((Y - \mu) - a_1(X_n - \mu_n) - \cdots - a_n(X_1 - \mu_1))^2],
\end{aligned}$$
where the middle term vanishes because all the variables are centred.
Hence E[(Y − Ŷ)²] is minimised in a_0 by setting a_0 = 0, and it remains to minimise over a_1, …, a_n. Differentiating,
$$\begin{aligned}
\frac{\partial E[(Y - \hat{Y})^2]}{\partial a_i} &= \frac{\partial E[((Y - \mu) - a_1(X_n - \mu_n) - \cdots - a_n(X_1 - \mu_1))^2]}{\partial a_i} \\
&= -2\, E[(X_{n-i+1} - \mu_{n-i+1})((Y - \mu) - a_1(X_n - \mu_n) - \cdots - a_n(X_1 - \mu_1))] \\
&= -2 \{\mathrm{Cov}(X_{n-i+1}, Y) - a_1 \mathrm{Cov}(X_{n-i+1}, X_n) - \cdots - a_n \mathrm{Cov}(X_{n-i+1}, X_1)\} \\
&= -2 \Big(\gamma_i - \sum_{j=1}^{n} a_j \gamma_{ij}\Big) = 0, \qquad i = 1, \ldots, n.
\end{aligned}$$
Hence E[(Y − Ŷ)²] is minimised if $\gamma_i = \sum_{j=1}^{n} a_j \gamma_{ij}$ for i = 1, …, n. In vector form, we can write γ = Γa, or a = Γ⁻¹γ if Γ is an invertible matrix.
Now suppose there is another vector a* associated with an estimator Ŷ* such that γ = Γa*, where a* is not necessarily identical to a. Then
$$\mathrm{Var}(\hat{Y} - \hat{Y}^*) = (a - a^*)' \Gamma (a - a^*) = (a - a^*)' (\gamma - \gamma) = 0.$$
Thus Ŷ is the unique estimator of Y when Γ is nonsingular. Note that our calculation does not require {µ, µ_1, …, µ_n} = 0. Moreover, the predictor Ŷ is determined by
$$\mathrm{Cov}(Y - \hat{Y}, X_i) = 0 \qquad \forall\, i = 1, \ldots, n. \qquad (3)$$
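The normal equations a = Γ⁻¹γ and the orthogonality property (3) can be verified on simulated data (a sketch; the linear model generating Y below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 100_000

# Hypothetical example: Y = 2 X1 - X2 + noise; predict Y from (X1, X2)
X = rng.normal(size=(n_obs, 2))
Y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n_obs)

# Normal equations Gamma a = gamma, built from sample covariances
Gamma = np.cov(X, rowvar=False)              # Cov(X_i, X_j)
gamma = np.array([np.cov(X[:, i], Y)[0, 1] for i in range(2)])
a = np.linalg.solve(Gamma, gamma)            # a = Gamma^{-1} gamma

Y_hat = X @ a
# Projection property (3): the residual is uncorrelated with each X_i
resid_cov = np.cov(np.column_stack([Y - Y_hat, X]), rowvar=False)[0, 1:]
```

The recovered coefficients sit close to (2, −1), and the sample covariance between the residual and each predictor is zero up to floating-point error, exactly as Equation (3) requires.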
Consider the space L²(Ω, F, P) of all random variables W defined on the probability space (Ω, F, P) such that E[W²] < ∞. Let the inner product of two random variables in this space be defined as ⟨W_1, W_2⟩ = E[W_1 W_2], and consider the Hilbert space generated by the random variables,
$$H = \{a_0 Y + a_1 X_n + \cdots + a_n X_1,\; a_i \in \mathbb{R}\ \forall\, i \in \{0, 1, \ldots, n\}\},$$
and its observable subspace
$$M = \{a_1 X_n + \cdots + a_n X_1,\; a_i \in \mathbb{R}\ \forall\, i \in \{1, \ldots, n\}\}.$$
For two zero-mean random variables W_1 and W_2 in H, W_1 ⊥ W_2 is equivalent to Cov(W_1, W_2) = 0, which in turn implies Var(W_1 + W_2) = Var(W_1) + Var(W_2). Equation (3) states precisely that Y − Ŷ ⊥ M; for this reason, it is often referred to as the projection theorem. We illustrate this relation in Figure 6; note that Ŷ ∈ M ⊂ H and Y − Ŷ ∈ M⊥ ⊂ H.
Figure 6: Illustration of the projection theorem for stochastic variables. Intuitively, this is the same as vector projection although the orthogonality between Y − ˆY and ˆY is expressed by Cov(Y − ˆY , ˆY ) = 0.
In the upcoming chapters, we will use the following notation without further specification:
• M = sp{W_1, …, W_n}: the Hilbert space spanned by {W_1, …, W_n};
• Ŷ = P_{sp{W_1,…,W_n}} Y: the projection of Y on M;
• E[|Y − Ŷ|²] = Var(Y − Ŷ) = ||Y − Ŷ||²: the variance of the error between Y and Ŷ.
2.1.6 Wold Decomposition
Let {X_t} be a zero-mean stationary process. Denote M_n = sp{X_t, t ≤ n}, σ² = E[|X_{n+1} − P_{M_n} X_{n+1}|²] and $M_{-\infty} = \bigcap_{n=-\infty}^{\infty} M_n$. The Hilbert space M_{−∞} is commonly known as the infinite past.
Definition 15. The process {X_t} is called deterministic if σ² = 0, or equivalently if X_t ∈ M_{−∞} ∀ t ∈ Z.

It should be pointed out that the space M_n is the observable Hilbert space; being deterministic means that the mean square error between the one-step-ahead value X_{n+1} and the projection of X_{n+1} onto the entire past is zero. The opposite extreme, a process with no deterministic component at all, is also of interest.

Definition 16. The process {X_t} is called purely non-deterministic if M_{−∞} = {0}.
Theorem 4. (The Wold Decomposition Theorem) Any zero-mean covariance stationary process {X_t} with σ² > 0 can be expressed as
$$X_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j} + V_t,$$
where
• ψ_0 = 1 and $\sum_{j=0}^{\infty} \psi_j^2 < \infty$,
• {Z_t} ∼ WN(0, σ²) and {V_t} is deterministic,
• Z_t ∈ M_t, V_t ∈ M_{−∞} ∀ t ∈ Z,
• E[Z_t V_s] = 0 ∀ t, s ∈ Z.
Moreover, the sets {ψ_j}, {Z_t} and {V_t} are uniquely determined by {X_t} and the conditions above.
2.2 Multivariate Time Series Analysis
2.2.1 Introduction to Vectorial Time Series
In practice, it is often beneficial to analyse multiple time series simultaneously, partly because some models require numerous variables to be analysed at the same time, and partly because the covariance and correlation between variables are better captured in a multivariate model. The models treated later in this study involve modelling and analysing multiple financial time series simultaneously. Therefore, we study some multivariate time series concepts in this section.
Let {X_t} = {…, X_{t−1}, X_t, X_{t+1}, …} be an m-dimensional stochastic process consisting of random vectors. Denote X_t = (X_{1,t}, X_{2,t}, …, X_{m,t})′ the random vector on R^m, i.e. {X_t} consists of m component time series {X_{1,t}}, …, {X_{m,t}} evolving simultaneously in time. Moreover, {X_t} is covariance stationary if every component time series is covariance stationary. We define the multivariate mean as
$$\mu_t \equiv E[X_t] = \begin{pmatrix} E[X_{1,t}] \\ \vdots \\ E[X_{m,t}] \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_m \end{pmatrix},$$
the covariance matrix as
$$\Gamma_0 = \mathrm{Var}(X_t) = \mathrm{Cov}(X_t) = \begin{pmatrix} \mathrm{Cov}(X_{1,t}, X_{1,t}) & \cdots & \mathrm{Cov}(X_{1,t}, X_{m,t}) \\ \vdots & & \vdots \\ \mathrm{Cov}(X_{m,t}, X_{1,t}) & \cdots & \mathrm{Cov}(X_{m,t}, X_{m,t}) \end{pmatrix},$$
and the correlation matrix as
$$R_0 = \mathrm{Corr}(X_t) = D^{-1/2}\, \Gamma_0\, D^{-1/2}, \qquad D = \mathrm{diag}(\Gamma_0),$$
where both Γ_0 and R_0 are symmetric matrices. In terms of lags, the lag-h cross-covariance matrix is
$$\Gamma_h = \mathrm{Cov}(X_t, X_{t-h}) = \begin{pmatrix} \mathrm{Cov}(X_{1,t}, X_{1,t-h}) & \cdots & \mathrm{Cov}(X_{1,t}, X_{m,t-h}) \\ \vdots & & \vdots \\ \mathrm{Cov}(X_{m,t}, X_{1,t-h}) & \cdots & \mathrm{Cov}(X_{m,t}, X_{m,t-h}) \end{pmatrix}.$$
Similarly, the lag-h cross-correlation matrix is defined as
$$R_h = \mathrm{Corr}(X_t, X_{t-h}) = D^{-1/2}\, \Gamma_h\, D^{-1/2}, \qquad D = \mathrm{diag}(\Gamma_0).$$
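The lag-h cross-covariance matrix can be estimated from data as a sample average. A sketch on a simulated bivariate series in which the first component leads the second by one step (the construction is hypothetical):

```python
import numpy as np

def cross_covariance(x, h):
    """Sample lag-h cross-covariance matrix Gamma_h = Cov(X_t, X_{t-h}).

    x -- array of shape (T, m), one row per time point
    """
    xc = x - x.mean(axis=0)
    T = x.shape[0]
    return xc[h:].T @ xc[:T - h] / (T - h)

rng = np.random.default_rng(4)
T = 50_000
z = rng.normal(size=(T, 2))
# Bivariate series: component 0 is white noise, component 1 echoes
# component 0 with a one-step delay plus its own noise.
x = np.column_stack([z[:, 0], np.roll(z[:, 0], 1) + z[:, 1]])

gamma1 = cross_covariance(x, 1)          # estimate of Cov(X_t, X_{t-1})
```

Element (2, 1) of Γ₁ picks up the lead-lag dependence (close to 1 here), while the own-lag entry of the white-noise component stays near 0.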
2.2.2 Multivariate Time Series Models
Definition 17. The process {X_t} is said to be a vector autoregressive moving-average process of order (p, q), VARMA(p, q), if it is stationary and if
$$X_t - \Phi_1 X_{t-1} - \cdots - \Phi_p X_{t-p} = Z_t + \Theta_1 Z_{t-1} + \cdots + \Theta_q Z_{t-q},$$
where Φ_p ≠ 0, Θ_q ≠ 0 and {Z_t} ∼ WN(0, Σ). We say that {X_t} is a VARMA(p, q) process with mean µ if {X_t − µ} is a VARMA(p, q) process.

The VARMA(p, q) process has an alternative form in terms of the backshift operator:
$$\Phi(L) X_t = \Theta(L) Z_t, \qquad t \in \mathbb{Z},$$
where
$$\Phi(z) = I_m - \Phi_1 z - \cdots - \Phi_p z^p, \qquad \Theta(z) = I_m + \Theta_1 z + \cdots + \Theta_q z^q,$$
and Φ(L), Θ(L) are m × m matrices whose elements are lag polynomials of order less than or equal to p and q, respectively.
If p = 0, there is no autoregressive part and we have a simple vector moving-average process, VMA(q). If q = 0, there is no moving-average part and we have a simple vector autoregressive process, VAR(p).
Let {X_t} be an m-variate random process. The basic p-lag vector autoregressive model, denoted VAR(p), has the form
$$X_t - \Phi_1 X_{t-1} - \Phi_2 X_{t-2} - \cdots - \Phi_p X_{t-p} = Z_t, \qquad (4)$$
where the coefficient matrices {Φ_i}_{i=1}^p are of size m × m and the white noise Z_t ∼ WN(0, Σ) is of size m × 1.

In terms of the lag operator L, the VAR(p) takes the form Φ(L)X_t = Z_t, where Φ(L) = I_m − Φ_1 L − ⋯ − Φ_p L^p. The VAR(p) process is stable if det(I_m − Φ_1 z − ⋯ − Φ_p z^p) ≠ 0 ∀ z ∈ C, |z| ≤ 1. Assuming the process has been initialised in the infinite past M_{−∞}, the stable VAR(p) process is stationary and ergodic with time-invariant means, variances and autocovariances.
In fact, we can write any VAR(p) process to VAR(1) in the companion form
if we define Yt = Xt Xt−1 ... ... Xt−p+1 , Λ = A1 A2 · · · Ap Im 0 · · · 0 0 Im 0 · · · 0 ... ... ... ... ... 0 · · · 0 Im 0 , Et = Zt 0 ... ... 0 .
The matrix Λ is commonly known as the companion matrix of the VAR(p) process. In this form, the VAR(p) process is stable if and only if all the eigenvalues of the companion block matrix have modulus less than one. Note that if Xt is stable,
then it is also covariance stationary.
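To make the companion construction concrete, here is a minimal Python sketch (the helper names `companion_matrix` and `is_stable` are ours, not from the thesis) that builds $\Lambda$ from the coefficient matrices and checks stability via the eigenvalues of the companion matrix:

```python
import numpy as np

def companion_matrix(Phi):
    """Build the companion matrix of a VAR(p) process from its
    list of m x m coefficient matrices Phi = [Phi_1, ..., Phi_p]."""
    p = len(Phi)
    m = Phi[0].shape[0]
    Lam = np.zeros((m * p, m * p))
    Lam[:m, :] = np.hstack(Phi)          # top block row: Phi_1 ... Phi_p
    Lam[m:, :-m] = np.eye(m * (p - 1))   # identity blocks on the sub-diagonal
    return Lam

def is_stable(Phi):
    """VAR(p) is stable iff all companion eigenvalues lie inside the unit circle."""
    eig = np.linalg.eigvals(companion_matrix(Phi))
    return bool(np.all(np.abs(eig) < 1))

# A bivariate VAR(2) with small coefficients is stable:
Phi = [0.5 * np.eye(2), 0.1 * np.eye(2)]
print(is_stable(Phi))  # True
```

For this diagonal example the companion eigenvalues solve $\lambda^2 = 0.5\lambda + 0.1$ per component, giving roots of modulus below one.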
The VAR$(p)$ model is advantageous in the sense that it is flexible and easy to fit in common multivariate time series analysis, and it has proven useful for representing and forecasting economic and financial time series. However, it should be noted that this basic VAR$(p)$ model may not always be sufficient to capture the essential characteristics of the data. One extended version is
$$X_t = c + A_1 X_{t-1} + A_2 X_{t-2} + \cdots + A_p X_{t-p} + \Psi Q_t + \Phi Z_t + \epsilon_t,$$
where $Q_t$ represents the deterministic components, $Z_t$ represents the stochastic exogenous variables and $\Psi, \Phi$ are the associated coefficient matrices.
Definition 18. A process $\{X_t\}$ with $\Phi(L)X_t = \Theta(L)Z_t$ is called causal with respect to $\{Z_t\}$ if and only if there exists a sequence of absolutely summable matrices $\{\Psi_j\}_{j \in \mathbb{N}}$, $\sum_{j=0}^{\infty} |\Psi_j| < \infty$, such that
$$X_t = \sum_{j=0}^{\infty} \Psi_j Z_{t-j}.$$
Theorem 5. Let $\{X_t\}$ be a VARMA$(p,q)$ process with $\Phi(L)X_t = \Theta(L)Z_t$ and assume that
$$\det(\Phi(z)) \neq 0 \quad \forall z \in \mathbb{C} \text{ with } |z| \leq 1.$$
Then the stochastic difference equation $\Phi(L)X_t = \Theta(L)Z_t$ has exactly one stationary solution, with causal representation
$$X_t = \sum_{j=0}^{\infty} \Psi_j Z_{t-j},$$
where the sequence of matrices $\{\Psi_j\}$ is absolutely summable and the matrices are uniquely determined by the identity $\Phi(z)\Psi(z) = \Theta(z)$.
Here we note a special case which is crucial in our study: if $q = 0$, i.e. $\Theta(L) = I_m$, then we have a VAR$(p)$ process, namely
$$\Phi(L)X_t = I_m Z_t = Z_t \iff X_t = \Phi(L)^{-1} Z_t, \quad (5)$$
and moreover
$$\Phi(L)\Psi(L) = I_m \iff \Psi(L) = \Phi(L)^{-1}.$$
If we restrict ourselves further to $p = 1$ and consider the VAR(1) process
$$Y_t = \Lambda Y_{t-1} + E_t,$$
we can then write recursively
$$Y_t = \Lambda Y_{t-1} + E_t = \Lambda(\Lambda Y_{t-2} + E_{t-1}) + E_t = \cdots = E_t + \Lambda E_{t-1} + \Lambda^2 E_{t-2} + \Lambda^3 E_{t-3} + \cdots = \sum_{j=0}^{\infty} \Lambda^j E_{t-j} = \Big(\sum_{j=0}^{\infty} \Lambda^j L^j\Big) E_t,$$
where $\Lambda^0 = I$. Combining this with Equation (5), we have the following ways of expressing $Y_t$:
$$Y_t = \Lambda(L)^{-1} E_t = \sum_{j=0}^{\infty} \Lambda^j E_{t-j} = \Big(\sum_{j=0}^{\infty} \Lambda^j L^j\Big) E_t. \quad (6)$$
In particular, we note that
$$\Lambda(L)^{-1} = \sum_{j=0}^{\infty} \Lambda^j L^j. \quad (7)$$
2.2.3 Multivariate Wold Decomposition
Any covariance stationary time series $\{X_t\}$ can be decomposed as
$$X_t = V_t + \sum_{k=0}^{\infty} \Psi_k Z_{t-k},$$
where
• $\{V_t\}$ is an $m$-dimensional linearly deterministic process;
• $\Psi_0 = I$ and $\sum_{k=0}^{\infty} \Psi_k \Psi_k' < \infty$;
• $\{Z_t\}$ is a multivariate white noise process with
  – $E[Z_t] = 0$,
  – $\mathrm{Var}(Z_t) = E[Z_t Z_t'] = \Sigma$,
  – $\mathrm{cov}(Z_t, Z_{t-k}) = E[Z_t Z_{t-k}'] = 0 \;\forall\, k \in \mathbb{Z}^+$,
  – $\mathrm{cov}(Z_t, V_{t-k}) = 0 \;\forall\, k \in \mathbb{Z}$.
2.3 Extended Wold Decomposition
In this chapter, we put forward the theoretical derivation of the extended Wold decomposition first introduced in 2017 by Cerreia-Vioglio et al. [10]. This representation is obtained in the time domain, but it offers a tool to analyse frequency-specific properties simultaneously. In their derivation, they used Hilbert A-modules [8], which are a generalisation of Hilbert spaces. Ortu et al. observed and justified that the extended Wold decomposition is the Discrete Haar Transform (DHT) of the original innovations of a classical Wold decomposition [29]. In their study, they introduced a decomposition method by transforming the moving average representation of a zero-mean covariance stationary process. Their extended Wold decomposition is based on a scaling operator that averages process innovations. In particular, each component at scale $j$ is an infinite moving average driven by a sequence of orthogonal innovations located on a time grid whose unit spacing is $2^j$ times the original unit grid. The DHT is a wavelet transform based on the Haar function; here we simply state that the DHT is suitable for analysing the localised features of a signal due to the orthogonality of the Haar function. For more details, see for instance Chapters 5 and 6 in [36].
In our derivation, we start by presenting the extended Wold representation for a univariate stochastic process, in the same manner as Ortu et al. [29]. The univariate case gives a good mathematical understanding of the extended decomposition itself. After that, we present the multivariate case, which is needed in the application of the spectral factor model to portfolio analysis.
2.3.1 Formal Derivation
Let $\{X_t\}$ be a real-valued zero-mean covariance-stationary process defined on the space $L^2(\Omega, \mathcal{F}, P)$. The Wold decomposition of $X_t$ is then
$$X_t = \sum_{k=0}^{\infty} \psi_k Z_{t-k},$$
where $\{Z_t\}$ is a unique white noise process and $\{\psi_k\} \subset \mathbb{R}$ is a square-summable sequence with $\psi_k = E[X_t Z_{t-k}]$. The coefficients $\{\psi_k\}$ are named $\{\alpha_k\}$ in [10]; however, we adapt the notation to the context of this paper so it is easier for the reader to follow.
From the innovations $\{Z_t\}$ we define the scale-$j$ detail process via the Discrete Haar Transformation (DHT), $\{Z_t^{(j)}\}$, as
$$Z_t^{(j)} = \frac{1}{\sqrt{2^j}}\left(\sum_{i=0}^{2^{j-1}-1} Z_{t-i} - \sum_{i=0}^{2^{j-1}-1} Z_{t-2^{j-1}-i}\right).$$
We then consider a sub-series of $\{Z_t^{(j)}\}$ defined on the support $S_t^{(j)} = \{t - k2^j,\ k \in \mathbb{Z}\}$. The process $\{Z_t^{(j)}\}$ is a moving average of order $2^j - 1$ with respect to the innovations $\{Z_t\}$. In common economic and financial time series, there is some correlation between the scale variables $Z_{t-k2^j}^{(j)}$ and $Z_{\tau-k2^j}^{(j)}$ with $|t - \tau| \leq 2^j - 1$. However, on the support $S_t^{(j)}$, each sub-series $\{Z_{t-k2^j}^{(j)},\ k \in \mathbb{Z}\}$ is a white noise process.
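The scale-$j$ DHT of the innovations can be sketched in Python as follows; the helper name `haar_innovations` is ours, and the loop is deliberately naive for clarity:

```python
import numpy as np

def haar_innovations(z, j):
    """Scale-j Discrete Haar Transform of an innovation series z:
    Z_t^{(j)} = 2^{-j/2} (sum_{i<2^{j-1}} z_{t-i} - sum_{i<2^{j-1}} z_{t-2^{j-1}-i}).
    Returns an array aligned with z; entries before index 2^j - 1 are NaN."""
    h = 2 ** (j - 1)
    out = np.full(len(z), np.nan)
    for t in range(2 * h - 1, len(z)):
        first = z[t - h + 1 : t + 1].sum()            # sum of z_{t-i}, i = 0..h-1
        second = z[t - 2 * h + 1 : t - h + 1].sum()   # sum of z_{t-h-i}, i = 0..h-1
        out[t] = (first - second) / np.sqrt(2 ** j)
    return out

rng = np.random.default_rng(0)
z = rng.standard_normal(1024)
z1 = haar_innovations(z, 1)   # scale 1: (z_t - z_{t-1}) / sqrt(2)
```

At scale $j = 1$ the formula reduces to the familiar Haar detail $(z_t - z_{t-1})/\sqrt{2}$, which the last line computes.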
To be more explicit, we decompose the space $H_t(Z)$ of infinite moving averages whose underlying white noise process is $\{Z_t\}$ into orthogonal subspaces $W_t^{(j)}$, where
$$H_t(Z) = \left\{\sum_{k=0}^{\infty} a_k Z_{t-k} : \sum_{k=0}^{\infty} a_k^2 < \infty\right\}, \qquad W_t^{(j)} = \left\{\sum_{k=0}^{\infty} b_k^{(j)} Z_{t-k2^j}^{(j)} : b_k^{(j)} \in \mathbb{R}\right\}, \quad j \in \mathbb{Z}^+.$$
It is evident that $X_t$ is contained in the space $H_t(Z)$, since it is a linear summation of the innovations $\{Z_t\}$, which are contained in $H_t(Z)$. Let $X_t^{(j)}$ denote the projection of $X_t$ on the subspace $W_t^{(j)}$, $j \in \mathbb{Z}^+$; we can then re-express $X_t$ as
$$X_t = \sum_{j=1}^{\infty} X_t^{(j)}.$$
For fixed $t$, the components $X_t^{(j)}$ are orthogonal to each other by construction. In addition, since $X_t^{(j)} \in W_t^{(j)}$, it can be expressed as
$$X_t^{(j)} = \sum_{k=0}^{\infty} \varphi_k^{(j)} Z_{t-k2^j}^{(j)},$$
for some real coefficient sequence $\{\varphi_k^{(j)}\}$ such that $\sum_{k=0}^{\infty} \big(\varphi_k^{(j)}\big)^2 < \infty$. The coefficients $\{\varphi_k^{(j)}\}$ can be obtained by projecting $X_t$ on the subspace spanned by the corresponding scale-$j$ innovations.
Thus, we get the extended Wold representation of $X_t$ as
$$X_t = \sum_{j=1}^{\infty} X_t^{(j)} = \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \varphi_k^{(j)} Z_{t-k2^j}^{(j)}. \quad (9)$$
After some algebraic manipulation of Equation (8) we arrive at
$$\varphi_k^{(j)} = \frac{1}{\sqrt{2^j}}\left(\sum_{i=0}^{2^{j-1}-1} \psi_{k2^j+i} - \sum_{i=0}^{2^{j-1}-1} \psi_{k2^j+2^{j-1}+i}\right). \quad (10)$$
We note that the coefficients in the extended Wold representation are also the DHT of the coefficients of the classical Wold representation.
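This observation can be checked numerically. Below is a minimal sketch (the helper name is ours) computing Equation (10) for AR(1)-style Wold coefficients $\psi_k = a^k$:

```python
import numpy as np

def extended_wold_coeffs(psi, j, K):
    """DHT of classical Wold coefficients (Equation (10)):
    phi_k^{(j)} = 2^{-j/2} (sum_i psi_{k 2^j + i} - sum_i psi_{k 2^j + 2^{j-1} + i}),
    with i running over 0..2^{j-1}-1. psi is assumed long enough."""
    h = 2 ** (j - 1)
    phi = np.empty(K)
    for k in range(K):
        base = k * 2 ** j
        phi[k] = (psi[base : base + h].sum()
                  - psi[base + h : base + 2 * h].sum()) / np.sqrt(2 ** j)
    return phi

# AR(1)-style Wold coefficients psi_k = a^k:
a = 0.8
psi = a ** np.arange(64)
phi1 = extended_wold_coeffs(psi, j=1, K=8)
# At scale 1: phi_k^{(1)} = (psi_{2k} - psi_{2k+1}) / sqrt(2)
```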
Consider now the multidimensional case. Specifically, let us consider an $m$-dimensional covariance stationary process $\{X_t\} = \{\ldots, X_{t-1}, X_t, X_{t+1}, \ldots\}$ with zero mean. Denote by $X_t = (X_{1,t}, X_{2,t}, \ldots, X_{m,t})'$ the random vector on $\mathbb{R}^m$. The classical vectorial Wold representation is
$$X_t = \begin{pmatrix} X_{1,t} \\ \vdots \\ X_{m,t} \end{pmatrix} = \sum_{k=0}^{\infty} \Psi_k Z_{t-k}.$$
From the univariate case, we get the analogous multivariate extended Wold representation
$$X_t = \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \varphi_k^{(j)} Z_{t-k2^j}^{(j)}, \quad (11)$$
where the extended Wold coefficients are
$$\varphi_k^{(j)} = \frac{1}{\sqrt{2^j}}\left(\sum_{i=0}^{2^{j-1}-1} \Psi_{k2^j+i} - \sum_{i=0}^{2^{j-1}-1} \Psi_{k2^j+2^{j-1}+i}\right), \quad (12)$$
and the innovation process at scale $j$ is
$$Z_t^{(j)} = \frac{1}{\sqrt{2^j}}\left(\sum_{i=0}^{2^{j-1}-1} Z_{t-i} - \sum_{i=0}^{2^{j-1}-1} Z_{t-2^{j-1}-i}\right).$$
2.4 Spectral Decomposition Methods
In this section, we present two methods to decompose a vectorial time series into frequency-specific orthogonal components. The decomposition is conducted in the time domain; the components, however, lie in orthogonal frequency spaces.
2.4.1 Parametric VAR(p) Method
Given a time series vector, one can find the order of autoregression using an order selection program, e.g. the Statsmodels package for Python users. With $p$ obtained from order selection, we can then consider an $n$-variate VAR$(p)$ process and write it as
$$x_t = \sum_{i=1}^{p} A_i x_{t-i} + z_t,$$
where $x_t = (x_{1,t}, x_{2,t}, \ldots, x_{n,t})'$. The error process $z_t$ is a white noise process with zero mean and diagonal covariance matrix $E(z_t z_t') = \Sigma_z$. As we have seen in Chapter (2.2.2), this process can be rewritten as a VAR(1) process in companion form:
$$X_t = \Lambda X_{t-1} + Z_t. \quad (14)$$
Estimates of the parameter matrix $\hat{\Lambda}$ and the error process $\hat{Z}_t$ can be obtained by the Ordinary Least Squares (OLS) method. Once this is done, we check the convergence condition in Theorem (5), i.e. we check that $\det(I_{np} - \hat{\Lambda}z) \neq 0 \;\forall z \in \mathbb{C}$ with $|z| \leq 1$. This is readily done by asserting that the eigenvalues of the matrix $\hat{\Lambda}$ all have modulus strictly less than one. Given that the assertion holds, we can express $X_t$ in the Wold representation
$$X_t = \sum_{i=0}^{\infty} \Psi_i Z_{t-i} = \sum_{i=0}^{\infty} \hat{\Lambda}^i \hat{Z}_{t-i}, \quad (15)$$
with the zeroth coefficient $\hat{\Lambda}^0 = I_{np}$. Furthermore, recalling the extended Wold representation from Chapter (2.3), we can express $X_t$ in terms of the estimates as
$$X_t = \sum_{j=1}^{\infty} X_t^{(j)} = \sum_{j=1}^{\infty} \sum_{k=0}^{\infty} \hat{\varphi}_k^{(j)} \hat{Z}_{t-k2^j}^{(j)}, \quad (16)$$
where
$$\hat{Z}_t^{(j)} = \frac{1}{\sqrt{2^j}}\left(\sum_{i=0}^{2^{j-1}-1} \hat{Z}_{t-i} - \sum_{i=0}^{2^{j-1}-1} \hat{Z}_{t-2^{j-1}-i}\right). \quad (18)$$
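The OLS estimation and stability check above can be sketched with plain NumPy (rather than the Statsmodels helpers mentioned earlier); the simulated VAR(1) and all variable names are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[0.5, 0.1], [0.0, 0.4]])   # true VAR(1) coefficient matrix
x = np.zeros((2000, 2))
for t in range(1, 2000):
    x[t] = A @ x[t - 1] + 0.1 * rng.standard_normal(2)

# OLS estimate of Lambda in X_t = Lambda X_{t-1} + Z_t:
Y, X = x[1:], x[:-1]
Lam_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T
Z_hat = Y - X @ Lam_hat.T                # estimated innovations for the DHT step

# Stability: all eigenvalues of Lambda-hat must have modulus < 1.
stable = bool(np.all(np.abs(np.linalg.eigvals(Lam_hat)) < 1))
print(stable)  # expect True for this stable simulation
```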
2.4.2 Nonparametric VMA(q) Method
In this section, we introduce a nonparametric method to decompose a multivariate covariance-stationary process into frequency-specific components representing its evolution in time and frequency. In contrast to the parametric approach, which is built on the VAR(p) model, the nonparametric method is built on the moving average process. In particular, we use the Discrete Haar Transform to transform the time series into its orthogonal frequency components.
Consider a multivariate time series $\{X_t\}$; we first construct its moving averages
$$\pi_t^{(j)} = \frac{1}{2^j} \sum_{i=0}^{2^j - 1} X_{t-i}.$$
In particular, we note $\pi_t^{(j)} = X_t$ for $j = 0$. We then define the $j$-th frequency component of $X_t$ to be
$$X_t^{(j)} = \pi_t^{(j-1)} - \pi_t^{(j)}.$$
For a fixed $J \in \mathbb{Z}$, $J \geq 1$, we have
$$X_t = \sum_{j=1}^{J} X_t^{(j)} + X_t^{\mathrm{res}},$$
where the residual term $X_t^{\mathrm{res}} = \pi_t^{(J)}$.
In order to gain some intuition, we observe firstly that the moving average $\pi_t^{(j)}$ is a $2^j$-period moving average of $X_t$ and satisfies the recursion
$$\pi_t^{(j)} = \frac{\pi_t^{(j-1)} + \pi_{t-2^{j-1}}^{(j-1)}}{2}.$$
Now, since $X_t^{(j)}$ is the difference between the moving averages of length $2^{j-1}$ and $2^j$, it captures the fluctuations that are included in the $2^{j-1}$ moving average but not in the $2^j$ moving average. To be more explicit, $X_t^{(j)}$ captures the fluctuations whose half-life belongs to the time interval $[2^{j-1}, 2^j)$. Hence, the residual term $\pi_t^{(J)}$ captures the remaining fluctuations with half-life of at least $2^J$ periods.
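A minimal NumPy sketch of this nonparametric decomposition (the helper name `spectral_components` is ours): it drops the first $2^J - 1$ observations so that every moving average is well defined, and the telescoping sum of components plus residual recovers the original series:

```python
import numpy as np

def spectral_components(x, J):
    """pi^{(j)}_t is the 2^j-period moving average of x,
    X^{(j)}_t = pi^{(j-1)}_t - pi^{(j)}_t, residual = pi^{(J)}_t.
    Returns (components, residual), aligned from index 2^J - 1 on."""
    T = len(x)
    start = 2 ** J - 1                 # burn-in so all averages exist
    pi = [x[start:]]                   # pi^{(0)}_t = x_t
    for j in range(1, J + 1):
        w = 2 ** j
        pi.append(np.array([x[t - w + 1 : t + 1].mean() for t in range(start, T)]))
    comps = [pi[j - 1] - pi[j] for j in range(1, J + 1)]
    return comps, pi[J]

rng = np.random.default_rng(2)
x = rng.standard_normal(256)
comps, resid = spectral_components(x, J=3)
recon = sum(comps) + resid             # telescoping: equals x (after burn-in)
```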
2.4.3 Scale Interpretation and Decomposition Example
In this section, we illustrate the spectral decomposition with an example. In particular, we implement the nonparametric method on the monthly returns of the S&P 500 index. To interpret the decomposition, we present the scale interpretation in Table (1). We remark that the base can be replaced by an arbitrary integer greater than one; for instance, one can change the base to 5 for daily data, and the cycle intervals become [1, 5), [5, 25), . . . days.
Scale    Cycle (period)
j = 1    [1, 2)
j = 2    [2, 4)
j = 3    [4, 8)
j = 4    [8, 16)
j = 5    [16, 32)
j = 6    [32, 64)
...      ...
j = J    [2^J, ∞)
Table 1: Interpretation of the scale j. Recall that the frequency is the inverse of the period length; hence, larger j corresponds to lower frequency and vice versa. The maximum frequency resolution is not limited: if the historical data are short, a smaller J is preferred, while if one is interested in studying the characteristics of low-frequency components, one can fix a larger J. Furthermore, the period has the same unit as the data; for instance, if the data are daily, then the cycle is counted in days.
A scatter plot of the original data is presented in Figure (7). In order to preserve as much information as possible for the study of the components, the mean was not removed before the decomposition. It can be seen from the plot that there is some cyclic behaviour when the large spikes are disregarded.
Figure 7: Monthly return of the S&P 500 index before decomposition; the data cover May 1960 to December 2019.
3 Spectral Factor Model
In this section, we introduce and derive the spectral factor model presented by Bandi et al. (2019) [1]. We shall show that the spectral factor model provides a handy tool that allows systematic risk to vary across frequencies while remaining in the time domain.
First, let us recall the one-factor linear model for two covariance-stationary time series $Y_t, X_t$, which obeys
$$Y_t = \alpha + \beta X_t + Z_t, \quad (19)$$
where $Z_t \sim WN(0, \sigma^2)$ and $X_t$ is called the factor. The traditional beta is then given by
$$\beta = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)}.$$
3.1 Two-Component Decomposition
In order to gain some intuition behind the multi-component decomposition, we first decompose the factor model by splitting each time series into two orthogonal components: one component with business cycles longer than $p$ periods and one with cycles shorter than $p$ periods, where $p = 2^j$, $j \in \mathbb{Z}$, $j \geq 1$. To be more specific, we let $Y_t = Y_t^+ + Y_t^-$, where $Y_t^+$ is the component of $Y_t$ with economic cycles longer than $p$ periods and $Y_t^-$ is the component with business cycles shorter than $p$ periods; analogously, we decompose $X_t$ as $X_t = X_t^+ + X_t^-$.
Due to the orthogonality, we have
$$\mathrm{Cov}(X_t^+, X_t^-) = \mathrm{Cov}(Y_t^+, Y_t^-) = \mathrm{Cov}(X_t^+, Y_t^-) = \mathrm{Cov}(X_t^-, Y_t^+) = 0.$$
Hence, we can write Equation (19) as
$$Y_t = Y_t^+ + Y_t^- \qquad\qquad (20)$$
$$\;\; = \alpha + \beta^+ X_t^+ + \beta^- X_t^- + Z_t. \qquad (21)$$
Due to the orthogonality of the components of $\{X_t, Y_t\}$ and the linearity of the covariance, the partial effects $\beta^\pm$, called spectral betas, can be obtained either by projecting $Y_t$ on the orthogonal components of $X_t$, or by projecting the orthogonal components of $Y_t$ on the corresponding components of $X_t$. We proceed to show that the traditional $\beta$ is a superposition of the spectral betas:
$$\beta = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t^+ + X_t^-,\; Y_t^+ + Y_t^-)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t^+, Y_t^+) + \mathrm{Cov}(X_t^-, Y_t^-)}{\mathrm{Var}(X_t)} = \underbrace{\frac{\mathrm{Var}(X_t^+)}{\mathrm{Var}(X_t)}}_{\equiv w^+} \underbrace{\frac{\mathrm{Cov}(X_t^+, Y_t^+)}{\mathrm{Var}(X_t^+)}}_{\equiv \beta^+} + \underbrace{\frac{\mathrm{Var}(X_t^-)}{\mathrm{Var}(X_t)}}_{\equiv w^-} \underbrace{\frac{\mathrm{Cov}(X_t^-, Y_t^-)}{\mathrm{Var}(X_t^-)}}_{\equiv \beta^-} = w^+\beta^+ + w^-\beta^-,$$
where $w^+, w^- \in [0, 1]$ may be considered as the weights of the spectral betas; note that $w^+ + w^- = 1$.
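A quick numerical check of the identity $\beta = w^+\beta^+ + w^-\beta^-$: the two components below are made exactly orthogonal in sample so the identity holds exactly (all variable names and numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000
xp = rng.standard_normal(T)
xm = rng.standard_normal(T)
# Centre and enforce exact sample orthogonality of the two components:
xp -= xp.mean(); xm -= xm.mean()
xm -= (xm @ xp / (xp @ xp)) * xp

beta_p, beta_m = 2.0, 0.5
x = xp + xm
y = beta_p * xp + beta_m * xm            # Y built from the two components

cov = lambda a, b: a @ b / T             # components are mean-zero
beta = cov(x, y) / cov(x, x)             # traditional beta
w_p = cov(xp, xp) / cov(x, x)            # spectral weights
w_m = cov(xm, xm) / cov(x, x)
print(np.isclose(beta, w_p * beta_p + w_m * beta_m))  # True
```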
3.2 Multi-Component Decomposition
Analogous to the two-component decomposition, we can decompose the single factor model into multiple components. Consider a zero-mean covariance stationary process $X_t$ with extended Wold representation $X_t = \sum_{j=1}^{\infty} X_t^{(j)}$; Equation (19) can then be written as
$$Y_t = \alpha + \beta X_t + Z_t = \alpha + \sum_{j=1}^{\infty} \beta_j X_t^{(j)} + Z_t,$$
where the index $j$ is associated with a specific frequency range and the components $X_t^{(j)}$ are orthogonal to one another.
Moreover, if $Y_t$ is also a zero-mean covariance-stationary process, we can project $Y_t$ orthogonally onto the same subspaces as the orthogonal components of $X_t$, i.e.
$$Y_t = \sum_{j=1}^{\infty} Y_t^{(j)},$$
such that
$$\mathrm{Cov}(X_t^{(i)}, X_t^{(j)}) = \mathrm{Cov}(Y_t^{(i)}, Y_t^{(j)}) = \mathrm{Cov}(X_t^{(i)}, Y_t^{(j)}) = 0 \quad \text{if } i \neq j,$$
for all $i, j \in \mathbb{Z}^+$. We can then decompose $\beta$ as
$$\beta = \frac{\mathrm{Cov}(X_t, Y_t)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t^{(1)} + X_t^{(2)} + \cdots,\; Y_t^{(1)} + Y_t^{(2)} + \cdots)}{\mathrm{Var}(X_t)} = \frac{\mathrm{Cov}(X_t^{(1)}, Y_t^{(1)}) + \mathrm{Cov}(X_t^{(2)}, Y_t^{(2)}) + \cdots}{\mathrm{Var}(X_t)} = \underbrace{\frac{\mathrm{Var}(X_t^{(1)})}{\mathrm{Var}(X_t)}}_{\equiv w_1} \underbrace{\frac{\mathrm{Cov}(X_t^{(1)}, Y_t^{(1)})}{\mathrm{Var}(X_t^{(1)})}}_{\equiv \beta_1} + \underbrace{\frac{\mathrm{Var}(X_t^{(2)})}{\mathrm{Var}(X_t)}}_{\equiv w_2} \underbrace{\frac{\mathrm{Cov}(X_t^{(2)}, Y_t^{(2)})}{\mathrm{Var}(X_t^{(2)})}}_{\equiv \beta_2} + \cdots = w_1\beta_1 + w_2\beta_2 + \cdots = \sum_{j=1}^{\infty} w_j \beta_j,$$
where $w_j \in [0, 1]$, $\forall j \in \mathbb{Z}^+$, are the spectral weights satisfying the condition $\sum_{j=1}^{\infty} w_j = 1$, and we obtain the analogous definition of the spectral betas as in the two-component case, namely
$$\beta_j = \frac{\mathrm{Cov}(X_t^{(j)}, Y_t)}{\mathrm{Var}(X_t^{(j)})} = \frac{\mathrm{Cov}(X_t^{(j)}, Y_t^{(j)})}{\mathrm{Var}(X_t^{(j)})}. \quad (22)$$
To summarise: the beta of any bivariate zero-mean covariance stationary process $\{X_t, Y_t\}$ that has a Wold representation can be decomposed into frequency-specific spectral betas such that
$$\beta = \sum_{j=1}^{\infty} w_j \beta_j, \quad (23)$$
where the spectral weights satisfy
$$w_j = \frac{\mathrm{Var}(X_t^{(j)})}{\mathrm{Var}(X_t)}, \quad \text{such that} \quad \sum_{j=1}^{\infty} w_j = 1. \quad (24)$$
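The multi-component identities (23) and (24) can likewise be verified numerically. Below, exactly orthogonal columns from a QR factorisation stand in for the components $X_t^{(j)}$ (an illustration under that assumption, not the actual Wold components; all names are ours):

```python
import numpy as np

rng = np.random.default_rng(4)
T, J = 500, 4
# QR gives exactly orthogonal columns to play the role of X^{(j)}:
Q, _ = np.linalg.qr(rng.standard_normal((T, J)))
comps = [Q[:, j] * (j + 1.0) for j in range(J)]   # different variance per scale
betas_true = np.array([1.5, -0.5, 2.0, 0.25])

x = sum(comps)
y = sum(b * c for b, c in zip(betas_true, comps))

cov = lambda a, b: a @ b / T
w = np.array([cov(c, c) for c in comps]) / cov(x, x)       # spectral weights
beta_j = np.array([cov(c, y) / cov(c, c) for c in comps])  # spectral betas
beta = cov(x, y) / cov(x, x)                               # traditional beta

print(np.isclose(w.sum(), 1.0), np.isclose(beta, w @ beta_j))  # True True
```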
In practice, we are far from needing an infinite number of components: partly because the computation is costly, and partly because such a fine division of the frequency space does not necessarily yield information of interest. Suppose now that we wish to decompose our factor model into $J$ frequency ranges, $J \in \mathbb{Z}$, $J \geq 1$, and let the $(J+1)$-th range denote the residual in the frequency space; we can then write the factor model as
$$Y_t = \alpha + \sum_{j=1}^{J} \beta_j X_t^{(j)} + \beta_{J+1}\Big(X_t - \sum_{j=1}^{J} X_t^{(j)}\Big) + Z_t \quad (25)$$
$$\;\;\; = \alpha + \sum_{j=1}^{J} \beta_j X_t^{(j)} + \beta_{J+1} X_t^{(J+1)} + Z_t, \quad (26)$$
where we have defined the frequency residual $X_t^{(J+1)}$ as
$$X_t^{(J+1)} \equiv X_t - \sum_{j=1}^{J} X_t^{(j)}. \quad (27)$$
By definition, the residual term $X_t^{(J+1)}$ is orthogonal to $X_t^{(j)}$, $j \in \{1, \ldots, J\}$.
4 Spectral Portfolio Optimisation Framework
The single factor model presented in the previous sections can be extended to a multi-factor model by induction. In this section, we consider a portfolio optimisation problem implementing the spectral factor model, to outline the framework of spectral portfolio optimisation.
4.1 Markowitz Optimisation Theory
Consider a portfolio with $N$ assets and let $R_i$ denote the return of asset $i$ during the time interval $[t-1, t)$, with $R = (R_1, \ldots, R_N)'$, $\mu = E[R]$ and $\Sigma = \mathrm{Cov}(R, R')$. The portfolio optimisation problem is to find the optimal weight $w_i$ of each asset $i = 1, \ldots, N$ such that the portfolio return is maximised. It is common to impose $\sum_i w_i = 1$. Recall that we can go both long and short on the assets if $w_i \in [-1, 1]$. For a portfolio that does not allow short sales, one can constrain the weights such that $w_i \in [0, 1]$.
In this setup, the portfolio return $R_p$ is also a random variable, with mean $E[R_p] = \mu'w$ and variance $\mathrm{Var}(R_p) = w'\Sigma w$. Given a target portfolio return $\mu^* \in \mathbb{R}$, by Markowitz portfolio theory [14] the optimal portfolio weights can be obtained by solving the following quadratic problem:
$$\hat{w} = \arg\min_{w}\; w'\Sigma w \quad \text{subject to} \quad \mu'w = \mu^*,\; w \geq 0 \;\text{and}\; \mathbf{1}'w = 1, \quad (28)$$
where the condition $w \geq 0$ applies to portfolios that do not allow short-selling. As we can see from the problem setup, for a fixed target portfolio return $\mu^*$, the portfolio optimisation problem has two key inputs, namely the covariance matrix $\Sigma$ and the expected asset return vector $\mu$.
When solving the above problem with a quadratic programming package, it is rewritten in the standard quadratic form:
$$\arg\min_{w}\; w'\Sigma w, \qquad \begin{bmatrix} \mu' \\ \mathbf{1}' \end{bmatrix} w = \begin{bmatrix} \mu^* \\ 1 \end{bmatrix}, \qquad -I_N w \leq 0,$$
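As an illustration of solving (28), here is a sketch using `scipy.optimize.minimize` with the SLSQP method on toy inputs; the thesis does not prescribe a particular solver, and all numbers below are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Toy inputs: expected returns mu and covariance Sigma for 3 assets.
mu = np.array([0.05, 0.08, 0.12])
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
mu_target = 0.08

cons = [{"type": "eq", "fun": lambda w: mu @ w - mu_target},  # mu'w = mu*
        {"type": "eq", "fun": lambda w: w.sum() - 1.0}]       # 1'w = 1
bounds = [(0.0, 1.0)] * 3                                     # w >= 0: no shorting
res = minimize(lambda w: w @ Sigma @ w, x0=np.ones(3) / 3,
               bounds=bounds, constraints=cons, method="SLSQP")
w_opt = res.x   # minimum-variance weights for the target return
```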
4.2 Spectral Risk
We proceed further to risk decomposition in factor models. To be more specific, we present the framework for decomposing the risk covariance $\Sigma$ in a spectral factor model with an arbitrary number of factors. Consider a factor model with $M$ factors denoted $f_m$, $m \in \{1, \ldots, M\}$; the return of asset $i$ is then
$$R_i = \alpha_i + \sum_{m=1}^{M} \beta_{i,m} f_m + \epsilon_i, \quad i \in \{1, \ldots, N\}, \quad (29)$$
where $\beta_{i,m}$ is the sensitivity of asset $i$ to factor $f_m$ and $\epsilon_i$ is the idiosyncratic return of asset $i$. We can then write the returns of all assets in the portfolio as
$$\begin{pmatrix} R_1 \\ \vdots \\ R_N \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \vdots \\ \alpha_N \end{pmatrix} + \begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,M} \\ \vdots & \ddots & \vdots \\ \beta_{N,1} & \cdots & \beta_{N,M} \end{pmatrix} \begin{pmatrix} f_1 \\ \vdots \\ f_M \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_N \end{pmatrix},$$
or in compact matrix form
$$R = \alpha + \beta f + \epsilon. \quad (30)$$
It is common to assume the asset-specific shocks to be zero-mean and cross-sectionally uncorrelated, i.e. $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ if $i \neq j$; hence $E[\epsilon\epsilon'] = D$ is a diagonal matrix. Let $V$ be the covariance matrix of the factors; by defining $\beta$ as the least-squares projection $\beta = \mathrm{Cov}(R, f)V^{-1}$, it follows that $\mathrm{Cov}(f, \epsilon) = 0$. The covariance of the returns is then
$$\Sigma = \mathrm{Cov}(R, R') = \beta V \beta' + D. \quad (31)$$
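Equation (31) can be illustrated by simulation: with factors of unit variance ($V = I$) and homoskedastic idiosyncratic noise, the model covariance $\beta V \beta' + D$ matches the sample covariance of simulated returns for large $T$ (toy numbers, our own):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, T = 4, 2, 20000
beta = rng.standard_normal((N, M))
f = rng.standard_normal((T, M))            # factors, so V = I here
eps = 0.3 * rng.standard_normal((T, N))    # idiosyncratic shocks, D = 0.09 I
R = f @ beta.T + eps

Sigma_model = beta @ beta.T + 0.09 * np.eye(N)   # beta V beta' + D
Sigma_sample = np.cov(R, rowvar=False)           # empirical Cov(R, R')
err = np.max(np.abs(Sigma_model - Sigma_sample)) # shrinks as T grows
```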
Suppose we would like to decompose the factors into $J$ orthogonal frequency-specific components; then we can write Equation (30) as
$$R = \alpha + \sum_{j=1}^{J} \left(\beta^{(j)} f^{(j)} + \epsilon^{(j)}\right). \quad (32)$$
Due to the orthogonality of the components and the linearity of the covariance, the covariance matrix of the returns becomes
$$\Sigma = \sum_{j=1}^{J} \Sigma^{(j)}, \quad \text{where} \quad \Sigma^{(j)} = \beta^{(j)} V^{(j)} \beta^{(j)\prime} + D^{(j)}. \quad (33)$$
From the relationships $\sum_{j=1}^{J} \beta^{(j)} f^{(j)} = \beta f$ and $\sum_{j=1}^{J} \epsilon^{(j)} = \epsilon$, we see that the
4.3 Asset Pricing Factor Models
In the previous section, we presented the spectral risk decomposition in a factor model. However, we did not specify the factors $f_m$ in Equation (29). In this section, we introduce some of the most common asset pricing factor models. In general, an asset pricing model specifies two sources of volatility for the return of an asset: one systematic and one unsystematic. The systematic uncertainty is represented by the non-diversifiable risk, while the unsystematic uncertainty is expressed by an asset-specific volatility that is diversifiable. Ideally, a reasonable investor would like to diversify her portfolio such that the unsystematic risk is eliminated and her exposure to the systematic risk is minimal. However, systematic risk is not directly observable, which leads to the search for appropriate approximations. These approximative observable series are commonly known as factors. The coefficient $\beta$ before a factor indicates the degree to which an asset's return is correlated with the factor; in other words, $\beta$ is simply an indicator of an asset's vulnerability to the systematic risk. If $\beta = 1$ for an asset, the asset is said to be in phase with the overall market. If $\beta > 1$, the asset is more volatile than the market; conversely, the asset is less risky than the market if $\beta < 1$. In general, factors can be categorised into macroeconomic, statistical and fundamental factors [13]. Statistical factors are found by statistical methods such as factor analysis and principal component analysis. Here, we restrict ourselves to standard macroeconomic and fundamental factor models.
4.3.1 Macroeconomic Factor Models
A macroeconomic factor, as the name indicates, is a realised value of a macro variable. Examples of macroeconomic factors are market risk, inflation, industrial production, interest rates and unemployment rates [12]. The simplest macroeconomic factor models are the Single Index Market (SIM) model developed by Sharpe in 1963 [31] and the well-known Capital Asset Pricing Model (CAPM). Mathematically, the SIM model is the factor model in Equation (29) with $M = 1$; under the same mathematical assumptions, the return of asset $i$ between time $t-1$ and $t$ is
$$R_i = \alpha_i + \beta_i R_m + \epsilon_i,$$
where
• Ri : The excess return of asset i
• Rm : The excess return of the market
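For illustration, the SIM coefficients $\alpha_i, \beta_i$ can be estimated by ordinary least squares; a small simulated sketch (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
T = 2000
rm = 0.02 * rng.standard_normal(T)            # market excess return
beta_true, alpha_true = 1.3, 0.001
ri = alpha_true + beta_true * rm + 0.01 * rng.standard_normal(T)

# OLS for alpha_i and beta_i in R_i = alpha_i + beta_i R_m + eps_i:
X = np.column_stack([np.ones(T), rm])
alpha_hat, beta_hat = np.linalg.lstsq(X, ri, rcond=None)[0]
```

With $\hat\beta_i > 1$, this simulated asset would be classified as more volatile than the market, matching the interpretation of $\beta$ above.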