
Some Contributions to

Heteroscedastic Time Series Analysis and Computational Aspects of Bayesian VARs

Oskar Gustafsson


Doctoral Thesis in Statistics at Stockholm University, Sweden 2020

Department of Statistics

ISBN 978-91-7911-356-8


Some Contributions to Heteroscedastic Time Series Analysis and Computational Aspects of Bayesian VARs

Oskar Gustafsson

Academic dissertation for the Degree of Doctor of Philosophy in Statistics at Stockholm University, to be publicly defended on Friday 18 December 2020 at 13.00 in lecture hall (hörsal) 9, building D, Universitetsvägen 10 D.

Abstract

Time-dependent volatility clustering (or heteroscedasticity) in macroeconomic and financial time series has been analyzed for more than half a century. The inefficiencies it causes in various inference procedures are well known and understood.

Despite this, heteroscedasticity is surprisingly often neglected in practical work. The correct way is to model the variance jointly with the other properties of the time series by using some of the many methods available in the literature. In the first two papers of this thesis, we explore a third option that is rarely used in the literature, in which we first remove the heteroscedasticity and only then fit a simpler model to the homogenized data.

In the first paper, we introduce a filter that removes heteroscedasticity from simulated data without affecting other time series properties. We show that filtering the data leads to efficiency gains when estimating parameters in ARMA models, and in some cases to higher forecast precision for US GDP growth.

The work of the first paper is extended to the case of multivariate time series in Paper II. In this paper, the stochastic volatility model is used for tracking the latent evolution of the time series variances. Also in this scenario variance stabilization offers efficiency gains when estimating model parameters.

During the last decade, there has been an increasing interest in using large-scale VARs together with Bayesian shrinkage methods. The rich parameterization together with the need for simulation methods results in a computational bottleneck that forces concessions regarding either the flexibility of the model or the size of the data set. In the last two papers, we address these issues with methods from the machine learning literature.

In Paper III, we develop a new Bayesian optimization strategy for finding optimal hyperparameters for econometric models via maximization of the marginal likelihood. We illustrate that the algorithm finds optimal values fast compared to conventional methods.

Finally, in Paper IV we present a fast variational inference (VI) algorithm for approximating the parameter posterior and predictive distribution of the steady-state BVAR. We show that VI produces results that are very close to those of the conventional Gibbs sampler but are obtained at a much lower computational cost. This is illustrated in both a simulation study and on US macroeconomic data.

Keywords: Time series, heteroscedasticity, variance stabilizing filters, Bayesian vector autoregressions, Bayesian optimization, variational inference.

Stockholm 2020

http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-186542

ISBN 978-91-7911-356-8 (print)
ISBN 978-91-7911-357-5 (PDF)

Department of Statistics

Stockholm University, 106 91 Stockholm


© Oskar Gustafsson, Stockholm University 2020
ISBN print 978-91-7911-356-8

ISBN PDF 978-91-7911-357-5

Printed in Sweden by Universitetsservice US-AB, Stockholm 2020


To Julia.


List of Papers

The following papers, referred to in the text by their Roman numerals, are included in this thesis.

PAPER I: Gustafsson, O. and Stockhammar, P., Variance stabilizing filters, Communications in Statistics - Theory and Methods, 48(24): 6155-6168 (2019).

PAPER II: Gustafsson, O., Variance stabilization for multivariate time series

(Under review)

PAPER III: Gustafsson, O., Villani, M., and Stockhammar, P., Bayesian optimization of hyperparameters when the marginal likelihood is estimated by MCMC

(Manuscript)

PAPER IV: Gustafsson, O. and Villani, M., Variational inference for the steady-state BVAR

(Manuscript)


Acknowledgements

During the doctoral studies, there have been many ups and downs that were more or less deep, and I have many to thank for smoothing out the downturns without cutting the peaks (too much).

First and foremost, I want to thank my supervisor Pär Stockhammar. From day one I have appreciated your strong support, trust, and encouragement. You have always let me find my own routes towards the goal while at the same time providing invaluable guidance and discussions. I have always enjoyed our meetings, both the parts regarding academic progress and the ones involving travel plans, Tottenham Hotspur, and much more.

I would also like to thank Mattias Villani, who co-authored two of the papers in this thesis. Thank you for introducing me to new interesting topics and for sharing your experience. I have really learned a lot. I thank my co-supervisor, Frank Miller, for your comments on my work, especially in the early part of the Ph.D. program.

Many thanks go to my friends and colleagues at the department, who made every day a better one, especially my fellow Ph.D. candidates. Thank you, Edgar, for the many shared experiences from the first day in our shared room until the last days as room neighbors. Also, thanks to Jonas and Oscar for many interesting and elevating talks as well as heavy lifts.

Thanks go to all my friends outside the department. To Adrian, my oldest friend, for countless good times, and to all other Ljugarns-*n for many of the same times. To Robert and Johan for other good times, and for the many strategic and tactical battles we have had over board games.

The greatest of thanks goes to my family for all the love and support over the years. My parents Bodil and Niklas, I cannot thank you enough for all the support, and for believing in me from the (real) beginning. Rebecka, you are not only my little sister, but a great friend and someone I know I can always turn to. My final greatest thanks go to my better half, best friend, and wife, Julia. You are exceptional, and your ability to make me laugh, to put things in different perspectives, and to provide comfort during periods of intense work and doubts has made huge differences over the years. We have been in this project together, and I dedicate this thesis to you.

Stockholm, November 2020 Oskar Gustafsson


Contents

List of Papers
Acknowledgements
1 Introduction
2 Time series analysis
2.1 Stationarity
2.2 ARMA models
2.3 VAR models
2.4 Bayesian VAR models
2.5 State-space models
3 Heteroscedasticity in time series
3.1 Heteroscedasticity and regression
3.2 ARCH and GARCH models
3.3 Stochastic volatility models
4 Computational methods
4.1 Markov chain Monte Carlo
4.2 Variational inference
4.3 The Kalman filter
4.4 Sequential Monte Carlo
4.5 Bayesian optimization
5 Summary of papers
Sammanfattning (Swedish summary)
References


1. Introduction

"What should we do next?"is the question constantly nagging decision-makers, and the conclusion that it depends on what will happen in the future follows immediately. In the moment when important decisions have to be made, we should consider all the available information at hand and project it into what we think will happen, i.e. we make a prediction, and base our action on that.

Producing predictions to guide decision-making processes is an important part of the daily work for econometricians in most branches, not least when it comes to the real economy. Accurate forecasts are important for helping decision-makers and analysts take good actions that affect many people. Some examples are whether to carry out an investment plan, invest in a certain stock, change the repo rate, or adjust a certain tax.

Analyzing and forecasting the real economy is typically done using time series analysis, an area that is constantly evolving and where new innovations continue to strengthen the foundations on which decisions are based. The aim of this thesis is to make some small contributions to this field and to push the rock just a little bit further up the hill.

A fact known for many decades is that macroeconomic, and especially financial, time series possess volatility clusters, also known as heteroscedasticity, a phenomenon most clearly captured in the famous quote of Mandelbrot (1963):

"...large changes tend to be followed by large changes-of either sign-and small changes tend to be followed by small changes...".

That heteroscedasticity inflicts efficiency losses on various inference and predictive procedures has been known for a long time, and usually one of two ways of handling it is used. The first (and arguably correct) way is to incorporate the information into the analysis and model the unobserved variance of the time series. There are many good ways of doing this, for example through ARCH-type (Engle, 1982) or stochastic volatility-type models (Taylor, 1994). However, the models quickly become complicated and they sometimes require long computation times. The second option is to simply neglect the volatility clustering to avoid the complications, an approach that is typically associated with efficiency losses. In Papers I and II of this thesis we look at a third option that is rarely used, namely to first remove the unwanted heteroscedasticity, and only afterwards estimate a simple model and carry out the analysis, see Stockhammar and Öller (2012).

Another aspect of modern time series econometrics, as in many other areas, is that larger and larger data sets are considered in order to incorporate more information into the forecasts. In this context "large" refers to "wide" data sets. That is, increasingly many time series are included, as opposed to the time series length, which is considered fixed.

A problem with using wide data sets when analyzing macroeconomic time series is that models become highly parametric. As a result, the dynamic relation between variables in the system becomes harder to estimate with high precision and forecasts might be erratic. This problem is commonly referred to as over-fitting and is a well-known problem in all areas of statistics.

In the last couple of decades, methods for overcoming the over-fitting issue by "shrinking" the model towards a simpler specification have become a go-to option in macroeconomic forecasting, see e.g. Doan et al. (1984) and Litterman (1986). This opens the door to using the increasing amount of collected data to produce a more solid decision basis. However, as we increase the information set we run into another bottleneck: computational resources. The parametric nature of the time series models quickly increases the computational burden as more time series are added. This makes it impractical to use the whole information set when we either need the predictions right now or when the models have to be estimated repeatedly many times.

In Paper III we address the issue of selecting the hyperparameters when each model evaluation is costly, and in Paper IV we work on reducing the computational cost associated with each run.

The rest of this introductory part gives a brief description of some concepts and models that are used in the papers. Section 2 gives a short introduction to the concept of stationarity and introduces some commonly used time series models. Section 3 is devoted to explaining the concept of heteroscedasticity in time series and to presenting some popular models for dealing with it. Section 4 introduces some computational techniques, and, finally, Section 5 gives a summary of the papers.


2. Time series analysis

This section contains a brief overview of some important concepts and models that have played an important role in the history of time series analysis as well as in today's developments.

In particular, I will focus on the concept of stationarity and introduce the class of autoregressive moving average (ARMA) models as well as their multivariate counterparts, primarily vector autoregressive (VAR) processes. Later in the section I also discuss Bayesian estimation of the VAR and state-space models. For a thorough textbook treatment of the models and concepts discussed in this section, see Hamilton (1994).

2.1 Stationarity

Time series analysis suffers from the fact that at a given point in time we only observe one single observation, as opposed to an ensemble, which is typically available in other areas of statistics. A central concept that most time series models require for useful inference is that of stationarity. Stationarity of a process means that certain properties of the time series are constant over time.

There exist some different versions of stationarity in the literature, but two common definitions are given below:

Definition 1: A time series {Y_t : t = 1, 2, ...} is strictly stationary if the joint distribution of {Y_{s+1}, Y_{s+2}, ..., Y_{s+n}} is the same as that of {Y_{t+1}, Y_{t+2}, ..., Y_{t+n}} for all s, t and n.

Definition 2: A time series {Y_t : t = 1, 2, ...} is weakly, or covariance, stationary if:

1. The expected value and the variance of Y_t exist and are constant over all time periods t.

2. For all pairs of time points t and t+s, the covariance Cov(Y_t, Y_{t+s}) only depends on s. That is, the covariance function depends only on the time distance between points, and not on time itself.

As can be understood from the labeling of the two definitions, strict stationarity is a stronger assumption which is hard to verify in empirical work. One should also note that strict stationarity implies weak stationarity but not vice versa (unless we make the additional assumption that the time series is normally distributed).

For most practical work it is the weaker form of stationarity that is used, and for the remainder of this text the term "stationary" will be reserved for "covariance stationary". It is worth noting that in the multivariate context certain linear combinations of individually non-stationary variables may be stationary, a concept known as cointegration, see e.g. Johansen (1995).

Example 1: A common way to model, for example, stock prices is with the random walk (RW). The RW is given by

Y_t = Y_{t−1} + ε_t,  where ε_t ~ iid N(0, σ²),   (2.1)

and is an example of a non-stationary process. This can easily be seen by recursively moving backward in time:

Y_t = Y_{t−1} + ε_t = Y_{t−2} + ε_{t−1} + ε_t = ··· = Y_0 + ε_1 + ε_2 + ··· + ε_t,   (2.2)

where Y_0 is some assumed starting value for the process. Calculating the expected value and variance of the RW gives

E[Y_t] = E[Y_0 + ε_1 + ε_2 + ··· + ε_t] = Y_0,
Var(Y_t) = Var(Y_0) + Var(ε_1) + Var(ε_2) + ··· + Var(ε_t) = σ² + σ² + ··· + σ² = tσ²,

where the variance calculation follows from the iid assumption for the error terms. The expected value is constant, no matter what value we use for t. However, the variance is clearly time-dependent.

The RW that we saw in the example above is said to be integrated of order 1, or I(1). This is a common property among economic and financial variables, and is something that one has to deal with. One way of dealing with multivariate integrated time series is the previously mentioned analysis of cointegration, see e.g. Johansen (1995) and Engle and Granger (1987), and the other common approach is to make the processes stationary before fitting a model. The standard way to make an I(1) variable stationary is to take the first difference, often denoted ∆Y_t = Y_t − Y_{t−1}. Sometimes a time series may be integrated of higher order than 1, say I(d) or "integrated of order d"; the time series then becomes stationary after d consecutive differences. For example, if d = 2 we get ∆²Y_t = ∆(∆Y_t) = ∆Y_t − ∆Y_{t−1} = Y_t − 2Y_{t−1} + Y_{t−2}.

Example 2: That the RW becomes stationary after first differencing is seen immediately by moving Y_{t−1} to the left-hand side, i.e. we get ∆Y_t = Y_t − Y_{t−1} = ε_t, which is iid for all t and thus stationary.
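As a quick numerical illustration of Examples 1 and 2 (a sketch in Python with made-up parameter values, not part of the thesis), the following simulates a batch of random walks, checks that Var(Y_t) grows roughly like tσ², and checks that the first difference behaves like iid noise.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_paths, sigma = 200, 5000, 1.0

# Simulate n_paths random walks Y_t = Y_{t-1} + eps_t with Y_0 = 0.
eps = rng.normal(0.0, sigma, size=(n_paths, T))
Y = np.cumsum(eps, axis=1)

# Across paths, Var(Y_t) should be close to t * sigma^2 (non-stationary).
print("Var(Y_50)  ~", Y[:, 49].var(),  " expected", 50 * sigma**2)
print("Var(Y_200) ~", Y[:, 199].var(), " expected", 200 * sigma**2)

# The first difference Delta Y_t = eps_t is iid N(0, sigma^2), hence stationary.
dY = np.diff(Y, axis=1)
print("Var(dY_t)  ~", dY.var(), " expected", sigma**2)
```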

Finding the right order of integration is the first step in the so-called Box-Jenkins approach (Box and Jenkins, 1970) for selecting a time series model. The next step is to identify the number of autoregressive and moving average components to form an ARMA model, which is described in the next subsection.

2.2 ARMA models

Box and Jenkins (1970) popularized the approach of first making suitable transformations to a time series to make it stationary (by e.g. first differencing, seasonal adjustments, etc.), and then fitting a parsimonious model with a few autoregressive and moving average components (ARMA).

The simplest such process is arguably the AR(1),

Y_t = α + φ Y_{t−1} + ε_t,  where ε_t ~ iid N(0, σ²),   (2.3)

where α is an intercept and the condition |φ| < 1 on the autoregressive parameter ensures that the process is stationary. It is easy to show that the mean of the AR(1) process is given by

E[Y_t] = α + φ E[Y_{t−1}],  or  µ = α / (1 − φ).   (2.4)

This makes it clear that the mean of the process only exists if φ ≠ 1 and that the mean is zero only if α = 0.

If we make the replacement α = µ(1 − φ) in (2.3) we get the mean-adjusted version of the model (which is used in its multivariate form in Papers III and IV):

Y_t − µ = φ (Y_{t−1} − µ) + ε_t,  where ε_t ~ iid N(0, σ²).   (2.5)

If we repeatedly substitute past terms in (2.5) we get the infinite MA representation

Y_t − µ = ε_t + φ ε_{t−1} + φ² ε_{t−2} + ··· = Σ_{i=0}^∞ φ^i ε_{t−i}.   (2.6)


The result that the AR(1) can be written as an infinite weighted sum of past error terms plus a deterministic component (in this case the constant µ) is not unique; it holds for all stationary processes and is known as the Wold representation theorem (Wold, 1938). The re-parametrization of AR processes into an infinite-order MA counterpart can be useful for many purposes, such as impulse-response analysis, see e.g. Lütkepohl (2007).

The second part (MA) has a similar appearance but is built up from past error terms. For example, the MA(1) process is given by

Y_t = α + ε_t + θ ε_{t−1},   (2.7)

where θ is called the moving-average parameter. As opposed to the AR model, all finite-order MA models are stationary, and any invertible MA process can be rewritten as an infinite-order AR process.

Both the AR and the MA processes can easily be extended with several lags, and combining them we get the ARMA(p,q) process:

Y_t = α + φ_1 Y_{t−1} + ··· + φ_p Y_{t−p} + ε_t + θ_1 ε_{t−1} + ··· + θ_q ε_{t−q}   (2.8)
    = α + Σ_{i=1}^p φ_i Y_{t−i} + Σ_{j=1}^q θ_j ε_{t−j}.   (2.9)

The sums in ARMA processes are typically expressed in more compact notation using the so-called lag operator, which will be useful also in the multivariate extension. The lag-operator polynomial is defined as φ(L) = (1 − φ_1 L − φ_2 L² − ··· − φ_p L^p), where L Y_t = Y_{t−1}, L² Y_t = Y_{t−2}, and so on. In a similar way we can define the lag-operator polynomial for the MA components: first we define θ̃ = −θ and we then get θ̃(L) = (1 − θ̃_1 L − θ̃_2 L² − ··· − θ̃_q L^q). We can then express the ARMA model in compact notation as

φ(L) Y_t = α + θ̃(L) ε_t.   (2.10)

The ARMA model has been analyzed and used in countless papers (including Paper I of this thesis) and should be seen as a cornerstone in any econometrician's toolbox. Most things that we know from univariate ARMA models spill over to their multivariate counterparts, which are briefly explained in the next section.
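The ARMA recursions above are easy to simulate directly. The sketch below (an illustration with hypothetical parameter values, not code from Paper I) generates an ARMA(1,1) path from (2.8), checks the unconditional mean α/(1 − φ) implied by (2.4), and computes the first weights of the MA(∞) representation in (2.6).

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
alpha, phi, theta, sigma = 0.5, 0.8, 0.4, 1.0

eps = rng.normal(0.0, sigma, size=T)
y = np.zeros(T)
# ARMA(1,1): y_t = alpha + phi*y_{t-1} + eps_t + theta*eps_{t-1}, cf. (2.8).
for t in range(1, T):
    y[t] = alpha + phi * y[t - 1] + eps[t] + theta * eps[t - 1]

# The MA term has mean zero, so the unconditional mean is alpha/(1 - phi), cf. (2.4).
print("sample mean:", y.mean(), "  mu = alpha/(1-phi):", alpha / (1 - phi))

# MA(infinity) weights of the Wold representation for the ARMA(1,1):
# psi_0 = 1, psi_1 = phi + theta, psi_i = phi * psi_{i-1} for i >= 2.
psi = np.empty(20)
psi[0], psi[1] = 1.0, phi + theta
for i in range(2, 20):
    psi[i] = phi * psi[i - 1]
print("first MA(inf) weights:", psi[:5])
```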

2.3 VAR models

Economic and financial variables interrelate in complex and dynamic ways that univariate specifications, no matter how well developed, will fail to capture. To get a nuanced picture of reality we need to consider the joint evolution of different parts of the economy at the same time. This can be accomplished by considering multivariate generalizations of the models in Section 2.2, which are called vector ARMA or simply VARMA. The vector processes that include moving-average terms can potentially lead to more parsimonious representations; however, they are typically much harder to estimate (Lütkepohl, 2007), and one typically resorts to only using autoregressive terms. Therefore, the rest of the section is devoted to vector autoregressions (VARs), popularized by Chris Sims (Sims, 1980), who was also awarded the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel in 2011.

The VAR(p) model is the vector analogue of the AR process in Section 2.2 and is given by

y_t = c + Σ_{i=1}^p Π_i y_{t−i} + ε_t,   (2.11)

where y_t is an (n × 1) vector of time series measured at time t, c is an (n × 1) vector of intercepts, Π_i is an (n × n) matrix of regression coefficients, and {ε_t}_{t=1}^T are iid (n × 1) error terms often assumed to be N(0, Σ).

The VAR specified in (2.11) is in reduced form and can also be written as

y_t = Π^T x_t + ε_t,   (2.12)

where Π = (c, Π_1, ..., Π_p)^T is of dimension (np + 1) × n and x_t = (1, y_{t−1}^T, ..., y_{t−p}^T)^T is an (np + 1) × 1 vector.

The VAR can be rewritten in many ways that are useful in different settings; one of them is the mean-adjusted form we saw for the AR(1) in (2.5):

y_t = µ + Π_1(y_{t−1} − µ) + ··· + Π_p(y_{t−p} − µ) + ε_t.   (2.13)

As in the case of the AR model, the intercept terms are now replaced by the unconditional mean; this formulation of the VAR is used in both Paper III and IV.
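As an illustration of the reduced form (2.12) (a minimal sketch with hypothetical parameter values, not taken from the papers), the following simulates a bivariate VAR(1) and recovers c and Π_1 by least squares.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 2000, 2

# True parameters of a stationary bivariate VAR(1): y_t = c + Pi1 y_{t-1} + eps_t.
c = np.array([0.2, -0.1])
Pi1 = np.array([[0.5, 0.1],
                [0.2, 0.3]])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

y = np.zeros((T, n))
eps = rng.multivariate_normal(np.zeros(n), Sigma, size=T)
for t in range(1, T):
    y[t] = c + Pi1 @ y[t - 1] + eps[t]

# Stack the reduced form y_t = Pi' x_t + eps_t with x_t = (1, y_{t-1}')', cf. (2.12),
# and estimate Pi by least squares.
X = np.column_stack([np.ones(T - 1), y[:-1]])     # (T-1) x (np+1)
Y = y[1:]                                         # (T-1) x n
Pi_hat = np.linalg.solve(X.T @ X, X.T @ Y)        # (np+1) x n
print("c_hat  :", Pi_hat[0])
print("Pi1_hat:\n", Pi_hat[1:].T)
```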

2.4 Bayesian VAR models

Owing to their flexibility and their ability to capture dynamic relationships among time series, VARs are widely used in practical work for both policy analysis and forecasting. However, the flexibility comes from the dense parametrization, and when too many time series (for a given number of observations) are included in the analysis, maximum likelihood will start to over-fit the data.

A popular way that has become standard in the literature is to use Bayesian methods to estimate the models and thereby overcome the over-fitting problem.

(24)

In the VAR literature, the use of Bayesian methods is seen more as a convenient way to impose shrinkage and overcome the curse of dimensionality than as a philosophical standpoint. In Bayesian VARs (BVARs) the researcher is allowed to specify a simple alternative model and shrink the over-fitted maximum likelihood parameters toward the simpler model.

There are several ways in the literature to formulate the prior and to shrink the VAR, but most of them are extensions of the Minnesota prior (Litterman, 1986). The Minnesota prior builds on a set of simple principles: (i) many macroeconomic variables (in levels) behave like univariate RWs, (ii) more recent lags matter more than distant lags, and (iii) own lags matter more than cross-lags.

(i) relates to how one should set the prior mean for the regression coefficients. Litterman (1986) suggests that the first own lag should be set to 1, and all other coefficients to 0, to reflect the RW observation. However, this implies a non-stationary process. When we work with stationary time series the prior can instead be adjusted to reflect that the first own lag should have some persistence but that it should be lower than one in absolute value. (ii) and (iii) are used to set the elements of the prior covariance matrix, Ω_Π, and are quantified through the following equations:

ω_ii = λ_1² / l^{λ_3},  for the own lag l of variable r, i = n(l − 1) + r,
ω_ii = (λ_1 λ_2 s_r)² / (l^{λ_3} s_j)²,  for the cross-lag l of variable r ≠ j, i = (l − 1)n + j,   (2.14)

where ω_ii is the i:th diagonal element of the prior covariance matrix of the VAR dynamics. λ_1 − λ_3 are hyperparameters generally referred to as the "overall shrinkage", the "cross-lag shrinkage" and the "lag-decay" parameters, respectively. The hyperparameters make sure that the structure of the prior covariance matrix induces the shrinkage recommended by (ii) and (iii) above. Traditionally, the hyperparameters are set to some conventional values that date back to Doan et al. (1984), which were optimized on a historical data set. The hyperparameters are given special attention in Paper III, where we suggest an approach for optimizing them in computationally demanding situations where Markov chain Monte Carlo (MCMC) methods are used, see Section 4.1. As mentioned before, there is a large literature proposing different prior structures for the BVAR, and some of the most popular approaches are presented in Karlsson (2013).

A prior that is of particular interest in this thesis is the steady-state prior of Villani (2009). The basic idea is to benefit from the typically good knowledge that economists have regarding the long-run means of economic variables. A common practice for other priors is to set the prior mean for the intercept, c, to zero and the variance to an arbitrarily large number. This represents complete ignorance and does not make use of any knowledge that we may actually have. Imposing an informative prior on the steady states has been shown to be particularly useful for forecasting macroeconomic time series, see for example Villani (2009), Beechey and Österholm (2010), and Wright (2013).

The basic idea is very simple: we just rewrite the model in its mean-adjusted form, which using the lag operator can be written as

Π(L)(y_t − µ) = ε_t.   (2.15)

However, this comes with the necessary complication that the model becomes non-linear in the parameters, which complicates the estimation. Villani (2009) suggests a simple Gibbs sampling scheme to simulate from the posterior distribution.

Many of the popular priors used for BVARs make the analysis dependent on MCMC methods. This is a minor problem for low- to medium-dimensional VARs, say 2-10 variables. But as we include more time series the computational burden quickly becomes cumbersome, especially for recursive forecasting exercises where the models have to be re-estimated many times. Already at approximately 20 time series the computational times become an issue, and VAR systems with up to 100 variables basically become infeasible. At the same time, there has been an interest in scaling up the BVARs, which tends to improve forecast performance, see e.g. Bańbura et al. (2010). Papers III and IV of this thesis target two of the problems associated with scaling up the BVAR: (i) how to select the hyperparameters to impose an adequate degree of shrinkage, and (ii) how to reduce the computational burden without major concessions regarding the flexibility.

2.5 State-space models

A useful way of thinking about a time series is that there exists some underlying, but unobservable, latent process that drives the movements of the time series that we actually observe. This is referred to as the state-space view in the time series context, see e.g. Harvey (1989) and Durbin and Koopman (2012).

A general formulation of the state-space model (SSM) is given by

y_t = f(α_t, θ) + ε_t,   (2.16)
α_t = g(α_{t−1}, θ) + η_t,   (2.17)

where the latent process, α_t, describes the evolution of the system over time, and θ represents a vector of fixed parameters. We may notice that the formulation above is quite flexible, with f and g representing potentially non-linear functions, but that it includes additive noise from some arbitrary distribution.


One particularly useful and convenient case of the SSM is the linear Gaussian state-space model (LG-SSM). The general form of the LG-SSM can be written as in Durbin and Koopman (2012):

y_t = Z_t α_t + ε_t,  ε_t ∼ N(0, H_t),
α_t = T_t α_{t−1} + R_t η_t,  η_t ∼ N(0, Q_t).   (2.18)

We can see from (2.18) that the top, or observation, equation is basically a regression model with the latent variable as regressor, which in turn follows a VAR(1) in the state equation. Even though the LG-SSM seems restrictive, all of the models considered thus far in the thesis can be put into a state-space representation using the LG-SSM.

Example 3: For intuition it may be good to consider a simple example of an LG-SSM, the local level model (LLM):

y_t = α_t + ε_t,  ε_t ∼ N(0, σ_ε²),   (2.19)
α_t = α_{t−1} + η_t,  η_t ∼ N(0, σ_η²).   (2.20)

We can see that the observation equation of the LLM consists of a normally distributed error term and a time-varying level, which is not directly observable. Further, the unobservable level depends on its own previous value and another normally distributed error term through a RW relation.
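A short simulation of the local level model (2.19)-(2.20), with noise levels chosen purely for illustration, shows the flavor of the state-space view: a smooth latent level observed through much noisier measurements.

```python
import numpy as np

rng = np.random.default_rng(3)
T, sigma_eps, sigma_eta = 300, 0.5, 0.1

# Local level model, cf. (2.19)-(2.20): a slowly moving latent level alpha_t
# observed with noise.
alpha = np.zeros(T)
y = np.zeros(T)
for t in range(T):
    alpha[t] = (alpha[t - 1] if t > 0 else 0.0) + rng.normal(0.0, sigma_eta)
    y[t] = alpha[t] + rng.normal(0.0, sigma_eps)

# The observations are much noisier than the latent level itself.
print("std of observation changes:", np.diff(y).std())
print("std of level changes      :", np.diff(alpha).std())
```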

With the state-space framework one can model many sorts of dynamically evolving systems, and it plays an important role in modern macroeconomic modeling. Among its uses are the time-varying-parameter VAR, where each regression coefficient follows its own latent process, and models where the unique elements of the covariance matrix evolve over time (known as the stochastic volatility model); see for example Cogley and Sargent (2005) or Primiceri (2005) for both sources of parameter variation. Dynamic factor models are also often written in state-space form, see Stock and Watson (2002b).


3. Heteroscedasticity in time series

In the previous sections, we have assumed that the sequence of error terms is uncorrelated and normally distributed with a constant variance, σ_t² = σ² (and Σ_t = Σ in the multivariate case) for all t. This assumption is unrealistic in most real-world applications, and researchers have observed a reduction in the variance of many important macroeconomic time series over the last decades. This has been referred to as the Great Moderation of the business cycle, see Stock and Watson (2002a). This empirical observation is the main focus of Papers I and II, where the assumption of constant variance is relaxed. This can be accounted for in several ways, some of which are explained below. See e.g. Tsay (2010) for a description of some of the most popular volatility models.

3.1 Heteroscedasticity and regression

That ordinary least squares (OLS) is inefficient under heteroscedasticity has been known for a long time, and the problem spills over to time series analysis. Periods associated with different variances will be weighted differently, which induces inefficiencies.

Consider the model

y = Xβ + u,   (3.1)

with E[u|X] = 0 and E[uu′|X] = Σ = σ²Ω, where Ω ≠ I is a positive definite matrix. The variance of the OLS estimator is given by

Var(β̂) = E[(β̂ − β)(β̂ − β)′] = σ² (X′X)^{−1} X′ΩX (X′X)^{−1},   (3.2)

which is clearly different from the usual OLS expression σ² (X′X)^{−1}. The inefficiencies can have more or less serious implications for the inference under different circumstances, where for example confidence intervals, t-statistics, and other widely used quantities become erroneous.

A way to regain efficiency is through generalized least squares (GLS), see e.g. Hamilton (1994) and Greene (2014), which makes sure that the weighting of the data points becomes more homogeneous. In GLS we pre-multiply equation (3.1) by a non-singular matrix T (which is supposed to approximate Ω^{−1/2}) such that we get

T y = (T X)β + T u  ⇔  y* = X*β + u*,   (3.3)

where E[u*|X*] = 0 and E[u*u*′|X*] = σ²I, which means that OLS on the transformed variables regains the "BLUE" (best linear unbiased estimator) properties and β can be estimated efficiently.

A special case of GLS is weighted least squares (WLS), which is useful when the model errors are heteroscedastic but not autocorrelated. With the off-diagonal elements of Ω set to zero (so no autocorrelation in the errors), the error covariance matrix used in WLS can be written as

Σ = σ²Ω = σ² diag(ω_1, ω_2, ..., ω_T) = diag(σ_1², σ_2², ..., σ_T²),   (3.4)

where we can see that each variance component is associated with its own error term. WLS allows for the nice interpretation that σ² can be seen as the "overall variance" and the ω:s can be rescaled into weights, since tr(Ω) = Σ_{t=1}^T ω_t = T (Greene, 2014).
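A small Monte Carlo sketch (a hypothetical design, not from the thesis) illustrates the efficiency argument: when the error standard deviations σ_t are known, weighting each observation by 1/σ_t as in (3.3) gives visibly smaller sampling variability than OLS.

```python
import numpy as np

rng = np.random.default_rng(4)
n_rep, T, beta = 2000, 200, 2.0
x = rng.normal(size=T)
sig = np.where(np.arange(T) < T // 2, 0.5, 3.0)   # known heteroscedastic std devs

ols_est, wls_est = [], []
for _ in range(n_rep):
    y = beta * x + rng.normal(0.0, sig)
    # OLS: equal weight to all observations.
    ols_est.append((x @ y) / (x @ x))
    # WLS: pre-multiply by T = diag(1/sigma_t), i.e. OLS on the scaled data, cf. (3.3).
    xw, yw = x / sig, y / sig
    wls_est.append((xw @ yw) / (xw @ xw))

print("OLS sampling std:", np.std(ols_est))
print("WLS sampling std:", np.std(wls_est))   # noticeably smaller
```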

In time series models, the regressors are dynamically evolving, which carries the implication that the conditional variance is itself evolving. There are mainly two approaches to modeling the variance evolution in time series, and they are described in the next two subsections.

3.2 ARCH and GARCH models

In his seminal paper from 1982, Robert Engle introduced the ARCH model (autoregressive conditional heteroscedasticity), for which he was awarded the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel in 2003. The ARCH model has been extended in many directions to capture asymmetries, non-linearities, leverage effects, etc.; see for example Franses and van Dijk (2000) for several examples. In this subsection, we will take a closer look at two types of ARCH processes, namely the original ARCH(1) process and the ARMA-GARCH process which is used in Paper I of this thesis.

The ARCH(1) process is given by the following equations:

y_t = ε_t = h_t^{1/2} ν_t,  where ν_t ~ iid N(0, 1),
h_t = a_0 + a_1 ε_{t−1}²,   (3.5)


where h_t can be seen as the conditional variance, i.e. Var(y_t | y_{t−1}) = h_t. From equation (3.5) we can see that the conditional variance is now determined by the square of the lagged error term. The ARCH(1) can easily be extended to an ARCH(p) model including more lags of ε. It is often the case that the ARCH model requires many lags (and thus many parameters) to fit the variance evolution well. This insight led to the generalized ARCH (GARCH) model of Bollerslev (1986), which is closely related to an ARMA model for the conditional variance.

For instance, the ARMA(1,1)-GARCH(1,1) is given by

y_t = φ y_{t−1} + ε_t + θ ε_{t−1},
ε_t = h_t^{1/2} ν_t,  where ν_t ~ iid N(0, 1),
h_t = a_0 + a_1 ε_{t−1}² + b_1 h_{t−1},   (3.6)

where we have now added lagged values of the conditional variance. Also the GARCH(1,1) model can easily be extended to incorporate more lags, yielding a GARCH(p,q) model.

Thanks to its parsimonious parameterization and its simplicity, the GARCH model is still today a popular tool for modeling heteroscedasticity in time series. However, the GARCH can be thought of as a linear model for the conditional variances and is thus unable to capture certain empirical properties that have been observed for financial data. Among them are asymmetries, for example that "good news" regarding a stock tends to lower the variance in subsequent periods while "bad news" tends to increase it; feedback from the variance to the mean, such that increased volatility can increase the price of a certain option; skewness, such that large negative shocks are more common than positive ones; higher persistence of variance shocks; etc. There exists a large literature on how to adjust the GARCH model to better describe the above empirical observations. These models are more complex, but offer both great insights into the mechanisms that drive the evolution of the variance and a better fit than the regular GARCH. Some popular choices in the literature are the integrated (I)GARCH (Engle and Bollerslev, 1986), the exponential (E)GARCH (Nelson, 1991), the GJR-GARCH (Glosten et al., 1993), and GARCH models with time-varying parameters (Chou et al., 1992), to mention a few.

3.3 Stochastic volatility models

The other paradigm for modeling the volatility of time series is stochastic volatility, Taylor (1994) and Harvey et al. (1994). The main difference compared to the GARCH models is that the stochastic volatility model (SVM) has an additional error term, which makes it more flexible. So while the unobserved conditional volatility of the GARCH can be seen as a deterministic function of the past information set and can be constructed from the past values of the data and the parameters, this is not the case for the SVM. The basic SVM with one autoregressive term for the variance is given by

y_t = σ_t e_t,  where e_t ∼ N(0, 1),
log σ_t = φ_0 + φ_1 log σ_{t−1} + η_t,  where η_t ∼ N(0, σ_η²),   (3.7)

where e_t and η_t are assumed independent. The fact that the volatility has its own equation with an additional error term shows that the SVM is in state-space form, see Section 2.5. A heuristic interpretation of the SVM is that η_t captures shocks to the intensity of the information flow, represented by σ_t, while a shock to e_t signifies the content of the news, e.g. the magnitude and the sign (Franses and van Dijk, 2000).

Another parameterization of the model, used in the second paper of this thesis, is

y_t = β exp(α_t / 2) e_t,  where e_t ∼ N(0, 1),
α_t = φ α_{t−1} + η_t,  where η_t ∼ N(0, σ_η²).   (3.8)

In this specification, β² controls the overall variance of the error, while α_{1:T} follows a mean-zero AR process which allows the time-varying variance to temporarily deviate from its long-run level. Both parametrizations can be made more flexible by including higher-order AR terms, but the AR(1) specification is the most used in practical work.
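The parameterization (3.8) is straightforward to simulate. The sketch below (illustrative parameter values only) shows how the latent AR(1) process α_t moves the conditional standard deviation of y_t around its long-run level β.

```python
import numpy as np

rng = np.random.default_rng(6)
T, beta, phi, sigma_eta = 2000, 1.0, 0.95, 0.2

# Stochastic volatility model in the parameterization of (3.8):
# y_t = beta * exp(alpha_t / 2) * e_t,  alpha_t = phi * alpha_{t-1} + eta_t.
alpha = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    alpha[t] = phi * alpha[t - 1] + rng.normal(0.0, sigma_eta)
    y[t] = beta * np.exp(alpha[t] / 2) * rng.normal()

# alpha_t is a latent, mean-zero AR(1), so the variance of y_t fluctuates around
# its long-run level beta^2; volatility clustering shows up in the size of y_t.
print("overall std of y        :", y.std())
print("std when alpha is low   :", y[alpha < -0.3].std())
print("std when alpha is high  :", y[alpha > 0.3].std())
```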

An advantage of the GARCH model is that the conditional variance (as mentioned before) can be constructed using the past information set, and the parameters can thus easily be found via maximum likelihood optimization. However, the extra error term makes the estimation procedure a bit more complicated for the stochastic volatility model. There are typically two ways to estimate the parameters in the stochastic volatility model. The first is to use the quasi-maximum likelihood approach of Harvey et al. (1994). The second route is to use Bayesian estimation and the forward filtering backward simulation idea, see e.g. Durbin and Koopman (2012). Both of the above methods linearize the observation equation and approximate the resulting (non-normal) error term with a normal mixture. In this thesis, however, I keep the non-linear specification and instead use sequential Monte Carlo (SMC), see chapter 14.5 in Durbin and Koopman (2012) for an example.

The SVM has become a popular part of the recent BVAR literature since it both adds valuable insights about the economic system and allows for more efficient parameter estimation. The common approach when using the SVM for multivariate time series is to use Bayesian methods and MCMC to simulate from the posterior distribution. This is typically done by first drawing a decorrelation matrix, L, such that Lε_t becomes a sequence of uncorrelated error terms with time-varying variances. Since the transformed error terms are uncorrelated, one can apply the univariate approach to each equation individually and then recover the desired covariance matrix. This approach is further explained in Paper II.


4. Computational methods

Papers II, III, and IV study computational aspects of Bayesian vector autoregressions. A large part focuses on how to obtain as good approximations of the posterior distribution as possible, preferably at a low computational cost. This section therefore introduces some of the methods that are used for this purpose in the papers.

4.1 Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a simulation method that has become the standard way of doing Bayesian inference, see for example Gelman et al. (2013). MCMC allows for sampling from non-standard and possibly high-dimensional posterior distributions. The draws are typically dependent but can be shown to converge in distribution to the target distribution. Having access to such a sample comes with a set of very appealing advantages: one can use it, for example, to easily compute posterior intervals for the parameters, perform model selection, or calculate predictive distributions.

There are several MCMC algorithms, but the most common one in the BVAR literature is the Gibbs sampler, which is described below. The downside of MCMC methods is that they are computationally intensive. This can quickly become a problem if we have to evaluate large data sets or when the parameter vector of interest is high-dimensional.

Gibbs sampling

Consider a vector θ = (θ_1, θ_2, ..., θ_k) that consists of the k different parameter blocks that constitute our model. In most interesting cases the posterior distribution for the whole vector, θ, does not have a closed form. However, there may exist a closed-form solution if we condition on the other parameter blocks. That is, we may have a known expression for p(θ_j | θ_{−j}, y_{1:T}), where θ_{−j} denotes all parameter blocks excluding θ_j and y_{1:T} denotes all available data. The Gibbs sampler of Geman and Geman (1984) and Gelfand and Smith (1990) provides a clever way to obtain draws from the joint posterior distribution by using the full conditionals via sequential sampling as follows:


• Select initial values θ^(0) = (θ_1^(0), θ_2^(0), ..., θ_k^(0)).

• For g from 1 to G, repeatedly draw from the full conditionals according to:

1: θ_1^(g) ∼ p(θ_1 | θ_2^(g−1), θ_3^(g−1), ..., θ_k^(g−1), y_{1:T})
2: θ_2^(g) ∼ p(θ_2 | θ_1^(g), θ_3^(g−1), ..., θ_k^(g−1), y_{1:T})
...
k: θ_k^(g) ∼ p(θ_k | θ_1^(g), θ_2^(g), ..., θ_{k−1}^(g), y_{1:T}).

By discarding the first G̃ simulations from the algorithm as a burn-in sample (this is done so that we only keep draws for which the Markov chain has already converged) we have obtained a sample from the desired posterior distribution. For a textbook treatment of the Gibbs sampler and other MCMC methods, see e.g. Robert and Casella (2005).
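As a toy illustration of the scheme above (a standard textbook example, not one of the BVAR samplers used in the papers), the following Gibbs sampler targets a bivariate normal with correlation ρ, where both full conditionals are univariate normals.

```python
import numpy as np

rng = np.random.default_rng(7)
rho = 0.8                     # target: bivariate N(0, [[1, rho], [rho, 1]])
G, burn_in = 20_000, 1_000

theta1, theta2 = 0.0, 0.0     # initial values theta^(0)
draws = np.empty((G, 2))
for g in range(G):
    # Full conditionals of a standard bivariate normal are univariate normals.
    theta1 = rng.normal(rho * theta2, np.sqrt(1 - rho**2))
    theta2 = rng.normal(rho * theta1, np.sqrt(1 - rho**2))
    draws[g] = theta1, theta2

kept = draws[burn_in:]        # discard the burn-in sample
print("posterior means       ~", kept.mean(axis=0))
print("posterior correlation ~", np.corrcoef(kept.T)[0, 1])
```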

4.2 Variational inference

Variational inference (VI), or variational Bayes, is an optimization-based method from the machine learning literature that is used to approximate probability distributions (Jordan et al., 1999). VI is especially useful for high-dimensional Bayesian inference since it is typically much faster than MCMC, which makes it potentially useful for BVARs.

The main idea of VI is to replace the intractable posterior distribution p(θ|y) with a simpler density q(θ) that belongs to a tractable family of distributions, Q. When we have defined our class of tractable alternative densities we need to find the one that is closest to the true posterior. This amounts to minimizing the Kullback-Leibler divergence between the approximating density and the posterior, i.e. we solve the minimization problem

q* = arg min_{q ∈ Q} KL(q(θ) || p(θ|y)).   (4.1)

An interesting fact to note is that

KL(q(θ) || p(θ|y)) = −∫ q(θ) log [ p(y|θ) p(θ) / q(θ) ] dθ + log p(y),   (4.2)

which means that minimizing the KL divergence in (4.1) is equivalent to maximizing a lower bound on the log marginal likelihood; this bound is typically referred to as the evidence lower bound (ELBO).


The idea is straightforward, but we still have to decide which distributions to include in Q. A common approach (also used in Paper IV) is the structured mean-field factorization, which assumes that the posterior can be factorized into k independent blocks of distributions, i.e. q(θ_1, θ_2, ..., θ_k) = q(θ_1) q(θ_2) ··· q(θ_k). The mean-field approximation of the posterior can be seen as restrictive, and sometimes it can severely underestimate the variance of the posterior distributions of the parameters. However, it makes no other assumptions regarding the functional form, and in many situations the posterior dependence is not too high and the approximation works very well. This is the case in Paper IV of this thesis, where we present a VI approach for approximating the posterior distribution of the steady-state BVAR. We show that the posterior forecast distributions are virtually the same as when using conventional methods, and that they are obtained in a small fraction of the time.

There are other, less restrictive, ways to select Q; however, they are more computationally intensive and less robust than the structured mean-field approach. For an introduction to the topic, see e.g. Blei et al. (2017) and Ormerod and Wand (2010).
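As a minimal illustration of mean-field VI (a textbook example for a normal model with unknown mean and precision, not the steady-state BVAR algorithm of Paper IV), the coordinate-ascent updates below iterate between q(µ) and q(τ); a fixed number of iterations and arbitrary prior values are used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(2.0, 1.5, size=500)           # data with true mean 2 and std 1.5
N, xbar, xsq = len(x), x.mean(), (x**2).sum()

# Priors: mu | tau ~ N(mu0, (lam0*tau)^{-1}), tau ~ Gamma(a0, b0).
mu0, lam0, a0, b0 = 0.0, 0.01, 0.01, 0.01

# Mean-field factorization q(mu, tau) = q(mu) q(tau); coordinate ascent (CAVI).
E_tau = 1.0                                  # initial guess for E_q[tau]
for _ in range(50):
    # Update q(mu) = N(mu_N, 1/lam_N).
    mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_N = (lam0 + N) * E_tau
    # Update q(tau) = Gamma(a_N, b_N) using expectations under q(mu).
    a_N = a0 + (N + 1) / 2
    E_sq_dev = xsq - 2 * mu_N * N * xbar + N * (mu_N**2 + 1 / lam_N)
    b_N = b0 + 0.5 * (E_sq_dev + lam0 * ((mu_N - mu0)**2 + 1 / lam_N))
    E_tau = a_N / b_N

print("q(mu) mean            :", mu_N, " (true mean 2.0)")
print("implied std 1/sqrt(E[tau]):", 1 / np.sqrt(E_tau), " (true std 1.5)")
```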

4.3 The Kalman filter

In Section 2, we saw a simple example of the linear Gaussian state-space model (LG-SSM), namely the local level model. The fact that it is linear and Gaussian means that the model only contains linear transformations of normal random variables. This in turn means that all resulting conditional distributions will also be normal, a fact that makes inference very convenient.

When working with the state-space model the ultimate goal is often to obtain the distribution of the states given the data and the fixed parameters, and there are mainly three kinds of distributions that we are interested in: (i) the predictive densities p_θ(α_t | y_{1:t−1}), (ii) the filtering densities p_θ(α_t | y_{1:t}), and (iii) the smoothing densities p_θ(α_t | y_{1:T}). Note that the subscript θ indicates that the distributions of the states are conditional on the fixed parameters collected in θ.

The predictive distribution is immediately available to us since

p_θ(α_t, α_{t−1} | y_{1:t−1}) = p_θ(α_t | α_{t−1}, y_{1:t−1}) p_θ(α_{t−1} | y_{1:t−1}),   (4.3)

and to get the predictive distribution we just marginalize over α_{t−1}:

p_θ(α_t | y_{1:t−1}) = ∫ p_θ(α_t | α_{t−1}, y_{1:t−1}) p_θ(α_{t−1} | y_{1:t−1}) dα_{t−1},   (4.4)

where p_θ(α_{t−1} | y_{1:t−1}) is the filtering distribution from the previous period and p_θ(α_t | α_{t−1}, y_{1:t−1}) is defined through the state equation.


The widely used Kalman filter (Kalman, 1960) is the main tool for obtaining the filtering distributions p_θ(α_t | y_{1:t}). Since we know that the filtering distributions in the LG-SSM are normal, we know that they are fully characterized by their expected values a_{t|t} = E[α_t | y_{1:t}] and their variances P_{t|t} = Var(α_t | y_{1:t}). This means that in order to calculate the filtering distributions we just need to compute the sequences {a_{t|t}, P_{t|t}}_{t=1}^T, where we assume some initial state vector α_0 ∼ N(a_0, P_0). This is done via the forward recursions of the Kalman filter through the filtering equations (Durbin and Koopman, 2012):

ν_t = y_t − Z_t a_t,
F_t = Z_t P_t Z_t^T + H_t,
a_{t|t} = a_t + P_t Z_t^T F_t^{−1} ν_t,
P_{t|t} = P_t − P_t Z_t^T F_t^{−1} Z_t P_t,
a_{t+1} = T_t a_t + K_t ν_t,
P_{t+1} = T_t P_t (T_t − K_t Z_t)^T + R_t Q_t R_t^T,

for t = 1, ..., T, where K_t = T_t P_t Z_t^T F_t^{−1} is called the Kalman gain.
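For the local level model (2.19)-(2.20) the filtering equations above collapse to scalar recursions. The sketch below implements them directly, with illustrative noise variances and a large (diffuse-like) initial variance P0 standing in for a proper initialization.

```python
import numpy as np

def kalman_filter_llm(y, sigma_eps2, sigma_eta2, a0=0.0, P0=1e6):
    """Kalman filter for the local level model (2.19)-(2.20), i.e. the LG-SSM
    (2.18) with Z_t = T_t = R_t = 1, H_t = sigma_eps2 and Q_t = sigma_eta2."""
    T = len(y)
    a, P = a0, P0
    a_filt = np.empty(T)
    P_filt = np.empty(T)
    for t in range(T):
        v = y[t] - a                  # prediction error nu_t
        F = P + sigma_eps2            # its variance F_t
        K = P / F                     # Kalman gain (T_t = Z_t = 1)
        a_filt[t] = a + P * v / F     # filtered mean a_{t|t}
        P_filt[t] = P - P * P / F     # filtered variance P_{t|t}
        a = a + K * v                 # one-step-ahead mean a_{t+1}
        P = P * (1 - K) + sigma_eta2  # one-step-ahead variance P_{t+1}
    return a_filt, P_filt

# Quick check on simulated data (hypothetical example values).
rng = np.random.default_rng(9)
level = np.cumsum(rng.normal(0, 0.1, size=300))
y = level + rng.normal(0, 0.5, size=300)
a_filt, _ = kalman_filter_llm(y, 0.5**2, 0.1**2)
print("RMSE of noisy observations:", np.sqrt(np.mean((y - level) ** 2)))
print("RMSE of filtered level    :", np.sqrt(np.mean((a_filt - level) ** 2)))
```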

After the filtering distributions are obtained, we can use the calculated state sequences to compute the smoothing distributions, p_θ(α_t | y_{1:T}), where we condition on the whole data sequence. Another approach is to use the forward filtering results together with simulations that (similar to the Kalman smoother) work backward in time, to obtain sample realizations from the smoothing distribution. These approaches are called forward filtering - backward smoothing and forward filtering - backward simulation, respectively, see Durbin and Koopman (2012).

4.4 Sequential Monte Carlo

SSMs are often non-linear, non-Gaussian, or both, in which case the Kalman filter does not provide an exact solution to the right problem. In this case, we have to resort to approximate methods, and the standard choice is then sequential Monte Carlo (SMC), also known as the particle filter (PF), Gordon et al. (1993) and Kitagawa (1996). The PF, in a way, mimics the approach of the KF by sequentially updating the filtering distributions forward in time using simulated so-called particles {a_t^(i)}_{i=1}^N.

In principle we can (as in the case of the LG-SSM) calculate the filtering distribution using Bayes' theorem as

p_θ(α_t | y_{1:t}) = p_θ(y_t | α_t) p_θ(α_t | y_{1:t−1}) / p_θ(y_t | y_{1:t−1}),   (4.5)

where p_θ(α_t | y_{1:t−1}) is given in (4.4). The problem is that the integral is intractable in general; however, it can be approximated to desired precision via importance sampling, see e.g. Robert and Casella (2005). Assuming that we possess an empirical approximation of the previous filtering distribution, consisting of a weighted sample {a_{t−1}^i, ω_{t−1}^i}_{i=1}^N, we have

p̂_θ(α_{t−1} | y_{1:t−1}) = Σ_{i=1}^N ω_{t−1}^i δ_{a_{t−1}^i}(α_{t−1}),   (4.6)

where δ denotes the Dirac delta (point mass) function.

Inserting (4.6) into (4.4) gives the approximation of the predictive distribution:

p̂_θ(α_t | y_{1:t−1}) = ∫ p_θ(α_t | α_{t−1}) Σ_{i=1}^N ω_{t−1}^i δ_{a_{t−1}^i}(α_{t−1}) dα_{t−1} = Σ_{i=1}^N ω_{t−1}^i p_θ(α_t | a_{t−1}^i).   (4.7)

Using (4.6) together with the known structure of the observation equation we can now evaluate the filtering distribution up to the proportionality constant p(y_t | y_{1:t−1}). That is,

p_θ(α_t | y_{1:t}) ∝ Σ_{i=1}^N ω_{t−1}^i p_θ(y_t | α_t) p_θ(α_t | a_{t−1}^i),   (4.8)

which means that we may target it with another importance sampler. In Paper II the bootstrap particle filter (BPF), which is the most commonly used PF, is used to simulate from the filtering distributions. In the BPF we choose the new proposal density to be the predictive distribution, i.e. q(α_t | y_{1:t}) = p̂_θ(α_t | y_{1:t−1}). This turns out to be a computationally convenient choice since the weights of the new sample of particles are calculated based only on the observation equation as

ω̃_t^i = p_θ(y_t | a_t^i),  ω_t^i = ω̃_t^i / Σ_{i=1}^N ω̃_t^i.   (4.9)

The fact that we may use the Markov structure of the state equation to propagate the particle system forward in time is both intuitive and computationally fast. However, it is inefficient in the sense that the latest observation y_t is not used to construct the proposal kernel, and there exist more efficient methods.

For an overview of SMC, see e.g. Doucet et al. (2013).
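A bootstrap particle filter for the SV model (3.8) can be written in a few lines. The sketch below (illustrative settings, not the implementation used in Paper II) propagates particles through the state equation, weights them with the observation density as in (4.9), and resamples; it also accumulates the log-likelihood estimate that is useful when estimating the fixed parameters.

```python
import numpy as np

def bootstrap_pf_sv(y, beta, phi, sigma_eta, N=5000, seed=0):
    """Bootstrap particle filter for the SV model (3.8). Returns the filtered
    means E[alpha_t | y_{1:t}] and an estimate of the log-likelihood."""
    rng = np.random.default_rng(seed)
    T = len(y)
    particles = rng.normal(0.0, sigma_eta / np.sqrt(1 - phi**2), size=N)  # stationary init
    alpha_filt = np.empty(T)
    loglik = 0.0
    for t in range(T):
        # Propagate through the state equation (the proposal in the BPF).
        particles = phi * particles + rng.normal(0.0, sigma_eta, size=N)
        # Weight with the observation density p(y_t | alpha_t), cf. (4.9).
        sd = beta * np.exp(particles / 2)
        logw = -0.5 * np.log(2 * np.pi * sd**2) - 0.5 * (y[t] / sd) ** 2
        w = np.exp(logw - logw.max())
        loglik += logw.max() + np.log(w.mean())   # contribution log p(y_t | y_{1:t-1})
        w /= w.sum()
        alpha_filt[t] = np.sum(w * particles)
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choice(particles, size=N, p=w)
    return alpha_filt, loglik

# Hypothetical example: filter data simulated from the same model.
rng = np.random.default_rng(10)
T, beta, phi, sigma_eta = 500, 1.0, 0.95, 0.2
alpha = np.zeros(T); y = np.zeros(T)
for t in range(1, T):
    alpha[t] = phi * alpha[t - 1] + rng.normal(0, sigma_eta)
    y[t] = beta * np.exp(alpha[t] / 2) * rng.normal()
alpha_hat, ll = bootstrap_pf_sv(y, beta, phi, sigma_eta)
print("corr(filtered, true alpha):", np.corrcoef(alpha_hat, alpha)[0, 1])
print("log-likelihood estimate   :", round(ll, 1))
```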

4.5 Bayesian optimization

Bayesian optimization (BO) (Mockus, 1994; Mockus et al., 1978) is another iterative optimization approach that originates from the machine learning literature. BO is particularly useful in situations where we need to optimize a costly, and potentially noisy, function f over a small to moderately large-dimensional parameter space. These circumstances arise in many areas of statistics. One typical case is when we want to maximize a simulation analogue of the marginal likelihood w.r.t. some model choices, for example a set of hyperparameters as in Paper III. Another example is when we want to estimate the fixed parameters in SMC, as in the second paper.

The distinction from standard optimization techniques is that BO treats f as an unknown object on which we can perform Bayesian inference. That is, we can construct a probability model for the function and thus quantify our uncertainty about it. The most common way of constructing such probability models over functions is to use a Gaussian process (GP), see Rasmussen and Williams (2006). A probability model for f, in conjunction with a so-called acquisition strategy (i.e. a rule for selecting the next evaluation point), makes up the two building blocks of BO.

One of the more popular acquisition strategies is the expected improvement (EI):

EI(x) = (m(x) − f_max) Φ((m(x) − f_max) / s(x)) + s(x) φ((m(x) − f_max) / s(x)),   (4.10)

where Φ and φ denote the cdf and pdf of the standard normal distribution, respectively, f_max is the currently highest function evaluation, and m(x) and s(x) are the estimated mean and standard deviation of f at the candidate evaluation point x produced by our model. A simple heuristic way to explain BO is by the following steps:

1. Compute the posterior distribution for f based on a set of function evaluations {f(x_i)}_{i=1}^k at the evaluated points {x_i}_{i=1}^k.

2. Use the model to maximize the utility function defined by the acquisition strategy and select a new evaluation point accordingly.

3. Evaluate the selected point and add it to the information set in step 1.

4. Update the posterior of f on the new information set.

5. Repeat steps 2-4 until convergence.

An easily accessible tutorial on Bayesian optimization can be found in e.g. Brochu et al. (2010).
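To make the loop concrete, the following toy sketch (a hypothetical one-dimensional example, not the noise-adaptive algorithm of Paper III) fits a zero-mean GP with an RBF kernel to the evaluations made so far, maximizes the EI criterion (4.10) over a grid, and evaluates the selected point; the objective here is an arbitrary stand-in for a costly marginal likelihood.

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b, ls=0.3):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, f, Xs, jitter=1e-6):
    """GP posterior mean m(x) and std s(x) at candidate points Xs
    (zero prior mean, RBF kernel, noise-free evaluations)."""
    K = rbf(X, X) + jitter * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kss = rbf(Xs, Xs)
    sol = np.linalg.solve(K, Ks)
    m = sol.T @ f
    s = np.sqrt(np.clip(np.diag(Kss) - np.diag(Ks.T @ sol), 1e-12, None))
    return m, s

def expected_improvement(m, s, f_max):
    z = (m - f_max) / s
    return (m - f_max) * norm.cdf(z) + s * norm.pdf(z)   # cf. (4.10)

# Toy objective standing in for a costly (log) marginal likelihood of one hyperparameter.
f_true = lambda x: -(x - 0.6) ** 2 + 0.05 * np.sin(20 * x)

grid = np.linspace(0, 1, 200)
X = np.array([0.1, 0.9])             # initial evaluation points
f = f_true(X)
for it in range(10):
    m, s = gp_posterior(X, f, grid)
    ei = expected_improvement(m, s, f.max())
    x_new = grid[np.argmax(ei)]      # acquisition step: maximize EI
    X = np.append(X, x_new)
    f = np.append(f, f_true(x_new))  # "costly" function evaluation

print("best x found:", X[np.argmax(f)], " best value:", f.max())
```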


5. Summary of papers

Paper I

In this paper, we develop a time series filter that removes heteroscedasticity of unspecified form from univariate time series. The basic idea is to obtain a low-frequency estimate of the moving standard deviation process and use it to scale the time series of interest. The filter can be seen as an improved and more flexible version of the filter proposed by Stockhammar and Öller (2012).

We investigate the properties of the filter via a series of simulation studies and find that it works well in the sense that it removes the unwanted heteroscedasticity. This is done without disrupting other time series properties such as the autocorrelation structure.

We find that pre-filtering heteroscedastic ARMA processes improves the estimation efficiency of the model parameters. For example, the filtering approach is nearly as efficient as the ARMA-GARCH model even when the underlying process is an ARMA-GARCH. Furthermore, it is more efficient if the variance series contains level shifts.

We show on quarterly US GDP growth that the filter can be used to construct more accurate confidence intervals and, under some circumstances, improve forecast precision.

Paper II

The filter in the first paper can be useful for practitioners who want a simple way to deal with heteroscedasticity while still using simple ARMA models. However, most modern work considers multivariate time series, and it is thus a natural next step to bring the ideas of the first paper into a multivariate context. As mentioned in Section 2, the workhorse of multivariate time series analysis is the VAR model, which is flexible but suffers from the curse of dimensionality.

The over-fitting problem is not remedied (but quite the opposite) by introducing even more parameters to jointly model the covariance matrix, and thus a multivariate generalization of the filter in Paper I could be useful.

In this paper a multivariate variance stabilizing filter is presented which is shown to be useful for both classical and Bayesian analysis of the VAR.


From the classical viewpoint, it is shown that removing the heteroscedasticity from simulated stochastic volatility VARs reduces the standard errors of the coefficient estimates and provides a more robust lag-length selection. In Bayesian VARs we instead see an improvement in computational convenience. The multivariate filter outputs an estimate of the variance series that is close to the one obtained from the stochastic volatility model using MCMC, and this estimate can be conditioned on. It can thus be used for estimating a BVAR while accounting for stochastic volatility. Since we condition on the estimate from the filter, we can use natural-conjugate priors to estimate the BVAR and thus avoid MCMC. It is shown in a real data study on quarterly US macroeconomic data that the filtering approach produces out-of-sample forecast distributions that are closer to those of the MCMC approach than when the heteroscedasticity is ignored.

Paper III

The commonly used solution to the over-fitting problems in the VAR is to use Bayesian methods as a shrinking device. This is done by formulating an informative prior that pulls the coefficients of the VAR closer to some less complicated model, for example n independent AR(1) processes. To specify the degree of shrinkage for the BVAR the researcher has to select a number of hyperparameters, for which the optimal values are hard to guess. A way to incorporate the data in the selection process is to pick the hyperparameters that maximize the marginal likelihood. However, we typically only have access to noisy simulation-based estimates of the marginal likelihood, which are costly to obtain. This situation is common in econometrics, where the marginal likelihood is frequently computed via MCMC with the precision determined by the number of MCMC draws.

In this paper a new Bayesian optimization framework is introduced for when the user can control the balance between computational effort and the noise level of the function evaluations. The proposed acquisition strategy allows the optimizer to occasionally try cheap and noisy function evaluations to explore the function space and find the optimum faster.

We illustrate the algorithm in a real data application on US macroeconomic data, where we find optimal hyperparameters for the steady-state BVAR of Villani (2009). We show that our approach finds better (in terms of the marginal likelihood) hyperparameters than a grid search, in a small fraction of the time. We also show that our acquisition strategy is faster than regular Bayesian optimization.


Paper IV

The steady-state BVAR is estimated by Gibbs sampling, which provides a troublesome bottleneck in the increasingly popular large-scale BVARs with many predictors. This makes the model impractical, especially in model comparison situations or for recursive forecasting exercises where the Gibbs sampler has to be executed many times.

In this paper, we present a variational inference (VI) approach using a structured mean-field approximation of the posterior distribution for the parameters of the steady-state BVAR. We show that the VI approximation provides good approximations to the parameter posteriors under realistic conditions and that the forecast distributions are close to identical to those obtained by Gibbs sampling. We also show that the VI approach can be used as a robust model selection tool and that it scales much better to large data sets than Gibbs sampling.


References
