
Measuring the information content of Riksbank meeting minutes

Sofia Fröjd

March, 2019


Measuring the information content of Riksbank meeting minutes is a project done in the course Master's thesis in Engineering Physics, 30 ECTS, at the Department of Physics at Umeå University.

Student:

Sofia Fröjd, sofr0024@student.umu.se

External supervisors:

Kristofer Eriksson, kristofer.eriksson@nordea.com
Mats Hydén, mats.hyden@nordea.com

Internal supervisor:

Andreas Nordenström, andreas.nordenstrom@umu.se

Examiner:

Markus Ådahl, markus.adahl@umu.se


“All models are wrong, but some are useful”

– George Box


Abstract

As the amount of information available on the internet has increased sharply in recent years, methods for measuring and comparing text-based information are gaining popularity on financial markets. Text mining and natural language processing have become important tools for classifying large collections of texts or documents. One field of application is topic modelling of the minutes from central banks' monetary policy meetings, which tend to cover topics such as ”inflation”, ”economic growth” and ”rates”. The central bank of Sweden is the Riksbank, which holds six annual monetary policy meetings where the members of the Executive Board decide on the new repo rate. Two weeks later, the minutes of the meeting are published and information regarding future monetary policy is given to the market in the form of text. This information is unknown to the market before the release, and thus has the potential to be market-sensitive.

Using Latent Dirichlet Allocation (LDA), an algorithm for uncovering latent topics in documents, the topics in the meeting minutes should be possible to identify and quantify. In this project, eight topics were found covering, among others, inflation, rates, household debt and economic development.

An important factor in analysis of central bank communication is the underlying tone in the discussions. It is common to classify central bankers as hawkish or dovish. Hawkish members of the board tend to favour tightening monetary policy and rate hikes, while more dovish members advocate a more expansive monetary policy and rate cuts. Thus, analysing the tone of the minutes can give an indication of future moves of the monetary policy rate.

The purpose of this project is to provide a fast method for analysing the minutes from the Riksbank monetary policy meetings. The project is divided into two parts. First, an LDA model was trained to identify the topics in the minutes, which was then used to compare the content of two consecutive meeting minutes. Next, the sentiment was measured as a degree of hawkishness or dovishness. This was done by categorising each sentence in terms of its content, and then counting words with hawkish or dovish sentiment. The resulting net score gives larger values to more hawkish minutes and was shown to follow the repo rate path well. At the time of the release of the minutes, the new repo rate is already known, but the net score does give an indication of the stance of the board.


Sammanfattning

The amount of information available on the internet has increased sharply in recent years, which has made methods for measuring and comparing text-based information increasingly popular on the financial markets. Text mining and natural language processing (NLP) are now important tools for classifying large collections of documents. One area of application is topic modelling of the minutes from central banks' monetary policy meetings, which tend to cover topics such as ”inflation”, ”economic growth” and ”rates”. Sweden's central bank is the Riksbank, which holds six monetary policy meetings annually, at which the members of the Executive Board decide, among other things, on the new repo rate. Two weeks later, the minutes of the meeting are published, giving the market information about future monetary policy in the form of text. This information has been unknown to the market before publication, and therefore has the potential to move markets.

Using Latent Dirichlet Allocation (LDA), the topics in the meeting minutes can be identified and quantified.

An important part of analysing central bank communication is the underlying tone. It is common to classify board members (and central banks) as hawkish or dovish, where hawkish members tend to advocate tighter monetary policy and higher rates, while dovish members prefer a more expansive monetary policy and lower rates. By studying the tone of the minutes, one can thus form a clearer picture of future monetary policy.

The purpose of this project is to create a method for fast analysis of the minutes from the Riksbank's monetary policy meetings. The project is divided into two parts. First, an LDA model was trained to find the topics treated in the minutes, which are used to compare new minutes with earlier ones in terms of content. In the second part of the project, the tone of the minutes is measured as a degree of hawkishness or dovishness, based on a simple but well-functioning model built on counting specific words. This resulted in a net score that gives higher values to more hawkish minutes, and was shown to follow the repo rate well. When the minutes are published, the new repo rate is already known, but the net score gives a good indication of the Executive Board's stance.


Acknowledgement

Working on this project has been a fantastic experience and I have learned so much during this time.

So before diving into the thesis, I would especially like to thank Kristofer Eriksson, Mats Hydén, Kristin Magnusson Bernard and Inge Klaver at Nordea for giving me the opportunity to work on this project, for always being there to help me, for all the discussions and for always listening to my ideas.

Sofia Fröjd, Umeå University, March 2019


Contents

1 Introduction
1.1 Background
1.1.1 Previous studies
1.2 Purpose and goal
1.3 Delimitations
1.4 Disposition

2 Theory
2.1 Latent Dirichlet allocation
2.2 Inference and parameter estimation
2.2.1 Variational Bayesian inference
2.2.2 Online Variational Inference
2.2.3 Model parameter estimation
2.3 Perplexity
2.4 Limitations of LDA

3 Method
3.1 Preprocessing the meeting minutes
3.2 How to train your model
3.3 Time series
3.4 Sentiment analysis

4 Results
4.1 Topics
4.2 Comparing minutes
4.3 Net score of the Riksbank

5 Discussion
5.1 The assumption of exchangeability
5.2 The number of sections in the corpus
5.3 Absence of relevant topics in sections
5.4 Time aspect
5.5 Stop lists
5.6 Misclassification
5.7 Sentiment analysis limitations

6 Conclusions

A Mathematical introduction
A.1 Bayesian statistics
A.2 Multinomial distribution
A.3 The Dirichlet distribution
A.4 Kullback-Leibler divergence

B Parameter estimation

C Topics

D Tables
D.1 Hawkish and dovish words
D.2 Exchangeability


Abbreviations:

EB    Executive Board (of the Riksbank)
NLP   Natural language processing
LDA   Latent Dirichlet allocation
BOW   Bag-of-words
VBI   Variational Bayesian inference
MCMC  Markov chain Monte Carlo
KL divergence  Kullback-Leibler divergence
QE    Quantitative easing

Notations:

θd    The distribution of topics in document d. A vector of length K.
α     Dirichlet prior on θd. A vector of length K.
βk    The distribution of words in topic k. A vector of length V.
η     Dirichlet prior on βk.
K     Number of topics represented in the corpus.
V     Length of the vocabulary.
D     Number of documents in the corpus.
Nd    Number of words in document d.
wd    The sequence of words in document d. A vector of length Nd.
zd    The assignment of topics for each word in document d. A vector of length Nd.
γdk   Variational parameter for θd: the probability of topic k in document d.
φdnk  Variational parameter for zd: the probability that the n:th word in document d is generated by topic k.
λkv   Variational parameter for β: the probability of word v from the vocabulary in topic k.


Chapter 1

Introduction

1.1 Background

Measuring and comparing text-based information has long been used on financial markets, e.g. in attempts to predict stock prices or to extract the sentiment in seminal papers [1]. With the vast amount of information available on the internet today, text mining and natural language processing (NLP) have become important tools for extracting information and classifying large collections of texts or documents. One field of application is analysis of minutes from central banks' monetary policy meetings. These minutes tend to treat different topics such as ”inflation”, ”growth” or ”rate decisions”, and quantifying these topics manually and quickly can be challenging. NLP can be used to find the latent topics in meeting minutes in a more objective way. Over time, the topics should occupy varying amounts of time in the meetings, and it should be possible to both identify and quantify them.

The Swedish currency and interest rate market has a monthly turnover corresponding to about two times the entire Swedish GDP. One important factor driving these markets is the expectations on the repo rate, which is decided by the Riksbank [2]. The Riksbank is the central bank of Sweden and has the responsibility of keeping inflation stable and close to a target. The repo rate is the rate at which banks can borrow or deposit money with the Riksbank, and it in turn affects other interest rates, economic activity and inflation [3].

The Executive Board of the Riksbank (EB) holds six annual monetary policy meetings where the members discuss the current economic situation and development, share their views on the future and decide on the new repo rate. Communication to the public is an important tool for the Riksbank and many other central banks. Clearly explaining their reasoning helps households, banks and companies to understand and predict future monetary policy, and to take actions based on it. Around two weeks after the monetary policy meeting, the meeting minutes are published and information regarding future monetary policy is given to the market in the form of text. Since this information has been unknown prior to the release, it has the potential to be market-sensitive. Breaking down the minutes into their underlying topics can be a fast way of analysing the content at the release [4].

An important factor in the analysis of the meeting minutes from central banks is the underlying tone in the text, which can give an indication of future moves of the monetary policy rates. Central bankers are commonly classified as more hawkish or more dovish. Hawkish members of the board favour raising rates, slowing economic growth and tightening monetary policy. Dovish members, on the other hand, advocate rate cuts, stimulating economic growth and a more expansionary monetary policy. In the same way, central banks can be classified in terms of a degree of hawkishness or dovishness [5].


1.1.1 Previous studies

Many papers explore the content of central bank meeting minutes. One of them is by N. Jegadeesh and D. Wu (2015), who analysed the meeting minutes from the Federal Reserve (Fed), the central bank of the United States. They decomposed the Federal Open Market Committee (FOMC) meeting minutes into distributions of hidden topics, with the purpose of determining the informativeness of the content. Using Latent Dirichlet Allocation (LDA), the minutes could be described as a distribution over the topics ”growth”, ”inflation”, ”financial markets” and ”policy”. The tone in the minutes was calculated by counting the occurrences of positive and negative words, and taking the difference between the two as the net score. In addition, uncertainty words were counted as an uncertainty score. This procedure was carried out both for each topic and on the minutes in full, without topic classification. The words in each category come from the Harvard IV Dictionary and the financial tonal list developed by Loughran and McDonald (2011). Market reaction was used as an objective evaluation of the informativeness of the minutes and the topics, and a significant relation between topic content and market volatility was found [6].

Many studies have explored the sentiment in central bank communication. For example, S. Sharpe et al. (2017) measured the degree of optimism or pessimism in texts describing Fed forecasts, linking the tonality to, among other things, GDP growth and unemployment in the upcoming quarters. Their method of measuring the tone was based on two lists containing positive and negative words, where the lists again came from the Harvard IV Dictionary, with some words manually removed. They measured the tone of the overall document, without taking double negatives into consideration, in order to reduce the amount of judgement needed.

Each word was weighted according to the tf-idf¹ scheme, where the weight for each word is equal to the number of times the word appears in a document divided by the number of times it has appeared in the previous documents. In this scheme, infrequent words are considered more informative and are given a higher weight. They found that the tonality did have predictive power on both GDP growth and unemployment, as well as on equity prices and monetary policy surprises. The paper concluded that the narrative in economic forecasts did contain valuable information [7].
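The general idea of tf-idf weighting can be illustrated with a small sketch. Note that this uses the standard idf definition, log(D / document frequency); Sharpe et al.'s exact variant differs slightly, and the tiny corpus below is made up purely for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute tf-idf weights for each document in a small corpus.

    tf  = raw count of the term in the document,
    idf = log(D / number of documents containing the term),
    so terms frequent across the corpus get a lower weight.
    """
    D = len(docs)
    # document frequency: in how many documents does each term appear?
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(D / df[t]) for t in tf})
    return weights

docs = [["inflation", "growth", "inflation"],
        ["rates", "growth"],
        ["inflation", "rates", "policy"]]
w = tfidf(docs)
# "policy" appears in only one of the three documents,
# so it receives the highest idf factor, log(3).
```
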

Sentiment analysis has also been applied to the Riksbank, by M. Apel and M. Blix Grimaldi (2012). They investigated to what extent the central bank communication reflects the full spectrum of opinions from each member of the board, and whether the information regarding each individual member's view makes policy easier to understand and predict. Their approach was to measure the tone in the minutes from the monetary policy meetings as a degree of hawkishness or dovishness, called a net score. Instead of just counting positive or negative words in the text, they searched for combinations of nouns and adjectives to capture the tone in the text, such as ”higher inflation” or ”slower growth”, where the first would be classified as hawkish and the second as dovish. The purpose of choosing two-word combinations is to avoid the problem of single words being used in different ways in different contexts. Their conclusion was that the information in the minutes is useful when it comes to predicting future monetary policy actions [8].
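A counting approach of this kind can be sketched in a few lines. The word lists below are hypothetical English stand-ins; Apel and Blix Grimaldi's actual lists are much longer and in Swedish.

```python
import re

# Hypothetical hawkish/dovish two-word combinations, for illustration only.
HAWKISH = {("higher", "inflation"), ("stronger", "growth"), ("rate", "hike")}
DOVISH = {("lower", "inflation"), ("slower", "growth"), ("rate", "cut")}

def net_score(sentences):
    """Count hawkish minus dovish bigrams, normalized by their total."""
    hawk = dove = 0
    for sentence in sentences:
        # lowercase and strip punctuation before forming bigrams
        words = re.findall(r"[a-zåäö]+", sentence.lower())
        for bigram in zip(words, words[1:]):
            if bigram in HAWKISH:
                hawk += 1
            elif bigram in DOVISH:
                dove += 1
    total = hawk + dove
    return (hawk - dove) / total if total else 0.0

minutes = ["The board expects higher inflation and stronger growth.",
           "One member argued for a rate cut."]
score = net_score(minutes)  # (2 - 1) / 3: mildly hawkish
```

A positive score then marks more hawkish minutes and a negative score more dovish ones, in the same spirit as the net score used in this project.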

In 2017, E. Tobback et al. measured the tone in news articles reporting on the European Central Bank (ECB) monetary policy decisions, as a degree of hawkishness or dovishness, called an HD index. They used two different methods, semantic orientation (SO) and Support Vector Machine (SVM) classification, which they complemented with LDA to pick out the dominant topics in the articles. They found that SVM showed more moderate levels of tonality, although SVM and SO were closely correlated. For detecting the tone in the news articles, they concluded that SVM performed better than SO, because SVM captured the dovish tones better during the sovereign debt crisis around 2011. From LDA they found that the focus of the media shifted over time from the actual rate decisions to non-standard monetary policy measures, such as the asset purchase programme. When assessing media's perception of the ECB, they concluded that a more suitable approach would be to use supervised text classification. The paper gave some examples of applications of the HD index, e.g. anticipating future monetary policy or measuring how ECB communication is perceived by media and other observers [9].

¹ Tf-idf stands for term frequency-inverse document frequency.


1.2 Purpose and goal

The purpose of this project is to provide a method for fast analysis of the minutes from the Riksbank’s monetary policy meetings at the release. The analysis will be based on topic modelling using Latent Dirichlet Allocation (LDA) to identify the topics discussed in the minutes, which can then be used to compare the content of two consecutive minutes. The next step is then to measure the sentiment in the minutes as a degree of hawkishness or dovishness, in order to get an indication of the stance of the Executive Board.

1.3 Delimitations

This project will only consider the monetary policy meeting minutes from the Riksbank, written in Swedish.

Training will only use minutes within the time span of January 1999 to October 2018. The minutes from December 2018 and February 2019 will be used in testing.

1.4 Disposition

Chapter 2 introduces Latent Dirichlet Allocation, mentions some different approaches to classifying text corpora, and ends with a review of some of the limitations of LDA. Chapter 3 goes through the preprocessing of the meeting minutes, how they are used to train an LDA model to find topics, and how the time series of the topic distribution in the minutes is constructed. The chapter also describes how the sentiment in the text was captured. The resulting topics, time series and net score are presented in Chapter 4. A discussion of the performance of the model is found in Chapter 5. The final conclusions of this project are given in Chapter 6.

Chapter 2 uses some mathematical concepts that might not be familiar to all readers, so Appendix A gives short introductions to, among other things, Bayesian statistics and the Dirichlet distribution, which are key concepts in LDA.


Chapter 2

Theory

This chapter starts with an introduction to Latent Dirichlet Allocation. First, a short note on applications of LDA is given, followed by a description of the generative process of the model and finally an explanation of how the model finds latent topics in a corpus. The concept of perplexity is introduced as a method of evaluating the model, and the chapter ends with a review of some limitations of LDA. Many concepts mentioned in this section, such as Bayesian statistics, the Dirichlet distribution and the multinomial distribution, are introduced in Appendix A.

2.1 Latent Dirichlet allocation

A common approach to topic modelling is to use Latent Dirichlet Allocation (LDA), first introduced by Blei et al. in 2003 [10]. The LDA model has many applications in a variety of fields, from investigating the presence of mental health articles in newspapers, to finding recommendations for articles to read or studying the history of scientific ideas [11][12]. Nor is it tied only to text: LDA can also be applied in image clustering and collaborative filtering [10]. The LDA model is also very useful for uncovering the semantic structure of large collections of documents, or corpora. We can think of the documents in a corpus as mixtures of a number of unobservable, or latent, topics, where each topic has an unobservable distribution of words. The only observable quantities in the corpus are the words. By using LDA we wish to find these unobservable distributions.

LDA is a generative topic model, which assumes that all documents in a corpus are generated by the same statistical process, and then learns how to generate similar documents. There are different learning approaches, and this chapter will focus on learning through Bayesian models. LDA is a three-level hierarchical Bayesian model, which is suitable when data is available on several levels, as is the case for documents in the corpus and words in the documents: each document consists of a distribution over topics, and each topic consists of a distribution over words. Finally, the LDA model is unsupervised, meaning that there is no assignment of words to topics beforehand; the model is only told the number of topics that should be found in the corpus.

Let a corpus consist of D documents containing K topics, where all words present in the corpus come from a vocabulary of V words. The total number of words in document d is given by Nd. Each document in the corpus is assumed to be generated by the same process:

1. The mixture of topics in a document is given by θd, which follows θd ∼ Dir(α), where α is the Dirichlet prior and is the same for the entire corpus. Both α and θd are vectors of length K with all αk > 0. We can see α as a prior belief about each topic's probability and a measure of how spread out the topics are across the documents; larger values of α give a more even distribution of the topics. In topic modelling, reasonable values of α are usually smaller than 1 [13]. θd takes values on the (K − 1)-simplex (further explained in Appendix A.3) and has the density function

   p(θd | α) = [Γ(Σ_{k=1}^{K} αk) / Π_{k=1}^{K} Γ(αk)] Π_{k=1}^{K} θdk^(αk − 1),   (2.1)

   where Γ is the gamma function and θdk is the probability that document d contains topic k.

2. Each document is a sequence of Nd words, so a document can be described as wd = {wd1, . . . , wdNd}, where each word wdn is a vector of length V with exactly one component equal to 1 and all others equal to zero. This can also be denoted with superscripts, where one component wdn^v = 1 and all others wdn^u = 0 for u ≠ v. For each word index n in the document:

   (a) Choose a topic zdn ∼ Multinomial(θd). zd is then an Nd × K matrix containing the topic for each word. Each zdn is a vector of length K that is equal to 1 for one topic k and 0 everywhere else. As above, we can write the topic assignment with superscripts, where one component zdn^k = 1 and all others zdn^i = 0 for i ≠ k.

   (b) For topic zdn, choose the word wdn from p(wdn | zdn, β), where β ∼ Dir(η). β is a K × V matrix with the probability that the k:th topic contains the v:th word, and is assumed to follow a Dirichlet distribution parameterized by the scalar Dirichlet prior η [14]. βk has the density function

   p(βk | η) = [Γ(V η) / Γ(η)^V] Π_{v=1}^{V} βkv^(η − 1).   (2.2)

By using LDA we wish to find the distribution of words in the topics, β, and the distribution of topics in a document, θd. Given a corpus, LDA assumes that all documents have been generated by the process above and then tries to estimate the parameter values that best explain the corpus. In the generative process, documents are represented as vectors of words, but there is no consideration of the order in which the words appear: the words in a document are exchangeable. This allows us to use the bag-of-words (BOW) assumption, where a document can be represented as a list of unique words together with a count of how many times each appears in the document. There is also an assumption of exchangeability of the documents in the corpus; the order in which the documents appear is of no importance. The generative process can also be explained graphically, as in Fig. 2.1, which shows the three levels of the model.
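The generative process above can be sketched in a few lines of Python; the dimensions and prior values below are toy numbers chosen for illustration, not the ones used in this project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: K topics, a vocabulary of V words, documents of N_d words.
K, V, N_d = 3, 8, 20
alpha = np.full(K, 0.5)   # Dirichlet prior on the topic mixture (values < 1)
eta = 0.1                 # scalar Dirichlet prior on the word distributions

# beta: K x V matrix, one word distribution per topic (rows sum to 1)
beta = rng.dirichlet(np.full(V, eta), size=K)

def generate_document():
    """Generate one document following the two-step LDA process."""
    theta_d = rng.dirichlet(alpha)          # step 1: topic mixture theta_d
    words = []
    for _ in range(N_d):
        z = rng.choice(K, p=theta_d)        # step 2a: draw a topic z_dn
        w = rng.choice(V, p=beta[z])        # step 2b: draw a word w_dn from beta_z
        words.append(w)
    return theta_d, words

theta_d, words = generate_document()
```

Here each word is stored as a vocabulary index rather than a one-hot vector of length V, which is equivalent and more compact.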

Figure 2.1: Plate notation of LDA, showing the three levels of the model. α and η are corpus-level parameters, θd acts on the document level, βk on topics, and zdn and wdn are parameters for each word in a document.


In this notation, plates represent repeated variables such as documents in the corpus and words in documents.

The observable variables are shaded; the others are latent. The nodes represent random variables, such as the model parameters. The arrows show the dependency between the nodes. The outer plate represents one of the D documents in the corpus, and the inner plate describes the Nd words in that document. The outer parameters α and η are corpus-level parameters and are the same for all documents. βk is sampled once per topic and the topic mixture θd is sampled once per document. For each word in the document, zdn and wdn are sampled.

Given α and η, the joint distribution of θd, zd, wd and β in a document is given by

p(θd, zd, wd, β | α, η) = p(θd | α) p(β | η) Π_{n=1}^{Nd} p(zdn | θd) p(wdn | zdn, β),   (2.3)

where p(θd | α) is given by Eq. (2.1) and p(β | η) is given by Eq. (2.2). The topic proportion p(zdn | θd) is simply θdk for the k giving zdn^k = 1. Last, p(wdn | zdn, β) is the probability of a word in a topic. By integrating Eq. (2.3) over θd and summing over zdn, the marginal distribution of a document is given by

p(wd | α, η) = p(β | η) ∫ p(θd | α) ( Π_{n=1}^{Nd} Σ_{zdn} p(zdn | θd) p(wdn | zdn, β) ) dθd.   (2.4)

2.2 Inference and parameter estimation

When using LDA, a key problem is to compute the posterior distribution of the latent variables θ and z in a document. The posterior is given by Bayes' theorem as

p(θd, zd, β | wd, α, η) = p(θd, zd, wd, β | α, η) / p(wd | α, η),   (2.5)

where the numerator is the joint distribution given by Eq. (2.3) and the denominator is the marginal distribution of a document given by Eq. (2.4). Writing out the denominator gives

p(wd | α, η) = p(β | η) [Γ(Σ_{k=1}^{K} αk) / Π_{k=1}^{K} Γ(αk)] ∫ ( Π_{k=1}^{K} θdk^(αk − 1) ) Π_{n=1}^{Nd} Σ_{k=1}^{K} Π_{v=1}^{V} (θdk βkv)^(wdn^v) dθd,

which is difficult to compute due to the coupling between θd and β [10]. There are multiple ways to deal with this problem. One is variational Bayesian inference (VBI), which is explained in the following section. Another is Markov Chain Monte Carlo (MCMC); a motivation for why VBI is chosen over MCMC can be found in Appendix A.1.

2.2.1 Variational Bayesian inference

The idea behind variational Bayesian inference (VBI) is to approximate the intractable posterior distribution by a simpler distribution, indexed by variational parameters. The process is to take a family of lower bounds on the log likelihood of a document and find the optimal values of the variational parameters by minimizing the Kullback-Leibler (KL) divergence, which is a measure of how similar two probability distributions are to each other, explained in more detail in Appendix A.4. Lower values of the KL divergence indicate that the distributions are more alike. When the optimal values of the variational parameters are found, it is then possible to approximate the model parameters.
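For discrete distributions the KL divergence is a short computation, illustrated here on made-up distributions to show that it is zero for identical distributions and grows as they diverge.

```python
import math

def kl_divergence(q, p):
    """D_KL(q || p) = sum_i q_i * log(q_i / p_i) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

p = [0.5, 0.3, 0.2]
identical = kl_divergence(p, p)               # 0.0 for identical distributions
close = kl_divergence([0.45, 0.35, 0.2], p)   # small positive value
far = kl_divergence([0.1, 0.1, 0.8], p)       # larger value
# close < far: the divergence grows as q moves away from p
```

Note that the KL divergence is not symmetric: D_KL(q || p) generally differs from D_KL(p || q), which is why the direction q relative to the true posterior p matters in VBI.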


To approximate the posterior distribution given by Eq. (2.5), we let the family of lower bounds be characterized by the variational distribution

q(θd, zd, β | γd, φd, λ) = Π_{k=1}^{K} Dir(βk | λk) q(θd | γd) Π_{n=1}^{Nd} q(zdn | φdn),

where the per-word topic assignment zd is parameterized by the multinomial φ, where φdnk gives the probability that the n:th word in document d is generated by topic k. The topic weights θd are parameterized by the Dirichlet prior γ, where γdk is the probability of topic k in document d. Finally, the posterior over topics β is parameterized by the Dirichlet parameter λ, where λkv is the probability of the v:th word in the k:th topic. The model with variational parameters can be described in plate notation as in Fig. 2.2.

Figure 2.2: Plate notation for the model with variational parameters, showing the variational parameters γ, φ, λ and their connection to the model parameters θ, z and β.

The optimizing parameters γ*d, φ*d and λ* for a document d are found by minimizing the KL divergence DKL. Thus, the optimization problem we wish to solve is

(γ*d, φ*d, λ*) = arg min_{(γ,φ,λ)} DKL( q(θd, zd, β) || p(θd, zd, β | wd, α, η) ).

A derivation of the optimal values of the variational parameters can be found in Appendix B. Using Jensen's inequality, we can find a lower bound on the log likelihood of a document:

log p(wd | α, η) ≥ Eq[log p(θd, zd, wd, β | α, η)] − Eq[log q(θd, zd, β)].   (2.6)

In Eq. (2.6), the difference between the left-hand side and the right-hand side is equal to the KL divergence, see Appendix B. Denoting the right-hand side as L(γd, φd, λ; α, η), Eq. (2.6) can be rewritten as

log p(wd | α, η) = L(γd, φd, λ; α, η) + DKL( q(θd, zd, β) || p(θd, zd, β | wd, α, η) ).

Since log p(wd | α, η) − L(γd, φd, λ; α, η) ≥ 0, maximizing the lower bound L with respect to γ, φ and λ will minimize the KL divergence, and we find the variational distribution q that is most similar to the true posterior distribution p. The optimizing parameters γd, φd and λ are found by taking the derivative of the lower bound and setting it equal to zero; the procedure is further explained in Appendix B. This gives the optimizing value of φdnk as

φdnk ∝ exp{ Eq[log θdk] + Eq[log βkv] },

which can be expanded to

φdnk ∝ exp{ ( Ψ(γdk) − Ψ(Σ_{i=1}^{K} γdi) ) + ( Ψ(λkv) − Ψ(Σ_{i=1}^{V} λki) ) },   (2.7)


where Ψ is the first derivative of the log Γ function. Here, β is drawn from a Dirichlet distribution with η as prior. Without η, we would have βkv = p(wdn^v = 1 | zdn^k = 1), and the equation above would be φdnk ∝ βkv exp{Ψ(γdk) − Ψ(Σ_{i=1}^{K} γdi)}. Since βkv is the probability of word v in topic k, any word in a document not represented in the vocabulary would then be assigned a probability of 0 by φdnk. The addition of a Dirichlet prior on β is called smoothing, and it is seen in Eq. (2.7) that words not included in the vocabulary will also be given a positive probability.

Following the same procedure as above, γdk and λkv are given by

γdk = αk + Σ_{n=1}^{Nd} φdnk,   (2.8)

λkv = η + Σ_{d=1}^{D} Σ_{n=1}^{Nd} φdnk wdn^v.   (2.9)

The optimizing parameters γd, φd and λ are found by an EM algorithm, presented in Algorithm 1. λ is randomly initialized once, and γdk is randomly initialized once for each new document. The E-step updates φdnk and γdk until convergence using Eqs. (2.7) and (2.8), respectively. The M-step then finds the optimizing value of λkv, given φdnk, by Eq. (2.9).

Algorithm 1 VBI
Initialize λ randomly
while improvement in L > threshold do
    E-step:
    for d = 1 to D do
        Initialize γdk randomly
        repeat
            Set φdnk ∝ exp(Eq[log θdk] + Eq[log βkv])
            Set γdk = αk + Σ_{n=1}^{Nd} φdnk
        until (1/K) Σ_{k=1}^{K} |change in γdk| < threshold
    end for
    M-step:
    Set λkv = η + Σ_{d=1}^{D} Σ_{n=1}^{Nd} φdnk wdn^v
end while

2.2.2 Online Variational Inference

VBI is unfortunately not well suited to large-scale corpora or to settings where new documents arrive in a stream, since it must pass through the entire corpus in each iteration. Instead, one can use online variational Bayesian inference for LDA (Online LDA), which is very similar to VBI but converges faster for large data sets and finds the topics just as well [15]. Online LDA utilizes an EM algorithm to find the optimizing parameters γdk and φdnk in the same way as in Section 2.2.1. The M-step then finds the optimizing λnew, given φdnk, from

λnew = (1 − ρd) λold + ρd λ̃,   (2.10)

where ρd assigns weights to the parameters λold and λ̃ and is given by

ρd = (τ0 + d)^(−κ),


where τ0 slows down early iterations of the algorithm and κ is the rate at which old values are forgotten as new documents are examined. λ̃ is the value that λ would have, given φ, if the corpus consisted of the one document d repeated D times:

λ̃kv = η + D Σ_{n=1}^{Nd} φdnk wdn^v.

The EM procedure is very similar to the one in VBI. We initialize values for γ and λ. The E-step finds the optimal values of φ and γ as before, but in the M-step we now find λ by Eq. (2.10). Parameter estimation for Online LDA is summarized in Algorithm 2.

Algorithm 2 Online LDA
Define ρd = (τ0 + d)^(−κ)
Initialize λ randomly
while improvement in L > threshold do
    E-step:
    for d = 1 to ∞ do
        Initialize γdk randomly
        repeat
            Set φdnk ∝ exp(Eq[log θdk] + Eq[log βkv])
            Set γdk = αk + Σ_{n=1}^{Nd} φdnk
        until (1/K) Σ_{k=1}^{K} |change in γdk| < threshold
    end for
    M-step:
    Compute λ̃kv = η + D Σ_{n=1}^{Nd} φdnk wdn^v
    Set λnew = (1 − ρd) λold + ρd λ̃
end while
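A single online M-step, Eq. (2.10), can be sketched as follows; the parameter values (τ0, κ, η) and the toy φ matrix are illustrative only.

```python
import numpy as np

def online_update(lam_old, phi, word_ids, d, D, V, eta=0.1, tau0=1.0, kappa=0.7):
    """One online M-step (Eq. 2.10): blend the old lambda with lambda-tilde.

    lambda-tilde is the lambda the corpus would imply if it consisted of
    document d repeated D times; rho_d = (tau0 + d)^(-kappa) down-weights
    old values as more documents arrive.
    """
    K = lam_old.shape[0]
    rho = (tau0 + d) ** (-kappa)
    # lambda-tilde_kv = eta + D * sum_n phi_dnk * w_dn^v
    lam_tilde = np.full((K, V), eta)
    for n, v in enumerate(word_ids):
        lam_tilde[:, v] += D * phi[n]
    return (1 - rho) * lam_old + rho * lam_tilde

K, V, D = 2, 5, 100
lam = np.ones((K, V))
phi = np.array([[0.9, 0.1],    # made-up per-word topic weights phi_dnk
                [0.8, 0.2]])   # for a two-word document
lam_new = online_update(lam, phi, word_ids=[0, 3], d=10, D=D, V=V)
```

Entries of λ for words seen in the document grow, while unseen entries decay towards η, which is how old values are gradually forgotten at rate κ.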

2.2.3 Model parameter estimation

With the optimizing parameters, we can update the value of α with

αnew = αold − ρd α̃(γd),

where α̃(γd) is the inverse of the Hessian matrix multiplied by the gradient ∇α L(γ, φ, λ; α, η).

2.3 Perplexity

The definition of the word perplexity is ”the inability to deal with or understand something”. In NLP, it is used as a measure of a model's ability to predict a sample. Perplexity measures the likelihood of some test data and is equivalent to the inverse of the geometric mean per-word likelihood, meaning that a lower value indicates a better prediction. For a test corpus of Dt documents, the perplexity is given by

perplexity(Dt) = exp( − Σ_{d=1}^{Dt} log p(wd) / Σ_{d=1}^{Dt} Nd ),

where p(wd) is the marginal distribution of a document and Nd is the number of words in the document [10].
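Given per-document log-likelihoods log p(wd) and document lengths Nd, the formula above is a one-liner. A minimal sketch (the function name is our own):

```python
import math

def perplexity(log_likelihoods, doc_lengths):
    """exp of minus the total log-likelihood divided by the total word count."""
    return math.exp(-sum(log_likelihoods) / sum(doc_lengths))

# Two documents with 10 and 20 words and log-likelihoods -10 and -20
# give an average per-word log-likelihood of -1, i.e. perplexity e ≈ 2.72.
print(perplexity([-10.0, -20.0], [10, 20]))
```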


2.4 Limitations of LDA

One reason for using LDA is to get an objective description of the structure of documents, but determining whether the model is good or not requires subjective evaluation, since there are no outstanding objective evaluation methods. This issue is particularly clear when it comes to choosing the number of topics, K, as there is no clear way of measuring which number of topics best represents the actual distributions. This can be particularly troublesome in small corpora, where a corpus is expected to treat only a few topics. To find the best choice of K, one can subjectively evaluate the number of topics by looking at the most probable words for each topic and intuitively choose the K that gives the best output.

The length and number of documents play a crucial role in getting comprehensible topics. A rule of thumb is that more documents produce a better model [15]. LDA shows poor performance on short texts such as Tweets, Facebook posts, or newspaper comments. The model learns topics from co-occurrences of words on document level, and in small texts the number of words can simply be too few. Even when no clear topic is present in a text, LDA will still find K topics. For short texts, one can either modify the LDA or the structure of the texts, e.g. by combining several short texts into a larger one [16][17].

Long documents, such as books or minutes, will not produce comprehensible topics either. These texts are likely to contain a mixture of multiple topics, and it can therefore be difficult for LDA to separate them. A solution to this issue is to split the documents into e.g. chapters or sections.

The assumption of exchangeability between documents in a corpus can be unrealistic for a corpus with a long time span, in which case we can expect the topics in the documents to evolve over time. Such documents can for example be news articles, emails or scientific journals, which all reflect changing content. To model these kinds of corpora, one can use Dynamic Topic Modelling, which captures the change in content with respect to time [18].


Chapter 3

Method

A common saying in computer science is ”garbage in – garbage out”, and this also holds true in topic modelling. Preprocessing the meeting minutes is therefore a vital part of obtaining interpretable results. This chapter explains how the minutes from the EB's monetary policy meetings were processed before training the LDA model. Next follows a description of how the model was trained and how the topics were used. The chapter ends with an explanation of how the sentiment in the minutes was measured.

3.1 Preprocessing the meeting minutes

Swedish versions of the minutes from the EB's monetary policy meetings from January 1999 to October 2018 were downloaded as PDFs from the Riksbank's web page using a web scraper. The minutes from December 2018 and February 2019 were downloaded manually at the time of publication. Over time there have been extraordinary meetings when the board needed to make decisions between ordinary meetings. These minutes tend to be very short and contain only brief discussions, thus holding no valuable information for the LDA model. For this reason, minutes shorter than 4 pages were excluded, leaving a total of 134 minutes. Over the years there have been minor changes to the layout of the minutes, but essentially they are composed in a very similar way. The first pages usually contain a front page and a summary of the meeting, followed by a list of participants. Next follow statements from representatives of different departments regarding economic development. Most of the minutes consist of the discussions brought by the EB, where the members give their current views on the monetary policy and their outlook. The last pages contain a list of decisions and information about the vote on the new repo rate.

For the analysis, we are interested in the sections containing EB’s discussions, so the first and last pages are removed and the documents are split into sections. LDA assumes that one document treats one topic, so splitting the minutes into sections will make it easier for the model to learn the topics. In Python, the minutes of a meeting will be a list of sections, where each section is a list of words, represented as strings.

Following a similar approach as Jegadeesh and Wu (2013) [6], the preprocessing of each section is done in several steps:

1. All words are tokenized, meaning that each section is split into individual words; the words are then changed to lowercase, accent marks and numbers are removed, and words shorter than 2 letters are deleted.

2. All stop words are removed, i.e. words that occur frequently in the text but contribute no valuable information to the topics. Examples of such words are ”i”, ”och”, ”Riksbank” and ”Ingves”. The stop list contains a standard set of words, the names of the members of the Executive Board, the names of months, weekdays etc. A list of adjectives, such as ”higher” and ”lower”, was also removed. These words are commonly mentioned in all topics but can have very different meanings, so removing them creates fewer connections between the different topics. In all, the stop list contains 690 words.

3. All words are lemmatized, i.e. changed to their root form, to reduce the number of unique words in the corpus. E.g. the word ”inflationen” becomes ”inflation”, and ”räntorna” or ”räntan” become ”ränta”.

4. As a final step, all sections shorter than 45 words are removed from the corpus. As discussed in section 2.4, LDA shows poor performance on short texts, so setting a lower limit on the length of the sections is likely to produce a better model.
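The steps above can be sketched in plain Python. The stop list below is an illustrative subset (the real list has 690 words), lemmatization is omitted since it requires a Swedish lemmatizer, and the function name is our own:

```python
import re

STOP_WORDS = {"och", "att", "det", "riksbank"}  # illustrative subset of the 690-word list

def preprocess_section(text, min_word_len=2, min_section_len=45):
    """Steps 1, 2 and 4: tokenize/lowercase/drop numbers and short words,
    remove stop words, and drop sections shorter than min_section_len.
    Step 3 (lemmatization) is omitted in this sketch."""
    tokens = re.findall(r"[a-zåäö]+", text.lower())           # step 1: tokenize, lowercase, drop numbers
    tokens = [t for t in tokens if len(t) >= min_word_len]    # step 1: keep words of 2+ letters
    tokens = [t for t in tokens if t not in STOP_WORDS]       # step 2: remove stop words
    return tokens if len(tokens) >= min_section_len else None  # step 4: drop short sections

print(preprocess_section("Inflationen och räntan steg 2018", min_section_len=2))
# → ['inflationen', 'räntan', 'steg']
```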

To simplify the training, all unique words in the preprocessed corpus are given an ID, and the words together with their corresponding IDs are stored in a dictionary. With this dictionary it is possible to create a bag-of-words (BOW) for each section in the corpus, assigning each word in a section its word ID together with a count of how many times the word appears in the section. If a section contains words that are not in the dictionary, these words will not appear in the BOW corpus. To remove possibly misspelled words, all words appearing fewer than 7 times in the entire corpus were removed from the dictionary. Preprocessing the minutes from the Riksbank resulted in a dictionary of 2343 words and a corpus of 1697 sections.
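The dictionary and BOW construction can be illustrated with the standard library (in practice Gensim's `Dictionary` and `doc2bow` do this job; the helper names below are our own):

```python
from collections import Counter

def build_dictionary(sections, min_count=7):
    """Map every word appearing at least min_count times in the corpus to an integer ID."""
    counts = Counter(w for section in sections for w in section)
    vocab = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(vocab)}

def to_bow(section, dictionary):
    """Bag-of-words: sorted (word ID, count) pairs; out-of-dictionary words are dropped."""
    counts = Counter(w for w in section if w in dictionary)
    return sorted((dictionary[w], c) for w, c in counts.items())

sections = [["ränta", "inflation"], ["ränta", "prognos"]]
d = build_dictionary(sections, min_count=2)    # only "ränta" occurs twice
print(to_bow(["ränta", "ränta", "okänd"], d))  # → [(0, 2)]
```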

3.2 How to train your model

There are many different approaches available for topic modelling. This project uses the Gensim library in Python, which provides an algorithm for LDA and a variety of features for evaluating the model [19].

Gensim has the ability to handle large corpora without loading the entire corpus into memory, since it is based on Online LDA and follows Algorithm 2 in section 2.2.2. There is also the option of parameter estimation using VBI, as in Algorithm 1.

The function ldamodel was used to train the model, which uses the BOW corpus, the dictionary and the number of topics, K, assumed to be present in the corpus. The model finds K topics in the training corpus and the probability of each word in the dictionary belonging to each topic. By testing different numbers of topics, it was decided that 8 topics gave the most comprehensible results. Each E-step iterates up to 120 times when inferring the topic distribution, and the model passes over the corpus 10 times. The Dirichlet parameter α is updated after every M-step, and η is kept constant at 1/K throughout training. When estimating the topic distribution in the minutes, all topics with a probability less than 10−6 were filtered out. The default value in Gensim is 0.01, but lowering the minimum level assigns a value to each topic in each of the minutes, making it easier to plot time series later.

3.3 Time series

With the topics from a trained model, it is possible to estimate the topic distribution of the meeting minutes and show the change over time. This part involves no training, so the preprocessing of the minutes is slightly different. The same minutes used in training the model were used for making the time series, plus the minutes from December 2018 and February 2019. To capture the topic distribution in the minutes of a meeting, only sections shorter than 2 words are excluded, as they are assumed to contain no information. The dictionary was used to convert the texts into a BOW, and the trained model was then used to find the topic distribution for each of the minutes. The result is displayed as a stack plot.


3.4 Sentiment analysis

The tone in the minutes of a new meeting was measured by a simple word-count approach. Two lists of hawkish and dovish words were manually constructed, containing all inflected forms of each word rather than lemmas. The words in the lists were chosen based on which words are commonly used in the meeting minutes. Examples of hawkish words are ”increase”, ”raising”, ”rise”, ”positive”, ”hike” and ”tightening”; in total the list consists of 54 words. The dovish list includes words such as ”decreasing”, ”lower”, ”uncertainty”, ”weak”, ”negative”, ”cut” and ”expansive”, in total 82 words. The full lists can be found in Appendix D.1. The new minutes of the meeting were split into sentences and each word tokenized. In this part, no stop words were removed and no lemmatization was used.

Each sentence was then categorized into one of two categories based on the frequency of certain words. The first category contains words such as ”inflation”, ”monetary policy” and ”growth”; the second includes words such as ”uncertainty”, ”unemployment”, ”debt” and ”risks”. Sentences regarding monetary policy or inflation treat words such as ”raising” or ”increase” as hawkish, while sentences discussing e.g. household debt, uncertainty or risks will most likely treat words such as ”increase” as a reason for keeping rates low and monetary policy expansive, and are therefore dovish. All sentences that could not be categorised based on the word counts were placed in the first category.

When the sentences are categorised, words from the hawkish and dovish lists are counted into a hawkish score and a dovish score. In sentences in the first category, a word appearing in the hawkish list gives one point to the hawkish score, and dovish words give dovish points. The opposite holds for the second category: a word appearing in the hawkish list gives one point to the dovish score, and vice versa.

When all words are counted, the net score of each of the minutes is calculated by

net score = (hawks − doves) / (hawks + doves),

which takes values between −1 and +1, where larger values indicate a more hawkish tone.

Just counting hawkish or dovish words does come with a problem. An example is the sentence ”I do not support the proposal in the Monetary Policy Report to now raise the repo rate”1, which in the word count would be regarded as hawkish, when in reality it is a dovish statement. This happens because the model cannot connect the words ”do not” to the word ”raise”. Many similar examples can be found in the meeting minutes. To work around this problem, each sentence was searched for the word ”not”2, and if the word appeared, the hawkish or dovish words in that sentence were not counted.

1Minutes of the Monetary Policy Meeting held on 19 December 2018, p. 11, https://www.riksbank.se/en-gb/

2The word ”inte” in Swedish
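The scoring procedure, including the category flip and the ”inte” filter, can be sketched as follows. The word lists here are small illustrative subsets of the real 54- and 82-word lists, and the keywords are examples of our own choosing:

```python
HAWKISH = {"höja", "höjning", "stiga", "åtstramning"}       # subset of the 54 hawkish words
DOVISH = {"sänka", "sänkning", "osäkerhet", "svag"}         # subset of the 82 dovish words
CATEGORY2 = {"osäkerhet", "arbetslöshet", "skuld", "risk"}  # triggers reversed polarity

def net_score(sentences):
    """(hawks - doves) / (hawks + doves); sentences containing "inte" are skipped,
    and category-2 sentences (risks, debt, ...) have their polarity reversed."""
    hawks = doves = 0
    for sentence in sentences:
        words = sentence.lower().split()
        if "inte" in words:                # negation: drop the whole sentence's counts
            continue
        h = sum(w in HAWKISH for w in words)
        d = sum(w in DOVISH for w in words)
        if any(w in CATEGORY2 for w in words):
            h, d = d, h                    # second category flips the interpretation
        hawks, doves = hawks + h, doves + d
    return (hawks - doves) / (hawks + doves) if hawks + doves else 0.0

print(net_score(["höja räntan", "inte höja räntan"]))  # → 1.0 (negated sentence ignored)
```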


Chapter 4

Results

This chapter starts by introducing the topics found by the LDA model and showing how the distribution of the topics in the minutes has changed over time. At the release of new minutes, the model is used for a fast comparison of two consecutive minutes, and the results are shown here. Finally, the net score of the minutes is presented in comparison to the repo rate.

4.1 Topics

By testing a number of different values of K and manually evaluating the content of the words and the uniqueness of each topic, it was found that the best results came from 8 topics. Each topic label was chosen as a word that best represented the words in the topic. How each topic has been distributed over time is shown in Fig. 4.1. The most important words for each topic, with their corresponding weights, are presented in Table 4.1.

Figure 4.1: Time series of the topic distributions in the meeting minutes.


Several time periods can be picked out from the figure above. One of the most notable is the large increase in the topic ”Swedish monetary policy”, which started around 2007 and grew until 2009, when it began to decline again. This topic covers, among others, the rate decision, some economic outlook and unemployment. In late 2007, the minutes were also changed to attribute each statement to a board member by name, which may have increased the proportion of the topic in the minutes to some extent. It is also notable that during the period when the topic had the largest proportion in the minutes, Lars E.O. Svensson was on the Executive Board (2007–2013). He advocated a more expansionary monetary policy and a negative rate due to the high unemployment rate and falling inflation, and was very critical of the fast rate hikes from mid 2010 to the end of 2011. Looking at sections with a high probability of belonging to this topic, many contained discussions brought by Svensson1.

As expected, the topic ”external risks” showed an increase around the financial crisis of 2008. During this time, there was little discussion of inflation. It is also seen that the topic ”QE” (quantitative easing) first appeared around 2014. This was around the time when the Riksbank started purchasing government bonds to make monetary policy more expansive, instead of cutting rates. ”Household debt” has also been increasing in later years. Because of low rates, households have started borrowing money to a greater extent, making them sensitive to rising rates. The increasing household debt has been a concern for the Riksbank for some time.

A cross-check for sections that contain one dominant topic can be found in Appendix C.

Table 4.1: The topics and the 10 most probable words for each topic, with their corresponding weights.

Household debt: hushåll 0.038265, risk 0.016005, skuldsättning 0.014036, företag 0.012271, effekt 0.009516, tid 0.008954, usa 0.008258, ränta 0.008021, ekonomi 0.007511, bostadsmarknad 0.007503

Economic growth: tillväxt 0.019055, marknad 0.012296, euroområde 0.011274, usa 0.010809, bank 0.010672, kvartal 0.010129, hushåll 0.009939, prognos 0.008797, finansiella 0.008426, länd 0.00821

External risks: risk 0.013338, ekonomi 0.01332, usa 0.011614, finansiella 0.010878, tillväxt 0.010776, sverige 0.009628, reporänta 0.009569, ränta 0.009225, inflation 0.00917, svenska 0.008951

Inflation: inflation 0.050273, vänta 0.017019, prognos 0.012231, bedömning 0.012004, sikt 0.011636, tillväxt 0.010707, reporänta 0.01021, inflationsrapport 0.008262, mål 0.007927, års 0.007271

Foreign monetary policy: inflation 0.018297, styrränta 0.017155, prognos 0.016418, åtgärd 0.010277, centralbank 0.009639, omvärld 0.00951, federal 0.009503, ecb 0.009314, ränta 0.008334, finansiella 0.007857

QE: köp 0.018704, krona 0.011547, arbetslöshet 0.01133, inflation 0.010233, statsobligation 0.010177, tid 0.007505, tillgångsköp 0.00702, marknad 0.006987, ecb 0.006901, månad 0.006757

Rates: ränta 0.0369, reporänta 0.019121, svenska 0.014611, visa 0.011822, utländska 0.011149, ekonomi 0.010957, styrränta 0.010832, marknad 0.00978, inflation 0.009489, räntebana 0.009444

Swedish monetary policy: räntebana 0.024049, prognos 0.022905, inflation 0.021388, reporäntebana 0.018794, resursutnyttja 0.014828, arbetslöshet 0.014789, reporänta 0.01258, ränta 0.010699, visa 0.007209, nivå 0.006612

1 Several of his speeches can be found at http://archive.riksbank.se/sv/Webbarkiv/Publicerat/Tal/index.html@s=Lars+E.O.+Svensson.html


4.2 Comparing minutes

At the release of new meeting minutes, the trained LDA model can be used for a fast comparison of the new and previous minutes to see how the focus of the EB's discussions has changed, as shown in Fig. 4.2. Fig. 4.2a shows the meeting minutes from October 2018 and December 2018, and Fig. 4.2b shows the minutes from December 2018 and February 2019. After reading the minutes, these distributions seem reasonable.

(a) October vs. December.

(b) December vs. February

Figure 4.2: Comparison between two consecutive meeting minutes.


4.3 Net score of the Riksbank

Fig. 4.3 shows the net score of the meeting minutes as a degree of hawkishness or dovishness, compared with the repo rate. The net score follows the repo rate quite well: when the net score is more dovish, the rate is either lowered or left unchanged, while a more hawkish score is seen at a rate hike or when the rate is left unchanged. It is also seen that the EB tends to get more dovish in the minutes before a rate cut, and slightly more hawkish before a rate hike.

Figure 4.3: The net score of the Riksbank compared with the repo rate. More positive values indicate a more hawkish tone.


Chapter 5

Discussion

A problem with topic modelling is the lack of objective evaluation methods. It can be difficult to find the model that best describes the corpus without any subjective interpretation of the topics. This chapter attempts to evaluate the LDA model by testing the assumption of exchangeability and the assumption that there are enough meeting minutes in the corpus. There are also discussions regarding the time span of the minutes, issues regarding the classification of topics and finally a review of some limitations with the sentiment analysis.

5.1 The assumption of exchangeability

In section 2.1 it was stated that the order in which documents, or in this case sections, appear in a corpus is of no importance in LDA. One way of testing this assumption of exchangeability is to use KL divergence, explained in Appendix A.4, which measures the similarity of two models. The exchangeability is tested by training one model on a corpus with the sections ordered in ascending time and a second model with the sections in reversed order. If the assumption of exchangeability holds, the two models should produce the same topics in the same order. This can be checked by direct inspection of the words in each topic and their weights or, less time-consumingly, in a matrix comparing each topic of the two models using KL divergence, where small values on the diagonal indicate similar topics. The resulting matrix of this test is shown in Appendix D.2, where the values on the diagonal are the smallest in the matrix, indicating that the topics from the two models are very similar. Thus, the order of the sections in the corpus has no effect on the training.
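The topic-by-topic comparison can be sketched as follows, computing the KL divergence between every pair of topic-word distributions from the two models (the function names are our own; a small eps guards against zero probabilities):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def kl_matrix(topics_a, topics_b):
    """Entry (i, j) compares topic i of model A with topic j of model B;
    small diagonal values mean the models found the same topics in the same order."""
    return [[kl_divergence(p, q) for q in topics_b] for p in topics_a]

identical = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
m = kl_matrix(identical, identical)
print(m[0][0], m[1][1])  # → 0.0 0.0 (identical topics have zero divergence)
```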

5.2 The number of sections in the corpus

So far, it has been assumed that the number of meeting minutes is enough to find comprehensible topics, as seen in chapter 4. Perplexity can be used to test this assumption. As stated in section 2.3, perplexity measures how well a trained model explains an unseen data set, with lower perplexity meaning a better explanation. Here, it is used to test how well the LDA model explains the minutes of a new meeting as a function of the number of sections in the training corpus.

Fig. 5.1 shows the perplexity for training sets of varying numbers of sections, with the unseen meeting minutes from December 2018 and February 2019 as test set. The perplexity has not yet converged, so the results indicate that the model could be improved by adding more sections to the training corpus. This issue cannot easily be solved, since no more meeting minutes are available. It is also not possible to increase the number of sections by splitting the current sections into smaller pieces, as the performance of the model depends on the length of the sections.


Figure 5.1: Perplexity given an unseen corpus of meeting minutes from December 2018 and February 2019.

Perplexity can be used as an indicator that there are enough minutes available to produce a good model, or that adding more sections could improve the results. One should, however, not use perplexity as the only evaluation of the model. Since the distribution of words in the topics and the time series of the topic distributions give intuitively good results, we can assume that our model is good enough.

5.3 Absence of relevant topics in sections

It is not unreasonable to think that some sections in the minutes treat none of the model's topics in particular, or that some sections are very short or contain no words after preprocessing. LDA will still determine a distribution over the K topics for these sections, which can result in an inaccurate posterior distribution. For very short sections, the problem is solved by excluding them in the preprocessing. After reading some minutes, it can be presumed that the vast majority of the sections treat at least one of the topics. Sections without any topics were found to be very short and to contain many stop words; as a consequence, these sections are filtered out in the preprocessing.

5.4 Time aspect

As time goes by, it is reasonable to think that new topics and words appear in the minutes. For example, words connected to the UK's withdrawal from the EU (Brexit) are not mentioned before 2016 but now have some impact on monetary policy. When new minutes contain words that are not in the dictionary, these words are filtered out in the preprocessing. To overcome this problem, one could consider updating both the dictionary and the LDA model after the release of new minutes. However, this changes the words and their weights in the topics, thus changing the time series, and might also affect the comparison between minutes.

As mentioned in section 2.4, one possible way to capture the change in topics over time is Dynamic Topic Modelling, which trains LDA models on chunks of minutes from shorter time spans. However, this method reduces the number of minutes in each training chunk and deteriorates the results. It is therefore not a suitable method for the Riksbank meeting minutes.


5.5 Stop lists

The topics are strongly influenced by the words appearing in the sections, so removing words from the corpus can have an impact on the topics. It was seen that removing one word from the corpus could have a great impact on the topics, while removing other words had no impact at all. The content of the stop list thus requires some subjective evaluation, as some words might play an important role in finding comprehensible topics.

The choice of stop words was not made arbitrarily. Excluding common words such as ”monetary policy”, ”report”, ”member of the board” and the names of the members gave much clearer and more interpretable topics. This was also the reason for excluding words such as ”higher” and ”lower”, since these words tend to appear in all topics but are not strongly connected to any specific topic.

5.6 Misclassification

Since LDA is an unsupervised learning method, it might not identify topics perfectly. The model is only given the number of topics to find, and that choice might not reflect the actual topics contained in the minutes. There is a possibility that some topics obtained from the model consist of multiple ”subtopics” that would more suitably be classified as two separate topics. A possible solution could be to increase the number of topics, K, though this is not guaranteed to split the subtopics, as the model might still interpret them as the same topic. Some misclassifications like these have been discovered in the topics.

One example is the discussions regarding Brexit, which in this model mainly belong to the topic ”household debt” but intuitively would be more suitable in the topic ”external risks”. Brexit is a rather new topic and not frequently mentioned in the minutes, but it mainly occurs in the same sections as words that are likely to belong to the ”external risks” topic. It is not clear why the model connects Brexit to household debt, but the sections in which the two topics appear might share common words that cause the misclassification.

It was also found that ”QE” mainly covers discussions regarding the purchases of government bonds, which started in 2015. However, many sections classified as belonging to that topic also discuss unemployment. There is no obvious connection between the two subjects, and they seldom appear together in the same section. Again, why LDA decided that they belong together is unclear, but a reason could be that they have many words in common.

5.7 Sentiment analysis limitations

Classifying the exact sentiment of a sentence was found to be quite difficult. Since a word can have many synonyms, it would be rather tedious to capture all possible variations of words that affect the sentiment of a text. One example that is hard to capture is when board members do not agree with the suggestion for the future repo rate, e.g. by saying that they ”see no reason to cut the rate”; the model cannot connect the words ”no reason” and ”cut” and would incorrectly classify the sentence as dovish. Many similar examples can be found in the minutes. Simply flipping the sentiment from hawkish to dovish when the word ”not” is present is not appropriate either, since the word might not be connected to any of the words in the hawkish or dovish lists.

One approach is to simply discard the word counts from sentences containing the word ”not”. The difference in net scores between the model counting all words and the model excluding sentences containing ”not” is shown in Fig. 5.2. As seen, there is not much difference between the two net scores, indicating that ambiguity in the sentences is not too big an issue. Fig. 5.2 also shows the net score when sentences are not categorised; again, there is not much difference between the approaches.


Figure 5.2: The net score when sentences are divided into categories, when all sentences containing the word ”not” are excluded, and when the sentences are not categorised.


Chapter 6

Conclusions

The goal of this project was to implement a fast method for analysing the minutes of the Riksbank's monetary policy meetings. The first step was to find the topics contained in the minutes using LDA, then use the topics to get an indication of what the minutes contain. The model was trained on minutes from January 1999 to October 2018, and two sets of minutes, December 2018 and February 2019, were used for testing. Looking at the distribution of words in the topics and the time series, 8 topics was found to be the best choice. The model showed some flaws when it came to classifying smaller topics, e.g. Brexit ended up in the topic concerning household debt, and unemployment was included in the topic of quantitative easing. However, increasing the number of topics did not improve the results. A general rule of thumb in text analysis is that more documents produce a better model, so it is not unlikely that adding more meeting minutes could improve the content of the topics and the classification of sections into distributions of topics.

An issue in the preprocessing was converting PDF documents into sections. When training LDA models on meeting minutes from other central banks (e.g. the ECB), whose minutes can be scraped in HTML format, it was much easier to split the texts into sections and thus to process them.

The second part of the project was to measure the sentiment in the minutes as a degree of hawkishness or dovishness, called the net score. A very simple approach was used, where each sentence was categorised into one of two categories depending on the words in the sentence. Next, all hawkish and dovish words were counted, and the difference between them gave the net score. To avoid counting negated sentiment, the counts in sentences containing the word ”not” were excluded. The net score was compared to the repo rate and was shown to follow the rate curve quite well; the EB tended to be slightly more hawkish before a rate hike and slightly more dovish prior to a cut. At the time of the release of the minutes, the new repo rate is already known, but the net score does give an indication of the stance of the board.

The sentiment analysis in this project was based on a very simple method, and although it gave satisfying results, other methods could be tested. One possibility is to manually tag each sentence as hawkish, dovish or neutral, train a classifier, and use that model to compute the net score for the minutes. Similar studies include Binette et al. (2019)1, who derived the sentiment in the monetary policy report from the Bank of Canada using a deep learning algorithm to measure the positive or negative tone of each sentence.

1A. Binette, D. Tchebotarev (2019), https://www.bankofcanada.ca/wp-content/uploads/2019/02/san2019-5.pdf

References
