
School of Education, Culture and Communication

Division of Mathematics and Physics

BACHELOR’S DEGREE PROJECT IN MATHEMATICS

Did 2001 Mark the Beginning of a More Manipulated Market? An Analysis

of Financial Markets via Benford’s Law

by

Erik Munther and Richard Wright

MAA322 — Bachelor's Degree Project in Mathematics

DIVISION OF MATHEMATICS AND PHYSICS

MÄLARDALEN UNIVERSITY


School of Education, Culture and Communication

Division of Mathematics and Physics

MAA322 — Bachelor’s Degree Project in Mathematics

Date of presentation:

4th June 2021

Project name:

Did 2001 Mark the Beginning of a More Manipulated Market? An Analysis of Financial Markets via Benford’s Law

Author(s):

Erik Munther and Richard Wright

Version: 11th June 2021
Supervisor(s): Milica Rancic
Reviewer: Marko Dimitrov
Examiner: Achref Bachouch
Comprising: 15 ECTS credits


Abstract

Can the law of the natural distribution of random numbers expose malice in financial markets? This thesis analyzes the indices S&P 500 and STOXX Europe 600 in an effort to identify days on which market behavior was the result of financial manipulation or non-normal market movements. By extending a previous study [10], we found that we could accurately identify many days on which the market crashed or was affected by malpractice, similar to the events of the 2007-2008 financial crisis.

Keywords: Benford’s Law, S&P 500, STOXX Europe 600, Financial Markets, Stock


Acknowledgements


Contents

List of Figures
List of Tables
1 Introduction
  1.1 Objectives
  1.2 Literature Review
    1.2.1 Outline
2 Theory and Background
  2.1 Discovering Benford's Law
  2.2 Generalization of the Benford's Law
    2.2.1 Single Digit Tests
    2.2.2 Significant-Digit Law
  2.3 Chi-Square Goodness-of-Fit Test
  2.4 Stock Indices
3 Methodology
  3.1 Data Collection
  3.2 Implementation in MATLAB
  3.3 Data Description
  3.4 Replication
    3.4.1 Overall Analysis
    3.4.2 Day-by-Day Analysis
    3.4.3 Consecutive Rejection Days
4 Numerical Results
  4.1 First digit test
    4.1.1 Overall analysis
    4.1.2 Day-by-Day Analysis
    4.1.3 Consecutive Rejection Days
  4.2 Second Digit Analysis
    4.2.1 Overall
    4.2.3 Consecutive rejection days
5 Conclusion
  5.1 Future Work
6 Bibliography
Appendix A Data Collection


List of Figures

3.1 Overall empirical probability distribution S&P 500 1995-2007
3.2 Chi-square calculated day-by-day S&P 500 1995-2007
3.3 Least rejected and most rejected days S&P 500 1995-2007
4.1 Overall empirical probability distribution S&P 500 1985-1994
4.2 Overall empirical probability distribution S&P 500 2010-2020
4.3 Overall analysis STOXX 600 2010-2020
4.4 Second digit analysis of data from S&P 500 1995-2007
4.5 Second digit analysis of data from S&P 500 2010-2020
4.6 Second digit analysis of data from S&P 500 1985-1994
4.7 Overall second digit analysis STOXX 600 2010-2020
B.1 Day-by-day analysis S&P 500 2010-2020
B.2 Least rejected and most rejected days S&P 500 2010-2020
B.3 Day-by-day analysis STOXX 600 2010-2020
B.4 Least rejected and most rejected days for STOXX 600 2010-2020
B.5 Day-by-day second digit analysis STOXX 600 2010-2020
B.6 Least rejected and most rejected days second digit STOXX 600 2010-2020
B.7 Day-by-day second digit analysis S&P 500 2010-2020
B.8 Least rejected and most rejected days second digit S&P 500 2010-2020
B.9 Day-by-day second digit analysis S&P 500 1995-2007
B.10 Least rejected and most rejected days second digit S&P 500 1995-2007
B.11 Day-by-day analysis S&P 500 1985-1994
B.12 Least rejected and most rejected days S&P 500 1985-1994
B.13 Day-by-day second digit analysis S&P 500 1985-1994


List of Tables

2.1 Benford's findings compared to logarithmic relation
3.1 Data set of each stock index
3.2 Chi-square calculations S&P 500
3.3 45 most rejected days in relation to Chi-square S&P 500 1995-2007
3.4 Consecutive days rejected S&P 500
4.1 Chi-square calculations S&P 500 1985-1994
4.2 Chi-square calculations S&P 500 2010-2020
4.3 Chi-square calculations STOXX 600
4.4 45 most rejected days in relation to Chi-square S&P 500 1985-1994
4.5 45 most rejected days in relation to Chi-square S&P 500 2010-2020
4.6 45 most rejected days STOXX 600 2010-2020
4.7 Consecutive days rejected S&P 500 1985-1994
4.8 Consecutive days rejected S&P 500 2010-2020
4.9 Consecutive days rejected STOXX 600
4.10 Chi-square calculations S&P 500 1995-2007
4.11 Chi-square calculations S&P 500 2010-2020
4.12 Chi-square calculations S&P 500 1985-1994
4.13 Chi-square calculations STOXX 600
4.14 Second digit's 45 most rejected days in relation to Chi-square S&P 1995-2007
4.15 Second digit's 45 most rejected days in relation to Chi-square S&P 500 1985-1994
4.16 Second digit 45 most rejected days S&P 500 2010-2020
4.17 Second digit 45 most rejected days STOXX 600
4.18 Second digit consecutive days rejected S&P 500 1995-2007
4.19 Second digit consecutive days rejected S&P 500 1985-1994
4.20 Second digit consecutive days rejected S&P 500 2010-2020


Chapter 1

Introduction

Determining the legitimacy of a data set has been a widely researched area in mathematics, and one of the primary tools used to validate results has been probability. Benford's Law, also known as the Newcomb-Benford Law or the First-Digit Law, is the finding that the leading digits of the numbers in a series of records are distributed so that the digit 1 is the most common, followed by 2, then 3, diminishing up to 9. Under a uniform probability distribution one would assume each digit has an equal probability of appearing, but in actuality this is not the case. Two mathematicians, Simon Newcomb and Frank Benford, stumbled upon this finding independently while looking through logarithm tables and noticing that the pages beginning with digit 1 were more worn than the later pages.

The history of Benford's Law begins roughly 50 years before Frank Benford's discovery of it. The first record on the subject dates to 1881, when Simon Newcomb observed that, to anyone using a logarithm table, it is clear that the ten digits do not occur with the same frequency [18]. In 1938 Frank Benford released a paper titled "The Law of Anomalous Numbers" [2], which presented the rediscovery of Benford's Law. The rediscovery prompted a surge of interest, as the law could be applied to a wide variety of data sets, ranging from basic statistics such as population numbers and death rates to more advanced applications in mathematics and physics.

1.1 Objectives

The aim is to study Benford's Law by reviewing the literature and understanding the procedure applied in a previous study [10]. To further our comprehension of the law, we attempt to replicate the results the authors obtained. Subsequently, we take a new data set that covers the COVID-19 period, apply the same methodology, and analyze the results. Having achieved these results ahead of schedule, we decided to deepen the research with two additional data sets: one from a different index, and another from the former index but over an earlier time period. Finally, per the suggestion of the authors of [10], we tested all data sets for conformity of the second-digit distribution.


1.2 Literature Review

Our literature review is based on three research papers covering different areas where Benford's Law can be applied. Most relevant to our work is the final paper, which uses Benford's Law to analyse the S&P 500 and STOXX Europe 600, as it is the study we will be revising and replicating. Each paper was chosen to demonstrate the wide range of data sets to which Benford's Law is applicable.

The varied applicability of Benford's Law allowed us to choose from a wide variety of sources for our literature review. The articles chosen relate strongly to either a current world issue or our thesis. Currently, the world is being drastically affected by the COVID-19 pandemic, at this point in time one of, if not the, primary problems in the world [13]. The next two papers chosen are focused on finance, one on accounting fraud and the other on an analysis of the S&P 500, the topic we are reviewing. Accounting fraud was chosen because it is one of the primary applications of Benford's Law in current use and a research area close to our programme, Analytical Finance. We decided on the S&P 500 case because, when discussing the focus of our thesis, we wanted to apply the law to the stock market, and in our research this paper perfectly encapsulated what we envisioned.

Benford's Law was popularised in 1938 by Frank Benford, and since then a plethora of research has been done on it, meaning many articles had to be excluded. Research on detecting fraud is interesting, but many such papers concern fields outside finance and are therefore not relevant to our thesis. Many applications of the law exist within finance, which led us to narrow our review to two papers that sufficiently covered what was relevant to our thesis in a succinct manner.

Simon Newcomb, a Canadian-American astronomer, mathematician, and economist, was the first person to write about the phenomenon now known as the First-Digit Law or Benford's Law. He first wrote about it in 1881 [18], saying "That the ten digits do not occur with equal frequency must be evident to anyone making much use of logarithmic tables, and noticing how much faster the first pages wear out than the last ones." He went on to briefly describe the phenomenon and show the probability distribution of the first and second digits in natural numbers. After this article, a minor remark referencing Newcomb's work was made by Edwin G. Boring [4] in 1920, which amounted to nothing more than a sentence, until Frank Benford published his paper [2] in 1938, in which he gave ample empirical evidence for the phenomenon and attempted to give reasoning behind the law, although it was not considered rigorous enough. Later, in 1995, a widely accepted explanation for the law was given by Theodore P. Hill [12], who, through statistical reasoning and by building on papers that had previously worked on explaining Benford's Law, arrived at a solid proof.

The first article looked at the possible relation between Benford's Law and the data sets used to count the number of cases and fatalities caused by COVID-19 [13]. The article focused on the leading digit in the analysis of the involved data sets, comparing the reported integers to the expected distribution. The data displayed was cumulative data from different countries, obtained from Prof. M. Handley, plotted in a graph alongside Benford's Law. Going through each country gave a wide variety of results, with many countries not conforming perfectly to Benford's Law, mainly because specific countries had smaller data samples. The combination of all countries' cumulative data sets led to the desired result, fitting Benford's Law.

The second article [16] was written by Mark Nigrini, best known for his work on using Benford's Law in auditing and accounting, primarily for identifying fraud. In the article Nigrini covers the application Benford's Law has for accountants and auditors in identifying digit abnormalities. The data sets used for these scenarios must follow certain rules: the numbers must depict similar phenomena, there is no built-in maximum or minimum, and they are not assigned numbers. The article then goes over the mathematics in connection with mutual funds. Consider the total assets of a mutual fund growing 10% per year: when the total assets are at $100 million, the first digit is 1, and it remains 1 until the total assets reach $200 million, a 100% increase that takes approximately 7.3 years. The increase to each next first digit takes less and less time, with the move from $500 million to $600 million taking approximately 1.9 years. The time keeps decreasing until the assets reach $1 billion, at which point the cycle repeats. The article then covers the application to other financial data sets, including income tax, stock exchange data, sales figures, corporate disbursements, demographics, and scientific data. It ends with the prime use of Benford's Law in identifying fraud, namely spotting anomalies in data sets that are worth investigating.
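As a check on the figures above, these waiting times follow directly from the growth rate: with 10% annual growth, the time to grow by a factor \(c\) is \(\ln c / \ln 1.1\), so
\[
\frac{\ln 2}{\ln 1.1} \approx 7.27 \text{ years (from \$100M to \$200M)}, \qquad
\frac{\ln 1.2}{\ln 1.1} \approx 1.91 \text{ years (from \$500M to \$600M)}.
\]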

In the paper [10] a data set from the S&P 500 between the years 1995-2007 is checked against Benford's Law, firstly to see whether the closing prices and returns of 361 stocks in the index fit Benford's distribution. Secondly, the first-significant-digit distribution of the returns is analysed day by day, since the returns followed Benford's Law more closely. From that second analysis the authors took the sequences of consecutive days on which the data did not fit Benford's Law, to check what anomalous events could have occurred. Their method of checking how closely the data fit Benford's Law was a Chi-square goodness-of-fit test with 8 degrees of freedom. Interestingly, the null hypothesis was rejected. A null hypothesis is a statistical hypothesis stating that no difference exists between specific characteristics of a population. However, when the same Chi-square test is run against the uniform distribution instead of Benford's distribution, the former rejects the null with a test statistic many times the magnitude of the latter. Furthermore, the same kind of analysis done on a smaller data set did not reject the null. This led the researchers to conclude that the data does fit, save for a few days that distorted the results.

1.2.1 Outline

This thesis focuses on the application of Benford's Law to the stock market, specifically the S&P 500 and STOXX Europe 600. Chapter 1 gives a brief introduction to the topic itself, showing the reasoning behind the thesis and the research done. Chapter 2 goes into more depth on Benford's Law, including its discovery and the statistical test that will be used in Chapter 4. Chapter 3 outlines the problem and procedure and contains the replication of the aforementioned study. In Chapter 4 the method outlined in Chapter 3 is applied to three other data sets and commented on. Chapter 5 concludes the thesis, showing the culmination of everything done and learnt, along with a discussion of future research.


Chapter 2

Theory and Background

This chapter recounts how the law was discovered twice, then presents a general derivation of it. A generalization of the law follows, showing how it can be extended beyond the first significant digit. Some further theory and information relevant to the rest of the thesis is also included.

2.1 Discovering Benford's Law

The discovery of the first-digit law began with Simon Newcomb, born in Nova Scotia, Canada. Newcomb initially had little schooling, gaining most of his knowledge from his father John Burton Newcomb, an itinerant schoolteacher [19]. His father started teaching Newcomb at an early age: by the time Simon was four he had learnt to count, by five he was doing addition and multiplication, and by seven he had finished a book on arithmetic and was working with cube roots.

When Simon turned sixteen he moved to Maryland, USA, to join his father, who had previously moved there. Having little formal education, Simon began travelling to Washington D.C. to study in the libraries, focusing on astronomy and mathematics [19]. Through his avid learning habits he eventually got into the Lawrence Scientific School of Harvard University, where he earned a degree in 1858. This was the beginning of numerous positions throughout his life, ranging from working at the United States Naval Observatory in Washington to being elected a member of the National Academy of Sciences.

Simon Newcomb's greatest accomplishment came before the discovery of the first-digit law. In 1879 he produced work in the Astronomical Papers Prepared for the Use of the American Ephemeris and Nautical Almanac, a series of his works on astronomy, which led to him being considered one of America's first great astronomers [6]. Two years later Newcomb discovered the frequency pattern of first digits while flipping through a logarithm table. He wrote an article [18] on his discovery, though more as a side note than a grand discovery, as to him it seemed like old knowledge that had simply not been formalised. After this article, a minor remark referencing Newcomb's work was made by Edwin G. Boring [4] in 1920, but no further mention of the frequencies of the first digit followed.


Then, in 1938, years after Simon Newcomb's original discovery of the first-digit law, another researcher discovered the same phenomenon. This was Frank Benford, after whom the law was later named. Born in 1883 in Johnstown, Pennsylvania, Frank Albert Benford, Jr. was an American electrical engineer and physicist who for the majority of his career worked at General Electric (GE) as a research physicist. While working at GE, Benford noticed something unusual about a book of logarithm tables: the pages for the first digits were more worn than the following ones. Upon discovering this, Benford spent the next few years compiling data until, in 1938, he published his results displaying data sets of more than 20,000 values, including atomic weights, numbers in magazine articles, baseball statistics, and the areas of rivers. His research showed that all these values followed the same pattern he had discovered in the logarithm book, shown in Table 2.1.

Table 2.1: Benford's findings compared to logarithmic relation

Digit   Logarithmic relation   Benford's findings
1       0.3010                 0.306
2       0.1761                 0.185
3       0.1249                 0.124
4       0.0969                 0.094
5       0.0792                 0.080
6       0.0669                 0.064
7       0.0580                 0.051
8       0.0512                 0.049
9       0.0458                 0.047

As reported in his paper [2], Benford discovered the law by observing that the rate of occurrence of 1 as a first digit is close to the common logarithm of 2 (i.e. 0.3010), and that the frequency of 2 as a first digit is close to the difference \(\log 3 - \log 2 = 0.1761\). Extending this reasoning to all first digits, he concluded that they tend to follow the logarithmic relation
\[
P_d = \log_b\left(\frac{d+1}{d}\right), \tag{2.1}
\]
where \(P_d\) is the probability of a number having the leading digit \(d\), and \(b\) is the logarithm base, commonly 10, although the relation works for any base.

Benford attempted to find a geometric basis for the law by analysing the natural number system, in search of the distorting factor that affects the frequency of numbers occurring with certain first significant digits. He reasoned that the occurrence of 1's as the first digit is about 11.12% when counting from 1 to 1,000. Then 1's temporarily occur at a rate of 55.55% as the range is increased to 20,000. The rate of occurrence of 1's as the first digit decreases with every 10,000 added until 100,000 is reached, where 1's again occur at 11.12%. Benford then plotted these frequencies on a semi-logarithmic plot, took the area under the curve representing the percentage of occurrence of 1's at each 10,000 interval, and found the area to be approximately 0.30103.


According to [24], Benford's Law is not capable of being proven in the traditional sense, as it is an "empirically observed phenomena rather than abstract mathematical facts". Despite this, many attempts at proving or otherwise supporting the law have been made [9], [2], [12]. We will present a general derivation of the law from [9] involving the Laplace transform. In the decimal system, the probability \(P_d\) of a given number having first digit \(d\) is the sum of the probabilities that the number belongs to the interval \([d \cdot b^n, (d+1) \cdot b^n)\) over all integers \(n\); thus \(P_d\) is expressed as

\[
P_d = \sum_{n=-\infty}^{\infty} \int_{d b^n}^{(d+1) b^n} f(x)\,\mathrm{d}x, \tag{2.2}
\]

which, after introducing a new function \(g_d(x)\), allows equation (2.2) to be rewritten as
\[
P_d = \int_0^{\infty} f(x) g_d(x)\,\mathrm{d}x. \tag{2.3}
\]

Using the Heaviside step function,
\[
\eta(x) = \begin{cases} 1, & \text{if } x \ge 0, \\ 0, & \text{if } x < 0, \end{cases} \tag{2.4}
\]
the function \(g_d(x)\) is defined as
\[
g_d(x) = \sum_{n=-\infty}^{\infty} \left[\eta(x - d b^n) - \eta\big(x - (d+1) b^n\big)\right].
\]
The function \(g_d(x)\) acts as a density; the nine functions \(g_1(x), \ldots, g_9(x)\) for the decimal digits all have different shapes [9], meaning none of them is a simple translation or expansion of another.

Equations (2.2) and (2.3) can both be used to find \(P_d\) numerically for any \(f(x)\), even though the results do not generally fit equation (2.1) perfectly. This speaks to the fact that Benford's Law is not a strict law of anomalous numbers but rather a tendency of anomalous numbers. Nonetheless, applying the technique of the Laplace transform, it can be shown that Benford's Law is a relatively good approximation for "well-behaved" probability density functions, especially compared to the uniform distribution at which one might expect first digits to occur. By well-behaved we mean that the probability density function has an inverse Laplace transform; in other words, if \(f\) is extended to the complex plane then, according to the complex inversion formula [15], \(f\) has an inverse Laplace transform if it satisfies [9]:

1. \(f(z)\) is analytic on \(\mathbb{C}\) except for a finite number of isolated singularities;
2. \(f(z)\) is analytic on the half-plane \(\{z : \operatorname{Re} z > 0\}\);
3. there are positive constants \(M\), \(R\), and \(\beta\) such that \(|f(z)| \le M / |z|^{\beta}\) whenever \(|z| \ge R\).

In this general derivation of the law we assume \(f(x)\) is well-behaved.


Given the Laplace transforms
\[
f(x) = \int_0^{\infty} f(t) e^{-tx}\,\mathrm{d}t, \qquad G_d(t) = \int_0^{\infty} g_d(x) e^{-tx}\,\mathrm{d}x, \tag{2.5}
\]

it is possible to reformulate equation (2.3) using the properties of Laplace transforms:
\[
\begin{aligned}
\int_0^{\infty} f(x) g_d(x)\,\mathrm{d}x
&= \int_0^{\infty} g_d(x)\,\mathrm{d}x \int_0^{\infty} f(t) e^{-tx}\,\mathrm{d}t \\
&= \int_0^{\infty} f(t)\,\mathrm{d}t \int_0^{\infty} g_d(x) e^{-tx}\,\mathrm{d}x \\
&= \int_0^{\infty} f(t) G_d(t)\,\mathrm{d}t.
\end{aligned} \tag{2.6}
\]

Now, to solve equation (2.6), we start by evaluating the function \(G_d(t)\) using equation (2.5):
\[
\begin{aligned}
G_d(t) &= \int_0^{\infty} g_d(x) e^{-tx}\,\mathrm{d}x
= \sum_{n=-\infty}^{\infty} \int_{d b^n}^{(d+1) b^n} e^{-tx}\,\mathrm{d}x \\
&= \sum_{n=-\infty}^{\infty} \left[-\frac{1}{t} e^{-tx}\right]_{d b^n}^{(d+1) b^n}
= \frac{1}{t} \sum_{n=-\infty}^{\infty} \left(e^{-t d b^n} - e^{-t (d+1) b^n}\right).
\end{aligned} \tag{2.7}
\]

It is evident that \(G_d(t)\) is a function of two variables, \(d\) and \(t\). Although \(d\) is defined on the set \(\{1, 2, 3, \ldots, 9\}\), it is extendable to the whole real axis, so \(G_d(t)\) can be treated as a continuous function of both variables. To evaluate the function we first take the partial derivative with respect to \(d\),
\[
\frac{\partial G_d(t)}{\partial d}
= \frac{1}{t} \sum_{n=-\infty}^{\infty} \left(-t b^n e^{-t d b^n} + t b^n e^{-t (d+1) b^n}\right)
\approx \int_{-\infty}^{\infty} \left(-b^x e^{-t d b^x} + b^x e^{-t (d+1) b^x}\right)\mathrm{d}x.
\]
Substituting \(y = b^x\),
\[
\frac{\partial G_d(t)}{\partial d}
= \frac{1}{\ln b} \int_0^{\infty} \left(-e^{-t d y} + e^{-t (d+1) y}\right)\mathrm{d}y
= \frac{1}{\ln b} \left(-\frac{1}{t d} + \frac{1}{t (d+1)}\right). \tag{2.8}
\]


The approximation comes from a summation being replaced by an integration, which is not a strictly equivalent substitution; note that \(G_d(t) \to 0\) as \(d \to \infty\). For a stricter proof, refer to the paper [8], which formed the basis for this derivation. Equation (2.8) is now integrated for substitution later:
\[
\begin{aligned}
G_d(t) &= \int \frac{1}{\ln b} \left(-\frac{1}{t d} + \frac{1}{t (d+1)}\right)\mathrm{d}d
= \frac{1}{t \ln b} \int \left(-\frac{1}{d} + \frac{1}{d+1}\right)\mathrm{d}d \\
&= \frac{1}{t} \cdot \frac{\ln\left|\frac{d+1}{d}\right|}{\ln b}
= \frac{1}{t} \log_b\left(\frac{d+1}{d}\right).
\end{aligned} \tag{2.9}
\]

With the function \(G_d(t)\) integrated, the following normalization condition on \(f(t)\) is used:
\[
1 = \int_0^{\infty} f(x)\,\mathrm{d}x
= \int_0^{\infty} \mathrm{d}x \int_0^{\infty} f(t) e^{-tx}\,\mathrm{d}t
= \int_0^{\infty} f(t)\,\mathrm{d}t \int_0^{\infty} e^{-tx}\,\mathrm{d}x
= \int_0^{\infty} \frac{f(t)}{t}\,\mathrm{d}t, \tag{2.10}
\]

so that equation (2.6) can now be solved by substituting equations (2.9) and (2.10) and simplifying:
\[
\begin{aligned}
P_d &= \int_0^{\infty} f(x) g_d(x)\,\mathrm{d}x
= \int_0^{\infty} f(t) G_d(t)\,\mathrm{d}t
= \int_0^{\infty} \frac{f(t)}{t} \log_b\left(\frac{d+1}{d}\right)\mathrm{d}t \\
&= \log_b\left(\frac{d+1}{d}\right) \int_0^{\infty} \frac{f(t)}{t}\,\mathrm{d}t
= \log_b\left(\frac{d+1}{d}\right).
\end{aligned}
\]

2.2 Generalization of the Benford's Law

The generalization of the law extends from the rate of occurrence of the first significant digit to that of the second, third, and so on indefinitely, along with the probability of any combination of significant digits. The following formulas show the different varieties of tests Benford's Law offers.


2.2.1 Single Digit Tests

The single digit tests are used to identify the frequency of any single digit in accordance with Benford's Law, applied to one digit at a time in any position of a number. Each digit test concerns digits \(D_1, D_2, D_3, \ldots, D_k\), representing the first digit, second digit, third digit, up to the \(k\)th digit, respectively. The following tests comprise the single digit tests up to the fourth digit, beginning with the first digit test:
\[
P(D_1 = d_1) = \log_{10}\left(\frac{d_1 + 1}{d_1}\right), \tag{2.11}
\]
where \(1 \le d_1 \le 9\) (0 cannot be a first digit);
\[
P(D_2 = d_2) = \sum_{d_1=1}^{9} \log_{10}\left(1 + \frac{1}{10 d_1 + d_2}\right), \tag{2.12}
\]
where \(0 \le d_2 \le 9\);
\[
P(D_3 = d_3) = \sum_{d_1=1}^{9} \sum_{d_2=0}^{9} \log_{10}\left(1 + \frac{1}{10^2 d_1 + 10 d_2 + d_3}\right),
\]
where \(0 \le d_3 \le 9\); and
\[
P(D_4 = d_4) = \sum_{d_1=1}^{9} \sum_{d_2=0}^{9} \sum_{d_3=0}^{9} \log_{10}\left(1 + \frac{1}{10^3 d_1 + 10^2 d_2 + 10 d_3 + d_4}\right),
\]
where \(0 \le d_4 \le 9\). Observing the tests up to the fourth digit, a clear pattern emerges that enables us to carry on for the remaining digits [7]. From the fourth digit test onward to the ninth, the frequency of occurrence becomes close to uniform.
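As an illustration of equation (2.12), here is a minimal MATLAB sketch (not the thesis code) that computes the expected second-digit probabilities by marginalising over the first digit:

```matlab
% Expected second-digit probabilities under Benford's Law, equation (2.12).
P2 = zeros(1, 10);
for d2 = 0:9
    d1 = 1:9;                                        % all possible first digits
    P2(d2 + 1) = sum(log10(1 + 1 ./ (10 * d1 + d2)));
end
disp([(0:9)' P2']);   % about 0.1197 for P(D2=0), falling to about 0.0850 for P(D2=9)
```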

The next two tests come from the works of Mark Nigrini, who delved into the application of Benford's Law in identifying fraud [17].

2.2.2 Significant-Digit Law

The significant-digit law is used to identify the frequency of multiple digits together, determining the frequency of a particular sequence of leading digits rather than any single digit throughout the number.

The significant-digit law is
\[
P(D_1 = d_1, \ldots, D_k = d_k) = \log_{10}\left[1 + \left(\sum_{i=1}^{k} d_i \cdot 10^{\,k-i}\right)^{-1}\right],
\]
where \(D_1, D_2, D_3, \ldots, D_k\) are the first digit, second digit, third digit, up to the \(k\)th digit. As an example, take 5 as the first digit, 3 as the second digit, and 4 as the third digit, giving us
\[
P(D_1 = 5, D_2 = 3, D_3 = 4) = \log_{10}\left(1 + \frac{1}{534}\right) \approx 0.00081.
\]
This puts the frequency of these three digits appearing in a row at about 0.081%. The test extends to any number of digits, allowing us to determine the frequency with which a certain sequence of digits appears.

2.3 Chi-Square Goodness-of-Fit Test

The Chi-square goodness-of-fit test is a statistical test used to determine whether observed frequencies differ from expected frequencies. We will use it to compare our observed values against both Benford's Law and the uniform probability distribution.

The test statistic for the Chi-square goodness-of-fit test, called the Chi-square value, is calculated as
\[
\chi^2 = \sum_{i=1}^{g} \frac{(O_i - E_i)^2}{E_i}, \tag{2.13}
\]
where \(g\) is the number of groups, \(E_i\) represents the expected frequency count for the \(i\)th level of the variable, and \(O_i\) represents the observed frequency count for the \(i\)th level of the variable.

P-Value

To obtain the P-value from the Chi-square distribution, two numbers are needed:

• the degrees of freedom;
• the \(\chi^2\) value.

We denote the degrees of freedom by \(m\); it equals the number of levels \(k\) of the variable minus one, \(m = k - 1\).

Procedure

The procedure for the Chi-square goodness-of-fit test is to first set up the null and alternative hypotheses. We use the Chi-square test to check the validity of the distribution assumed for a random event; the test then checks the null hypothesis against the alternative hypothesis.

1. Null hypothesis: the assumption that there is no difference between the observed and expected values.

2. Alternative hypothesis: the assumption that there is a significant difference between the observed and expected values.

To determine whether to reject the null hypothesis, we compare our results to the \(\chi^2\) table values. Matching the degrees of freedom, we check whether the value of the Chi-square goodness-of-fit test exceeds the table value; if it does, we reject the null hypothesis and conclude that there is a significant difference between the observed and expected values. If the value of the test falls below the table value, we do not reject the null hypothesis, concluding that there is no significant difference between the observed and expected values.
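A minimal MATLAB sketch of this procedure against Benford's first-digit distribution, using hypothetical counts (chi2inv is from the Statistics and Machine Learning Toolbox):

```matlab
% Chi-square goodness-of-fit test of observed first-digit counts vs Benford.
counts   = [31 17 13 10 8 6 6 5 4];                 % hypothetical observed counts
n        = sum(counts);
expected = n * log10((2:10) ./ (1:9));              % E_i under Benford's Law
chi2stat = sum((counts - expected).^2 ./ expected); % equation (2.13)

m        = 8;                                       % degrees of freedom, k - 1 = 9 - 1
critical = chi2inv(0.95, m);                        % 15.507 for alpha = 0.05
fprintf('chi2 = %.2f, critical = %.3f, reject H0 = %d\n', ...
        chi2stat, critical, chi2stat > critical);
```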

2.4 Stock Indices

The S&P 500 (Standard and Poor's 500) is a stock market index that includes 500 large companies listed on US stock exchanges. The index's history originates with Poor's Publishing around 1860, at which point it was merely an investors' guide to the railroad industry. Later, in 1923, the Standard Statistics Company created a stock market index of 233 US stocks, updated weekly. Then, in 1941, Poor's Publishing merged with the Standard Statistics Company, forming Standard & Poor's. Finally, on March 4th, 1957, the S&P 500 stock composite index was born.

Companies are introduced into the index by the decision of a committee using selection criteria [11]. The goal is for the index to be indicative of the largest public companies in the US; to that end the index is reconstituted quarterly, although turnover is minimized even if companies momentarily fail to meet the selection criteria.

The weighting of stocks in the index follows a free-float capitalization-weighted method, meaning companies are weighted according to their respective market capitalizations (only publicly traded shares are included in the capitalization measure). The index is valued by the formula
\[
\text{Index Level} = \frac{\sum_i (P_i \cdot Q_i)}{\text{Divisor}},
\]
where \(P_i\) is the price of the \(i\)th stock in the index, \(Q_i\) is the number of publicly traded shares for that stock, and the Divisor is a number adjusted to keep the index from deviating due to actions like share issuance, share buybacks, special dividends, etc. The index has averaged a return of approximately 9.8% per year (not adjusted for inflation), making it a comparatively safe and reliable index for long-term investors.
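As a purely hypothetical two-stock illustration of the formula above (made-up prices, share counts, and divisor):
\[
\text{Index Level} = \frac{120 \cdot 50{,}000{,}000 + 45 \cdot 20{,}000{,}000}{8{,}500{,}000} \approx 811.8.
\]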

The second index analyzed is the STOXX 600, a stock index that is a subset of the STOXX Global 1800. The STOXX 600 focuses on companies based in Austria, Belgium, Denmark, Finland, France, Germany, Ireland, Italy, Luxembourg, the Netherlands, Norway, Poland, Portugal, Spain, Sweden, Switzerland, and the United Kingdom [23]. It is derived from the STOXX Europe Total Market Index, which represents 95% of the free-float market capitalization of European companies. The STOXX 600 is built up of 19 different sectors for greater diversification, with the classification based on the Industry Classification Benchmark (ICB).


Chapter 3

Methodology

The purpose of our thesis is to analyse the stock market, via the S&P 500 and STOXX Europe 600, to discern whether Benford's Law can detect anomalous behaviour. After reviewing the literature we found the paper [10], in which this question is addressed. Upon thoroughly examining the paper, we proceeded to replicate its results in MATLAB so that we could apply the same method to other data sets. The replication is included in this chapter.

3.1 Data Collection

The first step was to collect the data so we could design the code. To do this we enlisted the help of the function [22], which pulls historical data from Yahoo Finance. We then went through 613 stocks that are currently in the S&P 500, since many did not trade during the specific time interval analyzed in [10]. This resulted in us collecting the same number of stocks the authors used, but not the exact same stocks, which is why our results differ slightly from the paper being replicated.

3.2 Implementation in MATLAB

Next, we built the code [25] with the help of the function [1], which takes the data and finds the occurrences of each first digit. The data is first run through the function column-wise, as the function prefers data vectors larger than 1000 data points. We then calculated the Chi-square test statistic by multiplying both the observed percentage of occurrences of each first digit and the expected occurrences by the total number of observations. This was done on the logarithmic returns of the data and on the closing prices, against the Benford and uniform distributions, as will be shown in the numerical results. The next analysis repeats the same process, but instead of finding the Chi-square for the entire data set we look at each day's goodness-of-fit to Benford's distribution; only logarithmic returns are checked in this part of the analysis as they conform better to Benford. This is done by running the same data row-wise, so we get every company's logarithmic return for each day.
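The gist of that computation can be sketched as follows in MATLAB; this is a simplified stand-in for the thesis code [25], with hypothetical variable names and an inline digit extraction in place of the cited helper function [1]:

```matlab
% prices: a days-by-stocks matrix of daily closing prices (assumed given).
logret = diff(log(prices));                       % logarithmic returns, day by day

x  = abs(logret(:));  x = x(x > 0 & isfinite(x)); % pool all returns, drop zeros/NaNs
d1 = floor(x ./ 10.^floor(log10(x)));             % first significant digits, 1..9

observed = histcounts(d1, 0.5:1:9.5);             % counts of digits 1..9
expected = numel(x) * log10((2:10) ./ (1:9));     % Benford expectation
chi2Benford = sum((observed - expected).^2 ./ expected);

uniform = numel(x) / 9 * ones(1, 9);              % uniform expectation
chi2Uniform = sum((observed - uniform).^2 ./ uniform);
```

The same statistic is computed row-wise, one trading day at a time, for the day-by-day analysis.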

Note: some error in our results could possibly be due to the function preferring sample sizes larger than 1000. Once we have a matrix of each day's first-digit occurrences, we apply the same Chi-square goodness-of-fit test and plot the results on scatter plots. Lastly, we go through all the rejected days (days on which the null was rejected) and check whether they are consecutive, to see how long a certain anomalous event lasted before the market could correct itself.
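A minimal sketch of that last consecutive-rejection step, assuming rejected is a logical vector with one entry per trading day (true meaning the null was rejected that day):

```matlab
% Lengths of each streak of consecutive rejected days.
d         = diff([0; rejected(:); 0]);        % +1 at streak starts, -1 at ends
runStarts = find(d == 1);
runEnds   = find(d == -1);
runLength = runEnds - runStarts;              % one entry per rejected streak

% Tally how many streaks of each length occurred (rows of Tables 3.4, 4.7-4.9).
tally = histcounts(runLength, 0.5:1:max(runLength) + 0.5);
```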

3.3 Data Description

Table 3.1 below displays the data sets used for each of our experiments.

Table 3.1: Data set of each stock index

Stock Index   Dates                         Number of stocks   Log Returns/Close Prices
S&P 500       Jan 2, 1985 – Dec 29, 1994    240                2526
S&P 500       Aug 14, 1995 – Oct 17, 2007   361                3066
S&P 500       Dec 1, 2010 – Dec 1, 2020     429                2517
STOXX 600     Dec 1, 2010 – Dec 1, 2020     312                2517

The initial experiments consist of three separate steps:

1. First, we investigate the overall probability distribution of the first significant digit on the data sets of log returns and prices.

2. Second, we investigate the first significant digit's day-by-day results.

3. Third, we investigate the consecutive days on which the distribution of the first significant digit does not conform to Benford's Law.

3.4 Replication

The specific stocks used in the original experiment over the same period, August 14, 1995 - October 17, 2007, could not be retrieved, causing our results to differ slightly from the original experiment.

3.4.1 Overall Analysis

Using the Chi-square goodness-of-fit test, we compare the probability distribution \(P_d\) of the first significant digits of the observed prices and log returns against both Benford's Law and the uniform probability distribution.


[Figure 3.1: Overall empirical probability distribution S&P 500 1995-2007. First significant digit vs. \(P_d\): observed log return values, observed price values, Benford distribution, and uniform distribution.]

Table 3.2: Chi-square calculations S&P 500

Reference Probability Distribution   χ² w.r.t. prices   χ² w.r.t. returns
Benford                              56587.64           6458.73
Uniform                              600683.93          657585.48

In Figure 3.1 we can observe that the probability distribution of the log returns matches Benford's Law almost identically, with the price values fairly well matched to Benford's Law. Compared to the uniform distribution, which shares no commonalities with the data, the log returns and price values clearly conform more to Benford's Law. Quantitatively, at a significance level of 0.05 our \(\chi^2\) values mean the null hypothesis is actually rejected; however, against the uniform distribution the results are rejected far more strongly than against Benford's. In connection with [10] we followed the approach of [20], using the level to which the Benford and uniform distributions are rejected as a distance: the empirical probability distribution aligns more with Benford's Law than with the uniform distribution because Benford's is "closer". Expanding on this, by the same logic the empirical probability distribution of the log returns is the closest to Benford's Law, as it has the value nearest to non-rejection. In the overall Chi-square test the results are still far above the \(\chi^2_{8,0.95}\) value; one reason for this is the large data set, comprising 361 stocks with 3066 log returns and daily close prices. When the test is applied to smaller sets, for example in the day-by-day analysis, the numbers conform much more closely to the \(\chi^2_{8,0.95}\) value.


3.4.2 Day-by-Day Analysis

Next we delve deeper into the performance of the stocks with a day-by-day analysis. From the previous results we know the empirical probability distribution of the log returns conformed more closely to Benford's Law than to the uniform distribution, being the value closest to non-rejection, so it is the focus of the day-by-day analysis.

Of the 3066 daily log return data sets, 857 days were rejected for \(\alpha = 0.01\) and 1373 days were rejected for \(\alpha = 0.05\), rejection percentages of 27.95% and 44.77%, respectively. Figure 3.2 contains a horizontal green line indicating the \(\chi^2_{8,0.95}\) value, against which all of the obtained \(\chi^2\) values are compared.

[Figure 3.2: Chi-square values of the day-by-day analysis of returns, S&P 500 1995-2007; the horizontal line marks the \(\chi^2_{8,0.95}\) rejection threshold.]


Table 3.3: 45 most rejected days in relation to Chi-square S&P 500 1995-2007

Rank Day χ²   Rank Day χ²   Rank Day χ²

1 27-Feb-2007 242.97 16 17-Mar-2003 92.09 31 22-Feb-2005 77.22

2 29-Jul-2002 196.57 17 27-Dec-2002 88.82 32 18-May-2006 76.49

3 24-Mar-2003 177.30 18 24-Jan-2003 88.43 33 30-May-2003 75.80

4 30-Aug-2007 130.72 19 11-Jul-2007 86.44 34 14-Apr-2003 75.06

5 06-Aug-2007 125.28 20 10-Mar-2003 85.62 35 05-Aug-2002 74.97

6 29-Aug-2007 122.89 21 05-Mar-2007 84.78 36 25-Jun-2007 74.01

7 14-Mar-2007 121.85 22 01-Oct-2003 84.03 37 13-Jul-2007 72.26

8 08-Jun-2007 117.49 23 03-Sep-2002 83.97 38 05-Aug-2004 71.38

9 24-Jul-2002 115.48 24 10-Sep-2007 83.48 39 16-Jun-2003 71.29

10 27-Oct-1997 112.50 25 19-Sep-2007 83.34 40 14-Jun-2007 71.21

11 11-May-2007 109.74 26 04-Nov-2004 80.91 41 30-Jul-2007 70.30

12 02-Jan-2003 104.26 27 22-Aug-2003 80.86 42 05-Jul-2002 70.03

13 17-Jun-2002 100.29 28 08-Mar-1996 79.63 43 15-Aug-1997 70.00

14 06-Aug-2002 97.36 29 27-Nov-2002 78.23 44 02-Oct-2007 69.76

15 04-Aug-1998 96.75 30 27-Jul-2007 77.87 45 25-May-2004 68.90

In Table 3.3 we showcase the 45 most rejected days in relation to the Chi-square results at the 0.05 significance level. The most rejected days tend to follow big events in the stock market that cause these large anomalies. The most rejected day, February 27th, 2007, corresponds to a Wall Street crash in which the top American stock indices, primarily the Dow Jones and the S&P 500, dropped sharply. The Dow Jones index in particular fell 416 points, which at the time was the largest drop since September 11th, 2001. This crash put global stock markets under immense pressure, specifically in Europe, Japan, and Hong Kong. The second most rejected day also falls within a stock market crash: the United States market had started to recover from September 11th, 2001, then from March 2002 proceeded to decline until it hit dramatic lows in July and September, lows that had not been reached since 1997 and 1998. One interesting observation from Table 3.3 is that, of the 45 most rejected days, 4 happened before September 11th, 2001 and 41 came afterwards, even though that date lies 6 years and 1 month after the beginning of the data set and 6 years and 1 month before its end.


[Figure 3.3: Least rejected and most rejected days S&P 500 1995-2007. Distribution of the first digit of returns for the most and least accepted days, against the Benford and uniform distributions.]

In Figure 3.3 the least rejected and most rejected days are displayed, compared against Benford's distribution and the uniform distribution. Observing it, one can identify that even the most rejected days still conform more to Benford's distribution than to the uniform distribution.

3.4.3 Consecutive Rejection Days

Continuing the focus on log returns, we examine the consecutive rejection days for both \(\alpha = 0.01\) and \(\alpha = 0.05\). Table 3.4 displays the sequences of consecutive rejected days, with lengths ranging from 1 to 10.


Table 3.4: Consecutive days rejected S&P 500

Consecutive Rejections   α = 0.01   α = 0.05
1                        563        689
2                        124        153
3                        44         92
4                        9          31
5                        7          18
6                        0          11
7                        3          7
8                        0          5
9                        1          5
10                       0          1

The results differ slightly from [10], as we attained a maximum of 10 consecutive rejected days; nevertheless the same logic as in [10] applies. The S&P 500 displays the ability to assimilate anomalies that happen in the market and return to non-rejected days, which is apparent since the large majority of rejected days are followed by a non-rejected day.


Chapter 4

Numerical Results

In this chapter we display the numerical results of our findings from the multiple experiments using Benford's Law on the S&P 500, the STOXX 600, and the stock market in general. The first results delve into an earlier time period of the S&P 500; the experiments then expand to different years of the S&P 500 and to the STOXX 600 index.

The Chi-square goodness-of-fit tests conducted in the following experiments use \(m = 8\) degrees of freedom with two different significance levels, 0.05 and 0.01. The null hypothesis is rejected for \(\alpha = 0.05\) if the value of \(\chi^2\) is greater than \(\chi^2_{8,0.95} = 15.507\), and rejected for \(\alpha = 0.01\) if the value of \(\chi^2\) is greater than \(\chi^2_{8,0.99} = 20.09\).

4.1 First digit test

The following experiments are an expansion of the original experiment. They include three separate data sets: S&P 500 between 1985-1994, S&P 500 between 2010-2020, and STOXX 600 between 2010-2020.

4.1.1 Overall analysis

Using the same Chi-square goodness-of-fit test, we replicate the same analysis on three different data sets. Figure 4.1 displays the same comparison of the probability distribution \(P_d\) of the first significant digits of the observed prices and log returns against Benford's Law and the uniform probability distribution, now for 1985-1994.

Table 4.1: Chi-square calculations S&P 500 1985-1994

Reference Probability Distribution   χ² w.r.t. prices   χ² w.r.t. returns
Benford                              8210.65            2641.96
Uniform                              353437.04          308406.53

From a visual standpoint, the observed values conform even more to Benford's distribution here than in the original experiment.


[Figure 4.1: Overall empirical probability distribution S&P 500 1985-1994. First significant digit vs. \(P_d\): observed log return values, observed price values, Benford distribution, and uniform distribution.]

Looking at Table 4.1 and following the previous theories [20], using distance as a determinant of the level of rejection, we can identify that this new data set conforms even more to Benford's Law than the previous experiment: the \(\chi^2\) w.r.t. prices, at 8210.65, is 689.20% closer, and the \(\chi^2\) w.r.t. returns, at 2641.96, is 244.47% closer. As mentioned in Section 3.4, one reason for the distance from the \(\chi^2_{8,0.95}\) value is the size of the data set; as the 1985-1994 data set is the smallest of all the experiments, it fits this reasoning.

The next experiment covers the most recent time period, 2010-2020, following the same procedures. It comprises 2517 daily close prices and log returns for 429 stocks, more than the 1985-1994 data set but fewer than the original experiment.


[Figure 4.2: Overall empirical probability distribution S&P 500 2010-2020. First significant digit vs. \(P_d\): observed log return values, observed price values, Benford distribution, and uniform distribution.]

Table 4.2: Chi-square calculations S&P 500 2010-2020

Reference Probability Distribution   χ² w.r.t. prices   χ² w.r.t. returns
Benford                              48818.14           8570.04
Uniform                              271416.54          643572.94

In Figure 4.2 we observe results that do not comply with our theory that data set size drives the level of conformity to Benford's Law. The \(\chi^2\) w.r.t. prices lies between the original experiment and the 1985-1994 results, being 594% higher than the 1985-1994 result, while the original result is 115.92% higher still. The \(\chi^2\) w.r.t. returns goes completely against the theory, as it is the highest result of the three, 324.40% farther than the returns from 1985-1994 and 132.69% away from the original. Recalling the earlier remark on the peculiarity of results obtained after 2001, these results continue to align with that statement.

One final experiment is done on the application of Benford's Law to the stock market, this time on the STOXX 600 instead of the prior S&P 500 index. The experiment consists of 312 stocks with 2517 daily close prices and log returns.


[Figure 4.3: Overall analysis STOXX 600 2010-2020. First significant digit vs. \(P_d\): observed log return values, observed price values, Benford distribution, and uniform distribution.]

Table 4.3: Chi-square calculations STOXX 600

Reference Probability Distribution   χ² w.r.t. prices   χ² w.r.t. returns
Benford                              6015.23            5511.93
Uniform                              326731.30          457845.50

Figure 4.3 displays the overall empirical probability distribution for this data set, showing the observed log returns and price values against Benford's distribution and the uniform distribution. Comparing this experiment to the S&P 500 2010-2020 data, we have two experiments over the same time period with the same number of daily log returns and daily close prices. Although the data sets are alike in size, the overall results differ vastly: while Figure 4.2 conformed to Benford's distribution, the values did not match completely, whereas in Figure 4.3 the conformity of the values is much higher. Moving to a quantitative analysis, Table 4.3 shows the \(\chi^2\) w.r.t. prices and returns for Benford's distribution and the uniform distribution. We achieve the lowest \(\chi^2\) w.r.t. prices of all the experiments completed thus far; while this would fit the theory of data set size influencing the level of rejection, that theory was already contradicted above. Inspecting Table 4.3, the \(\chi^2\) w.r.t. prices is 940.74% less rejected than the original experiment, 136.50% less rejected than the 1985-1994 data set, and 811.58% closer than the S&P 500 2010-2020 data set. The \(\chi^2\) w.r.t. returns is 118.81% less rejected than the original, while the 1985-1994 period is 208.63% less rejected than the STOXX 600, and the STOXX 600 is 155.48% less rejected than the S&P 500 2010-2020 data set.


4.1.2 Day-by-Day Analysis

The day-by-day analysis for the 1985-1994 period is shown in Figure B.11 in the appendix, which displays the Chi-square values for each day. As previously mentioned, the 1985-1994 data set was rejected less in terms of \(\chi^2\) values, and in the figure a larger portion of values appears under the rejection line compared to the original experiment. The data set consisted of 2526 daily close prices and log returns, tested at \(\alpha = 0.05\) and 0.01. This gave 277 rejected days for \(\alpha = 0.01\) and 555 rejected days for \(\alpha = 0.05\), rejection percentages of 10.97% at the 0.01 significance level and 21.97% at 0.05.

Looking at Table 4.4 we see the 45 most rejected days, with the first two being September 11th, 1986 and November 30th, 1987. In September 1986 a large stock market crash dropped the S&P 500 8.5% and the NASDAQ 5.4%, while the market in August before it had been bullish, up more than 5%. The same pattern occurred again in September 2000, with the S&P 500 and NASDAQ dropping by the same amounts and August being up more than 5% prior to the crash, indicating a potential trend. The next large drop, in November 1987, came a month after the stock market crash of October 19th, 1987, otherwise known as Black Monday [3], in which the DJIA (Dow Jones Industrial Average) dropped 22.6%, the largest drop since December 12th, 1914 and the largest single-day fall in U.S. history. In Table 4.4, the 5th most rejected day is the Black Monday crash itself, with many of the top rejected days coming shortly after October 19th, 1987.

Table 4.4: 45 most rejected days in relation to Chi-square S&P 500 1985 - 1994

Rank Day χ²   Rank Day χ²   Rank Day χ²

1 11-Sep-1986 138.36 16 21-Aug-1991 48.35 31 11-Nov-1988 39.86

2 30-Nov-1987 87.47 17 26-Oct-1987 47.86 32 22-Oct-1987 39.50

3 14-Apr-1988 86.44 18 17-Mar-1989 45.66 33 29-Sep-1988 39.46

4 16-Oct-1987 77.88 19 13-Oct-1989 45.46 34 03-Aug-1990 38.46

5 19-Oct-1987 77.83 20 22-Jan-1990 44.94 35 06-Apr-1988 38.42

6 06-Aug-1990 68.58 21 16-Feb-1993 44.75 36 17-Jan-1991 37.91

7 01-Oct-1990 66.62 22 22-Feb-1989 43.43 37 02-Sep-1988 37.06

8 30-Mar-1987 65.64 23 27-Jul-1994 42.39 38 07-Jul-1986 36.32

9 08-Jan-1988 63.17 24 03-Dec-1987 41.43 39 24-Feb-1989 36.31

10 19-Aug-1991 60.93 25 14-Apr-1987 40.92 40 09-Apr-1987 36.24

11 04-Jan-1988 60.70 26 09-Jun-1986 40.87 41 08-Jun-1988 35.97

12 23-Aug-1990 55.92 27 19-May-1987 40.71 42 09-Nov-1990 35.93

13 12-Jan-1990 54.78 28 20-Jan-1989 40.21 43 23-Sep-1985 35.60

14 21-Oct-1987 54.56 29 09-Jan-1986 40.14 44 04-Feb-1994 35.52

15 31-May-1988 48.88 30 02-May-1988 40.04 45 14-Oct-1987 35.52

Looking at Figure B.12, the most rejected days seem not to conform to either Benford's distribution or the uniform distribution, while the least rejected days align almost completely with Benford's Law.


Observing Figure B.1, we can identify a higher number of Chi-square values above the rejection line in comparison to the 1985-1994 values, showcasing the level to which the Chi-square values are rejected, since the 2010-2020 data nullified the potential theory relating data set size to conformity with Benford's Law. The 2010-2020 data set contains 2517 daily close prices and log returns for 429 stocks. Using \(\alpha = 0.01\) we have 1159 rejected days, and for \(\alpha = 0.05\) we have 1534 rejected days, leading to a 53.86% rejection rate at the 0.01 significance level and a 71.28% rejection rate at 0.05. Moving on to Table 4.5, we see the 45 most rejected days for the 2010-2020 data set, the first two being August 24th, 2015 and February 5th, 2018. On August 24th a crash known as the 2015 Flash Crash occurred, in which the S&P 500 fell 103 points in a matter of minutes. The second most rejected day, February 5th, was also a stock market crash, which saw the S&P 500 fall 2.1%; this was due to a surge of jobs in the U.S. economy leading to a large increase in interest rates affecting the entire economy.

Table 4.5: 45 most rejected days in relation to Chi-square S&P 500 2010-2020

Rank Day χ²   Rank Day χ²   Rank Day χ²
1 24-Aug-2015 430.34 16 23-Aug-2011 214.07 31 11-Mar-2020 178.80
2 05-Feb-2018 405.23 17 12-Mar-2020 213.05 32 08-Oct-2014 178.36
3 11-Aug-2011 332.88 18 11-Jun-2020 205.43 33 25-Sep-2014 178.10
4 26-Dec-2018 304.97 19 10-Oct-2013 204.35 34 18-Aug-2011 176.49
5 08-Feb-2018 299.75 20 02-Mar-2020 196.79 35 03-Feb-2014 172.72
6 04-Mar-2020 295.26 21 29-Jun-2015 191.05 36 23-Nov-2012 171.57
7 08-Aug-2011 295.00 22 29-Jan-2016 190.64 37 02-Aug-2011 169.82
8 01-Sep-2015 284.65 23 09-Nov-2011 189.34 38 28-Oct-2020 167.04
9 27-Feb-2020 251.67 24 06-Apr-2018 187.94 39 07-Sep-2011 160.10
10 21-Aug-2015 246.00 25 04-Jan-2019 186.71 40 16-Mar-2020 159.92
11 20-Jun-2013 242.21 26 10-Oct-2011 185.79 41 31-Jul-2014 157.73
12 08-Sep-2015 237.98 27 30-Nov-2011 182.14 42 31-Dec-2014 157.52
13 26-Aug-2015 225.56 28 29-Aug-2011 181.23 43 18-Dec-2014 157.07
14 09-Sep-2011 222.47 29 25-Feb-2020 180.51 44 05-Aug-2019 156.52
15 04-Aug-2011 219.02 30 09-Sep-2015 178.91 45 24-Dec-2018 156.16

Figure B.2 displays the least and most rejected days for the S&P 500 between 2010-2020. In the figure the most rejected days seem to conform slightly more to Benford's distribution, while the least rejected days continue to conform almost completely, in line with the most and least rejected days of the previous experiments.

Finalizing the day-by-day analysis is the STOXX 600 data set for the 2010-2020 time period, comprising 2517 daily close prices and log returns from 312 stocks. The Chi-square values for each day of the stock index are displayed in Figure B.3, which shows the level to which the days are rejected according to the Chi-square results. Looking at the significance levels, at \(\alpha = 0.01\) we have 503 rejected days and at \(\alpha = 0.05\) we have 922, leading to a 19.98% rejection rate at the 0.01 significance level and a 36.62% rejection rate at 0.05.

Table 4.6 below displays the 45 most rejected days for the STOXX 600 data set, the two most rejected days being June 3rd, 2015 and August 11th, 2011. While no


Table 4.6: 45 most rejected days STOXX 600 2010-2020

Rank Day χ²   Rank Day χ²   Rank Day χ²

1 03-Jun-2015 66.04 16 28-Nov-2011 46.35 31 25-Mar-2014 39.74

2 11-Aug-2011 61.94 17 10-Jun-2015 45.46 32 16-Feb-2016 38.98

3 17-Mar-2011 58.83 18 06-May-2011 45.02 33 01-Mar-2016 38.66

4 28-Mar-2018 55.90 19 16-Mar-2011 44.25 34 23-Aug-2019 38.37

5 22-Sep-2015 55.18 20 05-Jul-2019 43.56 35 16-Nov-2015 38.30

6 21-Aug-2015 54.70 21 10-Dec-2014 42.45 36 28-Apr-2017 38.17

7 23-Aug-2011 53.75 22 21-Oct-2014 42.36 37 02-Apr-2018 37.98

8 08-Dec-2015 53.72 23 27-Jul-2015 41.16 38 26-Mar-2015 37.88

9 09-Oct-2013 51.39 24 15-Jan-2016 41.07 39 16-Oct-2014 37.73

10 22-May-2015 50.73 25 27-Dec-2013 40.92 40 20-May-2020 37.60

11 15-Dec-2014 48.80 26 17-Aug-2017 40.91 41 02-Oct-2015 37.43

12 21-Oct-2011 48.61 27 27-Aug-2018 40.78 42 11-Feb-2014 37.34

13 17-Nov-2016 48.26 28 17-Aug-2015 40.49 43 15-Oct-2020 37.26

14 29-Apr-2020 48.21 29 22-Sep-2017 40.29 44 26-Jun-2015 37.05

15 20-Jan-2016 46.52 30 15-Jul-2014 39.83 45 09-Jan-2012 36.87

specific major stock crash happened on June 3rd, it is linked to the previous 45 most rejected days for 2010-2020, as from June 3rd, 2015 to August 24th, 2015 an estimated ten trillion dollars were lost in global markets [21]. The second date, August 11th, 2011, correlates with the previous 2010-2020 data set, where August 11th is the third most rejected day. This day is notable because stock markets fell globally due to the U.S. having its credit rating downgraded from AAA, the European sovereign debt crisis, and the fear that France and the UK would have their AAA credit ratings downgraded as well [5].


4.1.3 Consecutive Rejection Days

Turning to a deeper analysis of the log returns, we have the consecutive rejection days for the three new data sets at \(\alpha = 0.01\) and 0.05. The tables below display the consecutive rejection days for the new experiments on the S&P 500 and the STOXX 600, with Table 4.7 containing a maximum streak of 6 rejected days, Table 4.8 a maximum of 20, and Table 4.9 a maximum of 11. All of these tables behave the same way as the original experiment: they display the ability to assimilate the anomalies that appear throughout the data and conform back to normality.

Table 4.7: Consecutive days rejected S&P 500 1985-1994

Consecutive rejections 𝛼 = 0.01 𝛼 = 0.05
1 230 405
2 32 76
3 4 15
4 2 10
5 0 2
6 0 1


Table 4.8: Consecutive days rejected S&P 500 2010-2020

Consecutive rejections 𝛼 = 0.01 𝛼 = 0.05
1 553 562
2 107 127
3 56 63
4 35 50
5 15 21
6 12 16
7 6 13
8 9 14
9 2 4
10 2 3
11 1 4
12 0 0
13 0 0
14 0 1
15 0 3
16 0 0
17 0 0
18 0 0
19 1 2
20 0 2

Table 4.9: Consecutive days rejected STOXX 600

Consecutive rejections 𝛼 = 0.01 𝛼 = 0.05
1 378 543
2 66 119
3 14 54
4 4 18
5 2 11
6 2 6
7 0 1
8 0 1
9 0 0
10 0 0
11 0 1


4.2 Second Digit Analysis

4.2.1 Overall

A further analysis is performed on the data set from the replication, since the paper [10] remarks on the potential benefit of also testing against the second significant digit, whose distribution is given by equation (2.13). The analysis uses the same code with a few adjustments: all 9's in the for loops are changed to 10's, since the second digit includes "0" as a significant digit, and the function call "benford_extract(adjc(:,i),'ALL',10)" is changed to "benford_extract(adjc(:,i),'ALL',10,'2ND')".
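As a concrete illustration of the second-digit test, the following is a minimal MATLAB sketch of our own (not the thesis code; the vector x of observations is a placeholder) that extracts the second significant digit and evaluates it against the second-digit law of equation (2.13):

x    = abs(x(x ~= 0));                    % drop zeros before extracting digits
mant = x ./ 10.^floor(log10(x));          % mantissa in [1, 10)
d2   = floor(mod(mant * 10, 10));         % second significant digit, 0 through 9
obs  = histcounts(d2, -0.5:1:9.5);        % observed counts per digit

% Second-digit law, equation (2.13): P(d) = sum over k = 1..9 of log10(1 + 1/(10k + d))
p2 = arrayfun(@(d) sum(log10(1 + 1 ./ (10*(1:9) + d))), 0:9);

n    = numel(d2);
chi2 = sum((obs - n*p2).^2 ./ (n*p2));    % Chi-square statistic against expected counts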

Figure 4.4: Second digit analysis of data from S&P 500 1995-2007 (overall empirical probability distributions of the second significant digit, 0-9, for the observed log return and price values against the second-digit law and the uniform distribution)

Table 4.10: Chi-square calculations S&P 500 1995-2007

Reference Probability Distribution 𝜒2 w.r.t. prices 𝜒2 w.r.t. returns
Second Law 1660.68 143.77
Uniform 18154.61 24805.35

What is observed in Table 4.10 is that the distributions of the log returns and the prices follow the second-digit distribution even more closely than they follow the first-significant-digit distribution. The large disparity is attributed to how the Chi-square goodness-of-fit statistic is calculated: a small difference in the number of occurrences of "1's" in the data corresponds to a larger test statistic for the first significant digit than for the second. To be precise, the test statistic of the log returns against the Uniform distribution is 172.5 times larger than against Benford's distribution for the second significant digit, compared to 101.8 times larger for the first, which means that the second digit conforms more closely than the first according to the Chi-square test.
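To make the comparison explicit, the second-digit ratio follows directly from Table 4.10:

𝜒2(Uniform) / 𝜒2(Second Law) = 24805.35 / 143.77 ≈ 172.5,

and the first-digit ratio of 101.8 is computed in the same way from the first-digit Chi-square results.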

Figure 4.5: Second digit analysis of data from S&P 500 2010-2020 (overall empirical probability distributions of the second significant digit, 0-9, for the observed log return and price values against the second-digit law and the uniform distribution)

Table 4.11: Chi-square calculations S&P 500 2010-2020

Reference Probability Distribution 𝜒2 w.r.t. prices 𝜒2 w.r.t. returns
Second Law 308.02 263.54


Figure 4.6: Second digit analysis of data from S&P 500 1985-1994 (overall empirical probability distributions of the second significant digit, 0-9, for the observed log return and price values against the second-digit law and the uniform distribution)

Table 4.12: Chi-square calculations S&P 500 1985-1994

Reference Probability Distribution 𝜒2 w.r.t. prices 𝜒2 w.r.t. returns
Second Law 1282.19 359.83


Figure 4.7: Overall second digit analysis STOXX 600 2010-2020 (overall empirical probability distributions of the second significant digit, 0-9, for the observed log return and price values against the second-digit law and the uniform distribution)

Table 4.13: Chi-square calculations STOXX 600

Reference Probability Distribution 𝜒2 w.r.t. prices 𝜒2 w.r.t. returns

Second Law 225.82 37.72

Uniform 15605.76 16987.07

4.2.2 Day-by-day

The day-by-day analysis did not yield as promising results for the second significant digit as for the first. The second-digit day-by-day results are displayed in Appendix B. Looking at the 1995-2007 period in Figure B.9, most days do not reject the null hypothesis. Furthermore, when examining the days that rejected the null most strongly, we could not correlate them to anomalous events the way we could in many cases with the first digit test. The following Tables showcase the 45 most rejected days in relation to Chi-square for all the second digit tests.

Evaluating the 45 most rejected days for each data set confirmed that the day-by-day results do not provide much information for the second digit either. For the first digit, the 45 most rejected days all correlated with major stock market crashes or big events that heavily influenced the stock market. In comparison, the second digit's 45 most rejected days contain no such significant dates, showing that this test does not yield promising results.


Table 4.14: Second digit's 45 most rejected days in relation to Chi-square S&P 500 1995-2007

Rank Day 𝜒2 Rank Day 𝜒2 Rank Day 𝜒2

1 08-Dec-2005 34.30 16 13-Feb-2003 27.00 31 05-Mar-1997 24.75

2 29-Jul-2004 32.69 17 28-Oct-2004 26.91 32 11-May-1998 24.74

3 25-Feb-2004 32.37 18 23-Mar-2006 26.79 33 12-Jun-2003 24.47

4 18-Nov-2003 31.37 19 17-Jun-1998 26.24 34 28-Jan-2003 24.45

5 17-Oct-2007 31.06 20 10-Feb-1997 26.09 35 08-Dec-2003 24.40

6 08-Sep-2003 30.91 21 21-Oct-2003 25.89 36 07-Dec-2001 24.32

7 22-Jun-2000 30.08 22 29-Oct-1996 25.47 37 09-Aug-2000 24.21

8 15-Mar-2007 29.31 23 10-Mar-2004 25.40 38 28-Jan-2005 24.20

9 22-Mar-2006 28.76 24 10-May-2005 25.33 39 17-Nov-1997 24.20

10 24-Mar-2005 28.49 25 03-Oct-2003 25.29 40 25-Feb-1998 24.09

11 20-Oct-2005 27.79 26 21-Aug-1998 25.08 41 21-Mar-2007 24.08

12 15-Feb-2000 27.77 27 30-Jun-2004 25.07 42 13-Sep-2002 23.93

13 14-Jan-1997 27.33 28 24-Jan-2003 25.03 43 26-Jul-2006 23.85

14 12-Feb-2007 27.32 29 04-Mar-1997 24.95 44 11-Jun-1996 23.79

15 29-Oct-2003 27.14 30 14-Jul-2004 24.80 45 06-Apr-1999 23.70

Table 4.15: Second digit's 45 most rejected days in relation to Chi-square S&P 500 1985-1994

Rank Day 𝜒2 Rank Day 𝜒2 Rank Day 𝜒2

1 11-May-1992 33.31 16 24-Jan-1994 24.99 31 02-Aug-1990 23.02

2 07-Mar-1986 33.08 17 21-Aug-1990 24.90 32 06-Sep-1985 22.88

3 20-Jan-1986 29.16 18 03-Sep-1986 24.59 33 07-May-1985 22.85

4 11-Sep-1986 28.77 19 01-Sep-1988 24.33 34 23-May-1990 22.81

5 23-Nov-1987 27.92 20 10-Dec-1991 23.96 35 03-May-1993 22.80

6 20-May-1985 27.74 21 27-Sep-1994 23.73 36 16-Apr-1986 22.79

7 26-Oct-1987 27.71 22 12-Feb-1986 23.62 37 15-Nov-1994 22.68

8 24-Jan-1986 27.23 23 17-Feb-1993 23.54 38 28-Oct-1986 22.63

9 19-Nov-1992 26.64 24 25-May-1994 23.41 39 10-Oct-1985 22.60

10 23-Dec-1994 26.34 25 14-Jun-1991 23.40 40 12-Dec-1989 22.59

11 27-Jun-1988 25.78 26 04-Nov-1993 23.40 41 16-Jul-1987 22.57

12 15-Apr-1986 25.47 27 18-May-1987 23.32 42 17-Oct-1989 22.55

13 02-May-1994 25.42 28 25-Jul-1986 23.29 43 18-Aug-1993 22.46

14 24-Aug-1987 25.42 29 06-Dec-1990 23.28 44 13-Jan-1994 22.38


Table 4.16: Second digit's 45 most rejected days S&P 500 2010-2020

Rank Day 𝜒2 Rank Day 𝜒2 Rank Day 𝜒2

1 13-Nov-2019 30.24 16 28-Nov-2018 24.97 31 15-May-2019 22.08

2 19-Feb-2015 30.18 17 12-Apr-2018 24.64 32 15-Jul-2015 21.90

3 08-Oct-2018 29.94 18 17-Nov-2014 24.60 33 13-Mar-2015 21.88

4 07-Jun-2012 29.36 19 09-Dec-2019 24.52 34 26-Mar-2014 21.80

5 04-May-2011 29.23 20 09-May-2018 24.20 35 17-May-2013 21.74

6 29-Dec-2010 28.86 21 29-Aug-2011 24.15 36 14-May-2019 21.72

7 31-Dec-2014 28.30 22 19-Dec-2016 23.43 37 30-Nov-2011 21.50

8 15-Jul-2019 28.29 23 30-Oct-2017 23.41 38 22-Apr-2015 21.45

9 08-Jul-2016 27.70 24 27-Dec-2011 23.36 39 22-Jun-2011 21.38

10 17-Dec-2014 27.12 25 06-Sep-2016 23.28 40 19-Mar-2012 21.38

11 25-Sep-2018 26.73 26 01-Jun-2012 23.06 41 09-Jun-2017 21.34

12 23-Aug-2017 26.55 27 28-Dec-2012 22.84 42 29-Dec-2017 21.30

13 21-Feb-2018 26.24 28 29-Nov-2011 22.70 43 28-Feb-2018 21.26

14 28-Jul-2017 25.83 29 08-Jun-2016 22.65 44 18-Jun-2012 21.25

15 05-Feb-2018 25.82 30 29-Apr-2015 22.09 45 21-Nov-2019 21.23

Table 4.17: Second digit's 45 most rejected days STOXX 600

Rank Day 𝜒2 Rank Day 𝜒2 Rank Day 𝜒2

1 25-Feb-2019 37.88 16 22-Jan-2015 23.00 31 10-Jul-2017 21.79

2 16-Apr-2020 30.76 17 06-Jun-2016 22.98 32 28-Dec-2016 21.63

3 24-Feb-2011 29.10 18 24-Jan-2019 22.96 33 29-Dec-2017 21.46

4 01-Apr-2011 28.60 19 21-Dec-2018 22.95 34 21-Apr-2011 21.33

5 21-Nov-2018 27.48 20 28-Oct-2014 22.94 35 04-Sep-2012 21.28

6 29-May-2020 27.48 21 22-Jan-2019 22.91 36 04-Jan-2019 21.27

7 22-Sep-2016 27.01 22 15-Jan-2014 22.70 37 31-Dec-2018 21.19

8 29-Nov-2019 26.46 23 27-Jul-2012 22.61 38 03-Mar-2016 21.16

9 03-Oct-2011 26.17 24 19-May-2020 22.60 39 23-Jan-2017 21.07

10 10-Nov-2015 25.69 25 12-Sep-2016 22.53 40 01-Dec-2014 20.94

11 05-May-2015 25.58 26 02-Apr-2020 22.38 41 24-Aug-2018 20.91

12 18-Sep-2017 24.12 27 22-Jan-2013 22.32 42 05-Feb-2018 20.75

13 26-Dec-2014 23.40 28 10-Feb-2016 22.15 43 31-Jul-2020 20.65

14 23-Sep-2019 23.21 29 09-Jan-2015 22.05 44 22-Feb-2019 20.62


4.2.3 Consecutive rejection days

Ending with the consecutive rejection days, the results did not produce much: the maximum run was four consecutive rejected days, which happened a single time in the 1985-1994 period at significance level 0.05 (see Table 4.19). This means that for the second digit the series return to conformity at a much higher rate than they did for the first digit.

Table 4.18: Second digit consecutive days rejected S&P 500 1995-2007

Consecutive rejections 𝛼= 0.01 𝛼= 0.05

1 74 250

2 4 24

3 0 2

Table 4.19: Second digit consecutive days rejected S&P 500 1985-1994

Consecutive rejections 𝛼= 0.01 𝛼= 0.05

1 47 193

2 1 13

3 0 3

4 0 1

Table 4.20: Second digit consecutive days rejected S&P 500 2010-2020

Consecutive rejections 𝛼= 0.01 𝛼= 0.05

1 34 151

2 1 13

3 0 3

Table 4.21: Second digit consecutive days rejected STOXX 600 2010-2020

Consecutive rejections 𝛼= 0.01 𝛼= 0.05

1 30 122

2 0 9
