Estimating Companies’ Survival in Financial Crisis: Using the Cox Proportional Hazards Model



By: Niklas Andersson

Independent Thesis Advanced Level

Department of Statistics

Supervisor: Inger Persson


Abstract

This master thesis is aimed at answering the question What is the contribution from a company's sector with regard to its survival of a financial crisis?, with the sub question Can we use survival analysis on financial data to answer this? Survival analysis, which is seldom applied to financial data, is therefore used to answer the main question. This is interesting since it both studies how well survival analysis can be used on financial data and evaluates whether all companies experience a financial crisis in the same way. The dataset consists of all companies traded on the Swedish stock market during 2008. The results show that the survival method is well suited to the data that is used. The sector a company operated in has a significant effect, but the power is too low to give any indication of specific differences between the sectors. Furthermore, it is found that the group of smallest companies had much better survival than larger companies.

Keywords: Survival Analysis, Survival Data, Time to Event, 2008 Financial Crisis and Swedish Stock Market.

Acknowledgments

First and foremost I would like to thank my family for their continuous support throughout my academic studies. Secondly I would like to thank my supervisor, Inger Persson.


Contents

1 Introduction
  1.1 Background
  1.2 Method
2 Theory
  2.1 Survival Analysis
    2.1.1 Survival Function
    2.1.2 Hazard Function
    2.1.3 Censoring
    2.1.4 Estimations of Survival and Hazard
    2.1.5 Test of Group Differences
    2.1.6 The Cox Proportional Hazards model
    2.1.7 Assumptions, Goodness of Fit and Diagnostics
  2.2 Key Performance Indicators
    2.2.1 Liquidity
    2.2.2 Solvency
3 Data
  3.1 The Observations
  3.2 The Variables
  3.3 An Event
4 Results
  4.1 Survival Functions and Cumulative Hazards
    4.1.1 By Sectors
    4.1.2 By Size
  4.2 Cox Modeling
    4.2.1 Individual Study Start
    4.2.2 Day One Study Start
    4.2.3 Final Model
5 Summary and Conclusion
6 Bibliography
7 Appendix
  7.1 Descriptive Statistics for stratums
  7.2 Cumulative Hazards
  7.3 Companies
  7.4 Example Data

1

Introduction

Below follows a short background to this thesis outlining the problem and environment

that we will work with. This is followed by a short presentation of the method that will be used and the main question that the thesis will strive towards answering.

1.1

Background

Stock markets have always been susceptible to financial crises and this in turn affects each company with stocks traded in the market. Each company will do its best to survive a financial crisis and keep the investors believing in it. By extension the company is trying to keep the price of its stock from falling too far.

There have been many reasons for financial crises throughout history, and although not all financial crises are rooted in a stock market, many have come to affect these markets. Kindleberger and Aliber (2005, ch. 1) report on the first known recorded financial crisis, the ”Tulipmania”, which originated in overinflated prices on tulip bulbs. Kindleberger and Aliber (2005, ch. 1) also tell of more recent financial crises such as the 1920s stock price bubble or, closer to home, the real estate and stock crisis during the 1980s and the beginning of the 1990s that affected Sweden, Finland and Norway amongst other nations.

In times like these we might see stock prices falling rapidly and far, with huge consequences for companies and financial institutions. But we do not expect every stock to experience the same price fall; after all, no two companies are the same. Therefore their reaction, and to what degree they are affected by a financial crisis, differs from company to company. We also expect companies and the prices of their stocks to react differently from crisis to crisis since, in the same way as no company is the same as another, there are always parameters that make each crisis unique.

During 2008 the world's stock markets went through a recession leading to large falls in stock prices. Sweden was no exception and the general price index of the OMX Stockholm exchange closed at a mere 50.2% of its value at the beginning of the year. The recession affected all stocks traded on the Stockholm market and no stock that was traded at the beginning of the year closed on a positive note by the end of December.

However, as we have stated, stocks are not the same and there ought to be differences between them. One could speculate that the size of the company and the sector in which it operates could make a difference in the stock's sensitivity to a recession like the one in 2008. There were most likely also other factors, such as Key Performance Indicators (KPI), specific to each company that affected to what extent the price of their stock fell. After all it is the traders that set the price, and their confidence and belief in a company's stock determines what price they are willing to pay. Figure 1.1 below shows that there is some difference in how the index prices for different sectors developed during the time of interest in this thesis.

Figure 1.1: Sector indexes during 2008 (base value 2008-01-01)

As of now we do not know if these differences are significant, that is, if the sectors were affected differently during this time period. The primary interest of this thesis will be to determine if the different sectors played a role in how well a stock resisted price falls during 2008.


1.2

Method

What we want to find out in this thesis is if there were differences in each company's stock's resistance to price falls during the 2008 recession, specifically if there were differences due to the different sectors/industries that the companies operated in. In order to do this we have chosen to use the statistical methods of survival analysis and Cox proportional hazards modeling.

We will be working with survival time data or Time to Event data, where the dependent variable is the time from when we start observing something until a specific event occurs. When working with this kind of data the Cox proportional hazards model is by far the most popular method. In fact, according to Allison (2010, ch. 5), the original paper written by Sir David Cox in 1972, where he presents the proportional hazards method, had been cited over 1000 times in 2009, making it the most cited paper in statistics and earning it a place among the top 100 most cited papers of science. There are many reasons for this popularity and the foremost is probably the fact that the model itself does not need any information about an underlying distribution that we expect the survival time to follow. This makes the Cox method semi-parametric and sets it apart from parametric models where you need to make a decision about the underlying distribution. This makes the Cox proportional hazards model more robust. We will see how this semi-parametric model works in Section: (2.1.6). There are also other reasons for the method's popularity, amongst them the fact that it is relatively easy to include time dependent covariates and that it can use both discrete and continuous measurements of the Time to Event. However, this said, the Cox method is not a universal method and there are times where a parametric model based on known distributions is preferred. Also, using a semi-parametric method will result in higher standard deviations when estimating parameters, as we will see in Section: (2.1.4). (Allison, 2010, ch. 5)

In this thesis the Cox method will be used since the underlying hazard is unknown in our data, and the dependent variable will be a Time to Event variable since we will measure the time from the start of the study until a specified price fall occurs.

In the research leading up to this paper I have found few cases of Time to Event data, and thus Cox proportional hazards modeling, being used on financial data. The lack of previous research suggests that it is not commonly used in this field; however, Ni (2009) has successfully used this method in a paper focused on the effect from a number of share (company stock) related KPI:s. She defined a certain price fall in the companies' stocks (the same for all stocks) to be the event of interest (essential in survival analysis) in order to find each company's survival time. In essence the method she used is the same as the one used in this paper; however, the focus will be shifted towards the contribution of the sector in which a company operates.

With the given background this thesis will strive towards answering the following questions: What is the contribution from a company’s sector with regards to its survival

of a financial crisis? with the sub question Can we use survival analysis on financial data to answer this?

As already mentioned, not all crises are the same, thus this thesis will be limited to the crisis and the following price fall that took place during 2008. In the same way not all markets are the same and a second limitation will be that only stocks traded on the OMX Stockholm exchange market will be included in the study. The dataset will initially consist of stocks traded at the beginning of 2008 and they will be studied throughout the year.

2

Theory

In this section we will look closer at the theories that will be used as a foundation in this thesis. We will cover theory regarding survival analysis in general and the Cox Proportional Hazards model, as well as financial theory regarding KPI:s that can influence investors' decisions.

2.1

Survival Analysis

In survival analysis we are mainly concerned with studying time to event data. That is, we have a specific event that either will or will not happen for the observations in our study. This kind of data can occur in a wide number of fields such as medicine, engineering and economics. A simple example of an event can be the death of a patient or a specific diagnosis within the field of medicine, while for an engineer it could mean the breaking of a component in a machine. It is then the time to this specific event, the survival time, that we will study for each observation. Of course the event does not have to be something negative but could just as well be a remission after a treatment. (Klein and Moeschberger, 2005, ch. 2.1)

Throughout this theory section we will give examples of graphical and numerical results by applying the theories to a dataset created for this purpose. The event in this dataset is, as it will be when we look at our main dataset, negative and concerns the time it takes for a mechanical component to break. The dataset and its variables are presented in depth in Appendix: (7.4).


2.1.1

Survival Function

We will let the time at which our event of interest occurs be T; then we can characterize the distribution of T by using the Survival Function. This function tells us the probability of an individual (observation) surviving beyond a specific time, t, or the probability of experiencing the event after time t (Klein and Moeschberger, 2005, ch. 2.2). We can define the survival function as:

S(t) = \Pr(T > t) \quad (2.1)

The reason for the survival function being \Pr(T > t) rather than \Pr(T = t) is that some observations will not experience the event during our study and thus their time to event is unknown. These observations are called censored observations and in Section: (2.1.3) we will look closer at why and how censoring occurs.

The survival function itself is in fact the complement of the cumulative distribution function, S(t) = 1 - F(t), or equivalently the integral of the probability density function, S(t) = \int_t^\infty f(u)\,du. If T is continuous then S(t) is also continuous and a strictly decreasing function. At time 0 the survival probability will always be 1 and as time goes towards infinity the probability of survival will go towards 0. (Klein and Moeschberger, 2005, ch. 2.2)

Klein and Moeschberger (2005, ch. 2.2) also report what happens when T is not continuous. This is often the case in survival analysis due to lack of precision in the measurement of time. A medical study might only measure for the event at check-ups done at some time interval, such as once a year. In that case the researchers are only able to determine that the event has happened between the last check-up and this check-up. Then we have to tweak the definition of our survival function. As we will see in Section: (3) this is the case in our study since we can only observe the event once a day. Assume that the time to event is discrete and can take on the values t_i, i = 1, 2, 3, \ldots; then:

p(t_i) = \Pr(T = t_i), \quad i = 1, 2, 3, \ldots \quad \text{where } t_1 < t_2 < t_3 < \cdots

which gives

S(t) = \Pr(T > t) = \sum_{t_i > t} p(t_i) \quad (2.2)

2.1.2

Hazard Function

Another way of looking at the survival time is through the Hazard Function or Hazard Rate, h(t). This function tells us the probability of experiencing the event in the next instant, conditioned on the fact that the event has not happened up to that point in time. This in turn describes the probability of experiencing the event over time and how that probability changes with time. The hazard rate is defined as (see e.g. Klein and Moeschberger (2005, ch. 2.3)):

h(t) = \lim_{\Delta t \to 0} \frac{\Pr[t \le T < t + \Delta t \mid T \ge t]}{\Delta t} \quad (2.3)

and if T is continuous then,

h(t) = \frac{f(t)}{S(t)} = -\frac{\partial \ln[S(t)]}{\partial t} \quad (2.4)

Cumulative Hazard Function

Closely related to the hazard function is the Cumulative Hazard Function which, as the name suggests, accumulates the hazard rate over time. This definition is also given by Klein and Moeschberger (2005, ch. 2.3) for a continuous T:

H(t) = \int_0^t h(u)\,du = -\ln[S(t)] \quad (2.5)

where

S(t) = \exp[-H(t)] = \exp\left[-\int_0^t h(u)\,du\right] \quad (2.6)

If T is not continuous then the cumulative hazard function is

H(t) = \sum_{t_i \le t} h(t_i) \quad (2.7)

2.1.3

Censoring

When working with survival data one thing will almost always be a problem to consider: censoring. Kleinbaum and Klein (2012, ch. 1.2) describe that censoring occurs when we have some information about an observation's survival time without knowing the exact time. A simple example is when a patient in a study is no longer followed while the event of interest has not yet happened. In this case we know that the patient ”survived” up until the point where we stopped following that patient, but not for how long afterwards. Kleinbaum and Klein (2012, ch. 1.2) give three general reasons for censoring:

• The study ends without the event occurring for an observation
• The observation is lost to follow-up
• When working with people they can choose to withdraw from the study

At the end of a study all observations will either have experienced the event or they will be censored. However, not all cases of censoring are the same and it is common to divide censoring into two major categories, right and left censoring, which is determined by how the data is collected.

Right Censoring

Right censoring is when the event is observed only if it occurs before a predetermined time. This is the simplest form of right censoring and is often called Type I censoring. An example of this is when a study starts off with a number of patients in whom we await an event. However, due to cost the study might end before all individuals have experienced the event, and those individuals will thus be right censored. In this simplest form all individuals that are censored when the study ends will have been observed for the same length of time (from the start to the end of the study) and have the same censoring time. (Klein and Moeschberger, 2005, ch. 3.2)

Similar to Type I censoring is generalized Type I censoring which, as in the basic Type I case, has a predetermined end date for the study. In this case though, the starting dates of the individuals are not the same; rather they enter the study at individual times. Therefore, when the study ends, those observations that are time censored will not have the same study time. (Klein and Moeschberger, 2005, ch. 3.2)

Another common type of right censoring, often used within the field of technology according to Klein and Moeschberger (2005, ch. 3.2), is Type II censoring. In this type of censoring a number of observations (n) are entered into the study at the same time, but instead of a predetermined time at which the study ends, a number of events (r) is chosen. We have the condition that r < n, and when r of the observations have experienced the event the study ends. One property of this kind of censoring is that the data will consist of the r smallest event times from a random sample of size n.

A pseudo case of right censoring that is also common in survival analysis, and which can coexist with many other types of censoring, is Random Censoring. Random censoring occurs when we, for some reason, no longer can study our observation/individual although we know that it has not yet experienced the event. Random censoring implies that the censoring time and the event time are independent. This might come about due to a patient in a medical study moving outside of the area that the hospital performing the study operates in; thus the patient is lost to further follow-ups, which in turn leads to the event never being observed although the study itself is still ongoing. These cases are censored at the time when they are lost and not, if for example Type I censoring is used, at the end of the study. (Klein and Moeschberger, 2005, ch. 3.2)

In our study random censoring will be present. This might be due to a number of things such as mergers, buy-outs, or termination of trade. Other types of censoring exist, such as Left Censoring and Truncation, but these will not be present in this thesis. For information about other types of censoring see e.g. Klein and Moeschberger (2005, ch. 3) or Kleinbaum and Klein (2012, ch. 1).


2.1.4

Estimations of Survival and Hazard

In the next two sections we will look closer at estimation of the survival function and the hazard rate. There are two different methods that we will cover, both of which are applicable when working with right censored data. The focus on right censored data is due to the fact that this is the kind of censoring present in our dataset. The dataset itself is outlined in detail in Section: (3). The two methods that we will look at are the Kaplan-Meier (K-M) or Product-Limit estimator and the Nelson-Aalen (N-A) estimator. As we will see, both these methods can be used to derive the survival function and the cumulative hazard function. Since our dataset will be right censored we will focus only on these methods.

When presenting and working with the methods we will use the following notation and assumptions. We will have a total of n individuals or observations and each of those will hold information about their time to event, t_i, as well as whether they experienced an event or were censored. The data will be discrete, thus allowing for possible ties, and the event/censoring can only be observed at specific time points, so t_1 < t_2 < t_3 < \cdots < t_D. Here t_1 represents the first time at which we can observe an event, t_i the ith time point and t_D the time point at which the last event is observed. At a given time point we know the number of individuals at risk of experiencing the event, Y_i, and we can observe d_i events. If d_i > 1 we will have a tie at time point t_i. The ratio d_i/Y_i can thus be interpreted as the conditional probability of experiencing the event at time t_i given that it has not been experienced at any previous time. This quantity will be used in both methods in order to estimate the survival function S(t) and the cumulative hazard function H(t). These notations are equivalent to those used by Klein and Moeschberger (2005) in their presentation of these estimators, with the assumption that the censoring time is unrelated to the event time (there is no information about the time to the event for a censored observation).

Kaplan–Meier

The Product-Limit estimator of the survival function was introduced by Kaplan and Meier (1958) and is therefore also called the Kaplan-Meier estimator. This estimator has a limitation that we need to be aware of: it is only well defined for the time interval that has been observed in the data, t_1 to t_D. In other words, beyond the largest observed time the survival function cannot be estimated. The estimator is defined as (see e.g. Klein and Moeschberger (2005, ch. 4.2)):

\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{Y_i}\right), \quad \text{for } t_1 \le t \quad (2.8)

If the condition t_1 \le t is not met then \hat{S}(t) = 1, that is, the probability of survival before the first observation is 1. The variance of these estimates can be derived through Greenwood's formula (see e.g. Klein and Moeschberger (2005, ch. 4.2)):

\hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i(Y_i - d_i)} \quad (2.9)

Earlier we pointed out the connection between the survival function and the hazard rate, through which we can use the K-M estimator to estimate the cumulative hazard function as well as the survival function. By taking -\ln of the estimated survival function at each time point we acquire an estimate of the cumulative hazard for that time point, -\ln[\hat{S}(t)] = \hat{H}(t).

Example: In Figure: (2.1) the survival function for the example data is estimated using the Kaplan-Meier estimator. The time to event data is discrete and the survival function has a step or stair shaped downward slope. The survival function is, as discussed in Section: (2.1.1), decreasing. The last observation (in time) is censored, thus the survival function ends in a horizontal line.
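The Product-Limit calculation itself is simple enough to sketch in a few lines of code. The following is a minimal illustration in Python (the computations in this thesis are carried out in SAS), using a small invented set of component lifetimes rather than the example data in Appendix: (7.4).

```python
import numpy as np

def kaplan_meier(durations, events):
    """Product-Limit estimate of S(t) at each distinct observed event time.

    durations : observed times (event or censoring)
    events    : 1 if the event was observed, 0 if the observation was censored
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])   # distinct event times t_i
    surv, s = [], 1.0
    for t in event_times:
        y = np.sum(durations >= t)                    # Y_i: number at risk just before t
        d = np.sum((durations == t) & (events == 1))  # d_i: events observed at t
        s *= 1.0 - d / y                              # multiply in the factor (1 - d_i/Y_i)
        surv.append(s)
    return event_times, np.array(surv)

# Invented component lifetimes (days); event = 1 means the component broke, 0 censored
t = [5, 8, 8, 12, 15, 20, 22, 30]
e = [1, 1, 0, 1, 1, 0, 1, 0]
for ti, si in zip(*kaplan_meier(t, e)):
    print(f"t = {ti:4.0f}   S_hat = {si:.3f}")
```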


Nelson–Aalen

Although the K-M estimator can be used to estimate the cumulative hazard, the Nelson-Aalen estimator of the cumulative hazard is considered to have better small-sample properties and will therefore be used in this thesis (Klein and Moeschberger, 2005, ch. 4.2). The estimator was developed by Nelson (1972) and Aalen (1978), where Aalen in the same publication gave the expression for the variance of this estimator that is given in (2.11). The estimator itself is given below in (2.10) as it is presented by Klein and Moeschberger (2005, ch. 4.2).

\tilde{H}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i}, \quad \text{for } t_1 \le t \quad (2.10)

If the condition t_1 \le t is not met then \tilde{H}(t) = 0, that is, the cumulative hazard before the first observation is 0. The variance of the estimator is:

\sigma^2_{\tilde{H}}(t) = \sum_{t_i \le t} \frac{d_i}{Y_i^2} \quad (2.11)

As with the K-M estimator, this estimator is well defined up to the largest observed t. Just as we could transform \hat{S}(t) into an estimator of H(t) through -\ln[\hat{S}(t)] = \hat{H}(t), we can transform the N-A estimator into an alternative estimator of the survival function, \tilde{S}(t) = \exp[-\tilde{H}(t)]. (Klein and Moeschberger, 2005, ch. 4.2)

Example: Below in Figure: (2.2) the cumulative hazard is estimated on the example data using the Nelson-Aalen estimator. The cumulative hazard is non-decreasing since it accumulates the estimated hazard at each point in time. The last observation (in time) is censored, thus the cumulative hazard ends in a horizontal line in the same way as the survival function in Figure: (2.1).


Figure 2.2: Estimated cumulative hazards on the example data
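A companion sketch for the Nelson-Aalen estimator, reusing the same invented lifetimes as in the Kaplan-Meier sketch above, also shows the transformation back to an alternative survival estimate.

```python
import numpy as np

def nelson_aalen(durations, events):
    """Nelson-Aalen estimate of the cumulative hazard H(t) at each event time."""
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    cumhaz, h = [], 0.0
    for t in event_times:
        y = np.sum(durations >= t)                    # Y_i
        d = np.sum((durations == t) & (events == 1))  # d_i
        h += d / y                                    # add the increment d_i / Y_i
        cumhaz.append(h)
    return event_times, np.array(cumhaz)

t = [5, 8, 8, 12, 15, 20, 22, 30]
e = [1, 1, 0, 1, 1, 0, 1, 0]
times, H = nelson_aalen(t, e)
S_alt = np.exp(-H)   # alternative survival estimate, S~(t) = exp(-H~(t))
print(np.column_stack([times, H, S_alt]))
```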

2.1.5

Test of Group Differences

When we are studying a variable that can be divided into two or more groups we might be interested in testing if there is a difference between these groups regarding their survival/hazard. An example of a study where different survival between groups might be of interest is in the field of medicine, where we can have three different groups of patients all suffering from the same disease but treated with different medicines. The primary interest of that study might then be to determine if there is a difference in survival between the groups and, by extension, possibly also between the medicines. In the same way an engineer might want to test if there is a difference in how long a component lasts before breaking in a machine depending on which manufacturer made that component. In order to keep the notation equivalent to that of Klein and Moeschberger (2005) we will let K represent the groups, where K_j represents the jth group. Also, τ will represent the last point in time where there is at least one observation still at risk in all groups.

The test itself focuses on the hazard at each time point where there is an observed event and determines if there is a difference between the groups at that time point. A weakness with these kinds of multigroup tests that is pointed out by Klein and Moeschberger (2005, ch. 7.3) is that they test if at least one population differs from the others at any point in time. In other words, while the test can detect that there is a significant difference between the groups, it does not tell us which group differs from the rest. The hypothesis is:

H_0: h_1(t) = h_2(t) = h_3(t) = \cdots = h_K(t), \quad \text{for all } t \le \tau
H_1: \text{at least one of the } h_j(t) \text{ differs for one or more } t \le \tau \quad (2.12)

with the test function

Z_j(\tau) = \sum_{i=1}^{D} W_j(t_i) \left( \frac{d_{ij}}{Y_{ij}} - \frac{d_i}{Y_i} \right), \quad j = 1, \ldots, K \quad (2.13)

This test function will be a cornerstone in creating the test statistic. In (2.13), W_j(t_i) represents a weight that could be different for each group. However, this is not the case in the most common variations of this test, some of which we will cover below, thus (2.13) can be simplified. The tests that will concern us use a weighting function where W_j(t_i) = W(t_i) Y_{ij}; thus W(t_i) is the same for all groups and Y_{ij} is the number at risk in the jth group at time t_i. Using this we can simplify (2.13) to (2.14). (Klein and Moeschberger, 2005, ch. 7.3)

Z_j(\tau) = \sum_{i=1}^{D} W(t_i) \left[ d_{ij} - Y_{ij} \left( \frac{d_i}{Y_i} \right) \right], \quad j = 1, \ldots, K \quad (2.14)

Further on, the variance and covariances of Z_j(\tau) from (2.14) are given by:

\hat{\sigma}_{jj} = \sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{ij}}{Y_i} \left( 1 - \frac{Y_{ij}}{Y_i} \right) \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i, \quad j = 1, \ldots, K \quad (2.15)

and

\hat{\sigma}_{jg} = - \sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{ij}}{Y_i} \frac{Y_{ig}}{Y_i} \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i, \quad g \ne j \quad (2.16)

In both (2.15) and (2.16) we find the term (Y_i - d_i)/(Y_i - 1), which will be equal to one in all cases except when two or more observations have the same time to event. Thus this term corrects for possible ties. All possible Z_j(\tau) are linearly dependent and the test statistic that we will use is produced by excluding any one of the Z_j's. From this choice we get the variance-covariance matrix \Sigma of the remaining Z_j's that is used in the test statistic (2.17). The test statistic is of quadratic form and when the null hypothesis is true and we have a large sample it is chi-square distributed with K - 1 degrees of freedom. That means that when using significance level α to test H_0, the test will be rejected when χ² is larger than the upper αth percentage point of a χ²_{K-1} distribution.

(Klein and Moeschberger, 2005, ch. 7.3)

\chi^2 = (Z_1(\tau), \ldots, Z_{K-1}(\tau)) \, \Sigma^{-1} \, (Z_1(\tau), \ldots, Z_{K-1}(\tau))^t \quad (2.17)

In the special case where K = 2 the test statistic in (2.17) simplifies to:

Z = \frac{\sum_{i=1}^{D} W(t_i) \left[ d_{i1} - Y_{i1} \left( \frac{d_i}{Y_i} \right) \right]}{\sqrt{\sum_{i=1}^{D} W(t_i)^2 \, \frac{Y_{i1}}{Y_i} \left( 1 - \frac{Y_{i1}}{Y_i} \right) \left( \frac{Y_i - d_i}{Y_i - 1} \right) d_i}} = \frac{Z_1(\tau)}{\hat{\sigma}_1} \quad (2.18)

where Z is standard normally distributed.

There is a multitude of possible variations of this test due to the weight function, and altering this weight function gives the test different properties. Arguably the simplest weight is 1, where all time points have the same importance. This variant is commonly known as the Log-Rank test. Below follows a list of some variations of the test presented by Klein and Moeschberger (2005, ch. 7.3) and Kleinbaum and Klein (2012, ch. 2.5-2.6).

Log-Rank test

W(t) = 1: All time points have the same weight, which makes it ideal when the hazard rates in all groups are proportional.

Gehan’s or Wilcoxon Test

W(t_i) = Y_i: More weight is placed on time points where more individuals are in the risk group. That is, the test is weighted towards earlier failures. This test is best if we have reason to suspect that the ”treatment” effect is strongest early on in the study. Focus is on differences early in time.

Tarone-Ware test

W(t_i) = f(Y_i) where f(y) = \sqrt{y}: This test puts most weight on differences between the observed and expected events at time points where there is the most data.

In Section: (4.1) both the Log-Rank and Wilcoxon tests will be presented when determining significant differences between sectors and company sizes. This is in order to find both general differences throughout time with the unweighted Log-Rank test, and possible differences early on with the Wilcoxon test. In our study we will have the most data early on, thus the Tarone-Ware test would also put emphasis on early differences, and it will not be used in favor of the Wilcoxon test.

Example: Let us now apply this group test to our example data. The Factory variable consists of two different levels, Factory1 and Factory2, by which we can stratify the survival time. First, in order to get a graphical picture of the differences, Figure: (2.3) shows the estimated survival function stratified by the Factory variable.


Since the red line for Factory2 is consistently below the line for Factory1 we can suspect that Factory1 produces components with longer survival than Factory2. However, we need to look at the test results in Table: (2.1) to confirm this. Looking at Table: (2.1) we see that neither of the test statistics is significant at the 5% level, thus there are (so far) no significant differences between the two factories.

Table 2.1: Tests of equality in the example data stratified by Factory

Test Chi-Square DF Pr> Chi-Square

Log-Rank 0.4644 1 0.4956

Wilcoxon 1.2141 1 0.2705
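For reference, a two-group comparison of this kind can also be run in Python with the lifelines library; the sketch below uses invented lifetimes for two hypothetical factories and the unweighted (W(t) = 1) log-rank version of the test.

```python
import numpy as np
from lifelines.statistics import logrank_test

# Invented component lifetimes (days) for two factories;
# event = 1 means the component broke, 0 means it was censored.
t1 = np.array([5, 8, 12, 15, 22, 30, 33, 40])
e1 = np.array([1, 1, 1, 0, 1, 0, 1, 0])
t2 = np.array([4, 6, 9, 11, 14, 18, 25, 28])
e2 = np.array([1, 1, 1, 1, 0, 1, 1, 0])

result = logrank_test(t1, t2, event_observed_A=e1, event_observed_B=e2)
print(result.test_statistic, result.p_value)   # chi-square on 1 df and its p-value
```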

2.1.6

The Cox Proportional Hazards model

Sometimes there is more than just a single group variable that differentiates the individuals in a study. In such a case the method presented in Section: 2.1.5 would not suffice. What we need is a more complex model that can take multiple variables into account, in the same way as an ordinary linear model does. The proportional hazards model is such a model and it was presented by Cox (1972). Thus the model is often referred to as the Cox proportional hazards model.

Sticking with the notation of Klein and Moeschberger (2005), we will let T_j denote the time individual j has been/was in the study, δ_j indicate whether the individual has experienced the event (δ_j = 1 if the event has occurred) and Z_j(t) be a vector consisting of the k = 1, \ldots, p observed covariates for individual j at time point t. In total there are n individuals (j = 1, \ldots, n). For simplicity, and due to the fact that there will be no time dependent variables in this study, we can let Z_j(t) = Z_j. Using these inputs we can write the model suggested by Cox (1972), where h(t|Z) is the hazard rate at time t and Z is the covariate vector:

h(t|Z) = h_0(t) \, c(\beta^t Z) \quad (2.19)

In this equation h_0(t) represents a baseline hazard; that is, ignoring all other variables there will still be some hazard attributed towards experiencing the event. The fact that this proportional hazards model allows the actual distribution of the survival time to be unknown, or unspecified, is a key feature that is part of what has made the model very popular. This is also what makes the model semi-parametric. By specifying h_0(t) one could obtain parametric models such as the exponential or Weibull models. (Allison, 2010, p. 126-127)

In Equation: (2.19), c(\beta^t Z) is a known function and a requirement is that h(t|Z) must be positive. According to Klein and Moeschberger (2005, ch. 8.1) a commonly chosen function for c(\beta^t Z) is:

c(\beta^t Z) = \exp(\beta^t Z) \quad (2.20)

This is also the function that Cox (1972) uses. In this equation,

\beta^t = (\beta_1, \beta_2, \beta_3, \ldots, \beta_p) \quad \text{and} \quad Z = (Z_1, Z_2, Z_3, \ldots, Z_p)^t \quad (2.21)

thus

\beta^t Z = \beta_1 Z_1 + \beta_2 Z_2 + \beta_3 Z_3 + \cdots + \beta_p Z_p = \sum_{k=1}^{p} \beta_k Z_k \quad (2.22)

and we can rewrite Equation: 2.19 to

h(t|Z) = h_0(t) \exp(\beta^t Z) = h_0(t) \exp\left( \sum_{k=1}^{p} \beta_k Z_k \right). \quad (2.23)


Hazard rate

So why is it called “proportional hazards”? This is due to the fact that we can use the Cox proportional hazards model to look at the difference in hazards (of experiencing the event) between two individuals with different covariate values via the hazard ratio (Klein and Moeschberger, 2005, ch. 8.1):

\frac{h(t \mid Z)}{h(t \mid Z^*)} = \frac{h_0(t) \exp\left( \sum_{k=1}^{p} \beta_k Z_k \right)}{h_0(t) \exp\left( \sum_{k=1}^{p} \beta_k Z_k^* \right)} = \exp\left[ \sum_{k=1}^{p} \beta_k (Z_k - Z_k^*) \right] \quad (2.24)

This is the ratio of two different individuals' hazards, where Z and Z^* are the sets of covariates for the first and second individual respectively. This ratio will be a constant, which means that the hazard rates are proportional over time between two different individuals. The proportionality of the hazards between two individuals is the key assumption of this model. The easiest way of describing how to interpret this ratio is when only one covariate differs between the two individuals. Klein and Moeschberger (2005, ch. 8.1) use the case of one individual receiving a medicine (Z_1 = 1) while the other individual receives a placebo (Z_1 = 0) and all other covariates (Z_2, \ldots, Z_p) have the same values. In this case Equation: 2.24 would result in,

\frac{h(t \mid Z)}{h(t \mid Z^*)} = \exp(\beta_1) \quad (2.25)

where exp(β_1) is the risk of the event occurring if an individual has received the medicine, in comparison with placebo. If the medicine improves survival then the ratio will be less than 1.
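As a small numerical illustration (the coefficient and standard error below are invented, not estimates from this thesis), the hazard ratio and a Wald-type 95% confidence interval follow directly from an estimated coefficient:

```python
import math

beta1, se1 = -0.35, 0.12        # hypothetical estimate for the medicine dummy Z_1

hr = math.exp(beta1)            # hazard ratio exp(beta_1); < 1 means improved survival
lower = math.exp(beta1 - 1.96 * se1)
upper = math.exp(beta1 + 1.96 * se1)
print(f"HR = {hr:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```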

Estimating β using Partial Maximum Likelihood

We can estimate the values of β when there are no ties in the time to event by the partial likelihood in Equation: 2.26. In the next section we will explore the method when ties are present. The partial likelihood function was also introduced by D. R. Cox in his article from 1972 and it is called partial due to the fact that the baseline hazard, h_0(t), can be unknown and is left out of the covariate estimation. Due to this lack of information in the likelihood function the standard errors will be larger than if the whole model had been specified, but the estimates will still have good properties without any knowledge of the baseline hazard and they will be consistent and asymptotically normal for large samples. (Allison, 2010, p. 126-129)

This likelihood can be derived from Equation: 2.23 as presented by Cox (1972):

L(\beta) = \prod_{i=1}^{D} \frac{\exp\left( \sum_{k=1}^{p} \beta_k Z_{(i)k} \right)}{\sum_{j \in R(t_i)} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)} \quad (2.26)

In this equation Z_{(i)k} is the kth covariate associated with the individual that has failure time t_i. The numerator only involves the individual that experiences the event at time t_i, and the denominator is a sum over the set of individuals who were still in the study just before time t_i (the risk set R(t_i)). Taking the log of the likelihood in 2.26 gives:

\log L(\beta) = \sum_{i=1}^{D} \left( \sum_{k=1}^{p} \beta_k Z_{(i)k} \right) - \sum_{i=1}^{D} \ln\left[ \sum_{j \in R(t_i)} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right) \right] \quad (2.27)

The maximum likelihood (or partial maximum likelihood in this case) estimates can be found by maximizing this equation. This is done by setting the derivative of 2.27 (known as the score function) with respect to β equal to zero. The score function is given by Cox (1972) and presented in our notation by Klein and Moeschberger (2005, ch. 8.1):

U_h(\beta) = \frac{\partial \log L(\beta)}{\partial \beta_h} = \sum_{i=1}^{D} Z_{(i)h} - \sum_{i=1}^{D} \frac{\sum_{j \in R(t_i)} Z_{jh} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)}{\sum_{j \in R(t_i)} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)} \quad (2.28)

The estimates themselves are then found by solving U_h(\beta) = 0 for each h = 1, \ldots, p. Calculating these estimates can be done numerically using some kind of iterative process, such as the Newton-Raphson method, but in this thesis we will rely on the estimates produced by SAS.
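To make the partial likelihood concrete, the following is a minimal Python sketch of the negative log partial likelihood (2.27) for data without ties, on invented toy data; a general-purpose optimizer could then be used to maximize it, which is essentially what the Newton-Raphson iterations in dedicated software do.

```python
import numpy as np

def neg_log_partial_likelihood(beta, times, events, Z):
    """Negative Cox log partial likelihood (2.27), assuming no tied event times.

    times  : observed times, shape (n,)
    events : 1 if the event was observed, 0 if censored, shape (n,)
    Z      : covariate matrix, shape (n, p)
    """
    eta = Z @ beta                       # linear predictor beta^t Z_j for each individual
    ll = 0.0
    for i in np.where(events == 1)[0]:   # sum over individuals with an observed event
        risk_set = times >= times[i]     # R(t_i): still in the study just before t_i
        ll += eta[i] - np.log(np.sum(np.exp(eta[risk_set])))
    return -ll

# Tiny invented example: 5 individuals, 2 covariates
rng = np.random.default_rng(0)
Z = rng.normal(size=(5, 2))
times = np.array([2.0, 3.5, 4.1, 6.0, 7.3])
events = np.array([1, 0, 1, 1, 0])
print(neg_log_partial_likelihood(np.array([0.5, -0.2]), times, events, Z))
```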

In the next section we will look at ways to test the global hypothesis regarding the estimates, and in those tests we will need the information matrix. This matrix is the negative of the second derivative of Equation: 2.27. The information matrix will be denoted by I(β), which is a p × p matrix where the (g, h)th element is (Klein and Moeschberger, 2005, ch. 8.1):

I_{gh}(\beta) = \sum_{i=1}^{D} \left[ \frac{\sum_{j \in R(t_i)} Z_{jg} Z_{jh} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)}{\sum_{j \in R(t_i)} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)} - \frac{\sum_{j \in R(t_i)} Z_{jg} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)}{\sum_{j \in R(t_i)} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)} \times \frac{\sum_{j \in R(t_i)} Z_{jh} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)}{\sum_{j \in R(t_i)} \exp\left( \sum_{k=1}^{p} \beta_k Z_{jk} \right)} \right] \quad (2.29)

Ties

Estimating the β values using the partial maximum likelihood method described above works as long as there are no ties in the survival times. If any two events occur at the same point in time then the estimation of β must be adjusted. There are a number of methods to handle and set up the partial maximum likelihood when ties are present and Klein and Moeschberger (2005, ch. 8.4) outline three classical methods: the Breslow, the Efron and the Cox (or Discrete) method.

Allison (2010, ch. 5) describes the Discrete method and another method in depth, the Exact method. The Discrete method assumes that time is discrete, hence if two events happen at the same time there is no underlying order; both events really happened at the same time. In most cases this is highly unlikely and tied events are often due to the fact that we cannot measure time exactly enough. The Exact method assumes that there is an underlying order to the events, but since we can only observe events at time intervals we cannot determine which of the events occurred first if two events are recorded at the same time point, t_i. It is this Exact method that will be relevant in our case since our time data will be on a daily interval; thus two events can happen on the same day but we will not be able to tell which occurred earlier in the day. We will shortly go through the theory of the Exact method next.


The partial maximum likelihood derivation of β described in the previous section needed all events to be individually ordered in Equation: (2.26), thus ties will pose a problem when using this method. As explained, the Exact method assumes that ties are due to inexact measurement of time and that there is a true underlying order. Using this assumption and some basic probability theory, the Exact method can estimate the likelihood at those points in time where ties are present using all possible ways of ordering the events. If we assume that 3 events are tied at the 5th point in time where events are observed, t_5, then there are 3! = 6 different ways these events could have been ordered in reality. If we let each of these 6 orderings be denoted by A_i, and since the A_i are mutually exclusive, the sum of their probabilities will be the probability of their union. The likelihood contribution at the 5th point in time will then be L_5 = \sum_{i=1}^{6} \Pr(A_i).

Global test

The global hypothesis that we will be concerned with is H_0: β = β_0 vs H_1: β ≠ β_0. What we test is whether our model with the covariates is significantly different from a reduced model. The reduced model, β_0, can be seen as the model without the variables of interest while β represents the model including the variables we want to test; β is called the full model. (Kleinbaum and Klein, 2012, p. 103)

We will let b = (b_1, b_2, b_3, \ldots, b_p)^t represent the estimated coefficients of β derived using the maximum likelihood method presented in the previous section. There are three different tests that we will look at in this section and use later on in our analysis in Section: 4.2. All of them use this hypothesis but with slightly different test statistics. The three tests presented below are those that are reported by SAS when we run our model.

The Wald test uses the fact that b has a p-variate normal distribution for large samples, with mean β and variance-covariance matrix I^{-1}(b). The test statistic is then χ² distributed with p degrees of freedom, where the test statistic is (Klein and Moeschberger, 2005, p. 254):

\chi^2_W = (b - \beta_0)^t I(b) (b - \beta_0) \quad (2.30)

The Likelihood Ratio test is also χ² distributed with p degrees of freedom for large samples of n but uses the following test statistic (Klein and Moeschberger, 2005, p. 254):

\chi^2_{LR} = 2[\log L(b) - \log L(\beta_0)] \quad (2.31)

In this test statistic \log L(b) represents Equation: 2.27 using the estimated values of β, while \log L(\beta_0) is Equation: 2.27 with the reduced model. According to Kleinbaum and Klein (2012, p. 104) the Likelihood Ratio test has better statistical properties than the Wald test, but in general, at least for large samples, they produce fairly similar test statistics and conclusions.
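As an illustration of the likelihood ratio comparison between a full and a reduced model, the following Python sketch fits two nested Cox models with the lifelines library on a small synthetic dataset (all names and values invented) and computes (2.31):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2
from lifelines import CoxPHFitter

# Purely synthetic data standing in for a real dataset
rng = np.random.default_rng(1)
n = 200
data = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.binomial(1, 0.5, size=n)})
data["time"] = rng.exponential(scale=np.exp(-0.5 * data["x1"].to_numpy()), size=n)
data["event"] = rng.binomial(1, 0.8, size=n)   # roughly 20% censored, at random

full = CoxPHFitter().fit(data, duration_col="time", event_col="event")
reduced = CoxPHFitter().fit(data.drop(columns="x2"), duration_col="time", event_col="event")

lr = 2 * (full.log_likelihood_ - reduced.log_likelihood_)   # statistic (2.31)
print(round(lr, 3), round(chi2.sf(lr, df=1), 4))            # one extra parameter => 1 df
```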

The last test that we will look at is the Score test which, as the name suggests, uses the score function. In this test U(β) = (U_1(β), \ldots, U_p(β))^t and U(β) is asymptotically p-variate normal with mean 0 and covariance matrix I(β). The test statistic is χ² distributed with p degrees of freedom for large samples and has the following form (Klein and Moeschberger, 2005, ch. 8.3):

\chi^2_S = U(\beta_0)^t I^{-1}(\beta_0) U(\beta_0) \quad (2.32)

In Section: (4.2) we will mainly focus on the Likelihood Ratio statistic due to its better statistical properties, but all of these statistics will be presented when a model has been estimated. We will also perform some tests on subsets of variables from the model, and in those cases only the Wald statistic will be used since it is readily available in SAS for these tests.

Example: In Table: (2.2) the result from a fitted Cox model is presented. In this model all available covariates are used to estimate the survival of our components. All covariates are significant, with the largest P-value being 0.0155. Since the hazard ratio for Usage is smaller than 1, 0.815, each extra step of 1 in usage lowers the hazard of breakage by a factor of 0.815. A higher grade awarded in the tensile strength test increases the hazard of the event occurring (a lower test score is better). Factory1 is a dummy for the Factory covariate and the reference is Factory2. This means that the estimated hazard ratio for Factory1 is the hazard for Factory1 components in comparison to those from Factory2. In comparison to the result in Table: (2.1) we now have a significant effect from the Factory variable.

Table 2.2: Estimated Cox model using example Data

Variable    DF   Parameter Estimate   Standard Error   Chi-Square   Pr > Chi-Square   Hazard Ratio   Lower 95% HR Confidence Limit   Upper 95% HR Confidence Limit
Usage       1    -0.20406             0.05470          13.9180      0.0002*           0.815          0.729                           0.905
Grade       1    0.08821              0.01872          22.2083      < 0.0001*         1.092          1.054                           1.135
Factory1    1    -1.138955            0.47061          5.8573       0.0155*           0.320          0.122                           0.783

∗=significant on the 5% level

The global test of the fitted Cox model is presented in Table: (2.3) and from the results we can conclude that the model itself is significant since all three test statistics have a P-value of less than 0.0001. In this table we can also see the Akaike information criterion, which we will look closer at in the next section.

Table 2.3: Tests of global hypothesis, β = 0, using example data

Test Chi-Square DF Pr> Chi-Square

Likelihood Ratio 41.9147 3 < 0.0001*

Score 40.3968 3 < 0.0001*

Wald 31.3107 3 < 0.0001*

Akaike information criterion (AIC) 134.695
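For comparison with this SAS output, the same kind of summary (coefficients, hazard ratios, confidence limits, p-values and global tests) can be produced in Python with lifelines; the sketch below uses a publicly available example dataset bundled with the library, not the example data of this thesis.

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()                       # bundled recidivism dataset used for illustration
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

cph.print_summary()                     # estimates, hazard ratios, 95% limits, global tests

# AIC computed from the maximized log partial likelihood, -2 log L + 2k
aic = -2 * cph.log_likelihood_ + 2 * len(cph.params_)
print(round(aic, 3))
```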


2.1.7

Assumptions, Goodness of Fit and Diagnostics

In this section we will look at methods for checking the vital proportional hazards assumption and for evaluating how good the model is. We will begin by looking at two methods of checking for proportional hazards, then we will look at some information criteria before we examine some residuals.

Test of Proportional Hazards

Although we have not considered time dependent variables so far, due to the fact that we will not have any such variables in this thesis, there is still one use of them that we will mention. Klein and Moeschberger (2005, ch. 9.2) describe how time dependent covariates can be used to test the critical assumption of proportional hazards for our covariates that is needed when modeling with the Cox proportional hazards model. In order to test if a covariate, Z_1, that is not time dependent violates the proportional hazards assumption, we first create a new covariate using Z_1 which is artificially time dependent. Let this new variable be Z_2(t) = Z_1 × g(t) where g(t) is a function of time, such as g(t) = ln t. The hazard rate at time t (Equation: 2.19) would be,

h(t \mid Z_1) = h_0(t) \exp[\beta_1 Z_1 + \beta_2 (Z_1 \times g(t))] \quad (2.33)

and the hazard ratio between two individuals with different values of Z_1 is

\frac{h(t \mid Z_1)}{h(t \mid Z_1^*)} = \exp[\beta_1 (Z_1 - Z_1^*) + \beta_2 g(t)(Z_1 - Z_1^*)]. \quad (2.34)

It is clear from this hazard ratio that it will depend on time, through g(t), only if β_2 ≠ 0. Thus by testing the hypothesis H_0: β_2 = 0 we test whether the proportional hazards assumption holds for the covariate in mind. A rejection of H_0 would mean that the assumption is violated. What this means in practice is that we look at the p-value of the estimated parameter for the new time dependent variable when estimating the model. If that parameter is significant (different from zero) then the proportional hazards assumption does not hold for that variable. In our case we will also test the assumption on dummy variables, and since all dummies have to be tested together, a test of a linear combination of the parameters will be used.

Persson and Khamis (2008) present and test the statistical properties and power of a number of choices of g(t), such as \sqrt{t}, e^t and ln(t). They found that both \sqrt{t} and ln(t) are good choices, and since ln(t) is also the choice that Klein and Moeschberger (2005, ch. 9.2) propose, we will use g(t) = ln(t) when we test the proportional hazards assumption in Section: (4.2).

There are different solutions to use if the proportional hazards assumption fails when testing it in this manner. One solution is to include the created time dependent variable in the estimated model, however that variable will be hard to interpret. For further discussion on this see e.g. Klein and Moeschberger (2005, ch. 9.2).

Graphical Evaluation of Proportional Hazards, Arjas Plot

One way to graphically check the proportional hazards assumption is through the Arjas plot, first presented by Arjas (1988). The Arjas plot is not limited to checking the assumption of proportional hazards; it can also be used to check the overall fit of a proportional hazards regression model such as the Cox model. Let us assume that Z^* is a set of covariates included in the model and we are considering adding a new covariate, Z_1. Using the Arjas plot we can evaluate if Z_1 should be included and if Z_1 has proportional hazards adjusted for the existing covariates. (Klein and Moeschberger, 2005, ch. 11.4)

There exist a number of methods to check for proportional hazards graphically, as described by Persson and Khamis (2007) in their comparison of different methods. They recommend that the Arjas plot should be the preferred method of assessing the proportional hazards assumption graphically, except for the special case where the hazard is strictly increasing. Thus this method will be used together with the test of proportional hazards based on a created function-of-time variable (described in the previous section).

Klein and Moeschberger (2005, ch. 11.4) outline what is needed in order to produce the Arjas plot. First we need to estimate the proportional hazards model using the covariate set Z^*. If Z_1 is continuous we need to group it into K levels, making it discrete. For each of these levels (or the existing levels of a categorical variable) and at every event time, t_i, the Total Time on Test (TOT) is calculated from the estimated cumulative hazard of the model, together with the number of observed events in the group up to that time point, N. The calculations of these two statistics are given next:

TOT_g(t_i) = \sum_{Z_{1j} = g} \hat{H}(\min(t_i, T_j) \mid Z^*_j) \quad (2.35)

N_g(t_i) = \sum_{Z_{1j} = g} \delta_j I(T_j \le t_i) \quad (2.36)

What this means is that TOT_g(t_i) is the sum of the estimated cumulative hazards from the model using the covariate set Z^* for all individuals in group g, where g = 1, \ldots, K, evaluated at time t_i or at the individual's own time T_j, whichever is smaller. N_g(t_i) is simply the number of observed events in the same group up to the same point in time. If Z_1 is redundant and not needed for the model's fit, a plot of N_g versus TOT_g, the Arjas plot, will result in a 45° line through the origin. If we plot N_g versus TOT_g for all the groups this will result in K lines, and if they are linear but differ from the 45° line then Z_1 should be included in the model. Finally, if the lines produced are not linear this indicates a violation of the proportional hazards assumption. (Klein and Moeschberger, 2005, ch. 11.4)

Example: We will look at an example of the Arjas plot using our example data. In Figure: (2.4) the Factory variable is checked for proportional hazards. Since it consists of Factory1 and Factory2 this grouping is natural to use in the Arjas plot, and looking at Figure: (2.4) neither of the two factories seems to have proportional hazards; both lines are clearly not linear.

Figure 2.4: Arjas plot of the example data by Factory

Comparing Models using Information Criteria

In order to evaluate if one model is better than another, that is to evaluate the goodness of fit, we can use so called information criteria. These criteria only give an informal indication of which model has the best fit; the difference between two criteria cannot be tested. (Allison, 2010, p. 74)

All three of these criteria use the log-likelihood, specifically -2 × log L, i.e. -2 × [Equation: (2.27)]. While one could use just -2 × log L as a goodness of fit statistic, we will focus on three variations of this statistic which adjust for the number of covariates in the respective model. To be specific, they penalise a model with more covariates. Below these statistics are presented as Allison (2010, p. 74-75) presents them (remember that we have a total of p covariates and n observations in our notation):


AIC = -2 \log L + 2k

BIC = -2 \log L + k \log n

AICC = -2 \log L + 2k + \frac{2k(k+1)}{n - k - 1} \quad (2.37)

Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) both penalise additional covariates, where BIC in most applications penalises the most. The corrected Akaike's information criterion (AICC) is a slight alteration of the AIC statistic which takes the number of observations into account and thus may behave better in small samples. (Allison, 2010, p. 74-75)¹ When we estimate our models in Section: (4.2) we will use the AIC value to compare models.

¹ Important to remember is that when comparing goodness of fit statistics for different models you cannot compare AIC for one model with BIC for another, and so on. Only the same criterion can be compared between different models.
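As a small sketch (with an invented log-likelihood value), the three criteria in (2.37) are straightforward to compute once the maximized log partial likelihood is known:

```python
import math

def information_criteria(log_likelihood, k, n):
    """AIC, BIC and AICC as in (2.37); k = number of covariates, n = number of observations."""
    aic = -2 * log_likelihood + 2 * k
    bic = -2 * log_likelihood + k * math.log(n)
    aicc = -2 * log_likelihood + 2 * k + 2 * k * (k + 1) / (n - k - 1)
    return aic, bic, aicc

# Hypothetical fitted model: log partial likelihood -120.5, 3 covariates, 100 observations
print(information_criteria(-120.5, k=3, n=100))
```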

Residuals

Martingale residuals

While we might know which covariates we want to use in our Cox proportional hazards model, we might be uncertain what form of a variable would best explain its effect on survival. It might be that Z², log Z or some other transformation explains the variable's contribution to survival better than the plain form Z. Furthermore, we will have continuous variables in this thesis and it might be appropriate to discretize one or more of them in order to better estimate their influence in the model. A modification of the Cox-Snell residuals called Martingale residuals can be used to find an appropriate functional form of a covariate and to decide whether it should be discretized. The Martingale residual, \hat{M}, for right-censored data and time independent covariates is defined in Equation: 2.38. It has the property \sum_{j=1}^{n} \hat{M}_j = 0 and for large samples the \hat{M}_j are uncorrelated with population mean zero. These residuals can be seen as the difference between the observed number of events, δ_j, and the expected number of events, r_j. Thus the Martingale residuals represent the excess number of events in the data that is not predicted by the Cox proportional hazards model. (Klein and Moeschberger, 2005, ch. 11.3)

\hat{M}_j = \delta_j - r_j \quad (2.38)


The method for finding an appropriate functional form for the variable of interest is to exclude that variable when estimating the Cox proportional hazards model and then calculate the Martingale residuals. Let Z_1 represent the covariate of interest. Next we plot the residuals \hat{M}_j against Z_1 for each observation j. Usually a smoothed fit of the scatter plot is used to get a clear sense of its shape. It is this fitted curve that gives us an indication of what functional form of Z_1 should be used. The aim is to get a fitted (smoothed) curve that is linear. If the curve contains a clear break when Z_1 is continuous, this indicates that discretizing Z_1 would be appropriate. (Klein and Moeschberger, 2005, ch. 11.3)

Example: In the example data we have a variable called Grade. This variable could illustrate a test score given to the component in a tensile strength test and thus it might tell us whether a component will break. In Figure: (2.5) the Martingale residuals for Grade are plotted with a fitted curve. There is a clear break in this fitted curve which indicates that the Grade variable, which is continuous, should be discretized. Looking at the figure suggests that all test scores below 60-65 could be grouped into one group and all above into another.
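A comparable diagnostic can be sketched in Python with lifelines, which can return Martingale residuals from a fitted model (the residual column is assumed to be named "martingale", as in recent library versions). The bundled example dataset is used here, with one covariate left out of the fit and the residuals plotted against it.

```python
import matplotlib.pyplot as plt
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()
covariates = [c for c in df.columns if c != "age"]   # leave out the covariate of interest
cph = CoxPHFitter().fit(df[covariates], duration_col="week", event_col="arrest")

mart = cph.compute_residuals(df[covariates], kind="martingale")["martingale"]
plt.scatter(df.loc[mart.index, "age"], mart, s=10)   # align on index before plotting
plt.xlabel("age (covariate left out of the model)")
plt.ylabel("Martingale residual")
plt.show()
```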


Cox-Snell residuals

While AIC, BIC and AICC can be used to compare the goodness of fit of different models, they do not tell us whether a specific model fits well. One way of testing if an estimated Cox proportional hazards model has a good fit is through the Cox-Snell residuals. Lee and Wang (2003, ch. 8.4) present the Cox-Snell residuals as

r_j = -\log \hat{S}(t_j) \quad (2.39)

where t_j is the observed survival time, censored or uncensored, for individual j, and \hat{S}(t) is the estimated survival function based on the estimated covariates. If the observation at t_j is censored then the corresponding r_j will be treated as a censored observation. This means that when plotting these residuals they will be represented by a step like line, just as the survival function and cumulative hazard function.

If the residuals of a fitted Cox proportional hazards model are good, that is if the model fits the data, then the residuals should line up on a 45° line when plotted versus the estimated cumulative hazard rate of the residuals. If a model fits the dataset then the Cox-Snell residuals will follow the unit exponential distribution. If we let \hat{S}_R(r) represent the Kaplan-Meier estimate with respect to the residuals, then -\log \hat{S}_R(r) will be the estimated cumulative hazard, and for each individual -\log \hat{S}_R(r_j) = r_j if the model is appropriate. (Lee and Wang, 2003, ch. 8.4)

While the Cox-Snell residuals are useful when determining the overall goodness of fit of a model, Klein and Moeschberger (2005, p. 358) point out that this method gives no indication of why a model does not fit well in such cases. Furthermore, the assumption of an exponential distribution for the residuals only truly holds when the actual covariate values are used rather than the estimated values, and since we want to check the goodness of fit of an estimated model these values will be estimated. Thus departure from the exponential distribution (which will be observed as departure from the 45° line in the figure) can be due to uncertainty in the estimates of β and the cumulative hazard. This effect will be largest in the right tail of the distribution (and in the right end of the figure) for small samples. (Klein and Moeschberger, 2005, p. 359)


Example: In Table: (2.2) we estimated a Cox model using our example data. Now we will look at the goodness of fit for that model using the Cox-Snell residuals. In Figure: (2.6) the Cox-Snell plot is presented and the residuals deviate somewhat from the 45° line. This model seems not to fit the data perfectly.

Figure 2.6: Plot of Cox-Snell residuals for the estimated model (all covariates) of the example data
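Using the relation r_j = δ_j - M_j from Equation: 2.38, the Cox-Snell residuals can be obtained from the Martingale residuals and their Nelson-Aalen cumulative hazard compared with the 45° reference line. The Python sketch below does this on the bundled example dataset (not the thesis data), with the same assumption about lifelines' residual column name as above.

```python
import matplotlib.pyplot as plt
from lifelines import CoxPHFitter, NelsonAalenFitter
from lifelines.datasets import load_rossi

df = load_rossi()
cph = CoxPHFitter().fit(df, duration_col="week", event_col="arrest")

mart = cph.compute_residuals(df, kind="martingale")["martingale"]
events = df.loc[mart.index, "arrest"]
cox_snell = events - mart                      # r_j = delta_j - M_j

# If the model fits, these residuals behave like censored unit-exponential times,
# so their estimated cumulative hazard should follow the 45-degree line.
naf = NelsonAalenFitter().fit(cox_snell, event_observed=events)
plt.plot(naf.cumulative_hazard_.index, naf.cumulative_hazard_.values.ravel(),
         drawstyle="steps-post", label="cumulative hazard of residuals")
plt.plot([0, cox_snell.max()], [0, cox_snell.max()], "--", label="45-degree line")
plt.xlabel("Cox-Snell residual")
plt.ylabel("Estimated cumulative hazard")
plt.legend()
plt.show()
```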

2.2

Key Performance Indicators

There is a very large number of Key Performance Indicators (KPI:s), or financial ratios, used to analyse all sorts of aspects of a company's financial health and well being. Using these an investor can decide if he finds the company worthy of investment. To include them all would be close to impossible. In this thesis we will instead focus on two kinds of KPI:s aimed specifically at the companies' ability to pay off loans.

Penman (2010, ch. 19) defines two types of ratios that are of importance when analysing a company's ability to pay off its loans and avoid defaulting (going bankrupt and being terminated or sold in order to give back what is possible to the lenders/banks). These two main types of ratios are the Liquidity Ratio and the Solvency Ratio.

2.2.1

Liquidity

The Liquidity Ratio is concerned with short term papers (loans and debts that will have to be repaid within one year). As this might imply, liquidity gives an indication of how well a company will succeed in paying off loans and debt that are due in the near future. It means that these ratios are constructed using assets (things that can/will be turned into cash) and liabilities (debts and loans) that are, for assets, going to result in cash within a year and, for liabilities, coming due within a year. To distinguish these assets and liabilities from those with a longer time frame they are called current. (Penman, 2010, ch. 19)

So why is this of interest? Brealey et al. (2011, ch. 28.7) make the comparison to a household and a typical financial situation you can find yourself in. If you for some reason are facing a large unexpected bill you need to use easily accessible capital in order to pay it quickly. Savings and capital invested in stocks are examples of quick, easy money. But if those do not suffice you might have trouble realizing assets such as a car or a vacation house into cash quickly enough to meet the bill. In the same way, companies have assets that are easy to realize as well as assets that take a significant amount of time before they are turned into cash.


There are in total six different types of liquidity measures presented by Penman (2010, ch. 19) and we will focus on those that are also presented by Brealey et al. (2011, ch. 28.7), namely those that indicate the ability of a company's current assets to pay for its current liabilities. The three that we will not focus on are concerned with how well different types of cash flow can cover liabilities and expenditures.

\text{Current Ratio} = \frac{\text{Current Assets}}{\text{Current Liabilities}} \qquad (2.40)

\text{Quick (or Acid Test) Ratio} = \frac{\text{Cash} + \text{Short-Term Investments} + \text{Receivables}}{\text{Current Liabilities}} \qquad (2.41)

\text{Cash Ratio} = \frac{\text{Cash} + \text{Short-Term Investments}}{\text{Current Liabilities}} \qquad (2.42)

These three ratios in general explain the same thing, but there are slight differences in the numerator. The Current Ratio is the most general one, based on all current assets, while the Quick Ratio excludes inventories since these are somewhat slower to turn into cash. The Cash Ratio, as the name suggests, only takes into account cash and those investments that can be liquified almost immediately. (Penman, 2010, ch. 19)
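As a small illustration of equations (2.40)-(2.42), the sketch below computes the three ratios for a made-up balance sheet. The function name and all figures are hypothetical illustration values and are not taken from the companies in this study.

def liquidity_ratios(cash, short_term_investments, receivables,
                     inventories, other_current_assets, current_liabilities):
    # Current assets = everything expected to turn into cash within one year.
    current_assets = (cash + short_term_investments + receivables
                      + inventories + other_current_assets)
    return {
        "current_ratio": current_assets / current_liabilities,                        # (2.40)
        "quick_ratio": (cash + short_term_investments + receivables) / current_liabilities,  # (2.41)
        "cash_ratio": (cash + short_term_investments) / current_liabilities,          # (2.42)
    }

print(liquidity_ratios(cash=40, short_term_investments=10, receivables=60,
                       inventories=80, other_current_assets=10,
                       current_liabilities=100))
# {'current_ratio': 2.0, 'quick_ratio': 1.1, 'cash_ratio': 0.5}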

One thing to keep in mind is that not all companies are the same, and since they have different businesses, what might be considered a good current ratio for one company could be bad for another in the eyes of an investor. A current ratio of 1:1 could be good for a company which quickly sells and restocks its inventory, while for a manufacturing company with a slow process from inventory to cash a ratio of 2:1 might be just acceptable. This slowness from inventory to cash might also be seen as giving an overoptimistic view of the situation, and hence the quick ratio could be seen as harsher since it excludes the inventories. (Melville, 2011, p. 364)


Later we will use the Quick Ratio, as a good middle ground between the three kinds of ratios, to represent liquidity in our study.

2.2.2

Solvency

While liquidity gives investors a picture of the short-term situation, investors might also be interested in how well a company is equipped to pay off long-term debts. Thus investors look at Solvency Ratios in order to estimate a company's ability to cover debts in a more distant future. (Penman, 2010, ch. 19)

Below, three different types of solvency ratios that are of interest are presented. There are some differences between them but in general they are rather alike. As before, these three ratios are gathered from Penman (2010, ch. 19).

\text{Debt to Total Assets} = \frac{\text{Total Debt}}{\text{Total Assets}^{3}} \qquad (2.43)

\text{Debt to Equity}^{4} = \frac{\text{Total Debt}}{\text{Total Equity}} \qquad (2.44)

\text{Long-Term Debt Ratio} = \frac{\text{Long-Term Debt}}{\text{Long-Term Debt} + \text{Total Equity}} \qquad (2.45)

It is interesting to note what Brealey et al. (2011, ch. 28.6) mention regarding these ratios: they use the book (or accounting) value of a company, the equity, instead of the market value. While in a default situation the market value is what determines whether debt holders get their money back, it also includes things that investors assume to be positive for the future value of the shares. Assets such as research and development are included in the market value during good times, but these values might disappear if times become bad; thus the market value is often ignored by lenders.

The Debt to Equity ratio will be used as the solvency ratio later on, and it is in fact closely related to another KPI commonly used in Sweden, the Solidity (Soliditet).

3 Total Assets = Liabilities + Total Equity
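Analogously, the sketch below evaluates (2.43)-(2.45) for hypothetical figures. It treats Total Debt as the total liabilities in the Total Assets identity of the footnote above, which is a simplifying assumption made only for this illustration; the numbers are not from the thesis dataset.

def solvency_ratios(total_debt, long_term_debt, total_equity):
    # Assumes Total Debt plays the role of total liabilities in the identity
    # Total Assets = Liabilities + Total Equity (footnote 3).
    total_assets = total_debt + total_equity
    return {
        "debt_to_total_assets": total_debt / total_assets,                           # (2.43)
        "debt_to_equity": total_debt / total_equity,                                 # (2.44)
        "long_term_debt_ratio": long_term_debt / (long_term_debt + total_equity),    # (2.45)
    }

print(solvency_ratios(total_debt=300, long_term_debt=200, total_equity=200))
# {'debt_to_total_assets': 0.6, 'debt_to_equity': 1.5, 'long_term_debt_ratio': 0.5}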

3

Data

In this part of the thesis we will focus on the dataset gathered in order to answer our questions from Section: (1.2). We will present a summary of how the dataset and the variables were acquired and how the observations (company stocks) were selected. We will also look closer at how our event is defined.

3.1

The Observations

In Section: (1.2) it was briefly mentioned that this study would span the year 2008 and contain stocks listed on the main Swedish stock market, the OMX Stockholm exchange. Thus an initial list of stocks traded on the first day of trade, January 2nd, 2008, was gathered from the newspaper Dagens Industri (DI). The January 3rd edition was used since it contains the closing prices of all stocks traded in Sweden during the first day of trade. The actual time series data were then collected using the software Datastream Professional from Thomson Reuters. Due to missing information/data, daily closing prices during 2008 for a total of 293 stocks from the original list of 301 companies were gathered from Datastream. In Appendix: (7.3) a list of all 293 stocks is presented.

3.2

The Variables

Apart from the necessary event time variable that will be presented in Section (3.3), a number of covariates will be used in our study. What sector a company operated in will of course be a vital variable, since the primary interest in this thesis is the effect on stock price attributed to the sectors during 2008. During 2008, stocks in Sweden were typically divided into 9 different sectors, and the information about which sector a particular company operated in was gathered from the January 2nd edition of DI.

Another variable that is also collected from DI is company size. It is logical to include size in this study since the size of a company could very well contribute to how well it survives a financial crisis like the one in 2008. By DI, and in general, companies are


sorted into three size groups: Small Cap, Mid Cap and Large Cap. Which size group a company is sorted into is determined by the accumulated value of its stocks (number of stocks × price of one stock). If the stock value of a company is larger than 1 billion euro it is sorted into Large Cap. If the value is between 150 million euro and 1 billion euro it is a Mid Cap company, and finally companies with less than 150 million euro in accumulated stock value are regarded as Small Cap companies.
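As a small illustration, the size grouping just described could be coded as in the sketch below; the function name and the example inputs are hypothetical, while the euro thresholds follow the rule stated above.

def size_group(number_of_stocks, price_per_stock_eur):
    """Classify a company by accumulated stock value (market capitalisation)."""
    value = number_of_stocks * price_per_stock_eur
    if value > 1_000_000_000:        # above 1 billion euro
        return "Large Cap"
    if value >= 150_000_000:         # between 150 million and 1 billion euro
        return "Mid Cap"
    return "Small Cap"               # below 150 million euro

print(size_group(50_000_000, 4.0))   # 200 million euro -> 'Mid Cap'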

In Section: (2.2) we discussed the usage of Liquidity and Solvency ratios to evaluate the capacity of a company to pay off its loans. We will use these two measurements, or ratios, in our model in order to further adjust, apart from size, for the fact that not all companies are the same.

These ratios are presented by the companies for each fiscal year (12 month interval of financial reporting, typically Jan-Dec) in the annual report presenting the figures for that year. This means that, in general, last year's ratios will be available 3-5 months into the new year. When our study starts in January 2008, strictly speaking, only the KPI:s from the financial report regarding the year 2006 (presented during the spring of 2007) were available to the traders. This would imply that it is the Liquidity and Solvency ratios from 2006 that should be used in the study, since these are the latest figures available at the beginning of the study. However, we will use the KPI:s that were presented in the financial reports for 2007, thus available to the traders during the spring of 2008, rather than those from 2006, in order to get figures that are a bit more up to date.

The two ratios that will be used in this study are the Quick Ratio as an indicator of liquidity and Debt to Equity as a solvency ratio. As with the time-series data for each stock, these KPI:s are gathered from Datastream Professional, where unfortunately some missing values regarding debt, liabilities and assets resulted in a total of 201 observations with information about both their Quick Ratio and Debt to Equity. As we will see in Section: (4.2), this means that models including these variables will be estimated on fewer observations.


3.3

An Event

In order to find the time to event for each of our stocks we need to define what an event is. Since we are interested in finding differences that determine how well a company can withstand a recession, the death or termination of a company would be a suitable event. However, since few of the companies traded in the beginning of 2008 were terminated during the year, this would lead to few events. Thus this thesis will use the same method as Ni (2009), where the event is defined as a specific amount of fall in a stock's price.

Looking at the OMX Stockholm share price index, which is an index compiled of the prices of all stocks traded on the OMX Stockholm exchange, we can determine that the index lost almost 50% of its value. In fact, the index was at its all-year high notation on the first day of trade, January 2nd. Using that as the index base (value 100), its lowest notation of 50.18 was measured on the 21st of November. Since the index is a weighting of all companies, some will have a higher value at a specific point in time and some will have a lower value than the index at that point. This means that although some companies also fell to 50% of their initial value during this year, others will not have done so.

In this study we will work with two types of event times. In the first, the study starts at the beginning of the year for all stocks and ends on the last day of trade; this would be Type I censoring as described in Section: (2.1.3). Since the price index fell to a minimum of 50% of its initial value, the arbitrary point of a price fall to below 60% of the initial value of the stock will be regarded as an event. This point is chosen so that most observations (stocks) will have experienced the event, in order to avoid a huge amount of censored observations (choosing the event to be a price fall to 50% of the initial stock value would result in more censored cases). To separate this time to event from the next one we will refer to it as 'Day One' start.

The second time to event variable that will be used is individual for each stock. This time to event uses the same window in time for the study, but the study of each observation starts at its respective year-high notation. That is, for some stocks the time to event might be counted from the first day of the study, January 2nd, while for others it is counted from a later date; for all stocks the study will end at the end of the year. Thus we are still working with right censoring, but in this case Type I generalized censoring. Once again we will use the price index to find a suitable threshold for an event. Since the index has its highest notation on the first day of trade in the study, the maximum price fall from the highest notation and from the first notation of the year will be the same. This means that we can use a price fall to below 60% of each stock's highest notation as an event as well. This method is the one used by Ni (2009) and will be referred to as individual time to event.

A noteworthy difference between these two methods is that in the first case, 'Day One', all censored observations (except for randomly censored ones) will be censored at the same time point (the maximum possible study time, one year or 261 days of trade). The censoring times for observations that have not experienced the event will differ when we use the individual start, since each observation is observed for a different amount of time.
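To make the two definitions concrete, the sketch below derives both event times from a single stock's daily closing prices. Here prices is a hypothetical pandas Series indexed by trading day (1, ..., 261); the code is an illustration of the rules described above under these assumptions, not the exact procedure used to build the dataset.

import pandas as pd

THRESHOLD = 0.60  # event: price falls below 60% of the reference price

def day_one_event(prices: pd.Series):
    """'Day One' start: the reference price is the close on the first day of trade."""
    below = prices < THRESHOLD * prices.iloc[0]
    if below.any():
        return below.idxmax(), 1                 # (event day, event occurred)
    return prices.index[-1], 0                   # right censored at end of year

def individual_event(prices: pd.Series):
    """Individual start: follow-up begins at the stock's year-high notation."""
    start = prices.idxmax()                      # day of the highest notation
    after = prices.loc[start:]
    below = after < THRESHOLD * prices.max()
    if below.any():
        return below.idxmax() - start, 1         # days from year high to event
    return after.index[-1] - start, 0            # censored, shorter follow-up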

A third method of defining the start of our study could be to determine the start of the financial crisis of 2008. One such definition of the start time could be the Lehman Brothers collapse of 2008, but other starts could be defined as well. Attempting to define the start of the 2008 financial crisis is left out of this study and we will only use the two time to event variables presented above.

In this dataset there are some cases where we cannot follow the stock's price to the end of the study although the event has not occurred at the point where the observation is lost. Thus there are some cases where stocks are lost to random censoring (see Section: (2.1.3)). As explained, these observations can have experienced a number of things, such as mergers, buy-outs or termination of trade, which have resulted in the stock no longer being traded. For both types of time to event data, 'Day One' and individual, there are a total of seven cases where the stocks have been lost to the study. These have been censored at the time point where they were lost.
