
Focusing on Surgical Decompression in Lumbar Spinal Stenosis


Academic year: 2021


Analysis of Prognostic Factors Including Slippage for the Patients’ Self-reported Clinical Outcomes

Focusing on Surgical Decompression in Lumbar Spinal Stenosis

Shaobo JIN

Supervisor: Adam Taube

Master Thesis in Statistics

Department of Statistics, Uppsala University, Sweden

May, 2011


Abstract

A total of 200 patients were retrospectively studied to identify prognostic factors for the clinical outcomes of surgery for lumbar spinal stenosis. Baseline characteristics and outcomes were compared between patients with degenerative spondylolisthesis and patients with spinal stenosis alone. The influence of further slippage was also studied. Logistic regression and several model selection methods were used to identify prognostic factors. 'Age' was found to be a prognostic factor for satisfaction, walking capacity and change in leg pain post-operation. 'PF scores pre-operation' could be used to predict walking distance post-operation. There was no significant difference between patients with degenerative spondylolisthesis and patients with spinal stenosis alone. Further slippage was not important for clinical outcomes.

Keywords: Lumbar spinal stenosis, degenerative spondylolisthesis, further slippage


Acknowledgements

I greatly appreciate my supervisor Prof. Adam Taube for his constructive thoughts and suggestions. Thanks to Dr. Bo Nyström for providing the data and giving fruitful comments about the background of lumbar spinal stenosis. I also thank Birgitta Schillberg, Ulf Moström, Per Lundin and other colleagues for collecting the data.


Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Background
1.2 Previous studies

2 Methodology
2.1 Available data
2.2 Statistical methods
2.2.1 Univariate analysis
2.2.2 Multivariate analysis

3 Descriptive analysis
3.1 Basic population
3.2 Pre-operative factors
3.2.1 Surgery related factors
3.2.2 Post-operative factors
3.2.3 Outcomes

4 Outcome analysis
4.1 Satisfaction
4.1.1 'Univariate' analysis
4.1.2 Multivariate analysis
4.2 Walking distance post-operation
4.2.1 Univariate analysis
4.2.2 Multivariate analysis
4.3 Change in back pain post-operation
4.3.1 Univariate analysis
4.3.2 Multivariate analysis
4.4 Change in leg pain post-operation
4.4.1 Univariate analysis
4.4.2 Multivariate analysis

5 Degenerative spondylolisthesis versus spinal stenosis
5.1 Comparison of baseline characteristics
5.2 Comparison of outcomes

6 Does further slippage matter?
6.1 Comparison of baseline factors
6.2 Comparison of outcomes

7 Alternative methods

8 Further Discussion
8.1 Discussion
8.2 Conclusions

Reference

A Basic information of available data
B 'Univariate' analysis of outcomes
C Walking distance post-operation
D Change in back pain post-operation
E Change in leg pain post-operation


List of Tables

3.1 Slippage before surgery and No. of levels decompressed
3.2 Slippage before surgery and further slippage
4.1 Variables after combination of categories
4.2 'Univariate' analysis of 'Satisfaction'
4.3 Saturated logistic regression of 'satisfaction'
4.4 Profile likelihood confidence intervals
4.5 Stepwise regression and Backward elimination methods
4.6 Results of the model chosen by AIC
4.7 Prognostic factors for other outcomes
5.1 Categorical baseline characteristics of patients with degenerative spondylolisthesis and spinal stenosis
5.2 Continuous baseline characteristics of patients with degenerative spondylolisthesis and spinal stenosis alone
5.3 Comparison of SF-36 scores post-operation
6.1 Categorical baseline characteristics of patients with further slippage and without further slippage
6.2 Continuous baseline characteristics of patients with further slippage and without further slippage
6.3 Comparison of SF-36 scores post-operation
A.1 Frequencies and percentages of categorical data
A.2 Basic information for continuous variables
B.1 P-values of univariate analysis
C.1 Saturated logistic regression model of 'Walking distance post-operation' (profile likelihood confidence intervals)
C.2 Model selection for 'Walking distance post-operation'
D.1 Saturated logistic regression model of 'Change in back pain post-operation' (profile likelihood confidence intervals)
D.2 Model selection for 'Change in back pain'
E.1 Saturated logistic regression model of 'Change in leg pain post-operation' (profile likelihood confidence intervals)
E.2 Model selection for 'Change in leg pain post-operation'

Note on Tables C.1, D.1 and E.1: There is no big difference between the Wald method and the profile likelihood method, so we simply show the results by the profile likelihood method.


List of Figures

1.1 Anatomy of spinal stenosis [14]
1.2 Spinal column [14]
3.1 Histogram of 'Age'
4.1 Empirical distribution of SF-36 scores (PF, BP, GH, MH)
4.2 ROC curve for the saturated model
4.3 Model diagnostics
4.4 ROC curve for the model selected by stepwise regression
4.5 Residuals of the model chosen by stepwise regression
4.6 ROC curve for the model selected by AIC backward elimination
4.7 Residuals of the model chosen by AIC
5.1 Empirical distribution of SF-36 scores (PF, BP, GH, MH)
6.1 Empirical distributions of SF-36 scores (PF, BP, GH and MH)
C.1 ROC curve of the saturated model
C.2 Model diagnostic of the saturated model
D.1 ROC curve of the saturated model
D.2 Model diagnostic of the saturated model

Note on Figures 4.3, 4.5 and 4.7: 1 stands for 'Satisfied' and 0 stands for 'Uncertain or not satisfied'.
Note on Figure C.2: 1 stands for 'Less than 1km' and 0 stands for 'more than 1km'.
Note on Figure D.2: 1 stands for 'Unchanged or worse' and 0 stands for 'Better'.


Chapter 1

Introduction

1.1 Background

Our study is concerned with spinal stenosis, a narrowing of the spinal canal that results in pressure on the spinal cord and/or nerve roots (see Figure 1.1).

Figure 1.1: Anatomy of spinal stenosis [14]

Spinal stenosis can occur anywhere in the spine: the cervical, thoracic or lumbar spine. It is most common in the lower back and the neck, where it is called lumbar spinal stenosis and cervical spinal stenosis respectively. Lumbar spinal stenosis causes pain in the lower back and in the legs when standing or walking.

The causes of spinal stenosis fall into two classes: inherited and acquired. The latter is more common and is usually due to the aging process. Injury to the back may also cause spinal stenosis. The most common cause of spinal stenosis is bone spurs (osteoarthritis): the body produces small bony growths, called bone spurs, where cartilage becomes rough and bones rub against each other. In addition, degenerative spondylolisthesis may result in narrowing of the spinal canal. The word 'spondylolisthesis' comes from the Greek roots spondyl


Figure 1.2: Spinal column [14]


(spine) and olisthesis (slip) [38]. If such forward slipping narrows the spinal canal enough to put pressure on the spinal cord and nerve roots, it causes spinal stenosis by definition.

The Clinic of Spinal Surgery in Strängnäs (CSS) is a specialist clinic for spinal surgery located in Strängnäs, Sweden. Its activities target patients with acute or chronic disorders of the back or neck. At CSS, magnetic resonance imaging (MRI), which is particularly sensitive in detecting such problems, is used to diagnose and evaluate spinal stenosis. Once stenosis is detected, non-surgical treatments and/or surgery can be carried out. Non-surgical treatments include drugs and analgesics, corticosteroid injections, massage, acupuncture and so on. If non-surgical treatment fails or severe symptoms are detected, the stenosis should be treated with surgery.

In the operation, the tissues that press against nerve structures are removed to create more space in the spinal canal. The procedure is called decompression. In most hospitals, decompression of the involved area is achieved by laminectomy, which means removal of the vertebral arch. At CSS, however, a microsurgical technique is used which allows decompression without removing the vertebral arch. It has been shown that, following laminectomy, vertebrae may slip out of the correct position, causing pain in the leg and back. To deal with this issue, a bone graft is needed to create an environment in which bones will fuse together to make the spine stable. However, not everyone needs a spinal fusion, and probably fewer do following microsurgical operations with preservation of the vertebral arch. A lot of research has addressed the question of whether decompression should be combined with fusion in cases of degenerative spinal stenosis, though less so for cases in which there is already a slip before surgery (degenerative spondylolisthesis).

In this paper, we focus on microsurgical decompression without fusion in patients operated on for lumbar spinal stenosis. Our aim is to find which prognostic factors influence the outcome of surgery. We are especially interested in whether a slippage between two vertebrae seen before surgery leads to a clinical result after surgery inferior to that where slippage does not exist, and whether further slippage after surgery is an important factor for a bad clinical result.

1.2 Previous studies

Much work has been done concerning spinal stenosis. In general, both surgical and non-surgical treatments can be used to cure spinal stenosis or at least alleviate the symptoms. Atlas et al. [3] found that patients with severe lumbar spinal stenosis who were treated by surgery had greater improvement than patients treated non-surgically at 1-year evaluation. Atlas et al. [4] found that the relative benefit of surgery declined over time but remained superior to non-surgical treatments. However, few randomized studies comparing surgical and non-surgical treatments are available in the literature. The treatment is usually assigned according to the symptoms of the patient and the judgement of the physician [4]. Despite this, surgery is a treatment accepted worldwide for patients who suffer from spinal stenosis. Most work in the literature has focused on surgical treatment. However, there seems to be no consensus on almost any aspect of surgical treatment.

Most papers used a graded scale to evaluate the results: 'excellent, good, fair, poor' or 'satisfied, uncertain, not satisfied'. There was large variation in the percentage of good outcomes and in satisfaction rates [4, 10, 15, 17, 18, 29, 34, 37]. Even though different authors defined excellent-good or satisfaction in different ways, the outcomes were all based on patients' subjective assessment.

When it comes to prognostic factors, the variation is larger still. Much work has been done to find prognostic factors that predict the outcomes of surgery. Jönsson et al. [17] found that duration of leg pain showed some tendency toward correlation with an inferior result of surgery, which was not significant, and that the preoperative anteroposterior (AP) diameter was significantly correlated with the outcome. Johnsson et al. [15] also found that the largest AP difference was significantly correlated with the outcome, and that gender and further slippage were significant factors. However, no others found that gender influenced the outcome. Lehto et al. [28] only found that females had better results than males, but the difference was not significant. They also found that the number of levels decompressed influenced the outcome, but no other studies have confirmed this result. Katz et al. [18] found that predominance of back pain (which was more severe, back pain or leg pain), sickness impact profile and comorbidity scale could be used to predict whether the result of surgery was excellent, good, fair or poor. In another paper published four years later, Katz et al. [20] found that self-rated health was the most powerful predictor of the outcomes of surgery. They also noted that cardiovascular comorbidity, mental health and income were significant factors.

Besides the satisfaction rate or excellent-good rate, Katz et al. [20] found that walking capacity after surgery was influenced by self-rated health, cardiovascular comorbidity, income, walking capacity before surgery and whether fusion was performed. They also found that self-rated health, cardiovascular comorbidity, income, mental health and whether fusion was performed influenced symptom severity. Jönsson et al. [17] found that comorbid disorders worsened walking capacity significantly, and that a low preoperative AP diameter was significantly correlated with improvement of walking ability.


Both degenerative spondylolisthesis and acquired spinal stenosis cause narrowing of the spinal canal, putting pressure on the nerve roots. It is common to have degenerative spondylolisthesis together with spinal stenosis. Johnsson et al. [16] separated patients with spinal stenosis alone from patients with degenerative spondylolisthesis and spinal stenosis. Niggemeyer et al. [30] excluded patients with degenerative spondylolisthesis. Some authors simply focused on patients with both [9, 13]. Pearson et al. [31] designed a randomized study to analyze whether such slippage mattered. However, not every patient complied: more than 30 percent of patients assigned to the surgery group did not have that intervention, and more than 40 percent of patients assigned to the non-surgical group had surgery instead. They found that degenerative spondylolisthesis patients improved more with surgery than patients with spinal stenosis alone in the BP and PF scores of SF-36. They suggested that patients with spondylolisthesis should not be combined with patients with spinal stenosis alone. Johnsson et al. [15] found that slippage before surgery could not be shown to influence the outcome. Katz et al. [20] used an outcome variable ranging from 0 to 100 to evaluate whether the surgery was good or not, and found that slippage before surgery could not be shown to influence this outcome.

Apart from trying to find prognostic factors, some authors focused on whether fusion should be performed during surgery, which is also controversial. Gelalis et al. [10] reported that patients with concomitant spinal fusion were more satisfied with surgery. However, only 5 patients underwent fusion. Herkowitz and Kurz [13] found that the results were significantly better with respect to relief of pain in the back and leg, as well as the excellent/good/fair/poor outcome, in patients who had a concomitant fusion. However, a meta-analysis showed that decompression alone had the highest rate of good results, followed by decompression and fusion with instrumentation, while decompression and fusion without instrumentation had the worst results [30]. Katz et al. [19] showed that non-instrumented arthrodesis was associated with superior relief of back pain and improvement of walking capacity at 6 months and 2 years. A randomized study showed that successful fusion did not influence the outcome in terms of pain in the back and leg [9].

Overall, uncertainties still remain in treatment assignments, prognostic factors and whether fusion should be performed or not.


Chapter 2

Methodology

2.1 Available data

We retrospectively studied 200 patients operated on from 2000 to 2002. The follow-up ranged from 5.3 to 9 years after surgery. The outcome was first measured by the patients' own opinions: whether they were satisfied with the result, uncertain or not satisfied. Secondly, CSS had a measure of changes in back pain and leg pain: whether the patients reported being much better, somewhat better, etc. Slippage before and after surgery was also recorded. Thirdly, both duration of back pain and duration of leg pain were categorized into 5 disjoint categories, so we cannot use the mean and standard error to characterize the duration of the pain. The case is similar for 'walking distance' before and after surgery, which was categorized into 4 disjoint categories. Finally, we have all SF-36 data before and after surgery. SF-36 stands for Short Form (36) Health Survey, a survey of patient health that is commonly used in medical applications. SF-36 consists of 8 kinds of scores: physical functioning (PF), role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE) and mental health (MH).

2.2 Statistical methods

Statistical methods used were univariate analysis and multivariate analysis.

2.2.1 Univariate analysis

We selected one outcome variable and studied the relationship between the outcome variable and every explanatory variable one at a time using the chi-squared test or the Kruskal-Wallis test, which reduced to the Mann-Whitney test when the variables had


two categories. We used the normal approximation when calculating P-values. The empirical distribution was estimated if the outcome variable was continuous.
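As an illustration of the univariate step, the sketch below implements a two-sided Mann-Whitney test with the normal approximation in plain Python. The function name `mann_whitney_normal`, the tie correction, the absence of a continuity correction and the two small samples are all assumptions made for this example. They are not taken from the SAS or R analyses, and the numbers are not study data.

```python
import math
from collections import Counter


def mann_whitney_normal(x, y):
    """Two-sided Mann-Whitney test using the normal approximation.

    Ties get midranks and the variance is tie-corrected; no continuity
    correction is applied (an assumption of this sketch).
    Returns (U statistic of x, z score, two-sided P-value).
    """
    counts = Counter(list(x) + list(y))
    # Assign each distinct value the average of the ranks it occupies.
    midrank, next_rank = {}, 1
    for value in sorted(counts):
        c = counts[value]
        midrank[value] = next_rank + (c - 1) / 2
        next_rank += c

    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = sum(midrank[v] for v in x)
    u = rank_sum_x - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    tie_term = sum(c ** 3 - c for c in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, z, p_value


# Hypothetical ages for two outcome groups (NOT study data).
satisfied = [58, 61, 63, 66, 70, 72]
not_satisfied = [64, 69, 71, 75, 78]
u, z, p = mann_whitney_normal(satisfied, not_satisfied)
print(f"U = {u:.1f}, z = {z:.3f}, P = {p:.4f}")
```

With two groups, the Kruskal-Wallis statistic is the square of this z score, so both tests give the same two-sided P-value under the normal approximation.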

2.2.2 Multivariate analysis

For every categorical outcome variable, we used logistic regression and stepwise regression to identify significant explanatory variables. The stepwise regression framework comprises 3 different methods, namely forward selection, backward elimination and stepwise regression proper, which combines the former two. However, Hastie et al. [12] argued that F-test based selection methods were out of fashion, and they recommended AIC based backward elimination. AIC based backward elimination is similar to backward elimination, but is based on the AIC instead of the F-test. At each step, we delete one variable and select the model with the smallest AIC, until any further deletion would increase the AIC. In this paper, we used F-test based backward elimination and stepwise regression together with AIC based backward elimination. Analyses were performed with SAS and/or R. Type 3 analysis was used to test whether a variable of interest was statistically significant (Type 3 analysis compares the likelihood of the full model with that of the model without the variable of interest).

Conventionally, the significance level is set to 0.05. Simon and Altman [35] argued that because many comparisons are made in prognostic factor studies, 0.01 is more appropriate than 0.05. We therefore used 0.01 as the significance level, but we also show the results using 0.05 and 0.1.
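The AIC based backward elimination described above can be sketched as follows. This is a minimal illustration, not the SAS or R procedure used in the analyses. The function `backward_eliminate_by_aic` and the stand-in `toy_aic` are hypothetical names, and the deviance gains and parameter counts are invented purely so that the loop has something to run on; a real analysis would replace `toy_aic` with a function that refits the logistic regression.

```python
def backward_eliminate_by_aic(variables, aic_of):
    """Drop one variable at a time for as long as doing so lowers the AIC.

    `aic_of` is any callable that maps a frozenset of variable names to
    the AIC of the model fitted with exactly those variables.
    """
    current = frozenset(variables)
    current_aic = aic_of(current)
    while current:
        # AIC of every model obtained by deleting exactly one variable.
        candidates = [(aic_of(current - {v}), v) for v in sorted(current)]
        best_aic, dropped = min(candidates)
        if best_aic >= current_aic:
            break  # no single deletion lowers the AIC any further
        current, current_aic = current - {dropped}, best_aic
    return current, current_aic


# Hypothetical stand-in for a fitted logistic regression: each variable
# lowers the deviance by a made-up amount and costs 2 AIC units per
# parameter. These numbers are NOT results from the thesis.
DEVIANCE_GAIN = {"Surgeon": 5.0, "Age": 8.0,
                 "Slippage before surgery": 1.0, "Further slippage": 0.5}
PARAMETERS = {"Surgeon": 2, "Age": 1,
              "Slippage before surgery": 2, "Further slippage": 1}
NULL_DEVIANCE = 200.0


def toy_aic(model):
    deviance = NULL_DEVIANCE - sum(DEVIANCE_GAIN[v] for v in model)
    n_params = 1 + sum(PARAMETERS[v] for v in model)  # 1 for the intercept
    return deviance + 2 * n_params


selected, aic = backward_eliminate_by_aic(DEVIANCE_GAIN, toy_aic)
print(sorted(selected), aic)
```

Passing the model-fitting step in as a callable keeps the selection loop independent of the software used to fit each candidate model.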


Chapter 3

Descriptive analysis

In our study, the available variables were 'Surgeon', 'Gender', 'Age', 'No. of levels operated', 'Duration of back pain', 'Duration of leg pain', 'Walking distance pre-operation', 'Walking distance post-operation', 'Change in back pain post-operation', 'Change in leg pain post-operation', 'Satisfaction', 'Slippage before surgery', 'Further slippage' and SF-36 scores pre- and post-operation. Among these variables, 'Surgeon' and 'Gender' were nominal variables, while 'No. of levels operated', 'Duration of back pain', 'Duration of leg pain', 'Walking distance pre-operation', 'Walking distance post-operation', 'Change in back pain post-operation', 'Change in leg pain post-operation', 'Satisfaction', 'Slippage before surgery' and 'Further slippage' were ordinal variables. 'Age' and all the SF-36 scores were also ordinal variables, but each had a large number of categories.

Petrie and Sabin [32] argued that ordinal variables with more than 2 categories could be treated as continuous variables by assuming a linear relationship, which makes full use of the ordering; alternatively, they could be treated as nominal variables, which wastes some information about the ordering but was preferred. 'Age' and all the SF-36 scores apparently had linear relationships, so we treated these variables as continuous. We treated the other ordinal variables as nominal and created dummy variables for the regressions.

3.1 Basic population

The study included 108 females and 92 males. The mean age at surgery was 64.62 years (SD 9.74), ranging from 34 to 86 years. Females were older than males, 65.50 (SD 9.24) vs. 63.59 (SD 10.25), but the difference was not statistically significant (P=0.2551). Figure 3.1 shows that the distribution of 'Age' is not symmetric.


Figure 3.1: Histogram of 'Age'

3.2 Pre-operative factors

'Duration of back pain' was categorized into 5 levels, labeled from 0 to 4, meaning 'No pain in the back', 'Less than 3 months', '3-12 months', '1-2 years' and 'More than 2 years' respectively. We therefore cannot use the mean and standard deviation of the duration. It is worth pointing out that no patient in our study was classified into level 0; that is, all the patients in our study had pain in the back. The frequencies for the other levels were 21, 35, 73 and 71 from level 1 to level 4 respectively. The situation was similar for 'Duration of leg pain', which was categorized into 5 levels, labeled from 0 to 4, meaning 'No pain in the leg', 'Less than 3 months', '3-12 months', '1-2 years' and 'More than 2 years' respectively. Again, no patient was labeled as level 0. The frequencies for the other levels were 12, 40, 82 and 66 from level 1 to level 4 respectively.

'Walking distance pre-operation' was categorized into 4 categories, labeled from 1 to 4 meaning 'Less than 100 meters', '100 to 500 meters', '0.5 to 1km', 'More than 1km' respectively. The frequencies for each level were 70, 66, 36 and 28 from level 1 to level 4.

'Slippage before surgery' was categorized into 3 levels, labeled from 0 to 2 meaning


Table 3.1: Slippage before surgery and No. of levels decompressed

                            No slippage before surgery   Slippage exists before surgery
Single level decompressed   38                           34
Multilevel decompressed     80                           48

'No slippage', 'Slippage 3-6 mm' and 'Slippage more than 6 mm'. The frequencies were 119, 61 and 20 respectively.

Four of the eight kinds of SF-36 scores could be expected to be related to our problem: PF, BP, GH and MH. There was 1 missing value in 'PF score pre-operation', 5 in 'BP score pre-operation', 4 in 'GH score pre-operation' and 1 in 'MH score pre-operation'. The means and standard deviations of the PF, BP, GH and MH scores pre-operation were 36.9749 (SD 20.2789), 30.4769 (SD 15.3072), 66.3214 (SD 19.1119) and 67.7538 (SD 20.9972) respectively.

3.2.1 Surgery related factors

There were 8 different surgeons, and the numbers of operations they performed differed considerably. Surgeon MR carried out 79 operations, while surgeons SN and HS operated only 1 and 2 times respectively.

Anatomically, the lumbar spine has five levels (five vertebrae). Narrowing of the spinal canal can occur at one or several levels. In our study, 36% of patients had a single-level decompression and 4% of patients had four or more levels decompressed. Older patients tended to have multilevel decompression (P=0.0056). Table 3.1 shows that patients without slippage before surgery tended to be decompressed at several levels. However, the chi-square test was not significant (P=0.1796).

3.2.2 Post-operative factors

According to the radiologist, the measuring technique was not very precise, so slippage of less than 3 mm was uncertain. Slippage was therefore recorded from 3 mm upwards: whenever slippage of less than 3 mm was observed, it was marked as no slippage, and for slippage of 3 mm or more, the actual slippage was recorded. Further slippage was found in 62 patients. For simplicity, in the rest of the paper we classify the patients into two groups: 'With further slippage' and 'Without further slippage'. Patients with moderate slippage before surgery had the highest rate of further slippage (37.70%). Of the patients with no slippage before surgery, 30.25% had further slippage, while only 15% of the patients who suffered from severe slippage before surgery had further slippage. The association between slippage before surgery and further slippage was not significant (P=0.1567).


Table 3.2: Slippage before surgery and further slippage

                          Further slippage
Slippage before surgery   0     1
0                         83    36
1                         38    23
2                         17    3
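The chi-square tests behind Tables 3.1 and 3.2 can be checked with a few lines of plain Python. The sketch below assumes a Pearson chi-square test of independence without continuity correction; the function name `pearson_chi_square` is hypothetical, and the closed-form P-values it uses are valid only for 1 or 2 degrees of freedom, which is all these two tables need.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square test of independence for an r x c count table.

    The P-value uses closed forms that hold only for 1 or 2 degrees of
    freedom, which covers the 2 x 2 and 3 x 2 tables used here.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    if dof == 1:
        p_value = math.erfc(math.sqrt(statistic / 2))
    elif dof == 2:
        p_value = math.exp(-statistic / 2)
    else:
        raise ValueError("this sketch only handles 1 or 2 degrees of freedom")
    return statistic, dof, p_value


# Counts copied from Table 3.1 (rows: single level, multilevel).
table_3_1 = [[38, 34], [80, 48]]
# Counts copied from Table 3.2 (rows: slippage before surgery 0, 1, 2).
table_3_2 = [[83, 36], [38, 23], [17, 3]]

for name, table in [("Table 3.1", table_3_1), ("Table 3.2", table_3_2)]:
    stat, dof, p = pearson_chi_square(table)
    print(f"{name}: chi2 = {stat:.4f}, df = {dof}, P = {p:.4f}")
```

Under these assumptions the sketch gives P-values that agree with the reported P=0.1796 for Table 3.1 and P=0.1567 for Table 3.2 to four decimal places.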

3.2.3 Outcomes

'Walking distance post-operation' was categorized into 4 categories, labeled from 1 to 4, meaning 'Less than 100 meters', '100 to 500 meters', '0.5 to 1km' and 'More than 1km'. The frequencies were 34, 35, 20 and 111 from level 1 to level 4. For 20 patients, the walking distance post-operation was shorter than that pre-operation.

Both 'Change in back pain post-operation' and 'Change in leg pain post-operation' were categorized into 7 categories, labeled from 1 to 7, meaning 'No back pain before', 'Completely free', 'Much better', 'Somewhat better', 'Unchanged', 'Somewhat worse' and 'Worse'. The frequencies were 10, 54, 67, 32, 4, 10 and 23 respectively for back pain and 21, 65, 50, 24, 15, 8 and 16 for leg pain. There was one missing value in 'Change in leg pain post-operation'; the corresponding value of 'Change in back pain post-operation' was 3.

The variable 'Satisfaction' described the patients' degree of satisfaction: 1 stood for 'Satisfied', 2 for 'Uncertain' and 3 for 'Not satisfied'. The values were subjectively selected by the patients themselves. The frequencies of the 3 categories were 157, 35 and 7, and there was one missing value.

There were 5 missing values in 'PF score post-operation', 6 in 'GH score post-operation' and 4 in 'MH score post-operation'. The means and standard deviations of PF, BP, GH and MH scores post-operation were 60.1382 (SD 29.9768), 57.7550 (SD 28.6813), 64.4381 (SD 22.3666) and 79.0612 (SD 20.5837) respectively.

We summarize basic information of variables mentioned above in Appendix A.


Chapter 4

Outcome analysis

There were four outcome variables that we were interested in: 'Satisfaction', 'Walking distance post-operation', 'Change in back pain post-operation' and 'Change in leg pain post-operation'. The possible explanatory variables were the same for every outcome variable. We illustrate the procedure in detail using 'Satisfaction' as the dependent variable. For the other outcomes, we simply show the results and leave the details to the Appendices.

4.1 Satisfaction

The variable 'Satisfaction' had only three levels: 'Satisfied', 'Uncertain' and 'Not satisfied'. The values were subjectively selected by the patients themselves to evaluate the outcome of the surgery. Only 7 patients (3.52%) stated that they were not satisfied with the surgery, so I combined the categories 'Uncertain' and 'Not satisfied' into one group. The resulting frequencies were 157 and 42, and the satisfaction rate was 78.89% (Table 4.1).

In the literature, many authors used 'Satisfaction' as the outcome variable to measure the success of surgery. Patients know their overall health condition better than anyone else. So the values patients selected can represent improvement in almost every aspect, which means it is suitable as an outcome variable.

4.1.1 'Univariate' analysis

Some surgeons performed very few operations, so we combined surgeons AA, DR, HM, HS, SN and SW into one category. This left 3 levels: 'MR', 'BN' and 'Other surgeons', with frequencies 79, 62 and 59 respectively (Table 4.1). The satisfaction rates for these three subgroups were 78.48%, 88.52% and 69.49%; the variability was not significant at the 0.01 level (P=0.0380). When it came to


Table 4.1: Variables after combination of categories

Variable                 Category           Frequency   Percentage
Satisfaction             Satisfied          157         78.89%
                         Uncertain or not   42          21.11%
Surgeon                  MR                 79          39.50%
                         BN                 62          31.00%
                         The others         59          29.50%
No. of levels operated   One                72          36.00%
                         Two                86          43.00%
                         Three or more      42          21.00%

'Gender', such rates were similar for males (79.12%) and females (78.70%). There was no significant difference between the two.

Since only 6 patients had four levels decompressed and only 2 patients had five levels decompressed, I combined 'Three levels', 'Four levels' and 'Five levels' into one category, with a frequency of 42. Patients with two levels decompressed had the highest satisfaction rate (81.40%). There was a tendency for patients with a longer duration of back pain to have a lower satisfaction rate, but it was not significant. No such trend existed for the duration of leg pain. There was also a tendency for patients with a longer walking distance before surgery to have a higher satisfaction rate, but it was not statistically significant. Patients with severe slippage before surgery had the highest satisfaction rate (84.21%), while patients with moderate slippage before surgery had the lowest (72.13%). 'Further slippage' is not a prognostic factor, but we are especially interested in whether further slippage influences the outcome of the operation, so we examined this variable as well. The satisfaction rate of patients without further slippage was slightly higher than that of patients with further slippage. However, the influence was not significant. Figure 4.1 shows the empirical distributions of PF, BP, GH and MH scores pre-operation for the 'Satisfied' subgroup and the 'Uncertain or not satisfied' subgroup. There was no significant difference within each kind of score. However, at any given PF score, the empirical distribution function of the 'Uncertain or not satisfied' group was always higher than that of the 'Satisfied' group. The same held for GH scores. This means that patients who were satisfied with the surgery were more likely to have higher PF and GH scores.

In summary, no significant factor was found using 'univariate' analysis.


Table 4.2: 'Univariate' analysis of 'Satisfaction'

Variables                        Categories      Rate of success   P-value
Surgeon                          MR              78.48%            0.0380
                                 BN              88.52%
                                 Other           69.49%
Gender                           Male            79.12%            0.9427
                                 Female          78.70%
No. of levels operated           One             76.06%            0.7157
                                 Two             81.40%
                                 Three or more   78.57%
Duration of back pain            <3 months       85.71%            0.7528
                                 3-12 months     82.86%
                                 1-2 years       77.78%
                                 >2 years        76.06%
Duration of leg pain             <3 months       83.33%            0.7528
                                 3-12 months     74.36%
                                 1-2 years       84.15%
                                 >2 years        74.24%
Walking distance pre-operation   <100m           74.29%            0.5805
                                 100-500m        78.79%
                                 500-1000m       83.33%
                                 >1000m          85.19%
Slippage before surgery          <3mm            81.51%            0.2882
                                 3-6mm           72.13%
                                 >6mm            84.21%
Further slippage                 Yes             78.10%            0.6839
                                 No              80.65%
Age                                                                0.0101
PF score pre-operation                                             0.5182
BP score pre-operation                                             0.7598
GH score pre-operation                                             0.4172
MH score pre-operation                                             0.9842


Figure 4.1: Empirical distribution of SF-36 scores (PF, BP, GH, MH)

4.1.2 Multivariate analysis

We used logistic regression to identify statistically significant explanatory variables, modelling the event 'Uncertain or not satisfied'. First, we included in the logistic regression model all the variables mentioned above whose P-values in the univariate analyses were less than 0.3. The effect 'Further slippage' did not fulfil this requirement, but since it was of special interest, we included it in the model as well. The main effects included in the model were 'Surgeon', 'Age', 'Slippage before surgery' and 'Further slippage'. We did not take any interaction terms into account.

We call this model the saturated model in this paper, although the term usually refers to the model with all the main effects and interactions. We always used the last category as the reference category (Table 4.3). In the saturated model, 1 observation was deleted due to missing values for the dependent or explanatory variables, leaving 42 events in all.

We used Spearman correlation coefficients to examine the correlation between the continuous and ordinal variables. The associations between the nominal variable and the other variables were measured by the chi-square test or the Kruskal-Wallis test. No large, significant correlation was detected. Allison [1] argued that multicollinearity was a property of the explanatory variables, so the VIF (variance inflation factor) could be used to detect multicollinearity. No severe variance inflation was detected except for the dummy variables we created for 'Slippage before surgery', whose VIFs of 3.5832 and 3.6441 exceeded the threshold value of 2.5 proposed by Allison [1]. We therefore combined the categories 'Slippage 3-6mm' and 'Slippage more than 6mm' into a new category, so that the effect 'Slippage before surgery' had only two levels, representing 'No slippage' and 'Slippage exists'. This eliminated the potential multicollinearity.

Therefore, there were 4 main effects with 5 predictors in the model excluding the intercept term, resulting in 8.4 events per parameter in the saturated model. The deviance and the Pearson chi-square goodness-of-fit statistics both indicated that the saturated model was adequate (P=0.4097 and 0.3186). However, Dobson and Barnett [8] argued that the deviance and the Pearson chi-square statistic were not useful measures of fit if continuous variables were included in the model, since the sample size requirement for the asymptotic chi-square distribution was not met. Stokes et al. [36] proposed 3 methods to test the goodness of fit in the presence of continuous variables, two of which were widely used in practice. One was the residual score statistic, which was computed by comparing the chosen model with an expanded model and was chi-square distributed if the model fitted well. Stokes et al. [36] stated that there should be at least 5 events per parameter in order to use the score statistic. The other was the Hosmer and Lemeshow goodness-of-fit test, which was also recommended by Dobson and Barnett [8]. This test showed that the saturated model fitted the data well (P=0.3003). Further, the likelihood ratio test of the global null hypothesis that all the coefficients equaled zero nearly approached significance (P=0.0153).
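The events-per-parameter ratio used above is a simple quotient, and it can be checked in a few lines. The sketch below is only an illustration: the event and parameter counts are the ones reported in this chapter for the four saturated models, while the function name, the dictionary and the constant are hypothetical names introduced here.

```python
# Events per parameter for the saturated models discussed in this chapter.
# Counts are taken from the text; all names below are illustrative only.

MIN_EVENTS_PER_PARAMETER = 5  # minimum stated by Stokes et al. for the score statistic


def events_per_parameter(events: int, parameters: int) -> float:
    """Return the number of events per slope parameter (intercept excluded)."""
    if parameters <= 0:
        raise ValueError("parameters must be a positive integer")
    return events / parameters


# (number of events, number of parameters excluding the intercept)
saturated_models = {
    "Satisfaction": (42, 5),
    "Walking distance post-operation": (84, 15),
    "Change in back pain post-operation": (32, 9),
    "Change in leg pain post-operation": (40, 6),
}

for outcome, (events, parameters) in saturated_models.items():
    ratio = events_per_parameter(events, parameters)
    verdict = "meets" if ratio >= MIN_EVENTS_PER_PARAMETER else "falls below"
    print(f"{outcome}: {ratio:.1f} events per parameter ({verdict} the minimum)")
```

Under these counts only the back pain model falls below the minimum of 5 events per parameter.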

The goodness-of-fit tests mentioned above made it reasonable to present the estimates of the model. We reported the odds ratio point estimates and the 95% Wald confidence intervals in Table 4.3, together with the P-values of the Type 3 analysis. Allison [1] and Stokes et al. [36] both pointed out that profile likelihood confidence intervals provided better results than Wald confidence intervals in small samples. Table 4.4 showed the 95% profile likelihood confidence intervals. There was no big difference between the results in Table 4.3 and Table 4.4. From Table 4.4 we could see that the odds of being uncertain or not satisfied for patients treated by Dr. BN were only about 1/3 of those for patients treated by 'Other' doctors, while the odds for patients treated by Dr. MR were about 3/5 of those for patients treated by 'Other' doctors. Only 'Age' nearly approached significance in the saturated model. If 'Age' increased by one unit, the odds of being uncertain or not satisfied were multiplied by 1.052. Patients without slippage before surgery tended to have a higher satisfaction rate than patients with slippage. Patients without further slippage after surgery tended to have a lower satisfaction rate than patients with further slippage; the corresponding odds ratio was 1.064. The profile likelihood confidence interval for the odds ratio of 'Age' did not cover 1, which was further evidence of an association between 'Age' and 'Satisfaction'. Figure 4.2 showed the ROC curve of the saturated model. The area under the curve was only 0.6895, which was not good enough.

Table 4.3: Saturated logistic regression of 'satisfaction'

Effect                    Categories   Odds ratio       95% Wald confidence limits   Type 3 analysis
                                       point estimate   lower      upper             P-value
Surgeon                   BN/Others    0.317            0.119      0.846             0.0655
                          MR/Others    0.579            0.262      1.281
Age                       –            1.052            1.012      1.094             0.0113
Slippage before surgery   No/Yes       0.756            0.370      1.544             0.4425
Further slippage          No/Yes       1.064            0.486      2.326             0.8774

Table 4.4: Profile likelihood confidence intervals

Effect                    Categories   Odds ratio       95% profile likelihood confidence limits
                                       point estimate   lower      upper
Surgeon                   BN/Others    0.317            0.112      0.818
                          MR/Others    0.579            0.259      1.280
Age                       –            1.052            1.013      1.096
Slippage before surgery   No/Yes       0.756            0.370      1.553
Further slippage          No/Yes       1.064            0.494      2.389
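To make the link between a logistic regression coefficient, its odds ratio and its Wald limits concrete, the following sketch works backwards from the 'Age' row of Table 4.3. It is an illustration under stated assumptions, not the computation carried out in the thesis: the standard error is recovered from the rounded limits in the table, age is assumed to be recorded in years, and the function names are hypothetical.

```python
import math

Z_975 = 1.959964  # 97.5% quantile of the standard normal distribution


def odds_ratio_with_wald_ci(beta, se, z=Z_975):
    """Return the odds ratio exp(beta) and its Wald limits exp(beta -/+ z * se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_wald_limits(lower, upper, z=Z_975):
    """Recover the standard error of beta from reported Wald limits."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# 'Age' row of Table 4.3: odds ratio 1.052 with Wald limits 1.012 and 1.094.
beta_age = math.log(1.052)
se_age = se_from_wald_limits(1.012, 1.094)
or_age, lower_age, upper_age = odds_ratio_with_wald_ci(beta_age, se_age)

# Odds ratio for a 10-unit difference in age. The model is linear on the
# logit scale, so the per-unit odds ratio is raised to the 10th power.
or_ten_units = math.exp(10 * beta_age)

print(f"per unit: {or_age:.3f} ({lower_age:.3f}, {upper_age:.3f})")
print(f"per 10 units: {or_ten_units:.2f}")
```

Reading the odds ratio per 10 units rather than per single unit often gives a more tangible picture of the size of the age effect.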

In the logistic regression model, the Pearson residuals and the deviance residuals served as model diagnostic tools. However, in the presence of continuous variables, such residuals were always uninformative [8, 36]. We still paid some attention to these residuals, but we did not rely on results based on them. Figure 4.3 displayed the Pearson residuals and the deviance residuals. The leverages were the diagonal elements of the estimated hat matrix. If the model fitted the data well, the standardized Pearson residuals and the standardized deviance residuals should follow a standard normal distribution. However, from Figure 4.3 we could conclude that both kinds of residuals had skewed distributions. A test for normality rejected the normality hypothesis with strong evidence (P<0.0001). If we used the saturated model to predict whether patients were satisfied with the surgery, the error rate was up to 22%. Further inspection showed that most of the prediction errors occurred among the patients who were not satisfied or uncertain with the surgery.


Figure 4.2: ROC curve for the saturated model

Figure 4.3: Model diagnostics (1 stands for 'Satisfied' and 0 stands for 'Uncertain or not satisfied')

Table 4.5: Stepwise regression and Backward elimination methods

           Significance level
           Entry=0.01, Stay=0.01   Entry=0.05, Stay=0.05   Entry=0.1, Stay=0.1
Surgeon    –                       –                       0.0609
Age        0.0056                  0.0056                  0.0086

Then we tried to select explanatory variables for the saturated model. Backward elimination and stepwise regression gave the same results. If we set both the entry and the removal significance levels to 0.01 or 0.05, only 'Age' entered the model. If these significance levels were increased to 0.1, 'Surgeon' also entered the model (Table 4.5). For the model selected when both significance levels equaled 0.01, the deviance and Pearson chi-square tests suggested that the model fitted the data well (P=0.4264 and 0.5304). Both the residual score test and the Hosmer and Lemeshow goodness-of-fit test also supported the model (P=0.1691 and 0.8728). The expanded model used in the residual score statistic was the saturated model. The estimated odds ratio for 'Age' was 1.056, which showed that older people tended to have a lower satisfaction rate. The profile likelihood confidence interval did not cover 1. Figure 4.4 showed the ROC curve for the model selected by stepwise regression and backward elimination. The ROC curve was even below the diagonal line in places. The residuals were still not normally distributed. The prediction error was up to 21.5%.

Figure 4.4: ROC curve for the model selected by stepwise regression

Figure 4.5: Residuals of the model chosen by stepwise regression (1 stands for 'Satisfied' and 0 stands for 'Uncertain or not satisfied')

Hastie et al. [12] proposed the use of AIC-based backward elimination for subset selection. This method gave a different result from the stepwise regression and backward elimination, which were based on a series of F tests. The explanatory variables that AIC selected were 'Surgeon' and 'Age', with AIC=198.874. The model was exactly the same as the one selected by stepwise regression and backward elimination when the significance level was set to 0.1. Type 3 analysis showed that only 'Age' was significant (P=0.0086). All the goodness-of-fit statistics suggested that the model fitted the data well. The prediction error rate was 21.5%. Figure 4.6 showed the ROC curve for the model. The ROC curve looked somewhat better than that of the model selected by stepwise regression, and was similar to the curve of the saturated model. The residuals were also not satisfactory.

Table 4.6: Results of the model chosen by AIC

Effect     Categories   Odds ratio       95% profile likelihood confidence limits   Type 3 analysis
                        point estimate   lower      upper                           P-value
Surgeon    BN/Others    0.312            0.111      0.802                           0.0609
           MR/Others    0.584            0.262      1.289
Age        –            1.053            1.014      1.097                           0.0086
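The ROC curves in this section are summarised by the area under the curve (AUC), which equals the probability that a randomly chosen event receives a higher predicted probability than a randomly chosen non-event. The sketch below computes the AUC by direct pairwise comparison. The scores and labels at the bottom are made-up toy values, not data from this study, and the function name is hypothetical.

```python
def area_under_roc(scores, labels):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    `labels` holds 1 for an event and 0 for a non-event; tied scores count
    as half a correctly ranked pair.
    """
    event_scores = [s for s, y in zip(scores, labels) if y == 1]
    non_event_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not event_scores or not non_event_scores:
        raise ValueError("need at least one event and one non-event")
    correctly_ranked = 0.0
    for e in event_scores:
        for n in non_event_scores:
            if e > n:
                correctly_ranked += 1.0
            elif e == n:
                correctly_ranked += 0.5
    return correctly_ranked / (len(event_scores) * len(non_event_scores))


# Toy example (hypothetical predicted probabilities and outcomes).
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_auc = area_under_roc(toy_scores, toy_labels)
print(toy_auc)
```

An AUC of 0.5 corresponds to the diagonal line, so a curve that dips below the diagonal ranks some pairs worse than chance.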

4.2 Walking distance post-operation

Figure 4.6: ROC curve for the model selected by AIC backward elimination

Figure 4.7: Residuals of the model chosen by AIC (1 stands for 'Satisfied' and 0 stands for 'Uncertain or not satisfied')

Improvement of walking capacity is very meaningful to patients who suffer from spinal stenosis, since walking disability is a very important symptom of the disease. Recall that 'Walking distance post-operation' has 4 levels. If we use PROC LOGISTIC in SAS to fit a logistic regression model to such an outcome, the model is actually a proportional odds model (also called the cumulative logit model), which depends on the proportional odds assumption. This assumption means that, for every explanatory variable, the corresponding slope is constant across all categories, so that only the intercept varies between categories.

However, the score test rejected the proportional odds assumption with strong evidence (P=0.0003). It would therefore have been better to fit a generalized logit model, which allows both intercepts and slopes to vary across categories, but our sample was not large enough to fit such a model. So we combined the categories 'Less than 100 meters', '100 to 500 meters' and '0.5 to 1km' into one category. The outcome 'Walking distance post-operation' was then binary, and we could fit a binary logistic regression model. Details can be found in Appendices B and C.
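The proportional odds assumption can be illustrated with a small numerical sketch. Under the cumulative logit model, logit P(Y <= j) = alpha_j + beta * x, with one intercept per cut point and a single shared slope. The intercepts, slope and covariate value below are hypothetical numbers chosen for illustration and are not estimates from this study.

```python
import math


def logistic(value):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-value))


def proportional_odds_probabilities(intercepts, slope, x):
    """Category probabilities when logit P(Y <= j) = intercepts[j] + slope * x.

    `intercepts` must be increasing; K - 1 intercepts give K categories.
    """
    if list(intercepts) != sorted(intercepts):
        raise ValueError("intercepts must be in increasing order")
    cumulative = [logistic(a + slope * x) for a in intercepts] + [1.0]
    probabilities = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probabilities.append(cumulative[j] - cumulative[j - 1])
    return probabilities


def collapse_to_binary(probabilities, cut):
    """Merge the first `cut` categories into one and the rest into another."""
    low = sum(probabilities[:cut])
    return low, 1.0 - low


# Hypothetical values: 4 ordered categories, one continuous covariate.
alphas = (-1.0, 0.0, 1.0)
beta = 0.05
probs = proportional_odds_probabilities(alphas, beta, x=65.0)
low, high = collapse_to_binary(probs, cut=3)
print(probs, low, high)
```

When the assumption holds, merging the first three categories leaves a binary logistic model with the same slope and the last intercept. When it fails, as the score test indicated here, the slope of the binary model depends on where the categories are cut.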

4.2.1 Univariate analysis

The outcome 'Walking distance post-operation' had two levels, 'Less than 1km' and 'More than 1km', with frequencies of 89 and 111 respectively. Univariate analysis showed that 'Walking distance pre-operation', 'PF score pre-operation' and 'BP score pre-operation' were significant at the 0.01 level. 'Gender' nearly approached significance (P=0.0234): males tended to have a longer walking distance after surgery than females. Patients with one or two levels operated also tended to have a longer walking distance after surgery, as did patients with a shorter duration of leg pain, but these two effects were not significant.


Table 4.7: Prognostic factors for other outcomes (P-values in parentheses)

Method                     Walking distance post-operation            Change in back pain post-operation   Change in leg pain post-operation
Univariate analysis        Age (<0.0001)                              Age (0.0011)                         –
                           Walking distance pre-operation (<0.0001)   GH score pre-operation (0.0019)
                           PF score pre-operation (<0.0001)
                           BP score pre-operation (0.0053)
Stepwise regression        Age (0.0003)                               Age (0.0029)                         –
                           PF score pre-operation (<0.0001)
AIC backward elimination   Surgeon (0.0681)                           Age (0.0014)                         Age (0.0160)
                           Age (<0.0001)                              BP score pre-operation (0.1477)
                           Further slippage (0.1039)                  Duration of back pain (0.1054)
                           PF score pre-operation (0.0006)
                           GH score pre-operation (0.1143)

4.2.2 Multivariate analysis

Effects whose P-values in the univariate analysis were less than 0.3 were included in the logistic regression model. These effects were 'Surgeon', 'Gender', 'Age', 'Duration of back pain', 'Walking distance pre-operation', 'Further slippage', 'PF score pre-operation', 'BP score pre-operation', 'GH score pre-operation' and 'MH score pre-operation'. Although 'Slippage before surgery' did not fulfil the requirement, we also added it to the model since it was of special interest. So the saturated model contained 11 main effects. The VIFs suggested that there might be collinearity within the dummy variables for 'Walking distance pre-operation' as well as within those for 'Slippage before surgery'. So we combined the categories '0.5 to 1km' and 'more than 1km' into one category for 'Walking distance pre-operation', and combined 'Slippage 3-6mm' and 'Slippage more than 6mm' into one category for 'Slippage before surgery'. In particular, 'Slippage before surgery' now had two categories: 'With slippage before surgery' and 'Without slippage before surgery'. Then we modelled the event that the walking distance post-operation was less than 1km by fitting the saturated model. 7 observations were deleted due to missing values for the response or explanatory variables, leaving 84 events in all. The saturated model had 11 main effects with 15 parameters excluding the intercept term, so the events/parameters ratio was 5.6.

Both the deviance and Pearson goodness-of-fit statistics indicated that the model was adequate, but the evidence was not very strong (P=0.0575 and 0.0611). The Hosmer and Lemeshow goodness-of-fit test also supported the model (P=0.4846). We also rejected the null hypothesis that all the coefficients were zero with strong evidence (P<0.0001). These acceptable goodness-of-fit results made it meaningful to examine the other results.

Type 3 analysis showed that only 'Age' was significant. The Wald estimates and the profile likelihood estimates for the odds ratios were quite similar. Patients without slippage before surgery had odds 1.2 times those of patients with slippage before surgery, meaning that patients without slippage before surgery tended to have a shorter walking distance after surgery. Patients without further slippage had odds only about 3/5 of those of patients with further slippage, so patients without further slippage tended to have a longer walking distance after surgery. If 'Age' increased by one unit, the odds were multiplied by 1.083, which meant that older people tended to have a shorter walking distance after surgery. If we used the saturated model to predict, the error rate was up to 25.4%.

'Age' and 'PF score pre-operation' were selected by both stepwise regression and backward elimination when the significance level was 0.01. Both the deviance and Pearson goodness-of-fit tests supported the model (P=0.0497 and 0.3091). The Hosmer and Lemeshow goodness-of-fit test also indicated that the model fitted the data well (P=0.8202). We also rejected the hypothesis that all the slope parameters were zero (P<0.0001). Older people tended to have a shorter walking distance, while patients with a larger PF score pre-operation were more likely to have a longer walking distance. The prediction error was 29.6%.

The model selected by AIC backward elimination was quite different. Besides 'Age' and 'PF score pre-operation', 'Surgeon', 'Further slippage' and 'GH score pre-operation' also entered the final model. However, the Type 3 analysis showed that only 'Age' and 'PF score pre-operation' were significant. All the goodness-of-fit tests supported the model. Patients without further slippage had only about 1/2 of the odds of patients with further slippage, indicating that the presence of further slippage might shorten the walking distance. The error rate when we used the model selected by AIC to predict was 28.6%.

4.3 Change in back pain post-operation

Back pain is a common symptom for spinal stenosis patients. In our study, all patients suffered from pain in the back. Surgery was supposed to relieve such pain. Recall that 'Change in back pain post-operation' had 7 levels, so, as in the last section, we could choose to fit a proportional odds model or a generalized logit model. However, the test rejected the proportional odds assumption, and our sample was still not large enough to fit a generalized logit model. So we combined the categories 'Unchanged', 'Somewhat worse' and 'Worse' into a new category representing the case that the surgery changed nothing or even made the pain worse, and combined the other 4 categories into a new one representing the case that the surgery helped at least to some extent. We modelled the event 'Unchanged or worse'. The frequency of this event was 37.

4.3.1 Univariate analysis

Univariate analysis showed that only 'Age' was significant. Patients whose back pain was unchanged or had become worse were older than patients who experienced some relief of the back pain (69.62 vs 63.48). Females tended to have a larger chance of a bad outcome than males. Patients with severe slippage before surgery tended to have better outcomes. But these effects were not significant.

4.3.2 Multivariate analysis

Effects with P-values smaller than 0.3 in the univariate analysis were included in the saturated model. Such effects were 'Age', 'Duration of leg pain', 'PF score pre-operation', 'BP score pre-operation' and 'GH score pre-operation'. We also included 'Slippage before surgery' and 'Further slippage' in the model since they were of special interest, which gave 7 main effects. The VIFs suggested that there might be collinearity among the dummy variables created for 'Slippage before surgery', so we combined some categories as we did in section 4.1.2 and section 4.2.2. The saturated model then had 7 main effects with 9 parameters excluding the intercept term. When estimating, 7 observations were deleted due to missing values, resulting in only 32 events. Thus the events/parameters ratio was only 3.6 in the saturated model.

The deviance and Pearson goodness-of-fit statistics, as well as the Hosmer and Lemeshow goodness-of-fit test, suggested that the saturated model fitted the data well. We also rejected the hypothesis that all the coefficients were zero. From this aspect, the model was adequate. Type 3 analysis showed that, again, only 'Age' was significant. There was no big difference between the profile likelihood confidence intervals and the Wald confidence intervals for the odds ratios. Patients without slippage before surgery had only half the odds of being unchanged or worse compared with patients with slippage before surgery, and patients without further slippage had odds 1.4 times those of patients with further slippage. The prediction error was 16.1%.


Then we used the stepwise regression method to select variables. Only the effect 'Age' entered the model chosen by stepwise regression. The estimated odds ratio was 1.075, meaning that older patients tended to have worse results with respect to back pain. The prediction error was 18.5%. The backward elimination method led to the same results as stepwise regression. However, the AIC backward elimination led to a model which had two additional main effects: 'Duration of leg pain' and 'BP score pre-operation'. Type 3 analysis showed that only 'Age' was significant in this model.

4.4 Change in leg pain post-operation

In our study, all patients suffered from leg pain before surgery. Change in leg pain after surgery is even more important than change in back pain. Recall that 'Change in leg pain post-operation' had 7 levels, so we could choose to fit a proportional odds model or a generalized logit model. However, the proportional odds assumption was rejected, and our sample was still not large enough to fit a generalized logit model. So we did the same as in the last section: we combined the categories 'Unchanged', 'Somewhat worse' and 'Worse' into a new category representing the case that the surgery changed nothing or even made the pain worse, and combined the other 4 categories into a new one representing the case that the surgery helped at least to some extent. We modelled the event 'Unchanged or worse'. The frequency of this event was 40.

4.4.1 Univariate analysis

No significant main effects were detected by univariate analysis. Only 'Age' nearly approached significance (P=0.0195). Patients whose leg pain was unchanged or had become worse were older than patients who experienced some relief of the leg pain (67.97 vs 63.78). Females tended to have a larger chance of a bad outcome than males. Patients with further slippage tended to have better outcomes. But these effects were not significant.

4.4.2 Multivariate analysis

Effects with P-values smaller than 0.3 in the univariate analysis were included in the saturated model. Such effects were 'Age', 'Gender' and 'Duration of leg pain'. We also included 'Slippage before surgery' and 'Further slippage', so that the saturated model had 5 main effects with 8 parameters excluding the intercept term. However, we encountered the problem of quasi-complete separation. Quasi-complete separation meant that there existed some linear combination of the explanatory variables which nearly perfectly predicted the outcomes. In that case the maximum likelihood estimates might not exist. Further inspection showed that it was 'Duration of leg pain' which caused the problem. When the duration was less than 3 months, all the patients enjoyed improvements with respect to leg pain. So we combined the categories 'Less than 3 months' and '3-12 months' into a new one to cope with the quasi-complete separation. For more methods dealing with quasi-complete separation, please see Allison [1]. We also combined the categories of 'Slippage before surgery' to avoid potential multicollinearity. Then there were 6 parameters left, and the corresponding events/parameters ratio was 6.7.
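The kind of separation described above, where every patient in one category has the same outcome, can be spotted with a simple cross-tabulation before fitting the model. The sketch below is a minimal check of that one situation only. The category labels mirror those of 'Duration of leg pain', but the data and all function and variable names are hypothetical.

```python
from collections import defaultdict


def categories_without_variation(categories, outcomes):
    """Return the categories in which the binary outcome never varies.

    `outcomes` holds 1 for the event and 0 otherwise. A category with no
    events (or no non-events) is a warning sign of quasi-complete separation.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [non-events, events]
    for category, outcome in zip(categories, outcomes):
        counts[category][outcome] += 1
    return sorted(c for c, (non_events, events) in counts.items()
                  if non_events == 0 or events == 0)


def merge_categories(categories, to_merge, new_label):
    """Replace every label in `to_merge` by `new_label`."""
    return [new_label if c in to_merge else c for c in categories]


# Hypothetical toy data: nobody in the '<3 months' group has the event.
duration = ["<3 months", "<3 months", "3-12 months", "3-12 months",
            "1-2 years", "1-2 years", ">2 years", ">2 years"]
unchanged_or_worse = [0, 0, 0, 1, 1, 0, 0, 1]

flagged = categories_without_variation(duration, unchanged_or_worse)
merged = merge_categories(duration, {"<3 months", "3-12 months"}, "<12 months")
flagged_after_merge = categories_without_variation(merged, unchanged_or_worse)
print(flagged, flagged_after_merge)
```

Merging the problematic category with its neighbour, as done in the text, removes the empty cell at the cost of a coarser explanatory variable.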

The deviance and Pearson goodness-of-fit tests and the Hosmer and Lemeshow goodness-of-fit test suggested that the saturated model fitted the data well. However, we could not reject the null hypothesis that all the coefficients were zero. When we used stepwise regression to select the optimal subset of explanatory variables, none of them entered the model at the significance level 0.01. The backward elimination method led to the same result as stepwise regression. The AIC backward elimination led to the same model as stepwise regression with the significance level set to 0.05 or 0.1. In this case, 'Age' entered the model and nearly approached significance (P=0.0160). Although the deviance, Pearson and Hosmer and Lemeshow goodness-of-fit tests showed that the model fitted the data well, the hypothesis that all the coefficients were zero could not be rejected. So we could hardly find an effect which could be used as a prognostic factor.


Chapter 5

Degenerative spondylolisthesis Versus spinal stenosis

One of our special interests was to find out whether slippage before surgery matters. We divided all patients into two subgroups: patients with degenerative spondylolisthesis and patients with spinal stenosis alone. Degenerative spondylolisthesis was found in 81 patients.

5.1 Comparison of baseline characteristics

The proportion of females in the degenerative spondylolisthesis group was higher than that in the spinal stenosis group (64.20% vs. 47.06%, P=0.0170). No significant difference could be found between the two groups concerning baseline characteristics (Table 5.1 and Table 5.2). We have already shown in section 3.2.1 that patients with degenerative spondylolisthesis tended to have multilevel decompression, but this difference was not significant either.
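As an illustration of how a P-value for a comparison of two proportions can be obtained, the sketch below applies a Pearson chi-square test to the gender comparison. Two assumptions are made here and are not stated in the text: the counts (52 of 81 women with degenerative spondylolisthesis, 56 of 119 with spinal stenosis alone) are inferred from the reported percentages and group sizes, and the choice of the Pearson test without continuity correction is a guess at the method. The function name is hypothetical.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns the test statistic and its P-value on 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    column_totals = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (
        row_totals[0] * column_totals[0] / n,
        row_totals[0] * column_totals[1] / n,
        row_totals[1] * column_totals[0] / n,
        row_totals[1] * column_totals[1] / n,
    )
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(chi-square > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: degenerative spondylolisthesis, spinal stenosis alone.
# Columns: female, male. Counts inferred from the percentages (assumption).
chi_square, p_gender = pearson_chi_square_2x2(52, 29, 56, 63)
print(f"chi-square = {chi_square:.3f}, P = {p_gender:.4f}")
```

With these inferred counts the result is close to the reported P-value, which suggests, but does not prove, that a test of this kind was used.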

5.2 Comparison of outcomes

We have shown in Chapter 4 that slippage before surgery did not influence clinical outcomes such as 'Satisfaction', 'Walking distance post-operation', 'Change in back pain post-operation' and 'Change in leg pain post-operation'. From Figure 5.1 we can see that patients with degenerative spondylolisthesis and patients with spinal stenosis alone were similar in the empirical distributions of PF, BP, GH and MH scores both before and after surgery. Wilcoxon tests suggested that there was no significant difference between these two groups in any of the scores (Table 5.3).

Thus no difference could be demonstrated in either baseline factors or outcomes.


Table 5.1: Categorical baseline characteristics of patients with degenerative spondylolisthesis and spinal stenosis

Variable                         Category      Degenerative        Spinal     P-value
                                               spondylolisthesis   stenosis
Gender                           Male          35.80%              52.94%     0.0170
                                 Female        64.20%              47.06%
Duration of back pain            <3 months     8.64%               11.76%     0.1708
                                 3-12 months   11.11%              21.85%
                                 1-2 years     39.51%              34.45%
                                 >2 years      40.74%              31.93%
Duration of leg pain             <3 months     8.64%               4.20%      0.1708
                                 3-12 months   13.58%              24.37%
                                 1-2 years     45.68%              37.82%
                                 >2 years      32.10%              33.61%
Walking distance pre-operation   <100m         38.27%              32.77%     0.6258
                                 100-500m      28.40%              36.13%
                                 500-1000m     17.28%              18.49%
                                 >1000m        16.05%              12.61%

Table 5.2: Continuous baseline characteristics of patients with degenerative spondylolisthesis and spinal stenosis alone

Variable                 Degenerative spondylolisthesis   Spinal stenosis         P-value
Age                      65.7901 (SD 8.7916)              63.8235 (SD 10.2945)    0.2413
PF score pre-operation   37.4568 (SD 20.8950)             36.6440 (SD 19.9266)    0.7737
BP score pre-operation   30.1282 (SD 14.4153)             30.7094 (SD 15.9309)    0.8492
GH score pre-operation   64.0086 (SD 19.0284)             67.8291 (SD 19.1015)    0.1861
MH score pre-operation   66.5062 (SD 19.4808)             68.6102 (SD 22.0183)    0.7737


Figure 5.1: Empirical distribution of SF-36 scores (PF, BP, GH, MH)

Table 5.3: Comparison of SF-36 scores post-operation

Variable                  Degenerative spondylolisthesis   Spinal stenosis         P-value
PF score post-operation   56.4476 (SD 30.4267)             62.5985 (SD 29.5475)    0.1686
BP score post-operation   58.1358 (SD 27.0739)             57.4958 (SD 29.8360)    0.8451
GH score post-operation   62.5658 (SD 22.7714)             65.6441 (SD 22.1153)    0.3265
MH score post-operation   78.8831 (SD 20.6945)             79.1765 (SD 20.5985)    0.9959


Chapter 6

Does further slippage matter?

We have shown in Chapter 4 that further slippage would not influence clinical outcomes. We will look deeper at this problem in this chapter. Further slippage was found in 62 patients.

6.1 Comparison of baseline factors

Among the patients without further slippage, the proportion of males was larger than that of females, while among the patients with further slippage the proportion of females was larger than that of males. The difference was not significant. Patients without further slippage tended to be older and to have larger SF-36 scores pre-operation. No significant difference could be found concerning baseline characteristics. Only 'MH score pre-operation' nearly approached significance.

6.2 Comparison of outcomes

We have shown in Chapter 4 that further slippage did not influence clinical outcomes such as 'Satisfaction', 'Walking distance post-operation', 'Change in back pain post-operation' and 'Change in leg pain post-operation'. From Figure 6.1 we could see that patients with further slippage and those without further slippage were similar in the empirical distributions of SF-36 scores after surgery. We could not reject the hypothesis that they came from the same distribution (Table 6.3).

Thus no difference could be demonstrated in either baseline factors or outcomes.


Table 6.1: Categorical baseline characteristics of patients with further slippage and without further slippage

Variable                         Category      Without further   With further   P-value
                                               slippage          slippage
Gender                           Male          51.45%            33.87%         0.0211
                                 Female        48.55%            66.13%
Duration of back pain            <3 months     13.04%            4.84%          0.2927
                                 3-12 months   17.39%            17.74%
                                 1-2 years     34.06%            41.94%
                                 >2 years      35.51%            35.48%
Duration of leg pain             <3 months     7.97%             1.61%          0.2269
                                 3-12 months   18.64%            22.58%
                                 1-2 years     38.41%            46.77%
                                 >2 years      34.78%            29.03%
Walking distance pre-operation   <100m         36.23%            32.26%         0.0757
                                 100-500m      28.26%            43.55%
                                 500-1000m     18.12%            17.74%
                                 >1000m        17.39%            6.45%

Table 6.2: Continuous baseline characteristics of patients with further slippage and without further slippage

Variable                 Without further slippage   With further slippage   P-value
Age                      65.1667 (SD 10.0963)       63.4032 (SD 8.8493)     0.1613
PF score pre-operation   37.8613 (SD 20.4206)       35.0161 (SD 19.9832)    0.4211
BP score pre-operation   31.0889 (SD 15.7074)       29.1000 (SD 14.3983)    0.3520
GH score pre-operation   66.6471 (SD 19.4854)       65.5833 (SD 18.3758)    0.6357
MH score pre-operation   70.0803 (SD 20.2576)       62.6129 (SD 21.8432)    0.0268


Figure 6.1: Empirical distributions of SF-36 scores (PF, BP, GH and MH)

Table 6.3: Comparison of SF-36 scores post-operation

Variable                  Without further slippage   With further slippage   P-value
PF score post-operation   61.0453 (SD 30.0726)       58.1922 (SD 29.9208)    0.4727
BP score post-operation   59.7900 (SD 28.4352)       53.2258 (SD 28.9388)    0.1434
GH score post-operation   64.3116 (SD 22.2035)       64.7208 (SD 22.9132)    0.8680
MH score post-operation   80.7826 (SD 19.2902)       74.9655 (SD 23.0400)    0.1045


Chapter 7

Alternative methods

In this chapter, we introduce some alternative methods to analyze the data. Because we have too few observations, it is not feasible to demonstrate these methods in this paper, but they can be carried out easily on a dataset with a large sample size.

In Chapter 4, we mainly used binary logistic regression, which is widely used in practice to analyze binary data. In this model, we assume that the logit is linear in the explanatory variables. The linearity assumption, which makes the model easy to estimate and interpret, is widely used in applied statistics. However, it is quite a strict assumption, and the linear model may not capture the features of the data well if the true effect is not linear, which is mostly the case in practice. A natural generalization of the linear logistic regression model is the additive logistic model, in which the effects enter additively but need not be linear in the explanatory variables. For a detailed description, please see Hastie et al. [12]. The additive logistic model is defined as

\[
\log\left(\frac{P}{1 - P}\right) = \beta_0 + \sum_{i} f_i(X_i) \tag{7.1}
\]

where each f_i is an unknown function that we would like to estimate. Usually we only estimate f_i(X_i) if X_i is continuous; if X_i is discrete, we simply use dummy variables as in the ordinary linear logistic model. We can use the function 'gam()' in R to estimate the generalized additive model, and the AIC backward elimination method to select the optimal subset of explanatory variables.
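To show how equation (7.1) turns into a predicted probability once the functions f_i are known, the following sketch evaluates the additive predictor and applies the inverse logit. It does not estimate anything: in practice the smooth functions would be fitted from data (for example with 'gam()' in R), whereas the two functions below, the intercept and the variable names are entirely hypothetical.

```python
import math


def additive_logistic_probability(intercept, smooth_terms, x):
    """Evaluate P from log(P / (1 - P)) = intercept + sum_i f_i(x_i).

    `smooth_terms` maps a variable name to its function f_i, and `x` maps
    the same names to the observed values of one individual.
    """
    linear_predictor = intercept + sum(f(x[name]) for name, f in smooth_terms.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical component functions, chosen only to show a non-linear effect.
example_terms = {
    "age": lambda age: 0.002 * (age - 65.0) ** 2,  # U-shaped effect of age
    "pf_score": lambda score: -0.02 * score,       # linear effect of PF score
}

example_probability = additive_logistic_probability(
    intercept=-1.0,
    smooth_terms=example_terms,
    x={"age": 70.0, "pf_score": 40.0},
)
print(example_probability)
```

If every f_i is a straight line, the model reduces to the ordinary linear logistic regression used in Chapter 4.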

When the dependent variable is binary, a natural alternative to the binary logistic regression model is the so-called LDA (linear discriminant analysis). Usually, a linear function is constructed, and the individuals can then be classified using this linear discriminant function. Discriminant analysis is a very important statistical tool in practice. However, the classic discriminant analysis designed by Fisher is mainly for continuous discriminators only. The distance between two groups is in fact measured by the Mahalanobis distance, which is only meaningful for continuous variables [24]. Krzanowski introduced the idea of mixed discriminant analysis, and a distance measure between populations that uses both continuous and categorical variables as discriminators, in a series of papers [22, 23, 24, 25, 26]. Further, Krzanowski [27] and Daudin [7] have introduced methods of variable selection. Within the framework of discriminant analysis, some researchers have used partial least squares discriminant analysis to model categorical dependent variables [5, 33], and variable selection methods are also available [2, 6].
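The classic two-group case with continuous discriminators can be sketched in a few lines of Python. The sketch below computes Fisher's linear discriminant coefficients from the pooled covariance matrix, together with the squared Mahalanobis distance between the two group means. The toy data, the restriction to two variables, and the midpoint cutoff (which assumes equal prior probabilities and equal misclassification costs) are illustrative assumptions, and the function names are hypothetical.

```python
def mean_vec(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    # 2x2 matrix of sums of squares and cross-products around the mean.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fisher_lda(g0, g1):
    """Return (coefficients, cutoff, squared Mahalanobis distance)."""
    m0, m1 = mean_vec(g0), mean_vec(g1)
    s0, s1 = scatter(g0, m0), scatter(g1, m1)
    dof = len(g0) + len(g1) - 2
    pooled = [[(s0[a][b] + s1[a][b]) / dof for b in range(2)] for a in range(2)]
    det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
    inv = [[pooled[1][1] / det, -pooled[0][1] / det],
           [-pooled[1][0] / det, pooled[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Discriminant coefficients: w = S^{-1} (m1 - m0)
    w = [inv[a][0] * diff[0] + inv[a][1] * diff[1] for a in range(2)]
    # Squared Mahalanobis distance: (m1 - m0)' S^{-1} (m1 - m0)
    d2 = diff[0] * w[0] + diff[1] * w[1]
    mid = [(m0[j] + m1[j]) / 2.0 for j in range(2)]
    cutoff = w[0] * mid[0] + w[1] * mid[1]
    return w, cutoff, d2


def classify(point, w, cutoff):
    # Assign to group 1 when the discriminant score exceeds the midpoint.
    return 1 if w[0] * point[0] + w[1] * point[1] > cutoff else 0


# Toy data (made up for illustration): two continuous measurements per case.
group0 = [[1.0, 2.0], [2.0, 1.0], [2.0, 3.0], [3.0, 2.0]]
group1 = [[5.0, 6.0], [6.0, 5.0], [6.0, 7.0], [7.0, 6.0]]

w, cutoff, d2 = fisher_lda(group0, group1)
print("coefficients:", w)
print("squared Mahalanobis distance:", d2)
print("class of (3, 3):", classify([3.0, 3.0], w, cutoff))
print("class of (5, 6):", classify([5.0, 6.0], w, cutoff))
```

Because both the covariance matrix and the distance are built from means and variances, the construction has no direct counterpart for categorical discriminators, which is the gap the mixed discriminant analysis cited above addresses.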

In this paper, we used the traditional stepwise regression and backward regression methods, which are based on a series of F tests, to select the optimal subset of explanatory variables. We also used the AIC backward elimination method recommended by Hastie et al. [12]. These selection methods are discrete processes, which often exhibit high variance and do not reduce the prediction error of the full model; shrinkage methods such as ridge regression and the LASSO are more 'continuous' and do not suffer as much from high variability [12]. The L1 penalty used in the LASSO can be applied to any linear regression model, including logistic regression, so the LASSO may be an alternative method for selecting subsets of explanatory variables.
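To show how the L1 penalty performs selection and shrinkage at the same time, the following Python sketch implements the LASSO for a linear model by cyclic coordinate descent with soft-thresholding. It is a minimal sketch under stated assumptions: squared-error loss is used instead of the logistic likelihood to keep the update in closed form, the variables are assumed to be centred so that no intercept is needed, and the toy data and the names `soft_threshold` and `lasso_cd` are hypothetical.

```python
def soft_threshold(rho, lam):
    # Shrink rho towards zero by lam; values within [-lam, lam] become 0.
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_cd(X, y, lam, sweeps=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0
            z = 0.0
            for i in range(n):
                # Partial fit that leaves out variable j.
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Toy centred data (made up): y = 2 * x1 + 0.1 * x2, with orthogonal columns.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]

beta_ols = lasso_cd(X, y, lam=0.0)    # no penalty: least-squares solution
beta_lasso = lasso_cd(X, y, lam=0.5)  # penalty shrinks x1 and removes x2
print("lam = 0.0:", beta_ols)
print("lam = 0.5:", beta_lasso)
```

With the penalty switched on, the weak second variable is dropped entirely while the strong first variable is only shrunk. The coefficients change continuously as the penalty changes, in contrast to the keep-or-drop decisions of stepwise selection.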


Chapter 8

Further Discussion

8.1 Discussion

Among elderly people, spinal stenosis is a common condition that causes back and leg pain. As the population ages, such conditions deserve more attention. A great deal of research has been done on spinal stenosis.

In our study, we used a three-level variable 'satisfaction' to measure whether patients were satisfied, uncertain or not satisfied with surgery. 78.90% of patients stated that they were satisfied with the operation. However, there is large variation in the excellent or good rates and satisfaction rates reported in the literature. Katz et al. [18] reported a satisfaction rate of up to 84%. Sanderson and Wood [34] reported an excellent or good rate of up to 81%. Gelalis et al. [10] reported an excellent or good rate of 72%. In the Maine lumbar spine study [4], 63% of patients were satisfied with the surgery. Lehto and Honkanen [28] reported that only 57% of patients had excellent or good outcomes. Johnsson et al. [15] showed that 59% of patients assessed the results of the operation as excellent or good. Jönsson et al. [17] found that this rate was 63% at follow-up 4 months after surgery, increased to 67% at the 2-year follow-up, and then decreased to 52% at the 5-year follow-up. Turner et al. [37] pointed out that most studies in the literature had excellent or good rates lower than 80%. They also noticed that studies with shorter follow-ups reported better outcomes, and that studies with higher proportions of patients with degenerative spondylolisthesis had significantly better outcomes.

In the current study, there was a tendency for patients with a longer walking distance before surgery to have a higher satisfaction rate, and for patients with a longer duration of back pain to have a lower satisfaction rate. However, neither tendency was significant. Potential prognostic factors such as gender and the number of levels operated on were also non-significant. Johnsson et al. [15] found that gender was significant. However,
