Customer Satisfaction Analysis


Master Thesis in Statistics, Data Analysis and Knowledge Discovery


Abstract

The objective of this master thesis is to identify “key drivers” embedded in customer satisfaction data. The data was collected by a large corporation in the transportation sector over five years in four countries. The questionnaire contained several sections of questions, ranging from demographic information to satisfaction attributes concerning the vehicle, the dealer and several problem areas. Various regression, correlation and cooperative game theory approaches were used to identify the key satisfiers and dissatisfiers. The theoretical and practical advantages of the Shapley value, Canonical Correlation Analysis and Hierarchical Logistic Regression have been demonstrated and applied to market research.


Acknowledgements

This work would not have been completed without the support of many individuals, and I would like to thank everyone who has helped me along the way. In particular: Prof. Anders Nordgaard and Malte Isacsson, for providing guidance, encouragement and support over the course of my master’s research; Prof. Anders Grimvall, for serving on my thesis committee and for valuable suggestions; and Volvo Car Corporation, for providing the data. Lastly, thanks to everyone else without whose support none of this would have been possible.


Index of tables

Table 1: Datasets Summary

Table 2: Frequencies and proportions of the satisfaction attributes (Country A, Year 2006)

Table 3: Problem areas occurrence (Country A, Year 2006)

Table 4: Dealer Satisfiers, Country A, Years 2006 and 2007 respectively

Table 5: Dealer Satisfiers, Country A, Years 2008 and 2009 respectively

Table 6: Dealer Satisfiers, Country A, Year 2010

Table 7: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively

Table 8: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively

Table 9: Vehicle Satisfiers, Country A, Year 2010

Table 10: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively (respondents with no problems)

Table 11: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively (respondents with no problems)

Table 12: Vehicle Satisfiers, Country A, Year 2010 (respondents with no problems)

Table 13: Dissatisfiers, Country A, Years 2006 and 2007 respectively

Table 14: Dissatisfiers, Country A, Years 2008 and 2009 respectively

Table 15: Dissatisfiers, Country A, Year 2010

Table 16: Ven problem area sub-categories, Country A, Year 2006

Table 17: Building the GLIMMIX procedure

Table 18: Country A, Year 2006

Table 19

Table 20: Solution for fixed effects

Table 21

Table 22


Index of figures

Figure 1: Frequency distribution of variable V191 in Country A, Year 2006

Figure 2: Proportions of problem areas

Figure 3: Kano Model Attributes

Figure 4: Two-level hierarchical regression

Figure 5: Satisfaction Attribute V90, Country A, Year 2006

Figure 6: Noise-Reach table, Country A, Year 2006

Figure 7: Time Series Analysis, Country A

Figure 8: Trend in V1, Country A

Figure 9: Time Series Analysis, Country A (respondents with no problems)

Figure 10: Trend in V1, Country A (respondents with no problems)

Figure 11: Trend in V8, Country A, all respondents vs. only those with no problems

Figure 14: Time Series Analysis, problem areas, Country A

Index of equations

Equation 1: Regression Model

Equation 2: R-squared

Equation 3: Nash Equilibrium

Equation 4: Marginal contribution of a player in a game

Equation 5: Potential Function

Equation 6: Differences

Equation 7: Payoff

Equation 8: Shapley Value

Equation 9: Regression model

Equation 10: Variance

Equation 11: Relative contributions

Equation 12: Marginal Effect

Equation 13: Shapley Value R-squared decomposition

Equation 14: Fields R-squared decomposition

Equation 15: Success

Equation 16: Ordinary logistic regression model

Equation 17: Random effects

Equation 18: Fixed effects


1 Introduction

Predictive analytics relies on statistical techniques from fields such as data mining, modeling and game theory. The main reason for using these techniques is to extract information from large and complex datasets and use it to forecast future trends. In business, predictive models search for patterns and hidden relationships in historical or transactional data to guide decision making and to identify risks and opportunities.

Several data mining techniques have been developed over the years and have shown a positive impact across a wide range of business fields. The best-known applications are in finance (e.g. credit scoring), marketing and fraud detection. Each of these fields offers many opportunities to exploit the analysis of large datasets profitably.

In marketing, the main topics covered by predictive analytics include CRM (Customer Relationship Management), cross-selling, customer retention and direct marketing. Achieving these goals generally depends on conducting an appropriate customer analysis.

A large portion of the data required for customer analytics is, more often than not, acquired through customer satisfaction surveys. Customer satisfaction is a well-known term in marketing; it indicates how well the products or services provided by the supplier meet customer expectations. In a highly competitive market, companies may use such information to differentiate or improve their products and/or services in order to increase their (market) share of customers and customer loyalty. Such data is among the most frequently collected indicators of market perceptions.

The purpose of this master thesis is to develop an appropriate customer satisfaction analysis procedure that provides an indicator of customer behavior in a highly competitive automotive market. The aim of the analysis is thus to find hidden relationships, patterns and trends in the datasets provided.


including data sources, raw data and secondary data. The third part covers the methods used and model building, while the last part focuses on the results and their discussion. The research concludes with a critical assessment of the results obtained and the adequacy of the methods used.

1.1 Background

The research is based on a customer satisfaction survey of new car owners (i.e. owners of cars that have been in service for three months). The survey was conducted in several markets and covers customer characteristics, customer satisfaction and related issues. As cars are consumer products, automotive businesses are driven by customer satisfaction; hence better consumer insight and information gained from customer data are constantly sought. It is essential to note that satisfaction is a very abstract concept, and the actual state of satisfaction varies between individuals and between products or services. It depends on several psychological and physical variables. The additional options or alternative products and services available to customers in a particular industry can also be seen as a source of variability in the satisfaction level. The most valuable satisfaction-related behaviors to investigate are loyalty and the recommendation rate.

The main purpose of customer satisfaction analysis is often to understand the impact of the explanatory variables on an overall dependent variable. This requires establishing a list of priority items that can be improved, where an improvement in any of these items has a positive impact on overall satisfaction or on customer loyalty and retention (Tang & Weiner, 2005). When choosing an appropriate statistical technique, it is necessary to be clear whether the purpose of the analysis is exploratory or predictive.

Some of the most common customer satisfaction techniques include ordinary least squares, Shapley value regression, penalty & reward analysis, Kruskal’s relative importance, partial least squares and logistic regression. Since customer satisfaction studies are usually tracking studies, the results can be monitored over time, which allows trends to be detected. Moreover, one of the challenges when choosing the methodology and building the model is to ensure that the results are consistent when the market is tracked over time (Tang & Weiner, 2005).


1.2 Objective

The main objective of the master thesis is to find statistical techniques that are well suited to customer satisfaction analysis. Furthermore, they should provide tools for identifying “key drivers”, patterns and relationships among several sets of (dependent and independent) variables, as well as measures of relative importance.

The more specific objectives of the thesis are: finding an exact measure of the contributions of the explanatory variables to the dependent variable; identifying the greatest satisfiers and dissatisfiers influencing customer satisfaction with the dealer and with the vehicle; exploring the nature of the satisfaction attributes and evaluating whether a “clean” measure of experienced problems can be established, so that problems can be classified as “fixable” or as a matter of customers’ personal preferences that cannot be repaired (i.e. the “annoying concept”); and, finally, examining the relationships between two sets of variables (i.e. satisfaction-related problems and satisfaction attributes).

The thesis aims to be consistent with the most commonly used customer satisfaction analysis techniques and available literature. What can and cannot be modeled and predicted needs to be clearly stated at all points.

2 Data

The data used in this research was collected through a customer satisfaction survey of new car owners (i.e. customers who had purchased a new car within the previous three months). The questionnaire was divided into twelve sections, ranging from personal and demographic questions to questions directly connected to satisfaction with the new car, previous cars and views on the automotive industry. Customer satisfaction data is often used as a key performance indicator within businesses and is often incorporated in balanced scorecards.

An important basic requirement for effective research on customer satisfaction is an appropriate questionnaire that provides reliable and representative measures. The general guideline is to build questions around whether the product or service has met or exceeded expectations. Expectations, and consequently customer perceptions, are therefore the key factor.

Questions are based on individual-level perceptions but are usually reported at an aggregate level. According to Batra and Ahtola (Batra & Ahtola, 1990), customers purchase products and services based on two types of benefits: hedonic and utilitarian. The former is connected to experiential attributes and the latter to the functional attributes of the product.

The survey used in this research involved the most common measures of customer satisfaction: sets of statements rated on Likert scales (Likert, 1932).

2.1 Raw data

The data provided was collected in four countries (A, B, C and D) over five years, from 2006 to 2010, except in country C, where the survey was conducted every second year (i.e. 2001, 2003, 2005, 2007 and 2009).

Table 1: Datasets Summary

Country/Year   Number of variables   Recorded responses
A 2006         394                   41474
A 2007         411                   41657
A 2008         387                   42783
A 2009         382                   40531
A 2010         346                   39879
B 2006         390                   46690
B 2007         400                   46148
B 2008         387                   49918
B 2009         382                   48833
B 2010         385                   46987
C 2001         301                   10830
C 2003         362                   9738
C 2005         371                   10912
C 2007         398                   10592
C 2009         365                   12341
D 2006         403                   13667
D 2007         410                   16509
D 2008         385                   18968
D 2009         382                   20875
D 2010         426                   23664
TOTAL          /                     592996


In total there were 592996 responses and 229218251 data points. The number of variables in each survey ranged from 301 to 426.
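The totals quoted above can be re-derived from Table 1, assuming that a data point means one variable value for one response. A minimal sketch, with the rows copied from Table 1:

```python
# (country, year, number of variables, recorded responses), copied from Table 1
datasets = [
    ("A", 2006, 394, 41474), ("A", 2007, 411, 41657), ("A", 2008, 387, 42783),
    ("A", 2009, 382, 40531), ("A", 2010, 346, 39879),
    ("B", 2006, 390, 46690), ("B", 2007, 400, 46148), ("B", 2008, 387, 49918),
    ("B", 2009, 382, 48833), ("B", 2010, 385, 46987),
    ("C", 2001, 301, 10830), ("C", 2003, 362, 9738), ("C", 2005, 371, 10912),
    ("C", 2007, 398, 10592), ("C", 2009, 365, 12341),
    ("D", 2006, 403, 13667), ("D", 2007, 410, 16509), ("D", 2008, 385, 18968),
    ("D", 2009, 382, 20875), ("D", 2010, 426, 23664),
]

total_responses = sum(n for _, _, _, n in datasets)
# Assumption: one data point per variable per response
data_points = sum(p * n for _, _, p, n in datasets)
n_vars = [p for _, _, p, _ in datasets]

print(total_responses, data_points, min(n_vars), max(n_vars))  # 592996 229218251 301 426
```

Under this assumption the product-sum reproduces the 229218251 data points exactly.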

The methodology was developed on the survey¹ conducted in country A in 2006, which included 41474 responses and 394 variables. The trend analysis used the datasets from all five years.

The survey in question yielded 31351 valid responses, representing 76% of all customers who participated. The core part of the analysis used 34 satisfaction attributes and 14 problem areas, each problem area generally consisting of 20 sub-categories.

The satisfaction attributes were evaluated on a 1 to 10 scale, where 1 represented the worst and 10 the best possible outcome. Problem areas on the other hand allowed for several nominal values.

2.2 Secondary data

Since the survey comprised several questions that allowed more than one answer (e.g. problem areas), the first step of the analysis was to transform these into binary form, using dummy variables. However, various variables in the research posed a bigger challenge and required further investigation to decide whether they should be treated as being ordinal or interval.
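As a minimal sketch of the dummy coding described above, assume that each multi-answer question is stored as a set of ticked categories per respondent (a hypothetical layout; the area codes are borrowed from Table 3 purely for illustration). Each category then becomes one 0/1 column:

```python
# Hypothetical multi-answer responses: each respondent may tick several areas
responses = [
    {"Vel", "Vb"},
    set(),                 # no problems reported
    {"Vp"},
    {"Vel", "Vi", "Vp"},
]

def to_dummies(responses, categories):
    """Return one 0/1 row per respondent, one column per category."""
    return [[1 if c in r else 0 for c in categories] for r in responses]

areas = ["Vp", "Vb", "Vi", "Vel"]
dummies = to_dummies(responses, areas)
for row in dummies:
    print(row)
```

Fixing the column order in `areas` keeps the dummy columns comparable across survey years.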

Variables rated on a “never, occasionally, sometimes, always” scale raise a problem regarding the relative placement of the two middle categories; Knapp (Knapp, 1990) therefore argues that such a scale is less than ordinal. The controversy arises from key terms such as “appropriateness” and “meaningfulness”. Conservative views (Siegel, 1956) are based on the assumption that once the ordinal level has been adopted, inferences are restricted to population medians and non-parametric procedures must be used, which lowers the power of the statistics. Labovitz (Labovitz, 1967, pp. 151-160), on the other hand, argues that there are no real restrictions on using parametric procedures for ordinal scales, since the assumptions underlying the t and F distributions do not involve the type of scale; this in turn allows statistics of higher power.


The number of categories in the scale is important too. The remaining variables varied in scale level; the two types of scales occurring were a 1 to 4 scale and a 1 to 10 scale, where the latter comes closer to a continuous measure than the former. Moreover, several studies (Hausknecht, 1990) on measurement scales in customer satisfaction analysis have attempted to show that an ordinal scale with many categories can validly be treated as interval.

2.3 Assessment of data quality

The quality and nature of the data provided were first assessed with a univariate approach: identifying the distributions, response rates and percentages of missing values. As a last step of this pre-analysis, the most common issues when dealing with customer satisfaction data were pointed out.

2.3.1 Univariate Analysis of the Satisfaction Attributes


Table 2: Frequencies and proportions of the satisfaction attributes (Country A, Year 2006)

Scale              V191     V14      V193     V7       V3
1                  0,41%    0,06%    0,24%    0,07%    0,18%
2                  0,27%    0,04%    0,14%    0,07%    0,16%
3                  0,57%    0,12%    0,22%    0,20%    0,32%
4                  1,17%    0,31%    0,62%    0,63%    0,86%
5                  1,42%    1,01%    1,11%    1,47%    1,72%
6                  5,20%    3,61%    3,51%    5,13%    4,95%
7                  8,71%    8,89%    8,90%    12,11%   11,94%
8                  28,68%   24,65%   25,37%   27,32%   26,21%
9                  29,09%   28,46%   31,62%   28,01%   27,42%
10                 24,48%   32,85%   28,27%   24,99%   26,24%
Total (responses)  94,78%   97,59%   97,53%   97,59%   97,54%
Missing values     5,22%    2,41%    2,47%    2,41%    2,46%

Scale              V6       V8       V17      V23      V1
1                  0,15%    0,11%    0,09%    0,27%    0,48%
2                  0,17%    0,15%    0,05%    0,18%    0,27%
3                  0,31%    0,29%    0,06%    0,33%    0,41%
4                  0,98%    1,00%    0,19%    1,06%    0,75%
5                  2,18%    2,25%    0,82%    2,68%    1,00%
6                  6,51%    7,33%    2,77%    8,44%    2,64%
7                  14,35%   14,31%   8,37%    14,38%   6,75%
8                  27,12%   27,57%   24,16%   25,95%   19,63%
9                  25,39%   26,08%   29,59%   22,60%   28,05%
10                 22,83%   20,90%   33,91%   24,11%   40,03%
Total (responses)  97,36%   96,96%   96,14%   96,99%   94,15%
Missing values     2,64%    3,04%    3,86%    3,01%    5,85%

Scale              V12      V15      V208     V9       V19
1                  0,10%    0,09%    0,27%    0,13%    0,15%
2                  0,06%    0,07%    0,24%    0,08%    0,19%
3                  0,14%    0,16%    0,47%    0,16%    0,34%
4                  0,57%    0,49%    1,41%    0,49%    1,14%
5                  1,38%    1,06%    2,16%    0,92%    1,81%
6                  4,49%    3,65%    5,95%    3,08%    5,31%
7                  11,25%   9,54%    11,64%   8,98%    11,12%
8                  27,18%   25,44%   24,79%   25,16%   25,76%
9                  27,95%   29,00%   25,96%   29,17%   26,68%
10                 26,88%   30,50%   27,11%   31,83%   27,50%
Total (responses)  97,35%   97,52%   94,37%   97,46%   97,41%
Missing values     2,65%    2,48%    5,63%    2,54%    2,59%

Scale              V211     V4       V16      V13      V11
1                  0,52%    0,28%    0,13%    0,07%    0,12%
2                  0,35%    0,19%    0,14%    0,07%    0,10%
3                  0,64%    0,24%    0,28%    0,11%    0,22%
4                  1,61%    0,70%    0,92%    0,45%    0,78%
5                  5,63%    2,56%    2,06%    1,15%    1,72%
6                  10,72%   5,81%    5,86%    3,94%    5,22%
7                  16,59%   12,44%   13,22%   11,09%   13,18%
8                  24,61%   24,94%   27,81%   27,74%   28,08%
9                  19,90%   24,13%   25,38%   27,96%   26,36%
10                 19,44%   28,71%   24,19%   27,43%   24,22%
Total (responses)  90,23%   94,66%   97,29%   97,26%   96,84%
Missing values     9,77%    5,34%    2,71%    2,74%    3,16%

Scale              V20      V26      V10      V22      V2
1                  0,13%    0,10%    0,19%    0,18%    0,14%
2                  0,11%    0,09%    0,16%    0,15%    0,09%
3                  0,22%    0,16%    0,31%    0,28%    0,25%
4                  0,64%    0,77%    0,87%    1,11%    0,73%
5                  1,75%    1,47%    1,37%    1,86%    1,45%
6                  5,34%    5,09%    4,41%    5,67%    4,47%
7                  12,42%   13,01%   10,06%   12,32%   11,42%
8                  27,76%   27,95%   25,13%   27,12%   27,57%
9                  26,63%   25,78%   27,97%   26,06%   27,79%
10                 25,02%   25,57%   29,52%   25,24%   26,09%
Total (responses)  94,84%   97,03%   97,69%   96,91%   97,50%
Missing values     5,16%    2,97%    2,31%    3,09%    2,50%

Scale              V221     V222     V223     V18
1                  0,06%    0,26%    0,19%    0,09%
2                  0,09%    0,27%    0,18%    0,13%
3                  0,15%    0,56%    0,41%    0,23%
4                  0,54%    1,80%    1,19%    0,73%
5                  1,51%    3,21%    2,68%    1,78%
6                  4,45%    8,24%    6,92%    5,37%
7                  10,95%   12,70%   13,33%   12,67%
8                  25,89%   23,91%   26,06%   26,89%
9                  27,23%   23,09%   24,06%   26,05%
10                 29,15%   25,97%   24,97%   26,05%
Total (responses)  97,50%   97,43%   97,16%   97,52%
Missing values     2,50%    2,57%    2,84%    2,48%

Taking into consideration the most favorable rating scores, i.e. scores of at least 7 (“very satisfied”), the tables above show that, depending on the attribute, between 77,5% and 96% of the customers were at least “very satisfied”. The lowest share was associated with V202, at 77,5%, which still represents a majority attitude. The lowest response rate was associated with attribute V211, with a missing rate of 9,8%.
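The “at least very satisfied” shares can be read directly off Table 2 by summing the percentages for scale points 7 to 10. A minimal sketch, using the V17 column as copied from Table 2 (the function name is illustrative):

```python
# Percentages for scale points 1..10 of attribute V17 (Table 2, Country A, 2006)
v17 = [0.09, 0.05, 0.06, 0.19, 0.82, 2.77, 8.37, 24.16, 29.59, 33.91]

def share_at_least(freqs, threshold):
    """Sum the percentages of scale points >= threshold (the scale starts at 1)."""
    return sum(p for point, p in enumerate(freqs, start=1) if point >= threshold)

print(round(share_at_least(v17, 7), 2))  # 96.03
```

For V17 this gives 96,03%, the upper end of the range quoted above.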

2.3.2 Univariate Analysis of Problem Areas

In total, 17950 respondents, i.e. 43,3% of all respondents, experienced at least one problem. Table 3 shows the frequencies of the individual problem areas. The most common problems belong to the “Vel” category, with 10,9% occurrence, and the least common to the “Vex” category.

Table 3: Problem areas occurrence (Country A, Year 2006)

Problem area   Number of experienced problems   % in the total population
Vp             2081                             5,20%
Ve             1759                             4,24%
Vw             738                              1,78%
Vb             3401                             8,20%
Vo             3155                             7,61%
Vi             3635                             8,76%
Vel            4500                             10,85%
Ven            2452                             …
Vcl            1568                             …
Vbr            1480                             …
Vsw            1216                             …
Vs             502                              …
Vex            364                              …
Vot            428                              …

Figure 2 represents the proportion of each problem area.

Figure 2: Proportions of problem areas

A very common challenge when dealing with customer satisfaction data is how to overcome multicollinearity. It can be controlled or avoided with a well-designed questionnaire, but in most cases this is difficult to achieve. The attributes measured in the survey were in general highly correlated with each other. An example of such a problem arises when evaluating the dealer where the car was purchased: the dealer’s ability to solve problems is highly correlated with the dealer’s friendliness.

Another issue relates to the tracking nature of customer satisfaction data. It is challenging to ensure that the results obtained reflect real changes in the market and not just a small number of respondents choosing a different satisfaction level (e.g. 8 instead of 9).

It is important to note that 84,5% of the customers who experienced at least one problem were still at least “very satisfied”, which is a clear majority. Combined with the fact that the survey offers only scarce information on the problem areas, this imbalance restricts the analysis of the latter. Since the objectives of the thesis involve a deep analysis of the experienced problems, more appropriate measures should be provided by further expanding and developing the “things go wrong” section of the questionnaire.

3 Methods

The main methods used in the research are:

Kano modeling: providing a deeper understanding of the customer satisfaction data and of what can be achieved with the available data.

Shapley value: overcoming the problem of multicollinearity, providing better regression results and allowing for trend analysis.

Hierarchical logistic regression: exploring different, hierarchically ranked layers of the data.

Canonical correlation: analyzing relationships between two different sets of variables.

3.1 Kano Modeling

The Kano model was developed by Professor Noriaki Kano (Mikulic & Prebežac, 2011, pp. 44-66) and concerns product development and customer satisfaction. It classifies product attributes into five categories based on customer perceptions: attractive (enhancers), one-dimensional, must-be, indifferent and reverse.

The theory states that the relationship between the performance of a product attribute and the satisfaction level is not necessarily linear; certain attributes can be asymmetrically related to satisfaction levels. These relationships are presented in Figure 3.


Figure 3: Kano Model Attributes

An attractive attribute provides satisfaction when it is fully implemented, but its non-fulfillment does not cause dissatisfaction. Must-be attributes, on the other hand, result in dissatisfaction if they are not fulfilled, but their fulfillment does not increase satisfaction. One-dimensional attributes increase satisfaction when implemented and cause dissatisfaction when not fulfilled. Indifferent attributes do not affect consumer satisfaction in any way, while reverse attributes cause dissatisfaction when fulfilled and satisfaction when not fulfilled (e.g. implementing technology that is difficult to understand and complicated to use or maneuver may cause dissatisfaction).
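The five categories differ only in the direction of the satisfaction effect when an attribute is fulfilled and when it is not. The sketch below encodes that rule, assuming hypothetical effect estimates measured relative to a neutral baseline and an arbitrary tolerance below which an effect counts as zero; the survey used here does not supply such estimates, and the names are illustrative:

```python
def sign(x, tol):
    """Map an effect to -1, 0 or +1, treating |x| <= tol as 'no effect'."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1

# (effect when fulfilled, effect when not fulfilled) -> Kano category
KANO_CATEGORIES = {
    (1, 0): "attractive",        # satisfies when present, no harm when absent
    (0, -1): "must-be",          # no gain when present, dissatisfies when absent
    (1, -1): "one-dimensional",  # satisfies when present, dissatisfies when absent
    (0, 0): "indifferent",       # no effect either way
    (-1, 1): "reverse",          # dissatisfies when present, satisfies when absent
}

def kano_category(effect_fulfilled, effect_not_fulfilled, tol=0.05):
    key = (sign(effect_fulfilled, tol), sign(effect_not_fulfilled, tol))
    return KANO_CATEGORIES.get(key, "unclassified")

print(kano_category(0.8, 0.0))    # attractive
print(kano_category(0.0, -0.6))   # must-be
print(kano_category(-0.4, 0.3))   # reverse
```

Sign patterns outside the five categories are returned as "unclassified" rather than forced into one of them.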

There are several advantages to integrating Kano modeling: the classification of attributes can be used to optimize and improve the products, discover the attractors and develop product differentiation. Moreover, attribute classification provides valuable help in prioritizing requirements and identifying attributes that need attention. Kano modeling may also provide an important measure for separating experienced product problems that can be fixed from those that are a matter of personal preference. The nature of the attractive and must-be attributes would allow attributable and relative risk techniques to be applied.

Attributable risk measures the reduction in dissatisfaction that would be observed if consumers did not experience a particular problem, compared with the actual pattern. Relative risk is the ratio of the probability of dissatisfaction among consumers who experienced a particular problem to the probability of dissatisfaction among consumers who did not experience it. However, the data used in this research did not include any additional indicator of the problem attributes, so classifying them is problematic.
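Both risk measures reduce to simple proportions from a 2×2 table of problem (yes/no) by dissatisfaction (yes/no). A minimal sketch with hypothetical counts, assuming that attributable risk is taken as the overall dissatisfaction rate minus the rate among customers without the problem:

```python
def risk_measures(dis_with, n_with, dis_without, n_without):
    """Relative and attributable risk of dissatisfaction for one problem.

    dis_with / n_with       : dissatisfied / total among customers who had the problem
    dis_without / n_without : dissatisfied / total among customers who did not
    """
    p_with = dis_with / n_with
    p_without = dis_without / n_without
    p_overall = (dis_with + dis_without) / (n_with + n_without)
    relative_risk = p_with / p_without
    # Reduction in the overall dissatisfaction rate if nobody had the problem,
    # assuming everyone would then show the rate of the problem-free group.
    attributable_risk = p_overall - p_without
    return relative_risk, attributable_risk

# Hypothetical counts, not taken from the survey
rr, ar = risk_measures(dis_with=60, n_with=400, dis_without=90, n_without=1600)
print(f"RR = {rr:.2f}, AR = {ar:.4f}")
```

With these counts, customers who had the problem are about 2,7 times as likely to be dissatisfied (RR = 8/3), and removing the problem would lower the overall dissatisfaction rate by about 1,9 percentage points.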

3.2 Shapley Value Regression

Regression models offer a convenient method for summarizing and achieving two very different goals in data analysis. One is prediction and another is inference about interaction between the predictor variables and the outcome variable. Yet, regression models do not prove that such relationships exist, they simply summarize the likely effects if the models are as hypothesized (Lipovetsky & Conklin, 2001).

3.2.1 Assessing Importance in a Regression Model

Consider a simple model:

$$Y = f(X, \beta)$$

Equation 1: Regression Model

where the predictor variables x are mutually uncorrelated, the standardized regression coefficients (beta coefficients, β) are taken as measures of importance. They measure the expected change in Y (the dependent variable), in standard deviations, when x changes by one standard deviation.

A negative β for a particular predictor can present a complication. However, since the magnitude of β reflects the size of the effect and its sign only the direction, β can be represented either by its square or by its absolute value.


In this case, the sum of the squared standardized coefficients equals the overall R² of the model, where R² (the coefficient of multiple determination) is a measure of the overall quality of the fit of the model (Lipovetsky & Conklin, 2001). Hence, each individual squared coefficient can be interpreted as the percentage of variance explained by that individual variable.

$$R^2 = \frac{\sum_i (f_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = \frac{SS_{reg}}{SS_{tot}}$$

Equation 2: R-squared
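To illustrate that, with uncorrelated predictors, the squared standardized coefficients add up to R², here is a small constructed example (toy data, not from the survey) in which the two predictors are centred and exactly orthogonal:

```python
from math import sqrt

# Constructed toy data: mean-zero, mutually orthogonal predictors
x1 = [1, -1, 1, -1]
x2 = [1, 1, -1, -1]
e = [1, -1, -1, 1]            # residual, orthogonal to both predictors
y = [2 * a + b + 0.5 * c for a, b, c in zip(x1, x2, e)]

def mean(v):
    return sum(v) / len(v)

def sd(v):
    m = mean(v)
    return sqrt(sum((t - m) ** 2 for t in v) / len(v))

def slope(x, y):
    """Simple-regression slope of y on x. With orthogonal predictors this
    equals the coefficient of x in the multiple regression."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

b1, b2 = slope(x1, y), slope(x2, y)
intercept = mean(y) - b1 * mean(x1) - b2 * mean(x2)
fitted = [intercept + b1 * a + b2 * b for a, b in zip(x1, x2)]

# R^2 = SS_reg / SS_tot
r2 = sum((f - mean(y)) ** 2 for f in fitted) / sum((t - mean(y)) ** 2 for t in y)

# Standardized (beta) coefficients
betas = [b1 * sd(x1) / sd(y), b2 * sd(x2) / sd(y)]

print(r2, sum(beta ** 2 for beta in betas))  # both approximately 0.9524 (= 20/21)
```

As soon as the predictors are correlated, this identity breaks down, which is the problem addressed next.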

Nevertheless, this situation almost never occurs in real data. Consequently, standardized regression coefficients assessed as described above do not give a good indication of the importance of each individual variable. The greater the correlation between the predictor variables, the less meaningful the estimated coefficients are (e.g. for two variables with a correlation of 1, infinitely many combinations of coefficients fit equally well, each making exactly the same contribution). As a solution to this, I propose a technique used in game theory: the Shapley value.

3.2.2 Potential, Value and Consistency

Shapley value, a solution concept in cooperative game theory, was introduced by Lloyd Shapley in 1953 (Shapley, 1953). It assigns a unique distribution of total surplus generated by the coalition of all players and it produces a unique solution satisfying the general requirements of the Nash equilibrium (i.e. choosing an optimal strategy under uncertainty) (Kuhn & Tucker, 1959). There is always exactly one such allocation procedure.

3.2.2.1 Nash Equilibrium

Nash equilibrium is a solution concept in game theory, named after John Forbes Nash, who introduced it. It involves a game of two or more players, where each player is assumed to know the equilibrium strategies of the other players, $x_{-i}^*$, and makes the best decision they can, given the decisions of the remaining players. Moreover, no player can gain anything by changing their decision if the decisions of the others remain unchanged. The set of strategies chosen under such circumstances and its payoff then constitute the Nash equilibrium.

$$f_i(x_i^*, x_{-i}^*) \ge f_i(x_i, x_{-i}^*) \quad \text{for all } i \text{ and all } x_i \in S_i$$

Equation 3: Nash Equilibrium

where:

(S, f) is a game of n players and $S_i$ is the strategy set of player i;

$S = S_1 \times S_2 \times \dots \times S_n$ is the set of strategy combinations and $f = (f_1(x), \dots, f_n(x))$ is the payoff function for $x \in S$;

$x_i$ is the strategy of player i, while $x_{-i}$ is the strategy combination of all players except player i.

Thus, when each player i chooses strategy $x_i$, the resulting strategy combination is $x = (x_1, \dots, x_n)$ and the payoff for player i equals $f_i(x)$. When no player i can improve their payoff by changing their strategy, each strategy is an equilibrium strategy $x_i^*$, and the strategy combination $x^* \in S$ is the Nash equilibrium.
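The equilibrium condition can be checked mechanically for a two-player game with finitely many pure strategies: a strategy pair is a Nash equilibrium if neither player gains by deviating alone. A minimal sketch; the prisoner's dilemma payoffs are a textbook illustration and are not related to the survey data:

```python
from itertools import product

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) strategy pairs where neither player can gain by
    deviating unilaterally. payoff_a[r][c] / payoff_b[r][c] are the payoffs of
    the row and column player."""
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        row_best = all(payoff_a[r][c] >= payoff_a[r2][c] for r2 in range(n_rows))
        col_best = all(payoff_b[r][c] >= payoff_b[r][c2] for c2 in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria

# Prisoner's dilemma: strategy 0 = cooperate, 1 = defect
A = [[3, 0], [5, 1]]
B = [[3, 5], [0, 1]]
print(pure_nash_equilibria(A, B))  # [(1, 1)]: both players defect
```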

3.2.2.2 Potential

Theorem 1: “There exists a unique real function on games – called the potential – such that the marginal contributions of all players (according to this function) are always efficient. Moreover, the resulting payoff vector is precisely the Shapley value (Econometrica, 1989).”

$$D^i P(N, v) = P(N, v) - P(N \setminus \{i\}, v)$$

Equation 4: Marginal contribution of a player in a game

where:

N is a finite set of players;

v is a characteristic function satisfying $v(\emptyset) = 0$, and $(N \setminus \{i\}, v)$ is the subgame obtained by removing player i.

Thus, a function P(N, v) is called the potential function if, for all games (N, v), it satisfies $P(\emptyset, v) = 0$ and

$$\sum_{i \in N} D^i P(N, v) = v(N)$$

Equation 5: Potential Function

Moreover, these conditions determine the potential function uniquely. According to Hart & Mas-Colell (Hart & Mas-Colell, 1989, pp. 589-614), the marginal contributions according to the potential function always add up exactly to the worth of the grand coalition, v(N). This property is referred to as efficiency.

Furthermore, $D^i P(N, v) = Sh_i(N, v)$, where $Sh_i$ denotes the Shapley value of player i in the game (N, v).
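The potential can be computed recursively: rearranging the condition that the marginal contributions D^i P(N, v) add up to v(N) gives |N| P(N, v) = v(N) + Σ_i P(N \ {i}, v), starting from P(∅, v) = 0. A minimal sketch for a two-player game whose worths are hypothetical R² values:

```python
from functools import lru_cache

# Characteristic function v(S) of a toy 2-player game (hypothetical R^2 values)
v = {
    frozenset(): 0.0,
    frozenset({1}): 0.36,
    frozenset({2}): 0.25,
    frozenset({1, 2}): 0.44,
}

@lru_cache(maxsize=None)
def potential(coalition):
    """Hart & Mas-Colell potential: P(empty) = 0 and
    |S| * P(S) = v(S) + sum over i in S of P(S minus {i})."""
    if not coalition:
        return 0.0
    return (v[coalition] + sum(potential(coalition - {i}) for i in coalition)) / len(coalition)

N = frozenset({1, 2})
shapley = {i: potential(N) - potential(N - {i}) for i in N}   # D^i P(N, v)
print(shapley)                  # approximately {1: 0.275, 2: 0.165}
print(sum(shapley.values()))    # approximately 0.44 = v(N)
```

The marginal contributions 0,275 and 0,165 add up to v(N) = 0,44, illustrating efficiency.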

3.2.2.3 Preservation of differences

Preservation of differences looks at the payoff allocation problem from another angle: what would player i get if player j were not included, and what would player j get if player i were not included? Hart & Mas-Colell (Ibid.) show that one obtains a unique efficient outcome that simultaneously preserves all these differences.

$$d_{ij} = x_i(N \setminus \{j\}) - x_j(N \setminus \{i\})$$

Equation 6: Differences

Thus,

$$x_i(N) - x_j(N) = x_i(N \setminus \{j\}) - x_j(N \setminus \{i\})$$

Equation 7: Payoff

The above equality has been used by Myerson (Myerson, 1980) and it has been proven that any solution that is obtained by a potential function satisfies the condition. Hence, any such solution clearly coincides with the Shapley value.
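This preservation of differences can be checked numerically: computing the Shapley value of a game and of its subgames (here as the average marginal contribution over all orders of entry), x_i(N) − x_j(N) equals x_i(N \ {j}) − x_j(N \ {i}) for every pair of players. A sketch with a hypothetical three-player game:

```python
from itertools import permutations
from math import factorial

# Toy 3-player game (hypothetical worths, v(empty) = 0)
v = {
    frozenset(): 0.0,
    frozenset({1}): 0.20, frozenset({2}): 0.10, frozenset({3}): 0.05,
    frozenset({1, 2}): 0.40, frozenset({1, 3}): 0.30, frozenset({2, 3}): 0.20,
    frozenset({1, 2, 3}): 0.50,
}

def shapley(players):
    """Shapley value of the subgame on `players`: average marginal
    contribution over all orders in which the players can join."""
    players = frozenset(players)
    values = {i: 0.0 for i in players}
    for order in permutations(sorted(players)):
        joined = frozenset()
        for i in order:
            values[i] += v[joined | {i}] - v[joined]
            joined = joined | {i}
    n_orders = factorial(len(players))
    return {i: total / n_orders for i, total in values.items()}

N = frozenset({1, 2, 3})
x_N = shapley(N)
for i in sorted(N):
    for j in sorted(N):
        if i < j:
            lhs = x_N[i] - x_N[j]
            rhs = shapley(N - {j})[i] - shapley(N - {i})[j]
            print(i, j, abs(lhs - rhs) < 1e-12)   # True for every pair
```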

3.2.2.4 Consistency

An important characterization of the value is its internal consistency property.

Theorem 2: “Consider the class of solutions that, for two-person games, divide the surplus equally. Then the Shapley value is the unique consistent solution in this class (Econometrica, 1989).”


In general, the consistency requirement can be described as follows. Let Φ be a function that associates a payoff with every player in every game. For any group of players in a game, the reduced game among them is obtained by giving the remaining players their payoffs according to Φ.

Then Φ is consistent if and only if, when applied to any reduced game, it yields the same payoffs as in the original game (Econometrica, 1989).

3.2.2.5 Value

In regression, the attributes are thought of as players and the total value of the game as the R². The Shapley value of a single attribute j is defined as:

$$SV_j = \sum_{k} \gamma_k \sum_{i} \left[ v(M_{i|j}) - v(M_{i|j(-j)}) \right]$$

Equation 8: Shapley Value

where:

$v(M_{i|j})$ is the R² of a model i containing predictor j;

$v(M_{i|j(-j)})$ is the R² of the same model i without j;

$\gamma_k = \dfrac{k!\,(n-k-1)!}{n!}$ is a weight based on the total number of predictors (n) and the number of predictors in model i other than j (k); the inner sum runs over all models i with k predictors besides j.
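The definition can be turned into a short routine. The sketch below assumes standardized variables, so that the R² of any sub-model follows from the correlations alone (R² = r′R⁻¹r for the predictors in the sub-model), and uses the weights k!(n − k − 1)!/n!, with k counting the predictors in the model other than j. The correlations are hypothetical:

```python
from itertools import combinations
from math import factorial

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def r_squared(subset, rxx, rxy):
    """R^2 of y on the predictors in `subset`, computed from correlations."""
    if not subset:
        return 0.0
    a = [[rxx[i][j] for j in subset] for i in subset]
    b = [rxy[i] for i in subset]
    beta = solve(a, b)                      # standardized coefficients
    return sum(bi * ri for bi, ri in zip(beta, b))

def shapley_r2(rxx, rxy):
    n = len(rxy)
    sv = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for k in range(n):                  # k = predictors in the model besides j
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                with_j = sorted(s + (j,))
                sv[j] += weight * (r_squared(with_j, rxx, rxy) - r_squared(list(s), rxx, rxy))
    return sv

# Hypothetical correlations for two correlated predictors
rxx = [[1.0, 0.4], [0.4, 1.0]]
rxy = [0.6, 0.5]
sv = shapley_r2(rxx, rxy)
print([round(s, 4) for s in sv], round(sum(sv), 4), round(r_squared([0, 1], rxx, rxy), 4))
# [0.2752, 0.1652] 0.4405 0.4405
```

The contributions sum to the full-model R², which is the efficiency property discussed above. The number of sub-models grows as 2^n, so this direct enumeration only suits a moderate number of predictors.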

3.2.3 Shapley-based R2 Decomposition

The Shapley value offers a robust estimate of the relative importance of predictor variables, even when there are high levels of correlation and/or skewness in the data.

The most common approach to R² decomposition in cases of multicollinearity is stepwise regression and its procedures. However, this method is arbitrary in nature and does not always lead to efficient conclusions. Moreover, the significance tests do not always allow the independent variables to be ranked in order of importance (Israeli, 2007, pp. 199-212). An alternative approach, based on the Shapley value, has been proposed by Chantreuil and Trannoy (Chantreuil & Trannoy, 1999). Shorrocks (Shorrocks, 1999) then argues that Shapley value based procedures can be applied in various situations, leading to different results.

While traditional decompositions such as the Fields decomposition (Fields, 2003) can be applied to simple linear regression models and perform well in finding the effects of the explanatory variables, the new approach (i.e. the Shapley value based approach) may also be applied to more complicated regression models. These may include interactions, dummy variables and high multicollinearity between explanatory variables.

3.2.3.1 Decomposing R²

Consider a regression model:

$$y = a + \sum_{j=1}^{J} b_j x_j + e$$

Equation 9: Regression model

where the total sum of squares (in essence the raw variance of y) can be decomposed into the model sum of squares ($SS_{reg}$) and the error sum of squares ($SS_{error}$):

$$SS_{tot} = SS_{reg} + SS_{error}, \qquad \operatorname{Var}(y) = \operatorname{Var}(\hat{y}) + \operatorname{Var}(e)$$

Equation 10: Variance

The R² of the regression is then taken as previously stated:

$$R^2 = \frac{SS_{reg}}{SS_{tot}}$$

Following the theorem of Mood, Graybill and Boes (Mood et al., 1974), the relative contributions may be stated as:

$$\operatorname{Var}(y) = \sum_{j=1}^{J} \operatorname{Cov}(b_j x_j, y) + \operatorname{Cov}(e, y)$$

Equation 11: Relative contributions

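Dividing by Var(y) and separating out the residual term, it follows that:

$$R^2(y) = \sum_{j=1}^{J} \frac{b_j \operatorname{Cov}(x_j, y)}{\operatorname{Var}(y)} = 1 - \frac{\operatorname{Cov}(e, y)}{\operatorname{Var}(y)}$$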


From the above equation, the explanatory variables can be ranked according to their importance, i.e. by their shares $b_j \operatorname{Cov}(x_j, y)/\operatorname{Var}(y)$. However, this fails to account for the likely correlation between the contribution of an individual explanatory variable and that of the remaining variables. In contrast, the Shapley decomposition procedure requires the contribution of a variable to be equal to its marginal effect.
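The identity behind Equation 11 can be checked numerically. The sketch below (Python with NumPy, simulated data, all names hypothetical) fits an OLS model with an intercept, computes the shares b_j Cov(x_j, y)/Var(y) and the residual share Cov(e, y)/Var(y), and confirms that the former add up to R² and that all of them together add up to 1.

```python
import numpy as np

# Hypothetical data with two correlated predictors
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(size=n)

# OLS with intercept; b holds the slope coefficients b_j, e the residuals
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
b = coef[1:]
e = y - A @ coef

var_y = np.var(y, ddof=1)  # same ddof as np.cov below
# Shares b_j * Cov(x_j, y) / Var(y)
shares = np.array([b[j] * np.cov(X[:, j], y)[0, 1] for j in range(X.shape[1])]) / var_y
residual_share = np.cov(e, y)[0, 1] / var_y
r2 = 1.0 - e @ e / np.sum((y - y.mean()) ** 2)

print("shares:", shares, "sum:", shares.sum(), "R^2:", r2)
print("shares + residual share:", shares.sum() + residual_share)
```

In this simulated example x₂ has a negative coefficient while being positively correlated with y, so its share is negative; this is one way in which such a ranking differs from the non-negative Shapley contributions.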

The marginal effect can be expressed as:

$$M_k^S = R^2\Big(y = a + b_k x_k + \sum_{j \in S} b_j x_j + e\Big) - R^2\Big(y = a^* + \sum_{j \in S} b_j^* x_j + e^*\Big)$$

Equation 12: Marginal Effect

where S is a subset of the explanatory variables not including variable k, and the starred coefficients are those re-estimated in the model without $x_k$.

As a simple example, consider $y = a + b_1x_1 + b_2x_2 + e$; the difference between the two decompositions can be seen from the following:

Shapley decomposition:

$$C_1 = \tfrac{1}{2}\Big[R^2(y = a + b_1x_1 + b_2x_2 + e) - R^2(y = a^{*} + b_2^{*}x_2 + e^{*}) + R^2(y = a^{**} + b_1^{**}x_1 + e^{**})\Big]$$

$$C_2 = \tfrac{1}{2}\Big[R^2(y = a + b_1x_1 + b_2x_2 + e) - R^2(y = a^{**} + b_1^{**}x_1 + e^{**}) + R^2(y = a^{*} + b_2^{*}x_2 + e^{*})\Big]$$

Equation 13: Shapley Value R-squared decomposition

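Fields decomposition: Fields assigns each variable the share $b_j \operatorname{Cov}(x_j, y)/\operatorname{Var}(y)$, computed from the full model only. Written in the same covariance terms, the contributions above become:

$$C_1 = \frac{(b_1 + b_1^{**})\operatorname{Cov}(x_1, y)}{2\operatorname{Var}(y)} + \frac{(b_2 - b_2^{*})\operatorname{Cov}(x_2, y)}{2\operatorname{Var}(y)}$$

$$C_2 = \frac{(b_2 + b_2^{*})\operatorname{Cov}(x_2, y)}{2\operatorname{Var}(y)} + \frac{(b_1 - b_1^{**})\operatorname{Cov}(x_1, y)}{2\operatorname{Var}(y)}$$

so the Fields share of $x_1$, $b_1\operatorname{Cov}(x_1, y)/\operatorname{Var}(y)$, coincides with $C_1$ only when $b_1^{**} = b_1$ and $b_2^{*} = b_2$, e.g. when the two predictors are uncorrelated.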

Equation 14: Fields R-squared decomposition

Of special interest when comparing the two decompositions are models with high multicollinearity. This issue is particularly problematic for the Fields decomposition, because it uses the estimated coefficients, whose estimated variances are inflated under multicollinearity. Moreover, a small change in the model can result in a large change in the estimated coefficients. In contrast, the Shapley-based decomposition uses the marginal contributions of a variable over all sequences. In a given sequence, the contribution of a variable is low if a variable it is strongly correlated with is already included in the model, and high otherwise. Consequently, two strongly correlated variables end up with similar contributions. Israeli (Israeli, 2007, pp. 199-212) further argues that cases where non-linear effects of a variable are included in the regression model, and models with interacting variables, can be treated in a similar way. The Fields decomposition gives no indication of how the contribution should be divided in such cases, whereas this poses no problem for the Shapley decomposition.
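The contrast between the two decompositions can be illustrated with two nearly collinear predictors. The sketch below (Python with NumPy; the data and the helper name ols are hypothetical) computes the two-variable Shapley contributions C₁ and C₂ from Equation 13, the Fields shares b_j Cov(x_j, y)/Var(y) from the full model, and checks that C₁ also equals [(b₁ + b₁**)Cov(x₁, y) + (b₂ − b₂*)Cov(x₂, y)]/(2Var(y)), where b₁** and b₂* are the slopes from the single-predictor models. Both decompositions sum to the full-model R², but here only the Shapley contributions split it almost evenly, while the Fields shares depend on the unstable coefficients b₁ and b₂.

```python
import numpy as np


def ols(y, *xs):
    """OLS with intercept; returns (slope coefficients, R^2)."""
    A = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[1:], r2


rng = np.random.default_rng(2)
n = 1000
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # x2 is almost a copy of x1
y = x1 + x2 + rng.normal(size=n)

(b1, b2), r2_12 = ols(y, x1, x2)     # full model
(b1_ss,), r2_1 = ols(y, x1)          # x1 alone: b1**
(b2_s,), r2_2 = ols(y, x2)           # x2 alone: b2*

# Shapley contributions (Equation 13); the empty model has R^2 = 0
C1 = 0.5 * ((r2_12 - r2_2) + r2_1)
C2 = 0.5 * ((r2_12 - r2_1) + r2_2)

# Fields shares from the full model only
var_y = np.var(y, ddof=1)
cov1 = np.cov(x1, y)[0, 1]
cov2 = np.cov(x2, y)[0, 1]
F1, F2 = b1 * cov1 / var_y, b2 * cov2 / var_y

# The same Shapley contribution written in covariance terms
C1_cov = ((b1 + b1_ss) * cov1 + (b2 - b2_s) * cov2) / (2 * var_y)

print(f"Shapley: C1={C1:.3f}, C2={C2:.3f}; Fields: F1={F1:.3f}, F2={F2:.3f}")
print(f"b1={b1:.3f}, b2={b2:.3f}, full R^2={r2_12:.3f}")
```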

3.2.4 Choosing “key-drivers”

Up to this point, a method that successfully measures the relative importance of attributes in the model has been established. The following analytical design is proposed to effectively identify the key dissatisfiers (i.e. attributes that need attention).

The notations used include:

P(D) – probability of dissatisfaction

P(F) – probability of failure on any of the independent attributes

P(D|F) – conditional probability of dissatisfaction among those who failed

P(D|F’) – conditional probability of dissatisfaction among those who did not fail

P(F|D) – conditional probability of failure among the dissatisfied (reach value)

P(F|D’) – conditional probability of failure among the non-dissatisfied (noise value)

In general, values on the bottom levels of the ordinal satisfaction scale (below 5) indicate dissatisfaction (D), and an identified problem corresponds to failure (F). The opposite events, non-dissatisfaction and non-failure, are denoted D’ and F’ respectively.

To identify the attributes that need attention, it is necessary to maximize:

$$\text{Success} = \text{Reach} - \text{Noise} = P(F \mid D) - P(F \mid D')$$

Equation 15: Success


This measures the prevalence of failure among dissatisfied respondents in comparison with its prevalence among non-dissatisfied respondents.

Consider a situation where all the attributes are ordered by their Shapley values in descending order and the corresponding reach and noise values are given. According to Conklin and Lipovetsky (Conklin & Lipovetsky, 2004), adding the second-ranked attribute to the model along with the first one increases the reach (i.e. failure on either of the two attributes covers a larger share of the dissatisfied customers). However, the noise increases correspondingly (i.e. it also covers a larger share of the non-dissatisfied customers). Adding more attributes results in the same pattern.

In general, a high reach means that a large part of the dissatisfied customers is taken into account (reach should therefore be maximized), while a high noise means focusing on problems that are not actual causes of dissatisfaction (Conklin & Lipovetsky, 2004).

Once the noise added by including the next attribute in the model outweighs the reach it adds, success begins to decrease. At that point, the final set of key dissatisfiers is defined (Conklin & Lipovetsky, 2004).
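The selection rule described above can be sketched as a short greedy loop. In this Python sketch (the helper select_key_dissatisfiers, the attribute names A, B, C and the toy data are all hypothetical), failure on the chosen set means failure on any of its attributes; reach is P(F|D), noise is P(F|D’), and attributes are added in Shapley order until success no longer increases.

```python
import numpy as np


def select_key_dissatisfiers(failures, dissatisfied, ranked):
    """Greedy reach/noise selection (hypothetical helper).

    failures:     dict attribute -> boolean array, True if the respondent
                  reported a problem (failure F) on that attribute
    dissatisfied: boolean array, True if the respondent is dissatisfied (D)
    ranked:       attribute names in descending order of Shapley value
    """
    chosen, best = [], -np.inf
    any_fail = np.zeros_like(dissatisfied, dtype=bool)
    for attr in ranked:
        candidate = any_fail | failures[attr]    # failure on any chosen attribute
        reach = candidate[dissatisfied].mean()   # P(F | D)
        noise = candidate[~dissatisfied].mean()  # P(F | D')
        success = reach - noise                  # Equation 15
        if success <= best:                      # added noise outweighs added reach
            break
        chosen.append(attr)
        best, any_fail = success, candidate
    return chosen, best


# Toy data: 4 dissatisfied and 6 non-dissatisfied respondents
dissatisfied = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
failures = {
    "A": np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0], dtype=bool),
    "B": np.array([0, 0, 1, 0, 1, 0, 0, 0, 0, 0], dtype=bool),
    "C": np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0], dtype=bool),
}
chosen, success = select_key_dissatisfiers(failures, dissatisfied, ["A", "B", "C"])
print(chosen, round(success, 3))
```

With the toy data, adding B raises reach from 0.50 to 0.75 at the cost of a noise of 1/6, so success rises to about 0.58; adding C would lift reach to 1.0 but noise to 0.5, so the selection stops at A and B.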

3.3 Trend Analysis

Using the Shapley value as the measure of importance allows us to track the market over time: differences between two survey waves can then be attributed to actual changes in the market.

3.3.1 The time consistent Shapley value

The Shapley value is one of the most commonly used sharing mechanisms in static cooperative games with transferable payoffs (Yeung, 2010, pp. 137-149). Moreover, the time-consistency property of the Shapley value means that if the agreement is renegotiated at any intermediate point in time, assuming that cooperation has prevailed from the initial date until that point, the same outcome is obtained (Petrosjan & Zaccour, 2001, pp. 381-398). This property therefore allows us to compare the marginal contribution of each satisfaction attribute over time.


3.4 Hierarchical Logistic Regression Modeling

A hierarchical logistic regression model is proposed to examine data with a group structure and a binary response variable. The group structure is usually characterized by two levels: micro and macro. The structure is presented visually in Figure 4.

Figure 4: Two-level hierarchical regression

The same predictors are used in each context, but the coefficients of the micro predictors are allowed to vary across contexts. At the first (micro) level, an ordinary logistic regression model is applied. At the second (macro) level, the micro coefficients are treated as functions of macro predictors. A Bayes estimation procedure is used to estimate the micro and macro coefficients. The components of the model represent within- and between-macro variance, and an algorithm for finding the maximum likelihood estimates of the covariance components is proposed. Here, car make-models are viewed as macro observations and individual cars as micro observations. Dai, Li and Rocke (Dai et al., NN) propose the following procedure.

3.4.1 Ordinary logistic regression model

Let y be a binary outcome variable (i.e. whether the customer is satisfied or dissatisfied) that follows a Bernoulli distribution, y ~ Bin(1, π), and let x be a car-level predictor. The model can then be written as:

$$y_{ij} = \pi_{ij} + e_{ij}, \qquad \operatorname{logit}(\pi_{ij}) = \log\!\left(\frac{\pi_{ij}}{1 - \pi_{ij}}\right) = \alpha + \beta x_{ij}$$

Equation 16: Ordinary logistic regression model

where:

- i = 1, …, I_j is the car-level index,

- j = 1, …, J is the make-model-level index, and

- π_ij is the probability of dissatisfaction for car i within make-model j, conditional on x.

The model assumes that the micro-level random errors e_ij are independent, with moments E(e_ij) = 0 and Var(e_ij) = σ_e² π_ij(1 − π_ij).
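For the single-level model in Equation 16, maximum likelihood estimates of α and β can be obtained with Newton-Raphson (equivalently, iteratively reweighted least squares). The Python/NumPy sketch below is a minimal illustration on simulated data; the helper fit_logistic, the simulated predictor and the true parameter values are hypothetical, and it does not implement the Bayes procedure of Dai, Li and Rocke. The weights π(1 − π) it uses are the Bernoulli error variance above with σ_e² = 1.

```python
import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Fit logit(pi) = alpha + beta * x by Newton-Raphson (IRLS)."""
    X = np.column_stack([np.ones_like(x), x])
    theta = np.zeros(2)                        # (alpha, beta)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ theta))  # inverse logit
        W = pi * (1.0 - pi)                    # Bernoulli variance pi(1 - pi)
        grad = X.T @ (y - pi)                  # score vector
        hess = X.T @ (X * W[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        theta = theta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return theta


# Hypothetical car-level data: y = 1 means dissatisfied
rng = np.random.default_rng(3)
x = rng.normal(size=2000)
true_alpha, true_beta = -1.0, 0.8
p = 1.0 / (1.0 + np.exp(-(true_alpha + true_beta * x)))
y = (rng.random(2000) < p).astype(float)

alpha_hat, beta_hat = fit_logistic(x, y)
print(f"alpha_hat={alpha_hat:.3f}, beta_hat={beta_hat:.3f}")
```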

3.4.2 Hierarchical logistic regression

Extending the ordinary model to account for effects of the second (macro) level may be done by including design variables (dummy variables). Each second-level unit (i.e. each make-model) has its own intercept in the model. These intercepts are used to measure the differences between make-models.

$$\operatorname{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij}$$

where α_j is the make-model intercept, whose effect can be either fixed or random (Demidenko, 2004). For simplicity, the effects can be treated as random, and the model rewritten as follows:

$$\operatorname{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij}, \qquad \alpha_j = \alpha + u_j$$

Equation 17: Random effects

It is then possible to add second level predictors. The above equation will therefore be extended to:

$$\operatorname{logit}(\pi_{ij}) = \alpha_j + \beta x_{ij}$$

$$\alpha_j = \alpha + \gamma z_j + u_j$$

Equation 18: Fixed effects

where the added term γ is a fixed effect and z_j is the second-level predictor. Using the same predictors, the model can be extended further to investigate possible cross-level interactions. The algorithm can be applied using the SAS procedure PROC GLIMMIX.
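To make the two-level structure concrete, the sketch below simulates data from the random-intercept model with one second-level predictor (Equation 18). It only generates data; estimation, for which the text points to SAS PROC GLIMMIX, is not shown. The number of make-models and cars, the parameter values and the normal distribution of u_j are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

J = 30                  # number of make-models (macro units), hypothetical
cars_per_model = 200    # cars per make-model (micro units), hypothetical
alpha, gamma, beta = -1.0, 0.5, 0.8
sigma_u = 0.6           # sd of the random make-model effect u_j (assumed normal)

z = rng.normal(size=J)                           # second-level predictor z_j
u = rng.normal(scale=sigma_u, size=J)            # random effects u_j
alpha_j = alpha + gamma * z + u                  # Equation 18: make-model intercepts

model = np.repeat(np.arange(J), cars_per_model)  # make-model index j of each car
x = rng.normal(size=model.size)                  # car-level predictor x_ij
eta = alpha_j[model] + beta * x                  # logit(pi_ij)
pi = 1.0 / (1.0 + np.exp(-eta))
y = (rng.random(model.size) < pi).astype(float)  # 1 = dissatisfied

# Observed dissatisfaction rate per make-model
rates = np.bincount(model, weights=y) / np.bincount(model)
print(np.corrcoef(rates, alpha_j)[0, 1])
```

The per-make-model dissatisfaction rates track the simulated intercepts α_j, which is the between-make-model variation the hierarchical model is designed to capture.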


3.5 Canonical Correlation Analysis

Canonical correlation analysis was introduced by Harold Hotelling (Johnson & Wichern, 2001) and is a way of exploring the cross-covariance matrix between two sets of variables.

Consider two sets of variables x₁, …, x_n and y₁, …, y_m, and assume there are correlations among these variables. Canonical correlation analysis then finds the linear combinations of the x’s and y’s that have maximum correlation with each other.

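3.5.1 Formulation

Given the vectors X = (x₁, …, x_n) and Y = (y₁, …, y_m), let $\Sigma_{XX} = \operatorname{cov}(X, X)$, $\Sigma_{YY} = \operatorname{cov}(Y, Y)$ and $\Sigma_{XY} = \operatorname{cov}(X, Y)$.

The quantity to maximize over the weight vectors a and b is:

$$\rho = \frac{a' \Sigma_{XY} b}{\sqrt{a' \Sigma_{XX} a}\,\sqrt{b' \Sigma_{YY} b}}$$

Equation 19: CCA parameter

The canonical variables are then defined by U = a’X and V = b’Y.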
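Equation 19 can be maximized by whitening both sets of variables: the canonical correlations are the singular values of Σ_XX^(−1/2) Σ_XY Σ_YY^(−1/2), and the weight vectors follow from the corresponding singular vectors. The Python/NumPy sketch below computes the first canonical pair on simulated data in which one X variable and one Y variable share a latent factor; the data and the helper names inv_sqrt and first_canonical_pair are hypothetical.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def first_canonical_pair(X, Y):
    """Largest canonical correlation and the weight vectors a, b."""
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    p = X.shape[1]
    Sxx, Syy, Sxy = S[:p, :p], S[p:, p:], S[:p, p:]
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of Sxx^(-1/2) Sxy Syy^(-1/2) are the canonical correlations
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    a = Kx @ U[:, 0]
    b = Ky @ Vt[0]
    return s[0], a, b


rng = np.random.default_rng(5)
n = 1000
latent = rng.normal(size=n)
X = np.column_stack([latent + rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([latent + rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)])

rho, a, b = first_canonical_pair(X, Y)
U_var, V_var = X @ a, Y @ b  # canonical variables U = a'X, V = b'Y
print(f"rho={rho:.3f}, corr(U, V)={np.corrcoef(U_var, V_var)[0, 1]:.3f}")
```

Here the population canonical correlation is 0.5, since the shared factor and the independent noise have equal variance, and the sample correlation of U and V reproduces the returned value.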

3.5.2 Issues and practical usage

The main benefit of canonical correlation analysis is that it differs from other (appropriate) multivariate techniques, which impose very rigid restrictions. Those techniques are generally believed to provide results of higher quality. However, for the purpose of this research, and given this type of data, the fact that canonical correlation places the fewest restrictions makes it the most appropriate and powerful multivariate technique. It may be seen as a generalization of multiple linear regression.


Variables included in the analysis should be on a ratio or interval scale. However, nominal or ordinal variables can be used after being converted to sets of dummy variables. Even though testing the significance of the canonical correlations requires the data to be multivariate normal, the technique performs well for descriptive purposes even if this requirement is not fulfilled. Hair (Hair et al., 1998) discusses the flexibility of canonical correlation and its advantages, particularly when the dependent and explanatory variables can be either metric or non-metric. Hence, the application is broadly consistent with the existing literature.

4 Computations and Results

The first step in the analyses was to use the SAS statistical software to transform the variables that allowed more than one answer (e.g. problem areas) into binary form by adding dummy variables.²
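The dummy coding described above was done in SAS; as an illustration, the Python sketch below turns multi-answer responses into one 0/1 indicator per category. The helper name to_dummies and the responses are hypothetical; the category codes are borrowed from the problem-area codes used in Tables 13 to 15.

```python
def to_dummies(responses, categories):
    """Turn multi-answer responses into one 0/1 indicator per category."""
    return [
        {cat: int(cat in answered) for cat in categories}
        for answered in responses
    ]


# Hypothetical answers to "Which problem areas did you experience?"
categories = ["Ven", "Vb", "Vc"]
responses = [{"Ven"}, {"Vb", "Vc"}, set()]
rows = to_dummies(responses, categories)
for row in rows:
    print(row)
```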

4.1 Shapley Value

I used the R statistical language, more specifically the package relaimpo (Relative Importance for Linear Regression in R). This package implements six different metrics for assessing the relative importance of predictors in the linear model and also offers exploratory bootstrap confidence intervals (Journal of Statistical Software, 2006).

For the purpose of this research, three metrics are particularly useful: “lmg”, “first” and “last”, described in the following.

“lmg”: these are the Shapley values. The metric is a decomposition of R² into non-negative contributions that automatically sum to the total R². It is recommended for calculating relative importance, since it uses both the direct effects and the effects adjusted for the other predictors in the model.

“first”: these are the univariate R² values from regression models with one predictor only. They show what each predictor is able to explain on its own. If the predictors are correlated, the sum of all “first” values will usually be well above the overall R².

“last”: these show what each predictor is able to explain in addition to all other predictors. The values represent the increase in R² when the specific predictor is added last to a model already containing all the others. In case of correlation among the predictors, the “last” values will not add up to the overall R².
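The “first” and “last” metrics are straightforward to reproduce. The sketch below (Python with NumPy, not relaimpo itself; the helper r2 and the simulated data are hypothetical) computes both for three predictors, two of which are strongly correlated, and illustrates why neither sums to the overall R² under correlation.

```python
import numpy as np


def r2(X, y, cols):
    """R^2 of OLS of y on an intercept plus the listed columns of X."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


rng = np.random.default_rng(6)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)  # strongly correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + 0.5 * x3 + rng.normal(size=n)

p = X.shape[1]
full = r2(X, y, list(range(p)))
# "first": R^2 with each predictor alone
first = np.array([r2(X, y, [j]) for j in range(p)])
# "last": increase in R^2 when each predictor is added to all the others
last = np.array([full - r2(X, y, [c for c in range(p) if c != j]) for j in range(p)])

print(f"overall R^2={full:.3f}, sum(first)={first.sum():.3f}, sum(last)={last.sum():.3f}")
```

With x₁ and x₂ nearly interchangeable, each alone explains most of the variance, so the “first” values overshoot, while neither adds much once the other is present, so the “last” values fall short.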

A potential drawback is the computational burden: the “lmg” metric requires fitting a regression for every subset of the predictors, so the cost grows exponentially with their number, and a selection of attributes is therefore necessary. Theil (Theil, 1987) suggests that an information measure may be introduced; accordingly, an information coefficient was computed as a pre-analysis step. The information coefficient is a measure for evaluating the quality and usefulness of attributes. As a result, 20 vehicle-related attributes were chosen in each dataset.

The following analysis is based on the R output³ and includes the relative importance of 15 satisfaction attributes regarding the dealer where the vehicle was purchased, followed by 20 attributes regarding the vehicle, both covering the years 2006 to 2010.


4.1.1 Ranked Satisfiers (related to the satisfaction with the dealer)

Figure 5 illustrates the frequency distribution of the response variable.

Figure 5: Satisfaction Attribute V90, Country A, Year 2006

Tables 4 to 6 display the “lmg” metrics of the attributes regarding satisfaction with the dealer, ordered by relative importance.

Table 4: Dealer Satisfiers, Country A, Years 2006 and 2007 respectively

Attribute (2006) | lmg | RI % | Attribute (2007) | lmg | RI %
V91 | 0,185213 | 18,52% | V91 | 0,1910097 | 19,10%
V94 | 0,098906 | 9,89% | V94 | 0,09609172 | 9,61%
V103 | 0,093681 | 9,37% | V103 | 0,08891168 | 8,89%
V98 | 0,085307 | 8,53% | V198 | 0,08803514 | 8,80%
V93 | 0,072336 | 7,23% | V93 | 0,07442355 | 7,44%
V101 | 0,06959 | 6,96% | V101 | 0,06921337 | 6,92%
V95 | 0,064972 | 6,50% | V95 | 0,06514394 | 6,51%
V97 | 0,062902 | 6,29% | V97 | 0,06124171 | 6,12%
V102 | 0,058982 | 5,90% | V96 | 0,05719067 | 5,72%
V96 | 0,055878 | 5,59% | V102 | 0,05623089 | 5,62%
V99 | 0,055683 | 5,57% | V99 | 0,05405284 | 5,41%
V100 | 0,048342 | 4,83% | V92 | 0,05121555 | 5,12%


Table 5: Dealer Satisfiers, Country A, Years 2008 and 2009 respectively

Attribute (2008) | lmg | RI % | Attribute (2009) | lmg | RI %
V91 | 0,1876471 | 18,76% | V91 | 0,18835054 | 18,84%
V94 | 0,0988604 | 9,89% | V94 | 0,09666288 | 9,67%
V103 | 0,0874508 | 8,75% | V103 | 0,09215811 | 9,22%
V98 | 0,0854664 | 8,55% | V98 | 0,08446151 | 8,45%
V93 | 0,0752088 | 7,52% | V93 | 0,0736499 | 7,36%
V101 | 0,0711366 | 7,11% | V101 | 0,07036984 | 7,04%
V95 | 0,0653855 | 6,54% | V95 | 0,06565629 | 6,57%
V97 | 0,0627659 | 6,28% | V97 | 0,06140672 | 6,14%
V102 | 0,0596454 | 5,96% | V102 | 0,0573845 | 5,74%
V96 | 0,0561828 | 5,62% | V96 | 0,05621675 | 5,62%
V99 | 0,0542581 | 5,43% | V99 | 0,05484663 | 5,48%
V92 | 0,0498752 | 4,99% | V92 | 0,0518594 | 5,19%
V100 | 0,0461172 | 4,61% | V100 | 0,04697694 | 4,70%

Table 6: Dealer Satisfiers, Country A, Year 2010

Attribute | lmg | RI %
V91 | 18,78% | 18,78%
V94 | 10,01% | 10,01%
V103 | 8,75% | 8,75%
V98 | 8,35% | 8,35%
V93 | 7,49% | 7,49%
V101 | 7,21% | 7,21%
V95 | 6,77% | 6,77%
V102 | 6,12% | 6,12%
V97 | 6,01% | 6,01%
V96 | 5,50% | 5,50%
V92 | 5,21% | 5,21%
V99 | 5,18% | 5,18%
V100 | 4,63% | 4,63%


4.1.2 Ranked Satisfiers (related to the satisfaction with the vehicle)

Tables 7 to 9 show the satisfaction attributes regarding the vehicle, ordered by relative importance.

Table 7: Vehicle satisfiers, Country A, Years 2006 and 2007 respectively

Attribute (2006) | lmg | RI % | Attribute (2007) | lmg | RI %
V1 | 0,17533809 | 17,53% | V1 | 0,1712068 | 17,12%
V2 | 0,06934197 | 6,93% | V2 | 0,0652616 | 6,53%
V3 | 0,06650988 | 6,65% | V21 | 0,0643264 | 6,43%
V4 | 0,05235531 | 5,24% | V3 | 0,0581126 | 5,81%
V5 | 0,05018064 | 5,02% | V7 | 0,0481799 | 4,82%
V6 | 0,04983194 | 4,98% | V11 | 0,0468997 | 4,69%
V7 | 0,04902994 | 4,90% | V6 | 0,0458007 | 4,58%
V8 | 0,04524226 | 4,52% | V9 | 0,0452338 | 4,52%
V9 | 0,04373201 | 4,37% | V5 | 0,0443357 | 4,43%
V10 | 0,04235835 | 4,24% | V8 | 0,0433154 | 4,33%
V11 | 0,04168734 | 4,17% | V13 | 0,0409352 | 4,09%
V12 | 0,03989079 | 3,99% | V10 | 0,0400688 | 4,01%
V13 | 0,03868629 | 3,87% | V12 | 0,0396294 | 3,96%
V14 | 0,03835219 | 3,84% | V25 | 0,0386817 | 3,87%
V15 | 0,03749254 | 3,75% | V15 | 0,0373659 | 3,74%
V16 | 0,03447326 | 3,45% | V17 | 0,0370816 | 3,71%
V17 | 0,03417306 | 3,42% | V14 | 0,0359364 | 3,59%
V18 | 0,03296703 | 3,30% | V16 | 0,0352603 | 3,53%
V19 | 0,03095085 | 3,10% | V22 | 0,0334706 | 3,35%
V20 | 0,02740625 | 2,74% | V20 | 0,0288976 | 2,89%


Table 8: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively

Attribute (2008) | lmg | RI % | Attribute (2009) | lmg | RI %
V1 | 0,12683183 | 12,68% | V1 | 0,11186845 | 11,19%
V23 | 0,08183276 | 8,18% | V23 | 0,0769934 | 7,70%
V2 | 0,06149193 | 6,15% | V25 | 0,06519498 | 6,52%
V21 | 0,05708817 | 5,71% | V21 | 0,06440816 | 6,44%
V15 | 0,04966849 | 4,97% | V7 | 0,05385323 | 5,39%
V7 | 0,04958275 | 4,96% | V3 | 0,05198625 | 5,20%
V3 | 0,04926621 | 4,93% | V4 | 0,04671821 | 4,67%
V24 | 0,04758717 | 4,76% | V11 | 0,04539 | 4,54%
V4 | 0,04521186 | 4,52% | V9 | 0,04407208 | 4,41%
V8 | 0,04422061 | 4,42% | V10 | 0,04395921 | 4,40%
V10 | 0,04354485 | 4,35% | V8 | 0,04258536 | 4,26%
V9 | 0,04270511 | 4,27% | V13 | 0,04169902 | 4,17%
V11 | 0,04203345 | 4,20% | V12 | 0,04096882 | 4,10%
V17 | 0,03907649 | 3,91% | V17 | 0,0406323 | 4,06%
V13 | 0,03894794 | 3,89% | V14 | 0,0406012 | 4,06%
V5 | 0,03793851 | 3,79% | V26 | 0,03910197 | 3,91%
V14 | 0,03757862 | 3,76% | V15 | 0,03835864 | 3,84%
V25 | 0,0361539 | 3,62% | V24 | 0,03791848 | 3,79%
V12 | 0,03570495 | 3,57% | V22 | 0,03720759 | 3,72%
V16 | 0,03353438 | 3,35% | V16 | 0,03648265 | 3,65%


Table 9: Vehicle Satisfiers, Country A, Year 2010

Attribute | lmg | RI %
V27 | 0,15284015 | 15,28%
V1 | 0,11910833 | 11,91%
V2 | 0,05708717 | 5,71%
V21 | 0,05692879 | 5,69%
V7 | 0,04802484 | 4,80%
V4 | 0,04617533 | 4,62%
V8 | 0,04512617 | 4,51%
V3 | 0,04414864 | 4,41%
V9 | 0,04018143 | 4,02%
V11 | 0,0388276 | 3,88%
V13 | 0,03744338 | 3,74%
V17 | 0,03739872 | 3,74%
V10 | 0,03708772 | 3,71%
V12 | 0,0362962 | 3,63%
V15 | 0,03521464 | 3,52%
V14 | 0,03482542 | 3,48%
V25 | 0,03391699 | 3,39%
V22 | 0,03373433 | 3,37%
V16 | 0,03357805 | 3,36%
V19 | 0,0320561 | 3,21%


4.1.2.1 Among customers that did not experience any problems

The follow-up analysis took a closer look at the customers who did not experience any problems and compared the resulting relative importances with those obtained in the previous section, where all customers were included in the analysis.

Tables 10 to 12 display the satisfaction attributes regarding the vehicle, ordered by relative importance.

Table 10: Vehicle Satisfiers, Country A, Years 2006 and 2007 respectively (respondents with no problems)

Attribute (2006) | lmg | RI % | Attribute (2007) | lmg | RI %
V7 | 0,06927767 | 6,93% | V14 | 0,07256916 | 7,26%
V14 | 0,06706004 | 6,71% | V2 | 0,06769435 | 6,77%
V2 | 0,06605162 | 6,61% | V7 | 0,0675463 | 6,75%
V1 | 0,0636296 | 6,36% | V11 | 0,0565514 | 5,66%
V8 | 0,0587735 | 5,88% | V1 | 0,05623229 | 5,62%
V3 | 0,05574437 | 5,57% | V8 | 0,05400056 | 5,40%
V11 | 0,05187558 | 5,19% | V3 | 0,052504 | 5,25%
V6 | 0,05026635 | 5,03% | V10 | 0,05111744 | 5,11%
V13 | 0,04927452 | 4,93% | V13 | 0,05007357 | 5,01%
V10 | 0,04775618 | 4,78% | V6 | 0,0469457 | 4,69%
V5 | 0,046488 | 4,65% | V9 | 0,04605065 | 4,61%
V9 | 0,04626672 | 4,63% | V25 | 0,04549564 | 4,55%
V12 | 0,04400858 | 4,40% | V29 | 0,04543294 | 4,54%
V16 | 0,04299308 | 4,30% | V5 | 0,04505041 | 4,51%
V28 | 0,04233533 | 4,23% | V22 | 0,04427986 | 4,43%
V15 | 0,04196933 | 4,20% | V12 | 0,04359521 | 4,36%
V26 | 0,04179348 | 4,18% | V26 | 0,04337561 | 4,34%
V22 | 0,03961751 | 3,96% | V15 | 0,04075267 | 4,08%
V17 | 0,03906054 | 3,91% | V17 | 0,04038925 | 4,04%
V4 | 0,035758 | 3,58% | V30 | 0,03034298 | 3,03%


Table 11: Vehicle Satisfiers, Country A, Years 2008 and 2009 respectively (respondents with no problems)

Attribute (2008) | lmg | RI % | Attribute (2009) | lmg | RI %
V2 | 0,0751163 | 7,51% | V14 | 0,07195362 | 7,20%
V7 | 0,06607087 | 6,61% | V17 | 0,0706223 | 7,06%
V14 | 0,06505198 | 6,51% | V2 | 0,06242987 | 6,24%
V10 | 0,05773926 | 5,77% | V10 | 0,05606256 | 5,61%
V8 | 0,05375072 | 5,38% | V1 | 0,0551477 | 5,51%
V3 | 0,05281364 | 5,28% | V11 | 0,05439075 | 5,44%
V1 | 0,05021427 | 5,02% | V31 | 0,0535409 | 5,35%
V11 | 0,04988962 | 4,99% | V8 | 0,05202964 | 5,20%
V31 | 0,04841213 | 4,84% | V13 | 0,05015063 | 5,02%
V26 | 0,04825039 | 4,83% | V3 | 0,0497964 | 4,98%
V9 | 0,04797203 | 4,80% | V26 | 0,04625483 | 4,63%
V12 | 0,04751064 | 4,75% | V16 | 0,04612255 | 4,61%
V17 | 0,04722728 | 4,72% | V17 | 0,04419138 | 4,42%
V13 | 0,04678278 | 4,68% | V21 | 0,04413617 | 4,41%
V21 | 0,04508103 | 4,51% | V25 | 0,04400253 | 4,40%
V16 | 0,04223649 | 4,22% | V28 | 0,04375152 | 4,38%
V25 | 0,04173557 | 4,17% | V9 | 0,0433502 | 4,34%
V15 | 0,04111215 | 4,11% | V12 | 0,04258554 | 4,26%
V28 | 0,03983678 | 3,98% | V15 | 0,03882214 | 3,88%
V27 | 0,03319606 | 3,32% | V27 | 0,03065876 | 3,07%


Table 12: Vehicle Satisfiers, Country A, Year 2010 (respondents with no problems)

Attribute | lmg | RI %
V2 | 0,06599232 | 6,60%
V7 | 0,06496484 | 6,50%
V14 | 0,060577 | 6,06%
V8 | 0,05773304 | 5,77%
V1 | 0,05416185 | 5,42%
V31 | 0,05325409 | 5,33%
V11 | 0,05198424 | 5,20%
V13 | 0,05113744 | 5,11%
V10 | 0,05081902 | 5,08%
V3 | 0,04954933 | 4,95%
V21 | 0,04667376 | 4,67%
V170 | 0,04555793 | 4,56%
V9 | 0,04555003 | 4,56%
V26 | 0,04542082 | 4,54%
V25 | 0,04530835 | 4,53%
V22 | 0,04492645 | 4,49%
V16 | 0,04480985 | 4,48%
V15 | 0,041844 | 4,18%
V12 | 0,04176165 | 4,18%
V27 | 0,03797399 | 3,80%

From the above tables it can be seen that the contributions of the satisfaction attributes are very close in terms of importance. A number of attributes that were previously less important entered the new model (e.g. V28, V29, V30 and V31). The importance of attribute V14 increased markedly, and it appears among the top three each year.


4.1.3 Ranked Dissatisfiers

In contrast to the previous section, this part focuses on identifying the greatest dissatisfier. The Shapley value was calculated for all experienced problem areas, followed by an analysis of the problems within each problem area (i.e. sub-categories).

Tables 13 to 15 show the problem areas ranked by relative importance.

Table 13: Dissatisfiers, Country A, Years 2006 and 2007 respectively

Attribute (2006) | lmg | RI % | Attribute (2007) | lmg | RI %
Ven | 0,212139073 | 21,21% | Ven | 0,212809172 | 21,28%
Vb | 0,12142325 | 12,14% | Vc | 0,137871419 | 13,79%
Vc | 0,120215922 | 12,02% | Vb | 0,123632357 | 12,36%
Vel | 0,108495021 | 10,85% | Vel | 0,089960019 | 9,00%
Vi | 0,100405042 | 10,04% | Vi | 0,074496974 | 7,45%
Vo | 0,062205607 | 6,22% | Vo | 0,061172457 | 6,12%
Vsw | 0,0606551 | 6,07% | Vsw | 0,056500635 | 5,65%
Vbr | 0,045555738 | 4,56% | Vw | 0,05154355 | 5,15%
Ve | 0,042242747 | 4,22% | Vbr | 0,048614873 | 4,86%
Vp | 0,039842387 | 3,98% | Vs | 0,046087759 | 4,61%
Vs | 0,033740768 | 3,37% | Vp | 0,042708995 | 4,27%
Vw | 0,03095202 | 3,10% | Ve | 0,035370297 | 3,54%
Vex | 0,013582749 | 1,36% | Vex | 0,0104424 | 1,04%
Vot | 0,008544577 | 0,85% | Vot | 0,008789093 | 0,88%


Table 14: Dissatisfiers, Country A, Years 2008 and 2009 respectively

Attribute (2008) | lmg | RI % | Attribute (2009) | lmg | RI %
Ven | 0,252488469 | 25,25% | Ven | 0,214316999 | 21,43%
Vb | 0,110277994 | 11,03% | Vc | 0,151047967 | 15,10%
Vc | 0,103348859 | 10,33% | Vb | 0,122505803 | 12,25%
Vel | 0,093153875 | 9,32% | Vel | 0,088606572 | 8,86%
Vi | 0,087610414 | 8,76% | Vi | 0,079500754 | 7,95%
Vo | 0,05987042 | 5,99% | Vs | 0,055625147 | 5,56%
Vw | 0,05203434 | 5,20% | Vbr | 0,050078129 | 5,01%
Vbr | 0,048519557 | 4,85% | Vo | 0,049666289 | 4,97%
Vp | 0,045876248 | 4,59% | Ve | 0,048929548 | 4,89%
Vs | 0,042098418 | 4,21% | Vsw | 0,047450551 | 4,75%
Vsw | 0,039320631 | 3,93% | Vp | 0,047259639 | 4,73%
Ve | 0,033029765 | 3,30% | Vw | 0,027595428 | 2,76%
Vex | 0,025816516 | 2,58% | Vex | 0,015697143 | 1,57%
Vot | 0,006554494 | 0,66% | Vot | 0,001720031 | 0,17%

Table 15: Dissatisfiers, Country A, Year 2010

Attribute | lmg | RI %
Ven | 0,292895 | 29,29%
Vc | 0,136178 | 13,62%
Vb | 0,089357 | 8,94%
Vel | 0,08703 | 8,70%
Vi | 0,065648 | 6,56%
Vo | 0,05879 | 5,88%
Vbr | 0,049947 | 4,99%
Vs | 0,049329 | 4,93%
Vsw | 0,041235 | 4,12%
Vp | 0,039328 | 3,93%
Ve | 0,036337 | 3,63%
Vw | 0,028256 | 2,83%
Vex | 0,014915 | 1,49%
Vot | 0,010756 | 1,08%


The analysis was then applied to the sub-categories of the top-ranked problem area (Ven) in order to identify the single greatest dissatisfier.

Table 16: Ven problem area sub-categories, Country A, Year 2006

Sub-category | lmg | RI %
Ve4 | 0,244029091 | 24,40%
Ve1 | 0,20440145 | 20,44%
Ve7 | 0,149881772 | 14,99%
Ve98 | 0,105140416 | 10,51%
Ve5 | 0,071517411 | 7,15%
Ve8 | 0,041940597 | 4,19%
Ve6 | 0,033103608 | 3,31%
Ve18 | 0,019165955 | 1,92%
Ve15 | 0,018551414 | 1,86%
Ve9 | 0,018305727 | 1,83%
Ve19 | 0,016829286 | 1,68%
Ve2 | 0,016335526 | 1,63%
Ve11 | 0,01555382 | 1,56%
Ve16 | 0,013381763 | 1,34%
Ve10 | 0,008700418 | 0,87%
Ve27 | 0,007703845 | 0,77%
Ve26 | 0,006485322 | 0,65%
Ve22 | 0,003608788 | 0,36%
Ve12 | 0,002974477 | 0,30%
Ve14 | 0,00104175 | 0,10%
Ve17 | 0,000853017 | 0,09%
Ve3 | 0,000494547 | 0,05%


4.1.4 “Key attributes” identification

Figure 6: Noise-Reach table, Country A, Year 2006

Figure 6 illustrates the identification of the “key attributes”. Problem areas are ranked according to their Shapley values, and “reach” and “noise” are calculated according to Equation 15. Once the added noise exceeds the added reach, the cut-off point is reached. All problem areas with a success value below 0 are considered unimportant.

4.2 Time Series and Trend Analysis

To detect possible trends in relative importance, a time series analysis was applied to those vehicle-related satisfaction attributes that appeared in the model in all five years. This is illustrated in Figure 7.


Figure 7: Time Series Analysis, Country A

According to the chart, the relative importance of satisfaction attribute V1 fluctuates the most over time, while the remaining attributes are fairly stable.

Figure 8 shows a fitted linear trend line illustrating the changes in satisfaction attribute V1 over the five consecutive years of the study. The R² indicates how well the trend line fits the data; its value of 0,8153 indicates a fairly good fit.⁴
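Assuming the trend line in Figure 8 is an ordinary least squares fit of the V1 “lmg” values on the year, it can be reproduced from Tables 7 to 9 as in the Python/NumPy sketch below. With the tabulated values it gives a negative slope and an R² of roughly 0,815.

```python
import numpy as np

years = np.array([2006, 2007, 2008, 2009, 2010])
# lmg of attribute V1 (all respondents), taken from Tables 7, 8 and 9
v1 = np.array([0.17533809, 0.1712068, 0.12683183, 0.11186845, 0.11910833])

t = years - years[0]                     # centre the year for numerical stability
slope, intercept = np.polyfit(t, v1, 1)  # least-squares straight line
fitted = intercept + slope * t
r2 = 1.0 - np.sum((v1 - fitted) ** 2) / np.sum((v1 - v1.mean()) ** 2)
print(f"slope per year: {slope:.4f}, R^2: {r2:.4f}")
```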
