Degree Project in Technology, First Cycle, 15 Credits
Stockholm, Sweden 2016
Determining Important Factors for Profitability in Dietary Supplement Retailing by Multiple Linear Regression
August Dénes, Martin Lindblom
KTH ROYAL INSTITUTE OF TECHNOLOGY
Degree Project in Applied Mathematics and Industrial Economics (15 credits)
Degree Programme in Industrial Engineering and Management (300 credits)
Royal Institute of Technology, 2016
Supervisors at KTH: Fredrik Armerin, Jonatan Freilich
Examiner: Henrik Hult
TRITA-MAT-K 2016:10
ISRN-KTH/MAT/K--16/10--SE
Royal Institute of Technology
SCI School of Engineering Sciences
KTH SCI
SE-100 44 Stockholm, Sweden
URL: www.kth.se/sci
May 24, 2016
Abstract
This thesis in applied statistics and industrial economics examined which factors and strategies had a statistically significant impact on profitability within the business to consumer dietary supplement market. The data this thesis was based on consisted of several annual reports from the years 2011 to 2015 and other strategic information. The data included 19 different dietary supplement retailers on the Swedish market. In order to establish which factors had a significant impact on profitability, a linear regression was used. The result was an identified linear relationship between Operating Margin and the covariates Solidity, Average Salary, Only Own Brand, Free Returns, Student Discounts and Chat. A market analysis was then performed using Porter's five forces and a PEST analysis. The analysis concluded that the market is attractive, but that there are a few uncertainties surrounding the future development of the dietary supplement retailing market.
Sammanfattning (Summary)
This bachelor's thesis in applied statistics and industrial economics examined which factors and strategies had a statistically significant impact on profitability in dietary supplement retailing on the Swedish consumer market. The data on which the report is based came from several annual reports from the years 2011 to 2015, together with other strategic information. The study included 19 different companies on the Swedish dietary supplement retailing market. Linear regression was applied to identify the influencing factors. The results showed a linear relationship between Operating Margin and the covariates Solidity, Average Salary, Only Own Brand, Free Returns, Student Discount and Chat. A market analysis was then performed using Porter's five forces model and a PEST analysis. The analysis concluded that the market is attractive, but that there are some uncertainties regarding the future development of the dietary supplement retailing market.
Contents
1 Introduction
1.1 Background
1.2 Aim
1.3 Problem Formulation
2 Theory
2.1 The Homoskedastic Linear Regression Model
2.1.1 Assumptions
2.1.2 Covariates
2.1.3 The Ordinary Least Square Estimation
2.2 Potential Modelling Errors
2.2.1 Multicollinearity
2.2.2 Micronumerosity
2.2.3 Heteroskedasticity
2.2.4 Endogeneity
2.3 Model Evaluation
2.3.1 $R^2$ and Adjusted $R^2$
2.3.2 Eta Squared, $\eta^2$
2.3.3 F-test
2.3.4 P-value
2.3.5 AIC
3 Methodology
3.1 Demarcation
3.2 Defining Variables
3.2.1 Dependent Variable
3.2.2 Explanatory Variables
3.3 Data
3.3.1 Financial Data
3.3.2 Qualitative Data
3.4 Initial Model
3.4.1 VIF-test
3.4.2 Estimation of the Initial Model
3.4.3 Reduction of the Initial Model
4 Results
4.1 The Final Model
4.2 Model Validation
4.2.1 VIF
4.2.2 Adjusted $R^2$
4.2.3 F-Statistics and P-values
4.2.4 Residuals
5 Discussion
5.1 Method
5.1.1 Using Regression to Evaluate Strategy
5.1.2 Demarcation
5.1.3 Dependent Variable
5.1.4 Explanatory Variables
5.1.5 Data
5.1.6 The Reduction
5.2 Results
5.2.1 The Final Model
5.2.2 Model Validation
6 Market Analysis
6.1 Choosing Market Analysis Models
6.2 Theoretical Framework
6.2.1 Porter’s Five Forces
6.2.2 Criticism Towards Porters’ Five Force Model
6.2.3 PEST Analysis
6.3 Application of Porter’s Five Forces
6.3.1 Threat of New Entrants
6.3.2 Power of Suppliers
6.3.3 Power of Buyers
6.3.4 Threat of Available Substitutes
6.3.5 Competitive Rivalry
6.3.6 Summary of Porter’s Five Forces
6.4 PEST Analysis of the Dietary Supplement Market
6.4.1 Political Factors
6.4.2 Economical Factors
6.4.3 Social Factors
6.4.4 Technological Factors
6.4.5 Summary of PEST Analysis
7 Conclusion
8 References
List of Tables
1 Company Declaration
2 Defined Variables
3 VIF-values of the Initial Model
4 Results of the Initial Model
5 Eliminated Covariates
6 Results of the Final Model
7 VIF-values of the Final Model
List of Figures
1 Homo- and Heteroskedastic Residual Plots
2 Homo- and Heteroskedastic Density Plots
3 Homo- and Heteroskedastic Normal Q-Q Plots
4 Residual and Density plot
5 Normal Q-Q Plot
6 Illustration of Porter’s Five Forces
1 Introduction
1.1 Background
The supplement retailing industry has been, and still is, considered a rather stigmatized market. This could be because dietary supplements are deemed unnecessary by certain health researchers (Bratt 2014). Supplements have also been associated with bodybuilding, an industry known for its use of anabolic steroids (Chappell 2015). Furthermore, a study conducted by the Stockholm Environment Administration in 2012 concluded that 40 % of the 122 studied companies carried one or more products containing illegal or potentially harmful substances. In total, 61 products received a sale prohibition and 15 substances were deemed illegal or classified as pharmaceuticals (Johansson & Jonsson 2013).
Despite this stigma, the supplement industry in Sweden has grown steadily and widened its customer base in recent years. Consequently, the supplement market has become a multibillion SEK industry. The growth is partially attributed to the current fitness trend and the widespread use of social media as a marketing platform (TT 2014). This has brought several new competitors to the market, so it has probably become more important for a company to differentiate its brand in order to generate profitability. This becomes evident when comparing competitors: they may differ in their assortment and discounts, and in whether they offer online retailing or physical stores. It is most likely also important to compare other managerial aspects of the business, such as the amount of customer service provided or the company's relation to its suppliers.
The supplement industry is not a new phenomenon, yet it has never been as widely successful as it is now. This rapid ascent makes it an interesting market to analyze.
1.2 Aim
The aim of the project is to establish how different strategies or factors within the business to consumer dietary supplement market impact profitability. The project will also present an interpretation of why these factors have an effect.
This is performed with the intent of providing a structural interpretation, meaning that the aim is to determine how different factors and strategies affect a company's profitability (positively or negatively), rather than to what extent these factors contribute. The project also aims to provide sufficient understanding of the current market conditions to enable companies to formulate a business strategy that takes these factors into account.
The project is mostly relevant to potential market entrants and current competitors. It might also be applicable to some extent in similar retailing markets.
1.3 Problem Formulation
In order to achieve the aim, the project attempts to answer the following problem formulations:
• Which factors have a statistically significant impact on profitability in dietary supplement retailing?
– How could these factors be interpreted?
• Is the market attractive in terms of long-run profitability?
• Is the market sensitive to external factors?
2 Theory
2.1 The Homoskedastic Linear Regression Model
The multiple linear regression model is specified as follows
$$ y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \qquad i = 1, \dots, n. \tag{1} $$
In this specification $y_i$ is an observation of a random dependent variable $y$.
The value of $y$ depends on the explanatory variables or covariates $x_{\cdot j}$ and the additional random variable $e_i$, the residual or error term. The $\beta_j$ parameters are called regression coefficients. These parameters are unknown and are to be estimated from observational data. The stated model (1) consists of $n$ observations and $k$ covariates, with $x_{i0} = 1$ for the intercept $\beta_0$. It can be written in matrix form as
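$$ Y = X\beta + e \tag{2} $$
where
$$
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
$$
(Lang 2014)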
2.1.1 Assumptions
The previously stated regression model rests on five basic assumptions regarding the way the observations are generated.
1. The first assumption is that the dependent variable $y$ can be calculated as a linear combination of the covariates $x_{\cdot j}$ plus the residual $e$. The unknown coefficients of this function form the vector $\beta$ in equation (2).
2. The second assumption is that the expected value of the residual is equal to zero,
$$ E[e_i] = 0 $$
This means that the mean of the distribution from which the error $e$ is drawn is zero.
3. The third assumption is that all the error terms have the same variance and that they are uncorrelated with one another. This can be expressed as
$$ E[e_i^2] = \sigma^2, \qquad E[e_i e_j] = 0 \ \text{for } i \neq j $$
where $\sigma$ is unknown.
4. The fourth assumption is that the independent variables or covariates are deterministic, meaning that they are fixed in repeated samples.
5. The final assumption is that the number of observations is larger than the number of covariates and that there are no exact linear relationships among the covariates.
(Kennedy 2008)
2.1.2 Covariates
There are two types of covariates to consider when formulating a regression model. These are quantitative covariates and qualitative covariates.
A quantitative covariate assumes a numeric value that quantifies how much of a property was recorded in a specific outcome. If, for example, a car model is considered, a quantitative covariate could be the horsepower that this particular car's engine can generate.
A qualitative covariate, or dummy variable, only assumes the value one or zero to indicate whether a particular outcome has a certain quality. If a car is considered once again, a dummy variable could indicate whether the car is made in the U.S. If the car was made in the U.S., the dummy variable assumes the value one. If the car was not made in the U.S., the dummy variable assumes the value zero.
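As a minimal sketch of the two covariate types, the snippet below encodes the car example as rows of a design matrix. The field names `horsepower` and `made_in_us` and all numeric values are hypothetical and chosen only for illustration.

```python
# Hypothetical car records; names and values are illustrative only.
cars = [
    {"horsepower": 150, "made_in_us": True},
    {"horsepower": 110, "made_in_us": False},
]


def design_row(car):
    """Return [intercept, quantitative covariate, dummy covariate]."""
    dummy = 1 if car["made_in_us"] else 0  # qualitative covariate coded as 0/1
    return [1, car["horsepower"], dummy]


rows = [design_row(car) for car in cars]
print(rows)
```

Each row starts with a one for the intercept, followed by the quantitative value and the 0/1 dummy.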
2.1.3 The Ordinary Least Square Estimation
The ordinary least squares estimate of the vector $\beta$ from (2) is the vector $\hat\beta$ that minimizes the sum of the squared residuals $\hat e^t \hat e = |\hat e|^2$, where $\hat e$ and $\hat\beta$ are defined as
$$ \hat e = Y - X\hat\beta, \qquad \hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix} \tag{3} $$
In order to find the vector $\hat\beta$ that minimizes the sum of the squared residuals, the normal equations (4) are solved for $\hat\beta$:
$$ X^t \hat e = 0 \tag{4} $$
Using equation (3) in (4) gives the expression
$$ X^t (Y - X\hat\beta) = 0 \tag{5} $$
and solving for $\hat\beta$ gives
$$ \hat\beta = (X^t X)^{-1} X^t Y \tag{6} $$
(Lang 2014)
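The estimation steps above can be sketched in a few lines of standard-library Python. This is an illustration only, not the thesis's implementation (the calculations in the study were performed in R): the helper names `transpose`, `matmul`, `solve` and `ols` are hypothetical, the data are made up and generated without an error term, and the normal equations are solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^t X is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def ols(X, y):
    """Return beta_hat solving the normal equations X^t X beta = X^t y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Made-up data generated exactly by y = 1 + 2*x1 - 3*x2 (no error term).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1 + 2 * row[1] - 3 * row[2] for row in X]

beta_hat = ols(X, y)
residuals = [yi - sum(b * x for b, x in zip(beta_hat, row)) for row, yi in zip(X, y)]
print([round(b, 6) for b in beta_hat])
```

Because the made-up data contain no error term, the estimate recovers the generating coefficients up to floating-point error and the residuals vanish, so the normal equations (4) hold trivially.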
2.2 Potential Modelling Errors
2.2.1 Multicollinearity
Multicollinearity occurs when at least one of the covariates is strongly correlated with a linear combination of other covariates in the model. It causes the standard errors of one or more of the regression coefficients to be large, hence the point estimates of these coefficients become imprecise. However, these standard errors decrease as the number of observations increases, which means the problem is often due to a lack of data.
If at least two of the covariates in the model are perfectly linearly dependent, the $X^tX$ matrix becomes singular and consequently the OLS estimate has no unique solution. This is called strict or perfect multicollinearity, and it makes OLS estimation impossible regardless of the number of observations. When the dependence is less obvious, multicollinearity can be detected with the Variance Inflation Factor (VIF).
VIF
The VIF is a method of detecting multicollinearity and is defined as
$$ VIF = (1 - R^2)^{-1} \tag{7} $$
The $R^2$ in the expression above is obtained for every covariate by regressing the investigated covariate, as the dependent variable, on the other covariates in the model. The $R^2$ is then calculated as shown in equation (9). A VIF value > 10 indicates that multicollinearity is a substantial problem in the model. (Lang 2015, Hansen 2016)
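To make the threshold concrete, equation (7) can be inverted. The short calculation below only rearranges (7) for an auxiliary $R^2$ in the interval $[0, 1)$; no further assumptions are made.

```latex
VIF > 10
\iff \frac{1}{1 - R^2} > 10
\iff 1 - R^2 < 0.1
\iff R^2 > 0.9
```

In other words, the rule of thumb flags a covariate when more than 90 % of its variation is explained by the other covariates.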
2.2.2 Micronumerosity
Another problem with small sample sizes is micronumerosity. If the number of observations is small, it is likely that the asymptotic results stemming from the Central Limit Theorem do not apply. In this case a homoskedastic approach must be employed instead of the heteroskedastic approach, which is in most cases more robust.
(Hansen 2016, Lang 2015)
2.2.3 Heteroskedasticity
Heteroskedasticity implies that the variances of the error terms depend on the observations. This contradicts the third assumption in Section 2.1.1, which states that all error terms have the same variance. Unequal variances cause the usual estimates of the standard errors of the parameter estimates to be inconsistent. If the residuals are close to being homoskedastic, it is preferable to reformulate the model so that it is possible to use OLS.
To detect whether heteroskedasticity is present, it is possible to use the Residual plot, the Density plot and the Normal Quantile-Quantile plot (Q-Q plot). The dots in the residual plot will be randomly spread if the residuals are homoskedastic, illustrated to the left in Figure 1, and if the residuals are heteroskedastic the dots will follow some sort of pattern, illustrated to the right in Figure 1.
Figure 1: Homo- and Heteroskedastic Residual Plots
The density plot curve is symmetric if the data is homoskedastic, illustrated to the left in Figure 2, otherwise the data is considered heteroskedastic, illustrated to the right in Figure 2.
Figure 2: Homo- and Heteroskedastic Density Plots
The Q-Q plot plots the standardized residuals against the theoretical quantiles. If the plotted dots follow a straight line, illustrated to the left in Figure 3, the regression is homoskedastic; otherwise heteroskedasticity is present, illustrated to the right in Figure 3.
Figure 3: Homo- and Heteroskedastic Normal Q-Q Plots
If reformulation is not possible, or if the problem with heteroskedasticity remains after reformulation, one can use the estimator of the covariance matrix for the heteroskedastic regression model that Halbert White introduced in 1980. This estimator is called White's Consistent Variance Estimator and is formulated as
$$ \mathrm{Cov}(\hat\beta) = (X^tX)^{-1} X^t D(\hat e_i^2) X (X^tX)^{-1} \tag{8} $$
where $D(\hat e_i^2)$ is a diagonal matrix with diagonal elements $\hat e_i^2$. (Lang 2015)
2.2.4 Endogeneity
Endogeneity occurs when at least one of the covariates is correlated with the residual. This problem only appears when the regression is used for a structural interpretation and not when it is used for prediction purposes. The phenomenon causes inconsistent OLS estimates. If the correlation between a covariate and the residual is positive, the coefficient will be overestimated, and vice versa.
There are different reasons why problems with endogeneity appear. Some of the most common reasons are:
Sample selection bias
Sample selection bias implies that the observations were selected on some basis that the covariates do not explain. This unexplained basis may lead to incorrect conclusions.
Simultaneity
Simultaneity appears when the dependent variable influences any of the covariates in the regression. If an event that is not part of the explanatory variables influences the dependent variable, it will be part of the residual. Since the dependent variable in turn influences a covariate, the residual influences that covariate and endogeneity arises.
Missing Relevant Covariates
If the regression model misses relevant explanatory variables, they will be part of the residual and may cause incorrect estimates. Hence it is important to include all of the relevant covariates.
Measurement errors
Measurement errors can also cause endogeneity. A measurement error in the dependent variable will only add an element to the residual, and therefore it will not cause endogeneity. However, if the measurement error is made in a covariate, it will cause a correlation between the covariate and the residual, unless the $\beta$-estimate is zero, meaning endogeneity is present.
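The effect of a measurement error in a covariate can be illustrated with the textbook case of a simple regression. The sketch below assumes, beyond what is stated above, a single covariate $x$ that is observed as $\tilde{x} = x + u$, where the measurement error $u$ has mean zero and variance $\sigma_u^2$ and is uncorrelated with both $x$ and $e$, and where $x$ itself is uncorrelated with $e$.

```latex
y = \beta_0 + \beta_1 x + e
  = \beta_0 + \beta_1 \tilde{x} + (e - \beta_1 u),
\qquad
\operatorname{Cov}(\tilde{x},\, e - \beta_1 u) = -\beta_1 \sigma_u^2,
\qquad
\operatorname{plim} \hat\beta_1 = \beta_1 \, \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}
```

The covariance between the observed covariate and the new residual vanishes only if $\beta_1 = 0$; otherwise the estimate is pulled towards zero under these assumptions.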
Problems with endogeneity may be remedied by replacing the endogenous variable with heavily correlated variables called instruments. It is important that the instruments are not correlated with the residual for the method to be helpful. This process is called Two-Stage Least Squares (2SLS).
(Lang 2015; Hansen 2016)
2.3 Model Evaluation
2.3.1 $R^2$ and Adjusted $R^2$
The $R^2$ is a statistic that is often reported by statistical software. Conceptually, $R^2$ is the proportion of variation in the dependent variable $y$ that can be accounted for by fitting $y$ to a particular model instead of viewing it in isolation. One could say that $R^2$ is a measure of goodness of fit. The standard formulation of $R^2$ is given by
$$ R^2 = 1 - \frac{\sum (y - \hat y)^2}{\sum (y - \bar y)^2} \tag{9} $$
where $\hat y$ is the fitted value of $y$ from a linear regression model and $\bar y$ is the sample mean of $y$, that is, the estimated expectation of $y$.
The $R^2$ statistic is used for model comparison. By defining different models and comparing them to the empty model $y = \beta_0 + e$, with residual sum of squares (RSS) $\sum (y - \bar y)^2 = RSS(empty)$, one can determine which model captures the most variance and hence which covariates to include. It is however important to note that the $R^2$ statistic in itself is not a measure of model adequacy, meaning that a large $R^2$ does not imply that the model is good in the absolute sense. This can be illustrated by introducing the full model $y = \beta_0 + X_1\beta_1 + \dots + X_k\beta_k + e$ with the residual sum of squares $\sum (y - \hat y)^2 = RSS(full)$. The expression (9) can now be written as
$$ R^2 = 1 - \frac{RSS(full)}{RSS(empty)} \tag{10} $$
(Anderson-Sprecher 1994)
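Equation (10) translates directly into code. The following minimal sketch uses made-up observations and made-up fitted values (they do not come from an actual regression), and the function name `r_squared` is hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS(full) / RSS(empty), following equation (10)."""
    mean_y = sum(y) / len(y)
    rss_full = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    rss_empty = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - rss_full / rss_empty


# Made-up observations and fitted values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]

r2 = r_squared(y, y_hat)  # RSS(full) = 1.0, RSS(empty) = 5.0
print(r2)
```

With these numbers the fitted model leaves one fifth of the variation around the mean unexplained.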
One problem with the $R^2$ statistic is that it cannot decrease when more covariates are added to a model. This can be partially prevented by using the adjusted $R^2$, or $\bar R^2$, since the adjusted $R^2$ corrects for degrees of freedom and thereby deflates the value. The expression for the adjusted $R^2$ is given by
$$ \bar R^2 = 1 - \frac{(n - 1) \sum (y - \hat y)^2}{(n - k - 1) \sum (y - \bar y)^2} \tag{11} $$
where $n$ is the number of observations and $k$ is the number of included covariates, not counting the intercept.
(Hansen 2016)
2.3.2 Eta Squared, $\eta^2$
Eta squared or $\eta^2$ is a measure of effect size. The $\eta^2$ statistic measures the relative reduction of variance in the residual term. This is achieved by comparing the variance of the residuals between two models. The first model is called the full model and contains all covariates. The second model is called the reduced model and contains all of the covariates from the full model except the covariates being studied. The $\eta^2$ statistic then shows how much of the variance in the residual term of the reduced model is accounted for by the covariates that were removed from the full model. The $\eta^2$ statistic is calculated as follows
$$ \eta^2 = \frac{|\hat e^*|^2 - |\hat e|^2}{|\hat e^*|^2} \tag{12} $$
where $\hat e^*$ is the residual term of the reduced model and $\hat e$ is the residual of the full model. (Lang 2014)
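A minimal sketch of equation (12) in Python follows. The residual vectors are made up for illustration and the function names are hypothetical; in practice the residuals would come from fitting the reduced and the full model to the same data.

```python
def sum_of_squares(residuals):
    return sum(r * r for r in residuals)


def eta_squared(residuals_reduced, residuals_full):
    """Relative reduction in residual sum of squares, following equation (12)."""
    rss_reduced = sum_of_squares(residuals_reduced)
    rss_full = sum_of_squares(residuals_full)
    return (rss_reduced - rss_full) / rss_reduced


# Made-up residuals: |e*|^2 = 10 for the reduced model, |e|^2 = 4 for the full model.
e_reduced = [2.0, -2.0, 1.0, -1.0]
e_full = [1.0, -1.0, 1.0, -1.0]

eta2 = eta_squared(e_reduced, e_full)
print(eta2)
```

Here the studied covariates account for 60 % of the residual variation of the reduced model.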
2.3.3 F-test
The F-test makes it possible to test whether a number $r$ of the $\beta$-coefficients should be excluded from the model. The test is performed under the null hypothesis that these $r$ coefficients are equal to zero. The test statistic for the F-test is given by
$$ F = \frac{R^2}{1 - R^2} \cdot \frac{n - k - 1}{r} \tag{13} $$
where $n$ is the number of observations and $k$ is the number of covariates. (Lang 2015)
2.3.4 P-value
The p-value is often used in hypothesis testing. It states the probability of obtaining a result at least as extreme as the observed one, given that the null hypothesis is true. Hence, a low p-value indicates that the null hypothesis should be rejected. The p-value is calculated as
$$ \text{P-value} = P(F(r, n - k - 1) > F) \tag{14} $$
where $F(r, n - k - 1)$ is the F-distribution, $r$ is the number of covariates tested under the null hypothesis, $n$ is the number of observations, $k$ is the total number of covariates and $F$ is the F-statistic. (Lang 2015)
2.3.5 AIC
The Akaike Information Criterion (AIC) is a model evaluation criterion. Sometimes it is difficult to decide whether an explanatory variable is appropriate to include in the model. In this case the AIC indicates which covariates to include. The AIC is formulated as
$$ AIC = n \ln(|\hat e|^2) + 2k \tag{15} $$
where $n$ is the number of observations and $k$ is the number of covariates. The model closest to the estimated “true model” is the one that minimizes AIC, since minimizing AIC is the same as minimizing the estimated information loss.
(Lang 2015)
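As an illustration of how equation (15) is used to compare two models, consider the sketch below. The residual sums of squares, the sample size and the covariate counts are hypothetical values, and the function name `aic` is introduced here only for the example.

```python
import math


def aic(rss, n, k):
    """AIC = n ln(|e|^2) + 2k, as in equation (15)."""
    return n * math.log(rss) + 2 * k


# Hypothetical values: dropping one covariate raises the RSS only slightly.
n = 50
aic_full = aic(rss=4.0, n=n, k=3)
aic_reduced = aic(rss=4.1, n=n, k=2)

delta = aic_reduced - aic_full
print(round(delta, 3))
```

In this made-up case the small increase in the residual sum of squares is outweighed by the saved penalty term, so the model with fewer covariates has the lower AIC.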
3 Methodology
This section explains which types of companies were included in the study, how the variables were defined, how the data was collected and finally how the initial model was defined and reduced. All calculations were performed in R.
3.1 Demarcation
The study was limited to dietary supplement retailing companies registered on the Swedish market between 2011 and 2015. In order to be included in the study the following requirements had to be met:
1. The company had to be registered at the Swedish Companies Registration Office as a Limited or Private Company. This requirement was necessary since these types of companies are obligated to report their financial data, and hence the data is available through annual reports (Swedish Companies Registration Office 2016).
2. The company had to offer its assortment of products online, meaning that customers had to be able to place an order on its website. This requirement was crucial since most of the differences in strategies and provided services were centered around online retailing.
3. The company's main offering had to be dietary supplements related to fitness or performance enhancement. These types of supplements include protein powders, pre-workouts, creatine and branched-chain amino acids, to name a few. As a result, other types of stores that offered dietary supplements as an addition to their main assortment, such as pharmacies and general sports equipment stores, were excluded.
4. The company had to offer business to consumer sales. This requirement was established since the investigated strategies were directed towards the business to consumer market.
In total 19 different companies were included in the study, all of which fulfilled the requirements stated above. The included companies are listed below, see Table 1.
Company                 Observed Year(s)
Atletbutiken            2015
B5 Sports               2013, 2014
BMR Sports Nutrition    2014
Bodypower               2012, 2013, 2014
Fitnessbutiken          2012, 2013, 2014
Fitnessguru             2012, 2013, 2014
Gymgrossisten           2012, 2013, 2014
Gymvaruhuset            2012, 2013, 2014, 2015
Lyftarshopen            2013, 2014, 2015
MMSports                2012, 2014
Muskelbygget            2012, 2013, 2014
Nutramino               2012, 2013, 2014
Proteinbolaget          2013, 2014, 2015
Proteinbutiken          2012, 2013, 2014
Proteinfabrikken        2012, 2013, 2014
Sportkost               2014
Supplementstore         2013, 2014, 2015
Svenskt Kosttillskott   2012, 2013, 2014
Tillskottsbolaget       2013, 2014
Table 1: Company Declaration
3.2 Defining Variables
3.2.1 Dependent Variable
The dependent variable in the regression consisted of the operating margin for the observed companies. This key figure was defined as
$$ \text{Operating Margin} = \frac{\text{EBIT}}{\text{Net Sales}} \tag{16} $$
3.2.2 Explanatory Variables
This section contains an explanation of the included covariates in the initial model. The covariates were selected on the basis that they may have an impact on profitability.
Employees
The Employees covariate stated the number of employees the considered company had during the observed financial year.
Solidity
This covariate represented the solidity of the observed company at the end of each financial year. Solidity was defined as
$$ \text{Solidity} = \frac{\text{Adjusted Equity}}{\text{Total Assets}} = \frac{\text{Equity} + (\text{Tax Rate} \times \text{Untaxed Reserves})}{\text{Total Assets}} \tag{17} $$
Cash Position
The Cash Position variable was the company’s cash position at the end of each financial year.
Average Salary
This covariate represented the average salary of an employee. It was defined as
$$ \text{Average Salary} = \frac{\text{Total Salary of Employees}}{\text{Number of Employees}} \tag{18} $$
Years on the Market
Years on the Market stated how many years the considered company had been registered at the Swedish Companies Registration Office.
Physical Store
Physical Store was a dummy variable that indicated one if the company possessed at least one physical store and zero otherwise.
Only Own Brand
Only Own Brand was a dummy variable that indicated one if the company only sold its own brand, and zero if they offered several brands.
Extensive Assortment
Extensive Assortment was a dummy variable that indicated one if the company offered an extensive assortment and zero otherwise. An extensive assortment means that the company sold more products than dietary supplements, such as training clothes and workout accessories.
Free Returns
This covariate was a dummy variable that indicated one if the company offered free returns and zero if not.
Free Shipping
This covariate was a dummy variable that indicated one if the company offered free shipping and zero if not.
Student Discount
This covariate was a dummy variable that indicated one if the company offered a student discount and zero if not.
Bonus Ladder
Bonus Ladder was a dummy variable that indicated one if the company had special offers when certain price levels were reached. The special offers could consist of free or discounted products. If the company did not use a bonus ladder, the variable indicated zero.
Chat
Chat was a dummy variable that indicated one if the company provided customer service through an online chat service, and zero if not.
See Sections 5.1.3 and 5.1.4 for a discussion regarding the reasoning behind the choice of dependent variable and covariates.
3.3 Data
The data used throughout this study was mainly obtained online through different sites that publish annual reports and other key figures. The rest of the data was obtained through communication with company representatives. In the few cases where the company would not provide the data, the qualitative data was collected through the company's social media feeds and from historical versions of their websites using the Wayback Machine service by the Internet Archive.
3.3.1 Financial Data
Since the majority of the companies were private, their annual reports were not available free of charge. All financial information was therefore obtained through third-party services that publish companies' financial reports: allabolag.se and merinfo.se.
3.3.2 Qualitative Data
Qualitative data was defined as data regarding services, offers and assortment provided by the company. This included: free shipping, free returns, bonus products or bonus ladders, student discounts, a wide assortment of brands, a wide assortment of additional products besides dietary supplements, physical stores and whether they had a live chat service on their website. These types of data were mainly obtained through contact with company representatives. In the cases where the company failed to provide the data, it was obtained through the company's social media feeds and through previous versions of their websites using the Wayback Machine provided by archive.org/web.
Some data proved difficult to find and therefore if the company did not respond the following assumptions were made.
Student Discounts
When the company did not provide the information regarding their historical student discount policies, the following steps were taken.
1. The company’s social media feed for the particular year in question was investigated.
2. The company’s web page for that particular year was investigated.
3. The historical versions of the web pages mecenat.se and studentkortet.se were investigated to see whether they had any stated agreements with the company in question.
If none of these three sources suggested that the company offered any student discounts, it was assumed that they did not.
Bonus Ladders
Much like the student discounts, these types of data were difficult to obtain if the company did not provide them. This was because most companies did not advertise these types of offers; they only appeared once a customer was ready to finish their order. When the company did not provide the data, the following steps were taken.
1. The company’s social media feed for the particular year in question was investigated.
2. The company’s web page for that particular year was investigated.
If no indication of these types of offers was found, it was assumed that the company did not offer any bonus ladders.
3.4 Initial Model
The initial model consisted of all the identified explanatory variables. The model was then reduced using several methods for covariate testing, such as $\Delta AIC$, p-value and $\eta^2$. The initial model was formulated as
$$
\begin{aligned}
y_i = {} & \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + \beta_5 x_{i,5} + \beta_6 x_{i,6} + \beta_7 x_{i,7} \\
& + \beta_8 x_{i,8} + \beta_9 x_{i,9} + \beta_{10} x_{i,10} + \beta_{11} x_{i,11} + \beta_{12} x_{i,12} + \beta_{13} x_{i,13} + e_i
\end{aligned}
\tag{19}
$$
where $i$ denoted the observational index and the variables were measured as stated in Table 2.
Variable      Name                   Unit
$y_i$         Operating Margin       Percent, %
$x_{i,1}$     Employees              Quantity
$x_{i,2}$     Solidity               Percent, %
$x_{i,3}$     Cash Position          Percent, %
$x_{i,4}$     Average Salary         kkr per year
$x_{i,5}$     Years on the Market    Years
$x_{i,6}$     Physical Store         Dummy, 0 or 1
$x_{i,7}$     Only Own Brand         Dummy, 0 or 1
$x_{i,8}$     Extensive Assortment   Dummy, 0 or 1
$x_{i,9}$     Free Returns           Dummy, 0 or 1
$x_{i,10}$    Free Shipping          Dummy, 0 or 1
$x_{i,11}$    Student Discount       Dummy, 0 or 1
$x_{i,12}$    Bonus Ladder           Dummy, 0 or 1
$x_{i,13}$    Chat                   Dummy, 0 or 1
Table 2: Defined Variables
3.4.1 VIF-test
After the initial model was formulated, a VIF-test was performed to check for multicollinearity between the explanatory variables. The calculations were performed for each variable according to equation (7). The results are presented in Table 3.
Variable               VIF
Employees              3.047
Solidity               1.757
Cash Position          2.601
Average Salary         2.639
Years on the Market    2.203
Physical Store         2.037
Only Own Brand         3.234
Extensive Assortment   2.396
Free Returns           2.692
Free Shipping          1.407
Student Discount       2.390
Bonus Ladder           3.820
Chat                   2.806
Table 3: VIF-values of the Initial Model
Since all of the explanatory variables yielded a VIF value well below ten, none of the covariates were eliminated due to multicollinearity.
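The VIF values in Table 3 can be translated back into the $R^2$ of each auxiliary regression by inverting equation (7), which gives a feel for how much of each covariate is explained by the others. The sketch below only rearranges the reported numbers; the variable names are illustrative.

```python
# VIF values copied from Table 3.
vif = {
    "Employees": 3.047, "Solidity": 1.757, "Cash Position": 2.601,
    "Average Salary": 2.639, "Years on the Market": 2.203,
    "Physical Store": 2.037, "Only Own Brand": 3.234,
    "Extensive Assortment": 2.396, "Free Returns": 2.692,
    "Free Shipping": 1.407, "Student Discount": 2.390,
    "Bonus Ladder": 3.820, "Chat": 2.806,
}

# Invert equation (7): R^2 = 1 - 1 / VIF.
implied_r2 = {name: 1 - 1 / value for name, value in vif.items()}

worst = max(vif, key=vif.get)  # covariate with the largest VIF
print(worst, round(implied_r2[worst], 3))
```

Even for the covariate with the largest VIF, the implied auxiliary $R^2$ stays well below the 0.9 that corresponds to a VIF of ten.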
3.4.2 Estimation of the Initial Model
The regression of the initial model rendered the results presented in Table 4.
Covariate             β-estimate  Standard Error  Eta Squared  P-value  Lower 2.5%  Upper 97.5%
Intercept                0.00004         0.06210      0.00000   0.9995    -0.12591      0.12599
Employees               -0.00027         0.00069      0.00394   0.6987    -0.00166      0.00112
Solidity                 0.08351         0.03253      0.20537   0.0146     0.01753      0.14949
Cash Position           -0.00683         0.02004      0.00662   0.7351    -0.04748      0.03381
Average Salary           0.00018         0.00020      0.05811   0.3669    -0.00022      0.00058
Years on the Market      0.00149         0.00213      0.00719   0.4905    -0.00284      0.00581
Physical Store           0.00730         0.03436      0.00106   0.8330    -0.06239      0.07699
Only Own Brand          -0.09041         0.06026      0.07488   0.1422    -0.21262      0.03179
Extensive Assortment    -0.00207         0.04286      0.00003   0.9617    -0.08899      0.08485
Free Returns             0.04643         0.06179      0.02756   0.4573    -0.07889      0.17175
Free Shipping           -0.04894         0.05418      0.02618   0.3724    -0.15882      0.06095
Student Discount        -0.05275         0.03563      0.04743   0.1475    -0.12501      0.01952
Bonus Ladder             0.01070         0.06290      0.00118   0.8659    -0.11687      0.13827
Chat                     0.07189         0.06741      0.03302   0.2933    -0.06482      0.20859
Table 4: Results of the Initial Model
Initial Model Values:
• Multiple R-squared: R² = 0.6065
• Adjusted R-squared: R̄² = 0.4645
• F-statistic for the hypothesis that all covariate coefficients are equal to zero: 4.269 on 13 and 36 degrees of freedom
• P-value for the hypothesis that all covariate coefficients are equal to zero: 0.000272
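The summary values above are linked by standard identities, which can be used as a consistency check. The sketch below assumes the usual formulas for adjusted R² and the overall F-statistic in a linear model with an intercept, and infers the number of observations from the reported degrees of freedom; the function names are illustrative.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p covariates plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def overall_f_statistic(r2: float, n: int, p: int) -> float:
    """F-statistic for the hypothesis that all p covariate coefficients are zero."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


p = 13           # covariates in the initial model
n = 36 + p + 1   # observations, inferred from the 36 residual degrees of freedom
r2 = 0.6065      # multiple R-squared as reported (rounded)

print(f"n = {n}")
print(f"adjusted R^2 = {adjusted_r_squared(r2, n, p):.4f}")
print(f"F-statistic  = {overall_f_statistic(r2, n, p):.3f}")
```

Because the reported R² is rounded to four decimals, the recomputed values can differ from the reported 0.4645 and 4.269 in the last digit.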
3.4.3 Reduction of the Initial Model
After estimating the initial model, several statistics were taken into account in order to establish whether the model would benefit from a reduction of covariates. These statistics included η², ∆AIC, adjusted R² and p-value.
Eta Squared
The η² statistic was used to evaluate how much variance each covariate accounted for in the model. A low η² indicated that the covariate should be removed from the model, since it did not account for a meaningful share of the variance.
∆AIC
This statistic was defined as
$$
\Delta\mathrm{AIC} = \mathrm{AIC}(\text{Reduced Model}) - \mathrm{AIC}(\text{Full Model}) \tag{20}
$$

where the Reduced Model consisted of all but one of the covariates from the Full Model. This statistic was calculated for each covariate in the model, and if the ∆AIC statistic was negative that would have suggested that the Reduced Model was favored over the Full Model.
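As a short worked derivation, and assuming the usual Gaussian-likelihood form of the AIC for a linear model with n observations, k estimated parameters and residual sum of squares RSS (the symbols n, k and RSS are introduced here for illustration), removing a single covariate gives:

```latex
\begin{align*}
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}, \\
\Delta\mathrm{AIC}
  &= n \ln\!\left(\frac{\mathrm{RSS}_{\text{reduced}}}{\mathrm{RSS}_{\text{full}}}\right) + 2(k-1) - 2k
   = n \ln\!\left(\frac{\mathrm{RSS}_{\text{reduced}}}{\mathrm{RSS}_{\text{full}}}\right) - 2 \;\ge\; -2 .
\end{align*}
```

The inequality holds because dropping a covariate cannot lower the residual sum of squares. Under this assumption, a covariate that explains almost nothing yields a ∆AIC close to −2, and ∆AIC becomes positive only when the increase in RSS outweighs the penalty saved by estimating one parameter fewer.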
Adjusted R2
The adjusted R² was calculated for every new model after a covariate was removed. If the value of R̄² was higher for the reduced model than for the former model, the conclusion was drawn that the model was improved by the reduction.
P-value
The estimated p-value for each covariate was calculated. A large p-value meant that the null hypothesis of a zero coefficient could not be rejected. Hence covariates with large p-values were excluded from the model.
All of these statistics were evaluated together for each covariate. If they all coincided, meaning that the covariate had a low η², a negative ∆AIC and a high p-value, and that the model produced a higher adjusted R² without the covariate, the covariate was eliminated from the model. The procedure was repeated until the model was no longer reducible. The excluded covariates are shown in Table 5.
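The combined rule described above can be sketched as a single check per covariate. This is only an illustration of the stated logic: the text does not specify numerical cut-offs for a "high" p-value or a "low" η², so the thresholds `p_threshold` and `eta_threshold` below are hypothetical, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class CovariateStats:
    name: str
    p_value: float
    eta_squared: float
    delta_aic: float
    adj_r2_without: float  # adjusted R^2 of the model refitted without this covariate


def should_eliminate(stats: CovariateStats, adj_r2_current: float,
                     p_threshold: float = 0.05,       # hypothetical cut-off
                     eta_threshold: float = 0.01) -> bool:  # hypothetical cut-off
    """Return True only if all four criteria point towards removal."""
    return (
        stats.p_value > p_threshold
        and stats.eta_squared < eta_threshold
        and stats.delta_aic < 0.0
        and stats.adj_r2_without > adj_r2_current
    )


# First elimination step: statistics for Extensive Assortment from Table 5,
# compared with the adjusted R^2 of the initial model (0.4645).
candidate = CovariateStats("Extensive Assortment", 0.96170, 0.00003, -1.99858, 0.4789)
print(candidate.name, should_eliminate(candidate, adj_r2_current=0.4645))
```

In a full procedure this check would be repeated after each refit, with `adj_r2_current` updated to the adjusted R² of the most recently accepted model.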
| Covariate | P-value | η² | ∆AIC | Adjusted R² |
|---|---|---|---|---|
| Extensive Assortment | 0.96170 | 0.00003 | -1.99858 | 0.4789 |
| Physical Store | 0.82379 | 0.00117 | -1.94144 | 0.4920 |
| Bonus Ladder | 0.77724 | 0.00311 | -1.84426 | 0.5035 |
| Employees | 0.71596 | 0.00392 | -1.80358 | 0.5140 |
| Cash Position | 0.74518 | 0.00565 | -1.71684 | 0.5232 |
| Years on the Market | 0.70105 | 0.00346 | -1.82652 | 0.5329 |
| Free Shipping | 0.35016 | 0.02536 | -0.71576 | 0.5319 |

Table 5: Eliminated Covariates
4 Results
4.1 The Final Model
| Covariate | β-estimate | Standard Error | Eta Squared | P-value | Lower 2.5% | Upper 97.5% |
|---|---|---|---|---|---|---|
| Intercept | -0.03198 | 0.02953 | 0.02953 | 0.2849 | -0.09154 | 0.02758 |
| Solidity | 0.08305 | 0.02262 | 0.26685 | 0.0007 | 0.03742 | 0.12868 |
| Average Salary | 0.00020 | 0.00008 | 0.13962 | 0.0203 | 0.00003 | 0.00036 |
| Only Own Brand | -0.10356 | 0.03967 | 0.19182 | 0.0124 | -0.18357 | -0.02356 |
| Free Returns | 0.04664 | 0.02216 | 0.05764 | 0.0412 | 0.00196 | 0.09133 |
| Student Discount | -0.06705 | 0.22469 | 0.11728 | 0.0047 | -0.11237 | -0.02174 |
| Chat | 0.07230 | 0.0209 | 0.06890 | 0.0012 | 0.03018 | 0.11441 |

Table 6: Results of the Final Model

Final Model Values:
• Multiple R-squared: R² = 0.5892
• Adjusted R-squared: R̄² = 0.5319
• F-statistic for the hypothesis that all covariate coefficients are equal to zero: 10.28 on 6 and 43 degrees of freedom
• P-value for the hypothesis that all covariate coefficients are equal to zero: 4.811e-07
The final regression rendered the following equation, with β-estimates obtained from Table 6:
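$$
\begin{aligned}
\text{Operating Margin} = {} & -0.03198 + 0.08305 \times [\text{Solidity}] + 0.00020 \times [\text{Average Salary}] \\
& - 0.10356 \times [\text{Only Own Brand}] + 0.04664 \times [\text{Free Returns}] - 0.06705 \times [\text{Student Discount}] \\
& + 0.07230 \times [\text{Chat}]
\end{aligned}
\tag{21}
$$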
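Equation (21) can be evaluated directly for a given company profile. The sketch below uses the β-estimates from Table 6. The example company is hypothetical and not taken from the data set. The sketch also assumes that Solidity and Operating Margin enter the model as fractions (for example 0.40 for 40%), which the magnitude of the coefficients suggests but the text does not state explicitly.

```python
# Beta-estimates of the final model, copied from Table 6.
FINAL_MODEL = {
    "Solidity": 0.08305,
    "Average Salary": 0.00020,   # salary measured in kkr per year
    "Only Own Brand": -0.10356,  # dummy, 0 or 1
    "Free Returns": 0.04664,     # dummy, 0 or 1
    "Student Discount": -0.06705,  # dummy, 0 or 1
    "Chat": 0.07230,             # dummy, 0 or 1
}
INTERCEPT = -0.03198


def predict_operating_margin(covariates: dict) -> float:
    """Evaluate equation (21); every covariate of the final model must be supplied."""
    missing = set(FINAL_MODEL) - set(covariates)
    if missing:
        raise KeyError(f"missing covariates: {sorted(missing)}")
    return INTERCEPT + sum(beta * covariates[name] for name, beta in FINAL_MODEL.items())


# Hypothetical retailer, for illustration only.
example_company = {
    "Solidity": 0.40,         # assumed to be a fraction
    "Average Salary": 300.0,  # kkr per year
    "Only Own Brand": 0,
    "Free Returns": 1,
    "Student Discount": 0,
    "Chat": 1,
}
print(f"Predicted operating margin: {predict_operating_margin(example_company):.4f}")
```

For this hypothetical profile the fitted equation gives an operating margin of about 0.18. Such point predictions should be read together with the confidence intervals in Table 6, since the sample is small.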
4.2 Model Validation
This section will explain and compare the results of the initial model and the final model. The issue of homoskedasticity versus heteroskedasticity will also be addressed.
4.2.1 VIF
In order to control for multicollinearity, a new VIF-test was conducted. The VIF-test indicated no problems with multicollinearity, and consequently no covariates had to be excluded from the final model. See Table 7 for the VIF results.
| Variable | VIF |
|---|---|
| Solidity | 1.182 |
| Average Salary | 1.132 |
| Only Own Brand | 1.386 |
| Free Returns | 1.206 |
| Student Discount | 1.386 |
| Chat | 1.254 |

Table 7: VIF-values of the Final Model
4.2.2 Adjusted R2
The initial model yielded an adjusted R² statistic of 0.4645, while the reduced final model yielded an adjusted R² statistic of 0.5319. The larger value for the final model implied that, after adjusting for the number of covariates, a larger share of the variance in the dependent variable was captured by the final model. Thus the conclusion was drawn that the final model had more explanatory power than the initial model.
4.2.3 F-Statistics and P-values
The initial model yielded an F-statistic of 4.269 on 13 and 36 degrees of freedom, while the final model yielded an F-statistic of 10.28 on 6 and 43 degrees of freedom. The higher F-statistic obtained from the final model meant that the hypothesis that the coefficients of all included covariates were equal to zero was less likely to be true. Thus the conclusion was drawn that the final model was closer to the true model than the initial model, and hence was a better representation of the relationship between the dependent variable and the explanatory variables.
The p-values of all covariates in the final model were below 0.05, which meant that the null hypothesis of a zero coefficient could be rejected at the 5% significance level for each of them. It was hence very likely that the covariates had an effect on the dependent variable. A comparison between the p-values of the entire initial model and the entire final model led to the conclusion that the null hypothesis was less likely to hold for the final model, i.e. the final model was an improvement on the initial model.
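The link between the estimates, standard errors and confidence limits in Table 6 can be illustrated with a small sketch. It assumes the usual t-based interval for a regression coefficient; the critical value 2.0167 is the approximate 97.5% quantile of the t-distribution with 43 degrees of freedom, entered here as a constant taken from a standard table rather than computed. The function names are illustrative.

```python
T_CRITICAL_43_DF = 2.0167  # approx. 97.5% quantile of t with 43 degrees of freedom


def t_statistic(estimate: float, std_error: float) -> float:
    """Test statistic for the hypothesis that the coefficient is zero."""
    return estimate / std_error


def confidence_interval(estimate: float, std_error: float,
                        t_critical: float = T_CRITICAL_43_DF) -> tuple:
    """Two-sided 95% confidence interval: estimate +/- t_critical * std_error."""
    half_width = t_critical * std_error
    return estimate - half_width, estimate + half_width


# Solidity in the final model (Table 6): estimate 0.08305, standard error 0.02262.
lower, upper = confidence_interval(0.08305, 0.02262)
print(f"t = {t_statistic(0.08305, 0.02262):.3f}")
print(f"95% CI = ({lower:.5f}, {upper:.5f})")
```

For Solidity the recomputed limits agree with the reported interval (0.03742, 0.12868) up to rounding of the inputs. An interval that excludes zero corresponds to a p-value below 0.05.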
4.2.4 Residuals
In Figure 4 below, the residual and density plots of the final model are shown. The residuals are randomly spread, which would suggest homoskedasticity. The density plot does not look exactly normally distributed, but it does resemble a normal density with expectation equal to zero, which would also imply homoskedasticity. However, due to the discrepancies and the slight offset, it is more likely that the data are heteroskedastic. The residual plot was generated using the function plot(resid(regression)) and the density plot was generated using the function plot(density(resid(regression))).
Figure 4: Residual and Density plot
In Figure 5 below, the normal quantile-quantile plot approximately follows a straight line. This implies that the residuals come from a normal distribution, and consequently suggests homoskedasticity. The normal quantile-quantile plot was generated using the functions qqnorm(resid(regression)) and qqline(resid(regression)).
Figure 5: Normal Q-Q Plot
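To show what a normal quantile-quantile plot compares, the following sketch computes the plotted point pairs using only the Python standard library. It is not the R code used for Figure 5: it assumes the simple plotting positions (i − 0.5)/n, whereas R's qqnorm uses a slightly different convention for small samples, and the residual values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Pair each ordered residual with the matching standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # enumerate starts at 0, so (i + 0.5) / n equals the position (rank - 0.5) / n
    return [(std_normal.inv_cdf((i + 0.5) / n), r) for i, r in enumerate(ordered)]


# Hypothetical residuals, for illustration only (not the thesis data).
example_residuals = [-0.04, 0.01, 0.03, -0.01, 0.02]
for theoretical, observed in normal_qq_points(example_residuals):
    print(f"{theoretical:+.3f}  {observed:+.3f}")
```

If the residuals are approximately normal, the pairs fall close to a straight line, which is what the line drawn by qqline helps to judge visually.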
5 Discussion
This section will discuss the potential problems with the study’s methodology.
A motivation for the choice of the initial covariates will be given as well as an interpretation and some criticisms towards the final model.
5.1 Method
The following section will discuss regression as a tool for evaluating strategy, the demarcation, data collecting, variable definitions and finally the reduction of the initial regression model.
5.1.1 Using Regression to Evaluate Strategy
Regression might appear to be an unconventional way of evaluating business strategy. However, the aim of the study was to identify factors and decisions that had a statistically significant impact on profitability, and therefore regression was a natural choice of method. Regression has also been used in earlier managerial studies to evaluate business strategies within other areas. For example, a study on how entry barriers affect industry profitability was conducted using regression (Chappel, Marks & Park 1983), as well as a study on how racial diversity interacted with business strategy to determine firm performance (Richard 2000). Similarly, a study on gym profitability has been carried out using linear regression (Axelsson & Källsbo 2015). The aim of the project, combined with the knowledge of previous studies with similar methodology, suggests that regression is a valid approach to our problem formulation.
5.1.2 Demarcation
The main focus of the study was to investigate how different strategies regarding dietary supplement retailing affected business to consumer sales. It is therefore important to mention that some of the included companies also offered business to business sales. This might have affected the results of the study, since the model did not account for how much of each company's operations was represented by business to consumer sales. Another potential issue in the demarcation requirements was that the company had to be registered in Sweden, which might have led to important market competitors being excluded from the study. It is also important to note that some of the included companies that were registered in Sweden had counterparts, or might have conducted sales, in other countries where their strategy might have differed. This, again, could have affected the accuracy of the results.
It is also worth mentioning that some companies that fulfilled the requirements had to be excluded. This was mainly due to a lack of financial data, since some of them were registered in 2015 and therefore had not yet been operating for a full year. Some interesting market competitors also had to be excluded because they were sole proprietorships, or because they were part of larger companies with several different businesses and lacked individual financial reports.