Determining Important Factors for Profitability in Dietary Supplement Retailing by Multiple Linear Regression

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2016

AUGUST DÉNES, MARTIN LINDBLOM

KTH ROYAL INSTITUTE OF TECHNOLOGY


Degree Project in Applied Mathematics and Industrial Economics (15 credits)

Degree Program in Industrial Engineering and Management (300 credits)

Royal Institute of Technology, 2016

Supervisors at KTH: Fredrik Armerin, Jonatan Freilich

Examiner: Henrik Hult

TRITA-MAT-K 2016:10 ISRN-KTH/MAT/K--16/10--SE

Royal Institute of Technology SCI School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci


May 24, 2016

Abstract

This thesis in applied statistics and industrial economics examined which factors and strategies had a statistically significant impact on profitability within the business-to-consumer dietary supplement market. The data consisted of annual reports from 2011 to 2015, together with other strategic information, for 19 dietary supplement retailers on the Swedish market. Multiple linear regression was used to establish which factors had a significant impact on profitability. The result was an identified linear relationship between Operating Margin and the covariates Solidity, Average Salary, Only Own Brand, Free Returns, Student Discounts and Chat. A market analysis was then performed using Porter's five forces and a PEST analysis.

The analysis concluded that the market is attractive but there are a few uncertainties surrounding the future development of the dietary supplement retailing market.

Summary

This bachelor's thesis in applied statistics and industrial economics examined which factors and strategies had a statistically significant impact on profitability in dietary supplement retailing on the Swedish consumer market. The data underlying the report came from a number of annual reports from 2011 to 2015 and from other strategic information. The study included 19 companies on the Swedish dietary supplement retailing market. Linear regression was applied to identify the influencing factors. The results showed a linear relationship between Operating Margin and the covariates Solidity, Average Salary, Only Own Brand, Free Returns, Student Discount and Chat. A market analysis was then carried out using Porter's five forces model and a PEST analysis. The analysis concluded that the market is attractive, but that there are some uncertainties regarding the future development of the dietary supplement retailing market.

Contents

1 Introduction
  1.1 Background
  1.2 Aim
  1.3 Problem Formulation

2 Theory
  2.1 The Homoskedastic Linear Regression Model
    2.1.1 Assumptions
    2.1.2 Covariates
    2.1.3 The Ordinary Least Square Estimation
  2.2 Potential Modelling Errors
    2.2.1 Multicollinearity
    2.2.2 Micronumerosity
    2.2.3 Heteroskedasticity
    2.2.4 Endogeneity
  2.3 Model Evaluation
    2.3.1 R² and Adjusted R²
    2.3.2 Eta Squared, η²
    2.3.3 F-test
    2.3.4 P-value
    2.3.5 AIC

3 Methodology
  3.1 Demarcation
  3.2 Defining Variables
    3.2.1 Dependent Variable
    3.2.2 Explanatory Variables
  3.3 Data
    3.3.1 Financial Data
    3.3.2 Qualitative Data
  3.4 Initial Model
    3.4.1 VIF-test
    3.4.2 Estimation of the Initial Model
    3.4.3 Reduction of the Initial Model

4 Results
  4.1 The Final Model
  4.2 Model Validation
    4.2.1 VIF
    4.2.2 Adjusted R²
    4.2.3 F-Statistics and P-values
    4.2.4 Residuals

5 Discussion
  5.1 Method
    5.1.1 Using Regression to Evaluate Strategy
    5.1.2 Demarcation
    5.1.3 Dependent Variable
    5.1.4 Explanatory Variables
    5.1.5 Data
    5.1.6 The Reduction
  5.2 Results
    5.2.1 The Final Model
    5.2.2 Model Validation

6 Market Analysis
  6.1 Choosing Market Analysis Models
  6.2 Theoretical Framework
    6.2.1 Porter's Five Forces
    6.2.2 Criticism Towards Porters' Five Force Model
    6.2.3 PEST Analysis
  6.3 Application of Porter's Five Forces
    6.3.1 Threat of New Entrants
    6.3.2 Power of Suppliers
    6.3.3 Power of Buyers
    6.3.4 Threat of Available Substitutes
    6.3.5 Competitive Rivalry
    6.3.6 Summary of Porter's Five Forces
  6.4 PEST Analysis of the Dietary Supplement Market
    6.4.1 Political Factors
    6.4.2 Economical Factors
    6.4.3 Social Factors
    6.4.4 Technological Factors
    6.4.5 Summary of PEST Analysis

7 Conclusion

8 References

List of Tables

1 Company Declaration
2 Defined Variables
3 VIF-values of the Initial Model
4 Results of the Initial Model
5 Eliminated Covariates
6 Results of the Final Model
7 VIF-values of the Final Model

List of Figures

1 Homo- and Heteroskedastic Residual Plots
2 Homo- and Heteroskedastic Density Plots
3 Homo- and Heteroskedastic Normal Q-Q Plots
4 Residual and Density plot
5 Normal Q-Q Plot
6 Illustration of Porter's Five Forces

1 Introduction

1.1 Background

The supplement retailing industry has been, and still is, considered a rather stigmatized market. One reason may be that certain health researchers deem dietary supplements unnecessary (Bratt 2014). Supplements have also been associated with bodybuilding, an industry known for its use of anabolic steroids (Chappell 2015). Furthermore, a study conducted by the Stockholm Environment Administration in 2012 concluded that 40 % of the 122 studied companies carried one or more products containing illegal or potentially harmful substances. In total, 61 products received a sales prohibition and 15 substances were deemed illegal or classified as pharmaceuticals (Johansson & Jonsson 2013).

Despite this stigma, the supplement industry in Sweden has grown steadily and widened its customer base in recent years, and the supplement market has now become a multibillion SEK industry. The growth is partially attributed to the current fitness trend and the widespread use of social media as a marketing platform (TT 2014). This has brought several new competitors to the market, so it has probably become more important for a company to differentiate its brand in order to be profitable. The differentiation is evident when comparing competitors: they differ in their assortment and discounts, and in whether they sell online, through physical stores, or both. It is most likely also important to compare other managerial aspects of the business, such as the amount of customer service provided or the company's relation to its suppliers.

The supplement industry is not a new phenomenon, yet it has never been as widely successful as it is now. This rapid ascent makes it an interesting market to analyze.

1.2 Aim

The aim of the project is to establish how different strategies or factors within the business to consumer dietary supplement market impact profitability. The project will also present an interpretation of why these factors have an effect.

This is performed with the intent of providing a structural interpretation, meaning that the aim is to determine how different factors and strategies affect a company's profitability (positively or negatively), rather than to determine to what extent these factors contribute. The project also aims to provide sufficient understanding of the current market conditions to enable companies to formulate a business strategy that takes these factors into account.

The project is mostly relevant to potential market entrants and current competitors. It might also be applicable, to some extent, to similar retailing markets.


1.3 Problem Formulation

In order to achieve the aim, the project attempts to answer the following problem formulations:

• Which factors have a statistically significant impact on profitability in dietary supplement retailing?

– How could these factors be interpreted?

• Is the market attractive in terms of long-run profitability?

• Is the market sensitive to external factors?


2 Theory

2.1 The Homoskedastic Linear Regression Model

The multiple linear regression model is specified as

$$y_i = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \qquad i = 1, \dots, n. \tag{1}$$

In this specification $y_i$ is an observation of a random dependent variable $y$. The value of $y$ depends on the explanatory variables, or covariates, $x_{\cdot j}$ and on the additional random variable $e_i$, the residual or error term. The parameters $\beta_j$ are called regression coefficients. They are unknown and are to be estimated from observational data. The model (1) consists of $n$ observations and $k$ covariates, with $x_{i0} = 1$ so that $\beta_0$ is the intercept. It can be written in matrix form as

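$$Y = X\beta + e \tag{2}$$

where

$$
Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1k} \\
1 & x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nk}
\end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}
$$

(Lang 2014)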

2.1.1 Assumptions

The regression model stated above rests on five basic assumptions regarding the way the observations are generated.

1. The first assumption is that the dependent variable y can be calculated as a linear combination of the covariates x_·j plus the residual e. The unknown coefficients of this function form the vector β in equation (2).

2. The second assumption is that the expected value of the residual is equal to zero

E[e_i] = 0

This means that the mean of the distribution from which the error e is drawn is zero.

3. The third assumption is that all the error terms have the same variance and that they are uncorrelated with one another. This can be expressed as

E[e²_i] = σ² and E[e_i e_j] = 0 for i ≠ j, where σ is unknown.

4. The fourth assumption is that the independent variables or covariates are deterministic, meaning that they are fixed in repeated samples.

5. The final assumption is that the number of observations is larger than the number of covariates and that there are no exact linear relationships between the covariates.

(Kennedy 2008)


2.1.2 Covariates

There are two types of covariates to consider when formulating a regression model. These are quantitative covariates and qualitative covariates.

A quantitative covariate takes a numeric value that quantifies how much of a property was recorded in a specific outcome. For a car model, for example, a quantitative covariate could be the horsepower that the car's engine can generate.

A qualitative covariate, or dummy variable, only takes the value one or zero to indicate whether a particular outcome has a certain quality. Considering a car again, a dummy variable could indicate whether the car was made in the U.S. If the car was made in the U.S., the dummy variable takes the value one. Otherwise it takes the value zero.

2.1.3 The Ordinary Least Square Estimation

The ordinary least squares estimate of the vector $\beta$ in (2) is the vector $\hat\beta$ that minimizes the sum of squared residuals $\hat e^t \hat e = |\hat e|^2$, where $\hat e$ and $\hat\beta$ are defined as

$$\hat e = Y - X\hat\beta, \qquad \hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}. \tag{3}$$

To find the vector $\hat\beta$ that minimizes the sum of squared residuals, the normal equations

$$X^t \hat e = 0 \tag{4}$$

are solved for $\hat\beta$. Inserting (3) into (4) gives

$$X^t (Y - X\hat\beta) = 0, \tag{5}$$

and solving for $\hat\beta$ gives

$$\hat\beta = (X^t X)^{-1} X^t Y. \tag{6}$$

(Lang 2014)
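As an illustration of equations (4) to (6), the following minimal Python sketch solves the normal equations for a small made-up data set. The calculations in this thesis were performed in R, so Python is used here only for illustration. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the toy data are assumptions and do not come from the study.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^t X is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(X, y):
    """Return beta_hat solving the normal equations X^t X beta = X^t y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data (hypothetical): an intercept column of ones and one covariate.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 2.9, 5.2, 6.8]

beta_hat = ols(X, y)
residuals = [yi - sum(b * x for b, x in zip(beta_hat, row)) for row, yi in zip(X, y)]

# The normal equations (4) require X^t e_hat = 0 up to rounding error.
normal_eq = [sum(row[j] * e for row, e in zip(X, residuals)) for j in range(len(X[0]))]
```

Solving the linear system directly, rather than forming the inverse in (6) explicitly, gives the same estimate when $X^t X$ is invertible and is the usual choice in numerical work.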

2.2 Potential Modelling Errors

2.2.1 Multicollinearity

Multicollinearity occurs when at least one of the covariates is strongly correlated with a linear combination of other covariates in the model. It causes the standard errors of one or more of the regression coefficients to be large, so the point estimates of these coefficients become imprecise. However, these standard errors decrease as the number of observations increases, which means that the problem is often due to a lack of data.

If at least two of the covariates in the model are perfectly linearly dependent, the X^tX matrix becomes singular and the OLS estimate consequently has no unique solution. This is called strict or perfect multicollinearity, and it makes OLS estimation impossible regardless of the number of observations. When the dependence is less obvious, multicollinearity can be detected with the Variance Inflation Factor (VIF).

VIF

The VIF is a method of detecting multicollinearity and is defined as

$$\mathrm{VIF} = (1 - R^2)^{-1} \tag{7}$$

The R² in the expression above is obtained for each covariate by regressing the investigated covariate, as the dependent variable, on the other covariates in the model. The R² is then calculated as shown in equation (9). A VIF value > 10 indicates that multicollinearity is a substantial problem in the model. (Lang 2015, Hansen 2016)
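The VIF procedure can be sketched in Python as follows: each covariate in turn is regressed on an intercept and the remaining covariates, and equation (7) is applied to the resulting R². The function names and the two small data sets are hypothetical and serve only as an illustration. The sketch does not handle perfect multicollinearity, where R² equals one and the VIF is undefined.

```python
def solve(a, b):
    """Solve the square system a z = b by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_fit(X, y):
    """Return the fitted values from an OLS regression of y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    return [sum(b * x for b, x in zip(beta, r)) for r in X]


def r_squared(y, fitted):
    """R^2 = 1 - RSS / TSS, as in equation (9)."""
    mean = sum(y) / len(y)
    rss = sum((a - f) ** 2 for a, f in zip(y, fitted))
    tss = sum((a - mean) ** 2 for a in y)
    return 1 - rss / tss


def vif(covariates):
    """Return one VIF per column; rows hold covariate values without an intercept."""
    k = len(covariates[0])
    values = []
    for j in range(k):
        target = [row[j] for row in covariates]
        # Regress covariate j on an intercept and all other covariates.
        others = [[1.0] + [v for i, v in enumerate(row) if i != j] for row in covariates]
        r2 = r_squared(target, ols_fit(others, target))
        values.append(1 / (1 - r2))
    return values


# Hypothetical data: two nearly collinear covariates, then two uncorrelated ones.
collinear = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 5.0]]
uncorrelated = [[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]]

vif_collinear = vif(collinear)
vif_uncorrelated = vif(uncorrelated)
```

In this toy example the nearly collinear pair gives VIF values well above the threshold of ten, while the uncorrelated pair gives values of one.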

2.2.2 Micronumerosity

Another problem with small sample sizes is micronumerosity. If the number of observations is small, it is likely that the asymptotic approximations that stem from the Central Limit Theorem do not hold. In this case a homoskedastic approach must be employed instead of the heteroskedastic approach, which in most cases is more robust.

(Hansen 2016, Lang 2015)

2.2.3 Heteroskedasticity

Heteroskedasticity means that the variances of the error terms depend on the observations. This contradicts the third assumption in Section 2.1.1, which states that all error terms have the same variance. Unequal variances cause the usual estimates of the standard deviations of the parameter estimates to be inconsistent. If the residuals are close to being homoskedastic, it is preferable to reformulate the model so that it is possible to use OLS.

To detect if heteroskedasticity is present it is possible to use the Residual plot, the Density plot and the Normal Quantile Quantile plot (Q-Q plot). The dots in the residual plot will be randomly spread if the residuals are homoskedastic, illustrated to the left in Figure 1, and if the residuals are heteroskedastic the dots will follow some sort of pattern, illustrated to the right in Figure 1.


Figure 1: Homo- and Heteroskedastic Residual Plots

The density plot curve is symmetric if the data is homoskedastic, illustrated to the left in Figure 2, otherwise the data is considered heteroskedastic, illustrated to the right in Figure 2.

Figure 2: Homo- and Heteroskedastic Density Plots

The Q-Q plot plots the standardized residuals against the theoretical quantiles. If the plotted dots follow a straight line, illustrated to the left in Figure 3, the regression is homoskedastic. Otherwise heteroskedasticity is present, illustrated to the right in Figure 3.

Figure 3: Homo- and Heteroskedastic Normal Q-Q Plots

If reformulation is not possible, or if the problem with heteroskedasticity remains after reformulation, the covariance matrix of the heteroskedastic regression model can be estimated with a tool introduced by Halbert White in 1980. This estimator is called White's Consistent Variance Estimator and is formulated as

$$\operatorname{Cov}(\hat\beta) = (X^t X)^{-1} X^t D(\hat e_i^2) X (X^t X)^{-1} \tag{8}$$

where $D(\hat e_i^2)$ is a diagonal matrix with diagonal elements $\hat e_i^2$. (Lang 2015)

2.2.4 Endogeneity

Endogeneity occurs when at least one of the covariates is correlated with the residual. This problem only appears when the regression is used for a structural interpretation and not when it is used for prediction purposes. Endogeneity causes inconsistent OLS estimates. If the correlation between a covariate and the residual is positive, the coefficient will be overestimated, and vice versa.

There are different reasons why problems with endogeneity appear. Some of the most common are:

Sample selection bias

Sample selection bias means that the selection of data depends on some factor other than those captured by the covariates. This unexplained factor may lead to incorrect conclusions.

Simultaneity

Simultaneity appears when the dependent variable influences any of the covariates in the regression. If an event that is not part of the explanatory variables influences the dependent variable, it will be part of the residual. Since the dependent variable in turn influences a covariate, the residual also influences that covariate, and endogeneity arises.

Missing Relevant Covariates

If the regression model misses relevant explanatory variables, they will be part of the residual and may cause incorrect estimates. Hence it is important to include all of the relevant covariates.

Measurement errors

Measurement errors can also cause endogeneity. A measurement error in the dependent variable only adds an element to the residual, and therefore does not cause endogeneity. However, a measurement error in a covariate causes a correlation between that covariate and the residual unless the corresponding β is zero, which means that endogeneity is present.

Problems with endogeneity may be remedied by replacing the endogenous variable with variables that are strongly correlated with it, called instruments. For the method to be helpful, it is important that the instruments are not correlated with the residual. This process is called 2SLS, Two Stage Least Squares.

(Lang 2015; Hansen 2016)

2.3 Model Evaluation

2.3.1 R² and Adjusted R²

The R² is a statistic that is often reported by statistical software. Conceptually, R² is the proportion of variation in the dependent variable y that can be accounted for by fitting y to a particular model instead of viewing it in isolation. One could say that R² is a measure of goodness of fit. The standard formulation of R² is given by

$$R^2 = 1 - \frac{\sum (y - \hat y)^2}{\sum (y - \bar y)^2} \tag{9}$$

where $\hat y$ is the fitted value of y from a linear regression model and $\bar y$ is the sample mean of y, that is, the estimated expectation of y.

The R² statistic is used for model comparison. By defining different models and comparing them to the empty model $y = \beta_0 + e$, with residual sum of squares (RSS) $\sum (y - \bar y)^2 = RSS(\text{empty})$, one can determine which model captures the most variance and hence which covariates to include. It is however important to note that the R² statistic in itself is not a measure of model adequacy, meaning that a large R² does not imply that the model is good in an absolute sense. This can be illustrated by introducing the full model $y = \beta_0 + X_1\beta_1 + \dots + X_k\beta_k + e$ with residual sum of squares $\sum (y - \hat y)^2 = RSS(\text{full})$. The expression (9) can now be written as

$$R^2 = 1 - \frac{RSS(\text{full})}{RSS(\text{empty})} \tag{10}$$

(Anderson-Sprecher 1994)

One problem with the R² statistic is that it never decreases when more covariates are added to a model. This can be partially remedied by using the adjusted R², or $\bar R^2$, since the adjusted R² corrects for degrees of freedom and thereby deflates the value. The expression for the adjusted R² is given by

$$\bar R^2 = 1 - \frac{(n - 1)\sum (y - \hat y)^2}{(n - k - 1)\sum (y - \bar y)^2} \tag{11}$$

where n is the number of observations and k is the number of included covariates, excluding the intercept.

(Hansen 2016)
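A short Python sketch of the two statistics is given below. It assumes that the model has an intercept and k covariates, so that the residual degrees of freedom are n − k − 1, the same degrees of freedom that appear in the F-test of Section 2.3.3. The function names and the observed and fitted values are hypothetical.

```python
def r_squared(y, fitted):
    """R^2 = 1 - RSS(full) / RSS(empty), where RSS(empty) uses the sample mean."""
    mean = sum(y) / len(y)
    rss_full = sum((a - f) ** 2 for a, f in zip(y, fitted))
    rss_empty = sum((a - mean) ** 2 for a in y)
    return 1 - rss_full / rss_empty


def adjusted_r_squared(r2, n, k):
    """Adjust R^2 for n observations and k covariates (intercept not counted)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


# Hypothetical observed and fitted values from a model with one covariate.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.2, 1.9, 3.2, 3.9, 4.8]

r2 = r_squared(y, fitted)
r2_adj = adjusted_r_squared(r2, n=len(y), k=1)
```

Since (n − 1)/(n − k − 1) is larger than one whenever k ≥ 1, the adjusted value is always below the unadjusted one unless the fit is perfect.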

2.3.2 Eta Squared, η²

Eta squared, or η², is a measure of effect size. The η² statistic measures the relative reduction of variance in the residual term. This is achieved by comparing the variance of the residuals between two models. The first model is called the full model and contains all covariates. The second model is called the reduced model and contains all of the covariates from the full model except the covariates being studied. The η² statistic then shows how much of the variance in the residual term of the reduced model is accounted for by the covariates that were removed from the full model. The η² statistic is calculated as

$$\eta^2 = \frac{|\hat e_*|^2 - |\hat e|^2}{|\hat e_*|^2} \tag{12}$$

where $\hat e_*$ is the residual of the reduced model and $\hat e$ is the residual of the full model. (Lang 2014)
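Equation (12) only requires the residual sums of squares of the two models, which the following Python sketch makes explicit. The function name and the numbers are hypothetical. The second call illustrates that η² coincides with R² in equation (10) when the reduced model is the empty model.

```python
def eta_squared(rss_reduced, rss_full):
    """Relative reduction in residual sum of squares, as in equation (12)."""
    return (rss_reduced - rss_full) / rss_reduced


# Hypothetical residual sums of squares.
eta2_single = eta_squared(rss_reduced=10.0, rss_full=8.0)

# With the empty model as the reduced model, eta squared equals R^2.
rss_empty, rss_full = 10.0, 0.14
eta2_all = eta_squared(rss_empty, rss_full)
r2 = 1 - rss_full / rss_empty
```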

2.3.3 F-test

The F-test makes it possible to test whether a number r of the β-estimates should be excluded from the model. The test is carried out under the null hypothesis that these r coefficients are all equal to zero. The test statistic for the F-test is given by

$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - k - 1}{r} \tag{13}$$

where n is the number of observations and k is the number of covariates. (Lang 2015)

2.3.4 P-value

The p-value is often used in hypothesis testing. It states the probability of obtaining a result at least as extreme as the observed one if the null hypothesis is true. Hence, a low p-value indicates that the null hypothesis should be rejected. The p-value is calculated as

$$p\text{-value} = P\big(F(r, n - k - 1) > F\big) \tag{14}$$

where $F(r, n - k - 1)$ denotes a random variable that follows the F-distribution with r and n − k − 1 degrees of freedom, r is the number of covariates tested under the null hypothesis, n is the number of observations, k is the total number of covariates and F is the F-statistic. (Lang 2015)

2.3.5 AIC

The Akaike Information Criterion (AIC) is a model evaluation criterion. Sometimes it is difficult to decide whether an explanatory variable is appropriate to include in the model. In this case the AIC indicates which covariates to include. The AIC is formulated as

$$AIC = n \ln(|\hat e|^2) + 2k \tag{15}$$

where n is the number of observations and k is the number of covariates. The model closest to the "true model" is estimated to be the one that minimizes the AIC, since minimizing the AIC is the same as minimizing the estimated information loss.

(Lang 2015)


3 Methodology

This section will explain which type of companies were included in the study, how the variables were defined, how the data was collected and finally how the initial model was defined and reduced. All calculations were performed in R.

3.1 Demarcation

The study was limited to dietary supplement retailing companies registered on the Swedish market between 2011 and 2015. In order to be included in the study the following requirements had to be met:

1. The company had to be registered at the Swedish Companies Registration Office as a Limited or Private Company. This requirement was necessary since these types of companies are obligated to report their financial data, which makes the data available through annual reports (Swedish Companies Registration Office 2016).

2. The company had to offer its assortment of products online, meaning that it had to be possible to place an order on its website. This requirement was crucial since most of the differences in strategies and provided services were centered around online retailing.

3. The third requirement was that the company's main offering had to be dietary supplements related to fitness or performance enhancement. These types of supplements include protein powders, pre-workouts, creatine and branched-chain amino acids, to name a few. As a result, other types of stores that offered dietary supplements as an addition to their main assortment, such as pharmacies and general sports equipment stores, were excluded.

4. The last requirement was that the company had to offer business to consumer sales. This requirement was established since the investigated strategies were directed towards the business to consumer market.

In total 19 different companies were included in the study, all of which fulfilled the requirements stated above. The included companies are listed below, see Table 1.

Company                 Observed Year(s)
Atletbutiken            2015
B5 Sports               2013, 2014
BMR Sports Nutrition    2014
Bodypower               2012, 2013, 2014
Fitnessbutiken          2012, 2013, 2014
Fitnessguru             2012, 2013, 2014
Gymgrossisten           2012, 2013, 2014
Gymvaruhuset            2012, 2013, 2014, 2015
Lyftarshopen            2013, 2014, 2015
MMSports                2012, 2014
Muskelbygget            2012, 2013, 2014
Nutramino               2012, 2013, 2014
Proteinbolaget          2013, 2014, 2015
Proteinbutiken          2012, 2013, 2014
Proteinfabrikken        2012, 2013, 2014
Sportkost               2014
Supplementstore         2013, 2014, 2015
Svenskt Kosttillskott   2012, 2013, 2014
Tillskottsbolaget       2013, 2014

Table 1: Company Declaration

3.2 Defining Variables

3.2.1 Dependent Variable

The dependent variable in the regression was the operating margin of the observed companies. This key figure was defined as

$$\text{Operating Margin} = \frac{\text{EBIT}}{\text{Net Sales}} \tag{16}$$

3.2.2 Explanatory Variables

This section contains an explanation of the included covariates in the initial model. The covariates were selected on the basis that they may have an impact on profitability.

Employees

The Employees covariate stated the number of employees the company had during the observed financial year.

Solidity

This covariate represented the solidity of the observed company at the end of each financial year. Solidity was defined as

$$\text{Solidity} = \frac{\text{Adjusted Equity}}{\text{Total Assets}} = \frac{\text{Equity} + (\text{Tax Rate} \times \text{Untaxed Reserves})}{\text{Total Assets}} \tag{17}$$

Cash Position

The Cash Position variable was the company’s cash position at the end of each financial year.

Average Salary

This covariate represented the average salary of an employee. It was defined as

$$\text{Average Salary} = \frac{\text{Total Salary of Employees}}{\text{Number of Employees}} \tag{18}$$

Years on the Market

Years on the Market stated how many years the considered company had been registered at the Swedish Companies Registration Office.

Physical Store

Physical Store was a dummy variable that indicated one if the company possessed at least one physical store and zero otherwise.

Only Own Brand

Only Own Brand was a dummy variable that indicated one if the company only sold its own brand, and zero if they offered several brands.

Extensive Assortment

Extensive Assortment was a dummy variable that indicated one if the company offered an extensive assortment and zero otherwise. An extensive assortment means that the company sold products other than dietary supplements, such as training clothes and workout accessories.

Free Returns

This covariate was a dummy variable that indicated one if the company offered free returns and zero if not.

Free Shipping

This covariate was a dummy variable that indicated one if the company offered free shipping and zero if not.

Student Discount

This covariate was a dummy variable that indicated one if the company offered student discount and zero if not.

Bonus Ladder

Bonus Ladder was a dummy variable that indicated one if the company had special offers when certain price levels were reached. The special offers could consist of free or discounted products. If the company did not use a bonus ladder, the variable indicated zero.

Chat

Chat was a dummy variable that indicated one if the company provided customer service through an online chat service, and zero if not.


See Sections 5.1.3 and 5.1.4 for a discussion regarding the reasoning behind the choice of dependent variable and covariates.

3.3 Data

The data used throughout this study was mainly obtained online through different sites that publish annual reports and other key figures. The rest of the data was obtained through communication with company representatives. In the few cases where the company would not provide the data, the qualitative data was collected from the company's social media feeds and from historical versions of its website using the Wayback Machine service by the Internet Archive.

3.3.1 Financial Data

Since the majority of the companies were private, their annual reports were not available free of charge. All financial information was therefore obtained through third-party services that publish companies' financial reports: allabolag.se and merinfo.se.

3.3.2 Qualitative Data

Qualitative data was defined as data regarding the services, offers and assortment provided by the company. This included free shipping, free returns, bonus products or bonus ladders, student discounts, a wide assortment of brands, a wide assortment of additional products besides dietary supplements, physical stores, and a live chat service on the website. These types of data were mainly obtained through contact with company representatives. In the cases where the company failed to provide the data, it was instead obtained from the company's social media feeds and from previous versions of its website using the Wayback Machine provided by archive.org/web.

Some data proved difficult to find. If the company did not respond, the following assumptions were therefore made.

Student Discounts

When the company did not provide information regarding its historical student discount policies, the following steps were taken:

1. The company’s social media feed for the particular year in question was investigated.

2. The company’s web page for that particular year was investigated.

3. The historical versions of the web pages mecenat.se and studentkortet.se were investigated to see whether they had any stated agreements with the company in question.

If none of these three sources suggested that the company offered any student discounts, it was assumed that it did not.

Bonus Ladders

Much like the student discounts, these types of data were difficult to obtain if the company did not provide them. This was because most companies did not advertise these types of offers, which only appeared once a customer was ready to complete an order. When the company did not provide the data, the following steps were taken:

1. The company’s social media feed for the particular year in question was investigated.

2. The company’s web page for that particular year was investigated.

If no indication of these types of offers was found, it was assumed that the company did not offer any bonus ladders.

3.4 Initial Model

The initial model consisted of all the identified explanatory variables. The model was then reduced using several methods for covariate testing, such as ∆AIC, p-value and η². The initial model was formulated as

$$y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \dots + \beta_{13} x_{i,13} + e_i \tag{19}$$

where i denoted the observational index and the variables were measured as stated in Table 2.

Variable   Name                   Unit
y_i        Operating Margin       Percent, %
x_i,1      Employees              Quantity
x_i,2      Solidity               Percent, %
x_i,3      Cash Position          Percent, %
x_i,4      Average Salary         kkr per year
x_i,5      Years on the Market    Years
x_i,6      Physical Store         Dummy, 0 or 1
x_i,7      Only Own Brand         Dummy, 0 or 1
x_i,8      Extensive Assortment   Dummy, 0 or 1
x_i,9      Free Returns           Dummy, 0 or 1
x_i,10     Free Shipping          Dummy, 0 or 1
x_i,11     Student Discount       Dummy, 0 or 1
x_i,12     Bonus Ladder           Dummy, 0 or 1
x_i,13     Chat                   Dummy, 0 or 1

Table 2: Defined Variables
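To make the coding of the variables in Table 2 concrete, the following Python sketch builds one row of the design matrix X in equation (2) from a single company-year record. The field names and all values in the record are hypothetical and are not taken from the data set.

```python
# Order follows Table 2: five quantitative covariates, then eight dummies.
QUANTITATIVE = [
    "employees", "solidity", "cash_position", "average_salary", "years_on_market",
]
DUMMIES = [
    "physical_store", "only_own_brand", "extensive_assortment", "free_returns",
    "free_shipping", "student_discount", "bonus_ladder", "chat",
]


def design_row(record):
    """Return [1, x_1, ..., x_13] for one company-year observation."""
    row = [1.0]  # constant that multiplies the intercept beta_0
    row += [float(record[name]) for name in QUANTITATIVE]
    row += [1.0 if record[name] else 0.0 for name in DUMMIES]
    return row


# Hypothetical company-year record.
example = {
    "employees": 12, "solidity": 0.35, "cash_position": 0.10,
    "average_salary": 310, "years_on_market": 6,
    "physical_store": True, "only_own_brand": False,
    "extensive_assortment": True, "free_returns": False,
    "free_shipping": True, "student_discount": False,
    "bonus_ladder": False, "chat": True,
}

row = design_row(example)
```

Stacking one such row per observation gives the n × 14 matrix X used in the estimation.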

3.4.1 VIF-test

After the initial model was formulated, a VIF-test was performed to control for multicollinearity between explanatory variables. The calculations were performed for each variable according to equation (7). The results are presented in Table 3.

Variable               VIF
Employees              3.047
Solidity               1.757
Cash Position          2.601
Average Salary         2.639
Years on the Market    2.203
Physical Store         2.037
Only Own Brand         3.234
Extensive Assortment   2.396
Free Returns           2.692
Free Shipping          1.407
Student Discount       2.390
Bonus Ladder           3.820
Chat                   2.806

Table 3: VIF-values of the Initial Model

Since all of the explanatory variables yielded a VIF value well below ten, none of the covariates were eliminated due to multicollinearity.

3.4.2 Estimation of the Initial Model

The regression of the initial model rendered the results presented in Table 4.

| Covariate | β-estimate | Standard Error | Eta Squared | P-value | Lower 2.5% | Upper 97.5% |
|---|---|---|---|---|---|---|
| Intercept | 0.00004 | 0.06210 | 0.00000 | 0.9995 | -0.12591 | 0.12599 |
| Employees | -0.00027 | 0.00069 | 0.00394 | 0.6987 | -0.00166 | 0.00112 |
| Solidity | 0.08351 | 0.03253 | 0.20537 | 0.0146 | 0.01753 | 0.14949 |
| Cash Position | -0.00683 | 0.02004 | 0.00662 | 0.7351 | -0.04748 | 0.03381 |
| Average Salary | 0.00018 | 0.00020 | 0.05811 | 0.3669 | -0.00022 | 0.00058 |
| Years on the Market | 0.00149 | 0.00213 | 0.00719 | 0.4905 | -0.00284 | 0.00581 |
| Physical Store | 0.00730 | 0.03436 | 0.00106 | 0.8330 | -0.06239 | 0.07699 |
| Only Own Brand | -0.09041 | 0.06026 | 0.07488 | 0.1422 | -0.21262 | 0.03179 |
| Extensive Assortment | -0.00207 | 0.04286 | 0.00003 | 0.9617 | -0.08899 | 0.08485 |
| Free Returns | 0.04643 | 0.06179 | 0.02756 | 0.4573 | -0.07889 | 0.17175 |
| Free Shipping | -0.04894 | 0.05418 | 0.02618 | 0.3724 | -0.15882 | 0.06095 |
| Student Discount | -0.05275 | 0.03563 | 0.04743 | 0.1475 | -0.12501 | 0.01952 |
| Bonus Ladder | 0.01070 | 0.06290 | 0.00118 | 0.8659 | -0.11687 | 0.13827 |
| Chat | 0.07189 | 0.06741 | 0.03302 | 0.2933 | -0.06482 | 0.20859 |

Table 4: Results of the Initial Model

Initial Model Values:

• Multiple R-squared: R² = 0.6065

• Adjusted R-squared: R̄² = 0.4645

• F-statistic for the null hypothesis that the coefficients of all covariates are equal to zero: 4.269 on 13 and 36 degrees of freedom

• P-value of this F-test: 0.000272
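The adjusted R² and the overall F-statistic both follow from the multiple R², the number of covariates and the residual degrees of freedom. The Python sketch below recomputes them from the reported initial model values with the standard textbook formulas; it is a consistency check written for illustration (the thesis calculations were done in R), and the function names are assumptions.

```python
# Sketch: standard formulas linking R^2, adjusted R^2 and the overall F-statistic
# for a linear model with an intercept. Function names are illustrative.

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p covariates (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def overall_f(r2, n, p):
    """F-statistic for H0: all p slope coefficients are zero."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# Reported initial model values: R^2 = 0.6065, F on 13 and 36 degrees of freedom.
r2, p, df_resid = 0.6065, 13, 36
n = df_resid + p + 1  # residual df = n - p - 1, so n = 50 observations

print(f"n = {n}")
print(f"adjusted R^2 = {adjusted_r2(r2, n, p):.4f}")
print(f"F = {overall_f(r2, n, p):.3f} on {p} and {df_resid} degrees of freedom")
```

Both recomputed values agree with the reported 0.4645 and 4.269 up to the rounding of R².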


3.4.3 Reduction of the Initial Model

After estimating the initial model, several statistics were taken into account in order to establish whether the model would benefit from a reduction of covariates. These statistics were η², ∆AIC, adjusted R² and p-value.

Eta Squared

The η² statistic was used to evaluate how much of the variance each covariate accounted for in the model. A low η² value indicated that the covariate was a candidate for removal, since it did not account for a substantial share of the variance.

∆AIC

This statistic was defined as

$$
\Delta\mathrm{AIC} = \mathrm{AIC}(\text{Reduced Model}) - \mathrm{AIC}(\text{Full Model}) \tag{20}
$$

where the Reduced Model consisted of all but one of the covariates from the Full Model. This statistic was calculated for each covariate in the model. A negative ∆AIC suggested that the Reduced Model was favored over the Full Model.
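For a linear model with Gaussian errors, the AIC can be written as n ln(RSS/n) + 2k plus a constant that is identical for all models fitted to the same n observations, where RSS is the residual sum of squares and k the number of estimated coefficients. Assuming this form (the AIC definition used in the thesis is not repeated in this section), ∆AIC for dropping one covariate reduces to n ln(RSS_reduced / RSS_full) − 2. The Python sketch below illustrates this with hypothetical RSS values; the function name is an assumption.

```python
import math

# Sketch, assuming the Gaussian-likelihood form AIC = n*ln(RSS/n) + 2k + const.
# The constant and the ln(n) term cancel in the difference between two models
# fitted to the same n observations.

def delta_aic(rss_reduced, rss_full, n, dropped=1):
    """AIC(reduced) - AIC(full) when `dropped` covariates are removed."""
    return n * math.log(rss_reduced / rss_full) - 2 * dropped

n = 50            # number of observations
rss_full = 1.000  # hypothetical residual sum of squares of the full model

# Hypothetical RSS values of three reduced models, each dropping one covariate.
for rss_reduced in (1.000, 1.020, 1.200):
    d = delta_aic(rss_reduced, rss_full, n)
    verdict = "reduced model favored" if d < 0 else "full model favored"
    print(f"RSS_reduced = {rss_reduced:.3f}  delta AIC = {d:+.3f}  ({verdict})")
```

In this form, a covariate that explains almost nothing leaves the RSS nearly unchanged and gives a ∆AIC close to −2, while a covariate whose removal raises the RSS substantially gives a positive ∆AIC.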

Adjusted R²

The adjusted R² was calculated for every new model after a covariate was removed. If the value of R̄² was higher for the reduced model than for the former model, the conclusion was drawn that the model was improved by the reduction.

P-value

The p-value of each covariate was calculated. A large p-value meant that the null hypothesis of a zero coefficient could not be rejected. Hence covariates with large p-values were candidates for exclusion from the model.

All of these statistics were evaluated together for each covariate. If they all coincided, meaning that the covariate had a low η², a negative ∆AIC and a high p-value, and the model produced a higher adjusted R² without the covariate, the covariate was eliminated from the model. The procedure was repeated until the model was no longer reducible. The excluded covariates are shown in Table 5.

| Covariate | P-value | η² | ∆AIC | Adjusted R² |
|---|---|---|---|---|
| Extensive Assortment | 0.96170 | 0.00003 | -1.99858 | 0.4789 |
| Physical Store | 0.82379 | 0.00117 | -1.94144 | 0.4920 |
| Bonus Ladder | 0.77724 | 0.00311 | -1.84426 | 0.5035 |
| Employees | 0.71596 | 0.00392 | -1.80358 | 0.5140 |
| Cash Position | 0.74518 | 0.00565 | -1.71684 | 0.5232 |
| Years on the Market | 0.70105 | 0.00346 | -1.82652 | 0.5329 |
| Free Shipping | 0.35016 | 0.02536 | -0.71576 | 0.5319 |

Table 5: Eliminated Covariates


4 Results

4.1 The Final Model

| Covariate | β-estimate | Standard Error | Eta Squared | P-value | Lower 2.5% | Upper 97.5% |
|---|---|---|---|---|---|---|
| Intercept | -0.03198 | 0.02953 | 0.02953 | 0.2849 | -0.09154 | 0.02758 |
| Solidity | 0.08305 | 0.02262 | 0.26685 | 0.0007 | 0.03742 | 0.12868 |
| Average Salary | 0.00020 | 0.00008 | 0.13962 | 0.0203 | 0.00003 | 0.00036 |
| Only Own Brand | -0.10356 | 0.03967 | 0.19182 | 0.0124 | -0.18357 | -0.02356 |
| Free Returns | 0.04664 | 0.02216 | 0.05764 | 0.0412 | 0.00196 | 0.09133 |
| Student Discount | -0.06705 | 0.02247 | 0.11728 | 0.0047 | -0.11237 | -0.02174 |
| Chat | 0.07230 | 0.0209 | 0.06890 | 0.0012 | 0.03018 | 0.11441 |

Table 6: Results of the Final Model

Final Model Values:

• Multiple R-squared: R² = 0.5892

• Adjusted R-squared: R̄² = 0.5319

• F-statistic for the null hypothesis that the coefficients of all covariates are equal to zero: 10.28 on 6 and 43 degrees of freedom

• P-value of this F-test: 4.811e-07
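The confidence bounds in Table 6 determine the standard error and the t-statistic of each estimate, since a 95% interval has the form β ± t_crit × SE. The Python sketch below backs these quantities out of the reported bounds. It assumes t_crit ≈ 2.0167, the approximate 97.5% quantile of the t-distribution with 43 degrees of freedom, entered as a constant because the Python standard library has no t-distribution; the function names are illustrative and the thesis itself used R.

```python
# Sketch: recover standard errors and t-statistics from the 95% confidence
# bounds in Table 6, using CI = beta +/- t_crit * SE.
# Assumption: t_crit = 2.0167 approximates the 97.5% t-quantile with 43 df.

T_CRIT_43 = 2.0167

# (beta estimate, lower 2.5% bound, upper 97.5% bound) as reported in Table 6.
table6 = {
    "Solidity":         (0.08305, 0.03742, 0.12868),
    "Average Salary":   (0.00020, 0.00003, 0.00036),
    "Only Own Brand":   (-0.10356, -0.18357, -0.02356),
    "Free Returns":     (0.04664, 0.00196, 0.09133),
    "Student Discount": (-0.06705, -0.11237, -0.02174),
    "Chat":             (0.07230, 0.03018, 0.11441),
}

def implied_se(lower, upper, t_crit=T_CRIT_43):
    """Standard error implied by the half-width of a symmetric interval."""
    return (upper - lower) / (2.0 * t_crit)

def t_statistic(beta, se):
    """t-statistic for H0: coefficient equals zero."""
    return beta / se

for name, (beta, lower, upper) in table6.items():
    se = implied_se(lower, upper)
    print(f"{name:17s} implied SE = {se:.5f}   t = {t_statistic(beta, se):+.3f}")
```

A covariate is significant at the 5% level exactly when its absolute t-statistic exceeds t_crit, which is equivalent to its confidence interval excluding zero.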

The final regression rendered the following equation, with β-estimates obtained from Table 6:

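$$
\begin{aligned}
\text{Operating Margin} = {} & -0.03198 + 0.08305 \times [\text{Solidity}] + 0.00020 \times [\text{Average Salary}] \\
& - 0.10356 \times [\text{Only Own Brand}] + 0.04664 \times [\text{Free Returns}] - 0.06705 \times [\text{Student Discount}] \\
& + 0.07230 \times [\text{Chat}]
\end{aligned}
\tag{21}
$$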
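Equation (21) can be evaluated directly for a given company profile. The Python sketch below does so for a hypothetical retailer that is not part of the data set. It assumes that Solidity and Operating Margin are expressed as fractions (0.40 for 40%) and Average Salary in kkr per year; the magnitude of the coefficients suggests this scaling, but this section does not state it, and the names used in the code are illustrative.

```python
# Sketch: evaluate the fitted equation (21) for a hypothetical company.
# Assumptions: Solidity and Operating Margin are fractions, Average Salary is
# in kkr per year, and the dummy variables take the value 0 or 1.

INTERCEPT = -0.03198
COEFFICIENTS = {
    "Solidity": 0.08305,
    "Average Salary": 0.00020,
    "Only Own Brand": -0.10356,
    "Free Returns": 0.04664,
    "Student Discount": -0.06705,
    "Chat": 0.07230,
}

def predict_operating_margin(company):
    """Fitted operating margin for a dict holding one value per covariate."""
    return INTERCEPT + sum(beta * company[name] for name, beta in COEFFICIENTS.items())

# Hypothetical retailer, made up for illustration only.
example = {
    "Solidity": 0.40,
    "Average Salary": 300,
    "Only Own Brand": 0,
    "Free Returns": 1,
    "Student Discount": 0,
    "Chat": 1,
}

base = predict_operating_margin(example)
with_discount = predict_operating_margin({**example, "Student Discount": 1})
print(f"fitted operating margin:         {base:.5f}")
print(f"same company, student discounts: {with_discount:.5f}")
```

Switching a dummy variable from 0 to 1 shifts the fitted margin by exactly that variable's coefficient, which is how the dummy estimates in Table 6 are read.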

4.2 Model Validation

This section will explain and compare the results of the initial model and the final model. The issue of homoskedasticity versus heteroskedasticity will also be addressed.

4.2.1 VIF

In order to check for multicollinearity, a new VIF-test was conducted. The VIF-test indicated no problems with multicollinearity, and consequently no covariates had to be excluded from the final model. See Table 7 for the VIF results.


| Variable | VIF |
|---|---|
| Solidity | 1.182 |
| Average Salary | 1.132 |
| Only Own Brand | 1.386 |
| Free Returns | 1.206 |
| Student Discount | 1.386 |
| Chat | 1.254 |

Table 7: VIF-values of the Final Model

4.2.2 Adjusted R²

The initial model yielded an adjusted R² statistic of 0.4645, while the reduced final model yielded an adjusted R² statistic of 0.5319. The final model obtained the larger value, which implied that, after adjusting for the number of covariates, the amount of variance in the dependent variable captured by the covariates was larger for the final model.

Thus the conclusion was drawn that the final model had more explanatory properties than the initial model.

4.2.3 F-Statistics and P-values

The initial model yielded an F-statistic of 4.269 on 13 and 36 degrees of freedom, while the final model yielded an F-statistic of 10.28 on 6 and 43 degrees of freedom. The higher F-statistic obtained from the final model meant that the hypothesis that the coefficients of all included covariates were equal to zero was less likely to be true. Thus the conclusion was drawn that the final model was closer to the true model than the initial model, and hence was a better representation of the relationship between the dependent variable and the explanatory variables.

The p-value of each covariate in the final model was below 0.05, which meant that the null hypothesis of a zero coefficient could be rejected at the 5% significance level for every covariate. It was hence very likely that the covariates had an effect on the dependent variable. A comparison between the p-values of the overall F-tests of the initial model (0.000272) and the final model (4.811e-07) led to the conclusion that the final model was less likely to fulfill the null hypothesis, i.e. the final model was an improvement on the initial model.

4.2.4 Residuals

In Figure 4 below the residual and density plots of the final model are shown.

The residuals are randomly spread, which would suggest homoskedasticity. The density plot does not look exactly normally distributed. It does, however, resemble a normal density with expectation equal to zero, which would imply homoskedasticity. Nevertheless, due to the discrepancies and the slight offset, it is more likely that the data are heteroskedastic. The residual plot was generated using the function plot(resid(regression)) and the density plot was generated using the function plot(density(resid(regression))).


Figure 4: Residual and Density plot

In Figure 5 below, the normal quantile-quantile plot approximately follows a straight line. This implies that the data come from a normal distribution, and consequently suggests homoskedasticity. The normal quantile-quantile plot was generated using the functions qqnorm(resid(regression)) and qqline(resid(regression)).

Figure 5: Normal Q-Q Plot
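A normal quantile-quantile plot pairs the sorted residuals with the quantiles a standard normal sample of the same size would be expected to have. The Python sketch below computes these pairs and their correlation, as a numerical summary of how closely the points follow a straight line. It is not the R code used for Figure 5: the plotting positions (i − 0.5)/n are an assumption, the residuals are hypothetical values made up for illustration, and the function names are illustrative.

```python
import math
from statistics import NormalDist, mean

# Sketch: coordinates of a normal Q-Q plot and the correlation between the
# theoretical and the sample quantiles.
# Assumption: plotting positions (i - 0.5) / n for i = 1, ..., n.

def normal_qq_points(residuals):
    """Return (theoretical quantile, sample quantile) pairs in ascending order."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

def qq_correlation(points):
    """Pearson correlation of the Q-Q pairs; values near 1 indicate a straight line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical residuals, not taken from the thesis data.
residuals = [-0.081, -0.046, -0.030, -0.012, -0.004,
             0.003, 0.015, 0.027, 0.041, 0.092]

points = normal_qq_points(residuals)
for theoretical, sample in points:
    print(f"{theoretical:+.3f}  {sample:+.3f}")
print(f"Q-Q correlation: {qq_correlation(points):.4f}")
```

A straight Q-Q line speaks to the normality of the residuals; whether their variance is constant is usually judged separately, for example from a plot of residuals against fitted values.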


5 Discussion

This section will discuss the potential problems with the study’s methodology.

A motivation for the choice of the initial covariates will be given as well as an interpretation and some criticisms towards the final model.

5.1 Method

The following section will discuss regression as a tool for evaluating strategy, the demarcation, the data collection, the variable definitions and finally the reduction of the initial regression model.

5.1.1 Using Regression to Evaluate Strategy

Regression might appear to be an unconventional way of evaluating business strategy. However, the aim of the study was to identify factors and decisions that had a statistically significant impact on profitability, and regression was therefore a natural choice of method. Regression has also been used in earlier managerial studies to evaluate business strategies in other areas. For example, a study on how entry barriers affect industry profitability was conducted using regression (Chappel, Marks & Park 1983), as was a study on how racial diversity interacted with business strategy to determine firm performance (Richard 2000). Similarly, a study of gym profitability has been made using linear regression (Axelsson & Källsbo 2015). The aim of the project, combined with previous studies using similar methodology, suggests that regression is a valid approach to our problem formulation.

5.1.2 Demarcation

The main focus of the study was to investigate how different strategies in dietary supplement retailing affected business to consumer sales. It is therefore important to mention that some of the included companies also offered business to business sales. This might have affected the results of the study, since the model did not account for how much of each company's operations was represented by business to consumer sales. Another potential issue with the demarcation requirements was that the company had to be registered in Sweden, which might have led to important market competitors being excluded from the study. It is also important to note that some of the included companies registered in Sweden had counterparts, or might have conducted sales, in other countries where their strategy might have differed. This, again, could have affected the accuracy of the results.

It is also worth mentioning that some companies that fulfilled the requirements had to be excluded. This was mainly due to a lack of financial data, since some of them were registered in 2015 and therefore had not yet been operating for a full year. Some interesting market competitors also had to be excluded because they were sole proprietorships, or because they were part of larger companies with several different businesses and lacked individual financial reports.