
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017

Non-financial factors that affect a company's total card transaction volume

CLARA CARLMERED
JOSEFIN KULLMAN

KTH ROYAL INSTITUTE OF TECHNOLOGY


Degree Projects in Applied Mathematics and Industrial Economics
Degree Programme in Industrial Engineering and Management
KTH Royal Institute of Technology, year 2017

Supervisors at KTH: Henrik Hult, Kristina Nyström
Examiner at KTH: Henrik Hult


TRITA-MAT-K 2017:03 ISRN-KTH/MAT/K--17/03--SE

Royal Institute of Technology School of Engineering Sciences KTH SCI

SE-100 44 Stockholm, Sweden


Preface

We, Clara Carlmered and Josefin Kullman, would like to thank our two supervisors at the Royal Institute of Technology. Henrik Hult, at the Institute for Mathematical Statistics, has provided us with his insights on regression analysis, and Kristina Nyström, at the Institute of Industrial Management, has provided us with her insights on the management part of this report.

We would also like to thank our two supervisors at Albatross, Kalle and Nils, for their great support and the large data sets they have provided us with. They have also helped with the data processing, given us insights into the payment solutions industry and provided us with continuous feedback.


Abstract

The study aims to investigate which factors could affect the yearly card transaction volume of small and medium-sized businesses not listed on the stock exchange, referred to in this study as merchants. These merchants are characterized by a low number of employees and are in many cases smaller businesses without any noted ambition to grow. They all serve consumers and are therefore heavily reliant on people actually visiting their business in order to generate revenue. These merchants do not publish any financial information, such as previous yearly revenues, planned future investments or debts, and the study therefore focuses on factors that an external actor can access. By investigating non-financial factors, the study contributes research on how a company can ensure growth and revenues. It also highlights specific industries and company sizes that generate larger card transaction volumes than others.

By mapping both quantitative and qualitative factors that could potentially have an impact on a merchant's yearly card transaction volume, a mathematical analysis has been conducted. The mathematical part of the study is based on multiple linear regression analysis, and the result is a model that can predict the card transaction volume of a company. The study has been conducted on actors operating in the Danish market, but could, with minor adjustments, also be applied to other geographical markets.


Sammanfattning (Summary)

This thesis aims to investigate which factors affect the annual total volume of card transactions for companies that are not listed on the stock exchange. These businesses are characterized by a small number of employees and are in many cases smaller companies without any noted ambition to grow. All of the businesses serve consumers and therefore rely on people physically visiting their store or business in order to increase their revenue. Since the businesses lack publicly available financial information on previous revenue, planned investments and potential loans, the analysis is based on other factors that an outside actor is able to access. By investigating non-financial factors, this study contributes research on how a company can ensure growth and increased revenue. The study also specifies which types of industries or company sizes generate larger card transaction volumes than others.

By mapping both possible quantitative and qualitative factors that could affect a company's card transaction volume, a mathematical analysis has been conducted. The mathematical part of the study is based on multiple linear regression analysis, and the result is a model that, at a certain significance level, can predict the card transaction volume of companies. The final model is based on observations from the Danish market, but could, with some adjustment, also be applied to other geographical markets.


Contents

1 Introduction
1.1 Background
1.2 Aim
1.3 Research Questions
1.4 Scope
2 Mathematical Theory
2.1 The Multiple Regression Analysis Model
2.1.1 Ordinary Least Squares Estimation
2.1.2 Variables
2.1.3 Important Assumptions
2.2 Reducing the Model
2.2.1 R², Adjusted R² and partial η²
2.2.2 Akaike Information Criterion
2.2.3 Bayesian Information Criterion
2.2.4 Mean Squared Error
2.2.5 Hypothesis Testing
2.2.6 P-value and Confidence Intervals
2.3 Model Validation
2.3.1 Linearity
2.3.2 Normality
2.4 Errors
2.4.1 Multicollinearity
2.4.2 Endogeneity
2.4.3 Heteroscedasticity
3 Business Model
4 Method
4.1 Data Pre-Processing
4.2 Variable Selection
4.2.1 Dependent Variable
4.2.2 Covariates
4.3 Initial Model
4.3.1 Classification and descriptive statistics
4.3.2 Initial equation
5 Results
5.1 Initial Model
5.1.1 Model Evaluation
5.1.2 Reducing the Model
5.2 Final Model
5.2.1 Model Evaluation
5.2.2 Additional Models
6 Discussion
6.1 Analysis of Results
6.1.1 Covariates
6.2 Limitations
6.3 Recommendations
7 Conclusion
8 Appendix
8.1 Residuals vs Fitted
8.2 Scale Location
8.3 Residuals vs Leverage
8.4 VIF

List of Figures

1 Hetero- vs homoscedasticity
2 Four-party model (Albatross, 2017a)
3 Histogram and QQ-plot for the Initial Model
4 Histogram for the transformed variable Local Card Transactions
5 Histogram and QQ-plot for the Final Model
6 Correlation Matrix for the quantitative variables in the Final Model
7 Plot of the development of the number of employees, in relation to 0 employees
8 Residuals vs Fitted
9 Scale Location
10 Residuals vs Leverage

List of Tables

1 Interval for Number of Employees
2 Covariates for the Initial Model
3 Descriptive Statistics
4 Summary table for Initial Model
5 Statistics for the Initial Model
6 Summary table for the Final Model
7 Statistics for the Final Model
8 Final Model 1 vs Final Model 2
9 VIF - Final Model


1 Introduction

1.1 Background

Payments are made every day, within all industries, in all parts of the world. By definition, a payment is the act of giving money in exchange for a delivered product or service. In 1946 the first bank card, named Charg-It, was introduced by a banker in Brooklyn (MasterCard, 2012). At this early stage, cardholders had to be customers of Biggin’s Bank, and only local payments could be made. Since then, card payments have evolved into the main payment solution, replacing cash and other means of payment. Until only a few years ago, the systems needed for card payments were an expensive investment and therefore only profitable for large corporations with many customers every day. Today, start-ups, smaller businesses or even an individual artist can buy a card payment solution customized to their specific needs and use.

Albatross is a company that provides and sells payment solutions to companies ranging from micro businesses to enterprises. Since this report contains confidential data and information, the company is given the fictional name Albatross, and its products and services are referred to as "Product X" and "Product Y". Albatross operates internationally by owning and operating payment solution companies in North America, the Pacific region and Europe. It offers a wide range of products and services to its customers; its main product, "Product X", is a bundled offer combining a card terminal and a subscription under one agreement. "Product Y" is another product, which provides only the legal agreement to customers who already own a card terminal (Albatross, 2017b).

Albatross operates in a highly competitive market, where global digitalization and technological development keep the industry in constant change. This, in combination with people's demand for safe and convenient payment solutions, makes customer knowledge key for Albatross. By definition, and according to Albatross's pricing model, its most profitable customers are the merchants with the most frequent and largest card transactions over time (Albatross, 2017b).

When it comes to the valuation of a company, there are three main methods (Bernström, 2013, p.6-7). The discounted cash flow method, based on the weighted average cost of capital (WACC), is most applicable when a company maintains a stable debt-to-value ratio. If the debt-to-value ratio is expected to change, the adjusted present value (APV) method is more appropriate, since it separates the cost of equity from the cost of debt. Multiples valuation is a method where a company is valued based on the valuation ratios of similar companies. All of these methods rely on available financial information and are techniques used when the companies at hand are listed and established (Koller et al., p.103). However, there is a lack of public financial information about small or new companies, mainly because start-up companies might not generate any large revenues yet, or because the information that exists is not published due to the type of enterprise (Berk and DeMarzo, 2011, p.20). This thesis aims to explain which non-financial factors affect the card transaction volume of these companies. The purpose is to find new potential customers for Albatross and, as a consequence, to distinguish smaller companies from larger ones within the small and medium-sized business space.


1.2 Aim

The European infrastructure for card payments is currently changing, with the aim of increasing socio-economic efficiency. This would decrease the use of cash, entailing cost savings for Europe as a whole. It is therefore important to gain a deeper understanding of the underlying factors and of the implications for the card payment infrastructure, the different actors and the present business models (Arvidsson, 2009, p.2-3). This study focuses on the role of the acquirer and on how its revenues can be optimized within its existing business model.

The aim is to locate the most profitable customers for Albatross, where a merchant's size is represented by its card transaction volume. Albatross has expressed a need to distinguish smaller companies from larger ones in order to target potential customers. Today, Albatross uses directed outbound sales to sell its subscriptions, and a way to distinguish future profitable customers from those that will not generate enough revenue is needed. Using multiple linear regression as the mathematical model, this thesis aims to obtain a prediction model for the customers' card transaction volume and to investigate which variables affect the transaction volume. Since no financial data is assumed to be available for future potential customers, other variables, such as the geographical location of the business, need to be considered.

1.3 Research Questions

• What specific factors affect the total card transaction volume for Albatross’s customers?

• How can Albatross distinguish smaller merchants from larger ones without any financial information available?

• Which of the selected factors in the prediction model are the most important ones?

1.4 Scope

To arrive at a model that accurately explains the profitability of Albatross's customers, some delimitations are needed. First, the thesis only includes customers operating in the Danish market, since Albatross has expressed a need to understand and expand its business within Denmark. It is reasonable to believe that the results can be applied to the Swedish market, as well as to other markets, with only minor changes to the model. What differentiates Denmark from Sweden is that a local debit scheme is present in the Danish market, which limits the usage of international cards (Albatross, 2017a). The study only includes transaction data that is profitable for Albatross, simply because this is the only data available.

The dataset will be based on companies within the small and medium-sized business space, since, in terms of total revenue, this is one of Albatross's primary segments of interest. Accordingly, only companies using "Product X" and "Product Y" will be considered.

Furthermore, the study will focus solely on in-store companies and not on e-commerce, meaning that only merchants that accept payments in physical stores are included.


2 Mathematical Theory

This thesis aims to find a statistical model that explains the relationship between a number of different factors and the card transaction volume of a merchant. To do so, multiple linear regression is used. The technique estimates the relationship between a dependent variable and a set of covariates, also called explanatory variables. It explains how the dependent variable changes when one explanatory variable is changed while the other variables are held fixed (Montgomery et al., 2015, p.2).

2.1 The Multiple Regression Analysis Model

When the regression model includes two or more covariates, as in this study, it is called multiple linear regression. The model itself can be structural and/or predictive, and the data used to obtain the final model can be either observational or experimental. The difference lies in whether it is possible to intervene in the events that produce the data points: experimental data can be described as staged data, whereas observational data are observations of events that occurred without intervention.

The structural perspective assumes that the covariates affect the dependent variable, whereas in the predictive interpretation the covariates do not have to influence the dependent variable directly in order to have a predictive effect. This study is performed from both a structural and a predictive perspective, using observational data.

The multiple regression model is defined as (Montgomery et al., 2015, p.69).

$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + e_i, \qquad i = 1, 2, \ldots, n \tag{1}$$

where $y_i$ is the dependent variable for observation $i$, $n$ is the number of observations, $k$ is the number of covariates and $x_{ij}$ is the value of covariate $j$ for observation $i$. The error term $e_i$ is needed since the relationship between the covariates and the dependent variable is almost never perfectly linear. The coefficients $\beta_j$ are unknown parameters, which are to be estimated from the data.

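This can also be written in matrix form

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \tag{2}$$

where the terms $\mathbf{Y}$, $\mathbf{X}$, $\boldsymbol{\beta}$ and $\mathbf{e}$ are

$$\mathbf{Y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \tag{3}$$

$$\mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix} \tag{4}$$

$$\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \vdots \\ \beta_k \end{pmatrix} \tag{5}$$

$$\mathbf{e} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix} \tag{6}$$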

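The vector $\mathbf{Y}$ holds the dependent random variable and has dimension $n \times 1$. The covariates are written in the $n \times (k + 1)$ matrix $\mathbf{X}$, where the column of ones corresponds to the intercept $\beta_0$; accordingly, $\boldsymbol{\beta}$ has dimension $(k + 1) \times 1$.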

2.1.1 Ordinary Least Squares Estimation

In order to estimate the unknown parameters $\beta_j$ in the linear regression model, one can apply Ordinary Least Squares (OLS) estimation (Montgomery et al., 2015, p.70). The OLS estimate $\hat{\beta}_j$ of $\beta_j$ is calculated by minimizing the sum of the squared residuals, $\sum_{i=1}^{n} \hat{e}_i^2$, where each residual is the difference between the observed response $y_i$ and the fitted response $\hat{y}_i$. For the OLS estimator to be the Best Linear Unbiased Estimator (BLUE), certain conditions need to be satisfied (see Section 2.1.3). If they are, no other linear unbiased estimator has a smaller variance, so the estimates describe the relationship between the covariates and the dependent variable as precisely as any linear unbiased method can. Estimates closer to zero indicate less influence, and estimates far from zero correspond to strong influence, relative to the scale of the covariate (Lang, 2015, p.7).

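Mathematically, the minimum is attained when the residuals are orthogonal to the columns of $\mathbf{X}$, which gives the normal equations

$$\mathbf{X}^{t}\hat{\mathbf{e}} = \mathbf{0} \tag{7}$$

where

$$\hat{\mathbf{e}} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}} \tag{8}$$

Solving the normal equations, provided that $\mathbf{X}^{t}\mathbf{X}$ is invertible, gives the estimate of $\boldsymbol{\beta}$ as

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{t}\mathbf{X})^{-1}\mathbf{X}^{t}\mathbf{Y} \tag{9}$$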
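The estimator in (9) can be illustrated with a minimal NumPy sketch on simulated data. This is not the thesis's own code: the helper name `ols_normal_equations`, the simulated covariates and the coefficients in `true_beta` are hypothetical. The sketch solves the normal equations directly instead of forming the inverse, and checks that the residuals satisfy (7).

```python
import numpy as np


def ols_normal_equations(X_cov: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return beta_hat = (X^t X)^{-1} X^t Y as in eq. (9).

    X_cov holds the k covariates (n x k). A column of ones is prepended,
    as in eq. (4), so beta_hat[0] is the intercept beta_0.
    """
    X = np.column_stack([np.ones(X_cov.shape[0]), X_cov])
    # Solving X^t X beta = X^t Y is numerically safer than inverting X^t X.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data: 200 observations, two covariates, known coefficients.
rng = np.random.default_rng(42)
X_cov = rng.normal(size=(200, 2))
true_beta = np.array([1.0, 2.0, -0.5])
y = true_beta[0] + X_cov @ true_beta[1:] + rng.normal(scale=0.1, size=200)

beta_hat = ols_normal_equations(X_cov, y)

# The residuals e_hat = Y - X beta_hat should be orthogonal to X, eq. (7).
X = np.column_stack([np.ones(200), X_cov])
e_hat = y - X @ beta_hat
print(np.round(beta_hat, 2))
print(np.allclose(X.T @ e_hat, 0.0, atol=1e-8))
```

In practice `np.linalg.lstsq` returns the same estimate and is more robust when $\mathbf{X}^{t}\mathbf{X}$ is close to singular.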


2.1.2 Variables

The covariates are classified into two categories: standard variables and dummy variables. A standard variable is quantitative and takes a real value, whereas a dummy variable is qualitative and is also called an indicator. A dummy is therefore a binary variable, taking the value one or zero depending on the information in the data (Lang, 2015). A dummy variable could, for example, be "advertising", where the covariate is set to one if the company is registered as not wanting to receive advertising and to zero otherwise. In order to evaluate and compare the estimates for a group of dummy variables, one of the categories must be removed, for example "non-advertising". The estimates for the remaining categories are then interpreted relative to the removed category. This removed dummy is called the benchmark, since the other coefficients within the group are benchmarked against it. Removing it also avoids perfect multicollinearity, because the dummies of a group would otherwise sum to the column of ones that represents the intercept (Montgomery et al., 2015).
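The benchmark coding described above can be sketched in plain Python. The helper `dummy_encode` and the example categories are hypothetical and chosen only for illustration; the sketch shows that the benchmark category gets no column and that every other category gets one 0/1 column.

```python
def dummy_encode(values, benchmark):
    """Turn a categorical covariate into 0/1 dummy columns.

    The benchmark category gets no column of its own: its effect is
    absorbed by the intercept, and each remaining dummy's coefficient is
    read relative to it. Dropping it avoids perfect multicollinearity,
    since the full set of dummies would sum to the column of ones.
    """
    categories = sorted(set(values) - {benchmark})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical covariate with two categories; "non-advertising" is the benchmark.
status = ["advertising", "non-advertising", "advertising", "non-advertising"]
cols, rows = dummy_encode(status, benchmark="non-advertising")
print(cols)  # ['advertising']
print(rows)  # [[1], [0], [1], [0]]

# Hypothetical three-category covariate with "retail" as the benchmark.
industry = ["restaurant", "retail", "salon", "retail"]
cols3, rows3 = dummy_encode(industry, benchmark="retail")
print(cols3)  # ['restaurant', 'salon']
print(rows3)  # [[1, 0], [0, 0], [0, 1], [0, 0]]
```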

2.1.3 Important Assumptions

When applying the linear regression model, the following five basic assumptions are made (Montgomery et al., 2015, p.20):

1. The first assumption is that the dependent variable Y can be written as a linear function of the independent variables X, plus an error term.

2. The second assumption is that the expected value of the error term, e, is equal to zero. This means that the estimator of β is unbiased.

$$E(e_i) = 0 \tag{10}$$

3. The third assumption is that the regression is homoscedastic, meaning that all the error terms have the same variance and that there are no correlations between them. This can be written as

$$E(e_i^2) = \sigma^2 \tag{11}$$

4. The fourth assumption is that the covariates are considered to be fixed in repeated samples, meaning that they are deterministic.

5. The fifth assumption is that there is no perfect multicollinearity and that the number of observations is greater than the number of covariates.

2.2 Reducing the Model

When building a regression model, one important part is deciding which covariates should be included. This is done by introducing more covariates than might be needed, evaluating them statistically against each other, and then excluding certain covariates according to these tests. This step is important, since excluding too many covariates might lead to a model that does not explain the relationship well enough, while including too many may result in a model that is overfitted to the data (Frost, 2013). An overfitted model appears to capture relationships between covariates where there are none, or only weak ones. Various methods and measures are used to evaluate the covariates, removing those that contribute little to the regression model while keeping those that give a high degree of explanation and small residuals (Montgomery et al., 2015, p.328).

2.2.1 R², Adjusted R² and partial η²

R² is an estimate of how much of the variation in the dependent variable is explained by the covariates. It is a measure of goodness of fit, evaluating how well the model explains this relationship. R² is defined as

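$$R^2 = \frac{|\tilde{e}|^2 - |\hat{e}|^2}{|\tilde{e}|^2} \tag{12}$$

where

$$|\tilde{e}|^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad |\hat{e}|^2 = \sum_{i=1}^{n} \left(y_i - (\mathbf{X}\hat{\boldsymbol{\beta}})_i\right)^2 \tag{13}$$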

In words, $|\hat{e}|^2$ is the sum of squared residuals of the model with the covariates, and $|\tilde{e}|^2$ is the sum of squared residuals of the model without covariates, that is, fitted only on the intercept. R² can therefore be interpreted as the share of the variation in the dependent variable that is explained by the model. An important point to consider is that R² never decreases when covariates are added, and it will therefore often indicate the largest model to be the best fit (Lang, 2015, p.8).

Adjusted R² is a modified form of R² that is preferred in order to address this problem, since it penalizes models for adding more covariates. In other words, it accounts for the degrees of freedom lost when a new covariate is added (Montgomery et al., 2015).

Partial eta squared, η², is a related measure that calculates how much a single covariate contributes to reducing the error in the model, also known as the effect size. Unlike R², it measures the explanatory contribution of one covariate rather than of all of them (Montgomery et al., 2015).

R² and adjusted R² will be used in this study to evaluate the goodness of fit of the models.
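The measures above can be illustrated with a short NumPy sketch on simulated data. The function name and the data are hypothetical, and so is the adjusted R² formula used, $1 - \frac{|\hat{e}|^2/(n-k-1)}{|\tilde{e}|^2/(n-1)}$: it is the common textbook definition, since the text does not state one. The example adds an irrelevant covariate to show that R² cannot decrease while the adjusted value carries a penalty.

```python
import numpy as np


def r2_and_adjusted_r2(X_cov: np.ndarray, y: np.ndarray):
    """R^2 from eq. (12) and the usual degrees-of-freedom adjusted R^2."""
    n, k = X_cov.shape
    X = np.column_stack([np.ones(n), X_cov])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)   # |e_hat|^2: model with covariates
    tss = np.sum((y - y.mean()) ** 2)       # |e_tilde|^2: intercept only
    r2 = (tss - rss) / tss
    # Each sum of squares is divided by its degrees of freedom, so a
    # covariate that explains little can lower the adjusted value.
    adj_r2 = 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))
    return r2, adj_r2


# Hypothetical data: y depends on x1 only; "noise" is an irrelevant covariate.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
noise = rng.normal(size=n)
y = 3.0 + 1.5 * x1 + rng.normal(size=n)

r2_small, adj_small = r2_and_adjusted_r2(x1[:, None], y)
r2_big, adj_big = r2_and_adjusted_r2(np.column_stack([x1, noise]), y)
print(r2_small, r2_big)    # R^2 cannot decrease when a covariate is added
print(adj_small, adj_big)  # the adjusted values penalize the extra covariate
```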

2.2.2 Akaike Information Criterion

The Akaike Information Criterion (AIC) is used when choosing which covariates to include in the model. Unlike hypothesis testing, AIC estimates the quality of each model relative to the other candidate models. In other words, AIC gives no information about the absolute quality of a model; instead, it sorts out irrelevant covariates by suggesting that the preferred model is the one that minimizes (Lang, 2015, p.22)

$$\mathrm{AIC} = n\ln\left(|\hat{e}|^2\right) + 2k \tag{14}$$

where $k$ is the number of coefficients, $n$ is the number of observations and $|\hat{e}|^2$ is the sum of the squared residuals.
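A sketch of AIC-based comparison of candidate models, using (14) directly, is shown below. The candidate covariate sets, the helper name `aic` and the data are hypothetical. Note that with an irrelevant extra covariate, AIC will occasionally prefer the larger model, since its penalty per coefficient is only 2; the block therefore only compares the clearly wrong model against the right one.

```python
import numpy as np


def aic(X_cov: np.ndarray, y: np.ndarray) -> float:
    """AIC as written in eq. (14): n * ln(RSS) + 2k.

    k counts all coefficients, including the intercept. The more common
    form n * ln(RSS / n) + 2k differs only by the constant n * ln(n), so
    both rank candidate models on the same data identically.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), X_cov])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    k = X.shape[1]
    return n * np.log(rss) + 2 * k


# Hypothetical candidates: y depends on x1 only.
rng = np.random.default_rng(1)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.0 * x1 + rng.normal(size=n)

candidates = {
    "x1": x1,
    "x2": x2,
    "x1 + x2": np.column_stack([x1, x2]),
}
scores = {name: aic(X_cov, y) for name, X_cov in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```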


2.2.3 Bayesian Information Criterion

The Bayesian Information Criterion (BIC) is another method, very similar to AIC, defined as

$$\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L}) \tag{15}$$

where $\hat{L}$ is the maximized value of the likelihood function of the model.

It has been argued that AIC is a better tool than BIC, since it chooses the optimal model in terms of mean squared error, and previous studies have found AIC to be the superior criterion (Burnham and Anderson, 2004). Even so, BIC will be used as an additional criterion when testing models against each other.
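For a linear model with independent, normally distributed errors, the maximized log-likelihood in (15) has the closed form $\ln\hat{L} = -\frac{n}{2}\left(\ln\frac{2\pi|\hat{e}|^2}{n} + 1\right)$. The sketch below relies on that normality assumption; the helper names and data are hypothetical, and $k$ counts only the regression coefficients (counting the error variance as well would add the same constant $\ln n$ to every model and not change the ranking). Since $\ln n > 2$ whenever $n \geq 8$, BIC penalizes each extra coefficient more heavily than AIC.

```python
import numpy as np


def max_log_likelihood(rss: float, n: int) -> float:
    """Maximized log-likelihood of a linear model with i.i.d. normal
    errors, using the ML variance estimate sigma^2 = RSS / n."""
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)


def bic(X_cov: np.ndarray, y: np.ndarray) -> float:
    """BIC as in eq. (15): k * ln(n) - 2 * ln(L_hat)."""
    n = len(y)
    X = np.column_stack([np.ones(n), X_cov])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    k = X.shape[1]  # number of coefficients, counted as in eq. (14)
    return k * np.log(n) - 2 * max_log_likelihood(rss, n)


# Hypothetical data: y depends on x1 only.
rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.0 * x1 + rng.normal(size=n)
print(bic(x1, y), bic(x2, y), bic(np.column_stack([x1, x2]), y))
```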

2.2.4 Mean Squared Error

The mean squared error (MSE) measures the average squared difference between an estimate and what is being estimated. For an estimator, it equals the variance of the estimator plus its squared bias, so for an unbiased estimator the MSE equals the variance. Applied to the fitted values of a regression model, it is the average of the squared errors:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(\hat{Y}_i - Y_i\right)^2 \tag{16}$$
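Both readings of the MSE can be checked numerically. The sketch below computes (16) for a few fitted values and then, for a deliberately biased estimator, confirms the decomposition into variance plus squared bias. The estimator `0.8 * sample mean`, the true value `theta` and the simulation sizes are hypothetical choices made only for illustration.

```python
import numpy as np


def mse(y_hat, y) -> float:
    """Mean squared error, eq. (16)."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((y_hat - y) ** 2))


print(mse([2.0, 4.0, 6.0], [1.0, 4.0, 8.0]))  # (1 + 0 + 4) / 3

# Bias-variance decomposition for an estimator: a hypothetical, deliberately
# biased estimator 0.8 * (sample mean), evaluated on 10,000 simulated samples.
rng = np.random.default_rng(3)
theta = 5.0
estimates = 0.8 * rng.normal(theta, 2.0, size=(10_000, 20)).mean(axis=1)
mse_est = np.mean((estimates - theta) ** 2)
variance = np.var(estimates)                  # population form (ddof=0)
bias_sq = (np.mean(estimates) - theta) ** 2
print(np.isclose(mse_est, variance + bias_sq))
```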

2.2.5 Hypothesis Testing

Hypothesis testing is the statistical procedure of testing two opposing hypotheses against each other. The null hypothesis usually states that an effect is zero, or that there is no difference at all, while the alternative hypothesis is the statement one wants to conclude is true (Minitab, 2016). To decide between them, the p-value and the significance level are used.

2.2.6 P-value and Confidence Intervals

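The hypothesis tested is whether all of the coefficients $\beta_1, \ldots, \beta_k$ are equal to zero (Lang, 2015), that is

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \tag{17}$$

against the alternative that $\beta_j \neq 0$ for at least one $j$.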

The test gives an indication of the overall adequacy of the model, since it tests whether there is a linear relationship between the response variable $y$ and any of the regressor variables $X_i$. It is based on the F-statistic, whose distribution is known when the null hypothesis is true, and the p-value is derived from it.

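For a single coefficient, testing $H_0: \beta_j = \beta_{j0}$, the F-statistic is

$$F = \frac{(\hat{\beta}_j - \beta_{j0})^2}{SE(\hat{\beta}_j)^2} \tag{18}$$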

If $X$ is a random variable following the known F-distribution, then the p-value is the probability that $X$ is greater than the computed F-value, that is

$$P(X > F), \quad \text{where } X \sim F(q, n - k - 1) \tag{19}$$

Here $n$ is the number of data points, $k$ is the number of covariates in the model and $q$ is the number of coefficients restricted under the null hypothesis ($q = k$ for the overall test in (17)). A significance level is chosen, and if the p-value is smaller than this level, the null hypothesis is rejected. The significance level is commonly set to 5%; a p-value higher than this level means that the null hypothesis cannot be rejected.
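The overall test of (17) can be sketched as follows. It uses the standard overall F-statistic, $F = \frac{(|\tilde{e}|^2 - |\hat{e}|^2)/k}{|\hat{e}|^2/(n-k-1)}$, which the text does not spell out; the function name and simulated data are hypothetical. If SciPy is available, the p-value in (19) would be `scipy.stats.f.sf(f_stat, k, n - k - 1)`; SciPy is not used in the block itself.

```python
import numpy as np


def overall_f_test(X_cov: np.ndarray, y: np.ndarray):
    """F statistic for H0: beta_1 = ... = beta_k = 0, eq. (17).

    Under H0 (and the assumptions of Section 2.1.3 with normal errors) it
    follows F(q, n - k - 1) with q = k, as in eq. (19).
    """
    n, k = X_cov.shape
    X = np.column_stack([np.ones(n), X_cov])
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)   # unexplained variation
    tss = np.sum((y - y.mean()) ** 2)       # total variation around the mean
    f_stat = ((tss - rss) / k) / (rss / (n - k - 1))
    return f_stat, (k, n - k - 1)


# Hypothetical data: two of the three covariates matter.
rng = np.random.default_rng(4)
n = 150
X_cov = rng.normal(size=(n, 3))
y = 1.0 + X_cov @ np.array([0.8, 0.0, -0.4]) + rng.normal(size=n)
f_stat, dof = overall_f_test(X_cov, y)
print(f_stat, dof)
```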

The t-value

Another value, closely related to the F-value, is the t-value. It is used for making inference about a single coefficient in a linear regression. It works in the same way as the F-value: the p-value of the two-sided t-test is the probability that a t-distributed variable with $n - k - 1$ degrees of freedom exceeds the computed t-value in absolute value. The t-value is defined as

$$t = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)} \tag{20}$$
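Equation (20) needs $SE(\hat{\beta}_i)$, which under the assumptions in Section 2.1.3 is the square root of the diagonal of $\hat{\sigma}^2(\mathbf{X}^{t}\mathbf{X})^{-1}$ with $\hat{\sigma}^2 = |\hat{e}|^2/(n-k-1)$. The sketch below computes this on hypothetical data with a hypothetical helper name; a two-sided p-value could then be obtained with SciPy as `2 * scipy.stats.t.sf(abs(t), n - k - 1)`, which the block does not use.

```python
import numpy as np


def coefficient_t_values(X_cov: np.ndarray, y: np.ndarray):
    """Return beta_hat, SE(beta_hat) and t = beta_hat / SE(beta_hat), eq. (20)."""
    n = X_cov.shape[0]
    X = np.column_stack([np.ones(n), X_cov])
    p = X.shape[1]                          # k + 1 coefficients
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y            # eq. (9)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - p)    # unbiased error variance, n - k - 1 dof
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    return beta_hat, se, beta_hat / se


# Hypothetical data: x1 affects y, x2 has no effect.
rng = np.random.default_rng(5)
n = 120
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.2 * x1 + rng.normal(size=n)
beta_hat, se, t = coefficient_t_values(np.column_stack([x1, x2]), y)
print(np.round(t, 2))
```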

2.3 Model Validation

2.3.1 Linearity

The residuals versus fitted plot is a scatter plot with the residuals on the y axis and the fitted values on the x axis. The plot is used to detect non-linearity, unequal variance and outliers. The goal is a linear relationship between the explanatory variables and the response variable, which is why the residuals should be spread evenly around a horizontal line in the plot. The variances of the error terms are equal when the residuals form a "horizontal band" around the zero line, and it can be concluded that there are no outliers if no residuals "stand out".

2.3.2 Normality

The QQ-plot (quantile-quantile plot) plots two sets of quantiles against each other in order to tell whether they come from the same distribution; if they do, the points form an approximately straight line. Any reference distribution can be used in a QQ-plot, so plotting the quantiles of the residuals against those of a normal distribution shows whether the error terms in the regression model are normally distributed.
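The quantile pairs behind a normal QQ-plot can be computed with the standard library's `statistics.NormalDist` and NumPy. The plotting positions $(i - 0.5)/n$ are one of several conventions, and summarizing how straight the plot is by the correlation of the pairs is an illustrative choice, not part of the method described above; the residual samples are simulated.

```python
from statistics import NormalDist

import numpy as np


def normal_qq_points(residuals):
    """Return (theoretical, sample) quantile pairs for a normal QQ-plot.

    The sample quantiles are the sorted, standardized residuals; the
    theoretical quantiles use plotting positions (i - 0.5) / n, which is
    one common convention among several.
    """
    r = np.sort(np.asarray(residuals, dtype=float))
    r = (r - r.mean()) / r.std(ddof=1)
    n = len(r)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, r


rng = np.random.default_rng(6)
corr = {}
# Hypothetical residuals: one normal sample and one strongly skewed sample.
for label, res in [("normal", rng.normal(size=500)),
                   ("skewed", rng.exponential(size=500))]:
    theo, samp = normal_qq_points(res)
    # Points close to a straight line give a correlation close to 1.
    corr[label] = np.corrcoef(theo, samp)[0, 1]
print(corr)
```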

2.4 Errors

2.4.1 Multicollinearity

Multicollinearity is when two or more covariates in a multiple regression model are almost perfectly linearly correlated, so that there is a near-linear dependence between them. Strong linear correlation inflates the variance of the estimated regression coefficients and should therefore be eliminated or further investigated. However, as the sample size increases the standard errors decrease, and hence multicollinearity is usually more of a problem for small samples.

The Variance Inflation Factor (VIF) measures how much the variance of an estimated coefficient is inflated by linear dependence between the covariates (Allison, 2012). A VIF value larger than 10 indicates multicollinearity in the model, and the corresponding covariate should be removed or further investigated (Montgomery et al., 2015, p.296). The VIF method is applied stepwise until no VIF value is larger than 10, indicating no serious linear dependence between the covariates.

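$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2} \tag{21}$$

where $R_j^2$ is the R² obtained when covariate $j$ is regressed on the other covariates.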
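The VIF in (21) and the stepwise procedure described above can be sketched as follows. The helper names `vif` and `drop_high_vif` and the simulated near-duplicate covariate are hypothetical; the threshold of 10 is taken from the text. Removing the covariate with the largest VIF first is one common way to carry out the stepwise step, not necessarily the exact procedure used in this study.

```python
import numpy as np


def vif(X_cov: np.ndarray) -> np.ndarray:
    """VIF_j = 1 / (1 - R_j^2), eq. (21), where R_j^2 comes from regressing
    covariate j on all other covariates plus an intercept."""
    n, k = X_cov.shape
    values = []
    for j in range(k):
        target = X_cov[:, j]
        others = np.column_stack([np.ones(n), np.delete(X_cov, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        rss = np.sum((target - others @ coef) ** 2)
        tss = np.sum((target - target.mean()) ** 2)
        r2_j = 1.0 - rss / tss
        values.append(1.0 / (1.0 - r2_j))
    return np.array(values)


def drop_high_vif(X_cov: np.ndarray, names, threshold: float = 10.0):
    """Repeatedly remove the covariate with the largest VIF until all VIFs
    are at most `threshold`; return the names of the kept covariates."""
    X, names = X_cov.copy(), list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


rng = np.random.default_rng(8)
n = 300
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + 0.1 * rng.normal(size=n)   # hypothetical near-copy of x1
X_cov = np.column_stack([x1, x2, x3])
print(np.round(vif(X_cov), 1))
print(drop_high_vif(X_cov, ["x1", "x2", "x3"]))
```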

2.4.2 Endogeneity

The error term of the OLS model is assumed to have a conditional mean of zero given the covariates (Lang, 2015, p.28), that is

$$E[\mathbf{e} \mid \mathbf{X}] = 0 \tag{22}$$

A violation of this assumption is called endogeneity, and it means that the OLS estimates are biased and not valid. Endogeneity can have a number of causes, some of which are described below.

Simultaneity

Simultaneity is when there is a two-way effect between the dependent variable and a covariate (Lang, 2015, p.26-27). An example is estimating consumer demand for a product with the price as a covariate: the price affects demand, but demand also affects the price. This mutual effect ends up in the error term, causing endogeneity.

Sample selection bias

In order to generate a regression model, one needs observations collected from somewhere, and it is usually impossible to use data representing the whole population.

Since samples are used, they need to be selected randomly; if they are not, this is an important factor to consider when evaluating the model. Sample selection bias can then cause endogeneity, since the sample selection may correlate with an unmeasured covariate that is captured by the error term (Lang, 2015, p.26).

Measurement errors

If there is a measurement error in the sample data, it will affect the end result and cause correlation between the independent variables and the error terms (Lang, 2015, p.28). Suppose the model is

$$y_i = \alpha + \beta X_i + e_i \tag{23}$$

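With a measurement error $z_i$, the true covariate $X_i$ differs from the observed value $X_i^{*}$:

$$X_i = X_i^{*} + z_i \tag{24}$$

Inserting this into the model results in

$$y_i = \alpha + \beta X_i^{*} + (e_i + \beta z_i) = \alpha + \beta X_i^{*} + \xi_i \tag{25}$$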

The observed covariate and the new error term $\xi_i$ are now correlated through the measurement error $z_i$, scaled by $\beta$, which causes endogeneity.
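A small simulation makes the consequence concrete. It assumes classical measurement error, meaning the observed covariate equals the true value plus independent noise (so the $z_i$ above is minus that noise); the sample size, coefficients and noise variances are hypothetical. For this setup, standard theory gives an OLS slope that shrinks toward zero by the factor Var(X) / (Var(X) + Var(noise)), here one half, a pattern known as attenuation bias.

```python
import numpy as np

rng = np.random.default_rng(7)
n, alpha, beta = 100_000, 1.0, 2.0

x_true = rng.normal(0.0, 1.0, n)             # correctly measured covariate
y = alpha + beta * x_true + rng.normal(0.0, 1.0, n)
# Hypothetical classical measurement error: the observed covariate is the
# true value plus independent noise with the same variance.
x_observed = x_true + rng.normal(0.0, 1.0, n)


def ols_slope(x, y):
    # Simple-regression OLS slope: Cov(x, y) / Var(x).
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


b_true = ols_slope(x_true, y)
b_obs = ols_slope(x_observed, y)
# Expected for this setup: b_true near beta = 2, b_obs near beta / 2 = 1.
print(round(b_true, 2), round(b_obs, 2))
```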

Missing relevant covariates

If relevant covariates are missing from the model, their effect is absorbed by the error term, and if they are correlated with the included covariates this causes endogeneity. The first step of generating a regression model is therefore to identify and include as many relevant covariates as possible. As described earlier, the covariates then need to be tested against each other, and some of them are later removed from the final model (Lang, 2015, p.27).

2.4.3 Heteroscedasticity

The Scale-Location plot shows the square root of the absolute standardized residuals on the y axis and the fitted values on the x axis. The plot is used to detect heteroscedasticity, the opposite of homoscedasticity, which is when all random variables in a data set have equal variance. In other words, if the error terms do not have constant variance, they are said to be heteroscedastic. An assumption when generating a regression model is that the error terms have the same variance, in which case the points in the Scale-Location plot are spread equally and randomly around a horizontal line.

Figure 1: Hetero- vs homoscedasticity. (a) Heteroscedasticity (Kim, 2015); (b) Homoscedasticity (Kim, 2015).

There are different ways to reduce heteroscedastic error terms, some of which are described below.

Reformulate the model

To reduce heteroscedasticity, one can transform the variables of the model, for example by taking the logarithm of the response variable, which reduces the spread of the data set. Another way to reformulate the model is to add more explanatory variables (Lang, 2015, p.18).

White’s Consistent Variance Estimator

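This estimator is an alternative to the standard covariance matrix estimator, used when deriving the standard errors of a heteroscedastic regression. It is given by

$$\widehat{\mathrm{Cov}}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}^{t}\mathbf{X})^{-1}\mathbf{X}^{t}D(\hat{e}_i^2)\mathbf{X}(\mathbf{X}^{t}\mathbf{X})^{-1} \tag{26}$$

where $D(\hat{e}_i^2)$ is the diagonal matrix with the squared residuals on its diagonal. The regression is still performed using OLS, but the standard errors are estimated using White's consistent variance estimator (Lang, 2015, p.18).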
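Equation (26) is sketched below in NumPy. This is the basic form often called HC0; statistical packages also offer small-sample variants (HC1 to HC3) that the text does not describe. The helper name and the simulated error structure, in which the error standard deviation grows with |x|, are hypothetical, chosen so that the classical standard error of the slope is too small.

```python
import numpy as np


def ols_with_white_se(X_cov: np.ndarray, y: np.ndarray):
    """OLS estimates with classical and White (HC0) standard errors, eq. (26)."""
    n = X_cov.shape[0]
    X = np.column_stack([np.ones(n), X_cov])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    # X^t D(e_i^2) X, computed by weighting the rows of X instead of
    # building the n x n diagonal matrix D.
    meat = X.T @ (X * (e_hat ** 2)[:, None])
    cov_white = XtX_inv @ meat @ XtX_inv
    sigma2_hat = e_hat @ e_hat / (n - X.shape[1])
    cov_classical = sigma2_hat * XtX_inv
    return beta_hat, np.sqrt(np.diag(cov_classical)), np.sqrt(np.diag(cov_white))


rng = np.random.default_rng(9)
n = 5000
x = rng.normal(size=n)
# Hypothetical heteroscedastic errors: the standard deviation grows with |x|.
y = 1.0 + 0.5 * x + np.abs(x) * rng.normal(size=n)
beta_hat, se_classical, se_white = ols_with_white_se(x[:, None], y)
print(se_classical[1], se_white[1])
```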


3 Business Model

The infrastructure for card payments often includes four parties: the issuer, the acquirer, the cardholder and the merchant (Arvidsson, 2009, p.10-11). The roles and business relationships between the four parties vary during the different steps of a transaction. The card issuer is a financial institution, often a bank, which provides the cardholder (consumer) with a card and a card agreement. This means that the issuer is financially responsible for risks such as card fraud. To hold a card, the cardholder pays a yearly fee to the card issuer. Every time the cardholder pays for a product or service provided by the merchant, the transaction is transferred from the cardholder to the issuer. The issuer then transfers the money for the corresponding transaction from the consumer's bank account to the acquirer, which pays an interchange fee to the issuer. The acquirer is the link between the issuer and the merchant and is therefore responsible for transferring the amount to the merchant's bank account (Arvidsson, 2009, p.10-11). Due to the financial risk the acquirer takes on during the transaction, it charges the merchant a fee (Albatross, 2017a).

Figure 2: Four-party model (Albatross, 2017a)

The infrastructure for card payments is complex, since the flow of a transaction through the system depends on the scenario and set-up. The cost components vary with the type of transaction and card. Transactions are classified by where the parties are located, and the most common types are domestic, intra-regional and inter-regional. A transaction is domestic when the merchant and the issuer are located in the same country. It is intra-regional when the merchant and the issuer are located within the same card scheme region but not in the same country, e.g. a merchant in Sweden and an issuer in Germany. The third type is inter-regional, when the merchant and the issuer are in different card scheme regions, e.g. a merchant in Denmark and an issuer in Canada (Albatross, 2017a).
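The three transaction types amount to a simple decision rule. The sketch below is illustrative only: it assumes a hypothetical country-to-region table, since the actual card scheme regions are defined by the schemes themselves and are not listed in this text.

```python
# Hypothetical country-to-card-scheme-region mapping, for illustration only.
# Real card schemes define their own regions; these assignments are assumptions.
SCHEME_REGION = {
    "SE": "Europe",
    "DE": "Europe",
    "DK": "Europe",
    "CA": "North America",
}


def transaction_type(merchant_country: str, issuer_country: str) -> str:
    """Classify a card transaction as domestic, intra-regional or inter-regional."""
    if merchant_country == issuer_country:
        return "domestic"
    if SCHEME_REGION[merchant_country] == SCHEME_REGION[issuer_country]:
        return "intra-regional"
    return "inter-regional"


print(transaction_type("DK", "DK"))  # domestic
print(transaction_type("SE", "DE"))  # intra-regional
print(transaction_type("DK", "CA"))  # inter-regional
```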


Besides the different types of transactions, there are also several types of cards, which are classified as general, single branded/private label and co-branded. General cards, such as MasterCard, Visa and DinersClub, have the same functionality at all merchants. A single branded/private label card is more limited in the sense that it is only accepted at a few merchants or only in one merchant chain; petrol cards and grocery store cards, for example, belong to this category. A co-branded card is a single card with multiple functions and brands at co-brand merchant premises, e.g. Circle K and MasterCard (Albatross, 2017a).

As mentioned above, an acquirer is exposed to a high level of risk, for several reasons. A major risk is exposure to charge-backs, since the acquirer provides prepaid services where there is no real guarantee that the merchant actually delivers. Another major risk is penalty fees and sanctions, which relate to the strict restrictions on how a party handles sensitive data. Delayed settlements, FX risks and fraud affecting several parties are other risks the acquirer might face. There is also a political risk, since a change in tax laws and regulations might affect the total card transaction volume (Ross et al., 2013, p.965). Because acquiring carries a high risk, it is important that the merchant is reliable and financially stable (Albatross, 2017a).


4 Method

This section describes the methods used in the thesis, the variables included and the reasons for including them.

Following the five steps given in Section 10.3 of Montgomery, the data and the covariates can be reduced stepwise. The five steps are:

1. Fit the largest model possible to the data.

2. Perform a thorough analysis of the model.

3. Determine if a transformation of the response variable or of some of the regressors is necessary.

4. Compare and evaluate the best models recommended by each criterion.

5. Perform a thorough analysis of the best models, usually two to three models.

4.1 Data Pre-Processing

The majority of the data was obtained from Albatross, which could provide reliable data on its current customers. To get a deeper understanding, meetings and interviews were conducted with employees at Albatross. These gave a general understanding of the market itself, the current customers and Albatross's specific business plan, all of which was applied to the research questions described in the introduction. Data points for some of the variables were obtained from Statistics Denmark's website.

Since the explanatory variables are all, in one way or another, expected to affect the dependent variable, it was key to fully understand how their data were obtained. A thorough understanding of the data was also needed in order to make assumptions and, later, to remove and group data based on mathematical tests and a theoretically sound background.

Initially, the study consisted of X observations, which was reduced to Y observations because of insufficient data for some of the chosen covariates. For example, any observations with yearly revenues below Z DKK were removed.

4.2 Variable Selection

The aim is to obtain a prediction model in order to investigate which non-financial factors could potentially affect a company's total card transaction volume. The prediction model will be applied to Albatross's customers, which is why the explanatory and response variables are based on its customers.

4.2.1 Dependent Variable

Total card transaction volume

The dependent variable is set to the total card transaction volume, since the aim is to find the most profitable customers for Albatross. A profitable customer is defined with Albatross's business model in mind, where Albatross charges its customers a fee. As an acquirer, Albatross faces a number of risks, since it provides prepaid services where the transactions contain sensitive data (Albatross, 2017a). Because of these major risks for an acquirer, it is important that its customers, the merchants, are reliable and financially stable.

The transactions are taken from 2016 and measured in DKK. The variable is simply the total card transactions throughout the year; for customers that registered after the start of January 2016, the transactions have been added up and then divided by the number of active months. The number of active months is the time, in months, from the first active week until the end of December 2016.

Seasonal merchants, which only have transactions during a few months of the year, will show a smaller transaction volume, since the number of active months is still counted from the first active week until the end of December.
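A minimal sketch of this normalization, under two stated assumptions: active months are counted as whole calendar months, with the month of the first active week counted in full, and the division is applied to every merchant (the text only states it for merchants registered after January 2016). The function names are hypothetical. The last call shows why a seasonal merchant appears smaller: months without sales still count once the merchant is active.

```python
from datetime import date


def active_months(first_active: date, year_end: date = date(2016, 12, 31)) -> int:
    """Whole calendar months from the first active week through year end.

    Assumption: the month containing the first active week counts in full.
    """
    return (year_end.year - first_active.year) * 12 + (year_end.month - first_active.month) + 1


def monthly_volume(total_dkk: float, first_active: date) -> float:
    """Total 2016 card volume in DKK divided by the number of active months."""
    return total_dkk / active_months(first_active)


print(monthly_volume(1_200_000, date(2016, 1, 4)))  # active all year: 100000.0
print(monthly_volume(600_000, date(2016, 7, 4)))    # joined in July: 100000.0
# Seasonal merchant: 600 000 DKK earned in June-August, but 7 active months
print(round(monthly_volume(600_000, date(2016, 6, 6)), 2))  # 85714.29
```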

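The dependent variable was log-transformed in order to spread the observations more evenly and to obtain normally distributed residuals. This was done in the initial model, as below

$$
\log(y_i) = \sum_{j=0}^{k} x_{ij}\beta_j + e_i, \quad i = 1, \dots, n \tag{27}
$$

where $x_{i0} = 1$, so that $\beta_0$ is the intercept, and $k$ is the number of regressors.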

4.2.2 Covariates

Local Card Transactions

The total card transaction volume of each customer directly affects Albatross's revenue because of its pricing model. However, because of the local debit scheme in Denmark, which is separate from the international card schemes, there is an additional factor to consider when investigating the card transaction volume. The variable is relevant, since there are arguments for an increasing trend towards domestic processing of payment transactions (Capgemini, 2011). For transactions made within the local scheme, Albatross does not act as the acquirer and therefore makes no profit on these specific transactions (Albatross, 2017b). However, a payment made with a local debit card can also be processed within the international card scheme, in which case Albatross acts as the acquirer and receives its fee for the transaction.

This covariate measures the share of local debit card payments out of the total card transaction volume.

Hypothesis 1: The local card transaction volume will affect the total card transaction volume, either positively or negatively.

Age

It takes time to build a company in terms of costs, revenue and loyal customers. According to the S-curve, a start-up company has large costs in proportion to its revenues. This varies by industry, but there are some general costs, such as research costs, employee expenses and insurance costs, which affect the financial survival of a company (Morah, 2017). This study does not investigate the costs of the merchants, only the outcome in the form of card transactions.

The belief is that there should be a clear correlation between the age of the merchant and its card transaction volume.


However, even though a company is well established in the market, it is always important to develop and adapt when other factors, such as fluctuations and digitalization, affect market conditions. Economies of scale and creative destruction tend to consolidate markets over time, with varying outcomes in different industries.

Age is measured from the day the company was first legally registered until the end of December 2016. Any companies registered at the beginning of 2017 were removed from the data set. This resulted in company ages ranging from 2 months to 113 years.

Hypothesis 2: The age of a company operating in a mature market will have a positive effect on the total card transaction volume, where an older company should generate larger revenues.

Number of Sales Locations

This study is based on observations of physical stores only, with no e-commerce, which makes the number of sales locations an interesting variable. It is an indicator of the number of distribution channels, the growth rate of the company, how exposed and flexible the company is towards its customers and its stability with respect to geographical fluctuations in demand. Larger corporations often use the number of newly opened stores as a measure of their current expansion and growth rate (H&M, 2016).

Any company with more than 50 registered sales locations was removed from the data set. A company with more than one sales location still has all of its stores registered at a single address, which could bias the result, since the geographical location of each store is not considered.

Hypothesis 3: The number of sales locations will have a positive effect on the dependent variable, where more stores or service centers should indicate merchants with a larger card transaction volume.

Average income

Albatross is highly dependent on its merchants attracting consumers and increasing their sales, which in turn also increases Albatross's revenues. The location of the business is therefore important, and one factor to consider is the average income of the people living in the area. Classical microeconomic models suggest that a person who earns more is likely to spend more on normal goods. By definition, demand for a normal good increases when consumer income rises; goods for which demand instead decreases when income rises are called inferior goods (Krugman and Wells, 2013, p.72, p.75). A normal good is often more expensive and desirable, which is why people who can afford more would choose to buy normal goods rather than inferior goods. Assuming the merchants supply normal goods, those situated in more attractive areas, where people on average earn more, should have larger yearly revenues.

Income also often depends on a person's age, education and gender. Average income is measured as the yearly average for both men and women of all ages in each region. A region with slightly more female residents might therefore have a lower value of the average income covariate. In addition, men and women might choose to spend their money on different things and use their credit cards in different types of stores. The same applies to age: a slightly younger population in a region will decrease the average income, but on the other hand, younger people tend to use their credit cards in preference to cash (Hayhoe et al., 2000). Even though this uncertainty exists, and there are no data points describing the specific customers' age, education and gender, it is reasonable to assume that a higher average income should affect the result positively.

Average income data were collected from (Denmark, 2016, table 212, page 218) and sorted by the 98 different regions in Denmark.

Hypothesis 4: A merchant situated in a region where the average income is higher, should also generate a larger total card transaction volume.

Population Density

Population size varies between the regions of Denmark. One assumption is that the more people live in a region, the higher the total consumption should be (Krugman and Wells, 2013, p.73, p.75). However, this does not have to be the case, since a high population density will probably also mean more competitors in the region. Moreover, since people commute, they do not necessarily spend their money on products and services in the region where they live.

Population density is the number of people living in the region divided by its area, measured in square kilometres. Cities and more densely populated areas should bring more visitors to a business and therefore increase the chance that they buy products or services. These data are also taken from (Denmark, 2016, table 6, page 26).

Hypothesis 5: A high population density should have a positive impact on the total card transaction volume.

Advertising

This variable tells whether or not the company is available to "directed marketing", which is appropriate to measure since Albatross uses outbound sales to acquire new customers. Non-advertising means that the potential customer has registered that they are not available for contact, so Albatross cannot contact them directly. The only way for such a potential customer to become a customer of Albatross is if they make an effort and contact Albatross themselves.

Also, a merchant unavailable to directed sales and marketing might miss out on opportunities in terms of marketing, collaborations with other companies or offers. On the other hand, this could also indicate a merchant that is more focused on its own growth and future revenues and that has a well worked-out business model. Some companies in the small and medium business segment do not feel that they necessarily need to grow in size, but they can still generate consistent revenues and card transactions.

Hypothesis 6: This variable will have an impact on the model, but it could be either a positive or negative estimate, where the result will provide more information on this specific covariate.


Business Form

This variable indicates the legal business form under which the companies are registered. The different business forms have different advantages and disadvantages, for example regarding taxation and legal rights. Which business form suits a specific company best depends on factors such as the size of the company, the number of employees and revenue (Berk and DeMarzo, 2011, p.3-5). Sole proprietorship, partnership, private limited company and limited company are the four major business forms.

In short, a sole proprietorship is a business owned and run by one person. It is a simple business form, since it is straightforward to set up, and it is usually a smaller business. However, there is no legal separation between the firm and the owner, which means that the owner has unlimited personal liability for any of the firm's debts. The only difference between a sole proprietorship and a partnership is that a partnership has two or more owners instead of one. In this case, all partners are liable for the firm's debts (Berk and DeMarzo, 2011, p.5). A private limited company is a company where the owners have limited liability but can at the same time run the business. A limited company is defined as a legal person, or legal entity, which separates the firm from its owners. This means that the owners are not liable for any debts or obligations of the limited company, and vice versa. The owners are in this case called shareholders; the number of owners is unlimited, and each owner holds shares of stock in the company.

This variable might therefore affect the total card transaction volume. Business forms with fewer than 20 observations each are grouped into "Other Business Forms".

The variables included in the initial regression model are

• Entrepreneurial Society

• Foreign Company

• Limited Company

• Non-profit Organization

• Partnership

• Private Limited Company

• Sole Proprietorship

• Union

• Other Business Forms
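The rule that business forms with fewer than 20 observations are merged into "Other Business Forms" can be sketched as below. The function name and the sample labels, including "Cooperative", are hypothetical.

```python
from collections import Counter


def group_rare_categories(labels, min_count=20, other="Other Business Forms"):
    """Replace every category with fewer than `min_count` observations by `other`."""
    counts = Counter(labels)
    return [label if counts[label] >= min_count else other for label in labels]


# Hypothetical sample: two common business forms and one rare one
labels = ["Sole Proprietorship"] * 25 + ["Limited Company"] * 30 + ["Cooperative"] * 3
grouped = group_rare_categories(labels)
print(Counter(grouped))
```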

Hypothesis 7: Given the size, in terms of number of employees and revenue, of the majority of Albatross's customers, limited company is one of the business forms that will have the largest positive impact on the total card transaction volume.

Industry

The industry in which a merchant is active will differ in terms of size, growth rate and competitive win rate. The estimates for the different industries should indicate which industries tend to use card payment solutions in preference to cash or invoice. The average cost of a product or service varies between industries, and the amount of a potential transaction is, in the eyes of a consumer, an important factor in terms of secure transfers.


Some industries will be more attractive in the market due to macroeconomic factors that vary over time. Such factors can be both political and financial, and both global and national. Current financial circumstances will attract or deter future employees and investors in certain industries. Universum's findings from the "Global Talent Survey" state that Danish college students look for a career with high future income and show less interest in working in the public sector (Barck, 2016). In this respect, the estimates for the different industries should indicate which industries are profitable, and therefore attractive to both employees and investors.

The industry is a variable which is classified by Albatross itself. An important assumption is that this classification is correct.

Hypothesis 8: Café/Restaurant/Bar is one of the industries that will have the largest positive impact on the card transaction volume, since it is an industry where card payments are frequently used.

Financial Information

This covariate indicates whether the company has previously been in financial distress and whether it has faced bankruptcy. The variable is divided into two dummies, where one indicates previous financial distress and the other no financial remarks. Financial distress can have various causes. Small business owners tend to make mistakes such as incorrect pricing, poor debt management or no real cash flow management (Ashe-Edmunds, 2017).

Hypothesis 9: It is possible for a company to recover and learn from previous mistakes, but this is unlikely, so the dummy indicating financial distress should yield a negative estimate.

Number of employees

A company usually strives for as high a revenue per employee as possible and will not hire new employees unless compelled to (Investopedia, 2017b). Companies that create higher shareholder value also show stronger employment growth (Koller et al., 11). The number of employees therefore gives an indication of productivity, the growth rate of the company, the size of the demand and the number of customers. In this study only physical stores are considered, and many of the merchants are in the service industry, which requires a larger number of employees.

Another aspect to consider when analyzing this variable is that employee costs are one of the largest expenses for companies (Investopedia, 2017b). Customers often value a company either for its good service or for its low prices. A company therefore needs to balance the service level, the cost of its employees and the output in the form of revenues. Even though these costs generally do not affect Albatross in the short term, they might in the long term, since they affect the future investments and expansion of a company.

The data points for the number of employees were given in intervals and were therefore coded as dummies.

Since most of Albatross’s customers are in the service industry this is appropriate to take into consideration.

Table 1: Interval for Number of Employees

Interval Number of employees

1 0

2 1

3 2-4

4 5-9

5 10-19

6 20-49

7 50-99

8 100-199

9 200-499

10 1000+

Hypothesis 10: The number of employees will have a positive, but not linear, impact on the total card transaction volume.

4.3 Initial Model

4.3.1 Classification and descriptive statistics

Each covariate is classified as either a standard or a dummy variable, depending on whether it is quantitative or qualitative. The classification of the covariates in the initial model is given below.

Table 2: Covariates for the Initial Model

Dependent variable

Total Revenues Quantitative

Covariates

Local Card Transactions Quantitative

Age Quantitative

Number of Sales Locations Quantitative

Average income Quantitative

Population Density Quantitative

Advertising Dummy

Business Form Dummy

Industry Dummy

Financial Information Dummy

Number of employees, interval Dummy

Table 3 below gives an overview of the collected sample data for the quantitative variables. Some information regarding local card transactions is left out for confidentiality reasons.
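Qualitative covariates enter the model as dummies, with one category left out as the benchmark (the benchmarks used are listed in Section 5.1). A plain-Python sketch of this coding, using a hypothetical `dummy_encode` helper and a three-category example, could look as follows:

```python
def dummy_encode(values, categories, benchmark):
    """Return one 0/1 column per category, except the benchmark (reference level)."""
    columns = [c for c in categories if c != benchmark]
    rows = [[1 if v == c else 0 for c in columns] for v in values]
    return columns, rows


industries = ["Cafe/Bar/Restaurant", "Retail", "Groceries"]
cols, X = dummy_encode(["Retail", "Cafe/Bar/Restaurant", "Groceries"],
                       industries, benchmark="Cafe/Bar/Restaurant")
print(cols)  # ['Retail', 'Groceries']
print(X)     # [[1, 0], [0, 0], [0, 1]]
```

A benchmark observation has zeros in every dummy column, so each dummy estimate measures the difference from the benchmark category.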

Table 3: Descriptive Statistics

Covariate Mean Std.Error Median Minimum Maximum

Age (years) 10.8449 0.2908 7.07 0.1200 112.98

Local Card Transactions *** 0.0034 *** 0 1

Number of Sales Locations 1.8219 0.2467 1 1 1278

Average income 218283.2824 434.4060 209991 179988 425302

Population density (inhabitants per km²) 1343.6275 32.5397 147.1 15.2 11995.5


4.3.2 Initial equation

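The initial equation is given below, where the β-values are to be estimated by the regression:

$$
\begin{aligned}
\text{CardTransactionVolume} = \prod_{i=0}^{10} e^{\beta_{ik} x_{ik}}
&= e^{\beta_0}
 \cdot e^{\beta_1 \cdot \text{LocalCardTransactions}}
 \cdot e^{\beta_2 \cdot \text{Age}}
 \cdot e^{\beta_3 \cdot \text{SalesLocations}} \\
&\quad \cdot e^{\beta_4 \cdot \text{AverageIncome}}
 \cdot e^{\beta_5 \cdot \text{PopulationDensity}} \\
&\quad \cdot e^{\beta_{6k} \cdot \text{Advertising}_k} \quad (k = 0, 1) \\
&\quad \cdot e^{\beta_{7k} \cdot \text{BusinessForm}_k} \quad (k = 0, 1, \dots, 9) \\
&\quad \cdot e^{\beta_{8k} \cdot \text{Industry}_k} \quad (k = 0, 1, \dots, 36) \\
&\quad \cdot e^{\beta_{9k} \cdot \text{FinancialInformation}_k} \quad (k = 0, 1) \\
&\quad \cdot e^{\beta_{10k} \cdot \text{NumberOfEmployees}_k} \quad (k = 0, 1, \dots, 10)
\end{aligned}
\tag{28}
$$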


5 Results

This section presents the results of the statistical tests through graphs and tables. First, an initial model is introduced and discussed, and then modified based on the tests performed. The modified model is presented as the final model, which is then evaluated and validated.

5.1 Initial Model

The covariates left out, which thus serve as benchmarks, are:

• Café/Bar/Restaurant

• Limited Company

• 0 Employees

• No bankruptcy

• Non-advertising

The results of the regression model are presented in Table 4, including the estimates with their corresponding standard errors, t-values, p-values and confidence intervals.

Table 4: Summary table for Initial Model

Covariate Estimate Std.Error t P-value Lower Upper

(Intercept) 11.9700 0.1561 76.7140 0.0000 11.6672 12.2791

Local Card Transactions 1.0390 0.7234 14.360 0.0000 0.8969 1.1805

Age -0.1117 0.0021 -5.327 0.0000 -0.0153 -0.0071

Number of Sales Locations -0.0013 0.0012 -1.131 0.2581 -0.0037 0.0010

Average income -0.0000 0.0000 -0.204 0.8383 0.0000 0.0000

Population Density 0.0001 0.00000 7.829 0.0000 0.0000 0.0000

Industry

Airline 0.0823 1.1590 0.0710 0.9434 -2.1899 2.3546

Apparel -0.1921 0.0834 -2.303 0.0213 -0.3556 -0.0286

Automotive -0.8412 0.0834 -10.0860 0.0000 -1.0047 -0.6777

Banking -0.6513 0.3545 -1.837 0.0662 -1.3462 0.0436

Books Office Supplies -0.5207 0.3709 -1.404 0.1604 -1.2477 0.2063

Cafe/Restaurant/Bar Benchmark - - - - -

Clubs -1.4320 0.1396 -10.2600 0.0000 -1.7058 -1.1585

Construction -1.9980 0.1525 -13.0980 0.0000 -2.2968 -1.6988

Consulting -0.7233 0.2054 -3.5220 0.0004 -1.1259 -0.3208

Education -1.5870 0.2673 -5.938 0.0000 -2.1113 -1.0633

Electronics -0.1366 0.2176 -0.628 0.5302 -0.5632 0.2810

Entertainment -0.8543 0.1542 -5.539 0.0000 -1.1566 -0.5519

Florists 0.0040 0.9269 0.0040 0.9966 -1.8131 1.8210

Food and Beverage -0.0720 0.1423 -0.506 0.6128 -0.3509 0.2069

Furniture Interior -0.3822 0.1271 -3.008 0.0026 -0.6313 -0.1331

Gaming Gambling 0.5050 0.5091 0.9920 0.3212 -0.4930 0.1503


Government -1.3450 0.4029 -3.3380 0.0008 -2.1349 -0.5552

Groceries 0.0861 0.0921 0.9360 0.3495 -0.0943 0.2666

Health Beauty -1.0100 0.0749 -13.4910 0.0000 -1.1569 -0.8634

Healthcare -1.4660 0.0864 -16.9710 0.0000 -1.6356 -1.2969

Hotels Camping 0.3734 0.1780 2.0980 0.0360 0.0244 0.7224

Media -1.1570 0.1614 -7.1710 0.0000 -1.4736 -0.8409

Other Industries -0.9768 0.0787 -12.4120 0.0000 -1.1311 -0.8225

None -0.1419 0.6091 -0.2330 0.8158 -1.3360 1.0521

Not Rated -0.5385 0.7306 -0.7370 0.4611 -1.9708 0.8937

Petrol 1.4170 0.8047 1.7610 0.0783 -0.1605 2.9944

Real Estate -0.5592 0.2119 -2.6390 0.0083 -0.9747 -0.1438

Religious Organizations -1.7820 0.6225 -2.8630 0.0042 -3.0023 -0.5618

Retail -0.3142 0.0830 -3.7860 0.0002 -0.4769 -0.1515

Services -0.9292 0.1431 -6.4910 0.0000 -1.2098 -0.6486

Software -1.573 0.3811 -4.1280 0.0000 -2.3199 -0.8259

Sport 0.1844 0.1403 1.3140 0.1890 -0.0907 0.4595

Telecommunications -1.3730 0.6087 -2.2550 0.0242 -2.5659 -0.1795

Transportation -0.8389 0.1960 -4.2800 0.0000 1.2231 -0.4546

Business Form

Other Business Forms -0.6325 0.1909 -3.314 0.0009 -1.0066 -0.2583

Foreign Company 0.5650 0.3877 1.457 0.1451 0.1951 1.3251

Non Profit Organization -0.8296 0.2443 -3.395 0.0007 -1.3086 -0.3506

Partnership -0.2477 0.1203 -2.060 0.03948 -0.4835 -0.0119

Limited Company Benchmark - - - - -

Private Limited Company 0.0981 0.0872 1.1250 0.2606 0.0728 0.2690

Sole Proprietorship -0.4112 0.0853 -4.8190 0.0000 -0.5784 -0.2439

Union -0.6035 0.1556 -3.8780 0.0001 -0.9085 -0.2984

Number of Employees

0 Benchmark - - - - -

1 0.2233 0.0690 3.2340 0.0012 0.0879 0.3586

2-4 0.5338 0.0551 9.6940 0.0000 0.4259 0.6419

5-9 0.9382 0.0692 13.5600 0.0000 0.8026 1.0738

10-19 1.1820 0.0934 12.6500 0.0000 0.9987 1.3650

20-49 1.6680 0.1276 13.0780 0.0000 1.4182 1.9183

50-99 1.2350 0.2391 5.1660 0.0000 0.7665 1.7037

100-199 1.8340 0.4238 4.3260 0.0000 1.0028 2.6644

200-499 1.3310 0.4711 2.8250 0.0047 0.4072 2.2543

1000+ 1.8380 0.6085 3.0210 0.0025 0.6455 3.0310

Financial Information

Bankruptcy -0.7304 0.1955 -3.7360 0.0002 -1.1137 -0.3471

No Bankruptcy Benchmark - - - - -

Advertising

Non-advertising -0.0455 0.0456 -0.9980 0.3185 -0.1350 0.0439

Advertising Benchmark - - - - -
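Because the response is log-transformed, an estimate $\beta$ corresponds to multiplying the card transaction volume by $e^{\beta}$, i.e. a change of $100(e^{\beta} - 1)$ percent relative to the benchmark (or per unit for a quantitative covariate), holding the other covariates fixed. A short sketch applying this to three estimates from Table 4; the helper name is hypothetical.

```python
import math


def percent_effect(beta: float) -> float:
    """Percentage change in the response implied by a coefficient of a log model."""
    return 100.0 * (math.exp(beta) - 1.0)


# Estimates from Table 4, each relative to its benchmark category
estimates = {
    "Bankruptcy": -0.7304,
    "Hotels Camping": 0.3734,
    "20-49 employees": 1.6680,
}
for name, beta in estimates.items():
    print(f"{name}: {percent_effect(beta):+.1f} %")
```

For Bankruptcy, for instance, $e^{-0.7304} \approx 0.48$, so in the initial model such a merchant has roughly half the card transaction volume of one without bankruptcy remarks, other things equal.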

5.1.1 Model Evaluation

Whether the residuals are normally distributed can be assessed from the QQ-plot. The QQ-plot for the initial model shows heavy tails, with some points deviating from the reference line. The heavy tails indicate that the residuals are not normally distributed.

(a) Histogram for the initial model (b) QQ-plot for the initial model

Figure 3: Histogram and QQ-plot for the Initial Model
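A QQ-plot pairs the sorted, standardized residuals with the corresponding quantiles of a standard normal distribution. The sketch below computes these pairs with the Python standard library, using the plotting positions (i - 0.5)/n, which is one common convention and an assumption here; the residuals are made up to show heavy tails.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair theoretical standard normal quantiles with sorted standardized residuals."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    standardized = sorted((r - m) / s for r in residuals)
    # Plotting positions (i - 0.5) / n, one common convention
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


# Made-up residuals with two extreme values, i.e. heavy tails
residuals = [-4.0, -1.0, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 1.1, 4.2]
for theo, obs in qq_points(residuals):
    print(f"{theo:6.2f} {obs:6.2f}")
```

Under normality the pairs lie close to the line where both values are equal; here the outermost residuals lie beyond the normal quantiles while the central ones lie inside them, which is the heavy-tail pattern.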

Other plots were also studied, from which it was concluded that the error terms were homoscedastic (appendix p 44). Outliers were handled with Cook's distance and further investigated in order to detect influential points (appendix p 44).
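The cutoff used for Cook's distance is not stated here. As an illustration only, the following NumPy sketch computes Cook's distance for an OLS fit and flags points above 4/n, a common rule of thumb chosen here as an assumption; the data and the planted outlier are simulated.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2 for an OLS fit."""
    n, p = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
    h = np.diag(H)                            # leverages h_ii
    resid = y - H @ y                         # OLS residuals
    s2 = resid @ resid / (n - p)              # residual variance estimate
    return resid ** 2 / (p * s2) * h / (1 - h) ** 2


rng = np.random.default_rng(1)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)
x[0], y[0] = 6.0, -10.0                       # plant one influential point
X = np.column_stack([np.ones_like(x), x])

d = cooks_distance(X, y)
print("most influential index:", int(np.argmax(d)), "D =", round(float(d.max()), 2))
print("above 4/n:", np.flatnonzero(d > 4 / len(y)))
```

Building the full hat matrix is fine for a few thousand observations; for larger samples only its diagonal would be computed.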

Statistics

In Table 4, the p-values indicate whether the coefficients are significant. For some coefficients the p-value is fairly high, which could be due to very few observations within the specific category. These covariates should therefore be investigated further, using the tools and models described in the mathematical theory section.

The statistics below are also used to evaluate the model.

Table 5: Statistics for the Initial Model

Statistic Initial Model

R² 0.2522

Adjusted R² 0.2456

Standard Error 1.1920

P-value 0.0000

The value of R² means that the covariates explain 25.2% of the total variation in the dependent variable (Frost, 2014). The adjusted R² is very close to R², which indicates that the model is only marginally penalized for its number of covariates, which is positive in this context.
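For reference, the two statistics in Table 5 can be computed as below. The sample size and the number of covariates are not reported in this section, so the values in the usage loop are hypothetical.

```python
def r_squared(y, y_hat):
    """Share of the total variation in y that is explained by the fitted values."""
    y_bar = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))
    ss_tot = sum((obs - y_bar) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n observations and k covariates, excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical sizes: the penalty (1 - R^2) * k / (n - k - 1) shrinks as n grows
for n in (200, 1000, 5000):
    print(n, round(adjusted_r_squared(0.2522, n=n, k=55), 4))
```

The gap between the two statistics is $(1 - R^2)\,k/(n - k - 1)$, so a large sample relative to the number of covariates keeps them close.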

Multicollinearity

The possible problem of multicollinearity was addressed using the variance inflation factor (VIF), where any value larger than 10 would indicate multicollinearity. The test showed that multicollinearity was not an issue, which was attributed to the large number of observations.

5.1.2 Reducing the Model

Transformation of variables
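A sketch of the VIF computation with NumPy, where each covariate is regressed on all the others (plus an intercept) and $\mathrm{VIF}_j = 1/(1 - R_j^2)$. The function and the simulated columns are hypothetical; the nearly duplicated column shows what values above the threshold of 10 look like.

```python
import numpy as np


def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1 / (1 - r2))
    return np.array(out)


rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)   # nearly a copy of a
v = vif(np.column_stack([a, b, c]))
print(v.round(1))                     # a and c far above 10, b close to 1
```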

Average income was log-transformed in order to reduce the heavy tail in the QQ-plot.

The covariate local card transactions was transformed as follows

$$
\log\left(\frac{p}{1 - p}\right) \tag{29}
$$

where p is the share of local card transactions from the data set. The variable is transformed in this way so that the data points are more evenly spread and closer to normally distributed. With this transformation, the value approaches minus infinity as p approaches zero and plus infinity as p approaches one. Values equal to zero have therefore been set to a very small value (0.00001) and values equal to one to a value very close to one (0.99999). The graph below shows the distribution of the transformed covariate.

Figure 4: Histogram for the transformed variable Local Card Transactions
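The transformation in equation (29), including the replacement of zeros and ones, can be sketched as a clipped logit. Unlike the procedure described above, this hypothetical helper also moves any value strictly between 0 and 0.00001 (or between 0.99999 and 1) to the bound; for exact zeros and ones it matches the replacement in the text.

```python
import math


def clipped_logit(p: float, eps: float = 1e-5) -> float:
    """log(p / (1 - p)) with p clipped to [eps, 1 - eps] so that 0 and 1 stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


for share in (0.0, 0.25, 0.5, 0.99999, 1.0):
    print(share, round(clipped_logit(share), 3))
```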

Variable Selection

To reduce the model, AIC and the p-values were mainly used, and any covariates with too high a p-value were investigated further. Some of the covariates had very high p-values because of very few observations and were therefore regrouped into a more general category. Other covariates were simply removed, based on high p-values and stepwise AIC.

The following covariates were grouped into "Other Industries":

• Airlines

• Florists

• Religious organizations

The following covariates were removed completely:

• Bankruptcy

• Advertising

• Population Density

Regarding the number of employees, all merchants with 200 or more employees were grouped into a new 200+ interval. This was due to the low level of significance, and it is supported by the fact that small and medium-sized enterprises (SMEs), according to EU enterprise policy, have fewer than 250 employees ((eurostats) et al., 2015).

Not all covariates lacking a reasonable level of significance were removed, since they could still explain some of the variation in the response variable; a high p-value only indicates that the null hypothesis cannot be rejected.
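As an illustration of stepwise AIC, the following NumPy sketch performs backward elimination: it repeatedly drops the covariate whose removal lowers the AIC the most and stops when no removal helps. It uses the Gaussian AIC $n \log(\mathrm{RSS}/n) + 2p$, up to an additive constant. The study combined AIC with p-values and regrouping, so this is a simplified, hypothetical version, not the exact procedure used.

```python
import numpy as np


def gaussian_aic(X, y):
    """AIC of an OLS fit up to an additive constant: n * log(RSS / n) + 2 * p."""
    n, p = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n) + 2 * p


def backward_aic(X, y, names):
    """Drop one covariate at a time while doing so lowers the AIC.

    Column 0 is assumed to be the intercept and is never dropped.
    """
    keep = list(range(X.shape[1]))
    best = gaussian_aic(X[:, keep], y)
    while len(keep) > 1:
        trials = []
        for j in keep[1:]:
            cols = [c for c in keep if c != j]
            trials.append((gaussian_aic(X[:, cols], y), j))
        aic, j = min(trials)           # the removal giving the lowest AIC
        if aic >= best:
            break                      # no removal improves the AIC
        best, keep = aic, [c for c in keep if c != j]
    return [names[c] for c in keep], best


rng = np.random.default_rng(3)
n = 300
x1, x2, noise_col = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=n)
X = np.column_stack([np.ones(n), x1, x2, noise_col])
kept, aic = backward_aic(X, y, ["intercept", "x1", "x2", "noise_col"])
print(kept, round(float(aic), 2))
```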

5.2 Final Model

Table 6: Summary table for the Final Model

Covariate Estimate Std.Error t P-value Lower Upper

(Intercept) 13.2700 1.4610 9.0840 0.0000 10.4074 16.1351

Local Card Transactions 0.1894 0.0119 15.8880 0.0000 0.1660 0.2128

Age -0.0091 0.0016 -5.5830 0.0000 -0.0122 -0.0059

Number of Sales Locations 0.1018 0.0138 7.3730 0.0000 0.0747 0.1288

Average income -0.1438 0.1274 -0.5250 0.5996 -0.6806 0.3930

Industry

Apparel -0.1189 0.0638 -1.8640 0.0623 0.4395 0.0061

Automotive -0.6579 0.0407 -10.2690 0.0000 0.7835 0.5323

Banking -0.4515 0.2765 -1.6330 0.1025 0.9936 0.0905

Books Office Supplies -0.4035 0.2911 -1.3860 0.1658 0.9742 0.1673

Cafe/Bar/Restaurant Benchmark - - - - -

Clubs -9.9200 0.1127 -8.8030 0.0000 -1.2129 -0.7712

Construction -1.1790 0.1247 -9.4530 0.0000 -1.4237 -0.9346

Consulting -0.6419 0.1575 -4.0760 0.0000 -0.9506 -0.3332

Education -1.2600 0.2254 -5.5920 0.0000 -1.7023 -0.8185

Electronics 0.0756 0.1692 0.4470 0.6548 -0.2556 -0.4073

Entertainment -0.5511 0.1223 -4.5040 0.0000 -0.7909 -0.3112

Food and Beverage -0.1270 0.1069 -1.1880 0.2350 -0.3366 -0.0826

Furniture and Interior -0.2541 0.0984 -2.5860 0.0097 -0.4467 -0.0615

Gaming and Gambling 0.9655 0.3984 2.4240 0.0154 0.1846 1.7465

Government -0.8040 0.0398 -2.0190 0.0435 -1.5848 -0.0233

Groceries 0.1558 0.0701 2.2240 0.0262 0.0185 0.2932

Health and Beauty -0.9327 0.0576 -16.2760 0.0000 -1.0501 -0.8243

Healthcare -1.1090 0.0688 -16.1210 0.0000 -1.2440 -0.9743

Hotels and Camping 0.3341 0.1342 2.4900 0.0128 0.0711 0.5971

Media 0.6778 0.1315 -5.1550 0.0000 -0.9355 -0.4200

Other Industries -0.6413 0.0613 -10.4690 0.0000 -0.7614 -0.5212

Petrol 1.2550 0.5979 2.0990 0.0360 0.0827 2.4268

Real Estate -0.7106 0.1617 -4.3960 0.0000 -1.0274 -0.3937


Retail -0.2885 0.0633 -4.5620 0.0000 -0.4125 -0.4512

Services -0.6712 0.1122 -5.9800 0.0000 -0.8913 -0.1167

Software -0.7453 0.3207 -2.3240 0.0201 -0.1374 -0.1167

Sports and Recreation Clubs 0.1277 0.1058 1.2080 0.2272 -0.0796 0.3350

Telecommunications -0.9283 0.4886 -1.9000 0.0575 -1.8860 0.0294

Transportation -0.3769 0.1569 -2.4030 0.0163 -0.6844 -0.0694

Business Form

Other Business Forms -0.6331 0.1527 -4.1460 0.0000 -0.9325 -0.3374

Foreign Company 0.3764 0.2885 1.3050 0.1921 -0.1892 0.9412

Non Profit Organization -0.5457 0.1984 -2.7510 0.0060 -0.9346 -0.1568

Partnership -0.2999 0.0928 -3.2320 0.0012 -0.4817 -0.1180

Limited Company Benchmark - - - - -

Private Limited Company 0.0499 0.0674 0.7400 0.4593 -0.0822 0.1819

Sole Proprietorship -0.3841 0.0662 -5.8030 0.0000 -0.5139 -0.2544

Union -0.5180 0.1242 -4.1700 0.0000 -0.7615 -0.2744

Number of Employees

0 Benchmark - - - - -

1 0.0600 0.0534 1.1230 0.2614 -0.0447 0.1647

2-4 0.2773 0.0420 6.6040 0.0000 0.1950 0.3596

5-9 0.6633 0.0525 12.6730 0.0000 0.5604 0.7662

10-19 0.8254 0.0710 11.6180 0.0000 0.6861 0.9646

20-49 1.2970 0.0983 13.1990 0.0000 1.1043 1.4895

50-99 1.0680 0.1943 5.4960 0.0000 0.6868 1.4484

100-199 1.5760 0.3557 4.4320 0.0000 0.8790 2.2735

200+ 0.7197 0.3601 1.9990 0.0457 0.0139 1.4255
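The columns of Table 6 are linked by simple arithmetic: the t value is the estimate divided by its standard error, and the Lower and Upper bounds correspond to estimate ± 1.96 × standard error, i.e. a 95% interval. The Python sketch below recomputes these columns for a few rows so that any line of the table can be checked. It assumes a large-sample normal approximation (critical value 1.96, two-sided p-value from the standard normal distribution), since the exact degrees of freedom are not stated in this section; the helper names `normal_cdf` and `row_statistics` are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF expressed through math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def row_statistics(estimate: float, std_error: float, z_crit: float = 1.96):
    """Recompute t value, two-sided p-value and 95% bounds for one table row.

    Assumption: large-sample normal approximation with critical value 1.96.
    """
    t_value = estimate / std_error
    # Two-sided p-value under the standard normal approximation.
    p_value = 2.0 * (1.0 - normal_cdf(abs(t_value)))
    half_width = z_crit * std_error
    return t_value, p_value, estimate - half_width, estimate + half_width


# (estimate, standard error) pairs copied from Table 6.
rows = {
    "Local Card Transactions": (0.1894, 0.0119),
    "Number of Sales Locations": (0.1018, 0.0138),
    "Healthcare": (-1.1090, 0.0688),
}

for name, (est, se) in rows.items():
    t, p, lower, upper = row_statistics(est, se)
    print(f"{name:26s} t={t:8.3f}  p={p:.4f}  95% CI=[{lower:.4f}, {upper:.4f}]")
```

For the rows listed, the recomputed values agree with Table 6 to within the rounding of the published four-decimal figures.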

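Below is the equation for the final model with the estimated β-values:

$$
\begin{aligned}
\text{CardTransactionVolume} &= \prod_{i=0}^{7} e^{\beta_{ik} x_i} \\
&= e^{13.27}
\cdot e^{0.1894 \log\left(\frac{\text{LocalCardTransactions}}{1-\text{LocalCardTransactions}}\right)}
\cdot e^{-0.0091\,\text{Age}} \\
&\quad \cdot e^{0.1018\,\text{NumberOfSalesLocations}}
\cdot e^{-0.1438 \log(\text{AverageIncome})} \\
&\quad \cdot \prod_{k=0}^{9} e^{\beta_{5k}\,\text{BusinessForm}_k}
\cdot \prod_{k=0}^{28} e^{\beta_{6k}\,\text{Industry}_k}
\cdot \prod_{k=0}^{8} e^{\beta_{7k}\,\text{NumberOfEmployees}_k}
\end{aligned}
\tag{30}
$$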
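Because equation (30) is a product of exponentials, it amounts to a linear model for the logarithm of the card transaction volume, and each coefficient acts multiplicatively: for example, one additional sales location multiplies the predicted volume by exp(0.1018) ≈ 1.107, all else equal. The Python sketch below evaluates the fitted model for a merchant. It is a minimal illustration under stated assumptions: the intercept enters as exp(13.27), following the product form; Local Card Transactions is treated as a share strictly between 0 and 1, as the logit in (30) requires; only a few categorical levels from Table 6 are included, with the benchmark levels (Cafe/Bar/Restaurant, Limited Company, 0 employees) set to 0; and the units of Age and Average income are not given in this section, so the merchant values below are hypothetical. The function name `predicted_volume` is also hypothetical.

```python
import math

# Coefficients taken from Table 6 / equation (30).
INTERCEPT = 13.27
BETA_LOCAL_LOGIT = 0.1894   # multiplies log(s / (1 - s)), s = local card share
BETA_AGE = -0.0091
BETA_SALES_LOCATIONS = 0.1018
BETA_LOG_INCOME = -0.1438   # multiplies log(Average income)

# Only a few categorical levels are included; benchmark levels are 0.
INDUSTRY = {"Cafe/Bar/Restaurant": 0.0, "Groceries": 0.1558, "Healthcare": -1.1090}
BUSINESS_FORM = {"Limited Company": 0.0, "Sole Proprietorship": -0.3841}
EMPLOYEES = {"0": 0.0, "1": 0.0600, "2-4": 0.2773, "5-9": 0.6633}


def predicted_volume(local_share, age, locations, avg_income,
                     industry, form, employees):
    """Evaluate the model on the log scale, then exponentiate.

    Hypothetical helper; assumes the intercept enters as exp(13.27).
    """
    if not 0.0 < local_share < 1.0:
        raise ValueError("local_share must lie strictly between 0 and 1")
    if avg_income <= 0:
        raise ValueError("avg_income must be positive")
    log_volume = (
        INTERCEPT
        + BETA_LOCAL_LOGIT * math.log(local_share / (1.0 - local_share))
        + BETA_AGE * age
        + BETA_SALES_LOCATIONS * locations
        + BETA_LOG_INCOME * math.log(avg_income)
        + INDUSTRY[industry]
        + BUSINESS_FORM[form]
        + EMPLOYEES[employees]
    )
    return math.exp(log_volume)


# Hypothetical merchant: the units of age and avg_income are assumptions.
cafe = dict(local_share=0.8, age=10, locations=1, avg_income=300.0,
            industry="Cafe/Bar/Restaurant", form="Limited Company", employees="2-4")
clinic = {**cafe, "industry": "Healthcare"}
ratio = predicted_volume(**clinic) / predicted_volume(**cafe)
print(f"Healthcare relative to the benchmark industry: {ratio:.3f}")
```

Comparisons between merchants that differ in a single covariate, such as the ratio above, do not depend on the assumed units: the ratio equals exp(-1.109), roughly 0.33, so the model predicts about one third of the benchmark volume for an otherwise identical Healthcare merchant.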

