Algorithm that creates product combinations based on customer data analysis


DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS

STOCKHOLM, SWEDEN 2017

An approach with Generalized Linear Models and Conditional Probabilities

ENKHZUL UYANGA

LIDA WANG


Degree Projects in Applied Mathematics and Industrial Economics

Degree Programme in Industrial Engineering and Management

KTH Royal Institute of Technology, 2017

Supervisor at eleven AB: Joachim Ramberg


TRITA-MAT-K 2017:17 ISRN-KTH/MAT/K--17/17--SE

Royal Institute of Technology School of Engineering Sciences

KTH SCI


Abstract

This bachelor’s thesis is a combined study of applied mathematical statistics and industrial engineering and management, carried out to develop an algorithm that creates product combinations based on customer data analysis for eleven AB. Mathematically, generalized linear modelling, combinatorics and conditional probabilities were applied to create sales prediction models, generate potential combinations and calculate the conditional probabilities of the combinations being purchased. SWOT analysis was used to identify which factors can enhance sales from an industrial engineering and management perspective.

Based on the regression analysis, the study showed that the considered variables, which were sales prices, brands, ratings, purchase countries, purchase months and how new the products are, affected the sales amounts of the products. The algorithm takes a barcode of a product as input and checks whether the corresponding product type satisfies the requirements on predicted sales amount and conditional probability. The algorithm then returns a list of possible product combinations that fulfil the recommendations.


Summary (Sammanfattning)

Algorithm that creates product combinations based on customer data analysis: An approach with Generalized Linear Models and Conditional Probabilities

This bachelor’s thesis is a combined study of applied mathematical statistics and industrial engineering and management, carried out to develop an algorithm that creates product combinations based on customer data analysis for eleven AB. In the mathematical part, generalized linear models, combinatorics and conditional probabilities were applied to create prediction models for sales quantities, generate potential combinations and calculate the conditional probabilities of the combinations being purchased. SWOT analysis was used to identify which factors can increase sales from an industrial engineering and management perspective.

Based on the regression analysis, the study showed that the considered variables, which were sales prices, brands, purchase countries, purchase months and how new the products are, affected the sales quantities of the products. The algorithm takes a barcode of a product as input and checks whether the corresponding product type satisfies the requirements on predicted sales quantity and conditional probability. The algorithm returns a list of all possible product combinations that fulfil the recommendations.


Acknowledgements

We would like to express our gratitude to our supervisor, Professor Henrik Hult at the Department of Mathematics, for his great support, inspiration and valuable advice during the project. In addition, we would like to thank our supervisor, Professor Pontus Braunerhjelm at the Department of Industrial Economics and Management, for his help and suggestions during the project.


Chapter 1

Introduction

1.1 Background

As the e-commerce market grows rapidly, online sales of beauty products are growing as well. According to Euromonitor International, global consumers spent $24 billion online on beauty and personal care products in 2015, which equals 6.5% of the total market and is double the 2010 figure (Weinswig, 2016). Industry actors use campaigns, offers, personalized services, smart promotions and similar measures, which intensifies the competition within the market. It is therefore increasingly important for companies to understand customers’ needs in order to keep their competitive advantages.

Advancements in information storage give the e-commerce market valuable information about its customers, if interpreted correctly. Factors such as seasonal variations, the extremely short clockspeeds of beauty products and market dynamism make it difficult to predict customers’ buying behaviour without mathematically analysing large amounts of data.

The thesis is a collaboration between the authors and eleven AB, which is a certified distributive online reseller located in Sweden. eleven AB offers a wide range of beauty products in the categories makeup, perfume, skincare and haircare. Since its establishment in 2004 in Västervik, Sweden, the company has successfully expanded into the neighbouring markets Finland and Norway. As of today, eleven AB is one of Scandinavia’s leading beauty stores and one of Sweden’s oldest online resellers of beauty products.


combinations for its customers. The combinations are created independently by workers, considering the current trends and campaigns (Ramberg, 2017). Most of the current product combinations consist of products of the same type, and these have proved to sell poorly.

The sales statistics are not only affected by different offers, but also determined by the effectiveness of the marketing strategy. A good marketing strategy will benefit eleven AB with increased sales and a more sustainable competitive advantage.

The thesis is of impact for both eleven AB and its customer base. If the product combinations match customers’ tastes and needs, customers can buy the package for a better price than purchasing each product separately, whether from eleven AB or from competitors. eleven AB will hopefully be able to increase its average order value. Regarding earlier similar studies, recommender systems (RS) are widely used in the e-commerce market for personalized recommendations of other products for each customer. This both enriches the online experience and increases revenues significantly (Schafer, Konstan and Riedl, 1999). Ideas and concepts from these basic approaches are studied in order to simplify the development of the formula, but one has to realise that the product combinations are developed based on all existing factual customer data. Furthermore, whereas an RS tries to predict user preferences and make recommendations that should interest a certain customer, the product combinations have to consider the overall likelihood of customers buying a specific combination consisting of two or more products. Unlike recommender systems, the algorithm derived from this study is therefore not personalized.

1.2 Purpose

The purpose of this thesis is to find, through regression analysis, the significant factors or variables that lead to the purchase of different products. The variables considered are sales price, rating, brand, purchase month, purchase country and how new or old the product is. Combinations are made of products with high sales predictions. Conditional probabilities are applied afterwards to create matrices that describe the purchase probabilities of the product combinations consisting of two and three products.

Based on the analysis, an algorithmic specification is developed which creates product combinations including two or three different products. The algorithm should be able to take a barcode as input and in return show barcodes of other products that, together with the input, make an optimal product package for a specific month.


The algorithm, together with the marketing strategy recommendations, should be used by the staff in charge to optimize the average order value of eleven AB.

1.3 Problem definition

With the intention of accomplishing the purpose of the thesis, the central questions that will be answered are:

• Which factors are most significant for sales statistics of certain product types?

• Which product types have the highest sales amount during specific months?

• Is it possible to create product combinations including two or three different products of same and different product categories?

• If so, what is the conditional probability of a customer buying the second (and the third) product of the combination, given the first product as input?

In addition to the mathematically formulated questions above, the following question will be addressed from a strategic perspective:

• How can better understanding of the market and its competitors increase sales?

1.4 Scope

This study creates combinations of two and three products. The combinations have no repetition, that is to say, each combination consists of different product types. It is worth noting that only the three main product categories are studied in this thesis. A SWOT analysis is applied in order to identify internal and external factors that have positive or negative impacts on eleven AB’s sales.


Chapter 2

Theory

2.1 Generalized linear model

The generalized linear model (GLM) is a generalization of ordinary linear regression in which the response variable is not required to be normally distributed. However, the distribution of the response variable in a GLM must be a member of the exponential family, which includes the normal, Poisson, binomial, exponential and gamma distributions.

The general form of a distribution in the exponential family can be expressed as:

$$f(y_i, \theta_i, \phi) = \exp\left\{\frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + h(y_i, \phi)\right\}$$

where y_i is the response variable, θ_i is the natural location parameter and φ is a scale parameter.

For members of the exponential family, expectation and variance can be represented as:

$$E(y) = \mu = \frac{db(\theta_i)}{d\theta_i}$$

$$\mathrm{Var}(y) = \frac{d^2 b(\theta_i)}{d\theta_i^2}\,a(\phi) = \frac{d\mu}{d\theta_i}\,a(\phi)$$

(Montgomery, Peck and Vining 2012, p.450)
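As a worked illustration (a standard textbook result, added here and not taken from the thesis), the Poisson distribution with mean $\mu$ can be rewritten in the exponential-family form above, and the expectation and variance formulas then return the familiar result that both equal $\mu$:

```latex
f(y; \mu) = \frac{\mu^{y} e^{-\mu}}{y!}
          = \exp\left\{ y \ln\mu - \mu - \ln y! \right\}
\quad\Longrightarrow\quad
\theta = \ln\mu, \qquad b(\theta) = e^{\theta}, \qquad a(\phi) = 1, \qquad h(y, \phi) = -\ln y!

E(y) = \frac{db(\theta)}{d\theta} = e^{\theta} = \mu,
\qquad
\mathrm{Var}(y) = \frac{d^{2} b(\theta)}{d\theta^{2}}\, a(\phi) = e^{\theta} = \mu
```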

2.1.1 Link functions

The basic concept of a GLM is to develop a linear model by using an appropriate link function for the specific distribution. Assume that the expected value of the response variable can be written as:

$$E(y_i) = \mu_i$$

Let $\eta_i$ be the linear predictor. The function $g$ that relates the mean of the response variable to the linear predictor is called the link function and can be defined as:

$$g(\mu_i) = \eta_i = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k = x_i'\beta$$

where $\beta_0$ is an intercept, and

$$x_i = \begin{cases} 1 & \text{if } \beta_i \text{ is used in cell } i \\ 0 & \text{if } \beta_i \text{ is a reference cell} \end{cases}$$

Thus, the expected value of the response variable is:

$$E(y_i) = g^{-1}(\eta_i) = g^{-1}(x_i'\beta).$$

Table 2.1 shows the most common distributions with link functions in a GLM.

Distribution   Link name   Link function
Normal         Identity    $g(\mu_i) = \mu_i$
Exponential    Inverse     $g(\mu_i) = 1/\mu_i$
Gamma          Inverse     $g(\mu_i) = 1/\mu_i$
Poisson        Log         $g(\mu_i) = \ln(\mu_i)$
Binomial       Logit       $g(\mu_i) = \ln\left(\frac{\mu_i}{1 - \mu_i}\right)$

Table 2.1: Link functions for the Generalized Linear Model

(Montgomery, Peck and Vining 2012, p.451)
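The link functions in Table 2.1 and their inverses can be sketched in a few lines of Python. This is an illustrative sketch only; the dictionary name `LINKS` and the helper `mean_from_predictor` are hypothetical and not part of the thesis, which used R for the modelling.

```python
import math

# Link functions g and their inverses g^{-1} for the distributions in Table 2.1.
# Each entry is (g, g_inverse); all names here are illustrative assumptions.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),              # Normal
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),   # Exponential, Gamma
    "log": (math.log, math.exp),                               # Poisson
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),       # Binomial
}

def mean_from_predictor(link_name, eta):
    """Return E(y) = g^{-1}(eta) for the named link."""
    _, inverse = LINKS[link_name]
    return inverse(eta)

# Round trip: applying g and then g^{-1} should return the original mean.
for name, (g, g_inv) in LINKS.items():
    print(name, g_inv(g(0.3)))
```

With the log link, a linear predictor of 0 corresponds to an expected count of 1, and every unit increase in the predictor multiplies the expected count by e.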

2.1.2 Maximum likelihood estimation

Maximum likelihood is the theoretical method for estimating the regression coefficients in a GLM. The basic idea behind maximum likelihood estimation is to find the coefficients $\beta_i$ by constructing a likelihood function and choosing the estimates under which the actually observed values are most probable. The likelihood function can be written as:

$$L(y, \theta, \phi) = \prod_{i=1}^{n} f(y_i, \theta_i, \phi).$$

Since the logarithm is a monotonically increasing function, the log-likelihood attains its maximum at the same point as the likelihood itself, so the log-likelihood can be used instead of the likelihood function:

$$\ln L(y, \theta, \phi) = \sum_{i=1}^{n} \ln f(y_i, \theta_i, \phi).$$

The maximum-likelihood estimators $\hat{\beta}_i$ can be obtained by solving the equations:

$$\frac{\partial \ln L}{\partial \beta_i} = 0$$

Thus, the fitted generalized linear model will be given by:

$$\hat{y}_i = g^{-1}(x_i'\hat{\beta}).$$

For the Poisson distribution, the fitted generalized model becomes:

$$\hat{y}_i = e^{x_i'\hat{\beta}}$$

(Montgomery, Peck and Vining 2012, p.444-446)
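To make the estimation step concrete, the following sketch fits a Poisson GLM with a log link and a single dummy regressor by Newton-Raphson iteration on the log-likelihood. It is a minimal illustration under stated assumptions: the function name and the toy data are hypothetical, and the thesis itself relied on R rather than hand-written code. With one dummy variable the maximum-likelihood fit has a closed form (the fitted mean in each cell equals the observed cell mean), which makes the result easy to check.

```python
import math

def fit_poisson_one_dummy(x, y, iterations=25):
    """Newton-Raphson for a Poisson GLM with log link: ln(mu_i) = b0 + b1 * x_i."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivatives of the log-likelihood w.r.t. (b0, b1).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information X' diag(mu) X, a symmetric 2x2 matrix.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (-i01 * u0 + i00 * u1) / det
    return b0, b1

# Hypothetical counts: reference cell (x = 0) and one non-reference cell (x = 1).
x = [0, 0, 0, 1, 1, 1]
y = [2, 3, 4, 8, 10, 12]
b0, b1 = fit_poisson_one_dummy(x, y)
print(math.exp(b0), math.exp(b0 + b1))  # fitted means for the two cells
```

The fitted means reproduce the cell averages of the toy data (3 and 10), so exp(b1) can be read as the ratio of expected counts between the two cells.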

2.2 Poisson distribution

The probability mass function of the Poisson distribution with mean $\mu$ is:

$$p_X(k) = \frac{\mu^k}{k!} e^{-\mu}, \quad k = 0, 1, 2, \ldots$$

A discrete Poisson distributed random variable is often denoted by $X \in \mathrm{Po}(\mu)$ and the mean and variance of $X$ are both equal to $\mu$, i.e. $E(X) = \mathrm{Var}(X) = \mu$.

The Poisson distribution is a discrete probability distribution that represents the probability of a given number of events occurring randomly within a fixed time interval or space. That events occur randomly in a time interval means that they may occur at any time and that they occur independently of each other. The assumption of independence implies that the number of events occurring in one period does not affect the number of events that occur in a later period. In general, the Poisson distribution can also be used to model the probability that a number of events occur in other specified intervals such as distance, area or volume.

(Blom, et al. 2016, p.180,182)
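The property that the mean and the variance of a Poisson variable coincide can be checked numerically. The snippet below is a small sketch with an arbitrarily chosen mean of 4; the function name is illustrative.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Po(mu)."""
    return mu ** k / math.factorial(k) * math.exp(-mu)

mu = 4.0
ks = range(100)  # the probability mass beyond k = 99 is negligible for mu = 4
mean = sum(k * poisson_pmf(k, mu) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, mu) for k in ks)
print(mean, variance)  # both are numerically very close to mu
```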

2.3 Variable selection and model evaluation

2.3.1 Wald test and confidence intervals

The Wald test is used to find out whether the explanatory variables in a model are significant and is often called a "significance test". The principle of the statistical test is to fit the regression model without restrictions and then evaluate whether the results agree with the hypothesis, within the range of sampling variability (Greene 2012, p.155). The simplest case of the Wald test is a test on an individual model coefficient. The hypotheses are written as:

$$H_0: \beta_j = 0, \quad H_1: \beta_j \neq 0,$$

and the test statistic for the null hypothesis is given by:

$$Z_0 = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)}$$

The Wald test can also be used to construct confidence intervals on individual regression coefficients in the linear predictor. An approximate $100(1-\alpha)$ percent confidence interval on the $j$-th model coefficient is:

$$I_{\hat{\beta}_j} = \left[\hat{\beta}_j - Z_{\alpha/2}\,se(\hat{\beta}_j),\; \hat{\beta}_j + Z_{\alpha/2}\,se(\hat{\beta}_j)\right]$$

where $\alpha$ is the significance level. Commonly used significance levels are 0.10, 0.05 and 0.01.

If the confidence interval $I_{\hat{\beta}_j}$ does not include zero, the null hypothesis that the model coefficient is zero should be rejected at significance level $\alpha$.

(Montgomery, Peck and Vining 2012, p.436-437)
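The test statistic and the confidence interval can be computed directly from an estimate and its standard error. The sketch below uses the values reported for the coefficient hownew1-2 in Table 3.3 (estimate 0.035756, standard error 0.045394); the function name is hypothetical and the normal quantiles come from Python's standard library rather than from R.

```python
from statistics import NormalDist

def wald_test(estimate, std_error, alpha=0.05):
    """Return the z statistic, two-sided p-value and 100(1 - alpha)% confidence interval."""
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    interval = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p_value, interval

z, p, ci = wald_test(0.035756, 0.045394)
print(round(z, 3), round(p, 4), ci)
```

The z value and p-value agree with the tabled 0.788 and 0.43088 up to rounding, and the 95% interval contains zero, so the null hypothesis is not rejected for this coefficient.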

2.3.2 AIC and BIC

The Akaike information criterion (AIC) measures the quality of a model relative to the other candidate models. Let $L$ denote the likelihood function for a specific model. The AIC is defined as:

$$AIC = -2\ln(L) + 2p$$

where $p$ is the number of parameters in the model.

The difference in AIC between a full and a reduced model can be used to decide whether to include or exclude an explanatory variable. If the AIC of the full model is greater than that of the reduced model, the variable should be excluded. It is worth noting, however, that if the AIC of the full model is smaller than that of the reduced model, one cannot conclude that the variable must be included, since the AIC does not provide an absolutely certain answer about which model is the best; other complementary tests are therefore needed.

The Bayesian information criterion (BIC) is another criterion for model selection and it is closely related to the AIC. The BIC is formally defined as:

$$BIC = -2\ln(L) + p\ln(n)$$

where $n$ is the number of observations. The model with the lowest BIC is preferred.

(Montgomery, Peck and Vining 2012, p.336)
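A short sketch of both criteria follows, using the standard definitions AIC = −2 ln(L) + 2p and BIC = −2 ln(L) + p ln(n), where n is the number of observations. The log-likelihood values, parameter counts and sample size below are hypothetical numbers chosen only to show that the two criteria can disagree.

```python
import math

def aic(log_likelihood, p):
    """Akaike information criterion for a model with p parameters."""
    return -2.0 * log_likelihood + 2.0 * p

def bic(log_likelihood, p, n):
    """Bayesian information criterion for p parameters and n observations."""
    return -2.0 * log_likelihood + p * math.log(n)

# Hypothetical fits: the full model has 12 parameters, the reduced model 9.
n = 200
full = {"loglik": -100.0, "p": 12}
reduced = {"loglik": -104.0, "p": 9}

aic_full, aic_reduced = aic(full["loglik"], full["p"]), aic(reduced["loglik"], reduced["p"])
bic_full, bic_reduced = bic(full["loglik"], full["p"], n), bic(reduced["loglik"], reduced["p"], n)
print(aic_full, aic_reduced)  # AIC favours the full model here
print(bic_full, bic_reduced)  # BIC favours the reduced model here
```

Because ln(n) exceeds 2 once n is larger than about 7, the BIC penalizes additional parameters more heavily than the AIC, which is why the two criteria are best read together.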

2.4 Multicollinearity

There are several ways of detecting multicollinearity, one of which is the eigensystem analysis of the $X'X$ matrix. If one or more of the eigenvalues $\lambda$ are small, it implies that near-linear dependencies exist among the columns of $X$. The eigensystem analysis can also be performed by examining the condition number of $X'X$, defined as:

$$\kappa = \frac{\lambda_{\max}}{\lambda_{\min}}$$

In general, if κ is less than 100, there is no serious problem with multicollinearity. If κ is between 100 and 1000, it implies moderate to strong multicollinearity. For κ that exceeds 1000, the multicollinearity problem is severe.

(Montgomery, Peck and Vining 2012, p.285-298)
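For two regressors the condition number can be computed in closed form, which gives a compact illustration of the thresholds above. The sketch assumes that X'X is in correlation form, so that it equals [[1, r], [r, 1]] for a correlation r between the regressors; the function names are hypothetical.

```python
import math

def condition_number_2x2(a, b, c):
    """kappa = lambda_max / lambda_min for the symmetric matrix [[a, b], [b, c]]."""
    centre = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return (centre + radius) / (centre - radius)

def classify(kappa):
    """Apply the rule-of-thumb thresholds quoted in the text."""
    if kappa < 100:
        return "no serious multicollinearity"
    if kappa <= 1000:
        return "moderate to strong multicollinearity"
    return "severe multicollinearity"

for r in (0.5, 0.99, 0.999):
    kappa = condition_number_2x2(1.0, r, 1.0)
    print(r, round(kappa, 1), classify(kappa))
```

In this two-regressor case the eigenvalues are 1 + r and 1 − r, so the condition number is (1 + r)/(1 − r) and grows without bound as the correlation approaches one.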

2.5 Combinatorics

For any integer $n \geq 0$ and any real numbers $x$ and $y$, the binomial theorem states that:

$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k$$

Because of the binomial theorem, the numbers $\binom{n}{k}$ are called binomial coefficients. The term combination is sometimes used to indicate a subset or an unordered collection of different items. Thus, $\binom{n}{k}$ denotes the number of $k$-element subsets of an $n$-element set and can be formulated as below:

$$\binom{n}{k} = \frac{n!}{(n-k)!\,k!}$$

The key features of this combination are that order does not matter and repeated elements are not allowed.

(Mazur 2010, p.9)
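Combinations without repetition, where order does not matter, can be enumerated with Python's standard library. The list of product types below is a hypothetical example and not the set of types used in the thesis.

```python
from itertools import combinations
from math import comb

# Hypothetical product types; order does not matter and no type is repeated.
product_types = ["lipstick", "mascara", "shampoo", "perfume"]

pairs = list(combinations(product_types, 2))
triples = list(combinations(product_types, 3))

print(len(pairs), comb(len(product_types), 2))    # number of 2-element subsets
print(len(triples), comb(len(product_types), 3))  # number of 3-element subsets
```

The counts returned by the enumeration agree with the binomial coefficients, which is a quick way to confirm that no combination is generated twice.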

2.6 Conditional probability

Given two events $A$ and $B$, with $P(A) > 0$, the conditional probability of $B$ occurring given that $A$ occurs is defined as:

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

(Gut 2009, p.5)

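Based on the definition, the idea of conditional probability extends naturally to the case when the probability of an event is conditioned on several events. For events $A$, $B$ and $C$ with $P(A \cap B) > 0$, for example:

$$P(C \mid A \cap B) = \frac{P(A \cap B \cap C)}{P(A \cap B)}$$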
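Conditional probabilities of this kind can be estimated from order data as relative frequencies. The following sketch conditions on one or two product types; the orders and the function name are hypothetical, and the estimator is a plain frequency ratio rather than the exact procedure of the thesis.

```python
def conditional_probability(orders, given, target):
    """Estimate P(target | given) as the share of orders that contain
    every product type in `given` and also contain `target`."""
    given = set(given)
    with_given = [order for order in orders if given <= order]
    if not with_given:
        return None  # the conditioning event never occurs, so the ratio is undefined
    return sum(1 for order in with_given if target in order) / len(with_given)

# Hypothetical orders, each represented as the set of product types it contains.
orders = [
    {"lipstick", "mascara"},
    {"lipstick", "mascara", "foundation"},
    {"lipstick"},
    {"mascara", "foundation"},
    {"lipstick", "foundation"},
]

print(conditional_probability(orders, {"lipstick"}, "mascara"))
print(conditional_probability(orders, {"lipstick", "mascara"}, "foundation"))
```

Returning `None` when no order contains the conditioning products mirrors the requirement that the conditioning event must have positive probability.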

2.7 Recommender systems

Basic theories

Recommender systems are widely used in the e-commerce market for personalized recommendations of other products for each customer. The recommended products can for example be physical goods, films, music, articles, social tags and services. The system enriches the online experience, increases the conversion rate and affects the revenues positively (Schafer, Konstan and Riedl, 1999).

Theoretically, recommender systems are a "spectrum of systems describing any system that provides individualization of the recommendation results and leads to a procedure that helps users in a personalized way to interesting or useful objects in a large space of possible options" (Lampropoulus and Tsihrintzis 2015, p.1). A recommender system helps its user cope with an overload of information by filtering it and providing the most appropriate and valuable information for the specific user.

To make recommendations, personal information about the user's preferences is needed in order to predict the user’s rating for items that they have not been in touch with before. There are three different methods of collecting knowledge about user preferences: the implicit, explicit and mixing approaches. The implicit approach does not require any active involvement from the user and is based on recording the user's behaviour. A typical example of implicit rating is historical purchase data. The explicit approach is based on user interrogation, requiring the user to specify their preference for a particular item. Lastly, the mixing approach is a combination of the previous two.


prediction based on similarities between items which have been rated by a common user. A common flaw of collaborative methods is the risk of sparse data. Since users are not obliged to rate and provide their opinions on every item, the sparsity leads to insufficient results. There are, however, different techniques for reducing the sparsity, of which Default Voting is the simplest.

Additional studies

Many different methods, approaches and algorithms for recommender systems have been derived from earlier studies. One of the main purposes of further research within this area is to find better ways of handling common problems such as sparsity, user interface issues, information loss and the overall effectiveness of the system. Among the different research questions, those that might be interesting for this study are the shift of customer demand or interest and how recommender systems can be combined with the strategic operations of an organization.

Regardless of whether a customer’s future actions are predicted from his or her past actions or the assumption that ’similar users prefer similar items’ is used, neither approach considers how the interests of customers shift over time. The reliability of the user ratings and the quality of the final recommendation performance are affected negatively when the interest shift is not considered. Zhimin Cheng, Yi Jiang and Yao Zhao (2010) proposed an effective collaborative filtering algorithm which uses a time-based weighting function so that the final recommendations reflect the change of user interest over time. The time-based weighting makes it possible to put more importance on more recent ratings when matching one user to another. The basic idea is that the algorithm specifies a time interval for each user rating, where a greater value of the time interval indicates an earlier evaluation time. Using the proposed weighting function, rating time intervals can be used to enhance the contribution of recent ratings to the prediction and weaken the contribution of earlier ratings.


satisfied only passively. When the win-win strategy was used, a bi-objective programming problem arose: maximizing both the supplier's profit and the customer’s satisfaction under the same conditions as in the other strategy. The experiment with the win-win strategy showed that the service level decreases while the profit gain increases. But the results also showed that it was possible to maintain a specific recommendation performance while increasing the profit gains.

2.8 SWOT-analysis

SWOT analysis, which stands for Strength, Weakness, Opportunity and Threat, is a process in which internal and external inhibitors and enhancers of performance can be identified. The analysis can be applied to organizations, people, business ideas, products, departments etc. Based on the analysis, these organizational influences can for instance help the stakeholders to decide what future actions to take from a strategic perspective (Silber et al. 2010, p.115-116).

Strength: an internal enhancer that gives an advantage over others and distinguishes the organization from its competitors.

Weakness: an internal inhibitor that stops the organization from performing at its optimum level.

Opportunity: an external enhancer of performance that can be pursued or exploited to gain benefit and competitive advantage.


Chapter 3

Methods

3.1 About the implementation

Concepts of recommender systems are used in this study. Historical sales statistics and product ratings are used as data in this project, which can be seen as a mixing approach to collecting knowledge about eleven AB’s customers. This study aims to generate product combinations and their probabilities of being purchased based on customers’ preferences, assuming that the preferences remain unchanged over time, as in the content-based method.

The remainder of Methods is divided into two main parts. The purpose of the first part is to develop models with the help of Generalized Linear Models (GLM) for each product type in order to identify the significant variables that lead to a purchase and also to predict sales amounts. The significant variables help guide the user of the algorithm towards exactly which products should be included in the combinations later on. The predictions are needed to identify the most sold products during different time intervals. The purpose of the second part is to test whether the conditional probabilities of the combinations are high enough. In other words, the conditional probabilities are necessary for determining the feasibility of the proposed combinations.


The algorithm specifies the exact steps from receiving a barcode as input to offering barcodes as possible combinations. This can be seen as an overall summary of the methods used in the study, placed in a clear context.

The programme R is the main tool of this study. For each section in the following parts of Methods, a brief summary is presented including which functions are used in order to get the required results, in which format the data is stored etc. In addition, Python and Microsoft Excel are used as additional tools to sort some of the data.

3.2 Part I: Analysis and model development by GLM

3.2.1 Data collection

The data was obtained with the assistance of eleven AB’s CTO and consisted of four main data categories: order/sales data, product data, group data and customer data. The order data contained information about each order and consisted of variables such as order ID, product ID, currency, purchase date, purchased quantity, customer ID and shipping cost. The sales data used in this project contained all orders during the period from 1 January 2014 to 31 December 2016. The product data contained information about each product and consisted of variables such as product ID, barcode, brand, rating, sales price and launch date. The group data contained information about the different product types and consisted of variables such as product ID, barcode and group titles of the products. The customer data contained information about the customers and consisted of the variables customer ID, country and city.

There is a need to clarify the hierarchical structure of the products and their groupings used in this study. Each product belongs to a product type and each product type belongs to a product category. For example, a ’red colored lipstick of brand X’ belongs to the product type ’lipstick’ and the product type ’lipstick’ belongs to the product category ’makeup’. The product type ’lipstick’ has a certain product ID and can contain many different products with different barcodes; in other words, the barcoded product is the bottom element of the structure. The highest element, the product category, can contain different product types, as ’lipstick’ and ’mascara’ both belong to ’makeup’.

The main data collections could be merged into one with the help of barcodes, product ID, customer ID and order ID, which link the files together. The merge produced a final data file that included detailed information about each order. Some data points were deleted in advance because they had empty values for order ID or sales price.
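The merge and the removal of incomplete records can be sketched as follows. This is only an illustration under stated assumptions: the field names, the tiny example records and the function `merge_orders` are hypothetical, and the snippet is not the code used by the authors.

```python
# Hypothetical miniature versions of the data sets; all field names are assumptions.
orders = [
    {"order_id": 1, "barcode": "111", "customer_id": 10, "quantity": 2},
    {"order_id": 2, "barcode": "222", "customer_id": 11, "quantity": 1},
    {"order_id": None, "barcode": "111", "customer_id": 10, "quantity": 1},  # empty order ID
]
products = {
    "111": {"brand": "X", "sales_price": 149.0},
    "222": {"brand": "Y", "sales_price": None},  # empty sales price
}
customers = {10: {"country": "SE"}, 11: {"country": "NO"}}

def merge_orders(orders, products, customers):
    """Join each order with its product and customer record via the shared keys."""
    merged = []
    for order in orders:
        product = products.get(order["barcode"], {})
        customer = customers.get(order["customer_id"], {})
        row = {**order, **product, **customer}
        # Drop rows with an empty order ID or sales price, as described in the text.
        if row.get("order_id") is None or row.get("sales_price") is None:
            continue
        merged.append(row)
    return merged

merged = merge_orders(orders, products, customers)
print(merged)
```

Only the first example order survives the filter, since the other two lack a sales price or an order ID.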


had to be handled during the project. The final algorithm developed by this study can be used for the other product categories, which were not included in the steps of developing the algorithm.

The scoped data was manipulated and transformed into two types of variables as follows:

Response variable       Explanation
Sales quantity          The total number of each sold product type during a set time interval

Explanatory variables   Explanation
Sales price             The ordinary sales price of each product
Brand                   The brand name of the product
Rating                  The numeric rating, indicated by points from 1 to 5 for each product, or ’NA’ if the product has no rating
How new                 The measure of how new or old the product is, calculated as the number of days between the purchase date and the launch date
Purchase country        The country where the purchase was completed
Purchase month          The month when the purchase was completed

Table 3.1: The response and explanatory variables with explanations.

In R, the data is stored as a data frame where the rows correspond to the observations (the orders) and the columns correspond to the variables.

3.2.2 Model building


models for each product type.

Initial grouping of data

The data was grouped into several groups for each explanatory variable. The main intention behind the grouping of the data was to better classify each variable’s character and to test for differences between the groups in overall evaluation. The factors that have been taken into consideration during the grouping of the variables were:

• The number of observations in each group had to be large enough and similarly distributed such that the GLM estimation would be stable.

• The groups had to be as homogeneous as possible in terms of sales performance, such that the performance did not vary too much within the groups.

Explanatory variables   Group 1     Group 2   Group 3   Group 4   Unit
Sales price             <100        100-300   >300      -         SEK
Brand                   Premium     Others    -         -         -
Rating                  NA          1-2       3         4-5       -
How new                 <1          1-2       2-3       >3        year
Purchase country        SE          FI        NO        Others    -
Purchase month          1-2, 9-10   3-5       6-8       11-12     month

Table 3.2: The initial grouping of data, where NA stands for not available.
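The grouping in Table 3.2 amounts to mapping each raw value to a group label. The sketch below shows one possible implementation; the function names are hypothetical, and the handling of boundary values (for example whether a price of exactly 300 SEK belongs to the group 100-300) and the assumption of integer ratings are not specified in the table and are choices made here for illustration.

```python
from datetime import date

def price_group(price_sek):
    if price_sek < 100:
        return "<100"
    if price_sek <= 300:  # assumption: 300 SEK falls in the middle group
        return "100-300"
    return ">300"

def rating_group(rating):
    if rating is None:  # product without rating
        return "NA"
    if rating <= 2:
        return "1-2"
    if rating < 4:
        return "3"
    return "4-5"

def how_new_group(purchase_date, launch_date):
    # How new: time between purchase and launch, expressed in years.
    years = (purchase_date - launch_date).days / 365.25
    if years < 1:
        return "<1"
    if years < 2:
        return "1-2"
    if years <= 3:
        return "2-3"
    return ">3"

def month_group(month):
    if month in (1, 2, 9, 10):
        return "1-2, 9-10"
    if month in (3, 4, 5):
        return "3-5"
    if month in (6, 7, 8):
        return "6-8"
    return "11-12"

print(price_group(149.0), rating_group(None),
      how_new_group(date(2016, 6, 1), date(2015, 1, 1)), month_group(9))
```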


the whole year for the beauty e-commerce market.

Model evaluation

The Wald test was used to check if the chosen groupings led to significant estimators, βi, which can be expressed by showing the significance level for each explanatory variable.

Coefficients:

                    Estimate    Std. Error  z value  Pr(>|z|)
Intercept           0.001098    0.113156    0.010    0.99226
salesprice100-300   0.628124    0.053344    11.775   < 2e-16   ***
salesprice≥300      0.027535    0.057623    0.478    0.63276
brandothers         0.305168    0.031202    9.780    < 2e-16   ***
rating1-2           0.185946    0.055286    3.363    0.00077   ***
rating4-5           -0.100352   0.033484    -2.997   0.00273   **
hownew1-2           0.035756    0.045394    0.788    0.43088
hownew2-3           -0.294372   0.049087    -5.997   2.01e-09  ***
hownew>3            0.051804    0.044463    1.165    0.24398
countryNO           0.163335    0.108044    1.512    0.13060
countrySE           2.481706    0.086608    28.654   < 2e-16   ***
countryOthers       -0.150151   0.125921    -1.192   0.23309
month3-5            0.728554    0.043885    16.601   < 2e-16   ***
month6-8            0.813184    0.043909    18.520   < 2e-16   ***
month11-12          -0.232227   0.056488    -4.111   3.94e-05  ***

Table 3.3: The Wald test for product type A37, where the significance levels are denoted by: 0 ’***’, 0.001 ’**’, 0.1 ’ ’.

Table 3.3 shows the significance levels for product type A37 under the initial grouping of data. Several groups of the variables How new and Purchase country were not significant, which indicated that the initial grouping was not good enough. In fact, the same problem with the grouping of How new and Purchase country occurred for over one fourth of the product types, which implied that new groupings were needed.

New grouping of data


Explanatory variables   Group 1     Group 2   Group 3   Group 4   Unit
Sales price             <100        100-300   >300      -         SEK
Brand                   Premium     Others    -         -         -
Rating                  NA          1-2       3         4-5       -
How new                 ≤2          >2        -         -         year
Purchase country        SE          DKFINO    Others    -         -
Purchase month          1-2, 9-10   3-5       6-8       11-12     month

Table 3.4: The new grouping of data, where NA stands for not available and DKFINO represents the country codes for Denmark, Finland and Norway.

New model evaluation

Coefficients:

                    Estimate   Std. Error  z value  Pr(>|z|)
Intercept           2.80679    0.06616     42.422   < 2e-16   ***
salesprice100-300   1.02609    0.05186     19.787   < 2e-16   ***
salesprice≥300      0.33093    0.05678     5.828    5.60e-09  ***
brandothers         0.30863    0.03099     9.959    < 2e-16   ***
rating1-2           -0.24263   0.05346     -4.538   5.67e-06  ***
rating4-5           -0.20255   0.03322     -6.097   1.08e-09  ***
hownew>2            -0.09764   0.03186     -3.065   0.00218   **
countryOthers       -2.89240   0.09970     -29.011  < 2e-16   ***
countryDKFINO       -2.21855   0.05389     -41.170  < 2e-16   ***
month3-5            0.75875    0.04384     17.309   < 2e-16   ***
month6-8            0.81355    0.04384     18.558   < 2e-16   ***
month11-12          -0.25541   0.05644     -4.525   6.03e-06  ***

Table 3.5: The Wald test for product type A37 after new grouping, where the significance levels are denoted by: 0 ’***’, 0.001 ’**’, 0.1 ’ ’.

Table 3.5 shows the significance levels for product type A37 based on the new grouping. According to Table 3.5, all of the explanatory variables have become significant, which means that the new grouping is better than the initial one. In fact, the overall significance of the variables How new and Purchase country improved notably, since only very few product types had low significance after the final grouping.
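The coefficients in Table 3.5 can be turned into a predicted sales quantity through the fitted Poisson model, in which the prediction is the exponential of the linear predictor. The sketch below copies three estimates from the table and predicts for a cell that is at the reference level of every variable except sales price (100-300 SEK) and purchase month (6-8). Which levels act as reference cells and which time interval the count refers to are assumptions here, and the function name is hypothetical.

```python
import math

# Coefficient estimates for product type A37, copied from Table 3.5.
coefficients = {
    "intercept": 2.80679,
    "salesprice100-300": 1.02609,
    "month6-8": 0.81355,
}

def predicted_count(active_terms):
    """Fitted value exp(x' beta): add the intercept and the active dummy
    coefficients, then exponentiate."""
    eta = coefficients["intercept"] + sum(coefficients[t] for t in active_terms)
    return math.exp(eta)

prediction = predicted_count(["salesprice100-300", "month6-8"])
rate_ratio_summer = math.exp(coefficients["month6-8"])
print(round(prediction, 1), round(rate_ratio_summer, 2))
```

Under the log link each coefficient acts multiplicatively, so exp(0.81355), roughly 2.26, is the estimated ratio of expected sales in months 6-8 to expected sales in the reference months, all else being equal.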


3.2.3 Variable selection

A few product types had low significance or no significance at all for all groups of some specific explanatory variables. Under these circumstances, the variable selection methods AIC and BIC were used in order to decide whether to include or exclude the variables of low significance.

        AIC                      BIC
        without    full model    without    full model
A26     207.6061   210.66        221.51     229.78
A35     579.86     555.204       600.72     585.332

Table 3.6: Methods AIC and BIC applied for product types A26 and A35, comparing a full model and a reduced model without the explanatory variable Purchase month.

        AIC                      BIC
        without    full model    without    full model
A26     207.58     210.66        219.75     229.78
A35     567.71     555.204       588.57     585.332

Table 3.7: Methods AIC and BIC applied for product types A26 and A35, comparing a full model and a reduced model without the explanatory variable Rating.

Table 3.6 shows that the product type A26 had a better model when the variable Purchase month was excluded. This result can be interpreted as meaning that month or season did not matter for expected sales performance. In contrast, the variable should be kept for the product type A35.

Table 3.7 shows that the product type A26 had a better model when the variable Rating was excluded. This result can be interpreted as meaning that rating did not matter for expected sales performance, or that the available data was not sufficient to give a significant rating effect since, as mentioned earlier, a majority of the products did not have rating data. On the other hand, the product type A35 had a better model when the variable was included.

When a product type had a reduced model that excluded the explanatory variable Purchase month and/or other variables, a decision had to be made whether to keep the product type for the continued study or not. A reduced model could be a result of insufficient data, which leads to insignificant variables that can affect the final prediction model negatively. In particular, if the Purchase month variable was removed, a prediction model for each month could not be obtained. Therefore, product types with this problem had to be removed in order to secure the significance of the final models.
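The comparison in Tables 3.6 and 3.7 can be sketched in a few lines. The study itself used R; the Python below is only an illustration, and the rule "keep the variable only when the full model has both the lower AIC and the lower BIC" is an assumption made for this sketch (in the tables the two criteria always agree).

```python
# AIC and BIC values copied from Tables 3.6 and 3.7, stored as
# (model without the variable, full model).
criteria = {
    ("A26", "Purchase month"): {"AIC": (207.6061, 210.66), "BIC": (221.51, 229.78)},
    ("A35", "Purchase month"): {"AIC": (579.86, 555.204), "BIC": (600.72, 585.332)},
    ("A26", "Rating"): {"AIC": (207.58, 210.66), "BIC": (219.75, 229.78)},
    ("A35", "Rating"): {"AIC": (567.71, 555.204), "BIC": (588.57, 585.332)},
}


def keep_variable(values):
    """Return True when the full model has the lower AIC and the lower BIC.

    Assumption for this sketch: if the criteria disagreed, the variable
    would be dropped.
    """
    return all(full < reduced for reduced, full in values.values())


decisions = {key: keep_variable(values) for key, values in criteria.items()}
for (product_type, variable), keep in decisions.items():
    print(product_type, variable, "keep" if keep else "drop")
```

Lower values indicate the preferred model for both criteria, so the sketch drops both variables for A26 and keeps both for A35, in line with the interpretation above.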


3.2.4 Residual analysis

Figure 3.1: Residuals vs. Fitted values plot

Figure 3.2: Scale-Location plot


dashed line when the predicted values become larger. It indicates that the variances are not equal and that there are non-linear relationships between the response variable and the regressors. Thus, a model that does not require normally distributed errors seems to be appropriate.

Figure 3.2 shows the relationship between the square root of the standardized deviance residuals and the predicted values. The figure is similar to Figure 3.1; the only difference between the plots is the quantity shown on the y-axis. In Figure 3.2, the fitted red line is quite linear, which implies that there is a proportional relationship between the standardized deviance residuals and the predicted values. It also supports the assumption of using the Poisson distribution in the GLM, since the variance is equal to the mean for the Poisson distribution.

In Figure 3.1 and Figure 3.2, some observations, e.g. 2, 22, 204 and 258, stand out from the cluster of observations and can be considered potential outliers. However, they do not necessarily indicate any problem. When these potential outliers were deleted and the new plots were compared with the old ones, there were no big differences between the plots. These observations should therefore be retained and no further measures are needed.

The same plots for all other prediction models showed similar tendencies, which implies that the assumption of a Poisson distribution is appropriate. In R, the method plot() is used to create the residual plots.
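For readers who want to see what is plotted, the following minimal Python sketch computes the (unstandardized) deviance residual of a Poisson model for a single observation. The observed counts and fitted means are hypothetical and chosen only for illustration; the standardization used in the Scale-Location plot additionally depends on the leverage of each observation and is not shown here.

```python
import math


def poisson_deviance_residual(y, mu):
    """Deviance residual of one Poisson observation y with fitted mean mu > 0."""
    # The term y * log(y / mu) is taken as 0 when y == 0 (its limiting value).
    log_term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (log_term - (y - mu))
    # max(..., 0.0) guards against tiny negative values from rounding.
    return math.copysign(math.sqrt(max(unit_deviance, 0.0)), y - mu)


# Hypothetical observed counts and fitted means, for illustration only.
observed = [0, 3, 10, 25]
fitted = [1.2, 3.0, 7.5, 30.0]
residuals = [poisson_deviance_residual(y, mu) for y, mu in zip(observed, fitted)]
print(residuals)
```

A residual is negative when the observation lies below its fitted mean and zero when the two coincide.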

3.2.5 Multicollinearity

Based on the theory of multicollinearity, there are several ways of detecting near-linear dependencies among the regressors, of which the condition number was used in this analysis. The condition number of the covariance matrix for the product B3 turned out to be 4.659913, which implied that there was no serious problem with multicollinearity. In fact, the condition number for every prediction model was less than 100. Therefore, it was unnecessary to deal with multicollinearity.

In R, the methods model.matrix(), nrow(), ncol() and cor() are used to create the correlation matrix. The methods eigen(), max() and min() are used to calculate the condition number.
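Assuming the usual eigenvalue-based definition, which is consistent with the use of eigen(), max() and min() described above, the condition number can be written as follows, where the symbols \kappa, \lambda_{\max} and \lambda_{\min} are notation introduced here for illustration:

```latex
\[
\kappa = \frac{\lambda_{\max}}{\lambda_{\min}},
\]
% \lambda_{\max} and \lambda_{\min} are the largest and smallest eigenvalues
% of the correlation matrix of the regressors.
```

Under this reading, a value close to 1 means that the regressors are nearly orthogonal, and the threshold of 100 used in this study marks the level below which multicollinearity was not regarded as a serious problem.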

3.2.6 Recommendations


The recommendations were obtained by analysing the coefficients of each product type. Normally, the group that had the largest coefficient within each variable was chosen as the optimal group of that explanatory variable. If the percentage difference in predicted sales amount between two groups was lower than X%, both groups were considered optimal. In this study, X was set to 50. This requirement can be changed based on the algorithm user’s preferences and needs.

In R, the recommendation for each product type is stored as a new variable, adding a column to the data frame consisting of the product types.
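The selection rule can be sketched as below. This is an illustration in Python rather than the R code used in the study, and the function name `optimal_groups` is hypothetical. The text does not state which prediction the percentage difference is measured against; the sketch assumes it is measured relative to the lower of the two predicted sales amounts. The example coefficients are the brand coefficients of product types A3 and A1 in Table 4.1.

```python
import math


def optimal_groups(coefficients, x_percent=50.0):
    """Return the groups whose predicted sales lie within x_percent of the best group.

    coefficients maps group name -> GLM coefficient (reference group = 0.0).
    Groups are returned in order of decreasing coefficient.
    """
    best = max(coefficients.values())
    chosen = []
    for group, beta in sorted(coefficients.items(), key=lambda item: -item[1]):
        # With a log link, the ratio of predicted sales is exp(best - beta).
        # Assumption: the difference is measured relative to the lower prediction.
        difference = (math.exp(best - beta) - 1.0) * 100.0
        if difference < x_percent:
            chosen.append(group)
    return chosen


# Brand coefficients of product types A3 and A1 from Table 4.1.
brand_a3 = optimal_groups({"Premium brand": 0.0, "Other brands": 0.14443347})
brand_a1 = optimal_groups({"Premium brand": 0.0, "Other brands": 1.2352715})
print(brand_a3)
print(brand_a1)
```

For A3 both brand groups fall within the 50% band, which corresponds to a recommendation of the kind "Does not depend on brand", while for A1 only "Other brands" is selected.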

3.2.7 Sales performance predictions

Based on the recommendations, sales performance predictions were calculated for Sweden. When two or more groups were optimal within the same explanatory variable, the average of their coefficients was used. By rounding the predicted values to integers, a predicted sales amount for each season was generated for each product type. Dividing by the number of months included in each season gave an average amount per month. Afterwards, one could clearly see which product types within the product categories A, B and C had the highest predicted sales amount during each month. In R, basic mathematical operations and the method round() are used.

3.3 Part II: Conditional probability matrices

3.3.1 Generating product combinations

Based on the prediction models for each product type obtained in Part I, a total of n top-selling product types for each season were chosen from the product categories A, B and C. In other words, the a best-selling product types from category A, the b best-selling product types from B and the c best-selling product types from C were chosen, which gives a + b + c = n product types. The values of a, b and c can be chosen based on how high the user wants to set the ’most-selling percentile’. In this case, an upper percentile of fifty percent was chosen, that is, the upper half of the product types within each category was chosen when ranked by highest predicted sales amount. The chosen product types were then used to create combinations without repetition consisting of two and three product types, respectively. The number of unique combinations of two product types for each season was n₂ = n(n − 1)/2, and the number of unique combinations of three product types was n₃ = n(n − 1)(n − 2)/6.


interest in this study as mentioned in the introduction.

In R, the function combn() is used to generate the combinations. The n top-selling products are saved in an array with dimensions 1 × n. The combinations are saved as matrices with dimensions 2 × n₂ and 3 × n₃, where the columns are the different combinations and the rows are the product types included in the combinations.
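The generation step can be illustrated with Python's standard library, where `itertools.combinations` plays the role that combn() plays in R. The list of top-selling product types below is hypothetical and only serves as an example.

```python
from itertools import combinations
from math import comb

# Hypothetical list of n top-selling product types for one season.
top_selling = ["A3", "A5", "B3", "B19", "C7", "C9"]
n = len(top_selling)

# Combinations without repetition of two and of three product types.
pairs = list(combinations(top_selling, 2))
triples = list(combinations(top_selling, 3))

# The counts equal the binomial coefficients "n choose 2" and "n choose 3".
print(len(pairs), comb(n, 2))
print(len(triples), comb(n, 3))
```

Each tuple corresponds to one column of the combination matrices described above.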

3.3.2 Generating conditional probability matrices

In order to create conditional probability matrices for the unique two- and three-product combinations obtained in Section 3.3.1 for each season, the original historical sales data was analysed once again. The main purpose of using the historical sales data was to validate the unique combinations that were obtained from the prediction models in Part I, i.e. to check whether those combinations have actually occurred historically and whether there were customers who preferred those kinds of product combinations.

From the historical sales data, the number of occurrences of each combination in the combination lists from Section 3.3.1 and the quantity of each product type were calculated. This was done by checking and counting which products were purchased by the same customer under the same order ID. It is worth noting that customers could purchase the same products several times, which could be distinguished through different order IDs. Customers can also buy the same product several times in the same order, but the study has not considered these situations since combinations consisting of identical products were out of scope. Thus, the probability of a customer purchasing a specific product in a specific month, for example product A1, or purchasing products A1 and B1, or A1, B1 and C1, was calculated as follows:

P(A1) = the sales amount of product A1/ the total number of order IDs

P(A1 ∩ B1) = the sales amount of product combinations of A1 and B1 / the total number of order IDs

P(A1 ∩ B1 ∩ C1) = the sales amount of product combinations of A1, B1 and C1 / the total number of order IDs.


\[
P(B1 \mid A1) = \frac{P(A1 \cap B1)}{P(A1)}
\]

\[
P(C1 \mid A1 \cap B1) = \frac{P(A1 \cap B1 \cap C1)}{P(A1 \cap B1)}
\]

where P(B1|A1) indicates the conditional probability of purchasing product B1 given that the customer has already purchased product A1 and P(C1|A1 ∩ B1) represents the conditional probability of purchasing product C1 given that the product combination of product A1 and B1 has been purchased earlier.

When calculating the conditional probability of the three-product combination, there were several aspects to consider. First of all, there had to be enough data for the result to be mathematically reliable. Therefore, a requirement, Y, of a minimum of 15 counts for P(A1 ∩ B1 ∩ C1) had to be set. Secondly, one had to make sure that the probability of products A1 and B1 being purchased together was high enough to make the probability of C1 being purchased worth calculating. Therefore, both P(B|A) and P(A|B) had to exceed a certain requirement, Z, for the next calculation to take place. The requirement was set to 0.3.

In R, the function array() is used to create a matrix with products or product combinations as column and row names. Using several for-loops and basic arithmetic operations, every cell of the matrix is filled with the corresponding conditional probability.
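The counting procedure and the requirements Y and Z can be sketched as follows. This is a simplified Python illustration, not the R implementation of the study: the function names and the order data are hypothetical, and each order is reduced to the set of product types it contains. Because every probability is a count divided by the same total number of order IDs, the conditional probabilities reduce to ratios of counts.

```python
from collections import Counter
from itertools import combinations

MIN_COUNT_Y = 15  # requirement Y from the text
MIN_PROB_Z = 0.3  # requirement Z from the text


def count_orders(orders):
    """Count the orders containing each product type, pair and triple.

    orders maps order ID -> set of product types, so repeated purchases of
    the same product within one order are ignored, as in the study.
    """
    counts = Counter()
    for products in orders.values():
        items = sorted(products)
        for size in (1, 2, 3):
            counts.update(frozenset(c) for c in combinations(items, size))
    return counts


def conditional(counts, target, given):
    """P(target | given) = count(given and target) / count(given)."""
    given = frozenset(given)
    if counts[given] == 0:
        return None
    return counts[given | {target}] / counts[given]


def third_product_probability(counts, a, b, c):
    """P(c | a and b), or None when requirement Y or Z is not met."""
    if counts[frozenset((a, b, c))] < MIN_COUNT_Y:
        return None
    p_b_given_a = conditional(counts, b, [a])
    p_a_given_b = conditional(counts, a, [b])
    if min(p_b_given_a, p_a_given_b) <= MIN_PROB_Z:
        return None
    return conditional(counts, c, [a, b])


# Hypothetical order data: 45 orders identified by integer order IDs.
baskets = ([{"A1", "B1", "C1"}] * 20 + [{"A1", "B1"}] * 10
           + [{"A1"}] * 10 + [{"B1"}] * 5)
orders = dict(enumerate(baskets))
counts = count_orders(orders)
print(conditional(counts, "B1", ["A1"]))
print(third_product_probability(counts, "A1", "B1", "C1"))
```

In this example A1 occurs in 40 orders and A1 together with B1 in 30, so P(B1|A1) is 0.75, and the triple occurs 20 times, which satisfies requirement Y.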

3.4 Algorithm formulation

Based on the initial data, the algorithm creates sales prediction models for the desired product types using a GLM with Poisson distribution. The sales-enhancing variable groups are classified as recommendations for each product type. Sales amounts are predicted for each month and product type. Combinations of two or more products are generated from the well-selling products for the respective seasons. The conditional probability matrices are then generated. Based on the user’s probability requirements, the algorithm suggests combinations that include the input product.


Requirement variable: X (p.21)
  When to use: When deciding recommendations based on regression coefficient values. If the second largest coefficient within a variable group has nearly similar value as the max, both should be included.
  When setting a requirement: Check whether the percental difference of the predicted sales amount of two different coefficients within a variable group is less than X%.
  Value used in this study: 50

Requirement variable: a, b, c (p.21)
  When to use: When deciding how many top selling products for each month from each product category should be included when generating combinations.
  When setting a requirement: A ’most selling percentile’ should be selected to choose a specific number of product types among the highest ranked ones in predicted sales amount for each category.
  Value used in this study: a = 0.5*number of product types in category A; b = 0.5*number of product types in category B; c = 0.5*number of product types in category C.

Requirement variable: Y (p.23)
  When to use: When deciding whether if P(A1 ∩ B1 ∩ C1) data is enough for calculations of conditional probability for three products.
  When setting a requirement: When deciding whether if P(A1 ∩ B1 ∩ C1) data is enough for calculations of conditional probability for three products.
  Value used in this study: 15

Requirement variable: Z (p.23)
  When to use: When deciding whether if P(B|A) and P(A|B) are enough to calculate the conditional probability for three products.
  When setting a requirement: Must be over a certain level in order to secure that the calculated probability of C getting purchased was reliable.
  Value used in this study: 0.3


Chapter 4

Results

4.1 Part I

4.1.1 Sales performance prediction models

The final model can be expressed as:

\[
\log(y) = \beta_0 + \sum_{i=1}^{12} x_i \beta_i.
\]

In order to make the formula more reader-friendly, the final model can be described as below:

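\[
\begin{aligned}
\log(\text{predicted sales amount}) = \beta_0
&+ \begin{cases}
0, & \text{if sales price} < 100 \\
\beta_1, & \text{if sales price } 100\text{-}300 \\
\beta_2, & \text{if sales price} > 300
\end{cases}
+ \begin{cases}
0, & \text{if premium brand} \\
\beta_3, & \text{if other brands}
\end{cases} \\
&+ \begin{cases}
0, & \text{if rating NA} \\
\beta_4, & \text{if rating } 1\text{-}2 \\
\beta_5, & \text{if rating } 3 \\
\beta_6, & \text{if rating } 4\text{-}5
\end{cases}
+ \begin{cases}
0, & \text{if new} \\
\beta_7, & \text{if old}
\end{cases} \\
&+ \begin{cases}
0, & \text{if country SE} \\
\beta_8, & \text{if country DKFINO} \\
\beta_9, & \text{if country others}
\end{cases}
+ \begin{cases}
0, & \text{if month } 1\text{-}2,\ 9\text{-}10 \\
\beta_{10}, & \text{if month } 3\text{-}5 \\
\beta_{11}, & \text{if month } 6\text{-}8 \\
\beta_{12}, & \text{if month } 11\text{-}12
\end{cases}
\end{aligned}
\]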


                           A1           A3            B3          B6          C6         C7
Intercept            β0    1.7468035    5.34711268    4.841939    6.076943    3.43216    7.452706
Sales price <100     ref   0            0             0           0           0          0
Sales price 100-300  β1    0.4164126    0.48178607    1.222029    -0.114579   0.84161    -0.740294
Sales price >300     β2    -1.6926008   -0.07297118   -3.375762   -2.945554   -3.61049   -3.827561
Premium Brand        ref   0            0             0           0           0          0
Other brands         β3    1.2352715    0.14443347    1.040213    -0.192243   0.68234    1.325882
Rating NA            ref   0            0             0           0           0          0
Rating 1-2           β4    NA           -0.73942075   -2.119507   -1.636227   -3.21259   -3.725096
Rating 3             β5    NA           -1.21466017   -0.160187   -0.911495   -1.50872   -3.764935
Rating 4-5           β6    0.6278300    0.62030802    0.140401    1.142131    1.04570    -2.119104
How new ≤ 2          ref   0            0             0           0           0          0
How new >2           β7    1.0867559    0.21598123    0.670811    0.185135    0.92079    -0.592148
Country SE           ref   0            0             0           0           0          0
Country DKFINO       β8    -2.0689644   -1.86755191   -2.462620   -1.926567   -2.30330   -1.829067
Country others       β9    -3.0197942   -3.81171389   -4.599171   -4.813035   -4.55466   -4.245881
Month 1-2,9-10       ref   0            0             0           0           0          0
Month 3-5            β10   0.4685151    -0.42065135   -0.272584   -0.227164   -0.30863   -0.249008
Month 6-8            β11   0.9445195    -0.49300602   -0.338957   -0.346635   -0.30497   -0.216541
Month 11-12          β12   -0.5008866   -0.36292340   -0.402817   -0.431793   -0.70143   -0.441308

Table 4.1: Showing the coefficients and reference groups of product types A1, A3, B3, B6, C6 and C7, where NA


4.1.2 Recommendations for each product type

Product type   Recommendation
A23            Price between 100-300/Does not depend on brand/Rating between 4-5/Old
A6             Price between 100-300/Premium brands/Rating between 4-5/Old
A31            Price lower than 100/Other brands/No rating/New
A26            Price between 100-300/Does not depend on brand/No rating/Old
A37            Price lower than 100/Other brands/Rating between 4-5/Old

Table 4.2: Showing the recommendations of five most selling product types in category A.

Table 4.2 shows the recommendations for the five best-selling product types of category A. For instance, if A23 is included in a combination, the user should choose a product that costs between 100 and 300 SEK, has a rating between 4 and 5 and was launched more than 2 years ago, while the brand of the product does not matter. When a combination includes a certain product type, its recommendation should be followed when choosing the respective products.

4.1.3 Selection of product types

With the help of the created models and the recommendations, a sales prediction was calculated for each product type and month.

       Months 1-2, 9-10   Months 3-5   Months 6-8   Months 11-12
A1     42                 88           142          50
A3     227                198          185          315
A6     458                384          411          559

Table 4.3: Showing the predicted average sales amount per month for product types A1, A3 and A6.
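As a worked example, the A1 row of Table 4.3 can be recomputed from the A1 coefficients in Table 4.1. The Python sketch below is an illustration, not the R code of the study. It assumes that the recommended groups for A1 are the ones with the largest coefficient within each variable (sales price 100-300, other brands, rating 4-5, how new >2) and that the country is Sweden; the recommendation for A1 itself is not listed in Table 4.2, so this choice is an assumption.

```python
import math

# Coefficients of product type A1 from Table 4.1 (reference groups are 0).
INTERCEPT = 1.7468035
CHOSEN = {
    "Sales price 100-300": 0.4164126,
    "Other brands": 1.2352715,
    "Rating 4-5": 0.6278300,
    "How new >2": 1.0867559,
    "Country SE": 0.0,
}
# season -> (month coefficient, number of months in the season)
SEASONS = {
    "Months 1-2, 9-10": (0.0, 4),
    "Months 3-5": (0.4685151, 3),
    "Months 6-8": (0.9445195, 3),
    "Months 11-12": (-0.5008866, 2),
}


def monthly_prediction(month_coefficient, n_months):
    """Predicted average sales amount per month for one season."""
    linear_predictor = INTERCEPT + sum(CHOSEN.values()) + month_coefficient
    season_total = round(math.exp(linear_predictor))  # inverse of the log link
    return round(season_total / n_months)


predictions = {season: monthly_prediction(beta, months)
               for season, (beta, months) in SEASONS.items()}
print(predictions)
```

Under these assumptions the sketch gives 42, 88, 142 and 50 for the four seasons, which agrees with the A1 row of Table 4.3.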


        Months 1-2, 9-10   Months 3-5   Months 6-8   Months 11-12
Top 1   C7                 C7           C7           C7
Top 2   B19                C9           C9           C9
Top 3   C9                 B19          B19          B19
Top 4   B22                B22          B22          B22
Top 5   B3                 B3           B30          B29

Table 4.4: The top five best-selling product types for specific months.

4.2 Part II

4.2.1 Product combinations

Combination 1 Combination 2 Combination 3 Combination 4 Combination 5

Product type A5 A8 B3 C7 B6

Product type A3 B19 C14 B9 B13

Table 4.5: An extract of unique combinations of two products for months 1-2, 9-10.

Table 4.5 shows an extract of the unique two-product type combinations in matrix form. For example, the two-product combination of ’foundation’ and ’mascara’ is recommended during the months January to February and September to October; the combination of ’gel nail polish’ and ’mini nail polish’ is recommended during spring (Mar. to May); the combination of ’foundation’ and ’eyeliner’ is recommended during summer (June to Aug.) and the combination of ’nail file’ and ’gel nail polish’ is recommended during the winter (Nov. to Dec.).

Combination 1 Combination 2 Combination 3 Combination 4 Combination 5

Product type A20 A31 A5 A31 A21

Product type B16 A9 A41 C6 B19

Product type A23 A6 B9 B11 B11

Table 4.6: An extract of unique combinations of three products for months 1-2, 9-10.


’rouge’, ’contouring’ and ’highlighter’ is recommended during months January to February and September to October; the combination of ’face mask’, ’skin problem (body)’ and ’skin problem (face)’ is recommended during spring (Mar. to May); the combination of ’sports, health & travel’, ’bronzer’ and ’sunless tanning’ is recommended during the summer (June to Aug.) and the combination of ’eye shadow’, ’bronzer’ and ’highlighter’ is recommended during the winter (Nov. to Dec.).

Observe that the examples above are not based on Table 4.5 and Table 4.6.

4.2.2 Conditional probability matrices

      A1          A1 ∩ B1
B1    P(B1|A1)    -
C1    P(C1|A1)    P(C1|A1 ∩ B1)

Table 4.7: The structure of the conditional probability matrix.

Table 4.7 shows the structure of the conditional probability matrix, where each cell denotes the conditional probability of a product in the rows given a product in the columns. In total, 8 different conditional probability matrices were obtained.

       A5             A31            A21           A15            A10
A17    0.0198660714   0.0195089873   0.024049651   0.2402221970   0.0113759480
A29    0.0696428571   0.0427444103   0.038789760   0.0192721914   0.0755687974
A3     0.2292410714   0.1030249890   0.050426687   0.0252805804   0.0731310943
A36    0.0290178571   0.0241122315   0.036462374   0.2647092166   0.0213976165
A22    0.0046875000   0.0032880316   0.003878976   0.0015871216   0.0487540628

Table 4.8: An extract of the conditional probability matrix of two-product combinations for months 1-2, 9-10.
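The lookup step of the algorithm can be illustrated with the column for A5 in Table 4.8. The Python sketch below is a simplified illustration: the function name `suggest` and the threshold of 0.05 are hypothetical, since the probability requirement is left to the user of the algorithm.

```python
# Column A5 of Table 4.8: P(row product | A5) for months 1-2, 9-10.
given_a5 = {
    "A17": 0.0198660714,
    "A29": 0.0696428571,
    "A3": 0.2292410714,
    "A36": 0.0290178571,
    "A22": 0.0046875000,
}


def suggest(probabilities, min_probability):
    """Return the products whose conditional probability meets the threshold, best first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [product for product, p in ranked if p >= min_probability]


# Hypothetical user requirement of 0.05 for the conditional probability.
suggestions = suggest(given_a5, 0.05)
print(suggestions)
```

With this threshold, A3 and A29 are suggested as second products for a customer who has purchased A5.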


       A5 ∩ A31      A5 ∩ A21   A31 ∩ A8      A31 ∩ A40    A21 ∩ A32
B30    0.020100503   0.000      0.005882353   0.00000000   0.07692308
B12    0.040201005   0.025      0.111764706   0.05405405   0.12820513
B29    0.030150754   0.000      0.035294118   0.00000000   0.00000000
B25    0.105527638   0.100      0.088235294   0.08108108   0.20512821
B8     0.035175879   0.000      0.023529412   0.02702703   0.07692308

Table 4.9: An extract of the conditional probability matrix of three-product combinations for months 1-2, 9-10.

Table 4.9 shows an extract of the conditional probability matrix of three-product combinations. For instance, the conditional probability for purchasing ’face mask’ given that a customer has already purchased ’Quick fix’ and ’Anti-age’ is 0.054662379 for months November to December.

Observe that the examples above are not based on Table 4.8 and Table 4.9.

4.3 Algorithm


4.4 SWOT chart

Strengths

• Different shipping alternatives
• Free standard shipping
• Personalized service: engravings
• Active campaigns
• New and trendy products launch fast
• Student discount
• Sponsoring of fashion/beauty events
• Beauty blog/magazine
• Active on social media (Facebook, YouTube, Instagram, Twitter and Spotify.)

Weaknesses

• Few premium brands
• Personalized service available only for specific brands
• Different shipping alternatives not so well-adapted among customers
• Not known to be specialists on certain area of products

Opportunities

• More sponsoring of public events
• More focus on the affiliate program
• Campaigns related to the physical stores
• Widen the shipping alternatives and adapt the existing ones more for the customers
• Annual campaigns with regards to different holidays and special occasions
• Campaigns relates to viral trends or beauty phenomenon (e.g. organic cosmetics and South Korean beauty products.)

Threats

• Not having a better competitive advantage against the competitors
• Lower barriers to market entry

Table 4.10: The SWOT chart for eleven AB.


Chapter 5

Discussion

5.1 Interpretation of the results

In total, 89 prediction models were developed for the different product types. The overall significance of the variable groups and the residual analysis proved to be good. Furthermore, the presence of multicollinearity was low. These promising results are probably a positive consequence of the large number of observations. The recommendations were specific to each product type. A brief interpretation of each variable and its recommendations is given below.

Sales price: The three different groups of sales prices fitted the product types of the categories A, B and C well. It was observed that the sales price between 100-300 was the group that tended to sell best for categories A and B. This can be explained by the fact that most products in these categories have sales prices within this range. Furthermore, products in these categories within this price range most likely match the customers’ expectations. As for category C, prices lower than 100 sold best, for which the same explanation applies.

Brand: The two different groups of brands fitted the product types of the categories A, B and C well. It was observed that the other brands sold best for all categories. The reason behind this may be that the number of premium brands is low in contrast to the number of other brands.

Rating: The four different groups of ratings fitted the product types of the categories A, B and C well. Higher ratings between 4-5 had a very positive influence on the sales amount, while lower ratings 1-2 had a negative impact, more negative than No rating. This can be explained by the fact that lower ratings actively give customers a negative review of a product, while products with no ratings leave a neutral impression on the customers. The same trend was observed for all categories.


of the categories A, B and C well. How new the product should be seemed to differ between the product categories. For categories B and C, newer products seemed to have a more positive influence on the sales amount. As for category A, older products were more popular. These results can be explained by the characteristic differences between the categories. Categories B and C are more trend-related, and the number of new launches per time interval is higher than for product types related to category A.

Purchase country: The three different groups of purchase countries fitted the product types of the categories A, B and C well. It was clear that Sweden is the biggest and main market. The high significance of the groupings also indicates that the markets of Denmark, Finland and Norway are homogeneous in terms of sales performance.

Purchase month: The four different groups of purchase months fitted the product types of the categories A, B and C well. Sales amounts had seasonal variations, for instance sun protection products dominated the summer season. On the other hand, many product types had a constant high sales amount throughout a year.

When the recommendation for a certain product says No rating, the rating of the product does not matter. Observe that this can occur either because non-rated products sell better or because none of the products had ratings from the very beginning, so that this recommendation had to be given. Similarly, Price lower than 100 can be interpreted as meaning either that cheaper products of the given product type sell better or that all products within the same type cost less than 100. These interpretations should be taken into account when using the recommendations.

The study is based on factual sales statistics of three years. The regression method of generalized linear model with Poisson distribution proved to fit the data. Therefore, the results are reliable. However, the different requirement variables and calculations with average values could have reduced the final reliability of the results.

Furthermore, other approaches with the same problem definition as this study could get different results. Importantly, the total price offered for a final combination and whether the customers have the right to choose between different products while combining will hugely affect the sales amount of the combinations. In other words, the marketing strategy will affect the actual outcome of the mathematical predictions.

5.2 Marketing strategy analysis

Strengths:


service, engravings of specific products give further value to the products. eleven AB almost always has different campaigns, such as sales and special offers, which undoubtedly enhance sales. It is notable that a student discount is offered on all standard orders. The range of the product offering is wide, and new and trendy products from the existing brands often launch very fast on the site. The company has an affiliate program and has sponsored or cooperated with different public events, which helps to build the brand. Furthermore, many social media channels are used, which enable direct contact with and updates to the customers.

Weaknesses:

One can easily see that campaigns in the form of special offers and personalized services are only available for some premium brands. The premium brands get a lot of marketing, while the impact of many other brands is underestimated and may not be optimally exploited. Even though there are shipping alternatives that are unique and different, they tend not to be well adapted to the customers. Another weakness of the marketing is that the company still does not possess a competitive advantage related to being a specialist in a certain beauty area, compared to for instance Lyko, which is specialized in haircare.

Opportunities:

As for the opportunities, eleven AB is considering more sponsoring of public events in order to increase its visibility. For instance, being featured in an event’s brochure, appearing on an event’s official website with eleven’s logo or having an item in the goodie bags are all good ways to highlight its sponsorships, which in turn will improve eleven’s brand recognition. Furthermore, eleven should also focus more on the affiliate programs. Besides the current affiliate marketing channel, i.e. blogs, it is worth expanding to other platforms such as YouTube. By collaborating with well-known youtubers, both eleven as a company and its products will get more visibility and wider publicity. In addition to the above-mentioned points, there are also a number of opportunities with regard to campaign effectiveness. An obvious difference between an online retailer and a physical store is that in a physical store customers can first try and then buy products. Thus, for customers who prefer to try the products, more physical stores and campaigns related to the physical stores are needed. The campaigns should also be related to different trendy products and brands; for example, South Korean and organic cosmetics and skincare products have become more popular in recent years. By doing frequent market research, eleven can keep track of the latest market trends and customer demands. It will also be beneficial to create annual campaigns for different holidays and occasions such as Valentine’s Day and Black Friday. Last but not least, eleven should be able to widen its shipping alternatives and adapt the existing ones more to the customers.


The biggest threat that eleven is facing relates to its competitors in the local market, such as Lyko and Kicks. As mentioned before, Lyko is more specialized in haircare products, and Kicks was established much earlier and enjoys a higher popularity than eleven. Compared to its competitors, eleven does not have a stronger competitive position with regard to either prices or products. Another potential threat is the lower entry barriers to the beauty e-commerce market, since the costs to develop and provide e-commerce services are relatively low. Thus, eleven will be faced with more intense competition in a rapidly evolving industry such as the beauty e-commerce market.

Discussion:

From the SWOT analysis, one can see that similar points are mentioned as strengths as well as weaknesses and opportunities. The interpretation is that the company should keep its focus on the current marketing strategy while taking steps to improve and strengthen it further. Especially for the shipping alternatives, eleven should optimize them and adapt them more to its customers. eleven has long made great efforts to offer different campaigns, but the company should pay increasing attention to the selection of campaigns and focus more on a few campaigns, turning to some big annual events in order to meet the customers’ high expectations during these periods. Moreover, it is important for eleven to keep developing its affiliate programs, as mentioned in the opportunities section. Since customers are the cornerstones of eleven’s success, it is essential that the company continuously strives to better understand and fulfil customers’ demands. Even though it has been pointed out that new entrants could be considered as threats for eleven, the impact is not as severe as it may seem. eleven already has a leading position in the Scandinavian market and is thus less vulnerable to potential new competitors.

5.3 Shortcomings

The initial data fitted the Poisson distribution well overall, but there were a few models which did not fit the distribution perfectly. The final groupings of variables led to a huge improvement in the overall significance of the coefficient estimates. Unfortunately, a very few coefficients remained insignificant. These could have affected the reliability of the prediction models negatively. A better grouping of variables could possibly have been found.


taking only one of these product types into consideration. However, in order to fully model all product types, this kind of error must be handled better by having a better hierarchical structure initially. Another error in the data was insufficient observations for some product types or some combinations. These errors were handled by setting a requirement of a minimum number of observations in order to get reliable mathematical results. This could have led to many possible combinations being disregarded and should be improved in later studies.

The different requirements needed during the study can be hard to set. Not so well-reasoned or well-thought-out requirements can affect the final results negatively.

This study did not include the different factors which in reality affect the sales statistics tremendously, for example different campaigns, sales, viral trends, the number of clicks on each product and the number of times each product was placed in a shopping basket. By taking these factors into consideration, a better algorithm could have been developed.

Finally, the main assumption of the study was that the users’ preferences remain unchanged over time. In fact, preferences and demands in the beauty e-commerce market change, just as they do in the whole retailing sector. Therefore, a currently reliable result cannot remain reliable for a long time. The initial sales data should be updated continuously.

5.4 Comparison to earlier studies

Studies related to recommender systems are the only similar studies we were able to identify. Despite the similarities, the existing differences can dilute the purpose of comparisons. Therefore, it is only relevant to compare the main ideas and approaches from a general view.

As discussed earlier, interest or demand within the beauty e-commerce market changes through time. The interest shift should be taken into account when interpreting the reliability of the results of this study. The algorithm derived from this study has not actively handled the interest shifts of the customers. As recommended, the customer data which the algorithm is based on should be updated to a more recent one regularly. On the other hand, unlike the recommender systems, interest shift of a customer base is slower or more solid than interest shifts of an individual. Therefore, an active handling of this might not be as consequential as in recommender systems.


Chapter 6

Conclusion

The purpose of this thesis was to create product combinations based on customer data analysis. This thesis provided a combined study of applied mathematical statistics and marketing strategy analysis. Coming back to the target questions, the corresponding answers can be described as follows:

• Which factors are most significant for sales statistics of certain product types? Based on the regression analysis, it has been shown that all the defined factors (sales price, brand, rating, purchase month, purchase country and how new or old the product is) are significant for sales statistics. However, within each variable group there are specific characteristics that are most significant for a certain product type; these have been presented in Section 5.1.

• Which product types have the highest sales amount during specific months? It can be concluded that product type C7 has the highest sales amount throughout the year (see Table 4.4 in Section 4.1.3). Most of the other top products occur more than once in the table, which also indicates that seasonal variation has little impact on the top-selling products.

• Is it possible to create product combinations including two or three different products of same and different product categories?

The study shows that combinatorics can indeed be used to combine two and three products of same and different product categories (see Section 4.2.1).
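As a minimal sketch of this enumeration step, the Python snippet below lists all unordered combinations of two and three distinct product types. The function name `product_combinations` and the product-type labels are hypothetical, and treating a combination as a set of distinct types without repetition is an assumption made here; it is not necessarily the exact construction used in Section 4.2.1.

```python
from itertools import combinations


def product_combinations(product_types, sizes=(2, 3)):
    """Return all unordered combinations of distinct product types.

    Assumption: a combination never contains the same product type twice.
    """
    unique = sorted(set(product_types))  # remove duplicates, fix the order
    return [combo for k in sizes for combo in combinations(unique, k)]


# Hypothetical product-type labels, used for illustration only.
types = ["C1", "C2", "C7", "S3"]
combos = product_combinations(types)

# With 4 product types there are C(4,2) + C(4,3) = 6 + 4 = 10 combinations.
print(len(combos))
```

The number of candidates grows quickly with the number of product types, which is one reason for filtering the candidates by predicted sales amount and conditional probability afterwards.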


The conditional probability of a second or a third product getting purchased can be retrieved from the conditional probability matrices which were developed in Section 4.2.2.
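To illustrate how such a matrix could be looked up, the sketch below estimates P(B is in an order | A is in the same order) as a relative frequency over a small set of orders. The order data, the labels and the function name `conditional_probability_matrix` are hypothetical, and the relative-frequency estimator is an assumption; the matrices in Section 4.2.2 may have been built differently.

```python
from collections import defaultdict
from itertools import permutations


def conditional_probability_matrix(orders):
    """Estimate P(b in order | a in order) for every ordered pair (a, b)."""
    count_a = defaultdict(int)   # number of orders containing a
    count_ab = defaultdict(int)  # number of orders containing both a and b
    for order in orders:
        items = set(order)
        for a in items:
            count_a[a] += 1
        for a, b in permutations(items, 2):
            count_ab[(a, b)] += 1
    # Pairs that never occur together are simply absent (probability 0).
    return {(a, b): n / count_a[a] for (a, b), n in count_ab.items()}


# Hypothetical orders, each given as a set of product types.
orders = [
    {"C7", "C2"},
    {"C7", "C2", "S3"},
    {"C7"},
    {"C2", "S3"},
]
matrix = conditional_probability_matrix(orders)

# C7 occurs in 3 orders and C2 accompanies it in 2 of them.
print(matrix[("C7", "C2")])
```

The same counting idea extends to a third product by conditioning on a pair of products already in the order instead of a single product.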


Chapter 7

Further research

Throughout the study, a recurring problem was the structure of the initial data. Developing a better data sorting system is therefore recommended for further research. It would not only help to build better models but also benefit the company itself by clarifying all the information.


Chapter 8

References

Blom, Gunnar, Enger, Jan, Englund, Gunnar, Grandell, Jan and Holst, Lars. 2016. Sannolikhetsteori och statistikteori med tillämpningar. 5th ed. Studentlitteratur AB.

Chen, Zhimin, Jiang, Yi and Zhao, Yao. 2010. A Collaborative Filtering Recommendation Algorithm Based on User Interest Change and Trust Evaluation. International Journal of Digital Content Technology and its Applications: Volume 4, Number 9.

eleven AB. 2017. Om eleven. eleven.se. https://eleven.se/om-eleven-p6.html (Accessed 2017.04.28).

Greene, William H. 2012. Econometric Analysis. 7th ed. England: Pearson Education Limited.

Gut, Allan. 2009. An Intermediate Course in Probability. 2nd ed. New York: Springer Science+Business Media.

Lampropoulos, Aristomenis S. and Tsihrintzis, George A. 2015. Machine Learning Paradigms: Application in Recommender Systems. Springer International Publishing. Switzerland.

Mazur, David R. 2010. Combinatorics: A Guided Tour. USA: The Mathematical Association of America.

Montgomery, Douglas C., Peck, Elizabeth A. and Vining, G. Geoffrey. 2012. Introduction to Linear Regression Analysis. 5th ed. Hoboken, New Jersey: John Wiley & Sons, Inc.

Ramberg, Joachim; CTO eleven AB. 2017. Interview 27 Jan.


University of Minnesota.

Silber, Kenneth H., Foshay, Wellesley R., Watkins, Ryan, Leigh, Doug, Moseley, James L. and Dessinger, Joan C. 2010. Handbook of Improving Performance in the Workplace. Vol. 2. International Society for Performance Improvement.

Wang, Hsiao-Fan and Wu, Cheng-Ting. 2010. A strategy-oriented operation module for recommender systems in E-commerce. Computers & Operations Research 39 (2012) 1837-1849.


Algoritm som skapar produktkombinationer baserad på kunddata analys: En metod med Generaliserade Linjära Modeller och Betingade Sannolikheter

Contents

Chapter 1

Introduction

1.1 Background
1.2 Purpose
1.3 Problem definition
1.4 Scope

Chapter 2

Theory

2.1 Generalized linear model
2.2 Poisson distribution
2.3 Variable selection and model evaluation
2.4 Multicollinearity
2.5 Combinatorics
2.6 Conditional probability
2.7 Recommender systems
2.8 SWOT-analysis

Chapter 3

Methods

3.1 About the implementation
3.2 Part I: Analysis and model development by GLM
3.3 Part II: Conditional probability matrices
3.4 Algorithm formulation

Chapter 4

Results

4.1 Part I
4.2 Part II
4.3 Algorithm
4.4 SWOT chart

Chapter 5

Discussion

5.1 Interpretation of the results
5.2 Marketing strategy analysis
5.3 Shortcomings
5.4 Comparison to earlier studies

Chapter 6

Conclusion

Chapter 7