
DEGREE PROJECT IN COMPUTER ENGINEERING, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

A comparative study of algorithms used in recommender systems: measuring their accuracy on cold-start data

ISAC HAGLUND
LISA JOHANSSON

KTH ROYAL INSTITUTE OF TECHNOLOGY

SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE


Bachelor in Computer Science
Date: June 8, 2020

Supervisor: Kevin Smith
Examiner: Pawel Herman

School of Electrical Engineering and Computer Science
Swedish title: En jämförelsestudie om algoritmer inom rekommendationssystem: mätning av deras precision på cold-start data


Abstract

A common problem today is the increasing amount of information that people are exposed to, an issue that recommender systems aim to solve by giving personalized recommendations. The most widely used technique within recommender systems is collaborative filtering, which uses the similarity between users who share the same interests. Recommendations can therefore be made by finding users with similar interests and discovering new items through them. A well-known problem within recommender systems is the cold-start problem. It arises when a new user or a new item is added to the system: because information about them is limited, it becomes more difficult to generate accurate personalized recommendations.

The aim of this report is to study how a set of algorithms within collaborative filtering perform under the cold-start problem. The chosen algorithms are SVD, SVD++, and Slope One. SVD and SVD++ belong to the model-based approach, while Slope One belongs to the memory-based approach; these are the two categories into which collaborative filtering algorithms are divided.

The results of the study indicate that the memory-based algorithm, Slope One, is less accurate and has lower performance than the model-based algorithms, SVD and SVD++, which is in line with previous research. Regarding SVD and SVD++, further studies need to be conducted in order to conclude which of them performs better under the cold-start problem.


Sammanfattning (Swedish summary)

A common problem today is the increasing amount of information that people are exposed to, a problem that recommender systems aim to solve by giving personal recommendations. The most widely used model within recommender systems is collaborative filtering, which uses similarities between users who share the same interests. A recommendation can therefore be made by finding users with similar interests and finding new products through them.

A well-known problem within recommender systems is the cold-start problem. It arises when a new user or a new product is added to the system. Because of the limited information about these, it becomes harder to generate accurate personal recommendations.

The purpose of this report is to study how a set of algorithms within collaborative filtering perform under the cold-start problem. The chosen algorithms are SVD, SVD++ and Slope One. SVD and SVD++ are model-based, while Slope One is memory-based; these are the two categories into which collaborative filtering algorithms are divided.

The results of the study indicate that the memory-based algorithm Slope One performs worse than the model-based algorithms SVD and SVD++, which is in line with previous research. Regarding SVD and SVD++, further studies need to be conducted in order to draw a conclusion about which of the algorithms performs best under the cold-start problem.


Contents

1 Introduction
    1.1 Research Question
    1.2 Thesis Organization
2 Background
    2.1 Collaborative filtering
    2.2 Cold-start problem
        2.2.1 New Item
        2.2.2 New User
    2.3 Algorithms
        2.3.1 SVD
        2.3.2 SVD++
        2.3.3 Slope One
    2.4 Related work
3 Methods
    3.1 Data set
    3.2 Base Case
    3.3 Cold-start data
        3.3.1 New User
        3.3.2 New Item
    3.4 Software
4 Results
    4.1 Error estimation methods
    4.2 Base Case
    4.3 Cold Start
        4.3.1 New User
        4.3.2 New Item
5 Discussion
    5.1 Result
    5.2 Method
6 Conclusions
Bibliography
A Code
    A.1 New Item
    A.2 New User
B Raw Data
    B.1 New Item
        B.1.1 RMSE
        B.1.2 MAE
    B.2 New User
        B.2.1 RMSE
        B.2.2 MAE

Chapter 1 Introduction

The increasing amount of information that people are exposed to online has become a problem in recent years. Recommender systems aim to solve this by helping users find the content that is most valuable to them. Recommender systems provide the user with personalized recommendations regarding content and services [1], in areas such as e-commerce, e-learning, e-government, and e-business. Movies, books, and music are commonly recommended through the use of recommender systems [2].

Approaches to recommender systems are mainly divided between two techniques: content-based filtering (CBF) and collaborative filtering (CF), of which the latter is the most widely used. The CF technique collects ratings for items and matches these with users that share the same preferences [3], while the CBF technique selects items based on the content of the items and the user's preferences [4]. This paper deals with the SVD algorithm, its extended version SVD++, and Slope One, all of which are based on CF.

Algorithms within the set of CF techniques suffer from one common issue called the cold-start problem, which consists of two parts: new user and new item. The former refers to a new user with few or no existing ratings, while the latter refers to a new item with few or no existing ratings. Since the information about the user or item is limited, it becomes difficult for the algorithms to give accurate recommendations [5].


1.1 Research Question

Since the cold-start problem is a common issue for CF algorithms, it would be interesting to compare a subset of these algorithms to see which of them performs best. Therefore, we pose the question:

How do the CF model SVD, its extended version SVD++, and Slope One compare regarding their accuracy on the cold-start problem?

1.2 Thesis Organization

Chapter 2 begins by explaining the concept of collaborative filtering and the cold-start problem, and presents the chosen algorithms. Chapter 3 then describes the method used to compare the algorithms, along with the data sets and software used. Chapter 4 presents the results of the study in several figures. Chapter 5 discusses the results as well as the approach, and finally chapter 6 presents the conclusions.


Chapter 2 Background

In this section the collaborative filtering technique will first be explained. Thereafter, the cold-start problem along with the chosen algorithms will be described further.

2.1 Collaborative filtering

The CF technique is based on the fact that humans have been sharing opinions with each other for centuries. It rests on the assumption that people with similar tastes and opinions are likely to rate items similarly [6]. There exist several algorithms using the CF technique, all based on the main idea that predictions are founded on the similarity between users who share the same interests. It is possible to measure this similarity, and once it is quantified it can be used to compute personalized recommendations for users [7].

Furthermore, the CF technique is divided into memory-based and model-based algorithms. Memory-based algorithms require that all data in terms of items, ratings, and users is stored in memory [6]. This approach uses the relationship between users (user-based) or items (item-based) [8]. In user-based approaches, users with similar tastes are found and new items are recommended through them. In item-based approaches, new items are recommended by finding users that like a specific item and recommending other items that those users like [9]. One memory-based algorithm is Slope One [10].

Model-based algorithms are built using machine learning in order to predict a user's rating of items [9]. Latent-factor models are one of the most recent and commonly used methodologies [11]; these models use the user-item rating matrix to describe both users and items by a number of factors. For movies, such a factor could for instance be a genre; for users, the factors measure how much the user appreciates movies that score high on the corresponding movie factors. One instance of latent-factor models is matrix factorization [12], which is one of the most successful model-based approaches and can achieve better accuracy than memory-based algorithms [8]. One model within matrix factorization is singular value decomposition (SVD) [13][12]. SVD factorizes the user-item rating matrix, which contains the items' ratings given by users, into three other matrices, which are then used to make predictions [3].

Since memory-based algorithms base their predictions on large data sets, system performance is degraded. Model-based algorithms aim to solve this problem [13]. Moreover, memory-based algorithms also tend to have lower prediction accuracy than model-based algorithms [14].

2.2 Cold-start problem

In order to provide personalized recommendations, the CF technique requires information, in the form of previous ratings, about the user and the item. The cold-start problem therefore arises with new or inactive users, new items, and combinations of both [15]. The cold-start problem is one of the most challenging problems, and even though research has been conducted, it has not been completely solved yet [16]. This paper will focus on the cold-start problem regarding the issues of new users and new items, which are explained further below.

2.2.1 New Item

This refers to the problem that arises when there are new items in the system which have not been rated yet. As a consequence, the items in question will not be recommended to users in an efficient manner [6].

2.2.2 New User

When a user registers for the first time, the person does not have any ratings on items, which means that the user's preferences cannot be defined. The algorithms can therefore not provide any personalized recommendations [6].

However, there are some strategies that can be used to minimize this problem. One strategy is to make the user rate some initial items in the beginning; another is to display some non-personalized recommendations until the user has rated some items on their own; yet another option is to ask questions about the user's preferences during the user's first session [6].

2.3 Algorithms

2.3.1 SVD

As mentioned above, the SVD algorithm is a latent-factor model. The algorithm tries to transform both users and items into the same latent factor space, which tries to explain the ratings by characterizing both users and items in terms of factors derived from user feedback. Once the transformation is done, users and items are directly comparable [17].

Specifically, the algorithm uses matrix factorization. It factorizes the user-item rating matrix into three other matrices, which are used in order to recommend items to a user [3]. Furthermore, an SVD-based algorithm has proven to be efficient and effective when dealing with the cold-start problem [18].
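To make the three-matrix factorization concrete, the following minimal sketch applies a truncated SVD to a toy rating matrix with NumPy. Treating missing ratings as zeros is a simplification made only for this illustration; the SVD variant used in recommender libraries such as Surprise instead learns a regularized factorization from the observed ratings alone.

```python
import numpy as np

# Toy user-item rating matrix; 0 stands in for a missing rating
# (a simplification made only for this illustration).
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 5., 4.]])

# Factorize R into the three matrices U, diag(s), and Vt.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest latent factors and rebuild the matrix;
# the rebuilt cells serve as rating estimates, including for the zeros.
k = 2
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_hat, 2))
```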

2.3.2 SVD++

The SVD++ algorithm is an extended version of the SVD algorithm. Like SVD, SVD++ utilizes the user-item rating matrix, which contains the ratings given by n users on m items. The user-item rating matrix is factorized into two latent-factor matrices: one user-factor matrix, which describes the users' features, and one item-factor matrix, which describes the items' features.

As an improved version of SVD, SVD++ also uses implicit feedback, which for example includes historical browsing and rating data. To accomplish this, SVD++ adds a vector for each item, containing the characteristics of the item. Together with the other matrices, this is used to predict ratings. Moreover, SVD++ achieves a better prediction accuracy than SVD because of its use of implicit feedback [19].
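For reference, a standard form of this prediction rule, following Koren [17], is

$\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top \big( p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j \big)$

where $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, $p_u$ and $q_i$ are the user and item factor vectors, $N(u)$ is the set of items for which user $u$ provided implicit feedback, and the vectors $y_j$ capture that feedback.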

2.3.3 Slope One

Slope One is another commonly used algorithm within the CF technique. The algorithm is easy to implement and reasonably accurate, and is therefore useful in real-world recommender systems. Slope One predicts ratings by using information from users who rated the same item and information from items rated by the same user. Moreover, Slope One has proven to be efficient on sparse data [20], sparsity being another common problem within the CF technique that arises when there exist few ratings compared to the number of users and items [21]. The sparse-data problem is also closely related to the cold-start problem [15], which makes it interesting to evaluate Slope One's performance on cold-start data.
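To illustrate the idea, the following minimal sketch implements the weighted variant of Slope One in plain Python. It is a sketch under simplified assumptions (dictionary-based data, hypothetical function names), not the Surprise implementation used in this study: it first computes the average rating deviation between every pair of items, then predicts a rating as the deviation-adjusted average over the items the user has already rated.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """Average rating deviation dev[j][i] between item pairs.

    ratings: dict mapping user -> {item: rating}.
    """
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    dev[j][i] += r_j - r_i
                    freq[j][i] += 1
    for j in dev:
        for i in dev[j]:
            dev[j][i] /= freq[j][i]
    return dev, freq

def predict(user_ratings, target, dev, freq):
    """Weighted Slope One prediction of the user's rating for `target`."""
    num = den = 0.0
    for i, r_i in user_ratings.items():
        n = freq[target][i]
        if n:  # `i` was co-rated with `target` by at least one user
            num += (dev[target][i] + r_i) * n
            den += n
    return num / den if den else None

# Tiny usage example on hypothetical data:
ratings = {'u1': {'a': 5, 'b': 3, 'c': 2},
           'u2': {'a': 3, 'b': 4},
           'u3': {'b': 2, 'c': 5}}
dev, freq = slope_one_deviations(ratings)
print(predict({'a': 2, 'b': 5}, 'c', dev, freq))
```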

2.4 Related work

In the mid-1990s, recommender systems became an independent research area. Traditional models like collaborative filtering and content-based filtering have commonly been used since then. Recently, more advanced models have been developed in order to achieve better performance and to reduce common issues that emerge from the traditional models [2]. For instance, Basiri et al. have proposed a new hybrid between the CF and CBF techniques in order to avoid the cold-start problem [5], and Pennock et al. developed a hybrid that combines memory-based and model-based collaborative filtering [22].

Research regarding recommender systems often aims either to develop already existing algorithms further or to develop new algorithms in order to improve prediction results [21]. Evert and Mattisson compare SVD++ with Slope One and NMF, all of which are algorithms within the CF technique, regarding rating prediction. They conclude that SVD++ outperforms Slope One, also showing that the attributes of a data set affect how well a model will perform [21].

In terms of the SVD model discussed in this paper, several extended versions have been developed in order to achieve better prediction accuracy. Besides the extended version above, Zhang et al. propose a hybrid between SVD++ and auto-encoders to improve computational efficiency; they also measure the RMSE of the compared models. Averaging the results of five repeated executions, they find that their hybrid outperforms the original SVD++ in predictive accuracy [23].


Chapter 3 Methods

In this section, the method of the study is described. Three different test-cases have been defined in order to measure the performance of the algorithms: base case, new user, and new item. The two test-cases new user and new item constitute the cold-start problem. The base case has been defined to compare the algorithms' performance on normal data. How the data for these test-cases is represented is described further below. In order to obtain a fair result, each algorithm has been tested five times for each test-case.

3.1 Data set

The data set used for all test-cases is MovieLens 100k from MovieLens. It includes 100 000 ratings, ranging from one to five, given by 943 users on 1682 movies. The data set only includes explicit data, which means preferences given explicitly by the user, in this case ratings [24]. It does not, however, contain implicit data, which is data not given explicitly by the user, for instance historical browsing data [19].

3.2 Base Case

To evaluate and compare the algorithms' performance, the MovieLens data set has been defined as a base case. The base case represents data that does not exhibit the cold-start problem, meaning that users or items with few ratings are not included. In accordance with previous research, this is done by using 20 percent of the data as the test set and 80 percent of the data as the training set [18].

3.3 Cold-start data

3.3.1 New User

To evaluate the algorithms' performance on the test-case new user, 80 percent of the data from MovieLens 100k is used as the training set and the remaining 20 percent is used as the test set. The data in the test set is manipulated in order to represent the cold-start problem. In accordance with previous research, users that have rated fewer than 20 items can be considered cold-start users [25]. Therefore, the test-case new user is represented with data that includes [1, ..., 19] ratings per user. Each user's occurrences in the test set are counted, and depending on how many ratings per user the cold-start data should include, a number of ratings per user are deleted. The users in the test set are unique and are randomly chosen each time cold-start data is generated.
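The following minimal sketch illustrates the manipulation described above. The function name and data layout (a list of (user, item, rating) tuples, as returned by Surprise's train_test_split) are illustrative assumptions, not the exact code in Appendix A.

```python
import random
from collections import defaultdict

def make_cold_start_users(testset, n_ratings):
    """Keep at most n_ratings randomly chosen ratings per user in the
    test set, deleting the rest, so every test user is a cold-start user."""
    by_user = defaultdict(list)
    for user, item, rating in testset:
        by_user[user].append((user, item, rating))
    cold = []
    for rows in by_user.values():
        random.shuffle(rows)           # random choice of which ratings survive
        cold.extend(rows[:n_ratings])  # n_ratings in [1, ..., 19]
    return cold
```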

3.3.2 New Item

The cold-start situation new item is represented in the same way as the test-case new user. The data is divided into a training set and a test set of 80 percent and 20 percent, respectively. The data in the test set is manipulated in the same way as the data for new user, but instead of counting each user's occurrences, each item's occurrences are counted. Depending on how many ratings, [1, ..., 19], the item should have, a specific number of ratings is deleted. This is done by deleting rows.

One difference between the test-cases new user and new item is that, when generating the data for new item, the data set initially contains cold-start items, i.e. items with fewer than 20 ratings. Since this would make the generated cold-start data inconsistent, these items are deleted from the data set before it is manipulated.
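A corresponding sketch for new item is shown below, again with illustrative names rather than the exact code in Appendix A. It first deletes the rows of items that are already cold, and then truncates the remaining items' ratings in the same way as for new user.

```python
import random
from collections import Counter, defaultdict

def drop_cold_items(ratings, min_ratings=20):
    """Delete rows for items that already have fewer than min_ratings
    ratings, so that only the generated cold-start items remain cold."""
    counts = Counter(item for _, item, _ in ratings)
    return [row for row in ratings if counts[row[1]] >= min_ratings]

def make_cold_start_items(testset, n_ratings):
    """Keep at most n_ratings randomly chosen ratings per item."""
    by_item = defaultdict(list)
    for row in testset:
        by_item[row[1]].append(row)
    cold = []
    for rows in by_item.values():
        random.shuffle(rows)
        cold.extend(rows[:n_ratings])
    return cold
```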

3.4 Software

In this study, the software library Surprise is used. Surprise is a Python scikit, i.e. a package used with Python, for building and analyzing recommender systems that deal with explicit rating data [26]. It provides several built-in algorithms, as well as built-in data sets if needed. The test-case base case was implemented directly, using the built-in function for splitting the data into a training set and a test set. The test-cases new item and new user, which constitute the cold-start problem, required extra code to manipulate the data and some minor changes in the built-in functions to run the algorithms. This code is shown in Appendix A.
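As a minimal sketch of how such an experiment fits together in Surprise, the following runs the three algorithms on the built-in MovieLens 100k data with an 80/20 split and reports RMSE and MAE. It covers the base case only, without the cold-start manipulation described above, and is illustrative of the setup rather than the exact project code.

```python
from surprise import SVD, SVDpp, SlopeOne, Dataset, accuracy
from surprise.model_selection import train_test_split

# Load the built-in MovieLens 100k data and make the 80/20 split.
data = Dataset.load_builtin('ml-100k')
trainset, testset = train_test_split(data, test_size=0.2)

# Fit each algorithm on the training set and score it on the test set.
for algo in (SVD(), SVDpp(), SlopeOne()):
    algo.fit(trainset)
    predictions = algo.test(testset)
    print(type(algo).__name__,
          'RMSE:', accuracy.rmse(predictions, verbose=False),
          'MAE:', accuracy.mae(predictions, verbose=False))
```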


Chapter 4 Results

In this section, the algorithms' performance is presented in different figures. The figures represent the performance of SVD, SVD++, and Slope One on the different test-cases. The chapter starts by describing the error estimation methods that were used to evaluate the algorithms' performance. The raw data from each run of the test-cases can be seen in Appendix B.

4.1 Error estimation methods

Two widely used error estimation methods that measure the accuracy of algorithms are root mean square error (RMSE) and mean absolute error (MAE). These methods compare the predicted rating with the true rating and calculate a value; a lower value implies a more accurate prediction [27]. They differ in the sense that MAE gives identical weight to all errors, while RMSE, by squaring the prediction errors, weights errors in proportion to their absolute values. There is no generally preferred method; rather, they are useful in different scenarios: RMSE is used when the sample's error distribution is expected to be normal with low variance, while MAE is used when the errors are expected to be uniformly distributed with high variance. Both methods become more accurate with larger samples: as the sample grows, so does the accuracy [28].

The formulas used to calculate MAE (i) and RMSE (ii) are

(i) $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |e_i|$

(ii) $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$

where $n$ is the number of test samples and $e_i$ is the difference between the predicted and the actual value [28].
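For illustration, both measures take only a few lines of NumPy; the function names below are ours, not part of any particular library.

```python
import numpy as np

def mae(predicted, actual):
    """Mean absolute error: equal weight to every error."""
    e = np.asarray(predicted) - np.asarray(actual)
    return float(np.mean(np.abs(e)))

def rmse(predicted, actual):
    """Root mean square error: squaring weights large errors more."""
    e = np.asarray(predicted) - np.asarray(actual)
    return float(np.sqrt(np.mean(e ** 2)))

print(mae([3.5, 4.0], [3, 5]), rmse([3.5, 4.0], [3, 5]))
```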

4.2 Base Case

In order to compare the algorithms' performance, a base case is defined. This case contains normal data, excluding cold-start data, and was tested five times with each algorithm. The results in the figures below are shown as dots, one indicating the error value of each iteration.

Figure 4.1: This diagram shows the result of the algorithms’ performance measured with RMSE, for each iteration.


Figure 4.2: This diagram shows the result of the algorithms’ performance measured with MAE, for each iteration.

The performance of Slope One on normal data shows an RMSE value between roughly 0.94 and 0.95, and an MAE value between roughly 0.735 and 0.75. A slight improvement over Slope One can be seen in SVD's performance, with an RMSE value around 0.935 and an MAE value ranging from roughly 0.735 to 0.74. SVD++ outperforms the other algorithms, with an RMSE value around 0.92 and an MAE value around 0.725. These results clearly indicate that Slope One has the lowest performance.

4.3 Cold Start

The result for each algorithm is shown in a box-plot diagram. Each box consists of all the resulting data, gathered by running the algorithms five times for each cold-start case, i.e. [1, ..., 19] ratings. The green horizontal line in each box represents the median value of all iterations, the lower edge of the box represents the first quartile and the upper edge the third quartile, and the vertical lines (whiskers) represent the minimum and maximum values of the iterations.

4.3.1 New User

The following diagrams show the performance of the different algorithms on the cold-start data generated for the test-case new user. The x-axis represents the number of ratings of every user and the y-axis represents the error estimation values.


SVD

Figure 4.3: This diagram shows SVD’s performance on the cold-start data for the test-case new user, evaluated with RMSE.

Figure 4.4: This diagram shows SVD’s performance on the cold-start data for the test-case new user, evaluated with MAE.

With new user data, the SVD algorithm shows a minor improvement as the cold-start factor increases. The median RMSE starts out around 1.10 and reaches down to approximately 1.075 (figure 4.3). The median MAE also decreases, from between 0.90 and 0.91 to between 0.87 and 0.88 (figure 4.4). Although a slight improvement follows the increase of the cold-start factor, the accuracy does not compare to the base-case performance shown in figures 4.1 and 4.2, indicating how strongly cold-start data degrades SVD's performance.

SVD++

Figure 4.5: This diagram shows SVD++’s performance on the cold-start data for the test-case new user, evaluated with RMSE.


Figure 4.6: This diagram shows SVD++’s performance on the cold-start data for test-case new user, evaluated with MAE.

SVD++'s performance on new user starts off with a lower RMSE, at a level between 1.00 and 1.01, and ends with a median between 1.03 and 1.04 (figure 4.5). The MAE similarly starts off at a level between 0.79 and 0.80 and ends with a median between 0.83 and 0.84 (figure 4.6). This result indicates that SVD++ has lower performance on cold-start data than on the base case, shown in figure 4.1.

Slope One

Figure 4.7: This diagram shows Slope One’s performance on the cold-start data for the test-case new user, evaluated with RMSE.


Figure 4.8: This diagram shows Slope One’s performance on the cold-start data for the test-case new user, evaluated with MAE.

The results above indicate that Slope One has lower accuracy when operating on cold-start data. The median RMSE starts off between 1.075 and 1.100 and ends around 1.125 (figure 4.7). The median MAE starts at a value between 0.91 and 0.92 and ends between 0.95 and 0.96 (figure 4.8).

Visualization of the algorithms’ performance side by side:

In order to obtain a clear visualization of the differences in the algorithms' performance on the cold-start case new user, figures 4.9 and 4.10 below were created. They show the mean value over all five iterations for each cold-start rating count.

Figure 4.9: This diagram shows all three algorithms’ performance on the cold- start data for the test-case new user, evaluated with RMSE.


Figure 4.10: This diagram shows all three algorithms’ performance on the cold-start data for the test-case new user, evaluated with MAE.

The compared results in figures 4.9 and 4.10 above show that Slope One has higher values of both RMSE and MAE, indicating lower performance compared to both SVD and SVD++. Moreover, the figures also indicate that SVD has lower performance than SVD++ in the case of new user.

4.3.2 New Item

The following diagrams show the performance of the different algorithms on the cold-start data generated for the test-case new item. The x-axis represents the number of ratings of every item and the y-axis represents the error estimation values.


SVD

Figure 4.11: This diagram shows SVD’s performance on the cold-start data for the test-case new item, evaluated with RMSE.

Figure 4.12: This diagram shows SVD’s performance on the cold-start data for the test-case new item, evaluated with MAE.

Both figure 4.11 and figure 4.12 show a minor improvement in the algorithm's performance from the cold-start case with few ratings to the cold-start case with 19 ratings. The median RMSE of SVD starts off around 1.100 and ends between 1.075 and 1.100 (figure 4.11). The median MAE begins close to 0.900 and ends around 0.875 (figure 4.12). Compared to SVD's performance on normal data, shown in figures 4.1 and 4.2, these box-plot diagrams indicate lower performance, specifically showing the poor accuracy of the algorithm under the cold-start problem for new item.

SVD++

Figure 4.13: This diagram shows SVD++’s performance on the cold-start data for the test-case new item, evaluated with RMSE.

Figure 4.14: This diagram shows SVD++’s performance on the cold-start data for the test-case new item, evaluated with MAE.

SVD++'s performance on the cold-start data is similar to SVD's. The algorithm is less accurate on cold-start data than on normal data, shown in figures 4.1 and 4.2. The median RMSE starts close to 1.100 and ends with a value between 1.075 and 1.100 (figure 4.13), and the median MAE starts around 0.900 and ends close to 0.875 (figure 4.14).

Slope One

Figure 4.15: This diagram shows Slope One’s performance on the cold-start data for the test-case new item, evaluated with RMSE.

Figure 4.16: This diagram shows Slope One’s performance on the cold-start data for the test-case new item, evaluated with MAE.

Compared to the normal data, Slope One has lower accuracy on cold-start data. The median RMSE starts between 1.16 and 1.18 and ends around 1.16 (figure 4.15). The median MAE both starts and ends with a value between 0.96 and 0.98 (figure 4.16). These results also indicate that Slope One has lower performance and accuracy than SVD and SVD++ for the cold-start problem new item, as visualized in figures 4.17 and 4.18 below.

Visualization of the algorithms’ performance side by side:

In order to obtain a clear visualization of the differences in the algorithms' performance on the cold-start case new item, figures 4.17 and 4.18 below were created. They show the mean value over all five iterations for each cold-start rating count.

Figure 4.17: This diagram shows all three algorithms’ performance on the cold-start data for the test-case new item, evaluated with RMSE.

Figure 4.18: This diagram shows all three algorithms’ performance on the cold-start data for the test-case new item, evaluated with MAE.


As shown in figures 4.17 and 4.18, Slope One has higher values of both RMSE and MAE, which indicates lower prediction accuracy. Interestingly, SVD and SVD++ have almost exactly the same RMSE and MAE values, which indicates that their performance on this data set is almost identical.


Chapter 5 Discussion

5.1 Result

The results indicate that Slope One is the algorithm with the lowest accuracy on all test-cases. The other algorithms, SVD and SVD++, have almost identical results on the test-case new item, but SVD++ differs from SVD on the test-case new user. Even though the cold-start factor does not affect the accuracy substantially, SVD++ outperformed SVD on the new user data, as visualized in figures 4.9 and 4.10.

Compared to the results from the base case, all algorithms perform less accurately on cold-start data, which is in line with previous research.

As mentioned in the background chapter, Slope One is a memory-based algorithm, which according to previous research has lower accuracy than model-based algorithms. The base case together with all the cold-start test-cases shows that Slope One has higher RMSE and MAE values than SVD and SVD++, which indicates that Slope One makes less accurate predictions than SVD and SVD++. Since SVD and SVD++ are model-based algorithms, this result is in line with previous research.

In addition, rather surprisingly, one observation is the irregular curve formed by the cold-start test-cases. Since the algorithms need information in order to predict a user's behavior, our initial expectation was that the cold-start case with 19 ratings, for both new item and new user, would achieve higher accuracy than the cold-start cases with few ratings (0, 1, 2, ... ratings). The curve should then have decreased smoothly with the number of ratings provided. Our hypothesis was that the lowest RMSE and MAE values would occur at the highest number of ratings in the cold-start cases. As seen in the figures above, for instance figures 4.9 and 4.10, this is not the case.

Figure 4.9, for instance, shows a lower RMSE value for the cold-start case with one rating than for the cold-start case with 19 ratings. A possible explanation could be that the small numbers of ratings used in our research do not have any substantial effect on the algorithms. A small number of ratings could confuse the results rather than improve them, up to a certain turning point where having previous ratings actually increases the accuracy of the algorithms. The conclusion is that we might have used an insufficient range of cold-start cases, and that letting them range over [1, ..., 100] rather than [1, ..., 19] could have increased the effect of adding ratings.

Regarding the similar results of SVD and SVD++ on the cold-start case new item, this was at first very surprising. Lower RMSE and MAE values for SVD++ compared to SVD were expected, since SVD++ is an improved version of SVD. According to previous research, this algorithm should perform better, i.e. have higher accuracy in predicting user ratings. As seen in the results above, SVD++ has almost exactly the same result as SVD, and no clear improvement can be seen in figures 4.17 and 4.18. However, this could depend on the choice of data set. MovieLens 100K contains only explicit data, and SVD++'s improvement over SVD lies in its use of implicit data. Since no implicit data is available in the MovieLens data set, this may have affected the performance of SVD++.

On the other hand, SVD++ outperforms SVD on the cold-start case new user, even though the same data set, with only explicit data, is used. A possible explanation for this difference is the manipulation of the new item data, done in order to obtain cold-start items. When deleting some items in the data set, an error could have occurred which might have affected the result for new item. This is discussed further in the method section below.

5.2 Method

In retrospect, the method could have used a larger number of data sets to obtain a wider and more truthful perspective of how the algorithms perform on cold-start data. A suggestion for further research is to use MovieLens 1M, as well as other data sets such as the Netflix data set, to eliminate the biases that a single data set could bring, and, as mentioned above, to generate more extensive cold-start data with increased cold-start factors. A further suggestion is to repeat the experiments more than five times, acquiring more data points and thus more accurate results. We also suggest using data sets with implicit data, to further justify the use of SVD++ and give the experiment a fairer basis for evaluation.

Regarding the removal of items with a limited number of ratings in the new item data manipulation, the amount of data decreased significantly (44%), which could have affected the final predictions in the cold-start case new item. We therefore suggest looking into another method of manipulating the data set, which would probably yield a more accurate result.


Chapter 6 Conclusions

From this study, the conclusion can be drawn that Slope One is the algorithm with the lowest performance. Moreover, we can also conclude that all algorithms' performance decreases in cold-start cases.

However, since the results for SVD and SVD++ differed between the cold-start cases new user and new item, no valid conclusion regarding these algorithms' relative performance can be drawn. To conclude which of the algorithms performs best on cold-start data, further research needs to be conducted, taking the suggestions in the method section above into consideration. The same applies to the expected trend of higher accuracy with an increased number of ratings, which could not be confirmed within the extent of our research; further research is needed to provide knowledge on this matter as well.


Bibliography

[1] Gediminas Adomavicius and Alexander Tuzhilin. "Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions". In: IEEE Transactions on Knowledge and Data Engineering 17.6 (2005), pp. 734-749.

[2] Jie Lu et al. "Recommender system application developments: A survey." In: Decision Support Systems 74 (2015), pp. 12-32. DOI: https://doi.org/10.1016/j.dss.2015.03.008.

[3] Huseyin Polat and Wenliang Du. "SVD-Based Collaborative Filtering with Privacy". In: Proceedings of the 2005 ACM Symposium on Applied Computing. SAC '05. Association for Computing Machinery, 2005, pp. 791-795.

[4] Robin van Meteren and Maarten van Someren. Using content-based filtering for recommendation. URL: http://users.ics.forth.gr/~potamias/mlnia/paper_6.pdf (accessed: 18.03.2020).

[5] Javad Basiri et al. "Alleviating the cold-start problem of recommender systems using a new hybrid approach". In: 2010 5th International Symposium on Telecommunications. 2010, pp. 962-967.

[6] J. Ben Schafer et al. "Collaborative Filtering Recommender Systems". In: The Adaptive Web: Methods and Strategies of Web Personalization. Springer Berlin Heidelberg, 2007, pp. 291-324.

[7] Daniel Billsus and Michael J. Pazzani. Learning Collaborative Information Filters. URL: https://www.aaai.org/Papers/Workshops/1998/WS-98-08/WS98-08-005.pdf (accessed: 18.03.2020).

[8] Xin Guan, Chang-Tsun Li, and Yu Guan. "Matrix Factorization With Rating Completion: An Enhanced SVD Model for Collaborative Filtering Recommender Systems". In: IEEE Access 5 (2017), pp. 27668-27678.

[9] Prince Grover. Various implementations of collaborative filtering. URL: https://towardsdatascience.com/various-implementations-of-collaborative-filtering-100385c6dfe0 (accessed: 21.03.2020).

[10] Yongqiang Wang et al. "Learning to Recommend Based on Slope One Strategy". In: Web Technologies and Applications. Ed. by Quan Z. Sheng et al. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 537-544.

[11] "Latent Factor Models and Matrix Factorizations". In: Encyclopedia of Machine Learning. Ed. by Claude Sammut and Geoffrey I. Webb. Boston, MA: Springer US, 2010, p. 571. ISBN: 978-0-387-30164-8. DOI: https://doi.org/10.1007/978-0-387-30164-8_887.

[12] Yehuda Koren, Robert Bell, and Chris Volinsky. "Matrix Factorization Techniques for Recommender Systems". In: Computer 42.8 (2009), pp. 30-37.

[13] Minh-Phung Do, Dung Nguyen, and Loc Nguyen. Model-based approach for Collaborative Filtering. URL: https://www.researchgate.net/publication/321753015_Model-based_approach_for_Collaborative_Filtering (accessed: 18.03.2020).

[14] P. H. Aditya, I. Budi, and Q. Munajat. "A comparative analysis of memory-based and model-based collaborative filtering on the implementation of recommender system for E-commerce in Indonesia: A case study PT X". In: 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS). 2016, pp. 303-308.

[15] Blerina Lika, Kostas Kolomvatsos, and Stathes Hadjiefthymiades. "Facing the cold-start problem in recommender systems". In: Expert Systems with Applications 41 (2014), pp. 2065-2073. DOI: https://doi.org/10.1016/j.eswa.2013.09.005.

[16] Mi Zhang et al. "Addressing Cold Start in Recommender Systems: A Semi-Supervised Co-Training Algorithm". In: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. SIGIR '14. Gold Coast, Queensland, Australia: Association for Computing Machinery, 2014, pp. 73-82. DOI: https://doi.org/10.1145/2600428.2609599.

[17] Yehuda Koren. "Factorization Meets the Neighborhood: A Multifaceted Collaborative Filtering Model". In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '08. Las Vegas, Nevada, USA: Association for Computing Machinery, 2008, pp. 426-434. ISBN: 9781605581934. DOI: https://doi.org/10.1145/1401890.1401944.

[18] Shien Ge and Xinyang Ge. "An SVD-based Collaborative Filtering approach to alleviate cold-start problems". In: 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery. 2012, pp. 1474-1477.

[19] Zhengzheng Xian et al. "New Collaborative Filtering Algorithms Based on SVD++ and Differential Privacy". In: Mathematical Problems in Engineering 2017 (2017).

[20] Daniel Lemire and Anna Maclachlan. "Slope One Predictors for Online Rating-Based Collaborative Filtering". In: Proceedings of the 2005 SIAM International Conference on Data Mining, pp. 471-475. URL: https://epubs.siam.org/doi/abs/10.1137/1.9781611972757.43.

[21] Anna-Karin Evert and Alfrida Mattisson. Limited Data in Recommender Systems: The Impact of Sparse Data and Cold Start on the Recommendation Algorithm Slope One. URL: https://www.kth.se/social/files/588788adf2765412b6e8ec74/A-KEvert_AMattisson_dkand16.pdf (accessed: 17.02.2020).

[22] David M. Pennock et al. "Collaborative Filtering by Personality Diagnosis: A Hybrid Memory and Model-Based Approach". In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 473-480.

[23] Shuai Zhang, Lina Yao, and Xiwei Xu. "AutoSVD++: An Efficient Hybrid Collaborative Filtering Model via Contractive Auto-Encoders". In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '17. Shinjuku, Tokyo, Japan: Association for Computing Machinery, 2017, pp. 957-960. DOI: https://doi.org/10.1145/3077136.3080689.

[24] F. Maxwell Harper and Joseph A. Konstan. "The MovieLens Datasets: History and Context". In: ACM Transactions on Interactive Intelligent Systems 5.4 (2015). DOI: https://doi.org/10.1145/2827872.

[25] Jesús Bobadilla et al. "A collaborative filtering approach to mitigate the new user cold-start problem." In: Knowledge-Based Systems 26 (2012), pp. 225-238. DOI: https://doi.org/10.1016/j.knosys.2011.07.021.

[26] Nicolas Hug. Surprise, a Python library for recommender systems. http://surpriselib.com. 2017.

[27] Feng Zhang et al. "Fast algorithms to evaluate collaborative filtering recommender systems". In: Knowledge-Based Systems 96 (2016), pp. 96-103. DOI: https://doi.org/10.1016/j.knosys.2015.12.025.

[28] Tianfeng Chai and R. R. Draxler. "Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature". In: Geoscientific Model Development 7 (June 2014), pp. 1247-1250. DOI: https://doi.org/10.5194/gmd-7-1247-2014.


Appendix A Code

A.1 New Item


A.2 New User


Appendix B Raw Data

B.1 New Item

B.1.1 RMSE


B.1.2 MAE


B.2 New User

B.2.1 RMSE


B.2.2 MAE


www.kth.se

TRITA-EECS-EX-2020:372
